---
license: mit
language:
- en
pipeline_tag: video-classification
---

# Model Card for UniformerV2

UniformerV2 is a large transformer-based model trained for binary classification: it detects whether an input video shows one or more chimpanzees reacting to the presence of a camera trap.

## Model Details

### Model Description

UniformerV2 is a large transformer-based model trained on a binary classification task: given an input video, it predicts whether one or more chimpanzees react to the presence of a camera trap. Because the dataset heavily favours videos showing no reaction to the camera, we employ a class-balanced focal loss to address the class imbalance.

- **Developed by:** Otto Brookes, Christophe Boesch, Hjalmar S. Kühl, Majid Mirmehdi, Tilo Burghardt
- **Model type:** Vision Transformer, UniformerV2
- **License:** MIT
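
The class-balanced focal loss mentioned in the description can be sketched as follows. This is a minimal, dependency-free illustration and not the training code behind this model: it assumes a single sigmoid output per video, and the function names, the `beta` and `gamma` defaults, and the class counts in the usage note are all hypothetical, since the card does not report them. Each class is weighted by the inverse of its "effective number" of samples, and the focal term down-weights examples the model already classifies confidently.

```python
import math


def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights based on the 'effective number' of samples.

    beta is an assumed hyperparameter; this card does not report the value used.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in samples_per_class]
    # Normalise so the weights sum to the number of classes.
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def cb_focal_loss(probs, labels, samples_per_class, beta=0.999, gamma=2.0):
    """Mean class-balanced focal loss for binary labels.

    probs: predicted probability of the positive class (reaction) per video.
    labels: 1 for reaction, 0 for no reaction.
    samples_per_class: [n_negative, n_positive] training counts.
    gamma is an assumed focusing parameter, not a value from this card.
    """
    weights = class_balanced_weights(samples_per_class, beta)
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
        total += -weights[y] * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(labels)
```

For example, with hypothetical counts of 9000 no-reaction and 1000 reaction videos, `class_balanced_weights([9000, 1000])` gives the rarer reaction class the larger weight. Setting `beta=0.0` and `gamma=0.0` reduces the loss to plain binary cross-entropy.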

## Training Details

### Training Data

The model is trained on camera trap video footage from 15 countries in Africa, collected as part of the Pan African Programme: The Cultured Chimpanzee.

### Results

We use mean average precision (mAP) to evaluate models.

| Dataset | Model     | Loss     | mAP (%) |
|---------|-----------|----------|---------|
| PanAf   | Uniformer | CB Focal | 87.82   |
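
For readers unfamiliar with the metric, the sketch below shows one common way to compute average precision for a single class and then average it over classes. It is illustrative only: the card does not specify the exact evaluation protocol (for example tie handling or interpolation), so the helper names and the choice of non-interpolated AP are assumptions.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at each rank where a positive occurs."""
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0  # AP is undefined without positives; 0.0 is a convention here
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / n_pos


def mean_average_precision(scores_per_class, labels_per_class):
    """Mean of the per-class AP values."""
    aps = [average_precision(s, l) for s, l in zip(scores_per_class, labels_per_class)]
    return sum(aps) / len(aps)
```

In the binary setting, the two classes can be scored as `p` (reaction) and `1 - p` (no reaction), with the labels flipped for the second class, before calling `mean_average_precision`.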