---
license: mit
language:
- en
pipeline_tag: video-classification
---

# Model Card for UniformerV2 

UniformerV2 is a large transformer-based model trained for binary video classification: it predicts whether an input video contains one or more chimpanzees reacting to the presence of a camera trap.

## Model Details

### Model Description

The model takes a camera trap video as input and predicts whether any chimpanzee in it reacts to the presence of the camera trap. Because the dataset is dominated by videos in which no reaction occurs, we train with a class-balanced focal loss to counter the class imbalance.
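
For readers unfamiliar with the loss, the following is a minimal sketch of a class-balanced focal loss for the binary case, written in plain Python. It is not the training code used for this model: the function names, the `beta` and `gamma` defaults, and the class counts in the usage note are illustrative assumptions. Each class is weighted by the inverse of its "effective number" of samples, and the focal term `(1 - p_t) ** gamma` down-weights examples the model already classifies confidently.

```python
import math


def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples (hypothetical helper)."""
    # Effective number of samples for a class with n examples: (1 - beta^n) / (1 - beta)
    effective = [(1.0 - beta ** n) / (1.0 - beta) for n in samples_per_class]
    raw = [1.0 / e for e in effective]
    # Normalise so the weights sum to the number of classes
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def cb_focal_loss(probs, labels, samples_per_class, beta=0.999, gamma=2.0):
    """Mean class-balanced focal loss.

    probs:  predicted probability of the positive class (reaction) per video
    labels: 0 (no reaction) or 1 (reaction) per video
    """
    weights = class_balanced_weights(samples_per_class, beta)
    total = 0.0
    for p, y in zip(probs, labels):
        # Probability the model assigns to the true class
        p_t = p if y == 1 else 1.0 - p
        # Clamp to avoid log(0)
        p_t = min(max(p_t, 1e-7), 1.0 - 1e-7)
        total += -weights[y] * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(labels)
```

As a usage example, with hypothetical counts of 9000 no-reaction and 1000 reaction videos, `class_balanced_weights([9000, 1000])` assigns the larger weight to the rarer reaction class, so mistakes on that class contribute more to the loss.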

- **Developed by:** Otto Brookes, Christophe Boesch, Hjalmar S. Kühl, Majid Mirmehdi, Tilo Burghardt
- **Model type:** Vision Transformer, UniformerV2
- **License:** MIT

## Training Details

### Training Data
The model is trained on camera trap video footage from 15 different countries in Africa, collected as part of the Pan African Programme: The Cultured Chimpanzee.

### Results
We use mean average precision (mAP) to evaluate models.

| Dataset   | Model      | Loss       | mAP (%) |
|-----------|------------|------------|---------|
| PanAf     | Uniformer  | CB Focal   | 87.82   |
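
To make the metric concrete, here is a minimal sketch of how mean average precision could be computed for a binary task. This card does not specify how the reported figure was calculated, so averaging the per-class average precision over both classes is an assumption, as are the function names. Average precision ranks videos by score and averages the precision measured at the rank of each true positive.

```python
def average_precision(scores, labels):
    """Average precision for one class: labels are 1 for that class, else 0."""
    # Rank items from highest to lowest score (ties keep their input order)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            # Precision at the rank of this true positive
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(scores, labels):
    """Mean of the per-class AP over both classes (assumed averaging scheme)."""
    ap_pos = average_precision(scores, labels)
    # For the negative class, use 1 - score as the score and flip the labels
    ap_neg = average_precision([1.0 - s for s in scores], [1 - y for y in labels])
    return (ap_pos + ap_neg) / 2.0
```

A perfect ranking, in which every reaction video scores higher than every no-reaction video, yields a value of 1.0 (100%).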