---
license: mit
language:
- en
pipeline_tag: video-classification
---

# Model Card for UniformerV2

UniformerV2 is a large transformer-based model trained on a binary classification task. Specifically, it is trained to detect whether the input video contains one or more chimpanzees reacting to the presence of a camera trap.

## Model Details

### Model Description

Because the dataset heavily favours videos in which no reaction to the camera occurs, the model is trained with a class-balanced focal loss to address the class imbalance.

- **Developed by:** Otto Brookes, Christophe Boesch, Hjalmar S. Kühl, Majid Mirmehdi, Tilo Burghardt
- **Model type:** Vision Transformer, UniformerV2
- **License:** MIT

## Training Details

### Training Data

The model is trained on camera trap video footage from 15 countries in Africa, collected as part of the Pan African Programme: The Cultured Chimpanzee.

### Results

Models are evaluated using mean average precision (mAP).

| Dataset | Model     | Loss     | mAP (%) |
|---------|-----------|----------|---------|
| PanAf   | Uniformer | CB Focal | 87.82   |
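
To illustrate the class-balanced (CB) focal loss named in the Model Description, here is a minimal pure-Python sketch for a single binary prediction. It follows the general recipe of weighting each class by the inverse of its "effective number" of samples and scaling the cross-entropy by a focal term. The function names, the `beta` and `gamma` values, and the class counts are assumptions chosen for illustration; they are not the settings or the code used to train this model.

```python
import math


def class_balanced_weights(samples_per_class, beta=0.999):
    """Weight each class by (1 - beta) / (1 - beta**n), where n is its sample count.

    The weights are rescaled to sum to the number of classes.
    beta=0.999 is an assumed value, not taken from this model card.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in samples_per_class]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def cb_focal_loss(prob_positive, label, weights, gamma=2.0, eps=1e-12):
    """Class-balanced focal loss for one binary prediction.

    prob_positive: predicted probability of the positive class (camera reaction).
    label: 0 (no reaction) or 1 (reaction).
    gamma=2.0 is an assumed focusing parameter.
    """
    # Probability the model assigned to the true class, clamped to avoid log(0).
    p_t = prob_positive if label == 1 else 1.0 - prob_positive
    p_t = min(max(p_t, eps), 1.0)
    # The focal term (1 - p_t)**gamma down-weights examples that are already easy.
    return -weights[label] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical class counts: 9000 "no reaction" clips, 1000 "reaction" clips.
weights = class_balanced_weights([9000, 1000])
easy_negative = cb_focal_loss(0.05, 0, weights)
missed_positive = cb_focal_loss(0.05, 1, weights)
```

With these hypothetical counts the rarer "reaction" class receives the larger weight, and a missed positive contributes far more to the loss than a confidently correct negative. Setting `gamma=0` reduces the expression to a class-weighted cross-entropy.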