This repository contains the models submitted to Task 1 of the DCASE 2024 Challenge.

Description

The task is to develop a data-efficient, low-complexity acoustic scene classification system. The challenge dataset consists of 1-second audio clips, each belonging to one of 10 classes: airport, bus, metro, metro_station, park, public_square, shopping_mall, street_pedestrian, street_traffic, tram. Five models are trained, one on each split of the training data: 5%, 10%, 25%, 50%, and 100%.

We use the baseline model architecture and apply a target-specific training process in which the pretraining dataset is pruned to match the target dataset. Knowledge distillation is then used to transfer knowledge from a pre-trained audio tagging ensemble to the target model. A technical report describes the training process in detail.
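As a rough illustration of knowledge distillation in general, the sketch below combines a hard-label cross-entropy term with a temperature-softened KL term between teacher and student outputs. It is not the loss used in this submission: the function names, the `temperature` and `kd_weight` parameters, and the assumption that teacher and student produce logits over the same classes are all hypothetical choices made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, kd_weight=0.5):
    """Generic distillation loss (hypothetical weighting, not the authors' recipe).

    Assumes teacher and student logits cover the same set of classes.
    """
    # Hard-label term: cross-entropy of the student against the true class.
    student_probs = softmax(student_logits)
    cross_entropy = -math.log(student_probs[label])

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(teacher_soft, student_soft) if t > 0)

    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    return (1 - kd_weight) * cross_entropy + kd_weight * temperature ** 2 * kl


if __name__ == "__main__":
    student = [0.2, 1.5, -0.3]
    teacher = [0.1, 2.0, -1.0]
    print(distillation_loss(student, teacher, label=1))
```

A higher `temperature` flattens both distributions, so the student also learns from the teacher's relative scores for the non-target classes.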

Results

The full results of all participants can be found here: https://dcase.community/challenge2024/task-data-efficient-low-complexity-acoustic-scene-classification-results

The results of our submission compared to the baseline on the evaluation data are as follows:

| Name | Official rank | Rank score | Split 5% | Split 10% | Split 25% | Split 50% | Split 100% |
|---|---|---|---|---|---|---|---|
| Werning_UPBNT | 8 | 54.35 | 49.21 % | 52.51 % | 55.49 % | 56.20 % | 58.34 % |
| Baseline | 17 | 50.73 | 44.00 % | 46.95 % | 51.47 % | 54.40 % | 56.84 % |
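
The rank scores reported above are consistent with the arithmetic mean of the five per-split accuracies. The short check below reproduces both values from the numbers in the results; the helper name `rank_score` is ours, and the averaging rule is inferred from the figures rather than quoted from the challenge rules.

```python
from statistics import mean

# Per-split accuracies (in %) for the 5%, 10%, 25%, 50% and 100% splits.
accuracies = {
    "Werning_UPBNT": [49.21, 52.51, 55.49, 56.20, 58.34],
    "Baseline": [44.00, 46.95, 51.47, 54.40, 56.84],
}


def rank_score(split_accuracies):
    """Average the per-split accuracies and round to two decimals."""
    return round(mean(split_accuracies), 2)


scores = {name: rank_score(values) for name, values in accuracies.items()}

if __name__ == "__main__":
    for name, score in scores.items():
        print(f"{name}: {score:.2f}")
    # Werning_UPBNT: 54.35
    # Baseline: 50.73
```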

Usage

The example notebook shows how to predict the acoustic scene for a given audio file using the models.
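
For orientation, the final step of such a prediction can be sketched as mapping the model's per-class scores to a scene label. This is only an assumption-laden illustration: it presumes the model emits one score per class in the order the classes are listed in the description, and `predict_label` is a hypothetical helper, not a function from this repository. The example notebook remains the reference for the actual loading and inference code.

```python
# Class names as listed in the task description; the index order is an assumption.
CLASSES = [
    "airport", "bus", "metro", "metro_station", "park",
    "public_square", "shopping_mall", "street_pedestrian",
    "street_traffic", "tram",
]


def predict_label(scores, classes=CLASSES):
    """Return the class name with the highest score (hypothetical helper)."""
    if len(scores) != len(classes):
        raise ValueError(
            f"expected {len(classes)} scores, got {len(scores)}"
        )
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return classes[best_index]


if __name__ == "__main__":
    example_scores = [0.1, 0.3, 2.4, 0.0, -1.2, 0.5, 0.2, 0.9, 1.1, 0.4]
    print(predict_label(example_scores))  # metro
```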

The model code is adapted from the baseline repository: https://github.com/CPJKU/dcase2024_task1_baseline
