# Adversarial-MidiBERT

This model card was generated with Grok 3.

## Model Details

- **Model Name**: Adversarial-MidiBERT
- **Model Type**: Transformer-based model for symbolic music understanding
- **Version**: 1.0
- **Release Date**: August 2025
- **Developer**: Zijian Zhao
- **Organization**: SYSU
- **License**: Apache License 2.0
- **Paper**: [Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training](https://dl.acm.org/doi/abs/10.1145/3731715.3733483), ACM ICMR 2025
- **arXiv**: https://arxiv.org/abs/2407.08306
- **Citation**:

```
@inproceedings{zhao2025let,
  title={Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training},
  author={Zhao, Zijian},
  booktitle={Proceedings of the 2025 International Conference on Multimedia Retrieval},
  pages={2128--2132},
  year={2025}
}
```

- **Contact**: zhaozj28@mail2.sysu.edu.cn
- **Repository**: https://github.com/RS2002/Adversarial-MidiBERT

## Model Description

Adversarial-MidiBERT is a transformer-based model for symbolic music understanding that leverages large-scale adversarial pre-training. It builds on the [MidiBERT-Piano](https://github.com/wazenmai/MIDI-BERT) framework and extends it with adversarial pre-training techniques to improve performance on music-related tasks. The model consumes symbolic music in an octuple token format and can be fine-tuned for downstream tasks such as music generation, classification, and analysis.

- **Architecture**: Transformer encoder (based on MidiBERT)
- **Input Format**: Octuple representation of symbolic music, shape `[batch_size, sequence_length, 8]`
- **Output Format**: Hidden states of shape `[batch_size, sequence_length, 768]`
- **Hidden Size**: 768
- **Training Objective**: Adversarial pre-training followed by task-specific fine-tuning
- **Tasks Supported**: Symbolic music understanding tasks

## Training Data

The model was pre-trained and fine-tuned on the following datasets:

- **POP1K7**: A dataset of popular-music MIDI files.
- **POP909**: A dataset of 909 pop songs in MIDI format.
- **Pianist8**: A dataset of piano performances.
- **EMOPIA**: A dataset for emotion-based music analysis.
- **GiantMIDI**: A large-scale MIDI dataset.

For details on dataset preprocessing and dictionary files, refer to the [PianoBART repository](https://github.com/RS2002/PianoBart). Pre-training data should be placed in `./Data/output_pretrain`.

## Usage

### Installation

```shell
git clone https://huggingface.co/RS2002/Adversarial-MidiBERT
```

Make sure that `model.py` and `Octuple.pkl` are located in the same folder.

### Example Code

```python
import torch
from model import Adversarial_MidiBERT

# Load the pre-trained model
model = Adversarial_MidiBERT.from_pretrained("RS2002/Adversarial-MidiBERT")

# Dummy input: a batch of 2 sequences, each with 1024 octuple tokens
input_ids = torch.randint(0, 10, (2, 1024, 8))
attention_mask = torch.zeros((2, 1024))

# Forward pass
y = model(input_ids, attention_mask)
print(y.last_hidden_state.shape)  # torch.Size([2, 1024, 768])
```
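
The mask in the dummy example above is just a placeholder; with real data you would typically build the attention mask from the true sequence lengths after padding. The helper below is a minimal sketch, not part of the repository, and it assumes the usual BERT-style convention (1 = valid position, 0 = padding); check the model's `forward` in `model.py` for the exact convention it expects.

```python
import torch

def build_attention_mask(lengths, max_len):
    """Mark valid positions with 1 and padding with 0 (assumed convention)."""
    positions = torch.arange(max_len).unsqueeze(0)              # [1, max_len]
    return (positions < torch.tensor(lengths).unsqueeze(1)).long()

mask = build_attention_mask([800, 1024], max_len=1024)          # shape [2, 1024]
```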
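### Fine-Tuning Sketch

Because the model returns token-level hidden states, a downstream task needs a pooling step and a task head on top. The sketch below is illustrative and is **not** the repository's fine-tuning code: it assumes the forward signature from the example above, assumes `attention_mask` uses 1 for valid positions, and mean-pools the hidden states before a linear classifier (here sized for four classes, e.g. the emotion classes of EMOPIA).

```python
import torch
import torch.nn as nn
from model import Adversarial_MidiBERT

class MidiClassifier(nn.Module):
    """Illustrative sequence-level classifier on top of Adversarial-MidiBERT.

    Assumptions (not from the official repo): the backbone's forward
    returns an object with `last_hidden_state`, and `attention_mask`
    uses 1 for valid positions and 0 for padding.
    """

    def __init__(self, num_classes: int = 4, hidden_size: int = 768):
        super().__init__()
        self.backbone = Adversarial_MidiBERT.from_pretrained(
            "RS2002/Adversarial-MidiBERT"
        )
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids, attention_mask).last_hidden_state
        # Mean-pool over valid positions only
        mask = attention_mask.unsqueeze(-1).float()               # [B, L, 1]
        pooled = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        return self.head(pooled)                                  # [B, num_classes]

# Example: classify a batch of dummy octuple sequences
clf = MidiClassifier(num_classes=4)
input_ids = torch.randint(0, 10, (2, 1024, 8))
attention_mask = torch.ones((2, 1024))
logits = clf(input_ids, attention_mask)
print(logits.shape)  # torch.Size([2, 4])
```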