|
# Adversarial-MidiBERT |
|
|
|
This model card description was generated by Grok 3.
|
|
|
|
|
|
|
## Model Details |
|
|
|
- **Model Name**: Adversarial-MidiBERT |
|
|
|
- **Model Type**: Transformer-based model for symbolic music understanding |
|
|
|
- **Version**: 1.0 |
|
|
|
- **Release Date**: August 2025 |
|
|
|
- **Developers**: Zijian Zhao |
|
|
|
- **Organization**: Sun Yat-sen University (SYSU)
|
|
|
- **License**: Apache License 2.0 |
|
|
|
- **Paper**: [Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training](https://dl.acm.org/doi/abs/10.1145/3731715.3733483), ACM ICMR 2025 |
|
|
|
- **Arxiv**: https://arxiv.org/abs/2407.08306 |
|
|
|
- **Citation**:
|
|
|
```bibtex
@inproceedings{zhao2025let,
  title={Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training},
  author={Zhao, Zijian},
  booktitle={Proceedings of the 2025 International Conference on Multimedia Retrieval},
  pages={2128--2132},
  year={2025}
}
```
|
|
|
- **Contact**: [email protected] |
|
|
|
- **Repository**: https://github.com/RS2002/Adversarial-MidiBERT |
|
|
|
|
|
|
|
## Model Description |
|
|
|
Adversarial-MidiBERT is a transformer-based model designed for symbolic music understanding, leveraging large-scale adversarial pre-training. It builds upon the [MidiBERT-Piano](https://github.com/wazenmai/MIDI-BERT) framework and extends it with adversarial pre-training techniques to enhance performance on music-related tasks. The model processes symbolic music data in an octuple format and can be fine-tuned for various downstream symbolic music understanding tasks, such as classification and analysis.
|
|
|
- **Architecture**: Transformer-based (based on MidiBERT) |
|
- **Input Format**: Octuple representation of symbolic music (batch_size, sequence_length, 8) |
|
- **Output Format**: Hidden states of dimension [batch_size, sequence_length, 768] |
|
- **Hidden Size**: 768 |
|
- **Training Objective**: Adversarial pre-training followed by task-specific fine-tuning |
|
- **Tasks Supported**: Symbolic music understanding tasks |
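
The input/output shapes above can be sketched with plain tensors, without downloading the model (the `hidden` tensor below is a stand-in for the encoder output, and the octuple field semantics are defined by the repository's `Octuple.pkl` dictionary, not by this sketch):

```python
import torch

batch_size, seq_len, hidden_size = 2, 1024, 768

# Input: each token is an 8-tuple of integer indices (octuple encoding);
# the exact field order and vocabulary come from the repo's Octuple.pkl.
tokens = torch.randint(0, 10, (batch_size, seq_len, 8))

# Output: the encoder maps every token to a 768-dimensional hidden state
# (stand-in tensor here; the real values come from the model's forward pass).
hidden = torch.zeros(batch_size, seq_len, hidden_size)

print(tokens.shape)  # torch.Size([2, 1024, 8])
print(hidden.shape)  # torch.Size([2, 1024, 768])
```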
|
|
|
## Training Data |
|
|
|
The model was pre-trained and fine-tuned on the following datasets: |
|
|
|
- **POP1K7**: A dataset of popular music MIDI files. |
|
- **POP909**: A dataset of 909 pop songs in MIDI format. |
|
- **Pianist8**: A dataset of piano performances.
|
- **EMOPIA**: A dataset for emotion-based music analysis. |
|
- **GiantMIDI**: A large-scale MIDI dataset. |
|
|
|
For details on dataset preprocessing and dictionary files, refer to the [PianoBART repository](https://github.com/RS2002/PianoBart). Pre-training data should be placed in `./Data/output_pretrain`. |
|
|
|
|
|
|
|
## Usage |
|
|
|
### Installation |
|
|
|
```shell |
|
git clone https://huggingface.co/RS2002/Adversarial-MidiBERT |
|
``` |
|
|
|
Please ensure that the `model.py` and `Octuple.pkl` files are located in the same folder. |
|
|
|
### Example Code |
|
|
|
```python
import torch
from model import Adversarial_MidiBERT

# Load the pre-trained model
model = Adversarial_MidiBERT.from_pretrained("RS2002/Adversarial-MidiBERT")

# Example input: a batch of 2 sequences of 1024 octuple tokens
input_ids = torch.randint(0, 10, (2, 1024, 8))
attention_mask = torch.zeros((2, 1024))  # see model.py for the masking convention

# Forward pass
y = model(input_ids, attention_mask)
print(y.last_hidden_state.shape)  # torch.Size([2, 1024, 768])
```
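
For downstream classification, one common pattern (a sketch under our own assumptions, not the repository's fine-tuning code) is to pool the encoder's hidden states and attach a linear head; the `ClassifierHead` class and `num_classes=4` below are hypothetical:

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Mean-pool hidden states over the sequence and project to class logits."""
    def __init__(self, hidden_size=768, num_classes=4):
        super().__init__()
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states):
        # hidden_states: [batch, seq_len, hidden_size]
        pooled = hidden_states.mean(dim=1)  # [batch, hidden_size]
        return self.fc(pooled)              # [batch, num_classes]

# Usage with a dummy tensor standing in for model(...).last_hidden_state
head = ClassifierHead()
logits = head(torch.randn(2, 1024, 768))
print(logits.shape)  # torch.Size([2, 4])
```

In practice the head would be trained jointly with (or on top of frozen) encoder outputs on the task's labels.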