# Adversarial-MidiBERT
This model card description was generated by Grok 3.
## Model Details
- **Model Name**: Adversarial-MidiBERT
- **Model Type**: Transformer-based model for symbolic music understanding
- **Version**: 1.0
- **Release Date**: August 2025
- **Developers**: Zijian Zhao
- **Organization**: SYSU
- **License**: Apache License 2.0
- **Paper**: [Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training](https://dl.acm.org/doi/abs/10.1145/3731715.3733483), ACM ICMR 2025
- **Arxiv**: https://arxiv.org/abs/2407.08306
- **Citation**:
```
@inproceedings{zhao2025let,
  title={Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training},
  author={Zhao, Zijian},
  booktitle={Proceedings of the 2025 International Conference on Multimedia Retrieval},
  pages={2128--2132},
  year={2025}
}
```
- **Contact**: [email protected]
- **Repository**: https://github.com/RS2002/Adversarial-MidiBERT
## Model Description
Adversarial-MidiBERT is a transformer-based model designed for symbolic music understanding, leveraging large-scale adversarial pre-training. It builds upon the [MidiBERT-Piano](https://github.com/wazenmai/MIDI-BERT) framework and extends it with adversarial pre-training techniques to enhance performance on music-related tasks. The model processes symbolic music data in an octuple format and can be fine-tuned for various downstream tasks such as music generation, classification, and analysis.
- **Architecture**: Transformer-based (based on MidiBERT)
- **Input Format**: Octuple representation of symbolic music, shape `[batch_size, sequence_length, 8]`
- **Output Format**: Hidden states of shape `[batch_size, sequence_length, 768]`
- **Hidden Size**: 768
- **Training Objective**: Adversarial pre-training followed by task-specific fine-tuning
- **Tasks Supported**: Symbolic music understanding tasks (a minimal fine-tuning sketch follows this list)
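
Because the backbone returns only token-level hidden states, fine-tuning for a sequence-level task typically means attaching a small head on top. The sketch below shows one minimal way to do this, assuming the forward interface used in the usage example further down; the `MidiClassifier` name, the mean-pooling step, and `num_classes` are illustrative choices and not part of the released checkpoint.

```python
import torch.nn as nn

class MidiClassifier(nn.Module):
    """Minimal fine-tuning sketch (illustrative, not part of the released code):
    a pooled classification head on top of the 768-dimensional hidden states."""

    def __init__(self, backbone, num_classes, hidden_size=768):
        super().__init__()
        self.backbone = backbone            # an Adversarial_MidiBERT instance
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        # Backbone returns token-level features of shape [batch_size, seq_len, 768]
        hidden = self.backbone(input_ids, attention_mask).last_hidden_state
        pooled = hidden.mean(dim=1)         # average over the sequence axis
        return self.head(pooled)            # logits of shape [batch_size, num_classes]
```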
## Training Data
The model was pre-trained and fine-tuned on the following datasets:
- **POP1K7**: A dataset of popular music MIDI files.
- **POP909**: A dataset of 909 pop songs in MIDI format.
- **Pianist8**: A dataset of piano performances by eight artists.
- **EMOPIA**: A dataset for emotion-based music analysis.
- **GiantMIDI**: A large-scale MIDI dataset.
For details on dataset preprocessing and dictionary files, refer to the [PianoBART repository](https://github.com/RS2002/PianoBart). Pre-training data should be placed in `./Data/output_pretrain`.
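
If pre-training from scratch, a quick sanity check along the lines below can confirm the data sits where the training scripts expect it; only the path comes from this README, the rest of the snippet is illustrative.

```python
import os

# Path expected by the pre-training scripts (taken from this README); the
# contents should follow the PianoBART preprocessing pipeline linked above.
PRETRAIN_DIR = "./Data/output_pretrain"

if not os.path.isdir(PRETRAIN_DIR):
    raise FileNotFoundError(
        f"Expected pre-processed pre-training data under {PRETRAIN_DIR}; "
        "see the PianoBART repository for the preprocessing steps."
    )
print(f"Found {len(os.listdir(PRETRAIN_DIR))} files in {PRETRAIN_DIR}")
```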
## Usage
### Installation
```shell
git clone https://huggingface.co/RS2002/Adversarial-MidiBERT
```
Please ensure that `model.py` and `Octuple.pkl` from the cloned repository are located in the same folder as the script that loads the model.
### Example Code
```python
import torch
from model import Adversarial_MidiBERT
# Load the model
model = Adversarial_MidiBERT.from_pretrained("RS2002/Adversarial-MidiBERT")
# Example input: dummy octuple tokens of shape [batch_size=2, sequence_length=1024, 8 attributes per token]
input_ids = torch.randint(0, 10, (2, 1024, 8))
# Attention mask of shape [batch_size, sequence_length]
attention_mask = torch.zeros((2, 1024))
# Forward pass
y = model(input_ids, attention_mask)
print(y.last_hidden_state.shape) # Output: [2, 1024, 768]
```
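
When the pre-trained model is used purely as a feature extractor rather than fine-tuned, it is common to run it in evaluation mode with gradients disabled. The snippet below continues the example above and only illustrates this standard PyTorch pattern; nothing about it is specific to the released checkpoint.

```python
# Feature extraction without fine-tuning: disable dropout and gradient tracking.
model.eval()
with torch.no_grad():
    features = model(input_ids, attention_mask).last_hidden_state  # [2, 1024, 768]
```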