# Adversarial-MidiBERT

This description was generated by Grok 3.



## Model Details

- **Model Name**: Adversarial-MidiBERT

- **Model Type**: Transformer-based model for symbolic music understanding

- **Version**: 1.0

- **Release Date**: August 2025

- **Developers**: Zijian Zhao

- **Organization**: SYSU

- **License**: Apache License 2.0

- **Paper**: [Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training](https://dl.acm.org/doi/abs/10.1145/3731715.3733483), ACM ICMR 2025

- **Arxiv**: https://arxiv.org/abs/2407.08306

- **Citation**:

  ```bibtex
  @inproceedings{zhao2025let,
    title={Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training},
    author={Zhao, Zijian},
    booktitle={Proceedings of the 2025 International Conference on Multimedia Retrieval},
    pages={2128--2132},
    year={2025}
  }
  ```

- **Contact**: [email protected]

- **Repository**: https://github.com/RS2002/Adversarial-MidiBERT



## Model Description

Adversarial-MidiBERT is a transformer-based model for symbolic music understanding that leverages large-scale adversarial pre-training. It builds upon the [MidiBERT-Piano](https://github.com/wazenmai/MIDI-BERT) framework and extends it with adversarial pre-training to improve performance on music understanding tasks. The model consumes symbolic music in an octuple token format (a sketch of the format follows the list below) and can be fine-tuned for downstream tasks such as classification and analysis.

- **Architecture**: Transformer encoder (built on MidiBERT)
- **Input Format**: Octuple representation of symbolic music, a tensor of shape [batch_size, sequence_length, 8]
- **Output Format**: Hidden states of shape [batch_size, sequence_length, 768]
- **Hidden Size**: 768
- **Training Objective**: Adversarial pre-training followed by task-specific fine-tuning
- **Tasks Supported**: Symbolic music understanding tasks
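
For illustration, here is a minimal sketch of how an octuple-format batch is laid out. The field names below are an assumption based on the common OctupleMIDI convention, not taken from this repository; the authoritative token dictionary ships as `Octuple.pkl`.

```python
import torch

# Field names follow the common OctupleMIDI convention; this ordering is an
# assumption for illustration only. The authoritative token dictionary is
# defined by Octuple.pkl in the repository.
FIELDS = ["bar", "position", "instrument", "pitch",
          "duration", "velocity", "time_signature", "tempo"]

batch_size, seq_len = 2, 1024
tokens = torch.randint(0, 10, (batch_size, seq_len, 8))  # one index per field

# Inspect the eight sub-tokens of the first step of the first sequence
for name, index in zip(FIELDS, tokens[0, 0].tolist()):
    print(f"{name}: {index}")
```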

## Training Data

The model was pre-trained and fine-tuned on the following datasets:

- **POP1K7**: A dataset of pop piano performances in MIDI format.
- **POP909**: A dataset of 909 pop songs in MIDI format.
- **Pianist8**: A dataset of piano performances from eight artists.
- **EMOPIA**: A dataset of piano music annotated with emotion labels.
- **GiantMIDI-Piano**: A large-scale dataset of classical piano MIDI files.

For details on dataset preprocessing and dictionary files, refer to the [PianoBART repository](https://github.com/RS2002/PianoBart). Pre-training data should be placed in `./Data/output_pretrain`.
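
For reference, this implies a layout along the following lines (a sketch only; the exact file names depend on the PianoBART preprocessing scripts and are not shown here):

```shell
Data/
└── output_pretrain/    # place the preprocessed pre-training files here
```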



## Usage

### Installation

```shell
git clone https://huggingface.co/RS2002/Adversarial-MidiBERT
```

Please ensure that the `model.py` and `Octuple.pkl` files are located in the same folder.

### Example Code

```python
import torch
from model import Adversarial_MidiBERT

# Load the pre-trained model
model = Adversarial_MidiBERT.from_pretrained("RS2002/Adversarial-MidiBERT")

# Example input: a batch of 2 sequences, 1,024 steps each, with 8 octuple
# sub-tokens per step (random token indices, for demonstration only)
input_ids = torch.randint(0, 10, (2, 1024, 8))
# Attention mask of shape [batch_size, sequence_length] (all zeros here, as in
# this minimal example)
attention_mask = torch.zeros((2, 1024))

# Forward pass: returns the encoder hidden states
y = model(input_ids, attention_mask)
print(y.last_hidden_state.shape)  # torch.Size([2, 1024, 768])
```
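
To adapt the encoder for a downstream task, one option is to attach a small task head on top of the hidden states. The sketch below is hypothetical: the `ClassificationHead` module, the mean pooling, and the four-class output are illustrative choices, not the fine-tuning setup from the paper.

```python
import torch.nn as nn

# Hypothetical downstream head (illustrative only, not the repository's
# fine-tuning code): mean-pool the hidden states over time, then project
# to class logits.
class ClassificationHead(nn.Module):
    def __init__(self, hidden_size: int = 768, num_classes: int = 4):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states):
        pooled = hidden_states.mean(dim=1)  # [batch_size, 768]
        return self.classifier(pooled)      # [batch_size, num_classes]

# Continuing from the example above:
head = ClassificationHead()
logits = head(y.last_hidden_state)
print(logits.shape)  # torch.Size([2, 4])
```

For token-level tasks, one would skip the pooling and apply the projection at every time step instead.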