MusicBERT Large
Model Description
MusicBERT Large is a 24-layer, BERT-style masked language model trained on REMI+BPE symbolic music sequences extracted from the GigaMIDI corpus. It is tailored for symbolic music understanding, fill-mask-style infilling, and use as a backbone for downstream generative tasks (a minimal feature-extraction sketch follows the specs below).
- Checkpoint: 130,000 training steps
- Hidden size: 768
- Parameters: ~150M
- Validation loss: 1.5093
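Because the checkpoint is a standard BERT architecture, the encoder can also be loaded without the MLM head and used as a frozen feature extractor for downstream tasks. A minimal sketch, assuming bpe_ids is a token-id list produced as in the inference example below; the mean-pooling step is an illustrative choice, not part of the released recipe:

import torch
from transformers import BertModel

# Load the encoder only; the unused MLM head weights are simply dropped.
encoder = BertModel.from_pretrained("manoskary/musicbert")
encoder.eval()

# bpe_ids: a list of BPE token ids (see the inference example below),
# truncated to the model's 1024-token context.
input_tensor = torch.tensor([bpe_ids[:1024]])
with torch.no_grad():
    hidden = encoder(input_tensor).last_hidden_state  # (1, seq_len, 768)

# Mean-pool over time for a fixed-size piece-level embedding.
piece_embedding = hidden.mean(dim=1)  # (1, 768)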
Training Configuration
- Objective: Masked language modeling with span-aware masking (illustrated in the sketch after this list)
- Dataset: GigaMIDI (REMI tokens → BPE, vocab size 50000)
- Sequence length: 1024
- Max events per MIDI: 2048
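"Span-aware masking" means that contiguous runs of tokens are masked together rather than isolated positions, so the model cannot trivially copy adjacent events. The exact recipe is not reproduced in this card; the sketch below is a hypothetical illustration in which the mask ratio and span lengths are assumptions:

import random

def mask_spans(ids, mask_token_id=3, mask_ratio=0.15, max_span=5):
    """Mask contiguous spans until roughly mask_ratio of the sequence
    is covered. Illustrative only: the ratio and span lengths are
    assumptions, not the released training configuration."""
    ids = list(ids)
    n_to_mask = int(len(ids) * mask_ratio)
    masked = 0
    while masked < n_to_mask:
        span = random.randint(1, max_span)
        start = random.randrange(1, max(2, len(ids) - span))
        for i in range(start, min(start + span, len(ids) - 1)):
            if ids[i] != mask_token_id:
                ids[i] = mask_token_id
                masked += 1
    return ids

# Example: mask a 1024-token sequence the way the objective describes.
# masked_ids = mask_spans(bpe_ids[:1024])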
Inference Example
Using with MIDI files
import random

import torch
from transformers import BertForMaskedLM
from miditok import MusicTokenizer

# Load model and tokenizer
model = BertForMaskedLM.from_pretrained("manoskary/musicbert")
tokenizer = MusicTokenizer.from_pretrained("manoskary/miditok-REMI")
model.eval()

# Convert MIDI to BPE tokens (MIDI → REMI → BPE pipeline)
midi_path = "path/to/your/file.mid"
tok_seq = tokenizer(midi_path)
if isinstance(tok_seq, list):  # multi-track files may yield one TokSequence per track
    tok_seq = tok_seq[0]
bpe_ids = tok_seq.ids

# Mask some tokens for prediction, truncating to the 1024-token context
mask_token_id = 3  # MASK_None token
input_ids = bpe_ids[:1024]
mask_positions = random.sample(range(1, len(input_ids) - 1), k=5)
for pos in mask_positions:
    input_ids[pos] = mask_token_id

# Run inference
input_tensor = torch.tensor([input_ids])
with torch.no_grad():
    outputs = model(input_tensor)
predictions = outputs.logits[0, mask_positions, :].argmax(dim=-1)
print("Predicted token IDs:", predictions.tolist())
Limitations and Risks
- Model is trained purely on symbolic data; it does not produce audio directly.
- The GigaMIDI dataset is biased towards Western tonal music.
- Long-form structure beyond 1024 tokens requires chunking or iterative decoding (see the chunking sketch after this list).
- Generated continuations may need post-processing to ensure musical coherence.
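For the 1024-token limit above, one simple option is an overlapping-window scheme so that every token is processed with some left context. A minimal sketch; the window and stride values are illustrative, not a recommendation from the authors:

def chunk_ids(ids, window=1024, stride=768):
    """Split a long id sequence into overlapping windows.
    Window/stride values are illustrative assumptions."""
    chunks = [ids[start:start + window]
              for start in range(0, max(1, len(ids) - window + 1), stride)]
    # Keep the tail if the last stride did not reach the end of the piece.
    if len(ids) > window and (len(ids) - window) % stride != 0:
        chunks.append(ids[-window:])
    return chunks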
Citation
If you use this checkpoint, please cite the original MusicBERT paper and the GigaMIDI dataset.