musicbert

Model Description

MusicBERT large is a 24-layer BERT-style masked language model trained on REMI+BPE symbolic music sequences extracted from the GigaMIDI corpus. It is tailored for symbolic music understanding, fill-mask style infilling, and as a backbone for downstream generative tasks.

  • Checkpoint: 130000 steps
  • Hidden size: 768
  • Parameters: ~150M
  • Validation loss: 1.509

Training Configuration

  • Objective: Masked language modeling with span-aware masking
  • Dataset: GigaMIDI (REMI tokens → BPE, vocab size 50000)
  • Sequence length: 1024
  • Max events per MIDI: 2048
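Span-aware masking hides short contiguous runs of tokens rather than isolated positions. The sketch below is illustrative only: the `MASK_ID` value, 15% mask budget, and maximum span length of 3 are assumptions, not the exact training recipe.

```python
import random

MASK_ID = 3  # assumed id of the MASK_None token (illustrative)

def span_mask(ids, mask_ratio=0.15, max_span=3, mask_id=MASK_ID, rng=random):
    """Mask contiguous spans until ~mask_ratio of the tokens are hidden.

    Returns the corrupted sequence and the set of masked positions,
    which serve as the MLM prediction targets.
    """
    ids = list(ids)
    budget = max(1, int(len(ids) * mask_ratio))
    masked = set()
    while len(masked) < budget:
        start = rng.randrange(len(ids))          # random span start
        length = rng.randint(1, max_span)        # random span length
        for pos in range(start, min(start + length, len(ids))):
            if len(masked) >= budget:
                break
            ids[pos] = mask_id
            masked.add(pos)
    return ids, masked

corrupted, targets = span_mask(list(range(100)))
print(len(targets))  # 15 positions masked for a 100-token sequence
```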

Inference Example

Using with MIDI files

import random

import torch
from transformers import BertForMaskedLM
from miditok import MusicTokenizer

# Load model and tokenizer
model = BertForMaskedLM.from_pretrained("manoskary/musicbert")
model.eval()
tokenizer = MusicTokenizer.from_pretrained("manoskary/miditok-REMI")

# Convert MIDI to BPE tokens (MIDI → REMI → BPE pipeline)
midi_path = "path/to/your/file.mid"
tok_seq = tokenizer(midi_path)
if isinstance(tok_seq, list):  # some configs return one TokSequence per track
    tok_seq = tok_seq[0]
bpe_ids = tok_seq.ids[:1024]  # the model's context is 1024 tokens

# Mask a few random positions (avoid the first and last token)
mask_token_id = 3  # id of the MASK_None token
input_ids = list(bpe_ids)
mask_positions = random.sample(range(1, len(input_ids) - 1), k=5)
for pos in mask_positions:
    input_ids[pos] = mask_token_id

# Run inference and take the most likely token at each masked position
input_tensor = torch.tensor([input_ids])
with torch.no_grad():
    outputs = model(input_tensor)
predictions = outputs.logits[0, mask_positions, :].argmax(dim=-1)

print("Predicted token IDs:", predictions.tolist())
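Instead of taking only the argmax, you may want to inspect the top few candidates at each masked position (in practice you would call `torch.topk` on `outputs.logits`). A minimal pure-Python sketch of top-k selection over one row of logits:

```python
def top_k(logits, k=5):
    """Return the k (token_id, score) pairs with the highest logits."""
    ranked = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

row = [0.1, 2.5, -1.0, 3.7, 0.0]  # toy logits over a 5-token vocabulary
print(top_k(row, k=2))  # → [(3, 3.7), (1, 2.5)]
```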

Limitations and Risks

  • Model is trained purely on symbolic data; it does not produce audio directly.
  • The GigaMIDI dataset is biased towards Western tonal music.
  • Long-form structure beyond 1024 tokens requires chunking or iterative decoding.
  • Generated continuations may need post-processing to ensure musical coherence.
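For inputs longer than the 1024-token context, one simple strategy is overlapping windows. The sketch below splits a token list into chunks that can be fed to the model one at a time; the 256-token overlap is an arbitrary choice, not a recommendation from the training setup.

```python
def chunk_ids(ids, window=1024, overlap=256):
    """Split a token sequence into overlapping windows of at most `window` tokens.

    Consecutive windows share `overlap` tokens so that context carries over
    across chunk boundaries.
    """
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap
    return [ids[i:i + window] for i in range(0, max(1, len(ids) - overlap), step)]

chunks = chunk_ids(list(range(2000)))
print([len(c) for c in chunks])  # → [1024, 1024, 464]
```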

Citation

If you use this checkpoint, please cite the original MusicBERT introduction and the GigaMIDI dataset.
