A prompt generator for the ACE-Step music generation model, fine-tuned from the MuFun model proposed in "Advancing the Foundation Model for Music Understanding".

Gradio demo: http://47.121.209.64/mufun_demo_acestep

Demo code: https://github.com/laitselec/MuFun/blob/main/demo/mufun_acestep/gr_app.py

Training code: https://github.com/laitselec/MuFun

Usage

Some audio processing packages, such as mutagen and torchaudio, must be installed.
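
Because loading the model takes a while, it can help to confirm the audio packages are importable first. The sketch below only checks the two packages named above; the list `REQUIRED_AUDIO_PACKAGES` and the helper `missing_packages` are hypothetical and not part of the MuFun code, and other dependencies may also be needed.

```python
import importlib.util

# Hypothetical list: only the packages the model card names explicitly.
# The model's remote code may need more than these.
REQUIRED_AUDIO_PACKAGES = ["mutagen", "torchaudio"]


def missing_packages(names):
    """Return the names in `names` that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_AUDIO_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All listed audio packages are installed.")
```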

from transformers import AutoTokenizer, AutoModelForCausalLM

hf_path = 'Yi3852/MuFun-ACEStep'
device = 'cuda'

# load the slow (non-fast) tokenizer
tokenizer = AutoTokenizer.from_pretrained(hf_path, use_fast=False)
# trust_remote_code=True is required: the model class and its chat() method are defined in the repo's custom code
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True, torch_dtype="bfloat16")
model.to(device)

# path to the song to analyze
aud = "/path/to/your/song.wav"
# <audio> is the placeholder for the audio input in the text prompt
inp = '<audio>\nDeconstruct this song, listing its tags and lyrics. Directly output a JSON object with prompt and lyrics fields, without any additional explanations or text.'
res = model.chat(prompt=inp, audio_files=aud, segs=None, tokenizer=tokenizer)
print(res)
# {     "prompt": "110 bpm, soulful, electric, synthesizer, catchy, keyboard, guitar",
#     "lyrics": "[verse]  \nNeon lights, they flicker bright,  \nCity hums in dead of night.  \nRhythms pulse through concrete veins,  \nLost in echoes of refrains.  \n\nBassline grooves in my chest,  \nHeartbeats match the city's vest.  \nElectric whispers fill the air,  \nSynthesized dreams everywhere.  \n\n[chorus]  \nTurn it up and let it flow,  \nFeel the fire, let it grow.  \nIn this rhythm, we belong,  \nHere tonight, sing our song.  \n\n[verse]  \nGuitar strings, they start to weep,  \nWake the soul from silent sleep.  \nEvery note a story told,  \nIn this night, we're bold and gold.  \n\nVoices blend in harmony,  \nLost in pure cacophony.  \nTimeless echoes, timeless cries,  \nSoulful shouts beneath the skies.  \n\n[bridge]  \nKeyboard dances on the keys,  \nMelodies on evening breeze.  \nCatch the tune and hold it tight,  \nIn this moment, we take flight.  \n\n[chorus]  \nTurn it up and let it flow,  \nFeel the fire, let it grow.  \nIn this rhythm, we belong,  \nHere tonight, sing our song.  "
# }
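
The reply is a JSON object with prompt and lyrics fields, so it can be parsed before being passed on to ACE-Step. The sketch below assumes `model.chat` returns the reply as a string, as the printed output suggests; `parse_mufun_output` and `split_sections` are hypothetical helpers, not part of the MuFun code. Any text around the outermost braces is ignored, and lyrics are split on structure tags such as [verse] and [chorus].

```python
import json
import re


def parse_mufun_output(res):
    """Parse the model's reply into (prompt, lyrics).

    Assumes the reply contains one JSON object with "prompt" and "lyrics"
    keys; any text before the first "{" or after the last "}" is ignored.
    """
    start = res.find("{")
    end = res.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(res[start:end + 1])
    return data["prompt"], data["lyrics"]


def split_sections(lyrics):
    """Split lyrics on tags like [verse] or [chorus] into (tag, text) pairs.

    Text before the first tag is dropped.
    """
    # A capturing group makes re.split keep the tag names:
    # [before_first_tag, tag1, body1, tag2, body2, ...]
    parts = re.split(r"\[([^\]]+)\]", lyrics)
    return [(tag.strip(), body.strip()) for tag, body in zip(parts[1::2], parts[2::2])]
```

With `res` from the snippet above: `prompt, lyrics = parse_mufun_output(res)`, then `split_sections(lyrics)` to inspect the verse/chorus layout.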

Citation

@misc{jiang2025advancingfoundationmodelmusic,
      title={Advancing the Foundation Model for Music Understanding}, 
      author={Yi Jiang and Wei Wang and Xianwen Guo and Huiyun Liu and Hanrui Wang and Youri Xu and Haoqi Gu and Zhongqian Xie and Chuanjiang Luo},
      year={2025},
      eprint={2508.01178},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2508.01178}, 
}
Model size: 8.92B params (Safetensors)
Tensor type: BF16
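
As a rough sizing check for the `device='cuda'` setup, the weights alone of the 8.92B-parameter BF16 checkpoint take about 2 bytes per parameter. This is a lower bound under that assumption only: it ignores activations, the KV cache, and framework overhead, and `weight_memory_bytes` is a hypothetical helper.

```python
# Lower bound on memory needed just to hold the weights.
# Assumes 2 bytes per parameter (BF16); activations, KV cache and
# framework overhead are not counted.
NUM_PARAMS = 8.92e9        # "8.92B params" from the model card
BYTES_PER_PARAM_BF16 = 2   # bfloat16 is a 16-bit format


def weight_memory_bytes(num_params, bytes_per_param):
    """Bytes needed to store `num_params` weights of `bytes_per_param` each."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    total = weight_memory_bytes(NUM_PARAMS, BYTES_PER_PARAM_BF16)
    print(f"{total / 1e9:.2f} GB")     # decimal gigabytes: 17.84
    print(f"{total / 2**30:.2f} GiB")  # binary gibibytes: 16.61
```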

Model tree for Yi3852/MuFun-ACEStep

Base model: Qwen/Qwen3-8B-Base
Fine-tuned: Yi3852/MuFun-Base
Fine-tuned: Yi3852/MuFun-ACEStep (this model)
