---
license: apache-2.0
pipeline_tag: audio-text-to-text
language:
- en
- zh
base_model:
- Yi3852/MuFun-Base
datasets:
- Yi3852/ACEStep-Songs
---
A prompt generator for the [ACE-Step](https://huggingface.co/ACE-Step/ACE-Step-v1-3.5B) music generation model, fine-tuned from the MuFun model proposed in [Advancing the Foundation Model for Music Understanding](https://arxiv.org/abs/2508.01178). Given a song, it outputs the song's tags and lyrics in the prompt/lyrics format shown under Usage.

For more information, see https://github.com/ace-step/ACE-Step/issues/313

Gradio demo: http://47.121.209.64/mufun_demo_acestep

Demo code: https://github.com/laitselec/MuFun/blob/main/demo/mufun_acestep/gr_app.py

Training code: https://github.com/laitselec/MuFun

## Usage
Some audio processing packages, such as mutagen and torchaudio, must be installed before running the example below.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

hf_path = 'Yi3852/MuFun-ACEStep'
device = 'cuda'

# Load the tokenizer and the model; the model uses custom code from the repo,
# so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained(hf_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True, torch_dtype="bfloat16")
model.to(device)

# Path to the song to analyze; the <audio> placeholder in the prompt marks where the audio goes.
aud = "/path/to/your/song.wav"
inp = '<audio>\nDeconstruct this song, listing its tags and lyrics. Directly output a JSON object with prompt and lyrics fields, without any additional explanations or text.'
res = model.chat(prompt=inp, audio_files=aud, segs=None, tokenizer=tokenizer)
print(res)
# Example output:
# {     "prompt": "110 bpm, soulful, electric, synthesizer, catchy, keyboard, guitar",
#     "lyrics": "[verse]  \nNeon lights, they flicker bright,  \nCity hums in dead of night.  \nRhythms pulse through concrete veins,  \nLost in echoes of refrains.  \n\nBassline grooves in my chest,  \nHeartbeats match the city's vest.  \nElectric whispers fill the air,  \nSynthesized dreams everywhere.  \n\n[chorus]  \nTurn it up and let it flow,  \nFeel the fire, let it grow.  \nIn this rhythm, we belong,  \nHere tonight, sing our song.  \n\n[verse]  \nGuitar strings, they start to weep,  \nWake the soul from silent sleep.  \nEvery note a story told,  \nIn this night, we're bold and gold.  \n\nVoices blend in harmony,  \nLost in pure cacophony.  \nTimeless echoes, timeless cries,  \nSoulful shouts beneath the skies.  \n\n[bridge]  \nKeyboard dances on the keys,  \nMelodies on evening breeze.  \nCatch the tune and hold it tight,  \nIn this moment, we take flight.  \n\n[chorus]  \nTurn it up and let it flow,  \nFeel the fire, let it grow.  \nIn this rhythm, we belong,  \nHere tonight, sing our song.  "
# }
```
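
The reply is requested as a JSON object with `prompt` and `lyrics` fields, so it can be parsed before the two values are passed on to ACE-Step. The following is a minimal sketch, not part of the MuFun code: `parse_acestep_output` is a hypothetical helper. It assumes that `res` is a plain string, that the reply may occasionally carry stray text around the JSON object, and that lyrics may contain raw newlines (hence `strict=False`).

```python
import json


def parse_acestep_output(res: str) -> dict:
    """Hypothetical helper: extract `prompt` and `lyrics` from the model reply."""
    # Keep only the outermost {...} span in case extra text surrounds the JSON.
    start, end = res.find("{"), res.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    # strict=False tolerates raw newlines inside JSON string values.
    data = json.loads(res[start:end + 1], strict=False)
    missing = {"prompt", "lyrics"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return {"prompt": data["prompt"], "lyrics": data["lyrics"]}


# Shortened stand-in for a real reply, used only for illustration.
sample = '{ "prompt": "110 bpm, soulful", "lyrics": "[verse]\\nNeon lights" }'
parsed = parse_acestep_output(sample)
print(parsed["prompt"])
print(parsed["lyrics"])
```

If the reply is not valid JSON, `json.loads` raises `json.JSONDecodeError` (a subclass of `ValueError`), so callers can catch `ValueError` and retry the generation.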

## Citation

```bibtex
@misc{jiang2025advancingfoundationmodelmusic,
      title={Advancing the Foundation Model for Music Understanding}, 
      author={Yi Jiang and Wei Wang and Xianwen Guo and Huiyun Liu and Hanrui Wang and Youri Xu and Haoqi Gu and Zhongqian Xie and Chuanjiang Luo},
      year={2025},
      eprint={2508.01178},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2508.01178}, 
}
```