Yi3852/MuFun-ACEStep

	---
	license: apache-2.0
	pipeline_tag: audio-text-to-text
	language:
	- en
	- zh
	base_model:
	- Yi3852/MuFun-Base
	datasets:
	- Yi3852/ACEStep-Songs
	---
	A prompt generator for the [ACE-Step](https://huggingface.co/ACE-Step/ACE-Step-v1-3.5B) music generation model, fine-tuned from the MuFun model proposed in [Advancing the Foundation Model for Music Understanding](https://arxiv.org/abs/2508.01178). Given a song, it outputs the tags and lyrics in the prompt format that ACE-Step takes as input.

	For more information, see https://github.com/ace-step/ACE-Step/issues/313

	Gradio demo: http://47.121.209.64/mufun_demo_acestep

	Demo code: https://github.com/laitselec/MuFun/blob/main/demo/mufun_acestep/gr_app.py

	Training code: https://github.com/laitselec/MuFun

	## Usage
	Some audio processing packages, such as mutagen and torchaudio, must be installed first (for example, with `pip install mutagen torchaudio`).
	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	hf_path = 'Yi3852/MuFun-ACEStep'
	device = 'cuda'

	# model.chat() comes from the repository's custom code, hence trust_remote_code=True.
	tokenizer = AutoTokenizer.from_pretrained(hf_path, use_fast=False)
	model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True, torch_dtype="bfloat16")
	model.to(device)

	# Path to the song to analyze.
	aud = "/path/to/your/song.wav"
	# Instruction asking for the song's tags and lyrics as a JSON object.
	inp = '<audio>\nDeconstruct this song, listing its tags and lyrics. Directly output a JSON object with prompt and lyrics fields, without any additional explanations or text.'
	res = model.chat(prompt=inp, audio_files=aud, segs=None, tokenizer=tokenizer)
	print(res)
	# Example output:
	# { "prompt": "110 bpm, soulful, electric, synthesizer, catchy, keyboard, guitar",
	# "lyrics": "[verse] \nNeon lights, they flicker bright, \nCity hums in dead of night. \nRhythms pulse through concrete veins, \nLost in echoes of refrains. \n\nBassline grooves in my chest, \nHeartbeats match the city's vest. \nElectric whispers fill the air, \nSynthesized dreams everywhere. \n\n[chorus] \nTurn it up and let it flow, \nFeel the fire, let it grow. \nIn this rhythm, we belong, \nHere tonight, sing our song. \n\n[verse] \nGuitar strings, they start to weep, \nWake the soul from silent sleep. \nEvery note a story told, \nIn this night, we're bold and gold. \n\nVoices blend in harmony, \nLost in pure cacophony. \nTimeless echoes, timeless cries, \nSoulful shouts beneath the skies. \n\n[bridge] \nKeyboard dances on the keys, \nMelodies on evening breeze. \nCatch the tune and hold it tight, \nIn this moment, we take flight. \n\n[chorus] \nTurn it up and let it flow, \nFeel the fire, let it grow. \nIn this rhythm, we belong, \nHere tonight, sing our song. "
	# }
	```
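
Because the model is asked to return a JSON object with `prompt` and `lyrics` fields, the output can be split into the two ACE-Step inputs with the standard `json` module. The following is a minimal sketch, not part of the original demo code: the helper name `parse_mufun_output` is hypothetical, and it assumes `res` is a plain string that contains one JSON object, possibly with stray text around it.

```python
import json


def parse_mufun_output(res: str) -> dict:
    """Extract the 'prompt' and 'lyrics' fields from the model's text output.

    Hypothetical helper: assumes `res` is a string containing a single JSON object.
    """
    # Keep only the outermost {...} span in case extra text surrounds the JSON.
    start, end = res.find("{"), res.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    # strict=False tolerates raw newlines inside string values.
    data = json.loads(res[start:end + 1], strict=False)
    return {"prompt": data.get("prompt", ""), "lyrics": data.get("lyrics", "")}


example = '{ "prompt": "110 bpm, soulful", "lyrics": "[verse] \\nNeon lights" }'
fields = parse_mufun_output(example)
print(fields["prompt"])
print(fields["lyrics"])
```

If the model returns something that is not valid JSON, `json.loads` raises `json.JSONDecodeError`, so callers may want to catch that and retry the generation.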

	## Citation

	```bibtex
	@misc{jiang2025advancingfoundationmodelmusic,
	      title={Advancing the Foundation Model for Music Understanding},
	      author={Yi Jiang and Wei Wang and Xianwen Guo and Huiyun Liu and Hanrui Wang and Youri Xu and Haoqi Gu and Zhongqian Xie and Chuanjiang Luo},
	      year={2025},
	      eprint={2508.01178},
	      archivePrefix={arXiv},
	      primaryClass={cs.SD},
	      url={https://arxiv.org/abs/2508.01178},
	}
	```