|
---
license: apache-2.0
pipeline_tag: audio-text-to-text
language:
- en
- zh
base_model:
- Qwen/Qwen3-8B-Base
- openai/whisper-large-v3
---
|
MuFun is a foundation model for music understanding, proposed in [Advancing the Foundation Model for Music Understanding](https://arxiv.org/abs/2508.01178).
|
|
|
Training code: https://github.com/laitselec/MuFun
|
|
|
## Usage |
|
Audio processing packages such as `mutagen` and `torchaudio` need to be installed.
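For example, with a standard pip setup (assuming the usual PyPI package names):

```shell
pip install transformers mutagen torchaudio
```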
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

hf_path = 'Yi3852/MuFun-Base'
tokenizer = AutoTokenizer.from_pretrained(hf_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True, torch_dtype="bfloat16")
device = 'cuda'
model.to(device)

# single audio
# during inference the audio (converted to a sequence of embeddings) is placed
# at the position of the <audio> tag in the prompt
aud = "/path/to/your/song.mp3"
inp = "\n<audio>Can you listen to this song and tell me its lyrics?"
res = model.chat(prompt=inp, audio_files=aud, tokenizer=tokenizer)
print(res)

# multiple audios
# each audio is placed at the corresponding <audio> tag in the prompt
aud = ["/path/to/your/song1.mp3", "/path/to/your/song2.mp3"]
inp = "\n<audio> This is song1. <audio> This is song2. Which song do you like more? Tell me the reason."
res = model.chat(prompt=inp, audio_files=aud, tokenizer=tokenizer)
print(res)

# analyze only a specific segment of an audio using the segs parameter
# format is [start_time, end_time] in seconds; for multiple audios, pass one
# entry per file, e.g. [[0, 30], [60, 90]] or [None, [0, 30.0]]
aud = "/path/to/your/song.mp3"
inp = "\n<audio>How is the rhythm of this music clip?"
res = model.chat(prompt=inp, audio_files=aud, segs=[0, 30.0], tokenizer=tokenizer)
print(res)

# audio_files=None also works, but using MuFun as a text-only model is not recommended
```
|
|
|
## Citation |
|
|
|
```bibtex
@misc{jiang2025advancingfoundationmodelmusic,
      title={Advancing the Foundation Model for Music Understanding},
      author={Yi Jiang and Wei Wang and Xianwen Guo and Huiyun Liu and Hanrui Wang and Youri Xu and Haoqi Gu and Zhongqian Xie and Chuanjiang Luo},
      year={2025},
      eprint={2508.01178},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2508.01178},
}
```