File size: 2,261 Bytes
df3c3ac 11062fc df3c3ac |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
---
license: apache-2.0
pipeline_tag: audio-text-to-text
language:
- en
- zh
base_model:
- Qwen/Qwen3-8B-Base
- openai/whisper-large-v3
---
MuFun model proposed in [Advancing the Foundation Model for Music Understanding](https://arxiv.org/abs/2508.01178)
train code: https://github.com/laitselec/MuFun
## Usage
some audio processing packages like mutagen, torchaudio are needed to be installed
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
hf_path = 'Yi3852/MuFun-Base'
tokenizer = AutoTokenizer.from_pretrained(hf_path, use_fast=False)
device='cuda'
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True, torch_dtype="bfloat16")
model.to(device)
# single audio
# during inference the audio(converted to a sequence of embeddings) will be placed in the position of <audio> tag in the prompt
aud="/path/to/your/song.mp3"
inp="\n<audio>Can you listen to this song and tell me its lyrics?"
res=model.chat(prompt=inp, audio_files=aud, tokenizer=tokenizer)
print(res)
# multiple audios
# for multiple songs each will be placed in the coresponding <audio> tag in the prompt
aud=["/path/to/your/song1.mp3", '/path/to/your/song2.mp3']
inp="\n<audio> This is song1. <audio> This is song2. Which song do you like more? Tell me the reason."
res=model.chat(prompt=inp, audio_files=aud, tokenizer=tokenizer)
print(res)
# analyze only a specific segment of audio using the segs parameter
# format is [start_time, end_time](in seconds), for multiple audios segs can be passed like [[0,30],[60,90]], [None,[0,30.0]]
aud="/path/to/your/song.mp3"
inp="\n<audio>How is the rhythm of this music clip?"
res=model.chat(prompt=inp, audio_files=aud, segs=[0,30.0], tokenizer=tokenizer)
print(res)
# set audio_files=None will work, however it is not recommended to use it as a text model
```
## Citation
```bibtex
@misc{jiang2025advancingfoundationmodelmusic,
title={Advancing the Foundation Model for Music Understanding},
author={Yi Jiang and Wei Wang and Xianwen Guo and Huiyun Liu and Hanrui Wang and Youri Xu and Haoqi Gu and Zhongqian Xie and Chuanjiang Luo},
year={2025},
eprint={2508.01178},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2508.01178},
} |