---
license: apache-2.0
pipeline_tag: audio-text-to-text
language:
- en
- zh
base_model:
- Qwen/Qwen3-8B-Base
- openai/whisper-large-v3
---

MuFun model proposed in [Advancing the Foundation Model for Music Understanding](https://arxiv.org/abs/2508.01178)

Training code: https://github.com/laitselec/MuFun

## Usage

Some audio processing packages such as mutagen and torchaudio need to be installed (`pip install mutagen torchaudio`).

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

hf_path = 'Yi3852/MuFun-Base'
tokenizer = AutoTokenizer.from_pretrained(hf_path, use_fast=False)

device = 'cuda'
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True, torch_dtype="bfloat16")
model.to(device)

# single audio
# during inference the audio (converted to a sequence of embeddings) will be placed in the position of
```