arxiv:2601.02967

MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free

Published on Jan 6
Abstract

AI-generated summary

The MoE-Adapter architecture addresses audio heterogeneity in LLMs through a sparse Mixture-of-Experts design that reduces gradient conflicts and improves performance on audio semantic and paralinguistic tasks.

Extending the input modality of Large Language Models (LLMs) to the audio domain is essential for achieving comprehensive multimodal perception. However, acoustic information is intrinsically heterogeneous, entangling attributes such as speech, music, and environmental context. Existing work is limited to dense, parameter-shared adapters for modeling these diverse patterns, which induces gradient conflict during optimization because the parameter updates required for distinct attributes contradict each other. To address this limitation, we introduce the MoE-Adapter, a sparse Mixture-of-Experts (MoE) architecture designed to decouple acoustic information. Specifically, it employs a dynamic gating mechanism that routes audio tokens to specialized experts capturing complementary feature subspaces, while retaining shared experts for global context, thereby mitigating gradient conflicts and enabling fine-grained feature learning. Comprehensive experiments show that the MoE-Adapter achieves superior performance on both audio semantic and paralinguistic tasks, consistently outperforming dense linear baselines at comparable computational cost. We will release the related code and models to facilitate future research.
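To make the routing described in the abstract concrete, below is a minimal PyTorch sketch of a sparse MoE adapter that combines top-k gated specialized experts with always-active shared experts. All dimensions, expert counts, and module names (MoEAdapter, audio_dim, llm_dim, top_k, and so on) are illustrative assumptions, not the paper's reported configuration or the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEAdapter(nn.Module):
    """Illustrative sparse MoE adapter: top-k routed experts plus shared experts.

    Hyperparameters here are assumptions for the sketch, not the paper's values.
    """

    def __init__(self, audio_dim=1024, llm_dim=4096, num_experts=8,
                 num_shared=1, top_k=2, hidden_dim=2048):
        super().__init__()
        self.top_k = top_k
        # Router scores every audio token against each routed expert.
        self.router = nn.Linear(audio_dim, num_experts)

        def make_expert():
            return nn.Sequential(
                nn.Linear(audio_dim, hidden_dim),
                nn.GELU(),
                nn.Linear(hidden_dim, llm_dim),
            )

        # Specialized experts, sparsely activated per token by the gate.
        self.experts = nn.ModuleList([make_expert() for _ in range(num_experts)])
        # Shared experts, applied to every token to keep global acoustic context.
        self.shared_experts = nn.ModuleList([make_expert() for _ in range(num_shared)])

    def forward(self, audio_tokens):  # audio_tokens: (batch, seq, audio_dim)
        B, T, D = audio_tokens.shape
        x = audio_tokens.reshape(B * T, D)

        # Top-k sparse gating: each token goes only to its k highest-scoring experts.
        gate_logits = self.router(x)                           # (B*T, num_experts)
        topk_logits, topk_idx = gate_logits.topk(self.top_k, dim=-1)
        topk_weights = F.softmax(topk_logits, dim=-1)          # renormalize over chosen experts

        llm_dim = self.experts[0][-1].out_features
        out = torch.zeros(B * T, llm_dim, device=x.device, dtype=x.dtype)

        # Dispatch tokens to their selected specialized experts.
        for e, expert in enumerate(self.experts):
            token_pos, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_pos.numel() == 0:
                continue
            out[token_pos] += topk_weights[token_pos, slot].unsqueeze(-1) * expert(x[token_pos])

        # Shared experts process every token, providing the global context path.
        for expert in self.shared_experts:
            out = out + expert(x)

        return out.reshape(B, T, llm_dim)


# Usage sketch: project a batch of audio encoder tokens into the LLM embedding space.
adapter = MoEAdapter()
audio_tokens = torch.randn(2, 50, 1024)
llm_inputs = adapter(audio_tokens)   # (2, 50, 4096)
```

With the assumed top_k=2 out of 8 routed experts, each token activates only a fraction of the expert parameters per forward pass, which is one way such a design can keep compute comparable to a dense baseline while letting attribute-specific updates flow through separate parameters.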
