MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 45
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published 6 days ago • 34