MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark • Paper • 2508.07307 • Published 29 days ago
AudioStory: Generating Long-Form Narrative Audio with Large Language Models • Paper • 2508.20088 • Published 12 days ago • 20
A Comprehensive Survey on Continual Learning in Generative Models • Paper • 2506.13045 • Published Jun 16
Aligned Better, Listen Better for Audio-Visual Large Language Models • Paper • 2504.02061 • Published Apr 2
ProtoGCD: Unified and Unbiased Prototype Learning for Generalized Category Discovery • Paper • 2504.03755 • Published Apr 2 • 1
MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution • Paper • 2405.18240 • Published May 28, 2024
Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization • Paper • 2403.03145 • Published Mar 5, 2024
Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization • Paper • 2403.03095 • Published Mar 5, 2024
WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models • Paper • 2407.10131 • Published Jul 14, 2024
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers • Paper • 2503.19480 • Published Mar 25 • 16
Happy: A Debiased Learning Framework for Continual Generalized Category Discovery • Paper • 2410.06535 • Published Oct 9, 2024