Submitted by jinjieni 75 MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures · 13 authors 2
Submitted by tyl5566 38 Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens · 9 authors 3
Submitted by FanBuCUHK 34 Roadmap towards Superhuman Speech Understanding using Large Language Models · 6 authors 2
Submitted by WuChengyue 34 Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation · 11 authors 4
Submitted by JamesZhutheThird 33 MobA: A Two-Level Agent System for Efficient Mobile Task Automation · 11 authors 3
Submitted by gentaiscool 32 WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines · 51 authors 3
Submitted by weilllllls 25 DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control · 12 authors 2
Submitted by richardxp888 22 MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models · 9 authors 3
Submitted by zhoutianyi 20 BenTo: Benchmark Task Reduction with In-Context Transferability · 4 authors 3
Submitted by ZenMoore 19 PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment · 8 authors 2
Submitted by SiweiWu 17 A Comparative Study on Reasoning Patterns of OpenAI's o1 Model · 17 authors 2
Submitted by Tigerph 17 A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models · 8 authors 2
Submitted by hbseong 13 Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems · 2 authors 2
Submitted by akhaliq 13 VidPanos: Generative Panoramic Videos from Casual Panning Videos · 9 authors 2
Submitted by MING-ZCH 11 Can MLLMs Understand the Deep Implication Behind Chinese Images? · 21 authors 2
Submitted by Sreyan88 10 Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation · 7 authors 2
Submitted by Hoar012 9 Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant · 5 authors 2
Submitted by KrithikV 9 MedMobile: A mobile-sized language model with expert-level clinical capabilities · 5 authors 2
Submitted by YaxinLuo 8 γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models · 7 authors 2
Submitted by ckzheng 8 MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization · 6 authors 2
Submitted by mshuaibi 7 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models · 9 authors 1
Submitted by Shiym 7 LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning · 7 authors 2
Submitted by Yingda 6 Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key · 6 authors 2
Submitted by arthurhero 6 Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats · 8 authors 2
Submitted by ChenDRAG 5 Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment · 4 authors 2
Submitted by markywg 3 TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration · 5 authors 2
Submitted by pdx97 3 SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation · 2 authors 2