Submitted by myownskyW7 94 InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions · 29 authors 3
Submitted by oliu-io 53 Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions · 5 authors 2
Submitted by wcy1122 45 Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition · 15 authors 3
Submitted by ranpox 28 AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials · 8 authors 2
Submitted by alanspike 24 SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training · 19 authors 3
Submitted by CaraJ 21 EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM · 8 authors 3
Submitted by kangnamgyu27 18 PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations · 4 authors 2
Submitted by zxhezexin 18 Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion · 5 authors 4
Submitted by lisabdunlap 13 VisionArena: 230K Real World User-VLM Conversations with Preference Labels · 8 authors 3
Submitted by praeclarumjj3 11 OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation · 5 authors 2
Submitted by wenyueH 10 RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios · 7 authors 2
Submitted by versae 8 The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective · 18 authors 2
Submitted by Yw22 8 DisPose: Disentangling Pose Guidance for Controllable Human Image Animation · 7 authors 2
Submitted by enisimsar 8 LoRACLR: Contrastive Adaptation for Customization of Diffusion Models · 4 authors 2
Submitted by bluestyle97 8 FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction · 3 authors 3
Submitted by adhiraj1998 6 ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities · 6 authors 2
Submitted by praeclarumjj3 5 Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders · 6 authors 2
Submitted by ZGZzz 5 SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts · 6 authors 2
Submitted by rumourscape 4 Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages · 2 authors 2