Submitted by KevinQHLin · ShowUI: One Vision-Language-Action Model for GUI Visual Agent · 9 authors
Submitted by BestWishYsh · Identity-Preserving Text-to-Video Generation by Frequency Decomposition · 8 authors
Submitted by noamrot · Pathways on the Image Manifold: Image Editing via Video Generation · 6 authors
Submitted by SadilKhan · MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation · 7 authors
Submitted by shuaishuaicdp · Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment · 11 authors
Submitted by huangsiteng · Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration · 7 authors
Submitted by yifanzhang114 · MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs · 12 authors
Submitted by sggetao · Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens · 6 authors
Submitted by cyw-3d · SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE · 5 authors
Submitted by tobiaslee · VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models · 12 authors
Submitted by arkimjh · SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis · 4 authors
Submitted by hhua2 · FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity · 8 authors
Submitted by akhaliq · AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation · 10 authors
Submitted by SanghyeokLee · EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality · 3 authors
Submitted by phenixace · MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts · 9 authors
Submitted by yisol · Controllable Human Image Generation with Personalized Multi-Garments · 5 authors
Submitted by amanchadha · Visual Counter Turing Test (VCT^2): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI) · 14 authors