Submitted by zhihou · Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy · 11 authors
Submitted by akhaliq · Open Deep Search: Democratizing Search with Open-source Reasoning Agents · 12 authors
Submitted by KennyUTC · LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? · 9 authors
Submitted by phillipinseoul · Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models · 4 authors
Submitted by msj9817 · GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers · 6 authors
Submitted by Awiny · BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation · 9 authors
Submitted by Concyclics · LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation · 7 authors
Submitted by yilunzhao · MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search · 4 authors
Submitted by aejion · AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset · 6 authors
Submitted by hahahawu · Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging · 10 authors
Submitted by Ningyu · ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems · 7 authors
Submitted by r0nn13 · Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image · 2 authors
Submitted by ya-mehdi · Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs · 8 authors
Submitted by Awiny · Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models · 5 authors
Submitted by johanobandoc · Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training · 10 authors
Submitted by akhaliq · Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals · 7 authors
Submitted by Jarvis1111 · UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis · 3 authors
Submitted by SteveZeyuZhang · PathoHR: Breast Cancer Survival Prediction on High-Resolution Pathological Images · 10 authors
Submitted by aadarsh-ram · RONA: Pragmatically Diverse Image Captioning with Coherence Relations · 3 authors