Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation Paper • 2512.24271 • Published 8 days ago • 50 • 5
Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning Paper • 2601.00830 • Published 14 days ago • 2 • 3
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits Paper • 2512.20578 • Published 15 days ago • 59 • 4
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published 8 days ago • 100 • 4
Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs Paper • 2510.01954 • Published Oct 2, 2025 • 12 • 3
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published 5 days ago • 45 • 3
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published 8 days ago • 45 • 5
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published 19 days ago • 95 • 4
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space Paper • 2512.24617 • Published 8 days ago • 54 • 4
UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement Paper • 2512.21185 • Published 14 days ago • 26 • 4
Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation Paper • 2411.14971 • Published Nov 22, 2024 • 1
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published 9 days ago • 93 • 4
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 87 • 4
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1, 2025 • 109 • 8
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 15 days ago • 49 • 4
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models Paper • 2512.19995 • Published 16 days ago • 14 • 5
Spatia: Video Generation with Updatable Spatial Memory Paper • 2512.15716 • Published 21 days ago • 29 • 4
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning Paper • 2512.20605 • Published 15 days ago • 60 • 5