VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining • Paper • 2603.15030 • Published 6 days ago • 15
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens • Paper • 2603.19232 • Published 3 days ago • 30
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs • Paper • 2603.18004 • Published 4 days ago • 12
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer • Paper • 2603.19227 • Published 3 days ago • 37
Stereo World Model: Camera-Guided Stereo Video Generation • Paper • 2603.17375 • Published 4 days ago • 10
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models • Paper • 2603.15557 • Published 6 days ago • 28
Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training • Paper • 2603.16139 • Published 5 days ago • 30
MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos • Paper • 2603.14145 • Published 8 days ago • 13
Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding • Paper • 2603.13366 • Published 13 days ago • 91
AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents • Paper • 2603.14465 • Published 7 days ago • 22
WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation • Paper • 2603.15132 • Published 6 days ago • 33
Cosmos-Predict2.5 • Collection • Improved World Simulation with Video Foundation Models for Physical AI • 2 items • Updated 2 days ago • 15