Watch Before You Answer: Learning from Visually Grounded Post-Training • Paper • 2604.05117 • Published 3 days ago • 26
Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision • Paper • 2604.04934 • Published 3 days ago • 32
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning • Paper • 2604.05404 • Published 2 days ago • 33
ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement • Paper • 2604.01591 • Published 7 days ago • 32
GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers • Paper • 2604.02648 • Published 6 days ago • 38
ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation • Paper • 2604.03922 • Published 4 days ago • 48
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents • Paper • 2604.06132 • Published 2 days ago • 99
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding • Paper • 2604.05015 • Published 3 days ago • 205
BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs • Paper • 2604.02045 • Published 7 days ago • 19
Synthetic Sandbox for Training Machine Learning Engineering Agents • Paper • 2604.04872 • Published 3 days ago • 8
Scaling Teams or Scaling Time? Memory Enabled Lifelong Learning in LLM Multi-Agent Systems • Paper • 2604.03295 • Published 13 days ago • 7
Type-Checked Compliance: Deterministic Guardrails for Agentic Financial Systems Using Lean 4 Theorem Proving • Paper • 2604.01483 • Published 8 days ago • 4
Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models • Paper • 2604.00375 • Published 8 days ago • 3
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models • Paper • 2604.04707 • Published 3 days ago • 167 • 12
LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models • Paper • 2603.28301 • Published 10 days ago • 75
The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models • Paper • 2604.04155 • Published 4 days ago • 6
Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems • Paper • 2604.04767 • Published 3 days ago • 3
Emergent Compositional Communication for Latent World Properties • Paper • 2604.03266 • Published 22 days ago • 3