LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published 5 days ago • 62
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents Paper • 2505.20411 • Published May 26, 2025 • 92
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards Paper • 2510.08529 • Published Oct 9, 2025 • 18
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments Paper • 2510.01179 • Published Oct 1, 2025 • 25
Do You Need Proprioceptive States in Visuomotor Policies? Paper • 2509.18644 • Published Sep 23, 2025 • 49
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? Paper • 2509.16941 • Published Sep 21, 2025 • 21
DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery Paper • 2508.06960 • Published Aug 9, 2025 • 1