PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC • Paper • 2502.14282 • Published 3 days ago • 14
MLGym: A New Framework and Benchmark for Advancing AI Research Agents • Paper • 2502.14499 • Published 3 days ago • 147
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models • Paper • 2502.09696 • Published 10 days ago • 38
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks • Paper • 2502.08235 • Published 11 days ago • 53