The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11 • 34
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools Paper • 2509.09734 • Published Sep 10 • 15
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published Jan 20 • 109