BabyVision Collection • State-of-the-art MLLMs achieve PhD-level language reasoning but struggle with visual tasks that 3-year-olds solve effortlessly. • 2 items • Updated 6 days ago • 4
EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce Paper • 2512.08868 • Published Dec 9, 2025 • 2
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction Paper • 2511.07327 • Published Nov 10, 2025 • 77
AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis Paper • 2510.24695 • Published Oct 28, 2025 • 23
AgentFold: Long-Horizon Web Agents with Proactive Context Management Paper • 2510.24699 • Published Oct 28, 2025 • 70
Repurposing Synthetic Data for Fine-grained Search Agent Supervision Paper • 2510.24694 • Published Oct 28, 2025 • 24
WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking Paper • 2510.24697 • Published Oct 28, 2025 • 20
ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking Paper • 2510.24698 • Published Oct 28, 2025 • 20
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents Paper • 2509.13309 • Published Sep 16, 2025 • 67
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization Paper • 2509.13313 • Published Sep 16, 2025 • 80
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning Paper • 2509.13305 • Published Sep 16, 2025 • 91