BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation • Paper • 2403.09227 • Published Mar 14, 2024
Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models • Paper • 2506.13923 • Published Jun 16
ResearchRubrics: A Benchmark of Prompts and Rubrics for Evaluating Deep Research Agents • Paper • 2511.07685 • Published Nov 10