AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26, 2024 • 35
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning Paper • 2406.06469 • Published Jun 10, 2024 • 30
ADaPT: As-Needed Decomposition and Planning with Language Models Paper • 2311.05772 • Published Nov 8, 2023 • 15
Decomposed Prompting: A Modular Approach for Solving Complex Tasks Paper • 2210.02406 • Published Oct 5, 2022 • 1
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources Paper • 2306.04751 • Published Jun 7, 2023 • 5
Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance Paper • 2305.17306 • Published May 26, 2023 • 2
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge Paper • 1803.05457 • Published Mar 14, 2018 • 2
Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback Paper • 2305.10142 • Published May 17, 2023 • 1