Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published 19 days ago • 117
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published 19 days ago • 80
AI4Research: A Survey of Artificial Intelligence for Scientific Research Paper • 2507.01903 • Published Jul 2 • 4
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published Jan 16 • 41
ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model Paper • 2502.03325 • Published Feb 5 • 1
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models Paper • 2412.12932 • Published Dec 17, 2024 • 1
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation Paper • 2502.13092 • Published Feb 18 • 13
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 284
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models Paper • 2310.08582 • Published Oct 12, 2023 • 2
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models Paper • 2308.07902 • Published Aug 15, 2023
M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought Paper • 2405.16473 • Published May 26, 2024
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model Paper • 2408.09559 • Published Aug 18, 2024