Jointly Reinforcing Diversity and Quality in Language Model Generations Paper • 2509.02534 • Published 4 days ago • 22
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics Paper • 2508.18124 • Published 12 days ago • 46
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published 26 days ago • 44
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks Paper • 2507.23751 • Published Jul 31 • 4
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems Paper • 2508.07407 • Published 27 days ago • 90
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14 • 88
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 184