Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning • Paper • 2508.16949 • Published 14 days ago • 22
Diffusion Language Models Know the Answer Before Decoding • Paper • 2508.19982 • Published 10 days ago • 22
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models • Paper • 2508.18773 • Published 11 days ago • 14
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens • Paper • 2508.01191 • Published Aug 2 • 234
Self-Rewarding Vision-Language Model via Reasoning Decomposition • Paper • 2508.19652 • Published 10 days ago • 78