Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published Oct 3, 2025 • 75
Rethinking Entropy Regularization in Large Reasoning Models Paper • 2509.25133 • Published Sep 29, 2025 • 4
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11, 2025 • 47