LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization Paper • 2507.15758 • Published Jul 21 • 34
Hierarchical Budget Policy Optimization for Adaptive Reasoning Paper • 2507.15844 • Published Jul 21 • 16
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning Paper • 2507.16814 • Published Jul 22 • 21
Perception-Aware Policy Optimization for Multimodal Reasoning Paper • 2507.06448 • Published Jul 8 • 47