Adapting Vision-Language Models Without Labels: A Comprehensive Survey • Paper • 2508.05547 • Published 30 days ago • 11
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models • Paper • 2508.10751 • Published 23 days ago • 26
Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation • Paper • 2508.12040 • Published 21 days ago • 14
A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models • Paper • 2508.12903 • Published 19 days ago • 11
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR • Paper • 2508.14029 • Published 18 days ago • 117
Controlling Multimodal LLMs via Reward-guided Decoding • Paper • 2508.11616 • Published 22 days ago • 6
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning • Paper • 2507.22607 • Published Jul 30 • 46