Self-Rewarding Vision-Language Model via Reasoning Decomposition • Paper • 2508.19652 • Published 10 days ago • 78
Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation • Paper • 2506.15068 • Published Jun 18 • 14
Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs • Paper • 2504.20406 • Published Apr 29 • 8
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement • Paper • 2504.07934 • Published Apr 10 • 20