Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published 9 days ago • 85
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published 12 days ago • 77
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space Paper • 2508.19247 • Published 10 days ago • 39
USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning Paper • 2508.18966 • Published 11 days ago • 55
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published 3 days ago • 139
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published 3 days ago • 76
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published 6 days ago • 73
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published 5 days ago • 59
Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling Paper • 2509.00605 • Published 6 days ago • 32
DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks Paper • 2509.01396 • Published 5 days ago • 41