Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities Paper • 2505.15692 • Published May 21 • 14
s3: You Don't Need That Much Data to Train a Search Agent via RL Paper • 2505.14146 • Published May 20 • 18