Baichuan-M2: Scaling Medical Capability with Large Verifier System • Paper • 2509.02208 • Published 4 days ago • 32
Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers • Paper • 2505.19439 • Published May 26 • 31