Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published 2 days ago • 52
UserBench: An Interactive Gym Environment for User-Centric Agents Paper • 2507.22034 • Published Jul 29 • 29
MiniCPM4 Collection MiniCPM4: Ultra-Efficient LLMs on End Devices • 28 items • Updated about 2 hours ago • 74
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28 • 130
Article Process Reinforcement through Implicit Rewards By ganqu and 1 other • Jan 3 • 29
Eurus Collection Advancing LLM Reasoning Generalists with Preference Trees • 11 items • Updated 30 days ago • 25