Libra: Assessing and Improving Reward Model by Learning to Think • Paper • 2507.21645 • Published Jul 29
Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B • Dataset • Updated Jan 27 • 150k rows