view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • 16 days ago • 42
view article Article Illustrating Reinforcement Learning from Human Feedback (RLHF) Dec 9, 2022 • 169
SimpleRL Collection The collection for the Project "Simple Reinforcement Learning for Reasoning" • 2 items • Updated 5 days ago • 4
CodeI/O Collection Collection for CodeI/O @ https://codei-o.github.io/ • 15 items • Updated 11 days ago • 6
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items • Updated 13 days ago • 75
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 Paper • 2502.03544 • Published 18 days ago • 42
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published 16 days ago • 114
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published 13 days ago • 59
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 13 days ago • 134
OpenR1-Math Collection Dataset and SFT model distilled from DeepSeek-R1. Check out our blog post for more details: https://huggingface.co/blog/open-r1/update-2 • 3 items • Updated 9 days ago • 6
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 12 items • Updated 4 days ago • 79