The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters • Running • 1.36k
SimpleRL Collection The collection for the Project "Simple Reinforcement Learning for Reasoning" • 2 items • Updated 5 days ago • 4
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 7 days ago • 133
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 91
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published Dec 23, 2024 • 46