SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published 4 days ago • 76
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters • 3.16k
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1, 2024 • 41
Bootstrapping Language Models with DPO Implicit Rewards Paper • 2406.09760 • Published Jun 14, 2024 • 41