BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published 7 days ago
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding Paper • 2507.19427 • Published 27 days ago