Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published 26 days ago • 26
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published 26 days ago • 106
Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation Paper • 2501.17749 • Published 25 days ago • 13
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published Dec 23, 2024 • 30