Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • 16 days ago • 42
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 19 days ago • 190
Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents Paper • 2406.04028 • Published Jun 6, 2024 • 1
Extending the Massive Text Embedding Benchmark to French Paper • 2405.20468 • Published May 30, 2024 • 2