Towards Best Practices for Open Datasets for LLM Training • Paper • 2501.08365 • Published Jan 14 • 55
Preference Leakage: A Contamination Problem in LLM-as-a-judge • Paper • 2502.01534 • Published 20 days ago • 37
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper • 2502.02737 • Published 19 days ago • 190
Instruction-Following Evaluation for Large Language Models • Paper • 2311.07911 • Published Nov 14, 2023 • 20