Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published 13 days ago • 59
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement Paper • 2501.12273 • Published Jan 21 • 14
deepseek-ai/DeepSeek-R1-Distill-Llama-70B Text Generation • Updated 15 days ago • 472k • 580
deepseek-ai/DeepSeek-R1-Distill-Qwen-14B Text Generation • Updated 15 days ago • 586k • 419
deepseek-ai/DeepSeek-R1-Distill-Llama-8B Text Generation • Updated 15 days ago • 943k • 577
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B Text Generation • Updated 15 days ago • 1.09M • 917
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B Text Generation • Updated 15 days ago • 1.04M • 1.15k