Simple Reinforcement Learning for Reasoning

This is a model checkpoint from the SimpleRL project. Qwen-2.5-Math-7B-SimpleRL-Zero is trained with simple reinforcement learning directly from the base model, using only 8K MATH examples.
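
As a rough illustration of how a checkpoint like this might be loaded and queried, here is a minimal sketch using the Hugging Face `transformers` library. The repository id is the one this page is published under. The `build_prompt` helper and its instruction wording are hypothetical and are not taken from the SimpleRL code, so check the project's own evaluation scripts for the prompt format actually used in training. The sketch also assumes `torch` and `accelerate` are installed and that enough GPU memory is available for a 7B model.

```python
# Minimal sketch, not the authors' inference script.
MODEL_ID = "hkust-nlp/Qwen-2.5-Math-7B-SimpleRL-Zero"


def build_prompt(question: str) -> str:
    """Hypothetical prompt helper: the instruction wording is an assumption."""
    instruction = "Please reason step by step, and put your final answer within \\boxed{}."
    return question.strip() + "\n" + instruction


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Load the checkpoint and greedily decode an answer for one question."""
    # Imported lazily so the prompt helper can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
        device_map="auto",   # requires the accelerate package
    )
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("What is the sum of the first 10 positive integers?"))
```

Running the script downloads the full weights on first use. Greedy decoding (`do_sample=False`) is chosen here only to make the output reproducible, not because the source prescribes it.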

Citation

If you find this blog or our code useful, we would appreciate it if you could cite our work:

@misc{zeng2025simplerl,
    title={7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient},
    author={Weihao Zeng and Yuzhen Huang and Wei Liu and Keqing He and Qian Liu and Zejun Ma and Junxian He},
    year={2025},
    howpublished={\url{https://hkust-nlp.notion.site/simplerl-reason}},
    note={Notion Blog}
}
