SimpleRL: the collection for the project "Simple Reinforcement Learning for Reasoning"
This is a model checkpoint from the SimpleRL project. Qwen-2.5-Math-7B-SimpleRL-Zero is trained with simple RL directly from the base model, using only 8K MATH examples.
If you find this blog or our code useful, we would appreciate it if you could cite our work:
@misc{zeng2025simplerl,
  title={7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient},
  author={Weihao Zeng and Yuzhen Huang and Wei Liu and Keqing He and Qian Liu and Zejun Ma and Junxian He},
  year={2025},
  howpublished={\url{https://hkust-nlp.notion.site/simplerl-reason}},
  note={Notion Blog}
}