Simple Reinforcement Learning for Reasoning

Notion

This is a model checkpoint from the SimpleRL project. Qwen-2.5-Math-7B-SimpleRL-Zero is trained with simple RL directly from the base model, using only 8K MATH examples.
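As a rough illustration of how the checkpoint might be used, the sketch below loads it with the Hugging Face `transformers` library and generates an answer to a math question. This is a minimal sketch under several assumptions: that the repository loads through `AutoModelForCausalLM` and `AutoTokenizer`, that `torch` and `transformers` are installed, and that a plain question-and-answer prompt is acceptable. The `build_prompt` helper and its template are hypothetical and are not taken from the SimpleRL code; the prompt used during training may differ.

```python
MODEL_ID = "hkust-nlp/Qwen-2.5-Math-7B-SimpleRL-Zero"


def build_prompt(question: str) -> str:
    """Hypothetical prompt template (an assumption, not the project's own)."""
    return f"Question:\n{question.strip()}\nAnswer:\nLet's think step by step.\n"


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Load the checkpoint and greedily decode an answer to one question."""
    # Imported lazily so the prompt helper can be used without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # The weights are stored in F32; bfloat16 is chosen here only to halve memory.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.to(device)
    model.eval()

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("What is 12 * 13?")` downloads the full 7.62B-parameter checkpoint on first use, so it needs substantial disk space and memory. Greedy decoding (`do_sample=False`) is used here only to keep the output reproducible.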

Citation

If you find this blog or our code useful, we would appreciate it if you could cite our work:

@misc{zeng2025simplerl,
    title={7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient},
    author={Weihao Zeng and Yuzhen Huang and Wei Liu and Keqing He and Qian Liu and Zejun Ma and Junxian He},
    year={2025},
    howpublished={\url{https://hkust-nlp.notion.site/simplerl-reason}},
    note={Notion Blog}
}
Model size: 7.62B params (Safetensors)
Tensor type: F32

Model tree for hkust-nlp/Qwen-2.5-Math-7B-SimpleRL-Zero

Base model: Qwen/Qwen2.5-7B
Finetuned: this model
Quantizations: 1 model
