File size: 640 Bytes
1cc4a86 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
---
license: mit
---
## CoReward-Qwen2.5-7B
This is the Qwen2.5-7B model trained by Co-Reward method using MATH training set.
If you are interested in Co-Reward, you can find more details on our Github Repo [https://github.com/tmlr-group/Co-Reward].
## Citation
```
@article{zhang2025coreward,
title={Co-Reward: Self-supervised Reinforcement Learning for Large Language Model Reasoning via Contrastive Agreement},
author={Zizhuo Zhang and Jianing Zhu and Xinmu Ge and Zihua Zhao and Zhanke Zhou and Xuan Li and Xiao Feng and Jiangchao Yao and Bo Han},
journal={arXiv preprint arXiv:2508.00410}
year={2025},
}
``` |