---
license: mit
---

## CoReward-Qwen2.5-7B

This is the Qwen2.5-7B model trained with the Co-Reward method on the MATH training set.

For more details on Co-Reward, see our GitHub repo: [https://github.com/tmlr-group/Co-Reward](https://github.com/tmlr-group/Co-Reward).

## Citation

```
@article{zhang2025coreward,
  title={Co-Reward: Self-supervised Reinforcement Learning for Large Language Model Reasoning via Contrastive Agreement},
  author={Zizhuo Zhang and Jianing Zhu and Xinmu Ge and Zihua Zhao and Zhanke Zhou and Xuan Li and Xiao Feng and Jiangchao Yao and Bo Han},
  journal={arXiv preprint arXiv:2508.00410},
  year={2025},
}
```
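
## Usage

This card does not include loading instructions, so the following is only a minimal sketch of how a checkpoint like this is typically loaded with the Hugging Face `transformers` library. `MODEL_PATH` is a hypothetical placeholder, not the actual repository id, and the prompt format in `build_prompt` is an assumption rather than the format used in Co-Reward training. Check the GitHub repo for the authors' own evaluation scripts. `device_map="auto"` additionally requires the `accelerate` package.

```python
# Hypothetical placeholder: replace with the real local path or Hub id of this checkpoint.
MODEL_PATH = "path/to/CoReward-Qwen2.5-7B"


def build_prompt(question: str) -> str:
    """Wrap a math question in a simple step-by-step instruction.

    The wording is an assumption for illustration, not the authors' template.
    """
    return (
        question
        + "\nPlease reason step by step, and put your final answer within \\boxed{}."
    )


def generate_answer(question: str, model_path: str = MODEL_PATH,
                    max_new_tokens: int = 512) -> str:
    """Load the checkpoint and greedily decode an answer to one question."""
    # Imported here so that build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype="auto", device_map="auto"
    )

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads or loads the full 7B model, so it is left commented out):
# print(generate_answer("What is the sum of the first 10 positive integers?"))
```

Greedy decoding (`do_sample=False`) is used here only to make the output reproducible; sampling settings are a free choice.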