---
license: mit
---

## TMLR-Group-HF/GT-Qwen3-8B-Base

This is the Qwen3-8B-Base model trained with GRPO using ground-truth (GT) rewards on the MATH training set.

If you are interested in Co-Reward, you can find more details in our GitHub repo: [https://github.com/tmlr-group/Co-Reward](https://github.com/tmlr-group/Co-Reward).

## Citation

```
@article{zhang2025coreward,
  title={Co-Reward: Self-supervised Reinforcement Learning for Large Language Model Reasoning via Contrastive Agreement},
  author={Zizhuo Zhang and Jianing Zhu and Xinmu Ge and Zihua Zhao and Zhanke Zhou and Xuan Li and Xiao Feng and Jiangchao Yao and Bo Han},
  journal={arXiv preprint arXiv:2508.00410},
  year={2025},
}
```