---
license: mit
---
## TMLR-Group-HF/GT-Qwen3-8B-Base

This is the Qwen3-8B-Base model trained with the GRPO ground-truth (GT) reward method on the MATH training set.
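As a rough illustration of what training with GRPO on ground-truth rewards involves, the sketch below scores a group of sampled answers against a reference answer and normalizes the rewards within the group. It is not the authors' training code: the exact-match reward, the helper names `ground_truth_reward` and `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def ground_truth_reward(predicted: str, reference: str) -> float:
    """Hypothetical reward: 1.0 on an exact match after trimming whitespace, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Population std is an assumption here; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled final answers for one MATH-style problem whose reference answer is "42".
samples = ["42", "41", " 42 ", "7"]
rewards = [ground_truth_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)
```

Correct samples receive positive advantages and incorrect ones negative advantages. When every sample in a group gets the same reward, all advantages are zero and that group contributes no learning signal.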

If you are interested in Co-Reward, you can find more details in our [GitHub repository](https://github.com/tmlr-group/Co-Reward).
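A minimal sketch of loading this checkpoint with the `transformers` library is shown below. It assumes that `transformers`, `torch`, and `accelerate` (needed for `device_map="auto"`) are installed and that the repository stores standard causal-LM weights. The prompt format in `build_prompt` is a hypothetical choice for a base model, not a template documented by the authors.

```python
MODEL_ID = "TMLR-Group-HF/GT-Qwen3-8B-Base"


def build_prompt(problem: str) -> str:
    """Plain-text prompt for a base (non-chat) model; the wording is an assumption."""
    return (
        "Problem: " + problem.strip() + "\n"
        "Please reason step by step and put the final answer in \\boxed{}.\n"
        "Solution:"
    )


def generate(problem: str, max_new_tokens: int = 512) -> str:
    """Download the model (if needed) and greedily decode a solution."""
    # Imported lazily so that defining these helpers does not require the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_prompt(problem), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("What is 12 * 13?")` downloads the weights on first use and needs enough GPU or CPU memory for an 8B-parameter model.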

## Citation

```
@article{zhang2025coreward,
  title={Co-Reward: Self-supervised Reinforcement Learning for Large Language Model Reasoning via Contrastive Agreement},
  author={Zizhuo Zhang and Jianing Zhu and Xinmu Ge and Zihua Zhao and Zhanke Zhou and Xuan Li and Xiao Feng and Jiangchao Yao and Bo Han},
  journal={arXiv preprint arXiv:2508.00410},
  year={2025}
}
```