Elliott/Qwen2.5-Math-7B-16k-think

	---
	license: mit
	library_name: transformers
	pipeline_tag: text-generation
	---

	The base Qwen2.5-Math-7B model used by LUFFY, described in [Learning to Reason under Off-Policy Guidance](https://huggingface.co/papers/2504.14945).
	We change `rope_theta` from 10000 to 40000 and extend the context window to 16k.
	We also modify the `chat_template` for the system prompt and add `<think>`.
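
To illustrate what the `rope_theta` change does, here is a minimal sketch that computes the standard RoPE inverse frequencies for both base values and compares the wavelength of the slowest-rotating dimension pair. It is not taken from the LUFFY code. The helper names and the head dimension of 128 are assumptions made for illustration; check the model's `config.json` for the actual values.

```python
import math


def rope_inv_freq(theta: float, head_dim: int) -> list[float]:
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(theta: float, head_dim: int) -> float:
    """Wavelength, in token positions, of the slowest-rotating pair."""
    return 2 * math.pi / min(rope_inv_freq(theta, head_dim))


# Assumption: head dimension of 128 (hypothetical here, verify in config.json).
HEAD_DIM = 128

for theta in (10_000, 40_000):
    wavelength = longest_wavelength(theta, HEAD_DIM)
    print(f"rope_theta={theta}: longest wavelength ~ {wavelength:.0f} positions")
```

A larger base makes every dimension pair rotate more slowly, so positions far apart remain distinguishable over a longer span. This is a common way to accompany a longer context window, as done here for 16k.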

	Github: https://github.com/ElliottYan/LUFFY

	# Citation
	If you find our model, data, or evaluation code useful, please kindly cite our paper:
	```bib
	@misc{luffy,
	title={Learning to Reason under Off-Policy Guidance},
	author={Jianhao Yan and Yafu Li and Zican Hu and Zhi Wang and Ganqu Cui and Xiaoye Qu and Yu Cheng and Yue Zhang},
	year={2025},
	eprint={2504.14945},
	archivePrefix={arXiv},
	primaryClass={cs.LG},
	url={https://arxiv.org/abs/2504.14945},
	}
	```