license: mit
library_name: transformers
pipeline_tag: text-generation

This is the base Qwen2.5-Math-7B model used by LUFFY, described in the paper "Learning to Reason under Off-Policy Guidance". We change rope_theta from 10000 to 40000 and extend the context window to 16k tokens. We also modify the chat_template for the system prompt and add .
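Raising rope_theta is a common way to stretch a model's usable context: in standard RoPE, each pair of dimensions rotates with inverse frequency base^(-2i/d), so a larger base slows every rotation and lengthens the longest positional wavelength. The sketch below compares the two base values named above using that standard formula. The head dimension of 128 is an assumption chosen for illustration and is not read from this model's config. The sketch illustrates the general RoPE effect; it is not the procedure used to produce this checkpoint.

```python
import math


def rope_inv_freq(base: float, head_dim: int) -> list[float]:
    """Standard RoPE inverse frequencies base^(-2i/d) for i = 0 .. d/2 - 1."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def max_wavelength(base: float, head_dim: int) -> float:
    """Wavelength in tokens of the slowest-rotating pair: 2*pi / smallest frequency."""
    return 2 * math.pi / min(rope_inv_freq(base, head_dim))


HEAD_DIM = 128  # assumption: illustrative per-head dimension, not taken from the config

for base in (10_000, 40_000):
    wl = max_wavelength(base, HEAD_DIM)
    print(f"rope_theta={base}: longest wavelength ~ {wl:,.0f} tokens")
```

With d = 128, quadrupling the base multiplies the longest wavelength by 4^(126/128), roughly 3.9x, while the fastest pair (i = 0) always has frequency 1.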

GitHub: https://github.com/ElliottYan/LUFFY

Citation

If you find our model, data, or evaluation code useful, please cite our paper:

@misc{luffy,
      title={Learning to Reason under Off-Policy Guidance}, 
      author={Jianhao Yan and Yafu Li and Zican Hu and Zhi Wang and Ganqu Cui and Xiaoye Qu and Yu Cheng and Yue Zhang},
      year={2025},
      eprint={2504.14945},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.14945}, 
}