Elliott/Qwen2.5-Math-7B-16k-think

@@ -1,9 +1,14 @@
 ---
 license: mit
 ---
-The base Qwen2.5-Math-7B model used by LUFFY.
 We change rope_theta from 10000 to 40000 and extend the context window to 16k.
-Also, we modify the chat_template for the system prompt and add <think>.
 # Citation
 If you find our model, data, or evaluation code useful, please kindly cite our paper:

 ---
 license: mit
+library_name: transformers
+pipeline_tag: text-generation
 ---
+The base Qwen2.5-Math-7B model used by LUFFY, described in [Learning to Reason under Off-Policy Guidance](https://huggingface.co/papers/2504.14945).
 We change rope_theta from 10000 to 40000 and extend the context window to 16k.
+Also, we modify the chat_template for the system prompt and add <think>.
+GitHub: https://github.com/ElliottYan/LUFFY
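
To illustrate why raising rope_theta goes together with a longer context window, here is a minimal sketch of the standard RoPE frequency formula. It is not taken from the LUFFY code; the helper names `rope_inv_freq` and `longest_wavelength` are hypothetical, and `HEAD_DIM = 128` is an assumption about the attention head size rather than something this model card states.

```python
import math


def rope_inv_freq(theta: float, head_dim: int) -> list[float]:
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(theta: float, head_dim: int) -> float:
    """Wavelength, in tokens, of the slowest-rotating dimension pair."""
    return 2 * math.pi / rope_inv_freq(theta, head_dim)[-1]


HEAD_DIM = 128  # assumption: not stated in this model card

# Compare the original and the modified rope_theta values from the card.
for theta in (10_000.0, 40_000.0):
    wavelength = longest_wavelength(theta, HEAD_DIM)
    print(f"rope_theta={theta:>8.0f}  longest wavelength ~ {wavelength:,.0f} tokens")
```

A larger rope_theta slows the rotation of every dimension pair except the first, so positions further apart remain distinguishable. In a Transformers checkpoint this change would typically show up as the `rope_theta` and `max_position_embeddings` fields in `config.json`, though the card does not spell out which fields were edited.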
 # Citation
 If you find our model, data, or evaluation code useful, please kindly cite our paper: