Commit 39c7c80 by yizhilll (verified) · 1 Parent(s): 0b8c099

Create README.md

Files changed (1)
  1. README.md +26 -0
README.md ADDED
@@ -0,0 +1,26 @@
+ ---
+ datasets:
+ - m-a-p/TreePO_data
+ base_model:
+ - Qwen/Qwen2.5-7B
+ ---
+
+
+ We release the resources for the paper [TreePO](https://arxiv.org/abs/2508.17445):
+ - Checkpoint with average weighted subgroup advantages + more diverse initial divergence ([the final one](https://huggingface.co/m-a-p/TreePO-Qwen2.5-7B)).
+ - Checkpoint with average weighted subgroup advantages + [fixed divergence](https://huggingface.co/m-a-p/TreePO-Qwen2.5-7B_fixed-div). **← You are here.**
+ - The [training dataset](https://huggingface.co/datasets/m-a-p/TreePO_data), which consists of DeepScaleR and SimpleRL math reasoning data.
+
+
+ More links:
+ - [Hugging Face Paper](https://huggingface.co/papers/2508.17445)
+ - [Project Page](https://m-a-p.ai/TreePO)
+ - [X/Twitter Thread](https://x.com/yizhilll/status/1960616873180954854)
+ - [GitHub Repo](https://github.com/multimodal-art-projection/TreePO)
+
+
+ If you find this work useful, please consider citing the paper:
+
+ ```bibtex
+ @misc{li2025treepo, title={TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling}, author={Yizhi Li and Qingshui Gu and Zhoufutu Wen and Ziniu Li and Tianshun Xing and Shuyue Guo and Tianyu Zheng and Xin Zhou and Xingwei Qu and Wangchunshu Zhou and Zheng Zhang and Wei Shen and Qian Liu and Chenghua Lin and Jian Yang and Ge Zhang and Wenhao Huang}, year={2025}, eprint={2508.17445}, archivePrefix={arXiv}, primaryClass={cs.LG}, url={https://arxiv.org/abs/2508.17445}, howpublished={\url{https://m-a-p.ai/TreePO}} }
+ ```
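
The README does not show how to load the checkpoint. Because the front matter lists `Qwen/Qwen2.5-7B` as the base model, it can presumably be loaded through the standard `transformers` causal-LM classes. The following is a minimal sketch under that assumption, not the authors' documented usage. The repo ID is taken from the "fixed divergence" link above. The `build_prompt` helper and its plain-text template are hypothetical, since the README specifies no prompt format. `device_map="auto"` additionally requires the `accelerate` package.

```python
"""Sketch: load the fixed-divergence TreePO checkpoint and sample one answer."""

# Repo ID copied from the "fixed divergence" link in the README.
MODEL_ID = "m-a-p/TreePO-Qwen2.5-7B_fixed-div"


def build_prompt(question: str) -> str:
    """Hypothetical plain-text prompt; the README does not specify a template."""
    return f"Question: {question}\nLet's think step by step.\nAnswer:"


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    """Download the checkpoint (if not cached) and greedily decode a completion."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # use the dtype stored in the checkpoint config
        device_map="auto",    # assumption: accelerate is installed
    )

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding
    )

    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate_answer("What is 12 * 13?")` downloads the full 7B-parameter weights on first use, so it is best run on a machine with a suitable GPU and enough disk space.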