This model was converted to GGUF format from [`Open-Reasoner-Zero/Open-Reasoner-Zero-7B`](https://huggingface.co/Open-Reasoner-Zero/Open-Reasoner-Zero-7B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/Open-Reasoner-Zero/Open-Reasoner-Zero-7B) for more details on the model.
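Because the weights in this repo are plain GGUF files, they can also be loaded from Python in addition to the llama.cpp CLI described below. This is only a minimal sketch, assuming the third-party llama-cpp-python package (not mentioned in this card) and a locally downloaded quant file; the file name is a placeholder.

```python
from llama_cpp import Llama  # assumption: `pip install llama-cpp-python`

# Placeholder path: substitute the GGUF file you downloaded from this repo.
llm = Llama(model_path="open-reasoner-zero-7b.Q4_K_M.gguf", n_ctx=4096)

out = llm("Q: What is 17 * 24? Reason step by step.\nA:", max_tokens=256)
print(out["choices"][0]["text"])
```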
---

**An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model**
## Overview

🌊 We introduce Open-Reasoner-Zero, the first open-source implementation of large-scale reasoning-oriented RL training, focused on scalability, simplicity, and accessibility.

To enable broader participation in this pivotal moment and to accelerate research toward artificial general intelligence (AGI), we release our source code, parameter settings, training data, and model weights. Please refer to our paper for more insights.

Let the Reasoner-Zero tide rise!
## Releases 📦

[2025/02/18] We release Open-Reasoner-Zero. As part of this release, we open-source:

- 🌊 Paper on our comprehensive analysis and insights into Reasoner-Zero training
- 🤗 HF models Open-Reasoner-Zero-7B and Open-Reasoner-Zero-32B
- 🎁 Our curated 57k training samples
- 📄 Training scripts to enjoy your own Reasoner-Zero journey!
## Key Features in Codebase 🔑

- Single-controller trainer design: flexible and researcher-friendly (see the sketch below this list).
- Training and generation colocated on the same GPUs to maximize GPU utilization.
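The actual implementation lives in the repository; the sketch below, with hypothetical class and method names, only illustrates what a single-controller, colocated design means in Ray terms (the Acknowledgements note the framework builds on Ray): one driver script owns the entire generate-then-train loop, and each worker actor serves both roles on its GPU.

```python
import ray

ray.init()

@ray.remote(num_gpus=1)  # assumption: one GPU available per worker
class ColocatedWorker:
    """Hypothetical worker: one model replica per GPU, used for both
    rollout generation and PPO updates, so no GPUs sit idle."""

    def __init__(self):
        self.step = 0

    def generate(self, prompts):
        # Stand-in for inference-engine rollouts on this worker's GPU.
        return [p + " <rollout>" for p in prompts]

    def train(self, rollouts):
        # Stand-in for a PPO update on the very same GPU.
        self.step += 1
        return self.step

# Single controller: one driver sequences every phase, so generation
# and training alternate on the same pool of workers.
workers = [ColocatedWorker.remote() for _ in range(2)]
rollouts = ray.get([w.generate.remote(["2 + 2 ="]) for w in workers])
steps = ray.get([w.train.remote(r) for w, r in zip(workers, rollouts)])
print(f"finished PPO step {steps[0]}")
```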
## Getting Started 🚀

### Installation & Training Scripts
We release our Dockerfile in the docker folder to facilitate reproducibility of our training.

To install the package, run:

```bash
pip install -e .
```
### Start Orz-7B PPO Training

Debug run on a single node:

```bash
DEBUG_MODE=True python -m playground.orz_7b_ppo
```
For multi-node training, first, on the master node, run:

```bash
ray start --head
```

Then, on every other node, run:

```bash
ray start --address='<master-node-ip>:<master-node-port>'
```

Finally, back on the master node, run:

```bash
python -m playground.orz_7b_ppo
```

Your training log will be shown in the master node terminal.
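Before launching the job, it can help to confirm that every node actually joined the cluster. The check below uses only the standard Ray Python API and is not part of the Open-Reasoner-Zero scripts:

```python
import ray

# Attach to the already-running cluster started by `ray start --head`.
ray.init(address="auto")

# Each entry corresponds to one node; the count should match your setup.
alive = [n for n in ray.nodes() if n["Alive"]]
print(f"{len(alive)} node(s) alive in the Ray cluster")
```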
### Start Orz-32B PPO Training

Run across 8 nodes. First, on the master node, run:

```bash
ray start --head
```

Then, on every other node, run:

```bash
ray start --address='<master-node-ip>:<master-node-port>'
```

Finally, back on the master node, run:

```bash
python -m playground.orz_32b_ppo
```

Your training log will be shown in the master node terminal.
## Data

We release all 57k curated high-quality training samples in the data folder. The details of how the data was collected are described in our paper.
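The card does not specify the on-disk format, so the loader below is only a hypothetical sketch: it assumes JSON files under a repo-relative data folder and prints per-file sample counts. Adjust paths and field names to whatever the repo actually ships.

```python
import json
from pathlib import Path

DATA_DIR = Path("data")  # assumption: the repo's released data folder

for path in sorted(DATA_DIR.glob("*.json")):
    with path.open(encoding="utf-8") as f:
        samples = json.load(f)
    # Assumes each file holds a JSON array of training samples.
    print(f"{path.name}: {len(samples)} samples")
```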
## Acknowledgements

This work was supported by computing resources and valuable feedback provided by StepFun and Tsinghua University. Our training framework is built on OpenRLHF, vLLM, DeepSpeed, and Ray. Our model is based on Qwen2.5-7B and Qwen2.5-32B. We thank Project Numina and Tulu3 for their open-sourced data.

---
  ## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux): `brew install llama.cpp`