This model was converted to GGUF format from [`Open-Reasoner-Zero/Open-Reasoner-Zero-7B`](https://huggingface.co/Open-Reasoner-Zero/Open-Reasoner-Zero-7B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/Open-Reasoner-Zero/Open-Reasoner-Zero-7B) for more details on the model.
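Because the weights in this repo are plain GGUF files, they can also be loaded from Python in addition to the llama.cpp CLI described below. This is only a minimal sketch, assuming the third-party llama-cpp-python package (not mentioned in this card) and a locally downloaded quant file; the file name is a placeholder.

```python
from llama_cpp import Llama  # assumption: `pip install llama-cpp-python`

# Placeholder path: substitute the GGUF file you downloaded from this repo.
llm = Llama(model_path="open-reasoner-zero-7b.Q4_K_M.gguf", n_ctx=4096)

out = llm("Q: What is 17 * 24? Reason step by step.\nA:", max_tokens=256)
print(out["choices"][0]["text"])
```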
---

**An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model**
## Overview

🌊 We introduce Open-Reasoner-Zero, the first open-source implementation of large-scale reasoning-oriented RL training, focused on scalability, simplicity, and accessibility.

To enable broader participation in this pivotal moment and to accelerate research toward artificial general intelligence (AGI), we release our source code, parameter settings, training data, and model weights. Please refer to our paper for more insights.

Let the Reasoner-Zero tide rise!
## Releases 📦

[2025/02/18] We release Open-Reasoner-Zero. As part of this release, we open-source:

- 🌊 Paper on our comprehensive analysis and insights into Reasoner-Zero training
- 🤗 HF models Open-Reasoner-Zero-7B and Open-Reasoner-Zero-32B
- 🎁 Our curated 57k training samples
- 📄 Training scripts to enjoy your own Reasoner-Zero journey!
## Key Features in Codebase 🔑

- Single-controller trainer design: flexible and researcher-friendly (see the sketch below this list).
- Training and generation colocated on the same GPUs to maximize GPU utilization.
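The actual implementation lives in the repository; the sketch below, with hypothetical class and method names, only illustrates what a single-controller, colocated design means in Ray terms (the Acknowledgements note the framework builds on Ray): one driver script owns the entire generate-then-train loop, and each worker actor serves both roles on its GPU.

```python
import ray

ray.init()

@ray.remote(num_gpus=1)  # assumption: one GPU available per worker
class ColocatedWorker:
    """Hypothetical worker: one model replica per GPU, used for both
    rollout generation and PPO updates, so no GPUs sit idle."""

    def __init__(self):
        self.step = 0

    def generate(self, prompts):
        # Stand-in for inference-engine rollouts on this worker's GPU.
        return [p + " <rollout>" for p in prompts]

    def train(self, rollouts):
        # Stand-in for a PPO update on the very same GPU.
        self.step += 1
        return self.step

# Single controller: one driver sequences every phase, so generation
# and training alternate on the same pool of workers.
workers = [ColocatedWorker.remote() for _ in range(2)]
rollouts = ray.get([w.generate.remote(["2 + 2 ="]) for w in workers])
steps = ray.get([w.train.remote(r) for w, r in zip(workers, rollouts)])
print(f"finished PPO step {steps[0]}")
```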
## Getting Started 🚀

### Installation & Training Scripts
We release our Dockerfile in the docker folder to facilitate reproducibility of our training.

To install the package, run:

```bash
pip install -e .
```
### Start Orz-7B PPO Training

Debug run on a single node:

```bash
DEBUG_MODE=True python -m playground.orz_7b_ppo
```
For multi-node training, first, on the master node, run:

```bash
ray start --head
```

Then, on every other node, run:

```bash
ray start --address='<master-node-ip>:<master-node-port>'
```

Finally, back on the master node, run:

```bash
python -m playground.orz_7b_ppo
```

Your training log will be shown in the master node terminal.
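Before launching the job, it can help to confirm that every node actually joined the cluster. The check below uses only the standard Ray Python API and is not part of the Open-Reasoner-Zero scripts:

```python
import ray

# Attach to the already-running cluster started by `ray start --head`.
ray.init(address="auto")

# Each entry corresponds to one node; the count should match your setup.
alive = [n for n in ray.nodes() if n["Alive"]]
print(f"{len(alive)} node(s) alive in the Ray cluster")
```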
### Start Orz-32B PPO Training

Run across 8 nodes. First, on the master node, run:

```bash
ray start --head
```

Then, on every other node, run:

```bash
ray start --address='<master-node-ip>:<master-node-port>'
```

Finally, back on the master node, run:

```bash
python -m playground.orz_32b_ppo
```

Your training log will be shown in the master node terminal.
## Data

We release all 57k curated high-quality training samples in the data folder. The details of how the data was collected are described in our paper.
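The card does not specify the on-disk format, so the loader below is only a hypothetical sketch: it assumes JSON files under a repo-relative data folder and prints per-file sample counts. Adjust paths and field names to whatever the repo actually ships.

```python
import json
from pathlib import Path

DATA_DIR = Path("data")  # assumption: the repo's released data folder

for path in sorted(DATA_DIR.glob("*.json")):
    with path.open(encoding="utf-8") as f:
        samples = json.load(f)
    # Assumes each file holds a JSON array of training samples.
    print(f"{path.name}: {len(samples)} samples")
```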
## Acknowledgements

This work was supported by computing resources and valuable feedback provided by StepFun and Tsinghua University. Our training framework is built on OpenRLHF, vLLM, DeepSpeed, and Ray. Our model is based on Qwen2.5-7B and Qwen2.5-32B. We thank Project Numina and Tulu3 for their open-sourced data.

---
  ## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux): `brew install llama.cpp`