---
datasets: open-r1/openr1-220k-math
library_name: transformers
model_name: OpenR1-Qwen-7B
tags:
- generated_from_trainer
- trl
- sft
license: apache-2.0
---

# OpenR1-Qwen-7B

This is a fine-tune of [Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct) on [OpenR1-220k-Math](https://huggingface.co/datasets/open-r1/openr1-220k-math) (`default` split).

## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "open-r1/OpenR1-Qwen-7B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."

messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt},
]

# Render the chat template, generate, and strip the prompt from the output.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=4096)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

## Training

We train the model on the `default` split of [OpenR1-220k-Math](https://huggingface.co/datasets/open-r1/openr1-220k-math) for 3 epochs. We use a learning rate of 5e-5 and extend the context length from 4k to 32k tokens by increasing the RoPE frequency to 300k. Training follows a linear learning-rate schedule with a 10% warmup phase.
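
For reference, a minimal sketch of this setup with TRL's `SFTTrainer` might look as follows. This is an illustration assembled from the description above, not the actual open-r1 training script; the config kwargs (`rope_theta`, `max_position_embeddings`) and the `max_seq_length` argument name are assumptions and may differ across transformers/TRL versions.

```python
# Hypothetical sketch of the training setup described above; the real
# script lives in the open-r1 repository and may differ in its details.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("open-r1/openr1-220k-math", "default", split="train")

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Math-7B-Instruct",
    torch_dtype="auto",
    # Extend the context window from 4k to 32k by raising the RoPE base
    # frequency to 300k (assumed kwargs, forwarded to the model config).
    rope_theta=300000.0,
    max_position_embeddings=32768,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-7B-Instruct")

training_args = SFTConfig(
    output_dir="OpenR1-Qwen-7B",
    num_train_epochs=3,          # 3 epochs on the `default` split
    learning_rate=5e-5,
    lr_scheduler_type="linear",  # linear schedule ...
    warmup_ratio=0.1,            # ... with a 10% warmup phase
    max_seq_length=32768,        # renamed to `max_length` in newer TRL
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```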

You can find the training and evaluation code at https://github.com/huggingface/open-r1/. The table below compares the performance of OpenR1-Qwen-7B to DeepSeek-Distill-Qwen-7B and OpenThinker-7B, evaluated with [lighteval](https://github.com/huggingface/open-r1/tree/main?tab=readme-ov-file#evaluating-models).

| Model | MATH-500 | AIME24 | AIME25 |
| --- | --- | --- | --- |
| DeepSeek-Distill-Qwen-7B | 91.6 | 43.3 | 40.0 |
| OpenR1-Qwen-7B | 90.6 | 36.7 | 40.0 |
| OpenThinker-7B | 89.6 | 30.0 | 33.3 |