---
datasets: open-r1/openr1-220k-math
library_name: transformers
model_name: OpenR1-Qwen-7B
tags:
- generated_from_trainer
- trl
- sft
license: apache-2.0
---

# OpenR1-Qwen-7B

This is a fine-tune of [Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct) on the `default` split of [OpenR1-220k-Math](https://huggingface.co/datasets/open-r1/openr1-220k-math).

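## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "open-r1/OpenR1-Qwen-7B"

# Load the weights in their saved dtype and place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."

messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt},
]

# Render the conversation with the model's chat template, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# max_new_tokens is an illustrative budget; long reasoning traces may need more.
outputs = model.generate(**inputs, max_new_tokens=4096)

# Decode only the newly generated tokens, skipping the prompt.
new_tokens = outputs[0][inputs.input_ids.shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```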
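Because the system prompt asks the model to put its final answer inside `\boxed{}`, the answer can usually be pulled out of the generated text with a small brace-matching helper. The following is a minimal sketch; `extract_boxed` is a hypothetical helper written for illustration and is not part of `transformers` or the open-r1 codebase.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # braces never balanced, e.g. the generation was truncated


# Hand-written sample text for illustration, not actual model output.
sample = r"Subtracting $4x$ and $7$ from both sides gives $-2 = 2x$, so $x = \boxed{-1}$."
print(extract_boxed(sample))
```

Counting brace depth, rather than using a simple regular expression, keeps nested LaTeX such as `\boxed{\frac{1}{2}}` intact.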

## Training

We train the model on the `default` split of [OpenR1-220k-Math](https://huggingface.co/datasets/open-r1/openr1-220k-math) for 3 epochs. We use a learning rate of 5e-5 and extend the context length from 4k to 32k tokens by increasing the RoPE base frequency to 300k. Training follows a linear learning rate schedule with a 10% warmup phase. The table below compares the performance of OpenR1-Qwen-7B to DeepSeek-Distill-Qwen-7B and OpenThinker-7B, evaluated using [lighteval](https://github.com/huggingface/open-r1/tree/main?tab=readme-ov-file#evaluating-models).

You can find the training and evaluation code at: https://github.com/huggingface/open-r1/
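To make the numeric settings above concrete, here is a rough sketch in plain Python, not the actual training code. It assumes the linear schedule ramps from zero to the peak learning rate during warmup and then decays linearly to zero, which is a common convention but is not spelled out in this card. The step count, the head dimension of 128, and the comparison base of 10,000 are illustrative assumptions.

```python
import math


def linear_warmup_decay_lr(step, total_steps, peak_lr=5e-5, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to 0."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


def rope_longest_wavelength(base, head_dim=128):
    """Wavelength (in tokens) of the slowest-rotating RoPE dimension pair.

    Pair i rotates at angular frequency base ** (-2 * i / head_dim),
    so the last pair (i = head_dim / 2 - 1) has the longest wavelength.
    """
    slowest_freq = base ** (-(head_dim - 2) / head_dim)
    return 2 * math.pi / slowest_freq


total_steps = 1000  # hypothetical; the real value depends on dataset and batch size
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_decay_lr(step, total_steps))

# 10_000 is an assumed starting base used only for comparison; 300_000 is from this card.
for base in (10_000, 300_000):
    print(base, round(rope_longest_wavelength(base)))
```

Raising the base slows every rotation, so positions that are far apart are encoded with smaller angle differences. That is the usual motivation for increasing it when extending the context length.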

| Model | MATH-500 | AIME24 | AIME25 |
| --- | --- | --- | --- |
| DeepSeek-Distill-Qwen-7B | 91.6 | 43.3 | 40.0 |
| OpenR1-Qwen-7B | 90.6 | 36.7 | 40.0 |
| OpenThinker-7B | 89.6 | 30.0 | 33.3 |
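To put the percentages in perspective, the sketch below converts them back to approximate problem counts. It assumes each score is the percentage of problems solved in a single evaluation pass (the card does not state the sampling setup), with 500 problems in MATH-500 and 30 in each AIME set.

```python
# Benchmark sizes: MATH-500 has 500 problems, each AIME year has 30.
PROBLEMS = {"MATH-500": 500, "AIME24": 30, "AIME25": 30}

# Scores (in percent) copied from the table above.
SCORES = {
    "DeepSeek-Distill-Qwen-7B": {"MATH-500": 91.6, "AIME24": 43.3, "AIME25": 40.0},
    "OpenR1-Qwen-7B": {"MATH-500": 90.6, "AIME24": 36.7, "AIME25": 40.0},
    "OpenThinker-7B": {"MATH-500": 89.6, "AIME24": 30.0, "AIME25": 33.3},
}


def solved_counts(scores):
    """Convert percentage scores to the nearest whole number of solved problems."""
    return {bench: round(pct / 100 * PROBLEMS[bench]) for bench, pct in scores.items()}


for model, scores in SCORES.items():
    print(model, solved_counts(scores))
```

Under these assumptions, a gap of 6.6 points on AIME24 corresponds to two problems, so small differences on the AIME columns are best read with that granularity in mind.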