RabotniKuma and nielsr (HF Staff) committed
Commit abd20a4 · verified · 1 Parent(s): 1c9d3a9

Improve model card: Add pipeline tag, library name, update paper link and enhance details (#1)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1): README.md (+134 −12)

README.md:
---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- math
- reasoning
- llm
- mathematical-reasoning
- aimo
datasets:
- RabotniKuma/Fast-Math-R1-SFT
- RabotniKuma/Fast-Math-R1-GRPO
- open-r1/OpenR1-Math-220k
- hoanganhpham/openr1_hard
- qihoo360/Light-R1-SFTData
language:
- en
metrics:
- pass@1
---

# Kaggle AI Mathematical Olympiad - Progress Prize 2 - 9th Place Solution (Fast-Math-R1-14B)

This model was presented in the paper [A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning](https://huggingface.co/papers/2507.08267).

## Team
- Hiroshi Yoshihara @ [Aillis Inc.](https://aillis.jp/en), [The Univ. of Tokyo](https://publichealth.f.u-tokyo.ac.jp/#page_home)
- Yuichi Inoue @ [Sakana AI](https://sakana.ai)
- Taiki Yamaguchi @ [Rist Inc.](https://www.rist.co.jp/en/)

## Summary
By applying SFT and GRPO to difficult math problems, we enhanced the performance of `DeepSeek-R1-Distill-Qwen-14B` and developed `Fast-Math-R1-14B`, which achieves up to 60% faster inference (approximately 30% on average) while maintaining accuracy.

In addition, we trained and open-sourced `Fast-OpenMath-Nemotron-14B`, an efficiency-optimized version of NVIDIA's `OpenMath-Nemotron-14B`, following the same approach.

Technical details can be found in the [Kaggle Discussion](https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/discussion/571252) and on [GitHub](https://github.com/analokmaus/kaggle-aimo2-fast-math-r1).
## Evaluation
<img src="https://github.com/analokmaus/kaggle-aimo2-fast-math-r1/blob/master/assets/pass1_aime_all.png?raw=true" style="max-height: 400px">

### DS-R1-Qwen-14B vs Fast-Math-R1-14B (Ours)
| | | AIME 2024 | | AIME 2025 | |
| ---------------------------- | ------------ | ---------------- | ------------------ | ---------------- | ------------------ |
| Model | Token budget | Pass@1 (avg. 64) | Mean output tokens | Pass@1 (avg. 64) | Mean output tokens |
| … | … | … | … | … | … |
| | 12000 | 61.9 | 7362 | 45.2 | 8048 |
| | 8000 | 51.4 | 5939 | 36.3 | 6174 |

### OpenMath-Nemotron-14B vs Fast-OpenMath-Nemotron-14B (Ours)
| | | AIME 2024 | | AIME 2025 | |
| -------------------------- | ------------ | ---------------- | ------------------ | ---------------- | ------------------ |
| Model | Token budget | Pass@1 (avg. 64) | Mean output tokens | Pass@1 (avg. 64) | Mean output tokens |
| … | … | … | … | … | … |
| | 12000 | 59.4 | 7927 | 45.6 | 8752 |
| | 8000 | 47.6 | 6282 | 33.8 | 6589 |

### Qwen3-14B vs Fast-Math-Qwen3-14B
| | | AIME 2024 | | AIME 2025 | |
| ------------------- | ------------ | ---------------- | ------------------ | ---------------- | ------------------ |
| Model | Token budget | Pass@1 (avg. 64) | Mean output tokens | Pass@1 (avg. 64) | Mean output tokens |
| … | … | … | … | … | … |
| | 12000 | 65.1 | 7775 | 49.4 | 8733 |
| | 8000 | 50.7 | 6260 | 36.0 | 6618 |

## Dataset
- [Our first stage SFT dataset](https://huggingface.co/datasets/RabotniKuma/Fast-Math-R1-SFT)
- [Our second stage GRPO dataset](https://huggingface.co/datasets/RabotniKuma/Fast-Math-R1-GRPO)

## Inference
### vLLM
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Engine and sampling setup (elided in the diff view above); the values here
# are a minimal working configuration, not necessarily the card's originals.
model_path = 'RabotniKuma/Fast-Math-R1-14B'
vllm_engine = LLM(model=model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Generation is stopped at </think>: the model is trained to give its final
# boxed answer before closing the reasoning block (see "Format reward" below).
sampling_params = SamplingParams(max_tokens=8192, stop='</think>')

messages = [
    {
        'role': 'user',
        'content': (
            'Solve the problem, and put the answer in \\boxed{{}}. '
            'Sarah is twice as old as her youngest brother. If the difference between their ages is 15 years, how old is her youngest brother?'
        )
    }
]
messages = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
response = vllm_engine.generate(messages, sampling_params=sampling_params)
```
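
Since generation stops at `</think>` and the model is trained to put its final answer in `\boxed{}` before that tag (see the reward design below), the answer can be recovered with a small regex. This post-processing snippet is an illustrative sketch, not part of the original card:
```python
import re

# vLLM returns a list of RequestOutput objects; grab the generated text.
text = response[0].outputs[0].text
# Match the \boxed{...} answer emitted before </think>.
match = re.search(r'oxed{(.*?)}', text)
answer = match.group(1) if match else None
print(answer)  # e.g. '15' for the example problem above
```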

## Training models
### 1. Installation
```bash
poetry lock
poetry install --no-root
```

### 2. First stage training
Training time: approx. 10 hours (8× H200 GPUs)
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml --num_processes 8 \
experiments/train_first_stage.py
```
<img src="https://github.com/analokmaus/kaggle-aimo2-fast-math-r1/blob/master/assets/wandb_stage1.png?raw=true" style="max-height: 300px">

### 3. Second stage training
Training time: approx. 10 hours (8× H200 GPUs)
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
accelerate launch --config_file accelerate_configs/deepspeed_zero2.yaml --num_processes 8 \
experiments/train_second_stage.py
```
<img src="https://github.com/analokmaus/kaggle-aimo2-fast-math-r1/blob/master/assets/wandb_stage2.png?raw=true" style="max-height: 600px">

### (Optional) Token scheduler training
Training time: approx. 1 hour (8× H200 GPUs)

The token scheduler is a lightweight model that predicts the difficulty of a problem, measured by how many tokens the R1 model requires to reach the final answer. See the [Kaggle discussion](https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/discussion/571252) for details.
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml --num_processes 8 \
experiments/train_token_scheduler.py
```
<img src="https://github.com/analokmaus/kaggle-aimo2-fast-math-r1/blob/master/assets/wandb_token_scheduler.png?raw=true" style="max-height: 300px">
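
At inference time, the predicted difficulty can be mapped to a per-problem token budget. The snippet below is a hypothetical sketch: the checkpoint path, the regression-head loading, and the clamp range are assumptions, not the repo's actual interface.
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical path: point this at the output of train_token_scheduler.py.
ckpt = 'path/to/token_scheduler'
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=1)

problem = 'Sarah is twice as old as her youngest brother. ...'
inputs = tokenizer(problem, return_tensors='pt', truncation=True)
with torch.no_grad():
    predicted = model(**inputs).logits.item()  # predicted tokens-to-answer
budget = int(min(max(predicted, 2048), 12800))  # clamp to a workable range
```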

### (Optional) Fast-OpenMath-Nemotron-14B
Training time: approx. 12 hours (8× H200 GPUs)
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml --num_processes 8 \
experiments/train_fast_nemotron_14b.py
```

### (Optional) Fast-Math-Qwen3-14B
Training time: approx. 12 hours (8× H200 GPUs)

**Note:** You'll need to update your dependencies to train any of the Qwen3 series models.
```bash
# Update environment
cp dev/pyproject_qwen3.toml pyproject.toml
poetry lock
poetry install --no-root
# Train: GPUs 0-3 run GRPO training, GPUs 4-7 serve rollouts via vLLM
CUDA_VISIBLE_DEVICES=0,1,2,3 \
accelerate launch --config_file accelerate_configs/deepspeed_zero3_cpu_offload.yaml --num_processes 4 \
experiments/train_fast_qwen3_14b.py &
CUDA_VISIBLE_DEVICES=4,5,6,7 trl vllm-serve --model Qwen/Qwen3-14B --tensor_parallel_size 2 --data_parallel_size 2 &
wait
```

## Technical details
A detailed report is available in the [Kaggle Discussion](https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/discussion/571252).

### First stage: intensive SFT using a high-difficulty dataset
#### Dataset
- [OpenR1 Math](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k): We randomly sampled 3000 examples whose R1 trace exceeded 12800 tokens and had an accuracy above 50%, along with another 3000 examples whose accuracy ranged between 50% and 75%.
- [openr1_hard](https://huggingface.co/datasets/hoanganhpham/openr1_hard): "~2.5k hard samples from open-r1-math-220k. Samples deemed as hard were unsolvable by r1-distill-32b after 4 tries."
- [Light-R1-SFTData](https://huggingface.co/datasets/qihoo360/Light-R1-SFTData): We used the 2nd stage data from Light-R1-SFTData.

We merged the datasets above, removed duplicates, and for each problem kept the correct generation with the shortest token length. For samples in the Light-R1 dataset where ground truth answers were not provided, we extracted and substituted the answers from the R1 traces. The result is a **high-difficulty dataset of 7900 (problem, R1 trace, answer) sets**. A minimal sketch of the selection step follows.
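
A toy illustration of that selection step; the column names and the character-length proxy are assumptions for illustration, not the actual pipeline:
```python
import pandas as pd

# Toy stand-in for the merged sources; the real schema may differ.
df = pd.DataFrame({
    'problem': ['p1', 'p1', 'p2'],
    'trace':   ['a short correct trace', 'a much longer correct trace', 't'],
    'correct': [True, True, True],
})
df = df[df['correct']]                  # keep only correct generations
df['n_tokens'] = df['trace'].str.len()  # stand-in for a real token count
dedup = df.sort_values('n_tokens').drop_duplicates('problem', keep='first')
print(dedup[['problem', 'trace']])
```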

[Our first stage SFT dataset](https://huggingface.co/datasets/RabotniKuma/Fast-Math-R1-SFT)

#### Training
Full-parameter supervised fine-tuning was conducted on a machine with 8× H200 GPUs, using the `SFTTrainer` from the trl library.
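
The authoritative configuration is `experiments/train_first_stage.py`; the following is only a minimal `SFTTrainer` sketch of the same idea, with placeholder hyperparameters:
```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Minimal sketch; see experiments/train_first_stage.py for the real settings.
dataset = load_dataset('RabotniKuma/Fast-Math-R1-SFT', split='train')
trainer = SFTTrainer(
    model='deepseek-ai/DeepSeek-R1-Distill-Qwen-14B',
    train_dataset=dataset,
    args=SFTConfig(
        output_dir='fast-math-r1-sft',   # placeholder output path
        bf16=True,
        per_device_train_batch_size=1,   # placeholder hyperparameters
        gradient_accumulation_steps=8,
    ),
)
trainer.train()
```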

### Second stage: GRPO for more efficient reasoning
#### Dataset
- [Light-R1-SFTData](https://huggingface.co/datasets/qihoo360/Light-R1-SFTData): We extracted the answers from the 2nd stage SFT data of Light-R1.

[Our second stage GRPO dataset](https://huggingface.co/datasets/RabotniKuma/Fast-Math-R1-GRPO)

#### Training
We used a [faster implementation of trl's GRPOTrainer](https://github.com/nhannguyen2709/open-r1); a minimal wiring sketch is shown below, and the reward functions follow.
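
This sketch shows the wiring with trl's stock `GRPOTrainer` API; the model path and config values are placeholders, the real settings live in `experiments/train_second_stage.py` and the linked fork, and `format_reward` is sketched under the reward list below.
```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Minimal sketch; the actual run uses the faster fork linked above.
dataset = load_dataset('RabotniKuma/Fast-Math-R1-GRPO', split='train')
trainer = GRPOTrainer(
    model='path/to/first_stage_checkpoint',  # placeholder: the stage-1 SFT model
    reward_funcs=[format_reward],  # plus cosine and length rewards, sketched below
    args=GRPOConfig(output_dir='fast-math-r1-grpo'),
    train_dataset=dataset,
)
trainer.train()
```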

Reward functions:
1. Format reward

To save output tokens, we forced the model to give its answer at the end of the reasoning block, just before `</think>`, by rewarding the pattern `r"^.*?oxed{(.*?)}.*?</think>.*?$"`. Generation is stopped at `</think>` during inference. A sketch of this reward follows.
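
A sketch of this reward in trl's reward-function form, using the pattern above; the reward values and the `re.DOTALL` flag are assumptions:
```python
import re

# `.` must also match newlines, since reasoning traces span many lines.
FORMAT_PATTERN = re.compile(r"^.*?oxed{(.*?)}.*?</think>.*?$", re.DOTALL)

def format_reward(completions, **kwargs):
    # 1.0 when a boxed answer appears before </think>, else 0.0.
    return [1.0 if FORMAT_PATTERN.match(c) else 0.0 for c in completions]
```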

2. Cosine reward

Compared to a plain accuracy-based reward, the cosine reward applies a continuous penalty to longer correct reasoning traces and to shorter incorrect ones, as sketched below.
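
A sketch of the idea; the boundary reward values and the maximum length are placeholders, not the training values:
```python
import math

def cosine_reward(is_correct, num_tokens, max_len=16384,
                  r_correct=(1.0, 0.5), r_wrong=(-1.0, -0.5)):
    # Cosine-interpolate between a reward at length 0 and one at max_len:
    # correct answers earn less as traces grow, while incorrect answers
    # are penalized hardest when they are short.
    r_short, r_long = r_correct if is_correct else r_wrong
    t = min(num_tokens / max_len, 1.0)
    w = 0.5 * (1.0 + math.cos(math.pi * t))  # 1 at t=0, 0 at t=1
    return r_long + (r_short - r_long) * w
```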

3. Length reward

A length-based reward that discourages overthinking and promotes token efficiency, following the paper https://arxiv.org/abs/2501.12599 (Kimi k1.5). A sketch follows.
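
A sketch following the formulation in the linked paper, where lengths are normalized within the group of responses sampled for one problem; treat the exact constants as assumptions:
```python
def length_reward(lengths, correctness):
    # Normalize within one group of responses sampled for the same problem.
    min_len, max_len = min(lengths), max(lengths)
    if max_len == min_len:
        return [0.0] * len(lengths)
    rewards = []
    for n, ok in zip(lengths, correctness):
        lam = 0.5 - (n - min_len) / (max_len - min_len)  # +0.5 shortest, -0.5 longest
        rewards.append(lam if ok else min(0.0, lam))     # never reward a wrong answer
    return rewards
```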