Commit 1c9d3a9 by RabotniKuma (verified) · 1 parent: cbb3d80

Update README.md

Files changed (1)
  1. README.md +47 -17
README.md CHANGED
@@ -16,23 +16,53 @@ which achieves up to 60% (on average approx. 30%) faster inference while maintai
 
  Technical details can be found in [Kaggle Discussion](https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/discussion/571252) and [Github](https://github.com/analokmaus/kaggle-aimo2-fast-math-r1).
 
- <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1973217%2F4f221ab914f3e950fa35bdab5723d462%2Fpass1_aime_all.png?generation=1744851665782759&alt=media" max-height="300px">
 
- | | | AIME 2024 | | AIME 2025 | |
- | ---------------------------- | ------------ | ---------------- | ------------- | ---------------- | ------------- |
- | Model | Token budget | Pass@1 (avg. 64) | Output tokens | Pass@1 (avg. 64) | Output tokens |
- | DeepSeek-R1-Distill-Qwen-14B | 16384 | 63.3 | 9590 | 46.7 | 10602 |
- | | 12800 | 58 | 8632 | 41.9 | 9363 |
- | | 8192 | 45.6 | 6638 | 30.6 | 6897 |
- | Light-R1-14B-DS | 16384 | **66.8** | 10146 | **51.3** | 11308 |
- | | 12800 | 59.2 | 9110 | 43.8 | 9834 |
- | | 8192 | 42.4 | 7020 | 30.4 | 7124 |
- | Fast-Math-R1-14B | 16384 | 66 | **7932** | 49.2 | **9066** |
- | | 12800 | **63** | **7449** | **46.1** | **8282** |
- | | 8192 | **51.4** | **5963** | **37.2** | **6256** |
- | Fast-Math-R1-14B-SFT Only | 16384 | 65.2 | 10268 | 49.7 | 11264 |
- | | 12800 | 57.2 | 9180 | 42.8 | 9805 |
- | | 8192 | 41.3 | 7015 | 30.1 | 7074 |
+ # Evaluation
+ <img src="https://github.com/analokmaus/kaggle-aimo2-fast-math-r1/blob/master/assets/pass1_aime_all.png?raw=true" max-height="400px">
+
+ ## DS-R1-Qwen-14B vs Fast-Math-R1-14B (Ours)
+ | | | AIME 2024 | | AIME 2025 | |
+ | ---------------------------- | ------------ | ---------------- | ------------------ | ---------------- | ------------------ |
+ | Model | Token budget | Pass@1 (avg. 64) | Mean output tokens | Pass@1 (avg. 64) | Mean output tokens |
+ | DeepSeek-R1-Distill-Qwen-14B | 32000 | 66.9 | 11026 | 49.9 | 12310 |
+ | | 24000 | 65.7 | 10784 | 49.7 | 11978 |
+ | | 16000 | 61 | 9708 | 46.2 | 10567 |
+ | | 12000 | 53.7 | 8472 | 39.9 | 9008 |
+ | | 8000 | 41.8 | 6587 | 31.1 | 6788 |
+ | Fast-Math-R1-14B | 32000 | 68 | 8217 | 49.6 | 9663 |
+ | | 24000 | 67.9 | 8209 | 49.6 | 9627 |
+ | | 16000 | 66.7 | 8017 | 48.4 | 9083 |
+ | | 12000 | 61.9 | 7362 | 45.2 | 8048 |
+ | | 8000 | 51.4 | 5939 | 36.3 | 6174 |
+
+ ## OpenMath-Nemotron-14B vs Fast-OpenMath-Nemotron-14B (Ours)
+ | | | AIME 2024 | | AIME 2025 | |
+ | -------------------------- | ------------ | ---------------- | ------------------ | ---------------- | ------------------ |
+ | Model | Token budget | Pass@1 (avg. 64) | Mean output tokens | Pass@1 (avg. 64) | Mean output tokens |
+ | OpenMath-Nemotron-14B | 32000 | 76.2 | 11493 | 64.5 | 13414 |
+ | | 24000 | 75.4 | 11417 | 63.4 | 13046 |
+ | | 16000 | 66 | 10399 | 54.2 | 11422 |
+ | | 12000 | 55 | 9053 | 40 | 9609 |
+ | | 8000 | 36 | 6978 | 27.2 | 7083 |
+ | [Fast-OpenMath-Nemotron-14B](https://huggingface.co/RabotniKuma/Fast-OpenMath-Nemotron-14B) | 32000 | 70.7 | 9603 | 61.4 | 11424 |
+ | | 24000 | 70.6 | 9567 | 60.9 | 11271 |
+ | | 16000 | 66.6 | 8954 | 55.3 | 10190 |
+ | | 12000 | 59.4 | 7927 | 45.6 | 8752 |
+ | | 8000 | 47.6 | 6282 | 33.8 | 6589 |
+
+ ## Qwen3-14B vs Fast-Math-Qwen-14B
+ | | | AIME 2024 | | AIME 2025 | |
+ | ------------------- | ------------ | ---------------- | ------------------ | ---------------- | ------------------ |
+ | Model | Token budget | Pass@1 (avg. 64) | Mean output tokens | Pass@1 (avg. 64) | Mean output tokens |
+ | Qwen3-14B | 32000 | 79.3 | 13669 | 69.5 | 16481 |
+ | | 24000 | 75.9 | 13168 | 65.6 | 15235 |
+ | | 16000 | 64.5 | 11351 | 50.4 | 12522 |
+ | | 12000 | 49.7 | 9746 | 36.3 | 10353 |
+ | | 8000 | 28.4 | 7374 | 19.5 | 7485 |
+ | [Fast-Math-Qwen3-14B](https://huggingface.co/RabotniKuma/Fast-Math-Qwen3-14B) | 32000 | 77.6 | 9740 | 66.6 | 12281 |
+ | | 24000 | 76.5 | 9634 | 65.3 | 11847 |
+ | | 16000 | 72.6 | 8793 | 60.1 | 10195 |
+ | | 12000 | 65.1 | 7775 | 49.4 | 8733 |
+ | | 8000 | 50.7 | 6260 | 36 | 6618 |
 
 
  # Dataset
@@ -61,7 +91,7 @@ sampling_params = SamplingParams(
  top_p=0.90,
  min_p=0.05,
  max_tokens=8192,
- stop='</think>', # Important!: early stop at </think> to save output tokens
+ stop='</think>', # For even faster inference, applying early stopping at the </think> tag and extracting the final boxed content is recommended.
  )
  messages = [
  {
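The new comment on the `stop` parameter recommends stopping generation at `</think>` and then extracting the final boxed content. Below is a minimal sketch of that extraction step. It assumes that the model writes its answer as `\boxed{...}` inside the reasoning before emitting `</think>`. The helper name `extract_last_boxed` is hypothetical and is not part of the repository. The helper takes the last `\boxed{...}` in the text and matches braces by depth, so nested LaTeX such as `\frac{84}{2}` is kept whole.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, or None.

    Hypothetical helper: braces are matched by depth counting, so nested
    groups such as \\boxed{\\frac{1}{2}} are returned whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # the model never wrote a boxed answer
    begin = start + len(marker)
    depth = 1
    for i in range(begin, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # unbalanced braces: output was truncated mid-answer


# Hypothetical reasoning text, as returned when generation stops at '</think>'.
reasoning = "So x = 41 + 1, giving \\boxed{\\frac{84}{2}}. Simplifying: \\boxed{42}"
print(extract_last_boxed(reasoning))  # 42
```

If `max_tokens` cuts off generation before the box closes, the sketch returns `None`. A caller would then need a fallback, such as treating that sample as unanswered.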
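The updated tables report mean output tokens at each token budget. The sketch below gives a rough sense of the savings: it computes the relative reduction in mean output tokens on AIME 2024 for Fast-Math-R1-14B versus DeepSeek-R1-Distill-Qwen-14B, using numbers copied from the first new table in this diff. Fewer output tokens is only a proxy for faster inference, not a latency measurement. The names `baseline`, `fast`, and `token_reduction` are illustrative only.

```python
# Mean output tokens on AIME 2024, copied from the
# "DS-R1-Qwen-14B vs Fast-Math-R1-14B (Ours)" table (token budget -> tokens).
baseline = {  # DeepSeek-R1-Distill-Qwen-14B
    32000: 11026, 24000: 10784, 16000: 9708, 12000: 8472, 8000: 6587,
}
fast = {  # Fast-Math-R1-14B
    32000: 8217, 24000: 8209, 16000: 8017, 12000: 7362, 8000: 5939,
}


def token_reduction(base: int, new: int) -> float:
    """Fraction of output tokens saved relative to the baseline model."""
    return 1.0 - new / base


# Print the savings from the largest budget down to the smallest.
for budget in sorted(baseline, reverse=True):
    saved = token_reduction(baseline[budget], fast[budget])
    print(f"budget {budget:>5}: {saved:.1%} fewer mean output tokens")
```

At the 32000-token budget this gives roughly 25% fewer tokens, shrinking to about 10% at 8000. The gap narrows at small budgets because the baseline's outputs are increasingly truncated by the budget itself.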