Update README.md
README.md
@@ -16,23 +16,53 @@ which achieves up to 60% (on average approx. 30%) faster inference while maintai

Technical details can be found in [Kaggle Discussion](https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/discussion/571252) and [GitHub](https://github.com/analokmaus/kaggle-aimo2-fast-math-r1).

# Evaluation
<img src="https://github.com/analokmaus/kaggle-aimo2-fast-math-r1/blob/master/assets/pass1_aime_all.png?raw=true" max-height="400px">

## DS-R1-Qwen-14B vs Fast-Math-R1-14B (Ours)

| Model | Token budget | AIME 2024 Pass@1 (%, avg. of 64 samples) | AIME 2024 mean output tokens | AIME 2025 Pass@1 (%, avg. of 64 samples) | AIME 2025 mean output tokens |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-14B | 32000 | 66.9 | 11026 | 49.9 | 12310 |
| | 24000 | 65.7 | 10784 | 49.7 | 11978 |
| | 16000 | 61 | 9708 | 46.2 | 10567 |
| | 12000 | 53.7 | 8472 | 39.9 | 9008 |
| | 8000 | 41.8 | 6587 | 31.1 | 6788 |
| Fast-Math-R1-14B | 32000 | 68 | 8217 | 49.6 | 9663 |
| | 24000 | 67.9 | 8209 | 49.6 | 9627 |
| | 16000 | 66.7 | 8017 | 48.4 | 9083 |
| | 12000 | 61.9 | 7362 | 45.2 | 8048 |
| | 8000 | 51.4 | 5939 | 36.3 | 6174 |

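To make the token savings in the table above concrete, the following small sketch (plain Python, no dependencies) takes the AIME 2024 rows exactly as listed in the table and computes, for each token budget, the Pass@1 difference in percentage points and the relative reduction in mean output tokens. The names `baseline`, `fast` and `compare` are illustrative only and are not part of the repository.

```python
# AIME 2024 values copied from the table above:
# token budget -> (Pass@1 in %, mean output tokens)
baseline = {  # DeepSeek-R1-Distill-Qwen-14B
    32000: (66.9, 11026),
    24000: (65.7, 10784),
    16000: (61.0, 9708),
    12000: (53.7, 8472),
    8000: (41.8, 6587),
}
fast = {  # Fast-Math-R1-14B
    32000: (68.0, 8217),
    24000: (67.9, 8209),
    16000: (66.7, 8017),
    12000: (61.9, 7362),
    8000: (51.4, 5939),
}


def compare(base, ours):
    """Return {budget: (Pass@1 difference in points, token reduction in %)}."""
    rows = {}
    for budget in sorted(base, reverse=True):
        base_acc, base_tokens = base[budget]
        our_acc, our_tokens = ours[budget]
        # Relative reduction of mean output tokens versus the baseline model.
        token_reduction = 100.0 * (base_tokens - our_tokens) / base_tokens
        rows[budget] = (round(our_acc - base_acc, 1), round(token_reduction, 1))
    return rows


for budget, (acc_delta, token_cut) in compare(baseline, fast).items():
    print(f"{budget:>5} tokens: Pass@1 {acc_delta:+.1f} pts, "
          f"{token_cut:.1f}% fewer output tokens")
```

At the 32000-token budget this works out to roughly a 25% reduction in mean output tokens on AIME 2024; the same function can be fed the AIME 2025 columns or the rows of the other tables.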
## OpenMath-Nemotron-14B vs Fast-OpenMath-Nemotron-14B (Ours)

| Model | Token budget | AIME 2024 Pass@1 (%, avg. of 64 samples) | AIME 2024 mean output tokens | AIME 2025 Pass@1 (%, avg. of 64 samples) | AIME 2025 mean output tokens |
| --- | --- | --- | --- | --- | --- |
| OpenMath-Nemotron-14B | 32000 | 76.2 | 11493 | 64.5 | 13414 |
| | 24000 | 75.4 | 11417 | 63.4 | 13046 |
| | 16000 | 66 | 10399 | 54.2 | 11422 |
| | 12000 | 55 | 9053 | 40 | 9609 |
| | 8000 | 36 | 6978 | 27.2 | 7083 |
| [Fast-OpenMath-Nemotron-14B](https://huggingface.co/RabotniKuma/Fast-OpenMath-Nemotron-14B) | 32000 | 70.7 | 9603 | 61.4 | 11424 |
| | 24000 | 70.6 | 9567 | 60.9 | 11271 |
| | 16000 | 66.6 | 8954 | 55.3 | 10190 |
| | 12000 | 59.4 | 7927 | 45.6 | 8752 |
| | 8000 | 47.6 | 6282 | 33.8 | 6589 |

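The Pass@1 columns are averaged over 64 samples. As we read it (an assumption, since the README does not spell it out), this means Pass@1 is estimated per problem as the fraction of correct answers among 64 sampled generations, then averaged over all problems and reported in percent. A minimal sketch of that estimator, where the `correct` matrix and the `pass_at_1` function are hypothetical:

```python
from typing import Sequence


def pass_at_1(correct: Sequence[Sequence[bool]]) -> float:
    """Mean Pass@1 in percent.

    correct[i][j] is True when sample j for problem i gave the right answer.
    Per problem, Pass@1 is estimated as (#correct samples) / (#samples);
    the benchmark score is the mean of those estimates over problems.
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in correct:
        if not samples:
            raise ValueError("each problem needs at least one sample")
        per_problem.append(sum(samples) / len(samples))  # bools sum as 0/1
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data: 2 problems x 64 samples (made up, not taken from the table).
toy = [[True] * 48 + [False] * 16, [True] * 16 + [False] * 48]
print(pass_at_1(toy))
```

Averaging over many samples rather than scoring a single generation reduces the variance of the reported number, which matters on a 30-problem benchmark like AIME.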
## Qwen3-14B vs Fast-Math-Qwen3-14B (Ours)

| Model | Token budget | AIME 2024 Pass@1 (%, avg. of 64 samples) | AIME 2024 mean output tokens | AIME 2025 Pass@1 (%, avg. of 64 samples) | AIME 2025 mean output tokens |
| --- | --- | --- | --- | --- | --- |
| Qwen3-14B | 32000 | 79.3 | 13669 | 69.5 | 16481 |
| | 24000 | 75.9 | 13168 | 65.6 | 15235 |
| | 16000 | 64.5 | 11351 | 50.4 | 12522 |
| | 12000 | 49.7 | 9746 | 36.3 | 10353 |
| | 8000 | 28.4 | 7374 | 19.5 | 7485 |
| [Fast-Math-Qwen3-14B](https://huggingface.co/RabotniKuma/Fast-Math-Qwen3-14B) | 32000 | 77.6 | 9740 | 66.6 | 12281 |
| | 24000 | 76.5 | 9634 | 65.3 | 11847 |
| | 16000 | 72.6 | 8793 | 60.1 | 10195 |
| | 12000 | 65.1 | 7775 | 49.4 | 8733 |
| | 8000 | 50.7 | 6260 | 36 | 6618 |

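Another way to read the Qwen3 table is to ask how much of each model's 32000-token Pass@1 survives as the token budget shrinks. The sketch below does this for the AIME 2025 columns, with values copied from the table; `retention` is an illustrative helper, not repository code.

```python
# AIME 2025 Pass@1 (%) copied from the table above, keyed by token budget.
qwen3 = {32000: 69.5, 24000: 65.6, 16000: 50.4, 12000: 36.3, 8000: 19.5}
fast_qwen3 = {32000: 66.6, 24000: 65.3, 16000: 60.1, 12000: 49.4, 8000: 36.0}


def retention(scores, reference_budget=32000):
    """Pass@1 at each budget as a percentage of Pass@1 at reference_budget."""
    ref = scores[reference_budget]
    return {
        budget: round(100.0 * score / ref, 1)
        for budget, score in sorted(scores.items(), reverse=True)
    }


for name, scores in (("Qwen3-14B", qwen3), ("Fast-Math-Qwen3-14B", fast_qwen3)):
    print(name, retention(scores))
```

On these numbers, Fast-Math-Qwen3-14B keeps about 54% of its 32000-token AIME 2025 score at an 8000-token budget, versus about 28% for Qwen3-14B.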
# Dataset

```diff
@@ -61,7 +91,7 @@ sampling_params = SamplingParams(
     top_p=0.90,
     min_p=0.05,
     max_tokens=8192,
-    stop='</think>',
+    stop='</think>', # For even faster inference, applying early stopping at the </think> tag and extracting the final boxed content is recommended.
 )
 messages = [
     {
```
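The new comment recommends stopping generation at the `</think>` tag and extracting the final boxed answer. A minimal sketch of the extraction step, assuming the model writes its answer as `\boxed{...}` somewhere in the text generated before `</think>`; `extract_last_boxed` is a hypothetical helper, not part of vLLM or of the repository.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Braces are matched by depth counting, so nested groups such as
    \\boxed{\\frac{1}{2}} are returned whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces, e.g. output cut off mid-answer


# Hypothetical reasoning text, as it might look after stopping at </think>.
reasoning = r"... so the answer is \boxed{5}. Double-check: \boxed{\frac{10}{2}}"
print(extract_last_boxed(reasoning))
```

Taking the last `\boxed{}` rather than the first is a design choice: reasoning models often revise an early answer, so the final occurrence is usually the committed one. Returning `None` on unbalanced braces lets the caller fall back to another strategy (for example, letting the model continue past `</think>`).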