Update README.md

README.md (CHANGED)

# Welcome to FairyR1-32B created by PKU-DS-LAB!

| Benchmark                 | DeepSeek-R1-671B | DeepSeek-R1-Distill-Qwen-32B | FairyR1-32B (PKU) |
| :-----------------------: | :--------------: | :--------------------------: | :---------------: |
| **AIME 2024 (Math)**      | 79.8             | 72.6                         | **80.4**          |
| **LiveCodeBench (Code)**  | 65.9             | 57.2                         | **67.7**          |
| **GPQA-Diamond (Sci-QA)** | **71.5**         | 62.1                         | 60.0              |

## Introduction

FairyR1-32B is a highly efficient large language model (LLM) that matches or exceeds larger models on select tasks despite using only ~5% of their parameters. Built atop the DeepSeek-R1-Distill-Qwen-32B base, FairyR1-32B leverages a novel “distill-and-merge” pipeline, combining task-focused fine-tuning with model-merging techniques, to deliver competitive performance with drastically reduced size and inference cost. This project was funded by NSFC, Grant 624B2005.

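The card's metadata lists `library_name: transformers`, so the model should load through the standard Transformers API. Below is a minimal loading-and-generation sketch; the repo id `PKU-DS-LAB/FairyR1-32B`, the chat-template usage, and the sampling settings are assumptions rather than values taken from this card.

```python
# Minimal sketch of loading and querying FairyR1-32B with Hugging Face Transformers.
# The repo id, chat-template usage, and sampling settings below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PKU-DS-LAB/FairyR1-32B"  # assumed repo id; check the actual model page

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 32B parameters: bf16 roughly halves memory vs fp32
    device_map="auto",           # shard across available GPUs
)

messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
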
## Model Details

- **Hours used (Coding):** 1.5h
- **Model Merging:** about 40 min on CPU, no GPU needed (see the merging sketch below).

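The exact merge recipe is not spelled out in this excerpt, only that merging runs in about 40 minutes on CPU. As a minimal illustration of CPU-only weight merging, the sketch below averages the parameters of two hypothetical task-specialized fine-tunes of the same base; the checkpoint paths, the equal weighting, and the output directory are placeholders, not the actual FairyR1 procedure.

```python
# Illustrative only: a plain parameter average of two same-architecture fine-tunes,
# computed entirely on CPU. The real FairyR1 merge recipe is not given in this card.
import torch
from transformers import AutoModelForCausalLM

math_ckpt = "path/to/math-finetune"  # placeholder checkpoint paths
code_ckpt = "path/to/code-finetune"

math_model = AutoModelForCausalLM.from_pretrained(math_ckpt, torch_dtype=torch.bfloat16)
code_model = AutoModelForCausalLM.from_pretrained(code_ckpt, torch_dtype=torch.bfloat16)

code_state = code_model.state_dict()
merged_state = {}
for name, math_param in math_model.state_dict().items():
    # Equal-weight average; practical merges often use per-tensor or per-layer weights.
    merged_state[name] = ((math_param.float() + code_state[name].float()) / 2.0).to(torch.bfloat16)

math_model.load_state_dict(merged_state)
math_model.save_pretrained("fairyr1-merged")  # placeholder output directory
```
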
### Evaluation Set

- AIME 2024/2025 (math): We evaluate 32 times and report the average accuracy. [AIME 2024](https://huggingface.co/datasets/HuggingFaceH4/aime_2024) contains 30 problems. [AIME 2025](https://huggingface.co/datasets/MathArena/aime_2025) consists of Part I and Part II, with a total of 30 questions.
- [LiveCodeBench (code)](https://huggingface.co/datasets/livecodebench/code_generation_lite): We evaluate 8 times and report the average accuracy. The dataset version is "release_v5" (date range: 2024-08-01 to 2025-02-01), consisting of 279 problems.
- [GPQA-Diamond (Sci-QA)](https://huggingface.co/datasets/Idavidrein/gpqa): We evaluate 8 times and report the average accuracy. The dataset consists of 198 problems.

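The protocol above ("evaluate N times and report the average accuracy") amounts to running the full benchmark several times and averaging the per-run accuracy. A small sketch of that bookkeeping follows; `generate_answer` and `is_correct` are placeholder hooks for the model call and the task-specific grader, not functions from this repository.

```python
# Sketch of the "evaluate N times, report average accuracy" protocol described above.
# `generate_answer` and `is_correct` are placeholders for the model call and the grader.
from statistics import mean

def average_accuracy(problems, generate_answer, is_correct, num_runs):
    """Run the whole benchmark num_runs times and average the per-run accuracy."""
    per_run = []
    for _ in range(num_runs):  # e.g. 32 for AIME, 8 for LiveCodeBench and GPQA-Diamond
        correct = sum(is_correct(p, generate_answer(p)) for p in problems)
        per_run.append(correct / len(problems))
    return mean(per_run)
```
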
## FairyR1 series Team Members: