Add library_name, pipeline_tag, and project page link (#2)
Opened by nielsr (HF staff)
README.md (changed)
@@ -1,7 +1,13 @@
 ---
 license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 # Universal-PRM-7B
+
+Project page: https://auroraprm.github.io/
+
 ## 1. Overview
 Universal-PRM is trained using Qwen2.5-Math-7B-Instruct as the base. The training process incorporates diverse policy distributions, ensemble prompting, and reverse verification to enhance generalization and robustness. It achieves state-of-the-art performance on ProcessBench and the internally developed UniversalBench.
 ## 2. Experiments
@@ -75,5 +81,4 @@ with torch.no_grad():
 judge_list_infer.append(reward)
 
 print(judge_list_infer) # [0.73828125, 0.7265625, 0.73046875, 0.73828125, 0.734375]
-
-```
+```
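After this change, the README's YAML front matter holds three flat `key: value` pairs (`license`, `library_name`, `pipeline_tag`). The stdlib-only Python sketch that follows illustrates how those pairs can be read back out of the card. `read_front_matter` and the `README` string are hypothetical names used only for illustration. The helper also assumes single-line scalar values, so it is not a general YAML parser.

```python
# Minimal sketch (hypothetical helper): read flat "key: value" pairs from the
# YAML front matter of a model card. Assumes single-line scalar values only;
# lists, nested keys, or quoted multi-line strings need a real YAML parser.

README = """---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

# Universal-PRM-7B

Project page: https://auroraprm.github.io/
"""


def read_front_matter(text: str) -> dict[str, str]:
    """Return the key/value pairs between the leading pair of '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # the card has no front matter block
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing fence reached: metadata is complete
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # the opening fence was never closed


if __name__ == "__main__":
    meta = read_front_matter(README)
    print(meta["library_name"], meta["pipeline_tag"])  # transformers text-generation
```

On the Hub, `library_name` declares which library loads the model, and `pipeline_tag` sets the task the model is listed under. For real cards, a full YAML parser or the `huggingface_hub` library's `ModelCard` class is the safer choice.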
|