infly/Universal-PRM-7B

@@ -1,7 +1,13 @@
 ---
 license: apache-2.0
 ---
 # Universal-PRM-7B
 ## 1. Overview
 Universal-PRM is trained using Qwen2.5-Math-7B-Instruct as the base. The training process incorporates diverse policy distributions, ensemble prompting, and reverse verification to enhance generalization and robustness. It achieves state-of-the-art performance on ProcessBench and the internally developed UniversalBench.
 ## 2. Experiments
@@ -75,5 +81,4 @@ with torch.no_grad():
         judge_list_infer.append(reward)
 print(judge_list_infer)     # [0.73828125, 0.7265625, 0.73046875, 0.73828125, 0.734375]
-```

 ---
 license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
 # Universal-PRM-7B
+Project page: https://auroraprm.github.io/
 ## 1. Overview
 Universal-PRM is trained using Qwen2.5-Math-7B-Instruct as the base. The training process incorporates diverse policy distributions, ensemble prompting, and reverse verification to enhance generalization and robustness. It achieves state-of-the-art performance on ProcessBench and the internally developed UniversalBench.
 ## 2. Experiments
         judge_list_infer.append(reward)
 print(judge_list_infer)     # [0.73828125, 0.7265625, 0.73046875, 0.73828125, 0.734375]
+```
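
The printed `judge_list_infer` holds five rewards, presumably one per reasoning step of the scored solution. The snippet below is a minimal sketch of how such per-step scores might be turned into a first-error prediction. The helper name `first_error_step`, the 0.5 threshold, and the `-1` "no error" convention are assumptions for illustration and do not come from the model card.

```python
# Per-step rewards copied from the printed output in the README example.
step_rewards = [0.73828125, 0.7265625, 0.73046875, 0.73828125, 0.734375]


def first_error_step(rewards, threshold=0.5):
    """Return the index of the first step scored below `threshold`, or -1 if none.

    Hypothetical helper: the threshold value is an assumption and would
    need tuning on held-out data before being relied on.
    """
    for index, reward in enumerate(rewards):
        if reward < threshold:
            return index
    return -1


print(first_error_step(step_rewards))
```

With the rewards shown, every step clears the assumed threshold, so the helper reports no erroneous step.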