Tej3 committed · commit 785f95d (verified) · parent: baca193

Update README.md

Files changed (1): README.md (+50 −3)
README.md CHANGED
@@ -6,6 +6,8 @@ base_model:
 - Qwen/Qwen2.5-Math-7B-Instruct
 pipeline_tag: text-classification
 library_name: transformers
+datasets:
+- declare-lab/PathFinder-600K
 ---
 
 # PathFinder-PRM-7B
@@ -181,9 +183,54 @@ reward_score = run_inference(messages)
 
 ### Results
 
-[More Information Needed]
+![benchmark_comparison.png](images/benchmark_comparison.png)
 
-<!-- #### Summary -->
+#### PRMBench Results
+
+| Model | Simplicity | Soundness | Sensitivity | Overall |
+|----------------------------------|------------|-----------|-------------|---------|
+| **LLM-as-judge, Proprietary Language Models** | | | | |
+| Gemini-2.0-thinking-exp-1219 | 66.2 | 71.8 | 75.3 | 68.8 |
+| GPT-4o | 59.7 | 70.9 | 75.8 | 66.8 |
+| **LLM-as-judge, Open-source Language Models** | | | | |
+| Qwen-2.5-Math-72B | 55.1 | 61.1 | 67.1 | 57.4 |
+| QwQ-Preview-32B | 56.4 | 68.2 | 73.5 | 63.6 |
+| **Discriminative Process Reward Models** | | | | |
+| Math-Shepherd-7B | 47.1 | 45.7 | 60.7 | 47.0 |
+| Math-PSA-7B | 51.3 | 51.8 | 64.9 | 52.3 |
+| RLHFlow-Mistral-8B | 46.7 | 57.5 | 68.5 | 54.4 |
+| Llemma-PRM800K-7B | 51.4 | 50.9 | 66.0 | 52.0 |
+| ReasonEval-7B | 55.5 | 63.9 | 71.0 | 60.0 |
+| Qwen2.5-Math-PRM-7B | 52.1 | **71.0** | 75.5 | 65.5 |
+| 🟢 PathFinder-PRM-7B | **58.9** | 70.8 | **76.9** | **67.7** |
+
+Note: Simplicity, Soundness, and Sensitivity are averaged sub-metrics from PRMBench. Our model, PathFinder-PRM-7B, outperforms all open-source discriminative PRMs and LLM-as-judge models, while achieving performance competitive with large proprietary models.
+
+#### ProcessBench Results
+
+| Model | # Samples | GSM8K | MATH | Olympiad | OmniMath | Avg. F1 |
+|-------------------------------|-----------|-------|-------|----------|----------|---------|
+| Math-Shepherd-7B | 445K | 47.9 | 29.5 | 24.8 | 23.8 | 31.5 |
+| RLHFlow-Mistral-8B | 273K | 50.4 | 33.4 | 13.8 | 15.8 | 28.4 |
+| Llemma-PRM800K-7B | ~350K | 48.4 | 43.1 | 28.5 | 33.4 | 38.4 |
+| Qwen2.5-Math-7B-PRM800K | 264K | 68.2 | 62.6 | 50.7 | 44.3 | 58.5 |
+| 🟢 PathFinder-PRM-7B | ~400K | 77.9 | 75.3 | 65.0 | 59.7 | 69.5 |
+| Qwen2.5-Math-PRM-7B | ~1.5M | 82.4 | 77.6 | 67.5 | 66.3 | 73.5 |
+
+On ProcessBench, PathFinder-PRM-7B outperforms all models trained on comparable amounts of data, trailing only Qwen2.5-Math-PRM-7B, which was trained on roughly 3x more data, by 4 F1 points.
+
+### Reward-Guided Greedy Search (PRM@8)
+
+| Model | AIME24 | AMC23 | MATH | Olympiad | College | Minerva | Avg |
+|------------------------------|--------|-------|-------|----------|---------|---------|-------|
+| Math-Shepherd-7B | 13.3 | 52.5 | 74.6 | 38.5 | 36.5 | 41.2 | 42.8 |
+| Math-PSA-7B | 6.7 | 57.5 | 79.8 | 42.5 | 41.0 | 39.3 | 44.5 |
+| Skywork-PRM-7B | 10.0 | 57.5 | 77.8 | 41.5 | 39.0 | **43.4** | 44.9 |
+| Qwen2.5-Math-PRM-7B | 16.7 | 60.0 | **81.0** | **43.5** | 39.0 | 40.4 | 46.8 |
+| 🟢 PathFinder-PRM-7B | **20.0** | **62.5** | 78.8 | 36.5 | **55.0** | 36.7 | **48.3** |
+
+Note: All results are computed using reward-guided greedy search with Qwen2.5-7B-Instruct as the policy model. PathFinder-PRM-7B outperforms all open-source discriminative PRMs in reward-guided greedy search, showcasing its ability to better guide policy models toward correct solutions.
 
 
 ## Citation
@@ -198,4 +245,4 @@ reward_score = run_inference(messages)
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2505.12345},
 }
-```
+```
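The added results describe reward-guided greedy search only at a high level. As a rough illustration of the idea (not the repo's actual inference code: `toy_policy` and `toy_prm` below are made-up stand-ins for the policy model and the PRM, and the real setup samples 8 candidates from Qwen2.5-7B-Instruct and scores them with PathFinder-PRM-7B), a minimal sketch:

```python
def reward_guided_greedy_search(generate_candidates, score_partial_solution,
                                problem, num_candidates=8, max_steps=10):
    """Step-level greedy search: at each step, sample candidate next steps
    from the policy and keep the one the reward model scores highest."""
    steps = []
    for _ in range(max_steps):
        candidates = generate_candidates(problem, steps, num_candidates)
        if not candidates:
            break
        # Score each extended partial solution with the PRM; keep the best.
        best = max(candidates,
                   key=lambda s: score_partial_solution(problem, steps + [s]))
        steps.append(best)
        if best.endswith("<END>"):  # policy signals the solution is complete
            break
    return steps


# Deterministic toy stand-ins so the sketch runs without any models:
# the "policy" proposes numbered steps and the "PRM" prefers shorter ones.
def toy_policy(problem, steps, n):
    if len(steps) >= 2:
        return ["Therefore the answer is 4. <END>"]
    return [f"step {len(steps)}: option {i}" * (i + 1) for i in range(n)]

def toy_prm(problem, steps):
    return -len(steps[-1])  # shorter last step -> higher reward

trace = reward_guided_greedy_search(toy_policy, toy_prm, "2+2?", num_candidates=3)
```

With the real models, `generate_candidates` would sample 8 continuations per step and `score_partial_solution` would be a PRM forward pass over the partial trace.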