Update README.md (#2)
opened by jw-sohn

README.md CHANGED
@@ -10,7 +10,7 @@ tags:
 ---
 
 # Med-PRM-Reward (Version 1.0)
-🚀 Med-PRM-Reward is among the first Process Reward Models (PRMs) specifically designed for the medical domain. Unlike conventional PRMs, it enhances its verification capabilities by integrating clinical knowledge through retrieval-augmented generation (RAG). Med-PRM-Reward demonstrates exceptional performance in scaling-test-time computation, particularly outperforming majority‐voting ensembles on complex medical reasoning tasks. Moreover, its scalability is not limited to Llama-3.1-8B-Instruct: it delivers similarly outstanding results in scaling-test-time computation across multiple other medical‐specialized models. Notably, when combined with llama-3-meerkat-8b-v1.0, it became the first
+🚀 Med-PRM-Reward is among the first Process Reward Models (PRMs) specifically designed for the medical domain. Unlike conventional PRMs, it enhances its verification capabilities by integrating clinical knowledge through retrieval-augmented generation (RAG). Med-PRM-Reward demonstrates exceptional performance in scaling-test-time computation, particularly outperforming majority‐voting ensembles on complex medical reasoning tasks. Moreover, its scalability is not limited to Llama-3.1-8B-Instruct: it delivers similarly outstanding results in scaling-test-time computation across multiple other medical‐specialized models. Notably, when combined with llama-3-meerkat-8b-v1.0, it became the first 8B model framework to surpass a score of 80 on the MedQA (4-option) benchmark.
 
 
 📄 Paper: [Med-PRM-Reward: Medical Reasoning Models with Stepwise, Guideline‑verified Process Rewards](https://arxiv.org/abs/2506.11474)