dmis-lab/llama-3.1-medprm-reward-v1.0

Text Generation

process-reward-model

retrieval-augmented-generation

text-generation-inference

Update README.md

#2

by jw-sohn - opened Jun 18

base: refs/heads/main ← from: refs/pr/2

README.md +1 -1

@@ -10,7 +10,7 @@ tags:
 ---
 # Med-PRM-Reward (Version 1.0)
-🚀 Med-PRM-Reward is among the first Process Reward Models (PRMs) specifically designed for the medical domain. Unlike conventional PRMs, it enhances its verification capabilities by integrating clinical knowledge through retrieval-augmented generation (RAG). Med-PRM-Reward demonstrates exceptional performance in scaling-test-time computation, particularly outperforming majority‐voting ensembles on complex medical reasoning tasks. Moreover, its scalability is not limited to Llama-3.1-8B-Instruct: it delivers similarly outstanding results in scaling-test-time computation across multiple other medical‐specialized models. Notably, when combined with llama-3-meerkat-8b-v1.0, it became the first sub-10B small language model to surpass a score of 80 on the MedQA (4-option) benchmark.
+🚀 Med-PRM-Reward is among the first Process Reward Models (PRMs) specifically designed for the medical domain. Unlike conventional PRMs, it enhances its verification capabilities by integrating clinical knowledge through retrieval-augmented generation (RAG). Med-PRM-Reward demonstrates exceptional performance in scaling-test-time computation, particularly outperforming majority‐voting ensembles on complex medical reasoning tasks. Moreover, its scalability is not limited to Llama-3.1-8B-Instruct: it delivers similarly outstanding results in scaling-test-time computation across multiple other medical‐specialized models. Notably, when combined with llama-3-meerkat-8b-v1.0, it became the first 8B model framework to surpass a score of 80 on the MedQA (4-option) benchmark.
 📄 Paper: [Med-PRM-Reward: Medical Reasoning Models with Stepwise, Guideline‑verified Process Rewards](https://arxiv.org/abs/2506.11474)