BioXP-0.5B is a 🤗 Medical-AI model trained using our two-stage fine-tuning approach.
2. Group Relative Policy Optimization (GRPO): In the second stage, GRPO was applied to further align the model with human-like reasoning patterns. This reinforcement learning technique enhances the model's ability to generate coherent, high-quality explanations and improves answer reliability.
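The core idea behind GRPO can be sketched in a few lines: for each prompt the policy samples a group of completions, and each completion's reward is normalized against the group's mean and standard deviation, so no separate value network (critic) is needed. This is an illustrative sketch, not the actual training code used for BioXP-0.5B; the function name and reward scheme below are hypothetical.

```python
# Sketch of the group-relative advantage at the heart of GRPO (illustrative only).
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its sampling group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All completions scored the same: the group carries no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled answers to one question, scored 1.0 for a correct
# final answer and 0.0 otherwise (a hypothetical binary reward).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat their group average receive a positive advantage and are reinforced; the others are pushed down, which is how the group itself substitutes for a learned baseline.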
The final model achieves an accuracy of 64.58% on the MedMCQA benchmark.

## Model Details
This model is a finetuned version of Qwen/Qwen2.5-0.5B-Instruct, a 0.5 billion parameter language model from the Qwen2 family. The finetuning was performed using a reinforcement learning approach: Group Relative Policy Optimization (GRPO).