Abaryan committed
Commit a151363 · verified · 1 Parent(s): fd68fe6

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -20,9 +20,11 @@ BioXP-0.5B is a 🤗 Medical-AI model trained using our two-stage fine-tuning ap
 2. Group Relative Policy Optimization (GRPO): In the second stage, GRPO was applied to further align the model with human-like reasoning patterns.
 This reinforcement learning technique enhances the model’s ability to generate coherent, high-quality explanations and improves answer reliability.
 
-## Model Details
+The final model achieves an accuracy of 64.58% on the MedMCQA benchmark.
+
 
-### Model Description
+
+## Model Details
 
 This model is a finetuned version of Qwen/Qwen2.5-0.5B-Instruct, a 0.5 billion parameter language model from the Qwen2 family.
 The finetuning was performed using a reinforcement learning approach: Group Relative Policy Optimization (GRPO).
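
Since the README describes the model as a GRPO-finetuned Qwen2.5-0.5B-Instruct checkpoint evaluated on MedMCQA, a minimal inference sketch may help readers try it. The repository id `Abaryan/BioXP-0.5B` is an assumption inferred from the commit author and model name (adjust to the actual repo path); the rest uses standard 🤗 Transformers calls.

```python
# Minimal inference sketch with 🤗 Transformers.
# "Abaryan/BioXP-0.5B" is an assumed repo id; replace with the actual path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Abaryan/BioXP-0.5B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

# A MedMCQA-style multiple-choice question, phrased as a chat turn
# (the Qwen2.5-0.5B-Instruct base model uses a chat template).
question = (
    "Which vitamin deficiency causes scurvy?\n"
    "A. Vitamin A\nB. Vitamin B12\nC. Vitamin C\nD. Vitamin D\n"
    "Answer with the correct option and a short explanation."
)
messages = [{"role": "user", "content": question}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Greedy decoding; print only the newly generated tokens.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```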