BioXP-0.5B is a 🤗 Medical-AI model trained using our two-stage fine-tuning approach.
2. Group Relative Policy Optimization (GRPO): In the second stage, GRPO was applied to further align the model with human-like reasoning patterns. This reinforcement learning technique enhances the model's ability to generate coherent, high-quality explanations and improves answer reliability.
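The core idea behind GRPO can be sketched in a few lines: for each prompt the policy samples a group of completions, and each completion's reward is normalized against the group's mean and standard deviation, so no separate value network (critic) is needed. This is an illustrative sketch, not the actual training code used for BioXP-0.5B; the function name and reward scheme below are hypothetical.

```python
# Sketch of the group-relative advantage at the heart of GRPO (illustrative only).
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its sampling group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All completions scored the same: the group carries no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled answers to one question, scored 1.0 for a correct
# final answer and 0.0 otherwise (a hypothetical binary reward).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat their group average receive a positive advantage and are reinforced; the others are pushed down, which is how the group itself substitutes for a learned baseline.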
The final model achieves an accuracy of 64.58% on the MedMCQA benchmark.

## Model Details
This model is a finetuned version of Qwen/Qwen2.5-0.5B-Instruct, a 0.5 billion parameter language model from the Qwen2 family. The finetuning was performed using a reinforcement learning approach: Group Relative Policy Optimization (GRPO).