Abaryan committed on
Commit
8802d8c
verified
1 Parent(s): 44ebb1e

Update README.md

Files changed (1)
  1. README.md +6 -1
README.md CHANGED
@@ -13,7 +13,12 @@ tags:

  # Model Card for BioXP

- This model is a 🤗 transformers model, BioXP-0.5B.
+ BioXP-0.5B is a 🤗 Transformers-based model trained using our two-stage fine-tuning approach:
+
+ 1. Supervised Fine-Tuning (SFT): The model was initially fine-tuned on labeled data (MedMCQA) to achieve strong baseline accuracy on multiple-choice medical QA tasks.
+
+ 2. Group Relative Policy Optimization (GRPO): In the second stage, GRPO was applied to further align the model with human-like reasoning patterns.
+ This reinforcement learning technique enhances the model’s ability to generate coherent, high-quality explanations and improve answer reliability.

  ## Model Details

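
Note on the GRPO stage described in the added README text: the "group relative" part generally means that several answers are sampled for the same prompt and each answer's reward is scored relative to the others in its group. The snippet below is a minimal sketch of that normalization step only, not the authors' training code. The function name `group_relative_advantages`, the 0/1 correctness reward, the use of the population standard deviation, and the `eps` value are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sampled answer relative to its own group.

    Hypothetical helper: subtracts the group mean reward and divides by the
    group standard deviation (population form assumed here), with a small
    eps so a group of identical rewards does not divide by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed reward scheme for four sampled answers to one multiple-choice
# question: 1.0 if the chosen option is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Correct answers receive a positive advantage, incorrect ones a negative one.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

When every answer in a group earns the same reward, all advantages come out as zero, so under this sketch that group would contribute no learning signal.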