Abaryan
committed on
Update README.md
README.md
CHANGED
@@ -13,7 +13,12 @@ tags:
# Model Card for BioXP
-
+BioXP-0.5B is a 🤗 Transformers-based model trained using our two-stage fine-tuning approach:
+
+1. Supervised Fine-Tuning (SFT): The model was initially fine-tuned on labeled data (MedMCQA) to achieve strong baseline accuracy on multiple-choice medical QA tasks.
+
+2. Group Relative Policy Optimization (GRPO): In the second stage, GRPO was applied to further align the model with human-like reasoning patterns.
+This reinforcement learning technique enhances the model’s ability to generate coherent, high-quality explanations and improves answer reliability.
## Model Details
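
The GRPO stage described in the added lines can be illustrated with a minimal sketch of its core idea: several answers are sampled for the same question, each is scored, and each score is normalized against the mean and standard deviation of its own group. The README does not state which reward function was used, so the exact-match `correctness_reward` below is a hypothetical stand-in, as are the function names, the sample data, and the choice of population standard deviation.

```python
from statistics import mean, pstdev


def correctness_reward(predicted: str, gold: str) -> float:
    """Hypothetical reward: 1.0 for the correct option letter, else 0.0."""
    return 1.0 if predicted == gold else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps). The eps term
    avoids division by zero when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group: four sampled answers to one multiple-choice question.
sampled_answers = ["B", "A", "B", "C"]
gold_answer = "B"

rewards = [correctness_reward(a, gold_answer) for a in sampled_answers]
advantages = group_relative_advantages(rewards)

for answer, adv in zip(sampled_answers, advantages):
    print(f"answer={answer} advantage={adv:+.3f}")
```

Completions that beat their group's average receive a positive advantage and are reinforced; those below average are discouraged. No separate value model is needed, since the group itself serves as the baseline.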