tcotter/DeepSeek-R1-Qwen-1.5B-unsloth-bnb-4bit-LoRA-Adapter

text-generation-inference

Inference Endpoints


tcotter committed 4 days ago

Commit fdf1333 · verified · 1 Parent(s): 8a14d50

Update README.md

Files changed (1)

README.md +6 -0


@@ -17,6 +17,12 @@ language:
 - **License:** apache-2.0
 - **Finetuned from model :** unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit
+Finetuned Qwen 1.5B (R1-distilled version) on [this dataset](https://huggingface.co/datasets/tcotter/o1-medical-data-reasoning-w-summary),
+which is derived from [this dataset](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) but with an
+additional "summary" produced by an in-house synthetic data generator.
+This LoRA therefore helps the model return a "\n\nFinal Answer: ..." after its reasoning and initial response steps.
 This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
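
Since the README change says the adapter steers the model toward ending its output with a "\n\nFinal Answer: ..." segment, downstream code may want to separate that segment from the reasoning that precedes it. The following is a minimal sketch of one way to do that with plain string handling. The function name `split_final_answer`, the constant name, and the sample text are hypothetical and not part of this repository, and the sketch assumes the marker appears exactly as quoted in the README.

```python
from typing import Optional, Tuple

# Marker text as quoted in the README; assumed to appear verbatim in outputs.
FINAL_ANSWER_MARKER = "\n\nFinal Answer:"


def split_final_answer(generated: str) -> Tuple[str, Optional[str]]:
    """Split generated text into (reasoning_and_response, final_answer).

    Uses the last occurrence of the marker, so an earlier mention inside
    the reasoning is left in the first element. Returns (generated, None)
    when the marker is absent.
    """
    head, sep, tail = generated.rpartition(FINAL_ANSWER_MARKER)
    if not sep:
        # Marker not found: the model did not emit a final-answer segment.
        return generated, None
    return head, tail.strip()


if __name__ == "__main__":
    # Made-up sample output, for illustration only.
    sample = "Step 1: consider the options.\nStep 2: compare them.\n\nFinal Answer: Option B"
    body, answer = split_final_answer(sample)
    print(answer)
```

Splitting on the last occurrence is a design choice: it keeps the parse stable if the model happens to repeat the marker while reasoning. Callers should still handle the `None` case, because a LoRA only makes the format more likely and does not guarantee it.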