metadata
base_model: unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
license: apache-2.0
language:
- en
Uploaded model
- Developed by: tcotter
- License: apache-2.0
- Finetuned from model: unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit
Fine-tuned Qwen 1.5B (the DeepSeek-R1 distilled version) on this dataset, which is derived from this dataset with an additional "summary" produced by an in-house synthetic data generator.
This LoRA therefore helps the model return a "\n\nFinal Answer: ..." after its reasoning and initial response steps.
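Since the output ends with a "\n\nFinal Answer: ..." marker, downstream code can split the answer off from the reasoning. Below is a minimal sketch of one way to do that in plain Python. The helper name `extract_final_answer` and the sample text are hypothetical and used only for illustration; the exact shape of real generations may differ.

```python
from typing import Optional

# Marker the LoRA is trained to emit, as described in this model card.
FINAL_ANSWER_MARKER = "\n\nFinal Answer:"


def extract_final_answer(generated_text: str) -> Optional[str]:
    """Return the text after the last marker, or None if no marker is present.

    Hypothetical helper, not part of the model or of any library.
    """
    # rpartition splits on the LAST occurrence, so an earlier mention of the
    # marker inside the reasoning steps does not cut the answer short.
    _, marker, tail = generated_text.rpartition(FINAL_ANSWER_MARKER)
    if not marker:
        return None
    return tail.strip()


# Made-up sample standing in for a model generation.
sample = "<think>2 + 2 is 4.</think>\nThe sum is 4.\n\nFinal Answer: 4"
print(extract_final_answer(sample))
```

Returning `None` when the marker is missing lets the caller decide whether to fall back to the full response or retry the generation.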
This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.