metadata

base_model: unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen2
  - trl
license: apache-2.0
language:
  - en

Uploaded model

Developed by: tcotter
License: apache-2.0
Finetuned from model: unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit

Fine-tuned Qwen 1.5B (the R1-distilled version) on this dataset, which is derived from this dataset with an additional "summary" produced by an in-house synthetic data generator.

This LoRA therefore helps the model return a "\n\nFinal Answer: ..." after its reasoning and initial response steps.
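
Because the adapter is meant to end each response with a "\n\nFinal Answer: ..." line, downstream code can pull that line out of the generated text. The following is a minimal sketch, not part of the original training or inference code: the helper name `extract_final_answer` is hypothetical, and it assumes the marker appears exactly as "\n\nFinal Answer:" in the decoded output.

```python
from typing import Optional

# Assumed marker, taken from the format described in this model card.
FINAL_ANSWER_MARKER = "\n\nFinal Answer:"


def extract_final_answer(generated_text: str) -> Optional[str]:
    """Return the text after the last final-answer marker, or None if absent.

    Hypothetical helper for illustration; it only does string parsing and
    does not load or call the model.
    """
    # rpartition splits on the LAST occurrence, so an earlier mention of the
    # marker inside the reasoning does not shadow the real final answer.
    _, marker, tail = generated_text.rpartition(FINAL_ANSWER_MARKER)
    if not marker:
        return None
    return tail.strip()


if __name__ == "__main__":
    sample = "<think>2 + 2 is 4.</think>\nThe sum is 4.\n\nFinal Answer: 4"
    print(extract_final_answer(sample))
```

If the model omits the marker for a given prompt, the helper returns None, so callers can fall back to the full response.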

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.