DeepSeek-R1-Distill-Llama-3B
This model is a distilled version of DeepSeek-R1 based on Llama-3.2-3B, fine-tuned on the R1-Distill-SFT dataset. The published weights are 4-bit quantized; load the model in float16 if you want to use it at full precision.
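If you prefer to load the checkpoint in half precision rather than 4-bit, a minimal sketch is shown below (this assumes the intent above is simply to load with torch_dtype=torch.float16 instead of enabling 4-bit loading):

import torch
from transformers import AutoModelForCausalLM

# Sketch: load the checkpoint in float16 instead of 4-bit
# (assumes this is what "use the full model" refers to above).
model_fp16 = AutoModelForCausalLM.from_pretrained(
    "suayptalha/DeepSeek-R1-Distill-Llama-3B-4bit",
    torch_dtype=torch.float16,
    device_map="auto",
)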
Example usage:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 4-bit quantized checkpoint (requires bitsandbytes).
model = AutoModelForCausalLM.from_pretrained(
    "suayptalha/DeepSeek-R1-Distill-Llama-3B-4bit",
    load_in_4bit=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("suayptalha/DeepSeek-R1-Distill-Llama-3B-4bit")

SYSTEM_PROMPT = """Respond in the following format:
<reasoning>
You should reason between these tags.
</reasoning>
Answer goes here...
Always use <reasoning> </reasoning> tags even if they are not necessary.
"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]

# Build the prompt with the model's chat template and move it to the GPU.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

# Sample a response (do_sample=True is required for temperature to take effect).
output = model.generate(
    input_ids=inputs,
    max_new_tokens=256,
    use_cache=True,
    do_sample=True,
    temperature=0.7,
)
decoded_output = tokenizer.decode(output[0], skip_special_tokens=False)
print(decoded_output)
Output:
<reasoning>
To continue the Fibonacci sequence, we need to recall the pattern of adding the previous two numbers to get the next number.
</reasoning>
The next numbers in the sequence would be: 13, 21, 34, 55, 89, 144
Suggested system prompt:
Respond in the following format:
<reasoning>
You should reason between these tags.
</reasoning>
Answer goes here...
Always use <reasoning> </reasoning> tags even if they are not necessary.
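Because the model is prompted to wrap its reasoning in <reasoning> tags, you can split the reasoning from the final answer after decoding. A minimal sketch follows; the parse_response helper is hypothetical and not part of this repository:

import re

def parse_response(text):
    """Split a decoded response into (reasoning, answer) using the <reasoning> tags."""
    match = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    # Everything after the closing tag is treated as the answer.
    answer = text.split("</reasoning>", 1)[-1].strip() if match else text.strip()
    return reasoning, answer

reasoning, answer = parse_response(decoded_output)
print("Reasoning:", reasoning)
print("Answer:", answer)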
Training parameters
- lr: 2e-5
- epochs: 1
- optimizer: paged_adamw_8bit
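As a rough illustration, these hyperparameters would correspond to a transformers TrainingArguments configuration like the one below; the actual training script is not included in this card, so treat everything beyond the three listed values as an assumption:

from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# The output directory and all unspecified values are defaults/assumptions.
training_args = TrainingArguments(
    output_dir="r1-distill-llama-3b",  # hypothetical output path
    learning_rate=2e-5,
    num_train_epochs=1,
    optim="paged_adamw_8bit",
)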