DeepSeek-R1-Distill-Llama-3B
This model is a distilled version of DeepSeek-R1 based on Llama-3.2-3B, fine-tuned on the R1-Distill-SFT dataset. The published weights are 4-bit quantized; load the model in float16 if you want to use it at full precision.
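If you prefer to load the checkpoint in half precision rather than 4-bit, a minimal sketch is shown below (this assumes the intent above is simply to load with torch_dtype=torch.float16 instead of enabling 4-bit loading):

import torch
from transformers import AutoModelForCausalLM

# Sketch: load the checkpoint in float16 instead of 4-bit
# (assumes this is what "use the full model" refers to above).
model_fp16 = AutoModelForCausalLM.from_pretrained(
    "suayptalha/DeepSeek-R1-Distill-Llama-3B-4bit",
    torch_dtype=torch.float16,
    device_map="auto",
)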
Example usage:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 4-bit quantized checkpoint (requires bitsandbytes).
model = AutoModelForCausalLM.from_pretrained(
    "suayptalha/DeepSeek-R1-Distill-Llama-3B-4bit",
    load_in_4bit=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("suayptalha/DeepSeek-R1-Distill-Llama-3B-4bit")

SYSTEM_PROMPT = """Respond in the following format:
<reasoning>
You should reason between these tags.
</reasoning>
Answer goes here...
Always use <reasoning> </reasoning> tags even if they are not necessary.
"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]

# Build the prompt with the model's chat template and move it to the GPU.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

# Sample a response (do_sample=True is required for temperature to take effect).
output = model.generate(
    input_ids=inputs,
    max_new_tokens=256,
    use_cache=True,
    do_sample=True,
    temperature=0.7,
)
decoded_output = tokenizer.decode(output[0], skip_special_tokens=False)
print(decoded_output)
Output:
<reasoning>
To continue the Fibonacci sequence, we need to recall the pattern of adding the previous two numbers to get the next number.
</reasoning>
The next numbers in the sequence would be: 13, 21, 34, 55, 89, 144
Suggested system prompt:
Respond in the following format:
<reasoning>
You should reason between these tags.
</reasoning>
Answer goes here...
Always use <reasoning> </reasoning> tags even if they are not necessary.
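Because the model is prompted to wrap its reasoning in <reasoning> tags, you can split the reasoning from the final answer after decoding. A minimal sketch follows; the parse_response helper is hypothetical and not part of this repository:

import re

def parse_response(text):
    """Split a decoded response into (reasoning, answer) using the <reasoning> tags."""
    match = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    # Everything after the closing tag is treated as the answer.
    answer = text.split("</reasoning>", 1)[-1].strip() if match else text.strip()
    return reasoning, answer

reasoning, answer = parse_response(decoded_output)
print("Reasoning:", reasoning)
print("Answer:", answer)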
Training parameters
- lr: 2e-5
- epochs: 1
- optimizer: paged_adamw_8bit
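As a rough illustration, these hyperparameters would correspond to a transformers TrainingArguments configuration like the one below; the actual training script is not included in this card, so treat everything beyond the three listed values as an assumption:

from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# The output directory and all unspecified values are defaults/assumptions.
training_args = TrainingArguments(
    output_dir="r1-distill-llama-3b",  # hypothetical output path
    learning_rate=2e-5,
    num_train_epochs=1,
    optim="paged_adamw_8bit",
)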