# Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled
The Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled model has been distilled from the Qwen2.5-Coder-1.5B-Instruct-SFT model down to 1B parameters using a token-based knowledge distillation method.
## Usage

### Hugging Face
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled"

# Left padding so generation continues directly from the prompt when batching.
tokenizer = AutoTokenizer.from_pretrained(repo, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",
    torch_dtype="auto",
).eval()

system = "You are a senior Python developer."
user = "Give me a Python implementation of bubble sort."

# Simple prompt format: system instruction, user request, then the assistant turn.
text = f"System: {system}\nUser: {user}\nAssistant:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)

print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```
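
Since the tokenizer is inherited from the Qwen2.5-Coder-Instruct family, it may also expose a chat template. The snippet below is a minimal sketch (assuming `apply_chat_template` is available for this tokenizer) that builds the same request from structured messages; it reuses the `tokenizer`, `model`, `system`, and `user` objects defined above.

```python
# Sketch: same request expressed as chat messages (assumes a Qwen-style chat template).
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant turn marker before generating
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```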
## Dataset

## Training

### Hyperparameters
| Hyperparameter | Value | 
|---|---|
| Base Model | bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT | 
| Knowledge Distillation Method | Token-based |
| Task Type | CAUSAL_LM | 
| Number of Epochs | 11 | 
| Batch Size | 12 | 
| Gradient Accumulation Steps | 2 | 
| Effective Batch Size | 24 (12 × 2) | 
| Learning Rate | 5e-5 | 
| Optimizer | AdamW | 
| Precision | BF16 Mixed Precision | 
| Evaluation Strategy | epoch | 
| Max Sequence Length | 256 tokens | 
| Logging Steps | once per epoch |
| Save Checkpoint Steps | every 10000 steps | 
| Experiment Tracking | MLflow (local) | 
| Experiment Name | StudentKnowledgeDistillation | 
| MLflow Run Name | StudentKD | 
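
For orientation, these settings map roughly onto Hugging Face `TrainingArguments` as in the sketch below. The output path is a placeholder, the MLflow wiring is an assumption, and argument names may differ slightly across `transformers` versions (e.g. `evaluation_strategy` on older releases); the actual training script is not shown here.

```python
from transformers import TrainingArguments

# Sketch only: values mirror the table above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="qwen2.5-coder-1.5b-instruct-sft-distilled",  # hypothetical path
    num_train_epochs=11,
    per_device_train_batch_size=12,
    gradient_accumulation_steps=2,   # effective batch size 24 (12 x 2)
    learning_rate=5e-5,
    optim="adamw_torch",
    bf16=True,                       # BF16 mixed precision
    eval_strategy="epoch",           # `evaluation_strategy` on older transformers
    logging_strategy="epoch",
    save_steps=10_000,
    report_to=["mlflow"],            # local MLflow tracking
    run_name="StudentKD",
)
```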
### Knowledge Distillation Configuration
| Parameter | Value | 
|---|---|
| Distillation Weight | 0.3 | 
| Temperature | 0.5 | 
| Loss Reduction | batchmean | 
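
As a rough illustration of token-based distillation with these settings, the loss could be assembled as in the sketch below: a temperature-scaled KL divergence between the student's and teacher's per-token distributions, blended with the student's cross-entropy loss via the distillation weight. This is a generic formulation, not the exact training code used for this model.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, ce_loss,
                      temperature=0.5, alpha=0.3):
    """Sketch: blend cross-entropy with a KL term that pulls the student's
    per-token distribution toward the teacher's."""
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale the softened-target gradients
    return alpha * kd + (1.0 - alpha) * ce_loss
```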
### Dataset

- Train/Test Split: 90%/10%
- Random Seed: 42
- Train Batched: True
- Eval Batched: True
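
The split corresponds to something like the following `datasets` call; the dataset identifier below is a placeholder, since the actual corpus is not listed in this card.

```python
from datasets import load_dataset

dataset = load_dataset("your-dataset-name", split="train")  # placeholder name

# 90% train / 10% test with a fixed seed, as configured above.
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```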
### Tokenizer Configuration

- Truncation: Enabled (max_length=256)
- Masked Language Modeling (MLM): False
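
Continuing from the split sketch above (and reusing the `tokenizer` from the Usage section), the tokenization and collator setup would look roughly like this; the `text` column name and map function are illustrative assumptions.

```python
from transformers import DataCollatorForLanguageModeling

def tokenize_fn(batch):
    # Truncate to the 256-token context used during distillation.
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized_train = train_ds.map(tokenize_fn, batched=True)  # Train Batched: True
tokenized_eval = eval_ds.map(tokenize_fn, batched=True)    # Eval Batched: True

# Causal LM objective, so masked language modeling is disabled.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```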
### Speeds, Sizes, Times

- Total Training Time: ~7 hours
- Checkpoint Frequency: every 10000 steps
- Checkpoint Steps: checkpoint-10000, checkpoint-13200 (final checkpoint)
### Compute Infrastructure

Hardware:

- GPU: 1 × NVIDIA L40S (48 GB VRAM)
- RAM: 94 GB
- CPU: 16 vCPU

Software:

- OS: Ubuntu 22.04
- Frameworks: PyTorch 2.4.0
- CUDA Version: 12.4.1
## Licence

## Links

## Team

## Contact
## Citation

```bibtex
@software{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled,
  author       = {Bunyamin Ergen},
  title        = {{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled}},
  year         = {2025},
  month        = {04},
}
```