---
license: apache-2.0
datasets:
  - beomi/KoAlpaca-RealQA
language:
  - ko
base_model:
  - Qwen/Qwen2.5-Coder-1.5B-Instruct
pipeline_tag: text-generation
---

# Model Description

This model was fine-tuned from Qwen/Qwen2.5-Coder-1.5B-Instruct using QLoRA (4-bit quantization + PEFT). The training data is beomi/KoAlpaca-RealQA.

Because QLoRA was applied to a small model, the outputs are not of particularly high quality, but there is a clear difference between the QLoRA model's answers and those of the base model.

# Quantization Configuration

```python
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```

# LoRA Configuration

```python
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["c_attn", "q_proj", "v_proj"]
)
```

# Training Arguments

```python
training_args = TrainingArguments(
    num_train_epochs=8,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    evaluation_strategy="steps",
    eval_steps=300,
    save_strategy="steps",
    save_steps=300,
    logging_steps=300,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False
)
```

# Training Progress

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 300  | 1.595000      | 1.611501        |
| 600  | 1.593300      | 1.596210        |
| 900  | 1.577600      | 1.586121        |
| 1200 | 1.564600      | 1.577804        |
| ...  | ...           | ...             |
| 7200 | 1.499700      | 1.525933        |
| 7500 | 1.493400      | 1.525612        |
| 7800 | 1.491000      | 1.525330        |
| 8100 | 1.499900      | 1.525138        |

# Inference Code

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantization config (must match the QLoRA settings used during fine-tuning)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Load tokenizer and model (local or Hub path)
model_path = "onebeans/Qwen2.5-Coder-KoInstruct-QLoRA"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto"
)
model.eval()

# Define prompt using ChatML format (Qwen-style)
def build_chatml_prompt(question: str) -> str:
    system_msg = "<|im_start|>system\n당신은 유용한 한국어 도우미입니다.<|im_end|>\n"
    user_msg = f"<|im_start|>user\n{question}<|im_end|>\n"
    return system_msg + user_msg + "<|im_start|>assistant\n"

# Run inference (greedy decoding; top_p/temperature are only used when do_sample=True,
# so they are omitted here)
def generate_response(question: str, max_new_tokens: int = 128) -> str:
    prompt = build_chatml_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            eos_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example
question = "한국의 수도는 어디인가요?"
# The base model (Qwen/Qwen2.5-Coder-1.5B-Instruct) answers -> 한국의 수도는 서울입니다.
response = generate_response(question)
print("모델 응답:\n", response)
```

# Environment

- Windows 10
- NVIDIA GeForce RTX 4070 Ti

# Framework Versions

- Python: 3.10.14
- PyTorch: 1.12.1
- Transformers: 4.46.2
- Datasets: 3.2.0
- Tokenizers: 0.20.3
- PEFT: 0.8.2
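
# Fine-tuning Sketch

The card lists the quantization, LoRA, and training configurations separately; the sketch below shows one plausible way they fit together with the standard PEFT + `Trainer` workflow. It is a minimal sketch, not the exact training script: the KoAlpaca-RealQA column names (`question`, `answer`), the ChatML training template, the train/eval split, the max length, and the `output_dir` are all assumptions.

```python
# Minimal QLoRA fine-tuning sketch using the bnb_config, lora_config, and
# training_args defined above. Dataset field names and prompt template are assumptions.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
)
from peft import get_peft_model, prepare_model_for_kbit_training

base_model = "Qwen/Qwen2.5-Coder-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:          # make sure padding works in the collator
    tokenizer.pad_token = tokenizer.eos_token

# Load the base model in 4-bit (bnb_config as defined above)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)

# Prepare the quantized model for k-bit training and attach the LoRA adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)

# Build ChatML-style training examples (column names "question"/"answer" are assumed)
dataset = load_dataset("beomi/KoAlpaca-RealQA")
split = dataset["train"].train_test_split(test_size=0.05, seed=42)

def preprocess(example):
    text = (
        "<|im_start|>user\n" + example["question"] + "<|im_end|>\n"
        "<|im_start|>assistant\n" + example["answer"] + "<|im_end|>"
    )
    return tokenizer(text, truncation=True, max_length=512)

train_ds = split["train"].map(preprocess, remove_columns=split["train"].column_names)
eval_ds = split["test"].map(preprocess, remove_columns=split["test"].column_names)

trainer = Trainer(
    model=model,
    args=training_args,  # as defined above (plus an output_dir of your choice)
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Save only the LoRA adapter weights (hypothetical output path)
model.save_pretrained("qwen2.5-coder-koinstruct-qlora-adapter")
```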
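
The inference example above loads the checkpoint directly with `AutoModelForCausalLM`, which is simplest when the LoRA weights have been merged back into the base model before upload. If that is how the repository was produced, the merge step would look roughly like the following; the paths are placeholders and this is a sketch, not the author's exact procedure.

```python
# Hypothetical merge step: fold the trained LoRA adapter into the base weights
# so the result can be loaded without PEFT. Paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-Coder-1.5B-Instruct"
adapter_path = "qwen2.5-coder-koinstruct-qlora-adapter"  # output of the sketch above

# Reload the base model in half precision (merging is done on unquantized weights)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter_path)
model = model.merge_and_unload()

model.save_pretrained("Qwen2.5-Coder-KoInstruct-QLoRA")
AutoTokenizer.from_pretrained(base_model).save_pretrained("Qwen2.5-Coder-KoInstruct-QLoRA")
```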