---
license: apache-2.0
datasets:
  - beomi/KoAlpaca-RealQA
language:
  - ko
base_model:
  - Qwen/Qwen2.5-Coder-1.5B-Instruct
pipeline_tag: text-generation
---

# Model Description

This model was fine-tuned from Qwen/Qwen2.5-Coder-1.5B-Instruct using QLoRA (4-bit quantization + PEFT). The training data is beomi/KoAlpaca-RealQA.

Because QLoRA was applied to a small model, the outputs are not of particularly high quality, but there is a clear difference between the QLoRA model's answers and those of the base model.

# Quantization Configuration

```python
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```

# LoRA Configuration

```python
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["c_attn", "q_proj", "v_proj"]
)
```

# Training Arguments

```python
training_args = TrainingArguments(
    num_train_epochs=8,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    evaluation_strategy="steps",
    eval_steps=300,
    save_strategy="steps",
    save_steps=300,
    logging_steps=300,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False
)
```

# Training Progress

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 300  | 1.595000      | 1.611501        |
| 600  | 1.593300      | 1.596210        |
| 900  | 1.577600      | 1.586121        |
| 1200 | 1.564600      | 1.577804        |
| ...  | ...           | ...             |
| 7200 | 1.499700      | 1.525933        |
| 7500 | 1.493400      | 1.525612        |
| 7800 | 1.491000      | 1.525330        |
| 8100 | 1.499900      | 1.525138        |

# Inference Code

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantization config (must match the QLoRA settings used during fine-tuning)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Load tokenizer and model (local or Hub path)
model_path = "onebeans/Qwen2.5-Coder-KoInstruct-QLoRA"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto"
)
model.eval()

# Define prompt using ChatML format (Qwen-style)
def build_chatml_prompt(question: str) -> str:
    system_msg = "<|im_start|>system\n당신은 유용한 한국어 도우미입니다.<|im_end|>\n"
    user_msg = f"<|im_start|>user\n{question}<|im_end|>\n"
    return system_msg + user_msg + "<|im_start|>assistant\n"

# Run inference (greedy decoding; top_p/temperature are only used when do_sample=True,
# so they are omitted here)
def generate_response(question: str, max_new_tokens: int = 128) -> str:
    prompt = build_chatml_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            eos_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example
question = "한국의 수도는 어디인가요?"
# The base model (Qwen/Qwen2.5-Coder-1.5B-Instruct) answers -> 한국의 수도는 서울입니다.
response = generate_response(question)
print("모델 응답:\n", response)
```

# Environment

- Windows 10
- NVIDIA GeForce RTX 4070 Ti

# Framework Versions

- Python: 3.10.14
- PyTorch: 1.12.1
- Transformers: 4.46.2
- Datasets: 3.2.0
- Tokenizers: 0.20.3
- PEFT: 0.8.2
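
# Fine-tuning Sketch

The card lists the quantization, LoRA, and training configurations separately; the sketch below shows one plausible way they fit together with the standard PEFT + `Trainer` workflow. It is a minimal sketch, not the exact training script: the KoAlpaca-RealQA column names (`question`, `answer`), the ChatML training template, the train/eval split, the max length, and the `output_dir` are all assumptions.

```python
# Minimal QLoRA fine-tuning sketch using the bnb_config, lora_config, and
# training_args defined above. Dataset field names and prompt template are assumptions.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
)
from peft import get_peft_model, prepare_model_for_kbit_training

base_model = "Qwen/Qwen2.5-Coder-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:          # make sure padding works in the collator
    tokenizer.pad_token = tokenizer.eos_token

# Load the base model in 4-bit (bnb_config as defined above)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)

# Prepare the quantized model for k-bit training and attach the LoRA adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)

# Build ChatML-style training examples (column names "question"/"answer" are assumed)
dataset = load_dataset("beomi/KoAlpaca-RealQA")
split = dataset["train"].train_test_split(test_size=0.05, seed=42)

def preprocess(example):
    text = (
        "<|im_start|>user\n" + example["question"] + "<|im_end|>\n"
        "<|im_start|>assistant\n" + example["answer"] + "<|im_end|>"
    )
    return tokenizer(text, truncation=True, max_length=512)

train_ds = split["train"].map(preprocess, remove_columns=split["train"].column_names)
eval_ds = split["test"].map(preprocess, remove_columns=split["test"].column_names)

trainer = Trainer(
    model=model,
    args=training_args,  # as defined above (plus an output_dir of your choice)
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Save only the LoRA adapter weights (hypothetical output path)
model.save_pretrained("qwen2.5-coder-koinstruct-qlora-adapter")
```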
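
The inference example above loads the checkpoint directly with `AutoModelForCausalLM`, which is simplest when the LoRA weights have been merged back into the base model before upload. If that is how the repository was produced, the merge step would look roughly like the following; the paths are placeholders and this is a sketch, not the author's exact procedure.

```python
# Hypothetical merge step: fold the trained LoRA adapter into the base weights
# so the result can be loaded without PEFT. Paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-Coder-1.5B-Instruct"
adapter_path = "qwen2.5-coder-koinstruct-qlora-adapter"  # output of the sketch above

# Reload the base model in half precision (merging is done on unquantized weights)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter_path)
model = model.merge_and_unload()

model.save_pretrained("Qwen2.5-Coder-KoInstruct-QLoRA")
AutoTokenizer.from_pretrained(base_model).save_pretrained("Qwen2.5-Coder-KoInstruct-QLoRA")
```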