Model Card: Llama-2-chat-finetuned

Model Details

  • Model Name: Llama-2-chat-finetuned
  • Base Model: NousResearch/Llama-2-7b-chat-hf
  • Fine-Tuned By: HiTruong
  • Fine-Tuning Method: LoRA (Low-Rank Adaptation)
  • Dataset: Movie-related dataset
  • Evaluation Metric: BLEU Score (see the scoring sketch after this list)
  • BLEU Score Before Fine-Tuning: 33.26
  • BLEU Score After Fine-Tuning: 77.53
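
The card does not state how BLEU was computed. A minimal sketch of one common approach, using the Hugging Face evaluate package; the example strings and the 0-100 scaling are assumptions, not the card's actual evaluation script:

import evaluate

bleu = evaluate.load("bleu")
predictions = ["The Matrix is a 1999 science-fiction film."]    # model outputs
references = [["The Matrix is a 1999 science fiction film."]]   # gold answers
score = bleu.compute(predictions=predictions, references=references)
print(score["bleu"] * 100)  # evaluate returns BLEU in [0, 1]; the card reports a 0-100 scale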

Model Description

This model is a fine-tuned version of NousResearch/Llama-2-7b-chat-hf, optimized for movie-related conversations. Fine-tuning used LoRA, which trains small low-rank adapter matrices on top of the frozen base weights, keeping compute and memory requirements manageable. The goal is improved conversational understanding and response generation for movie-related queries.

Training Details

  • Hardware Used: Kaggle GPU (T4x2)
  • Fine-Tuning Framework: Hugging Face Transformers + LoRA (see the configuration sketch after this list)
  • Output Folder: ./results
  • Number of Epochs: 2
  • Batch Size:
    • Per Device Train: 4
    • Per Device Eval: 4
  • Gradient Accumulation Steps: 1
  • Gradient Checkpointing: Enabled
  • Max Gradient Norm: 0.3
  • Mixed Precision: fp16=False, bf16=False
  • Optimizer: paged_adamw_32bit
  • Learning Rate: 2e-5
  • Weight Decay: 0.001
  • LR Scheduler Type: cosine
  • Warmup Ratio: 0.03
  • Max Steps: -1 (determined by epochs)
  • Quantization Settings:
    • use_4bit = True
    • bnb_4bit_compute_dtype = float16
    • bnb_4bit_quant_type = nf4
    • use_nested_quant = False
  • LoRA Hyperparameters:
    • lora_r = 64
    • lora_alpha = 16
    • lora_dropout = 0.05
  • Sequence Length: Dynamic (max_seq_length=None)
  • Packing: Disabled (packing=False)
  • Device Map: {"": 0}
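
The exact training script is not published, so the code below is a reconstruction of the configuration from the values listed above. A minimal sketch, assuming the common trl SFTTrainer recipe (pre-1.0 trl API); the train_dataset placeholder and the "text" column name are assumptions:

import torch
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

base_model = "NousResearch/Llama-2-7b-chat-hf"

# 4-bit NF4 quantization with float16 compute, no nested quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map={"": 0}
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# LoRA adapter hyperparameters from the list above.
peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    bias="none", task_type="CAUSAL_LM",
)

# Optimizer and schedule settings from the list above.
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    fp16=False,
    bf16=False,
    optim="paged_adamw_32bit",
    learning_rate=2e-5,
    weight_decay=0.001,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_steps=-1,  # run for the configured number of epochs instead
)

train_dataset = ...  # the movie-related dataset; not published with the card

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # assumed column name
    max_seq_length=None,        # dynamic sequence length
    tokenizer=tokenizer,
    args=training_args,
    packing=False,
)
trainer.train()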

Capabilities

  • Answers movie-related questions more accurately than the base model (BLEU 33.26 → 77.53).
  • Understands movie genres, actors, directors, and plots.
  • Provides recommendations based on user preferences.

Limitations

  • May generate incorrect or biased information.
  • Movie knowledge is limited to the base model's pretraining data and the fine-tuning dataset.
  • Does not have real-time access to new movie releases.

Usage

You can load and use the model with the following code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HiTruong/Llama-2-chat-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate_answer(question):
    # Wrap the question in the Llama-2 chat instruction template.
    inputs = tokenizer(f"<s>[INST] {question} [/INST]", return_tensors="pt", truncation=True, max_length=100).to(model.device)
    with torch.no_grad():
        # max_new_tokens caps only the generated continuation, independent of prompt length.
        output = model.generate(**inputs, max_new_tokens=75, eos_token_id=tokenizer.eos_token_id)
    response = tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
    # Strip the echoed prompt and keep only the first sentence of the reply.
    return response.replace(f"[INST] {question} [/INST]", "").strip().split('.')[0]

input_text = "What are some great sci-fi movies?"
print(generate_answer(input_text))
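
On GPUs with limited memory, the checkpoint can also be loaded with the same 4-bit NF4 quantization used during training. This is an optional loading path, not something the card prescribes; it requires the bitsandbytes package:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "HiTruong/Llama-2-chat-finetuned"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)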