|
--- |
|
base_model: unsloth/gemma-2-9b-bnb-4bit |
|
library_name: peft |
|
license: mit |
|
language: |
|
- hi |
|
datasets: |
|
- FreedomIntelligence/alpaca-gpt4-hindi |
|
--- |
|
|
|
# Gemma2 HindiChat |
|
|
|
|
|
|
Gemma2 HindiChat is a version of Gemma 2 (9B) fine-tuned for Hindi-language tasks in a Colab notebook on an L4 GPU. The inference code below shows how to load the LoRA adapter, format Hindi prompts, and generate and decode responses.
|
|
|
## Model Details |
|
|
|
The base model, unsloth/gemma-2-9b-bnb-4bit, supports RoPE scaling, 4-bit quantization for memory efficiency, and fine-tuning with LoRA (Low-Rank Adaptation). Flash Attention 2 is used during training to handle Gemma 2's attention logit softcapping and to improve efficiency.
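
The training script itself is not included in this card, but a typical Unsloth setup matching this configuration looks roughly like the sketch below: the base model is loaded in 4-bit and LoRA adapters are attached to the attention and MLP projections. The hyperparameters shown (`max_seq_length`, `r`, `lora_alpha`, `target_modules`, etc.) are illustrative defaults from Unsloth's Gemma 2 examples, not necessarily the values used for this checkpoint.

```python
from unsloth import FastLanguageModel

max_seq_length = 2048  # assumed; adjust to the sequence length used in training

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2-9b-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=None,          # auto-detect (bfloat16 on Ampere+ GPUs)
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projection layers
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                        # LoRA rank (illustrative)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
```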
|
|
|
## Prompt Design |
|
|
|
A custom Hindi Alpaca-style prompt template formats the instruction (निर्देश), the optional input (इनपुट), and the expected response (उत्तर) in a conversational structure:
|
|
|
निर्देश: {instruction}

इनपुट: {input}

उत्तर: {response}
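
During fine-tuning, each training example is rendered into this template before tokenization. The sketch below illustrates that step; the column names (`instruction`, `input`, `output`), the EOS handling, and the abbreviated template (the full version with the Hindi preamble appears in the inference code further down) are assumptions for illustration, not the exact training code.

```python
# Illustrative sketch: mapping one record into the Hindi Alpaca template.
# The column names ("instruction", "input", "output") are assumed; check the
# dataset's actual schema before reusing this.
hindi_alpaca_template = """### निर्देश:
{instruction}

### इनपुट:
{input}

### उत्तर:
{output}"""

def format_example(example: dict, eos_token: str = "<eos>") -> str:
    # Appending the tokenizer's EOS token teaches the model to stop after the answer.
    return hindi_alpaca_template.format(
        instruction=example.get("instruction", ""),
        input=example.get("input", ""),
        output=example.get("output", ""),
    ) + eos_token

# Example with a made-up record
record = {
    "instruction": "पानी का रासायनिक सूत्र क्या है?",
    "input": "",
    "output": "पानी का रासायनिक सूत्र H2O है।",
}
print(format_example(record))
```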
|
|
|
```python |
|
# Install required libraries (Colab/Jupyter cell)
!pip install -q transformers peft accelerate bitsandbytes
|
|
|
from peft import PeftModel, PeftConfig |
|
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig |
|
import torch |
|
|
|
# Load the configuration for the fine-tuned model |
|
model_id = "Vijayendra/Gemma2.0-9B-HindiChat" |
|
config = PeftConfig.from_pretrained(model_id) |
|
|
|
# Load the 4-bit quantized base model and place it on the GPU
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the fine-tuned LoRA adapter
model = PeftModel.from_pretrained(base_model, model_id)
|
|
|
# Load the tokenizer; left padding is recommended for batched generation
# with decoder-only models
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.padding_side = "left"
|
|
|
# Define the prompt template (same as used during training)
hindi_alpaca_prompt = """नीचे एक निर्देश दिया गया है जो एक कार्य का वर्णन करता है, और इसके साथ एक इनपुट है जो अतिरिक्त संदर्भ प्रदान करता है। एक उत्तर लिखें जो अनुरोध को सही ढंग से पूरा करता हो।

### निर्देश:
{}

### इनपुट:
{}

### उत्तर:
{}"""
|
|
|
# Define new prompts for inference
prompts = [
    hindi_alpaca_prompt.format("भारत के स्वतंत्रता संग्राम में महात्मा गांधी की भूमिका क्या थी?", "", ""),
    hindi_alpaca_prompt.format("सौर मंडल में कौन सा ग्रह सबसे छोटा है?", "", ""),
    hindi_alpaca_prompt.format("पानी का रासायनिक सूत्र क्या है?", "", ""),
    hindi_alpaca_prompt.format("हिमालय पर्वत श्रृंखला की विशेषताएँ बताइए।", "", ""),
    hindi_alpaca_prompt.format("प्रकाश संश्लेषण की प्रक्रिया क्या है?", "", ""),
]
|
|
|
# Tokenize the prompts |
|
inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True).to("cuda") |
|
|
|
# Generate responses
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    use_cache=True,
)
|
|
|
# Decode and print the responses |
|
responses = tokenizer.batch_decode(outputs, skip_special_tokens=True) |
|
for i, response in enumerate(responses):
    print(f"प्रश्न {i+1}: {prompts[i].split('### निर्देश:')[1].split('### इनपुट:')[0].strip()}")
    print(f"उत्तर: {response.split('### उत्तर:')[1].strip()}")
    print("-" * 50)
```