Upload model card with evaluation metrics
README.md
ADDED
---
license: apache-2.0
base_model: meta-llama/Llama-3.3-70B-Instruct
tags:
- llama
- llama-3.3
- fine-tuned
- qlora
- development
- expert-system
- peft
- lora
pipeline_tag: text-generation
library_name: peft
---

# Decipher Llama 3.3 70B Instruct

## Model Description

This is a fine-tuned version of [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct), adapted with QLoRA (Quantized Low-Rank Adaptation) to provide domain-specific expertise across multiple development sectors.

- **Base Model:** meta-llama/Llama-3.3-70B-Instruct
- **Fine-tuning Method:** QLoRA with an aggressive hyperparameter configuration
- **Training Date:** 2025-07-15
- **Model Type:** Causal Language Model

## Domain Expertise

This model has been fine-tuned to provide expert-level responses in:

- **Health Programming** - Maternal health, community health interventions, mHealth solutions
- **Agriculture Programming** - Sustainable farming, crop management, agricultural development
- **MEL (Monitoring, Evaluation, and Learning)** - Program evaluation, theory of change, impact measurement
- **Democracy & Governance** - Civic engagement, governance structures, democratic processes
- **Water & Sanitation** - WASH programs, water resource management, sanitation systems
- **Education** - Educational program design, learning outcomes, educational technology
- **Economic Development** - Microfinance, economic growth strategies, financial inclusion

## Training Configuration

### Enhanced Training Parameters
- **Learning Rate:** 0.0001 (20x higher than baseline)
- **LoRA Rank:** 64 (4x larger than baseline)
- **LoRA Alpha:** 128
- **Training Epochs:** 5
- **Batch Size:** 1
- **Gradient Accumulation:** 64 steps
- **Max Sequence Length:** 4,096 tokens

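For reference, the sketch below shows how these hyperparameters map onto standard `transformers` and `peft` configuration objects. It is illustrative rather than the actual training script: the LoRA dropout, target modules, and output directory are assumptions not stated in this card.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA: the frozen base model is loaded in 4-bit while the LoRA adapters train in higher precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=64,              # LoRA rank listed above
    lora_alpha=128,    # LoRA alpha listed above
    lora_dropout=0.05, # assumed value, not stated in this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed target set
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

training_args = TrainingArguments(
    output_dir="decipher-llama-3.3-70b-qlora",  # hypothetical output path
    learning_rate=1e-4,
    num_train_epochs=5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,
    bf16=True,
)
# Sequences are truncated/packed to the 4,096-token max length at tokenization time.
```
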
### Training Results
- **Training Loss:** reduced by 96% over training (0.266 → 0.009)
- **Validation Loss:** reduced by 13% (1.295 → 1.127)
- **Training Completed:** Yes

### Evaluation Metrics (Comprehensive Model Assessment)

The fine-tuned model demonstrates statistically significant improvements over the base Llama 3.3 70B Instruct model across multiple automated evaluation metrics:

#### Text Generation Quality Metrics

| Metric | Base Model | Fine-tuned Model | Improvement | Statistical Significance |
|--------|------------|------------------|-------------|--------------------------|
| **BLEU Score** | 0.0033 | 0.0058 | **+77.8%** | Significant (p = 0.038) |
| **ROUGE-1 F1** | 0.0984 | 0.1247 | **+26.7%** | Significant (p = 0.002) |
| **ROUGE-2 F1** | 0.0250 | 0.0309 | **+23.9%** | Significant (p = 0.045) |
| **ROUGE-L F1** | 0.0687 | 0.0822 | **+19.6%** | Significant (p = 0.004) |

#### Key Performance Insights

**Significant improvements:**
- **BLEU Score:** a ~78% relative improvement indicates better n-gram overlap with reference answers
- **ROUGE metrics:** 20-27% relative improvements across all variants show enhanced content relevance
- **Statistical significance:** all of the improvements above are statistically significant (p < 0.05)

**Effect sizes:**
- **ROUGE-1:** medium effect size (0.47), a substantial practical improvement
- **ROUGE-L:** medium effect size (0.43), meaningful structural improvements
- **BLEU:** small-to-medium effect size (0.31), a noticeable quality gain

*Evaluation conducted on 50 domain-specific questions spanning all expertise areas, using automated metrics and statistical significance testing.*

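For context, below is a minimal sketch of how this kind of assessment can be run with the Hugging Face `evaluate` package and SciPy. It is not the actual evaluation harness, and the reference answers and model outputs are placeholders rather than the real 50-question set.

```python
import evaluate
from scipy import stats

# Placeholder data: the real assessment used 50 domain-specific questions.
references = [
    "Community health workers, functioning referral systems, and skilled birth attendance.",
    "Drip irrigation combined with drought-tolerant seed varieties.",
]
base_outputs = [
    "Health programs need clinics.",
    "Irrigation helps farms.",
]
tuned_outputs = [
    "Key components include community health workers, referral systems, and skilled birth attendance.",
    "Drip irrigation paired with drought-tolerant varieties improves yields.",
]

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

def score(predictions):
    # use_aggregator=False keeps one ROUGE score per question for the paired test.
    per_question_rouge1 = rouge.compute(
        predictions=predictions, references=references, use_aggregator=False
    )["rouge1"]
    corpus_bleu = bleu.compute(
        predictions=predictions, references=[[r] for r in references]
    )["bleu"]
    return per_question_rouge1, corpus_bleu

base_r1, base_bleu = score(base_outputs)
tuned_r1, tuned_bleu = score(tuned_outputs)

# Paired t-test over per-question ROUGE-1 scores.
t_stat, p_value = stats.ttest_rel(tuned_r1, base_r1)
print(f"BLEU: {base_bleu:.4f} -> {tuned_bleu:.4f}")
print(f"ROUGE-1 (mean): {sum(base_r1)/len(base_r1):.4f} -> "
      f"{sum(tuned_r1)/len(tuned_r1):.4f} (p = {p_value:.3f})")
```
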
## Usage

### Quick Start with Inference API

```python
# Using the HF Inference API (recommended - no local GPU needed)
from huggingface_hub import InferenceClient

client = InferenceClient(model="REPO_ID")  # replace REPO_ID with this model repository's ID on the Hub

def generate_expert_response(question, domain="Health Programming Expert"):
    system_prompt = f"You are a {domain} with deep specialized knowledge."

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Question: {question}"}
    ]

    response = client.chat_completion(
        messages=messages,
        max_tokens=512,
        temperature=0.7
    )

    return response.choices[0].message.content

# Example usage
question = "What are the key components of a successful maternal health program?"
response = generate_expert_response(question, "Health Programming Expert")
print(response)
```

### Direct Model Loading (Requires GPU)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and model directly.
# Note: this repository ships a LoRA adapter (see "Technical Details" below), so `peft`
# must be installed for transformers to resolve the base model from the adapter config.
model_name = "REPO_ID"  # replace REPO_ID with this model repository's ID on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate a response using the Llama 3 chat format
def generate_expert_response(question, domain="Health Programming Expert"):
    system_prompt = f"You are a {domain} with deep specialized knowledge."

    prompt = f'''<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

Question: {question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

'''

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True
    )

    # Decode only the newly generated tokens
    response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    return response

# Example usage
question = "What are the key components of a successful maternal health program?"
response = generate_expert_response(question, "Health Programming Expert")
print(response)
```

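Since this repository's metadata declares a PEFT/LoRA adapter (`library_name: peft`, ~11.2 GB adapter under Technical Details), the adapter can also be attached to the base model explicitly with `peft`. A minimal sketch, with `REPO_ID` as a placeholder for this repository's Hub ID:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.3-70B-Instruct"
adapter_id = "REPO_ID"  # placeholder: this adapter repository's ID on the Hub

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Optionally fold the adapter into the base weights for slightly faster inference.
model = model.merge_and_unload()
```

The `generate_expert_response` helper above works unchanged on the resulting model.
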
### Available Domains

When using the model, specify one of these expert domains in the system prompt:

- `"Health Programming Expert"`
- `"Agriculture Programming Expert"`
- `"MEL (Monitoring, Evaluation, and Learning) Expert"`
- `"Democracy and Governance Expert"`
- `"Water and Sanitation Expert"`
- `"Education Expert"`
- `"Economic Development Expert"`

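To compare perspectives, the same question can be sent to each domain in turn by reusing the `generate_expert_response` helper from the Quick Start example (the question below is purely illustrative):

```python
domains = [
    "Health Programming Expert",
    "Agriculture Programming Expert",
    "MEL (Monitoring, Evaluation, and Learning) Expert",
    "Democracy and Governance Expert",
    "Water and Sanitation Expert",
    "Education Expert",
    "Economic Development Expert",
]

question = "How should a five-year program measure long-term impact?"
for domain in domains:
    print(f"--- {domain} ---")
    print(generate_expert_response(question, domain))
```
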
## Model Architecture

- **Base Architecture:** Llama 3.3 70B
- **Attention Mechanism:** Grouped-query attention (GQA) with rotary position embeddings (RoPE)
- **Vocabulary Size:** 128,256 tokens
- **Context Length:** 4,096 tokens (training), up to 131,072 tokens (inference)
- **Precision:** 4-bit quantized base weights with FP16 compute (QLoRA)

## Training Data

The model was fine-tuned on domain-specific question-answer pairs spanning the development sectors listed above, with enhanced prompting and domain balancing to build comprehensive expertise.

## Limitations and Considerations

- Model responses should be verified with domain experts before being used for critical decisions
- Performance may vary across sub-domains within each expertise area
- The model reflects its training data and may carry biases present in the source material
- It is designed for informational and educational purposes

## Technical Details

### Model Size
- **Base Model Parameters:** ~70B
- **Trainable Parameters:** ~2.9B (about 4% of the total)
- **Adapter Size:** ~11.2 GB
- **Memory Requirements:** ~40 GB of GPU memory for inference

### Hardware Requirements
- **Training:** A100 80GB or equivalent
- **Inference:** A100 40GB or equivalent recommended
- **Minimum:** RTX 4090 24GB, with aggressive memory optimizations (e.g., 4-bit quantization and CPU offload)

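At the memory-constrained end of this range, one possible approach (a sketch, not a tested configuration) is to load the base model in 4-bit NF4; even quantized, a 70B model still occupies roughly 35-40 GB of weights, so a single 24 GB card additionally relies on CPU offload or a second GPU via `device_map="auto"`. `REPO_ID` is again a placeholder for this repository's Hub ID.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_id = "meta-llama/Llama-3.3-70B-Instruct"
adapter_id = "REPO_ID"  # placeholder: this repository's Hub ID

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",  # places layers across available GPUs (and CPU, where supported)
)
model = PeftModel.from_pretrained(model, adapter_id)
```
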
## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{decipher-llama-3.3-70b,
  title={Decipher Llama 3.3 70B: Domain Expert Fine-tuned Model},
  author={HariomSahu},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/None}
}
```

## License

The fine-tuned weights in this repository are released under the Apache 2.0 License. The license terms of the base model (the Llama 3.3 Community License) also apply.

## Contact

For questions or issues, please open a discussion on this model's Hugging Face page.

---

*Model fine-tuned using QLoRA with an aggressive configuration for enhanced domain expertise.*