---
license: apache-2.0
base_model: meta-llama/Llama-3.3-70B-Instruct
tags:
- llama
- llama-3.3
- fine-tuned
- qlora
- development
- expert-system
- peft
- lora
pipeline_tag: text-generation
library_name: peft
---

# Decipher Llama 3.3 70B Instruct

## Model Description

This is a fine-tuned version of [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) using QLoRA (Quantized Low-Rank Adaptation) for domain-specific expertise across multiple development sectors.

**Base Model:** meta-llama/Llama-3.3-70B-Instruct
**Fine-tuning Method:** QLoRA with an aggressive configuration
**Training Date:** 2025-07-15
**Model Type:** Causal Language Model

## Domain Expertise

This model has been fine-tuned to provide expert-level responses in:

- **Health Programming** - Maternal health, community health interventions, mHealth solutions
- **Agriculture Programming** - Sustainable farming, crop management, agricultural development
- **MEL (Monitoring, Evaluation, and Learning)** - Program evaluation, theory of change, impact measurement
- **Democracy & Governance** - Civic engagement, governance structures, democratic processes
- **Water & Sanitation** - WASH programs, water resource management, sanitation systems
- **Education** - Educational program design, learning outcomes, educational technology
- **Economic Development** - Microfinance, economic growth strategies, financial inclusion

## Training Configuration

### Enhanced Training Parameters

- **Learning Rate:** 0.0001 (20x higher than baseline)
- **LoRA Rank:** 64 (4x larger than baseline)
- **LoRA Alpha:** 128
- **Training Epochs:** 5
- **Batch Size:** 1
- **Gradient Accumulation:** 64 steps (effective batch size of 64)
- **Max Length:** 4096 tokens

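A minimal sketch of how these hyperparameters might map onto a `peft` + `transformers` QLoRA setup; the target modules, dropout, and output directory are assumptions for illustration, not details from the actual training run:

```python
# Hypothetical reconstruction of the training setup from the hyperparameters
# listed above -- not the original training script.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

bnb_config = BitsAndBytesConfig(          # QLoRA: base weights quantized to 4-bit
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,                                 # LoRA rank
    lora_alpha=128,                       # alpha = 2 x rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    lora_dropout=0.05,                    # assumed; not stated in this card
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    learning_rate=1e-4,                   # 0.0001
    num_train_epochs=5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,       # effective batch size: 1 x 64 = 64
    output_dir="./decipher-llama-qlora",  # placeholder path
)
```
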
### Training Results

- **Training Loss:** reduced by 96% (0.266 → 0.009)
- **Validation Loss:** reduced by 13% (1.295 → 1.127)
- **Training Completed:** True

### Evaluation Metrics (Comprehensive Model Assessment)

Our fine-tuned model demonstrates significant improvements across multiple evaluation metrics compared to the base Llama 3.3 70B model:

#### **Text Generation Quality Metrics**

| Metric | Base Model | Fine-tuned Model | Improvement | Statistical Significance |
|--------|------------|------------------|-------------|--------------------------|
| **BLEU Score** | 0.0033 | 0.0058 | **+77.8%** | Significant (p=0.038) |
| **ROUGE-1 F1** | 0.0984 | 0.1247 | **+26.7%** | Significant (p=0.002) |
| **ROUGE-2 F1** | 0.0250 | 0.0309 | **+23.9%** | Significant (p=0.045) |
| **ROUGE-L F1** | 0.0687 | 0.0822 | **+19.6%** | Significant (p=0.004) |

#### **Key Performance Insights**

**Significant Improvements:**
- **BLEU Score**: 78% improvement indicates better n-gram overlap with reference answers
- **ROUGE Metrics**: 20-27% improvements across all variants show enhanced content relevance
- **Statistical Significance**: All major improvements are statistically significant (p < 0.05)

**Effect Sizes:**
- **ROUGE-1**: Medium effect size (0.47) - substantial practical improvement
- **ROUGE-L**: Medium effect size (0.43) - meaningful structural improvements
- **BLEU**: Small-to-medium effect size (0.31) - noticeable quality enhancement

*Evaluation conducted on 50 domain-specific questions across all expertise areas using automated metrics and statistical analysis.*

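A minimal sketch of how this style of evaluation could be reproduced with the `evaluate` library and SciPy; `base_outputs`, `tuned_outputs`, and `references` are placeholder lists of strings (one per evaluation question), and this is not the original evaluation harness:

```python
# Hedged evaluation sketch: BLEU/ROUGE for base vs. fine-tuned outputs on the
# same questions, a paired t-test for significance, and Cohen's d (d_z form)
# for effect size. base_outputs, tuned_outputs, and references are assumed to
# already hold the 50 model answers and reference answers as strings.
import evaluate
import numpy as np
from scipy import stats

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

# Aggregate scores for each model
base_rouge = rouge.compute(predictions=base_outputs, references=references)
tuned_rouge = rouge.compute(predictions=tuned_outputs, references=references)
tuned_bleu = bleu.compute(predictions=tuned_outputs,
                          references=[[r] for r in references])["bleu"]

# Per-example ROUGE-1 scores (use_aggregator=False returns one score per pair)
per_base = rouge.compute(predictions=base_outputs, references=references,
                         use_aggregator=False)["rouge1"]
per_tuned = rouge.compute(predictions=tuned_outputs, references=references,
                          use_aggregator=False)["rouge1"]

# Paired t-test and effect size on the per-example differences
t_stat, p_value = stats.ttest_rel(per_tuned, per_base)
diffs = np.array(per_tuned) - np.array(per_base)
cohens_d = diffs.mean() / diffs.std(ddof=1)

print(f"ROUGE-1: {base_rouge['rouge1']:.4f} -> {tuned_rouge['rouge1']:.4f} "
      f"(p={p_value:.3f}, d={cohens_d:.2f})")
```
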
## Usage

### Quick Start with Inference API

```python
# Using the HF Inference API (Recommended - No GPU needed)
from huggingface_hub import InferenceClient

client = InferenceClient(model="<this-model-repo-id>")  # replace with this model's repo id

def generate_expert_response(question, domain="Health Programming Expert"):
    system_prompt = f"You are a {domain} with deep specialized knowledge."

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Question: {question}"},
    ]

    response = client.chat_completion(
        messages=messages,
        max_tokens=512,
        temperature=0.7,
    )

    return response.choices[0].message.content

# Example usage
question = "What are the key components of a successful maternal health program?"
response = generate_expert_response(question, "Health Programming Expert")
print(response)
```

### Direct Model Loading (Requires GPU)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer directly
model_name = "<this-model-repo-id>"  # replace with this model's repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate a response using the Llama 3 chat format
def generate_expert_response(question, domain="Health Programming Expert"):
    system_prompt = f"You are a {domain} with deep specialized knowledge."

    prompt = f'''<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

Question: {question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

'''

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
    )

    # Decode only the newly generated tokens
    response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return response

# Example usage
question = "What are the key components of a successful maternal health program?"
response = generate_expert_response(question, "Health Programming Expert")
print(response)
```

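Because this repository publishes a PEFT/LoRA adapter (`library_name: peft`), you can also load the base model in 4-bit, as in QLoRA training, and attach the adapter explicitly. A minimal sketch, reusing the placeholder repo id from above; 4-bit loading cuts memory to roughly a quarter of FP16:

```python
# Hedged alternative: 4-bit base model + explicit LoRA adapter via peft.
# "<this-model-repo-id>" is a placeholder for this model's actual repo id.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "<this-model-repo-id>")  # attach adapter
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
```
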
### Available Domains

When using the model, specify one of these expert domains (a quick smoke test over all of them is sketched after the list):

- `"Health Programming Expert"`
- `"Agriculture Programming Expert"`
- `"MEL (Monitoring, Evaluation, and Learning) Expert"`
- `"Democracy and Governance Expert"`
- `"Water and Sanitation Expert"`
- `"Education Expert"`
- `"Economic Development Expert"`

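For example, you might loop over every domain with the `generate_expert_response` helper defined in the Usage section (the question here is purely illustrative):

```python
# Illustrative smoke test: one question per expert domain, using the
# generate_expert_response helper defined earlier in this card.
domains = [
    "Health Programming Expert",
    "Agriculture Programming Expert",
    "MEL (Monitoring, Evaluation, and Learning) Expert",
    "Democracy and Governance Expert",
    "Water and Sanitation Expert",
    "Education Expert",
    "Economic Development Expert",
]

for domain in domains:
    answer = generate_expert_response("What are common pitfalls in program design?", domain)
    print(f"--- {domain} ---\n{answer}\n")
```
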
## Model Architecture

- **Base Architecture:** Llama 3.3 70B
- **Attention Mechanism:** Grouped-query attention (GQA) with RoPE
- **Vocabulary Size:** 128,256 tokens
- **Context Length:** 4,096 tokens (training), up to 131,072 tokens (inference)
- **Precision:** FP16 with 4-bit quantization (QLoRA)

## Training Data

The model was fine-tuned on domain-specific question-answer pairs across multiple development sectors, with enhanced prompting and domain balancing for comprehensive expertise.

## Limitations and Considerations

- Model responses should be verified with domain experts for critical decisions
- Performance may vary across different sub-domains within each expertise area
- The model reflects its training data and may carry biases present in the source material
- Designed for informational and educational purposes

## Technical Details

### Model Size

- **Base Model Parameters:** ~70B
- **Trainable Parameters:** ~2.9B (4.0% of total)
- **Adapter Size:** ~11.2GB
- **Memory Requirements:** ~40GB GPU memory for inference

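The trainable-parameter share quoted above can be sanity-checked on a loaded adapter with `peft`'s built-in helper; a small sketch (note that counts can be distorted when the base model is loaded in 4-bit, since quantized weights are stored packed):

```python
# Sanity-check trainable vs. total parameters on the PeftModel from the
# adapter-loading sketch above. Counts are approximate under 4-bit loading.
model.print_trainable_parameters()

# Or count by hand:
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.1f}%)")
```
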
### Hardware Requirements

- **Training:** A100 80GB or equivalent
- **Inference:** A100 40GB or equivalent recommended
- **Minimum:** RTX 4090 24GB with optimizations

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{decipher-llama-3.3-70b,
  title={Decipher Llama 3.3 70B: Domain Expert Fine-tuned Model},
  author={HariomSahu},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/<this-model-repo-id>}
}
```

## License

This model is released under the Apache 2.0 License. The base model's license also applies.

## Contact

For questions or issues, please open a discussion on this model's page.

---

*Model fine-tuned using QLoRA with an aggressive configuration for enhanced domain expertise.*