Likhith003 committed 1b55dd9 · verified · 1 Parent(s): ff81aac

Upload README.md with huggingface_hub

Files changed (1): README.md +24 -0
# DPO Fine-Tuned Adapter - LLM Judge Dataset

## 🧠 Model
- Base: `meta-llama/Llama-3.2-1B-Instruct`
- Fine-tuned using TRL's `DPOTrainer` with the LLM Judge preference dataset (50 pairs)

## ⚙️ Training Parameters

| Parameter              | Value         |
|------------------------|---------------|
| Learning Rate          | 5e-5          |
| Batch Size             | 4             |
| Epochs                 | 3             |
| Beta (DPO regularizer) | 0.1           |
| Max Input Length       | 1024 tokens   |
| Max Prompt Length      | 512 tokens    |
| Padding Token          | `eos_token`   |

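The beta value above scales the preference margin inside the DPO objective. A minimal sketch of the per-pair DPO loss in plain Python (not the TRL implementation; the log-probability values in the example call are hypothetical):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp      # how much more the policy likes "chosen" vs. the reference
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical log-probabilities: the policy has shifted toward the chosen
# response relative to the reference, so the margin is positive and the
# loss drops below log(2) (the value at zero margin).
loss = dpo_loss(-10.0, -14.0, -11.0, -12.0, beta=0.1)
```

A smaller beta (such as the 0.1 used here) flattens the margin, so the policy is penalized less for drifting from the reference model.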
## 📦 Dataset
- Source: `llm_judge_preferences.csv`
- Size: 50 human-labeled pairs with `prompt`, `chosen`, and `rejected` columns

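Reading the CSV into preference records can be sketched with the standard library; only the column names come from this README, and the sample row below is made up for illustration:

```python
import csv
import io

# Stand-in for llm_judge_preferences.csv; the row content is hypothetical,
# only the prompt/chosen/rejected columns are documented above.
sample_csv = io.StringIO(
    "prompt,chosen,rejected\n"
    '"What is 2 + 2?","2 + 2 equals 4.","2 + 2 equals 5."\n'
)

# Each row becomes one preference pair in the format DPOTrainer expects.
pairs = list(csv.DictReader(sample_csv))
```

In practice the real file would be opened with `open("llm_judge_preferences.csv")` (or loaded via `datasets.load_dataset("csv", ...)`) in place of the in-memory sample.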
## 📂 Output
- Adapter saved and uploaded as `Likhith003/dpo-llmjudge-lora-adapter`