# DPO Fine-Tuned Adapter - LLM Judge Dataset

## 🧠 Model
- Base: `meta-llama/Llama-3.2-1B-Instruct`
- Fine-tuned with TRL's `DPOTrainer` on the LLM Judge preference dataset (50 pairs)

## ⚙️ Training Parameters
| Parameter               | Value        |
|-------------------------|--------------|
| Learning Rate           | 5e-5         |
| Batch Size              | 4            |
| Epochs                  | 3            |
| Beta (DPO regularizer)  | 0.1          |
| Max Input Length        | 1024 tokens  |
| Max Prompt Length       | 512 tokens   |
| Padding Token           | `eos_token`  |
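
The following is a minimal sketch of how a run with these settings can be wired up in TRL. It is a reconstruction, not the original training script: argument names follow recent TRL releases, and the `LoraConfig` shown uses illustrative defaults.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # padding token = `eos_token`
model = AutoModelForCausalLM.from_pretrained(base)

# 50 preference pairs with `prompt`, `chosen`, and `rejected` columns
dataset = load_dataset("csv", data_files="llm_judge_preferences.csv", split="train")

config = DPOConfig(
    output_dir="dpo-llmjudge-lora-adapter",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    beta=0.1,               # DPO regularizer
    max_length=1024,        # max input length in tokens
    max_prompt_length=512,  # max prompt length in tokens
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions take `tokenizer=` instead
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # train a LoRA adapter, not full weights
)
trainer.train()
```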

## 📦 Dataset
- Source: `llm_judge_preferences.csv`
- Size: 50 human-labeled pairs with `prompt`, `chosen`, and `rejected` columns
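
A quick way to sanity-check that the CSV matches the format `DPOTrainer` expects (file name from above; the check itself is illustrative):

```python
from datasets import load_dataset

ds = load_dataset("csv", data_files="llm_judge_preferences.csv", split="train")

# One preference pair per row: a prompt plus the preferred (`chosen`)
# and dispreferred (`rejected`) responses.
assert {"prompt", "chosen", "rejected"} <= set(ds.column_names)
print(len(ds), ds[0])
```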

## 📂 Output
- Adapter saved and uploaded as `Likhith003/dpo-llmjudge-lora-adapter`
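
To try the adapter, it can be loaded on top of the base model with PEFT along these lines (prompt and generation settings are illustrative):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Attach the DPO-trained LoRA adapter to the base model
model = PeftModel.from_pretrained(model, "Likhith003/dpo-llmjudge-lora-adapter")

inputs = tokenizer("Explain DPO in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```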