---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- llama
- dpo
- preference-optimization
- PEFT
- instruction-tuning
pipeline_tag: text-generation
---
|
# DPO Fine-Tuned Adapter - LLM Judge Dataset
|
|
|
## 🧠 Model

- Base: `meta-llama/Llama-3.2-1B-Instruct`
- Fine-tuned with TRL's `DPOTrainer` on the LLM Judge preference dataset (50 preference pairs)
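
Below is a minimal sketch of how the base model, tokenizer, and a PEFT LoRA configuration can be prepared for DPO training. The LoRA hyperparameters (`r`, `lora_alpha`, dropout, target modules) are illustrative assumptions and are not recorded on this card; only the base model ID and the EOS padding token come from the card itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig

base_model_id = "meta-llama/Llama-3.2-1B-Instruct"

# Load tokenizer and set the pad token to EOS, as listed in the training parameters below.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token

# Load the base model to be adapted.
model = AutoModelForCausalLM.from_pretrained(base_model_id)

# Illustrative LoRA config -- r / alpha / dropout / target modules are assumptions, not card values.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```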
|
|
|
## ⚙️ Training Parameters
|
| Parameter               | Value        |
|-------------------------|--------------|
| Learning Rate           | 5e-5         |
| Batch Size              | 4            |
| Epochs                  | 3            |
| Beta (DPO regularizer)  | 0.1          |
| Max Input Length        | 1024 tokens  |
| Max Prompt Length       | 512 tokens   |
| Padding Token           | `eos_token`  |
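
A sketch of how these parameters map onto TRL's `DPOConfig` / `DPOTrainer`, assuming a recent TRL release where `beta` and the length limits are set on `DPOConfig`. The `model`, `tokenizer`, `peft_config`, and `train_dataset` objects are the ones built in the sketches in the neighbouring sections; `output_dir` is an illustrative name.

```python
from trl import DPOConfig, DPOTrainer

# Mirror the hyperparameters from the table above.
training_args = DPOConfig(
    output_dir="dpo-llmjudge-lora-adapter",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    beta=0.1,                 # DPO regularizer
    max_length=1024,          # max input length (prompt + completion)
    max_prompt_length=512,
)

trainer = DPOTrainer(
    model=model,                  # base model from the Model section sketch
    ref_model=None,               # with a PEFT adapter, the frozen base acts as the reference
    args=training_args,
    train_dataset=train_dataset,  # preference pairs from the Dataset section sketch
    processing_class=tokenizer,   # `tokenizer=` on older TRL versions
    peft_config=peft_config,
)
trainer.train()
```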
|
|
|
## 📦 Dataset

- Source: `llm_judge_preferences.csv`
- Size: 50 human-labeled pairs with `prompt`, `chosen`, and `rejected` columns
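
A minimal sketch of loading the CSV into a 🤗 `datasets` object with the `prompt` / `chosen` / `rejected` columns that `DPOTrainer` expects; the file path is assumed to be local to the training environment.

```python
from datasets import load_dataset

# Load the 50 preference pairs; each row has `prompt`, `chosen`, and `rejected` columns.
train_dataset = load_dataset(
    "csv",
    data_files="llm_judge_preferences.csv",
    split="train",
)
print(train_dataset.column_names)  # expected: ['prompt', 'chosen', 'rejected']
```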
|
|
|
## 📂 Output

- The LoRA adapter is saved and uploaded to the Hugging Face Hub as `Likhith003/dpo-llmjudge-lora-adapter`
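
A minimal usage sketch for loading the adapter on top of the base model for inference. The prompt and generation settings are illustrative, not part of the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "Likhith003/dpo-llmjudge-lora-adapter"

# Load the base model, then attach the DPO-trained LoRA adapter from the Hub.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Explain the difference between supervised fine-tuning and DPO."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```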
|