Pythia-160m supervised fine-tuned on the Anthropic hh-rlhf dataset for 1 epoch (sft-model), then trained with DPO (paper) on the same dataset for 1 epoch.
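The DPO objective referenced above can be sketched in plain Python. This is a minimal single-pair illustration of the loss, not the training code used for this model; the function name and the `beta=0.1` default are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed token log-probability of the chosen or
    rejected completion under the policy model or the frozen reference
    (here, the SFT model). beta controls how far the policy may drift
    from the reference.
    """
    # Implicit reward margins of each completion relative to the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # loss = -log sigmoid(beta * (chosen margin - rejected margin))
    logits = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# At initialization (policy == reference) the loss is log 2 ≈ 0.6931;
# it drops as the policy favors the chosen completion more than the reference does.
print(round(dpo_loss(-10.0, -12.0, -11.0, -11.0), 4))  # → 0.5981
```

The loss depends only on log-probability *differences* against the reference model, which is why DPO needs the SFT checkpoint but no explicit reward model.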

Training curves are available in the wandb log.

Benchmark evaluations included in this repo were run with lm-evaluation-harness.

See Pythia-160m for original model details (paper).
