sapphia-410m-RM

a super duper ultra highly experimental LoRA finetune of EleutherAI/pythia-410m-deduped on argilla/dpo-mix-7k, intended to serve as a reward model.
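if you want to poke at it, something like the sketch below should score a response. this is a minimal example, assuming the adapter sits on a scalar-reward sequence-classification head (the usual TRL RewardTrainer-style setup) and that the adapter checkpoint also carries the score head weights; the actual head config might differ.

```python
# minimal sketch: load base pythia with a 1-logit reward head, then apply the LoRA adapter.
# assumption: the adapter was trained for sequence classification with num_labels=1.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel

base = AutoModelForSequenceClassification.from_pretrained(
    "EleutherAI/pythia-410m-deduped", num_labels=1
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m-deduped")
tokenizer.pad_token = tokenizer.eos_token
base.config.pad_token_id = tokenizer.pad_token_id

model = PeftModel.from_pretrained(base, "Fizzarolli/sapphia-410m-RM")
model.eval()

# score a prompt + response pair; higher logit = preferred by the RM
text = "What is 2 + 2?\n\n2 + 2 is 4."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits[0].item()
print(reward)
```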

why?

nexusflow achieved good results with traditional reward model finetuning! why not meeeeeee :3
