RLHFlow/pair-preference-model-LLaMA3-8B
Text Generation • 8B
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/