RLHFlow

university

AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

Papers

Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

RLHFlow's models (37)

RLHFlow/pair-preference-model-LLaMA3-8B

Text Generation • 8B • Updated Oct 14, 2024 • 65 • 38

RLHFlow/LLaMA3-iterative-DPO-final

Text Generation • 8B • Updated Oct 14, 2024 • 40 • 41

RLHFlow/LLaMA3.2-3B-SFT

Text Generation • 3B • Updated Oct 1, 2024 • 152

RLHFlow/LLaMA3.2-1B-SFT

Text Generation • 1B • Updated Oct 1, 2024 • 6

RLHFlow/ArmoRM-Llama3-8B-v0.1

Text Classification • 8B • Updated Sep 23, 2024 • 12.2k • 184

RLHFlow/DPA-v1-Mistral-7B

Text Generation • 7B • Updated May 23, 2024 • 12 • 1

RLHFlow/RewardModel-Mistral-7B-for-DPA-v1

Text Classification • 7B • Updated May 23, 2024 • 455 • 4