Copied from https://huggingface.co/RLHFlow/LLaMA3-SFT.

We fixed the generation_config.json.

This is the SFT checkpoint used for the Online-RLHF project; see the accompanying technical report for details.

The model was trained from meta-llama/Meta-Llama-3-8B for 1 epoch on a mixture of diverse, high-quality open-source data; detailed training parameters are given in the report. It has not been trained with RLHF and can therefore serve as a good starting point for RLHF research.

The mixture includes the following datasets: ShareGPT, Evol-Instruct, SlimOrca, MathInstruct, Magicoder-Evol-Instruct, GPT4-LLM, OrcaMath, GPTeacher, and UltraInteract.
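A minimal usage sketch for loading this checkpoint with the Hugging Face `transformers` library. The prompt and generation settings below are illustrative, not from the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenRLHF/Llama-3-8b-sft-mixture"

# Load the tokenizer and the BF16 checkpoint (8.03B parameters,
# so expect roughly 16 GB of GPU memory for inference)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

# Build a chat prompt using the model's chat template
messages = [{"role": "user", "content": "Explain RLHF in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a reply and decode only the newly produced tokens
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Since this checkpoint has only been supervised fine-tuned, its outputs can be used directly as the starting policy for an RLHF pipeline such as OpenRLHF.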

Model size: 8.03B parameters (safetensors, BF16).