license: other
library_name: transformers
pipeline_tag: text-generation
This repository provides the codebase as presented in Autonomous Data Selection with Language Models for Mathematical Texts.
👋 Join our WeChat or NPU user group.
[ English | 中文 ]
Fine-tuning a large language model can be easy as...
https://github.com/user-attachments/assets/7c96b465-9df7-45f4-8053-bf03e58386d3
Choose your path:
- Colab: https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing
- PAI-DSW: Llama3 Example | Qwen2-VL Example
- Local machine: Please refer to usage
- Documentation (WIP): https://llamafactory.readthedocs.io/zh-cn/latest/
Except for the above links, all other websites are unauthorized third-party websites. Please carefully use them.
Table of Contents
- Features
- Benchmark
- Changelog
- Supported Models
- Supported Training Approaches
- Provided Datasets
- Requirement
- Getting Started
- Projects using LLaMA Factory
- License
- Citation
- Acknowledgement
Features
- Various models: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Qwen2-VL, Yi, Gemma, Baichuan, ChatGLM, Phi, etc.
- Integrated methods: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO, etc.
- Scalable resources: 16-bit full-tuning, freeze-tuning, LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
- Advanced algorithms: GaLore, BAdam, Adam-mini, DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ, PiSSA and Agent tuning.
- Practical tricks: FlashAttention-2, Unsloth, Liger Kernel, RoPE scaling, NEFTune and rsLoRA.
- Experiment monitors: LlamaBoard, TensorBoard, Wandb, MLflow, etc.
- Faster inference: OpenAI-style API, Gradio UI and CLI with vLLM worker.
Benchmark
Compared to ChatGLM's P-Tuning, LLaMA Factory's LoRA tuning offers up to 3.7 times faster training speed with a better Rouge score on the advertising text generation task. By leveraging 4-bit quantization technique, LLaMA Factory's QLoRA further improves the efficiency regarding the GPU memory.
Definitions
- Training Speed: the number of training samples processed per second during the training. (bs=4, cutoff_len=1024)
- Rouge Score: Rouge-2 score on the development set of the advertising text generation task. (bs=4, cutoff_len=1024)
- GPU Memory: Peak GPU memory usage in 4-bit quantized training. (bs=1, cutoff_len=1024)
- We adopt
pre_seq_len=128
for ChatGLM's P-Tuning andlora_rank=32
for LLaMA Factory's LoRA tuning.
Changelog
[24/10/09] We supported downloading pre-trained models and datasets from the Modelers Hub. See this tutorial for usage.
[24/09/19] We support fine-tuning the Qwen2.5 models.
[24/08/30] We support fine-tuning the Qwen2-VL models. Thank @simonJJJ's PR.
[24/08/27] We support Liger Kernel. Try enable_liger_kernel: true
for efficient training.
[24/08/09] We support Adam-mini optimizer. See examples for usage. Thank @relic-yuexi's PR.
Full Changelog
[24/07/04] We supported contamination-free packed training. Use neat_packing: true
to activate it. Thank @chuan298's PR.
[24/06/16] We supported PiSSA algorithm. See examples for usage.
[24/06/07] We supported fine-tuning the Qwen2 and GLM-4 models.
[24/05/26] We supported SimPO algorithm for preference learning. See examples for usage.
[24/05/20] We supported fine-tuning the PaliGemma series models. Note that the PaliGemma models are pre-trained models, you need to fine-tune them with paligemma
template for chat completion.
[24/05/18] We supported KTO algorithm for preference learning. See examples for usage.
[24/05/14] We supported training and inference on the Ascend NPU devices. Check installation section for details.
[24/04/26] We supported fine-tuning the LLaVA-1.5 multimodal LLMs. See examples for usage.
[24/04/22] We provided a Colab notebook for fine-tuning the Llama-3 model on a free T4 GPU. Two Llama-3-derived models fine-tuned using LLaMA Factory are available at Hugging Face, check Llama3-8B-Chinese-Chat and Llama3-Chinese for details.
[24/04/21] We supported Mixture-of-Depths according to AstraMindAI's implementation. See examples for usage.
[24/04/16] We supported BAdam optimizer. See examples for usage.
[24/04/16] We supported unsloth's long-sequence training (Llama-2-7B-56k within 24GB). It achieves 117% speed and 50% memory compared with FlashAttention-2, more benchmarks can be found in this page.
[24/03/31] We supported ORPO. See examples for usage.
[24/03/21] Our paper "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models" is available at arXiv!
[24/03/20] We supported FSDP+QLoRA that fine-tunes a 70B model on 2x24GB GPUs. See examples for usage.
[24/03/13] We supported LoRA+. See examples for usage.
[24/03/07] We supported GaLore optimizer. See examples for usage.
[24/03/07] We integrated vLLM for faster and concurrent inference. Try infer_backend: vllm
to enjoy 270% inference speed.
[24/02/28] We supported weight-decomposed LoRA (DoRA). Try use_dora: true
to activate DoRA training.
[24/02/15] We supported block expansion proposed by LLaMA Pro. See examples for usage.
[24/02/05] Qwen1.5 (Qwen2 beta version) series models are supported in LLaMA-Factory. Check this blog post for details.
[24/01/18] We supported agent tuning for most models, equipping model with tool using abilities by fine-tuning with dataset: glaive_toolcall_en
.
[23/12/23] We supported unsloth's implementation to boost LoRA tuning for the LLaMA, Mistral and Yi models. Try use_unsloth: true
argument to activate unsloth patch. It achieves 170% speed in our benchmark, check this page for details.
[23/12/12] We supported fine-tuning the latest MoE model Mixtral 8x7B in our framework. See hardware requirement here.\n [23/12/01] We supported downloading pre-trained models and datasets from the ModelScope Hub. See this tutorial for usage.
[23/10/21] We supported NEFTune trick for fine-tuning. Try neftune_noise_alpha: 5
argument to activate NEFTune.
[23/09/27] We supported $S^2$-Attn proposed by LongLoRA for the LLaMA models. Try shift_attn: true
argument to enable shift short attention.
[23/09/23] We integrated MMLU, C-Eval and CMMLU benchmarks in this repo. See examples for usage.
[23/09/10] We supported FlashAttention-2. Try flash_attn: fa2
argument to enable FlashAttention-2 if you are using RTX4090, A100 or H100 GPUs.
[23/08/12] We supported RoPE scaling to extend the context length of the LLaMA models. Try rope_scaling: linear
argument in training and rope_scaling: dynamic
argument at inference to extrapolate the position embeddings.
[23/08/11] We supported DPO training for instruction-tuned models. See examples for usage.
[23/07/31] We supported dataset streaming. Try streaming: true
and max_steps: 10000
arguments to load your dataset in streaming mode.
[23/07/29] We released two instruction-tuned 13B models at Hugging Face. See these Hugging Face Repos (LLaMA-2 / Baichuan) for details.
[23/07/18] We developed an all-in-one Web UI for training, evaluation and inference. Try train_web.py
to fine-tune models in your Web browser. Thank @KanadeSiina and @codemayq for their efforts in the development.
[23/07/09] We released FastEdit ⚡ 🩹, an easy-to-use package for editing the factual knowledge of large language models efficiently. Please follow FastEdit if you are interested.
[23/06/29] We provided a reproducible example of training a chat model using instruction-following datasets, see Baichuan-7B-sft for details.
[23/06/22] We aligned the demo API with the OpenAI's format where you can insert the fine-tuned model in arbitrary ChatGPT-based applications.
[23/06/03] We supported quantized training and inference (aka QLoRA). See examples for usage.
Supported Models
Model | Model size | Template |
---|---|---|
Baichuan 2 | 7B/13B | baichuan2 |
BLOOM/BLOOMZ | 560M/1.1B/1.7B/3B/7.1B/176B | - |
ChatGLM3 | 6B | chatglm3 |
Command R | 35B/104B | cohere |
DeepSeek (Code/MoE) | 7B/16B/67B/236B | deepseek |
Falcon | 7B/11B/40B/180B | falcon |
Gemma/Gemma 2/CodeGemma | 2B/7B/9B/27B | gemma |
GLM-4 | 9B | glm4 |
Llama | 7B/13B/33B/65B | - |
Llama 2 | 7B/13B/70B | llama2 |
Llama 3 | 8B/70B | llama3 |
LLaVA-1.5 | 7B/13B | llava |
LLaVA-NeXT | 7B/8B/13B/34B/72B/110B | llava_next |
LLaVA-NeXT-Video | 7B/34B | llava_next_video |
MiniCPM | 1B/2B/4B | cpm/cpm3 |
Mistral/Mixtral | 7B/8x7B/8x22B | mistral |
OLMo | 1B/7B | - |
PaliGemma | 3B | paligemma |
Phi-1.5/Phi-2 | 1.3B/2.7B | - |
Qwen (1-2.5) (Code/Math/MoE) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
StarCoder 2 | 3B/7B/15B | - |
XVERSE | 7B/13B/65B | xverse |
Yi/Yi-1.5 (Code) | 1.5B/6B/9B/34B | yi |
Yi-VL | 6B/34B | yi_vl |
Yuan 2 | 2B/51B/102B | yuan |
For the "base" models, the
template
argument can be chosen fromdefault
,alpaca
,vicuna
etc. But make sure to use the corresponding template for the "instruct/chat" models.Remember to use the SAME template in training and inference.
Please refer to constants.py for a full list of models we supported.
You also can add a custom chat template to template.py.
Supported Training Approaches
Approach | Full-tuning | Freeze-tuning | LoRA | QLoRA |
---|---|---|---|---|
Pre-Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
Supervised Fine-Tuning | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
Reward Modeling | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
PPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
DPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
KTO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
ORPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
SimPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
The implementation details of PPO can be found in this blog.
Provided Datasets
Pre-training datasets
Supervised fine-tuning datasets
- Identity (en&zh)
- Stanford Alpaca (en)
- Stanford Alpaca (zh)
- Alpaca GPT4 (en&zh)
- Glaive Function Calling V2 (en&zh)
- LIMA (en)
- Guanaco Dataset (multilingual)
- BELLE 2M (zh)
- BELLE 1M (zh)
- BELLE 0.5M (zh)
- BELLE Dialogue 0.4M (zh)
- BELLE School Math 0.25M (zh)
- BELLE Multiturn Chat 0.8M (zh)
- UltraChat (en)
- OpenPlatypus (en)
- CodeAlpaca 20k (en)
- Alpaca CoT (multilingual)
- OpenOrca (en)
- SlimOrca (en)
- MathInstruct (en)
- Firefly 1.1M (zh)
- Wiki QA (en)
- Web QA (zh)
- WebNovel (zh)
- Nectar (en)
- deepctrl (en&zh)
- Advertise Generating (zh)
- ShareGPT Hyperfiltered (en)
- ShareGPT4 (en&zh)
- UltraChat 200k (en)
- AgentInstruct (en)
- LMSYS Chat 1M (en)
- Evol Instruct V2 (en)
- Cosmopedia (en)
- STEM (zh)
- Ruozhiba (zh)
- Neo-sft (zh)
- Magpie-Pro-300K-Filtered (en)
- Magpie-ultra-v0.1 (en)
- WebInstructSub (en)
- LLaVA mixed (en&zh)
- Pokemon-gpt4o-captions (en&zh)
- Open Assistant (de)
- Dolly 15k (de)
- Alpaca GPT4 (de)
- OpenSchnabeltier (de)
- Evol Instruct (de)
- [Dolphin (de)](https://huggingface