mlabonne committed
Commit 2516f8f · 1 Parent(s): 236d002

Update README.md

Files changed (1): README.md +20 -20
README.md CHANGED
@@ -21,13 +21,13 @@ datasets:
 
 # NeuralHermes 2.5 - Mistral 7B
 
-NeuralHermes is an [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model that has been further fine-tuned with Direct Preference Optimization (DPO) using the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset. It surpasses the original model on several benchmarks (see results)
+NeuralHermes is a [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model that has been further fine-tuned with Direct Preference Optimization (DPO) using the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset. It surpasses the original model on several benchmarks (see results below).
 
-It is directly inspired by the RLHF process described by [neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1)'s authors to improve performance. I used the same dataset and reformatted it to apply the ChatML template.
+It is directly inspired by the RLHF process described by [Intel/neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1)'s authors to improve performance. I used the same dataset and reformatted it to apply the ChatML template.
 
 The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). It required an A100 GPU for about an hour.
 
-GGUF versions of this model are available here: [mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF).
+🤗 GGUF: [mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF).
 
 ## Results
 
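The ChatML reformatting mentioned in the updated description can be illustrated with a short sketch. This is not the commit's own code: it assumes the source preference data exposes `system`, `question`, `chosen`, and `rejected` columns (as Intel's original DPO pairs do) and that the base model's tokenizer ships a ChatML chat template.

```python
# Minimal sketch: turn raw preference pairs into the ChatML-style
# prompt/chosen/rejected columns that DPO training expects.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")

def chatml_format(example):
    # Wrap the optional system message and the user question in ChatML tags.
    messages = []
    if example["system"]:
        messages.append({"role": "system", "content": example["system"]})
    messages.append({"role": "user", "content": example["question"]})
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # Close both candidate answers with the ChatML end-of-turn marker.
    return {
        "prompt": prompt,
        "chosen": example["chosen"] + "<|im_end|>\n",
        "rejected": example["rejected"] + "<|im_end|>\n",
    }

# Column names are an assumption based on Intel/orca_dpo_pairs.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(chatml_format, remove_columns=dataset.column_names)
```

Keeping the prompt and the two answers as separate columns is what lets the DPO trainer compute the chosen/rejected log-probability margin later.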
@@ -38,7 +38,7 @@ Results are improved on every benchmark: **AGIEval** (from 43.07% to 43.62%), **
 ### AGIEval
 ![](https://i.imgur.com/7an3B1f.png)
 
-### GPT4All:
+### GPT4All
 ![](https://i.imgur.com/TLxZFi9.png)
 
 ### TruthfulQA
@@ -87,24 +87,24 @@ print(sequences[0]['generated_text'])
 ## Training hyperparameters
 
 **LoRA**:
-* r=16,
-* lora_alpha=16,
-* lora_dropout=0.05,
-* bias="none",
-* task_type="CAUSAL_LM",
+* r=16
+* lora_alpha=16
+* lora_dropout=0.05
+* bias="none"
+* task_type="CAUSAL_LM"
 * target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
 
 **Training arguments**:
-* per_device_train_batch_size=4,
-* gradient_accumulation_steps=4,
-* gradient_checkpointing=True,
-* learning_rate=5e-5,
-* lr_scheduler_type="cosine",
-* max_steps=200,
-* optim="paged_adamw_32bit",
-* warmup_steps=100,
+* per_device_train_batch_size=4
+* gradient_accumulation_steps=4
+* gradient_checkpointing=True
+* learning_rate=5e-5
+* lr_scheduler_type="cosine"
+* max_steps=200
+* optim="paged_adamw_32bit"
+* warmup_steps=100
 
 **DPOTrainer**:
-* beta=0.1,
-* max_prompt_length=1024,
-* max_length=1536,
+* beta=0.1
+* max_prompt_length=1024
+* max_length=1536
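For context on how the hyperparameters above fit together, here is a minimal sketch of a LoRA + DPO setup. It follows the trl 0.7-era `DPOTrainer` API (newer trl releases move these arguments into `DPOConfig`); the output path is illustrative and `dataset` is the ChatML-formatted preference dataset from the earlier sketch.

```python
# Sketch: LoRA + DPO training wired up with the hyperparameters listed above.
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# LoRA adapter configuration (values from the list above).
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "gate_proj", "v_proj", "up_proj",
                    "q_proj", "o_proj", "down_proj"],
)

# Training arguments (values from the list above).
training_args = TrainingArguments(
    output_dir="./neuralhermes-dpo",  # illustrative path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",
    warmup_steps=100,
)

# DPOTrainer ties everything together; with a peft_config and no explicit
# ref_model, trl uses the frozen base model as the reference policy.
trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```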