mlabonne committed
Commit 2516f8f · 1 Parent(s): 236d002

Update README.md

Files changed (1): README.md +20 -20
README.md CHANGED
@@ -21,13 +21,13 @@ datasets:
 
 # NeuralHermes 2.5 - Mistral 7B
 
-NeuralHermes is an [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model that has been further fine-tuned with Direct Preference Optimization (DPO) using the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset. It surpasses the original model on several benchmarks (see results)
+NeuralHermes is a [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model that has been further fine-tuned with Direct Preference Optimization (DPO) using the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset. It surpasses the original model on several benchmarks (see results below).
 
-It is directly inspired by the RLHF process described by [neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1)'s authors to improve performance. I used the same dataset and reformatted it to apply the ChatML template.
+It is directly inspired by the RLHF process described by [Intel/neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1)'s authors to improve performance. I used the same dataset and reformatted it to apply the ChatML template.
 
 The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). It required an A100 GPU for about an hour.
 
-GGUF versions of this model are available here: [mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF).
+🤗 GGUF: [mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF).
 
 ## Results
 
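The ChatML reformatting mentioned in the updated description can be illustrated with a short sketch. This is not the commit's own code: it assumes the source preference data exposes `system`, `question`, `chosen`, and `rejected` columns (as Intel's original DPO pairs do) and that the base model's tokenizer ships a ChatML chat template.

```python
# Minimal sketch: turn raw preference pairs into the ChatML-style
# prompt/chosen/rejected columns that DPO training expects.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")

def chatml_format(example):
    # Wrap the optional system message and the user question in ChatML tags.
    messages = []
    if example["system"]:
        messages.append({"role": "system", "content": example["system"]})
    messages.append({"role": "user", "content": example["question"]})
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # Close both candidate answers with the ChatML end-of-turn marker.
    return {
        "prompt": prompt,
        "chosen": example["chosen"] + "<|im_end|>\n",
        "rejected": example["rejected"] + "<|im_end|>\n",
    }

# Column names are an assumption based on Intel/orca_dpo_pairs.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(chatml_format, remove_columns=dataset.column_names)
```

Keeping the prompt and the two answers as separate columns is what lets the DPO trainer compute the chosen/rejected log-probability margin later.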
@@ -38,7 +38,7 @@ Results are improved on every benchmark: **AGIEval** (from 43.07% to 43.62%), **
 ### AGIEval
 ![](https://i.imgur.com/7an3B1f.png)
 
-### GPT4All:
+### GPT4All
 ![](https://i.imgur.com/TLxZFi9.png)
 
 ### TruthfulQA
@@ -87,24 +87,24 @@ print(sequences[0]['generated_text'])
 ## Training hyperparameters
 
 **LoRA**:
-* r=16,
-* lora_alpha=16,
-* lora_dropout=0.05,
-* bias="none",
-* task_type="CAUSAL_LM",
+* r=16
+* lora_alpha=16
+* lora_dropout=0.05
+* bias="none"
+* task_type="CAUSAL_LM"
 * target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
 
 **Training arguments**:
-* per_device_train_batch_size=4,
-* gradient_accumulation_steps=4,
-* gradient_checkpointing=True,
-* learning_rate=5e-5,
-* lr_scheduler_type="cosine",
-* max_steps=200,
-* optim="paged_adamw_32bit",
-* warmup_steps=100,
+* per_device_train_batch_size=4
+* gradient_accumulation_steps=4
+* gradient_checkpointing=True
+* learning_rate=5e-5
+* lr_scheduler_type="cosine"
+* max_steps=200
+* optim="paged_adamw_32bit"
+* warmup_steps=100
 
 **DPOTrainer**:
-* beta=0.1,
-* max_prompt_length=1024,
-* max_length=1536,
+* beta=0.1
+* max_prompt_length=1024
+* max_length=1536
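For context on how the hyperparameters above fit together, here is a minimal sketch of a LoRA + DPO setup. It follows the trl 0.7-era `DPOTrainer` API (newer trl releases move these arguments into `DPOConfig`); the output path is illustrative and `dataset` is the ChatML-formatted preference dataset from the earlier sketch.

```python
# Sketch: LoRA + DPO training wired up with the hyperparameters listed above.
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# LoRA adapter configuration (values from the list above).
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "gate_proj", "v_proj", "up_proj",
                    "q_proj", "o_proj", "down_proj"],
)

# Training arguments (values from the list above).
training_args = TrainingArguments(
    output_dir="./neuralhermes-dpo",  # illustrative path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",
    warmup_steps=100,
)

# DPOTrainer ties everything together; with a peft_config and no explicit
# ref_model, trl uses the frozen base model as the reference policy.
trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```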