|
--- |
|
base_model: |
|
- mlabonne/NeuralHermes-2.5-Mistral-7B |
|
- mistralai/Mistral-7B-v0.1 |
|
- OpenPipe/mistral-ft-optimized-1218 |
|
tags: |
|
- mergekit |
|
- merge |
|
license: apache-2.0 |
|
language: |
|
- en |
|
- fr |
|
- ar |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
### Merge Method |
|
|
|
This model was merged with the [TIES](https://arxiv.org/abs/2306.01708) merge method, using [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as the base.
|
|
|
### TIES
|
|
|
|
|
 |
|
|
|
In Yadav et al.'s paper, **TIES-Merging** is introduced as an efficient approach for consolidating multiple task-specific models into a single multitask model. It addresses two primary challenges of model merging. First, it tackles redundancy in model parameters: by focusing on the changes made during fine-tuning, it identifies the top-k% most significant changes and discards the remainder. Second, it resolves conflicts arising from disagreements between parameter signs, where different models suggest opposing adjustments to the same parameter, by creating a unified sign vector that represents the dominant direction of change across all models. The TIES-Merging process is structured into three steps:

* **Trim**: reduce redundancy by retaining only a fraction of the most significant parameters and zeroing out the rest.
* **Elect Sign**: resolve sign conflicts by establishing a unified sign vector based on the dominant direction of change.
* **Disjoint Merge**: average the parameter values that align with the unified sign vector, excluding zero values.
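The snippet below is a minimal, illustrative sketch of these three steps on toy tensors. It is only a simplified illustration of the idea (the function name, the `density` value, and the toy tensors are made up for this example), not the actual mergekit implementation:

```python
import torch

def ties_merge(base, finetuned, density=0.5):
    # Task vectors: the change each fine-tuned model made relative to the base
    task_vectors = [ft - base for ft in finetuned]

    # 1) Trim: keep only the top-`density` fraction of changes (by magnitude)
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(density * tv.numel()))
        threshold = tv.abs().flatten().topk(k).values.min()
        trimmed.append(torch.where(tv.abs() >= threshold, tv, torch.zeros_like(tv)))

    # 2) Elect Sign: dominant direction of change per parameter across models
    stacked = torch.stack(trimmed)
    elected_sign = torch.sign(stacked.sum(dim=0))

    # 3) Disjoint Merge: average only the values that agree with the elected sign,
    #    ignoring entries that were zeroed out in the Trim step
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    counts = agree.sum(dim=0).clamp(min=1)
    merged_delta = (stacked * agree).sum(dim=0) / counts

    return base + merged_delta

# Tiny 1-D example
base = torch.tensor([0.0, 0.0, 0.0, 0.0])
model_a = torch.tensor([0.8, -0.1, 0.3, 0.0])
model_b = torch.tensor([0.6, 0.2, -0.4, 0.05])
print(ties_merge(base, [model_a, model_b], density=0.5))
```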
|
|
|
|
|
### Models Merged |
|
|
|
The following models were included in the merge: |
|
* [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) |
|
* [OpenPipe/mistral-ft-optimized-1218](https://huggingface.co/OpenPipe/mistral-ft-optimized-1218) |
|
|
|
### Configuration |
|
|
|
The following YAML configuration was used to produce this model: |
|
|
|
```yaml |
|
|
|
models:
  - model: mistralai/Mistral-7B-v0.1
    # no parameters necessary for base model
  - model: OpenPipe/mistral-ft-optimized-1218
    parameters:
      density: 0.5
      weight: 0.5
  - model: mlabonne/NeuralHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: true
dtype: float16
|
|
|
``` |
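In practice, a configuration like this is applied with mergekit, for example via its `mergekit-yaml` CLI (`mergekit-yaml config.yaml ./output-directory`). The sketch below shows roughly the same thing through mergekit's Python interface; the file and output paths are placeholders, and the API names (`MergeConfiguration`, `MergeOptions`, `run_merge`) are assumed from mergekit's documentation and may differ between versions:

```python
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the YAML config shown above (path is a placeholder)
with open("ties_config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Run the merge and write the result to a local directory (also a placeholder)
run_merge(
    merge_config,
    out_path="./Mistral-TIES-Merged7B",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
    ),
)
```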
|
## Usage
|
|
|
```python |
|
# Load model directly |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("ayoubkirouane/Mistral-TIES-Merged7B") |
|
model = AutoModelForCausalLM.from_pretrained("ayoubkirouane/Mistral-TIES-Merged7B") |
|
|
|
|
|
```

### Load in 4-bit

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
|
import torch |
|
|
|
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ayoubkirouane/Mistral-TIES-Merged7B",
    device_map="auto",
    quantization_config=nf4_config,
    use_cache=False,
)
|
tokenizer = AutoTokenizer.from_pretrained("ayoubkirouane/Mistral-TIES-Merged7B") |
|
|
|
tokenizer.pad_token = tokenizer.eos_token |
|
tokenizer.padding_side = "right" |
|
|
|
def generate_response(prompt, model, max_new_tokens):
    # Tokenize the prompt and move it to the GPU
    encoded_input = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
    model_inputs = encoded_input.to("cuda")

    # Sample a continuation from the model
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

    # Decode the generated ids and strip the prompt from the returned text
    decoded_output = tokenizer.batch_decode(generated_ids)
    return decoded_output[0].replace(prompt, "")
|
|
|
``` |
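
A quick, hypothetical call to the `generate_response` helper above (the prompt is only an example):

```python
prompt = "Explain the TIES merge method in one short paragraph."
print(generate_response(prompt, model, max_new_tokens=256))
```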