---
base_model:
- mistralai/Mistral-Large-Instruct-2407
pipeline_tag: text-generation
tags:
- mistral
- 3bit
---
This is a 3-bit AutoRound quantization of Mistral-Large-Instruct-2407, exported in GPTQ format.
It was converted from the `model-*.safetensors` weight files.

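This is a rough back-of-the-envelope sketch, not a measured file size. It estimates the storage cost of group-wise quantized weights from the bit width and group size used in the script below. It assumes about 123 billion quantized parameters and one 16-bit scale per group of 128 weights. It ignores zero points, any layers left unquantized (such as embeddings and the output head), and file metadata.

```python
# Rough size estimate for group-wise quantized weights (illustrative only).
# Assumptions: ~123e9 quantized parameters, one 16-bit scale per group;
# zero points, unquantized layers and file metadata are ignored.


def effective_bits_per_weight(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Payload bits per weight plus the amortized cost of the per-group scale."""
    return bits + scale_bits / group_size


def estimated_size_gb(num_params: float, bits: int, group_size: int) -> float:
    """Approximate storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bits = num_params * effective_bits_per_weight(bits, group_size)
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    params = 123e9  # assumed parameter count, for illustration only
    fp16_gb = params * 16 / 8 / 1e9
    q3_gb = estimated_size_gb(params, bits=3, group_size=128)
    print(f"fp16 weights: ~{fp16_gb:.0f} GB")
    print(f"3-bit, group_size=128: ~{q3_gb:.0f} GB")
```

Under these assumptions a 3-bit, group-size-128 layout costs about 3.125 bits per weight. Real checkpoint sizes will differ because of the terms the sketch leaves out.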
Quantization script (the conversion took around 20 hours and about 520 GB of system RAM on an A40 GPU with 40 GB):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "mistralai/Mistral-Large-Instruct-2407"

# Load the original model weights in fp16, plus the tokenizer.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 3-bit weights, quantization parameters shared per group of 128 weights, symmetric.
bits, group_size, sym = 3, 128, True

# nsamples: number of calibration samples; iters: number of tuning iterations.
# low_gpu_mem_usage=True lowers GPU memory use at the cost of a longer run.
autoround = AutoRound(
    model,
    tokenizer,
    nsamples=256,
    iters=512,
    low_gpu_mem_usage=True,
    batch_size=4,
    bits=bits,
    group_size=group_size,
    sym=sym,
    device="cuda",
)
autoround.quantize()

# Export in the AutoGPTQ format; inplace=True modifies the model in place while packing.
output_dir = "./Mistral-Large-Instruct-2407-3bit"
autoround.save_quantized(output_dir, format="auto_gptq", inplace=True)
```
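The following sketch illustrates what `bits=3`, `group_size=128` and `sym=True` describe. It applies plain symmetric round-to-nearest (RTN) quantization to groups of weights. This is a simplified stand-in and not AutoRound itself: AutoRound additionally learns how to round each weight using signed gradient descent on calibration data. The scale formula below is an assumption for illustration, and the exact formula used by AutoRound and the GPTQ kernels may differ. The function names are hypothetical.

```python
# Simplified symmetric round-to-nearest (RTN) group quantization.
# NOTE: illustrative stand-in, not AutoRound's algorithm or its exact scale formula.
from typing import List, Tuple


def quantize_group_sym(weights: List[float], bits: int = 3) -> Tuple[List[int], float]:
    """Map one group of floats to signed integers that share a single scale."""
    qmax = 2 ** (bits - 1) - 1  # 3 for 3-bit
    qmin = -(2 ** (bits - 1))   # -4 for 3-bit (never produced with this scale choice)
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [min(max(round(w / scale), qmin), qmax) for w in weights]
    return q, scale


def dequantize_group(q: List[int], scale: float) -> List[float]:
    """Reconstruct approximate weights from the integers and the shared scale."""
    return [v * scale for v in q]


def quantize_rows(row: List[float], group_size: int = 128, bits: int = 3):
    """Split a weight row into chunks of `group_size`, each with its own scale."""
    return [
        quantize_group_sym(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]


if __name__ == "__main__":
    group = [0.9, -0.35, 0.05, -0.9]
    q, scale = quantize_group_sym(group, bits=3)
    print(q, scale)
    print(dequantize_group(q, scale))
```

Each group stores its own scale. A group of small weights is therefore not forced onto the coarse grid of a group that contains a large outlier, and that locality is the reason for using a group size such as 128.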