---
base_model:
- mistralai/Mistral-Large-Instruct-2407
pipeline_tag: text-generation
tags:
- mistral
- 3bit
---
This is a 3-bit AutoRound GPTQ version of Mistral-Large-Instruct-2407.
The conversion used the model-*.safetensors weights.

Quantization script (the conversion takes around 520 GB of system RAM and roughly 20 hours on an A40 GPU with 40 GB of VRAM):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound
import torch

# Load the full-precision model and tokenizer.
model_name = "mistralai/Mistral-Large-Instruct-2407"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 3-bit symmetric quantization with a group size of 128.
bits, group_size, sym = 3, 128, True

autoround = AutoRound(
    model,
    tokenizer,
    nsamples=256,
    iters=512,
    low_gpu_mem_usage=True,
    batch_size=4,
    bits=bits,
    group_size=group_size,
    sym=sym,
    device="cuda",
)
autoround.quantize()

# Export the quantized weights in GPTQ format.
output_dir = "./Mistral-Large-Instruct-2407-3bit"
autoround.save_quantized(output_dir, format="auto_gptq", inplace=True)
```
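
For reference, a minimal inference sketch. This assumes the exported GPTQ checkpoint loads through transformers' GPTQ integration (which requires the `auto-gptq` and `optimum` packages to be installed) and that the local path matches the `output_dir` used in the script above; the prompt is illustrative.

```python
# Minimal inference sketch. Assumptions: auto-gptq + optimum are installed
# so transformers can load the GPTQ weights, and the path below matches the
# output_dir from the quantization script.
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_dir = "./Mistral-Large-Instruct-2407-3bit"
model = AutoModelForCausalLM.from_pretrained(quantized_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quantized_dir)

# Format the prompt with the model's chat template.
messages = [{"role": "user", "content": "Explain 3-bit weight quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```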