Quantization of flan-t5-base with device_map = CPU
I am trying to quantize the flan-t5-base model on a Mac notebook.
The Python script is below:
Quantization
# Testing
import bitsandbytes as bnb
print(bnb.__version__)

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BitsAndBytesConfig

# Check if BitsAndBytesConfig is accessible
print(hasattr(BitsAndBytesConfig, 'load_in_8bit'))
model_name = "google/flan-t5-base"
# Define the configuration for quantization
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
print(quantization_config)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model with the quantization config, forcing the CPU
device_map = {"": "cpu"}
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map=device_map,
)
The output is:
0.42.0 (bnb version)
True (load_in_8bit attribute check)
BitsAndBytesConfig {
"_load_in_4bit": false,
"_load_in_8bit": true,
"bnb_4bit_compute_dtype": "float32",
"bnb_4bit_quant_storage": "uint8",
"bnb_4bit_quant_type": "fp4",
"bnb_4bit_use_double_quant": false,
"llm_int8_enable_fp32_cpu_offload": false,
"llm_int8_has_fp16_weight": false,
"llm_int8_skip_modules": null,
"llm_int8_threshold": 6.0,
"load_in_4bit": false,
"load_in_8bit": true,
"quant_method": "bitsandbytes"
}
The model load itself then fails:
>>> model = AutoModelForSeq2SeqLM.from_pretrained(
...     model_name,
...     quantization_config=quantization_config,
...     device_map=device_map)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_8bit.py", line 73, in validate_environment
raise ImportError(
ImportError: Using bitsandbytes
8-bit quantization requires the latest version of bitsandbytes: `pip install -U bitsandbytes
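My suspicion is that this validate_environment check is really failing on missing GPU support rather than on the package version, since bitsandbytes 0.42.0 ships CUDA kernels only. A quick way to confirm that locally (my assumption; torch.cuda.is_available is the standard PyTorch check):

import torch

print(torch.__version__)
# bitsandbytes 0.42.0 needs CUDA; a Mac has no CUDA device,
# so this is expected to print False
print(torch.cuda.is_available())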
- I upgraded bitsandbytes multiple times; its version is 0.42.0 and it does have the load_in_8bit attribute, as the BitsAndBytesConfig output above also confirms.
- After searching the internet, I found forum posts saying that quantization of this model is not supported on a CPU device.
Can someone confirm this, or suggest an alternative solution using BitsAndBytesConfig in CPU mode? I have sketched one fallback idea below.
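The only CPU-only fallback I can think of is PyTorch dynamic quantization, which converts the Linear layers to int8 at inference time without bitsandbytes. A minimal sketch of what I have in mind (my own assumption, not validated for flan-t5-base):

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Quantize only the Linear layers to int8; this runs entirely on CPU
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("Translate to German: Hello, how are you?", return_tensors="pt")
outputs = quantized_model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Would something like this be a reasonable substitute, or is there a way to make BitsAndBytesConfig itself work on CPU?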
Thanks
Ananth