Errors with quantized model
I am using the following quantization method:
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True
)
model = AutoModelForVision2Seq.from_pretrained(
    "ibm-granite/granite-vision-3.1-2b-preview",
    quantization_config=bnb_config,
)
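For context, generation is a standard processor call followed by model.generate, roughly like the sketch below (the image path and prompt are placeholders, not my actual inputs):

from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("ibm-granite/granite-vision-3.1-2b-preview")

# Placeholder inputs, only to illustrate the call that raises the error
image = Image.open("example.png")
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}]
text = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(images=image, text=text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))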
During generation, I get an error:
/usr/local/lib/python3.11/dist-packages/torch/nn/functional.py in multi_head_attention_forward(query, key, value, embed_dim_to_check, num_heads, in_proj_weight, in_proj_bias, bias_k, bias_v, add_zero_attn, dropout_p, out_proj_weight, out_proj_bias, training, key_padding_mask, need_weights, attn_mask, use_separate_proj_weight, q_proj_weight, k_proj_weight, v_proj_weight, static_k, static_v, average_attn_weights, is_causal)
6249 attn_output.transpose(0, 1).contiguous().view(tgt_len * bsz, embed_dim)
6250 )
-> 6251 attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
6252 attn_output = attn_output.view(tgt_len, bsz, attn_output.size(1))
6253
RuntimeError: self and mat2 must have the same dtype, but got Half and Byte
It works fine without quantization; however, quantization is useful for fine-tuning. Could you please suggest how to make it work?
Thank you
Thank you for raising this issue.
We managed to reproduce the error and are currently investigating.
There is an issue with the quantization of the vision encoder.
Quantizing with the following config should work:
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["vision_tower", "lm_head"],  # Skip problematic modules
    llm_int8_enable_fp32_cpu_offload=True,
)
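The model can then be loaded the same way as before with this config, e.g. (a minimal sketch; device_map="auto" is my assumption, not part of the fix):

from transformers import AutoModelForVision2Seq

# Load the checkpoint with the fixed 4-bit config; vision_tower and lm_head stay unquantized
model = AutoModelForVision2Seq.from_pretrained(
    "ibm-granite/granite-vision-3.1-2b-preview",
    quantization_config=bnb_config,
    device_map="auto",
)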
Thank you, I will give it a try. I am basically trying to reduce the model size so that I can fine-tune it on an A100 GPU.
Check out the example here:
https://huggingface.co/learn/cookbook/en/fine_tuning_granite_vision_sft_trl
I still need to push the quantization fix there, but full fine-tuning works on an A100.
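If you do fine-tune the quantized model, the usual pattern (a sketch under assumptions, not necessarily what the cookbook does) is QLoRA-style training: prepare the 4-bit model for training and attach a LoRA adapter with peft, for example:

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Make the quantized base model trainable (casts layer norms, enables input grads, etc.)
model = prepare_model_for_kbit_training(model)

# Hypothetical LoRA settings; target_modules depend on the model's actual layer names
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()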
It works, thank you! This is very helpful.
Thank you
@elischwartz
I am closing this issue for now.