Errors with quantized model

#8
by tatyanavidrevich - opened

I am using the following quantization method:

from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
)
model = AutoModelForVision2Seq.from_pretrained(
    "ibm-granite/granite-vision-3.1-2b-preview",
    quantization_config=bnb_config,
)

During generation, I get an error:
/usr/local/lib/python3.11/dist-packages/torch/nn/functional.py in multi_head_attention_forward(query, key, value, embed_dim_to_check, num_heads, in_proj_weight, in_proj_bias, bias_k, bias_v, add_zero_attn, dropout_p, out_proj_weight, out_proj_bias, training, key_padding_mask, need_weights, attn_mask, use_separate_proj_weight, q_proj_weight, k_proj_weight, v_proj_weight, static_k, static_v, average_attn_weights, is_causal)
6249 attn_output.transpose(0, 1).contiguous().view(tgt_len * bsz, embed_dim)
6250 )
-> 6251 attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
6252 attn_output = attn_output.view(tgt_len, bsz, attn_output.size(1))
6253

RuntimeError: self and mat2 must have the same dtype, but got Half and Byte

It works fine without quantization; however, quantization is useful during fine-tuning. Could you please suggest how to make it work?

Thank you

IBM Granite org

Thank you for raising this issue,
We managed to reproduce the error and are currently investigating.

Hi @tatyanavidrevich

There's an issue with the quantization of the vision encoder.
Quantizing with the following config should work:

import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["vision_tower", "lm_head"],  # skip the problematic modules
    llm_int8_enable_fp32_cpu_offload=True,
)
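
For completeness, a minimal loading sketch using the config above (the device_map setting is an assumption, not part of the original suggestion); the model can then be used for generation or fine-tuning as before:

from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "ibm-granite/granite-vision-3.1-2b-preview"
processor = AutoProcessor.from_pretrained(model_id)
# Pass the bnb_config defined above; device_map="auto" places the quantized weights on the GPU.
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)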

Thank you, I will give it a try. I am basically trying to reduce the model size so that I can fine-tune it on an A100 GPU.

Check out the example here:
https://huggingface.co/learn/cookbook/en/fine_tuning_granite_vision_sft_trl

I still need to push the quantization fix there, but full fine-tuning works on an A100.
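
In the meantime, here is a minimal sketch of memory-efficient (QLoRA-style) fine-tuning with the quantized model. This is not the cookbook's exact code; the LoRA rank and target module names are assumptions and should be checked against the model's actual layer names:

import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "ibm-granite/granite-vision-3.1-2b-preview"

# Same idea as the config suggested above: keep the vision tower and lm_head un-quantized.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["vision_tower", "lm_head"],
)

model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # prepares the quantized model for stable k-bit training

# The rank and target modules below are illustrative; inspect model.named_modules() first.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable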

It works, thank you! This is very helpful

IBM Granite org

Thank you @elischwartz
I am closing this issue for now.

aarbelle changed discussion status to closed
