---
library_name: transformers
base_model:
- ibm-granite/granite-vision-3.1-2b-preview
pipeline_tag: image-text-to-text
license: apache-2.0
tags:
- granite
- vision
- quantization
---

## About the uploaded model
- Quantized by: hassenhamdi
- Original model: [ibm-granite/granite-vision-3.1-2b-preview](https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview)
- Precision: 8-bit

## Setup
You can run the quantized model with these steps:
- Check the requirements of the original model, in particular the Python, CUDA, and transformers versions.
- Install the quantization-related packages:

```bash
pip install transformers accelerate "bitsandbytes>0.37.0"
```

- Load and run the model:

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from huggingface_hub import hf_hub_download
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

quantized_model_path = "hassenhamdi/granite-vision-3.1-2b-preview-8bit"
original_model_path = "ibm-granite/granite-vision-3.1-2b-preview"

# 8-bit bitsandbytes models cannot be moved with .to(); let device_map place the weights.
model = AutoModelForVision2Seq.from_pretrained(
    quantized_model_path,
    trust_remote_code=True,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(original_model_path)

# Prepare the image and text prompt, using the appropriate prompt template.
# The example image is hosted in the original model repository.
img_path = hf_hub_download(repo_id=original_model_path, filename="example.png")

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": img_path},
            {"type": "text", "text": "What is the highest scoring model on ChartQA and what is its score?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(device)

# Autoregressively complete the prompt.
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```

## Configurations
- The configuration details are in `config.json`.
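
For reference, below is a minimal sketch of how an 8-bit checkpoint like this one can be produced from the original model with bitsandbytes. This is an assumed recipe, not necessarily the exact procedure used for this upload; the output directory name is a placeholder.

```python
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# Assumed recipe: load the original model in 8-bit via bitsandbytes and re-save it.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForVision2Seq.from_pretrained(
    "ibm-granite/granite-vision-3.1-2b-preview",
    quantization_config=bnb_config,
    device_map="auto",
)

# Serializing 8-bit weights requires bitsandbytes > 0.37.0, matching the install step above.
model.save_pretrained("granite-vision-3.1-2b-preview-8bit")  # placeholder output directory
```

Saving the model this way writes the quantization settings into `config.json`, which is why the quantized checkpoint can be loaded without passing a quantization config explicitly.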