Model Details

The VisMin-Idefics2 model was developed as a fine-tuned version of the Idefics2 model, leveraging the VisMin dataset for enhanced performance in multimodal tasks. This model excels in visual-text alignment and is designed to handle tasks where models must differentiate between similar images based on textual descriptions. By employing the QLoRa technique and focusing on a rule-based selection of image-text pairs, the VisMin-Idefics2 model is optimized for fine-grained understanding and improved generalization across various multimodal benchmarks.

Model Summary

Usage

This section shows snippets of code for generation for fine-tuned idefics2-8b. The codes only differ by the input formatting. Let's first define some common imports and inputs.

from transformers import AutoProcessor, AutoModelForVision2Seq

model_name_or_path = "path/to/fine-tuned-model"
if "A100" in gpu_name or "H100" in gpu_name:
     attn_implementation = "flash_attention_2"
else:
     attn_implementation = None

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b", do_image_splitting=False)
model = AutoModelForVision2Seq.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="auto",
    torch_dtype=torch.float16,
    _attn_implementation=attn_implementation,  # only A100, H100 GPUs
    quantization_config=quantization_config
    if model_name_or_path in ["HuggingFaceM4/idefics2-8b", "HuggingFaceM4/idefics2-8b-base"]
    else None,
)

Bibtex

 @article{vismin2024,
    title={VisMin: Visual Minimal-Change Understanding},
    author={Awal, Rabiul and Ahmadi, Saba and Zhang, Le and Agrawal, Aishwarya},
    year={2024}
}
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no library tag.

Collection including mair-lab/vismin-idefics2-8b