ABC
A collection of models and datasets from ABC: Achieving Better Control of Multimodal Embeddings using VLMs.
Viewer • Updated • 2.25M • 479 • 5 • Note Pretraining data for ABC-Qwen2VL-Pretrain, derived from Conceptual Captions using negative mining. For details, see the paper.
TIGER-Lab/ABC-VG-Instruct
Viewer • Updated • 12.5k • 124 • Note Instruction finetuning dataset derived from Visual Genome; it contains multiple instructions for each image, which can serve as negatives for one another during training.
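To make the negatives idea concrete, here is a minimal, illustrative PyTorch sketch (not the authors' training code) of a contrastive loss where the targets of an image's other instructions supply the negatives; the function name, shapes, and loss form are assumptions for the example.

```python
# Illustrative sketch (not the authors' training code) of using the
# targets of the same image's other instructions as negatives.
# All names, shapes, and the loss form are assumptions for the example.
import torch
import torch.nn.functional as F

def info_nce(query, candidates, temperature=0.07):
    """query: (D,) embedding of one (image, instruction) pair.
    candidates: (N, D) target embeddings; index 0 is the positive,
    indices 1..N-1 come from the image's other instructions."""
    query = F.normalize(query, dim=-1)
    candidates = F.normalize(candidates, dim=-1)
    logits = candidates @ query / temperature  # (N,) cosine similarities
    target = torch.tensor([0])                 # the positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)

# Example: one positive plus three negatives drawn from the same
# image's other instructions.
query = torch.randn(768)
candidates = torch.randn(4, 768)
print(info_nce(query, candidates))
```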
TIGER-Lab/ABC-Qwen2VL-Pretrain
Image-Text-to-Text • Updated • 22 • 1 • Note The pretrained base adapter. It supports text and image inputs (similar to CLIP) for creating embeddings. If you are training your own adapter, use this as the base.
TIGER-Lab/ABC-Qwen2VL-Instruct
Image-Text-to-Text • Updated • 15 • Note The final instruction-finetuned model. It supports text, image, and combined image-text inputs when creating embeddings.
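As a rough usage sketch only: the following assumes the checkpoint loads as a standard Qwen2-VL model through transformers and that mean pooling of the last hidden state stands in for whatever embedding interface the authors provide; consult the model card for the intended usage.

```python
# Hedged sketch of instruction-conditioned embedding extraction with a
# Qwen2-VL backbone. The loading path and pooling are assumptions, not
# the authors' documented interface.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "TIGER-Lab/ABC-Qwen2VL-Instruct"  # assumed loadable this way
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Image plus instruction: the instruction steers which aspect of the
# image the embedding should capture.
image = Image.new("RGB", (224, 224), "red")  # placeholder image
text = "<|vision_start|><|image_pad|><|vision_end|>Focus on the main object."
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
# Mean-pool the final hidden states into a single embedding vector.
embedding = out.hidden_states[-1].mean(dim=1)
print(embedding.shape)
```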
ABC: Achieving Better Control of Multimodal Embeddings using VLMs
Paper • 2503.00329 • Published • 20