Vision-Language - an OliP Collection

Vision-Language

updated Dec 19, 2024

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19, 2024 • 43
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

Paper • 2407.04172 • Published Jul 4, 2024 • 23
facebook/chameleon-7b

Image-Text-to-Text • Updated Jul 23, 2024 • 24.6k • 175
vidore/colpali

Visual Document Retrieval • Updated 18 days ago • 28.8k • 423
E5-V: Universal Embeddings with Multimodal Large Language Models

Paper • 2407.12580 • Published Jul 17, 2024 • 40
Wolf: Captioning Everything with a World Summarization Framework

Paper • 2407.18908 • Published Jul 26, 2024 • 32
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5, 2024 • 61
LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 60
ColPali

Space • Running on Zero • Document Retrieval • 111
VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9, 2024 • 47
nvidia/NVLM-D-72B

Image-Text-to-Text • Updated Jan 14 • 16.1k • 767
mistralai/Pixtral-12B-2409

Image-Text-to-Text • Updated Dec 26, 2024 • • 608
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83
stepfun-ai/GOT-OCR2_0

Image-Text-to-Text • Updated 20 days ago • 116k • 1.4k
deepseek-ai/Janus-1.3B

Any-to-Any • Updated 28 days ago • 189k • 577
h2oai/h2ovl-mississippi-2b

Text Generation • Updated Dec 13, 2024 • 117k • 29
HuggingFaceM4/Idefics3-8B-Llama3

Image-Text-to-Text • Updated Dec 2, 2024 • 49k • 269
wyu1/Leopard-Idefics2

Updated Nov 8, 2024 • 13 • 4
HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • Updated Dec 2, 2024 • 114k • 392
alibaba-damo/mgp-str-base

Image-to-Text • Updated Dec 11, 2023 • 8.42k • 64
google/paligemma2-3b-pt-224

Image-Text-to-Text • Updated Dec 5, 2024 • 70.8k • 146