ViDoRe V3 is our latest benchmark, engineered to set a new industry gold standard for multi-modal, enterprise document retrieval evaluation.
AI & ML interests
Retrieval, Computer Vision, LLM
Pre-trained checkpoints for the ColPali model.
Pre-trained checkpoints for the ColVision models with a ColSmolVLM backbone.
Benchmark for document retrieval using visual features, introduced in the ColPali paper. The datasets use the QA format.
Each page of the ViDoRe benchmark was partitioned into text chunks with Unstructured. Detected figures and tables were captioned with Claude 3 Sonnet.
ViDoRe benchmark with the full OCR text of each page. ⚠️ This dataset serves as an intermediate step → use "ViDoRe Chunk OCR (baseline)" for evaluation!
Pre-trained checkpoints for the ColQwen2 model.
Models that can be used with the native transformers 🤗 implementation instead of colpali-engine.
Benchmark for document retrieval using visual features, introduced in the ColPali paper. The datasets use the BEIR format.
Main resources for the paper: "ColPali: Efficient Document Retrieval with Vision Language Models"
- ColPali: Efficient Document Retrieval with Vision Language Models (paper, arXiv:2407.01449)
- vidore/colpali (model, Visual Document Retrieval)
- vidore/colpali_train_set (dataset)
- Vidore Leaderboard: browse and compare visual document retrieval models