nielsr HF Staff committed
Commit ba9c697 · verified · 1 Parent(s): 7fbbd2c

Add model card for OpenVision 2


This PR adds a comprehensive model card for the OpenVision 2 model.

It links to the official paper: [OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning](https://huggingface.co/papers/2509.01644), its project page: [https://ucsc-vlaa.github.io/OpenVision2/](https://ucsc-vlaa.github.io/OpenVision2/), and its GitHub repository: [https://github.com/UCSC-VLAA/OpenVision/blob/main/src/main_openvision2.py](https://github.com/UCSC-VLAA/OpenVision/blob/main/src/main_openvision2.py).

Additionally, it includes `pipeline_tag: image-text-to-text` in the metadata, making the model discoverable under the "Image-Text-to-Text" pipeline at https://huggingface.co/models?pipeline_tag=image-text-to-text.

Files changed (1)
  1. README.md +13 -0
README.md ADDED
@@ -0,0 +1,13 @@
+ ---
+ pipeline_tag: image-text-to-text
+ ---
+
+ # OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
+
+ This repository hosts the OpenVision 2 model, a family of generative pretrained visual encoders for multimodal learning. As described in the paper [OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning](https://huggingface.co/papers/2509.01644), OpenVision 2 simplifies its predecessor's architecture by removing the text encoder and contrastive loss, relying solely on a captioning loss for a purely generative training signal.
+
+ This simplification significantly enhances training efficiency, reducing both training time and memory consumption, while maintaining competitive performance across a broad range of multimodal benchmarks. The improved efficiency allows for scaling to vision encoders exceeding 1 billion parameters, advocating for a lightweight, generative-only approach in multimodal foundation models.
+
+ - **Paper:** [OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning](https://huggingface.co/papers/2509.01644)
+ - **Project Page:** [https://ucsc-vlaa.github.io/OpenVision2/](https://ucsc-vlaa.github.io/OpenVision2/)
+ - **GitHub Repository:** [https://github.com/UCSC-VLAA/OpenVision/blob/main/src/main_openvision2.py](https://github.com/UCSC-VLAA/OpenVision/blob/main/src/main_openvision2.py)