AI & ML interests: Computer Vision
OpenGVLab
Welcome to OpenGVLab! We are a research group from Shanghai AI Lab focused on Vision-Centric AI research. The "GV" in OpenGVLab stands for general vision: a general understanding of vision, so that little effort is needed to adapt to new vision-based tasks.
Models
- InternVL: a pioneering open-source alternative to GPT-4V.
- InternImage: a large-scale vision foundation model with deformable convolutions.
- InternVideo: large-scale video foundation models for multimodal understanding.
- VideoChat: an end-to-end chat assistant for video comprehension.
- All-Seeing-Project: towards panoptic visual recognition and understanding of the open world.
Datasets
- ShareGPT4o: a large-scale resource that we plan to open-source, comprising 200K meticulously annotated images, 10K videos with highly descriptive captions, and 10K audio files with detailed descriptions.
- InternVid: a large-scale video-text dataset for multimodal understanding and generation.
- MMPR: a high-quality, large-scale multimodal preference dataset.
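These datasets are hosted on the Hugging Face Hub, so they can be pulled with the `datasets` library. Below is a minimal loading sketch using MMPR-v1.2 as an example; the split name is an assumption and may differ per dataset, so check the individual dataset cards before relying on it.

```python
# Minimal sketch: loading an OpenGVLab dataset from the Hugging Face Hub.
# Assumes the `datasets` library is installed and that a default
# configuration and a "train" split exist for this repo; if not,
# consult the dataset card for the correct names.
from datasets import load_dataset

# MMPR-v1.2 is the multimodal preference dataset listed above.
mmpr = load_dataset("OpenGVLab/MMPR-v1.2", split="train")

print(mmpr)            # number of rows and column names
print(mmpr[0].keys())  # inspect the fields of a single preference record
```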
Benchmarks
- MVBench: a comprehensive benchmark for multimodal video understanding.
- CRPE: a benchmark covering all elements of the relation triplets (subject, predicate, object), providing a systematic platform for the evaluation of relation comprehension ability.
- MM-NIAH: a comprehensive benchmark for long multimodal document comprehension.
- GMAI-MMBench: a comprehensive multimodal evaluation benchmark towards general medical AI.
This collection includes only the InternVL3.5 checkpoints that have completed the full training pipeline (i.e., Pretraining, SFT, MPO, Cascade RL).
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency (Paper • 2508.18265 • Published • 191)
- OpenGVLab/InternVL3_5-241B-A28B-HF (Image-Text-to-Text • 241B • Updated • 371 • 8)
- OpenGVLab/InternVL3_5-38B-HF (Image-Text-to-Text • 38B • Updated • 1.35k • 5)
- OpenGVLab/InternVL3_5-30B-A3B-HF (Image-Text-to-Text • 31B • Updated • 948 • 4)
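The *-HF checkpoints in this collection are packaged for direct use with Hugging Face transformers under the Image-Text-to-Text task. Below is a minimal inference sketch, assuming a recent transformers release that ships the image-text-to-text pipeline and enough GPU memory for the chosen checkpoint; the image URL and prompt are placeholders.

```python
# Minimal sketch: running an InternVL3.5 *-HF checkpoint with the
# transformers image-text-to-text pipeline (the task tag shown above).
# Assumes a recent transformers release, accelerate for device_map="auto",
# and sufficient GPU memory; smaller checkpoints work the same way.
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="OpenGVLab/InternVL3_5-38B-HF",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/street.jpg"},  # placeholder image URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

outputs = pipe(text=messages, max_new_tokens=64)
print(outputs)  # the assistant reply is contained in the returned generated_text field
```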
Spaces (13)
- ScaleCUA Grounding 🏢 (no application file • 1)
- InternVideo2.5 💬: Hierarchical Compression for Long-Context Video Modeling (runtime error)
- InternVL ⚡: Interact with a multimodal chatbot that analyzes images and text (running • 497)
- MVBench Leaderboard 🐨: Submit and view model evaluations (running • 40)
- InternVideo2 Chat 8B HD 👁: Upload a video to chat about its contents (runtime error • 18)
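The Spaces above are standard Gradio apps, so a running Space can also be called programmatically with `gradio_client`. Endpoint names and parameters vary per Space and are not documented here, so the sketch below only discovers the available endpoints rather than hard-coding one.

```python
# Minimal sketch: querying a running OpenGVLab Space programmatically.
# Assumes gradio_client is installed and the Space is up and public;
# endpoint names and parameters are Space-specific, so we inspect them
# instead of guessing.
from gradio_client import Client

client = Client("OpenGVLab/InternVL")  # the chat demo listed above

# Print the Space's callable endpoints and their expected parameters;
# call client.predict(..., api_name="/<endpoint>") once they are known.
client.view_api()
```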
Models (269)
- OpenGVLab/ScaleCUA-7B (Image-Text-to-Text • 8B • Updated • 30 • 5)
- OpenGVLab/ScaleCUA-32B (Image-Text-to-Text • 33B • Updated • 15 • 12)
- OpenGVLab/ScaleCUA-3B (Image-Text-to-Text • 4B • Updated • 55 • 8)
- OpenGVLab/InternVL3-78B-AWQ (Image-Text-to-Text • Updated • 665 • 10)
- OpenGVLab/InternVL3-78B-Instruct (Image-Text-to-Text • 78B • Updated • 4.06k • 8)
- OpenGVLab/InternVL3-78B-Pretrained (Image-Text-to-Text • 78B • Updated • 33 • 1)
- OpenGVLab/InternVL3-78B (Image-Text-to-Text • 78B • Updated • 25.9k • 218)
- OpenGVLab/InternVL3-38B-AWQ (Image-Text-to-Text • Updated • 1.33k • 4)
- OpenGVLab/InternVL3-38B-Instruct (Image-Text-to-Text • 38B • Updated • 7.21k • 9)
- OpenGVLab/InternVL3-38B-Pretrained (Image-Text-to-Text • 38B • Updated • 38 • 1)
Datasets (48)
- OpenGVLab/ScaleCUA-Data (Updated • 1.16k • 14)
- OpenGVLab/GenExam (Updated • 163 • 3)
- OpenGVLab/VRBench (Preview • Updated • 177 • 3)
- OpenGVLab/MMPR-v1.2 (Updated • 13.4k • 31)
- OpenGVLab/MMPR-Tiny (Updated • 529 • 3)
- OpenGVLab/MMPR-v1.2-prompts (Updated • 6.66k • 2)
- OpenGVLab/MMBench-GUI (Preview • Updated • 80 • 35)
- OpenGVLab/GUI-Odyssey (Viewer • Updated • 7.74k • 18.2k • 25)
- OpenGVLab/LORIS (Updated • 224 • 3)
- OpenGVLab/OpenCUA_Env (Updated • 6)