- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 26
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 42
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 22
Collections including paper arxiv:2403.12895
- mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
  Paper • 2403.12895 • Published • 32
- MiniCPM-V: A GPT-4V Level MLLM on Your Phone
  Paper • 2408.01800 • Published • 80
- Phantom of Latent for Large Language and Vision Models
  Paper • 2409.14713 • Published • 29