- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
  Paper • 2401.13313 • Published • 5
- BAAI/Bunny-v1_0-4B
  Text Generation • Updated • 136 • 9
- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 102
- Jina CLIP: Your CLIP Model Is Also Your Text Retriever
  Paper • 2405.20204 • Published • 35
Collections including paper arxiv:2406.12275
- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 19
- Learning and Leveraging World Models in Visual Representation Learning
  Paper • 2403.00504 • Published • 32
- MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
  Paper • 2403.01422 • Published • 27
- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
  Paper • 2403.05438 • Published • 20