video/image
updated
google/vit-base-patch16-224 • Image Classification • 86.6M • 3.95M • 933
OpenGVLab/internimage_g_jointto22k_384 • Image Classification • 3B • 82 • 1
chancharikm/qwen2.5-vl-72b-cam-motion • Video-Text-to-Text • 73B • 3 • 1
Text Generation • 2B • 23 • 91
43 • 1
Viewer • 27.1k • 1.07k • 1
Viewer • 900 • 2.26k • 10
moonshotai/Kimi-VL-A3B-Thinking-2506 • Image-Text-to-Text • 16B • 128k • 346
lmms-lab/llava-critic-113k • Viewer • 113k • 182 • 28
lmms-lab/M4-Instruct-Data • 1.33k • 76
lmms-lab/llava-next-interleave-qwen-7b • Text Generation • 8B • 201 • 27
lmms-lab/LLaVA-OneVision-Data • Viewer • 3.94M • 14.2k • 227
Viewer • 19.2k • 17
Viewer • 12.5k • 14
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification • Paper • 2312.14378
avalab/cTBLS_knowledge_retriever
CraftJarvis/minecraft-vla-sft • Viewer • 3.78M • 326 • 10