- Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
  Paper • 2309.15915 • Published • 2
- Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
  Paper • 2310.00653 • Published • 3
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 8
- An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
  Paper • 2309.09958 • Published • 19
Zhao
Hanyu66
AI & ML interests
CV, NLP
Recent Activity
liked a model 5 days ago: Pointcept/PointTransformerV3
liked a dataset 7 days ago: RunsenXu/PointLLM
liked a model 9 days ago: RunsenXu/PointLLM_7B_v1.2
Organizations
None yet
Collections: 1
models: 1
datasets: None public yet