Jiannan Wu

wjn922

AI & ML interests

None yet

Recent Activity

upvoted an article 8 days ago

NEO-unify: Building Native Multimodal Unified Models End to End

liked a dataset 3 months ago

m-Just/O3-Bench

upvoted a paper 3 months ago

InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search

authored 4 papers 7 months ago

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

Paper • 2406.08394 • Published Jun 12, 2024

Language as Queries for Referring Video Object Segmentation

Paper • 2201.00487 • Published Jan 3, 2022

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Paper • 2312.14238 • Published Dec 21, 2023 • 20

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Paper • 2305.11175 • Published May 18, 2023 • 4