Devil in the Number: Towards Robust Multi-modality Data Filter Paper • 2309.13770 • Published Sep 24, 2023
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4, 2024 • 6
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark Paper • 2504.14693 • Published Apr 20, 2025
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Paper • 2410.08261 • Published Oct 10, 2024 • 52
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding Paper • 2307.16449 • Published Jul 31, 2023 • 16