SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 3 days ago • 100
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published 12 days ago • 32
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published 12 days ago • 27
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation Paper • 2502.05179 • Published 16 days ago • 22