Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding Paper • 2508.20478 • Published 9 days ago • 15
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer Paper • 2508.09131 • Published 25 days ago • 16
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published 23 days ago • 141
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published Jul 14 • 49
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published Jul 7 • 73
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published Feb 14 • 56
Taming Teacher Forcing for Masked Autoregressive Video Generation Paper • 2501.12389 • Published Jan 21 • 10