Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach • Paper • 2502.05171 • Published 17 days ago • 114
VideoRoPE: What Makes for Good Video Rotary Position Embedding? • Paper • 2502.05173 • Published 17 days ago • 60
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models • Paper • 2502.02492 • Published 20 days ago • 56
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation • Paper • 2501.09755 • Published Jan 16 • 34
Learning and Leveraging World Models in Visual Representation Learning • Paper • 2403.00504 • Published Mar 1, 2024 • 32
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis • Paper • 2401.17093 • Published Jan 30, 2024 • 20