Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies Paper • 2508.20072 • Published 11 days ago • 28
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts Paper • 2507.20939 • Published Jul 28 • 56
AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation Paper • 2506.03126 • Published Jun 3 • 22
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction Paper • 2504.01014 • Published Apr 1 • 71
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers Paper • 2503.19480 • Published Mar 25 • 16
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Paper • 2412.04432 • Published Dec 5, 2024 • 16
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation Paper • 2412.04445 • Published Dec 5, 2024 • 23
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation Paper • 2307.06942 • Published Jul 13, 2023 • 23