GARDO: Reinforcing Diffusion Models without Reward Hacking Paper • 2512.24138 • Published 8 days ago • 26
SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder Paper • 2512.11749 • Published 26 days ago • 38
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework Paper • 2512.03041 • Published Dec 2, 2025 • 62
Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout Paper • 2511.20649 • Published Nov 25, 2025 • 47
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published Dec 1, 2025 • 71
VisCoder2: Building Multi-Language Visualization Coding Agents Paper • 2510.23642 • Published Oct 24, 2025 • 21
Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published Oct 17, 2025 • 49
BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions Paper • 2510.10666 • Published Oct 12, 2025 • 27
UniVideo: Unified Understanding, Generation, and Editing for Videos Paper • 2510.08377 • Published Oct 9, 2025 • 71
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning Paper • 2509.01644 • Published Sep 1, 2025 • 33
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1, 2025 • 76
ACECODER: Acing Coder RL via Automated Test-Case Synthesis Paper • 2502.01718 • Published Feb 3, 2025 • 28
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design Paper • 2505.16175 • Published May 22, 2025 • 42
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Paper • 2505.15966 • Published May 21, 2025 • 53
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation Paper • 2505.14640 • Published May 20, 2025 • 16
General-Reasoner: Advancing LLM Reasoning Across All Domains Paper • 2505.14652 • Published May 20, 2025 • 24