Region-Constraint In-Context Generation for Instructional Video Editing • Paper • 2512.17650 • Published 6 days ago • 45
Is Nano Banana Pro a Low-Level Vision All-Rounder? A Comprehensive Evaluation on 14 Tasks and 40 Datasets • Paper • 2512.15110 • Published 9 days ago • 7
Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning • Paper • 2511.20549 • Published about 1 month ago • 25
RynnVLA-002: A Unified Vision-Language-Action and World Model • Paper • 2511.17502 • Published Nov 21 • 25
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning • Paper • 2511.16043 • Published Nov 20 • 106
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models • Paper • 2511.16668 • Published Nov 20 • 53
NaTex: Seamless Texture Generation as Latent Color Diffusion • Paper • 2511.16317 • Published Nov 20 • 15
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing • Paper • 2405.04007 • Published May 7, 2024 • 1
Chimera: Compositional Image Generation using Part-based Concepting • Paper • 2510.18083 • Published Oct 20 • 1
MaskAttn-SDXL: Controllable Region-Level Text-To-Image Generation • Paper • 2509.15357 • Published Sep 18 • 1
Structured Information for Improving Spatial Relationships in Text-to-Image Generation • Paper • 2509.15962 • Published Sep 19 • 1
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation • Paper • 2511.09611 • Published Nov 12 • 68
Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions • Paper • 2511.06876 • Published Nov 10 • 27
Emu3.5 • Collection • Native Multimodal Models are World Learners 🌍 • 4 items • Updated about 20 hours ago • 72
Simulating the Visual World with Artificial Intelligence: A Roadmap • Paper • 2511.08585 • Published Nov 11 • 29