OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation Paper • 2512.08294 • Published 25 days ago • 17
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published 29 days ago • 38
Architecture Decoupling Is Not All You Need For Unified Multimodal Model Paper • 2511.22663 • Published Nov 27, 2025 • 29
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views Paper • 2510.18632 • Published Oct 21, 2025 • 21
Go with Your Gut: Scaling Confidence for Autoregressive Image Generation Paper • 2509.26376 • Published Sep 30, 2025 • 9
OmniGen2: Exploration to Advanced Multimodal Generation Paper • 2506.18871 • Published Jun 23, 2025 • 78
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding Paper • 2506.05551 • Published Jun 5, 2025 • 5
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion Paper • 2506.01111 • Published Jun 1, 2025 • 31
EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models Paper • 2506.01667 • Published Jun 2, 2025 • 21
Temporal Regularization Makes Your Video Generator Stronger Paper • 2503.15417 • Published Mar 19, 2025 • 22
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Paper • 2503.08619 • Published Mar 11, 2025 • 20
OmniCreator: Self-Supervised Unified Generation with Universal Editing Paper • 2412.02114 • Published Dec 3, 2024 • 14
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation Paper • 2412.02259 • Published Dec 3, 2024 • 60