SpatialTree: How Spatial Abilities Branch Out in MLLMs Paper • 2512.20617 • Published 2 days ago • 40
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition Paper • 2512.15603 • Published 8 days ago • 55
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published 17 days ago • 109
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published 20 days ago • 38
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing Paper • 2512.02589 • Published 24 days ago • 63
Thinking with Programming Vision: Towards a Unified View for Thinking with Images Paper • 2512.03746 • Published 23 days ago • 15
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published 23 days ago • 32
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published 24 days ago • 232
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20 • 91
Depth Anything 3: Recovering the Visual Space from Any Views Paper • 2511.10647 • Published Nov 13 • 95
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published Nov 12 • 201
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6 • 210
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published Oct 30 • 82