Haoyu Guo

ghy0324

Recent Activity

upvoted a paper 2 days ago

SpatialTree: How Spatial Abilities Branch Out in MLLMs

Paper • 2512.20617 • Published 2 days ago • 40

upvoted 2 papers 8 days ago

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

Paper • 2512.15603 • Published 8 days ago • 55

Step-GUI Technical Report

Paper • 2512.15431 • Published 9 days ago • 121

upvoted a paper 10 days ago

EgoX: Egocentric Video Generation from a Single Exocentric Video

Paper • 2512.08269 • Published 17 days ago • 109

upvoted a paper 18 days ago

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Paper • 2512.05965 • Published 20 days ago • 38

upvoted a paper 21 days ago

PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

Paper • 2512.02589 • Published 24 days ago • 63

upvoted 3 papers 22 days ago

Thinking with Programming Vision: Towards a Unified View for Thinking with Images

Paper • 2512.03746 • Published 23 days ago • 15

OneThinker: All-in-one Reasoning Model for Image and Video

Paper • 2512.03043 • Published 23 days ago • 32

Qwen3-VL Technical Report

Paper • 2511.21631 • Published 29 days ago • 139

upvoted a paper 23 days ago

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published 24 days ago • 232

upvoted 5 papers about 1 month ago

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20 • 91

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20 • 121

SAM 3D: 3Dfy Anything in Images

Paper • 2511.16624 • Published Nov 20 • 109

Depth Anything 3: Recovering the Visual Space from Any Views

Paper • 2511.10647 • Published Nov 13 • 95

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published Nov 12 • 201

upvoted 5 papers about 2 months ago

Cambrian-S: Towards Spatial Supersensing in Video

Paper • 2511.04670 • Published Nov 6 • 37

V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published Nov 6 • 96

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6 • 210

DeepEyesV2: Toward Agentic Multimodal Model

Paper • 2511.05271 • Published Nov 7 • 42

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Paper • 2510.27492 • Published Oct 30 • 82