OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published 27 days ago • 32
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models Paper • 2511.11007 • Published Nov 14 • 15
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning Paper • 2503.07523 • Published Mar 10 • 1
Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling Paper • 2508.03404 • Published Aug 5 • 4
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning Paper • 2508.06259 • Published Aug 8 • 2
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow Paper • 2509.21789 • Published Sep 26 • 9
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views Paper • 2510.18632 • Published Oct 21 • 21
CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge Paper • 2403.15426 • Published Mar 13, 2024
DV-Matcher: Deformation-based Non-Rigid Point Cloud Matching Guided by Pre-trained Visual Features Paper • 2408.08568 • Published Aug 16, 2024