InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization Paper • 2508.05731 • Published 13 days ago • 25
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners Paper • 2504.14239 • Published Apr 19 • 14
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection Paper • 2501.04575 • Published Jan 8 • 24
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 65
DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation Paper • 2410.18666 • Published Oct 24, 2024 • 19
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19, 2024 • 51
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 257
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding Paper • 2403.01487 • Published Mar 3, 2024 • 16
COCO is "ALL'' You Need for Visual Instruction Fine-tuning Paper • 2401.08968 • Published Jan 17, 2024 • 2
CORE-MM: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models Paper • 2311.11567 • Published Nov 20, 2023 • 8