UGround

osunlp 's Collections

Mind2Web 2

WebDreamer

Mind2Web

SAE-V

UGround

AmpleGCG

LlaSMol

updated May 4

Navigating GUIs as Humans Do: Universal Visual Grounding for GUI Agents (ICLR'25 Oral)

Upvote

osunlp/UGround-V1-Data

Viewer • Updated May 2 • 1.23M • 7.86k • 18

Note The training data used in the paper
osunlp/UGround-V1-Data-Box

Viewer • Updated May 2 • 488k • 285 • 6

Note data with bounding box coordinates
osunlp/UGround-V1-2B

Image-Text-to-Text • 2B • Updated Feb 16 • 2.89k • 9

Note Based on Qwen2-VL-2B-Instruct
osunlp/UGround-V1-7B

Image-Text-to-Text • 8B • Updated Apr 16 • 2.34k • 19

Note Based on Qwen2-VL-7B-Instruct
osunlp/UGround-V1-72B

Image-Text-to-Text • 73B • Updated Jan 23 • 24 • 4

Note Based on Qwen2-VL-72B-Instruct. Full training without LoRA.
osunlp/UGround-V1-72B-Preview

Image-Text-to-Text • 73B • Updated Jan 12 • 4 • 2

Note Based on Qwen2-VL-72B-Instruct. Trained with LoRA.
osunlp/UGround

Image-Text-to-Text • 7B • Updated Apr 16 • 69 • 24

Note The initial model. Based on the modified LLaVA arch (CLIP + Vicuna-7B) describe in the paper
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Paper • 2410.05243 • Published Oct 7, 2024 • 19

Note Low-cost, scalable and effective data synthesis pipeline for GUI visaul grounding; SOTA GUI visual grounding model UGround; purely vision-only (modular) GUI agent framework SeeAct-V; first time demonstrating SOTA performance of vision-only GUI agents.
Paused

16

16

UGround

📱

Note Paused. Will open a new one for Qwen2-VL-based UGround
Paused

2

2

UGround-V1-2B

📱

Note Paused. Trying to figure out how to accelerate the inference.

Upvote

UGround

UGround

UGround-V1-2B