Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents Paper • 2502.11357 • Published 7 days ago • 9 • 2
Diversifying Joint Vision-Language Tokenization Learning Paper • 2306.03421 • Published Jun 6, 2023 • 1
A Systematic Investigation of KB-Text Embedding Alignment at Scale Paper • 2106.01586 • Published Jun 3, 2021
Bringing Back the Context: Camera Trap Species Identification as Link Prediction on Multimodal Knowledge Graphs Paper • 2401.00608 • Published Dec 31, 2023 • 1
A Retrieve-and-Read Framework for Knowledge Graph Link Prediction Paper • 2212.09724 • Published Dec 19, 2022 • 1
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery Paper • 2410.05080 • Published Oct 7, 2024 • 21
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents Paper • 2410.05243 • Published Oct 7, 2024 • 19
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8, 2024 • 82
timm/vit_base_patch16_clip_384.laion2b_ft_in1k Image Classification • Updated Jan 21 • 1.08k • 5
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error Paper • 2403.04746 • Published Mar 7, 2024 • 24
Learning and Leveraging World Models in Visual Representation Learning Paper • 2403.00504 • Published Mar 1, 2024 • 32