arxiv:2502.14638

NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization

Published on Feb 20

Submitted by Zheyuan22 on Feb 21


Authors:

Zheyuan Zhang ,

Tasnim Kabir ,

Abstract

Image geo-localization is the task of predicting the specific location of an image and requires complex reasoning across visual, geographical, and cultural contexts. While prior Vision Language Models (VLMs) have the best accuracy at this task, there is a dearth of high-quality datasets and models for analytical reasoning. We first create NaviClues, a high-quality dataset derived from GeoGuessr, a popular geography game, to supply examples of expert reasoning from language. Using this dataset, we present Navig, a comprehensive image geo-localization framework integrating global and fine-grained image information. By reasoning with language, Navig reduces the average distance error by 14% compared to previous state-of-the-art models while requiring fewer than 1000 training samples. Our dataset and code are available at https://github.com/SparrowZheyuan18/Navig/.
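The abstract reports a 14% reduction in average distance error. As a rough illustration of how such a metric can be computed, the sketch below assumes the error is the great-circle (haversine) distance between predicted and true coordinates, averaged over images. The abstract does not specify the exact formula, and the function names, the Earth-radius constant, and the sample coordinates are illustrative choices, not taken from the Navig code.

```python
import math

# Mean Earth radius in kilometres (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def mean_distance_error_km(predictions, ground_truth):
    """Average haversine error over paired lists of (lat, lon) tuples."""
    if not predictions or len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must be non-empty and equal in length")
    total = sum(
        haversine_km(p[0], p[1], g[0], g[1])
        for p, g in zip(predictions, ground_truth)
    )
    return total / len(predictions)


def relative_reduction(baseline_error, new_error):
    """Fractional reduction of new_error relative to baseline_error (0.14 means 14%)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    truth = [(48.8566, 2.3522), (40.7128, -74.0060)]
    preds = [(48.80, 2.40), (41.00, -73.50)]
    print(f"mean error: {mean_distance_error_km(preds, truth):.1f} km")
```

Under this reading, a 14% reduction means `relative_reduction(baseline_error, new_error)` is about 0.14 when both errors are measured on the same test images.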


Community

Zheyuan22

Paper author Paper submitter 2 days ago

Navig is a novel framework that reasons and searches with tools to locate an image.

📍 1. Navig learns from GeoGuessr experts: We introduce the first reasoning dataset for image geo-localization, in which image details are used to infer the location step by step. The data is collected from expert players on YouTube.

🗺️ 2. Navig searches on maps: Navig identifies text in images, such as road signs or store names, and searches for it on maps, improving accuracy in pinpointing fine-grained locations.

🔍 3. Performance of Navig: By incorporating language-based reasoning, Navig reduces the average distance error by 14% compared to previous state-of-the-art models.

For more details, check out our dataset on the Navig GitHub: https://github.com/SparrowZheyuan18/Navig/. Feel free to reach out if you have any questions.
