Regional biases in image geolocation estimation: a case study with the SenseCity Africa dataset

2404.02558

Published 4/4/2024 by Ximena Salgado Uribe, Martí Bosch, Jérôme Chenal

Abstract

Advances in Artificial Intelligence are challenged by the biases rooted in the datasets used to train the models. In image geolocation estimation, models are mostly trained using data from specific geographic regions, notably the Western world, and as a result, they may struggle to comprehend the complexities of underrepresented regions. To assess this issue, we apply a state-of-the-art image geolocation estimation model (ISNs) to a crowd-sourced dataset of geolocated images from the African continent (SCA100), and then explore the regional and socioeconomic biases underlying the model's predictions. Our findings show that the ISNs model tends to over-predict image locations in high-income countries of the Western world, which is consistent with the geographic distribution of its training data, i.e., the IM2GPS3k dataset. Accordingly, when compared to the IM2GPS3k benchmark, the accuracy of the ISNs model notably decreases at all scales. Additionally, we cluster images of the SCA100 dataset based on how accurately they are predicted by the ISNs model and show the model's difficulties in correctly predicting the locations of images in low income regions, especially in Sub-Saharan Africa. Therefore, our results suggest that using IM2GPS3k as a training set and benchmark for image geolocation estimation and other computer vision models overlooks its potential application in the African context.

Overview

This paper investigates regional biases in image geolocation estimation using the SenseCity Africa dataset (SCA100), a crowd-sourced collection of geolocated images from the African continent.
The researchers apply a state-of-the-art image geolocation estimation model (ISNs) to these images and explore the regional and socioeconomic biases underlying its predictions.
The findings have implications for the development of more inclusive and equitable computer vision systems, especially for underrepresented areas.

Plain English Explanation

The study looks at how well AI systems can figure out where a photo was taken, using a dataset focused on images from across Africa. Current machine learning models for geolocation estimation often perform better in some regions compared to others. The researchers wanted to understand these regional differences and identify any biases in the models' performance.

This is an important issue because these AI systems are increasingly being used for applications like urban planning, tourism, and navigation. If the systems have trouble accurately locating images from certain parts of the world, it could lead to those areas being underrepresented or overlooked. The goal is to develop more inclusive and fair computer vision technologies that work well no matter where the image is from.

Technical Explanation

The paper uses the SenseCity Africa dataset (SCA100), a crowd-sourced collection of geolocated images from the African continent. Rather than training new models on this dataset, the researchers apply an existing state-of-the-art image geolocation estimation model (ISNs) to it. They then evaluate the model's predictions, comparing its accuracy at each scale against the IM2GPS3k benchmark and examining how that accuracy varies across regions and income levels.
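Geolocation accuracy is usually reported as the share of images whose predicted location falls within a given great-circle distance of the true location. The sketch below illustrates that metric only; it is not the authors' code. The threshold values follow the convention common in Im2GPS-style benchmarks (1, 25, 200, 750 and 2500 km), and it is an assumption that the paper's "scales" match them. The function names and the toy coordinates are hypothetical.

```python
import math

# Distance thresholds in km, as commonly used in Im2GPS-style evaluations.
# Assumption: these match the "scales" the paper reports; labels are illustrative.
THRESHOLDS_KM = {"street": 1, "city": 25, "region": 200, "country": 750, "continent": 2500}
EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_at_scales(true_coords, pred_coords, thresholds=THRESHOLDS_KM):
    """Fraction of predictions within each distance threshold of the true location."""
    if len(true_coords) != len(pred_coords) or not true_coords:
        raise ValueError("need two non-empty coordinate lists of equal length")
    dists = [haversine_km(*t, *p) for t, p in zip(true_coords, pred_coords)]
    return {name: sum(d <= km for d in dists) / len(dists) for name, km in thresholds.items()}


# Toy example (invented coordinates, not results from the paper):
# one prediction a few km off in Lagos, one that places a Nairobi image in Paris.
true_coords = [(6.5244, 3.3792), (-1.2921, 36.8219)]
pred_coords = [(6.4500, 3.4000), (48.8566, 2.3522)]
acc = accuracy_at_scales(true_coords, pred_coords)
```

Reporting the same metric on two datasets, for example a benchmark and an African image set, is what makes a drop in accuracy "at all scales" directly comparable.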

The results show that the ISNs model tends to over-predict image locations in high-income countries of the Western world, which the authors find consistent with the geographic distribution of its training data, identified in the abstract as the IM2GPS3k dataset. Compared to the IM2GPS3k benchmark, the model's accuracy on SCA100 decreases notably at all scales. The researchers also cluster the SCA100 images by how accurately the model predicts them, which shows that the model has particular difficulty with images from low-income regions, especially in Sub-Saharan Africa.
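One simple way to surface regional patterns of this kind is to group per-image errors by a region or income label and to tally which countries the model predicts. The sketch below is a minimal illustration under stated assumptions: the record fields (`income_group`, `pred_country`, `error_km`) and both function names are hypothetical, the toy values are invented, and the paper's own clustering procedure may differ.

```python
from collections import Counter, defaultdict
from statistics import median


def median_error_by_group(records, key):
    """Median prediction error (km) for each value of the grouping field `key`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec["error_km"])
    return {group: median(errors) for group, errors in groups.items()}


def predicted_country_share(records):
    """Share of predictions that land in each country, most frequent first."""
    counts = Counter(rec["pred_country"] for rec in records)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.most_common()}


# Toy records (invented for illustration, not data from the paper).
records = [
    {"income_group": "low", "pred_country": "US", "error_km": 9000.0},
    {"income_group": "low", "pred_country": "FR", "error_km": 5000.0},
    {"income_group": "upper-middle", "pred_country": "ZA", "error_km": 20.0},
    {"income_group": "upper-middle", "pred_country": "US", "error_km": 12000.0},
]

errors_by_income = median_error_by_group(records, "income_group")
country_share = predicted_country_share(records)
```

A predicted-country share dominated by countries outside the dataset's true locations would be one symptom of the over-prediction described above. The median is used here because a few very large errors would otherwise dominate a mean.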

Critical Analysis

The paper does a thorough job of documenting and analyzing the regional biases present in current image geolocation models. However, it does not delve into the potential causes and societal implications of these biases in depth. More discussion around the socioeconomic and historical factors that may contribute to the uneven performance would have provided further insight.

Additionally, the paper focuses on evaluating the models' accuracy, but does not explore other important aspects like model fairness, robustness, or interpretability. Examining these additional metrics could shed light on how well the models generalize and whether they exhibit concerning biases beyond just geographic location.

Further research is also needed to develop effective strategies for mitigating regional biases in computer vision systems. The paper suggests some potential approaches, but more work is required to translate these findings into practical solutions.

Conclusion

This study highlights an important yet underexplored issue in the field of computer vision: the presence of regional biases in image geolocation estimation. By applying an existing model to the SenseCity Africa dataset, the researchers were able to show a notable drop in accuracy relative to the IM2GPS3k benchmark, with the greatest difficulties in low-income regions.

The findings have important implications for ensuring that emerging AI technologies are inclusive and equitable, especially for historically underrepresented regions. Addressing these biases will be crucial as computer vision systems become more widely deployed in real-world applications that impact people's lives. This research represents an important step towards building more fair and representative artificial intelligence.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Classification for everyone : Building geography agnostic models for fairer recognition

Akshat Jindal, Shreya Singh, Soham Gadgil

In this paper, we analyze different methods to mitigate inherent geographical biases present in state of the art image classification models. We first quantitatively present this bias in two datasets - The Dollar Street Dataset and ImageNet, using images with location information. We then present different methods which can be employed to reduce this bias. Finally, we analyze the effectiveness of the different techniques on making these models more robust to geographical locations of the images.

4/3/2024

cs.CV cs.AI cs.CY cs.LG

Towards Geographic Inclusion in the Evaluation of Text-to-Image Models

Melissa Hall, Samuel J. Bell, Candace Ross, Adina Williams, Michal Drozdzal, Adriana Romero Soriano

Rapid progress in text-to-image generative models coupled with their deployment for visual content creation has magnified the importance of thoroughly evaluating their performance and identifying potential biases. In pursuit of models that generate images that are realistic, diverse, visually appealing, and consistent with the given prompt, researchers and practitioners often turn to automated metrics to facilitate scalable and cost-effective performance profiling. However, commonly-used metrics often fail to account for the full diversity of human preference; often even in-depth human evaluations face challenges with subjectivity, especially as interpretations of evaluation criteria vary across regions and cultures. In this work, we conduct a large, cross-cultural study of how much annotators in Africa, Europe, and Southeast Asia vary in their perception of geographic representation, visual appeal, and consistency in real and generated images from state-of-the-art public APIs. We collect over 65,000 image annotations and 20 survey responses. We contrast human annotations with common automated metrics, finding that human preferences vary notably across geographic location and that current metrics do not fully account for this diversity. For example, annotators in different locations often disagree on whether exaggerated, stereotypical depictions of a region are considered geographically representative. In addition, the utility of automatic evaluations is dependent on assumptions about their set-up, such as the alignment of feature extractors with human perception of object similarity or the definition of appeal captured in reference datasets used to ground evaluations. We recommend steps for improved automatic and human evaluations.

5/8/2024

cs.CV cs.CY cs.HC

PIGEON: Predicting Image Geolocations

Lukas Haas, Michal Skreta, Silas Alberti, Chelsea Finn

Planet-scale image geolocalization remains a challenging problem due to the diversity of images originating from anywhere in the world. Although approaches based on vision transformers have made significant progress in geolocalization accuracy, success in prior literature is constrained to narrow distributions of images of landmarks, and performance has not generalized to unseen places. We present a new geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and a novel loss function. Additionally, our work is the first to perform retrieval over location clusters for guess refinements. We train two models for evaluations on street-level data and general-purpose image geolocalization; the first model, PIGEON, is trained on data from the game of Geoguessr and is capable of placing over 40% of its guesses within 25 kilometers of the target location globally. We also develop a bot and deploy PIGEON in a blind experiment against humans, ranking in the top 0.01% of players. We further challenge one of the world's foremost professional Geoguessr players to a series of six matches with millions of viewers, winning all six games. Our second model, PIGEOTTO, differs in that it is trained on a dataset of images from Flickr and Wikipedia, achieving state-of-the-art results on a wide range of image geolocalization benchmarks, outperforming the previous SOTA by up to 7.7 percentage points on the city accuracy level and up to 38.8 percentage points on the country level. Our findings suggest that PIGEOTTO is the first image geolocalization model that effectively generalizes to unseen places and that our approach can pave the way for highly accurate, planet-scale image geolocalization systems. Our code is available on GitHub.

4/9/2024

cs.CV cs.LG

Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It

Adam Lilja, Junsheng Fu, Erik Stenborg, Lars Hammarstrand

The task of online mapping is to predict a local map using current sensor observations, e.g. from lidar and camera, without relying on a pre-built map. State-of-the-art methods are based on supervised learning and are trained predominantly using two datasets: nuScenes and Argoverse 2. However, these datasets revisit the same geographic locations across training, validation, and test sets. Specifically, over 80% of nuScenes and 40% of Argoverse 2 validation and test samples are less than 5 m from a training sample. At test time, the methods are thus evaluated more on how well they localize within a memorized implicit map built from the training data than on extrapolating to unseen locations. Naturally, this data leakage causes inflated performance numbers and we propose geographically disjoint data splits to reveal the true performance in unseen environments. Experimental results show that methods perform considerably worse, some dropping more than 45 mAP, when trained and evaluated on proper data splits. Additionally, a reassessment of prior design choices reveals diverging conclusions from those based on the original split. Notably, the impact of lifting methods and the support from auxiliary tasks (e.g., depth supervision) on performance appears less substantial or follows a different trajectory than previously perceived. Splits can be found at https://github.com/LiljaAdam/geographical-splits

4/8/2024

cs.CV