A Picture is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok

Read original: arXiv:2403.19717 - Published 4/1/2024 by Jack West, Lea Thiemt, Shimaa Ahmed, Maggie Bartig, Kassem Fawaz, Suman Banerjee

Overview

The paper examines demographic disparities in local machine learning models for popular social media platforms like Instagram and TikTok.
It finds significant differences in the performance of these models across user demographics, with some groups experiencing lower accuracy than others.
The authors investigate the potential causes of these disparities and discuss the implications for equitable AI development.

Plain English Explanation

The study looks at how well machine learning models used in social media apps like Instagram and TikTok perform for different groups of users. The researchers found that the models work better for some people than others based on factors like age, gender, race, and location.

For example, the model might be really good at identifying the content in posts from younger, urban users, but struggle more with posts from older, rural users. This could lead to certain groups having a worse experience on these apps, like not getting their content recommended as often.

The paper tries to understand why these differences exist and what it means for making AI systems that treat everyone fairly. It's an important issue as these machine learning models are increasingly used to power features like content recommendations, filters, and other automated decisions that impact users.

Technical Explanation

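The researchers analyzed the vision models that TikTok and Instagram run locally on users' smartphones to process images and video. To do so, they developed a method for capturing and evaluating ML tasks in mobile apps, which has to overcome challenges like code obfuscation, native code execution, and scalability. The method comprises three stages: ML task detection, ML pipeline reconstruction, and ML performance assessment, with a specific focus on demographic disparities.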

The results showed demographic disparities in both apps. For TikTok, the authors found issues in age and gender prediction accuracy, particularly for minors and Black individuals. For Instagram, they found demographic disparities in the extraction of over 500 visual concepts from images, with evidence of spurious correlations between demographic features and certain concepts.
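One common way to quantify a disparity of this kind is to compute a metric such as the F1-score separately for each demographic group and compare the results. The sketch below illustrates that idea only; it is not the authors' evaluation code. The group names, the toy records, and the helper names (f1_score, f1_by_group, disparity_gap) are all hypothetical.

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """Binary F1 computed from true/false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def f1_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for group, y_true, y_pred in records:
        grouped[group][0].append(y_true)
        grouped[group][1].append(y_pred)
    return {g: f1_score(t, p) for g, (t, p) in grouped.items()}


def disparity_gap(scores):
    """Difference between the best and worst per-group score."""
    return max(scores.values()) - min(scores.values())


# Toy, made-up records purely for illustration (not data from the paper).
records = [
    ("group_a", 1, 1), ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 1, 1), ("group_b", 0, 1), ("group_b", 0, 0),
]

scores = f1_by_group(records)
gap = disparity_gap(scores)
print(scores, gap)
```

A single max-minus-min gap is only one possible summary. With more than two groups, or with very small groups, reporting every per-group score alongside its sample size is usually more informative.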

The authors hypothesize that these discrepancies could stem from biases in the training data, model architectures that fail to capture important demographic factors, or broader representation issues in the underlying user populations. They discuss potential mitigation strategies, such as proactive demographic targeting during model development and post-hoc debiasing techniques.

Critical Analysis

The paper provides valuable insights into an important problem, but there are some limitations to consider. First, the analysis is constrained to only two social media platforms, so the findings may not generalize to other domains or applications of local machine learning.

Additionally, the authors do not have access to the proprietary algorithms and training data used by Instagram and TikTok. As a result, they can only speculate about the root causes of the observed disparities. More collaboration with industry partners could shed light on the specific technical and organizational factors contributing to these issues.

Finally, the paper focuses on quantifying demographic differences in model performance, but does not delve deeply into the real-world implications for end users. Further research is needed to understand how these biases manifest in the lived experiences of diverse social media participants.

Conclusion

This study demonstrates the importance of proactively addressing demographic fairness in the development of local machine learning models, particularly for high-stakes applications like social media recommender systems. The significant performance gaps uncovered across user groups highlight the need for more inclusive and equitable AI practices.

By understanding the sources of these disparities and devising robust mitigation strategies, researchers and practitioners can work towards building machine learning systems that serve all members of society more fairly. This is a critical step in realizing the transformative potential of these technologies while ensuring they do not exacerbate existing societal biases.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Picture is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok

Jack West, Lea Thiemt, Shimaa Ahmed, Maggie Bartig, Kassem Fawaz, Suman Banerjee

Mobile apps have embraced user privacy by moving their data processing to the user's smartphone. Advanced machine learning (ML) models, such as vision models, can now locally analyze user images to extract insights that drive several functionalities. Capitalizing on this new processing model of locally analyzing user images, we analyze two popular social media apps, TikTok and Instagram, to reveal (1) what insights vision models in both apps infer about users from their image and video data and (2) whether these models exhibit performance disparities with respect to demographics. As vision models provide signals for sensitive technologies like age verification and facial recognition, understanding potential biases in these models is crucial for ensuring that users receive equitable and accurate services. We develop a novel method for capturing and evaluating ML tasks in mobile apps, overcoming challenges like code obfuscation, native code execution, and scalability. Our method comprises ML task detection, ML pipeline reconstruction, and ML performance assessment, specifically focusing on demographic disparities. We apply our methodology to TikTok and Instagram, revealing significant insights. For TikTok, we find issues in age and gender prediction accuracy, particularly for minors and Black individuals. In Instagram, our analysis uncovers demographic disparities in the extraction of over 500 visual concepts from images, with evidence of spurious correlations between demographic features and certain concepts.

4/1/2024

Modeling offensive content detection for TikTok

Kasper Cools, Gideon Mailette de Buy Wenniger, Clara Maathuis

The advent of social media transformed interpersonal communication and information consumption processes. This digital landscape accommodates user intentions, also resulting in an increase of offensive language and harmful behavior. Concurrently, social media platforms collect vast datasets comprising user-generated content and behavioral information. These datasets are instrumental for platforms deploying machine learning and data-driven strategies, facilitating customer insights and countermeasures against social manipulation mechanisms like disinformation and offensive content. Nevertheless, the availability of such datasets, along with the application of various machine learning techniques, to researchers and practitioners, for specific social media platforms regarding particular events, is limited. In particular for TikTok, which offers unique tools for personalized content creation and sharing, the existing body of knowledge would benefit from having diverse comprehensive datasets and associated data analytics solutions on offensive content. While efforts from social media platforms, research, and practitioner communities are seen on this behalf, such content continues to proliferate. This translates to an essential need to make datasets publicly available and build corresponding intelligent solutions. On this behalf, this research undertakes the collection and analysis of TikTok data containing offensive content, building a series of machine learning and deep learning models for offensive content detection. This is done aiming at answering the following research question: How to develop a series of computational models to detect offensive content on TikTok? To this end, a Data Science methodological approach is considered, 120,423 TikTok comments are collected, and on a balanced, binary classification approach, an F1 score of 0.863 is obtained.

9/2/2024

Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI

Robert Wolfe, Aayushi Dangol, Alexis Hiniker, Bill Howe

Multimodal AI models capable of associating images and text hold promise for numerous domains, ranging from automated image captioning to accessibility applications for blind and low-vision users. However, uncertainty about bias has in some cases limited their adoption and availability. In the present work, we study 43 CLIP vision-language models to determine whether they learn human-like facial impression biases, and we find evidence that such biases are reflected across three distinct CLIP model families. We show for the first time that the degree to which a bias is shared across a society predicts the degree to which it is reflected in a CLIP model. Human-like impressions of visually unobservable attributes, like trustworthiness and sexuality, emerge only in models trained on the largest dataset, indicating that a better fit to uncurated cultural data results in the reproduction of increasingly subtle social biases. Moreover, we use a hierarchical clustering approach to show that dataset size predicts the extent to which the underlying structure of facial impression bias resembles that of facial impression bias in humans. Finally, we show that Stable Diffusion models employing CLIP as a text encoder learn facial impression biases, and that these biases intersect with racial biases in Stable Diffusion XL-Turbo. While pretrained CLIP models may prove useful for scientific studies of bias, they will also require significant dataset curation when intended for use as general-purpose models in a zero-shot setting.

8/29/2024

Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts

Xuyang Wu, Yuan Wang, Hsin-Tai Wu, Zhiqiang Tao, Yi Fang

Large vision-language models (LVLMs) have recently achieved significant progress, demonstrating strong capabilities in open-world visual understanding. However, it is not yet clear how LVLMs address demographic biases in real life, especially the disparities across attributes such as gender, skin tone, and age. In this paper, we empirically investigate visual fairness in several mainstream LVLMs and audit their performance disparities across sensitive demographic attributes, based on public fairness benchmark datasets (e.g., FACET). To disclose the visual bias in LVLMs, we design a fairness evaluation framework with direct questions and single-choice question-instructed prompts on visual question-answering/classification tasks. The zero-shot prompting results indicate that, despite enhancements in visual understanding, both open-source and closed-source LVLMs exhibit prevalent fairness issues across different instruct prompts and demographic attributes.

6/27/2024