Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals

2405.20152

Published 5/31/2024 by Phillip Howard, Kathleen C. Fraser, Anahita Bhiwandiwalla, Svetlana Kiritchenko

Abstract

With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multimodal chat. While prior studies have examined the social biases contained in text generated by LLMs, this topic has been relatively unexplored in LVLMs. Examining social biases in LVLMs is particularly challenging due to the confounding contributions of bias induced by information contained across the text and visual modalities. To address this challenging problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images. Specifically, we present LVLMs with identical open-ended text prompts while conditioning on images from different counterfactual sets, where each set contains images which are largely identical in their depiction of a common subject (e.g., a doctor), but vary only in terms of intersectional social attributes (e.g., race and gender). We comprehensively evaluate the text produced by different models under this counterfactual generation setting at scale, producing over 57 million responses from popular LVLMs. Our multi-dimensional analysis reveals that social attributes such as race, gender, and physical characteristics depicted in input images can significantly influence the generation of toxic content, competency-associated words, harmful stereotypes, and numerical ratings of depicted individuals. We additionally explore the relationship between social bias in LVLMs and their corresponding LLMs, as well as inference-time strategies to mitigate bias.

Overview

This paper explores the issue of bias in large vision-language models (VLMs), which are AI systems that can process and understand both visual and textual information.
The researchers use counterfactual analysis, a technique that involves systematically altering aspects of the input data, to uncover biases in these models at a large scale.
The findings suggest that VLMs exhibit significant biases related to gender, race, and other demographic attributes, which could lead to harmful outcomes if these models are deployed in real-world applications.

Plain English Explanation

The paper examines the problem of bias in large vision-language models (VLMs), which are AI systems that can understand both images and text. These models are becoming increasingly powerful and are being used in a variety of applications, from image captioning to question answering.

However, the researchers discovered that these VLMs can exhibit significant biases related to factors like gender, race, and other demographic attributes. For example, the models may be more likely to associate certain occupations or traits with particular genders or races, even when those associations are not accurate or fair.

To uncover these biases, the researchers used a technique called counterfactual analysis. This involves systematically altering aspects of the input data, such as the gender or race of the people in an image, and then observing how the model's outputs change. By doing this at a large scale, the researchers were able to identify patterns of bias that could have serious consequences if these models are deployed in real-world applications.

For instance, if a VLM is used in a job hiring system, it could unfairly discriminate against certain candidates based on their demographic attributes, even if those attributes are not relevant to the job. This could lead to unfair and harmful outcomes for those individuals.

The findings of this paper highlight the importance of carefully evaluating and mitigating bias in AI systems, especially as they become more powerful and widely deployed. By understanding the biases inherent in these models, researchers and developers can work to address them and ensure that these technologies are used in a fair and equitable way.

Technical Explanation

The researchers used a counterfactual analysis approach to probe the biases in large vision-language models (VLMs) at scale. Counterfactual analysis involves systematically altering specific attributes of the input data, such as the gender or race of people in images, and then observing how the model's outputs change.
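The counterfactual setup described above can be illustrated with a minimal Python sketch. It is not the authors' code: the `CounterfactualImage` record, the `generate` and `score` callables, and the stub functions are all hypothetical stand-ins for a real LVLM and a real scoring tool (for example, a toxicity classifier). The idea is simply to hold the prompt fixed, vary only the image within a counterfactual set, and compare the resulting scores.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

Group = Tuple[str, str]  # (race, gender) label pair


@dataclass(frozen=True)
class CounterfactualImage:
    """Hypothetical record for one image in a counterfactual set."""
    set_id: str   # images sharing a set_id depict the same subject
    race: str
    gender: str
    path: str


def score_counterfactual_sets(
    images: Iterable[CounterfactualImage],
    prompt: str,
    generate: Callable[[str, str], str],  # (image_path, prompt) -> response text
    score: Callable[[str], float],        # response text -> numeric score
) -> Dict[str, Dict[Group, float]]:
    """Apply the same prompt to every image and record one score per group."""
    results: Dict[str, Dict[Group, float]] = defaultdict(dict)
    for img in images:
        response = generate(img.path, prompt)
        results[img.set_id][(img.race, img.gender)] = score(response)
    return dict(results)


def within_set_spread(results: Dict[str, Dict[Group, float]]) -> Dict[str, float]:
    """Largest minus smallest score inside each counterfactual set."""
    return {
        set_id: max(groups.values()) - min(groups.values())
        for set_id, groups in results.items()
    }


# Stubs so the sketch runs without a model; replace with real components.
def stub_generate(image_path: str, prompt: str) -> str:
    return f"{prompt} [{image_path}]"


def stub_score(text: str) -> float:
    return float(len(text))


demo_images = [
    CounterfactualImage("doctor_001", "group_a", "female", "doctor_001_a_f.png"),
    CounterfactualImage("doctor_001", "group_b", "male", "doctor_001_b_m.png"),
]
demo_results = score_counterfactual_sets(
    demo_images, "Describe this person.", stub_generate, stub_score
)
print(within_set_spread(demo_results))
```

Because every image in a set shares the same subject and prompt, a large spread within a set points to the social attributes as the source of the difference. In practice the spread would be aggregated over many sets and tested for significance.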

The researchers presented a range of popular LVLMs with identical open-ended text prompts while conditioning on images drawn from counterfactual sets. Each set contains images that are largely identical in their depiction of a common subject (e.g., a doctor) and vary only in intersectional social attributes such as race and gender. Running this setup at scale produced over 57 million responses, which were then compared across the depicted attributes. Related summaries on counterfactual image sets and counterfactual reasoning in vision-language models are available at https://aimodels.fyi/papers/arxiv/socialcounterfactuals-probing-mitigating-intersectional-social-biases-vision and https://aimodels.fyi/papers/arxiv/eyes-can-deceive-benchmarking-counterfactual-reasoning-abilities

The results showed that social attributes depicted in the input images, such as race, gender, and physical characteristics, significantly influenced the models' outputs. The affected outputs included the generation of toxic content, competency-associated words, harmful stereotypes, and numerical ratings of the depicted individuals. Because each counterfactual set varies attributes jointly (for example, race together with gender), the analysis captures intersectional effects rather than treating each attribute in isolation. (For background on how large vision-language models are evaluated, see https://aimodels.fyi/papers/arxiv/are-we-right-way-evaluating-large-vision)
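A group-level comparison of this kind can be sketched as follows. The example measures how often responses contain competency-associated words, one of the dimensions named in the abstract, for each race-gender group, and then reports the gap between the highest and lowest group. The word list, the group labels, and the function names are illustrative assumptions; the paper's actual lexicon and metrics are not given in this summary.

```python
import re
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Tuple

Group = Tuple[str, str]             # (race, gender) label pair
Record = Tuple[str, str, str]       # (race, gender, response text)

# Hypothetical lexicon for illustration only.
COMPETENCY_WORDS: FrozenSet[str] = frozenset(
    {"skilled", "competent", "intelligent", "capable"}
)


def competency_rate_by_group(
    records: Iterable[Record],
    lexicon: FrozenSet[str] = COMPETENCY_WORDS,
) -> Dict[Group, float]:
    """Fraction of responses per group containing at least one lexicon word."""
    hits: Dict[Group, int] = defaultdict(int)
    totals: Dict[Group, int] = defaultdict(int)
    for race, gender, text in records:
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        group = (race, gender)
        totals[group] += 1
        if tokens & lexicon:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def max_gap(rates: Dict[Group, float]) -> float:
    """Difference between the highest and lowest group rate."""
    return max(rates.values()) - min(rates.values())


demo_records = [
    ("group_a", "female", "She is a skilled doctor."),
    ("group_a", "female", "A person in a white coat."),
    ("group_b", "male", "A competent and capable physician."),
]
demo_rates = competency_rate_by_group(demo_records)
print(demo_rates, max_gap(demo_rates))
```

Keying the rates on the (race, gender) pair rather than on each attribute separately is what makes the comparison intersectional: a disparity that only appears for one specific combination stays visible instead of being averaged away.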

To further explore the nature and extent of these biases, the researchers examined the relationship between social bias in LVLMs and in the LLMs they are built on, and they explored inference-time strategies for mitigating bias. Related summaries on counterfactual generation with LLMs and on attribute inference from images are available at https://aimodels.fyi/papers/arxiv/llms-generating-evaluating-counterfactuals-comprehensive-study and https://aimodels.fyi/papers/arxiv/private-attribute-inference-from-images-vision-language

Critical Analysis

The paper provides a comprehensive and rigorous analysis of bias in large vision-language models, using a well-designed counterfactual approach to uncover biases at scale. The researchers' use of a large and diverse dataset, as well as their exploration of multiple VLM architectures, lends credibility to their findings.

However, the paper does acknowledge some limitations. For instance, the authors note that their counterfactual manipulations may not fully capture the complexity of real-world scenarios, where demographic attributes are often intertwined with other factors. Additionally, although the paper explores inference-time strategies to mitigate bias, its emphasis is on measuring bias rather than resolving it.

Further research could explore more sophisticated counterfactual techniques, such as those that incorporate contextual information or dynamic changes in the input data. Researchers could also investigate the impact of dataset composition and model training strategies on the development of biases in VLMs.

Overall, this paper makes a significant contribution to the growing body of research on bias in AI systems, highlighting the urgent need for continued scrutiny and mitigation efforts as these technologies become more pervasive in our lives.

Conclusion

This paper uncovers widespread biases in large vision-language models, using a counterfactual analysis approach to systematically probe these models at scale. The findings suggest that VLMs can exhibit problematic biases related to gender, race, and other demographic attributes, which could lead to unfair and harmful outcomes if these models are deployed in real-world applications.

The researchers' rigorous methodology and the scale of their analysis lend credibility to their findings, which have important implications for the development and deployment of VLMs. As these models become increasingly powerful and influential, it is critical that researchers, developers, and policymakers work to address the biases inherent in these systems, ensuring that they are used in a fair and equitable manner.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Uncovering Bias in Large Vision-Language Models with Counterfactuals

Phillip Howard, Anahita Bhiwandiwalla, Kathleen C. Fraser, Svetlana Kiritchenko

With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multimodal chat. While prior studies have examined the social biases contained in text generated by LLMs, this topic has been relatively unexplored in LVLMs. Examining social biases in LVLMs is particularly challenging due to the confounding contributions of bias induced by information contained across the text and visual modalities. To address this challenging problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images. Specifically, we present LVLMs with identical open-ended text prompts while conditioning on images from different counterfactual sets, where each set contains images which are largely identical in their depiction of a common subject (e.g., a doctor), but vary only in terms of intersectional social attributes (e.g., race and gender). We comprehensively evaluate the text produced by different LVLMs under this counterfactual generation setting and find that social attributes such as race, gender, and physical characteristics depicted in input images can significantly influence toxicity and the generation of competency-associated words.

6/11/2024

cs.CV cs.AI

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Anahita Bhiwandiwalla, Vasudev Lal

While vision-language models (VLMs) have achieved remarkable performance improvements recently, there is growing evidence that these models also possess harmful biases with respect to social attributes such as gender and race. Prior studies have primarily focused on probing such bias attributes individually while ignoring biases associated with intersections between social attributes. This could be due to the difficulty of collecting an exhaustive set of image-text pairs for various combinations of social attributes. To address this challenge, we employ text-to-image diffusion models to produce counterfactual examples for probing intersectional social biases at scale. Our approach utilizes Stable Diffusion with cross attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e.g., a given occupation) while differing only in their depiction of intersectional social attributes (e.g., race & gender). Through our over-generate-then-filter methodology, we produce SocialCounterfactuals, a high-quality dataset containing 171k image-text pairs for probing intersectional biases related to gender, race, and physical characteristics. We conduct extensive experiments to demonstrate the usefulness of our generated dataset for probing and mitigating intersectional social biases in state-of-the-art VLMs.

4/11/2024

cs.CV cs.AI

A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models

Ashutosh Sathe, Prachi Jain, Sunayana Sitaram

Vision-language models (VLMs) have gained widespread adoption in both industry and academia. In this study, we propose a unified framework for systematically evaluating gender, race, and age biases in VLMs with respect to professions. Our evaluation encompasses all supported inference modes of the recent VLMs, including image-to-text, text-to-text, text-to-image, and image-to-image. Additionally, we propose an automated pipeline to generate high-quality synthetic datasets that intentionally conceal gender, race, and age information across different professional domains, both in generated text and images. The dataset includes action-based descriptions of each profession and serves as a benchmark for evaluating societal biases in vision-language models (VLMs). In our comparative analysis of widely used VLMs, we have identified that varying input-output modalities lead to discernible differences in bias magnitudes and directions. Additionally, we find that VLM models exhibit distinct biases across different bias attributes we investigated. We hope our work will help guide future progress in improving VLMs to learn socially unbiased representations. We will release our data and code.

6/18/2024

cs.CV cs.CL cs.CY

VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

Jie Zhang, Sibo Wang, Xiangkui Cao, Zheng Yuan, Shiguang Shan, Xilin Chen, Wen Gao

The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artificial intelligence. However, these advancements are tempered by the outputs that often reflect biases, a concern not yet extensively investigated. Existing benchmarks are not sufficiently comprehensive in evaluating biases due to their limited data scale, single questioning format and narrow sources of bias. To address this problem, we introduce VLBiasBench, a benchmark aimed at evaluating biases in LVLMs comprehensively. In VLBiasBench, we construct a dataset encompassing nine distinct categories of social biases, including age, disability status, gender, nationality, physical appearance, race, religion, profession, social economic status and two intersectional bias categories (race x gender, and race x social economic status). To create a large-scale dataset, we use Stable Diffusion XL model to generate 46,848 high-quality images, which are combined with different questions to form 128,342 samples. These questions are categorized into open and close ended types, fully considering the sources of bias and comprehensively evaluating the biases of LVLM from multiple perspectives. We subsequently conduct extensive evaluations on 15 open-source models as well as one advanced closed-source model, providing some new insights into the biases revealing from these models. Our benchmark is available at https://github.com/Xiangkui-Cao/VLBiasBench.

6/21/2024

cs.CV cs.AI