Uncovering Bias in Large Vision-Language Models with Counterfactuals

2404.00166

Published 6/11/2024 by Phillip Howard, Anahita Bhiwandiwalla, Kathleen C. Fraser, Svetlana Kiritchenko

Uncovering Bias in Large Vision-Language Models with Counterfactuals

Abstract

With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multimodal chat. While prior studies have examined the social biases contained in text generated by LLMs, this topic has been relatively unexplored in LVLMs. Examining social biases in LVLMs is particularly challenging due to the confounding contributions of bias induced by information contained across the text and visual modalities. To address this challenging problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images. Specifically, we present LVLMs with identical open-ended text prompts while conditioning on images from different counterfactual sets, where each set contains images which are largely identical in their depiction of a common subject (e.g., a doctor), but vary only in terms of intersectional social attributes (e.g., race and gender). We comprehensively evaluate the text produced by different LVLMs under this counterfactual generation setting and find that social attributes such as race, gender, and physical characteristics depicted in input images can significantly influence toxicity and the generation of competency-associated words.

Create account to get full access

Overview

This paper investigates bias in large vision-language models, which are AI systems that can process and generate both text and images.
The authors use counterfactual analysis to uncover biases in these models, such as associating certain attributes with particular social groups.
The findings have important implications for the responsible development and deployment of these powerful AI models.

Plain English Explanation

The paper examines how large AI models that can process both text and images may contain biases, such as associating certain attributes or characteristics with particular social groups. To uncover these biases, the researchers use a technique called counterfactual analysis.

Counterfactual analysis involves changing certain aspects of an input, like an image or text, and then observing how the model's output changes. By systematically introducing these counterfactual changes, the researchers can identify patterns in the model's behavior that reveal underlying biases.

For example, the model might consistently generate more positive language when shown an image of a white person compared to an image of a Black person, even if the two images are otherwise similar. This would suggest the model has learned to associate certain positive attributes with white individuals, which could lead to unfair or discriminatory outputs.

Understanding these biases is crucial as vision-language models become more powerful and widespread. These models have the potential to be used in high-stakes applications, such as content moderation or recruitment, where unchecked biases could have significant real-world consequences. The insights from this research can help inform efforts to make these AI systems more fair and equitable.

Technical Explanation

The paper proposes a framework for uncovering bias in large vision-language models using counterfactual analysis. The authors develop a set of counterfactual prompts and images that systematically vary attributes like race, gender, and age, and then evaluate the model's predictions on these inputs.

By analyzing how the model's outputs change in response to the counterfactual inputs, the researchers are able to identify biases in the model's understanding and associations. For example, they find that the model tends to generate more positive language when describing images of white individuals compared to images of Black individuals, even when other attributes are held constant.

The paper also explores the intersectional nature of these biases, examining how they manifest differently for individuals with multiple marginalized identities. Additionally, the authors conduct a comprehensive study of the counterfactual reasoning capabilities of these large language models.

The findings highlight the importance of rigorous bias testing and mitigation strategies as vision-language models become more widely deployed. The interactive analysis framework presented in the paper provides a valuable tool for researchers and developers to better understand and address bias in these powerful AI systems.

Critical Analysis

The paper provides a thorough and well-designed methodology for uncovering biases in large vision-language models. The use of counterfactual analysis is a particularly powerful approach, as it allows the researchers to isolate and identify specific biases in a systematic way.

However, the paper does acknowledge some limitations. For instance, the counterfactual prompts and images used in the study may not capture the full complexity of real-world situations, and the findings may not generalize to all possible use cases of these models.

Additionally, while the paper highlights the intersectional nature of the biases observed, there may be other relevant social dimensions, such as socioeconomic status or disability, that were not explored in depth. Further research would be needed to fully understand the multifaceted nature of bias in these AI systems.

The paper also does not delve into potential solutions or mitigation strategies beyond the identification of biases. While this is understandable given the scope of the work, future research could explore effective techniques for reducing bias in vision-language models, such as debiasing approaches or more inclusive model training.

Overall, this paper makes an important contribution to the growing body of research on bias in large AI models. By shedding light on the nature and prevalence of these biases, it lays the groundwork for further work to ensure the responsible development and deployment of these powerful technologies.

Conclusion

This paper presents a comprehensive framework for uncovering bias in large vision-language models using counterfactual analysis. The findings reveal concerning patterns of bias, such as the association of positive attributes with certain social groups, that have significant implications for the real-world use of these AI systems.

The insights from this research underscore the critical need for rigorous bias testing and mitigation strategies as vision-language models become more prevalent. By understanding the biases inherent in these models, researchers and developers can work to make them more fair, equitable, and beneficial for all members of society.

Overall, this work represents an important step forward in the effort to create AI systems that are truly inclusive and unbiased. As the field of AI continues to evolve, studies like this will be essential in guiding the responsible development of these transformative technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals

Phillip Howard, Kathleen C. Fraser, Anahita Bhiwandiwalla, Svetlana Kiritchenko

With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multimodal chat. While prior studies have examined the social biases contained in text generated by LLMs, this topic has been relatively unexplored in LVLMs. Examining social biases in LVLMs is particularly challenging due to the confounding contributions of bias induced by information contained across the text and visual modalities. To address this challenging problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images. Specifically, we present LVLMs with identical open-ended text prompts while conditioning on images from different counterfactual sets, where each set contains images which are largely identical in their depiction of a common subject (e.g., a doctor), but vary only in terms of intersectional social attributes (e.g., race and gender). We comprehensively evaluate the text produced by different models under this counterfactual generation setting at scale, producing over 57 million responses from popular LVLMs. Our multi-dimensional analysis reveals that social attributes such as race, gender, and physical characteristics depicted in input images can significantly influence the generation of toxic content, competency-associated words, harmful stereotypes, and numerical ratings of depicted individuals. We additionally explore the relationship between social bias in LVLMs and their corresponding LLMs, as well as inference-time strategies to mitigate bias.

5/31/2024

cs.CV

🎲

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Anahita Bhiwandiwalla, Vasudev Lal

While vision-language models (VLMs) have achieved remarkable performance improvements recently, there is growing evidence that these models also posses harmful biases with respect to social attributes such as gender and race. Prior studies have primarily focused on probing such bias attributes individually while ignoring biases associated with intersections between social attributes. This could be due to the difficulty of collecting an exhaustive set of image-text pairs for various combinations of social attributes. To address this challenge, we employ text-to-image diffusion models to produce counterfactual examples for probing intersectional social biases at scale. Our approach utilizes Stable Diffusion with cross attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e.g., a given occupation) while differing only in their depiction of intersectional social attributes (e.g., race & gender). Through our over-generate-then-filter methodology, we produce SocialCounterfactuals, a high-quality dataset containing 171k image-text pairs for probing intersectional biases related to gender, race, and physical characteristics. We conduct extensive experiments to demonstrate the usefulness of our generated dataset for probing and mitigating intersectional social biases in state-of-the-art VLMs.

4/11/2024

cs.CV cs.AI

Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models

Yian Li, Wentao Tian, Yang Jiao, Jingjing Chen, Yu-Gang Jiang

Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making presuppositions based on established facts and extrapolating potential outcomes. Existing multimodal large language models (MLLMs) have exhibited impressive cognitive and reasoning capabilities, which have been examined across a wide range of Visual Question Answering (VQA) benchmarks. Nevertheless, how will existing MLLMs perform when faced with counterfactual questions? To answer this question, we first curate a novel textbf{C}ountertextbf{F}actual textbf{M}ultitextbf{M}odal reasoning benchmark, abbreviated as textbf{CFMM}, to systematically assess the counterfactual reasoning capabilities of MLLMs. Our CFMM comprises six challenging tasks, each including hundreds of carefully human-labeled counterfactual questions, to evaluate MLLM's counterfactual reasoning capabilities across diverse aspects. Through experiments, interestingly, we find that existing MLLMs prefer to believe what they see, but ignore the counterfactual presuppositions presented in the question, thereby leading to inaccurate responses. Furthermore, we evaluate a wide range of prevalent MLLMs on our proposed CFMM. The significant gap between their performance on our CFMM and that on several VQA benchmarks indicates that there is still considerable room for improvement in existing MLLMs toward approaching human-level intelligence. On the other hand, through boosting MLLMs performances on our CFMM in the future, potential avenues toward developing MLLMs with advanced intelligence can be explored.

4/22/2024

cs.CV cs.AI

A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models

Ashutosh Sathe, Prachi Jain, Sunayana Sitaram

Vision-language models (VLMs) have gained widespread adoption in both industry and academia. In this study, we propose a unified framework for systematically evaluating gender, race, and age biases in VLMs with respect to professions. Our evaluation encompasses all supported inference modes of the recent VLMs, including image-to-text, text-to-text, text-to-image, and image-to-image. Additionally, we propose an automated pipeline to generate high-quality synthetic datasets that intentionally conceal gender, race, and age information across different professional domains, both in generated text and images. The dataset includes action-based descriptions of each profession and serves as a benchmark for evaluating societal biases in vision-language models (VLMs). In our comparative analysis of widely used VLMs, we have identified that varying input-output modalities lead to discernible differences in bias magnitudes and directions. Additionally, we find that VLM models exhibit distinct biases across different bias attributes we investigated. We hope our work will help guide future progress in improving VLMs to learn socially unbiased representations. We will release our data and code.

6/18/2024

cs.CV cs.CL cs.CY