SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

arXiv:2312.00825

Published 4/11/2024 by Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Anahita Bhiwandiwalla, Vasudev Lal

cs.CV cs.AI

Abstract

While vision-language models (VLMs) have achieved remarkable performance improvements recently, there is growing evidence that these models also possess harmful biases with respect to social attributes such as gender and race. Prior studies have primarily focused on probing such bias attributes individually while ignoring biases associated with intersections between social attributes. This could be due to the difficulty of collecting an exhaustive set of image-text pairs for various combinations of social attributes. To address this challenge, we employ text-to-image diffusion models to produce counterfactual examples for probing intersectional social biases at scale. Our approach utilizes Stable Diffusion with cross attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e.g., a given occupation) while differing only in their depiction of intersectional social attributes (e.g., race & gender). Through our over-generate-then-filter methodology, we produce SocialCounterfactuals, a high-quality dataset containing 171k image-text pairs for probing intersectional biases related to gender, race, and physical characteristics. We conduct extensive experiments to demonstrate the usefulness of our generated dataset for probing and mitigating intersectional social biases in state-of-the-art VLMs.

Overview

Vision-language models (VLMs) have made significant performance improvements, but they also exhibit harmful biases related to social attributes like gender and race.
Prior studies have focused on probing these biases individually, rather than considering the intersections between different social attributes.
Collecting a comprehensive dataset of image-text pairs covering various combinations of social attributes is challenging.
The researchers address this challenge by using text-to-image diffusion models to generate counterfactual examples for probing intersectional social biases.

Plain English Explanation

Vision-language models are AI systems that process images and text together, for example to match a caption to a picture or to describe what an image shows. While these models have become increasingly capable, they have also been found to exhibit biases related to social characteristics like gender and race. Previous research has looked at these biases one attribute at a time, but not at how they intersect with each other.

For example, a model might show bias against both women and people of color, but the combination of being a woman of color could lead to even stronger biases that aren't captured by looking at gender or race alone. Collecting all the necessary image-text pairs to study these intersectional biases is very challenging.
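A small made-up example shows why looking at one attribute at a time can hide an intersectional effect. The scores below are invented for illustration and are not measurements from any model; they stand in for how strongly a model associates each group with some occupation. The function names `marginal_gap` and `intersectional_gap` are likewise hypothetical.

```python
from statistics import mean

# Invented association scores, keyed by (race, gender). Illustration only.
scores = {
    ("White", "man"): 0.30,
    ("White", "woman"): 0.28,
    ("Black", "man"): 0.28,
    ("Black", "woman"): 0.20,
}


def marginal_gap(scores, axis):
    """Largest difference between group averages along one attribute.

    axis=0 groups by race, axis=1 groups by gender.
    """
    groups = {}
    for key, value in scores.items():
        groups.setdefault(key[axis], []).append(value)
    averages = [mean(values) for values in groups.values()]
    return max(averages) - min(averages)


def intersectional_gap(scores):
    """Largest difference between any two (race, gender) combinations."""
    return max(scores.values()) - min(scores.values())


if __name__ == "__main__":
    print("race gap:          ", round(marginal_gap(scores, 0), 2))
    print("gender gap:        ", round(marginal_gap(scores, 1), 2))
    print("intersectional gap:", round(intersectional_gap(scores), 2))
```

With these invented numbers, the race-only and gender-only gaps are each about 0.05, while the gap between the best and worst race-gender combination is about 0.10. Averaging over one attribute dilutes the difference that only appears at the intersection.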

To address this, the researchers used a type of AI model called a text-to-image diffusion model. These models generate images from text descriptions, and the researchers used them to create counterfactual examples: sets of images that are nearly identical except for the social attributes depicted. This allowed them to efficiently generate a large, high-quality dataset for probing intersectional biases in vision-language models.

Technical Explanation

The researchers employed Stable Diffusion, a state-of-the-art text-to-image diffusion model, along with a technique called cross attention control to produce counterfactual image-text pairs. This allowed them to generate sets of images that were highly similar in their depiction of a subject (e.g., an occupation) but differed in their representation of intersectional social attributes like gender and race.
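As a rough illustration of what a counterfactual set looks like on the text side, the sketch below enumerates captions for one subject across every race-gender combination. The attribute lists, the caption template, and the function name `build_counterfactual_captions` are hypothetical placeholders, not the values used by the researchers. The image generation step with Stable Diffusion and cross attention control is not shown.

```python
from itertools import product

# Hypothetical attribute values, chosen only for illustration.
RACES = ["Asian", "Black", "White"]
GENDERS = ["man", "woman"]


def _article(word):
    """Pick 'an' before a vowel letter, otherwise 'a' (a simple heuristic)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def build_counterfactual_captions(subject, races=RACES, genders=GENDERS):
    """Return one caption per (race, gender) pair for a fixed subject.

    Every caption shares the same subject, so the only thing that varies
    across the set is the intersectional attribute.
    """
    captions = {}
    for race, gender in product(races, genders):
        # Assumed template; the real prompts may be phrased differently.
        captions[(race, gender)] = (
            f"A photo of {_article(race)} {race} {gender} "
            f"working as {_article(subject)} {subject}"
        )
    return captions


if __name__ == "__main__":
    for attrs, caption in build_counterfactual_captions("doctor").items():
        print(attrs, "->", caption)
```

Each caption in such a set would then be paired with an image generated so that the subject and scene stay as constant as possible while only the depicted attributes change.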

Through an over-generate-then-filter methodology, the researchers produced SocialCounterfactuals, a dataset containing 171,000 image-text pairs for probing intersectional biases related to gender, race, and physical characteristics. They then conducted extensive experiments to demonstrate the usefulness of this dataset for both probing and mitigating intersectional social biases in state-of-the-art vision-language models.
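The over-generate-then-filter idea can be sketched as follows: generate more candidate sets than needed, score every image, discard any set in which an image falls below a quality threshold, and keep the best of what remains. The scores here are made-up numbers standing in for some image quality or image-text similarity measure. The threshold, the `keep` count, and the rule of ranking sets by their weakest image are assumptions for illustration, not the filtering criteria reported by the researchers.

```python
from typing import Dict, List, Tuple

# A candidate set maps each (race, gender) pair to a quality score for the
# image generated for that pair. Scores are assumed to lie in [0, 1].
CandidateSet = Dict[Tuple[str, str], float]


def filter_candidate_sets(
    candidates: List[CandidateSet],
    min_score: float = 0.25,
    keep: int = 2,
) -> List[CandidateSet]:
    """Keep the `keep` best sets whose weakest image still passes `min_score`."""
    # A set is only useful if every counterfactual image in it is acceptable,
    # so each set is judged by its lowest-scoring member.
    passing = [c for c in candidates if c and min(c.values()) >= min_score]
    passing.sort(key=lambda c: min(c.values()), reverse=True)
    return passing[:keep]


# Four over-generated candidate sets for the same subject (invented scores).
set_a = {("Asian", "man"): 0.31, ("Asian", "woman"): 0.29}
set_b = {("Asian", "man"): 0.35, ("Asian", "woman"): 0.12}  # one poor image
set_c = {("Asian", "man"): 0.33, ("Asian", "woman"): 0.32}
set_d = {("Asian", "man"): 0.27, ("Asian", "woman"): 0.26}

if __name__ == "__main__":
    for kept in filter_candidate_sets([set_a, set_b, set_c, set_d]):
        print(kept)
```

Judging a set by its weakest image reflects the requirement that all images in a counterfactual set be usable together; a single failed generation makes the whole comparison unreliable, so `set_b` is dropped despite containing the highest individual score.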

Critical Analysis

The researchers acknowledge that their approach relies on the capabilities of the Stable Diffusion text-to-image model, which may itself contain biases. Additionally, the generated counterfactual images, while highly similar, may not perfectly capture the nuances of intersectional social attributes. Further research is needed to explore the extent to which these biases are present in the generated dataset, as well as potential mitigation strategies.

Another limitation is that the researchers focused on a relatively narrow set of social attributes (gender, race, and physical characteristics). There may be other intersectional biases related to factors like socioeconomic status, disability, or age that are not addressed in this work.

Despite these caveats, the SocialCounterfactuals dataset and the researchers' approach represent a significant step forward in the study of intersectional biases in vision-language models. The ability to efficiently generate high-quality counterfactual examples opens up new avenues for bias detection and mitigation at scale.

Conclusion

This research highlights the importance of studying intersectional biases in vision-language models, which bear directly on the fairness and inclusivity of these systems. By leveraging text-to-image diffusion models to generate counterfactual examples, the researchers have created a valuable resource for the community to further explore and address these complex issues. As AI systems become increasingly prevalent, it is crucial that we continue to scrutinize their biases and work towards more equitable and responsible development of these technologies.
