Severity Controlled Text-to-Image Generative Model Bias Manipulation

2404.02530

Published 4/4/2024 by Jordan Vice, Naveed Akhtar, Richard Hartley, Ajmal Mian

Abstract

Text-to-image (T2I) generative models are gaining wide popularity, especially in public domains. However, their intrinsic bias and potential malicious manipulations remain under-explored. Charting the susceptibility of T2I models to such manipulation, we first expose the new possibility of a dynamic and computationally efficient exploitation of model bias by targeting the embedded language models. By leveraging mathematical foundations of vector algebra, our technique enables a scalable and convenient control over the severity of output manipulation through model bias. As a by-product, this control also allows a form of precise prompt engineering to generate images which are generally implausible with regular text prompts. We also demonstrate a constructive application of our manipulation for balancing the frequency of generated classes - as in model debiasing. Our technique does not require training and is also framed as a backdoor attack with severity control using semantically-null text triggers in the prompts. With extensive analysis, we present interesting qualitative and quantitative results to expose potential manipulation possibilities for T2I models.

Keywords: Text-to-Image Models, Generative Models, Backdoor Attacks, Prompt Engineering, Bias

Overview

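Introduces a training-free method for controlling how severely the outputs of a text-to-image (T2I) generative model are biased
Manipulates bias by targeting the language model embedded in the T2I pipeline, using vector algebra rather than retraining
Frames the technique as a backdoor attack driven by semantically-null text triggers, and also shows a constructive use for balancing the frequency of generated classes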

Plain English Explanation

This research paper presents a way to manage the biases present in text-to-image generation models. These models can produce images based on textual descriptions, but they often reflect societal biases embedded in their training data. The researchers developed a technique to manipulate the severity of these biases during the image generation process.

Rather than rewording the prompt, the technique acts on the language model embedded in the T2I pipeline and uses vector algebra to set how strongly a chosen bias appears in the output. The abstract describes two sides of this control. It can be framed as a backdoor attack in which semantically-null text triggers in the prompt determine the severity, and it can be applied constructively to balance how often each class is generated, as in model debiasing. As a by-product, it also allows a precise form of prompt engineering that produces images regular text prompts rarely yield.

The goal is to give users finer-grained control over the biases in images produced by these generative models. This matters in domains like media, advertising, and education, where visual content can significantly shape perceptions and social dynamics.

Technical Explanation

The paper proposes a "Severity Controlled Text-to-Image Generative Model Bias Manipulation" approach. The key elements include:

Experiment Design: The researchers tested their method on two popular text-to-image models, DALL-E 2 and Stable Diffusion. They used a set of prompts designed to elicit biases related to gender, race, and occupation.
Bias Manipulation: According to the abstract, the technique targets the language model embedded in the T2I pipeline and uses vector algebra to control how severely the output is biased. It requires no training and is framed as a backdoor attack in which semantically-null text triggers in the prompt set the severity.
Evaluation: The team assessed the generated images both quantitatively, through bias metrics, and qualitatively, through human evaluations of image quality and safety.
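The abstract describes severity control built on vector algebra over the embedded language model's outputs, activated by semantically-null text triggers. The sketch below is only a toy illustration of that idea, not the authors' implementation: the function names, the zero-width-space trigger, the `step` parameter, and the three-dimensional "embeddings" are all assumptions made for the example. It shifts a prompt embedding along the direction from a source concept to a target concept, scaled by a severity value derived from the number of triggers in the prompt.

```python
from math import sqrt

# Hypothetical trigger: a zero-width space, chosen only because it carries
# no meaning for a human reader. The paper's actual triggers may differ.
TRIGGER = "\u200b"


def severity_from_triggers(prompt, trigger=TRIGGER, step=0.25):
    """Return (clean_prompt, severity); each trigger adds `step`, capped at 1.0."""
    count = prompt.count(trigger)
    return prompt.replace(trigger, ""), min(1.0, count * step)


def shift_embedding(prompt_vec, source_vec, target_vec, severity):
    """Move prompt_vec along the direction (target_vec - source_vec).

    severity = 0.0 leaves the embedding unchanged; larger values push it
    further toward the target concept.
    """
    if not (len(prompt_vec) == len(source_vec) == len(target_vec)):
        raise ValueError("all vectors must have the same dimension")
    return [p + severity * (t - s)
            for p, s, t in zip(prompt_vec, source_vec, target_vec)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy stand-ins for text-encoder outputs (assumed values, not real embeddings).
    source = [1.0, 0.0, 0.0]
    target = [0.0, 1.0, 0.0]
    for n_triggers in range(5):
        prompt = "a photo of a doctor" + TRIGGER * n_triggers
        clean, severity = severity_from_triggers(prompt)
        shifted = shift_embedding(source, source, target, severity)
        print(n_triggers, severity, round(cosine(shifted, target), 3))
```

In a real pipeline the shifted vector would replace the text encoder's output before it conditions the image generator; that wiring is omitted here because it depends on the specific model.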

The results demonstrate that the method can effectively manipulate bias severity without significantly degrading image quality. Beyond exposing a manipulation risk, the same control can be used constructively to balance the frequency of generated classes, as in model debiasing, while still maintaining the models' generative capabilities.
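One simple way to put a number on "balancing the frequency of generated classes" is to label a batch of generated images and measure how far the class frequencies sit from uniform. The sketch below is an assumption-laden illustration, not the metric used in the paper: the labels are presumed to come from some external classifier, and the function names and the choice of total variation distance are hypothetical.

```python
from collections import Counter


def class_frequencies(labels, classes):
    """Fraction of generated images assigned to each class in `classes`."""
    if not labels:
        raise ValueError("labels must not be empty")
    unknown = set(labels) - set(classes)
    if unknown:
        raise ValueError(f"labels outside the class list: {sorted(unknown)}")
    counts = Counter(labels)
    return {c: counts.get(c, 0) / len(labels) for c in classes}


def imbalance(freqs):
    """Total variation distance from the uniform distribution.

    0.0 means perfectly balanced classes; the maximum, reached when a
    single class takes every image, is 1 - 1/k for k classes.
    """
    k = len(freqs)
    return 0.5 * sum(abs(f - 1.0 / k) for f in freqs.values())


if __name__ == "__main__":
    # Hypothetical classifier outputs for eight generated images.
    labels = ["class_a"] * 6 + ["class_b"] * 2
    freqs = class_frequencies(labels, ["class_a", "class_b"])
    print(freqs, imbalance(freqs))
```

Computing this score for outputs generated at several severity settings would show whether a given setting moves the class distribution toward or away from balance.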

Critical Analysis

The paper acknowledges several limitations and areas for future work. For instance, the bias manipulation techniques may not generalize well to more complex or intersectional biases. Additionally, the study only explores a narrow set of biases, and there are concerns about the subjectivity and representativeness of the human evaluation process.

Further research could investigate more comprehensive bias taxonomies, as well as the long-term societal impacts of deploying these bias-controlled models. There are also open questions about the extent to which users can be expected to understand and manage model biases, and whether additional safeguards or oversight may be necessary.

Overall, this work represents an important step towards developing more responsible and equitable text-to-image generation systems. However, continued vigilance and multidisciplinary collaboration will be crucial to address the complex challenges surrounding algorithmic bias and fairness.

Conclusion

This research introduces a method for controlling the severity of biases in text-to-image generative models by manipulating the embedded language model rather than retraining. The researchers show that the same mechanism can act as a backdoor attack with severity control or, used constructively, can balance the classes a model generates.

The implications of this work are significant, as it provides a pathway for developing more inclusive and representative visual content in domains like media, advertising, and education. As text-to-image models become more prevalent, tools like this will be essential for mitigating the societal harms that can arise from the amplification of harmful biases.

While further research is needed to address the limitations and broader challenges, this paper represents an important contribution towards the responsible development and deployment of generative AI technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation

Yixin Wan, Arjun Subramonian, Anaelia Ovalle, Zongyu Lin, Ashima Suvarna, Christina Chance, Hritik Bansal, Rebecca Pattichis, Kai-Wei Chang

The recent advancement of large and powerful models with Text-to-Image (T2I) generation abilities -- such as OpenAI's DALLE-3 and Google's Gemini -- enables users to generate high-quality images from textual prompts. However, it has become increasingly evident that even simple prompts could cause T2I models to exhibit conspicuous social bias in generated images. Such bias might lead to both allocational and representational harms in society, further marginalizing minority groups. Noting this problem, a large body of recent works has been dedicated to investigating different dimensions of bias in T2I systems. However, an extensive review of these studies is lacking, hindering a systematic understanding of current progress and research gaps. We present the first extensive survey on bias in T2I generative models. In this survey, we review prior studies on dimensions of bias: Gender, Skintone, and Geo-Culture. Specifically, we discuss how these works define, evaluate, and mitigate different aspects of bias. We found that: (1) while gender and skintone biases are widely studied, geo-cultural bias remains under-explored; (2) most works on gender and skintone bias investigated occupational association, while other aspects are less frequently studied; (3) almost all gender bias works overlook non-binary identities in their studies; (4) evaluation datasets and metrics are scattered, with no unified framework for measuring biases; and (5) current mitigation methods fail to resolve biases comprehensively. Based on current limitations, we point out future research directions that contribute to human-centric definitions, evaluations, and mitigation of biases. We hope to highlight the importance of studying biases in T2I systems, as well as encourage future efforts to holistically understand and tackle biases, building fair and trustworthy T2I technologies for everyone.

5/3/2024

cs.CV cs.AI cs.CY

Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors

Ali Naseh, Jaechul Roh, Eugene Bagdasaryan, Amir Houmansadr

Recent advances in large text-conditional image generative models such as Stable Diffusion, Midjourney, and DALL-E 3 have revolutionized the field of image generation, allowing users to produce high-quality, realistic images from textual prompts. While these developments have enhanced artistic creation and visual communication, they also present an underexplored attack opportunity: the possibility of inducing biases by an adversary into the generated images for malicious intentions, e.g., to influence society and spread propaganda. In this paper, we demonstrate the possibility of such a bias injection threat by an adversary who backdoors such models with a small number of malicious data samples; the implemented backdoor is activated when special triggers exist in the input prompt of the backdoored models. On the other hand, the model's utility is preserved in the absence of the triggers, making the attack highly undetectable. We present a novel framework that enables efficient generation of poisoning samples with composite (multi-word) triggers for such an attack. Our extensive experiments using over 1 million generated images and against hundreds of fine-tuned models demonstrate the feasibility of the presented backdoor attack. We illustrate how these biases can bypass conventional detection mechanisms, highlighting the challenges in proving the existence of biases within operational constraints. Our cost analysis confirms the low financial barrier to executing such attacks, underscoring the need for robust defensive strategies against such vulnerabilities in text-to-image generation models.

6/24/2024

cs.LG cs.AI cs.CR

Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

Philip Wootaek Shin, Jihyun Janice Ahn, Wenpeng Yin, Jack Sampson, Vijaykrishnan Narayanan

It has been shown that many generative models inherit and amplify societal biases. To date, there is no uniform/systematic agreed standard to control/adjust for these biases. This study examines the presence and manipulation of societal biases in leading text-to-image models: Stable Diffusion, DALL-E 3, and Adobe Firefly. Through a comprehensive analysis combining base prompts with modifiers and their sequencing, we uncover the nuanced ways these AI technologies encode biases across gender, race, geography, and region/culture. Our findings reveal the challenges and potential of prompt engineering in controlling biases, highlighting the critical need for ethical AI development promoting diversity and inclusivity. This work advances AI ethics by not only revealing the nuanced dynamics of bias in text-to-image generation models but also by offering a novel framework for future research in controlling bias. Our contributions, spanning comparative analyses, the strategic use of prompt modifiers, the exploration of prompt sequencing effects, and the introduction of a bias sensitivity taxonomy, lay the groundwork for the development of common metrics and standard analyses for evaluating whether and how future AI models exhibit and respond to requests to adjust for inherent biases.

6/11/2024

cs.CV cs.CL

Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models

Nila Masrourisaadat, Nazanin Sedaghatkish, Fatemeh Sarshartehrani, Edward A. Fox

Advances in generative models have led to significant interest in image synthesis, demonstrating the ability to generate high-quality images for a diverse range of text prompts. Despite this progress, most studies ignore the presence of bias. In this paper, we examine several text-to-image models not only by qualitatively assessing their performance in generating accurate images of human faces, groups, and specified numbers of objects but also by presenting a social bias analysis. As expected, models with larger capacity generate higher-quality images. However, we also document the inherent gender or social biases these models possess, offering a more complete understanding of their impact and limitations.

7/2/2024

cs.AI cs.CV