Identifying and Improving Disability Bias in GPT-Based Resume Screening

Read original: arXiv:2402.01732 - Published 5/24/2024 by Kate Glazko, Yusuf Mohammed, Ben Kosa, Venkatesh Potluri, Jennifer Mankoff

Overview

This paper investigates disability bias in GPT-based resume screening.
The researchers ran a resume audit study in which they asked ChatGPT (specifically, GPT-4) to rank a resume against the same resume enhanced with a disability-related leadership award, scholarship, panel presentation, and membership.
They found that GPT-4 exhibits prejudice towards the enhanced resumes, and that this prejudice can be quantifiably reduced by a custom GPT trained on principles of DEI and disability justice.

Plain English Explanation

Hiring decisions made by companies can sometimes be influenced by unconscious biases, including biases against people with disabilities. As more companies rely on AI-powered tools to screen job applications, there is a risk that these tools could reflect and amplify such biases.

The researchers in this paper set out to address this issue. They ran an audit: they gave GPT-4 two versions of the same resume, identical except that one added disability-related items (a leadership award, a scholarship, a panel presentation, and a membership), and asked it to rank them. They found that GPT-4 was prejudiced against the version with the disability-related additions.

The researchers then showed that this bias can be reduced. A custom GPT trained on principles of diversity, equity, and inclusion (DEI) and disability justice showed measurably less prejudice. They also analyzed the explanations GPT-4 gave for its rankings and described the direct and indirect ableism those explanations contained.

This work is important because it shows that disability bias in AI screening tools can be measured and reduced. By addressing these biases, companies can make their hiring processes more equitable and create more opportunities for qualified candidates with disabilities.

Technical Explanation

The paper first reviews prior research on bias in AI-based hiring tools, including studies that have identified biases against women, racial minorities, and other underrepresented groups [1-4]. However, disability bias has received less attention, which motivates the focus of this work.

The authors conduct a resume audit study of ChatGPT (specifically, GPT-4). The model is asked to rank a resume against the same resume enhanced with four disability-related items: a leadership award, a scholarship, a panel presentation, and a membership. Because the two resumes are otherwise identical, a systematic preference against the enhanced version points to the disability-related content as the cause.

The study has three parts:

Quantitative audit: Measuring how GPT-4 ranks the enhanced resumes against the original. The authors find that GPT-4 exhibits prejudice towards the enhanced resumes.
Mitigation: Training a custom GPT on principles of DEI and disability justice, which quantifiably reduces this prejudice.
Qualitative analysis: Examining the types of direct and indirect ableism GPT-4 uses to justify its biased decisions, in order to suggest directions for further mitigation work.

The authors also note that these justifications are presumably drawn from training data containing real-world biased statements made by humans, so the analysis suggests additional avenues for understanding and addressing human bias.
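
The paired-ranking audit described in the paper's abstract can be tallied with a few lines of Python. The sketch below is illustrative only: the `Trial` record, the condition labels `gpt4` and `custom_gpt`, and the example rankings are hypothetical and are not taken from the paper. It computes, for each condition, how often the enhanced resume was ranked first, and the change between a baseline model and a mitigated one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One ranking query comparing a control resume with an enhanced one."""
    condition: str        # hypothetical label, e.g. "gpt4" or "custom_gpt"
    enhanced_first: bool  # True if the model ranked the enhanced resume first


def win_rates(trials):
    """Return {condition: fraction of trials where the enhanced resume won}."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for t in trials:
        totals[t.condition] += 1
        wins[t.condition] += int(t.enhanced_first)
    return {c: wins[c] / totals[c] for c in totals}


def improvement(rates, baseline, mitigated):
    """Change in enhanced-resume win rate from baseline to mitigated model."""
    return rates[mitigated] - rates[baseline]


# Made-up rankings for illustration; these are NOT results from the paper.
example_trials = (
    [Trial("gpt4", i < 3) for i in range(10)]
    + [Trial("custom_gpt", i < 7) for i in range(10)]
)

if __name__ == "__main__":
    rates = win_rates(example_trials)
    for condition, rate in rates.items():
        print(f"{condition}: enhanced resume ranked first in {rate:.0%} of trials")
    print(f"change: {improvement(rates, 'gpt4', 'custom_gpt'):+.0%}")
```

What counts as a fair reference win rate (for example, the rate observed for a comparable enhancement without disability-related content) is a design decision that this sketch leaves open.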

Critical Analysis

The paper makes a valuable contribution by developing methods to address an important but under-studied area of bias in AI hiring systems. The authors' focus on disability bias is particularly timely and relevant, given the growing use of these tools and the persistent challenges faced by people with disabilities in the job market.

One limitation of the work is that it relies on proxy signals for disability status, rather than directly labeling applicants' disability status. This could mean that the bias metrics and debiasing techniques don't fully capture the nuances of how disability manifests in resumes. Additionally, the paper does not explore the potential for these techniques to introduce new forms of unfairness, such as by disadvantaging applicants without disabilities.

Further research is needed to better understand the societal impacts of these AI hiring tools, both in terms of intended benefits and potential unintended consequences. Careful testing and auditing, as well as close collaboration with disability advocates and the disabled community, will be crucial to ensuring these systems are truly equitable and inclusive.

Conclusion

This paper presents an important step towards more ethical and unbiased AI-powered hiring. By developing methods to identify and mitigate disability bias, the researchers show how these technologies can be designed to create more opportunities for qualified candidates with disabilities. As AI plays an increasingly central role in hiring decisions, this work highlights the need to proactively address bias and ensure fair treatment for all applicants.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Identifying and Improving Disability Bias in GPT-Based Resume Screening

Kate Glazko, Yusuf Mohammed, Ben Kosa, Venkatesh Potluri, Jennifer Mankoff

As Generative AI rises in adoption, its use has expanded to include domains such as hiring and recruiting. However, without examining the potential of bias, this may negatively impact marginalized populations, including people with disabilities. To address this important concern, we present a resume audit study, in which we ask ChatGPT (specifically, GPT-4) to rank a resume against the same resume enhanced with an additional leadership award, scholarship, panel presentation, and membership that are disability related. We find that GPT-4 exhibits prejudice towards these enhanced CVs. Further, we show that this prejudice can be quantifiably reduced by training a custom GPT on principles of DEI and disability justice. Our study also includes a unique qualitative analysis of the types of direct and indirect ableism GPT-4 uses to justify its biased decisions and suggests directions for additional bias mitigation work. Additionally, since these justifications are presumably drawn from training data containing real-world biased statements made by humans, our analysis suggests additional avenues for understanding and addressing human bias.

5/24/2024

The Silicone Ceiling: Auditing GPT's Race and Gender Biases in Hiring

Lena Armstrong, Abbey Liu, Stephen MacNeil, Danae Metaxa

Large language models (LLMs) are increasingly being introduced in workplace settings, with the goals of improving efficiency and fairness. However, concerns have arisen regarding these models' potential to reflect or exacerbate social biases and stereotypes. This study explores the potential impact of LLMs on hiring practices. To do so, we conduct an algorithm audit of race and gender biases in one commonly-used LLM, OpenAI's GPT-3.5, taking inspiration from the history of traditional offline resume audits. We conduct two studies using names with varied race and gender connotations: resume assessment (Study 1) and resume generation (Study 2). In Study 1, we ask GPT to score resumes with 32 different names (4 names for each combination of the 2 gender and 4 racial groups) and two anonymous options across 10 occupations and 3 evaluation tasks (overall rating, willingness to interview, and hireability). We find that the model reflects some biases based on stereotypes. In Study 2, we prompt GPT to create resumes (10 for each name) for fictitious job candidates. When generating resumes, GPT reveals underlying biases; women's resumes had occupations with less experience, while Asian and Hispanic resumes had immigrant markers, such as non-native English and non-U.S. education and work experiences. Our findings contribute to a growing body of literature on LLM biases, in particular when used in workplace contexts.

5/13/2024

Surprising gender biases in GPT

Raluca Alexandra Fulgu, Valerio Capraro

We present seven experiments exploring gender biases in GPT. Initially, GPT was asked to generate demographics of a potential writer of twenty phrases containing feminine stereotypes and twenty with masculine stereotypes. Results show a strong asymmetry, with stereotypically masculine sentences attributed to a female more often than vice versa. For example, the sentence "I love playing fotbal! Im practicing with my cosin Michael" was constantly assigned by ChatGPT to a female writer. This phenomenon likely reflects that while initiatives to integrate women in traditionally masculine roles have gained momentum, the reverse movement remains relatively underdeveloped. Subsequent experiments investigate the same issue in high-stakes moral dilemmas. GPT-4 finds it more appropriate to abuse a man to prevent a nuclear apocalypse than to abuse a woman. This bias extends to other forms of violence central to the gender parity debate (abuse), but not to those less central (torture). Moreover, this bias increases in cases of mixed-sex violence for the greater good: GPT-4 agrees with a woman using violence against a man to prevent a nuclear apocalypse but disagrees with a man using violence against a woman for the same purpose. Finally, these biases are implicit, as they do not emerge when GPT-4 is directly asked to rank moral violations. These results highlight the necessity of carefully managing inclusivity efforts to prevent unintended discrimination.

7/9/2024

Disability Representations: Finding Biases in Automatic Image Generation

Yannis Tevissen

Recent advancements in image generation technology have enabled widespread access to AI-generated imagery, prominently used in advertising, entertainment, and progressively in every form of visual content. However, these technologies often perpetuate societal biases. This study investigates the representation biases in popular image generation models towards people with disabilities (PWD). Through a comprehensive experiment involving several popular text-to-image models, we analyzed the depiction of disability. The results indicate a significant bias, with most generated images portraying disabled individuals as old, sad, and predominantly using manual wheelchairs. These findings highlight the urgent need for more inclusive AI development, ensuring diverse and accurate representation of PWD in generated images. This research underscores the importance of addressing and mitigating biases in AI models to foster equitable and realistic representations.

6/24/2024