As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making

Read original: arXiv:2405.14812 - Published 8/20/2024 by Shomik Jain, D Calacci, Ashia Wilson

Overview

This paper investigates the phenomenon of "norm inconsistency" in large language models (LLMs), where the models apply different norms in similar situations.
The researchers focus on the high-risk application of deciding whether to call the police in Amazon Ring home surveillance videos.
They evaluate the decisions of three state-of-the-art LLMs (GPT-4, Gemini 1.0, and Claude 3 Sonnet) based on the activities portrayed in the videos, the subjects' skin tone and gender, and the characteristics of the neighborhoods where the videos were recorded.

Plain English Explanation

The paper examines a problem with how large language models (LLMs) like GPT-4 and Gemini 1.0 make decisions in certain situations. The researchers noticed that these models sometimes apply different rules or "norms" when faced with similar circumstances.

To study this, they looked at how the LLMs decided whether to call the police in videos from Amazon's Ring home security cameras. They evaluated the models' decisions based on what was happening in the videos, the skin tone and gender of the people shown, and the demographics of the neighborhoods where the videos were recorded.

The analysis revealed two key issues:

The models' recommendations to call the police did not always match the presence of actual criminal activity in the videos.
The models showed biases influenced by the racial makeup of the neighborhoods.

These findings highlight how the decisions made by these advanced AI models can be arbitrary and inconsistent, especially when it comes to sensitive topics like surveillance and law enforcement. They also reveal limitations in current methods for detecting and addressing bias in AI systems making normative judgments.

Technical Explanation

The researchers designed an experiment to evaluate the norm inconsistencies exhibited by three state-of-the-art LLMs (GPT-4, Gemini 1.0, and Claude 3 Sonnet) when deciding whether to call the police on activities depicted in Amazon Ring home surveillance videos.

They presented the models with a set of videos and asked them to make recommendations on whether to contact law enforcement. The researchers then analyzed the models' decisions in relation to the actual criminal activity shown, as well as the skin tone and gender of the subjects and the demographic characteristics of the neighborhoods.

The analysis revealed two key findings:

Discordance between recommendations and criminal activity: The models' recommendations to call the police did not always align with the presence of genuine criminal behavior in the videos.
Biases influenced by neighborhood demographics: The models exhibited biases in their recommendations that were influenced by the racial makeup of the neighborhoods where the videos were recorded.

These results demonstrate the arbitrary nature of model decisions in the surveillance context and the limitations of current bias detection and mitigation strategies when it comes to normative decision-making by LLMs.
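As a rough illustration of the two measurements described above, the sketch below computes a discordance rate (how often a recommendation disagrees with the crime label) and the rate of police recommendations per neighborhood group among videos with no crime. The `Decision` record, its field names, the group labels, and the toy data are all assumptions made for illustration; they are not the paper's data format, metrics, or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One model response for one video (hypothetical record layout)."""
    crime_present: bool      # annotated label for the video
    recommends_police: bool  # model recommended calling the police
    neighborhood: str        # illustrative neighborhood grouping label


def discordance_rate(decisions):
    """Share of responses where the recommendation disagrees with the crime label."""
    if not decisions:
        return 0.0
    mismatches = sum(d.recommends_police != d.crime_present for d in decisions)
    return mismatches / len(decisions)


def police_rate_by_group(decisions, crime_present=False):
    """Rate of police recommendations per group, holding the crime label fixed."""
    counts = defaultdict(lambda: [0, 0])  # group -> [recommendations, total]
    for d in decisions:
        if d.crime_present == crime_present:
            counts[d.neighborhood][0] += int(d.recommends_police)
            counts[d.neighborhood][1] += 1
    return {group: yes / total for group, (yes, total) in counts.items()}


# Toy records invented for illustration only; these are not data from the paper.
toy = [
    Decision(False, True, "group_a"),
    Decision(False, False, "group_a"),
    Decision(False, False, "group_b"),
    Decision(False, False, "group_b"),
    Decision(True, True, "group_a"),
    Decision(True, False, "group_b"),
]

print(discordance_rate(toy))
print(police_rate_by_group(toy))
```

Holding the crime label fixed before comparing groups is one simple way to separate the two effects: a gap between groups among no-crime videos cannot be explained by differences in the activity shown. A real analysis would also need to account for sample sizes and other video attributes.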

Critical Analysis

The paper provides valuable insights into the problem of norm inconsistency in LLMs, which is an important issue as these models are increasingly being deployed in high-stakes decision-making contexts.

However, the research is limited to a single application domain (home surveillance videos) and three LLMs. It would be helpful to see the analysis expanded to a wider range of model architectures, training datasets, and application areas to better understand how far the problem generalizes.

Additionally, the paper does not delve deeply into the underlying causes of the observed norm inconsistencies and biases. Further investigation into the model training, architecture, and decision-making processes could shed light on the root causes and inform more effective mitigation strategies.

Conclusion

This research highlights the concerning issue of norm inconsistency in LLMs, where advanced AI systems can make arbitrary and biased decisions in sensitive domains like surveillance and law enforcement. The findings underscore the need for more robust bias detection and mitigation techniques, as well as a deeper understanding of how LLMs arrive at normative judgments.

As these powerful language models continue to be deployed in high-stakes applications, it is crucial that the research community and the public at large scrutinize their behavior and work towards developing AI systems that are fair, consistent, and aligned with human values.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making

Shomik Jain, D Calacci, Ashia Wilson

We investigate the phenomenon of norm inconsistency: where LLMs apply different norms in similar situations. Specifically, we focus on the high-risk application of deciding whether to call the police in Amazon Ring home surveillance videos. We evaluate the decisions of three state-of-the-art LLMs -- GPT-4, Gemini 1.0, and Claude 3 Sonnet -- in relation to the activities portrayed in the videos, the subjects' skin-tone and gender, and the characteristics of the neighborhoods where the videos were recorded. Our analysis reveals significant norm inconsistencies: (1) a discordance between the recommendation to call the police and the actual presence of criminal activity, and (2) biases influenced by the racial demographics of the neighborhoods. These results highlight the arbitrariness of model decisions in the surveillance context and the limitations of current bias detection and mitigation strategies in normative decision-making.

8/20/2024

PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action

Yijia Shao, Tianshi Li, Weiyan Shi, Yanchen Liu, Diyi Yang

As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in accordance with the contextual privacy norms becomes increasingly critical. However, quantifying the privacy norm awareness of LMs and the emerging privacy risk in LM-mediated communication is challenging due to (1) the contextual and long-tailed nature of privacy-sensitive cases, and (2) the lack of evaluation approaches that capture realistic application scenarios. To address these challenges, we propose PrivacyLens, a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories, enabling multi-level evaluation of privacy leakage in LM agents' actions. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. Using this dataset, we reveal a discrepancy between LM performance in answering probing questions and their actual behavior when executing user instructions in an agent setup. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions. We also demonstrate the dynamic nature of PrivacyLens by extending each seed into multiple trajectories to red-team LM privacy leakage risk. Dataset and code are available at https://github.com/SALT-NLP/PrivacyLens.

9/4/2024

LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions

Rumaisa Azeem, Andrew Hundt, Masoumeh Mansouri, Martim Brandão

Members of the Human-Robot Interaction (HRI) and Artificial Intelligence (AI) communities have proposed Large Language Models (LLMs) as a promising resource for robotics tasks such as natural language interactions, doing household and workplace tasks, approximating 'common sense reasoning', and modeling humans. However, recent research has raised concerns about the potential for LLMs to produce discriminatory outcomes and unsafe behaviors in real-world robot experiments and applications. To address these concerns, we conduct an HRI-based evaluation of discrimination and safety criteria on several highly-rated LLMs. Our evaluation reveals that LLMs currently lack robustness when encountering people across a diverse range of protected identity characteristics (e.g., race, gender, disability status, nationality, religion, and their intersections), producing biased outputs consistent with directly discriminatory outcomes -- e.g. 'gypsy' and 'mute' people are labeled untrustworthy, but not 'european' or 'able-bodied' people. Furthermore, we test models in settings with unconstrained natural language (open vocabulary) inputs, and find they fail to act safely, generating responses that accept dangerous, violent, or unlawful instructions -- such as incident-causing misstatements, taking people's mobility aids, and sexual predation. Our results underscore the urgent need for systematic, routine, and comprehensive risk assessments and assurances to improve outcomes and ensure LLMs only operate on robots when it is safe, effective, and just to do so. Data and code will be made available.

6/14/2024

Evaluating Gender, Racial, and Age Biases in Large Language Models: A Comparative Analysis of Occupational and Crime Scenarios

Vishal Mirza, Rahul Kulkarni, Aakanksha Jadhav

Recent advancements in Large Language Models (LLMs) have been notable, yet widespread enterprise adoption remains limited due to various constraints. This paper examines bias in LLMs, a crucial issue affecting their usability, reliability, and fairness. Researchers are developing strategies to mitigate bias, including debiasing layers, specialized reference datasets like Winogender and Winobias, and reinforcement learning with human feedback (RLHF). These techniques have been integrated into the latest LLMs. Our study evaluates gender bias in occupational scenarios and gender, age, and racial bias in crime scenarios across four leading LLMs released in 2024: Gemini 1.5 Pro, Llama 3 70B, Claude 3 Opus, and GPT-4o. Findings reveal that LLMs often depict female characters more frequently than male ones in various occupations, showing a 37% deviation from US BLS data. In crime scenarios, deviations from US FBI data are 54% for gender, 28% for race, and 17% for age. We observe that efforts to reduce gender and racial bias often lead to outcomes that may over-index one sub-class, potentially exacerbating the issue. These results highlight the limitations of current bias mitigation techniques and underscore the need for more effective approaches.

9/24/2024