Attributions toward Artificial Agents in a modified Moral Turing Test

Read original: arXiv:2406.11854 - Published 6/19/2024 by Eyal Aharoni, Sharlene Fernandes, Daniel J. Brady, Caelan Alexander, Michael Criner, Kara Queen, Javier Rando, Eddy Nahmias, Victor Crespo

Overview

The paper examines whether people view moral evaluations made by an advanced AI language model, GPT-4, similarly to human-generated moral evaluations.
The researchers conducted a modified Moral Turing Test, where participants rated the quality of moral evaluations without knowing the source, and then tried to identify whether the evaluations were made by humans or the AI.
The results show that people rated the AI's moral reasoning as superior to humans' on almost all dimensions measured, yet they identified the AI-generated evaluations at rates significantly above chance, so the AI did not pass the source-identification part of the test.

Plain English Explanation

The researchers wanted to understand how people perceive moral judgments made by an advanced AI system, GPT-4, compared to those made by humans. They conducted a modified version of the Moral Turing Test, in which they asked people first to rate the quality of moral evaluations without knowing the source, and then to identify whether each evaluation was made by a human or the AI.

Remarkably, the participants rated the AI's moral reasoning as better than humans' in terms of virtuousness, intelligence, and trustworthiness. This suggests the AI was able to produce moral responses that were perceived as superior to those written by people. However, when asked to identify the source of each evaluation, the participants picked out the AI-generated ones more often than chance would predict, even though they had rated the AI's moral reasoning as higher quality.

This raises concerns that people may be too willing to accept moral guidance from AI systems, even if the guidance is potentially harmful. The researchers argue that this highlights the need for safeguards and oversight when it comes to using advanced language models, like GPT-4, in matters of morality.

Technical Explanation

The researchers conducted a modified Moral Turing Test (m-MTT), inspired by a proposal from Allen and colleagues (2000). A representative sample of 299 U.S. adults was shown moral evaluations without being told whether each came from a human or from GPT-4. The participants rated the quality of these evaluations along several dimensions, including virtuousness, intelligence, and trustworthiness.

Remarkably, the participants rated the moral reasoning of the AI language model, GPT-4, as superior to humans' across almost all dimensions. This suggests the AI was able to produce moral responses that were perceived as higher quality than those made by people, consistent with passing what Allen and colleagues call the "comparative MTT."
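To make the blinded comparison concrete, the sketch below computes the gap between mean AI and mean human ratings for each dimension. It is only an illustration: the ratings, the 1-5 scale, and the function name `mean_rating_gap` are all assumptions made up for this example, and the summary does not describe the statistical analysis the authors actually ran.

```python
from statistics import mean

# Hypothetical blinded ratings on an assumed 1-5 scale (NOT data from the study).
# Each dimension maps a source label to the ratings given to that source's evaluations.
hypothetical_ratings = {
    "virtuousness": {"ai": [4, 5, 4, 4], "human": [3, 4, 3, 3]},
    "intelligence": {"ai": [5, 4, 4, 5], "human": [3, 3, 4, 3]},
    "trustworthiness": {"ai": [4, 4, 5, 4], "human": [4, 3, 3, 3]},
}


def mean_rating_gap(ratings):
    """Return, per dimension, the mean AI rating minus the mean human rating."""
    return {
        dimension: mean(by_source["ai"]) - mean(by_source["human"])
        for dimension, by_source in ratings.items()
    }


gaps = mean_rating_gap(hypothetical_ratings)
for dimension, gap in gaps.items():
    # A positive gap means the AI's evaluations were rated higher on that dimension.
    print(f"{dimension}: AI minus human = {gap:+.2f}")
```

A positive gap on a dimension corresponds to the AI being rated higher on it. A real analysis would also need a significance test and would have to account for repeated ratings from the same participant.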

However, when the participants were then tasked with identifying the source of each evaluation (human or AI), they performed significantly above chance levels. In this sense the AI did not pass the source-identification test: participants could tell the AI-generated evaluations apart from the human ones more often than guessing would allow, even though they rated the AI's moral reasoning as superior.
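One common way to check whether identification accuracy is "above chance" is an exact one-sided binomial test against a 50% guessing rate. The sketch below shows that idea, assuming two equally likely sources. The counts are hypothetical, the function name is invented for this example, and the test is not necessarily the one the authors used.

```python
from math import comb


def binomial_p_value_above_chance(correct: int, total: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= correct) for X ~ Binomial(total, chance)."""
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical counts, for illustration only (NOT data from the study).
hypothetical_correct = 70
hypothetical_total = 100

p = binomial_p_value_above_chance(hypothetical_correct, hypothetical_total)
accuracy = hypothetical_correct / hypothetical_total
print(f"accuracy = {accuracy:.2f}, one-sided p = {p:.2e}")
```

A small p-value means accuracy that high would be unlikely if participants were only guessing. This simple version treats every judgment as independent, which real data with several judgments per participant would violate.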

The researchers argue that participants could tell the sources apart not because the AI's moral reasoning was inferior but, potentially, because it was perceived as superior, among other possible explanations. This raises concerns that people may uncritically accept potentially harmful moral guidance from AI systems, highlighting the need for safeguards and oversight in this domain.

Critical Analysis

The researchers acknowledge several limitations and caveats in their study. First, they only used a single advanced language model, GPT-4, and it's unclear how their findings would generalize to other AI systems or future generations of language models. Additionally, the study was conducted with a representative sample of U.S. adults, and the results may differ in other cultural or demographic contexts.

The researchers also note that their modified Moral Turing Test did not fully capture the complexity of moral reasoning and decision-making. The test focused on the perceived quality of moral evaluations, but did not assess the participants' ability to engage in deeper moral reasoning or to apply moral principles in more nuanced scenarios. MoralBench and other benchmarks may provide a more comprehensive assessment of moral reasoning capabilities in AI.

Furthermore, it remains unsettled why the participants were able to identify the AI-generated evaluations despite rating them as superior in quality. The researchers point to the perceived superiority of the AI's responses as one possible explanation among others. It's also possible that the AI's responses had certain characteristics or patterns that made them distinguishable from human-generated evaluations, even if the overall quality was perceived as higher. Ethical studies of generative AI may provide further insights into this issue.

Ultimately, the study highlights the need for continued research and careful consideration of the societal implications of advanced AI systems, particularly in sensitive domains like morality. While the results suggest that people may be overly trusting of AI's moral guidance, more work is needed to understand the factors that influence human perceptions and the appropriate safeguards that should be put in place.

Conclusion

This study raises important questions about how people perceive and respond to moral evaluations made by advanced AI systems, such as GPT-4. The researchers found that people rated the AI's moral reasoning as superior to humans' on almost all dimensions, yet they could still identify the AI-generated evaluations at above-chance rates. This suggests that the AI can produce moral responses that people perceive as high quality, and it raises the concern that people may accept such guidance uncritically unless proper safeguards and oversight are in place.

The study's findings highlight the need for further research and thoughtful consideration of the potential risks and benefits of using advanced language models in matters of morality. As AI systems become increasingly capable of generating human-like moral evaluations, it will be crucial to ensure that their use is guided by ethical principles and that people maintain a critical and informed perspective on the role of AI in moral decision-making.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Attributions toward Artificial Agents in a modified Moral Turing Test

Eyal Aharoni, Sharlene Fernandes, Daniel J. Brady, Caelan Alexander, Michael Criner, Kara Queen, Javier Rando, Eddy Nahmias, Victor Crespo

Advances in artificial intelligence (AI) raise important questions about whether people view moral evaluations by AI systems similarly to human-generated moral evaluations. We conducted a modified Moral Turing Test (m-MTT), inspired by Allen and colleagues' (2000) proposal, by asking people to distinguish real human moral evaluations from those made by a popular advanced AI language model: GPT-4. A representative sample of 299 U.S. adults first rated the quality of moral evaluations when blinded to their source. Remarkably, they rated the AI's moral reasoning as superior in quality to humans' along almost all dimensions, including virtuousness, intelligence, and trustworthiness, consistent with passing what Allen and colleagues call the comparative MTT. Next, when tasked with identifying the source of each evaluation (human or computer), people performed significantly above chance levels. Although the AI did not pass this test, this was not because of its inferior moral reasoning but, potentially, its perceived superiority, among other possible explanations. The emergence of language models capable of producing moral responses perceived as superior in quality to humans' raises concerns that people may uncritically accept potentially harmful moral guidance from AI. This possibility highlights the need for safeguards around generative language models in matters of morality.

6/19/2024

Learning Machine Morality through Experience and Interaction

Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

Increasing interest in ensuring safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents. Traditionally, this has been done by imposing explicit top-down rules or hard constraints on systems, for example by filtering system outputs through pre-defined ethical rules. Recently, instead, entirely bottom-up methods for learning implicit preferences from human behavior have become increasingly popular, such as those for training and fine-tuning Large Language Models. In this paper, we provide a systematization of existing approaches to the problem of introducing morality in machines - modeled as a continuum, and argue that the majority of popular techniques lie at the extremes - either being fully hard-coded, or entirely learned, where no explicit statement of any moral principle is required. Given the relative strengths and weaknesses of each type of methodology, we argue that more hybrid solutions are needed to create adaptable and robust, yet more controllable and interpretable agents. In particular, we present three case studies of recent works which use learning from experience (i.e., Reinforcement Learning) to explicitly provide moral principles to learning agents - either as intrinsic rewards, moral logical constraints or textual principles for language models. For example, using intrinsic rewards in Social Dilemma games, we demonstrate how it is possible to represent classical moral frameworks for agents. We also present an overview of the existing work in this area in order to provide empirical evidence for the potential of this hybrid approach. We then discuss strategies for evaluating the effectiveness of moral learning agents. Finally, we present open research questions and implications for the future of AI safety and ethics which are emerging from this framework.

4/22/2024

GPT-4 is judged more human than humans in displaced and inverted Turing tests

Ishika Rathi, Sydney Taylor, Benjamin K. Bergen, Cameron R. Jones

Everyday AI detection requires differentiating between people and AI in informal, online conversations. In many cases, people will not interact directly with AI systems but instead read conversations between AI systems and other people. We measured how well people and large language models can discriminate using two modified versions of the Turing test: inverted and displaced. GPT-3.5, GPT-4, and displaced human adjudicators judged whether an agent was human or AI on the basis of a Turing test transcript. We found that both AI and displaced human judges were less accurate than interactive interrogators, with below chance accuracy overall. Moreover, all three judged the best-performing GPT-4 witness to be human more often than human witnesses. This suggests that both humans and current LLMs struggle to distinguish between the two when they are not actively interrogating the person, underscoring an urgent need for more accurate tools to detect AI in conversations.

7/15/2024

People cannot distinguish GPT-4 from a human in a Turing test

Cameron R. Jones, Benjamin K. Bergen

We evaluated 3 systems (ELIZA, GPT-3.5 and GPT-4) in a randomized, controlled, and preregistered Turing test. Human participants had a 5 minute conversation with either a human or an AI, and judged whether or not they thought their interlocutor was human. GPT-4 was judged to be a human 54% of the time, outperforming ELIZA (22%) but lagging behind actual humans (67%). The results provide the first robust empirical demonstration that any artificial system passes an interactive 2-player Turing test. The results have implications for debates around machine intelligence and, more urgently, suggest that deception by current AI systems may go undetected. Analysis of participants' strategies and reasoning suggests that stylistic and socio-emotional factors play a larger role in passing the Turing test than traditional notions of intelligence.

5/15/2024