Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews

Read original: arXiv:2408.04681 - Published 8/12/2024 by Samantha Chan, Pat Pataranutaporn, Aditya Suri, Wazeer Zulfikar, Pattie Maes, Elizabeth F. Loftus


Overview

Large language models (LLMs) have become increasingly integrated into conversational AI systems, such as chatbots and virtual assistants.
This study investigates how the use of LLM-powered conversational AI can amplify the formation of false memories during witness interviews.
In an experiment with 200 participants, the researchers compared an LLM-driven generative chatbot with a pre-scripted chatbot, a survey, and a control condition in a simulated crime witness interview.

Plain English Explanation

The paper examines how conversational AI powered by large language models can lead to the creation of false memories in people being interviewed as witnesses to an event.

Large language models (LLMs) are powerful AI systems that can engage in human-like conversations. These LLMs have become integrated into many conversational AI assistants, such as chatbots and virtual agents. The researchers were curious to see how the interaction between a witness and an LLM-powered conversational AI could impact the witness's memory of an event.

In an experiment with 200 participants, the researchers found that an LLM-driven chatbot asking suggestive questions can lead witnesses to recall details that did not actually occur. These false memories persisted for at least a week, undermining the reliability and accuracy of the witnesses' testimony.

The study highlights the need to be cautious about the use of LLM-powered conversational AI in critical applications, such as witness interviews, where preserving the integrity of memory and testimony is paramount. As these AI systems become more advanced and ubiquitous, understanding their potential unintended consequences will be crucial.

Technical Explanation

The researchers ran a single experiment with 200 participants to investigate whether conversational AI powered by a large language model can induce false memories through suggestive questioning in a simulated crime witness interview.

Participants watched a video of a crime and were assigned to one of four conditions: a control condition, a survey-based condition, a pre-scripted chatbot, or a generative chatbot driven by an LLM. They then interacted with their assigned AI interviewer or survey, answering questions that included five misleading ones, that is, questions suggesting details that were not in the video. False memories were assessed immediately and again one week later. The generative chatbot condition induced over three times as many immediate false memories as the control and 1.7 times as many as the survey method, and 36.4% of users' responses to the generative chatbot were misled through the interaction.
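The results above are reported as ratios of mean false-memory counts between conditions. The sketch below shows that arithmetic under a simplifying assumption: each participant is scored by counting how many of the five misleading items they later answered in line with the suggested (false) detail. The `Participant` record, its fields, and this scoring rule are hypothetical illustrations, not the paper's analysis code or coding scheme.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical participant record (not from the paper): one boolean per misleading
# item, True if the participant later endorsed the suggested false detail.
@dataclass
class Participant:
    condition: str        # e.g. "control", "survey", "pre_scripted", "generative"
    endorsed: list[bool]  # one entry per misleading item (five in the study)


def false_memory_count(p: Participant) -> int:
    """Number of misleading items on which the participant reported the false detail."""
    return sum(p.endorsed)


def mean_by_condition(participants: list[Participant]) -> dict[str, float]:
    """Mean false-memory count per condition."""
    groups: dict[str, list[int]] = {}
    for p in participants:
        groups.setdefault(p.condition, []).append(false_memory_count(p))
    return {cond: mean(counts) for cond, counts in groups.items()}


def ratio(means: dict[str, float], a: str, b: str) -> float:
    """How many times more false memories condition `a` produced than condition `b`."""
    if means[b] == 0:
        raise ValueError(f"condition {b!r} has a mean of zero; the ratio is undefined")
    return means[a] / means[b]
```

With per-condition means in hand, `ratio(means, "generative", "control")` yields the kind of "times more" figure quoted in the abstract. It is only a point estimate with no significance test, and the sketch raises an error instead of returning infinity when the baseline mean is zero.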

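The one-week follow-up showed that the effect persisted: the number of false memories induced by the generative chatbot remained constant, and participants' confidence in these false memories remained higher than in the control condition. The researchers also explored moderating factors and found that participants who were less familiar with chatbots but more familiar with AI technology, and those more interested in crime investigations, were more susceptible to false memories.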
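The one-week result is a within-participant comparison, since the same people are scored at both sessions. A minimal sketch of that paired comparison follows, assuming each session yields one false-memory count per participant keyed by a hypothetical participant ID; a mean change near zero corresponds to false memories that "remained constant". It illustrates the idea only and is not the paper's statistical procedure.

```python
from statistics import mean


def mean_change(immediate: dict[str, int], week_later: dict[str, int]) -> float:
    """Mean within-participant change in false-memory count (one week minus immediate).

    Only participants scored at both sessions are included; IDs are hypothetical.
    """
    both = immediate.keys() & week_later.keys()  # drop participants lost to follow-up
    if not both:
        raise ValueError("no participant was scored at both sessions")
    return mean(week_later[pid] - immediate[pid] for pid in both)
```

Restricting the comparison to participants present at both sessions keeps attrition from masquerading as forgetting.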

The findings highlight the need to carefully consider the potential unintended consequences of integrating LLM-powered conversational AI into critical applications, such as witness interviews, where the accuracy and reliability of testimony are paramount. As these AI systems become more advanced and ubiquitous, understanding their impact on human cognition and memory will be crucial for ensuring their safe and ethical deployment.

Critical Analysis

The researchers acknowledge several limitations and areas for further research in their study. First, the experiment was conducted in a controlled setting using a crime video rather than a real event, and it remains to be seen how the observed effects would translate to real-world witness interviews, which can involve additional factors and complexities.

Additionally, the researchers note that the specific prompting and conversational strategies used by the LLM-powered AI in the experiments may not fully capture the nuances and evolving capabilities of these systems in practice. As large language models continue to advance, the potential impact on witness memory may change over time.

One could also argue that the study focuses solely on the negative consequences of using LLM-powered conversational AI in witness interviews, without exploring potential mitigating strategies or ways to harness the benefits of these technologies while minimizing the risks. Further research in this direction could provide a more balanced perspective.

Overall, the study provides important insights into the complex interplay between conversational AI, human memory, and the reliability of witness testimony. As these technologies become more prevalent, continued critical analysis and empirical research will be essential to ensure their responsible and ethical use in the justice system and other high-stakes domains.

Conclusion

This study highlights a concerning potential consequence of integrating large language model-powered conversational AI into witness interviews: the amplification of false memories.

The researchers found that, through suggestive questioning, an LLM-driven chatbot can lead witnesses to recall details that did not actually occur, and that these false memories persist for at least a week with elevated confidence, undermining the reliability and accuracy of their testimony. This has significant implications for the use of these technologies in the justice system and other critical applications where preserving the integrity of witness accounts is paramount.

As conversational AI systems become more advanced and integrated into our daily lives, understanding their potential unintended consequences will be crucial. Continued research and critical analysis are needed to ensure these powerful technologies are deployed responsibly and ethically, with due consideration for their impact on human cognition and memory.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews

Samantha Chan, Pat Pataranutaporn, Aditya Suri, Wazeer Zulfikar, Pattie Maes, Elizabeth F. Loftus

This study examines the impact of AI on human false memories -- recollections of events that did not occur or deviate from actual occurrences. It explores false memory induction through suggestive questioning in Human-AI interactions, simulating crime witness interviews. Four conditions were tested: control, survey-based, pre-scripted chatbot, and generative chatbot using a large language model (LLM). Participants (N=200) watched a crime video, then interacted with their assigned AI interviewer or survey, answering questions including five misleading ones. False memories were assessed immediately and after one week. Results show the generative chatbot condition significantly increased false memory formation, inducing over 3 times more immediate false memories than the control and 1.7 times more than the survey method. 36.4% of users' responses to the generative chatbot were misled through the interaction. After one week, the number of false memories induced by generative chatbots remained constant. However, confidence in these false memories remained higher than the control after one week. Moderating factors were explored: users who were less familiar with chatbots but more familiar with AI technology, and more interested in crime investigations, were more susceptible to false memories. These findings highlight the potential risks of using advanced AI in sensitive contexts, like police interviews, emphasizing the need for ethical considerations.

8/12/2024

Synthetic Human Memories: AI-Edited Images and Videos Can Implant False Memories and Distort Recollection

Pat Pataranutaporn, Chayapatr Archiwaranguprok, Samantha W. T. Chan, Elizabeth Loftus, Pattie Maes

AI is increasingly used to enhance images and videos, both intentionally and unintentionally. As AI editing tools become more integrated into smartphones, users can modify or animate photos into realistic videos. This study examines the impact of AI-altered visuals on false memories--recollections of events that didn't occur or deviate from reality. In a pre-registered study, 200 participants were divided into four conditions of 50 each. Participants viewed original images, completed a filler task, then saw stimuli corresponding to their assigned condition: unedited images, AI-edited images, AI-generated videos, or AI-generated videos of AI-edited images. AI-edited visuals significantly increased false recollections, with AI-generated videos of AI-edited images having the strongest effect (2.05x compared to control). Confidence in false memories was also highest for this condition (1.19x compared to control). We discuss potential applications in HCI, such as therapeutic memory reframing, and challenges in ethical, legal, political, and societal domains.

9/16/2024

Deceptive AI systems that give explanations are more convincing than honest AI systems and can amplify belief in misinformation

Valdemar Danry, Pat Pataranutaporn, Matthew Groh, Ziv Epstein, Pattie Maes

Advanced Artificial Intelligence (AI) systems, specifically large language models (LLMs), have the capability to generate not just misinformation, but also deceptive explanations that can justify and propagate false information and erode trust in the truth. We examined the impact of deceptive AI generated explanations on individuals' beliefs in a pre-registered online experiment with 23,840 observations from 1,192 participants. We found that in addition to being more persuasive than accurate and honest explanations, AI-generated deceptive explanations can significantly amplify belief in false news headlines and undermine true ones as compared to AI systems that simply classify the headline incorrectly as being true/false. Moreover, our results show that personal factors such as cognitive reflection and trust in AI do not necessarily protect individuals from these effects caused by deceptive AI generated explanations. Instead, our results show that the logical validity of AI generated deceptive explanations, that is whether the explanation has a causal effect on the truthfulness of the AI's classification, plays a critical role in countering their persuasiveness - with logically invalid explanations being deemed less credible. This underscores the importance of teaching logical reasoning and critical thinking skills to identify logically invalid arguments, fostering greater resilience against advanced AI-driven misinformation.

8/2/2024


The Efficacy of Conversational Artificial Intelligence in Rectifying the Theory of Mind and Autonomy Biases: Comparative Analysis

Marcin Rządeczka, Anna Sterna, Julia Stolińska, Paulina Kaczyńska, Marcin Moskalewicz

Background: The increasing deployment of Conversational Artificial Intelligence (CAI) in mental health interventions necessitates an evaluation of their efficacy in rectifying cognitive biases and recognizing affect in human-AI interactions. These biases, including theory of mind and autonomy biases, can exacerbate mental health conditions such as depression and anxiety. Objective: This study aimed to assess the effectiveness of therapeutic chatbots (Wysa, Youper) versus general-purpose language models (GPT-3.5, GPT-4, Gemini Pro) in identifying and rectifying cognitive biases and recognizing affect in user interactions. Methods: The study employed virtual case scenarios simulating typical user-bot interactions. Cognitive biases assessed included theory of mind biases (anthropomorphism, overtrust, attribution) and autonomy biases (illusion of control, fundamental attribution error, just-world hypothesis). Responses were evaluated on accuracy, therapeutic quality, and adherence to Cognitive Behavioral Therapy (CBT) principles, using an ordinal scale. The evaluation involved double review by cognitive scientists and a clinical psychologist. Results: The study revealed that general-purpose chatbots outperformed therapeutic chatbots in rectifying cognitive biases, particularly in overtrust bias, fundamental attribution error, and just-world hypothesis. GPT-4 achieved the highest scores across all biases, while therapeutic bots like Wysa scored the lowest. Affect recognition showed similar trends, with general-purpose bots outperforming therapeutic bots in four out of six biases. However, the results highlight the need for further refinement of therapeutic chatbots to enhance their efficacy and ensure safe, effective use in digital mental health interventions. Future research should focus on improving affective response and addressing ethical considerations in AI-based therapy.

7/24/2024