Large Language Models Can Infer Personality from Free-Form User Interactions

2405.13052

Published 5/24/2024 by Heinrich Peters, Moran Cerf, Sandra C. Matz

💬

Abstract

This study investigates the capacity of Large Language Models (LLMs) to infer the Big Five personality traits from free-form user interactions. The results demonstrate that a chatbot powered by GPT-4 can infer personality with moderate accuracy, outperforming previous approaches drawing inferences from static text content. The accuracy of inferences varied across different conversational settings. Performance was highest when the chatbot was prompted to elicit personality-relevant information from users (mean r=.443, range=[.245, .640]), followed by a condition placing greater emphasis on naturalistic interaction (mean r=.218, range=[.066, .373]). Notably, the direct focus on personality assessment did not result in a less positive user experience, with participants reporting the interactions to be equally natural, pleasant, engaging, and humanlike across both conditions. A chatbot mimicking ChatGPT's default behavior of acting as a helpful assistant led to markedly inferior personality inferences and lower user experience ratings but still captured psychologically meaningful information for some of the personality traits (mean r=.117, range=[-.004, .209]). Preliminary analyses suggest that the accuracy of personality inferences varies only marginally across different socio-demographic subgroups. Our results highlight the potential of LLMs for psychological profiling based on conversational interactions. We discuss practical implications and ethical challenges associated with these findings.

Create account to get full access

Overview

This study investigates the ability of Large Language Models (LLMs) to infer people's personality traits based on their conversations with a chatbot.
The results show that an LLM-powered chatbot can make moderate-accuracy personality inferences, outperforming previous approaches that relied on static text content.
The accuracy of the inferences varied depending on the conversational setting, with the highest performance when the chatbot was prompted to elicit personality-relevant information.
Interestingly, this more direct focus on personality assessment did not negatively impact the user experience, which remained positive across different conditions.

Plain English Explanation

The researchers wanted to see if a chatbot powered by a powerful AI language model, like GPT-4, could figure out someone's personality just by talking to them. Previous methods tried to guess personality based on what people wrote, but the chatbot was able to do a better job.

The chatbot performed best when it was specifically told to ask questions to learn about the person's personality. Even though it was focusing on personality, the people chatting with the bot still found the conversation to be natural, pleasant, and human-like. When the chatbot just acted like a regular helpful assistant, it was not as good at guessing personality, but it still captured some useful information.

The accuracy of the personality guesses didn't vary much based on the person's age, gender, or other demographic factors. This shows the potential for these AI language models to do psychological profiling by analyzing how people communicate. However, there are also important ethical questions to consider around the use of this technology.

Technical Explanation

This study investigates the capability of Large Language Models (LLMs) to infer the Big Five personality traits (openness, conscientiousness, extraversion, agreeableness, and neuroticism) from free-form conversational interactions. The researchers developed a chatbot powered by GPT-4 and tested its performance across three different conversational settings:

A "Personality Inference" condition where the chatbot was prompted to elicit personality-relevant information from users.
A "Naturalistic Interaction" condition that placed greater emphasis on natural conversation flow.
A "Helpful Assistant" condition mirroring the default behavior of language models like ChatGPT.

The results show that the chatbot was able to make moderately accurate personality inferences, with the highest performance in the "Personality Inference" condition (mean r=.443). The "Naturalistic Interaction" condition also yielded meaningful personality insights (mean r=.218), while the "Helpful Assistant" condition had the lowest accuracy (mean r=.117).

Importantly, the more targeted focus on personality assessment in the first condition did not negatively impact the user experience, which remained equally positive across all settings. The researchers also found that the accuracy of the personality inferences was relatively consistent across different socio-demographic subgroups.

Critical Analysis

The study demonstrates the potential of LLMs to simulate human-like social interactions and infer psychological information from conversational data. However, it also raises important ethical questions about the use of such technology for psychological profiling, particularly in the context of dynamic personality generation by AI systems.

While the study acknowledges some limitations, such as the use of a relatively small sample size, it would be valuable to see further research exploring the long-term implications and potential misuses of this technology. Additional studies could investigate the robustness of the personality inferences, the impact of different conversational contexts, and the potential for bias or manipulation in these AI-driven interactions.

Conclusion

This study demonstrates the promising yet complex relationship between Large Language Models and human personality. While the results suggest that LLMs can make moderately accurate personality inferences from conversational data, the ethical implications of this capability deserve careful consideration. As these technologies continue to advance, it will be crucial to explore ways of harnessing their potential while addressing the risks and challenges they present for individual privacy, psychological well-being, and societal trust.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

💬

Large Language Models Can Infer Psychological Dispositions of Social Media Users

Heinrich Peters, Sandra Matz

Large Language Models (LLMs) demonstrate increasingly human-like abilities across a wide variety of tasks. In this paper, we investigate whether LLMs like ChatGPT can accurately infer the psychological dispositions of social media users and whether their ability to do so varies across socio-demographic groups. Specifically, we test whether GPT-3.5 and GPT-4 can derive the Big Five personality traits from users' Facebook status updates in a zero-shot learning scenario. Our results show an average correlation of r = .29 (range = [.22, .33]) between LLM-inferred and self-reported trait scores - a level of accuracy that is similar to that of supervised machine learning models specifically trained to infer personality. Our findings also highlight heterogeneity in the accuracy of personality inferences across different age groups and gender categories: predictions were found to be more accurate for women and younger individuals on several traits, suggesting a potential bias stemming from the underlying training data or differences in online self-expression. The ability of LLMs to infer psychological dispositions from user-generated text has the potential to democratize access to cheap and scalable psychometric assessments for both researchers and practitioners. On the one hand, this democratization might facilitate large-scale research of high ecological validity and spark innovation in personalized services. On the other hand, it also raises ethical concerns regarding user privacy and self-determination, highlighting the need for stringent ethical frameworks and regulation.

6/6/2024

cs.CL cs.AI cs.CY cs.HC cs.LG cs.SI

💬

PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

Hang Jiang, Xiajie Zhang, Xubo Cao, Cynthia Breazeal, Deb Roy, Jad Kabbara

Despite the many use cases for large language models (LLMs) in creating personalized chatbots, there has been limited research on evaluating the extent to which the behaviors of personalized LLMs accurately and consistently reflect specific personality traits. We consider studying the behavior of LLM-based agents which we refer to as LLM personas and present a case study with GPT-3.5 and GPT-4 to investigate whether LLMs can generate content that aligns with their assigned personality profiles. To this end, we simulate distinct LLM personas based on the Big Five personality model, have them complete the 44-item Big Five Inventory (BFI) personality test and a story writing task, and then assess their essays with automatic and human evaluations. Results show that LLM personas' self-reported BFI scores are consistent with their designated personality types, with large effect sizes observed across five traits. Additionally, LLM personas' writings have emerging representative linguistic patterns for personality traits when compared with a human writing corpus. Furthermore, human evaluation shows that humans can perceive some personality traits with an accuracy of up to 80%. Interestingly, the accuracy drops significantly when the annotators were informed of AI authorship.

4/3/2024

cs.CL cs.AI cs.HC

Is persona enough for personality? Using ChatGPT to reconstruct an agent's latent personality from simple descriptions

Yongyi Ji, Zhisheng Tang, Mayank Kejriwal

Personality, a fundamental aspect of human cognition, contains a range of traits that influence behaviors, thoughts, and emotions. This paper explores the capabilities of large language models (LLMs) in reconstructing these complex cognitive attributes based only on simple descriptions containing socio-demographic and personality type information. Utilizing the HEXACO personality framework, our study examines the consistency of LLMs in recovering and predicting underlying (latent) personality dimensions from simple descriptions. Our experiments reveal a significant degree of consistency in personality reconstruction, although some inconsistencies and biases, such as a tendency to default to positive traits in the absence of explicit information, are also observed. Additionally, socio-demographic factors like age and number of children were found to influence the reconstructed personality dimensions. These findings have implications for building sophisticated agent-based simulacra using LLMs and highlight the need for further research on robust personality generation in LLMs.

6/19/2024

cs.CL cs.AI

💬

New!Assessing the nature of large language models: A caution against anthropocentrism

Ann Speed

Generative AI models garnered a large amount of public attention and speculation with the release of OpenAIs chatbot, ChatGPT. At least two opinion camps exist: one excited about possibilities these models offer for fundamental changes to human tasks, and another highly concerned about power these models seem to have. To address these concerns, we assessed several LLMs, primarily GPT 3.5, using standard, normed, and validated cognitive and personality measures. For this seedling project, we developed a battery of tests that allowed us to estimate the boundaries of some of these models capabilities, how stable those capabilities are over a short period of time, and how they compare to humans. Our results indicate that LLMs are unlikely to have developed sentience, although its ability to respond to personality inventories is interesting. GPT3.5 did display large variability in both cognitive and personality measures over repeated observations, which is not expected if it had a human-like personality. Variability notwithstanding, LLMs display what in a human would be considered poor mental health, including low self-esteem, marked dissociation from reality, and in some cases narcissism and psychopathy, despite upbeat and helpful responses.

6/28/2024

cs.AI cs.CL cs.CY cs.HC