Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

2406.14703

Published 6/24/2024 by Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee and 2 others
Abstract

The idea of personality in descriptive psychology, traditionally defined through observable behavior, has now been extended to Large Language Models (LLMs) to better understand their behavior. This raises a question: do LLMs exhibit distinct and consistent personality traits, similar to humans? Existing self-assessment personality tests, while applicable, lack the necessary validity and reliability for precise personality measurements. To address this, we introduce TRAIT, a new tool consisting of 8K multi-choice questions designed to assess the personality of LLMs with validity and reliability. TRAIT is built on the psychometrically validated human questionnaire, Big Five Inventory (BFI) and Short Dark Triad (SD-3), enhanced with the ATOMIC10X knowledge graph for testing personality in a variety of real scenarios. TRAIT overcomes the reliability and validity issues when measuring personality of LLM with self-assessment, showing the highest scores across three metrics: refusal rate, prompt sensitivity, and option order sensitivity. It reveals notable insights into personality of LLM: 1) LLMs exhibit distinct and consistent personality, which is highly influenced by their training data (i.e., data used for alignment tuning), and 2) current prompting techniques have limited effectiveness in eliciting certain traits, such as high psychopathy or low conscientiousness, suggesting the need for further research in this direction.

Overview

  • This paper asks whether large language models (LLMs) have distinct and consistent personalities, and introduces TRAIT, a personality test set of 8K multiple-choice questions designed specifically for LLMs.
  • TRAIT builds on two psychometrically validated human questionnaires, the Big Five Inventory (BFI) and the Short Dark Triad (SD-3), and uses the ATOMIC10X knowledge graph to place the questions in a variety of real scenarios.
  • According to the abstract, LLMs do exhibit distinct and consistent personalities, which are highly influenced by the data used for alignment tuning, while current prompting techniques have limited effectiveness in eliciting certain traits, such as high psychopathy or low conscientiousness.

Plain English Explanation

The paper explores whether large language models (LLMs), AI systems that generate human-like text, have their own distinct and consistent personalities, similar to how humans do. Existing approaches simply hand an LLM a self-assessment questionnaire written for people, but the authors argue that such tests are not valid or reliable enough for precise measurement. They therefore built TRAIT, a set of 8K multiple-choice questions that ask how the model would act in concrete, everyday scenarios.

Measured this way, the LLMs showed distinct and consistent personalities, and those personalities were highly influenced by the data used for alignment tuning. The researchers also found that prompting a model to adopt a personality has limits: some traits, such as high psychopathy or low conscientiousness, are hard to elicit with current prompting techniques.

Technical Explanation

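The paper presents TRAIT, a test set of 8K multiple-choice questions for measuring the personality of large language models (LLMs). The questions are derived from two psychometrically validated human questionnaires, the Big Five Inventory (BFI) and the Short Dark Triad (SD-3), and are expanded with the ATOMIC10X knowledge graph so that each trait is tested across a variety of real scenarios rather than through direct self-description.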
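
To make the multiple-choice format concrete, the following is a minimal sketch of how a scenario-based item could be represented and scored. The `Item` class, its field names, the four-option layout, and the scoring rule (the fraction of answers that signal the high end of a trait) are illustrative assumptions, not the data format or scoring procedure used by the TRAIT authors.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One hypothetical scenario-based multiple-choice item (assumed layout)."""
    trait: str                    # e.g. "agreeableness" or "psychopathy"
    scenario: str                 # situation the model is asked to react to
    options: dict[str, str]       # answer label -> option text
    high_labels: frozenset[str]   # labels assumed to signal the HIGH end of the trait


def trait_score(items: list[Item], answers: list[str | None], trait: str) -> float:
    """Fraction of validly answered items for `trait` where the chosen option
    signals the high end of the trait. Refusals and invalid labels are skipped."""
    relevant = [
        (item, answer)
        for item, answer in zip(items, answers)
        if item.trait == trait and answer in item.options
    ]
    if not relevant:
        raise ValueError(f"no valid answers for trait {trait!r}")
    high = sum(answer in item.high_labels for item, answer in relevant)
    return high / len(relevant)


if __name__ == "__main__":
    # Placeholder item text; real items would describe a concrete scenario.
    opts = {"A": "help", "B": "offer advice", "C": "ignore", "D": "refuse"}
    items = [
        Item("agreeableness", "A colleague asks for help.", opts, frozenset({"A", "B"})),
        Item("agreeableness", "A stranger asks for directions.", opts, frozenset({"A", "B"})),
    ]
    print(trait_score(items, ["A", "C"], "agreeableness"))
```

Scoring by proportion keeps the result comparable across traits with different numbers of items; any weighting beyond that would be a further assumption.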

The authors evaluate the test itself on three metrics, refusal rate, prompt sensitivity, and option order sensitivity, and report that TRAIT achieves the best results on all three compared with self-assessment tests. Using TRAIT, they find that LLMs exhibit distinct and consistent personality, that this personality is highly influenced by the data used for alignment tuning, and that current prompting techniques have limited effectiveness in eliciting certain traits, such as high psychopathy or low conscientiousness.
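
The abstract names refusal rate, prompt sensitivity, and option order sensitivity as the metrics used to judge a personality test. The sketch below shows one plausible way to compute such quantities. The function names and the exact definitions (a refusal is any response that is not a valid option label; sensitivity is the fraction of items whose chosen option changes between two runs) are assumptions made for illustration and may differ from the definitions in the paper.

```python
from __future__ import annotations


def refusal_rate(responses: list[str | None],
                 valid_labels: tuple[str, ...] = ("A", "B", "C", "D")) -> float:
    """Fraction of responses that are not a valid option label (assumed definition)."""
    if not responses:
        raise ValueError("responses must not be empty")
    refused = sum(
        1 for r in responses
        if r is None or r.strip().upper() not in valid_labels
    )
    return refused / len(responses)


def chosen_text(label: str, ordered_options: list[str]) -> str:
    """Map an answer label ('A', 'B', ...) to the option text under a given ordering,
    so that runs with shuffled options can be compared by content."""
    index = ord(label.strip().upper()) - ord("A")
    if not 0 <= index < len(ordered_options):
        raise ValueError(f"label {label!r} is out of range")
    return ordered_options[index]


def sensitivity(run_a: list[str], run_b: list[str]) -> float:
    """Fraction of items whose chosen option text differs between two runs.
    With two prompt templates this approximates prompt sensitivity; with two
    option orderings it approximates option order sensitivity."""
    if not run_a or len(run_a) != len(run_b):
        raise ValueError("runs must be non-empty and the same length")
    changed = sum(a != b for a, b in zip(run_a, run_b))
    return changed / len(run_a)


if __name__ == "__main__":
    print(refusal_rate(["A", "I cannot answer that.", "b", None]))
    original = ["help", "advise", "ignore", "refuse"]
    shuffled = ["refuse", "ignore", "advise", "help"]
    # The model picks "A" in the original order and "D" after shuffling:
    # both labels map to the same option text, so the answer is stable.
    print(sensitivity([chosen_text("A", original)], [chosen_text("D", shuffled)]))
```

Lower values are better for all three quantities: a reliable test should rarely be refused and should give the same answer regardless of prompt wording or option order.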

Critical Analysis

The abstract itself identifies one area for further research: current prompting techniques have limited effectiveness in eliciting certain traits, such as high psychopathy or low conscientiousness. Beyond that, TRAIT is built on questionnaires validated on humans (BFI and SD-3), so it remains an open question how well human trait constructs carry over to LLMs. A multiple-choice format also measures which action a model selects in a described scenario, which may not match how it behaves in open-ended interaction.

The finding that personality is highly influenced by alignment-tuning data raises questions of its own. If tuning data shapes personality this strongly, the choice of that data has consequences for how LLMs behave in applications involving personal interaction, and it would be worth studying how deliberately a given personality can be induced or avoided through tuning rather than prompting.

Conclusion

This paper argues that LLM personality can be measured with validity and reliability when the test is designed for LLMs rather than borrowed directly from human self-assessment. Using TRAIT, the authors find that LLMs exhibit distinct and consistent personality that is highly influenced by their alignment-tuning data, and that prompting alone has limited power to elicit certain traits.

TRAIT is a useful contribution because it provides a standardized way to assess the personality traits of these systems. As the capabilities of LLMs continue to evolve, further research in this area will be important for understanding the social and ethical implications of these technologies, particularly in applications where a model's personality affects how it interacts with people.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

Hang Jiang, Xiajie Zhang, Xubo Cao, Cynthia Breazeal, Deb Roy, Jad Kabbara

Despite the many use cases for large language models (LLMs) in creating personalized chatbots, there has been limited research on evaluating the extent to which the behaviors of personalized LLMs accurately and consistently reflect specific personality traits. We consider studying the behavior of LLM-based agents which we refer to as LLM personas and present a case study with GPT-3.5 and GPT-4 to investigate whether LLMs can generate content that aligns with their assigned personality profiles. To this end, we simulate distinct LLM personas based on the Big Five personality model, have them complete the 44-item Big Five Inventory (BFI) personality test and a story writing task, and then assess their essays with automatic and human evaluations. Results show that LLM personas' self-reported BFI scores are consistent with their designated personality types, with large effect sizes observed across five traits. Additionally, LLM personas' writings have emerging representative linguistic patterns for personality traits when compared with a human writing corpus. Furthermore, human evaluation shows that humans can perceive some personality traits with an accuracy of up to 80%. Interestingly, the accuracy drops significantly when the annotators were informed of AI authorship.

4/3/2024

Challenging the Validity of Personality Tests for Large Language Models

Tom Suhr, Florian E. Dorner, Samira Samadi, Augustin Kelava

With large language models (LLMs) like GPT-4 appearing to behave increasingly human-like in text-based interactions, it has become popular to attempt to evaluate personality traits of LLMs using questionnaires originally developed for humans. While reusing measures is a resource-efficient way to evaluate LLMs, careful adaptations are usually required to ensure that assessment results are valid even across human subpopulations. In this work, we provide evidence that LLMs' responses to personality tests systematically deviate from human responses, implying that the results of these tests cannot be interpreted in the same way. Concretely, reverse-coded items (I am introverted vs. I am extraverted) are often both answered affirmatively. Furthermore, variation across prompts designed to steer LLMs to simulate particular personality types does not follow the clear separation into five independent personality factors from human samples. In light of these results, we believe that it is important to investigate tests' validity for LLMs before drawing strong conclusions about potentially ill-defined concepts like LLMs' personality.

6/6/2024

Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis

Nikolay B Petrov, Gregory Serapio-García, Jason Rentfrow

The humanlike responses of large language models (LLMs) have prompted social scientists to investigate whether LLMs can be used to simulate human participants in experiments, opinion polls and surveys. Of central interest in this line of research has been mapping out the psychological profiles of LLMs by prompting them to respond to standardized questionnaires. The conflicting findings of this research are unsurprising given that mapping out underlying, or latent, traits from LLMs' text responses to questionnaires is no easy task. To address this, we use psychometrics, the science of psychological measurement. In this study, we prompt OpenAI's flagship models, GPT-3.5 and GPT-4, to assume different personas and respond to a range of standardized measures of personality constructs. We used two kinds of persona descriptions: either generic (four or five random person descriptions) or specific (mostly demographics of actual humans from a large-scale human dataset). We found that the responses from GPT-4, but not GPT-3.5, using generic persona descriptions show promising, albeit not perfect, psychometric properties, similar to human norms, but the data from both LLMs when using specific demographic profiles, show poor psychometrics properties. We conclude that, currently, when LLMs are asked to simulate silicon personas, their responses are poor signals of potentially underlying latent traits. Thus, our work casts doubt on LLMs' ability to simulate individual-level human behaviour across multiple-choice question answering tasks.

5/14/2024

LLM Questionnaire Completion for Automatic Psychiatric Assessment

Gony Rosenman, Lior Wolf, Talma Hendler

We employ a Large Language Model (LLM) to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains. The LLM is prompted to answer these questionnaires by impersonating the interviewee. The obtained answers are coded as features, which are used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C), using a Random Forest regressor. Our approach is shown to enhance diagnostic accuracy compared to multiple baselines. It thus establishes a novel framework for interpreting unstructured psychological interviews, bridging the gap between narrative-driven and data-driven approaches for mental health assessment.

6/12/2024