Large Language Models Assume People are More Rational than We Really are

Read original: arXiv:2406.17055 - Published 7/31/2024 by Ryan Liu, Jiayi Geng, Joshua C. Peterson, Ilia Sucholutsky, Thomas L. Griffiths

Overview

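  • The paper tests whether the implicit models of human decision-making in large language models (LLMs) match how people actually choose, and finds that the models assume people are more rational than they really are.
  • When simulating and predicting people's choices, the tested models (GPT-4o, GPT-4 Turbo, Llama-3-8B, Llama-3-70B, and Claude 3 Opus) deviate from a large dataset of human decisions and align more closely with expected value theory, a classic model of rational choice.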
  • The authors propose that LLMs may need to be fine-tuned or designed with a more nuanced understanding of human psychology and decision-making biases.

Plain English Explanation

Large language models (LLMs) are very advanced artificial intelligence systems that can understand and generate human-like text. These models are trained on huge amounts of data, including millions of web pages, books, and other text sources.

One potential issue with LLMs is that they may assume people are more rational and logical in their decision-making than we actually are. Humans are often influenced by biases, emotions, and other factors that can lead to irrational choices. However, LLMs may not fully account for these human quirks when trying to mimic or predict human behavior.

This paper explores how an LLM's assumptions about human rationality could impact its performance on tasks that involve decision-making or reasoning. For example, if an LLM expects people to always choose the most logical option, it may struggle to accurately predict real-world choices that are influenced by things like cognitive biases.

The authors suggest that to be more effective, LLMs may need to be fine-tuned or designed with a deeper understanding of human psychology and the ways our decision-making can deviate from pure rationality. This could help the models better capture the nuances of human behavior and decision-making.

Technical Explanation

The paper examines how the assumptions that large language models (LLMs) make about human behavior can influence their performance on tasks involving decision-making and reasoning. LLMs are trained on vast amounts of human-generated text, which can lead them to overestimate people's rationality and logical decision-making capabilities.

The authors propose that LLMs may need to be fine-tuned or designed with a more nuanced understanding of human psychology and decision-making biases. Humans are often influenced by emotions, heuristics, and cognitive biases that lead to irrational choices, and the models' implicit picture of human decision-making may not fully account for these factors.

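To support their argument, the authors compare LLM behavior and predictions against a large dataset of human decisions, both when the models simulate people's choices and when they predict them. The models tested (GPT-4o, GPT-4 Turbo, Llama-3-8B, Llama-3-70B, and Claude 3 Opus) deviate from human behavior and align more closely with expected value theory, a classic model of rational choice. Using a second psychological dataset, the authors also find that the inferences LLMs draw from other people's decisions are highly correlated with the inferences people draw. The paper additionally reviews related work on LLM personas, Bayesian modeling of LLM predictions, and the limits of LLMs in predicting human choices and collective decision-making.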
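
Expected value theory, the rational-choice benchmark named in the paper's abstract, scores each option by its probability-weighted payoff and picks the highest. The sketch below only illustrates that idea and is not the authors' code: the gambles, the "observed" choices, and the function names (expected_value, rational_choice, agreement_rate) are all hypothetical.

```python
# Illustrative sketch only: the gambles and "observed" choices are made up,
# not taken from the paper's dataset.

def expected_value(gamble):
    """Probability-weighted payoff of a gamble given as (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in gamble)


def rational_choice(option_a, option_b):
    """Return the label an expected-value maximizer would pick (ties go to A)."""
    return "A" if expected_value(option_a) >= expected_value(option_b) else "B"


def agreement_rate(predicted, observed):
    """Fraction of problems where the predicted choice matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical problems: each is a pair (option A, option B).
problems = [
    ([(1.0, 30)], [(0.5, 70), (0.5, 0)]),  # sure 30 vs. a gamble worth 35 on average
    ([(1.0, 10)], [(0.5, 10), (0.5, 4)]),  # sure 10 vs. a gamble worth 7 on average
]
# Hypothetical human choices: a risk-averse person takes the sure 30 in problem 1.
observed = ["A", "A"]

predicted = [rational_choice(a, b) for a, b in problems]
print(predicted, agreement_rate(predicted, observed))
```

In this toy setup the expected-value rule disagrees with the risk-averse choice on the first problem. A gap of that kind, measured over a large dataset of real decisions, is what the paper reports.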

Critical Analysis

The paper raises valid concerns about the potential mismatch between LLMs' assumptions about human behavior and the reality of how people actually make decisions. The authors provide a solid overview of related research that supports their argument, such as the work on modeling LLM personas and limitations in predicting human choices.

However, the paper could have delved deeper into the specific cognitive biases, heuristics, and psychological factors that contribute to irrational human decision-making. A more thorough exploration of these concepts and their implications for LLM design and performance could have strengthened the analysis.

On the empirical side, the abstract reports that the authors compare LLM simulations and predictions against a large dataset of human decisions, and test the models' inferences against a second psychological dataset, using GPT-4o, GPT-4 Turbo, Llama-3-8B, Llama-3-70B, and Claude 3 Opus. The argument therefore rests on experimental evidence rather than on theory alone.

Overall, the paper raises an important issue that deserves further investigation. As LLMs continue to be deployed in real-world applications, it will be crucial to ensure they account for the nuances of human behavior and decision-making, rather than relying solely on assumptions of rationality.

Conclusion

This paper highlights a potential limitation of large language models (LLMs) - their tendency to overestimate the rationality and logical decision-making capabilities of humans. The authors argue that this mismatch between LLM assumptions and actual human behavior can negatively impact the models' performance on tasks involving reasoning and decision-making.

To address this issue, the authors suggest that LLMs may need to be fine-tuned or designed with a more nuanced understanding of human psychology and the various biases, heuristics, and other factors that can lead to irrational choices. By incorporating a more realistic view of human decision-making, LLMs could potentially become more effective at tasks that require understanding and predicting real-world human behavior.

As LLMs continue to be deployed in a wide range of applications, it will be crucial for developers and researchers to carefully consider the implications of these models' assumptions about human rationality. Incorporating a deeper understanding of human psychology and decision-making biases may be key to unlocking the full potential of large language models in real-world settings.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Large Language Models Assume People are More Rational than We Really are

Ryan Liu, Jiayi Geng, Joshua C. Peterson, Ilia Sucholutsky, Thomas L. Griffiths

In order for AI systems to communicate effectively with people, they must understand how we make decisions. However, people's decisions are not always rational, so the implicit internal models of human decision-making in Large Language Models (LLMs) must account for this. Previous empirical evidence seems to suggest that these implicit models are accurate -- LLMs offer believable proxies of human behavior, acting how we expect humans would in everyday interactions. However, by comparing LLM behavior and predictions to a large dataset of human decisions, we find that this is actually not the case: when both simulating and predicting people's choices, a suite of cutting-edge LLMs (GPT-4o & 4-Turbo, Llama-3-8B & 70B, Claude 3 Opus) assume that people are more rational than we really are. Specifically, these models deviate from human behavior and align more closely with a classic model of rational choice -- expected value theory. Interestingly, people also tend to assume that other people are rational when interpreting their behavior. As a consequence, when we compare the inferences that LLMs and people draw from the decisions of others using another psychological dataset, we find that these inferences are highly correlated. Thus, the implicit decision-making models of LLMs appear to be aligned with the human expectation that other people will act rationally, rather than with how people actually act.

7/31/2024

Are Large Language Models Aligned with People's Social Intuitions for Human-Robot Interactions?

Lennart Wachowiak, Andrew Coles, Oya Celiktutan, Gerard Canal

Large language models (LLMs) are increasingly used in robotics, especially for high-level action planning. Meanwhile, many robotics applications involve human supervisors or collaborators. Hence, it is crucial for LLMs to generate socially acceptable actions that align with people's preferences and values. In this work, we test whether LLMs capture people's intuitions about behavior judgments and communication preferences in human-robot interaction (HRI) scenarios. For evaluation, we reproduce three HRI user studies, comparing the output of LLMs with that of real participants. We find that GPT-4 strongly outperforms other models, generating answers that correlate strongly with users' answers in two studies: the first study dealing with selecting the most appropriate communicative act for a robot in various situations ($r_s$ = 0.82), and the second with judging the desirability, intentionality, and surprisingness of behavior ($r_s$ = 0.83). However, for the last study, testing whether people judge the behavior of robots and humans differently, no model achieves strong correlations. Moreover, we show that vision models fail to capture the essence of video stimuli and that LLMs tend to rate different communicative acts and behavior desirability higher than people.

7/10/2024

Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning

Thuy Ngoc Nguyen, Kasturi Jamale, Cleotilde Gonzalez

Large Language Models (LLMs) have demonstrated their capabilities across various tasks, from language translation to complex reasoning. Understanding and predicting human behavior and biases are crucial for artificial intelligence (AI) assisted systems to provide useful assistance, yet it remains an open question whether these models can achieve this. This paper addresses this gap by leveraging the reasoning and generative capabilities of the LLMs to predict human behavior in two sequential decision-making tasks. These tasks involve balancing between exploitative and exploratory actions and handling delayed feedback, both essential for simulating real-life decision processes. We compare the performance of LLMs with a cognitive instance-based learning (IBL) model, which imitates human experiential decision-making. Our findings indicate that LLMs excel at rapidly incorporating feedback to enhance prediction accuracy. In contrast, the cognitive IBL model better accounts for human exploratory behaviors and effectively captures loss aversion bias, i.e., the tendency to choose a sub-optimal goal with fewer step-cost penalties rather than exploring to find the optimal choice, even with limited experience. The results highlight the benefits of integrating LLMs with cognitive architectures, suggesting that this synergy could enhance the modeling and understanding of complex human decision-making patterns.

7/15/2024

Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function

Keyon Vafa, Ashesh Rambachan, Sendhil Mullainathan

What makes large language models (LLMs) impressive is also what makes them hard to evaluate: their diversity of uses. To evaluate these models, we must understand the purposes they will be used for. We consider a setting where these deployment decisions are made by people, and in particular, people's beliefs about where an LLM will perform well. We model such beliefs as the consequence of a human generalization function: having seen what an LLM gets right or wrong, people generalize to where else it might succeed. We collect a dataset of 19K examples of how humans make generalizations across 79 tasks from the MMLU and BIG-Bench benchmarks. We show that the human generalization function can be predicted using NLP methods: people have consistent structured ways to generalize. We then evaluate LLM alignment with the human generalization function. Our results show that -- especially for cases where the cost of mistakes is high -- more capable models (e.g. GPT-4) can do worse on the instances people choose to use them for, exactly because they are not aligned with the human generalization function.

6/4/2024