Analyzing Narrative Processing in Large Language Models (LLMs): Using GPT4 to test BERT

arXiv:2405.02024

Published 5/6/2024 by Patrick Krauss, Jannik Hosch, Claus Metzner, Andreas Maier, Peter Uhrig, Achim Schilling

Abstract

The ability to transmit and receive complex information via language is unique to humans and is the basis of traditions, culture and versatile social interactions. Through the disruptive introduction of transformer based large language models (LLMs) humans are not the only entity to understand and produce language any more. In the present study, we have performed the first steps to use LLMs as a model to understand fundamental mechanisms of language processing in neural networks, in order to make predictions and generate hypotheses on how the human brain does language processing. Thus, we have used ChatGPT to generate seven different stylistic variations of ten different narratives (Aesop's fables). We used these stories as input for the open source LLM BERT and have analyzed the activation patterns of the hidden units of BERT using multi-dimensional scaling and cluster analysis. We found that the activation vectors of the hidden units cluster according to stylistic variations in earlier layers of BERT (1) than narrative content (4-5). Despite the fact that BERT consists of 12 identical building blocks that are stacked and trained on large text corpora, the different layers perform different tasks. This is a very useful model of the human brain, where self-similar structures, i.e. different areas of the cerebral cortex, can have different functions and are therefore well suited to processing language in a very efficient way. The proposed approach has the potential to open the black box of LLMs on the one hand, and might be a further step to unravel the neural processes underlying human language processing and cognition in general.

Overview

This paper explores using large language models (LLMs) as a model system for generating hypotheses about how the human brain processes language.
The researchers used ChatGPT to generate different stylistic variations of Aesop's fables, and then analyzed the activation patterns of the hidden units in the BERT LLM when processing these stories.
The key finding is that the different layers of BERT perform different tasks: activations cluster by style in layer 1 and by narrative content in layers 4-5. The authors compare this to how different areas of the human cerebral cortex have specialized functions for language processing.
According to the authors, this makes BERT a useful model for studying the neural processes underlying human language and cognition.

Plain English Explanation

Humans have a unique ability to communicate complex information through language, which forms the basis of our culture, traditions, and social interactions. However, with the introduction of powerful large language models (LLMs) like ChatGPT, humans are no longer the only entities that can understand and produce language.

In this study, the researchers used ChatGPT to create different versions of Aesop's fables, each with a distinct style. They then fed these stories into the BERT LLM and analyzed how the different layers of BERT responded to the stylistic variations versus the narrative content.

The key finding is that the various layers of BERT performed different tasks, much like how different regions of the human brain's cerebral cortex have specialized functions for language processing. This suggests that LLMs could be a useful model for understanding the neural mechanisms underlying human language and cognition.

By peering into the "black box" of LLMs, the researchers hope to gain insights that could help unravel the complexities of how the human brain processes and understands language.

Technical Explanation

The researchers used ChatGPT to generate seven different stylistic variations of ten different narratives (Aesop's fables). They then used these stories as input for the open-source LLM BERT and analyzed the activation patterns of the hidden units in BERT using multi-dimensional scaling and cluster analysis.
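As a rough illustration of the multi-dimensional scaling step, the sketch below projects a set of activation vectors to two dimensions. It rests on several assumptions that are not in the source: classical (Torgerson) MDS is used as one simple variant (the summary does not say which variant the authors used), the helper name `classical_mds` is hypothetical, and the activations are synthetic stand-ins (70 vectors of size 768, mimicking 7 styles x 10 stories) rather than real BERT outputs.

```python
import numpy as np

def classical_mds(X, n_components=2):
    """Embed the rows of X in n_components dimensions (classical MDS)."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances between all pairs of rows.
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    # Double-center the squared distances to obtain a Gram matrix.
    n = X.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Synthetic stand-in for one layer's activations (NOT real BERT data):
# 7 hypothetical style clusters with 10 texts each, 768 dimensions.
rng = np.random.default_rng(0)
styles = np.repeat(np.arange(7), 10)
style_centers = rng.normal(size=(7, 768))
activations = style_centers[styles] + 0.1 * rng.normal(size=(70, 768))

embedding = classical_mds(activations)
print(embedding.shape)  # (70, 2)
```

Plotting `embedding` colored by style and then by story, once per layer, is one way such a projection could be inspected for clusters.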

The activation vectors of BERT's hidden units clustered by stylistic variation as early as layer 1, whereas clustering by narrative content emerged only in layers 4-5. Style is therefore separated at an earlier processing stage than content, which suggests that the different layers of BERT perform distinct tasks, similar to how different areas of the human cerebral cortex have specialized functions for language processing.
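One way to make "clusters by style" versus "clusters by content" concrete is to score how well a given labeling separates the activation vectors. The sketch below is an assumption-laden illustration, not the authors' cluster analysis: `separation_score` is a hypothetical helper (mean between-class distance divided by mean within-class distance), and the two "layers" are synthetic vectors built so that style dominates one and story content dominates the other.

```python
import numpy as np

def separation_score(X, labels):
    """Mean between-class distance divided by mean within-class distance.

    Values well above 1 indicate that the labeling forms distinct clusters.
    """
    labels = np.asarray(labels)
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diffs ** 2, axis=-1))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = dist[same & off_diag].mean()
    between = dist[~same].mean()
    return between / within

# Labels for 70 texts: 7 styles x 10 stories, as in the study design.
styles = np.repeat(np.arange(7), 10)
stories = np.tile(np.arange(10), 7)

# Synthetic stand-ins for two layers (NOT real BERT activations).
rng = np.random.default_rng(0)
dim = 32
style_dirs = rng.normal(size=(7, dim))
story_dirs = rng.normal(size=(10, dim))
noise = 0.1 * rng.normal(size=(70, dim))
early_like = 3.0 * style_dirs[styles] + 0.3 * story_dirs[stories] + noise
deeper_like = 0.3 * style_dirs[styles] + 3.0 * story_dirs[stories] + noise

for name, acts in [("early-like", early_like), ("deeper-like", deeper_like)]:
    print(name,
          "style:", round(float(separation_score(acts, styles)), 2),
          "story:", round(float(separation_score(acts, stories)), 2))
```

Applied to real per-layer activations, a score like this could be computed for both labelings at every layer to see where each kind of clustering first appears.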

BERT consists of 12 identical building blocks that are stacked and trained on large text corpora. Even so, its layers take on different roles, which the authors compare to the human brain, where self-similar structures (different areas of the cerebral cortex) can serve different functions.

Critical Analysis

The researchers acknowledge that their approach is just a first step towards using LLMs as a model for understanding human language processing and cognition. While the results provide an intriguing parallel between LLM architecture and the human brain, more research is needed to fully validate this comparison and explore its implications.

One potential limitation is that the study only used a single LLM (BERT) and a limited set of narratives (Aesop's fables). Expanding the analysis to a wider range of LLMs and text genres could help strengthen the generalizability of the findings.

Additionally, the researchers did not directly compare the LLM's language processing to actual human behavior or neurocognitive data. Integrating these empirical measurements could help solidify the links between LLM architecture and human language abilities.

Overall, the proposed approach has the potential to open new avenues for understanding the inner workings of LLMs and their relationship to human cognition. However, further research is needed to fully validate and build upon these initial insights.

Conclusion

This study takes a first step towards using large language models as a tool for shedding light on the fundamental mechanisms of human language processing and cognition. By analyzing how BERT responds to stylistic variations in narratives, the researchers found parallels between the functional specialization of the model's layers and the specialized functions of different regions in the human brain.

While more work is needed to fully validate and extend these findings, this research opens up exciting new possibilities for using LLMs as a model system to generate hypotheses and make predictions about the neural underpinnings of our unique human ability to understand and produce language. Ultimately, this could lead to a better understanding of cognition and potentially new insights that benefit both artificial intelligence and our knowledge of the brain.

Related Papers

PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

Hang Jiang, Xiajie Zhang, Xubo Cao, Cynthia Breazeal, Deb Roy, Jad Kabbara

Despite the many use cases for large language models (LLMs) in creating personalized chatbots, there has been limited research on evaluating the extent to which the behaviors of personalized LLMs accurately and consistently reflect specific personality traits. We consider studying the behavior of LLM-based agents which we refer to as LLM personas and present a case study with GPT-3.5 and GPT-4 to investigate whether LLMs can generate content that aligns with their assigned personality profiles. To this end, we simulate distinct LLM personas based on the Big Five personality model, have them complete the 44-item Big Five Inventory (BFI) personality test and a story writing task, and then assess their essays with automatic and human evaluations. Results show that LLM personas' self-reported BFI scores are consistent with their designated personality types, with large effect sizes observed across five traits. Additionally, LLM personas' writings have emerging representative linguistic patterns for personality traits when compared with a human writing corpus. Furthermore, human evaluation shows that humans can perceive some personality traits with an accuracy of up to 80%. Interestingly, the accuracy drops significantly when the annotators were informed of AI authorship.

4/3/2024

cs.CL cs.AI cs.HC

Utilizing Large Language Models to Generate Synthetic Data to Increase the Performance of BERT-Based Neural Networks

Chancellor R. Woolsey, Prakash Bisht, Joshua Rothman, Gondy Leroy

An important issue impacting healthcare is a lack of available experts. Machine learning (ML) models could resolve this by aiding in diagnosing patients. However, creating datasets large enough to train these models is expensive. We evaluated large language models (LLMs) for data creation. Using Autism Spectrum Disorders (ASD), we prompted ChatGPT and GPT-Premium to generate 4,200 synthetic observations to augment existing medical data. Our goal is to label behaviors corresponding to autism criteria and improve model accuracy with synthetic training data. We used a BERT classifier pre-trained on biomedical literature to assess differences in performance between models. A random sample (N=140) from the LLM-generated data was evaluated by a clinician and found to contain 83% correct example-label pairs. Augmenting data increased recall by 13% but decreased precision by 16%, correlating with higher quality and lower accuracy across pairs. Future work will analyze how different synthetic data traits affect ML outcomes.

5/14/2024

cs.CL cs.AI

Aspects of human memory and Large Language Models

Romuald A. Janik

Large Language Models (LLMs) are huge artificial neural networks which primarily serve to generate text, but also provide a very sophisticated probabilistic model of language use. Since generating a semantically consistent text requires a form of effective memory, we investigate the memory properties of LLMs and find surprising similarities with key characteristics of human memory. We argue that the human-like memory properties of the Large Language Model do not follow automatically from the LLM architecture but are rather learned from the statistics of the training textual data. These results strongly suggest that the biological features of human memory leave an imprint on the way that we structure our textual narratives.

4/9/2024

cs.CL cs.AI cs.LG

A Case Study of Large Language Models (ChatGPT and CodeBERT) for Security-Oriented Code Analysis

Zhilong Wang, Lan Zhang, Chen Cao, Nanqing Luo, Peng Liu

LLMs can be used on code analysis tasks like code review, vulnerability analysis, etc. However, the strengths and limitations of adopting these LLMs for code analysis are still unclear. In this paper, we delve into LLMs' capabilities in security-oriented program analysis, considering perspectives from both attackers and security analysts. We focus on two representative LLMs, ChatGPT and CodeBert, and evaluate their performance in solving typical analytic tasks with varying levels of difficulty. Our study demonstrates the LLM's efficiency in learning high-level semantics from code, positioning ChatGPT as a potential asset in security-oriented contexts. However, it is essential to acknowledge certain limitations, such as the heavy reliance on well-defined variable and function names, making them unable to learn from anonymized code. We believe that the concerns raised in this case study deserve in-depth investigation in the future.

5/3/2024

cs.CR cs.AI