Navigating Brain Language Representations: A Comparative Analysis of Neural Language Models and Psychologically Plausible Models

2404.19364

Published 5/1/2024 by Yunhao Zhang, Shaonan Wang, Xinyi Dong, Jiajun Yu, Chengqing Zong

Navigating Brain Language Representations: A Comparative Analysis of Neural Language Models and Psychologically Plausible Models

Abstract

Neural language models, particularly large-scale ones, have been consistently proven to be most effective in predicting brain neural activity across a range of studies. However, previous research overlooked the comparison of these models with psychologically plausible ones. Moreover, evaluations were reliant on limited, single-modality, and English cognitive datasets. To address these questions, we conducted an analysis comparing encoding performance of various neural language models and psychologically plausible models. Our study utilized extensive multi-modal cognitive datasets, examining bilingual word and discourse levels. Surprisingly, our findings revealed that psychologically plausible models outperformed neural language models across diverse contexts, encompassing different modalities such as fMRI and eye-tracking, and spanning languages from English to Chinese. Among psychologically plausible models, the one incorporating embodied information emerged as particularly exceptional. This model demonstrated superior performance at both word and discourse levels, exhibiting robust prediction of brain activation across numerous regions in both English and Chinese.

Create account to get full access

Overview

This paper compares the language representations of neural language models and psychologically plausible models to better understand how the human brain processes language.
The authors examine the similarities and differences between these two types of models, providing insights into the mechanisms underlying human language processing.
They use a range of experiments and analyses to assess the strengths and limitations of each approach, with the goal of informing the development of more human-like language models.

Plain English Explanation

The paper explores how well modern artificial intelligence (AI) language models, like those used in chatbots and translation tools, capture the way humans actually process and understand language. The researchers compare these AI language models to more psychologically realistic models that try to mimic the way the human brain works when processing language.

By looking at the similarities and differences between these two types of language models, the authors hope to gain a better understanding of the cognitive processes involved in human language use. This could help improve the development of AI systems that can communicate more like humans do.

The researchers use a variety of experiments and analyses to assess the strengths and weaknesses of each approach. They're interested in understanding which aspects of language the AI models capture well, and where they fall short compared to the psychologically plausible models.

Ultimately, the goal is to use these insights to create AI language models that are more in tune with how humans actually process and understand language. This could lead to AI assistants and language tools that feel more natural and human-like in their interactions.

Technical Explanation

The paper compares the language representations of neural language models and psychologically plausible models. Neural language models, such as those used in modern chatbots and translation tools, are trained on vast amounts of text data to generate human-like language. In contrast, psychologically plausible models aim to capture the cognitive processes involved in human language processing, as seen in studies of how humans reason with language.

The authors use a range of experiments and analyses to assess the similarities and differences between these two approaches. For example, they examine how the models perform on tasks that test logical reasoning, and how the internal representations of the models compare to neuroimaging data on human language processing.

The findings suggest that while neural language models can generate fluent and coherent text, they may not fully capture the deeper cognitive mechanisms underlying human language use. The psychologically plausible models, on the other hand, can provide valuable insights into the ways the brain processes and understands language.

Critical Analysis

The paper acknowledges several limitations and areas for further research. For example, the psychologically plausible models used in the study are still relatively simplistic compared to the complexity of the human brain. Additionally, the experiments and analyses conducted may not fully capture the nuances of real-world language use.

One potential concern is that the study focuses primarily on formal logical reasoning tasks, which may not fully reflect the diverse ways in which humans use and understand language in daily life. It would be valuable to explore a wider range of language phenomena to get a more comprehensive understanding of the strengths and limitations of each approach.

Furthermore, the paper does not delve deeply into the potential biases or ethical considerations that may arise from the use of AI language models, which is an important area of concern as these technologies become more widely deployed.

Overall, the research presented in this paper makes a valuable contribution to the ongoing effort to develop AI systems that can more effectively engage with human language. However, further work is needed to fully bridge the gap between artificial and human language processing.

Conclusion

This paper provides a comparative analysis of neural language models and psychologically plausible models, shedding light on the strengths and limitations of each approach in capturing the complexities of human language processing. The findings suggest that while neural language models can generate fluent text, they may not fully capture the deeper cognitive mechanisms underlying human language use.

The psychologically plausible models, on the other hand, can offer valuable insights into the ways the brain processes and understands language. By combining these two perspectives, the authors hope to inform the development of more human-like AI language models that can better engage with and understand natural human language.

As AI language technologies continue to advance, it will be crucial to consider not only their technical capabilities, but also their alignment with the cognitive and psychological realities of human language use. This paper represents an important step in that direction, and its insights may have far-reaching implications for the field of natural language processing and its applications in various domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores

Ebrahim Feghhi, Nima Hadidi, Bryan Song, Idan A. Blank, Jonathan C. Kao

Given the remarkable capabilities of large language models (LLMs), there has been a growing interest in evaluating their similarity to the human brain. One approach towards quantifying this similarity is by measuring how well a model predicts neural signals, also called brain score. Internal representations from LLMs achieve state-of-the-art brain scores, leading to speculation that they share computational principles with human language processing. This inference is only valid if the subset of neural activity predicted by LLMs reflects core elements of language processing. Here, we question this assumption by analyzing three neural datasets used in an impactful study on LLM-to-brain mappings, with a particular focus on an fMRI dataset where participants read short passages. We first find that when using shuffled train-test splits, as done in previous studies with these datasets, a trivial feature that encodes temporal autocorrelation not only outperforms LLMs but also accounts for the majority of neural variance that LLMs explain. We therefore use contiguous splits moving forward. Second, we explain the surprisingly high brain scores of untrained LLMs by showing they do not account for additional neural variance beyond two simple features: sentence length and sentence position. This undermines evidence used to claim that the transformer architecture biases computations to be more brain-like. Third, we find that brain scores of trained LLMs on this dataset can largely be explained by sentence length, position, and pronoun-dereferenced static word embeddings; a small, additional amount is explained by sense-specific embeddings and contextual representations of sentence structure. We conclude that over-reliance on brain scores can lead to over-interpretations of similarity between LLMs and brains, and emphasize the importance of deconstructing what LLMs are mapping to in neural signals.

6/24/2024

cs.CL cs.AI

Large language models surpass human experts in predicting neuroscience results

Xiaoliang Luo, Akilles Rechardt, Guangzhi Sun, Kevin K. Nejad, Felipe Y'a~nez, Bati Yilmaz, Kangjoo Lee, Alexandra O. Cohen, Valentina Borghesani, Anton Pashkov, Daniele Marinazzo, Jonathan Nicholas, Alessandro Salatiello, Ilia Sucholutsky, Pasquale Minervini, Sepehr Razavi, Roberta Rocca, Elkhan Yusifov, Tereza Okalova, Nianlong Gu, Martin Ferianc, Mikail Khona, Kaustubh R. Patil, Pui-Shee Lee, Rui Mata, Nicholas E. Myers, Jennifer K Bizley, Sebastian Musslick, Isil Poyraz Bilgin, Guiomar Niso, Justin M. Ales, Michael Gaebler, N Apurva Ratan Murty, Leyla Loued-Khenissi, Anna Behler, Chloe M. Hall, Jessica Dafflon, Sherry Dongqi Bao, Bradley C. Love

Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs were confident in their predictions, they were more likely to be correct, which presages a future where humans and LLMs team together to make discoveries. Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors.

6/24/2024

cs.AI

Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network

Badr AlKhamissi, Greta Tuckute, Antoine Bosselut, Martin Schrimpf

Large Language Models (LLMs) have been shown to be effective models of the human language system, with some models predicting most explainable variance of brain activity in current datasets. Even in untrained models, the representations induced by architectural priors can exhibit reasonable alignment to brain data. In this work, we investigate the key architectural components driving the surprising alignment of untrained models. To estimate LLM-to-brain similarity, we first select language-selective units within an LLM, similar to how neuroscientists identify the language network in the human brain. We then benchmark the brain alignment of these LLM units across five different brain recording datasets. By isolating critical components of the Transformer architecture, we identify tokenization strategy and multihead attention as the two major components driving brain alignment. A simple form of recurrence further improves alignment. We further demonstrate this quantitative brain alignment of our model by reproducing landmark studies in the language neuroscience field, showing that localized model units -- just like language voxels measured empirically in the human brain -- discriminate more reliably between lexical than syntactic differences, and exhibit similar response profiles under the same experimental conditions. Finally, we demonstrate the utility of our model's representations for language modeling, achieving improved sample and parameter efficiency over comparable architectures. Our model's estimates of surprisal sets a new state-of-the-art in the behavioral alignment to human reading times. Taken together, we propose a highly brain- and behaviorally-aligned model that conceptualizes the human language system as an untrained shallow feature encoder, with structural priors, combined with a trained decoder to achieve efficient and performant language processing.

6/24/2024

cs.CL cs.LG

fMRI predictors based on language models of increasing complexity recover brain left lateralization

Laurent Bonnasse-Gahot, Christophe Pallier

Over the past decade, studies of naturalistic language processing where participants are scanned while listening to continuous text have flourished. Using word embeddings at first, then large language models, researchers have created encoding models to analyze the brain signals. Presenting these models with the same text as the participants allows to identify brain areas where there is a significant correlation between the functional magnetic resonance imaging (fMRI) time series and the ones predicted by the models' artificial neurons. One intriguing finding from these studies is that they have revealed highly symmetric bilateral activation patterns, somewhat at odds with the well-known left lateralization of language processing. Here, we report analyses of an fMRI dataset where we manipulate the complexity of large language models, testing 28 pretrained models from 8 different families, ranging from 124M to 14.2B parameters. First, we observe that the performance of models in predicting brain responses follows a scaling law, where the fit with brain activity increases linearly with the logarithm of the number of parameters of the model (and its performance on natural language processing tasks). Second, we show that a left-right asymmetry gradually appears as model size increases, and that the difference in left-right brain correlations also follows a scaling law. Whereas the smallest models show no asymmetry, larger models fit better and better left hemispheric activations than right hemispheric ones. This finding reconciles computational analyses of brain activity using large language models with the classic observation from aphasic patients showing left hemisphere dominance for language.

5/29/2024

cs.CL cs.AI