Speech language models lack important brain-relevant semantics

arXiv:2311.04664

Published 6/18/2024 by Subba Reddy Oota, Emin Çelik, Fatma Deniz, Mariya Toneva

Abstract

Despite known differences between reading and listening in the brain, recent work has shown that text-based language models predict both text-evoked and speech-evoked brain activity to an impressive degree. This poses the question of what types of information language models truly predict in the brain. We investigate this question via a direct approach, in which we systematically remove specific low-level stimulus features (textual, speech, and visual) from language model representations to assess their impact on alignment with fMRI brain recordings during reading and listening. Comparing these findings with speech-based language models reveals starkly different effects of low-level features on brain alignment. While text-based models show reduced alignment in early sensory regions post-removal, they retain significant predictive power in late language regions. In contrast, speech-based models maintain strong alignment in early auditory regions even after feature removal but lose all predictive power in late language regions. These results suggest that speech-based models provide insights into additional information processed by early auditory regions, but caution is needed when using them to model processing in late language regions. We make our code publicly available. [https://github.com/subbareddy248/speech-llm-brain]

Overview

This paper investigates the relationship between language models and brain activity during reading and listening.
The researchers systematically removed different low-level features (textual, speech, and visual) from language model representations to assess their impact on alignment with functional magnetic resonance imaging (fMRI) brain recordings.
The findings suggest that text-based and speech-based language models provide insights into different aspects of language processing in the brain.

Plain English Explanation

The human brain processes language in complex ways, whether we're reading text or listening to speech. Recent research has shown that text-based language models can predict brain activity quite well during both reading and listening tasks. This raises the question of what types of information these language models are really capturing about how the brain processes language.

To investigate this, the researchers took a direct approach. They systematically removed specific low-level stimulus features (textual, speech, and visual properties of the input) from the language model representations and measured how this changed each model's ability to predict the brain activity recorded during reading and listening. Comparing the results for text-based and speech-based language models shows which kinds of information each type of model shares with the brain.

The key findings are:

Text-based language models show reduced alignment with brain activity in early sensory regions after removing low-level features, but they still maintain significant predictive power in later language processing regions of the brain.
In contrast, speech-based language models maintain strong alignment with brain activity in early auditory regions even after removing low-level features, but they lose all predictive power in later language processing regions.

These results suggest that speech-based language models provide additional insights into the information processed by early auditory regions of the brain, but they may not be as useful for understanding language processing in higher-level brain regions. The researchers make their code publicly available to encourage further exploration of these language-brain relationships.

Technical Explanation

The researchers in this study used a direct approach to investigate what types of information language models truly predict in the brain during reading and listening tasks. They systematically removed specific low-level stimulus features (textual, speech, and visual) from language model representations and assessed the impact on the models' alignment with functional magnetic resonance imaging (fMRI) brain recordings.
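One common way to "remove" a set of features from a representation is to regress the representation on those features and keep only the residual. The sketch below illustrates that idea on synthetic data. It is not taken from the authors' code: the ridge-regression residual, the helper names `remove_features` and `max_abs_corr`, and all array sizes are assumptions made for illustration.

```python
import numpy as np


def remove_features(model_reps, low_level, ridge_lambda=1e-3):
    """Return the part of model_reps that low_level cannot linearly predict."""
    # Center both matrices so the regression needs no intercept term.
    X = low_level - low_level.mean(axis=0)
    Y = model_reps - model_reps.mean(axis=0)
    # Closed-form ridge solution: W = (X^T X + lambda * I)^-1 X^T Y
    gram = X.T @ X + ridge_lambda * np.eye(X.shape[1])
    W = np.linalg.solve(gram, X.T @ Y)
    # The residual is what remains after subtracting the linear prediction.
    return Y - X @ W


def max_abs_corr(a, b):
    """Largest absolute Pearson correlation between any column of a and of b."""
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return float(np.abs(a.T @ b / a.shape[0]).max())


# Synthetic stand-ins (NOT real data): 3 low-level features and
# 16-dimensional "model representations" that partly depend on them.
rng = np.random.default_rng(0)
n_samples = 500
low_level = rng.normal(size=(n_samples, 3))
model_reps = low_level @ rng.normal(size=(3, 16)) + rng.normal(size=(n_samples, 16))

residual_reps = remove_features(model_reps, low_level)
corr_before = max_abs_corr(model_reps, low_level)
corr_after = max_abs_corr(residual_reps, low_level)
print(f"max |r| with low-level features before removal: {corr_before:.3f}")
print(f"max |r| with low-level features after removal:  {corr_after:.3f}")
```

After residualization, the representation is linearly uncorrelated with the removed features (up to the small ridge penalty), so any brain alignment that remains cannot be attributed to a linear readout of those features.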

By comparing the effects of feature removal on text-based and speech-based language models, the researchers were able to gain insights into the differences in the information these models capture about language processing in the brain.

The key findings are:

Text-based language models showed reduced alignment with brain activity in early sensory regions after removing low-level features, but they retained significant predictive power in late language regions.
In contrast, speech-based language models maintained strong alignment with brain activity in early auditory regions even after feature removal, but they lost all predictive power in late language regions.

These results suggest that speech-based language models provide insights into additional information processed by early auditory regions of the brain, but caution is needed when using them to model processing in higher-level language regions. The researchers make their code publicly available to encourage further exploration of these comparative brain-language relationships.
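The logic of comparing alignment before and after feature removal can be illustrated with a toy simulation. Everything below is an assumption for illustration: the data are synthetic, the "early" and "late" voxels are constructed by hand, the encoding model is a plain ridge regression scored with Pearson correlation on a held-out contiguous split, and the function names are hypothetical. It does not reproduce the paper's results or pipeline.

```python
import numpy as np


def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression weights for centered X and Y."""
    gram = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def residualize(reps, low_level, lam=1e-3):
    """Remove the part of reps that low_level predicts linearly."""
    X = low_level - low_level.mean(axis=0)
    Y = reps - reps.mean(axis=0)
    return Y - X @ ridge_fit(X, Y, lam)


def brain_alignment(features, voxels, n_train, lam=1.0):
    """Fit an encoding model on the first n_train samples and return the
    Pearson correlation per voxel on the held-out remainder."""
    mu_x = features[:n_train].mean(axis=0)
    mu_y = voxels[:n_train].mean(axis=0)
    W = ridge_fit(features[:n_train] - mu_x, voxels[:n_train] - mu_y, lam)
    pred = (features[n_train:] - mu_x) @ W + mu_y
    true = voxels[n_train:]
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    denom = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    return (pred_c * true_c).sum(axis=0) / denom


# Synthetic setup (NOT real fMRI): the representation mixes low-level and
# higher-level latent factors; one voxel follows each kind of factor.
rng = np.random.default_rng(0)
n, n_train = 600, 450
low_level = rng.normal(size=(n, 2))
high_level = rng.normal(size=(n, 4))
reps = (low_level @ rng.normal(size=(2, 16))
        + high_level @ rng.normal(size=(4, 16))
        + 0.1 * rng.normal(size=(n, 16)))
early = low_level @ np.array([1.0, -1.0]) + 0.5 * rng.normal(size=n)
late = high_level @ np.array([1.0, -1.0, 0.5, 0.5]) + 0.5 * rng.normal(size=n)
voxels = np.column_stack([early, late])  # column 0: "early", column 1: "late"

before = brain_alignment(reps, voxels, n_train)
after = brain_alignment(residualize(reps, low_level), voxels, n_train)
print("alignment before removal (early, late):", np.round(before, 3))
print("alignment after removal  (early, late):", np.round(after, 3))
```

In this toy setup the "late" voxel stays predictable after removal only because the representation was built to contain higher-level factors independent of the low-level ones. A representation whose late-region alignment collapses after removal, as reported for the speech-based models, would correspond to one that lacks such independent factors.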

Critical Analysis

The researchers in this study have taken a thoughtful and systematic approach to investigating the relationship between language models and brain activity during reading and listening tasks. By selectively removing low-level features from the language model representations, they were able to gain valuable insights into the types of information these models are capturing about language processing in the brain.

One potential limitation of the study is that the researchers focused on a specific set of low-level features (textual, speech, and visual). It would be interesting to see if the removal of other types of features, such as semantic or syntactic information, would yield similar or different results. Additionally, the study was conducted using fMRI data, which has relatively low temporal resolution. Incorporating other neuroimaging techniques with higher temporal resolution, such as electroencephalography (EEG) or magnetoencephalography (MEG), could provide a more detailed understanding of the temporal dynamics of language processing in the brain.

Furthermore, the researchers acknowledged that their findings do not fully explain the impressive predictive power of language models in previous studies on fMRI-based brain activity prediction. Additional research is needed to elucidate the specific mechanisms and representations within language models that contribute to their ability to predict brain activity during language-related tasks.

Overall, this study represents an important step in understanding the relationship between language models and the neural processing of language. The researchers' transparent approach and public availability of their code set a commendable standard for fostering further exploration and discussion in this rapidly evolving field.

Conclusion

This study provides valuable insights into the relationship between language models and brain activity during reading and listening tasks. By systematically removing low-level features from language model representations and comparing the effects on text-based and speech-based models, the researchers were able to gain a better understanding of the types of information these models capture about language processing in the brain.

The key finding is that text-based and speech-based language models appear to provide insights into different aspects of language processing. Text-based models maintain significant predictive power in late language regions of the brain, even after removing low-level textual features, suggesting they capture higher-level linguistic information. In contrast, speech-based models show strong alignment with early auditory regions but lose predictive power in late language regions, indicating they may be better suited for modeling early stages of auditory processing.

These results highlight the need for caution when using speech-based language models to draw conclusions about language processing in higher-level brain regions. The researchers' transparent approach and public availability of their code set the stage for further exploration and discussion of the complex relationships between language, brain, and artificial intelligence.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

fMRI predictors based on language models of increasing complexity recover brain left lateralization

Laurent Bonnasse-Gahot, Christophe Pallier

Over the past decade, studies of naturalistic language processing where participants are scanned while listening to continuous text have flourished. Using word embeddings at first, then large language models, researchers have created encoding models to analyze the brain signals. Presenting these models with the same text as the participants allows to identify brain areas where there is a significant correlation between the functional magnetic resonance imaging (fMRI) time series and the ones predicted by the models' artificial neurons. One intriguing finding from these studies is that they have revealed highly symmetric bilateral activation patterns, somewhat at odds with the well-known left lateralization of language processing. Here, we report analyses of an fMRI dataset where we manipulate the complexity of large language models, testing 28 pretrained models from 8 different families, ranging from 124M to 14.2B parameters. First, we observe that the performance of models in predicting brain responses follows a scaling law, where the fit with brain activity increases linearly with the logarithm of the number of parameters of the model (and its performance on natural language processing tasks). Second, we show that a left-right asymmetry gradually appears as model size increases, and that the difference in left-right brain correlations also follows a scaling law. Whereas the smallest models show no asymmetry, larger models fit better and better left hemispheric activations than right hemispheric ones. This finding reconciles computational analyses of brain activity using large language models with the classic observation from aphasic patients showing left hemisphere dominance for language.

5/29/2024

cs.CL cs.AI

Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network

Badr AlKhamissi, Greta Tuckute, Antoine Bosselut, Martin Schrimpf

Large Language Models (LLMs) have been shown to be effective models of the human language system, with some models predicting most explainable variance of brain activity in current datasets. Even in untrained models, the representations induced by architectural priors can exhibit reasonable alignment to brain data. In this work, we investigate the key architectural components driving the surprising alignment of untrained models. To estimate LLM-to-brain similarity, we first select language-selective units within an LLM, similar to how neuroscientists identify the language network in the human brain. We then benchmark the brain alignment of these LLM units across five different brain recording datasets. By isolating critical components of the Transformer architecture, we identify tokenization strategy and multihead attention as the two major components driving brain alignment. A simple form of recurrence further improves alignment. We further demonstrate this quantitative brain alignment of our model by reproducing landmark studies in the language neuroscience field, showing that localized model units -- just like language voxels measured empirically in the human brain -- discriminate more reliably between lexical than syntactic differences, and exhibit similar response profiles under the same experimental conditions. Finally, we demonstrate the utility of our model's representations for language modeling, achieving improved sample and parameter efficiency over comparable architectures. Our model's estimates of surprisal sets a new state-of-the-art in the behavioral alignment to human reading times. Taken together, we propose a highly brain- and behaviorally-aligned model that conceptualizes the human language system as an untrained shallow feature encoder, with structural priors, combined with a trained decoder to achieve efficient and performant language processing.

6/24/2024

cs.CL cs.LG

What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores

Ebrahim Feghhi, Nima Hadidi, Bryan Song, Idan A. Blank, Jonathan C. Kao

Given the remarkable capabilities of large language models (LLMs), there has been a growing interest in evaluating their similarity to the human brain. One approach towards quantifying this similarity is by measuring how well a model predicts neural signals, also called brain score. Internal representations from LLMs achieve state-of-the-art brain scores, leading to speculation that they share computational principles with human language processing. This inference is only valid if the subset of neural activity predicted by LLMs reflects core elements of language processing. Here, we question this assumption by analyzing three neural datasets used in an impactful study on LLM-to-brain mappings, with a particular focus on an fMRI dataset where participants read short passages. We first find that when using shuffled train-test splits, as done in previous studies with these datasets, a trivial feature that encodes temporal autocorrelation not only outperforms LLMs but also accounts for the majority of neural variance that LLMs explain. We therefore use contiguous splits moving forward. Second, we explain the surprisingly high brain scores of untrained LLMs by showing they do not account for additional neural variance beyond two simple features: sentence length and sentence position. This undermines evidence used to claim that the transformer architecture biases computations to be more brain-like. Third, we find that brain scores of trained LLMs on this dataset can largely be explained by sentence length, position, and pronoun-dereferenced static word embeddings; a small, additional amount is explained by sense-specific embeddings and contextual representations of sentence structure. We conclude that over-reliance on brain scores can lead to over-interpretations of similarity between LLMs and brains, and emphasize the importance of deconstructing what LLMs are mapping to in neural signals.

6/24/2024

cs.CL cs.AI

Do Large Language Models Mirror Cognitive Language Processing?

Yuqi Ren, Renren Jin, Tongxuan Zhang, Deyi Xiong

Large Language Models (LLMs) have demonstrated remarkable abilities in text comprehension and logical reasoning, indicating that the text representations learned by LLMs can facilitate their language processing capabilities. In cognitive science, brain cognitive processing signals are typically utilized to study human language processing. Therefore, it is natural to ask how well the text embeddings from LLMs align with the brain cognitive processing signals, and how training strategies affect the LLM-brain alignment? In this paper, we employ Representational Similarity Analysis (RSA) to measure the alignment between 23 mainstream LLMs and fMRI signals of the brain to evaluate how effectively LLMs simulate cognitive language processing. We empirically investigate the impact of various factors (e.g., pre-training data size, model scaling, alignment training, and prompts) on such LLM-brain alignment. Experimental results indicate that pre-training data size and model scaling are positively correlated with LLM-brain similarity, and alignment training can significantly improve LLM-brain similarity. Explicit prompts contribute to the consistency of LLMs with brain cognitive language processing, while nonsensical noisy prompts may attenuate such alignment. Additionally, the performance of a wide range of LLM evaluations (e.g., MMLU, Chatbot Arena) is highly correlated with the LLM-brain similarity.

5/29/2024

cs.AI cs.CL