Learned feature representations are biased by complexity, learning order, position, and more

2405.05847

Published 6/7/2024 by Andrew Kyle Lampinen, Stephanie C. Y. Chan, Katherine Hermann

✨

Abstract

Representation learning, and interpreting learned representations, are key areas of focus in machine learning and neuroscience. Both fields generally use representations as a means to understand or improve a system's computations. In this work, however, we explore surprising dissociations between representation and computation that may pose challenges for such efforts. We create datasets in which we attempt to match the computational role that different features play, while manipulating other properties of the features or the data. We train various deep learning architectures to compute these multiple abstract features about their inputs. We find that their learned feature representations are systematically biased towards representing some features more strongly than others, depending upon extraneous properties such as feature complexity, the order in which features are learned, and the distribution of features over the inputs. For example, features that are simpler to compute or learned first tend to be represented more strongly and densely than features that are more complex or learned later, even if all features are learned equally well. We also explore how these biases are affected by architectures, optimizers, and training regimes (e.g., in transformers, features decoded earlier in the output sequence also tend to be represented more strongly). Our results help to characterize the inductive biases of gradient-based representation learning. These results also highlight a key challenge for interpretability $-$ or for comparing the representations of models and brains $-$ disentangling extraneous biases from the computationally important aspects of a system's internal representations.

Create account to get full access

Overview

This paper explores the surprising dissociations between representation and computation in machine learning and neuroscience.
The researchers created datasets to investigate how different features are represented by deep learning models, even when the features have similar computational roles.
They found that models exhibit systematic biases in their feature representations, based on factors like feature complexity, learning order, and feature distribution.
These findings pose challenges for interpreting the internal representations of models and comparing them to brain representations.

Plain English Explanation

The paper looks at how machine learning models and the human brain represent and process information. Typically, researchers in these fields use representations (the way information is encoded) as a way to understand or improve a system's computations (the calculations it performs).

However, the researchers in this paper found some unexpected disconnects between how a model or brain represents information and the actual computations it performs. They created datasets where different features (pieces of information) played similar computational roles, but had other properties manipulated, like how complex they were or when they were learned.

When they trained deep learning models on these datasets, they found the models had systematic biases in how they represented the features. For example, simpler features or ones learned earlier tended to be represented more strongly, even if the model learned all the features equally well.

These biases based on extraneous properties, rather than just the computational importance of the features, pose challenges for interpreting the representations of AI models or comparing them to brain representations. It means we have to be careful to disentangle these biases from the truly important aspects of the representations.

Technical Explanation

The researchers created specialized datasets where different features had similar computational roles, but varied in properties like complexity and order of learning. They then trained various deep learning architectures, including transformers, to compute multiple abstract features from the inputs.

They found that the models exhibited systematic biases in their learned feature representations. Features that were simpler to compute or learned earlier tended to be represented more strongly and densely, even if all features were learned equally well. These biases were also affected by factors like the architecture, optimizer, and training regime used.

For example, in transformer models, features decoded earlier in the output sequence also tended to be represented more strongly. This suggests these representation biases are a general property of gradient-based representation learning, not specific to a particular model type.

These findings highlight a key challenge for interpreting the internal representations of AI models, or for comparing them to brain representations. Researchers need to be able to disentangle these extraneous biases from the truly computationally important aspects of the representations.

Critical Analysis

The paper provides valuable insights into the inductive biases of gradient-based representation learning, but also raises important caveats and areas for further research.

One limitation is that the experiments were conducted on synthetic datasets, so it's unclear how well the findings generalize to real-world, more complex data. The researchers acknowledge this and suggest replicating the study on natural datasets.

Additionally, while the paper highlights the challenges these representation biases pose for interpretability, it doesn't offer concrete solutions. Further research is needed to develop methods for mitigating or correcting these biases, in order to truly understand the computationally relevant aspects of a model's representations.

It's also worth considering whether these biases are unique to machine learning, or if similar phenomena occur in biological neural networks. Exploring the parallels and differences between artificial and natural representation learning could yield valuable insights.

Overall, this paper makes an important contribution by shedding light on a fundamental issue in representation learning that has been overlooked. Addressing these challenges will be crucial for advancing the interpretability and comparability of machine learning models and biological systems.

Conclusion

This paper uncovers surprising dissociations between representation and computation in machine learning and neuroscience. It demonstrates that deep learning models exhibit systematic biases in how they represent different features, based on factors like complexity and learning order, rather than just the features' computational roles.

These findings pose significant challenges for interpreting the internal representations of AI models and comparing them to brain representations. Researchers must be cautious in disentangling these extraneous biases from the truly computationally relevant aspects of the representations.

Addressing these challenges will be crucial for advancing the field of interpretable machine learning and for building a better understanding of how both artificial and biological systems process and represent information. Further research is needed to develop methods for mitigating these biases and to explore the parallels between artificial and natural representation learning.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

New!Dimensions underlying the representational alignment of deep neural networks with humans

Florian P. Mahner, Lukas Muttenthaler, Umut Guc{c}lu, Martin N. Hebart

Determining the similarities and differences between humans and artificial intelligence is an important goal both in machine learning and cognitive neuroscience. However, similarities in representations only inform us about the degree of alignment, not the factors that determine it. Drawing upon recent developments in cognitive science, we propose a generic framework for yielding comparable representations in humans and deep neural networks (DNN). Applying this framework to humans and a DNN model of natural images revealed a low-dimensional DNN embedding of both visual and semantic dimensions. In contrast to humans, DNNs exhibited a clear dominance of visual over semantic features, indicating divergent strategies for representing images. While in-silico experiments showed seemingly-consistent interpretability of DNN dimensions, a direct comparison between human and DNN representations revealed substantial differences in how they process images. By making representations directly comparable, our results reveal important challenges for representational alignment, offering a means for improving their comparability.

6/28/2024

cs.CV cs.AI cs.LG

✨

Knowledge Accumulation in Continually Learned Representations and the Issue of Feature Forgetting

Timm Hess, Eli Verwimp, Gido M. van de Ven, Tinne Tuytelaars

Continual learning research has shown that neural networks suffer from catastrophic forgetting at the output level, but it is debated whether this is also the case at the level of learned representations. Multiple recent studies ascribe representations a certain level of innate robustness against forgetting -- that they only forget minimally in comparison with forgetting at the output level. We revisit and expand upon the experiments that revealed this difference in forgetting and illustrate the coexistence of two phenomena that affect the quality of continually learned representations: knowledge accumulation and feature forgetting. Taking both aspects into account, we show that, even though forgetting in the representation (i.e. feature forgetting) can be small in absolute terms, when measuring relative to how much was learned during a task, forgetting in the representation tends to be just as catastrophic as forgetting at the output level. Next we show that this feature forgetting is problematic as it substantially slows down the incremental learning of good general representations (i.e. knowledge accumulation). Finally, we study how feature forgetting and knowledge accumulation are affected by different types of continual learning methods.

6/26/2024

cs.LG cs.CV

Coding schemes in neural networks learning classification tasks

Alexander van Meegen, Haim Sompolinsky

Neural networks posses the crucial ability to generate meaningful representations of task-dependent features. Indeed, with appropriate scaling, supervised learning in neural networks can result in strong, task-dependent feature learning. However, the nature of the emergent representations, which we call the `coding scheme', is still unclear. To understand the emergent coding scheme, we investigate fully-connected, wide neural networks learning classification tasks using the Bayesian framework where learning shapes the posterior distribution of the network weights. Consistent with previous findings, our analysis of the feature learning regime (also known as `non-lazy', `rich', or `mean-field' regime) shows that the networks acquire strong, data-dependent features. Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity. In linear networks, an analog coding scheme of the task emerges. Despite the strong representations, the mean predictor is identical to the lazy case. In nonlinear networks, spontaneous symmetry breaking leads to either redundant or sparse coding schemes. Our findings highlight how network properties such as scaling of weights and neuronal nonlinearity can profoundly influence the emergent representations.

6/26/2024

cs.LG stat.ML

Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize

Tianren Zhang, Chujie Zhao, Guanyu Chen, Yizhou Jiang, Feng Chen

Learning representations that generalize under distribution shifts is critical for building robust machine learning models. However, despite significant efforts in recent years, algorithmic advances in this direction have been limited. In this work, we seek to understand the fundamental difficulty of out-of-distribution generalization with deep neural networks. We first empirically show that perhaps surprisingly, even allowing a neural network to explicitly fit the representations obtained from a teacher network that can generalize out-of-distribution is insufficient for the generalization of the student network. Then, by a theoretical study of two-layer ReLU networks optimized by stochastic gradient descent (SGD) under a structured feature model, we identify a fundamental yet unexplored feature learning proclivity of neural networks, feature contamination: neural networks can learn uncorrelated features together with predictive features, resulting in generalization failure under distribution shifts. Notably, this mechanism essentially differs from the prevailing narrative in the literature that attributes the generalization failure to spurious correlations. Overall, our results offer new insights into the non-linear feature learning dynamics of neural networks and highlight the necessity of considering inductive biases in out-of-distribution generalization.

6/7/2024

cs.LG cs.AI