Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

2405.15071

Published 5/28/2024 by Boshi Wang, Xiang Yue, Yu Su, Huan Sun

Abstract

We study whether transformers can learn to implicitly reason over parametric knowledge, a skill that even the most capable language models struggle with. Focusing on two representative reasoning types, composition and comparison, we consistently find that transformers can learn implicit reasoning, but only through grokking, i.e., extended training far beyond overfitting. The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison. We delve into the model's internals throughout training, conducting analytical experiments that reveal: 1) the mechanism behind grokking, such as the formation of the generalizing circuit and its relation to the relative efficiency of generalizing and memorizing circuits, and 2) the connection between systematicity and the configuration of the generalizing circuit. Our findings guide data and training setup to better induce implicit reasoning and suggest potential improvements to the transformer architecture, such as encouraging cross-layer knowledge sharing. Furthermore, we demonstrate that for a challenging reasoning task with a large search space, GPT-4-Turbo and Gemini-1.5-Pro based on non-parametric memory fail badly regardless of prompting styles or retrieval augmentation, while a fully grokked transformer can achieve near-perfect accuracy, showcasing the power of parametric memory for complex reasoning.

Overview

This paper studies whether Transformers can learn to reason implicitly over parametric knowledge, focusing on two representative reasoning types: composition and comparison.
The researchers combine training experiments with analysis of the model's internals throughout training to explain how and when generalization emerges.
Key findings: implicit reasoning is learned only through grokking (extended training far beyond overfitting), out-of-distribution generalization fails for composition but succeeds for comparison, and a fully grokked Transformer reaches near-perfect accuracy on a challenging reasoning task where GPT-4-Turbo and Gemini-1.5-Pro fail badly.

Plain English Explanation

Transformer models, a type of deep learning architecture, have become incredibly powerful in a variety of tasks, from language processing to image recognition. But how exactly do these models work, and what are they capable of?

This research paper dives into the inner workings of Transformers, asking whether they can reason implicitly over knowledge stored in their own parameters. The researchers train models on two kinds of reasoning, composition and comparison, and inspect the models' internals throughout training to uncover the mechanisms behind what they observe.

One key finding is that Transformers can learn this kind of implicit reasoning, but only through grokking, that is, training that continues far beyond the point of overfitting. How well the skill transfers depends on the reasoning type: on out-of-distribution examples, the models fail to generalize systematically for composition but succeed for comparison.

Overall, this research sheds light on the inner workings of Transformers, helping us better understand how these powerful models learn and generalize. By delving into the mechanisms behind their performance, the researchers hope to pave the way for even more advanced and capable AI systems in the future.

Technical Explanation

The researchers combine controlled training experiments with analysis of the model's internals to study whether Transformers can learn to reason implicitly over parametric knowledge. They focus on two representative reasoning types: composition and comparison.
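To make the two reasoning types named in the abstract concrete, here is a minimal sketch of what a composition query and a comparison query might look like over a small set of stored facts. Everything in it is an assumption for illustration: the entities, relations, ages, and the `implicit_example` format are hypothetical and are not taken from the paper's datasets.

```python
# Hypothetical toy knowledge; all names and values are invented for illustration.
ATOMIC_FACTS = {
    ("alice", "mother"): "beth",
    ("beth", "employer"): "acme",
    ("carl", "mother"): "dana",
    ("dana", "employer"): "initech",
}
AGES = {"alice": 30, "beth": 58, "carl": 41}


def compose(head, r1, r2, facts=ATOMIC_FACTS):
    """Two-hop composition: follow r1 from head, then r2 from the bridge entity."""
    bridge = facts.get((head, r1))
    if bridge is None:
        return None  # first hop is unknown, so no answer can be derived
    return facts.get((bridge, r2))


def compare(a, b, values=AGES):
    """Comparison: relate two entities through a stored attribute value."""
    if values[a] < values[b]:
        return "younger"
    if values[a] > values[b]:
        return "older"
    return "same"


def implicit_example(head, r1, r2):
    """Pair a query with its answer; the bridge entity never appears in the input."""
    return [head, r1, r2], compose(head, r1, r2)


print(compose("alice", "mother", "employer"))
print(compare("carl", "alice"))
print(implicit_example("carl", "mother", "employer"))
```

In this framing, "implicit" means the model has to produce the final answer directly from the query, with the intermediate bridge entity (here `beth` or `dana`) left unstated.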

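Across both reasoning types, the researchers find that Transformers learn implicit reasoning only through grokking, meaning training continued far beyond the point of overfitting. Generalization differs by type: on out-of-distribution examples, Transformers fail to generalize systematically for composition but succeed for comparison. Analysis of the model's internals throughout training ties grokking to the formation of a generalizing circuit and its efficiency relative to memorizing circuits, and ties systematicity to how that generalizing circuit is configured.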
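The abstract describes grokking as extended training far beyond overfitting. One simple way to picture that is to measure how many additional training steps pass between the point where training accuracy saturates and the point where held-out accuracy catches up. The sketch below does this under stated assumptions: the helper names, the 0.99 threshold, and the two curves are hypothetical, and the numbers are invented placeholders, not results from the paper.

```python
def first_step_reaching(curve, threshold):
    """Return the first logged step whose accuracy meets the threshold, or None."""
    for step, accuracy in curve:
        if accuracy >= threshold:
            return step
    return None


def grokking_gap(train_curve, test_curve, threshold=0.99):
    """Steps of extra training between fitting the training set and generalizing."""
    fit_step = first_step_reaching(train_curve, threshold)
    generalize_step = first_step_reaching(test_curve, threshold)
    if fit_step is None or generalize_step is None:
        return None  # one of the curves never reaches the threshold
    return generalize_step - fit_step


# Invented (step, accuracy) logs, for illustration only.
TRAIN_CURVE = [(1000, 0.62), (2000, 1.0), (20000, 1.0), (200000, 1.0)]
TEST_CURVE = [(1000, 0.05), (2000, 0.10), (20000, 0.35), (200000, 0.995)]

print(grokking_gap(TRAIN_CURVE, TEST_CURVE))
```

A large positive gap is the signature to look for: the model fits its training data early, and held-out accuracy only rises much later.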

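Furthermore, the researchers test a challenging reasoning task with a large search space. GPT-4-Turbo and Gemini-1.5-Pro, which rely on non-parametric memory, fail badly regardless of prompting style or retrieval augmentation, while a fully grokked Transformer achieves near-perfect accuracy. The findings also guide data and training setup for inducing implicit reasoning and suggest architectural improvements such as encouraging cross-layer knowledge sharing.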

Critical Analysis

The researchers in this paper provide a comprehensive and insightful analysis of Transformer models, shedding light on their inner workings and capabilities. However, it's important to note that the findings presented here are specific to the particular experimental setups and datasets used in the study.

While the researchers have taken great care to design their experiments and analyses, it's possible that the results may not generalize to all Transformer models or applications. There may be limitations or edge cases that were not explored in this study, and further research would be needed to fully understand the broader implications of these findings.

Additionally, the paper focuses primarily on the technical aspects of Transformer models, without much discussion of the potential societal implications or ethical considerations surrounding the use of these powerful AI systems. As Transformers continue to advance and become more widely deployed, it will be crucial to consider the broader impact and responsible development of this technology.

Conclusion

This research paper explores the inner workings of Transformer models, showing that they can learn implicit reasoning over parametric knowledge through grokking, that out-of-distribution generalization differs between composition and comparison, and that both effects trace back to the generalizing circuit formed during training.

By delving into the mechanisms underlying Transformers' impressive performance, the researchers hope to pave the way for even more advanced and capable AI systems in the future. However, it's important to consider the limitations and potential broader implications of these findings, as the continued development and deployment of Transformers will have significant societal impacts that deserve careful consideration.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers

Jordan Meadows, Marco Valentino, Damien Teney, Andre Freitas

This paper proposes a methodology for generating and perturbing detailed derivations of equations at scale, aided by a symbolic engine, to evaluate the generalisability of Transformers to out-of-distribution mathematical reasoning problems. Instantiating the framework in the context of sequence classification tasks, we compare the capabilities of GPT-4, GPT-3.5, and a canon of fine-tuned BERT models, exploring the relationship between specific operators and generalisation failure via the perturbation of reasoning aspects such as symmetry and variable surface forms. Surprisingly, our empirical evaluation reveals that the average in-distribution performance of fine-tuned models surpasses GPT-3.5, and rivals GPT-4. However, perturbations to input reasoning can reduce their performance by up to 80 F1 points. Overall, the results suggest that the in-distribution performance of smaller open-source models may potentially rival GPT by incorporating appropriately structured derivation dependencies during training, and highlight a shared weakness between BERT and GPT involving a relative inability to decode indirect references to mathematical entities. We release the full codebase, constructed datasets, and fine-tuned models to encourage future progress in the field.

4/9/2024

cs.CL cs.LG

When can transformers reason with abstract symbols?

Enric Boix-Adsera, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind

We investigate the capabilities of transformer models on relational reasoning tasks. In these tasks, models are trained on a set of strings encoding abstract relations, and are then tested out-of-distribution on data that contains symbols that did not appear in the training dataset. We prove that for any relational reasoning task in a large family of tasks, transformers learn the abstract relations and generalize to the test set when trained by gradient descent on sufficiently large quantities of training data. This is in contrast to classical fully-connected networks, which we prove fail to learn to reason. Our results inspire modifications of the transformer architecture that add only two trainable parameters per head, and that we empirically demonstrate improve data efficiency for learning to reason.

4/17/2024

cs.CL cs.AI cs.LG

Transformers meet Neural Algorithmic Reasoners

Wilfried Bounsi, Borja Ibarz, Andrew Dudzik, Jessica B. Hamrick, Larisa Markeeva, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković

Transformers have revolutionized machine learning with their simple yet effective architecture. Pre-training Transformers on massive text datasets from the Internet has led to unmatched generalization for natural language understanding (NLU) tasks. However, such language models remain fragile when tasked with algorithmic forms of reasoning, where computations must be precise and robust. To address this limitation, we propose a novel approach that combines the Transformer's language understanding with the robustness of graph neural network (GNN)-based neural algorithmic reasoners (NARs). Such NARs proved effective as generic solvers for algorithmic tasks, when specified in graph form. To make their embeddings accessible to a Transformer, we propose a hybrid architecture with a two-phase training procedure, allowing the tokens in the language model to cross-attend to the node embeddings from the NAR. We evaluate our resulting TransNAR model on CLRS-Text, the text-based version of the CLRS-30 benchmark, and demonstrate significant gains over Transformer-only models for algorithmic reasoning, both in and out of distribution.

6/14/2024

cs.CL cs.LG

Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically

Kabir Ahuja, Vidhisha Balachandran, Madhur Panwar, Tianxing He, Noah A. Smith, Navin Goyal, Yulia Tsvetkov

Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures without explicitly encoding any structural bias. In this work, we investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge. We extensively experiment with transformer models trained on multiple synthetic datasets and with different training objectives and show that while other objectives e.g. sequence-to-sequence modeling, prefix language modeling, often failed to lead to hierarchical generalization, models trained with the language modeling objective consistently learned to generalize hierarchically. We then conduct pruning experiments to study how transformers trained with the language modeling objective encode hierarchical structure. When pruned, we find joint existence of subnetworks within the model with different generalization behaviors (subnetworks corresponding to hierarchical structure and linear order). Finally, we take a Bayesian perspective to further uncover transformers' preference for hierarchical generalization: We establish a correlation between whether transformers generalize hierarchically on a dataset and whether the simplest explanation of that dataset is provided by a hierarchical grammar compared to regular grammars exhibiting linear generalization.

6/4/2024

cs.CL cs.LG