On the Representational Capacity of Recurrent Neural Language Models

2310.12942

Published 5/31/2024 by Franz Nowak, Anej Svete, Li Du, Ryan Cotterell

🧠

Abstract

This work investigates the computational expressivity of language models (LMs) based on recurrent neural networks (RNNs). Siegelmann and Sontag (1992) famously showed that RNNs with rational weights and hidden states and unbounded computation time are Turing complete. However, LMs define weightings over strings in addition to just (unweighted) language membership and the analysis of the computational power of RNN LMs (RLMs) should reflect this. We extend the Turing completeness result to the probabilistic case, showing how a rationally weighted RLM with unbounded computation time can simulate any deterministic probabilistic Turing machine (PTM) with rationally weighted transitions. Since, in practice, RLMs work in real-time, processing a symbol at every time step, we treat the above result as an upper bound on the expressivity of RLMs. We also provide a lower bound by showing that under the restriction to real-time computation, such models can simulate deterministic real-time rational PTMs.

Create account to get full access

Overview

This paper investigates the computational capabilities of language models (LMs) based on recurrent neural networks (RNNs).
It builds on previous work that showed RNNs with rational weights and unbounded computation time are Turing complete.
However, LMs define weightings over strings, so the authors analyze the computational power of RNN-based LMs (RLMs) in this context.

Plain English Explanation

The researchers wanted to understand how powerful language models built using recurrent neural networks (RNNs) can be. Previous research had shown that RNNs with certain mathematical properties could, in theory, perform any computation that a standard computer can do, given enough time.

However, language models don't just decide whether a string of words is valid or not - they also assign probabilities or "weights" to different sequences of words. So the researchers set out to analyze the computational capabilities of these weighted language models based on RNNs.

Their key finding is that an RNN-based language model with unbounded computation time can simulate any probabilistic Turing machine with rational weights. This means these language models can, in principle, perform a wide range of computations, not just simple language tasks.

But in practice, language models process text in real-time, one symbol at a time. So the researchers also showed that within this real-time constraint, RNN-based language models can at least simulate a certain class of probabilistic Turing machines. This provides a lower bound on their computational expressivity.

Technical Explanation

The authors build on the Turing completeness result for RNNs shown by Siegelmann and Sontag (1992). They extend this to the probabilistic case, proving that a rationally weighted RNN-based language model (RLM) with unbounded computation time can simulate any deterministic probabilistic Turing machine (PTM) with rationally weighted transitions.

Since in practice, RLMs process text in real-time, the authors treat the above result as an upper bound on their computational expressivity. They also provide a lower bound by showing that under the real-time constraint, RLMs can simulate deterministic real-time rational PTMs.

Critical Analysis

The paper provides a rigorous theoretical analysis of the computational capabilities of RNN-based language models. However, the authors acknowledge that their results only establish upper and lower bounds on expressivity, rather than a precise characterization.

Additionally, the assumption of unbounded computation time in the upper bound result may not be realistic for practical language models. The authors note that further work is needed to understand the computational power of RLMs under more realistic resource constraints.

Another potential limitation is that the paper focuses solely on the mathematical properties of RLMs, without considering aspects like the evolution of representations or the role of memory in realistic language models. Incorporating these factors could lead to a more nuanced understanding of their computational capabilities.

Conclusion

This paper makes an important contribution to understanding the computational expressivity of RNN-based language models. By establishing upper and lower bounds on their capabilities, it provides a mathematical foundation for reasoning about the potential and limitations of these models, particularly in relation to simulating simple word2vec-style vector representations.

While the results are theoretical in nature, they have implications for the design and application of language models, suggesting that these models may be capable of a wider range of computations than traditionally assumed. Further research is needed to fully characterize their practical computational power and to understand how these theoretical results relate to the behavior of real-world language models.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

🧠

Lower Bounds on the Expressivity of Recurrent Neural Language Models

Anej Svete, Franz Nowak, Anisha Mohamed Sahabdeen, Ryan Cotterell

The recent successes and spread of large neural language models (LMs) call for a thorough understanding of their computational ability. Describing their computational abilities through LMs' emph{representational capacity} is a lively area of research. However, investigation into the representational capacity of neural LMs has predominantly focused on their ability to emph{recognize} formal languages. For example, recurrent neural networks (RNNs) with Heaviside activations are tightly linked to regular languages, i.e., languages defined by finite-state automata (FSAs). Such results, however, fall short of describing the capabilities of RNN emph{language models} (LMs), which are definitionally emph{distributions} over strings. We take a fresh look at the representational capacity of RNN LMs by connecting them to emph{probabilistic} FSAs and demonstrate that RNN LMs with linearly bounded precision can express arbitrary regular LMs.

6/19/2024

cs.CL

🧠

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning

Franz Nowak, Anej Svete, Alexandra Butoi, Ryan Cotterell

The performance of modern language models (LMs) has been improved by chain-of-thought (CoT) reasoning, i.e., the process of generating intermediate results that guide the model towards a final answer. A possible explanation for this improvement is that CoT reasoning extends an LM's computational power, as RNNs and transformers with additional scratch space are known to be Turing complete. Comparing LMs to Turing machines, however, introduces a category error - Turing machines decide language membership, whereas LMs define distributions over strings. To bridge this gap, we formalize CoT reasoning in a probabilistic setting. We present several results on the representational capacity of recurrent and transformer LMs with CoT reasoning, showing that they can represent the same family of distributions over strings as probabilistic Turing machines.

6/21/2024

cs.CL cs.FL

🤯

On Efficiently Representing Regular Languages as RNNs

Anej Svete, Robin Shing Moon Chan, Ryan Cotterell

Recent work by Hewitt et al. (2020) provides an interpretation of the empirical success of recurrent neural networks (RNNs) as language models (LMs). It shows that RNNs can efficiently represent bounded hierarchical structures that are prevalent in human language. This suggests that RNNs' success might be linked to their ability to model hierarchy. However, a closer inspection of Hewitt et al.'s (2020) construction shows that it is not inherently limited to hierarchical structures. This poses a natural question: What other classes of LMs can RNNs efficiently represent? To this end, we generalize Hewitt et al.'s (2020) construction and show that RNNs can efficiently represent a larger class of LMs than previously claimed -- specifically, those that can be represented by a pushdown automaton with a bounded stack and a specific stack update function. Altogether, the efficiency of representing this diverse class of LMs with RNN LMs suggests novel interpretations of their inductive bias.

6/19/2024

cs.CL cs.CC cs.LG

Language Modeling Using Tensor Trains

Zhan Su, Yuqin Zhou, Fengran Mo, Jakob Grue Simonsen

We propose a novel tensor network language model based on the simplest tensor network (i.e., tensor trains), called `Tensor Train Language Model' (TTLM). TTLM represents sentences in an exponential space constructed by the tensor product of words, but computing the probabilities of sentences in a low-dimensional fashion. We demonstrate that the architectures of Second-order RNNs, Recurrent Arithmetic Circuits (RACs), and Multiplicative Integration RNNs are, essentially, special cases of TTLM. Experimental evaluations on real language modeling tasks show that the proposed variants of TTLM (i.e., TTLM-Large and TTLM-Tiny) outperform the vanilla Recurrent Neural Networks (RNNs) with low-scale of hidden units. (The code is available at https://github.com/shuishen112/tensortrainlm.)

5/9/2024

cs.CL cs.IR