Probabilistically-Sound Beam Search with Masked Language Models

Read original: arXiv:2402.15020 - Published 7/10/2024 by Creston Brooks, Robert Calef, Charlie Cowen-Breen, Anna Sappington

Overview

This paper introduces a probabilistically-sound beam search algorithm for text infilling tasks using masked language models.
The authors propose a method to estimate the probability of a complete sentence given a partially masked input, which allows for more principled decoding during beam search.
The authors report empirical results comparing several infilling approaches with MLMs across several domains; the abstract names ancient text restoration and protein engineering as motivating applications.

Plain English Explanation

The paper focuses on "text infilling": using a language model to fill in missing parts of a piece of text. The abstract highlights domain-specific applications such as ancient text restoration and protein engineering.

Beam search is a common way to find a likely completion: it keeps a small set of the best partial candidates and extends them one token at a time. With masked language models this is harder than with autoregressive models, because the model does not directly provide a joint probability for the full, completed text.

The researchers in this paper introduce a new algorithm that aims to address this issue. Their "probabilistically-sound beam search" method tries to estimate the probability of the entire completed sentence, not just the individual words. This allows the model to make more informed decisions during the beam search process, leading to better overall results.
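
To see why scoring the whole completion matters, here is a minimal sketch on a made-up two-token gap. The distribution `JOINT` and the function names are hypothetical and chosen for illustration; nothing here comes from the paper. It compares scoring a fill by multiplying per-position probabilities with scoring it under the full joint distribution.

```python
from itertools import product

# Hypothetical toy joint distribution over a two-token gap (not from the paper).
JOINT = {
    ("New", "York"): 0.5,
    ("Los", "Angeles"): 0.5,
}


def marginal(position):
    """Marginal distribution of the token at `position` under JOINT."""
    dist = {}
    for tokens, prob in JOINT.items():
        dist[tokens[position]] = dist.get(tokens[position], 0.0) + prob
    return dist


def independent_score(tokens):
    """Score a fill by multiplying per-position marginals, ignoring dependence."""
    score = 1.0
    for position, token in enumerate(tokens):
        score *= marginal(position).get(token, 0.0)
    return score


def joint_score(tokens):
    """Score a fill by its probability under the full joint distribution."""
    return JOINT.get(tuple(tokens), 0.0)


if __name__ == "__main__":
    # Enumerate every combination of first and second tokens.
    for fill in product(marginal(0), marginal(1)):
        print(fill, "independent:", independent_score(fill), "joint:", joint_score(fill))
```

In this toy case the fill ("New", "Angeles") gets a score of 0.25 when positions are treated independently, even though its joint probability is 0. Word-by-word scores alone can therefore rank an impossible completion as plausible.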

The paper compares several infilling approaches across several domains and reports that the modified method is superior to standard beam search in the expected conditions. According to the abstract, the modification adds no computational complexity, so the model gets a more principled way to navigate the search without extra asymptotic cost.

Technical Explanation

The paper introduces a novel decoding strategy called "probabilistically-sound beam search" for text infilling tasks using masked language models. The core idea is to estimate the probability of a complete sentence, rather than just the individual token probabilities, to guide the beam search process.
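
For reference, the quantity a left-to-right search would ideally accumulate is the chain-rule factorization of the joint probability. The notation below is assumed for this note and is not taken from the paper: x_1, ..., x_n are the tokens filling the n masked positions and c is the unmasked context.

```latex
% Chain rule: the joint probability of the fill, given the context c.
\[
  p(x_1, \dots, x_n \mid c) \;=\; \prod_{i=1}^{n} p(x_i \mid x_1, \dots, x_{i-1}, c)
\]
% In log space, the score of a candidate is a sum of per-step terms.
\[
  \log p(x_1, \dots, x_n \mid c) \;=\; \sum_{i=1}^{n} \log p(x_i \mid x_1, \dots, x_{i-1}, c)
\]
```

An autoregressive model outputs each factor directly. An MLM queried with positions i+1, ..., n still masked returns a distribution for position i, and whether that output can stand in for the i-th factor is the kind of condition the paper sets out to clarify; the conditions themselves are not reproduced here.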

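Standard beam search keeps the k highest-scoring partial sequences at each step and extends each of them by one token. With an autoregressive model, the accumulated score is the joint probability of the sequence. With an MLM, joint probabilities are not readily available, so the accumulated score need not correspond to the probability of the full, completed sequence. The authors clarify the conditions under which standard beam search is nonetheless theoretically sound for infilling with MLMs and, for cases where those conditions fail, provide a probabilistically-sound modification with no additional computational complexity.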
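
As a point of reference, generic beam search can be sketched as follows. This is plain beam search with a pluggable scorer, not the paper's modification. The `toy_scorer` function and `TOY_TABLE` are made-up stand-ins for a model query, and all names are assumptions of this sketch.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer type: given the tokens chosen so far for the gap,
# return log-probabilities for the next gap position.
CondLogProbs = Callable[[Sequence[str]], Dict[str, float]]
Beam = Tuple[Tuple[str, ...], float]


def beam_search(cond_log_probs: CondLogProbs, num_positions: int, beam_width: int) -> List[Beam]:
    """Generic left-to-right beam search over `num_positions` gap tokens."""
    beams: List[Beam] = [((), 0.0)]  # start with one empty prefix, log-score 0
    for _ in range(num_positions):
        candidates: List[Beam] = []
        for prefix, score in beams:
            # Extend every surviving prefix by every possible next token.
            for token, logp in cond_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        # Keep only the `beam_width` highest-scoring extended prefixes.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Made-up conditional probabilities, for illustration only.
TOY_TABLE = {
    (): {"New": 0.6, "Los": 0.4},
    ("New",): {"York": 0.5, "Jersey": 0.5},
    ("Los",): {"Angeles": 0.9, "Alamos": 0.1},
}


def toy_scorer(prefix: Sequence[str]) -> Dict[str, float]:
    """Stand-in for a model query: look up the toy table and return log-probs."""
    return {tok: math.log(p) for tok, p in TOY_TABLE[tuple(prefix)].items()}


if __name__ == "__main__":
    for width in (1, 2):
        tokens, score = beam_search(toy_scorer, num_positions=2, beam_width=width)[0]
        print(width, tokens, round(math.exp(score), 2))
```

With a beam width of 1, the search commits to "New" and ends with a fill of probability 0.3. With a width of 2, it keeps "Los" alive and finds ("Los", "Angeles") at 0.36. The search is only as meaningful as the scores it accumulates, which is why the soundness of those scores matters.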

The resulting scores are then used to rank candidate completions during beam search, allowing the algorithm to explore the search space in a more principled, probabilistically-sound manner. The authors present empirical results comparing several infilling approaches with MLMs across several domains, and report that the modification is superior to standard beam search in the expected conditions.

The technical details involve deriving the sentence probability calculation, incorporating it into the beam search procedure, and carefully designing the masking strategies to enable efficient inference. The paper provides theoretical analysis and empirical results to validate the effectiveness of the proposed method.
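
For a sense of scale on inference cost, here is a back-of-the-envelope count. It assumes one model query per surviving beam per masked position, which is an assumption of this sketch and not a figure from the paper; the vocabulary size in the usage example is likewise illustrative.

```python
def exhaustive_candidates(vocab_size: int, num_masked: int) -> int:
    """Number of distinct fills if every combination of tokens were scored."""
    return vocab_size ** num_masked


def beam_model_queries(beam_width: int, num_masked: int) -> int:
    """Model queries for beam search, assuming one query per beam per position.

    The first position starts from a single empty prefix, so it costs one
    query; each later position costs one query per surviving beam.
    """
    if num_masked <= 0:
        return 0
    return 1 + beam_width * (num_masked - 1)


if __name__ == "__main__":
    # Illustrative numbers: a 30,000-token vocabulary and three masked positions.
    print(exhaustive_candidates(30_000, 3))
    print(beam_model_queries(5, 3))
```

Under these assumptions, exhaustive scoring grows exponentially in the number of masked positions, while the number of beam search queries grows linearly.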

Critical Analysis

The paper presents a well-motivated and technically sound approach to improving text infilling with masked language models. The key innovation of incorporating a principled probability estimate into the beam search process is a logical and promising direction.

One open question is computational cost. The abstract states that the modification adds no computational complexity over standard beam search, but beam search with an MLM still requires repeated model queries, so practical decoding efficiency remains an area for further investigation.

Additionally, the paper is concerned specifically with infilling, motivated by applications such as ancient text restoration and protein engineering. It would be interesting to see how the method performs on a broader range of text generation scenarios, such as open-ended creative writing or controllable text generation.

The authors also do not explore potential issues around language model bias and safety in depth, which is an important consideration for real-world deployment of such techniques.

Overall, the paper presents a valuable contribution to the field of text generation with masked language models, and the proposed probabilistically-sound beam search is a promising direction for further research and development.

Conclusion

This paper introduces a novel decoding strategy called "probabilistically-sound beam search" for text infilling tasks using masked language models. The key innovation is the ability to efficiently estimate the probability of a complete sentence, rather than just individual tokens, to guide the beam search process in a more principled way.

The authors compare several infilling approaches with MLMs across several domains and report that their modification is superior to standard beam search in the expected conditions. The technical details and theoretical analysis provide a solid foundation for the proposed technique.

While the paper focuses on a relatively narrow set of tasks, the probabilistically-sound beam search algorithm represents an important step forward in improving the performance and robustness of text generation with masked language models. Further exploration of its capabilities and limitations, as well as its application to a broader range of scenarios, will be valuable areas for future research.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Probabilistically-Sound Beam Search with Masked Language Models

Creston Brooks, Robert Calef, Charlie Cowen-Breen, Anna Sappington

Beam search with masked language models (MLMs) is challenging in part because joint probability distributions over sequences are not readily available, unlike for autoregressive models. However, estimating such distributions has important domain-specific applications such as ancient text restoration and protein engineering. Here we present probabilistically-sound methods for beam search with MLMs. First, we clarify the conditions under which it is theoretically sound to perform text infilling with MLMs using standard beam search. When these conditions fail, we provide a probabilistically-sound modification with no additional computational complexity and demonstrate that it is superior to the aforementioned beam search in the expected conditions. We then present empirical results comparing several infilling approaches with MLMs across several domains.

7/10/2024

Beam Prediction based on Large Language Models

Yucheng Sheng, Kai Huang, Le Liang, Peng Liu, Shi Jin, Geoffrey Ye Li

Millimeter-wave (mmWave) communication is promising for next-generation wireless networks but suffers from significant path loss, requiring extensive antenna arrays and frequent beam training. Traditional deep learning models, such as long short-term memory (LSTM), enhance beam tracking accuracy however are limited by poor robustness and generalization. In this letter, we use large language models (LLMs) to improve the robustness of beam prediction. By converting time series data into text-based representations and employing the Prompt-as-Prefix (PaP) technique for contextual enrichment, our approach unleashes the strength of LLMs for time series forecasting. Simulation results demonstrate that our LLM-based method offers superior robustness and generalization compared to LSTM-based models, showcasing the potential of LLMs in wireless communications.

8/19/2024

Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis

Zehai Tu, Guangyan Zhang, Yiting Lu, Adaeze Adigwe, Simon King, Yiwen Guo

Tokenising continuous speech into sequences of discrete tokens and modelling them with language models (LMs) has led to significant success in text-to-speech (TTS) synthesis. Although these models can generate speech with high quality and naturalness, their synthesised samples can still suffer from artefacts, mispronunciation, word repeating, etc. In this paper, we argue these undesirable properties could partly be caused by the randomness of sampling-based strategies during the autoregressive decoding of LMs. Therefore, we look at maximisation-based decoding approaches and propose Temporal Repetition Aware Diverse Beam Search (TRAD-BS) to find the most probable sequences of the generated speech tokens. Experiments with two state-of-the-art LM-based TTS models demonstrate that our proposed maximisation-based decoding strategy generates speech with fewer mispronunciations and improved speaker consistency.

8/30/2024

Uncertainty-Guided Optimization on Large Language Model Search Trees

Julia Grosse, Ruotian Wu, Ahmad Rashid, Philipp Hennig, Pascal Poupart, Agustinus Kristiadi

Beam search is a standard tree search algorithm when it comes to finding sequences of maximum likelihood, for example, in the decoding processes of large language models. However, it is myopic since it does not take the whole path from the root to a leaf into account. Moreover, it is agnostic to prior knowledge available about the process: For example, it does not consider that the objective being maximized is a likelihood and thereby has specific properties, like being bound in the unit interval. Taking a probabilistic approach, we define a prior belief over the LLMs' transition probabilities and obtain a posterior belief over the most promising paths in each iteration. These beliefs are helpful to define a non-myopic Bayesian-optimization-like acquisition function that allows for a more data-efficient exploration scheme than standard beam search. We discuss how to select the prior and demonstrate in on- and off-model experiments with recent large language models, including Llama-2-7b, that our method achieves higher efficiency than beam search: Our method achieves the same or a higher likelihood while expanding fewer nodes than beam search.

7/8/2024