Language Model Decoding as Direct Metrics Optimization

Read original: arXiv:2310.01041 - Published 6/6/2024 by Haozhe Ji, Pei Ke, Hongning Wang, Minlie Huang


Overview

Mainstream decoding methods for language models still struggle to generate texts that align with human texts across different aspects.
Sampling-based methods produce less repetitive but often disjointed texts, while search-based methods maintain topic coherence at the cost of increased repetition.
The paper frames decoding from a language model as an optimization problem whose goal is to strictly match the expected performance of human texts across multiple metrics simultaneously.

Plain English Explanation

Despite the impressive capabilities of modern language models, the texts they generate often fall short of matching the attributes of human-written texts. Sampling-based methods can produce more varied output, but the resulting text may lack coherence and flow. Conversely, search-based methods can maintain topic consistency, but at the expense of increased repetition.

This paper proposes a new approach that frames the text generation process as an optimization problem. The goal is to generate text that matches human-written text on a chosen set of measurable aspects, such as repetition and coherence, all at once rather than trading one off against another. This is achieved by defining a sequence-level energy function that incorporates multiple metrics simultaneously, and then using this function to reweight the input language model distribution.

The key insight is that this induced distribution is guaranteed to improve the perplexity on human texts, suggesting it is a better approximation of the underlying distribution of human-written language. To make the sampling process tractable, the researchers adopt the Sampling-Importance-Resampling technique.
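The reweighting idea and the perplexity claim can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes a space of only four candidate "texts", a single made-up repetition score `f`, and invented probabilities `p_model` and `p_human`. The helper names (`tilt`, `fit_mu`, `cross_entropy`) are hypothetical. It searches for a coefficient `mu` so that the reweighted distribution matches the human average of `f`, then compares perplexity on the human distribution before and after.

```python
import math

def tilt(p_model, f, mu):
    """Return q(x) proportional to p_model(x) * exp(-mu * f(x))."""
    weights = [p * math.exp(-mu * fx) for p, fx in zip(p_model, f)]
    z = sum(weights)
    return [w / z for w in weights]

def expectation(dist, f):
    """Expected value of the metric f under a discrete distribution."""
    return sum(p * fx for p, fx in zip(dist, f))

def fit_mu(p_model, f, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection on mu; E_q[f] decreases monotonically as mu grows."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expectation(tilt(p_model, f, mid), f) > target:
            lo = mid  # still too much of the metric: penalize harder
        else:
            hi = mid
    return (lo + hi) / 2

def cross_entropy(p_human, q):
    """Cross-entropy of q measured on the human distribution (in nats)."""
    return -sum(ph * math.log(qx) for ph, qx in zip(p_human, q) if ph > 0)

# Toy data (all values are invented for illustration).
f = [0.0, 0.2, 0.5, 0.9]        # repetition score of each candidate text
p_model = [0.1, 0.2, 0.3, 0.4]  # the model over-produces repetitive texts
p_human = [0.4, 0.3, 0.2, 0.1]  # humans prefer the less repetitive ones

mu = fit_mu(p_model, f, expectation(p_human, f))
q = tilt(p_model, f, mu)

ppl_model = math.exp(cross_entropy(p_human, p_model))
ppl_tilted = math.exp(cross_entropy(p_human, q))
print(f"mu={mu:.3f}  perplexity before={ppl_model:.3f}  after={ppl_tilted:.3f}")
```

In this toy setting the reweighted distribution matches the human average of `f` and has lower perplexity on the human distribution than the original model. With real language models the sequence space cannot be enumerated, which is why a sampling procedure is needed.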

Technical Explanation

The paper proposes a decoding method that treats generation as an optimization problem with matching constraints: the decoding distribution must be chosen so that the expected value of each selected metric under it strictly equals the expected value of that metric on human texts, for all metrics simultaneously.

The authors define a sequence-level energy function based on these metrics and use it to scale the input language model distribution. This induced distribution enjoys an analytical solution and is proven to improve the perplexity on human texts, implying a better approximation of the underlying distribution of human-written language.
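Written out, the construction described above takes roughly the following form. The notation is an assumption made for this sketch and may differ from the paper's: $p_\theta$ is the input language model, $p_d$ the distribution of human texts, $f_k$ the $k$-th sequence-level metric, and $\mu_k$ its coefficient. The last line is a short argument for why matching the expectations implies better perplexity on human texts; it is offered as a plausible reading of the claim, not as the authors' proof.

```latex
\begin{align}
  E_\mu(x) &= \sum_{k=1}^{K} \mu_k f_k(x) \\
  q_\mu(x) &= \frac{p_\theta(x)\,\exp\!\bigl(-E_\mu(x)\bigr)}{Z(\mu)},
  \qquad Z(\mu) = \sum_{x'} p_\theta(x')\,\exp\!\bigl(-E_\mu(x')\bigr) \\
  \mathbb{E}_{x \sim q_\mu}\bigl[f_k(x)\bigr] &= \mathbb{E}_{x \sim p_d}\bigl[f_k(x)\bigr],
  \qquad k = 1, \dots, K \\
  \mathbb{E}_{x \sim p_d}\bigl[\log q_\mu(x)\bigr]
  &= \mathbb{E}_{x \sim p_d}\bigl[\log p_\theta(x)\bigr]
   + \mathrm{KL}\bigl(q_\mu \,\|\, p_\theta\bigr)
  \;\ge\; \mathbb{E}_{x \sim p_d}\bigl[\log p_\theta(x)\bigr]
\end{align}
```

The final identity follows by substituting $\log q_\mu = \log p_\theta - E_\mu - \log Z(\mu)$ and using the matching constraints to swap $\mathbb{E}_{p_d}[f_k]$ for $\mathbb{E}_{q_\mu}[f_k]$. Since the KL term is non-negative, the average log-likelihood of human texts cannot decrease, so perplexity cannot increase.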

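To facilitate tractable sampling from this globally normalized distribution, the researchers adopt the Sampling-Importance-Resampling (SIR) technique. In general, SIR draws a set of candidates from a proposal distribution that is easy to sample from, assigns each candidate an importance weight, and then resamples from the candidates in proportion to those weights. Experiments on various domains and model scales demonstrate the superiority of their method over strong baselines, both in metrics alignment with human texts and in human evaluation.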
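A minimal sketch of generic Sampling-Importance-Resampling follows. It is an illustration under stated assumptions, not the paper's code: a uniform random token generator (`propose`) stands in for a real language model, the energy is a made-up repetition penalty with a hypothetical coefficient `MU`, and all names are invented for this example. Because the target is the proposal multiplied by `exp(-energy)`, the importance weight of each candidate reduces to `exp(-energy(x))`.

```python
import math
import random

VOCAB = ["a", "b", "c", "d"]  # toy vocabulary (assumption)
SEQ_LEN = 6                   # toy sequence length (assumption)
MU = 5.0                      # hypothetical energy coefficient

def repetition(tokens):
    """Fraction of tokens that repeat an earlier token in the sequence."""
    return 1.0 - len(set(tokens)) / len(tokens)

def propose(rng):
    """Stand-in for sampling a sequence from the input language model."""
    return [rng.choice(VOCAB) for _ in range(SEQ_LEN)]

def energy(tokens):
    """Toy sequence-level energy: penalize repetition."""
    return MU * repetition(tokens)

def sir_sample(propose_fn, energy_fn, num_candidates, rng):
    """Draw candidates, weight them by exp(-energy), and resample one."""
    candidates = [propose_fn(rng) for _ in range(num_candidates)]
    log_w = [-energy_fn(x) for x in candidates]
    shift = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(0)
sample = sir_sample(propose, energy, num_candidates=16, rng=rng)
print(sample, repetition(sample))
```

Increasing `num_candidates` brings the resampled output closer to the target distribution at the cost of more proposal samples per output, which is the main tradeoff of this family of methods.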

Critical Analysis

The paper presents a compelling approach to text generation that aims to address the shortcomings of existing decoding methods. By formulating decoding as an optimization problem and deriving a decoding distribution with an analytical solution, the authors demonstrate a principled way to generate text that aligns with human-written text across multiple metrics.

However, the summary above leaves open the computational cost of the proposed method, which could be a practical concern, especially for large-scale language models, since resampling requires drawing multiple candidates per output. Additionally, it is unclear how robust the method is to different types of human texts, or how it performs on more constrained generation tasks such as summarization or dialogue.

Further research could investigate the generalization of the approach to other language-related tasks, as well as its scalability and efficiency in real-world applications. It would also be valuable to understand the specific tradeoffs between the proposed method and existing techniques, perhaps through sample-efficient human evaluation or other comparative analyses.

Conclusion

This paper presents an optimization-based approach to decoding that aims to closely match the qualities of human-written text. By defining a sequence-level energy function based on multiple metrics and reweighting the input language model distribution accordingly, the authors demonstrate a principled way to generate text that is more closely aligned with human texts.

The key contribution of this work is the guaranteed improvement in perplexity on human texts, suggesting the induced distribution is a better approximation of the underlying human language distribution. While the computational complexity and generalization of the method require further investigation, this research represents an important step towards generating text that is more coherent, fluent, and engaging from the perspective of human readers.


