Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access

Read original: arXiv:2401.09967 - Published 7/23/2024 by Saibo Geng, Berkay Doner, Chris Wendler, Martin Josifoski, Robert West


Overview

This paper proposes sketch-guided constrained decoding (SGCD), a method for applying constrained decoding to blackbox large language models (LLMs) that do not expose their logits.
SGCD treats the unconstrained output of the blackbox LLM as a sketch and uses a locally hosted auxiliary model to refine it so that the final output satisfies the constraints.
The authors demonstrate the efficacy of SGCD in experiments on closed information extraction and constituency parsing.

Plain English Explanation

The paper introduces a technique called "Sketch-Guided Constrained Decoding" (SGCD) for controlling the output of large language models even when you don't have access to their next-token scores (the "logits"). The key idea is to let the blackbox model answer freely, treat that answer as a sketch, and then have a second, locally hosted model refine the sketch so that the final output satisfies the required constraints.

Imagine you can reach a powerful language model only through an interface that returns text. Its answers may not follow the strict format you need, and without the logits you cannot restrict which tokens it is allowed to produce. With SGCD, you keep that free-form answer as a rough draft and hand it to a model you run yourself, where you do control the decoding. The local model then produces a version of the draft that respects the format.

The researchers test this approach on two structured-output tasks: closed information extraction and constituency parsing. They also describe it as complementary to traditional logit-based constrained decoding, since it covers settings where the logits are unavailable.

Technical Explanation

The core of the SGCD method is a two-stage decoding pipeline. A blackbox large language model first generates an unconstrained output, and a locally hosted auxiliary model then refines that output under the constraints. The initial output acts as a sketch for further elaboration, so the constraints are enforced without access to the blackbox model's logits.

Specifically, the SGCD decoding algorithm works as follows:

The blackbox LLM generates an unconstrained output for the input prompt. This output serves as the sketch.
The sketch is passed to a locally hosted auxiliary model, whose next-token distribution is accessible.
The auxiliary model refines the sketch with constrained decoding, so that the final output satisfies the constraints.
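
The steps above can be illustrated with a toy sketch in Python. It follows the description in the paper's abstract (a blackbox LLM produces an unconstrained sketch, and a locally hosted auxiliary model refines it under constraints), but everything concrete here is an assumption made for illustration: `blackbox_llm`, `local_scores`, `allowed_next`, the entity and relation lists, and the word-overlap scoring are hypothetical stand-ins, not the authors' implementation or any real API.

```python
import re
from typing import Callable, Dict, List

# Hypothetical closed vocabulary for a toy information-extraction task.
ENTITIES = ["Paris", "France", "Berlin", "Germany"]
RELATIONS = ["capital of", "located in"]


def blackbox_llm(prompt: str) -> str:
    """Stand-in for a remote LLM API that returns text only, no logits."""
    return "Paris is the capitol of France."  # free-form and slightly off


def allowed_next(prefix: List[str]) -> List[str]:
    """Toy constraint: the output must be [entity, relation, other entity]."""
    if len(prefix) == 0:
        return list(ENTITIES)
    if len(prefix) == 1:
        return list(RELATIONS)
    if len(prefix) == 2:
        return [e for e in ENTITIES if e != prefix[0]]
    return []  # the triple is complete


def local_scores(sketch: str) -> Dict[str, float]:
    """Stand-in for the auxiliary model's token scores given the sketch.

    Assumption: a real auxiliary LM would also condition on the tokens
    decoded so far. This toy only counts words shared with the sketch,
    slightly preferring words that occur earlier in it.
    """
    words = re.findall(r"\w+", sketch.lower())
    scores: Dict[str, float] = {}
    for token in ENTITIES + RELATIONS:
        scores[token] = sum(
            1.0 - 0.01 * words.index(w)
            for w in token.lower().split()
            if w in words
        )
    return scores


def sgcd(prompt: str, blackbox: Callable[[str], str]) -> List[str]:
    sketch = blackbox(prompt)      # stage 1: unconstrained sketch
    scores = local_scores(sketch)  # local model, so its scores are visible
    output: List[str] = []
    while True:                    # stage 2: constrained refinement
        allowed = allowed_next(output)
        if not allowed:
            return output
        # Only tokens permitted by the constraint compete at this step.
        output.append(max(allowed, key=lambda t: scores[t]))


print(sgcd("Extract the fact as a triple.", blackbox_llm))
# ['Paris', 'capital of', 'France']
```

In this toy, the constraint only changes which tokens are eligible at each step. Without it, the highest-scoring token at the second step would again be `Paris`, which is not a valid relation.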

The authors evaluate SGCD on closed information extraction and constituency parsing, and report that it enhances the utility and flexibility of blackbox LLMs for these complex NLP tasks. They describe the approach as complementary to traditional logit-based techniques, since it applies in settings where full model transparency is unavailable.

Critical Analysis

The SGCD method presented in this paper offers a way to apply constrained decoding to blackbox large language models. Because a locally hosted auxiliary model performs the constrained refinement, the final output can be made to satisfy constraints that cannot be enforced on the blackbox model directly.

One potential limitation of SGCD is that its result depends on both the sketch produced by the blackbox LLM and the auxiliary model that refines it. If the sketch is far from any valid output, the refined output may not preserve what the blackbox model intended. Additional research could explore how robust the refinement step is to poor sketches.

Another area for further investigation is how well SGCD generalizes beyond the tasks studied. The experiments cover closed information extraction and constituency parsing, so its behavior with other constraints, models, and applications remains to be seen. Evaluating SGCD on a broader set of blackbox LLMs and auxiliary models would be a valuable next step.

Overall, the SGCD method represents a promising direction for improving the controllability and reliability of blackbox large language models, particularly for structured-output tasks where full model transparency is unavailable.

Conclusion

The Sketch-Guided Constrained Decoding (SGCD) method presented in this paper brings constrained decoding to blackbox large language models without access to their internal logits. By treating the blackbox model's unconstrained output as a sketch and refining it with a locally hosted auxiliary model, SGCD produces output that satisfies the required constraints.

The authors demonstrate the efficacy of SGCD on closed information extraction and constituency parsing. This suggests that SGCD could be useful for other complex NLP tasks that need structured output from models offering no logit access.

While the SGCD method shows promise, further research is needed to explore its scalability, flexibility, and generalizability to a wider range of large language models and applications. Nonetheless, this work is a step toward making blackbox language models more controllable and reliable.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access

Saibo Geng, Berkay Doner, Chris Wendler, Martin Josifoski, Robert West

Constrained decoding, a technique for enforcing constraints on language model outputs, offers a way to control text generation without retraining or architectural modifications. Its application is, however, typically restricted to models that give users access to next-token distributions (usually via softmax logits), which poses a limitation with blackbox large language models (LLMs). This paper introduces sketch-guided constrained decoding (SGCD), a novel approach to constrained decoding for blackbox LLMs, which operates without access to the logits of the blackbox LLM. SGCD utilizes a locally hosted auxiliary model to refine the output of an unconstrained blackbox LLM, effectively treating this initial output as a sketch for further elaboration. This approach is complementary to traditional logit-based techniques and enables the application of constrained decoding in settings where full model transparency is unavailable. We demonstrate the efficacy of SGCD through experiments in closed information extraction and constituency parsing, showing how it enhances the utility and flexibility of blackbox LLMs for complex NLP tasks.

7/23/2024

Grammar-Aligned Decoding

Kanghee Park, Jiayu Wang, Taylor Berg-Kirkpatrick, Nadia Polikarpova, Loris D'Antoni

Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate this problem by greedily restricting what tokens an LLM can output at each step to guarantee that the output matches a given constraint. Specifically, in grammar-constrained decoding (GCD), the LLM's output must follow a given grammar. In this paper we demonstrate that GCD techniques (and in general constrained decoding techniques) can distort the LLM's distribution, leading to outputs that are grammatical but appear with likelihoods that are not proportional to the ones given by the LLM, and so ultimately are low-quality. We call the problem of aligning sampling with a grammar constraint, grammar-aligned decoding (GAD), and propose adaptive sampling with approximate expected futures (ASAp), a decoding algorithm that guarantees the output to be grammatical while provably producing outputs that match the conditional probability of the LLM's distribution conditioned on the given grammar constraint. Our algorithm uses prior sample outputs to soundly overapproximate the future grammaticality of different output prefixes. Our evaluation on code generation and structured NLP tasks shows how ASAp often produces outputs with higher likelihood (according to the LLM's distribution) than existing GCD techniques, while still enforcing the desired grammatical constraints.

6/3/2024

Graph-Structured Speculative Decoding

Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models (LLMs) by employing a small language model to draft a hypothesis sequence, which is then validated by the LLM. The effectiveness of this approach heavily relies on the balance between performance and efficiency of the draft model. In our research, we focus on enhancing the proportion of draft tokens that are accepted to the final output by generating multiple hypotheses instead of just one. This allows the LLM more options to choose from and select the longest sequence that meets its standards. Our analysis reveals that hypotheses produced by the draft model share many common token sequences, suggesting a potential for optimizing computation. Leveraging this observation, we introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses. This structure enables us to efficiently predict and merge recurring token sequences, vastly reducing the computational demands of the draft model. We term this approach Graph-structured Speculative Decoding (GSD). We apply GSD across a range of LLMs, including a 70-billion parameter LLaMA-2 model, and observe a remarkable speedup of 1.73$\times$ to 1.96$\times$, significantly surpassing standard speculative decoding.

7/24/2024

Improving Logits-based Detector without Logits from Black-box LLMs

Cong Zeng, Shengkun Tang, Xianjun Yang, Yuanzhou Chen, Yiyou Sun, zhiqiang xu, Yao Li, Haifeng Chen, Wei Cheng, Dongkuan Xu

The advent of Large Language Models (LLMs) has revolutionized text generation, producing outputs that closely mimic human writing. This blurring of lines between machine- and human-written text presents new challenges in distinguishing one from the other, a task further complicated by the frequent updates and closed nature of leading proprietary LLMs. Traditional logits-based detection methods leverage surrogate models for identifying LLM-generated content when the exact logits are unavailable from black-box LLMs. However, these methods grapple with the misalignment between the distributions of the surrogate and the often undisclosed target models, leading to performance degradation, particularly with the introduction of new, closed-source models. Furthermore, while current methodologies are generally effective when the source model is identified, they falter in scenarios where the model version remains unknown, or the test set comprises outputs from various source models. To address these limitations, we present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection even without logits from source LLMs. DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations with minimal training investment. By leveraging corpus samples from publicly accessible outputs of advanced models such as ChatGPT, GPT-4 and Claude-3, DALD fine-tunes surrogate models to synchronize with unknown source model distributions effectively.

8/20/2024