SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning

Read original: arXiv:2407.12874 - Published 8/13/2024 by Chenyang Zhao, Xueying Jia, Vijay Viswanathan, Tongshuang Wu, Graham Neubig

Overview

This paper introduces SELF-GUIDE, a multi-stage method for improving task-specific instruction following in large language models (LLMs).
The main idea is to have the model synthesize task-specific input-output pairs itself and then finetune that same model on those pairs, so no stronger teacher model is required.
According to the abstract, SELF-GUIDE yields absolute improvements of approximately 15% on classification tasks and 18% on generation tasks on the Natural Instructions V2 benchmark.

Plain English Explanation

The paper presents a new technique called "Self-Guide" to help large language models (LLMs) - powerful AI systems that can understand and generate human language - become better at following specific instructions or completing certain tasks.

Annotated training data is not available for every task, and generating it with a stronger model brings cost, scalability, and legal problems. The key insight is that the LLM can generate its own synthetic examples for a task (inputs paired with outputs) and then be finetuned on them, turning a general-purpose model into a task-specific one without outside help.

The authors report that this self-synthetic finetuning substantially improves the model's performance on both classification and generation tasks. This suggests it could be a useful technique for adapting an LLM to a task when little or no annotated data exists.

Technical Explanation

The paper introduces a finetuning approach called SELF-GUIDE that aims to improve task-specific instruction following in large language models (LLMs).

The core idea is to have the student LLM synthesize its own task-specific input-output pairs and then finetune that same model on the self-generated data. Unlike earlier approaches that generate training data with a more powerful LLM, this requires no access to any model other than the one being trained.
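The two-stage idea above (synthesize task data with the model, then finetune on it) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `generate` callable, the prompt wording, the duplicate filter, and the record format are all hypothetical, and the finetuning step itself is left out because it depends on the training framework.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: any function mapping a prompt string to a completion.
Generate = Callable[[str], str]


def synthesize_pairs(generate: Generate, instruction: str,
                     num_examples: int) -> List[Tuple[str, str]]:
    """Ask the model for new task inputs, then for an output to each input."""
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for i in range(num_examples):
        # Stage 1: the model proposes a new input (prompt wording is assumed).
        new_input = generate(
            f"{instruction}\nWrite new input #{i + 1} for this task:"
        ).strip()
        # Assumed filter: skip empty or duplicate inputs.
        if not new_input or new_input in seen:
            continue
        seen.add(new_input)
        # Stage 2: the same model annotates the input with an output.
        output = generate(f"{instruction}\nInput: {new_input}\nOutput:").strip()
        if output:
            pairs.append((new_input, output))
    return pairs


def to_finetuning_records(instruction: str,
                          pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Format pairs as prompt/completion records (an assumed format)."""
    return [
        {"prompt": f"{instruction}\nInput: {x}\nOutput:", "completion": y}
        for x, y in pairs
    ]


def make_toy_model() -> Generate:
    """Stand-in for a real LLM so the sketch runs without any model."""
    canned_inputs = ["The film was great.", "The plot was dull.",
                     "The film was great."]
    state = {"calls": 0}

    def generate(prompt: str) -> str:
        if prompt.endswith("Output:"):
            return "positive" if "great" in prompt else "negative"
        text = canned_inputs[state["calls"] % len(canned_inputs)]
        state["calls"] += 1
        return text

    return generate


if __name__ == "__main__":
    task = "Classify the sentiment of the movie review as positive or negative."
    synthetic = synthesize_pairs(make_toy_model(), task, num_examples=3)
    for record in to_finetuning_records(task, synthetic):
        print(record)
```

In a real setting, `generate` would wrap the student model's sampling call, and the resulting records would be passed to whatever finetuning routine is in use for that same model.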

The authors evaluate this self-synthetic finetuning approach on the Natural Instructions V2 benchmark. They report that SELF-GUIDE improves the model's performance by a substantial margin: an absolute improvement of approximately 15% on classification tasks and 18% on generation tasks, measured in the benchmark's metrics.
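The paper's abstract states its gains as absolute improvements (approximately 15% for classification tasks). As a small worked example of what "absolute" means compared with "relative", consider the sketch below. The baseline score of 40.0 is purely hypothetical and is not a number from the paper; only the 15-point gain mirrors the approximate figure in the abstract.

```python
def absolute_gain(baseline: float, finetuned: float) -> float:
    """Gain in percentage points: the plain difference of the two scores."""
    return finetuned - baseline


def relative_gain(baseline: float, finetuned: float) -> float:
    """Gain as a percentage of the baseline score."""
    return (finetuned - baseline) / baseline * 100.0


if __name__ == "__main__":
    baseline_score = 40.0                     # hypothetical, not from the paper
    finetuned_score = baseline_score + 15.0   # a 15-point absolute gain
    print(absolute_gain(baseline_score, finetuned_score))
    print(relative_gain(baseline_score, finetuned_score))
```

With this hypothetical baseline, a 15-point absolute gain corresponds to a 37.5% relative gain, which is why it matters which of the two a paper reports.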

The authors also draw connections to prior work on self-play and execution feedback for improving instruction following, as well as the broader trend of incorporating supervised knowledge to make large language models more capable.

Critical Analysis

The paper presents a compelling approach for improving instruction following in LLMs, with promising empirical results. However, there are a few limitations and areas for further research worth noting:

The self-synthetic data generation process is not described in great detail, and it's unclear how much human oversight or curation is required to ensure the quality of the synthetic instructions.
The evaluation is limited to a relatively narrow set of tasks, and it's unclear how well the Self-Guide approach would generalize to more open-ended or complex instruction following scenarios.
The paper does not address potential safety or robustness concerns that may arise from having LLMs generate their own instructions, which could lead to unintended or harmful outputs.

Additionally, future work could explore multimodal approaches that incorporate visual or other non-textual information to further enhance instruction following capabilities.

Conclusion

Overall, the SELF-GUIDE approach presented in this paper is a promising direction for improving the instruction following abilities of large language models. By learning from their own self-generated input-output pairs, models can become task-specific experts without any external learning signal.

While there are some limitations and areas for further research, the authors have demonstrated that self-synthetic finetuning can substantially improve task performance without annotated data or a stronger teacher model. As LLMs continue to grow in capability and influence, methods like SELF-GUIDE may become increasingly important for making them follow task instructions reliably.

Related Papers

SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning

Chenyang Zhao, Xueying Jia, Vijay Viswanathan, Tongshuang Wu, Graham Neubig

Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. However, prompting often leads models to make predictions with lower accuracy compared to finetuning a model with ample training data. On the other hand, while finetuning LLMs on task-specific data generally improves their performance, abundant annotated datasets are not available for all tasks. Previous work has explored generating task-specific data from state-of-the-art LLMs and using this data to finetune smaller models, but this approach requires access to a language model other than the one being trained, which introduces cost, scalability challenges, and legal hurdles associated with continuously relying on more powerful LLMs. In response to these, we propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM, then use these input-output pairs to finetune the student LLM itself. In our empirical evaluation of the Natural Instructions V2 benchmark, we find that SELF-GUIDE improves the performance of LLM by a substantial margin. Specifically, we report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics. This sheds light on the promise of self-synthesized data guiding LLMs towards becoming task-specific experts without any external learning signals.

8/13/2024

Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations

Kai Tzu-iunn Ong, Taeyoon Kwon, Jinyoung Yeo

Guiding large language models with a selected set of human-authored demonstrations is a common practice for improving LLM applications. However, human effort can be costly, especially in specialized domains (e.g., clinical diagnosis), and does not guarantee optimal performance due to the potential discrepancy of target skills between selected demonstrations and real test instances. Motivated by these, this paper explores the automatic creation of customized demonstrations, whose target skills align with the given target instance. We present SELF-TAUGHT, a problem-solving framework, which facilitates demonstrations that are tailored to the target problem and filtered for better quality (i.e., correctness) in a zero-shot manner. In 15 tasks of multiple-choice questions of diverse domains and the diagnosis of Alzheimer's disease (AD) with real-world patients, SELF-TAUGHT achieves superior performance to strong baselines (e.g., Few-shot CoT, Plan-and-Solve, Auto-CoT). We conduct comprehensive analyses on SELF-TAUGHT, including its generalizability to existing prompting methods and different LLMs, the quality of its intermediate generation, and more.

8/23/2024

Self-Judge: Selective Instruction Following with Alignment Self-Evaluation

Hai Ye, Hwee Tou Ng

Pre-trained large language models (LLMs) can be tailored to adhere to human instructions through instruction tuning. However, due to shifts in the distribution of test-time data, they may not always execute instructions accurately, potentially generating factual errors or misaligned content when acting as chat assistants. To enhance the reliability of LLMs in following instructions, we propose the study of selective instruction following, whereby the system declines to execute instructions if the anticipated response quality is low. We train judge models that can predict numerical quality scores for model responses. To address data scarcity, we introduce Self-J, a novel self-training framework for developing judge models without needing human-annotated quality scores. Our method leverages the model's inherent self-evaluation capability to extract information about response quality from labeled instruction-tuning data. It incorporates a gold reference answer to facilitate self-evaluation and recalibrates by assessing the semantic similarity between the response sample and the gold reference. During the training phase, we implement self-distillation as a regularization technique to enhance the capability of reference-free estimation. To validate alignment evaluation on general instruction-following tasks, we collect large-scale high-quality instructions from Hugging Face for model training and evaluation. Extensive experiments on five open-source models show that our method correlates much more with GPT-4 than strong baselines, e.g., supervised models distilled from GPT-4 and GPT-3.5-turbo. Our analysis shows our model's strong generalization across domains. Additionally, our judge models serve as good reward models, e.g., boosting WizardLM-13B-V1.2 from 89.17 to 92.48 and from 12.03 to 15.90 in version v1 and v2 of AlpacaEval respectively using best-of-32 sampling with our judge models.

9/4/2024

Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching

Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Yipeng Zhang, Haitao Mi, Helen Meng

Large language models (LLMs) often struggle to provide up-to-date information due to their one-time training and the constantly evolving nature of the world. To keep LLMs current, existing approaches typically involve continued pre-training on new documents. However, they frequently face difficulties in extracting stored knowledge. Motivated by the remarkable success of the Feynman Technique in efficient human learning, we introduce Self-Tuning, a learning framework aimed at improving an LLM's ability to effectively acquire new knowledge from raw documents through self-teaching. Specifically, we develop a Self-Teaching strategy that augments the documents with a set of knowledge-intensive tasks created in a self-supervised manner, focusing on three crucial aspects: memorization, comprehension, and self-reflection. In addition, we introduce three Wiki-Newpages-2023-QA datasets to facilitate an in-depth analysis of an LLM's knowledge acquisition ability concerning memorization, extraction, and reasoning. Extensive experimental results on Llama2 family models reveal that Self-Tuning consistently exhibits superior performance across all knowledge acquisition tasks and excels in preserving previous knowledge.

6/18/2024