FuseChat: Knowledge Fusion of Chat Models

Read original: arXiv:2402.16107 - Published 5/29/2024 by Fanqi Wan, Ziyi Yang, Longguang Zhong, Xiaojun Quan, Xinting Huang, Wei Bi

FuseChat: Knowledge Fusion of Chat Models

Overview

Presents a novel approach called "FuseChat" for fusing knowledge from multiple large language models (LLMs) to improve chatbot performance
Explores techniques for effectively combining the strengths of different LLMs to enhance the capabilities of conversational AI systems
Demonstrates the benefits of knowledge fusion in chat models through thorough experiments and analysis

Plain English Explanation

FuseChat is a new method for combining the knowledge and abilities of different large language models (LLMs) to create more capable chatbots. The researchers recognized that while individual LLMs have impressive capabilities, no single model is perfect. By finding ways to fuse the knowledge from multiple LLMs, the team aimed to develop chatbots that can draw upon a broader and more diverse set of skills and information.

The key idea behind FuseChat is to dynamically sample and blend the outputs from different LLMs based on the context of the conversation. This allows the chatbot to adaptively leverage the strengths of each model, rather than relying on a single, fixed LLM. The researchers explored various techniques for model fusion and language model combination to achieve this goal.

Through extensive experiments, the researchers demonstrated that FuseChat can outperform individual LLMs on a range of conversational tasks, from open-ended chatting to task-oriented dialogues. The fusion approach allows the chatbot to provide more informative, coherent, and relevant responses by drawing upon the complementary strengths of the underlying models.

Technical Explanation

FuseChat is a framework for fusing knowledge from multiple large language models (LLMs) to enhance the performance of conversational AI systems. The key innovation is the development of dynamic sampling and blending techniques that allow the chatbot to adaptively leverage the strengths of different LLMs based on the context of the conversation.

The researchers explored various model fusion and language model combination approaches, including weighted averaging, adapter-based fusion, and prompt engineering. These techniques enable the chatbot to selectively incorporate relevant knowledge and capabilities from the constituent LLMs to generate more informative, coherent, and relevant responses.

The performance of FuseChat was evaluated on a range of conversational tasks, including open-ended chatting and task-oriented dialogues. The results demonstrate that the fusion-based approach can outperform individual LLMs, as the chatbot is able to draw upon a more diverse set of skills and information to handle a wider variety of conversational scenarios.

Critical Analysis

The FuseChat paper provides a robust and well-designed exploration of knowledge fusion for improving chatbot performance. The researchers acknowledge that while individual LLMs have impressive capabilities, no single model is perfect, and there is significant value in dynamically sampling and blending the outputs from multiple models.

However, the paper does not fully address the potential challenges and limitations of the FuseChat approach. For example, the fusion process may introduce additional complexity and computational overhead, which could impact the scalability and real-time performance of the chatbot. Additionally, the paper does not explore the potential biases or inconsistencies that may arise when combining outputs from diverse LLMs, and how these issues can be mitigated.

Further research may be needed to investigate the long-term stability and robustness of the FuseChat system, as well as its generalizability to a wider range of conversational domains and use cases. Exploring the interpretability and explainability of the fusion process could also be a valuable area of inquiry, as it would help users understand the reasoning behind the chatbot's responses.

Conclusion

FuseChat presents a promising approach for enhancing the capabilities of conversational AI systems by fusing knowledge from multiple large language models. The dynamic sampling and blending techniques allow the chatbot to adaptively leverage the strengths of different LLMs, resulting in more informative, coherent, and relevant responses across a variety of conversational tasks.

The research highlights the potential benefits of knowledge fusion in chat models, and demonstrates the value of combining complementary skills and information sources to create more capable and versatile conversational AI assistants. As the field of large language models continues to advance, approaches like FuseChat could play a crucial role in unlocking the full potential of these powerful systems for real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

FuseChat: Knowledge Fusion of Chat Models

Fanqi Wan, Ziyi Yang, Longguang Zhong, Xiaojun Quan, Xinting Huang, Wei Bi

Recently, FuseLLM introduced the concept of knowledge fusion to transfer the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to realize the fusion of chat LLMs, resulting in FusionChat. FusionChat comprises two main stages. Firstly, we undertake knowledge fusion for structurally and scale-varied source LLMs to derive multiple target LLMs of identical structure and size via lightweight fine-tuning. Then, these target LLMs are merged within the parameter space, wherein we propose a novel method for determining the merging weights based on the variation ratio of parameter matrices before and after fine-tuning. We validate our approach using three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B. Experimental results spanning various chat domains demonstrate the superiority of FusionChat-7B across a broad spectrum of chat LLMs at 7B and 34B scales, even surpassing GPT-3.5 (March) and approaching Mixtral-8x7B-Instruct.

5/29/2024

FuseChat: Knowledge Fusion of Chat Models

Fanqi Wan, Longguang Zhong, Ziyi Yang, Ruijun Chen, Xiaojun Quan

While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, it incurs substantial costs and may lead to redundancy in competencies. Knowledge fusion aims to integrate existing LLMs of diverse architectures and capabilities into a more potent LLM through lightweight continual training, thereby reducing the need for costly LLM development. In this work, we propose a new framework for the knowledge fusion of chat LLMs through two main stages, resulting in FuseChat. Firstly, we conduct pairwise knowledge fusion on source chat LLMs of varying structures and scales to create multiple target LLMs with identical structure and size via lightweight fine-tuning. During this process, a statistics-based token alignment approach is introduced as the cornerstone for fusing LLMs with different structures. Secondly, we merge these target LLMs within the parameter space, where we propose a novel method for determining the merging coefficients based on the magnitude of parameter updates before and after fine-tuning. We implement and validate FuseChat using six prominent chat LLMs with diverse architectures and scales, including OpenChat-3.5-7B, Starling-LM-7B-alpha, NH2-SOLAR-10.7B, InternLM2-Chat-20B, Mixtral-8x7B-Instruct, and Qwen-1.5-Chat-72B. Experimental results on two instruction-following benchmarks, AlpacaEval 2.0 and MT-Bench, demonstrate the superiority of FuseChat-7B over baselines of various sizes. Our model is even comparable to the larger Mixtral-8x7B-Instruct and approaches GPT-3.5-Turbo-1106 on MT-Bench. Our code, model weights, and data are public at url{https://github.com/fanqiwan/FuseAI}.

8/16/2024

Cool-Fusion: Fuse Large Language Models without Training

Cong Liu, Xiaojun Quan, Yan Pan, Liang Lin, Weigang Wu, Xu Chen

We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to facilitate their complementary strengths. One of the challenges on model fusion is high computational load, i.e. to fine-tune or to align vocabularies via combinatorial optimization. To this end, we propose emph{Cool-Fusion}, a simple yet effective approach that fuses the knowledge of heterogeneous source LLMs to leverage their complementary strengths. emph{Cool-Fusion} is the first method that does not require any type of training like the ensemble approaches. But unlike ensemble methods, it is applicable to any set of source LLMs that have different vocabularies. The basic idea is to have each source LLM individually generate tokens until the tokens can be decoded into a text segment that ends at word boundaries common to all source LLMs. Then, the source LLMs jointly rerank the generated text segment and select the best one, which is the fused text generation in one step. Extensive experiments are conducted across a variety of benchmark datasets. On emph{GSM8K}, emph{Cool-Fusion} increases accuracy from three strong source LLMs by a significant 8%-17.8%.

7/30/2024

ProFuser: Progressive Fusion of Large Language Models

Tianyuan Shi, Fanqi Wan, Canbin Huang, Xiaojun Quan, Chenliang Li, Ming Yan, Ji Zhang

While fusing the capacities and advantages of various large language models (LLMs) offers a pathway to construct more powerful and versatile models, a fundamental challenge is to properly select advantageous model during the training. Existing fusion methods primarily focus on the training mode that uses cross entropy on ground truth in a teacher-forcing setup to measure a model's advantage, which may provide limited insight towards model advantage. In this paper, we introduce a novel approach that enhances the fusion process by incorporating both the training and inference modes. Our method evaluates model advantage not only through cross entropy during training but also by considering inference outputs, providing a more comprehensive assessment. To combine the two modes effectively, we introduce ProFuser to progressively transition from inference mode to training mode. To validate ProFuser's effectiveness, we fused three models, including vicuna-7b-v1.5, Llama-2-7b-chat, and mpt-7b-8k-chat, and demonstrated the improved performance in knowledge, reasoning, and safety compared to baseline methods.

8/12/2024