Multi-Reference Preference Optimization for Large Language Models






Published 5/28/2024 by Hung Le, Quan Tran, Dung Nguyen, Kien Do, Saloni Mittal, Kelechi Ogueji, Svetha Venkatesh


How can Large Language Models (LLMs) be aligned with human intentions and values? A typical solution is to gather human preferences on model outputs and finetune the LLMs accordingly while ensuring that updates do not deviate too far from a reference model. Recent approaches, such as direct preference optimization (DPO), have eliminated the need for unstable and sluggish reinforcement learning optimization by introducing closed-form supervised losses. However, a significant limitation of the current approach is its design for a single reference model only, neglecting to leverage the collective power of numerous pretrained LLMs. To overcome this limitation, we introduce a novel closed-form formulation for direct preference optimization using multiple reference models. The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models, substantially enhancing preference learning capabilities compared to the single-reference DPO. Our experiments demonstrate that LLMs finetuned with MRPO generalize better across various preference datasets, regardless of data scarcity or abundance. Furthermore, MRPO effectively finetunes LLMs to exhibit superior performance in several downstream natural language processing tasks such as GSM8K and TruthfulQA.

Plain English Explanation

Large language models (LLMs) like GPT-3 are powerful tools that can generate human-like text on a wide range of topics. However, these models are often trained on broad datasets and may not align well with the specific preferences and needs of individual users.

The authors of this paper propose a new approach called Multi-Reference Preference Optimization (MRPO) to fine-tune LLMs on human preference data. Existing methods such as DPO keep the fine-tuned model close to a single reference model. The key idea of MRPO is to use multiple "reference models" during the fine-tuning process, each a pretrained LLM that contributes its own prior knowledge. For example, one could imagine combining a reference model that is strong at creative writing with another that is strong at scientific text.

By incorporating these diverse reference models, the fine-tuned LLM can learn to generate text that is more personalized and aligned with the user's specific needs and preferences. This could be particularly useful for tasks like personalized text ranking, where the model needs to understand the user's preferences to provide relevant and engaging content.

The authors demonstrate the effectiveness of MRPO on several preference datasets, in both data-scarce and data-rich settings, and on downstream tasks such as GSM8K and TruthfulQA. They report that it outperforms single-reference DPO and leads to better alignment between the language model and the preferences it is trained on.

Technical Explanation

The paper introduces Multi-Reference Preference Optimization (MRPO), a closed-form extension of direct preference optimization (DPO) for fine-tuning large language models on preference data. The key idea is to leverage multiple "reference models" during the fine-tuning process instead of a single one, so that the fine-tuned model can draw on the prior knowledge of several pretrained LLMs.
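
For context, the single-reference DPO objective that the abstract builds on is usually written as below. The notation is an assumption introduced here for illustration and does not appear in this summary: \pi_\theta is the model being fine-tuned, \pi_{\mathrm{ref}} is the reference model, (x, y_w, y_l) is a prompt with a preferred and a rejected response, \sigma is the logistic function, and \beta controls how strongly the model is kept close to the reference.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

According to the abstract, MRPO generalizes this setting from one reference model to several. The exact multi-reference formulation is given in the paper and is not reproduced here.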

The authors extend the DPO objective so that the fine-tuned model is regularized toward several reference models at once rather than one. This is achieved with a closed-form supervised loss that rewards the model for ranking preferred responses above rejected ones while keeping its updates from deviating too far from the reference models.
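
To make the idea of a loss defined against several reference models concrete, here is a minimal Python sketch for a single preference pair. It is not the paper's formulation: the aggregation rule (a weighted average of reference log-probabilities) and the names `combined_ref_logp`, `multi_ref_dpo_loss`, `weights`, and `beta` are all assumptions chosen for illustration. With a single reference model the function reduces to the standard DPO loss for that pair.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def combined_ref_logp(ref_logps: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical aggregation (an assumption, not the paper's rule):
    a weighted average of the reference models' log-probabilities."""
    if len(ref_logps) != len(weights) or not ref_logps:
        raise ValueError("need one weight per reference model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * lp for w, lp in zip(weights, ref_logps)) / total


def multi_ref_dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logps_chosen: Sequence[float],
    ref_logps_rejected: Sequence[float],
    weights: Sequence[float],
    beta: float = 0.1,
) -> float:
    """DPO-style loss for one preference pair, using an aggregated reference.

    Each log-probability is the summed token log-probability of a full
    response under the corresponding model.
    """
    ref_chosen = combined_ref_logp(ref_logps_chosen, weights)
    ref_rejected = combined_ref_logp(ref_logps_rejected, weights)
    # Margin between the implicit rewards of the chosen and rejected responses.
    margin = (policy_logp_chosen - ref_chosen) - (policy_logp_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


if __name__ == "__main__":
    # Two hypothetical reference models, equally weighted.
    loss = multi_ref_dpo_loss(
        policy_logp_chosen=-1.0,
        policy_logp_rejected=-5.0,
        ref_logps_chosen=[-2.0, -3.0],
        ref_logps_rejected=[-2.0, -3.0],
        weights=[0.5, 0.5],
    )
    print(f"loss = {loss:.4f}")
```

The loss falls as the fine-tuned model assigns relatively more probability to the preferred response than the aggregated reference does. In a real training loop the log-probabilities would come from model forward passes, and the reference models would be kept frozen.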

MRPO belongs to the same family as other direct preference methods, such as token-level direct preference optimization and triple preference optimization, which also optimize preference alignment with supervised losses rather than reinforcement learning.

The authors conduct experiments on several preference datasets, covering both scarce and abundant data, and on downstream natural language processing tasks such as GSM8K and TruthfulQA. The results indicate that models fine-tuned with MRPO generalize better on preference data and perform better on these downstream tasks than models fine-tuned with single-reference DPO.

Critical Analysis

The paper presents a compelling approach to fine-tuning large language models to better align with user preferences. The use of multiple reference models during the fine-tuning process is a clever idea that allows the model to learn a more personalized representation of language.

One potential limitation of the approach is the requirement to have access to multiple reference models, each representing a different set of preferences. In practice, it may be challenging to obtain or construct these reference models, especially for niche or specialized domains.

Additionally, the authors do not discuss the potential for unobserved preference heterogeneity within the reference models themselves. This could be an important consideration, as individual preferences may not be fully captured by the reference models.

Further research could explore ways to learn the reference models more effectively, potentially by incorporating user feedback or other sources of preference information. Investigating the robustness of MRPO to variations in the reference models would also be valuable.


Conclusion

This paper introduces a novel approach called Multi-Reference Preference Optimization (MRPO) for fine-tuning large language models to better align with user preferences. By leveraging multiple reference models during the fine-tuning process, the authors demonstrate that LLMs can learn a more personalized representation of language, leading to improved performance on a variety of tasks.

The core idea of MRPO could benefit work on preference-aligned language modeling and open up opportunities for more tailored and engaging language-based applications. The authors report better generalization on preference data and stronger results on downstream tasks such as GSM8K and TruthfulQA.

While the paper presents a promising solution, there are still areas for further research, such as addressing the challenges of obtaining and curating the reference models. Overall, the MRPO approach represents an important step towards developing large language models that are better aligned with the needs and preferences of individual users.

