Active Preference Learning for Ordering Items In- and Out-of-sample

Read original: arXiv:2405.03059 - Published 5/7/2024 by Herman Bergstrom, Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson

Overview

  • This paper explores active learning algorithms for ordering items based on noisy pairwise comparisons, which is useful when item-specific labels are difficult to assign.
  • Many existing algorithms treat items as unrelated, ignoring their shared contextual attributes and limiting sample efficiency and generalization.
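  • The authors propose two new algorithms that actively sample comparisons to greedily minimize an upper bound on the expected ordering error, both for items compared during learning (in-sample) and for new items (out-of-sample). The bound accounts for both aleatoric (inherent) and epistemic (model) uncertainty.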
  • The algorithms are evaluated on realistic image ordering tasks, including one with human-provided comparisons, and demonstrate superior sample efficiency compared to non-contextual and baseline approaches.

Plain English Explanation

Ordering items is often useful, but can be tricky when it's hard to directly label the items themselves. For example, if you need to rank a set of products based on how "good" they are, it may be difficult for people to assign a numerical score to each one. Instead, it might be easier for them to compare pairs of products and say which one is better.

The authors recognized that many existing algorithms for learning these rankings from pairwise comparisons don't take advantage of the fact that the items being ranked often have associated attributes (like product features). By ignoring these attributes, the algorithms miss out on opportunities to learn more efficiently and apply what they've learned to new items.

To address this, the authors developed two new algorithms that actively select which pairwise comparisons to request in order to minimize the expected error in the final ranking. Importantly, these algorithms consider both the inherent uncertainty in the comparisons (e.g., due to human disagreement) and the uncertainty in the model's understanding of how the item attributes relate to the rankings.

The authors tested their algorithms on real-world tasks like ranking images, including one where the comparisons came from human annotators. They found that their algorithms were more sample-efficient than previous approaches, meaning they could learn accurate rankings using fewer pairwise comparisons.

Technical Explanation

The paper proposes a framework for active learning of item orderings from noisy pairwise comparisons that takes contextual item attributes into account. The authors use a logistic preference model: the probability that one item is preferred over another is a logistic (sigmoid) function of the difference between their utilities, and each item's utility depends on its contextual features.
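To make the model concrete, here is a minimal Python sketch of a logistic preference model. It assumes, as an illustration rather than a detail stated in this summary, that each item's utility is linear in its feature vector, so the probability that item i beats item j is the sigmoid of their utility difference. The helper names, weights, and toy items are all hypothetical.

```python
import math
import random

def utility(theta, x):
    """Linear utility of an item with feature vector x (assumed model form)."""
    return sum(t * xi for t, xi in zip(theta, x))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def prob_prefer(theta, x_i, x_j):
    """P(item i is preferred over item j) under the logistic preference model."""
    return sigmoid(utility(theta, x_i) - utility(theta, x_j))

def sample_comparison(theta, x_i, x_j, rng):
    """Simulate one noisy comparison: returns 1 if item i wins, else 0."""
    return 1 if rng.random() < prob_prefer(theta, x_i, x_j) else 0

# Toy example: two features per item, hypothetical weights.
theta = [2.0, -1.0]
items = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [0.0, 1.0]}

print(round(prob_prefer(theta, items["a"], items["c"]), 4))  # sigmoid(3) -> 0.9526
print(sorted(items, key=lambda k: utility(theta, items[k]), reverse=True))  # ['a', 'b', 'c']
```

Even with `theta` known exactly, `prob_prefer` for items "a" and "b" is only about 0.82, so repeated comparisons of that pair can disagree. That randomness is the aleatoric uncertainty; not knowing `theta` itself is the epistemic uncertainty.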

To actively select the most informative pairwise comparisons, the authors derive an upper bound on the expected ordering error in terms of the aleatoric (inherent) uncertainty in the comparisons and the epistemic (model) uncertainty. They then propose two algorithms, Active Ranking with Uncertainty Weighted Pairs (ARUWP) and Active Ranking with Feature-Weighted Uncertainty (ARFWU), that greedily select pairs to minimize this error bound.
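The summary does not give the bound itself, so the sketch below only illustrates the general idea of combining the two kinds of uncertainty when greedily choosing the next pair. It assumes a Gaussian posterior N(mu, Sigma) over linear utility weights (for example, from a Laplace approximation) and uses a stand-in score: the drop in log det(Sigma) that one comparison would cause, log(1 + p(1 - p) · dᵀΣd), where d is the feature difference of the pair. This is not ARUWP, ARFWU, or the paper's bound; all function names are hypothetical.

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pair_score(mu, Sigma, x_i, x_j):
    """Stand-in acquisition score for comparing items i and j (not the paper's bound).

    aleatoric: p * (1 - p), largest when the pair is a near tie under the mean model.
    epistemic: d^T Sigma d, posterior variance of the utility difference.
    log(1 + aleatoric * epistemic) is the drop in log det(Sigma) that one
    comparison would cause under a Laplace approximation.
    """
    d = x_i - x_j
    p = sigmoid(mu @ d)
    aleatoric = p * (1.0 - p)
    epistemic = d @ Sigma @ d
    return np.log1p(aleatoric * epistemic)

def select_pair(mu, Sigma, X):
    """Greedily return the unordered pair (i, j) with the highest score."""
    pairs = itertools.combinations(range(len(X)), 2)
    return max(pairs, key=lambda ij: pair_score(mu, Sigma, X[ij[0]], X[ij[1]]))

def update_covariance(mu, Sigma, x_i, x_j):
    """Sherman-Morrison update of Sigma after comparing items i and j.

    Adds the comparison's Fisher information p(1-p) d d^T to the posterior
    precision. The posterior mean update is omitted to keep the sketch short.
    """
    d = x_i - x_j
    p = sigmoid(mu @ d)
    w = p * (1.0 - p)
    Sd = Sigma @ d
    return Sigma - w * np.outer(Sd, Sd) / (1.0 + w * (d @ Sd))

# Usage: three items with two features, prior N(0, I) over the weights.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
mu, Sigma = np.zeros(2), np.eye(2)
i, j = select_pair(mu, Sigma, X)
print(i, j)  # 0 2: the most dissimilar pair has the largest d^T Sigma d
Sigma = update_covariance(mu, Sigma, X[i], X[j])
```

With `mu = 0` every pair is a predicted coin flip (p = 0.5), so the epistemic term decides the first pick; after the update, the posterior variance of that pair's utility difference shrinks from 2 to 4/3.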

ARUWP selects pairs with high overall uncertainty, while ARFWU additionally weighs the uncertainty by the relevance of the item features to the ranking. The error bound provides the theoretical motivation for both algorithms; their advantage in sample efficiency over non-contextual ranking approaches and active preference learning baselines is demonstrated empirically (related summaries: https://aimodels.fyi/papers/arxiv/optimal-design-human-feedback, https://aimodels.fyi/papers/arxiv/improved-active-learning-via-dependent-leverage-score, https://aimodels.fyi/papers/arxiv/classification-tree-based-active-learning-wrapper-approach).

The algorithms are evaluated on two realistic image ordering tasks, one with comparisons provided by human annotators. The results demonstrate the superior sample efficiency of the proposed methods compared to the baselines, particularly in the human-annotated task.
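Sample efficiency is about how quickly the ordering error drops as comparisons are added. The summary does not say how the paper measures ordering error; one common choice, assumed here, is the fraction of item pairs that the estimated scores place in the wrong order (a normalized Kendall tau distance). The function name and tie handling are illustrative choices.

```python
from itertools import combinations

def ordering_error(true_scores, est_scores):
    """Fraction of item pairs placed in the wrong relative order.

    Pairs that are tied in true_scores have no correct order and are skipped;
    a tie in est_scores for a strictly ordered pair counts as an error.
    """
    counted = wrong = 0
    for a, b in combinations(list(true_scores), 2):
        t = true_scores[a] - true_scores[b]
        if t == 0:
            continue
        counted += 1
        e = est_scores[a] - est_scores[b]
        if t * e <= 0:  # opposite signs, or an estimated tie
            wrong += 1
    return wrong / counted if counted else 0.0

true = {"a": 3.0, "b": 2.0, "c": 1.0}
print(ordering_error(true, {"a": 3.0, "b": 1.0, "c": 2.0}))  # one of three pairs swapped
print(ordering_error(true, {"a": 1.0, "b": 2.0, "c": 3.0}))  # fully reversed -> 1.0
```

Plotting this error against the number of comparisons collected, for an active strategy versus random pair selection, is one way to visualize a sample-efficiency comparison.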

Critical Analysis

The paper presents a novel and well-motivated approach to active learning for item ordering from pairwise comparisons. The key strengths are the consideration of contextual item attributes, the principled optimization of an error bound, and the demonstrated improvements over existing methods.

One potential limitation is the assumption of a logistic preference model, which may not fully capture real-world preference structures. The authors acknowledge this and suggest extensions to more flexible models as future work.

Additionally, the paper studies ordering items that share an attribute representation, both in-sample and out-of-sample (new items that were not compared during learning). It would be interesting to explore how well the learned models transfer to item sets that differ more substantially from the training items, or to learn a general ranking function that can be applied across different tasks.
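Ordering out-of-sample is possible because a contextual model scores items through their features rather than through per-item parameters. As a minimal sketch, assuming linear utilities and using plain gradient ascent on an L2-regularized logistic likelihood (a hypothetical helper, not the paper's estimator), the code below fits the weights from (winner, loser) comparisons among four training items and then orders three items that were never compared.

```python
import numpy as np

def fit_theta(X, comparisons, l2=1.0, lr=0.1, steps=500):
    """MAP estimate of linear utility weights under a logistic preference model.

    comparisons: list of (winner_index, loser_index) pairs into X.
    l2: Gaussian-prior (ridge) strength; keeps theta finite when the
        training comparisons are perfectly consistent.
    Plain gradient ascent on the concave log-posterior.
    """
    D = np.array([X[win] - X[lose] for win, lose in comparisons])  # winner minus loser
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(D @ theta)))  # P(observed winner wins)
        grad = D.T @ (1.0 - p) - l2 * theta
        theta += lr * grad
    return theta

# Training items (compared during learning) and noise-free toy comparisons.
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.8, 0.2]])
comparisons = [(0, 1), (0, 2), (3, 2), (2, 1), (3, 1), (0, 3)]
theta = fit_theta(X_train, comparisons)

# Unseen items: never compared, ordered purely from their features.
X_new = np.array([[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]])
order = np.argsort(-(X_new @ theta))  # indices from highest to lowest utility
print(order)  # [1 2 0]
```

A non-contextual method that learns one score per item would have no basis for ranking `X_new` at all, which is the generalization gap the paper targets.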

Finally, while the human-annotated experiment provides valuable real-world insights, the sample size is relatively small. Conducting larger-scale user studies would help further validate the practical benefits of the proposed approach.

Overall, this work makes an important contribution to the active learning literature, with promising implications for a variety of ranking and preference learning applications.

Conclusion

This paper introduces a new framework for actively learning item orderings from noisy pairwise comparisons, using contextual item attributes to improve sample efficiency and to generalize to new items. The proposed algorithms, ARUWP and ARFWU, greedily select comparisons to minimize an upper bound on the expected ordering error, which accounts for both inherent (aleatoric) and model-based (epistemic) uncertainty.

Experiments on realistic image ordering tasks, including one with human-provided comparisons, demonstrate better sample efficiency than non-contextual ranking approaches and active preference learning baselines. This work represents an important step forward in preference learning, with potential applications in recommendation systems, user interface design, and other domains where accurate rankings are valuable but direct item labels are hard to obtain.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Active Preference Learning for Ordering Items In- and Out-of-sample

Herman Bergstrom, Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson

Learning an ordering of items based on noisy pairwise comparisons is useful when item-specific labels are difficult to assign, for example, when annotators have to make subjective assessments. Algorithms have been proposed for actively sampling comparisons of items to minimize the number of annotations necessary for learning an accurate ordering. However, many ignore shared structure between items, treating them as unrelated, limiting sample efficiency and precluding generalization to new items. In this work, we study active learning with pairwise preference feedback for ordering items with contextual attributes, both in- and out-of-sample. We give an upper bound on the expected ordering error incurred by active learning strategies under a logistic preference model, in terms of the aleatoric and epistemic uncertainty in comparisons, and propose two algorithms designed to greedily minimize this bound. We evaluate these algorithms in two realistic image ordering tasks, including one with comparisons made by human annotators, and demonstrate superior sample efficiency compared to non-contextual ranking approaches and active preference learning baselines.

5/7/2024

Deep Bayesian Active Learning for Preference Modeling in Large Language Models

Luckeciano C. Melo, Panagiotis Tigas, Alessandro Abate, Yarin Gal

Leveraging human preferences for steering the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling are still a bottleneck for these systems, particularly at large scale. Hence, selecting the most informative points for acquiring human feedback may considerably reduce the cost of preference labeling and unleash the further development of LLMs. Bayesian Active Learning provides a principled framework for addressing this challenge and has demonstrated remarkable success in diverse settings. However, previous attempts to employ it for Preference Modeling did not meet such expectations. In this work, we identify that naive epistemic uncertainty estimation leads to the acquisition of redundant samples. We address this by proposing the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM. Notably, our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels in two popular human preference datasets and exceeds previous stochastic Bayesian acquisition policies.

6/17/2024

Active Preference Learning for Large Language Models

William Muldrew, Peter Hayes, Mingtian Zhang, David Barber

As large language models (LLMs) become more capable, fine-tuning techniques for aligning with human intent are increasingly important. A key consideration for aligning these models is how to most effectively use human resources, or model resources in the case where LLMs themselves are used as oracles. Reinforcement learning from Human or AI preferences (RLHF/RLAIF) is the most prominent example of such a technique, but is complex and often unstable. Direct Preference Optimization (DPO) has recently been proposed as a simpler and more stable alternative. In this work, we develop an active learning strategy for DPO to make better use of preference labels. We propose a practical acquisition function for prompt/completion pairs based on the predictive entropy of the language model and a measure of certainty of the implicit preference model optimized by DPO. We demonstrate how our approach improves both the rate of learning and final performance of fine-tuning on pairwise preference data.

7/1/2024

Active Learning for Non-Parametric Choice Models

Fransisca Susan (MIT Operations Research Center), Negin Golrezaei (MIT Sloan School of Management), Ehsan Emamjomeh-Zadeh (Meta Platforms, Inc), David Kempe (University of Southern California, Los Angeles)

We study the problem of actively learning a non-parametric choice model based on consumers' decisions. We present a negative result showing that such choice models may not be identifiable. To overcome the identifiability problem, we introduce a directed acyclic graph (DAG) representation of the choice model. This representation provably encodes all the information about the choice model which can be inferred from the available data, in the sense that it permits computing all choice probabilities. We establish that given exact choice probabilities for a collection of item sets, one can reconstruct the DAG. However, attempting to extend this methodology to estimate the DAG from noisy choice frequency data obtained during an active learning process leads to inaccuracies. To address this challenge, we present an inclusion-exclusion approach that effectively manages error propagation across DAG levels, leading to a more accurate estimate of the DAG. Utilizing this technique, our algorithm estimates the DAG representation of an underlying non-parametric choice model. The algorithm operates efficiently (in polynomial time) when the set of frequent rankings is drawn uniformly at random. It learns the distribution over the most popular items among frequent preference types by actively and repeatedly offering assortments of items and observing the chosen item. We demonstrate that our algorithm more effectively recovers a set of frequent preferences on both synthetic and publicly available datasets on consumers' preferences, compared to corresponding non-active learning estimation algorithms. These findings underscore the value of our algorithm and the broader applicability of active-learning approaches in modeling consumer behavior.

4/26/2024