Data as voters: instance selection using approval-based multi-winner voting

Read original: arXiv:2304.09995 - Published 5/22/2024 by Luis S'anchez-Fern'andez, Jes'us A. Fisteus, Rafael L'opez-Zaragoza

📊

Overview

Presents a new approach to the instance selection problem in machine learning and data mining
Leverages recent research on proportional representation in approval-based multi-winner elections
Instances play a dual role as both voters and candidates in the model
Selects "election winners" as the reduced training set instances

Plain English Explanation

The paper introduces a novel way to reduce the size of a machine learning training dataset while trying to maintain the accuracy of the model. The key idea is to treat the data instances (e.g., individual data points) as both "voters" and "candidates" in an approval-based multi-winner election.

Each instance in the original training set acts as a "voter" and defines its "approval set" based on a concept called the "local set" (which is already known in the literature). Then, a voting rule is used to select the "winners" of this election, and those winners become the instances that are kept in the reduced training set.

The researchers found that for K-Nearest Neighbors (KNN) models, a particular voting rule called "Simple 2-EJR" outperformed other state-of-the-art algorithms in terms of accuracy versus the amount of reduction in the training set size. For Support Vector Machines (SVMs), they saw slight increases in average accuracy by using voting rules that satisfy certain properties like Extended justified representation (EJR) or Proportional justified representation (PJR).

Technical Explanation

The paper proposes a novel approach to the instance selection problem, which aims to reduce the size of a machine learning training dataset while preserving its quality. The key idea is to leverage recent results on proportional representation in approval-based multi-winner elections.

In the proposed model, each instance in the training set plays a dual role as both a "voter" and a "candidate" in an approval-based election. The "approval set" of each instance (acting as a voter) is defined based on the concept of a "local set", which is already established in the literature.

The researchers then use a representative voting rule to select the "winners" of this election, and these winners become the instances that are kept in the reduced training set.

The paper presents experimental results for two common machine learning models: K-Nearest Neighbors (KNN) and Support Vector Machines (SVMs). For KNN, the researchers found that a variant of the "Simple EJR" voting rule, called "Simple 2-EJR", outperformed other state-of-the-art algorithms and baselines in terms of accuracy versus reduction. For SVMs, they obtained slight increases in average accuracy by using voting rules that satisfy the EJR or PJR properties.

Critical Analysis

The paper presents a novel and intriguing approach to the instance selection problem, but it does not address several important considerations. The authors do not provide a detailed analysis of the computational complexity of their proposed method, which could be a significant factor, especially for large-scale datasets. Additionally, the paper does not explore the sensitivity of the results to the choice of the "local set" definition or the specific voting rules used.

Another potential limitation is the reliance on accuracy as the primary evaluation metric. In many real-world applications, other performance measures, such as interpretability or robustness, may be equally or more important. The paper would benefit from a more comprehensive evaluation of the proposed method's impact on these other relevant metrics.

Further research could also investigate the theoretical properties of the instance selection approach, such as its convergence guarantees or the optimality of the selected instances under different assumptions or constraints.

Conclusion

The presented research offers a novel perspective on the instance selection problem in machine learning, drawing inspiration from the field of approval-based multi-winner elections. By treating data instances as both voters and candidates, the authors develop a unique approach to reducing the size of training datasets while attempting to preserve model accuracy.

The experimental results demonstrate promising performance, especially for KNN models, and suggest that voting rules satisfying certain properties can be beneficial for SVMs as well. However, the paper also highlights the need for further investigation into the computational complexity, sensitivity to design choices, and broader performance implications of the proposed method.

Overall, this work represents an interesting contribution to the ongoing efforts to develop efficient and effective data selection techniques in the field of machine learning and data mining.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

📊

Data as voters: instance selection using approval-based multi-winner voting

Luis S'anchez-Fern'andez, Jes'us A. Fisteus, Rafael L'opez-Zaragoza

We present a novel approach to the instance selection problem in machine learning (or data mining). Our approach is based on recent results on (proportional) representation in approval-based multi-winner elections. In our model, instances play a double role as voters and candidates. The approval set of each instance in the training set (acting as a voter) is defined from the concept of local set, which already exists in the literature. We then select the election winners by using a representative voting rule, and such winners are the data instances kept in the reduced training set. Our experiments show that, for KNN, the rule Simple 2-EJR (a variant of the Simple EJR voting rule that satisfies 2-EJR) outperforms all the state-of-the-art algorithms and all the baselines that we consider in this paper in terms of accuracy vs reduction. For SVMs, we have obtained slight increases in the average accuracy by using several voting rules that satisfy EJR or PJR compared to the results obtained with the original datasets.

5/22/2024

Multiwinner Temporal Voting with Aversion to Change

Valentin Zech, Niclas Boehmer, Edith Elkind, Nicholas Teh

We study two-stage committee elections where voters have dynamic preferences over candidates; at each stage, a committee is chosen under a given voting rule. We are interested in identifying a winning committee for the second stage that overlaps as much as possible with the first-stage committee. We show a full complexity dichotomy for the class of Thiele rules: this problem is tractable for Approval Voting (AV) and hard for all other Thiele rules (including, in particular, Proportional Approval Voting and the Chamberlin-Courant rule). We extend this dichotomy to the greedy variants of Thiele rules. We also explore this problem from a parameterized complexity perspective for several natural parameters. We complement the theory with experimental analysis: e.g., we investigate the average number of changes in the committee as a function of changes in voters' preferences and the role of ties.

8/21/2024

DeepVoting: Learning Voting Rules with Tailored Embeddings

Leonardo Matone, Ben Abramowitz, Nicholas Mattei, Avinash Balakrishnan

Aggregating the preferences of multiple agents into a collective decision is a common step in many important problems across areas of computer science including information retrieval, reinforcement learning, and recommender systems. As Social Choice Theory has shown, the problem of designing algorithms for aggregation rules with specific properties (axioms) can be difficult, or provably impossible in some cases. Instead of designing algorithms by hand, one can learn aggregation rules, particularly voting rules, from data. However, the prior work in this area has required extremely large models, or been limited by the choice of preference representation, i.e., embedding. We recast the problem of designing a good voting rule into one of learning probabilistic versions of voting rules that output distributions over a set of candidates. Specifically, we use neural networks to learn probabilistic social choice functions from the literature. We show that embeddings of preference profiles derived from the social choice literature allows us to learn existing voting rules more efficiently and scale to larger populations of voters more easily than other work if the embedding is tailored to the learning objective. Moreover, we show that rules learned using embeddings can be tweaked to create novel voting rules with improved axiomatic properties. Namely, we show that existing voting rules require only minor modification to combat a probabilistic version of the No Show Paradox.

8/27/2024

Abductive and Contrastive Explanations for Scoring Rules in Voting

Cl'ement Contet, Umberto Grandi, J'er^ome Mengin

We view voting rules as classifiers that assign a winner (a class) to a profile of voters' preferences (an instance). We propose to apply techniques from formal explainability, most notably abductive and contrastive explanations, to identify minimal subsets of a preference profile that either imply the current winner or explain why a different candidate was not elected. Formal explanations turn out to have strong connections with classical problems studied in computational social choice such as bribery, possible and necessary winner identification, and preference learning. We design algorithms for computing abductive and contrastive explanations for scoring rules. For the Borda rule, we find a lower bound on the size of the smallest abductive explanations, and we conduct simulations to identify correlations between properties of preference profiles and the size of their smallest abductive explanations.

8/27/2024