An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification

Read original: arXiv:2405.01557 - Published 7/25/2024 by Mustafa Cavus, Przemysław Biecek

Overview

Predictive models can be biased when classifying imbalanced datasets, favoring the majority class and performing poorly on the minority class.
Balancing or resampling methods are common pre-processing steps to address this issue, but there have been debates about their effectiveness.
The Rashomon effect, where multiple candidate models exhibit similar predictive performance, can lead to the blind selection of a suboptimal model and to predictive multiplicity (conflicting predictions for the same observation).
This study examines the impact of balancing methods on predictive multiplicity through the Rashomon effect.

Plain English Explanation

Predictive models, such as those used for classification tasks, can sometimes have a bias towards the majority class in a dataset. This means the model is better at predicting the most common class, but struggles to accurately predict the minority class. To fix this, techniques like balancing or resampling the data are often used as pre-processing steps.
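The arithmetic behind this bias can be illustrated with a small sketch. The 95/5 class split and the always-predict-majority "model" below are hypothetical choices made for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical toy labels: 95 majority (0) and 5 minority (1) examples.
y_true = np.array([0] * 95 + [1] * 5)

# A degenerate "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

# Overall accuracy looks strong because the majority class dominates.
accuracy = float(np.mean(y_pred == y_true))

# Recall on the minority class: share of true minority cases predicted as minority.
minority_recall = float(np.mean(y_pred[y_true == 1] == 1))

print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
```

Here the accuracy is 0.95 while no minority example is ever detected, which is why accuracy alone can hide poor minority-class performance.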

However, there has been some debate about how well these methods work. One issue is the Rashomon effect, where multiple candidate models perform similarly well. This can make it difficult to choose the best model, as they may give conflicting predictions for the same data points (predictive multiplicity).

In this study, the researchers wanted to understand how balancing methods impact this predictive multiplicity through the Rashomon effect. They conducted experiments on real datasets to see how different balancing approaches affected the number of equally accurate models and the consistency of their predictions.

Technical Explanation

The researchers performed experiments using real-world datasets to observe the impact of balancing methods on predictive multiplicity through the Rashomon effect. They applied various balancing techniques, such as oversampling the minority class or undersampling the majority class, to address the class imbalance problem.
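As a rough illustration of what oversampling and undersampling do to the training data, here is a minimal NumPy sketch of the simplest random variants for a binary problem. The function names `random_oversample` and `random_undersample` are hypothetical, and the summary does not say which specific balancing methods the paper evaluated, so this should not be read as the authors' procedure.

```python
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority rows until both classes have equal size."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    # Sample with replacement, since we need more rows than the minority class has.
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

def random_undersample(X, y, rng):
    """Drop randomly chosen majority rows until both classes have equal size."""
    classes, counts = np.unique(y, return_counts=True)
    majority = classes[np.argmax(counts)]
    majority_idx = np.flatnonzero(y == majority)
    other_idx = np.flatnonzero(y != majority)
    # Sample without replacement, keeping as many majority rows as minority rows.
    kept_majority = rng.choice(majority_idx, size=counts.min(), replace=False)
    keep = np.sort(np.concatenate([other_idx, kept_majority]))
    return X[keep], y[keep]

# Hypothetical toy data: 90 majority rows and 10 minority rows.
rng = np.random.default_rng(0)
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.array([0] * 90 + [1] * 10)

X_over, y_over = random_oversample(X, y, rng)
X_under, y_under = random_undersample(X, y, rng)
print(np.bincount(y_over), np.bincount(y_under))
```

On this toy data, oversampling yields 90 rows per class and undersampling yields 10 rows per class. Resampling should be applied to the training split only, so that evaluation still reflects the original class distribution.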

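By examining the Rashomon effect, where multiple models exhibit similar predictive performance, the researchers assessed how the balancing methods affected the number of approximately equally accurate models and the consistency of their predictions (predictive multiplicity). According to the paper's abstract, multiplicity was measured with two existing metrics, ambiguity and discrepancy, together with a newly proposed metric called obscurity.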
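To make these quantities concrete, the sketch below builds a Rashomon set from a list of model scores and computes two multiplicity measures, ambiguity and discrepancy. The definitions used are the ones common in the predictive multiplicity literature and are an assumption here, since this summary does not define them: ambiguity is the share of observations on which at least one model in the set disagrees with a baseline model, and discrepancy is the largest share of observations on which any single model disagrees with the baseline. The tolerance `epsilon`, the scores, and the predictions are hypothetical, and the paper's newly proposed obscurity metric is not reproduced.

```python
import numpy as np

def rashomon_set(scores, epsilon):
    """Indices of models whose score is within epsilon of the best (higher is better)."""
    scores = np.asarray(scores, dtype=float)
    return np.flatnonzero(scores >= scores.max() - epsilon)

def ambiguity(preds, baseline_pred):
    """Share of observations where at least one model disagrees with the baseline."""
    conflicts = preds != baseline_pred  # shape: (n_models, n_observations)
    return float(conflicts.any(axis=0).mean())

def discrepancy(preds, baseline_pred):
    """Largest share of observations on which a single model disagrees with the baseline."""
    conflicts = preds != baseline_pred
    return float(conflicts.mean(axis=1).max())

# Hypothetical validation scores and class predictions for four candidate models.
scores = [0.90, 0.89, 0.895, 0.70]
preds = np.array([
    [1, 0, 1, 0, 0],  # model 0: best score, used as the baseline
    [1, 0, 0, 0, 0],  # model 1: disagrees with the baseline on one observation
    [1, 1, 1, 0, 1],  # model 2: disagrees with the baseline on two observations
    [0, 1, 0, 1, 1],  # model 3: clearly worse, falls outside the Rashomon set
])

members = rashomon_set(scores, epsilon=0.02)
best = int(np.argmax(scores))

amb = ambiguity(preds[members], preds[best])
disc = discrepancy(preds[members], preds[best])
print(members.tolist(), amb, disc)
```

In this toy case the Rashomon set contains models 0, 1, and 2. Three of the five observations receive conflicting predictions (ambiguity 0.6), and the most divergent model flips two of five predictions (discrepancy 0.4), even though all three models score within 0.02 of each other.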

The findings showed that the balancing methods tended to inflate the predictive multiplicity, leading to a larger set of models with comparable performance. This means that blindly selecting one of these models could result in using a suboptimal model, as the models may yield conflicting predictions for the same data points.

To help address this issue, the researchers proposed using an extended performance-gain plot to monitor the trade-off between model performance and predictive multiplicity when conducting the modeling process.

Critical Analysis

The study provides valuable insights into the impact of balancing methods on predictive multiplicity through the Rashomon effect. However, the authors acknowledge that the findings may be limited to the specific datasets and models used in the experiments.

Additionally, the study does not explore the underlying reasons why balancing methods can inflate the Rashomon effect. Further research may be needed to understand the mechanisms behind this phenomenon and how it can be mitigated.

While the proposed extended performance-gain plot is a promising approach, its practical implementation and effectiveness in real-world scenarios require further validation and testing.

Conclusion

This study highlights the importance of considering predictive multiplicity, in addition to model performance, when selecting predictive models, especially when working with imbalanced datasets. The findings suggest that commonly used balancing methods can inflate the Rashomon effect, leading to a larger set of equally accurate models with potentially conflicting predictions.

To address this issue, the researchers recommend using the extended performance-gain plot to monitor the trade-off between model performance and predictive multiplicity during the modeling process. This approach can help researchers and practitioners make more informed decisions when selecting predictive models and ensure the robustness and reliability of their findings.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification

Mustafa Cavus, Przemysław Biecek

Predictive models may generate biased predictions when classifying imbalanced datasets. This happens when the model favors the majority class, leading to low performance in accurately predicting the minority class. To address this issue, balancing or resampling methods are critical data-centric AI approaches in the modeling process to improve prediction performance. However, there have been debates and questions about the functionality of these methods in recent years. In particular, many candidate models may exhibit very similar predictive performance, called the Rashomon effect, in model selection, and they may even produce different predictions for the same observations. Selecting one of these models without considering the predictive multiplicity -- which is the case of yielding conflicting models' predictions for any sample -- can result in blind selection. In this paper, the impact of balancing methods on predictive multiplicity is examined using the Rashomon effect. It is crucial because the blind model selection in data-centric AI is risky from a set of approximately equally accurate models. This may lead to severe problems in model selection, validation, and explanation. To tackle this matter, we conducted real dataset experiments to observe the impact of balancing methods on predictive multiplicity through the Rashomon effect by using a newly proposed metric obscurity in addition to the existing ones: ambiguity and discrepancy. Our findings showed that balancing methods inflate the predictive multiplicity and yield varying results. To monitor the trade-off between the prediction performance and predictive multiplicity for conducting the modeling process responsibly, we proposed using the extended version of the performance-gain plot when balancing the training data.

7/25/2024

Amazing Things Come From Having Many Good Models

Cynthia Rudin, Chudi Zhong, Lesia Semenova, Margo Seltzer, Ronald Parr, Jiachang Liu, Srikar Katta, Jon Donnelly, Harry Chen, Zachery Boner

The Rashomon Effect, coined by Leo Breiman, describes the phenomenon that there exist many equally good predictive models for the same dataset. This phenomenon happens for many real datasets and when it does, it sparks both magic and consternation, but mostly magic. In light of the Rashomon Effect, this perspective piece proposes reshaping the way we think about machine learning, particularly for tabular data problems in the nondeterministic (noisy) setting. We address how the Rashomon Effect impacts (1) the existence of simple-yet-accurate models, (2) flexibility to address user preferences, such as fairness and monotonicity, without losing performance, (3) uncertainty in predictions, fairness, and explanations, (4) reliable variable importance, (5) algorithm choice, specifically, providing advanced knowledge of which algorithms might be suitable for a given problem, and (6) public policy. We also discuss a theory of when the Rashomon Effect occurs and why. Our goal is to illustrate how the Rashomon Effect can have a massive impact on the use of machine learning for complex problems in society.

7/11/2024

Practical Attribution Guidance for Rashomon Sets

Sichao Li, Amanda S. Barnard, Quanling Deng

Different prediction models might perform equally well (Rashomon set) in the same task, but offer conflicting interpretations and conclusions about the data. The Rashomon effect in the context of Explainable AI (XAI) has been recognized as a critical factor. Although the Rashomon set has been introduced and studied in various contexts, its practical application is in its infancy and lacks adequate guidance and evaluation. We study the problem of Rashomon set sampling from a practical viewpoint and identify two fundamental axioms, generalizability and implementation sparsity, that exploring methods ought to satisfy in practical usage. These two axioms are not satisfied by most known attribution methods, which we consider to be a fundamental weakness. We use the norms to guide the design of an $\epsilon$-subgradient-based sampling method. We apply this method to a fundamental mathematical problem as a proof of concept and to a set of practical datasets to demonstrate its ability compared with existing sampling methods.

7/29/2024

On the Rashomon ratio of infinite hypothesis sets

Evzenie Coupkova, Mireille Boutin

Given a classification problem and a family of classifiers, the Rashomon ratio measures the proportion of classifiers that yield less than a given loss. Previous work has explored the advantage of a large Rashomon ratio in the case of a finite family of classifiers. Here we consider the more general case of an infinite family. We show that a large Rashomon ratio guarantees that choosing the classifier with the best empirical accuracy among a random subset of the family, which is likely to improve generalizability, will not increase the empirical loss too much. We quantify the Rashomon ratio in two examples involving infinite classifier families in order to illustrate situations in which it is large. In the first example, we estimate the Rashomon ratio of the classification of normally distributed classes using an affine classifier. In the second, we obtain a lower bound for the Rashomon ratio of a classification problem with a modified Gram matrix when the classifier family consists of two-layer ReLU neural networks. In general, we show that the Rashomon ratio can be estimated using a training dataset along with random samples from the classifier family and we provide guarantees that such an estimation is close to the true value of the Rashomon ratio.

4/30/2024