Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection

Read original: arXiv:2302.03391 - Published 7/19/2024 by Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Mickael Leclercq, Arnaud Droit, Fr'ed'eric Precioso

🔗

Overview

Discovering relevant clusters and variables is a challenging task in feature selection for clustering
Existing feature selection algorithms often rely on model-based approaches or strong assumptions about data distribution
This paper introduces a discriminative clustering model called Sparse GEMINI that aims to maximize a geometry-aware generalization of mutual information, with a simple L1 penalty

Plain English Explanation

The paper discusses the problem of feature selection in clustering, which is a difficult task. When you're trying to group data into clusters, you also need to figure out which features (or variables) of the data are most important for those clusters.

Existing algorithms for this often rely on model-based approaches or make strong assumptions about how the data is distributed. The paper introduces a new approach called Sparse GEMINI that takes a different tack.

Sparse GEMINI uses a discriminative clustering model, which means it tries to find clusters by looking at how the data points are different from each other, rather than how they're similar. It does this by trying to maximize a measure called GEMINI, which is a generalization of mutual information that takes into account the geometry (or shape) of the data.

Sparse GEMINI also uses a simple L1 penalty, which helps it automatically select the most important features for the clustering without having to explore all possible feature subsets. This makes the algorithm scalable to high-dimensional data and large datasets.

Technical Explanation

The paper introduces a discriminative clustering model called Sparse GEMINI that aims to maximize a geometry-aware generalization of mutual information, called GEMINI, with a simple L1 penalty. This approach avoids the need for combinatorial feature subset exploration and is designed to be easily scalable to high-dimensional data and large datasets.

The GEMINI measure is a generalization of mutual information that takes into account the underlying geometry of the data, rather than just considering statistical independence. By maximizing GEMINI, the Sparse GEMINI algorithm is able to discover relevant clusters and variables simultaneously, without requiring relevance criteria or prior hypotheses.

The authors demonstrate the performance of Sparse GEMINI on synthetic datasets and large-scale real-world datasets. The results show that Sparse GEMINI is a competitive algorithm that can effectively select relevant subsets of variables for clustering, outperforming other feature selection methods.

Critical Analysis

The paper presents a novel and interesting approach to the challenging problem of feature selection in clustering. By using a discriminative clustering model and a geometry-aware generalization of mutual information, the Sparse GEMINI algorithm avoids some of the limitations of existing model-based or assumption-heavy feature selection methods.

However, the paper does not provide a deep analysis of the theoretical properties of the GEMINI measure or the Sparse GEMINI algorithm. While the experimental results are promising, it would be helpful to have a more thorough discussion of the algorithm's strengths, weaknesses, and potential edge cases.

Additionally, the paper does not address the computational complexity of the Sparse GEMINI algorithm, which can be an important consideration for real-world applications, especially as dataset sizes continue to grow. Further research on the scalability and efficiency of the algorithm would be valuable.

Conclusion

This paper introduces a novel feature selection algorithm for clustering called Sparse GEMINI that uses a discriminative clustering model and a geometry-aware generalization of mutual information. The algorithm is designed to be scalable and can effectively select relevant subsets of variables without relying on relevance criteria or prior hypotheses.

While the experimental results are promising, the paper could benefit from a more in-depth analysis of the theoretical properties and practical considerations of the Sparse GEMINI approach. Overall, the paper presents an interesting and innovative solution to the challenging problem of feature selection in clustering, and it opens up avenues for further research and development in this important area of machine learning.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🔗

Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection

Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Mickael Leclercq, Arnaud Droit, Fr'ed'eric Precioso

Feature selection in clustering is a hard task which involves simultaneously the discovery of relevant clusters as well as relevant variables with respect to these clusters. While feature selection algorithms are often model-based through optimised model selection or strong assumptions on the data distribution, we introduce a discriminative clustering model trying to maximise a geometry-aware generalisation of the mutual information called GEMINI with a simple l1 penalty: the Sparse GEMINI. This algorithm avoids the burden of combinatorial feature subset exploration and is easily scalable to high-dimensional data and large amounts of samples while only designing a discriminative clustering model. We demonstrate the performances of Sparse GEMINI on synthetic datasets and large-scale datasets. Our results show that Sparse GEMINI is a competitive algorithm and has the ability to select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.

7/19/2024

✨

Estimating Conditional Mutual Information for Dynamic Feature Selection

Soham Gadgil, Ian Covert, Su-In Lee

Dynamic feature selection, where we sequentially query features to make accurate predictions with a minimal budget, is a promising paradigm to reduce feature acquisition costs and provide transparency into a model's predictions. The problem is challenging, however, as it requires both predicting with arbitrary feature sets and learning a policy to identify valuable selections. Here, we take an information-theoretic perspective and prioritize features based on their mutual information with the response variable. The main challenge is implementing this policy, and we design a new approach that estimates the mutual information in a discriminative rather than generative fashion. Building on our approach, we then introduce several further improvements: allowing variable feature budgets across samples, enabling non-uniform feature costs, incorporating prior information, and exploring modern architectures to handle partial inputs. Our experiments show that our method provides consistent gains over recent methods across a variety of datasets.

9/10/2024

Compressive Feature Selection for Remote Visual Multi-Task Inference

Saeed Ranjbar Alvar, Ivan V. Baji'c

Deep models produce a number of features in each internal layer. A key problem in applications such as feature compression for remote inference is determining how important each feature is for the task(s) performed by the model. The problem is especially challenging in the case of multi-task inference, where the same feature may carry different importance for different tasks. In this paper, we examine how effective is mutual information (MI) between a feature and a model's task output as a measure of the feature's importance for that task. Experiments involving hard selection and soft selection (unequal compression) based on MI are carried out to compare the MI-based method with alternative approaches. Multi-objective analysis is provided to offer further insight.

5/16/2024

Slicing Mutual Information Generalization Bounds for Neural Networks

Kimia Nadjahi, Kristjan Greenewald, Rickard Bruel Gabrielsson, Justin Solomon

The ability of machine learning (ML) algorithms to generalize well to unseen data has been studied through the lens of information theory, by bounding the generalization error with the input-output mutual information (MI), i.e., the MI between the training data and the learned hypothesis. Yet, these bounds have limited practicality for modern ML applications (e.g., deep learning), due to the difficulty of evaluating MI in high dimensions. Motivated by recent findings on the compressibility of neural networks, we consider algorithms that operate by slicing the parameter space, i.e., trained on random lower-dimensional subspaces. We introduce new, tighter information-theoretic generalization bounds tailored for such algorithms, demonstrating that slicing improves generalization. Our bounds offer significant computational and statistical advantages over standard MI bounds, as they rely on scalable alternative measures of dependence, i.e., disintegrated mutual information and $k$-sliced mutual information. Then, we extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces, by leveraging rate-distortion theory. This strategy yields generalization bounds that incorporate a distortion term measuring model compressibility under slicing, thereby tightening existing bounds without compromising performance or requiring model compression. Building on this, we propose a regularization scheme enabling practitioners to control generalization through compressibility. Finally, we empirically validate our results and achieve the computation of non-vacuous information-theoretic generalization bounds for neural networks, a task that was previously out of reach.

6/7/2024