Normalised clustering accuracy: An asymmetric external cluster validity measure

Read original: arXiv:2209.02935 - Published 7/26/2024 by Marek Gagolewski

Overview

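There is no single best clustering algorithm that works for all tasks.
Clustering algorithms are traditionally evaluated using internal or external validity measures.
Internal measures quantify aspects of the obtained partitions, such as cluster compactness or point separability, but their validity is questionable because the clusterings they endorse can sometimes be meaningless.
External measures compare algorithm outputs to ground truth groupings provided by experts, but commonly used scores do not identify worst-case scenarios correctly and are not easily interpretable.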

Plain English Explanation

Clustering is a way of grouping similar things together, like grouping customers by their shopping habits. But there's no one perfect way to do this clustering. Clustering algorithms are often evaluated using internal or external measures.

Internal measures look at things like how tight the clusters are or how separated the points are. But these measures don't always match up with what experts think are good clusters. External measures compare the clustering results to what experts say the groups should be. But the common scores used for this, like normalized mutual information, have some issues.

For example, they don't always identify the worst-case scenarios correctly, and they're not easy to interpret. This can make it hard to really understand how well a clustering algorithm is performing on different datasets. To fix this, the author proposes a new measure that is normalized, scale-invariant, and corrected for imbalances in cluster sizes.

Technical Explanation

The paper argues that the commonly used partition similarity scores, such as normalized mutual information, the Fowlkes-Mallows index, and the adjusted Rand index, miss some desirable properties. In particular, they do not correctly identify the worst-case scenarios, and they are not easily interpretable.
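To make the interpretability concern concrete, here is a minimal sketch of the adjusted Rand index using the standard pair-counting formula. It is not taken from the paper; the function name `adjusted_rand_index` and the toy label vectors are assumptions chosen for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(y_true, y_pred):
    """Adjusted Rand index via the standard pair-counting formula (illustrative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label vectors must have the same length")
    n = len(y_true)
    if n < 2:
        raise ValueError("need at least two points")

    # Contingency-table cells and their row/column totals.
    cells = Counter(zip(y_true, y_pred))
    rows = Counter(y_true)
    cols = Counter(y_pred)

    # Number of point pairs placed together in both, the first, or the second partition.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # value expected by chance
    max_index = (sum_rows + sum_cols) / 2        # upper bound of the index
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# A relabelled but otherwise identical partition scores 1.
print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]))

# A partition that splits every true cluster evenly scores below zero.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

In the second call the score is negative rather than zero, so the bottom of the scale does not correspond to a fixed, easily read value. This is only one toy illustration and may not be the exact worst-case argument the paper makes.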

To address these issues, the author proposes and analyzes a new measure: a version of the optimal set-matching accuracy. This new measure is normalized, monotonic with respect to a similarity relation, scale-invariant, and corrected for imbalanced cluster sizes (though it is neither symmetric nor adjusted for chance).
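The properties listed above can be illustrated with a small sketch of a normalised set-matching accuracy. This is an assumed formulation for illustration only, not necessarily the paper's exact definition. It assumes both partitions use the same number of clusters `k`, with integer labels `0..k-1`, and it finds the best cluster matching by brute force over permutations, which is only practical for small `k`. The function name `normalised_accuracy_sketch` is hypothetical.

```python
from itertools import permutations


def normalised_accuracy_sketch(y_true, y_pred, k):
    """Illustrative normalised set-matching accuracy (assumed form, small k only)."""
    if k < 2:
        raise ValueError("need at least two clusters")

    # Confusion matrix: rows are reference clusters, columns are predicted ones.
    conf = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        conf[t][p] += 1

    row_sums = [sum(row) for row in conf]
    if any(s == 0 for s in row_sums):
        raise ValueError("every reference cluster must be non-empty")

    # Per-cluster recall under the best one-to-one matching of labels.
    # Dividing by the row sums gives each reference cluster equal weight,
    # which is how this sketch corrects for imbalanced cluster sizes.
    best = max(
        sum(conf[i][sigma[i]] / row_sums[i] for i in range(k)) / k
        for sigma in permutations(range(k))
    )

    # Rescale so that a uniform spread over clusters maps to 0 and a perfect match to 1.
    return (best - 1 / k) / (1 - 1 / k)


# Swapping the roles of the two partitions changes the score (asymmetry).
print(normalised_accuracy_sketch([0, 0, 0, 1], [0, 0, 1, 1], k=2))
print(normalised_accuracy_sketch([0, 0, 1, 1], [0, 0, 0, 1], k=2))
```

Because the confusion matrix is normalised by the reference cluster sizes only, the score depends on which partition is treated as the ground truth, which is consistent with the measure being described as asymmetric. In this sketch the score always falls between 0 and 1, since the best matching of a row-normalised matrix can never do worse than the average over all matchings.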

The paper provides a detailed mathematical formulation of this new validity measure and demonstrates its properties through theoretical analysis and empirical evaluation on benchmark datasets. The results show that this new measure can provide a more comprehensive and interpretable assessment of clustering algorithm performance compared to the classical partition similarity scores.

Critical Analysis

The paper identifies valid concerns with the existing clustering evaluation measures and proposes a novel approach to address these limitations. The new measure seems to have desirable theoretical properties, such as being normalized, monotonic, and accounting for imbalanced cluster sizes.

However, the paper does not provide a comprehensive comparison of this new measure against other recently proposed clustering evaluation metrics that may also aim to address some of the same issues. A more thorough comparative analysis would help strengthen the case for adopting this new measure.

Additionally, the paper focuses on the mathematical properties of the measure and does not delve deeply into the practical implications and interpretability for domain experts who use clustering algorithms. Further research into how this new measure resonates with and aids practitioners would be valuable.

Conclusion

This paper presents a new validity measure for evaluating clustering algorithms that aims to address some shortcomings of commonly used partition similarity scores. The proposed measure has desirable theoretical properties and could potentially provide a more comprehensive and interpretable assessment of clustering performance.

While the paper makes a compelling case for this new measure, further comparative analysis and exploration of its practical applications would help strengthen the contribution and facilitate wider adoption by the research community and practitioners working on clustering-related problems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Normalised clustering accuracy: An asymmetric external cluster validity measure

Marek Gagolewski

There is no, nor will there ever be, single best clustering algorithm. Nevertheless, we would still like to be able to distinguish between methods that work well on certain task types and those that systematically underperform. Clustering algorithms are traditionally evaluated using either internal or external validity measures. Internal measures quantify different aspects of the obtained partitions, e.g., the average degree of cluster compactness or point separability. However, their validity is questionable because the clusterings they endorse can sometimes be meaningless. External measures, on the other hand, compare the algorithms' outputs to fixed ground truth groupings provided by experts. In this paper, we argue that the commonly used classical partition similarity scores, such as the normalised mutual information, Fowlkes-Mallows, or adjusted Rand index, miss some desirable properties. In particular, they do not identify worst-case scenarios correctly, nor are they easily interpretable. As a consequence, the evaluation of clustering algorithms on diverse benchmark datasets can be difficult. To remedy these issues, we propose and analyse a new measure: a version of the optimal set-matching accuracy, which is normalised, monotonic with respect to some similarity relation, scale-invariant, and corrected for the imbalancedness of cluster sizes (but neither symmetric nor adjusted for chance).

7/26/2024

From A-to-Z Review of Clustering Validation Indices

Bryar A. Hassan, Noor Bahjat Tayfor, Alla A. Hassan, Aram M. Ahmed, Tarik A. Rashid, Naz N. Abdalla

Data clustering involves identifying latent similarities within a dataset and organizing them into clusters or groups. The outcomes of various clustering algorithms differ as they are susceptible to the intrinsic characteristics of the original dataset, including noise and dimensionality. The effectiveness of such clustering procedures directly impacts the homogeneity of clusters, underscoring the significance of evaluating algorithmic outcomes. Consequently, the assessment of clustering quality presents a significant and complex endeavor. A pivotal aspect affecting clustering validation is the cluster validity metric, which aids in determining the optimal number of clusters. The main goal of this study is to comprehensively review and explain the mathematical operation of internal and external cluster validity indices, but not all, to categorize these indices and to brainstorm suggestions for future advancement of clustering validation research. In addition, we review and evaluate the performance of internal and external clustering validation indices on the most common clustering algorithms, such as the evolutionary clustering algorithm star (ECA*). Finally, we suggest a classification framework for examining the functionality of both internal and external clustering validation measures regarding their ideal values, user-friendliness, responsiveness to input data, and appropriateness across various fields. This classification aids researchers in selecting the appropriate clustering validation measure to suit their specific requirements.

7/31/2024

Normalized mutual information is a biased measure for classification and community detection

Maximilian Jerdee, Alec Kirkley, M. E. J. Newman

Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we argue that results returned by the normalized mutual information are biased for two reasons: first, because they ignore the information content of the contingency table and, second, because their symmetric normalization introduces spurious dependence on algorithm output. We introduce a modified version of the mutual information that remedies both of these shortcomings. As a practical demonstration of the importance of using an unbiased measure, we perform extensive numerical tests on a basket of popular algorithms for network community detection and show that one's conclusions about which algorithm is best are significantly affected by the biases in the traditional mutual information.

8/30/2024

A new validity measure for fuzzy c-means clustering

Dae-Won Kim, Kwang H. Lee

A new cluster validity index is proposed for fuzzy clusters obtained from fuzzy c-means algorithm. The proposed validity index exploits inter-cluster proximity between fuzzy clusters. Inter-cluster proximity is used to measure the degree of overlap between clusters. A low proximity value refers to well-partitioned clusters. The best fuzzy c-partition is obtained by minimizing inter-cluster proximity with respect to c. Well-known data sets are tested to show the effectiveness and reliability of the proposed index.

7/10/2024