Improving multidimensional projection quality with user-specific metrics and optimal scaling

Maniru Ibrahim

The growing prevalence of high-dimensional data has fostered the development of multidimensional projection (MP) techniques, such as t-SNE, UMAP, and LAMP, for data visualization and exploration. However, conventional MP methods typically employ generic quality metrics, neglecting individual user preferences. This study proposes a new framework that tailors MP techniques to user-specific quality criteria, enhancing projection interpretability. Our approach combines three visual quality metrics (stress, neighborhood preservation, and silhouette score) into a composite metric for more precise MP evaluation. We then optimize the projection scale by maximizing the composite metric value. We conducted an experiment involving two users with different projection preferences, generating projections using t-SNE, UMAP, and LAMP. Each user rated the projections according to their own criteria, producing two training sets. We derived optimal weights for each set and applied them to other datasets to determine the best projections per user. Our findings demonstrate that personalized projections effectively capture user preferences, fostering better data exploration and enabling more informed decision-making. This user-centric approach promotes advancements in multidimensional projection techniques that accommodate diverse user preferences and enhance interpretability.

7/24/2024
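
The composite metric and scale search described in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names, the normalization of each term to [0, 1], the fixed weights (the paper derives them from user ratings), and the plain grid search over candidate scales.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) array."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def normalized_stress(d_high, d_low):
    """Normalized stress: 0 means all pairwise distances are preserved."""
    return ((d_high - d_low) ** 2).sum() / (d_high ** 2).sum()

def neighborhood_preservation(d_high, d_low, k):
    """Mean fraction of each point's k nearest neighbors kept in the projection.
    Assumes no duplicate points, so each point is its own nearest neighbor."""
    nn_high = np.argsort(d_high, axis=1)[:, 1:k + 1]
    nn_low = np.argsort(d_low, axis=1)[:, 1:k + 1]
    kept = [len(set(a) & set(b)) / k for a, b in zip(nn_high, nn_low)]
    return float(np.mean(kept))

def silhouette(d_low, labels):
    """Mean silhouette coefficient in the projection (needs >= 2 clusters)."""
    labels = np.asarray(labels)
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = d_low[i, same].mean()
        b = min(d_low[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def composite_score(high, low, labels, weights, k=5):
    """Weighted sum of the three metrics, each mapped to [0, 1] (higher is better).
    The mapping and the weights are illustrative assumptions."""
    d_high, d_low = pairwise_distances(high), pairwise_distances(low)
    w_stress, w_np, w_sil = weights
    stress_term = 1.0 - min(normalized_stress(d_high, d_low), 1.0)
    sil_term = (silhouette(d_low, labels) + 1.0) / 2.0
    np_term = neighborhood_preservation(d_high, d_low, k)
    return w_stress * stress_term + w_np * np_term + w_sil * sil_term

def best_scale(high, low, labels, weights, scales, k=5):
    """Grid search: rescale the projection and keep the best composite score."""
    scores = [composite_score(high, low * s, labels, weights, k) for s in scales]
    best = int(np.argmax(scores))
    return scales[best], scores[best]
```

In this sketch only the stress term reacts to a uniform rescaling of the projection, since neighbor ranks and the silhouette coefficient are scale-invariant, which is why the scale search matters once stress carries a nonzero weight.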

A new visual quality metric for Evaluating the performance of multidimensional projections

Maniru Ibrahim, Thales Vieira

Multidimensional projections (MP) are among the most essential approaches in the visual analysis of multidimensional data. They transform multidimensional data into two-dimensional representations that can be shown as scatter plots while preserving similarity with the original data. Human visual perception is frequently used to evaluate the quality of MP. In this work, we propose to study and improve on a well-known mapping called Local Affine Multidimensional Projection (LAMP), which takes a multidimensional instance and embeds it in Cartesian space via moving least squares deformation. We propose a new visual quality metric based on human perception. The new metric combines three previously used metrics: silhouette coefficient, neighborhood preservation, and silhouette ratio. We show that the proposed metric produces more precise results in analyzing the quality of MP than other previously used metrics. Finally, we describe an algorithm that attempts to overcome a limitation of the LAMP method, which requires a similar scale for control points and their counterparts in the Cartesian space.

7/24/2024
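
The moving least squares step that the abstract above attributes to LAMP can be sketched as follows. This is a minimal illustration of the standard LAMP formulation as commonly described, not the authors' code; the function name `lamp_project` and the `eps` tolerance are hypothetical.

```python
import numpy as np

def lamp_project(x, controls_high, controls_low, eps=1e-12):
    """Project one instance x of shape (d,) using control points.

    controls_high: (k, d) control points in the original space.
    controls_low:  (k, 2) their positions in the 2D visual space.
    """
    sq_dist = ((controls_high - x) ** 2).sum(axis=1)

    # If x coincides with a control point, return that control point's image.
    hit = int(np.argmin(sq_dist))
    if sq_dist[hit] < eps:
        return controls_low[hit].copy()

    # Weights decay with squared distance, so nearby control points dominate.
    alpha = 1.0 / sq_dist

    # Weighted centroids in both spaces.
    x_tilde = (alpha[:, None] * controls_high).sum(axis=0) / alpha.sum()
    y_tilde = (alpha[:, None] * controls_low).sum(axis=0) / alpha.sum()

    # Centered, weighted control points.
    root = np.sqrt(alpha)[:, None]
    a = root * (controls_high - x_tilde)
    b = root * (controls_low - y_tilde)

    # Best orthogonal map from a to b (orthogonal Procrustes via SVD).
    u, _, vt = np.linalg.svd(a.T @ b, full_matrices=False)
    m = u @ vt

    return (x - x_tilde) @ m + y_tilde
```

Because the matrix `m` has orthonormal columns, the local map cannot stretch or shrink distances, which is one way to read the scale limitation the abstract mentions: the control points and their 2D counterparts need to live at comparable scales for the fit to be good.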

CBMAP: Clustering-based manifold approximation and projection for dimensionality reduction

Berat Dogan

Dimensionality reduction methods are employed to decrease data dimensionality, either to enhance machine learning performance or to facilitate data visualization in two- or three-dimensional spaces. These methods typically fall into two categories: feature selection and feature transformation. Feature selection retains significant features, while feature transformation projects data into a lower-dimensional space using linear or nonlinear methods. While nonlinear methods excel at preserving local structures and capturing nonlinear relationships, they may struggle to represent global structures and can be computationally intensive. Recent algorithms, such as t-SNE, UMAP, TriMap, and PaCMAP, prioritize preserving local structures, often at the expense of accurately representing global structures, leading to clusters being spread out more in lower-dimensional spaces. Moreover, these methods rely heavily on hyperparameters, making their results sensitive to parameter settings. To address these limitations, this study introduces a clustering-based approach, namely CBMAP (Clustering-Based Manifold Approximation and Projection), for dimensionality reduction. CBMAP aims to preserve both global and local structures, ensuring that clusters in lower-dimensional spaces closely resemble those in high-dimensional spaces. Experimental evaluations on benchmark datasets demonstrate CBMAP's efficacy, offering speed, scalability, and minimal reliance on hyperparameters. Importantly, CBMAP enables low-dimensional projection of test data, addressing a critical need in machine learning applications. CBMAP is made freely available at https://github.com/doganlab/cbmap and can be installed from the Python Package Index (PyPI) software repository with the command pip install cbmap.

9/17/2024

Approximate UMAP allows for high-rate online visualization of high-dimensional data streams

Peter Wassenaar, Pierre Guetschel, Michael Tangermann

In the brain-computer interface (BCI) field, introspection and interpretation of brain signals are desired to provide feedback or to guide rapid paradigm prototyping, but are challenging due to the high noise level and dimensionality of the signals. Deep neural networks are often introspected by transforming their learned feature representations into 2- or 3-dimensional subspace visualizations using projection algorithms like Uniform Manifold Approximation and Projection (UMAP). Unfortunately, these methods are computationally expensive, making the real-time projection of data streams a non-trivial task. In this study, we introduce a novel variant of UMAP, called approximate UMAP (aUMAP). It aims at generating rapid projections for real-time introspection. To study its suitability for real-time projection, we benchmark the method against standard UMAP and its neural network counterpart, parametric UMAP. Our results show that approximate UMAP delivers projections that replicate the projection space of standard UMAP while reducing projection time by an order of magnitude and maintaining the same training time.

4/8/2024
