σ-PCA: a unified neural model for linear and nonlinear principal component analysis

Read original: arXiv:2311.13580 - Published 7/2/2024 by Fahdi Kanavati, Lucy Katsnith, Masayuki Tsuneki

Overview

Linear Principal Component Analysis (PCA), Nonlinear PCA, and Linear Independent Component Analysis (ICA) are methods with single-layer autoencoder formulations for learning special linear transformations from data.
Linear PCA learns orthogonal transformations that orient axes to maximize variance, but it suffers from a subspace rotational indeterminacy.
Nonlinear PCA and Linear ICA reduce this indeterminacy by maximizing statistical independence under the assumption of unit variance.
The main difference is that Nonlinear PCA only learns rotations, while Linear ICA learns any linear transformation with unit variance.

Plain English Explanation

These three methods, Linear PCA, Nonlinear PCA, and Linear ICA, are ways of finding special patterns in data.

Linear PCA tries to find the directions in the data that have the most variation or spread. Imagine a cloud of points in 3D space: Linear PCA finds three perpendicular directions that best capture the overall shape of the cloud. However, Linear PCA has a problem: when two or more directions have the same amount of variation, it cannot determine how the axes should be oriented within the subspace they span.

Nonlinear PCA and Linear ICA try to fix this by also looking at how statistically independent the different directions are, not just how much variation they have. Independence is a stronger requirement than being uncorrelated, which Linear PCA already achieves. Nonlinear PCA can only find rotations of the axes, while Linear ICA can find any linear transformation that gives unit-variance outputs.

The relationship between the three methods can be understood through a mathematical technique called Singular Value Decomposition, which breaks down the Linear ICA transformation into a sequence of rotations and scalings.

Technical Explanation

Linear PCA learns a special kind of orthogonal transformation that aligns the axes of the data to maximize the variance, or spread, along each axis. However, this approach suffers from a subspace rotational indeterminacy - it cannot find a unique rotation for axes that have the same variance.
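The indeterminacy can be checked numerically. The sketch below is an illustration, not code from the paper: the helper names `rotation` and `rotated_covariance` and the two example covariance matrices are assumptions chosen for clarity. It rotates the axes of two diagonal covariance matrices by an arbitrary angle and compares the results.

```python
import numpy as np

def rotation(theta):
    """2-D rotation matrix for an angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rotated_covariance(cov, theta):
    """Covariance of the data after rotating the axes by theta."""
    r = rotation(theta)
    return r.T @ cov @ r

# Hypothetical covariances: one with equal variances, one with distinct variances.
equal = np.diag([2.0, 2.0])
distinct = np.diag([4.0, 1.0])

theta = 0.7  # arbitrary angle, chosen only for illustration
equal_rot = rotated_covariance(equal, theta)
distinct_rot = rotated_covariance(distinct, theta)

# Equal variances: the rotated covariance is unchanged, so a variance-based
# objective cannot tell the rotated axes from the original ones.
print(np.round(equal_rot, 6))

# Distinct variances: off-diagonal terms appear, so the original axes are
# the only ones (up to sign) that keep the covariance diagonal.
print(np.round(distinct_rot, 6))
```

When the variances are equal, every rotation is an equally good Linear PCA solution, which is the subspace rotational indeterminacy described above.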

Both Nonlinear PCA and Linear ICA tackle this issue by maximizing the statistical independence of the transformed variables, under the assumption that they all have unit variance. The key difference is that Nonlinear PCA only learns rotations, while Linear ICA can learn any linear transformation with unit variance.
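A minimal sketch of why a rotation is enough once all variances are equal, assuming two independent uniform sources with unit variance that have been mixed by a rotation. It uses a brute-force search over angles with excess kurtosis as the independence measure. That search is a stand-in chosen for brevity, not the nonlinear PCA learning rule from the paper, and all names in the code are hypothetical.

```python
import numpy as np

def rotation(theta):
    """2-D rotation matrix for an angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def excess_kurtosis(y):
    """Per-column excess kurtosis (zero for a Gaussian)."""
    y = y - y.mean(axis=0)
    return (y**4).mean(axis=0) / (y**2).mean(axis=0) ** 2 - 3.0

rng = np.random.default_rng(0)
n = 50_000
half_width = np.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has unit variance
sources = rng.uniform(-half_width, half_width, size=(n, 2))

# Mix the sources with a rotation. Both mixed axes still have (roughly)
# unit variance, so variance alone cannot undo the mixing.
mix_angle = 0.5
mixed = sources @ rotation(mix_angle)

# Search for the rotation whose outputs are the most non-Gaussian.
angles = np.linspace(0.0, np.pi / 2, 180, endpoint=False)
scores = [np.abs(excess_kurtosis(mixed @ rotation(a))).sum() for a in angles]
best_angle = angles[int(np.argmax(scores))]

# The sources are recovered up to permutation and sign, so the total
# rotation only needs to be a multiple of 90 degrees.
residual = (mix_angle + best_angle) % (np.pi / 2)
misalignment = min(residual, np.pi / 2 - residual)
print(f"misalignment in radians: {misalignment:.4f}")
```

The search only ever applies rotations, which mirrors the restriction on Nonlinear PCA; a Linear ICA method would also be free to rescale the axes.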

This relationship can be understood through Singular Value Decomposition (SVD). The Linear ICA transformation can be decomposed into a sequence of rotation, scaling, and rotation. Linear PCA learns the first rotation, while Nonlinear PCA learns the second rotation. The scaling is the inverse of the standard deviations.
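The rotation, scaling, rotation structure can be verified on a small example. The sketch below assumes sources with identity covariance and a hypothetical 2x2 mixing matrix (`mixing`, with x = mixing · s), so that the ideal unmixing transformation is simply its inverse. Here "rotation" means an orthogonal matrix, which may include a reflection.

```python
import numpy as np

# Hypothetical mixing matrix; the sources are assumed to have identity covariance.
mixing = np.array([[2.0, 1.0],
                   [0.5, 1.5]])
unmixing = np.linalg.inv(mixing)  # the transformation Linear ICA aims to learn

# First rotation and variances: the eigendecomposition of the data covariance,
# which is what Linear PCA recovers.
cov_x = mixing @ mixing.T
variances, first_rotation = np.linalg.eigh(cov_x)

# Scaling: the inverse of the standard deviations along the principal axes.
scaling = np.diag(1.0 / np.sqrt(variances))

# Second rotation: whatever is left of the unmixing after PCA and scaling.
second_rotation = unmixing @ first_rotation @ np.diag(np.sqrt(variances))

# The product (first rotation applied first) reproduces the unmixing matrix.
reconstructed = second_rotation @ scaling @ first_rotation.T

# The singular values of the unmixing matrix are the inverse standard deviations.
singular_values = np.linalg.svd(unmixing, compute_uv=False)

print(np.round(second_rotation @ second_rotation.T, 6))  # identity: it is orthogonal
print(np.round(reconstructed - unmixing, 6))             # zeros
```

In this decomposition Linear PCA supplies `first_rotation` and the variances, while the remaining orthogonal factor `second_rotation` is the part left for Nonlinear PCA to learn.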

The problem is that, unlike Linear PCA, conventional Nonlinear PCA cannot be applied directly to the data to learn the first rotation. That rotation is special: it reduces dimensionality and orders the components by variance.

Critical Analysis

The paper proposes a solution to this problem, called σ-PCA, which is a unified neural model for Linear and Nonlinear PCA as single-layer autoencoders. Under this formulation, Nonlinear PCA can learn not just the second rotation but also the first, by maximizing both variance and statistical independence, and it does so without requiring the inputs to be whitened first.

This means that, like Linear PCA, Nonlinear PCA can now learn a semi-orthogonal transformation that reduces dimensionality and orders the components by variance. But, unlike Linear PCA, Nonlinear PCA can also eliminate the subspace rotational indeterminacy.
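To make "semi-orthogonal", "reduces dimensionality", and "orders the components by variance" concrete, the sketch below shows those three properties for ordinary Linear PCA computed by eigendecomposition, viewed as a tied-weight linear autoencoder. It is not the σ-PCA model from the paper; the synthetic data and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: three features with clearly different variances.
x = rng.normal(size=(5000, 3)) * np.array([3.0, 1.0, 0.2])
x = x - x.mean(axis=0)

# Linear PCA via the eigendecomposition of the sample covariance.
cov = x.T @ x / len(x)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # reorder to descending variance

k = 2
w = eigvecs[:, order[:k]]  # shape (3, 2): keep the top-k directions

z = x @ w        # encode: 3 dimensions down to 2
x_hat = z @ w.T  # decode with the same (tied) weights

component_variances = z.var(axis=0)
reconstruction_error = np.mean(np.sum((x - x_hat) ** 2, axis=1))

print(np.round(w.T @ w, 6))   # 2x2 identity: the columns are orthonormal
print(component_variances)    # ordered from largest to smallest
print(reconstruction_error)   # equals the variance of the discarded direction
```

The matrix `w` is semi-orthogonal because `w.T @ w` is the identity while `w @ w.T` is not: it is a projection onto the retained subspace.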

The authors do not address potential limitations or areas for further research in this paper. One potential concern is the computational and memory efficiency of the proposed σ-PCA model, especially compared to the conventional Nonlinear PCA approach.

Conclusion

This research proposes a novel method, σ-PCA, that unifies Linear and Nonlinear PCA as single-layer autoencoders. This advancement allows Nonlinear PCA to learn transformations that can both reduce dimensionality and order the components by variance, while also eliminating the subspace rotational indeterminacy that plagues Linear PCA.

PCA and its nonlinear variants are widely used in machine learning, data analysis, and dimensionality reduction across many domains. Linear transformations that are identifiable, rather than determined only up to an arbitrary rotation within equal-variance subspaces, could make the learned components easier to interpret and compare in those applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
