Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning

Read original: arXiv:2310.01012 - Published 5/2/2024 by James Chapman, Lennie Wells, Ana Lawry Aguila

Overview

The paper proposes novel algorithms for stochastic Partial Least Squares (PLS), stochastic Canonical Correlation Analysis (CCA), and Deep CCA that show faster convergence and higher correlations than previous state-of-the-art methods.
The algorithms are applied to large-scale datasets, including performing the first-of-its-kind PLS analysis on the UK Biobank dataset.
The paper also links CCA-family Self-Supervised Learning (SSL) methods to classical CCA, laying the groundwork for future insights.

Plain English Explanation

The paper focuses on a family of machine learning methods called Canonical Correlation Analysis (CCA) and its extensions. CCA is a powerful technique for finding relationships between two sets of variables, such as images and their captions, or gene expression data and clinical outcomes.
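To make the idea concrete, here is a minimal sketch of classical, full-batch CCA in NumPy. It is not the paper's algorithm: it whitens each view and takes a singular value decomposition, which is the textbook approach. The function name `canonical_correlations`, the small ridge term `reg`, and the synthetic two-view data are all assumptions made for illustration.

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-8):
    """Classical CCA: canonical correlations between two views X (n x p) and Y (n x q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariance blocks; `reg` is a tiny ridge term for numerical stability.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    # Whiten each view using Cholesky factors: T = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T
    # Singular values of the whitened cross-covariance are the canonical correlations.
    return np.linalg.svd(T, compute_uv=False)

# Synthetic example: both views are noisy observations of one shared latent variable.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 1))
X = z @ np.ones((1, 5)) + 0.5 * rng.normal(size=(n, 5))
Y = z @ np.ones((1, 4)) + 0.5 * rng.normal(size=(n, 4))

rho = canonical_correlations(X, Y)
print(rho)  # one large correlation (the shared latent), the rest close to zero
```

This version builds and factorizes full covariance matrices. That is fine for a handful of features, but it becomes the bottleneck as the number of features grows.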

The authors propose new algorithms that can efficiently apply CCA and related methods, such as Partial Least Squares (PLS), to very large datasets. These new algorithms converge much faster and achieve higher correlations than previous approaches, allowing the researchers to perform analyses that were previously infeasible.

For example, the authors were able to apply PLS to a massive biomedical dataset from the UK Biobank, which has over 33,000 individuals and 500,000 features. The authors describe classical algorithms as computationally infeasible at this scale.
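A rough back-of-the-envelope calculation suggests why scale matters here. The sketch below assumes, purely for illustration, that a single view has 500,000 features and that matrices are stored densely as 64-bit floats. The minibatch size of 256 is hypothetical. None of these details come from the summary.

```python
def dense_matrix_bytes(rows, cols, bytes_per_entry=8):
    """Memory needed for a dense rows x cols matrix of 64-bit floats."""
    return rows * cols * bytes_per_entry

n_features = 500_000   # order of magnitude quoted for the UK Biobank analysis
batch_size = 256       # hypothetical minibatch size, chosen for illustration

# A classical solver would need the full feature-by-feature covariance matrix.
covariance_bytes = dense_matrix_bytes(n_features, n_features)
# A stochastic method only needs one minibatch of samples in memory at a time.
minibatch_bytes = dense_matrix_bytes(batch_size, n_features)

print(f"full covariance: {covariance_bytes / 1e12:.1f} TB")
print(f"one minibatch:   {minibatch_bytes / 1e9:.2f} GB")
```

Under these assumptions the dense covariance matrix alone takes about 2 TB, while a single minibatch takes about 1 GB.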

The paper also shows how the new CCA algorithms can be used to train self-supervised learning models, which learn useful representations from data without the need for labeled examples. The authors report that their approach matches the performance of 'CCA-family' self-supervised learning methods on CIFAR-10 and CIFAR-100 with minimal hyperparameter tuning.

Technical Explanation

The paper proposes a novel unconstrained objective that characterizes the top subspace of the Generalized Eigenvalue Problem (GEP) framework, which unifies various CCA-related methods. The authors then develop a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA by applying stochastic gradient descent (SGD) to the corresponding CCA objectives.
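One way to write this down is sketched below. The notation ($A$, $B$, $W$, $\Sigma_{xy}$ and so on) is chosen here for illustration and does not come from the summary above. The objective shown is an Eckart-Young-style loss that appears to match the one the paper builds on, but treat that as an assumption and check the paper for the exact statement and conditions.

```latex
% Generalized Eigenvalue Problem: A symmetric, B symmetric positive definite
A w = \lambda B w

% CCA as a GEP, with w = (u, v) stacking the weight vectors of the two views
A = \begin{pmatrix} 0 & \Sigma_{xy} \\ \Sigma_{yx} & 0 \end{pmatrix},
\qquad
B = \begin{pmatrix} \Sigma_{xx} & 0 \\ 0 & \Sigma_{yy} \end{pmatrix}

% An unconstrained objective over W \in \mathbb{R}^{d \times k}
\mathcal{L}(W) = -2 \operatorname{tr}\left( W^\top A W \right)
               + \left\lVert W^\top B W \right\rVert_F^2
```

When the top $k$ generalized eigenvalues $\lambda_1, \dots, \lambda_k$ are positive, the minimizers of an objective of this form span the top-$k$ generalized eigenspace, and the minimum value is $-\sum_{i=1}^{k} \lambda_i^2$. For the CCA choice of $A$ and $B$, the positive eigenvalues are the canonical correlations. Replacing $B$ with the identity gives a PLS-type problem. Because no orthogonality or whitening constraint appears, the objective can be minimized directly with gradient methods.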

These new algorithms show significant improvements in convergence speed and correlation recovery compared to previous state-of-the-art methods on standard CCA and Deep CCA benchmarks. The authors leverage these advancements to perform the first large-scale PLS analysis on the UK Biobank dataset, which has over 33,000 individuals and 500,000 features.
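The following NumPy sketch illustrates what minibatch SGD on an unconstrained CCA objective can look like. It is a toy illustration under stated assumptions, not the authors' implementation. The objective used is L(W) = -2 tr(WᵀAW) + ‖WᵀBW‖²_F with W = (U, V), which is assumed here to match the paper's loss. The function names, synthetic data, learning rate, and batch size are all hypothetical. Two independent minibatches are drawn per step so that the quartic term is estimated without bias.

```python
import numpy as np

def ey_loss(U, V, X, Y):
    """Full-data value of -2 tr(W^T A W) + ||W^T B W||_F^2 for centered views X, Y."""
    n = X.shape[0]
    Zx, Zy = X @ U, Y @ V
    WtAW = (Zx.T @ Zy + Zy.T @ Zx) / n
    WtBW = (Zx.T @ Zx + Zy.T @ Zy) / n
    return -2.0 * np.trace(WtAW) + np.sum(WtBW ** 2)

def ey_minibatch_grad(U, V, batch1, batch2):
    """Gradient estimate from two independent minibatches of centered data."""
    (X1, Y1), (X2, Y2) = batch1, batch2
    m1, m2 = X1.shape[0], X2.shape[0]
    Zx1, Zy1 = X1 @ U, Y1 @ V
    Zx2, Zy2 = X2 @ U, Y2 @ V
    # Two independent k x k estimates of W^T B W.
    M1 = (Zx1.T @ Zx1 + Zy1.T @ Zy1) / m1
    M2 = (Zx2.T @ Zx2 + Zy2.T @ Zy2) / m2
    # Reward term -2 tr(W^T A W): gradient blocks are -4 Sxy V and -4 Syx U.
    gU = -4.0 * X1.T @ Zy1 / m1
    gV = -4.0 * Y1.T @ Zx1 / m1
    # Penalty term tr(M1 M2): gradient is 2 B1 W M2 + 2 B2 W M1.
    gU += 2.0 * (X1.T @ Zx1 / m1) @ M2 + 2.0 * (X2.T @ Zx2 / m2) @ M1
    gV += 2.0 * (Y1.T @ Zy1 / m1) @ M2 + 2.0 * (Y2.T @ Zy2 / m2) @ M1
    return gU, gV

# Synthetic two-view data with two shared latent directions (illustrative only).
rng = np.random.default_rng(0)
n, p, q, k = 5000, 6, 5, 2
z = rng.normal(size=(n, 2))
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
X[:, :2] = z + 0.5 * rng.normal(size=(n, 2))
Y[:, 0] = z[:, 0] + 0.5 * rng.normal(size=n)
Y[:, 1] = z[:, 1] + 1.0 * rng.normal(size=n)
X -= X.mean(axis=0)  # centering with the full-data mean, for simplicity
Y -= Y.mean(axis=0)

U = 0.1 * rng.normal(size=(p, k))
V = 0.1 * rng.normal(size=(q, k))
initial_loss = ey_loss(U, V, X, Y)

lr, batch_size = 0.01, 100  # hypothetical hyperparameters
for step in range(3000):
    i1 = rng.choice(n, size=batch_size, replace=False)
    i2 = rng.choice(n, size=batch_size, replace=False)
    gU, gV = ey_minibatch_grad(U, V, (X[i1], Y[i1]), (X[i2], Y[i2]))
    U -= lr * gU
    V -= lr * gV

final_loss = ey_loss(U, V, X, Y)
print(initial_loss, final_loss)
```

Each step only touches minibatch-sized matrices and k x k products, so no full covariance matrix is ever formed. For this objective, the loss at the optimum equals minus the sum of the squared top-k canonical correlations, so the final loss gives a quick check on how much correlation has been captured.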

Additionally, the paper establishes theoretical connections between CCA-family Self-Supervised Learning (SSL) methods and classical CCA. This lays the groundwork for future insights into the relationships between these powerful techniques.

Critical Analysis

The paper's algorithmic contributions advance the state of the art in CCA and related multiview learning methods. The authors report faster convergence and higher recovered correlations than previous methods on all standard CCA and Deep CCA benchmarks.

One potential limitation of the work is that the theoretical analysis is primarily focused on the unconstrained CCA objective and its relationship to GEP, without fully addressing the implications for the Deep CCA formulation. Further theoretical insights into the deep learning-based extensions would be valuable.

Additionally, while the authors showcase the applicability of their methods to large-scale datasets like the UK Biobank, it would be interesting to see more detailed discussions of the practical challenges and considerations involved in scaling these techniques to such massive real-world problems.

Overall, this paper makes important contributions to the multiview learning literature and provides a strong foundation for future research in this area. The novel algorithms and insights presented here have the potential to enable new discoveries across a wide range of domains.

Conclusion

This paper introduces a family of fast, efficient algorithms for Canonical Correlation Analysis (CCA) and related multiview learning methods. The authors' key innovations include a novel unconstrained CCA objective and stochastic optimization procedures that demonstrate significant performance improvements over previous approaches.

These advancements allow the researchers to apply CCA-based techniques to large-scale datasets that were previously infeasible, as shown by the first-of-its-kind Partial Least Squares (PLS) analysis of the UK Biobank. The paper also establishes theoretical connections between CCA-family Self-Supervised Learning methods and classical CCA, paving the way for further insights in this active area of machine learning research.

Overall, this work represents an important step forward in the field of multiview learning, with the potential to unlock new discoveries and applications across a variety of domains.

Related Papers

Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning

James Chapman, Lennie Wells, Ana Lawry Aguila

The Canonical Correlation Analysis (CCA) family of methods is foundational in multiview learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and be unified with a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, simply obtained by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. Our algorithms show far faster convergence and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks. These improvements allow us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 features. Finally, we apply our algorithms to match the performance of 'CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, and also present theory to clarify the links between these methods and classical CCA, laying the groundwork for future insights.

5/2/2024

Suitability of CCA for Generating Latent State/ Variables in Multi-View Textual Data

Akanksha Mehndiratta, Krishna Asawa

The probabilistic interpretation of Canonical Correlation Analysis (CCA) for learning low-dimensional real vectors, called as latent variables, has been exploited immensely in various fields. This study takes a step further by demonstrating the potential of CCA in discovering a latent state that captures the contextual information within the textual data under a two-view setting. The interpretation of CCA discussed in this study utilizes the multi-view nature of textual data, i.e. the consecutive sentences in a document or turns in a dyadic conversation, and has a strong theoretical foundation. Furthermore, this study proposes a model using CCA to perform the Automatic Short Answer Grading (ASAG) task. The empirical analysis confirms that the proposed model delivers competitive results and can even beat various sophisticated supervised techniques. The model is simple, linear, and adaptable and should be used as the baseline especially when labeled training data is scarce or nonexistent.

6/21/2024

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi, Albert Manuel Orozco Camacho, Eugene Belilovsky, Guy Wolf

Combining the predictions of multiple trained models through ensembling is generally a good way to improve accuracy by leveraging the different learned features of the models, however it comes with high computational and storage costs. Model fusion, the act of merging multiple models into one by combining their parameters reduces these costs but doesn't work as well in practice. Indeed, neural network loss landscapes are high-dimensional and non-convex and the minima found through learning are typically separated by high loss barriers. Numerous recent works have been focused on finding permutations matching one network features to the features of a second one, lowering the loss barrier on the linear path between them in parameter space. However, permutations are restrictive since they assume a one-to-one mapping between the different models' neurons exists. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our alignment method leads to better performances than past methods when averaging models trained on the same, or differing data splits. We also extend this analysis into the harder setting where more than 2 models are merged, and we find that CCA Merge works significantly better than past methods. Our code is publicly available at https://github.com/shoroi/align-n-merge

7/9/2024

$\sigma$-PCA: a unified neural model for linear and nonlinear principal component analysis

Fahdi Kanavati, Lucy Katsnith, Masayuki Tsuneki

Linear principal component analysis (PCA) learns (semi-)orthogonal transformations by orienting the axes to maximize variance. Consequently, it can only identify orthogonal axes whose variances are clearly distinct, but it cannot identify the subsets of axes whose variances are roughly equal. It cannot eliminate the subspace rotational indeterminacy: it fails to disentangle components with equal variances (eigenvalues), resulting, in each eigen subspace, in randomly rotated axes. In this paper, we propose $\sigma$-PCA, a method that (1) formulates a unified model for linear and nonlinear PCA, the latter being a special case of linear independent component analysis (ICA), and (2) introduces a missing piece into nonlinear PCA that allows it to eliminate, from the canonical linear PCA solution, the subspace rotational indeterminacy -- without whitening the inputs. Whitening, a preprocessing step which converts the inputs into unit-variance inputs, has generally been a prerequisite step for linear ICA methods, which meant that conventional nonlinear PCA could not necessarily preserve the orthogonality of the overall transformation, could not directly reduce dimensionality, and could not intrinsically order by variances. We offer insights on the relationship between linear PCA, nonlinear PCA, and linear ICA -- three methods with autoencoder formulations for learning special linear transformations from data, transformations that are (semi-)orthogonal for PCA, and arbitrary unit-variance for ICA. As part of our formulation, nonlinear PCA can be seen as a method that maximizes both variance and statistical independence, lying in the middle between linear PCA and linear ICA, serving as a building block for learning linear transformations that are identifiable.

7/2/2024