Two-sample Test using Projected Wasserstein Distance

Read original: arXiv:2010.11970 - Published 4/1/2024 by Jie Wang, Rui Gao, Yao Xie

📊

Overview

The researchers developed a method called "projected Wasserstein distance" to compare two sets of samples and determine if they come from the same distribution.
This addresses a problem known as the "curse of dimensionality" - when dealing with high-dimensional data, standard distance metrics like Wasserstein distance become less effective at distinguishing differences between distributions.
The key idea is to find a low-dimensional linear projection that maximizes the Wasserstein distance between the projected distributions, bypassing the challenges of high-dimensional spaces.
The researchers provide theoretical analysis of the method's properties and practical algorithms for computing the metric, validating their results through numerical examples.

Plain English Explanation

Imagine you have two bags of marbles - one blue and one red. You want to know if the marbles in the two bags come from the same source, or if they are different. One way to do this would be to measure the size of each marble and compare the distributions (how the marble sizes are spread out) between the two bags.

This "distribution comparison" problem is fundamental in statistics and machine learning. A common way to measure the difference between distributions is called the Wasserstein distance. However, the Wasserstein distance becomes less effective when the marbles have many attributes (like color, shape, weight, etc.) - this is the "curse of dimensionality."

The researchers' key insight is to first find a simple, low-dimensional way to represent the marbles (like just focusing on size) that still captures the important differences between the two bags. By projecting the high-dimensional marble data onto this low-dimensional representation, they are able to better compare the distributions and determine if the marbles came from the same source.

The researchers provide a rigorous mathematical framework for this "projected Wasserstein distance" approach, along with efficient algorithms to compute it. They demonstrate through examples that this technique outperforms standard high-dimensional Wasserstein distance comparisons, especially for complex, high-dimensional data.

Technical Explanation

The researchers developed a novel "projected Wasserstein distance" for the two-sample testing problem. This involves finding an optimal low-dimensional linear projection that maximizes the Wasserstein distance between the projected probability distributions of the two input datasets.

The key technical contribution is a theoretical characterization of the finite-sample convergence rate of this projected Wasserstein distance. The researchers show that by coupling the optimal projection step with the Wasserstein distance, they are able to bypass the "curse of dimensionality" that plagues standard high-dimensional Wasserstein metrics.

Practically, the researchers provide algorithms to efficiently compute the projected Wasserstein distance. Through numerical experiments, they validate their theoretical results and demonstrate that the projected distance outperforms the standard Wasserstein distance, especially in high-dimensional settings.

Critical Analysis

The proposed projected Wasserstein distance addresses an important problem and provides a theoretically-grounded and empirically-validated solution. However, the paper does not explore the sensitivity of the method to the choice of the low-dimensional linear projection - it assumes an "optimal" projection is found, but the process of identifying this optimal projection is not discussed in depth.

Additionally, the paper focuses on the two-sample testing scenario, but many real-world applications may involve comparing more than two datasets at a time. It would be valuable to see an extension of the projected Wasserstein distance to the multi-sample case.

Finally, while the numerical examples validate the theoretical results, it would be helpful to see the projected Wasserstein distance applied to a broader range of real-world problems to better understand its practical advantages and limitations.

Conclusion

The researchers have developed a novel "projected Wasserstein distance" that can effectively compare high-dimensional probability distributions, overcoming the limitations of standard Wasserstein distance in such settings. This work advances the state-of-the-art in statistical and machine learning techniques for distribution comparison, with applications in areas like anomaly detection, domain adaptation, and generative modeling. By projecting the data to a low-dimensional space, the method provides a computationally efficient and theoretically-grounded approach to this fundamental problem.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

📊

Two-sample Test using Projected Wasserstein Distance

Jie Wang, Rui Gao, Yao Xie

We develop a projected Wasserstein distance for the two-sample test, a fundamental problem in statistics and machine learning: given two sets of samples, to determine whether they are from the same distribution. In particular, we aim to circumvent the curse of dimensionality in Wasserstein distance: when the dimension is high, it has diminishing testing power, which is inherently due to the slow concentration property of Wasserstein metrics in the high dimension space. A key contribution is to couple optimal projection to find the low dimensional linear mapping to maximize the Wasserstein distance between projected probability distributions. We characterize the theoretical property of the finite-sample convergence rate on IPMs and present practical algorithms for computing this metric. Numerical examples validate our theoretical results.

4/1/2024

Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances

Jie Wang, March Boedihardjo, Yao Xie

Optimal transport has been very successful for various machine learning tasks; however, it is known to suffer from the curse of dimensionality. Hence, dimensionality reduction is desirable when applied to high-dimensional data with low-dimensional structures. The kernel max-sliced (KMS) Wasserstein distance is developed for this purpose by finding an optimal nonlinear mapping that reduces data into $1$ dimensions before computing the Wasserstein distance. However, its theoretical properties have not yet been fully developed. In this paper, we provide sharp finite-sample guarantees under milder technical assumptions compared with state-of-the-art for the KMS $p$-Wasserstein distance between two empirical distributions with $n$ samples for general $pin[1,infty)$. Algorithm-wise, we show that computing the KMS $2$-Wasserstein distance is NP-hard, and then we further propose a semidefinite relaxation (SDR) formulation (which can be solved efficiently in polynomial time) and provide a relaxation gap for the SDP solution. We provide numerical examples to demonstrate the good performance of our scheme for high-dimensional two-sample testing.

5/31/2024

🛠️

Stereographic Spherical Sliced Wasserstein Distances

Huy Tran, Yikun Bai, Abihith Kothapalli, Ashkan Shahbazi, Xinran Liu, Rocio Diaz Martin, Soheil Kolouri

Comparing spherical probability distributions is of great interest in various fields, including geology, medical domains, computer vision, and deep representation learning. The utility of optimal transport-based distances, such as the Wasserstein distance, for comparing probability measures has spurred active research in developing computationally efficient variations of these distances for spherical probability measures. This paper introduces a high-speed and highly parallelizable distance for comparing spherical measures using the stereographic projection and the generalized Radon transform, which we refer to as the Stereographic Spherical Sliced Wasserstein (S3W) distance. We carefully address the distance distortion caused by the stereographic projection and provide an extensive theoretical analysis of our proposed metric and its rotationally invariant variation. Finally, we evaluate the performance of the proposed metrics and compare them with recent baselines in terms of both speed and accuracy through a wide range of numerical studies, including gradient flows and self-supervised learning. Our code is available at https://github.com/mint-vu/s3wd.

6/11/2024

A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions

Sharath Raghvendra, Pouyan Shirzadian, Kaiyi Zhang

The $2$-Wasserstein distance is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the $2$-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical $2$-Wasserstein distance on $n$ samples in $mathbb{R}^2$ to converge to the true distance at a rate of $n^{-1/4}$, which is significantly slower than the rate of $n^{-1/2}$ for $1$-Wasserstein distance. We introduce a new family of distances parameterized by $k ge 0$, called $k$-RPW that is based on computing the partial $2$-Wasserstein distance. We show that (1) $k$-RPW satisfies the metric properties, (2) $k$-RPW is robust to small outlier mass while retaining the sensitivity of $2$-Wasserstein distance to minor geometric differences, and (3) when $k$ is a constant, $k$-RPW distance between empirical distributions on $n$ samples in $mathbb{R}^2$ converges to the true distance at a rate of $n^{-1/3}$, which is faster than the convergence rate of $n^{-1/4}$ for the $2$-Wasserstein distance. Using the partial $p$-Wasserstein distance, we extend our distance to any $p in [1,infty]$. By setting parameters $k$ or $p$ appropriately, we can reduce our distance to the total variation, $p$-Wasserstein, and the L'evy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy in comparison to the $1$-Wasserstein, $2$-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.

6/4/2024