A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions

Read original: arXiv:2405.03664 - Published 6/4/2024 by Sharath Raghvendra, Pouyan Shirzadian, Kaiyi Zhang

🚀

Overview

The 2-Wasserstein distance is a powerful metric for measuring differences between distributions, but it is highly sensitive to minor geometric changes and small outliers.
Sampling discrepancy can cause the empirical 2-Wasserstein distance to converge to the true distance at a slower rate than the 1-Wasserstein distance.
The authors introduce a new family of distances called k-RPW that aims to address these issues.

Plain English Explanation

The 2-Wasserstein distance is a way to measure how different two probability distributions are. It's a very useful tool, but it has some drawbacks. It's very sensitive to small changes in the geometry of the distributions, and even a tiny outlier (an unusual data point) can make the distance between two similar distributions seem much larger than it really is.

Another issue is that when you try to estimate the 2-Wasserstein distance from samples of the distributions, the estimate converges to the true distance more slowly than the 1-Wasserstein distance. This means you need a lot more samples to get an accurate estimate.

To address these problems, the researchers introduced a new family of distances called k-RPW. The key idea is to only look at the k largest values in the 2-Wasserstein distance calculation, ignoring the small outliers. This makes the distance more robust to outliers while still capturing the important geometric differences between the distributions.

The researchers also show that when k is a constant, the k-RPW distance converges to the true distance much faster than the 2-Wasserstein distance when estimated from samples. This means you can get accurate estimates with fewer samples.

By adjusting the parameters k and p, the k-RPW distance can also be reduced to other useful distances like the total variation distance, p-Wasserstein distance, and Lévy-Prokhorov distance. Experiments show that the k-RPW distance outperforms these other distances for image retrieval tasks on real-world data that contains noise.

Technical Explanation

The 2-Wasserstein distance is a powerful metric for measuring the dissimilarity between probability distributions, as it is sensitive to minor geometric differences. However, this sensitivity also means that a small outlier mass can significantly increase the 2-Wasserstein distance between two similar distributions. Additionally, the empirical 2-Wasserstein distance on n samples in ℝ^2 converges to the true distance at a rate of n^(-1/4), which is significantly slower than the n^(-1/2) rate for the 1-Wasserstein distance.

To address these issues, the authors introduce a new family of distances called k-RPW, which is parameterized by k ≥ 0 and based on computing the partial 2-Wasserstein distance. They show that (1) k-RPW satisfies the metric properties, (2) k-RPW is robust to small outlier mass while retaining the sensitivity of 2-Wasserstein distance to minor geometric differences, and (3) when k is a constant, the k-RPW distance between empirical distributions on n samples in ℝ^2 converges to the true distance at a rate of n^(-1/3), which is faster than the n^(-1/4) convergence rate for the 2-Wasserstein distance.

By using the partial p-Wasserstein distance, the authors extend their distance to any p in [1,∞]. They show that by setting the parameters k or p appropriately, their distance can be reduced to the total variation, p-Wasserstein, and the Lévy-Prokhorov distances. Experiments demonstrate that their distance function achieves higher accuracy than the 1-Wasserstein, 2-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.

Critical Analysis

The paper introduces a promising new family of distances, the k-RPW, that aims to address some of the limitations of the 2-Wasserstein distance. The authors provide a thorough theoretical analysis, proving that the k-RPW satisfies the metric properties, is robust to small outliers, and has a faster convergence rate than the 2-Wasserstein distance when estimated from samples.

However, the paper does not delve into the potential drawbacks or limitations of the k-RPW distance. For example, it is unclear how the choice of the parameter k affects the distance and its properties. Additionally, the authors only provide experiments on image retrieval tasks, and it would be interesting to see how the k-RPW performs on a wider range of applications and datasets.

Another area for further research could be exploring the connections between the k-RPW distance and other related distances, such as the Gromov-Wasserstein distance or the Wasserstein Distortion. Understanding these relationships could lead to a more comprehensive understanding of the strengths and weaknesses of the different distance measures.

Overall, the paper presents a novel and promising approach to addressing the sensitivity and convergence issues of the 2-Wasserstein distance. However, further investigation into the practical implications and limitations of the k-RPW distance would be valuable for the research community.

Conclusion

The authors have introduced a new family of distances, the k-RPW, that aims to improve upon the 2-Wasserstein distance by being more robust to small outliers while retaining its sensitivity to minor geometric differences. Theoretically, the k-RPW distance has several desirable properties, including satisfying the metric axioms and converging to the true distance at a faster rate than the 2-Wasserstein distance when estimated from samples.

The experimental results on image retrieval tasks are promising, showing that the k-RPW distance outperforms the 1-Wasserstein, 2-Wasserstein, and TV distances. However, further research is needed to fully understand the practical implications and limitations of the k-RPW distance, as well as its connections to other related distance measures.

Overall, this work presents an interesting and potentially impactful contribution to the field of probability metric learning, with applications in areas such as private Wasserstein distance with random noises and beyond. The introduction of the k-RPW distance opens up new avenues for exploring more robust and efficient methods for measuring distribution dissimilarity.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🚀

A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions

Sharath Raghvendra, Pouyan Shirzadian, Kaiyi Zhang

The $2$-Wasserstein distance is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the $2$-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical $2$-Wasserstein distance on $n$ samples in $mathbb{R}^2$ to converge to the true distance at a rate of $n^{-1/4}$, which is significantly slower than the rate of $n^{-1/2}$ for $1$-Wasserstein distance. We introduce a new family of distances parameterized by $k ge 0$, called $k$-RPW that is based on computing the partial $2$-Wasserstein distance. We show that (1) $k$-RPW satisfies the metric properties, (2) $k$-RPW is robust to small outlier mass while retaining the sensitivity of $2$-Wasserstein distance to minor geometric differences, and (3) when $k$ is a constant, $k$-RPW distance between empirical distributions on $n$ samples in $mathbb{R}^2$ converges to the true distance at a rate of $n^{-1/3}$, which is faster than the convergence rate of $n^{-1/4}$ for the $2$-Wasserstein distance. Using the partial $p$-Wasserstein distance, we extend our distance to any $p in [1,infty]$. By setting parameters $k$ or $p$ appropriately, we can reduce our distance to the total variation, $p$-Wasserstein, and the L'evy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy in comparison to the $1$-Wasserstein, $2$-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.

6/4/2024

Relative-Translation Invariant Wasserstein Distance

Binshuai Wang, Qiwei Di, Ming Yin, Mengdi Wang, Quanquan Gu, Peng Wei

We introduce a new family of distances, relative-translation invariant Wasserstein distances ($RW_p$), for measuring the similarity of two probability distributions under distribution shift. Generalizing it from the classical optimal transport model, we show that $RW_p$ distances are also real distance metrics defined on the quotient set $mathcal{P}_p(mathbb{R}^n)/sim$ and invariant to distribution translations. When $p=2$, the $RW_2$ distance enjoys more exciting properties, including decomposability of the optimal transport model, translation-invariance of the $RW_2$ distance, and a Pythagorean relationship between $RW_2$ and the classical quadratic Wasserstein distance ($W_2$). Based on these properties, we show that a distribution shift, measured by $W_2$ distance, can be explained in the bias-variance perspective. In addition, we propose a variant of the Sinkhorn algorithm, named $RW_2$ Sinkhorn algorithm, for efficiently calculating $RW_2$ distance, coupling solutions, as well as $W_2$ distance. We also provide the analysis of numerical stability and time complexity for the proposed algorithm. Finally, we validate the $RW_2$ distance metric and the algorithm performance with three experiments. We conduct one numerical validation for the $RW_2$ Sinkhorn algorithm and show two real-world applications demonstrating the effectiveness of using $RW_2$ under distribution shift: digits recognition and similar thunderstorm detection. The experimental results report that our proposed algorithm significantly improves the computational efficiency of Sinkhorn in certain practical applications, and the $RW_2$ distance is robust to distribution translations compared with baselines.

9/5/2024

Instance-Optimal Private Density Estimation in the Wasserstein Distance

Vitaly Feldman, Audra McMillan, Satchit Sivakumar, Kunal Talwar

Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating population densities in a geographic region, a small Wasserstein distance means that the estimate is able to capture roughly where the population mass is. In this work we study differentially private density estimation in the Wasserstein distance. We design and analyze instance-optimal algorithms for this problem that can adapt to easy instances. For distributions $P$ over $mathbb{R}$, we consider a strong notion of instance-optimality: an algorithm that uniformly achieves the instance-optimal estimation rate is competitive with an algorithm that is told that the distribution is either $P$ or $Q_P$ for some distribution $Q_P$ whose probability density function (pdf) is within a factor of 2 of the pdf of $P$. For distributions over $mathbb{R}^2$, we use a different notion of instance optimality. We say that an algorithm is instance-optimal if it is competitive with an algorithm that is given a constant-factor multiplicative approximation of the density of the distribution. We characterize the instance-optimal estimation rates in both these settings and show that they are uniformly achievable (up to polylogarithmic factors). Our approach for $mathbb{R}^2$ extends to arbitrary metric spaces as it goes via hierarchically separated trees. As a special case our results lead to instance-optimal private learning in TV distance for discrete distributions.

7/1/2024

📊

Two-sample Test using Projected Wasserstein Distance

Jie Wang, Rui Gao, Yao Xie

We develop a projected Wasserstein distance for the two-sample test, a fundamental problem in statistics and machine learning: given two sets of samples, to determine whether they are from the same distribution. In particular, we aim to circumvent the curse of dimensionality in Wasserstein distance: when the dimension is high, it has diminishing testing power, which is inherently due to the slow concentration property of Wasserstein metrics in the high dimension space. A key contribution is to couple optimal projection to find the low dimensional linear mapping to maximize the Wasserstein distance between projected probability distributions. We characterize the theoretical property of the finite-sample convergence rate on IPMs and present practical algorithms for computing this metric. Numerical examples validate our theoretical results.

4/1/2024