Approximation Theory, Computing, and Deep Learning on the Wasserstein Space

2310.19548

Published 5/1/2024 by Massimo Fornasier, Pascal Heid, Giacomo Enrico Sodini

🤿

Abstract

The challenge of approximating functions in infinite-dimensional spaces from finite samples is widely regarded as formidable. In this study, we delve into the challenging problem of the numerical approximation of Sobolev-smooth functions defined on probability spaces. Our particular focus centers on the Wasserstein distance function, which serves as a relevant example. In contrast to the existing body of literature focused on approximating efficiently pointwise evaluations, we chart a new course to define functional approximants by adopting three machine learning-based approaches: 1. Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials. 2. Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces. 3. Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional's Euler-Lagrange equation. As a theoretical contribution, we furnish explicit and quantitative bounds on generalization errors for each of these solutions. In the proofs, we leverage the theory of metric Sobolev spaces and we combine it with techniques of optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, we harness appropriately designed neural networks to serve as basis functions. These networks undergo training using diverse methodologies. This approach allows us to obtain approximating functions that can be rapidly evaluated after training. Consequently, our constructive solutions significantly enhance at equal accuracy the evaluation speed, surpassing that of state-of-the-art methods by several orders of magnitude.

Create account to get full access

Overview

The paper addresses the challenge of approximating Sobolev-smooth functions defined on probability spaces from finite samples.
The focus is on the Wasserstein distance function as a relevant example.
The authors propose three machine learning-based approaches to define functional approximants, including solving optimal transport problems, using empirical risk minimization with Tikhonov regularization, and a saddle point formulation.
Explicit and quantitative bounds on generalization errors are provided for each solution.
The authors leverage the theory of metric Sobolev spaces, optimal transport, variational calculus, and large deviation bounds in their proofs.
Neural networks are used as basis functions, allowing for rapid evaluation of the approximating functions after training.

Plain English Explanation

The paper tackles the difficult problem of approximating smooth functions defined on probability spaces using a finite number of samples. This is a challenging task because these functions exist in an infinite-dimensional space, making it hard to capture their full complexity from a limited dataset.

The authors focus on the Wasserstein distance function as an example, which is a useful metric for comparing probability distributions. Instead of just trying to approximate the function's values at specific points, the researchers propose three machine learning-based approaches to define full functional approximations.

The first approach involves solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials. The second approach uses empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces. The third approach addresses the problem through a saddle point formulation that characterizes the weak form of the Tikhonov functional's Euler-Lagrange equation.

Importantly, the authors provide precise mathematical bounds on how well these approximations can generalize to new data, leveraging advanced concepts from metric Sobolev spaces, optimal transport, variational calculus, and large deviation theory.

The researchers also use appropriately designed neural networks as basis functions, which can be trained using diverse methodologies. This allows the approximating functions to be evaluated very quickly after the training process, significantly outperforming state-of-the-art methods in terms of evaluation speed at the same level of accuracy.

Technical Explanation

The paper investigates the challenging problem of numerically approximating Sobolev-smooth functions defined on probability spaces from finite samples. The authors focus on the Wasserstein distance function as a relevant example.

In contrast to existing literature that has primarily focused on approximating pointwise evaluations, the researchers propose three machine learning-based approaches to define functional approximants:

Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials.
Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces.
Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional's Euler-Lagrange equation.

As a theoretical contribution, the authors provide explicit and quantitative bounds on generalization errors for each of these solutions. In the proofs, they leverage the theory of metric Sobolev spaces and combine it with techniques from optimal transport, variational calculus, and large deviation bounds.

In the numerical implementation, the researchers use appropriately designed neural networks as basis functions, which undergo training using diverse methodologies. This approach allows them to obtain approximating functions that can be rapidly evaluated after training, significantly outperforming state-of-the-art methods in terms of evaluation speed at equal accuracy.

Critical Analysis

The paper presents a comprehensive and rigorous approach to approximating Sobolev-smooth functions defined on probability spaces from finite samples. The authors have provided solid theoretical foundations and explicit generalization error bounds for their proposed solutions, which is a significant contribution to the field.

One potential limitation of the research is that the focus is primarily on the Wasserstein distance function as an example, and it remains to be seen how well the proposed methods would generalize to other types of Sobolev-smooth functions. Additionally, the numerical experiments are conducted on synthetic data, and it would be valuable to see the performance of these methods on real-world datasets.

Furthermore, while the authors have demonstrated significant improvements in evaluation speed compared to state-of-the-art methods, it would be interesting to explore the computational complexity and training time required for their approaches. This information could provide a more complete understanding of the practical implications of the proposed solutions.

Overall, the paper presents an important and well-executed contribution to the challenging problem of approximating functions in infinite-dimensional spaces from finite samples. The theoretical insights and the development of fast-evaluating approximating functions are valuable advancements in the field.

Conclusion

This paper tackles the formidable challenge of numerically approximating Sobolev-smooth functions defined on probability spaces from finite samples. The authors propose three machine learning-based approaches to define functional approximants, including solving optimal transport problems, using empirical risk minimization with Tikhonov regularization, and a saddle point formulation.

Importantly, the researchers provide explicit and quantitative bounds on the generalization errors of their solutions, drawing on advanced mathematical concepts from metric Sobolev spaces, optimal transport, variational calculus, and large deviation theory. The use of appropriately designed neural networks as basis functions allows for rapid evaluation of the approximating functions after training, significantly outperforming state-of-the-art methods.

This work represents a significant advancement in the field of function approximation in infinite-dimensional spaces, with potential applications in various areas of machine learning and data analysis. The insights and techniques developed in this paper can inspire further research and contribute to the continued progress in this challenging and important domain.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Mirror and Preconditioned Gradient Descent in Wasserstein Space

Cl'ement Bonet, Th'eo Uscidda, Adam David, Pierre-Cyril Aubin-Frankowski, Anna Korba

As the problem of minimizing functionals on the Wasserstein space encompasses many applications in machine learning, different optimization algorithms on $mathbb{R}^d$ have received their counterpart analog on the Wasserstein space. We focus here on lifting two explicit algorithms: mirror descent and preconditioned gradient descent. These algorithms have been introduced to better capture the geometry of the function to minimize and are provably convergent under appropriate (namely relative) smoothness and convexity conditions. Adapting these notions to the Wasserstein space, we prove guarantees of convergence of some Wasserstein-gradient-based discrete-time schemes for new pairings of objective functionals and regularizers. The difficulty here is to carefully select along which curves the functionals should be smooth and convex. We illustrate the advantages of adapting the geometry induced by the regularizer on ill-conditioned optimization tasks, and showcase the improvement of choosing different discrepancies and geometries in a computational biology task of aligning single-cells.

6/14/2024

cs.LG

🧠

Gaussian random field approximation via Stein's method with applications to wide random neural networks

Krishnakumar Balasubramanian, Larry Goldstein, Nathan Ross, Adil Salim

We derive upper bounds on the Wasserstein distance ($W_1$), with respect to $sup$-norm, between any continuous $mathbb{R}^d$ valued random field indexed by the $n$-sphere and the Gaussian, based on Stein's method. We develop a novel Gaussian smoothing technique that allows us to transfer a bound in a smoother metric to the $W_1$ distance. The smoothing is based on covariance functions constructed using powers of Laplacian operators, designed so that the associated Gaussian process has a tractable Cameron-Martin or Reproducing Kernel Hilbert Space. This feature enables us to move beyond one dimensional interval-based index sets that were previously considered in the literature. Specializing our general result, we obtain the first bounds on the Gaussian random field approximation of wide random neural networks of any depth and Lipschitz activation functions at the random field level. Our bounds are explicitly expressed in terms of the widths of the network and moments of the random weights. We also obtain tighter bounds when the activation function has three bounded derivatives.

5/2/2024

cs.LG stat.ML

🤯

Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space

Yiheng Jiang, Sinho Chewi, Aram-Alexandre Pooladian

We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods. Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $pi$ over $mathbb{R}^d$ by a product measure $pi^star$. When $pi$ is strongly log-concave and log-smooth, we provide (1) approximation rates certifying that $pi^star$ is close to the minimizer $pi^star_diamond$ of the KL divergence over a emph{polyhedral} set $mathcal{P}_diamond$, and (2) an algorithm for minimizing $text{KL}(cdot|pi)$ over $mathcal{P}_diamond$ with accelerated complexity $O(sqrt kappa log(kappa d/varepsilon^2))$, where $kappa$ is the condition number of $pi$.

6/11/2024

cs.LG

New!Instance-Optimal Private Density Estimation in the Wasserstein Distance

Vitaly Feldman, Audra McMillan, Satchit Sivakumar, Kunal Talwar

Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating population densities in a geographic region, a small Wasserstein distance means that the estimate is able to capture roughly where the population mass is. In this work we study differentially private density estimation in the Wasserstein distance. We design and analyze instance-optimal algorithms for this problem that can adapt to easy instances. For distributions $P$ over $mathbb{R}$, we consider a strong notion of instance-optimality: an algorithm that uniformly achieves the instance-optimal estimation rate is competitive with an algorithm that is told that the distribution is either $P$ or $Q_P$ for some distribution $Q_P$ whose probability density function (pdf) is within a factor of 2 of the pdf of $P$. For distributions over $mathbb{R}^2$, we use a different notion of instance optimality. We say that an algorithm is instance-optimal if it is competitive with an algorithm that is given a constant-factor multiplicative approximation of the density of the distribution. We characterize the instance-optimal estimation rates in both these settings and show that they are uniformly achievable (up to polylogarithmic factors). Our approach for $mathbb{R}^2$ extends to arbitrary metric spaces as it goes via hierarchically separated trees. As a special case our results lead to instance-optimal private learning in TV distance for discrete distributions.

7/1/2024

cs.LG cs.CR cs.DS stat.ML