Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods

Read original: arXiv:2301.13006 - Published 6/21/2024 by Gen Li, Yanxi Chen, Yu Huang, Yuejie Chi, H. Vincent Poor, Yuxin Chen

🛠️

Overview

This paper presents a new algorithm for efficiently computing the optimal transport distance between two probability distributions.
The algorithm achieves state-of-the-art computational guarantees among first-order methods and exhibits favorable numerical performance compared to classical approaches like Sinkhorn and Greenkhorn.
The key elements of the algorithm are: (a) converting the original problem into a bilinear minimax problem over probability distributions, and (b) using the extragradient idea in conjunction with entropy regularization and adaptive learning rates to accelerate convergence.

Plain English Explanation

The optimal transport distance is a way to measure the similarity between two probability distributions, with applications in various machine learning tasks. Efficiently computing this distance is important as it can serve as a subroutine for other algorithms.

The paper introduces a new method that can compute the optimal transport distance quickly and accurately. The key ideas are:

Reframing the problem: The original optimal transport problem is converted into a bilinear minimax problem, which is easier to solve.
Optimization technique: The extragradient method, combined with entropy regularization and adaptive learning rates, is used to accelerate the convergence of the optimization process.

This new algorithm achieves the best theoretical runtime guarantees among first-order methods, while also performing well in practice compared to classical approaches like Sinkhorn and Greenkhorn.

Technical Explanation

The paper develops a scalable first-order optimization-based method to compute the optimal transport distance between two probability distributions. The algorithm achieves $\varepsilon$-additive accuracy in $\widetilde{O}(n^2/\varepsilon)$ runtime, where $n$ is the dimension of the distributions.

The key elements of the algorithm are:

Bilinear Minimax Formulation: The authors convert the original optimal transport problem into a bilinear minimax problem over probability distributions. This reformulation allows them to leverage powerful optimization techniques.
Extragradient with Adaptive Learning: The algorithm exploits the extragradient idea, which involves performing two gradient steps per iteration. This is combined with entropy regularization and adaptive learning rates to accelerate convergence.

The experimental results show that the proposed algorithm outperforms classical approaches like Sinkhorn and Greenkhorn in terms of both computational efficiency and numerical performance.

Critical Analysis

The paper presents a strong theoretical analysis and empirical evaluation of the proposed algorithm. However, a few potential limitations and areas for further research are worth considering:

Sensitivity to Hyperparameters: The algorithm relies on several hyperparameters, such as the learning rate and the entropy regularization parameter. The sensitivity of the method to these hyperparameters is not fully explored, and tuning them may require additional computational effort.
Generalization to Other Transport Metrics: The paper focuses on the Wasserstein distance, but there are other important optimal transport metrics, such as the Gromov-Wasserstein distance. Extending the proposed algorithm to these more general settings could further expand its applicability.
Real-world Performance: While the algorithm exhibits strong performance on synthetic benchmarks, its effectiveness in real-world applications with large-scale, high-dimensional data is not extensively evaluated. Assessing the algorithm's practical utility in diverse domains would be a valuable direction for future research.

Despite these potential areas for improvement, the paper represents an important contribution to the efficient computation of optimal transport distances, which is a fundamental problem in machine learning and optimization.

Conclusion

This paper presents a novel first-order optimization-based algorithm for efficiently computing the optimal transport distance between two probability distributions. The algorithm achieves state-of-the-art computational guarantees and outperforms classical approaches in numerical experiments. The key ideas behind the algorithm's success are the reformulation of the problem as a bilinear minimax problem and the use of the extragradient method with entropy regularization and adaptive learning rates. While the paper identifies some potential limitations, the proposed method represents a significant advancement in the field of optimal transport computation, with important implications for a wide range of machine learning applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🛠️

Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods

Gen Li, Yanxi Chen, Yu Huang, Yuejie Chi, H. Vincent Poor, Yuxin Chen

Efficient computation of the optimal transport distance between two distributions serves as an algorithm subroutine that empowers various applications. This paper develops a scalable first-order optimization-based method that computes optimal transport to within $varepsilon$ additive accuracy with runtime $widetilde{O}( n^2/varepsilon)$, where $n$ denotes the dimension of the probability distributions of interest. Our algorithm achieves the state-of-the-art computational guarantees among all first-order methods, while exhibiting favorable numerical performance compared to classical algorithms like Sinkhorn and Greenkhorn. Underlying our algorithm designs are two key elements: (a) converting the original problem into a bilinear minimax problem over probability distributions; (b) exploiting the extragradient idea -- in conjunction with entropy regularization and adaptive learning rates -- to accelerate convergence.

6/21/2024

Fast Gradient Computation for Gromov-Wasserstein Distance

Wei Zhang, Zihao Wang, Jie Fan, Hao Wu, Yong Zhang

The Gromov-Wasserstein distance is a notable extension of optimal transport. In contrast to the classic Wasserstein distance, it solves a quadratic assignment problem that minimizes the pair-wise distance distortion under the transportation of distributions and thus could apply to distributions in different spaces. These properties make Gromov-Wasserstein widely applicable to many fields, such as computer graphics and machine learning. However, the computation of the Gromov-Wasserstein distance and transport plan is expensive. The well-known Entropic Gromov-Wasserstein approach has a cubic complexity since the matrix multiplication operations need to be repeated in computing the gradient of Gromov-Wasserstein loss. This becomes a key bottleneck of the method. Currently, existing methods accelerate the computation focus on sampling and approximation, which leads to low accuracy or incomplete transport plan. In this work, we propose a novel method to accelerate accurate gradient computation by dynamic programming techniques, reducing the complexity from cubic to quadratic. In this way, the original computational bottleneck is broken and the new entropic solution can be obtained with total quadratic time, which is almost optimal complexity. Furthermore, it can be extended to some variants easily. Extensive experiments validate the efficiency and effectiveness of our method.

4/16/2024

🤯

An ordinary differential equation for entropic optimal transport and its linearly constrained variants

Joshua Zoen-Git Hiew, Luca Nenna, Brendan Pass

We characterize the solution to the entropically regularized optimal transport problem by a well-posed ordinary differential equation (ODE). Our approach works for discrete marginals and general cost functions, and in addition to two marginal problems, applies to multi-marginal problems and those with additional linear constraints. Solving the ODE gives a new numerical method to solve the optimal transport problem, which has the advantage of yielding the solution for all intermediate values of the ODE parameter (which is equivalent to the usual regularization parameter). We illustrate this method with several numerical simulations. The formulation of the ODE also allows one to compute derivatives of the optimal cost when the ODE parameter is $0$, corresponding to the fully regularized limit problem in which only the entropy is minimized.

4/1/2024

✨

Decentralized Entropic Optimal Transport for Distributed Distribution Comparison

Xiangfeng Wang, Hongteng Xu, Moyi Yang

Distributed distribution comparison aims to measure the distance between the distributions whose data are scattered across different agents in a distributed system and cannot even be shared directly among the agents. In this study, we propose a novel decentralized entropic optimal transport (DEOT) method, which provides a communication-efficient and privacy-preserving solution to this problem with theoretical guarantees. In particular, we design a mini-batch randomized block-coordinate descent (MRBCD) scheme to optimize the DEOT distance in its dual form. The dual variables are scattered across different agents and updated locally and iteratively with limited communications among partial agents. The kernel matrix involved in the gradients of the dual variables is estimated by a decentralized kernel approximation method, in which each agent only needs to approximate and store a sub-kernel matrix by one-shot communication and without sharing raw data. Besides computing entropic Wasserstein distance, we show that the proposed MRBCD scheme and kernel approximation method also apply to entropic Gromov-Wasserstein distance. We analyze our method's communication complexity and, under mild assumptions, provide a theoretical bound for the approximation error caused by the convergence error, the estimated kernel, and the mismatch between the storage and communication protocols. In addition, we discuss the trade-off between the precision of the EOT distance and the strength of privacy protection when implementing our method. Experiments on synthetic data and real-world distributed domain adaptation tasks demonstrate the effectiveness of our method.

7/23/2024