An ordinary differential equation for entropic optimal transport and its linearly constrained variants

Read original: arXiv:2403.20238 - Published 4/1/2024 by Joshua Zoen-Git Hiew, Luca Nenna, Brendan Pass

🤯

Introduction

The paper discusses optimal transport problems involving two or more probability measures and cost functions. It introduces a generalized optimal transport problem that allows multiple marginal constraints and additional linear constraints. The authors propose an entropic regularization approach, leading to a strictly convex optimization problem that can be efficiently solved using the Sinkhorn algorithm.

The main contribution is deriving an ordinary differential equation (ODE) that characterizes the solutions to the regularized optimal transport problem with discrete marginals. Solving this ODE provides a new numerical method for computing solutions to various optimal transport problems, including multi-marginal problems, problems with linear constraints, and even Wasserstein barycenters and geodesics.

The ODE formulation offers two key advantages: 1) It efficiently computes the entire curve of solutions as the regularization parameter varies, rather than requiring separate computations for each parameter value. 2) It allows calculating derivatives of the optimal cost around the fully regularized limit, enabling Taylor expansions in this regime.

The paper also provides a general framework for applying the Sinkhorn algorithm to linearly constrained optimal transport problems, unifying previous work on specific cases. Several numerical examples demonstrate the feasibility of the proposed ODE method and compare it with the traditional Sinkhorn algorithm.

Optimal Transport with Extra Linear constraints

The section discusses the optimal transport problem with additional linear constraints on the couplings between probability measures. It introduces the basic problem setting, defining the primal and dual optimization problems. The focus is on the discrete case where the probability measures have finite support.

The primal problem aims to find a coupling minimizing the total transportation cost plus an entropy regularization term, subject to the linear constraints. The dual maximizes a concave function involving the marginal distributions and constraints.

A key result shows that by removing certain redundant columns from the constraint matrices, one can obtain an equivalent problem where the dual objective is strictly concave, ensuring a unique solution exists when the regularization parameter is non-zero.

The paper then expresses the discrete primal and dual problems in a compact matrix-vector form, setting up the framework to study properties of the solution paths as the regularization parameter varies from zero to infinity. The goal is to characterize how the regularized solutions interpolate between the original optimal transport solution and an entropy-minimizing coupling satisfying the constraints.

ODE for entropic Optimal Transport

The provided section discusses formulating and solving an optimal dual Kantorovich potential problem using ordinary differential equations (ODEs). Key points:

A function Φ(φ,ε) is defined involving the entropy of the probability measures.
Minimizing Φ(φ,ε) over φ is equivalent to the original optimal transport problem for a fixed ε.
Lemma 3.1 shows Φ(φ,ε) is uniformly convex in φ, ensuring a unique minimizer φ(ε) for each ε.
Differentiating the optimality condition w.r.t. ε yields an ODE governing how φ(ε) changes with ε.
Theorem 3.2 establishes this ODE has a unique smooth solution characterized by an initial value problem.
The initial value φ(0) corresponds to the unconstrained optimal transport problem.
Proposition 3.4 shows a Sinkhorn algorithm converges to the unique solution of the dual problem for fixed ε.

Overall, this formulates the dual optimal transport problem as an ODE initial value problem that can be solved numerically using the Sinkhorn algorithm. The solution path φ(ε) then parameterizes the optimal transport plans.

Derivatives of the optimal cost

The section discusses how to compute derivatives of the optimal cost function C(ε) at ε=0 for the two-marginal optimal transport problem. It focuses on the unconstrained case (n=2, Q={0}) for simplicity.

The primal problem P(ε) and dual problem D(ε) are defined. By eliminating one variable, the dual can be expressed solely in terms of u as an optimization problem infuΦ(u,ε)=-D(ε)=-P(ε), where Φ(u,ε) has a simple form.

The derivatives C'(0), C''(0) are computed in closed form using the ODEs satisfied by the optimizers. C'(0) is simply the negative expected cost under the product measure μ⊗ν. C''(0) involves the variance of the cost function under μ⊗ν.

Higher order derivatives can be obtained similarly by further differentiating and using the ODEs. A remark notes that C''(0)≥0 since P(ε) is concave, though the explicit expression does not obviously show non-negativity. A counterexample shows the expression can be negative for non-product couplings.

The technical details involve computing the derivatives of Φ(u,ε) with respect to ε and u using the ODEs satisfied by the optimizers (u(ε),v(ε)). Overall, the section provides a technique to compute derivatives of the optimal transport cost at ε=0 in closed form.

Numerical examples

The provided section discusses numerical approximations of solutions to the optimal transport problem and several variants, using an ODE method. It focuses on computing entropy-regularized optimal transport plans by discretizing an ODE in the regularization parameter ε and solving with a 4th-order Runge-Kutta method.

The section covers several examples:

Two marginal optimal transport: Comparing the ODE method to the Sinkhorn algorithm for computing optimal couplings between two discrete marginal measures, with attractive and repulsive costs.
Multi-marginal optimal transport: A three marginal problem with a specific cost function, comparing ODE and Sinkhorn results.
Martingale optimal transport: Enforcing the martingale constraint by projecting onto a set of test functions. Examples use discrete marginals with costs of Spence-Mirrlees type.
Multi-period martingale optimal transport: Extending to a 3-period problem with computations of optimal multi-period left-monotone couplings.
Wasserstein geodesics and barycenters: Using the ODE approach to approximate displacement interpolants between measures and compute Wasserstein barycenters.

In each example, numerical results from the ODE method are compared against Sinkhorn computations and theoretical values when available. Plots visualize the evolution of optimal couplings as ε varies. Overall, the ODE approach is shown to produce accurate approximations competitively with Sinkhorn.

Appendix A Convergence of entropic optimal transport with extra linear constraints to the unregularized limit

The provided text aims to prove that the optimal solution to the regularized (entropic) optimal transport problem with additional linear constraints will converge to an optimal solution of the unregularized problem that has the minimum relative entropy.

It first establishes some preliminary lemmas. Lemma A.1 shows that the set of probability measures satisfying the linear constraints is compact in the weak topology. Lemma A.2 states that if the sequence of regularized optimal solutions converges weakly to some limit, and their entropies are uniformly bounded, then this limit is an optimal solution to the unregularized problem with minimum entropy among all optimal solutions.

The text then focuses on the discrete case and uses a simplified notation. Proposition A.3 proves that the sequence of dual potentials corresponding to the regularized problems is bounded. If this sequence converges, its limit is an optimal solution to the unregularized dual problem. The proof relies on bounding the exponential terms appearing in the regularized dual problems using the constraint qualifications.

In summary, the provided section establishes theoretical results guaranteeing that the regularized entropic optimal transport problem can approximate the unregularized problem as the regularization parameter goes to zero, and the solution attains minimum entropy among all optimal solutions in the limit.

Appendix B Calculation of the second derivatives of C⁢(ε)𝐶𝜀C(\varepsilon)italic_C ( italic_ε )

The provided section derives the second-order derivatives of the entropy regularized optimal transport cost functions for three different settings:

Multi-marginal optimal transport: The function involves marginal distributions over sources and destinations, and the cost depends on the source-destination pair. The second derivatives are given with respect to the dual variables associated with the source and destination marginals.
Martingale optimal transport: The function involves a base marginal and a stochastic kernel mapping the base to target marginals. The cost depends on the base sample and the pair of base-target samples. The second derivatives are given with respect to the dual variables associated with the base marginal and the stochastic kernel.
Multi-period optimal transport: This extends the martingale setting to multiple periods. The cost now depends on the sequence of samples across periods. The second derivatives are given with respect to the dual variables associated with the marginals over each period and the stochastic kernels mapping between consecutive periods.

The derivations make use of certain probability distributions parameterized by the dual variables and costs. Kronecker deltas and ratios of these distributions are used to express the second derivatives compactly in matrix form. The final expressions involve sums and products of these probability distributions evaluated at different sample values.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🤯

An ordinary differential equation for entropic optimal transport and its linearly constrained variants

Joshua Zoen-Git Hiew, Luca Nenna, Brendan Pass

We characterize the solution to the entropically regularized optimal transport problem by a well-posed ordinary differential equation (ODE). Our approach works for discrete marginals and general cost functions, and in addition to two marginal problems, applies to multi-marginal problems and those with additional linear constraints. Solving the ODE gives a new numerical method to solve the optimal transport problem, which has the advantage of yielding the solution for all intermediate values of the ODE parameter (which is equivalent to the usual regularization parameter). We illustrate this method with several numerical simulations. The formulation of the ODE also allows one to compute derivatives of the optimal cost when the ODE parameter is $0$, corresponding to the fully regularized limit problem in which only the entropy is minimized.

4/1/2024

🛠️

Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods

Gen Li, Yanxi Chen, Yu Huang, Yuejie Chi, H. Vincent Poor, Yuxin Chen

Efficient computation of the optimal transport distance between two distributions serves as an algorithm subroutine that empowers various applications. This paper develops a scalable first-order optimization-based method that computes optimal transport to within $varepsilon$ additive accuracy with runtime $widetilde{O}( n^2/varepsilon)$, where $n$ denotes the dimension of the probability distributions of interest. Our algorithm achieves the state-of-the-art computational guarantees among all first-order methods, while exhibiting favorable numerical performance compared to classical algorithms like Sinkhorn and Greenkhorn. Underlying our algorithm designs are two key elements: (a) converting the original problem into a bilinear minimax problem over probability distributions; (b) exploiting the extragradient idea -- in conjunction with entropy regularization and adaptive learning rates -- to accelerate convergence.

6/21/2024

Progressive Entropic Optimal Transport Solvers

Parnian Kassraie, Aram-Alexandre Pooladian, Michal Klein, James Thornton, Jonathan Niles-Weed, Marco Cuturi

Optimal transport (OT) has profoundly impacted machine learning by providing theoretical and computational tools to realign datasets. In this context, given two large point clouds of sizes $n$ and $m$ in $mathbb{R}^d$, entropic OT (EOT) solvers have emerged as the most reliable tool to either solve the Kantorovich problem and output a $ntimes m$ coupling matrix, or to solve the Monge problem and learn a vector-valued push-forward map. While the robustness of EOT couplings/maps makes them a go-to choice in practical applications, EOT solvers remain difficult to tune because of a small but influential set of hyperparameters, notably the omnipresent entropic regularization strength $varepsilon$. Setting $varepsilon$ can be difficult, as it simultaneously impacts various performance metrics, such as compute speed, statistical performance, generalization, and bias. In this work, we propose a new class of EOT solvers (ProgOT), that can estimate both plans and transport maps. We take advantage of several opportunities to optimize the computation of EOT solutions by dividing mass displacement using a time discretization, borrowing inspiration from dynamic OT formulations, and conquering each of these steps using EOT with properly scheduled parameters. We provide experimental evidence demonstrating that ProgOT is a faster and more robust alternative to standard solvers when computing couplings at large scales, even outperforming neural network-based approaches. We also prove statistical consistency of our approach for estimating optimal transport maps.

6/10/2024

Neural Optimal Transport with Lagrangian Costs

Aram-Alexandre Pooladian, Carles Domingo-Enrich, Ricky T. Q. Chen, Brandon Amos

We investigate the optimal transport problem between probability measures when the underlying cost function is understood to satisfy a least action principle, also known as a Lagrangian cost. These generalizations are useful when connecting observations from a physical system where the transport dynamics are influenced by the geometry of the system, such as obstacles (e.g., incorporating barrier functions in the Lagrangian), and allows practitioners to incorporate a priori knowledge of the underlying system such as non-Euclidean geometries (e.g., paths must be circular). Our contributions are of computational interest, where we demonstrate the ability to efficiently compute geodesics and amortize spline-based paths, which has not been done before, even in low dimensional problems. Unlike prior work, we also output the resulting Lagrangian optimal transport map without requiring an ODE solver. We demonstrate the effectiveness of our formulation on low-dimensional examples taken from prior work. The source code to reproduce our experiments is available at https://github.com/facebookresearch/lagrangian-ot.

6/4/2024