Sbornik: Mathematics, 2024, Volume 215, Issue 1, Pages 28–51
DOI: https://doi.org/10.4213/sm9920e
(Mi sm9920)
 


Hausdorff distances between couplings and optimal transportation

V. I. Bogachev^{a,b}, S. N. Popova^{c,b}

a Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, Moscow, Russia
b National Research University Higher School of Economics, Moscow, Russia
c Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
Abstract: We consider optimal transportation of measures on metric and topological spaces in the case where the cost function and marginal distributions depend on a parameter with values in a metric space. The Hausdorff distance between the sets of probability measures with prescribed marginals is estimated in terms of the distances between the marginals themselves. This estimate is used to prove the continuity of the cost of optimal transportation with respect to the parameter in the case of the continuous dependence of the cost function and marginal distributions on this parameter. Existence of approximate optimal plans continuous with respect to the parameter is established. It is shown that the optimal plan is continuous with respect to the parameter in the case of uniqueness. However, examples are constructed in which there is no continuous selection of optimal plans. Another application of the estimate for the Hausdorff distance concerns discrete approximations of the transportation problem. Finally, a general result on the convergence of Monge optimal mappings is proved.
Bibliography: 46 titles.
Keywords: Kantorovich problem, Monge problem, Hausdorff distance, coupling, weak convergence, continuity with respect to a parameter.
Funding agency: Russian Science Foundation. Grant number: 22-11-00015.
This research was carried out at Lomonosov Moscow State University and supported by the Russian Science Foundation under grant no. 22-11-00015, https://rscf.ru/en/project/22-11-00015/.
Received: 05.04.2023 and 30.09.2023
Russian version:
Matematicheskii Sbornik, 2024, Volume 215, Number 1, Pages 33–58
DOI: https://doi.org/10.4213/sm9920
Document Type: Article
MSC: Primary 49Q22; Secondary 60A10
Language: English
Original paper language: Russian

§ 1. Introduction

Recall that the Kantorovich optimal transportation problem deals with a triple $(\mu,\nu, h)$, where $\mu$ and $\nu$ are Borel probability measures on topological spaces $X$ and $Y$, respectively, and $h\geqslant 0$ is a Borel function on $X\times Y$. The problem concerns the minimization of the integral

$$ \begin{equation*} \int h\, d\sigma \end{equation*} \notag $$
over all measures $\sigma$ in the set $\Pi(\mu,\nu)$ consisting of the Borel probability measures on $X\times Y$ with projections $\mu$ and $\nu$ onto the factors, that is, $\sigma (A\times Y)=\mu(A)$ and $\sigma (X\times B)=\nu(B)$ for all Borel sets $A\subset X$ and $B\subset Y$. The measures $\mu$ and $\nu$ are called the marginal distributions or marginals, and $h$ is called a cost function. Measures in $\Pi(\mu,\nu)$ are called couplings of $\mu$ and $\nu$ or Kantorovich transport plans. In general, there exists only an infimum $K_h(\mu,\nu)$ (which can be infinite), but if $h$ is continuous (or lower semicontinuous) and bounded and $\mu$ and $\nu$ are Radon, then the minimum is attained, and measures on which it is attained are called optimal measures or optimal Kantorovich plans. Moreover, the boundedness of $h$ can be replaced by the assumption that there is a measure in $\Pi(\mu,\nu)$ with respect to which $h$ is integrable. The problem is also meaningful in the purely set-theoretic setting, but here we consider the topological case; moreover, the spaces under consideration are completely regular and, in some results, metric. The multimarginal Kantorovich problem with marginals $\mu_1,\dots,\mu_n$ on spaces $X_1,\dots,X_n$ is introduced similarly. General information about the Monge and Kantorovich problems can be found in [3], [4], [12], [17], [25], [36], [39], [42] and [43].
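For finite spaces the Kantorovich problem above is a finite-dimensional linear program. The following minimal sketch (Python with NumPy and SciPy; the marginals and the cost matrix are hypothetical illustrative data) computes $K_h(\mu,\nu)$ and an optimal plan in this discrete setting.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical discrete marginals: mu on {x_1, x_2, x_3}, nu on {y_1, y_2}.
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.4, 0.6])
h = np.array([[0.0, 1.0],          # cost matrix h(x_i, y_j)
              [2.0, 0.5],
              [1.5, 3.0]])
m, n = h.shape

# Unknown plan sigma(i, j) >= 0, flattened row-wise into a vector of length m*n.
# Marginal constraints: sum_j sigma(i, j) = mu(i) and sum_i sigma(i, j) = nu(j).
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0     # row sums give mu
for j in range(n):
    A_eq[m + j, j::n] = 1.0              # column sums give nu
b_eq = np.concatenate([mu, nu])

res = linprog(h.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
sigma = res.x.reshape(m, n)              # an optimal Kantorovich plan
print("K_h(mu, nu) =", res.fun)
```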

We study optimal transportation of measures on metric and topological spaces in the case where the cost function $h_t$ and marginal distributions $\mu_t$ and $\nu_t$ depend on a parameter $t$ with values in a metric space. Here two questions arise naturally: whether the optimal cost $K_{h_t}(\mu_t,\nu_t)$ is continuous with respect to $t$, and whether one can select an optimal plan in $\Pi(\mu_t,\nu_t)$ that is continuous with respect to the parameter. In addition, the set of all transport plans $\Pi(\mu_t,\nu_t)$ also depends on the parameter, so that one can ask about its continuity in the case where the space of sets of measures is equipped with the Hausdorff metric generated by some metric on the space of measures. Our first main result gives a simple estimate for the Hausdorff distance between sets of couplings in terms of the Kantorovich distances between their marginals. A qualitative result of this kind was established in [9], and another approach was subsequently developed in [40]; in the recent paper [26] a shorter justification of the continuity of the correspondence $(\mu, \nu)\mapsto \Pi(\mu,\nu)$ by means of the Prohorov metric was proposed (see [26], Theorem 1). We show that the Kantorovich distance enables one to find a simpler and more informative estimate.

Kantorovich problems depending on a parameter were investigated in several papers; see [13]–[15], [18], [23], [30], [32], [43] and [46], where the questions of measurability were addressed. The continuity properties are also of great interest and fundamental importance; in particular, they can be useful in the study of differential equations and inclusions on spaces of measures (see [21]), in the regularization of optimal transportation (see [22] and [31]), in constructing approximations by discrete transportation problems (see [6]; a result of this kind is presented below), and in other applications (see, for instance, [1], [10], [41] and [45]). As already mentioned, the continuous dependence on marginals was considered in [9], [40] and [26]. Similar problems arise for nonlinear cost functionals, so-called weak transport costs considered in [27], [2], [7] and [8].

The main results of this paper are as follows.

1. The Hausdorff distance between two sets of probability measures $\Pi(\mu_1,\nu_1)$ and $\Pi(\mu_2,\nu_2)$ is estimated in terms of the distances between $\mu_1$ and $\mu_2$ and $\nu_1$ and $\nu_2$ (see Theorem 1). An analogous result holds for the $p$-Kantorovich metric $W_p$. This estimate is applied to the construction of discrete approximations of transportation problems. Actually, a more general estimate is established for the Kantorovich and Hausdorff pseudometrics in the case of general completely regular spaces.

2. The cost of optimal transportation is continuous with respect to the parameter in the case of the continuous dependence of the cost function and marginal distributions on this parameter (see Theorem 2 and Theorem 5 for the case of a nonlinear cost functional).

3. The optimal plan is continuous with respect to the parameter in the case of uniqueness (see Corollary 2).

4. There exist approximate optimal plans which are continuous with respect to the parameter (see Theorem 3).

5. Examples are constructed in which there is no continuous selection of optimal plans on $[0,1]^2$, even though the cost function is continuous with respect to the parameter and the marginal distributions are equal to Lebesgue measure.

Finally, we prove a general result on the convergence of optimal Monge mappings in the spirit of known results in [5] and [24]: according to Corollary 4, the optimal Monge mappings taking $\mu_n$ to $\nu_n$ converge in the measure $\mu_0$, provided that ${\mu_n\to\mu_0}$ in variation, $\nu_n\to\nu_0$ weakly, the cost functions $h_n$ converge to a cost function $h_0$ uniformly on compact sets, and the corresponding optimal Monge mappings and optimal Kantorovich plans are unique. Some of these results were announced in our note [19].

§ 2. Notation and terminology

We recall that a nonnegative Radon measure on a topological space $X$ is a bounded nonnegative Borel measure $\mu$ such that for every Borel set $B$ and every $\varepsilon>0$ there is a compact set $K\subset B$ such that $\mu(B\setminus K)<\varepsilon$ (see [11]). If $X$ is a complete separable metric space, then all Borel measures are Radon. A bounded signed Borel measure $\mu$ is called Radon if its total variation $|\mu|$ is Radon.

The space $\mathcal{M}_r(X)$ of signed bounded Radon measures on $X$ can be equipped with the weak topology generated by the seminorms

$$ \begin{equation*} \mu\mapsto \biggl|\int f\, d\mu\biggr|, \end{equation*} \notag $$
where $f$ is a bounded continuous function. Its subset $\mathcal{P}_r(X)$ of Radon probability measures is equipped with the induced topology.

Given two measures $\mu\in \mathcal{P}_r(X)$ and $\nu\in \mathcal{P}_r(Y)$, the set $\Pi(\mu,\nu)$ consists of all measures $\sigma\in \mathcal{P}_r(X\times Y)$ with projections $\mu$ and $\nu$ onto the factors.

If $h\geqslant 0$ is a Borel function on $X\times Y$ (called a cost function), then for measures $\mu\in \mathcal{P}_r(X)$ and $\nu\in \mathcal{P}_r(Y)$ the Kantorovich cost is defined by

$$ \begin{equation*} K_h(\mu,\nu)=\inf_{\sigma\in \Pi(\mu,\nu)} \int_{X\times Y} h\, d\sigma, \end{equation*} \notag $$
where also the value $+\infty$ is possible.

A set $\mathcal{M}$ of nonnegative Radon measures on a space $X$ is called uniformly tight if for every $\varepsilon>0$ there exists a compact set $K\subset X$ such that $\mu(X\setminus K)<\varepsilon$ for all ${\mu\in\mathcal{M}}$.

Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces. Then the space $X\times Y$ is equipped with the metric

$$ \begin{equation*} d((x_1,y_1),(x_2,y_2))= d_X(x_1,x_2)+d_Y(y_1,y_2). \end{equation*} \notag $$
The weak topology on the spaces of Radon probability measures $\mathcal{P}_r(X)$, $\mathcal{P}_r(Y)$ and $\mathcal{P}_r(X\times Y)$ is metrizable by the corresponding Kantorovich–Rubinshtein metrics $d_{\mathrm{KR}}$ (also called Fortet–Mourier metrics; see [12]) defined by
$$ \begin{equation*} d_{\mathrm{KR}}(\mu,\nu)=\sup \biggl\{\int f\, d(\mu - \nu)\colon f\in \mathrm{Lip}_1,\ |f|\leqslant 1\biggr\}, \end{equation*} \notag $$
where $\mathrm{Lip}_1$ is the set of $1$-Lipschitz functions on $X$. If $X$ is complete, then $(\mathcal{P}_r(X), d_{\mathrm{KR}})$ is also complete, and if $X$ is Polish, then $\mathcal{P}_r(X)$ is too.

The subsets $\mathcal{P}_r^1(X)$, $\mathcal{P}_r^1(Y)$ and $\mathcal{P}_r^1(X\times Y)$, consisting of measures with respect to which all functions of the form $x\mapsto d(x,x_0)$ are integrable, are equipped with the Kantorovich metric

$$ \begin{equation*} d_{\mathrm K}(\mu,\nu)=\sup \biggl\{\int f\, d(\mu - \nu)\colon f\in \mathrm{Lip}_1 \biggr\}. \end{equation*} \notag $$
This metric is also called the ‘Wasserstein distance’, but we do not use this historically incorrect terminology.

Since we consider probability measures, in the formula for $d_{\mathrm{K}}$ the supremum can be taken over the functions $f$ satisfying the additional condition $f(x_0)=0$ for a fixed point $x_0$. Hence for a space contained in a ball of radius $1$ the equality $d_{\mathrm{K}}=d_{\mathrm{KR}}$ holds.

Note that an unbounded metric space $(X,d)$ can be equipped with the bounded metric $\widetilde{d}=\min(d,1)$ generating the original topology. For the new metric we have $\widetilde{d}_{\mathrm{K}}=\widetilde{d}_{\mathrm{KR}}$. Moreover,

$$ \begin{equation} 2^{-1}\widetilde{d}_{\mathrm K} \leqslant d_{\mathrm{KR}} \leqslant 2\widetilde{d}_{\mathrm K}. \end{equation} \tag{2.1} $$
Indeed, if $|f|\leqslant 1$ and $f$ is $1$-Lipschitz in the original metric, then $f$ is $2$-Lipschitz with respect to the new metric. On the other hand, every function $f$ that is $1$-Lipschitz in the new metric and vanishes at a point $x_0$ satisfies the bound ${|f|\leqslant 1}$ and is $2$-Lipschitz in the original metric, since $\widetilde{d}(x,y)= d(x,y)$ whenever $d(x,y)\leqslant 1$, and when $d(x,y)>1$, the required inequality follows from the estimate $|f|\leqslant1$.

It is worth noting that if $X=Y$ and we take the distance as a cost function, then the equality $K_d(\mu,\nu)=d_{\mathrm{K}}(\mu,\nu)$ holds on $\mathcal{P}_r^1(X)$, which is called the Kantorovich duality formula.
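For measures with finite support both sides of this duality formula are values of finite-dimensional linear programs, so the equality can be checked directly. A minimal sketch (Python with SciPy; the points and weights are hypothetical) evaluates $K_d$ through the primal problem and $d_{\mathrm K}$ through the dual one, and also evaluates $d_{\mathrm{KR}}$ by adding the constraint $|f|\leqslant 1$.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical finite metric space: points on the line with d(x, y) = |x - y|.
pts = np.array([0.0, 1.0, 2.5, 4.0])
d = np.abs(pts[:, None] - pts[None, :])
mu = np.array([0.4, 0.1, 0.3, 0.2])
nu = np.array([0.1, 0.4, 0.2, 0.3])
N = len(pts)

# Primal problem: K_d(mu, nu) = min over couplings of the integral of d.
A_eq = np.zeros((2 * N, N * N))
for i in range(N):
    A_eq[i, i * N:(i + 1) * N] = 1.0     # row sums give mu
    A_eq[N + i, i::N] = 1.0              # column sums give nu
primal = linprog(d.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu, nu]),
                 bounds=(0, None))

# Dual problem: d_K(mu, nu) = sup of the integral of f d(mu - nu) over 1-Lipschitz f.
# Maximizing (mu - nu) @ f means minimizing (nu - mu) @ f subject to
# f_i - f_j <= d_ij for all i != j.
rows, rhs = [], []
for i in range(N):
    for j in range(N):
        if i == j:
            continue
        r = np.zeros(N)
        r[i], r[j] = 1.0, -1.0
        rows.append(r)
        rhs.append(d[i, j])
A_ub, b_ub = np.array(rows), np.array(rhs)
dual = linprog(nu - mu, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))

# Adding |f| <= 1 turns the dual value into the Kantorovich-Rubinshtein metric.
dual_kr = linprog(nu - mu, A_ub=A_ub, b_ub=b_ub, bounds=(-1.0, 1.0))

print("K_d  =", primal.fun)              # primal value
print("d_K  =", -dual.fun)               # equals K_d by the duality formula
print("d_KR =", -dual_kr.fun)
```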

Similarly, for any $p\in [1,+\infty)$ the subspace $\mathcal{P}_r^p(X)$ in $\mathcal{P}_r(X)$ consisting of measures with respect to which the function $x\mapsto d(x,x_0)^p$ is integrable for some (and therefore each) fixed point $x_0$ can be equipped with the $p$-Kantorovich metric

$$ \begin{equation*} W_p(\mu,\nu)=K_{d^p}(\mu, \nu)^{1/p}. \end{equation*} \notag $$
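For two empirical measures on the real line with the same number of equally weighted atoms, the optimal coupling for the cost $d^p$ is the monotone matching of order statistics, which gives a quick way to evaluate $W_p$; a short sketch with hypothetical samples:

```python
import numpy as np

def w_p_line(x, y, p=2.0):
    """W_p between the empirical measures (1/n) sum_i delta_{x_i} and
    (1/n) sum_i delta_{y_i} on the real line: sort and match order statistics."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))

# Hypothetical samples of equal size.
x = np.array([0.1, 0.4, 0.9, 1.3])
y = np.array([0.0, 0.5, 1.0, 2.0])
print(w_p_line(x, y, p=1.0), w_p_line(x, y, p=2.0))
```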

Recall that the Hausdorff distance between bounded closed subsets $A$ and $B$ of a metric space $(M,d)$ is defined by the formula

$$ \begin{equation*} H(A,B)=\max \Bigl\{\sup_{x\in A} d(x,B), \,\sup_{y\in B} d(y,A)\Bigr\}. \end{equation*} \notag $$
This distance will be considered for subsets of the space of probability measures $\mathcal{P}_r(X\times Y)$ with the Kantorovich–Rubinshtein metric $d_{\mathrm{KR}}$ (generated by the metric on $X\times Y$ introduced above) or its subspace $\mathcal{P}_r^1(X\times Y)$ with the Kantorovich metric $d_{\mathrm{K}}$, which gives the corresponding Hausdorff distances $H_{\mathrm{KR}}$ and $H_{\mathrm{K}}$. Due to estimates (2.1) we can deal with the latter case.

When $\mathcal{P}_r^p(X\times Y)$ is equipped with the metric $W_p$, we obtain the corresponding Hausdorff distance $H_p$ on the space of closed subsets of $\mathcal{P}_r^p(X\times Y)$.
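For finite sets the Hausdorff distance is computed directly from the formula above; a minimal sketch with hypothetical point sets in the plane:

```python
import numpy as np

def hausdorff(A, B, dist):
    """H(A, B) = max(sup_{a in A} d(a, B), sup_{b in B} d(b, A)) for finite sets."""
    d_ab = max(min(dist(a, b) for b in B) for a in A)
    d_ba = max(min(dist(a, b) for a in A) for b in B)
    return max(d_ab, d_ba)

# Hypothetical finite subsets of the plane with the Euclidean metric.
A = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
B = [np.array([0.0, 0.5]), np.array([2.0, 0.0])]
print(hausdorff(A, B, lambda u, v: float(np.linalg.norm(u - v))))
```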

Similar constructions are introduced in a more general case of completely regular spaces $X$ and $Y$. The topologies of such spaces can be defined by families of pseudometrics $\Psi_X$ and $\Psi_Y$ (recall that a pseudometric $d$ differs from a metric by the property that the equality $d(x,y)=0$ is allowed for $x\ne y$). Then the topology of the product $X\times Y$ is generated by the pseudometrics

$$ \begin{equation*} ((x_1,y_1), (x_2,y_2))\mapsto d_1(x_1,x_2)+d_2(y_1,y_2), \qquad d_1\in \Psi_X, \quad d_2\in \Psi_Y. \end{equation*} \notag $$
For a pseudometric $d\in \Psi_X$, in the way indicated above one defines the Kantorovich and Kantorovich–Rubinshtein pseudometrics $d_{\mathrm{K},d}$ and $d_{\mathrm{KR},d}$ on the spaces $\mathcal{P}_r^\Psi(X)$ and $\mathcal{P}_r(X)$, where the former consists of the measures with respect to which the functions $x\mapsto d(x,x_0)$ are integrable for all pseudometrics in $\Psi_X$. It is readily verified that the weak topology on $\mathcal{P}_r(X)$ is generated by the family of such pseudometrics $d_{\mathrm{KR},d}$ (see [1], where the Kantorovich and Kantorovich–Rubinshtein topologies are studied on spaces of measures on completely regular spaces).

The Kantorovich duality $K_d(\mu,\nu)=d_{\mathrm{K}}(\mu,\nu)$ also holds for pseudometrics (see [37] or [17], Theorem 1.3.1).

The Monge problem for the same triple $(\mu, \nu, h)$ consists in finding a Borel mapping $T\colon X\to Y$ taking $\mu$ to $\nu$ (that is, satisfying the equality $\nu=\mu\circ T^{-1}$, where $(\mu\circ T^{-1})(B)=\mu(T^{-1}(B))$ for all Borel sets $B \subset Y$) for which the integral

$$ \begin{equation*} \int h(x, T(x))\, \mu(dx) \end{equation*} \notag $$
is smallest. Again, in general, only the infimum (possibly, infinite) $M_h(\mu, \nu)$ of this integral is defined, but in many interesting cases there exist optimal Monge mappings. In any case, $K_h(\mu,\nu)\leqslant M_h(\mu, \nu)$, but for non-atomic separable Radon measures and continuous cost functions one has $K_h(\mu,\nu)= M_h(\mu, \nu)$; see [16] and [35]. It follows from this equality that if there is a solution $T$ to the Monge problem, then the image of $\mu$ under the mapping $x\mapsto (x,T(x))$ is an optimal Kantorovich plan.

We need below the so-called ‘gluing lemma’; see, for instance, [17], Lemma 1.1.6, or [12], Lemma 3.3.1 (in the latter lemma the formulation involves metric spaces, but the proof is actually given for Radon measures on completely regular spaces). Let $X_1$, $X_2$ and $X_3$ be completely regular spaces and let $\mu_{1,2}$ and $\mu_{2,3}$ be Radon probability measures on $X_1\times X_2$ and $X_2\times X_3$, respectively, such that their projections on $X_2$ coincide. Then there exists a Radon probability measure $\mu$ on $X_1\times X_2\times X_3$ such that its projection onto $X_1\times X_2$ is $\mu_{1,2}$ and its projection onto $X_2\times X_3$ is $\mu_{2,3}$.
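In the discrete case the gluing lemma is realized by the usual conditional product construction; the following sketch (hypothetical matrices) builds $\mu$ from $\mu_{1,2}$ and $\mu_{2,3}$ with a common marginal on $X_2$ and checks both projections.

```python
import numpy as np

def glue(mu12, mu23):
    """Given mu12 on X1 x X2 and mu23 on X2 x X3 with equal marginals on X2,
    return mu on X1 x X2 x X3 projecting onto mu12 and mu23."""
    m2 = mu12.sum(axis=0)                       # common marginal on X2
    assert np.allclose(m2, mu23.sum(axis=1))
    mu = np.zeros(mu12.shape + (mu23.shape[1],))
    for k, mass in enumerate(m2):
        if mass > 0:
            # X1 and X3 are made conditionally independent given X2 = x_k
            mu[:, k, :] = np.outer(mu12[:, k], mu23[k, :]) / mass
    return mu

# Hypothetical data: spaces X1, X2, X3 of sizes 2, 2, 3.
mu12 = np.array([[0.2, 0.3],
                 [0.3, 0.2]])
mu23 = np.array([[0.1, 0.2, 0.2],
                 [0.3, 0.1, 0.1]])
mu = glue(mu12, mu23)
print(np.allclose(mu.sum(axis=2), mu12), np.allclose(mu.sum(axis=0), mu23))
```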

§ 3. Main results

We start with the following general estimate for measures on completely regular spaces. For functions $\alpha_1,\dots,\alpha_n$ on spaces $X_1^2,\dots, X_n^2$ (where $X_i^2=X_i\times X_i$) we set

$$ \begin{equation*} (\alpha_1\oplus \dots \oplus \alpha_n)((x_1, \dots, x_n), (x_1', \dots, x_n')) =\alpha_1(x_1,x_1')+\dots+\alpha_n(x_n,x_n'), \end{equation*} \notag $$
where $x_1, x'_1 \in X_1$, …, $x_n, x'_n \in X_n$.

Theorem 1. Let $\mu_1, \mu_2\in \mathcal{P}_r(X)$ and $\nu\in \mathcal{P}_r(Y)$, and let $\alpha$ and $\beta$ be continuous nonnegative functions on $X^2$ and $Y^2$, respectively, where $\beta(y,y)=0$. Then for every measure $\sigma_1\in \Pi(\mu_1,\nu)$ there exists a measure $\sigma_2\in \Pi(\mu_2,\nu)$ such that

$$ \begin{equation} K_{\alpha\oplus \beta}(\sigma_1,\sigma_2)\leqslant K_{\alpha}(\mu_1,\mu_2). \end{equation} \tag{3.1} $$
If $\nu_1, \nu_2\in \mathcal{P}_r(Y)$ and $\alpha$ and $\beta$ are pseudometrics, then for every measure $\sigma_1\in \Pi(\mu_1,\nu_1)$ there exists a measure $\sigma_2\in \Pi(\mu_2,\nu_2)$ such that
$$ \begin{equation} d_{\mathrm K, \alpha\oplus \beta}(\sigma_1,\sigma_2)\leqslant d_{\mathrm K, \alpha}(\mu_1,\mu_2)+d_{\mathrm K, \beta}(\nu_1,\nu_2). \end{equation} \tag{3.2} $$
Hence, for the corresponding Kantorovich and Hausdorff pseudometrics we have
$$ \begin{equation} H_{\mathrm K, \alpha\oplus \beta}(\Pi(\mu_1,\nu_1), \Pi(\mu_2,\nu_2))\leqslant d_{\mathrm K, \alpha}(\mu_1,\mu_2)+d_{\mathrm K, \beta}(\nu_1,\nu_2). \end{equation} \tag{3.3} $$
A similar assertion is true for $n$ marginals: if $\mu_i, \nu_i\in\mathcal{P}_r(X_i)$, $i=1,\dots,n$, $\alpha_i$ are continuous pseudometrics on the $X_i$ and $\sigma \in \Pi(\mu_1, \dots, \mu_n)$, then there exists a measure $\pi \in \Pi(\nu_1, \dots, \nu_n)$ such that
$$ \begin{equation*} d_{\mathrm K, \alpha_1\oplus \dots\oplus \alpha_n}(\pi, \sigma) \leqslant d_{\mathrm K, \alpha_1}(\mu_1,\nu_1)+ \dots + d_{\mathrm K, \alpha_n}(\mu_n,\nu_n). \end{equation*} \notag $$

Proof. Let us take a measure $\eta \in \Pi(\mu_1, \mu_2)$ such that
$$ \begin{equation*} \int_{X\times X} \alpha(x_1, x_2) \, \eta(dx_1\, dx_2) = K_{\alpha}(\mu_1,\mu_2). \end{equation*} \notag $$
According to the gluing lemma mentioned above and applied to $X_1=X_2=X$, there exists a measure $\lambda\in \mathcal{P}_r(X_1\times X_2\times Y)$ such that its projection onto $X_1\times Y$ is $\sigma_1$ and its projection onto $X_1\times X_2$ is $\eta$. For $\sigma_2$ we take the projection of the measure $\lambda$ onto $X_2\times Y$. The projections of the measure $\sigma_2$ onto $X$ and $Y$ equal $\mu_2$ and $\nu$, respectively. Indeed, the projection onto $X$ is obtained by projecting the measure $\lambda$ first onto $X_2\times Y$, and then onto $X_2$, which coincides with the composition of the operators of projection onto $X_1\times X_2$ and $X_2$, but this composition takes $\lambda$ to $\mu_2$, since $\lambda$ is first mapped to $\eta$ and then to $\mu_2$. The projection of the measure $\sigma_2$ onto $Y$ is obtained by projecting first onto $X_1\times Y$ and then onto $Y$, that is, it equals the projection of the measure $\sigma_1$ onto $Y$.

For the proof of the required estimate we consider the measure $\zeta$ equal to the image of $\lambda$ under the mapping

$$ \begin{equation*} X_1\times X_2\times Y\to X_1\times Y_1\times X_2\times Y_2, \quad\text{where } Y_1=Y_2=Y, \quad (x_1, x_2,y)\mapsto (x_1, y, x_2, y). \end{equation*} \notag $$
Then $\zeta\in \Pi(\sigma_1,\sigma_2)$, since the projection of the measure $\lambda$ onto $X_1\times Y$ is $\sigma_1$ and the projection onto $X_2\times Y$ is $\sigma_2$. Moreover,
$$ \begin{equation*} \begin{aligned} \, K_{\alpha\oplus \beta}(\sigma_1,\sigma_2) &\leqslant\int_{X_1\times Y_1\times X_2\times Y_2}\alpha\oplus \beta\, d\zeta \\ &=\int_{X_1\times X_2\times Y} (\alpha\oplus \beta)((x_1,y), (x_2,y))\, \lambda(dx_1\, dx_2\, dy) \\ &=\int_{X_1\times X_2\times Y} \alpha(x_1,x_2)\, \lambda(dx_1\, dx_2\, dy) \\ &=\int_{X_1\times X_2} \alpha(x_1,x_2)\, \eta(dx_1\, dx_2)= K_{\alpha}(\mu_1,\mu_2). \end{aligned} \end{equation*} \notag $$
In the case of pseudometrics, first we pick a measure $\sigma_2\in \Pi(\mu_2,\nu_1)$ satisfying the bound
$$ \begin{equation*} K_{\alpha\oplus \beta}(\sigma_1,\sigma_2)\leqslant d_{\mathrm K, \alpha}(\mu_1,\mu_2), \end{equation*} \notag $$
and next we pick a measure $\sigma_3\in \Pi(\mu_2,\nu_2)$ satisfying
$$ \begin{equation*} K_{\alpha\oplus \beta}(\sigma_2,\sigma_3)\leqslant d_{\mathrm K, \beta}(\nu_1,\nu_2). \end{equation*} \notag $$
It remains to use the triangle inequality
$$ \begin{equation*} K_{\alpha\oplus \beta}(\sigma_1,\sigma_3)\leqslant K_{\alpha\oplus \beta}(\sigma_1,\sigma_2)+K_{\alpha\oplus \beta}(\sigma_2,\sigma_3). \end{equation*} \notag $$

Let us proceed to the case of $n$ marginals. Here it suffices to prove that there exists a measure $\pi \in \Pi(\nu_1, \mu_2, \dots, \mu_n)$ such that

$$ \begin{equation*} d_{\mathrm K, \alpha_1 \oplus \dots \oplus \alpha_n}(\pi,\sigma) \leqslant d_{\mathrm K, \alpha_1}(\mu_1,\nu_1). \end{equation*} \notag $$
Let $\sigma_1$ be the projection of $\sigma$ onto $X_2 \times \dots \times X_n$. Then $\sigma \in \Pi(\mu_1, \sigma_1)$. As shown above, there exists a measure $\pi \in \Pi(\nu_1, \sigma_1)$ for which
$$ \begin{equation*} d_{\mathrm K, \alpha_1 \oplus \dots \oplus \alpha_n}(\pi,\sigma) \leqslant d_{\mathrm K, \alpha_1}(\mu_1,\nu_1). \end{equation*} \notag $$
In addition, the inclusion $\pi \in \Pi(\nu_1, \mu_2, \dots, \mu_n)$ holds, which completes the proof.

Theorem 1 is proved.
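In the discrete case the construction used in this proof can be carried out explicitly: an optimal $\eta\in\Pi(\mu_1,\mu_2)$ is glued with $\sigma_1$ along the common marginal $\mu_1$ and projected onto $X_2\times Y$. The following sketch (Python with SciPy; the points, weights and the choice of $\sigma_1$ are hypothetical, and `solve_plan` is an ad hoc linear programming helper) performs this construction and checks that the resulting $\sigma_2$ belongs to $\Pi(\mu_2,\nu)$.

```python
import numpy as np
from scipy.optimize import linprog

def solve_plan(cost, mu, nu):
    """Optimal coupling of discrete mu and nu for a given cost matrix (linear program)."""
    m, n = cost.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_eq[m + j, j::n] = 1.0
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu, nu]),
                  bounds=(0, None))
    return res.x.reshape(m, n), res.fun

# Hypothetical points of X and Y on the line; alpha(x, x') = |x - x'|.
x = np.array([0.0, 1.0, 2.0])
alpha = np.abs(x[:, None] - x[None, :])
mu1 = np.array([0.5, 0.3, 0.2])
mu2 = np.array([0.2, 0.2, 0.6])
nu = np.array([0.6, 0.4])

# Any coupling sigma1 in Pi(mu1, nu); here the product coupling is taken.
sigma1 = np.outer(mu1, nu)
# An optimal eta in Pi(mu1, mu2) for alpha, so its cost equals K_alpha(mu1, mu2).
eta, K_alpha = solve_plan(alpha, mu1, mu2)

# Glue eta and sigma1 along mu1 and project onto X_2 x Y:
# sigma2(x2, y) = sum_x eta(x, x2) * sigma1(x, y) / mu1(x).
sigma2 = np.zeros_like(sigma1)
for i, mass in enumerate(mu1):
    if mass > 0:
        sigma2 += np.outer(eta[i, :], sigma1[i, :]) / mass

print(np.allclose(sigma2.sum(axis=1), mu2), np.allclose(sigma2.sum(axis=0), nu))
# The coupling of sigma1 and sigma2 arising from the gluing moves mass only in the
# X-coordinate, so K_{alpha (+) beta}(sigma1, sigma2) <= K_alpha(mu1, mu2).
print("K_alpha(mu1, mu2) =", K_alpha)
```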

Remark 1. If in place of the continuity of $\alpha$ and $\beta$ we assume only their Borel measurability (or universal measurability), then the reasoning used above shows that for every $\varepsilon>0$ there is a measure $\sigma_2^\varepsilon$ for which

$$ \begin{equation*} K_{\alpha\oplus \beta}(\sigma_1,\sigma_2^\varepsilon)\leqslant K_{\alpha}(\mu_1,\mu_2)+\varepsilon. \end{equation*} \notag $$
To show this we take a measure $\eta^\varepsilon \in \Pi(\mu_1, \mu_2)$ such that
$$ \begin{equation*} \int_{X\times X} \alpha(x_1, x_2) \, \eta^\varepsilon(dx_1\, dx_2) \leqslant K_{\alpha}(\mu_1,\mu_2)+\varepsilon. \end{equation*} \notag $$
Then the last equality in the proof involving the quantity $K_{\alpha}(\mu_1,\mu_2)$ is replaced by the inequality containing $K_{\alpha}(\mu_1,\mu_2)+\varepsilon$ on the right-hand side.

A straightforward modification of the proof yields the following assertion for the $p$-Kantorovich metric $W_p$.

Corollary 1. Let $X$ be a metric space, $p\in [1,+\infty)$ and $\mu_1, \mu_2, \nu_1, \nu_2\in \mathcal{P}_r^p(X)$. Then for every measure $\sigma_1\in \Pi(\mu_1,\nu_1)$ there exists a measure $\sigma_2\in \Pi(\mu_2,\nu_2)$ such that

$$ \begin{equation*} W_p(\sigma_1,\sigma_2)\leqslant W_p(\mu_1,\mu_2)+ W_p(\nu_1,\nu_2). \end{equation*} \notag $$
Hence the Hausdorff distance $H_p$ satisfies
$$ \begin{equation*} H_{p}(\Pi(\mu_1,\nu_1), \Pi(\mu_2,\nu_2))\leqslant W_p(\mu_1,\mu_2)+ W_p(\nu_1,\nu_2). \end{equation*} \notag $$

Proof. Repeating the reasoning presented above for $h=d^p$, first we find a measure $\sigma_0\in \Pi(\mu_2,\nu_1)$ such that $W_p(\sigma_1,\sigma_0)\leqslant W_p(\mu_1,\mu_2)$, and then we take a measure $\sigma_2\in \Pi(\mu_2,\nu_2)$ such that $W_p(\sigma_0,\sigma_2)\leqslant W_p(\nu_1,\nu_2)$.

We recall that a completely regular space is called sequentially Prohorov if every sequence of Radon probability measures on this space converging weakly to a Radon measure is uniformly tight. For example, complete metric spaces are sequentially Prohorov (see [11] or [12] about this property).

Theorem 2. Let $X$ and $Y$ be completely regular spaces. Suppose that the sequence of measures $\mu_n\in \mathcal{P}_r(X)$ converges weakly to a measure $\mu\in \mathcal{P}_r(X)$, the sequence of measures $\nu_n\in \mathcal{P}_r(Y)$ converges weakly to a measure $\nu \in \mathcal{P}_r(Y)$, both sequences are uniformly tight (which holds automatically if both spaces are sequentially Prohorov), and the continuous functions $h_n\colon X\times Y\to [0,+\infty)$ converge to a function $h\colon X\times Y\to [0,+\infty)$ uniformly on compact sets. Suppose also that there are nonnegative Borel functions $a_n\in L^1(\mu_n)$ and $b_n\in L^1(\nu_n)$ such that

$$ \begin{equation} h_n(x,y)\leqslant a_n(x)+b_n(y)\quad\textit{and} \quad \lim_{R\to+\infty} \sup_n \biggl(\int_{\{a_n\geqslant R\}} a_n\, d\mu_n+ \int_{\{b_n\geqslant R\}} b_n\, d\nu_n\biggr)=0. \end{equation} \tag{3.4} $$
Then
$$ \begin{equation*} K_h(\mu,\nu)=\lim_{n\to\infty} K_{h_n}(\mu_n,\nu_n). \end{equation*} \notag $$
In particular, this is true if $\sup_n (\|a_n\|_{L^p(\mu_n)}+\|b_n\|_{L^p(\nu_n)})<\infty$ for some $p>1$.

Proof. First we consider the case where $h_n\equiv h\leqslant 1$. Since $h$ is continuous on compact sets (it need not be continuous on the whole space), there are optimal measures $\sigma_n\in \Pi(\mu_n,\nu_n)$ for the function $h$ (see the comments after Theorem 1.2.1 in [17]). By our assumption the sequences of measures $\mu_n$ and $\nu_n$ are uniformly tight, which implies the uniform tightness of the sequence of measures $\sigma_n$. Let us pass to a subsequence of indices $n_i$ for which the numbers $K_h(\mu_{n_i},\nu_{n_i})$ tend to $\liminf_{n\to\infty} K_h(\mu_n,\nu_n)$. The sequence of measures $\sigma_{n_i}$ has a cluster point $\sigma$ in the weak topology; moreover, $\sigma\in \Pi(\mu,\nu)$. The integral of $h$ with respect to $\sigma$ is a limit point of the integrals against the measures $\sigma_{n_i}$, that is, of the numbers $K_h(\mu_{n_i},\nu_{n_i})$. Hence the inequality $K_h(\mu,\nu)\leqslant \liminf_{n\to\infty} K_h(\mu_n,\nu_n)$ holds.

We show that $K_h(\mu,\nu)\geqslant \limsup_{n\to\infty} K_h(\mu_n,\nu_n)$. Let $\varepsilon>0$. Passing to a subsequence we can assume that

$$ \begin{equation*} K_h(\mu_n,\nu_n)\to \limsup_{n\to\infty} K_h(\mu_n,\nu_n). \end{equation*} \notag $$
Consider compact sets $S_1\subset X$ and $S_2\subset Y$ such that
$$ \begin{equation*} \mu_n(X\setminus S_1)+\nu_n(Y\setminus S_2) <\varepsilon \quad \forall\, n. \end{equation*} \notag $$
The set $S=S_1\times S_2$ is compact in $X\times Y$ and
$$ \begin{equation} \zeta((X\times Y)\setminus S)<\varepsilon \quad \forall\, \zeta\in \Pi(\mu_n, \nu_n), \quad \forall\, n. \end{equation} \tag{3.5} $$
We can take bounded continuous pseudometrics $\alpha$ and $\beta$ on the spaces $X$ and $Y$ and a function $g\colon X\times Y\to [0,1]$ such that
$$ \begin{equation*} |g(x,y)-g(x_1,y_1)|\leqslant \alpha(x,x_1)+\beta(y,y_1) \quad \forall\, x,x_1\in X, \quad y,y_1\in Y, \end{equation*} \notag $$
and
$$ \begin{equation*} g(x,y)=h(x,y) \quad \forall\, (x,y)\in S. \end{equation*} \notag $$
There are several known ways to do this. One is as follows. Let us embed our spaces $X$ and $Y$ homeomorphically into locally convex spaces $X_1$ and $Y_1$ (or just equip them with uniformities generating the original topologies). Then the function $h$ is uniformly continuous on $S$. It has a uniformly continuous extension $g\colon X_1\times Y_1\to [0,1]$ (this is true for any subsets (see [29]), but for a compact subset it suffices to take a continuous extension to the Stone–Čech compactification of $X_1\times Y_1$). The function
$$ \begin{equation*} \alpha(x_1,x_2)=\sup_y |g(x_1,y)-g(x_2,y)| \end{equation*} \notag $$
is a uniformly continuous pseudometric on $X$, and the function
$$ \begin{equation*} \beta(y_1,y_2)=\sup_x |g(x,y_1)-g(x,y_2)| \end{equation*} \notag $$
is a uniformly continuous pseudometric on $Y$. In addition,
$$ \begin{equation*} |g(x,y)-g(x_1,y_1)|\leqslant |g(x,y)-g(x_1,y)|+|g(x_1,y)-g(x_1,y_1)|, \end{equation*} \notag $$
so $|g(x,y)-g(x_1,y_1)|\leqslant \alpha(x,x_1)+\beta(y,y_1)$.

Let us consider the Kantorovich pseudometrics $d_{\mathrm{K},\alpha}$ and $d_{\mathrm{K},\beta}$ on $X$ and $Y$ generated by these pseudometrics $\alpha$ and $\beta$. For all sufficiently large $n$ we have

$$ \begin{equation*} d_{\mathrm K,\alpha}(\mu_n,\mu)+d_{\mathrm K,\beta}(\nu_n,\nu)< \varepsilon. \end{equation*} \notag $$
By Theorem 1 there exist measures $\zeta_n\in \Pi(\mu_n,\nu_n)$ such that
$$ \begin{equation*} d_{\mathrm K, \alpha\oplus\beta}(\sigma,\zeta_n)< \varepsilon. \end{equation*} \notag $$
Therefore, by (3.5)
$$ \begin{equation*} \int g\, d\sigma \geqslant \int g\, d\zeta_n-\varepsilon \geqslant \int h\, d\zeta_n - 2\varepsilon \geqslant K_h(\mu_n,\nu_n)-2\varepsilon. \end{equation*} \notag $$
Since $\varepsilon$ is arbitrary, we obtain the estimate
$$ \begin{equation*} K_h(\mu,\nu)\geqslant \limsup_{n\to\infty} K_h(\mu_n,\nu_n). \end{equation*} \notag $$

Now consider the case of different $h_n$, but with a common bound $h_n\leqslant 1$. Let ${\varepsilon>0}$. Using the same compact set $S$ as above, we find $N$ such that $|h_n-h|\leqslant \varepsilon$ for all $(x,y)\in S$ and $n\geqslant N$. Then

$$ \begin{equation*} |K_h(\mu_n,\nu_n)- K_{h_n}(\mu_n,\nu_n)|\leqslant 2\varepsilon, \end{equation*} \notag $$
because by (3.5), for every measure $\eta\in \Pi(\mu_n,\nu_n)$ we have the estimate
$$ \begin{equation*} \eta((X\times Y)\setminus S)<\varepsilon, \end{equation*} \notag $$
so the difference between the integrals of $h_n$ and $h$ with respect to the measure $\eta$ does not exceed $2\varepsilon$. This yields our assertion in the case of uniformly bounded $h_n$. The general case reduces easily to this case, because for the functions $\min (h_n,R)$ we have
$$ \begin{equation*} \begin{aligned} \, \int [h_n -\min (h_n,R)] \, d\eta &\leqslant \int h_n I_{\{h_n\geqslant R\}}\, d\eta\leqslant \int [2a_n I_{\{a_n\geqslant R/2\}} +2b_n I_{\{b_n\geqslant R/2\}}]\, d\eta \\ &=2 \int_{\{a_n\geqslant R/2\}} a_n\, d\mu_n + 2\int_{\{b_n\geqslant R/2\}} b_n\, d\nu_n \end{aligned} \end{equation*} \notag $$
for all measures $\eta\in\Pi(\mu_n,\nu_n)$ and similarly for the triple $(h,\mu,\nu)$. Because of this estimate and (3.4) we can pick a sufficiently large $R$ so that the right-hand side of the previous inequality is less than a fixed number $\varepsilon$. Therefore,
$$ \begin{equation*} |K_{h_n}(\mu_n,\nu_n)-K_{\min(h_n,R)}(\mu_n,\nu_n)|\leqslant \varepsilon \quad\text{and}\quad |K_{h}(\mu,\nu)-K_{\min(h,R)}(\mu,\nu)|\leqslant \varepsilon. \end{equation*} \notag $$
It remains to use the established fact that
$$ \begin{equation*} |K_{\min(h_n,R)}(\mu_n,\nu_n)-K_{\min(h,R)}(\mu,\nu)|\leqslant \varepsilon \end{equation*} \notag $$
for all $n$ large enough.

Theorem 2 is proved.

Theorem 2 can be compared with Theorem 5.20 in [43], where the spaces are Polish, the marginals $\mu_n$ and $\nu_n$ converge weakly to $\mu$ and $\nu$, respectively, the cost functions $h_n$ converge uniformly to $h$, and the conclusion is that the sequence of optimal plans $\pi_n$ for $(\mu_n,\nu_n,h_n)$ contains a subsequence weakly converging to an optimal plan for $(\mu,\nu,h)$.

Corollary 2. If, in the situation of Theorem 2, optimal plans $\sigma_n$ and $\sigma$ for the triples $(\mu_n,\nu_n,h_n)$ and $(\mu,\nu,h)$ are unique, then the measures $\sigma_n$ converge weakly to $\sigma$.

Proof. The sequence of measures $\sigma_{n}$ is uniformly tight by the uniform tightness of its marginals. Hence it has a weakly convergent subnet (but not necessarily a subsequence, because the spaces are not assumed to be metrizable). Note that if a subnet $\{\sigma_\alpha\}$ of this sequence converges weakly to a measure $\sigma_0$, then $\sigma_0$ is optimal for $h$. Indeed, we show that
$$ \begin{equation} \int h\, d\sigma_0 \leqslant \lim_{n\to\infty} K_{h_n}(\mu_{n},\nu_{n}). \end{equation} \tag{3.6} $$
Otherwise there are numbers $\varepsilon>0$ and $R>1$ such that
$$ \begin{equation*} \int \min(h,R)\, d\sigma_0 > \lim_{n\to\infty} K_{h_n}(\mu_{n},\nu_{n})+\varepsilon. \end{equation*} \notag $$
By weak convergence
$$ \begin{equation*} \int \min(h,R)\, d\sigma_0 = \lim_{\alpha} \int \min(h,R)\, d\sigma_{\alpha}. \end{equation*} \notag $$
Hence we can assume that
$$ \begin{equation*} \int \min(h,R)\, d\sigma_{\alpha}\geqslant \lim_{n\to\infty} K_{h_n}(\mu_{n},\nu_{n})+\varepsilon. \end{equation*} \notag $$
Then we can find infinitely many indices $n$ such that for the corresponding elements of the original sequence we have
$$ \begin{equation*} \int \min(h,R)\, d\sigma_{n}\geqslant \int h_n\, d\sigma_n +\frac{\varepsilon}2. \end{equation*} \notag $$
However, it is clear that for all $n$ large enough
$$ \begin{equation*} \int \min(h,R)\, d\sigma_{n}\leqslant \int \min(h_n,R)\, d\sigma_n +\frac{\varepsilon}4, \end{equation*} \notag $$
because the measures $\sigma_n$ are uniformly tight and the functions $\min(h_n,R)$ converge to $\min(h,R)$ uniformly on compact sets.

The right-hand side of (3.6) equals $K_{h}(\mu,\nu)$ by Theorem 2. Hence $\sigma_0$ is optimal and then $\sigma_0=\sigma$ by uniqueness. Thus, the sequence $\{\sigma_n\}$ has no limit points different from $\sigma$. Therefore, the measures $\sigma_{n}$ converge weakly to $\sigma$.

The corollary is proved.

For uncountable families of measures $\mu_t$ and $\nu_t$ and functions $a_t\in L^1(\mu_t)$ and $b_t\in L^1(\nu_t)$ an analogue of (3.4) reads

$$ \begin{equation} h_t(x,y)\leqslant a_t(x)+b_t(y)\quad\text{and} \quad \lim_{R\to+\infty} \sup_t \biggl(\int_{\{a_t\geqslant R\}}\!\! a_t\, d\mu_t+ \int_{\{b_t\geqslant R\}}\!\! b_t\, d\nu_t\biggr)=0. \end{equation} \tag{3.7} $$
In particular, this is true if
$$ \begin{equation*} \sup_t \bigl[\|a_t\|_{L^p(\mu_t)}+\|b_t\|_{L^p(\nu_t)}\bigr]<\infty \end{equation*} \notag $$
for some $p>1$. A more general sufficient condition is as follows: there is an unbounded increasing function $V>0$ on $[0,\infty)$ for which the integrals of $a_t V(a_t)$ against $\mu_t$ and the integrals of $b_t V(b_t)$ against $\nu_t$ are uniformly bounded.
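Indeed, both sufficient conditions imply (3.7): if $a_t\geqslant R$, then $V(a_t)\geqslant V(R)$, hence

$$ \begin{equation*} \int_{\{a_t\geqslant R\}} a_t\, d\mu_t \leqslant \frac{1}{V(R)}\int_{\{a_t\geqslant R\}} a_t V(a_t)\, d\mu_t \leqslant \frac{1}{V(R)}\, \sup_t \int a_t V(a_t)\, d\mu_t \to 0 \quad\text{as } R\to+\infty, \end{equation*} \notag $$
and similarly for $b_t$ and $\nu_t$; the case of a uniform bound in $L^p$ corresponds to taking $V(s)=s^{p-1}$.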

Corollary 3. Let $X$ and $Y$ be sequentially Prohorov completely regular spaces, and let $T$ be a topological space. Suppose that the mappings

$$ \begin{equation*} t\mapsto \mu_t\in \mathcal{P}_r(X) \quad\textit{and}\quad t\mapsto \nu_t\in \mathcal{P}_r(Y) \end{equation*} \notag $$
are sequentially continuous and $(t,x,y)\mapsto h_t(x,y)$, $T\times X\times Y\to [0,+\infty)$ is a continuous function. Suppose also that there exist nonnegative Borel functions ${a_t\in L^1(\mu_t)}$ and $b_t\in L^1(\nu_t)$ such that (3.7) holds. Then the function $t\,{\mapsto}\, K_{h_t}(\mu_t,\nu_t)$ is sequentially continuous.

For the proof it suffices to use that $h_{t_n}(x,y)\to h_t(x,y)$ uniformly on compact sets if $t_n\to t$.

Clearly, the Prohorov property can be replaced by the uniform tightness of the families $\{\mu_t\}$ and $\{\nu_t\}$.

Suppose now that $X$, $Y$ and $T$ are metric spaces, $X$ and $Y$ are complete, and for every $t\in T$ we fix measures $\mu_t\in \mathcal{P}_r(X)$ and $\nu_t\in \mathcal{P}_r(Y)$ such that the mappings $t\mapsto \mu_t$ and $t\mapsto \nu_t$ are continuous in the weak topology (which is equivalent to the continuity in the Kantorovich–Rubinshtein metric). Suppose also that there is a continuous nonnegative function $(t,x,y)\mapsto h_t(x,y)$. A question arises as to whether it is possible to select an optimal plan depending continuously on the parameter $t$. It turns out that such a choice is not always possible, as the examples below show. However, the situation improves for approximate optimal plans or in the case of unique optimal plans. Given $\varepsilon>0$, a measure $\sigma\in \Pi(\mu,\nu)$ is called $\varepsilon$-optimal for the cost function $h$ if

$$ \begin{equation*} \int h\, d\sigma \leqslant K_h(\mu,\nu)+\varepsilon. \end{equation*} \notag $$

Theorem 3. Suppose that for every $t$ there exist nonnegative Borel functions ${a_t\in L^1(\mu_t)}$ and $b_t\in L^1(\nu_t)$ such that (3.7) holds. Then one can select $\varepsilon$-optimal measures $\sigma_t^\varepsilon\in \Pi(\mu_t,\nu_t)$ for the cost functions $h_t$ such that they are continuous in $t$ in the weak topology for every fixed $\varepsilon>0$.

If for every $t$ there is a unique optimal plan $\sigma_t$, then it is continuous in $t$.

Proof. Our assumption implies the inclusion $h_t\in L^1(\sigma)$ for all $\sigma\in \Pi(\mu_t,\nu_t)$. The set
$$ \begin{equation*} M_t=\biggl\{\sigma\in \Pi(\mu_t,\nu_t)\colon \int h_t\, d\sigma=K_{h_t}(\mu_t,\nu_t)\biggr\} \end{equation*} \notag $$
is convex and compact in $\mathcal{P}_r(X\times Y)$. Indeed, the convexity is obvious and the compactness follows from the fact that $M_t$ is closed in the compact set $\Pi(\mu_t,\nu_t)$, which is verified in the following way. Suppose that the measures $\pi_n\in M_t$ converge weakly to a measure $\pi$. Then for the convergence of the integrals of the continuous function $h_t$ against the measures $\pi_n$ to the integral against the measure $\pi$ it suffices to verify (see [12], Theorem 2.7.1) that
$$ \begin{equation*} \lim_{R\to\infty} \sup_n \int_{\{h_t\geqslant R\}} h_t\, d\pi_n =0. \end{equation*} \notag $$
Since $\pi_n\in \Pi(\mu_t,\nu_t)$, this equality follows from the estimate
$$ \begin{equation*} \begin{aligned} \, \int_{\{h_t\geqslant R\}} h_t\, d\pi_n &\leqslant \int_{X\times Y} \bigl[2a_tI_{\{a_t\geqslant R/2\}} + 2b_tI_{\{b_t\geqslant R/2\}}\bigr] \, d\pi_n \\ & =2\int_{\{a_t\geqslant R/2\}} a_t\, d\mu_t+ 2\int_{\{b_t\geqslant R/2\}} b_t\, d\nu_t. \end{aligned} \end{equation*} \notag $$
Set
$$ \begin{equation*} \Psi_t = \biggl\{\pi \in \Pi(\mu_t, \nu_t)\colon \int h_t\, d\pi \leqslant K_{h_t}(\mu_t, \nu_t) + \varepsilon \biggr\}. \end{equation*} \notag $$
In order to find continuous selections, we verify the hypotheses of the classical Michael selection theorem as applied to the multivalued mapping $t\mapsto \Psi_t$ with convex compact values in the complete metrizable subset $\mathcal{P}_r(X\times Y)$ of the locally convex space $\mathcal{M}_r(X\times Y)$. In [33] the corresponding result was established for a set $M$ in a metrizable locally convex space (see also [38], p. 41, Theorem 1.5, about this case); it can be applied to the space $\mathcal{M}_r(X\times Y)$ equipped with the Kantorovich–Rubinshtein norm. However, Theorem 1.2 in [34] covers the case where $M$ is contained in a complete locally convex space, which can be applied in our situation to the completion of the space $\mathcal{M}_r(X\times Y)$ with the weak topology. In order to verify the hypotheses of Michael’s theorem, we have to show that for every open set $U$ in $\mathcal{P}_r(X\times Y)$ the set
$$ \begin{equation*} W=\{t\in T\colon \Psi_t\cap U\ne \varnothing \} \end{equation*} \notag $$
is open in $T$. Let $w\in W$. Then there exists a measure $\sigma\in \Psi_{w}\cap U$, that is, $\sigma \in \Pi(\mu_{w}, \nu_{w})$ and
$$ \begin{equation*} \int h_w\, d\sigma \leqslant K_{h_w}(\mu_w, \nu_w) + \varepsilon. \end{equation*} \notag $$
We can assume that this inequality is strict, since otherwise we can take a convex linear combination with an optimal measure $\sigma_0\in M_w$, which for $\alpha\in (0,1)$ will give the measure $(1-\alpha) \sigma + \alpha \sigma_0\in U$ with a smaller integral of the function $h_w$. Thus,
$$ \begin{equation*} \int h_w\, d\sigma \leqslant K_{h_w}(\mu_w, \nu_w) + \varepsilon -\delta, \quad \text{where } \delta>0. \end{equation*} \notag $$
Let us show that there exists a neighbourhood $V$ of $w$ such that for every $v\in V$ there is a measure $\sigma_v \in \Pi(\mu_v, \nu_v)$ for which
$$ \begin{equation*} \int h_v\, d\sigma_v \leqslant K_{h_v}(\mu_v, \nu_v) + \varepsilon . \end{equation*} \notag $$
Indeed, otherwise there exists a sequence of points $w_n$ converging to $w$ such that
$$ \begin{equation} \int h_{w_n}\, d\zeta > K_{h_{w_n}}(\mu_{w_n}, \nu_{w_n}) + \varepsilon \quad \forall\, \zeta\in \Pi(\mu_{w_n}, \nu_{w_n}). \end{equation} \tag{3.8} $$
It follows from (3.7) that there is $R>1$ such that
$$ \begin{equation} \int [h_t -\min(h_t,R)]\, d\zeta \leqslant \frac\delta8 \quad \forall\, t\in T, \quad \forall \, \zeta\in \Pi(\mu_t,\nu_t). \end{equation} \tag{3.9} $$
Next, the weak continuity of $\mu_t$ and $\nu_t$ in $t$ implies that the sequences $\{\mu_{w_n}\}$ and $\{\nu_{w_n}\}$ are uniformly tight (recall that the spaces here are metrizable). This implies the uniform tightness of the union of the sets $\Pi(\mu_{w_n}, \nu_{w_n})$. Hence there is a compact set $S\subset X\times Y$ such that
$$ \begin{equation} (\zeta+\sigma)((X\times Y)\setminus S)< \frac{\delta}{32R} \quad \forall\, n, \quad \forall\, \zeta\in \Pi(\mu_{w_n}, \nu_{w_n}). \end{equation} \tag{3.10} $$
Set
$$ \begin{equation*} g_t=\min(h_t,R). \end{equation*} \notag $$
By the continuity of $h$ on the compact set $(\{w_n\}\cup \{w\})\times S$ there exists a Lipschitz function $(t,x,y)\mapsto L_t(x,y)$ on $T\times X\times Y$ with values in $[0,R]$ for which
$$ \begin{equation} |g_{w_n}(x,y)-L_{w_n}(x,y)|\leqslant \frac{\delta}{32} \quad \forall\, n\geqslant 1, \quad (x,y)\in S. \end{equation} \tag{3.11} $$
Let $L>0$ be the Lipschitz constant of this function. Since the compact set $\Pi(\mu_t, \nu_t)$ depends on $t$ continuously in the Hausdorff metric, for all sufficiently large $n$ we have
$$ \begin{equation*} H_{\mathrm K}(\Pi(\mu_{w_n}, \nu_{w_n}),\Pi(\mu_{w}, \nu_{w}))\leqslant \frac{\delta}{16 L}. \end{equation*} \notag $$
Hence for every such $n$ there is a measure $\zeta_n\in \Pi(\mu_{w_n}, \nu_{w_n})$ satisfying the inequality
$$ \begin{equation*} d_{\mathrm K}(\sigma,\zeta_n)\leqslant \frac{\delta}{16 L}, \end{equation*} \notag $$
which yields the estimate
$$ \begin{equation*} \biggl|\int L_w\, d\sigma - \int L_w\, d\zeta_n\biggr| \leqslant \frac{\delta}{16}. \end{equation*} \notag $$
Since
$$ \begin{equation*} \sup_{(x,y)\in S} |L_w(x,y)-L_{w_n}(x,y)|\to 0, \qquad L_t\leqslant R\quad\text{and} \quad (\zeta_n+\sigma)((X\times Y)\setminus S)< \frac{\delta}{32R}, \end{equation*} \notag $$
for all sufficiently large $n$ we have
$$ \begin{equation*} \biggl|\int L_w\, d\sigma - \int L_{w_n}\, d\zeta_n\biggr| \leqslant \frac{\delta}{8}. \end{equation*} \notag $$
Therefore, from (3.10) and (3.11) we obtain
$$ \begin{equation*} \biggl|\int g_w\, d\sigma - \int g_{w_n}\, d\zeta_n\biggr| \leqslant \frac{\delta}{4}, \end{equation*} \notag $$
which by (3.9) gives the estimate
$$ \begin{equation*} \biggl|\int h_w\, d\sigma - \int h_{w_n}\, d\zeta_n\biggr| \leqslant \frac{\delta}{2}. \end{equation*} \notag $$
Since $K_{h_{w_n}}(\mu_{w_n},\nu_{w_n})\to K_{h_{w}}(\mu_{w},\nu_{w})$, we arrive at a contradiction with (3.8). Thus, the set $\{t\colon \Psi_t \cap U \ne \varnothing\}$ is open.

Finally, in the case of unique optimal plans the result follows from Corollary 2.

Theorem 3 is proved.

One might say that the continuity of the optimal cost with respect to the parameter is something expected, but what about continuous optimal plans? In the general case there is no continuous selection of optimal plans.

Example. Let $\mu$ and $\nu$ be Borel probability measures on a complete separable metric space $X$ such that there are at least two continuous mappings $f, g\colon X\to X$ taking $\mu$ to $\nu$ for which the images of $\mu$ under the mappings $x\mapsto (x,f(x))$ and $x\mapsto (x,g(x))$ are different. For example, as $\mu$ and $\nu$ one can take Lebesgue measure on $[0,1]$. As the space $T$ we take the set consisting of the points $1/n$, $n\in\mathbb{N}$, and the point $0$. Set

$$ \begin{equation*} h_0=0, \qquad h_t(x,y)=t |y-f(x)|^2 \quad \text{for } t=(2n-1)^{-1} \end{equation*} \notag $$
and
$$ \begin{equation*} h_t(x,y)=t |y-g(x)|^2 \quad \text{for } t=(2n)^{-1}. \end{equation*} \notag $$
Let $\mu_t=\mu$ and $\nu_t=\nu$. For $t=0$ all measures in $\Pi(\mu,\nu)$ are optimal for the cost function $h_0$. For $t=(2n-1)^{-1}$ the unique optimal measure for the cost function $h_t$ is the image of $\mu$ under the mapping $x\mapsto (x,f(x))$, and for $t=(2n)^{-1}$ the unique optimal measure for the cost function $h_t$ is the image of $\mu$ under the mapping $x\mapsto (x,g(x))$. Hence the optimal measures indicated have no limit as $t\to 0$. For example, if $\mu=\nu$ is Lebesgue measure on $[0,1]$, then we can take $f(x)=x$ and $g(x)=1-x$. Here is another explicit example with the same measures on $[0,1]$:
$$ \begin{equation*} h_t(x, y) = \min(|x-y|, |x+y-1| + t), \qquad t \geqslant 0, \end{equation*} \notag $$
and
$$ \begin{equation*} h_t(x, y) = \min(|x-y| - t, |x+y-1|), \qquad t < 0. \end{equation*} \notag $$
Then for $t>0$ the optimal plan is concentrated on the diagonal $x=y$, and for $t<0$ the optimal plan is concentrated on the diagonal $x+y=1$; in all cases it is the normalized linear Lebesgue measure. For $t=0$ there is no uniqueness: there are many optimal plans concentrated on the union of the diagonals, but some of them are not generated by Monge mappings. However, the left and right limits of optimal plans are generated by Monge mappings.
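The jump at $t=0$ in the last example can also be seen numerically on a uniform discretization of Lebesgue measure on $[0,1]$; the sketch below (Python with SciPy; the grid size is an arbitrary illustrative choice) computes optimal plans for a small $t>0$ and a small $t<0$ and reports how much mass lies on each of the two diagonals.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_plan(cost, mu, nu):
    """Discrete Kantorovich problem solved as a linear program."""
    m, n = cost.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_eq[m + j, j::n] = 1.0
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu, nu]),
                  bounds=(0, None))
    return res.x.reshape(m, n)

# Uniform discretization of Lebesgue measure on [0, 1] (arbitrary grid size).
n = 20
x = (np.arange(n) + 0.5) / n
mu = np.full(n, 1.0 / n)

def cost(t):
    X, Y = np.meshgrid(x, x, indexing="ij")
    if t >= 0:
        return np.minimum(np.abs(X - Y), np.abs(X + Y - 1) + t)
    return np.minimum(np.abs(X - Y) - t, np.abs(X + Y - 1))

for t in (0.05, -0.05):
    plan = optimal_plan(cost(t), mu, mu)
    on_diag = plan[np.abs(x[:, None] - x[None, :]) < 1e-9].sum()
    on_anti = plan[np.abs(x[:, None] + x[None, :] - 1) < 1e-9].sum()
    print(f"t = {t:+.2f}: mass on x = y: {on_diag:.2f}, mass on x + y = 1: {on_anti:.2f}")
```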

It is worth noting that a related, but not equivalent problem was considered in [5] and [24] (see also [42]), where the convergence in measure of Monge optimal mappings $T_\varepsilon\to T_0$ as $\varepsilon\to 0$ was shown for some special cost functions $h_\varepsilon$. In our situation a relevant question would be about some form of continuity of the Monge optimal mappings $T_t$ in the case where $\mu_t$ does not depend on $t$ or all measures $\mu_t$ are absolutely continuous with respect to some reference measure $\lambda$. Of course, in the case where $T_t$ is not unique, the question is about continuous selections of optimal mappings. Here is a general result in the spirit of the papers cited in the case where optimal plans are generated by mappings. The particular situation considered in those papers (see also [42], Theorem 2.53) deals with constant marginals and the cost functions $h_t(x,y)=|x-y|+t |x-y|^2$ or a more general family of functions with the property that $h_t(x,y)=|x-y|+t \varphi(|x-y|)+o(t)$ for a strictly convex function $\varphi$. However, this special structure is needed to guarantee the existence and uniqueness of Monge solutions. In the next abstract result the existence of ‘Monge mappings’ is part of the hypotheses.

Recall that convergence in measure for mappings to completely regular spaces is defined as follows. Let $\{d_\alpha\}$ be a family of pseudometrics defining the topology of a completely regular space $X$, and let $\mu\in\mathcal{P}_r(X)$. Then Borel mappings $T_n\colon {X\to X}$ converge in the measure $\mu$ to a Borel mapping $T\colon X\to X$ if, for each $\alpha$ and each $\delta>0$, one has $\mu(x\colon d_\alpha(T_n(x),T(x))\geqslant \delta)\to 0$ as $n\to \infty$.

Proposition. Let $X$ be a completely regular space, and let $\mu_n\in\mathcal{P}_r(X)$ for $n\in \mathbb{Z}^+$ and $\mu_n\to\mu_0$ in variation. Suppose that there are Borel mappings $T_n\colon X\to X$, $n\geqslant 0$, such that the measures $\sigma_n$ equal to the images of the measures $\mu_n$ under the mappings $x\mapsto (x,T_n(x))$ converge weakly to $\sigma_0$. Suppose in addition that the measures $\mu_0$ and $\nu_0:=\mu_0\circ T_0^{-1}$ are concentrated on countable unions of metrizable compact sets (which holds automatically if $X$ is a Souslin space). Then the mappings $T_n$ converge to $T_0$ in measure with respect to $\mu_0$.

Proof. Let $\psi$ be a continuous pseudometric on $X$. We can assume that $0\leqslant \psi\leqslant 1$. Suppose first that $T_{0}$ is continuous. Then the function $\psi(T_{0}(x),y)$ is continuous on $X^2$, hence by the weak convergence $\sigma_n\to \sigma_{0}$ we have
$$ \begin{equation*} \begin{aligned} \, & \int \psi(T_0(x),T_n(x)) \, \mu_n(dx)=\int \psi(T_{0}(x),y)\, \sigma_{n}(dx\, dy) \\ &\qquad \to \int \psi(T_{0}(x),y)\, \sigma_{0}(dx\, dy)=\int \psi(T_0(x),T_{0}(x)) \, \mu_0(dx)= 0. \end{aligned} \end{equation*} \notag $$
Then by convergence in variation we have
$$ \begin{equation*} \int \psi(T_0(x),T_n(x)) \, \mu_0(dx)\to 0. \end{equation*} \notag $$
Hence $T_n\to T_{0}$ in the measure $\mu_0$ by definition.

In the general case we can embed $X$ homeomorphically into a suitable power of the real line and assume that $X=\mathbb{R}^\tau$. Given $\varepsilon>0$, we can find a compact set $K$ such that $\mu_0(K)>1-\varepsilon$ on which $T_{0}$ is continuous. To this end we find a metrizable compact set $Q$ satisfying $\nu_0(Q)>1-\varepsilon$ and in the Borel set $T_0^{-1}(Q)$ we find a metrizable compact set $Q_1$ such that $\mu_0(Q_1)>1-\varepsilon$, which is possible because $\mu_0(T_0^{-1}(Q))=\nu_0(Q)>1-\varepsilon$. It remains to apply Luzin’s theorem to the Borel mapping $T_0$ between the metrizable compact sets $Q_1$ and $Q$. Actually, to apply a generalization of Luzin’s theorem it suffices that only $\nu_0$ be concentrated on metrizable compact sets (see [11], Theorem 7.1.13). Next, there is a continuous mapping $S$ that coincides with $T_{0}$ on $K$ (here we use that $X=\mathbb{R}^\tau$, so it suffices to extend the components of $T_0$). Repeating the same reasoning with the function $\psi(S(x),y)$, we obtain that the integral of $\psi(S(x),T_n(x))$ against $\mu_0$ tends to the integral of $\psi(S(x), T_0(x))$, which is estimated by $\varepsilon$ since $\psi(S(x),T_{0}(x))=0$ on $K$. This yields convergence in measure in the general case.

The proposition is proved.

Note that the convergence of $T_n$ to $T_0$ in measure $\mu_0$ is also sufficient for the weak convergence of $\sigma_n$.

Corollary 4. Suppose that $X$ is a complete separable metric space, the measures $\mu_n\in \mathcal{P}_r(X)$ converge in variation to a measure $\mu_0$, the measures $\nu_n\in \mathcal{P}_r(X)$ converge weakly to a measure $\nu_0$, the continuous functions $h_n\geqslant 0$ on $X\times X$ converge to a function $h$ uniformly on compact sets and (3.4) holds. Suppose also that for the triples $(\mu_n,\nu_n,h_n)$, $n\geqslant 0$, there are unique optimal Kantorovich plans $\sigma_n$ that are generated by unique Monge optimal mappings $T_n$. Then the mappings $T_n$ converge to $T_0$ in the measure $\mu_0$.

Note that Corollary 5.20 in [43] contains an analogous result in the case where all measures $\mu_n$ coincide with $\mu_0$ and $X$ is locally compact.

Remark 2. As shown above, in the situation when we deal with optimal transportation problems for triples $(\mu_n,\nu_n,h_n)$, the hypothesis that the optimal plans $\sigma_n$ converge weakly to $\sigma_0$ is fulfilled if optimal measures are unique, the measures $\nu_n$ converge weakly to $\nu_0$ and are uniformly tight, and the functions $h_n$ are continuous and converge to $h_0$ uniformly on compact sets.

If we do not assume the weak convergence of optimal plans, but $\{\nu_n\}$ converges weakly and is uniformly tight, then the conclusion still holds for a subsequence extracted from $\{T_n\}$ so that the corresponding plans converge, provided it is known that all optimal plans for $h_0$ (or at least those in the closure of the subsequence of plans under consideration) are generated by Monge mappings.

The particular case of constant marginals $\mu$ and $\nu$ is not much simpler, because anyway we have to ensure the convergence of plans and need the existence of Monge mappings. It would be interesting to study approximate Monge solutions depending continuously on a parameter. One could analyze the constructions in [16] and [35] to this end.

§ 4. Nonlinear cost functionals

Let us consider nonlinear cost functionals of the form

$$ \begin{equation*} J_H(\sigma) =\int_{X\times Y} H(x,y,\sigma)\, \sigma(dx\, dy), \end{equation*} \notag $$
where $H$ is a Borel function on $X\times Y\times \mathcal{P}_r(X\times Y)$ or on $X\times Y\times \Pi(\mu,\nu)$. Functionals of this type have recently been considered by several authors; see, for example, [27], [2] and [7], where additional references can be found. We assume for simplicity that $H$ is bounded; moreover, since we discuss continuity properties, we also assume that $H$ is continuous. Under these assumptions, for any measures $\mu\in\mathcal{P}_r(X)$ and $\nu\in\mathcal{P}_r(Y)$ there is a solution to the nonlinear Kantorovich problem for the functional $J_H$, that is, a plan $\sigma\in \Pi(\mu,\nu)$ for which the quantity $J_H(\sigma)$ attains its minimum $K_H(\mu,\nu)$ on the set $\Pi(\mu,\nu)$; see [20] and [13].

However, it should be noted that there is no continuity in $\sigma$ in the typical examples presented in the papers cited, where $H(x,y,\sigma)=H(x,\sigma^x)$ and $(\sigma^x)_{x\in X}$ is the disintegration of $\sigma$ with respect to the projection $\sigma_X$ of $\sigma$ onto $X$ (which is $\mu$ if $\sigma\in\Pi(\mu,\nu)$). We recall that a disintegration is a Borel mapping $x\mapsto \sigma^x$ from $X$ to $\mathcal{P}_r(Y)$ such that $\sigma (dx\, dy)=\sigma^x(dy)\, \sigma_X(dx)$ in the sense of the equality

$$ \begin{equation*} \sigma(B)=\int_X \sigma^x(B^x)\, \sigma_X(dx), \quad\text{where } B^x=\{y\in Y\colon (x,y)\in B\}. \end{equation*} \notag $$
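In the discrete case a disintegration is simply the family of normalized rows of the matrix of $\sigma$; a minimal sketch with a hypothetical plan:

```python
import numpy as np

# Hypothetical discrete plan sigma on X x Y (rows index points of X, columns of Y).
sigma = np.array([[0.10, 0.20, 0.10],
                  [0.05, 0.25, 0.30]])
sigma_X = sigma.sum(axis=1)              # projection of sigma onto X

# Disintegration: sigma^x is the normalized row, defined sigma_X-almost everywhere.
cond = sigma / sigma_X[:, None]

# Check the defining identity sigma(dx dy) = sigma^x(dy) sigma_X(dx).
print(np.allclose(cond * sigma_X[:, None], sigma))
```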
General nonlinear cost functionals with a parameter will be studied in another paper; here we only present the following result on continuity with respect to a parameter.

Let $X$, $Y$ and $T$ be complete separable metric spaces. Suppose that

$$ \begin{equation*} H\colon T\times X\times Y\times \mathcal{P}_r(X\times Y)\to \mathbb{R} \end{equation*} \notag $$
is a bounded continuous function. Let $H_t(x,y,\sigma):=H(t,x,y,\sigma)$. Given $\mu\in \mathcal{P}_r(X)$ and $\nu\in \mathcal{P}_r(Y)$, let
$$ \begin{equation*} K(t,\mu,\nu)=K_{H_t}(\mu,\nu)=\inf_{\sigma\in \Pi(\mu,\nu)} \int_{X\times Y} H_t(x,y,\sigma)\, \sigma(dx\, dy). \end{equation*} \notag $$

First we consider the case where the function $H$ does not depend on $t$.

Theorem 4. Suppose that the measures $\mu_n\in\mathcal{P}_r(X)$ converge weakly to a measure $\mu\in\mathcal{P}_r(X)$ and the measures $\nu_n\in\mathcal{P}_r(Y)$ converge weakly to a measure $\nu\in\mathcal{P}_r(Y)$. Then $K_H(\mu,\nu)=\lim_{n\to\infty} K_H(\mu_n,\nu_n)$.

In addition, any measure $\sigma_0\in \mathcal{P}_r(X\times Y)$ that is a limit point of the sequence of optimal measures $\sigma_n$ for the marginals $\mu_n$ and $ \nu_n$ is optimal for the marginals $\mu$ and $\nu$.

Proof. We can assume that $0\leqslant H\leqslant 1$. Since by assumption we have the weak convergence $\mu_{n}\to\mu$ and $\nu_{n}\to\nu$ and the spaces $X$ and $Y$ are sequentially Prohorov, both sequences are uniformly tight. Therefore, the union of all sets $\Pi(\mu_{n}, \nu_{n})$ and $\Pi(\mu, \nu)$ is uniformly tight. Hence it is contained in a compact uniformly tight set $S$ in $\mathcal{P}_r(X\times Y)$. Let us take optimal measures $\sigma_n$ for the pairs $(\mu_n,\nu_n)$ and some limit measure $\sigma_0$ of the sequence $\{\sigma_n\}$ in the compact set $S$. It is clear that $\sigma_0\in \Pi(\mu,\nu)$. We show that $J_H(\sigma_0)=K_H(\mu,\nu)=\lim_{n\to\infty}K_H(\mu_n,\nu_n)$. First we verify the last equality.

Let $\varepsilon>0$. Take a compact set $K$ in $X\times Y$ such that

$$ \begin{equation*} \sigma(K)>1-\varepsilon \quad \forall\, \sigma\in S. \end{equation*} \notag $$
There are bounded continuous pseudometrics $d_1$ on $X$ and $d_2$ on $Y$ such that there exists a function $G\colon X\times Y\times \mathcal{P}_r(X\times Y)\to [0,1]$ that is Lipschitz with constant $L$ with respect to the pseudometric $d_1\oplus d_2\oplus d_{\mathrm{K}}$, where $d_{\mathrm{K}}$ is the Kantorovich pseudometric generated by $d_1\oplus d_2$, and satisfies the estimate
$$ \begin{equation*} |H(x,y,\sigma)- G(x,y,\sigma)|\leqslant \varepsilon \quad \forall\, (x,y)\in K, \quad \sigma\in S. \end{equation*} \notag $$
This can be done with the aid of the Stone–Weierstrass theorem, since the class of such functions is an algebra separating points of the compact set $K\times S$. By weak convergence we have $d_{\mathrm{K},d_1}(\mu_n,\mu)\to0$ and $d_{\mathrm{K},d_2}(\nu_n,\nu)\to0$. Hence for some $N$, for all ${n\geqslant N}$ we obtain the inequalities
$$ \begin{equation*} d_{\mathrm K,d_1}(\mu_n,\mu)+d_{\mathrm K,d_2}(\nu_n,\nu)\leqslant \frac{\varepsilon}{L+1}. \end{equation*} \notag $$
By this inequality we have
$$ \begin{equation*} H_{\mathrm K, d_1\oplus d_2}(\Pi(\mu_n,\nu_n),\Pi(\mu,\nu))\leqslant \frac{\varepsilon}{L+1}. \end{equation*} \notag $$
It follows from this that the $L$-Lipschitz cost function $G$ satisfies the inequality
$$ \begin{equation*} |K_G(\mu,\nu)-K_G(\mu_n,\nu_n)|\leqslant 2\varepsilon, \qquad n\geqslant N. \end{equation*} \notag $$
Indeed, if $\pi\in\Pi(\mu,\nu)$ is an optimal measure for the triple $(\mu,\nu,G)$, then there is a plan $\pi_n\in \Pi(\mu_n,\nu_n)$ for which
$$ \begin{equation*} d_{\mathrm K, d_1\oplus d_2}(\pi,\pi_n)\leqslant \frac{\varepsilon}{L+1}. \end{equation*} \notag $$
Then by the $L$-Lipschitz property of $G$ we have
$$ \begin{equation*} \biggl|\int G(x,y, \pi)\, d\pi - \int G(x,y, \pi)\, d\pi_n\biggr|\leqslant \varepsilon. \end{equation*} \notag $$
In addition,
$$ \begin{equation*} \biggl|\int G(x,y, \pi_n)\, d\pi_n - \int G(x,y, \pi)\, d\pi_n\biggr|\leqslant \varepsilon, \end{equation*} \notag $$
because $|G(x,y, \pi_n)-G(x,y, \pi)|\leqslant Ld_{\mathrm{K},d_1\oplus d_2}(\pi_n,\pi) \leqslant L\varepsilon /(L+1)<\varepsilon$. Hence
$$ \begin{equation*} \biggl|\int G(x,y, \pi)\, d\pi - \int G(x,y, \pi_n)\, d\pi_n\biggr|\leqslant 2\varepsilon. \end{equation*} \notag $$
So $K_G(\mu_n,\nu_n) \leqslant K_G(\mu,\nu)+2\varepsilon$. We obtain the estimate $K_G(\mu,\nu) \leqslant K_G(\mu_n,\nu_n)+2\varepsilon$ similarly, by taking an optimal plan $\pi_n'$ for the triple $(\mu_n,\nu_n,G)$ and picking ${\pi'\in \Pi(\mu,\nu)}$ such that $d_{\mathrm{K}, d_1\oplus d_2}(\pi',\pi_n')\leqslant \varepsilon/(L+1)$.

For the original function $H$ the estimates $|J_H(\sigma)-J_G(\sigma)|\leqslant 3\varepsilon$ hold for all $\sigma\in S$, hence

$$ \begin{equation*} |K_H(\mu_n,\nu_n)-K_G(\mu_n,\nu_n)|\leqslant 3\varepsilon \quad\text{and}\quad |K_H(\mu,\nu)-K_G(\mu,\nu)|\leqslant 3\varepsilon \end{equation*} \notag $$
whenever $n\geqslant N$. Thus, $|K_H(\mu,\nu)-K_H(\mu_n,\nu_n)|\leqslant 8\varepsilon$, which proves the equality $K_H(\mu,\nu)=\lim_{n\to\infty}K_H(\mu_n,\nu_n)$, because $\varepsilon$ is arbitrary.

Now we show that $\sigma_0$ is an optimal plan for the marginals $\mu$ and $\nu$. Let $\varepsilon>0$. We take the same compact set $K\subset X\times Y$ as above and also the same pseudometric $d_1\oplus d_2$ and the cost function $G$, which is $L$-Lipschitz with respect to this pseudometric. The measure $\sigma_0$ is a limit point for the sequence $\{\sigma_n\}$ in the weak topology, and therefore there is a subsequence $\sigma_{n_k}$ converging to $\sigma_0$ with respect to the pseudometric indicated. Without loss of generality we can assume that the whole sequence converges. Hence there is $N$ such that $d_{\mathrm{K},d_1\oplus d_2}(\sigma_{n},\sigma_0)\leqslant \varepsilon/(L+1)$ whenever $n\geqslant N$. Then for such $n$ we have

$$ \begin{equation*} \biggl|\int G(x,y, \sigma_n)\, d\sigma_n - \int G(x,y, \sigma_n)\, d\sigma_0\biggr|\leqslant \varepsilon \end{equation*} \notag $$
and
$$ \begin{equation*} \biggl|\int G(x,y, \sigma_n)\, d\sigma_0 - \int G(x,y, \sigma_0)\, d\sigma_0\biggr|\leqslant \varepsilon, \end{equation*} \notag $$
from which we obtain
$$ \begin{equation*} \biggl|\int G(x,y, \sigma_n)\, d\sigma_n - \int G(x,y, \sigma_0)\, d\sigma_0\biggr|\leqslant 2\varepsilon. \end{equation*} \notag $$
As above, comparing these integrals with the integrals of the original function $H$, we arrive at the estimate
$$ \begin{equation*} \biggl|\int H(x,y, \sigma_n)\, d\sigma_n - \int H(x,y, \sigma_0)\, d\sigma_0\biggr|\leqslant 8\varepsilon, \end{equation*} \notag $$
that is, $|K_H(\mu_n,\nu_n)-J_H(\sigma_0)|\leqslant 8\varepsilon$ for all sufficiently large $n$. Letting $n\to\infty$ and using the equality already proved, we obtain $|K_H(\mu,\nu)-J_H(\sigma_0)|\leqslant 8\varepsilon$; since $\varepsilon>0$ is arbitrary, $J_H(\sigma_0)=K_H(\mu,\nu)$, which completes the proof.

Theorem 4 is proved.

Let us return to a cost function $H$ depending on $t$.

Theorem 5. Suppose that $t\mapsto \mu_t$ and $t\mapsto \nu_t$ are continuous mappings with values in $\mathcal{P}_r(X)$ and $\mathcal{P}_r(Y)$, respectively. Then the function $t\mapsto K(t,\mu_t,\nu_t)$ is sequentially continuous.

Proof. We reduce our assertion to Theorem 4. We use the same initial construction. We can assume that $0\leqslant H\leqslant 1$. Let $t_n\to t_0$. Since by assumption we have the weak convergence $\mu_{t_n}\to\mu_{t_0}$ and $\nu_{t_n}\to\nu_{t_0}$, while the spaces $X$ and $Y$ are sequentially Prohorov, both sequences are uniformly tight. So the union of the sets of measures $\Pi(\mu_{t_n}, \nu_{t_n})$, $n\geqslant 0$, is uniformly tight. Hence it is contained in a compact uniformly tight set $S$ in $\mathcal{P}_r(X\times Y)$. Let $\varepsilon>0$. Again, we take a compact set $K$ in $X\times Y$ such that
$$ \begin{equation*} \sigma(K)>1-\varepsilon \quad \forall\, \sigma\in S. \end{equation*} \notag $$
By the continuity of $H$ and the compactness of $K\times S$ there exists $N$ such that
$$ \begin{equation*} |H(t_n,x,y,\sigma)-H(t_0,x,y,\sigma)|\leqslant \varepsilon \quad \forall\, (x,y)\in K, \quad \sigma\in S, \quad n\geqslant N. \end{equation*} \notag $$
Therefore,
$$ \begin{equation*} |J_{H_{t_n}}(\sigma)-J_{H_{t_0}}(\sigma)|\leqslant 3\varepsilon \quad \forall\, \sigma\in S, \quad n\geqslant N, \end{equation*} \notag $$
which gives us the estimate
$$ \begin{equation*} |K(t_n,\mu_{t_n},\nu_{t_n})-K(t_0,\mu_{t_n},\nu_{t_n})|\leqslant 3\varepsilon \quad \forall\, n\geqslant N. \end{equation*} \notag $$
According to Theorem 4, there exists $N_1\geqslant N$ such that
$$ \begin{equation*} |K(t_0,\mu_{t_n},\nu_{t_n})-K(t_0,\mu_{t_0},\nu_{t_0})|\leqslant \varepsilon \quad \forall\, n\geqslant N_1. \end{equation*} \notag $$
For such $n$ we obtain $|K(t_n,\mu_{t_n},\nu_{t_n})-K(t_0,\mu_{t_0},\nu_{t_0})|\leqslant 4\varepsilon$, which completes the proof.

Theorem 5 is proved.

As above, this result extends easily to the case of unbounded nonnegative cost functions with suitable uniform integrability. For example, it suffices to have a bound

$$ \begin{equation*} H(t,x,y,\sigma)\leqslant a_t(x)+b_t(y), \end{equation*} \notag $$
where $a_t$ and $b_t$ satisfy (3.7).

Note that in the case of a cost functional generated by a function of the form $H(x,\sigma^x)$ (mentioned above) it was shown in [7] that $J_H(\sigma)$ is lower semicontinuous in $\sigma$, provided that $H$ is jointly lower semicontinuous and convex in the second argument. As shown in [2] (see also [13]), this can fail without convexity.

§ 5. Application to discrete approximations

Let $X$ and $Y$ be compact metric spaces with Borel probability measures $\mu$ and $\nu$, and let $h$ be a Lipschitz function on $X\times Y$. Let $L$ be its Lipschitz constant. The estimate obtained above can be used in the study of discrete approximations of the Kantorovich problem for the triple $(\mu,\nu,h)$. We consider discrete approximations of the marginals $\mu$ and $\nu$ by measures of the form

$$ \begin{equation*} \mu_n=\sum_{i=1}^n \mu(A_i)\delta_{a_i} \quad\text{and}\quad \nu_n=\sum_{i=1}^n \nu(B_i)\delta_{b_i} \end{equation*} \notag $$
respectively, where $A_1,\dots,A_n$ and $B_1,\dots,B_n$ are partitions of $X$ and $Y$ into disjoint Borel sets of diameter not larger than a prescribed number $\varepsilon$, $a_i\in A_i$ and $b_i\in B_i$. Since for any functions $f\in \mathrm{Lip}_1(X)$ and $g\in \mathrm{Lip}_1(Y)$ we have $|f(x)-f(a_i)|\leqslant \varepsilon$ for all $x\in A_i$ and $|g(y)-g(b_i)|\leqslant \varepsilon$ for all $y\in B_i$, we obtain the bounds
$$ \begin{equation*} \|\mu-\mu_n\|_{\mathrm K} \leqslant \varepsilon \quad\text{and}\quad \|\nu-\nu_n\|_{\mathrm K} \leqslant \varepsilon. \end{equation*} \notag $$
Whenever $\sigma_1\in \Pi(\mu,\nu)$ and $\sigma_2\in \Pi(\mu_n,\nu_n)$ are such that $\|\sigma_1-\sigma_2\|_{\mathrm{K}}\leqslant 2\varepsilon$, we have
$$ \begin{equation*} \biggl|\int h\, d(\sigma_1-\sigma_2)\biggr|\leqslant 2L \varepsilon. \end{equation*} \notag $$
Taking as $\sigma_2$ an optimal measure for the triple $(h,\mu_n,\nu_n)$ and finding $\sigma_1\in \Pi(\mu,\nu)$ such that $\|\sigma_1-\sigma_2\|_{\mathrm{K}}\leqslant 2\varepsilon$, on account of our estimate we obtain
$$ \begin{equation*} K_h(\mu,\nu)- K_h(\mu_n,\nu_n)\leqslant 2L \varepsilon. \end{equation*} \notag $$
We can similarly show that $K_h(\mu_n,\nu_n)-K_h(\mu,\nu)\leqslant 2L \varepsilon$, taking $\sigma_1$ optimal for the triple $(h,\mu,\nu)$ first and then finding a suitable coupling $\sigma_2\in \Pi(\mu_n,\nu_n)$. It follows that
$$ \begin{equation*} |K_h(\mu,\nu)- K_h(\mu_n,\nu_n)|\leqslant 2L \varepsilon. \end{equation*} \notag $$
The Kantorovich problem for $(\mu_n,\nu_n,h)$ is finite-dimensional. The corresponding matrix is $(h(a_i,b_j))_{i,j\leqslant n}$. If $\sigma_n$ is an optimal plan for this problem, then our estimate shows that there is a coupling $\pi_n\in \Pi(\mu,\nu)$ for the original problem such that $\|{\sigma_n-\pi_n}\|_{\mathrm{K}}\leqslant 2\varepsilon$. It is straightforward to see that the coupling $\pi_n$ is $4L\varepsilon$-approximate for the original problem. Of course, it is important to control $n$: the minimal possible $n$ depends on the metric entropy of $X$ and $Y$. For example, if $X$ and $Y$ are contained in $\mathbb{R}^d$, then $n$ is of order $\varepsilon^{-d}$. The rate of approximation of $\mu$ and $\nu$ by discrete measures is, of course, a separate question (see [28]). If $h$ is not Lipschitz, then a similar estimate can be obtained in terms of the modulus of continuity of $h$. Our result can be used in approximation schemes discussed in [6].
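To illustrate this finite-dimensional problem, the following is a minimal numerical sketch (not part of the original argument) that assembles the cost matrix $(h(a_i,b_j))$ and solves the resulting linear programme with the function linprog from SciPy; the uniform partitions of $[0,1]$ and the cost $h(x,y)=|x-y|$ below are hypothetical choices made only for the illustration.

# A minimal sketch of the discrete Kantorovich problem for (mu_n, nu_n, h):
# minimize sum_{i,j} h(a_i, b_j) sigma_{ij} over sigma >= 0 with prescribed
# row sums mu(A_i) and column sums nu(B_j).
import numpy as np
from scipy.optimize import linprog

def discrete_kantorovich(a, b, C):
    # a: weights mu(A_i), b: weights nu(B_j), C: cost matrix h(a_i, b_j).
    n, m = C.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # sum over j of sigma_{ij} equals a_i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # sum over i of sigma_{ij} equals b_j
    b_eq = np.concatenate([a, b])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(n, m)    # optimal cost and optimal plan

# Hypothetical data: uniform partitions of [0, 1] with representatives a_i, b_j
# and the 1-Lipschitz cost h(x, y) = |x - y|.
n = 20
pts_x = (np.arange(n) + 0.5) / n
pts_y = (np.arange(n) + 0.5) / n
a = np.full(n, 1.0 / n)
b = np.full(n, 1.0 / n)
C = np.abs(pts_x[:, None] - pts_y[None, :])
value, plan = discrete_kantorovich(a, b, C)

Any optimal plan $\sigma_n$ obtained in this way then yields, by the estimate above, a coupling $\pi_n\in \Pi(\mu,\nu)$ that is $4L\varepsilon$-approximate for the original problem.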

Finally, the case of noncompact spaces reduces in principle to the one we have considered, provided that we have some information about compact sets on which the marginals are $\varepsilon$-concentrated. Compactness is only needed to obtain unified discrete approximations of the marginals and to use the distance $d_{\mathrm{K}}$ in place of $d_{\mathrm{KR}}$. Discrete approximations of multi-marginal Kantorovich problems are constructed similarly; our inequality yields quite universal bounds for the corresponding costs and optimal plans. For recent results on discrete multi-marginal Kantorovich problems, see [44].

The same approach works for nonlinear cost functionals considered in the previous section. Suppose that $h$ is a nonnegative function on $X\times Y\times \mathcal{P}_r(X\times Y)$ that is Lipschitz with constant $L$. Then the nonlinear functional

$$ \begin{equation*} J_h(\sigma)=\int_{X\times Y} h(x,y,\sigma)\, \sigma(dx\, dy) \end{equation*} \notag $$
is $2L$-Lipschitz on $\mathcal{P}_r(X\times Y)$, since
$$ \begin{equation*} \begin{aligned} \, |J_h(\sigma_1)-J_h(\sigma_2)| &\leqslant \int_{X\times Y} |h(x,y,\sigma_1)-h(x,y,\sigma_2)|\, \sigma_1(dx\, dy) \\ &\qquad+ \biggl|\int_{X\times Y} h(x,y,\sigma_2)\, (\sigma_1-\sigma_2)(dx\, dy)\biggr| \leqslant 2L d_{\mathrm K}(\sigma_1,\sigma_2). \end{aligned} \end{equation*} \notag $$
Therefore, using the same discrete $\varepsilon$-approximations of the marginals $\mu$ and $\nu$ as above, we obtain the bound $|K_h(\mu,\nu)-K_h(\mu_n,\nu_n)|\leqslant 4\varepsilon L$.

The same method works for noncompact metric spaces $X$ and $Y$, provided that we have discrete $\varepsilon$-approximations of marginals in the Kantorovich–Rubinshtein metric $d_{\mathrm{KR}}$ and the function $h$ is $L$-Lipschitz and bounded by $L$. Then the resulting estimate is also the same. If $\sigma_n$ is an optimal plan for $(\mu_n,\nu_n)$, then our estimate furnishes a plan $\pi_n\in \Pi(\mu,\nu)$ with $d_{\mathrm{KR}}(\sigma_n, \pi_n)\leqslant 2\varepsilon$. This plan is $8L\varepsilon$-approximate for $(\mu,\nu)$.

Acknowledgement

We are grateful to the referee for useful comments.


Bibliography

1. K. A. Afonin and V. I. Bogachev, “Kantorovich type topologies on spaces of measures and convergence of barycenters”, Commun. Pure Appl. Anal., 22:2 (2023), 597–612
2. J.-J. Alibert, G. Bouchitté and T. Champion, “A new class of costs for optimal transport planning”, European J. Appl. Math., 30:6 (2019), 1229–1263
3. L. Ambrosio, E. Brué and D. Semola, Lectures on optimal transport, Unitext, 130, La Mat. per il 3+2, Springer, Cham, 2021, ix+250 pp.
4. L. Ambrosio and N. Gigli, “A user's guide to optimal transport”, Modelling and optimisation of flows on networks, Lecture Notes in Math., 2062, Fond. CIME/CIME Found. Subser., Springer, Heidelberg, 2013, 1–155
5. L. Ambrosio and A. Pratelli, “Existence and stability results in the $L^1$ theory of optimal transportation”, Optimal transportation and applications (Martina Franca 2001), Lecture Notes in Math., 1813, Springer, Berlin, 2003, 123–160
6. M. L. Avendaño-Garrido, J. R. Gabriel-Argüelles, L.-T. Quintana and J. González-Hernández, “An approximation scheme for the Kantorovich–Rubinstein problem on compact spaces”, J. Numer. Math., 26:2 (2018), 63–75
7. J. Backhoff-Veraguas, M. Beiglböck and G. Pammer, “Existence, duality, and cyclical monotonicity for weak transport costs”, Calc. Var. Partial Differential Equations, 58:6 (2019), 203, 28 pp.
8. J. Backhoff-Veraguas and G. Pammer, “Applications of weak transport theory”, Bernoulli, 28:1 (2022), 370–394
9. J. Bergin, “On the continuity of correspondences on sets of measures with restricted marginals”, Econom. Theory, 13:2 (1999), 471–481
10. S. Bobkov and M. Ledoux, One-dimensional empirical measures, order statistics, and Kantorovich transport distances, Mem. Amer. Math. Soc., 261, no. 1259, Amer. Math. Soc., Providence, RI, 2019, v+126 pp.
11. V. I. Bogachev, Measure theory, v. I, II, Springer-Verlag, Berlin, 2007, xviii+500 pp., xiv+575 pp.
12. V. I. Bogachev, Weak convergence of measures, Math. Surveys Monogr., 234, Amer. Math. Soc., Providence, RI, 2018, xii+286 pp.
13. V. I. Bogachev, “Kantorovich problem of optimal transportation of measures: new directions of research”, Russian Math. Surveys, 77:5 (2022), 769–817
14. V. I. Bogachev, “Kantorovich problems with a parameter and density constraints”, Siberian Math. J., 63:1 (2022), 34–47
15. V. I. Bogachev, A. N. Doledenok and I. I. Malofeev, “The Kantorovich problem with a parameter and density constraints”, Math. Notes, 110:6 (2021), 952–955
16. V. I. Bogachev, A. N. Kalinin and S. N. Popova, “On the equality of values in the Monge and Kantorovich problems”, J. Math. Sci. (N.Y.), 238:4 (2019), 377–389
17. V. I. Bogachev and A. V. Kolesnikov, “The Monge–Kantorovich problem: achievements, connections, and perspectives”, Russian Math. Surveys, 67:5 (2012), 785–890
18. V. I. Bogachev and I. I. Malofeev, “Kantorovich problems and conditional measures depending on a parameter”, J. Math. Anal. Appl., 486:1 (2020), 123883, 30 pp.
19. V. I. Bogachev and S. N. Popova, “On Kantorovich problems with a parameter”, Dokl. Math., 106:3 (2022), 426–428
20. V. I. Bogachev and A. V. Rezbaev, “Existence of solutions to the nonlinear Kantorovich transportation problem”, Math. Notes, 112:3 (2022), 369–377
21. B. Bonnet and H. Frankowska, “Differential inclusions in Wasserstein spaces: the Cauchy–Lipschitz framework”, J. Differential Equations, 271 (2021), 594–637
22. C. Clason, D. A. Lorenz, H. Mahler and B. Wirth, “Entropic regularization of continuous optimal transport problems”, J. Math. Anal. Appl., 494:1 (2021), 124432, 22 pp.
23. J. Dedecker, C. Prieur and P. Raynaud De Fitte, “Parametrized Kantorovich–Rubinštein theorem and application to the coupling of random variables”, Dependence in probability and statistics, Lect. Notes Stat., 187, Springer, New York, 2006, 105–121
24. L. De Pascale, J. Louet and F. Santambrogio, “The Monge problem with vanishing gradient penalization: vortices and asymptotic profile”, J. Math. Pures Appl. (9), 106:2 (2016), 237–279
25. A. Figalli and F. Glaudo, An invitation to optimal transport, Wasserstein distances, and gradient flows, EMS Textbk. Math., EMS Press, Berlin, 2021, vi+136 pp.
26. M. Ghossoub and D. Saunders, “On the continuity of the feasible set mapping in optimal transport”, Econ. Theory Bull., 9:1 (2021), 113–117
27. N. Gozlan, C. Roberto, P.-M. Samson and P. Tetali, “Kantorovich duality for general transport costs and applications”, J. Funct. Anal., 273:11 (2017), 3327–3405
28. S. Graf and H. Luschgy, Foundations of quantization for probability distributions, Lecture Notes in Math., 1730, Springer-Verlag, Berlin, 2000, x+230 pp.
29. M. Katětov, “On real-valued functions in topological spaces”, Fund. Math., 38 (1951), 85–91; “Correction”, 40 (1953), 203–205
30. S. Kuksin, V. Nersesyan and A. Shirikyan, “Exponential mixing for a class of dissipative PDEs with bounded degenerate noise”, Geom. Funct. Anal., 30:1 (2020), 126–187
31. D. A. Lorenz, P. Manns and C. Meyer, “Quadratically regularized optimal transport”, Appl. Math. Optim., 83:3 (2021), 1919–1949
32. I. I. Malofeev, “Measurable dependence of conditional measures on a parameter”, Dokl. Math., 94:2 (2016), 493–497
33. E. Michael, “Continuous selections. I”, Ann. of Math. (2), 63:2 (1956), 361–382
34. E. Michael, “A selection theorem”, Proc. Amer. Math. Soc., 17 (1966), 1404–1406
35. A. Pratelli, “On the equality between Monge's infimum and Kantorovich's minimum in optimal mass transportation”, Ann. Inst. Henri Poincaré Probab. Stat., 43:1 (2007), 1–13
36. S. T. Rachev and L. Rüschendorf, Mass transportation problems, v. I, Probab. Appl. (N.Y.), Theory, Springer-Verlag, New York, 1998, xxvi+508 pp.; v. II, Applications, xxvi+430 pp.
37. D. Ramachandran and L. Rüschendorf, “A general duality theorem for marginal problems”, Probab. Theory Related Fields, 101:3 (1995), 311–319
38. D. Repovš and P. V. Semenov, Continuous selections of multivalued mappings, Math. Appl., 455, Kluwer Acad. Publ., Dordrecht, 1998, viii+356 pp.
39. F. Santambrogio, Optimal transport for applied mathematicians. Calculus of variations, PDEs, and modeling, Progr. Nonlinear Differential Equations Appl., 87, Birkhäuser/Springer, Cham, 2015, xxvii+353 pp.
40. A. Savchenko and M. Zarichnyi, “Correspondences of probability measures with restricted marginals”, Proc. Intern. Geom. Center, 7:4 (2014), 34–39
41. A. M. Vershik, P. B. Zatitskiy and F. V. Petrov, “Geometry and dynamics of admissible metrics in measure spaces”, Cent. Eur. J. Math., 11:3 (2013), 379–400
42. C. Villani, Topics in optimal transportation, Grad. Stud. Math., 58, Amer. Math. Soc., Providence, RI, 2003, xvi+370 pp.
43. C. Villani, Optimal transport. Old and new, Grundlehren Math. Wiss., 338, Springer, New York, 2009, xxii+973 pp.
44. D. Vögler, “Geometry of Kantorovich polytopes and support of optimizers for repulsive multi-marginal optimal transport on finite state spaces”, J. Math. Anal. Appl., 502:1 (2021), 125147, 31 pp.
45. Feng-Yu Wang and Jie-Xiang Zhu, “Limit theorems in Wasserstein distance for empirical measures of diffusion processes on Riemannian manifolds”, Ann. Inst. Henri Poincaré Probab. Stat., 59:1 (2023), 437–475
46. Xicheng Zhang, “Stochastic Monge–Kantorovich problem and its duality”, Stochastics, 85:1 (2013), 71–84
