Sbornik: Mathematics, 2024, Volume 215, Issue 4, Pages 494–510
DOI: https://doi.org/10.4213/sm9982e
(Mi sm9982)
 

Lipschitz continuity of the metric projection operator and convergence of gradient methods

M. V. Balashov

V. A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences, Moscow, Russia
Abstract: Various support conditions for a closed subset of a real Hilbert space $\mathcal H$ at a boundary point of this set are considered. These conditions ensure certain local Lipschitz continuity of the metric projection operator as a function of a point. The local Lipschitz continuity of the metric projection as a function of the set in the Hausdorff metric is also proved. This Lipschitz property is used to verify the linear convergence of some gradient methods (the gradient projection method and the conditional gradient method) without assuming that the function is strongly convex (or even convex) and for not necessarily convex sets. The function is assumed to be differentiable with Lipschitz continuous gradient.
Bibliography: 29 titles.
Keywords: support strong convexity condition, support weak convexity condition, gradient projection method, conditional gradient method, nonsmooth analysis.
Funding agency: Russian Science Foundation. Grant number: 22-11-00042.
This research was carried out at the Trapeznikov Institute of Control Sciences of Russian Academy of Sciences with the financial support of the Russian Science Foundation (grant no. 22-11-00042), https://rscf.ru/en/project/22-11-00042/.
Received: 16.07.2023 and 30.12.2023
Document Type: Article
MSC: Primary 49J52, 90C26; Secondary 46N10
Language: English
Original paper language: Russian

§ 1. Introduction

1.1.

Let $\mathcal Q\subset\mathcal H$ be a closed convex subset of a real Hilbert space $\mathcal H$. For any point $x\in\mathcal H$ the metric projection $P_{\mathcal Q}x$ of $x$ onto $\mathcal Q$ is single-valued and Lipschitz continuous:

$$ \begin{equation*} \|P_{\mathcal Q}x-P_{\mathcal Q}x_0\|\leqslant 1\cdot\|x-x_0\|, \qquad x,x_0\in\mathcal H. \end{equation*} \notag $$
This Lipschitz continuity was apparently first pointed out by Phelps [1]. Later, continuity properties of the metric projection operator $P_{\mathcal Q}$ were studied in various classes of spaces and for various sets $\mathcal Q$. For smooth manifolds in $\mathcal H$ the Lipschitz constant of the contractive metric projection operator was estimated in [2], Corollary 4.2. Subsequently, various generalizations of convex sets (for example, proximally smooth sets) were introduced. The Lipschitz continuity of the metric projection (as a function of the point) for such sets in $\mathbb R^n$ and $\mathcal H$ was studied in [3] and [4]. For uniformly convex and uniformly smooth Banach spaces Björnestål [5] showed that the metric projection onto any closed subspace is locally Hölder continuous as a function of the point.

The stability properties of the metric projection operator as a function of a set were also studied. Daniel [6] showed that for convex closed bounded sets $\mathcal Q,\mathcal Q_0\subset\mathcal H$ the distance between the sets of best approximants $P_{\mathcal Q}0$ and $P_{\mathcal Q_0}0$ behaves as $(h(\mathcal Q,\mathcal Q_0))^{1/2}$ in the Hausdorff metric $h(\mathcal Q,\mathcal Q_0)$. Elementary examples in $\mathbb R^2$ (see, for instance, [7], p. 190) show that the exponent $1/2$ of $h(\mathcal Q,\mathcal Q_0)$ is sharp. Berdyshev [8] extended the result in [6] to the case of uniformly convex Banach spaces. His result explained, in particular, the square root in Daniel’s estimate: it is the inverse function of the modulus of convexity in a Hilbert space.

In uniformly convex and uniformly smooth Banach spaces the metric projection onto proximally smooth sets was shown to be Hölder continuous (as a function of the point and set); the corresponding estimates were obtained in terms of the moduli of convexity and smoothness (see [9]).

Nonexpansive (and, in particular, contractive) metric projections $\mathcal H\ni x\mapsto P_{\mathcal Q}x$ have many applications, one of the most important of which is related to gradient methods for the minimization of a function on a set and to conditions for their convergence.

Let $\mathcal Q\subset\mathbb R^n$ be a closed set and $f\colon \mathbb R^n\to\mathbb R$ be a function with Lipschitz continuous gradient. Gradient methods, like the gradient projection method

$$ \begin{equation} x_1\in\mathcal Q, \qquad x_{k+1}\in P_{\mathcal Q}(x_k-\alpha_k f'(x_k)), \quad k\geqslant 1, \quad \alpha_k>0, \end{equation} \tag{1.1} $$
and the conditional gradient method
$$ \begin{equation} \begin{gathered} \, x_1\in\mathcal Q,\qquad z_k\in\operatorname{Arg}\max_{x\in\mathcal Q}(-f'(x_k),x), \\ x_{k+1}\in\operatorname{Arg}\min_{x\in [x_k,z_k]}f(x),\qquad k\geqslant 1, \end{gathered} \end{equation} \tag{1.2} $$
are important algorithms for first-order minimization. Both methods are applied mainly to convex sets and functions; however, some recent studies relax the assumptions of convexity on the function, and, for the method (1.1), on the set.
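For readers who wish to experiment with these iterations, the following minimal Python sketch (an illustration added here, not part of the original exposition) implements one step of each method. The projection oracle proj_Q, the linear maximization oracle argmax_Q and the gradient grad_f are assumed to be supplied for the particular $\mathcal Q$ and $f$; the exact minimization over the segment $[x_k,z_k]$ in (1.2) is replaced by a crude grid search. The helper make_ball_oracles shows what these oracles look like in the simplest case $\mathcal Q=\mathcal B_R(c)$.

import numpy as np

def gradient_projection_step(x, grad_f, proj_Q, alpha):
    # One iteration of method (1.1): x_{k+1} = P_Q(x_k - alpha * f'(x_k)).
    return proj_Q(x - alpha * grad_f(x))

def conditional_gradient_step(x, f, grad_f, argmax_Q, n_grid=1001):
    # One iteration of method (1.2): z_k maximizes (-f'(x_k), .) over Q,
    # then x_{k+1} minimizes f over the segment [x_k, z_k] (here by a grid search).
    z = argmax_Q(-grad_f(x))
    ts = np.linspace(0.0, 1.0, n_grid)
    vals = np.array([f(x + t * (z - x)) for t in ts])
    return x + ts[int(np.argmin(vals))] * (z - x)

def make_ball_oracles(c, R):
    # Oracles for the Euclidean ball Q = B_R(c) (a hypothetical illustration).
    def proj_Q(y):
        d = y - c
        nd = np.linalg.norm(d)
        return y.copy() if nd <= R else c + R * d / nd
    def argmax_Q(p):
        # support point of B_R(c) in direction p
        return c + R * p / np.linalg.norm(p)
    return proj_Q, argmax_Q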

It is important that, for sufficiently wide classes of problems, the above methods converge linearly to the solution (that is, geometrically, with ratio $<1$).

Levitin and Polyak [10] were the first to give linear estimates for the convergence rate of the methods (1.1) and (1.2) for a wide class of convex sets and functions. For the gradient projection method (1.1) linear convergence was proved for strongly convex functions $f$ and convex sets $\mathcal Q$. For the conditional gradient method (1.2) linear convergence was established for strongly convex sets $\mathcal Q$ and convex functions $f$. In both cases the gradient $f'$ was assumed to be Lipschitz continuous.

Let us mention some generalizations of [10]. The conditional gradient method (1.2) was shown to converge linearly if the set $\mathcal Q$ is locally strongly convex at the solution $x_0\in\mathcal Q$ of problem (1.2). Local strong convexity, which is defined in terms of the local modulus of convexity (see [11]), means that there exists a positive constant $c$ such that $(x_0+x)/2+\mathcal B_{c\|x_0-x\|^2}(0)\subset\mathcal Q$ for each point $x\in\mathcal Q$, where $\mathcal B_r(0)$ is the closed Euclidean ball of radius $r$ with centre at the origin. For some generalizations and refinements of this result, see § 2.2.3 in [12]. In addition, the method (1.2) was shown to converge linearly for strongly convex and smooth $f$ (see Theorem 2.29 in [12]). In [13] and [14] linear convergence was established for a modification of method (1.2) for strongly convex sets $\mathcal Q$ and functions $f$ which may not be convex but whose gradient is Lipschitz continuous. Note that this modification does not involve minimization over the interval, that is, it was assumed that $x_{k+1}=z_k$. This method was shown to converge linearly under a certain relation between the strong convexity constants, the Lipschitz constants and so on.

Some advances in the gradient projection method were made by using the idea of a sharp minimum. (For historical remarks on the sharp minimum condition in geometric approximation theory, see [15].) In [16], formula (4), the sharp minimum condition was employed in linear estimates for the convergence rate of subgradient methods for nonsmooth convex functions. Subsequently, the sharp minimum approach was applied to verify the linear convergence of gradient methods for weakly convex functions (such functions may be nonconvex); see [17] and [18].

Several important results on linear convergence for the gradient projection method (1.1) were obtained under the Polyak–Łojasiewicz and other metric-regularity type conditions, in particular, on smooth manifolds (see [19]–[22]).

In [23] it was shown that the gradient projection method (1.1) for $x_1\in\mathcal L$ converges linearly for sufficiently small $\alpha_k=\alpha>0$, where $\mathcal Q\subset \mathcal H$ is proximally smooth with constant $K>0$ and the function $f$ is strongly convex with constant $\varkappa>0$ such that ${L}/{\varkappa}<K$, where $L=\sup_{x\in\mathcal L}\|f'(x)\|$, on some lower Lebesgue set $\mathcal L$ of level $>\inf_{x\in\mathcal Q}f(x)$ of the function $f$. In this result the relaxation of the convexity of $\mathcal Q$ is compensated by the strong convexity of $f$ and a restrictive convexity assumption on the Lebesgue sets of $f$, which is ‘stronger’ than the ‘nonconvexity’ of $\mathcal Q$. In particular, it can be shown that the Lebesgue sets of $f$ lying in $\mathcal L$ are strongly convex with radius ${L}/{\varkappa}$.

Note that the results of [23] depend substantially on the Lipschitz continuity of the metric projection onto $\mathcal Q$ in a neighbourhood of $\mathcal Q$: $\|P_{\mathcal Q}x_0-P_{\mathcal Q}x_1\|\leqslant ({K}/(K- \rho))\|x_0- x_1\|$ for all $x_0$ and $x_1$ lying at a distance at most $\rho<K$ from $\mathcal Q$.

Since the constant ${K}/(K-\rho)$ is close to 1 for small $\rho>0$, the method (1.1) can be shown to be contractive by standard arguments involving the strong convexity of the function $f$.

On the other hand it is known that the strong convexity of $\mathcal Q$ (see [24]), or even its relaxation in the form of a certain support condition (see [25]), implies that the operator $P_{\mathcal Q}$ is contractive on the set of points sufficiently distant from $\mathcal Q$. In particular, this property can imply the linear convergence of gradient methods.

The present paper is motivated primarily by the following question: what minimal geometrical properties of the set $\mathcal Q$ ensure that the metric projection $P_{\mathcal Q}$ onto this set is contractive (in some sense)? In parallel, we solve the convergence problem for the gradient methods (1.1) and (1.2). The function $f$ is assumed to have a Lipschitz continuous gradient. In particular, $f$ may be nonconvex.

We need two definitions. The first definition, which was introduced earlier by this author (see [25]), is a substantial relaxation of the strong convexity condition. The second definition is a certain relaxation of the proximal smoothness condition.

In § 2 we give refined estimates for the Lipschitz continuity of the metric projection $P_{\mathcal Q}$ for sets satisfying one of these definitions. We show, in particular, that the metric projection onto a set satisfying the first definition is locally Lipschitz continuous with constant $<1$. It is shown using examples that the Lipschitz constants for the metric projection $P_{\mathcal Q}$ are sharp. We also prove some extension of Daniel’s result from [6] on the continuity of the metric projection in the Hausdorff metric (as a function of a convex bounded closed subset of a Hilbert space $\mathcal H$) to the case of fairly general subsets of $\mathcal H$ which are not necessarily convex.

From the contractive properties of $P_{\mathcal Q}$ we obtain sufficient conditions for the convergence of gradient methods for not necessarily convex functions and sets (see § 3 and § 4). The corresponding results for convex sets are obtained by taking the limit.

In the concluding section, § 6, we compare briefly the above local strong convexity condition from [11] with our Definition 1, and show that some estimates available in the convex case can be obtained from the quadratic growth of the function.

1.2. Some definitions and notation

The inner product of points $x$ and $y$ in the real Euclidean space $\mathbb R^n$ is denoted by $(x,y)$; as usual, $\|x\|=\sqrt{(x,x)}$. We also set $\mathcal B_R(a)=\{x\in\mathbb R^n\mid \|x-a\|\leqslant R\}$. The same notation is used for a real Hilbert space $\mathcal H$.

Let $e_1,\dots,e_n$ be the standard orthonormal basis of $\mathbb R^n$. Given a set $\mathcal P\subset\mathcal H$ and a point $x\in\mathcal H$, we set $\varrho(x,\mathcal P)=\inf_{z\in \mathcal P}\|x-z\|$.

In what follows $\operatorname{co} \mathcal P$ is the convex hull of a set $\mathcal P$, $\partial \mathcal P$ is its boundary, and $\operatorname{int}\mathcal P$ is its interior.

The Hausdorff distance between closed subsets $\mathcal P$ and $\mathcal Q$ of $\mathcal H$ is defined by

$$ \begin{equation*} \begin{aligned} \, h(\mathcal P,\mathcal Q) &=\inf\bigl\{\varepsilon>0\mid \mathcal P\subset \mathcal Q+\mathcal B_{\varepsilon}(0),\, \mathcal Q\subset \mathcal P+\mathcal B_{\varepsilon}(0)\bigr\} \\ &=\max\Bigl\{\max_{x\in \mathcal P}\varrho(x,\mathcal Q),\, \max_{x\in \mathcal Q}\varrho(x,\mathcal P)\Bigr\}. \end{aligned} \end{equation*} \notag $$
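For finite sets of points both expressions above coincide and are easy to evaluate directly. The following short Python routine (an added illustration, not part of the paper) computes $h(\mathcal P,\mathcal Q)$ for two finite point sets given as arrays.

import numpy as np

def hausdorff(P, Q):
    # P and Q are arrays of shape (m, n) and (k, n) listing the points of two finite sets.
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)   # d[i, j] = ||P[i] - Q[j]||
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# For P = {0, 1} and Q = {0, 3} on the real line this returns h(P, Q) = 2.
print(hausdorff(np.array([[0.0], [1.0]]), np.array([[0.0], [3.0]])))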

The support function of a set $\mathcal Q\subset\mathbb R^n$ is given by the equality

$$ \begin{equation*} s(p,\mathcal Q)=\sup_{x\in \mathcal Q}(p,x),\qquad p\in\mathbb R^n. \end{equation*} \notag $$
The normal cone at a point $x\in \mathcal Q$ of a convex closed set $\mathcal Q\subset\mathbb R^n$ is
$$ \begin{equation*} N(\mathcal Q,x)=\{p\in\mathbb R^n\mid (p,x)=s(p,\mathcal Q)\}. \end{equation*} \notag $$
The support set of a (not necessarily convex) compact set $\mathcal Q\subset\mathbb R^n$ in a direction $p\ne 0$ is defined by
$$ \begin{equation*} \mathcal Q (p)=\{x\in \mathcal Q\mid (p,x)=s(p,\mathcal Q)\} \end{equation*} \notag $$
Note that $\mathcal Q (p) \subset (\operatorname{co}\mathcal Q)(p)$ for an arbitrary compact subset $\mathcal Q$ of $\mathbb R^n$ and a point ${p\ne 0}$ (see Lemma 1.18.1 and Theorem 1.18.3 in [7]). For a convex closed set $\mathcal Q$ the set $\mathcal Q(p)$ is the subdifferential of the support function $s(\,\cdot\,,\mathcal Q)$ at $p$. The subdifferential of a convex function $f\colon \mathbb R^n\supset D\to\mathbb R$ at a point $x_0\in\mathbb R^n$ is defined by
$$ \begin{equation*} \partial f(x_0)=\{p\in\mathbb R^n\mid f(x)\geqslant f(x_0)+(p,x-x_0)\ \forall\, x\in D\} \end{equation*} \notag $$
(this set can be empty). If $\mathcal Q(p)$ is not a singleton, then, by a slight abuse of notation, we also denote an arbitrary element of it by $\mathcal Q(p)$.
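For a finite set $\mathcal Q\subset\mathbb R^n$ the support function and the support set can be computed directly; the short Python sketch below (added for illustration only) does exactly this.

import numpy as np

def support_function(p, Q):
    # s(p, Q) = max_{x in Q} (p, x) for a finite set Q given as an (m, n) array of points.
    return (Q @ p).max()

def support_set(p, Q, tol=1e-12):
    # Q(p) = {x in Q : (p, x) = s(p, Q)}, returned as the rows attaining the maximum.
    vals = Q @ p
    return Q[vals >= vals.max() - tol]

# Example: for the square with vertices (+-1, +-1) and p = (1, 0) we get
# s(p, Q) = 1, and Q(p) consists of the two vertices (1, 1) and (1, -1).
Q = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
p = np.array([1.0, 0.0])
print(support_function(p, Q), support_set(p, Q))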

A nonempty set $Q\subset\mathcal H$ is said to be strongly convex with radius $R$ if $Q=\bigcap_{x\in \mathcal X}\mathcal B_R(x)$ for some set $\mathcal X\subset\mathcal H$. Equivalent characterizations of strong convexity for a convex closed bounded subset $Q$ of $\mathcal H$ can be found in Theorems 4.2.7 and 4.3.3 in [7] and in [3].

A closed set $\mathcal Q\subset\mathcal H$ is called proximally smooth with constant $K>0$ if the distance function $\varrho (x,\mathcal Q)$ is continuously differentiable on $\mathcal U_{\mathcal Q}(K)=\{x\in\mathcal H\colon 0<\varrho (x,\mathcal Q)<K\}$. Conditions equivalent to the proximal smoothness of $\mathcal Q\subset\mathcal H$ with constant $K$ can be found in [3] and [4].

We say that a vector $\mathcal H\ni p\ne 0$ is a proximal normal of a closed set $\mathcal Q\subset\mathcal H$ at a point $x\in\mathcal Q$ if $P_{\mathcal Q}(x+tp)=x$ for some $t>0$. Given $x\in\mathcal Q$, the set of proximal normals, together with the origin $0\in\mathcal H$, forms a cone (which may consist of $0$ alone); we denote it by $N_P(\mathcal Q,x)$.
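For example, if $\mathcal Q=\mathcal B_1(0)\subset\mathcal H$ and $x\in\partial\mathcal B_1(0)$, then $P_{\mathcal Q}(x+tx)=x$ for every $t>0$, so $N_P(\mathcal Q,x)=\{\lambda x\colon \lambda\geqslant 0\}$; at an interior point of a closed set the cone of proximal normals reduces to $\{0\}$.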

Definition 1 (see [25]). We say that a closed set $\mathcal Q\,{\subset}\,\mathcal H$ satisfies the support strong convexity condition with radius $R>0$ for a vector $p_0\ne 0$, $p_0 \in \mathcal H$, if $\mathcal Q(p_0)=\operatorname{Arg}\max_{x\in\mathcal Q}(p_0,x)$ is a singleton and

$$ \begin{equation} \mathcal Q\subset \mathcal B_{R}\biggl(\mathcal Q(p_0)-R\frac{p_0}{\|p_0\|}\biggr). \end{equation} \tag{1.3} $$
For brevity, we denote this condition by $\operatorname{sc}$, or by $\operatorname{sc}(p_0,R)$ if we need to specify $p_0$ and $R$.
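For example, the ball $\mathcal B_R(c)\subset\mathcal H$ satisfies $\operatorname{sc}(p_0,R)$ for every $p_0\ne 0$: in this case $\mathcal Q(p_0)=c+R{p_0}/{\|p_0\|}$ is a singleton, and the ball on the right-hand side of (1.3) is $\mathcal B_R(c)$ itself, so that (1.3) holds with equality. Thus Definition 1 isolates, for a single direction $p_0$, the supporting ball property which a strongly convex set of radius $R$ enjoys in every direction.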

Definition 2. We say that a closed set $\mathcal Q\subset\mathcal H$ satisfies the support weak convexity condition with positive constant $K$ for a point $x_0\in\mathcal Q$ and a vector $p_0\ne 0$, $p_0\in N_P(\mathcal Q,x_0)$, if

$$ \begin{equation} \mathcal Q\cap \operatorname{int}\mathcal B_{K}\biggl(x_0+K\frac{p_0}{\| p_0\|}\biggr)=\varnothing. \end{equation} \tag{1.4} $$
For brevity, we denote this condition by $\operatorname{wc}$, or by $\operatorname{wc}(x_0,p_0,K)$ if we need to specify $x_0$, $p_0$ and $K$.
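For example, the sphere $\mathcal Q=\partial\mathcal B_K(c)\subset\mathcal H$ satisfies $\operatorname{wc}(x_0,p_0,K)$ at every point $x_0\in\mathcal Q$ for the inward proximal normal $p_0=(c-x_0)/\|c-x_0\|$: in this case $x_0+K{p_0}/{\|p_0\|}=c$ and $\mathcal Q\cap\operatorname{int}\mathcal B_K(c)=\varnothing$. For a convex closed set $\mathcal Q$ condition (1.4) holds for every $K>0$ and every proximal normal $p_0$, since the open ball in (1.4) is contained in the open half-space $\{x\in\mathcal H\colon (p_0,x-x_0)>0\}$, which is disjoint from $\mathcal Q$; this corresponds to taking $K=+\infty$ in Remark 1 below.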

§ 2. Local Lipschitz continuity of the metric projection

Theorem 1. Let the closed set $\mathcal Q\subset\mathcal H$ satisfy the support condition $\operatorname{sc}(p_0,R)$, let $\|p_0\|=1$, let $r>0$, $x_0=\mathcal Q(p_0)+rp_0$, and let $x\in\mathcal H$ be such that $\varrho (x,\mathcal Q)=\rho \geqslant 0$ and $P_{\mathcal Q}x\ne\varnothing$. Next, let $a_0=P_{\mathcal Q}x_0=\mathcal Q(p_0)$, and fix some $a\in P_{\mathcal Q}x$. Suppose that the support condition $\operatorname{wc}(a,x-a,K)$ is satisfied at $a\in\mathcal Q$. Then

$$ \begin{equation} \|a_0-a\|\leqslant \frac{2R}{2R+r-({R}/{K})\rho}\|x_0-x\|. \end{equation} \tag{2.1} $$

Proof. Let $\rho>0$. The support condition $\operatorname{sc}$ reads
$$ \begin{equation*} \biggl\|a_0-\frac{x_0-a_0}{r}R-a\biggr\|\leqslant R. \end{equation*} \notag $$
By squaring we obtain
$$ \begin{equation*} \begin{aligned} \, &\|a_0-a\|^2 -\frac{2R}{r}(x_0-a_0,a_0-a) \\ &\qquad=\|a_0-a\|^2 -\frac{2R}{r}(x_0-a_0+a-a,a_0-a)\leqslant 0. \end{aligned} \end{equation*} \notag $$
Hence
$$ \begin{equation} (2R+r)\|a_0-a\|^2\leqslant 2R (x_0-a,a_0-a). \end{equation} \tag{2.2} $$
The support condition $\operatorname{wc}$ reads
$$ \begin{equation*} \biggl\|a+\frac{x-a}{\rho}K-a_0\biggr\|\geqslant K. \end{equation*} \notag $$
By squaring we obtain
$$ \begin{equation*} \|a_0-a\|^2 +\frac{2K}{\rho}(x-a,a-a_0)\geqslant 0. \end{equation*} \notag $$
Hence
$$ \begin{equation} (x-a,a_0-a)\leqslant \frac{\rho}{2K}\|a_0-a\|^2. \end{equation} \tag{2.3} $$

From (2.2) it follows that

$$ \begin{equation*} \begin{aligned} \, (2R+r)\|a_0-a\|^2 &\leqslant 2R (x_0-a,a_0-a) \\ &\leqslant 2R (x_0-x,a_0-a)+2R (x-a,a_0-a). \end{aligned} \end{equation*} \notag $$
Now an appeal to (2.3) shows that
$$ \begin{equation*} \begin{aligned} \, (2R+r)\|a_0-a\|^2 &\leqslant 2R (x_0-a,a_0-a) \\ &\leqslant 2R (x_0-x,a_0-a)+\frac{R\rho}{K}\|a_0-a\|^2. \end{aligned} \end{equation*} \notag $$
Hence
$$ \begin{equation*} \begin{aligned} \, \biggl(2R+r-\frac{R\rho}{K}\biggr)\|a_0-a\|^2 &\leqslant 2R (x_0-x,a_0-a) \\ &\leqslant 2R \|x_0-x\|\,\| a_0-a\|. \end{aligned} \end{equation*} \notag $$

If $\rho=0$, then $x=a$, and from (2.2) we obtain

$$ \begin{equation*} \begin{aligned} \, (2R+r)\|a_0-a\|^2 &\leqslant 2R (x_0-a,a_0-a) \\ &=2R (x_0-x,a_0-a)\leqslant 2R\|x_0-x\|\,\|a_0-a\|. \end{aligned} \end{equation*} \notag $$
So estimate (2.1) also holds for $\rho=0$.

This proves the theorem.

Remark 1. If the set $\mathcal Q$ in Theorem 1 is convex, then $K=+\infty$, and the estimate in the theorem assumes the form

$$ \begin{equation} \|a_0-a\|\leqslant \frac{2R}{2R+r}\|x_0-x\|. \end{equation} \tag{2.4} $$

Remark 2. The result of Theorem 1 does not change if we assume that $\|x_0- a_0\|\geqslant r$ (and $(x_0-a_0)/(\|x_0-a_0\|)=p_0$) and $\varrho (x,\mathcal Q)\leqslant \rho$. This is an immediate consequence of (2.1).

Remark 3. Note that for each closed subset $\mathcal Q$ of $\mathcal H$ the condition $\operatorname{wc}(a,x-a,\rho)$ holds automatically: the open ball in (1.4) is then $\operatorname{int}\mathcal B_{\rho}(x)$, which does not meet $\mathcal Q$ since $\varrho(x,\mathcal Q)=\rho$. Therefore Theorem 1 can be applied with $K=\rho$, without assuming any support condition $\operatorname{wc}$ at $a\in\mathcal Q$. Hence

$$ \begin{equation} \|a_0-a\|\leqslant \frac{2R}{R+r}\|x_0-x\|. \end{equation} \tag{2.5} $$

Under the hypotheses of Theorem 1 let $\mathcal Q=\mathcal Q_0$. Next, let $\mathcal Q_1$ be another closed set. Assume that $\mathcal Q_0$ and $\mathcal Q_1$ are convex. Then

$$ \begin{equation*} \|a_0-a\|\leqslant \|a_0-b\|+\|b-a\|, \end{equation*} \notag $$
where $a=P_{\mathcal Q_1}x$ and $b=P_{\mathcal Q_0}x$. By Theorem 1 for $r=\|x_0-a_0\|$ and $\rho=\|x-b\|$, we have
$$ \begin{equation*} \|a_0-b\|\leqslant \frac{2R}{2R+r}\|x_0-x\|, \end{equation*} \notag $$
and so $\|a-b\|$ is of the order of $\sqrt{h(\mathcal Q_0,\mathcal Q_1)}$ (see [6]). For example, from formula (2.1.1) in [7] we obtain $\|a-b\|\leqslant 2\sqrt{2Rh(\mathcal Q_0,\mathcal Q_1)}$, where $R>0$ is such that $\mathcal Q_0\cup\mathcal Q_1\subset \mathcal B_R(x)$ for some $x$. It can be shown that similar estimates also hold for the metric projection onto an arbitrary closed subset of $\mathcal H$ under the condition $\operatorname{wc}$.

Theorem 2. Let the closed set $\mathcal Q_0\subset\mathcal H$ satisfy the support weak convexity condition $\operatorname{wc}(a_0,p_0,K)$, let $\|p_0\|=1$, let $r\in (0,K)$, $t\in (0,r]$, $x_0=a_0+tp_0$, $a_0=P_{\mathcal Q_0}x_0 $, and let $\mathcal Q\subset \mathcal H$ be a closed set such that $h(\mathcal Q_0,\mathcal Q)<r$ and $x\in\mathcal H$ be a point such that $\varrho (x,\mathcal Q)\leqslant r$ and $P_{\mathcal Q}x\ne\varnothing$. Next, let $a\in P_{\mathcal Q}x$ be an arbitrary point. Then

$$ \begin{equation} \|a_0-a\|\leqslant \frac{2K}{K-r}\|x_0-x\|+\sqrt{\frac{4Krh(\mathcal Q_0,\mathcal Q)}{K-r}}. \end{equation} \tag{2.6} $$

Proof. Let $r>h_1>h=h(\mathcal Q_0,\mathcal Q)$, and let $b\in\mathcal Q_0$ be such that $\|a-b\|\leqslant h_1$. Then condition $\operatorname{wc}$ reads
$$ \begin{equation*} \biggl\|a_0+\frac{x_0-a_0}{\|x_0-a_0\|}K-b\biggr\|\geqslant K. \end{equation*} \notag $$
Hence
$$ \begin{equation*} \biggl\|a_0+\frac{x_0-a_0}{\|x_0-a_0\|}K-a\biggr\|\geqslant K-h_1. \end{equation*} \notag $$
By squaring and setting $\rho_0=\|x_0-a_0\|\leqslant r$ we obtain
$$ \begin{equation*} \begin{gathered} \, \|a_0-a\|^2+\frac{2K}{\rho_0}(a_0-a,x_0-a_0)+K^2\geqslant (K-h_1)^2, \\ \|a_0-a\|^2+\frac{2K}{\rho_0}(a_0-a,x_0-a_0+a-a)+K^2\geqslant (K-h_1)^2 \end{gathered} \end{equation*} \notag $$
and
$$ \begin{equation*} \|a_0-a\|^2-\frac{2K}{\rho_0}\|a_0-a\|^2+\frac{2K}{\rho_0}(a_0-a,x_0-a)+K^2\geqslant (K-h_1)^2. \end{equation*} \notag $$
Since $\rho_0\leqslant r$, we find that
$$ \begin{equation} (2K-r)\|a_0-a\|^2\leqslant 2K(a_0-a,x_0-a)+2Krh_1. \end{equation} \tag{2.7} $$

Let us specify the choice of $h_1\in (h,r)$ in the following two cases.

Case 1: $\rho=\varrho (x,\mathcal Q) >0$, $\rho > h$. Let $h_1\in (h,\rho)$. Proceeding as above, we have

$$ \begin{equation} \begin{gathered} \, \notag \biggl\|a+\frac{x-a}{\|x-a\|}\rho-a_0\biggr\|\geqslant \rho-h_1, \\ \notag -\|a_0-a\|^2+2(a-a_0,x-a_0)+2\rho h_1\geqslant 0, \\ \begin{split} K\|a_0-a\|^2 &\leqslant 2K(a-a_0,x-a_0)+2K\rho h_1 \\ &\leqslant 2K(a-a_0,x-a_0)+2Kr h_1. \end{split} \end{gathered} \end{equation} \tag{2.8} $$
Adding (2.7) and (2.8) we obtain
$$ \begin{equation*} \begin{gathered} \, (3K-r)\|a_0-a\|^2\leqslant 2K(x_0-x+a_0-a,a_0-a)+4Krh_1, \\ \begin{aligned} \, \|a_0-a\|^2 &\leqslant \frac{2K}{K-r}(x_0-x,a_0-a)+\frac{4Krh_1}{K-r} \\ &\leqslant \frac{2K}{K-r}\|x_0-x\|\, \|a_0-a\|+ \frac{4Kr}{K-r}h_1. \end{aligned} \end{gathered} \end{equation*} \notag $$
Letting $h_1\to h$ in the estimate
$$ \begin{equation*} \begin{aligned} \, \biggl( \|a_0-a\|-\frac{2K}{2(K-r)}\|x_0-x\|\biggr)^2 &\leqslant \biggl( \frac{2K}{2(K-r)}\| x_0-x\|\biggr)^2+\frac{4Kr}{K-r}h_1 \\ & \leqslant \biggl(\frac{2K}{2(K-r)}\|x_0-x\|+\sqrt{\frac{4Kr}{K-r}h_1}\,\biggr)^2, \end{aligned} \end{equation*} \notag $$
we prove the theorem in Case 1.

Case 2: $0<\rho=\varrho (x,\mathcal Q)\leqslant h$. Let $h_1\in (h,r)$ and $z=a+((x-a)/\|x-a\|)\rho=x$, and let $b\in \mathcal Q$ be a point such that $\|a_0-b\|\leqslant h_1$. Since $\|z-b\|\geqslant \rho$ and $\|a_0-b\|\leqslant h_1$, we have

$$ \begin{equation*} \|z-a_0+a_0-b\|\geqslant \rho\quad\text{and} \quad \|z-a_0\|^2+\|a_0-b\|^2\geqslant \frac{\rho^2}{2}. \end{equation*} \notag $$
Hence, substituting $z=x$, we find that
$$ \begin{equation*} \|a-a_0\|^2+2(a-a_0,x-a)+\rho^2+h_1^2\geqslant \frac{\rho^2}{2}. \end{equation*} \notag $$
Using the condition $\rho<h_1$ we write this inequality as
$$ \begin{equation*} \|a-a_0\|^2+2(a-a_0,x-a_0)-2\|a_0-a\|^2+2h_1^2\geqslant \frac{\rho^2}{2}. \end{equation*} \notag $$
This eventually gives us
$$ \begin{equation*} \|a_0-a\|^2\leqslant 2(a-a_0,x-a_0)+2h_1^2 \end{equation*} \notag $$
and
$$ \begin{equation*} K\|a_0-a\|^2\leqslant 2K(a-a_0,x-a_0)+2Kh_1^2. \end{equation*} \notag $$
By the assumptions of the theorem $h_1<r$, and therefore
$$ \begin{equation*} K\|a_0-a\|^2\leqslant 2K(a-a_0,x-a_0)+2Krh_1. \end{equation*} \notag $$
Adding the last inequality and (2.7) we obtain
$$ \begin{equation*} (3K-r)\|a_0-a\|^2\leqslant 2K(a_0-a,x_0-x+a_0-a)+4Krh_1. \end{equation*} \notag $$
Using this inequality and proceeding as in Case 1 we obtain estimate (2.6) from the theorem. In the case when $\varrho (x,\mathcal Q)=0$ estimate (2.6) can be derived from (2.7) and the equality $x=a$.

This proves the theorem.

The following corollary to Theorem 2 generalizes Daniel’s result (see [6]) to the nonconvex setting.

Corollary 1. Let $\mathcal Q_0\,{\subset}\,\mathcal H$ be a closed set satisfying the support weak convexity condition $\operatorname{wc}(a_0,p_0,K)$, let $\|p_0\|=1$, let $r\in (0,K)$, $t\in (0,r]$, $x_0=a_0+tp_0$, $a_0=P_{\mathcal Q_0}x_0$, and let $\mathcal Q\subset \mathcal H$ be a closed set such that $h(\mathcal Q_0,\mathcal Q)<r$ and $\varrho (x_0,\mathcal Q)\leqslant r$. Assume that $P_{\mathcal Q}x_0\ne \varnothing$. Then the following estimate for the metric projection holds:

$$ \begin{equation*} \|a_0-a\|\leqslant \sqrt{\frac{4Krh(\mathcal Q_0,\mathcal Q)}{K-r}} \quad \forall\, a\in P_{\mathcal Q}x_0. \end{equation*} \notag $$

Lemma 1. Let the convex closed set $\mathcal Q\subset\mathcal H$ satisfy the condition $\operatorname{sc}(p,R)$, and let $\|p\|=1$. Then for each unit vector $q\in\mathcal H$ and an arbitrary element of the support set $\mathcal Q(q)$ (which we also denote by $\mathcal Q(q)$)

$$ \begin{equation} \|\mathcal Q(p)-\mathcal Q(q)\|\leqslant 2R \|p-q\|. \end{equation} \tag{2.9} $$

Proof. Fix a unit vector $q$ and a point $\mathcal Q(q)$. Without loss of generality we assume that $\mathcal Q (p)-Rp=0$.

Let $(p,q)\geqslant 0$. The point $\mathcal Q(q)$ lies in the set $\mathcal S=\mathcal B_R(0)\cap H_q^+$ (where $H_q^+= \{x\in\mathcal H\colon (q,x)\geqslant (q,\mathcal Q(p))\}$), which is a spherical segment. Let $\xi=[0,Rq]\cap\partial H_q^+$. Then $\|\xi-Rp\|\leqslant R\|q-p\|$ (indeed, $\xi$ is the point of $\partial H_q^+$ nearest to $Rq$, while $Rp\in\partial H_q^+$, so $\|\xi-Rp\|\leqslant\|Rq-Rp\|$ by the Pythagorean theorem) and $\|\mathcal Q(q)-\mathcal Q(p)\|\leqslant \operatorname{diam}\mathcal S=2\|\xi-Rp\|$ (since $(p,q)\geqslant 0$ and the diameter of $\mathcal S$ coincides with the base of the segment).

If $(p,q)<0$, then $\|p-q\|^2=2-2(p,q)>2$, $\|p-q\|>\sqrt{2}$ and

$$ \begin{equation*} \frac{\|\mathcal Q(p)-\mathcal Q(q)\|}{\|p-q\|}\leqslant \frac{2R}{\sqrt{2}}=\sqrt{2}\, R, \end{equation*} \notag $$
which proves the lemma.

Let us show that the above Lipschitz constants are sharp.

We start with the Lipschitz constant $2R$ in Lemma 1. Let $R>0$. In $\mathbb R^2$ we set $p=(0,1)=e_2$ and $q=(\sin\varphi,\cos\varphi)$ for small $\varphi>0$. Consider the set $\mathcal Q=\mathcal B_R(0)\cap \{x\in\mathbb R^2\colon (q,x)\leqslant R(q,e_2)\}$ (this is a disc without a circular segment). It is a simple exercise in plane geometry to check that the base of the removed segment has length $2R\sin\varphi$. In addition, we have $\mathcal Q(p)=Re_2$. We augment $\mathcal Q$ with a point from the removed segment that lies strictly above the straight line $\{x\in\mathbb R^2\colon (q,x)= R(q,e_2)\}$ and is close to the point $R(\sin 2\varphi,\cos 2\varphi)$ in the following sense: the distance between these points behaves as $\varphi^2$ with respect to $\varphi$. Let $\widetilde{\mathcal Q}$ be the convex hull of the set $\mathcal Q$ and the added point; note that by construction $\widetilde{\mathcal Q}(q)$ is the added point. We have $\|p-q\|\sim \varphi$ and $\|\widetilde{\mathcal Q}(p)-\widetilde{\mathcal Q}(q)\|\sim 2R\varphi$ as $\varphi\to +0$.

Let us show that the estimate in Theorem 1 is sharp for the convex set $\mathcal Q$ from the above example (see Remark 1). Let $r>0$ be fixed. Let $x_0=(R+r)e_2$, $a_0=Re_2=P_{\mathcal Q}x_0$, $a=R(\sin 2\varphi,\cos 2\varphi)$ and $x=a+rq\sim ((2R+r)\varphi, (R+r))$ as $\varphi\to +0$. Note that $a=P_{\mathcal Q}x$. We have $\|a-a_0\|\sim 2R\varphi$, and $\|x-x_0\|\sim (2R+r)\varphi$ as $\varphi \to +0$.

Let us prove that the Lipschitz constant $2R/(R+r)$ in (2.5) is sharp. Let $R>\varepsilon>0$ and $r>0$. Consider the set $\mathcal Q=\{a_0,a\}$ in $\mathbb R^2$, where $a_0=Re_2$ and $a=-Re_2$, and let $x_0=(R+r)e_2$ and $x=-\varepsilon e_2$. We have $P_{\mathcal Q}x_0=a_0$, $P_{\mathcal Q}x=a$, $\|a_0-a\|=2R$, $\| x_0-x\|=R+r+\varepsilon$, and the condition $\operatorname{sc}(x_0-a_0,R)$ is met.

Now let us show that the inequality in Theorem 2 is sharp. In the convex case $\|a_0-a\|$ behaves as $\sqrt{h}$ with respect to $h=h(\mathcal Q_0,\mathcal Q)$, and so the inequality in Theorem 2 is sharp. Let $K>r>0$ and $\varepsilon>0$. Consider the set $\mathcal Q=\{a_0,a\}$ in $\mathbb R^2$, where $a_0=-e_2K$ and $a=e_2(K+\varepsilon)$. Let $x_0=-e_2(K-r)$ and $x=\varepsilon e_2$. Note that $a_0=P_{\mathcal Q}x_0$ and $a=P_{\mathcal Q}x$. We have $\|a_0-a\|=2K+\varepsilon$ and $\| x_0-x\|=K-r+\varepsilon$, which proves the sharpness of the Lipschitz constant. Note that here the condition $\operatorname{wc}(a_0,x_0-a_0,K)$ is met.

Remark 4. If $P_{\mathcal Q}x=\varnothing$ in Theorem 1 or 2, then there exists a sequence $\{a_k\}\subset\mathcal Q$ such that $\lim_{k\to\infty}\|x-a_k\|=\varrho (x,\mathcal Q)$, and the upper limit $\limsup_{k\to\infty}\|a_0-a_k\|$ is majorized either by the right-hand side of (2.5) in Theorem 1 or the right-hand side of (2.6) in Theorem 2.

Let $x$ be a point in Theorem 1 such that $P_{\mathcal Q}x=\varnothing$. For each closed subset of $\mathcal H$ the set of points with single-valued metric projection onto this set is dense in $\mathcal H$ (see [26]), so there exists a sequence $\{x_k\}$, $x_k\to x$ as $k\to\infty$, such that $P_{\mathcal Q}x_k$ is a singleton for each $k$. We set $P_{\mathcal Q}x_k=a_k$. Let $C=2R/(R+r)$. Then in the general case, for each point $x_k$ we have $\|a_0-a_k\|\leqslant C\|x_0-x_k\|$ (see Remark 3) and

$$ \begin{equation*} \limsup_{k\to\infty}\|a_0-a_k\|\leqslant C\limsup_{k\to\infty}\|x_0-x_k\|=C\|x_0-x\|. \end{equation*} \notag $$

The case of Theorem 2 is considered similarly.

§ 3. Application to the conditional gradient method

In $\mathbb R^n$ consider the problem

$$ \begin{equation} \min_{x\in\mathcal Q}f(x) \end{equation} \tag{3.1} $$
for a compact set $\mathcal Q$ and a Lipschitz differentiable function $f$.

Theorem 3. Let $\mathcal Q\subset\mathbb R^n$ be a closed set, and let $x_0\in \partial \mathcal Q$ be a solution of problem (3.1). Let $f\colon \mathbb R^n\to\mathbb R$ be such that $\|f'(x)\|\geqslant m>0$ for all $x\in\mathcal Q$, the gradient $f'$ is Lipschitz continuous with constant $L_1>0$, and the set $\mathcal Q$ satisfies the condition $\operatorname{sc}(-f'(x_0),R)$. Also let $2RL_1/{m}<1$. Then the iterations $x_1\in\mathcal Q$, $x_{k+1}\in \operatorname{Arg}\max_{x\in\mathcal Q}(-f'(x_k),x)$ converge to $x_0$ linearly:

$$ \begin{equation} \|x_0-x_{k+1}\|\leqslant \frac{2RL_1}{m}\|x_0-x_{k}\|. \end{equation} \tag{3.2} $$

Proof. Note that the condition $\operatorname{sc}(-f'(x_0),R)$ is also satisfied for the convex compact set $\operatorname{co}\mathcal Q$. Using the equality
$$ \begin{equation*} \|x_0-x_{k+1}\|=\|\mathcal Q(-f'(x_0))-\mathcal Q(-f'(x_k))\|, \end{equation*} \notag $$
the relations $\mathcal Q(-f'(x_k))\in (\operatorname{co}\mathcal Q)(-f'(x_k))$, $\mathcal Q(-f'(x_0)) = (\operatorname{co}\mathcal Q)(-f'(x_0))$ and Lemma 1 we have the following estimate:
$$ \begin{equation*} \begin{aligned} \, \|\mathcal Q(-f'(x_0))-\mathcal Q(-f'(x_k))\| &\leqslant 2R\biggl\|\frac{f'(x_0)}{\|f'(x_0)\|}-\frac{f'(x_k)}{\|f'(x_k)\|}\biggr\| \\ &\leqslant \frac{2R}{m}\|f'(x_0)-f'(x_k)\|\leqslant \frac{2RL_1}{m}\|x_0-x_k\|, \end{aligned} \end{equation*} \notag $$
which proves the theorem.

Note that in Theorem 3, unlike the previously known results, neither the function nor the set is assumed to be convex.
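As a simple numerical illustration of Theorem 3 (an added sketch, not taken from the paper), take $\mathcal Q=\mathcal B_R(0)\subset\mathbb R^2$ with $R=0.1$ and $f(x)=\frac12\|x-b\|^2$ with $b=(2,0)$; then one can take $m=1.9$, $L_1=1$, so that $2RL_1/m<1$, the solution is $x_0=Rb/\|b\|$, and the iteration of Theorem 3 reduces to $x_{k+1}=R(b-x_k)/\|b-x_k\|$. (This particular $f$ happens to be convex; the theorem itself does not require convexity.)

import numpy as np

R, b = 0.1, np.array([2.0, 0.0])
grad_f = lambda x: x - b                      # f(x) = ||x - b||^2 / 2, so L_1 = 1
x_star = R * b / np.linalg.norm(b)            # minimizer of f on Q = B_R(0)

x = np.array([0.0, R])                        # arbitrary starting point x_1 in Q
for k in range(10):
    p = -grad_f(x)                            # direction -f'(x_k)
    x = R * p / np.linalg.norm(p)             # x_{k+1} = Argmax_{x in Q} (-f'(x_k), x)
    print(k + 1, np.linalg.norm(x - x_star))  # the errors decrease geometrically, cf. (3.2)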

§ 4. Application to the gradient projection method

Again, we consider problem (3.1) in $\mathbb R^n$. Let us present some results on the convergence of the gradient projection method (1.1) for various types of functions and sets. In all cases the algorithm will be shown to converge linearly.

In the next theorem we consider two cases of a convex and a nonconvex function $f$ with Lipschitz continuous gradient. We also assume that the set $\mathcal Q$ is proximally smooth with constant $K>0$. The result for convex sets $\mathcal Q$ is obtained by letting $K\to +\infty$.

Theorem 4. Let $\mathcal Q\subset\mathbb R^n$ be a compact set, and let $x_0\in \partial \mathcal Q$ be a solution of problem (3.1). Next, let $f\colon \mathbb R^n\to\mathbb R$ be a convex function with $L_1$-Lipschitz continuous gradient $f'$, where $L_1>0$ and $f'(x_0)\ne 0$, and let $\mathcal Q$ satisfy the condition $\operatorname{sc}(-f'(x_0),R)$. In addition, let $\mathcal Q$ be proximally smooth with constant $K>0$. Set $L=\max_{x\in\mathcal Q}\|f'(x)\|$ and assume that $\| f'(x_0)\|>{RL}/{K}$. Then the algorithm (1.1) with $\alpha_k=\alpha\in ( 0,2/{L_1})$ converges linearly with rate

$$ \begin{equation} \|x_0-x_{k+1}\|\leqslant \frac{2R}{2R+\alpha\|f'(x_0)\|-\alpha L({R}/{K})}\|x_0-x_{k}\|. \end{equation} \tag{4.1} $$

If the function $f$ is not convex, then the convergence is linear with rate

$$ \begin{equation} \|x_0-x_{k+1}\|\leqslant \frac{2R(1+\alpha L_1)}{2R+\alpha\|f'(x_0)\|-\alpha L({R}/{K})}\| x_0-x_{k}\|, \end{equation} \tag{4.2} $$
provided that $2RL_1+{RL}/{K}<\|f'(x_0)\|$ and $\alpha_k=\alpha>0$.

Proof. Setting
$$ \begin{equation*} \beta_k=\frac{2R}{2R+\alpha\|f'(x_0)\|-({R}/{K})\varrho (x_k-\alpha f'(x_k),\mathcal Q)}, \end{equation*} \notag $$
by (2.1) we have
$$ \begin{equation} \begin{aligned} \, \notag \|x_{k+1}-x_0\|^2 &=\|P_{\mathcal Q}(x_k-\alpha f'(x_k))-P_{\mathcal Q}(x_0-\alpha f'(x_0))\|^2 \\ \notag &\leqslant \beta_k^2\|(x_k-x_0)-\alpha (f'(x_k)-f'(x_0))\|^2 \\ &=\beta_k^2\bigl( \|x_k-x_0\|^2- 2\alpha (f'(x_k)-f'(x_0),x_k-x_0) \notag \\ &\qquad+\alpha^2\|f'(x_k)-f'(x_0)\|^2\bigr). \end{aligned} \end{equation} \tag{4.3} $$
Since $f$ is convex and the gradient $f'$ is $L_1$-Lipschitz continuous, we have the estimate (see [7], Lemma 1.19.5, (iii))
$$ \begin{equation*} L_1 (f'(x_k)-f'(x_0),x_k-x_0)\geqslant \|f'(x_k)-f'(x_0)\|^2. \end{equation*} \notag $$
Hence
$$ \begin{equation*} \|x_{k+1}-x_0\|^2\leqslant \beta_k^2\bigl( \|x_k-x_0\|^2- \alpha(2-\alpha L_1) (f'(x_k)-f'(x_0),x_k-x_0)\bigr). \end{equation*} \notag $$
The subdifferential of a convex function is monotone, and so, since $0<\alpha<{2}/{L_1}$, we have the inequality $\alpha(2-\alpha L_1) (f'(x_k)-f'(x_0),x_k-x_0)\geqslant 0$, which implies that
$$ \begin{equation*} \|x_{k+1}-x_0\|\leqslant \beta_k \|x_k-x_0\|\leqslant \frac{2R}{2R+\alpha\|f'(x_0)\|-\alpha L({R}/{K})}\|x_0-x_{k}\|. \end{equation*} \notag $$

Let $f$ be nonconvex. Using the notation of the theorem, by (4.3) we have

$$ \begin{equation*} \|x_{k+1}-x_0\|^2\leqslant \beta_k^2 (1+\alpha L_1)^2\|x_k-x_0\|^2\leqslant \beta^2 (1+\alpha L_1)^2\|x_k-x_0\|^2, \end{equation*} \notag $$
where
$$ \begin{equation*} \beta=\frac{2R}{2R+\alpha\|f'(x_0)\|-\alpha L({R}/{K})}. \end{equation*} \notag $$
From the condition
$$ \begin{equation*} 2RL_1+\frac{RL}{K}<\|f'(x_0)\| \end{equation*} \notag $$
we obtain $\beta (1+\alpha L_1)<1$, which proves the theorem.

In the next theorem $\mathcal Q\subset\mathbb R^n$ is an arbitrary compact set.

Theorem 5. Let $\mathcal Q\subset\mathbb R^n$ be a compact set, and let $x_0\in \partial \mathcal Q$ be a solution of problem (3.1). Next, let $f\colon \mathbb R^n\to\mathbb R$ be a convex function with $L_1$-Lipschitz continuous gradient $f'$, $L_1>0$, let $f'(x_0)\ne 0$, and let $\mathcal Q$ satisfy the condition $\operatorname{sc}(-f'(x_0),R)$. Assume that $2\|f'(x_0)\|>RL_1$. Then the algorithm (1.1) converges linearly with rate

$$ \begin{equation} \|x_0-x_{k+1}\|\leqslant \frac{2R}{R+\alpha\|f'(x_0)\|}\|x_0-x_{k}\|, \end{equation} \tag{4.4} $$
where
$$ \begin{equation*} \alpha_k=\alpha\in \biggl(\frac{R}{\| f'(x_0)\|},\frac{2}{L_1}\biggr). \end{equation*} \notag $$

If the function $f$ is not convex, then the convergence is linear with rate

$$ \begin{equation} \|x_0-x_{k+1}\|\leqslant \frac{2R(1+\alpha L_1)}{R+\alpha\|f'(x_0)\|}\|x_0-x_{k}\|, \end{equation} \tag{4.5} $$
provided that $2RL_1<\|f'(x_0)\|$ and
$$ \begin{equation*} \alpha_k=\alpha>\frac{R}{\|f'(x_0)\|-2RL_1}. \end{equation*} \notag $$

The proof is as in Theorem 4, with estimate (2.5) from Remark 3 used in place of (2.1).

Theorem 6. Under the assumptions of Theorem 4, assume that the function $f$ is not convex and the constant $L$ is unknown. Assume that an initial point $x_1\in\mathcal Q$ of the algorithm (1.1) is chosen so that

$$ \begin{equation*} \frac{R}{K}L_1\|x_1-x_0\|<\|f'(x_0)\|\biggl(1-\frac{R}{K}\biggr)-2RL_1. \end{equation*} \notag $$
Then the algorithm converges linearly with rate
$$ \begin{equation} \begin{aligned} \, &\|x_0-x_{k+1}\| \nonumber \\ &\quad \leqslant \frac{2R(1+\alpha L_1)}{2R+\alpha\|f'(x_0)\|-\alpha ({R}/{K})\| f'(x_0)\|-\alpha ({R}/{K})L_1\|x_1-x_0\|}\|x_0-x_{k}\| \end{aligned} \end{equation} \tag{4.6} $$
for an arbitrary $\alpha_k=\alpha>0$.

Proof. An appeal to Theorem 1 (formula (2.1)) shows that
$$ \begin{equation*} \begin{aligned} \, &\|x_{k+1}- x_0\| \\ &\, \leqslant\frac{2R}{2R+\alpha \|f'(x_0)\|-({R}/{K})\varrho(x_k-\alpha f'(x_k),\mathcal Q)}\|(x_k-\alpha f'(x_k))- (x_0-\alpha f'(x_0))\| \\ &\, \leqslant\frac{2R(1+\alpha L_1)}{2R+\alpha \|f'(x_0)\|-({R}/{K})\varrho(x_k-\alpha f'(x_k),\mathcal Q)}\|x_k-x_0\|. \end{aligned} \end{equation*} \notag $$
Next, from the estimate
$$ \begin{equation*} \varrho(x_k-\alpha f'(x_k),\mathcal Q)\leqslant \alpha \|f'(x_k)\|\leqslant \alpha \|f'(x_0)\|+\alpha L_1\|x_k-x_0\| \end{equation*} \notag $$
we have
$$ \begin{equation*} \|x_{k+1}- x_0\|\leqslant \frac{2R(1+\alpha L_1)}{2R+\alpha \|f'(x_0)\|(1-{R}/{K})-\alpha({R}/{K})L_1\| x_k-x_0\|}\|x_k- x_0\|. \end{equation*} \notag $$
It follows by induction from the last formula that the sequence $\{\|x_k-x_0\|\}$ is monotone decreasing: the choice of $x_1$ ensures that the factor in front of $\|x_k-x_0\|$ is less than $1$ for $k=1$, and this property is preserved at each subsequent step, which yields (4.6). This completes the proof of the theorem.

To conclude this section, we summarize in Table 1 the properties of linear convergence of the method (1.1) for a function $f$ with $L_1$-Lipschitz continuous gradient and an arbitrary compact set $\mathcal Q$ which is proximally smooth with constant $K>0 $. Here we assume that the set $\mathcal Q$ satisfies the condition $\operatorname{sc}(-f'(x_0),R)$, $f'(x_0)\ne 0$, where $x_0$ is a minimum point, and $L=\max_{x\in\mathcal Q}\|f'(x)\|$. The column ‘$P_{\mathcal Q}$’ refers to the relevant variant of Lipschitz continuity of the metric projection that is required for the proof of linear convergence.

Table 1

$f$ | $\mathcal Q$ | $P_{\mathcal Q}$ | Conditions for linear convergence | Step $\alpha_k=\alpha$
convex | proximally smooth with $K>0$ | (2.1) | $\|f'(x_0)\|>\frac{RL}{K}$ | $\alpha\in \bigl(0,\frac{2}{L_1}\bigr)$
nonconvex | proximally smooth with $K>0$ | (2.1) | $\|f'(x_0)\|>\frac{RL}{K}+2RL_1$ | $\alpha > 0$
convex | any | (2.5) | $2\|f'(x_0)\|>RL_1$ | $\alpha\in\bigl(\frac{R}{\|f'(x_0)\|},\frac{2}{L_1}\bigr)$
nonconvex | any | (2.5) | $\|f'(x_0)\|>2RL_1$ | $\alpha >\frac{R}{\|f'(x_0)\|-2RL_1}$

Note that in the second and fourth rows of this table the case of convex $f$ is also allowed. We also note that the corresponding result for convex sets $\mathcal Q$ is obtained by letting $K\to +\infty$.

The algorithm can fail to converge if the linear convergence condition (in the corresponding column of the table) is not satisfied. Let us give an example for the first row, when the function $f$ is convex and the set $\mathcal Q$ is proximally smooth. In $\mathbb R^2$ consider the function $f(\mathrm{x}_1,\mathrm{x}_2)=(\mathrm{x}_1^2+\mathrm{x}_2^2)/2$ and the set

$$ \begin{equation*} \mathcal Q=\biggl\{(\mathrm{x}_1,\mathrm{x}_2)\in\mathbb R^2\colon \mathrm{x}_1\leqslant 0,\, \mathrm{x}_1^2+\biggl(\mathrm{x}_2-\frac32\biggr)^2=\frac14\biggr\}. \end{equation*} \notag $$
The minimum of $f$ on $\mathcal Q$ is attained at $x_0=(0,1)$. The set $\mathcal Q$ satisfies the support strong convexity condition $\operatorname{sc}(-f'(x_0),R)$ for $f'(x_0) = (0,1)$ and $R=1/2$, the set $\mathcal Q$ is proximally smooth with constant $K=1/2$, and $L=\max_{x\in\mathcal Q}\|f'(x)\|=2$. We have $\|f'(x_0)\| = 1$, and the condition
$$ \begin{equation*} \|f'(x_0)\|>\frac{R}{K}L \end{equation*} \notag $$
is not satisfied. If the algorithm starts at $x_1=(0,2)$, then for all $\alpha\in (0,1/4)$
$$ \begin{equation*} x_2=P_{\mathcal Q}(x_1-\alpha f'(x_1))=P_{\mathcal Q}(0, 2-2\alpha)=x_1. \end{equation*} \notag $$
Thus, for $\alpha\in (0,1/4)$ the algorithm remains at $x_1$ forever and never approaches the minimizer $x_0$.
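This stalling behaviour is easy to reproduce numerically. In the following sketch (added here for illustration) the metric projection onto the arc is computed by brute force over a fine discretization; the iterate stays at $x_1=(0,2)$, whereas the minimizer is $x_0=(0,1)$.

import numpy as np

theta = np.linspace(0.0, np.pi, 100001)
arc = np.stack([-0.5 * np.sin(theta), 1.5 + 0.5 * np.cos(theta)], axis=1)  # discretized Q

def proj_Q(y):
    # brute-force metric projection onto the discretized arc
    return arc[np.argmin(np.linalg.norm(arc - y, axis=1))]

grad_f = lambda x: x                      # f(x_1, x_2) = (x_1^2 + x_2^2)/2
alpha = 0.2                               # any alpha in (0, 1/4)
x = np.array([0.0, 2.0])                  # starting point x_1 = (0, 2)
for k in range(5):
    x = proj_Q(x - alpha * grad_f(x))
    print(k + 1, x)                       # the iterate remains (0, 2); the minimizer is (0, 1)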

§ 5. The case of unbounded $\mathcal Q$

Consider, for simplicity, the case of a convex set $\mathcal Q$ and a convex function $f$. Let $\mathcal Q$ be unbounded and $x_0$ be a solution of problem (3.1), and assume that $f'(x_0)\ne 0$. Note that in the convex case $x_0$ is a point of global minimum. For $d> 0$ we set $\mathcal Q_d=\mathcal Q\cap\mathcal B_d(x_0)$. Assume that the set $\mathcal Q_d$ satisfies the condition $\operatorname{sc}(-f'(x_0),R)$. Let $L=\max_{x\in\mathcal Q_d}\|f'(x)\|$, and let $\alpha>0$ be such that $d>2\alpha L$. Consider $x_1\in \mathcal Q_{d-2\alpha L}$. Then for $x_2\in P_{\mathcal Q}(x_1-\alpha f'(x_1))$,

$$ \begin{equation*} \|x_2-x_1\|\leqslant 2\alpha \|f'(x_1)\|\leqslant 2\alpha L. \end{equation*} \notag $$
Hence $x_2\in\mathcal Q_d$, and from Theorem 4 for $K=+\infty$ we obtain
$$ \begin{equation*} \| x_2-x_0\|\leqslant\frac{2R}{2R+\alpha\|f'(x_0)\|} \|x_1-x_0\|< d-2\alpha L. \end{equation*} \notag $$
Continuing the argument, we find that $x_k\in\mathcal Q_d$ for all $k$, and the convergence of $x_k$ to $x_0$ is linear.

In the nonconvex setting one can present sufficient conditions for the convergence of the iterations to a local extremum.

§ 6. Conclusions

The conditions obtained for the convergence of gradient methods are predominantly of theoretical interest. The condition $\operatorname{sc}$ is generic, and for an arbitrary convex compact set it holds for Lebesgue almost all vectors $p\in\partial \mathcal B_1(0)\subset\mathbb R^n$, for some positive radius $R(p)>0$ (see [27], Theorem 4). In addition, for a convex set $\mathcal Q$ the condition $\operatorname{sc}$ is more general than the local strong convexity condition in [11] (see [25], conclusions). For a nonconvex set $\mathcal Q$ the local strong convexity condition (defined in terms of the modulus of convexity) ceases to be meaningful, in contrast to the condition $\operatorname{sc}$. This is why the condition $\operatorname{sc}$ is of interest; further motivation comes from the applications to gradient methods considered above.

Note that, for a Lipschitz differentiable convex function $f\colon \mathbb R^n\to\mathbb R$ and a convex compact set $\mathcal Q\subset \mathbb R^n$, an analogue of Theorem 4 can be obtained from the quadratic growth condition. Using the condition $\operatorname{sc}(-f'(x_0),R)$ we have

$$ \begin{equation*} \biggl\|x_0+\frac{f'(x_0)}{\|f'(x_0)\|}R-x\biggr\|\leqslant R \end{equation*} \notag $$
for each $x\in\mathcal Q$. Squaring, we find that
$$ \begin{equation*} \frac{\|f'(x_0)\|}{2R}\|x_0-x\|^2\leqslant (f'(x_0),x-x_0), \end{equation*} \notag $$
and now, since $f$ is convex, for all $x\in\mathcal Q$ we have
$$ \begin{equation*} f(x)-f(x_0)\geqslant (f'(x_0),x-x_0)\geqslant \frac{\|f'(x_0)\|}{2R}\|x_0-x\|^2. \end{equation*} \notag $$
The last condition is known as the quadratic growth condition. Note that, for a convex set $\mathcal Q$ and a convex Lipschitz differentiable function $f$ satisfying the quadratic growth condition on $\mathcal Q$, the gradient projection method with certain constant step $\alpha_k=\alpha > 0$ is known to converge linearly (see [28], Theorem 13). Some refinements of these results can be found in [29].


Bibliography

1. R. R. Phelps, “Convex sets and nearest points”, Proc. Amer. Math. Soc., 8 (1957), 790–797
2. T. Abatzoglou, “The metric projection on $C^2$ manifolds in Banach spaces”, J. Approx. Theory, 26:3 (1979), 204–211
3. J.-Ph. Vial, “Strong and weak convexity of sets and functions”, Math. Oper. Res., 8:2 (1983), 231–259
4. F. H. Clarke, R. J. Stern and P. R. Wolenski, “Proximal smoothness and the lower-$C^2$ property”, J. Convex Anal., 2:1–2 (1995), 117–144
5. B. O. Björnestål, “Local Lipschitz continuity of the metric projection operator”, Approximation theory, Papers, VIth semester, Stefan Banach Internat. Math. Center, Warsaw, 1975, Banach Center Publ., 4, PWN–Polish Sci. Publ., Warsaw, 1979, 43–53
6. J. W. Daniel, “The continuity of metric projections as functions of the data”, J. Approx. Theory, 12:3 (1974), 234–239
7. E. S. Polovinkin and M. V. Balashov, Elements of convex and strongly convex analysis, 2nd ed., Fizmatlit, Moscow, 2007, 438 pp. (Russian)
8. V. I. Berdyšev (Berdyshev), “Continuity of a multivalued mapping connected with the problem of minimizing a functional”, Math. USSR-Izv., 16:3 (1981), 431–456
9. G. E. Ivanov, “Sharp estimates for the moduli of continuity of metric projections onto weakly convex sets”, Izv. Math., 79:4 (2015), 668–697
10. E. S. Levitin and B. T. Polyak, “Constrained minimization methods”, U.S.S.R. Comput. Math. Math. Phys., 6:5 (1966), 1–50
11. V. M. Veliov, “On the convexity of integrals of multivalued mappings: applications in control theory”, J. Optim. Theory Appl., 54:3 (1987), 541–563
12. G. Braun, A. Carderera, C. W. Combettes, H. Hassani, A. Karbasi, A. Mokhtari and S. Pokutta, Conditional gradient methods, arXiv: 2211.14103v1
13. M. V. Balashov, “Maximization of a function with Lipschitz continuous gradient”, J. Math. Sci. (N.Y.), 209:1 (2015), 12–18
14. M. V. Balashov, B. T. Polyak and A. A. Tremba, “Gradient projection and conditional gradient methods for constrained nonconvex minimization”, Numer. Funct. Anal. Optim., 41:7 (2020), 822–849
15. A. V. Marinov, “On uniform constants of strong uniqueness in Chebyshev approximations and fundamental results of N. G. Chebotarev”, Izv. Math., 75:3 (2011), 603–630
16. B. T. Polyak, “Minimization of unsmooth functionals”, U.S.S.R. Comput. Math. Math. Phys., 9:3 (1969), 14–29
17. D. Davis, D. Drusvyatskiy, K. J. MacPhee and C. Paquette, Subgradient methods for sharp weakly convex functions, arXiv: 1803.02461v1
18. Xiao Li, Zhihui Zhu, A. Man-Cho So and J. D. Lee, Incremental methods for weakly convex optimization, arXiv: 1907.11687v1
19. R. Schneider and A. Uschmajew, “Convergence results for projected line-search methods on varieties of low-rank matrices via Łojasiewicz inequality”, SIAM J. Optim., 25:1 (2015), 622–646
20. P.-A. Absil, R. Mahony and R. Sepulchre, Optimization algorithms on matrix manifolds, Princeton Univ. Press, Princeton, NJ, 2008, xvi+224 pp.
21. M. V. Balashov, “The gradient projection algorithm for a proximally smooth set and a function with Lipschitz continuous gradient”, Sb. Math., 211:4 (2020), 481–504
22. M. V. Balashov, “Gradient projection method on matrix manifolds”, Comput. Math. Math. Phys., 60:9 (2020), 1403–1411
23. M. V. Balashov, “About the gradient projection algorithm for a strongly convex function and a proximally smooth set”, J. Convex Anal., 24:2 (2017), 493–500
24. M. V. Balashov and M. O. Golubev, “About the Lipschitz property of the metric projection in the Hilbert space”, J. Math. Anal. Appl., 394:2 (2012), 545–551
25. M. V. Balashov, “Sufficient conditions for the linear convergence of an algorithm for finding the metric projection of a point onto a convex compact set”, Math. Notes, 113:5 (2023), 632–641
26. N. V. Efimov and S. B. Stechkin, “Approximation properties of sets in normed linear spaces”, in: S. B. Stechkin, Selected papers, Fizmatlit, Moscow, 1998, 270–281 (Russian)
27. M. V. Balashov and K. Z. Biglov, “The strong convexity supporting condition and the Lipschitz condition for the metric projection”, Math. Notes, 115:2 (2024), 164–172
28. I. Necoara, Yu. Nesterov and F. Glineur, “Linear convergence of first order methods for non-strongly convex optimization”, Math. Program., 175:1–2(A) (2019), 69–107
29. Y. Bello-Cruz, Guoyin Li and T. T. A. Nghia, “On the linear convergence of forward-backward splitting method: Part I — Convergence analysis”, J. Optim. Theory Appl., 188:2 (2021), 378–401

Citation: M. V. Balashov, “Lipschitz continuity of the metric projection operator and convergence of gradient methods”, Sb. Math., 215:4 (2024), 494–510