Izvestiya: Mathematics, 2023, Volume 87, Issue 4, Pages 726–767
DOI: https://doi.org/10.4213/im9305e
 

Variations of $v$-change of time in an optimal control problem with state and mixed constraints

A. V. Dmitruk

Steklov Mathematical Institute of Russian Academy of Sciences, Moscow
Abstract: For a general optimal control problem with state and regular mixed constraints we propose a proof of the maximum principle based on the so-called $v$-change of time variable $t \mapsto \tau$, under which the original time becomes an additional state variable subject to the equation $dt/d\tau = v(\tau)$, while the additional control variable $v(\tau)\geqslant 0$ is piecewise constant, and its values become arguments of the new problem.
Keywords: state and mixed constraints, positively linearly independent vectors, $v$-change of time, Lebesgue–Stieltjes measure, stationarity conditions, Lagrange multipliers, functional on $L_\infty$, weak* compactness, maximum principle.
This work was supported by the Russian Science Foundation under grant no. 20-11-20169, https://rscf.ru/en/project/20-11-20169/.
Received: 20.12.2021
Revised: 31.08.2022
Russian version:
Izvestiya Rossiiskoi Akademii Nauk. Seriya Matematicheskaya, 2023, Volume 87, Issue 4, Pages 91–132
DOI: https://doi.org/10.4213/im9305
Document Type: Article
UDC: 517.97
MSC: 49K15, 49K27
Language: English
Original paper language: Russian

§ 1. Introduction

Optimal control problems with state and mixed constraints are widely used in theoretical and applied research. Their study was initiated in 1960 by Gamkrelidze [1], whose methods were developed in a number of subsequent papers (see, for example, [2]–[4]). However, it is well known that the extension of the Pontryagin maximum principle (MP) to such problems involves considerable difficulties due to the infinite (uncountable) number of inequality constraints. The principal difficulty here, which appears already in obtaining stationarity conditions (the Euler–Lagrange equation), consists in the characterization of the Lagrange multipliers corresponding to these constraints.

Dubovitskii and Milyutin [5] proposed to treat the state constraint $\Phi(t, x(t))\leqslant 0$ as an inclusion in the cone of non-positive functions in the space $C$ of continuous functions on a given time interval; in this case, the corresponding Lagrange multiplier is represented by an element of the dual space $C^*$, that is, by a Lebesgue–Stieltjes measure. Using this approach, necessary conditions for the weak minimum (that is, stationarity conditions) can be obtained rather simply (see, for example, [5]–[7]). The authors of [5]–[7] also proposed a method for reducing the initial problem to a family of auxiliary (associated) problems (see [8]), and, from the stationarity conditions in these problems, they obtained a generalization of the Pontryagin MP (that is, necessary conditions for the strong minimum) for problems with state constraints. This approach is based on the so-called $v$-change of the time variable, which will be described later (but other classes of variations can also be employed here).

By analogy with state constraints, it was proposed to treat the mixed constraints of the form $\varphi(t, x(t),u(t))\leqslant0$ as an inclusion in the cone of non-positive $L_\infty$-functions; with this proviso, the corresponding multipliers are elements of its dual space. However, in general, the characterization of such multipliers is not an easy task, because they may contain so-called singular components. In the case of regular mixed constraints (where their gradients with respect to $u$ are non-degenerate in a certain sense), it can be shown that the Lagrange multipliers contain no singular components and all lie in the space $L_1$. Therefore, in this case, regular mixed constraints are even simpler to handle in the formulation of optimality conditions than purely state constraints.

Based on the stationarity conditions thus obtained and using again the $v$-change of time, Dubovitskii and Milyutin also derived an MP for problems involving both state and regular mixed constraints. However, the proof of this result was published long after the discovery itself (see [6], Ch. 5), since at that time Dubovitskii and Milyutin concentrated on problems with general mixed constraints without the regularity assumption [9]–[11]. Other scholars started working on problems with regular mixed constraints in the mid-1970s (see [12]–[14]). In [15], the author of the present paper, then a postgraduate student of A. A. Milyutin, implemented his idea of using the so-called sliding modes (introduced earlier by Gamkrelidze [16] to prove existence of solutions to optimal control problems); he also gave a complete proof of an MP for problems with regular mixed constraints of equality and inequality types.

There have been relatively few other studies of optimality conditions for problems with constraints of this type (see [3], [17]–[23]). Usually, such studies either consider particular statements of the problem under more restrictive regularity assumptions or, vice versa, study generalizations of the problem involving non-smooth (Lipschitz-continuous) constraints, obtaining the corresponding versions of stationarity conditions and MPs by the machinery of non-smooth analysis. However, we believe that the smooth case is the most important and deserves special attention, the more so that an application of “non-smooth” conditions to the smooth case usually produces results rougher than the “smooth” ones.

The general $v$-change of time consists in a passage from the original time $t$ to a new time $\tau$ so that the original time $t= t(\tau)$ becomes an additional state variable subject to the equation $dt/d\tau = v(\tau)$, where $v(\tau)\geqslant 0$ is an additional control. A key point here is that this transformation is not one-to-one (on the intervals where $v(\tau)= 0$), and because of this, small variations of the control $v(\tau)$ generate non-small (Pontryagin-type) variations of the original control $u(t)$. However, this approach requires a deep knowledge of the theory of functions of a real variable.

At the end of the 1990s, A. A. Milyutin proposed a simplified version of the $v$-change of time with a piecewise constant function $v(\tau)$. Using this approach, he proved the maximum principle for a general Pontryagin-type problem, that is, for a problem with endpoint constraints, but without state and mixed constraints. In the case of a piecewise constant $v$-change of time, small variations of the control $v(\tau)$ generate, in fact, needle-like variations of the control $u(t)$ with a small but substantial difference from the usual ones. The advantages of $v$-change variations over the usual needle-like variations (packets of needles) are as follows:

a) they can be placed at any point $t$ of the given time interval, while the needle variations can work only at Lebesgue points of the optimal control $\widehat u(t)$ (see, for example, [24]–[26]);

b) the constraints of the new problem are defined, at least, in a whole neighbourhood of the parameters of the $v$-change, whereas the needle variations lead to a problem in which the functions are defined only on a non-negative orthant in a finite-dimensional space (to be exact, on its intersection with a neighbourhood of the origin) which corresponds to the needle widths $\varepsilon_i\geqslant 0$ in the given packet;

c) the problem constraints depend smoothly on the parameters of the $v$-change, whereas, for needle variations, differentiability of these constraints with respect to the needle width can be guaranteed only at $\varepsilon_i=0$.

The recent studies [27], [28] show that piecewise constant $v$-changes of time allow one to obtain an MP also in problems with state constraints. The purpose of the present paper is to show the feasibility of this approach for problems involving both state and mixed constraints. However, here the “generalized” needle variations alone are insufficient in the associated problem — one should also add uniformly small variations (to obtain the stationarity condition in the control $\overline H_u=0$), and so the problem is now posed in an infinite-dimensional space.

The general structure of the proof, as in [27], [28], is as follows. Piecewise constancy of the function $v(\tau)$ allows one to pass to a problem in which the arguments are the values of $v(\tau)$ on the intervals of its constancy, the values of the control $u$ on the intervals where $v(\tau)> 0$, and the initial value of the state variable $x(\tau_0)$. The presence of state and mixed constraints implies that this problem involves an infinite number of inequality constraints, that is, it is not a usual smooth problem. Nevertheless, optimality conditions in this problem are well known; the only specific feature of these conditions is that they involve support functionals to the cones of non-positive functions in the corresponding spaces. Applying these conditions and rewriting them in terms of the original problem, we obtain a family of corresponding collections of Lagrange multipliers, which form a non-empty compact set relative to a certain topology. Each element of this compact set (that is, a collection of Lagrange multipliers) guarantees the fulfilment of the maximum principle for a finite set of control values and times corresponding to the given $v$-change. The family of compact sets generated by all possible piecewise constant $v$-changes is partially ordered by inclusion, and hence forms a centred (Alexandroff-type) system. Taking an arbitrary element of their intersection, we obtain a universal optimality condition, that is, a collection of Lagrange multipliers that guarantees the fulfilment of the maximum principle for all values of the control and time.

The approach we propose here for obtaining an MP for problems with state and regular mixed constraints has an advantage over the one with sliding modes [15], [7] because the latter calls for a proof of a rather difficult (though interesting per se) relaxation theorem justifying the extension (convexification) of the control system by introducing the sliding modes [29], whereas the method of $v$-variations does not require this.

It is worth pointing out again that the idea of a passage to a family of associated problems in which optimality conditions are already known with a subsequent application of centred systems of compact sets is also due to Dubovitskii and Milyutin (see [8], [6], [30], [31]). This approach was already employed for obtaining an MP both in problems without state constraints [7], [32], [25] and in problems with such constraints [7], [27], [28].

§ 2. Statement of the problem and the maximum principle

Let $x(\,{\cdot}\,)\colon [t_0,t_1]\to\mathbb{R}^n$ be an absolutely continuous function (the state variable) and $u(\,{\cdot}\,)\colon [t_0,t_1]\to\mathbb{R}^r$ be a measurable bounded function (the control). The time interval $[t_0,t_1]$ is not fixed a priori. Consider the problem with Mayer type cost functional

$$ \begin{equation} \mathcal{J}:= F_0(t_0,x(t_0),t_1,x(t_1)) \to \min, \end{equation} \tag{2.1} $$
$$ \begin{equation} F(t_0,x(t_0),t_1,x(t_1))\leqslant 0, \qquad K(t_0,x(t_0),t_1,x(t_1))=0, \end{equation} \tag{2.2} $$
$$ \begin{equation} \dot x(t)= f(t,x(t),u(t)) \quad \text{a.e. on }\, [t_0,t_1], \end{equation} \tag{2.3} $$
$$ \begin{equation} \varphi(t,x(t),u(t)) \leqslant 0,\qquad g(t,x(t),u(t))=0 \quad \text{a.e. on }\, [t_0,t_1], \end{equation} \tag{2.4} $$
$$ \begin{equation} \Phi(t,x(t))\leqslant0 \quad \text{on } [t_0,t_1]. \end{equation} \tag{2.5} $$

Here, $F$, $K$, $f$, $\varphi$, $g$, $\Phi$ are vector functions of some dimensions, which, to save letters, we denote by $d(F)$, $d(K)$, etc. In constraints (2.2)–(2.5), we always use vector notation, which should be understood coordinatewise. The cost function $F_0$ assumes real values. The functions of the finite-dimensional argument $(t_0,x(t_0), t_1,x(t_1))$ are defined on an open set $\mathcal{P} \subset\mathbb{R}^{2n+2}$, and the functions depending on $(t,x,u)$ are defined on an open set $\mathcal{Q} \subset \mathbb{R}^{1+n+r}$. We assume that all these functions are smooth, that is, continuously differentiable with respect to their arguments. For brevity, problem (2.1)–(2.5) will be referred to as Problem $\mathrm{A}$.

Relations (2.2) are called endpoint (or terminal) constraints, (2.4) are known as mixed constraints, (2.5) are state constraints, and (2.3) is the control system.

In addition to the above smoothness assumptions, we will also assume that the mixed constraints are regular, that is, for any point $(t,x,u)\in \mathcal{Q}$ satisfying (2.4), the system of vectors

$$ \begin{equation} \varphi'_{iu}(t,x,u),\quad i\in I(t,x,u), \qquad g'_{ju}(t,x,u),\quad j=1,\dots, d(g), \end{equation} \tag{2.6} $$
is positively linearly independent. Here, $I(t,x,u) = \{i\mid \varphi_i(t,x,u) =0\}$ is the set of active indexes for the mixed inequality constraints.

Definition 1. A system of two collections of vectors $p_i$, $i\in I$, and $q_j$, $j\in J$, from $\mathbb{R}^r$, where $I$ and $J$ are some finite sets of indexes, is called positively linearly independent (PLI) if

$$ \begin{equation*} \sum_{i\in I} \alpha_i p_i + \sum_{j\in J} \beta_j q_j = 0 \end{equation*} \notag $$
does not hold for any non-trivial collection of coefficients $\alpha_i$, $i\in I$, and $\beta_j$, $j\in J$, where all $\alpha_i\geqslant 0$.

It is easily seen that this requirement is equivalent to saying that a) the vectors $q_j$ are linearly independent, and b) their linear hull does not intersect the convex hull of the vectors $p_i$. Sometimes, the following dual form of assumption b) is useful: there exists a vector $\overline u$ such that $(p_i,\overline u)<0$ and $(q_j,\overline u) =0$ for all $i,j$.
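By way of illustration, in $\mathbb{R}^2$ the system $p_1=(-1,0)$, $p_2=(-1,1)$, $q_1=(0,1)$ is PLI: the vector $\overline u=(1,0)$ satisfies $(p_1,\overline u)=(p_2,\overline u)=-1<0$ and $(q_1,\overline u)=0$. On the contrary, the pair $p_1=(1,0)$, $p_2=(-1,0)$ (with no vectors $q_j$) is not PLI, since $p_1+p_2=0$ is a non-trivial relation with non-negative coefficients $\alpha_1=\alpha_2=1$.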

Thus, the regularity assumption on the mixed constraints means that, at any point where they hold, the gradients in $u$ of the active inequality constraints and of all equality constraints are positively linearly independent.

Remark 1. In non-smooth problems, the assumption that system (2.6) is PLI is replaced by its non-smooth analog, which guarantees that any outer normal $(\alpha,\beta)$ to the set of first order admissible variations of the variables $(x,u)$ satisfies the estimate $|\alpha|\leqslant \mathrm{const}\,|\beta|$. Geometrically, this means that any support hyperplane to the graph of the set-valued mapping $x\mapsto U(t,x)$ corresponding to the mixed constraints is not close to vertical, that is, its slope is bounded. Because of this, this assumption is called the bounded slope condition (see, for example, [17]–[23]).

Remark 2. Note that state equality constraints $G(t,x)=0$ are not allowed, for otherwise the linearization of the equality constraints of the problem would not, in general, have a closed image — this condition is a basic requirement in obtaining first-order optimality conditions in all classes of optimization problems (see, for example, § 9.1). Such constraints should be differentiated with respect to $t$ and replaced by the mixed ones $G_t(t,x) + G_x(t,x)f(t,x,u) =0$ in the hope that their gradients with respect to $u$ together with (2.6) would be positively linearly independent.
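For example, for the simplest system $\dot x = u$ with $x,u\in\mathbb{R}^2$, the state equality constraint $G(x)= x_1^2+x_2^2-1 =0$ would be replaced by the mixed constraint
$$ \begin{equation*} G_x(x)f(x,u) = 2x_1u_1 + 2x_2u_2 =0, \end{equation*} \notag $$
whose gradient with respect to $u$, equal to $2(x_1,x_2)$, is non-zero on the circle $G(x)=0$, so the resulting mixed constraint is regular there.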

Remark 3. For now, we do not allow the traditional inclusion constraints of the type $u(t)\in U$. If a set $U\subset \mathbb{R}^r$ is given by smooth constraints of the form $\widetilde\varphi(u)\leqslant0$, $\widetilde g(u)=0$, then these constraints should be treated as mixed constraints together with (2.4), for otherwise the problem ceases to be smooth (and then one has to assume that the support vector to the set $U$ together with the gradients of the mixed constraints with respect to $u$ constitutes a PLI system), which we try to avoid because, in this case, the study is technically much more complicated. However, one fairly general class of problems with an inclusion constraint will be briefly discussed in § 7 below.
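For instance, the scalar constraint $u\in U=[-1,1]$ is covered by this scheme: writing it as the pair of mixed constraints $\widetilde\varphi_1(u)= u-1\leqslant 0$, $\widetilde\varphi_2(u)= -u-1\leqslant 0$, we note that at every admissible point at most one of them is active, and its gradient in $u$ (equal to $1$ or $-1$) forms a one-element PLI system, so the regularity assumption holds.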

Thus, Problem $\mathrm{A}$ is posed. A pair of functions $w(t)=(x(t), u(t))$ related by (2.3) together with the interval of their definition $[t_0, t_1]$ will be called a process of the problem. A process is called admissible if its endpoints $(t_0,x(t_0),t_1,x(t_1))$ belong to the set $\mathcal{P}$, there exists a compact set $D \subset \mathcal{Q}$ such that $(t,w(t))\in D$ for almost all $t$, and if all constraints of the problem are satisfied. As usual, we say that an admissible process $\widehat{w}(t)=(\widehat{x}(t), \widehat{u}(t)),$ $t\in [\widehat t_0, \widehat t_1]$, delivers a strong minimum if there exists $\varepsilon>0$ such that $\mathcal{J}(w) \geqslant \mathcal{J}(\widehat{w})$ for any admissible process $w(t)=(x(t), u(t))$, $t\in [t_0, t_1]$, such that

$$ \begin{equation*} |t_0-\widehat t_0|<\varepsilon,\quad |t_1-\widehat t_1|<\varepsilon, \qquad |x(t)-\widehat x(t)|<\varepsilon \quad \text{on } [t_0,t_1]\cap[\widehat t_0,\widehat t_1]. \end{equation*} \notag $$

We also need the following concept due to Dubovitskii and Milyutin (see [7], [31], [33]). An admissible process $\widehat{w}(t)=(\widehat{x}(t), \widehat{u}(t)),$ $ t\in [\widehat t_0, \widehat t_1]$, is said to provide a Pontryagin minimum in Problem $\mathrm{A}$ if, for any number $N,$ it delivers a local minimum with respect to the norm $\|x\|_C + \|u\|_1$ in the same problem with the additional constraint $|u(t)|\leqslant N;$ that is, if there exists an $\varepsilon>0$ such that $\mathcal{J}(w)\geqslant \mathcal{J}(\widehat{w})$ for any admissible process $w(t)=(x(t), u(t))$, $t\in [t_0, t_1]$ satisfying

$$ \begin{equation*} |t_0-\widehat t_0|< \varepsilon,\quad |t_1-\widehat t_1|< \varepsilon,\quad\; \|x - \widehat{x}\|_C <\varepsilon, \quad\; \|u -\widehat{u}\|_1 <\varepsilon, \quad \|u\|_\infty \leqslant N. \end{equation*} \notag $$
(Here, both norms are taken on the common interval of definition of the corresponding functions.)

It is clear that the Pontryagin minimum is intermediate between the weak and strong minima. In particular, this type of minimum enables both needle-type and uniformly small variations of the control.
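For instance, a needle variation that replaces $\widehat{u}(t)$ by a value $|u'|\leqslant N$ on an interval of width $\varepsilon$ satisfies $\|u -\widehat{u}\|_1 \leqslant 2N\varepsilon \to 0$ as $\varepsilon\to 0$ while keeping $\|u\|_\infty \leqslant N$; hence it is admissible for testing the Pontryagin minimum, though not for the weak one.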

Remark 4. If a reference process $(\widehat{x}(t), \widehat{u}(t))$ is given, then it suffices to assume regularity of the mixed constraints only along the trajectory $\widehat{x}(t)$, that is, it suffices that system (2.6) be PLI not for all above triples $(t,x,u)$, but only for triples of the form $(t,\widehat{x}(t),u)$.

To avoid the a priori degeneracy of the “standard” optimality conditions, we will assume that the endpoints of the reference process do not lie on the state boundary; more precisely, the following strict inequalities are assumed:

$$ \begin{equation} \Phi(\widehat t_0, \widehat{x}(\widehat t_0))<0, \qquad \Phi(\widehat t_1, \widehat{x}(\widehat t_1))<0. \end{equation} \tag{2.7} $$

To formulate necessary optimality conditions in Problem $\mathrm{A},$ we will need the following notation. We introduce the Pontryagin function

$$ \begin{equation*} H(t,x,u)=\psi_x f(t, x, u), \end{equation*} \notag $$
where $\psi_x$ is a row vector of dimension $n$ (sometimes, the argument $\psi_x$ in $H$ will be omitted); we also define the extended Pontryagin function
$$ \begin{equation*} \overline H(t,x,u)=\psi_x f(t, x, u) - \lambda\varphi(t,x,u) - mg(t,x,u) - \frac{d\mu}{dt}\, \Phi(t,x) \end{equation*} \notag $$
and the endpoint Lagrange function
$$ \begin{equation*} l(t_0,x_0,t_1,x_1)= (\alpha_0 F_0 + \alpha F + \beta K)(t_0,x_0,t_1,x_1), \end{equation*} \notag $$
where $\alpha_0$ is a number, $\alpha$, $\beta$ are row vectors of the same dimensions as $F$, $K,$ respectively (the arguments $ \alpha_0$, $\alpha$, $\beta\,$ in $l$ are omitted), $\lambda$, $m$ are row vectors of the same dimensions as $\varphi$, $g$, and $d\mu/dt$ is a row vector of the same dimension as $\Phi$.

Let $w=(x(t), u(t))$, $ t\in [t_0, t_1]$, be an admissible process for Problem $\mathrm{A}$. We will say that it satisfies the maximum principle if there exist a number $\alpha_0$, row vectors $\alpha\in\mathbb{R}^{d(F)}$, $\beta\in \mathbb{R}^{d(K)}$, measurable bounded functions $\lambda(t)$, $m(t)$ of dimensions $d(\varphi)$, $d(g),$ respectively, a non-decreasing function $\mu(t)$ of dimension $d(\Phi)$, functions of bounded variation $\psi_x(t)$, $\psi_t(t)$ of dimensions $n,\, 1,$ respectively (where $x, t$ are indexes, rather than notation for derivatives) such that:

$$ \begin{equation} \begin{aligned} \, &(\mathrm{i})\ \alpha_0\geqslant0,\quad \alpha\geqslant0,\quad \lambda(t)\geqslant0\quad \text{a.e. on }[t_0,t_1]; \nonumber \\ &(\mathrm{ii})\ \alpha_0+|\alpha| + \int_{t_0}^{t_1} \lambda(t)\,dt + \int_{t_0}^{t_1} d \mu(t) >0; \nonumber \\ &(\mathrm{iii})\ \alpha F(t_0,x(t_0),t_1,x(t_1))=0,\quad \lambda(t)\varphi(t,x(t),u(t))=0\quad\text{a.e. on }[t_0,t_1], \nonumber \\ &\qquad \Phi(t,x(t))\,d \mu(t) =0\quad \text{on }[t_0,t_1]; \nonumber \\ &(\mathrm{iv}_x)\ -\dot\psi_x(t)= \overline H_x(t,x(t), u(t)); \nonumber \\ &(\mathrm{iv}_t)\ -\dot\psi_t(t)=\overline H_t(t,x(t), u(t)); \nonumber \\ &(\mathrm{v}_x)\ \psi_x(t_0)=l_{x_0}(t_0,x(t_0),t_1,x(t_1)),\quad \psi_x(t_1)= -l_{x_1}(t_0,x(t_0),t_1,x(t_1)); \nonumber \\ &(\mathrm{v}_t)\ \psi_t(t_0)= l_{t_0}(t_0,x(t_0),t_1,x(t_1)),\quad \psi_t(t_1)= -l_{t_1}(t_0,x(t_0),t_1,x(t_1)); \nonumber \\ &(\mathrm{vi})\ \overline H_u(\psi_x(t),t,x(t), u(t))=0\quad \text{for almost all } t\in[t_0,t_1]; \nonumber \\ &(\mathrm{vii})\ H(\psi_x(t),t,x(t), u(t)) + \psi_t(t)=0\quad \text{for almost all } t\in[t_0,t_1]; \nonumber \\ &(\mathrm{viii})\ H(\psi_x(t\,{-}\,0),t,x(t), u')\,{+}\, \psi_t(t\,{-}\,0)\,{\leqslant}\, 0, \ H(\psi_x(t\,{+}\,0),t,x(t),u') \,{+}\,\psi_t(t\,{+}\,0)\,{\leqslant}\, 0 \nonumber \\ &\text{for all }t\in[t_0,t_1]\text{ and all }u'\text{ such that} \nonumber \end{aligned} \end{equation} \notag $$
$$ \begin{equation} (t,x(t), u') \in \mathcal{Q}, \qquad \varphi(t,x(t), u')\leqslant0, \qquad g(t,x(t), u') =0. \end{equation} \tag{2.8} $$
The set of all $u' \in \mathbb{R}^r$ satisfying constraints (2.8) will be denoted by $\mathcal{R}(t,x(t))$.

The functions $\psi_x(t)$ and $\psi_t(t)$ are known as the adjoint (costate) variables. For now, one does not need to specify from which side they are continuous, assuming only that at each point $t$ they have both left- and right-hand limits; these limits are equal at each continuity point (the discontinuity points form an at most countable set). The function $\mu(t)$ generates a Lebesgue–Stieltjes measure $d\mu(t)\geqslant0$ on $[t_0,t_1]$ with generalized density $d\mu(t)/dt$, and the third condition in (iii) means that $d\mu(t) =0$ on any interval where $\Phi(t,x(t))<0$. (As already mentioned, this pertains to every component of the vector $\Phi$ and the measure $d\mu$.) In particular, by assumption (2.7), $d\mu(t) =0$ in some neighbourhoods of the points $t_0$ and $t_1$. Note also that without loss of generality one can put $\mu(t_0)=0$.

Relations (i)–(vi) are known as the non-negativity condition, the non-triviality condition, the complementary slackness conditions, the adjoint (costate) equations, the transversality conditions, and the stationarity condition in the control, respectively. Relation (vii) can be called the law of energy dynamics, since, together with the costate equation $(\mathrm{iv}_t)$ for $\psi_t$, it yields an equation for the function $H$, which often plays the role of the energy in mechanical problems:

$$ \begin{equation*} \dot H = \overline H_t \quad \text{or} \quad \frac{dH}{dt}= \frac{\partial \overline H}{\partial t}. \end{equation*} \notag $$
(If the problem is time-independent, that is, the functions $f$, $g$, $\varphi$, and $\Phi$ do not depend on $t$, we get the energy conservation law: $\dot H =0$, that is, $H =\mathrm{const}$.)

Relation (viii) is obviously equivalent to $H(\psi_x(t),t,x(t), u')+ \psi_t(t) \leqslant 0$ at all continuity points of the functions $\psi_x$ and $\psi_t$. This and relation (vii) yield the maximality condition for the Pontryagin function: for almost all $t\in[t_0,t_1]$,

$$ \begin{equation} \max_{u' \in \mathcal{R}(t,x(t))} H(\psi_x(t),t,x(t), u') = H(\psi_x(t),t,x(t), u(t)), \end{equation} \tag{2.9} $$
thanks to which the entire set of relations (i)–(viii) is called the maximum principle. Note that here the maximum is taken over $u'$ from the above set $\mathcal{R}(t,x(t))$. In the absence of state and mixed constraints (2.4), (2.5), we have $\mathcal{R}(t,x(t))= \{ u'\mid (t,x(t),u') \in \mathcal{Q}\}$, and the multipliers $\lambda(t)=0$, $m(t)=0$, $d\mu(t) = 0$ vanish. So, we get the Pontryagin maximum principle for the general Lagrange problem of classical calculus of variations (2.1)–(2.3), that is, the Weierstrass condition.

Note that the function $\overline H$ appears in the relations involving differentiation with respect to one of the variables $t,x,u$, whereas the function $H$ is not differentiated in (i)–(viii) and (2.9).

The costate equations $(\mathrm{iv}_x)$–$(\mathrm{iv}_t)$ should be understood as equalities between the measures on $[t_0,t_1]$:

$$ \begin{equation*} \begin{aligned} \, d\psi_x(t) &= \bigl(-H_x(\psi_x(t),t,x(t), u(t)) \\ &\qquad +\lambda(t)\varphi_x(t,x(t), u(t)) + m(t)g_x(t,x(t), u(t))\bigr)\,d t + d \mu(t)\,\Phi_{x}(t,x(t)), \\ d\psi_t(t) &= \bigl(-H_t(\psi_x(t),t,x(t), u(t)) \\ &\qquad +\lambda(t)\varphi_t(t,x(t), u(t)) + m(t)g_t(t,x(t), u(t))\bigr)\,d t + d \mu(t)\,\Phi_{t}(t,x(t)). \end{aligned} \end{equation*} \notag $$
One can also write these equalities in an integral form, for example,
$$ \begin{equation*} \psi_x(t+0) =\psi_x(t_0) + \int_{t_0}^t (- H_x + \lambda \varphi_x + mg_x)\,ds + \int_{t_0}^{t+0}\Phi_{x}(s,x(s))\,d\mu(s), \end{equation*} \notag $$
and, similarly, for $\psi_x(t-0)$ and $\psi_t(t \pm 0)$.

The maximum principle is commonly regarded as a necessary condition for strong minimality. However, the following stronger assertion due to Dubovitskii and Milyutin holds (see, for example, [6], [11], [30]).

Theorem 1. If a process $\widehat{w}=(\widehat{x}(t), \widehat{u}(t))$, $t\in [\widehat t_0, \widehat t_1]$, delivers a strong minimum in Problem $\mathrm{A}$, then it satisfies the maximum principle (i)–(viii).

As mentioned in the introduction, we will provide a new relatively simple proof of this theorem. It is more convenient to give it not for the general Problem $\mathrm{A}$, but rather for its particular time-independent case.

§ 3. The autonomous Problem $\mathrm{B}$

Consider the following Problem $\mathrm{B}$ on a non-fixed interval $[t_0,t_1]$ (an autonomous case of Problem $\mathrm{A}$):

$$ \begin{equation} J:= F_0(x(t_0),x(t_1)) \to \min, \end{equation} \tag{3.1} $$
$$ \begin{equation} F(x(t_0),x(t_1))\leqslant0, \qquad K(x(t_0),x(t_1))=0, \end{equation} \tag{3.2} $$
$$ \begin{equation} \dot x(t)= f(x(t),u(t)), \end{equation} \tag{3.3} $$
$$ \begin{equation} \varphi(x(t),u(t)) \leqslant 0,\qquad g(x(t),u(t))=0, \end{equation} \tag{3.4} $$
$$ \begin{equation} \Phi(x(t))\leqslant0. \end{equation} \tag{3.5} $$

For this problem, the costate equation $(\mathrm{iv}_t)$ gives $\psi_t = \mathrm{const}$, and now the transversality condition $(\mathrm v_t)$ implies $\psi_t \equiv 0$; so, instead of $\psi_x$, we will simply write $\psi$. Thus, conditions (vii) and (viii) for Problem $\mathrm{B}$ take the form

$$ \begin{equation} \psi(t) f(x(t), u(t)) =0 \quad \text{a.e.}, \qquad \psi(t\pm 0) f(x(t), u') \leqslant 0 \quad \forall\, t, \end{equation} \tag{3.6} $$
where $u' \in \mathcal{R}(x(t))$. The remaining MP conditions do not change.

Even though Problem $\mathrm{B}$ is a particular case of Problem $\mathrm{A}$, any problem of type $\mathrm{A}$ can be reduced to the form of Problem $\mathrm{B}$. This can be done by the following simple trick. We augment the control system $\dot{x}=f(t,x,u)$ with the additional equation $dt/d\tau=1$, regarding $\tau$ as a new time variable ranging over some interval $[\tau_0,\tau_1]$, and the original time $t=t(\tau)$, as a new state variable. The functions $x(\,{\cdot}\,)$ and $u(\,{\cdot}\,)$ now also depend on the new time: $x=x(\tau)$, $ u=u(\tau)$. Thus, we have the following Problem $\mathrm{A}'$:

$$ \begin{equation} J= F_0(t(\tau_0),x(\tau_0),t(\tau_1),x(\tau_1))\to \min, \nonumber \end{equation} \notag $$
$$ \begin{equation} F(t(\tau_0),x(\tau_0),t(\tau_1),x(\tau_1)) \leqslant 0, \qquad K(t(\tau_0),x(\tau_0),t(\tau_1),x(\tau_1))=0, \nonumber \end{equation} \notag $$
$$ \begin{equation} \frac{dx}{d\tau}= f(t(\tau),x(\tau),u(\tau)), \qquad \frac{dt}{d\tau}=1, \end{equation} \tag{3.7} $$
$$ \begin{equation} \varphi(t(\tau),x(\tau),u(\tau)) \leqslant0, \qquad g(t(\tau),x(\tau),u(\tau)) =0, \end{equation} \tag{3.8} $$
$$ \begin{equation} \Phi(t(\tau),x(\tau))\leqslant 0, \end{equation} \tag{3.9} $$
where $t(\tau)$, $x(\tau)$ are state variables, $u(\tau)$ is the control, and $\tau\in [\tau_0,\tau_1]$ is a non-fixed time interval. Clearly, Problem $\mathrm{A}'$ is of type $\mathrm{B}$.

Problem $\mathrm{A}'$ is invariant with respect to shifting the time $\tau$, and hence one can fix an initial moment $\tau_0$, and then both admissible and optimal processes of Problems $\mathrm{A}$ and $\mathrm{A}'$ will obviously be in a one-one correspondence. Therefore, having obtained necessary optimality conditions for Problem $\mathrm{B}$, one can apply them to Problem $\mathrm{A}'$, thereby obtaining necessary conditions for Problem $\mathrm{A}$. The costate variable in Problem $\mathrm{A}'$ is the pair $(\psi_x, \psi_t)$, the Pontryagin function for system (3.7) is $\widetilde H = \psi_x f + \psi_t$, the “autonomous” conditions $\widetilde H(x,u) =0$ and $\widetilde H(x,u') \leqslant 0$ (see (3.6)) assume the form $\psi_x f(x,u) + \psi_t =0$ and $\psi_x f(x,u') + \psi_t \leqslant 0$, which are exactly conditions (vii) and (viii) in Theorem 1. The details of these transformations are left to the reader.
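To indicate one of these transformations: the original time $t$ is a state variable of Problem $\mathrm{A}'$ with costate $\psi_t$, so the transversality conditions of the autonomous problem applied to this variable give
$$ \begin{equation*} \psi_t(\tau_0)= l_{t_0}(t(\tau_0),x(\tau_0),t(\tau_1),x(\tau_1)), \qquad \psi_t(\tau_1)= -l_{t_1}(t(\tau_0),x(\tau_0),t(\tau_1),x(\tau_1)), \end{equation*} \notag $$
which is exactly condition $(\mathrm{v}_t)$ of Theorem 1.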

Let us now proceed with the proof of Theorem 1 for Problem $\mathrm{B}$. To this aim, we again convert the time into a state variable, but now setting $dt/d\tau= v(\tau)$, where the function $v(\tau)$ is non-negative (rather than positive everywhere), hence $t=t(\tau)$ is non-decreasing, but not necessarily strictly increasing. This non-invertible change, which transforms the time $t$ into a state variable, was proposed by Dubovitskii and used in his joint works with Milyutin [5], [11], and then in the works of Milyutin [6], [30] (see also [32]); they called it a $v$-change. A non-trivial point here is that small variations of the new control $v(\tau)$ generate needle-like variations of the original control $u(t)$. The simplest case of this $v$-change (with piecewise constant $v(\tau)$) will now be considered.

Since Problem $\mathrm{B}$ is invariant with respect to time shifting, we fix, for definiteness, an initial moment $t_0= \widehat t_0$.

In parallel with the set $\mathcal{R}(x) = \{u \mid (x,u)\in \mathcal{Q},\; \varphi(x,u)\leqslant0, \;g(x,u)=0\}$, we consider its subset $\mathcal{R}_0(x) = \{u\mid (x,u)\in \mathcal{Q},\; \varphi(x,u)<0, \; g(x,u)=0\}$. Note that, under our assumption of regularity of the mixed constraints, any point in $\mathcal{R}(x)$ is a limit point of $\mathcal{R}_0(x)$.

Lemma 1 (on density). The set $\mathcal{R}_0(x)$ is dense in $\mathcal{R}(x)$.

Proof. Consider any point $(x,u)$, where $u \in \mathcal{R}(x)$. Let $I$ be the corresponding set of active inequalities. By the assumption of positive linear independence of the gradients $\varphi_{iu}(x,u)$, $i\in I$, and $g_u(x,u)$, there is a vector $\overline u$ such that $\varphi_{iu}(x,u)\overline u<0$ for all $i\in I$ and $g_u(x,u)\overline u=0$. The last relation means that $\overline u$ is tangential to the surface $M(x) = \{u'\mid g(x,u')=0\}$ at the point $u$, that is, there exists a family of corrections $u_\varepsilon = o(\varepsilon)$ as $\varepsilon\to0+$ such that $u'_\varepsilon = u+ \varepsilon\overline u + u_\varepsilon \in M(x)$, that is, $g(x, u'_\varepsilon)=0$. Moreover, for $i\in I$ we have $\varphi_i(x, u'_\varepsilon) = \varphi_i(x,u) + \varphi_{iu}(x,u)\,\varepsilon\overline u + o(\varepsilon) <0$ for all small $\varepsilon>0$, while for $i\notin I$ the inequality $\varphi_i(x, u'_\varepsilon)<0$ holds by continuity. Thus, the points $u'_\varepsilon \in \mathcal{R}_0(x)$ converge to $u$, as required. $\Box$
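Note that the regularity assumption is essential in this lemma. For example, for the scalar constraints $\varphi_1(u)= u\leqslant 0$, $\varphi_2(u)= -u\leqslant 0$ (whose gradients $1$ and $-1$ do not form a PLI system at the only admissible point $u=0$), we have $\mathcal{R}(x)=\{0\}$, while $\mathcal{R}_0(x)=\varnothing$, so the density fails.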

3.1. Index $\theta$

Let $\widehat w= (\widehat{x}(t),\widehat{u}(t))$, $t\in [\widehat t_0,\widehat t_1]$, be an optimal process in Problem $\mathrm{B}$. With this process we associate a family of problems $\mathrm{B}^\theta$, which we construct below, and their optimal solutions labeled by some index $\theta$.

By the index we will mean a collection of time and control values

$$ \begin{equation*} \theta= \{(t^1,u^1),\dots,(t^{d},u^{d})\}, \end{equation*} \notag $$
where $d$ is an arbitrary natural number, $\widehat t_0 < t^1\leqslant \dots \leqslant t^{d} < \widehat t_1$, and the value $u^s\in \mathcal{R}_0(\widehat{x}(t^s))$ is arbitrary for any $s =1,\dots, d$. The index length $d = d(\theta)$ depends on $\theta$.

Let us define the interval $[\tau_0,\tau_1]$ as follows: we take the interval $[\widehat t_0,\widehat t_1]$, and at the points $t^1,\dots,t^{d(\theta)}$ we successively insert unit intervals, preserving, at each insertion, the position of the point $\widehat t_0$. As a result, we obtain the interval $[\tau_0,\tau_1]$ with the endpoints $\tau_0=\widehat t_0$, $\tau_1=\widehat t_1+ d(\theta)$, and the inserted intervals have the form

$$ \begin{equation*} \Delta^1=[t^1,\,t^1+1], \;\ \Delta^2=[t^2+1,\,t^2+2], \ \dots,\ \Delta^{d(\theta)} =[t^{d(\theta)}+(d(\theta)-1),\,t^{d(\theta)}+ d(\theta)]. \end{equation*} \notag $$
We next set
$$ \begin{equation*} E_0= \bigcup_{s=1}^{d(\theta)}\Delta^s,\qquad E_+ = [\tau_0,\tau_1]\setminus E_0, \end{equation*} \notag $$
and define the functions
$$ \begin{equation} v^\theta(\tau)= \begin{cases} 0, &\tau\in E_0, \\ 1, &\tau\in E_+, \end{cases} \qquad t^\theta(\tau)= \widehat t_0 + \int_{\tau_0}^\tau v^\theta(a)\,da, \quad \tau\in[\tau_0,\tau_1]. \end{equation} \tag{3.10} $$
We have
$$ \begin{equation*} \frac{dt^\theta(\tau)}{d\tau}=v^\theta(\tau),\qquad t^\theta(\tau_0)=\widehat t_0, \quad\; t^\theta(\tau_1)=\widehat t_1. \end{equation*} \notag $$
So, $t^\theta(\tau)$ is a piecewise linear non-decreasing function mapping $[\tau_0,\tau_1]$ onto $[\widehat t_0,\widehat t_1]$, and $\Delta^s$ are the intervals of its constancy with $t^\theta(\Delta^s)=t^s$, $\;s =1,\dots, d(\theta)$.
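For instance, if $d(\theta)=1$, so that the index consists of a single pair $(t^1,u^1)$, then $\tau_1 = \widehat t_1+1$ and
$$ \begin{equation*} t^\theta(\tau)= \begin{cases} \tau, &\tau\in[\widehat t_0,\,t^1], \\ t^1, &\tau\in \Delta^1=[t^1,\,t^1+1], \\ \tau-1, &\tau\in[t^1+1,\,\widehat t_1+1]. \end{cases} \end{equation*} \notag $$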

We next define

$$ \begin{equation} u^\theta(\tau)=\begin{cases} \widehat u(t^\theta(\tau)), &\tau\in E_+, \\ u^s, &\tau\in \Delta^s, \ s =1,\dots, d(\theta), \end{cases} \qquad x^\theta(\tau)=\widehat{x}(t^\theta(\tau)). \end{equation} \tag{3.11} $$
The function $u^\theta(\tau)$ is a bounded measurable function, and $x^\theta(\tau)$ is an absolutely continuous function satisfying
$$ \begin{equation*} \frac{dx^\theta(\tau)}{d\tau} = v^\theta(\tau)\, f(x^\theta(\tau),u^\theta(\tau)), \qquad x^\theta(\tau_0) = \widehat{x}(\widehat t_0), \quad x^\theta(\tau_1) = \widehat{x}(\widehat t_1), \end{equation*} \notag $$
that is, the endpoints of the new trajectory $x^\theta(\tau)$ coincide with those of the original $\widehat{x}(t)$. Moreover, $x^\theta(\tau) = \widehat{x}(t^s)$ on any inserted interval $\Delta^s$, and the new pair satisfies the mixed constraints (3.4) on the whole interval $[\tau_0,\tau_1]$, that is,
$$ \begin{equation} \begin{gathered} \, \varphi_i(x^\theta(\tau),u^\theta(\tau)) \leqslant 0, \qquad i=1,\dots,d(\varphi), \\ g_j(x^\theta(\tau),u^\theta(\tau)) =0, \qquad j=1,\dots,d(g), \end{gathered} \end{equation} \tag{3.12} $$
where on each $\Delta^s$ the inequalities are strict.

Note that some points $t^s$ may coincide: $t^{s'} = \dots = t^{s''} = t_*$; at any such point $t_*$, we successively insert several unit intervals, on each of which we set $v^\theta(\tau)=0$ and take the corresponding value $u^\theta(\tau)= u^s$.

The set $E_0$ is a finite union of the intervals $\Delta^s$, $s =1,\dots, d(\theta)$. The set $E_+$ is a finite union of intervals or half-open intervals. Consider the collection of all these intervals and half-open intervals of $E_0$ and $E_+$, order it, and denote its elements by $\sigma_k$, $k=1,\dots,m$. We have $[\tau_0,\tau_1]= \sigma_1 \cup \dots \cup \sigma_m$, where different $\sigma_k$ do not overlap. Let $\chi_k(\tau)$ be the characteristic function of the set $\sigma_k$, $k=1,\dots,m$.

3.2. The control system of index $\theta$

We will need the following simple fact.

Lemma 2. Let a point $(x^*,u^*)\in\mathbb{R}^{n+r}$ satisfy the conditions

$$ \begin{equation*} \varphi(x^*,u^*) <0, \qquad g(x^*,u^*) =0. \end{equation*} \notag $$
Then there exists a neighbourhood $\mathcal{O}(x^*)$ of the point $x^*$ and a smooth function $\mathcal{U}\colon \mathcal{O}(x^*)\,{\to}\, \mathbb{R}^r$ such that
$$ \begin{equation*} \varphi(x,\mathcal{U}(x)) <0, \quad\; g(x,\mathcal{U}(x)) =0 \quad\; \forall\, x\in \mathcal{O}(x^*), \end{equation*} \notag $$
and $\mathcal{U}(x^*) = u^*$.

Proof. Recall that by the assumption of regularity of the mixed constraints, the rank of the matrix $g'_u(x^*,u^*)$ is $d(g)$. Hence, the components of the vector $u$ can be split into two groups $u =(u_1,u_2)$ so that $\dim u_2= d(g)$ and the matrix $g'_{u_2}(x^*,u^*_1, u^*_2)$ is invertible. By the implicit function theorem, there exists a neighbourhood $\mathcal{O}(x^*,u^*_1)$ in which the equation $g(x, u_1, u_2)=0$ is resolved by a smooth function $u_2 = G(x, u_1)$, that is, $g(x, u_1, G(x, u_1))=0$ and $G(x^*, u_1^*) = u_2^*$.

Freezing here $u_1 =u_1^*$, we get a smooth function $u_2 = \widetilde G(x) = G(x, u_1^*)$ on the open set $\mathcal{O}(x^*) = \{x\mid (x,u_1^*) \in \mathcal{O}(x^*,u^*_1)\}$. By reducing this set, if necessary, we also obtain the inequality $\varphi(x,u_1^*,\widetilde G(x)) <0$. Now it remains to define $\mathcal{U}(x) = (u_1^*,\widetilde G(x))$. $\Box$
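As a simple illustration, let $r=2$, $g(x,u)= u_1+u_2-x$, $\varphi(x,u)= u_1-1$, and let the point $(x^*,u^*)$ satisfy $u_1^*+u_2^*=x^*$ and $u_1^*<1$. Here $g'_{u_2}=1$ is invertible, the implicit function is $G(x,u_1)= x-u_1$, and the above construction yields $\mathcal{U}(x)= (u_1^*,\, x-u_1^*)$, for which $g(x,\mathcal{U}(x))=0$ and $\varphi(x,\mathcal{U}(x))= u_1^*-1<0$ for all $x$ near $x^*$.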

Now, we take an arbitrary index $\theta$. For any $s =1,\dots, d(\theta)$, let $\mathcal{U}^s(x)$ be the function from Lemma 2 corresponding to the point $(\widehat{x}(t^s),u^s)$ and defined in a neighbourhood of the point $\widehat{x}(t^s)$. Note that $\mathcal{U}^s(\widehat{x}(t^s)) = u^s$.

We fix the interval $[\tau_0,\tau_1]$ corresponding to the index $\theta.$ Consider the space $\mathbb{R}^{m+n}$ of variables $z=(z_1,\dots,z_m)$ and $x_0= x(\tau_0)$. Generalizing (3.10), we define the piecewise constant function

$$ \begin{equation} v(\tau)= \sum_{k=1}^m z_k\chi_k(\tau), \qquad \tau\in [\tau_0,\tau_1] \end{equation} \tag{3.13} $$
(that is, $z_k$ is its value on the interval $\sigma_k$), and consider the control system
$$ \begin{equation} \frac{dx}{d\tau}= v(\tau) \begin{cases} f(x(\tau),u(\tau)), &\tau\in E_+, \\ f(x(\tau),\,\mathcal{U}^s(x(\tau))), &\tau\in \Delta^s \subset E_0, \end{cases} \qquad x(\tau_0)=x_0. \end{equation} \tag{3.14} $$
Here, the control $u \in L^r_\infty(E_+)$ (that is, $u(\tau)$ is an arbitrary measurable bounded function on $E_+$), and on each $\Delta^s\subset E_0$ we set $u(\tau) = \mathcal{U}^s(x(\tau))$, that is, the control is in fact absent there. It is clear that $\mathcal{U}^s(x^\theta(\tau)) = u^\theta(\tau) = u^s$ on each $\Delta^s$.

Consider the function

$$ \begin{equation*} \mathcal{F}(\tau,x,u)= \begin{cases} f(x,u), &\tau\in E_+, \\ f(x,\mathcal{U}^s(x)), &\tau\in \Delta^s \subset E_0. \end{cases} \end{equation*} \notag $$
This function depends smoothly on the pair $(x,u)\in \mathbb{R}^n \times \mathbb{R}^r$; the fact that $\mathcal{F}$ is discontinuous with respect to $\tau$ plays no role here. Now system (3.14) has the form
$$ \begin{equation} \frac{dx}{d\tau}= v(\tau)\mathcal{F}(\tau,x(\tau),u(\tau)),\qquad x(\tau_0)=x_0. \end{equation} \tag{3.15} $$
In view of (3.13) it can be written as
$$ \begin{equation} \frac{dx}{d\tau}= \sum_{k=1}^m z_k\chi_k(\tau)\mathcal{F}(\tau,x(\tau),u(\tau)), \qquad x(\tau_0)=x_0. \end{equation} \tag{3.16} $$

Let $z^\theta_k$ be the value of $v^\theta(\tau)$ on $\sigma_k$, $k=1,\dots,m$, that is, $v^\theta(\tau)=\sum z^\theta_k\chi_k(\tau)$. Recall that $z^\theta_k=0$ if $\sigma_k\subset E_0$, and $z^\theta_k =1$ if $\sigma_k\subset E_+$. We set $z^\theta =(z^\theta_1,\dots, z^\theta_m)$ and define $x^\theta_0= x^\theta(\tau_0)= \widehat{x}(\widehat t_0)$; the control $u^\theta(\tau)$ is defined above. It is easily seen that the triple $(u^\theta,z^\theta,x^\theta_0)$ satisfies system (3.16). Let us call it a basic point of Problem $\mathrm{B}^{\theta}$ (this problem will be constructed a bit later) corresponding to the process $\widehat{w}(t)= (\widehat{x}(t),\widehat{u}(t))$ of the original Problem $\mathrm{B}$.
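Indeed, on $E_+$ we have $v^\theta =1$, and the chain rule gives
$$ \begin{equation*} \frac{dx^\theta(\tau)}{d\tau} = \dot{\widehat{x}}(t^\theta(\tau))\,\frac{dt^\theta(\tau)}{d\tau} = v^\theta(\tau) f(x^\theta(\tau),u^\theta(\tau)) \quad \text{a.e. on } E_+, \end{equation*} \notag $$
while on each $\Delta^s\subset E_0$ both sides of (3.16) vanish, since $z^\theta_k=0$ there and $x^\theta$ is constant.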

The right-hand side in (3.15), (3.16) is a smooth function of $(u,z,x_0)\in\mathbb{R}^{r+m+n}$, and hence, for any triple $(u,z,x_0)\in L_\infty^r(E_+) \times \mathbb{R}^m \times \mathbb{R}^n$ sufficiently close to $(u^\theta, z^\theta,x^\theta_0)$, the Cauchy problem (3.16) has a solution $x(\tau)$, which depends smoothly on this triple. Thus, we have the operator

$$ \begin{equation*} P\colon L^r_\infty(E_+)\times \mathbb{R}^m \times \mathbb{R}^n\to C^n[\tau_0, \tau_1], \qquad (u,z,x_0) \mapsto x(\tau), \end{equation*} \notag $$
which is Fréchet differentiable near the point $(u^\theta,z^\theta,x^\theta_0)$ and whose derivative is continuous at this point. The derivative at this point is a linear mapping $P'(u^\theta,z^\theta,x^\theta_0)\colon (\overline u,\overline{z},\overline x_0) \mapsto \overline x(\tau)$, where the function $\overline x(\tau)$ is the solution of the Cauchy problem with the initial condition $\overline{x}(\tau_0) =\overline{x}_0$ for the equation in variations
$$ \begin{equation} \frac{d\overline{x}}{d\tau} = \sum_k \bigl( z^\theta_k \chi_k \mathcal{F}_x(\tau,x^\theta, u^\theta)\overline{x} + z^\theta_k \chi_k \mathcal{F}_u(\tau,x^\theta, u^\theta) \overline u + \overline{z}_k\chi_k \mathcal{F}(\tau,x^\theta, u^\theta) \bigr), \end{equation} \tag{3.17} $$
or, in a different form,
$$ \begin{equation} \frac{d\overline{x}}{d\tau} = v^\theta \bigl(f_x(x^\theta, u^\theta)\overline{x} + f_u(x^\theta, u^\theta) \overline u\bigr) + \overline v f(x^\theta, u^\theta), \end{equation} \tag{3.18} $$
where, according to (3.13), $\overline v(\tau)= \sum_{k=1}^m \overline{z}_k\chi_k(\tau)$.

Here, we used the fact that $\mathcal{U}^s(x^\theta(\tau)) = u^\theta(\tau)$ on the set where $v^\theta =0$ (that is, on $E_0$), and that $\mathcal{F}(\tau,x^\theta, u^\theta) = f(x^\theta, u^\theta)$ on the whole $[\tau_0,\tau_1]$. The derivatives $\mathcal{F}_x$, $\mathcal{F}_u$ coincide with $f_x$, $f_u$ on $E_+$, while on $E_0$ only their existence matters, not their values. (Note that (3.18) can also be derived directly from (3.15).)

3.3. Problem $\mathrm{B}^{\theta}$ for index $\theta$

For the above index $\theta$, consider the following Problem $\mathrm{B}^{\theta}$ in the space $L^r_\infty(E_+)\times \mathbb{R}^m \times \mathbb{R}^n$ of elements $(u,z,x_0)$:

$$ \begin{equation} F_0(x_0,x(\tau_1))\to \min, \end{equation} \tag{3.19} $$
$$ \begin{equation} F(x_0,x(\tau_1))\leqslant 0,\qquad K(x_0,x(\tau_1))=0, \qquad -z \leqslant 0, \end{equation} \tag{3.20} $$
$$ \begin{equation} \Phi(x(\tau))\leqslant 0 \quad \text{on } [\tau_0,\tau_1], \end{equation} \tag{3.21} $$
$$ \begin{equation} \varphi(x(\tau),u(\tau)) \leqslant 0,\qquad g(x(\tau),u(\tau)) = 0 \quad \text{on }E_+\,, \end{equation} \tag{3.22} $$
where $x(\tau) = P(u,z,x_0)(\tau)$ is determined by $(u,z,x_0)$ from the control system (3.16). We will call it the associated problem corresponding to the process $\widehat{w}(t)= (\widehat{x}(t),\widehat{u}(t))$ of the original Problem $\mathrm{B}$ and the index $\theta$.

Obviously, for any triple $(u,z,x_0)\in L_\infty^r(E_+) \times \mathbb{R}^m \times \mathbb{R}^n$ sufficiently close to $(u^\theta,z^\theta,x^\theta_0)$, where $z\geqslant0,$ the pair $(x(\tau),u(\tau))$ is generated by a unique solution $(x'(t),u'(t))$ of the original system (3.3) defined on the interval $[\widehat t_0,\, t_1 = t(\tau_1)]$, that is,

$$ \begin{equation} x(\tau)= x'(t(\tau)), \qquad u(\tau)= u'(t(\tau)), \end{equation} \tag{3.23} $$
where $t(\tau)$ is determined by the equation $dt/d\tau = v(\tau)$, $t(\tau_0)= \tau_0$. Moreover, if the triple $(u,z,x_0)$ tends to the basic triple $(u^\theta,z^\theta,x^\theta_0)$, then $t(\tau_1)$ tends to $\widehat t_1$, the pair $(x'(t),u'(t))$ tends to the optimal pair $(\widehat{x}(t),\widehat{u}(t))$ of Problem $\mathrm{B}$ in the norm of the space $C\times L_1$ (evaluated each time on the common interval of their definition), and $\|u'\|_\infty \leqslant \mathrm{const}$ (where the constant depends on $\theta$).

Indeed, when changing from the new time $\tau$ to the original time $t,$ the intervals from $E_+$ are mapped to the intervals obtained from the initial intervals $[t^s, t^{s+1}]$ by small translations and dilations, and hence the state variable $x'(t)$ on these intervals is uniformly close to the optimal $\widehat{x}(t)$, and the control $u'(t)$ is close to $\widehat{u}(t)$ in the integral metric. Each interval $\Delta \subset E_0$ of the $\tau$-axis is sent to a small interval of the $t$-axis, and hence $x'(t)$ is uniformly close to $\widehat{x}(t)$ on it, and the integral of $|u'(t)|$ is small. We omit the routine verification of these facts. (Some estimates of this type can be found in [34].)

Remark 5. Note that, for the basic $z^\theta$, that is, for $v= v^\theta$, every interval $\Delta^s \subset E_0$ collapses under the mapping $\tau \mapsto t(\tau)$ and is transformed into the point $t^s$, so that the above chosen values $u^\theta(\tau) =u^s$ on $\Delta^s$ do not appear in the original time $t$, and hence these values seemingly play no role. However, if $z$ slightly deviates from the basic value, then the interval $\Delta^s \subset E_0$ of the $\tau$-axis corresponding to $z_s >0$ is transformed into a small interval of length $z_s$ on the $t$-axis, where $u'(t) = \mathcal{U}^s(x(\tau(t)))$ is close to $u^s$. Thus, in the original time, we obtain in fact a needle-type variation of the control! Its principal difference from the “standard” needle-type variation is that we do not replace the control $\widehat{u}(t)$ on a small interval near the point $t^s$, but rather expand this point by inserting there a small interval with the profile $u'(t) =\mathcal{U}^s(x(\tau(t)))$. The point $t^s$ is not unique, and so we get a packet of such generalized needle variations. Note that here it does not seem possible to employ standard needle variations (as, for example, in [25], [26], for problems without state and mixed constraints), because the constraint $\Phi(x(t))\leqslant 0$ would not be differentiable with respect to the width of the needle: even the derivative of the trajectory $x(t)$ with respect to the width of the needle would be a discontinuous function of $t$.

As was already noted in the introduction, the advantage of such “inserted” needles against the usual ones is also in that they guarantee a smooth dependence of all the problem constraints on the needle width for any measurable control, whereas the usual needles work only for the piecewise continuous optimal control $\widehat{u}(t)$.

Remark 6. The control $u(\tau)$ on the intervals in $E_0$ is not varied, but is given by certain functions of $x$, while its variation on the set $E_+$ will be needed for obtaining the stationarity condition with respect to the control, $\overline H_u=0$. For problems without mixed constraints, this condition is absent, hence there is no need to vary the control on $E_+$ — it suffices to consider only generalized needles, so that Problem $\mathrm{B}^{\theta}$ is finite-dimensional [27], [28]. If the mixed constraints are present, the generalized needles alone would not suffice.

Let us find a link between optimality of the basic points in Problems $\mathrm{B}$ and $\mathrm{B}^\theta$.

Lemma 3. If a process $\widehat{w}= (\widehat{x}(t), \widehat{u}(t))$ delivers a Pontryagin minimum in Problem $\mathrm{B}$, then the triple $\zeta^\theta =(u^\theta,z^\theta,x^\theta_0)$ delivers a local minimum in the associated Problem $\mathrm{B}^{\theta}$, that is, a minimum with respect to the norm $\|u\|_\infty +|z| +|x_0|$ (a weak minimum).

Proof. Suppose on the contrary that the triple $\zeta^\theta$ does not give a local minimum in Problem $\mathrm{B}^{\theta}$. This means that there is a sequence of admissible triples $\zeta = (u,z,x_0)$ of Problem $\mathrm{B}^{\theta}$ such that $\zeta \to \zeta^\theta$ and $F_0(\zeta) < F_0(\zeta^\theta)$. Passing from the time $\tau$ to the original time $t$, we construct, as above, a sequence of processes $w' = (x'(t), u'(t))$ satisfying equalities (3.23) and system (3.3). By virtue of (3.22), these processes satisfy the mixed constraints of Problem $\mathrm{B}$ on the image of the set $E_+$. On the intervals of the $\tau$-axis in $E_0$, these constraints hold by construction (with strict inequalities). The passage to $t$ transforms each interval from $E_0$ into a small interval on which the mixed constraints also hold (with strict inequalities). The state constraints for the process $w'$ remain valid by (3.21).

Since every trajectory $x'(t)$ has the same endpoints as $x(\tau)$, the processes $w'$ are admissible in Problem $\mathrm{B}$ and produce the values $F_0(w') = F_0(\zeta) < F_0(\zeta^\theta) = F_0(\widehat{w})$. Finally, since $\zeta \to \zeta^\theta$, we have, by the above, $\|x' -\widehat{x}\|_C \to 0$, $\|u' -\widehat{u}\|_1 \to 0$ and $\|u\|_\infty \leqslant \mathrm{const}$, which contradicts Pontryagin minimality of the point $\widehat{w}$ in Problem $\mathrm{B}$. $\Box$

Now, we can write necessary conditions for a local minimum in Problem $\mathrm{B}^{\theta}$. Note that, even though all the “data functions” in this problem are smooth, this is not a standard smooth problem, because it involves an uncountable number of inequality constraints (3.21) and (3.22). This is a problem of so-called “semi-infinite” optimization. Nevertheless, necessary conditions for a local minimum in such problems are well known — this is the general Lagrange multiplier rule (or principle) (see § 9.1 in the appendix). In our case, it reads as follows.

Theorem 2. Let a triple $(u^\theta, z^\theta, x^\theta_0)$ deliver a local minimum in Problem $\mathrm{B}^{\theta}$. Then there exist a number $\alpha_0$, row vectors $\alpha\in\mathbb{R}^{d(F)}$, $\beta\in \mathbb{R}^{d(K)}$, and $\gamma \in\mathbb{R}^{m+n}$, elements $\lambda \in L_\infty^{d(\varphi)*}(E_+)$ and $m \in L_\infty^{d(g)*}(E_+)$, and a vector function $\mu(\tau)$ of dimension $d(\Phi)$ on $[\tau_0,\tau_1]$ with non-decreasing components and initial value $\mu(\tau_0)=0$ such that

$$ \begin{equation*} \begin{aligned} \, &(\mathrm{i})\ \alpha_0\geqslant0,\quad \alpha\geqslant0,\quad \gamma\geqslant 0, \quad \lambda\geqslant 0; \\ &(\mathrm{ii})\ \alpha_0+ |\alpha|+ |\beta|+ |\gamma|+ \|\lambda\|+ \|m\|+ \mu(\tau_1)> 0; \\ &(\mathrm{iii})\ \alpha F(\widehat{x}_0, \widehat{x}_1)=0,\quad \gamma z^\theta =0, \quad \langle \lambda,\varphi(x^\theta,u^\theta) \rangle =0, \quad \Phi(x^\theta(\tau))\,d \mu(\tau) =0;\qquad \end{aligned} \end{equation*} \notag $$
and, moreover, the Lagrange function for Problem $\mathrm{B}^{\theta}$
$$ \begin{equation*} L(u,z,x_0) = (\alpha_0 F_0+\alpha F+\beta K) -\gamma z + \langle \lambda,\varphi(x,u)\rangle +\langle m, g(x,u)\rangle + \int_{\tau_0}^{\tau_1}\Phi(x)\,d \mu \end{equation*} \notag $$
is stationary at the point $(u^\theta, z^\theta, x^\theta_0)$,
$$ \begin{equation} L'(u^\theta, z^\theta, x^\theta_0) = 0. \end{equation} \tag{3.24} $$

Here, $\lambda$ and $m$ are linear continuous functionals on the spaces $L_\infty(E_+)$ of corresponding dimensions; by $\langle \lambda, \overline\varphi \rangle$ and $\langle m,\overline g \rangle$ we denote evaluation of $\lambda$ and $m$ at arbitrary points $\overline\varphi$ and $\overline g$ of these spaces.

Our next aim is to decipher the above conditions.

Let us dwell in more detail on the condition $\langle \lambda,\varphi(w^\theta) \rangle = \sum_i\langle \lambda_i,\varphi_i(w^\theta) \rangle =0$. This condition means that, for every $i$, the functional $\lambda_i \in L^*_\infty(E_+)$ is a support element (an outer normal) to the cone $\Omega$ of non-positive functions in the space $L_\infty(E_+)$ at the point $\varphi_i(w^\theta)\in \Omega$. For any $\delta>0$, define the set $M_i^\delta = \{\tau\in E_+\mid \varphi_i(w^\theta)\geqslant -\delta\}$ (which may be empty). Each $\lambda_i$ is characterized by the following properties (see § 3.5 in [7]): a) $\lambda_i\geqslant0$; b) $\lambda_i$ is supported on the set $M_i^\delta$ for any $\delta>0$; c) $\|\lambda_i\| := \langle \lambda_i, \mathbf{1}\rangle =1$. (Here, $\mathbf{1}$ is the identically one function.) Below, it will be shown that every $\lambda_i$ is a “usual” function from $L_1(E_+)$ (and even from $L_\infty(E_+)$), and hence it is supported on the set $M_i^0 = \{ \tau \in E_+ \mid \varphi_i(w^\theta) = 0\}$, that is, we get the usual complementary slackness condition $\lambda_i(\tau)\varphi_i(w^\theta(\tau)) =0$ almost everywhere on $E_+$.

§ 4. Stationarity conditions in Problem $\mathrm{B}^\theta$

For notational convenience, we introduce the endpoint Lagrange function $l = \alpha_0 F_0+\alpha F+\beta K$, and write, for brevity, $f^\theta = f(x^\theta, u^\theta)$, $f^\theta_x = f_x(x^\theta,u^\theta)$, etc.

Condition (3.24) means that, for any $(\overline u,\overline{z},\overline x_0)$,

$$ \begin{equation} \begin{aligned} \, &L'(u^\theta, z^\theta, x^\theta_0)(\overline u,\overline{z}, \overline x_0) = l_{x_0} \overline x_0+ l_{x_1} \overline x_1 - \gamma \overline{z} \nonumber \\ &\qquad\qquad\quad + \langle \lambda,(\varphi_x^\theta \overline x + \varphi_u^\theta\overline u) \rangle + \langle m,(g_x^\theta \overline x + g_u^\theta\overline u) \rangle + \int_{\tau_0}^{\tau_1}\Phi_x^\theta \overline x\,d \mu =0, \end{aligned} \end{equation} \tag{4.1} $$
where $\overline x_1 =\overline x(\tau_1)$ according to equation (3.17) (or (3.18)). (The derivatives of all functions are taken at the optimal point $(u^\theta, z^\theta, x^\theta_0)$.)

1. Let us first simplify the functionals $\lambda$ and $m$, which a priori lie in $L^*_\infty(E_+)$. To this aim, we recall the following property of $L_\infty^*(\Delta)$-functionals on an interval $\Delta$.

The functional $\pi\in L_\infty^*(\Delta)$ is called absolutely continuous if there exists a function $p \in L_1(\Delta)$ such that $\pi$ can be represented as

$$ \begin{equation*} \langle \pi, u \rangle = \int_{\Delta} p(\tau)\,u(\tau)\,d\tau \quad \text{for all} \quad u\in L_\infty(\Delta). \end{equation*} \notag $$

One can easily show that $\pi$ is absolutely continuous if and only if $\langle \pi, u_n\rangle \to 0$ for any sequence $u_n \in L_\infty(\Delta)$ such that $\|u_n\|_\infty \leqslant\mathrm{const}$ and $\|u_n\|_1 \to0$. (This follows, for example, from the Yosida–Hewitt theorem on the decomposition of $\pi$ into absolutely continuous and singular components. Obviously, this property holds for the absolutely continuous component, but not for the singular one.)

This implies that, for any $\eta \in L_\infty^*(E_+)$ and any function $a\in L_\infty(E_+)$, the functional of the form $\langle \eta, a\overline x \rangle$, where $\overline x$ is expressed via $\overline u \in L_\infty(E_+)$ by the equation $d\overline{x}/d\tau = A(\tau)\overline x + B(\tau)\overline u$ with given matrices $A, B\in L_\infty$ on the interval $[\tau_0,\tau_1]$ and with the initial condition $\overline x(\tau_0)=0$, is absolutely continuous with respect to $\overline u$. Indeed, if $\|\overline u_n\|_1 \to 0$, then, by the Gronwall lemma, $\|\overline x_n\|_C \to0$, whence $\| a\,\overline x_n\|_\infty \to 0$, and so $\langle \eta, a\overline x_n \rangle \to 0$. For the same reason, for any measure $d\mu$ on $[\tau_0,\tau_1]$ and any continuous function $c(\tau)$, the functional $\int c\overline x\,d\mu$ is also absolutely continuous with respect to $\overline u \in L_\infty(E_+)$.
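The estimate behind this argument can be made explicit: from $d\overline{x}_n/d\tau = A\overline x_n + B\overline u_n$ with $\overline x_n(\tau_0)=0$, the Gronwall lemma gives
$$ \begin{equation*} \|\overline x_n\|_C \leqslant \|B\|_\infty\, e^{\|A\|_\infty(\tau_1-\tau_0)}\, \|\overline u_n\|_1 \to 0. \end{equation*} \notag $$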

2. Let us get back to equality (4.1). Setting $\overline{z}=0$ and $\overline x_0=0$ in this equality, we get $\overline{v}=0$, and hence, in view of (3.18), $\overline x$ is expressed in terms of $\overline u$ via

$$ \begin{equation} \frac{d\overline{x}}{d\tau}= v^\theta(f_x^\theta \overline x + f_u^\theta \overline u), \qquad \overline x(\tau_0)=0. \end{equation} \tag{4.2} $$
In addition, for any $\overline u \in L_\infty(E_+)$, we have
$$ \begin{equation*} \langle \lambda,\varphi_u^\theta\overline u \rangle + \langle m, g_u^\theta\overline u \rangle = -l_{x_1} \overline x_1 - \langle \lambda, \varphi_x^\theta\overline x \rangle - \langle m, g_x^\theta\overline x \rangle - \int_{\tau_0}^{\tau_1}\Phi_x^\theta \overline x\,d \mu. \end{equation*} \notag $$
By the above, the right-hand side of this equality is an absolutely continuous functional, that is,
$$ \begin{equation} \sum_{i=1}^{d(\varphi)}\langle \lambda_i,\varphi_{iu}^\theta\overline u \rangle+ \sum_{j=1}^{d(g)}\langle m_j, g_{ju}^\theta\overline u \rangle = \int_{E_+} p(\tau)\overline u(\tau)\,d\tau, \end{equation} \tag{4.3} $$
where $p$ is an $L_1(E_+)$-function. By the above regularity assumption on the mixed constraints, Theorem 8 on the absence of singular components (see the appendix, § 9.2) applies to the collection of vector functions $\varphi_{iu}(w^\theta)$ and $g_{ju}(w^\theta)$; together with equation (4.3), it implies that all the components of the functionals $\lambda$ and $m$ are absolutely continuous, that is, $\lambda_i = \lambda_i(\tau)$ and $m_j= m_j(\tau)$ are functions from $L_1(E_+)$, and, in addition, $\lambda_i(\tau)\geqslant0$ on $E_+$. Now the complementary slackness condition $\langle \lambda,\varphi(x^\theta,u^\theta) \rangle =0$ assumes the form
$$ \begin{equation*} \sum_{i=1}^{d(\varphi)}\int_{E_+}\lambda_i(\tau)\,\varphi_i(x^\theta(\tau),u^\theta(\tau))\,d\tau =0. \end{equation*} \notag $$
Since each integrand here is non-positive, we conclude that, for every component, $\lambda_i(\tau)\,\varphi_i(x^\theta(\tau),u^\theta(\tau)) = 0$ almost everywhere on $E_+$, that is, $\lambda_i$ is concentrated on the set where the $i$th mixed inequality is active: $\varphi_i(x^\theta(\tau),u^\theta(\tau))=0$. To unify the notation, we put $\lambda=0$ and $m=0$ on $E_0$, so that now $\lambda, m$ lie in $L_1[\tau_0,\tau_1]$.

Now (4.1) assumes the form

$$ \begin{equation} \begin{aligned} \, &l_{x_0} \overline x_0+ l_{x_1} \overline x_1-\gamma \overline{z} \nonumber \\ &\qquad + \int_{\tau_0}^{\tau_1} \lambda (\varphi_x^\theta\overline x + \varphi_u^\theta\overline u)\, d\tau + \int_{\tau_0}^{\tau_1} m (g_x^\theta \overline x + g_u^\theta\overline u)\,d\tau + \int_{\tau_0}^{\tau_1}\Phi_x^\theta \overline x\,d \mu =0. \end{aligned} \end{equation} \tag{4.4} $$

3. Let us rewrite this equality in terms of the independent variables $(\overline u,\overline{z},\overline x_0)$ with due account of (3.17) (or (3.18)). We need to properly transform the terms involving $\overline x_1$ and $\overline x(\tau)$. To this aim, we require the following simple fact.

Lemma 4. Let an absolutely continuous function $\overline x(\tau)$ and a function of bounded variation $\psi(\tau)$ (both $n$-dimensional, $\overline x$ is a column, $\psi$ a row) satisfy

$$ \begin{equation} \begin{array}{ll} \dot{\overline x} = A\overline x + \overline b, & \quad\; \overline x(\tau_0) = \overline x_0, \\ \dot\psi = -\psi A + \dot\rho, & \quad\; \psi(\tau_1) = -l_1, \end{array} \end{equation} \tag{4.5} $$
where the matrix $A(\tau)$ and the function $\overline b(\tau)$ are measurable and bounded, $\rho(\tau)$ is a function of bounded variation continuous at $\tau_0$ and $\tau_1$, and $l_1\in \mathbb{R}^n$. Then
$$ \begin{equation} l_1\overline x_1 + \int_{\tau_0}^{\tau_1} \overline x\, d\rho = -\psi_0\overline x_0 - \int_{\tau_0}^{\tau_1} \psi\overline b\,d\tau. \end{equation} \tag{4.6} $$

Proof. Taking the time derivative of the product $\psi\overline x$, we have
$$ \begin{equation*} \frac d{d\tau}(\psi\overline x)= (-\psi A + \dot\rho)\overline x + \psi(A\overline x + \overline b) = \dot\rho\overline x+\psi\overline b, \end{equation*} \notag $$
and hence
$$ \begin{equation*} \psi_1\overline x_1 - \psi_0\overline x_0 = \int_{\tau_0}^{\tau_1} \overline x\,d\rho + \int_{\tau_0}^{\tau_1} \psi\overline b\,d\tau. \end{equation*} \notag $$
Now, using the terminal value $\psi_1 = -l_1$, we arrive at (4.6). $\Box$

Remark 7. This result is a generalization of the classical du Bois-Reymond lemma, which is, in fact, the integration by parts formula for the Stieltjes integral.
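
As a quick sanity check of formula (4.6), the following sketch (scalar case, $A=0$, hypothetical constants; not part of the paper's argument) verifies the identity numerically for a measure $d\rho$ with an absolutely continuous part and one interior atom.

```python
# A numerical check (hypothetical scalar data) of formula (4.6) with A = 0:
# x solves dx/dtau = b, x(0) = x0; psi solves dpsi = drho, psi(1) = -l1,
# where drho has density 0.5 plus a unit atom at tau = 0.5.
import numpy as np

def integrate(f, tau):
    """Trapezoidal rule for the integral of f over tau."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(tau)))

tau = np.linspace(0.0, 1.0, 100001)
b, x0, l1 = 0.7, 1.3, -0.4                       # made-up constants
x = x0 + b * tau                                 # state trajectory
jump_at, jump_size = 0.5, 1.0
rho = 0.5 * tau + jump_size * (tau >= jump_at)   # BV function, atom inside
psi = -l1 - (rho[-1] - rho)                      # dpsi = drho, psi(1) = -l1

lhs = (l1 * x[-1] + integrate(0.5 * x, tau)      # a.c. part of int x drho
       + jump_size * x[np.searchsorted(tau, jump_at)])  # the atom
rhs = -psi[0] * x0 - integrate(b * psi, tau)
print(lhs, rhs)   # both are approx 1.675: the two sides of (4.6) agree
```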

We now apply Lemma 4 to (4.4), taking (3.18) into account. Comparing (4.4) with the left-hand side of (4.6), and (3.18) with the upper row in (4.5), we see that

$$ \begin{equation*} \begin{gathered} \, A= v^\theta f_x^\theta, \qquad \overline b = v^\theta f^\theta_u\overline u+ \overline v f^\theta, \\ d\rho = (\lambda\varphi_x^\theta+m g_x^\theta)\,d\tau+ \Phi_x^\theta\,d\mu, \qquad l_1= l_{x_1}. \end{gathered} \end{equation*} \notag $$
Next, we introduce the function of bounded variation $\psi^\theta(\tau)$ (the adjoint variable of Problem $\mathrm{B}^\theta)$, which, according to (4.5), is a solution of the equation
$$ \begin{equation} \frac{d\psi^\theta}{d\tau}= -v^\theta\psi^\theta f_x^\theta+ \lambda\varphi_x^\theta + mg_x^\theta+\frac{d\mu}{d\tau}\,\Phi_x^\theta, \qquad \psi^\theta(\tau_1)= -l_{x_1}. \end{equation} \tag{4.7} $$
By Lemma 4, equality (4.4) assumes the form
$$ \begin{equation*} l_{x_0} \overline x_0 - \psi_0^\theta\overline x_0 - \gamma \overline{z} - \int_{\tau_0}^{\tau_1} \psi^\theta (v^\theta f_u^\theta\overline u + \overline{v} f^\theta)\,d\tau + \int_{\tau_0}^{\tau_1} (\lambda\varphi_u^\theta + m g_u^\theta) \overline u \,d\tau =0. \end{equation*} \notag $$
Since $\overline v(\tau)= \sum \overline{z}_k \chi_k(\tau)$, we have
$$ \begin{equation} \begin{aligned} \, &(l_{x_0}-\psi^\theta_0)\,\overline x_0 + \sum_k z_k^\theta \int_{\sigma_k} (-\psi^\theta f_u^\theta +\lambda\varphi_u^\theta+ mg_u^\theta)\,\overline u \,d\tau \nonumber \\ &\qquad- \sum_k \overline{z}_k \int_{\sigma_k} \psi^\theta f^\theta\, d\tau - \sum_k\gamma_k\overline{z}_k= 0. \end{aligned} \end{equation} \tag{4.8} $$

This equality holds for all $\overline x_0\in\mathbb{R}^n$, all $\overline{z}_k\in \mathbb{R}$, $k=1,\dots,m,$ and all $\overline u \in L_\infty(E_+)$. By varying $\overline x_0$ and $\overline{z}_k$, we get $\psi^\theta(\tau_0) = l_{x_0}$, and, for every $k$

$$ \begin{equation} \int_{\sigma_k} \psi^\theta f^\theta\, d\tau = -\gamma_k. \end{equation} \tag{4.9} $$

Now recall that all $\gamma_k \geqslant 0$, $ z_k^\theta\geqslant 0$, and, according to the complementary slackness condition (iii) in Theorem 2, $\gamma z^\theta := \sum \gamma_k z^\theta_k =0$, and so $\gamma_k z^\theta_k =0$ for all $k$.

If $\sigma_k\subset E_+$, then $z^\theta_k =1$, and so $\gamma_k =0$. If $\sigma_k\subset E_0$, then $z^\theta_k =0$, and we only know that $\gamma_k \geqslant 0$.

Finally, varying $\overline u$, we have

$$ \begin{equation} -\psi^\theta f_u^\theta+\lambda\varphi_u^\theta+ m g_u^\theta=0 \quad \text{on each}\ \ \sigma_k\subset E_+. \end{equation} \tag{4.10} $$
It is worth pointing out that this equality holds only on $E_+$. If $\sigma_k\subset E_0$, then $\overline u$ is not varied, and we get no condition here.

4. Let us summarize the preliminary results of deciphering the stationarity condition (4.1).

Theorem 3. For any index $\theta$, there exists a collection

$$ \begin{equation*} \xi^\theta= (\alpha_0,\alpha,\beta,\lambda^\theta(\tau), m^\theta(\tau),\mu^\theta(\tau)) \end{equation*} \notag $$
from the space $\mathbb{R}^{1+d(F)+d(K)} \times \bigl(L_1^{d(\varphi)} \times L_1^{d(g)} \times BV^{d(\Phi)}\bigr)[\tau_0,\tau_1]$ and a corresponding function of bounded variation $\psi^\theta(\tau)$ such that the following conditions hold:
$$ \begin{equation} \begin{split} &(\mathrm{i})\qquad \alpha_0\geqslant0,\quad \alpha\geqslant0,\quad \gamma\geqslant 0, \quad \lambda^\theta\geqslant 0,\quad d\mu^\theta \geqslant0; \\ &(\mathrm{ii})\qquad \alpha_0+ |\alpha|+ |\beta|+ \int_{E_+} |\lambda^\theta|\,d\tau + \int_{E_+} |m^\theta|\,d\tau+ \int_{\tau_0}^{\tau_1} d\mu^\theta>0, \\ &\qquad\lambda^\theta =0,\quad m^\theta=0\quad\textit{a.e. on }E_0; \\ &(\mathrm{iii})\qquad \alpha F(\widehat{x}_0,\widehat{x}_1)=0,\quad \lambda^\theta(\tau) \varphi^\theta(\tau) =0,\quad \Phi(x^\theta(\tau))\,d \mu^\theta(\tau) =0, \end{split}\nonumber \end{equation} \notag $$
$$ \begin{equation} \frac{d\psi^\theta}{d\tau}= -v^\theta\psi^\theta f_x^\theta+ \lambda^\theta \varphi_x^\theta+m^\theta g_x^\theta + \frac{d\mu^\theta}{d\tau}\,\Phi_x^\theta, \end{equation} \tag{4.11} $$
$$ \begin{equation} \psi^\theta(\tau_0)= l_{x_0}, \qquad \psi^\theta(\tau_1)= -l_{x_1}, \end{equation} \tag{4.12} $$
$$ \begin{equation} -\psi^\theta f_u^\theta+\lambda^\theta \varphi_u^\theta+ m^\theta g_u^\theta=0 \quad \textit{on } E_+, \end{equation} \tag{4.13} $$
$$ \begin{equation} \int_{\sigma_k}\psi^\theta f^\theta \, d\tau \begin{cases} =0, &\textit{if }\sigma_k\subset E_+, \\ \leqslant 0, &\textit{if }\sigma_k\subset E_0, \end{cases} \qquad k=1,\dots,m. \end{equation} \tag{4.14} $$

The function $\psi^\theta$ is uniquely determined by the collection $\xi^\theta$ from equation (4.11) and either of the boundary conditions (4.12).

Note that the multiplier $\gamma$ does not appear in the non-triviality condition (ii), since by (4.9) it is determined from $\psi^\theta$. Moreover, let us show that $m^\theta$ can also be excluded from condition (ii), that is, this condition can be written as

$$ \begin{equation*} \alpha_0+ |\alpha|+ |\beta|+ \int_{E_+} |\lambda^\theta|\,d\tau + \int_{\tau_0}^{\tau_1} d\mu^\theta > 0. \end{equation*} \notag $$
Indeed, if the left-hand side here is zero, then $l=0$, $\lambda^\theta=0$, and $d\mu^\theta =0$, and so
$$ \begin{equation*} \frac{d\psi^\theta}{d\tau}= -v^\theta\psi^\theta f_x^\theta+m^\theta g_x^\theta, \quad\; \psi^\theta(\tau_0)= \psi^\theta(\tau_1)= 0, \quad\; -\psi^\theta f_u^\theta+ m^\theta g_u^\theta=0 \;\; \text{on } E_+. \end{equation*} \notag $$
The matrix $g_u(x^\theta, u^\theta)$ has full rank uniformly in $\tau$, and hence its right inverse $D(\tau)$ is bounded, so that $m^\theta = \psi^\theta f_u^\theta D(\tau)$. Substituting this expression into the equation for $\psi^\theta$, we get a linear homogeneous equation with zero boundary conditions. Therefore, $\psi^\theta =0$, which also implies $m^\theta =0$.
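
The following minimal sketch (random illustrative data, not from the paper) shows the right-inverse computation used in this argument: for a full-row-rank matrix $g_u$, the matrix $D = g_u^{\mathrm T}(g_u g_u^{\mathrm T})^{-1}$ is a bounded right inverse, and the relation $\psi f_u = m g_u$ then recovers $m = \psi f_u D$.

```python
# A minimal sketch (random illustrative data) of the right inverse used above.
import numpy as np

rng = np.random.default_rng(0)
d_g, r = 2, 4                              # d(g) equality constraints, r controls
g_u = rng.standard_normal((d_g, r))        # assumed to have full row rank
D = g_u.T @ np.linalg.inv(g_u @ g_u.T)     # right inverse: g_u @ D = identity
assert np.allclose(g_u @ D, np.eye(d_g))

m_true = rng.standard_normal(d_g)          # hypothetical multiplier row
psi_fu = m_true @ g_u                      # the relation psi f_u = m g_u
m_recovered = psi_fu @ D                   # m = psi f_u D, as in the text
assert np.allclose(m_recovered, m_true)
print(m_true, m_recovered)
```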

Note that even in the general case, with non-zero $\lambda^\theta$ and $d\mu^\theta$, we can express $m^\theta = (\psi^\theta f_u^\theta - \lambda^\theta\varphi_u^\theta)D(\tau)$ and substitute this expression into the adjoint equation, thereby obtaining a linear equation with respect to $\psi $ that contains $\lambda^\theta$ and $d\mu^\theta$.

5. Consider in more detail the second condition in (4.14). Take any interval $\sigma = [\tau', \tau'']$ among those composing $E_0$. On this interval, $u^\theta(\tau) = u^s$ is constant for some $s$, and $v^\theta = 0$, whence $x^\theta(\tau)$ is also constant; we denote its value by $\widehat{x}_*$. Thus, $f^\theta = f(\widehat{x}_*,u^s)$. Note that some other intervals from $E_0$ may be adjacent to $[\tau', \tau'']$ from the left or right. (The mapping $\tau \mapsto t$ sends each such interval to the same point $t^s.)$ Let $\widetilde \sigma =[\tau'_*,\tau''_*]$ be the union of this interval with all the adjacent intervals from $E_0$. (If there are no adjacent intervals on the left-hand side of $\sigma$, we have $\tau'_* =\tau'$, and if there are no such intervals on the right-hand side of $\sigma$, we have $\tau''_* =\tau''.)$ Since $v^\theta = 0$ on the entire interval $\widetilde\sigma$, the function $x^\theta(\tau)= \widehat{x}_*$ is still constant there.

According to (4.11) and since $\lambda=0 $ and $m=0$ on $\widetilde\sigma$, we have

$$ \begin{equation} d\psi^\theta(\tau)= \sum_{j=1}^{d(\Phi)} d\mu_j^\theta(\tau)\,\Phi_j'(\widehat{x}_*) \quad \text{on }\, \widetilde\sigma. \end{equation} \tag{4.15} $$
(Recall that the index $j$ denotes the $j$th state constraint $\Phi_j(x)\leqslant0$ and the corresponding measure $d\mu_j^\theta$. Here, $\Phi'_j$ are the rows of the matrix $\Phi_x$.) This implies that, for any $\tau\in [\tau'_*,\tau''_*]$,
$$ \begin{equation} \psi^\theta(\tau)-\psi^\theta(\tau'_*-0)= \sum_{j=1}^{d(\Phi)}\, [\mu_j^\theta(\tau)-\mu_j^\theta(\tau'_*-0)]\Phi_j'(\widehat{x}_*). \end{equation} \tag{4.16} $$
The second condition in (4.14) means that, on the above non-extended interval $\sigma$,
$$ \begin{equation} \int_{\tau'}^{\tau''} \psi^\theta(\tau) f(\widehat{x}_*,u^s)\, d\tau \leqslant 0. \end{equation} \tag{4.17} $$
Putting here the value $\psi^\theta(\tau)$ from (4.16), we get
$$ \begin{equation} \begin{aligned} \, &\psi^\theta(\tau'_*-0)f(\widehat{x}_*,u^s)(\tau''-\tau') \nonumber \\ &\qquad +\sum_{j=1}^{d(\Phi)}\Phi'_j(\widehat{x}_*)f(\widehat{x}_*,u^s) \int_{\tau'}^{\tau''}[\mu_j^\theta(\tau)- \mu_j^\theta(\tau'_*-0)]\,d\tau \leqslant 0. \end{aligned} \end{equation} \tag{4.18} $$
Since $\mu_j^\theta(\tau) \leqslant \mu_j^\theta(\tau''_*+0)$ on $[\tau', \tau'']$ for all $j$, we have
$$ \begin{equation*} \int_{\tau'}^{\tau''}[\mu_j^\theta(\tau) -\mu_j^\theta(\tau'_*-0)]\, d\tau \leqslant [\mu_j^\theta(\tau''_*+0) -\mu_j^\theta(\tau'_*-0)](\tau''-\tau'). \end{equation*} \notag $$
Let numbers $0\leqslant \rho_j \leqslant 1$, $\;j=1,\dots,d(\Phi)$, be such that
$$ \begin{equation*} \int_{\tau'}^{\tau''} [\mu_j^\theta(\tau) -\mu_j^\theta(\tau'_*-0)]\, d\tau = \rho_j [\mu_j^\theta(\tau''_*+0) -\mu_j^\theta(\tau'_*-0)](\tau''-\tau'). \end{equation*} \notag $$
Hence from (4.18) we have
$$ \begin{equation} \biggl(\psi^\theta(\tau'_*-0) + \sum_{j=1}^{d(\Phi)} \rho_j [\mu_j^\theta(\tau''_* +0) -\mu_j^\theta(\tau'_*-0)] \Phi'_j(\widehat{x}_*)\biggr) f(\widehat{x}_*,u^s) \leqslant 0. \end{equation} \tag{4.19} $$

Remark 8. This non-trivial trick of replacing condition (4.18) by condition (4.19) with unknown numbers $\rho_j$ was proposed by Milyutin in his lectures at the Faculty of Mechanics and Mathematics of Lomonosov Moscow State University in the 1970s. This trick will allow us to make the next important step in the proof of an MP for problems with several state constraints, namely, to pass from conditions in the time $\tau$ to conditions in the original time $t$. In the case of a scalar state constraint, this trick is not required, since in this setting the function $\psi^\theta(\tau) f(\widehat{x}_*,u^s)$ is monotone on $[\tau'_*,\tau''_*]$ (see [27]).

Now, we rewrite the obtained conditions in terms of the original time $t.$ This will make it possible to consider conditions, as obtained for different indexes $\theta$, on the same interval $[\widehat t_0,\widehat t_1]$.

§ 5. Finite-valued maximum principle of index $\theta$

By construction, $t^\theta(\tau)$ is a non-decreasing function on $[\tau_0, \tau_1]$ which maps this interval onto $[\widehat t_0,\widehat t_1]$, and which is constant on each interval $\sigma \subset E_0$. In addition, on $[\tau_0, \tau_1]$ we have the functions $u^\theta(\tau)$ and $x^\theta(\tau)$ related to the original functions $\widehat{x}(t)$ and $\widehat{u}(t)$ via (3.11).

Let $\tau^\theta(t)$ be the smallest root of the equation $t^\theta(\tau) =t$. This function strictly increases and has jumps at the given points $t^s$ (and only at these points); namely, the jump is $\Delta\tau(t^s) = \tau''_* -\tau'_*$, where $[\tau'_*,\tau''_*]$ is the above maximal interval corresponding to the point $t^s$. Consider the functions

$$ \begin{equation*} \begin{alignedat}{2} \lambda(t) &= \lambda^\theta(\tau^\theta(t)), & \qquad m(t) &= m^\theta(\tau^\theta(t)), \\ \mu(t) &= \mu^\theta(\tau^\theta(t)), & \qquad \psi(t)&=\psi^\theta(\tau^\theta(t)), \end{alignedat} \qquad t\in [\widehat t_0,\widehat t_1]. \end{equation*} \notag $$
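
A small sketch (with hypothetical data, not from the paper) of the inverse change of time: taking $\tau^\theta(t)$ as the smallest root of $t^\theta(\tau)=t$ produces a strictly increasing function that jumps by $\tau''_*-\tau'_*$ exactly at the points $t^s$.

```python
# A sketch (hypothetical data): E_0 = [1, 2] inside [tau_0, tau_1] = [0, 3],
# so t_theta is constant on E_0; the smallest-root inverse tau_theta(t)
# jumps by tau''_* - tau'_* = 1 at the image point t^s = 1.
import numpy as np

tau = np.linspace(0.0, 3.0, 3001)
# t_theta: slope 1 before E_0, constant on E_0, slope 1 after:
t_theta = np.clip(tau, None, 1.0) + np.clip(tau - 2.0, 0.0, None)

def tau_of_t(t):
    """Smallest tau with t_theta(tau) >= t (t_theta is non-decreasing)."""
    k = np.searchsorted(t_theta, t, side="left")
    return tau[min(k, len(tau) - 1)]

for t in [0.999, 1.0, 1.001]:
    print(t, "->", round(tau_of_t(t), 3))
# tau_of_t jumps from about 1 to about 2 as t crosses t^s = 1.
```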

Since $\lambda^\theta =0$ and $m^\theta=0$ on $E_0$, and $dt = d\tau$ on $E_+$, the functions $\lambda(t)$ and $m(t)$ are also integrable, now on the interval $[\widehat t_0,\widehat t_1]$, and the normalization of these multipliers is preserved when passing from $\tau$ to $t$:

$$ \begin{equation*} \int_{\widehat t_0}^{\widehat t_1} |\lambda(t)|\,dt = \int_{\tau_0}^{\tau_1} |\lambda^\theta(\tau)|\,d\tau, \qquad \int_{\widehat t_0}^{\widehat t_1} |m(t)|\,dt = \int_{\tau_0}^{\tau_1} |m^\theta(\tau)|\,d\tau. \end{equation*} \notag $$
(The second equality will not be used below, since the multiplier $m$ is excluded from the normalization).

It is easily seen that the function $\mu(t)$ does not decrease and has the jumps $\Delta\mu(t^s)= \mu^\theta(\tau''_* +0) -\mu^\theta(\tau'_*-0)$ at the points $t^s$; moreover,

$$ \begin{equation*} \int_{\widehat t_0}^{\widehat t_1} d\mu(t) = \int_{\tau_0}^{\tau_1} d\mu^\theta(\tau), \end{equation*} \notag $$
and $\psi(t)$ is a function of bounded variation satisfying
$$ \begin{equation*} \begin{aligned} \, \frac{d\psi(t)}{dt} &= -\psi(t)f_x(\widehat{x}(t),\widehat{u}(t)) \\ &\qquad + \lambda(t)\varphi_x(\widehat{x}(t),\widehat{u}(t))+ m(t)g_x(\widehat{x}(t),\widehat{u}(t))+ \frac{d \mu(t)}{dt}\, \Phi'(\widehat{x}(t)) \end{aligned} \end{equation*} \notag $$
with the same endpoint values as $\psi^\theta(\tau)$. This equation follows from (4.11) since $d\mu(t) = d\mu^\theta(\tau)$ for $\tau \in E_+$ and for the corresponding $t = t^\theta(\tau)$. The proof of these properties is left to the reader.

Recall that, in view of assumption (2.7), the measure vanishes near the points $\widehat t_0$ and $\widehat t_1$ (that is, $d\mu(t)=0$ there), and hence $\psi(t)$ is continuous at these points.

Theorem 3 can be rewritten in the original time $t\in [\widehat t_0,\widehat t_1]$ as follows.

Theorem 4 (maximum principle for index $\theta$). For any index $\theta$, there exists a collection $\xi = (\alpha_0,\alpha,\beta,\lambda(t), m(t),\mu(t))$, where the functions $\lambda(t)$ and $m(t)$ are integrable, $\mu(t)$ is non-decreasing, and a function of bounded variation $\psi(t)$ corresponding to this collection, such that:

$$ \begin{equation} \begin{gathered} \, \begin{split} &(\mathrm{i}) \ \alpha_0\geqslant0,\quad \alpha\geqslant 0,\quad \lambda(t)\geqslant0,\quad d\mu(t)\geqslant 0; \\ &(\mathrm{ii}) \ \alpha_0+ |\alpha|+ |\beta|+ \sum_i \int_{\widehat t_0}^{\widehat t_1} \lambda_i(t)\,dt+ \sum_j \int_{\widehat t_0}^{\widehat t_1}d \mu_j(t) =1; \\ &(\mathrm{iii}) \ \alpha F(\widehat{x}_0,\widehat{x}_1)=0, \qquad \lambda_i(t)\varphi_i(\widehat{x}(t),\widehat{u}(t))=0,\quad i=1,\dots, d(\varphi), \\ &\qquad\Phi_j(\widehat{x}(t))\,d\mu_j(t) =0,\qquad j=1,\dots, d(\Phi); \\ &(\mathrm{iv}) \ \frac{d\psi}{dt}= -\psi f_x(\widehat{x},\widehat{u})+ \lambda\varphi_x(\widehat{x},\widehat{u})+ m g_x(\widehat{x},\widehat{u})+ \frac{d \mu}{dt}\, \Phi_x(\widehat{x}); \\ &(\mathrm{v}) \ \psi(\widehat t_0)= l_{x_0},\qquad \psi(\widehat t_1)= -l_{x_1}; \\ &(\mathrm{vi}) \ -\psi f_u(\widehat{x},\widehat{u})+ \lambda\varphi_u(\widehat{x},\widehat{u})+ mg_u(\widehat{x},\widehat{u})=0; \\ &(\mathrm{vii}) \ \textit{for any neighbouring points }t^s < t^{s+1}\textit{ of index }\theta, \\ &\qquad\qquad\qquad\int_{t^s}^{t^{s+1}} \psi(t)\,f(\widehat{x}(t),\widehat{u}(t))\,dt = 0, \\ &(\mathrm{viii}) \ \textit{for any pair }(t^s,u^s)\textit{ of index }\theta, \textit{ there exist numbers } 0\leqslant \rho_j\leqslant 1,\qquad\quad \\ &\qquad\ j=1,\dots,d(\Phi),\textit{ such that} \end{split} \end{gathered} \end{equation} \notag $$
$$ \begin{equation} \biggl(\psi(t^s -0)+ \sum_{j=1}^{d(\Phi)} \rho_j \Delta\mu_j(t^s)\Phi'_j(\widehat x(t^s))\biggr) f(\widehat{x}(t^s),u^s) \leqslant 0. \end{equation} \tag{5.1} $$

Relation (vii) is secured by the first relation in (4.14), since on any $\Delta \subset E_+$ the mapping $\tau \to t$ is one-to-one and $v^\theta(\tau)=1$, which gives $dt= d\tau$. Relation (viii) follows from (4.19).

Note that the function $\psi(t)$ is uniquely determined by $\xi$ from equation (iv) and either of the boundary conditions (v).

Thus, for the given index $\theta$, we obtained a collection of Lagrange multipliers that generate a function $\psi(t)$, so that conditions (i)–(viii) hold. These Lagrange multipliers depend, in general, on the index $\theta$. Conditions (i)–(vi) are the same for all indexes, but conditions (vii)–(viii) are index-specific. Now, our goal is to pass to conditions (vii)–(viii) for a “universal” collection of multipliers which do not depend on the index $\theta$.

§ 6. Passage to a universal maximum principle

1. Using the regularity assumption on the mixed constraints, we have shown above that the multipliers $\lambda(t)$ and $m(t)$ in Theorem 4 are integrable. Let us now show that these functions are bounded. Since the set of active indexes of the mixed inequality constraints $\varphi_i(\widehat{w}(t))\leqslant0$ varies with time, one can consider separately each of the finitely many subsets of $\{1,\dots, d(\varphi)\}$; thus it can be assumed, without loss of generality, that all the inequalities $\varphi_i(\widehat{w}(t))\leqslant0$ are active on some measurable set $E \subset [\widehat t_0,\widehat t_1]$. Consider the sets

$$ \begin{equation*} \begin{gathered} \, S = \biggl\{(\alpha,\beta)\in \mathbb{R}^{d(\varphi)}\times \mathbb{R}^{d(g)}\biggm|\; \alpha\geqslant 0,\, \sum\alpha_i + \sum |\beta_j|=1\biggr\}, \\ Q_0 = \{w\in \mathcal{Q}\mid \varphi(w)=0,\; g(w)=0\}. \end{gathered} \end{equation*} \notag $$
(Recall that $\mathcal{Q}$ is an open subset of $\mathbb{R}^{n+r}$ on which the data functions of Problem $\mathrm{B}$ are defined.)

By the assumption, for any $w\in Q_0$, the vectors $\varphi_{iu}(w)$ and $g_{ju}(w)$ are positively linearly independent, and hence

$$ \begin{equation*} \min_S\,\Bigl|\sum\alpha_i \varphi_{iu}(w) +\sum \beta_j g_{ju}(w)\Bigr| >0. \end{equation*} \notag $$
The function on the left-hand side is continuous, and hence, for any compact set $M \subset Q_0$, we still have
$$ \begin{equation*} \min_{w\in M} \min_S\, \Bigl|\sum\alpha_i \varphi_{iu}(w) + \sum \beta_j g_{ju}(w)\Bigr| := c >0. \end{equation*} \notag $$
Hence, for any $w\in M$ and any $\alpha\geqslant0$ and $\beta$,
$$ \begin{equation} \Bigl|\sum\alpha_i \varphi_{iu}(w) + \sum \beta_j g_{ju}(w)\Bigr| \geqslant c \Bigl(\sum\alpha_i + \sum |\beta_j|\Bigr). \end{equation} \tag{6.1} $$

Recall that, for the process $\widehat{w}$, there exists a compact set $D \subset \mathcal{Q}$ such that $\widehat{w}(t) \in D$ almost everywhere on $[\widehat t_0,\widehat t_1]$, and hence, on $E$. We now set $M = D\cap Q_0$. Clearly, $M$ is a compact set, and $\widehat{w}(t) \in M$ almost everywhere on $E$.

We next set $\alpha= \lambda(t)$ and $\beta = m(t)$. In view of (6.1), for almost all $t\in E$,

$$ \begin{equation*} \sum \lambda_i(t) + \sum |m_j(t)| \leqslant \frac 1c\, \Bigl|\sum \lambda_i(t) \varphi_{iu}(\widehat{w}(t)) + \sum m_j(t) g_{ju}(\widehat{w}(t))\Bigr|. \end{equation*} \notag $$

By condition (vi), the quantity under the modulus sign on the right is a bounded function $\psi(t) f_u(\widehat{w}(t))$, whence

$$ \begin{equation} \sum \lambda_i(t) + \sum |m_j(t)| \leqslant \frac 1c\,|\psi(t) f_u(\widehat{w}(t))|, \end{equation} \tag{6.2} $$
which implies that all the multipliers $\lambda_i(t)$ and $m_j(t)$ are also bounded, that is, they lie in $L_\infty(E)$.
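
The following sketch (with made-up gradient vectors, not from the paper) illustrates estimate (6.1): the constant $c$ is estimated by sampling the compact set $S$, and the bound then extends to arbitrary $\alpha\geqslant0$ and $\beta$ by homogeneity.

```python
# An illustrative estimate of the constant c in (6.1) for made-up gradients
# p_i (of the active phi_i) and q_j (of g_j) forming a PLI system in R^2.
import numpy as np

rng = np.random.default_rng(1)
p = np.array([[1.0, 0.0], [0.0, 1.0]])     # hypothetical rows phi_iu
q = np.array([[1.0, -1.0]])                # hypothetical row g_ju

def comb_norm(alpha, beta):
    return np.linalg.norm(alpha @ p + beta @ q)

# Sample the compact set S = {alpha >= 0, sum alpha_i + sum |beta_j| = 1}:
c = np.inf
for _ in range(100000):
    a, b = rng.random(2), rng.uniform(-1.0, 1.0, 1)
    s = a.sum() + np.abs(b).sum()
    c = min(c, comb_norm(a / s, b / s))
print("estimated c =", c)                  # strictly positive: PLI system

# By homogeneity, (6.1) then holds for arbitrary alpha >= 0 and beta:
a, b = np.array([2.0, 3.0]), np.array([-1.5])
print(comb_norm(a, b), ">=", c * (a.sum() + np.abs(b).sum()))
```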

2. To take into account the conditions generated by all the indexes $\theta$, we proceed as follows. For a given index $\theta$, let $\Lambda^\theta$ denote the set of all collections $\xi =(\alpha_0,\alpha,\beta,\lambda(t),m(t),\mu(t))$ that satisfy, together with the corresponding functions $\psi(t)$, conditions (i)–(viii) of Theorem 4. According to the above, this set lies in the space

$$ \begin{equation*} Y^* = \mathbb{R}^{1+d(F)+d(K)} \times L_\infty^{d(\varphi)}(\Delta) \times L_\infty^{d(g)}(\Delta) \times BV^{d(\Phi)}(\Delta), \end{equation*} \notag $$
which is dual of the space
$$ \begin{equation*} Y = \mathbb{R}^{1+d(F)+d(K)} \times L_1^{d(\varphi)}(\Delta) \times L_1^{d(g)}(\Delta)\times C^{d(\Phi)}(\Delta), \end{equation*} \notag $$
where $\Delta = [\widehat t_0,\widehat t_1]$.

The following key fact holds.

Lemma 5. The set $\Lambda^\theta$ is compact in the w$^*$-topology of the space $Y^*$.

Proof. First, let us check that $\Lambda^\theta$ is bounded. By the normalization condition (ii), $\alpha_0 +|\alpha|+|\beta|\leqslant 1$ and $\|\lambda\|_1 + \|d\mu\|\leqslant 1$. Proceeding as above, multiplying equality (vi) by a bounded matrix $D(t)$, we get the expression $m = (\psi f_u - \lambda \varphi_u)D(t)$, which we substitute into (iv), thereby obtaining, for $\psi$, the linear equation $d\psi = (A(t)\psi + B(t)\lambda)\,dt + G(t)\, d\mu$, where $A$, $B$, and $G$ are bounded measurable matrices. By Lemma 6 (see the appendix, § 9.3),
$$ \begin{equation*} \|\psi\|_\infty \leqslant \mathrm{const} \bigl(|\psi(\widehat t_0)| + \|\lambda\|_1 + \|d\mu\|\bigr) \leqslant \mathrm{const} \end{equation*} \notag $$
on $\Lambda^\theta$, and hence by (6.2) we also have $\|\lambda\|_\infty + \|m\|_\infty \leqslant \mathrm{const}$ on $\Lambda^\theta$. This shows that the set $\Lambda^\theta$ is bounded.

Further, since all the conditions defining $\Lambda^\theta$ (except the normalization condition) are linear with respect to all components of $\xi$, and since the normalization of the infinite-dimensional components $\lambda_i$ and $d\mu_j$ is given by linear functionals from the original spaces (from $L_1$ and $C$, respectively), that is, by w$^*$-continuous functionals, it follows that the set $\Lambda^\theta$ is w$^*$-closed.

For example, let us check the w$^*$-closedness of the set of all $\xi$ satisfying the adjoint equation (iv), which we write in terms of measures

$$ \begin{equation} d\psi(t)= (-\psi f_x + \lambda\varphi_x + m g_x)\,dt + \Phi_{x}\, d\mu(t), \end{equation} \tag{6.3} $$
and the boundary conditions
$$ \begin{equation} \psi(\widehat t_0) = (\alpha_0 F_0 + \alpha F +\beta K)'_{x_0}, \qquad \psi(\widehat t_1) = -(\alpha_0 F_0 + \alpha F +\beta K)'_{x_1}. \end{equation} \tag{6.4} $$
Note that the functions $f_x$, $\varphi_x$, and $g_x$, as evaluated along the process $\widehat{w}(t)$, are measurable and bounded, and $\Phi_x$ is continuous.

Using again the expression $m =(\psi f_u-\lambda \varphi_u)D(t)$ with a bounded matrix $D(t)$, we have by (6.3) the equation

$$ \begin{equation} d\psi(t)= A(t)\psi(t)\,dt+B(t)\lambda(t)\,dt+ G(t)\,d\mu(t), \end{equation} \tag{6.5} $$
where $A$, $B$, $G$ are some measurable bounded matrices of corresponding dimensions. (Here, $\psi$, $\lambda$, and $\mu$ are considered as columns.)

Let collections $\xi^k\in Y^*$, $k =1,2,\dots$, satisfy conditions (6.3), (6.4) and w$^*$-converge to a collection $\xi^0 \in Y^*$. We have to show that the limit collection $\xi^0$ also satisfies these conditions. (The space $Y$ is separable, and hence the w$^*$-topology is metrizable on the bounded subsets of $Y^*$, and so it suffices to work with sequences.)

The w$^*$-convergence of the measures $d\mu^k \to d\mu^0$ in $C^*$, the w$^*$-convergence of the functions $\lambda^k \to \lambda^0$ in $L_\infty$, and the convergence of the finite-dimensional components $(\alpha_0^k,\alpha^k,\beta^k) \to (\alpha_0^0,\alpha^0,\beta^0)$ imply that the functions $\psi^k$ are uniformly bounded, $\psi^k(t) \to \psi^0(t)$ almost everywhere on $\Delta$, and the measures $d\psi^k$ w$^*$-converge to $d\psi^0$ (see Lemma 7 in § 9.3). By the Lebesgue dominated convergence theorem, $\psi^k \to \psi^0$ in $L_1(\Delta)$, and hence $m^k$ w$^*$-converges to $m^0$ in $L_\infty$.

Since (6.3) is an equality between linear functionals on $C^n(\Delta)$, it suffices to check this equality at any test function $\overline x\in C^n(\Delta)$. By the assumption, for any $k$,

$$ \begin{equation} \int_\Delta \overline x\,d\psi^k= \int_\Delta \overline x\,(-\psi^k f_x + \lambda^k\varphi_x + m^k g_x)\,dt + \int_\Delta \overline x\,\Phi_{x}\, d\mu^k. \end{equation} \tag{6.6} $$
Since $d\psi^k\to d\psi^0$ and $d\mu^k\to d\mu^0$ weakly$^*$, the left-hand side and the last term on the right converge to the required limits
$$ \begin{equation} \int_\Delta \overline x\,d\psi^k \to \int_\Delta \overline x\,d\psi^0, \qquad \int_\Delta \overline x\,\Phi_x\,d\mu^k \to \int_\Delta \overline x\,\Phi_x\,d\mu^0, \end{equation} \tag{6.7} $$
and the convergence of the middle integral on the right of (6.6)
$$ \begin{equation*} \int_\Delta \overline x\,(-\psi^k f_x + \lambda^k\varphi_x + m^k g_x)\,dt \to \int_\Delta \overline x\,(-\psi^0 f_x + \lambda^0\varphi_x + m^0 g_x)\,dt \end{equation*} \notag $$
is secured by the fact that $\psi^k\to \psi^0$, $\;\lambda^k\to \lambda^0$ and $m^k\to m^0$ weakly$^*$ in $L_\infty$ with respect to $L_1$.

Passing to the limit, we have

$$ \begin{equation*} \int_\Delta \overline x\,d\psi^0= \int_\Delta \overline x\,(-\psi^0 f_x + \lambda^0 \varphi_x + m^0 g_x)\,dt + \int_\Delta \overline x\,\Phi_{x}\, d\mu^0. \end{equation*} \notag $$
Since $\overline x \in C^n(\Delta)$ is arbitrary, this yields the required equality
$$ \begin{equation*} d\psi^0(t)= (-\psi^0 f_x + \lambda^0 \varphi_x + m^0\,g_x)\,dt + \Phi_{x}\, d\mu^0(t), \end{equation*} \notag $$
which means that equation (iv) is preserved under w$^*$-limits.

The proof of the w$^*$-closedness of conditions (i), (ii), (v), (vi), (vii), and of the first two conditions in (iii) which define the set $\Lambda^\theta$ is even simpler and hence omitted. The third condition in (iii), for any $j= 1, \dots, d(\Phi)$, means that, for any interval between neighbouring points $t^s< t^{s+1}$ at which $\Phi_j(\widehat{x}(t))<0$, the measure $d\mu_j $ vanishes. This is equivalent to saying that $\int \overline x\,d\mu_j =0$ for any continuous function $\overline x(t)$ supported on this interval. Obviously, this property is preserved under the w$^*$-convergence $d\mu^k_j \to d\mu^0_j$.

It remains only to consider the last condition (viii). We fix any $s$ and consider the pair $(t^s,u^s)$ from the index $\theta$. To avoid confusion with the numbers in a sequence, we set here $t^s =t_*$, $\widehat{x}(t^s)= \widehat{x}_*$, and $u^s = u_*$.

For any collection $\xi \in \Lambda^\theta$, we introduce the function $h(t) = \psi(t) f(\widehat{x}_*,u_*)$, that is, the projection of the vector $\psi(t)$ onto the fixed direction $f(\widehat{x}_*,u_*)$, and consider it on the entire interval $\Delta = [\widehat t_0,\widehat t_1]$. By (6.3), the function $h$ satisfies

$$ \begin{equation*} d h(t)= (-\psi f_x + \lambda\varphi_x + m\,g_x)f(\widehat{x}_*,u_*)\,dt+ \sum \Phi'_j(\widehat{x}(t))\,f(\widehat{x}_*,u_*)\, d\mu_j(t). \end{equation*} \notag $$

Consider the scalar functions $a_j(t) = \Phi'_j(\widehat{x}(t))\, f(\widehat{x}_*,u_*)$, $j=1,\dots, d(\Phi)$. These functions are continuous, since so is $\Phi'_j(\widehat{x}(t))$, and since the vector $f(\widehat{x}_*,u_*)$ is constant. We also introduce the function $b(t) = (-\psi f_x + \lambda\varphi_x + m\,g_x)f(\widehat{x}_*,u_*)$, which is measurable and bounded. We have

$$ \begin{equation} d h(t)= b(t)\,dt+ \sum a_j(t)\, d\mu_j(t). \end{equation} \tag{6.8} $$

Now let a sequence $\xi^k \in\Lambda^\theta$ w$^*$-converge to $\xi^0 \in Y^*$. Then, for any $k =1,2, \dots$, we have the measure

$$ \begin{equation*} d h^k(t)= b^k(t)\,dt+ \sum a_j(t)\, d\mu_j^k(t), \end{equation*} \notag $$
where $\|b^k\|_\infty \leqslant \mathrm{const}$ and $\|d\mu_j^k\| \leqslant 1$ by the normalization condition (ii).

In view of condition (viii), for the given pair $(t_*,u_*)$, for any $k$, there exist numbers $\rho_{j}^k\in[0,1]$, $j=1,\dots,d(\Phi)$, such that

$$ \begin{equation*} h^k(t_*-0)+ \sum \rho_{j}^k a_j(t_*)\Delta\mu_{j}^k(t_*) \leqslant 0. \end{equation*} \notag $$
Clearly, $h^k(\widehat t_0) \to h^0(\widehat t_0)$, and by the assumption, $b^k \xrightarrow{w^*} b^0$ (in $L_\infty(\Delta)$ with respect to $L_1(\Delta))$, and also $d\mu_j^k \xrightarrow{{w^*}} d\mu_j^0$ for all $j$. Therefore, Lemma 11 (see the appendix, § 9.3) applies, and hence there exist numbers $\rho_{j}^0\in [0,1]$, $ j= 1,\dots, d(\Phi)$, such that
$$ \begin{equation*} h^0(t_*-0)+ \sum \rho_{j}^0 a_j(t_*) \Delta\mu_{j}^0(t_*) \leqslant 0. \end{equation*} \notag $$
The last inequality means that
$$ \begin{equation*} \biggl(\psi^0(t_*-0)+ \sum_{j=1}^{d(\Phi)} \rho^0_{j}\, \Delta\mu^0_{j}(t_*)\Phi'_j(\widehat{x}_*)\biggr) f(\widehat{x}_*,u_*) \leqslant 0, \end{equation*} \notag $$
which is exactly condition (viii) for the limit collection $\xi^0 \in \Lambda^\theta$.

Thus, the set $\Lambda^\theta$ is bounded and w$^*$-closed, and hence is w$^*$-compact by the Alaoglu theorem. Lemma 5 is proved. $\Box$

3. Now, for each possible index $\theta$, we obtain a corresponding non-empty compact set $\Lambda^\theta$. Let us show that the family of all such compact sets constitutes a centred (Alexandrov type) system. To this aim, we introduce a partial order in the set of all indexes. We say that $\theta_1\subset \theta_2$ if each pair $(t^s,u^s)$ from $\theta_1$ also lies in $\theta_2$. Obviously, for any two indexes $\theta_1$ and $\theta_2$, there is a third one which contains both of them, for example, their union. It is also clear that an expansion of $\theta$ reduces the set $\Lambda^\theta$, that is, $\theta_1\subset\theta_2$ implies the inverse inclusion $\Lambda^{\theta_1}\supset \Lambda^{\theta_2}$.

Consider now a finite collection of compact sets $\Lambda^{\theta_1},\dots, \Lambda^{\theta_m} $ and take any index $\theta$ containing all indexes $\theta_1,\dots, \theta_m$. The non-empty compact set $\Lambda^\theta$ is contained in each of the sets $\Lambda^{\theta_1},\dots,\Lambda^{\theta_m}$, and so, their intersection is non-empty. Therefore, the family $\{ \Lambda^{\theta}\}$ is centred, and hence has a non-empty intersection

$$ \begin{equation*} \Lambda_*=\; \bigcap_{\theta}\Lambda^\theta. \end{equation*} \notag $$

Next, consider an arbitrary collection of multipliers $\xi=(\alpha_0,\alpha,\beta,\lambda,m,\mu)\in \Lambda_*$, and let $\psi$ be the corresponding adjoint function. By definition, this collection satisfies conditions (i)–(vi) of Theorem 4. Fulfillment of condition (vii) for any index $\theta$ means that, for any interval $(t',t'')$,

$$ \begin{equation*} \int_{t'}^{t''}\psi(t) f(\widehat{x}(t),\widehat{u}(t))\,dt = 0 \end{equation*} \notag $$
(since there exists an index containing the points $t'$ and $t''$); this is equivalent to saying that
$$ \begin{equation} \psi(t)f(\widehat{x}(t),\widehat{u}(t)) = 0 \qquad \text{a.e. on $[\widehat t_0,\widehat t_1]$}. \end{equation} \tag{6.9} $$

Condition (viii) for the collection $\xi$ implies that, for any $u \in \mathcal{R}_0(\widehat{x}(t))$ and any point $t \in (\widehat t_0,\widehat t_1)$ at which the measures $d\mu_j$ do not have atoms (that is, $\Delta\mu_j(t)=0$), we have

$$ \begin{equation} \psi(t -0) f(\widehat{x}(t),u) \leqslant 0. \end{equation} \tag{6.10} $$
Since this inequality holds for all $t$ except countably many points, and since $\psi$ can be considered continuous from either the left or right, this inequality holds for all $t$ from $(\widehat t_0,\widehat t_1)$, and hence it also holds for the endpoints of this interval (since $\psi$ is continuous at these points). Hence, for any $t$, we also have the symmetric inequality
$$ \begin{equation} \psi(t +0) f(\widehat{x}(t),u) \leqslant 0. \end{equation} \tag{6.11} $$
Inequalities (6.10) and (6.11) remain valid for all $u \in \mathcal{R}(\widehat{x}(t))$, since by Lemma 1 any such point is a limit point of $\mathcal{R}_0(\widehat{x}(t))$, and hence, condition (3.6) of the maximum principle for the autonomous Problem $\mathrm{B}$ is met.

Thus, the chosen collection $\xi$ ensures all the conditions of the MP for Problem $\mathrm{B}$. This proves Theorem 1 for Problem $\mathrm{B}$, and therefore, for the original Problem $\mathrm{A}$. $\Box$

§ 7. A problem with an inclusion type constraint

Consider briefly a problem with an inclusion type constraint $u(t) \in U$. According to Remark 3, we cannot simply add this constraint to the problem. However, we may proceed as follows. Assume that the control components are split into two groups: $u= (u_1, u_2)$, where $u_1\in\mathbb{R}^{r_1}$, $u_2\in\mathbb{R}^{r_2}$. Consider Problem $\mathrm{A}$ with an additional constraint only on the second group: $u_2(t) \in U$, where the set $U \subset \mathbb{R}^{r_2}$ is arbitrary. This problem will be called Problem $\mathrm{D}$. (If the component $u_2$ is absent, we still have Problem $\mathrm{A}$.)

Assume that the functions $f$, $\varphi$, $g$, and their first derivatives with respect to $u_1$ are jointly continuous with respect to $(t,x, u_1, u_2)$ on the set $\mathcal{Q}$, whereas no differentiability with respect to $u_2$ is assumed. We say in this case that $u_1$ is a smooth control and $u_2$ is a non-smooth control.

The regularity assumption on the mixed constraints should now involve only the smooth control, that is, one should assume that, for any point $(t,x,u_1,u_2)\in \mathcal{Q}$ at which these constraints are met together with $u_2 \in U$, the gradients with respect to $u_1$,

$$ \begin{equation*} \varphi'_{iu_1}(t,x,u_1,u_2),\quad i\in I(t,x,u_1,u_2), \qquad g'_{ju_1}(t,x,u_1,u_2),\quad j=1,\dots, d(g), \end{equation*} \notag $$
are positively linearly independent. (Note that this is a more restrictive assumption than the former one, which involves the gradients with respect to all the control components.) The following analog of Theorem 1 holds.

Theorem 5. If a process $\widehat{w}=(\widehat{x}(t), \widehat{u}_1(t), \widehat{u}_2(t))$, $t\in [\widehat t_0, \widehat t_1]$, delivers a Pontryagin minimum in Problem $\mathrm{D}$, then there exist multipliers $\alpha_0$, $\alpha$, $\beta$, and functions $\lambda(t)$, $m(t)$, $\mu(t)$, $\psi_x(t)$, $\psi_t(t)$ of the same classes as before, for which conditions (i)–(vi) of Theorem 1 still hold, condition (vii) is replaced by

$$ \begin{equation*} \overline H_{u_1}(\psi_x(t),t,\widehat{x}(t), \widehat{u}(t))=0, \end{equation*} \notag $$
and condition (viii) holds for all $u' =(u'_1, u'_2)$ such that
$$ \begin{equation*} \varphi(t,\widehat{x}(t), u'_1, u'_2)\leqslant0,\quad g(t,\widehat{x}(t), u'_1, u'_2) =0, \qquad u'_2 \in U. \end{equation*} \notag $$

If $\widetilde{\mathcal{R}}(t,\widehat{x}(t))$ denotes the set of all $u' =(u'_1, u'_2)$ satisfying the last three relations, then the set $\mathcal{R}(t,\widehat{x}(t))$ in the maximality condition (2.9) should be replaced by $\widetilde{\mathcal{R}}(t,\widehat{x}(t))$.

The proof proceeds as above and involves a reduction to the autonomous case. The only difference is that now the index $\theta$ consists of finitely many triples $(t^s, u_1^s, u_2^s)$ satisfying $\varphi(\widehat{x}(t^s),u_1^s, u_2^s) <0$, $\;g(\widehat{x}(t^s),u_1^s, u_2^s) =0$, and $u_2^s \in U$. In addition, for constructing the control system in the $\theta$-problem one should, in analogy with Lemma 2, split $u_1 = (\tilde u_1,\tilde u_2)$ so that the matrix $g'_{\tilde u_2}(\widehat{x}(t^s),\tilde u_1^s,\tilde u_2^s, u_2^s)$ is invertible, resolve the equality $g(x,\tilde u_1,\tilde u_2, u_2^s) =0$ by a smooth function $\tilde u_2 =G(x,\tilde u_1, u_2^s)$, and then freeze the value $\tilde u_1 = \tilde u_1^s$. The details are left to the reader.

§ 8. Example: geodesics on a smooth surface

In the Euclidean space $\mathbb{R}^n$, consider the surface $S\colon c(x)=0$, where $c$ is a twice differentiable function such that $c'(x) \ne 0$ at all points of the surface. We are given two points $x_0$ and $x_1$ on this surface. The problem is to find a shortest curve lying on $S$ which connects these points.

We represent this problem as the time-optimal control problem

$$ \begin{equation*} \begin{gathered} \, \dot x =u, \qquad |u|\leqslant 1, \qquad x(t_0)= x_0, \qquad x(t_1)= x_1, \\ c(x(t))=0, \qquad J = t_1 -t_0 \to \min. \end{gathered} \end{equation*} \notag $$

Here, $x$ is a state variable, and its velocity $u$ is a control. Clearly, if the modulus of the velocity is bounded by 1, then the fastest trajectory has the shortest length. Since the problem is linear with respect to the control, and since the set of admissible control values is convex and compact, the existence of a solution is secured by the classical Filippov theorem.

As already mentioned, the state constraint $c(x)=0$ is not allowed. Differentiating it by virtue of the control system and taking into account that the initial point $x_0$ lies on $S$, we replace it by the equality $x(t_0)= x_0$ and the condition that its derivative vanish: $c'(x)u=0$. However, in this case, the equality $x(t_1)= x_1$ at the terminal point is overdetermined, since we automatically get $c(x(t_1))=0$, and so, the set of equality constraints becomes a priori jointly degenerate. In fact, to satisfy the terminal condition $x(t_1)= x_1$ it suffices to fulfil it only on the tangent hyperplane to $S$ at the point $x_1$.

Let $L(x_1)$ be this tangent hyperplane, and $\xi_1,\dots, \xi_{n-1}$ be some basis for it. It suffices to require at $t= t_1$ that

$$ \begin{equation} (\xi_i,\, (x(t_1)- x_1)) =0, \qquad i=1,\dots, n-1, \end{equation} \tag{8.1} $$
that is, $\pi_L (x(t_1)- x_1) =0$, where $\pi_L\colon \mathbb{R}^n \to L(x_1)$ is the orthogonal projection onto $L(x_1)$ along the vector $c'(x_1)$. It is easily seen that in some neighbourhood of $x_1$ this equality and $c(x(t_1))=0$ imply that $x(t_1)= x_1$.

Thus, instead of the original “incorrect” statement of the problem, we consider the problem

$$ \begin{equation} \dot x =u, \qquad x(t_0)= x_0, \quad J = t_1 -t_0 \to \min, \end{equation} \tag{8.2} $$
$$ \begin{equation} (\xi_i,\, (x(t_1)- x_1)) =0, \qquad i=1,\dots, n-1, \end{equation} \tag{8.3} $$
$$ \begin{equation} c'(x)u=0, \qquad (u,u)-1 \leqslant 0. \end{equation} \tag{8.4} $$

The last two relations will be treated as mixed constraints, and the control $u$ is assumed to be smooth. So, we have a problem of type $\mathrm{A}$ (and even of type $\mathrm{B}$).

Note that at any point $(x,u)$ at which constraints (8.4) are met and $|u|=1$, their gradients with respect to $u$ are linearly independent. Indeed, these gradients are the non-zero vectors $c'(x)$ and $2u$, which, in view of the first equality, are orthogonal. If $|u|<1$, then the inequality constraint is inactive, and only the gradient $c'(x)$ of the first constraint should be considered, which by the assumption is non-zero. Thus, constraints (8.4) are regular, and so Theorem 1 can be applied to problem (8.2)–(8.4).

Let $(\widehat{x}(t), \widehat{u}(t))$ be an optimal pair. Then there exist a number $\alpha_0\geqslant 0$, vectors $\beta\in \mathbb{R}^n$, $ \gamma\in \mathbb{R}^{n-1}$, Lipschitz-continuous functions $\psi_x(t)$, $\psi_t(t)$, measurable bounded functions $\lambda(t)\geqslant 0$, $m(t)$, not all equal to zero, which generate the Pontryagin function $H =(\psi_x,\,u)$, the extended Pontryagin function

$$ \begin{equation*} \overline H = (\psi_x,u) - \frac12 \lambda(t)((u,u)-1)- m(t)(c'(x),u), \end{equation*} \notag $$
and the endpoint Lagrange function
$$ \begin{equation*} l= \alpha_0(t_1 -t_0) +\beta (x(t_0)- x_0) + \sum_{i=1}^{n-1} \gamma_i(\xi_i,\,(x(t_1)- x_1)), \end{equation*} \notag $$
such that along $(\widehat{x}(t), \widehat{u}(t))$ the following conditions hold:

the complementary slackness condition

$$ \begin{equation} \lambda(t)\,\bigl((\widehat{u},\widehat{u})-1\bigr) =0; \end{equation} \tag{8.5} $$

the adjoint equation in $x$

$$ \begin{equation} \dot\psi_x = -\overline H_x = m(t)c''(\widehat{x})\widehat{u} \end{equation} \tag{8.6} $$
(here we use the symmetry of the matrix $c''(x)$);

the transversality conditions

$$ \begin{equation} \psi_x(t_0) = \beta, \qquad \psi_x(t_1)= - \sum_{i=1}^{n-1} \gamma_i\xi_i; \end{equation} \tag{8.7} $$

the adjoint equation in $t$

$$ \begin{equation} \psi_t= \mathrm{const} = -\alpha_0; \end{equation} \tag{8.8} $$

“the energy conservation law”

$$ \begin{equation} (\psi_x,\widehat{u})+ \psi_t= 0,\quad \text{i.e.,}\quad \widehat H = (\psi_x,\widehat{u}) \equiv \alpha_0; \end{equation} \tag{8.9} $$

and the stationarity condition in $u$

$$ \begin{equation} \overline H_u= \psi_x- \lambda(t)u- m(t)c'(x)=0. \end{equation} \tag{8.10} $$

Also, one can write the maximality condition for $H$, but since the constraints (8.4) are convex in $u$, this condition follows from the last equality.

Below, we drop the hats on $x$ and $u$ and, instead of $\psi_x$, simply write $\psi$.

Multiplying (8.10) by $u$, we get $(\psi,u) - \lambda(t)(u,u) =0$. By (8.5), $\lambda(t)(u,u) = \lambda(t)$, and now from (8.9) we get $\lambda(t) \equiv \alpha_0$.

Consider the case $\alpha_0=0$. We have $\lambda(t)=0$, and now (8.10) gives $\psi(t) = m(t)c'(x)$, that is, $\psi(t)$ is proportional to $c'(x(t))$. Therefore, $m(t) = (k(t), \psi(t))$ with some vector function $k(t)$. Now (8.6) is the homogeneous equation

$$ \begin{equation*} \dot\psi= (k, \psi)c''(x)u. \end{equation*} \notag $$
Moreover, $\psi(t_1) = m(t_1)\,c'(x_1)$, and, in view of (8.7), we have $\psi(t_1) \in L(x_1)$. Therefore, $\psi(t_1)=0$, and so, $\psi(t) \equiv 0$. Now from (8.7) we find that $\beta=0$ and all $\gamma_i=0$, so the collection of multipliers is trivial, a contradiction.

Therefore, $\alpha_0=1$, which gives $\lambda(t) \equiv 1$, and now by the complementary slackness condition (8.5) we have $|u| \equiv 1$ (motion with maximal possible velocity).

So, we have

$$ \begin{equation} \dot\psi = m(t)c''(x)u, \end{equation} \tag{8.11} $$
$$ \begin{equation} \psi = u + m(t)c'(x). \end{equation} \tag{8.12} $$
Multiplying the last equation by $c'(x)$, we obtain
$$ \begin{equation} (\psi, c'(x))= m(t)(c'(x),c'(x)). \end{equation} \tag{8.13} $$

Since $c'(x) \ne 0$, the function $m(t)$ is Lipschitz-continuous, and hence so is $u(t) = \psi(t) - m(t)c'(x)$. Hence $u(t) $ can be differentiated:

$$ \begin{equation*} \dot u= \dot\psi- \dot mc'(x)- mc''(x)u= - \dot mc'(x), \end{equation*} \notag $$
that is, $\ddot x = - \dot mc'(x)$.

From $(c'(x),\dot x)=0$ we have $(c'(x),\ddot x)+ (c''(x)\dot x, \dot x)=0$, which, by the above, gives $\dot m(c'(x),c'(x)) = (c''(x)\dot x, \dot x)$. Hence

$$ \begin{equation*} \dot m = \frac{(c''(x)\dot x, \dot x)}{(c'(x),c'(x))}. \end{equation*} \notag $$
Finally, we get the geodesic equation in terms of the trajectory $x(t)$:
$$ \begin{equation} \ddot x= -\frac{(c''(x)\dot x, \dot x)}{(c'(x),c'(x))}c'(x). \end{equation} \tag{8.14} $$
(Everywhere, the passage from a covector to a vector is by transposition, since we work in the Euclidean space $\mathbb{R}^n$.)

In particular cases, when the surface $S$ is a plane, a sphere, or a cylinder, equation (8.14) gives, respectively, a rectilinear motion, a motion along a great circle, and a motion along a helix, all with velocity $1$.
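
As an illustration (not part of the paper), the following sketch integrates equation (8.14) numerically for the unit sphere $c(x)=(x,x)-1$ in $\mathbb{R}^3$, where $c'(x)=2x$ and $c''(x)=2I$, so that (8.14) reduces to $\ddot x = -(|\dot x|^2/|x|^2)\,x$; a unit-speed solution should stay on the sphere and trace a great circle.

```python
# A numerical sketch (not from the paper): RK4 integration of the geodesic
# equation (8.14) on the unit sphere c(x) = (x, x) - 1, where c'(x) = 2x and
# c''(x) = 2I, so (8.14) becomes xdd = -(|xd|^2 / |x|^2) x.
import numpy as np

def rhs(state):
    x, v = state[:3], state[3:]
    return np.concatenate([v, -(v @ v) / (x @ x) * x])

def rk4_step(state, h):
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * h * k1)
    k3 = rhs(state + 0.5 * h * k2)
    k4 = rhs(state + h * k3)
    return state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x0 = np.array([1.0, 0.0, 0.0])             # initial point on the sphere
v0 = np.array([0.0, 1.0, 0.0])             # unit tangent: c'(x0) v0 = 0
state = np.concatenate([x0, v0])
h = 1.0e-3
for _ in range(int(np.pi / h)):            # integrate over a time span of pi
    state = rk4_step(state, h)
x, v = state[:3], state[3:]
print(np.linalg.norm(x), np.linalg.norm(v))  # both stay approx 1
print(x)                                     # approx (-1, 0, 0): great circle
```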

§ 9. Appendix

9.1. Lagrange principle for extremum problems with an infinite number of constraints

Let $X$, $Y$ and $Z_i$, $i=1,\dots, \nu$, be Banach spaces, $\mathcal{D}\subset X$ be an open set, $K_i \subset Z_i$, $ i=1,\dots, \nu,$ be closed convex cones with non-empty interiors. Next, let $F_0\colon \mathcal{D}\to \mathbb{R}$, $g\colon \mathcal{D}\to Y$ and $f_i \colon \mathcal{D}\to Z_i$, $ i=1,\dots, \nu$, be given mappings. Consider the extremal problem

$$ \begin{equation} F_0(x)\to \min, \qquad f_i(x) \in K_i, \quad i=1,\dots, \nu, \qquad g(x)=0. \end{equation} \tag{9.1} $$

This problem covers a majority of theoretical and applied optimization problems, including optimal control problems with state and mixed state-control constraints $\Phi(t,x(t)) \leqslant 0$ and $\varphi(t,x(t),u(t)) \leqslant 0$, which can be regarded as inclusions in the cones of non-positive functions in the spaces $C$ and $L_\infty$, respectively (see [35]); some versions of problem (9.1) are considered in [36] and [37].

Assumptions. 1) The objective function $F_0$ and the mappings $f_i$ are Fréchet differentiable at some point $x_0\in \mathcal{D};$ the operator $g$ is strictly differentiable at $x_0$ (smoothness of the data functions); 2) the image of the derivative $g'(x_0)$ is closed in $Y$ (weak regularity of the equality constraint).

Even though all the mappings in the problem are differentiable, problem (9.1) is not a standard smooth problem, because any constraint $f_i(x) \in K_i$ can be given by an infinite number of smooth scalar inequalities (since the spaces $Z_i$ can be infinite-dimensional).

Theorem 6. Let $x_0$ be a point of local minimum in problem (9.1). Then there exist multipliers $\alpha_0\geqslant 0$, $z_i^* \in Z^*_i$, $i=1,\dots,\nu$, and $y^*\in Y^*$, not all zero, such that $z_i^* \in K^0_i$ and $\langle z_i^*, f_i(x_0) \rangle =0$, $i=1,\dots,\nu$ (that is, every $z^*_i$ is an outer normal to the cone $K_i$ at the point $f_i(x_0)$), and the Lagrange function $\mathcal{L}(x) = \alpha_0 F_0(x) + \sum_{i=1}^\nu \langle z_i^*, f_i(x)\rangle + \langle y^*, g(x)\rangle$ is stationary at $x_0$:

$$ \begin{equation} \mathcal{L}'(x_0) = \alpha_0 F_0'(x_0)+ \sum_{i=1}^\nu z_i^* f_i'(x_0)+ y^* g'(x_0) = 0. \end{equation} \tag{9.2} $$

The last equality is called the Euler–Lagrange equation.

Theorem 6 is a generalization of the classical Lagrange multiplier rule to problems with an infinite number of constraints. The proof follows the Dubovitskii–Milyutin scheme and is based on standard notions and facts from functional analysis, see [7], [35]–[37].

9.2. Theorem on the absence of singular components [15]

Let $D \subset \mathbb{R}^{d(w)}$ be a compact set, and let $p_i\colon D\to \mathbb{R}^r$, $i\in I$, and $q_j\colon D\to \mathbb{R}^r$, $j\in J$, be continuous vector functions, where $I$ and $J$ are some finite index sets. Suppose that, for any $w\in D$, the system of vectors $p_i(w)$, $i\in I$, $ q_j(w)$, $j\in J$, is positively linearly independent (PLI).

Let also $E\subset \mathbb{R}$ be a set of finite positive measure, and let a measurable function $\widehat{w}(t)$ lie in $D$ almost everywhere on $E$.

Theorem 7. Let functionals $\lambda_i $, $m_j \in L_\infty^*(E)$, $\lambda_i\geqslant0$, and let a function $l\in L_1^r(E)$ be such that, for any test function $\overline u(t)\in L_\infty^r(E)$,

$$ \begin{equation} \sum_{i\in I} \langle \lambda_i, p_i(\widehat{w}(t))\overline u(t)\rangle+ \sum_{j\in J} \langle m_j, q_j(\widehat{w}(t))\overline u(t)\rangle = \int_{E} l(t)\overline u(t)\,dt. \end{equation} \tag{9.3} $$
Then all $\lambda_i, m_j$ are functions from $L_1(E)$, and each $\lambda_i(t)\geqslant0$ almost everywhere on $E$.

Proof. As already noted, for any point $w_0\in D$, there exists a vector $\overline{v}_0$ such that $p_i(w_0)\overline{v}_0> 1$ for all $i\in I$ and $q_j(w_0)\overline{v}_0 =0$ for all $j$. By continuity, there exist a neighbourhood $\mathcal{O}(w_0)$ of the point $w_0$ and a continuous function $\overline{v}(w)$ such that on $\mathcal{O}(w_0)$ we have
$$ \begin{equation} p_i(w)\overline{v}(w)> 1\quad \forall\, i\in I, \qquad q_j(w)\overline{v}(w) =0\quad \forall\, j\in J, \end{equation} \tag{9.4} $$
and $\overline{v}(w_0)= \overline{v}_0$. (For example, one can take the projection of the vector $\overline{v}_0$ to the joint zero set of the vectors $q_j(w)$.) By compactness, there is a finite number of neighbourhoods $\mathcal{O}(w_s)$, $s=1,\dots,\widetilde s$, that cover all $D$, and on their union there is a “piecewise continuous” (to be precise, a bounded Borel) function $\overline{v}(w)$ satisfying (9.4) on the whole $D$. Hence, for $w= \widehat{w}(t)$, we get a measurable function $\overline{v}(\widehat{w}(t))$ satisfying, for almost all $t\in E$,
$$ \begin{equation} p_i(\widehat{w}(t))\overline{v}(\widehat{w}(t))> 1\quad \forall\, i\in I, \qquad q_j(\widehat{w}(t))\overline{v}(\widehat{w}(t)) =0\quad \forall\, j\in J. \end{equation} \tag{9.5} $$
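
The projection construction in this step can be illustrated by a small sketch (hypothetical vectors, not from the paper): $\overline{v}(w)$ is the orthogonal projection of $\overline{v}_0$ onto the common null space of the rows $q_j(w)$, which depends continuously on $w$ and equals $\overline{v}_0$ at $w=w_0$, so the strict inequalities $p_i(w)\overline{v}(w)>1$ persist in a neighbourhood by continuity.

```python
# A tiny sketch (hypothetical vectors) of the projection used to build v(w):
# v(w) = v0 - Q(w)^T (Q(w) Q(w)^T)^{-1} Q(w) v0 is the orthogonal projection
# of v0 onto the common null space of the rows q_j(w).
import numpy as np

def project_to_nullspace(Q, v0):
    return v0 - Q.T @ np.linalg.solve(Q @ Q.T, Q @ v0)

v0 = np.array([0.0, 2.0, 1.0])             # satisfies q_1(w0) v0 = 0 for
Q0 = np.array([[1.0, 0.0, 0.0]])           # the row q_1(w0) = (1, 0, 0)
Qw = np.array([[1.0, 0.05, -0.02]])        # the same row at a nearby w
v = project_to_nullspace(Qw, v0)
print(Qw @ v)                              # ~0: the relations q_j v = 0 hold
print(np.linalg.norm(v - v0))              # small: v(w) stays close to v0
```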

Suppose now that some functional $\lambda_i$, say $\lambda_1$, has a singular component. Thus, $\lambda_1 =\lambda_1' +\lambda_1''$, where the functional $\lambda_1'$ is absolutely continuous and $\lambda_1''$ is a singular functional supported on a sequence of measurable sets $E_k\subset E$ with $\operatorname{mes}E_k\to 0$, $k=1, 2, \dots$, and such that $\|\lambda_1''\| = \gamma >0$.

Consider a sequence of functions $\overline u_k(t) = \chi_{E_k}(t)\overline{v}(\widehat{w}(t))$. For this sequence, in view of (9.5), the second sum in (9.3) vanishes, and hence

$$ \begin{equation*} \sum_i \langle \lambda_i, p_i(\widehat{w}(t))\overline u_k(t) \rangle = \int_{E_k} l(t)\,\overline{v}(\widehat{w}(t))\,dt. \end{equation*} \notag $$
Since all $\lambda_i$ are non-negative (and hence all $\lambda_i'\geqslant 0$ and $\lambda_i''\geqslant 0)$, the left-hand side of the last relation is not smaller than
$$ \begin{equation*} \langle \lambda_1'', \chi_{E_k}\rangle= \langle \lambda_1'', \mathbf{1}\rangle= \|\lambda_1''\| = \gamma > 0 \end{equation*} \notag $$
(where $\mathbf{1}(t)\equiv 1$), while the right-hand side tends to zero by absolute continuity of the Lebesgue integral, a contradiction. Therefore, the functionals $\lambda_i$ cannot have singular components; each $\lambda_i$ is regular: $\lambda_i\in L_1(E)$, $ i\in I$.

Now (9.3) assumes the form

$$ \begin{equation} \sum_j \langle m_j,q_j(\widehat{w}(t))\overline u(t)\rangle = \int_{E} l'(t)\overline u(t)\,dt, \end{equation} \tag{9.6} $$
where $l'(t)$ is some new function from $L^r_1(E)$.

Suppose now that some functional $m_j$, say $m_1$, has a singular component, that is, $m_1 = m_1' + m_1''$, where the functional $m_1'$ is absolutely continuous and $m_1''$ is a singular functional supported on a sequence of measurable sets $E_k\subset E$ such that $\operatorname{mes}E_k\to 0$ and $\|m_1''\|=\gamma >0$. We again consider an arbitrary point $w_0\in D$. Since the vectors $q_j(w_0)$, $j\in J$, are linearly independent, there exists a vector $\overline{v}_0$ such that $q_1(w_0)\overline{v}_0 =1$ and $q_j(w_0)\overline{v}_0 =0$ for all $j\ne 1$. In addition, there exist a neighbourhood $\mathcal{O}(w_0)$ and a continuous function $\overline{v}(w)$ such that on $\mathcal{O}(w_0)$

$$ \begin{equation} q_1(w)\overline{v}(w) =1, \qquad q_j(w)\overline{v}(w) =0\quad \forall\, j\ne 1, \end{equation} \tag{9.7} $$
and $\overline{v}(w_0)= \overline{v}_0$. (One can take the projection of the vector $\overline{v}_0$ onto the common zero subspace of the vectors $q_j(w)$, $j\ne 1$, and then normalize it.) By compactness, there exist a finite number of neighbourhoods $\mathcal{O}(w_s)$, $s=1,\dots,\widetilde s$, that cover $D$, and there is a bounded Borel function $\overline{v}(w)$ on the union of $\mathcal{O}(w_s)$ satisfying (9.7) on the whole $D$. Now, for $w= \widehat{w}(t)$, we get a measurable function $\overline{v}(\widehat{w}(t))$ satisfying, for all $t\in E$,
$$ \begin{equation} q_1(\widehat{w}(t))\overline{v}(\widehat{w}(t)) =1, \qquad q_j(\widehat{w}(t))\overline{v}(\widehat{w}(t)) =0\quad \forall\, j\ne 1. \end{equation} \tag{9.8} $$

Let $z(t)\in L_\infty (E)$ be a function such that $\langle m_1'', z\rangle =1$. Then the function $\overline u(t) = z(t)\,\overline{v}(\widehat{w}(t))$ satisfies

$$ \begin{equation*} q_1(\widehat{w}(t))\overline u(t) = z(t), \qquad q_j(\widehat{w}(t))\overline u(t) =0 \quad \forall\, j\ne 1, \end{equation*} \notag $$
and, for the sequence $\overline u_k(t) = \chi_{E_k}(t)\overline u(t)$, we have by (9.6)
$$ \begin{equation*} \langle m_1, q_1(\widehat{w}(t))\overline u_k(t)\rangle = \int_{E_k} l'(t)\overline u(t)\,dt, \end{equation*} \notag $$
that is,
$$ \begin{equation} \langle m_1'', q_1(\widehat{w}(t))\overline u_k(t)\rangle= -\langle m_1', q_1(\widehat{w}(t))\overline u_k(t) \rangle + \int_{E_k} l'(t) \overline u(t)\,dt. \end{equation} \tag{9.9} $$
But, for all $k$,
$$ \begin{equation*} \langle m_1'', q_1(\widehat{w}(t))\,\overline u_k(t)\rangle= \langle m_1'', q_1(\widehat{w}(t))\,\overline u(t)\rangle = \langle m_1'', z \rangle =1, \end{equation*} \notag $$
and hence the left-hand side of (9.9) is $1$ for all $k$, while the right-hand side tends to zero, and we again have a contradiction. Thus, the functionals $m_j$ also cannot have singular components. Theorem 7 is proved. $\Box$

The next theorem generalizes the above one to the case where the collection of vectors $p_i(w)$ in a PLI system depends on the point $w$. Namely, assume that, on a compact set $D$, we are given, in addition to vector functions $p_i$ and $q_j$, continuous scalar functions $\varphi_i(w)\leqslant 0$, $i\in I$. Let, for any point $w\in D$, the system of vectors $p_i(w)$, $i\in I(w)$, $ q_j(w)$, $j\in J$, where $I(w) = \{i\in I \mid \varphi_i(w) =0\}$ is the set of active indexes for the point $w$, be positively linearly independent.

Let again $E$ be a measurable set, and let a measurable function $\widehat{w}(t)$ lie in $D$ almost everywhere on $E$. As above, let $\lambda_i, m_j \in L_\infty^*(E)$, but now each $\lambda_i$ is non-negative and supported on the set $M_i^\delta = \{ t\mid \varphi_i(\widehat{w}(t))\geqslant -\delta\}$ for any $\delta>0$.

Theorem 8. Let functionals $\lambda_i, m_j \in L_\infty^*(E)$, $\lambda_i\geqslant0$ and let a function $l\in L_1^r(E)$ be such that (9.3) holds for any test function $\overline u(t)\in L_\infty^r(E)$. Then all $\lambda_i$ and $m_j$ are functions from $L_1(E)$, and so, all $\lambda_i(t)\geqslant 0$ and $\lambda_i(t)\varphi_i(\widehat{w}(t))=0\,$ almost everywhere on $E$.

Proof. Consider any index set $\Gamma \subset I$ and define the corresponding compact set $D_\Gamma = \{w \in D\mid \varphi_i(w)=0\ \forall\, i\in \Gamma\}$. In particular, $D_{\varnothing}= D$.

For any $\delta>0$, we also define the wider compact set $D_\Gamma^\delta = \{w \in D\mid \varphi_i(w)\geqslant -\delta$ for all $ i\in \Gamma\}$. Obviously, $\bigcap_{\delta>0} D_\Gamma^\delta = D_\Gamma$, and hence there is $\delta>0$ such that the vectors $p_{i}(w)$, $i\in \Gamma$, $q_j(w)$, $j \in J$, are PLI at any $w\in D_\Gamma^\delta$. Since the family of all sets $\Gamma$ is finite, there exists a $\delta>0$ common to all of them. Reducing $\delta$ if necessary, we may assume that if $D_{\Gamma_1} \cap D_{\Gamma_2} =\varnothing$, then $D_{\Gamma_1}^\delta \cap D_{\Gamma_2}^\delta =\varnothing$. The family of all these compact sets is partially ordered by inclusion: if $\Gamma_1\subset \Gamma_2$, then $D_{\Gamma_1}^\delta\supset D_{\Gamma_2}^\delta$, and, for any $\Gamma_1$, $\Gamma_2$, we have $D_{\Gamma_1\cup \Gamma_2}^\delta = D_{\Gamma_1}^\delta \cap D_{\Gamma_2}^\delta$.

From the function $\widehat{w}(t)$, for each $\Gamma$ we define the measurable set $M_\Gamma^\delta = \{t\in E\mid \widehat{w}(t) \in D_\Gamma^\delta\}$. Let $\mathcal{G}$ be the family of all “essential” sets $\Gamma$ ($\Gamma$ is essential if $M_\Gamma^\delta$ has positive measure). Clearly, $\mathcal{G}$ is also partially ordered by inclusion. Consider any maximal element $\Gamma_1$ of this family, that is, an element such that $M_\Gamma^\delta$ is a null set for any $\Gamma \supsetneq \Gamma_1$. In other words, $\varphi_i(\widehat{w}(t)) \geqslant -\delta$ on $M_{\Gamma_1}^\delta$ for all $i\in \Gamma_1$, while, for the remaining $i\notin \Gamma_1$, we have $\varphi_i(\widehat{w}(t)) <-\delta$ for almost all $t\in M_{\Gamma_1}^\delta$.

Consider equality (9.3) for all $\overline u(t)$ supported on the set $M_{\Gamma_1}^\delta$. By definition, each functional $\lambda_i$ is supported on its own $M_i^\delta$, and the maximality of $\Gamma_1$ implies that each $\lambda_i$ with $i\notin \Gamma_1$ vanishes on $M_{\Gamma_1}^\delta$, so that only the terms with $i\in \Gamma_1$ remain in the first sum:

$$ \begin{equation*} \sum_{i\in \Gamma_1} \langle \lambda_i,p_i(\widehat{w}(t))\overline u(t)\rangle+ \sum_{j\in J} \langle m_j,q_j(\widehat{w}(t))\overline u(t) \rangle = \int_{M_{\Gamma_1}^\delta} l(\tau)\overline u(\tau)\,d\tau. \end{equation*} \notag $$
Now, applying Theorem 7 to the collection $\lambda_i$, $i\in \Gamma_1$, $m_j$, $j\in J$, the compact set $D_{\Gamma_1}^\delta$, and the set $M_{\Gamma_1}^\delta$, we find that the restriction to $M_{\Gamma_1}^\delta$ of each functional from this collection is absolutely continuous. So, it remains to consider equality (9.3) on the set $E_1 = E \setminus M_{\Gamma_1}^\delta$.

For this set, the family $\mathcal{G}$ of essential $\Gamma \subset I$ is smaller (it no longer contains $\Gamma_1$), and we proceed as above: consider any maximal element $\Gamma_2$; all the “alien” functionals $\lambda_i$ vanish on $M_{\Gamma_2}^\delta$, while, by Theorem 7, the restrictions to $M_{\Gamma_2}^\delta$ of the “own” $\lambda_i$, $i\in \Gamma_2$, and of $m_j$, $j\in J$, are absolutely continuous. Hence, we can pass to the set $E_2 = E_1 \setminus M_{\Gamma_2}^\delta$, and so on. After a finite number of steps, the family $\mathcal{G}$ will consist of a single set $\Gamma_N$. Hence, by Theorem 7, on the set $M_{\Gamma_N}^\delta$, each of the functionals $\lambda_i$, $i\in \Gamma_N$, and $m_j$, $j\in J$, is absolutely continuous, while on the remaining set $E_N$ all the $\lambda_i$ vanish, and another appeal to Theorem 7 shows that all $m_j$, $j\in J$, are absolutely continuous there as well. Theorem 8 is proved. $\Box$
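The finite induction in this proof has a simple combinatorial core. The following schematic sketch (my own illustration with invented discrete data, not a construction from the paper) mimics the passage $E \to E_1 \to E_2 \to \cdots$: at each step it picks a maximal essential index set $\Gamma$, collects the corresponding time set, and removes it.

```python
def peel(E, active_set):
    """Schematic of the induction in Theorem 8: E is a finite stand-in for
    the time set, and active_set(t) plays the role of the active index set
    {i : phi_i(w_hat(t)) >= -delta}. At each step, pick a maximal essential
    Gamma, collect its time set M_Gamma, and pass to the remainder."""
    chunks, remaining = [], set(E)
    while remaining:
        essential = {frozenset(active_set(t)) for t in remaining}
        Gamma = max(essential, key=len)  # maximal w.r.t. inclusion
        M = {t for t in remaining if set(active_set(t)) >= Gamma}
        chunks.append((set(Gamma), M))
        remaining -= M                   # E_{n+1} = E_n \ M_Gamma
    return chunks

# toy data: six "time instants" with their active index sets
acts = {0: {1}, 1: {1, 2}, 2: set(), 3: {1, 2}, 4: {2}, 5: set()}
for Gamma, M in peel(range(6), acts.get):
    print(sorted(Gamma), "->", sorted(M))
```

The sketch only records the order in which the time sets are peeled off; in the proof itself, each peeled set is handled by an application of Theorem 7.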

Applying this theorem to (4.3), the scalar functions $\varphi_i(w)$, the vector functions $p_i(w) = \varphi'_{iu}(w)$, $q_j(w) = g'_{ju}(w)$, the set $E_+$ in Problem $\mathrm{B}^\theta$, the compact set $D = \{w\in \widehat D\mid \varphi(w)\leqslant0,\ g(w)=0\}$, where $\widehat D$ is the compact set containing the optimal process, and the function $w^\theta(\tau) \in D$, we find that all functionals $\lambda_i$ and $m_j$ are functions from $L_1(E_+)$.

9.3. Some properties of functions of bounded variation

On an interval $\Delta=[t_0,t_1]$, consider a linear differential equation with respect to a vector function $\psi \in BV(\Delta)$ (treated as a column):

$$ \begin{equation} d\psi(t)= A(t)\psi(t)\,dt+B(t)\lambda(t)\,dt+ G(t)\,d\mu(t), \qquad \psi(t_0) = \psi_0, \end{equation} \tag{9.10} $$
where $A$, $B$, $G$ are given measurable matrices of the corresponding dimensions, $A$ is integrable, $B$, $G$ are bounded, the function $\mu\in BV(\Delta)$ (that is, the measure $d\mu\in C^*(\Delta)$), $\lambda\in L_1(\Delta)$, and $\psi_0 \in \mathbb{R}^{d(\psi)}$.

Assume that the functions $\psi\in BV(\Delta)$ are left-continuous, that is, $\psi(t-0)= \psi(t)$ for $t\in (t_0,t_1]$, define $\psi(t_0-0)= \psi(t_0)$, and assume also that there exists a value $\psi(t_1+0)$. Then the measure $d\psi$ and the function $\psi$ are related via

$$ \begin{equation*} \psi(t) = \int_{t_0-0}^{t-0} d\psi, \quad t\in (t_0,t_1], \quad \text{and} \quad \psi(t_1+0) = \psi(t_1) + \Delta\psi(t_1), \end{equation*} \notag $$
where we also have
$$ \begin{equation*} \|d\psi\|_{C^*}= \int_{t_0-0}^{t_1+0} |d\psi|,\qquad \|\psi\|_{BV} = |\psi(t_0)| + \|d\psi\|_{C^*}\,. \end{equation*} \notag $$
Note that $\|\psi\|_\infty = \max_{[t_0-0,\, t_1+0]} |\psi(t)| \leqslant \|\psi\|_{BV}$. If $\psi$ is absolutely continuous, then $\|\psi\|_{BV} = \|\psi\|_{AC} = |\psi(t_0)| + \int_{t_0}^{t_1} |\dot\psi(t)|\,dt$.
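For a quick numerical illustration of these norm conventions (my own sketch with an invented step function; `numpy` is an assumption, not part of the paper), note that for a left-continuous piecewise-constant $\psi$ with jumps $d_1,\dots,d_m$ we have $\|\psi\|_{BV} = |\psi(t_0)| + \sum_i |d_i|$, which a discretized total variation on a fine grid recovers:

```python
import numpy as np

def bv_norm(values):
    """values: samples of a (left-continuous) step function on a fine grid;
    returns |psi(t0)| plus the discretized total variation."""
    return abs(values[0]) + np.sum(np.abs(np.diff(values)))

t = np.linspace(0.0, 1.0, 1001)
# left-continuous step function with jumps -1.5 and +2.5
psi = np.where(t <= 0.3, 1.0, np.where(t <= 0.7, -0.5, 2.0))
print(bv_norm(psi))   # 1 + 1.5 + 2.5 = 5.0
```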

The facts presented in this section are well known; their proofs are given for the convenience of the reader. The following lemma is actually taken from [30].

Lemma 6. For any initial condition $\psi(t_0)=\psi_0$, equation (9.10) has a unique solution $\psi(t)$, which is continuous at all points of continuity of the measure $d\mu$ and satisfies the estimate

$$ \begin{equation} \|\psi\|_{BV} \leqslant \mathrm{const} \biggl(|\psi_0| + \int_{t_0}^{t_1}|\lambda(t)|\,dt + \int_{t_0-0}^{t_1+0}|d\mu(t)| \biggr). \end{equation} \tag{9.11} $$

Proof. Consider the function of bounded variation
$$ \begin{equation} \rho(t) = \int_{t_0}^{t-0} \bigl(B(\tau)\lambda(\tau)\,d\tau + G(\tau)\,d\mu(\tau)\bigr), \qquad \rho(t_0)=0. \end{equation} \tag{9.12} $$
Obviously, this function is continuous at all points of continuity of the measure $d\mu$ and generates the measure $d\rho = B \lambda\,dt + G\,d\mu$. Hence $\|\rho\|_{BV} \leqslant \mathrm{const}(\|\lambda\|_1 + \|d\mu\|_{C^*})$, and equation (9.10) now has the form
$$ \begin{equation} d\psi(t)= A(t) \psi(t)\,dt+ d\rho(t), \qquad \psi(t_0) = \psi_0. \end{equation} \tag{9.13} $$
Let us find its solution in the form $\psi = \overline\psi +\rho$. We have $d\overline\psi = A(\overline\psi + \rho)\, dt$, and hence the function $\overline\psi$ is absolutely continuous and satisfies the linear ordinary differential equation
$$ \begin{equation} \dot{\overline\psi}= A(\overline\psi +\rho), \qquad \overline\psi(t_0) = \psi_0. \end{equation} \tag{9.14} $$
As is well known, it has a unique solution, and moreover,
$$ \begin{equation*} \|\overline\psi\|_{BV}= \|\overline\psi\|_{AC} \leqslant \mathrm{const}(|\psi_0| + \|\rho\|_\infty) \leqslant \mathrm{const}(|\psi_0| + \|\rho\|_{BV}). \end{equation*} \notag $$
This implies that $\psi = \overline\psi +\rho$ satisfies the required estimate (9.11). $\Box$
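The splitting $\psi = \overline\psi + \rho$ used in the proof is also convenient numerically. The sketch below (my own illustration with invented scalar data, not code from the paper) integrates (9.10) with $B\lambda \equiv 0$ and a single atom $d\mu = 2\,\delta_{0.4}$: the function $\rho$ carries the jump, and (9.14) is solved by a plain Euler scheme.

```python
import numpy as np

t0, t1, n = 0.0, 1.0, 10_000
t = np.linspace(t0, t1, n + 1)
dt = (t1 - t0) / n
A, G = -1.0, 1.0                 # scalar coefficients, d(psi) = 1
t_jump, jump = 0.4, 2.0          # d mu = 2 * delta at t = 0.4

# rho(t) = integral of G dmu over [t0, t): left-continuous by construction
rho = np.where(t > t_jump, G * jump, 0.0)

# psibar' = A (psibar + rho), psibar(t0) = psi_0   -- equation (9.14)
psibar = np.empty(n + 1)
psibar[0] = 1.0
for k in range(n):
    psibar[k + 1] = psibar[k] + dt * A * (psibar[k] + rho[k])

psi = psibar + rho   # BV solution of (9.10): a single jump of size 2 at 0.4
print(psi[t <= t_jump][-1], psi[t > t_jump][0])   # values around the jump
```

The printed values differ by approximately $2$, the mass of the atom, while $\psi$ is continuous elsewhere, in accordance with Lemma 6.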

Lemma 7. Let, as $k\to\infty$, $\lambda^k \to \lambda^0$ weakly in $L_1(\Delta)$ (that is, in the duality with $L_\infty(\Delta)$), let the measures $d\mu^k \to d\mu^0$ w$^*$-converge in the space $C^*(\Delta)$, and let the initial values $\psi^k_0\to \psi^0_0$. Then the corresponding solutions $\psi^k(t)$ to equation (9.10) converge to $\psi^0(t)$ at all points of continuity of the limit measure $d\mu^0$, and hence almost everywhere on $\Delta$. In addition, $\|\psi^k\|_\infty \leqslant\mathrm{const}$, $\|\psi^k -\psi^0\|_1 \to0$, and the measures $d\psi^k \overset{\text{w}^*}\to d\psi^0$ in $C^*(\Delta)$.

Proof. Let us construct the functions $\rho^k$, $\rho^0$ corresponding to the triples $(\lambda^k, d\mu^k, \psi^k_0)$ and $(\lambda^0, d\mu^0, \psi^0_0)$ by formula (9.12). By the assumptions of the lemma, $d\rho^k \overset{\text{w}^*}\to d\rho^0$ in $C^*(\Delta)$, whence, as is known, $\rho^k(t)\to \rho^0(t)$ at all points of continuity of the limit measure $d\rho^0$ and, a fortiori, at all points of continuity of the measure $d\mu^0$. In view of (9.12), $\|\rho^k\|_\infty \leqslant \mathrm{const} (\|\lambda^k\|_1 + \|d\mu^k\|) \leqslant \mathrm{const}$, whence, by the Lebesgue dominated convergence theorem, $\|A(t)(\rho^k(t) -\rho^0(t))\|_1 \to 0$, and now (9.14) implies that the corresponding $\overline\psi{}^{\,k}$ converge to $\overline\psi{}^{\,0}$ everywhere on $\Delta$, and hence $\psi^k(t)\to \psi^0(t)$ at all points of continuity of the measure $d\mu^0$. Since $\|\psi^k\|_\infty \leqslant \|\psi^k\|_{BV} \leqslant \mathrm{const}$ by (9.11), we have $\|\psi^k -\psi^0\|_1 \to0$ and $\|A(t)(\psi^k(t) -\psi^0(t))\|_1 \to 0$, and now, by (9.13), the measures $d\psi^k \overset{\text{w}^*}\to d\psi^0$ in $C^*(\Delta)$. $\Box$

Lemma 8 (on the limit of jumps of measures). Let measures $d\mu^k \geqslant0$ be such that $d\mu^k \overset{\text{w}^*}\to d\mu^0$ on the closed interval $\Delta = [t_0, t_1]$. Then, for any point $t_* \in \Delta$,

$$ \begin{equation} \limsup_k \Delta\mu^k(t_*) \leqslant \Delta\mu^0(t_*). \end{equation} \tag{9.15} $$
(The inequality may be strict: the w$^*$-limit measure can acquire mass concentrating at the given point.)

Proof. We set $\Delta\mu^0(t_*) = c\geqslant0$. We first suppose that $t_* \in \operatorname{int}\Delta$. Let $\varepsilon>0$ and let $t'< t_*< t''$ be such that $\mu^0(t'') -\mu^0(t') < c+\varepsilon$. Then, on any smaller interval $[\tau_1,\tau_2] \subset [t',t'']$, we still have $\mu^0(\tau_2) -\mu^0(\tau_1)< c+\varepsilon$. The w$^*$-convergence $d\mu^k \to d\mu^0$ implies that $\mu^k(t)\to \mu^0(t)$ for almost all $t\in [t',t'']$. Now, we take any two points $\tau_1\in (t',t_*)$ and $\tau_2\in (t_*, t'')$ at which
$$ \begin{equation*} \mu^k(\tau_1) \to \mu^0(\tau_1), \qquad \mu^k(\tau_2) \to \mu^0(\tau_2). \end{equation*} \notag $$
Then, for sufficiently large $k$,
$$ \begin{equation*} \Delta\mu^k(t_*) \leqslant \mu^k(\tau_2) - \mu^k(\tau_1)= \mu^0(\tau_2) - \mu^0(\tau_1) + o(1)< c+\varepsilon + o(1), \end{equation*} \notag $$
whence $\limsup_k\Delta\mu^k(t_*) \leqslant c+\varepsilon$. Since $\varepsilon$ is arbitrary, we obtain the required estimate (9.15). In the case $t_* =t_0$ or $t_* =t_1$, the same arguments work with a small modification. The lemma is proved. $\Box$
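A toy example (my own, not from the paper) showing that (9.15) can hold with strict inequality: the measures $d\mu^k = \delta_{t_*+1/k}$ w$^*$-converge to $d\mu^0 = \delta_{t_*}$, each $\mu^k$ has no jump at $t_*$, yet the limit measure has jump $1$ there.

```python
import numpy as np

# dmu^k = delta at t_* + 1/k; w*-convergence means that, for every continuous
# test function f, the integral of f against dmu^k tends to f(t_*).
t_star = 0.5
f = np.cos   # an arbitrary continuous test function

for k in (1, 10, 100, 1000):
    print(k, float(f(t_star + 1.0 / k)))   # -> f(t_star) = cos(0.5)
print("limit:", float(f(t_star)))

# Jumps at t_*: each mu^k is continuous at t_*, while mu^0 jumps by 1 there,
# so limsup_k Delta mu^k(t_*) = 0 < 1 = Delta mu^0(t_*), consistent with (9.15).
jumps_k = [0.0] * 4
assert max(jumps_k) <= 1.0
```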

Lemma 9. Let measures $d\mu^k \geqslant0$ be such that $d\mu^k \overset{\text{w}^*}\to d\mu^0$ on the closed interval $\Delta = [t_0, t_1]$. Then, for any closed interval $D \subset \Delta$,

$$ \begin{equation*} \limsup_k \int_{D} d\mu^k \leqslant \int_{D} d\mu^0. \end{equation*} \notag $$

The proof proceeds as above and is hence omitted.

Lemma 10. Let measures $d\mu^k \geqslant0$ be such that $ d\mu^k \xrightarrow{{w^*}} d\mu^0$, $ \mu^k(t_0)= \mu^0(t_0)=0$ and let functions $b^k \in L_1(\Delta)$ be such that $ b^k\overset{\text{w}} \to b^0 \in L_1(\Delta)$. Consider the measures

$$ \begin{equation*} \begin{alignedat}{1} d h^k &= b^k(t)\,dt+ a(t)\,d\mu^k, \\ d h^0 &= b^0(t)\,dt+ a(t)\,d\mu^0, \end{alignedat} \end{equation*} \notag $$
where $a(t)$ is a continuous function on $\Delta$, and suppose that $h^k(t_0)\to h^0(t_0)$. Let, at some point $t_*\in \operatorname{int} \Delta$, for all $k$,
$$ \begin{equation} h^k(t_*-0)+ \rho^k a(t_*)\Delta\mu^k(t_*) \leqslant 0, \qquad \rho^k \in [0,1]. \end{equation} \tag{9.16} $$
Then there exists $\rho^0 \in [0,1]$ such that
$$ \begin{equation} h^0(t_*-0)+ \rho^0 a(t_*)\Delta\mu^0(t_*) \leqslant 0. \end{equation} \tag{9.17} $$

Proof. Consider the case $a(t_*)\geqslant0$. Then the term $\rho^k a(t_*)\Delta\mu^k(t_*)$ in (9.16) is non-negative, so $h^k(t_*-0)\leqslant 0$ for all $k$, and it suffices to prove the limit inequality $h^0(t_*-0) \leqslant 0$: then (9.17) holds with $\rho^0=0$.

By w$^*$-convergence $d\mu^k \to d\mu^0$, the norms $\|d\mu^k\|_{C^*}$ are uniformly bounded by some constant $M$. We fix any $\varepsilon>0$. By the continuity of $a(t)$ and the weak convergence $b^k \to b^0$, there exists $\delta>0$ such that $a(t)\geqslant - \varepsilon/M$ on $(t_* -\delta,\, t_*)$, and, for all $k$,

$$ \begin{equation*} \int_{t_* -\delta}^{t_*} |b^k(t)|\,dt < \varepsilon. \end{equation*} \notag $$
Hence, for every $t$ in this interval,
$$ \begin{equation*} h^k(t_*-0)- h^k(t) \geqslant -\int_t^{t_*} |b^k(\tau)|\,d\tau- \frac{\varepsilon}{M} \int_t^{t_*} d\mu^k \geqslant\, - 2\varepsilon, \end{equation*} \notag $$
whence $h^k(t) \leqslant h^k(t_*-0) +2\varepsilon$. Since $h^k(t_*-0) \leqslant 0$, we get $h^k(t)\leqslant 2\varepsilon$ on the interval $(t_* -\delta,\, t_*)$. Since $\varepsilon$ and $\delta$ are independent of $k$, and since $h^k(t) \to h^0(t)$ almost everywhere by the assumptions of the lemma, we have $h^0(t)\leqslant 2\varepsilon$ almost everywhere on the same interval, and hence $h^0(t_*-0)\leqslant 2\varepsilon$. Consequently, $h^0(t_*-0)\leqslant 0$, since $\varepsilon>0$ is arbitrary.

In the case $a(t_*)< 0$, we have $a(t_*)\Delta\mu^k(t_*) \leqslant \rho^k a(t_*)\Delta\mu^k(t_*)$, so (9.16) also holds with $\rho^k=1$, that is, $h^k(t_*-0) + a(t_*)\Delta\mu^k(t_*) = h^k(t_*+0)\leqslant0$ for all $k$. Making the change $t \mapsto \tau = t_0+t_1-t$, we arrive at the case already considered, with $\widetilde a(\tau_*) = -a(t_*)>0$ and the inequality $\widetilde h^k(\tau_*-0) \leqslant0$. $\Box$

The next lemma is a generalization of the above result to the case of a finite number of measures $d\mu^k_j$. Let $d\mu^k_j$ be measures such that

$$ \begin{equation*} d\mu^k_j \geqslant0, \quad d\mu^k_j \xrightarrow{w^*} d\mu^0_j \quad \text{as}\ k\to \infty,\quad \mu^k_j(t_0)= \mu^0_j(t_0)=0, \end{equation*} \notag $$
$j=1,\dots, N$, and let $b^k \in L_1(\Delta)$ be functions such that $b^k \to b^0 \in L_1(\Delta)$ weakly. Consider the measures
$$ \begin{equation*} \begin{alignedat}{1} d h^k &= b^k(t)\,dt+ a_1(t)\,d\mu^k_1 + \dots + a_N(t)\, d\mu^k_N, \\ d h^0 &= b^0(t)\,dt+ a_1(t)\,d\mu^0_1 + \dots + a_N(t)\,d\mu^0_N, \end{alignedat} \qquad h^k(t_0)\to h^0(t_0), \end{equation*} \notag $$
where $N$ is a positive integer and the functions $a_j(t)\geqslant 0$ are continuous on $\Delta$.

Clearly, $d h^k \xrightarrow{{w^*}} d h^0$ and $h^k(t)\to h^0(t)$ almost everywhere on $\Delta$.

Lemma 11. Let, at some point $t_*\in \operatorname{int} \Delta$,

$$ \begin{equation} h^k(t_*-0)+ \sum_{j=1}^N \rho^k_j\, a_j(t_*)\Delta\mu^k_j(t_*) \leqslant 0, \qquad \rho^k_j \in [0,1], \end{equation} \tag{9.18} $$
for all $k$. Then there exist $\rho^0_j \in [0,1]$ such that
$$ \begin{equation} h^0(t_*-0)+ \sum_{j=1}^N \rho^0_j\, a_j(t_*)\Delta\mu^0_j(t_*) \leqslant 0. \end{equation} \tag{9.19} $$

Proof. Consider arbitrary $\gamma_j$, $j=1,\dots, N$, such that $\sum \gamma_j = 1$. For any $j$, we introduce the measures
$$ \begin{equation*} d h^k_j = \gamma_j b^k\,dt + a_j\,d \mu^k_j, \quad h^k_j(t_0)= \gamma_j h^k(t_0), \qquad k = 0,1,2,\dots, \end{equation*} \notag $$
which define the functions $h^k_j(t)$. We also set $\widetilde h^k = \sum_j h^k_j$. We have $d\widetilde h^k = d h^k$, $\widetilde h^k(t_0) = h^k(t_0)$, and hence $\widetilde h^k = h^k$ everywhere on $\Delta$ if both are assumed to be left-continuous. For any $j=1,\dots, N$ and each $k$, set
$$ \begin{equation*} h^k_j(t_*-0)+ \rho^k_j\, a_j(t_*)\Delta\mu^k_j(t_*)= c^k_j. \end{equation*} \notag $$
Summing over all $j$ and using (9.18), we get $\sum_j c^k_j\leqslant0$. The sequences $c^k_j$ are bounded, so, passing to a subsequence, we may assume that $c^k_j \to c_j$ for each $j$, whence $\sum_j c_j\leqslant0$. An application of Lemma 10 to the functions $h^k_j(t) - c^k_j$ and $\widetilde b^k_j(t) = \gamma_j b^k(t)$, for any $j$, shows that there exist numbers $\rho^0_j\in [0,1]$ such that
$$ \begin{equation*} h^0_j(t_*-0)+ \rho^0_j\, a_j(t_*)\Delta\mu^0_j(t_*) \leqslant c_j. \end{equation*} \notag $$
Summing these inequalities over all $j=1,\dots, N$ and taking into account that $h^0_j = \gamma_j h^0$, we arrive at (9.19). $\Box$

The author is grateful to N. P. Osmolovskii for useful discussions.


Bibliography

1. R. V. Gamkrelidze, “Optimal control processes for bounded phase coordinates”, Izv. Akad. Nauk SSSR Ser. Mat., 24:3 (1960), 315–356 (Russian)
2. M. R. Hestenes, Calculus of variations and optimal control theory, John Wiley & Sons, Inc., New York–London–Sydney, 1966
3. R. F. Hartl, S. P. Sethi, and R. G. Vickson, “A survey of the maximum principles for optimal control problems with state constraints”, SIAM Rev., 37:2 (1995), 181–218
4. A. Dmitruk and I. Samylovskiy, “On the relation between two approaches to necessary optimality conditions in problems with state constraints”, J. Optim. Theory Appl., 173:2 (2017), 391–420
5. A. Ya. Dubovitskii and A. A. Milyutin, “Extremum problems in the presence of restrictions”, Zh. Vychisl. Mat. Mat. Fiz., 5:3 (1965), 395–453; English transl. U.S.S.R. Comput. Math. Math. Phys., 5:3 (1965), 1–80
6. A. A. Milyutin, “Maximum principle in a regular problem of optimal control”, Necessary condition in optimal control, Chaps. 1–5, Nauka, Moscow, 1990 (Russian)
7. A. A. Milyutin, A. V. Dmitruk, and N. P. Osmolovskii, Maximum principle in optimal control, Moscow State Univ., Faculty of Mech. and Math., Moscow, 2004, https://kafedra-opu.ru/node/139 (Russian)
8. A. A. Milyutin, “General schemes of necessary conditions for extrema and problems of optimal control”, Uspekhi Mat. Nauk, 25:5(155) (1970), 110–116; English transl. Russian Math. Surveys, 25:5 (1970), 109–115
9. A. Ya. Dubovitskii and A. A. Milyutin, “Necessary conditions for a weak extremum in optimal control problems with mixed constraints of the inequality type”, Zh. Vychisl. Mat. Mat. Fiz., 8:4 (1968), 725–779; English transl. U.S.S.R. Comput. Math. Math. Phys., 8:4 (1968), 24–98
10. A. Ya. Dubovitskii and A. A. Milyutin, Necessary weak extremum conditions in a general optimal control problem, Nauka, In-t Khim. Fiz. AN SSSR, Moscow, 1971 (Russian)
11. A. Ya. Dubovitskii and A. A. Milyutin, “Maximum principle theory”, Methods of the theory of extremal problems in economics, ed. V. L. Levin, Nauka, Moscow, 1981, 6–47 (Russian)
12. K. Makowski and L. W. Neustadt, “Optimal control problems with mixed control-phase variable equality and inequality constraints”, SIAM J. Control, 12:2 (1974), 184–228
13. A. M. Ter-Krikorov, “Convex programming in a space adjoint to a Banach space and convex optimal control problems with phase constraints”, Zh. Vychisl. Mat. Mat. Fiz., 16:2 (1976), 351–358; English transl. U.S.S.R. Comput. Math. Math. Phys., 16:2 (1976), 68–75
14. A. N. Dyukalov and A. Y. Ilyutovich, “Indicator of optimality in nonlinear control problems with mixed constraints. I, II”, Avtomat. i Telemekh., 1977, no. 3, 96–106; no. 5, 11–20; English transl. Autom. Remote Control, 38:3 (1977), 381–389; 38:5 (1977), 620–628
15. A. V. Dmitruk, “Maximum principle for the general optimal control problem with phase and regular mixed constraints”, Optimality of control dynamical systems, 14, Nauka, Vsesoyuznyi Nauchno-Issled. Inst. Sist. Issled., Moscow, 1990, 26–42; English transl. Comput. Math. and Modeling, 4 (1993), 364–377
16. R. V. Gamkrelidze, “Optimal sliding states”, Dokl. Akad. Nauk SSSR, 143:6 (1962), 1243–1245; English transl. Soviet Math. Dokl., 3 (1962), 559–562
17. E. N. Devdariani and Yu. S. Ledyaev, “Maximum principle for implicit control systems”, Appl. Math. Optim., 40:1 (1999), 79–103
18. M. R. de Pinho and J. F. Rosenblueth, “Necessary conditions for constrained problems under Mangasarian–Fromowitz conditions”, SIAM J. Control Optim., 47:1 (2008), 535–552
19. F. Clarke and M. R. de Pinho, “Optimal control problems with mixed constraints”, SIAM J. Control Optim., 48:7 (2010), 4500–4524
20. H. A. Biswas and M. d. R. de Pinho, “A maximum principle for optimal control problems with state and mixed constraints”, ESAIM Control Optim. Calc. Var., 21:4 (2015), 939–957
21. A. Boccia, M. D. R. de Pinho, and R. B. Vinter, “Optimal control problems with mixed and pure state constraints”, SIAM J. Control Optim., 54:6 (2016), 3061–3083
22. An Li and J. J. Ye, “Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints”, Set-Valued Var. Anal., 24:3 (2016), 449–470
23. R. Andreani, V. A. de Oliveira, J. T. Pereira, and G. N. Silva, “A weak maximum principle for optimal control problems with mixed constraints under a constant rank condition”, IMA J. Math. Control Inform., 37:3 (2020), 1021–1047
24. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The mathematical theory of optimal processes, Fizmatgiz, Moscow, 1961; 2nd ed., Nauka, Moscow, 1969; English transl. of the first edition, Intersci. Publ. John Wiley & Sons, Inc., New York–London, 1962
25. A. V. Dmitruk and N. P. Osmolovskii, “On the proof of Pontryagin's maximum principle by means of needle variations”, Fundam. Prikl. Mat., 19:5 (2014), 49–73; English transl. J. Math. Sci. (N.Y.), 218:5 (2016), 581–598
26. G. G. Magaril-Il'yaev, “The Pontryagin maximum principle. Ab ovo usque ad mala”, Optimal control (collected papers in commemoration of the 105th anniversary of Academician Lev Semenovich Pontryagin), Trudy Mat. Inst. Steklova, 291, MAIK “Nauka/Interperiodica”, Moscow, 2015, 215–230; English transl. Proc. Steklov Inst. Math., 291 (2015), 203–218
27. A. V. Dmitruk and N. P. Osmolovskii, “Variations of the $v$-change of time in problems with state constraints”, Trudy Inst. Mat. i Mekh. UrO RAN, 24:1 (2018), 76–92; English transl. Proc. Steklov Inst. Math. (Suppl.), 305, suppl. 1 (2019), S49–S64
28. A. V. Dmitruk and N. P. Osmolovskii, “Proof of the maximum principle for a problem with state constraints by the $v$-change of time variable”, Discrete Contin. Dyn. Syst. Ser. B, 24:5 (2019), 2189–2204
29. A. V. Dmitruk, “Approximation theorem for a nonlinear control system with sliding modes”, Dynamical systems and optimization (collected papers dedicated to the 70th birthday of Academician Dmitrii Viktorovich Anosov), Trudy Mat. Inst. Steklova, 256, Nauka, MAIK Nauka/Interperiodika, Moscow, 2007, 102–114; English transl. Proc. Steklov Inst. Math., 256 (2007), 92–104
30. A. A. Milyutin, The maximum principle in the general problem of optimal control, Fizmatlit, Moscow, 2001 (Russian)
31. A. V. Dmitruk, “On the development of Pontryagin's maximum principle in the works of A. Ya. Dubovitskii and A. A. Milyutin”, Control Cybernet., 38:4A (2009), 923–957
32. “Necessary extremum conditions (Lagrange principle)”, Optimal control, Chap. 3, eds. V. M. Tikhomirov and N. P. Osmolovskii, MCCME, Moscow, 2008, 89–122 (Russian)
33. A. A. Milyutin and N. P. Osmolovskii, “First order conditions”, Calculus of variations and optimal control, transl. from the Russian manuscript, Transl. Math. Monogr., 180, Amer. Math. Soc., Providence, RI, 1998
34. A. V. Dmitruk and A. M. Kaganovich, “Maximum principle for optimal control problems with intermediate constraints”, Nonlinear dynamics and control, 6, Fizmatlit, Moscow, 2008, 101–136; English transl. Comput. Math. Model., 22:2 (2011), 180–215
35. A. V. Dmitruk and N. P. Osmolovskii, “Necessary conditions for a weak minimum in optimal control problems with integral equations subject to state and mixed constraints”, SIAM J. Control Optim., 52:6 (2014), 3437–3462
36. A. Dmitruk and N. Osmolovskii, “A general Lagrange multipliers theorem”, Constructive nonsmooth analysis and related topics (dedicated to the memory of V. F. Demyanov), CNSA-2017 (St. Petersburg, 2017), IEEE, 2017, 82–84
37. A. V. Dmitruk and N. P. Osmolovskii, “A general Lagrange multipliers theorem and related questions”, Control systems and mathematical methods in economics, Lecture Notes in Econom. and Math. Systems, 687, Springer, Cham, 2018, 165–194

Citation: A. V. Dmitruk, “Variations of $v$-change of time in an optimal control problem with state and mixed constraints”, Izv. RAN. Ser. Mat., 87:4 (2023), 91–132; Izv. Math., 87:4 (2023), 726–767