Izvestiya: Mathematics, 2023, Volume 87, Issue 4, Pages 726–767
DOI: https://doi.org/10.4213/im9305e
 

Variations of v-change of time in an optimal control problem with state and mixed constraints

A. V. Dmitruk

Steklov Mathematical Institute of Russian Academy of Sciences, Moscow
Abstract: For a general optimal control problem with state and regular mixed constraints, we propose a proof of the maximum principle based on the so-called v-change of the time variable t \mapsto \tau, under which the original time becomes an additional state variable subject to the equation dt/d\tau = v(\tau), while the additional control variable v(\tau)\geqslant 0 is piecewise constant and its values become arguments of the new problem.
Keywords: state and mixed constraints, positively linearly independent vectors, v-change of time, Lebesgue–Stieltjes measure, stationarity conditions, Lagrange multipliers, functionals on L_\infty, weak* compactness, maximum principle.
This work was supported by the Russian Science Foundation under grant no. 20-11-20169, https://rscf.ru/en/project/20-11-20169/.
Received: 20.12.2021
Revised: 31.08.2022
Document Type: Article
UDC: 517.97
MSC: 49K15, 49K27
Language: English
Original paper language: Russian

§ 1. Introduction

Optimal control problems with state and mixed constraints are widely used in theoretical and applied research. Their study was initiated in 1960 by Gamkrelidze [1], whose methods were developed in a number of later papers (see, for example, [2]–[4]). However, it is well known that the extension of the Pontryagin maximum principle (MP) to such problems involves considerable difficulties due to an infinite (uncountable) number of inequality constraints. The principal difficulty here, which already appears in obtaining stationarity conditions (the Euler–Lagrange equation), consists in the characterization of the Lagrange multipliers corresponding to these constraints.

Dubovitskii and Milyutin [5] proposed to treat the state constraint \Phi(t,x(t))\leqslant 0 as an inclusion in the cone of non-positive functions in the space C of continuous functions on a given time interval; in this case, the corresponding Lagrange multiplier is represented by an element of the dual space C^*, that is, by a Lebesgue–Stieltjes measure. Using this approach, necessary conditions for the weak minimum (that is, stationarity conditions) can be obtained rather simply (see, for example, [5]–[7]). The authors of [5]–[7] also proposed a method for reducing the initial problem to a family of auxiliary (associated) problems (see [8]), and, from the stationarity conditions in these problems, they obtained a generalization of the Pontryagin MP (that is, necessary conditions for the strong minimum) for problems with state constraints. This approach is based on the so-called v-change of the time variable, which will be described later (but other classes of variations can also be employed here).

By analogy with state constraints, it was proposed to treat mixed constraints of the form \varphi(t,x(t),u(t))\leqslant 0 as an inclusion in the cone of non-positive L_\infty-functions; with this proviso, the corresponding multipliers are elements of the dual space L_\infty^*. However, in general, the characterization of such multipliers is not an easy task, because they may contain so-called singular components. In the case of regular mixed constraints (where their gradients with respect to u are non-degenerate in a certain sense), it can be shown that the Lagrange multipliers do not contain singular components, and they all lie in the space L_1. Therefore, in this case, regular mixed constraints are even simpler to handle than purely state constraints in the formulation of optimality conditions.

Based on the stationarity conditions thus obtained and using again the v-change of time, Dubovitskii and Milyutin also derived an MP for problems involving both state and regular mixed constraints. However, the proof of this result became public long after its discovery (see [6], Ch. 5), since at that time Dubovitskii and Milyutin were concentrating on problems with general mixed constraints without the regularity assumption [9]–[11]. Other scholars started working on problems with regular mixed constraints in the mid-1970s (see [12]–[14]). In [15], the author of the present paper, then a postgraduate student of A. A. Milyutin, implemented Milyutin's idea of using the so-called sliding modes (introduced earlier by Gamkrelidze [16] for a proof of existence of solutions to optimal control problems); he also gave a complete proof of an MP for problems with regular mixed constraints of equality and inequality types.

There have been relatively few other studies of optimality conditions for problems with constraints of this type (see [3], [17]–[23]). Usually, such studies either consider particular statements of the problem under more restrictive regularity assumptions or, vice versa, study generalizations of the problem involving non-smooth (Lipschitz-continuous) constraints, obtaining the corresponding versions of stationarity conditions and MPs by the machinery of non-smooth analysis. However, we believe that the smooth case is the most important and deserves special attention, the more so that an application of “non-smooth” conditions to the smooth case usually produces results rougher than the “smooth” ones.

The general v-change of time consists in a passage from the original time t to a new time \tau so that the original time t=t(\tau) becomes an additional state variable subject to the equation dt/d\tau=v(\tau), where v(\tau)\geqslant 0 is an additional control. A key point here is that this transformation is not one-to-one (on the intervals where v(\tau)=0), and because of this, small variations of the control v(\tau) generate non-small (Pontryagin-type) variations of the original control u(t). However, this approach requires a deep knowledge of the theory of functions of a real variable.

At the end of the 1990s, A. A. Milyutin proposed a simplified version of the v-change of time with a piecewise constant function v(\tau). Using this approach, he proved the maximum principle for a general Pontryagin-type problem, that is, for a problem with endpoint constraints, but without state and mixed constraints. In the case of a piecewise constant v-change of time, small variations of the control v(\tau) generate, in fact, needle-like variations of the control u(t), with a small but substantial difference from the usual ones. The advantages of v-change variations over the usual needle-like variations (packets of needles) are as follows:

a) they can be placed at any point t of the given time interval, while the needle variations can work only at Lebesgue points of the optimal control \widehat{u}(t) (see, for example, [24]–[26]);

b) the constraints of the new problem are defined, at least, in a whole neighbourhood of the parameters of the v-change, whereas the needle variations lead to a problem in which the functions are defined only on a non-negative orthant in a finite-dimensional space (to be exact, on its intersection with a neighbourhood of the origin) corresponding to the needle widths \varepsilon_i\geqslant 0 in the given packet;

c) the problem constraints depend smoothly on the parameters of the v-change, whereas, for needle variations, differentiability of these constraints with respect to the needle width can be guaranteed only at \varepsilon_i=0.

The recent studies [27], [28] show that piecewise constant v-changes of time allow one to obtain an MP also in problems with state constraints. The purpose of the present paper is to show the feasibility of this approach for problems involving both state and mixed constraints. However, here the mere “generalized” needle variations are insufficient in the associated problem — one should also add uniformly small variations of the control (to obtain the stationarity condition in the control \overline H_u=0), and so the problem is now posed in an infinite-dimensional space.

The general structure of the proof, as in [27], [28], is as follows. Piecewise constancy of the function v(\tau) allows one to pass to a problem in which the arguments are the values of v(\tau) on the intervals of its constancy, the values of the control u on the intervals where v(\tau)>0, and the initial value of the state variable x(\tau_0). The presence of state and mixed constraints implies that this problem involves an infinite number of inequality constraints, that is, it is not a usual smooth problem. Nevertheless, optimality conditions in this problem are well known; the only specific feature of these conditions is that they involve support functionals to the cones of non-positive functions in the corresponding spaces. Applying these conditions and rewriting them in terms of the original problem, we obtain a family of corresponding collections of Lagrange multipliers, which form a non-empty compact set relative to a certain topology. Each element of this compact set (that is, a collection of Lagrange multipliers) guarantees the fulfilment of the maximum principle for a finite set of control values and times corresponding to the given v-change. The family of compact sets generated by all possible piecewise constant v-changes is partially ordered by inclusion, and hence forms a centred (Alexandroff-type) system. Taking an arbitrary element of their intersection, we obtain a universal optimality condition, that is, a collection of Lagrange multipliers that guarantees the fulfilment of the maximum principle for all values of the control and time.

The approach we propose here for obtaining an MP for problems with state and regular mixed constraints has an advantage over the one with sliding modes [15], [7] because the latter calls for a proof of a rather difficult (though interesting per se) relaxation theorem justifying the extension (convexification) of the control system by introducing the sliding modes [29], whereas the method of v-variations does not require this.

It is worth pointing out again that the idea of a passage to a family of associated problems in which optimality conditions are already known with a subsequent application of centred systems of compact sets is also due to Dubovitskii and Milyutin (see [8], [6], [30], [31]). This approach was already employed for obtaining an MP both in problems without state constraints [7], [32], [25] and in problems with such constraints [7], [27], [28].

§ 2. Statement of the problem and the maximum principle

Let x(\cdot)\colon [t_0,t_1]\to \mathbb{R}^n be an absolutely continuous function (the state variable) and u(\cdot)\colon [t_0,t_1]\to \mathbb{R}^r a measurable bounded function (the control). The time interval [t_0,t_1] is not fixed a priori. Consider the problem with a Mayer-type cost functional:

\begin{equation} J:=F_0(t_0,x(t_0),t_1,x(t_1))\to \min, \end{equation} \tag{2.1}
\begin{equation} F(t_0,x(t_0),t_1,x(t_1))\leqslant 0, \qquad K(t_0,x(t_0),t_1,x(t_1))=0, \end{equation} \tag{2.2}
\begin{equation} \dot x(t)= f(t,x(t),u(t)) \quad \text{a.e. on }\, [t_0,t_1], \end{equation} \tag{2.3}
\begin{equation} \varphi(t,x(t),u(t)) \leqslant 0,\qquad g(t,x(t),u(t))=0 \quad \text{a.e. on }\, [t_0,t_1], \end{equation} \tag{2.4}
\begin{equation} \Phi(t,x(t))\leqslant0 \quad \text{on } [t_0,t_1]. \end{equation} \tag{2.5}

Here, F, K, f, \varphi, g, \Phi are vector functions of some dimensions, which, to save letters, we denote by d(F), d(K), etc. In constraints (2.2)–(2.5), we always use vector notation, which should be understood coordinatewise. The cost function F_0 assumes real values. The functions of finite-dimensional argument (t_0,x(t_0), t_1,x(t_1)) are defined on an open set \mathcal{P} \subset\mathbb{R}^{2n+2}, and the functions depending on (t,x,u) are defined on an open set \mathcal{Q} \subset \mathbb{R}^{1+n+r}. We assume that all these functions are smooth, that is, they are continuously differentiable with respect to their arguments. For brevity, problem (2.1)–(2.5) will be referred to as Problem \mathrm{A}.

Relations (2.2) are called endpoint (or terminal) constraints, (2.4) are known as mixed constraints, (2.5) are state constraints, and (2.3) is the control system.

In addition to the above smoothness assumptions, we will also assume that the mixed constraints are regular, that is, for any point (t,x,u)\in \mathcal{Q} satisfying (2.4), the system of vectors

\begin{equation} \varphi'_{iu}(t,x,u),\quad i\in I(t,x,u), \qquad g'_{ju}(t,x,u),\quad j=1,\dots, d(g), \end{equation} \tag{2.6}
is positively linearly independent. Here, I(t,x,u) = \{i\mid \varphi_i(t,x,u) =0\} is the set of active indexes for the mixed inequality constraints.

Definition 1. A system of two collections of vectors p_i, i\in I, and q_j, j\in J, from \mathbb{R}^r, where I and J are some finite sets of indexes, is called positively linearly independent (PLI) if

\begin{equation*} \sum_{i\in I} \alpha_i p_i + \sum_{j\in J} \beta_j q_j = 0 \end{equation*} \notag
does not hold for any non-trivial collection of coefficients \alpha_i, i\in I, and \beta_j, j\in J, where all \alpha_i\geqslant 0.

It is easily seen that this requirement is equivalent to saying that a) the vectors q_j are linearly independent, and b) their linear hull does not intersect the convex hull of the vectors p_i. Sometimes, the following dual form of assumption b) is useful: there exists a vector \overline u such that (p_i,\overline u)<0 and (q_j,\overline u) =0 for all i,j.
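To illustrate Definition 1 with a toy example of our own (not from the original), take in \mathbb{R}^2 the vectors p_1=(1,0), p_2=(0,1) and q_1=(1,-1). The vector \overline u=(-1,-1) satisfies (p_1,\overline u)=(p_2,\overline u)=-1<0 and (q_1,\overline u)=0, so, by the dual form of b), this system is PLI. By contrast, the collection p_1=(1,0), p_2=(-1,0) (with no vectors q_j) is not PLI, since

\begin{equation*} 1\cdot p_1 + 1\cdot p_2 = 0 \end{equation*} \notag

is a non-trivial relation with non-negative coefficients \alpha_1=\alpha_2=1.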

Thus, the regularity assumption of mixed constraints means that, at any point where they hold, the gradients in u of the active inequality constraints and of all equality constraints are positively linearly independent.1

Remark 1. In non-smooth problems, the assumption that system (2.6) is PLI is replaced by its non-smooth analog that guarantees that any outer normal (\alpha,\beta) to the set of first order admissible variations of the variables (x,u) satisfies the estimate |\alpha|\leqslant \mathrm{const}\,|\beta|. Geometrically, this means that any support hyperplane to the graph of the set-valued mapping x\mapsto U(t,x) corresponding to the mixed constraints is not close to vertical ones, that is, its slope is bounded. Because of this, this assumption is called the bounded slope condition (see, for example, [17]–[23]).

Remark 2. Note that state equality constraints G(t,x)=0 are not allowed, for otherwise the linearization of the equality constraints of the problem would not, in general, have a closed image — this condition is a basic requirement for obtaining first-order optimality conditions in all classes of optimization problems (see, for example, § 9.1). Such constraints should be differentiated with respect to t and replaced by the mixed constraints G_t(t,x) + G_x(t,x)f(t,x,u) =0, in the hope that their gradients with respect to u together with (2.6) would be positively linearly independent.

Remark 3. For now, we do not allow the traditional inclusion constraints of type u(t)\in U. If a set U\subset \mathbb{R}^r is given by smooth constraints of the form \widetilde\varphi(u)\leqslant0, \widetilde g(u)=0, then these constraints should be treated as mixed constraints together with (2.4), for otherwise the problem ceases to be smooth (and then one has to assume that the support vector to the set U together with the gradients of the mixed constraints with respect to u constitute a PLI system), which we try to avoid because, in this case, the study is much more technically complicated. However, one fairly general class of problems with inclusion constraint will be briefly discussed in § 7 below.

Thus, Problem \mathrm{A} is posed. A pair of functions w(t)=(x(t), u(t)) related by (2.3) together with the interval of their definition [t_0, t_1] will be called a process of the problem. A process is called admissible if its endpoints (t_0,x(t_0),t_1,x(t_1)) belong to the set \mathcal{P}, there exists a compact set D \subset \mathcal{Q} such that (t,w(t))\in D for almost all t, and if all constraints of the problem are satisfied. As usual, we say that an admissible process \widehat{w}(t)=(\widehat{x}(t), \widehat{u}(t)), t\in [\widehat t_0, \widehat t_1], delivers a strong minimum if there exists \varepsilon>0 such that \mathcal{J}(w) \geqslant \mathcal{J}(\widehat{w}) for any admissible process w(t)=(x(t), u(t)), t\in [t_0, t_1], such that

\begin{equation*} |t_0-\widehat t_0|<\varepsilon,\quad |t_1-\widehat t_1|<\varepsilon, \qquad |x(t)-\widehat x(t)|<\varepsilon \quad \text{on } [t_0,t_1]\cap[\widehat t_0,\widehat t_1]. \end{equation*} \notag

We also need the following concept due to Dubovitskii and Milyutin (see [7], [31], [33]). An admissible process \widehat{w}(t)=(\widehat{x}(t), \widehat{u}(t)), t\in [\widehat t_0, \widehat t_1], is said to provide a Pontryagin minimum in Problem \mathrm{A} if, for any number N, it delivers a local minimum with respect to the norm \|x\|_C + \|u\|_1 in the same problem with the additional constraint |u(t)|\leqslant N; that is, if there exists an \varepsilon>0 such that \mathcal{J}(w)\geqslant \mathcal{J}(\widehat{w}) for any admissible process w(t)=(x(t), u(t)), t\in [t_0, t_1] satisfying

\begin{equation*} |t_0-\widehat t_0|< \varepsilon,\quad |t_1-\widehat t_1|< \varepsilon,\quad\; \|x - \widehat{x}\|_C <\varepsilon, \quad\; \|u -\widehat{u}\|_1 <\varepsilon, \quad \|u\|_\infty \leqslant N. \end{equation*} \notag
(Here, both norms are taken on the common interval of definition of the corresponding functions.)

It is clear that the Pontryagin minimum is intermediate between the weak and strong minima. In particular, this type of minimum enables both needle-type and uniformly small variations of the control.

Remark 4. If a reference process (\widehat{x}(t), \widehat{u}(t)) is given, then it suffices to assume regularity of the mixed constraints only along the trajectory \widehat{x}(t), that is, it suffices that system (2.6) be PLI not for all above triples (t,x,u), but only for triples of the form (t,\widehat{x}(t),u).

To avoid the a priori degeneracy of the “standard” optimality conditions, we will assume that the endpoints of the reference process do not lie on the state boundary; more precisely, the following strict inequalities are assumed:

\begin{equation} \Phi(\widehat t_0, \widehat{x}(\widehat t_0))<0, \qquad \Phi(\widehat t_1, \widehat{x}(\widehat t_1))<0. \end{equation} \tag{2.7}

To formulate necessary optimality conditions in Problem \mathrm{A}, we will need the following notation. We introduce the Pontryagin function

\begin{equation*} H(t,x,u)=\psi_x f(t, x, u), \end{equation*} \notag
where \psi_x is a row vector of dimension n (sometimes, the argument \psi_x in H will be omitted); we also define the extended Pontryagin function
\begin{equation*} \overline H(t,x,u)=\psi_x f(t, x, u) - \lambda\varphi(t,x,u) - mg(t,x,u) - \frac{d\mu}{dt}\, \Phi(t,x) \end{equation*} \notag
and the endpoint Lagrange function
\begin{equation*} l(t_0,x_0,t_1,x_1)= (\alpha_0 F_0 + \alpha F + \beta K)(t_0,x_0,t_1,x_1), \end{equation*} \notag
where \alpha_0 is a number, \alpha, \beta are row vectors of the same dimensions as F, K, respectively (the arguments \alpha_0, \alpha, \beta\, in l are omitted), \lambda, m are row vectors of the same dimensions as \varphi, g, and d\mu/dt is a row vector of the same dimension as \Phi.

Let w=(x(t), u(t)), t\in [t_0, t_1], be an admissible process for Problem \mathrm{A}. We will say that it satisfies the maximum principle if there exist a number \alpha_0, row vectors \alpha\in\mathbb{R}^{d(F)}, \beta\in \mathbb{R}^{d(K)}, measurable bounded functions \lambda(t), m(t) of dimensions d(\varphi), d(g), respectively, a non-decreasing function \mu(t) of dimension d(\Phi), functions of bounded variation \psi_x(t), \psi_t(t) of dimensions n,\, 1, respectively (where x, t are indexes, rather than notation for derivatives) such that:

\begin{equation} \begin{aligned} \, &(\mathrm{i})\ \alpha_0\geqslant0,\quad \alpha\geqslant0,\quad \lambda(t)\geqslant0\quad \text{a.e. on }[t_0,t_1]; \nonumber \\ &(\mathrm{ii})\ \alpha_0+|\alpha| + \int_{t_0}^{t_1} \lambda(t)\,dt + \int_{t_0}^{t_1} d \mu(t) >0; \nonumber \\ &(\mathrm{iii})\ \alpha F(t_0,x(t_0),t_1,x(t_1))=0,\quad \lambda(t)\varphi(t,x(t),u(t))=0\quad\text{a.e. on }[t_0,t_1], \nonumber \\ &\qquad \Phi(t,x(t))\,d \mu(t) =0\quad \text{on }[t_0,t_1]; \nonumber \\ &(\mathrm{iv}_x)\ -\dot\psi_x(t)= \overline H_x(t,x(t), u(t)); \nonumber \\ &(\mathrm{iv}_t)\ -\dot\psi_t(t)=\overline H_t(t,x(t), u(t)); \nonumber \\ &(\mathrm{v}_x)\ \psi_x(t_0)=l_{x_0}(t_0,x(t_0),t_1,x(t_1)),\quad \psi_x(t_1)= -l_{x_1}(t_0,x(t_0),t_1,x(t_1)); \nonumber \\ &(\mathrm{v}_t)\ \psi_t(t_0)= l_{t_0}(t_0,x(t_0),t_1,x(t_1)),\quad \psi_t(t_1)= -l_{t_1}(t_0,x(t_0),t_1,x(t_1)); \nonumber \\ &(\mathrm{vi})\ \overline H_u(\psi_x(t),t,x(t), u(t))=0\quad \text{for almost all } t\in[t_0,t_1]; \nonumber \\ &(\mathrm{vii})\ H(\psi_x(t),t,x(t), u(t)) + \psi_t(t)=0\quad \text{for almost all } t\in[t_0,t_1]; \nonumber \\ &(\mathrm{viii})\ H(\psi_x(t\,{-}\,0),t,x(t), u')\,{+}\, \psi_t(t\,{-}\,0)\,{\leqslant}\, 0, \ H(\psi_x(t\,{+}\,0),t,x(t),u') \,{+}\,\psi_t(t\,{+}\,0)\,{\leqslant}\, 0 \nonumber \\ &\text{for all }t\in[t_0,t_1]\text{ and all }u'\text{ such that} \nonumber \end{aligned} \end{equation} \notag
\begin{equation} \\ (t,x(t), u') \in \mathcal{Q}, \qquad \varphi(t,x(t), u')\leqslant0, \qquad g(t,x(t), u') =0. \end{equation} \tag{2.8}
The set of all u' \in \mathbb{R}^r satisfying constraints (2.8) will be denoted by \mathcal{R}(t,x(t)).

The functions \psi_x(t) and \psi_t(t) are known as the adjoint (costate) variables.2 For now, we need not specify from which side they are continuous, assuming only that at each point t they have both left- and right-hand limits; these limits are equal at each continuity point (the discontinuity points form an at most countable set). The function \mu(t) generates a Lebesgue–Stieltjes measure d\mu(t)\geqslant0 on [t_0,t_1] with generalized density d\mu(t)/dt; the third condition in (iii) means that d\mu(t) =0 on any interval where \Phi(t,x(t))<0. (As already mentioned, this pertains to every component of the vector \Phi and the measure d\mu.) In particular, by assumption (2.7), d\mu(t) =0 in some neighbourhoods of the points t_0 and t_1. Note also that, without loss of generality, one can put \mu(t_0)=0.

Relations (i)–(vi) are known as the non-negativity condition, the non-triviality condition, the complementary slackness conditions, the adjoint (costate) equations, the transversality conditions, and the stationarity in the control, respectively. Relation (vii) can be called the law of energy dynamics, since, together with the costate equation (\mathrm{iv}_t) for \psi_t, it yields an equation for the function H, which often plays the role of the energy in mechanical problems:

\begin{equation*} \dot H = \overline H_t \quad \text{or} \quad \frac{dH}{dt}= \frac{\partial \overline H}{\partial t}. \end{equation*} \notag
(If the problem is time-independent, that is, the functions f, g, \varphi, and \Phi do not depend on t, we get the energy conservation law: \dot H =0, that is, H =\mathrm{const}.)
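For the reader's convenience, here is the one-line computation behind this equation (it uses only the conditions stated above): by (vii), H(\psi_x(t),t,x(t),u(t)) = -\psi_t(t) almost everywhere, whence, by the costate equation (\mathrm{iv}_t),

\begin{equation*} \frac{dH}{dt} = -\dot\psi_t = \overline H_t. \end{equation*} \notag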

Relation (viii) is obviously equivalent to H(\psi_x(t),t,x(t), u')+ \psi_t(t) \leqslant 0 at all continuity points of the functions \psi_x and \psi_t. This and relation (vii) yield the maximality condition for the Pontryagin function: for almost all t\in[t_0,t_1],

\begin{equation} \max_{u' \in \mathcal{R}(t,x(t))} H(\psi_x(t),t,x(t), u') = H(\psi_x(t),t,x(t), u(t)), \end{equation} \tag{2.9}
thanks to which the entire set of relations (i)–(viii) is called the maximum principle. Note that here the maximum is taken over u' from the above set \mathcal{R}(t,x(t)). In the absence of state and mixed constraints (2.4), (2.5), we have \mathcal{R}(t,x(t))= \{ u'\mid (t,x(t),u') \in \mathcal{Q}\}, and the multipliers \lambda(t)=0, m(t)=0, d\mu(t) = 0 vanish. So, we get the Pontryagin maximum principle for the general Lagrange problem of the classical calculus of variations (2.1)–(2.3), that is, the Weierstrass condition.

Note that the function \overline H appears in the relations involving differentiation with respect to one of the variables t,x,u, whereas the function H is not differentiated in (i)–(viii) and (2.9).

The costate equations (\mathrm{iv}_x) and (\mathrm{iv}_t) should be understood as equalities between measures on [t_0,t_1]:

\begin{equation*} \begin{aligned} \, d\psi_x(t) &= \bigl(-H_x(\psi_x(t),t,x(t), u(t)) \\ &\qquad +\lambda(t)\varphi_x(t,x(t), u(t)) + m(t)g_x(t,x(t), u(t))\bigr)\,d t + d \mu(t)\,\Phi_{x}(t,x(t)), \\ d\psi_t(t) &= \bigl(-H_t(\psi_x(t),t,x(t), u(t)) \\ &\qquad +\lambda(t)\varphi_t(t,x(t), u(t)) + m(t)g_t(t,x(t), u(t))\bigr)\,d t + d \mu(t)\,\Phi_{t}(t,x(t)). \end{aligned} \end{equation*} \notag
One can also write these equalities in an integral form, for example,
\begin{equation*} \psi_x(t+0) =\psi_x(t_0) + \int_{t_0}^t (- H_x + \lambda \varphi_x + mg_x)\,ds + \int_{t_0}^{t+0}\Phi_{x}(s,x(s))\,d\mu(s), \end{equation*} \notag
and, similarly, for \psi_x(t-0) and \psi_t(t \pm 0).

The maximum principle is commonly regarded as a necessary condition for strong minimality. However, the following stronger assertion due to Dubovitskii and Milyutin holds (see, for example, [6], [11], [30]).

Theorem 1. If a process \widehat{w}=(\widehat{x}(t), \widehat{u}(t)), t\in [\widehat t_0, \widehat t_1], delivers a Pontryagin minimum in Problem \mathrm{A}, then it satisfies the maximum principle (i)–(viii).

As mentioned in the introduction, we will provide a new relatively simple proof of this theorem. It is more convenient to give it not for the general Problem \mathrm{A}, but rather for its particular time-independent case.

§ 3. The autonomous Problem \mathrm{B}

Consider the following Problem \mathrm{B} on a non-fixed interval [t_0,t_1] (an autonomous case of Problem \mathrm{A}):

\begin{equation} J:= F_0(x(t_0),x(t_1)) \to \min, \end{equation} \tag{3.1}
\begin{equation} F(x(t_0),x(t_1))\leqslant0, \qquad K(x(t_0),x(t_1))=0, \end{equation} \tag{3.2}
\begin{equation} \dot x(t)= f(x(t),u(t)), \end{equation} \tag{3.3}
\begin{equation} \varphi(x(t),u(t)) \leqslant 0,\qquad g(x(t),u(t))=0, \end{equation} \tag{3.4}
\begin{equation} \Phi(x(t))\leqslant0. \end{equation} \tag{3.5}

For this problem, the costate equation (\mathrm{iv}_t) gives \psi_t = \mathrm{const}, and now the transversality condition (\mathrm v_t) implies \psi_t \equiv 0; so, instead of \psi_x, we will simply write \psi. Thus, conditions (vii) and (viii) for Problem \mathrm{B} take the form

\begin{equation} \psi(t) f(x(t), u(t)) =0 \quad \text{a.e.}, \qquad \psi(t\pm 0) f(x(t), u') \leqslant 0 \quad \forall\, t, \end{equation} \tag{3.6}
where u' \in \mathcal{R}(x(t)). The remaining MP conditions do not change.

Even though Problem \mathrm{B} is a particular case of Problem \mathrm{A}, any problem of type \mathrm{A} can be reduced to the form of Problem \mathrm{B}. This can be done by the following simple trick. We augment the control system \dot{x}=f(t,x,u) with the additional equation dt/d\tau=1, regarding \tau as a new time variable ranging over some interval [\tau_0,\tau_1] and the original time t=t(\tau) as a new state variable. The functions x(\,{\cdot}\,) and u(\,{\cdot}\,) now also depend on the new time: x=x(\tau), u=u(\tau). Thus, we have the following Problem \mathrm{A}':

\begin{equation} J= F_0(t(\tau_0),x(\tau_0),t(\tau_1),x(\tau_1))\to \min, \nonumber \end{equation} \notag
\begin{equation} F(t(\tau_0),x(\tau_0),t(\tau_1),x(\tau_1)) \leqslant 0, \qquad K(t(\tau_0),x(\tau_0),t(\tau_1),x(\tau_1))=0, \nonumber \end{equation} \notag
\begin{equation} \frac{dx}{d\tau}= f(t(\tau),x(\tau),u(\tau)), \qquad \frac{dt}{d\tau}=1, \end{equation} \tag{3.7}
\begin{equation} \varphi(t(\tau),x(\tau),u(\tau)) \leqslant0, \qquad g(t(\tau),x(\tau),u(\tau)) =0, \end{equation} \tag{3.8}
\begin{equation} \Phi(t(\tau),x(\tau))\leqslant 0, \end{equation} \tag{3.9}
where t(\tau), x(\tau) are state variables, u(\tau) is the control, and \tau\in [\tau_0,\tau_1] is a non-fixed time interval. Clearly, Problem \mathrm{A}' is of type \mathrm{B}.

Problem \mathrm{A}' is invariant with respect to shifts of the time \tau, and hence one can fix an initial moment \tau_0; then both admissible and optimal processes of Problems \mathrm{A} and \mathrm{A}' will obviously be in a one-to-one correspondence. Therefore, having obtained necessary optimality conditions for Problem \mathrm{B}, one can apply them to Problem \mathrm{A}', thereby obtaining necessary conditions for Problem \mathrm{A}. The costate variable in Problem \mathrm{A}' is the pair (\psi_x, \psi_t), the Pontryagin function for system (3.7) is \widetilde H = \psi_x f + \psi_t, and the “autonomous” conditions \widetilde H(x,u) =0 and \widetilde H(x,u') \leqslant 0 (see (3.6)) assume the form \psi_x f(x,u) + \psi_t =0 and \psi_x f(x,u') + \psi_t \leqslant 0, which are exactly conditions (vii) and (viii) in Theorem 1. The details of these transformations are left to the reader.
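As a hedged sketch of these omitted details (ours, for the reader's orientation): treating (t,x) as the state vector of Problem \mathrm{A}', its extended Pontryagin function is \overline H' = \psi_x f + \psi_t\cdot 1 - \lambda\varphi - mg - (d\mu/d\tau)\Phi, and, since dt/d\tau=1, the measure densities with respect to t and \tau agree. Hence the costate equations of Problem \mathrm{A}' read

\begin{equation*} -\dot\psi_x = \overline H'_x = \overline H_x, \qquad -\dot\psi_t = \overline H'_t = \overline H_t, \end{equation*} \notag

which are exactly (\mathrm{iv}_x) and (\mathrm{iv}_t), while the transversality conditions for the state variables t(\tau_0), t(\tau_1) and x(\tau_0), x(\tau_1) give (\mathrm{v}_t) and (\mathrm{v}_x).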

Let us now proceed with the proof of Theorem 1 for Problem \mathrm{B}. To this aim, we again convert the time to a state variable, but now setting dt/d\tau= v(\tau), where the function v(\tau) is non-negative (rather than positive everywhere), hence t=t(\tau) is non-decreasing, but is not necessarily strictly increasing. This non-invertible change, which transforms the time t into a state variable, was proposed by Dubovitskii, used in his joint works with Milyutin [5], [11], and, then, in the works of Milyutin [6], [30] (see also [32]); they called it a v-change. A non-trivial point here is that small variations of the new control v(\tau) generate needle-like variations of the original control u(t). The simplest case of this v-change (with piecewise constant v(\tau)) will be now considered.

Since Problem \mathrm{B} is invariant with respect to time shifting, we fix, for definiteness, an initial moment t_0= \widehat t_0.

In parallel with the set \mathcal{R}(x) = \{u \mid (x,u)\in \mathcal{Q},\; \varphi(x,u)\leqslant0, \;g(x,u)=0\}, we consider its subset \mathcal{R}_0(x) = \{u\mid (x,u)\in \mathcal{Q},\; \varphi(x,u)<0, \; g(x,u)=0\}. Note that, under our assumption of regularity of the mixed constraints, any point in \mathcal{R}(x) is a limit point of \mathcal{R}_0(x).

Lemma 1 (on density). The set \mathcal{R}_0(x) is dense in \mathcal{R}(x).

Proof. Consider any point (x,u), where u \in \mathcal{R}(x). Let I be the corresponding set of active inequalities. By the positive linear independence assumption on the gradients \varphi_u(x,u), g_u(x,u), there is a vector \overline u such that \varphi_u(x,u)\overline u<0 and g_u(x,u)\overline u=0. The last relation means that \overline u is tangent to the surface M(x) = \{u'\mid g(x,u')=0\} at the point u, that is, there exists a family of corrections u_\varepsilon = o(\varepsilon) as \varepsilon\to0+ such that u'_\varepsilon = u+ \varepsilon\overline u + u_\varepsilon \in M(x), that is, g(x, u'_\varepsilon)=0. Moreover, \varphi(x, u'_\varepsilon) = \varphi(x,u) + \varphi_u(x,u)\,\varepsilon\overline u + o(\varepsilon) <0 for small \varepsilon. Thus, the points u'_\varepsilon \in \mathcal{R}_0(x) converge to u, as required. \Box

3.1. Index \theta

Let \widehat w= (\widehat{x}(t),\widehat{u}(t)), t\in [\widehat t_0,\widehat t_1], be an optimal process in Problem \mathrm{B}. With this process we associate a family of problems \mathrm{B}^\theta, which we construct below, and their optimal solutions labeled by some index \theta.

By the index we will mean a collection of time and control values

\begin{equation*} \theta= \{(t^1,u^1),\dots,(t^{d},u^{d})\}, \end{equation*} \notag
where d is an arbitrary natural number, \widehat t_0 < t^1\leqslant \dots \leqslant t^{d} < \widehat t_1, and the value u^s\in \mathcal{R}_0(\widehat{x}(t^s)) is arbitrary for any s =1,\dots, d. The index length d = d(\theta) depends on \theta.

Let us define the interval [\tau_0,\tau_1] as follows: we take the interval [\widehat t_0,\widehat t_1], and at the points t^1,\dots,t^{d(\theta)} we successively insert unit intervals, preserving, at each time, the position of the point \widehat t_0. As a result, we obtain the interval [\tau_0,\tau_1] with the endpoints \tau_0=\widehat t_0, \tau_1=\widehat t_1+ d(\theta), and the inserted intervals have the form

\begin{equation*} \Delta^1=[t^1,\,t^1+1], \;\ \Delta^2=[t^2+1,\,t^2+2], \ \dots,\ \Delta^{d(\theta)} =[t^{d(\theta)}+(d(\theta)-1),\,t^{d(\theta)}+ d(\theta)]. \end{equation*} \notag
We next set
\begin{equation*} E_0= \bigcup_{s=1}^{d(\theta)}\Delta^s,\qquad E_+ = [\tau_0,\tau_1]\setminus E_0, \end{equation*} \notag
and define the functions
\begin{equation} v^\theta(\tau)= \begin{cases} 0, &\tau\in E_0, \\ 1, &\tau\in E_+, \end{cases} \qquad t^\theta(\tau)= \widehat t_0 + \int_{\tau_0}^\tau v^\theta(a)\,da, \quad \tau\in[\tau_0,\tau_1]. \end{equation} \tag{3.10}
We have
\begin{equation*} \frac{dt^\theta(\tau)}{d\tau}=v^\theta(\tau),\qquad t^\theta(\tau_0)=\widehat t_0, \quad\; t^\theta(\tau_1)=\widehat t_1. \end{equation*} \notag
So, t^\theta(\tau) is a piecewise linear non-decreasing function mapping [\tau_0,\tau_1] onto [\widehat t_0,\widehat t_1], and \Delta^s are the intervals of its constancy with t^\theta(\Delta^s)=t^s, \;s =1,\dots, d(\theta).
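For instance (our own illustration of the construction), for an index consisting of a single pair, \theta=\{(t^1,u^1)\}, we have \tau_0=\widehat t_0, \tau_1=\widehat t_1+1, and

\begin{equation*} t^\theta(\tau)= \begin{cases} \tau, & \tau\in[\widehat t_0,\, t^1], \\ t^1, & \tau\in[t^1,\, t^1+1]=\Delta^1, \\ \tau-1, & \tau\in[t^1+1,\, \widehat t_1+1], \end{cases} \end{equation*} \notag

so that t^\theta maps [\tau_0,\tau_1] onto [\widehat t_0,\widehat t_1] and is constant exactly on the inserted interval \Delta^1.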

We next define

\begin{equation} u^\theta(\tau)=\begin{cases} \widehat u(t^\theta(\tau)), &\tau\in E_+, \\ u^s, &\tau\in \Delta^s, \ s =1,\dots, d(\theta), \end{cases} \qquad x^\theta(\tau)=\widehat{x}(t^\theta(\tau)). \end{equation} \tag{3.11}
The function u^\theta(\tau) is a bounded measurable function, and x^\theta(\tau) is an absolutely continuous function satisfying
\begin{equation*} \frac{dx^\theta(\tau)}{d\tau} = v^\theta(\tau)\, f(x^\theta(\tau),u^\theta(\tau)), \qquad x^\theta(\tau_0) = \widehat{x}(\widehat t_0), \quad x^\theta(\tau_1) = \widehat{x}(\widehat t_1), \end{equation*} \notag
that is, the endpoints of the new trajectory x^\theta(\tau) coincide with those of the original \widehat{x}(t). Moreover, x^\theta(\tau) = \widehat{x}(t^s) on any inserted interval \Delta^s, and the new pair satisfies the mixed constraints (3.4) on the whole interval [\tau_0,\tau_1], that is,
\begin{equation} \begin{gathered} \, \varphi_i(x^\theta(\tau),u^\theta(\tau)) \leqslant 0, \qquad i=1,\dots,d(\varphi), \\ g_j(x^\theta(\tau),u^\theta(\tau)) =0, \qquad j=1,\dots,d(g), \end{gathered} \end{equation} \tag{3.12}
where on each \Delta^s the inequalities are strict.

Note that some points t^s may coincide: t^{s'} = \dots = t^{s''} = t_*; at any such point t_*, we successively insert several unit intervals, on each of which we set v^\theta(\tau)=0 and take the corresponding value u^\theta(\tau)= u^s.

The set E_0 is a finite union of the intervals \Delta^s, s =1,\dots, d(\theta). The set E_+ is a finite union of intervals or half-open intervals. Consider the collection of all these intervals and half-open intervals constituting E_0 and E_+, order it, and denote its elements by \sigma_k, k=1,\dots,m. We have [\tau_0,\tau_1]= \sigma_1 \cup \dots \cup \sigma_m, where different \sigma_k do not overlap. Let \chi_k(\tau) be the characteristic function of the set \sigma_k, k=1,\dots,m.

3.2. The control system of index \theta

We will need the following simple fact.

Lemma 2. Let a point (x^*,u^*)\in\mathbb{R}^{n+r} satisfy the conditions

\begin{equation*} \varphi(x^*,u^*) <0, \qquad g(x^*,u^*) =0. \end{equation*} \notag
Then there exists a neighbourhood \mathcal{O}(x^*) of the point x^* and a smooth function \mathcal{U}\colon \mathcal{O}(x^*)\,{\to}\, \mathbb{R}^r such that
\begin{equation*} \varphi(x,\mathcal{U}(x)) <0, \quad\; g(x,\mathcal{U}(x)) =0 \quad\; \forall\, x\in \mathcal{O}(x^*), \end{equation*} \notag
and \mathcal{U}(x^*) = u^*.

Proof. Recall that by the assumption of regularity of the mixed constraints, the rank of the matrix g'_u(x^*,u^*) is d(g). Hence, the components of the vector u can be split into two groups u =(u_1,u_2) so that \dim u_2= d(g) and the matrix g'_{u_2}(x^*,u^*_1, u^*_2) is invertible. By the implicit function theorem, there exists a neighbourhood \mathcal{O}(x^*,u^*_1) in which the equation g(x, u_1, u_2)=0 is resolved by a smooth function u_2 = G(x, u_1), that is, g(x, u_1, G(x, u_1))=0 and G(x^*, u_1^*) = u_2^*.

Freezing here u_1 =u_1^*, we get a smooth function u_2 = \widetilde G(x) = G(x, u_1^*) on the open set \mathcal{O}(x^*) = \{x\mid (x,u_1^*) \in \mathcal{O}(x^*,u^*_1)\}. By reducing this set, if necessary, we also obtain the inequality \varphi(x,u_1^*,\widetilde G(x)) <0. Now it remains to define \mathcal{U}(x) = (u_1^*,\widetilde G(x)). \Box
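As a toy illustration of this construction (our example, not from the original), let n=1, r=2, d(g)=1, and take g(x,u)=u_1^2+u_2-x, \varphi(x,u)=u_1-1 at the point (x^*,u^*)=(0,(0,0)). Then g'_{u_2}=1 is invertible, the implicit function theorem gives u_2 = G(x,u_1)= x - u_1^2, and freezing u_1 = u_1^* =0 yields

\begin{equation*} \mathcal{U}(x)=(0,\,x), \qquad g(x,\mathcal{U}(x))=0, \quad \varphi(x,\mathcal{U}(x))=-1<0, \end{equation*} \notag

for all x near x^*=0.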

Now, we take an arbitrary index \theta. For any s =1,\dots, d(\theta), let \mathcal{U}^s(x) be the function from Lemma 2 corresponding to the point (\widehat{x}(t^s),u^s), and which is defined in a neighbourhood of the point \widehat{x}(t^s). Note that \mathcal{U}^s(\widehat{x}(t^s)) = u^s.

We fix the interval [\tau_0,\tau_1] corresponding to the index \theta. Consider the space \mathbb{R}^{m+n} of variables z=(z_1,\dots,z_m) and x_0= x(\tau_0). Generalizing (3.10), we define the piecewise constant function

\begin{equation} v(\tau)= \sum_{k=1}^m z_k\chi_k(\tau), \qquad \tau\in [\tau_0,\tau_1] \end{equation} \tag{3.13}
(that is, z_k is its value on the interval \sigma_k), and consider the control system
\begin{equation} \frac{dx}{d\tau}= v(\tau) \begin{cases} f(x(\tau),u(\tau)), &\tau\in E_+, \\ f(x(\tau),\,\mathcal{U}^s(x(\tau))), &\tau\in \Delta^s \subset E_0, \end{cases} \qquad x(\tau_0)=x_0. \end{equation} \tag{3.14}
Here, the control u \in L^r_\infty(E_+) (that is, u(\tau) is an arbitrary measurable bounded function on E_+), and on each \Delta^s\subset E_0 we set u(\tau) = \mathcal{U}^s(x(\tau)), that is, the control is in fact absent there. It is clear that \mathcal{U}^s(x^\theta(\tau)) = u^\theta(\tau) = u^s on each \Delta^s.

Consider the function

\begin{equation*} \mathcal{F}(\tau,x,u)= \begin{cases} f(x,u), &\tau\in E_+, \\ f(x,\mathcal{U}^s(x)), &\tau\in \Delta^s \subset E_0. \end{cases} \end{equation*} \notag
This function depends smoothly on the pair (x,u)\in \mathbb{R}^n \times \mathbb{R}^r; the fact that \mathcal{F} is discontinuous in \tau plays no role here. Now system (3.14) takes the form
\begin{equation} \frac{dx}{d\tau}= v(\tau)\mathcal{F}(\tau,x(\tau),u(\tau)),\qquad x(\tau_0)=x_0. \end{equation} \tag{3.15}
In view of (3.13) it can be written as
\begin{equation} \frac{dx}{d\tau}= \sum_{k=1}^m z_k\chi_k(\tau)\mathcal{F}(\tau,x(\tau),u(\tau)), \qquad x(\tau_0)=x_0. \end{equation} \tag{3.16}
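To make the role of the finite-dimensional variables z_k more concrete, here is a minimal numerical sketch (ours, not from the paper; all names are illustrative) of how a trajectory of (3.16) could be computed by the forward Euler method from a triple (u,z,x_0), assuming the intervals \sigma_k are given as pairs (a_k,b_k) and \mathcal{F}, u are supplied as callables:

```python
import numpy as np

def solve_state(z, u, x0, sigma, F, tau_grid):
    """Forward-Euler sketch of (3.16):
    dx/dtau = sum_k z_k chi_k(tau) F(tau, x, u(tau)),  x(tau_0) = x0,
    where sigma = [(a_1, b_1), ..., (a_m, b_m)] partitions [tau_0, tau_1]."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for j in range(len(tau_grid) - 1):
        tau, dtau = tau_grid[j], tau_grid[j + 1] - tau_grid[j]
        # chi_k(tau) selects the unique interval sigma_k containing tau
        k = next(i for i, (a, b) in enumerate(sigma) if a <= tau < b)
        x = x + dtau * z[k] * F(tau, x, u(tau))
        traj.append(x.copy())
    return np.array(traj)
```

Perturbing a single component z_s away from its basic value 0 stretches the corresponding \Delta^s into a short interval of the original time, exactly the effect described in Remark 5 below.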

Let z^\theta_k be the value of v^\theta(\tau) on \sigma_k, k=1,\dots,m, that is, v^\theta(\tau)=\sum z^\theta_k\chi_k(\tau). Recall that z^\theta_k=0 if \sigma_k\subset E_0, and z^\theta_k =1 if \sigma_k\subset E_+. We set z^\theta =(z^\theta_1,\dots, z^\theta_m) and define x^\theta_0= x^\theta(\tau_0)= \widehat{x}(\widehat t_0); the control u^\theta(\tau) is defined above. It is easily seen that the triple (u^\theta,z^\theta,x^\theta_0) satisfies system (3.16). Let us call it a basic point of Problem \mathrm{B}^{\theta} (this problem will be constructed a bit later) corresponding to the process \widehat{w}(t)= (\widehat{x}(t),\widehat{u}(t)) of the original Problem \mathrm{B}.

The right-hand side in (3.15), (3.16) is a smooth function of (u,z,x_0)\in\mathbb{R}^{r+m+n}, and hence, for any triple (u,z,x_0)\in L_\infty^r(E_+) \times \mathbb{R}^m \times \mathbb{R}^n sufficiently close to (u^\theta, z^\theta,x^\theta_0), the Cauchy problem (3.16) has a solution x(\tau) which depends smoothly on this triple.3 Thus, we have the operator

\begin{equation*} P\colon L^r_\infty(E_+)\times \mathbb{R}^m \times \mathbb{R}^n\to C^n[\tau_0, \tau_1], \qquad (u,z,x_0) \mapsto x(\tau), \end{equation*} \notag
which is Fréchet differentiable near the point (u^\theta,z^\theta,x^\theta_0) and whose derivative is continuous at this point. The derivative at this point is the linear mapping P'(u^\theta,z^\theta,x^\theta_0)\colon (\overline u,\overline{z},\overline x_0) \mapsto \overline x(\tau), where the function \overline x(\tau) is the solution of the Cauchy problem with the initial condition \overline{x}(\tau_0) =\overline{x}_0 for the equation in variations
\begin{equation} \frac{d\overline{x}}{d\tau} = \sum_k \bigl( z^\theta_k \chi_k \mathcal{F}_x(\tau,x^\theta, u^\theta)\overline{x} + z^\theta_k \chi_k \mathcal{F}_u(\tau,x^\theta, u^\theta) \overline u + \overline{z}_k\chi_k \mathcal{F}(\tau,x^\theta, u^\theta) \bigr), \end{equation} \tag{3.17}
or, in a different form,
\begin{equation} \frac{d\overline{x}}{d\tau} = v^\theta \bigl(f_x(x^\theta, u^\theta)\overline{x} + f_u(x^\theta, u^\theta) \overline u\bigr) + \overline v f(x^\theta, u^\theta), \end{equation} \tag{3.18}
where, according to (3.13), \overline v(\tau)= \sum_{k=1}^m \overline{z}_k\chi_k(\tau).

Here, we used the fact that \mathcal{U}^s(x^\theta(\tau)) = u^\theta(\tau) on the set where v^\theta =0 (that is, on E_0), and that \mathcal{F}(\tau,x^\theta, u^\theta) = f(x^\theta, u^\theta) on the whole of [\tau_0,\tau_1]. The derivatives \mathcal{F}_x, \mathcal{F}_u coincide with f_x, f_u on E_+, while on E_0 only their existence matters, not their values. (Note that (3.18) can also be derived directly from (3.15).)

3.3. Problem \mathrm{B}^{\theta} for index \theta

For the above index \theta, consider the following Problem \mathrm{B}^{\theta} in the space L^r_\infty(E_+)\times \mathbb{R}^m \times \mathbb{R}^n of elements (u,z,x_0):

\begin{equation} F_0(x_0,x(\tau_1))\to \min, \end{equation} \tag{3.19}
\begin{equation} F(x_0,x(\tau_1))\leqslant 0,\qquad K(x_0,x(\tau_1))=0, \qquad -z \leqslant 0, \end{equation} \tag{3.20}
\begin{equation} \Phi(x(\tau))\leqslant 0 \quad \text{on } [\tau_0,\tau_1], \end{equation} \tag{3.21}
\begin{equation} \varphi(x(\tau),u(\tau)) \leqslant 0,\qquad g(x(\tau),u(\tau)) = 0 \quad \text{on }E_+\,, \end{equation} \tag{3.22}
where x(\tau) = P(u,z,x_0)(\tau) is determined by (u,z,x_0) from the control system (3.16). We will call it the associated problem corresponding to the process \widehat{w}(t)= (\widehat{x}(t),\widehat{u}(t)) of the original Problem \mathrm{B} and the index \theta.

Obviously, for any triple (u,z,x_0)\in L_\infty^r(E_+) \times \mathbb{R}^m \times \mathbb{R}^n sufficiently close to (u^\theta,z^\theta,x^\theta_0), where z\geqslant0, the pair (x(\tau),u(\tau)) is generated by a unique solution (x'(t),u'(t)) of the original system (3.3) defined on the interval [\widehat t_0,\, t_1 = t(\tau_1)], that is,

\begin{equation} x(\tau)= x'(t(\tau)), \qquad u(\tau)= u'(t(\tau)), \end{equation} \tag{3.23}
where t(\tau) is determined by the equation dt/d\tau = v(\tau),\; t(\tau_0)= \tau_0. Moreover, if the triple (u,z,x_0) tends to the basic triple (u^\theta,z^\theta,x^\theta_0), then t(\tau_1) tends to \widehat t_1, the pair (x'(t),u'(t)) tends to the optimal pair (\widehat{x}(t),\widehat{u}(t)) of Problem \mathrm{B} in the norm of the space C\times L_1 (evaluated each time on the common interval of their definition), and \|u'\|_\infty \leqslant \mathrm{const} (where the constant depends on \theta).

Indeed, when changing from the new time \tau to the original time t, the intervals from E_+ are mapped to the intervals obtained from the initial intervals [t^s, t^{s+1}] by small translations and dilations, and hence the state variable x'(t) on these intervals is uniformly close to the optimal \widehat{x}(t), and the control u'(t) is close to \widehat{u}(t) in the integral metric. Each interval \Delta \subset E_0 of the \tau-axis is sent to a small interval of the t-axis, and hence x'(t) is uniformly close to \widehat{x}(t) on it, and the integral of |u'(t)| is small. We omit the routine verification of these facts. (Some estimates of this type can be found in [34].)

Remark 5. Note that, for the basic z^\theta, that is, for v= v^\theta, every interval \Delta^s \subset E_0 collapses under the mapping \tau \mapsto t(\tau) and is transformed into the point t^s, so that the above chosen values u^\theta(\tau) =u^s on \Delta^s do not appear in the original time t, and hence these values seemingly play no role. However, if z deviates slightly from the basic value, then the interval \Delta^s \subset E_0 of the \tau-axis corresponding to z_s >0 is transformed into a small interval of length z_s on the t-axis, where u'(t) = \mathcal{U}^s(x(\tau(t))) is close to u^s. Thus, in the original time, we obtain in fact a needle-type variation of the control! Its principal difference from the “standard” needle-type variation is that we do not replace the control \widehat{u}(t) on a small interval near the point t^s, but rather expand this point by inserting there a small interval with profile u'(t) =\mathcal{U}^s(x(\tau(t))). The point t^s is not unique, and so we get a packet of such generalized needle variations. Note that here it does not seem possible to employ standard needle variations (as, for example, in [25], [26], for problems without state and mixed constraints), because the constraint \Phi(x(t))\leqslant 0 would not be differentiable with respect to the width of the needle: even the derivative of the trajectory x(t) with respect to the width of the needle would be a discontinuous function of t.

As was already noted in the introduction, the advantage of such “inserted” needles over the usual ones is also that they guarantee a smooth dependence of all the problem constraints on the needle width for any measurable control, whereas the usual needles work only for a piecewise continuous optimal control \widehat{u}(t).

Remark 6. The control u(\tau) on the intervals in E_0 is not varied, but is given by certain functions of x, while its variation on the set E_+ will be needed for obtaining the stationarity condition with respect to the control, \overline H_u=0. For problems without mixed constraints, this condition is absent, hence there is no need to vary the control on E_+ — it suffices to consider only generalized needles, so that Problem \mathrm{B}^{\theta} is finite-dimensional [27], [28]. If mixed constraints are present, the generalized needles alone will not do.

Let us find a link between optimality of the basic points in Problems \mathrm{B} and \mathrm{B}^\theta.

Lemma 3. If a process \widehat{w}= (\widehat{x}(t), \widehat{u}(t)) delivers a Pontryagin minimum in Problem \mathrm{B}, then the triple \zeta^\theta =(u^\theta,z^\theta,x^\theta_0) delivers a local minimum in the associated Problem \mathrm{B}^{\theta}, that is, a minimum with respect to the norm \|u\|_\infty +|z| +|x_0| (a weak minimum).

Proof. Suppose, on the contrary, that the triple \zeta^\theta does not give a local minimum in Problem \mathrm{B}^{\theta}. This means that there is a sequence of admissible triples \zeta = (u,z,x_0) of Problem \mathrm{B}^{\theta} such that \zeta \to \zeta^\theta and F_0(\zeta) < F_0(\zeta^\theta). Passing from the time \tau to the original time t, we construct, as above, a sequence of processes w' = (x'(t), u'(t)) satisfying equalities (3.23) and system (3.3). By virtue of (3.22), these processes satisfy the mixed constraints of Problem \mathrm{B} on the image of the set E_+. On the intervals of the \tau-axis in E_0, these constraints hold by construction (with strict inequalities). The passage to t transforms each interval from E_0 into a small interval on which the mixed constraints also hold (with strict inequalities). The state constraints for the process w' remain valid by (3.21).

Since every trajectory x'(t) has the same endpoints as x(\tau), the processes w' are admissible in Problem \mathrm{B} and produce the values F_0(w') = F_0(\zeta) < F_0(\zeta^\theta) = F_0(\widehat{w}). Finally, since \zeta \to \zeta^\theta, we have, by the above, \|x' -\widehat{x}\|_C \to 0, \|u' -\widehat{u}\|_1 \to 0 and \|u\|_\infty \leqslant \mathrm{const}, which contradicts Pontryagin minimality of the point \widehat{w} in Problem \mathrm{B}. \Box

Now, we can write necessary conditions for a local minimum in Problem \mathrm{B}^{\theta}. Note that, even though all the “data functions” in this problem are smooth, this is not a standard smooth problem, because it involves an uncountable number of inequality constraints (3.21) and (3.22). This is a problem of so-called “semi-infinite” optimization. Nevertheless, necessary conditions for a local minimum in such problems are well known — this is a general Lagrange multiplier rule (or principle) (see § 9.1 in the appendix). In our case, it reads as follows.

Theorem 2. Let a triple (u^\theta, z^\theta, x^\theta_0) deliver a local minimum in Problem \mathrm{B}^{\theta}. Then there exist a number \alpha_0, row vectors \alpha\in\mathbb{R}^{d(F)}, \beta\in \mathbb{R}^{d(K)}, and \gamma \in\mathbb{R}^{m+n}, elements \lambda \in L_\infty^{d(\varphi)*}(E_+) and m \in L_\infty^{d(g)*}(E_+), and a vector function \mu(\tau) of dimension d(\Phi) on [\tau_0,\tau_1] with non-decreasing components and initial value \mu(\tau_0)=0 such that

\begin{equation*} \begin{aligned} \, &(\mathrm{i})\ \alpha_0\geqslant0,\quad \alpha\geqslant0,\quad \gamma\geqslant 0, \quad \lambda\geqslant 0; \\ &(\mathrm{ii})\ \alpha_0+ |\alpha|+ |\beta|+ |\gamma|+ \|\lambda\|+ \|m\|+ \mu(\tau_1)> 0; \\ &(\mathrm{iii})\ \alpha F(\widehat{x}_0, \widehat{x}_1)=0,\quad \gamma z^\theta =0, \quad \langle \lambda,\varphi(x^\theta,u^\theta) \rangle =0, \quad \Phi(x^\theta(\tau))\,d \mu(\tau) =0;\qquad \end{aligned} \end{equation*} \notag
and, moreover, the Lagrange function for Problem \mathrm{B}^{\theta}
\begin{equation*} L(u,z,x_0) = (\alpha_0 F_0+\alpha F+\beta K) -\gamma z + \langle \lambda,\varphi(x,u)\rangle +\langle m, g(x,u)\rangle + \int_{\tau_0}^{\tau_1}\Phi(x)\,d \mu \end{equation*} \notag
is stationary at the point (u^\theta, z^\theta, x^\theta_0),
\begin{equation} L'(u^\theta, z^\theta, x^\theta_0) = 0. \end{equation} \tag{3.24}

Here, \lambda and m are linear continuous functionals on the spaces L_\infty(E_+) of corresponding dimensions; by \langle \lambda, \overline\varphi \rangle and \langle m,\overline g \rangle we denote evaluation of \lambda and m at arbitrary points \overline\varphi and \overline g of these spaces.

Our next aim is to decipher the above conditions.

Let us dwell in more detail on the condition \langle \lambda,\varphi(w^\theta) \rangle = \sum_i\langle \lambda_i,\varphi_i(w^\theta) \rangle =0. This condition means that, for every i, the functional \lambda_i \in L^*_\infty(E_+) is a support element (an outer normal) to the cone \Omega of non-positive functions in the space L_\infty(E_+) at the point \varphi_i(w^\theta)\in \Omega. For any \delta>0, define the set M_i^\delta = \{\tau\in E_+\mid \varphi_i(w^\theta)\geqslant -\delta\} (which may be empty). Each \lambda_i is characterized by the following properties (see § 3.5 in [7]): a) \lambda_i\geqslant0; b) \lambda_i is supported on the set M_i^\delta for any \delta>0; c) \|\lambda_i\| := \langle \lambda_i, \mathbf{1}\rangle =1. (Here, \mathbf{1} is the identically one function.) Below, it will be shown that every \lambda_i is a “usual” function from L_1(E_+) (and even from L_\infty(E_+)), and hence it is supported on the set M_i^0 = \{ \tau \in E_+ \mid \varphi_i(w^\theta) = 0\}, that is, we get the usual complementary slackness condition \lambda_i(\tau)\varphi_i(w^\theta(\tau)) =0 almost everywhere on E_+.

§ 4. Stationarity conditions in Problem \mathrm{B}^\theta

For notational convenience, we introduce the endpoint Lagrange function l = \alpha_0 F_0+\alpha F+\beta K, and write, for brevity, f^\theta = f(x^\theta, u^\theta), f^\theta_x = f_x(x^\theta,u^\theta), etc.

Condition (3.24) means that, for any (\overline u,\overline{z},\overline x_0),

\begin{equation} \begin{aligned} \, &L'(u^\theta, z^\theta, x^\theta_0)(\overline u,\overline{z}, \overline x_0) = l_{x_0} \overline x_0+ l_{x_1} \overline x_1 - \gamma \overline{z} \nonumber \\ &\qquad\qquad\quad + \langle \lambda,(\varphi_x^\theta \overline x + \varphi_u^\theta\overline u) \rangle + \langle m,(g_x^\theta \overline x + g_u^\theta\overline u) \rangle + \int_{\tau_0}^{\tau_1}\Phi_x^\theta \overline x\,d \mu =0, \end{aligned} \end{equation} \tag{4.1}
where \overline x_1 =\overline x(\tau_1) according to equation (3.17) (or (3.18)). (The derivatives of all functions are taken at the optimal point (u^\theta, z^\theta, x^\theta_0).)

1. Let us first simplify the functionals \lambda and m, which a priori lie in L^*_\infty(E_+). To this aim, we recall the following property of L_\infty^*(\Delta)-functionals on an interval \Delta.

The functional \pi\in L_\infty^*(\Delta) is called absolutely continuous if there exists a function p \in L_1(\Delta) such that \pi can be represented as

\begin{equation*} \langle \pi, u \rangle = \int_{\Delta} p(\tau)\,u(\tau)\,d\tau \quad \text{for all} \quad u\in L_\infty(\Delta). \end{equation*} \notag

One can easily show that \pi is absolutely continuous if and only if \langle \pi, u_n\rangle \to 0 for any sequence u_n \in L_\infty(\Delta) such that \|u_n\|_\infty \leqslant\mathrm{const} and \|u_n\|_1 \to0. (This follows, for example, from the Yosida–Hewitt theorem on the decomposition of \pi into absolutely continuous and singular components. Obviously, this property holds for the absolutely continuous component, but not for the singular one.)

This implies that, for any \eta \in L_\infty^*(E_+) and any function a\in L_\infty(E_+), the functional of the form \langle \eta, a\overline x \rangle, where \overline x is expressed via \overline u \in L_\infty(E_+) by the equation d\overline{x}/d\tau = A(\tau)\overline x + B(\tau)\overline u with given matrices A, B\in L_\infty on the interval [\tau_0,\tau_1] and with the initial condition \overline x(\tau_0)=0, is absolutely continuous with respect to \overline u. Indeed, if \|\overline u_n\|_1 \to 0, then, by the Gronwall lemma, \|\overline x_n\|_C \to0, whence \| a\,\overline x_n\|_\infty \to 0, and so \langle \eta, a\overline x_n \rangle \to 0. For the same reason, for any measure d\mu on [\tau_0,\tau_1] and any continuous function c(\tau), the functional \int c\overline x\,d\mu is also absolutely continuous with respect to \overline u \in L_\infty(E_+).

2. Let us get back to equality (4.1). Setting \overline{z}=0 and \overline x_0=0 in this equality, we get \overline{v}=0, and hence, in view of (3.18), \overline x is expressed in terms of \overline u via

\begin{equation} \frac{d\overline{x}}{d\tau}= v^\theta(f_x^\theta \overline x + f_u^\theta \overline u), \qquad \overline x(\tau_0)=0. \end{equation} \tag{4.2}
In addition, for any \overline u \in L_\infty(E_+), we have
\begin{equation*} \langle \lambda,\varphi_u^\theta\overline u \rangle + \langle m, g_u^\theta\overline u \rangle = -l_{x_1} \overline x_1 - \langle \lambda, \varphi_x^\theta\overline x \rangle - \langle m, g_x^\theta\overline x \rangle - \int_{\tau_0}^{\tau_1}\Phi_x^\theta \overline x\,d \mu. \end{equation*} \notag
By the above, the right-hand side of this equality is an absolutely continuous functional, that is,
\begin{equation} \sum_{i=1}^{d(\varphi)}\langle \lambda_i,\varphi_{iu}^\theta\overline u \rangle+ \sum_{j=1}^{d(g)}\langle m_j, g_{ju}^\theta\overline u \rangle = \int_{E_+} p(\tau)\overline u(\tau)\,d\tau, \end{equation} \tag{4.3}
where p is an L_1(E_+)-function. By the above assumption on regularity of mixed constraints, to the collection of vector functions \varphi_{iu}(w^\theta) and g_{ju}(w^\theta) one can apply Theorem 8 on the absence of singular components (see the appendix, § 9.2), which together with equation (4.3) implies that all the components of the functionals \lambda and m are absolutely continuous, that is, \lambda_i = \lambda_i(\tau) and m_j= m_j(\tau) are functions from L_1(E_+), and, in addition, \lambda_i(\tau)\geqslant0 on E_+. Now the complementary slackness condition \langle \lambda,\varphi(x^\theta,u^\theta) \rangle =0 assumes the form
\begin{equation*} \sum_{i=1}^{d(\varphi)}\int_{E_+}\lambda_i(\tau)\,\varphi_i(x^\theta(\tau),u^\theta(\tau))\,d\tau =0. \end{equation*} \notag
So, for any component \lambda_i, we have \lambda_i(\tau)\varphi_i(x^\theta(\tau),u^\theta(\tau)) = 0 almost everywhere (since \lambda_i\geqslant0 and \varphi_i\leqslant0), that is, \lambda_i is concentrated on the zero set of the ith mixed inequality, \varphi_i(x^\theta(\tau),u^\theta(\tau))=0. To unify the notation, we put \lambda=0 and m=0 on E_0, so that now \lambda, m lie in L_1[\tau_0,\tau_1].

Now (4.1) assumes the form

\begin{equation} \begin{aligned} \, &l_{x_0} \overline x_0+ l_{x_1} \overline x_1-\gamma \overline{z} \nonumber \\ &\qquad + \int_{\tau_0}^{\tau_1} \lambda (\varphi_x^\theta\overline x + \varphi_u^\theta\overline u)\, d\tau + \int_{\tau_0}^{\tau_1} m (g_x^\theta \overline x + g_u^\theta\overline u)\,d\tau + \int_{\tau_0}^{\tau_1}\Phi_x^\theta \overline x\,d \mu =0. \end{aligned} \end{equation} \tag{4.4}

3. Let us rewrite this equality in terms of the independent variables (\overline u,\overline{z},\overline x_0) with due account of (3.17) (or (3.18)). We need to properly transform the terms involving \overline x_1 and \overline x(\tau). To this aim, we require the following simple fact.

Lemma 4. Let an absolutely continuous function \overline x(\tau) and a function of bounded variation \psi(\tau) (both n-dimensional, \overline x is a column, \psi a row) satisfy

\begin{equation} \begin{array}{ll} \dot{\overline x} = A\overline x + \overline b, & \quad\; \overline x(\tau_0) = \overline x_0, \\ \dot\psi = -\psi A + \dot\rho, & \quad\; \psi(\tau_1) = -l_1, \end{array} \end{equation} \tag{4.5}
where the matrix A(\tau) and the function \overline b(\tau) are measurable and bounded, \rho(\tau) is a function of bounded variation continuous at \tau_0 and \tau_1, and l_1\in \mathbb{R}^n. Then
\begin{equation} l_1\overline x_1 + \int_{\tau_0}^{\tau_1} \overline x\, d\rho = -\psi_0\overline x_0 - \int_{\tau_0}^{\tau_1} \psi\overline b\,d\tau. \end{equation} \tag{4.6}

Proof. Taking the time derivative of the product \psi\overline x, we have
\begin{equation*} \frac d{d\tau}(\psi\overline x)= (-\psi A + \dot\rho)\overline x + \psi(A\overline x + \overline b) = \dot\rho\overline x+\psi\overline b, \end{equation*} \notag
and hence
\begin{equation*} \psi_1\overline x_1 - \psi_0\overline x_0 = \int_{\tau_0}^{\tau_1} \overline x\,d\rho + \int_{\tau_0}^{\tau_1} \psi\overline b\,d\tau. \end{equation*} \notag
Now, using the terminal value \psi_1 = -l_1, we arrive at (4.6). \Box

Remark 7. This result is a generalization of the classical du Bois-Reymond lemma, which is, in fact, the integration by parts formula for the Stieltjes integral.
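For a smooth \rho (so that d\rho = \dot\rho\,d\tau), identity (4.6) is easy to verify numerically. Below is a scalar sketch (our own illustration with arbitrarily chosen data): both equations in (4.5) are integrated by Euler steps, and the two sides of (4.6) are compared.

import numpy as np

tau = np.linspace(0.0, 1.0, 200001)
h = tau[1] - tau[0]
A = np.cos(tau); b = np.sin(3 * tau); rho_dot = np.exp(-tau)
x0, l1 = 0.7, -1.3

# forward equation: x' = A x + b, x(0) = x0
x = np.empty_like(tau); x[0] = x0
for k in range(len(tau) - 1):
    x[k + 1] = x[k] + h * (A[k] * x[k] + b[k])

# backward equation: psi' = -psi A + rho_dot, psi(1) = -l1
psi = np.empty_like(tau); psi[-1] = -l1
for k in range(len(tau) - 1, 0, -1):
    psi[k - 1] = psi[k] - h * (-psi[k] * A[k] + rho_dot[k])

lhs = l1 * x[-1] + h * np.sum(x * rho_dot)   # l1*x1 + int x d(rho)
rhs = -psi[0] * x0 - h * np.sum(psi * b)     # -psi0*x0 - int psi*b dtau
print(lhs, rhs)  # the two values agree up to discretization error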

We now apply Lemma 4 to (4.4) taking into account (3.18). Comparing (4.4) with the left-hand side of (4.6), and (3.18) with the upper row in (4.5), we see that

\begin{equation*} \begin{gathered} \, A= v^\theta f_x^\theta, \qquad \overline b = v^\theta f^\theta_u\overline u+ \overline v f^\theta, \\ d\rho = (\lambda\varphi_x^\theta+m g_x^\theta)\,d\tau+ \Phi_x^\theta\,d\mu, \qquad l_1= l_{x_1}. \end{gathered} \end{equation*} \notag
Next, we introduce the function of bounded variation \psi^\theta(\tau) (the adjoint variable of Problem \mathrm{B}^\theta), which, according to (4.5), is a solution of the equation
\begin{equation} \frac{d\psi^\theta}{d\tau}= -v^\theta\psi^\theta f_x^\theta+ \lambda\varphi_x^\theta + mg_x^\theta+\frac{d\mu}{d\tau}\,\Phi_x^\theta, \qquad \psi^\theta(\tau_1)= -l_{x_1}. \end{equation} \tag{4.7}
By Lemma 4, equality (4.4) assumes the form
\begin{equation*} l_{x_0} \overline x_0 - \psi_0^\theta\overline x_0 - \gamma \overline{z} - \int_{\tau_0}^{\tau_1} \psi^\theta (v^\theta f_u^\theta\overline u + \overline{v} f^\theta)\,d\tau + \int_{\tau_0}^{\tau_1} (\lambda\varphi_u^\theta + m g_u^\theta) \overline u \,d\tau =0. \end{equation*} \notag
Since \overline v(\tau)= \sum \overline{z}_k \chi_k(\tau), we have
\begin{equation} \begin{aligned} \, &(l_{x_0}-\psi^\theta_0)\,\overline x_0 + \sum_k z_k^\theta \int_{\sigma_k} (-\psi^\theta f_u^\theta +\lambda\varphi_u^\theta+ mg_u^\theta)\,\overline u \,d\tau \nonumber \\ &\qquad- \sum_k \overline{z}_k \int_{\sigma_k} \psi^\theta f^\theta\, d\tau - \sum_k\gamma_k\overline{z}_k= 0. \end{aligned} \end{equation} \tag{4.8}

This equality holds for all \overline x_0\in\mathbb{R}^n, all \overline{z}_k\in \mathbb{R}, k=1,\dots,m, and all \overline u \in L_\infty(E_+). By varying \overline x_0 and \overline{z}_k, we get \psi^\theta(\tau_0) = l_{x_0}, and, for every k

\begin{equation} \int_{\sigma_k} \psi^\theta f^\theta\, d\tau = -\gamma_k. \end{equation} \tag{4.9}

Now recall that all \gamma_k \geqslant 0, z_k^\theta\geqslant 0, and, according to the complementary slackness condition (iii) in Theorem 2, \gamma z^\theta := \sum \gamma_k z^\theta_k =0, and so \gamma_k z^\theta_k =0 for all k.

If \sigma_k\subset E_+, then z^\theta_k =1, and so \gamma_k =0. If \sigma_k\subset E_0, then z^\theta_k =0, and we only know that \gamma_k \geqslant 0.

Finally, varying \overline u, we have

\begin{equation} -\psi^\theta f_u^\theta+\lambda\varphi_u^\theta+ m g_u^\theta=0 \quad \text{on each}\ \ \sigma_k\subset E_+. \end{equation} \tag{4.10}
It is worth pointing out that this equality holds only on E_+. If \sigma_k\subset E_0, then \overline u is not varied, and we get no condition here.

4. Let us summarize the preliminary results of deciphering the stationarity condition (4.1).

Theorem 3. For any index \theta, there exists a collection

\begin{equation*} \xi^\theta= (\alpha_0,\alpha,\beta,\lambda^\theta(\tau), m^\theta(\tau),\mu^\theta(\tau)) \end{equation*} \notag
from the space \mathbb{R}^{1+d(F)+d(K)} \times \bigl(L_1^{d(\varphi)} \times L_1^{d(g)} \times BV^{d(\Phi)}\bigr)[\tau_0,\tau_1] and a corresponding function of bounded variation \psi^\theta(\tau) such that the following conditions hold:
\begin{equation} \begin{split} &(\mathrm{i})\qquad \alpha_0\geqslant0,\quad \alpha\geqslant0,\quad \gamma\geqslant 0, \quad \lambda^\theta\geqslant 0,\quad d\mu^\theta \geqslant0; \\ &(\mathrm{ii})\qquad \alpha_0+ |\alpha|+ |\beta|+ \int_{E_+} |\lambda^\theta|\,dt + \int_{E_+} |m^\theta|\,dt+ \int_{\tau_0}^{\tau_1} d\mu^\theta>0, \\ &\qquad\lambda^\theta =0,\quad m^\theta=0\quad\textit{a.e. on }E_0; \\ &(\mathrm{iii})\qquad \alpha F(\widehat{x}_0,\widehat{x}_1)=0,\quad \lambda^\theta(\tau) \varphi^\theta(\tau) =0,\quad \Phi(x^\theta(\tau))\,d \mu^\theta(\tau) =0, \end{split}\nonumber \end{equation} \notag
\begin{equation} \frac{d\psi^\theta}{d\tau}= -v^\theta\psi^\theta f_x^\theta+ \lambda^\theta \varphi_x^\theta+m^\theta g_x^\theta + \frac{d\mu^\theta}{d\tau}\,\Phi_x^\theta, \end{equation} \tag{4.11}
\begin{equation} \psi^\theta(\tau_0)= l_{x_0}, \qquad \psi^\theta(\tau_1)= -l_{x_1}, \end{equation} \tag{4.12}
\begin{equation} -\psi^\theta f_u^\theta+\lambda^\theta \varphi_u^\theta+ m^\theta g_u^\theta=0 \quad \textit{on } E_+, \end{equation} \tag{4.13}
\begin{equation} \int_{\sigma_k}\psi^\theta f^\theta \, d\tau \begin{cases} =0, &\textit{if }\sigma_k\subset E_+, \\ \leqslant 0, &\textit{if }\sigma_k\subset E_0, \end{cases} \qquad k=1,\dots,m. \end{equation} \tag{4.14}

The function \psi^\theta is uniquely determined by the collection \xi^\theta from equation (4.11) and either of the boundary conditions (4.12).

Note that the multiplier \gamma does not appear in the non-triviality condition (ii), since, by (4.9), it is determined by \psi^\theta. Moreover, let us show that m^\theta can also be excluded from condition (ii), that is, this condition can be written as

\begin{equation*} \alpha_0+ |\alpha|+ |\beta|+ \int_{E_+} |\lambda^\theta|\,dt + \int_{\tau_0}^{\tau_1} d\mu^\theta > 0. \end{equation*} \notag
Indeed, if the left-hand side here is zero, then l=0, \lambda^\theta=0, and d\mu^\theta =0, and so
\begin{equation*} \frac{d\psi^\theta}{d\tau}= -v^\theta\psi^\theta f_x^\theta+m^\theta g_x^\theta, \quad\; \psi^\theta(\tau_0)= \psi^\theta(\tau_1)= 0, \quad\; -\psi^\theta f_u^\theta+ m^\theta g_u^\theta=0 \;\; \text{on } E_+. \end{equation*} \notag
The matrix g_u(x^\theta, u^\theta) has full rank uniformly in \tau, and hence its right inverse D(\tau) is bounded, so that m^\theta = \psi^\theta f_u^\theta D(\tau). Substituting this expression into the equation for \psi^\theta, we get a linear homogeneous equation with zero boundary conditions. Therefore, \psi^\theta =0, which also implies m^\theta =0.

Note that even in the general case, with non-zero \lambda^\theta and d\mu^\theta, we can express m^\theta = (\psi^\theta f_u^\theta - \lambda^\theta\varphi_u^\theta)D(\tau) and substitute this expression into the adjoint equation, thereby obtaining a linear equation with respect to \psi that contains \lambda^\theta and d\mu^\theta.
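For definiteness, one admissible choice of the right inverse (an illustration; we assume, as above, that g_u has full row rank with a uniform bound) is
\begin{equation*} D(\tau) = g_u^{\top}\bigl(g_u\, g_u^{\top}\bigr)^{-1}, \qquad g_u D = I, \end{equation*} \notag
where the uniform lower bound on the singular values of g_u guarantees that (g_u g_u^{\top})^{-1}, and hence D(\tau), is bounded.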

5. Consider in more detail the second condition in (4.14). We take any interval \sigma = [\tau', \tau''] composing E_0. On this interval, u^\theta(\tau) = u^s is constant for some s, and v^\theta = 0, whence x^\theta(\tau) is also constant; we denote this constant value by \widehat{x}_*. Thus, f^\theta = f(\widehat{x}_*,u^s). Note that some other intervals from E_0 may be adjacent to [\tau', \tau''] from the left or from the right. (The mapping \tau \mapsto t sends each such interval to the same point t^s.) Let \widetilde \sigma =[\tau'_*,\tau''_*] be the union of this interval with all the adjacent intervals from E_0. (If there are no adjacent intervals on the left-hand side of \sigma, we have \tau'_* =\tau', and if there are no such intervals on the right-hand side of \sigma, we have \tau''_* =\tau''.) Since v^\theta = 0 on the entire interval \widetilde\sigma, we still have that x^\theta(\tau)= \widehat{x}_* is constant there.

According to (4.11) and since \lambda=0 and m=0 on \widetilde\sigma, we have

\begin{equation} d\psi^\theta(\tau)= \sum_{j=1}^{d(\Phi)} d\mu_j^\theta(\tau)\,\Phi_j'(\widehat{x}_*) \quad \text{on }\, \widetilde\sigma. \end{equation} \tag{4.15}
(Recall that the index j denotes the jth state constraint \Phi_j(x)\leqslant0 and the corresponding measure d\mu_j^\theta. Here, \Phi'_j are the rows of the matrix \Phi_x.) This implies that, for any \tau\in [\tau'_*,\tau''_*],
\begin{equation} \psi^\theta(\tau)-\psi^\theta(\tau'_*-0)= \sum_{j=1}^{d(\Phi)}\, [\mu_j^\theta(\tau)-\mu_j^\theta(\tau'_*-0)]\Phi_j'(\widehat{x}_*). \end{equation} \tag{4.16}
The second condition in (4.14) means that, on the above non-extended interval \sigma,
\begin{equation} \int_{\tau'}^{\tau''} \psi^\theta(\tau) f(\widehat{x}_*,u^s)\, d\tau \leqslant 0. \end{equation} \tag{4.17}
Putting here the value \psi^\theta(\tau) from (4.16), we get
\begin{equation} \begin{aligned} \, &\psi^\theta(\tau'_*-0)f(\widehat{x}_*,u^s)(\tau''-\tau') \nonumber \\ &\qquad +\sum_{j=1}^{d(\Phi)}\Phi'_j(\widehat{x}_*)f(\widehat{x}_*,u^s) \int_{\tau'}^{\tau''}[\mu_j^\theta(\tau)- \mu_j^\theta(\tau'_*-0)]\,d\tau \leqslant 0. \end{aligned} \end{equation} \tag{4.18}
Since \mu_j^\theta(\tau) \leqslant \mu_j^\theta(\tau''_*+0) on [\tau', \tau''] for all j, we have
\begin{equation*} \int_{\tau'}^{\tau''}[\mu_j^\theta(\tau) -\mu_j^\theta(\tau'_*-0)]\, d\tau \leqslant [\mu_j^\theta(\tau''_*+0) -\mu_j^\theta(\tau'_*-0)](\tau''-\tau'). \end{equation*} \notag
Since also \mu_j^\theta(\tau)\geqslant \mu_j^\theta(\tau'_*-0), the integral on the left lies between zero and the right-hand side, and hence there exist numbers 0\leqslant \rho_j \leqslant 1, \;j=1,\dots,d(\Phi), such that
\begin{equation*} \int_{\tau'}^{\tau''} [\mu_j^\theta(\tau) -\mu_j^\theta(\tau'_*-0)]\, d\tau = \rho_j [\mu_j^\theta(\tau''_*+0) -\mu_j^\theta(\tau'_*-0)](\tau''-\tau'). \end{equation*} \notag
Hence from (4.18) we have
\begin{equation} \biggl(\psi^\theta(\tau'_*-0) + \sum_{j=1}^{d(\Phi)} \rho_j [\mu_j^\theta(\tau''_* +0) -\mu_j^\theta(\tau'_*-0)] \Phi'_j(\widehat{x}_*)\biggr) f(\widehat{x}_*,u^s) \leqslant 0. \end{equation} \tag{4.19}

Remark 8. This non-trivial trick of replacing condition (4.18) by condition (4.19) with unknown numbers \rho_j was proposed by Milyutin in his lectures at the Faculty of Mechanics and Mathematics of Lomonosov Moscow State University in the 1970s. This trick will allow us to proceed with the next important step in the proof of an MP for problems with several state constraints, that is, to pass from the conditions in the time \tau to conditions in the original time t. In the case of a scalar state constraint, this trick is not required, since in this setting the function \psi^\theta(\tau) f(\widehat{x}_*,u^s) is monotone on [\tau'_*,\tau''_*] (see [27]).

Now, we rewrite the obtained conditions in terms of the original time t. This will make it possible to compare the conditions obtained for different indexes \theta on the same interval [\widehat t_0,\widehat t_1].

§ 5. Finite-valued maximum principle of index \theta

By construction, t^\theta(\tau) is a non-decreasing function on [\tau_0, \tau_1] which maps this interval onto [\widehat t_0,\widehat t_1], and which is constant on each interval \sigma \subset E_0. In addition, on [\tau_0, \tau_1] we have the functions u^\theta(\tau) and x^\theta(\tau) related to the original functions \widehat{x}(t) and \widehat{u}(t) via (3.11).

Let \tau^\theta(t) be the smallest root of the equation t^\theta(\tau) =t. This function strictly increases and has jumps at the given points t^s (and only at these points); namely, the jump is \Delta\tau(t^s) = \tau''_* -\tau'_*, where [\tau'_*,\tau''_*] is the above maximal interval corresponding to the point t^s. Consider the functions

\begin{equation*} \begin{alignedat}{2} \lambda(t) &= \lambda^\theta(\tau^\theta(t)), & \qquad m(t) &= m^\theta(\tau^\theta(t)), \\ \mu(t) &= \mu^\theta(\tau^\theta(t)), & \qquad \psi(t)&=\psi^\theta(\tau^\theta(t)), \end{alignedat} \qquad t\in [\widehat t_0,\widehat t_1]. \end{equation*} \notag

Since \lambda^\theta =0 and m^\theta=0 on E_0, and dt = d\tau on E_+, the functions \lambda(t) and m(t) are also integrable, now on the interval [\widehat t_0,\widehat t_1], and the normalization of these multipliers is preserved when passing from \tau to t:

\begin{equation*} \int_{\widehat t_0}^{\widehat t_1} |\lambda(t)|\,dt = \int_{\tau_0}^{\tau_1} |\lambda^\theta(\tau)|\,d\tau, \qquad \int_{\widehat t_0}^{\widehat t_1} |m(t)|\,dt = \int_{\tau_0}^{\tau_1} |m^\theta(\tau)|\,d\tau. \end{equation*} \notag
(The second equality will not be used below, since the multiplier m is excluded from the normalization.)

It is easily seen that the function \mu(t) does not decrease and has the jumps \Delta\mu(t^s)= \mu^\theta(\tau''_* +0) -\mu^\theta(\tau'_*-0) at the points t^s; moreover,

\begin{equation*} \int_{\widehat t_0}^{\widehat t_1} d\mu(t) = \int_{\tau_0}^{\tau_1} d\mu^\theta(\tau), \end{equation*} \notag
and \psi(t) is a function of bounded variation satisfying
\begin{equation*} \begin{aligned} \, \frac{d\psi(t)}{dt} &= -\psi(t)f_x(\widehat{x}(t),\widehat{u}(t)) \\ &\qquad + \lambda(t)\varphi_x(\widehat{x}(t),\widehat{u}(t))+ m(t)g_x(\widehat{x}(t),\widehat{u}(t))+ \frac{d \mu(t)}{dt}\, \Phi'(\widehat{x}(t)) \end{aligned} \end{equation*} \notag
with the same endpoint values as \psi^\theta(\tau). This equation follows from (4.11) since d\mu(t) = d\mu^\theta(\tau) for \tau \in E_+ and for the corresponding t = t^\theta(\tau). The proof of these properties is left to the reader.

Recall that, in view of assumption (2.7), the measure vanishes near the points \widehat t_0 and \widehat t_1 (d\mu(t)=0 there), and hence \psi(t) is continuous at these points.

Theorem 3 can be rewritten in the original time t\in [\widehat t_0,\widehat t_1] as follows.

Theorem 4 (maximum principle for index \theta). For any index \theta, there exists a collection \xi = (\alpha_0,\alpha,\beta,\lambda(t), m(t),\mu(t)), where the functions \lambda(t) and m(t) are integrable, \mu(t) is non-decreasing, and a function of bounded variation \psi(t) corresponding to this collection, such that:

\begin{equation} \begin{gathered} \, \begin{split} &(\mathrm{i}) \ \alpha_0\geqslant0,\quad \alpha\geqslant 0,\quad \lambda(t)\geqslant0,\quad d\mu(t)\geqslant 0; \\ &(\mathrm{ii}) \ \alpha_0+ |\alpha|+ |\beta|+ \sum_i \int_{\widehat t_0}^{\widehat t_1} \lambda_i(t)\,dt+ \sum_j \int_{\widehat t_0}^{\widehat t_1}d \mu_j(t) =1; \\ &(\mathrm{iii}) \ \alpha F(\widehat{x}_0,\widehat{x}_1)=0, \qquad \lambda_i(t)\varphi_i(\widehat{x}(t),\widehat{u}(t))=0,\quad i=1,\dots, d(\varphi), \\ &\qquad\Phi_j(\widehat{x}(t))\,d\mu_j(t) =0,\qquad j=1,\dots, d(\Phi); \\ &(\mathrm{iv}) \ \frac{d\psi}{dt}= -\psi f_x(\widehat{x},\widehat{u})+ \lambda\varphi_x(\widehat{x},\widehat{u})+ m g_x(\widehat{x},\widehat{u})+ \frac{d \mu}{dt}\, \Phi_x(\widehat{x}); \\ &(\mathrm{v}) \ \psi(\widehat t_0)= l_{x_0},\qquad \psi(\widehat t_1)= -l_{x_1}; \\ &(\mathrm{vi}) \ -\psi f_u(\widehat{x},\widehat{u})+ \lambda\varphi_u(\widehat{x},\widehat{u})+ mg_u(\widehat{x},\widehat{u})=0; \\ &(\mathrm{vii}) \ \textit{for any neighbouring points }t^s < t^{s+1}\textit{ of index }\theta, \\ &\qquad\qquad\qquad\int_{t^s}^{t^{s+1}} \psi(t)\,f(\widehat{x}(t),\widehat{u}(t))\,dt = 0, \\ &(\mathrm{viii}) \ \textit{for any pair }(t^s,u^s)\textit{ of index }\theta, \textit{ there exist numbers } 0\leqslant \rho_j\leqslant 1,\qquad\quad \\ &\qquad\ j=1,\dots,d(\Phi),\textit{ such that} \end{split} \end{gathered} \end{equation} \notag
\begin{equation} \biggl(\psi(t^s -0)+ \sum_{j=1}^{d(\Phi)} \rho_j \Delta\mu_j(t^s)\Phi'_j(\widehat x(t^s))\biggr) f(\widehat{x}(t^s),u^s) \leqslant 0. \end{equation} \tag{5.1}

Relation (vii) is secured by the first condition in (4.14), since on any \Delta \subset E_+ the mapping \tau \to t is one-to-one and v^\theta(\tau)=1, which gives dt= d\tau. Relation (viii) follows from (4.19).

Note that the function \psi(t) is uniquely determined by \xi from equation (iv) and any of boundary conditions (v).

Thus, for the given index \theta, we obtained a collection of Lagrange multipliers that generate a function \psi(t), so that conditions (i)–(viii) hold. These Lagrange multipliers depend, in general, on the index \theta. Conditions (i)–(vi) are the same for all indexes, but conditions (vii)–(viii) are index-specific. Now, our goal is to pass to conditions (vii)–(viii) for a “universal” collection of multipliers which do not depend on the index \theta.

§ 6. Passage to a universal maximum principle

1. Using the regularity assumption on the mixed constraints, we have shown above that the multipliers \lambda(t) and m(t) in Theorem 4 are integrable. Let us now show that these functions are bounded. Since the set of active indexes of the mixed inequality constraints \varphi_i(\widehat{w}(t))\leqslant0 varies with time, one can consider all possible finite subsets of \{1,\dots, d(\varphi)\} separately. Then it can be assumed, without loss of generality, that all the inequalities \varphi_i(\widehat{w}(t))\leqslant0 are active on some measurable set E \subset [\widehat t_0,\widehat t_1]. Consider the sets

\begin{equation*} \begin{gathered} \, S = \biggl\{(\alpha,\beta)\in \mathbb{R}^{d(\varphi)}\times \mathbb{R}^{d(g)}\biggm|\; \alpha\geqslant 0,\, \sum\alpha_i + \sum |\beta_j|=1\biggr\}, \\ Q_0 = \{w\in \mathcal{Q}\mid \varphi(w)=0,\; g(w)=0\}. \end{gathered} \end{equation*} \notag
(Recall that \mathcal{Q} is an open subset of \mathbb{R}^{n+r} on which the data functions of Problem \mathrm{B} are defined.)

By the assumption, for any w\in Q_0, the vectors \varphi_{iu}(w) and g_{ju}(w) are positively linearly independent, and hence

\begin{equation*} \min_S\,\Bigl|\sum\alpha_i \varphi_{iu}(w) +\sum \beta_j g_{ju}(w)\Bigr| >0. \end{equation*} \notag
The function on the left-hand side is continuous, and hence, for any compact set M \subset Q_0, we still have
\begin{equation*} \min_{w\in M} \min_S\, \Bigl|\sum\alpha_i \varphi_{iu}(w) + \sum \beta_j g_{ju}(w)\Bigr| := c >0. \end{equation*} \notag
Hence, for any w\in M and any \alpha\geqslant0 and \beta,
\begin{equation} \Bigl|\sum\alpha_i \varphi_{iu}(w) + \sum \beta_j g_{ju}(w)\Bigr| \geqslant c \Bigl(\sum\alpha_i + \sum |\beta_j|\Bigr). \end{equation} \tag{6.1}
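In detail, (6.1) follows from the preceding bound by homogeneity: if s:= \sum\alpha_i + \sum|\beta_j| > 0, then (\alpha/s, \beta/s) \in S, and hence
\begin{equation*} \Bigl|\sum\alpha_i \varphi_{iu}(w) + \sum \beta_j g_{ju}(w)\Bigr| = s\, \Bigl|\sum \frac{\alpha_i}{s}\, \varphi_{iu}(w) + \sum \frac{\beta_j}{s}\, g_{ju}(w)\Bigr| \geqslant c\,s, \end{equation*} \notag
while for s=0 inequality (6.1) is trivial.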

Recall that, for the process \widehat{w}, there exists a compact set D \subset Q such that \widehat{w}(t) \in D almost everywhere on [\widehat t_0,\widehat t_1], and hence, on E. We now set M = D\cap Q_0. Clearly, M is a compact set, and \widehat{w}(t) \in M almost everywhere on E.

We next set \alpha= \lambda(t) and \beta = m(t). In view of (6.1), for almost all t\in E,

\begin{equation*} \sum \lambda_i(t) + \sum |m_j(t)| \leqslant \frac 1c\, \Bigl|\sum \lambda_i(t) \varphi_{iu}(\widehat{w}(t)) + \sum m_j(t) g_{ju}(\widehat{w}(t))\Bigr|. \end{equation*} \notag

By condition (vi), the quantity under the modulus sign on the right is a bounded function \psi(t) f_u(\widehat{w}(t)), whence

\begin{equation} \sum \lambda_i(t) + \sum |m_j(t)| \leqslant \frac 1c\,|\psi(t) f_u(\widehat{w}(t))|, \end{equation} \tag{6.2}
which implies that all the multipliers \lambda_i(t) and m_j(t) are also bounded, that is, they lie in L_\infty(E).

2. To take into account the conditions generated by all the indexes \theta, we proceed as follows. For a given index \theta, we introduce the set \Lambda^\theta of all collections \xi =(\alpha_0,\alpha,\beta,\lambda(t),m(t),\mu(t)) that satisfy, together with the corresponding functions \psi(t), conditions (i)–(viii) of Theorem 4. According to the above, this set lies in the space

\begin{equation*} Y^* = \mathbb{R}^{1+d(F)+d(K)} \times L_\infty^{d(\varphi)}(\Delta) \times L_\infty^{d(g)}(\Delta) \times BV^{d(\Phi)}(\Delta), \end{equation*} \notag
which is the dual of the space
\begin{equation*} Y = \mathbb{R}^{1+d(F)+d(K)} \times L_1^{d(\varphi)}(\Delta) \times L_1^{d(g)}(\Delta)\times C^{d(\Phi)}(\Delta), \end{equation*} \notag
where \Delta = [\widehat t_0,\widehat t_1].

The following key fact holds.

Lemma 5. The set \Lambda^\theta is compact in the w^*-topology of the space Y^*.

Proof. First, let us check that \Lambda^\theta is bounded. By the normalization condition (ii), \alpha_0 +|\alpha|+|\beta|\leqslant 1 and \|\lambda\|_1 + \|d\mu\|\leqslant 1. Proceeding as above, multiplying equality (vi) by a bounded matrix D(t), we get the expression m = (\psi f_u - \lambda \varphi_u)D(t), which we substitute into (iv), thereby obtaining, for \psi, the linear equation d\psi = (A(t)\psi + B(t)\lambda)\,dt + G(t)\, d\mu, where A, B, and G are bounded measurable matrices. By Lemma 6 (see the appendix, § 9.3),
\begin{equation*} \|\psi\|_\infty \leqslant \mathrm{const} \bigl(|\psi(\widehat t_0)| + \|\lambda\|_1 + \|d\mu\|\bigr) \leqslant \mathrm{const} \end{equation*} \notag
on \Lambda^\theta, and hence by (6.2) we also have \|\lambda\|_\infty + \|m\|_\infty \leqslant \mathrm{const} on \Lambda^\theta. This shows that the set \Lambda^\theta is bounded.

Further, since all the conditions defining \Lambda^\theta (except the normalization condition) are linear with respect to all components of \xi, and since the normalization of the infinite-dimensional components \lambda_i and d\mu_j is given by linear functionals from the original spaces (from L_1 and C, respectively), that is, by w^*-continuous functionals, it follows that the set \Lambda^\theta is w^*-closed.

For example, let us check the w^*-closedness of the set of all \xi satisfying the adjoint equation (iv), which we write in terms of measures

\begin{equation} d\psi(t)= (-\psi f_x + \lambda\varphi_x + m g_x)\,dt + \Phi_{x}\, d\mu(t), \end{equation} \tag{6.3}
and the boundary conditions
\begin{equation} \psi(\widehat t_0) = (\alpha_0 F_0 + \alpha F +\beta K)'_{x_0}, \qquad \psi(\widehat t_1) = -(\alpha_0 F_0 + \alpha F +\beta K)'_{x_1}. \end{equation} \tag{6.4}
Note that the functions f_x, \varphi_x, and g_x, as evaluated along the process \widehat{w}(t), are measurable and bounded, and \Phi_x is continuous.

Using again the expression m =(\psi f_u-\lambda \varphi_u)D(t) with a bounded matrix D(t), we have by (6.3) the equation

\begin{equation} d\psi(t)= A(t)\psi(t)\,dt+B(t)\lambda(t)\,dt+ G(t)\,d\mu(t), \end{equation} \tag{6.5}
where A, B, G are some measurable bounded matrices of corresponding dimensions. (Here, \psi, \lambda, and \mu are considered as columns.)

Let collections \xi^k\in Y^*, k =1,2,\dots, satisfy conditions (6.3), (6.4) and w^*-converge to a collection \xi^0 \in Y^*. We have to show that the limit collection \xi^0 also satisfies these conditions. (The space Y is separable, and hence the w^*-topology is metrizable on the bounded subsets of Y^*, and so it suffices to work with sequences.)

The w^*-convergence of the measures d\mu^k \to d\mu^0 in C^*, the w^*-convergence of the functions \lambda^k \to \lambda^0 in L_\infty, and the convergence of the finite-dimensional components (\alpha_0^k,\alpha^k,\beta^k) \to (\alpha_0^0,\alpha^0,\beta^0) imply that the functions \psi^k are uniformly bounded, \psi^k(t) \to \psi^0(t) almost everywhere on \Delta, and the measures d\psi^k w^*-converge to d\psi^0 (see Lemma 7 in § 9.3). By the Lebesgue dominated convergence theorem, \psi^k \to \psi^0 in L_1(\Delta), and hence m^k w^*-converges to m^0 in L_\infty.

Since (6.3) is an equality between linear functionals on C^n(\Delta), it suffices to check this equality at any test function \overline x\in C^n(\Delta). By the assumption, for any k,

\begin{equation} \int_\Delta \overline x\,d\psi^k= \int_\Delta \overline x\,(-\psi^k f_x + \lambda^k\varphi_x + m^k g_x)\,dt + \int_\Delta \overline x\,\Phi_{x}\, d\mu^k. \end{equation} \tag{6.6}
Since d\psi^k\to d\psi^0 and d\mu^k\to d\mu^0 weakly^*, the left-hand side and the last term on the right converge to the required limits
\begin{equation} \int_\Delta \overline x\,d\psi^k \to \int_\Delta \overline x\,d\psi^0, \qquad \int_\Delta \overline x\,\Phi_x\,d\mu^k \to \int_\Delta \overline x\,\Phi_x\,d\mu^0, \end{equation} \tag{6.7}
and the convergence of the middle integral on the right of (6.6)
\begin{equation*} \int_\Delta \overline x\,(-\psi^k f_x + \lambda^k\varphi_x + m^k g_x)\,dt \to \int_\Delta \overline x\,(-\psi^0 f_x + \lambda^0\varphi_x + m^0 g_x)\,dt \end{equation*} \notag
is secured by the fact that \psi^k\to \psi^0, \;\lambda^k\to \lambda^0 and m^k\to m^0 weakly^* in L_\infty with respect to L_1.

Passing to the limit, we have

\begin{equation*} \int_\Delta \overline x\,d\psi^0= \int_\Delta \overline x\,(-\psi^0 f_x + \lambda^0 \varphi_x + m^0 g_x)\,dt + \int_\Delta \overline x\,\Phi_{x}\, d\mu^0. \end{equation*} \notag
Since \overline x \in C^n(\Delta) is arbitrary, this yields the required equality
\begin{equation*} d\psi^0(t)= (-\psi^0 f_x + \lambda^0 \varphi_x + m^0\,g_x)\,dt + \Phi_{x}\, d\mu^0(t), \end{equation*} \notag
which means that equation (iv) is preserved under w^*-limits.

The proof of the w^*-closedness of conditions (i), (ii), (v), (vi), (vii), and of the first two conditions in (iii) which define the set \Lambda^\theta is even simpler and hence omitted. The third condition in (iii), for any j= 1, \dots, d(\Phi), means that, for any interval between neighbouring points t^s< t^{s+1} at which \Phi_j(\widehat{x}(t))<0, the measure d\mu_j vanishes. This is equivalent to saying that \int \overline x\,d\mu_j =0 for any continuous function \overline x(t) supported on this interval. Obviously, this property is preserved under the w^*-convergence d\mu^k_j \to d\mu^0_j.

It remains only to consider the last condition (viii). We fix any s and consider the pair (t^s,u^s) from the index \theta. To avoid confusion with the numbers in a sequence, we set here t^s =t_*, \widehat{x}(t^s)= \widehat{x}_*, and u^s = u_*.

For any collection \xi \in \Lambda^\theta, we introduce the function h(t) = \psi(t) f(\widehat{x}_*,u_*), that is, the projection of the vector \psi(t) onto the fixed direction f(\widehat{x}_*,u_*), and consider it on the entire interval \Delta = [\widehat t_0,\widehat t_1]. By (6.3), the function h satisfies

\begin{equation*} d h(t)= (-\psi f_x + \lambda\varphi_x + m\,g_x)f(\widehat{x}_*,u_*)\,dt+ \sum \Phi'_j(\widehat{x}(t))\,f(\widehat{x}_*,u_*)\, d\mu_j(t). \end{equation*} \notag

Consider the scalar functions a_j(t) = \Phi'_j(\widehat{x}(t))\, f(\widehat{x}_*,u_*), j=1,\dots, d(\Phi). These functions are continuous, since so is \Phi'_j(\widehat{x}(t)), and since the vector f(\widehat{x}_*,u_*) is constant. We also introduce the function b(t) = (-\psi f_x + \lambda\varphi_x + m\,g_x)f(\widehat{x}_*,u_*), which is measurable and bounded. We have

\begin{equation} d h(t)= b(t)\,dt+ \sum a_j(t)\, d\mu_j(t). \end{equation} \tag{6.8}

Now let a sequence \xi^k \in\Lambda^\theta w^*-converge to \xi^0 \in Y^*. Then, for any k =1,2, \dots, we have the measure

\begin{equation*} d h^k(t)= b^k(t)\,dt+ \sum a_j(t)\, d\mu_j^k(t), \end{equation*} \notag
where \|b^k\|_\infty \leqslant \mathrm{const} and \|d\mu_j^k\| \leqslant 1 by the normalization condition (ii).

In view of condition (viii), for the given pair (t_*,u_*) and any k, there exist numbers \rho_{j}^k\in[0,1], j=1,\dots,d(\Phi), such that

\begin{equation*} h^k(t_*-0)+ \sum \rho_{j}^k a_j(t_*)\Delta\mu_{j}^k(t_*) \leqslant 0. \end{equation*} \notag
Clearly, h^k(\widehat t_0) \to h^0(\widehat t_0), and by the assumption, b^k \xrightarrow{w^*} b^0 (in L_\infty(\Delta) with respect to L_1(\Delta)), and also d\mu_j^k \xrightarrow{{w^*}} d\mu_j^0 for all j. Therefore, Lemma 11 (see the appendix, § 9.3) applies, and hence there exist numbers \rho_{j}^0\in [0,1], j= 1,\dots, d(\Phi), such that
\begin{equation*} h^0(t_*-0)+ \sum \rho_{j}^0 a_j(t_*) \Delta\mu_{j}^0(t_*) \leqslant 0. \end{equation*} \notag
The last inequality means that
\begin{equation*} \biggl(\psi^0(t_*-0)+ \sum_{j=1}^{d(\Phi)} \rho^0_{j}\, \Delta\mu^0_{j}(t_*)\Phi'_j(\widehat{x}_*)\biggr) f(\widehat{x}_*,u_*) \leqslant 0, \end{equation*} \notag
which is exactly condition (viii) for the limit collection \xi^0. Thus, \xi^0 \in \Lambda^\theta.

Thus, the set \Lambda^\theta is bounded and w^*-closed, and hence is w^*-compact by the Alaoglu theorem. Lemma 5 is proved. \Box

3. Now, for each possible index \theta, we obtain a corresponding non-empty compact set \Lambda^\theta. Let us show that the family of all such compact sets constitutes a centred (Alexandrov type) system. To this aim, we introduce a partial order in the set of all indexes. We say that \theta_1\subset \theta_2 if each pair (t^s,u^s) from \theta_1 also lies in \theta_2. Obviously, for any two indexes \theta_1 and \theta_2, there is a third one which contains both of them, for example, their union. It is also clear that an expansion of \theta reduces the set \Lambda^\theta, that is, \theta_1\subset\theta_2 implies the inverse inclusion \Lambda^{\theta_1}\supset \Lambda^{\theta_2}.

Consider now a finite collection of compact sets \Lambda^{\theta_1},\dots, \Lambda^{\theta_m} and take any index \theta containing all indexes \theta_1,\dots, \theta_m. The non-empty compact set \Lambda^\theta is contained in each of the sets \Lambda^{\theta_1},\dots,\Lambda^{\theta_m}, and so, their intersection is non-empty. Therefore, the family \{ \Lambda^{\theta}\} is centred, and hence has a non-empty intersection

\begin{equation*} \Lambda_*=\; \bigcap_{\theta}\Lambda^\theta. \end{equation*} \notag

Next, consider an arbitrary collection of multipliers \xi=(\alpha_0,\alpha,\beta,\lambda,m,\mu)\in \Lambda_*, and let \psi be the corresponding adjoint function. By definition, this collection satisfies conditions (i)–(vi) of Theorem 4. Fulfillment of condition (vii) for any index \theta means that, for any interval (t',t''),

\begin{equation*} \int_{t'}^{t''}\psi(t) f(\widehat{x}(t),\widehat{u}(t))\,dt = 0 \end{equation*} \notag
(since there exists an index containing the points t' and t''); this is equivalent to saying that
\begin{equation} \psi(t)f(\widehat{x}(t),\widehat{u}(t)) = 0 \qquad \text{a.e. on $[\widehat t_0,\widehat t_1]$}. \end{equation} \tag{6.9}

Condition (viii) for the collection \xi implies that, for any u \in \mathcal{R}_0(\widehat{x}(t)) and any point t \in (\widehat t_0,\widehat t_1) at which the measures d\mu_j do not have atoms (that is, \Delta\mu_j(t)=0), we have

\begin{equation} \psi(t -0) f(\widehat{x}(t),u) \leqslant 0. \end{equation} \tag{6.10}
Since this inequality holds for all t except countably many points, and since \psi can be considered continuous from either the left or right, this inequality holds for all t from (\widehat t_0,\widehat t_1), and hence it also holds for the endpoints of this interval (since \psi is continuous at these points). Hence, for any t, we also have the symmetric inequality
\begin{equation} \psi(t +0) f(\widehat{x}(t),u) \leqslant 0. \end{equation} \tag{6.11}
Inequalities (6.10) and (6.11) remain valid for all u \in \mathcal{R}(\widehat{x}(t)), since by Lemma 1 any such point is a limit point of \mathcal{R}_0(\widehat{x}(t)), and hence, condition (3.6) of the maximum principle for the autonomous Problem \mathrm{B} is met.

Thus, the chosen collection \xi ensures all the conditions of the MP for Problem \mathrm{B}. This proves Theorem 1 for Problem \mathrm{B}, and therefore, for the original Problem \mathrm{A}. \Box

§ 7. A problem with an inclusion-type constraint

Consider briefly a problem with an inclusion-type constraint u(t) \in U. According to Remark 3, we cannot simply add this constraint to the problem. However, we may proceed as follows. Assume that the control components are split into two groups: u= (u_1, u_2), where u_1\in\mathbb{R}^{r_1}, u_2\in\mathbb{R}^{r_2}. Consider Problem \mathrm{A} with an additional constraint only on the second group: u_2(t) \in U, where the set U \subset \mathbb{R}^{r_2} is arbitrary. This problem will be called Problem \mathrm{D}. (If the component u_2 is absent, we still have Problem \mathrm{A}.)

Assume that the functions f, \varphi, g, and their first derivatives with respect to u_1 are jointly continuous with respect to (t,x, u_1, u_2) on the set \mathcal{Q}, whereas no differentiability with respect to u_2 is assumed. We say in this case that u_1 is a smooth control and u_2 is a non-smooth control.

The regularity assumption on the mixed constraints should now involve only the smooth control, that is, one should assume that, for any point (t,x,u_1,u_2)\in \mathcal{Q} at which these constraints are met together with u_2 \in U, the gradients with respect to u_1,

\begin{equation*} \varphi'_{iu_1}(t,x,u_1,u_2),\quad i\in I(t,x,u_1,u_2), \qquad g'_{ju_1}(t,x,u_1,u_2),\quad j=1,\dots, d(g), \end{equation*} \notag
are positively linearly independent. (Note that this is a more restrictive assumption than the former one, which involves the gradients with respect to all the control components.) The following analog of Theorem 1 holds.

Theorem 5. If a process \widehat{w}=(\widehat{x}(t), \widehat{u}_1(t), \widehat{u}_2(t)), t\in [\widehat t_0, \widehat t_1], delivers a Pontryagin minimum in Problem \mathrm{D}, then there exist multipliers \alpha_0, \alpha, \beta, and functions \lambda(t), m(t), \mu(t), \psi_x(t), \psi_t(t) of the same classes as before, for which conditions (i)–(vi) of Theorem 1 still hold, condition (vii) is replaced by

\begin{equation*} \overline H_{u_1}(\psi_x(t),t,\widehat{x}(t), \widehat{u}(t))=0, \end{equation*} \notag
and condition (viii) holds for all u' =(u'_1, u'_2) such that
\begin{equation*} \varphi(t,\widehat{x}(t), u'_1, u'_2)\leqslant0,\quad g(t,\widehat{x}(t), u'_1, u'_2) =0, \qquad u'_2 \in U. \end{equation*} \notag

If \widetilde{\mathcal{R}}(t,\widehat{x}(t)) denotes the set of all u' =(u'_1, u'_2) satisfying the last three relations, then the set \mathcal{R}(t,\widehat{x}(t)) in the maximality condition (2.9) should be replaced by \widetilde{\mathcal{R}}(t,\widehat{x}(t)).

The proof, which proceeds as above, involves a reduction to the autonomous case; the only difference is that now the index \theta consists of finitely many triples (t^s, u_1^s, u_2^s) satisfying \varphi(\widehat{x}(t^s),u_1^s, u_2^s) <0, \;g(\widehat{x}(t^s),u_1^s, u_2^s) =0, and u_2^s \in U, and, in addition, for constructing the control system in the \theta-problem one should, in analogy with Lemma 2, split u_1 = (\tilde u_1,\tilde u_2) so that the matrix g'_{\tilde u_2}(\widehat{x}(t^s),\tilde u_1^s,\tilde u_2^s, u_2^s) is invertible, resolve the equality g(x,\tilde u_1,\tilde u_2, u_2^s) =0 by a smooth function \tilde u_2 =G(x,\tilde u_1, u_2^s), and then freeze the value \tilde u_1 = \tilde u_1^s. The details are left to the reader.

§ 8. Example: geodesics on a smooth surface

In the Euclidean space \mathbb{R}^n, consider the surface S\colon c(x)=0, where c is a twice differentiable function such that c'(x) \ne 0 at all points of the surface. We are given two points x_0 and x_1 on this surface. The problem is to find a shortest curve lying on S which connects these points.

We represent this problem as the time-optimal control problem

\begin{equation*} \begin{gathered} \, \dot x =u, \qquad |u|\leqslant 1, \qquad x(t_0)= x_0, \qquad x(t_1)= x_1, \\ c(x(t))=0, \qquad J = t_1 -t_0 \to \min. \end{gathered} \end{equation*} \notag

Here, x is a state variable, and its velocity u is a control. Clearly, if the modulus of the velocity is bounded by 1, then the fastest trajectory has the shortest length. Since the problem is linear with respect to the control, and since the set of admissible control values is convex and compact, the existence of a solution is secured by the classical Filippov theorem.

As already mentioned, the state constraint c(x)=0 is not allowed in our setting. Differentiating it along the control system and taking into account that the initial point x_0 lies on S, we replace it by the equality x(t_0)= x_0 and the vanishing of the derivative, c'(x)u=0. However, in this case, the equality x(t_1)= x_1 at the terminal point is overdetermined, since we automatically get c(x(t_1))=0, and so the set of equality constraints becomes a priori jointly degenerate. In fact, to satisfy the terminal condition x(t_1)= x_1 it suffices to impose it only on the tangent hyperplane to S at the point x_1.

Let L(x_1) be this tangent hyperplane, and let \xi_1,\dots, \xi_{n-1} be some basis for it. It suffices to require at t= t_1 that

\begin{equation} (\xi_i,\, (x(t_1)- x_1)) =0, \qquad i=1,\dots, n-1, \end{equation} \tag{8.1}
that is, \pi_L (x(t_1)- x_1) =0, where \pi_L\colon \mathbb{R}^n \to L(x_1) is the projection onto L(x_1) along the vector c'(x_1). It is easily seen that, in some neighbourhood of x_1, this equality and c(x(t_1))=0 imply that x(t_1)= x_1.

Thus, instead of the original “incorrect” statement of the problem, we consider the problem

\begin{equation} \dot x =u, \qquad x(t_0)= x_0, \quad J = t_1 -t_0 \to \min, \end{equation} \tag{8.2}
\begin{equation} (\xi_i,\, (x(t_1)- x_1)) =0, \qquad i=1,\dots, n-1, \end{equation} \tag{8.3}
\begin{equation} c'(x)u=0, \qquad (u,u)-1 \leqslant 0. \end{equation} \tag{8.4}

The last two relations will be treated as mixed constraints, and the control u is assumed to be smooth. So, we have a problem of type \mathrm{A} (and even of type \mathrm{B}).

Note that at any point (x,u) at which constraints (8.4) are met and |u|=1, their gradients with respect to u are linearly independent. Indeed, these gradients are the non-zero vectors c'(x) and 2u, which, in view of the first equality, are orthogonal. If |u|<1, then the inequality constraint is inactive, and only the gradient c'(x) of the first constraint should be considered, which by the assumption is non-zero. Thus, constraints (8.4) are regular, and so Theorem 1 can be applied to problem (8.2)–(8.4).

Let (\widehat{x}(t), \widehat{u}(t)) be an optimal pair. Then there exist a number \alpha_0\geqslant 0, vectors \beta\in \mathbb{R}^n, \gamma\in \mathbb{R}^{n-1}, Lipschitz-continuous functions \psi_x(t), \psi_t(t), measurable bounded functions \lambda(t)\geqslant 0, m(t), not all equal to zero, which generate the Pontryagin function H =(\psi_x,\,u), the extended Pontryagin function

\begin{equation*} \overline H = (\psi_x,u) - \frac12 \lambda(t)((u,u)-1)- m(t)(c'(x),u), \end{equation*} \notag
and the endpoint Lagrange function
\begin{equation*} l= \alpha_0(t_1 -t_0) +\beta (x(t_0)- x_0) + \sum_{i=1}^{n-1} \gamma_i(\xi_i,\,(x(t_1)- x_1)), \end{equation*} \notag
such that along (\widehat{x}(t), \widehat{u}(t)) the following conditions hold:

the complementary slackness condition

\begin{equation} \lambda(t)\,\bigl((\widehat{u},\widehat{u})-1\bigr) =0; \end{equation} \tag{8.5}

the adjoint equation in x

\begin{equation} \dot\psi_x = -\overline H_x = m(t)c''(\widehat{x})\widehat{u} \end{equation} \tag{8.6}
(here we use the symmetry of the matrix c''(x));

the transversality conditions

\begin{equation} \psi_x(t_0) = \beta, \qquad \psi_x(t_1)= - \sum_{i=1}^{n-1} \gamma_i\xi_i; \end{equation} \tag{8.7}

the adjoint equation in t

\begin{equation} \psi_t= \mathrm{const} = -\alpha_0; \end{equation} \tag{8.8}

“the energy conservation law”

\begin{equation} (\psi_x,\widehat{u})+ \psi_t= 0,\quad \text{i.e.,}\quad \widehat H = (\psi_x,\widehat{u}) \equiv \alpha_0; \end{equation} \tag{8.9}

and the stationarity condition in u

\begin{equation} \overline H_u= \psi_x- \lambda(t)u- m(t)c'(x)=0. \end{equation} \tag{8.10}

Also, one can write the maximality condition for H, but since the constraints (8.4) are convex in u, this condition follows from the last equality.

Below, we will drop the hats on x and u, and, instead of \psi_x we simply write \psi.

Multiplying (8.10) by u, we get (\psi,u) - \lambda(t)(u,u) =0. By (8.5), \lambda(t)(u,u) = \lambda(t), and now from (8.9) we get \lambda(t) \equiv \alpha_0.

Consider the case \alpha_0=0. We have \lambda(t)=0, and now (8.10) gives \psi(t) = m(t)c'(x), that is, \psi(t) is proportional to c'(x(t)). Therefore, m(t) = (k(t), \psi(t)) with some vector function k(t). Now (8.6) is the homogeneous equation

\begin{equation*} \dot\psi= (k, \psi)c''(x)u. \end{equation*} \notag
Moreover, \psi(t_1) = m(t_1)\,c'(x_1), and, in view of (8.7), we have \psi(t_1) \in L(x_1). Therefore, \psi(t_1)=0, and so, \psi(t) \equiv 0. Now from (8.7) we find that \beta=0 and all \gamma_i=0, so the collection of multipliers is trivial, a contradiction.

Therefore, \alpha_0>0, and we can normalize the multipliers so that \alpha_0=1. This gives \lambda(t) \equiv 1, and now, by the complementary slackness condition (8.5), we have |u| \equiv 1 (motion with the maximal possible velocity).

So, we have

\begin{equation} \dot\psi = m(t)c''(x)u, \end{equation} \tag{8.11}
\begin{equation} \psi = u + m(t)c'(x). \end{equation} \tag{8.12}
Multiplying the last equation by c'(x), we obtain
\begin{equation} (\psi, c'(x))= m(t)(c'(x),c'(x)). \end{equation} \tag{8.13}

Since c'(x) \ne 0, the function m(t) is Lipschitz-continuous, and hence so is u(t) = \psi(t) - m(t)c'(x). Hence u(t) can be differentiated:

\begin{equation*} \dot u= \dot\psi- \dot mc'(x)- mc''(x)u= - \dot mc'(x), \end{equation*} \notag
that is, \ddot x = - \dot mc'(x).

From (c'(x),\dot x)=0 we have (c'(x),\ddot x)+ (c''(x)\dot x, \dot x)=0, which, by the above, gives \dot m(c'(x),c'(x)) = (c''(x)\dot x, \dot x). Hence

\begin{equation*} \dot m = \frac{(c''(x)\dot x, \dot x)}{(c'(x),c'(x))}. \end{equation*} \notag
Finally, we get the geodesic equation in terms of the trajectory x(t):
\begin{equation} \ddot x= -\frac{(c''(x)\dot x, \dot x)}{(c'(x),c'(x))}c'(x). \end{equation} \tag{8.14}
(Everywhere, the passage from a covector to a vector is by transposition, since we work in the Euclidean space \mathbb{R}^n.)

In the particular cases when the surface S is a plane, a sphere, or a cylinder, equation (8.14) gives, respectively, a rectilinear motion, a motion along a great circle, and a motion along a helix, all with velocity 1.
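As a sanity check of the spherical case, one can integrate (8.14) numerically. The sketch below (our own illustration; the surface, the integrator, and the step size are chosen for demonstration only) takes c(x) = (|x|^2-1)/2, for which c'(x)=x and c''(x)=I, so that (8.14) becomes \ddot x = -(|\dot x|^2/|x|^2)\,x; with |\dot x|=1 this is \ddot x = -x, the great-circle motion.

import numpy as np

def rhs(state):
    x, v = state[:3], state[3:]
    a = -(v @ v) / (x @ x) * x   # equation (8.14) for the unit sphere
    return np.concatenate([v, a])

# start on the equator with a unit tangent velocity
state = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
h, T = 1e-4, 2 * np.pi
for _ in range(int(T / h)):      # classical RK4 steps
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * h * k1)
    k3 = rhs(state + 0.5 * h * k2)
    k4 = rhs(state + h * k3)
    state = state + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

x, v = state[:3], state[3:]
print(np.linalg.norm(x), np.linalg.norm(v))  # both remain ~1
print(x)  # ~ (1, 0, 0): the great circle closes after time 2*pi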

§ 9. Appendix

9.1. Lagrange principle for extremum problems with an infinite number of constraints

Let X, Y and Z_i, i=1,\dots, \nu, be Banach spaces, \mathcal{D}\subset X be an open set, K_i \subset Z_i, i=1,\dots, \nu, be closed convex cones with non-empty interiors. Next, let F_0\colon \mathcal{D}\to \mathbb{R}, g\colon \mathcal{D}\to Y and f_i \colon \mathcal{D}\to Z_i, i=1,\dots, \nu, be given mappings. Consider the extremal problem

\begin{equation} F_0(x)\to \min, \qquad f_i(x) \in K_i, \quad i=1,\dots, \nu, \qquad g(x)=0. \end{equation} \tag{9.1}

This problem covers a majority of theoretical and applied optimization problems, including optimal control problems with state and mixed state-control constraints \Phi(t,x(t)) \leqslant 0 and \varphi(t,x(t),u(t)) \leqslant 0, which can be regarded as inclusions in the cones of non-positive functions in the spaces C and L_\infty, respectively (see [35]); some versions of problem (9.1) are considered in [36] and [37].

Assumptions. 1) The objective function F_0 and the mappings f_i are Fréchet differentiable at some point x_0\in \mathcal{D}; the operator g is strictly differentiable at x_0 (smoothness of the data functions); 2) the image of the derivative g'(x_0) is closed in Y (weak regularity of the equality constraint).

Even though all the mappings in the problem are differentiable, problem (9.1) is not a standard smooth problem, because any constraint f_i(x) \in K_i can be given by an infinite number of smooth scalar inequalities (since the spaces Z_i can be infinite-dimensional).

Theorem 6. Let x_0 be a point of local minimum in problem (9.1). Then there exist multipliers \alpha_0\geqslant 0, z_i^* \in Z^*_i, i=1,\dots,\nu, and y^*\in Y^*, not all zero, such that z_i^* \in K^0_i and \langle z_i^*, f_i(x_0) \rangle =0, i=1,\dots,\nu (that is, every z^*_i is an outer normal to the cone K_i at the point f_i(x_0)), and the Lagrange function \mathcal{L}(x) = \alpha_0 F_0(x) + \sum_{i=1}^\nu \langle z_i^*, f_i(x)\rangle + \langle y^*, g(x)\rangle is stationary at x_0:

\begin{equation} \mathcal{L}'(x_0) = \alpha_0 F_0'(x_0)+ \sum_{i=1}^\nu z_i^* f_i'(x_0)+ y^* g'(x_0) = 0. \end{equation} \tag{9.2}

The last equality is called the Euler–Lagrange equation.

Theorem 6 is a generalization of the classical Lagrange multiplier rule to problems with an infinite number of constraints. The proof follows the Dubovitskii–Milyutin scheme and is based on standard notions and facts from functional analysis, see [7], [35]–[37].
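For instance, in the situation of the present paper: if Z = C(\Delta) and K is the cone of non-positive continuous functions, then K^0 consists of the non-negative Lebesgue–Stieltjes measures, and, for the state-constraint mapping f(x) = \Phi(\,\cdot\,, x(\,\cdot\,)), the conditions z^* \in K^0 and \langle z^*, f(x_0)\rangle =0 take the form
\begin{equation*} d\mu \geqslant 0, \qquad \int_\Delta \Phi(t, x_0(t))\, d\mu(t) = 0, \end{equation*} \notag
that is, the measure d\mu is concentrated on the set where the state constraint is active, in accordance with conditions (i) and (iii) of Theorem 4.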

9.2. Theorem on the absence of singular components [15]

Let D \subset \mathbb{R}^{d(w)} be a compact set, and let p_i\colon D\to \mathbb{R}^r, i\in I, and q_j\colon D\to \mathbb{R}^r, j\in J, be continuous vector functions, where I and J are some finite index sets. Suppose that, for any w\in D, the system of vectors p_i(w), i\in I, q_j(w), j\in J, is positively linearly independent (PLI).

Let also E\subset \mathbb{R} be a set of finite positive measure, and let a measurable function \widehat{w}(t) lie in D almost everywhere on E.

Theorem 7. Let functionals \lambda_i , m_j \in L_\infty^*(E), \lambda_i\geqslant0, and let a function l\in L_1^r(E) be such that, for any test function \overline u(t)\in L_\infty^r(E),

\begin{equation} \sum_{i\in I} \langle \lambda_i, p_i(\widehat{w}(t))\overline u(t)\rangle+ \sum_{j\in J} \langle m_j, q_j(\widehat{w}(t))\overline u(t)\rangle = \int_{E} l(t)\overline u(t)\,dt. \end{equation} \tag{9.3}
Then all \lambda_i, m_j are functions from L_1(E), and each \lambda_i(t)\geqslant0 almost everywhere on E.

Proof. As already noted, for any point w_0\in D, there exists a vector \overline{v}_0 such that p_i(w_0)\overline{v}_0> 1 for all i\in I and q_j(w_0)\overline{v}_0 =0 for all j. By continuity, there exist a neighbourhood \mathcal{O}(w_0) of the point w_0 and a continuous function \overline{v}(w) such that on \mathcal{O}(w_0) we have
\begin{equation} p_i(w)\overline{v}(w)> 1\quad \forall\, i\in I, \qquad q_j(w)\overline{v}(w) =0\quad \forall\, j\in J, \end{equation} \tag{9.4}
and \overline{v}(w_0)= \overline{v}_0. (For example, one can take the projection of the vector \overline{v}_0 onto the common zero subspace of the vectors q_j(w).) By compactness, there is a finite number of neighbourhoods \mathcal{O}(w_s), s=1,\dots,\widetilde s, that cover all of D, and on their union there is a “piecewise continuous” (to be precise, a bounded Borel) function \overline{v}(w) satisfying (9.4) on the whole of D. Hence, for w= \widehat{w}(t), we get a measurable function \overline{v}(\widehat{w}(t)) satisfying, for almost all t\in E,
\begin{equation} p_i(\widehat{w}(t))\overline{v}(\widehat{w}(t))> 1\quad \forall\, i\in I, \qquad q_j(\widehat{w}(t))\overline{v}(\widehat{w}(t)) =0\quad \forall\, j\in J. \end{equation} \tag{9.5}

Suppose now that some functional \lambda_i, say \lambda_1, has a singular component. Thus, \lambda_1 =\lambda_1' +\lambda_1'', where the functional \lambda_1' is absolutely continuous and \lambda_1'' is a singular functional supported on a sequence of measurable sets E_k\subset E with \operatorname{mes}E_k\to 0, k=1, 2, \dots, and such that \|\lambda_1''\| = \gamma >0.

Consider a sequence of functions \overline u_k(t) = \chi_{E_k}(t)\overline{v}(\widehat{w}(t)). For this sequence, in view of (9.5), the second sum in (9.3) vanishes, and hence

\begin{equation*} \sum_i \langle \lambda_i, p_i(\widehat{w}(t))\overline u_k(t) \rangle = \int_{E_k} l(t)\overline u(t)\,dt. \end{equation*} \notag
Since all \lambda_i are non-negative (and hence all \lambda_i'\geqslant 0 and \lambda_i''\geqslant 0), the left-hand side of the last relation is not smaller than
\begin{equation*} \langle \lambda_1'', \chi_{E_k}\rangle= \langle \lambda_1'', \mathbf{1}\rangle= \|\lambda_1''\| = \gamma > 0 \end{equation*} \notag
(where \mathbf{1}(t)\equiv 1), while the right-hand side tends to zero by the absolute continuity of the Lebesgue integral, a contradiction. Therefore, the functionals \lambda_i cannot have singular components; each \lambda_i is regular: \lambda_i\in L_1(E), i\in I.

Now (9.3) assumes the form

\begin{equation} \sum_j \langle m_j,q_j(\widehat{w}(t))\overline u(t)\rangle = \int_{E} l'(t)\overline u(t)\,dt, \end{equation} \tag{9.6}
where l'(t) is some new function from L^r_1(E).

Suppose now that some functional m_j, say m_1, has a singular component, that is, m_1 = m_1' + m_1'', where the functional m_1' is absolutely continuous and m_1'' is a singular functional supported on a sequence of measurable sets E_k\subset E such that \operatorname{mes}E_k\to 0 and \|m_1''\|=\gamma >0. We again consider an arbitrary point w_0\in D. Since the vectors q_j(w_0), j\in J, are linearly independent, there exists a vector \overline{v}_0 such that q_1(w_0)\overline{v}_0 =1 and q_j(w_0)\overline{v}_0 =0 for all j\ne 1. In addition, there exist a neighbourhood \mathcal{O}(w_0) and a continuous function \overline{v}(w) such that on \mathcal{O}(w_0)

\begin{equation} q_1(w)\overline{v}(w) =1, \qquad q_j(w)\overline{v}(w) =0\quad \forall\, j\ne 1, \end{equation} \tag{9.7}
and \overline{v}(w_0)= \overline{v}_0. (One can take the projection of the vector \overline{v}_0 onto the common zero subspace of the vectors q_j(w), j\ne 1, and then normalize it.) By compactness, there exist a finite number of neighbourhoods \mathcal{O}(w_s), s=1,\dots,\widetilde s, that cover D, and there is a bounded Borel function \overline{v}(w) on the union of \mathcal{O}(w_s) satisfying (9.7) on the whole D. Now, for w= \widehat{w}(t), we get a measurable function \overline{v}(\widehat{w}(t)) satisfying, for all t\in E,
\begin{equation} q_1(\widehat{w}(t))\overline{v}(\widehat{w}(t)) =1, \qquad q_j(\widehat{w}(t))\overline{v}(\widehat{w}(t)) =0\quad \forall\, j\ne 1. \end{equation} \tag{9.8}

Let z(t)\in L_\infty (E) be a function such that \langle m_1'', z\rangle =1. Then the function \overline u(t) = z(t)\,\overline{v}(\widehat{w}(t)) satisfies

\begin{equation*} q_1(\widehat{w}(t))\overline u(t) = z(t), \qquad q_j(\widehat{w}(t))\overline u(t) =0 \quad \forall\, j\ne 1, \end{equation*} \notag
and, for the sequence \overline u_k(t) = \chi_{E_k}(t)\overline u(t), we have by (9.6)
\begin{equation*} \langle m_1, q_1(\widehat{w}(t))\overline u_k(t)\rangle = \int_{E_k} l'(t)\overline u(t)\,dt, \end{equation*} \notag
that is,
\begin{equation} \langle m_1'', q_1(\widehat{w}(t))\overline u_k(t)\rangle= -\langle m_1', q_1(\widehat{w}(t))\overline u_k(t) \rangle + \int_{E_k} l'(t) \overline u(t)\,dt. \end{equation} \tag{9.9}
But, for all k,
\begin{equation*} \langle m_1'', q_1(\widehat{w}(t))\,\overline u_k(t)\rangle= \langle m_1'', q_1(\widehat{w}(t))\,\overline u(t)\rangle = \langle m_1'', z \rangle =1, \end{equation*} \notag
and hence the left-hand side of (9.9) is 1 for all k, while the right-hand side tends to zero, and we again have a contradiction. Thus, the functionals m_j also cannot have singular components. Theorem 7 is proved. \Box

The next theorem generalizes the above one to the case where the collection of vectors p_i(w) in a PLI system depends on the point w. Namely, assume that, on a compact set D, we are given, in addition to vector functions p_i and q_j, continuous scalar functions \varphi_i(w)\leqslant 0, i\in I. Let, for any point w\in D, the system of vectors p_i(w), i\in I(w), q_j(w), j\in J, where I(w) = \{i\in I \mid \varphi_i(w) =0\} is the set of active indexes for the point w, be positively linearly independent.

Let again E \subset \mathbb{R} be a measurable set, and let a measurable function \widehat{w}(t) lie in D almost everywhere on E. As above, let \lambda_i, m_j \in L_\infty^*(E), but now each \lambda_i is non-negative and supported on the set M_i^\delta = \{ t\mid \varphi_i(\widehat{w}(t))\geqslant -\delta\} for any \delta>0.

Theorem 8. Let functionals \lambda_i, m_j \in L_\infty^*(E), \lambda_i\geqslant0 and let a function l\in L_1^r(E) be such that (9.3) holds for any test function \overline u(t)\in L_\infty^r(E). Then all \lambda_i and m_j are functions from L_1(E), and so, all \lambda_i(t)\geqslant 0 and \lambda_i(t)\varphi_i(\widehat{w}(t))=0\, almost everywhere on E.

Proof. Consider any index set \Gamma \subset I and define the corresponding compact set D_\Gamma = \{w \in D\mid \varphi_i(w)=0\ \forall\, i\in \Gamma\}. In particular, D_{\varnothing}= D.

For any \delta>0, we also define the wider compact set D_\Gamma^\delta = \{w \in D\mid \varphi_i(w)\geqslant -\delta \ \forall\, i\in \Gamma\}. Obviously, \bigcap_{\delta>0} D_\Gamma^\delta = D_\Gamma, and hence there is \delta>0 such that the vectors p_{i}(w), i\in \Gamma, q_j(w), j \in J, are PLI at any w\in D_\Gamma^\delta. Since the family of all sets \Gamma is finite, there exists \delta>0 common for all of them. Reducing \delta if necessary, we may assume that if D_{\Gamma_1} \cap D_{\Gamma_2} =\varnothing, then D_{\Gamma_1}^\delta \cap D_{\Gamma_2}^\delta =\varnothing. The family of all these compact sets is partially ordered by inclusion: if \Gamma_1\subset \Gamma_2, then D_{\Gamma_1}^\delta\supset D_{\Gamma_2}^\delta, and, for any \Gamma_1, \Gamma_2, we have D_{\Gamma_1\cup \Gamma_2}^\delta = D_{\Gamma_1}^\delta \cap D_{\Gamma_2}^\delta.

From the function \widehat{w}(t), for each \Gamma we define the measurable set M_\Gamma^\delta = \{t\in E\mid \widehat{w}(t) \in D_\Gamma^\delta\}. Let \mathcal{G} be the family of all “essential” sets \Gamma (\Gamma is essential if M_\Gamma^\delta has positive measure). Clearly, \mathcal{G} is also partially ordered by inclusion. Consider any maximal element \Gamma_1 in this family, that is, an element such that, for any other \Gamma \supset \Gamma_1, the set M_\Gamma^\delta is a null set. In other words, \varphi_i(\widehat{w}(t)) \geqslant -\delta on M_{\Gamma_1}^\delta for all i\in \Gamma_1, and \varphi_i(\widehat{w}(t)) <-\delta for the remaining i\notin \Gamma_1 (up to a null set).

Consider equality (9.3) for all \overline u(t) supported on the set M_{\Gamma_1}^\delta. By definition, each functional \lambda_i is supported on its own M_i^\delta, and the maximality of \Gamma_1 implies that, for i\notin \Gamma_1, each such functional vanishes on M_{\Gamma_1}^\delta, whence, in the first sum, only the terms with i\in \Gamma_1 can be retained:

\begin{equation*} \sum_{i\in \Gamma_1} \langle \lambda_i,p_i(\widehat{w}(t))\overline u(t)\rangle+ \sum_{j\in J} \langle m_j,q_j(\widehat{w}(t))\overline u(t) \rangle = \int_{M_{\Gamma_1}^\delta} l(\tau)\overline u(\tau)\,d\tau. \end{equation*} \notag
Now applying Theorem 7 to the collection \lambda_i, i\in \Gamma_1, m_j, j\in J, the compact set D_{\Gamma_1}^\delta, and the set M_{\Gamma_1}^\delta, we find that the restriction to M_{\Gamma_1}^\delta of each functional from this collection is absolutely continuous. So, it remains to consider equality (9.3) on the set E_1 = E \setminus M_{\Gamma_1}^\delta.

For this set, the family \mathcal{G} of essential \Gamma \subset I is smaller (it no longer contains \Gamma_1), and we again proceed as above: consider any maximal element \Gamma_2; for this element, all the “alien” functionals \lambda_i vanish on M_{\Gamma_2}^\delta, while, by Theorem 7, the restrictions to M_{\Gamma_2}^\delta of the “own” \lambda_i, i\in \Gamma_2, and of m_j, j\in J, are absolutely continuous. Hence, we can pass to the set E_2 = E_1 \setminus M_{\Gamma_2}^\delta, and so on. After a finite number of steps, the family \mathcal{G} will consist of a single set \Gamma_N. Hence, by Theorem 7, on the set M_{\Gamma_N}^\delta, each of the functionals \lambda_i, i\in \Gamma_N, and m_j, j\in J, is absolutely continuous, while on the remaining set E_N all \lambda_i vanish, and now another appeal to Theorem 7 shows that all m_j, j\in J, are absolutely continuous. Theorem 8 is proved. \Box

Applying this theorem to (4.3), the scalar functions \varphi_i(w), the vector functions p_i(w) = \varphi'_{iu}(w), q_j(w) = g'_{ju}(w), the set E_+ in Problem \mathrm{B}^\theta, the compact set D = \{w\in \widehat D\mid \varphi(w)\leqslant0,\ g(w)=0\}, where \widehat D is the compact set containing the optimal process, and the function w^\theta(\tau) \in D, we find that all functionals \lambda_i and m_j are functions from L_1(E_+).

9.3. Some properties of functions of bounded variation

On an interval \Delta=[t_0,t_1], consider a linear differential equation with respect to a vector function \psi \in BV(\Delta) (treated as a column):

\begin{equation} d\psi(t)= A(t)\psi(t)\,dt+B(t)\lambda(t)\,dt+ G(t)\,d\mu(t), \qquad \psi(t_0) = \psi_0, \end{equation} \tag{9.10}
where A, B, G are given measurable matrices of corresponding dimensions, A is integrable, B, G are bounded, the function \mu\in BV(\Delta) (that is, the measure d\mu\in C^*(\Delta)), \lambda\in L_1(\Delta), and \psi_0 \in \mathbb{R}^{d(\psi)}.

Assume that the functions \psi\in BV(\Delta) are left-continuous, that is, \psi(t-0)= \psi(t) for t\in (t_0,t_1]; set \psi(t_0-0)= \psi(t_0), and assume also that the value \psi(t_1+0) is defined. Then the measure d\psi and the function \psi are related via

\begin{equation*} \psi(t) = \int_{t_0-0}^{t-0} d\psi, \quad t\in (t_0,t_1], \quad \text{and} \quad \psi(t_1+0) = \psi(t_1) + \Delta\psi(t_1), \end{equation*} \notag
where we also have
\begin{equation*} \|d\psi\|_{C^*}= \int_{t_0-0}^{t_1+0} |d\psi|,\qquad \|\psi\|_{BV} = |\psi(t_0)| + \|d\psi\|_{C^*}\,. \end{equation*} \notag
Note that \|\psi\|_\infty = \max_{[t_0-0,\, t_1+0]} |\psi(t)| \leqslant \|\psi\|_{BV}. If \psi is absolutely continuous, then \|\psi\|_{BV} = \|\psi\|_{AC} = |\psi(t_0)| + \int_{t_0}^{t_1} |\dot\psi(t)|\,dt.
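As a quick check of these formulas, consider the unit-jump function on \Delta=[0,1] (a standard example, not taken from the paper): \psi(t)=0 for t\in[0,s] and \psi(t)=1 for t\in(s,1], where 0<s<1. It is left-continuous, d\psi = \delta_s is the unit atom at s, and
\begin{equation*} \|d\psi\|_{C^*}= \int_{0-0}^{1+0} |d\psi| = 1, \qquad \|\psi\|_{BV} = |\psi(0)| + \|d\psi\|_{C^*} = 1 = \|\psi\|_\infty, \end{equation*} \notag
while \psi(t) = \int_{0-0}^{t-0} d\psi indeed recovers \psi, and \psi(1+0)=\psi(1)+\Delta\psi(1)=1.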

The facts presented in this section are well known; their proofs are given for the convenience of the reader. The following lemma is actually taken from [30].

Lemma 6. For any initial condition \psi(t_0)=\psi_0, equation (9.10) has a unique solution \psi(t), which is continuous at all points of continuity of the measure d\mu and satisfies the estimate

\begin{equation} \|\psi\|_{BV} \leqslant \mathrm{const} \biggl(|\psi_0| + \int_{t_0}^{t_1}|\lambda(t)|\,dt + \int_{t_0-0}^{t_1+0}|d\mu(t)| \biggr). \end{equation} \tag{9.11}

Proof. Consider the function of bounded variation
\begin{equation} \rho(t) = \int_{t_0}^{t-0} \bigl(B(\tau)\lambda(\tau)\,d\tau + G(\tau)\,d\mu(\tau)\bigr), \qquad \rho(t_0)=0. \end{equation} \tag{9.12}
Obviously, this function is continuous at all points of continuity of the measure d\mu and generates the measure d\rho = B \lambda\,dt + G\,d\mu. Hence \|\rho\|_{BV} \leqslant \mathrm{const}(\|\lambda\|_1 + \|d\mu\|_{C^*}), and equation (9.10) now has the form
\begin{equation} d\psi(t)= A(t) \psi(t)\,dt+ d\rho(t), \qquad \psi(t_0) = \psi_0. \end{equation} \tag{9.13}
Let us find its solution in the form \psi = \overline\psi +\rho. We have d\overline\psi = A(\overline\psi + \rho)\, dt, and hence the function \overline\psi is absolutely continuous and satisfies the linear ordinary differential equation
\begin{equation} \dot{\overline\psi}= A(\overline\psi +\rho), \qquad \overline\psi(t_0) = \psi_0. \end{equation} \tag{9.14}
As is well known, it has a unique solution, and moreover,
\begin{equation*} \|\overline\psi\|_{BV}= \|\overline\psi\|_{AC} \leqslant \mathrm{const}(|\psi_0| + \|\rho\|_\infty) \leqslant \mathrm{const}(|\psi_0| + \|\rho\|_{BV}). \end{equation*} \notag
This implies that \psi = \overline\psi +\rho satisfies the required estimate (9.11). \Box
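For illustration, here is a minimal numerical sketch (not part of the paper) of equation (9.10) with hypothetical scalar data: A and G constant, \lambda = 0, and d\mu a unit atom at an interior point. Explicit Euler integrates the drift, the jump G\,\mu(\{t_{\mathrm{jump}}\}) is added on crossing the atom, and the printed quantities let one compare \|\psi\|_{BV} with the right-hand side of (9.11).

```python
# Numerical sketch of d(psi) = A psi dt + G d(mu), d(mu) = unit atom at t_jump
# (hypothetical scalar data; explicit Euler between atoms, jump at the atom).
import numpy as np

A, G, psi0 = -1.0, 2.0, 1.0
t0, t1, t_jump = 0.0, 1.0, 0.5
n = 20_000
t, dt = np.linspace(t0, t1, n, retstep=True)

psi = np.empty(n)
psi[0] = psi0
for k in range(n - 1):
    psi[k + 1] = psi[k] + A * psi[k] * dt      # drift part of (9.10)
    if t[k] < t_jump <= t[k + 1]:              # crossing the atom of d(mu)
        psi[k + 1] += G * 1.0                  # jump G * mu({t_jump})

var_psi = np.abs(np.diff(psi)).sum()           # total variation of psi
bv_norm = abs(psi[0]) + var_psi                # ||psi||_BV
rhs = abs(psi0) + 0.0 + 1.0                    # |psi_0| + ||lambda||_1 + ||d mu||
print(f"||psi||_BV ~ {bv_norm:.3f};  rhs of (9.11) without const: {rhs:.3f}")
# estimate (9.11) then holds with a constant depending only on the norm of A
```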

Lemma 7. Let, as k\to\infty, the functions \lambda^k converge to \lambda^0 weakly in L_1(\Delta) (that is, against test functions from L_\infty(\Delta)), let the measures d\mu^k w^*-converge to d\mu^0 in the space C^*(\Delta), and let the initial conditions \psi^k_0\to \psi^0_0. Then the corresponding solutions \psi^k(t) to equation (9.10) converge to \psi^0(t) at all points of continuity of the limit measure d\mu^0, and hence almost everywhere on \Delta. In addition, \|\psi^k\|_\infty \leqslant\mathrm{const}, \|\psi^k -\psi^0\|_1 \to0, and the measures d\psi^k \overset{\text{w}^*}\to d\psi^0 in C^*(\Delta).

Proof. Let us construct the functions \rho^k, \rho^0 corresponding to the triples (\lambda^k, d\mu^k, \psi^k_0) and (\lambda^0, d\mu^0, \psi^0_0) by formula (9.12). By the assumptions of the lemma, d\rho^k \overset{\text{w}^*}\to d\rho^0 in C^*(\Delta), whence, as is known, \rho^k(t)\to \rho^0(t) at all points of continuity of the limit measure d\rho^0, and, a fortiori, at all points of continuity of the measure d\mu^0. In view of (9.12), \|\rho^k\|_\infty \leqslant \mathrm{const}\, (\|\lambda^k\|_1 + \|d\mu^k\|) \leqslant \mathrm{const}, whence, by the Lebesgue dominated convergence theorem, \|A(t)(\rho^k(t) -\rho^0(t))\|_1 \to 0, and now (9.14) implies that the corresponding \overline\psi{}^{\,k} converge to \overline\psi{}^{\,0} everywhere on \Delta; hence \psi^k(t)\to \psi^0(t) at all points of continuity of the measure d\mu^0. Since \|\psi^k\|_\infty \leqslant \|\psi^k\|_{BV} \leqslant \mathrm{const} by (9.11), we have \|\psi^k -\psi^0\|_1 \to0 and \|A(t)(\psi^k(t) -\psi^0(t))\|_1 \to 0, and now, by (9.13), the measures d\psi^k w^*-converge to d\psi^0 in C^*(\Delta). \Box

Lemma 8 (on the limit of jumps of measures). Let measures d\mu^k \geqslant0 be such that d\mu^k \overset{\text{w}^*}\to d\mu^0 on the closed interval \Delta = [t_0, t_1]. Then, for any point t_* \in \Delta,

\begin{equation} \limsup_k \Delta\mu^k(t_*) \leqslant \Delta\mu^0(t_*). \end{equation} \tag{9.15}
(The inequality can be strict, since in the limit the measures can concentrate additional mass at the given point.)

Proof. We set \Delta\mu^0(t_*) = c\geqslant0. We first suppose that t_* \in \operatorname{int}\Delta. Let \varepsilon>0 and let t'< t_*< t'' be such that \mu^0(t'') -\mu^0(t') < c+\varepsilon. Then, on any smaller interval [\tau_1,\tau_2] \subset [t',t''], we still have \mu^0(\tau_2) -\mu^0(\tau_1)< c+\varepsilon. The w^*-convergence d\mu^k \to d\mu^0 implies that \mu^k(t)\to \mu^0(t) at every point of continuity of \mu^0, and hence almost everywhere on [t',t'']. Now, we take any two points \tau_1\in (t',t_*) and \tau_2\in (t_*, t'') at which
\begin{equation*} \mu^k(\tau_1) \to \mu^0(\tau_1), \qquad \mu^k(\tau_2) \to \mu^0(\tau_2). \end{equation*} \notag
Then, for sufficiently large k,
\begin{equation*} \Delta\mu^k(t_*) \leqslant \mu^k(\tau_2) - \mu^k(\tau_1)= \mu^0(\tau_2) - \mu^0(\tau_1) + o(1)< c+\varepsilon + o(1), \end{equation*} \notag
whence \limsup_k\Delta\mu^k(t_*) \leqslant c+\varepsilon. Since \varepsilon is arbitrary, we obtain the required estimate (9.15). In the case t_* =t_0 or t_* =t_1, the same arguments work with a small modification. The lemma is proved. \Box
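A standard example (not from the paper) shows that the inequality in (9.15) can indeed be strict: on \Delta=[0,1], take t_*\in(0,1) and the unit atoms
\begin{equation*} d\mu^k = \delta_{t_*+1/k} \overset{\text{w}^*}\to \delta_{t_*} = d\mu^0, \qquad \Delta\mu^k(t_*) = 0 \ \text{ for all } k, \qquad \Delta\mu^0(t_*) = 1, \end{equation*} \notag
so that \limsup_k \Delta\mu^k(t_*) = 0 < 1 = \Delta\mu^0(t_*): the mass drifts into the point t_* only in the limit.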

Lemma 9. Let measures d\mu^k \geqslant0 be such that d\mu^k \overset{\text{w}^*}\to d\mu^0 on the closed interval \Delta = [t_0, t_1]. Then, for any closed interval D \subset \Delta,

\begin{equation*} \limsup_k \int_{D} d\mu^k \leqslant \int_{D} d\mu^0. \end{equation*} \notag

The proof proceeds as above and is hence omitted.

Lemma 10. Let measures d\mu^k \geqslant0 be such that d\mu^k \xrightarrow{{w^*}} d\mu^0 and \mu^k(t_0)= \mu^0(t_0)=0, and let functions b^k \in L_1(\Delta) be such that b^k\overset{\text{w}} \to b^0 \in L_1(\Delta). Consider the measures

\begin{equation*} \begin{alignedat}{1} d h^k &= b^k(t)\,dt+ a(t)\,d\mu^k, \\ d h^0 &= b^0(t)\,dt+ a(t)\,d\mu^0, \end{alignedat} \end{equation*} \notag
where a(t) is a continuous function on \Delta, and suppose that h^k(t_0)\to h^0(t_0). Let, at some point t_*\in \operatorname{int} \Delta, for all k,
\begin{equation} h^k(t_*-0)+ \rho^k a(t_*)\Delta\mu^k(t_*) \leqslant 0, \qquad \rho^k \in [0,1]. \end{equation} \tag{9.16}
Then there exists \rho^0 \in [0,1] such that
\begin{equation} h^0(t_*-0)+ \rho^0 a(t_*)\Delta\mu^0(t_*) \leqslant 0. \end{equation} \tag{9.17}

Proof. Consider the case a(t_*)\geqslant0. Since \Delta\mu^k(t_*)\geqslant0, the term \rho^k a(t_*)\Delta\mu^k(t_*) in (9.16) is non-negative, whence h^k(t_*-0)\leqslant 0 for all k. It then suffices to take \rho^0=0 and prove the limit inequality h^0(t_*-0) \leqslant 0.

By the w^*-convergence d\mu^k \to d\mu^0, the norms \|d\mu^k\|_{C^*} are uniformly bounded by some constant M. We fix any \varepsilon>0. By the continuity of a(t) and the weak convergence b^k \to b^0 (which makes the family \{b^k\} uniformly integrable), there exists \delta>0 such that a(t)\geqslant - \varepsilon/M on (t_* -\delta,\, t_*), and, for all k,

\begin{equation*} \int_{t_* -\delta}^{t_*} |b^k(t)|\,dt < \varepsilon. \end{equation*} \notag
Hence, on this interval
\begin{equation*} h^k(t_*-0)- h^k(t) \geqslant -\int_{t}^{t_*} |b^k(\tau)|\,d\tau- \frac{\varepsilon}{M} \int_{[t,\,t_*)} d\mu^k \geqslant\, - 2\varepsilon, \end{equation*} \notag
whence h^k(t) \leqslant h^k(t_*-0) +2\varepsilon. Since h^k(t_*-0) \leqslant 0, we have h^k(t)\leqslant 2\varepsilon on the interval (t_* -\delta,\, t_*). Since \varepsilon and \delta are independent of k, and since h^k(t) \to h^0(t) almost everywhere by the assumptions of the lemma, we get h^0(t)\leqslant 2\varepsilon on the same interval, and hence h^0(t_*-0)\leqslant 2\varepsilon. Consequently, h^0(t_*- 0)\leqslant 0, since \varepsilon>0 is arbitrary.

In the case a(t_*)< 0, inequality (9.16) holds a fortiori with \rho^k=1, which gives h^k(t_*-0) + a(t_*)\Delta\mu^k(t_*) = h^k(t_*+0)\leqslant0 for all k. Making the change t \mapsto \tau = t_0+t_1-t, we arrive at the case already considered, with \widetilde a(\tau_*) = -a(t_*)>0 and the inequality \widetilde h^k(\tau_*-0) \leqslant0; the resulting limit inequality \widetilde h^0(\tau_*-0)\leqslant0 means that h^0(t_*+0)\leqslant0, that is, (9.17) holds with \rho^0=1. \Box
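The following hypothetical example (not from the paper) shows that \rho^0 may indeed have to be strictly positive. On \Delta=[0,1] with t_*=1/2, take a(t)\equiv -1, b^k=b^0\equiv 1, d\mu^k = \delta_{t_*-1/k}, and h^k(0)=h^0(0)=0. Then \Delta\mu^k(t_*)=0 and
\begin{equation*} h^k(t_*-0) = t_* - 1 = -\tfrac12 \leqslant 0, \end{equation*} \notag
so (9.16) holds with \rho^k=0. In the limit, d\mu^0 = \delta_{t_*}, h^0(t_*-0) = t_* = \tfrac12 > 0, and \Delta\mu^0(t_*)=1, so (9.17), that is, \tfrac12 - \rho^0 \leqslant 0, forces \rho^0 \in [\tfrac12,\,1]: the mass that drifts into t_* must be accounted for through the jump term.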

The next lemma is a generalization of the above result to the case of a finite number of measures. Let d\mu^k_j be measures such that

\begin{equation*} d\mu^k_j \geqslant0, \quad d\mu^k_j \xrightarrow{{w^*}} d\mu^0_j \quad \text{as }\; k\to \infty,\quad \mu^k_j(t_0)= \mu^0_j(t_0)=0, \end{equation*} \notag
j=1,\dots, N, and let b^k \in L_1(\Delta) be functions such that b^k \to b^0 \in L_1(\Delta) weakly. Consider the measures
\begin{equation*} \begin{alignedat}{1} d h^k &= b^k(t)\,dt+ a_1(t)\,d\mu^k_1 + \dots + a_N(t)\, d\mu^k_N, \\ d h^0 &= b^0(t)\,dt+ a_1(t)\,d\mu^0_1 + \dots + a_N(t)\,d\mu^0_N, \end{alignedat} \qquad h^k(t_0)\to h^0(t_0), \end{equation*} \notag
where N is an integer and the functions a_j(t)\geqslant 0 are continuous on \Delta.

Clearly, d h^k \xrightarrow{{w^*}} d h^0 and h^k(t)\to h^0(t) almost everywhere on \Delta.

Lemma 11. Let, at some point t_*\in \operatorname{int} \Delta,

\begin{equation} h^k(t_*-0)+ \sum_{j=1}^N \rho^k_j\, a_j(t_*)\Delta\mu^k_j(t_*) \leqslant 0, \qquad \rho^k_j \in [0,1], \end{equation} \tag{9.18}
for all k. Then there exist \rho^0_j \in [0,1] such that
\begin{equation} h^0(t_*-0)+ \sum_{j=1}^N \rho^0_j\, a_j(t_*)\Delta\mu^0_j(t_*) \leqslant 0. \end{equation} \tag{9.19}

Proof. Consider arbitrary \gamma_j, j=1,\dots, N, such that \sum \gamma_j = 1. For any j, we introduce the measures
\begin{equation*} d h^k_j = \gamma_j b^k\,dt + a_j\,d \mu^k_j, \quad h^k_j(t_0)= \gamma_j h^k(t_0), \qquad k = 0,1,2,\dots, \end{equation*} \notag
which define the functions h^k_j(t). We also set \widetilde h^k = \sum_j h^k_j. We have d\widetilde h^k = d h^k, \widetilde h^k(t_0) = h^k(t_0), and hence \widetilde h^k = h^k everywhere on \Delta if they are assumed to be left-continuous. For any j=1,\dots, N, let
\begin{equation*} h^k_j(t_*-0)+ \rho^k_j\, a_j(t_*)\Delta\mu^k_j(t_*)= c^k_j. \end{equation*} \notag
Summing over all j and using (9.18), we have \sum_j c^k_j\leqslant0 for every k. The numbers c^k_j are uniformly bounded, so, passing to a subsequence, we may assume that c^k_j \to c^0_j for each j, where \sum_j c^0_j\leqslant0. An application of Lemma 10 to the functions h^k_j(t) - c^k_j and \widetilde b^k_j(t) = \gamma_jb^k(t), for any j, shows that there exist numbers \rho^0_j\in [0,1] such that
\begin{equation*} h^0_j(t_*-0)+ \rho^0_j\, a_j(t_*)\Delta\mu^0_j(t_*) \leqslant c^0_j. \end{equation*} \notag
Summing these inequalities over all j=1,\dots, N and taking into account that h^0_j = \gamma_j h^0 and \sum_j c^0_j\leqslant0, we arrive at (9.19). \Box

The author is grateful to N. P. Osmolovskii for useful discussions.


Bibliography

1. R. V. Gamkrelidze, “Optimal control processes for bounded phase coordinates”, Izv. Akad. Nauk SSSR Ser. Mat., 24:3 (1960), 315–356 (Russian)
2. M. R. Hestenes, Calculus of variations and optimal control theory, John Wiley & Sons, Inc., New York–London–Sydney, 1966
3. R. F. Hartl, S. P. Sethi, and R. G. Vickson, “A survey of the maximum principles for optimal control problems with state constraints”, SIAM Rev., 37:2 (1995), 181–218
4. A. Dmitruk and I. Samylovskiy, “On the relation between two approaches to necessary optimality conditions in problems with state constraints”, J. Optim. Theory Appl., 173:2 (2017), 391–420
5. A. Ya. Dubovitskii and A. A. Milyutin, “Extremum problems in the presence of restrictions”, Zh. Vychisl. Mat. Mat. Fiz., 5:3 (1965), 395–453; English transl. U.S.S.R. Comput. Math. Math. Phys., 5:3 (1965), 1–80
6. A. A. Milyutin, “Maximum principle in a regular problem of optimal control”, Necessary condition in optimal control, Chaps. 1–5, Nauka, Moscow, 1990 (Russian)
7. A. A. Milyutin, A. V. Dmitruk, and N. P. Osmolovskii, Maximum principle in optimal control, Moscow State Univ., Faculty of Mech. and Math., Moscow, 2004, https://kafedra-opu.ru/node/139 (Russian)
8. A. A. Milyutin, “General schemes of necessary conditions for extrema and problems of optimal control”, Uspekhi Mat. Nauk, 25:5(155) (1970), 110–116; English transl. Russian Math. Surveys, 25:5 (1970), 109–115
9. A. Ya. Dubovitskii and A. A. Milyutin, “Necessary conditions for a weak extremum in optimal control problems with mixed constraints of the inequality type”, Zh. Vychisl. Mat. Mat. Fiz., 8:4 (1968), 725–779; English transl. U.S.S.R. Comput. Math. Math. Phys., 8:4 (1968), 24–98
10. A. Ya. Dubovitskii and A. A. Milyutin, Necessary weak extremum conditions in a general optimal control problem, Nauka, In-t Khim. Fiz. AN SSSR, Moscow, 1971 (Russian)
11. A. Ya. Dubovitskii and A. A. Milyutin, “Maximum principle theory”, Methods of the theory of extremal problems in economics, ed. V. L. Levin, Nauka, Moscow, 1981, 6–47 (Russian)
12. K. Makowski and L. W. Neustadt, “Optimal control problems with mixed control-phase variable equality and inequality constraints”, SIAM J. Control, 12:2 (1974), 184–228
13. A. M. Ter-Krikorov, “Convex programming in a space adjoint to a Banach space and convex optimal control problems with phase constraints”, Zh. Vychisl. Mat. Mat. Fiz., 16:2 (1976), 351–358; English transl. U.S.S.R. Comput. Math. Math. Phys., 16:2 (1976), 68–75
14. A. N. Dyukalov and A. Y. Ilyutovich, “Indicator of optimality in nonlinear control problems with mixed constraints. I”, Avtomat. i Telemekh., 1977, no. 3, 96–106; II, no. 5, 11–20; English transl. Autom. Remote Control, 38:3 (1977), 381–389; 38:5 (1977), 620–628
15. A. V. Dmitruk, “Maximum principle for the general optimal control problem with phase and regular mixed constraints”, Optimality of control dynamical systems, 14, Nauka, Vsesoyuznyi Nauchno Issled. Inst. Sist. Issled., Moscow, 1990, 26–42; English transl. Comput. Math. and Modeling, 4 (1993), 364–377
16. R. V. Gamkrelidze, “Optimal sliding states”, Dokl. Akad. Nauk SSSR, 143:6 (1962), 1243–1245; English transl. Soviet Math. Dokl., 3 (1962), 559–562
17. E. N. Devdariani and Yu. S. Ledyaev, “Maximum principle for implicit control systems”, Appl. Math. Optim., 40:1 (1999), 79–103
18. M. R. de Pinho and J. F. Rosenblueth, “Necessary conditions for constrained problems under Mangasarian–Fromowitz conditions”, SIAM J. Control Optim., 47:1 (2008), 535–552
19. F. Clarke and M. R. de Pinho, “Optimal control problems with mixed constraints”, SIAM J. Control Optim., 48:7 (2010), 4500–4524
20. H. A. Biswas and M. d. R. de Pinho, “A maximum principle for optimal control problems with state and mixed constraints”, ESAIM Control Optim. Calc. Var., 21:4 (2015), 939–957
21. A. Boccia, M. D. R. de Pinho, and R. B. Vinter, “Optimal control problems with mixed and pure state constraints”, SIAM J. Control Optim., 54:6 (2016), 3061–3083
22. An Li and J. J. Ye, “Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints”, Set-Valued Var. Anal., 24:3 (2016), 449–470
23. R. Andreani, V. A. de Oliveira, J. T. Pereira, and G. N. Silva, “A weak maximum principle for optimal control problems with mixed constraints under a constant rank condition”, IMA J. Math. Control Inform., 37:3 (2020), 1021–1047
24. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The mathematical theory of optimal processes, Fizmatgiz, Moscow, 1961; 2nd ed., Nauka, Moscow, 1969; English transl. of the first edition, Intersci. Publ. John Wiley & Sons, Inc., New York–London, 1962
25. A. V. Dmitruk and N. P. Osmolovskii, “On the proof of Pontryagin's maximum principle by means of needle variations”, Fundam. Prikl. Mat., 19:5 (2014), 49–73; English transl. J. Math. Sci. (N.Y.), 218:5 (2016), 581–598
26. G. G. Magaril-Il'yaev, “The Pontryagin maximum principle. Ab ovo usque ad mala”, Optimal control, Collected papers. In commemoration of the 105th anniversary of Academician Lev Semenovich Pontryagin, Trudy Mat. Inst. Steklova, 291, MAIK “Nauka/Interperiodica”, Moscow, 2015, 215–230; English transl. Proc. Steklov Inst. Math., 291 (2015), 203–218
27. A. V. Dmitruk and N. P. Osmolovskii, “Variations of the v-change of time in problems with state constraints”, Trudy Inst. Mat. i Mekh. UrO RAN, 24, no. 1, 2018, 76–92; English transl. Proc. Steklov Inst. Math. (Suppl.), 305, suppl. 1 (2019), S49–S64
28. A. V. Dmitruk and N. P. Osmolovskii, “Proof of the maximum principle for a problem with state constraints by the v-change of time variable”, Discrete Contin. Dyn. Syst. Ser. B, 24:5 (2019), 2189–2204
29. A. V. Dmitruk, “Approximation theorem for a nonlinear control system with sliding modes”, Dynamical systems and optimization, Collected papers. Dedicated to the 70th birthday of Academician Dmitrii Viktorovich Anosov, Trudy Mat. Inst. Steklova, 256, Nauka, MAIK Nauka/Interperiodika, Moscow, 2007, 102–114; English transl. Proc. Steklov Inst. Math., 256 (2007), 92–104
30. A. A. Milyutin, The maximum principle in the general problem of optimal control, Fizmatlit, Moscow, 2001 (Russian)
31. A. V. Dmitruk, “On the development of Pontryagin's maximum principle in the works of A. Ya. Dubovitskii and A. A. Milyutin”, Control Cybernet., 38:4A (2009), 923–957
32. “Necessary extremum conditions (Lagrange principle)”, Optimal control, Chap. 3, eds. V. M. Tikhomirov and N. P. Osmolovskii, MCCME, Moscow, 2008, 89–122 (Russian)
33. A. A. Milyutin and N. P. Osmolovskii, “First order conditions”, Calculus of variations and optimal control, Transl. from the Russian manuscript, Transl. Math. Monogr., 180, Amer. Math. Soc., Providence, RI, 1998
34. A. V. Dmitruk and A. M. Kaganovich, “Maximum principle for optimal control problems with intermediate constraints”, Nonlinear dynamics and control, 6, Fizmatlit, Moscow, 2008, 101–136; English transl. Comput. Math. Model., 22:2 (2011), 180–215
35. A. V. Dmitruk and N. P. Osmolovskii, “Necessary conditions for a weak minimum in optimal control problems with integral equations subject to state and mixed constraints”, SIAM J. Control Optim., 52:6 (2014), 3437–3462
36. A. Dmitruk and N. Osmolovskii, “A general Lagrange multipliers theorem”, Constructive nonsmooth analysis and related topics (dedicated to the memory of V. F. Demyanov), CNSA-2017 (St. Petersburg, 2017), IEEE, 2017, 82–84
37. A. V. Dmitruk and N. P. Osmolovskii, “A general Lagrange multipliers theorem and related questions”, Control systems and mathematical methods in economics, Lecture Notes in Econom. and Math. Systems, 687, Springer, Cham, 2018, 165–194
