Collocation approximation by deep neural ReLU networks for parametric and stochastic PDEs with lognormal inputs
Dinh Dũng Information Technology Institute, Vietnam National University, Hanoi, Vietnam
Abstract:
We find the convergence rates of the collocation approximation by deep ReLU neural networks of solutions to elliptic PDEs with lognormal inputs, parametrized by $\boldsymbol{y}$ in the noncompact set ${\mathbb R}^\infty$. The approximation error is measured in the norm of the Bochner space $L_2({\mathbb R}^\infty, V, \gamma)$, where $\gamma$ is the infinite tensor-product
standard Gaussian probability measure on ${\mathbb R}^\infty$ and $V$ is the energy space. We also obtain similar dimension-independent results in the case when the lognormal inputs are parametrized by ${\mathbb R}^M$ of very large dimension $M$, and the approximation error is measured in the $\sqrt{g_M}$-weighted uniform norm of the Bochner space $L_\infty^{\sqrt{g}}({\mathbb R}^M, V)$, where $g_M$ is the density function of the standard Gaussian probability measure on ${\mathbb R}^M$.
Bibliography: 62 titles.
Keywords:
high-dimensional approximation, collocation approximation, deep ReLU neural networks, parametric elliptic PDEs, lognormal inputs.
Received: 09.05.2022 and 15.12.2022
§ 1. Introduction Partial differential equations (PDEs) with parametric and stochastic inputs are a common model used in science and engineering. The stochastic nature reflects the uncertainty in various parameters present in the physical phenomenon modelled by the equation. A central problem of computational uncertainty quantification is the efficient numerical approximation of solutions to parametric and stochastic PDEs; this problem has attracted great interest and seen significant progress in recent decades. There are too many non-deep-neural-network papers on this topic to mention all of them; we point out just some works [3]–[5], [7]–[12], [14], [15], [24], [36], [61] and [62], which are related directly to our paper. In particular, collocation approximations, which are based on a finite number of particular solvers of parametric and stochastic PDEs, were considered in [8]–[10], [14], [15], [18], [24] and [61].

The universal approximation property of neural networks has been understood in its basic form since the 1980s (see [6], [13], [25] and [37]). In recent years deep neural networks have been developed rapidly in theory and in applications to a wide range of fields owing to their advantages over shallow ones. As their range of applications becomes ever wider, theoretical analysis uncovering the reasons for these significant practical improvements has attracted special attention [2], [20], [44], [56], [57]. In recent years a number of interesting papers have addressed the role of the depth and architecture of deep neural networks in nonadaptive and adaptive approximation of functions having a certain regularity [1], [22], [29], [32], [31], [42], [39], [51], [48], [59], [60]. High-dimensional approximation by deep neural networks was studied in [43], [53], [16] and [19], and its applications to high-dimensional PDEs in [23], [27], [28], [30], [33], [46] and [52]. Most of these papers employed the rectified linear unit (ReLU) as the activation function of deep neural networks, since the ReLU is simple and preferable in many applications. The output of such a deep neural network is a continuous piecewise linear function, which is easily and cheaply computed. The reader can consult the recent survey papers [21] and [47] for various problems and aspects of neural network approximation and for a bibliography.

Recently, a number of papers have been devoted to various problems and methods of deep neural network approximation for parametric and stochastic PDEs, such as dimensionality reduction [58], deep neural network expression rates for generalized polynomial chaos (GPC) expansions of solutions to parametric elliptic PDEs [17], [49], reduced basis methods [38], the problem of learning the discretized parameter-to-solution map in practice [26], Bayesian PDE inversion [33], [34], [45] and so on. Note that, except for [17], all these papers treated parametric and stochastic PDEs with affine inputs on the compact set ${\mathbb I}^\infty:= [-1,1]^\infty$. The authors of [49] proved dimension-independent deep neural network expression rate bounds for the uniform approximation of solutions to parametric elliptic PDEs with affine inputs on ${\mathbb I}^\infty$, based on $n$-term truncations of the nonorthogonal Taylor GPC expansion. The construction of approximating deep neural networks relies on the weighted summability of the Taylor GPC expansion coefficients of the solution, which is derived from its analyticity.
The paper [17] investigated nonadaptive methods of deep ReLU neural network approximation of the solution $u$ to parametric and stochastic elliptic PDEs with lognormal inputs on the noncompact set ${\mathbb R}^\infty$. The approximation error is measured in the norm of the Bochner space $L_2({\mathbb R}^\infty, V, \gamma)$, where $\gamma$ is the tensor-product standard Gaussian probability measure on ${\mathbb R}^\infty$ and $V$ is the energy space. The approximation is based on an $m$-term truncation of the Hermite GPC expansion of $u$. Under a certain assumption on the $\ell_q$-summability ($0<q<\infty$) of the lognormal inputs, it was proved that for every integer $n > 1$ one can construct a nonadaptive compactly supported deep ReLU neural network $\boldsymbol{\phi}_n$ of size ${\leqslant n}$ on ${\mathbb R}^m$ with $m = \mathcal{O}(n/\log n)$, having $m$ outputs, so that the sum obtained by replacing the Hermite polynomials in the $m$-term truncation by these $m$ outputs approximates $u$ with error bound $\mathcal O((n/\log n )^{-1/q})$. The authors of [17] also obtained some results on similar problems for parametric and stochastic elliptic PDEs with affine inputs, based on the Jacobi and Taylor GPC expansions.

In the present paper we are interested in constructing deep ReLU neural networks for the collocation approximation of the solution to parametric elliptic PDEs with lognormal inputs. We study the convergence rate of this approximation in terms of the size of the deep ReLU neural networks.

Let $D \subset \mathbb{R}^d$ be a bounded Lipschitz domain. Consider the diffusion elliptic equation
$$
\begin{equation}
- \operatorname{div} (a\nabla u)=f \quad \text{in } D, \qquad u|_{\partial D}= 0,
\end{equation}
\tag{1.1}
$$
for a prescribed right-hand side $f$ and diffusion coefficient $a$ as functions on $D$. Denote by $V:= H^1_0(D)$ the so-called energy space, that is, the closure in the Sobolev space $H^1(D)$ of the set of smooth functions with compact support in $D$. Let $H^{-1}(D)$ be the dual space of $V$. Assume that $f \in H^{-1}(D)$ (in what follows this preliminary assumption always holds without mention). If $a \in L_\infty(D)$ satisfies the ellipticity assumption
$$
\begin{equation*}
0<a_{\min} \leqslant a \leqslant a_{\max}<\infty,
\end{equation*}
\notag
$$
then by the well-known Lax-Milgram lemma there exists a unique solution $u \in V$ to the (non-parametric) equation (1.1) in the weak form
$$
\begin{equation*}
\int_{D} a\nabla u \cdot \nabla v \, \mathrm d \boldsymbol{x}=\langle f , v \rangle \quad \forall\, v \in V.
\end{equation*}
\notag
$$
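Throughout the paper a 'particular solver' refers to computing the weak solution above for one given coefficient $a$ and right-hand side $f$. Purely for orientation, the following minimal sketch does this in the simplest setting $D=(0,1)$ with a piecewise linear finite element discretization; the function names, the discretization and the data are illustrative assumptions and are not part of the paper's analysis.

```python
import numpy as np

def solve_diffusion_1d(a, f, n_cells=200):
    """Piecewise linear finite element solution of -(a u')' = f on D = (0, 1)
    with u(0) = u(1) = 0, i.e. the weak form above with V = H^1_0(0, 1).
    `a` and `f` are callables; the coefficient is frozen at cell midpoints."""
    x = np.linspace(0.0, 1.0, n_cells + 1)
    h = x[1] - x[0]
    mid = 0.5 * (x[:-1] + x[1:])
    a_mid, f_mid = a(mid), f(mid)
    # tridiagonal stiffness matrix and load vector for the interior nodes
    main = (a_mid[:-1] + a_mid[1:]) / h
    off = -a_mid[1:-1] / h
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    b = 0.5 * h * (f_mid[:-1] + f_mid[1:])
    u_inner = np.linalg.solve(A, b)
    return x, np.concatenate(([0.0], u_inner, [0.0]))

# constant data: the exact solution is u(x) = x(1 - x)/2
x, u = solve_diffusion_1d(lambda t: np.ones_like(t), lambda t: np.ones_like(t))
print(np.max(np.abs(u - 0.5 * x * (1.0 - x))))   # error at the nodes (tiny here)
```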
For equation (1.1) we consider diffusion coefficients having a parametrized form $a=a(\boldsymbol{y})$, where $\boldsymbol{y}=(y_j)_{j \in \mathbb{N}}$ is a sequence of real-valued parameters ranging in the set ${\mathbb R}^\infty$. Denote by $u(\boldsymbol{y})$ the solution to the parametrized diffusion elliptic equation
$$
\begin{equation}
- \operatorname{div} (a(\boldsymbol{y})\nabla u(\boldsymbol{y}))=f \quad\text{in } D, \qquad u(\boldsymbol{y})|_{\partial D}= 0.
\end{equation}
\tag{1.2}
$$
The resulting solution operator maps $\boldsymbol{y}\in {\mathbb R}^\infty$ to $u(\boldsymbol{y})\in V$. The goal is to achieve numerical approximation of this complex map by a small number of parameters with a guaranteed error in a given norm. Depending on the nature of the object modelled, the parameter $\boldsymbol{y}$ can be either deterministic or random. In our paper we consider the so-called lognormal case when the diffusion coefficient $a$ is of the form
$$
\begin{equation}
a(\boldsymbol{y})=\exp(b(\boldsymbol{y}))
\end{equation}
\tag{1.3}
$$
with $ b(\boldsymbol{y})$ in the infinite-dimensional form:
$$
\begin{equation}
b(\boldsymbol{y})=\sum_{j=1}^\infty y_j\psi_j, \qquad \boldsymbol{y} \in {\mathbb R}^\infty,
\end{equation}
\tag{1.4}
$$
where the $y_j$ are independent and identically distributed standard Gaussian random variables and $\psi_j \in L_\infty(D)$. We also consider the finite-dimensional form when
$$
\begin{equation}
b(\boldsymbol{y})=\sum_{j=1}^M y_j\psi_j, \qquad \boldsymbol{y} \in {\mathbb R}^M,
\end{equation}
\tag{1.5}
$$
for a finite but very large dimension $M$. Notice that for each fixed $\boldsymbol{y}$, in both the cases (1.4) and (1.5) the coefficient $a(\boldsymbol{y})$ satisfies the ellipticity assumption, and therefore there exists a unique solution $u(\boldsymbol{y}) \in V$ to equation (1.2) in the weak form. However, there is no uniform ellipticity with respect to $\boldsymbol{y}$, since ${\mathbb R}^\infty$ and ${\mathbb R}^M$ are not compact sets.

We describe briefly the main results of our paper. We investigate nonadaptive collocation methods for high-dimensional deep ReLU neural network approximation of the solution $u(\boldsymbol{y})$ to parametrized diffusion elliptic PDEs (1.2) with lognormal inputs (1.3) in the infinite-dimensional case (1.4) and the finite-dimensional case (1.5). In the infinite-dimensional case (1.4) the approximation error is measured in the norm of the Bochner space $ L_2({\mathbb R}^\infty, V, \gamma)$, where $\gamma$ is the infinite tensor-product standard Gaussian probability measure on ${\mathbb R}^\infty$. Assume that there exists an increasing sequence $\boldsymbol{\rho}= (\rho_{j})_{j \in \mathbb N}$ of positive numbers strictly larger than $1$ such that for some $0<q<2$,
$$
\begin{equation*}
\biggl\|\sum _{j \in \mathbb N} \rho_j |\psi_j| \biggr\| _{L_\infty(D)} <\infty \quad\text{and}\quad \boldsymbol{\rho}^{-1}=(\rho_{j}^{-1}) _{j \in \mathbb N}\in {\ell_q}(\mathbb N).
\end{equation*}
\notag
$$
Then, given an arbitrary number $\delta$ such that $0<\delta < \min (1, 1/q -1/2)$, for every integer $n > 1$ we can construct a deep ReLU neural network $\boldsymbol{\phi}_n:= (\phi_j)_{j=1}^m$ on ${\mathbb R}^m$ with $m=\mathcal O (n^{1-\delta})$, of size at most $n$, and a sequence of points ${Y_n:=(\boldsymbol{y}^j)_{j=1}^m \subset {\mathbb R}^m}$ so that
- (i) the deep ReLU neural network $\boldsymbol{\phi}_n$ and the sequence of points $Y_n$ are independent of $u$;
- (ii) the output dimension of $\boldsymbol{\phi}_n$ is $m=\mathcal O (n^{1-\delta})$;
- (iii) the depth of $\boldsymbol{\phi}_n$ is $\mathcal{O}(n^\delta)$;
- (iv) the components $\phi_j$, $j = 1,\dots,m$, of $\boldsymbol{\phi}_n$ are deep ReLU neural networks on $\mathbb{R}^{m_j}$ for $m_j = \mathcal{O}(n^\delta)$, having support in the super-cube $[-T,T]^{m_j}$, where ${T=\mathcal O (n^{1-\delta})}$;
- (v) if $\Phi_j$ is the extension of $\phi_j$ to the whole of ${\mathbb R}^\infty$ by $\Phi_j(\boldsymbol{y})=\phi_j\bigl((y_i)_{i=1}^{m_j}\bigr)$ for ${\boldsymbol{y}=(y_i)_{i\in \mathbb N} \in {\mathbb R}^\infty}$, then the collocation approximation of $u$ by the function
$$
\begin{equation*}
\Phi_n u:=\sum_{j=1}^m u(\boldsymbol{y}^j) \Phi_j,
\end{equation*}
\notag
$$
which is based on the $m$ solvers $(u(\boldsymbol{y}^j))_{j=1}^m$ and the deep ReLU network $\boldsymbol{\phi}_n$, gives the double error estimates
$$
\begin{equation}
\|u- \Phi_n u\|_{L_2({\mathbb R}^\infty, V, \gamma)} =\mathcal O(m^{-(1/q-1/2)}) =\mathcal O(n^{-(1-\delta)(1/q-1/2)}).
\end{equation}
\tag{1.6}
$$
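To fix ideas, the following small sketch draws one sample of the lognormal input model (1.3), (1.5) on $D=(0,1)$; the decaying functions $\psi_j$, the dimension $M$ and all variable names are illustrative assumptions only. It also illustrates the remark above that each realization of $a(\boldsymbol{y})$ is bounded away from $0$ and $\infty$ on $D$, while no such bound holds uniformly in $\boldsymbol{y}$.

```python
import numpy as np

# One random draw of the lognormal coefficient (1.3), (1.5) on D = (0, 1).
# The decaying functions psi_j and the dimension M are illustrative assumptions.
rng = np.random.default_rng(1)
M = 50                                    # truncation dimension in (1.5)
x = np.linspace(0.0, 1.0, 501)            # spatial grid on D
j = np.arange(1, M + 1)
psi = j[:, None] ** -2.0 * np.sin(np.pi * j[:, None] * x[None, :])   # psi_j(x)

y = rng.standard_normal(M)                # i.i.d. standard Gaussian parameters y_j
b = y @ psi                               # b(y)(x) = sum_j y_j psi_j(x)
a = np.exp(b)                             # a(y) = exp(b(y))

# For this fixed y the coefficient is bounded away from 0 and infinity on D,
# so (1.2) is uniquely solvable, but the bounds below vary from draw to draw:
print(a.min(), a.max())
```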
We also obtain results similar to properties (i)–(v) in the finite-dimensional case (1.5), with the approximation error measured in the $\sqrt{g_M}$-weighted uniform norm of the Bochner space $L_\infty^{\sqrt{g}}({\mathbb R}^M, V)$, where $g_M$ is the density function of the standard Gaussian probability measure on ${\mathbb R}^M$. These results are derived from results on deep ReLU neural network collocation approximation of functions in Bochner spaces related to a general separable Hilbert space and standard Gaussian probability measures, which are based on weighted $\ell_2$-summabilities of the Hermite GPC expansion coefficients of functions (see § 3 for details).

Notice that the error bound in $m$ in (1.6) is the same as the error bound for the collocation approximation of $u$ by the sparse-grid Lagrange GPC interpolation based on the same $m$ particular solvers $(u(\boldsymbol{y}^j))_{j=1}^m$, which is the best known result so far (see [15], Corollary 3.1). Moreover, the convergence rate $(1-\delta)(1/q - 1/2)$, with arbitrarily small $\delta > 0$, in terms of the size of the deep ReLU network in collocation approximation is comparable with the convergence rate $1/q - 1/2$ with respect to the number of particular solvers in collocation approximation by sparse-grid Lagrange GPC interpolation. This is a crucial difference between the results in our paper and in [17], where the convergence rate was found for the deep ReLU network approximation of solutions to parametrized diffusion elliptic PDEs (1.2) with lognormal inputs (1.3) based on different input information, namely, the coefficients of finite truncations of the Hermite GPC expansion. Although that convergence rate is sharper than the one in (1.6), it is well known that, in general, collocation approximations are more important, more difficult and more widely applicable than approximations using spectral information about the coefficients of an orthonormal expansion. The extension of the results (i)–(v) to the Bochner space $L_\infty^{\sqrt{g}}({\mathbb R}^M, V)$ is another important difference between our contribution and [17].

We would like to emphasize that the motivation of this paper is to establish approximation results which demonstrate the capabilities of nonadaptive collocation approximation by deep ReLU neural networks and the corresponding convergence rates for the parametrized diffusion elliptic equation (1.2) with lognormal inputs; we do not consider the numerical aspects of this problem. The results themselves do not give a practically realizable approximation, because they do not cover the approximation of the coefficients $u(\boldsymbol{y}^j)$, which are particular solvers, that is, solutions of (1.2) at particular points $\boldsymbol{y}^j$ and themselves functions of the space variables. Moreover, the approximant $\Phi_n u$ is not a genuine deep ReLU network, but rather a combination of these particular solvers and components of a deep ReLU network. It would be interesting to investigate the problem of full deep ReLU neural network approximation of the solution $u$ to parametric and stochastic elliptic PDEs combining the spatial and parametric domains, based on the fully discrete approximation in [3] and [15]. This problem will be discussed in a forthcoming paper.

The paper is organized as follows. In § 2 we present the necessary background on deep ReLU neural networks.
Section 3 is devoted to collocation methods of deep ReLU neural network approximation of functions in the Bochner spaces $L_2({\mathbb R}^\infty, X, \gamma)$ or $L_2({\mathbb R}^M,X,\gamma)$ related to a separable Hilbert space $X$ and the tensor-product standard Gaussian probability measure $\gamma$. In § 4 we apply the results of § 3 to the collocation approximation by deep ReLU neural networks of the solution $u$ to the parametrized elliptic PDEs (1.2) with lognormal inputs (1.3) in the infinite-dimensional case (1.4) and the finite-dimensional case (1.5).

Notation. As usual, $\mathbb{N}$ denotes the natural numbers, $\mathbb{Z}$ the integers, $\mathbb{R}$ the real numbers and $\mathbb{N}_0:= \{s\in \mathbb{Z}\colon s\geqslant0\}$. We denote by $\mathbb{R}^\infty$ the set of all sequences $\boldsymbol{y} = (y_j)_{j\in \mathbb{N}}$ with $y_j\in \mathbb{R}$. For a set $G$, we denote by $|G|$ the cardinality of $G$. If $\boldsymbol{a}= (a_j)_{j \in \mathcal{J}}$ is a sequence of positive numbers with any index set $\mathcal{J}$, then we use the notation $\boldsymbol{a}^{-1}:= (a_j^{-1})_{j \in \mathcal{J}}$. We use the letters $C$ and $K$ to denote general positive constants which can take different values, and we use $C_{\alpha,\beta,\dots}$ and $K_{\alpha,\beta,\dots}$ when we want to emphasize the dependence of these constants on $\alpha,\beta,\dots$, or when this dependence is important in a particular situation. For the convenience of the reader we list some specific notation and definitions which are widely used in the present paper and indicate where they are introduced. Section 2: the symbols $W(\Phi)$, $L(\Phi)$ and $\operatorname{supp}(\Phi)$ denote the size, depth and support of the deep ReLU neural network $\Phi$, respectively; $\sigma(t):= \max\{t,0\}$ is the ReLU activation function. Subsection 3.1: $\mathbb{F}$ denotes the set of all sequences of nonnegative integers ${\boldsymbol{s}=(s_j)_{j \in \mathbb{N}}}$ such that their support $\operatorname{supp} (\boldsymbol{s}):= \{j \in \mathbb{N}\colon s_j >0\}$ is a finite set. The letter $J$ denotes either $\infty$ or $M \in \mathbb{N}$; the set $U$ is defined in (3.2), the set $\mathcal{F}$ in (3.6), and the set $\mathcal{N}$ in (3.7); $\gamma$ and $\gamma_M$ are the standard Gaussian measures on ${\mathbb R}^\infty$ and ${\mathbb R}^M$, respectively. For $\boldsymbol{s} \in \mathbb{F}$ put $|\boldsymbol{s}|_1:= \sum_{j \in \mathbb{N}} s_j$ and $|\boldsymbol{s}|_0:= |\operatorname{supp} (\boldsymbol{s})|$. For $\boldsymbol{s}, \boldsymbol{s}' \in \mathcal{F}$ the inequality $\boldsymbol{s}' \leqslant \boldsymbol{s}$ means that $s_j' \leqslant s_j$ for $j \in \mathcal{N}$. A set $\boldsymbol{\sigma}=(\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}}$ with $\sigma_{\boldsymbol{s}} \in \mathbb{R}$ is called increasing if $\sigma_{\boldsymbol{s}'} \leqslant \sigma_{\boldsymbol{s}}$ for $\boldsymbol{s}' \leqslant \boldsymbol{s}$. The Bochner space $\mathcal{L}(U,X)$ is defined in (3.5); the Bochner spaces $L_2(U,X,\gamma)$ and $L_{\infty}^{\sqrt{g}}({\mathbb R}^M,X)$ are given by (3.3) and (3.4), respectively. In (3.9), $H_{\boldsymbol{s}}$ is defined as the $\boldsymbol{s}$th Hermite orthonormal polynomial and $v_{\boldsymbol{s}}$ as the $\boldsymbol{s}$th coefficient of the Hermite GPC expansion of $v$.
Subsection 3.2: $Y_m = (y_{m;k})_{k \in \pi_m}$ is the increasing sequence of the $m+1$ roots of the Hermite polynomial $H_{m+1}$; $I_m$ is the Lagrange interpolation operator defined by (3.12); $\lambda_m$ is the Lebesgue constant defined by (3.13); $\Delta_{\boldsymbol{s}}$ is the tensor product operator defined by (3.16); $I_\Lambda$ is the GPC interpolation operator defined by (3.18); the set $\boldsymbol{p}(\theta, \lambda):=(p_{\boldsymbol{s}}(\theta, \lambda))_{\boldsymbol{s} \in \mathcal F}$ is defined by (3.20); the set $\Lambda(\xi)$ is defined by (3.21) and the set $G(\xi)$ by (3.23).
§ 2. Deep ReLU neural networks In this section we present some auxiliary knowledge on deep ReLU neural networks, which will be used as a tool of approximation. As in [59], we use deep feed-forward neural networks that allow connections of neurons in a layer with neurons in any of the preceding layers (but not in the same layer). The ReLU activation function is defined by $\sigma(t):= \max\{t,0\}$, $t\in \mathbb{R}$. We set $\sigma(\boldsymbol{x}):= (\sigma(x_1),\dots, \sigma(x_d))$ for $\boldsymbol{x}=(x_1,\dots,x_d) \in {\mathbb R}^d$. Let us recall a standard definition of a deep ReLU neural network and some relevant terminology. Let $d,L\in \mathbb{N}$, $L\geqslant 2$, $N_0=d$, and $N_1,\dots,N_{L}\in \mathbb{N}$. Let $\boldsymbol{W}^\ell=(w^\ell_{i,j})\in \mathbb R^{N_\ell\times (\sum_{i=0}^{\ell-1}N_i)}$, $\ell=1,\dots,L$, be an $N_\ell\times (\sum_{i=0}^{\ell-1}N_i)$ matrix, and let $\boldsymbol{b}^\ell =(b^\ell_j)\in \mathbb{R}^{N_\ell}$. A ReLU neural network $\Phi$ (on ${\mathbb R}^d$) with input dimension $d$, output dimension $N_L$ and $L$ layers is a sequence of matrix-vector tuples
$$
\begin{equation*}
\Phi=\bigl((\boldsymbol{W}^1,\boldsymbol{b}^1),\dots,(\boldsymbol{W}^L,\boldsymbol{b}^L)\bigr),
\end{equation*}
\notag
$$
in which the following computation scheme is implemented:
$$
\begin{equation*}
\begin{aligned} \, \boldsymbol{z}^0&:=\boldsymbol{x} \in \mathbb R^d, \\ \boldsymbol{z}^\ell &:= \sigma(\boldsymbol{W}^{\ell}(\boldsymbol{z}^0,\dots,\boldsymbol{z}^{\ell-1})^{\mathrm T}+\boldsymbol{b}^\ell), \qquad\ell=1,\dots,L-1, \\ \boldsymbol{z}^L&:=\boldsymbol{W}^L(\boldsymbol{z}^0,\dots, \boldsymbol{z}^{L-1})^{\mathrm T}+\boldsymbol{b}^L. \end{aligned}
\end{equation*}
\notag
$$
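As an aside, this computation scheme is easy to realize directly; the following minimal numpy sketch evaluates such a network, with each layer acting on the concatenation of all preceding layers. The layer sizes and random weights are illustrative assumptions.

```python
import numpy as np

def relu_forward(weights, biases, x):
    """Evaluate Phi = ((W^1, b^1), ..., (W^L, b^L)) at x following the scheme
    above: every W^ell acts on the concatenation (z^0, ..., z^{ell-1})."""
    z = [np.asarray(x, dtype=float)]                  # z^0 = x
    L = len(weights)
    for ell in range(L):
        pre = weights[ell] @ np.concatenate(z) + biases[ell]
        z.append(np.maximum(pre, 0.0) if ell < L - 1 else pre)   # no ReLU in layer L
    return z[-1]                                      # Phi(x) = z^L

# illustrative sizes: d = N_0 = 3, N_1 = 4, N_2 = 2, so L = 2
rng = np.random.default_rng(0)
N = [3, 4, 2]
weights = [rng.standard_normal((N[ell + 1], sum(N[:ell + 1]))) for ell in range(2)]
biases = [rng.standard_normal(N[ell + 1]) for ell in range(2)]
print(relu_forward(weights, biases, np.array([0.5, -1.0, 2.0])))
```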
We call $\boldsymbol{z}^0$ the input and, with a certain ambiguity, we use the notation $\Phi(\boldsymbol{x}):= \boldsymbol{z}^L$ for the output of $\Phi$, which is an $N_L$-dimensional vector function on ${\mathbb R}^d$. In some places we identify a ReLU neural network with its output. We adopt the following terminology: $W(\Phi)$, $L(\Phi)$ and $\operatorname{supp}(\Phi)$ denote the size, the depth and the support of the deep ReLU neural network $\Phi$, respectively. There are two basic operations that neural networks allow: the parallelization of several neural networks and the concatenation of two neural networks. The reader can find detailed descriptions of these operations, as well as the following two lemmas about them, for instance, in [32] (see also [21] and [47]). Lemma 2.1 (parallelization). Let $N\in \mathbb{N}$ and $\lambda_j\in \mathbb{R}$, $j=1,\dots,N$. Let $\Phi_j$, $j=1,\dots,N$, be deep ReLU neural networks with input dimension $d$. Then we can explicitly construct a deep ReLU neural network denoted by $\Phi$ so that
$$
\begin{equation*}
\Phi(\boldsymbol{x})=\sum_{j=1}^N\lambda_j \Phi_j(\boldsymbol{x}), \qquad \boldsymbol{x}\in \mathbb R^d.
\end{equation*}
\notag
$$
Moreover, we have
$$
\begin{equation*}
W(\Phi) \leqslant \sum_{j=1}^N W(\Phi_j) \quad\textit{and}\quad L(\Phi)=\max_{j=1,\dots,N} L(\Phi_j).
\end{equation*}
\notag
$$
The deep ReLU neural network $\Phi$ is called the parallelization of $\Phi_j$, $j=1,\dots,N$. Lemma 2.2 (concatenation). Let $\Phi_1$ and $\Phi_2$ be two ReLU neural networks such that the output layer of $\Phi_1$ has the same dimension as the input layer of $\Phi_2$. Then we can explicitly construct a ReLU neural network $\Phi$ such that $\Phi(\boldsymbol{x})=\Phi_2(\Phi_1(\boldsymbol{x}))$ for $\boldsymbol{x}\in \mathbb{R}^d$. Moreover we have
$$
\begin{equation*}
W(\Phi)\leqslant 2W(\Phi_1)+2W(\Phi_2) \quad\textit{and}\quad L(\Phi)=L(\Phi_1)+L(\Phi_2).
\end{equation*}
\notag
$$
The deep ReLU neural network $\Phi$ is called the concatenation of $\Phi_1$ and $\Phi_2$. The following lemma is a direct consequence of Proposition 3.3 in [49]. Lemma 2.3. Let $\boldsymbol{\ell} \in {\mathbb N}^d$. For every $\delta \in (0,1)$ we can explicitly construct a deep ReLU neural network $\Phi_P$ on ${\mathbb R}^d$ so that
$$
\begin{equation*}
\sup_{ \boldsymbol{x} \in [-1,1]^d} \biggl|\prod_{j=1}^d x_j^{\ell_j}-\Phi_P(\boldsymbol{x}) \biggr| \leqslant \delta.
\end{equation*}
\notag
$$
Furthermore, if $x_j=0$ for some $j\in \{1,\dots,d\}$ then $\Phi_P(\boldsymbol{x})=0$ and there exists a positive constant $C$ independent of $\delta$, $d$ and $\boldsymbol{\ell}$ such that
$$
\begin{equation*}
W(\Phi_P) \leqslant C |\boldsymbol{\ell}|_1\log (|\boldsymbol{\ell}|_1\delta^{-1}) \quad\textit{and}\quad L(\Phi_P) \leqslant C\log |\boldsymbol{\ell}|_1\log(|\boldsymbol{\ell}|_1\delta^{-1}).
\end{equation*}
\notag
$$
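As a small illustration of how these size and depth bounds combine, the following sketch simply tallies the bounds of Lemmas 2.1–2.3 for pairs $(W,L)$; it does not construct any networks, and the unspecified constant $C$ of Lemma 2.3 is set to $1$ purely as an assumption for this illustration.

```python
import math

# Arithmetic bookkeeping of the (size, depth) bounds of Lemmas 2.1-2.3;
# no networks are built, and the constant C of Lemma 2.3 is an assumed value.

def parallelization(nets):
    # Lemma 2.1: W(Phi) <= sum_j W(Phi_j), L(Phi) = max_j L(Phi_j)
    return sum(W for W, _ in nets), max(L for _, L in nets)

def concatenation(net1, net2):
    # Lemma 2.2: W(Phi) <= 2 W(Phi_1) + 2 W(Phi_2), L(Phi) = L(Phi_1) + L(Phi_2)
    return 2 * net1[0] + 2 * net2[0], net1[1] + net2[1]

def product_network(ell, delta, C=1.0):
    # Lemma 2.3: bounds for Phi_P approximating the monomial x^ell within delta
    t = sum(ell)                                      # |ell|_1
    return C * t * math.log(t / delta), C * math.log(t) * math.log(t / delta)

# e.g. three univariate subnetworks of size 10 and depth 2, fed into Phi_P:
parallel = parallelization([(10, 2)] * 3)
print(concatenation(parallel, product_network([2, 1, 3], delta=1e-3)))
```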
For $j=0,1$ let $\varphi_j$ be the continuous piecewise linear functions with break points $\{-2,-1,1,2\}$ and $\operatorname{supp}(\varphi_j) \subset [-2,2]$ such that $\varphi_0(x)=1$ and $\varphi_1(x)=x$ if $x\in [-1,1]$. Lemma 2.4. Let $\boldsymbol{\ell} \in {\mathbb N}^d$, and let $\varphi$ be either $\varphi_0$ or $\varphi_1$. Then for every $\delta \in (0,1)$ we can explicitly construct a deep ReLU neural network $\Phi$ on ${\mathbb R}^d$ so that
$$
\begin{equation*}
\sup_{ \boldsymbol{x} \in [-2,2]^d} \biggl|\prod_{j=1}^d\varphi^{\ell_j}(x_j)- \Phi(\boldsymbol{x}) \biggr| \leqslant \delta.
\end{equation*}
\notag
$$
Furthermore, $\operatorname{supp}(\Phi)\subset [-2,2]^d$ and there exists a positive constant $C$ independent of $\delta$, $d$ and $\boldsymbol{\ell}$ such that
$$
\begin{equation}
W(\Phi) \leqslant C\bigl(1+ |\boldsymbol{\ell}|_1\log (|\boldsymbol{\ell}|_1\delta^{-1}) \bigr) \quad\textit{and}\quad L(\Phi) \leqslant C\bigl(1+\log |\boldsymbol{\ell}|_1\log(|\boldsymbol{\ell}|_1\delta^{-1})\bigr).
\end{equation}
\tag{2.1}
$$
Proof. Notice that the explicit forms of $\varphi_j$ in terms of the ReLU activation function are
$$
\begin{equation*}
\varphi_0(x) =\sigma(x+2)-\sigma(x+1)-\sigma(x-1)+\sigma(x-2)
\end{equation*}
\notag
$$
and
$$
\begin{equation*}
\varphi_1(x) =\sigma(x-2)-2\sigma(x-1)+2\sigma(x+1) -\sigma(x+2).
\end{equation*}
\notag
$$
This yields that the $\varphi_j $ can be realized exactly by a shallow ReLU neural network (still denoted by $\varphi_j$) of size $W(\varphi_0)\leqslant 10$ and $W(\varphi_1)\leqslant 8$. The network $\Phi$ can be constructed as a concatenation of the deep ReLU neural networks $\{\varphi(x_j)\}_{j=1}^d$ and $\Phi_P$. By the definitions of a deep ReLU neural network and the function $\varphi$ we have
$$
\begin{equation*}
\boldsymbol{z}^1=(\varphi(x_j))_{j=1}^d \in [-1,1]^d.
\end{equation*}
\notag
$$
Hence estimates (2.1) follow directly from Lemmas 2.2 and 2.3. The lemma is proved.
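The ReLU representations in the proof above are easy to verify numerically; the following sketch compares them with the piecewise linear definitions of $\varphi_0$ and $\varphi_1$ on a grid. The grid and the closed-form piecewise expressions are assumptions made only for this check.

```python
import numpy as np

sigma = lambda t: np.maximum(t, 0.0)      # the ReLU activation function

# ReLU combinations from the proof of Lemma 2.4
phi0_relu = lambda x: sigma(x + 2) - sigma(x + 1) - sigma(x - 1) + sigma(x - 2)
phi1_relu = lambda x: sigma(x - 2) - 2 * sigma(x - 1) + 2 * sigma(x + 1) - sigma(x + 2)

# piecewise linear definitions: break points {-2,-1,1,2}, support in [-2,2],
# phi_0(x) = 1 and phi_1(x) = x on [-1,1]
def phi0(x):
    return np.clip(np.minimum(x + 2.0, 2.0 - x), 0.0, 1.0)

def phi1(x):
    return np.sign(x) * np.minimum(np.abs(x), phi0(x))

x = np.linspace(-3.0, 3.0, 2001)
print(np.max(np.abs(phi0_relu(x) - phi0(x))),
      np.max(np.abs(phi1_relu(x) - phi1(x))))        # both differences vanish
```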
§ 3. Deep ReLU neural network approximation in Bochner spaces In this section we investigate collocation methods of deep ReLU neural network approximation of functions in Bochner spaces related to a Hilbert space $X$ and tensor-product standard Gaussian probability measures $\gamma$. Functions to be approximated have the weighted $\ell_2$-summable Hermite GPC expansion coefficients (see Assumption (I) below). The approximation is based on sparse-grid Lagrange GPC interpolation. We develop such methods and establish the convergence rates of the approximation using them. The results obtained in this section are applied to deep ReLU neural network collocation approximation of the solution of parametrized elliptic PDEs with lognormal inputs in the next section. 3.1. Tensor-product Gaussian measures and Bochner spaces Let $\gamma(y)$ be the standard Gaussian probability measure on $\mathbb{R}$ with density
$$
\begin{equation}
g(y):=\frac 1 {\sqrt{2\pi}} e^{-y^2/2}, \quad\text{so that } \mathrm d\gamma(y):=g(y)\,\mathrm d y .
\end{equation}
\tag{3.1}
$$
For $M \in \mathbb{N}$, the standard Gaussian probability measure $\gamma(\boldsymbol{y})$ on ${\mathbb R}^M$ can be defined by
$$
\begin{equation*}
\mathrm d \gamma(\boldsymbol{y}) := g_M(\boldsymbol{y})\, \mathrm d \boldsymbol{y}=\bigotimes_{j=1}^M g(y_j)\, \mathrm d y_j, \qquad \boldsymbol{y}=(y_j)_{j=1}^M \in {\mathbb R}^M,
\end{equation*}
\notag
$$
where $g_M(\boldsymbol{y}) := \prod_{j=1}^M g(y_j)$. Next we recall the concept of the standard Gaussian probability measure $\gamma(\boldsymbol{y})$ on ${\mathbb R}^\infty$ as the infinite tensor product of the standard Gaussian probability measures $\gamma(y_j)$:
$$
\begin{equation*}
\gamma(\boldsymbol{y}):= \bigotimes_{j \in \mathbb N} \gamma(y_j) , \qquad \boldsymbol{y}=(y_j)_{j \in \mathbb N} \in {\mathbb R}^\infty.
\end{equation*}
\notag
$$
The $\sigma$-algebra for $\gamma(\boldsymbol{y})$ is generated by the set of cylinders $A:= \prod_{j \in \mathbb{N}} A_j$, where $A_j \subset \mathbb{R}$ are univariate $\gamma$-measurable sets and only a finite number of the $A_j$ are different from $\mathbb{R}$. For such a set $A$, we have $\gamma(A) = \prod_{j \in \mathbb{N}} \gamma(A_j)$. (For details on an infinite tensor product of probability measures, see, for example, [35], pp. 429–435.) In what follows we use the unified notation: $J$ denotes either $\infty$ or $M \in \mathbb{N}$ and
$$
\begin{equation}
U := \begin{cases} {\mathbb R}^M&\text{if }J=M, \\ {\mathbb R}^\infty&\text{if }J=\infty. \end{cases}
\end{equation}
\tag{3.2}
$$
If $X$ is a separable Hilbert space, then the standard Gaussian probability measure $\gamma$ on $U$ induces the Bochner space $L_2(U,X,\gamma)$ of $\gamma$-measurable mappings $v$ from $U$ to $X$, equipped with the norm
$$
\begin{equation}
\|v\|_{L_2(U,X,\gamma)} := \biggl(\int_{U} \|v(\cdot,\boldsymbol{y})\|_X^2 \, \mathrm d \gamma(\boldsymbol{y}) \biggr)^{1/2}.
\end{equation}
\tag{3.3}
$$
For a $\gamma$-measurable subset $\Omega$ of $U$ the spaces $L_2(\Omega,X,\gamma)$ and $L_2(\Omega,\gamma)$ are defined in the usual way. In the case $U={\mathbb R}^M$ we also introduce the space $L_{\infty}^{\sqrt{g}}({\mathbb R}^M,X)$ as the set of all strongly $\gamma$-measurable functions $v\colon {\mathbb R}^M \to X$ with $\sqrt{g_M}$-weighted uniform norm
$$
\begin{equation}
\|v\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M,X)} :=\operatorname*{ess\,sup}_{\boldsymbol{y} \in {\mathbb R}^M} \Bigl(\|v(\boldsymbol{y})\|_X \sqrt{g_M(\boldsymbol{y})} \Bigr) .
\end{equation}
\tag{3.4}
$$
One may expect an infinite-dimensional version of this space. Unfortunately, we could not give a consistent definition of $L_{\infty}^{\sqrt{g}}({\mathbb R}^\infty,X)$ because there is no infinite-dimensional counterpart of the weight $g_M$. However, under certain assumptions (see Assumption (I) below) we can obtain some approximation results which do not depend on $M$, in particular, when $M$ is very large. We make use of the shorthand notation $L_{\infty}^{\sqrt{g}}({\mathbb R}^M)= L_{\infty}^{\sqrt{g}}({\mathbb R}^M,\mathbb{R})$ and $L_{\infty}^{\sqrt{g}}(\mathbb{R})= L_{\infty}^{\sqrt{g}}(\mathbb{R},\mathbb{R})$. In this section we investigate the problem of deep ReLU neural network approximation of functions in $L_2({\mathbb R}^\infty, X, \gamma)$ or $L_2({\mathbb R}^M, X, \gamma)$ with the error measured in the norm of the space $L_2({\mathbb R}^\infty, X, \gamma)$ or $L_\infty^{\sqrt{g}}({\mathbb R}^M,X)$, respectively. (Notice that these norms are the most important ones in the evaluation of the error of the collocation approximation of solutions of parametric and stochastic PDEs.) It is convenient for us to incorporate these different approximation problems into a unified treatment. Hence, in what follows we use the unified notation
$$
\begin{equation}
\mathcal L(U,X) :=\begin{cases} L_{\infty}^{\sqrt{g}}( {\mathbb R}^M,X)&\text{if }U={\mathbb R}^M, \\ L_2({\mathbb R}^\infty,X,\gamma)&\text{if } U={\mathbb R}^\infty, \end{cases}
\end{equation}
\tag{3.5}
$$
$$
\begin{equation}
\mathcal F :=\begin{cases} {\mathbb N}_0^M &\text{if } U={\mathbb R}^M, \\ \mathbb F &\text{if } U={\mathbb R}^\infty, \end{cases}
\end{equation}
\tag{3.6}
$$
$$
\begin{equation}
\mathcal N :=\begin{cases} \{1, \dotsc, M\} &\text{if } U={\mathbb R}^M, \\ \mathbb N &\text{if } U={\mathbb R}^\infty. \end{cases}
\end{equation}
\tag{3.7}
$$
Here $\mathbb{F}$ is the set of all sequences of nonnegative integers $\boldsymbol{s}=(s_j)_{j \in \mathbb{N}}$ such that their support $\operatorname{supp} (\boldsymbol{s}):= \{j \in \mathbb{N}\colon s_j >0\}$ is a finite set. Let $(H_k)_{k \in \mathbb{N}_0}$ be the Hermite polynomials normalized by $\displaystyle\int_{\mathbb{R}} |H_k(y)|^2 g(y)\,\mathrm{d}y=1$. Then a function $v \in L_2(U,X,\gamma)$ can be represented by the Hermite GPC expansion
$$
\begin{equation}
v(\boldsymbol{y})=\sum_{\boldsymbol{s}\in\mathcal F} v_{\boldsymbol{s}} H_{\boldsymbol{s}}(\boldsymbol{y}), \qquad v_{\boldsymbol{s}} \in X,
\end{equation}
\tag{3.8}
$$
where
$$
\begin{equation}
H_{\boldsymbol{s}}(\boldsymbol{y})=\bigotimes_{j \in \mathcal N}H_{s_j}(y_j)\quad\text{and} \quad v_{\boldsymbol{s}}:=\int_U v(\boldsymbol{y})H_{\boldsymbol{s}}(\boldsymbol{y})\, \mathrm d\gamma (\boldsymbol{y}), \qquad \boldsymbol{s} \in \mathcal F.
\end{equation}
\tag{3.9}
$$
Notice that $(H_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}}$ is an orthonormal basis of $L_2(U,\gamma):= L_2(U,\mathbb{R}, \gamma)$. Moreover, for every $v \in L_2(U,X,\gamma)$ represented by the series (3.8), Parseval’s identity holds
$$
\begin{equation*}
\|v\|_{L_2(U,X,\gamma)}^2= \sum_{\boldsymbol{s}\in\mathcal F} \|v_{\boldsymbol{s}}\|_X^2.
\end{equation*}
\notag
$$
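As a quick sanity check of this normalization in the univariate case, the following sketch verifies the orthonormality of the first few polynomials $H_k$ with respect to $\gamma$ by Gauss–Hermite quadrature for the weight $e^{-y^2/2}$ (numpy's `hermite_e` module works with exactly these probabilists' Hermite polynomials); the truncation degrees are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, sqrt, pi

# Gauss-HermiteE nodes and weights integrate exactly against the weight
# exp(-y^2/2); dividing by sqrt(2*pi) turns this into integration against g.
nodes, weights = He.hermegauss(40)

def H(k, y):
    # k-th Hermite polynomial, normalized so that int |H_k|^2 g dy = 1
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    return He.hermeval(y, coeffs) / sqrt(factorial(k))

gram = np.array([[np.sum(weights * H(j, nodes) * H(k, nodes)) / sqrt(2 * pi)
                  for k in range(6)] for j in range(6)])
print(np.max(np.abs(gram - np.eye(6))))   # close to zero: orthonormality w.r.t. gamma
```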
For $\boldsymbol{s}, \boldsymbol{s}' \in \mathcal{F}$, the inequality $\boldsymbol{s}' \leqslant \boldsymbol{s}$ means that $s_j' \leqslant s_j$ for $j \in \mathcal{N}$. A set $\boldsymbol{\sigma}=(\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}}$ with $\sigma_{\boldsymbol{s}} \in \mathbb{R}$ is called increasing if $\sigma_{\boldsymbol{s}'} \leqslant \sigma_{\boldsymbol{s}}$ for $\boldsymbol{s}' \leqslant \boldsymbol{s}$. Assumption (I). For $v \in L_2(U,X,\gamma)$ represented by the series (3.8), there exists an increasing set $\boldsymbol{\sigma} =(\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}}$ of positive numbers such that for some $q$ with $0< q < 2$,
$$
\begin{equation}
\biggl(\sum_{\boldsymbol{s}\in\mathcal F} (\sigma_{\boldsymbol{s}} \|v_{\boldsymbol{s}}\|_{X})^2\biggr)^{1/2} \leqslant C_1 <\infty, \qquad \|{\boldsymbol{\sigma}^{-1}} \|_{\ell_q(\mathcal F)} \leqslant C_2 < \infty,
\end{equation}
\tag{3.10}
$$
where the constants $C_1$ and $C_2$ are independent of $J$. Here and in what follows, ‘independent of $J$’ means that $C_1$ and $C_2$ (and other constants) are independent of $M$ when $J=M$, since we are interested in convergence rates and other asymptotic properties which do not depend on $M$ and are based on Assumption (I). Lemma 3.1. For $v \in L_2(U,X,\gamma)$ satisfying Assumption (I) the series (3.8) converges absolutely, and therefore unconditionally, in $\mathcal{L}(U,X)$ to $v$ and
$$
\begin{equation}
\sum_{\boldsymbol{s}\in\mathcal F} \|v_{\boldsymbol{s}}\|_{X} \leqslant C <\infty,
\end{equation}
\tag{3.11}
$$
where the constant $C$ is independent of $J$. Proof. By Hölder's inequality and Assumption (I) we obtain
$$
\begin{equation*}
\sum_{\boldsymbol{s}\in\mathcal F} \|v_{\boldsymbol{s}}\|_{X} \leqslant \biggl( \sum_{\boldsymbol{s}\in \mathcal F} (\sigma_{\boldsymbol{s}} \|v_{\boldsymbol{s}}\|_{X})^2\biggr)^{1/2} \biggl(\sum_{\boldsymbol{s}\in \mathcal F} \sigma_{\boldsymbol{s}}^{-2} \biggr)^{1/2} \leqslant C\|\boldsymbol{\sigma}^{-1}\|_{\ell_q(\mathcal F)} <\infty.
\end{equation*}
\notag
$$
This proves (3.11). Hence, by the equality $\|H_{\boldsymbol{s}}\|_{L_2({\mathbb R}^\infty,\gamma)}=1$, $\boldsymbol{s} \in \mathbb{F}$, and the inequality $\|H_{\boldsymbol{s}}\|_{L_\infty^{\sqrt{g}}({\mathbb R}^M)} < 1$, $\boldsymbol{s} \in \mathbb{N}_0^M$ (which follows from (5.7)), the series (3.8) converges absolutely, and therefore unconditionally, in $\mathcal L(U,X)$, and its sum is $v$, since by Parseval's identity the series already converges to $v$ in the norm of $L_2(U,X,\gamma)$. The lemma is proved. 3.2. Sparse-grid Lagrange GPC interpolation For $m \in \mathbb{N}_0$, let $Y_m = (y_{m;k})_{k \in \pi_m}$ be the increasing sequence of the $m+1$ roots of the Hermite polynomial $H_{m+1}$, ordered as follows:
$$
\begin{equation*}
\begin{gathered} \, y_{m,-j} < \dots < y_{m,-1} < y_{m,0}=0 < y_{m,1} < \dots < y_{m,j} \quad \text{if } m=2j, \\ y_{m,-j} < \dots < y_{m,-1} < y_{m,1} < \dots < y_{m,j} \quad \text{if } m=2j-1, \end{gathered}
\end{equation*}
\notag
$$
where
$$
\begin{equation*}
\pi_m:=\begin{cases} \{-j,-j+1, \dots, -1, 0, 1, \dots ,j-1,j \} &\text{if }m=2j, \\ \{-j,-j+1, \dots, -1, 1, \dots, j-1,j \}&\text{if }m=2j-1 \end{cases}
\end{equation*}
\notag
$$
(in particular, $Y_0 = (y_{0;0})$ with $y_{0;0} = 0$). For a function $v$ on $\mathbb{R}$ taking values in a Hilbert space $X$ and $m \in \mathbb{N}_0$, we define the Lagrange interpolation operator $I_m$ by
$$
\begin{equation}
I_m(v):= \sum_{k\in \pi_m} v(y_{m;k}) L_{m;k}, \quad\text{where } L_{m;k}(y) :=\prod_{j \in \pi_m,\,j\ne k}\frac{y-y_{m;j}}{y_{m;k}-y_{m;j}}
\end{equation}
\tag{3.12}
$$
(in particular, $I_0(v) = v(y_{0;0})L_{0;0}(y)= v(0)$ and $L_{0;0}(y)=1$). Notice that $I_m(v)$ is a function on $\mathbb{R}$ taking values in $X$ and interpolating $v$ at $y_{m;k}$, that is, $I_m(v)(y_{m;k}) = v(y_{m;k})$. Moreover, for a function $v\colon \mathbb{R} \to \mathbb{R}$, the function $I_m(v)$ is the Lagrange polynomial of degree $\leqslant m$, and $I_m(\varphi) = \varphi$ for every polynomial $\varphi$ of degree ${\leqslant m}$. Let
$$
\begin{equation}
\lambda_m:= \sup_{\|v\|_{L_\infty^{\sqrt{g}}(\mathbb R)} \leqslant 1} \|I_m(v)\|_{L_\infty^{\sqrt{g}}(\mathbb R)}
\end{equation}
\tag{3.13}
$$
be the Lebesgue constant. It was proved in [40], [41] and [54] that
$$
\begin{equation*}
\lambda_m \leqslant C(m+1)^{1/6}, \qquad m \in \mathbb N,
\end{equation*}
\notag
$$
for some positive constant $C$ independent of $m$ (with the obvious inequality $\lambda_0(Y_0) \leqslant 1$). Hence, for every $\varepsilon > 0$ there exists a positive constant $C_\varepsilon \geqslant 1$ independent of $m$ such that
$$
\begin{equation}
\lambda_m \leqslant (1+C_\varepsilon m)^{1/6+\varepsilon} \quad \forall\, m \in \mathbb N_0.
\end{equation}
\tag{3.14}
$$
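The interpolation operator $I_m$ and the Lebesgue constant $\lambda_m$ are easy to experiment with numerically. The sketch below builds $I_m$ at the Gauss–Hermite nodes $Y_m$, checks that it reproduces polynomials of degree at most $m$, and estimates $\lambda_m$ on a finite grid; the grid, the chosen values of $m$ and the helper names are assumptions made only for this illustration, and the grid maximum is merely a lower bound for the supremum in (3.13).

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def gh_nodes(m):
    # the m+1 roots of the Hermite polynomial H_{m+1} (so gh_nodes(0) = [0.])
    return He.hermegauss(m + 1)[0]

def lagrange_matrix(m, y):
    # L[k, i] = L_{m;k}(y_i) for the nodes y_{m;k} in Y_m
    nodes = gh_nodes(m)
    L = np.ones((m + 1, y.size))
    for k in range(m + 1):
        for j in range(m + 1):
            if j != k:
                L[k] *= (y - nodes[j]) / (nodes[k] - nodes[j])
    return L

def interpolate(v, m, y):
    # I_m(v)(y) = sum_k v(y_{m;k}) L_{m;k}(y), cf. (3.12)
    return v(gh_nodes(m)) @ lagrange_matrix(m, y)

# I_m reproduces polynomials of degree <= m:
y = np.linspace(-4.0, 4.0, 101)
print(np.max(np.abs(interpolate(lambda t: t**3 - 2 * t, 3, y) - (y**3 - 2 * y))))

# crude grid estimate (a lower bound) of the Lebesgue constant (3.13);
# the cited bound is lambda_m <= C (m+1)^(1/6)
g = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
for m in (4, 16, 64):
    nodes = gh_nodes(m)
    grid = np.linspace(-1.5, 1.5, 4001) * np.max(np.abs(nodes))
    lam = np.max(np.sqrt(g(grid)) *
                 np.sum(np.abs(lagrange_matrix(m, grid)) / np.sqrt(g(nodes))[:, None], axis=0))
    print(m, lam)
```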
We define the univariate operator $\Delta_m$ for $m \in \mathbb{N}_0$ by
$$
\begin{equation*}
\Delta_m := I_m-I_{m-1},
\end{equation*}
\notag
$$
with the convention $I_{-1} = 0$. Lemma 3.2. For every $\varepsilon > 0$ there exists a positive constant $C_\varepsilon$ independent of $m$ such that for every function $v$ on $\mathbb{R}$,
$$
\begin{equation}
\|\Delta_m(v)\|_{L_\infty^{\sqrt{g}}(\mathbb R)} \leqslant (1+C_\varepsilon m)^{1/6+\varepsilon} \|v\|_{L_\infty^{\sqrt{g}}(\mathbb R)} \quad \forall\, m \in \mathbb N_0
\end{equation}
\tag{3.15}
$$
whenever the norm on the right-hand side is finite. Proof. From the definition of $\Delta_m$ and the bound $\lambda_m \leqslant C(m+1)^{1/6}$ on the Lebesgue constant we have
$$
\begin{equation*}
\|\Delta_m(v)\|_{L_\infty^{\sqrt{g}}(\mathbb R)} \leqslant 2C(1+m)^{1/6} \|v\|_{L_\infty^{\sqrt{g}}(\mathbb R)} \quad \forall\, m \in \mathbb N_0,
\end{equation*}
\notag
$$
which implies (3.15). The lemma is proved. We will use a sparse-grid Lagrange GPC interpolation as an intermediate approximation in the deep ReLU neural network approximation of functions $v\in L_2(U,X,\gamma)$. In order to have a consistent definition of the interpolation operator we have to impose some necessary restrictions on $v$. Let $\mathcal{E}$ be a $\gamma$-measurable subset in $U$ such that $\gamma(\mathcal{E}) =1$ and $\mathcal{E}$ contains all $\boldsymbol{y} \in U$ with $|\boldsymbol{y}|_0 < \infty$ in the case when $U={\mathbb R}^\infty$, where $|\boldsymbol{y}|_0$ denotes the number of nonzero components $y_j$ of $\boldsymbol{y}$. Given $\mathcal{E}$ and a Hilbert space $X$, we define $L_2^\mathcal{E}(U,X,\gamma)$ as the subspace in $L_2(U,X,\gamma)$ of all elements $v$ such that the value at a point $v(\boldsymbol{y})$ (of a representative of $v$) is well defined for all $\boldsymbol{y} \in \mathcal{E}$. In what follows $\mathcal{E}$ is fixed. For $v \in L_2^\mathcal{E}(U,X,\gamma)$ we introduce the tensor product operator $\Delta_{\boldsymbol{s}}$, $\boldsymbol{s} \in \mathcal{F}$, by
$$
\begin{equation}
\Delta_{\boldsymbol{s}}(v) :=\bigotimes_{j \in \mathcal N} \Delta_{s_j}(v),
\end{equation}
\tag{3.16}
$$
where the univariate operator $\Delta_{s_j}$ is applied successively to the function $\bigotimes_{i<j} \Delta_{s_i}(v)$, considered as a univariate function of the variable $y_j$ with the other variables fixed. From the definition of $L_2^\mathcal{E}(U,X,\gamma)$ one can see that the operators $\Delta_{\boldsymbol{s}}$ are well defined for all $\boldsymbol{s} \in \mathcal{F}$. For $\boldsymbol{s} \in \mathcal{F}$ we set
$$
\begin{equation*}
I_{\boldsymbol{s}}(v) :=\bigotimes_{j \in \mathcal N} I_{s_j}(v), \qquad L_{\boldsymbol{s};\boldsymbol{k}} :=\bigotimes_{j \in \mathcal N} L_{s_j;k_j} \quad\text{and}\quad \pi_{\boldsymbol{s}} :=\prod_{j \in \mathcal N} \pi_{s_j}
\end{equation*}
\notag
$$
(the function $I_{\boldsymbol{s}}(v)$ is defined in the same manner as $\Delta_{\boldsymbol{s}}(v)$). For $\boldsymbol{s} \in \mathcal{F}$ and $\boldsymbol{k} \in \pi_{\boldsymbol{s}}$, let $E_{\boldsymbol{s}}$ be the subset in $\mathcal{F}$ of all $\boldsymbol{e}$ such that $e_j$ is either $1$ or $0$ if $s_j > 0$, and $e_j$ is $0$ if $s_j = 0$, and let $\boldsymbol{y}_{\boldsymbol{s};\boldsymbol{k}}:= (y_{s_j;k_j})_{j \in \mathcal{N}} \in U$. Put $|\boldsymbol{s}|_1 := \sum_{j \in \mathcal{N}} s_j$ for $\boldsymbol{s} \in \mathcal{F}$. It is easy to check that the interpolation operator $\Delta_{\boldsymbol{s}}$ can be represented in the form
$$
\begin{equation}
\Delta_{\boldsymbol{s}}(v) =\sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} (-1)^{|\boldsymbol{e}|_1} I_{\boldsymbol{s}- \boldsymbol{e}} (v) =\sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} (-1)^{|\boldsymbol{e}|_1} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} v(\boldsymbol{y}_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}) L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}.
\end{equation}
\tag{3.17}
$$
Given a finite set $\Lambda \subset \mathcal{F}$, we introduce the GPC interpolation operator $I_\Lambda$ by
$$
\begin{equation}
I_\Lambda :=\sum_{\boldsymbol{s} \in \Lambda} \Delta_{\boldsymbol{s}}.
\end{equation}
\tag{3.18}
$$
From (3.17) we obtain
$$
\begin{equation}
I_\Lambda(v)=\sum_{\boldsymbol{s} \in \Lambda} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} (-1)^{|\boldsymbol{e}|_1} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} v(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}) L_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}.
\end{equation}
\tag{3.19}
$$
A set $\Lambda \subset \mathcal{F}$ is called downward closed if the inclusion $\boldsymbol{s} \in \Lambda$ yields the inclusion $\boldsymbol{s}' \in \Lambda$ for every $\boldsymbol{s}' \in \mathcal{F}$ such that $\boldsymbol{s}' \leqslant \boldsymbol{s}$. For $\theta, \lambda \geqslant 0$ we define the set $\boldsymbol{p}(\theta, \lambda):= (p_{\boldsymbol{s}}(\theta, \lambda))_{\boldsymbol{s} \in \mathcal F}$ by
$$
\begin{equation}
p_{\boldsymbol{s}}(\theta, \lambda) :=\prod_{j \in \mathcal N} (1+\lambda s_j)^\theta, \qquad \boldsymbol{s} \in \mathcal F,
\end{equation}
\tag{3.20}
$$
with shorthand notation $p_{\boldsymbol{s}}(\theta):= p_{\boldsymbol{s}}(\theta, 1)$ and $\boldsymbol{p}(\theta):= \boldsymbol{p}(\theta, 1)$. Let $0 < q < \infty$, and let $\boldsymbol{\sigma}= (\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}}$ be a set of positive numbers. For $\xi >0$ we define the set
$$
\begin{equation}
\Lambda(\xi):=\{\boldsymbol{s} \in \mathcal F\colon\sigma_{\boldsymbol{s}}^q \leqslant \xi\}.
\end{equation}
\tag{3.21}
$$
By formula (3.19) we can represent the operator $I_{\Lambda(\xi)}$ in the form
$$
\begin{equation}
I_{\Lambda(\xi)}(v) =\sum_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} (-1)^{|\boldsymbol{e}|_1} v(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})L_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}},
\end{equation}
\tag{3.22}
$$
where
$$
\begin{equation}
G(\xi) := \{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in \mathcal F \times \mathcal F \times \mathcal F\colon\boldsymbol{s} \in \Lambda(\xi), \ \boldsymbol{e} \in E_{\boldsymbol{s}}, \ \boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}} \}.
\end{equation}
\tag{3.23}
$$
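To make these index sets concrete, the following sketch enumerates $\Lambda(\xi)$ from (3.21) and the set $G(\xi)$ from (3.23), together with the interpolation points $\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}$, in a small finite-dimensional example. The weight family $\sigma_{\boldsymbol{s}}=\prod_{j=1}^M (j+1)^{s_j}$, the parameters $M$, $q$, $\xi$ and the positional indexing of the nodes are assumptions made only for this illustration (the paper indexes the nodes of $\pi_{\boldsymbol{s}}$ symmetrically about $0$).

```python
import numpy as np
from itertools import product
from numpy.polynomial import hermite_e as He

# Small finite-dimensional illustration of (3.21) and (3.23); the weights
# sigma_s = prod_j (j+1)^{s_j} (1-based j) are an assumption for this sketch.
M, q, xi = 3, 1.0, 40.0
sigma = lambda s: np.prod([(j + 2.0) ** s[j] for j in range(M)])   # j is 0-based here

# downward closed set Lambda(xi) = { s in N_0^M : sigma_s^q <= xi }
Lambda = [s for s in product(range(12), repeat=M) if sigma(s) ** q <= xi]

def nodes(m):
    return He.hermegauss(m + 1)[0]       # the m+1 roots of H_{m+1}; nodes(0) = [0.]

grid = {}                                # the triples of G(xi) with their points y_{s-e;k}
for s in Lambda:
    supp = [j for j in range(M) if s[j] > 0]
    for bits in product((0, 1), repeat=len(supp)):        # e in E_s
        e = [0] * M
        for b, j in zip(bits, supp):
            e[j] = b
        level = [s[j] - e[j] for j in range(M)]
        # k runs positionally over the level_j + 1 nodes in each coordinate
        for k in product(*(range(l + 1) for l in level)):
            point = tuple(nodes(level[j])[k[j]] for j in range(M))
            grid[(s, tuple(e), k)] = point

print(len(Lambda), len(grid))            # |Lambda(xi)| and |G(xi)|
```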
The following theorem gives an estimate for the error of the approximation of $v \in L_2^\mathcal{E}(U,X,\gamma)$ by the sparse-grid Lagrange GPC interpolation $I_{\Lambda(\xi)} v$ based on the sampling points indexed by the set $G(\xi)$; it will be used in the deep ReLU neural network approximation in the next subsection. Theorem 3.1. Let $v \in L_2^\mathcal{E}(U,X,\gamma)$ satisfy Assumption (I), and let $\varepsilon >0$ be a fixed number. Assume that $\|\boldsymbol{p}(\theta/q,\lambda)\boldsymbol{\sigma}^{-1}\|_{\ell_q(\mathcal F)} \leqslant C < \infty$, where $\theta =7/3 + 2\varepsilon$, $\lambda:= C_\varepsilon$ is the constant from Lemma 3.2, and the constant $C$ is independent of $J$. Then for each $\xi > 1$ we have
$$
\begin{equation}
\|v -I_{\Lambda(\xi)}v\|_{\mathcal L(U,X)} \leqslant C\xi^{-(1/q-1/2)},
\end{equation}
\tag{3.24}
$$
where the constant $C$ in (3.24) is independent of $J$, $v$ and $\xi$. A proof of this theorem is given in § 5.2. Corollary 3.1. Under the assumptions of Theorem 3.1, for each $n > 1$ we can construct a sequence of points $Y_{\Lambda(\xi_n)}:= (\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e}; \boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)}$ so that $|G(\xi_n)| \leqslant n$ and
$$
\begin{equation}
\|v -I_{\Lambda(\xi_n)}v\|_{\mathcal L(U,X)} \leqslant Cn^{-(1/q-1/2)},
\end{equation}
\tag{3.25}
$$
where the constant $C$ in (3.25) is independent of $J$, $v$ and $n$. Proof. Notice that this corollary was proved in [15], Corollary 3.1, for the case $U={\mathbb R}^\infty$. By Lemma 5.2 we have $|G(\xi)| \leqslant C_q \xi$ for every $\xi > 1$. Hence the corollary follows from Theorem 3.1 by selecting $\xi_n$ as the maximum number satisfying $|G(\xi_n)| \leqslant n$. 3.3. Approximation by deep ReLU neural networks In this section we construct deep ReLU neural networks for the collocation approximation of functions $v \in L_2(U,X,\gamma)$. We first approximate $v$ by the sparse-grid Lagrange GPC interpolation $I_{\Lambda(\xi)} v$. By Lemma 5.1, (iii), $I_{\Lambda(\xi)} v $ can be seen as a function on ${\mathbb R}^m$, where $m :=\min\{M,\lfloor K_q \xi \rfloor\}$. At the next step we approximate $I_{\Lambda(\xi)}v$ by its truncation $I_{\Lambda(\xi)}^{\omega}v$ on a sufficiently large super-cube
$$
\begin{equation}
B^m_\omega :=[-2\sqrt{\omega}, 2\sqrt{\omega}]^m \subset \mathbb R^m,
\end{equation}
\tag{3.26}
$$
where the parameter $\omega$, which depends on $\xi$, is chosen in an appropriate way. Finally, the function $I_{\Lambda(\xi)}^{\omega}v$, and therefore $v$, is approximated by a function $\Phi_{\Lambda(\xi)}v $ on ${\mathbb R}^m$ which is constructed from a deep ReLU neural network. Let us describe this construction. For convenience, we consider $\mathbb{R}^m$ as the subset of all $\boldsymbol{y} \in U$ such that $y_j = 0$ for $j > m$. If $f$ is a function on ${\mathbb R}^m$ taking values in a Hilbert space $X$, then $f$ has an extension to $\mathbb{R}^{m'}$ for $m' > m$ and to the whole of $U$, which is denoted by $f$ again, by the formula $f(\boldsymbol{y})=f((y_j)_{j=1}^m)$ for $\boldsymbol{y} = (y_j)_{j =1}^{m'}$ and $\boldsymbol{y} = (y_j)_{j \in \mathcal{N}}$, respectively. Suppose that deep ReLU neural networks $\phi_{\boldsymbol{s} - \boldsymbol{e};\boldsymbol{k}}$ on $\mathbb{R}^{|\operatorname{supp}(\boldsymbol{s})|}$ have already been constructed for the approximation of the polynomials $L_{\boldsymbol{s} - \boldsymbol{e};\boldsymbol{k}}$, $(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)$. Then the network $\boldsymbol{\phi}_{\Lambda(\xi)}:= (\phi_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)}$ on $\mathbb{R}^m$ with $|G(\xi)|$ outputs, which is constructed by parallelization, is used to construct an approximation of $I_{\Lambda(\xi)}^{\omega}v$, and therefore of $v$. Namely, we approximate $v$ by
$$
\begin{equation}
\Phi_{\Lambda(\xi)}v (\boldsymbol{y}) :=\sum_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} (-1)^{|\boldsymbol{e}|_1} v(\boldsymbol{y}_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}})\phi_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}} (\boldsymbol{y}).
\end{equation}
\tag{3.27}
$$
For the set $\Lambda(\xi)$, we introduce the following numbers:
$$
\begin{equation}
m_1(\xi) :=\max_{\boldsymbol{s} \in \Lambda(\xi)} |\boldsymbol{s}|_1
\end{equation}
\tag{3.28}
$$
and
$$
\begin{equation}
m(\xi) :=\max\bigl\{j \in \mathcal N\colon \exists\, \boldsymbol{s} \in \Lambda(\xi) \ \text{such that} \ s_j > 0 \bigr\}.
\end{equation}
\tag{3.29}
$$
In this section we prove our main results on deep ReLU neural network approximation of functions $v \in L_2^\mathcal{E}(U,X,\gamma)$ with the error measured in the norm of the space $L_2({\mathbb R}^\infty,X,\gamma)$ or $L_\infty^{\sqrt{g}}({\mathbb R}^M,X)$, which are incorporated into the following common theorem. Denote by $\boldsymbol{e}^i = (e^i_j)_{j \in \mathcal{N}}\in \mathcal{F}$ the element such that $e^i_i = 1$ and $e^i_j = 0$ for $j \ne i$. Theorem 3.2. Let $v \in L_2^\mathcal{E}(U,X,\gamma)$ satisfy Assumption (I). Let $\theta$ be any number such that $\theta \geqslant 3/q$. Assume that the set $\boldsymbol{\sigma}= (\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}}$ in Assumption (I) satisfies $\sigma_{\boldsymbol{e}^{i'}} \leqslant \sigma_{\boldsymbol{e}^i}$ for $i' < i$, and that $\|\boldsymbol{p}(\theta)\boldsymbol{\sigma}^{-1} \|_{\ell_q(\mathcal F)} \leqslant C<\infty$, where the constant $C$ is independent of $J$. Let $K_q$, $K_{q,\theta}$ and $C_q$ be the constants in the assumptions of Lemmas 5.1 and 5.2. Then for every $\xi > 2$ we can construct a deep ReLU neural network $\boldsymbol{\phi}_{\Lambda(\xi)}:= (\phi_{\boldsymbol{s}-\boldsymbol{e}; \boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)}$ on $\mathbb{R}^m$, where
$$
\begin{equation*}
m :=\begin{cases} \min\{M,\lfloor K_q \xi \rfloor\}&\textit{if } U={\mathbb R}^M, \\ \lfloor K_q \xi \rfloor&\textit{if }U={\mathbb R}^\infty, \end{cases}
\end{equation*}
\notag
$$
and a sequence of points $Y_{\Lambda(\xi)}:= (\boldsymbol{y}_{\boldsymbol{s} -\boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)}$ having the following properties: - (i) the deep ReLU neural network $\boldsymbol{\phi}_{\Lambda(\xi)}$ and sequence of points $Y_{\Lambda(\xi)}$ are independent of $v$;
- (ii) the output dimension of $\boldsymbol{\phi}_{\Lambda(\xi)}$ is at most $\lfloor C_q \xi \rfloor$;
- (iii) $W(\boldsymbol{\phi}_{\Lambda(\xi)}) \leqslant C \xi^{1+2/(\theta q)} \log \xi$;
- (iv) $L(\boldsymbol{\phi}_{\Lambda(\xi)}) \leqslant C \xi^{1/(\theta q)} (\log \xi)^2$;
- (v) the components $\phi_{\boldsymbol{s} - \boldsymbol{e};\boldsymbol{k}}$ of $\boldsymbol{\phi}_{\Lambda(\xi)}$, $(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)$, are deep ReLU neural networks on $\mathbb{R}^{|\operatorname{supp}(\boldsymbol{s})|}$ with ${|{\operatorname{supp}(\boldsymbol{s})}|} \leqslant K_{q,\theta} \xi^{1/(\theta q) }$, with support in the super-cube $[-T,T]^{|{\operatorname{supp}(\boldsymbol{s})}|}$, where $T:= 4\sqrt{\lfloor K_{q,\theta} \xi \rfloor}$;
- (vi) the approximation of $v$ by $\Phi_{\Lambda(\xi)}v$ gives the error estimate
$$
\begin{equation}
\| v- \Phi_{\Lambda(\xi)}v \|_{\mathcal L(U,X)}\leqslant C\xi^{-(1/q-1/2)}.
\end{equation}
\tag{3.30}
$$
Here the constants $C$ are independent of $J$, $v$ and $\xi$. Let us briefly outline the plan of the proof of this theorem. We will present a detailed proof for $U={\mathbb R}^\infty$ and then point out that the case $U= {\mathbb R}^M$ can be proved in the same way with slight modifications. In the rest of this section, all definitions, formulae and assertions are given for $U= {\mathbb R}^\infty$ and $\xi >1$; we use the letters $m$ and $\omega$ only for the notation
$$
\begin{equation}
m :=\lfloor K_q\xi \rfloor \quad\text{and}\quad \omega :=\lfloor {K_{q,\theta}}\xi \rfloor,
\end{equation}
\tag{3.31}
$$
where $K_q$ and $K_{q,\theta}$ are the constants defined in Lemma 5.1. As mentioned above, we first approximate $v \in L_2({\mathbb R}^\infty,X,\gamma)$ by the GPC interpolation $I_{\Lambda(\xi)} v$. At the next step, we approximate $I_{\Lambda(\xi)} v $ by its truncation $I_{\Lambda(\xi)}^{\omega}v$ on the super-cube $B^m_\omega$, which we construct below. The final step is to construct a deep ReLU neural network $\boldsymbol{\phi}_{\Lambda(\xi)}:= (\phi_{\boldsymbol{s}-\boldsymbol{e}; \boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)}$ to approximate $I_{\Lambda(\xi)}^{\omega}v$ by $\Phi_{\Lambda(\xi)}v$ of the form (3.27). For a function $\varphi$ defined on $\mathbb{R}$, we denote by $\varphi^{\omega}$ the truncation of $\varphi$ on $B^1_\omega$, that is,
$$
\begin{equation}
\varphi^{\omega}(y) :=\begin{cases} \varphi(y) & \text{if } y \in B^1_\omega, \\ 0 & \text{otherwise}. \end{cases}
\end{equation}
\tag{3.32}
$$
If $\operatorname{supp} (\boldsymbol{s}) \subset \{1,\dots,m\}$, then we put
$$
\begin{equation*}
L_{\boldsymbol{s};\boldsymbol{k}}^{\omega}(\boldsymbol{y}) :=\prod_{j=1}^m L_{s_j;k_j}^{\omega}(y_j),\qquad \boldsymbol{y}\in \mathbb R^m.
\end{equation*}
\notag
$$
We have $L_{\boldsymbol{s};\boldsymbol{k}}^{\omega}(\boldsymbol{y}) =\prod_{j=1}^m L_{s_j;k_j}(y_j)$ if $\boldsymbol{y}\in B^m_\omega $ and $L_{\boldsymbol{s};\boldsymbol{k}}^{\omega}(\boldsymbol{y})=0$ otherwise. For a function $v \in L_2^\mathcal{E}({\mathbb R}^\infty,X,\gamma)$, we define
$$
\begin{equation}
I_{\Lambda(\xi)}^\omega(v) :=\sum_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)}(-1)^{|\boldsymbol{e}|_1} v(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}) L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}^\omega.
\end{equation}
\tag{3.33}
$$
Let the assumptions of Theorem 3.2 hold. By Lemma 5.1, (iii), for every $\xi >2$ we have $m(\xi) \leqslant m$. Hence, for every $(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)$, $L_{\boldsymbol{s} - \boldsymbol{e};\boldsymbol{k}}$ and $L_{\boldsymbol{s} - \boldsymbol{e};\boldsymbol{k}}^\omega$, and therefore $I_{\Lambda(\xi)}v $ and $I_{\Lambda(\xi)}^{\omega} v$, can be considered as functions on ${\mathbb R}^m$. For $g \in L_2({\mathbb R}^m,X,\gamma)$ we have $\|g\|_{L_2({\mathbb R}^m,X,\gamma)}=\|g\|_{L_2({\mathbb R}^\infty,X,\gamma)}$ in the sense of an extension of $g$. We make use of these facts without mention. To prove Theorem 3.2 we will use some intermediate approximations for estimation of the approximation error as in (3.30). Suppose that the deep ReLU neural network $\boldsymbol{\phi}_{\Lambda(\xi)}$, and therefore the function $\Phi_{\Lambda(\xi)}$, have already been constructed. By the triangle inequality we have
$$
\begin{equation}
\begin{aligned} \, \notag &\|v- \Phi_{\Lambda(\xi)} v\|_{L_2({\mathbb R}^\infty,X,\gamma)} \leqslant \|v-I_{\Lambda(\xi)}v \|_{L_2({\mathbb R}^\infty,X,\gamma)} + \|I_{\Lambda(\xi)}v-I_{\Lambda(\xi)}^{\omega}v\|_{{L_2({\mathbb R}^m \setminus B^m_\omega,X,\gamma)}} \\ &\qquad\qquad + \| I_{\Lambda(\xi)}^{\omega}v- \Phi_{\Lambda(\xi)} v\|_{L_2(B^m_\omega,X,\gamma)} + \| \Phi_{\Lambda(\xi)} v \|_{L_2({\mathbb R}^m \setminus B^m_\omega,X,\gamma)}. \end{aligned}
\end{equation}
\tag{3.34}
$$
Hence the estimate (3.30) will be obtained via the bound $C\xi^{-(1/q - 1/2)}$ for each of the four terms on the right-hand side. The first term is already estimated as in Theorem 3.1. The estimates for the others will be carried out in the following lemmas (Lemmas 3.3–3.5). To complete the proof of Theorem 3.2 we also have to prove bounds on the size and depth of $\boldsymbol{\phi}_{\Lambda(\xi)}$ as in items (iii) and (iv); this is done in Lemma 3.6 below. For $v \in L_2^\mathcal{E}({\mathbb R}^\infty,X,\gamma)$ satisfying Assumption (I), by Lemma 3.1 the series (3.8) converges unconditionally to $v$ in $L_2({\mathbb R}^\infty,X,\gamma)$. Therefore, formula (3.19) for ${\Lambda = \Lambda(\xi)}$ can be rewritten as
$$
\begin{equation}
I_{\Lambda(\xi)}(v) =\sum_{\boldsymbol{s} \in \Lambda(\xi)} \sum_{\boldsymbol{s}' \in \mathbb F}v_{\boldsymbol{s}'} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} (-1)^{|\boldsymbol{e}|_1} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}) L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}.
\end{equation}
\tag{3.35}
$$
Hence, by the definition (3.33) we also have
$$
\begin{equation}
I_{\Lambda(\xi)}^\omega(v) =\sum_{\boldsymbol{s} \in \Lambda(\xi)} \sum_{\boldsymbol{s}' \in \mathbb F}v_{\boldsymbol{s}'} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} (-1)^{|\boldsymbol{e}|_1} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}) L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}^\omega.
\end{equation}
\tag{3.36}
$$
Lemma 3.3. Under the assumptions of Theorem 3.2, for every $\xi > 1$ we have
$$
\begin{equation}
\|I_{\Lambda(\xi)}v-I_{\Lambda(\xi)}^{\omega} v\|_{L_2({\mathbb R}^\infty,X,\gamma)} \leqslant C\xi^{-(1/q-1/2)},
\end{equation}
\tag{3.37}
$$
where the constant $C$ is independent of $v$ and $\xi$. Proof. By the equality
$$
\begin{equation*}
\| L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}-L_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}^\omega\|_{L_2({\mathbb R}^\infty,\gamma)} =\| L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k} }\|_{L_2(\mathbb R^m\setminus B^m_\omega,\gamma)} \quad \forall\, (\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)
\end{equation*}
\notag
$$
and the triangle inequality, taking (3.35) and (3.36) into account we obtain
$$
\begin{equation*}
\begin{aligned} \, &\| I_{\Lambda(\xi)}v-I_{\Lambda(\xi)}^{\omega}v \|_{L_2({\mathbb R}^\infty,X,\gamma)} \\ &\qquad \leqslant \sum_{\boldsymbol{s} \in \Lambda(\xi)} \sum_{\boldsymbol{s}' \in \mathbb F} \|v_{\boldsymbol{s}'}\|_{X} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} |H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})| \|L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k} }\|_{L_2(\mathbb R^m\setminus B^m_\omega,\gamma)}. \end{aligned}
\end{equation*}
\notag
$$
Let $(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)$ be fixed. Then we have
$$
\begin{equation*}
L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}} = \prod_{j=1}^m L_{s_j-e_j;k_j}(y_j), \qquad \boldsymbol{y} \in \mathbb R^m,
\end{equation*}
\notag
$$
where $L_{s_j - e_j;k_j}$ is a polynomial in the variable $y_j$, of degree not greater than $m_1(\xi)$. Hence, applying Lemma 5.7 while taking account of (3.31) gives
$$
\begin{equation*}
\|L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}} \|_{L_2(\mathbb R^m\setminus B^m_\omega,\gamma)} \leqslant C\xi e^{- K_1\xi} \| L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}} \|_{L_2(\mathbb R^m,\gamma)}.
\end{equation*}
\notag
$$
From Lemmas 5.3 and 5.4 and Lemma 5.1, (ii), we derive that
$$
\begin{equation*}
\begin{aligned} \, \| L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}} \|_{L_2(\mathbb R^m,\gamma)} &= \prod_{j \in \mathbb N} \|L_{s_j-e_j;k_j}\|_{L_2(\mathbb R,\gamma)} \leqslant \prod_{j \in \mathbb N} e^{K_2 (s_j-e_j)} \\ &\leqslant \prod_{j \in \mathbb N} e^{K_2 s_j }=e^{K_2 |\boldsymbol{s}|_1} \leqslant e^{K_2 m_1(\xi)} \leqslant e^{K_3\xi^{1/(\theta q)}} \end{aligned}
\end{equation*}
\notag
$$
and
$$
\begin{equation}
\sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} |H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})| \leqslant e^{K_4|\boldsymbol{s}|_1} \leqslant e^{K_4 m_1(\xi)} \leqslant e^{K_5\xi^{1/(\theta q)}}.
\end{equation}
\tag{3.38}
$$
Summing up, we arrive at the inequality
$$
\begin{equation*}
\begin{aligned} \, &\| I_{\Lambda(\xi)}v-I_{\Lambda(\xi)}^{\omega}v \|_ {L_2({\mathbb R}^\infty,X,\gamma)} \\ &\qquad \leqslant C_1\xi \exp(- K_1\xi+(K_2+K_5) \xi^{1/(\theta q)}) \sum_{\boldsymbol{s} \in \Lambda(\xi)} \sum_{\boldsymbol{s}' \in \mathbb F} \|v_{\boldsymbol{s}'}\|_{X} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} 1 \\ &\qquad \leqslant C_1 \xi \exp\bigl(- K_1\xi+K_6 \xi^{1/(\theta q)}\bigr) |G(\xi)| \sum_{\boldsymbol{s}' \in \mathbb F} \|v_{\boldsymbol{s}'}\|_{X}. \end{aligned}
\end{equation*}
\notag
$$
Hence, by Lemmas 3.1 and 5.2 and the inequality $1/(\theta q) \leqslant 1/3$ we obtain
$$
\begin{equation*}
\| I_{\Lambda(\xi)}v-I_{\Lambda(\xi)}^{\omega}v \|_ {L_2({\mathbb R}^\infty,X,\gamma)} \leqslant C_2 \xi^2 \exp(- K_1\xi+K_6 \xi^{1/(\theta q)}) \leqslant C\xi^{-(1/q-1/2)}.
\end{equation*}
\notag
$$
The lemma is proved. Lemma 3.3 gives a bound on the second term on the right-hand side of (3.34), that is, an error bound for the approximation of the sparse-grid Lagrange interpolation $I_{\Lambda(\xi)}v$ by its truncation $I_{\Lambda(\xi)}^{\omega}v$ to $B^m_\omega$ for $v \in L_2({\mathbb R}^\infty,X,\gamma)$. As the next step we construct a deep ReLU neural network $\boldsymbol{\phi}_{\Lambda(\xi)}:=(\phi_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)}$ on ${\mathbb R}^m$ for approximating $I_{\Lambda(\xi)}^{\omega}v$ by the function $\Phi_{\Lambda(\xi)}v$ given as in (3.27), and we prove a bound on the corresponding error, the third term on the right-hand side of (3.34). For $s\in \mathbb{N}_0$ and $k \in \pi_s$ we represent the univariate interpolation polynomial $L_{s;k}$ as a linear combination of monomials:
$$
\begin{equation}
L_{s;k}(y)=: \sum_{\ell=0}^s b^{s;k}_\ell y^\ell.
\end{equation}
\tag{3.39}
$$
From (3.39), for each $(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)$ we have
$$
\begin{equation}
L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}} =\sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} b^{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}} \boldsymbol{y}^{\boldsymbol{\ell}},
\end{equation}
\tag{3.40}
$$
where the notation $\sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s} - \boldsymbol{e}}$ means that the sum is taken over all $\boldsymbol{\ell}$ such that $\boldsymbol{0} \leqslant \boldsymbol{\ell} \leqslant {\boldsymbol{s} - \boldsymbol{e}}$, and
$$
\begin{equation*}
b^{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}}=\prod_{j=1}^m b^{s_j- e_j;k_j}_{\ell_j}\quad\text{and} \quad \boldsymbol{y}^{\boldsymbol{\ell}}=\prod_{j=1}^m y_j^{\ell_j}.
\end{equation*}
\notag
$$
Indeed, we have
$$
\begin{equation*}
\begin{aligned} \, L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}} &=\prod_{j=1}^m L_{s_j-e_j;k_j}(y_j) =\prod_{j=1}^m\sum_{\ell_j=0}^{s_j-e_j} b^{s_j-e_j;k_j}_{\ell_j} y_j^{\ell_j} \\ &=\sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} \biggl( {\prod_{j=1}^m b^{s_j-e_j;k_j}_{\ell_j}} \biggr)\boldsymbol{y}^{\boldsymbol{\ell}} =\sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} b^{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}} \boldsymbol{y}^{\boldsymbol{\ell}}. \end{aligned}
\end{equation*}
\notag
$$
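For readers who wish to reproduce this construction numerically, the coefficients $b^{s;k}_\ell$ in (3.39) and their tensorization (3.40) can be computed directly from the interpolation nodes. The following Python sketch is illustrative only and is not part of the proof; consistently with (5.16), the nodes $y_{s;k}$ are taken to be the zeros of the Hermite polynomial $H_{s+1}$, and the function name is ours.
```python
import numpy as np
from numpy.polynomial import polynomial as P
from numpy.polynomial.hermite_e import hermegauss

def lagrange_monomial_coeffs(nodes, k):
    """Coefficients b^{s;k}_0, ..., b^{s;k}_s of L_{s;k} in the monomial
    basis (3.39), given the nodes (y_{s;0}, ..., y_{s;s})."""
    c = np.array([1.0])                                        # the constant polynomial 1
    for kp, ynode in enumerate(nodes):
        if kp == k:
            continue
        c = P.polymul(c, [-ynode, 1.0]) / (nodes[k] - ynode)   # *(y - y_{s;k'})/(y_{s;k} - y_{s;k'})
    return c

# univariate check: s = 4, nodes = zeros of H_5 (Gauss-Hermite points)
s, k = 4, 2
nodes, _ = hermegauss(s + 1)
b = lagrange_monomial_coeffs(nodes, k)
assert np.allclose(P.polyval(nodes, b), np.eye(s + 1)[k])      # L_{s;k}(y_{s;k'}) = delta_{k,k'}

# tensor-product coefficients as in (3.40), here with two active variables:
# b^{s-e;k}_l = b^{s_1-e_1;k_1}_{l_1} * b^{s_2-e_2;k_2}_{l_2}
b1 = lagrange_monomial_coeffs(hermegauss(3)[0], 1)             # s_1 - e_1 = 2
b2 = lagrange_monomial_coeffs(hermegauss(4)[0], 0)             # s_2 - e_2 = 3
b_tensor = np.multiply.outer(b1, b2)                           # b_tensor[l1, l2]
```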
By (3.36) and (3.40), for every $\boldsymbol{y} \in B^m_\omega$ we obtain
$$
\begin{equation}
I_{\Lambda(\xi)}^\omega(v)(\boldsymbol{y}) =\sum_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} (-1)^{|\boldsymbol{e}|_1} v(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}) \sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} b^{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}} (2\sqrt{\omega})^{|\boldsymbol{\ell}|_1} \prod_{j \in \operatorname{supp}(\boldsymbol{\ell})}\biggl(\frac{y_j}{2\sqrt{\omega}} \biggr)^{\ell_j}.
\end{equation}
\tag{3.41}
$$
Let $\boldsymbol{\ell} \in \mathbb{F}$ be such that $\boldsymbol{0} \leqslant \boldsymbol{\ell} \leqslant \boldsymbol{s} - \boldsymbol{e}$. By definition we have $\operatorname{supp}(\boldsymbol{\ell}) \subset \operatorname{supp}(\boldsymbol{s})$. Making the change of variables
$$
\begin{equation*}
\boldsymbol{x}=\frac{\boldsymbol{y}}{2\sqrt{\omega}}, \qquad \boldsymbol{y} \in \mathbb R^{|{\operatorname{supp}(\boldsymbol{s})}|},
\end{equation*}
\notag
$$
we have
$$
\begin{equation}
\prod_{j \in \operatorname{supp}(\boldsymbol{\ell})} \biggl(\frac{y_j}{2\sqrt{\omega}}\biggr)^{\ell_j} = \prod_{j \in \operatorname{supp}(\boldsymbol{\ell})} \varphi_1^{\ell_j} \biggl( \frac{y_j}{2\sqrt{\omega}}\biggr) \prod_{j \in \operatorname{supp}(\boldsymbol{s}) \setminus\operatorname{supp}(\boldsymbol{\ell})} \varphi_0 \biggl(\frac{y_j}{2\sqrt{\omega}}\biggr)=h^{\boldsymbol{s}- \boldsymbol{e}}_{\boldsymbol{\ell}}(\boldsymbol{x}),
\end{equation}
\tag{3.42}
$$
where
$$
\begin{equation}
h^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}}(\boldsymbol{x}) :=\prod_{j \in \operatorname{supp}(\boldsymbol{\ell})} \varphi_1^{\ell_j}(x_j) \prod_{j \in \operatorname{supp}(\boldsymbol{s})\setminus\operatorname{supp}(\boldsymbol{\ell})} \varphi_0(x_j),
\end{equation}
\tag{3.43}
$$
and $\varphi_0$ and $\varphi_1$ are the piecewise linear functions defined before Lemma 2.4. We put
$$
\begin{equation}
B_{\boldsymbol{s}}:= \max_{\boldsymbol{e} \in E_{\boldsymbol{s}}, \, \boldsymbol{k} \in \pi_{\boldsymbol{s}- \boldsymbol{e}}} \max_{\boldsymbol{0}\leqslant \boldsymbol{\ell}\leqslant \boldsymbol{s}- \boldsymbol{e}} |b^{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}}|
\end{equation}
\tag{3.44}
$$
and
$$
\begin{equation}
\delta^{-1} :=\xi^{1/q-1/2} \sum_{\boldsymbol{s} \in \Lambda(\xi)} e^{K|\boldsymbol{s}|_1} p_{\boldsymbol{s}}(2) (2\sqrt{\omega})^{|\boldsymbol{s}|_1} B_{\boldsymbol{s}},
\end{equation}
\tag{3.45}
$$
where $K$ is the constant from Lemma 5.3. Hence, applying Lemma 2.4 to the product defining $h^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}}$ in (3.43), we find that for every $(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)$ and every $\boldsymbol{\ell}$ satisfying $\boldsymbol{0} < \boldsymbol{\ell} \leqslant \boldsymbol{s}-\boldsymbol{e}$ there exists a deep ReLU neural network $\phi^{\boldsymbol{s} - \boldsymbol{e}}_{\boldsymbol{\ell}}$ on $\mathbb{R}^{|{\operatorname{supp}(\boldsymbol{s})}|}$ such that $\operatorname{supp}(\phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}}) \subset [-2,2]^{|{\operatorname{supp}(\boldsymbol{s})}|}$ and
$$
\begin{equation}
\begin{split} &\sup_{\boldsymbol{y} \in B^{|{\operatorname{supp}(\boldsymbol{s})}|}_{\omega}}\biggl| \prod_{j \in \operatorname{supp}(\boldsymbol{s})} \biggl(\frac{y_j}{2\sqrt{\omega}}\biggr)^{\ell_j} - \phi^{\boldsymbol{s}- \boldsymbol{e}}_{\boldsymbol{\ell}}\biggl(\frac{\boldsymbol{y}}{2\sqrt{\omega}}\biggr)\biggr| \\ &\qquad \leqslant \sup_{\boldsymbol{y} \in B^{|{\operatorname{supp}(\boldsymbol{s})}|}_{4\omega}}\biggl| h^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}} \biggl(\frac{\boldsymbol{y}}{2\sqrt{\omega}}\biggr) -\phi^{\boldsymbol{s}- \boldsymbol{e}}_{\boldsymbol{\ell}} \biggl(\frac{\boldsymbol{y}}{2\sqrt{\omega}}\biggr)\biggr| \leqslant \delta, \end{split}
\end{equation}
\tag{3.46}
$$
and
$$
\begin{equation}
\operatorname{supp}\biggl(\phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}}\biggl( \frac{\cdot}{2\sqrt{\omega}} \biggr) \biggr) \subset B^{|{\operatorname{supp}(\boldsymbol{s})}|}_{4\omega}.
\end{equation}
\tag{3.47}
$$
Also, from Lemma 2.4 and the inequalities $|\boldsymbol{\ell}|_1 +|\operatorname{supp}(\boldsymbol{s})\setminus\operatorname{supp}(\boldsymbol{\ell})| \leqslant |\boldsymbol{s}|_1 \leqslant \delta^{-1}$ one can see that
$$
\begin{equation}
W(\phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}}) \leqslant C \bigl(1+|\boldsymbol{s}|_1(\log|\boldsymbol{s}|_1+\log\delta^{-1})\bigr) \leqslant C(1+|\boldsymbol{s}|_1\log\delta^{-1})
\end{equation}
\tag{3.48}
$$
and
$$
\begin{equation}
L(\phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}}) \leqslant C \bigl(1+\log |\boldsymbol{s}|_1 (\log|\boldsymbol{s}|_1+\log \delta^{-1})\bigr) \leqslant C(1+\log|\boldsymbol{s}|_1 \log \delta^{-1}).
\end{equation}
\tag{3.49}
$$
We define the deep ReLU neural network $\phi_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}$ on $\mathbb{R}^{|{\operatorname{supp} (\boldsymbol{s})}|}$ by
$$
\begin{equation}
\phi_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}(\boldsymbol{y}) :=\sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} b^{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}} (2\sqrt{\omega})^{|\boldsymbol{\ell}|_1} \phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}} \biggl(\frac{\boldsymbol{y}}{2\sqrt{\omega}}\biggr), \qquad \boldsymbol{y} \in \mathbb R^{|{\operatorname{supp} (\boldsymbol{s})}|},
\end{equation}
\tag{3.50}
$$
which is the parallelization deep ReLU neural network of the component deep ReLU neural networks $\phi^{\boldsymbol{s} -\boldsymbol{e}}_{\boldsymbol{\ell}}({\cdot}/(2\sqrt{\omega}))$. From (3.47) it follows that
$$
\begin{equation}
\operatorname{supp} (\phi_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}) \subset B^{|{\operatorname{supp} (\boldsymbol{s})}|}_{4\omega}.
\end{equation}
\tag{3.51}
$$
According to the above convention, for $(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)$ we occasionally identify, without explicit mention, the functions $\phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}}$ and $\phi_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}$ on $\mathbb{R}^{|{\operatorname{supp}(\boldsymbol{s})}|}$ with their extensions to ${\mathbb R}^m$ or to ${\mathbb R}^\infty$, in view of the inclusions $\operatorname{supp}(\boldsymbol{s})\subset\{1,\dots,m\} \subset \mathbb{N}$. We define $\boldsymbol{\phi}_{\Lambda(\xi)}:=(\phi_{\boldsymbol{s} -\boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)}$ as the deep ReLU neural network on ${\mathbb R}^m$ which is realized by parallelization of the networks $\phi_{\boldsymbol{s}-\boldsymbol{e}; \boldsymbol{k}}$, $(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)$. Consider the approximation of $I_{\Lambda(\xi)}^{\omega}v$ by the function $\Phi_{\Lambda(\xi)} v$, where for convenience we recall that
$$
\begin{equation}
\Phi_{\Lambda(\xi)} v(\boldsymbol{y}) :=\sum_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} (-1)^{|\boldsymbol{e}|_1} v(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}) \phi_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}(\boldsymbol{y}).
\end{equation}
\tag{3.52}
$$
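Note that, once the snapshots $v(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})$ and the component networks have been computed, evaluating $\Phi_{\Lambda(\xi)}v$ is a plain linear combination, so the method is non-intrusive. A minimal Python sketch of (3.52) (illustrative only; the containers `triples`, `snapshots` and `phis` are assumed to be supplied externally, for example by a PDE solver and by the construction above):
```python
def evaluate_Phi(y, triples, snapshots, phis):
    """Evaluate (3.52) at a point y of R^m.

    triples   -- list of the index triples (s, e, k) in G(xi)
    snapshots -- snapshots[(s, e, k)] = v(y_{s-e;k}) (a scalar or a coefficient vector)
    phis      -- phis[(s, e, k)] = callable realizing the ReLU network phi_{s-e;k}
    """
    total = 0.0
    for (s, e, k) in triples:
        total += (-1) ** sum(e) * snapshots[(s, e, k)] * phis[(s, e, k)](y)
    return total
```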
Lemma 3.4. Under the assumptions of Theorem 3.2, for every $\xi > 1$ we have
$$
\begin{equation}
\| I_{\Lambda(\xi)}^{\omega} v-\Phi_{\Lambda(\xi)} v \|_{L_2(B^m_\omega ,X,\gamma)} \leqslant C \xi^{-(1/q-1/2)},
\end{equation}
\tag{3.53}
$$
where the constant $C$ is independent of $v$ and $\xi$. Proof. According to Lemma 3.1 the series (3.8) converges unconditionally to $v$. Hence, for every $\boldsymbol{y} \in B^m_\omega$, by (3.36) we have
$$
\begin{equation}
\begin{aligned} \, \notag I_{\Lambda(\xi)}^\omega(v) (\boldsymbol{y}) &=\sum_{\boldsymbol{s} \in \Lambda(\xi)} \sum_{\boldsymbol{s}' \in \mathbb F}v_{\boldsymbol{s}'} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} (-1)^{|\boldsymbol{e}|_1} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}) \sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} b^{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}} (2\sqrt{\omega})^{|\boldsymbol{\ell}|_1} \\ &\qquad \times\prod_{j \in \operatorname{supp}(\boldsymbol{s})} \biggl(\frac{y_j}{2\sqrt{\omega}}\biggr)^{\ell_j}, \end{aligned}
\end{equation}
\tag{3.54}
$$
and by (3.52)
$$
\begin{equation}
\begin{aligned} \, \notag \Phi_{\Lambda(\xi)} v (\boldsymbol{y}) =\sum_{\boldsymbol{s} \in \Lambda(\xi)} \sum_{\boldsymbol{s}' \in \mathbb F}v_{\boldsymbol{s}'} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} (-1)^{|\boldsymbol{e}|_1} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}) \\ \qquad\qquad \times \sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} b^{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}} (2\sqrt{\omega})^{|\boldsymbol{\ell}|_1} \phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}} \biggl(\frac{\boldsymbol{y}}{2\sqrt{\omega}}\biggr). \end{aligned}
\end{equation}
\tag{3.55}
$$
From these formulae and (3.46) we derive the inequality
$$
\begin{equation}
\begin{aligned} \, \notag &\| I_{\Lambda(\xi)}^{\omega}v-\Phi_{\Lambda(\xi)} v \|_{L_2(B^m_\omega,X,\gamma)} \\ &\qquad \leqslant\sum_{\boldsymbol{s} \in \Lambda(\xi)} \sum_{\boldsymbol{s}' \in \mathbb F}\|v_{\boldsymbol{s}'}\|_{X} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} |H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})| \sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} |b^{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}}| (2\sqrt{\omega})^{|\boldsymbol{\ell}|_1}\delta. \end{aligned}
\end{equation}
\tag{3.56}
$$
By (3.44) we have
$$
\begin{equation*}
\sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} |b^{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}}| \leqslant B_{\boldsymbol{s}} \prod_{j \in \operatorname{supp}(\boldsymbol{s}-\boldsymbol{e})} s_j \leqslant p_{\boldsymbol{s}}(1) B_{\boldsymbol{s}},
\end{equation*}
\notag
$$
and by Lemma 5.3
$$
\begin{equation}
\sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} |H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})| \leqslant \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} e^{K|\boldsymbol{s}-\boldsymbol{e}|_1} \leqslant 2^{|\boldsymbol{s}|_0}e^{K|\boldsymbol{s}|_1} \leqslant p_{\boldsymbol{s}}(1) e^{K|\boldsymbol{s}|_1}.
\end{equation}
\tag{3.57}
$$
This, in combination with (3.56), Lemma 3.1 and (3.45), yields
$$
\begin{equation}
\begin{aligned} \, \notag &\| I_{\Lambda(\xi)}^{\omega}v-\Phi_{\Lambda(\xi)} v \|_{L_2(B^m_\omega,X,\gamma)} \\ \notag &\qquad \leqslant \sum_{\boldsymbol{s} \in \Lambda(\xi)}\delta B_{\boldsymbol{s}} p_{\boldsymbol{s}}(1)\sum_{\boldsymbol{s}' \in \mathbb F}\|v_{\boldsymbol{s}'}\|_{X} (2\sqrt{\omega})^{|\boldsymbol{s}|_1} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} |H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})| \\ &\qquad \leqslant \sum_{\boldsymbol{s}' \in \mathbb F}\|v_{\boldsymbol{s}'}\|_{X} \delta \sum_{\boldsymbol{s} \in \Lambda(\xi)} e^{K|\boldsymbol{s}|_1}p_{\boldsymbol{s}}(2)(2\sqrt{\omega})^{|\boldsymbol{s}|_1} B_{\boldsymbol{s}} \leqslant C\xi^{-(1/q-1/2)}. \end{aligned}
\end{equation}
\tag{3.58}
$$
The lemma is proved. In Lemma 3.4 we proved a bound on the third term in the right-hand side of (3.34), that is, an error bound for the approximation of $I_{\Lambda(\xi)}^{\omega}v$ by the function $\Phi_{\Lambda(\xi)}v$ for $v \in L_2({\mathbb R}^\infty,X,\gamma)$. As the last step of error estimation we establish a bound for the fourth term in the right-hand side of (3.34). Lemma 3.5. Under the assumptions of Theorem 3.2, for every $\xi > 1$ we have
$$
\begin{equation}
\|{\Phi_{\Lambda(\xi)} v} \|_{L_2(({\mathbb R}^m \setminus B^m_\omega) ,X,\gamma)} \leqslant C\xi^{-(1/q-1/2)},
\end{equation}
\tag{3.59}
$$
where the constant $C$ is independent of $v$ and $\xi$. Proof. We use formula (3.55) to estimate the norm $\|\Phi_{\Lambda(\xi)} v \|_{L_2(({\mathbb R}^m \setminus B^m_\omega),X,\gamma)}$. We need the following auxiliary inequality
$$
\begin{equation}
\biggl|\phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}} \biggl(\frac{\boldsymbol{y}}{2\sqrt{\omega}}\biggr)\biggr| \leqslant 2 \quad \forall\, \boldsymbol{y} \in {\mathbb R}^m.
\end{equation}
\tag{3.60}
$$
Due to (3.47), it is sufficient to prove this inequality for $\boldsymbol{y} \in B^{|\operatorname{supp}(\boldsymbol{s})|}_{4\omega}$. Considering the right-hand side of (3.45), we have
$$
\begin{equation}
\sum_{\boldsymbol{s} \in \Lambda(\xi)} e^{K|\boldsymbol{s}|_1} p_{\boldsymbol{s}}(2) (2\sqrt{\omega})^{|\boldsymbol{s}|_1} B_{\boldsymbol{s}} \geqslant e^{K|{\boldsymbol{0}}|_1} p_{\boldsymbol{0}}(2) (2\sqrt{\omega})^{|{\boldsymbol{0}}|_1} B_{\boldsymbol{0}}=1.
\end{equation}
\tag{3.61}
$$
In combination with the definition (3.45), this yields $\delta \leqslant 1$. On the other hand, by the definition of $h^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}}$,
$$
\begin{equation*}
\sup_{\boldsymbol{y} \in B^{|{\operatorname{supp}(\boldsymbol{s})}|}_{4\omega}} \biggl |h^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}} \biggl( \frac{\boldsymbol{y}}{2\sqrt{\omega}}\biggr)\biggr| \leqslant 1.
\end{equation*}
\notag
$$
From the last two inequalities, (3.46) and the triangle inequality we derive (3.60) for $\boldsymbol{y} \in B^{|\operatorname{supp}(\boldsymbol{s})|}_{4\omega}$.
By (3.60) and Lemma 5.7,
$$
\begin{equation*}
\biggl\|\phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}} \biggl( \frac{\cdot}{2\sqrt{\omega}}\biggr)\biggr\|_{L_2({\mathbb R}^m \setminus B^m_\omega,\gamma)} \leqslant 2\|1\|_{L_2({\mathbb R}^m \setminus B^m_\omega,\gamma)} \leqslant C_1 m \exp (- K_1\omega ).
\end{equation*}
\notag
$$
This, in combination with (3.55), implies that
$$
\begin{equation*}
\begin{aligned} \, \|\Phi_{\Lambda(\xi)} v \|_{L_2({\mathbb R}^m \setminus B^m_\omega ,X,\gamma)} &\leqslant \sum_{\boldsymbol{s} \in \Lambda(\xi)} \sum_{\boldsymbol{s}' \in \mathbb F} \|v_{\boldsymbol{s}'}\|_X\sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} |H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})| \\ &\qquad \times \sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} |b^{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}}| (2\sqrt{\omega})^{|\boldsymbol{\ell}|_1} \biggl\|\phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}} \biggl(\frac{\cdot}{2\sqrt{\omega}}\biggr)\biggr\|_{L_2({\mathbb R}^m \setminus B^m_\omega,\gamma)} \\ &\leqslant C_1m \exp(-K_1\omega) \sum_{\boldsymbol{s} \in \Lambda(\xi)} (2\sqrt{\omega})^{|\boldsymbol{s}|_1}\sum_{\boldsymbol{s}' \in \mathbb F} \|v_{\boldsymbol{s}'}\|_X\sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} \\ &\qquad \times\sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} |H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})| \sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} |b^{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}}|. \end{aligned}
\end{equation*}
\notag
$$
Using Lemma 5.6 together with a tensor product argument and the inequality $\boldsymbol{s} - \boldsymbol{e} \leqslant \boldsymbol{s}$ for $\boldsymbol{e} \in E_{\boldsymbol{s}}$, we deduce the estimates
$$
\begin{equation}
\sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} |b^{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}}| \leqslant e^{K_2|\boldsymbol{s}|_1} \boldsymbol{s}! \leqslant e^{K_2|\boldsymbol{s}|_1} |\boldsymbol{s}|_1^{|\boldsymbol{s}|_1},
\end{equation}
\tag{3.62}
$$
which, together with (3.57), gives us
$$
\begin{equation}
\begin{aligned} \, \notag \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} | H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})| \sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} |b^{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}}| &\leqslant \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} |H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})| e^{K_2|\boldsymbol{s}|_1} |\boldsymbol{s}|_1^{|\boldsymbol{s}|_1} \\ &\leqslant p_{\boldsymbol{s}}(1) e^{K_2|\boldsymbol{s}|_1} |\boldsymbol{s}|_1^{|\boldsymbol{s}|_1}. \end{aligned}
\end{equation}
\tag{3.63}
$$
In combination with (3.31), (3.38) and Lemma 3.1, this allows us to continue the estimate as follows:
$$
\begin{equation}
\begin{aligned} \, \notag &\| \Phi_{\Lambda(\xi)} v \| _{L_2({\mathbb R}^m \setminus B^m_\omega ,X,\gamma)} \\ \notag &\qquad \leqslant C_1m \exp (- K_1\omega ) \sum_{\boldsymbol{s}' \in \mathbb F} \|v_{\boldsymbol{s}'}\|_X \sum_{\boldsymbol{s} \in \Lambda(\xi)} (2\sqrt{\omega})^{|\boldsymbol{s}|_1} p_{\boldsymbol{s}}(1) e^{K_2|\boldsymbol{s}|_1} |\boldsymbol{s}|_1^{|\boldsymbol{s}|_1} \\ \notag &\qquad \leqslant C_2m \exp (- K_1\omega ) \sum_{\boldsymbol{s} \in \Lambda(\xi)} (2\sqrt{\omega})^{|\boldsymbol{s}|_1} p_{\boldsymbol{s}}(1) e^{K_2|\boldsymbol{s}|_1} |\boldsymbol{s}|_1^{|\boldsymbol{s}|_1} \\ &\qquad \leqslant C_2\xi \exp (- K_1 \xi ) \bigl({C_3 \xi^{1/2}} \bigr)^{m_1(\xi)} e^{K_2{m_1(\xi)}} [m_1(\xi)]^{m_1(\xi)} \sum_{\boldsymbol{s} \in \Lambda(\xi)} p_{\boldsymbol{s}}(1). \end{aligned}
\end{equation}
\tag{3.64}
$$
By the assumptions of Theorem 3.2, $\| {\boldsymbol{p}(\theta)\boldsymbol{\sigma}^{-1}}\|_ {\ell_q(\mathcal F)} \leqslant C < \infty$ for some $\theta \geqslant 3/q$. From this we derive that
$$
\begin{equation*}
\biggl\|{\boldsymbol{p}\biggl(\frac 3q\biggr) \boldsymbol{\sigma}^{-1}} \biggr\|_{\ell_q(\mathcal F)} \leqslant \|{\boldsymbol{p}(\theta)\boldsymbol{\sigma}^{-1}} \|_{\ell_q(\mathcal F)} \leqslant C < \infty.
\end{equation*}
\notag
$$
Applying Lemma 5.1, (i), gives
$$
\begin{equation*}
\sum_{\boldsymbol{s} \in \Lambda(\xi)} p_{\boldsymbol{s}}(1) \leqslant \sum_{\boldsymbol{s} \in \Lambda(\xi)} p_{\boldsymbol{s}}(3) \leqslant C\xi.
\end{equation*}
\notag
$$
Hence by (3.64) and Lemma 5.1, (ii) we have
$$
\begin{equation*}
\begin{aligned} \, &\| {\Phi_{\Lambda(\xi)} v} \|_ {L_2({\mathbb R}^m \setminus B^m_\omega ,X,\gamma)} \\ &\qquad \leqslant C_2\xi \exp (- K_1 \xi) ({C_3 \xi^{1/2}})^{K_{q,\theta} \xi^{1/(\theta q)}} e^{K_3K_{q,\theta} \xi^{1/(\theta q)}} (K_{q,\theta} \xi^{1/(\theta q)})^{K_{q,\theta} \xi^{1/(\theta q)}} C_4 \xi \\ &\qquad \leqslant C_5 \xi^2 \exp(- K_1 \xi+K_4\xi^{1/(\theta q)} \log \xi+K_5 \xi^{1/(\theta q)}). \end{aligned}
\end{equation*}
\notag
$$
Since $1/(\theta q) \leqslant 1/3$, we obtain
$$
\begin{equation*}
\|\Phi_{\Lambda(\xi)} v\|_{L_2({\mathbb R}^m \setminus B^m_\omega ,X,\gamma)} \leqslant C \xi^{-(1/q -1/2)}.
\end{equation*}
\notag
$$
Lemma 3.5 is proved. To complete the proof of Theorem 3.2, we have to establish bounds on the size and depth of the deep ReLU neural network $\boldsymbol{\phi}_{\Lambda(\xi)}$ as in (iii) and (iv). Lemma 3.6. Under the assumptions of Theorem 3.2 the input dimension of $\boldsymbol{\phi}_{\Lambda(\xi)}$ is at most $\lfloor K_q \xi \rfloor$ for every $\xi \,{>}\, 1$, and the output dimension of $\boldsymbol{\phi}_{\Lambda(\xi)}$ is at most $\lfloor C_q \xi \rfloor$,
$$
\begin{equation}
W(\boldsymbol{\phi}_{\Lambda(\xi)}) \leqslant C \xi^{1+2/(\theta q)} \log \xi
\end{equation}
\tag{3.65}
$$
and
$$
\begin{equation}
L(\boldsymbol{\phi}_{\Lambda(\xi)}) \leqslant C \xi^{1/(\theta q)} (\log \xi)^2,
\end{equation}
\tag{3.66}
$$
where the constants $C$ are independent of $v$ and $\xi$. Proof. The input dimension of $\boldsymbol{\phi}_{\Lambda(\xi)}$ is not greater than $m(\xi)$, which is at most $\lfloor K_q \xi \rfloor$ by Lemma 5.1, (iii). The output dimension of $\boldsymbol{\phi}_{\Lambda(\xi)}$ is the quantity $|G(\xi)|$ which is at most $\lfloor C_q \xi \rfloor$ by Lemma 5.2.
By Lemmas 2.1 and 2.4 and (3.48) the size of $\boldsymbol{\phi}_{\Lambda(\xi)} $ is estimated as
$$
\begin{equation}
\begin{aligned} \, \notag W(\boldsymbol{\phi}_{\Lambda(\xi)}) &\leqslant\sum_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} W (\phi_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}) \leqslant\sum_{\boldsymbol{s} \in \Lambda(\xi)} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} \sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} W(\phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}}) \\ &\leqslant C_1 \sum_{\boldsymbol{s} \in \Lambda(\xi)} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} \sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} (1+|\boldsymbol{s}|_1 \log \delta^{-1}), \end{aligned}
\end{equation}
\tag{3.67}
$$
where, we recall,
$$
\begin{equation*}
\delta^{-1} :=\xi^{1/q-1/2} \sum_{\boldsymbol{s} \in \Lambda(\xi)} e^{K_1|\boldsymbol{s}|_1} p_{\boldsymbol{s}}(2) (2\sqrt{\omega})^{|\boldsymbol{s}|_1} B_{\boldsymbol{s}}
\end{equation*}
\notag
$$
and
$$
\begin{equation*}
B_{\boldsymbol{s}}:= \max_{\boldsymbol{e} \in E_{\boldsymbol{s}}, \, \boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} \max_{\boldsymbol{0}\leqslant \boldsymbol{\ell}\leqslant \boldsymbol{s}- \boldsymbol{e}} |b^{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}}|.
\end{equation*}
\notag
$$
From (3.62) it follows that
$$
\begin{equation*}
B_{\boldsymbol{s}} \leqslant \max_{\boldsymbol{e} \in E_{\boldsymbol{s}}, \, \boldsymbol{k} \in \pi_{\boldsymbol{s}- \boldsymbol{e}}} \sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} |{b^{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}_{\boldsymbol{\ell}}}| \leqslant \exp(K_2 \xi^{1/(\theta q)} \log \xi),
\end{equation*}
\notag
$$
which by Lemma 5.1, (i), implies that
$$
\begin{equation*}
\begin{aligned} \, \delta^{-1} &\leqslant \xi^{1/q-1/2}\exp\bigl(K_2 \xi^{1/(\theta q)} \log \xi\bigr) \sum_{\boldsymbol{s} \in \Lambda(\xi)} p_{\boldsymbol{s}}(2) \\ & \leqslant C_2\xi^{1/q +1/2}\exp\bigl(K_3 \xi^{1/(\theta q)} \log \xi\bigr) \leqslant C_2\exp\bigl(K_3 \xi^{1/(\theta q)} \log \xi\bigr). \end{aligned}
\end{equation*}
\notag
$$
Hence
$$
\begin{equation}
\log(\delta^{-1}) \leqslant K_4 \xi^{1/(\theta q)} \log \xi,
\end{equation}
\tag{3.68}
$$
and, consequently,
$$
\begin{equation*}
(1+|\boldsymbol{s}|_1 \log \delta^{-1}) \leqslant \bigl({ 1+|\boldsymbol{s}|_1 K_4 \xi^{1/(\theta q)} \log \xi}\bigr) \leqslant C_2 \xi^{2/(\theta q)} \log \xi.
\end{equation*}
\notag
$$
Now from (3.67) and Lemma 5.2 we obtain the required bound on the size of $\boldsymbol{\phi}_{\Lambda(\xi)}$:
$$
\begin{equation*}
\begin{aligned} \, W(\boldsymbol{\phi}_{\Lambda(\xi)}) &\leqslant C_2 \xi^{2/(\theta q)} \log \xi \sum_{\boldsymbol{s} \in \Lambda(\xi)} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} \sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s}-\boldsymbol{e}} 1 \\ &\leqslant C_2 \xi^{2/(\theta q)} \log \xi \sum_{ (\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)}p_{\boldsymbol{s}}(1) \leqslant C_3 \xi^{1+2/(\theta q)} \log \xi. \end{aligned}
\end{equation*}
\notag
$$
Using Lemma 2.1, (3.49), (3.68) and Lemma 5.1, (ii), we prove that the depth of $\boldsymbol{\phi}_{\Lambda(\xi)}$ is bounded as in (3.66):
$$
\begin{equation*}
\begin{aligned} \, L({\boldsymbol{\phi}_{\Lambda(\xi)}}) &\leqslant \max_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} L (\phi_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}) \leqslant \max_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} \max_{\boldsymbol{0}\leqslant \boldsymbol{\ell} \leqslant \boldsymbol{s}-\boldsymbol{e}} L({\phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}}}) \\ & \leqslant C_4 \max_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} \max_{\boldsymbol{0}\leqslant \boldsymbol{\ell} \leqslant \boldsymbol{s}-\boldsymbol{e}} ({ 1+\log |\boldsymbol{s}|_1 \log \delta^{-1}}) \\ & \leqslant C_4 \max_{\boldsymbol{s} \in \Lambda(\xi)} ({ 1+\log |\boldsymbol{s}|_1 \log \delta^{-1}}) \\ & \leqslant C_4 \max_{\boldsymbol{s} \in \Lambda(\xi)} \bigl( 1+\log ({K_{q,\theta} \xi^{1/(\theta q)}} )({K_5 \xi^{1/(\theta q)} \log \xi)} \bigr) \leqslant C_5 \xi^{1/(\theta q)} (\log \xi)^2. \end{aligned}
\end{equation*}
\notag
$$
Lemma 3.6 is proved. We are now in a position to give a formal proof of Theorem 3.2. Proof of Theorem 3.2. From (3.34), Theorem 3.1 and Lemmas 3.3–3.5, for every ${\xi \!>\! 2}$ we deduce that
$$
\begin{equation*}
\|v-\Phi_{\Lambda(\xi)} v\|_{L_2({\mathbb R}^\infty,X,\gamma)} \leqslant C \xi^{- (1/q-1/2)}.
\end{equation*}
\notag
$$
Claim (vi) is proved. Claim (i) follows directly from the construction of the deep ReLU neural network $\boldsymbol{\phi}_{\Lambda(\xi)}$ and the sequence of points $Y_{\Lambda(\xi)}$, claim (ii) follows from Lemma 5.2, claims (iii) and (iv) from Lemma 3.6, and claim (v) from Lemma 5.1, (ii), and (3.51). Thus, Theorem 3.2 is proved for $U={\mathbb R}^\infty$.
The case $U= {\mathbb R}^M$ can be dealt with in the same way with slight modifications. Counterparts of all definitions, formulae and assertions which were used in the proof for $U= {\mathbb R}^\infty$ are true for $U= {\mathbb R}^M$. In particular, the equality $\|H_{\boldsymbol{s}}\|_{L_2({\mathbb R}^\infty,\gamma)}=1$, $\boldsymbol{s} \in \mathbb{F}$, which we used in that proof, is replaced by the inequality $\|H_{\boldsymbol{s}}\|_{L_\infty^{\sqrt{g}}({\mathbb R}^M)} < 1$, $\boldsymbol{s} \in \mathbb{N}_0^M$. The theorem is proved.
§ 4. Application to parametrized elliptic PDEs In this section we apply the results of the previous section to the deep ReLU neural network approximation of the solution $u$ to the parametrized elliptic PDEs (1.2) with lognormal inputs (1.3). This is based on the weighted $\ell_2$-summability of the series $(\|u_{\boldsymbol{s}}\|_V)_{\boldsymbol{s} \in \mathcal{F}}$ in the following lemma, which was proved in [4], Theorems 3.3 and 4.2. Lemma 4.1. Assume that there exist a number $q$, $0<q<\infty$, and an increasing sequence $\boldsymbol{\rho} =(\rho_{j})_{j \in \mathcal{N}}$ of numbers strictly larger than 1 such that $\|{\boldsymbol{\rho}^{-1}}\|_{\ell_q(\mathcal N)} \leqslant C < \infty$ and
$$
\begin{equation*}
\biggl\| \sum _{j \in \mathcal N} \rho_j |\psi_j| \biggr\|_{L_\infty(D)} \leqslant C <\infty,
\end{equation*}
\notag
$$
where the constants $C$ are independent of $J$. Then for any $\eta \in \mathbb{N}$ we have
$$
\begin{equation}
\sum_{\boldsymbol{s}\in\mathcal F} (\sigma_{\boldsymbol{s}} \|u_{\boldsymbol{s}}\|_V)^2 \leqslant C < \infty \quad \textit{for } \sigma_{\boldsymbol{s}}^2:=\sum_{\|\boldsymbol{s}'\|_{\ell_\infty(\mathcal F)}\leqslant \eta}{\binom {\boldsymbol{s}}{ \boldsymbol{s}'}} \prod_{j \in \mathcal N}\rho_j^{2s_j'},
\end{equation}
\tag{4.1}
$$
where the constant $C$ is independent of $J$. The following two lemmas are proved in [15] (Lemmas 5.2 and 5.3, respectively). Lemma 4.2. Let the assumptions of Lemma 4.1 hold. Then the solution map $\boldsymbol{y} \mapsto u(\boldsymbol{y})$ is $\gamma$-measurable and $u \in L_2(U,V,\gamma)$. Moreover, $u \in L_2^\mathcal{E}(U,V,\gamma)$, where the set
$$
\begin{equation}
\mathcal E :=\Bigl\{\boldsymbol{y} \in {\mathbb R}^\infty\colon\sup_{j \in \mathbb N} \rho_j^{-1} |y_j| < \infty \Bigr\}
\end{equation}
\tag{4.2}
$$
has measure $\gamma(\mathcal{E}) =1$ and contains all the $\boldsymbol{y} \in {\mathbb R}^\infty$ with $|\boldsymbol{y}|_0 < \infty$ in the case when $U={\mathbb R}^\infty$. Lemma 4.3. Let $0 < q <\infty$ and let $\boldsymbol{\rho}=(\rho_j) _{j \in \mathcal{N}}$ be a sequence of positive numbers such that $\|{\boldsymbol{\rho}^{-1}} \|_{\ell_q(\mathcal N)} \leqslant C < \infty$, where the constant $C$ is independent of $J$. Let $\theta$ be an arbitrary nonnegative number and $\boldsymbol{p}(\theta)=(p_{\boldsymbol{s}}(\theta))_{\boldsymbol{s} \in \mathcal{F}}$ the set defined as in (3.20). For $\eta \in \mathbb{N}$ let the set $\boldsymbol{\sigma}=(\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}}$ be defined as in (4.1). Then for any $\eta >2(\theta+1)/q$, we have
$$
\begin{equation*}
\|{\boldsymbol{p}(\theta)\boldsymbol{\sigma}^{-1}}\| _ {\ell_q(\mathcal F)} \leqslant C < \infty,
\end{equation*}
\notag
$$
where the constant $C$ is independent of $J$. Now we are in a position to formulate our main results on collocation deep ReLU neural network approximation of the solution $u$ to parametric elliptic PDEs with lognormal inputs. Theorem 4.1. Under the assumptions of Lemma 4.1, let $0 < q < 2$. Then, given an arbitrary number $\delta > 0$, for every integer $n > 2$ we can construct a deep ReLU neural network $\boldsymbol{\phi}_{\Lambda(\xi_n)}:=(\phi_{\boldsymbol{s} -\boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)}$ of size $W(\boldsymbol{\phi}_{\Lambda(\xi_n)}) \leqslant n$ on $\mathbb{R}^m$, where
$$
\begin{equation*}
m := \begin{cases} \min \biggl\{M, \biggl\lfloor K \biggl(\dfrac{n}{\log n}\biggr)^{1/(1+\delta)} \biggr\rfloor \biggr\} &\textit{if } U={\mathbb R}^M, \\ \biggl\lfloor K \biggl(\dfrac{n}{\log n}\biggr)^{1/(1+\delta)} \biggr\rfloor &\textit{if }U={\mathbb R}^\infty, \end{cases}
\end{equation*}
\notag
$$
and a sequence of points $Y_{\Lambda(\xi_n)}:=(\boldsymbol{y}_{\boldsymbol{s} -\boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)}$ having the following properties: - (i) the deep ReLU neural network $\boldsymbol{\phi}_{\Lambda(\xi_n)}$ and the sequence of points $Y_{\Lambda(\xi_n)}$ are independent of $u$;
- (ii) the output dimension of $\boldsymbol{\phi}_{\Lambda(\xi_n)}$ is at most $\lfloor K ({n}/{\log n})^{1/(1+\delta)} \rfloor $;
- (iii) $L (\boldsymbol{\phi}_{\Lambda(\xi_n)} )\leqslant C_\delta (n/\log n)^{\delta/(2(1+\delta))}(\log n)^2$;
- (iv) the components $\phi_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}$, of $\boldsymbol{\phi}_{\Lambda(\xi_n)}$, $(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)$, are deep ReLU neural networks on $\mathbb{R}^{m_{\boldsymbol{s}}}$, where $m_{\boldsymbol{s}} \leqslant C_\delta n^\delta$, with support in the super-cube $[-T,T]^{m_{\boldsymbol{s}}}$, where $T:=C_\delta (n/\log n)^{1/(2(1+\delta))}$;
- (v) the approximation of $u$ by $\Phi_{\Lambda(\xi_n)}u$ defined as in (3.27) has the error estimate
$$
\begin{equation*}
\| u- \Phi_{\Lambda(\xi_n)} u \|_{\mathcal L(U,V)} \leqslant C \biggl(\frac{n}{\log n}\biggr)^{-(1/q-1/2)/(1+\delta)}.
\end{equation*}
\notag
$$
Here the constants $C$, $K$ and $C_\delta$ are independent of $J$, $u$ and $n$. Proof. To prove the theorem we apply Theorem 3.2 to the solution $u$. Without loss of generality we can assume that $\delta \leqslant 1/6$. First we take the number $\theta := 2/(\delta q)$, satisfying $\theta \geqslant 3/q$, and then choose a number $\eta \in \mathbb{N}$ satisfying $\eta > 2(\theta + 1)/q$. Using Lemmas 4.1–4.3 one can check that $u \in L_2^\mathcal{E}(U,V,\gamma)$ satisfies the assumptions of Theorem 3.2 for $X=V$ and the set $(\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathbb{F}}$ defined as in (4.1), where $\mathcal{E}$ is the set defined in Lemma 4.2. Given an integer $n > 2$, we choose $\xi_n >2$ as the largest number satisfying $C\xi_n^{1+\delta} \log \xi_n \leqslant n$, where $C$ is the constant in claim (ii) of Theorem 3.2. It is easy to verify that there exist positive constants $C_1$ and $C_2$ independent of $n$ such that
$$
\begin{equation*}
C_1\biggl(\frac{n}{\log n}\biggr)^{1/(1+\delta)} \leqslant \xi_n \leqslant C_2\biggl(\frac{n}{\log n}\biggr)^{1/(1+\delta)}.
\end{equation*}
\notag
$$
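The choice of $\xi_n$ is easy to realize numerically, for instance by bisection; the following toy Python sketch (illustrative only, with hypothetical values of $C$ and $\delta$) also confirms that the ratio of $\xi_n$ to $({n}/{\log n})^{1/(1+\delta)}$ stays bounded away from $0$ and $\infty$.
```python
import numpy as np

def xi_n(n, C=1.0, delta=1/6):
    """Largest xi > 2 with C * xi**(1 + delta) * log(xi) <= n (toy constants)."""
    f = lambda x: C * x ** (1 + delta) * np.log(x)
    lo, hi = 2.0, 2.0
    while f(hi) <= n:                       # bracket the threshold
        hi *= 2.0
    for _ in range(80):                     # bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) <= n else (lo, mid)
    return lo

delta = 1 / 6
for n in [10 ** 3, 10 ** 5, 10 ** 7]:
    xi = xi_n(n, delta=delta)
    print(n, xi, xi / (n / np.log(n)) ** (1 / (1 + delta)))    # the ratio stays bounded
```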
From Theorem 3.2 for $\xi=\xi_n$ we deduce the required results. Theorem 4.1 is proved. From Theorem 4.1 one can directly derive the following. Theorem 4.2. Under the assumptions of Lemma 4.1 let $0 < q < 2$ and $\delta_q:=\min (1, 1/q -1/2)$. Then, given an arbitrary number $\delta \in (0,\delta_q)$, for every integer $n > 1$ we can construct a deep ReLU neural network $\boldsymbol{\phi}_{\Lambda(\xi_n)}:=(\phi_{\boldsymbol{s} -\boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)}$ of size $W\big(\boldsymbol{\phi}_{\Lambda(\xi_n)}\big) \leqslant n$ on $\mathbb{R}^m$, where
$$
\begin{equation*}
m := \begin{cases} \min \{M, \lfloor K n^{1-\delta} \rfloor\}&\textit{if }U={\mathbb R}^M, \\ \lfloor K n^{1-\delta} \rfloor&\textit{if }U={\mathbb R}^\infty, \end{cases}
\end{equation*}
\notag
$$
and a sequence of points $Y_{\Lambda(\xi_n)}\!:=\! (\boldsymbol{y}_{\boldsymbol{s} - \boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)}$ with the following properties: - (i) the deep ReLU neural network $\boldsymbol{\phi}_{\Lambda(\xi_n)}$ and the sequence of points $Y_{\Lambda(\xi_n)}$ are independent of $u$;
- (ii) the output dimension of $\boldsymbol{\phi}_{\Lambda(\xi_n)}$ is at most $\lfloor K n^{1-\delta} \rfloor$;
- (iii) $L(\boldsymbol{\phi}_{\Lambda(\xi_n)})\leqslant C_\delta n^\delta$;
- (iv) the components $\phi_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}$, of $\boldsymbol{\phi}_{\Lambda(\xi_n)}$, $(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)$, are deep ReLU neural networks on $\mathbb{R}^{m_{\boldsymbol{s}}}$, where $m_{\boldsymbol{s}} \leqslant C_\delta n^\delta$, with support in the super-cube $[-T,T]^{m_{\boldsymbol{s}}}$, where $T:= C_\delta n^{1 - \delta}$;
- (v) the approximation of $u$ by $\Phi_{\Lambda(\xi_n)}u$ defined as in (3.27) has the error estimates
$$
\begin{equation}
\| u- \Phi_{\Lambda(\xi_n)} u \|_{\mathcal L(U,V)} \leqslant C m^{-(1/q-1/2)} \leqslant C_\delta n^{- (1-\delta)(1/q-1/2)}.
\end{equation}
\tag{4.3}
$$
Here the constants $K$, $C$ and $C_\delta$ are independent of $J$, $u$ and $n$. Let us compare the collocation approximation of $u$ by the function
$$
\begin{equation}
\Phi_{\Lambda(\xi_n)}u :=\sum_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)} (-1)^{|\boldsymbol{e}|_1} u(\boldsymbol{y}_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}})\phi_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}},
\end{equation}
\tag{4.4}
$$
generated from the deep ReLU neural network $\boldsymbol{\phi}_{\Lambda(\xi_n)}$ as in Theorem 4.2 and the collocation approximation of $u$ by the sparse-grid Lagrange GPC interpolation
$$
\begin{equation}
I_{\Lambda(\xi_n)}u :=\sum_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)} (-1)^{|\boldsymbol{e}|_1} u(\boldsymbol{y}_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}})L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}.
\end{equation}
\tag{4.5}
$$
Both methods are based on the same particular solvers $(u(\boldsymbol{y}_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}))_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)}$. From Corollary 3.1 one can see that, under the assumptions of Theorem 4.2, for the latter approximation we have the error bound in terms of $m$
$$
\begin{equation*}
\|u- I_{\Lambda(\xi_n)} u\|_{\mathcal L(U, V)} \leqslant C m^{-(1/q-1/2)},
\end{equation*}
\notag
$$
which is the same as the bound in terms of $m$ in (4.3) for the former approximation, since by construction the parameter $m$ in (4.3) can be treated as an independent parameter. After the present paper and [17] appeared on arXiv, we were informed, through private communication with its authors, about the paper [50], which is concerned with some problems similar to those considered in [17].
§ 5. Appendix. 5.1. Auxiliary lemmas. Lemma 5.1. Let $\theta\geqslant 0$ and $0<q<\infty$. Let $\boldsymbol{\sigma}= (\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}}$ be a set of numbers strictly larger than $1$. Then the following hold. (i) If $\|{\boldsymbol{p} (\theta/q)\boldsymbol{\sigma}^{-1}}\|_{\ell_q(\mathcal F)} \leqslant K < \infty$, where the constant $K$ is independent of $J$, then
$$
\begin{equation}
\sum_{ \boldsymbol{s}\in \Lambda(\xi)} p_{\boldsymbol{s}}(\theta) \leqslant K \xi \quad \forall\, \xi > 1.
\end{equation}
\tag{5.1}
$$
In particular, if $\|{\boldsymbol{\sigma}^{-1}}\|_{\ell_q(\mathcal F)}^q \leqslant K_q < \infty$, where the constant satisfies $K_q \geqslant 1$ and is independent of $J$, then the set $\Lambda(\xi)$ is finite and
$$
\begin{equation}
|\Lambda(\xi)| \leqslant K_q \xi \quad \forall\, \xi > 1.
\end{equation}
\tag{5.2}
$$
(ii) If $\|{\boldsymbol{p}(\theta)\boldsymbol{\sigma}^{-1}} \|_{\ell_q(\mathcal F)}^{1/\theta} \leqslant K_{q,\theta} < \infty$, where the constant $K_{q,\theta}$ is independent of $J$, then
$$
\begin{equation}
m_1(\xi) \leqslant K_{q,\theta} \xi^{1/(\theta q)} \quad \forall\, \xi > 1.
\end{equation}
\tag{5.3}
$$
(iii) If $\sigma_{\boldsymbol{e}^{i'}} \leqslant \sigma_{\boldsymbol{e}^i}$ for $i' < i$, and if $\|\boldsymbol{\sigma}^{-1}\|_{\ell_q(\mathcal F)}^q \leqslant K_q < \infty$, where the constant satisfies $K_q \geqslant 1$ and is independent of $J$, then
$$
\begin{equation}
m(\xi) \leqslant K_q \xi \quad \forall\, \xi > 1.
\end{equation}
\tag{5.4}
$$
Proof. The claims (ii) and (iii) were proved in [17] (Lemmas 3.2 and 3.3, respectively) for $\mathcal{F} = \mathbb{F}$. The case $\mathcal{F} = \mathbb{N}_0^M$ can be proved in a similar way. Let us prove claim (i). Indeed, for every $\xi > 1$ we have
$$
\begin{equation*}
\sum_{\boldsymbol{s} \in \Lambda(\xi)} p_{\boldsymbol{s}}(\theta) \leqslant \sum_{\boldsymbol{s} \in \mathcal F\colon\sigma_{\boldsymbol{s}}^{-q}\xi\geqslant 1} p_{\boldsymbol{s}}(\theta) \xi \sigma_{\boldsymbol{s}}^{-q} \leqslant \xi \sum_{\boldsymbol{s}\in \mathcal F} p_{\boldsymbol{s}}(\theta) \sigma_{\boldsymbol{s}}^{-q} \leqslant C\xi.
\end{equation*}
\notag
$$
The lemma is proved. Lemma 5.2. Let $\theta\geqslant 0$, $0<q<\infty$ and $\xi > 1$. Let $\boldsymbol{\sigma}= (\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}}$ be a set of numbers strictly greater than $1$. If $\|{\boldsymbol{p} ((\theta+2)/q) \boldsymbol{\sigma}^{-1}}\|_{\ell_q(\mathcal F)} \leqslant C < \infty$, where the constant $C$ is independent of $J$, then
$$
\begin{equation}
\sum_{ (\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} p_{\boldsymbol{s}}(\theta) \leqslant C\xi \quad \forall\, \xi > 1.
\end{equation}
\tag{5.5}
$$
In particular, if, in addition, $\|\boldsymbol{p}(2/q)\boldsymbol{\sigma}^{-1}\|_{\ell_q(\mathcal F)}^q \leqslant C_q < \infty$, where the constant $C_q$ is independent of $J$, then
$$
\begin{equation*}
|G(\xi)| \leqslant C_q \xi \quad \forall\, \xi > 1.
\end{equation*}
\notag
$$
Proof. For every $\xi > 1$ we have
$$
\begin{equation*}
\begin{aligned} \, \sum_{ (\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} p_{\boldsymbol{s}}(\theta) &=\sum_{\boldsymbol{s} \in \Lambda(\xi)} \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}-\boldsymbol{e}}} p_{\boldsymbol{s}}(\theta) \leqslant \sum_{\boldsymbol{s} \in \Lambda(\xi)} p_{\boldsymbol{s}}(\theta) \sum_{\boldsymbol{e} \in E_{\boldsymbol{s}}} |\pi_{\boldsymbol{s}-\boldsymbol{e}}| \\ &\leqslant \sum_{\boldsymbol{s} \in \Lambda(\xi)} p_{\boldsymbol{s}}(\theta) |E_{\boldsymbol{s}}| p_{\boldsymbol{s}}(1) =\sum_{\boldsymbol{s} \in \Lambda(\xi)} p_{\boldsymbol{s}}(\theta+1) 2^{|\boldsymbol{s}|_0} \leqslant \sum_{\boldsymbol{s} \in \Lambda(\xi)} p_{\boldsymbol{s}}(\theta+2) \\ &\leqslant \sum_{\boldsymbol{s} \in \mathcal F\colon\sigma_{\boldsymbol{s}}^{-q}\xi\geqslant 1} p_{\boldsymbol{s}}(\theta+2) \xi \sigma_{\boldsymbol{s}}^{-q} \leqslant \xi \sum_{\boldsymbol{s}\in \mathcal F} p_{\boldsymbol{s}}(\theta+2) \sigma_{\boldsymbol{s}}^{-q} \leqslant C\xi. \end{aligned}
\end{equation*}
\notag
$$
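The cardinality bound of Lemma 5.2 is easy to observe on a toy example. The Python sketch below (illustrative only) uses a two-dimensional weight family $\sigma_{\boldsymbol{s}}=\rho_1^{s_1}\rho_2^{s_2}$ with $q=1$; the concrete realizations of $E_{\boldsymbol{s}}$ and $\pi_{\boldsymbol{s}-\boldsymbol{e}}$ below are our assumptions, chosen so that $|E_{\boldsymbol{s}}|=2^{|\boldsymbol{s}|_0}$ and $|\pi_{\boldsymbol{s}-\boldsymbol{e}}|=\prod_j(s_j-e_j+1)$, in accordance with the counting used in the proof.
```python
import itertools, math

q, rho = 1.0, [2.0, 3.0]                        # sigma_s = rho_1**s_1 * rho_2**s_2 > 1

def sigma(s):
    return math.prod(r ** sj for r, sj in zip(rho, s))

def G(xi):
    smax = 1 + int(math.log(xi) / (q * math.log(min(rho))))
    Lam = [s for s in itertools.product(range(smax + 1), repeat=2)
           if sigma(s) ** q <= xi]                              # Lambda(xi)
    triples = []
    for s in Lam:
        E_s = itertools.product(*[(0, 1) if sj > 0 else (0,) for sj in s])
        for e in E_s:                                           # |E_s| = 2**|s|_0
            for k in itertools.product(*[range(sj - ej + 1) for sj, ej in zip(s, e)]):
                triples.append((s, e, k))                       # k in pi_{s-e}
    return triples

for xi in [10, 100, 1000, 10000]:
    g = G(xi)
    print(xi, len(g), len(g) / xi)              # the ratio |G(xi)|/xi stays bounded
```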
Lemma 5.3. For any $\boldsymbol{s}, \boldsymbol{s}' \in \mathcal{F}$, we have
$$
\begin{equation}
\sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}}} |H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s};\boldsymbol{k}})| \leqslant e^{K|\boldsymbol{s}|_1},
\end{equation}
\tag{5.6}
$$
where the constant $K$ is independent of $J$ and $\boldsymbol{s}, \boldsymbol{s}'$. Proof. From Cramér’s bound (see, for instance, [15], Lemma 3.2) we deduce that
$$
\begin{equation}
|H_s(y)\sqrt{g(y)}|<1 \quad \forall\, y \in \mathbb R \quad \forall\, s \in \mathbb N_0,
\end{equation}
\tag{5.7}
$$
or, equivalently,
$$
\begin{equation}
|H_s(y)|<(2\pi)^{1/4} e^{y^2/4} \quad \forall\, y \in \mathbb R, \quad \forall\, s \in \mathbb N_0.
\end{equation}
\tag{5.8}
$$
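Inequality (5.7) is also easy to check numerically. In the sketch below (illustrative only) $H_s=\operatorname{He}_s/\sqrt{s!}$ is the Hermite polynomial orthonormal with respect to $\gamma$, and $g$ is the standard normal density.
```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval
from math import factorial, pi

def H_times_sqrt_g(s, y):
    """|H_s(y) sqrt(g(y))| for the orthonormal Hermite polynomial H_s = He_s / sqrt(s!)."""
    c = np.zeros(s + 1)
    c[s] = 1.0 / np.sqrt(factorial(s))
    return np.abs(hermeval(y, c)) * (2 * pi) ** (-0.25) * np.exp(-y ** 2 / 4)

y = np.linspace(-30, 30, 20001)
print(max(H_times_sqrt_g(s, y).max() for s in range(40)))   # stays below 1, as in (5.7)
```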
Let $\boldsymbol{s}, \boldsymbol{s}' \in \mathcal{F}$ and $\boldsymbol{k} \in \pi_{\boldsymbol{s}}$ be given. Notice that the univariate Hermite polynomials satisfy $H_0 = 1$, $H_{2s+1}(0) = 0$ and $|H_{2s}(0)|\leqslant 1 $ for $s \in \mathbb{N}_0$. Hence, by (5.8) we have
$$
\begin{equation}
|H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s};\boldsymbol{k}})| \leqslant \prod_{j \in \operatorname{supp} (\boldsymbol{s}')\cap \operatorname{supp} (\boldsymbol{s})} |H_{s_j'}(y_{s_j,k_j})| \leqslant \prod_{j \in \operatorname{supp} (\boldsymbol{s})} (2\pi)^{1/4} e^{y_{s_j,k_j}^2/4}.
\end{equation}
\tag{5.9}
$$
Therefore,
$$
\begin{equation}
\sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}}} |H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s};\boldsymbol{k}})| \leqslant \sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}}} \prod_{j \in \operatorname{supp} (\boldsymbol{s})} (2\pi)^{1/4} e^{y_{s_j,k_j}^2/4} = \prod_{j \in \operatorname{supp} (\boldsymbol{s})} (2\pi)^{1/4} \sum_{k_j \in \pi_{s_j}} e^{y_{s_j,k_j}^2/4}.
\end{equation}
\tag{5.10}
$$
Inequalities (6.31.19) in [55] yield
$$
\begin{equation}
|y_{s;k}| \leqslant K_1 \frac{|k|}{\sqrt s} \quad \forall\, k \in \pi_s, \quad \forall\, s \in \mathbb N.
\end{equation}
\tag{5.11}
$$
Consequently,
$$
\begin{equation}
(2\pi)^{1/4} \sum_{k_j \in \pi_{s_j}} e^{y_{s_j,k_j}^2/4} \leqslant 2 (2\pi)^{1/4} \sum_{k_j=0}^{\lfloor s_j/2\rfloor} \exp{\biggl(\frac{K_1}{4} \frac{k_j^2}{s_j}\biggr)}\leqslant e^{Ks_j} \quad \forall\, s_j \in \mathbb N.
\end{equation}
\tag{5.12}
$$
This allows us to finish the proof of the lemma as follows:
$$
\begin{equation*}
\sum_{\boldsymbol{k} \in \pi_{\boldsymbol{s}}} |H_{\boldsymbol{s}'}(\boldsymbol{y}_{\boldsymbol{s};\boldsymbol{k}})| \leqslant\prod_{j \in \operatorname{supp} (\boldsymbol{s})} e^{K s_j} =e^{K |\boldsymbol{s}|_1}.
\end{equation*}
\notag
$$
Lemma 5.4. For any $s \in \mathbb{N}$ and $k \in \pi_{s}$ we have
$$
\begin{equation}
\|L_{s;k}\|_{L_2(\mathbb R,\gamma)} \leqslant e^{K s}
\end{equation}
\tag{5.13}
$$
and
$$
\begin{equation}
\|L_{s;k}\|_{L_{\infty}^{\sqrt{g}}(\mathbb R)} \leqslant e^{K s},
\end{equation}
\tag{5.14}
$$
where the constants $K$ are independent of $s$ and $k \in \pi_{s}$. Proof. Notice that $L_{s;k}$ is a polynomial with $s$ simple zeros $\{y_{s;j}\}_{j \in \pi_{s}, \, j \ne k}$, and that $L_{s;k}(y_{s;k}) =1$. Moreover, there is no zero in the open interval $(y_{s;k-1}, y_{s;k})$ and
$$
\begin{equation*}
L_{s;k}(y_{s;k})=\max_{y \in [y_{s;k-1}, y_{s;k}]} L_{s;k}(y)=1.
\end{equation*}
\notag
$$
Hence
$$
\begin{equation}
|L_{s;k}(y)| \leqslant 1 \quad \forall\, y \in [y_{s;k-1}, y_{s;k+1}].
\end{equation}
\tag{5.15}
$$
Let us estimate $|L_{s;k}(y)|$ for $y \in \mathbb{R} \setminus (y_{s;k-1}, y_{s;k+1})$. One can see from the definition that
$$
\begin{equation}
L_{s;k} (y) :=\prod_{k' \in \pi_s,\,k'\ne k}\frac{y-y_{s;k'}}{y_{s;k}-y_{s;k'}} =A_{s;k} (y-y_{s;k})^{-1} H_{s+1}(y),
\end{equation}
\tag{5.16}
$$
where
$$
\begin{equation}
A_{s;k} :=((s+1)!)^{1/2}\prod_{k' \in \pi_s,\,k'\ne k}(y_{s;k}-y_{s;k'})^{-1}.
\end{equation}
\tag{5.17}
$$
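Identity (5.16) with the normalizing factor (5.17) can be verified numerically; in the sketch below (illustrative only) $H_n=\operatorname{He}_n/\sqrt{n!}$ is the orthonormal Hermite polynomial and the nodes $y_{s;k}$ are the zeros of $H_{s+1}$.
```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval, hermegauss
from math import factorial

s, k = 6, 2
nodes, _ = hermegauss(s + 1)                    # zeros y_{s;0} < ... < y_{s;s} of H_{s+1}

def L(y):                                       # Lagrange basis polynomial L_{s;k}
    return np.prod([(y - yn) / (nodes[k] - yn)
                    for j, yn in enumerate(nodes) if j != k], axis=0)

def H(y, n):                                    # orthonormal Hermite polynomial H_n
    c = np.zeros(n + 1)
    c[n] = 1.0 / np.sqrt(factorial(n))
    return hermeval(y, c)

A = np.sqrt(factorial(s + 1)) / np.prod([nodes[k] - yn
                                         for j, yn in enumerate(nodes) if j != k])
y = np.linspace(-5.1, 5.1, 7)                   # test points avoiding y_{s;k}
print(np.allclose(L(y), A * H(y, s + 1) / (y - nodes[k])))   # True: identity (5.16)
```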
From inequalities (6.31.22) in [55] we obtain
$$
\begin{equation}
\frac{\sqrt{10.5}}{\sqrt{2s+3}} \leqslant d_s \leqslant \frac{\pi \sqrt{2}}{\sqrt{2s+3}}
\end{equation}
\tag{5.18}
$$
for the minimum distance $d_s$ between consecutive zeros $y_{s;k}$, $k \in \pi_s$. Hence
$$
\begin{equation*}
|y-y_{s;k}|^{-1} \leqslant d_s^{-1} \leqslant \frac{ \sqrt{2s+3}}{\sqrt{10.5}} < \sqrt{s} \quad \forall\, y \in \mathbb R \setminus (y_{s;k-1}, y_{s;k+1})
\end{equation*}
\notag
$$
and for any $s \in \mathbb{N}$ and $k, k' \in \pi_{s}$, $k' \ne k$, we have
$$
\begin{equation}
|y_{s;k}-y_{s;k'}|^{-1} \leqslant C \frac{\sqrt{s}}{|k-k'|},
\end{equation}
\tag{5.19}
$$
which yields for any $y \in \mathbb{R} \setminus (y_{s;k-1}, y_{s;k+1})$ the inequality
$$
\begin{equation}
\begin{aligned} \, \notag &|y- y_{s;k}|^{-1}|A_{s;k}| \leqslant\sqrt{s}\, ((s+1)!)^{1/2} \prod_{k' \in \pi_s,\,k'\ne k}|y_{s;k}-y_{s;k'}|^{-1} \\ \notag &\qquad \leqslant\sqrt{s}\, C^s \frac{((s+1)!)^{1/2}s^{s/2}}{k!\, (s-k)!} \leqslant \sqrt{s}\, C^s \binom{s}{k}\frac{((s+1)!)^{1/2}s^{s/2}}{s! } \\ &\qquad \leqslant \sqrt{s}\, (2C)^s \frac{((s+1)!)^{1/2}s^{s/2}}{s! } \leqslant e^{K_1 s}. \end{aligned}
\end{equation}
\tag{5.20}
$$
At the last step we used Stirling’s approximation for the factorial. Thus, we have proved that
$$
\begin{equation}
|L_{s;k} (y)| \leqslant e^{K_1 s} |H_{s+1}(y)| \quad \forall\, y \in \mathbb R \setminus (y_{s;k-1}, y_{s;k+1}).
\end{equation}
\tag{5.21}
$$
Setting $I_{s;k}:= [y_{s;k-1}, y_{s;k+1}]$, from the last estimate and (5.15) we obtain (5.13):
$$
\begin{equation*}
\begin{aligned} \, \|L_{s;k}\|_{L_2(\mathbb R,\gamma)}^2 &= \|L_{s;k}\|_{L_2(I_{s;k},\gamma)}^2+\|L_{s;k}\|_{L_2(\mathbb R \setminus I_{s;k},\gamma)}^2 \\ &\leqslant 1+e^{2K_1 s}\|H_{s+1}\|_{L_2(\mathbb R ,\gamma)}^2 =1+e^{2K_1 s} \leqslant e^{2K s}. \end{aligned}
\end{equation*}
\notag
$$
The inequality (5.14) can be proved similarly by using (5.7). The lemma is proved. Lemma 5.5. Assume that $p$ and $q$ are polynomials on $\mathbb{R}$ of the form
$$
\begin{equation}
p(y):=\sum_{k=0}^m a_k y^k \quad\textit{and}\quad q(y):=\sum_{k=0}^{m-1} b_k y^k,
\end{equation}
\tag{5.22}
$$
and that $p(y) = (y - y_0) q(y)$ for a point $y_0 \in \mathbb{R}$. Then
$$
\begin{equation}
|b_k| \leqslant \sum_{j=0}^m |a_j|, \qquad k=0, \dotsc, m-1.
\end{equation}
\tag{5.23}
$$
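In other words, dividing out a real root cannot increase the coefficients of the quotient beyond the $\ell_1$-norm of the coefficients of $p$. A quick numerical illustration via synthetic division (illustrative only; random test polynomials):
```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
for _ in range(1000):
    q_true = rng.standard_normal(6)                    # a random q of degree 5
    y0 = rng.standard_normal()                         # the real root to divide out
    p = P.polymul([-y0, 1.0], q_true)                  # p(y) = (y - y0) q(y)
    q, rem = P.polydiv(p, [-y0, 1.0])                  # synthetic division by (y - y0)
    assert np.allclose(q, q_true) and np.allclose(rem, 0.0)
    assert np.max(np.abs(q)) <= np.sum(np.abs(p)) + 1e-9   # the bound (5.23)
```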
Proof. From the definition we have
$$
\begin{equation}
\sum_{k=0}^m a_k y^k = -b_0 y_0+\sum_{k=1}^{m-1} (b_{k-1}-b_k y_0) y^k+b_{m-1} y^m.
\end{equation}
\tag{5.24}
$$
Hence we obtain
$$
\begin{equation}
0=a_0+b_0 y_0, \qquad b_k=a_{k+1}+b_{k+1} y_0, \quad k=0, \dotsc, m-2, \qquad b_{m-1}=a_m.
\end{equation}
\tag{5.25}
$$
From the last equalities one can see that the lemma is trivial if $y_0 = 0$. Consider the case $y_0 \ne 0$. If $|y_0| \leqslant 1$, then from (5.25) we deduce that
$$
\begin{equation}
b_k=\sum_{j=k+1}^m a_j y_0^{j-k-1},
\end{equation}
\tag{5.26}
$$
and, consequently,
$$
\begin{equation}
|b_k| \leqslant \sum_{j=k+1}^m |a_j| |y_0|^{j-k-1} \leqslant \sum_{j=0}^m |a_j|.
\end{equation}
\tag{5.27}
$$
If $|y_0| > 1$, then from (5.25) we deduce that
$$
\begin{equation}
b_k=- \sum_{j=0}^k a_j y_0^{-(k+1-j)}
\end{equation}
\tag{5.28}
$$
and, consequently,
$$
\begin{equation}
|b_k| \leqslant \sum_{j=0}^k |a_j| |y_0|^{-(k+1-j)} \leqslant \sum_{j=0}^m |a_j|.
\end{equation}
\tag{5.29}
$$
The lemma is proved. Lemma 5.6. Let $b^{s;k}_\ell$ be the coefficients of $L_{s;k}$ as in the representation (3.39). Then for any $s \in \mathbb{N}_0$ and $k \in \pi_s$ we have
$$
\begin{equation*}
\sum_{\ell=0}^s |b^{s;k}_\ell| \leqslant e^{Ks} s!,
\end{equation*}
\notag
$$
where the constant $K$ is independent of $s$ and $k \in \pi_{s}$. Proof. For $s\in \mathbb{N}_0$, we represent the univariate Hermite polynomial $H_s$ in the form
$$
\begin{equation}
H_s(y) := \sum_{\ell=0}^s a_{s,\ell} y^\ell.
\end{equation}
\tag{5.30}
$$
Using the well-known equality
$$
\begin{equation}
H_s(y)= s! \sum_{\ell=0}^{\lfloor{s}/{2} \rfloor} \frac{(-1)^\ell}{\ell!\,(s- 2\ell)!} \frac{y^{s-2\ell}}{2^\ell},
\end{equation}
\tag{5.31}
$$
one can derive that
$$
\begin{equation}
\sum_{\ell=0}^{s} |a_{s,\ell}| \leqslant s!\,.
\end{equation}
\tag{5.32}
$$
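The bound (5.32) is straightforward to check numerically from the explicit formula (5.31); in the sketch below (illustrative only) $H_s$ is taken in the monic normalization used in (5.31).
```python
import numpy as np
from numpy.polynomial.hermite_e import herme2poly
from math import factorial

for s in range(15):
    a = herme2poly([0] * s + [1])                  # monomial coefficients a_{s,l} of H_s, as in (5.30)
    assert np.abs(a).sum() <= factorial(s) + 1e-9  # the bound (5.32)
```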
From (5.16) we have
$$
\begin{equation}
A_{s;k}H_{s+1}(y) =(y-y_{s;k}) L_{s;k}(y),
\end{equation}
\tag{5.33}
$$
where $A_{s;k}$ is as in (5.17). By Lemma 5.5, (5.32) and (5.20) we obtain
$$
\begin{equation*}
\sum_{\ell=0}^s |b^{s;k}_\ell| \leqslant \sum_{\ell=0}^s A_{s;k}\sum_{\ell'=0}^{s+1} |a_{s+1,\ell'}| \leqslant e^{Ks} s!\,.
\end{equation*}
\notag
$$
The lemma is proved. Lemma 5.7. Let $\varphi (\boldsymbol{y})= \prod_{j = 1}^m \varphi_j(y_j)$ for $\boldsymbol{y} \in \mathbb{R}^m$, where $\varphi_j$ is a polynomial in the variable $y_j$ of degree not greater than $\omega$ for $j=1,\dots,m$. Then
$$
\begin{equation}
\|\varphi\|_{L_2(\mathbb R^m{\setminus}B^m_\omega,\gamma)} \leqslant Cm \exp (- K\omega ) \|\varphi\|_{L_2(\mathbb R^m,\gamma)}
\end{equation}
\tag{5.34}
$$
and
$$
\begin{equation}
\|\varphi\|_{L_\infty^{\sqrt{g}}(\mathbb R^m{\setminus}B^m_\omega)} \leqslant Cm \exp (- K\omega ) \|\varphi\|_{L_\infty^{\sqrt{g}}(\mathbb R^m)},
\end{equation}
\tag{5.35}
$$
where the constants $C$ and $K$ are independent of $\omega$, $m$ and $\varphi$. Inequality (5.34) was proved in [17], Lemma 3.3. Inequality (5.35) can be proved in a similar way with slight modifications. 5.2. Proof of Theorem 3.1 This theorem was proved in [15], Corollary 3.11, for $U={\mathbb R}^\infty$. Let us prove it for $U={\mathbb R}^M$. By Lemma 3.1 the series (3.8) converges unconditionally to $v$ in the space $L_2({\mathbb R}^M,X,\gamma)$. Observe that $I_{\Lambda(\xi)} H_{\boldsymbol{s}} = H_{\boldsymbol{s}}$ for every $\boldsymbol{s} \in \Lambda(\xi)$ and $\Delta_{\boldsymbol{s}} H_{\boldsymbol{s}'} = 0$ for every $\boldsymbol{s} \not\leqslant \boldsymbol{s}'$. Hence for the downward closed set $\Lambda(\xi) \subset {\mathbb N}_0^M$ we can write
$$
\begin{equation*}
I_{\Lambda(\xi) }v =I_{\Lambda(\xi) }\biggl(\sum_{ \boldsymbol{s} \in {\mathbb N}_0^M} v_{\boldsymbol{s}} H_{\boldsymbol{s}}\biggr) =\sum_{ \boldsymbol{s} \in {\mathbb N}_0^M} v_{\boldsymbol{s}}I_{\Lambda(\xi) } H_{\boldsymbol{s}} =S_{\Lambda(\xi)} v +\sum_{\boldsymbol{s} \not\in \Lambda(\xi) } v_{\boldsymbol{s}}I_{\Lambda(\xi) \cap R_{\boldsymbol{s}}}H_{\boldsymbol{s}},
\end{equation*}
\notag
$$
where $R_{\boldsymbol{s}}:= \{\boldsymbol{s}' \in {\mathbb N}_0^M\colon \boldsymbol{s}' \leqslant \boldsymbol{s}\}$ and
$$
\begin{equation*}
S_{\Lambda(\xi)} v:=\sum_{\boldsymbol{s} \in \Lambda(\xi) } v_{\boldsymbol{s}}H_{\boldsymbol{s}}
\end{equation*}
\notag
$$
for $v \in L_2({\mathbb R}^M,X,\gamma)$ represented by the Hermite GPC expansion (3.8). This implies that
$$
\begin{equation}
\|v- I_{\Lambda(\xi)} v\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M,X)} \leqslant\|v- S_{\Lambda(\xi)} v\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M,X)} +\sum_{\boldsymbol{s} \not\in \Lambda (\xi)} \|v_{\boldsymbol{s}}\|_{X} \|I_{\Lambda(\xi) \cap R_{\boldsymbol{s}}}H_{\boldsymbol{s}}\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M)}.
\end{equation}
\tag{5.36}
$$
Therefore, to prove the theorem it is sufficient to show that each term on the right-hand side is bounded by $C\xi^{-(1/q - 1/2)}$. The bound for the first term can be obtained from the Cauchy–Schwarz inequality and (5.7):
$$
\begin{equation}
\begin{aligned} \, \notag &\|v- S_{\Lambda(\xi)}v\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M,X)} \leqslant \sum_{\sigma_{\boldsymbol{s}}> \xi^{1/q} } \|v_{\boldsymbol{s}}\|_{X}\|H_{\boldsymbol{s}}\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M)} \leqslant \sum_{\sigma_{\boldsymbol{s}}> \xi^{1/q} } \|v_{\boldsymbol{s}}\|_{X} \\ \notag &\qquad\leqslant \biggl(\sum_{\sigma_{\boldsymbol{s}}> \xi^{1/q} } (\sigma_{\boldsymbol{s}}\|v_{\boldsymbol{s}}\|_{X})^2\biggr)^{1/2} \biggl(\sum_{\sigma_{\boldsymbol{s}}> \xi^{1/q} } \sigma_{\boldsymbol{s}}^{-2}\biggr)^{1/2} \leqslant C\biggl(\sum_{\sigma_{\boldsymbol{s}}> \xi^{1/q} } \sigma_{\boldsymbol{s}}^{-q} \sigma_{\boldsymbol{s}}^{-(2- q)}\biggr)^{1/2} \\ &\qquad\leqslant C \xi^{-(1/q-1/2)} \biggl(\sum_{\boldsymbol{s} \in \mathbb N_0^M} \sigma_{\boldsymbol{s}}^{-q} \biggr)^{1/2} \leqslant C \xi^{-(1/q-1/2)}. \end{aligned}
\end{equation}
\tag{5.37}
$$
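Both (5.37) and the estimate for the second term below rely on the same elementary tail step, which we spell out once. Since $0<q<2$ (so that $1/q-1/2>0$) and the indices of summation satisfy $\sigma_{\boldsymbol{s}}> \xi^{1/q}$, we have
$$
\begin{equation*}
\sigma_{\boldsymbol{s}}^{-(2-q)} \leqslant \xi^{-(2-q)/q} =\bigl(\xi^{-(1/q-1/2)}\bigr)^{2}, \qquad\text{whence}\qquad \biggl(\sum_{\sigma_{\boldsymbol{s}}> \xi^{1/q} } \sigma_{\boldsymbol{s}}^{-q}\, \sigma_{\boldsymbol{s}}^{-(2- q)}\biggr)^{1/2} \leqslant \xi^{-(1/q-1/2)} \biggl(\sum_{\boldsymbol{s} \in \mathbb N_0^M} \sigma_{\boldsymbol{s}}^{-q}\biggr)^{1/2}.
\end{equation*}
\notag
$$
The last sum is bounded by a constant independent of $M$; in particular, this follows from the assumption $\|\boldsymbol{p}(\theta/q,\lambda)\boldsymbol{\sigma}^{-1}\|_{\ell_q(\mathbb N_0^M)}\leqslant C$ used below.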
Let us prove the bound for the second term in the right-hand side of (5.36). We have
$$
\begin{equation}
\|I_{\Lambda(\xi) \cap R_{\boldsymbol{s}}}H_{\boldsymbol{s}}\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M)} \leqslant \sum_{\boldsymbol{s}' \in \Lambda(\xi) \cap R_{\boldsymbol{s}}} \|\Delta_{\boldsymbol{s}'} (H_{\boldsymbol{s}})\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M)}.
\end{equation}
\tag{5.38}
$$
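Here we have used the representation of the interpolation operator on a downward closed set as the sum of tensorized differences, $I_{\Lambda}=\sum_{\boldsymbol{s}' \in \Lambda} \Delta_{\boldsymbol{s}'}$ (in the notation introduced earlier in the paper), so that
$$
\begin{equation*}
I_{\Lambda(\xi) \cap R_{\boldsymbol{s}}}H_{\boldsymbol{s}} =\sum_{\boldsymbol{s}' \in \Lambda(\xi) \cap R_{\boldsymbol{s}}} \Delta_{\boldsymbol{s}'} (H_{\boldsymbol{s}}),
\end{equation*}
\notag
$$
and (5.38) follows by the triangle inequality.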
We estimate the norms in the right-hand side. For $\boldsymbol{s} \in {\mathbb N}_0^M$ and $\boldsymbol{s}' \in \Lambda(\xi) \cap R_{\boldsymbol{s}}$ we have $\Delta_{\boldsymbol{s}'} (H_{\boldsymbol{s}}) = \prod_{j=1}^M \Delta_{s'_j}(H_{s_j})$. From Lemma 3.2 and (5.7) we deduce that
$$
\begin{equation*}
\|\Delta_{s'_j} (H_{s_j})\|_{L_\infty^{\sqrt{g}}(\mathbb R)} \leqslant (1+C_\varepsilon s'_j)^{1/6+\varepsilon}\|H_{s_j}\|_{L_\infty^{\sqrt{g}}(\mathbb R)} \leqslant (1+C_\varepsilon s'_j)^{1/6+\varepsilon},
\end{equation*}
\notag
$$
and consequently,
$$
\begin{equation}
\|\Delta_{\boldsymbol{s}'} (H_{\boldsymbol{s}})\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M)} =\prod_{j=1}^M \|\Delta_{s'_j} (H_{s_j})\|_{L_\infty^{\sqrt{g}}(\mathbb R)} \leqslant p_{\boldsymbol{s}'}(\theta_1,\lambda) \leqslant p_{\boldsymbol{s}}(\theta_1, \lambda),
\end{equation}
\tag{5.39}
$$
where $\theta_1 = 1/6 + \varepsilon$ and we recall that $\lambda = C_\varepsilon$. Substituting the right-hand side of (5.39) for $\|\Delta_{\boldsymbol{s}'} (H_{\boldsymbol{s}})\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M)}$ into (5.38), we obtain
$$
\begin{equation*}
\begin{aligned} \, \|I_{\Lambda(\xi) \cap R_{\boldsymbol{s}}}H_{\boldsymbol{s}}\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M)} &\leqslant \sum_{\boldsymbol{s}' \in \Lambda(\xi) \cap R_{\boldsymbol{s}}}p_{\boldsymbol{s}}(\theta_1, \lambda) \leqslant |R_{\boldsymbol{s}}|p_{\boldsymbol{s}}(\theta_1, \lambda) \\ &\leqslant p_{\boldsymbol{s}}(1,1)p_{\boldsymbol{s}}(\theta_1, \lambda) \leqslant p_{\boldsymbol{s}}\biggl(\frac{\theta}{2}, \lambda\biggr). \end{aligned}
\end{equation*}
\notag
$$
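In the last two steps we have used the cardinality of the box $R_{\boldsymbol{s}}$ together with the product form of the weights $p_{\boldsymbol{s}}(\theta,\lambda)=\prod_{j=1}^M(1+\lambda s_j)^{\theta}$ (as introduced earlier in the paper; this form is consistent with the bound (5.39)); namely,
$$
\begin{equation*}
|R_{\boldsymbol{s}}| =\prod_{j=1}^M (1+ s_j) =p_{\boldsymbol{s}}(1,1) \quad\text{and}\quad p_{\boldsymbol{s}}(1,1)\, p_{\boldsymbol{s}}(\theta_1, \lambda) \leqslant p_{\boldsymbol{s}}\biggl(\frac{\theta}{2}, \lambda\biggr),
\end{equation*}
\notag
$$
the latter inequality holding, for instance, whenever $\lambda \geqslant 1$ and $\theta/2 \geqslant 1+\theta_1$.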
Using these estimates and the assumption $\|\boldsymbol{p}(\theta/q,\lambda)\boldsymbol{\sigma}^{-1}\|_{\ell_q(\mathbb N_0^M)}\leqslant C < \infty$ with a positive constant $C$ independent of $M$, we derive the bound for the second term in the right-hand side of (5.36):
$$
\begin{equation*}
\begin{aligned} \, &\sum_{\boldsymbol{s} \not\in \Lambda (\xi)} \|v_{\boldsymbol{s}}\|_{X}\, \|I_{\Lambda(\xi) \cap R_{\boldsymbol{s}}}H_{\boldsymbol{s}}\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M)} \leqslant C \sum_{\boldsymbol{s} \not\in \Lambda (\xi)} \|v_{\boldsymbol{s}}\|_{X}p_{\boldsymbol{s}}\biggl(\frac\theta2,\lambda\biggr) \\ &\qquad \leqslant C\biggl(\sum_{\sigma_{\boldsymbol{s}}> \xi^{1/q} } (\sigma_{\boldsymbol{s}}\|v_{\boldsymbol{s}}\|_{X})^2\biggr)^{1/2} \biggl(\sum_{\sigma_{\boldsymbol{s}}> \xi^{1/q} } p_{\boldsymbol{s}} \biggl(\frac\theta2,\lambda\biggr)^2 \sigma_{\boldsymbol{s}}^{-2}\biggr)^{1/2} \\ &\qquad \leqslant C\biggl(\sum_{\sigma_{\boldsymbol{s}}> \xi^{1/q} } p_{\boldsymbol{s}}\biggl(\frac{\theta}2,\lambda\biggr)^2 \sigma_{\boldsymbol{s}}^{-q} \sigma_{\boldsymbol{s}}^{-(2- q)}\biggr)^{1/2} \\ &\qquad \leqslant C \xi^{-(1/q-1/2)} \biggl(\sum_{\boldsymbol{s} \in \mathbb N_0^M} p_{\boldsymbol{s}}(\theta,\lambda) \sigma_{\boldsymbol{s}}^{-q} \biggr)^{1/2} \leqslant C \xi^{-(1/q-1/2)}, \end{aligned}
\end{equation*}
\notag
$$
which, in combination with (5.36) and (5.37), proves the theorem.
Acknowledgement
A part of this work was done when the author was working at the Vietnam Institute for Advanced Study in Mathematics (VIASM). He would like to thank the VIASM for providing a fruitful research environment and working conditions.
Bibliography
1. M. Ali and A. Nouy, “Approximation of smoothness classes by deep rectifier networks”, SIAM J. Numer. Anal., 59:6 (2021), 3032–3051
2. R. Arora, A. Basu, P. Mianjy and A. Mukherjee, Understanding deep neural networks with rectified linear units, Electronic colloquium on computational complexity, report No. 98, 2017, 21 pp. https://eccc.weizmann.ac.il/report/2017/098/
3. M. Bachmayr, A. Cohen, Dinh Dũng and Ch. Schwab, “Fully discrete approximation of parametric and stochastic elliptic PDEs”, SIAM J. Numer. Anal., 55:5 (2017), 2151–2186
4. M. Bachmayr, A. Cohen, R. DeVore and G. Migliorati, “Sparse polynomial approximation of parametric elliptic PDEs. Part II: Lognormal coefficients”, ESAIM Math. Model. Numer. Anal., 51:1 (2017), 341–363
5. M. Bachmayr, A. Cohen and G. Migliorati, “Sparse polynomial approximation of parametric elliptic PDEs. Part I: Affine coefficients”, ESAIM Math. Model. Numer. Anal., 51:1 (2017), 321–339
6. A. R. Barron, “Complexity regularization with application to artificial neural networks”, Nonparametric functional estimation and related topics (Spetses 1990), NATO Adv. Sci. Inst. Ser. C: Math. Phys. Sci., 335, Kluwer Acad. Publ., Dordrecht, 1991, 561–576
7. A. Chkifa, A. Cohen, R. DeVore and Ch. Schwab, “Sparse adaptive Taylor approximation algorithms for parametric and stochastic elliptic PDEs”, ESAIM Math. Model. Numer. Anal., 47:1 (2013), 253–280
8. A. Chkifa, A. Cohen and Ch. Schwab, “High-dimensional adaptive sparse polynomial interpolation and applications to parametric PDEs”, Found. Comput. Math., 14:4 (2014), 601–633
9. A. Chkifa, A. Cohen and Ch. Schwab, “Breaking the curse of dimensionality in sparse polynomial approximation of parametric PDEs”, J. Math. Pures Appl. (9), 103:2 (2015), 400–428
10. A. Cohen and R. DeVore, “Approximation of high-dimensional parametric PDEs”, Acta Numer., 24 (2015), 1–159
11. A. Cohen, R. DeVore and Ch. Schwab, “Convergence rates of best $N$-term Galerkin approximations for a class of elliptic sPDEs”, Found. Comput. Math., 10:6 (2010), 615–646
12. A. Cohen, R. DeVore and Ch. Schwab, “Analytic regularity and polynomial approximation of parametric and stochastic elliptic PDE's”, Anal. Appl. (Singap.), 9:1 (2011), 11–47
13. G. Cybenko, “Approximation by superpositions of a sigmoidal function”, Math. Control Signals Systems, 2:4 (1989), 303–314
14. Dinh Dũng, “Linear collective collocation approximation for parametric and stochastic elliptic PDEs”, Mat. Sb., 210:4 (2019), 103–127; English transl. in Sb. Math., 210:4 (2019), 565–588
15. Dinh Dũng, “Sparse-grid polynomial interpolation approximation and integration for parametric and stochastic elliptic PDEs with lognormal inputs”, ESAIM Math. Model. Numer. Anal., 55:3 (2021), 1163–1198
16. Dinh Dũng and Van Kien Nguyen, “Deep ReLU neural networks in high-dimensional approximation”, Neural Netw., 142 (2021), 619–635
17. Dinh Dũng, Van Kien Nguyen and Duong Thanh Pham, Deep ReLU neural network approximation of parametric and stochastic elliptic PDEs with lognormal inputs, arXiv: 2111.05854v1
18. Dinh Dũng, Van Kien Nguyen, Ch. Schwab and J. Zech, Analyticity and sparsity in uncertainty quantification for PDEs with Gaussian random field inputs, arXiv: 2201.01912
19. Dinh Dũng, Van Kien Nguyen and Mai Xuan Thao, “Computation complexity of deep ReLU neural networks in high-dimensional approximation”, J. Comp. Sci. Cybern., 37:3 (2021), 292–320
20. I. Daubechies, R. DeVore, S. Foucart, B. Hanin and G. Petrova, “Nonlinear approximation and (deep) ReLU networks”, Constr. Approx., 55:1 (2022), 127–172
21. R. DeVore, B. Hanin and G. Petrova, “Neural network approximation”, Acta Numer., 30 (2021), 327–444
22. Weinan E and Qingcan Wang, “Exponential convergence of the deep neural network approximation for analytic functions”, Sci. China Math., 61:10 (2018), 1733–1740
23. D. Elbrächter, P. Grohs, A. Jentzen and Ch. Schwab, DNN expression rate analysis of high-dimensional PDEs: application to option pricing, SAM res. rep. 2018-33, Seminar for Applied Mathematics, ETH Zürich, Zürich, 2018, 50 pp. https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2018/2018-33.pdf
24. O. G. Ernst, B. Sprungk and L. Tamellini, “Convergence of sparse collocation for functions of countably many Gaussian random variables (with application to elliptic PDEs)”, SIAM J. Numer. Anal., 56:2 (2018), 877–905
25. K.-I. Funahashi, “Approximate realization of identity mappings by three-layer neural networks”, Electron. Comm. Japan Part III Fund. Electron. Sci., 73:11 (1990), 61–68
26. M. Geist, P. Petersen, M. Raslan, R. Schneider and G. Kutyniok, “Numerical solution of the parametric diffusion equation by deep neural networks”, J. Sci. Comput., 88:1 (2021), 22, 37 pp.
27. L. Gonon and Ch. Schwab, Deep ReLU network expression rates for option prices in high-dimensional, exponential Lévy models, SAM res. rep. 2020-52 (rev. 1), Seminar for Applied Mathematics, ETH Zürich, Zürich, 2021, 35 pp. https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2020/2020-52_rev1.pdf
28. L. Gonon and Ch. Schwab, Deep ReLU neural network approximation for stochastic differential equations with jumps, SAM res. rep. 2021-08, Seminar for Applied Mathematics, ETH Zürich, Zürich, 2021, 35 pp. https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2021/2021-08.pdf
29. R. Gribonval, G. Kutyniok, M. Nielsen and F. Voigtländer, “Approximation spaces of deep neural networks”, Constr. Approx., 55:1 (2022), 259–367
30. P. Grohs and L. Herrmann, “Deep neural network approximation for high-dimensional elliptic PDEs with boundary conditions”, IMA J. Numer. Anal., 42:3 (2022), 2055–2082
31. D. Elbrächter, D. Perekrestenko, P. Grohs and H. Bölcskei, “Deep neural network approximation theory”, IEEE Trans. Inform. Theory, 67:5 (2021), 2581–2623
32. I. Gühring, G. Kutyniok and P. Petersen, “Error bounds for approximations with deep ReLU neural networks in $W^{s,p}$ norms”, Anal. Appl. (Singap.), 18:5 (2020), 803–859
33. L. Herrmann, J. A. A. Opschoor and Ch. Schwab, Constructive deep ReLU neural network approximation, SAM res. rep. 2021-04, Seminar for Applied Mathematics, ETH Zürich, Zürich, 2021, 32 pp. https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2021/2021-04.pdf
34. L. Herrmann, Ch. Schwab and J. Zech, “Deep neural network expression of posterior expectations in Bayesian PDE inversion”, Inverse Problems, 36:12 (2020), 125011, 32 pp.
35. E. Hewitt and K. Stromberg, Real and abstract analysis. A modern treatment of the theory of functions of a real variable, Springer-Verlag, New York, 1965, vii+476 pp.
36. Viet Ha Hoang and Ch. Schwab, “$N$-term Wiener chaos approximation rates for elliptic PDEs with lognormal Gaussian random inputs”, Math. Models Methods Appl. Sci., 24:4 (2014), 797–826
37. K. Hornik, M. Stinchcombe and H. White, “Multilayer feedforward networks are universal approximators”, Neural Netw., 2:5 (1989), 359–366
38. G. Kutyniok, P. Petersen, M. Raslan and R. Schneider, “A theoretical analysis of deep neural networks and parametric PDEs”, Constr. Approx., 55:1 (2022), 73–125
39. Jianfeng Lu, Zuowei Shen, Haizhao Yang and Shijun Zhang, “Deep network approximation for smooth functions”, SIAM J. Math. Anal., 53:5 (2021), 5465–5506
40. D. M. Matjila, “Bounds for Lebesgue functions for Freud weights”, J. Approx. Theory, 79:3 (1994), 385–406
41. D. M. Matjila, “Convergence of Lagrange interpolation for Freud weights in weighted $L_p(\mathbb R)$, $0 < p \le 1$”, Nonlinear numerical methods and rational approximation. II (Wilrijk 1993), Math. Appl., 296, Kluwer Acad. Publ., Dordrecht, 1994, 25–35
42. H. N. Mhaskar, “Neural networks for optimal approximation of smooth and analytic functions”, Neural Comput., 8 (1996), 164–177
43. H. Montanelli and Qiang Du, “New error bounds for deep ReLU networks using sparse grids”, SIAM J. Math. Data Sci., 1:1 (2019), 78–92
44. G. Montúfar, R. Pascanu, Kyunghyun Cho and Yoshua Bengio, “On the number of linear regions of deep neural networks”, NIPS 2014, Adv. Neural Inf. Process. Syst., 27, MIT Press, Cambridge, MA, 2014, 2924–2932 http://proceedings.neurips.cc/paper/2014
45. J. A. A. Opschoor, Ch. Schwab and J. Zech, Deep learning in high dimension: ReLU network expression rates for Bayesian PDE inversion, SAM res. rep. 2020-47, Seminar for Applied Mathematics, ETH Zürich, Zürich, 2020, 50 pp. https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2020/2020-47.pdf
46. J. A. A. Opschoor, Ch. Schwab and J. Zech, “Exponential ReLU DNN expression of holomorphic maps in high dimension”, Constr. Approx., 55:1 (2022), 537–582
47. P. C. Petersen, Neural network theory, 2022, 60 pp. http://pc-petersen.eu/Neural_Network_Theory.pdf
48. P. Petersen and F. Voigtlaender, “Optimal approximation of piecewise smooth functions using deep ReLU neural networks”, Neural Netw., 108 (2018), 296–330
49. Ch. Schwab and J. Zech, “Deep learning in high dimension: Neural network expression rates for generalized polynomial chaos expansions in UQ”, Anal. Appl. (Singap.), 17:1 (2019), 19–55
50. Ch. Schwab and J. Zech, Deep learning in high dimension: neural network approximation of analytic functions in $L^2(\mathbb R^d, \gamma_d)$, arXiv: 2111.07080
51. Zuowei Shen, Haizhao Yang and Shijun Zhang, “Deep network approximation characterized by number of neurons”, Commun. Comput. Phys., 28:5 (2020), 1768–1811
52. J. Sirignano and K. Spiliopoulos, “DGM: a deep learning algorithm for solving partial differential equations”, J. Comput. Phys., 375 (2018), 1339–1364
53. T. Suzuki, Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality, ICLR 2019: International conference on learning representations (New Orleans, LA 2019) https://openreview.net/pdf?id=H1ebTsActm
54. J. Szabados, “Weighted Lagrange and Hermite–Fejér interpolation on the real line”, J. Inequal. Appl., 1:2 (1997), 99–123
55. G. Szegö, Orthogonal polynomials, Amer. Math. Soc. Colloq. Publ., 23, Amer. Math. Soc., New York, 1939, ix+401 pp.
56. M. Telgarsky, Representation benefits of deep feedforward networks, arXiv: 1509.08101
57. M. Telgarsky, “Benefits of depth in neural nets”, 29th annual conference on learning theory (Columbia Univ., New York, NY 2016), Proceedings of Machine Learning Research (PMLR), 49, 2016, 1517–1539 https://proceedings.mlr.press/v49/telgarsky16.html
58. R. K. Tripathy and I. Bilionis, “Deep UQ: learning deep neural network surrogate models for high dimensional uncertainty quantification”, J. Comput. Phys., 375 (2018), 565–588
59. D. Yarotsky, “Error bounds for approximations with deep ReLU networks”, Neural Netw., 94 (2017), 103–114
60. D. Yarotsky, “Optimal approximation of continuous functions by very deep ReLU networks”, 31st annual conference on learning theory, Proceedings of Machine Learning Research (PMLR), 75, 2018, 639–649 https://proceedings.mlr.press/v75/yarotsky18a.html
61. J. Zech, D. Dũng and Ch. Schwab, “Multilevel approximation of parametric and stochastic PDEs”, Math. Models Methods Appl. Sci., 29:9 (2019), 1753–1817
62. J. Zech and Ch. Schwab, “Convergence rates of high dimensional Smolyak quadrature”, ESAIM Math. Model. Numer. Anal., 54:4 (2020), 1259–1307