Abstract:
We find the convergence rates of the collocation approximation by deep ReLU neural networks of solutions to elliptic PDEs with lognormal inputs, parametrized by \boldsymbol{y} in the noncompact set {\mathbb R}^\infty. The approximation error is measured in the norm of the Bochner space L_2({\mathbb R}^\infty,V,\gamma), where \gamma is the infinite tensor-product standard Gaussian probability measure on {\mathbb R}^\infty and V is the energy space. We also obtain similar dimension-independent results in the case when the lognormal inputs are parametrized by {\mathbb R}^M of very large dimension M, and the approximation error is measured in the \sqrt{g_M}-weighted uniform norm of the Bochner space L_\infty^{\sqrt{g}}({\mathbb R}^M,V), where g_M is the density function of the standard Gaussian probability measure on {\mathbb R}^M.
Bibliography: 62 titles.
Partial differential equations (PDEs) with parametric and stochastic inputs are a common model used in science and engineering. The stochastic nature reflects the uncertainty in various parameters present in the physical phenomenon modelled by the equation. A central problem of computational uncertainty quantification is the efficient numerical approximation of parametric and stochastic PDEs; it has attracted great interest and seen significant progress in recent decades. There are too many papers on this topic not involving deep neural networks to mention all of them. We point out just some works [3]–[5], [7]–[12], [14], [15], [24], [36], [61] and [62], which are related directly to our paper. In particular, collocation approximations, which are based on a finite number of particular solvers to parametric and stochastic PDEs, were considered in [8]–[10], [14], [15], [18], [24] and [61].
The approximation universality of neural networks has been understood in its basic aspects since the 1980s (see [6], [13], [25] and [37]). In recent years deep neural networks have been developed rapidly, in theory and in applications to a wide range of fields, owing to their advantage over shallow ones. Since their range of applications keeps widening, theoretical analysis uncovering the reasons for these significant practical improvements attracts special attention [2], [20], [44], [56], [57]. In recent years a number of interesting papers have addressed the role of the depth and architecture of deep neural networks in nonadaptive and adaptive approximation of functions having a certain regularity [1], [22], [29], [32], [31], [42], [39], [51], [48], [59], [60]. High-dimensional approximation by deep neural networks was studied in [43], [53], [16] and [19], and its applications to high-dimensional PDEs in [23], [27], [28], [30], [33], [46] and [52]. Most of these papers employed the rectified linear unit (ReLU) as the activation function of deep neural networks, since the ReLU is simple and preferable in many applications. The output of such a deep neural network is a continuous piecewise linear function, which is easily and cheaply computed. The reader can consult the recent survey papers [21] and [47] for various problems and aspects of neural network approximation and for further bibliography.
Recently, a number of papers have been devoted to various problems and methods of deep neural network approximation for parametric and stochastic PDEs, such as dimensionality reduction [58], deep neural network expression rates for generalized polynomial chaos (GPC) expansions of solutions to parametric elliptic PDEs [17], [49], reduced basis methods [38], the problem of learning the discretized parameter-to-solution map in practice [26], Bayesian PDE inversion [33], [34], [45] and so on. Note that, except for [17], all these papers treated parametric and stochastic PDEs with affine inputs on the compact set {\mathbb I}^\infty:=[-1,1]^\infty. The authors of [49] proved dimension-independent deep neural network expression rate bounds for the uniform approximation of solutions to parametric elliptic PDEs with affine inputs on {\mathbb I}^\infty, based on n-term truncations of the nonorthogonal Taylor GPC expansion. The construction of approximating deep neural networks relies on the weighted summability of the Taylor GPC expansion coefficients of the solution, which is derived from its analyticity. The paper [17] investigated nonadaptive methods of deep ReLU neural network approximation of the solution u to parametric and stochastic elliptic PDEs with lognormal inputs on the noncompact set {\mathbb R}^\infty. The approximation error is measured in the norm of the Bochner space L_2({\mathbb R}^\infty,V,\gamma), where \gamma is the tensor-product standard Gaussian probability measure on {\mathbb R}^\infty and V is the energy space. The approximation is based on an m-term truncation of the Hermite GPC expansion of u. Under a certain assumption on the \ell_q-summability (0<q<\infty) of the lognormal inputs, it was proved that for every integer n>1 one can construct a nonadaptive compactly supported deep ReLU neural network \boldsymbol{\phi}_n of size \leqslant n on {\mathbb R}^m with m=\mathcal O(n/\log n), having m outputs, so that the sum obtained by replacing the Hermite polynomials in the m-term truncation by these m outputs approximates u with error bound \mathcal O((n/\log n)^{-1/q}). The authors of [17] also obtained some results on similar problems for parametric and stochastic elliptic PDEs with affine inputs, based on the Jacobi and Taylor GPC expansions.
In the present paper we are interested in constructing deep ReLU neural networks for collocation approximation of the solution to parametric elliptic PDEs with lognormal inputs. We study the convergence rate of this approximation in terms of the size of deep ReLU neural networks.
Let D⊂Rd be a bounded Lipschitz domain. Consider the diffusion elliptic equation
\begin{equation}
-\operatorname{div}(a\nabla u)=f \quad\text{in } D, \qquad u|_{\partial D}=0,
\end{equation}
\tag{1.1}
for a prescribed right-hand side f and diffusion coefficient a as functions on D. Denote by V:=H^1_0(D) the so-called energy space of all those functions in the Sobolev space H^1(D) that vanish on the boundary \partial D. Let H^{-1}(D) be the dual space of V. Assume that f\in H^{-1}(D) (in what follows this preliminary assumption always holds without mention). If a\in L_\infty(D) satisfies the ellipticity assumption
\begin{equation*}
0< a_{\min} \leqslant a(\boldsymbol{x}) \leqslant a_{\max}<\infty \quad \text{for a.e. } \boldsymbol{x}\in D,
\end{equation*}
\notag
then by the well-known Lax-Milgram lemma there exists a unique solution u \in V to the (non-parametric) equation (1.1) in the weak form
\begin{equation*}
\int_{D} a\nabla u \cdot \nabla v \, \mathrm d \boldsymbol{x}=\langle f , v \rangle \quad \forall\, v \in V.
\end{equation*}
\notag
For equation (1.1) we consider diffusion coefficients having a parametrized form a=a(\boldsymbol{y}), where \boldsymbol{y}=(y_j)_{j \in \mathbb{N}} is a sequence of real-valued parameters ranging in the set {\mathbb R}^\infty. Denote by u(\boldsymbol{y}) the solution to the parametrized diffusion elliptic equation
\begin{equation}
-\operatorname{div}\bigl(a(\boldsymbol{y})\nabla u(\boldsymbol{y})\bigr)=f \quad\text{in } D, \qquad u(\boldsymbol{y})|_{\partial D}=0.
\end{equation}
\tag{1.2}
The resulting solution operator maps \boldsymbol{y}\in {\mathbb R}^\infty to u(\boldsymbol{y})\in V. The goal is to achieve numerical approximation of this complex map by a small number of parameters with a guaranteed error in a given norm. Depending on the nature of the object modelled, the parameter \boldsymbol{y} can be either deterministic or random. In our paper we consider the so-called lognormal case when the diffusion coefficient a is of the form
\begin{equation}
a=a(\boldsymbol{y})=\exp\bigl(b(\boldsymbol{y})\bigr)
\end{equation}
\tag{1.3}
with
\begin{equation}
b(\boldsymbol{y})=\sum_{j=1}^\infty y_j\psi_j,
\end{equation}
\tag{1.4}
where the y_j are independent and identically distributed standard Gaussian random variables and \psi_j \in L_\infty(D). We also consider the finite-dimensional form when
\begin{equation}
b(\boldsymbol{y})=\sum_{j=1}^M y_j\psi_j
\end{equation}
\tag{1.5}
for a finite but very large dimension M. Notice that for each fixed \boldsymbol{y} both cases (1.4) and (1.5) of equation (1.2) satisfy the ellipticity assumption, and therefore there exists a unique solution u(\boldsymbol{y}) \in V to equation (1.2) in the weak form. However, there is no uniform ellipticity with respect to \boldsymbol{y}, since {\mathbb R}^\infty and {\mathbb R}^M are not compact sets.
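Before describing the main results, the following minimal Python sketch may help to fix ideas about what a single 'particular solver' amounts to: for one fixed parameter vector \boldsymbol{y} it assembles the lognormal coefficient a(\boldsymbol{y})=\exp\bigl(\sum_j y_j\psi_j\bigr) in a toy one-dimensional setting D=(0,1) with two hypothetical modes \psi_1,\psi_2 and solves the weak formulation by piecewise-linear finite elements. It is only an illustrative stand-in and not part of the constructions of this paper.
\begin{verbatim}
import numpy as np

def particular_solver(y, psi, f, N=200):
    """A toy 1-D stand-in (D=(0,1)) for one 'particular solver': for a fixed
    parameter vector y it forms the lognormal coefficient
    a(x) = exp(sum_j y_j * psi_j(x)) and solves -(a u')' = f, u(0)=u(1)=0,
    by piecewise-linear finite elements on a uniform mesh."""
    h = 1.0 / N
    mid = (np.arange(N) + 0.5) * h                          # element midpoints
    a = np.exp(sum(yj * p(mid) for yj, p in zip(y, psi)))   # coefficient per element
    A = np.zeros((N - 1, N - 1))                            # stiffness matrix (interior nodes)
    for e in range(N):
        k = a[e] / h
        for i in (e - 1, e):                                # interior indices of nodes e and e+1
            if 0 <= i < N - 1:
                A[i, i] += k
        if 1 <= e <= N - 2:
            A[e - 1, e] -= k
            A[e, e - 1] -= k
    x = np.linspace(h, 1 - h, N - 1)                        # interior nodes
    F = h * f(x)                                            # lumped load vector
    return x, np.linalg.solve(A, F)

# Usage with two hypothetical parametric modes psi_1, psi_2 and f = 1:
psi = [lambda x: 0.5 * np.sin(np.pi * x), lambda x: 0.25 * np.sin(2 * np.pi * x)]
x, u = particular_solver(y=[0.3, -1.2], psi=psi, f=lambda x: np.ones_like(x))
print(float(u.max()))
\end{verbatim}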
We describe briefly the main results of our paper.
We investigate nonadaptive collocation methods for high-dimensional deep ReLU neural network approximation of the solution u(\boldsymbol{y}) to parametrized diffusion elliptic PDEs (1.2) with lognormal inputs (1.3) in the infinite-dimensional case (1.4) and the finite-dimensional case (1.5). In the infinite-dimensional case (1.4) the approximation error is measured in the norm of the Bochner space L_2({\mathbb R}^\infty, V, \gamma), where \gamma is the infinite tensor-product standard Gaussian probability on {\mathbb R}^\infty. Assume that there exists an increasing sequence \boldsymbol{\rho}= (\rho_{j})_{j \in \mathbb N} of positive numbers strictly larger than 1 such that for some 0<q<2,
Then, given an arbitrary number \delta such that 0<\delta < \min (1, 1/q -1/2), for every integer n > 1 we can construct a deep ReLU neural network \boldsymbol{\phi}_n:= (\phi_j)_{j=1}^m of size at most n on {\mathbb R}^m with m=\mathcal O (n^{1-\delta}), and a sequence of points Y_n:=(\boldsymbol{y}^j)_{j=1}^m \subset {\mathbb R}^m, so that
(i) the deep ReLU neural network \boldsymbol{\phi}_n and sequence of points Y_n are independent of u;
(ii) the output dimension of \boldsymbol{\phi}_n is m=\mathcal O (n^{1-\delta});
(iii) the depth of \boldsymbol{\phi}_n is \mathcal{O}(n^\delta);
(iv) the components \phi_j, j = 1,\dots,m, of \boldsymbol{\phi}_n are deep ReLU neural networks on \mathbb{R}^{m_j} for m_j = \mathcal{O}(n^\delta), having support in the super-cube [-T,T]^{m_j}, where {T=\mathcal O (n^{1-\delta})};
(v) if \Phi_j is the extension of \phi_j to the whole {\mathbb R}^\infty by \Phi_j(\boldsymbol{y})=\phi_j\bigl((y_j)_{j=1}^{m_j}\bigr) for {\boldsymbol{y}=(y_j)_{j\in \mathbb N} \in {\mathbb R}^\infty}, then the collocation approximation of u by the function
We also obtain results similar to properties (i)–(v) in the finite-dimensional case (1.5), with the approximation error measured in the \sqrt{g_M}-weighted uniform norm of the Bochner space L_\infty^{\sqrt{g}}({\mathbb R}^M, V), where g_M is the density function of the standard Gaussian probability measure on {\mathbb R}^M.
These results are derived from results on deep ReLU neural network collocation approximation of functions in Bochner spaces related to a general separable Hilbert space and standard Gaussian probability measures, which are based on weighted \ell_2-summabilities of the Hermite GPC expansion coefficients of functions (see § 3 for details).
Notice that the error bound in m in (1.6) is the same as the error bound for the collocation approximation of u by the sparse-grid Lagrange GPC interpolation based on the same m particular solvers (u(\boldsymbol{y}^j))_{j=1}^m, which is the best result known so far (see [15], Corollary 3.1). Moreover, the convergence rate (1-\delta)(1/q - 1/2) with arbitrarily small \delta > 0 in terms of the size of the deep ReLU network in collocation approximation is comparable with the convergence rate 1/q - 1/2 with respect to the number of particular solvers in collocation approximation by sparse-grid Lagrange GPC interpolation. This is a crucial difference between the results in our paper and in [17], where the convergence rate was found for deep ReLU network approximation of solutions to parametrized diffusion elliptic PDEs (1.2) with lognormal inputs (1.3) based on different input information, namely, the coefficients of the Hermite GPC expansion in its finite truncations. Although that convergence rate is sharper than the one in (1.6), it is well known that, in general, collocation approximations are more important, more difficult and more widely applicable than approximations using spectral information about the coefficients of an orthonormal expansion. The extension of the results (i)–(v) to the Bochner space L_\infty^{\sqrt{g}}({\mathbb R}^M, V) is also an important difference of our contribution in comparison with [17].
We would like to emphasize that the motivation of this paper is to establish approximation results showing the possibilities of nonadaptive collocation approximation by deep ReLU neural networks and the corresponding convergence rates for the parametrized diffusion elliptic equation (1.2) with lognormal inputs; we do not consider the numerical aspects of this problem. The results themselves do not give a practically realizable approximation because they do not cover the approximation of the coefficients, which are particular solvers depending on the spatial variables. Moreover, the approximant \Phi_n u is not a genuine deep ReLU network, but a combination of these particular solvers with components of a deep ReLU network. It would be interesting to investigate the problem of full deep ReLU neural network approximation of the solution u to parametric and stochastic elliptic PDEs combining the spatial and parametric domains, based on the fully discrete approximation in [3] and [15]. This problem will be discussed in a forthcoming paper.
The paper is organized as follows. In § 2 we present the necessary background on deep ReLU neural networks. Section 3 is devoted to collocation methods of deep ReLU neural network approximation of functions in the Bochner spaces L_2({\mathbb R}^\infty, X, \gamma) and L_2({\mathbb R}^M,X,\gamma) related to a separable Hilbert space X and the tensor-product standard Gaussian probability measure \gamma. In § 4 we apply the results of § 3 to the collocation approximation by deep ReLU neural networks of the solution u to the parametrized elliptic PDEs (1.2) with lognormal inputs (1.3) in the infinite-dimensional case (1.4) and the finite-dimensional case (1.5).
Notation
As usual, \mathbb{N} denotes the natural numbers, \mathbb{Z} the integers, \mathbb{R} the real numbers and \mathbb{N}_0:= \{s\in \mathbb{Z}\colon s\geqslant0\}. We denote by \mathbb{R}^\infty the set of all sequences \boldsymbol{y} = (y_j)_{j\in \mathbb{N}} with y_j\in \mathbb{R}. For a set G, we denote by |G| the cardinality of G. If \boldsymbol{a}= (a_j)_{j \in \mathcal{J}} is a sequence of positive numbers with any index set \mathcal{J}, then we use the notation \boldsymbol{a}^{-1}:= (a_j^{-1})_{j \in \mathcal{J}}. We use the letters C and K to denote general positive constants which can take different values, and we use C_{\alpha,\beta,\dots} and K_{\alpha,\beta,\dots} when we want to emphasize the dependence of these constants on \alpha,\beta,\dots, or when this dependence is important in a particular situation.
For the convenience of the reader we list some specific notation and definitions which are widely used in the present paper and indicate where they are introduced.
Section 2: the symbols W(\Phi), L(\Phi) and \operatorname{supp}(\Phi) denote the size, depth and support of the deep ReLU neural network \Phi, respectively; \sigma(t):= \max\{t,0\} is the ReLU activation function.
Subsection 3.1: \mathbb{F} denotes the set of all sequences of nonnegative integers {\boldsymbol{s}=(s_j)_{j \in \mathbb{N}}} such that their support \operatorname{supp} (\boldsymbol{s}):= \{j \in \mathbb{N}\colon s_j >0\} is a finite set. The letter J denotes either \infty or M \in \mathbb{N}; the set U is defined in (3.2), the set \mathcal{F} in (3.6), and the set \mathcal{N} in (3.7); \gamma and \gamma_M are the standard Gaussian measures in {\mathbb R}^\infty and {\mathbb R}^M, respectively. For \boldsymbol{s} \in \mathbb{F} put |\boldsymbol{s}|_1:= \sum_{j \in \mathbb{N}} s_j and |\boldsymbol{s}|_0:= |\operatorname{supp} (\boldsymbol{s})|. For \boldsymbol{s}, \boldsymbol{s}' \in \mathcal{F} the inequality \boldsymbol{s}' \leqslant \boldsymbol{s} means that s_j' \leqslant s_j for j \in \mathcal{N}. A set \boldsymbol{\sigma}=(\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}} with \sigma_{\boldsymbol{s}} \in \mathbb{R} is called increasing if \sigma_{\boldsymbol{s}'} \leqslant \sigma_{\boldsymbol{s}} for \boldsymbol{s}' \leqslant \boldsymbol{s}. The Bochner space \mathcal{L}(U,X) is defined in (3.5); the Bochner spaces L_2(U,X,\gamma) and L_{\infty}^{\sqrt{g}}({\mathbb R}^M,X) are given by (3.3) and (3.4), respectively. In (3.9), H_{\boldsymbol{s}} is defined as the \boldsymbol{s}th Hermite orthonormal polynomial and v_{\boldsymbol{s}} as the \boldsymbol{s}th coefficient of the Hermite GPC expansion of v.
Subsection 3.2: Y_m = (y_{m;k})_{k \in \pi_m} is the increasing sequence of the m+1 roots of the Hermite polynomial H_{m+1}; I_m is the Lagrange interpolation operator defined by (3.12); \lambda_m is the Lebesgue constant defined by (3.13); \Delta_{\boldsymbol{s}} is the tensor product operator defined by (3.16); I_\Lambda is the GPC interpolation operator defined by (3.18); the set \boldsymbol{p}(\theta, \lambda):=(p_{\boldsymbol{s}}(\theta, \lambda))_{\boldsymbol{s} \in \mathcal F} is defined by (3.20); the set \Lambda(\xi) is defined by (3.21) and the set G(\xi) by (3.23).
§ 2. Deep ReLU neural networks
In this section we present some auxiliary knowledge on deep ReLU neural networks, which will be used as a tool of approximation. As in [59], we use deep feed-forward neural networks that allow connections between neurons in a layer with neurons in any preceding layers (but not in the same layer). The ReLU activation function is defined by \sigma(t):= \max\{t,0\}, t\in \mathbb{R}. We set \sigma(\boldsymbol{x}):= (\sigma(x_1),\dots, \sigma(x_d)) for \boldsymbol{x}=(x_1,\dots,x_d) \in {\mathbb R}^d.
Let us recall a standard definition of a deep ReLU neural network and some relevant terminology. Let d,L\in \mathbb{N}, L\geqslant 2, N_0=d, and N_1,\dots,N_{L}\in \mathbb{N}. Let \boldsymbol{W}^\ell=(w^\ell_{i,j})\in \mathbb R^{N_\ell\times (\sum_{i=0}^{\ell-1}N_i)}, \ell=1,\dots,L, be an N_\ell\times \bigl(\sum_{i=0}^{\ell-1}N_i\bigr) matrix, and let \boldsymbol{b}^\ell =(b^\ell_j)\in \mathbb{R}^{N_\ell}. A ReLU neural network \Phi (on {\mathbb R}^d) with input dimension d, output dimension N_L and L layers is a sequence of matrix-vector tuples
\begin{equation*}
\Phi=\bigl((\boldsymbol{W}^1,\boldsymbol{b}^1),\dots,(\boldsymbol{W}^L,\boldsymbol{b}^L)\bigr).
\end{equation*}
\notag
We call \boldsymbol{z}^0 the input and, with a certain ambiguity, we use the notation \Phi(\boldsymbol{x}):= \boldsymbol{z}^L for the output of \Phi, which is an N_L-dimensional vector function on {\mathbb R}^d. In some places we identify a ReLU neural network with its output. We adopt the following terminology:
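The forward computation behind this definition can be sketched as follows; this is a minimal illustration under the assumption (made here for concreteness, since the displayed recursion is not reproduced above) that layer \ell acts on the concatenation of the input \boldsymbol{z}^0=\boldsymbol{x} and all previously computed layers \boldsymbol{z}^1,\dots,\boldsymbol{z}^{\ell-1}, with the ReLU applied componentwise in the hidden layers and the last layer affine.
\begin{verbatim}
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def evaluate(network, x):
    """Evaluate Phi = ((W^1, b^1), ..., (W^L, b^L)) at x, assuming that layer l
    is connected to all preceding layers (including the input z^0 = x), that
    the ReLU acts in layers 1,...,L-1 and that the output layer is affine."""
    zs = [np.asarray(x, dtype=float)]              # z^0, z^1, ...
    L = len(network)
    for l, (W, b) in enumerate(network, start=1):
        z_prev = np.concatenate(zs)                # concatenation of z^0,...,z^{l-1}
        a = W @ z_prev + b
        zs.append(a if l == L else relu(a))
    return zs[-1]                                  # the output Phi(x) = z^L

# Tiny usage example with d = N_0 = 2, N_1 = 3, N_2 = 1:
rng = np.random.default_rng(0)
net = [(rng.standard_normal((3, 2)), rng.standard_normal(3)),   # W^1 is N_1 x N_0
       (rng.standard_normal((1, 5)), rng.standard_normal(1))]   # W^2 is N_2 x (N_0 + N_1)
print(evaluate(net, [0.5, -1.0]))
\end{verbatim}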
There are two basic operations that neural networks allow: the parallelization of several neural networks and the concatenation of two neural networks. The reader can find detailed descriptions of these operations, as well as the following two lemmas about them, in [32], for instance (see also [21] and [47]).
Lemma 2.1 (parallelization). Let N\in \mathbb{N} and \lambda_j\in \mathbb{R}, j=1,\dots,N. Let \Phi_j, j=1,\dots,N be deep ReLU neural networks with input dimension d. Then we can explicitly construct a deep ReLU neural network denoted by \Phi so that
The deep ReLU neural network \Phi is called the parallelization of \Phi_j, j=1,\dots,N.
Lemma 2.2 (concatenation). Let \Phi_1 and \Phi_2 be two ReLU neural networks such that the output layer of \Phi_1 has the same dimension as the input layer of \Phi_2. Then we can explicitly construct a ReLU neural network \Phi such that \Phi(\boldsymbol{x})=\Phi_2(\Phi_1(\boldsymbol{x})) for \boldsymbol{x}\in \mathbb{R}^d. Moreover we have
The deep ReLU neural network \Phi is called the concatenation of \Phi_1 and \Phi_2.
The following lemma is a direct consequence of Proposition 3.3 in [49].
Lemma 2.3. Let \boldsymbol{\ell} \in {\mathbb N}^d. For every \delta \in (0,1) we can explicitly construct a deep ReLU neural network \Phi_P on {\mathbb R}^d so that
Furthermore, if x_j=0 for some j\in \{1,\dots,d\} then \Phi_P(\boldsymbol{x})=0 and there exists a positive constant C independent of \delta, d and \boldsymbol{\ell} such that
For j=0,1 let \varphi_j be the continuous piecewise linear functions with break points \{-2,-1,1,2\} and \operatorname{supp}(\varphi_j) \subset [-2,2] such that \varphi_0(x)=1 and \varphi_1(x)=x if x\in [-1,1].
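For instance (one possible explicit realization, not necessarily the construction behind the size bounds quoted in the proof below), one can take
\begin{equation*}
\varphi_0(x)=\sigma(x+2)-\sigma(x+1)-\sigma(x-1)+\sigma(x-2), \qquad \varphi_1(x)=-\sigma(x+2)+2\sigma(x+1)-2\sigma(x-1)+\sigma(x-2),
\end{equation*}
\notag
each of which is a shallow ReLU neural network with four hidden neurons, vanishes outside [-2,2], and agrees with 1 and x, respectively, on [-1,1].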
Lemma 2.4. Let \boldsymbol{\ell} \in {\mathbb N}^d, and let \varphi be either \varphi_0 or \varphi_1. Then for every \delta \in (0,1) we can explicitly construct a deep ReLU neural network \Phi on {\mathbb R}^d so that
This yields that the \varphi_j can be realized exactly by a shallow ReLU neural network (still denoted by \varphi_j) of size W(\varphi_0)\leqslant 10 and W(\varphi_1)\leqslant 8. The network \Phi can be constructed as a concatenation of the deep ReLU neural networks \{\varphi(x_j)\}_{j=1}^d and \Phi_P. By the definitions of a deep ReLU neural network and the function \varphi we have
Hence estimates (2.1) follow directly from Lemmas 2.2 and 2.3. The lemma is proved.
§ 3. Deep ReLU neural network approximation in Bochner spaces
In this section we investigate collocation methods of deep ReLU neural network approximation of functions in Bochner spaces related to a Hilbert space X and tensor-product standard Gaussian probability measures \gamma. Functions to be approximated have the weighted \ell_2-summable Hermite GPC expansion coefficients (see Assumption (I) below). The approximation is based on sparse-grid Lagrange GPC interpolation. We develop such methods and establish the convergence rates of the approximation using them. The results obtained in this section are applied to deep ReLU neural network collocation approximation of the solution of parametrized elliptic PDEs with lognormal inputs in the next section.
3.1. Tensor-product Gaussian measures and Bochner spaces
Let \gamma(y) be the standard Gaussian probability measure on \mathbb{R} with density
\begin{equation}
g(y):=\frac 1 {\sqrt{2\pi}} e^{-y^2/2}, \quad\text{so that } \mathrm d\gamma(y):=g(y)\,\mathrm d y .
\end{equation}
\tag{3.1}
For M \in \mathbb{N} the standard Gaussian probability measure \gamma(\boldsymbol{y}) on {\mathbb R}^M can be defined by
\begin{equation*}
\mathrm d \gamma(\boldsymbol{y}) := g_M(\boldsymbol{y})\,\mathrm d \boldsymbol{y}=\bigotimes_{j=1}^M g(y_j)\,\mathrm d y_j, \qquad \boldsymbol{y}=(y_j)_{j=1}^M \in {\mathbb R}^M,
\end{equation*}
\notag
where g_M(\boldsymbol{y}) := \bigotimes_{j=1}^M g(y_j).
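Written out explicitly, this density is the standard Gaussian density on {\mathbb R}^M:
\begin{equation*}
g_M(\boldsymbol{y})=\prod_{j=1}^M g(y_j)=(2\pi)^{-M/2}\exp\biggl(-\frac{1}{2}\sum_{j=1}^M y_j^2\biggr).
\end{equation*}
\notag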
Next we recall the concept of standard Gaussian probability measure \gamma(\boldsymbol{y}) on {\mathbb R}^\infty as the infinite tensor product of the standard Gaussian probability measures \gamma(y_i):
The sigma algebra for \gamma(\boldsymbol{y}) is generated by the set of cylinders A:= \prod_{j \in \mathbb{N}} A_j, where A_j \subset \mathbb{R} are univariate \gamma-measurable sets and only a finite number of the A_j are different from \mathbb{R}. For such a set A we have \gamma(A) = \prod_{j \in \mathbb{N}} \gamma(A_j). (For details on infinite tensor products of probability measures, see, for example, [35], pp. 429–435.)
In what follows we use the unified notation: J denotes either \infty or M \in \mathbb{N} and
If X is a separable Hilbert space, then the standard Gaussian probability measure \gamma on U induces the Bochner space L_2(U,X,\gamma) of \gamma-measurable mappings v from U to X, equipped with the norm
For a \gamma-measurable subset \Omega of U the spaces L_2(\Omega,X,\gamma) and L_2(\Omega,\gamma) are defined in the usual way.
In the case U={\mathbb R}^M we also introduce the space L_{\infty}^{\sqrt{g}}({\mathbb R}^M,X) as the set of all strongly \gamma-measurable functions v\colon {\mathbb R}^M \to X with \sqrt{g_M}-weighted uniform norm
One may expect an infinite-dimensional version of this space. Unfortunately, we could not give a consistent definition of L_{\infty}^{\sqrt{g}}({\mathbb R}^\infty,X) because there is no infinite-dimensional counterpart of the weight g_M. However, under certain assumptions (see Assumption (I) below) we can obtain some approximation results which do not depend on M, in particular when M is very large. We make use of the shorthand notation L_{\infty}^{\sqrt{g}}({\mathbb R}^M)= L_{\infty}^{\sqrt{g}}({\mathbb R}^M,\mathbb{R}) and L_{\infty}^{\sqrt{g}}(\mathbb{R})= L_{\infty}^{\sqrt{g}}(\mathbb{R},\mathbb{R}).
In this section we investigate the problem of deep ReLU neural network approximation of functions in L_2({\mathbb R}^\infty, X, \gamma) or L_2({\mathbb R}^M, X, \gamma) with the error measured in the norm of the space L_2({\mathbb R}^\infty, X, \gamma) or L_\infty^{\sqrt{g}}({\mathbb R}^M,X), respectively. (Notice that these norms are the most important ones in the evaluation of the error of collocation approximation of solutions to parametric and stochastic PDEs.) It is convenient for us to treat these different approximation problems in a unified way. Hence, in what follows we use the unified notation
Here \mathbb{F} is the set of all sequences of nonnegative integers \boldsymbol{s}=(s_j)_{j \in \mathbb{N}} such that their support \operatorname{supp} (\boldsymbol{s}):= \{j \in \mathbb{N}\colon s_j >0\} is a finite set.
Let (H_k)_{k \in \mathbb{N}_0} be the Hermite polynomials normalized by \displaystyle\int_{\mathbb{R}} |H_k(y)|^2 g(y)\,\mathrm{d}y=1. Then a function v \in L_2(U,X,\gamma) can be represented by the Hermite GPC expansion
Notice that (H_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}} is an orthonormal basis of L_2(U,\gamma):= L_2(U,\mathbb{R}, \gamma). Moreover, for every v \in L_2(U,X,\gamma) represented by the series (3.8), Parseval’s identity holds
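A quick numerical sanity check of this orthonormality (a sketch, assuming the identification H_k=\mathrm{He}_k/\sqrt{k!} with the probabilists' Hermite polynomials \mathrm{He}_k of numpy, which is consistent with the normalization above):
\begin{verbatim}
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, hermeval

# He_k satisfy  int He_j(y) He_k(y) exp(-y^2/2) dy = sqrt(2*pi) * k! * delta_{jk},
# so H_k := He_k / sqrt(k!) is orthonormal w.r.t. g(y) = exp(-y^2/2)/sqrt(2*pi).
def H(k, y):
    c = np.zeros(k + 1); c[k] = 1.0 / np.sqrt(factorial(k))
    return hermeval(y, c)

nodes, weights = hermegauss(30)      # exact for polynomials of degree <= 59
gram = np.array([[np.sum(weights * H(j, nodes) * H(k, nodes)) / np.sqrt(2 * np.pi)
                  for k in range(5)] for j in range(5)])
print(np.round(gram, 12))            # approximately the 5 x 5 identity matrix
\end{verbatim}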
For \boldsymbol{s}, \boldsymbol{s}' \in \mathcal{F}, the inequality \boldsymbol{s}' \leqslant \boldsymbol{s} means that s_j' \leqslant s_j for j \in \mathcal{N}. A set \boldsymbol{\sigma}=(\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}} with \sigma_{\boldsymbol{s}} \in \mathbb{R} is called increasing if \sigma_{\boldsymbol{s}'} \leqslant \sigma_{\boldsymbol{s}} for \boldsymbol{s}' \leqslant \boldsymbol{s}.
Assumption (I). For v \in L_2(U,X,\gamma) represented by a series (3.8) there exists an increasing set \boldsymbol{\sigma} =(\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}} of positive numbers such that for some q such that 0< q < 2,
where the constants C_1 and C_2 are independent of J.
Here and in what follows, ‘independent of J’ means that C_1 and C_2 (and other constants) are independent of M when J=M, since we are interested in convergence rates and other asymptotic properties which do not depend on M and are based on Assumption (I).
Lemma 3.1. For v \in L_2(U,X,\gamma) satisfying Assumption (I) the series (3.8) converges absolutely, and therefore unconditionally, in \mathcal{L}(U,X) to v and
\begin{equation}
\sum_{\boldsymbol{s}\in\mathcal F} \|v_{\boldsymbol{s}}\|_{X} \leqslant C <\infty,
\end{equation}
\tag{3.11}
where the constant C is independent of J.
Proof. Applying Hölder's inequality and using Assumption (I), we obtain
This proves (3.11). Hence, by the equality \|H_{\boldsymbol{s}}\|_{L_2({\mathbb R}^\infty)}=1, \boldsymbol{s} \in \mathbb{F}, and the inequality \|H_{\boldsymbol{s}}\|_{L_\infty^{\sqrt{g}}({\mathbb R}^M)} < 1, \boldsymbol{s} \in \mathbb{N}_0^M (which follows from (5.7)), the series (3.8) converges absolutely, and therefore unconditionally, to v in \mathcal{L}(U,X), since by Parseval’s identity it already converges to v in the norm of L_2(U,X,\gamma). The lemma is proved.
3.2. Sparse-grid Lagrange GPC interpolation
For m \in \mathbb{N}_0, let Y_m = (y_{m;k})_{k \in \pi_m} be the increasing sequence of the m+1 roots of the Hermite polynomial H_{m+1}, ordered as follows:
(in particular, I_0(v) = v(y_{0;0})L_{0;0}(y)= v(0) and L_{0;0}(y)=1). Notice that I_m(v) is a function on \mathbb{R} taking values in X and interpolating v at y_{m;k}, that is, I_m(v)(y_{m;k}) = v(y_{m;k}). Moreover, for a function v\colon \mathbb{R} \to \mathbb{R}, the function I_m(v) is the Lagrange polynomial of degree \leqslant m, and I_m(\varphi) = \varphi for every polynomial \varphi of degree {\leqslant m}.
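A minimal numerical sketch of the univariate operator I_m, assuming the standard Lagrange basis L_{m;k}(y)=\prod_{j\neq k}(y-y_{m;j})/(y_{m;k}-y_{m;j}) (the explicit formula (3.12) is not reproduced above):
\begin{verbatim}
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def hermite_roots(m):
    """The m+1 roots of the (probabilists') Hermite polynomial H_{m+1},
    i.e. the Gauss-Hermite nodes, in increasing order."""
    return hermegauss(m + 1)[0]

def lagrange_interpolate(v, m):
    """Return I_m(v): the Lagrange interpolant of v at the roots of H_{m+1},
    assuming the standard Lagrange basis (a sketch of (3.12))."""
    y = hermite_roots(m)
    def L(k, t):
        others = np.delete(y, k)
        return np.prod((t - others) / (y[k] - others))
    return lambda t: sum(v(y[k]) * L(k, t) for k in range(m + 1))

# Usage: I_m reproduces polynomials of degree <= m exactly.
p = lambda t: 3 * t**4 - t + 1
Im_p = lagrange_interpolate(p, 4)
print(abs(Im_p(0.7) - p(0.7)))     # ~ 0 up to rounding error
\end{verbatim}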
be the Lebesgue constant. It was proved in [40], [41] and [54] that
\begin{equation*}
\lambda_m \leqslant C(m+1)^{1/6}, \qquad m \in \mathbb N,
\end{equation*}
\notag
for some positive constant C independent of m (with the obvious inequality \lambda_0(Y_0) \leqslant 1). Hence, for every \varepsilon > 0 there exists a positive constant C_\varepsilon \geqslant 1 independent of m such that
We will use a sparse-grid Lagrange GPC interpolation as an intermediate approximation in the deep ReLU neural network approximation of functions v\in L_2(U,X,\gamma). In order to have a consistent definition of the interpolation operator we have to impose some necessary restrictions on v. Let \mathcal{E} be a \gamma-measurable subset in U such that \gamma(\mathcal{E}) =1 and \mathcal{E} contains all \boldsymbol{y} \in U with |\boldsymbol{y}|_0 < \infty in the case when U={\mathbb R}^\infty, where |\boldsymbol{y}|_0 denotes the number of nonzero components y_j of \boldsymbol{y}. Given \mathcal{E} and a Hilbert space X, we define L_2^\mathcal{E}(U,X,\gamma) as the subspace in L_2(U,X,\gamma) of all elements v such that the value at a point v(\boldsymbol{y}) (of a representative of v) is well defined for all \boldsymbol{y} \in \mathcal{E}. In what follows \mathcal{E} is fixed.
For v \in L_2^\mathcal{E}(U,X,\gamma) we introduce the tensor product operator \Delta_{\boldsymbol{s}}, \boldsymbol{s} \in \mathcal{F}, by
where the univariate operator \Delta_{s_j} is successively applied to the univariate function \bigotimes_{i<j} \Delta_{s_i}(v) by considering it as a function of the variable y_j with the other variables fixed. From the definition of L_2^\mathcal{E}(U,X,\gamma) one can see that the operators \Delta_{\boldsymbol{s}} are well defined for all \boldsymbol{s} \in \mathcal{F}. For \boldsymbol{s} \in \mathcal{F} we set
(the function I_{\boldsymbol{s}}(v) is defined in the same manner as \Delta_{\boldsymbol{s}}(v)).
For \boldsymbol{s} \in \mathcal{F} and \boldsymbol{k} \in \pi_{\boldsymbol{s}}, let E_{\boldsymbol{s}} be the subset in \mathcal{F} of all \boldsymbol{e} such that e_j is either 1 or 0 if s_j > 0, and e_j is 0 if s_j = 0, and let \boldsymbol{y}_{\boldsymbol{s};\boldsymbol{k}}:= (y_{s_j;k_j})_{j \in \mathcal{N}} \in U. Put |\boldsymbol{s}|_1 := \sum_{j \in \mathcal{N}} s_j for \boldsymbol{s} \in \mathcal{F}. It is easy to check that the interpolation operator \Delta_{\boldsymbol{s}} can be represented in the form
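For orientation, assuming the univariate differences are \Delta_{k}:=I_{k}-I_{k-1} with I_{-1}:=0 (which is consistent with the signs (-1)^{|\boldsymbol{e}|_1} appearing in (3.33) below), this representation reads, for a multi-index \boldsymbol{s} with exactly two nonzero components s_i,s_j>0,
\begin{equation*}
\Delta_{\boldsymbol{s}}=\sum_{\boldsymbol{e}\in E_{\boldsymbol{s}}}(-1)^{|\boldsymbol{e}|_1}I_{\boldsymbol{s}-\boldsymbol{e}} =I_{(s_i,s_j)}-I_{(s_i-1,s_j)}-I_{(s_i,s_j-1)}+I_{(s_i-1,s_j-1)}.
\end{equation*}
\notag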
A set \Lambda \subset \mathcal{F} is called downward closed if the inclusion \boldsymbol{s} \in \Lambda yields the inclusion \boldsymbol{s}' \in \Lambda for every \boldsymbol{s}' \in \mathcal{F} such that \boldsymbol{s}' \leqslant \boldsymbol{s}.
For \theta, \lambda \geqslant 0 we define the set \boldsymbol{p}(\theta, \lambda):= (p_{\boldsymbol{s}}(\theta, \lambda))_{\boldsymbol{s} \in \mathcal F} by
with shorthand notation p_{\boldsymbol{s}}(\theta):= p_{\boldsymbol{s}}(\theta, 1) and \boldsymbol{p}(\theta):= \boldsymbol{p}(\theta, 1).
Let 0 < q < \infty, and let \boldsymbol{\sigma}= (\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}} be a set of positive numbers. For \xi >0 we define the set
The following theorem gives an estimate for the error of the approximation of v \in L_2^\mathcal{E}(U,X,\gamma) by the sparse-grid Lagrange GPC interpolation I_{\Lambda(\xi)} v at the sampling points in the set G(\xi), which will be used in the deep ReLU neural network approximation in the next subsection.
Theorem 3.1. Let v \in L_2^\mathcal{E}(U,X,\gamma) satisfy Assumption (I), and let \varepsilon >0 be a fixed number. Assume that \|\boldsymbol{p}(\theta/q,\lambda)\boldsymbol{\sigma}^{-1}\|_{\ell_q(\mathcal F)} \leqslant C < \infty, where \theta =7/3 + 2\varepsilon, \lambda:= C_\varepsilon is the constant from Lemma 3.2, and the constant C is independent of J. Then for each \xi > 1 we have
Corollary 3.1. Under the assumptions of Theorem 3.1, for each n > 1 we can construct a sequence of points Y_{\Lambda(\xi_n)}:= (\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e}; \boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)} so that |G(\xi_n)| \leqslant n and
where the constant C in (3.25) is independent of J, v and n.
Proof. Notice that this corollary was proved in [15], Corollary 3.1, for the case U={\mathbb R}^\infty. By Lemma 5.2, |G(\xi)| \leqslant C_q \xi for every \xi > 1. Hence the corollary follows from Theorem 3.1 by selecting \xi_n as the maximum number satisfying |G(\xi_n)| \leqslant n.
3.3. Approximation by deep ReLU neural networks
In this section we construct deep ReLU neural networks for the collocation approximation of functions v \in L_2(U,X,\gamma). We first approximate v by the sparse-grid Lagrange GPC interpolation I_{\Lambda(\xi)} v. Under the assumptions of Lemma 5.1, (iii), I_{\Lambda(\xi)} v can be seen as a function on {\mathbb R}^m, where m :=\min\{M,\lfloor K_q \xi \rfloor\}. At the next step we approximate I_{\Lambda(\xi)}v by its truncation I_{\Lambda(\xi)}^{\omega}v on a sufficiently large super-cube
where the parameter \omega, depending on \xi, is chosen in an appropriate way. Finally, the function I_{\Lambda(\xi)}^{\omega}v, and therefore v, is approximated by a function \Phi_{\Lambda(\xi)}v on {\mathbb R}^m which is constructed from a deep ReLU neural network. Let us describe this construction.
For convenience, we consider \mathbb{R}^m as the subset of all \boldsymbol{y} \in U such that y_j = 0 for j > m. If f is a function on {\mathbb R}^m taking values in a Hilbert space X, then f has an extension to \mathbb{R}^{m'} for m' > m and to the whole of U, denoted again by f, given by the formula f(\boldsymbol{y})=f\bigl((y_j)_{j=1}^m\bigr) for \boldsymbol{y} = (y_j)_{j =1}^{m'} and \boldsymbol{y} = (y_j)_{j \in \mathcal{N}}, respectively.
Suppose that deep ReLU neural networks \phi_{\boldsymbol{s} - \boldsymbol{e};\boldsymbol{k}} on \mathbb{R}^{|\operatorname{supp}(\boldsymbol{s})|} have already been constructed for the approximation of the polynomials L_{\boldsymbol{s} - \boldsymbol{e};\boldsymbol{k}}, (\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi). Then the network \boldsymbol{\phi}_{\Lambda(\xi)}:= (\phi_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} on \mathbb{R}^m with |G(\xi)| outputs, which is constructed by parallelization, is used to build an approximation of I_{\Lambda(\xi)}^{\omega}v, and therefore of v. Namely, we approximate v by
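The assembly of such an approximant from the particular solvers and the network components can be sketched as follows; this is a hedged illustration of the form (3.27) with hypothetical toy data, not the construction used in the proofs.
\begin{verbatim}
import numpy as np

def assemble_approximant(G, solver_values, phi):
    """Sketch of an approximant of the form (3.27):
       Phi(y) = sum_{(s,e,k) in G} (-1)^{|e|_1} v(y_{s-e;k}) phi_{s-e;k}(y),
    where solver_values[(s,e,k)] stands for the vector v(y_{s-e;k}) (an element
    of X, here a numpy array) and phi[(s,e,k)] is the corresponding
    ReLU-network component, given as a callable on R^m."""
    def Phi(y):
        out = 0.0
        for (s, e, k) in G:
            sign = (-1) ** sum(e)                      # (-1)^{|e|_1}
            out = out + sign * solver_values[(s, e, k)] * phi[(s, e, k)](y)
        return out
    return Phi

# Toy usage with hypothetical index triples, solver values and components:
G = [((1, 0), (0, 0), (0,)), ((1, 0), (1, 0), (0,))]
solver_values = {G[0]: np.array([1.0, 2.0]), G[1]: np.array([0.5, 0.5])}
phi = {idx: (lambda y: max(1.0 - abs(y[0]), 0.0)) for idx in G}
u_approx = assemble_approximant(G, solver_values, phi)
print(u_approx(np.array([0.25, 0.0])))
\end{verbatim}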
In this section we prove our main results on deep ReLU neural network approximation of functions v \in L_2^\mathcal{E}(U,X,\gamma) with the error measured in the norm of the space L_2({\mathbb R}^\infty,X,\gamma) or L_\infty^{\sqrt{g}}({\mathbb R}^M,X), which are incorporated into the following common theorem.
Denote by \boldsymbol{e}^i = (e^i_j)_{j \in \mathcal{N}}\in \mathcal{F} the element such that e^i_i = 1 and e^i_j = 0 for j \ne i.
Theorem 3.2. Let v \in L_2^\mathcal{E}(U,X,\gamma) satisfy Assumption (I). Let \theta be any number such that \theta \geqslant 3/q. Assume that the set \boldsymbol{\sigma}= (\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}} in Assumption (I) satisfies \sigma_{\boldsymbol{e}^{i'}} \leqslant \sigma_{\boldsymbol{e}^i} for i' < i, and that \|\boldsymbol{p}(\theta)\boldsymbol{\sigma}^{-1} \|_{\ell_q(\mathcal F)} \leqslant C<\infty, where the constant C is independent of J. Let K_q, K_{q,\theta} and C_q be the constants in the assumptions of Lemmas 5.1 and 5.2. Then for every \xi > 2 we can construct a deep ReLU neural network \boldsymbol{\phi}_{\Lambda(\xi)}:= (\phi_{\boldsymbol{s}-\boldsymbol{e}; \boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} on \mathbb{R}^m, where
and a sequence of points Y_{\Lambda(\xi)}:= (\boldsymbol{y}_{\boldsymbol{s} -\boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} having the following properties:
Here the constants C are independent of J, v and \xi.
Let us briefly outline the plan of the proof of this theorem. We will present a detailed proof for U={\mathbb R}^\infty and then point out that the case U= {\mathbb R}^M can be proved in the same way with slight modifications.
In the rest of this section, all definitions, formulae and assertions are given for U= {\mathbb R}^\infty and \xi >1; we use the letters m and \omega only for the notation
where K_q and K_{q,\theta} are the constants defined in Lemma 5.1. As mentioned above, we first approximate v \in L_2({\mathbb R}^\infty,X,\gamma) by the GPC interpolation I_{\Lambda(\xi)} v. At the next step, we approximate I_{\Lambda(\xi)} v by its truncation I_{\Lambda(\xi)}^{\omega}v on the super-cube B^m_\omega, which we construct below. The final step is to construct a deep ReLU neural network \boldsymbol{\phi}_{\Lambda(\xi)}:= (\phi_{\boldsymbol{s}-\boldsymbol{e}; \boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} to approximate I_{\Lambda(\xi)}^{\omega}v by \Phi_{\Lambda(\xi)}v of the form (3.27).
For a function \varphi defined on \mathbb{R}, we denote by \varphi^{\omega} the truncation of \varphi on B^1_\omega, that is,
We have L_{\boldsymbol{s};\boldsymbol{k}}^{\omega}(\boldsymbol{y}) =\prod_{j=1}^m L_{s_j;k_j}(y_j) if \boldsymbol{y}\in B^m_\omega and L_{\boldsymbol{s};\boldsymbol{k}}^{\omega}(\boldsymbol{y})=0 otherwise. For a function v \in L_2^\mathcal{E}({\mathbb R}^\infty,X,\gamma), we define
\begin{equation}
I_{\Lambda(\xi)}^\omega(v) :=\sum_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)}(-1)^{|\boldsymbol{e}|_1} v(\boldsymbol{y}_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}) L_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}}^\omega.
\end{equation}
\tag{3.33}
Let the assumptions of Theorem 3.2 hold. By Lemma 5.1, (iii), for every \xi >2 we have m(\xi) \leqslant m. Hence, for every (\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi), L_{\boldsymbol{s} - \boldsymbol{e};\boldsymbol{k}} and L_{\boldsymbol{s} - \boldsymbol{e};\boldsymbol{k}}^\omega, and therefore I_{\Lambda(\xi)}v and I_{\Lambda(\xi)}^{\omega} v, can be considered as functions on {\mathbb R}^m. For g \in L_2({\mathbb R}^m,X,\gamma) we have \|g\|_{L_2({\mathbb R}^m,X,\gamma)}=\|g\|_{L_2({\mathbb R}^\infty,X,\gamma)} in the sense of an extension of g. We make use of these facts without mention.
To prove Theorem 3.2 we will use some intermediate approximations for estimation of the approximation error as in (3.30). Suppose that the deep ReLU neural network \boldsymbol{\phi}_{\Lambda(\xi)}, and therefore the function \Phi_{\Lambda(\xi)}, have already been constructed. By the triangle inequality we have
Hence the estimate (3.30) will be obtained via the bound C\xi^{-(1/q - 1/2)} for each of the four terms in the right-hand side. The first term has already been estimated in Theorem 3.1. The estimates for the others will be carried out in the following lemmas (Lemmas 3.3–3.5). To complete the proof of Theorem 3.2 we also have to prove bounds on the size and depth of \boldsymbol{\phi}_{\Lambda(\xi)} as in items (iii) and (iv); this is done in Lemma 3.6 below.
For v \in L_2^\mathcal{E}({\mathbb R}^\infty,X,\gamma) satisfying Assumption (I), by Lemma 3.1 the series (3.8) converges unconditionally to v in L_2({\mathbb R}^\infty,X,\gamma). Therefore, formula (3.19) for {\Lambda = \Lambda(\xi)} can be rewritten as
where L_{s_j - e_j;k_j} is a polynomial in the variable y_j, of degree not greater than m_1(\xi). Hence, applying Lemma 5.7 while taking account of (3.31) gives
Lemma 3.3 gives a bound on the second term in the right-hand side of (3.34), that is, an error bound for the approximation of the sparse-grid Lagrange interpolation I_{\Lambda(\xi)}v by its truncation I_{\Lambda(\xi)}^{\omega}v on B^m_\omega for v \in L_2({\mathbb R}^\infty,X,\gamma). As the next step we construct a deep ReLU neural network \boldsymbol{\phi}_{\Lambda(\xi)}:=(\phi_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} on {\mathbb R}^m for approximating I_{\Lambda(\xi)}^{\omega}v by the function \Phi_{\Lambda(\xi)}v given as in (3.27), and prove a bound on the error as the third term in the right-hand side of (3.34).
For s\in \mathbb{N}_0 we represent the univariate interpolation polynomial L_{s;k} in the form of a linear combination of monomials:
where the notation \sum_{\boldsymbol{\ell}=\boldsymbol{0}}^{\boldsymbol{s} - \boldsymbol{e}} means that the sum is taken over all \boldsymbol{\ell} such that \boldsymbol{0} \leqslant \boldsymbol{\ell} \leqslant {\boldsymbol{s} - \boldsymbol{e}}, and
Let \boldsymbol{\ell} \in \mathbb{F} be such that \boldsymbol{0} \leqslant \boldsymbol{\ell} \leqslant \boldsymbol{s} - \boldsymbol{e}. By definition we have \operatorname{supp}(\boldsymbol{\ell}) \subset \operatorname{supp}(\boldsymbol{s}). Making the change of variables
where K is the constant from Lemma 5.3. Hence applying Lemma 2.4 to the product in the left-hand side of (3.43), for every (\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi) and \boldsymbol{\ell} satisfying \boldsymbol{0} < \boldsymbol{\ell} \leqslant \boldsymbol{s}-\boldsymbol{e}, there exists a deep ReLU neural network \phi^{\boldsymbol{s} - \boldsymbol{e}}_{\boldsymbol{\ell}} on \mathbb{R}^{|{\operatorname{supp}(\boldsymbol{s})}|} such that \operatorname{supp}(\phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}}) \subset [-2,2]^{|{\operatorname{supp}(\boldsymbol{s})}|} and
Also, from Lemma 2.4 and the inequalities |\boldsymbol{\ell}|_1 +|\operatorname{supp}(\boldsymbol{s})\setminus\operatorname{supp}(\boldsymbol{\ell})| \leqslant |\boldsymbol{s}|_1 \leqslant \delta^{-1} one can see that
\begin{equation}
W(\phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}}) \leqslant C \bigl(1+|\boldsymbol{s}|_1(\log|\boldsymbol{s}|_1+\log\delta^{-1})\bigr) \leqslant C(1+|\boldsymbol{s}|_1\log\delta^{-1})
\end{equation}
\tag{3.48}
which is the parallelization deep ReLU neural network of the component deep ReLU neural networks \phi^{\boldsymbol{s} -\boldsymbol{e}}_{\boldsymbol{\ell}}({\cdot}/(2\sqrt{\omega})). From (3.47) it follows that
According to the above convention, for (\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi) we occasionally identify, without mention, the functions \phi^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}} and \phi_{\boldsymbol{s}-\boldsymbol{e};\boldsymbol{k}} on \mathbb{R}^{|{\operatorname{supp}(\boldsymbol{s})}|} with their extensions to {\mathbb R}^m or to {\mathbb R}^\infty in view of the inclusions \operatorname{supp}(\boldsymbol{s})\subset\{1,\dots,m\} \subset \mathbb{N}.
We define \boldsymbol{\phi}_{\Lambda(\xi)}:=(\phi_{\boldsymbol{s} -\boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi)} as the deep ReLU neural network on {\mathbb R}^m which is realized by parallelization of \phi_{\boldsymbol{s}-\boldsymbol{e}; \boldsymbol{k}}, (\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi). Consider the approximation of I_{\Lambda(\xi)}^{\omega}v by the function \Phi_{\Lambda(\xi)} v where for convenience we recall that
In Lemma 3.4 we proved a bound on the third term in the right-hand side of (3.34), that is, an error bound for the approximation of I_{\Lambda(\xi)}^{\omega}v by the function \Phi_{\Lambda(\xi)}v for v \in L_2({\mathbb R}^\infty,X,\gamma). As the last step of error estimation we establish a bound for the fourth term in the right-hand side of (3.34).
Lemma 3.5. Under the assumptions of Theorem 3.2, for every \xi > 1 we have
Proof. We use formula (3.55) to estimate the norm \|\Phi_{\Lambda(\xi)} v \|_{L_2(({\mathbb R}^m \setminus B^m_\omega),X,\gamma)}. We need the following auxiliary inequality
Due to (3.47), it is sufficient to prove this inequality for \boldsymbol{y} \in B^{|\operatorname{supp}(\boldsymbol{s})|}_{4\omega}. Considering the right-hand side of (3.45), we have
In combination with the definition (3.45), this yields \delta \leqslant 1. On the other hand, by the definition of h^{\boldsymbol{s}-\boldsymbol{e}}_{\boldsymbol{\ell}},
From the last two inequalities, (3.46) and the triangle inequality we derive (3.60) for \boldsymbol{y} \in B^{|\operatorname{supp}(\boldsymbol{s})|}_{4\omega}.
Using the tensor product argument from Lemma 5.6 and the inequality \boldsymbol{s} - \boldsymbol{e} \leqslant \boldsymbol{s} for \boldsymbol{e} \in E_{\boldsymbol{s}}, we deduce the estimates
By the assumptions of Theorem 3.2, \| {\boldsymbol{p}(\theta)\boldsymbol{\sigma}^{-1}}\|_ {\ell_q(\mathcal F)} \leqslant C < \infty for some \theta \geqslant 3/q. From this we derive that
To complete the proof of Theorem 3.2, we have to establish bounds on the size and depth of the deep ReLU neural network \boldsymbol{\phi}_{\Lambda(\xi)} as in (iii) and (iv).
Lemma 3.6. Under the assumptions of Theorem 3.2 the input dimension of \boldsymbol{\phi}_{\Lambda(\xi)} is at most \lfloor K_q \xi \rfloor for every \xi \,{>}\, 1, and the output dimension of \boldsymbol{\phi}_{\Lambda(\xi)} is at most \lfloor C_q \xi \rfloor,
\begin{equation}
W(\boldsymbol{\phi}_{\Lambda(\xi)}) \leqslant C \xi^{1+2/(\theta q)} \log \xi
\end{equation}
\tag{3.65}
and
\begin{equation}
L(\boldsymbol{\phi}_{\Lambda(\xi)}) \leqslant C \xi^{1/(\theta q)} (\log \xi)^2,
\end{equation}
\tag{3.66}
where the constants C are independent of v and \xi.
Proof. The input dimension of \boldsymbol{\phi}_{\Lambda(\xi)} is not greater than m(\xi), which is at most \lfloor K_q \xi \rfloor by Lemma 5.1, (iii). The output dimension of \boldsymbol{\phi}_{\Lambda(\xi)} is the quantity |G(\xi)| which is at most \lfloor C_q \xi \rfloor by Lemma 5.2.
By Lemmas 2.1 and 2.4 and (3.48) the size of \boldsymbol{\phi}_{\Lambda(\xi)} is estimated as
We are now in a position to give a formal proof of Theorem 3.2.
Proof of Theorem 3.2. From (3.34), Theorem 3.1 and Lemmas 3.3–3.5, for every {\xi \!>\! 2} we deduce that
\begin{equation*}
\|v-\Phi_{\Lambda(\xi)} v\|_{L_2({\mathbb R}^\infty,X,\gamma)} \leqslant C \xi^{-(1/q-1/2)}.
\end{equation*}
\notag
Claim (vi) is proved. Claim (i) follows directly from the construction of the deep ReLU neural network \boldsymbol{\phi}_{\Lambda(\xi)} and the sequence of points Y_{\Lambda(\xi)}, claim (ii) follows from Lemma 5.2, claims (iii) and (iv) from Lemma 3.6, and claim (v) from Lemma 5.1, (ii), and (3.51). Thus, Theorem 3.2 is proved for U={\mathbb R}^\infty.
The case U= {\mathbb R}^M can be dealt with in the same way with slight modifications. Counterparts of all definitions, formulae and assertions which were used in the proof of the case U= {\mathbb R}^\infty are true for U= {\mathbb R}^M. In the proof for this case, in particular, the equality \|H_{\boldsymbol{s}}\|_{L_2({\mathbb R}^\infty)}=1, \boldsymbol{s} \in \mathbb{F}, which we have used, is replaced by the inequality \|H_{\boldsymbol{s}}\|_{L_\infty^{\sqrt{g}}({\mathbb R}^M)} < 1, \boldsymbol{s} \in \mathbb{N}_0^M. The theorem is proved.
§ 4. Application to parametrized elliptic PDEs
In this section we apply the results of the previous section to the deep ReLU neural network approximation of the solution u to the parametrized elliptic PDEs (1.2) with lognormal inputs (1.3). This is based on the weighted \ell_2-summability of the series (\|u_{\boldsymbol{s}}\|_V)_{\boldsymbol{s} \in \mathcal{F}} established in the following lemma, which was proved in [4], Theorems 3.3 and 4.2.
Lemma 4.1. Assume that there exist a number q, 0<q<\infty, and an increasing sequence \boldsymbol{\rho} =(\rho_{j})_{j \in \mathcal{N}} of numbers strictly larger than 1 such that \|{\boldsymbol{\rho}^{-1}}\|_{\ell_q(\mathcal N)} \leqslant C < \infty and
The following two lemmas are proved in [15] (Lemmas 5.2 and 5.3, respectively).
Lemma 4.2. Let the assumptions of Lemma 4.1 hold. Then the solution map \boldsymbol{y} \mapsto u(\boldsymbol{y}) is \gamma-measurable and u \in L_2(U,V,\gamma). Moreover, u \in L_2^\mathcal{E}(U,V,\gamma), where the set
has measure \gamma(\mathcal{E}) =1 and contains all the \boldsymbol{y} \in {\mathbb R}^\infty with |\boldsymbol{y}|_0 < \infty in the case when U={\mathbb R}^\infty.
Lemma 4.3. Let 0 < q <\infty and let \boldsymbol{\rho}=(\rho_j) _{j \in \mathcal{N}} be a sequence of positive numbers such that \|{\boldsymbol{\rho}^{-1}} \|_{\ell_q(\mathcal N)} \leqslant C < \infty, where the constant C is independent of J. Let \theta be an arbitrary nonnegative number and \boldsymbol{p}(\theta)=(p_{\boldsymbol{s}}(\theta))_{\boldsymbol{s} \in \mathcal{F}} the set defined as in (3.20). For \eta \in \mathbb{N} let the set \boldsymbol{\sigma}=(\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}} be defined as in (4.1). Then for any \eta >2(\theta+1)/q, we have
Now we are in a position to formulate our main results on collocation deep ReLU neural network approximation of the solution u to parametric elliptic PDEs with lognormal inputs.
Theorem 4.1. Under the assumptions of Lemma 4.1, let 0 < q < 2. Then, given an arbitrary number \delta > 0, for every integer n > 2 we can construct a deep ReLU neural network \boldsymbol{\phi}_{\Lambda(\xi_n)}:=(\phi_{\boldsymbol{s} -\boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)} of size W(\boldsymbol{\phi}_{\Lambda(\xi_n)}) \leqslant n on \mathbb{R}^m, where
\begin{equation*}
m := \begin{cases} \min \biggl\{M, \biggl\lfloor K \biggl(\dfrac{n}{\log n}\biggr)^{1/(1+\delta)} \biggr\rfloor \biggr\} &\textit{if } U={\mathbb R}^M, \\ \biggl\lfloor K \biggl(\dfrac{n}{\log n}\biggr)^{1/(1+\delta)} \biggr\rfloor &\textit{if }U={\mathbb R}^\infty, \end{cases}
\end{equation*}
\notag
and a sequence of points Y_{\Lambda(\xi_n)}:=(\boldsymbol{y}_{\boldsymbol{s} -\boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)} having the following properties:
\begin{equation*}
\| u- \Phi_{\Lambda(\xi_n)} u \|_{\mathcal L(U,V)} \leqslant C \biggl(\frac{n}{\log n}\biggr)^{-(1/q-1/2)/(1+\delta)}.
\end{equation*}
\notag
Here the constants C, K and C_\delta are independent of J, u and n.
Proof. To prove the theorem we apply Theorem 3.2 to the solution u. Without loss of generality we can assume that \delta \leqslant 1/6. First we take the number \theta := 2/(\delta q), which satisfies \theta \geqslant 3/q, and then choose a number \eta \in \mathbb{N} satisfying \eta > 2(\theta + 1)/q. Using Lemmas 4.1–4.3 one can check that u \in L_2^\mathcal{E}(U,V,\gamma) satisfies the assumptions of Theorem 3.2 for X=V and the set (\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathbb{F}} defined as in (4.1), where \mathcal{E} is the set defined in Lemma 4.2. Given an integer n > 2, we choose \xi_n >2 as the largest number satisfying C\xi_n^{1+\delta} \log \xi_n \leqslant n, where C is the constant in claim (ii) of Theorem 3.2. It is easy to verify that there exist positive constants C_1 and C_2 independent of n such that
From Theorem 3.2 for \xi=\xi_n we deduce the required results. Theorem 4.1 is proved.
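The choice of \xi_n in the proof is elementary to carry out numerically; the following is a minimal sketch, with a hypothetical value of the constant C standing in for the constant of Theorem 3.2.
\begin{verbatim}
import math

def choose_xi(n, delta, C=1.0, xi_max=1.0e12):
    """Largest xi > 2 with C * xi**(1 + delta) * log(xi) <= n, as in the choice
    of xi_n in the proof of Theorem 4.1; C = 1.0 is a hypothetical stand-in
    for the constant appearing there."""
    f = lambda xi: C * xi ** (1 + delta) * math.log(xi)
    if f(2.0) > n:
        raise ValueError("n is too small for this C and delta")
    lo, hi = 2.0, xi_max
    for _ in range(200):                 # bisection on the monotone function f
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) <= n else (lo, mid)
    return lo

print(choose_xi(n=10**4, delta=1/6))     # approximately solves xi^(7/6) log(xi) = 10^4
\end{verbatim}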
From Theorem 4.1 one can directly derive the following.
Theorem 4.2. Under the assumptions of Lemma 4.1 let 0 < q < 2 and \delta_q:=\min (1, 1/q -1/2). Then, given an arbitrary number \delta \in (0,\delta_q), for every integer n > 1 we can construct a deep ReLU neural network \boldsymbol{\phi}_{\Lambda(\xi_n)}:=(\phi_{\boldsymbol{s} -\boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)} of size W\big(\boldsymbol{\phi}_{\Lambda(\xi_n)}\big) \leqslant n on \mathbb{R}^m, where
\begin{equation*}
m := \begin{cases} \min \{M, \lfloor K n^{1-\delta} \rfloor\}&\textit{if }U={\mathbb R}^M, \\ \lfloor K n^{1-\delta} \rfloor&\textit{if }U={\mathbb R}^\infty, \end{cases}
\end{equation*}
\notag
and a sequence of points Y_{\Lambda(\xi_n)}\!:=\! (\boldsymbol{y}_{\boldsymbol{s} - \boldsymbol{e};\boldsymbol{k}})_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)} with the following properties:
generated from the deep ReLU neural network \boldsymbol{\phi}_{\Lambda(\xi_n)} as in Theorem 4.2 and the collocation approximation of u by the sparse-grid Lagrange GPC interpolation
Both methods are based on the same m particular solvers (u(\boldsymbol{y}_{\boldsymbol{s}- \boldsymbol{e};\boldsymbol{k}}))_{(\boldsymbol{s},\boldsymbol{e},\boldsymbol{k}) \in G(\xi_n)}. From Corollary 3.1 one can see that under the assumptions of Theorem 4.2, for the last approximation we have the error bound in m
which is the same as the one in (4.3) for the first approximation, since by construction the parameter m in (4.3) can be treated as independent.
After the present paper and [17] appeared on the ArXiv website, we were informed about the paper [50], concerned with some problems similar to the ones considered in [17], through private communication with its authors.
§ 5. Appendix
5.1. Auxiliary lemmas
Lemma 5.1. Let \theta\geqslant 0 and 0<q<\infty. Let \boldsymbol{\sigma}= (\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}} be a set of numbers strictly larger than 1. Then the following hold.
(i) If \|{\boldsymbol{p} (\theta/q)\boldsymbol{\sigma}^{-1}}\|_{\ell_q(\mathcal F)} \leqslant K < \infty, where the constant K is independent of J, then
In particular, if \|{\boldsymbol{\sigma}^{-1}}\|_{\ell_q(\mathcal F)}^q \leqslant K_q < \infty, where the constant satisfies K_q \geqslant 1 and is independent of J, then the set \Lambda(\xi) is finite and
(ii) If \|{\boldsymbol{p}(\theta)\boldsymbol{\sigma}^{-1}} \|_{\ell_q(\mathcal F)}^{1/\theta} \leqslant K_{q,\theta} < \infty, where the constant K_{q,\theta} is independent of J, then
(iii) If \sigma_{\boldsymbol{e}^{i'}} \leqslant \sigma_{\boldsymbol{e}^i} for i' < i, and if \|\boldsymbol{\sigma}^{-1}\|_{\ell_q(\mathcal F)}^q \leqslant K_q < \infty, where the constant satisfies K_q \geqslant 1 and is independent of J, then
Proof. The claims (ii) and (iii) were proved in [17] (Lemmas 3.2 and 3.3, respectively) for \mathcal{F} = \mathbb{F}. The case \mathcal{F} = \mathbb{N}_0^M can be proved in a similar way. Let us prove claim (i). Indeed, for every \xi > 1 we have
Lemma 5.2. Let \theta\geqslant 0, 0<q<\infty and \xi > 1. Let \boldsymbol{\sigma}= (\sigma_{\boldsymbol{s}})_{\boldsymbol{s} \in \mathcal{F}} be a set of numbers strictly greater than 1. If \|{\boldsymbol{p} ((\theta+2)/q) \boldsymbol{\sigma}^{-1}}\|_{\ell_q(\mathcal F)} \leqslant C < \infty, where the constant C is independent of J, then
In particular, if, in addition, \|\boldsymbol{p}(2/q)\boldsymbol{\sigma}^{-1}\|_{\ell_q(\mathcal F)}^q \leqslant C_q < \infty, where the constant C_q is independent of J, then
where the constant K is independent of J and \boldsymbol{s}, \boldsymbol{s}'.
Proof. From Cramér’s bound (see, for instance, [15], Lemma 3.2) we deduce that
\begin{equation}
|H_s(y)\sqrt{g(y)}|<1 \quad \forall\, y \in \mathbb R \quad \forall\, s \in \mathbb N_0,
\end{equation}
\tag{5.7}
or, equivalently,
\begin{equation}
|H_s(y)|<(2\pi)^{1/4} e^{y^2/4} \quad \forall\, y \in \mathbb R, \quad \forall\, s \in \mathbb N_0.
\end{equation}
\tag{5.8}
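A direct numerical check of (5.7) on a finite grid (a sketch only, assuming the identification H_s=\mathrm{He}_s/\sqrt{s!} with the probabilists' Hermite polynomials \mathrm{He}_s used by numpy):
\begin{verbatim}
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

# Check |H_s(y)| * sqrt(g(y)) < 1 on a grid, for the orthonormal Hermite
# polynomials H_s = He_s / sqrt(s!) and g(y) = exp(-y^2/2) / sqrt(2*pi).
y = np.linspace(-30.0, 30.0, 20001)
sqrt_g = np.exp(-y**2 / 4) / (2 * np.pi) ** 0.25
worst = 0.0
for s in range(40):
    c = np.zeros(s + 1); c[s] = 1.0 / np.sqrt(factorial(s))
    worst = max(worst, np.max(np.abs(hermeval(y, c)) * sqrt_g))
print(worst)      # stays strictly below 1 on this grid
\end{verbatim}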
Let \boldsymbol{s}, \boldsymbol{s}' \in \mathcal{F} and \boldsymbol{k} \in \pi_{\boldsymbol{s}} be given. Notice that the univariate Hermite polynomials satisfy H_0 = 1, H_{2s+1}(0) = 0 and |H_{2s}(0)|\leqslant 1 for s \in \mathbb{N}_0. Hence, by (5.8) we have
where the constants K are independent of s and k \in \pi_{s}.
Proof. Notice that L_{s;k} is a polynomial with s simple zeros \{y_{s;j}\}_{j \in \pi_{s}, \, j \ne k}, and that L_{s;k}(y_{s;k}) =1. Moreover, there is no zero in the open interval (y_{s;k-1}, y_{s;k}) and
From the last equalities one can see that the lemma is trivial if y_0 = 0. Consider the case y_0 \ne 0. If |y_0| \leqslant 1, then from (5.25) we deduce that
Lemma 5.7. Let \varphi (\boldsymbol{y})= \prod_{j = 1}^m \varphi_j(y_j) for \boldsymbol{y} \in \mathbb{R}^m, where \varphi_j is a polynomial in the variable y_j of degree not greater than \omega for j=1,\dots,m. Then
This theorem was proved in [15], Corollary 3.11, for U={\mathbb R}^\infty. Let us prove it for U={\mathbb R}^M. By Lemma 3.1 the series (3.8) converges unconditionally to v in the space L_2({\mathbb R}^M,X,\gamma). Observe that I_{\Lambda(\xi)} H_{\boldsymbol{s}} = H_{\boldsymbol{s}} for every \boldsymbol{s} \in \Lambda(\xi) and \Delta_{\boldsymbol{s}} H_{\boldsymbol{s}'} = 0 for every \boldsymbol{s} \not\leqslant \boldsymbol{s}'. Hence for the downward closed set \Lambda(\xi) \subset {\mathbb N}_0^M we can write
Therefore, to prove the theorem it is sufficient to show that each term in the right-hand side is bounded by C\xi^{-(1/q - 1/2)}. The bound for the first term can be obtained from the Cauchy-Schwarz inequality and (5.7):
We estimate the norms in the right-hand side. For \boldsymbol{s} \in {\mathbb N}_0^M and \boldsymbol{s}' \in \Lambda(\xi) \cap R_{\boldsymbol{s}} we have \Delta_{\boldsymbol{s}'} (H_{\boldsymbol{s}}) = \prod_{j=1}^M \Delta_{s'_j}(H_{s_j}). From Lemma 3.2 and (5.7) we deduce that
where \theta_1 = 1/6 + \varepsilon and we recall that \lambda = C_\varepsilon. Substituting the right-hand side of (5.39) for \|\Delta_{\boldsymbol{s}'} (H_{\boldsymbol{s}})\|_{L_{\infty}^{\sqrt{g}}({\mathbb R}^M)} into (5.38) we obtain
Using the last estimates and the assumption \|\boldsymbol{p}(\theta/q,\lambda)\boldsymbol{\sigma}^{-1}\|_{\ell_q(\mathbb N_0^M)}\leqslant C < \infty with positive constant C independent of M, we derive the bound for the second term in the right-hand side of (5.36):
which, in combination with (5.36) and (5.37), proves the theorem.
Acknowledgement
A part of this work was done when the author was working at the Vietnam Institute for Advanced Study in Mathematics (VIASM). He would like to thank the VIASM for providing a fruitful research environment and working conditions.
Bibliography
1.
M. Ali and A. Nouy, “Approximation of smoothness classes by deep rectifier networks”, SIAM J. Numer. Anal., 59:6 (2021), 3032–3051
2.
R. Arora, A. Basu, P. Mianjy and A. Mukherjee, Understanding deep neural networks with rectified linear units, Electronic colloquium on computational complexity, report No. 98, 2017, 21 pp. https://eccc.weizmann.ac.il/report/2017/098/
3.
M. Bachmayr, A. Cohen, Dinh Dũng and Ch. Schwab, “Fully discrete approximation of parametric and stochastic elliptic PDEs”, SIAM J. Numer. Anal., 55:5 (2017), 2151–2186
4.
M. Bachmayr, A. Cohen, R. DeVore and G. Migliorati, “Sparse polynomial approximation of parametric elliptic PDEs. Part II: Lognormal coefficients”, ESAIM Math. Model. Numer. Anal., 51:1 (2017), 341–363
5.
M. Bachmayr, A. Cohen and G. Migliorati, “Sparse polynomial approximation of parametric elliptic PDEs. Part I: Affine coefficients”, ESAIM Math. Model. Numer. Anal., 51:1 (2017), 321–339
6.
A. R. Barron, “Complexity regularization with application to artificial neural networks”, Nonparametric functional estimation and related topics (Spetses 1990), NATO Adv. Sci. Inst. Ser. C: Math. Phys. Sci., 335, Kluwer Acad. Publ., Dordrecht, 1991, 561–576
7.
A. Chkifa, A. Cohen, R. DeVore and Ch. Schwab, “Sparse adaptive Taylor approximation algorithms for parametric and stochastic elliptic PDEs”, ESAIM Math. Model. Numer. Anal., 47:1 (2013), 253–280
8.
A. Chkifa, A. Cohen and Ch. Schwab, “High-dimensional adaptive sparse polynomial interpolation and applications to parametric PDEs”, Found. Comput. Math., 14:4 (2014), 601–633
9.
A. Chkifa, A. Cohen and Ch. Schwab, “Breaking the curse of dimensionality in sparse polynomial approximation of parametric PDEs”, J. Math. Pures Appl. (9), 103:2 (2015), 400–428
10.
A. Cohen and R. DeVore, “Approximation of high-dimensional parametric PDEs”, Acta Numer., 24 (2015), 1–159
11.
A. Cohen, R. DeVore and Ch. Schwab, “Convergence rates of best N-term Galerkin approximations for a class of elliptic sPDEs”, Found. Comput. Math., 10:6 (2010), 615–646
12.
A. Cohen, R. DeVore and Ch. Schwab, “Analytic regularity and polynomial approximation of parametric and stochastic elliptic PDE's”, Anal. Appl. (Singap.), 9:1 (2011), 11–47
13.
G. Cybenko, “Approximation by superpositions of a sigmoidal function”, Math. Control Signals Systems, 2:4 (1989), 303–314
14.
Dinh Dũng, “Linear collective collocation approximation for parametric and stochastic elliptic PDEs”, Mat. Sb., 210:4 (2019), 103–127; English transl. in Sb. Math., 210:4 (2019), 565–588
15.
Dinh Dũng, “Sparse-grid polynomial interpolation approximation and integration for parametric and stochastic elliptic PDEs with lognormal inputs”, ESAIM Math. Model. Numer. Anal., 55:3 (2021), 1163–1198
16.
Dinh Dũng and Van Kien Nguyen, “Deep ReLU neural networks in high-dimensional approximation”, Neural Netw., 142 (2021), 619–635
17.
Dinh Dũng, Van Kien Nguyen and Duong Thanh Pham, Deep ReLU neural network approximation of parametric and stochastic elliptic PDEs with lognormal inputs, arXiv: 2111.05854v1
18.
Dinh Dũng, Van Kien Nguyen, Ch. Schwab and J. Zech, Analyticity and sparsity in uncertainty quantification for PDEs with Gaussian random field inputs, arXiv: 2201.01912
19.
Dinh Dũng, Van Kien Nguyen and Mai Xuan Thao, “Computation complexity of deep ReLU neural networks in high-dimensional approximation”, J. Comp. Sci. Cybern., 37:3 (2021), 292–320
20.
I. Daubechies, R. DeVore, S. Foucart, B. Hanin and G. Petrova, “Nonlinear approximation and (deep) ReLU networks”, Constr. Approx., 55:1 (2022), 127–172
21.
R. DeVore, B. Hanin and G. Petrova, “Neural network approximation”, Acta Numer., 30 (2021), 327–444
22.
Weinan E and Qingcan Wang, “Exponential convergence of the deep neural network approximation for analytic functions”, Sci. China Math., 61:10 (2018), 1733–1740
O. G. Ernst, B. Sprungk and L. Tamellini, “Convergence of sparse collocation for functions of countably many Gaussian random variables (with application to elliptic PDEs)”, SIAM J. Numer. Anal., 56:2 (2018), 877–905
25.
K.-I. Funahashi, “Approximate realization of identity mappings by three-layer neural networks”, Electron. Comm. Japan Part III Fund. Electron. Sci., 73:11 (1990), 61–68
26.
M. Geist, P. Petersen, M. Raslan, R. Schneider and G. Kutyniok, “Numerical solution of the parametric diffusion equation by deep neural networks”, J. Sci. Comput., 88:1 (2021), 22, 37 pp.
R. Gribonval, G. Kutyniok, M. Nielsen and F. Voigtländer, “Approximation spaces of deep neural networks”, Constr. Approx., 55:1 (2022), 259–367
30.
P. Grohs and L. Herrmann, “Deep neural network approximation for high-dimensional elliptic PDEs with boundary conditions”, IMA J. Numer. Anal., 42:3 (2022), 2055–2082
31.
D. Elbrächter, D. Perekrestenko, P. Grohs and H. Bölcskei, “Deep neural network approximation theory”, IEEE Trans. Inform. Theory, 67:5 (2021), 2581–2623
32.
I. Gühring, G. Kutyniok and P. Petersen, “Error bounds for approximations with deep ReLU neural networks in W^{s,p} norms”, Anal. Appl. (Singap.), 18:5 (2020), 803–859
L. Herrmann, Ch. Schwab and J. Zech, “Deep neural network expression of posterior expectations in Bayesian PDE inversion”, Inverse Problems, 36:12 (2020), 125011, 32 pp.
35.
E. Hewitt and K. Stromberg, Real and abstract analysis. A modern treatment of the theory of functions of a real variable, Springer-Verlag, New York, 1965, vii+476 pp.
36.
Viet Ha Hoang and Ch. Schwab, “N-term Wiener chaos approximation rates for elliptic PDEs with lognormal Gaussian random inputs”, Math. Models Methods Appl. Sci., 24:4 (2014), 797–826
37.
K. Hornik, M. Stinchcombe and H. White, “Multilayer feedforward networks are universal approximators”, Neural Netw., 2:5 (1989), 359–366
38.
G. Kutyniok, P. Petersen, M. Raslan and R. Schneider, “A theoretical analysis of deep neural networks and parametric PDEs”, Constr. Approx., 55:1 (2022), 73–125
39.
Jianfeng Lu, Zuowei Shen, Haizhao Yang and Shijun Zhang, “Deep network approximation for smooth functions”, SIAM J. Math. Anal., 53:5 (2021), 5465–5506
40.
D. M. Matjila, “Bounds for Lebesgue functions for Freud weights”, J. Approx. Theory, 79:3 (1994), 385–406
41.
D. M. Matjila, “Convergence of Lagrange interpolation for Freud weights in weighted L_p(\mathbb R), 0 < p \le 1”, Nonlinear numerical methods and rational approximation. II (Wilrijk 1993), Math. Appl., 296, Kluwer Acad. Publ., Dordrecht, 1994, 25–35
42.
H. N. Mhaskar, “Neural networks for optimal approximation of smooth and analytic functions”, Neural Comput., 8 (1996), 164–177
43.
H. Montanelli and Qiang Du, “New error bounds for deep ReLU networks using sparse grids”, SIAM J. Math. Data Sci., 1:1 (2019), 78–92
44.
G. Montúfar, R. Pascanu, Kyunghyun Cho and Yoshua Bengio, “On the number of linear regions of deep neural networks”, NIPS 2014, Adv. Neural Inf. Process. Syst., 27, MIT Press, Cambridge, MA, 2014, 2924–2932, http://proceedings.neurips.cc/paper/2014
J. A. A. Opschoor, Ch. Schwab and J. Zech, “Exponential ReLU DNN expression of holomorphic maps in high dimension”, Constr. Approx., 55:1 (2022), 537–582
P. Petersen and F. Voigtlaender, “Optimal approximation of piecewise smooth functions using deep ReLU neural networks”, Neural Netw., 108 (2018), 296–330
49.
Ch. Schwab and J. Zech, “Deep learning in high dimension: Neural network expression rates for generalized polynomial chaos expansions in UQ”, Anal. Appl. (Singap.), 17:1 (2019), 19–55
50.
Ch. Schwab and J. Zech, Deep learning in high dimension: neural network approximation of analytic functions in L^2(\mathbb R^d, \gamma_d), arXiv: 2111.07080
51.
Zuowei Shen, Haizhao Yang and Shijun Zhang, “Deep network approximation characterized by number of neurons”, Commun. Comput. Phys., 28:5 (2020), 1768–1811
52.
J. Sirignano and K. Spiliopoulos, “DGM: a deep learning algorithm for solving partial differential equations”, J. Comput. Phys., 375 (2018), 1339–1364
53.
T. Suzuki, Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality, ICLR 2019: International conference on learning representations (New Orleans, LA 2019) https://openreview.net/pdf?id=H1ebTsActm
54.
J. Szabados, “Weighted Lagrange and Hermite–Fejér interpolation on the real line”, J. Inequal. Appl., 1:2 (1997), 99–123
55.
G. Szegö, Orthogonal polynomials, Amer. Math. Soc. Colloq. Publ., 23, Amer. Math. Soc., New York, 1939, ix+401 pp.
56.
M. Telgarsky, Representation benefits of deep feedforward networks, arXiv: 1509.08101
57.
M. Telgarsky, “Benefits of depth in neural nets”, 29th annual conference on learning theory (Columbia Univ., New York, NY 2016), Proceedings of Machine Learning Research (PMLR), 49, 2016, 1517–1539, https://proceedings.mlr.press/v49/telgarsky16.html
58.
R. K. Tripathy and I. Bilionis, “Deep UQ: learning deep neural network surrogate models for high dimensional uncertainty quantification”, J. Comput. Phys., 375 (2018), 565–588
59.
D. Yarotsky, “Error bounds for approximations with deep ReLU networks”, Neural Netw., 94 (2017), 103–114
60.
D. Yarotsky, “Optimal approximation of continuous functions by very deep ReLU networks”, 31st annual conference on learning theory, Proceedings of Machine Learning Research (PMLR), 75, 2018, 639–649, https://proceedings.mlr.press/v75/yarotsky18a.html
61.
J. Zech, D. Dũng and Ch. Schwab, “Multilevel approximation of parametric and stochastic PDEs”, Math. Models Methods Appl. Sci., 29:9 (2019), 1753–1817
62.
J. Zech and Ch. Schwab, “Convergence rates of high dimensional Smolyak quadrature”, ESAIM Math. Model. Numer. Anal., 54:4 (2020), 1259–1307