Abstract:
We consider the Kolmogorov widths of finite sets of functions. Any orthonormal system of $N$ functions in $L_2$ is rigid, that is, it cannot be well approximated by linear subspaces of dimension essentially smaller than $N$. This is not true for weaker metrics: it is known that in every $L_p$ for $p<2$ the first $N$ Walsh functions can be $o(1)$-approximated by a linear space of dimension $o(N)$.
We present some sufficient conditions for rigidity. We prove that the independence of functions (in the probabilistic meaning) implies rigidity in $L_1$ and even in $L_0$, the metric that corresponds to convergence in measure. In the case of $L_p$ for $1<p<2$ the condition is weaker: any $S_{p'}$-system is $L_p$-rigid.
We also obtain some positive results, for example, that the first $N$ trigonometric functions can be approximated by very low-dimensional spaces in $L_0$, and by subspaces generated by $o(N)$ harmonics in $L_p$ for $p<1$.
Bibliography: 34 titles.
This research was performed at Lomonosov Moscow State University and supported by the Russian Science Foundation under grant no. 22-11-00129,
https://rscf.ru/en/project/22-11-00129/.
In this paper we consider the Kolmogorov widths of finite sets of functions. Recall that the Kolmogorov width $d_n(K,X)$ is defined as the minimum distance $\sup_{x\in K}\rho(x,Q_n)_X$ from the set $K$ to $n$-dimensional subspaces $Q_n$ of the space $X$. This classical notion goes back to the paper [1] by Kolmogorov (1936). Widths are fundamental approximation characteristics of sets, and the behaviour of the sequence $(d_n(K,X))_{n=1}^\infty$ for functional classes and other finite-dimensional sets was actively studied by many authors. The books [2], Chs. 13 and 14, and [3] are standard references on the subject; see also a recent survey [4], § 4.3.
We will informally say that a set is rigid if it cannot be well approximated by low-dimensional subspaces. The starting point for us is the well-known result that any orthonormal system of functions $\varphi_1,\dots,\varphi_N$ is rigid in $L_2(0,1)$ (of course, this holds true in each Euclidean space):
(This equality for widths was first used by Stechkin; see [5] for details.)
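The equality in question is, in our reading, the classical one, $d_n(\{\varphi_1,\dots,\varphi_N\},L_2)=\sqrt{1-n/N}$: for any orthogonal projection $P$ of rank $n$ one has $\sum_k\|\varphi_k-P\varphi_k\|^2=N-\operatorname{tr}P\geqslant N-n$. A short NumPy sketch of this trace computation (an illustration only; the sizes $N=64$, $n=20$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 64, 20

# a random orthonormal system phi_1, ..., phi_N (columns of Phi)
Phi, _ = np.linalg.qr(rng.standard_normal((N, N)))

# a random n-dimensional approximating subspace with orthogonal projector P
B, _ = np.linalg.qr(rng.standard_normal((N, n)))
P = B @ B.T

# squared distances from each phi_k to the subspace
d2 = np.array([float(np.sum((phi - P @ phi) ** 2)) for phi in Phi.T])

avg_sq = float(d2.mean())          # equals 1 - trace(P)/N = 1 - n/N
worst = float(np.sqrt(d2.max()))   # hence the width is >= sqrt(1 - n/N)
print(avg_sq, worst)
```

The mean of the squared distances equals $1-n/N$ for every rank-$n$ projector, which yields the lower bound for the width; equality is attained by a suitable balanced subspace.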
The situation changes if we consider weaker metrics.
Theorem A [6]. Let $w_1,w_2,\dots$ be the Walsh system in the Paley numeration. Then for any $p\in[1,2)$ there exists $\delta=\delta(p)>0$ such that the following inequality holds for sufficiently large $N$:
This theorem appeared as a corollary of methods introduced in complexity theory for the related problem of matrix rigidity. The rigidity function of a matrix $A$ is defined as the Hamming distance from $A$ to the matrices of prescribed rank:
Matrix rigidity was introduced by Valiant in 1977 as an approach to lower bounds for linear circuit complexity (for example, the inequality $\operatorname{Rig}(A_N,\varepsilon N)\geqslant N^{1+\varepsilon}$ for some fixed $\varepsilon>0$ leads to superlinear lower bounds for circuit complexity); see the survey [7] for more details.
It was a surprising result by Alman and Williams [8] that the family of Walsh–Hadamard matrices is not rigid, that is, the rank of these matrices can be significantly reduced by changing a small number of elements. Theorem A is based on their construction.
In this paper we obtain sufficient conditions for systems of functions to be rigid in metrics weaker than $L_2$.
Average widths and $L_1$-rigidity
Our lower bounds on widths are based on the approximation properties of random vectors in $\mathbb{R}^N$. Let $\xi_1,\dots,\xi_N$ be independent random variables such that $\mathsf{E}\xi_i=0$ and $\mathsf{E}|\xi_i| = 1$. We prove that for any $n$-dimensional space $Q_n\subset\mathbb{R}^N$,
we call this quantity the average Kolmogorov width of a random vector $\xi$. In fact, average widths were originally considered by Voronin and Temirgaliev in [9]. This notion was further studied by Maiorov [10], Creutzig [11] and others. We discuss the notation $d_n^\mathrm{avg}$ and the properties of average widths in § 2.2. There is a simple equality
which links the rigidity of finite-dimensional vectors and the rigidity of finite function systems. Note that the averaging on the right-hand side is over a finite set of elements of $L_1$, that is, we minimize $\frac1N\sum_{k=1}^N\rho(\xi_k,Q_n)$ over the subspaces of $L_1$.
Thus, we can say that independence implies rigidity.
Theorem 1.1. Let $\xi_1,\dots,\xi_N$ be independent random variables such that $\mathsf{E}\xi_i=0$ and $\mathsf{E}|\xi_i| = 1$. Then
This also provides rigidity for the standard widths, because $d_n \geqslant d_n^\mathrm{avg}$. A classical example of an independent system is the Rademacher system.
The same result holds if we replace independence by unconditionality:
Approximation in the space of measurable functions
Consider the space $L_0(\mathcal X,\mu)$ of all measurable functions on some measure space $(\mathcal X,\mu)$. Convergence in measure is equivalent to convergence in certain metrics, say, $\displaystyle\int_{\mathcal X}\frac{|f-g|}{1+|f-g|}\,d\mu$ or $\sup\{\varepsilon\colon \mu(|f-g|\geqslant\varepsilon)\geqslant\varepsilon\}$. We will use the latter. Of course, this metric is weaker than the usual $L_p$-metrics. Approximation in $L_0$ is much less studied than approximation in $L_p$, but we believe that this subject contains interesting problems.
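For a discrete measure the latter quantity is straightforward to compute: $\varepsilon\mapsto\mu(|f-g|\geqslant\varepsilon)$ is a decreasing step function, so the supremum is attained among finitely many candidates. A minimal Python sketch (our own illustration; `l0_norm` is a hypothetical helper name):

```python
import numpy as np

def l0_norm(f, w=None):
    """sup{eps : mu(|f| >= eps) >= eps} for a function f given by its
    values on finitely many atoms with weights w (uniform by default)."""
    f = np.abs(np.asarray(f, dtype=float))
    w = np.full(len(f), 1.0 / len(f)) if w is None else np.asarray(w, float)
    order = np.argsort(-f)                 # sort |f| in decreasing order
    f_sorted, tails = f[order], np.cumsum(w[order])
    # for eps = min(value, tail measure) we have mu(|f| >= eps) >= eps,
    # and the supremum is attained among these candidates
    return float(np.max(np.minimum(f_sorted, tails), initial=0.0))
```

The distance $\|f-g\|_{L_0}$ is then `l0_norm(f - g, w)`.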
In § 3.2 we prove that independence implies rigidity even in the case of the $L_0$-metric.
Theorem 1.2. For any $\varepsilon\in(0,1)$ there exists $\delta>0$ such that if $\xi_1,\dots,\xi_N$ are independent random variables satisfying $\inf_{c\in\mathbb{R}}\|\xi_i-c\|_{L_0}\geqslant\varepsilon$, $i=1,\dots,N$, then
Let us mention a recent result due to Kashin [12].
Theorem B. There exist absolute constants $0<\gamma_0<1$ and $c_1,c_2>0$ such that for $N=1,2,\dots$, for any orthonormal basis $\Psi=\{\psi_k\}_{k=1}^N$ in $\mathbb{R}^N$ and any linear subspace $Q\subset\mathbb{R}^N$, $\dim Q\leqslant c_1 N$, the following inequality holds
It is known that lacunary systems $\varphi(k_jx)$ behave like independent variables; see [13], for example. The rigidity property is not an exception. We will prove that lacunary systems $\{\varphi(k_1x),\dots,\varphi(k_Nx)\}$ are rigid in $L_0$ if $\min k_{j+1}/k_j$ is sufficiently large (see Statement 3.3).
Another source of rigid systems is random matrices. It is well known that an $N\times N$ matrix $\mathcal E$ consisting of random signs is very rigid with high probability: $\operatorname{Rig}(\mathcal E,\delta N)\geqslant \delta N^2$. We observe that this is also true in the weaker $L_0$-metric:
This gives us an example of an $L_0$-rigid orthonormal system $\{\varphi_1,\dots,\varphi_N\}$ of functions that are piecewise constant on the intervals $((j-1)/N, j/N)$.
There is another metric in $L_0(\mathcal X,\mu)$, namely, $\mu\{x\colon f(x)\ne g(x)\}$. We denote it by $L_{\mathrm{H}}$; the distance $\|f-g\|_{L_{\mathrm{H}}}$ is an analogue of the Hamming distance. (It is called the ‘indicator distance’ in probability theory.) This metric appears in function theory in the context of ‘correction’ theorems. For example, Luzin’s theorem states that $C$ is dense in $L_{\mathrm{H}}$, and Menshov’s theorem implies that the set of functions with uniformly convergent Fourier series is dense in $L_{\mathrm{H}}$. Note that matrix rigidity is a particular case of average Kolmogorov widths in the normalized Hamming distance $L_{\mathrm{H}}^N$. Let $A$ be an $M\times N$ real matrix and $W_A=\{A_1,\dots,A_M\}\subset\mathbb{R}^N$ be the set of rows of $A$. Then
Some of our results can be stated in matrix terms; in some cases the matrix language is even more convenient. However, in this paper we prefer to stick to the functional-theoretic language; see also [14].
The $S_{p'}$-property
In the case $1<p<2$ we establish a less restrictive sufficient condition for rigidity.
Theorem 1.3. Let $\varphi_1,\dots,\varphi_N$ be an orthonormal system in a space $L_2(\mathcal X,\mu)$, $\mu(\mathcal X)=1$. Let $1<p<2$, and suppose that $\{\varphi_k\}$ has the $S_{p'}$-property, that is,
Note that Bourgain’s theorem [15] states that any uniformly bounded orthonormal system has an $S_{p'}$-subsystem of size $N^{2/p'}$; this subsystem is rigid in $L_p$. Hence an $o(1)$-approximation of a uniformly bounded orthonormal system in $L_p$ requires a dimension of at least $n\geqslant N^{2/p'}/2$. In particular, in Theorem A we must indeed have polynomial dependence between $n$ and $N$.
The analogue of Theorem 1.1 also holds true for $1<p<2$ (we do not prove it here), but it fails for $p>2$.
Positive approximation results
Let $G$ be a finite Abelian group and $\widehat{G}$ be the dual group of characters $\chi\colon G\to\mathbb C^*$.
It follows from [16] that characters are not rigid in $L_{\mathrm{H}}$:
Such a small approximation error, $N^{-1+c\varepsilon}$, is natural in the context of Valiant’s result on circuit complexity. However, we are interested in approximation in different ‘regimes’. We consider systems generated by specific characters, namely, the Walsh and trigonometric systems. First we show that they admit good approximation by very low-dimensional spaces.
We bound the trigonometric widths $d_n^{\mathrm{T}}$ of the first $N$ trigonometric functions, that is, we approximate them by trigonometric subspaces $Q_n=\{\sum_{k\in \Lambda}c_ke(kx)\}$, $|\Lambda|=n$.
Theorem 1.5. For any $p\in(0,1)$ and for sufficiently large $N$ we have
In § 2 we provide necessary definitions and some needed facts. In §§ 3–5 we prove the results mentioned in § 1, as well as some other results. In § 6 we discuss some remaining questions on the subject.
Acknowledgement
The author thanks Konstantin Sergeevich Ryutin, Sergei Vladimirovich Astashkin and Boris Sergeevich Kashin for fruitful discussions.
§ 2. Preliminaries
2.1. Metrics
Let $(\mathcal X,\mu)$ be a measure space, $\mu(\mathcal X)<\infty$. As usual, we identify functions (or random variables) that differ on a set of zero measure.
We consider classical $L_p$-spaces for $0<p<\infty$:
The space $L_0(\mathcal X,\mu)$ consists of all measurable functions. Convergence in measure in $L_0$ can be metricised by several metrics. Let us describe the Ky Fan metric that we use. Set
Note that $\|f\|_p \geqslant \|f\|_{L_0}^{1+1/p}$ and $\|f\|_{L_{\mathrm{H}}}\geqslant\|f\|_{L_0}$.
Let $N\in\mathbb N$. Then the space $\mathbb{R}^N$ is equipped with the usual norms $\ell_p^N$ corresponding to the counting measure on $\{1,\dots,N\}$:
We denote the unit ball in $\ell_p^N$ by $B_p^N$. Note that the meaning of $\|\cdot\|_p$ is clear from the context: it is either the norm $\|\cdot\|_{L_p}$, if the argument is a function, or the norm $\|\cdot\|_{\ell_p^N}$ if the argument is a vector. In the Euclidean case we write $|\cdot|$.
We let $L_p^N$ denote the $L_p$-spaces for the normalized (probability) measure on $\{1,\dots,N\}$. Obviously, $\|x\|_{L_p^N} = N^{-1/p}\|x\|_{\ell_p^N}$, $0<p\leqslant\infty$.
The quantity $\|x\|_{L_{\mathrm{H}}^N}$ is the fraction of nonzero coordinates of a vector, and the corresponding distance is the normalized Hamming distance. Note that $\lim_{p\to+0}\|x\|_p^p = \#\{k\colon x_k\ne0\} = N\|x\|_{L_{\mathrm{H}}^N}$.
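The relations above can be sanity-checked numerically. The sketch below (our illustration, with arbitrary data) assumes the discrete Ky Fan expression $\|x\|_{L_0^N}=\sup\{\varepsilon\colon \#\{k\colon|x_k|\geqslant\varepsilon\}/N\geqslant\varepsilon\}$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 100, 1.5
x = rng.standard_normal(N)
x[:30] = 0.0                           # 70 nonzero coordinates

def Lp(x, p):                          # L_p^N: normalized counting measure
    return float(np.mean(np.abs(x) ** p) ** (1 / p))

def L0(x):                             # sup{eps : #{|x_k| >= eps}/N >= eps}
    v = np.sort(np.abs(x))[::-1]
    tails = np.arange(1, len(v) + 1) / len(v)
    return float(np.max(np.minimum(v, tails)))

lhs = Lp(x, p)                                   # L_p^N norm
rescaled = N ** (-1 / p) * float(np.sum(np.abs(x) ** p) ** (1 / p))
l0 = L0(x)                                       # ||x||_p >= l0^(1+1/p)
count = float(np.sum(np.abs(x) ** 1e-3))         # ~ number of nonzeros
print(lhs, rescaled, l0, count)
```

Here `lhs` and `rescaled` agree (the rescaling identity), `lhs` dominates `l0 ** (1 + 1/p)`, and `count` is close to the number of nonzero coordinates, illustrating the limit as $p\to+0$.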
In the case $p=0$ we do not work with $\ell_0^N$, but only with $L_0^N$:
Let $X$ be a linear space equipped with a metric (or a more general distance functional) $\rho$. Recall that the Kolmogorov width of order $n\in\mathbb Z_+$ of a set $K\subset X$ is defined by
where $\rho(x,Q)_X :=\inf_{y\in Q} \rho(x,y)$ and the infimum is taken over the linear subspaces of $X$ of dimension at most $n$. Usually, $X$ is a normed space and $\rho(x,y)=\|x-y\|_X$; note that this is not required in Kolmogorov’s original paper, and we will indeed use widths in a more general context.
Average widths
Let $X$ be a linear space with metric and $\mu$ be a Borel measure on $X$. For $p\in(0,\infty)$ we define the $p$-averaged Kolmogorov width of $\mu$ in $X$ by
If $p=1$, then we write simply $d_n^{\mathrm{avg}}(\mu,X)$.
From now on we assume that $\mu$ is a probability measure. Equivalently, we can consider the widths of a (Borel) random vector $\xi$ with values in $X$:
If $K$ is a subset of $X$ and it is clear what a ‘random point in $K$’ means, then the width $d_n^{\mathrm{avg}}(K,X)_p$ is defined as the width of such a random point. For example, for a finite $K$ we have
Let us discuss the new notation $d_n^{\mathrm{avg}}(\xi,X)_p$. Some authors write $d_n^{(\mathrm{a})}$ instead of $d_n^{\mathrm{avg}}$. This can be confusing because $d_n^{\mathrm{a}}$ denotes absolute widths (see [18]). Also the use of $\mathrm{avg}$ is standard in average settings in information-based complexity; see [19]. Some authors write $d_n^{(\mathrm{a})}(X,\mu)$, but when we deal with sets and random variables the notation $d_n^{\mathrm{avg}}(\xi,X)$ seems to be more intuitive (the object whose width is considered goes first).
Properties
The most basic inequality that we always have in mind is
Statement 2.1. Let $\xi_1,\dots,\xi_N\colon\Omega\to\mathbb{R}$ be some random variables such that $\mathsf{E}|\xi_i|^p<\infty$ for all $i$. Then for $p\in[1,\infty)$ and $n\in\mathbb Z_+$ we have
Note that in the width on the left-hand side the averaging is over the finite set of ‘points’ $\xi_1,\dots,\xi_N$ in the space $L_p(\Omega)$; and on the right-hand side we see the averaged width of the random vector $\xi=(\xi_1,\dots,\xi_N)$ in $\mathbb{R}^N$. In the case of discrete $\Omega$ equality (3) is a simple consequence of the equality of the row-rank and the column-rank of a matrix. Let us give the full proof for completeness.
Proof. Consider the error of an approximation $\xi_k\approx \eta_k$:
Note that the random variables $\eta_1,\dots,\eta_N$ lie in a subspace of dimension at most $n$ of $L_p(\Omega)$ if and only if there is a subspace $Q\subset\mathbb{R}^N$, $\dim Q\leqslant n$, such that
To prove the reverse inequality we have to construct $\eta$ from an (almost) optimal subspace $Q\subset\mathbb{R}^N$. Note that for any $\varepsilon>0$ there is a Borel-measurable (for example, piecewise-constant) map $\pi\colon\mathbb{R}^N\to Q$, such that
Take $\eta := \pi(\xi)$; this is a random vector because $\pi$ is Borel. Moreover, $\eta\in Q$ almost surely, so $\eta_1,\dots,\eta_N$ span an at most $n$-dimensional space in $L_p(\Omega)$. It remains to substitute $x=\xi$ into the previous inequality, take the $p$-expectation and use (4) to finish the proof.
An average width in Hilbert space
We present a formula for the 2-averaged width of a random vector in Hilbert space. This formula is a generalization of the Eckart–Young theorem on matrix approximation in the Frobenius norm; it is also a reformulation of Ismagilov’s theorem [20]. However, in our terminology it takes a particularly convenient form, so we state it explicitly. Our proof repeats Ismagilov’s and is presented for completeness.
Let $H$ be a separable Hilbert space (of finite or infinite dimension) and $\xi$ be a random vector in $H$ such that $\mathsf{E}|\xi|^2<\infty$. Consider the correlation operator of $\xi$ defined by $\langle Au,v\rangle = \mathsf{E}\langle\xi,u\rangle\langle\xi,v\rangle$. It is known that $A$ is a compact positive selfadjoint operator (see, for example, [21], Ch. 3, § 2). A sequence $(\lambda_n)$ is the sequence of eigenvalues of $A$ if and only if there is an orthonormal basis $\{\varphi_n\}$ (an eigenbasis of $A$) such that
We arrange the eigenvalues in decreasing order: $\lambda_1\geqslant\lambda_2\geqslant\dots\geqslant0$.
Statement 2.2. We have $d_n^\mathrm{avg}(\xi,H)_2 = (\sum_{k>n}\lambda_k)^{1/2}$.
Proof. Fix a basis (5) and put $\xi_k:=\langle\xi,\varphi_k\rangle$. We bound the width from below. Consider some approximating space $Q_n$ with orthonormal basis $\{\psi_1,\dots,\psi_n\}$. Then
We have $\mathsf{E}|\xi|^2=\sum_{k\geqslant1}\mathsf{E}\xi_k^2=\sum_{k\geqslant 1}\lambda_k$, so it is sufficient to find an upper bound for the projection:
Note that the numbers $(\mu_k)$ belong to $[0,1]$ and their sum equals $n$. By the monotonicity of the $\lambda_k$ the sum $\sum_{k\geqslant1}\lambda_k\mu_k$ is maximal when $\mu_k = 1$ for $k\leqslant n$ and $\mu_k=0$ for $k>n$. We have obtained a lower bound for the width. Equality is attained at the subspace $Q_n = \operatorname{span}\{\varphi_1,\dots,\varphi_n\}$.
The statement is proved.
It is often convenient to work with singular values $\sigma_k:=\lambda_k^{1/2}$ instead of eigenvalues. One can rewrite (5) in terms of the Schmidt decomposition:
Indeed, the singular value decomposition of $X$ gives us a basis $\{\varphi_k\}_{k=1}^M$ such that the coordinate vectors $(\langle x^1,\varphi_k\rangle,\dots,\langle x^N,\varphi_k\rangle)$ are orthogonal and have length $\sigma_k(X)$. For $\xi$ being a random choice of $x^i$ we have $\mathsf{E}\langle\xi,\varphi_k\rangle^2=N^{-1}\sigma_k^2(X)$ and $\mathsf{E}\langle\xi,\varphi_k\rangle\langle\xi,\varphi_l\rangle=0$, hence $\sigma_k(\xi) = N^{-1/2}\sigma_k(X)$.
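A numerical check of Statement 2.2 and of the relation $\sigma_k(\xi)=N^{-1/2}\sigma_k(X)$ in the finite-dimensional case (our illustration with arbitrary sizes; the optimal subspace is spanned by the top right singular vectors, as in the Eckart–Young theorem):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, n = 30, 50, 5
X = rng.standard_normal((N, M))    # rows x^1, ..., x^N; xi = a uniform row

A = X.T @ X / N                    # correlation matrix of xi
lam = np.sort(np.linalg.eigvalsh(A))[::-1]
tail = float(lam[n:].sum())        # Statement 2.2: (width)^2 = tail sum

# optimal subspace: span of the top-n right singular vectors of X
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Q = Vt[:n].T
proj_err2 = float(np.mean(np.sum((X - X @ Q @ Q.T) ** 2, axis=1)))

# any other n-dimensional subspace does no better
B, _ = np.linalg.qr(rng.standard_normal((M, n)))
rand_err2 = float(np.mean(np.sum((X - X @ B @ B.T) ** 2, axis=1)))
print(proj_err2, tail, rand_err2)
```

The mean squared distance to the optimal subspace coincides with $\sum_{k>n}\lambda_k$, while a random subspace gives a larger error.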
Finally, an orthonormal system $\varphi_1,\dots,\varphi_N$ in a Euclidean space $E$ corresponds to the identity matrix $X$ (in appropriate coordinates), so we have (see also (1))
However, we need only the case when $\Phi$ is the trigonometric system $\{e(kx)\}_{k\in\mathbb Z}$ in the continuous case or $\{e(kx/N)\}_{k\in\mathbb Z_N}$ in the discrete case. In this situation the width is called the trigonometric width and is denoted by $d_n^{\mathrm{T}}$.
Usually, we consider real spaces (and the linear dimension over $\mathbb R$). When complex-valued functions are approximated, one can also consider linear spaces over $\mathbb C$. The distinction here is not important for us because in all cases where complex-valued functions are approximated we are interested in the dimension $n$ up to its order of magnitude.
2.3. Miscellaneous
The material in this subsection is contained in [22], §§ 2.2, 2.8 and 8.8.
Probability
We use the probabilistic notation $\mathsf{P}$ for the probability, $\mathsf{E}$ for the expectation and $\operatorname{Law}$ for the distribution. We also use the conditional expectation $\mathsf{E}(\xi\mid\eta)$, but only for $\eta$ with a finite number of values.
We make use of standard tail estimates. Let $X_i$, $\mathsf{E} X_i=0$, be independent random variables.
Hoeffding’s inequality
Let $a_i\leqslant X_i\leqslant b_i$ almost surely; then
$$
\begin{equation*}
\mathsf{P}\biggl(\biggl|\sum_i X_i\biggr|\geqslant t\biggr)\leqslant 2\exp\biggl(-\frac{2t^2}{\sum_i(b_i-a_i)^2}\biggr), \qquad t>0.
\end{equation*}
$$
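A Monte Carlo sketch of Hoeffding's bound $2\exp(-2t^2/\sum_i(b_i-a_i)^2)$ for sums of random signs ($a_i=-1$, $b_i=1$; an illustration with arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
N, trials, t = 100, 20000, 25.0

# sums of N independent random signs (so a_i = -1, b_i = 1)
S = rng.choice([-1.0, 1.0], size=(trials, N)).sum(axis=1)
empirical = float(np.mean(np.abs(S) >= t))

# Hoeffding bound 2 exp(-2 t^2 / sum (b_i - a_i)^2), here the sum is 4N
bound = 2 * np.exp(-2 * t ** 2 / (4 * N))
print(empirical, bound)
```

The empirical tail probability stays below the bound, as it must.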
Let us recall the definition of the combinatorial dimension of a class $\mathcal F$ of functions $f\colon I\to\mathbb{R}$. We say that a set $J\subset I$ is $t$-shattered ($t>0$) by $\mathcal F$, if there is a function $h\colon J\to\mathbb{R}$, such that for any choice of signs $\sigma\colon J\to\{-1,1\}$ one can find a function $f_\sigma\in\mathcal F$ such that
$$
\begin{equation}
\min_{x\in J}\sigma(x)(f_\sigma(x)-h(x))\geqslant t.
\end{equation}
\tag{6}
$$
Then the combinatorial dimension $\operatorname{vc}(\mathcal F,t)$ is the maximum cardinality of a set $t$-shattered by $\mathcal F$.
If $\mathcal F$ is convex and centrally symmetric, then one can assume that $h(x)\equiv 0$. Indeed, we can replace $\{f_\sigma\}$ by $\{\frac12(f_\sigma-f_{-\sigma})\}$.
We will use the simple (known) fact that
$$
\begin{equation}
\text{if } \operatorname{vc}(\mathcal F,t)> n, \quad\text{then } d_n(\mathcal F,\ell_\infty(I))\geqslant t.
\end{equation}
\tag{7}
$$
Indeed, let $J$ be a $t$-shattered set, $|J|=n+1$, and suppose that there is an approximation by an $n$-dimensional subspace: $\mathcal F\ni f\approx g$, $g\in Q_n$, $\|f-g\|_\infty < t$. Then there exists a nontrivial linear functional $\Lambda(f) := \sum_{x\in J}\lambda(x)f(x)$ such that $\Lambda(g)\equiv 0$ for $g\in Q_n$. Assume that $\Lambda(h)\geqslant 0$, where $h$ is from (6). Consider a function $f$ such that $\operatorname{sign}(\lambda(x))(f(x)-h(x))\geqslant t$ for all $\{x\in J\colon \lambda(x)\ne0\}$. Then for the approximating function we have $\operatorname{sign}(\lambda(x))(g(x)-h(x))>0$, hence $\Lambda(g)=\Lambda(g-h)+\Lambda(h)>0$. We obtain a contradiction. The case $\Lambda(h)\leqslant0$ is analogous.
We will also make use of the classical VC-bound for classes (that is, in fact, families of subsets) $\mathcal F\colon I\to\{-1,1\}^k$:
Thus, when the left-hand inequality in (8) holds, taking the projection of the class onto the set $J$, $|J|>d$, we obtain the whole cube $\{-1,1\}^J$.
There is a useful bound on binomial coefficients:
$$
\begin{equation}
\binom{k}{0}+\dots+\binom{k}m \leqslant \biggl(\frac{ek}m\biggr)^m, \qquad 0\leqslant m \leqslant k.
\end{equation}
\tag{9}
$$
Note that the function $(ek/x)^x$ is increasing for $x\in(0,k]$.
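Both the bound (9) (for $1\leqslant m\leqslant k$; the case $m=0$ is trivial) and the monotonicity of $(ek/x)^x$ are easy to verify numerically:

```python
from math import comb, e

def lhs(k, m):                         # binomial tail sum in (9)
    return sum(comb(k, i) for i in range(m + 1))

# the bound (e k / m)^m for 1 <= m <= k
ok = all(lhs(k, m) <= (e * k / m) ** m
         for k in range(1, 40) for m in range(1, k + 1))

# monotonicity of x -> (e k / x)^x on (0, k], here for k = 10
k = 10
grid = [j / 10 for j in range(1, 10 * k + 1)]
vals = [(e * k / x) ** x for x in grid]
mono = all(u <= v for u, v in zip(vals, vals[1:]))
print(ok, mono)
```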
We will often consider $1$-periodic functions, that is, functions on $\mathbb T:=\mathbb R/\mathbb Z$. We also use the notation $e(\theta):=\exp(2\pi i\theta)$.
We denote positive constants by $c,c_1,c_2,C,\dots$. If there is dependence on some parameters, then we specify it explicitly: $c_p,c(\varepsilon),\dots$ .
§ 3. Sufficient conditions for rigidity in weak metrics
3.1. Rigidity in $L_1$
For definiteness we put $\operatorname{sign}(x)=1$ for $x\geqslant 0$ and $\operatorname{sign}(x)=-1$ for $x<0$.
Theorem 3.1. Let $\xi=(\xi_1,\dots,\xi_N)$ be a random vector in $\mathbb{R}^N$ such that $\mathsf{E}|\xi_i|\geqslant1$ for all $i$. Let $\varepsilon\in(0,1)$ and $n\leqslant N(1-\varepsilon)$. Then
Before proving the theorem let us note that the distance in $\ell_1^N$ from a vector $x\in\mathbb{R}^N$ to a subspace $Q_n\subset\mathbb{R}^N$ can be written in dual terms as
$$
\begin{equation*}
\rho(x,Q_n)_{\ell_1^N}=\max\bigl\{\langle x,y\rangle\colon y\in Q_n^{\perp},\ \|y\|_{\infty}\leqslant 1\bigr\}.
\end{equation*}
$$
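This duality ($\rho(x,Q_n)_{\ell_1^N}$ equals the maximum of $\langle x,y\rangle$ over $y\in Q_n^\perp$ with $\|y\|_\infty\leqslant1$) can be verified exactly for a one-dimensional subspace $Q=\operatorname{span}(a)$: the primal minimum of $c\mapsto\|x-ca\|_1$ is attained at a breakpoint $c=x_j/a_j$, and the subgradient condition there produces a dual certificate. A self-contained sketch (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 12
x = rng.standard_normal(N)
a = rng.standard_normal(N)            # Q = span(a), one-dimensional

# Primal: c -> ||x - c a||_1 is piecewise linear and convex, so its
# minimum is attained at a breakpoint c = x_j / a_j.
cands = x / a
errs = [float(np.abs(x - c * a).sum()) for c in cands]
j = int(np.argmin(errs))
c_star, primal = cands[j], errs[j]

# Dual certificate: y orthogonal to a, ||y||_inf <= 1, <x, y> = primal.
# The subgradient condition at c_star guarantees |y_j| <= 1.
y = np.sign(x - c_star * a)
y[j] = -np.dot(np.delete(a, j), np.delete(y, j)) / a[j]
dual = float(np.dot(x, y))
print(primal, dual)
```

Weak duality ($\langle x,y\rangle\leqslant\|x-ca\|_1$ for feasible $y$) is immediate; the sketch exhibits a $y$ attaining equality.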
So central cross-sections of the cube $B_\infty^N$ play an important role here. We prove the following lemma.
Lemma 3.1. Let $K$ be an $m$-dimensional central section of the cube $B_\infty^N$, and let $m\geqslant \varepsilon N$. Then $\operatorname{vc}(K,c_1(\varepsilon)) \geqslant c_2(\varepsilon)N$.
The proof of the lemma relies on two useful results.
Theorem D [23]. Any $m$-dimensional cross-section of a volume-one cube has volume at least $1$:
where $C$ and $c$ are positive absolute constants.
Here $N_\delta(K,X)$ is the size of a minimal $\delta$-net for a set $K$ in the space $X$.
Proof of Lemma 3.1. Let $K=L_m\cap B_\infty^N$. From Theorem D we obtain $\operatorname{Vol}_m(K)\geqslant 2^m$, and we can use the standard volume estimate
We apply (11) to the set $K$ (note that $\|z\|_\infty\leqslant 1$ for $z\in K$ by definition) in the space $L_2^N$ for $t=N^{-1/2}r\asymp \varepsilon^{1/2}$:
Lemma 3.1 produces a set of coordinates $\Lambda\subset\{1,\dots,N\}$, $|\Lambda|\geqslant c(\varepsilon)N$, and a number $v\geqslant c(\varepsilon)$ such that for any choice of signs $\sigma\colon\Lambda\to\{-1,1\}$ there is a vector $z^\sigma\in K$ such that
$$
\begin{equation}
\min_{i\in\Lambda} \sigma_i z^\sigma_i \geqslant v.
\end{equation}
\tag{12}
$$
(Recall that, as $K$ is convex and centrally-symmetric, we may assume that $z^\sigma$ oscillates around zero.)
Fix $\sigma^*\colon\Lambda\to\{-1,1\}$, and let $\mathsf{E}^*$ be the conditional expectation $\mathsf{E}^*(\,\cdot\,)=\mathsf{E}(\,\cdot\mid\operatorname{sign}\xi_i =\sigma_i^*,\,i\in\Lambda)$. Then we have
Here we have used that $\|z^{\sigma^*}\|_\infty\leqslant 1$. We average over $\sigma^*$ and obtain (13).
The theorem follows from (13), because the first sum there is at least $v|\Lambda|\geqslant c_\varepsilon N$, and for each term in the second sum we have
Let us present some corollaries of Theorem 3.1 for the case when $\mathsf{E}(\xi_i\mid\{\operatorname{sign}\xi_j\}_{j\ne i})\equiv0$.
Corollary 3.1. Let $\xi_1,\dots,\xi_N$ be independent random variables such that $\mathsf{E}\xi_i=0$ and $\mathsf{E}|\xi_i|\geqslant 1$. Then for any random variables $\eta_1,\dots,\eta_n$, $n\leqslant N(1-\varepsilon)$, and any coefficients $\{a_{i,j}\}$ we have
The condition $\|\xi_i-c\|_{L_0}\geqslant\varepsilon$ is essential here because it forbids an approximation of the vector $\xi$ by a constant vector. This condition allows us to ‘separate’ slightly the values of the $\xi_i$.
Lemma 3.2. Let $\varepsilon,\tau\in(0,1]$, and let $\zeta$ be a random variable such that $\inf_{c\in\mathbb{R}}\|{\zeta-c}\|_{L_0}\geqslant\varepsilon$. Then there exist real numbers $a<b$, $|b-a|\geqslant\frac{\tau\varepsilon}2$, such that
Note that $\mathsf{P}(\zeta\leqslant q_-)\geqslant \varepsilon/3$ and $\mathsf{P}(\zeta\geqslant q_+)\geqslant \varepsilon/3$. We see that $|q_+-q_-|\geqslant\varepsilon$, because otherwise $\|\zeta-c\|_{L_0}\leqslant\frac23\varepsilon$ for $c=(q_-+q_+)/2$. We fix $k:=\lceil1/\tau\rceil$, divide $[q_-,q_+]$ into $k$ equal subintervals and choose $(a,b)$ to be the one having the least probability $\mathsf{P}(a<\zeta<b)$. Then this probability is at most $1/k\leqslant\tau$. Finally, $|b-a|\geqslant |q_+-q_-|/k \geqslant \tau\varepsilon/2$.
The lemma is proved.
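The construction in the proof is easy to trace on a concrete discrete law (our example: $\zeta$ uniform on ten atoms, $\varepsilon=1/2$, $\tau=1/4$; `q_minus` and `q_plus` play the role of the $\varepsilon/3$-quantiles $q_-$ and $q_+$):

```python
import numpy as np

# a concrete discrete law for zeta: ten equiprobable atoms 0, 1, ..., 9
vals = np.arange(10, dtype=float)
probs = np.full(10, 0.1)
eps, tau = 0.5, 0.25

# the eps/3-quantiles q_- and q_+ from the proof
q_minus = vals[np.searchsorted(np.cumsum(probs), eps / 3)]
q_plus = vals[::-1][np.searchsorted(np.cumsum(probs[::-1]), eps / 3)]

# split [q_-, q_+] into k = ceil(1/tau) subintervals, take the lightest
k = int(np.ceil(1 / tau))
edges = np.linspace(q_minus, q_plus, k + 1)
masses = [float(probs[(vals > lo) & (vals < hi)].sum())
          for lo, hi in zip(edges[:-1], edges[1:])]
i = int(np.argmin(masses))
a, b = float(edges[i]), float(edges[i + 1])
print(a, b, masses[i])
```

The chosen interval $(a,b)$ carries probability at most $\tau$ and has length at least $\tau\varepsilon/2$, as the lemma asserts.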
Now we prove the theorem.
Proof of Theorem 3.2. We fix some small $\delta>0$ (to be chosen later) and estimate $\mathsf{P}(\rho(\xi,Q_n)_{L_0^N}<\delta)$, where $\dim Q_n\leqslant n\leqslant \delta N$. Consider an almost best $L_0^N$ approximation of $(\xi_1,\dots,\xi_N)$ by a (random) vector $(y_1,\dots,y_N)$ in the subspace $Q_n$. Let
be the (random) set of badly approximated coordinates. If $\|\xi-y\|_{L_0^N} < \delta$, then $|\Lambda|<\delta N$, so it is sufficient to bound the probability of the event $\{|\Lambda| \leqslant \delta N\}$.
We apply Lemma 3.2 to $\tau := 5\delta/\varepsilon$ and $\xi_k$ and find intervals $(a_k,b_k)$ such that
be the (random) set of ‘intermediate’ coordinates. We have $\mathsf{E}|\Gamma|\leqslant \tau N$; by Bernstein’s inequality we have $\mathsf{P}(|\Gamma|\leqslant 2\tau N)\geqslant 1-2\exp(-c\tau N)$ and
Fix some sets $\Lambda^\circ,\Gamma^\circ\subset\{1,\dots,N\}$ of size $|\Lambda^\circ|\leqslant\delta N$ and $|\Gamma^\circ|\leqslant 2\tau N$ and consider the event $\{\Lambda=\Lambda^\circ,\,\Gamma=\Gamma^\circ\}$. Let $N'=N-|\Lambda^\circ\cup\Gamma^\circ|$; note that $N'\geqslant N/2$. For $x\in\mathbb{R}^N$ let $x'\in\mathbb{R}^{N'}$ be the vector $x$ without coordinates in $\Lambda^\circ\cup\Gamma^\circ$. Then $\|\xi'-y'\|_{\ell_\infty^{N'}} \leqslant \delta$, and we have $y'\in Q_n'$; therefore,
Consider the random vector $\pi$ with coordinates $\pi_k := -1$ for $\xi_k\leqslant a_k$, $\pi_k := 0$ for $a_k<\xi_k<b_k$ and $\pi_k := 1$ for $\xi_k\geqslant b_k$. Note that $\pi'\in\{-1,1\}^{N'}$ by construction. We argue that the set $S := \{\pi'(\omega)\colon \Lambda(\omega)=\Lambda^\circ,\,\Gamma(\omega)=\Gamma^\circ\}$ of all possible values of $\pi'$ contains at most $(eN'/n)^{n}$ elements. Indeed, otherwise (8) and (9) imply that $S$ contains a cube of dimension $n+1$ and
so (7) implies that $d_n(K,\ell_\infty^{N'})\geqslant 5\delta/4$, which contradicts (16). For any $s\in S$ we have $\mathsf{P}(\pi'=s)\leqslant(1-\varepsilon/3)^{N'}$, so
Indeed, there are exponentially many subspaces, so we can choose smaller ${\delta=\delta(A,\varepsilon)}$ to provide an exponential bound on the probability.
Proof. Let $d_n^\mathrm{avg}(\{\xi_1,\dots,\xi_N\},L_0) \leqslant \varepsilon\leqslant 1$. Then for some $\eta_1,\dots,\eta_N$ in an $n$-dimensional space we have $\sum\|\xi_k-\eta_k\|_{L_0}\leqslant N\varepsilon$. Consider the random sets $\Lambda_t := \{k\colon |{\xi_k-\eta_k}|\geqslant t\}$; note that $\|\xi-\eta\|_{L_0^N}\geqslant t$ if and only if $|\Lambda_t|\geqslant tN$.
Let $A$ be a signum matrix, that is, a matrix with entries $\pm1$. We define the signum rank $\operatorname{rank}_\pm A$ to be the minimum rank of matrices $B$ such that $\operatorname{sign} B_{i,j}\equiv A_{i,j}$ (in particular, $B_{i,j}\ne0$). There is a bound on the number of signum $N\times N$ matrices with signum rank at most $r$:
(see [25], and also [26] and [6]). For $r=cN$ we obtain the bound $2^{bN^2}$, where $b=b(c)\to0$ as $c\to0$. Hence, if $c$ is small, then there are very few signum matrices such that $\operatorname{rank}_\pm A\leqslant cN$. Moreover, almost all signum matrices are quite distant from such matrices in the Hamming metric. Hence almost all signum matrices are also distant from the low-rank matrices with $\operatorname{rank} B\leqslant cN$ in the $L_0$-distance. Now we state a corollary in terms of random signum matrices (with equiprobable signs).
Statement 3.1. Let $\mathbf{\mathcal E}$ be a random matrix with independent $\pm1$ entries. Then
Statement 3.2. Let $\mathbf{\varphi}_1,\dots,\mathbf{\varphi}_N$ be a random system in $D_N(0,1)$ such that $\varphi_{k,j} :=\varphi_k|_{((j-1)/N,j/N)}\equiv\varepsilon_{k,j}$, where the $\varepsilon_{k,j}$ are independent random signs. Then
Note that, although we use independent random variables in the definition of $\varphi_k$, the system $\{\varphi_k\}_1^N$ is by no means independent.
Proof of Statement 3.2. Let us prove that if $\{\varphi_k\}$ is not rigid in $L_0(0,1)$, then the matrix $\Phi := (\varphi_{k,j})$ can be well approximated in $L_0^{N\times N}$; this will contradict Statement 3.1.
Consider an approximation $\varphi_k\approx g_k$ in $L_0$ by functions in some $n$-dimensional subspace. There is a technical subtlety: we cannot claim that $g_k\in D_N$ (one cannot average over intervals because such an operator is not defined in $L_0$). Instead, we use Lemma 3.3: it shows that the rigidity of the system $\xi_k := \varphi_k$ is equivalent to the rigidity of the vector $(\xi_1,\dots,\xi_N)$, hence it is determined by the distribution of this vector. So we can assume that the measure on the space $(0,1)$, where everything is defined, consists of ‘atoms’ $((j-1)/N,j/N)$.
Finally, if $\varphi_k\approx g_k$, $g_k\in D_N$, and the average of $\|\varphi_k - g_k\|_{L_0}$ is less than $\delta$, then we have $\|\varphi_k-g_k\|_{L_0}\leqslant \gamma := \delta^{1/2}$ for all but at most $\gamma N$ indices. Therefore, $\|\Phi-G\|_{L_0^{N\times N}} \leqslant 2\gamma$.
The statement is proved.
Corollary 3.4. There exists an orthonormal system $\varphi_1,\dots,\varphi_N\in D_N(0,1)$ that is rigid in $L_0$:
Proof. We construct a system $\varphi_1,\dots,\varphi_N\!\in\! D_{2N}(0,2)$ instead. We define $\varphi_1,\dots,\varphi_N$ on $(0,1)$ using random values $\varphi_i|_{((j-1)/N,j/N)}=\pm\delta$ for some $\delta$ to be specified below. This system will be $L_0$-rigid with high probability.
Then we extend this system to $(0,2)$ to make it orthonormal. A necessary and sufficient condition for the existence of such an extension is known:
(see, for instance, [27], Ch. 8). If $\delta$ is sufficiently small, then this condition holds true with high probability, because the spectral norm of a random $\pm1$ matrix is $O(N^{1/2})$; see, for example, [22], Ch. 4.
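A numerical illustration of both facts: the spectral norm of an $N\times N$ random sign matrix is close to $2\sqrt N$, and for small $\delta$ the Gram matrix $G$ of the piecewise $\pm\delta$ system satisfies $I-G\succeq0$ (which is, presumably, the extension condition referred to above). The parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200
E = rng.choice([-1.0, 1.0], size=(N, N))

# spectral norm of a random sign matrix is ~ 2 sqrt(N)
ratio = float(np.linalg.norm(E, 2) / np.sqrt(N))

# Gram matrix of the piecewise +-delta system on (0,1):
# <phi_k, phi_l> = delta^2 / N * (E E^T)_{k,l}
delta = 0.3
G = delta ** 2 / N * (E @ E.T)
min_eig = float(np.linalg.eigvalsh(np.eye(N) - G).min())
print(ratio, min_eig)
```

Since $\|G\|\leqslant\delta^2\|E\|^2/N\approx4\delta^2$, any $\delta$ noticeably smaller than $1/2$ makes $I-G$ positive definite with high probability.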
Moreover, we can put the functions $\varphi_k|_{(1,2)}$ in an arbitrary $N$-dimensional space (preserving the integrals $\displaystyle\int_1^2\varphi_k\varphi_l$), for example, in $D_N(1,2)$.
The corollary is proved.
3.4. Lacunary systems
Let $\lambda>1$. We say that a sequence of positive integers $k_1,\dots,k_N$ is $\lambda$-lacunary if $k_{j+1}/k_j\geqslant\lambda$ for $j=1,\dots,N-1$.
Statement 3.3. Let $\varphi$ be a Riemann-integrable Borel function on $\mathbb{T}$ such that $\mu\{x\in\mathbb{T}\colon \varphi(x)\ne a\}>0$ for each $a\in\mathbb{R}$. Then the following hold.
1. There exists $\lambda=\lambda(\varphi)>1$ such that for any $\lambda$-lacunary sequence $k_1,\dots,k_N$ we have
Theorem F (see [28], Theorem 4.3). Let $(k_j)$ be an increasing sequence of positive integers. Then there exists a sequence $(g_j)$ of measurable functions on $(0,1)$ which are independent and uniformly distributed over $(0,1)$ and such that
Proof of Statement 3.3. Let us start with the $L_1$-case. First of all, we can shift and renorm $\varphi$ to obtain $\displaystyle\int_\mathbb{T}\varphi(x)\,dx=0$ and $\displaystyle\int_\mathbb{T}|\varphi(x)|\,dx=1$ (this does not affect rigidity). Given a $\lambda$-lacunary sequence $k_1,\dots,k_N$ for some $\lambda>1$, we apply Theorem F to obtain functions $g_1,\dots,g_N$ satisfying $\|\{k_j x\}-g_j(x)\|_{L_0} \leqslant 2/\lambda$.
Put $f_j(x):=\varphi(g_j(x))$. Then the functions $f_j$ are independent, $\displaystyle\int f_j=0$ and $\displaystyle\int|f_j|=1$. Corollary 3.1 implies the rigidity of $\{f_j\}$. To prove the rigidity of $\{\varphi(k_jx)\}$ we have to bound $\|\varphi(k_jx)-f_j\|_1$:
where $A=\{x\colon |\{k_j x\}-g_j(x)|>2/\lambda\}$. The first integral is at most $2\|\varphi\|_\infty\mu(A) \leqslant 4\|\varphi\|_\infty/\lambda$. The second integral does not exceed
where $\tau$ is the averaged modulus of continuity. It is known that ${\lim_{h\to 0}\tau(\varphi,h)=0}$ for Riemann-integrable functions. Hence, for sufficiently large $\lambda$ the value of $\|\varphi(k_jx)-f_j(x)\|_1$ is sufficiently small. This proves the $L_1$-case.
In this subsection we outline another method for the proof of rigidity, which is based on the notion of complementability.
Let $f_1,\dots,f_N$ be an orthonormal system in $L_2(0,1)$. Suppose that we aim to prove the rigidity of $\{f_j\}$ in a larger space $X\supset L_2$. If there exists a bounded linear operator $\pi\colon X\to L_2$ that preserves the system: $\pi f_j=f_j$, $j=1,\dots,N$, then we obtain rigidity in $X$:
Proof. Let $X$ be a symmetric space. It is known that for the Rademacher chaos property (1) is equivalent to the embedding $X\supset H$ (see [29], Theorem 6.4), where the space $H$ is the closure of $L_\infty$ in the Orlicz space $L_{\varphi_1}$, $\varphi_1(t)=e^t-1$. A criterion for complementability is the double embedding $H\subset X\subset H'$ (see [29], Theorem 6.7). Thus, for $X=H'$ both conditions are satisfied. The space $H'$ coincides with the Lorentz space $\Lambda(\psi_1)$ of functions satisfying $\displaystyle\int_0^1 f^*(t)\log(2/t)\,dt<\infty$ (see [29], §§ 2.2 and 2.4). Further, this Lorentz space coincides with the space $L\log L$ of functions satisfying $\displaystyle\int_0^1 |f|\log(2+|f|)\,dt<\infty$ (see, for example, [30], Theorem D).
The corollary is proved.
§ 4. Rigidity in $L_p$, $1<p<2$
4.1. The $S_{p'}$-property
We start with a finite-dimensional result. Its proof follows the paper [31] by Gluskin.
Theorem 4.1. Let $1<p<2$ and $N\in\mathbb N$. Suppose that $\xi=(\xi_1,\dots,\xi_N)$ is an isotropic random vector in $\mathbb{R}^N$: $\mathsf{E}\xi_i^2=1$ and $\mathsf{E}\xi_i\xi_j=0$, $i\ne j$, that satisfies two conditions:
Remark 4.1. Note that if condition (18) is satisfied, then (17) also holds for $A:=B$ and $\gamma:=p'-2$; see the proof of Theorem 4.2. However, in some cases we need a dependence of the form $(\cdots)B^{-1}$ of the lower bound on the parameter $B$.
Let us formulate a corollary in function-theoretic terms.
Theorem 4.2. Let $\varphi_1,\dots,\varphi_N$ be an orthonormal system in a space $L_2(\mathcal X,\mu)$, $\mu(\mathcal X)=1$. Let $1<p<2$, and suppose that $\{\varphi_k\}$ has the $S_{p'}$-property, that is, for some $B\geqslant1$
Proof. Indeed, let $x$ be a random point in $\mathcal X$, and consider the random vector $\xi=(\varphi_1(x),\dots,\varphi_N(x))$. Condition (18) in Theorem 4.1 is exactly the $S_{p'}$-property. Let us check that condition (17) is fulfilled for $A:=B$ and $\gamma=p'-2$. Indeed,
Hence $d_n^{\mathrm{avg}}(\xi,\ell_p^N)_p$ is bounded below. It remains to use Statement 2.1.
The theorem is proved.
Corollary 4.1. Let $\varphi_1,\dots,\varphi_N$ be an orthonormal system in $L_2(\mathcal X,\mu)$, $\mu(\mathcal X)=1$, that is uniformly bounded: $\|\varphi_k\|_\infty\leqslant A$. Then for $1<p<2$ we have
Proof. Indeed, by a theorem of Bourgain [15] one can find a subsystem of size $\lfloor N^{2/p'}\rfloor$ of $\{\varphi_k\}$ with the $S_{p'}$-property and apply Corollary 3.5 to this subsystem.
The corollary is proved.
4.2. Additional remarks
Explicit rigid sets
Previous considerations allow us to present explicit constructions of $\ell_p$-rigid subsets of $\{-1,1\}^N$ of polynomial size.
Statement 4.1. For any $p\in(1,2)$ and any sufficiently large $N$ one can explicitly construct a set $V\subset\{-1,1\}^N$ of size $|V|\leqslant N^{C(p)}$ such that $d_{N/2}(V,\ell_p^N)\geqslant c(p) N^{1/p}$.
Proof. It is sufficient to construct $N$ characters $\chi_1,\dots,\chi_N\in \widehat{\mathbb Z_2^k}$ (with appropriate $k$) that form an $S_{p'}$-system. Then we apply Theorem 4.2: $d_{N/2}^\mathrm{avg}(\{\chi_1,\dots,\chi_N\},L_p)_p\geqslant c>0$. Hence one can take $V:=\{(\chi_1(x),\dots,\chi_N(x))\colon x\in \mathbb Z_2^k\}$.
In [32] an $S_{2m}$-system of $2^n$ characters in $\mathbb Z_2^{mn}$ was constructed, so it remains to take $2^n\approx N$, $m=\lceil p'/2\rceil$ and $k=mn$.
The statement is proved.
Independence for $p>2$
An analogue of Theorem 3.1 with normalization in $L_p$ also holds for $1<p\leqslant2$. We can easily deduce from Theorem 4.1 that an independent system of functions is rigid (with an additional log factor). However, this result will be the subject of another paper. The analogue of the theorem on the $L_1$-rigidity of independent functions is not true for $p>2$.
Statement 4.2. For any $p\in(2,\infty)$ there exists $\delta>0$ such that for all sufficiently large $N$ there exists a sequence $\xi_1,\dots,\xi_N$ of independent identically distributed random variables such that $\mathsf{E}\xi_i=0$, $\mathsf{E}|\xi_i|^p=1$ and
Proof. We use the approximation of the octahedron $B_1^N$ in $\ell_\infty^N$ (see [33]). Let $n=N^{1-\beta}$, where $\beta$ is to be chosen later, and let $Q_n^*$ be an extremal $n$-dimensional $\ell_\infty$-subspace:
where $K$ is defined by the condition $\mathsf{E}|\xi_1|^p=1$; so that $\varepsilon K^p=1$. Consider independent copies $\xi_2,\dots,\xi_N$ of $\xi_1$ and the vector $\xi=(\xi_1,\dots,\xi_N)$. We construct an approximation vector $\eta$ that lies in $Q_n^*$ (so that $\eta_1,\dots,\eta_N$ lie in an $n$-dimensional subspace of $L^p$).
Consider three events: $\mathcal{A}_0=\{\xi=0\}$, $\mathcal{A}_1$: exactly one coordinate of $\xi$ is nonzero, and $\mathcal{A}_{2}$: at least two coordinates of $\xi$ are nonzero. Set $\eta(\omega):=0$ for $\omega\in \mathcal{A}_0\cup \mathcal{A}_{2}$, and let $\eta(\omega)$ be the best approximation to $\xi(\omega)$ from $Q_n^*$ for $\omega\in \mathcal{A}_1$. In the latter case we have $\|\xi-\eta\|_\infty \leqslant K\rho(B_1^N,Q_n^*)_\infty$.
By symmetry it is sufficient to estimate $\mathsf{E}|\xi_1-\eta_1|^p$:
if $\beta$ is sufficiently small. The second term is bounded by $N^2\varepsilon^2K^p = N^2\varepsilon$, and we can make it small by letting $\varepsilon$ tend to zero.
The statement is proved.
§ 5. Approximation of Walsh and trigonometric functions
Recall some notation. Given a finite Abelian group $G$, we denote the group of characters $\chi\colon G\to\mathbb{C}$ by $\widehat{G}$. The probability measure $\mu_G$ on $G$ ($\mu_G(A)=|A|/|G|$) allows us to define the metrics $L_{\mathrm{H}}$ and $L_0$ for functions on $G$.
We are interested in two particular cases, the group $\mathbb Z_2^k$ and the cyclic group $\mathbb Z_N$.
In the first case characters are real and can be written as $W_x\colon y\mapsto (-1)^{\langle x,y\rangle}$, $x,y\in\mathbb Z_2^k$. We can identify these characters with functions in the well-known orthonormal Walsh system (in the Paley numeration) $w_0\equiv1,w_1,w_2,\dots$ . Indeed, for $n<2^k$ we have $w_n\bigl(\sum_{j\geqslant 1}y_j2^{-j}\bigr)=W_x(y)$, where $x=(x_1,\dots,x_k)\in\{0,1\}^k$, $y=(y_1,\dots,y_k)\in\{0,1\}^k$ and $n=\sum_{j=1}^kx_j2^{j-1}$.
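As a quick numerical illustration (a sketch, not part of the paper), one can build the $2^k\times 2^k$ matrix of characters $W_x(y)=(-1)^{\langle x,y\rangle}$ and verify its orthogonality; the bit ordering below is one convenient choice and need not match the Paley numeration exactly:

```python
import numpy as np
from itertools import product

def character_matrix(k: int) -> np.ndarray:
    """Matrix W[x, y] = (-1)^{<x, y>} over x, y in {0,1}^k.

    Rows are the characters of Z_2^k; up to the ordering of the indices
    these are the first 2^k Walsh functions."""
    pts = np.array(list(product([0, 1], repeat=k)))
    return (-1.0) ** (pts @ pts.T)
```

For instance, `character_matrix(3)` is an $8\times8$ orthogonal (up to the factor $2^{k}$) $\pm1$ matrix whose first row is the constant character $w_0\equiv1$.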
The second case leads, of course, to the discrete trigonometric system $y\mapsto e(xy/N)$, where $e(\theta):=\exp(2\pi i\theta)$.
5.1. Kolmogorov widths
Recall the following well-known statement (see, for instance, [34], § 2.1).
Lemma 5.1. Let $G$ be a finite Abelian group, and let $A,B\subset G$ and $|A|+|B|>|G|$. Then $A+B=\{a+b\colon a\in A,\,b\in B\} = G$.
We use an immediate consequence of this lemma.
Lemma 5.2. Let $A,B\subset\widehat{G}$ and $|A|+|B|>|G|$. Then for any $m,n\in\mathbb N$
Our goal is to approximate $\{W_x\}$ in $L_{\mathrm{H}}$. Let $\mathcal X$ be the set of $x$ such that $\bigl|\sum_{i=1}^k x_i- k/2\bigr|\leqslant\sqrt{k}$. Hoeffding’s inequality shows that $|\mathcal X|>2^{k-1}$, so it is enough to approximate $W_x$ for $x\in\mathcal X$ (and then use Lemma 5.2).
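The bound $|\mathcal X|>2^{k-1}$ can also be verified by direct counting; here is a small sketch (an exact binomial count rather than Hoeffding's inequality):

```python
from math import comb, sqrt

def central_fraction(k: int) -> float:
    """Exact fraction of x in {0,1}^k with |sum(x) - k/2| <= sqrt(k)."""
    good = sum(comb(k, s) for s in range(k + 1) if abs(s - k / 2) <= sqrt(k))
    return good / 2 ** k

# Hoeffding gives fraction >= 1 - 2exp(-2) > 1/2; the exact count agrees.
```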
Fix $x\in\mathcal X$. Let $I:=\{i\colon x_i=1\}$ and
We consider the polynomial $q(t)$ that interpolates $(-1)^t$ for $t\in\mathbb Z\cap[k/4-2\lambda\sqrt{k}, k/4+2\lambda\sqrt{k}]$. Its degree $d$ is at most $4\lambda\sqrt{k}$; note that $4\lambda\sqrt{k}\leqslant k$. Then $W_x(y)=q(\langle x,y\rangle)$ for all $y\in Y_I$, hence
Let us bound the linear dimension of the family of functions $\{q(\langle x,\,\cdot\,\rangle)\}_x$ (it is sufficient to consider $x\in\mathcal X$, but it is more convenient to work with all $x\in\{0,1\}^k$). Their dimension is equal to the rank of the matrix $(q(\langle x,y\rangle))_{x,y\in\{0,1\}^k}$, where $x$ indexes rows and $y$ indexes columns. The rank is bounded by the number of monomials in the polynomial $q(x_1y_1+\dots+x_ky_k)$, because each monomial generates a rank-one matrix. Every monomial is a product of several $x_iy_i$, so we can bound their total number:
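For small parameters this rank bound can be checked directly. The sketch below (with an arbitrary random polynomial $q$ of degree $d$, for illustration only) compares the rank of $(q(\langle x,y\rangle))_{x,y}$ with the monomial count $\sum_{j\leqslant d}\binom{k}{j}$:

```python
import numpy as np
from itertools import product
from math import comb

def rank_and_bound(k: int, d: int, seed: int = 0):
    """Rank of (q(<x,y>))_{x,y in {0,1}^k} for a random degree-d polynomial q,
    together with the monomial-count bound sum_{j<=d} C(k, j)."""
    rng = np.random.default_rng(seed)
    coeffs = rng.integers(1, 10, size=d + 1).astype(float)
    pts = np.array(list(product([0, 1], repeat=k)), dtype=float)
    inner = pts @ pts.T                      # all inner products <x, y>
    M = sum(c * inner ** j for j, c in enumerate(coeffs))
    bound = sum(comb(k, j) for j in range(d + 1))
    return int(np.linalg.matrix_rank(M)), bound
```

For $k=6$, $d=2$ the matrix is $64\times64$, while the bound is $1+6+15=22$.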
Proof. We have to approximate characters $y\mapsto e(xy/2^k)$. We write $x=\sum_{i=0}^{k-1}x_i2^i$ and $y=\sum_{j=0}^{k-1}y_j2^j$ to identify $x,y\in\mathbb Z_{2^k}$ with the binary vectors $(x_0,\dots,x_{k-1})$ and $(y_0,\dots,y_{k-1})$. Using the 1-periodicity of $e(\,\cdot\,)$ we obtain
Hence it is sufficient to approximate the functions $\psi_x$.
As in the previous proof, first we deal with the set $\mathcal X:=\bigl\{x\colon \bigl|\sum_{i=0}^{k-1}x_i-k/2\bigr|\leqslant\sqrt{k}\,\bigr\}$. Fix $x\in\mathcal X$.
We use interpolation again. There exists a polynomial $q_s$ of degree $d\leqslant 4\lambda\sqrt{k}$ that interpolates the function $t\mapsto e(2^{-s}t)$ at the points $\mathbb Z\cap [k/4-2\lambda\sqrt{k},k/4+2\lambda\sqrt{k}]$. Then the function $e\bigl(2^{-s}\sum_{i+j=k-s}x_iy_j\bigr)$ is approximated by $q_s(\langle x,y\rangle)$ and the error in the Hamming metric is at most $2\exp(-2\lambda^2)$; moreover, $\dim\{q_s(\langle x,\,\cdot\,\rangle)\}_x \leqslant (ek/d)^d \leqslant (\sqrt{k}/\lambda)^{4\lambda\sqrt{k}}$.
The function that we wish to approximate is a product:
We approximate it by the product of the corresponding polynomials. The error in the Hamming metric is at most $2s_0\exp(-2\lambda^2)$. The rank of the product is at most the product of the ranks, that is, $(\sqrt{k}/\lambda)^{4s_0\lambda\sqrt{k}}$.
We use Lemma 5.2 to approximate $\psi_x$ in $L_{\mathrm{H}}$ for all $x$. Finally, we use (22) to obtain an $L_0$-approximation of the characters with accuracy $2\pi k 2^{-s_0} + 4s_0\exp(-2\lambda^2)$ by using dimension $\leqslant (k/\lambda^2)^{4s_0\lambda\sqrt{k}}$. It remains to take $s_0\asymp\lambda^2$.
To approximate continuous functions, we first discretize them by piecewise constant functions on a sufficiently fine partition into $2^k$ subintervals, and then approximate the resulting matrix (of the discrete Fourier transform) using the above result.
5.2. Trigonometric widths in $L_p$, $p<1$
Recall that the trigonometric widths $d_n^{\mathrm{T}}$ correspond to approximation by subspaces spanned by functions in the trigonometric system.
We consider two cases in parallel: the continuous case with harmonics $e(kx)$ on $\mathbb{T}$ and the discrete case with harmonics $e(kx/N)$ on $\mathbb Z_N$. We use the shorthand $L_p^N$ for the space $L_p(\mathbb Z_N)$ with the usual (quasi)norm
\begin{equation*}
|\Lambda|\leqslant C N^{1-1/(2m)}\log^{1/m}N.
\end{equation*}
We prove Theorem 5.3. We consider only the discrete case; the continuous case is quite analogous.
Proof of Theorem 5.3. We fix some number $m$ and use Lemma 5.4 to find a set $\Lambda$ and Lemma 5.3 to find a polynomial $T\in\mathcal T_m^N$. In order to approximate $e(kx/N)$ by the space $\mathcal T(\Lambda)$ of trigonometric polynomials with spectrum in $\Lambda$ we find $h\in\mathbb Z_N^*$ such that $k\pm h,\dots,k\pm mh\in\Lambda$. Consider the polynomial
It equals $e(kx/N)+s(x)$, where $\operatorname{supp}\widehat{s}\subset\Lambda$, so we have to estimate $\|t\|_{L_p^N}$. Note that $t(x)=e(kx/N)T_m(hx)$ and
Proof of Lemma 5.4. We start with the discrete case. Let $\Lambda\subset\mathbb Z_N$ be a random set: a point occurs in $\Lambda$ independently of the other points with some probability $\tau\in(0,1)$.
We call a step $h\in\mathbb Z_N^*$ admissible for $k\in \mathbb Z_N$ if $k\pm lh\in\Lambda$ for $l=1,\dots,m$. We have
\begin{equation*}
\mathsf{P}(h \text{ is admissible for }k)=\tau^{2m}.
\end{equation*}
We call $k$ ‘bad’ if there are no admissible steps for $k$. Assume that for each $k$ there are at least $S$ steps $h_1,\dots,h_S\in\mathbb Z_N^*$ such that the progressions $k\pm lh_i$, $l=1,\dots,m$, are disjoint (we estimate $S$ slightly below). Then the events that $h_i$ is admissible for $k$ are independent for different $i$. Hence the probability that $k$ is bad is at most the probability that none of the $h_i$ is admissible, that is, $(1-\tau^{2m})^S$.
The expected number of bad $k$ is at most $N(1-\tau^{2m})^S < N\exp(-S\tau^{2m})$ so with probability $\geqslant2/3$ this number does not exceed $3N\exp(-S\tau^{2m})$. Moreover, $\mathsf{E}|\Lambda|=N\tau$, so with probability $\geqslant2/3$ we have $|\Lambda|\leqslant 3N\tau$. Hence there exists $\Lambda$ such that both estimates are true. If $\tau$ is sufficiently small to ensure that
then there are no bad $k$ and $\Lambda$ is the required set.
It remains to estimate $S$ from below. There are $|\mathbb Z_N^*|\gg N/\log N$ possible steps in total. A step $h_1$ forbids the other steps $h$ that solve one of the $2m^2$ equations $lh \equiv \pm l_1h_1\pmod{N}$, $l,l_1\in\{1,\dots,m\}$. Each equation has at most $\mathrm{gcd}(N,l)\leqslant m$ solutions, so we can take
\begin{equation*}
S \asymp m^{-3}\frac{N}{\log N}.
\end{equation*}
The continuous case is analogous. Now, $\Lambda$ is a random subset of $\{-2N,\dots,2N\}$ and we consider $k\in\{-N,\dots,N\}$. To ensure that the progressions $\{k\pm lh, {l=1,\dots,m}\}$ are disjoint we take prime numbers $h$ in the interval $(m,N/m]$; hence $S \asymp m^{-1}N/\log N$.
The lemma is proved.
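The first-moment argument in the discrete case is easy to test numerically. The following toy sketch (parameters chosen for illustration only) samples a random set $\Lambda\subset\mathbb Z_N$ with density $\tau$ and counts the bad residues $k$:

```python
import random
from math import gcd

def sample_and_count_bad(N: int, m: int, tau: float, seed: int = 0):
    """Sample Lambda in Z_N with density tau; count k in Z_N having no
    step h coprime to N with k +- l*h in Lambda for all l = 1..m."""
    rng = random.Random(seed)
    lam = {x for x in range(N) if rng.random() < tau}
    steps = [h for h in range(1, N) if gcd(h, N) == 1]

    def admissible(k, h):
        return all((k + l * h) % N in lam and (k - l * h) % N in lam
                   for l in range(1, m + 1))

    bad = sum(1 for k in range(N)
              if not any(admissible(k, h) for h in steps))
    return len(lam), bad
```

For a prime $N$ and density well above the threshold, typically no bad residues at all remain, in line with the expectation bound $N\exp(-S\tau^{2m})$.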
§ 6. Further work
In the most interesting case of the $L_1$-norm there is much more to do.
Question 6.1. Is the trigonometric system $\{e(kx)\}_{1\leqslant k\leqslant N}$ rigid in $L_1$?
It is obvious that $d_n(\{\varphi_1,\dots,\varphi_N\},L_1)\geqslant A^{-1} d_n(B_1^N,\ell_\infty^N)$ for a uniformly bounded ($\|\varphi_k\|_\infty\leqslant A$) orthonormal system. So good approximation is impossible for $n\leqslant c\log N$. Can one improve this bound?
Question 6.2. Prove that $d_n(\{w_1,\dots,w_N\},L_1)\geqslant c$ for $n=n(N)$ such that $n/\log N\to\infty$.
The independence (or unconditionality) is sufficient for $L_1$-rigidity but this condition is, of course, very restrictive. Can one find a weaker condition or at least prove the rigidity of some concrete system?
Question 6.3. Is the Rademacher chaos $\{r_i(t)r_j(t)\}_{1\leqslant i,j\leqslant N}$ rigid in $L_1$?
Cf. Statement 3.4. This question is connected with the next one (also see Statement 4.1).
Question 6.4. Present explicit constructions of $\ell_1$-rigid sets $W\subset \{-1,1\}^N$ of polynomial (or at least subexponential) size.
In [16] the non-rigidity of DFT matrices was connected with the non-rigidity of circulants. In terms of widths we obtain the following question.
Question 6.5. Let $u\in[-1,1]^N$, and let $T$ be a cyclic coordinate shift operator. Is it true that
1.
A. Kolmogoroff (Kolmogorov), “Über die beste Annäherung von Funktionen einer gegebenen Funktionenklasse”, Ann. of Math. (2), 37:1 (1936), 107–110
2.
G. G. Lorentz, M. von Golitschek and Y. Makovoz, Constructive approximation. Advanced problems, Grundlehren Math. Wiss., 304, Springer-Verlag, Berlin, 1996, xii+649 pp.
3.
A. Pinkus, $n$-widths in approximation theory, Ergeb. Math. Grenzgeb. (3), 7, Springer-Verlag, Berlin, 1985, x+291 pp.
4.
Dinh Dũng, V. Temlyakov and T. Ullrich, Hyperbolic cross approximation, Adv. Courses Math. CRM Barcelona, Birkhäuser/Springer, Cham, 2018, xi+218 pp.
5.
V. M. Tikhomirov, “Approximation theory”, Analysis, v. II, Encyclopaedia Math. Sci., 14, Convex analysis and approximation theory, Springer-Verlag, Berlin, 1990, 93–243
6.
Y. Malykhin, “Matrix and tensor rigidity and $L_p$-approximation”, J. Complexity, 72 (2022), 101651, 13 pp.
7.
S. V. Lokam, “Complexity lower bounds using linear algebra”, Found. Trends Theor. Comput. Sci., 4:1–2 (2008), 1–155
8.
J. Alman and R. Williams, “Probabilistic rank and matrix rigidity”, STOC'17 Proceedings of the 49th annual ACM SIGACT symposium on theory of computing (Montreal, QC 2017), ACM, New York, 2017, 641–652
9.
S. M. Voronin and N. Temirgaliev, “On some applications of Banach measure”, Izv. Akad. Nauk Kaz. SSR, Ser. Fiz.-Mat., 1984, no. 5, 8–11 (Russian)
10.
V. E. Maĭorov, “Kolmogorov's $(n,\delta)$-widths of spaces of smooth functions”, Russian Acad. Sci. Sb. Math., 79:2 (1994), 265–279
11.
J. Creutzig, “Relations between classical, average, and probabilistic Kolmogorov widths”, J. Complexity, 18:1 (2002), 287–303
12.
B. S. Kashin, “Lower bounds for $m$-term approximations in the metric of the discrete space $L_n^0$”, Russian Math. Surveys, 76:5 (2021), 933–935
13.
V. F. Gaposhkin, “Lacunary series and independent functions”, Russian Math. Surveys, 21:6 (1966), 1–82
14.
B. S. Kashin, Yu. V. Malykhin and K. S. Ryutin, “Kolmogorov width and approximate rank”, Proc. Steklov Inst. Math., 303 (2018), 140–153
15.
J. Bourgain, “Bounded orthogonal systems and the $\Lambda(p)$-set problem”, Acta Math., 162:3–4 (1989), 227–245
16.
Z. Dvir and A. Liu, “Fourier and circulant matrices are not rigid”, 34th computational complexity conference, LIPIcs. Leibniz Int. Proc. Inform., 137, Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2019, 17, 23 pp.
17.
V. I. Ivanov and V. A. Yudin, “Trigonometric system in $L_p$, $0<p<1$”, Math. Notes, 28:6 (1980), 884–889
18.
T. Oikhberg and M. I. Ostrovskii, “Dependence of Kolmogorov widths on the ambient space”, Zh. Mat. Fiz. Anal. Geom., 9:1 (2013), 25–50
19.
E. Novak and H. Woźniakowski, Tractability of multivariate problems, v. I, EMS Tracts Math., 6, Linear information, Eur. Math. Soc., Zürich, 2008, xii+384 pp.
20.
R. S. Ismagilov, “On $n$-dimensional diameters of compacts in a Hilbert space”, Funct. Anal. Appl., 2:2 (1968), 125–132
21.
N. N. Vakhania, V. I. Tarieladze and S. A. Chobanyan, Probability distributions on Banach spaces, Math. Appl. (Soviet Ser.), 14, D. Reidel Publishing Co., Dordrecht, 1987, xxvi+482 pp.
22.
R. Vershynin, High-dimensional probability. An introduction with applications in data science, Camb. Ser. Stat. Probab. Math., 47, Cambridge Univ. Press, Cambridge, 2018, xiv+284 pp.
23.
J. D. Vaaler, “A geometric inequality with applications to linear forms”, Pacific J. Math., 83:2 (1979), 543–553
24.
S. Mendelson and R. Vershynin, “Entropy and the combinatorial dimension”, Invent. Math., 152:1 (2003), 37–55
25.
N. Alon, P. Frankl and V. Rödl, “Geometrical realization of set systems and probabilistic communication complexity”, SFCS'85 Proceedings of the 26th annual symposium on foundations of computer science (Portland, OR 1985), IEEE Computer Soc., Washington, DC, 1985, 277–280
26.
N. Alon, S. Moran and A. Yehudayoff, “Sign rank versus VC dimension”, 29th annual conference on learning theory, Proceedings of Machine Learning Research (PMLR), 49, 2016, 47–80, https://proceedings.mlr.press/v49/alon16.html
27.
B. S. Kashin and A. A. Saakyan, Orthogonal series, Transl. Math. Monogr., 75, Amer. Math. Soc., Providence, RI, 1989, xii+451 pp.
28.
I. Berkes, “On the uniform theory of lacunary series”, Number theory — Diophantine problems, uniform distribution and applications, Springer, Cham, 2017, 137–167
29.
S. V. Astashkin, The Rademacher system in function spaces, Birkhäuser/Springer, Cham, 2020, xx+559 pp.
30.
C. Bennett and K. Rudnick, On Lorentz–Zygmund spaces, Dissertationes Math. (Rozprawy Mat.), 175, PWN, Warszawa, 1980, 67 pp.
31.
E. D. Gluskin, “Intersections of a cube with an octahedron are poorly approximated by subspaces of small dimension”, Approximation of functions by special classes of operators, Interuniv. Collect. Sci. Works, Min. pros. RSFSR, Vologod. Gos. Ped. Inst., Vologda, 1987, 35–41 (Russian)
32.
D. J. Hajela, “Construction techniques for some thin sets in duals of compact abelian groups”, Ann. Inst. Fourier (Grenoble), 36:3 (1986), 137–166
33.
B. S. Kashin, “The diameters of octahedra”, Uspekhi Mat. Nauk, 30:4(184) (1975), 251–252 (Russian)
34.
Terence Tao and Van H. Vu, Additive combinatorics, Cambridge Stud. Adv. Math., 105, Reprint of the 2006 ed., Cambridge Univ. Press, Cambridge, 2010, xviii+512 pp.
Citation:
Yu. V. Malykhin, “Widths and rigidity”, Mat. Sb., 215:4 (2024), 117–148; Sb. Math., 215:4 (2024), 543–571