Abstract:
We consider the Kolmogorov widths of finite sets of functions. Any orthonormal system of $N$ functions in $L_2$ is rigid, that is, it cannot be well approximated by linear subspaces of dimension essentially smaller than $N$. This is not true for weaker metrics: it is known that in every $L_p$ for $p<2$ the first $N$ Walsh functions can be $o(1)$-approximated by a linear space of dimension $o(N)$.
We present some sufficient conditions for rigidity. We prove that the independence of functions (in the probabilistic meaning) implies rigidity in $L_1$ and even in $L_0$, the metric that corresponds to convergence in measure. In the case of $L_p$ for $1<p<2$ the condition is weaker: any $S_{p'}$-system is $L_p$-rigid.
We also obtain some positive results, for example, that the first $N$ trigonometric functions can be approximated by very low-dimensional spaces in $L_0$, and by subspaces generated by $o(N)$ harmonics in $L_p$ for $p<1$.
Bibliography: 34 titles.
This research was performed at Lomonosov Moscow State University and supported by the Russian Science Foundation under grant no. 22-11-00129,
https://rscf.ru/en/project/22-11-00129/.
In this paper we consider the Kolmogorov widths of finite sets of functions. Recall that the Kolmogorov width $d_n(K,X)$ is defined as the minimum distance $\sup_{x\in K}\rho(x,Q_n)_X$ from the set $K$ to $n$-dimensional subspaces $Q_n$ of the space $X$. This classical notion goes back to the paper [1] by Kolmogorov (1936). Widths are fundamental approximation characteristics of sets, and the behaviour of the sequence $(d_n(K,X))_{n=1}^\infty$ for functional classes and other finite-dimensional sets was actively studied by many authors. The books [2], Chs. 13 and 14, and [3] are standard references on the subject; see also a recent survey [4], § 4.3.
We will informally say that a set is rigid if it cannot be well approximated by low-dimensional subspaces. The starting point for us is the well-known result that any orthonormal system of functions $\varphi_1,\dots,\varphi_N$ is rigid in $L_2(0,1)$ (of course, this holds true in each Euclidean space):
(This equality for widths was first used by Stechkin; see [5] for details.)
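The equality in question is, in our reading, the classical one, $d_n(\{\varphi_1,\dots,\varphi_N\},L_2)=\sqrt{1-n/N}$: for any orthogonal projection $P$ of rank $n$ one has $\sum_k\|\varphi_k-P\varphi_k\|^2=N-\operatorname{tr}P\geqslant N-n$. A short NumPy sketch of this trace computation (an illustration only; the sizes $N=64$, $n=20$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 64, 20

# a random orthonormal system phi_1, ..., phi_N (columns of Phi)
Phi, _ = np.linalg.qr(rng.standard_normal((N, N)))

# a random n-dimensional approximating subspace with orthogonal projector P
B, _ = np.linalg.qr(rng.standard_normal((N, n)))
P = B @ B.T

# squared distances from each phi_k to the subspace
d2 = np.array([float(np.sum((phi - P @ phi) ** 2)) for phi in Phi.T])

avg_sq = float(d2.mean())          # equals 1 - trace(P)/N = 1 - n/N
worst = float(np.sqrt(d2.max()))   # hence the width is >= sqrt(1 - n/N)
print(avg_sq, worst)
```

The mean of the squared distances equals $1-n/N$ for every rank-$n$ projector, which yields the lower bound for the width; equality is attained by a suitable balanced subspace.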
The situation changes if we consider weaker metrics.
Theorem A [6]. Let $w_1,w_2,\dots$ be the Walsh system in the Paley numeration. Then for any $p\in[1,2)$ there exists $\delta=\delta(p)>0$ such that the following inequality holds for sufficiently large $N$:
This theorem appeared as a corollary of methods introduced in complexity theory for the related problem of matrix rigidity. The rigidity function of a matrix $A$ is defined as the Hamming distance from $A$ to the matrices of prescribed rank:
Matrix rigidity was introduced by Valiant in 1977 as an approach to lower bounds for linear circuit complexity (for example, the inequality $\operatorname{Rig}(A_N,\varepsilon N)\geqslant N^{1+\varepsilon}$ for some fixed $\varepsilon>0$ leads to superlinear lower bounds for circuit complexity); see the survey [7] for more details.
It was a surprising result by Alman and Williams [8] that the family of Walsh–Hadamard matrices is not rigid, that is, the rank of these matrices can be significantly reduced by changing a small number of elements. Theorem A is based on their construction.
In this paper we obtain sufficient conditions for systems of functions to be rigid in metrics weaker than $L_2$.
Average widths and $L_1$-rigidity
Our lower bounds on widths are based on the approximation properties of random vectors in $\mathbb{R}^N$. Let $\xi_1,\dots,\xi_N$ be independent random variables such that $\mathsf{E}\xi_i=0$ and $\mathsf{E}|\xi_i| = 1$. We prove that for any $n$-dimensional space $Q_n\subset\mathbb{R}^N$,
we call this quantity the average Kolmogorov width of a random vector $\xi$. In fact, average widths were originally considered by Voronin and Temirgaliev in [9]. This notion was further studied by Maiorov [10], Creutzig [11] and others. We discuss the notation $d_n^\mathrm{avg}$ and the properties of average widths in § 2.2. There is a simple equality
which links the rigidity of finite-dimensional vectors and the rigidity of finite function systems. Note that the averaging on the right-hand side is over a finite set of elements of $L_1$, that is, we minimize $\frac1N\sum_{k=1}^N\rho(\xi_k,Q_n)$ over the subspaces of $L_1$.
Thus, we can say that independence implies rigidity.
Theorem 1.1. Let $\xi_1,\dots,\xi_N$ be independent random variables such that $\mathsf{E}\xi_i=0$ and $\mathsf{E}|\xi_i| = 1$. Then
This also provides rigidity for the standard widths, because $d_n \geqslant d_n^\mathrm{avg}$. A classical example of an independent system is the Rademacher system.
The same result holds if we replace independence by unconditionality:
Approximation in the space of measurable functions
Consider the space $L_0(\mathcal X,\mu)$ of all measurable functions on some measure space $(\mathcal X,\mu)$. Convergence in measure is equivalent to convergence in certain metrics, say, $\displaystyle\int_{\mathcal X}\frac{|f-g|}{1+|f-g|}\,d\mu$ or $\sup\{\varepsilon\colon \mu(|f-g|\geqslant\varepsilon)\geqslant\varepsilon\}$. We will use the latter. Of course, this metric is weaker than the usual $L_p$-metrics. Approximation in $L_0$ is much less studied than approximation in $L_p$, but we believe that this subject contains interesting problems.
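For a discrete measure the latter quantity is straightforward to compute: $\varepsilon\mapsto\mu(|f-g|\geqslant\varepsilon)$ is a decreasing step function, so the supremum is attained among finitely many candidates. A minimal Python sketch (our own illustration; `l0_norm` is a hypothetical helper name):

```python
import numpy as np

def l0_norm(f, w=None):
    """sup{eps : mu(|f| >= eps) >= eps} for a function f given by its
    values on finitely many atoms with weights w (uniform by default)."""
    f = np.abs(np.asarray(f, dtype=float))
    w = np.full(len(f), 1.0 / len(f)) if w is None else np.asarray(w, float)
    order = np.argsort(-f)                 # sort |f| in decreasing order
    f_sorted, tails = f[order], np.cumsum(w[order])
    # for eps = min(value, tail measure) we have mu(|f| >= eps) >= eps,
    # and the supremum is attained among these candidates
    return float(np.max(np.minimum(f_sorted, tails), initial=0.0))
```

The distance $\|f-g\|_{L_0}$ is then `l0_norm(f - g, w)`.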
In § 3.2 we prove that independence implies rigidity even in the case of the $L_0$-metric.
Theorem 1.2. For any $\varepsilon\in(0,1)$ there exists $\delta>0$ such that if $\xi_1,\dots,\xi_N$ are independent random variables satisfying $\inf_{c\in\mathbb{R}}\|\xi_i-c\|_{L_0}\geqslant\varepsilon$, $i=1,\dots,N$, then
Let us mention a recent result due to Kashin [12].
Theorem B. There exist absolute constants $0<\gamma_0<1$ and $c_1,c_2>0$ such that for $N=1,2,\dots$, for any orthonormal basis $\Psi=\{\psi_k\}_{k=1}^N$ in $\mathbb{R}^N$ and any linear subspace $Q\subset\mathbb{R}^N$, $\dim Q\leqslant c_1 N$, the following inequality holds
It is known that lacunary systems $\varphi(k_jx)$ behave like independent variables; see [13], for example. The rigidity property is not an exception. We will prove that lacunary systems $\{\varphi(k_1x),\dots,\varphi(k_Nx)\}$ are rigid in $L_0$ if $\min k_{j+1}/k_j$ is sufficiently large (see Statement 3.3).
Another source of rigid systems is random matrices. It is well known that an $N\times N$ matrix $\mathcal E$ consisting of random signs is very rigid with high probability: $\operatorname{Rig}(\mathcal E,\delta N)\geqslant \delta N^2$. We observe that this is also true in the weaker $L_0$-metric:
This gives us an example of an $L_0$-rigid orthonormal system $\{\varphi_1,\dots,\varphi_N\}$ of functions that are piecewise constant on the intervals $((j-1)/N, j/N)$.
There is another metric in $L_0(\mathcal X,\mu)$, namely, $\mu\{x\colon f(x)\ne g(x)\}$. We denote it by $L_{\mathrm{H}}$; the distance $\|f-g\|_{L_{\mathrm{H}}}$ is an analogue of the Hamming distance. (It is called the ‘indicator distance’ in probability theory.) This metric appears in function theory in the context of ‘correction’ theorems. For example, Luzin’s theorem states that $C$ is dense in $L_{\mathrm{H}}$, and Menshov’s theorem implies that the set of functions with uniformly convergent Fourier series is dense in $L_{\mathrm{H}}$. Note that matrix rigidity is a particular case of average Kolmogorov widths in the normalized Hamming distance $L_{\mathrm{H}}^N$. Let $A$ be an $M\times N$ real matrix and $W_A=\{A_1,\dots,A_M\}\subset\mathbb{R}^N$ be the set of rows of $A$. Then
Some of our results can be stated in matrix terms; in some cases the matrix language is even more convenient. However, in this paper we prefer to stick to the functional-theoretic language; see also [14].
The $S_{p'}$-property
In the case $1<p<2$ we establish a less restrictive sufficient condition for rigidity.
Theorem 1.3. Let $\varphi_1,\dots,\varphi_N$ be an orthonormal system in a space $L_2(\mathcal X,\mu)$, $\mu(\mathcal X)=1$. Let $1<p<2$, and suppose that $\{\varphi_k\}$ has the $S_{p'}$-property, that is,
Note that Bourgain’s theorem [15] states that any uniformly bounded orthonormal system has an $S_{p'}$-subsystem of size $N^{2/p'}$; this subsystem is rigid in $L_p$. Hence an $o(1)$-approximation of a uniformly bounded orthonormal system in $L_p$ requires a dimension of at least $n\geqslant N^{2/p'}/2$. In particular, in Theorem A we must indeed have polynomial dependence between $n$ and $N$.
The analogue of Theorem 1.1 also holds true for $1<p<2$ (we do not prove it here), but it fails for $p>2$.
Positive approximation results
Let $G$ be a finite Abelian group and $\widehat{G}$ be the dual group of characters $\chi\colon G\to\mathbb C^*$.
It follows from [16] that characters are not rigid in $L_{\mathrm{H}}$:
Such a small approximation error, $N^{-1+c\varepsilon}$, is natural in the context of Valiant’s result on circuit complexity. However, we are interested in approximation in different ‘regimes’. We consider systems generated by specific characters, namely, the Walsh and trigonometric systems. First we show that they admit good approximation by very low-dimensional spaces.
We bound the trigonometric widths $d_n^{\mathrm{T}}$ of the first $N$ trigonometric functions, that is, we approximate them by trigonometric subspaces $Q_n=\{\sum_{k\in \Lambda}c_ke(kx)\}$, $|\Lambda|=n$.
Theorem 1.5. For any $p\in(0,1)$ and for sufficiently large $N$ we have
In § 2 we provide necessary definitions and some needed facts. In §§ 3–5 we prove the results mentioned in § 1, as well as some other results. In § 6 we discuss some remaining questions on the subject.
Acknowledgement
The author thanks Konstantin Sergeevich Ryutin, Sergei Vladimirovich Astashkin and Boris Sergeevich Kashin for fruitful discussions.
§ 2. Preliminaries
2.1. Metrics
Let $(\mathcal X,\mu)$ be a measure space, $\mu(\mathcal X)<\infty$. As usual, we identify functions (or random variables) that differ on a set of zero measure.
We consider classical $L_p$-spaces for $0<p<\infty$:
The space $L_0(\mathcal X,\mu)$ consists of all measurable functions. Convergence in measure in $L_0$ can be metricised by several metrics. Let us describe the Ky Fan metric that we use. Set
Note that $\|f\|_p \geqslant \|f\|_{L_0}^{1+1/p}$ and $\|f\|_{L_{\mathrm{H}}}\geqslant\|f\|_{L_0}$.
Let $N\in\mathbb N$. Then the space $\mathbb{R}^N$ is equipped with the usual norms $\ell_p^N$ corresponding to the counting measure on $\{1,\dots,N\}$:
We denote the unit ball in $\ell_p^N$ by $B_p^N$. Note that the meaning of $\|\cdot\|_p$ is clear from the context: it is either the norm $\|\cdot\|_{L_p}$, if the argument is a function, or the norm $\|\cdot\|_{\ell_p^N}$ if the argument is a vector. In the Euclidean case we write $|\cdot|$.
We let $L_p^N$ denote the $L_p$-spaces for the normalized (probability) measure on $\{1,\dots,N\}$. Obviously, $\|x\|_{L_p^N} = N^{-1/p}\|x\|_{\ell_p^N}$, $0<p\leqslant\infty$.
The quantity $\|x\|_{L_{\mathrm{H}}^N}$ is the fraction of nonzero coordinates of a vector, and the corresponding distance is the normalized Hamming distance. Note that $\lim_{p\to+0}\|x\|_p^p = \#\{k\colon x_k\ne0\} = N\|x\|_{L_{\mathrm{H}}^N}$.
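The relations above can be sanity-checked numerically. The sketch below (our illustration, with arbitrary data) assumes the discrete Ky Fan expression $\|x\|_{L_0^N}=\sup\{\varepsilon\colon \#\{k\colon|x_k|\geqslant\varepsilon\}/N\geqslant\varepsilon\}$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 100, 1.5
x = rng.standard_normal(N)
x[:30] = 0.0                           # 70 nonzero coordinates

def Lp(x, p):                          # L_p^N: normalized counting measure
    return float(np.mean(np.abs(x) ** p) ** (1 / p))

def L0(x):                             # sup{eps : #{|x_k| >= eps}/N >= eps}
    v = np.sort(np.abs(x))[::-1]
    tails = np.arange(1, len(v) + 1) / len(v)
    return float(np.max(np.minimum(v, tails)))

lhs = Lp(x, p)                                   # L_p^N norm
rescaled = N ** (-1 / p) * float(np.sum(np.abs(x) ** p) ** (1 / p))
l0 = L0(x)                                       # ||x||_p >= l0^(1+1/p)
count = float(np.sum(np.abs(x) ** 1e-3))         # ~ number of nonzeros
print(lhs, rescaled, l0, count)
```

Here `lhs` and `rescaled` agree (the rescaling identity), `lhs` dominates `l0 ** (1 + 1/p)`, and `count` is close to the number of nonzero coordinates, illustrating the limit as $p\to+0$.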
In the case $p=0$ we do not work with $\ell_0^N$, but only with $L_0^N$:
Let $X$ be a linear space equipped with a metric (or a more general distance functional) $\rho$. Recall that the Kolmogorov width of order $n\in\mathbb Z_+$ of a set $K\subset X$ is defined by
where $\rho(x,Q)_X :=\inf_{y\in Q} \rho(x,y)$ and the infimum is taken over the linear subspaces of $X$ of dimension at most $n$. Usually, $X$ is a normed space and $\rho(x,y)=\|x-y\|_X$; note that this is not required in Kolmogorov’s original paper, and we will indeed use widths in a more general context.
Average widths
Let $X$ be a linear space with metric and $\mu$ be a Borel measure on $X$. For $p\in(0,\infty)$ we define the $p$-averaged Kolmogorov width of $\mu$ in $X$ by
If $p=1$, then we write simply $d_n^{\mathrm{avg}}(\mu,X)$.
From now on we assume that $\mu$ is a probability measure. Equivalently, we can consider the widths of a (Borel) random vector $\xi$ with values in $X$:
If $K$ is a subset of $X$ and it is clear what a ‘random point in $K$’ means, then the width $d_n^{\mathrm{avg}}(K,X)_p$ is defined as the width of such a random point. For example, for a finite $K$ we have
Let us discuss the new notation $d_n^{\mathrm{avg}}(\xi,X)_p$. Some authors write $d_n^{(\mathrm{a})}$ instead of $d_n^{\mathrm{avg}}$. This can be confusing because $d_n^{\mathrm{a}}$ denotes absolute widths (see [18]). Also the use of $\mathrm{avg}$ is standard in average settings in information-based complexity; see [19]. Some authors write $d_n^{(\mathrm{a})}(X,\mu)$, but when we deal with sets and random variables the notation $d_n^{\mathrm{avg}}(\xi,X)$ seems to be more intuitive (the object whose width is considered goes first).
Properties
The most basic inequality that we always have in mind is
Statement 2.1. Let $\xi_1,\dots,\xi_N\colon\Omega\to\mathbb{R}$ be some random variables such that $\mathsf{E}|\xi_i|^p<\infty$ for all $i$. Then for $p\in[1,\infty)$ and $n\in\mathbb Z_+$ we have
Note that in the width on the left-hand side the averaging is over the finite set of ‘points’ $\xi_1,\dots,\xi_N$ in the space $L_p(\Omega)$; and on the right-hand side we see the averaged width of the random vector $\xi=(\xi_1,\dots,\xi_N)$ in $\mathbb{R}^N$. In the case of discrete $\Omega$ equality (3) is a simple consequence of the equality of the row-rank and the column-rank of a matrix. Let us give the full proof for completeness.
Proof. Consider the error of an approximation $\xi_k\approx \eta_k$:
Note that the random variables $\eta_1,\dots,\eta_N$ lie in a subspace of dimension at most $n$ of $L_p(\Omega)$ if and only if there is a subspace $Q\subset\mathbb{R}^N$, $\dim Q\leqslant n$, such that
To prove the reverse inequality we have to construct $\eta$ from an (almost) optimal subspace $Q\subset\mathbb{R}^N$. Note that for any $\varepsilon>0$ there is a Borel-measurable (for example, piecewise-constant) map $\pi\colon\mathbb{R}^N\to Q$, such that
Take $\eta := \pi(\xi)$; this is a random vector because $\pi$ is Borel. Moreover, $\eta\in Q$ almost surely, so $\eta_1,\dots,\eta_N$ span an at most $n$-dimensional space in $L_p(\Omega)$. It remains to substitute $x=\xi$ into the previous inequality, take the $p$-expectation and use (4) to finish the proof.
An average width in Hilbert space
We present a formula for the 2-averaged width of a random vector in Hilbert space. This formula is a generalization of the Eckart–Young theorem on matrix approximation in the Frobenius norm; it is also a reformulation of Ismagilov’s theorem [20]. However, in our terminology it takes a particularly convenient form, so we state it explicitly. Our proof repeats Ismagilov’s and is presented for completeness.
Let $H$ be a separable Hilbert space (of finite or infinite dimension) and $\xi$ be a random vector in $H$ such that $\mathsf{E}|\xi|^2<\infty$. Consider the correlation operator of $\xi$ defined by $\langle Au,v\rangle = \mathsf{E}\langle\xi,u\rangle\langle\xi,v\rangle$. It is known that $A$ is a compact positive selfadjoint operator (see, for example, [21], Ch. 3, § 2). A sequence $(\lambda_n)$ is the sequence of eigenvalues of $A$ if and only if there is an orthonormal basis $\{\varphi_n\}$ (an eigenbasis of $A$) such that
We arrange the eigenvalues in decreasing order: $\lambda_1\geqslant\lambda_2\geqslant\dots\geqslant0$.
Statement 2.2. We have $d_n^\mathrm{avg}(\xi,H)_2 = (\sum_{k>n}\lambda_k)^{1/2}$.
Proof. Fix a basis (5) and put $\xi_k:=\langle\xi,\varphi_k\rangle$. We bound the width from below. Consider some approximating space $Q_n$ with orthonormal basis $\{\psi_1,\dots,\psi_n\}$. Then
We have $\mathsf{E}|\xi|^2=\sum_{k\geqslant1}\mathsf{E}\xi_k^2=\sum_{k\geqslant 1}\lambda_k$, so it is sufficient to find an upper bound for the projection:
Note that the numbers $(\mu_k)$ belong to $[0,1]$ and their sum equals $n$. By the monotonicity of the $\lambda_k$ the sum $\sum_{k\geqslant1}\lambda_k\mu_k$ is maximal when $\mu_k = 1$ for $k\leqslant n$ and $\mu_k=0$ for $k>n$. We have obtained a lower bound for the width. Equality is attained at the subspace $Q_n = \operatorname{span}\{\varphi_1,\dots,\varphi_n\}$.
The statement is proved.
It is often convenient to work with singular values $\sigma_k:=\lambda_k^{1/2}$ instead of eigenvalues. One can rewrite (5) in terms of the Schmidt decomposition:
Indeed, the singular value decomposition of $X$ gives us a basis $\{\varphi_k\}_{k=1}^M$ such that the coordinate vectors $(\langle x^1,\varphi_k\rangle,\dots,\langle x^N,\varphi_k\rangle)$ are orthogonal and have length $\sigma_k(X)$. For $\xi$ being a random choice of $x^i$ we have $\mathsf{E}\langle\xi,\varphi_k\rangle^2=N^{-1}\sigma_k^2(X)$ and $\mathsf{E}\langle\xi,\varphi_k\rangle\langle\xi,\varphi_l\rangle=0$, hence $\sigma_k(\xi) = N^{-1/2}\sigma_k(X)$.
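A numerical check of Statement 2.2 and of the relation $\sigma_k(\xi)=N^{-1/2}\sigma_k(X)$ in the finite-dimensional case (our illustration with arbitrary sizes; the optimal subspace is spanned by the top right singular vectors, as in the Eckart–Young theorem):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, n = 30, 50, 5
X = rng.standard_normal((N, M))    # rows x^1, ..., x^N; xi = a uniform row

A = X.T @ X / N                    # correlation matrix of xi
lam = np.sort(np.linalg.eigvalsh(A))[::-1]
tail = float(lam[n:].sum())        # Statement 2.2: (width)^2 = tail sum

# optimal subspace: span of the top-n right singular vectors of X
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Q = Vt[:n].T
proj_err2 = float(np.mean(np.sum((X - X @ Q @ Q.T) ** 2, axis=1)))

# any other n-dimensional subspace does no better
B, _ = np.linalg.qr(rng.standard_normal((M, n)))
rand_err2 = float(np.mean(np.sum((X - X @ B @ B.T) ** 2, axis=1)))
print(proj_err2, tail, rand_err2)
```

The mean squared distance to the optimal subspace coincides with $\sum_{k>n}\lambda_k$, while a random subspace gives a larger error.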
Finally, an orthonormal system $\varphi_1,\dots,\varphi_N$ in a Euclidean space $E$ corresponds to the identity matrix $X$ (in appropriate coordinates), so we have (see also (1))
However, we need only the case when $\Phi$ is the trigonometric system $\{e(kx)\}_{k\in\mathbb Z}$ in the continuous case or $\{e(kx/N)\}_{k\in\mathbb Z_N}$ in the discrete case. In this situation the width is called the trigonometric width and is denoted by $d_n^{\mathrm{T}}$.
Usually, we consider real spaces (and the linear dimension over $\mathbb R$). When complex-valued functions are approximated, one can also consider linear spaces over $\mathbb C$. The distinction here is not important for us because in all cases where complex-valued functions are approximated we are interested in the dimension $n$ up to its order of magnitude.
2.3. Miscellaneous
The material in this subsection is contained in [22], §§ 2.2, 2.8 and 8.8.
Probability
We use the probabilistic notation $\mathsf{P}$ for the probability, $\mathsf{E}$ for the expectation and $\operatorname{Law}$ for the distribution. We also use the conditional expectation $\mathsf{E}(\xi\mid\eta)$, but only for $\eta$ with a finite number of values.
We make use of standard tail estimates. Let $X_i$, $\mathsf{E} X_i=0$, be independent random variables.
Hoeffding’s inequality
Let $a_i\leqslant X_i\leqslant b_i$ almost surely; then
$$
\begin{equation*}
\mathsf{P}\biggl(\biggl|\sum_i X_i\biggr|\geqslant t\biggr)\leqslant 2\exp\biggl(-\frac{2t^2}{\sum_i(b_i-a_i)^2}\biggr), \qquad t>0.
\end{equation*}
$$
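A Monte Carlo sketch of Hoeffding's bound $2\exp(-2t^2/\sum_i(b_i-a_i)^2)$ for sums of random signs ($a_i=-1$, $b_i=1$; an illustration with arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
N, trials, t = 100, 20000, 25.0

# sums of N independent random signs (so a_i = -1, b_i = 1)
S = rng.choice([-1.0, 1.0], size=(trials, N)).sum(axis=1)
empirical = float(np.mean(np.abs(S) >= t))

# Hoeffding bound 2 exp(-2 t^2 / sum (b_i - a_i)^2), here the sum is 4N
bound = 2 * np.exp(-2 * t ** 2 / (4 * N))
print(empirical, bound)
```

The empirical tail probability stays below the bound, as it must.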
Let us recall the definition of the combinatorial dimension of a class $\mathcal F$ of functions $f\colon I\to\mathbb{R}$. We say that a set $J\subset I$ is $t$-shattered ($t>0$) by $\mathcal F$, if there is a function $h\colon J\to\mathbb{R}$, such that for any choice of signs $\sigma\colon J\to\{-1,1\}$ one can find a function $f_\sigma\in\mathcal F$ such that
$$
\begin{equation}
\min_{x\in J}\sigma(x)(f_\sigma(x)-h(x))\geqslant t.
\end{equation}
\tag{6}
$$
Then the combinatorial dimension $\operatorname{vc}(\mathcal F,t)$ is the maximum cardinality of a set $t$-shattered by $\mathcal F$.
If $\mathcal F$ is convex and centrally symmetric, then one can assume that $h(x)\equiv 0$. Indeed, we can replace $\{f_\sigma\}$ by $\{\frac12(f_\sigma-f_{-\sigma})\}$.
We will use the simple (known) fact that
$$
\begin{equation}
\text{if } \operatorname{vc}(\mathcal F,t)> n, \quad\text{then } d_n(\mathcal F,\ell_\infty(I))\geqslant t.
\end{equation}
\tag{7}
$$
Indeed, let $J$ be a $t$-shattered set, $|J|=n+1$, and suppose that there is an approximation by an $n$-dimensional subspace: $\mathcal F\ni f\approx g$, $g\in Q_n$, $\|f-g\|_\infty < t$. Then there exists a nontrivial linear functional $\Lambda(f) := \sum_{x\in J}\lambda(x)f(x)$ such that $\Lambda(g)\equiv 0$ for $g\in Q_n$. Assume that $\Lambda(h)\geqslant 0$, where $h$ is from (6). Consider a function $f$ such that $\operatorname{sign}(\lambda(x))(f(x)-h(x))\geqslant t$ for all $\{x\in J\colon \lambda(x)\ne0\}$. Then for the approximating function we have $\operatorname{sign}(\lambda(x))(g(x)-h(x))>0$, hence $\Lambda(g)=\Lambda(g-h)+\Lambda(h)>0$. We obtain a contradiction. The case $\Lambda(h)\leqslant0$ is analogous.
We will also make use of the classical VC-bound for classes (that is, in fact, families of subsets) $\mathcal F\colon I\to\{-1,1\}^k$:
Thus, when the left-hand inequality in (8) holds, taking the projection of the class onto the set $J$, $|J|>d$, we obtain the whole cube $\{-1,1\}^J$.
There is a useful bound on binomial coefficients:
$$
\begin{equation}
\binom{k}{0}+\dots+\binom{k}m \leqslant \biggl(\frac{ek}m\biggr)^m, \qquad 0\leqslant m \leqslant k.
\end{equation}
\tag{9}
$$
Note that the function $(ek/x)^x$ is increasing for $x\in(0,k]$.
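Both the bound (9) (for $1\leqslant m\leqslant k$; the case $m=0$ is trivial) and the monotonicity of $(ek/x)^x$ are easy to verify numerically:

```python
from math import comb, e

def lhs(k, m):                         # binomial tail sum in (9)
    return sum(comb(k, i) for i in range(m + 1))

# the bound (e k / m)^m for 1 <= m <= k
ok = all(lhs(k, m) <= (e * k / m) ** m
         for k in range(1, 40) for m in range(1, k + 1))

# monotonicity of x -> (e k / x)^x on (0, k], here for k = 10
k = 10
grid = [j / 10 for j in range(1, 10 * k + 1)]
vals = [(e * k / x) ** x for x in grid]
mono = all(u <= v for u, v in zip(vals, vals[1:]))
print(ok, mono)
```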
We will often consider $1$-periodic functions, that is, functions on $\mathbb T:=\mathbb R/\mathbb Z$. We also use the notation $e(\theta):=\exp(2\pi i\theta)$.
We denote positive constants by $c,c_1,c_2,C,\dots$. If there is dependence on some parameters, then we specify it explicitly: $c_p,c(\varepsilon),\dots$ .
§ 3. Sufficient conditions for rigidity in weak metrics
3.1. Rigidity in $L_1$
For definiteness we put $\operatorname{sign}(x)=1$ for $x\geqslant 0$ and $\operatorname{sign}(x)=-1$ for $x<0$.
Theorem 3.1. Let $\xi=(\xi_1,\dots,\xi_N)$ be a random vector in $\mathbb{R}^N$ such that $\mathsf{E}|\xi_i|\geqslant1$ for all $i$. Let $\varepsilon\in(0,1)$ and $n\leqslant N(1-\varepsilon)$. Then
Before proving the theorem let us note that the distance in $\ell_1^N$ from a vector $x\in\mathbb{R}^N$ to a subspace $Q_n\subset\mathbb{R}^N$ can be written in dual terms as
$$
\begin{equation*}
\rho(x,Q_n)_{\ell_1^N}=\max\bigl\{\langle x,y\rangle\colon y\in Q_n^{\perp},\ \|y\|_{\infty}\leqslant 1\bigr\}.
\end{equation*}
$$
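This duality ($\rho(x,Q_n)_{\ell_1^N}$ equals the maximum of $\langle x,y\rangle$ over $y\in Q_n^\perp$ with $\|y\|_\infty\leqslant1$) can be verified exactly for a one-dimensional subspace $Q=\operatorname{span}(a)$: the primal minimum of $c\mapsto\|x-ca\|_1$ is attained at a breakpoint $c=x_j/a_j$, and the subgradient condition there produces a dual certificate. A self-contained sketch (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 12
x = rng.standard_normal(N)
a = rng.standard_normal(N)            # Q = span(a), one-dimensional

# Primal: c -> ||x - c a||_1 is piecewise linear and convex, so its
# minimum is attained at a breakpoint c = x_j / a_j.
cands = x / a
errs = [float(np.abs(x - c * a).sum()) for c in cands]
j = int(np.argmin(errs))
c_star, primal = cands[j], errs[j]

# Dual certificate: y orthogonal to a, ||y||_inf <= 1, <x, y> = primal.
# The subgradient condition at c_star guarantees |y_j| <= 1.
y = np.sign(x - c_star * a)
y[j] = -np.dot(np.delete(a, j), np.delete(y, j)) / a[j]
dual = float(np.dot(x, y))
print(primal, dual)
```

Weak duality ($\langle x,y\rangle\leqslant\|x-ca\|_1$ for feasible $y$) is immediate; the sketch exhibits a $y$ attaining equality.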
So central cross-sections of the cube $B_\infty^N$ play an important role here. We prove the following lemma.
Lemma 3.1. Let $K$ be an $m$-dimensional central section of the cube $B_\infty^N$, and let $m\geqslant \varepsilon N$. Then $\operatorname{vc}(K,c_1(\varepsilon)) \geqslant c_2(\varepsilon)N$.
The proof of the lemma relies on two useful results.
Theorem D [23]. Any $m$-dimensional cross-section of a volume-one cube has volume at least $1$:
where $C$ and $c$ are positive absolute constants.
Here $N_\delta(K,X)$ is the size of a minimal $\delta$-net for a set $K$ in the space $X$.
Proof of Lemma 3.1. Let $K=L_m\cap B_\infty^N$. From Theorem D we obtain $\operatorname{Vol}_m(K)\geqslant 2^m$, and we can use the standard volume estimate
We apply (11) to the set $K$ (note that $\|z\|_\infty\leqslant 1$ for $z\in K$ by definition) in the space $L_2^N$ for $t=N^{-1/2}r\asymp \varepsilon^{1/2}$:
Lemma 3.1 produces a set of coordinates $\Lambda\subset\{1,\dots,N\}$, $|\Lambda|\geqslant c(\varepsilon)N$, and a number $v\geqslant c(\varepsilon)$ such that for any choice of signs $\sigma\colon\Lambda\to\{-1,1\}$ there is a vector $z^\sigma\in K$ such that
$$
\begin{equation}
\min_{i\in\Lambda} \sigma_i z^\sigma_i \geqslant v.
\end{equation}
\tag{12}
$$
(Recall that, as $K$ is convex and centrally-symmetric, we may assume that $z^\sigma$ oscillates around zero.)
Fix $\sigma^*\colon\Lambda\to\{-1,1\}$, and let $\mathsf{E}^*$ be the conditional expectation $\mathsf{E}^*(\,\cdot\,)=\mathsf{E}(\,\cdot\mid\operatorname{sign}\xi_i =\sigma_i^*,\,i\in\Lambda)$. Then we have
Here we have used that $\|z^{\sigma^*}\|_\infty\leqslant 1$. We average over $\sigma^*$ and obtain (13).
The theorem follows from (13), because the first sum there is at least $v|\Lambda|\geqslant c_\varepsilon N$, and for each term in the second sum we have
Let us present some corollaries of Theorem 3.1 for the case when $\mathsf{E}(\xi_i\mid\{\operatorname{sign}\xi_j\}_{j\ne i})\equiv0$.
Corollary 3.1. Let $\xi_1,\dots,\xi_N$ be independent random variables such that $\mathsf{E}\xi_i=0$ and $\mathsf{E}|\xi_i|\geqslant 1$. Then for any random variables $\eta_1,\dots,\eta_n$, $n\leqslant N(1-\varepsilon)$, and any coefficients $\{a_{i,j}\}$ we have
The condition $\|\xi_i-c\|_{L_0}\geqslant\varepsilon$ is essential here because it forbids an approximation of the vector $\xi$ by a constant vector. This condition allows us to ‘separate’ slightly the values of the $\xi_i$.
Lemma 3.2. Let $\varepsilon,\tau\in(0,1]$, and let $\zeta$ be a random variable such that $\inf_{c\in\mathbb{R}}\|{\zeta-c}\|_{L_0}\geqslant\varepsilon$. Then there exist real numbers $a<b$, $|b-a|\geqslant\frac{\tau\varepsilon}2$, such that
Note that $\mathsf{P}(\zeta\leqslant q_-)\geqslant \varepsilon/3$ and $\mathsf{P}(\zeta\geqslant q_+)\geqslant \varepsilon/3$. We see that $|q_+-q_-|\geqslant\varepsilon$, because otherwise $\|\zeta-c\|_{L_0}\leqslant\frac23\varepsilon$ for $c=(q_-+q_+)/2$. We fix $k:=\lceil1/\tau\rceil$, divide $[q_-,q_+]$ into $k$ equal subintervals and choose $(a,b)$ to be the one having the least probability $\mathsf{P}(a<\zeta<b)$. Then this probability is at most $1/k\leqslant\tau$. Finally, $|b-a|\geqslant |q_+-q_-|/k \geqslant \tau\varepsilon/2$.
The lemma is proved.
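The construction in the proof is easy to trace on a concrete discrete law (our example: $\zeta$ uniform on ten atoms, $\varepsilon=1/2$, $\tau=1/4$; `q_minus` and `q_plus` play the role of the $\varepsilon/3$-quantiles $q_-$ and $q_+$):

```python
import numpy as np

# a concrete discrete law for zeta: ten equiprobable atoms 0, 1, ..., 9
vals = np.arange(10, dtype=float)
probs = np.full(10, 0.1)
eps, tau = 0.5, 0.25

# the eps/3-quantiles q_- and q_+ from the proof
q_minus = vals[np.searchsorted(np.cumsum(probs), eps / 3)]
q_plus = vals[::-1][np.searchsorted(np.cumsum(probs[::-1]), eps / 3)]

# split [q_-, q_+] into k = ceil(1/tau) subintervals, take the lightest
k = int(np.ceil(1 / tau))
edges = np.linspace(q_minus, q_plus, k + 1)
masses = [float(probs[(vals > lo) & (vals < hi)].sum())
          for lo, hi in zip(edges[:-1], edges[1:])]
i = int(np.argmin(masses))
a, b = float(edges[i]), float(edges[i + 1])
print(a, b, masses[i])
```

The chosen interval $(a,b)$ carries probability at most $\tau$ and has length at least $\tau\varepsilon/2$, as the lemma asserts.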
Now we prove the theorem.
Proof of Theorem 3.2. We fix some small $\delta>0$ (to be chosen later) and estimate $\mathsf{P}(\rho(\xi,Q_n)_{L_0^N}<\delta)$, where $\dim Q_n\leqslant n\leqslant \delta N$. Consider an almost best $L_0^N$ approximation of $(\xi_1,\dots,\xi_N)$ by a (random) vector $(y_1,\dots,y_N)$ in the subspace $Q_n$. Let
be the (random) set of badly approximated coordinates. If $\|\xi-y\|_{L_0^N} < \delta$, then $|\Lambda|<\delta N$, so it is sufficient to bound the probability of the event $\{|\Lambda| \leqslant \delta N\}$.
We apply Lemma 3.2 to $\tau := 5\delta/\varepsilon$ and $\xi_k$ and find intervals $(a_k,b_k)$ such that
be the (random) set of ‘intermediate’ coordinates. We have $\mathsf{E}|\Gamma|\leqslant \tau N$; by Bernstein’s inequality we have $\mathsf{P}(|\Gamma|\leqslant 2\tau N)\geqslant 1-2\exp(-c\tau N)$ and
Fix some sets $\Lambda^\circ,\Gamma^\circ\subset\{1,\dots,N\}$ of size $|\Lambda^\circ|\leqslant\delta N$ and $|\Gamma^\circ|\leqslant 2\tau N$ and consider the event $\{\Lambda=\Lambda^\circ,\,\Gamma=\Gamma^\circ\}$. Let $N'=N-|\Lambda^\circ\cup\Gamma^\circ|$; note that $N'\geqslant N/2$. For $x\in\mathbb{R}^N$ let $x'\in\mathbb{R}^{N'}$ be the vector $x$ without coordinates in $\Lambda^\circ\cup\Gamma^\circ$. Then $\|\xi'-y'\|_{\ell_\infty^{N'}} \leqslant \delta$, and we have $y'\in Q_n'$; therefore,
Consider the random vector $\pi$ with coordinates $\pi_k := -1$ for $\xi_k\leqslant a_k$, $\pi_k := 0$ for $a_k<\xi_k<b_k$ and $\pi_k := 1$ for $\xi_k\geqslant b_k$. Note that $\pi'\in\{-1,1\}^{N'}$ by construction. We argue that the set $S := \{\pi'(\omega)\colon \Lambda(\omega)=\Lambda^\circ,\,\Gamma(\omega)=\Gamma^\circ\}$ of all possible values of $\pi'$ contains at most $(eN'/n)^{n}$ elements. Indeed, otherwise (8) and (9) imply that $S$ contains a cube of dimension $n+1$ and
so (7) implies that $d_n(K,\ell_\infty^{N'})\geqslant 5\delta/4$, which contradicts (16). For any $s\in S$ we have $\mathsf{P}(\pi'=s)\leqslant(1-\varepsilon/3)^{N'}$, so
Indeed, there are exponentially many subspaces, so we can choose smaller ${\delta=\delta(A,\varepsilon)}$ to provide an exponential bound on the probability.
Proof. Let $d_n^\mathrm{avg}(\{\xi_1,\dots,\xi_N\},L_0) \leqslant \varepsilon\leqslant 1$. Then for some $\eta_1,\dots,\eta_N$ in an $n$-dimensional space we have $\sum\|\xi_k-\eta_k\|_{L_0}\leqslant N\varepsilon$. Consider the random sets $\Lambda_t := \{k\colon |{\xi_k-\eta_k}|\geqslant t\}$; note that $\|\xi-\eta\|_{L_0^N}\geqslant t$ if and only if $|\Lambda_t|\geqslant tN$.
Let $A$ be a signum matrix, that is, a matrix with entries $\pm1$. We define the signum rank $\operatorname{rank}_\pm A$ to be the minimum rank of matrices $B$ such that $\operatorname{sign} B_{i,j}\equiv A_{i,j}$ (in particular, $B_{i,j}\ne0$). There is a bound on the number of signum $N\times N$ matrices with signum rank at most $r$:
(see [25], and also [26] and [6]). For $r=cN$ we obtain the bound $2^{bN^2}$, where $b=b(c)\to0$ as $c\to0$. Hence, if $c$ is small, then there are very few signum matrices such that $\operatorname{rank}_\pm A\leqslant cN$. Moreover, almost all signum matrices are quite distant from such matrices in the Hamming metric. Hence almost all signum matrices are also distant from the low-rank matrices with $\operatorname{rank} B\leqslant cN$ in the $L_0$-distance. Now we state a corollary in terms of random signum matrices (with equiprobable signs).
Statement 3.1. Let $\mathbf{\mathcal E}$ be a random matrix with independent $\pm1$ entries. Then
Statement 3.2. Let $\mathbf{\varphi}_1,\dots,\mathbf{\varphi}_N$ be a random system in $D_N(0,1)$ such that $\varphi_{k,j} :=\varphi_k|_{((j-1)/N,j/N)}\equiv\varepsilon_{k,j}$, where the $\varepsilon_{k,j}$ are independent random signs. Then
Note that, although we use independent random variables in the definition of $\varphi_k$, the system $\{\varphi_k\}_1^N$ is by no means independent.
Proof of Statement 3.2. Let us prove that if $\{\varphi_k\}$ is not rigid in $L_0(0,1)$, then the matrix $\Phi := (\varphi_{k,j})$ can be well approximated in $L_0^{N\times N}$; this will contradict Statement 3.1.
Consider an approximation $\varphi_k\approx g_k$ in $L_0$ by functions in some $n$-dimensional subspace. There is a technical subtlety: we cannot claim that $g_k\in D_N$ (one cannot average over intervals because such an operator is not defined in $L_0$). Instead, we use Lemma 3.3: it shows that the rigidity of the system $\xi_k := \varphi_k$ is equivalent to the rigidity of the vector $(\xi_1,\dots,\xi_N)$, hence it is determined by the distribution of this vector. So we can assume that the measure on the space $(0,1)$, where everything is defined, consists of ‘atoms’ $((j-1)/N,j/N)$.
Finally, if $\varphi_k\approx g_k$, $g_k\in D_N$, and the average of $\|\varphi_k - g_k\|_{L_0}$ is less than $\delta$, then we have $\|\varphi_k-g_k\|_{L_0}\leqslant \gamma := \delta^{1/2}$ for all but at most $\gamma N$ indices. Therefore, $\|\Phi-G\|_{L_0^{N\times N}} \leqslant 2\gamma$.
The statement is proved.
Corollary 3.4. There exists an orthonormal system $\varphi_1,\dots,\varphi_N\in D_N(0,1)$ that is rigid in $L_0$:
Proof. We construct a system $\varphi_1,\dots,\varphi_N\!\in\! D_{2N}(0,2)$ instead. We define $\varphi_1,\dots,\varphi_N$ on $(0,1)$ using random values $\varphi_i|_{((j-1)/N,j/N)}=\pm\delta$ for some $\delta$ to be specified below. This system will be $L_0$-rigid with high probability.
Then we extend this system to $(0,2)$ to make it orthonormal. A necessary and sufficient condition for the existence of such an extension is known:
(see, for instance, [27], Ch. 8). If $\delta$ is sufficiently small, then this condition holds true with high probability, because the spectral norm of a random $\pm1$ matrix is $O(N^{1/2})$; see, for example, [22], Ch. 4.
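A numerical illustration of both facts: the spectral norm of an $N\times N$ random sign matrix is close to $2\sqrt N$, and for small $\delta$ the Gram matrix $G$ of the piecewise $\pm\delta$ system satisfies $I-G\succeq0$ (which is, presumably, the extension condition referred to above). The parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200
E = rng.choice([-1.0, 1.0], size=(N, N))

# spectral norm of a random sign matrix is ~ 2 sqrt(N)
ratio = float(np.linalg.norm(E, 2) / np.sqrt(N))

# Gram matrix of the piecewise +-delta system on (0,1):
# <phi_k, phi_l> = delta^2 / N * (E E^T)_{k,l}
delta = 0.3
G = delta ** 2 / N * (E @ E.T)
min_eig = float(np.linalg.eigvalsh(np.eye(N) - G).min())
print(ratio, min_eig)
```

Since $\|G\|\leqslant\delta^2\|E\|^2/N\approx4\delta^2$, any $\delta$ noticeably smaller than $1/2$ makes $I-G$ positive definite with high probability.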
Moreover, we can put the functions $\varphi_k|_{(1,2)}$ in an arbitrary $N$-dimensional space (preserving the integrals $\displaystyle\int_1^2\varphi_k\varphi_l$), for example, in $D_N(1,2)$.
The corollary is proved.
3.4. Lacunary systems
Let $\lambda>1$. We say that a sequence of positive integers $k_1,\dots,k_N$ is $\lambda$-lacunary if $k_{j+1}/k_j\geqslant\lambda$ for $j=1,\dots,N-1$.
Statement 3.3. Let $\varphi$ be a Riemann-integrable Borel function on $\mathbb{T}$ such that $\mu\{x\in\mathbb{T}\colon \varphi(x)\ne a\}>0$ for each $a\in\mathbb{R}$. Then the following hold.
1. There exists $\lambda=\lambda(\varphi)>1$ such that for any $\lambda$-lacunary sequence $k_1,\dots,k_N$ we have
Theorem F (see [28], Theorem 4.3). Let $(k_j)$ be an increasing sequence of positive integers. Then there exists a sequence $(g_j)$ of measurable functions on $(0,1)$ which are independent and uniformly distributed over $(0,1)$ and such that
Proof of Statement 3.3. Let us start with the $L_1$-case. First of all, we can shift and renorm $\varphi$ to obtain $\displaystyle\int_\mathbb{T}\varphi(x)\,dx=0$ and $\displaystyle\int_\mathbb{T}|\varphi(x)|\,dx=1$ (this does not affect rigidity). Given a $\lambda$-lacunary sequence $k_1,\dots,k_N$ for some $\lambda>1$, we apply Theorem F to obtain functions $g_1,\dots,g_N$ satisfying $\|\{k_j x\}-g_j(x)\|_{L_0} \leqslant 2/\lambda$.
Put $f_j(x):=\varphi(g_j(x))$. Then the functions $f_j$ are independent, $\displaystyle\int f_j=0$ and $\displaystyle\int|f_j|=1$. Corollary 3.1 implies the rigidity of $\{f_j\}$. To prove the rigidity of $\{\varphi(k_jx)\}$ we have to bound $\|\varphi(k_jx)-f_j\|_1$:
where $A=\{x\colon |\{k_j x\}-g_j(x)|>2/\lambda\}$. The first integral is at most $2\|\varphi\|_\infty\mu(A) \leqslant 4\|\varphi\|_\infty/\lambda$. The second integral does not exceed
where $\tau$ is the averaged modulus of continuity. It is known that ${\lim_{h\to 0}\tau(\varphi,h)=0}$ for Riemann-integrable functions. Hence, for sufficiently large $\lambda$ the value of $\|\varphi(k_jx)-f_j(x)\|_1$ is sufficiently small. This proves the $L_1$-case.
In this subsection we outline another method for the proof of rigidity, which is based on the notion of complementability.
Let $f_1,\dots,f_N$ be an orthonormal system in $L_2(0,1)$. Suppose that we aim to prove the rigidity of $\{f_j\}$ in a larger space $X\supset L_2$. If there exists a bounded linear operator $\pi\colon X\to L_2$ that preserves the system: $\pi f_j=f_j$, $j=1,\dots,N$, then we obtain rigidity in $X$:
Proof. Let $X$ be a symmetric space. It is known that for the Rademacher chaos property (1) is equivalent to the embedding $X\supset H$ (see [29], Theorem 6.4), where the space $H$ is the closure of $L_\infty$ in the Orlicz space $L_{\varphi_1}$, $\varphi_1(t)=e^t-1$. A criterion for complementability is the double embedding $H\subset X\subset H'$ (see [29], Theorem 6.7). Thus, for $X=H'$ both conditions are satisfied. The space $H'$ coincides with the Lorentz space $\Lambda(\psi_1)$ of functions satisfying $\displaystyle\int_0^1 f^*(t)\log(2/t)\,dt<\infty$ (see [29], §§ 2.2 and 2.4). Further, this Lorentz space coincides with the space $L\log L$ of functions satisfying $\displaystyle\int_0^1 |f|\log(2+|f|)\,dt<\infty$ (see, for example, [30], Theorem D).
The corollary is proved.
§ 4. Rigidity in $L_p$, $1<p<2$
4.1. The $S_{p'}$-property
We start with a finite-dimensional result. Its proof follows the paper [31] by Gluskin.
Theorem 4.1. Let $1<p<2$ and $N\in\mathbb N$. Suppose that $\xi=(\xi_1,\dots,\xi_N)$ is an isotropic random vector in $\mathbb{R}^N$: $\mathsf{E}\xi_i^2=1$ and $\mathsf{E}\xi_i\xi_j=0$, $i\ne j$, that satisfies two conditions:
Remark 4.1. Note that if condition (18) is satisfied, then (17) also holds for $A:=B$ and $\gamma:=p'-2$; see the proof of Theorem 4.2. However, in some cases we need a dependence of the form $(\cdots)B^{-1}$ of the lower bound on the parameter $B$.
Let us formulate a corollary in function-theoretic terms.
Theorem 4.2. Let $\varphi_1,\dots,\varphi_N$ be an orthonormal system in a space $L_2(\mathcal X,\mu)$, $\mu(\mathcal X)=1$. Let $1<p<2$, and suppose that $\{\varphi_k\}$ has the $S_{p'}$-property, that is, for some $B\geqslant1$
Proof. Indeed, let $x$ be a random point in $\mathcal X$, and consider the random vector $\xi=(\varphi_1(x),\dots,\varphi_N(x))$. Condition (18) in Theorem 4.1 is exactly the $S_{p'}$-property. Let us check that condition (17) is fulfilled for $A:=B$ and $\gamma=p'-2$. Indeed,
Hence $d_n^{\mathrm{avg}}(\xi,\ell_p^N)_p$ is bounded below. It remains to use Statement 2.1.
The theorem is proved.
Corollary 4.1. Let $\varphi_1,\dots,\varphi_N$ be an orthonormal system in $L_2(\mathcal X,\mu)$, $\mu(\mathcal X)=1$, that is uniformly bounded: $\|\varphi_k\|_\infty\leqslant A$. Then for $1<p<2$ we have
Proof. Indeed, by a theorem of Bourgain [15] one can find a subsystem of size $\lfloor N^{2/p'}\rfloor$ of $\{\varphi_k\}$ with the $S_{p'}$-property and apply Corollary 3.5 to this subsystem.
The corollary is proved.
4.2. Additional remarks
Explicit rigid sets
Previous considerations allow us to present explicit constructions of $\ell_p$-rigid subsets of $\{-1,1\}^N$ of polynomial size.
Statement 4.1. For any $p\in(1,2)$ and any sufficiently large $N$ one can explicitly construct a set $V\subset\{-1,1\}^N$ of size $|V|\leqslant N^{C(p)}$ such that $d_{N/2}(V,\ell_p^N)\geqslant c(p) N^{1/p}$.
Proof. It is sufficient to construct $N$ characters $\chi_1,\dots,\chi_N\in \widehat{\mathbb Z_2^k}$ (with appropriate $k$) that form an $S_{p'}$-system. Then we apply Theorem 4.2: $d_{N/2}^\mathrm{avg}(\{\chi_1,\dots,\chi_N\},L_p)_p\geqslant c>0$. Hence one can take $V:=\{(\chi_1(x),\dots,\chi_N(x))\colon x\in \mathbb Z_2^k\}$.
In [32] an $S_{2m}$-system of $2^n$ characters in $\mathbb Z_2^{mn}$ was constructed, so it remains to take $2^n\approx N$, $m=\lceil p'/2\rceil$ and $k=mn$.
The statement is proved.
Independence for $p>2$
An analogue of Theorem 3.1 with normalization in $L_p$ also holds for $1<p\leqslant2$. We can easily deduce from Theorem 4.1 that an independent system of functions is rigid (with an additional log factor). However, this result will be the subject of another paper. The analogue of the theorem on the $L_1$-rigidity of independent functions is not true for $p>2$.
Statement 4.2. For any $p\in(2,\infty)$ there exists $\delta>0$ such that for all sufficiently large $N$ there exists a sequence $\xi_1,\dots,\xi_N$ of independent identically distributed random variables such that $\mathsf{E}\xi_i=0$, $\mathsf{E}|\xi_i|^p=1$ and
Proof. We use the approximation of the octahedron $B_1^N$ in $\ell_\infty^N$ (see [33]). Let $n=N^{1-\beta}$, where $\beta$ is to be chosen later, and let $Q_n^*$ be an extremal $n$-dimensional $\ell_\infty$-subspace:
where $K$ is defined by the condition $\mathsf{E}|\xi_1|^p=1$; so that $\varepsilon K^p=1$. Consider independent copies $\xi_2,\dots,\xi_N$ of $\xi_1$ and the vector $\xi=(\xi_1,\dots,\xi_N)$. We construct an approximation vector $\eta$ that lies in $Q_n^*$ (so that $\eta_1,\dots,\eta_N$ lie in an $n$-dimensional subspace of $L^p$).
Consider three events: $\mathcal{A}_0=\{\xi=0\}$, $\mathcal{A}_1$: exactly one coordinate of $\xi$ is nonzero, and $\mathcal{A}_{2}$: at least two coordinates of $\xi$ are nonzero. Set $\eta(\omega):=0$ for $\omega\in \mathcal{A}_0\cup \mathcal{A}_{2}$, and let $\eta(\omega)$ be the best approximation to $\xi(\omega)$ from $Q_n^*$ for $\omega\in \mathcal{A}_1$. In the latter case we have $\|\xi-\eta\|_\infty \leqslant K\rho(B_1^N,Q_n^*)_\infty$.
By symmetry it is sufficient to estimate $\mathsf{E}|\xi_1-\eta_1|^p$:
if $\beta$ is sufficiently small. The second term is bounded by $N^2\varepsilon^2K^p = N^2\varepsilon$, and we can make it small by letting $\varepsilon$ tend to zero.
The statement is proved.
§ 5. Approximation of Walsh and trigonometric functions
Recall some notation. Given a finite Abelian group $G$, we denote the group of characters $\chi\colon G\to\mathbb{C}$ by $\widehat{G}$. The probability measure $\mu_G$ on $G$ ($\mu_G(A)=|A|/|G|$) allows us to define the metrics $L_{\mathrm{H}}$ and $L_0$ for functions on $G$.
We are interested in two particular cases, the group $\mathbb Z_2^k$ and the cyclic group $\mathbb Z_N$.
In the first case characters are real and can be written as $W_x\colon y\mapsto (-1)^{\langle x,y\rangle}$, $x,y\in\mathbb Z_2^k$. We can identify these characters with functions in the well-known orthonormal Walsh system (in the Paley numeration) $w_0\equiv1,w_1,w_2,\dots$ . Indeed, for $n<2^k$ we have $w_n\bigl(\sum_{j\geqslant 1}y_j2^{-j}\bigr)=W_x(y)$, where $x=(x_1,\dots,x_k)\in\{0,1\}^k$, $y=(y_1,\dots,y_k)\in\{0,1\}^k$ and $n=\sum_{j=1}^kx_j2^{j-1}$.
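As a quick numerical illustration (a sketch, not part of the paper), one can build the $2^k\times 2^k$ matrix of characters $W_x(y)=(-1)^{\langle x,y\rangle}$ and verify its orthogonality; the bit ordering below is one convenient choice and need not match the Paley numeration exactly:

```python
import numpy as np
from itertools import product

def character_matrix(k: int) -> np.ndarray:
    """Matrix W[x, y] = (-1)^{<x, y>} over x, y in {0,1}^k.

    Rows are the characters of Z_2^k; up to the ordering of the indices
    these are the first 2^k Walsh functions."""
    pts = np.array(list(product([0, 1], repeat=k)))
    return (-1.0) ** (pts @ pts.T)
```

For instance, `character_matrix(3)` is an $8\times8$ orthogonal (up to the factor $2^{k}$) $\pm1$ matrix whose first row is the constant character $w_0\equiv1$.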
The second case leads, of course, to the discrete trigonometric system $y\mapsto e(xy/N)$, where $e(\theta):=\exp(2\pi i\theta)$.
5.1. Kolmogorov widths
Recall the following well-known statement (see, for instance, [34], § 2.1).
Lemma 5.1. Let $G$ be a finite Abelian group, and let $A,B\subset G$ and $|A|+|B|>|G|$. Then $A+B=\{a+b\colon a\in A,\,b\in B\} = G$.
We use an immediate consequence of this lemma.
Lemma 5.2. Let $A,B\subset\widehat{G}$ and $|A|+|B|>|G|$. Then for any $m,n\in\mathbb N$
Our goal is to approximate $\{W_x\}$ in $L_{\mathrm{H}}$. Let $\mathcal X$ be the set of $x$ such that $\bigl|\sum_{i=1}^k x_i- k/2\bigr|\leqslant\sqrt{k}$. Hoeffding’s inequality shows that $|\mathcal X|>2^{k-1}$, so it is enough to approximate $W_x$ for $x\in\mathcal X$ (and then use Lemma 5.2).
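The bound $|\mathcal X|>2^{k-1}$ can also be verified by direct counting; here is a small sketch (an exact binomial count rather than Hoeffding's inequality):

```python
from math import comb, sqrt

def central_fraction(k: int) -> float:
    """Exact fraction of x in {0,1}^k with |sum(x) - k/2| <= sqrt(k)."""
    good = sum(comb(k, s) for s in range(k + 1) if abs(s - k / 2) <= sqrt(k))
    return good / 2 ** k

# Hoeffding gives fraction >= 1 - 2exp(-2) > 1/2; the exact count agrees.
```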
Fix $x\in\mathcal X$. Let $I:=\{i\colon x_i=1\}$ and
We consider the polynomial $q(t)$ that interpolates $(-1)^t$ for $t\in\mathbb Z\cap[k/4-2\lambda\sqrt{k}, k/4+2\lambda\sqrt{k}]$. Its degree $d$ is at most $4\lambda\sqrt{k}$; note that $4\lambda\sqrt{k}\leqslant k$. Then $W_x(y)=q(\langle x,y\rangle)$ for all $y\in Y_I$, hence
Let us bound the linear dimension of the family of functions $\{q(\langle x,\,\cdot\,\rangle)\}_x$ (it is sufficient to consider $x\in\mathcal X$, but it is more convenient to work with all $x\in\{0,1\}^k$). Their dimension is equal to the rank of the matrix $(q(\langle x,y\rangle))_{x,y\in\{0,1\}^k}$, where $x$ indexes rows and $y$ indexes columns. The rank is bounded by the number of monomials in the polynomial $q(x_1y_1+\dots+x_ky_k)$, because each monomial generates a rank-one matrix. Every monomial is a product of several $x_iy_i$, so we can bound their total number:
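For small parameters this rank bound can be checked directly. The sketch below (with an arbitrary random polynomial $q$ of degree $d$, for illustration only) compares the rank of $(q(\langle x,y\rangle))_{x,y}$ with the monomial count $\sum_{j\leqslant d}\binom{k}{j}$:

```python
import numpy as np
from itertools import product
from math import comb

def rank_and_bound(k: int, d: int, seed: int = 0):
    """Rank of (q(<x,y>))_{x,y in {0,1}^k} for a random degree-d polynomial q,
    together with the monomial-count bound sum_{j<=d} C(k, j)."""
    rng = np.random.default_rng(seed)
    coeffs = rng.integers(1, 10, size=d + 1).astype(float)
    pts = np.array(list(product([0, 1], repeat=k)), dtype=float)
    inner = pts @ pts.T                      # all inner products <x, y>
    M = sum(c * inner ** j for j, c in enumerate(coeffs))
    bound = sum(comb(k, j) for j in range(d + 1))
    return int(np.linalg.matrix_rank(M)), bound
```

For $k=6$, $d=2$ the matrix is $64\times64$, while the bound is $1+6+15=22$.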
Proof. We have to approximate characters $y\mapsto e(xy/2^k)$. We write $x=\sum_{i=0}^{k-1}x_i2^i$ and $y=\sum_{j=0}^{k-1}y_j2^j$ to identify $x,y\in\mathbb Z_{2^k}$ with the binary vectors $(x_0,\dots,x_{k-1})$ and $(y_0,\dots,y_{k-1})$. Using the 1-periodicity of $e(\,\cdot\,)$ we obtain
Hence it is sufficient to approximate the functions $\psi_x$.
As in the previous proof, first we deal with the set $\mathcal X:=\bigl\{x\colon \bigl|\sum_{i=0}^{k-1}x_i-k/2\bigr|\leqslant\sqrt{k}\,\bigr\}$. Fix $x\in\mathcal X$.
We use interpolation again. There exists a polynomial $q_s$ of degree $d\leqslant 4\lambda\sqrt{k}$ that interpolates the function $t\mapsto e(2^{-s}t)$ at the points $\mathbb Z\cap [k/4-2\lambda\sqrt{k},k/4+2\lambda\sqrt{k}]$. Then the function $e\bigl(2^{-s}\sum_{i+j=k-s}x_iy_j\bigr)$ is approximated by $q_s(\langle x,y\rangle)$ and the error in the Hamming metric is at most $2\exp(-2\lambda^2)$; moreover, $\dim\{q_s(\langle x,\,\cdot\,\rangle)\}_x \leqslant (ek/d)^d \leqslant (\sqrt{k}/\lambda)^{4\lambda\sqrt{k}}$.
The function that we wish to approximate is a product:
We approximate it by the product of the corresponding polynomials. The error in the Hamming metric is at most $2s_0\exp(-2\lambda^2)$. The rank of the product is at most the product of the ranks, that is, $(\sqrt{k}/\lambda)^{4s_0\lambda\sqrt{k}}$.
We use Lemma 5.2 to approximate $\psi_x$ in $L_{\mathrm{H}}$ for all $x$. Finally, we use (22) to obtain an $L_0$-approximation of the characters with accuracy $2\pi k 2^{-s_0} + 4s_0\exp(-2\lambda^2)$ by using dimension $\leqslant (k/\lambda^2)^{4s_0\lambda\sqrt{k}}$. It remains to take $s_0\asymp\lambda^2$.
To approximate continuous functions, we first discretize them by piecewise constant functions on a sufficiently fine partition into $2^k$ subintervals, and then approximate the resulting matrix (of the discrete Fourier transform) using the above result.
5.2. Trigonometric widths in $L_p$, $p<1$
Recall that the trigonometric widths $d_n^{\mathrm{T}}$ correspond to approximation by subspaces spanned by functions in the trigonometric system.
We consider two cases in parallel: the continuous case with harmonics $e(kx)$ on $\mathbb{T}$ and the discrete case with harmonics $e(kx/N)$ on $\mathbb Z_N$. We use the shorthand $L_p^N$ for the space $L_p(\mathbb Z_N)$ with the usual (quasi)norm
\begin{equation*}
|\Lambda|\leqslant C N^{1-1/(2m)}\log^{1/m}N.
\end{equation*}
We prove Theorem 5.3. We consider only the discrete case; the continuous case is quite analogous.
Proof of Theorem 5.3. We fix some number $m$ and use Lemma 5.4 to find a set $\Lambda$ and Lemma 5.3 to find a polynomial $T\in\mathcal T_m^N$. In order to approximate $e(kx/N)$ by the space $\mathcal T(\Lambda)$ of trigonometric polynomials with spectrum in $\Lambda$ we find $h\in\mathbb Z_N^*$ such that $k\pm h,\dots,k\pm mh\in\Lambda$. Consider the polynomial
It equals $e(kx/N)+s(x)$, where $\operatorname{supp}\widehat{s}\subset\Lambda$, so we have to estimate $\|t\|_{L_p^N}$. Note that $t(x)=e(kx/N)T_m(hx)$ and
Proof of Lemma 5.4. We start with the discrete case. Let $\Lambda\subset\mathbb Z_N$ be a random set: a point occurs in $\Lambda$ independently of the other points with some probability $\tau\in(0,1)$.
We call a step $h\in\mathbb Z_N^*$ admissible for $k\in \mathbb Z_N$ if $k\pm lh\in\Lambda$ for $l=1,\dots,m$. We have
\begin{equation*}
\mathsf{P}(h \text{ is admissible for }k)=\tau^{2m}.
\end{equation*}
We call $k$ ‘bad’ if there are no admissible steps for $k$. Assume that for each $k$ there are at least $S$ steps $h_1,\dots,h_S\in\mathbb Z_N^*$ such that the progressions $k\pm lh_i$, $l=1,\dots,m$, are disjoint (we estimate $S$ slightly below). Then the events that $h_i$ is admissible for $k$ are independent for different $i$. Hence the probability that $k$ is bad is at most the probability that none of the $h_i$ is admissible, that is, $(1-\tau^{2m})^S$.
The expected number of bad $k$ is at most $N(1-\tau^{2m})^S < N\exp(-S\tau^{2m})$ so with probability $\geqslant2/3$ this number does not exceed $3N\exp(-S\tau^{2m})$. Moreover, $\mathsf{E}|\Lambda|=N\tau$, so with probability $\geqslant2/3$ we have $|\Lambda|\leqslant 3N\tau$. Hence there exists $\Lambda$ such that both estimates are true. If $\tau$ is sufficiently small to ensure that
then there are no bad $k$ and $\Lambda$ is the required set.
It remains to estimate $S$ from below. There are $|\mathbb Z_N^*|\gg N/\log N$ possible steps in total. A step $h_1$ forbids the other steps $h$ that solve one of the $2m^2$ equations $lh \equiv \pm l_1h_1\pmod{N}$, $l,l_1\in\{1,\dots,m\}$. Each equation has at most $\mathrm{gcd}(N,l)\leqslant m$ solutions, so we can take
\begin{equation*}
S \asymp m^{-3}\frac{N}{\log N}.
\end{equation*}
The continuous case is analogous. Now, $\Lambda$ is a random subset of $\{-2N,\dots,2N\}$ and we consider $k\in\{-N,\dots,N\}$. To ensure that the progressions $\{k\pm lh, {l=1,\dots,m}\}$ are disjoint we take prime numbers $h$ in the interval $(m,N/m]$; hence $S \asymp m^{-1}N/\log N$.
The lemma is proved.
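The first-moment argument in the discrete case is easy to test numerically. The following toy sketch (parameters chosen for illustration only) samples a random set $\Lambda\subset\mathbb Z_N$ with density $\tau$ and counts the bad residues $k$:

```python
import random
from math import gcd

def sample_and_count_bad(N: int, m: int, tau: float, seed: int = 0):
    """Sample Lambda in Z_N with density tau; count k in Z_N having no
    step h coprime to N with k +- l*h in Lambda for all l = 1..m."""
    rng = random.Random(seed)
    lam = {x for x in range(N) if rng.random() < tau}
    steps = [h for h in range(1, N) if gcd(h, N) == 1]

    def admissible(k, h):
        return all((k + l * h) % N in lam and (k - l * h) % N in lam
                   for l in range(1, m + 1))

    bad = sum(1 for k in range(N)
              if not any(admissible(k, h) for h in steps))
    return len(lam), bad
```

For a prime $N$ and density well above the threshold, typically no bad residues at all remain, in line with the expectation bound $N\exp(-S\tau^{2m})$.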
§ 6. Further work
In the most interesting case of the $L_1$-norm there is much more to do.
Question 6.1. Is the trigonometric system $\{e(kx)\}_{1\leqslant k\leqslant N}$ rigid in $L_1$?
It is obvious that $d_n(\{\varphi_1,\dots,\varphi_N\},L_1)\geqslant A^{-1} d_n(B_1^N,\ell_\infty^N)$ for a uniformly bounded ($\|\varphi_k\|_\infty\leqslant A$) orthonormal system. So good approximation is impossible for $n\leqslant c\log N$. Can one improve this bound?
Question 6.2. Prove that $d_n(\{w_1,\dots,w_N\},L_1)\geqslant c$ for $n=n(N)$ such that $n/\log N\to\infty$.
The independence (or unconditionality) is sufficient for $L_1$-rigidity but this condition is, of course, very restrictive. Can one find a weaker condition or at least prove the rigidity of some concrete system?
Question 6.3. Is the Rademacher chaos $\{r_i(t)r_j(t)\}_{1\leqslant i,j\leqslant N}$ rigid in $L_1$?
Cf. Statement 3.4. This question is connected with the next one (also see Statement 4.1).
Question 6.4. Present explicit constructions of $\ell_1$-rigid sets $W\subset \{-1,1\}^N$ of polynomial (or at least subexponential) size.
In [16] the non-rigidity of DFT matrices was connected with the non-rigidity of circulants. In terms of widths we obtain the following question.
Question 6.5. Let $u\in[-1,1]^N$, and let $T$ be a cyclic coordinate shift operator. Is it true that
1.
A. Kolmogoroff (Kolmogorov), “Über die beste Annäherung von Funktionen einer gegebenen Funktionenklasse”, Ann. of Math. (2), 37:1 (1936), 107–110
2.
G. G. Lorentz, M. von Golitschek and Y. Makovoz, Constructive approximation. Advanced problems, Grundlehren Math. Wiss., 304, Springer-Verlag, Berlin, 1996, xii+649 pp.
3.
A. Pinkus, $n$-widths in approximation theory, Ergeb. Math. Grenzgeb. (3), 7, Springer-Verlag, Berlin, 1985, x+291 pp.
4.
Dinh Dũng, V. Temlyakov and T. Ullrich, Hyperbolic cross approximation, Adv. Courses Math. CRM Barcelona, Birkhäuser/Springer, Cham, 2018, xi+218 pp.
5.
V. M. Tikhomirov, “Approximation theory”, Analysis, v. II, Encyclopaedia Math. Sci., 14, Convex analysis and approximation theory, Springer-Verlag, Berlin, 1990, 93–243
6.
Y. Malykhin, “Matrix and tensor rigidity and $L_p$-approximation”, J. Complexity, 72 (2022), 101651, 13 pp.
7.
S. V. Lokam, “Complexity lower bounds using linear algebra”, Found. Trends Theor. Comput. Sci., 4:1–2 (2008), 1–155
8.
J. Alman and R. Williams, “Probabilistic rank and matrix rigidity”, STOC'17 Proceedings of the 49th annual ACM SIGACT symposium on theory of computing (Montreal, QC 2017), ACM, New York, 2017, 641–652
9.
S. M. Voronin and N. Temirgaliev, “On some applications of Banach measure”, Izv. Akad. Nauk Kaz. SSR, Ser. Fiz.-Mat., 1984, no. 5, 8–11 (Russian)
10.
V. E. Maĭorov, “Kolmogorov's $(n,\delta)$-widths of spaces of smooth functions”, Russian Acad. Sci. Sb. Math., 79:2 (1994), 265–279
11.
J. Creutzig, “Relations between classical, average, and probabilistic Kolmogorov widths”, J. Complexity, 18:1 (2002), 287–303
12.
B. S. Kashin, “Lower bounds for $m$-term approximations in the metric of the discrete space $L_n^0$”, Russian Math. Surveys, 76:5 (2021), 933–935
13.
V. F. Gaposhkin, “Lacunary series and independent functions”, Russian Math. Surveys, 21:6 (1966), 1–82
14.
B. S. Kashin, Yu. V. Malykhin and K. S. Ryutin, “Kolmogorov width and approximate rank”, Proc. Steklov Inst. Math., 303 (2018), 140–153
15.
J. Bourgain, “Bounded orthogonal systems and the $\Lambda(p)$-set problem”, Acta Math., 162:3–4 (1989), 227–245
16.
Z. Dvir and A. Liu, “Fourier and circulant matrices are not rigid”, 34th computational complexity conference, LIPIcs. Leibniz Int. Proc. Inform., 137, Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2019, 17, 23 pp.
17.
V. I. Ivanov and V. A. Yudin, “Trigonometric system in $L_p$, $0<p<1$”, Math. Notes, 28:6 (1980), 884–889
18.
T. Oikhberg and M. I. Ostrovskii, “Dependence of Kolmogorov widths on the ambient space”, Zh. Mat. Fiz. Anal. Geom., 9:1 (2013), 25–50
19.
E. Novak and H. Woźniakowski, Tractability of multivariate problems, v. I, EMS Tracts Math., 6, Linear information, Eur. Math. Soc., Zürich, 2008, xii+384 pp.
20.
R. S. Ismagilov, “On $n$-dimensional diameters of compacts in a Hilbert space”, Funct. Anal. Appl., 2:2 (1968), 125–132
21.
N. N. Vakhania, V. I. Tarieladze and S. A. Chobanyan, Probability distributions on Banach spaces, Math. Appl. (Soviet Ser.), 14, D. Reidel Publishing Co., Dordrecht, 1987, xxvi+482 pp.
22.
R. Vershynin, High-dimensional probability. An introduction with applications in data science, Camb. Ser. Stat. Probab. Math., 47, Cambridge Univ. Press, Cambridge, 2018, xiv+284 pp.
23.
J. D. Vaaler, “A geometric inequality with applications to linear forms”, Pacific J. Math., 83:2 (1979), 543–553
24.
S. Mendelson and R. Vershynin, “Entropy and the combinatorial dimension”, Invent. Math., 152:1 (2003), 37–55
25.
N. Alon, P. Frankl and V. Rödl, “Geometrical realization of set systems and probabilistic communication complexity”, SFCS'85 Proceedings of the 26th annual symposium on foundations of computer science (Portland, OR 1985), IEEE Computer Soc., Washington, DC, 1985, 277–280
26.
N. Alon, S. Moran and A. Yehudayoff, “Sign rank versus VC dimension”, 29th annual conference on learning theory, Proceedings of Machine Learning Research (PMLR), 49, 2016, 47–80, https://proceedings.mlr.press/v49/alon16.html
27.
B. S. Kashin and A. A. Saakyan, Orthogonal series, Transl. Math. Monogr., 75, Amer. Math. Soc., Providence, RI, 1989, xii+451 pp.
28.
I. Berkes, “On the uniform theory of lacunary series”, Number theory — Diophantine problems, uniform distribution and applications, Springer, Cham, 2017, 137–167
29.
S. V. Astashkin, The Rademacher system in function spaces, Birkhäuser/Springer, Cham, 2020, xx+559 pp.
30.
C. Bennett and K. Rudnick, On Lorentz–Zygmund spaces, Dissertationes Math. (Rozprawy Mat.), 175, PWN, Warszawa, 1980, 67 pp.
31.
E. D. Gluskin, “Intersections of a cube with an octahedron are poorly approximated by subspaces of small dimension”, Approximation of functions by special classes of operators, Interuniv. Collect. Sci. Works, Min. pros. RSFSR, Vologod. Gos. Ped. Inst., Vologda, 1987, 35–41 (Russian)
32.
D. J. Hajela, “Construction techniques for some thin sets in duals of compact abelian groups”, Ann. Inst. Fourier (Grenoble), 36:3 (1986), 137–166
33.
B. S. Kashin, “The diameters of octahedra”, Uspekhi Mat. Nauk, 30:4(184) (1975), 251–252 (Russian)
34.
Terence Tao and Van H. Vu, Additive combinatorics, Cambridge Stud. Adv. Math., 105, Reprint of the 2006 ed., Cambridge Univ. Press, Cambridge, 2010, xviii+512 pp.
Citation:
Yu. V. Malykhin, “Widths and rigidity”, Mat. Sb., 215:4 (2024), 117–148; Sb. Math., 215:4 (2024), 543–571