Short Communications
On sub-gaussian concentration of missing mass
M. Skorski University of Luxembourg, Luxembourg
Abstract:
Statistical inference on the missing mass aims to estimate the total probability weight of elements not observed during sampling. Since the pioneering work of Good and Turing, the problem has been studied in many areas, including statistical linguistics, ecology, and machine learning.
Proving the sub-Gaussian behavior of the missing mass has been notoriously hard, and a number of complicated arguments have been proposed: logarithmic Sobolev inequalities, thermodynamic approaches, and information-theoretic transportation methods. Prior works have argued that the difficulty is inherent, and classical tools are inadequate.
We show that this common belief is false, and all that we need to establish the sub-Gaussian concentration is the classical inequality of Bernstein. The strong educational value of our work is in its demonstration of this inequality in its full generality, an aspect not well recognized by researchers.
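As a small illustration of the quantity under study (not taken from the paper), the sketch below computes the missing mass of a sample from a known discrete distribution, together with the Good-Turing estimate, which uses the fraction of the sample made up of symbols seen exactly once. The function names `missing_mass` and `good_turing_estimate` and the toy distribution are assumptions chosen for the example.

```python
from collections import Counter


def missing_mass(probs, sample):
    """Total probability of the symbols that never appear in the sample."""
    seen = set(sample)
    return sum(p for symbol, p in probs.items() if symbol not in seen)


def good_turing_estimate(sample):
    """Good-Turing estimate: (number of symbols seen exactly once) / (sample size)."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


# Hypothetical toy distribution and sample, for illustration only.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
sample = ["a", "b", "a", "a"]

true_missing = missing_mass(probs, sample)   # "c" and "d" are unseen
estimate = good_turing_estimate(sample)      # only "b" is seen exactly once
print(true_missing, estimate)
```

In practice the distribution is unknown, so only the estimate is computable; concentration results of the kind described above quantify how tightly the missing mass clusters around its mean.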
Keywords:
missing mass, measure concentration, heterogenic Bernstein's inequality, sub-Gamma concentration.
Received: 30.01.2021 Revised: 02.05.2022 Accepted: 10.06.2022
Citation:
M. Skorski, “On sub-gaussian concentration of missing mass”, Teor. Veroyatnost. i Primenen., 68:2 (2023), 393–400; Theory Probab. Appl., 68:2 (2023), 324–329
Linking options:
https://www.mathnet.ru/eng/tvp5478
https://doi.org/10.4213/tvp5478
https://www.mathnet.ru/eng/tvp/v68/i2/p393