|
COMPUTER SOFTWARE AND COMPUTING EQUIPMENT
Algorithm for fast computation of shannon's entropy on small samples for long biometrics codes with essentially dependent digits
V. I. Volchikhin, A. I. Ivanov, A. P. Alekseiy Penza State University, Penza, Russia
Abstract:
Estimating the entropy of long codes according to Shannon is a problem of exponential computational complexity as the length of the binary code sequence increases. At the same time, the sample size on which it is necessary to estimate the probabilities of the occurrence of rare events is rapidly growing. The purpose of the article is to simplify calculations by moving to logarithmic spaces of the probabilities of occurrence of rare events and the logarithm of the length of the analyzed code. It is showed that in the space of double logarithms for codes with dependent digits, linear extrapolation works well. This ultimately makes it possible to speed up the assessment of the entropy of long codes by eliminating the need to process large amounts of initial data using the Shannon formula. Using the classical Shannon formula, it is necessary to estimate the probability of the occurrence of certain code states, which leads to the complication of the task as the code length increases. You have to wait for rare events to appear. It is proposed to simplify the problem by symmetrizing the correlation matrix of the analyzed codes. The original asymmetric correlation matrix of arbitrary codes is replaced with an equivalent one, in which all correlation coefficients outside the diagonal are positive and identical. The coefficients of the symmetrized matrix are obtained by averaging the modules of the correlation coefficients of the original asymmetric matrix. Estimating correlation coefficients is a task that has quadratic computational complexity. That is, the entropy estimate using the proposed algorithm, instead of exponential computational complexity, has quadratic computational complexity. A nomogram is presented for the connection between the logarithm of the probability of errors of the second type (equivalent to Shannon entropy) and a long code sequence of correlation coupling coefficients of its digits $r = \{0.0, 0.2, 0.3, \dots, 0.8\}$. The algorithm is efficient on small samples of $16$ to $32$ examples of analyzed code sequences.
Keywords:
entropy estimation, Hamming entropy, Shannon entropy, small samples, symmetrization of correlations, asymmetric correlation matrix, symmetrized matrix.
Received: 24.07.2024 Accepted: 18.10.2024
Citation:
V. I. Volchikhin, A. I. Ivanov, A. P. Alekseiy, “Algorithm for fast computation of shannon's entropy on small samples for long biometrics codes with essentially dependent digits”, Vestn. Astrakhan State Technical Univ. Ser. Management, Computer Sciences and Informatics, 2024, no. 4, 27–34
Linking options:
https://www.mathnet.ru/eng/vagtu821 https://www.mathnet.ru/eng/vagtu/y2024/i4/p27
|
|