This article is cited in 2 scientific papers (total in 2 papers)
Generalized statistical method of text analysis based on calculation of probability distributions of statistical values
A. K. Melnikov (a), A. F. Ronzhin (b)
(a) STC CLSC "InformInvestGroup"; 125, Bld. 17 Varshavskoye Shosse, Moscow 117587, Russian Federation
(b) S.A. Lebedev Institute of Precision Mechanics and Computer Engineering of the Russian Academy of Sciences, 51 Leninsky Prosp., Moscow 119991, Russian Federation
Abstract:
Many data streams are a mixture of random and unique data. One property of unique data is that the probability of encountering a value is distributed nonuniformly over the set of possible values. A two-step procedure is used to distinguish unique data. In the first step, candidate selection, a criterion of agreement with the uniform distribution (a goodness-of-fit test) is applied. In the second step, a resource-intensive calculation under uncertainty is performed to check the other attributes of unique data for each candidate. The size of the criterion is chosen according to the amount of resources allocated to the second step. The accuracy of the distribution calculation determines the overhead that the second step spends on processing random data and, therefore, the share of unique data that is lost. The paper analyzes the boundary values of the parameters for which the exact distribution can be calculated at the current level of computer technology. A generalized statistical method of text analysis, applicable to a wide range of text parameters, is developed.
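The first step can be illustrated with a minimal sketch. The abstract does not name the statistic, so Pearson's chi-square statistic is assumed here purely for illustration, and the names `compositions`, `exact_chi2_distribution`, and `critical_value` are hypothetical. The sketch enumerates every vector of symbol counts for a sample of `n` symbols over `k` equiprobable values, accumulates the exact distribution of the statistic, and picks the threshold that keeps the false-alarm probability at or below a chosen size `alpha`.

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial


def compositions(n, k):
    """Yield every tuple of k non-negative integer counts that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def exact_chi2_distribution(n, k):
    """Exact distribution of Pearson's statistic for n symbols over k
    equiprobable values (assumed statistic, not taken from the paper)."""
    dist = defaultdict(Fraction)
    total = Fraction(k) ** n  # number of equally likely sequences
    for counts in compositions(n, k):
        # Multinomial coefficient: sequences that produce these counts.
        ways = factorial(n)
        for c in counts:
            ways //= factorial(c)
        # sum (c - n/k)^2 / (n/k) simplifies to (k/n) * sum c^2 - n.
        stat = Fraction(k, n) * sum(c * c for c in counts) - n
        dist[stat] += Fraction(ways) / total
    return dict(dist)


def critical_value(dist, alpha):
    """Return (t, p): the smallest threshold t with P(stat >= t) = p <= alpha.
    Returns (None, 0) if even the largest value is too probable."""
    tail = Fraction(0)
    threshold = None
    for stat in sorted(dist, reverse=True):
        if tail + dist[stat] > alpha:
            break
        tail += dist[stat]
        threshold = stat
    return threshold, tail


if __name__ == "__main__":
    dist = exact_chi2_distribution(n=4, k=2)
    print(critical_value(dist, Fraction(1, 5)))
```

The enumeration visits C(n + k - 1, k - 1) count vectors, so its cost grows quickly with the sample length and the alphabet size; this is one way to see why the exact distribution is computable only within bounded parameter values.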
Keywords:
probability; exact distribution; limit distribution; statistics; criterion; frequency; algorithm complexity; performance of multiprocessor computer system; analysis method.
Received: 02.02.2016
Citation:
A. K. Melnikov, A. F. Ronzhin, “Generalized statistical method of text analysis based on calculation of probability distributions of statistical values”, Inform. Primen., 10:4 (2016), 89–95
Linking options:
https://www.mathnet.ru/eng/ia448 https://www.mathnet.ru/eng/ia/v10/i4/p89