Computer Optics
Computer Optics, 2021, Volume 45, Issue 2, Pages 253–260
DOI: https://doi.org/10.18287/2412-6179-CO-801
(Mi co904)
 

This article is cited in 7 scientific papers (total in 7 papers)

IMAGE PROCESSING, PATTERN RECOGNITION

A nonparametric algorithm for automatic classification of large multivariate statistical data sets and its application

I. V. Zenkov (a, b), A. V. Lapko (c, d), A. L. Vasily (c, d), S. T. Im (a, e, d), V. P. Tuboltsev (d), V. L. Avdeenok (d)

a Siberian Federal University, 660041, Krasnoyarsk, Russia, Svobodny Av. 79
b Krasnoyarsk Branch of the Federal Research Center for Information and Computational Technologies, 660049, Krasnoyarsk, Russia, Mira Av. 53
c Institute of Computational Modelling SB RAS, 660036, Krasnoyarsk, Russia, Akademgorodok 50
d Reshetnev Siberian State University of Science and Technology, 660037, Krasnoyarsk, Russia, Krasnoyarsky Rabochy Av. 31
e Sukachev Institute of Forest SB RAS, 660036, Krasnoyarsk, Russia, Akademgorodok 50
Abstract: A nonparametric algorithm for the automatic classification of large statistical data sets is proposed. The algorithm relies on a procedure for the optimal discretization of the range of values of a random variable. A class is defined as a compact group of observations of a random variable that corresponds to a unimodal fragment of the probability density. The algorithm "compresses" the initial information by decomposing the multidimensional attribute space. As a result, a large statistical sample is transformed into a data array composed of the centers of multidimensional sampling intervals and the corresponding frequencies with which observations fall into them. To substantiate the optimal discretization procedure, results on the asymptotic properties of a kernel-type regression estimate of the probability density are used. The optimal number of sampling intervals for the range of values of one- and two-dimensional random variables is determined from the condition of minimum root-mean-square deviation of the regression probability density estimate. These results are generalized to the discretization of the range of values of a multidimensional random variable. The optimal discretization formula contains a component characterized by a nonlinear functional of the probability density. An analytical dependence of this component on the antikurtosis coefficient of a one-dimensional random variable is established. For independent components of a multidimensional random variable, a methodology is developed for estimating the optimal number of sampling intervals and their lengths. On this basis, a nonparametric algorithm for automatic classification is developed. It is based on a sequential procedure that checks the proximity of the centers of multidimensional sampling intervals and the relationships between the frequencies with which observations from the original sample fall into these intervals. To further increase the computational efficiency of the proposed algorithm, its software implementation is multithreaded. The practical significance of the developed algorithms is confirmed by the results of their application to the processing of remote sensing data.
Keywords: automatic classification algorithm, multidimensional histogram, regression probability density estimate, discretization of the range of values of a random variable, large samples, antikurtosis coefficient, remote sensing data.
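The pipeline outlined in the abstract (compress the sample into a multidimensional histogram, then group neighbouring cells into unimodal classes) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (antikurtosis, compress, cell_centers, classify_cells) are hypothetical. The number of intervals per dimension is passed in by hand, because the optimal discretization formula is not given on this page. The antikurtosis coefficient is computed under the commonly used definition, one over the square root of the kurtosis, which is an assumption here. The grouping step is a simple grid-based mode-seeking rule swapped in for the paper's sequential procedure.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]
Point = Sequence[float]


def antikurtosis(xs: Sequence[float]) -> float:
    """Assumed definition: 1 / sqrt(kurtosis) = m2 / sqrt(m4) (central moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m2 / math.sqrt(m4)


def compress(points: Sequence[Point], bins: Sequence[int]):
    """Replace the sample by occupied grid cells and their frequencies."""
    dims = len(bins)
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    # Interval length per dimension; fall back to 1.0 for a degenerate range.
    widths = [((highs[d] - lows[d]) / bins[d]) or 1.0 for d in range(dims)]
    counts: Dict[Cell, int] = Counter()
    for p in points:
        # The maximum value is clamped into the last interval.
        cell = tuple(
            min(int((p[d] - lows[d]) / widths[d]), bins[d] - 1)
            for d in range(dims)
        )
        counts[cell] += 1
    return counts, lows, widths


def cell_centers(counts: Dict[Cell, int], lows: List[float],
                 widths: List[float]) -> Dict[Cell, Tuple[float, ...]]:
    """Centers of the occupied multidimensional sampling intervals."""
    return {
        cell: tuple(lows[d] + (i + 0.5) * widths[d] for d, i in enumerate(cell))
        for cell in counts
    }


def classify_cells(counts: Dict[Cell, int]) -> Dict[Cell, int]:
    """Illustrative mode seeking on the grid (not the paper's exact rule).

    Cells are visited from most to least populated. A cell joins the class
    of its most populated, already labelled adjacent cell; a cell with no
    labelled neighbour is a local mode and starts a new class.
    """
    labels: Dict[Cell, int] = {}
    next_label = 0
    for cell in sorted(counts, key=lambda c: (-counts[c], c)):
        best = None
        for offset in product((-1, 0, 1), repeat=len(cell)):
            if not any(offset):
                continue  # skip the cell itself
            nb = tuple(c + o for c, o in zip(cell, offset))
            if nb in labels and (best is None or counts[nb] > counts[best]):
                best = nb
        if best is None:
            labels[cell] = next_label
            next_label += 1
        else:
            labels[cell] = labels[best]
    return labels


if __name__ == "__main__":
    # Two well-separated groups of repeated 2-D observations (made-up data).
    sample = ([(0.0, 0.0)] + [(1.5, 1.5)] * 6 + [(2.5, 1.5)] * 3
              + [(8.5, 8.5)] * 5 + [(7.5, 8.5)] * 2 + [(10.0, 10.0)])
    counts, lows, widths = compress(sample, bins=(10, 10))
    labels = classify_cells(counts)
    print(len(sample), "observations ->", len(counts), "cells,",
          len(set(labels.values())), "classes")
```

The classification step touches only occupied cells, so its cost depends on the number of non-empty intervals and not on the sample size, which is the point of the compression. On noisy data a sparse histogram produces spurious local modes, which is why the choice of the number of intervals matters.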
Funding agency: Russian Foundation for Basic Research
Grant number: 20-41-240001 a
The research was funded by RFBR, Krasnoyarsk Territory and Krasnoyarsk Regional Fund of Science, project number 20-41-240001.
Received: 21.08.2020
Accepted: 03.12.2020
Document Type: Article
Language: Russian
Citation: I. V. Zenkov, A. V. Lapko, A. L. Vasily, S. T. Im, V. P. Tuboltsev, V. L. Avdeenok, “A nonparametric algorithm for automatic classification of large multivariate statistical data sets and its application”, Computer Optics, 45:2 (2021), 253–260
Citation in AMSBIB format
\Bibitem{ZenLapVas21}
\by I.~V.~Zenkov, A.~V.~Lapko, A.~L.~Vasily, S.~T.~Im, V.~P.~Tuboltsev, V.~L.~Avdeenok
\paper A nonparametric algorithm for automatic classification of large multivariate statistical data sets and its application
\jour Computer Optics
\yr 2021
\vol 45
\issue 2
\pages 253--260
\mathnet{http://mi.mathnet.ru/co904}
\crossref{https://doi.org/10.18287/2412-6179-CO-801}
Linking options:
  • https://www.mathnet.ru/eng/co904
  • https://www.mathnet.ru/eng/co/v45/i2/p253