Loading [MathJax]/jax/output/CommonHTML/jax.js
Matematicheskoe modelirovanie
RUS  ENG    JOURNALS   PEOPLE   ORGANISATIONS   CONFERENCES   SEMINARS   VIDEO LIBRARY   PACKAGE AMSBIB  
General information
Latest issue
Archive
Impact factor

Search papers
Search references

RSS
Latest issue
Current issues
Archive issues
What is RSS



Mat. Model.:
Year:
Volume:
Issue:
Page:
Find






Personal entry:
Login:
Password:
Save password
Enter
Forgotten password?
Register


Matematicheskoe modelirovanie, 2025, Volume 37, Number 2, Pages 185–198
DOI: https://doi.org/10.20948/mm-2025-02-14
(Mi mm4606)
 

Model of statistical dependence of sample distributions using nearest neighbor graphs and Kullback-–Leibler divergence

A. A. Kislitsyn

Keldysh Institute of Applied Mathematics of RAS
References:
Abstract: The paper presents the results of numerical modeling of the distribution of graphs of the nearest neighbors by the number of connectivity components and vertices by degrees for the case when the distances between objects are not symmetric. The objects are discrete probability distributions, the distances between which are calculated as Kullback-Leibler divergence. Estimates of the probability that a set of probability distributions is formed by statistically dependent objects are numerically obtained. The numerical algorithm for constructing a benchmark for the probabilities of implementing a certain structure of the nearest neighbor graph is based on the fact that, up to isomorphism, this structure does not depend on the distribution of distances between objects. An algorithm for collecting sample statistics of nearest neighbor graphs for arbitrary random asymmetric matrices, the elements of which are treated as distances, is described. An example of the analysis of big data distributions obtained as a result of automatic processing of a corpus of more than 100 thousand literary texts by 8.5 thousand authors in Russian is considered. The corpus is structured according to the authors in the form of n-grams of letter combinations. The example is interesting because the n-gram distributions make it possible to accurately identify the author of a single text, so that the reference author distributions can be considered as a basis. In the exact sense of linear algebra, the vectors of the author's standards are linearly independent. At the same time, it turned out that these vectors are statistically dependent with a probability almost equal to 1, which allows additional structuring of the data array. A comparison was also made with the results obtained in the L1 histogram norm for the same distributions, and it was shown that the benchmark for asymmetric distances allows in this example to obtain an answer at a higher level of confidence.
Keywords: nearest neighbor graph, vertex degree distribution, clustering, Kullback–Leibler divergence, data dependence criterion, text author identification.
Funding agency Grant number
Russian Science Foundation 23-71-10055
Received: 09.09.2024
Revised: 09.09.2024
Accepted: 14.10.2024
Document Type: Article
Language: Russian
Citation: A. A. Kislitsyn, “Model of statistical dependence of sample distributions using nearest neighbor graphs and Kullback-–Leibler divergence”, Mat. Model., 37:2 (2025), 185–198
Citation in format AMSBIB
\Bibitem{Kis25}
\by A.~A.~Kislitsyn
\paper Model of statistical dependence of sample distributions using nearest neighbor graphs and Kullback-–Leibler divergence
\jour Mat. Model.
\yr 2025
\vol 37
\issue 2
\pages 185--198
\mathnet{http://mi.mathnet.ru/mm4606}
\crossref{https://doi.org/10.20948/mm-2025-02-14}
Linking options:
  • https://www.mathnet.ru/eng/mm4606
  • https://www.mathnet.ru/eng/mm/v37/i2/p185
  • Citing articles in Google Scholar: Russian citations, English citations
    Related articles in Google Scholar: Russian articles, English articles
    Математическое моделирование
    Statistics & downloads:
    Abstract page:56
    Full-text PDF :1
    References:10
    First page:4
     
      Contact us:
    math-net2025_05@mi-ras.ru
     Terms of Use  Registration to the website  Logotypes © Steklov Mathematical Institute RAS, 2025