Computer Research and Modeling
Computer Research and Modeling, 2020, Volume 12, Issue 1, Pages 243–254
DOI: https://doi.org/10.20537/2076-7633-2020-12-1-243-254
(Mi crm782)
 

This article is cited in 3 scientific papers (total in 3 papers)

MODELS OF ECONOMIC AND SOCIAL SYSTEMS

Statistical analysis of bigrams of specialized texts

N. A. Mitin, Yu. N. Orlov

Keldysh Institute of Applied Mathematics Russian Academy of Sciences, 4 Miusskaya pl., Moscow, 125047, Russia
Abstract: Spectral analysis of a stochastic matrix is used to build an indicator that determines the subject of a scientific text without using keywords. The matrix in question is the matrix of conditional bigram probabilities, built from the statistics of the alphabet characters in the text with spaces, digits, and punctuation marks removed. Scientific texts are classified according to the mutual arrangement of the invariant subspaces of this matrix. The separation indicator is the cosine of the angle between the right and left eigenvectors corresponding to the maximum and minimum eigenvalues. The computational algorithm uses a special representation of the dichotomy parameter: the integral of the squared norm of the resolvent of the stochastic bigram matrix along a circle of given radius in the complex plane. Divergence of this integral to infinity indicates that the integration contour is approaching an eigenvalue of the matrix. The paper presents typical distributions of the indicator used to identify specialties. The statistical analysis covered dissertations in 19 main specialties, 20 texts per specialty, without regard to the classification within each specialty. The empirical distributions of the cosine of the angle for the mathematical and humanities specialties were found to have no common domain, so these groups can be formally separated by the value of this indicator without errors. Although the corpus was not particularly large, an identification error of about 2% for arbitrarily selected dissertations appears to be a very good result compared with methods based on semantic analysis.
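The construction of the bigram matrix described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `bigram_matrix`, the `alphabet` parameter, and the uniform fallback for letters that are never followed by another letter are assumptions made for illustration only.

```python
from collections import Counter


def bigram_matrix(text, alphabet):
    """Return a row-stochastic matrix P with P[i][j] = P(next letter = j | current letter = i)."""
    index = {ch: k for k, ch in enumerate(alphabet)}
    # Keep only alphabet letters: spaces, digits, and punctuation are dropped,
    # so letters separated by them become adjacent.
    letters = [ch for ch in text.lower() if ch in index]
    counts = Counter(zip(letters, letters[1:]))

    n = len(alphabet)
    matrix = [[0.0] * n for _ in range(n)]
    for (first, second), count in counts.items():
        matrix[index[first]][index[second]] = float(count)

    for row in matrix:
        total = sum(row)
        for j in range(n):
            # Assumption: a letter with no observed successor gets a uniform row,
            # so that every row still sums to 1.
            row[j] = row[j] / total if total > 0 else 1.0 / n
    return matrix
```

For example, `bigram_matrix("ab, ba 12 ab", "ab")` reduces the text to the letter sequence `abbaab` and normalizes the bigram counts row by row.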
It was also found that a text template can be constructed for each specialty in the form of a reference bigram matrix; within a neighborhood of this matrix in the norm of summable functions (the L1 norm), the topic of a scientific work can be identified accurately without using keywords. The proposed method can be used as a comparative indicator of the greater or lesser rigor of a scientific text, or as an indicator of whether a text meets a certain scientific level.
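Identification by proximity to a reference matrix could look like the following Python sketch. The abstract does not say how the reference matrices are built or how the norm is applied to a matrix, so two assumptions are made here: the reference is the entrywise mean of the bigram matrices of a specialty, and the distance is the entrywise sum of absolute differences. All function names are hypothetical.

```python
def reference_matrix(matrices):
    """Entrywise mean of several bigram matrices (assumed way to build a reference)."""
    count = len(matrices)
    size = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / count for j in range(size)]
        for i in range(size)
    ]


def l1_distance(p, q):
    """Entrywise L1 distance: the sum of |p[i][j] - q[i][j]| over all entries."""
    return sum(
        abs(x - y)
        for row_p, row_q in zip(p, q)
        for x, y in zip(row_p, row_q)
    )


def nearest_specialty(matrix, references):
    """Return the name of the reference matrix closest to `matrix` in L1 distance."""
    return min(references, key=lambda name: l1_distance(matrix, references[name]))


# Toy 2x2 example with made-up numbers, not data from the paper.
references = {
    "mathematics": [[1.0, 0.0], [0.0, 1.0]],
    "history": [[0.5, 0.5], [0.5, 0.5]],
}
label = nearest_specialty([[0.9, 0.1], [0.2, 0.8]], references)
```

In this toy example the query matrix lies at distance 0.6 from the "mathematics" reference and 1.4 from the "history" reference, so the first label is chosen.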
Keywords: stochastic matrix, spectral portrait, statistical indicator, scientific text.
Received: 21.08.2019
Revised: 24.11.2019
Accepted: 26.11.2019
Document Type: Article
UDC: 519.25
Language: Russian
Citation: N. A. Mitin, Yu. N. Orlov, “Statistical analysis of bigrams of specialized texts”, Computer Research and Modeling, 12:1 (2020), 243–254
Citation in format AMSBIB
\Bibitem{MitOrl20}
\by N.~A.~Mitin, Yu.~N.~Orlov
\paper Statistical analysis of bigrams of specialized texts
\jour Computer Research and Modeling
\yr 2020
\vol 12
\issue 1
\pages 243--254
\mathnet{http://mi.mathnet.ru/crm782}
\crossref{https://doi.org/10.20537/2076-7633-2020-12-1-243-254}
Linking options:
  • https://www.mathnet.ru/eng/crm782
  • https://www.mathnet.ru/eng/crm/v12/i1/p243