Trudy SPIIRAN
Trudy SPIIRAN, 2019, Volume 18, Issue 2, Pages 471–503
DOI: https://doi.org/10.15622/sp.18.2.471-503
(Mi trspy1053)
 

This article is cited in 3 scientific papers (total in 3 papers)

Mathematical Modeling, Numerical Methods

About similarity measures of components arrangement of naturally ordered data arrays

A. S. Gumenyuk (a), A. A. Skiba (b), N. N. Pozdnichenko (a), S. N. Shpynov (c)

(a) Omsk State Technical University (OmSTU)
(b) Company Elmis
(c) N. F. Gamaleya Federal Research Center for Epidemiology & Microbiology
Abstract: At present, adequate mathematical tools are not applied to analyzing the arrangement of components in naturally ordered data arrays of various kinds: words or letters in texts, notes in musical compositions, symbols in sign sequences, monitoring data, numbers representing ordered measurement results, and components of genetic texts. As a result, it is difficult or impossible to measure and compare the order of messages embedded in long information chains. The main approaches to comparing symbolic sequences rely on probabilistic models and statistical tools, and on pairwise and multiple alignment, which determines the degree of similarity of sequences using edit distance measures. The use of pseudospectral and fractal representations of symbolic sequences remains somewhat exotic. Of particular note is the "curse of a priori unconscious knowledge" of the obvious orderliness of a sequence, which is widespread in mathematical linguistics, bioinformatics (mathematical biology), and related fields. These approaches pay almost no attention to studying and detecting patterns in the specific arrangement of all the symbols, words, and data components that make up an individual sequence. The object of study in our work is a specifically organized numerical tuple: the arrangement of components (the order) in a symbolic or numerical sequence. The intervals between nearest identical components of the order serve as the basis for a quantitative representation of the chain's arrangement. Multiplying all the intervals, or summing their logarithms, yields numbers that uniquely reflect the arrangement of components in a particular sequence. From these numbers one can derive a whole set of normalized characteristics of the order, including the geometric mean interval and its logarithm.
Such characteristics reflect the arrangement of components in symbolic sequences with surprising accuracy. In this paper, we present an approach to quantitatively comparing the arrangement of naturally ordered data arrays (information chains) of arbitrary nature. We propose measures of similarity and difference, together with a procedure for comparing the order of chains, based on selecting a list of subsequences (components) whose order characteristics are equal or similar. Rank distributions are used to speed up the selection of matching components. The paper presents a toolkit for comparing the order of information chains and demonstrates some of its applications to studying the structure of nucleotide sequences.
Keywords: data array, symbolic sequence, information chain, numeric characteristics of order, depth of order, average remoteness, nucleotide sequence, similarity measures, similarity matrix, alignment-free genome comparison, inter-nucleotide distance.
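The interval-based characteristics described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' toolkit. The abstract does not say how the first occurrence of a component is treated, so the sketch assumes that its interval is measured from the start of the chain (position 0, with positions counted from 1). The function names `intervals_by_symbol` and `order_characteristics` and the choice of base-2 logarithms are likewise assumptions made for illustration.

```python
import math
from collections import defaultdict


def intervals_by_symbol(chain):
    """Map each symbol to the intervals between its nearest occurrences.

    Assumption: positions are 1-based and the first occurrence of a symbol
    is measured from position 0 (the start of the chain).
    """
    last_pos = {}
    intervals = defaultdict(list)
    for pos, sym in enumerate(chain, start=1):
        intervals[sym].append(pos - last_pos.get(sym, 0))
        last_pos[sym] = pos
    return dict(intervals)


def order_characteristics(chain):
    """Return the log-sum of all intervals and two normalized values.

    Summing logarithms is equivalent to taking the logarithm of the
    product of intervals, but avoids overflow on long chains.
    """
    all_intervals = [d for ds in intervals_by_symbol(chain).values() for d in ds]
    if not all_intervals:
        raise ValueError("chain must contain at least one component")
    log_sum = sum(math.log2(d) for d in all_intervals)
    mean_log = log_sum / len(all_intervals)
    return {
        "log_sum": log_sum,                        # log2 of the product of intervals
        "mean_log_interval": mean_log,             # log2 of the geometric mean interval
        "geometric_mean_interval": 2 ** mean_log,  # geometric mean interval
    }


if __name__ == "__main__":
    # Same composition, different arrangement, different characteristics.
    print(order_characteristics("ABAB"))
    print(order_characteristics("AABB"))
```

Under these assumptions, the chains "ABAB" and "AABB" contain the same symbols in the same proportions but yield different log-sums, which is the kind of sensitivity to arrangement the abstract refers to.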
Received: 22.05.2018
Document Type: Article
UDC: 006.72
Language: Russian
Citation: A. S. Gumenyuk, A. A. Skiba, N. N. Pozdnichenko, S. N. Shpynov, “About similarity measures of components arrangement of naturally ordered data arrays”, Tr. SPIIRAN, 18:2 (2019), 471–503
Citation in format AMSBIB
\Bibitem{GumSkiPoz19}
\by A.~S.~Gumenyuk, A.~A.~Skiba, N.~N.~Pozdnichenko, S.~N.~Shpynov
\paper About similarity measures of components arrangement of naturally ordered data arrays
\jour Tr. SPIIRAN
\yr 2019
\vol 18
\issue 2
\pages 471--503
\mathnet{http://mi.mathnet.ru/trspy1053}
\crossref{https://doi.org/10.15622/sp.18.2.471-503}
\elib{https://elibrary.ru/item.asp?id=37305501}
Linking options:
  • https://www.mathnet.ru/eng/trspy1053
  • https://www.mathnet.ru/eng/trspy/v18/i2/p471