A. S. Gumenyuk, A. A. Skiba, N. N. Pozdnichenko, S. N. Shpynov, “About similarity measures of components arrangement of naturally ordered data arrays”, Tr. SPIIRAN, 18:2 (2019), 471

Trudy SPIIRAN


Trudy SPIIRAN, 2019, Volume 18, Issue 2, Pages 471–503
DOI: https://doi.org/10.15622/sp.18.2.471-503 (Mi trspy1053)

This article is cited in 3 scientific papers (total in 3 papers)

Mathematical Modeling, Numerical Methods

About similarity measures of components arrangement of naturally ordered data arrays

A. S. Gumenyuk^a, A. A. Skiba^b, N. N. Pozdnichenko^a, S. N. Shpynov^c

^a Omsk State Technical University (OmSTU)
^b Company Elmis
^c N. F. Gamaleya Federal Research Center for Epidemiology & Microbiology


Abstract: At present, adequate mathematical tools are not used to analyze the arrangement of components in arrays of naturally ordered data of various kinds, including words or letters in texts, notes in musical compositions, symbols in sign sequences, monitoring data, numbers representing ordered measurement results, and components of genetic texts. As a result, it is difficult or impossible to measure and compare the order of messages isolated in long information chains. The main approaches to comparing symbol sequences rely on probabilistic models and statistical tools, and on pairwise and multiple alignment, which determines the degree of similarity of sequences using edit distance measures. The use of pseudospectral and fractal representations of symbolic sequences is somewhat exotic. Particular attention should be paid to "the curse of a priori unconscious knowledge" of the obvious orderliness of a sequence, which is widespread in mathematical linguistics, bioinformatics (mathematical biology), and related fields. These approaches pay almost no attention to studying and detecting patterns in the specific arrangement of all the symbols, words, and data components that make up an individual sequence. The object of study in our work is a specifically organized numerical tuple: the arrangement of components (the order) in a symbolic or numerical sequence. The intervals between the nearest identical components of the order serve as the basis for a quantitative representation of the chain arrangement. Multiplying all the intervals, or summing their logarithms, yields numbers that uniquely reflect the arrangement of components in a particular sequence. From these numbers one can obtain a whole set of normalized characteristics of the order, among them the geometric mean interval and its logarithm.
Such characteristics reflect the arrangement of the components in symbolic sequences with surprising accuracy. In this paper, we present an approach to quantitatively comparing the arrangement of arrays of naturally ordered data (information chains) of arbitrary nature. We propose measures of similarity and difference, together with a procedure for comparing chain order, based on selecting a list of subsequences (components) whose order characteristics are equal or similar. Rank distributions are used to speed up the selection of matching components. The paper presents a toolkit for comparing the order of information chains and demonstrates some of its applications to studying the structure of nucleotide sequences.
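
The interval-based characteristics described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' toolkit. The function names `intervals_by_component` and `order_characteristics` are hypothetical, the base-2 logarithm is an arbitrary choice, and the boundary rule is an assumption: the abstract does not say how the interval of a component's first occurrence is measured, so here it is counted from an imaginary position 0 before the chain.

```python
import math
from collections import defaultdict


def intervals_by_component(chain):
    """Map each distinct component to the intervals between its
    consecutive occurrences (positions are 1-based).

    Assumption (not stated in the abstract): the interval of the first
    occurrence is its distance from an imaginary position 0.
    """
    last_seen = {}
    intervals = defaultdict(list)
    for pos, comp in enumerate(chain, start=1):
        intervals[comp].append(pos - last_seen.get(comp, 0))
        last_seen[comp] = pos
    return dict(intervals)


def order_characteristics(chain):
    """Return the sum of log2 intervals, its mean per interval, and the
    geometric mean interval for the whole chain."""
    gaps = [d for ds in intervals_by_component(chain).values() for d in ds]
    if not gaps:
        raise ValueError("chain must contain at least one component")
    # Summing logarithms is equivalent to taking the log of the product
    # of all intervals, but avoids huge numbers on long chains.
    log_sum = sum(math.log2(d) for d in gaps)
    mean_log = log_sum / len(gaps)
    return {
        "log_sum": log_sum,
        "mean_log_interval": mean_log,
        "geometric_mean_interval": 2 ** mean_log,
    }


if __name__ == "__main__":
    # Same composition (two A, two B), different arrangement.
    print(order_characteristics("ABAB"))
    print(order_characteristics("AABB"))
```

Under these assumptions the chains `ABAB` and `AABB`, which have identical composition, produce different values (the sum of log2 intervals is 3 for `ABAB` and log2 3 for `AABB`), so the characteristics respond to arrangement rather than to symbol counts alone.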

Keywords: data array, symbolic sequence, information chain, numeric characteristics of order, depth of order, average remoteness, nucleotide sequence, similarity measures, similarity matrix, alignment-free genome comparison, inter-nucleotide distance.

Received: 22.05.2018


Document Type: Article

UDC: 006.72

Language: Russian

Citation: A. S. Gumenyuk, A. A. Skiba, N. N. Pozdnichenko, S. N. Shpynov, “About similarity measures of components arrangement of naturally ordered data arrays”, Tr. SPIIRAN, 18:2 (2019), 471–503

Citation in format AMSBIB

\Bibitem{GumSkiPoz19}
\by A.~S.~Gumenyuk, A.~A.~Skiba, N.~N.~Pozdnichenko, S.~N.~Shpynov
\paper About similarity measures of components arrangement of naturally ordered data arrays
\jour Tr. SPIIRAN
\yr 2019
\vol 18
\issue 2
\pages 471--503
\mathnet{http://mi.mathnet.ru/trspy1053}
\crossref{https://doi.org/10.15622/sp.18.2.471-503}
\elib{https://elibrary.ru/item.asp?id=37305501}

Linking options:

https://www.mathnet.ru/eng/trspy1053

https://www.mathnet.ru/eng/trspy/v18/i2/p471
