V. D. Gusev, L. A. Miroshnichenko, “The complexity of DNA sequences. Different approaches and definitions”, Mat. Biolog. Bioinform., 15:2 (2020), 313


Matematicheskaya Biologiya i Bioinformatika


Matematicheskaya Biologiya i Bioinformatika, 2020, Volume 15, Issue 2, Pages 313–337
DOI: https://doi.org/10.17537/2020.15.313 (Mi mbb435)

This article is cited in 2 scientific papers (total in 2 papers)

Review Articles

The complexity of DNA sequences. Different approaches and definitions

V. D. Gusev, L. A. Miroshnichenko

Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk


Abstract: An important quantitative characteristic of symbolic sequences (texts, strings) is complexity, which reflects, at an intuitive level, the degree of their “non-randomness”. A. N. Kolmogorov formulated the most general definition of complexity: he proposed measuring the complexity of an object (a symbolic sequence) by the length of the shortest description from which the object can be uniquely reconstructed. Since no program is guaranteed to find the shortest description, in practice various algorithmic approximations are used for this purpose; they are considered in this paper. Along with definitions of complexity that allow a sequence to be reconstructed from its “description”, a number of measures are considered that do not imply such reconstruction and are based on the calculation of certain quantitative characteristics. Of interest is not only a quantitative assessment of complexity, but also the identification and classification of the structural regularities that determine its specific value. In one form or another, these regularities are manifestations of repetition in the broadest sense. The measures of complexity considered are conventionally divided into statistical ones, which take into account the frequency of occurrence of symbols or short “words” in the text; “dictionary” ones, which estimate the number of different “subwords”; and “structural” ones, which are based on identifying long repeated fragments of the text and determining the relationships between them. Most of the methods are designed for sequences of an arbitrary linguistic nature. The special attention paid to DNA sequences, reflected in the title of the article, is due to the importance of the object, the presence of different types of repetition in it, and the numerous examples of using the concept of complexity in solving problems of classification and evolution of various biological objects.
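As a rough illustration of two of the families of measures named in the abstract, the following Python sketch computes a compression-based estimate and a simple “dictionary” measure. It is not taken from the paper: the function names, the choice of `zlib` as the compressor, and the normalisation by the maximum possible number of subwords are all assumptions made for this example. The compressed length only bounds the shortest description from above; it does not find it.

```python
import zlib


def compression_ratio(seq: str) -> float:
    """Compressed size divided by raw size (assumed proxy, using zlib).

    Lower values suggest more regularity (repetition) in the sequence.
    """
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


def distinct_subwords(seq: str, k: int) -> int:
    """Number of different subwords of length k that occur in seq."""
    return len({seq[i:i + k] for i in range(len(seq) - k + 1)})


def dictionary_complexity(seq: str, k: int, alphabet_size: int = 4) -> float:
    """Observed distinct k-subwords over the maximum possible number.

    The maximum is limited both by the alphabet (alphabet_size ** k)
    and by the number of positions in the sequence (len(seq) - k + 1).
    """
    max_possible = min(alphabet_size ** k, len(seq) - k + 1)
    return distinct_subwords(seq, k) / max_possible
```

For example, `distinct_subwords("ACGTACGT", 2)` counts the four subwords AC, CG, GT and TA, while a homopolymer such as `"AAAAAAAA"` contains only one subword of any given length.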
Local structural features detected in DNA sequences in sliding-window mode are of considerable interest, since zones of low complexity in the genomes of various organisms are often associated with the regulation of basic genetic processes.
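The sliding-window idea can be sketched with a statistical measure, here the Shannon entropy of symbol (or short-word) frequencies inside each window. This is a minimal sketch, not the authors' procedure: the function names, the window size, the step and the threshold are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy, in bits, of the frequencies of k-subwords in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    # sum of p * log2(1 / p) over all observed subwords
    return sum(c / total * math.log2(total / c) for c in counts.values())


def entropy_profile(seq: str, window: int, step: int = 1, k: int = 1):
    """List of (window start, entropy) pairs along the sequence."""
    return [(s, kmer_entropy(seq[s:s + window], k))
            for s in range(0, len(seq) - window + 1, step)]


def low_complexity_windows(seq: str, window: int, threshold: float,
                           step: int = 1, k: int = 1):
    """Start positions of windows whose entropy falls below threshold."""
    return [s for s, h in entropy_profile(seq, window, step, k)
            if h < threshold]


# Toy sequence: a homopolymer run flanked by mixed-composition segments.
toy = "ACGTACGT" + "AAAAAAAA" + "ACGTACGT"
print(low_complexity_windows(toy, window=8, threshold=1.0, step=4))  # [8]
```

Only the window that lies entirely inside the run of A has entropy below the (arbitrary) threshold of 1 bit; windows that overlap the flanking segments score higher.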

Key words: DNA sequences, algorithms, complexity, entropy, data compression, statistical measures, linguistic measure of complexity, structural measures of complexity.

Funding agency	Grant number
Ministry of Education and Science of the Russian Federation	0314-2019-0015
The study was carried out within the framework of the state contract of the Sobolev Institute of Mathematics (project no. 0314-2019-0015).

Received 23.10.2020, 14.11.2020, Published 30.11.2020

Document Type: Article

Language: Russian

Citation: V. D. Gusev, L. A. Miroshnichenko, “The complexity of DNA sequences. Different approaches and definitions”, Mat. Biolog. Bioinform., 15:2 (2020), 313–337

Citation in format AMSBIB

\Bibitem{GusMir20}
\by V.~D.~Gusev, L.~A.~Miroshnichenko
\paper The complexity of DNA sequences. Different approaches and definitions
\jour Mat. Biolog. Bioinform.
\yr 2020
\vol 15
\issue 2
\pages 313--337
\mathnet{http://mi.mathnet.ru/mbb435}
\crossref{https://doi.org/10.17537/2020.15.313}

Linking options:

https://www.mathnet.ru/eng/mbb435

https://www.mathnet.ru/eng/mbb/v15/i2/p313
