Abstract:
Sequencing of the human genome began in 1994. Producing a draft of human DNA took 10 years of collaborative work by many research groups from different countries. Modern technologies allow a whole genome to be sequenced in a few days. We discuss here the advances in modern bioinformatics related to the emergence of high-performance sequencing platforms, which not only expanded the capabilities of biology and related sciences, but also gave rise to the phenomenon of Big Data in biology. We substantiate the need to develop new technologies and methods for organizing the storage, management, analysis and visualization of big data. Modern bioinformatics faces not only the problem of processing enormous volumes of heterogeneous data, but also a variety of methods for interpreting and presenting results, and the simultaneous existence of various software tools and data formats. Ways of addressing these challenges are discussed, in particular by drawing on experience from other areas of modern life, such as web intelligence and business intelligence. The former is the area of scientific research and development that explores the impact of, and makes use of, artificial intelligence and information technology (IT) for new products, services and frameworks that are empowered by the World Wide Web; the latter is the domain of IT that addresses the issues of decision-making. New database management systems, other than relational ones, will help to solve the problem of storing huge volumes of data while keeping search-query times acceptable. New programming technologies, such as generic programming and visual programming, are designed to address the diversity of genomic data formats and to make it possible to quickly create one’s own data-processing scripts.
Key words:
Big Data, NGS, genome sequencing, IT technologies, bioinformatics, generic programming, visual programming, nonrelational databases, NoSQL systems, Hadoop, MapReduce.
The study was partially supported by RFBR grants 15-07-05783 (N.N.N.), 16-07-00937 and 16-07-01000 (U.M.N.), and by the Program of Fundamental Scientific Research of the Presidium of the Russian Academy of Sciences I.33P (U.M.N.).
Received 16.03.2018, Published 03.04.2018
Document Type:
Article
UDC:
004.9:004.9:004.8:577.21
Language: English
Citation:
N. N. Nazipova, E. A. Isaev, V. V. Kornilov, D. V. Pervukhin, A. A. Morozova, A. A. Gorbunov, M. N. Ustinin, “Big Data in bioinformatics”, Mat. Biolog. Bioinform., 13, Suppl. (2018), t1–t16
\Bibitem{NazIsaKor18}
\by N.~N.~Nazipova, E.~A.~Isaev, V.~V.~Kornilov, D.~V.~Pervukhin, A.~A.~Morozova, A.~A.~Gorbunov, M.~N.~Ustinin
\paper Big Data in bioinformatics
\jour Mat. Biolog. Bioinform.
\yr 2018
\vol 13
\pages t1--t16
\issueinfo Suppl.
\mathnet{http://mi.mathnet.ru/mbb359}
\crossref{https://doi.org/10.17537/2018.13.t1}
Linking options:
https://www.mathnet.ru/eng/mbb359
https://www.mathnet.ru/eng/mbb/v13/i3/p1
Translation
N. N. Nazipova, E. A. Isaev, V. V. Kornilov, D. V. Pervukhin, A. A. Morozova, A. A. Gorbunov, M. N. Ustinin, “Big Data in bioinformatics”, Mat. Biolog. Bioinform., 12:1 (2017), 102–119
This publication is cited in the following 10 articles: