Informatics and Automation
Informatics and Automation, 2022, Volume 21, Issue 3, Pages 572–603
DOI: https://doi.org/10.15622/ia.21.3.5
(Mi trspy1201)
 

This article is cited in 1 scientific paper.

Artificial Intelligence, Knowledge and Data Engineering

Machine learning in base-calling for next-generation sequencing methods

A. G. Borodinov (a), V. V. Manoilov (b), I. V. Zarutsky (b), A. I. Petrov (b), V. E. Kurochkin (b), A. S. Saraev (b)

a Scientific Instruments Joint Stock Company
b Institute for Analytical Instrumentation, Russian Academy of Sciences, St. Petersburg
Abstract: Next-generation sequencing (NGS) technologies have substantially reduced sequencing costs and made massive sequencing datasets available. The Institute for Analytical Instrumentation of the Russian Academy of Sciences is developing Nanofor SPS, a hardware-software system for determining nucleic acid sequences by massively parallel sequencing. Image processing algorithms play an essential role in genome sequencing, and the final stage of this preliminary analysis of raw data is base-calling. Base-calling determines which nucleotide base produced the intensity values recorded in the fluorescence channels (one per wavelength) in the flow cell image frames at each sequencing-by-synthesis cycle. The paper provides an extensive analysis of base-calling approaches and a summary of the common procedures available for the Illumina platform. It considers the chemical processes in sequencing-by-synthesis technology that shift the recorded intensities, including phasing and prephasing, signal decay, and crosstalk. A generalized model is defined, and possible implementations are considered within it. The paper then considers machine learning (ML) approaches for building and evaluating models that implement the base-calling stage, covering unsupervised, semi-supervised, and supervised learning, and shows that various machine learning algorithms from the Scikit-learn library can be applied. A separate important task is the optimal selection of features, extracted from the clusters detected on a flow cell, for machine learning. Finally, results on sequencing data from the Illumina MiSeq and Nanofor SPS devices show the promise of machine learning for solving the base-calling problem.
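To make the supervised formulation of base-calling concrete, the following is a minimal sketch and not the authors' method. It assumes each cluster is described by four channel intensities, simulates crosstalk with a hypothetical mixing table (the `CROSSTALK` values are invented for illustration and do not come from the paper or any instrument), and calls bases with a nearest-centroid classifier written in plain Python instead of Scikit-learn. Phasing, prephasing, and signal decay are deliberately not modelled.

```python
import random

BASES = "ACGT"

# Hypothetical crosstalk table: key = true base, list = mean intensity seen in
# each of the four detection channels. Illustrative values only.
CROSSTALK = {
    "A": [1.00, 0.60, 0.05, 0.02],
    "C": [0.30, 1.00, 0.03, 0.04],
    "G": [0.04, 0.02, 1.00, 0.50],
    "T": [0.02, 0.05, 0.25, 1.00],
}


def simulate_cluster(base, rng, noise=0.08):
    """Return four noisy channel intensities for one cluster emitting `base`."""
    return [value + rng.gauss(0.0, noise) for value in CROSSTALK[base]]


def make_dataset(n, rng):
    """Draw n random bases and simulate one intensity vector for each."""
    labels = [rng.choice(BASES) for _ in range(n)]
    features = [simulate_cluster(base, rng) for base in labels]
    return features, labels


def fit_centroids(features, labels):
    """Supervised step: average the intensity vectors of each labelled base."""
    sums = {base: [0.0] * 4 for base in BASES}
    counts = {base: 0 for base in BASES}
    for vector, base in zip(features, labels):
        counts[base] += 1
        for channel, value in enumerate(vector):
            sums[base][channel] += value
    return {
        base: [total / counts[base] for total in sums[base]]
        for base in BASES
        if counts[base] > 0
    }


def call_base(vector, centroids):
    """Call the base whose centroid is nearest in squared Euclidean distance."""
    def distance(base):
        return sum((x - c) ** 2 for x, c in zip(vector, centroids[base]))
    return min(centroids, key=distance)


def accuracy(features, labels, centroids):
    """Fraction of intensity vectors whose called base matches the label."""
    hits = sum(call_base(x, centroids) == y for x, y in zip(features, labels))
    return hits / len(labels)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train_x, train_y = make_dataset(2000, rng)
test_x, test_y = make_dataset(500, rng)
centroids = fit_centroids(train_x, train_y)
print(f"held-out accuracy: {accuracy(test_x, test_y, centroids):.3f}")
```

The learned centroids play the role of a crosstalk estimate, so no explicit matrix inversion is needed before calling. On real data the feature vector would typically carry more than raw intensities, which is where the feature selection task mentioned in the abstract comes in.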
Keywords: next-generation sequencing, base-calling, bioinformatics, machine learning.
Funding agency: Ministry of Science and Higher Education of the Russian Federation
Grant number: 122032300337-4
This research was performed under state registration number 122032300337-4, dated 23 March 2022, of the Ministry of Science and Higher Education of the Russian Federation.
Received: 05.04.2022
Document Type: Article
UDC: 543.07
Language: Russian
Citation: A. G. Borodinov, V. V. Manoilov, I. V. Zarutsky, A. I. Petrov, V. E. Kurochkin, A. S. Saraev, “Machine learning in base-calling for next-generation sequencing methods”, Informatics and Automation, 21:3 (2022), 572–603
Citation in format AMSBIB
\Bibitem{BorManZar22}
\by A.~G.~Borodinov, V.~V.~Manoilov, I.~V.~Zarutsky, A.~I.~Petrov, V.~E.~Kurochkin, A.~S.~Saraev
\paper Machine learning in base-calling for next-generation sequencing methods
\jour Informatics and Automation
\yr 2022
\vol 21
\issue 3
\pages 572--603
\mathnet{http://mi.mathnet.ru/trspy1201}
\crossref{https://doi.org/10.15622/ia.21.3.5}
Linking options:
  • https://www.mathnet.ru/eng/trspy1201
  • https://www.mathnet.ru/eng/trspy/v21/i3/p572