Trudy SPIIRAN

Trudy SPIIRAN, 2018, Issue 58, Pages 77–110
DOI: https://doi.org/10.15622/sp.58.4
(Mi trspy1007)
 

This article is cited in 12 scientific papers (total in 12 papers)

Artificial Intelligence, Knowledge and Data Engineering

An analytic survey of end-to-end speech recognition systems

N. M. Markovnikov (a), I. S. Kipyatkova (b, a)

a St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS)
b Saint Petersburg State University of Aerospace Instrumentation (SUAI)
Abstract: This article presents an analytic survey of various end-to-end speech recognition systems, as well as some approaches to their construction, training and optimization. We consider models that use connectionist temporal classification (CTC) as the loss function for neural networks, and models based on the encoder-decoder architecture with an attention mechanism. We also describe neural network models built using conditional random fields (CRFs), a generalization of hidden Markov models that makes it possible to overcome some drawbacks of standard hybrid speech recognition systems, such as the assumption that the elements of speech frame sequences are independent. We also describe ways of integrating language models at the decoding stage of end-to-end systems. Various modifications and improvements of standard end-to-end models are considered as well, for example, a generalization of connectionist temporal classification and the use of regularization in attention-based encoder-decoder models. Such approaches significantly reduce recognition error rates for end-to-end models. A survey of research in this area shows that end-to-end systems achieve results close to those of state-of-the-art hybrid models, while using a simpler configuration and offering high training and decoding speed. In addition, we consider popular frameworks and toolkits for building speech recognition systems, such as TensorFlow, Eesen, Kaldi, etc., and compare them by the simplicity and accessibility of implementing an end-to-end speech recognition system with each of them.
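As a rough illustration of the CTC criterion named in the abstract, the following minimal Python sketch (not taken from the article) computes the CTC negative log-likelihood of a label sequence with the standard forward recursion over all alignments, and performs best-path (greedy) decoding by collapsing repeated symbols and dropping blanks. Assumptions: symbol index 0 is the blank, `probs[t][k]` is the per-frame softmax probability of symbol `k`, and the function names are hypothetical. Probabilities are multiplied directly rather than summed in log space, so this is only suitable for short toy inputs.

```python
import math
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Apply the CTC collapsing map: merge repeated labels, then remove blanks."""
    out: List[int] = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out


def best_path_decode(probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Pick the most probable symbol in every frame, then collapse the path."""
    frame_labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return ctc_greedy_decode(frame_labels, blank)


def ctc_neg_log_likelihood(probs: Sequence[Sequence[float]],
                           target: Sequence[int],
                           blank: int = BLANK) -> float:
    """-log P(target | input), summed over all alignments via the forward recursion."""
    # Extended target with blanks between and around labels: [b, y1, b, y2, ..., b]
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all partial paths ending in ext[s] at frame t
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]  # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]  # advance by one position
            # skip over a blank only between two different non-blank labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new

    # valid paths end either on the last label or on the trailing blank
    total = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    return math.inf if total == 0.0 else -math.log(total)
```

For two frames with uniform outputs over {blank, `a`}, the target `a` is reachable by the alignments `a a`, `a -`, and `- a`, giving probability 0.75, while `a a` is unreachable because CTC needs a blank between repeated labels. Practical toolkits typically compute the same recursion in log space and on batched tensors.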
Keywords: speech recognition, end-to-end models, neural networks, deep learning.
Funding agency | Grant number
Russian Foundation for Basic Research | 18-07-01216_a; 18-07-01407_a
Ministry of Education and Science of the Russian Federation | MK-1000.2017.8; MD-254.2017.8
Russian Academy of Sciences - Federal Agency for Scientific Organizations | 0073-2018-0002
This research was supported by the Russian Foundation for Basic Research (projects No. 18-07-01216 and 18-07-01407), by the Council for Grants of the President of the Russian Federation (projects No. MK-1000.2017.8 and MD-254.2017.8), and by state research project No. 0073-2018-0002.
Received: 28.11.2017
Document Type: Article
UDC: 004.522
Language: Russian
Citation: N. M. Markovnikov, I. S. Kipyatkova, “An analytic survey of end-to-end speech recognition systems”, Tr. SPIIRAN, 58 (2018), 77–110
Citation in format AMSBIB
\Bibitem{MarKip18}
\by N.~M.~Markovnikov, I.~S.~Kipyatkova
\paper An analytic survey of end-to-end speech recognition systems
\jour Tr. SPIIRAN
\yr 2018
\vol 58
\pages 77--110
\mathnet{http://mi.mathnet.ru/trspy1007}
\crossref{https://doi.org/10.15622/sp.58.4}
\elib{https://elibrary.ru/item.asp?id=35630304}
Linking options:
  • https://www.mathnet.ru/eng/trspy1007
  • https://www.mathnet.ru/eng/trspy/v58/p77
Statistics & downloads:
Abstract page: 1184
Full-text PDF: 1269
     