N. M. Markovnikov, I. S. Kipyatkova, “An analytic survey of end-to-end speech recognition systems”, Tr. SPIIRAN, 58 (2018), 77

Trudy SPIIRAN

Trudy SPIIRAN, 2018, Issue 58, Pages 77–110
DOI: https://doi.org/10.15622/sp.58.4 (Mi trspy1007)

This article is cited in 12 scientific papers.

Artificial Intelligence, Knowledge and Data Engineering

An analytic survey of end-to-end speech recognition systems

N. M. Markovnikov^a, I. S. Kipyatkova^{b,a}

^a St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS)
^b Saint Petersburg State University of Aerospace Instrumentation (SUAI)

Abstract: This article presents an analytic survey of end-to-end speech recognition systems and of approaches to their construction, training and optimization. We consider models that use connectionist temporal classification (CTC) as the loss function of a neural network, and models based on the encoder-decoder architecture with an attention mechanism. We also describe neural network models built with conditional random fields (CRF), a generalization of hidden Markov models that removes some drawbacks of standard hybrid speech recognition systems, such as the assumption that the elements of a sequence of speech frames are conditionally independent. We further describe ways of integrating language models into end-to-end systems at the decoding stage, as well as various modifications and improvements of standard end-to-end models, for example, a generalization of connectionist temporal classification and regularization of attention-based encoder-decoder models; such modifications significantly reduce the recognition error rates of end-to-end models. Our survey of research in this area shows that end-to-end systems achieve results close to those of state-of-the-art hybrid models while having a simpler configuration and higher training and decoding speed. In addition, we consider popular frameworks and toolkits for building speech recognition systems, such as TensorFlow, Eesen and Kaldi, and compare them in terms of how simple and accessible it is to implement an end-to-end speech recognition system with each of them.
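The abstract refers to connectionist temporal classification (CTC) as a loss function. As a minimal illustrative sketch of the standard CTC forward algorithm (not the authors' code), the block below computes the probability of a label sequence by summing over all frame-level alignments and checks it against brute-force enumeration. It assumes that the per-frame softmax outputs are given as plain Python lists and that index 0 is the blank symbol; all function names are hypothetical. The CTC loss is the negative logarithm of this probability.

```python
import itertools
import math
from typing import List, Sequence

BLANK = 0  # assumption: vocabulary index 0 is the CTC blank symbol


def ctc_collapse(path: Sequence[int], blank: int = BLANK) -> List[int]:
    """CTC mapping B: merge consecutive repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out


def ctc_forward(probs: List[List[float]], labels: Sequence[int],
                blank: int = BLANK) -> float:
    """p(labels | x) via the forward algorithm; probs[t][k] = softmax output."""
    # Extended sequence with blanks around and between labels: b l1 b l2 ... b
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S, T = len(ext), len(probs)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]  # start with a blank ...
    if S > 1:
        alpha[1] = probs[0][ext[1]]  # ... or with the first label
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]  # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]  # advance by one position
            # skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # valid alignments end on the last label or on the trailing blank
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)


def ctc_loss(probs: List[List[float]], labels: Sequence[int],
             blank: int = BLANK) -> float:
    """Negative log-likelihood used as the training objective."""
    return -math.log(ctc_forward(probs, labels, blank))


def ctc_brute_force(probs: List[List[float]], labels: Sequence[int],
                    blank: int = BLANK) -> float:
    """Reference definition: enumerate every path (exponential, tiny inputs only)."""
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(K), repeat=T):
        if ctc_collapse(path, blank) == list(labels):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total
```

Production toolkits compute the same recursion in log space (or with per-frame scaling) to avoid numerical underflow on long utterances; plain probabilities are used here only for readability.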

Keywords: speech recognition, end-to-end models, neural networks, deep learning.

Funding agency	Grant number
Russian Foundation for Basic Research	18-07-01216_а, 18-07-01407_а
Ministry of Education and Science of the Russian Federation	МК-1000.2017.8, МД-254.2017.8
Russian Academy of Sciences - Federal Agency for Scientific Organizations	0073-2018-0002
This research is supported by the Russian Foundation for Basic Research (projects No. 18-07-01216 and 18-07-01407), by the Council for Grants of the President of the Russian Federation (projects No. MK-1000.2017.8 and MD-254.2017.8), and by state research assignment No. 0073-2018-0002.

Received: 28.11.2017

Document Type: Article

UDC: 004.522

Language: Russian

Citation: N. M. Markovnikov, I. S. Kipyatkova, “An analytic survey of end-to-end speech recognition systems”, Tr. SPIIRAN, 58 (2018), 77–110

Citation in format AMSBIB

\Bibitem{MarKip18}
\by N.~M.~Markovnikov, I.~S.~Kipyatkova
\paper An analytic survey of end-to-end speech recognition systems
\jour Tr. SPIIRAN
\yr 2018
\vol 58
\pages 77--110
\mathnet{http://mi.mathnet.ru/trspy1007}
\crossref{https://doi.org/10.15622/sp.58.4}
\elib{https://elibrary.ru/item.asp?id=35630304}

Linking options:

https://www.mathnet.ru/eng/trspy1007

https://www.mathnet.ru/eng/trspy/v58/p77
