Simulation Tools for Control Systems and Controlled Objects
Speech recognition system for Russian-language telephone speech
D. Obukhov Novosibirsk State Technical University
Abstract:
We describe a system for recognizing Russian-language speech. We focus on telephone conversations, where the input is a noisy single-channel audio signal sampled at 8 kHz. Data from the YouTube video hosting service is additionally used for training. We consider several acoustic models and techniques for building the lexicon and the language model. We also conduct experiments on the influence of speaker information. We further show that augmentation techniques such as reverberation, changes to the speed and volume of the signal, and masking in frequency and time significantly improve recognition quality. We achieve a word error rate of 24.21 on our validation dataset.
Keywords:
speech recognition, Russian-language speech, acoustic model, language model, speech augmentation, speaker embedding.
Received: May 9, 2020
Published: January 31, 2021
Citation:
D. Obukhov, “Speech recognition system for Russian-language telephone speech”, UBS, 89 (2021), 106–122
Linking options:
https://www.mathnet.ru/eng/ubs1070
https://www.mathnet.ru/eng/ubs/v89/p106