This article is cited in 7 scientific papers.
Speech-based emotion recognition and speaker identification: static vs. dynamic mode of speech representation
Maxim Sidorov (a), Wolfgang Minker (a), Eugene S. Semenkin (b)
(a) Institute of Communications Engineering, Ulm University, Albert-Einstein-Allee 43, Ulm, 89081, Germany
(b) Informatics and Telecommunications Institute, Reshetnev Siberian State Aerospace University, Krasnoyarskiy Rabochiy 31, Krasnoyarsk, 660037, Russia
Abstract:
In this paper we present the performance of different machine learning algorithms on the problems of speech-based Emotion Recognition (ER) and Speaker Identification (SI) in static and dynamic modes of speech signal representation. We have taken a multi-corpus, multi-language approach: three databases for the SI problem and four databases for the ER task, covering three different languages (German, English, and Japanese), have been used to evaluate the models. More than 45 machine learning algorithms were applied to these tasks in both modes, and the results are presented and discussed here.
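The static/dynamic distinction in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual feature set: the frame values are synthetic, and the choice of mean and standard deviation as utterance-level functionals is an assumption about what a typical static representation looks like.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frame-level acoustic features for one utterance:
# T frames, D coefficients per frame (e.g. MFCC-like); values are synthetic.
T, D = 120, 13
frames = rng.normal(size=(T, D))

# Dynamic mode: keep the whole time series (a T x D sequence),
# suitable for sequence models such as HMMs or recurrent networks.
dynamic_repr = frames

# Static mode: collapse the sequence into one fixed-length vector of
# utterance-level functionals (here: per-coefficient mean and standard
# deviation), suitable for conventional classifiers.
static_repr = np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

print(dynamic_repr.shape)  # (120, 13)
print(static_repr.shape)   # (26,)
```

The dynamic representation varies in length with the utterance, while the static one has a fixed dimensionality regardless of duration, which is why the two modes pair naturally with different families of classifiers.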
Keywords:
emotion recognition from speech, speaker identification from speech, machine learning algorithms, speaker adaptive emotion recognition from speech.
Received: 28.12.2015. Received in revised form: 24.02.2016. Accepted: 15.09.2016
Citation:
Maxim Sidorov, Wolfgang Minker, Eugene S. Semenkin, “Speech-based emotion recognition and speaker identification: static vs. dynamic mode of speech representation”, J. Sib. Fed. Univ. Math. Phys., 9:4 (2016), 518–523
Linking options:
https://www.mathnet.ru/eng/jsfu514
https://www.mathnet.ru/eng/jsfu/v9/i4/p518