A. A. Bērziņš, “Usage of i-vectors for automated determination of a similarity level between languages”, Proceedings of ISP RAS, 31:5 (2019), 153

Proceedings of the Institute for System Programming of the RAS


Proceedings of the Institute for System Programming of the RAS, 2019, Volume 31, Issue 5, Pages 153–164
DOI: https://doi.org/10.15514/ISPRAS-2019-31(5)-12 (Mi tisp461)

Usage of i-vectors for automated determination of a similarity level between languages

A. A. Bērziņš

University of Latvia


Abstract: The article describes the results of applying i-vector-based speech identification methods (both LID and SID) to define a kind of distance between languages, in the broad sense of the word, including dialects and any other forms of spoken language. The method takes as input spontaneous speech recordings from a sufficiently large number of speakers of each language. The experiments were carried out on recordings of Latvian and Latgalian dialects, but the method is applicable to any other idioms. Cosine similarity, the Euclidean metric, the standardized Euclidean metric, the Jordan (or Chebyshev) metric, and the city block (or L1) metric were tried. Cosine similarity worked well for SID i-vectors but, for unknown reasons, gave meaningless results for LID i-vectors. The Jordan metric worked well for LID i-vectors but was not good enough for SID i-vectors. Standardizing the Euclidean metric did not give any improvement. The conclusions are therefore: 1) both SID and LID i-vectors of full-length recordings of spontaneous speech characterize and represent languages well enough to be used for measuring the distance between languages; 2) the best metrics for such tasks are the Euclidean and L1 metrics (applied to arithmetic mean vectors computed coordinate by coordinate from the i-vectors of all informants).
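
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the toy three-dimensional vectors, and the choice to estimate per-coordinate variances for the standardized Euclidean metric from the pooled i-vectors of both idioms are all assumptions made for illustration. Real i-vectors have hundreds of coordinates and would come from a LID or SID extractor, which is not shown here.

```python
import math


def mean_vector(vectors):
    """Coordinate-wise arithmetic mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    """City block (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def standardized_euclidean(a, b, variances):
    """Euclidean distance with each coordinate scaled by its variance."""
    return math.sqrt(
        sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances))
    )


def coordinate_variances(vectors):
    """Population variance of each coordinate over a set of vectors."""
    means = mean_vector(vectors)
    n = len(vectors)
    return [
        sum((x - m) ** 2 for x in column) / n
        for column, m in zip(zip(*vectors), means)
    ]


# Hypothetical toy "i-vectors": one vector per informant, two idioms.
idiom_a = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
idiom_b = [[5.0, 6.0, 1.0], [5.0, 6.0, 1.0]]

# One mean vector per idiom, computed coordinate by coordinate.
mean_a = mean_vector(idiom_a)  # [2.0, 2.0, 1.0]
mean_b = mean_vector(idiom_b)  # [5.0, 6.0, 1.0]

# Assumption: variances are estimated from all informants pooled together.
variances = coordinate_variances(idiom_a + idiom_b)

print(euclidean(mean_a, mean_b))   # 5.0
print(city_block(mean_a, mean_b))  # 7.0
print(chebyshev(mean_a, mean_b))   # 4.0
print(cosine_similarity(mean_a, mean_b))
print(standardized_euclidean(mean_a, mean_b, variances))
```

Note that cosine similarity is a similarity rather than a distance, so larger values mean closer idioms, whereas for the other four measures smaller values mean closer idioms.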

Keywords: speech, idiom, language, dialect, i-vector, LID, SID, recording, proximity of languages, distance between languages.

Document Type: Article

Language: Russian

Citation: A. A. Bērziņš, “Usage of i-vectors for automated determination of a similarity level between languages”, Proceedings of ISP RAS, 31:5 (2019), 153–164

Citation in format AMSBIB

\Bibitem{Brz19}
\by A.~A.~B{\=e}rzi{\c n}{\v s}
\paper Usage of i-vectors for automated determination of a similarity level between languages
\jour Proceedings of ISP RAS
\yr 2019
\vol 31
\issue 5
\pages 153--164
\mathnet{http://mi.mathnet.ru/tisp461}
\crossref{https://doi.org/10.15514/ISPRAS-2019-31(5)-12}

Linking options:

https://www.mathnet.ru/eng/tisp461

https://www.mathnet.ru/eng/tisp/v31/i5/p153
