Proceedings of the Institute for System Programming of the RAS
Proceedings of the Institute for System Programming of the RAS, 2019, Volume 31, Issue 5, Pages 153–164
DOI: https://doi.org/10.15514/ISPRAS-2019-31(5)-12
(Mi tisp461)
 

Usage of i-vectors for automated determination of a similarity level between languages

A. A. Bērziņš

University of Latvia
Abstract: The article describes the results of applying i-vector-based speech identification methods (both LID and SID) to define a kind of distance between languages, where "language" is understood in a broad sense that includes dialects and any other forms of spoken language. The method takes as input spontaneous speech recordings from a sufficiently large number of speakers of each language. The experiments were carried out on recordings of Latvian and Latgalian dialects, but the method is applicable to any other idioms. Cosine similarity, the Euclidean metric, the standardized Euclidean metric, the Jordan (or Chebyshev) metric, and the city block (or L1) metric were tried. Cosine similarity worked well for SID i-vectors but, for unknown reasons, gave meaningless results for LID i-vectors. The Jordan metric worked well for LID i-vectors but was not good enough for SID i-vectors. Standardizing the Euclidean metric did not give any improvement. The conclusions are: 1) both SID and LID i-vectors of full-length recordings of spontaneous speech characterize and represent languages well enough to be used for determining a distance between languages; 2) the best metrics for such tasks are the Euclidean and L1 metrics (applied to arithmetic mean vectors computed coordinate by coordinate from the i-vectors of all informants).
Keywords: speech, idiom, language, dialect, i-vector, LID, SID, recording, proximity of languages, distance between languages.
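The five measures named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract gives no formulas, so the standard textbook definitions are assumed, and the standardized Euclidean metric is assumed to divide each squared coordinate difference by a per-coordinate variance. I-vector extraction is not shown; the three-dimensional vectors below are made-up toy data standing in for the i-vectors of individual informants, and all function and variable names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Coordinate-by-coordinate arithmetic mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (a similarity, not a distance)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def standardized_euclidean(u, v, variances):
    """Euclidean distance with each squared difference divided by an
    assumed per-coordinate variance."""
    return math.sqrt(sum((a - b) ** 2 / s for a, b, s in zip(u, v, variances)))

def chebyshev(u, v):
    """Chebyshev distance: the largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))

def city_block(u, v):
    """City block (L1) distance: the sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Toy stand-ins for informants' i-vectors in two idioms (hypothetical data).
idiom_a = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
idiom_b = [[0.0, 4.0, 2.0], [2.0, 6.0, 6.0]]

# One mean vector per idiom, as described in conclusion 2 of the abstract.
mean_a = mean_vector(idiom_a)
mean_b = mean_vector(idiom_b)

print("Euclidean: ", euclidean(mean_a, mean_b))
print("City block:", city_block(mean_a, mean_b))
print("Chebyshev: ", chebyshev(mean_a, mean_b))
print("Cosine sim:", cosine_similarity(mean_a, mean_b))
```

Averaging first and comparing afterwards reduces each idiom to a single point, so the cost of a comparison does not grow with the number of informants.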
Document Type: Article
Language: Russian
Citation: A. A. Bērziņš, “Usage of i-vectors for automated determination of a similarity level between languages”, Proceedings of ISP RAS, 31:5 (2019), 153–164
Citation in AMSBIB format
\Bibitem{Brz19}
\by A.~A.~B{\=e}rzi{\c n}{\v s}
\paper Usage of i-vectors for automated determination of a similarity level between languages
\jour Proceedings of ISP RAS
\yr 2019
\vol 31
\issue 5
\pages 153--164
\mathnet{http://mi.mathnet.ru/tisp461}
\crossref{https://doi.org/10.15514/ISPRAS-2019-31(5)-12}
Linking options:
  • https://www.mathnet.ru/eng/tisp461
  • https://www.mathnet.ru/eng/tisp/v31/i5/p153