M. S. Karyaeva, P. I. Braslavski, V. A. Sokolov, “Word embedding for semantically relative words: an experimental study”, Model. Anal. Inform. Sist., 25:6 (2018), 726–733

Modelirovanie i Analiz Informatsionnykh Sistem, 2018, Volume 25, Number 6, Pages 726–733
DOI: https://doi.org/10.18255/1818-1015-726-733 (Mi mais659)

Thesauri

Word embedding for semantically relative words: an experimental study

M. S. Karyaeva^a, P. I. Braslavski^b, V. A. Sokolov^a

^a P.G. Demidov Yaroslavl State University, 14 Sovetskaya str., Yaroslavl 150003, Russia
^b Ural Federal University, 19 Mira str., Ekaterinburg 620002, Russia

Abstract: The ability to identify semantic relations between words has made the word2vec model widely used in NLP tasks. Word2vec rests on a simple idea: the more similar the contexts of two words, the more similar the words are taken to be. Each word is represented as a vector, so words whose vectors lie close together can be interpreted as similar. This makes it possible to extract semantic relations (synonymy, hypernymy, hyponymy and others) automatically. Extracting semantic relations by hand is time-consuming and prone to bias, and it requires the help of experts. Unfortunately, the associative list of words that the word2vec model returns does not consist of semantically related words only. In this paper, we present some additional criteria that may help to solve this problem. Observations and experiments with well-known characteristics, such as word frequency and position in the associative list, might be useful for improving the extraction of semantic relations for the Russian language with word embeddings. The experiments use a word2vec model trained on Flibusta, with word pairs from Wiktionary serving as examples of semantic relationships. Semantically related words are applicable to thesauri, ontologies and intelligent systems for natural language processing.

Keywords: word embedding, word2vec, semantic relations, thesaurus, hyponymy, hypernymy, synonymy.
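
The idea in the abstract, ranking words by vector closeness and then pruning the resulting associative list with features such as list position and word frequency, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy vectors, the frequency counts, the function names, and the thresholds `max_rank` and `min_frequency` are hypothetical, and the abstract does not say how the authors combine these features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def associative_list(word, vectors, top_n=10):
    """Rank every other word by cosine similarity to `word`, most similar first."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def filter_candidates(neighbours, frequency, max_rank=3, min_frequency=5):
    """Keep neighbours that are near the top of the list and frequent enough.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    kept = []
    for rank, (other, score) in enumerate(neighbours, start=1):
        if rank <= max_rank and frequency.get(other, 0) >= min_frequency:
            kept.append((other, rank, score))
    return kept


# Toy 3-dimensional vectors and corpus counts, invented for this example.
toy_vectors = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.7, 0.3, 0.1],
    "road": [0.5, 0.5, 0.2],
    "banana": [0.0, 0.1, 0.9],
}
toy_frequency = {"automobile": 40, "vehicle": 120, "road": 2, "banana": 60}

neighbours = associative_list("car", toy_vectors, top_n=4)
candidates = filter_candidates(neighbours, toy_frequency, max_rank=3, min_frequency=5)

for other, rank, score in candidates:
    print(f"{rank}. {other} (cosine {score:.3f})")
```

With these toy values, "road" sits in the associative list but is dropped because of its low count, and "banana" is dropped because of its rank. Real vectors would come from a trained word2vec model rather than a hand-written dictionary.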

Funding agency	Grant number
Russian Foundation for Basic Research	16-07-01180_а 16-06-00497_а
The reported study was funded by RFBR according to research projects No. 16-07-01180 and No. 16-06-00497.

Received: 01.09.2018
Revised: 20.11.2018
Accepted: 25.11.2018

Document Type: Article

UDC: 004.912

Language: Russian

Citation: M. S. Karyaeva, P. I. Braslavski, V. A. Sokolov, “Word embedding for semantically relative words: an experimental study”, Model. Anal. Inform. Sist., 25:6 (2018), 726–733

Citation in format AMSBIB

\Bibitem{KarBraSok18}
\by M.~S.~Karyaeva, P.~I.~Braslavski, V.~A.~Sokolov
\paper Word embedding for semantically relative words: an experimental study
\jour Model. Anal. Inform. Sist.
\yr 2018
\vol 25
\issue 6
\pages 726--733
\mathnet{http://mi.mathnet.ru/mais659}
\crossref{https://doi.org/10.18255/1818-1015-726-733}

Linking options:

https://www.mathnet.ru/eng/mais659

https://www.mathnet.ru/eng/mais/v25/i6/p726
