Zapiski Nauchnykh Seminarov POMI, 2021, Volume 499, Pages 248–266
(Mi znsl7052)
Robust word vectors: context-informed embeddings for noisy texts
T. Khakhulin (a), V. Logacheva (b), V. Malykh (c, b, d)
a Skolkovo Institute of Science and Technology, Nobelya Ulitsa, 3, 121205, Moscow, Russia
b Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region
c Steklov Institute of Mathematics at St. Petersburg, nab. r. Fontanki, 27, 191023, St. Petersburg
d Institute for Systems Analysis, Federal Research Center “Computer Science and Control” of Russian Academy of Sciences, pr. 60-letiya Oktyabrya, 9, 117312, Moscow
Abstract:
We propose a new language-independent architecture of robust word vectors (RoVe). It is designed to mitigate typos and misspellings, which are common in almost any user-generated content and hinder automatic text processing. Our model is morphologically motivated, which allows it to handle unseen word forms in morphologically rich languages. We report results for the proposed model and a variety of related architectures on a number of natural language processing (NLP) tasks and languages, and show that the proposed architecture is robust to typos.
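The abstract does not spell out how RoVe encodes a word. As a hypothetical illustration of how a character-level, morphologically motivated encoding can tolerate typos, the Python sketch below assumes a Begin/Middle/End scheme. The first and last k characters (roughly, the prefix and the ending) are encoded position by position, while the middle of the word is reduced to a bag of characters, so letter transpositions inside the stem leave the vector unchanged. The names `one_hot`, `bme_encode` and `cosine`, the lowercase ASCII alphabet and k = 3 are assumptions made for illustration, not the authors' code. In a full model such per-word vectors would presumably be passed through a context encoder to produce context-informed embeddings; that step is omitted here.

```python
import math
import string

# Assumed alphabet: lowercase ASCII letters; other characters map to zero vectors.
ALPHABET = string.ascii_lowercase
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
N = len(ALPHABET)


def one_hot(ch):
    """One-hot vector for a single character (all zeros if not in the alphabet)."""
    vec = [0] * N
    if ch in INDEX:
        vec[INDEX[ch]] = 1
    return vec


def bme_encode(word, k=3):
    """Hypothetical Begin/Middle/End character encoding of a word.

    B: one-hot vectors of the first k characters, concatenated (order kept).
    M: character counts of the middle part (order discarded).
    E: one-hot vectors of the last k characters, concatenated (order kept).
    Missing positions in short words are padded with zero vectors.
    The result has length (2 * k + 1) * N.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    word = word.lower()
    begin, end = word[:k], word[-k:]
    middle = word[k:-k] if len(word) > 2 * k else ""

    vec = []
    for i in range(k):
        vec += one_hot(begin[i]) if i < len(begin) else [0] * N
    counts = [0] * N
    for ch in middle:
        if ch in INDEX:
            counts[INDEX[ch]] += 1
    vec += counts
    for i in range(k):
        vec += one_hot(end[i]) if i < len(end) else [0] * N
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# A transposition inside the stem leaves the vector unchanged:
print(bme_encode("understanding") == bme_encode("understnading"))  # True
# A typo in the first k letters changes the B part, so similarity drops below 1:
print(cosine(bme_encode("understanding"), bme_encode("nuderstanding")))
```

Keeping the word edges order-sensitive while pooling the middle is one plausible way to trade off morphological information (affixes) against robustness to in-stem typos; the actual model may differ.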
Key words and phrases:
word vectors, distributed representations, natural language processing.
Received: 14.01.2019
Citation:
T. Khakhulin, V. Logacheva, V. Malykh, “Robust word vectors: context-informed embeddings for noisy texts”, Investigations on applied mathematics and informatics. Part I, Zap. Nauchn. Sem. POMI, 499, POMI, St. Petersburg, 2021, 248–266
Linking options:
https://www.mathnet.ru/eng/znsl7052
https://www.mathnet.ru/eng/znsl/v499/p248