Zapiski Nauchnykh Seminarov POMI, 2021, Volume 499, Pages 129–136
(Mi znsl7048)
Recovering word forms by context for morphologically rich languages
A. M. Alekseev [a], S. I. Nikolenko [a, b]
[a] St. Petersburg Department of Steklov Institute of Mathematics, St. Petersburg, Russia
[b] St. Petersburg State University, 7/9 Universitetskaya nab., St. Petersburg, 199034 Russia
Abstract:
In this work, we focus on “sentence-level unlemmatization”, the task of generating a grammatical sentence from a lemmatized one, which humans can usually do with ease. We treat this setting as a machine translation problem and, as a first attempt, apply a sequence-to-sequence model to the texts of Russian Wikipedia articles. We quantitatively evaluate the effect of different training set sizes and achieve a BLEU score of 67.3 with the largest training set available. We discuss preliminary results and the flaws of traditional machine translation evaluation methods for this task, and suggest directions for future research.
Key words and phrases:
deep learning, natural language processing, morphological agreement, machine translation.
Received: 02.10.2020
Citation:
A. M. Alekseev, S. I. Nikolenko, “Recovering word forms by context for morphologically rich languages”, Investigations on applied mathematics and informatics. Part I, Zap. Nauchn. Sem. POMI, 499, POMI, St. Petersburg, 2021, 129–136
Linking options:
https://www.mathnet.ru/eng/znsl7048 https://www.mathnet.ru/eng/znsl/v499/p129