Zapiski Nauchnykh Seminarov POMI, 2021, Volume 499, Pages 206–221
(Mi znsl7060)
Word-based Russian text augmentation for character-level models
R. B. Galinsky (a), A. M. Alekseev (b, a), S. I. Nikolenko (a, b)
(a) St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences
(b) Saint Petersburg State University
Abstract:
Large-scale deep learning models, including models for natural language processing, require large training datasets that may be unavailable for low-resource languages or specialized domains. We address the low variability and small size of the data available for training NLP models by augmenting the data with synonyms. We design a novel augmentation scheme that replaces words with synonyms and reshuffles the words, apply it to the Russian language, and report improved results on the sentiment analysis task.
Key words and phrases:
Deep learning, natural language processing, data augmentation, sentiment analysis.
Received: 02.10.2020
Citation:
R. B. Galinsky, A. M. Alekseev, S. I. Nikolenko, “Word-based Russian text augmentation for character-level models”, Investigations on applied mathematics and informatics. Part I, Zap. Nauchn. Sem. POMI, 499, POMI, St. Petersburg, 2021, 206–221
Linking options:
https://www.mathnet.ru/eng/znsl7060 https://www.mathnet.ru/eng/znsl/v499/p206