Weakly supervised word sense disambiguation using automatically labelled collections
A. S. Bolshina (a), N. V. Lukashevich (b)
(a) Lomonosov Moscow State University
(b) Research Computing Center of Lomonosov Moscow State University
Abstract:
State-of-the-art supervised word sense disambiguation models require large sense-tagged training sets. However, many low-resource languages, including Russian, lack such amounts of annotated data. To cope with the knowledge acquisition bottleneck in Russian, we first apply a method based on the concept of monosemous relatives to automatically generate a labelled training collection. We then introduce three weakly supervised models trained on these synthetic data. Our work builds upon the bootstrapping approach: starting from this seed of tagged instances, an ensemble of the classifiers labels samples drawn from unannotated corpora. Along with this method, several techniques are used to augment the new training examples. We show that even this simple bootstrapping approach based on an ensemble of weakly supervised models already produces an improvement over the initial word sense disambiguation models.
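The abstract does not give implementation details, but the bootstrapping step it describes (an ensemble of classifiers trained on the automatically labelled seed, which then tags examples from unannotated corpora) can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn, TF-IDF bag-of-words features over a target word's context, and an agreement-plus-confidence filter; the function name, the particular classifiers and the thresholds are illustrative assumptions, not the authors' actual setup.

```python
# Minimal bootstrapping sketch for one ambiguous target word.
# All names (bootstrap, thresholds, classifier choices) are hypothetical
# placeholders illustrating the general idea, not the paper's pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

def bootstrap(seed_texts, seed_senses, unlabeled_texts, rounds=3, threshold=0.9):
    """Grow the sense-tagged training set by letting an ensemble of
    classifiers label contexts drawn from an unannotated corpus."""
    texts, senses = list(seed_texts), list(seed_senses)
    pool = list(unlabeled_texts)
    for _ in range(rounds):
        if not pool:
            break
        vec = TfidfVectorizer()
        X = vec.fit_transform(texts)
        models = [
            LogisticRegression(max_iter=1000).fit(X, senses),
            MultinomialNB().fit(X, senses),
            RandomForestClassifier(n_estimators=200).fit(X, senses),
        ]
        U = vec.transform(pool)
        probas = [m.predict_proba(U) for m in models]   # each: (n_pool, n_senses)
        mean_proba = np.mean(probas, axis=0)
        votes = [m.classes_[p.argmax(axis=1)] for m, p in zip(models, probas)]
        agree = np.all([v == votes[0] for v in votes], axis=0)
        confident = mean_proba.max(axis=1) >= threshold
        keep = agree & confident
        # move unanimously and confidently labelled contexts into the training set
        new_senses = models[0].classes_[mean_proba[keep].argmax(axis=1)]
        texts += [t for t, k in zip(pool, keep) if k]
        senses += list(new_senses)
        pool = [t for t, k in zip(pool, keep) if not k]
    return texts, senses
```

Requiring both unanimous agreement among the classifiers and a high averaged probability trades coverage for precision of the newly added labels, which matters because labelling errors accumulate across bootstrapping rounds.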
Keywords:
word sense disambiguation, Russian dataset, RuWordNet.
Citation:
A. S. Bolshina, N. V. Lukashevich, “Weakly supervised word sense disambiguation using automatically labelled collections”, Proceedings of ISP RAS, 33:6 (2021), 193–204
Linking options:
https://www.mathnet.ru/eng/tisp654
https://www.mathnet.ru/eng/tisp/v33/i6/p193