Cross-lingual transfer learning in drug-related information extraction from user-generated texts
A. S. Sakhovskiy (a, b), E. V. Tutubalina (a, c, d)
a Kazan Federal University
b Lomonosov Moscow State University
c National Research University Higher School of Economics
d Sber AI
Abstract:
Aggregating knowledge about drug, disease, and drug reaction entities across a broad range of domains and languages is critical for information extraction (IE) applications. In this work, we present a fine-grained evaluation of the effectiveness of multilingual BERT-based models on biomedical named entity recognition (NER) and multi-label sentence classification tasks. We investigate the role of transfer learning (TL) strategies between two English corpora and a novel annotated corpus of Russian reviews about drug therapy. Sentence labels indicate the presence or absence of health-related issues. Sentences that contain a health-related issue are additionally labelled at the expression level to identify fine-grained subtypes such as drug names, drug indications, and drug reactions. Evaluation results demonstrate that BERT trained on raw Russian and English reviews (5M in total) shows the best transfer capabilities when evaluated on adverse drug reactions in Russian data. Our RuDR-BERT model achieves a macro F1 score of 74.85% on the NER task. On the classification task, our EnRuDR-BERT model achieves a macro F1 score of 70%, gaining 8.64% over the score of a general-domain BERT model.
Keywords:
natural language processing, text classification, information extraction, named entity recognition, BERT.
Citation:
A. S. Sakhovskiy, E. V. Tutubalina, “Cross-lingual transfer learning in drug-related information extraction from user-generated texts”, Proceedings of ISP RAS, 33:6 (2021), 217–228
Linking options:
https://www.mathnet.ru/eng/tisp656 https://www.mathnet.ru/eng/tisp/v33/i6/p217