A. S. Sakhovskiy, E. V. Tutubalina, “Cross-lingual transfer learning in drug-related information extraction from user-generated texts”, Proceedings of ISP RAS, 33:6 (2021), 217–228

Proceedings of the Institute for System Programming of the RAS

Proceedings of the Institute for System Programming of the RAS, 2021, Volume 33, Issue 6, Pages 217–228
DOI: https://doi.org/10.15514/ISPRAS-2021-33(6)-15 (Mi tisp656)

Cross-lingual transfer learning in drug-related information extraction from user-generated texts

A. S. Sakhovskiy^{a,b}, E. V. Tutubalina^{a,c,d}

^a Kazan Federal University
^b Lomonosov Moscow State University
^c National Research University Higher School of Economics
^d Sber AI

Abstract: Aggregating knowledge about drug, disease, and drug reaction entities across a broad range of domains and languages is critical for information extraction (IE) applications. In this work, we present a fine-grained evaluation of the effectiveness of multilingual BERT-based models for biomedical named entity recognition (NER) and multi-label sentence classification. We investigate transfer learning (TL) strategies between two English corpora and a novel annotated corpus of Russian reviews about drug therapy. Each sentence is labelled with the health-related issues it mentions or marked as mentioning none; sentences that mention at least one issue are additionally annotated at the expression level to identify fine-grained subtypes such as drug names, drug indications, and drug reactions. Evaluation results demonstrate that a BERT model trained on raw Russian and English reviews (5M in total) shows the best transfer capabilities for adverse drug reactions on Russian data. Our RuDR-BERT model achieves a macro F1 score of 74.85% on the NER task. On the classification task, our EnRuDR-BERT model achieves a macro F1 score of 70%, an 8.64% gain over a general-domain BERT model.
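The macro F1 scores in the abstract average a per-class F1 over the label set, so every class counts equally regardless of how frequent it is. The following is a minimal Python sketch of this standard definition for the multi-label sentence classification setting, assuming each sentence carries a set of gold labels and a set of predicted labels. The `macro_f1` helper and the label names (`ADR`, `DI`, `none`) are hypothetical illustrations, not the authors' code or label inventory, and the paper's exact evaluation protocol (for example, how classes with no predictions are scored) is not stated on this page.

```python
from typing import List, Set


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Macro-averaged F1 for multi-label classification (hypothetical helper).

    For each label, count true positives, false positives, and false negatives
    over all sentences, compute that label's F1, then take the unweighted mean.
    A label with zero precision and recall contributes an F1 of 0.0 (assumed convention).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(labels)


# Toy example with hypothetical labels: adverse drug reaction, drug indication, no issue.
labels = ["ADR", "DI", "none"]
gold = [{"ADR"}, {"DI"}, {"none"}, {"ADR", "DI"}]
pred = [{"ADR"}, {"none"}, {"none"}, {"ADR"}]

# Per-label F1: ADR = 1.0, DI = 0.0, none = 2/3, so the macro average is 5/9.
print(round(macro_f1(gold, pred, labels), 4))  # 0.5556
```

The toy example shows why macro averaging is strict: the model never predicts `DI`, and that single missed class pulls the average down by a third even though `ADR` is predicted perfectly.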

Keywords: natural language processing, text classification, information extraction, named entity recognition, BERT.

Funding agency	Grant number
Ministry of Science and Higher Education of the Russian Federation	МК-3193.2021.1.6
This work was supported by a grant from the President of the Russian Federation for young scientists holding a Candidate of Sciences degree (МК-3193.2021.1.6).

Document Type: Article

Language: Russian

Citation: A. S. Sakhovskiy, E. V. Tutubalina, “Cross-lingual transfer learning in drug-related information extraction from user-generated texts”, Proceedings of ISP RAS, 33:6 (2021), 217–228

Citation in AMSBIB format

\Bibitem{SakTut21}
\by A.~S.~Sakhovskiy, E.~V.~Tutubalina
\paper Cross-lingual transfer learning in drug-related information extraction from user-generated texts
\jour Proceedings of ISP RAS
\yr 2021
\vol 33
\issue 6
\pages 217--228
\mathnet{http://mi.mathnet.ru/tisp656}
\crossref{https://doi.org/10.15514/ISPRAS-2021-33(6)-15}

Linking options:

https://www.mathnet.ru/eng/tisp656

https://www.mathnet.ru/eng/tisp/v33/i6/p217
