Proceedings of the Institute for System Programming of the RAS
Proceedings of the Institute for System Programming of the RAS, 2021, Volume 33, Issue 6, Pages 217–228
DOI: https://doi.org/10.15514/ISPRAS-2021-33(6)-15
(Mi tisp656)
 

Cross-lingual transfer learning in drug-related information extraction from user-generated texts

A. S. Sakhovskiy (a, b), E. V. Tutubalina (a, c, d)

a Kazan Federal University
b Lomonosov Moscow State University
c National Research University Higher School of Economics
d Sber AI
Abstract: Aggregating knowledge about drug, disease, and drug reaction entities across a broad range of domains and languages is critical for information extraction (IE) applications. In this work, we present a fine-grained evaluation of the efficiency of multilingual BERT-based models for biomedical named entity recognition (NER) and multi-label sentence classification. We investigate transfer learning (TL) strategies between two English corpora and a novel annotated corpus of Russian reviews about drug therapy. Each sentence is labelled for the presence or absence of health-related issues; sentences containing such issues are additionally labelled at the expression level to identify fine-grained subtypes such as drug names, drug indications, and drug reactions. Evaluation results demonstrate that BERT trained on raw Russian and English reviews (5M in total) shows the best transfer capabilities when evaluated on adverse drug reactions in Russian data. Our RuDR-BERT model achieves a macro F1 score of 74.85% on the NER task. On the classification task, our EnRuDR-BERT model achieves a macro F1 score of 70%, an 8.64% gain over a general-domain BERT model.
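The abstract reports macro-averaged F1 scores, i.e. per-class F1 averaged with equal weight so that rare entity types count as much as frequent ones. A minimal sketch of that metric in plain Python is shown below; the label names are hypothetical placeholders for the entity subtypes described above (drug names, drug indications, adverse drug reactions), not the corpus's actual tag set.

```python
def macro_f1(gold, pred, labels):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    scores = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Hypothetical per-token gold and predicted entity labels.
labels = ["Drugname", "DI", "ADR"]  # drug name, drug indication, adverse reaction
gold = ["Drugname", "ADR", "DI", "ADR", "Drugname"]
pred = ["Drugname", "ADR", "ADR", "ADR", "Drugname"]
print(macro_f1(gold, pred, labels))  # 0.6
```

Because the average is unweighted, a single poorly recognized class (here the missed "DI" span) pulls the macro score down sharply, which is why macro F1 is a stricter summary than accuracy for imbalanced NER tag sets.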
Keywords: natural language processing, text classification, information extraction, named entity recognition, BERT.
Funding agency: Ministry of Science and Higher Education of the Russian Federation, grant MK-3193.2021.1.6.
The work has been supported by a grant from the President of the Russian Federation for young scientists holding the candidate of science degree (MK-3193.2021.1.6).
Document Type: Article
Language: Russian
Citation: A. S. Sakhovskiy, E. V. Tutubalina, “Cross-lingual transfer learning in drug-related information extraction from user-generated texts”, Proceedings of ISP RAS, 33:6 (2021), 217–228
Citation in format AMSBIB
\Bibitem{SakTut21}
\by A.~S.~Sakhovskiy, E.~V.~Tutubalina
\paper Cross-lingual transfer learning in drug-related information extraction from user-generated texts
\jour Proceedings of ISP RAS
\yr 2021
\vol 33
\issue 6
\pages 217--228
\mathnet{http://mi.mathnet.ru/tisp656}
\crossref{https://doi.org/10.15514/ISPRAS-2021-33(6)-15}
Linking options:
  • https://www.mathnet.ru/eng/tisp656
  • https://www.mathnet.ru/eng/tisp/v33/i6/p217