Computer Research and Modeling
Computer Research and Modeling, 2021, Volume 13, Issue 6, Pages 1317–1336
DOI: https://doi.org/10.20537/2076-7633-2021-13-6-1317-1336
(Mi crm950)
 

MODELS OF ECONOMIC AND SOCIAL SYSTEMS

Bibliographic link prediction using contrast resampling technique

F. V. Krasnov, I. S. Smaznevich, E. N. Baskakova

NAUMEN R&D, 49A, Tatishcheva st., Yekaterinburg, 620028, Russian Federation
Abstract: The paper studies the problem of finding fragments with missing bibliographic links in a scientific article by means of automatic binary classification. To train the model, we propose a new contrast resampling technique. Its novelty is that it considers the context of the link, taking into account the boundaries of the fragment, which largely determines the probability that a bibliographic link is present in it. The training set was formed from automatically labeled samples, each a fragment of three sentences with the class label «without link» or «with link», that satisfy the contrast requirement: samples of different classes are distanced from each other in the source text. The feature space was built automatically from term occurrence statistics and was expanded with additional features: entities (names, numbers, quotes and abbreviations) recognized in the text.
A series of experiments was carried out on the archives of the scientific journals «Law enforcement review» (273 articles) and «Journal Infectology» (684 articles). Classification was performed with the Nearest Neighbors, RBF SVM, Random Forest, and Multilayer Perceptron models, with optimal hyperparameters selected for each classifier.
The experiments confirmed the proposed hypothesis. The highest accuracy (95 %) was reached by the neural network classifier, which is, however, slower than the linear one; the linear classifier also showed high accuracy with contrast resampling (91–94 %). These values are superior to those reported for NER and Sentiment Analysis on comparable data. The high computational efficiency of the proposed method makes it possible to integrate it into applied systems and to process documents online.
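The contrast requirement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how links are detected, how far apart the classes must be, or whether link markers are removed from the samples. The sketch assumes that links look like bracketed numbers such as [3], that a «with link» sample is a three-sentence window centered on a sentence containing a link, and that a «without link» sample is a window lying at least `min_gap` sentences away from every linked sentence. The names `LINK_RE`, `split_sentences`, `contrast_samples` and `min_gap` are hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: bibliographic links appear as bracketed numbers, e.g. [3] or [3, 7-9].
LINK_RE = re.compile(r"\[\d+(?:\s*[,\-–]\s*\d+)*\]")


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def contrast_samples(
    sentences: List[str], min_gap: int = 2, window: int = 3
) -> List[Tuple[str, int]]:
    """Build (fragment, label) pairs: 1 = «with link», 0 = «without link».

    Windows that are close to a link but not centered on one are skipped,
    so the two classes stay distanced in the source text.
    """
    has_link = [bool(LINK_RE.search(s)) for s in sentences]
    link_idx = [i for i, flag in enumerate(has_link) if flag]
    half = window // 2
    samples: List[Tuple[str, int]] = []
    for center in range(half, len(sentences) - half):
        lo, hi = center - half, center + half
        fragment = " ".join(sentences[lo:hi + 1])
        if has_link[center]:
            # Assumption: the marker is removed so a model cannot
            # simply learn to spot the bracket itself.
            samples.append((LINK_RE.sub("", fragment).strip(), 1))
        elif all(j <= lo - min_gap or j >= hi + min_gap for j in link_idx):
            # Every linked sentence is at least min_gap sentences
            # outside this window.
            samples.append((fragment, 0))
    return samples
```

Calling `contrast_samples(split_sentences(article_text))` on the plain text of an article yields labeled fragments that could then be vectorized and passed to a classifier. Larger values of `min_gap` give stronger contrast between the classes at the cost of fewer «without link» samples.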
Keywords: contrast resampling, citation analysis, data resampling, link prediction, text classification, artificial neural network.
Received: 30.07.2021
Revised: 14.09.2021
Accepted: 25.09.2021
Document Type: Article
UDC: 004.896, 004.584, 004.91, 519.688
Language: Russian
Citation: F. V. Krasnov, I. S. Smaznevich, E. N. Baskakova, “Bibliographic link prediction using contrast resampling technique”, Computer Research and Modeling, 13:6 (2021), 1317–1336
Citation in format AMSBIB
\Bibitem{KraSmaBas21}
\by F.~V.~Krasnov, I.~S.~Smaznevich, E.~N.~Baskakova
\paper Bibliographic link prediction using contrast resampling technique
\jour Computer Research and Modeling
\yr 2021
\vol 13
\issue 6
\pages 1317--1336
\mathnet{http://mi.mathnet.ru/crm950}
\crossref{https://doi.org/10.20537/2076-7633-2021-13-6-1317-1336}
Linking options:
  • https://www.mathnet.ru/eng/crm950
  • https://www.mathnet.ru/eng/crm/v13/i6/p1317