F. V. Krasnov, I. S. Smaznevich, E. N. Baskakova, “Text sampling strategies for predicting missing bibliographic links”, Proceedings of ISP RAS, 34:2 (2022), 77

Proceedings of the Institute for System Programming of the RAS

Proceedings of the Institute for System Programming of the RAS, 2022, Volume 34, Issue 2, Pages 77–88
DOI: https://doi.org/10.15514/ISPRAS-2022-34(2)-7 (Mi tisp679)

Text sampling strategies for predicting missing bibliographic links

F. V. Krasnov, I. S. Smaznevich, E. N. Baskakova

NAUMEN

Abstract: The paper proposes several strategies for sampling text data in automatic sentence classification aimed at detecting missing bibliographic links. Samples are built from sentences, treated as the semantic units of the text, together with their immediate context of several neighbouring sentences. The strategies examined differ in the size and position of that context. The experiment is carried out on a collection of STEM scientific papers. Including sentence context in the samples improves classification results. The optimal sampling strategy for a given text collection is determined automatically through ensemble voting over classifications of the same data sampled in different ways. A context-aware sampling strategy combined with hard voting achieves an F1-score of 98%. This method of detecting missing bibliographic links can be used in recommendation engines of applied intelligent information systems.
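
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method: each target sentence is turned into several samples that differ in how many neighbouring sentences are attached before and after it, and the per-sample predictions are combined by hard (majority) voting. The function names, the `STRATEGIES` list, and the window sizes are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def sample_with_context(
    sentences: Sequence[str], index: int, before: int, after: int
) -> str:
    """Join the target sentence with up to `before` preceding and
    `after` following sentences (windows are clipped at text borders)."""
    start = max(0, index - before)
    end = min(len(sentences), index + after + 1)
    return " ".join(sentences[start:end])


# Hypothetical strategies as (sentences before, sentences after);
# the paper's actual context sizes and positions are not stated here.
STRATEGIES: List[Tuple[int, int]] = [(0, 0), (1, 0), (0, 1), (1, 1)]


def build_samples(sentences: Sequence[str], index: int) -> List[str]:
    """Produce one sample per strategy for the sentence at `index`."""
    return [sample_with_context(sentences, index, b, a) for b, a in STRATEGIES]


def hard_vote(labels: Sequence[int]) -> int:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]
```

In such a setup, one classifier would be trained per strategy, each would label its own sample of the same sentence, and `hard_vote` would combine those labels into the final decision on whether a bibliographic link is missing.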

Keywords: text sampling, sampling strategy, citation analysis, prediction of bibliographic references, proposition classification

Document Type: Article

Language: Russian

Citation: F. V. Krasnov, I. S. Smaznevich, E. N. Baskakova, “Text sampling strategies for predicting missing bibliographic links”, Proceedings of ISP RAS, 34:2 (2022), 77–88

Citation in AMSBIB format

\Bibitem{KraSmaBas22}
\by F.~V.~Krasnov, I.~S.~Smaznevich, E.~N.~Baskakova
\paper Text sampling strategies for predicting missing bibliographic links
\jour Proceedings of ISP RAS
\yr 2022
\vol 34
\issue 2
\pages 77--88
\mathnet{http://mi.mathnet.ru/tisp679}
\crossref{https://doi.org/10.15514/ISPRAS-2022-34(2)-7}

Linking options:

https://www.mathnet.ru/eng/tisp679

https://www.mathnet.ru/eng/tisp/v34/i2/p77
