Sistemy i Sredstva Informatiki [Systems and Means of Informatics]
Sistemy i Sredstva Informatiki [Systems and Means of Informatics], 2023, Volume 33, Issue 4, Pages 149–159
DOI: https://doi.org/10.14357/08696527230414
(Mi ssi919)
 

Class imbalance in the technology of concrete historical investigation support

I. M. Adamovich, O. I. Volkov

Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119133, Russian Federation
Abstract: This article continues a series of works on the technology of concrete historical investigation support, which is built on the principles of co-creation and crowdsourcing and is intended for a wide range of users who are nonprofessional historians and biographers. It further develops the topic of preparing data for the machine learning algorithms used in the technology. The special importance of binary classification for concrete historical research is shown. The problem of class imbalance in binary classification with machine learning algorithms is described, together with its consequences. It is shown that concrete historical data can be highly imbalanced. An overview of approaches to eliminating class imbalance is given. Based on an analysis of the specifics of concrete historical data, oversampling was chosen as the approach best suited to the technology. Algorithms implementing this approach are described, and their advantages and disadvantages are evaluated. The ADASYN algorithm was selected as the most promising for use in the technology. The article also evaluates how far the data noise and outlier control tools already included in the technology can compensate for the sensitivity of ADASYN to outliers.
Keywords: concrete historical investigation, distributed technology, machine learning, class imbalance, ADASYN algorithm.
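
To illustrate the oversampling idea named in the abstract, below is a minimal pure-Python sketch of ADASYN-style adaptive oversampling for binary classification. It is not the authors' implementation: the function name `adasyn`, its parameters (`k`, `beta`, `seed`), the uniform-weight fallback, and the toy data are all assumptions made for illustration. The sketch weights each minority example by the share of majority examples among its `k` nearest neighbours and generates more synthetic points, by interpolation towards minority neighbours, where that share is higher.

```python
import math
import random


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return synthetic minority points (illustrative ADASYN-style sketch)."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(y) - n_min
    # Number of synthetic points needed; beta=1.0 aims at a fully balanced set.
    total = round((n_maj - n_min) * beta)
    if total <= 0 or n_min < 2:
        return []

    def neighbours(i, candidates, count):
        # Indices of the `count` candidates closest to X[i], excluding i itself.
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:count]

    # Difficulty of each minority point: the fraction of majority examples
    # among its k nearest neighbours in the whole data set.
    ratios = []
    for i in min_idx:
        nn = neighbours(i, range(len(X)), k)
        ratios.append(sum(y[j] != minority for j in nn) / len(nn))
    norm = sum(ratios)
    if norm == 0:
        # Assumption of this sketch: if no minority point has majority
        # neighbours, spread the synthetic points uniformly.
        ratios = [1.0] * n_min
        norm = float(n_min)

    synthetic = []
    for i, r in zip(min_idx, ratios):
        count = round(r / norm * total)  # harder points receive more samples
        mates = neighbours(i, min_idx, k)  # minority-only neighbours
        for _ in range(count):
            j = rng.choice(mates)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Hypothetical toy data: 4 minority points (label 1), 12 majority points (label 0).
minority_pts = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.9, 0.9]]
majority_pts = [[1.0 + 0.1 * i, 1.0 + 0.1 * (i % 3)] for i in range(12)]
X = minority_pts + majority_pts
y = [1] * len(minority_pts) + [0] * len(majority_pts)
synthetic = adasyn(X, y, minority=1, k=3, seed=0)
print(len(synthetic), "synthetic minority points generated")
```

In this toy data the only minority point surrounded by majority examples is `[0.9, 0.9]`, so the sketch generates every synthetic point from it. If such a point were in fact an outlier, the synthetic data would amplify it, which is the kind of sensitivity to outliers the abstract refers to.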
Received: 20.07.2023
Document Type: Article
Language: Russian
Citation: I. M. Adamovich, O. I. Volkov, “Class imbalance in the technology of concrete historical investigation support”, Sistemy i Sredstva Inform., 33:4 (2023), 149–159
Citation in format AMSBIB
\Bibitem{AdaVol23}
\by I.~M.~Adamovich, O.~I.~Volkov
\paper Class imbalance in~the~technology of~concrete historical investigation support
\jour Sistemy i Sredstva Inform.
\yr 2023
\vol 33
\issue 4
\pages 149--159
\mathnet{http://mi.mathnet.ru/ssi919}
\crossref{https://doi.org/10.14357/08696527230414}
\edn{https://elibrary.ru/YDVCYC}
Linking options:
  • https://www.mathnet.ru/eng/ssi919
  • https://www.mathnet.ru/eng/ssi/v33/i4/p149