Informatsionnye Tekhnologii i Vychslitel'nye Sistemy
Informatsionnye Tekhnologii i Vychslitel'nye Sistemy, 2022, Issue 3, Pages 35–42
DOI: https://doi.org/10.14357/20718632220304
(Mi itvs774)
 

INTELLIGENT SYSTEMS AND TECHNOLOGIES

Automatic training data filtering for errors removing and improving the quality of the final neural network

N. Z. Valishina (a, b), S. A. Ilyuhin (b, c), A. V. Sheshkus (b, c, d), V. L. Arlazarov

a Lomonosov Moscow State University, Prosp. 60-letiya Oktyabrya, 9, Moscow, 117312, Russia
b Smart Engines Service LLC
c Moscow Institute of Physics and Technology (State University), Prosp. 60-letiya Oktyabrya, 9, Moscow, 117312, Russia
d Federal Research Center "Computer Science and Control" of RAS, Prosp. 60-letiya Oktyabrya, 9, Moscow, 117312, Russia
Abstract: Real-world data are often dirty, which in most cases lowers the accuracy of a model trained on them. Supervised data correction is expensive and time-consuming, so one way to address the problem is to automate the cleaning process. In this paper, we consider automatic cleaning of the training data as a preprocessing technique for improving the quality of the trained network. The proposed iterative method is based on the assumption that polluted data are most likely located farther from the class median than clean data. It detects the noisy data and then removes them from the training set. Experiments on a generated synthetic dataset showed that the method cleans the data even at high levels of pollution and significantly improves the quality of the classifier.
Keywords: data cleaning, outlier(s) detection, mislabels, classifier, siamese neural network.
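The abstract gives only the core idea of the method: iterate, measure how far each sample lies from the median of its class, and remove the samples that lie too far. A minimal Python sketch of that idea follows. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function names `class_median` and `filter_far_from_median`, the Euclidean distance, the coordinate-wise median, and the cutoff `median(d) + k * MAD(d)`. The feature vectors are assumed to be precomputed (for example, embeddings from a siamese network, as the keywords suggest).

```python
import math
from statistics import median


def class_median(vectors):
    """Coordinate-wise median of a non-empty list of equal-length vectors."""
    return tuple(median(column) for column in zip(*vectors))


def filter_far_from_median(features, labels, k=3.0, max_iter=10):
    """Return the sorted indices of the samples kept after iterative filtering.

    Assumed rule (hypothetical, not the paper's criterion): within each class,
    a sample is dropped when its Euclidean distance to the class median exceeds
    median(d) + k * MAD(d), where d are the distances of the samples of that
    class that are still kept.
    """
    kept = set(range(len(features)))
    for _ in range(max_iter):
        removed = set()
        for cls in set(labels):
            idx = [i for i in kept if labels[i] == cls]
            if len(idx) < 3:  # too few samples to estimate a spread
                continue
            center = class_median([features[i] for i in idx])
            dist = {i: math.dist(tuple(features[i]), center) for i in idx}
            d_med = median(dist.values())
            mad = median(abs(d - d_med) for d in dist.values())
            threshold = d_med + k * mad
            removed.update(i for i in idx if dist[i] > threshold)
        if not removed:  # converged: no class has anything left to drop
            break
        kept -= removed  # class medians are re-estimated on the next pass
    return sorted(kept)
```

Re-estimating the median after every pass is what makes the sketch iterative: once gross outliers are gone, the class center and spread are computed from cleaner data, so milder outliers can surface in later passes. If many samples of a class are identical, the MAD can be zero and the cutoff becomes very strict, so a real implementation would likely need a floor on the spread.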
Document Type: Article
Language: English
Citation: N. Z. Valishina, S. A. Ilyuhin, A. V. Sheshkus, V. L. Arlazarov, “Automatic training data filtering for errors removing and improving the quality of the final neural network”, Informatsionnye Tekhnologii i Vychslitel'nye Sistemy, 2022, no. 3, 35–42
Citation in format AMSBIB
\Bibitem{ValIlyShe22}
\by N.~Z.~Valishina, S.~A.~Ilyuhin, A.~V.~Sheshkus, V.~L.~Arlazarov
\paper Automatic training data filtering for errors removing and improving the quality of the final neural network
\jour Informatsionnye Tekhnologii i Vychslitel'nye Sistemy
\yr 2022
\issue 3
\pages 35--42
\mathnet{http://mi.mathnet.ru/itvs774}
\crossref{https://doi.org/10.14357/20718632220304}
\elib{https://elibrary.ru/item.asp?id=49501757}
Linking options:
  • https://www.mathnet.ru/eng/itvs774
  • https://www.mathnet.ru/eng/itvs/y2022/i3/p35