Matematicheskaya Biologiya i Bioinformatika
Matematicheskaya Biologiya i Bioinformatika, 2019, Volume 14, Issue 2, Pages 464–476
DOI: https://doi.org/10.17537/2019.14.464
(Mi mbb396)
 

Information and Computer Technologies in Biology and Medicine

Cleaning data sets with diagnostic errors in the high-dimensional feature spaces

I. A. Borisova, O. A. Kutnenko

Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
Abstract: The paper proposes a new approach to data censoring that corrects diagnostic errors in data sets whose samples are described in high-dimensional feature spaces. This case is treated as a separate task because most outlier-detection and data-filtering methods, both statistical and metric, stop working in high-dimensional spaces. At the same time, in medical diagnostics a large number of descriptive characteristics is the norm rather than the exception, given the complexity of the objects and phenomena studied. To solve this problem, the authors propose an approach that focuses on local similarity between objects of the same class and uses the function of rival similarity (FRiS function) as the similarity measure. To clean the data of misclassified objects efficiently, the approach selects the most informative and relevant low-dimensional feature subspace, in which the separability of the classes after correction is maximal. Class separability here means that objects of one class are similar to each other and dissimilar to objects of another class. Cleaning the data of class errors can consist both in correcting the erroneous class labels and in removing outlier objects from the data set. The method was implemented as the FRiS-LCFS algorithm (FRiS Local Censoring with Feature Selection) and tested on model and real biomedical problems, including the diagnosis of prostate cancer from DNA microarray data. The algorithm proved competitive with standard methods for filtering data in high-dimensional spaces.
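The abstract names the FRiS function but does not state its formula. The sketch below assumes the form commonly given in earlier FRiS literature, F = (r_rival - r_own) / (r_rival + r_own), where r_own and r_rival are the distances from an object to its nearest neighbour in its own class and in the competing class. The helper names `fris` and `label_scores`, the Euclidean metric, the toy data, and the "flag if F < 0" rule are all illustrative assumptions. This is not the FRiS-LCFS algorithm and it performs no feature selection.

```python
import math


def fris(z, own, rival):
    """Rival similarity of point z to class `own` in competition with `rival`.

    Assumed form: F = (r_rival - r_own) / (r_rival + r_own), where each r is
    the Euclidean distance from z to the nearest object of that class.
    F lies in [-1, 1]; values near 1 mean z sits firmly inside `own`.
    """
    r_own = min(math.dist(z, x) for x in own)
    r_rival = min(math.dist(z, x) for x in rival)
    if r_own + r_rival == 0.0:
        return 0.0  # z coincides with objects of both classes
    return (r_rival - r_own) / (r_rival + r_own)


def label_scores(X, y):
    """FRiS value of every object with respect to its recorded label.

    Each object is excluded from its own class when searching for the
    nearest same-class neighbour.
    """
    scores = []
    for i, (z, label) in enumerate(zip(X, y)):
        own = [x for j, (x, c) in enumerate(zip(X, y)) if c == label and j != i]
        rival = [x for x, c in zip(X, y) if c != label]
        scores.append(fris(z, own, rival))
    return scores


# Toy data (hypothetical): four objects of class 0, two of class 1,
# and one object at (2, 2) that carries label 1 but lies next to class 0.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
     (5.0, 5.0), (5.0, 6.0), (2.0, 2.0)]
y = [0, 0, 0, 0, 1, 1, 1]

scores = label_scores(X, y)
# Illustrative rule only: an object that is closer to the rival class than
# to its own class (F < 0) is a candidate diagnostic error.
suspects = [i for i, s in enumerate(scores) if s < 0]
print(suspects)
```

On this toy set only the object at (2, 2) receives a negative value. Whether such an object should be relabelled or removed, and in which feature subspace the distances should be measured, is the question the paper's method addresses.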
Key words: pattern recognition, function of rival similarity, compactness, class separability, outlier detection, feature selection.
Received 04.07.2019, 04.10.2019, Published 07.10.2019
Document Type: Article
UDC: 519.95
Language: Russian
Citation: I. A. Borisova, O. A. Kutnenko, “Cleaning data sets with diagnostic errors in the high-dimensional feature spaces”, Mat. Biolog. Bioinform., 14:2 (2019), 464–476
Citation in format AMSBIB
\Bibitem{BorKut19}
\by I.~A.~Borisova, O.~A.~Kutnenko
\paper Cleaning data sets with diagnostic errors in the high-dimensional feature spaces
\jour Mat. Biolog. Bioinform.
\yr 2019
\vol 14
\issue 2
\pages 464--476
\mathnet{http://mi.mathnet.ru/mbb396}
\crossref{https://doi.org/10.17537/2019.14.464}
Linking options:
  • https://www.mathnet.ru/eng/mbb396
  • https://www.mathnet.ru/eng/mbb/v14/i2/p464