I. A. Borisova, O. A. Kutnenko, “Cleaning data sets with diagnostic errors in the high-dimensional feature spaces”, Mat. Biolog. Bioinform., 14:2 (2019), 464


Matematicheskaya Biologiya i Bioinformatika


Matematicheskaya Biologiya i Bioinformatika, 2019, Volume 14, Issue 2, Pages 464–476
DOI: https://doi.org/10.17537/2019.14.464 (Mi mbb396)

Information and Computer Technologies in Biology and Medicine

Cleaning data sets with diagnostic errors in the high-dimensional feature spaces

I. A. Borisova, O. A. Kutnenko

Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia


Abstract: The paper proposes a new approach to data censoring that corrects diagnostic errors in data sets whose objects are described in high-dimensional feature spaces. This case is treated as a separate task because most outlier-detection and data-filtering methods, both statistical and metric, stop working in high-dimensional spaces. At the same time, in medical diagnostics, given the complexity of the objects and phenomena under study, a large number of descriptive features is the norm rather than the exception. To address this problem, an approach is proposed that focuses on local similarity between objects of the same class and uses the function of rival similarity (FRiS function) as the similarity measure. To clean the data of misclassified objects efficiently, the approach selects the most informative and relevant low-dimensional feature subspace, namely the one in which class separability after correction is maximal. Here, class separability means that objects of one class are similar to each other and dissimilar to objects of other classes. Cleaning the data of class errors may involve both correcting the labels and removing outlier objects from the data set. The method was implemented as the FRiS-LCFS algorithm (FRiS Local Censoring with Feature Selection) and tested on model and real biomedical problems, including prostate cancer diagnosis from DNA microarray data. The algorithm proved competitive with standard methods for filtering data in high-dimensional spaces.
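The abstract gives no formulas, so the following is only an illustrative sketch under stated assumptions, not the authors' FRiS-LCFS algorithm. It assumes the commonly cited form of the FRiS function, F = (r_rival - r_own) / (r_rival + r_own), where r_own is the distance from an object to the nearest other object of its own class and r_rival the distance to the nearest object of another class. The helper names (`fris_value`, `separability`, `censor`, `select_and_censor`), the choice of mean FRiS as the separability score, the two-class rule "flip the label when FRiS is negative", and greedy forward feature selection are all hypothetical simplifications. The sketch only corrects labels; it does not remove outliers.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def project(x: Point, feats: Sequence[int]) -> List[float]:
    # Keep only the coordinates of the chosen feature subspace.
    return [x[j] for j in feats]


def fris_value(i: int, X: Sequence[Point], y: Sequence[int], feats: Sequence[int]) -> float:
    # Assumed FRiS form (hypothetical variant): (r_rival - r_own) / (r_rival + r_own),
    # r_own = distance to the nearest other object of i's class,
    # r_rival = distance to the nearest object of any other class. Range [-1, 1].
    zi = project(X[i], feats)
    r_own = r_rival = math.inf
    for k, xk in enumerate(X):
        if k == i:
            continue
        d = math.dist(zi, project(xk, feats))
        if y[k] == y[i]:
            r_own = min(r_own, d)
        else:
            r_rival = min(r_rival, d)
    if math.isinf(r_own) and math.isinf(r_rival):
        return 0.0  # single object in the whole set: no evidence either way
    if math.isinf(r_own):
        return -1.0  # singleton class: treated as fully rival-like
    if math.isinf(r_rival):
        return 1.0  # only one class present
    if r_own + r_rival == 0.0:
        return 0.0  # identical copies exist in both classes
    return (r_rival - r_own) / (r_rival + r_own)


def separability(X: Sequence[Point], y: Sequence[int], feats: Sequence[int]) -> float:
    # Hypothetical separability score: mean FRiS value over all objects.
    return sum(fris_value(i, X, y, feats) for i in range(len(X))) / len(X)


def censor(X: Sequence[Point], y: Sequence[int], feats: Sequence[int]) -> List[int]:
    # Two-class labels 0/1 assumed: flip every object that is closer to the
    # rival class than to its own (negative FRiS). All flips use the input labels.
    return [1 - y[i] if fris_value(i, X, y, feats) < 0 else y[i] for i in range(len(X))]


def select_and_censor(
    X: Sequence[Point], y: Sequence[int], max_feats: int
) -> Tuple[List[int], List[int], float]:
    # Greedy forward selection: at each step add the feature whose subspace gives
    # the highest separability AFTER censoring the original labels in that subspace.
    # Stop when no candidate improves the score or max_feats is reached.
    chosen: List[int] = []
    remaining = list(range(len(X[0])))
    best_y, best_score = list(y), -math.inf
    while remaining and len(chosen) < max_feats:
        scored = []
        for j in remaining:
            feats = chosen + [j]
            y_fix = censor(X, y, feats)
            scored.append((separability(X, y_fix, feats), j, y_fix))
        score, j, y_fix = max(scored)  # ties on score are broken by feature index
        if score <= best_score:
            break
        chosen.append(j)
        remaining.remove(j)
        best_y, best_score = y_fix, score
    return chosen, best_y, best_score
```

Usage on a toy set where feature 0 separates the classes, feature 1 is noise, and the last object carries a wrong label:

```python
X = [[0.0, 0.0], [0.1, 2.0], [1.0, 4.0], [1.1, 6.0],
     [10.0, 1.0], [10.1, 3.0], [11.0, 5.0], [0.55, 7.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
chosen, fixed, score = select_and_censor(X, y, max_feats=2)
print(chosen, fixed)  # [0] [0, 0, 0, 0, 1, 1, 1, 0]
```

Selecting the subspace jointly with the correction illustrates the point made in the abstract: adding the noise feature here makes distances less informative and would spuriously flip a correct label, so the search stops at the one-dimensional subspace.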

Key words: pattern recognition, function of rival similarity, compactness, class separability, outlier detection, feature selection.

Funding agency	Grant number
Russian Academy of Sciences - Federal Agency for Scientific Organizations	0314-2019-0015

Received 04.07.2019, 04.10.2019, Published 07.10.2019

Document Type: Article

UDC: 519.95

Language: Russian

Citation: I. A. Borisova, O. A. Kutnenko, “Cleaning data sets with diagnostic errors in the high-dimensional feature spaces”, Mat. Biolog. Bioinform., 14:2 (2019), 464–476

Citation in format AMSBIB

\Bibitem{BorKut19}
\by I.~A.~Borisova, O.~A.~Kutnenko
\paper Cleaning data sets with diagnostic errors in the high-dimensional feature spaces
\jour Mat. Biolog. Bioinform.
\yr 2019
\vol 14
\issue 2
\pages 464--476
\mathnet{http://mi.mathnet.ru/mbb396}
\crossref{https://doi.org/10.17537/2019.14.464}

Linking options:

https://www.mathnet.ru/eng/mbb396

https://www.mathnet.ru/eng/mbb/v14/i2/p464
