I. A. Borisova, O. A. Kutnenko, “The problem of correction diagnostic errors in the target attribute with the function of rival similarity”, Mat. Biolog. Bioinform., 13:1 (2018), 38

Matematicheskaya Biologiya i Bioinformatika

Matematicheskaya Biologiya i Bioinformatika, 2018, Volume 13, Issue 1, Pages 38–49
DOI: https://doi.org/10.17537/2018.13.38 (Mi mbb326)

This article is cited in 2 scientific papers.

Data mining

The problem of correction diagnostic errors in the target attribute with the function of rival similarity

I. A. Borisova, O. A. Kutnenko

Institute of Mathematics SB RAS, Novosibirsk, Russia

Abstract: Outlier detection is an important problem in the data mining of biomedical datasets, particularly when a dataset may contain misclassified objects caused by diagnostic errors at the data collection stage. Such objects complicate and slow down data processing, distort the detected regularities, and reduce their accuracy. We propose a censoring algorithm that detects misclassified objects; the detected objects are then either removed from the dataset or have their class attribute corrected. Correction keeps the analyzed dataset as large as possible, which is especially useful when analyzing small datasets, where every bit of information may matter. The central concept of this work is a measure of the similarity of an object to its surroundings. To evaluate the local similarity of an object to its nearest neighbors, we use a ternary relative measure called the function of rival similarity (FRiS-function). The mean similarity value over all objects in the dataset characterizes class separability: how close objects of the same class are to each other and how far they are from objects of other classes (with a different diagnosis) in the attribute space. Misclassified objects are assumed to be more similar to objects of rival classes than to objects of their own class, so removing them from the dataset or correcting their target attribute should increase data separability. The procedure for filtering and correcting misclassified objects is based on monitoring how the separability estimate changes before and after corrections are made to the dataset. The censoring process continues until the separability function reaches its inflection point. The proposed algorithm was tested on a wide range of model tasks of varying complexity.
The algorithm was also tested on biomedical tasks, namely the Pima Indians Diabetes, Breast Cancer, and Parkinson datasets. On these tasks the censoring algorithm showed high sensitivity to misclassified objects. The increase in accuracy and the preservation of dataset volume after censoring confirmed our basic assumptions and the effectiveness of the algorithm.
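The abstract does not give explicit formulas, so the following is only a minimal Python sketch of the idea under stated assumptions, not the authors' implementation. It assumes Euclidean distance and the commonly cited form of the FRiS-function, F(z) = (r2 - r1) / (r2 + r1), where r1 is the distance from object z to its nearest other object of its own class and r2 is the distance to its nearest object of a rival class. F(z) lies in [-1, 1] and is negative when z is closer to a rival class. Separability is taken as the mean of F over all objects. The censoring loop is a simplified greedy variant that only corrects labels (relabelling an object to the class of its nearest rival) and, in place of the inflection-point criterion mentioned in the abstract, stops when the best correction raises separability by less than a hypothetical threshold `min_gain`. The names `fris`, `separability`, `nearest_rival_label`, and `censor` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def fris(i: int, X: Sequence[Point], y: Sequence[int]) -> float:
    """Rival similarity of object i to its own class (assumed nearest-neighbour form).

    r_own:   distance to the nearest *other* object with the same label
    r_rival: distance to the nearest object with a different label
    Returns (r_rival - r_own) / (r_rival + r_own), a value in [-1, 1].
    """
    r_own, r_rival = math.inf, math.inf
    for j, (xj, yj) in enumerate(zip(X, y)):
        if j == i:
            continue
        d = math.dist(X[i], xj)
        if yj == y[i]:
            r_own = min(r_own, d)
        else:
            r_rival = min(r_rival, d)
    # Degenerate cases: a missing own-class or rival neighbour, or coincident points.
    if math.isinf(r_own) and math.isinf(r_rival):
        return 0.0
    if math.isinf(r_own):
        return -1.0
    if math.isinf(r_rival):
        return 1.0
    if r_own + r_rival == 0.0:
        return 0.0
    return (r_rival - r_own) / (r_rival + r_own)


def separability(X: Sequence[Point], y: Sequence[int]) -> float:
    """Mean FRiS value over all objects: higher means better-separated classes."""
    return sum(fris(i, X, y) for i in range(len(X))) / len(X)


def nearest_rival_label(i: int, X: Sequence[Point], y: Sequence[int]) -> int:
    """Label of the nearest object whose label differs from y[i]."""
    best_d, best_label = math.inf, y[i]
    for j, (xj, yj) in enumerate(zip(X, y)):
        if j != i and yj != y[i]:
            d = math.dist(X[i], xj)
            if d < best_d:
                best_d, best_label = d, yj
    return best_label


def censor(X: Sequence[Point], y: Sequence[int],
           min_gain: float = 1e-3) -> Tuple[List[int], List[float]]:
    """Greedily relabel the object whose correction most increases separability.

    Returns the corrected labels and the separability value after each step.
    """
    labels = list(y)
    history = [separability(X, labels)]
    for _ in range(len(labels)):  # cap the number of corrections at the number of objects
        best_gain, best_i, best_label = 0.0, None, None
        for i in range(len(labels)):
            if fris(i, X, labels) >= 0:
                continue  # only objects closer to a rival class are candidates
            trial = labels.copy()
            trial[i] = nearest_rival_label(i, X, labels)
            gain = separability(X, trial) - history[-1]
            if gain > best_gain:
                best_gain, best_i, best_label = gain, i, trial[i]
        # Stand-in stopping rule (assumption): no correction is worth making.
        if best_i is None or best_gain < min_gain:
            break
        labels[best_i] = best_label
        history.append(separability(X, labels))
    return labels, history


if __name__ == "__main__":
    # Two well-separated clusters; the object at (1, 1) carries the wrong label.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
    y = [0, 0, 0, 1, 1, 1, 1, 1]
    labels, history = censor(X, y)
    print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
    print(history)  # separability before and after the single correction
```

Each step recomputes separability for every candidate, so this sketch is only practical for small illustrative datasets. Removing objects instead of relabelling them would follow the same loop, with the candidate dropped from `X` and `y` rather than relabelled.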

Key words: outlier detection, function of rival similarity, compactness, class separability, classification.

Funding agency: Russian Foundation for Basic Research, grant number 16-07-00168_а

Received 31.01.2018, Published 27.03.2018

Document Type: Article

UDC: 519.95

Language: Russian

Citation: I. A. Borisova, O. A. Kutnenko, “The problem of correction diagnostic errors in the target attribute with the function of rival similarity”, Mat. Biolog. Bioinform., 13:1 (2018), 38–49

Citation in format AMSBIB

\Bibitem{BorKut18}
\by I.~A.~Borisova, O.~A.~Kutnenko
\paper The problem of correction diagnostic errors in the target attribute with the function of rival similarity
\jour Mat. Biolog. Bioinform.
\yr 2018
\vol 13
\issue 1
\pages 38--49
\mathnet{http://mi.mathnet.ru/mbb326}
\crossref{https://doi.org/10.17537/2018.13.38}

Linking options:

https://www.mathnet.ru/eng/mbb326

https://www.mathnet.ru/eng/mbb/v13/i1/p38

This publication is cited in the following 2 articles:

O. A. Kutnenko, A. V. Plyasunov, “NP-hardness of some data cleaning problem”, J. Appl. Industr. Math., 15:2 (2021), 285–291
I. A. Borisova, O. A. Kutnenko, “Cleaning data of diagnostic errors in high-dimensional feature spaces”, Mat. Biolog. Bioinform., 14:2 (2019), 464–476
