Sistemy i Sredstva Informatiki [Systems and Means of Informatics], 2008, special issue, Pages 6–15
(Mi ssi151)
On a simulation approach to cluster stability validation
Zeev Barzily, Mati Golani, Zeev Volkovich
Software Engineering Department, ORT Braude College of Engineering
Abstract:
In this paper we outline a new approach to the problem of determining the “true number of clusters”. Our method combines the stability and density-concentration approaches. In the spirit of density estimation methodology, we regard each cluster as an island of “high” item density in a sea of “low” density. In addition, following the cluster steadiness concept, we suggest that these islands are “resistant” to random noise; in other words, we assume that adding noise to the attributes of the data elements does not change the cluster structure. A second novelty of our approach is the proposal to measure the similarity between source-data clusters and noisy-data clusters by means of two-sample test statistics, represented by probability metrics (distances). Such a pair of samples, source and noisy, appears to be an appropriate basis for determining the true number of clusters. Because the two samples closely resemble each other, their similarity within the partitions is expected to be greatest under the true number of clusters. According to our model, the true number of clusters corresponds to the empirical distance distribution that is most concentrated at zero. Thus, our procedure can be viewed as constructing an empirical normalized distance distribution and then testing its concentration at zero. This test is carried out by means of the sample mean and the size of the sample first quartile.
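The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: one-dimensional data, Gaussian noise, a plain k-means with quantile-based initial centers, clustering of the pooled source and noisy items, and the Kolmogorov-Smirnov distance as the two-sample probability metric. The normalization of the distances mentioned in the abstract is omitted, and the names `ks_distance`, `kmeans_1d`, and `distance_summary` are hypothetical.

```python
import random
import statistics
from bisect import bisect_right


def ks_distance(a, b):
    """Kolmogorov-Smirnov distance between two samples (assumed metric)."""
    a, b = sorted(a), sorted(b)
    # Largest gap between the two empirical distribution functions.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def kmeans_1d(points, k, iters=50):
    """Plain k-means on 1-D data; initial centers sit at evenly spaced quantiles."""
    ordered = sorted(points)
    n = len(ordered)
    centers = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = statistics.fmean(members)
    return labels


def distance_summary(data, k, noise_sd=0.1, n_rep=20, seed=0):
    """Return (sample mean, first quartile) of within-cluster distances for k clusters."""
    rng = random.Random(seed)
    n = len(data)
    distances = []
    for _ in range(n_rep):
        # Noisy copy of the source data (Gaussian noise is an assumption).
        noisy = [x + rng.gauss(0.0, noise_sd) for x in data]
        labels = kmeans_1d(list(data) + noisy, k)
        for j in range(k):
            src = [x for x, lab in zip(data, labels[:n]) if lab == j]
            nsy = [x for x, lab in zip(noisy, labels[n:]) if lab == j]
            # A cluster missing one of the two samples counts as maximally distant.
            distances.append(ks_distance(src, nsy) if src and nsy else 1.0)
    return statistics.fmean(distances), statistics.quantiles(distances, n=4)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    # Three synthetic "islands" of high density on the real line.
    data = [rng.gauss(c, 0.3) for c in (0.0, 5.0, 10.0) for _ in range(40)]
    for k in range(2, 6):
        mean, q1 = distance_summary(data, k)
        print(f"k={k}: mean={mean:.3f}, first quartile={q1:.3f}")
```

Under the model described in the abstract, the candidate number of clusters whose distance distribution is most concentrated at zero (small mean and small first quartile) would be preferred. This simplified sketch only shows how the two summaries are computed and is not guaranteed to recover the true number of clusters.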
Citation:
Zeev Barzily, Mati Golani, Zeev Volkovich, “On a simulation approach to cluster stability validation”, Sistemy i Sredstva Inform., 2008, special issue, 6–15
Linking options:
https://www.mathnet.ru/eng/ssi151 https://www.mathnet.ru/eng/ssi/v18/i4/p6