Abstract:
Variable selection is a difficult task in many areas of multivariate statistics, such as classification, clustering and regression. The hope is that the structure of interest is contained in only a small subset of the variables. In contrast to supervised classification such as discriminant analysis, variable selection in cluster analysis is much more difficult because usually nothing is known about the true class structure, and hence nothing is known about the number of clusters K inherent in the data.
There are many proposals for variable selection in cluster analysis based on special cluster separation measures, such as the criterion of Davies and Bouldin (1979). Here we present a general bottom-up approach to variable selection that uses non-parametric bootstrapping and stability criteria such as the adjusted Rand index (Hubert and Arabie, 1985). "General" means that the approach relies only on measures of the stability of partitions, so it can be applied to almost any cluster analysis method.
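For readers unfamiliar with the stability criterion named above, the following is a minimal sketch of the adjusted Rand index computed from the contingency table of two partitions. The function name `adjusted_rand_index` and the convention of returning 1.0 when both partitions are trivial are assumptions made for illustration; this is not the authors' implementation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two partitions of the same observations.

    Hypothetical helper for illustration; labels may be any hashable values.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same observations")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two observations counts as agreement

    # Pairs of observations placed together in both partitions (table cells),
    # in the first partition (row sums) and in the second (column sums).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2         # largest attainable agreement

    if max_index == expected:
        return 1.0  # assumption: two trivial, identical partitions agree fully
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # The index ignores how clusters are named, only who is grouped with whom.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

In a bootstrap setting, one plausible use is to compare the partition of the original data with the partition of each resample, restricted to the observations both contain, and to average the resulting values as a stability score for a candidate variable subset.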