Abstract:
We consider two related discrete optimization problems of searching for a subset in a finite set of points in the Euclidean space. Both problems are induced by the versions of the fundamental problem in data analysis, namely, by selecting a subset of similar elements in a set of objects. In each problem, an input set and a positive real number are given, and it is required to find a cluster (i.e., a subset) of the largest size under constraints on the value of a quadratic clusterization function. The points in the input set which are outside the sought for subset are treated as the second (complementary) cluster. In the first problem, the function under the constraint is the sum over both clusters of the intracluster sums of the squared distances between the elements of the clusters and their centers. The center of the first (i.e., the sought) cluster is unknown and determined as the centroid, while the center of the second one is fixed at a given point in the Euclidean space (without loss of generality in the origin). In the second problem, the function under the constraint is the sum over both clusters of the weighted intracluster sums of the squared distances between the elements of the clusters and their centers. As in the first problem, the center of the first cluster is unknown and determined as the centroid, while the center of the second one is fixed in the origin. In this paper, we show that both problems are strongly NP-hard. Also, we present the exact algorithms for the cases of these problems in which the input points have integer components. If the space dimension is bounded by some constant, the algorithms are pseudopolynomial.
This work was supported by the Russian Science Foundation (project no. 16-11-10041, Problem 1),
by the Russian Foundation for Basic Research (projects no. 19-01-00308 and 18-31-00398-mol-a,
Problem 2), by the RAS Fundamental Research Program (project no. 0314-2016-0015), and by RF
Ministry of Education and Science (under Program 5-10).
Citation:
A. V. Kel'manov, A. V. Panasenko, V. I. Khandeev, “Exact algorithms of searching for the largest size cluster in two integer 2-clustering problems”, Sib. Zh. Vychisl. Mat., 22:2 (2019), 121–136; Num. Anal. Appl., 12:2 (2019), 105–115
\Bibitem{KelPanKha19}
\by A.~V.~Kel'manov, A.~V.~Panasenko, V.~I.~Khandeev
\paper Exact algorithms of searching for the largest size cluster in two integer 2-clustering problems
\jour Sib. Zh. Vychisl. Mat.
\yr 2019
\vol 22
\issue 2
\pages 121--136
\mathnet{http://mi.mathnet.ru/sjvm705}
\crossref{https://doi.org/10.15372/SJNM20190201}
\elib{https://elibrary.ru/item.asp?id=38170576}
\transl
\jour Num. Anal. Appl.
\yr 2019
\vol 12
\issue 2
\pages 105--115
\crossref{https://doi.org/10.1134/S1995423919020010}
\isi{https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=Publons&SrcAuth=Publons_CEL&DestLinkType=FullRecord&DestApp=WOS_CPL&KeyUT=000470691500001}
\scopus{https://www.scopus.com/record/display.url?origin=inward&eid=2-s2.0-85066927865}
Linking options:
https://www.mathnet.ru/eng/sjvm705
https://www.mathnet.ru/eng/sjvm/v22/i2/p121
This publication is cited in the following 4 articles:
V. Khandeev, S. Neshchadim, “Fast approximation algorithms for some maximin clustering problems”, Yugosl J Oper Rres, 34:2 (2024), 337
Vladimir Khandeev, Sergey Neshchadim, Communications in Computer and Information Science, 2239, Mathematical Optimization Theory and Operations Research: Recent Trends, 2024, 319
Vladimir Khandeev, Sergey Neshchadim, Lecture Notes in Computer Science, 13930, Mathematical Optimization Theory and Operations Research, 2023, 85
Vladimir Khandeev, Sergey Neshchadim, Communications in Computer and Information Science, 1661, Mathematical Optimization Theory and Operations Research: Recent Trends, 2022, 89