source link: https://arxiv.org/abs/2207.07949

[Submitted on 16 Jul 2022]

A Nearly Tight Analysis of Greedy k-means++

The famous k-means++ algorithm of Arthur and Vassilvitskii [SODA 2007] is the most popular way of solving the k-means problem in practice. The algorithm is very simple: it samples the first center uniformly at random, and each of the following k-1 centers is a data point sampled with probability proportional to its squared distance to the closest center chosen so far. Afterward, Lloyd's iterative algorithm is run. The k-means++ algorithm is known to return a \Theta(\log k)-approximate solution in expectation.
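To make the seeding step concrete, here is a minimal NumPy sketch of k-means++ D^2 sampling; the function name and interface are our own illustration, not code from the paper:

import numpy as np

def kmeanspp_seeding(X, k, rng=None):
    """k-means++ seeding via D^2 sampling: the first center is chosen
    uniformly at random; each subsequent center is a data point sampled
    with probability proportional to its squared distance to the
    closest center chosen so far."""
    if rng is None:
        rng = np.random.default_rng()
    n = X.shape[0]
    centers = [X[rng.integers(n)]]                   # first center: uniform
    d2 = ((X - centers[0]) ** 2).sum(axis=1)         # current cost per point
    for _ in range(k - 1):
        idx = rng.choice(n, p=d2 / d2.sum())         # D^2 distribution
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers)

In a full run, these k centers would then be handed to Lloyd's algorithm as its initialization.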
In their seminal work, Arthur and Vassilvitskii [SODA 2007] asked about the guarantees for the following \emph{greedy} variant: in every step, we sample \ell candidate centers instead of one and then pick the one that minimizes the new cost. This is also how k-means++ is implemented in, e.g., the popular scikit-learn library [Pedregosa et al.; JMLR 2011].
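A minimal sketch of this greedy variant, extending the function above: in every step we draw \ell candidates from the same D^2 distribution and keep the one that minimizes the resulting cost. Again, the name and interface are our own illustration:

def greedy_kmeanspp_seeding(X, k, ell, rng=None):
    """Greedy k-means++ seeding: in every step, draw ell candidates from
    the D^2 distribution and keep the one minimizing the new k-means
    cost (the sum of squared distances to the closest chosen center)."""
    if rng is None:
        rng = np.random.default_rng()
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        cand = rng.choice(n, size=ell, p=d2 / d2.sum())   # ell candidate centers
        # squared distances from every point to every candidate, shape (ell, n)
        dist2 = ((X[None, :, :] - X[cand][:, None, :]) ** 2).sum(axis=2)
        new_d2 = np.minimum(d2[None, :], dist2)           # cost vector per candidate
        best = int(np.argmin(new_d2.sum(axis=1)))         # cheapest candidate wins
        centers.append(X[cand[best]])
        d2 = new_d2[best]
    return np.array(centers)

For reference, scikit-learn exposes this candidate count as the n_local_trials parameter of sklearn.cluster.kmeans_plusplus; to the best of our knowledge it defaults to 2 + floor(log k).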
We present nearly matching lower and upper bounds for greedy k-means++: we prove that it is an O(\ell^3 \log^3 k)-approximation algorithm. On the other hand, we prove a lower bound of \Omega(\ell^3 \log^3 k / \log^2(\ell \log k)). Previously, only an \Omega(\ell \log k) lower bound was known [Bhattacharya, Eube, Röglin, Schmidt; ESA 2020], and no upper bound was known at all.

Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Cite as: arXiv:2207.07949 [cs.DS]
  (or arXiv:2207.07949v1 [cs.DS] for this version)
  https://doi.org/10.48550/arXiv.2207.07949
