# Improved clustering and robust moment estimation via sum-of-squares

with Pravesh Kothari, Jacob Steinhardt. **STOC 2018.**

(merger of arxiv:1711.07465 [cs.DS] and arxiv:1711.11581 [cs.LG])

## abstract

We develop efficient algorithms for clustering and robust moment estimation. Via convex sum-of-squares relaxations, our algorithms can exploit bounds on higher-order moments of the underlying distributions. In this way, our algorithms achieve substantially stronger guarantees than previous approaches.

For example, our clustering algorithm can learn mixtures of $k$ spherical Gaussians in time $n^{O(t)}$ as long as the means have minimum separation $\sqrt t \cdot k^{1/t}$ for any $t\ge 4$, significantly improving over the minimum separation $k^{1/4}$ required by the previous best algorithm due to Vempala and Wang (JCSS'04).

Given corrupted samples that contain a constant fraction of adversarial outliers, our moment estimation algorithms can approximate low-degree moments of unknown distributions, achieving guarantees that match information-theoretic lower-bounds for a broad class of distributions. These algorithms allow us to enhance, in a black-box way, many previous learning algorithms based on the method of moments, e.g., for independent component analysis and latent Dirichlet allocation, such that they become resilient to adversarial outliers.

## keywords

- sum-of-squares method
- machine learning
- semidefinite programming