Improved clustering and robust moment estimation via sum-of-squares

with Pravesh Kothari, Jacob Steinhardt. STOC 2018.

PDF

(merger of arxiv:1711.07465 [cs.DS] and arxiv:1711.11581 [cs.LG])

abstract

We develop efficient algorithms for clustering and robust moment estimation. Via convex sum-of-squares relaxations, our algorithms can exploit bounds on higher-order moments of the underlying distributions. In this way, our algorithms achieve substantially stronger guarantees than previous approaches.

For example, our clustering algorithm can learn mixtures of $k$ spherical Gaussians in time $n^{O(t)}$ as long as the means have minimum separation $\sqrt{t}\cdot k^{1/t}$ for any $t\ge 4$, significantly improving over the minimum separation $k^{1/4}$ required by the previous best algorithm due to Vempala and Wang (JCSS'04).
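
As a concrete illustration (ours, not part of the paper), the Python sketch below evaluates the separation bound $\sqrt{t}\cdot k^{1/t}$ for a few values of $t$ and compares it with the $k^{1/4}$ threshold of Vempala and Wang; the function names and the choice $k = 10{,}000$ are hypothetical, and the printed exponent only restates the $n^{O(t)}$ running time from the abstract.

```python
import math

def sos_separation(k: int, t: int) -> float:
    """Mean separation sqrt(t) * k^(1/t) under which the stated
    clustering guarantee applies, with running time n^O(t)."""
    return math.sqrt(t) * k ** (1.0 / t)

def vempala_wang_separation(k: int) -> float:
    """Separation k^(1/4) required by the previous best algorithm."""
    return k ** 0.25

k = 10_000  # number of mixture components (illustrative choice)
for t in (4, 10, 20):
    print(f"t = {t:2d}: separation ~ {sos_separation(k, t):6.2f} "
          f"(time n^O({t})) vs. Vempala-Wang's {vempala_wang_separation(k):.2f}")
```

Increasing $t$ buys a weaker separation requirement at the price of a larger exponent in the running time.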

Given corrupted samples that contain a constant fraction of adversarial outliers, our moment estimation algorithms can approximate low-degree moments of unknown distributions, achieving guarantees that match information-theoretic lower bounds for a broad class of distributions. These algorithms allow us to enhance, in a black-box way, many previous learning algorithms based on the method of moments, e.g., for independent component analysis and latent Dirichlet allocation, making them resilient to adversarial outliers.
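
The short sketch below (again ours, purely illustrative) simulates this corruption model for the second moment of a standard Gaussian: an adversary replaces a constant fraction of the samples with arbitrary points, and the naive empirical estimate degrades badly, which is the failure mode the paper's sum-of-squares estimators are designed to withstand. The sample size, dimension, corruption fraction, and outlier placement are arbitrary choices of ours, and the snippet does not implement the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Corruption model from the abstract: an adversary replaces a constant
# fraction of the samples with arbitrary outliers.
n, d, frac = 10_000, 5, 0.1                  # illustrative parameters
clean = rng.standard_normal((n, d))          # unknown distribution: N(0, I)
corrupted = clean.copy()
corrupted[: int(frac * n)] = 100.0           # one crude adversarial placement

# The naive empirical second moment is close to the identity on clean data
# but is badly skewed by the outliers.
second_moment = lambda x: x.T @ x / len(x)
print("clean     ||M - I||:", np.linalg.norm(second_moment(clean) - np.eye(d)))
print("corrupted ||M - I||:", np.linalg.norm(second_moment(corrupted) - np.eye(d)))
```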

keywords

  • sum-of-squares method
  • machine learning
  • semidefinite programming