# Sum-of-Squares method, dictionary learning, and tensor decomposition

**Banff approximation workshop, STOC 2015.**

## abstract

We give a new approach to the dictionary learning (also known as “sparse coding”) problem of recovering an unknown $n\times m$ matrix $A$ (for $m \geq n$) from examples of the form $y = Ax + e,$ where $x$ is a random vector in $\mathbb R^m$ with at most $\tau m$ nonzero coordinates, and $e$ is a random noise vector in $\mathbb R^n$ with bounded magnitude. For the case $m=O(n)$, our algorithm recovers every column of $A$ within arbitrarily good constant accuracy in time $m^{O(\log m/\log(\tau^{-1}))}$, in particular achieving polynomial time if $\tau = m^{-\delta}$ for any $\delta>0$, and time $m^{O(\log m)}$ if $\tau$ is (a sufficiently small) constant. Prior algorithms with comparable assumptions on the distribution required the vector $x$ to be much sparser—at most $\sqrt{n}$ nonzero coordinates—and there were intrinsic barriers preventing these algorithms from applying for denser $x$.

We achieve this by designing an algorithm for *noisy tensor decomposition* that can recover, under quite general conditions, an approximate rank-one decomposition of a tensor $T$, given access to a tensor $T'$ that is $\tau$-close to $T$ in the spectral norm (when considered as a matrix). To our knowledge, this is the first algorithm for tensor decomposition that works in the constant spectral-norm noise regime, where there is no guarantee that the local optima of $T$ and $T'$ have similar structures.

Our algorithm is based on a novel approach to using and analyzing the *Sum of Squares* semidefinite programming hierarchy (Parrilo 2000, Lasserre 2001), and it can be viewed as an indication of the utility of this very general and powerful tool for unsupervised learning problems.