
Many thanks to Kourosh Modarresi for organizing this session, and for his care in managing the arrangements.

Matrix Completion and Large-scale SVD Computations

Trevor Hastie, Stanford Statistics, with Rahul Mazumder, Jason Lee, Reza Zadeh and Rob Tibshirani

Reykjavik, June 2015


Outline of Talk

- Convex matrix completion, collaborative filtering (Mazumder, Hastie, Tibshirani 2010 JMLR)

- Recent algorithmic advances and large-scale SVD (Hastie, Mazumder, Lee, Zadeh, arXiv Oct 2014, to appear JMLR)

The Netflix Prize

The Netflix Data Set

- Training data: 480K users, 18K movies, 100M ratings on a 1-5 scale (99% of the ratings missing)

           movie I   movie II   movie III   movie IV   ...
  User A      1         ?          5           4       ...
  User B      ?         2          3           ?       ...
  User C      4         1          2           ?       ...
  User D      ?         5          1           3       ...
  User E      1         2          ?           ?       ...
  ...

- Goal: $1M prize for a 10% reduction in RMSE over Cinematch
- BellKor's Pragmatic Chaos declared winners on 9/21/2009; they used an ensemble of models, an important ingredient being low-rank factorization

Matrix Completion: Problem Definition

[Figure: a partially observed Users x Movie Ratings matrix.]

- Large matrices: # rows, # columns ≈ 10^5, 10^6

- Very under-determined (often only 1-2% observed)

- Exploit matrix structure: row/column interactions

- Task: "fill in" missing entries

- Applications: recommender systems, image processing, imputation of NAs in genomic data, rank estimation for the SVD

Model Assumption: Low Rank + Noise

- Under-determined: assume low rank

- Meaningful?
  - Interpretation: user and item factors induce collaboration
  - Empirical: Netflix successes
  - Theoretical: "reconstruction" possible under low-rank and regularity conditions

Srebro et al (2005); Candes and Recht (2008); Candes and Tao (2009); Keshavan et al. (2009); Negahban and Wainwright (2012)

Optimization Problem

Find Z_{n×m} of (small) rank r such that the training error is small.

  minimize_Z  \sum_{(i,j) observed} (X_{ij} - Z_{ij})^2   subject to  rank(Z) = r

Impute missing X_{ij} with Z_{ij}.

[Figure: True X, Observed X, Fitted Z, Imputed X.]

Our Approach: Nuclear Norm Relaxation

- The rank(Z) constraint makes the problem non-convex and combinatorially very hard (although good algorithms exist).
- \|Z\|_* = \sum_j λ_j(Z), the sum of the singular values of Z, is convex in Z. Called the "nuclear norm" of Z.

- \|Z\|_* is the tightest convex relaxation of rank(Z) (Fazel and Boyd, 2002).

We solve instead

  minimize_Z  \sum_{(i,j) observed} (X_{ij} - Z_{ij})^2   subject to  \|Z\|_* ≤ τ

which is convex in Z.


Notation

Following Cai et al (2010), define P_Ω(X)_{n×m}, the projection onto the observed entries:

  [P_Ω(X)]_{ij} = X_{ij}  if (i,j) is observed
                = 0       if (i,j) is missing

Criterion rewritten as:

  \sum_{(i,j) observed} (X_{ij} - Z_{ij})^2 = \|P_Ω(X) - P_Ω(Z)\|_F^2
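For concreteness, here is a minimal base-R sketch of P_Ω and its complement, assuming missing entries are coded as NA (our convention, not the slides'):

```r
# Minimal sketch of the projection P_Omega and its complement, with the
# missingness pattern carried by NAs in X (illustrative only).
P_Omega <- function(X) {
  Z <- X
  Z[is.na(Z)] <- 0          # keep observed entries, zero out the rest
  Z
}

P_Omega_perp <- function(X, Z) {
  Z[!is.na(X)] <- 0         # zero out the entries that are observed in X
  Z
}

# The criterion is then sum(((X - Z)[!is.na(X)])^2) = ||P_Omega(X) - P_Omega(Z)||_F^2.
```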

Soft SVD: Prox Operator for the Nuclear Norm

Let (fully observed) X_{n×m} have SVD

  X = U · diag[σ_1, ..., σ_m] · V'

Consider the convex optimization problem

  minimize_Z  (1/2) \|X - Z\|_F^2 + λ \|Z\|_*

The solution is the soft-thresholded SVD

  S_λ(X) := U · diag[(σ_1 - λ)_+, ..., (σ_m - λ)_+] · V'

Like the lasso for the SVD: singular values are shrunk, with many set exactly to zero. A smooth version of the best-rank approximation.
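A minimal base-R sketch of S_λ for a small dense X (the name soft_svd is ours; large problems use the low-rank machinery described later):

```r
# Soft-thresholded SVD S_lambda(X) for a fully observed (small, dense) matrix.
soft_svd <- function(X, lambda) {
  s <- svd(X)
  d <- pmax(s$d - lambda, 0)                        # (sigma_j - lambda)_+
  s$u %*% diag(d, nrow = length(d)) %*% t(s$v)      # U diag[(sigma - lambda)_+] V'
}
```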

Convex Optimization Problem

Back to the missing-data problem, in Lagrange form:

  minimize_Z  (1/2) \|P_Ω(X) - P_Ω(Z)\|_F^2 + λ \|Z\|_*

- This is a semi-definite program (SDP), convex in Z.

- Complexity of existing off-the-shelf solvers:
  - interior-point methods: O(n^4) to O(n^6)
  - (black-box) first-order methods: O(n^3)

- We solve using an iterative soft SVD (next slide), with cost per soft SVD O[(m + n) · r + |Ω|], where r is the rank of the solution.


Soft-Impute: Path Algorithm

1. Initialize Z^{old} = 0 and create a decreasing grid Λ of values λ_0 > λ_1 > ... > λ_K > 0, with λ_0 = λ_max(P_Ω(X)).
2. For each λ = λ_1, λ_2, ... ∈ Λ, iterate (2a)-(2b) till convergence:
   (2a) Compute Z^{new} ← S_λ(P_Ω(X) + P_Ω^⊥(Z^{old}))
   (2b) Assign Z^{old} ← Z^{new} and go to step (2a)
   (2c) At convergence, assign Ẑ_λ ← Z^{new} and move on to the next λ in step 2

3. Output the sequence of solutions Ẑ_{λ_1}, ..., Ẑ_{λ_K}.


This is an MM (majorize-minimize) algorithm for solving the nuclear-norm regularized problem.
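A minimal sketch of the inner iteration for a single λ, assuming dense matrices, NA-coded missing entries, and the soft_svd() helper from above (illustrative only; the path algorithm would call this over the λ grid with warm starts):

```r
# One Soft-Impute run for a fixed lambda.
soft_impute <- function(X, lambda, n_iter = 100) {
  obs <- !is.na(X)
  Z <- matrix(0, nrow(X), ncol(X))
  for (k in seq_len(n_iter)) {
    Xfill <- Z
    Xfill[obs] <- X[obs]             # P_Omega(X) + P_Omega_perp(Z)
    Z <- soft_svd(Xfill, lambda)     # soft-thresholded SVD of the completed matrix
  }
  Z
}
```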

Soft-Impute: Computational Bottleneck

Obtain the sequence {Z_k} of guesses

  Z_{k+1} = argmin_Z  (1/2) \|P_Ω(X) + P_Ω^⊥(Z_k) - Z\|_F^2 + λ \|Z\|_*

Computational bottleneck: the soft SVD requires a (low-rank) SVD of the completed matrix after k iterations:

  X̂_k = P_Ω(X) + P_Ω^⊥(Z_k)

Trick:

  P_Ω(X) + P_Ω^⊥(Z_k) = [P_Ω(X) - P_Ω(Z_k)] + Z_k
                              Sparse         + Low Rank
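A hedged sketch of why the split matters: products with the completed matrix reduce to one sparse multiply plus two skinny multiplies, so the dense matrix is never formed. The names below are ours, and the sparse part is assumed to be a Matrix-package sparse matrix.

```r
library(Matrix)

# (S + U D V') %*% Q without densifying, where S = P_Omega(X) - P_Omega(Z_k) is
# sparse (|Omega| nonzeros) and Z_k = U D V' has rank r. Cost is roughly
# O(|Omega| * ncol(Q) + (m + n) * r * ncol(Q)).
multiply_sparse_plus_lowrank <- function(S, U, D, V, Q) {
  S %*% Q + U %*% (D %*% (t(V) %*% Q))
}
```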

Computational Tricks in Soft-Impute

- Anticipate the rank of Ẑ_{λ_{j+1}} based on the rank of Ẑ_{λ_j}, erring on the generous side.

- Compute the low-rank SVD of X̂_k using orthogonal QR iterations with Ritz acceleration (Stewart 1969; Hastie, Mazumder, Lee and Zadeh 2014 [arXiv]).
- The iterations require only left and right multiplications U'X̂_k and X̂_k V: ideal for the Sparse + Low-Rank structure.

- Warm starts: S_λ(X̂_k) provides excellent warm starts (U and V) for S_λ(X̂_{k+1}). Likewise Ẑ_{λ_j} for Ẑ_{λ_{j+1}}.
- Total cost per iteration: O[(m + n) · r + |Ω|].
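Stripped of the Ritz acceleration and ridging, the QR-iteration idea looks roughly as follows (our own sketch; Xhat is dense here for clarity, and U0 is an optional warm start):

```r
# Reduced-rank SVD by alternating orthogonal (subspace) iterations, driven only
# by left and right multiplications with Xhat.
low_rank_svd <- function(Xhat, r, U0 = NULL, n_iter = 50) {
  n <- nrow(Xhat)
  U <- if (is.null(U0)) qr.Q(qr(matrix(rnorm(n * r), n, r))) else U0
  for (k in seq_len(n_iter)) {
    V <- qr.Q(qr(t(Xhat) %*% U))        # right multiplications:  Xhat' U
    U <- qr.Q(qr(Xhat %*% V))           # left multiplications:   Xhat V
  }
  sv <- svd(t(U) %*% Xhat %*% V)        # small r x r problem fixes the rotation
  list(u = U %*% sv$u, d = sv$d, v = V %*% sv$v)
}
```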


Soft-Impute on the Netflix Problem

  rank      time (hrs)   RMSE     % improvement
   42          1.36      0.9622      -1.1
   66          2.21      0.9572      -0.6
   81          2.83      0.9543      -0.3
  Cinematch               0.9514
   95          3.27      0.9497       0.2
  120          4.40      0.9213       3.2
  ...
  Winning goal            0.8563      10

state-of-the-art convex solvers do not scale to this size

Hard-Impute

  minimize_{rank(Z)=r}  \|P_Ω(X) - P_Ω(Z)\|_F

This is not convex in Z, but by analogy with Soft-Impute, an iterative algorithm gives good solutions. Replace step

  (2a) Compute Z^{new} ← S_λ(P_Ω(X) + P_Ω^⊥(Z^{old}))

with

  (2a') Compute Z^{new} ← H_r(P_Ω(X) + P_Ω^⊥(Z^{old}))

Here H_r(X*) is the best rank-r approximation to X*, i.e. the rank-r truncated SVD approximation.
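A minimal sketch of H_r (our function name) via the truncated SVD; substituting it for soft_svd() in the earlier soft_impute() sketch gives a basic Hard-Impute iteration.

```r
# Best rank-r approximation (rank-r truncated SVD) of a dense matrix.
hard_svd <- function(X, r) {
  s <- svd(X, nu = r, nv = r)
  s$u %*% diag(s$d[1:r], nrow = r) %*% t(s$v)
}
```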

Example: Choosing a Good Rank for the SVD

[Figure: 10-fold CV rank determination; root mean squared error (training and 10-fold CV) plotted against rank.]

Truth is a 200 × 100 rank-50 matrix plus noise (SNR 3). Randomly omit 10% of the entries, and then predict using solutions from Soft-Impute or Hard-Impute.

The competition identified a "probe set" of ratings, about 1.4 million of the entries, for testing purposes. These were not a random draw, rather movies that had appeared chronologically later than most. Figure 7.2 shows the root mean squared error over the training and test sets as the rank of the SVD was varied. Also shown are the results from an estimator based on nuclear norm regularization, discussed in the next section. Here we double centered the training data, by removing row and column means. This amounts to fitting the model

  z_{ij} = α_i + β_j + \sum_{ℓ=1}^{r} c_{iℓ} g_{jℓ} + w_{ij};    (7.8)

However, the row and column means can be estimated separately, using a simple two-way ANOVA regression model (on unbalanced data).

Soft-Impute beats Hard-Impute on Netflix Competition Data

[Figure 7.2: training and test RMSE for Hard-Impute and Soft-Impute, plotted against rank (left) and against training RMSE (right); see caption below.]

Figure 7.2 Left: Root-mean-squared error for the Netflix training and test data for the iterated-SVD (Hard-Impute) and the convex spectral-regularization algorithm (Soft-Impute). Each is plotted against the rank of the solution, an imperfect calibrator for the regularized solution. Right: Test error only, plotted against training error, for the two methods. The training error captures the amount of fitting that each method performs. The dotted line represents the baseline "Cinematch" score.

While the iterated-SVD method is quite effective, it is not guaranteed to find the optimal solution for each rank. It also tends to overfit in this example, when compared to the regularized solution. In the next section, we present a convex relaxation of this setup that leads to an algorithm with guaranteed convergence properties.

Soft-Impute beats debiased Soft-Impute on Netflix

[Figure: Netflix Competition Data; test RMSE plotted against training RMSE for Hard-Impute, Soft-Impute, and debiased Soft-Impute (Soft-Impute+).]

Alternating Least Squares

Consider the rank-r factorization Z = A_{n×r} B_{m×r}', and solve

  minimize_{A,B}  \|P_Ω(X) - P_Ω(AB')\|_F^2 + λ (\|A\|_F^2 + \|B\|_F^2)

[Figure: X ≈ A B']

- Regularized SVD (Srebro et al 2003; Simon Funk)
- Not convex, but bi-convex: alternating ridge regressions

Lemma (Srebro et al 2005, Mazumder et al 2010). For any matrix W, the following holds:

  \|W\|_* = min_{A,B: W = AB'}  (1/2) (\|A\|_F^2 + \|B\|_F^2)

If rank(W) = k ≤ min{m, n}, then the minimum above is attained at a factor decomposition W = A_{m×k} B_{n×k}'.

Connections between ALS and Soft-Impute

  ALS:          minimize_{A_{n×r}, B_{m×r}}  (1/2) \|P_Ω(X) - P_Ω(AB')\|_F^2 + (λ/2) (\|A\|_F^2 + \|B\|_F^2)
  Soft-Impute:  minimize_Z  (1/2) \|P_Ω(X) - P_Ω(Z)\|_F^2 + λ \|Z\|_*

- The solution space of ALS contains the solutions of Soft-Impute.
- For large rank r: ALS ≡ Soft-Impute.

[Figure: solution rank plotted against log λ.]

Synthesis and New Approach

- ALS is slower than Soft-Impute (by a factor of 10).

- ALS requires guesswork for the rank, and does not return a definitive low-rank solution.

- Soft-Impute requires a low-rank SVD at each iteration. Typically iterative QR methods are used, exploiting problem structure and warm starts.

Idea: combine Soft-Impute and ALS

- Leads to an algorithm more efficient than Soft-Impute

- Scales naturally to larger problems using parallel/multicore programming

- Suggests an efficient algorithm for the low-rank SVD of complete matrices

New Nuclear-Norm and ALS Results

Consider a fully observed X_{n×m}.

  Nuclear:  minimize_{rank(Z) ≤ r}  (1/2) \|X - Z\|_F^2 + λ \|Z\|_*
  ALS:      minimize_{A_{n×r}, B_{m×r}}  (1/2) \|X - AB'\|_F^2 + (λ/2) (\|A\|_F^2 + \|B\|_F^2)

The solution to Nuclear is

  Z = U_r D_* V_r',

where U_r and V_r are the first r left and right singular vectors of X, and

  D_* = diag[(σ_1 - λ)_+, ..., (σ_r - λ)_+]

A solution to ALS is

  A = U_r D_*^{1/2}   and   B = V_r D_*^{1/2}
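These closed forms are easy to check numerically on a small random matrix; the script below is our own and simply verifies that Z = AB' and that the two criteria coincide at these solutions.

```r
set.seed(1)
X <- matrix(rnorm(40 * 30), 40, 30)
lambda <- 2; r <- 10

s <- svd(X)
d <- pmax(s$d[1:r] - lambda, 0)                    # D_* = diag[(sigma_j - lambda)_+]
Z <- s$u[, 1:r] %*% diag(d) %*% t(s$v[, 1:r])      # Z = U_r D_* V_r'
A <- s$u[, 1:r] %*% diag(sqrt(d))                  # A = U_r D_*^{1/2}
B <- s$v[, 1:r] %*% diag(sqrt(d))                  # B = V_r D_*^{1/2}

all.equal(Z, A %*% t(B))                           # Z = AB'
obj_nuclear <- 0.5 * sum((X - Z)^2) + lambda * sum(svd(Z)$d)
obj_als     <- 0.5 * sum((X - A %*% t(B))^2) +
               0.5 * lambda * (sum(A^2) + sum(B^2))
all.equal(obj_nuclear, obj_als)                    # the two criteria agree here
```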

Consequences of the New Nuclear-Norm / ALS Connections

For the SVD of a fully observed matrix:

- Can solve the reduced-rank SVD by alternating ridge regressions.

- At each iteration, re-orthogonalization as in the usual QR iterations (for reduced-rank SVD) means the ridge regression is a simple matrix multiply, followed by column scaling.

- Ridging speeds up convergence, and focuses accuracy on the leading dimensions.

- The solution delivers a reduced-rank SVD.

For matrix completion:

- Combine the SVD calculation and the imputation in Soft-Impute.

- Leads to a faster algorithm that can be distributed across multiple cores for storage and computation efficiency.


Soft-Impute-ALS

Back to matrix imputation.

1. Initialize U_{n×r}, V_{m×r} orthogonal, D_{r×r} > 0 diagonal, and A = UD, B = VD.
2. Given U and D, and hence A = UD, update B:
   2.a Compute the current imputation:

     X* = P_Ω(X) + P_Ω^⊥(AB')
        = [P_Ω(X) - P_Ω(AB')] + U D^2 V'
   2.b Ridge regression of X* on A:
     B' ← (D^2 + λI)^{-1} D U' X*
        = D_1 U' [P_Ω(X) - P_Ω(AB')] + D_2 V'
   2.c Reorthogonalize, and update V, D and U via the SVD of BD.
3. Given V and D, and hence B = VD, update A in similar fashion.
4. At convergence, U and V provide the SVD of X*, and hence S_λ(X*), which cleans up the rank of the solution.
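A hedged, dense sketch of the B-update (steps 2.a-2.c), with observed entries marked by a logical mask obs (our convention); the actual algorithm works with the sparse-plus-low-rank form of X* and never builds it.

```r
softimpute_als_update_B <- function(X, obs, U, D, V, lambda) {
  A <- U %*% D
  B <- V %*% D
  Xstar <- A %*% t(B)                    # start from P_Omega_perp(AB')
  Xstar[obs] <- X[obs]                   # overwrite observed entries: P_Omega(X)
  d <- diag(D)
  # 2.b ridge regression of Xstar on A = UD:  B' <- (D^2 + lambda I)^{-1} D U' Xstar
  Bnew <- t(Xstar) %*% U %*% diag(d / (d^2 + lambda), nrow = length(d))
  # 2.c reorthogonalize via the SVD of (Bnew D): new V and D, and a rotated U
  sv <- svd(Bnew %*% D)
  list(U = U %*% sv$v, D = diag(sqrt(sv$d), nrow = length(d)), V = sv$u)
}
```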


Timing Comparisons

Timings were computed on a Linux cluster with 300Gb of RAM (with a fairly liberal relative convergence criterion of 0.001), using the softImpute package in R.

[Figure: relative objective (log scale) versus time for ALS and softImpute-ALS. Left: Netflix (480K, 18K), λ=100, r=100, time in hours. Right: MovieLens 10M (72K, 10K), λ=50, r=100, time in minutes.]

Figure 3: Left: timing results on the Netflix matrix, comparing ALS with softImpute-ALS. Right: timing on the MovieLens 10M matrix. In both cases we see that while ALS makes bigger gains per iteration, each iteration is much more costly.

Figure 3 (left panel) gives timing comparison results for one of the Netflix fits, this time implemented in Matlab. The right panel gives timing results on the smaller MovieLens 10M matrix. In these applications we need not get a very accurate solution, and so early stopping is an attractive option. softImpute-ALS reaches a solution close to the minimum in about 1/4 the time it takes ALS.

R Package softImpute

We have developed an R package softImpute for fitting these models [3], which is available on CRAN. The package implements both softImpute and softImpute-ALS. It can accommodate large matrices if the number of missing entries is correspondingly large, by making use of sparse-matrix formats. There are functions for centering and scaling (see Section 8), and for making …


Software Implementations

- softImpute package in R. Can deal with large sparse complete matrices, or large matrices with many missing entries (i.e. Netflix or bigger). Includes row and column centering and scaling options; a hedged usage sketch follows this list.

- Spark cluster programming. Uses distributed computing and chunking. Can deal with very large problems (e.g. 10^7 × 10^7, 139 secs per iteration). See http://git.io/sparkfastals with documentation in Scala.
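A hedged usage sketch of the CRAN package (the calls below follow our reading of the package documentation, not these slides):

```r
# install.packages("softImpute")
library(softImpute)

set.seed(1)
X <- matrix(rnorm(200 * 100), 200, 100)
X[sample(length(X), 0.3 * length(X))] <- NA   # punch 30% holes; NAs mark missing

fit  <- softImpute(X, rank.max = 30, lambda = 10, type = "als")
Xhat <- complete(X, fit)                      # impute the missing entries
# biScale() can be applied first for row/column centering and scaling.
```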


Thank You!
