
Many thanks to Kourosh Modarresi for organizing this session, and for his care in managing the arrangements.

Matrix Completion and Large-scale SVD Computations

Trevor Hastie, Stanford Statistics, with Rahul Mazumder, Jason Lee, Reza Zadeh and Rob Tibshirani

Reykjavik, June 2015


Outline of Talk

- Convex matrix completion, collaborative filtering (Mazumder, Hastie, Tibshirani 2010 JMLR)

- Recent algorithmic advances and large-scale SVD (Hastie, Mazumder, Lee, Zadeh, arXiv Oct 2014, to appear JMLR)

The Netflix Prize

The Netflix Data Set

- Training data: 480K users, 18K movies, 100M ratings on a 1-5 scale (99% of the ratings missing)

           movie I   movie II   movie III   movie IV   ...
  User A      1         ?          5           4       ...
  User B      ?         2          3           ?       ...
  User C      4         1          2           ?       ...
  User D      ?         5          1           3       ...
  User E      1         2          ?           ?       ...
  ...

- Goal: $1M prize for a 10% reduction in RMSE over Cinematch
- BellKor's Pragmatic Chaos declared winners on 9/21/2009; they used an ensemble of models, an important ingredient being low-rank factorization

Matrix Completion: Problem Definition

[Figure: a partially observed Users x Movie Ratings matrix.]

- Large matrices: # rows, # columns ≈ 10^5, 10^6

- Very under-determined (often only 1-2% observed)

- Exploit matrix structure: row/column interactions

- Task: "fill in" missing entries

- Applications: recommender systems, image processing, imputation of NAs in genomic data, rank estimation for the SVD

Model Assumption: Low Rank + Noise

- Under-determined: assume low rank

- Meaningful?
  - Interpretation: user and item factors induce collaboration
  - Empirical: Netflix successes
  - Theoretical: "reconstruction" possible under low-rank and regularity conditions

Srebro et al (2005); Candes and Recht (2008); Candes and Tao (2009); Keshavan et al. (2009); Negahban and Wainwright (2012)

Optimization Problem

Find Z_{n×m} of (small) rank r such that the training error is small.

  minimize_Z  \sum_{(i,j) observed} (X_{ij} - Z_{ij})^2   subject to  rank(Z) = r

Impute missing X_{ij} with Z_{ij}.

[Figure: True X, Observed X, Fitted Z, Imputed X.]

Our Approach: Nuclear Norm Relaxation

- The rank(Z) constraint makes the problem non-convex and combinatorially very hard (although good algorithms exist).
- \|Z\|_* = \sum_j λ_j(Z), the sum of the singular values of Z, is convex in Z. Called the "nuclear norm" of Z.

- \|Z\|_* is the tightest convex relaxation of rank(Z) (Fazel and Boyd, 2002).

We solve instead

  minimize_Z  \sum_{(i,j) observed} (X_{ij} - Z_{ij})^2   subject to  \|Z\|_* ≤ τ

which is convex in Z.


Notation

Following Cai et al (2010), define P_Ω(X)_{n×m}, the projection onto the observed entries:

  [P_Ω(X)]_{ij} = X_{ij}  if (i,j) is observed
                = 0       if (i,j) is missing

Criterion rewritten as:

  \sum_{(i,j) observed} (X_{ij} - Z_{ij})^2 = \|P_Ω(X) - P_Ω(Z)\|_F^2
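For concreteness, here is a minimal base-R sketch of P_Ω and its complement, assuming missing entries are coded as NA (our convention, not the slides'):

```r
# Minimal sketch of the projection P_Omega and its complement, with the
# missingness pattern carried by NAs in X (illustrative only).
P_Omega <- function(X) {
  Z <- X
  Z[is.na(Z)] <- 0          # keep observed entries, zero out the rest
  Z
}

P_Omega_perp <- function(X, Z) {
  Z[!is.na(X)] <- 0         # zero out the entries that are observed in X
  Z
}

# The criterion is then sum(((X - Z)[!is.na(X)])^2) = ||P_Omega(X) - P_Omega(Z)||_F^2.
```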

Soft SVD: Prox Operator for the Nuclear Norm

Let (fully observed) X_{n×m} have SVD

  X = U · diag[σ_1, ..., σ_m] · V'

Consider the convex optimization problem

  minimize_Z  (1/2) \|X - Z\|_F^2 + λ \|Z\|_*

The solution is the soft-thresholded SVD

  S_λ(X) := U · diag[(σ_1 - λ)_+, ..., (σ_m - λ)_+] · V'

Like the lasso for the SVD: singular values are shrunk, with many set exactly to zero. A smooth version of the best-rank approximation.
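A minimal base-R sketch of S_λ for a small dense X (the name soft_svd is ours; large problems use the low-rank machinery described later):

```r
# Soft-thresholded SVD S_lambda(X) for a fully observed (small, dense) matrix.
soft_svd <- function(X, lambda) {
  s <- svd(X)
  d <- pmax(s$d - lambda, 0)                        # (sigma_j - lambda)_+
  s$u %*% diag(d, nrow = length(d)) %*% t(s$v)      # U diag[(sigma - lambda)_+] V'
}
```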

Convex Optimization Problem

Back to the missing-data problem, in Lagrange form:

  minimize_Z  (1/2) \|P_Ω(X) - P_Ω(Z)\|_F^2 + λ \|Z\|_*

- This is a semi-definite program (SDP), convex in Z.

- Complexity of existing off-the-shelf solvers:
  - interior-point methods: O(n^4) to O(n^6)
  - (black-box) first-order methods: O(n^3)

- We solve using an iterative soft SVD (next slide), with cost per soft SVD O[(m + n) · r + |Ω|], where r is the rank of the solution.


Soft-Impute: Path Algorithm

1. Initialize Z^{old} = 0 and create a decreasing grid Λ of values λ_0 > λ_1 > ... > λ_K > 0, with λ_0 = λ_max(P_Ω(X)).
2. For each λ = λ_1, λ_2, ... ∈ Λ, iterate (2a)-(2b) till convergence:
   (2a) Compute Z^{new} ← S_λ(P_Ω(X) + P_Ω^⊥(Z^{old}))
   (2b) Assign Z^{old} ← Z^{new} and go to step (2a)
   (2c) At convergence, assign Ẑ_λ ← Z^{new} and move on to the next λ in step 2

3. Output the sequence of solutions Ẑ_{λ_1}, ..., Ẑ_{λ_K}.


This is an MM (majorize-minimize) algorithm for solving the nuclear-norm regularized problem.
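A minimal sketch of the inner iteration for a single λ, assuming dense matrices, NA-coded missing entries, and the soft_svd() helper from above (illustrative only; the path algorithm would call this over the λ grid with warm starts):

```r
# One Soft-Impute run for a fixed lambda.
soft_impute <- function(X, lambda, n_iter = 100) {
  obs <- !is.na(X)
  Z <- matrix(0, nrow(X), ncol(X))
  for (k in seq_len(n_iter)) {
    Xfill <- Z
    Xfill[obs] <- X[obs]             # P_Omega(X) + P_Omega_perp(Z)
    Z <- soft_svd(Xfill, lambda)     # soft-thresholded SVD of the completed matrix
  }
  Z
}
```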

Soft-Impute: Computational Bottleneck

Obtain the sequence {Z_k} of guesses

  Z_{k+1} = argmin_Z  (1/2) \|P_Ω(X) + P_Ω^⊥(Z_k) - Z\|_F^2 + λ \|Z\|_*

Computational bottleneck: the soft SVD requires a (low-rank) SVD of the completed matrix after k iterations:

  X̂_k = P_Ω(X) + P_Ω^⊥(Z_k)

Trick:

  P_Ω(X) + P_Ω^⊥(Z_k) = [P_Ω(X) - P_Ω(Z_k)] + Z_k
                              Sparse         + Low Rank
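A hedged sketch of why the split matters: products with the completed matrix reduce to one sparse multiply plus two skinny multiplies, so the dense matrix is never formed. The names below are ours, and the sparse part is assumed to be a Matrix-package sparse matrix.

```r
library(Matrix)

# (S + U D V') %*% Q without densifying, where S = P_Omega(X) - P_Omega(Z_k) is
# sparse (|Omega| nonzeros) and Z_k = U D V' has rank r. Cost is roughly
# O(|Omega| * ncol(Q) + (m + n) * r * ncol(Q)).
multiply_sparse_plus_lowrank <- function(S, U, D, V, Q) {
  S %*% Q + U %*% (D %*% (t(V) %*% Q))
}
```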

Computational Tricks in Soft-Impute

- Anticipate the rank of Ẑ_{λ_{j+1}} based on the rank of Ẑ_{λ_j}, erring on the generous side.

- Compute the low-rank SVD of X̂_k using orthogonal QR iterations with Ritz acceleration (Stewart 1969; Hastie, Mazumder, Lee and Zadeh 2014 [arXiv]).
- The iterations require only left and right multiplications U'X̂_k and X̂_k V: ideal for the Sparse + Low-Rank structure.

- Warm starts: S_λ(X̂_k) provides excellent warm starts (U and V) for S_λ(X̂_{k+1}). Likewise Ẑ_{λ_j} for Ẑ_{λ_{j+1}}.
- Total cost per iteration: O[(m + n) · r + |Ω|].
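Stripped of the Ritz acceleration and ridging, the QR-iteration idea looks roughly as follows (our own sketch; Xhat is dense here for clarity, and U0 is an optional warm start):

```r
# Reduced-rank SVD by alternating orthogonal (subspace) iterations, driven only
# by left and right multiplications with Xhat.
low_rank_svd <- function(Xhat, r, U0 = NULL, n_iter = 50) {
  n <- nrow(Xhat)
  U <- if (is.null(U0)) qr.Q(qr(matrix(rnorm(n * r), n, r))) else U0
  for (k in seq_len(n_iter)) {
    V <- qr.Q(qr(t(Xhat) %*% U))        # right multiplications:  Xhat' U
    U <- qr.Q(qr(Xhat %*% V))           # left multiplications:   Xhat V
  }
  sv <- svd(t(U) %*% Xhat %*% V)        # small r x r problem fixes the rotation
  list(u = U %*% sv$u, d = sv$d, v = V %*% sv$v)
}
```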


Soft-Impute on the Netflix Problem

  rank      time (hrs)   RMSE     % improvement
   42          1.36      0.9622      -1.1
   66          2.21      0.9572      -0.6
   81          2.83      0.9543      -0.3
  Cinematch               0.9514
   95          3.27      0.9497       0.2
  120          4.40      0.9213       3.2
  ...
  Winning goal            0.8563      10

state-of-the-art convex solvers do not scale to this size

Hard-Impute

  minimize_{rank(Z)=r}  \|P_Ω(X) - P_Ω(Z)\|_F

This is not convex in Z, but by analogy with Soft-Impute, an iterative algorithm gives good solutions. Replace step

  (2a) Compute Z^{new} ← S_λ(P_Ω(X) + P_Ω^⊥(Z^{old}))

with

  (2a') Compute Z^{new} ← H_r(P_Ω(X) + P_Ω^⊥(Z^{old}))

Here H_r(X*) is the best rank-r approximation to X*, i.e. the rank-r truncated SVD approximation.
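A minimal sketch of H_r (our function name) via the truncated SVD; substituting it for soft_svd() in the earlier soft_impute() sketch gives a basic Hard-Impute iteration.

```r
# Best rank-r approximation (rank-r truncated SVD) of a dense matrix.
hard_svd <- function(X, r) {
  s <- svd(X, nu = r, nv = r)
  s$u %*% diag(s$d[1:r], nrow = r) %*% t(s$v)
}
```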

Example: Choosing a Good Rank for the SVD

[Figure: 10-fold CV rank determination; root mean squared error (training and 10-fold CV) plotted against rank.]

Truth is a 200 × 100 rank-50 matrix plus noise (SNR 3). Randomly omit 10% of the entries, and then predict using solutions from Soft-Impute or Hard-Impute.

The competition identified a "probe set" of ratings, about 1.4 million of the entries, for testing purposes. These were not a random draw, rather movies that had appeared chronologically later than most. Figure 7.2 shows the root mean squared error over the training and test sets as the rank of the SVD was varied. Also shown are the results from an estimator based on nuclear norm regularization, discussed in the next section. Here we double centered the training data, by removing row and column means. This amounts to fitting the model

  z_{ij} = α_i + β_j + \sum_{ℓ=1}^{r} c_{iℓ} g_{jℓ} + w_{ij};    (7.8)

However, the row and column means can be estimated separately, using a simple two-way ANOVA regression model (on unbalanced data).

Soft-Impute beats Hard-Impute on Netflix Competition Data

[Figure 7.2: training and test RMSE for Hard-Impute and Soft-Impute, plotted against rank (left) and against training RMSE (right); see caption below.]

Figure 7.2 Left: Root-mean-squared error for the Netflix training and test data for the iterated-SVD (Hard-Impute) and the convex spectral-regularization algorithm (Soft-Impute). Each is plotted against the rank of the solution, an imperfect calibrator for the regularized solution. Right: Test error only, plotted against training error, for the two methods. The training error captures the amount of fitting that each method performs. The dotted line represents the baseline "Cinematch" score.

While the iterated-SVD method is quite effective, it is not guaranteed to find the optimal solution for each rank. It also tends to overfit in this example, when compared to the regularized solution. In the next section, we present a convex relaxation of this setup that leads to an algorithm with guaranteed convergence properties.

Soft-Impute beats debiased Soft-Impute on Netflix

[Figure: Netflix Competition Data; test RMSE plotted against training RMSE for Hard-Impute, Soft-Impute, and debiased Soft-Impute (Soft-Impute+).]

Alternating Least Squares

Consider the rank-r factorization Z = A_{n×r} B_{m×r}', and solve

  minimize_{A,B}  \|P_Ω(X) - P_Ω(AB')\|_F^2 + λ (\|A\|_F^2 + \|B\|_F^2)

[Figure: X ≈ A B']

- Regularized SVD (Srebro et al 2003; Simon Funk)
- Not convex, but bi-convex: alternating ridge regressions

Lemma (Srebro et al 2005, Mazumder et al 2010). For any matrix W, the following holds:

  \|W\|_* = min_{A,B: W = AB'}  (1/2) (\|A\|_F^2 + \|B\|_F^2)

If rank(W) = k ≤ min{m, n}, then the minimum above is attained at a factor decomposition W = A_{m×k} B_{n×k}'.

Connections between ALS and Soft-Impute

  ALS:          minimize_{A_{n×r}, B_{m×r}}  (1/2) \|P_Ω(X) - P_Ω(AB')\|_F^2 + (λ/2) (\|A\|_F^2 + \|B\|_F^2)
  Soft-Impute:  minimize_Z  (1/2) \|P_Ω(X) - P_Ω(Z)\|_F^2 + λ \|Z\|_*

- The solution space of ALS contains the solutions of Soft-Impute.
- For large rank r: ALS ≡ Soft-Impute.

[Figure: solution rank plotted against log λ.]

Synthesis and New Approach

- ALS is slower than Soft-Impute (by a factor of 10).

- ALS requires guesswork for the rank, and does not return a definitive low-rank solution.

- Soft-Impute requires a low-rank SVD at each iteration. Typically iterative QR methods are used, exploiting problem structure and warm starts.

Idea: combine Soft-Impute and ALS

- Leads to an algorithm more efficient than Soft-Impute

- Scales naturally to larger problems using parallel/multicore programming

- Suggests an efficient algorithm for the low-rank SVD of complete matrices

New Nuclear-Norm and ALS Results

Consider a fully observed X_{n×m}.

  Nuclear:  minimize_{rank(Z) ≤ r}  (1/2) \|X - Z\|_F^2 + λ \|Z\|_*
  ALS:      minimize_{A_{n×r}, B_{m×r}}  (1/2) \|X - AB'\|_F^2 + (λ/2) (\|A\|_F^2 + \|B\|_F^2)

The solution to Nuclear is

  Z = U_r D_* V_r',

where U_r and V_r are the first r left and right singular vectors of X, and

  D_* = diag[(σ_1 - λ)_+, ..., (σ_r - λ)_+]

A solution to ALS is

  A = U_r D_*^{1/2}   and   B = V_r D_*^{1/2}
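These closed forms are easy to check numerically on a small random matrix; the script below is our own and simply verifies that Z = AB' and that the two criteria coincide at these solutions.

```r
set.seed(1)
X <- matrix(rnorm(40 * 30), 40, 30)
lambda <- 2; r <- 10

s <- svd(X)
d <- pmax(s$d[1:r] - lambda, 0)                    # D_* = diag[(sigma_j - lambda)_+]
Z <- s$u[, 1:r] %*% diag(d) %*% t(s$v[, 1:r])      # Z = U_r D_* V_r'
A <- s$u[, 1:r] %*% diag(sqrt(d))                  # A = U_r D_*^{1/2}
B <- s$v[, 1:r] %*% diag(sqrt(d))                  # B = V_r D_*^{1/2}

all.equal(Z, A %*% t(B))                           # Z = AB'
obj_nuclear <- 0.5 * sum((X - Z)^2) + lambda * sum(svd(Z)$d)
obj_als     <- 0.5 * sum((X - A %*% t(B))^2) +
               0.5 * lambda * (sum(A^2) + sum(B^2))
all.equal(obj_nuclear, obj_als)                    # the two criteria agree here
```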

Consequences of the New Nuclear-Norm / ALS Connections

For the SVD of a fully observed matrix:

- Can solve the reduced-rank SVD by alternating ridge regressions.

- At each iteration, re-orthogonalization as in the usual QR iterations (for reduced-rank SVD) means the ridge regression is a simple matrix multiply, followed by column scaling.

- Ridging speeds up convergence, and focuses accuracy on the leading dimensions.

- The solution delivers a reduced-rank SVD.

For matrix completion:

- Combine the SVD calculation and the imputation in Soft-Impute.

- Leads to a faster algorithm that can be distributed across multiple cores for storage and computation efficiency.


Soft-Impute-ALS

Back to matrix imputation.

1. Initialize U_{n×r}, V_{m×r} orthogonal, D_{r×r} > 0 diagonal, and A = UD, B = VD.
2. Given U and D, and hence A = UD, update B:
   2.a Compute the current imputation:

     X* = P_Ω(X) + P_Ω^⊥(AB')
        = [P_Ω(X) - P_Ω(AB')] + U D^2 V'
   2.b Ridge regression of X* on A:
     B' ← (D^2 + λI)^{-1} D U' X*
        = D_1 U' [P_Ω(X) - P_Ω(AB')] + D_2 V'
   2.c Reorthogonalize, and update V, D and U via the SVD of BD.
3. Given V and D, and hence B = VD, update A in similar fashion.
4. At convergence, U and V provide the SVD of X*, and hence S_λ(X*), which cleans up the rank of the solution.
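A hedged, dense sketch of the B-update (steps 2.a-2.c), with observed entries marked by a logical mask obs (our convention); the actual algorithm works with the sparse-plus-low-rank form of X* and never builds it.

```r
softimpute_als_update_B <- function(X, obs, U, D, V, lambda) {
  A <- U %*% D
  B <- V %*% D
  Xstar <- A %*% t(B)                    # start from P_Omega_perp(AB')
  Xstar[obs] <- X[obs]                   # overwrite observed entries: P_Omega(X)
  d <- diag(D)
  # 2.b ridge regression of Xstar on A = UD:  B' <- (D^2 + lambda I)^{-1} D U' Xstar
  Bnew <- t(Xstar) %*% U %*% diag(d / (d^2 + lambda), nrow = length(d))
  # 2.c reorthogonalize via the SVD of (Bnew D): new V and D, and a rotated U
  sv <- svd(Bnew %*% D)
  list(U = U %*% sv$v, D = diag(sqrt(sv$d), nrow = length(d)), V = sv$u)
}
```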


Timing Comparisons

Timings were computed on a Linux cluster with 300Gb of RAM (with a fairly liberal relative convergence criterion of 0.001), using the softImpute package in R.

[Figure: relative objective (log scale) versus time for ALS and softImpute-ALS. Left: Netflix (480K, 18K), λ=100, r=100, time in hours. Right: MovieLens 10M (72K, 10K), λ=50, r=100, time in minutes.]

Figure 3: Left: timing results on the Netflix matrix, comparing ALS with softImpute-ALS. Right: timing on the MovieLens 10M matrix. In both cases we see that while ALS makes bigger gains per iteration, each iteration is much more costly.

Figure 3 (left panel) gives timing comparison results for one of the Netflix fits, this time implemented in Matlab. The right panel gives timing results on the smaller MovieLens 10M matrix. In these applications we need not get a very accurate solution, and so early stopping is an attractive option. softImpute-ALS reaches a solution close to the minimum in about 1/4 the time it takes ALS.

R Package softImpute

We have developed an R package softImpute for fitting these models [3], which is available on CRAN. The package implements both softImpute and softImpute-ALS. It can accommodate large matrices if the number of missing entries is correspondingly large, by making use of sparse-matrix formats. There are functions for centering and scaling (see Section 8), and for making …


Software Implementations

- softImpute package in R. Can deal with large sparse complete matrices, or large matrices with many missing entries (i.e. Netflix or bigger). Includes row and column centering and scaling options; a hedged usage sketch follows this list.

- Spark cluster programming. Uses distributed computing and chunking. Can deal with very large problems (e.g. 10^7 × 10^7, 139 secs per iteration). See http://git.io/sparkfastals with documentation in Scala.
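A hedged usage sketch of the CRAN package (the calls below follow our reading of the package documentation, not these slides):

```r
# install.packages("softImpute")
library(softImpute)

set.seed(1)
X <- matrix(rnorm(200 * 100), 200, 100)
X[sample(length(X), 0.3 * length(X))] <- NA   # punch 30% holes; NAs mark missing

fit  <- softImpute(X, rank.max = 30, lambda = 10, type = "als")
Xhat <- complete(X, fit)                      # impute the missing entries
# biScale() can be applied first for row/column centering and scaling.
```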


Thank You!
