Superlinear Convergence of Randomized Block Lanczos Algorithm
Qiaochu Yuan, Ming Gu, Bo Li
Department of Mathematics, UC Berkeley, Berkeley, CA, USA

Abstract—The low rank approximation of matrices is a crucial component in many data mining applications today. A competitive algorithm for this class of problems is the randomized block Lanczos algorithm - an amalgamation of the traditional block Lanczos algorithm with a randomized starting matrix. While empirically this algorithm performs quite well, there have been scant new theoretical results on its convergence behavior and approximation accuracy, and past results have been restricted to certain parameter settings. In this paper, we present a unified singular value convergence analysis for this algorithm, for all valid choices of the block size parameter. We present novel results on the rate of singular value convergence and show that under certain spectrum regimes, the convergence is superlinear. Additionally, we provide results from numerical experiments that validate our analysis.

Index Terms—low-rank approximation, randomized block Lanczos, block size, singular values.

I. INTRODUCTION

The low rank approximation of matrices is a crucial component in many data mining applications today. In addition to functioning as a stand-alone technique for dimensionality reduction [1], denoising [2], signal processing [3], data compression [4], and more, it has also been incorporated into more complex algorithms as a computational subroutine [5], [6]. As part of large scale modern data processing, low rank approximations help to reveal important structural information in the raw data and to transform the data into forms that are more efficient for computation, transmission, and storage.

The singular value decomposition (SVD) is a matrix factorization of both theoretical and practical importance, and it has a number of useful properties related to matrix nearness and rank. In particular, it is used to identify nearby matrices of lower rank, and, leaving aside the question of computational complexity, it is known that the rank-$k$ truncated SVD is the "gold standard" for approximating a matrix by another matrix of rank at most $k$ [7].

While procedures for computing the exact rank-$k$ truncated SVD have existed since the 1960s [8], the computational cost of these algorithms is prohibitive at the scale of many of today's datasets. The recent applications of low rank matrix approximation techniques to big-data problems differ in both the computational efficiency requirement and the accuracy requirement of the algorithms. Firstly, we are increasingly leaving behind the era of moderately sized matrices and entering an age of web-scale datasets and big-data applications. The matrices arising from such applications are often extraordinarily large, exceeding the order of $10^6$ in one or both of the dimensions [9]–[11], and place much higher computational efficiency demands on the algorithms. Secondly, while the truncated SVD may be the final desired object for previous scientific computing questions, for big-data applications it is usually an intermediate representation for the overall classification or regression task. Empirically, the final accuracy of the task only weakly depends on the accuracy of the matrix approximation [12]. Thus, while previous variants of truncated SVD algorithms focused on computing up to full double precision, newer iterations of these algorithms aimed at big-data applications can comfortably get by with only 2-3 digits of accuracy.

These considerations have led to the development of randomized variants of traditional SVD algorithms suited to large, sparse matrices, in particular randomized subspace iteration (RSI) and randomized block Lanczos (RBL) [13]–[16]. By applying either a randomized sketching or projecting operation on the original matrix, these algorithms balance reducing computational complexity with producing an acceptably accurate approximation. While empirically they have been shown to be effective and have been widely adopted by popular software packages, e.g. [17], there has been scant new theoretical work on the convergence guarantees of the latter algorithm, the better performing but more complicated randomized block Lanczos algorithm.

In this paper, we present novel theoretical convergence results concerning the rate of singular value convergence for the RBL algorithm, along with numerical experiments supporting these results. Our analysis presents a unified singular value convergence theory for variants of the block Lanczos algorithm, for all valid parameter choices of block size $b$. To our knowledge, all previous results in the literature are applicable only for the choice of $b \ge k$, where $k$ is the target rank. We present a generalized theorem, applicable to all block sizes $b$, which coincides asymptotically with previous results for the case $b \ge k$, while providing equally strong rates of convergence for the case $b < k$.

In Section II, we present the randomized block Lanczos algorithm and discuss some previous convergence results for this algorithm. In Section III, we dive into our main theoretical result and its derivation, followed by corollaries for special cases. In Section IV, we investigate the behavior of this algorithm for different parameter settings and empirically verify the results of the previous section. Finally, we give concluding remarks in Section V.
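To make the "gold standard" remark above concrete: by the Eckart–Young theorem, the rank-$k$ truncated SVD attains the smallest possible spectral-norm error among all matrices of rank at most $k$, namely $\sigma_{k+1}(\mathbf{A})$. The following is a minimal numpy sketch of this fact, added here purely for illustration (it is not part of the paper's exposition):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 60))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]    # rank-k truncated SVD of A

# Eckart-Young: no rank-k matrix is closer to A in spectral norm than A_k,
# and the optimal error equals the (k+1)st singular value of A.
err = np.linalg.norm(A - A_k, ord=2)
print(np.isclose(err, s[k]))            # True
```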
II. BACKGROUND

A. Preliminaries

Throughout this paper, our analysis assumes exact arithmetic.

We denote matrices by bold-faced uppercase letters, e.g. $\mathbf{M}$; entries of matrices by the plain-faced lowercase letter that the entry belongs to, e.g. $m_{11}$; and block submatrices by the bold-faced or script-faced uppercase letter that the submatrix belongs to, subscripted by position or by dimensions, e.g. $\mathbf{M}_{11}$, $\mathcal{M}_{11}$, or $\mathbf{M}_{a \times b}$. Double numerical subscripts denote the position of the element or the submatrix, i.e. $\mathbf{M}_{11}$ and $m_{11}$ are the topmost leftmost subblock or entry of $\mathbf{M}$ respectively. $m \times n$ subscripts denote the dimensions of a submatrix, when such information is relevant, i.e. $\mathbf{M}_{a \times b}$ denotes a subblock of $\mathbf{M}$ that has dimensions $a \times b$.

Constants are denoted by script-faced uppercase or lowercase letters, e.g. $\mathcal{C}$ or $\alpha$, when they are asymptotically insignificant, i.e. constant with respect to the convergence parameter.

The SVD of a matrix $\mathbf{A}$ is defined as the factorization

    $\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T$    (1)

where $\mathbf{U} = [\mathbf{u}_1 \cdots \mathbf{u}_n]$ and $\mathbf{V} = [\mathbf{v}_1 \cdots \mathbf{v}_n]$ are orthogonal matrices whose columns are the sets of left and right singular vectors respectively, and $\boldsymbol{\Sigma}$ is a diagonal matrix whose entries $\Sigma_{ii} = \sigma_i$ are the singular values, ordered descendingly $\sigma_1 \ge \cdots \ge \sigma_n \ge 0$.

The rank-$k$ truncated SVD of a matrix is defined as

    $\mathrm{svd}_k(\mathbf{A}) = \mathbf{U}_k \boldsymbol{\Sigma}_k \mathbf{V}_k^T$    (2)

where $\mathbf{U}_k = [\mathbf{u}_1 \cdots \mathbf{u}_k]$ and $\mathbf{V}_k = [\mathbf{v}_1 \cdots \mathbf{v}_k]$ contain the first $k$ left and right singular vectors respectively, and $\boldsymbol{\Sigma}_k = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$.

The $i$th singular value of an arbitrary matrix $\mathbf{M}$ is denoted by $\sigma_i(\mathbf{M})$, or simply $\sigma_i$ when the matrix in question is clear from context.

The $p$th degree Chebyshev polynomial is defined by the recurrence

    $T_0(x) \equiv 1$    (3)
    $T_1(x) \equiv x$    (4)
    $T_p(x) \equiv 2x T_{p-1}(x) - T_{p-2}(x)$    (5)

Alternatively, it may be expressed as

    $T_p(x) = \frac{1}{2} \left[ \left( x + \sqrt{x^2 - 1} \right)^p + \left( x + \sqrt{x^2 - 1} \right)^{-p} \right]$    (6)

for $|x| > 1$, and estimated as

    $T_p(1 + \epsilon) \approx \frac{1}{2} \left( 1 + \epsilon + \sqrt{2\epsilon} \right)^p$    (7)
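As a quick numerical sanity check on (3)–(7), the following numpy snippet (added for illustration, not from the original paper) compares the recurrence, the closed form, and the growth estimate. The rapid growth of $T_p(1+\epsilon)$ outside $[-1, 1]$ is precisely what drives Lanczos convergence rates:

```python
import numpy as np

def cheb_recurrence(p, x):
    """Evaluate T_p(x) via the three-term recurrence (3)-(5)."""
    t_prev, t_curr = 1.0, x               # T_0(x) = 1, T_1(x) = x
    if p == 0:
        return t_prev
    for _ in range(2, p + 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

def cheb_closed_form(p, x):
    """Evaluate T_p(x) via the closed form (6), valid for |x| > 1."""
    s = x + np.sqrt(x**2 - 1)
    return 0.5 * (s**p + s**(-p))

p, eps = 10, 0.05
print(cheb_recurrence(p, 1 + eps))              # exact value
print(cheb_closed_form(p, 1 + eps))             # matches the recurrence
print(0.5 * (1 + eps + np.sqrt(2 * eps)) ** p)  # estimate (7), close for small eps
```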
B. The Algorithm

The randomized block Lanczos algorithm is a straightforward combination of the classical block Lanczos algorithm [18] with the added element of a randomized starting matrix $\mathbf{V} = \mathbf{A}\boldsymbol{\Omega}$.

The pseudocode for this algorithm is outlined in Algorithm 1. Of the parameters of the algorithm, $k$ (target rank) is problem dependent, while $b$ (block size) and $q$ (number of iterations) are chosen by the user to control the quality and computational cost of the approximation. The algorithm requires the choices of $b$, $q$ to satisfy $qb \ge k$, to ensure that the Krylov subspace is at least $k$-dimensional.

Algorithm 1 Randomized block Lanczos algorithm pseudocode

Input: $\mathbf{A} \in \mathbb{R}^{m \times n}$; $\boldsymbol{\Omega} \in \mathbb{R}^{n \times b}$, a random Gaussian matrix; $k$, target rank; $b$, block size; $q$, number of Lanczos iterations
Output: $\mathbf{B}_k \in \mathbb{R}^{m \times n}$, a rank-$k$ approximation to $\mathbf{A}$
1: Form the block column Krylov subspace matrix $\mathbf{K} = [\mathbf{A}\boldsymbol{\Omega} \;\; (\mathbf{A}\mathbf{A}^T)\mathbf{A}\boldsymbol{\Omega} \;\; \cdots \;\; (\mathbf{A}\mathbf{A}^T)^q \mathbf{A}\boldsymbol{\Omega}]$.
2: Compute an orthonormal basis $\mathbf{Q}$ for the column span of $\mathbf{K}$, using e.g. QR factorization: $\mathbf{Q} \leftarrow \mathrm{qr}(\mathbf{K})$.
3: Project $\mathbf{A}$ onto the Krylov subspace by computing $\mathbf{B} = \mathbf{Q}\mathbf{Q}^T\mathbf{A}$.
4: Compute the $k$-truncated SVD $\mathbf{B}_k = \mathrm{svd}_k(\mathbf{B}) = \mathrm{svd}_k(\mathbf{Q}\mathbf{Q}^T\mathbf{A}) = \mathbf{Q} \cdot \mathrm{svd}_k(\mathbf{Q}^T\mathbf{A})$.
5: Return $\mathbf{B}_k$.

We present the algorithm pseudocode in this form in order to highlight the mathematical ideas that are at the core of this algorithm. It is well known that a naive implementation of any Lanczos algorithm is plagued by loss of orthogonality of the Lanczos vectors due to roundoff errors [19]. A practical implementation of Algorithm 1 should involve, at the very least, a reorganization of the computation to use the three-term recurrence and bidiagonalization [20], and reorthogonalizations of the Lanczos vectors at each step using one of the numerous schemes that have been proposed [20]–[22].

C. Previous Work

Historically, the classical Lanczos algorithm was developed as an eigenvalue algorithm for symmetric matrices. Its convergence analysis focused on theorems concerning the approximation quality of the approximant's eigenvalues as a function of $k$, the target rank. The analysis relied heavily on the analysis of the $k$-dimensional Krylov subspace and the choice of the associated $k$-degree Chebyshev polynomial. Classical results in this line of inquiry include those by Kaniel [23], Paige [24], Underwood [25], and Saad [26].

More recently, while there has been much work on the analysis of randomized algorithms, such efforts have been focused mostly on RBL's simpler cousins, such as randomized power iteration or randomized subspace iteration [12], [15].
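For concreteness, here is a minimal dense-arithmetic numpy sketch of Algorithm 1. It is an added illustration that forms $\mathbf{K}$ explicitly for clarity, not the authors' implementation; as noted in Section II-B, a practical version would use the three-term recurrence with reorthogonalization instead:

```python
import numpy as np

def randomized_block_lanczos(A, k, b, q, seed=None):
    """Naive dense sketch of Algorithm 1: returns a rank-k approximation of A.

    Forms the block Krylov matrix K explicitly; a practical implementation
    would use the three-term Lanczos recurrence with reorthogonalization
    to avoid loss of orthogonality from roundoff.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    Omega = rng.standard_normal((n, b))   # random Gaussian starting matrix
    V = A @ Omega
    blocks = [V]
    for _ in range(q):                    # K = [AW, (AA^T)AW, ..., (AA^T)^q AW]
        V = A @ (A.T @ V)
        blocks.append(V)
    K = np.hstack(blocks)
    Q, _ = np.linalg.qr(K)                # orthonormal basis for span(K)
    C = Q.T @ A                           # B = QQ^T A, represented via C = Q^T A
    Uc, s, Vt = np.linalg.svd(C, full_matrices=False)
    # B_k = Q * svd_k(Q^T A)
    return (Q @ Uc[:, :k] * s[:k]) @ Vt[:k, :]

# Usage: the top k singular values of B_k approximate those of A (here qb >= k).
A = np.random.default_rng(0).standard_normal((200, 100))
B_k = randomized_block_lanczos(A, k=10, b=5, q=4)
```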