Matrices with Hierarchical Low-Rank Structures
Jonas Ballani and Daniel Kressner
ANCHP, EPF Lausanne, Station 8, CH-1015 Lausanne, Switzerland
e-mail: [email protected], [email protected]

The first author has been supported by an EPFL fellowship through the European Union's Seventh Framework Programme under grant agreement no. 291771.

Summary. Matrices with low-rank off-diagonal blocks are a versatile tool to perform matrix compression and to speed up various matrix operations, such as the solution of linear systems. Often, the underlying block partitioning is described by a hierarchical partitioning of the row and column indices, thus giving rise to hierarchical low-rank structures. The goal of this chapter is to provide a brief introduction to these techniques, with an emphasis on linear algebra aspects.

1 Introduction

1.1 Sparsity versus data-sparsity

An $n \times n$ matrix $A$ is called sparse if $\mathrm{nnz}(A)$, the number of its nonzero entries, is much smaller than $n^2$. Sparsity reduces the storage requirements to $O(\mathrm{nnz}(A))$ and, likewise, the complexity of matrix-vector multiplications. Unfortunately, sparsity is fragile; it is easily destroyed by operations with the matrix. For example, the factors of an LU factorization of $A$ usually have (many) more nonzeros than $A$ itself. This so-called fill-in can be reduced, sometimes significantly, by reordering the matrix prior to the factorization [25]. Based on existing numerical evidence, it is probably safe to say that such sparse factorizations constitute the methods of choice for (low-order) finite element or finite difference discretizations of two-dimensional partial differential equations (PDEs). They are less effective for three-dimensional PDEs. More severely, sparse factorizations are of limited use when the inverse of $A$ is explicitly needed, as for example in inverse covariance matrix estimation and in matrix iterations for computing matrix functions. For nearly all sparsity patterns of practical relevance, $A^{-1}$ is a completely dense matrix. In certain situations, this issue can be addressed effectively by considering approximate sparsity: $A^{-1}$ is replaced by a sparse matrix $M$ such that $\|A^{-1} - M\| \le \mathrm{tol}$ for some prescribed tolerance $\mathrm{tol}$ and an appropriate matrix norm $\|\cdot\|$.

To illustrate the concepts mentioned above, let us consider the tridiagonal matrix

$$
A = (n+1)^2 \begin{pmatrix} 2 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{pmatrix} + s(n+1)^2 I_n, \qquad (1)
$$

where the shift $s \ge 0$ controls the condition number $\kappa(A) = \|A\|_2 \|A^{-1}\|_2$. The intensity plots in Figure 1 reveal the magnitude of the entries of $A^{-1}$; entries of magnitude below $10^{-15}$ are white. It can be concluded that $A^{-1}$ can be very well approximated by a banded matrix if its condition number is small.

Fig. 1. Approximate sparsity of $A^{-1}$ for the matrix $A$ from (1) with $n = 50$ and different values of $s$: (a) $s = 4$, $\kappa(A) \approx 2$; (b) $s = 1$, $\kappa(A) \approx 5$; (c) $s = 0$, $\kappa(A) \approx 10^3$.

This is in accordance with a classical result [26, Proposition 2.1], which states for any symmetric positive definite tridiagonal matrix $A$ that

$$
\bigl| [A^{-1}]_{ij} \bigr| \le C \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^{2|i-j|}, \qquad
C = \max\Bigl\{ \lambda_{\min}^{-1},\ (2\lambda_{\max})^{-1} \bigl( 1 + \sqrt{\kappa(A)} \bigr)^2 \Bigr\}, \qquad (2)
$$

where $\lambda_{\min}, \lambda_{\max}$ denote the smallest/largest eigenvalues of $A$; see also [15] for recent extensions. This implies that the entries of $A^{-1}$ decay exponentially away from the diagonal, but this decay may deteriorate as $\kappa(A)$ increases.
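To make the bound concrete, the following sketch (our own NumPy code, not code from the chapter) builds the matrix $A$ from (1) and checks the entrywise decay estimate (2) numerically; all variable names are our choices.

```python
import numpy as np

# Build the tridiagonal matrix A from (1); n = 50 and s = 1 match panel (b)
# of Fig. 1.
n, s = 50, 1.0
A = (n + 1) ** 2 * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) \
    + s * (n + 1) ** 2 * np.eye(n)

Ainv = np.linalg.inv(A)
lam = np.linalg.eigvalsh(A)                # eigenvalues in ascending order
lam_min, lam_max = lam[0], lam[-1]
kappa = lam_max / lam_min                  # = ||A||_2 ||A^{-1}||_2 for SPD A

# Right-hand side of (2): C * rho^(2|i-j|), rho = (sqrt(k)-1)/(sqrt(k)+1).
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
C = max(1 / lam_min, (1 + np.sqrt(kappa)) ** 2 / (2 * lam_max))
i, j = np.indices((n, n))
bound = C * rho ** (2 * np.abs(i - j))
print("bound (2) holds:", bool(np.all(np.abs(Ainv) <= bound * (1 + 1e-12))))
```

Plotting `np.log10(np.abs(Ainv))` for different values of $s$ reproduces the qualitative behavior of Figure 1.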
Although the bound (2) can be pessimistic, it correctly predicts that approximate sparsity is of limited use for ill-conditioned matrices, such as those arising from the discretization of (one-dimensional) PDEs.

An $n \times n$ matrix is called data-sparse if it can be represented by much fewer than $n^2$ parameters with respect to a certain format. Data-sparsity is much more general than sparsity. For example, any matrix of rank $r \ll n$ can be represented with $2nr$ (or even fewer) parameters by storing its low-rank factors. The inverse of the tridiagonal matrix (1) is obviously not a low-rank matrix, simply because it is invertible. However, any of its off-diagonal blocks has rank 1. One of the many ways to see this is to use the Sherman-Morrison formula. Consider the following decomposition of a general symmetric positive definite tridiagonal matrix:

$$
A = \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}
- a_{n_1,n_1+1} \begin{pmatrix} e_{n_1} \\ -e_1 \end{pmatrix} \begin{pmatrix} e_{n_1} \\ -e_1 \end{pmatrix}^T,
\qquad A_{11} \in \mathbb{R}^{n_1 \times n_1},\ A_{22} \in \mathbb{R}^{n_2 \times n_2},
$$

where $e_j$ denotes the $j$th unit vector of appropriate length. Then

$$
A^{-1} = \begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{pmatrix}
+ \frac{a_{n_1,n_1+1}}{1 + e_{n_1}^T A_{11}^{-1} e_{n_1} + e_1^T A_{22}^{-1} e_1}
\begin{pmatrix} w_1 \\ w_2 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}^T, \qquad (3)
$$

where $w_1 = A_{11}^{-1} e_{n_1}$ and $w_2 = -A_{22}^{-1} e_1$. Hence, the off-diagonal blocks of $A^{-1}$ have rank at most 1. Assuming that $n$ is an integer multiple of 4, let us partition

$$
A = \begin{pmatrix}
\begin{matrix} A_{11}^{(2)} & A_{12}^{(2)} \\ A_{21}^{(2)} & A_{22}^{(2)} \end{matrix} & A_{12} \\
A_{21} & \begin{matrix} A_{33}^{(2)} & A_{34}^{(2)} \\ A_{43}^{(2)} & A_{44}^{(2)} \end{matrix}
\end{pmatrix}, \qquad
A^{-1} = \begin{pmatrix}
\begin{matrix} B_{11}^{(2)} & B_{12}^{(2)} \\ B_{21}^{(2)} & B_{22}^{(2)} \end{matrix} & B_{12} \\
B_{21} & \begin{matrix} B_{33}^{(2)} & B_{34}^{(2)} \\ B_{43}^{(2)} & B_{44}^{(2)} \end{matrix}
\end{pmatrix}, \qquad (4)
$$

such that $A_{ij}^{(2)}, B_{ij}^{(2)} \in \mathbb{R}^{n/4 \times n/4}$. By storing only the factors of the rank-1 off-diagonal blocks of $A^{-1}$ and exploiting symmetry, it follows that, in addition to the 4 diagonal blocks $B_{jj}^{(2)}$ of size $n/4 \times n/4$, only $2 \cdot n/2 + 4 \cdot n/4 = 2n$ entries are needed to represent $A^{-1}$. Assuming that $n = 2^k$ and partitioning recursively, it follows that the whole matrix $A^{-1}$ can be stored with

$$
2 \cdot \frac{n}{2} + 4 \cdot \frac{n}{4} + \cdots + 2^k \cdot \frac{n}{2^k} + n = n(\log_2 n + 1) \qquad (5)
$$

parameters.

The logarithmic factor in (5) can actually be removed by exploiting the nestedness of the low-rank factors. To see this, we again consider the partitioning (4) and let $U_j^{(2)} \in \mathbb{R}^{n/4 \times 2}$, $j = 1,\ldots,4$, be orthonormal bases such that

$$
\operatorname{span}\bigl\{ (A_{jj}^{(2)})^{-1} e_1,\ (A_{jj}^{(2)})^{-1} e_{n/4} \bigr\} \subseteq \operatorname{range}(U_j^{(2)}).
$$

Applying the Sherman-Morrison formula (3) to $A_{11} = \begin{pmatrix} A_{11}^{(2)} & A_{12}^{(2)} \\ A_{21}^{(2)} & A_{22}^{(2)} \end{pmatrix}$ shows

$$
A_{11}^{-1} e_1 \in \operatorname{range} \begin{pmatrix} U_1^{(2)} & 0 \\ 0 & U_2^{(2)} \end{pmatrix}, \qquad
A_{11}^{-1} e_{n/2} \in \operatorname{range} \begin{pmatrix} U_1^{(2)} & 0 \\ 0 & U_2^{(2)} \end{pmatrix}.
$$

Similarly,

$$
A_{22}^{-1} e_1 \in \operatorname{range} \begin{pmatrix} U_3^{(2)} & 0 \\ 0 & U_4^{(2)} \end{pmatrix}, \qquad
A_{22}^{-1} e_{n/2} \in \operatorname{range} \begin{pmatrix} U_3^{(2)} & 0 \\ 0 & U_4^{(2)} \end{pmatrix}.
$$

If we let $U_j \in \mathbb{R}^{n/2 \times 2}$, $j = 1,2$, be orthonormal bases such that

$$
\operatorname{span}\bigl\{ A_{jj}^{-1} e_1,\ A_{jj}^{-1} e_{n/2} \bigr\} \subseteq \operatorname{range}(U_j),
$$

then there exist matrices $X_j \in \mathbb{R}^{4 \times 2}$ such that

$$
U_j = \begin{pmatrix} U_{2j-1}^{(2)} & 0 \\ 0 & U_{2j}^{(2)} \end{pmatrix} X_j. \qquad (6)
$$

Hence, there is no need to store the bases $U_1, U_2 \in \mathbb{R}^{n/2 \times 2}$ explicitly; the availability of $U_j^{(2)}$ and the small matrices $X_1, X_2$ suffices. In summary, we can represent $A^{-1}$ as

$$
\begin{pmatrix}
\begin{matrix} B_{11}^{(2)} & U_1^{(2)} S_{12}^{(2)} (U_2^{(2)})^T \\ U_2^{(2)} (S_{12}^{(2)})^T (U_1^{(2)})^T & B_{22}^{(2)} \end{matrix} & U_1 S_{12} U_2^T \\
U_2 S_{12}^T U_1^T & \begin{matrix} B_{33}^{(2)} & U_3^{(2)} S_{34}^{(2)} (U_4^{(2)})^T \\ U_4^{(2)} (S_{34}^{(2)})^T (U_3^{(2)})^T & B_{44}^{(2)} \end{matrix}
\end{pmatrix} \qquad (7)
$$

for some matrices $S_{12}, S_{ij}^{(2)} \in \mathbb{R}^{2 \times 2}$. The number of entries needed for representing the off-diagonal blocks of $A^{-1}$ is thus

$$
\underbrace{4 \cdot (2n/4)}_{\text{for } U_j^{(2)}} + \underbrace{2 \cdot 8}_{\text{for } X_j} + \underbrace{(2+1) \cdot 4}_{\text{for } S_{12},\ S_{12}^{(2)},\ S_{34}^{(2)}}.
$$

Assuming that $n = 2^k$ and recursively repeating the described procedure $k-1$ times, we find that at most $O(n)$ total storage is needed for representing $A^{-1}$. Thus we have removed the logarithmic factor from (5), at the expense of a larger constant, since we now work with rank 2 instead of rank 1.
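The rank-1 structure of the off-diagonal blocks and the nestedness relation (6) are easy to confirm numerically. The sketch below is our own illustration under the assumptions of the text ($A$ from (1), $n$ a multiple of 4); helper names such as `numrank` and `basis` are hypothetical.

```python
import numpy as np

# Matrix A from (1) with n divisible by 4.
n, s = 64, 1.0
A = (n + 1) ** 2 * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) \
    + s * (n + 1) ** 2 * np.eye(n)
B = np.linalg.inv(A)
h, q = n // 2, n // 4

def numrank(M, tol=1e-10):
    sv = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(sv > tol * sv[0]))

# (i) Off-diagonal blocks of A^{-1} have rank 1, cf. (3) and (4):
# these are B_12, B_12^(2), B_34^(2) in the notation of (4).
print(numrank(B[:h, h:]), numrank(B[:q, q:h]), numrank(B[h:h + q, h + q:]))

def basis(M):
    # Orthonormal basis U with span{M^{-1} e_1, M^{-1} e_m} <= range(U).
    m = M.shape[0]
    W = np.linalg.solve(M, np.eye(m)[:, [0, m - 1]])
    return np.linalg.qr(W)[0]

# (ii) Nestedness (6) for j = 1: U_1 = blkdiag(U_1^(2), U_2^(2)) X_1.
A11 = A[:h, :h]
U1 = basis(A11)                      # n/2 x 2
V = np.zeros((h, 4))                 # blkdiag(U_1^(2), U_2^(2))
V[:q, :2] = basis(A11[:q, :q])       # U_1^(2)
V[q:, 2:] = basis(A11[q:, q:])       # U_2^(2)
X1 = V.T @ U1                        # 4 x 2 transfer matrix
print(np.linalg.norm(V @ X1 - U1))   # residual at machine-precision level
```

The last line confirms that $U_1$ need not be stored explicitly: the level-2 bases together with the small transfer matrix $X_1$ reproduce it.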
Remark 1. The format (7) together with (6) is a special case of the HSS matrices that will be discussed in Section 4.1. The related but different concepts of semi-separable matrices [65] and sequentially semi-separable matrices [21] allow one to represent the inverse of a tridiagonal matrix $A$ with nested rank-1 factors and consequently reduce the required storage to two vectors of length $n$.

For a number of reasons, the situation discussed above is simplistic and not fully representative of the type of applications that can be addressed with the hierarchical low-rank structures covered in this chapter. First, sparsity of the original matrix can be beneficial, but it is not needed in the context of low-rank techniques. Second, exact low rank is a rare phenomenon, and we will nearly always work with low-rank approximations instead. Third, the recursive partitioning (4) is closely tied to banded matrices and related structures, which arise, for example, from the discretization of one-dimensional PDEs. More general partitionings need to be used when dealing with structures that arise from the discretization of higher-dimensional PDEs. A small numerical illustration of the last two points follows.
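As our own illustrative sketch (not an example from the chapter), consider the standard five-point discretization of the two-dimensional Laplacian, written in Kronecker form. The off-diagonal blocks of its inverse are no longer of exact low rank, but their singular values decay, so low-rank approximation still applies. The helper `eps_rank` is hypothetical; by the Eckart-Young theorem it returns the smallest rank achieving a relative 2-norm error below a given tolerance, read off from the singular values.

```python
import numpy as np

# 2D Laplacian on an m x m grid via Kronecker products (size n2 = m^2).
m = 16
I = np.eye(m)
T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
A2 = np.kron(I, T) + np.kron(T, I)
n2 = m * m

B = np.linalg.inv(A2)
M = B[: n2 // 2, n2 // 2:]            # off-diagonal block of the inverse

def eps_rank(M, tol):
    # Smallest r with ||M - M_r||_2 <= tol * ||M||_2 (Eckart-Young).
    sv = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(sv > tol * sv[0]))

for tol in (1e-2, 1e-4, 1e-8):
    print(tol, eps_rank(M, tol))      # rank grows slowly as tol decreases
```

In contrast to the tridiagonal case, the printed ranks exceed 1 and grow as the tolerance decreases, which hints at why low-rank approximations and more general block partitionings play a central role in the remainder of this chapter.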