Computing the diagonal of the inverse of a sparse matrix
Yousef Saad
Department of Computer Science and Engineering, University of Minnesota
“Sparse Days”, Toulouse, June 15, 2010

Motivation: DMFT
‘Dynamic Mean Field Theory’: quantum mechanical studies of highly correlated particles
➤ Equation to be solved (repeatedly) is Dyson’s equation:
G(ω) = [(ω + µ)I − V − Σ(ω) + T]^{-1}
• ω (frequency) and µ (chemical potential) are real
• V = trap potential, a real diagonal
• Σ(ω) = local self-energy, a complex diagonal
• T is the hopping matrix (sparse, real)
➤ Interested only in the diagonal of G(ω); in addition, the equation must be solved self-consistently and ...
➤ ... must do this for many ω’s
➤ Related approach: Non Equilibrium Green’s Function (NEGF) approach used to model nanoscale transistors
➤ Many new applications of the diagonal of the inverse [and related problems]
➤ A few examples to follow
Introduction: A few examples
Problem 1: Compute Tr[A^{-1}], the trace of the inverse.
➤ Arises in cross validation:
‖(I − A(θ))g‖₂ / Tr(I − A(θ)), with A(θ) ≡ D(D^T D + θLL^T)^{-1}D^T,
D = blurring operator and L = regularization operator
➤ In [Hutchinson ’90] Tr[A^{-1}] is stochastically estimated
➤ Many authors addressed this problem.
Problem 2: Compute Tr[f(A)], f a certain function.
Arises in many applications in Physics. Example:
➤ Stochastic estimations of Tr(f(A)) extensively used by quantum chemists to estimate the Density of States, see [H. Röder, R. N. Silver, D. A. Drabold, J. J. Dong, Phys. Rev. B 55, 15382 (1997)]
Problem 3: Compute diag[A^{-1}], the diagonal of the inverse.
➤ Arises in Dynamic Mean Field Theory [DMFT, motivation for this work]. In DMFT, we seek the diagonal of a “Green’s function” which solves (self-consistently) Dyson’s equation [see J. Freericks 2005]
➤ Related approach: Non Equilibrium Green’s Function (NEGF) approach used to model nanoscale transistors
➤ In uncertainty quantification, the diagonal of the inverse of a covariance matrix is needed [Bekas, Curioni, Fedulova ’09]
Problem 4: Compute diag[f(A)], f a certain function.
➤ Arises in any density matrix approach in quantum modeling, for example Density Functional Theory.
➤ Here, f = Fermi-Dirac operator:
f(x) = 1 / (1 + exp((x − µ)/(k_B T)))
Note: when T → 0, f becomes a step function.
Note: if f is approximated by a rational function, then diag[f(A)] ≈ a linear combination of terms like diag[(A − σ_i I)^{-1}]
➤ Linear-Scaling methods based on approximating f(H) and Diag(f(H)) avoid ‘diagonalization’ of H
Methods based on the sparse LU factorization
➤ Basic reference: K. Takahashi, J. Fagan, and M.-S. Chin, Formation of a sparse bus impedance matrix and its application to short circuit study, in Proc. of the Eighth Inst. PICA Conf., Minneapolis, MN, IEEE Power Engineering Soc., 1973, pp. 63-69.
➤ Described in [Duff, Erisman, Reid, p. 273]
➤ Algorithm used by Erisman and Tinney [Num. Math. 1975]
➤ Main idea: If A = LDU and B = A^{-1}, then
B = U^{-1}D^{-1} + B(I − L);   B = D^{-1}L^{-1} + (I − U)B.
➤ Not all entries are needed to compute selected entries of B
➤ For example: consider the lower part, i > j; use the first equation:
b_ij = (B(I − L))_ij = − Σ_{k>j} b_ik l_kj
➤ Need entries b_ik of row i where l_kj ≠ 0, k > j
➤ “Entries of B belonging to the pattern of (L, U)^T can be extracted without computing any other entries outside the pattern.”
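The two identities above can be checked numerically. The sketch below (illustrative sizes; a small dense, diagonally dominant matrix so no pivoting is needed) builds an LDU factorization and verifies both B = U^{-1}D^{-1} + B(I − L) and the lower-part sum formula:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonally dominant

# Doolittle-style LDU factorization without pivoting
L, D, U = np.eye(n), np.zeros((n, n)), np.eye(n)
M = A.copy()
for k in range(n):
    D[k, k] = M[k, k]
    L[k+1:, k] = M[k+1:, k] / M[k, k]
    U[k, k+1:] = M[k, k+1:] / M[k, k]
    M[k+1:, k+1:] -= np.outer(L[k+1:, k], M[k, k+1:])   # Schur update
assert np.allclose(L @ D @ U, A)

B = np.linalg.inv(A)
# Identity: B = U^{-1} D^{-1} + B (I - L)
assert np.allclose(B, np.linalg.inv(U) @ np.linalg.inv(D) + B @ (np.eye(n) - L))
# Lower part, i > j: b_ij = - sum_{k>j} b_ik l_kj
i, j = 4, 1
assert np.isclose(B[i, j], -sum(B[i, k] * L[k, j] for k in range(j + 1, n)))
```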
➤ More recently exploited in a different form in: L. Lin, C. Yang, J. Meza, J. Lu, L. Ying, W. E, SelInv – An algorithm for selected inversion of a sparse symmetric matrix, Tech. Report, Princeton Univ.
➤ An algorithm based on a form of nested dissection is described in Li, Ahmed, Klimeck, Darve [2008]
➤ A close relative of this technique is presented in: L. Lin, J. Lu, L. Ying, R. Car, W. E, Fast algorithm for extracting the diagonal of the inverse matrix with application to the electronic structure analysis of metallic systems, Comm. Math. Sci., 2009.
➤ Difficulty: 3-D problems.
Stochastic Estimator
Notation:
• A = original matrix, B = A^{-1}
• δ(B) = diag(B) [matlab notation]
• D(B) = diagonal matrix with diagonal δ(B)
• ⊙ and ⊘: elementwise multiplication and division of vectors
• {v_j}: sequence of s random vectors
Result:
δ(B) ≈ [ Σ_{j=1}^{s} v_j ⊙ B v_j ] ⊘ [ Σ_{j=1}^{s} v_j ⊙ v_j ]
Refs: C. Bekas , E. Kokiopoulou & YS (’05), Recent: C. Bekas, A. Curioni, I. Fedulova ’09.
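A minimal sketch of this estimator with Rademacher (±1) probe vectors; the matrix, the number of samples s, and the use of an explicit inverse (instead of solves A x_j = v_j) are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 100, 4000
A = 5.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = np.linalg.inv(A)                     # in practice: solve A x_j = v_j instead

num = np.zeros(n)
den = np.zeros(n)
for _ in range(s):
    v = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
    num += v * (B @ v)                   # accumulate v_j (elementwise*) B v_j
    den += v * v                         # accumulate v_j (elementwise*) v_j
d_est = num / den                        # elementwise division

rel_err = np.linalg.norm(d_est - np.diag(B)) / np.linalg.norm(np.diag(B))
assert rel_err < 0.1
```

The accuracy depends on how large the off-diagonal entries of B are relative to its diagonal, which is why structured probe vectors (Hadamard columns, probing) can do much better than random ones.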
➤ Let V_s = [v_1, v_2, . . . , v_s]. Then, alternative expression:
D(B) ≈ D(B V_s V_s^T) D^{-1}(V_s V_s^T)
Question: When is this result exact?
Main Proposition
• Let V_s ∈ R^{n×s} with rows {v_{j,:}}, and B ∈ C^{n×n} with entries {b_jk}
• Assume that ⟨v_{j,:}, v_{k,:}⟩ = 0 for all j ≠ k such that b_jk ≠ 0. Then:
D(B) = D(B V_s V_s^T) D^{-1}(V_s V_s^T)
➤ Approximation to b_ij exact when rows i and j of V_s are ⊥

Ideas from information theory: Hadamard matrices
➤ Consider the matrix V; want the rows to be as ‘orthogonal as possible among each other’, i.e., want to minimize
E_rms = ‖I − V V^T‖_F / √(n(n − 1))   or   E_max = max_{i≠j} |(V V^T)_ij|
➤ Problems that arise in coding: find a code book [rows of V = code words] to minimize the ‘cross-correlation amplitude’
➤ Welch bounds:
E_rms ≥ √[(n − s)/((n − 1)s)],   E_max ≥ √[(n − s)/((n − 1)s)]
➤ Result: ∃ a sequence of s vectors v_k with binary entries which achieve the first Welch bound iff s = 2 or s = 4k.
➤ Hadamard matrices are a special class: n × n matrices with entries ±1 and such that HH^T = nI. Examples:

H_2 = | 1  1 |      H_4 = | 1  1  1  1 |
      | 1 -1 |            | 1 -1  1 -1 |
                          | 1  1 -1 -1 |
                          | 1 -1 -1  1 |
➤ Achieve both Welch bounds
➤ Can build larger Hadamard matrices recursively: given two Hadamard matrices H_1 and H_2, the Kronecker product H_1 ⊗ H_2 is a Hadamard matrix.
➤ Too expensive to use the whole matrix of size n
➤ Can use V_s = matrix of the s first columns of H_n
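The recursive Kronecker construction is a few lines of code. The sketch below (sizes illustrative) builds H_128 from H_2, checks HH^T = nI, and takes the first s columns as V_s:

```python
import numpy as np

H2 = np.array([[1,  1],
               [1, -1]])

H = H2
while H.shape[0] < 128:
    H = np.kron(H2, H)        # H_2 (x) H_n is again a Hadamard matrix
n = H.shape[0]
assert np.array_equal(H @ H.T, n * np.eye(n))   # defining property HH^T = nI

s = 32
Vs = H[:, :s]                 # V_s = first s columns of H_n
# the s columns are orthogonal: V_s^T V_s = n I_s
assert np.array_equal(Vs.T @ Vs, n * np.eye(s))
```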
[Figure: pattern of V_s V_s^T, for s = 32 and s = 64.]
A Lanczos approach
➤ Given a Hermitian matrix A, generate Lanczos vectors via:
β_{i+1} q_{i+1} = A q_i − α_i q_i − β_i q_{i−1}
with α_i, β_{i+1} selected s.t. ‖q_{i+1}‖₂ = 1 and q_{i+1} ⊥ q_i, q_{i+1} ⊥ q_{i−1}
➤ Result:
A Q_m = Q_m T_m + β_{m+1} q_{m+1} e_m^T
➤ When m = n, then A = Q_n T_n Q_n^T and A^{-1} = Q_n T_n^{-1} Q_n^T.
➤ For m < n, use the approximation A^{-1} ≈ Q_m T_m^{-1} Q_m^T →
D(A^{-1}) ≈ D[Q_m T_m^{-1} Q_m^T]
ALGORITHM 1: diagInv via Lanczos
For j = 1, 2, · · · , Do:
    β_{j+1} q_{j+1} = A q_j − α_j q_j − β_j q_{j−1}   [Lanczos step]
    p_j := q_j − η_j p_{j−1}
    δ_j := α_j − β_j η_j
    d_j := d_{j−1} + (p_j ⊙ p_j) / δ_j   [update of diag(inv(A))]
    η_{j+1} := β_{j+1} / δ_j
EndDo
➤ d_k (a vector) will converge to the diagonal of A^{-1}
➤ Limitation: often requires all n steps to converge
➤ One advantage: Lanczos is shift invariant, so can use this for many ω’s
➤ Potential: use as a direct method, exploiting sparsity
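A minimal sketch of Algorithm 1; the function name, the full reorthogonalization (added for numerical stability, not part of the algorithm as stated), and the test matrix are illustrative. The p/δ/η recurrence is the progressive LDL^T factorization of T_m, so d accumulates diag(Q_m T_m^{-1} Q_m^T):

```python
import numpy as np

def lanczos_diag_inv(A, m, seed=0):
    """Estimate diag(A^{-1}) via m Lanczos steps (Algorithm 1 sketch)."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    q_prev = np.zeros(n)
    Q = []                               # stored vectors for reorthogonalization
    beta = eta = 0.0
    p_prev = np.zeros(n)
    d = np.zeros(n)
    for j in range(m):
        Q.append(q)
        w = A @ q - beta * q_prev        # Lanczos step
        alpha = q @ w
        w -= alpha * q
        for qi in Q:                     # full reorthogonalization
            w -= (qi @ w) * qi
        beta_next = np.linalg.norm(w)
        p = q - eta * p_prev             # p_j := q_j - eta_j p_{j-1}
        delta = alpha - beta * eta       # delta_j := alpha_j - beta_j eta_j
        d += p * p / delta               # update of diag(inv(A))
        p_prev = p
        if beta_next < 1e-12:
            break                        # invariant subspace reached
        eta = beta_next / delta          # eta_{j+1} := beta_{j+1} / delta_j
        q_prev, q = q, w / beta_next
        beta = beta_next
    return d

# with m = n steps, d recovers diag(A^{-1}) up to roundoff
rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)              # small SPD test matrix
d = lanczos_diag_inv(A, 8)
assert np.allclose(d, np.diag(np.linalg.inv(A)))
```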
Using a sparse V: Probing
Goal: Find V_s such that (1) s is small and (2) V_s satisfies the Proposition (rows i & j orthogonal for any nonzero b_ij)
Difficulty: Can work only for sparse matrices, but B = A^{-1} is usually dense
➤ B can sometimes be approximated by a sparse matrix.
➤ Consider, for some ε:
(B_ε)_ij = b_ij if |b_ij| > ε,   0 if |b_ij| ≤ ε
➤ B_ε will be sparse under certain conditions, e.g., when A is diagonally dominant
➤ In what follows we assume B_ε is sparse and set B := B_ε.
➤ Its pattern will be required by standard probing methods.

Generic Probing Algorithm
ALGORITHM 2: Probing
Input: A, s. Output: matrix D(B)
Determine V_s := [v_1, v_2, . . . , v_s]
for j ← 1 to s:
    Solve A x_j = v_j
end
Construct X_s := [x_1, x_2, . . . , x_s]
Compute D(B) := D(X_s V_s^T) D^{-1}(V_s V_s^T)
➤ Note: rows of V_s are typically scaled so that the diagonal of V_s V_s^T equals 1, in which case D^{-1}(V_s V_s^T) = I.
Standard probing (e.g. to compute a Jacobian)
➤ Several names for the same method: “probing”, “CPR”, “sparse Jacobian estimators”, ...
Basis of the method: can compute Jacobian if a coloring of the columns is known so that no two columns of the same color overlap.
[Figure: all entries of the same color can be computed with one matvec. Example: for all blue entries, multiply B by the vector with ones in the blue columns (1, 3, 5, 12, 13, 16, 20) and zeros elsewhere.]
What about Diag(inv(A))?
➤ Define v_i = probing vector associated with color i:
[v_i]_k = 1 if color(k) == i, 0 otherwise
➤ Standard probing satisfies the requirement of the Proposition but...
➤ ... this coloring is not what is needed! [It is an overkill]
Alternative:
➤ Color the graph of B with a standard graph coloring algorithm [adjacency graph, not graph of column-overlaps]
Result: Graph coloring yields a valid set of probing vectors for D(B).
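A small sketch of this idea: greedily color the adjacency graph of a sparse B (here a hypothetical tridiagonal "pruned inverse", chosen for illustration), build the probing vectors from the colors, and observe that the probing formula recovers diag(B) exactly, as the Proposition guarantees:

```python
import numpy as np

def greedy_coloring(adj):
    """Greedy coloring of an adjacency structure {node: set(neighbors)}."""
    color = {}
    for u in sorted(adj):
        used = {color[v] for v in adj[u] if v in color}
        c = 0
        while c in used:
            c += 1
        color[u] = c
    return color

# Illustrative sparse B (tridiagonal pattern); its graph is a path: 2 colors
n = 8
B = np.diag(np.arange(1.0, n + 1)) + 0.3 * (np.diag(np.ones(n - 1), 1)
                                            + np.diag(np.ones(n - 1), -1))
adj = {i: {j for j in range(n) if j != i and B[i, j] != 0} for i in range(n)}
color = greedy_coloring(adj)
s = max(color.values()) + 1
V = np.zeros((n, s))
for k in range(n):
    V[k, color[k]] = 1.0          # [v_i]_k = 1 iff color(k) == i

# colors partition the nodes, so diag(V V^T) = 1 and D^{-1}(V V^T) = I
d = np.diag(B @ V @ V.T)
assert s == 2
assert np.allclose(d, np.diag(B))
```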
[Figure: a graph colored with two colors (red and black).]
Proof:
➤ Column v_c: one at each node i whose color is c, zero elsewhere.
➤ Row i of V_s: has a ’1’ in column c, where c = color(i), zero elsewhere.
➤ If b_ij ≠ 0, then nodes i and j are adjacent, so color(i) ≠ color(j). In matrix V_s:
• the i-th row has a ’1’ in column color(i), ’0’ elsewhere
• the j-th row has a ’1’ in column color(j), ’0’ elsewhere
➤ The two rows are orthogonal.
Example:
➤ Two colors required for this graph → two probing vectors
➤ Standard method: 6 colors [graph of B^T B]
Next Issue: Guessing the pattern of B
➤ Recall that we are dealing with B := B_ε [‘pruned’ B]
➤ Assume A diagonally dominant
➤ Write A = D − E, with D = D(A). Then:
A = D(I − F) with F ≡ D^{-1}E →
A^{-1} ≈ (I + F + F² + · · · + F^k) D^{-1} =: B^(k)
➤ When A is D.D., ‖F^k‖ decreases rapidly.
➤ Can approximate the pattern of B by that of B^(k) for some k.
➤ Interpretation in terms of paths of length k in the graph of A.
Q: How to select k?
A: Inspect A^{-1}e_j for some j.
➤ Values of the solution outside the pattern of B^(k) e_j should be small.
➤ If during calculations we get larger than expected errors, then redo with larger k, more colors, etc.
➤ Can we salvage what was done? Question still open.
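The Neumann-series pattern estimate can be sketched as follows; the matrix, ε, and k are illustrative. For this diagonally dominant A, the pattern of the pruned inverse B_ε falls inside the pattern of B^(k), i.e. inside paths of length ≤ k in the graph of A:

```python
import numpy as np

n = 12
# diagonally dominant tridiagonal test matrix
A = 4.1 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
D = np.diag(np.diag(A))
F = np.linalg.solve(D, D - A)            # F = D^{-1} E with E = D - A

k = 4
pattern = np.eye(n, dtype=bool)          # pattern of I + F + ... + F^k
P = np.eye(n)
for _ in range(k):
    P = P @ np.abs(F)
    pattern |= P > 1e-14

eps = 1e-3
B_eps = np.abs(np.linalg.inv(A)) > eps   # pattern of the pruned inverse B_eps

# pruned-inverse pattern is contained in the Neumann-series pattern
assert not np.any(B_eps & ~pattern)
```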
Preliminary experiments
Problem Setup
• DMFT: calculate the imaginary time Green’s function
• DMFT parameters: a set of physical parameters is provided
• DMFT loop: at most 10 outer iterations, each consisting of 62 inner iterations
• Each inner iteration: find D(B)
• Matrix: based on a five-point stencil with a_jj = µ + iω − V − s(j)
Probing Setup
• Probing tolerance: ε = 10^{-10}
• GMRES tolerance: δ = 10^{-12}
Results: CPU times (sec) for one inner iteration of DMFT.

  n →      21²    41²    61²    81²
  LAPACK   0.5    26     282    > 1000
  Lanczos  0.2    9.9    115    838
  Probing  0.02   0.19   0.79   2.0
[Figure: a few statistics for the case n = 81²: path length k, number of probing vectors, and number of GMRES iterations over the DMFT inner iterations.]

Challenge: The indefinite case
➤ The DMFT code deals with a separate case which uses a “real axis” sampling.
➤ Matrix A is no longer diagonally dominant; far from it.
➤ This is a much more challenging case.
➤ One option: solve A x_j = e_j FOR ALL j’s, with the ARMS solver using ddPQ ordering + exploit multiple right-hand sides
➤ More appealing: DD-type approaches
Divide & Conquer approach
Let A = a 5-point matrix (2-D problem), split roughly in two:

A = | A_1  −I                            |
    | −I   A_2  −I                       |
    |       ·    ·    ·                  |
    |      −I   A_k  −I                  |
    |           −I   A_{k+1}  −I         |
    |                  ·    ·    ·       |
    |                −I   A_{ny−1}  −I   |
    |                      −I    A_{ny}  |

where the A_j are tridiagonal. Write:

A = | A_11  A_12 |  =  | A_11       |  +  |       A_12 |
    | A_21  A_22 |     |       A_22 |     | A_21       |

with A_11 ∈ C^{m×m} and A_22 ∈ C^{(n−m)×(n−m)}.

➤ Observation:
A = | A_11 + E_1E_1^T                 |  −  | E_1E_1^T  E_1E_2^T |
    |                A_22 + E_2E_2^T  |     | E_2E_1^T  E_2E_2^T |

where E_1, E_2 are (relatively) small rank matrices:

E_1 := | 0 | ∈ C^{m×nx},   E_2 := | I | ∈ C^{(n−m)×nx}
       | I |                      | 0 |

Of the form A = C − EE^T, with
C := | C_1      |,   E := | E_1 |
     |      C_2 |         | E_2 |
➤ Idea: Use the Sherman-Morrison formula.
A^{-1} = C^{-1} + U G^{-1} U^T, with:
U = C^{-1}E ∈ C^{n×nx},   G = I_{nx} − E^T U ∈ C^{nx×nx}
D(A^{-1}) can be found from:

D(A^{-1}) = | D(C_1^{-1}) |  +  D(U G^{-1} U^T)
            | D(C_2^{-1}) |

where the D(C_i^{-1}) terms are computed by recursion.
➤ U: solve CU = E, i.e., solve C_1 U_1 = E_1 and C_2 U_2 = E_2 iteratively
➤ G: G = I_{nx} − E^T U = I_{nx} − E_1^T U_1 − E_2^T U_2
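One level of this divide & conquer step (no recursion) can be sketched as follows. The grid sizes are illustrative; dense solves stand in for the iterative solves of C_1 U_1 = E_1 and C_2 U_2 = E_2:

```python
import numpy as np

# 2-D 5-point matrix on an nx-by-ny grid, tridiagonal blocks, -I coupling
nx, ny = 4, 4
n = nx * ny
T = 4.0 * np.eye(nx) - np.diag(np.ones(nx - 1), 1) - np.diag(np.ones(nx - 1), -1)
A = np.kron(np.eye(ny), T) \
    - np.kron(np.diag(np.ones(ny - 1), 1), np.eye(nx)) \
    - np.kron(np.diag(np.ones(ny - 1), -1), np.eye(nx))

m = (ny // 2) * nx                        # split between block rows k and k+1
E1 = np.zeros((m, nx));     E1[-nx:, :] = np.eye(nx)   # last grid line of top
E2 = np.zeros((n - m, nx)); E2[:nx, :] = np.eye(nx)    # first grid line of bottom

# C = blkdiag(A11 + E1 E1^T, A22 + E2 E2^T), and A = C - E E^T, E = [E1; E2]
C = np.zeros((n, n))
C[:m, :m] = A[:m, :m] + E1 @ E1.T
C[m:, m:] = A[m:, m:] + E2 @ E2.T
E = np.vstack([E1, E2])
assert np.allclose(A, C - E @ E.T)

# Sherman-Morrison-Woodbury: A^{-1} = C^{-1} + U G^{-1} U^T
U = np.linalg.solve(C, E)                 # decouples: C1 U1 = E1, C2 U2 = E2
G = np.eye(nx) - E.T @ U
d = np.diag(np.linalg.inv(C)) + np.einsum('ij,jk,ik->i', U, np.linalg.inv(G), U)
assert np.allclose(d, np.diag(np.linalg.inv(A)))
```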
Domain Decomposition approach
[Figure: domain decomposition with p = 3 subdomains Ω_1, Ω_2, Ω_3 and interfaces Γ_12, Γ_23; zoom into subdomain 2.]

Under the usual ordering [interior points then interface points]:

A = | B_1                    F_1 |
    |      B_2               F_2 |      | B    F |
    |            ·            ·  |  ≡   | F^T  C |
    |                 B_p    F_p |
    | F_1^T F_2^T ··· F_p^T   C  |
[Figure: example of the pattern of a matrix A based on a DDM ordering with p = 4 subdomains (n = 25², nz = 3025).]
Inverse of A [assuming both B and S nonsingular], with S = C − F^T B^{-1} F:

A^{-1} = | B^{-1} + B^{-1}F S^{-1} F^T B^{-1}   −B^{-1}F S^{-1} |
         | −S^{-1} F^T B^{-1}                    S^{-1}         |

D(A^{-1}) = | D(B^{-1}) + D(B^{-1}F S^{-1} F^T B^{-1}) |
            | D(S^{-1})                                |
➤ Note: each diagonal block decouples from the others:
Inverse of A in the i-th block (domain):
(A^{-1})_ii = D(B_i^{-1}) + D(H_i S^{-1} H_i^T),   H_i = B_i^{-1} F_i
➤ Note: the only nonzero columns of F_i are those related to interface vertices.
➤ Approach similar to Divide and Conquer, but not recursive.
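The block formula above can be sketched for p = 2 subdomains. Block sizes and the random SPD test blocks are illustrative; each domain contributes its H_i = B_i^{-1} F_i independently to the Schur complement S:

```python
import numpy as np

rng = np.random.default_rng(0)
ni, nc = 5, 3                            # interior size per domain, interface size

def spd(k):                              # random SPD test block
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

B1, B2, C = spd(ni), spd(ni), spd(nc)
F1 = 0.5 * rng.standard_normal((ni, nc))
F2 = 0.5 * rng.standard_normal((ni, nc))

A = np.block([[B1, np.zeros((ni, ni)), F1],
              [np.zeros((ni, ni)), B2, F2],
              [F1.T, F2.T, C]])

H1 = np.linalg.solve(B1, F1)             # H_i = B_i^{-1} F_i, per domain
H2 = np.linalg.solve(B2, F2)
S = C - F1.T @ H1 - F2.T @ H2            # S = C - F^T B^{-1} F
Sinv = np.linalg.inv(S)

# (A^{-1})_ii = D(B_i^{-1}) + D(H_i S^{-1} H_i^T) in each domain
d1 = np.diag(np.linalg.inv(B1)) + np.einsum('ij,jk,ik->i', H1, Sinv, H1)
d2 = np.diag(np.linalg.inv(B2)) + np.einsum('ij,jk,ik->i', H2, Sinv, H2)
d = np.concatenate([d1, d2, np.diag(Sinv)])
assert np.allclose(d, np.diag(np.linalg.inv(A)))
```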
DMFT experiment
Times (in seconds) for direct inversion (INV), divide-and-conquer (D&C), and domain decomposition (DD) methods:

  √n   INV   D&C   DD
  21   0.3   0.1   0.1
  51   12    1.4   0.7
  81   88    7.1   3.2

➤ p = 4 subdomains for DD
➤ Various sizes, 2-D problems
➤ Times: seconds in matlab
➤ NOTE: work still in progress
Conclusion
➤ The Diag(inv(A)) problem: easy in the diagonally dominant case; very challenging in the (highly) indefinite case.
➤ Dom. Dec. methods can be a bridge between the two cases
➤ Approach [specifically for the DMFT problem]:
• Use direct methods in the strongly Diag. Dom. case
• Use DD-type methods in the nearly Diag. Dom. case
• Use direct methods in all other cases [until we find better means :-)]