Computing the diagonal of the inverse of a sparse matrix
Yousef Saad
Department of Computer Science and Engineering, University of Minnesota
“Sparse Days”, Toulouse, June 15, 2010

Motivation: DMFT
‘Dynamic Mean Field Theory’: quantum mechanical studies of highly correlated particles
➤ Equation to be solved (repeatedly) is Dyson’s equation:
G(ω) = [(ω + µ)I − V − Σ(ω) + T]^{-1}
• ω (frequency) and µ (chemical potential) are real
• V = trap potential, a real diagonal
• Σ(ω) = local self-energy, a complex diagonal
• T is the hopping matrix (sparse, real)
➤ Interested only in the diagonal of G(ω); in addition, the equation must be solved self-consistently and ...
➤ ... must do this for many ω’s
➤ Related approach: Non Equilibrium Green’s Function (NEGF) approach used to model nanoscale transistors
➤ Many new applications of the diagonal of the inverse [and related problems]
➤ A few examples to follow
Introduction: A few examples
Problem 1: Compute Tr[A^{-1}], the trace of the inverse.
➤ Arises in cross validation:
‖(I − A(θ))g‖₂ / Tr(I − A(θ)), with A(θ) ≡ D(D^T D + θLL^T)^{-1}D^T,
D = blurring operator and L = regularization operator
➤ In [Hutchinson ’90] Tr[A^{-1}] is stochastically estimated
➤ Many authors addressed this problem.
Problem 2: Compute Tr[f(A)], f a certain function.
Arises in many applications in Physics. Example:
➤ Stochastic estimations of Tr(f(A)) extensively used by quantum chemists to estimate the Density of States, see [H. Röder, R. N. Silver, D. A. Drabold, J. J. Dong, Phys. Rev. B 55, 15382 (1997)]
Problem 3: Compute diag[A^{-1}], the diagonal of the inverse.
➤ Arises in Dynamic Mean Field Theory [DMFT, motivation for this work]. In DMFT, we seek the diagonal of a “Green’s function” which solves (self-consistently) Dyson’s equation [see J. Freericks 2005]
➤ Related approach: Non Equilibrium Green’s Function (NEGF) approach used to model nanoscale transistors
➤ In uncertainty quantification, the diagonal of the inverse of a covariance matrix is needed [Bekas, Curioni, Fedulova ’09]
Problem 4: Compute diag[f(A)], f a certain function.
➤ Arises in any density matrix approach in quantum modeling, for example Density Functional Theory.
➤ Here, f = Fermi-Dirac operator:
f(x) = 1 / (1 + exp((x − µ)/(k_B T)))
Note: when T → 0, f becomes a step function.
Note: if f is approximated by a rational function, then diag[f(A)] ≈ a linear combination of terms like diag[(A − σ_i I)^{-1}]
➤ Linear-Scaling methods based on approximating f(H) and Diag(f(H)) avoid ‘diagonalization’ of H
Methods based on the sparse LU factorization
➤ Basic reference: K. Takahashi, J. Fagan, and M.-S. Chin, Formation of a sparse bus impedance matrix and its application to short circuit study, in Proc. of the Eighth Inst. PICA Conf., Minneapolis, MN, IEEE Power Engineering Soc., 1973, pp. 63-69.
➤ Described in [Duff, Erisman, Reid, p. 273]
➤ Algorithm used by Erisman and Tinney [Num. Math. 1975]
➤ Main idea: If A = LDU and B = A^{-1}, then
B = U^{-1}D^{-1} + B(I − L);   B = D^{-1}L^{-1} + (I − U)B.
➤ Not all entries are needed to compute selected entries of B
➤ For example: consider the lower part, i > j; use the first equation:
b_ij = (B(I − L))_ij = − Σ_{k>j} b_ik l_kj
➤ Need entries b_ik of row i where l_kj ≠ 0, k > j
➤ “Entries of B belonging to the pattern of (L, U)^T can be extracted without computing any other entries outside the pattern.”
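The two identities above can be checked numerically. The sketch below (illustrative sizes; a small dense, diagonally dominant matrix so no pivoting is needed) builds an LDU factorization and verifies both B = U^{-1}D^{-1} + B(I − L) and the lower-part sum formula:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonally dominant

# Doolittle-style LDU factorization without pivoting
L, D, U = np.eye(n), np.zeros((n, n)), np.eye(n)
M = A.copy()
for k in range(n):
    D[k, k] = M[k, k]
    L[k+1:, k] = M[k+1:, k] / M[k, k]
    U[k, k+1:] = M[k, k+1:] / M[k, k]
    M[k+1:, k+1:] -= np.outer(L[k+1:, k], M[k, k+1:])   # Schur update
assert np.allclose(L @ D @ U, A)

B = np.linalg.inv(A)
# Identity: B = U^{-1} D^{-1} + B (I - L)
assert np.allclose(B, np.linalg.inv(U) @ np.linalg.inv(D) + B @ (np.eye(n) - L))
# Lower part, i > j: b_ij = - sum_{k>j} b_ik l_kj
i, j = 4, 1
assert np.isclose(B[i, j], -sum(B[i, k] * L[k, j] for k in range(j + 1, n)))
```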
➤ More recently exploited in a different form in: L. Lin, C. Yang, J. Meza, J. Lu, L. Ying, W. E, SelInv – An algorithm for selected inversion of a sparse symmetric matrix, Tech. Report, Princeton Univ.
➤ An algorithm based on a form of nested dissection is described in Li, Ahmed, Klimeck, Darve [2008]
➤ A close relative of this technique is presented in: L. Lin, J. Lu, L. Ying, R. Car, W. E, Fast algorithm for extracting the diagonal of the inverse matrix with application to the electronic structure analysis of metallic systems, Comm. Math. Sci., 2009.
➤ Difficulty: 3-D problems.
Stochastic Estimator
Notation:
• A = original matrix, B = A^{-1}
• δ(B) = diag(B) [matlab notation]
• D(B) = diagonal matrix with diagonal δ(B)
• ⊙ and ⊘: elementwise multiplication and division of vectors
• {v_j}: sequence of s random vectors
Result:
δ(B) ≈ [ Σ_{j=1}^{s} v_j ⊙ B v_j ] ⊘ [ Σ_{j=1}^{s} v_j ⊙ v_j ]
Refs: C. Bekas , E. Kokiopoulou & YS (’05), Recent: C. Bekas, A. Curioni, I. Fedulova ’09.
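A minimal sketch of this estimator with Rademacher (±1) probe vectors; the matrix, the number of samples s, and the use of an explicit inverse (instead of solves A x_j = v_j) are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 100, 4000
A = 5.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = np.linalg.inv(A)                     # in practice: solve A x_j = v_j instead

num = np.zeros(n)
den = np.zeros(n)
for _ in range(s):
    v = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
    num += v * (B @ v)                   # accumulate v_j (elementwise*) B v_j
    den += v * v                         # accumulate v_j (elementwise*) v_j
d_est = num / den                        # elementwise division

rel_err = np.linalg.norm(d_est - np.diag(B)) / np.linalg.norm(np.diag(B))
assert rel_err < 0.1
```

The accuracy depends on how large the off-diagonal entries of B are relative to its diagonal, which is why structured probe vectors (Hadamard columns, probing) can do much better than random ones.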
➤ Let V_s = [v_1, v_2, . . . , v_s]. Then, alternative expression:
D(B) ≈ D(B V_s V_s^T) D^{-1}(V_s V_s^T)
Question: When is this result exact?
Main Proposition
• Let V_s ∈ R^{n×s} with rows {v_{j,:}}, and B ∈ C^{n×n} with entries {b_jk}
• Assume that ⟨v_{j,:}, v_{k,:}⟩ = 0 for all j ≠ k such that b_jk ≠ 0. Then:
D(B) = D(B V_s V_s^T) D^{-1}(V_s V_s^T)
➤ Approximation to b_ij exact when rows i and j of V_s are ⊥

Ideas from information theory: Hadamard matrices
➤ Consider the matrix V; want the rows to be as ‘orthogonal as possible among each other’, i.e., want to minimize
E_rms = ‖I − V V^T‖_F / √(n(n − 1))   or   E_max = max_{i≠j} |(V V^T)_ij|
➤ Problems that arise in coding: find a code book [rows of V = code words] to minimize the ‘cross-correlation amplitude’
➤ Welch bounds:
E_rms ≥ √[(n − s)/((n − 1)s)],   E_max ≥ √[(n − s)/((n − 1)s)]
➤ Result: ∃ a sequence of s vectors v_k with binary entries which achieve the first Welch bound iff s = 2 or s = 4k.
➤ Hadamard matrices are a special class: n × n matrices with entries ±1 and such that HH^T = nI. Examples:

H_2 = | 1  1 |      H_4 = | 1  1  1  1 |
      | 1 -1 |            | 1 -1  1 -1 |
                          | 1  1 -1 -1 |
                          | 1 -1 -1  1 |
➤ Achieve both Welch bounds
➤ Can build larger Hadamard matrices recursively: given two Hadamard matrices H_1 and H_2, the Kronecker product H_1 ⊗ H_2 is a Hadamard matrix.
➤ Too expensive to use the whole matrix of size n
➤ Can use V_s = matrix of the s first columns of H_n
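The recursive Kronecker construction is a few lines of code. The sketch below (sizes illustrative) builds H_128 from H_2, checks HH^T = nI, and takes the first s columns as V_s:

```python
import numpy as np

H2 = np.array([[1,  1],
               [1, -1]])

H = H2
while H.shape[0] < 128:
    H = np.kron(H2, H)        # H_2 (x) H_n is again a Hadamard matrix
n = H.shape[0]
assert np.array_equal(H @ H.T, n * np.eye(n))   # defining property HH^T = nI

s = 32
Vs = H[:, :s]                 # V_s = first s columns of H_n
# the s columns are orthogonal: V_s^T V_s = n I_s
assert np.array_equal(Vs.T @ Vs, n * np.eye(s))
```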
[Figure: pattern of V_s V_s^T, for s = 32 and s = 64.]
A Lanczos approach
➤ Given a Hermitian matrix A, generate Lanczos vectors via:
β_{i+1} q_{i+1} = A q_i − α_i q_i − β_i q_{i−1}
with α_i, β_{i+1} selected s.t. ‖q_{i+1}‖₂ = 1 and q_{i+1} ⊥ q_i, q_{i+1} ⊥ q_{i−1}
➤ Result:
A Q_m = Q_m T_m + β_{m+1} q_{m+1} e_m^T
➤ When m = n, then A = Q_n T_n Q_n^T and A^{-1} = Q_n T_n^{-1} Q_n^T.
➤ For m < n, use the approximation A^{-1} ≈ Q_m T_m^{-1} Q_m^T →
D(A^{-1}) ≈ D[Q_m T_m^{-1} Q_m^T]
ALGORITHM 1: diagInv via Lanczos
For j = 1, 2, · · · , Do:
    β_{j+1} q_{j+1} = A q_j − α_j q_j − β_j q_{j−1}   [Lanczos step]
    p_j := q_j − η_j p_{j−1}
    δ_j := α_j − β_j η_j
    d_j := d_{j−1} + (p_j ⊙ p_j) / δ_j   [update of diag(inv(A))]
    η_{j+1} := β_{j+1} / δ_j
EndDo
➤ d_k (a vector) will converge to the diagonal of A^{-1}
➤ Limitation: often requires all n steps to converge
➤ One advantage: Lanczos is shift invariant, so can use this for many ω’s
➤ Potential: use as a direct method, exploiting sparsity
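A minimal sketch of Algorithm 1; the function name, the full reorthogonalization (added for numerical stability, not part of the algorithm as stated), and the test matrix are illustrative. The p/δ/η recurrence is the progressive LDL^T factorization of T_m, so d accumulates diag(Q_m T_m^{-1} Q_m^T):

```python
import numpy as np

def lanczos_diag_inv(A, m, seed=0):
    """Estimate diag(A^{-1}) via m Lanczos steps (Algorithm 1 sketch)."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    q_prev = np.zeros(n)
    Q = []                               # stored vectors for reorthogonalization
    beta = eta = 0.0
    p_prev = np.zeros(n)
    d = np.zeros(n)
    for j in range(m):
        Q.append(q)
        w = A @ q - beta * q_prev        # Lanczos step
        alpha = q @ w
        w -= alpha * q
        for qi in Q:                     # full reorthogonalization
            w -= (qi @ w) * qi
        beta_next = np.linalg.norm(w)
        p = q - eta * p_prev             # p_j := q_j - eta_j p_{j-1}
        delta = alpha - beta * eta       # delta_j := alpha_j - beta_j eta_j
        d += p * p / delta               # update of diag(inv(A))
        p_prev = p
        if beta_next < 1e-12:
            break                        # invariant subspace reached
        eta = beta_next / delta          # eta_{j+1} := beta_{j+1} / delta_j
        q_prev, q = q, w / beta_next
        beta = beta_next
    return d

# with m = n steps, d recovers diag(A^{-1}) up to roundoff
rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)              # small SPD test matrix
d = lanczos_diag_inv(A, 8)
assert np.allclose(d, np.diag(np.linalg.inv(A)))
```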
Using a sparse V: Probing
Goal: Find V_s such that (1) s is small and (2) V_s satisfies the Proposition (rows i & j orthogonal for any nonzero b_ij)
Difficulty: Can work only for sparse matrices, but B = A^{-1} is usually dense
➤ B can sometimes be approximated by a sparse matrix.
➤ Consider, for some ε:
(B_ε)_ij = b_ij if |b_ij| > ε,   0 if |b_ij| ≤ ε
➤ B_ε will be sparse under certain conditions, e.g., when A is diagonally dominant
➤ In what follows we assume B_ε is sparse and set B := B_ε.
➤ Its pattern will be required by standard probing methods.

Generic Probing Algorithm
ALGORITHM 2: Probing
Input: A, s. Output: matrix D(B)
Determine V_s := [v_1, v_2, . . . , v_s]
for j ← 1 to s:
    Solve A x_j = v_j
end
Construct X_s := [x_1, x_2, . . . , x_s]
Compute D(B) := D(X_s V_s^T) D^{-1}(V_s V_s^T)
➤ Note: rows of V_s are typically scaled so that the diagonal of V_s V_s^T equals 1, in which case D^{-1}(V_s V_s^T) = I.
Standard probing (e.g. to compute a Jacobian)
➤ Several names for the same method: “probing”, “CPR”, “sparse Jacobian estimators”, ...
Basis of the method: can compute Jacobian if a coloring of the columns is known so that no two columns of the same color overlap.
[Figure: all entries of the same color can be computed with one matvec. Example: for all blue entries, multiply B by the vector with ones in the blue columns (1, 3, 5, 12, 13, 16, 20) and zeros elsewhere.]
What about Diag(inv(A))?
➤ Define v_i = probing vector associated with color i:
[v_i]_k = 1 if color(k) == i, 0 otherwise
➤ Standard probing satisfies the requirement of the Proposition but...
➤ ... this coloring is not what is needed! [It is an overkill]
Alternative:
➤ Color the graph of B with a standard graph coloring algorithm [adjacency graph, not graph of column-overlaps]
Result: Graph coloring yields a valid set of probing vectors for D(B).
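A small sketch of this idea: greedily color the adjacency graph of a sparse B (here a hypothetical tridiagonal "pruned inverse", chosen for illustration), build the probing vectors from the colors, and observe that the probing formula recovers diag(B) exactly, as the Proposition guarantees:

```python
import numpy as np

def greedy_coloring(adj):
    """Greedy coloring of an adjacency structure {node: set(neighbors)}."""
    color = {}
    for u in sorted(adj):
        used = {color[v] for v in adj[u] if v in color}
        c = 0
        while c in used:
            c += 1
        color[u] = c
    return color

# Illustrative sparse B (tridiagonal pattern); its graph is a path: 2 colors
n = 8
B = np.diag(np.arange(1.0, n + 1)) + 0.3 * (np.diag(np.ones(n - 1), 1)
                                            + np.diag(np.ones(n - 1), -1))
adj = {i: {j for j in range(n) if j != i and B[i, j] != 0} for i in range(n)}
color = greedy_coloring(adj)
s = max(color.values()) + 1
V = np.zeros((n, s))
for k in range(n):
    V[k, color[k]] = 1.0          # [v_i]_k = 1 iff color(k) == i

# colors partition the nodes, so diag(V V^T) = 1 and D^{-1}(V V^T) = I
d = np.diag(B @ V @ V.T)
assert s == 2
assert np.allclose(d, np.diag(B))
```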
[Figure: a graph colored with two colors (red and black).]
Proof:
➤ Column v_c: one at each node i whose color is c, zero elsewhere.
➤ Row i of V_s: has a ’1’ in column c, where c = color(i), zero elsewhere.
➤ If b_ij ≠ 0, then nodes i and j are adjacent, so color(i) ≠ color(j). In matrix V_s:
• the i-th row has a ’1’ in column color(i), ’0’ elsewhere
• the j-th row has a ’1’ in column color(j), ’0’ elsewhere
➤ The two rows are orthogonal.
Example:
➤ Two colors required for this graph → two probing vectors
➤ Standard method: 6 colors [graph of B^T B]
Next Issue: Guessing the pattern of B
➤ Recall that we are dealing with B := B_ε [‘pruned’ B]
➤ Assume A diagonally dominant
➤ Write A = D − E, with D = D(A). Then:
A = D(I − F) with F ≡ D^{-1}E →
A^{-1} ≈ (I + F + F² + · · · + F^k) D^{-1} =: B^(k)
➤ When A is D.D., ‖F^k‖ decreases rapidly.
➤ Can approximate the pattern of B by that of B^(k) for some k.
➤ Interpretation in terms of paths of length k in the graph of A.
Q: How to select k?
A: Inspect A^{-1}e_j for some j.
➤ Values of the solution outside the pattern of B^(k) e_j should be small.
➤ If during calculations we get larger than expected errors, then redo with larger k, more colors, etc.
➤ Can we salvage what was done? Question still open.
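The Neumann-series pattern estimate can be sketched as follows; the matrix, ε, and k are illustrative. For this diagonally dominant A, the pattern of the pruned inverse B_ε falls inside the pattern of B^(k), i.e. inside paths of length ≤ k in the graph of A:

```python
import numpy as np

n = 12
# diagonally dominant tridiagonal test matrix
A = 4.1 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
D = np.diag(np.diag(A))
F = np.linalg.solve(D, D - A)            # F = D^{-1} E with E = D - A

k = 4
pattern = np.eye(n, dtype=bool)          # pattern of I + F + ... + F^k
P = np.eye(n)
for _ in range(k):
    P = P @ np.abs(F)
    pattern |= P > 1e-14

eps = 1e-3
B_eps = np.abs(np.linalg.inv(A)) > eps   # pattern of the pruned inverse B_eps

# pruned-inverse pattern is contained in the Neumann-series pattern
assert not np.any(B_eps & ~pattern)
```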
Preliminary experiments
Problem Setup
• DMFT: calculate the imaginary time Green’s function
• DMFT parameters: a set of physical parameters is provided
• DMFT loop: at most 10 outer iterations, each consisting of 62 inner iterations
• Each inner iteration: find D(B)
• Matrix: based on a five-point stencil with a_jj = µ + iω − V − s(j)
Probing Setup
• Probing tolerance: ε = 10^{-10}
• GMRES tolerance: δ = 10^{-12}
Results: CPU times (sec) for one inner iteration of DMFT.

  n →      21²    41²    61²    81²
  LAPACK   0.5    26     282    > 1000
  Lanczos  0.2    9.9    115    838
  Probing  0.02   0.19   0.79   2.0
[Figure: a few statistics for the case n = 81²: path length k, number of probing vectors, and number of GMRES iterations over the DMFT inner iterations.]

Challenge: The indefinite case
➤ The DMFT code deals with a separate case which uses a “real axis” sampling.
➤ Matrix A is no longer diagonally dominant; far from it.
➤ This is a much more challenging case.
➤ One option: solve A x_j = e_j FOR ALL j’s, with the ARMS solver using ddPQ ordering + exploit multiple right-hand sides
➤ More appealing: DD-type approaches
Divide & Conquer approach
Let A = a 5-point matrix (2-D problem), split roughly in two:

A = | A_1  −I                            |
    | −I   A_2  −I                       |
    |       ·    ·    ·                  |
    |      −I   A_k  −I                  |
    |           −I   A_{k+1}  −I         |
    |                  ·    ·    ·       |
    |                −I   A_{ny−1}  −I   |
    |                      −I    A_{ny}  |

where the A_j are tridiagonal. Write:

A = | A_11  A_12 |  =  | A_11       |  +  |       A_12 |
    | A_21  A_22 |     |       A_22 |     | A_21       |

with A_11 ∈ C^{m×m} and A_22 ∈ C^{(n−m)×(n−m)}.

➤ Observation:
A = | A_11 + E_1E_1^T                 |  −  | E_1E_1^T  E_1E_2^T |
    |                A_22 + E_2E_2^T  |     | E_2E_1^T  E_2E_2^T |

where E_1, E_2 are (relatively) small rank matrices:

E_1 := | 0 | ∈ C^{m×nx},   E_2 := | I | ∈ C^{(n−m)×nx}
       | I |                      | 0 |

Of the form A = C − EE^T, with
C := | C_1      |,   E := | E_1 |
     |      C_2 |         | E_2 |
➤ Idea: Use the Sherman-Morrison formula.
A^{-1} = C^{-1} + U G^{-1} U^T, with:
U = C^{-1}E ∈ C^{n×nx},   G = I_{nx} − E^T U ∈ C^{nx×nx}
D(A^{-1}) can be found from:

D(A^{-1}) = | D(C_1^{-1}) |  +  D(U G^{-1} U^T)
            | D(C_2^{-1}) |

where the D(C_i^{-1}) terms are computed by recursion.
➤ U: solve CU = E, i.e., solve C_1 U_1 = E_1 and C_2 U_2 = E_2 iteratively
➤ G: G = I_{nx} − E^T U = I_{nx} − E_1^T U_1 − E_2^T U_2
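One level of this divide & conquer step (no recursion) can be sketched as follows. The grid sizes are illustrative; dense solves stand in for the iterative solves of C_1 U_1 = E_1 and C_2 U_2 = E_2:

```python
import numpy as np

# 2-D 5-point matrix on an nx-by-ny grid, tridiagonal blocks, -I coupling
nx, ny = 4, 4
n = nx * ny
T = 4.0 * np.eye(nx) - np.diag(np.ones(nx - 1), 1) - np.diag(np.ones(nx - 1), -1)
A = np.kron(np.eye(ny), T) \
    - np.kron(np.diag(np.ones(ny - 1), 1), np.eye(nx)) \
    - np.kron(np.diag(np.ones(ny - 1), -1), np.eye(nx))

m = (ny // 2) * nx                        # split between block rows k and k+1
E1 = np.zeros((m, nx));     E1[-nx:, :] = np.eye(nx)   # last grid line of top
E2 = np.zeros((n - m, nx)); E2[:nx, :] = np.eye(nx)    # first grid line of bottom

# C = blkdiag(A11 + E1 E1^T, A22 + E2 E2^T), and A = C - E E^T, E = [E1; E2]
C = np.zeros((n, n))
C[:m, :m] = A[:m, :m] + E1 @ E1.T
C[m:, m:] = A[m:, m:] + E2 @ E2.T
E = np.vstack([E1, E2])
assert np.allclose(A, C - E @ E.T)

# Sherman-Morrison-Woodbury: A^{-1} = C^{-1} + U G^{-1} U^T
U = np.linalg.solve(C, E)                 # decouples: C1 U1 = E1, C2 U2 = E2
G = np.eye(nx) - E.T @ U
d = np.diag(np.linalg.inv(C)) + np.einsum('ij,jk,ik->i', U, np.linalg.inv(G), U)
assert np.allclose(d, np.diag(np.linalg.inv(A)))
```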
Domain Decomposition approach
[Figure: domain decomposition with p = 3 subdomains Ω_1, Ω_2, Ω_3 and interfaces Γ_12, Γ_23; zoom into subdomain 2.]

Under the usual ordering [interior points then interface points]:

A = | B_1                    F_1 |
    |      B_2               F_2 |      | B    F |
    |            ·            ·  |  ≡   | F^T  C |
    |                 B_p    F_p |
    | F_1^T F_2^T ··· F_p^T   C  |
[Figure: example of the pattern of a matrix A based on a DDM ordering with p = 4 subdomains (n = 25², nz = 3025).]
Inverse of A [assuming both B and S nonsingular], with S = C − F^T B^{-1} F:

A^{-1} = | B^{-1} + B^{-1}F S^{-1} F^T B^{-1}   −B^{-1}F S^{-1} |
         | −S^{-1} F^T B^{-1}                    S^{-1}         |

D(A^{-1}) = | D(B^{-1}) + D(B^{-1}F S^{-1} F^T B^{-1}) |
            | D(S^{-1})                                |
➤ Note: each diagonal block decouples from the others:
Inverse of A in the i-th block (domain):
(A^{-1})_ii = D(B_i^{-1}) + D(H_i S^{-1} H_i^T),   H_i = B_i^{-1} F_i
➤ Note: the only nonzero columns of F_i are those related to interface vertices.
➤ Approach similar to Divide and Conquer, but not recursive.
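The block formula above can be sketched for p = 2 subdomains. Block sizes and the random SPD test blocks are illustrative; each domain contributes its H_i = B_i^{-1} F_i independently to the Schur complement S:

```python
import numpy as np

rng = np.random.default_rng(0)
ni, nc = 5, 3                            # interior size per domain, interface size

def spd(k):                              # random SPD test block
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

B1, B2, C = spd(ni), spd(ni), spd(nc)
F1 = 0.5 * rng.standard_normal((ni, nc))
F2 = 0.5 * rng.standard_normal((ni, nc))

A = np.block([[B1, np.zeros((ni, ni)), F1],
              [np.zeros((ni, ni)), B2, F2],
              [F1.T, F2.T, C]])

H1 = np.linalg.solve(B1, F1)             # H_i = B_i^{-1} F_i, per domain
H2 = np.linalg.solve(B2, F2)
S = C - F1.T @ H1 - F2.T @ H2            # S = C - F^T B^{-1} F
Sinv = np.linalg.inv(S)

# (A^{-1})_ii = D(B_i^{-1}) + D(H_i S^{-1} H_i^T) in each domain
d1 = np.diag(np.linalg.inv(B1)) + np.einsum('ij,jk,ik->i', H1, Sinv, H1)
d2 = np.diag(np.linalg.inv(B2)) + np.einsum('ij,jk,ik->i', H2, Sinv, H2)
d = np.concatenate([d1, d2, np.diag(Sinv)])
assert np.allclose(d, np.diag(np.linalg.inv(A)))
```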
DMFT experiment
Times (in seconds) for direct inversion (INV), divide-and-conquer (D&C), and domain decomposition (DD) methods:

  √n   INV   D&C   DD
  21   0.3   0.1   0.1
  51   12    1.4   0.7
  81   88    7.1   3.2

➤ p = 4 subdomains for DD
➤ Various sizes, 2-D problems
➤ Times: seconds in matlab
➤ NOTE: work still in progress
Conclusion
➤ The Diag(inv(A)) problem: easy in the diagonally dominant case; very challenging in the (highly) indefinite case.
➤ Dom. Dec. methods can be a bridge between the two cases
➤ Approach [specifically for the DMFT problem]:
• Use direct methods in the strongly Diag. Dom. case
• Use DD-type methods in the nearly Diag. Dom. case
• Use direct methods in all other cases [until we find better means :-)]