
Computing the diagonal of the inverse of a sparse matrix
Yousef Saad, Department of Computer Science and Engineering, University of Minnesota

“Sparse Days”, Toulouse, June 15, 2010

Motivation: DMFT

‘Dynamic Mean Field Theory’ - quantum mechanical studies of highly correlated particles.

➤ Equation to be solved (repeatedly) is Dyson’s equation:

G(ω) = [(ω + µ)I − V − Σ(ω) + T]^{-1}

• ω (frequency) and µ (chemical potential) are real
• V = trap potential = real diagonal
• Σ(ω) = local self-energy - a complex diagonal
• T is the hopping matrix (sparse, real)

➤ Interested only in the diagonal of G(ω) – in addition, the equation must be solved self-consistently and ...
➤ ... must do this for many ω’s
➤ Related approach: Non-Equilibrium Green’s Function (NEGF) approach used to model nanoscale transistors.
➤ Many new applications of the diagonal of the inverse [and related problems.]
➤ A few examples follow

Introduction: A few examples

Problem 1: Compute Tr[inv(A)], the trace of the inverse.

➤ Arises in cross validation: minimize

‖(I − A(θ))g‖₂² / [Tr(I − A(θ))]²,   with A(θ) ≡ D(D^T D + θLL^T)^{-1}D^T,

where D = blurring operator and L is the regularization operator
➤ In [Hutchinson ’90] Tr[inv(A)] is stochastically estimated
➤ Many authors have addressed this problem.

Problem 2: Compute Tr[f(A)], f a certain function.

Arises in many applications in Physics. Example:
➤ Stochastic estimations of Tr(f(A)) extensively used by quantum chemists to estimate the Density of States, see [H. Röder, R. N. Silver, D. A. Drabold, J. J. Dong, Phys. Rev. B. 55, 15382 (1997)]

Problem 3: Compute diag[inv(A)], the diagonal of the inverse.

➤ Arises in Dynamic Mean Field Theory [DMFT, motivation for this work]. In DMFT, we seek the diagonal of a “Green’s function” which solves (self-consistently) Dyson’s equation. [see J. Freericks 2005]
➤ Related approach: Non-Equilibrium Green’s Function (NEGF) approach used to model nanoscale transistors.
➤ In uncertainty quantification, the diagonal of the inverse of a covariance matrix is needed [Bekas, Curioni, Fedulova ’09]

Problem 4: Compute diag[f(A)], f = a certain function.

➤ Arises in any approach in quantum modeling - for example Density Functional Theory.
➤ Here, f = Fermi-Dirac operator:

f(ε) = 1 / (1 + exp((ε − µ)/(k_B T)))

Note: when T → 0, f becomes a step function.
Note: if f is approximated by a rational function, then diag[f(A)] ≈ a linear combination of terms like diag[(A − σ_i I)^{-1}]
➤ Linear-Scaling methods based on approximating f(H) and Diag(f(H)) – avoid ‘diagonalization’ of H
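This remark can be checked on a toy rational function. The sketch below (my own illustration; the matrix and the function f(x) = 1/(x² + 1) are not from the talk) uses the exact partial-fraction identity (A² + I)^{-1} = (1/(2i))[(A − iI)^{-1} − (A + iI)^{-1}], so diag f(A) becomes a linear combination of diagonals of shifted inverses:

```python
import numpy as np

# Toy case: f(x) = 1/(x^2 + 1) has poles at x = +/- i, and
# f(A) = (1/(2i)) [ (A - iI)^{-1} - (A + iI)^{-1} ] exactly.
rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                  # symmetric test matrix
I = np.eye(n)

d_shifted = (np.diag(np.linalg.inv(A - 1j * I))
             - np.diag(np.linalg.inv(A + 1j * I))) / 2j
d_direct = np.diag(np.linalg.inv(A @ A + I))   # f(A) = (A^2 + I)^{-1}
assert np.allclose(d_shifted, d_direct)
```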

Methods based on the sparse LU factorization

➤ Basic reference: K. Takahashi, J. Fagan, and M.-S. Chin, Formation of a sparse bus impedance matrix and its application to short circuit study, in Proc. of the Eighth Inst. PICA Conf., Minneapolis, MN, IEEE, Power Engineering Soc., 1973, pp. 63-69.
➤ Described in [Duff, Erisman, Reid, p. 273]
➤ Used by Erisman and Tinney [Num. Math. 1975]

➤ Main idea: If A = LDU and B = A^{-1}, then

B = U^{-1}D^{-1} + B(I − L);   B = D^{-1}L^{-1} + (I − U)B.

➤ Not all entries are needed to compute selected entries of B
➤ For example: Consider the lower part, i > j; use the first equation:

b_ij = (B(I − L))_ij = − Σ_{k>j} b_ik l_kj

➤ Need entries b_ik of row i where l_kj ≠ 0, k > j.
➤ “Entries of B belonging to the pattern of (L, U)^T can be extracted without computing any other entries outside the pattern.”
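The two identities above are easy to verify numerically. The sketch below (my own illustration, not the reference implementation) computes a pivot-free LDU factorization of a strongly diagonally dominant matrix and checks that, for i > j, b_ij is determined by the entries b_ik with l_kj ≠ 0, k > j:

```python
import numpy as np

def ldu(A):
    """LDU factorization without pivoting (assumes A is strongly
    diagonally dominant, so no pivoting is needed)."""
    n = A.shape[0]
    M = A.astype(float).copy()
    L, U, d = np.eye(n), np.eye(n), np.zeros(n)
    for k in range(n):
        d[k] = M[k, k]
        L[k+1:, k] = M[k+1:, k] / d[k]
        U[k, k+1:] = M[k, k+1:] / d[k]
        M[k+1:, k+1:] -= np.outer(L[k+1:, k], U[k, k+1:]) * d[k]
    return L, d, U

rng = np.random.default_rng(1)
n = 30
A = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonally dominant
L, d, U = ldu(A)
B = np.linalg.inv(A)

assert np.allclose(L @ np.diag(d) @ U, A)
# Check b_ij = -(sum over k>j of b_ik l_kj) for the lower part i > j
for i in range(n):
    for j in range(i):
        assert np.isclose(B[i, j], -(B[i, j+1:] @ L[j+1:, j]))
```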

➤ More recently exploited in a different form in: L. Lin, C. Yang, J. Meza, J. Lu, L. Ying, W. E, SelInv – An algorithm for selected inversion of a sparse symmetric matrix, Tech. Report, Princeton Univ.
➤ An algorithm based on a form of nested dissection is described in Li, Ahmed, Klimeck, Darve [2008]
➤ A close relative of this technique is presented in: L. Lin, J. Lu, L. Ying, R. Car, W. E, Fast algorithm for extracting the diagonal of the inverse matrix with application to the electronic structure analysis of metallic systems, Comm. Math. Sci, 2009.
➤ Difficulty: 3-D problems.

Stochastic Estimator

• A = original matrix, B = A^{-1}.
• δ(B) = diag(B) [notation]
• D(B) = diagonal matrix with diagonal δ(B)

Notation:
• ⊙ and ⊘ : elementwise multiplication and division of vectors

• {v_j}: sequence of s random vectors

Result:   δ(B) ≈ [ Σ_{j=1}^{s} v_j ⊙ Bv_j ] ⊘ [ Σ_{j=1}^{s} v_j ⊙ v_j ]

Refs: C. Bekas, E. Kokiopoulou & Y. Saad (’05); Recent: C. Bekas, A. Curioni, I. Fedulova (’09).
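A minimal sketch of the estimator (the Rademacher probe choice and all names here are my own; the slides leave the distribution of the v_j unspecified):

```python
import numpy as np

def stochastic_diag(matvec, n, s=100, rng=None):
    """Estimate diag(B): (sum_j v_j .* B v_j) ./ (sum_j v_j .* v_j)."""
    rng = np.random.default_rng(rng)
    num = np.zeros(n)
    den = np.zeros(n)
    for _ in range(s):
        v = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        num += v * matvec(v)                  # v_j .* (B v_j)
        den += v * v                          # v_j .* v_j
    return num / den

# Sanity check: for a diagonal B the estimator is exact for any probes
D = np.diag([3.0, -1.0, 5.0, 2.0])
est = stochastic_diag(lambda v: D @ v, 4, s=5)
assert np.allclose(est, np.diag(D))
```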

➤ Let V_s = [v_1, v_2, . . . , v_s]. Then, an alternative expression:

D(B) ≈ D(BV_s V_s^T) D^{-1}(V_s V_s^T)

Question: When is this result exact?

Main Proposition

• Let V_s ∈ R^{n×s} with rows {v_{j,:}}; and B ∈ C^{n×n} with elements {b_jk}

• Assume that: ⟨v_{j,:}, v_{k,:}⟩ = 0, ∀j ≠ k, s.t. b_jk ≠ 0. Then:

D(B) = D(BV_s V_s^T) D^{-1}(V_s V_s^T)

➤ Approximation to b_ij exact when rows i and j of V_s are ⊥

Ideas from information theory: Hadamard matrices

➤ Consider the matrix V – want the rows to be as ‘orthogonal as possible among each other’, i.e., want to minimize

E_rms = ‖I − VV^T‖_F / √(n(n − 1))   or   E_max = max_{i≠j} |VV^T|_{ij}

➤ Problems that arise in coding: find a code book [rows of V = code words] to minimize the ’cross-correlation amplitude’
➤ Welch bounds:

E_rms ≥ √( (n − s) / ((n − 1)s) )   ;   E_max ≥ √( (n − s) / ((n − 1)s) )

➤ Result: ∃ a sequence of s vectors v_k with binary entries which achieve the first Welch bound iff s = 2 or s = 4k.

➤ Hadamard matrices are a special class: n × n matrices with entries ±1 and such that HH^T = nI. Examples:

⎡ 1  1 ⎤        ⎡ 1  1  1  1 ⎤
⎣ 1 −1 ⎦  and   ⎢ 1 −1  1 −1 ⎥
                ⎢ 1  1 −1 −1 ⎥
                ⎣ 1 −1 −1  1 ⎦

➤ Achieve both Welch bounds
➤ Can build larger Hadamard matrices recursively: given two Hadamard matrices H_1 and H_2, the Kronecker product H_1 ⊗ H_2 is a Hadamard matrix.
➤ Too expensive to use the whole matrix of size n

➤ Can use V_s = matrix of the s first columns of H_n
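The recursive construction is a one-line Kronecker product; a sketch (the function name is mine):

```python
import numpy as np

def hadamard(n):
    """n x n Hadamard matrix for n a power of 2, built recursively
    via the Kronecker product H_{2m} = H_2 (x) H_m."""
    H2 = np.array([[1, 1], [1, -1]])
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.kron(H2, H)
    return H

H = hadamard(8)
assert np.array_equal(H @ H.T, 8 * np.eye(8))   # HH^T = nI

Vs = hadamard(128)[:, :32]   # probing matrix: first s = 32 columns of H_n
```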

[Figure: Pattern of V_s V_s^T, for s = 32 and s = 64.]

A Lanczos approach

➤ Given a matrix A - generate Lanczos vectors via:

β_{i+1} q_{i+1} = A q_i − α_i q_i − β_i q_{i−1}

α_i, β_{i+1} selected s.t. ‖q_{i+1}‖₂ = 1 and q_{i+1} ⊥ q_i, q_{i+1} ⊥ q_{i−1}

➤ Result:   A Q_m = Q_m T_m + β_{m+1} q_{m+1} e_m^T

➤ When m = n then A = Q_n T_n Q_n^T and A^{-1} = Q_n T_n^{-1} Q_n^T.
➤ For m < n use the approximation: A^{-1} ≈ Q_m T_m^{-1} Q_m^T →

D(A^{-1}) ≈ D[Q_m T_m^{-1} Q_m^T]

ALGORITHM 1: diagInv via Lanczos

For j = 1, 2, · · · , Do:
    β_{j+1} q_{j+1} = A q_j − α_j q_j − β_j q_{j−1}   [Lanczos step]
    p_j := q_j − η_j p_{j−1}
    δ_j := α_j − β_j η_j
    d_j := d_{j−1} + (p_j ⊙ p_j)/δ_j   [Update of diag(inv(A))]
    η_{j+1} := β_{j+1}/δ_j
EndDo

➤ d_k (a vector) will converge to the diagonal of A^{-1}
➤ Limitation: Often requires all n steps to converge
➤ One advantage: Lanczos is shift invariant – so can use this for many ω’s
➤ Potential: Use as a direct method - exploiting sparsity
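A dense sketch of Algorithm 1, assuming A symmetric (full reorthogonalization is added here for numerical robustness; this is my own rendering, not the talk's implementation):

```python
import numpy as np

def diag_inv_lanczos(A, m=None):
    """Algorithm 1 sketch for symmetric A: Lanczos plus the update
    d_j = d_{j-1} + (p_j .* p_j)/delta_j, an LDL^T-based form of
    D(A^{-1}) ~ D[Q_m T_m^{-1} Q_m^T]."""
    n = A.shape[0]
    m = n if m is None else m
    Q = np.zeros((n, m))
    Q[:, 0] = np.ones(n) / np.sqrt(n)          # starting vector q_1
    beta = eta = 0.0
    q_prev = p_prev = np.zeros(n)
    d = np.zeros(n)
    for j in range(m):
        q = Q[:, j]
        w = A @ q - beta * q_prev              # Lanczos step
        alpha = q @ w
        w -= alpha * q
        w -= Q[:, :j+1] @ (Q[:, :j+1].T @ w)   # full reorthogonalization
        beta_next = np.linalg.norm(w)
        p = q - eta * p_prev                   # p_j := q_j - eta_j p_{j-1}
        delta = alpha - beta * eta             # delta_j := alpha_j - beta_j eta_j
        d += p * p / delta                     # update of diag(inv(A))
        eta = beta_next / delta                # eta_{j+1} := beta_{j+1}/delta_j
        p_prev, q_prev, beta = p, q, beta_next
        if j + 1 < m:
            Q[:, j+1] = w / beta_next
    return d

# With m = n steps the diagonal is recovered (up to roundoff)
rng = np.random.default_rng(2)
n = 12
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
d = diag_inv_lanczos(A)
assert np.allclose(d, np.diag(np.linalg.inv(A)))
```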

Using a sparse V: Probing

Goal: Find V_s such that (1) s is small and (2) V_s satisfies the Proposition (rows i & j orthogonal for any nonzero b_ij)
Difficulty: Can work only for sparse matrices, but B = A^{-1} is usually dense
➤ B can sometimes be approximated by a sparse matrix.
➤ Consider, for some ε:

(B_ε)_ij = { b_ij if |b_ij| > ε ;   0 if |b_ij| ≤ ε }

➤ B_ε will be sparse under certain conditions, e.g., when A is diagonally dominant

➤ In what follows we assume B_ε is sparse and write B := B_ε.
➤ This pattern will be required by standard probing methods.

Generic Probing Algorithm

ALGORITHM 2: Probing

Input: A, s.   Output: Matrix D(B)
    Determine V_s := [v_1, v_2, . . . , v_s]
    for j ← 1 to s
        Solve A x_j = v_j
    end
    Construct X_s := [x_1, x_2, . . . , x_s]
    Compute D(B) := D(X_s V_s^T) D^{-1}(V_s V_s^T)

➤ Note: rows of V_s are typically scaled to have unit 2-norm, so D^{-1}(V_s V_s^T) = I.
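A dense sketch of Algorithm 2 (here np.linalg.solve stands in for the iterative solves that would be used on large sparse A; names are mine):

```python
import numpy as np

def probing_diag(A, Vs):
    """Algorithm 2 sketch: approximate diag(A^{-1}) from the solves
    A x_j = v_j via D(Xs Vs^T) D^{-1}(Vs Vs^T)."""
    Xs = np.linalg.solve(A, Vs)           # the s probing solves
    num = np.einsum('ij,ij->i', Xs, Vs)   # diag(Xs Vs^T)
    den = np.einsum('ij,ij->i', Vs, Vs)   # diag(Vs Vs^T)
    return num / den

# Sanity check: with Vs = I (s = n) the result is exact
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6)) + 6 * np.eye(6)
d = probing_diag(A, np.eye(6))
assert np.allclose(d, np.diag(np.linalg.inv(A)))
```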

Standard probing (e.g. to compute a Jacobian)

➤ Several names for the same method: “probing”, “CPR”, “Sparse Jacobian estimators”, ...

Basis of the method: one can compute the Jacobian if a coloring of the columns is known such that no two columns of the same color overlap.

[Figure: example coloring. All entries of the same color can be computed with one matvec: e.g., for all blue entries, multiply B by the probing vector with ones in the blue positions and zeros elsewhere.]

What about Diag(inv(A))?

➤ Define v_i - the probing vector associated with color i:

[v_i]_k = { 1 if color(k) == i ;   0 otherwise }

➤ Standard probing satisfies the requirement of the Proposition but...
➤ ... this coloring is not what is needed! [It is overkill]

Alternative:
➤ Color the graph of B with the standard graph coloring algorithm [Adjacency graph, not graph of column-overlaps]
Result: Graph coloring yields a valid set of probing vectors for D(B).
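A sketch of this construction with a greedy coloring (all names are mine; any valid coloring of the adjacency graph of the pruned B works):

```python
import numpy as np

def greedy_coloring(adj):
    """Greedy coloring of the adjacency graph of the (pruned) B:
    adj[i] lists the neighbors of node i (nonzeros b_ij, j != i)."""
    n = len(adj)
    color = [-1] * n
    for i in range(n):
        used = {color[j] for j in adj[i] if color[j] >= 0}
        c = 0
        while c in used:
            c += 1
        color[i] = c
    return color

def probing_vectors(color):
    """One probing vector per color: [v_c]_k = 1 iff color(k) == c."""
    V = np.zeros((len(color), max(color) + 1))
    for k, c in enumerate(color):
        V[k, c] = 1.0
    return V

# Path graph 0-1-2-3: two colors suffice, hence two probing vectors
adj = [[1], [0, 2], [1, 3], [2]]
V = probing_vectors(greedy_coloring(adj))
assert V.shape[1] == 2
# Rows of adjacent nodes (b_ij != 0) are orthogonal, as the Proposition requires
assert all(V[i] @ V[j] == 0 for i in range(4) for j in adj[i])
```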

[Figure: example graph with nodes colored red and black.]

Proof:

➤ Column v_c: one for each node i whose color is c, zero elsewhere.
➤ Row i of V_s: has a ’1’ in column c, where c = color(i), zero elsewhere.

➤ If b_ij ≠ 0 then in the matrix V_s:
• i-th row has a ’1’ in column color(i), ’0’ elsewhere.
• j-th row has a ’1’ in column color(j), ’0’ elsewhere.
➤ Since color(i) ≠ color(j), the two rows are orthogonal.

Example:

➤ Two colors required for this graph → two probing vectors
➤ Standard method: 6 colors [graph of B^T B]

Next Issue: Guessing the pattern of B

➤ Recall that we are dealing with B := B_ε [the ‘pruned’ B]
➤ Assume A diagonally dominant
➤ Write A = D − E, with D = D(A). Then:

A = D(I − F) with F ≡ D^{-1}E →

A^{-1} ≈ (I + F + F² + · · · + F^k)D^{-1} =: B^(k)

➤ When A is D.D., ‖F^k‖ decreases rapidly.
➤ Can approximate the pattern of B by that of B^(k) for some k.
➤ Interpretation in terms of paths of length k in the graph of A.

Q: How to select k?
A: Inspect A^{-1}e_j for some j
➤ Values of the solution outside the pattern of B^(k)e_j should be small.
➤ If during calculations we get larger than expected errors – then redo with larger k, more colors, etc.
➤ Can we salvage what was done? Question still open.
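The pattern prediction can be sketched with boolean matrix powers (dense, my own illustration): the pattern of B^(k) consists of the pairs (i, j) joined by a path of length ≤ k in the graph of A.

```python
import numpy as np

def pattern_Bk(A, k):
    """Pattern of B^(k) ~ (I + F + ... + F^k) D^{-1}: pairs (i, j)
    joined by a path of length <= k in the graph of A."""
    Adj = (A != 0).astype(int)
    P = np.eye(A.shape[0], dtype=int)
    reach = P.copy()
    for _ in range(k):
        P = (P @ Adj > 0).astype(int)   # nodes reachable in one more step
        reach |= P
    return reach.astype(bool)

# Tridiagonal A: k = 1 keeps the tridiagonal pattern, k = 2 is pentadiagonal
A = np.diag(np.full(6, 2.0)) + np.diag(-np.ones(5), 1) + np.diag(-np.ones(5), -1)
assert np.array_equal(pattern_Bk(A, 2),
                      np.abs(np.subtract.outer(range(6), range(6))) <= 2)
```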

Preliminary experiments

Problem Setup
• DMFT: Calculate the imaginary time Green’s function
• DMFT Parameters: Set of physical parameters is provided
• DMFT loop: At most 10 outer iterations, each consisting of 62 inner iterations
• Each inner iteration: Find D(B)
• Matrix: Based on a five-point stencil, with diagonal entries a_jj = µ + iω − V − s(j)

Probing Setup
• Probing tolerance: ε = 10^{-10}
• GMRES tolerance: δ = 10^{-12}

Results: CPU times (sec) for one inner iteration of DMFT.

n →       21²    41²    61²    81²
LAPACK    0.5    26     282    > 1000
Lanczos   0.2    9.9    115    838
Probing   0.02   0.19   0.79   2.0

[Figure: path length k, number of probing vectors, and number of GMRES iterations over the DMFT inner iterations, for the case n = 81².]

Challenge: The indefinite case

➤ The DMFT code deals with a separate case which uses a “real axis” sampling.
➤ Matrix A is no longer diagonally dominant – far from it.
➤ This is a much more challenging case.

➤ One option: solve A x_j = e_j FOR ALL j’s - with the ARMS solver using the ddPQ ordering + exploit multiple right-hand sides
➤ More appealing: DD-type approaches

Divide & Conquer approach

Let A = a 5-point matrix (2-D problem) split roughly in two:

    ⎡ A_1  −I                           ⎤
    ⎢ −I   A_2  −I                      ⎥
    ⎢      ...  ...  ...                ⎥
A = ⎢           −I   A_k     −I         ⎥
    ⎢                −I   A_{k+1}  −I   ⎥
    ⎢                     ...  ...  ... ⎥
    ⎢                 −I  A_{ny−1}  −I  ⎥
    ⎣                      −I   A_{ny}  ⎦

where each A_j is tridiagonal. Write:

A = ⎡ A_11  A_12 ⎤ = ⎡ A_11       ⎤ + ⎡       A_12 ⎤
    ⎣ A_21  A_22 ⎦   ⎣       A_22 ⎦   ⎣ A_21       ⎦

with A_11 ∈ C^{m×m} and A_22 ∈ C^{(n−m)×(n−m)}.

➤ Observation:

A = ⎡ A_11 + E_1E_1^T                  ⎤ − ⎡ E_1E_1^T  E_1E_2^T ⎤
    ⎣                  A_22 + E_2E_2^T ⎦   ⎣ E_2E_1^T  E_2E_2^T ⎦

where E_1, E_2 are (relatively) small matrices:

E_1 := ⎡ 0 ⎤ ∈ C^{m×nx},   E_2 := ⎡ I ⎤ ∈ C^{(n−m)×nx}
       ⎣ I ⎦                      ⎣ 0 ⎦

Of the form:   A = C − EE^T,   C := ⎡ C_1     ⎤,   E := ⎡ E_1 ⎤
                                    ⎣     C_2 ⎦         ⎣ E_2 ⎦

➤ Idea: Use the Sherman-Morrison formula.

A^{-1} = C^{-1} + U G^{-1} U^T, with:

U = C^{-1}E ∈ C^{n×nx},   G = I_{nx} − E^T U ∈ C^{nx×nx}

D(A^{-1}) can be found from:

D(A^{-1}) = ⎡ D(C_1^{-1}) ⎤ + D(U G^{-1} U^T)
            ⎣ D(C_2^{-1}) ⎦
            (recursion)

➤ U: solve CU = E, i.e., C_1 U_1 = E_1 and C_2 U_2 = E_2 (solve iteratively)
➤ G: G = I_{nx} − E^T U = I_{nx} − E_1^T U_1 − E_2^T U_2

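A dense sketch of this Sherman-Morrison step, assuming C symmetric (the slides' matrices are complex symmetric) and with no recursion: diag(C^{-1}) is computed directly here instead of recursing on C_1, C_2.

```python
import numpy as np

def diag_inv_smw(C, E):
    """diag(A^{-1}) for A = C - E E^T, C symmetric, via
    A^{-1} = C^{-1} + U G^{-1} U^T with U = C^{-1}E, G = I - E^T U."""
    U = np.linalg.solve(C, E)              # solve C U = E
    G = np.eye(E.shape[1]) - E.T @ U
    M = np.linalg.solve(G, U.T)            # G^{-1} U^T
    corr = np.einsum('ij,ji->i', U, M)     # diag(U G^{-1} U^T), no n x n product
    return np.diag(np.linalg.inv(C)) + corr

rng = np.random.default_rng(4)
n, nx = 40, 3
M0 = rng.standard_normal((n, n))
C = (M0 + M0.T) / 2 + n * np.eye(n)        # symmetric, well conditioned
E = 0.1 * rng.standard_normal((n, nx))
A = C - E @ E.T
assert np.allclose(diag_inv_smw(C, E), np.diag(np.linalg.inv(A)))
```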
Domain Decomposition approach

[Figure: domain decomposition with p = 3 subdomains Ω_1, Ω_2, Ω_3 and interfaces Γ_12, Γ_23; zoom into subdomain 2.]

Under the usual ordering [interior points first, then interface points]:

    ⎡ B_1               F_1 ⎤
    ⎢      B_2          F_2 ⎥          ⎡ B    F ⎤
A = ⎢          ...      ... ⎥    ≡     ⎣ F^T  C ⎦
    ⎢              B_p  F_p ⎥
    ⎣ F_1^T F_2^T ··· F_p^T  C ⎦

[Figure: example of a matrix A based on a DDM ordering with p = 4 subdomains (n = 25², nz = 3025).]

Inverse of A [assuming both B and S nonsingular]:

A^{-1} = ⎡ B^{-1} + B^{-1}F S^{-1}F^T B^{-1}   −B^{-1}F S^{-1} ⎤ ,   S = C − F^T B^{-1}F
         ⎣ −S^{-1}F^T B^{-1}                    S^{-1}         ⎦

D(A^{-1}) = ⎡ D(B^{-1}) + D(B^{-1}F S^{-1}F^T B^{-1}) ⎤
            ⎣ D(S^{-1})                               ⎦

➤ Note: each diagonal block decouples from the others:

(A^{-1})_ii = D(B_i^{-1}) + D(H_i S^{-1} H_i^T),   H_i = B_i^{-1} F_i   [inverse of A in the i-th block (domain)]

➤ Note: the only nonzero columns of F_i are those related to interface vertices.
➤ Approach similar to Divide and Conquer but not recursive.
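The block formula can be sketched densely for a single partitioning, assuming B symmetric (the per-subdomain decoupling and sparse solves of the slides are omitted):

```python
import numpy as np

def diag_inv_dd(B, F, C):
    """diag(A^{-1}) for A = [[B, F], [F^T, C]], B symmetric, via the
    Schur complement S = C - F^T B^{-1} F."""
    H = np.linalg.solve(B, F)                       # H = B^{-1} F
    Sinv = np.linalg.inv(C - F.T @ H)
    top = np.diag(np.linalg.inv(B)) + np.einsum('ij,jk,ik->i', H, Sinv, H)
    return np.concatenate([top, np.diag(Sinv)])     # interior then interface

rng = np.random.default_rng(5)
m, nc = 30, 5
M0 = rng.standard_normal((m, m))
B = (M0 + M0.T) / 2 + m * np.eye(m)       # symmetric interior block
F = 0.5 * rng.standard_normal((m, nc))
C0 = rng.standard_normal((nc, nc))
C = (C0 + C0.T) / 2 + nc * np.eye(nc)
A = np.block([[B, F], [F.T, C]])
assert np.allclose(diag_inv_dd(B, F, C), np.diag(np.linalg.inv(A)))
```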

DMFT experiment

Times (in seconds) for direct inversion (INV), divide-and-conquer (D&C), and domain decomposition (DD) methods.

√n    INV   D&C   DD
21    .3    .1    .1
51    12    1.4   .7
81    88    7.1   3.2

➤ p = 4 subdomains for DD
➤ Various sizes - 2-D problems
➤ Times: seconds in Matlab
➤ NOTE: work still in progress

Conclusion

➤ The Diag(inv(A)) problem: easy for the Diag. Dominant case. Very challenging in the (highly) indefinite case.
➤ Dom. Dec. methods can be a bridge between the two cases
➤ Approach [specifically for the DMFT problem]:
• Use direct methods in the strongly Diag. Dom. case
• Use DD-type methods in the nearly Diag. Dom. case
• Use direct methods in all other cases [until we find better means :-) ]
