PRINCIPLES OF CIRCUIT SIMULATION

Lecture 9. Linear Solver: LU Solver and Sparse Matrix

Guoyong Shi, PhD [email protected] School of Microelectronics Shanghai Jiao Tong University Fall 2010

Outline

Part 1:
• Gaussian elimination
• LU factorization
• Pivoting
• Doolittle method and Crout's method
• Summary

Part 2: Sparse matrix techniques

Motivation

• Whether in Sparse Tableau Analysis (STA) or in Modified Nodal Analysis (MNA), we have to solve a linear system of equations: Ax = b

STA:
$$\begin{pmatrix} A & 0 & 0 \\ 0 & I & -A^T \\ K_i & K_v & 0 \end{pmatrix} \begin{pmatrix} i \\ v \\ e \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ S \end{pmatrix}$$

MNA (example):
$$\begin{bmatrix} \dfrac{1}{R_1} + \dfrac{1}{R_3} + G_2 & -\dfrac{1}{R_3} - G_2 \\ -\dfrac{1}{R_3} & \dfrac{1}{R_3} + \dfrac{1}{R_4} \end{bmatrix} \begin{pmatrix} e_1 \\ e_2 \end{pmatrix} = \begin{pmatrix} 0 \\ I_{S5} \end{pmatrix}$$

Motivation

• Even in nonlinear circuit analysis, after "linearization", one again has to solve a system of linear equations: Ax = b.
• Many other engineering problems also require solving a system of linear equations.

• Typically, the matrix size ranges from thousands to millions.
• The system needs to be solved thousands to millions of times in one simulation.
• That's why we'd like to have very efficient linear solvers!

Problem Description

Problem: Solve Ax = b, where A is n×n (real, non-singular), x is n×1, b is n×1.

Methods:
– Direct methods (this lecture): Gaussian elimination, LU decomposition, Crout
– Indirect, iterative methods (another lecture): Gauss-Jacobi, Gauss-Seidel, Successive Over-Relaxation (SOR), Krylov

Gaussian Elimination -- Example

$$\begin{cases} 2x + y = 5 \\ x + 2y = 4 \end{cases}$$

$$\begin{pmatrix} 2 & 1 & 5 \\ 1 & 2 & 4 \end{pmatrix} \xrightarrow{R_2 - \frac{1}{2} R_1} \begin{pmatrix} 2 & 1 & 5 \\ 0 & \frac{3}{2} & \frac{3}{2} \end{pmatrix} \;\Rightarrow\; \begin{cases} x = 2 \\ y = 1 \end{cases}$$

$$\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \frac{1}{2} & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 0 & \frac{3}{2} \end{pmatrix} \quad \text{(LU factorization)}$$

Use of LU Factorization

With A = LU:
$$\underbrace{\begin{pmatrix} 1 & 0 \\ \frac{1}{2} & 1 \end{pmatrix}}_{L} \underbrace{\begin{pmatrix} 2 & 1 \\ 0 & \frac{3}{2} \end{pmatrix}}_{U} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \end{pmatrix}$$

Define
$$\begin{pmatrix} u \\ v \end{pmatrix} := \begin{pmatrix} 2 & 1 \\ 0 & \frac{3}{2} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

Solving the L-system:
$$\begin{pmatrix} 1 & 0 \\ \frac{1}{2} & 1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 5 \\ \frac{3}{2} \end{pmatrix}$$

Triangular systems are easy to solve (by forward/back substitution).

Use of LU Factorization

Recall the L-system result:
$$\begin{pmatrix} 1 & 0 \\ \frac{1}{2} & 1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 5 \\ \frac{3}{2} \end{pmatrix}$$

Then solve the U-system:
$$\begin{pmatrix} 2 & 1 \\ 0 & \frac{3}{2} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 5 \\ \frac{3}{2} \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$$

LU Factorization

$A = LU$. The task of LU factorization is to find the elements of the matrices L and U. Then
$$Ax = L(Ux) = Ly = b$$

1. Let y = Ux.
2. Solve y from Ly = b (forward substitution).
3. Solve x from Ux = y (back substitution).

(A sketch of both triangular solves follows.)
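These two triangular solves map directly to code. Below is a minimal C sketch using dense row-major arrays; the function names and storage layout are illustrative choices, not taken from any particular solver.

/* Forward substitution: solve L*y = b, L unit lower triangular. */
void forward_subst(int n, const double *L, const double *b, double *y)
{
    for (int i = 0; i < n; i++) {
        double s = b[i];
        for (int j = 0; j < i; j++)
            s -= L[i*n + j] * y[j];   /* subtract already-known terms */
        y[i] = s;                     /* L[i][i] == 1, no division */
    }
}

/* Back substitution: solve U*x = y, U upper triangular. */
void back_subst(int n, const double *U, const double *y, double *x)
{
    for (int i = n - 1; i >= 0; i--) {
        double s = y[i];
        for (int j = i + 1; j < n; j++)
            s -= U[i*n + j] * x[j];
        x[i] = s / U[i*n + i];        /* requires nonzero diagonal */
    }
}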

Advantages of LU Factorization

• When solving Ax = b for multiple b with the same A, we LU-factorize A only once.
• In circuit simulation, the entries of A may change, but the structure of A does not.
– This fact can be used to speed up repeated LU factorization.
– Implemented as symbolic factorization in the "sparse1.3" solver in Spice 3f4.

Gaussian Elimination

• Gaussian elimination is a process of row transformations applied to Ax = b:

$$\left[\begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & b_3 \\ \vdots & & & & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n \end{array}\right]$$

Eliminate the lower triangular part.

Gaussian Elimination

Assume $a_{11} \neq 0$. For each row $i = 2, \dots, n$, subtract $\frac{a_{i1}}{a_{11}}$ times row 1 from row $i$:

$$\left[\begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & b_3 \\ \vdots & & & & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n \end{array}\right] \longrightarrow \left[\begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ 0 & a_{32} & a_{33} & \cdots & a_{3n} & b_3 \\ \vdots & & & & & \vdots \\ 0 & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n \end{array}\right]$$

(all entries outside row 1 are updated)

Eliminating the 1st Column

• Column elimination is equivalent to a row transformation:

$$L_1^{-1} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -\frac{a_{21}}{a_{11}} & 1 & 0 & \cdots & 0 \\ -\frac{a_{31}}{a_{11}} & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ -\frac{a_{n1}}{a_{11}} & 0 & 0 & \cdots & 1 \end{bmatrix}, \qquad L_1^{-1} \left[\, A \mid b \,\right] = \left[\begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} & b_2^{(2)} \\ 0 & a_{32}^{(2)} & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} & b_3^{(2)} \\ \vdots & & & & & \vdots \\ 0 & a_{n2}^{(2)} & a_{n3}^{(2)} & \cdots & a_{nn}^{(2)} & b_n^{(2)} \end{array}\right]$$

i.e., $L_1^{-1} A = A^{(2)}$.

Eliminating the 2nd Column

$$L_2^{-1} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & -\frac{a_{32}^{(2)}}{a_{22}^{(2)}} & 1 & & \\ \vdots & \vdots & & \ddots & \\ 0 & -\frac{a_{n2}^{(2)}}{a_{22}^{(2)}} & 0 & \cdots & 1 \end{bmatrix} \;\; \left( a_{22}^{(2)} \neq 0 \right), \qquad L_2^{-1} A^{(2)} = A^{(3)} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ \vdots & & & & \\ 0 & 0 & a_{n3}^{(3)} & \cdots & a_{nn}^{(3)} \end{bmatrix}$$

(the right-hand side is updated likewise: $b_3^{(3)}, \dots, b_n^{(3)}$)

Continue on Elimination

• Suppose all diagonals are nonzero

Applying all the elementary transformations:

$$L_n^{-1} L_{n-1}^{-1} \cdots L_2^{-1} L_1^{-1} \left[\, A \mid b \,\right] = \left[\begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} & b_2^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} & b_3^{(3)} \\ & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn}^{(n)} & b_n^{(n)} \end{array}\right]$$

The result is upper triangular: $A^{(n)} x = b^{(n)}$.

Triangular System

Gaussian elimination ends up with the following upper triangular system of equations

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn}^{(n)} \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2^{(2)} \\ b_3^{(3)} \\ \vdots \\ b_n^{(n)} \end{pmatrix}$$

Solve this system from the bottom up: $x_n, x_{n-1}, \dots, x_1$

LU Factorization

• Gaussian elimination leads to LU factorization

$$\left( L_n^{-1} L_{n-1}^{-1} \cdots L_2^{-1} L_1^{-1} \right) A = U$$

$$A = \left( L_1 L_2 \cdots L_{n-1} L_n \right) U = LU, \qquad L = L_1 L_2 \cdots L_{n-1} L_n$$

$$L = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ \frac{a_{21}}{a_{11}} & 1 & 0 & \cdots & 0 \\ \frac{a_{31}}{a_{11}} & \frac{a_{32}^{(2)}}{a_{22}^{(2)}} & 1 & & \\ \vdots & \vdots & & \ddots & \\ \frac{a_{n1}}{a_{11}} & \frac{a_{n2}^{(2)}}{a_{22}^{(2)}} & \cdots & & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ & & & \ddots & \\ 0 & 0 & 0 & \cdots & a_{nn}^{(n)} \end{bmatrix}$$

(A sketch of this in-place factorization follows.)
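The multipliers can overwrite the entries they eliminate, so L and U share A's storage. This is a minimal C sketch assuming dense row-major storage and nonzero pivots (no pivoting yet):

/* In-place LU by Gaussian elimination: on return, U occupies the
 * upper triangle of a, and the multipliers of L (unit diagonal
 * implied) occupy the strictly lower triangle. */
void lu_factor(int n, double *a)
{
    for (int k = 0; k < n - 1; k++) {
        for (int i = k + 1; i < n; i++) {
            double lik = a[i*n + k] / a[k*n + k];  /* multiplier l_ik */
            a[i*n + k] = lik;                      /* store L entry */
            for (int j = k + 1; j < n; j++)
                a[i*n + j] -= lik * a[k*n + j];    /* update row i */
        }
    }
}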

Complexity of LU

Eliminating the 1st column (pivot $a_{11} \neq 0$) computes the $n-1$ multipliers $-\frac{a_{i1}}{a_{11}}$ and updates the remaining rows:

# of mul/div $= (n-1)\, n \approx n^2$ for $n \gg 1$

Summing over all elimination steps:
$$\sum_{i=1}^{n} i^2 = \frac{1}{6}\, n(n+1)(2n+1) \sim O(n^3)$$

Cost of Back-Substitution

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn}^{(n)} \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2^{(2)} \\ b_3^{(3)} \\ \vdots \\ b_n^{(n)} \end{pmatrix}$$

$$x_n = \frac{b_n^{(n)}}{a_{nn}^{(n)}}, \qquad x_{n-1} = \frac{b_{n-1}^{(n-1)} - a_{n-1,\,n}^{(n-1)}\, x_n}{a_{n-1,\,n-1}^{(n-1)}}, \qquad \dots$$

Total # of mul/div $= \sum_{i=1}^{n} i = \frac{1}{2}\, n(n+1) \sim O(n^2)$

Zero Diagonal

Example 1: After two steps of Gaussian elimination:

(Figure: a 6×6 circuit matrix containing $\pm\frac{1}{R}$ entries; after the two elimination steps, the pivot position (3, 3) holds a zero.)

Gaussian elimination cannot continue

Pivoting

Solution 1: Interchange rows to bring a non-zero element into position (k, k):

$$\begin{bmatrix} 0 & 1 & 1 \\ \times & \times & 0 \\ \times & \times & 0 \end{bmatrix} \xrightarrow{\text{row exchange}} \begin{bmatrix} \times & \times & 0 \\ 0 & 1 & 1 \\ \times & \times & 0 \end{bmatrix}$$

Solution 2: How about column exchange? Yes, but then the unknowns are re-ordered as well.

$$\begin{bmatrix} 0 & 1 & 1 \\ \times & \times & 0 \\ \times & \times & 0 \end{bmatrix} \xrightarrow{\text{column exchange}} \begin{bmatrix} 1 & 0 & 1 \\ \times & \times & 0 \\ \times & \times & 0 \end{bmatrix}$$

In general, both rows and columns can be exchanged!

Small Diagonal

Example 2:
$$\begin{bmatrix} 1.25 \times 10^{-4} & 1.25 \\ 12.5 & 12.5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 6.25 \\ 75 \end{bmatrix}$$

Assume finite arithmetic: 3-digit floating point. Eliminating with the small pivot (row 2 ← row 2 $- 10^5 \times$ row 1), we get

$$\begin{bmatrix} 1.25 \times 10^{-4} & 1.25 \\ 0 & -1.25 \times 10^{5} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 6.25 \\ -6.25 \times 10^{5} \end{bmatrix}$$

(the entry 12.5 is rounded off). Back-substitution then gives
$$x_2 = 5, \qquad (1.25 \times 10^{-4})\, x_1 + (1.25)\, x_2 = 6.25 \;\Rightarrow\; x_1 = 0$$

Unfortunately, (0, 5) is not the solution. Checking the 2nd equation: 12.5 · 0 + 12.5 · 5 = 62.5 ≠ 75. (A small program reproducing this experiment follows.)
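The 3-digit experiment can be reproduced in code by rounding every intermediate result to 3 significant digits. This is a toy sketch; rnd3 is a helper invented here for illustration, not a standard routine.

#include <math.h>
#include <stdio.h>

/* Round x to 3 significant digits. */
static double rnd3(double x)
{
    if (x == 0.0) return 0.0;
    double s = pow(10.0, 3 - ceil(log10(fabs(x))));
    return round(x * s) / s;
}

int main(void)
{
    /* No pivoting: eliminate with the tiny pivot 1.25e-4. */
    double m   = rnd3(12.5 / 1.25e-4);            /* 1e5 */
    double u22 = rnd3(12.5 - rnd3(m * 1.25));     /* -1.25e5: the 12.5 is lost */
    double b2  = rnd3(75.0 - rnd3(m * 6.25));     /* -6.25e5 */
    double x2  = rnd3(b2 / u22);                  /* 5 */
    double x1  = rnd3(rnd3(6.25 - rnd3(1.25 * x2)) / 1.25e-4);   /* 0! */
    printf("no pivoting:   x1 = %g, x2 = %g\n", x1, x2);

    /* With the rows exchanged, the pivot is 12.5 instead. */
    double m2   = rnd3(1.25e-4 / 12.5);           /* 1e-5 */
    double u22p = rnd3(1.25 - rnd3(m2 * 12.5));   /* 1.25 */
    double b2p  = rnd3(6.25 - rnd3(m2 * 75.0));   /* 6.25 */
    double y2   = rnd3(b2p / u22p);               /* 5 */
    double y1   = rnd3(rnd3(75.0 - rnd3(12.5 * y2)) / 12.5);     /* 1 */
    printf("with pivoting: x1 = %g, x2 = %g\n", y1, y2);
    return 0;
}

With pivoting, the 3-digit result (1, 5) agrees with the exact solution to 3 significant digits.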

Accuracy Depends on Pivoting

$$\begin{bmatrix} 1.25 \times 10^{-4} & 1.25 \\ 12.5 & 12.5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 6.25 \\ 75 \end{bmatrix}$$

Reason:

$a_{11}$ (the pivot) is too small relative to the other entries!

Solution: Don't choose a small element for elimination. Pick a large element by row/column interchanges.

The correct solution to 5-digit accuracy is
$$x_1 = 1.0001, \qquad x_2 = 5.0000$$

What Causes the Accuracy Problem?

• Ill-conditioning: the matrix A is close to singular
• Round-off error: relative magnitudes too disparate

Well-conditioned:
$$\begin{cases} x - y = 0 \\ x + y = 1 \end{cases}$$

Ill-conditioned (nearly parallel equations):
$$\begin{cases} x - y = 0 \\ 1.01x - y = 0.01 \end{cases} \quad \Leftarrow \text{resulting in numerical errors}$$

Pivoting Strategies

1. Partial Pivoting

2. Complete Pivoting

3. Threshold Pivoting

Pivoting Strategy 1

1. Partial Pivoting: (Row interchange only) Choose r as the smallest integer such that:

k )( k )( k a rk = max a jk = ,..., nkj U k Rows k to n L r

Search Area

Pivoting Strategy 2

2. Complete Pivoting: (Row and column interchange) Choose r and s as the smallest integers such that:

$$\left| a_{rs}^{(k)} \right| = \max_{\substack{i = k, \dots, n \\ j = k, \dots, n}} \left| a_{ij}^{(k)} \right|$$

(search area: rows k to n and columns k to n of the reduced matrix $A^{(k)}$)

Pivoting Strategy 3

3. Threshold Pivoting:
a. Apply partial pivoting only if $\left| a_{kk}^{(k)} \right| < \varepsilon_p \left| a_{rk}^{(k)} \right|$, where $\left| a_{rk}^{(k)} \right| = \max_{j = k, \dots, n} \left| a_{jk}^{(k)} \right|$
b. Apply complete pivoting only if $\left| a_{kk}^{(k)} \right| < \varepsilon_p \left| a_{rs}^{(k)} \right|$, where $\left| a_{rs}^{(k)} \right| = \max_{i, j = k, \dots, n} \left| a_{ij}^{(k)} \right|$

($\varepsilon_p$ is user-specified.) Implemented in Spice 3f4. (A sketch of the partial-pivoting variant follows.)
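Below is a minimal C sketch of the partial-pivoting search with the threshold test. The names choose_pivot_row and eps_p, and the dense storage, are illustrative assumptions; Sparse 1.3's actual implementation on its linked-list structure is organized differently.

#include <math.h>

/* At elimination step k, keep the diagonal pivot unless it is
 * smaller than eps_p times the largest candidate in column k. */
int choose_pivot_row(int n, const double *a, int k, double eps_p)
{
    int r = k;
    for (int j = k + 1; j < n; j++)           /* search rows k..n-1 */
        if (fabs(a[j*n + k]) > fabs(a[r*n + k]))
            r = j;                            /* row of largest |a_jk| */
    if (fabs(a[k*n + k]) >= eps_p * fabs(a[r*n + k]))
        return k;                             /* diagonal is big enough */
    return r;                                 /* else exchange rows k and r */
}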

Variants of LU Factorization

• Doolittle method
• Crout method
• Motivated by directly filling in the L/U elements in the storage space of the original matrix A.

$$A = LU = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ l_{21} & 1 & 0 & \cdots & 0 \\ l_{31} & l_{32} & 1 & & \vdots \\ \vdots & & & \ddots & 0 \\ l_{n1} & l_{n2} & l_{n3} & \cdots & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & u_{22} & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & u_{nn} \end{bmatrix}$$

Reuse the storage of A:
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & & & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}$$

Variants of LU Factorization

Hence we need a sequential method that processes the rows and columns of A in a certain order – once a row or column has been processed, it is not used in later processing, so its storage can be overwritten.

$$A = LU, \qquad \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \;\Leftarrow\; \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ l_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & u_{nn} \end{bmatrix}$$

(L fills the strictly lower triangle and U the upper triangle of the array holding A; the unit diagonal of L is not stored.)

Doolittle Method – 1

Compare the first row of $A = LU$: the first row of L is $(1\;\; 0\;\; \cdots\;\; 0)$, so the first row of A is kept as the first row of U.

First solve the 1st row of U, i.e., U(1, :)

$$(u_{11}\;\; u_{12}\;\; u_{13}\;\; \cdots\;\; u_{1n}) = (a_{11}\;\; a_{12}\;\; a_{13}\;\; \cdots\;\; a_{1n})$$

Doolittle Method – 2


Then solve the 1st column of L, i.e., L(2:n, 1)

$$\begin{bmatrix} a_{21} \\ a_{31} \\ \vdots \\ a_{n1} \end{bmatrix} = \begin{bmatrix} l_{21} \\ l_{31} \\ \vdots \\ l_{n1} \end{bmatrix} u_{11}, \qquad u_{11} = a_{11} \;\Rightarrow\; l_{i1} = \frac{a_{i1}}{u_{11}}$$

Doolittle Method – 3


Solve the 2nd row of U, i.e., U(2, 2:n)

$$l_{21}\, (u_{12}\;\; u_{13}\;\; \cdots\;\; u_{1n}) + (u_{22}\;\; u_{23}\;\; \cdots\;\; u_{2n}) = (a_{22}\;\; a_{23}\;\; \cdots\;\; a_{2n})$$

so $u_{2j} = a_{2j} - l_{21} u_{1j}$ for $j = 2, \dots, n$.

Doolittle Method – 4


The computation order of the Doolittle Method:

The computation alternates between U-rows and L-columns:
(1) U row 1, (2) L column 1, (3) U row 2, (4) L column 2, (5) U row 3, (6) L column 3, ...
(A compact implementation is sketched below.)
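A compact C sketch of the Doolittle order, overwriting A in place. Dense row-major storage and the absence of pivoting are simplifying assumptions made here for clarity.

/* Doolittle: for k = 0..n-1, compute row k of U, then column k of L.
 * L's unit diagonal is implicit; everything is stored in a. */
void doolittle(int n, double *a)
{
    for (int k = 0; k < n; k++) {
        for (int j = k; j < n; j++) {         /* row k of U */
            double s = a[k*n + j];
            for (int m = 0; m < k; m++)
                s -= a[k*n + m] * a[m*n + j]; /* l_km * u_mj */
            a[k*n + j] = s;
        }
        for (int i = k + 1; i < n; i++) {     /* column k of L */
            double s = a[i*n + k];
            for (int m = 0; m < k; m++)
                s -= a[i*n + m] * a[m*n + k]; /* l_im * u_mk */
            a[i*n + k] = s / a[k*n + k];
        }
    }
}

The Crout method on the next slide is the mirror image: it computes a column of L first, then a row of U whose diagonal is normalized to 1.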

Crout Method

• Similar to the Doolittle Method, but starts from the 1st column (Doolittle starts from the 1st row.)

$$A = LU = \begin{bmatrix} l_{11} & 0 & 0 & \cdots & 0 \\ l_{21} & l_{22} & 0 & \cdots & 0 \\ l_{31} & l_{32} & l_{33} & & \vdots \\ \vdots & & & \ddots & 0 \\ l_{n1} & l_{n2} & l_{n3} & \cdots & l_{nn} \end{bmatrix} \begin{bmatrix} 1 & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & 1 & u_{23} & \cdots & u_{2n} \\ 0 & 0 & 1 & & \vdots \\ \vdots & & & \ddots & \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$$

The diagonals of U are normalized! The computation order of the Crout method alternates the other way:
(1) L column 1, (2) U row 1, (3) L column 2, (4) U row 2, (5) L column 3, (6) U row 3, ...

Storage of LU Factorization

Using only one 2-dimensional array!

(Figure: as the factorization proceeds, U progressively overwrites the upper triangle of the array holding A, and L the lower triangle.)

• In a sparse implementation, this type of storage requires growing memory space because of the fill-ins created during factorization.

Summary

• LU factorization has been used in virtually all circuit simulators
– Good for multiple RHS and sensitivity calculation
• Pivoting is required to handle zero diagonals and to improve numerical accuracy
– Partial pivoting (row exchange): a tradeoff between accuracy and efficiency
– The matrix condition number is used to analyze the effect of round-off errors

PRINCIPLES OF CIRCUIT SIMULATION

Part 2. Programming Techniques for Sparse Matrices

Outline

• Why Sparse Matrix Techniques? • Sparse Matrix Data Structure • Markowitz Pivoting • Diagonal Pivoting for MNA Matrices • Modified Markowitz pivoting • How to Handle Sparse RHS • Summary

Why Sparse Matrix?

Motivation:
– $n = 10^3$ equations
– Complexity of Gaussian elimination: $\sim O(n^3)$
– $n = 10^3$ → $\sim 10^9$ flop operations → ~10 sec (on a 1 GHz computer) → storage: $10^6$ words

Exploiting sparsity:
– MNA → ~3 nonzeros per row
– Gaussian elimination can then reach complexity $\sim O(n^{1.1})$ to $O(n^{1.5})$ (empirical complexity)

Sparse Matrix Programming

• Use a linked-list data structure
– to avoid storing zeros
– used to be hard before the 1980s: in Fortran!
• Avoid trivial operations: 0·x = 0, 0 + x = x
• Two kinds of zero:
– Structural zeros – always 0, independent of numerical operations
– Numerical zeros – resulting from computation
• Avoid losing sparsity (very important!)
– sparsity changes with pivoting

Non-zero Fill-ins

• Gaussian elimination causes nonzero fill-ins

(Figure: an example matrix before and after one step of Gaussian elimination; several entries that were zero before the step become nonzero fill-ins.)

How to Maintain Sparsity

• One should choose an appropriate pivoting order (during Gaussian elimination, G.E.) to avoid a large increase in fill-ins.

After row/column reordering, no fill-ins are introduced:

$$\begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & & & \\ \times & & \times & & \\ \times & & & \times & \\ \times & & & & \times \end{bmatrix} \xrightarrow{\text{G.E.}} \text{(fills in completely)} \qquad \begin{bmatrix} \times & & & & \times \\ & \times & & & \times \\ & & \times & & \times \\ & & & \times & \times \\ \times & \times & \times & \times & \times \end{bmatrix} \xrightarrow{\text{G.E.}} \text{(no fill-ins)}$$

Markowitz Criterion

• Markowitz criterion
– kth pivot; $A^{(k)}$ is the reduced matrix; NZ = nonzero
– The number of nonzeros in a row (column) is also called the row (column) degree.
– The column degrees can be used for column ordering.

(Figure: a 5×5 reduced matrix $A^{(k)}$ with one element marked "if chosen for pivoting". Row degrees $r_i$, the # NZ per row: 4, 3, 2, 2, 2. Column degrees $c_j$, the # NZ per column: 3, 3, 2, 2, 3.)

Markowitz Product

• If Gaussian elimination pivots on element (i, j):

• Markowitz product $= (r_i - 1)(c_j - 1)$ = the maximum possible number of fill-ins when pivoting at (i, j)

• Recommendations (implemented in Sparse1.3; see the sketch below):
– Best: the element with the largest magnitude and the smallest Markowitz product
– Try the threshold test after choosing the smallest Markowitz product (M.P.)
– Break ties (equal M.P.) by choosing the element with the largest magnitude
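A C sketch of the selection rule. The dense double loop is only for illustration (Sparse 1.3 walks linked lists of nonzeros and never scans zeros), and nz_row/nz_col are assumed arrays holding the current row and column degrees of the reduced matrix; the threshold test is omitted.

#include <math.h>

/* Pick pivot (pr,pc) among rows/cols k..n-1 with the smallest
 * Markowitz product (r_i-1)*(c_j-1); ties broken by magnitude. */
void markowitz_pivot(int n, const double *a, const int *nz_row,
                     const int *nz_col, int k, int *pr, int *pc)
{
    long best_mp = -1;
    double best_mag = 0.0;
    *pr = *pc = k;
    for (int i = k; i < n; i++)
        for (int j = k; j < n; j++) {
            double v = fabs(a[i*n + j]);
            if (v == 0.0) continue;          /* not a candidate */
            long mp = (long)(nz_row[i] - 1) * (nz_col[j] - 1);
            if (best_mp < 0 || mp < best_mp ||
                (mp == best_mp && v > best_mag)) {
                best_mp = mp; best_mag = v;
                *pr = i; *pc = j;
            }
        }
}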

Sparse Matrix Data Structure

Example matrix:

r\c    1      2      3
 1     1.0    1.2    0
 2     0      1.5    0
 3     2.1    0      1.7

Matrix Element structure

typedef struct elem {
    double value;               /* numerical value of the nonzero */
    int    row, col;            /* row and column indices */
    struct elem *next_in_row;   /* next nonzero in the same row */
    struct elem *next_in_col;   /* next nonzero in the same column */
} Element;
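As a small usage illustration, the orthogonal links let the solver scan a single row or column while touching only stored nonzeros. FirstInCol is assumed here to be the array of column-list heads shown on the next slide.

/* Sum the nonzeros of column j by following next_in_col links. */
double column_sum(Element *FirstInCol[], int j)
{
    double s = 0.0;
    for (Element *e = FirstInCol[j]; e != NULL; e = e->next_in_col)
        s += e->value;        /* zeros are never stored or visited */
    return s;
}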

Data Structure in Sparse 1.3

• Sparse 1.3 – written by Ken Kundert, 1985~1988, then a PhD student at Berkeley, later with Cadence Design Systems, Inc.

(Diagram: the Sparse 1.3 linked-list structure for the example matrix. FirstInRow[i] heads the list of row i's nonzeros, linked by next_in_row; FirstInCol[j] heads column j's list, linked by next_in_col; diag[i] points directly at the diagonal element. Each node stores (value, row, col):
row 1: (1.0, 1, 1) → (1.2, 1, 2); row 2: (1.5, 2, 2); row 3: (2.1, 3, 1) → (1.7, 3, 3).)

ASTAP Data Structure

• ASTAP is an IBM simulator using STA (Sparse Tableau Analysis).

For the same example matrix, values are stored row-wise:

Row Pointers: 1 3 4 6 −1
Col Indices:  1 2 2 1 3 −1
Values:       1.0 1.2 1.5 2.1 1.7

✓ Row Pointers point to the beginning of each row's entries in Col Indices.
✓ Nonzeros in the same row are stored contiguously, indexed by their column indices.
✓ Used by many iterative sparse solvers. (A row-wise product sketch follows.)
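A sketch of the matrix-vector product y = A·x over this storage, using the slide's 1-based indexing; row_ptr, col_idx, and val are illustrative names, and the arrays are assumed allocated with the extra leading slot that 1-based indexing requires.

/* y = A*x with ASTAP/CSR-style row-wise storage: row i's nonzeros
 * occupy positions row_ptr[i] .. row_ptr[i+1]-1 of col_idx and val. */
void matvec(int n, const int *row_ptr, const int *col_idx,
            const double *val, const double *x, double *y)
{
    for (int i = 1; i <= n; i++) {
        double s = 0.0;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; p++)
            s += val[p] * x[col_idx[p]];  /* only stored nonzeros */
        y[i] = s;
    }
}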

Key Loops in a SPICE Program

$$C\, \frac{dx}{dt} = f(x, t)$$

Update stamps related to time (e.g., backward Euler with step h):
$$C\, x_{n+1} = C\, x_n + h \cdot f(x_{n+1},\, t_{n+1})$$

Newton-Raphson (at point $x_{n+1}^{(k-1)}$): linearize with the Jacobian
$$A = \frac{\partial f(x_{n+1},\, t_{n+1})}{\partial x}$$
and invoke the linear solver for the update; then x := x + Δx. On convergence, advance time: t := t + Δt.

Linear Solves in Simulation

The same loops, viewed per time point:
– Update stamps related to time
– Newton-Raphson (at point x): invoke the linear solver
– x := x + Δx (iterate until converged)
– t := t + Δt (move to the next time point)

At each time point, Ax = b has to be solved many times. (A structural sketch follows below.)
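The nesting can be summarized as code. Every helper below is a placeholder stub, not an actual SPICE routine; only the loop structure is the point.

/* Stubs standing in for the real simulator steps. */
static void update_time_stamps(double t, double h) { (void)t; (void)h; }
static void update_nonlinear_stamps(void) {}
static void solve_linear_system(void) {}   /* LU factor + fwd/back solve */
static void apply_update(void) {}          /* x := x + dx */
static int  converged(void) { return 1; }

void transient_loop(double t_stop, double h)
{
    for (double t = 0.0; t < t_stop; t += h) {   /* outer: time points */
        update_time_stamps(t, h);                /* refresh T-type stamps */
        do {                                     /* inner: Newton-Raphson */
            update_nonlinear_stamps();           /* refresh X-type stamps */
            solve_linear_system();               /* Ax = b solved again */
            apply_update();
        } while (!converged());
    }
}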

Structure of Matrix Stamps

• In circuit simulation, the matrix being solved repeatedly has the same structure; only some entries vary at different frequency or time points.

Typical matrix structure

(Figure: a matrix A whose entries are labeled by type.)
C = constant
T = time-varying
X = nonlinear (varying even at the same time point)

Strategies for Efficiency

• Utilizing the structural information can greatly improve the solving efficiency.

• Strategies: – Weighted Markowitz Product – Reuse the LU factorization – Iterative solver (by conditioning) – ...

A Good (Sparse) LU Solver

Properties of a good LU solver:
• Should have a good column ordering.
• With a good column ordering, partial (row) pivoting is enough!
• Should have an ordering/elimination separated design:
– i.e., ordering is separated from elimination.
– SuperLU does this, but Sparse1.3 doesn't.

Optimal Ordering is NP-hard

• The ordering has a significant impact on the memory and computational requirements of the later stages.
• However, finding the optimal ordering for A (in the sense of minimizing fill-in) has been proven to be NP-complete.
• Heuristics must be used for all but simple (or specially structured) cases.

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, New York, 1979.

Column Ordering

Why important?
• A good column ordering greatly reduces the number of fill-ins, resulting in a vast speedup.
• However, searching for a minimum-degree pivot at each step (as in Sparse 1.3) is not efficient.
• It is best to obtain a good ordering before elimination (as SuperLU does), but this is not easy!

Available Ordering Algorithms

SuperLU uses the following:

• Multiple Minimum Degree (MMD) applied to the structure of $A^T A$. – Mostly good
• Multiple Minimum Degree (MMD) applied to the structure of $A^T + A$. – Mostly good
• Column Approximate Minimum Degree (COLAMD). – Mostly not good!

Summary

• Exploiting sparsity reduces CPU time and memory.
• The Markowitz algorithm is a good tradeoff between overhead (computing Markowitz products) and savings (fewer fill-ins).
• Use weighted Markowitz products to account for the different types of element stamps in nonlinear dynamic circuit simulation.
• Consider sparse RHS vectors and selective unknowns for further speedup.

No-turn-in Exercise

• Spice3f4 contains a solver called Sparse 1.3 (in src/lib/sparse).
• It is an independent solver that can be used outside Spice3f4.
• Download the sparse package from the course web page (sparse.tar.gz) (or ask the TA).
• Find the test program called "spTest.c".
• Modify this program if necessary so that you can run the solver.
• Create some test matrices to test the sparse solver.
• Compare the solved results to those from MATLAB.

Software

• Sparse1.3 is in C and was programmed by Dr. Ken Kundert (fellow of Cadence; architect of Spectre). Source code is available from http://www.netlib.org/sparse/
• SparseLib++ is in C++ and comes from NIST. The authors are J. Dongarra, A. Lumsdaine, R. Pozo, and K. Remington. See "A Sparse Matrix Library in C++ for High Performance Architectures," Proc. of the Second Object Oriented Numerics Conference, pp. 214-218, 1994. The paper and the C++ source code are available from http://math.nist.gov/sparselib%2b%2b/

References

1. G. Dahlquist and A. Björck, Numerical Methods (translated by N. Anderson), Prentice-Hall, Englewood Cliffs, New Jersey, 1974.
2. W. J. McCalla, Fundamentals of Computer-Aided Circuit Simulation, Kluwer Academic Publishers. (Chapter 3, "Sparse Matrix Methods")
3. Albert Ruehli (Ed.), Circuit Analysis, Simulation and Design, North-Holland, 1986. (K. Kundert, "Sparse Matrix Techniques")
4. J. Dongarra, A. Lumsdaine, R. Pozo, and K. Remington, "A Sparse Matrix Library in C++ for High Performance Architectures," Proc. of the Second Object Oriented Numerics Conference, pp. 214-218, 1994.
