FMB - NLA

Block-tridiagonal matrices


Block-tridiagonal matrices - where do these arise?

- as a result of a particular mesh-point ordering

- as part of a factorization procedure, for example when we compute the eigenvalues of a symmetric matrix, which is first reduced to tridiagonal form

Block-tridiagonal matrices

[Figure: a two-dimensional domain partitioned into strips Ω1, Ω2, Ω3]

Consider a two-dimensional domain partitioned in strips. Assume that points on the lines of intersection are coupled only to their nearest neighbors in the underlying mesh (and we do not have periodic boundary conditions). Hence, there is no coupling between subdomains except through the "glue" on the interfaces.

Block-tridiagonal matrices

When the subdomains are ordered lexicographically from left to right, a domain Ω_i becomes coupled only to its predecessor Ω_{i-1} and its successor Ω_{i+1}, and the corresponding matrix takes the form of a block tridiagonal matrix A = tridiag(A_{i,i-1}, A_{i,i}, A_{i,i+1}), or

$$A = \begin{bmatrix} A_{11} & A_{12} & & 0\\ A_{21} & A_{22} & A_{23} & \\ & \ddots & \ddots & \ddots\\ 0 & & A_{n,n-1} & A_{n,n} \end{bmatrix}.$$

For definiteness we let the boundary meshline Ω_i ∩ Ω_{i+1} belong to Ω_i.

In order to preserve the sparsity pattern we shall factorize the matrix without the use of permutations. Naturally, the lines of intersection do not have to be straight.
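To make the block structure concrete, here is a minimal Python/NumPy sketch that assembles A = tridiag(A_{i,i-1}, A_{i,i}, A_{i,i+1}) from lists of blocks; the block sizes and the particular couplings (4I on the diagonal, -I off it) are hypothetical illustration data, not taken from the slides.

```python
import numpy as np

# Hypothetical sizes: p = 4 diagonal blocks, each of order m = 3.
p, m = 4, 3
I = np.eye(m)
A_diag = [4.0 * I for _ in range(p)]       # A_{ii}
A_low  = [-1.0 * I for _ in range(p - 1)]  # A_{i,i-1}
A_up   = [-1.0 * I for _ in range(p - 1)]  # A_{i,i+1}

# Assemble A = tridiag(A_{i,i-1}, A_{ii}, A_{i,i+1}) block row by block row.
A = np.zeros((p * m, p * m))
for i in range(p):
    A[i*m:(i+1)*m, i*m:(i+1)*m] = A_diag[i]
    if i > 0:
        A[i*m:(i+1)*m, (i-1)*m:i*m] = A_low[i - 1]
    if i < p - 1:
        A[i*m:(i+1)*m, (i+1)*m:(i+2)*m] = A_up[i]
```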

Block-tridiagonal matrices

How do we factorize a (block-)tridiagonal matrix A?


Let A be block-tridiagonal, and expressed as A = D_A - L_A - U_A.

Convenient: seek L, D, U such that A = L D^{-1} U, where D is (block) diagonal, L = D - L_A and U = D - U_A. Direct computation:

$$(D - L_A)\, D^{-1} (D - U_A) = D - L_A - U_A + L_A D^{-1} U_A = D_A - L_A - U_A,$$

i.e., D_A = D + L_A D^{-1} U_A.

Important: L_A and U_A are strictly lower and upper triangular.

A = L D^{-1} U for pointwise tridiagonal matrices (with L = D - L_A, U = D - U_A)

$$\begin{bmatrix} a_{11} & a_{12} & & 0\\ a_{21} & a_{22} & a_{23} & \\ & \ddots & \ddots & \ddots\\ 0 & & a_{n,n-1} & a_{n,n} \end{bmatrix} = \begin{bmatrix} d_1 & & & \\ a_{21} & d_2 & & \\ & \ddots & \ddots & \\ & & a_{n,n-1} & d_n \end{bmatrix} \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix}^{-1} \begin{bmatrix} d_1 & a_{12} & & \\ & d_2 & \ddots & \\ & & \ddots & a_{n-1,n}\\ & & & d_n \end{bmatrix}$$

Factorization algorithm:

$$d_1 = a_{11}; \qquad d_i = a_{i,i} - \frac{a_{i,i-1}\, a_{i-1,i}}{d_{i-1}}, \quad i = 2, \dots, n.$$

Solution of systems with A = L D^{-1} U

Given A x = f with A = (D - L_A) D^{-1} (D - U_A), first solve (D - L_A) w = f by forward substitution, then (D - U_A) x = D w by back substitution.
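A minimal Python sketch of the factorization recursion for the d_i and the two substitution sweeps, for a tridiagonal matrix stored by its three diagonals; the test problem (constant diagonals 2, -1, -1) is a hypothetical model example.

```python
import numpy as np

def tridiag_factor(a, b, c):
    """d_1 = a_11; d_i = a_ii - a_{i,i-1} a_{i-1,i} / d_{i-1}."""
    n = len(a)
    d = np.empty(n)
    d[0] = a[0]
    for i in range(1, n):
        d[i] = a[i] - b[i - 1] * c[i - 1] / d[i - 1]
    return d

def tridiag_solve(d, b, c, f):
    """Solve (D - L_A) D^{-1} (D - U_A) x = f by two substitution sweeps."""
    n = len(f)
    w = np.empty(n); x = np.empty(n)
    w[0] = f[0] / d[0]
    for i in range(1, n):                 # forward: (D - L_A) w = f
        w[i] = (f[i] - b[i - 1] * w[i - 1]) / d[i]
    x[-1] = w[-1]
    for i in range(n - 2, -1, -1):        # backward: (D - U_A) x = D w
        x[i] = w[i] - c[i] * x[i + 1] / d[i]
    return x

# Hypothetical model problem: diagonals (2, -1, -1), right-hand side of ones.
n = 6
a = 2.0 * np.ones(n); b = -np.ones(n - 1); c = -np.ones(n - 1)
d = tridiag_factor(a, b, c)
x = tridiag_solve(d, b, c, np.ones(n))
A = np.diag(a) + np.diag(b, -1) + np.diag(c, 1)
assert np.allclose(A @ x, np.ones(n))
```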

Block-tridiagonal matrices

Let A be block-tridiagonal, and expressed as A = D_A - L_A - U_A. One can envisage three major versions of the factorization algorithm:

(i) A = (D - L_A) D^{-1} (D - U_A)

(ii) A = (D - L_A) (I - D^{-1} U_A)

(iii) A = (I - L̃_A) D (I - Ũ_A), where L̃_A = L_A D^{-1} and Ũ_A = D^{-1} U_A.

The diagonal blocks D_i of D are computed from

$$D_1 = A_{11}; \qquad D_i = A_{ii} - A_{i,i-1} D_{i-1}^{-1} A_{i-1,i}, \quad i = 2, \dots, n,$$

or from

$$Z_0 = 0; \qquad Z_i = \bigl(A_{ii} - A_{i,i-1} Z_{i-1} A_{i-1,i}\bigr)^{-1}, \quad i = 1, \dots, n,$$

where Z_i = D_i^{-1} (inverse-free substitutions). Here

$$A^{-1} = (I - \tilde{U}_A)^{-1} D^{-1} (I - \tilde{L}_A)^{-1},$$

and since L̃_A is strictly (block) lower triangular, hence nilpotent,

$$(I - \tilde{L}_A)^{-1} = (I + \tilde{L}_A)(I + \tilde{L}_A^2)(I + \tilde{L}_A^4) \cdots (I + \tilde{L}_A^{2^s}),$$

with 2^{s+1} ≥ n, and similarly for (I - Ũ_A)^{-1}.
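The product form above is what makes the substitutions "inverse free": applying (I - L̃_A)^{-1} reduces to multiplications and repeated squarings. A minimal NumPy sketch in the pointwise case, on hypothetical random test data:

```python
import numpy as np

def inverse_free_apply(Lt, v):
    """Apply (I - Lt)^{-1} to v via (I + Lt)(I + Lt^2)(I + Lt^4)...;
    valid because Lt is strictly lower triangular, hence Lt^n = 0."""
    n = Lt.shape[0]
    x = v.copy()
    P = Lt.copy()
    k = 1
    while k < n:          # ceil(log2(n)) factors suffice: stop once 2^s >= n
        x = x + P @ x     # multiply by the factor (I + Lt^k)
        P = P @ P         # square the power: Lt^(2k)
        k *= 2
    return x

# Hypothetical test: compare against a direct triangular solve.
rng = np.random.default_rng(1)
n = 8
Lt = np.tril(rng.random((n, n)), -1)
v = rng.random(n)
assert np.allclose(inverse_free_apply(Lt, v), np.linalg.solve(np.eye(n) - Lt, v))
```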

Existence of factorization for block-tridiagonal matrices

We assume that the matrices are real. It can be shown that the pivot block D_r (the matrix A^{(r)}_{11} below) is always nonsingular for two important classes of matrices, namely for

- matrices which are positive definite, i.e.,

  $$x^T A x > 0 \quad \text{for all } x \in \mathbb{R}^n,\ x \neq 0 \quad (\text{if } A \text{ has order } n),$$

- blockwise generalized diagonally dominant matrices (also called block H-matrices), i.e., for which the diagonal blocks A_{ii} are nonsingular and

  $$\|A_{ii}^{-1} A_{i,i+1}\| + \|A_{ii}^{-1} A_{i,i-1}\| \le 1, \quad i = 1, 2, \dots, n$$

  (here A_{1,0} = 0, A_{n,n+1} = 0).
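A small NumPy sketch that checks the blockwise dominance condition for the block tridiagonal model used earlier; the choice of the spectral norm is an assumption, since the slides do not fix the norm.

```python
import numpy as np

def is_block_diag_dominant(A_diag, A_low, A_up):
    """Check ||A_ii^{-1} A_{i,i-1}|| + ||A_ii^{-1} A_{i,i+1}|| <= 1 per block row."""
    p = len(A_diag)
    for i in range(p):
        Ainv = np.linalg.inv(A_diag[i])
        s = 0.0
        if i > 0:                    # A_{i,i-1} exists only for i > 0
            s += np.linalg.norm(Ainv @ A_low[i - 1], 2)
        if i < p - 1:                # A_{i,i+1} exists only for i < p-1
            s += np.linalg.norm(Ainv @ A_up[i], 2)
        if s > 1.0:
            return False
    return True

# Model data: A_ii = 4I, A_{i,i-1} = A_{i,i+1} = -I, so each block-row sum is 1/2.
m, p = 3, 4
I = np.eye(m)
print(is_block_diag_dominant([4.0 * I] * p, [-I] * (p - 1), [-I] * (p - 1)))  # True
```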


The factorization passes through stages r = 1, 2, …, n - 1.

For two important classes of matrices it holds that the successive top blocks, i.e., the pivot matrices which arise after every factorization stage, are nonsingular.

At every stage the current matrix A^{(r)} is partitioned in 2 × 2 blocks,

$$A^{(1)} = A = \begin{bmatrix} A_{11} & A_{12} & \cdots & 0\\ A_{21} & A_{22} & A_{23} & \cdots\\ \vdots & \ddots & \ddots & \ddots\\ 0 & & & A_{nn} \end{bmatrix} = \begin{bmatrix} A^{(1)}_{11} & A^{(1)}_{12}\\ A^{(1)}_{21} & A^{(1)}_{22} \end{bmatrix}.$$

At the r-th stage we compute (A^{(r)}_{11})^{-1} and factor

$$A^{(r)} = \begin{bmatrix} I & 0\\ A^{(r)}_{21} \bigl(A^{(r)}_{11}\bigr)^{-1} & I \end{bmatrix} \begin{bmatrix} A^{(r)}_{11} & A^{(r)}_{12}\\ 0 & A^{(r+1)} \end{bmatrix},$$

where A^{(r+1)} = A^{(r)}_{22} - A^{(r)}_{21} (A^{(r)}_{11})^{-1} A^{(r)}_{12} is the so-called Schur complement.

Existence of factorization for block-tridiagonal matrices

The factorization of A is equivalent to block Gaussian elimination of it. Note then that the only block in A^{(r)}_{22} which will be affected by the elimination (of the block matrix A^{(r)}_{21}) is the top block of the block tridiagonal decomposition of A^{(r)}_{22}, i.e., A^{(r+1)}_{11}, the new pivot matrix.

We show that for the above matrix classes the Schur complement A^{(r+1)} = A^{(r)}_{22} - A^{(r)}_{21} (A^{(r)}_{11})^{-1} A^{(r)}_{12} belongs to the same class as A^{(r)}, i.e., in particular, that the pivot entries are nonsingular.


" #

11 12

Lemma 1 Let = be positive definite. Then ii = 1 2 and the

21 22

1 Schur complement = 22 21 11 12 are also positive definite.

Proof There holds = 1 11 1 for all =( 1 0). Hence 1 11 1 0 for all

1, i.e., 11 is positive definite. Similarly, it can be shown that 22 is positive definite.

Since is nonsingular then

1 = = for =

1 so 0 for all 6= 0 i.e., the inverse of is also positive definite. Use now the explicit form of the inverse computed by use of the factorization ,

11 12 11 0 0 £ £ 1 2 3 2 3 2 3 2 3 = =

1 1 40 5 4 0 5 4 21 11 5 4£ 5

where £ indicates entries not important for the present discussion.

1 1 Hence, since is positive definite, so is its diagonal block . Hence, the inverse of

1 , and therefore also , is positive definite.
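A short numeric illustration of Lemma 1, on a randomly generated SPD matrix (hypothetical test data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2                       # k = order of the A11 block
B = rng.random((n, n))
A = B @ B.T + n * np.eye(n)       # symmetric positive definite by construction
A11, A12 = A[:k, :k], A[:k, k:]
A21, A22 = A[k:, :k], A[k:, k:]
S = A22 - A21 @ np.linalg.inv(A11) @ A12    # Schur complement

# A11, A22 and S all have positive eigenvalues, as Lemma 1 asserts.
for M in (A11, A22, S):
    assert np.all(np.linalg.eigvalsh(M) > 0)
```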


Corollary 1. When A^{(r)} is positive definite, A^{(r+1)} and, in particular, A^{(r+1)}_{11} are positive definite.

Proof. A^{(r+1)} is a Schur complement of A^{(r)}, so by Lemma 1, A^{(r+1)} is positive definite when A^{(r)} is. In particular, its top diagonal block is positive definite.


" #

11 12

Lemma 2 Let = be blockwise generalized diagonally dominant

21 22

1 where is block tridiagonal. Then the Schur complement = 22 21 11 12 is also generalized diagonally dominant.

Proof (Hint) Since the only matrix block in which has been changed

(Ö +1) from 22 is its top block 11 to 11 it suffices to show that 11 is non- singular and the first block column is generalized diagonally dominant.

Linear recursions

Consider the solution of the linear system of equations A x = b, where A has already been factorized as A = LU. The matrices L and U are lower- and upper-triangular, respectively. To compute x, we must perform two steps:

forward substitution: solve L z = b, i.e.,

$$z_1 = b_1 / l_{11}; \qquad z_i = \Bigl( b_i - \sum_{k=1}^{i-1} l_{i,k} z_k \Bigr) / l_{i,i}, \quad i = 2, \dots, n;$$

backward substitution: solve U x = z, i.e.,

$$x_n = z_n / u_{n,n}; \qquad x_i = \Bigl( z_i - \sum_{k=i+1}^{n} u_{i,k} x_k \Bigr) / u_{i,i}, \quad i = n-1, \dots, 1.$$

While the implementation of the forward and back substitution on a serial computer is trivial, implementing them on a vector or parallel computer system is problematic. The reason is that these relations are particular examples of a linear recursion, which is an inherently sequential process. A general m-level recurrence relation reads

$$x_i = a_{1,i}\, x_{i-1} + a_{2,i}\, x_{i-2} + \cdots + a_{m,i}\, x_{i-m} + b_i,$$

and the performance of its straightforward vector or parallel implementation is degraded due to the existing backward data dependencies.
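For illustration, a two-level (m = 2) recurrence in plain Python with hypothetical coefficients; each x_i needs x_{i-1} and x_{i-2}, so the loop iterations cannot be executed independently:

```python
def recurrence2(a1, a2, b):
    """x_i = a1[i]*x_{i-1} + a2[i]*x_{i-2} + b[i]: inherently sequential."""
    n = len(b)
    x = [0.0] * n
    x[0] = b[0]
    if n > 1:
        x[1] = a1[1] * x[0] + b[1]
    for i in range(2, n):   # backward data dependency on x[i-1] and x[i-2]
        x[i] = a1[i] * x[i - 1] + a2[i] * x[i - 2] + b[i]
    return x

print(recurrence2([0, 1, 1, 1, 1], [0, 0, 0.5, 0.5, 0.5], [1, 1, 1, 1, 1]))
```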

Block-tridiagonal matrices

Can we somehow speed up the solution of systems with bi- or tridiagonal matrices?

Multifrontal solution methods

[Figure: (a) the two-way frontal method: the unknowns are numbered 1, 3, 5, 7, 9, 8, 6, 4, 2 from both ends, with x_0 the middle node; (b) the structure of the correspondingly ordered matrix A]

Any tridiagonal or block tridiagonal matrix can be attacked in parallel from both ends, after a proper numbering of the unknowns. It can be seen that we can work independently on the odd-numbered and the even-numbered points until we have eliminated all entries except the final corner one.


Hence, the factorization and forward substitution can occur in parallel for the two fronts (the even and the odd). At the final point we can either continue in parallel with the back substitution to compute the solution at all the other interior points, or we can use the same type of two-way frontal method for each of the two structures which have been split by the already computed solution at the middle point. This method of recursively dividing the domain into smaller and smaller pieces, which can all be handled in parallel, can be continued for ⌈log₂ n⌉ steps, after which we have just one unknown per subinterval.
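A sketch of this two-way (twisted) elimination for a pointwise tridiagonal system A x = f, assuming NumPy; the two elimination sweeps are independent and could run on two processors. The meeting index m = n // 2 and the test data are assumptions for illustration.

```python
import numpy as np

def twisted_solve(a, b, c, f):
    """Eliminate from both ends toward the middle index m (assumes n >= 3)."""
    n = len(a); m = n // 2
    dt = np.empty(n); gt = np.empty(n)   # top sweep
    db = np.empty(n); gb = np.empty(n)   # bottom sweep
    dt[0], gt[0] = a[0], f[0]
    for i in range(1, m):                # could run concurrently with ...
        r = b[i - 1] / dt[i - 1]
        dt[i] = a[i] - r * c[i - 1]
        gt[i] = f[i] - r * gt[i - 1]
    db[n - 1], gb[n - 1] = a[n - 1], f[n - 1]
    for i in range(n - 2, m, -1):        # ... this loop
        r = c[i] / db[i + 1]
        db[i] = a[i] - r * b[i]
        gb[i] = f[i] - r * gb[i + 1]
    # The two fronts meet at row m:
    diag = a[m] - b[m - 1] * c[m - 1] / dt[m - 1] - c[m] * b[m] / db[m + 1]
    rhs  = f[m] - b[m - 1] * gt[m - 1] / dt[m - 1] - c[m] * gb[m + 1] / db[m + 1]
    x = np.empty(n)
    x[m] = rhs / diag
    for i in range(m - 1, -1, -1):       # substitute outward, again in parallel
        x[i] = (gt[i] - c[i] * x[i + 1]) / dt[i]
    for i in range(m + 1, n):
        x[i] = (gb[i] - b[i - 1] * x[i - 1]) / db[i]
    return x

n = 9
a = 2.0 * np.ones(n); b = -np.ones(n - 1); c = -np.ones(n - 1)
x = twisted_solve(a, b, c, np.ones(n))
A = np.diag(a) + np.diag(b, -1) + np.diag(c, 1)
assert np.allclose(A @ x, np.ones(n))
```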


The idea to perform Gaussian elimination from both ends of a tridiagonal matrix, also called twisted factorization, was first proposed by Babuška in 1972.

Note that in this method no back substitution is required.

Odd-even elimination / cyclic reduction / divide-and-conquer

We sketch some parallel computation methods for recurrence relations. The methods are applicable for general (block-)band matrices. For simplicity of presentation, the idea is illustrated on one-level or two-level scalar recursions:

$$x_1 = b_1; \qquad x_i = a_i x_{i-1} + b_i, \quad i = 2, 3, \dots, n;$$

$$a_{i,i-1} x_{i-1} + a_{i,i} x_i + a_{i,i+1} x_{i+1} = b_i, \quad i = 1, 2, \dots, n, \qquad a_{1,0} = a_{n,n+1} = 0.$$

The corresponding matrix-vector equivalent of the above recursions is to solve a system A x = b, where A is lower bidiagonal and tridiagonal, respectively.


An idea to gain some parallelism when solving linear recursions is to reduce the size of the corresponding linear system by eliminating the odd-indexed unknowns from the even-numbered equations (or vice versa). This elimination can be done in parallel for each of the equations, because the odd-numbered equations and the even-numbered equations are both mutually uncoupled. The same procedure can then be applied to the reduced system of equations for the even-numbered (or odd-numbered) unknowns, and so on. With every elimination step we reduce the order of the coupled equations to about half its previous order, and eventually we are left with a single equation or a system of uncoupled equations.

[Figure: the sparsity pattern of a tridiagonal matrix of order 7 and its pattern after one odd-even elimination step]


In the odd-even elimination (or odd-even reduction) method we eliminate the odd-numbered unknowns (i.e., numbers 1 (mod 2)) and we are left with a tridiagonal system for the even-numbered (i.e., numbers 0 (mod 2)) unknowns. The method is repeated, i.e., we eliminate the unknowns numbered 2 (mod 4) and are left with the unknowns numbered 0 (mod 4), and so on. Eventually we are left with just a single equation, which we solve. At this point we can use back substitution to compute the remaining unknowns.

The odd-even simultaneous elimination

There exists a second version of this method, called the odd-even simultaneous elimination. Here we eliminate the odd-numbered unknowns from the even-numbered equations and simultaneously the even-numbered unknowns from the odd-numbered equations. In this way we are left with two decoupled systems of equations, one for the even-numbered unknowns and one for the odd-numbered unknowns. The same method can be recursively applied to these two sets in parallel. Hence, in this method we do not reduce the size of the problem, but we successively decouple the problem into smaller and smaller subproblems. Eventually we arrive at a system in diagonal form, which we solve for all unknowns in parallel. Therefore, in this method there is no need to perform back substitution.


[Figure: Two elimination steps of the simultaneous elimination method]


The computational complexity of the sequential LU factorization with forward and back substitution for tridiagonal matrices is 8n flops. In the odd-even simultaneous elimination we perform 9n ⌈log₂ n⌉ flops to transform the system and n flops to solve the final diagonal system. Hence, the redundancy of the odd-even simultaneous elimination method is approximately (9/8) ⌈log₂ n⌉, which is the price we pay to get a fully parallel method.

Algebraic description of the odd-even elimination

Consider the three-term recursion, which we rewrite (with b_i = a_{i,i-1}, a_i = a_{i,i}, c_i = a_{i,i+1}, and right-hand side f) as

$$b_{2i}\, x_{2i-1} + a_{2i}\, x_{2i} + c_{2i}\, x_{2i+1} = f_{2i}$$

$$b_{2i+1}\, x_{2i} + a_{2i+1}\, x_{2i+1} + c_{2i+1}\, x_{2i+2} = f_{2i+1}$$

$$b_{2i+2}\, x_{2i+1} + a_{2i+2}\, x_{2i+2} + c_{2i+2}\, x_{2i+3} = f_{2i+2}$$

We multiply the first equation by $-b_{2i+1}/a_{2i}$, the third by $-c_{2i+1}/a_{2i+2}$, and add the resulting equations to the second equation. The so-resulting equation is

$$b^{(1)}_{2i+1}\, x_{2i-1} + a^{(1)}_{2i+1}\, x_{2i+1} + c^{(1)}_{2i+1}\, x_{2i+3} = f^{(1)}_{2i+1}, \quad i = 0, 1, \dots,$$

where

$$b^{(1)}_{2i+1} = -\frac{b_{2i+1}}{a_{2i}}\, b_{2i}, \qquad a^{(1)}_{2i+1} = a_{2i+1} - \frac{b_{2i+1}}{a_{2i}}\, c_{2i} - \frac{c_{2i+1}}{a_{2i+2}}\, b_{2i+2},$$

$$c^{(1)}_{2i+1} = -\frac{c_{2i+1}}{a_{2i+2}}\, c_{2i+2}, \qquad f^{(1)}_{2i+1} = f_{2i+1} - \frac{b_{2i+1}}{a_{2i}}\, f_{2i} - \frac{c_{2i+1}}{a_{2i+2}}\, f_{2i+2}.$$

Next, the odd-even reduction is repeated for all odd numbered equations. The resulting system can be reduced in a similar way and eventually we are left with just one equation.


Similarly, for the even points we get

$$b^{(1)}_{2i}\, x_{2i-2} + a^{(1)}_{2i}\, x_{2i} + c^{(1)}_{2i}\, x_{2i+2} = f^{(1)}_{2i}, \quad i = 1, 2, \dots,$$

where b^{(1)}_{2i}, a^{(1)}_{2i}, c^{(1)}_{2i} and f^{(1)}_{2i} are defined accordingly.

It is interesting to note that for a sufficiently diagonally dominant matrix the reduction can be terminated, or truncated, after fewer than O(log₂ n) steps, since the reduced system can then be considered numerically (i.e., up to machine precision) as a diagonal system.
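A sketch of the full scalar odd-even reduction, applying reduction formulas of the above type recursively (NumPy; the test data are hypothetical, and n = 2^k - 1 is assumed for simplicity):

```python
import numpy as np

def cyclic_reduction(b, a, c, f):
    """Odd-even (cyclic) reduction for a tridiagonal system A x = f.
    a[i]: diagonal; b[i]: coefficient of x_{i-1} in row i (b[0] = 0);
    c[i]: coefficient of x_{i+1} in row i (c[n-1] = 0). Needs n = 2^k - 1."""
    n = len(a)
    if n == 1:
        return np.array([f[0] / a[0]])
    o = np.arange(0, n, 2)               # eliminated (odd-numbered, 1-based)
    e = np.arange(1, n, 2)               # kept (even-numbered, 1-based)
    alpha = -b[e] / a[e - 1]             # multiplier for the equation above
    gamma = -c[e] / a[e + 1]             # multiplier for the equation below
    a2 = a[e] + alpha * c[e - 1] + gamma * b[e + 1]
    b2 = alpha * b[e - 1]                # new coupling two positions back
    c2 = gamma * c[e + 1]                # new coupling two positions ahead
    f2 = f[e] + alpha * f[e - 1] + gamma * f[e + 1]
    x = np.empty(n)
    x[e] = cyclic_reduction(b2, a2, c2, f2)   # recurse on the half-size system
    xp = np.zeros(n + 2)                 # zero-padded copy for the boundaries
    xp[1:-1] = x
    x[o] = (f[o] - b[o] * xp[o] - c[o] * xp[o + 2]) / a[o]  # back substitution
    return x

# Hypothetical test: a random diagonally dominant system of order n = 2^4 - 1.
rng = np.random.default_rng(3)
n = 15
a = 3.0 + rng.random(n)
b = np.zeros(n); b[1:] = -rng.random(n - 1)
c = np.zeros(n); c[:-1] = -rng.random(n - 1)
f = rng.random(n)
x = cyclic_reduction(b, a, c, f)
A = np.diag(a) + np.diag(b[1:], -1) + np.diag(c[:-1], 1)
assert np.allclose(A @ x, f)
```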

. – p.29/31 FMB - NLA

With the same indices, for a block tridiagonal system A = blocktridiag(B_i, A_i, C_i) we get

$$B^{(1)}_{2i+1} = -B_{2i+1} A_{2i}^{-1} B_{2i},$$

$$A^{(1)}_{2i+1} = A_{2i+1} - B_{2i+1} A_{2i}^{-1} C_{2i} - C_{2i+1} A_{2i+2}^{-1} B_{2i+2},$$

$$C^{(1)}_{2i+1} = -C_{2i+1} A_{2i+2}^{-1} C_{2i+2}.$$

Some keywords to discuss

- Load balancing for cyclic reduction methods

- Divide-and-conquer techniques

- Domain decomposition ordering
