
Pivot-Free Block Matrix Inversion

Stephen M. Watt
Ontario Research Centre for Computer Algebra
Department of Computer Science
University of Western Ontario
London Ontario, CANADA N6A 5B7
[email protected]

Abstract

We present a pivot-free deterministic algorithm for the inversion of block matrices. The method is based on the Moore-Penrose inverse and is applicable over certain general classes of rings. This improves on previous methods that required at least one invertible on-diagonal block, and that otherwise required row- or column-based pivoting, disrupting the block structure. Our method is applicable to any invertible matrix and does not require any particular blocks to be invertible. This is achieved at the cost of two additional specialized matrix multiplications and, in some cases, requires the inversion to be performed in an extended ring.

1. Introduction

Algorithms for block matrices have been well studied and have many benefits: They may be used to improve memory hierarchy performance, at the level of cache, disk or network. They may be used on a quad-tree representation that allows reasonable memory use for dense, sparse or structured matrices. They allow the generic formulation of many algorithms so that they may be applied over non-commutative domains. This allows some algorithms to be expressed in a recursive formulation, leading to improved computational complexity. Block matrices have been used in numerical computing, and also in symbolic computing [1].

This paper revisits the question of block algorithms for matrix inversion. We are interested in this problem for several reasons:

First, it is an interesting mathematical problem. While many problems that are naïvely expressed using matrix inverse (such as the solution of linear systems) are better solved using other methods (such as PLU factorization), matrix inverse nevertheless remains a fundamental operation.

Second, we wish to use block matrices as one representation in the mathematical libraries for Aldor [2]. Aldor supports generic programming through parametric polymorphism, where types must satisfy specific type categories. For matrices to satisfy the appropriate type category they must provide a partial inverse operation.

Third, we wish to use block matrices as a test case in our work benchmarking the performance of compilers on generic programs [3].

We consider the inversion of 2×2 block matrices. Larger matrices are expressed by applying this construction recursively. Earlier methods for block inversion either require one of the blocks to be invertible or degenerate into individual row or column operations, destroying the block structure. We ask whether it is possible to formulate deterministic block matrix inversion in such a way that, under certain conditions:

1. only operations on entire blocks are used,
2. no case-based branching is required,
3. block inverses are required only when they are guaranteed to exist, and
4. applied recursively, the method gives inversion the same complexity as matrix multiplication?

As with other block-oriented methods, we do not require [...].

We are able to answer this question positively. We show such a block matrix inversion that is applicable over a general class of rings. In particular, it may be used for block matrices with real, complex or finite field entries.

The rest of the paper is organized as follows: Section 2 provides some necessary background. Section 3 presents our method for block matrix inverse. The algorithm is presented first for formally real rings, then for the complex and general field cases. Finally we present our conclusions.

2. Preliminaries

2.1 Recursive Block Matrices

We consider $n \times n$ matrices with elements from a ring $R$ and denote the matrix ring $R^{n\times n}$. For convenience we require that $R$ have unity and explicitly state any additional properties when they are required. If $n$ is even, we may put the matrices of $R^{n\times n}$ in a one-to-one correspondence with the matrices of $(R^{n/2\times n/2})^{2\times 2}$. For an $n \times n$ matrix $M$ we take

$$ M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} $$

with the $\frac{n}{2} \times \frac{n}{2}$ matrices $A, B, C, D$ having elements $A_{ij} = M_{i,j}$, $B_{ij} = M_{i,j+\frac{n}{2}}$, $C_{ij} = M_{i+\frac{n}{2},j}$, $D_{ij} = M_{i+\frac{n}{2},j+\frac{n}{2}}$, with $1 \le i, j \le n/2$. For general $n$, there are two ways to impose a 2×2 block structure that preserves the multiplicative properties of the matrix ring:

The first method is to embed $R^{n\times n}$ in a ring of larger matrices $R^{(n+\ell)\times(n+\ell)}$ by adding ones along the diagonal and zeros elsewhere. If $n$ is odd and $\ell$ is one, then we may apply the 2×2 block construction once. If $\ell = 2^{\lceil \log_2 n\rceil} - n$ (i.e. if $n+\ell$ is the next power of 2), then the construction may be applied $\lceil \log_2 n\rceil$ times to obtain a fully recursive 2×2 structure. This method has the advantage that, at each level, the matrix elements are from a specific ring. For the recursive block matrix ring, we introduce the notation $R^{(2\times 2)^k}$. This allows certain isomorphisms to be expressed conveniently:

$$ R^{2^k\times 2^k} \cong \left(R^{2^{k-1}\times 2^{k-1}}\right)^{2\times 2} \cong \left(R^{(2\times2)^{k-i}}\right)^{(2\times2)^{i}} \cong R^{(2\times2)^{k}}. $$

The second method is to divide matrices unevenly. The blocks $A, B, C, D$ have elements $A_{ij} = M_{i,j}$, $B_{ij'} = M_{i,j'+\ell}$, $C_{i'j} = M_{i'+\ell,j}$, $D_{i'j'} = M_{i'+\ell,j'+\ell}$, where $\ell$ is between 1 and $n$ and the indices range as $1 \le i, j \le \ell$ and $1 \le i', j' \le n-\ell$. This construction may be applied repeatedly to obtain a fully recursive 2×2 block structure. For the complexity results given below, we choose $\ell = \lfloor n/2 \rfloor$.

Both methods may be used for the block matrix inversion problem considered in this paper. Note that in both methods the blocks $A$ and $D$ are square.

2.2 Naïve Block Matrix Inverse

Perhaps the most æsthetically pleasing formulation of the block matrix inverse is

$$ M^{-1} = \begin{bmatrix} (A - BD^{-1}C)^{-1} & (C - DB^{-1}A)^{-1} \\ (B - AC^{-1}D)^{-1} & (D - CA^{-1}B)^{-1} \end{bmatrix}, $$

which is valid over any ring, and may be applied recursively. This formulation has two defects, however: First, it requires all of $A$, $B$, $C$ and $D$ to be invertible. In fact, $M$ may be invertible even if all of $A$, $B$, $C$ and $D$ are non-invertible. Second, this expression for the matrix inverse requires 8 block inversions and 8 block multiplications. When applied recursively to give an algorithm for $n \times n$ matrices, this leads to $O(n^3)$ coefficient ring operations. Since faster methods are well known and practical, this formulation is not attractive for a generic algorithm.

To avoid these problems, a more common formulation for block matrix inversion is

$$ M^{-1} = \begin{bmatrix} I & -A^{-1}B \\ 0 & I \end{bmatrix} \begin{bmatrix} A^{-1} & 0 \\ 0 & S_A^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix} = \begin{bmatrix} A^{-1} + A^{-1}BS_A^{-1}CA^{-1} & -A^{-1}BS_A^{-1} \\ -S_A^{-1}CA^{-1} & S_A^{-1} \end{bmatrix} \qquad (1) $$

where $S_A = D - CA^{-1}B$ is the Schur complement of $A$ in $M$. Alternatively, the inverse may be expressed as

$$ M^{-1} = \begin{bmatrix} I & 0 \\ -D^{-1}C & I \end{bmatrix} \begin{bmatrix} S_D^{-1} & 0 \\ 0 & D^{-1} \end{bmatrix} \begin{bmatrix} I & -BD^{-1} \\ 0 & I \end{bmatrix} = \begin{bmatrix} S_D^{-1} & -S_D^{-1}BD^{-1} \\ -D^{-1}CS_D^{-1} & D^{-1} + D^{-1}CS_D^{-1}BD^{-1} \end{bmatrix} \qquad (2) $$

where $S_D = A - BD^{-1}C$. This formulation requires only that either $A$ or $D$ and the corresponding Schur complement be invertible. It also has the benefit of computational efficiency, requiring only 2 inversions, 6 multiplications and 3 additive operations on blocks. If $2 \le \omega \le 3$ is the exponent such that $n \times n$ matrix multiplication requires $O(n^\omega)$ element operations, then this formulation of block matrix inverse, applied recursively, leads to an $O(n^\omega)$ algorithm for inversion. For concreteness, Strassen's matrix multiplication [4] requires $n^{\log_2 7}$ multiplications and $6(n^{\log_2 7} - n^2)$ additive operations ($+$, $-$), and the above formulas give matrix inversion in $(6n^{\log_2 7} - 1)/5$ multiplicative operations ($\times$, inverse) and $(72n^{\log_2 7} - 165n^2 + 93n)/10$ additive operations.

2.3 Non-Invertible Blocks

While the complexity of the block matrix inversion given by equations (1) and (2) is acceptable, there remains the problem that it requires either $A$ or $D$ to be invertible. However, as stated earlier, all blocks may be non-invertible, even for invertible $M$. For example, the following matrix is invertible:

$$ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

The usual approach to deal with this problem is to abandon the block formulation of the inverse, and use a row-oriented divide and conquer PLU matrix factorization to compute the inverse using pivots to avoid non-invertible elements. See, for example, [5]. This has two problems: The first problem is that if we are representing matrices in memory using a recursive block structure (say, as quad-trees), then a row-oriented PLU decomposition either requires converting the matrix in and out of the desired representation or requires using an indexing scheme that carries a high overhead. The second problem is that if the entries are not themselves block sub-matrices, but are members of some non-commutative base ring, then there is no internal row structure to exploit.

Some authors, e.g. [1], argue that they do not encounter non-invertible blocks in practice. This may indeed be the case when working with specialized families of matrices or with matrices with elements chosen randomly from a large ring. For a general solution, however, non-invertibility of blocks remains a problem. For example, when the entries are chosen from a restricted set a significant proportion of small matrices can be non-invertible. Since recursive block algorithms operate upon many small matrices in their execution, non-invertible blocks may be encountered with a significant probability. For concreteness, we show below the proportion of matrices over $\mathbb{Q}$ that are non-invertible for small $n$ when elements are chosen from the sets {0, 1} and {−1, 0, 1}. These proportions are calculated using the sequences A046747 and A056989 [6]. Note that for $n < 8$, most of the {0, 1} matrices are non-invertible.

{0,1}^{n×n} ⊂ Q^{n×n}:
  n           1   2   3   4   5   6   7   8
  % non-inv.  50  62  66  66  63  58  52  45

{−1,0,1}^{n×n} ⊂ Q^{n×n}:
  n           1   2   3   4
  % non-inv.  33  41  40  45

2.4 The Moore-Penrose Inverse

The Moore-Penrose inverse is a generalized form of matrix inversion applicable to non-square matrices, discovered independently by E. Moore [7] and R. Penrose [8]. If $Z$ is a complex $n \times m$ matrix, and $Z^*$ its conjugate transpose, then the Moore-Penrose inverse, denoted $Z^+$, satisfies:

$$ ZZ^+Z = Z \qquad (ZZ^+)^* = ZZ^+ \qquad Z^+ZZ^+ = Z^+ \qquad (Z^+Z)^* = Z^+Z. $$

If $Z$ is of full column rank, then $m \le n$, $Z^*Z$ is invertible and the Moore-Penrose inverse is

$$ Z^+ = (Z^*Z)^{-1}Z^*. $$

In this case, $Z^+$ is a left inverse of $Z$. If $Z$ is square and invertible, then the Moore-Penrose inverse is $Z^{-1}$.

3. Pivot-Free Inversion

3.1 Preconditioning for Block Inversion

To apply the recursive block methods given by (1) or (2), we must find a way to guarantee that the required blocks will be invertible. We do this by preconditioning the matrix $M$. For invertible $M$, we may choose any invertible $G$ and then

$$ M^{-1} = (GM)^{-1}G. \qquad (3) $$

A choice of $G$ that guarantees invertibility of the required blocks of $GM$ using (1) leads to a block algorithm for $M^{-1}$. There is the additional cost of two matrix multiplications, which can clearly be performed using block operations.

We observe that if $R$ is embedded in a larger ring, $S$, and $G$ is an invertible matrix in $S^{n\times n}$, then (3) may still be used to compute $M^{-1}$. The computation will use the ring operations of $S$, but the final result will be a matrix in $R^{n\times n}$.

For convenience, we shall assume from this point forward that $R$ is a division ring. This guarantees that the inverses of intermediate expressions in $R$ exist when required. If this condition is relaxed, then it may be necessary to work in some localization of $R$ to deal with non-invertible elements that occur in intermediate expressions.

The matrix $G$ may be chosen to randomize $M$ to yield a probabilistic block algorithm for $M^{-1}$. This is often sufficient for many applications. For a deterministic algorithm for $M^{-1}$, we must select a specific matrix $G$ based on $M$. We show below that such a choice of $G$ exists and can be constructed easily. In sections 3.3, 3.4 and 3.5 we show how to choose $G$ depending on the algebraic properties of $R$. In each case we use Lemma 1 of section 3.2.
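To make equations (1) and (3) concrete, here is a small numerical sketch. It is not from the paper: the test matrix, the use of numpy over the reals, and the helper name `block_inverse` are our own assumptions for illustration. It applies one level of equation (1) and then checks the preconditioning identity $M^{-1} = (GM)^{-1}G$, here with $G = M^T$, the choice developed later for the real case.

```python
import numpy as np

def block_inverse(M):
    """One level of equation (1): invert M via the Schur complement of A.
    Requires the leading block A (and hence S_A) to be invertible."""
    n = M.shape[0] // 2
    A, B = M[:n, :n], M[:n, n:]
    C, D = M[n:, :n], M[n:, n:]
    Ai = np.linalg.inv(A)            # A^{-1}; the full algorithm would recurse here
    Si = np.linalg.inv(D - C @ Ai @ B)   # S_A^{-1}, S_A = D - C A^{-1} B
    top = np.hstack([Ai + Ai @ B @ Si @ C @ Ai, -Ai @ B @ Si])
    bot = np.hstack([-Si @ C @ Ai, Si])
    return np.vstack([top, bot])

# A sample matrix whose leading block A happens to be invertible.
M = np.array([[4., 1., 2., 0.],
              [1., 3., 0., 1.],
              [2., 0., 5., 1.],
              [0., 1., 1., 4.]])
assert np.allclose(block_inverse(M), np.linalg.inv(M))

# Preconditioning identity (3): for any invertible G, M^{-1} = (GM)^{-1} G.
G = M.T
assert np.allclose(block_inverse(G @ M) @ G, np.linalg.inv(M))
```

The two extra multiplications of the preconditioned form, `G @ M` and the final product with `G`, are visible in the last line.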

3.2 Two Lemmata

Lemma 1 (Block Inverse). If $R$ is a division ring, $M \in R^{n\times n}$ is invertible and there exists $G \in E^{n\times n}$, $E \supset R$, such that all principal minors of $GM$ are invertible, then $(GM)^{-1}$, and hence $M^{-1} = (GM)^{-1}G \in R^{n\times n}$, may be computed using only block operations. Block operations are ring operations in $E^{(2\times2)^i}$.

Proof. When the inversion scheme given by equation (1) is applied recursively to $GM$, the only blocks that are ever inverted at any level of recursion are square sub-matrices lying on the diagonal and their Schur complements with respect to their immediately containing blocks. The square sub-matrices on the diagonal are a special case of the principal minors and are therefore all invertible. Now suppose that the inverse of a Schur complement is required when inverting the containing block

$$ H = \begin{bmatrix} W & X \\ Y & Z \end{bmatrix}. $$

In this case we have already that $H$ and $W$ are invertible and we require the inverse $S_W^{-1} = (Z - YW^{-1}X)^{-1}$. The invertibility of $H$ implies this exists, as $S_W^{-1}$ is an explicit sub-block of $H^{-1}$. □

We note that the two matrix multiplications required in the inverse computation are for special forms of matrices. One multiplication is of a matrix with its own transpose, and the other is the multiplication of the computed inverse $(GM)^{-1}$ with a general matrix. We show that, depending on the method used for matrix multiplication, the first of these can be performed faster than a general multiplication.

Lemma 2. For $M \in R^{n\times n}$, computing the product $M^TM$ requires at most

$$ \frac{n^2\left(2n^{\omega-2} + 2^\omega - 6\right)}{2^\omega - 4} $$

multiplications in $R$, and requires at least $n^\omega/2^\omega$ multiplications in $R$.

Proof. The upper bound follows from recursive application of the formula

$$ M^TM = \begin{bmatrix} A^T & C^T \\ B^T & D^T \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} A^TA + C^TC & A^TB + C^TD \\ (A^TB + C^TD)^T & B^TB + D^TD \end{bmatrix}. $$

The lower bound follows by choosing

$$ M = \begin{bmatrix} X^T & Y \\ 0 & 0 \end{bmatrix} $$

and noting that $M^TM$ computes $XY$. □

For the best asymptotic algorithms the upper bound from this lemma is worse than the cost of general matrix multiplication and so is not useful. When Strassen matrix multiplication is used, however, general multiplication costs $n^{\log_2 7}$ and this method requires only $\frac{1}{3}n^2 + \frac{2}{3}n^{\log_2 7}$.

3.3 The Real Case

We begin by treating matrices over a formally real ring that need not be commutative. A ring $R$ is formally real if, for any subset $\{a_i\}_{i=1,\dots,n} \subset R$,

$$ \sum_{i=1}^n a_i^2 = 0 \;\Rightarrow\; a_1 = a_2 = \cdots = a_n = 0. $$

In particular, the rational numbers and real numbers are formally real, as are, e.g., $\mathbb{Q}[\sqrt{2}]$, $\mathbb{Z}(x_1, \dots, x_n)$ and $R[x, \partial]$ for formally real $R$. The complex numbers and rings of finite characteristic are not formally real. We may now state:

Theorem 1. If $R$ is a formally real division ring and $M \in R^{n\times n}$ is invertible, then it is possible to compute $M^{-1}$ as $(M^TM)^{-1}M^T$ using only block operations. By block operations, we mean ring operations in $R^{(2\times2)^i}$.

Proof. If $M$ is invertible, then so are $M^T$ and $M^TM$. We show that all the principal minors of $M^TM$ are invertible, and then Lemma 1 gives the required result.

Let $N$ be a principal minor of $M^TM$ and $S \subset \{1, \dots, n\}$ the index set of its rows and columns in $M^TM$. Then $N = M_S^TM_S$, where $M_S$ is the matrix of the columns of $M$ indexed by $S$. If $N$ were not invertible, then there would be a non-zero vector $x$ such that $Nx = 0$. This would imply $x^TNx = (M_Sx)^T(M_Sx) = 0$. As this is a sum of squares in the formally real ring $R$, it would imply $M_Sx = 0$. But $M$ is invertible and so $M_S$ is of full column rank, which implies $M_Sx \ne 0$, a contradiction. □

The expression $(M^TM)^{-1}M^T$ is the real form of the Moore-Penrose inverse, since in this case $M^* = M^T$.

3.4 The Complex Case

We now consider the case of complex numbers in a general algebraic setting. Let $C$ be a division ring with a formally real sub-ring $R$ and involution "∗", such that for all $c \in C$, $c^* \times c$ is a sum of squares in $R$. One such case is the complexification of a formally real ring $R$, that is $C = R[i]/\langle i^2+1\rangle$ with complex conjugation as the involution: $(a+bi)^* = a - ib$ for $a, b \in R$. A second case is to take $C$ as the quaternions over $R$ with the involution $(a+bi+cj+dk)^* = a - bi - cj - dk$ for $a, b, c, d \in R$.

We may lift the involution on $C$ to one on $C^{n\times n}$ by defining $M^*$ as the transpose of the element-wise involution of $M$. Now the arguments presented in section 3.3 all follow when $M^T$ is replaced by $M^*$. We may thus compute $M^{-1}$ with block operations using the Moore-Penrose inverse:

Theorem 2. Let $C$ be a division ring with a formally real sub-ring $R$ and involution "∗", such that for all $c \in C$, $c^* \times c$ is a sum of squares in $R$. If $M \in C^{n\times n}$ is invertible, then it is possible to compute $M^{-1}$ as $(M^*M)^{-1}M^*$ using only block operations. Here, block operations are ring operations in $C^{(2\times2)^i}$.

Proof. Replace $x^T$ and $M^T$ with $x^*$ and $M^*$ in the proof of Theorem 1 to obtain the same contradiction. □

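The construction of Theorem 1 can be sketched numerically as follows. This is our own illustration, not code from the paper: the example matrix is an assumption, numpy over the reals stands in for a formally real division ring, and for simplicity the recursion assumes $n$ is a power of two (per the padding construction of section 2.1). Equation (1) is applied recursively to $M^TM$, all of whose principal minors are invertible, even when blocks of $M$ itself are singular.

```python
import numpy as np

def rec_block_inv(M):
    """Recursive block inversion by equation (1). Every inverse taken is of
    a leading principal minor or of a Schur complement, so this succeeds
    whenever all leading principal minors of M are invertible."""
    n = M.shape[0]
    if n == 1:
        return np.array([[1.0 / M[0, 0]]])
    h = n // 2
    A, B, C, D = M[:h, :h], M[:h, h:], M[h:, :h], M[h:, h:]
    Ai = rec_block_inv(A)
    Si = rec_block_inv(D - C @ Ai @ B)     # Schur complement S_A
    return np.block([[Ai + Ai @ B @ Si @ C @ Ai, -Ai @ B @ Si],
                     [-Si @ C @ Ai, Si]])

def pivot_free_inv(M):
    """Theorem 1 (real case): M^{-1} = (M^T M)^{-1} M^T, with the inverse of
    M^T M computed purely blockwise via equation (1)."""
    return rec_block_inv(M.T @ M) @ M.T

# An invertible matrix whose leading 2x2 block is zero, so equation (1)
# cannot be applied to M directly -- yet the preconditioned form succeeds.
M = np.array([[0., 0., 1., 2.],
              [0., 0., 2., 1.],
              [1., 2., 0., 0.],
              [2., 1., 0., 0.]])
assert np.allclose(pivot_free_inv(M), np.linalg.inv(M))
```

Here $M^TM$ is symmetric with all leading principal minors non-zero, so no case analysis or pivoting is ever needed inside the recursion.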
3.5 General Fields

For rings that do not have a formally real sub-ring, the approach of the previous sections may not be applied directly. We can, however, work in a convenient extended ring.

We use an idea of [9] and [10], who work in a rational function field. Mulmuley [9] has approached the problem of fast computation of matrix rank over an arbitrary field, $K$, by working in the field of univariate rational functions $K(t)$. Diaz-Toca et al [10] have extended this approach to generalize Cramer's rule. They introduce a generalized form of the Moore-Penrose inverse, which in our setting gives

$$ M^{-1} = (M^\circ M)^{-1}M^\circ \qquad M^\circ = Q_n^{-1}M^TQ_n $$

where $Q_n = \mathrm{diag}(1, t, t^2, \dots, t^{n-1})$. We find this formulation of matrix inverse to be suitable for pivot-free block matrix inversion in $K^{n\times n}$.

Theorem 3. Let $K$ be a field. If $M \in K^{n\times n}$ is invertible, then it is possible to compute $M^{-1}$ as $(M^\circ M)^{-1}M^\circ$ using only block operations. Here, block operations are ring operations in $K(t)^{(2\times2)^i}$.

Proof. We show that all principal minors of $M^\circ M$ are invertible. Let $N$ be a principal $s \times s$ minor of $M^\circ M$ and $S \subset \{1, \dots, n\}$ the index set of its rows and columns in $M^\circ M$. Let $P_S \in \{0,1\}^{n\times s} \subset K(t)^{n\times s}$ be the projection matrix that, when multiplying an $n \times n$ matrix on the right, retains the columns indexed by $S$. Then we have

$$ N = P_S^T(Q_n^{-1}M^TQ_nM)P_S. \qquad (4) $$

If $N$ is non-invertible, then there is a non-zero vector $x \in K(t)^s$ such that $Nx = 0$. In that case we also have

$$ (x^TP_S^TQ_nP_S)\,Nx = 0. \qquad (5) $$

Combining (4) and (5) gives

$$ (x^TP_S^TQ_nP_S)\,Nx = (x^TP_S^TQ_nP_S)\,P_S^T(Q_n^{-1}M^TQ_nMP_S)\,x = (MP_Sx)^TQ_n(MP_Sx) = \sum_{i=1}^n (MP_Sx)_i^2\,t^{i-1} = 0. \qquad (6) $$

But $M$ is invertible by hypothesis and the vector $P_Sx$ is non-zero. Therefore $MP_Sx \ne 0$ and the polynomial (6) cannot vanish, which is a contradiction. □

4. Conclusions

We have shown how the Moore-Penrose inverse can be used to give a formulation of recursive block matrix inversion that rules out non-invertible sub-blocks. This is useful for a wide range of element rings.

This formulation requires two additional matrix multiplications at the top level only. A software implementation may choose to use the usual, somewhat less expensive, formulation of block matrix inversion given by equations (1) and (2) and resort to this method only if it encounters a step where both $A$ and $D$ are non-invertible.

Finally we note that, because our formulation guarantees the required blocks will be invertible, it does not require special cases and may therefore be more suitable for a hardware implementation.

Acknowledgements

The author would like to thank Wayne Eberly, Laureano Gonzalez Vega, David Jeffrey, Victoria Powers, B. David Saunders and Arne Storjohann for helpful discussions, many of which occurred during the 2006 Dagstuhl workshop Challenges in Symbolic Computation.

References

[1] S.K. Abdali and D.S. Wise: Experiments with quadtree representation of matrices. In P. Gianni (ed.), Proc. ISSAC 88, LNCS 358, Springer Verlag (1989), 96-108.
[2] S.M. Watt: Aldor. In Handbook of Computer Algebra, J. Grabmeier, E. Kaltofen and V. Weispfenning (editors), Springer Verlag (2003), 265-270.
[3] L. Dragan and S.M. Watt: Performance Analysis of Generics for Scientific Computing. Proc. 7th Int'l Symposium on Symbolic and Numeric Algorithms in Scientific Computing (SYNASC 2005), September 25-29 2005, IEEE Press, 93-100.
[4] V. Strassen: Gaussian Elimination is Not Optimal. Numerische Mathematik 13, 354-356 (1969).
[5] A. Aho, J. Hopcroft and J. Ullman: The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
[6] N.J.A. Sloane: The On-Line Encyclopedia of Integer Sequences, http://www.research.att.com/˜njas/sequences/ (Valid September 2006).
[7] E.H. Moore: On the Reciprocal of the General Algebraic Matrix. Bulletin of the AMS 26, 394-395 (1920).
[8] R. Penrose: A Generalized Inverse for Matrices. Proc. of the Cambridge Philosophical Society 51, 406-413 (1955).
[9] K. Mulmuley: A Fast Parallel Algorithm to Compute the Rank of a Matrix over an Arbitrary Field. Combinatorica 7, 101-104 (1987).
[10] G.M. Diaz-Toca, L. Gonzalez-Vega and H. Lombardi: Generalizing Cramer's Rule: Solving Uniformly Linear Systems of Equations. SIAM J. Matrix Anal. Appl. 27(3), 621-637 (2005).