Faster Inversion and Other Black Box Matrix Computations Using Efficient Block Projections
Wayne Eberly (1), Mark Giesbrecht (2), Pascal Giorgi (2,4), Arne Storjohann (2), Gilles Villard (3)

(1) Department of Computer Science, University of Calgary, Calgary, Alberta, Canada. [email protected]
(2) David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada. [email protected], [email protected], [email protected]
(3) CNRS, LIP, École Normale Supérieure de Lyon, Lyon, France. [email protected]
(4) IUT, Université de Perpignan, Perpignan, France. [email protected]
ABSTRACT

Efficient block projections of non-singular matrices have recently been used by the authors in [10] to obtain an efficient algorithm to find rational solutions for sparse systems of linear equations. In particular, a bound of O˜(n^2.5) machine operations is presented for this computation, assuming that the input matrix can be multiplied by a vector with constant-sized entries using O˜(n) machine operations. Somewhat more general bounds for black-box matrix computations are also derived. Unfortunately, the correctness of this algorithm depends on the existence of efficient block projections of non-singular matrices, and this was only conjectured.

In this paper we establish the correctness of the algorithm from [10] by proving the existence of efficient block projections for arbitrary non-singular matrices over sufficiently large fields. We further demonstrate the usefulness of these projections by incorporating them into existing black-box matrix algorithms to derive improved bounds for the cost of several matrix problems. We consider, in particular, matrices that can be multiplied by a vector using O˜(n) field operations: we show how to compute the inverse of any such non-singular matrix over any field using an expected number of O˜(n^2.27) operations in that field. A basis for the null space of such a matrix, and a certification of its rank, are obtained at the same cost. An application of this technique to Kaltofen and Villard's baby-steps/giant-steps algorithms for the determinant and Smith form of an integer matrix is also sketched, yielding algorithms requiring O˜(n^2.66) machine operations. More general bounds involving the number of black-box matrix operations to be used are also obtained.

The derived algorithms are all probabilistic of the Las Vegas type. They are assumed to be able to generate random elements (bits or field elements) at unit cost, and always output the correct answer in the expected time given.

Categories and Subject Descriptors: I.1.2 [Symbolic and Algebraic Manipulation]: Algorithms - Algebraic algorithms, analysis of algorithms

General Terms: Algorithms

Keywords: Sparse integer matrix, structured integer matrix, linear system solving, black box linear algebra

This material is based on work supported in part by the French National Research Agency (ANR Gecko, Villard), and by the Natural Sciences and Engineering Research Council (NSERC) of Canada (Eberly, Giesbrecht, Storjohann).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ISSAC'07, July 29 - August 1, 2007, Waterloo, Ontario, Canada.
Copyright 2007 ACM 978-1-59593-743-8/07/0007 ...$5.00.

1. INTRODUCTION

In our paper [10] we presented an algorithm which purportedly solved a sparse system of rational equations considerably more efficiently than standard linear equation solving. Unfortunately, its effectiveness in all cases was conjectural, even though its complexity bound and practical performance were very appealing. This effectiveness relied on a conjecture regarding the existence of so-called efficient block projections. Given a matrix A ∈ F^{n×n} over any field F, these projections should be block vectors u ∈ F^{n×s} (where s is a blocking factor dividing n, so n = ms) such that products with u (for instance u^T v for any v ∈ F^{n×s}) can be computed quickly, and such that the sequence of vectors u, Au, ..., A^{m-1}u has rank n. In this paper, we
prove the existence of a class of such efficient block projections for non-singular n×n matrices over sufficiently large fields; we require that the size of the field F exceed n(n+1).

This can be used to establish a variety of results concerning matrices A ∈ Z^{n×n} with efficient matrix-vector products, in particular, such that a matrix-vector product Ax mod p can be computed for a given integer vector x and a small (word-sized) prime p using O˜(n) bit operations. Such matrices include all "sparse" matrices having O(n) non-zero entries, assuming these are appropriately represented. They also include a variety of "structured" matrices, having constant "displacement rank" (for one definition of displacement rank or another), studied in the recent literature.

In particular, our existence result implies that if A ∈ Z^{n×n} is non-singular and has an efficient matrix-vector product, then the Las Vegas algorithm for system solving given in [10] can be used to solve a system Ax = b for a given integer vector b using an expected number of matrix-vector products modulo a word-sized prime that is O˜(n^1.5 log(||A|| + ||b||)), together with an expected number of additional bit operations that is O˜(n^2.5 log(||A|| + ||b||)). If A has an efficient matrix-vector product, then the total expected number of bit operations used by this algorithm is less than that used by any previously known algorithm, at least when "standard" (i.e., cubic) matrix arithmetic is used.

Consider, for example, the case when the cost of a matrix-vector product by A modulo a word-sized prime is O˜(n) operations and the entries in A are of constant size. The cost of our algorithm is then O˜(n^2.5) bit operations. This improves upon the p-adic lifting method of Dixon [6], which requires O˜(n^3) bit operations for sparse or dense matrices. This theoretical efficiency was reflected in practice in [10], at least for large matrices.

We present several other rather surprising applications of this technique. Each incorporates the technique into an existing algorithm in order to reduce the asymptotic complexity of the matrix problem to be solved. In particular, given a matrix A ∈ F^{n×n} over an arbitrary field F, we are able to compute the complete inverse of A with O˜(n^{3-1/(ω-1)}) operations in F plus O˜(n^{2-1/(ω-1)}) matrix-vector products by A. Here ω is such that we can multiply two n×n matrices with O(n^ω) operations in F. Standard matrix multiplication gives ω = 3, while the best known matrix multiplication, of Coppersmith and Winograd [5], has ω = 2.376. If again we can compute v ↦ Av with O˜(n) operations in F, this yields an algorithm to compute the inverse with O˜(n^{3-1/(ω-1)}) operations in F. This is always in O˜(n^ω), and in particular equals O˜(n^2.27) operations in F for the best known ω of [5]. Other relatively straightforward applications of these techniques yield algorithms for the full nullspace and (certified) rank with this same cost. Finally, we sketch how these methods can be employed in the algorithms of Kaltofen and Villard [18] and Giesbrecht [13] to compute the determinant and Smith form of sparse matrices more efficiently.

There has certainly been much important work on finding exact solutions to sparse rational systems prior to [10]. Dixon's p-adic lifting algorithm [6] performs extremely well in practice for dense and sparse linear systems, and is implemented efficiently in LinBox [7] and Magma (see [10] for a comparison). Kaltofen and Saunders [17] were the first to propose using Krylov-type algorithms for these problems. Krylov-type methods are used to find Smith forms of sparse matrices and to solve Diophantine systems in parallel in [12, 13], and this is further developed in [8, 18]. See the references in these papers for a more complete history. For sparse systems over a field, the seminal work is that of Wiedemann [22], who shows how to solve sparse n×n systems over a field with O(n) matrix-vector products and O(n^2) other operations. This research is further developed in [4, 16, 17] and many other works. The bit complexity of similar operations for various families of structured matrices is examined by Emiris and Pan [11].

2. EFFICIENT BLOCK PROJECTIONS

For now we consider an arbitrary invertible matrix A ∈ F^{n×n} over a field F, and an integer s, the blocking factor, that divides n exactly. Let m = n/s. For a so-called block projection u ∈ F^{n×s} and 1 ≤ k ≤ m, we denote by K_k(A, u) the block Krylov matrix [u, Au, ..., A^{k-1}u] ∈ F^{n×ks}. We wish to show that K_m(A, u) ∈ F^{n×n} is non-singular for a particularly simple and sparse u, assuming some properties of A.

Our factorization uses the special projection (which we will refer to as an efficient block projection)

        [ I_s ]
    u = [ ... ] ∈ F^{n×s},                                   (2.1)
        [ I_s ]

which is comprised of m copies of I_s and thus has exactly n non-zero entries. We suggested a similar projection in [10], without proof of its reliability (i.e., that the corresponding block Krylov matrix is non-singular). We establish here that it does yield a block Krylov matrix of full rank, and hence can be used for an efficient inverse of a sparse A.

Let D = diag(δ_1, ..., δ_1, δ_2, ..., δ_2, ..., δ_m, ..., δ_m) be an n×n diagonal matrix whose entries consist of m distinct indeterminates δ_i, each occurring s times.

Theorem 2.1. If the leading ks×ks minor of A is non-zero for 1 ≤ k ≤ m, then K_m(DAD, u) ∈ F[δ_1, ..., δ_m]^{n×n} is non-singular.

Proof. Let B = DAD. For 1 ≤ k ≤ m, define B_k as the specialization of B obtained by setting δ_{k+1}, δ_{k+2}, ..., δ_m to zero. Thus B_k is the matrix constructed by setting to zero the last n-ks rows and columns of B. Similarly, for 1 ≤ k ≤ m we define u_k ∈ F^{n×s} to be the matrix constructed from u by setting to zero the last n-ks rows. In particular we have B_m = B and u_m = u. This specialization will allow us to argue incrementally about how the rank increases with k.

We proceed by induction on k and show that

    rank K_k(B_k, u_k) = ks,                                 (2.2)

for 1 ≤ k ≤ m. For the base case k = 1 we have K_1(B_1, u_1) = u_1, and thus rank K_1(B_1, u_1) = rank u_1 = s.

Now assume that (2.2) holds for some k with 1 ≤ k < m. By the definition of B_k and u_k, only the first ks rows of B_k and u_k are involved in the left-hand side of (2.2). Similarly, only the first ks columns of B_k are involved. Since, by assumption, the leading ks×ks minor of B_k is non-zero, we have rank B_k K_k(B_k, u_k) = ks, which is equivalent to rank K_k(B_k, B_k u_k) = ks. By the fact that the first ks rows of u_{k+1} - u_k are zero, we have B_k(u_{k+1} - u_k) = 0, or equivalently B_k u_{k+1} = B_k u_k, and hence

    rank K_k(B_k, B_k u_{k+1}) = ks.                         (2.3)
The matrix in (2.3) can be written as

    K_k(B_k, B_k u_{k+1}) = [ M_k ]
                            [  0  ] ∈ F^{n×ks},

where M_k ∈ F^{ks×ks} is non-singular. Introducing the block u_{k+1}, we obtain the matrix

    [u_{k+1}, K_k(B_k, B_k u_{k+1})] = [  *    M_k ]
                                       [ I_s    0  ]
                                       [  0     0  ],        (2.4)

whose rank is (k+1)s. Noticing that

    [u_{k+1}, K_k(B_k, B_k u_{k+1})] = K_{k+1}(B_k, u_{k+1}),

we are led to

    rank K_{k+1}(B_k, u_{k+1}) = (k+1)s.

Finally, using the fact that B_k is the specialization of B_{k+1} obtained by setting δ_{k+1} to zero, we obtain

    rank K_{k+1}(B_{k+1}, u_{k+1}) = (k+1)s,

which is (2.2) for k+1, and thus establishes the theorem by induction.

If the leading ks×ks minor of A is non-zero, then the leading ks×ks minor of A^T is non-zero as well, for any integer k. This gives us the following corollary.

Corollary 2.2. If the leading ks×ks minor of A is non-zero for 1 ≤ k ≤ m, and B = DAD, then K_m(B^T, u) is non-singular.

Suppose now that A ∈ F^{n×n} is an arbitrary non-singular matrix and that the size of F exceeds n(n+1). It follows from Theorem 2 of Kaltofen and Saunders [17] that there exist a lower triangular Toeplitz matrix L ∈ F^{n×n} and an upper triangular Toeplitz matrix U ∈ F^{n×n} such that each of the leading minors of A' = UAL is non-zero. Let B' = D A' D; the product of the determinants of the matrices K_m(B', u) and K_m(B'^T, u) (mentioned in the above theorem and corollary) is a polynomial of total degree less than 2n(m-1) < n(n+1) (if m ≠ 1). In this case it follows that there is also a non-singular diagonal matrix D' ∈ F^{n×n} such that K_m(B, u) and K_m(B^T, u) are non-singular for B = D' A' D' = D' U A L D'.

Now let R = L D'^2 U ∈ F^{n×n}, and let u' ∈ F^{s×n} and v' ∈ F^{n×s} be such that

    u'^T = (L^T)^{-1} D'^{-1} u  and  v' = L D' u.

Then

    K_m(RA, v') = L D' K_m(B, u)

and

    L^T D' K_m((RA)^T, u'^T) = K_m(B^T, u),

so that K_m(RA, v') and K_m((RA)^T, u'^T) are each non-singular as well. Because D' is diagonal and U and L are triangular Toeplitz matrices, it is now easily established that (R, u', v') is an efficient block projection for the given matrix A, where such projections are as defined in [10]. This proves Conjecture 2.1 of [10] for the case that the size of F exceeds n(n+1):

Corollary 2.3. For any non-singular A ∈ F^{n×n} and any s | n (over a field of size greater than n(n+1)) there exists an efficient block projection (R, u, v) ∈ F^{n×n} × F^{s×n} × F^{n×s}.

3. FACTORIZATION OF THE MATRIX INVERSE

The existence of the efficient block projection established in the previous section allows us to define a useful factorization of the inverse of a matrix; this was used to obtain faster heuristics for solving integer systems in [10]. The basis is the following factorization of the matrix inverse.

Let B = DAD, where D is an n×n diagonal matrix whose diagonal entries consist of m distinct indeterminates, each occurring s times contiguously, as previously defined. Define K_u^(r) = K_m(B, u), with u as in (2.1), and K_u^(l) = K_m(B^T, u)^T, where (r) and (l) refer to projection on the right and left, respectively. For any 0 ≤ k ≤ 2m-1 and any two indices l and r such that l + r = k, we have (u^T B^l)(B^r u) = u^T B^k u. Hence the matrix H_u = K_u^(l) B K_u^(r) is block-Hankel with blocks of dimension s×s:

    H_u = [ u^T B u      u^T B^2 u  ...       u^T B^m u      ]
          [ u^T B^2 u    u^T B^3 u  ...                      ]
          [ ...                     ...       u^T B^{2m-2} u ]
          [ u^T B^m u    ...  u^T B^{2m-2} u  u^T B^{2m-1} u ].

Notice that H_u = K_u^(l) B K_u^(r) = K_u^(l) D A D K_u^(r). Theorem 2.1 and Corollary 2.2 imply that if all leading ks×ks minors of A are non-singular, then K_u^(l) and K_u^(r) are each non-singular as well. This establishes the following.

Theorem 3.1. If A ∈ F^{n×n} is such that all leading ks×ks minors are non-singular, D is a diagonal matrix of indeterminates as above, and B = DAD, then B^{-1} and A^{-1} may be factored as

    B^{-1} = K_u^(r) H_u^{-1} K_u^(l),
    A^{-1} = D K_u^(r) H_u^{-1} K_u^(l) D,                   (3.1)

where K_u^(l) and K_u^(r) are as defined above, and H_u is block-Hankel (and invertible) with s×s blocks, as above.

Note that for any specialization of the indeterminates in D to field elements in F such that det H_u ≠ 0, we obtain a formula similar to (3.1) completely over F. A similar factorization in the non-blocked case is used in [9, (4.5)] for fast parallel matrix inversion.

4. BLACK-BOX MATRIX INVERSION OVER A FIELD

Suppose again that A ∈ F^{n×n} is invertible, and that for any v ∈ F^{n×1} the products Av and A^T v can be computed in φ(n) operations in F (where φ(n) ≥ n). Following Kaltofen, we call such matrix-vector and vector-matrix products black-box evaluations of A. In this section we show how to compute A^{-1} with O˜(n^{2-1/(ω-1)}) black-box evaluations and O˜(n^{3-1/(ω-1)}) additional operations in F. Note that when φ(n) = O˜(n) the exponent of n in this cost is smaller than ω, and is O˜(n^2.273) with the currently best-known matrix multiplication.
Again assume that n = ms, where s is the blocking factor and m the number of blocks. Assume for the moment that all principal ks×ks minors of A are non-zero for 1 ≤ k ≤ m. Let δ_1, δ_2, ..., δ_m be the indeterminates that form the diagonal entries of D, and let B = DAD. By Theorem 2.1 and Corollary 2.2, the matrices K_m(B, u) and K_m(B^T, u) are each invertible. If m ≥ 2, then the product Δ of the determinants of these matrices is a non-zero polynomial Δ ∈ F[δ_1, ..., δ_m] of total degree at most 2n(m-1).

Suppose that F has at least 2n(m-1) elements. Then Δ cannot be zero at all points of (F \ {0})^m. Let d_1, d_2, ..., d_m be non-zero elements of F such that Δ(d_1, d_2, ..., d_m) ≠ 0, let D = diag(d_1, ..., d_1, ..., d_m, ..., d_m), and let B = DAD. Then K_u^(r) = K_m(B, u) ∈ F^{n×n} and K_u^(l) = K_m(B^T, u)^T ∈ F^{n×n} are each invertible because Δ(d_1, ..., d_m) ≠ 0, B is invertible because A is and d_1, ..., d_m are all non-zero, and thus H_u = K_u^(l) B K_u^(r) ∈ F^{n×n} is invertible as well. Correspondingly, (3.1) suggests

    B^{-1} = K_u^(r) H_u^{-1} K_u^(l)  and  A^{-1} = D K_u^(r) H_u^{-1} K_u^(l) D

for computing the matrix inverse.

1. Computation of u^T, u^T B, ..., u^T B^{2m-1} and K_u^(l).
   We can compute this sequence, and hence K_u^(l), with 2m-1 block applications of B, using O(n φ(n)) operations in F.

2. Computation of H_u.
   Due to the special form (2.1) of u, one may compute wu for any w ∈ F^{s×n} with O(sn) operations. Hence we can now compute all u^T B^i u for 0 ≤ i ≤ 2m-1 with O(n^2) operations in F.

3. Computation of H_u^{-1}.
   The off-diagonal inverse representation of H_u^{-1}, as in (A.4) in the Appendix, can be found with O˜(s^ω m) operations by Proposition A.1.

4. Computation of H_u^{-1} K_u^(l).
   From Corollary A.2 in the Appendix, we can compute the product H_u^{-1} M for any matrix M ∈ F^{n×n} with O˜(s^ω m^2) operations.

5. Computation of K_u^(r) (H_u^{-1} K_u^(l)).
   We can compute K_u^(r) M = [u, Bu, ..., B^{m-1}u] M for any M ∈ F^{n×n} by splitting M into m blocks M_i of s consecutive rows, 0 ≤ i ≤ m-1:

       K_u^(r) M = sum_{i=0}^{m-1} B^i (u M_i)
                 = u M_0 + B(u M_1 + B(u M_2 + ...
                     + B(u M_{m-2} + B u M_{m-1}) ... )).    (4.1)

   Because of the special form (2.1) of u, each product u M_i ∈ F^{n×n} requires O(n^2) operations, and hence all such products involved in (4.1) can be computed with O(mn^2) operations. Because applying B to an n×n matrix costs n φ(n) operations, K_u^(r) M is computed in O(m n φ(n) + m n^2) operations using the iterative form of (4.1).

In total, the above process requires O(mn) applications of A to a vector (the same as for B) and O(s^ω m^2 + m n^2) additional operations. If φ(n) = O˜(n), the overall number of field operations is minimized with the blocking factor s = n^{1/(ω-1)}.

Theorem 4.1. Let A ∈ F^{n×n}, where n = ms and s = n^{1/(ω-1)}, be such that all leading ks×ks minors are non-singular for 1 ≤ k ≤ m. Let B = DAD, for D = diag(d_1, ..., d_1, ..., d_m, ..., d_m), with d_1, ..., d_m non-zero and such that each of the matrices K_m(DAD, u) and K_m((DAD)^T, u) is invertible. Then the inverse matrix A^{-1} can be computed using O(n^{2-1/(ω-1)}) black-box operations and an additional O˜(n^{3-1/(ω-1)}) operations in F.

The above discussion makes a number of assumptions. First, it assumes that the blocking factor s exactly divides n. This is easily accommodated by simply extending n to the nearest multiple of s, placing A in the top left corner of the augmented matrix, and adding diagonal ones in the bottom right corner.

Theorem 4.1 also assumes that all the leading ks×ks minors of A are non-singular and that the determinants of K_m(DAD, u) and K_m((DAD)^T, u) are each non-zero. Although we know of no way to ensure this deterministically in the times given, standard techniques can be used to obtain these properties probabilistically if F is sufficiently large.

Suppose, in particular, that n ≥ 16 and that #F > 2(m+1)n⌈log_2 n⌉. Fix a set S of at least 2(m+1)n⌈log_2 n⌉ non-zero elements of F. We can ensure that the leading ks×ks minors of A are non-zero by pre- and post-multiplying by butterfly network preconditioners X and Y, respectively, with parameters chosen uniformly and randomly from S. If X and Y are constructed using the generic exchange matrix of [4, Sec. 6.2], then this uses at most n⌈log_2 n⌉/2 random elements from S, and it follows from [4, Theorem 6.3] that all leading ks×ks minors of A' = XAY are non-zero simultaneously with probability at least 3/4. This probability of success can be made arbitrarily close to 1 with a choice from a larger S. We note that A^{-1} = Y A'^{-1} X. Thus, once we have computed A'^{-1}, we can compute A^{-1} with an additional O˜(n^2) operations in F, using the fact that multiplication of an arbitrary n×n matrix by an n×n butterfly preconditioner can be done with O˜(n^2) operations.

Once again let Δ be the product of the determinants of the matrices K_m(D A' D, u) and K_m((D A' D)^T, u), so that Δ is non-zero of total degree at most 2n(m-1). If we choose randomly selected values from S for δ_1, ..., δ_m, then because #S ≥ 2(m+1)n⌈log_2 n⌉ > 4 deg Δ, the probability that Δ is zero at this point is at most 1/4 by the Schwartz-Zippel Lemma [21, 23].

In summary, for randomly selected butterfly preconditioners X, Y as above, and independently and randomly chosen values d_1, d_2, ..., d_m, the probability that A' = XAY has non-singular leading ks×ks minors for 1 ≤ k ≤ m and that Δ(d_1, d_2, ..., d_m) is non-zero is at least 9/16 > 1/2, when all random choices are made uniformly and independently from a finite subset S ⊆ F \ {0} of size at least 2(m+1)n⌈log_2 n⌉.

When #F ≤ 2(m+1)n⌈log_2 n⌉, we can easily construct a field extension E of F of size greater than 2(m+1)n⌈log_2 n⌉ and perform the computation in that extension. Because this extension has degree O(log_{#F} n) over F, it adds only a logarithmic factor to the final cost. While we certainly do not claim that this is of no practical concern, it does not affect the asymptotic complexity.
This algorithm is Las Vegas (or is trivially modified to be so): if either K_m(DAD, u) or K_m((DAD)^T, u) is singular, then so is H_u, and this is detected at step 3. On the other hand, if K_m(DAD, u) and K_m((DAD)^T, u) are both non-singular, then the algorithm's output is correct.

Theorem 4.2. Let A ∈ F^{n×n} be non-singular. Then the inverse matrix A^{-1} can be computed by a Las Vegas algorithm whose expected cost is O˜(n^{2-1/(ω-1)}) black-box operations and O˜(n^{3-1/(ω-1)}) additional operations in F.

Table 4.1 (below) states the expected costs of computing the inverse for various values of ω when φ(n) = O˜(n).

    ω                   Black-box applications   Blocking factor   Inversion cost
    3      (Standard)   O˜(n^1.5)                n^0.5             O˜(n^2.5)
    2.807  (Strassen)   O˜(n^1.446)              n^0.553           O˜(n^2.446)
    2.3755 (Cop/Win)    O˜(n^1.273)              n^0.728           O˜(n^2.273)

    Table 4.1: Exponents of matrix inversion with a matrix-vector cost φ(n) = O˜(n).

Remark 4.3. The structure (2.1) of the projection u plays a central role in computing the product of the block Krylov matrix by an n×n matrix. For a general projection u ∈ F^{n×s}, how to do better than a general matrix multiplication, i.e., how to take advantage of the Krylov structure in computing K_u M, appears to be unknown.

Multiplying a Black-Box Matrix Inverse By Any Matrix

The above method can also be used to compute A^{-1} M for any matrix M ∈ F^{n×n}, at the same cost as in Theorem 4.2. Consider the new step 1.5:

1.5. Computation of K_u^(l) M.
     Split M into m blocks of s columns, so that M = [M_0, ..., M_{m-1}] with M_k ∈ F^{n×s}. Now consider computing K_u^(l) M_k for some k ∈ {0, ..., m-1}. This can be accomplished by computing B^i M_k for 0 ≤ i ≤ m-1 in sequence, and then multiplying each iterate on the left by u^T to obtain u^T B^i M_k.

     The cost of computing K_u^(l) M_k for a single k by the above process is n-s multiplications of A by vectors, together with O(n^2) additional operations in F. The cost of doing this for all k with 0 ≤ k ≤ m-1 is thus m(n-s) < nm multiplications of A by vectors and O(mn^2) additional operations. Since applying A (and hence B) to an n×n matrix is assumed to cost n φ(n) operations, K_u^(l) M is computed in O(m n φ(n) + m n^2) operations in F by the process described here.

Note that this is the same as the cost of Step 5, so the overall cost estimate is not affected. Because Step 4 does not rely on any special form for K_u^(l), we can replace it with a computation of H_u^{-1} (K_u^(l) M) at the same cost. The output is again easily certified with n additional black-box evaluations. We obtain the following corollary.

Corollary 4.4. Let A ∈ F^{n×n} be non-singular and let M ∈ F^{n×n}. We can compute A^{-1} M with a Las Vegas algorithm whose expected cost is O˜(n^{2-1/(ω-1)}) black-box operations and O˜(n^{3-1/(ω-1)}) additional operations in F. The estimates in Table 4.1 apply to this computation as well.

5. APPLICATIONS TO BLACK-BOX MATRICES OVER A FIELD

The algorithms of the previous section have applications to some important computations with black-box matrices over an arbitrary field F. In particular, we consider the problems of computing the nullspace and rank of a black-box matrix. Each of these algorithms is probabilistic of the Las Vegas type; the output is certified to be correct.

Kaltofen and Saunders [17] present algorithms for computing the rank of a matrix and for randomly sampling the nullspace, building upon the work of Wiedemann [22]. In particular, they show that for random upper and lower triangular Toeplitz matrices U, L ∈ F^{n×n} and a random diagonal D, all leading k×k minors of A' = UALD are non-singular for 1 ≤ k ≤ r = rank A, and that if f^{A'} ∈ F[x] is the minimal polynomial of A', then it has degree r+1 if A' is singular (and degree n if A' is non-singular). This holds for any input A ∈ F^{n×n}, with high probability over the random choices of U, L and D. The cost of computing f^{A'} (and hence rank A) is shown to be O(n) applications of the black box for A' and O(n^2) additional operations in F. However, no certificate that the rank is correct is provided within this cost (and we do not know of one, nor provide one here). Kaltofen and Saunders [17] also show how to generate a vector uniformly and randomly from the nullspace of A with this cost (and, of course, this is certifiable with a single evaluation of the black box for A). We also note that the algorithms of Wiedemann and of Kaltofen and Saunders require only a linear amount of extra space, which will not be the case for our algorithms.

We first employ the random preconditioning of [17] and let A' = UALD as above. We will thus assume in what follows that A' has all leading i×i minors non-singular for 1 ≤ i ≤ r. Although an unlucky choice may make this statement false, that case will be identified in our method. Also assume that we have computed the rank r of A, which is correct with high probability; again, this will be certified in what follows.

1. Inverting the leading minor.
   Let A_0 be the leading r×r minor of A', and partition A' as

       A' = [ A_0  A_1 ]
            [ A_2  A_3 ].

   Using the algorithm of the previous section, compute A_0^{-1}. If this fails because the leading r×r minor is singular, then either the randomized conditioning or the rank estimate has failed, and we either report this failure or try again with a different randomized preconditioning. If we can compute A_0^{-1}, then the rank of A is at least the estimated r.

2. Applying the inverted leading minor.
   Compute A_0^{-1} A_1 ∈ F^{r×(n-r)} using the algorithm of the previous section (this could in fact be merged into the first step).
3. Confirming the nullspace.
   Note that

       [ A_0  A_1 ] [ A_0^{-1} A_1 ]   [           0            ]
       [ A_2  A_3 ] [     -I       ] = [ A_2 A_0^{-1} A_1 - A_3 ] = 0,

   and the Schur complement A_2 A_0^{-1} A_1 - A_3 must be zero if the rank r is correct. This can be checked with n-r evaluations of the black box for A'. We note that, because of its structure, the matrix

       N = [ A_0^{-1} A_1 ]
           [     -I       ]

   has rank n-r.

4. Output rank and nullspace basis.
   If the Schur complement is zero, then output the rank r and N, whose columns give a basis for the nullspace of A'. Otherwise, output "fail" (and possibly retry with a different randomized preconditioning).

Theorem 5.1. Let A ∈ F^{n×n} have rank r. Then a basis for the nullspace of A, and the rank r of A, can be computed with an expected number of O˜(n^{2-1/(ω-1)}) applications of A to a vector, plus an additional expected number of O˜(n^{3-1/(ω-1)}) operations in F. The algorithm is probabilistic of the Las Vegas type.

6. APPLICATIONS TO SPARSE RATIONAL LINEAR SYSTEMS

Given a non-singular A ∈ Z^{n×n} and b ∈ Z^{n×1}, in [10] we presented an algorithm and implementation to compute A^{-1} b with O˜(n^1.5 log(||A|| + ||b||)) matrix-vector products v ↦ Av mod p, for a machine-word-sized prime p and any v ∈ Z_p^{n×1}, plus O˜(n^2.5 log(||A|| + ||b||)) additional bit operations. Assuming that A and b have constant-sized entries, and that a matrix-vector product by A mod p can be performed with O˜(n) operations modulo p, the algorithm presented there solves a system with O˜(n^2.5) bit operations. Unfortunately, this result was conditional on the unproven Conjecture 2.1 of [10]: the existence of an efficient block projection. That conjecture was established in Corollary 2.3 of the current paper, and we can now state the following unconditionally.

Theorem 6.1. Given any invertible A ∈ Z^{n×n} and b ∈ Z^{n×1}, we can compute A^{-1} b using a Las Vegas algorithm. The expected number of matrix-vector products v ↦ Av mod p is O˜(n^1.5 log(||A|| + ||b||)), and the expected number of additional bit operations used by this algorithm is O˜(n^2.5 log(||A|| + ||b||)).

Sparse Integer Determinant and Smith Form

The efficient block projection of Theorem 2.1 can also be incorporated relatively directly into the block baby-steps/giant-steps methods of [18] for computing the determinant of an integer matrix. This yields improved algorithms for the determinant and Smith form of a sparse integer matrix. Unfortunately, the new techniques do not obviously improve the asymptotic cost of those algorithms in the case for which they were designed, namely the computation of determinants of dense integer matrices.

We only sketch the method for computing the determinant here, following the algorithm in Section 4 of [18], and estimate its complexity. Throughout we assume that A ∈ Z^{n×n} is non-singular and that we can compute v ↦ Av with φ(n) integer operations, where the bit-lengths of the integers involved are bounded by O˜(log(n + ||v|| + ||A||)).

1. Preconditioning and setup.
   Precondition A ← B = D_1 U A D_2, where D_1, D_2 are random diagonal matrices and U is a unimodular preconditioner from [22, Sec. 5]; we continue to write A for the preconditioned matrix (the determinant of the original matrix can be recovered by dividing by det D_1 · det D_2, since U is unimodular). While we will not provide the detailed analysis here, selecting the coefficients for these preconditioners randomly from a set S of size n^3 is sufficient to ensure a high probability of success. This preconditioning ensures that all leading minors are non-singular and that the characteristic polynomial is squarefree, with high probability (see [4, Theorem 4.3] for a proof of the latter condition). From Theorem 2.1, we also see that K_m(A, u) has full rank with high probability.

   Let p be a prime larger than the a priori bound on the coefficients of the characteristic polynomial of A; this is easily determined to be (n log ||A||)^{n+o(1)}. Fix a blocking factor s, to be optimized later, and assume n = ms.

2. Choosing projections.
   Let u ∈ Z^{n×s} be an efficient block projection as in (2.1), and let v ∈ Z^{n×s} be a random (dense) block projection with coefficients chosen from a set S_2 of size at least 2n^2.

3. Forming the sequence α_i = u^T A^i v ∈ Z^{s×s}.
   Compute this sequence for i = 0 ... 2m. Computing all the A^i v takes O˜(n φ(n) m log ||A||) bit operations, and computing all the u^T (A^i v) takes O˜(n^2 m log ||A||) bit operations.

4. Computing the minimal matrix generator.
   The minimal matrix generator F^A(λ) modulo p can be computed from the initial sequence segment α_0, ..., α_{2m-1}; see [18, Sec. 4]. This can be accomplished with O˜(m s^ω n log ||A||) bit operations.

5. Extracting the determinant.
   Following the algorithm in [18, Sec. 4], we first check whether the degree of F^A is less than n and, if so, return "failure". Otherwise, we know that det F^A(λ) = det(λI - A). Return det A = det F^A(0) mod p.

The correctness of the algorithm, and specifically of the block projections, follows from the fact that [u, Au, ..., A^{m-1}u] has full rank with high probability, by Theorem 2.1. Because the projection v is dense, the analysis of [18, (2.6)] is applicable, and the minimal generating polynomial will have full degree m with high probability; hence its determinant at λ = 0 is the determinant of A.

The total cost of this algorithm is O˜((n φ(n) m + n^2 m + n m s^ω) log ||A||) bit operations, which is minimized when s = n^{1/ω}. This yields an algorithm for the determinant requiring O˜((n^{2-1/ω} φ(n) + n^{3-1/ω}) log ||A||) bit operations. This is probably most interesting when ω = 3, where it yields an algorithm for the determinant requiring O˜(n^2.66 log ||A||) bit operations on a matrix with pseudo-linear matrix-vector product cost.

We also note that a similar approach allows us to use the Monte Carlo Smith form algorithm of [13], which proceeds by computing the characteristic polynomial of random preconditionings of a matrix. This reduction is
explored in [18] in the dense matrix setting. The upshot is that we obtain the Smith form with the same order of complexity, to within a poly-logarithmic factor, as we have obtained the determinant using the above techniques. See [18, §7.1] and [13] for details. We make no claim that this is practical in its present form.

Note: A referee has indicated that a "lifting" algorithm of Pan et al. [20] can also be used to solve integer systems when efficient matrix-vector products (modulo small primes) are supported for both the coefficient matrix and its inverse. This would provide an alternate application of our central results to solve integer systems. We wish to thank the referee for this information.

APPENDIX

A. APPLYING THE INVERSE OF A BLOCK-HANKEL MATRIX

In this appendix we address asymptotically fast techniques for computing a representation of the inverse of a block-Hankel matrix, and for applying this inverse to an arbitrary matrix. The fundamental technique we will employ is the off-diagonal inversion formula of Beckermann & Labahn [1] and its fast variants [14]. An alternative to using the inversion formula would be to use the generalization of the Levinson-Durbin algorithm in [16].

Again assume $n = ms$ for integers $m$ and $s$, and let

$$H = \begin{bmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_m \\ \alpha_2 & & \iddots & \alpha_{m+1} \\ \vdots & \iddots & & \vdots \\ \alpha_m & \alpha_{m+1} & \cdots & \alpha_{2m-1} \end{bmatrix} \in \mathbb{F}^{n \times n} \qquad (A.1)$$

be a non-singular block-Hankel matrix whose blocks $\alpha_i$ are $s \times s$ matrices over $\mathbb{F}$, and let $\alpha_{2m}$ be arbitrary in $\mathbb{F}^{s \times s}$. We follow the approach of [19] for computing the inverse matrix $H^{-1}$. Since $H$ is invertible, the following four linear systems (see [19, (3.8)-(3.11)])

$$H\,[q_{m-1}, \ldots, q_0]^t = [0, \ldots, 0, I]^t \in \mathbb{F}^{n \times s},$$
$$H\,[v_m, \ldots, v_1]^t = -[\alpha_{m+1}, \ldots, \alpha_{2m-1}, \alpha_{2m}]^t \in \mathbb{F}^{n \times s}, \qquad (A.2)$$

and

$$[q^{*}_{m-1}, \ldots, q^{*}_0]\,H = [0, \ldots, 0, I] \in \mathbb{F}^{s \times n},$$
$$[v^{*}_m, \ldots, v^{*}_1]\,H = -[\alpha_{m+1}, \ldots, \alpha_{2m-1}, \alpha_{2m}] \in \mathbb{F}^{s \times n}, \qquad (A.3)$$

have unique solutions given by the $q_k, q^{*}_k \in \mathbb{F}^{s \times s}$ (for $0 \le k \le m-1$), and the $v_k, v^{*}_k \in \mathbb{F}^{s \times s}$ (for $1 \le k \le m$). We then obtain the following equation (see [19, Theorem 3.1]):

$$H^{-1} =
\begin{bmatrix}
v_{m-1} & \cdots & v_1 & I\\
\vdots & \iddots & \iddots & \\
v_1 & \iddots & & \\
I & & &
\end{bmatrix}
\begin{bmatrix}
q^{*}_{m-1} & \cdots & \cdots & q^{*}_{0}\\
 & \ddots & & \vdots\\
 & & \ddots & \vdots\\
 & & & q^{*}_{m-1}
\end{bmatrix}
-
\begin{bmatrix}
q_{m-2} & \cdots & q_0 & 0\\
\vdots & \iddots & \iddots & \\
q_0 & \iddots & & \\
0 & & &
\end{bmatrix}
\begin{bmatrix}
v^{*}_{m} & \cdots & \cdots & v^{*}_{1}\\
 & \ddots & & \vdots\\
 & & \ddots & \vdots\\
 & & & v^{*}_{m}
\end{bmatrix}
\qquad (A.4)$$

The linear systems (A.2) and (A.3) may also be formulated in terms of matrix Padé approximation problems. We associate to $H$ the matrix polynomial $A = \sum_{i=0}^{2m} \alpha_i x^i \in \mathbb{F}^{s \times s}[x]$. The $s \times s$ matrix polynomials $Q, P, Q^{*}, P^{*}$ in $\mathbb{F}^{s \times s}[x]$ that satisfy

$$\begin{aligned} A(x)\,Q(x) &\equiv P(x) + x^{2m-1} I \bmod x^{2m}, &&\text{where } \deg Q \le m-1 \text{ and } \deg P \le m-2,\\ Q^{*}(x)\,A(x) &\equiv P^{*}(x) + x^{2m-1} I \bmod x^{2m}, &&\text{where } \deg Q^{*} \le m-1 \text{ and } \deg P^{*} \le m-2, \end{aligned} \qquad (A.5)$$

are unique and provide the coefficients $Q = \sum_{i=0}^{m-1} q_i x^i$ and $Q^{*} = \sum_{i=0}^{m-1} q^{*}_i x^i$ for constructing $H^{-1}$ using (A.4) (see [19, Theorem 3.1]). The notation "mod $x^i$" for $i \ge 0$ indicates that the terms of degree $i$ or higher are ignored. The $s \times s$ matrix polynomials $V, U, V^{*}, U^{*}$ in $\mathbb{F}^{s \times s}[x]$ that satisfy

$$\begin{aligned} A(x)\,V(x) &\equiv U(x) \bmod x^{2m+1}, \quad V(0) = I, &&\text{where } \deg V \le m \text{ and } \deg U \le m-1,\\ V^{*}(x)\,A(x) &\equiv U^{*}(x) \bmod x^{2m+1}, \quad V^{*}(0) = I, &&\text{where } \deg V^{*} \le m \text{ and } \deg U^{*} \le m-1, \end{aligned} \qquad (A.6)$$

are unique and provide the coefficients $V = I + \sum_{i=1}^{m} v_i x^i$ and $V^{*} = I + \sum_{i=1}^{m} v^{*}_i x^i$ for (A.4).

Using the matrix Padé formulation, the matrices $Q$, $Q^{*}$, $V$, and $V^{*}$ may be computed using the $\sigma$-basis algorithm in [1], or its fast counterpart in [14, §2.2] that uses fast matrix multiplication. For solving (A.5), the $\sigma$-basis algorithm with $\sigma = s(2m-1)$ solves

$$[A \;\; I] \begin{bmatrix} \bar Q \\ -\bar P \end{bmatrix} \equiv R\,x^{2m-1} \bmod x^{2m}, \qquad [\bar Q^{*} \;\; -\bar P^{*}] \begin{bmatrix} A \\ I \end{bmatrix} \equiv R^{*} x^{2m-1} \bmod x^{2m},$$

with $\bar Q, \bar P, \bar Q^{*}, \bar P^{*} \in \mathbb{F}^{s \times s}[x]$ that satisfy the degree constraints $\deg \bar Q \le m-1$, $\deg \bar Q^{*} \le m-1$, and $\deg \bar P \le m-2$, $\deg \bar P^{*} \le m-2$. The residue matrices $R$ and $R^{*}$ in $\mathbb{F}^{s \times s}$ are non-singular, hence $Q = \bar Q R^{-1}$ and $Q^{*} = (R^{*})^{-1} \bar Q^{*}$ are solutions $Q$ and $Q^{*}$ for applying the inversion formula (A.4). For (A.6), the $\sigma$-basis algorithm with $\sigma = s(2m+1)$ leads to

$$[A \;\; I] \begin{bmatrix} \bar V \\ -\bar U \end{bmatrix} \equiv 0 \bmod x^{2m+1}, \qquad [\bar V^{*} \;\; -\bar U^{*}] \begin{bmatrix} A \\ I \end{bmatrix} \equiv 0 \bmod x^{2m+1},$$

with $\deg \bar V \le m$, $\deg \bar V^{*} \le m$, and $\deg \bar U \le m-1$, $\deg \bar U^{*} \le m-1$. The constant terms $\bar V(0)$ and $\bar V^{*}(0)$ in $\mathbb{F}^{s \times s}$ are non-singular, hence $V = \bar V (\bar V(0))^{-1}$ and $V^{*} = (\bar V^{*}(0))^{-1} \bar V^{*}$ are solutions for applying (A.4). Using Theorem 2.4 in [14] together with the above material we get the following cost estimate.

Proposition A.1. Computing the expression (A.4) of the inverse of the block-Hankel matrix (A.1) reduces to multiplying matrix polynomials of degree $O(m)$ in $\mathbb{F}^{s \times s}[x]$, and can be done with $\tilde O(s^{\omega} m)$ operations in $\mathbb{F}$.

Multiplying a block triangular Toeplitz or Hankel matrix in $\mathbb{F}^{n \times n}$ with blocks of size $s \times s$ by a matrix in $\mathbb{F}^{n \times n}$ reduces
to the product of two matrix polynomials of degree $O(m)$, of dimensions $s \times s$ and $s \times n$. Using the fast algorithms in [3] or [2], such an $s \times s$ product can be done in $\tilde O(s^{\omega} m)$ operations. By splitting the $s \times n$ matrix into $s \times s$ blocks, the $s \times s$ by $s \times n$ product can thus be done in $\tilde O(m \cdot s^{\omega} m) = \tilde O(s^{\omega} m^2)$ operations. For $n = s^{\gamma}$, let $\omega(1, 1, \gamma)$ be the exponent of the problem of $s \times s$ by $s \times n$ matrix multiplication over $\mathbb{F}$. The splitting considered just above of the $s \times n$ matrix into $s \times s$ blocks corresponds to taking $\omega(1, 1, \gamma) = \omega + \gamma - 1$.

[9] W. Eberly. Processor-efficient parallel matrix inversion over abstract fields: two extensions. In Proceedings, PASCO '97, pages 38-45, New York, NY, USA, 1997. ACM Press.

[10] W. Eberly, M. Giesbrecht, P. Giorgi, A. Storjohann, and G. Villard. Solving sparse rational linear systems. In ISSAC '06: Proceedings of the 2006 International Symposium on Symbolic and Algebraic Computation, pages 63-70, New York, NY, USA, 2006. ACM Press.
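In the scalar case $s = 1$, the block-Hankel matrix (A.1) is an ordinary symmetric Hankel matrix, so the starred and unstarred solutions of (A.2)-(A.3) coincide, and the inversion formula (A.4) can be checked numerically. The sketch below is illustrative only: it solves the systems (A.2) by plain Gaussian elimination over GF(p) rather than by the $\sigma$-basis algorithm, and all names (`hankel_inverse`, `solve`, `random_example`) are ours, not from the paper.

```python
# Numerical check of the inversion formula (A.4) for s = 1 over GF(p).
# The systems (A.2) are solved by Gauss-Jordan elimination here, not by the
# sigma-basis algorithm: an illustration, not the fast method of Prop. A.1.
import random

p = 10007

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) % p
             for j in range(n)] for i in range(n)]

def solve(M, rhs):
    """Solve M x = rhs over GF(p); raises StopIteration if M is singular."""
    n = len(M)
    M = [row[:] + [r % p] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c])
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], p - 2, p)
        M[c] = [x * inv % p for x in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                M[r] = [(a - M[r][c] * b) % p for a, b in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def hankel_inverse(alpha, m):
    """H^{-1} via (A.4); alpha[1..2m-1] define H, alpha[2m] is arbitrary."""
    H = [[alpha[i + j + 1] for j in range(m)] for i in range(m)]
    q_rev = solve(H, [0] * (m - 1) + [1])                    # q_{m-1},...,q_0
    v_rev = solve(H, [-alpha[m + 1 + i] for i in range(m)])  # v_m,...,v_1
    q = q_rev[::-1]           # q[k] = q_k for 0 <= k <= m-1
    v = [0] + v_rev[::-1]     # v[k] = v_k for 1 <= k <= m
    # The four structured factors of (A.4); s = 1, so q* = q and v* = v.
    F1 = [[1 if i + j == m - 1 else (v[m - 1 - i - j] if i + j < m - 1 else 0)
           for j in range(m)] for i in range(m)]
    F2 = [[q[m - 1 - (j - i)] if j >= i else 0
           for j in range(m)] for i in range(m)]
    F3 = [[q[m - 2 - i - j] if i + j <= m - 2 else 0
           for j in range(m)] for i in range(m)]
    F4 = [[v[m - (j - i)] if j >= i else 0
           for j in range(m)] for i in range(m)]
    P1, P2 = matmul(F1, F2), matmul(F3, F4)
    return [[(a - b) % p for a, b in zip(r1, r2)] for r1, r2 in zip(P1, P2)]

def random_example(m):
    """Sample alpha[1..2m] until H is non-singular; return (alpha, H^{-1})."""
    while True:
        alpha = [0] + [random.randrange(p) for _ in range(2 * m)]
        try:
            return alpha, hankel_inverse(alpha, m)
        except StopIteration:  # singular H drawn; resample
            continue
```

Checking that $H \cdot H^{-1} = I$ for a random example, and that the result is unchanged when the trailing coefficient is perturbed, also illustrates the statement after (A.1) that $\alpha_{2m}$ may be chosen arbitrarily.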