Faster Inversion and Other Black Box Computations Using Efficient Block Projections

Wayne Eberly1, Mark Giesbrecht2, Pascal Giorgi2,4, Arne Storjohann2, Gilles Villard3

(1) Department of Computer Science, University of Calgary, Calgary, Alberta, Canada. [email protected] (2) David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada. [email protected], [email protected], [email protected] (3) CNRS, LIP, École Normale Supérieure de Lyon, Lyon, France. [email protected] (4) IUT, Université de Perpignan, Perpignan, France. [email protected]

ABSTRACT

Efficient block projections of non-singular matrices have recently been used by the authors in [10] to obtain an efficient algorithm to find rational solutions for sparse systems of linear equations. In particular, a bound of O˜(n^{2.5}) machine operations is presented for this computation, assuming that the input matrix can be multiplied by a vector with constant-sized entries using O˜(n) machine operations. Somewhat more general bounds for black-box matrix computations are also derived. Unfortunately, the correctness of this algorithm depends on the existence of efficient block projections of non-singular matrices, and this was only conjectured.

In this paper we establish the correctness of the algorithm from [10] by proving the existence of efficient block projections for arbitrary non-singular matrices over sufficiently large fields. We further demonstrate the usefulness of these projections by incorporating them into existing black-box matrix algorithms to derive improved bounds for the cost of several matrix problems. We consider, in particular, matrices that can be multiplied by a vector using O˜(n) field operations: we show how to compute the inverse of any such non-singular matrix over any field using an expected number of O˜(n^{2.27}) operations in that field. A basis for the null space of such a matrix, and a certification of its rank, are obtained at the same cost. An application of this technique to Kaltofen and Villard's baby-steps/giant-steps algorithms for the determinant and Smith form of an integer matrix is also sketched, yielding algorithms requiring O˜(n^{2.66}) machine operations. More general bounds involving the number of black-box matrix operations to be used are also obtained.

The derived algorithms are all probabilistic of the Las Vegas type. They are assumed to be able to generate random elements (bits or field elements) at unit cost, and they always output the correct answer in the expected time given.

Categories and Subject Descriptors: I.1.2 [Symbolic and Algebraic Manipulation]: Algorithms—Algebraic algorithms, analysis of algorithms

General Terms: Algorithms

Keywords: Sparse matrix, structured integer matrix, linear system solving, black box

This material is based on work supported in part by the French National Research Agency (ANR Gecko, Villard), and by the Natural Sciences and Engineering Research Council (NSERC) of Canada (Eberly, Giesbrecht, Storjohann).

1. INTRODUCTION

In our paper [10] we presented an algorithm which purportedly solved a sparse system of rational equations considerably more efficiently than standard linear equation solving. Unfortunately, its effectiveness in all cases was conjectural, even as its complexity and actual performance were very appealing. This effectiveness relied on a conjecture regarding the existence of so-called efficient block projections. Given a matrix A ∈ F^{n×n} over any field F, these projections should be block vectors u ∈ F^{n×s} (where s is a blocking factor dividing n, so n = ms) such that we can compute uv or v^T u quickly for any suitably dimensioned v, and such that the sequence of block vectors u, Au, ..., A^{m−1}u has rank n.


In this paper, we prove the existence of a class of such efficient block projections for non-singular n×n matrices over sufficiently large fields; we require that the size of the field F exceed n(n+1).

This can be used to establish a variety of results concerning matrices A ∈ Z^{n×n} with efficient matrix-vector products, in particular, such that a matrix-vector product Ax mod p can be computed for a given integer vector x and a small (word-sized) prime p using O˜(n) bit operations. Such matrices include all "sparse" matrices having O(n) non-zero entries, assuming these are appropriately represented. They also include a variety of "structured" matrices, having constant "displacement rank" (for one definition of displacement rank or another) studied in the recent literature.

In particular, our existence result implies that if A ∈ Z^{n×n} is non-singular and has an efficient matrix-vector product, then the Las Vegas algorithm for system solving given in [10] can be used to solve a system Ax = b for a given integer vector b using an expected number of matrix-vector products modulo a word-sized prime that is O˜(n^{1.5} log(‖A‖ + ‖b‖)), together with an expected number of additional bit operations that is O˜(n^{2.5} log(‖A‖ + ‖b‖)). If A has an efficient matrix-vector product then the total expected number of bit operations used by this algorithm is less than that used by any previously known algorithm, at least when "standard" (i.e., cubic) matrix arithmetic is used.

Consider, for example, the case when the cost of a matrix-vector product by A modulo a word-sized prime is O˜(n) operations and the entries in A are of constant size. The cost of our algorithm will be O˜(n^{2.5}) bit operations. This improves upon the p-adic lifting method of Dixon [6], which requires O˜(n^3) bit operations for sparse or dense matrices. This theoretical efficiency was reflected in practice in [10], at least for large matrices.

We present several other rather surprising applications of this technique. Each incorporates the technique into an existing algorithm in order to reduce the asymptotic complexity for the matrix problem to be solved. In particular, given a matrix A ∈ F^{n×n} over an arbitrary field F, we are able to compute the complete inverse of A with O˜(n^{3−1/(ω−1)}) operations in F plus O˜(n^{2−1/(ω−1)}) matrix-vector products by A. Here ω is such that we can multiply two n×n matrices with O(n^ω) operations in F. Standard matrix multiplication gives ω = 3, while the best known matrix multiplication of Coppersmith and Winograd [5] has ω = 2.376. If again we can compute v ↦ Av with O˜(n) operations in F, this implies an algorithm to compute the inverse with O˜(n^{3−1/(ω−1)}) operations in F. This is always in O˜(n^ω), and in particular equals O˜(n^{2.27}) operations in F for the best known ω of [5]. Other relatively straightforward applications of these techniques yield algorithms for the full nullspace and (certified) rank with this same cost. Finally, we sketch how these methods can be employed in the algorithms of Kaltofen and Villard [18] and Giesbrecht [13] to compute the determinant and Smith form of sparse matrices more efficiently.

There has certainly been much important work done on finding exact solutions to sparse rational systems prior to [10]. Dixon's p-adic lifting algorithm [6] performs extremely well in practice for dense and sparse linear systems, and is implemented efficiently in LinBox [7] and Magma (see [10] for a comparison). Kaltofen and Saunders [17] were the first to propose the use of Krylov-type algorithms for these problems. Krylov-type methods are used to find Smith forms of sparse matrices and to solve Diophantine systems in parallel in [12, 13], and this is further developed in [8, 18]. See the references in these papers for a more complete history.

For sparse systems over a field, the seminal work is that of Wiedemann [22], who shows how to solve sparse n×n systems over a field with O(n) matrix-vector products and O(n^2) other operations. This research is further developed in [4, 16, 17] and many other works. The bit complexity of similar operations for various families of structured matrices is examined by Emiris and Pan [11].

2. EFFICIENT BLOCK PROJECTIONS

For now we consider an arbitrary invertible matrix A ∈ F^{n×n} over a field F, and s an integer, the blocking factor, that divides n exactly. Let m = n/s. For a so-called block projection u ∈ F^{n×s} and 1 ≤ k ≤ m, we denote by K_k(A, u) the block Krylov matrix [u, Au, ..., A^{k−1}u] ∈ F^{n×ks}. We wish to show that K_m(A, u) ∈ F^{n×n} is non-singular for a particularly simple and sparse u, assuming some properties of A.

Our factorization uses the special projection (which we will refer to as an efficient block projection)

    u = \begin{bmatrix} I_s \\ I_s \\ \vdots \\ I_s \end{bmatrix} ∈ F^{n×s},    (2.1)

which is comprised of m copies of I_s and thus has exactly n non-zero entries. We suggested a similar projection in [10] without proof of its reliability (i.e., that the corresponding block Krylov matrix is non-singular). We establish here that it does yield a block Krylov matrix of full rank, and hence can be used for an efficient inverse of a sparse A.

Let D = diag(δ_1, ..., δ_1, δ_2, ..., δ_2, ..., δ_m, ..., δ_m) be an n×n diagonal matrix whose entries consist of m distinct indeterminates δ_i, each δ_i occurring s times.

Theorem 2.1. If the leading ks×ks minor of A is non-zero for 1 ≤ k ≤ m, then K_m(DAD, u) ∈ F[δ_1, ..., δ_m]^{n×n} is non-singular.

Proof. Let B = DAD. For 1 ≤ k ≤ m, define B_k as the specialization of B obtained by setting δ_{k+1}, δ_{k+2}, ..., δ_m to zero. Thus B_k is the matrix constructed by setting to zero the last n − ks rows and columns of B. Similarly, for 1 ≤ k ≤ m we define u_k ∈ F^{n×s} to be the matrix constructed from u by setting to zero the last n − ks rows. In particular we have B_m = B and u_m = u. This specialization will allow us to argue incrementally about how the rank is increased as k increases.

We proceed by induction on k and show that

    rank K_k(B_k, u_k) = ks    (2.2)

for 1 ≤ k ≤ m. For the base case k = 1 we have K_1(B_1, u_1) = u_1, and thus rank K_1(B_1, u_1) = rank u_1 = s.

Now assume that (2.2) holds for some k with 1 ≤ k < m. By the definition of B_k and u_k, only the first ks rows of B_k and u_k will be involved in the left hand side of (2.2). Similarly, only the first ks columns of B_k will be involved. Since by assumption the leading ks×ks minor of B is non-zero, we have rank B_k K_k(B_k, u_k) = ks, which is equivalent to rank K_k(B_k, B_k u_k) = ks. By the fact that the first ks rows of u_{k+1} − u_k are zero, we have B_k(u_{k+1} − u_k) = 0, or equivalently B_k u_{k+1} = B_k u_k, and hence

    rank K_k(B_k, B_k u_{k+1}) = ks.    (2.3)

The matrix in (2.3) can be written as

    K_k(B_k, B_k u_{k+1}) = \begin{bmatrix} M_k \\ 0 \end{bmatrix} ∈ F^{n×ks},

where M_k ∈ F^{ks×ks} is non-singular. Introducing the block u_{k+1} we obtain the matrix

    [u_{k+1}, K_k(B_k, B_k u_{k+1})] = \begin{bmatrix} * & M_k \\ I_s & 0 \\ 0 & 0 \end{bmatrix},    (2.4)

whose rank is (k+1)s. Noticing that

    [u_{k+1}, K_k(B_k, B_k u_{k+1})] = K_{k+1}(B_k, u_{k+1}),

we are led to

    rank K_{k+1}(B_k, u_{k+1}) = (k+1)s.

Finally, using the fact that B_k is the specialization of B_{k+1} obtained by setting δ_{k+1} to zero, we obtain

    rank K_{k+1}(B_{k+1}, u_{k+1}) = (k+1)s,

which is (2.2) for k+1 and thus establishes the theorem by induction.

If the leading ks×ks minor of A is non-zero, then the leading ks×ks minor of DAD is non-zero as well, for any integer k. This gives us the following corollary.

Corollary 2.2. If the leading ks×ks minor of A is non-zero for 1 ≤ k ≤ m, and B = DAD, then K_m(B, u) is non-singular.

Suppose now that A ∈ F^{n×n} is an arbitrary non-singular matrix and the size of F exceeds n(n+1). It follows from Theorem 2 of Kaltofen and Saunders [17] that there exist a lower triangular Toeplitz matrix L ∈ F^{n×n} and an upper triangular Toeplitz matrix U ∈ F^{n×n} such that each of the leading minors of Â = UAL is non-zero. Let B̂ = DÂD; the product of the determinants of the matrices K_m(B̂, u) and K_m(B̂^T, u) (mentioned in the above theorem and corollary) is a polynomial with total degree less than 2n(m−1) < n(n+1) (if m ≠ 1). In this case it follows that there is also a non-singular diagonal matrix D ∈ F^{n×n} such that K_m(B, u) and K_m(B^T, u) are non-singular, for B = DÂD = DUALD.

Now let R = LD²U ∈ F^{n×n}, û ∈ F^{s×n}, and v ∈ F^{n×s} be such that

    û^T = (L^T)^{−1} D^{−1} u  and  v = LDu.

Then

    K_m(RA, v) = LD K_m(B, u)

and

    L^T D K_m((RA)^T, û^T) = K_m(B^T, u),

so that K_m(RA, v) and K_m((RA)^T, û^T) are each non-singular as well. Because D is diagonal and U and L are triangular Toeplitz matrices, it is now easily established that (R, û, v) is an efficient block projection for the given matrix A, where such projections are as defined in [10].

This proves Conjecture 2.1 of [10] for the case that the size of F exceeds n(n+1):

Corollary 2.3. For any non-singular A ∈ F^{n×n} and s | n (over a field of size greater than n(n+1)) there exists an efficient block projection (R, u, v) ∈ F^{n×n} × F^{s×n} × F^{n×s}.
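To make the construction concrete, the following small computation (a sketch only, assuming the sympy library; it is not part of the algorithms of this paper) builds the projection u of (2.1), specializes the indeterminates δ_i to random non-zero values, and verifies that the block Krylov matrix K_m(DAD, u) is non-singular, as Theorem 2.1 and Corollary 2.2 predict. The matrix A is made strictly diagonally dominant so that its leading minors are guaranteed to be non-zero.

```python
# Sanity check for Theorem 2.1 / Corollary 2.2 (a sketch; assumes sympy).
import random
from sympy import Matrix, eye

s, m = 2, 3                      # blocking factor s and m = n/s blocks
n = m * s

# A strictly diagonally dominant A has all leading ks x ks minors
# non-zero, as the theorem requires.
A = Matrix(n, n, lambda i, j: random.randint(-9, 9))
for i in range(n):
    A[i, i] = 1 + sum(abs(A[i, j]) for j in range(n) if j != i)

# Efficient block projection u of (2.1): m stacked copies of I_s.
u = Matrix.vstack(*[eye(s) for _ in range(m)])

# Specialize delta_1, ..., delta_m to random non-zero values (over Q here;
# by Schwartz-Zippel the specialization succeeds with high probability).
d = [random.randint(1, 50) for _ in range(m)]
D = Matrix.diag(*[d[i // s] for i in range(n)])
B = D * A * D

# K_m(B, u) = [u, Bu, ..., B^{m-1}u] should be non-singular.
cols, c = [u], u
for _ in range(m - 1):
    c = B * c
    cols.append(c)
K = Matrix.hstack(*cols)
assert K.rank() == n
print("K_m(DAD, u) is non-singular")
```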


3. FACTORIZATION OF THE MATRIX INVERSE

The existence of the efficient block projection established in the previous section allows us to define a useful factorization of the inverse of a matrix. This was used to obtain faster heuristics for solving integer systems in [10]. The basis is the following factorization of the matrix inverse. Let B = DAD, where D is an n×n diagonal matrix whose diagonal entries consist of m distinct indeterminates, each occurring s times contiguously, as previously defined. Define K_u^{(r)} = K_m(B, u) with u as in (2.1), and K_u^{(ℓ)} = K_m(B^T, u)^T, where (r) and (ℓ) refer to projection on the right and left respectively. For any 0 ≤ k ≤ m−1 and any two indices l and r such that l + r = k we have u^T B^l · B^r u = u^T B^k u. Hence the matrix H_u = K_u^{(ℓ)} B K_u^{(r)} is block-Hankel with blocks of dimension s×s:

    H_u = \begin{bmatrix}
    u^T B u & u^T B^2 u & \cdots & u^T B^m u \\
    u^T B^2 u & u^T B^3 u & & \vdots \\
    \vdots & & \ddots & u^T B^{2m-2} u \\
    u^T B^m u & \cdots & u^T B^{2m-2} u & u^T B^{2m-1} u
    \end{bmatrix}.

Notice that H_u = K_u^{(ℓ)} B K_u^{(r)} = K_u^{(ℓ)} DAD K_u^{(r)}. Theorem 2.1 and Corollary 2.2 imply that if all leading ks×ks minors of A are non-singular then K_u^{(ℓ)} and K_u^{(r)} are each non-singular as well. This establishes the following.

Theorem 3.1. If A ∈ F^{n×n} is such that all leading ks×ks minors are non-singular, D is a diagonal matrix of indeterminates as above, and B = DAD, then B^{−1} and A^{−1} may be factored as

    B^{−1} = K_u^{(r)} H_u^{−1} K_u^{(ℓ)},
    A^{−1} = D K_u^{(r)} H_u^{−1} K_u^{(ℓ)} D,    (3.1)

where K_u^{(ℓ)} and K_u^{(r)} are as defined above, and H_u ∈ F^{n×n} is block-Hankel (and invertible) with s×s blocks, as above.

Note that for any specialization of the indeterminates in D to field elements in F such that det H_u ≠ 0, we obtain a formula similar to (3.1) completely over F. A similar factorization in the non-blocked case is used in [9, (4.5)] for fast parallel matrix inversion.
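The factorization (3.1) can likewise be checked directly on a small example. The sketch below (assuming sympy, with the same construction as the previous listing) forms H_u = K_u^{(ℓ)} B K_u^{(r)}, confirms its block-Hankel structure, and verifies A^{−1} = D K_u^{(r)} H_u^{−1} K_u^{(ℓ)} D.

```python
# Verifying the factorization (3.1) on a small example (a sketch;
# assumes sympy). D is specialized to random non-zero values; on the
# unlikely event that H_u is singular, rerun (Las Vegas behaviour).
import random
from sympy import Matrix, eye

s, m = 2, 3
n = m * s
A = Matrix(n, n, lambda i, j: random.randint(-9, 9))
for i in range(n):
    A[i, i] = 1 + sum(abs(A[i, j]) for j in range(n) if j != i)

u = Matrix.vstack(*[eye(s) for _ in range(m)])
d = [random.randint(1, 50) for _ in range(m)]
D = Matrix.diag(*[d[i // s] for i in range(n)])
B = D * A * D

def krylov(M, u, m):
    """[u, Mu, ..., M^{m-1}u] as an n x n matrix."""
    cols, c = [u], u
    for _ in range(m - 1):
        c = M * c
        cols.append(c)
    return Matrix.hstack(*cols)

Kr = krylov(B, u, m)                 # K_u^{(r)} = K_m(B, u)
Kl = krylov(B.T, u, m).T             # K_u^{(l)} = K_m(B^T, u)^T
Hu = Kl * B * Kr                     # block-Hankel with blocks u^T B^{i+j+1} u

# Block-Hankel structure: block (i, j) depends only on i + j.
for i in range(m):
    for j in range(m):
        assert Hu[i*s:(i+1)*s, j*s:(j+1)*s] == u.T * B**(i + j + 1) * u

# Factorization (3.1).
assert D * Kr * Hu.inv() * Kl * D == A.inv()
print("A^{-1} = D K^{(r)} H_u^{-1} K^{(l)} D verified")
```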


4. BLACK-BOX MATRIX INVERSION OVER A FIELD

Suppose again that A ∈ F^{n×n} is invertible, and that for any v ∈ F^{n×1} the products Av and A^T v can be computed in φ(n) operations in F (where φ(n) ≥ n). Following Kaltofen, we call such matrix-vector and vector-matrix products black-box evaluations of A. In this section we will show how to compute A^{−1} with O˜(n^{2−1/(ω−1)}) black-box evaluations and O˜(n^{3−1/(ω−1)}) additional operations in F. Note that when φ(n) = O˜(n) the exponent of n in this cost is smaller than ω, and is O˜(n^{2.273}) with the currently best-known matrix multiplication.

Again assume that n = ms, where s is a blocking factor and m the number of blocks. Assume for the moment that all principal ks×ks minors of A are non-zero, 1 ≤ k ≤ m. Let δ_1, δ_2, ..., δ_m be the indeterminates that form the diagonal entries of D, and let B = DAD. By Theorem 2.1 and Corollary 2.2, the matrices K_m(B, u) and K_m(B^T, u) are each invertible. If m ≥ 2 then the product Δ of the determinants of these matrices is a non-zero polynomial Δ ∈ F[δ_1, ..., δ_m] with total degree at most 2n(m−1).

Suppose that F has at least 2n(m−1) elements. Then Δ cannot be zero at all points in (F \ {0})^m. Let d_1, d_2, ..., d_m be non-zero elements of F such that Δ(d_1, d_2, ..., d_m) ≠ 0, let D = diag(d_1, ..., d_1, ..., d_m, ..., d_m), and let B = DAD. Then K_u^{(r)} = K_m(B, u) ∈ F^{n×n} and K_u^{(ℓ)} = K_m(B^T, u)^T ∈ F^{n×n} are each invertible because Δ(d_1, d_2, ..., d_m) ≠ 0, B is invertible because A is and d_1, d_2, ..., d_m are all non-zero, and thus H_u = K_u^{(ℓ)} B K_u^{(r)} ∈ F^{n×n} is invertible as well. Correspondingly, (3.1) suggests

    B^{−1} = K_u^{(r)} H_u^{−1} K_u^{(ℓ)}  and  A^{−1} = D K_u^{(r)} H_u^{−1} K_u^{(ℓ)} D

for computing the matrix inverse.

1. Computation of u^T, u^T B, ..., u^T B^{2m−1}, and K_u^{(ℓ)}.
We can compute this sequence, and hence K_u^{(ℓ)}, with 2m−1 applications of B to blocks of vectors, using O(nφ(n)) operations in F.

2. Computation of H_u.
Due to the special form (2.1) of u, one may compute wu for any w ∈ F^{s×n} with O(sn) operations. Hence we can now compute u^T B^i u for 0 ≤ i ≤ 2m−1 with O(n²) operations in F.

3. Computation of H_u^{−1}.
The off-diagonal inverse representation of H_u^{−1}, as in (A.4) in the Appendix, can be found with O˜(s^ω m) operations by Proposition A.1.

4. Computation of H_u^{−1} K_u^{(ℓ)}.
From Corollary A.2 in the Appendix, we can compute the product H_u^{−1} M for any matrix M ∈ F^{n×n} with O˜(s^ω m²) operations.

5. Computation of K_u^{(r)} (H_u^{−1} K_u^{(ℓ)}).
We can compute K_u^{(r)} M = [u, Bu, ..., B^{m−1}u] M for any M ∈ F^{n×n} by splitting M into m blocks M_i of s consecutive rows, for 0 ≤ i ≤ m−1:

    K_u^{(r)} M = Σ_{i=0}^{m−1} B^i (u M_i)
                = u M_0 + B(u M_1 + B(u M_2 + ⋯ + B(u M_{m−2} + B u M_{m−1}) ⋯)).    (4.1)

Because of the special form (2.1) of u, each product u M_i ∈ F^{n×n} requires O(n²) operations, and hence all m such products involved in (4.1) can be computed with O(mn²) operations. Because applying B to an n×n matrix costs nφ(n) operations, K_u^{(r)} M is computed in O(mnφ(n) + mn²) operations using the iterative form of (4.1). (A small sketch of this evaluation appears at the end of this section.)

In total, the above process requires O(mn) applications of A to a vector (the same as for B), and O(s^ω m² + mn²) additional operations. If φ(n) = O˜(n), the overall number of field operations is minimized with the blocking factor s = n^{1/(ω−1)}.

Theorem 4.1. Let A ∈ F^{n×n}, where n = ms and s = n^{1/(ω−1)}, be such that all leading ks×ks minors are non-singular for 1 ≤ k ≤ m. Let B = DAD, for D = diag(d_1, ..., d_1, ..., d_m, ..., d_m), such that d_1, ..., d_m are non-zero and each of the matrices K_m(DAD, u) and K_m((DAD)^T, u) is invertible. Then the inverse matrix A^{−1} can be computed using O(n^{2−1/(ω−1)}) black-box operations and an additional O˜(n^{3−1/(ω−1)}) operations in F.

The above discussion makes a number of assumptions. First, it assumes that the blocking factor s exactly divides n. This is easily accommodated by simply extending n to the nearest multiple of s, placing A in the top left corner of the augmented matrix, and adding diagonal ones in the bottom right corner.

Theorem 4.1 also makes the assumptions that all the leading ks×ks minors of A are non-singular and that the determinants of K_m(DAD, u) and K_m((DAD)^T, u) are each non-zero. Although we know of no way to ensure this deterministically in the times given, standard techniques can be used to obtain these properties probabilistically if F is sufficiently large.

Suppose, in particular, that n ≥ 16 and that #F > 2(m+1)n⌈log₂ n⌉. Fix a set S of at least 2(m+1)n⌈log₂ n⌉ non-zero elements of F. We can ensure that the leading ks×ks minors of Ã are non-zero by pre- and post-multiplying by butterfly network preconditioners X and Y respectively, with parameters chosen uniformly and randomly from S. If X and Y are constructed using the generic exchange matrix of [4, §6.2], then they use at most n⌈log₂ n⌉/2 random elements from S, and from [4, Theorem 6.3] it follows that all leading ks×ks minors of Ã = XAY will be non-zero simultaneously with probability at least 3/4. This probability of success can be made arbitrarily close to 1 with a choice from a larger S. We note that A^{−1} = Y Ã^{−1} X. Thus, once we have computed Ã^{−1}, we can compute A^{−1} with an additional O˜(n²) operations in F, using the fact that multiplication of an arbitrary n×n matrix by an n×n butterfly preconditioner can be done with O˜(n²) operations.

Once again let Δ be the product of the determinants of the matrices K_m(DÃD, u) and K_m((DÃD)^T, u), so that Δ is non-zero with total degree at most 2n(m−1). If we choose randomly selected values from S for δ_1, ..., δ_m, then because #S ≥ 2(m+1)n⌈log₂ n⌉ > 4 deg Δ, the probability that Δ is zero at this point is at most 1/4 by the Schwartz-Zippel Lemma [21, 23].

In summary, for randomly selected butterfly preconditioners X and Y as above, and independently and randomly chosen values d_1, d_2, ..., d_m, the probability that Ã = XAY has non-singular leading ks×ks minors for 1 ≤ k ≤ m and that Δ(d_1, d_2, ..., d_m) is non-zero is at least 9/16 > 1/2, when all random choices are made uniformly and independently from a finite subset S of F \ {0} with size at least 2(m+1)n⌈log₂ n⌉.

When #F ≤ 2(m+1)n⌈log₂ n⌉, we can easily construct a field extension E of F that has size greater than 2(m+1)n⌈log₂ n⌉ and perform the computation in that extension. Because this extension will have degree O(log_{#F} n) over F, it will add only a logarithmic factor to the final cost. While we certainly do not claim that this is not of practical concern, it does not affect the asymptotic complexity.
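The Horner-like form (4.1) in step 5 is worth seeing concretely: it touches the black box only m−1 times on n×n blocks and never forms K_u^{(r)} explicitly, and each product u M_i is mere copying because of the structure (2.1). The following sketch (assuming sympy; apply_B is a hypothetical stand-in for the black box) checks it against the explicit Krylov product.

```python
# Sketch of step 5: computing K_u^{(r)} M via the Horner form (4.1)
# using only black-box applications of B (assumes sympy).
import random
from sympy import Matrix, eye

s, m = 2, 3
n = m * s
B = Matrix(n, n, lambda i, j: random.randint(-9, 9))  # stands in for DAD
M = Matrix(n, n, lambda i, j: random.randint(-9, 9))
u = Matrix.vstack(*[eye(s) for _ in range(m)])

def apply_B(W):
    # Black box: in the intended setting this costs n*phi(n) operations.
    return B * W

def u_times(Mi):
    # u M_i for the structured u of (2.1) is m stacked copies of M_i:
    # O(n^2) entry copies, no multiplications.
    return Matrix.vstack(*[Mi for _ in range(m)])

# K_u^{(r)} M = u M_0 + B(u M_1 + B(... + B(u M_{m-1})...)), eq. (4.1).
blocks = [M[i*s:(i+1)*s, :] for i in range(m)]     # s x n row blocks M_i
acc = u_times(blocks[m - 1])
for i in range(m - 2, -1, -1):
    acc = u_times(blocks[i]) + apply_B(acc)

# Compare against the explicit Krylov matrix product.
K = Matrix.hstack(*[B**i * u for i in range(m)])
assert acc == K * M
print("Horner evaluation of K_u^{(r)} M matches the explicit product")
```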


This algorithm is Las Vegas (or trivially modified to be so): if either K_m(DAD, u) or K_m((DAD)^T, u) is singular then so is H_u, and this is detected at step 3. On the other hand, if K_m(DAD, u) and K_m((DAD)^T, u) are both non-singular then the algorithm's output is correct.

Theorem 4.2. Let A ∈ F^{n×n} be non-singular. Then the inverse matrix A^{−1} can be computed by a Las Vegas algorithm whose expected cost is O˜(n^{2−1/(ω−1)}) black-box operations and O˜(n^{3−1/(ω−1)}) additional operations in F.

Table 4.1 (below) states the expected costs to compute the inverse using various values of ω when φ(n) = O˜(n).

    ω                    Black-box applications   Blocking factor   Inversion cost
    3 (standard)         n^{1.5}                  n^{1/2}           O˜(n^{2.5})
    2.807 (Strassen)     n^{1.446}                n^{0.553}         O˜(n^{2.446})
    2.3755 (Cop./Win.)   n^{1.273}                n^{0.728}         O˜(n^{2.273})

    Table 4.1: Exponents of matrix inversion with a matrix-vector cost φ(n) = O˜(n).

Remark 4.3. The structure (2.1) of the projection u plays a central role in computing the product of the block Krylov matrix by an n×n matrix. For a general projection u ∈ F^{n×s}, how to do better than a general matrix multiplication, i.e., how to take advantage of the Krylov structure for computing K_u M, appears to be unknown.

Multiplying a Black-Box Matrix Inverse by Any Matrix

The above method can also be used to compute A^{−1}M for any matrix M ∈ F^{n×n} with the same cost as in Theorem 4.2. Consider the new step 1.5:

1.5. Computation of K_u^{(ℓ)} M.
Split M into m blocks of s columns, so that M = [M_0, ..., M_{m−1}] with M_k ∈ F^{n×s}, and consider computing K_u^{(ℓ)} M_k for some k ∈ {0, ..., m−1}. This can be accomplished by computing B^i M_k for 0 ≤ i ≤ m−1 in sequence, and then multiplying each iterate on the left by u^T to obtain u^T B^i M_k.

The cost of computing K_u^{(ℓ)} M_k for a single k by the above process is n−s applications of A to vectors and O(ns) additional operations in F. The cost of doing this for all k with 0 ≤ k ≤ m−1 is thus m(n−s) < nm applications of A to vectors and O(n²) additional operations. Since applying A (and hence B) to an n×n matrix is assumed to cost nφ(n) operations, K_u^{(ℓ)} M is computed in O(mnφ(n) + mn²) operations in F by the process described here.

Note that this is the same as the cost of step 5, so the overall cost estimate is not affected. Because step 4 does not rely on any special form for K_u^{(ℓ)}, we can replace it with a computation of H_u^{−1}(K_u^{(ℓ)} M) with the same cost. The output is again easily certified with n additional black-box evaluations. We obtain the following corollary.

Corollary 4.4. Let A ∈ F^{n×n} be non-singular and let M ∈ F^{n×n}. We can compute A^{−1}M with a Las Vegas algorithm whose expected cost is O˜(n^{2−1/(ω−1)}) black-box operations and O˜(n^{3−1/(ω−1)}) additional operations in F. The estimates in Table 4.1 apply to this computation as well.

5. APPLICATIONS TO BLACK-BOX MATRICES OVER A FIELD

The algorithms of the previous section have applications in some important computations with black-box matrices over an arbitrary field F. In particular, we consider the problems of computing the nullspace and rank of a black-box matrix. Each of these algorithms is probabilistic of the Las Vegas type; the output is certified to be correct.

Kaltofen and Saunders [17] present algorithms for computing the rank of a matrix and for randomly sampling the nullspace, building upon the work of Wiedemann [22]. In particular, they show for random upper and lower triangular Toeplitz matrices U, L ∈ F^{n×n}, and random diagonal D, that all leading k×k minors of Ã = UALD are non-singular for 1 ≤ k ≤ r = rank A, and that if f^Ã ∈ F[x] is the minimal polynomial of Ã, then it has degree r+1 if A is singular (and degree n if A is non-singular). This is proved to be true for any input A ∈ F^{n×n}, and for random choice of U, L and D, with high probability. The cost of computing f^Ã (and hence rank A) is shown to be O(n) applications of the black box for Ã and O(n²) additional operations in F. However, no certificate is provided that the rank is correct within this cost (and we do not know of one or provide one here). Kaltofen and Saunders [17] also show how to generate a vector uniformly and randomly from the nullspace of A with this cost (and, of course, this is certifiable with a single evaluation of the black box for A). We also note that the algorithms of Wiedemann and of Kaltofen and Saunders require only a linear amount of extra space, which will not be the case for our algorithms.

We first employ the random preconditioning of [17] and let Ã = UALD as above. We will thus assume in what follows that A has all leading i×i minors non-singular for 1 ≤ i ≤ r. Although an unlucky choice may make this assumption false, this case will be identified in our method. Also assume that we have computed the rank r of A, which is correct with high probability. Again, this will be certified in what follows; a small sketch of steps 1-4 appears after Theorem 5.1 below.

1. Inverting the leading minor.
Let A_0 be the leading r×r minor of A and partition A as

    A = \begin{bmatrix} A_0 & A_1 \\ A_2 & A_3 \end{bmatrix}.

Using the algorithm of the previous section, compute A_0^{−1}. If this fails because the leading r×r minor is singular, then either the randomized conditioning or the rank estimate has failed, and we either report this failure or try again with a different randomized preconditioning. If we can compute A_0^{−1}, then the rank of A is at least the estimated r.

2. Applying the inverted leading minor.
Compute A_0^{−1} A_1 ∈ F^{r×(n−r)} using the algorithm of the previous section (this could in fact be merged into the first step).

3. Confirming the nullspace.
Note that

    \begin{bmatrix} A_0 & A_1 \\ A_2 & A_3 \end{bmatrix} \begin{bmatrix} −A_0^{−1} A_1 \\ I \end{bmatrix} = \begin{bmatrix} 0 \\ A_3 − A_2 A_0^{−1} A_1 \end{bmatrix},

and the Schur complement A_3 − A_2 A_0^{−1} A_1 must be zero if the rank r is correct. This can be checked with n−r evaluations of the black box for A. We note that, because of its structure, N = \begin{bmatrix} −A_0^{−1} A_1 \\ I \end{bmatrix} has rank n−r.

4. Output rank and nullspace basis.
If the Schur complement is zero, then output the rank r and N, whose columns give a basis for the nullspace of A. Otherwise, output "fail" (and possibly retry with a different randomized preconditioning).

Theorem 5.1. Let A ∈ F^{n×n} have rank r. Then a basis for the nullspace of A, and the rank r of A, can be computed with an expected number of O˜(n^{2−1/(ω−1)}) applications of A to a vector, plus an additional expected number of O˜(n^{3−1/(ω−1)}) operations in F. The algorithm is probabilistic of the Las Vegas type.
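The following sketch (assuming sympy) mirrors steps 1-4 above on a small dense example. It inverts the leading minor directly rather than with the block-projection algorithm of Section 4, and it omits the random preconditioning of [17], so it only illustrates the certification logic.

```python
# Sketch of the nullspace/rank certification (steps 1-4); assumes sympy.
# In the algorithm proper, A_0 would be inverted with the block-projection
# method of Section 4 and A would first be preconditioned as in [17].
import random
from sympy import Matrix, eye, zeros

n, r = 6, 4
# Random matrix of rank r: generically, the leading r x r minor of a
# random rank-r product is non-singular; the checks below certify this.
A = Matrix(n, r, lambda i, j: random.randint(-9, 9)) * \
    Matrix(r, n, lambda i, j: random.randint(-9, 9))

A0, A1 = A[:r, :r], A[:r, r:]
if A0.det() == 0:
    print("fail: leading minor singular; retry with new preconditioning")
else:
    N = Matrix.vstack(-A0.inv() * A1, eye(n - r))  # candidate nullspace basis
    if A * N == zeros(n, n - r):                   # Schur complement check
        print("rank", r, "certified; nullspace basis has", N.cols, "columns")
    else:
        print("fail: rank estimate incorrect")
```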


6. APPLICATIONS TO SPARSE RATIONAL LINEAR SYSTEMS

Given a non-singular A ∈ Z^{n×n} and b ∈ Z^{n×1}, in [10] we presented an algorithm and implementation to compute A^{−1}b with O˜(n^{1.5} log(‖A‖ + ‖b‖)) matrix-vector products v ↦ Av mod p, for a machine-word-sized prime p and any v ∈ Z_p^{n×1}, plus O˜(n^{2.5} log(‖A‖ + ‖b‖)) additional bit operations. Assuming that A and b have constant-sized entries, and that a matrix-vector product by A mod p can be performed with O˜(n) operations modulo p, the algorithm presented could solve a system with O˜(n^{2.5}) bit operations. Unfortunately, this result was conditional upon the unproven Conjecture 2.1 of [10]: the existence of an efficient block projection. That conjecture was established in Corollary 2.3 of the current paper, so we can now unconditionally state the following theorem.

Theorem 6.1. Given any invertible A ∈ Z^{n×n} and b ∈ Z^{n×1}, we can compute A^{−1}b using a Las Vegas algorithm. The expected number of matrix-vector products v ↦ Av mod p is in O˜(n^{1.5} log(‖A‖ + ‖b‖)), and the expected number of additional bit operations used by this algorithm is in O˜(n^{2.5} log(‖A‖ + ‖b‖)).

Sparse Integer Determinant and Smith Form

The efficient block projection of Theorem 2.1 can also be incorporated relatively directly into the block baby-steps/giant-steps methods of [18] for computing the determinant of an integer matrix. This yields improved algorithms for the determinant and Smith form of a sparse integer matrix. Unfortunately, the new techniques do not obviously improve the asymptotic cost of their algorithms in the case for which they were designed, namely, for computations of the determinants of dense integer matrices.

We only sketch the method for computing the determinant here, following the algorithm in Section 4 of [18], and estimate its complexity. Throughout we assume that A ∈ Z^{n×n} is non-singular and that we can compute v ↦ Av with φ(n) integer operations, where the bit-lengths of these integers are bounded by O˜(log(n + ‖v‖ + ‖A‖)). (A small sketch of the sequence computation in step 3 appears at the end of this section.)

1. Preconditioning and setup.
Precondition A ↦ B = D_1 U A D_2, where D_1, D_2 are random diagonal matrices and U is a unimodular preconditioner from [22, §5]. While we will not provide the detailed analysis here, selecting coefficients for these randomly from a set S_1 of size n³ is sufficient to ensure a high probability of success. This preconditioning will ensure that all leading minors are non-singular and that the characteristic polynomial is squarefree with high probability (see [4, Theorem 4.3] for a proof of the latter condition). From Theorem 2.1, we also see that K_m(B, u) has full rank with high probability.

Let p be a prime that is larger than the a priori bound on the coefficients of the characteristic polynomial of A; this is easily determined to be (n log ‖A‖)^{n+o(1)}. Fix a blocking factor s, to be optimized later, and assume n = ms.

2. Choosing projections.
Let u ∈ Z^{n×s} be an efficient block projection as in (2.1) and v ∈ Z^{n×s} a random (dense) block projection with coefficients chosen from a set S_2 of size at least 2n².

3. Forming the sequence α_i = u^T A^i v ∈ Z^{s×s}.
Compute this sequence for i = 0, ..., 2m. Computing all the A^i v takes O˜(nφ(n) · m · log ‖A‖) bit operations. Computing all the u^T (A^i v) takes O˜(n² m log ‖A‖) bit operations.

4. Computing the minimal matrix generator.
The minimal matrix generator F^A(λ) modulo p can be computed from the initial sequence segment α_0, ..., α_{2m−1}. See [18, §4]. This can be accomplished with O˜(m s^ω · n log ‖A‖) bit operations.

5. Extracting the determinant.
Following the algorithm in [18, §4], we first check whether the degree of F^A is less than n and, if so, return "failure". Otherwise, we know det F^A(λ) = det(λI − A). Return det A = det F^A(0) mod p.

The correctness of the algorithm, and specifically of the block projections, follows from the fact that [u, Au, ..., A^{m−1}u] is of full rank with high probability, by Theorem 2.1. Because the projection v is dense, the analysis of [18, (2.6)] is applicable, and the minimal generating polynomial will have full degree m with high probability; hence its determinant at λ = 0 will be the determinant of A.

The total cost of this algorithm is O˜((nφ(n)m + n²m + nms^ω) log ‖A‖) bit operations, which is minimized when s = n^{1/ω}. This yields an algorithm for the determinant which requires O˜((n^{2−1/ω}φ(n) + n^{3−1/ω}) log ‖A‖) bit operations. This is probably most interesting when ω = 3, where it yields an algorithm for the determinant that requires O˜(n^{2.66} log ‖A‖) bit operations on a matrix with pseudo-linear-cost matrix-vector product.

We also note that a similar approach allows us to use the Monte Carlo Smith form algorithm of [13], which works by computing the characteristic polynomial of random preconditionings of a matrix. This reduction is explored in [18] in the dense matrix setting. The upshot is that we obtain the Smith form with the same order of complexity, to within a poly-logarithmic factor, as we have obtained the determinant using the above techniques. See [18, §7.1] and [13] for details. We make no claim that this is practical in its present form.


Note: a referee has indicated that a "lifting" algorithm of Pan et al. [20] can also be used to solve integer systems when efficient matrix-vector products (modulo small primes) are supported for both the coefficient matrix and its inverse. This would provide an alternate application of our central results to solving integer systems. We wish to thank the referee for this information.

APPENDIX

A. APPLYING THE INVERSE OF A BLOCK-HANKEL MATRIX

In this appendix we address asymptotically fast techniques for computing a representation of the inverse of a block-Hankel matrix, and for applying this inverse to an arbitrary matrix. The fundamental technique we employ is the off-diagonal inversion formula of Beckermann and Labahn [1] and its fast variants [14]. An alternative to using the inversion formula would be to use the generalization of the Levinson-Durbin algorithm in [16].

Again assume n = ms for integers m and s, and let

    H = \begin{bmatrix}
    \alpha_1 & \alpha_2 & \cdots & \alpha_m \\
    \alpha_2 & \alpha_3 & & \vdots \\
    \vdots & & \ddots & \alpha_{2m-2} \\
    \alpha_m & \cdots & \alpha_{2m-2} & \alpha_{2m-1}
    \end{bmatrix} ∈ F^{n×n}    (A.1)

be a non-singular block-Hankel matrix whose blocks are s×s matrices over F, and let α_{2m} be arbitrary in F^{s×s}. We follow the approach of [19] for computing the inverse matrix H^{−1}. Since H is invertible, the following four linear systems (see [19, (3.8)-(3.11)])

    H [q_{m−1}, ..., q_0]^t = [0, ..., 0, I]^t ∈ F^{n×s},
    H [v_m, ..., v_1]^t = −[α_{m+1}, ..., α_{2m−1}, α_{2m}]^t ∈ F^{n×s},    (A.2)

and

    [q*_{m−1}, ..., q*_0] H = [0, ..., 0, I] ∈ F^{s×n},
    [v*_m, ..., v*_1] H = −[α_{m+1}, ..., α_{2m−1}, α_{2m}] ∈ F^{s×n},    (A.3)

have unique solutions given by the q_k, q*_k (for 0 ≤ k ≤ m−1) and the v_k, v*_k (for 1 ≤ k ≤ m). We then obtain the following formula (see [19, Theorem 3.1]):

    H^{−1} = \begin{bmatrix}
    v_m & \cdots & v_1 & I \\
    \vdots & \iddots & \iddots & \\
    v_1 & \iddots & & \\
    I & & &
    \end{bmatrix}
    \begin{bmatrix}
    q^*_{m-1} & \cdots & q^*_0 \\
    & \ddots & \vdots \\
    & & q^*_{m-1}
    \end{bmatrix}
    −
    \begin{bmatrix}
    q_{m-2} & \cdots & q_0 & 0 \\
    \vdots & \iddots & \iddots & \\
    q_0 & \iddots & & \\
    0 & & &
    \end{bmatrix}
    \begin{bmatrix}
    v^*_m & \cdots & v^*_1 \\
    & \ddots & \vdots \\
    & & v^*_m
    \end{bmatrix}.    (A.4)

The linear systems (A.2) and (A.3) may also be formulated in terms of matrix Padé approximation problems. We associate to H the matrix polynomial A = Σ_{i=0}^{2m} α_i x^i ∈ F^{s×s}[x]. The s×s matrix polynomials Q, P, Q*, P* in F^{s×s}[x] that satisfy

    A(x) Q(x) ≡ P(x) + x^{2m−1} mod x^{2m}, where deg Q ≤ m−1 and deg P ≤ m−2,
    Q*(x) A(x) ≡ P*(x) + x^{2m−1} mod x^{2m}, where deg Q* ≤ m−1 and deg P* ≤ m−2,    (A.5)

are unique and provide the coefficients Q = Σ_{i=0}^{m−1} q_i x^i and Q* = Σ_{i=0}^{m−1} q*_i x^i for constructing H^{−1} using (A.4) (see [19, Theorem 3.1]). The notation "mod x^i" for i ≥ 0 indicates that the terms of degree i or higher are ignored. The s×s matrix polynomials V, U, V*, U* in F^{s×s}[x] that satisfy

    A(x) V(x) ≡ U(x) mod x^{2m+1}, V(0) = I, where deg V ≤ m and deg U ≤ m−1,
    V*(x) A(x) ≡ U*(x) mod x^{2m+1}, V*(0) = I, where deg V* ≤ m and deg U* ≤ m−1,    (A.6)

are unique and provide the coefficients V = I + Σ_{i=1}^m v_i x^i and V* = I + Σ_{i=1}^m v*_i x^i for (A.4).

Using the matrix Padé formulation, the matrices Q, Q*, V, and V* may be computed using the σ-basis algorithm in [1], or its fast counterpart in [14, §2.2] that uses fast matrix multiplication. For solving (A.5), the σ-basis algorithm with σ = s(2m−1) solves

    [A  −I] \begin{bmatrix} Q̄ \\ P̄ \end{bmatrix} ≡ R̄ x^{2m−1} mod x^{2m},
    [Q̄*  P̄*] \begin{bmatrix} A \\ −I \end{bmatrix} ≡ R̄* x^{2m−1} mod x^{2m},

with Q̄, P̄, Q̄*, P̄* ∈ F^{s×s}[x] that satisfy the degree constraints deg Q̄ ≤ m−1, deg Q̄* ≤ m−1, and deg P̄ ≤ m−2, deg P̄* ≤ m−2. The residue matrices R̄ and R̄* in F^{s×s} are non-singular, hence Q = Q̄ R̄^{−1} and Q* = (R̄*)^{−1} Q̄* are the solutions Q and Q* for applying the inversion formula (A.4). For (A.6), the σ-basis algorithm with σ = s(2m+1) leads to

    [A  −I] \begin{bmatrix} V̄ \\ Ū \end{bmatrix} ≡ 0 mod x^{2m+1},
    [V̄*  Ū*] \begin{bmatrix} A \\ −I \end{bmatrix} ≡ 0 mod x^{2m+1},

with deg V̄ ≤ m, deg V̄* ≤ m, and deg Ū ≤ m−1, deg Ū* ≤ m−1. The constant terms V̄(0) and V̄*(0) in F^{s×s} are non-singular, hence V = V̄ (V̄(0))^{−1} and V* = (V̄*(0))^{−1} V̄* are solutions for applying (A.4). Using Theorem 2.4 in [14] together with the above material, we get the following cost estimate.

Proposition A.1. Computing the expression (A.4) of the inverse of the block-Hankel matrix (A.1) reduces to multiplying matrix polynomials of degree O(m) in F^{s×s}[x], and can be done with O˜(s^ω m) operations in F.

Multiplying a block triangular Toeplitz or Hankel matrix in F^{n×n} with blocks of size s×s by a matrix in F^{n×n} reduces to the product of two matrix polynomials of degree O(m), of dimensions s×s and s×n. Using the fast algorithms in [3] or [2], such an s×s product can be done in O˜(s^ω m) operations. By splitting the s×n matrix into s×s blocks, the s×s by s×n product can thus be done in O˜(m · s^ω · m) = O˜(s^ω m²) operations.

For n = s^γ, let ω(1, 1, γ) be the exponent of the problem of s×s by s×n matrix multiplication over F. The splitting considered just above of the s×n matrix into s×s blocks corresponds to taking ω(1, 1, γ) = ω + γ − 1 < γ + 1.376 (ω < 2.376 due to [5]), with total cost O˜(s^{ω(1,1,γ)} m) = O˜(s^ω m²). Depending on γ ≥ 1, a slightly smaller bound than γ + 1.376 for ω(1, 1, γ) may be used, due to the matrix multiplication techniques specifically designed for rectangular matrices in [15]. This is true as soon as γ ≥ 1.171, and gives for example ω(1, 1, γ) < γ + 1.334 for γ = 2, i.e., for s = √n.

Corollary A.2. Let H be the block-Hankel matrix of (A.1). If the representation (A.4) of H^{−1} is given, then computing H^{−1}M for an arbitrary M ∈ F^{n×n} reduces to four s×s by s×n products of polynomial matrices of degree O(m). This can be done with O˜(s^{ω(1,1,γ)} m) or O˜(s^ω m²) operations in F (n = s^γ = ms).


B. REFERENCES

[1] B. Beckermann and G. Labahn. A uniform approach for the fast computation of matrix-type Padé approximants. SIAM J. Matrix Anal. Appl., 15(3):804-823, July 1994.
[2] A. Bostan and E. Schost. Polynomial evaluation and interpolation on special sets of points. J. Complex., 21(4):420-446, 2005.
[3] D. Cantor and E. Kaltofen. Fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693-701, 1991.
[4] L. Chen, W. Eberly, E. Kaltofen, B. D. Saunders, W. J. Turner, and G. Villard. Efficient matrix preconditioners for black box linear algebra. Linear Algebra and its Applications, 343-344:119-146, 2002.
[5] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. J. Symb. Comp., 9:251-280, 1990.
[6] J. D. Dixon. Exact solution of linear equations using p-adic expansions. Numerische Mathematik, 40:137-141, 1982.
[7] J.-G. Dumas, T. Gautier, M. Giesbrecht, P. Giorgi, B. Hovinen, E. Kaltofen, B. D. Saunders, W. J. Turner, and G. Villard. LinBox: A generic library for exact linear algebra. In Arjeh M. Cohen, Xiao-Shan Gao, and Nobuki Takayama, editors, Proceedings of the 2002 International Congress of Mathematical Software, Beijing, China, pages 40-50. World Scientific, August 2002.
[8] J.-G. Dumas, B. D. Saunders, and G. Villard. Integer Smith form via the valence: experience with large sparse matrices from homology. In ISSAC '00: Proceedings of the 2000 International Symposium on Symbolic and Algebraic Computation, pages 95-105, New York, NY, USA, 2000. ACM Press.
[9] W. Eberly. Processor-efficient parallel matrix inversion over abstract fields: two extensions. In Proceedings, PASCO '97, pages 38-45, New York, NY, USA, 1997. ACM Press.
[10] W. Eberly, M. Giesbrecht, P. Giorgi, A. Storjohann, and G. Villard. Solving sparse rational linear systems. In ISSAC '06: Proceedings of the 2006 International Symposium on Symbolic and Algebraic Computation, pages 63-70, New York, NY, USA, 2006. ACM Press.
[11] I. Z. Emiris and V. Y. Pan. Improved algorithms for computing determinants and resultants. J. Complex., 21(1):43-71, 2005.
[12] M. Giesbrecht. Efficient parallel solution of sparse systems of linear diophantine equations. In Proceedings, PASCO '97, pages 1-10, 1997.
[13] M. Giesbrecht. Fast computation of the Smith form of a sparse integer matrix. Computational Complexity, 10(1):41-69, 2004.
[14] P. Giorgi, C.-P. Jeannerod, and G. Villard. On the complexity of polynomial matrix computations. In Rafael Sendra, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, Philadelphia, Pennsylvania, USA, pages 135-142. ACM Press, New York, August 2003.
[15] X. Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complex., 14(2):257-299, 1998.
[16] E. Kaltofen. Analysis of Coppersmith's block Wiedemann algorithm for the parallel solution of sparse linear systems. Mathematics of Computation, 64(210):777-806, April 1995.
[17] E. Kaltofen and B. D. Saunders. On Wiedemann's method of solving sparse linear systems. In Proc. AAECC-9, volume 539 of Springer Lecture Notes in Comp. Sci., pages 29-38, 1991.
[18] E. Kaltofen and G. Villard. On the complexity of computing determinants. Computational Complexity, 13(3-4):91-130, 2004.
[19] G. Labahn, D. K. Choi, and S. Cabay. The inverses of block Hankel and block Toeplitz matrices. SIAM J. Comput., 19(1):98-123, 1990.
[20] V. Y. Pan, B. Murphy, R. E. Rosholt, and X. Wang. Toeplitz and Hankel meet Hensel and Newton: Nearly optimal algorithms and their practical acceleration with saturated initialization. Technical Report 2004 013, The Graduate Center, CUNY, New York, 2004.
[21] J. T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. J. Assoc. Computing Machinery, 27:701-717, 1980.
[22] D. H. Wiedemann. Solving sparse linear equations over finite fields. IEEE Transactions on Information Theory, 32(1):54-62, January 1986.
[23] R. Zippel. Probabilistic algorithms for sparse polynomials. In Proc. EUROSAM 79, pages 216-226, Marseille, 1979.
