A fast and nearly division-free algorithm for the characteristic polynomial

Fredrik Johansson

To cite this version:

Fredrik Johansson. A fast and nearly division-free algorithm for the characteristic polynomial. 2020. hal-03016034v1

HAL Id: hal-03016034 https://hal.inria.fr/hal-03016034v1 Preprint submitted on 20 Nov 2020 (v1), last revised 24 Nov 2020 (v3)

A FAST AND NEARLY DIVISION-FREE ALGORITHM FOR THE CHARACTERISTIC POLYNOMIAL

FREDRIK JOHANSSON

Abstract. We give a simple $O(n^{3.5})$ algorithm for computing the characteristic polynomial, determinant and adjugate of an $n \times n$ matrix using only ring operations together with exact divisions by small integers. The method is a baby-step giant-step version of the Faddeev-Leverrier algorithm.

1. Introduction

Let $R$ be a commutative ring. Denote by $\omega > 2$ an exponent of matrix multiplication, meaning that we can multiply two $n \times n$ matrices using $O(n^{\omega})$ ring operations (additions, subtractions and multiplications). Assume additionally that $R$ has a unit and that the characteristic of $R$ is 0 or coprime to $1, 2, \ldots, n$. We shall describe a simple algorithm that achieves the following:

Theorem 1. The characteristic polynomial, determinant and adjugate of a matrix $A \in R^{n \times n}$ can be computed using $O(n^{\omega+0.5} + n^3)$ ring operations in $R$ and $n$ exact divisions of elements in $R$ by integers $1 \le k \le n$.

The complexity reduces to $O(n^{3.31})$ with Strassen multiplication ($\omega = \log_2 7$), and to $O(n^3)$ with the asymptotically fastest currently known multiplication algorithm ($\omega = 2.3728639$, so that $\omega + 0.5 < 3$), due to Le Gall [9]. Since Strassen's algorithm tends to give modest savings and Le Gall's algorithm is completely impractical, the $O(n^{3.5})$ complexity with classical $\omega = 3$ multiplication is perhaps the most relevant metric.

The method is a baby-step giant-step improvement of the Faddeev-Leverrier algorithm, which in its original form (Algorithm 1) achieves the same result with complexity $O(n^{\omega+1})$. Although our improvement (Algorithm 2) is elementary, it has apparently never been published. Csanky [6] describes a parallel version of the Faddeev-Leverrier algorithm; Berkowitz [2] states that this can be turned into an $O(n^{\omega+0.5} \log n)$ method using baby-step giant-step techniques, citing private communication with S. Winograd, but does not describe such an algorithm in detail. Since all descriptions of the Faddeev-Leverrier algorithm that we have found in the literature present it as an $O(n^4)$ or $O(n^{\omega+1})$ algorithm, it seems appropriate to document an improved version and study its performance.

If $R$ is a field, then the determinant, adjugate and characteristic polynomial can be computed in $O(n^{\omega})$ operations if divisions are allowed [8, 21, 1, 19].
The same is also true if $R$ is an integral domain (for example, by working in its fraction field). The key point of Theorem 1 is that the algorithm is nearly division-free, in the sense that we do not divide by general elements of $R$ (all divisors are integers, and the divisions are exact in the sense that the quotients remain in $R$). It is also branch-free: we do not need to determine whether elements in $R$ are zero. It is an open problem whether an $O(n^{\omega})$ algorithm is possible with these constraints.

The obvious division-free algorithm is cofactor expansion, which uses $O(n!)$ operations. It is mainly interesting for $n \le 4$ and for sparse symbolic matrices. The Berkowitz algorithm [2] achieves complexity $O(n^{\omega+1})$ without any divisions, and at least with classical multiplication it is faster than the original Faddeev-Leverrier algorithm by a constant factor, since it operates on triangular matrices rather than full matrices. Our improved Faddeev-Leverrier algorithm overcomes this disadvantage asymptotically. It is an open problem (already posed in [2]) whether the Berkowitz algorithm admits a similar baby-step giant-step speedup.

Neither Theorem 1 nor the complexity claimed by Berkowitz is the best result available for linear algebra over rings. Kaltofen and Villard have shown [17, 18, 26] that the determinant and adjoint of a matrix over any commutative ring can be computed without divisions in $n^{\eta + o(1)}$ operations, where $\eta = \omega + 1/((\omega - 1)^2 + 1)$, and similarly the bound $n^{\chi + o(1)}$ with $\chi = \omega + 2/(\omega^2 - \omega + 1)$ holds for the characteristic polynomial. This gives $\eta = \omega + 0.2$ and $\chi \approx \omega + 0.29$ with classical multiplication ($\omega = 3$), and $\eta \approx \omega + 0.35$, $\chi \approx \omega + 0.47$ with Le Gall's algorithm.¹ The Kaltofen-Villard algorithm is quite complex, however, requiring (among other things) FFT-based multiplication of power series matrices. Theorem 1 has a slightly worse exponent and requires division by integers, but the algorithm can be implemented in a few lines of code using only matrix multiplication as a building block.
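As a quick arithmetic check of the exponents quoted above, the following Python sketch evaluates the Theorem 1 exponent $\max(\omega + 0.5, 3)$ and the Kaltofen-Villard exponents $\eta$ and $\chi$ for the three values of $\omega$ discussed in the text. The helper names `theorem1_exponent`, `kv_eta` and `kv_chi` are hypothetical and chosen only for this illustration.

```python
import math


def theorem1_exponent(omega: float) -> float:
    # Theorem 1: O(n^(omega + 0.5) + n^3) ring operations; the larger term dominates.
    return max(omega + 0.5, 3.0)


def kv_eta(omega: float) -> float:
    # Division-free determinant/adjoint exponent: eta = omega + 1/((omega - 1)^2 + 1).
    return omega + 1.0 / ((omega - 1.0) ** 2 + 1.0)


def kv_chi(omega: float) -> float:
    # Division-free characteristic polynomial exponent: chi = omega + 2/(omega^2 - omega + 1).
    return omega + 2.0 / (omega ** 2 - omega + 1.0)


for name, omega in [("classical", 3.0), ("Strassen", math.log2(7)), ("Le Gall", 2.3728639)]:
    print(f"{name:10s} omega={omega:.4f}  Theorem 1: {theorem1_exponent(omega):.3f}  "
          f"eta-omega={kv_eta(omega) - omega:.3f}  chi-omega={kv_chi(omega) - omega:.3f}")
```

The printed offsets reproduce the rounded values $0.2$, $0.29$, $0.35$ and $0.47$, and show that with Le Gall's $\omega$ the $n^3$ term of Theorem 1 dominates.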

2. The Faddeev-Leverrier algorithm

We recall the Faddeev-Leverrier algorithm (see [25, 12, 6, 13]; Alg. 2.2.7 in [5]) for computing the characteristic polynomial $p_A(x) = c_n x^n + \cdots + c_1 x + c_0$ of a square matrix $A$. The algorithm is based on the recursion
$$c_{n-k} = -\frac{1}{k} \sum_{j=1}^{k} c_{n-k+j} \operatorname{Tr}(A^j):$$
we compute a sequence of matrices (stored in the variable $B$) through repeated multiplication by $A$, and in each step extract a trace. The determinant and the adjugate matrix appear as byproducts of this process: $(-1)^n \det(A)$ is the last coefficient $c_0$, and $(-1)^{n+1} \operatorname{adj}(A)$ is the penultimate entry in the matrix sequence.

Algorithm 1 Faddeev-Leverrier algorithm
Input: $A \in R^{n \times n}$, where $n \ge 1$ and $R$ is a commutative ring having a unit element and characteristic 0 or characteristic coprime to $1, 2, \ldots, n$
Output: $(p_A(x), \det(A), \operatorname{adj}(A))$
1: $c_n = 1$, $B \leftarrow I$, $k \leftarrow 1$
2: while $k \le n - 1$ do
3:     $B \leftarrow AB$
4:     $c_{n-k} \leftarrow -\frac{1}{k} \operatorname{Tr}(B)$
5:     $B \leftarrow B + c_{n-k} I$
6:     $k \leftarrow k + 1$
7: end while
8: $c_0 \leftarrow -\frac{1}{n} \operatorname{Tr}(AB)$
9: return $(c_0 + c_1 x + \cdots + c_n x^n,\ (-1)^n c_0,\ (-1)^{n+1} B)$
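As a small worked instance of the recursion (an illustration, not part of the original presentation), take $n = 2$ and $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$. Using $\operatorname{Tr}(A^2) = a_{11}^2 + 2a_{12}a_{21} + a_{22}^2$, the two steps give
$$c_1 = -\operatorname{Tr}(A) = -(a_{11} + a_{22}),$$
$$c_0 = -\tfrac{1}{2}\bigl(c_1 \operatorname{Tr}(A) + c_2 \operatorname{Tr}(A^2)\bigr) = \tfrac{1}{2}\bigl(\operatorname{Tr}(A)^2 - \operatorname{Tr}(A^2)\bigr) = a_{11}a_{22} - a_{12}a_{21},$$
so the division by 2 is exact and $(-1)^2 c_0 = \det(A)$. The penultimate matrix is $B = A + c_1 I$, and
$$(-1)^{3} B = \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix} = \operatorname{adj}(A).$$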

The code can be tightened assuming $n \ge 2$, in which case the line before the start of the loop can be changed to $\{c_n = 1,\ c_{n-1} = -\operatorname{Tr}(A),\ B \leftarrow A + c_{n-1} I,\ k \leftarrow 2\}$.
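A direct transcription of Algorithm 1 may help when experimenting. The sketch below assumes $R = \mathbb{Z}$ (plain Python integers, classical matrix multiplication); the helper `exact_div` is a hypothetical name that checks, rather than assumes, that each division by $k$ is exact.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def identity(n: int) -> Matrix:
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def trace(M: Matrix) -> int:
    return sum(M[i][i] for i in range(len(M)))


def exact_div(x: int, k: int) -> int:
    # Exact division by a small integer; the theory guarantees that k divides x.
    q, r = divmod(x, k)
    assert r == 0, "division was not exact"
    return q


def faddeev_leverrier(A: Matrix) -> Tuple[List[int], int, Matrix]:
    """Algorithm 1 over Z: returns (c_0..c_n, det(A), adj(A))."""
    n = len(A)
    c = [0] * (n + 1)
    c[n] = 1
    B = identity(n)
    for k in range(1, n):                       # while k <= n - 1
        B = matmul(A, B)                        # B <- A B
        c[n - k] = -exact_div(trace(B), k)      # c_{n-k} <- -Tr(B) / k
        for i in range(n):                      # B <- B + c_{n-k} I
            B[i][i] += c[n - k]
    c[0] = -exact_div(trace(matmul(A, B)), n)
    sign = -1 if n % 2 else 1                   # (-1)^n
    det = sign * c[0]
    adj = [[-sign * x for x in row] for row in B]   # (-1)^(n+1) B
    return c, det, adj
```

For example, `faddeev_leverrier([[2, 1], [1, 3]])` returns `([5, -5, 1], 5, [[3, -1], [-1, 2]])`, i.e. $p_A(x) = x^2 - 5x + 5$.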

¹ Slightly better $\eta$ and $\chi$ can be given by using fast rectangular matrix multiplication.

We can change the loop condition to $k \le n$ and remove the line after the loop if we omit returning $\operatorname{adj}(A)$. It is easy to see that Algorithm 1 performs $O(n)$ matrix multiplications and $O(n^2)$ additional arithmetic operations. The condition on the characteristic of $R$ ensures that we can divide exactly by each $k$, i.e. $(xk)/k = x$ holds for all $x \in R$ and $k \le n$.

3. The improved Faddeev-Leverrier algorithm

Algorithm 1 computes a sequence of $n$ matrices but only extracts a small amount of unique information (the trace) from each matrix. In such a scenario, we can often save time using a baby-step giant-step approach in which we only compute $O(\sqrt{n})$ products explicitly (see [20, 3, 2, 14] for a few examples of this technique). The key observation in this instance is that, given matrices $A$ and $B$, we can compute $\operatorname{Tr}(AB)$ using $O(n^2)$ operations without forming the complete matrix product $AB$, by simply evaluating the dot products for the main diagonal of $AB$. We denote this product trace operation by $\operatorname{Tr}(A, B)$. We now choose a step size $m \approx \sqrt{n}$, expand the loop in Algorithm 1 to group $m$ iterations together, and precompute the matrix powers and traces that appear repeatedly. Concretely, if $B$ holds $B_{k-1} = \sum_{i=0}^{k-1} c_{n-i} A^{k-1-i}$ at the start of a group, then $\operatorname{Tr}(A^{j+1}, B_{k-1})$ supplies all terms of the recursion for $c_{n-k-j}$ that involve $c_n, \ldots, c_{n-k+1}$, while the remaining terms $c_{n-k-i} \operatorname{Tr}(A^{j-i})$, $0 \le i < j$, only require the precomputed traces of $A, \ldots, A^m$. This results in Algorithm 2.
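The product trace $\operatorname{Tr}(A, B)$ described above only needs the $n$ diagonal dot products of $AB$, i.e. $n^2$ multiplications instead of the $n^3$ of a classical full product. A minimal sketch, assuming dense matrices stored as Python lists of rows (the name `trace_product` is hypothetical):

```python
from typing import List


def trace_product(A: List[List[int]], B: List[List[int]]) -> int:
    """Tr(A, B) = Tr(AB) = sum_i sum_l A[i][l] * B[l][i], without forming AB."""
    n = len(A)
    return sum(A[i][l] * B[l][i] for i in range(n) for l in range(n))
```

For instance, with $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ and $B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}$ the diagonal of $AB$ is $(19, 50)$, so `trace_product` returns 69.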

Algorithm 2 Baby-step giant-step Faddeev-Leverrier algorithm
Input: $A \in R^{n \times n}$, where $n \ge 1$ and $R$ is a commutative ring having a unit element and characteristic 0 or characteristic coprime to $1, 2, \ldots, n$
Output: $(p_A(x), \det(A), \operatorname{adj}(A))$
1: $m \leftarrow \lfloor \sqrt{n} \rfloor$
2: Precompute the matrices $A^1, A^2, A^3, \ldots, A^m$
3: Precompute the traces $t_k = \operatorname{Tr}(A^k)$ for $k = 1, \ldots, m$
4: $c_n = 1$, $B \leftarrow I$, $k \leftarrow 1$
5: while $k \le n - 1$ do
6:     $m \leftarrow \min(m, n - k)$
7:     $c_{n-k} \leftarrow -\frac{1}{k} \operatorname{Tr}(A, B)$
8:     for $j \leftarrow 1, 2, \ldots, m - 1$ do
9:         $c_{n-k-j} \leftarrow \operatorname{Tr}(A^{j+1}, B)$    ⊲ Using precomputed power of $A$
10:        for $i \leftarrow 0, 1, \ldots, j - 1$ do
11:            $c_{n-k-j} \leftarrow c_{n-k-j} + t_{j-i}\, c_{n-k-i}$
12:        end for
13:        $c_{n-k-j} \leftarrow c_{n-k-j} / (-k - j)$
14:    end for
15:    $B \leftarrow A^m B$    ⊲ Using precomputed power of $A$
16:    for $j \leftarrow 0, 1, \ldots, m - 1$ do
17:        $B \leftarrow B + c_{n-k-j} A^{m-j-1}$    ⊲ Using precomputed power, or $A^0 = I$
18:    end for
19:    $k \leftarrow k + m$
20: end while
21: $c_0 \leftarrow -\frac{1}{n} \operatorname{Tr}(A, B)$
22: return $(c_0 + c_1 x + \cdots + c_n x^n,\ (-1)^n c_0,\ (-1)^{n+1} B)$
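The following sketch transcribes Algorithm 2 line by line, again assuming $R = \mathbb{Z}$ with plain Python integers and a naive matrix product standing in for a fast library routine. Comments refer to the line numbers of Algorithm 2; the helper names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[int]]


def identity(n: int) -> Matrix:
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    # Classical O(n^3) product; a real implementation would use a fast library routine.
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def trace_product(A: Matrix, B: Matrix) -> int:
    # Tr(A, B) = Tr(AB) from the n diagonal dot products only (O(n^2) work).
    n = len(A)
    return sum(A[i][l] * B[l][i] for i in range(n) for l in range(n))


def exact_div(x: int, k: int) -> int:
    q, r = divmod(x, k)
    assert r == 0, "division was not exact"
    return q


def faddeev_leverrier_bsgs(A: Matrix) -> Tuple[List[int], int, Matrix]:
    """Algorithm 2 over Z: returns (c_0..c_n, det(A), adj(A))."""
    n = len(A)
    m = math.isqrt(n)                          # line 1: m = floor(sqrt(n))
    powers = [identity(n), A]                  # line 2: powers[j] = A^j, j = 0..m
    for _ in range(2, m + 1):
        powers.append(matmul(A, powers[-1]))
    # line 3: t[j] = Tr(A^j) for j = 1..m (t[0] is unused)
    t = [0] + [sum(P[i][i] for i in range(n)) for P in powers[1:]]
    c = [0] * (n + 1)
    c[n] = 1                                   # line 4
    B = identity(n)
    k = 1
    while k <= n - 1:                          # line 5
        m = min(m, n - k)                      # line 6: shorten the last group
        c[n - k] = -exact_div(trace_product(A, B), k)           # line 7
        for j in range(1, m):                                   # lines 8-14
            s = trace_product(powers[j + 1], B)
            for i in range(j):
                s += t[j - i] * c[n - k - i]
            c[n - k - j] = exact_div(s, -(k + j))
        B = matmul(powers[m], B)               # line 15: giant step
        for j in range(m):                     # lines 16-18
            P = powers[m - j - 1]
            B = [[B[r][q] + c[n - k - j] * P[r][q] for q in range(n)]
                 for r in range(n)]
        k += m                                 # line 19
    c[0] = -exact_div(trace_product(A, B), n)                    # line 21
    sign = -1 if n % 2 else 1                  # (-1)^n
    return c, sign * c[0], [[-sign * x for x in row] for row in B]   # line 22
```

Since $m = \lfloor \sqrt{n} \rfloor = 1$ for $n \le 3$, small inputs exercise only the Algorithm 1 path; the grouped code path (and the shortened last group) first appears for $n \ge 4$, so testing with, say, a $4 \times 4$ triangular matrix or a $5 \times 5$ companion matrix is a sensible sanity check.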

The complexity bound in Theorem 1 is clear from inspection, since we perform $O(m + n/m) = O(\sqrt{n})$ matrix multiplications of size $n \times n$, and $O(n^3)$ arithmetic operations in the remaining steps (the $n$ product traces and the $O(n)$ scaled matrix additions in lines 16-18 each cost $O(n^2)$). The remaining $O(n^3)$ operations can presumably be grouped into matrix multiplications with further rearrangements; this might lead to the complexity bound suggested by Berkowitz, but we have not attempted to pursue such an analysis since any asymptotic savings would be irrelevant for practical computations. As an observation for implementations, the matrix-matrix multiplications and product traces are done with invariant operands that get recycled $O(\sqrt{n})$ times. This can be exploited for preconditioning purposes, for instance by packing the data more efficiently for arithmetic operations. We also note that the optimal $m$ may depend on the application, and a smaller value will reduce memory consumption.
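To make the $O(m + n/m)$ count concrete, the sketch below counts the full $n \times n$ products performed by Algorithm 2 as written: $m - 1$ baby steps for $A^2, \ldots, A^m$ and one giant step $B \leftarrow A^m B$ per pass of the while loop. The function name `matmul_count` is hypothetical, and the count includes the first giant step even though it multiplies by $B = I$ and could be skipped.

```python
import math


def matmul_count(n: int, m: int) -> int:
    """Full n x n products in Algorithm 2 for a fixed step size m."""
    baby_steps = m - 1                       # A^2, ..., A^m
    giant_steps = math.ceil((n - 1) / m)     # passes of the while loop (k += m)
    return baby_steps + giant_steps


n = 100
best = min(range(1, n + 1), key=lambda m: matmul_count(n, m))
print(math.isqrt(n), matmul_count(n, math.isqrt(n)), best, matmul_count(n, best), matmul_count(n, 1))
```

For $n = 100$, both $m = \lfloor \sqrt{n} \rfloor = 10$ and the first minimizer $m = 9$ give 19 products, against 99 for $m = 1$ (which degenerates to Algorithm 1). The curve is flat near the minimum, which is consistent with the remark that a somewhat smaller $m$ (storing fewer powers of $A$) costs little.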

4. Applicability and performance evaluation

When, if ever, does it make sense to use Algorithm 2? We can immediately discard some applications:
• For computing over $\mathbb{R}$ and $\mathbb{C}$ in ordinary floating-point arithmetic, the Faddeev-Leverrier algorithm is slower and far less numerically stable than textbook techniques with $O(n^3)$ or better complexity, such as reduction to Hessenberg form [22, 27].
• For finite fields, classical $O(n^3)$ methods using divisions have no drawbacks, and linear algebra with $O(n^{2.81})$ Strassen complexity is well established [7]. Over rings with small characteristic, the applicability of Algorithm 2 is in any case limited due to the integer divisions.

Generally speaking, division-free or nearly division-free algorithms are interesting for rings and fields $R$ where dividing recklessly can lead to coefficient explosion (for example, $\mathbb{Q}$) or in which testing for zero is problematic (for example, exact models of $\mathbb{R}$). The optimal approach in such situations is usually to avoid computing directly in $R$, for example by using modular arithmetic and interpolation techniques, but such indirect methods are more difficult to implement and must typically be designed on a case-by-case basis. By contrast, Algorithm 2 is easy to use anywhere. We will now look at the results of some implementation experiments.

4.1. Integers. For exact linear algebra over $\mathbb{Z}$ and $\mathbb{Q}$, the best methods are generally fraction-free versions of classical algorithms (such as the Bareiss version of Gaussian elimination) for small $n$, and multimodular or $p$-adic methods for large $n$ (see for example [8, 21]). We do not expect Algorithm 2 to beat those algorithms, but it is instructive to examine its performance. Table 1 shows timings for computing a determinant, inverse or characteristic polynomial of an $n \times n$ matrix over $\mathbb{Z}$ with random entries in $\{-10, \ldots, 10\}$, using the following algorithms:
• FFLU: fraction-free LU factorization using the Bareiss algorithm.
• FFLU2: as above, but using the resulting decomposition to compute $A^{-1}$ (equivalently determining $\operatorname{adj}(A)$) by solving $AA^{-1} = I$.
• ModDet: a multimodular algorithm for the determinant.
• ModInv: a multimodular algorithm for the inverse matrix.
• ModCP: a multimodular algorithm for the characteristic polynomial.
• Berk: the Berkowitz algorithm for the characteristic polynomial.
• Alg1: Faddeev-Leverrier, Algorithm 1.
• Alg2: baby-step giant-step Faddeev-Leverrier, Algorithm 2.

Table 1. Time in seconds to compute the characteristic polynomial (C), determinant (D) and adjugate/inverse (A) of an $n \times n$ matrix over $\mathbb{Z}$ with random elements in $\{-10, \ldots, 10\}$, using various algorithms.

n | FFLU (D) | ModDet (D) | FFLU2 (DA) | ModInv (A) | ModCP (CD) | Berk (CD) | Alg1 (CDA) | Alg2 (CDA)
10 | 0.0000060 | 0.000021 | 0.000015 | 0.00012 | 0.000016 | 0.000035 | 0.000015 | 0.000030
20 | 0.000036 | 0.000078 | 0.00043 | 0.00096 | 0.00011 | 0.0010 | 0.00086 | 0.00061
50 | 0.0023 | 0.0012 | 0.011 | 0.016 | 0.0039 | 0.048 | 0.052 | 0.017
100 | 0.039 | 0.0068 | 0.14 | 0.18 | 0.055 | 0.84 | 1.1 | 0.22
200 | 0.64 | 0.044 | 2.3 | 1.8 | 0.89 | 16 | 27 | 4.1
300 | 3.4 | 0.15 | 13 | 9.0 | 4.6 | 94 | 174 | 20
400 | 12 | 0.38 | 45 | 22 | 15 | 321 | 696 | 66
500 | 32 | 0.77 | 127 | 52 | 37 | 900 | 2057 | 150

We implemented Alg1 and Alg2 on top of Flint [10], while all the other tested algorithms are built-in Flint methods.
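For readers less familiar with the FFLU baseline, here is a textbook sketch of Bareiss fraction-free elimination for the determinant over $\mathbb{Z}$. It is our own illustration and makes no claim about how Flint's built-in routine is organized. Every division by the previous pivot is exact, but unlike Algorithm 2 the method must branch on zero pivots (here by searching for a row swap), which is exactly the kind of zero test that becomes problematic in ball arithmetic.

```python
from typing import List


def bareiss_det(M: List[List[int]]) -> int:
    """Determinant over Z by fraction-free (Bareiss) elimination."""
    A = [row[:] for row in M]
    n = len(A)
    sign, prev = 1, 1
    for k in range(n - 1):
        if A[k][k] == 0:                      # zero pivot: this branch needs a zero test
            swap = next((r for r in range(k + 1, n) if A[r][k] != 0), None)
            if swap is None:
                return 0                      # column k is zero below the diagonal
            A[k], A[swap] = A[swap], A[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # Sylvester's identity guarantees that this division is exact.
                A[i][j] = (A[i][j] * A[k][k] - A[i][k] * A[k][j]) // prev
        prev = A[k][k]
    return sign * A[n - 1][n - 1]
```

The entries of the working matrix stay bounded by minors of the input, which is what keeps the intermediate integers from exploding.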

4.1.1. Observations. Alg2 clearly outperforms both Alg1 and Berk for large $n$, making it the best algorithm for computing the characteristic polynomial using direct arithmetic in $\mathbb{Z}$ (the modular algorithm is, as expected, superior). Alg2 is reasonably competitive for computing the inverse or adjugate matrix, coming within a factor of 2 to 3 of FFLU2 and ModInv in this example. For the determinant, the gap to the FFLU algorithm is larger, and the modular determinant algorithm is unmatched.

4.2. Number fields. Exact linear algebra over algebraic number fields $\mathbb{Q}(a)$ is an interesting use case for division-free algorithms, since coefficient explosion is a significant problem for classical $O(n^3)$ algorithms. As in the case of $\mathbb{Z}$ and $\mathbb{Q}$, modular algorithms are asymptotically more efficient than working over $\mathbb{Q}(a)$ directly, but harder to implement. Here we compare the following algorithms:
• Sage: the charpoly method in SageMath [24], which implements a special-purpose algorithm for cyclotomic fields based on modular computations and reconstruction using the Chinese remainder theorem.
• Hess: Hessenberg reduction for the characteristic polynomial.
• Dani: Danilevsky's algorithm for the characteristic polynomial.
• LU: LU factorization to compute the determinant.
• FFLU: fraction-free LU factorization using the Bareiss algorithm.
• LU2 and FFLU2: as above, but using the resulting decomposition to compute $A^{-1}$ (equivalently determining $\operatorname{adj}(A)$) by solving $AA^{-1} = I$.
• Berk (Berkowitz), Alg1 and Alg2 as in the previous section.

With the exception of the Sage function, we implemented all the algorithms using Antic [11] for number field arithmetic and Flint for other operations. We perform fast matrix multiplication by packing number field elements into integers and multiplying matrices over $\mathbb{Z}$ via Flint. Our implementations of LU, LU2, Alg1 and Alg2 benefit from matrix multiplication while Hess, Dani, FFLU, FFLU2 and Berk do not. The benchmark is therefore not representative of the performance that ideally should be achievable with these algorithms, although it is fair in the sense that the implementation effort for Alg1 and Alg2 was minimal, while the other algorithms would require much more code to speed up using block strategies.

Table 2 compares timings for two kinds of input: random matrices over a fixed cyclotomic field, and discrete Fourier transform (DFT) matrices, which have special structure. Choosing cyclotomic fields allows us to compare with the dedicated algorithm for characteristic polynomials in Sage; the corresponding method for generic number fields in Sage is far slower. All the other algorithms make no assumptions about the field.

Table 2. Time in seconds to compute the characteristic polynomial (C), determinant (D) and adjugate/inverse (A) of a matrix over a cyclotomic number field.

n | Sage (CD) | Hess (CD) | Dani (CD) | LU (D) | FFLU (D) | LU2 (DA) | FFLU2 (DA) | Berk (CD) | Alg1 (CDA) | Alg2 (CDA)
Input: $n \times n$ matrix over $\mathbb{Q}(\zeta_{20})$, entries $\sum_k (p/q)\,\zeta_{20}^k$ with random $|p| \le 10$, $1 \le q \le 10$.
10 | 0.038 | 0.31 | 0.16 | 0.024 | 0.0059 | 0.21 | 0.11 | 0.010 | 0.0073 | 0.010
20 | 0.12 | 19 | 6.7 | 0.22 | 0.067 | 2.6 | 1.4 | 0.28 | 0.15 | 0.16
30 | 0.39 | 200 | 67 | 0.93 | 0.31 | 12 | 6.8 | 2.0 | 1.1 | 0.8
40 | 1.1 | - | 353 | 2.8 | 0.9 | 37 | 22 | 7.5 | 3.7 | 2.6
50 | 1.9 | - | - | 7.0 | 2.3 | 88 | 56 | 22 | 8.7 | 5.7
60 | 3.4 | - | - | 15 | 4.7 | 182 | 119 | 54 | 19 | 12
70 | 5.1 | - | - | 29 | 8.6 | 337 | 230 | 114 | 39 | 22
80 | 7.5 | - | - | 53 | 15 | 581 | 409 | 208 | 67 | 34
90 | 11 | - | - | 89 | 24 | - | - | 397 | 144 | 54
100 | 15 | - | - | 144 | 41 | - | - | 608 | 670 | 130
120 | - | - | - | 322 | 83 | - | - | 1439 | 3013 | 420
Input: $n \times n$ DFT matrix over $\mathbb{Q}(\zeta_n)$, entries $A_{i,j} = \zeta_n^{(i-1)(j-1)}$.
10 | 0.010 | 0.0018 | 0.0016 | 0.00017 | 0.00022 | 0.00076 | 0.0014 | 0.00061 | 0.00075 | 0.00059
20 | 0.039 | 0.0019 | 0.0024 | 0.0017 | 0.0046 | 0.0071 | 0.038 | 0.020 | 0.020 | 0.0070
50 | 1.3 | 0.17 | 0.13 | 0.065 | 0.80 | 0.28 | 6.0 | 8.2 | 2.0 | 0.49
100 | 22 | 5.4 | 22 | 0.89 | 43 | 5.3 | 335 | 803 | 223 | 29
150 | 78 | 22 | 7.9 | 4.4 | 214 | 19 | 1423 | 7259 | 933 | 138
200 | 333 | 1928 | 140 | 31 | 1655 | 192 | - | - | - | 1687
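The text does not spell out how number field elements are packed into integers, so the following is only an assumed illustration of the general idea, using classical Kronecker substitution: each polynomial (coefficients lowest degree first) is evaluated at $x = 2^{b}$ for a slot width $b$ large enough that no carries occur, the packed matrices are multiplied over $\mathbb{Z}$, and the coefficients are read back from the bit slots. For simplicity the sketch assumes nonnegative integer coefficients and omits signs, denominators and the final reduction modulo the defining polynomial; the function names are hypothetical.

```python
from typing import List

Poly = List[int]                  # coefficients, lowest degree first, all >= 0 here
PolyMatrix = List[List[Poly]]


def pack(p: Poly, bits: int) -> int:
    # Evaluate p at x = 2^bits: one coefficient per slot of `bits` bits.
    return sum(a << (bits * i) for i, a in enumerate(p))


def unpack(x: int, bits: int, length: int) -> Poly:
    mask = (1 << bits) - 1
    return [(x >> (bits * i)) & mask for i in range(length)]


def polymat_mul(A: PolyMatrix, B: PolyMatrix) -> PolyMatrix:
    """Multiply n x n matrices of length-d polynomials via one integer matrix product."""
    n, d = len(A), len(A[0][0])
    cmax = max(max(p) for row in A + B for p in row)
    # Each output coefficient is a sum of at most n * d products of input coefficients.
    bound = n * d * cmax * cmax
    bits = bound.bit_length() + 1             # 2^bits > bound, so slots never overflow
    PA = [[pack(p, bits) for p in row] for row in A]
    PB = [[pack(p, bits) for p in row] for row in B]
    PC = [[sum(PA[i][l] * PB[l][j] for l in range(n)) for j in range(n)]
          for i in range(n)]
    return [[unpack(x, bits, 2 * d - 1) for x in row] for row in PC]
```

In a real implementation the integer matrix product would be delegated to an optimized library routine, which is where the speedup reported for LU, LU2, Alg1 and Alg2 would come from.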

4.2.1. Observations. There are no clear winners, since there is a complex interplay between operation count, multiplication algorithms, matrix structure and coefficient growth. Modular algorithms are the best solution in general for large $n$, but implementations for number fields are complex and less readily available in current software than for $\mathbb{Z}$ and $\mathbb{Q}$.

Among the non-modular algorithms, the $O(n^3)$ division-heavy Hessenberg and Danilevsky algorithms are nearly useless due to coefficient explosion for generic input, but both perform well on the DFT matrix. The LU and FFLU algorithms have more even performance, but which of the two is faster depends on the matrix. Alg2 has excellent average performance for the determinant, the characteristic polynomial and the adjugate matrix, considering the large variability between the algorithms for different input. It is highly competitive for computing the inverse or adjugate of the random matrix.

Table 3. Time in seconds to compute the characteristic polynomial (C), determinant (D) and adjugate/inverse (A) of a random $n \times n$ matrix in real ball arithmetic. The respective algorithms were run with $333 + p$ bits of precision, with $p$ chosen to give roughly 100-digit output accuracy.

n | Hess (CD) | Hess2 (CD) | Dani (CD) | Eig (CD) | LU (D) | LU2 (D) | Berk (CD) | Alg1 (CDA) | Alg2 (CDA)
10 | 0.00021 | 0.00038 | 0.00022 | 0.017 | 0.000068 | 0.00023 | 0.00080 | 0.0010 | 0.00078
20 | 0.0021 | 0.0032 | 0.0020 | 0.18 | 0.00052 | 0.0015 | 0.0039 | 0.017 | 0.0092
50 | 0.045 | 0.057 | 0.048 | 4.6 | 0.0078 | 0.019 | 0.22 | 0.61 | 0.21
100 | 0.64 | 0.69 | 0.61 | 56 | 0.062 | 0.15 | 6.3 | 9.7 | 2.4
150 | 3.5 | 3.5 | 3.0 | 245 | 0.23 | 0.44 | 52 | 52 | 11
200 | 12 | 11 | 10 | - | 0.59 | 1.0 | 224 | 176 | 34
250 | 31 | 29 | 25 | - | 1.4 | 1.9 | 687 | 460 | 73
300 | 66 | 59 | 53 | - | 2.5 | 3.2 | 1804 | 1075 | 160
350 | 135 | 115 | 110 | - | 4.4 | 5.0 | 4033 | 2107 | 306
extra bits p | 10n | 6n | 10n | 0 | n | 0 | 6n | 6n | 6n

4.3. Ball arithmetic. Division-free algorithms are useful when computing rigorously over $\mathbb{R}$ and $\mathbb{C}$ in interval arithmetic or ball arithmetic. The reason is that we cannot test whether elements are zero, so algorithms like Gaussian elimination and Hessenberg reduction fail when they need to branch upon zero pivot elements or zero vectors. Although zeros will not occur for random input, they are likely to occur for structured matrices arising in applications. A posteriori verification of approximate numerical solutions or perturbation analysis is in principle the best workaround [23], but it is sometimes useful to fall back to more direct division-free methods, especially when working in very high precision.

Table 3 shows timings for computing a determinant or characteristic polynomial with 100-digit accuracy using the following algorithms implemented in ball arithmetic. The input is taken to be an $n \times n$ real matrix with uniformly random entries in $[0, 1]$. For this experiment, we only focus on the determinant and characteristic polynomial (the conclusions regarding matrix inversion would be similar to those regarding the determinant).
• Hess: Hessenberg reduction using Gaussian elimination.
• Hess2: Hessenberg reduction using Householder reflections.
• Dani: Danilevsky's algorithm for the characteristic polynomial, as in the previous section.
• LU: LU factorization using Gaussian elimination.
• LU2: approximate computation of the determinant using LU factorization followed by a posteriori verification.
• Eig: approximate computation of the eigenvalues using the QR algorithm followed by a posteriori verification and reconstruction of the characteristic polynomial from its roots.
• Berk (Berkowitz), Alg1 and Alg2 as in the previous section.

All algorithms were implemented in Arb [15], which uses the accelerated dot product and matrix multiplication algorithms described in [16]. The LU, LU2, Alg1 and Alg2 implementations benefit from fast matrix multiplication while Hess, Hess2, Eig and Berk do not.

The methods LU2 and Eig are numerically stable: the output balls are precise to nearly full precision for well-conditioned input. All other algorithms are unstable in ball arithmetic and lose O(n) digits of accuracy. At least on this example, the rate of loss is almost the same for Hess2, Berk, Alg1 and Alg2, while LU is more stable and Hess and Dani are less stable. To make the comparison meaningful, we set the working precision (shown in the table) to an experimentally determined value so that all algorithms enclose the determinant with around 100 digits of accuracy.
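The branch-free property can be illustrated with a toy interval type. The sketch below is not Arb's API: it assumes closed intervals with exact `Fraction` endpoints (so no outward rounding is needed) and runs Algorithm 1 with the loop condition $k \le n$ (omitting $\operatorname{adj}(A)$). The only operations used are interval addition, interval multiplication and scaling by $-1/k$, so no zero test ever occurs; by contrast, a pivot such as $[-10^{-3}, 10^{-3}]$ in Gaussian elimination cannot be classified, which the three-valued `is_zero` makes explicit. Class and function names are hypothetical.

```python
from fractions import Fraction


class Interval:
    """Closed interval [lo, hi] with exact Fraction endpoints."""

    def __init__(self, lo, hi=None):
        self.lo = Fraction(lo)
        self.hi = Fraction(lo if hi is None else hi)

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        p = [self.lo * other.lo, self.lo * other.hi, self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def scale(self, q: Fraction) -> "Interval":
        # Multiplication by an exact rational such as -1/k.
        a, b = self.lo * q, self.hi * q
        return Interval(min(a, b), max(a, b))

    def contains(self, x) -> bool:
        return self.lo <= x <= self.hi

    def is_zero(self):
        """Three-valued zero test: True, False, or None if the interval straddles 0."""
        if self.lo == self.hi == 0:
            return True
        return None if self.lo <= 0 <= self.hi else False


def charpoly_intervals(A):
    """Algorithm 1 with loop condition k <= n on an interval matrix: no zero tests."""
    n = len(A)
    zero = Interval(0)
    c = [zero] * (n + 1)
    c[n] = Interval(1)
    B = [[Interval(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(1, n + 1):
        B = [[sum((A[i][l] * B[l][j] for l in range(n)), zero) for j in range(n)]
             for i in range(n)]
        c[n - k] = sum((B[i][i] for i in range(n)), zero).scale(Fraction(-1, k))
        for i in range(n):
            B[i][i] = B[i][i] + c[n - k]
    return c
```

With entries $1, 2, 3, 4 \pm 10^{-3}$ the returned intervals enclose the exact coefficients of $x^2 - 5x - 2$. The widths grow with every operation, which mirrors the loss of $O(n)$ digits observed for the division-free methods above.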

4.3.1. Observations. For computing the characteristic polynomial in high-precision ball arithmetic, it seems prudent to try Hessenberg reduction and fall back to a division-free algorithm when it fails due to encountering a zero vector. The Berkowitz algorithm is the best fallback for small $n$, while Alg2 wins for large $n$ (from about $n \approx 50$ in this benchmark, although the cutoff will vary). On this particular benchmark, Alg2 runs only about $4\times$ slower than Hessenberg reduction, making it an interesting one-size-fits-all algorithm. The verification method (Eig) gives the best results if the precision is constrained, but is far more expensive than the other methods. For computing the determinant alone, all the division-free methods are clearly inferior to methods based on LU factorization in this setting. The only advantage of the division-free algorithms is that they are foolproof, while LU factorization requires some attention to implement correctly. Better methods for computing the characteristic polynomial in ball arithmetic or interval arithmetic are surely possible. For the analogous problem of computing the characteristic polynomial over $\mathbb{Q}_p$, see [4].

4.4. Polynomial quotient rings. At first glance, Algorithm 2 seems to hold potential for working over multivariate polynomial quotient rings. Such rings need not be integral domains, and division can be very expensive (requiring Gröbner basis computations). Unfortunately, in most examples we have tried, Algorithm 2 performs worse than both the Berkowitz algorithm and Algorithm 1, presumably because repeated multiplication by the initial matrix $A$ is much cheaper than multiplication by a power of $A$, which generally will have much larger entries. There may be special classes of matrices for which the method performs well, however.
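The entry growth mentioned above can be seen even in the simplest setting. The sketch below assumes the plain polynomial ring $\mathbb{Z}[x]$ rather than a quotient ring (so no reduction modulo an ideal caps the degrees) and tracks the maximum degree of the entries of $A^k$ for a matrix $A$ with entries of degree at most 1. Since a product of polynomials of lengths $d_1$ and $d_2$ costs $d_1 d_2$ coefficient multiplications in this schoolbook representation, multiplying $B$ by $A^m$ is roughly $m$ times as expensive per entry product as multiplying by $A$. All names are hypothetical.

```python
from typing import List

Poly = List[int]                      # coefficients, lowest degree first


def padd(p: Poly, q: Poly) -> Poly:
    r = [0] * max(len(p), len(q))
    for i, a in enumerate(p):
        r[i] += a
    for i, b in enumerate(q):
        r[i] += b
    return r


def pmul(p: Poly, q: Poly) -> Poly:
    # Schoolbook product: len(p) * len(q) coefficient multiplications.
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def degree(p: Poly) -> int:
    d = len(p) - 1
    while d > 0 and p[d] == 0:
        d -= 1
    return d


def matmul(A, B):
    n = len(A)
    C = [[[0] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for l in range(n):
                C[i][j] = padd(C[i][j], pmul(A[i][l], B[l][j]))
    return C


def max_degree(M) -> int:
    return max(degree(p) for row in M for p in row)


# A = [[x, 1], [1, x + 1]] over Z[x]
A = [[[0, 1], [1]], [[1], [1, 1]]]
P = A
for k in range(1, 5):
    print(k, max_degree(P))           # maximum entry degree of A^k
    P = matmul(A, P)
```

Here the maximum degree of the entries of $A^k$ is exactly $k$, since $A = xI + N$ with a constant matrix $N$ having nonnegative entries, so no cancellation occurs.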

References

[1] Jounaidi Abdeljaoued and Gennadi I. Malaschonok. Efficient algorithms for computing the characteristic polynomial in a domain. Journal of Pure and Applied Algebra, 156(2-3):127–145, 2001.
[2] Stuart J. Berkowitz. On computing the determinant in small parallel time using a small number of processors. Information Processing Letters, 18(3):147–150, March 1984.
[3] R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. Journal of the ACM, 25(4):581–595, October 1978.
[4] Xavier Caruso, David Roe, and Tristan Vaccon. Characteristic polynomials of p-adic matrices. In Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation. ACM, July 2017.
[5] Henri Cohen. A Course in Computational Algebraic Number Theory. Springer Berlin Heidelberg, 1996.
[6] L. Csanky. Fast parallel matrix inversion algorithms. SIAM Journal on Computing, 5(4):618–623, December 1976.
[7] Jean-Guillaume Dumas and Clément Pernet. Computational linear algebra over finite fields. arXiv preprint arXiv:1204.3735, 2012.

[8] Jean-Guillaume Dumas, Clément Pernet, and Zhendong Wan. Efficient computation of the characteristic polynomial. In Proceedings of the 2005 International Symposium on Symbolic and Algebraic Computation (ISSAC'05). ACM Press, 2005.
[9] François Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation (ISSAC'14). ACM Press, 2014.
[10] William B. Hart. Fast Library for Number Theory: An introduction. In Mathematical Software – ICMS 2010, pages 88–91. Springer Berlin Heidelberg, 2010.
[11] William B. Hart. ANTIC: Algebraic number theory in C. Computeralgebra-Rundbrief, Vol. 56, 2015.
[12] Gilbert Helmberg, Peter Wagner, and Gerhard Veltkamp. On Faddeev-Leverrier's method for the computation of the characteristic polynomial of a matrix and of eigenvectors. Linear Algebra and its Applications, 185:219–233, 1993.
[13] Shui-Hung Hou. Classroom note: A simple proof of the Leverrier–Faddeev characteristic polynomial algorithm. SIAM Review, 40(3):706–709, January 1998.
[14] Fredrik Johansson. A fast algorithm for reversion of power series. Mathematics of Computation, 84(291):475–484, May 2014.
[15] Fredrik Johansson. Arb: Efficient arbitrary-precision midpoint-radius interval arithmetic. IEEE Transactions on Computers, 66(8):1281–1292, August 2017.
[16] Fredrik Johansson. Faster arbitrary-precision dot product and matrix multiplication. In 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH). IEEE, June 2019.
[17] Erich Kaltofen. On computing determinants of matrices without divisions. In Papers from the International Symposium on Symbolic and Algebraic Computation (ISSAC'92). ACM Press, 1992.
[18] Erich Kaltofen and Gilles Villard. On the complexity of computing determinants. Computational Complexity, 13(3-4):91–130, February 2005.
[19] Vincent Neiger and Clément Pernet. Deterministic computation of the characteristic polynomial in the time of matrix multiplication. arXiv preprint arXiv:2010.04662, 2020.
[20] Michael S. Paterson and Larry J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM Journal on Computing, 2(1):60–66, March 1973.
[21] Clément Pernet and Arne Storjohann. Faster algorithms for the characteristic polynomial. In Proceedings of the 2007 International Symposium on Symbolic and Algebraic Computation (ISSAC'07). ACM Press, 2007.
[22] Rizwana Rehman and Ilse C. F. Ipsen. La Budde's method for computing characteristic polynomials. arXiv preprint arXiv:1104.3769, 2011.
[23] Siegfried M. Rump. Verification methods: Rigorous results using floating-point arithmetic. Acta Numerica, 19:287–449, May 2010.
[24] The Sage Developers. SageMath, the Sage Mathematics Software System (Version 9.0), 2020. https://www.sagemath.org.
[25] Urbain Le Verrier. Sur les variations séculaires des éléments des orbites pour les sept planètes principales. J. de Math., (s 1):5, 1840.
[26] Gilles Villard. Kaltofen's division-free determinant algorithm differentiated for matrix adjoint computation. Journal of Symbolic Computation, 46(7):773–790, July 2011.
[27] James Hardy Wilkinson. The Algebraic Eigenvalue Problem, volume 87. Clarendon Press, Oxford, 1965.

Inria Bordeaux-Sud-Ouest and Institut Math. Bordeaux, U. Bordeaux, 33400 Talence, France