Efficient Parallelizations of Hermite and Smith Normal Form Algorithms

Gerold Jäger (a), Clemens Wagner (b)

(a) Computer Science Institute, University of Halle-Wittenberg, D-06120 Halle (Saale), Germany
(b) denkwerk, Vogelsanger Straße 66, D-50823 Köln, Germany

Email addresses: [email protected] (Gerold Jäger), [email protected] (Clemens Wagner)

Abstract

Hermite and Smith normal form are important forms of matrices used in linear algebra, with many applications in group theory and number theory. As the entries of the matrix and of its corresponding transformation matrices can explode during the computation, it is a very difficult problem to compute the Hermite and Smith normal form of large dense matrices. The main problems of the computation are the large execution times and the memory requirements, which might exceed the memory of one processor. To avoid these problems, we develop parallelizations of Hermite and Smith normal form algorithms. These are the first parallelizations of algorithms for computing the normal forms with corresponding transformation matrices, both over the rings Z and F[x]. We show that our parallel versions have good efficiency, i.e., by doubling the number of processes, the execution time is nearly halved. Furthermore, they succeed in computing normal forms of large dense example matrices over the rings Q[x], F3[x], and F5[x].

Key words: Hermite normal form, Smith normal form, parallelization

1. Introduction

A matrix in R^{m,n}, where R is a commutative, integral and Euclidean ring with 1, with rank n is in Hermite normal form (HNF), if it is a lower triangular matrix in which all elements are smaller than the diagonal element of the same row. The definition can easily be generalized to rank r < n. It follows from Hermite [19] that one can obtain from an arbitrary matrix in R^{m,n} the uniquely determined HNF by unimodular column operations. A matrix in R^{m,n} with rank r is in Smith normal form (SNF), if it is a diagonal matrix in which each of the first r diagonal elements divides the next one and the remaining diagonal elements are zero. It follows from Smith [31] that one can

obtain from an arbitrary matrix in R^{m,n} the uniquely determined SNF by unimodular row and column operations. Thus the SNF is a generalization of the HNF with respect to both row and column operations. As Smith and Hermite normal forms are the basic building blocks for solving linear equations over the integers, they are at one more level of complexity than Gaussian elimination (where the elimination is done over the reals). Furthermore, Hermite and Smith normal form play an important role in the theory of finite Abelian groups, the theory of finitely generated modules over principal ideal rings, system theory, number theory, and integer programming. For many applications, for example linear equations over the integers, the transformation matrices describing the unimodular operations are important as well.

There are many algorithms for computing the HNF [1, 2, 6, 14, 24] and the SNF [2, 13, 15], most of them only for the ring Z. Some of these algorithms are probabilistic ([10] for the SNF and R = Z, [34] for the SNF and R = Q[x]). Deterministic algorithms for R = Z often use modular techniques ([23] for the HNF, [16, 32] for the SNF, [11], [30, Chapter 8.4], [33] for the HNF and SNF). Most of these modular algorithms are unable to compute the corresponding transformation matrices. Unfortunately, most algorithms lead to coefficient explosion, i.e., during the computation entries of the matrix and of the corresponding transformation matrices occur which are very large, even of exponential size [8, 17]. For high-dimensional matrices with large entries this leads to large execution times and memory problems, i.e., the memory of one process is not large enough for the normal form computation of large matrices. These problems can be remedied by parallelization, so that it is possible to handle considerably larger matrices.

Much effort has been devoted to parallel matrix and linear algebra computations [3, 7, 9, 18, 26, 27, 29, 36]. In [21] a parallel HNF algorithm and in [21, 22, 37] parallel probabilistic SNF algorithms are introduced for the ring F[x], but without experimental results, and in [28] a parallel SNF algorithm is described which only works for characteristic matrices.

The purpose of this paper is to show efficient parallelizations of Hermite and Smith normal form computations with empirical evidence. In particular, we parallelize the well-known HNF and SNF algorithms of Kannan, Bachem [25], generalized to rectangular matrices with arbitrary rank, and the SNF algorithm of Hartley, Hawkes [12]. These are three of the most important algorithms which work for both the ring Z and the ring F[x] and which are able to compute the corresponding transformation matrices. An important problem in the parallelization of normal form computations is how to uniformly distribute a large matrix to many processes. Our main idea for this problem comes from the following observation, which holds for all HNF and SNF algorithms considered: when an elimination step is done by a series of column (row) operations, the operations depend only on one particular row (column). Thus it is reasonable to use the well-known row (column) distribution of matrices [9, 29]. In particular, we use a row distribution for column operations and a column distribution for row operations, where row (column) distribution means distributing the matrix among the processes so that each whole row (column) goes to a single process. This is done by a broadcast operation.
When an elimination step is done involving entries in a particular row (column), that row (column) is broadcast to all the processes,

so that all of them can determine in parallel which column (row) operations are to be done on the matrix. Then they update their rows (columns) by performing these column (row) operations on all their rows (columns). As for the SNF we use both row and column operations, an auxiliary algorithm is used which transforms a row distributed matrix into a column distributed one and vice versa. This procedure is an implementation of parallel matrix transposition [4, 5, 35].

We estimate the parallel operations of the three algorithms and observe that the complexity of the parallel HNF algorithm is much better than that of both parallel SNF algorithms, and that the parallel Hartley-Hawkes SNF algorithm leads to a better complexity than the parallel Kannan-Bachem SNF algorithm. We implement the algorithms and test them on large dense matrices over the rings Q[x], F3[x], and F5[x]. The experiments show that the parallel HNF algorithm and the parallel Kannan-Bachem SNF algorithm give very similar results. Comparing the SNF algorithms, the parallel Kannan-Bachem SNF algorithm leads to better results for the ring Q[x], and the parallel Hartley-Hawkes SNF algorithm to better results for the rings F3[x] and F5[x]. Considering medium-sized matrices, we observe that the algorithms have a good efficiency, even for 64 processes. The algorithms are also able to compute the HNF and SNF with the corresponding transformation matrices of large matrices in reasonable time. Because of the memory requirements, the program packages MAGMA and MAPLE are not able to do most of these computations.

2. Preliminaries

Let R be a commutative, integral ring with 1 and R* ⊆ R the set of units of R. Let R be Euclidean, i.e., there is a mapping φ: R \ {0} → N_0, so that for a ∈ R, b ∈ R \ {0} there exist q, r ∈ R with a = qb + r and r = 0 ∨ φ(r) < φ(b), where we define ψ(a, b) := r. Further let 𝓡 ⊆ R be a system of representatives of R, i.e., for each a ∈ R \ {0} there exist a unique e ∈ R* and b ∈ 𝓡 with a = e · b, where we define β(a) := 1/e. In this paper we only consider two examples:

a) The set Z of integers. We choose φ := |·| and 𝓡 := N_0. For a ∈ R let ⌊a⌋ be the largest integer ≤ a. With the above notations we define for a ∈ Z, b ∈ Z \ {0}: ψ(a, b) := r = a − ⌊a/b⌋ · b, and for a ∈ Z \ {0} we have β(a) := sgn(a). For A ∈ Z^{m,n} let ‖A‖_∞ := max_{1≤i≤m,1≤j≤n} {|A_{i,j}|}.

b) The polynomial ring F[x] with a field F. We choose φ := deg and 𝓡 := {monic polynomials over F[x]}. With the above notations it holds for a ∈ F[x], b ∈ F[x] \ {0}: ψ(a, b) := r, where r is uniquely determined by polynomial division of a by b. Further for a ∈ F[x] \ {0} we have β(a) := 1/a_k for a = Σ_{i=0}^{k} a_i · x^i, a_k ≠ 0. For A ∈ F[x]^{m,n} let ⌈A⌉_deg := max_{1≤i≤m,1≤j≤n} {deg(A_{i,j})}.

Definition 2.1. a) The matrix E_n = (E_{i,j})_{1≤i,j≤n} ∈ R^{n,n} is defined by E_{i,j} = 1 if i = j and E_{i,j} = 0 otherwise.
b) GL_n(R) is the group of matrices in R^{n,n} whose determinant is a unit in the ring R. These matrices are called unimodular matrices.
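For the ring R = Z these definitions are easy to make concrete. The following Python sketch is our own illustration (the names psi and beta are ours, and the paper's implementation is in C++); it mirrors ψ and β for R = Z.

    def psi(a, b):
        # Euclidean remainder for R = Z: psi(a, b) = a - floor(a/b) * b.
        # Python's floor division uses exactly this convention, so either
        # psi(a, b) == 0 or |psi(a, b)| < |b|.
        return a - (a // b) * b

    def beta(a):
        # beta(a) = sgn(a) for a != 0; multiplying a by beta(a) yields the
        # representative |a| in N_0.
        return 1 if a > 0 else -1

    assert psi(17, 5) == 2 and psi(-17, 5) == 3   # -17 = (-4)*5 + 3
    assert beta(-7) * (-7) == 7                    # representative of -7 is 7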

Definition 2.2. A matrix A = (A_{i,j})_{1≤i≤m,1≤j≤n} ∈ R^{m,n} with rank r is in Hermite normal form (HNF), if the following conditions hold:
a) There exist i_1, ..., i_r with 1 ≤ i_1 < ··· < i_r ≤ m and A_{i_j,j} ∈ 𝓡 \ {0} for 1 ≤ j ≤ r (the A_{i_j,j} are called pseudo diagonal elements).
b) A_{i,j} = 0 for 1 ≤ i ≤ i_j − 1, 1 ≤ j ≤ r.
c) The columns r+1, ..., n are zero.
d) A_{i_j,l} = ψ(A_{i_j,l}, A_{i_j,j}) for 1 ≤ l < j ≤ r.
The matrix A is in left Hermite normal form (LHNF), if its transpose A^T is in HNF.

Theorem 2.3. [19] Let A ∈ R^{m,n}. Then a matrix V ∈ GL_n(R) exists, so that H = AV is in HNF. The matrix H is uniquely determined. The matrix V is called the corresponding transformation matrix for the HNF.

Definition 2.4. A matrix A = (A_{i,j})_{1≤i≤m,1≤j≤n} ∈ R^{m,n} with rank r is in Smith normal form (SNF), if the following conditions hold:
a) A is a diagonal matrix.
b) A_{i,i} ∈ 𝓡 \ {0} for 1 ≤ i ≤ r.
c) A_{i,i} | A_{i+1,i+1} for 1 ≤ i ≤ r − 1.
d) A_{i,i} = 0 for r + 1 ≤ i ≤ min{m, n}.

Theorem 2.5. [31] Let A ∈ R^{m,n}. Then matrices U ∈ GL_m(R) and V ∈ GL_n(R) exist, so that C = UAV is in SNF. The matrix C is uniquely determined. The matrices U, V are called the corresponding left hand and right hand transformation matrices for the SNF.

3. HNF and SNF algorithms

The following algorithms compute the HNF and SNF of an arbitrary matrix in Rm,n with corresponding transformation matrices. All algorithms are also formulated for the transformation matrices U and V . Normally, each algorithm starts with U = Em and V = En, but if an algorithm is a subroutine of another algorithm, the settings of U ∈ GLm(R) and V ∈ GLn(R) come from the main algorithm.

3.1. HNF Algorithm column by column

The following algorithm ROW-ONE-GCD(A, V, i, j, l) works on the i-th row of a matrix A. It performs a unimodular transformation on two arbitrary entries A_{i,j} and A_{i,l} of this row, so that after the transformation the second entry is zero. More precisely it holds: A_{i,j}^{new} = gcd(A_{i,j}^{old}, A_{i,l}^{old}) ∈ 𝓡 and A_{i,l}^{new} = 0. In parallel, the corresponding transformation matrix V can be computed.

Gcd computation of two elements of the same row

INPUT: A = [a_1, ..., a_n] ∈ R^{m,n}, V = [v_1, ..., v_n] ∈ GL_n(R) and i, j, l with 1 ≤ i ≤ m, 1 ≤ j < l ≤ n
1  IF A_{i,j} ≠ 0 ∨ A_{i,l} ≠ 0
2    THEN Compute d := gcd(A_{i,j}, A_{i,l}) and u, v with d = u·A_{i,j} + v·A_{i,l}
3         [a_j, a_l] = [a_j, a_l] · ( u, −A_{i,l}/d ; v, A_{i,j}/d )
4         [v_j, v_l] = [v_j, v_l] · ( u, −A_{i,l}/d ; v, A_{i,j}/d )
OUTPUT: (A, V) = ROW-ONE-GCD(A, V, i, j, l)
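For R = Z this transformation can be sketched in a few lines of Python (our own illustration, independent of the paper's C++ implementation; the names egcd and row_one_gcd are ours). The individual steps of the pseudocode are explained below.

    def egcd(a, b):
        # extended Euclidean algorithm: d = gcd(a, b) = u*a + v*b, with d >= 0
        old_r, r, old_u, u, old_v, v = a, b, 1, 0, 0, 1
        while r != 0:
            q = old_r // r
            old_r, r = r, old_r - q * r
            old_u, u = u, old_u - q * u
            old_v, v = v, old_v - q * v
        if old_r < 0:
            old_r, old_u, old_v = -old_r, -old_u, -old_v
        return old_r, old_u, old_v

    def row_one_gcd(A, V, i, j, l):
        # Unimodular column operation on columns j and l of A and V (0-based indices)
        # so that afterwards A[i][j] = gcd(old A[i][j], old A[i][l]) and A[i][l] = 0.
        if A[i][j] == 0 and A[i][l] == 0:
            return
        d, u, v = egcd(A[i][j], A[i][l])
        p, q = A[i][j] // d, A[i][l] // d          # the entries A_{i,j}/d and A_{i,l}/d
        for M in (A, V):
            for row in M:
                row[j], row[l] = u * row[j] + v * row[l], -q * row[j] + p * row[l]

Since A and V receive exactly the same column operations, the product of the input matrix with the accumulated V always equals the current A, which is an easy way to test such a sketch.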

In step 2, for two integers x, y we compute gcd(x, y) and u, v with gcd(x, y) = ux + vy, using the extended Euclidean algorithm. In steps 3 and 4, the unimodular 2 × 2 matrix ( u, −A_{i,l}/d ; v, A_{i,j}/d ), written here with ";" separating its two rows, is multiplied from the right to the original matrix and to the transformation matrix V, so that the j-th and l-th columns are changed in such a way that the conditions for A_{i,j} and A_{i,l} are fulfilled. We denote the analogous algorithm for the gcd computation of two elements of the same column by COL-ONE-GCD(A, U, i, j, l).

With these procedures we can formulate the HNF algorithm which we will parallelize. The HNF algorithm is based on the HNF algorithm of Kannan, Bachem [25], who proposed for a square matrix A ∈ R^{n,n} of full rank to compute the HNF of the (t × t) submatrix for t = 1, ..., n recursively. This algorithm is also able to compute the corresponding transformation matrix, and – because of its simple structure – it is ideal for parallelization. A natural generalization of this algorithm to rectangular matrices with arbitrary rank is to recursively compute the HNF of the first t columns for t = 1, ..., n.

HNF computation column by column

INPUT: A = [a_1, ..., a_n] ∈ R^{m,n}, V = [v_1, ..., v_n] ∈ GL_n(R)
 1  FOR t = 1, ..., n   (Compute HNF of first t columns)
 2    r = 0
 3    FOR s = 1, ..., m
 4      IF A_{s,r+1} ≠ 0 ∨ A_{s,t} ≠ 0
 5        THEN r = r + 1
 6             i_r = s
 7             IF t = r
 8               THEN IF A_{s,t} ∉ 𝓡
 9                 THEN a_t = β(A_{s,t}) · a_t
10                      v_t = β(A_{s,t}) · v_t
11               ELSE ROW-ONE-GCD(A, V, s, r, t)
12             FOR l = 1, ..., r − 1
13               a_l = a_l − ψ(A_{s,l}, A_{s,r}) · a_r
14               v_l = v_l − ψ(A_{s,l}, A_{s,r}) · v_r
15             IF t = r
16               THEN GOTO 1 with next t
OUTPUT: (A, V) = HNF(A, V) with rank r

In steps 4 to 11, the current pseudo diagonal element A_{s,r} is computed. After steps 12 to 14, the elements of the s-th row fulfill condition d) of the HNF definition. If t = r holds after step 14, the current pseudo diagonal element A_{s,r} has been found and we can go to the next t of the FOR loop of step 1.
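To make the structure concrete, the following Python sketch computes an HNF with its transformation matrix by column operations over R = Z. It is our own illustration with our own names (hnf_column_ops); it proceeds row by row and eliminates to the right of the current pivot column with extended-gcd column operations, which is a simplified variant of the recursion over t = 1, ..., n above, not the exact Kannan-Bachem scheme.

    def hnf_column_ops(A):
        # Returns (H, V) with H = A * V, V unimodular, H in column-style HNF:
        # pseudo diagonal elements positive, zeros above them, entries to their
        # left reduced modulo them, and the columns beyond the rank equal to zero.
        m, n = len(A), len(A[0])
        H = [row[:] for row in A]
        V = [[int(i == j) for j in range(n)] for i in range(n)]

        def egcd(a, b):
            # extended Euclid: d = gcd(a, b) = u*a + v*b with d >= 0
            if b == 0:
                return (a, 1, 0) if a >= 0 else (-a, -1, 0)
            d, u, v = egcd(b, a % b)
            return d, v, u - (a // b) * v

        def colop(j, l, u, v, p, q):
            # unimodular update (col_j, col_l) <- (u*col_j + v*col_l, -q*col_j + p*col_l),
            # applied to H and V simultaneously (determinant u*p + v*q = 1)
            for M in (H, V):
                for row in M:
                    row[j], row[l] = u * row[j] + v * row[l], -q * row[j] + p * row[l]

        k = 0                                    # index of the next pivot column
        for s in range(m):                       # scan the rows from top to bottom
            if k == n:
                break
            for l in range(k + 1, n):            # zero out H[s][k+1:], gcd moves to column k
                if H[s][l] != 0:
                    d, u, v = egcd(H[s][k], H[s][l])
                    colop(k, l, u, v, H[s][k] // d, H[s][l] // d)
            if H[s][k] != 0:                     # a new pivot was found in row s
                if H[s][k] < 0:                  # sign normalization (the beta step)
                    for M in (H, V):
                        for row in M:
                            row[k] = -row[k]
                for l in range(k):               # reduce entries left of the pivot
                    q = H[s][l] // H[s][k]
                    for M in (H, V):
                        for row in M:
                            row[l] -= q * row[k]
                k += 1
        return H, V

Again the invariant H = A · V holds throughout, because identical column operations are applied to H and V.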

5 * * 1 * 4 * * * * * 2 3 * 5 2 * 4 5 6 * 6 3 1 *

(a) (b)

Figure 1: Order of reducing the elements left from the diagonal elements: (a) Standard, (b) Chou-Collins

Obviously, there is an analogous algorithm for the computation of the LHNF row by row. Chou, Collins [6] found an essential theoretical and practical improvement for the HNF computation by reducing the elements left of the diagonal elements (steps 12 and 13 of Algorithm 3.1) in a different order (see Fig. 1). As the columns which are added to reduce an element have only reduced non-diagonal elements at this stage, the algorithm leads to less coefficient explosion. Unfortunately, the Chou-Collins idea cannot be combined with HNF Algorithm 3.1 for efficient parallelization, as the communication operations between the processes would become too large (see section 4.2).

3.2. Algorithm DIAGTOSMITH

In the SNF algorithms we use an elementary algorithm DIAGTOSMITH [30] which computes the SNF of a matrix in diagonal form.

INPUT: A ∈ R^{m,n} in diagonal form, U = [u_1, ..., u_m]^T ∈ GL_m(R), V = [v_1, ..., v_n] ∈ GL_n(R)
 1  FOR k = 1, ..., min{m, n} − 1
 2    FOR l = min{m, n} − 1, ..., k
 3      IF A_{l,l} ∤ A_{l+1,l+1}
 4        THEN g = A_{l,l} · A_{l+1,l+1}
 5             A_{l,l} = gcd(A_{l,l}, A_{l+1,l+1})
 6             A_{l+1,l+1} = g / A_{l,l}
 7             Compute d := gcd(A_{l,l}, A_{l+1,l+1}) and u, v with d = u·A_{l,l} + v·A_{l+1,l+1}
 8             [u_l, u_{l+1}]^T = ( u, v ; −A_{l+1,l+1}/d, A_{l,l}/d ) · [u_l, u_{l+1}]^T
 9             [v_l, v_{l+1}] = [v_l, v_{l+1}] · ( 1, −v·A_{l+1,l+1}/d ; 1, u·A_{l,l}/d )
10  FOR l = 1, ..., min{m, n}
11    IF A_{l,l} ≠ 0
12      THEN A_{l,l} = β(A_{l,l}) · A_{l,l}
13           v_l = β(A_{l,l}) · v_l
OUTPUT: (A, U, V) = DIAGTOSMITH(A, U, V)

In steps 4 to 6, two neighboring diagonal elements A_{l,l} and A_{l+1,l+1} are substituted by their gcd and their lcm, so that after these steps it holds: A_{l,l} | A_{l+1,l+1}.

The steps 7 to 9 hold because of the following equation [30]:

( u, v ; −A_{l+1,l+1}/d, A_{l,l}/d ) · ( A_{l,l}, 0 ; 0, A_{l+1,l+1} ) · ( 1, −v·A_{l+1,l+1}/d ; 1, u·A_{l,l}/d ) = ( d, 0 ; 0, lcm(A_{l,l}, A_{l+1,l+1}) )    (1)

Here ( a, b ; c, d ) denotes the 2 × 2 matrix with first row (a, b) and second row (c, d); in (1), d = gcd(A_{l,l}, A_{l+1,l+1}) = u·A_{l,l} + v·A_{l+1,l+1}.

These steps are repeated until the conditions c) and d) of the SNF definition are fulfilled. After the steps 10 to 13, also condition b) of the SNF definition is fulfilled. The algorithm needs not more than min{m, n}^2 gcd computations.
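Restricted to the diagonal entries over Z, and ignoring the transformation matrices, the gcd/lcm replacement can be sketched in Python as follows (our own illustration; math.gcd is used for the gcd computation, and the sign normalization of steps 10 to 13 is omitted).

    from math import gcd

    def diag_to_smith(d):
        # d: diagonal entries of a diagonal matrix over Z.
        # Replaces neighbouring pairs by (gcd, lcm), as in steps 4 to 6 of DIAGTOSMITH,
        # until d[0] | d[1] | ... holds and all zero entries are at the end.
        d = list(d)
        n = len(d)
        for k in range(n - 1):
            for l in range(n - 2, k - 1, -1):
                a, b = d[l], d[l + 1]
                if a == 0 or b % a != 0:          # a does not divide b
                    g = gcd(a, b)
                    d[l], d[l + 1] = g, (a * b) // g if g != 0 else 0
        return d

    print(diag_to_smith([4, 6, 0, 10]))           # [2, 2, 60, 0]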

3.3. Kannan-Bachem SNF algorithm

INPUT: A ∈ R^{m,n}, U ∈ GL_m(R), V ∈ GL_n(R)
1  WHILE (A is not in diagonal form)
2    (A, V) = HNF(A, V)
3    (A, U) = LHNF(A, U)
4  (A, U, V) = DIAGTOSMITH(A, U, V)
OUTPUT: (A, U, V) = KB-SNF(A, U, V)

The algorithm of Kannan and Bachem [25] alternately computes the HNF and the LHNF in the steps 2 and 3, until the matrix is in diagonal form. In step 4, the algorithm DIAGTOSMITH is applied.

3.4. Hartley-Hawkes SNF algorithm

For the algorithm of Hartley and Hawkes [12, p. 112] many variants are known, see [2, 13, 15]. They differ in the implementation of the following procedures ROWGCD and COLGCD and in some additional row and column swaps. For i, j with 1 ≤ i ≤ m, 1 ≤ j ≤ n, ROWGCD(A, V, i, j) transforms A so that A_{i,j}^{new} = gcd(A_{i,j}^{old}, A_{i,j+1}^{old}, ..., A_{i,n}^{old}) ∈ 𝓡 and A_{i,j+1}^{new} = ... = A_{i,n}^{new} = 0. This is done by repeatedly subtracting a multiple of one column from another column. The corresponding transformation matrix V ∈ GL_n(R) is changed by applying all column operations applied to A also to V. The procedure COLGCD(A, U, i, j) is defined analogously with the roles of rows and columns exchanged.

INPUT: A ∈ R^{m,n}, U ∈ GL_m(R), V ∈ GL_n(R)
1  l = 1
2  WHILE l ≤ min{m, n}
3    IF NOT (A_{l,l+1:n}) = 0
4      THEN A = ROWGCD(A, V, l, l)
5    IF (A_{l+1:m,l}) = 0
6      THEN l = l + 1
7      ELSE A = COLGCD(A, U, l, l)
8  (A, U, V) = DIAGTOSMITH(A, U, V)
OUTPUT: (A, U, V) = HH-SNF(A, U, V)

For l = 1, ..., min{m, n} the algorithm of Hartley, Hawkes alternately uses the procedures ROWGCD(A, V, l, l) and COLGCD(A, U, l, l) in steps 3 to 7, until the first l rows and columns have diagonal form. In step 8, again the algorithm DIAGTOSMITH is used.

4. Parallelization of the HNF and SNF algorithms

4.1. Idea of Parallelization

A parallel program is a set of independent processes with data being interchanged between the processes. We write BROADCAST x, if a process sends a variable x to all other processes, BROADCAST-RECEIVE x FROM z, if a process receives a variable x from the process with the number z which has sent it with BROADCAST, SEND x TO z, if a process sends a variable x to the process with the number z, and SEND-RECEIVE x FROM z, if a process receives a variable x from the process with the number z which has sent it with SEND.

The matrix whose HNF and SNF shall be computed has to be distributed to the different processes as uniformly as possible. It is straightforward to assign different rows or different columns of a matrix to one process [9, 29]. For algorithms in which mainly column operations are used, a column distribution is not reasonable, as for column additions with multiplicity the columns involved in a computation mostly belong to different processes, so that for each such computation at least one column element would have to be sent. Thus we distribute rows if column operations are used, and columns if row operations are used. As the SNF algorithms use both kinds of operations, we have to switch between both distributions (see the procedures PAR-ROWTOCOL and PAR-COLTOROW in Section 4.4). Quinn [29] considers two approaches for row distribution based on block data decomposition. As experiments in section 5.3 show that both approaches lead to rather similar execution times of our algorithms, we use the simpler one of both.

Let the matrix Ā ∈ R^{m,n} be distributed on q processes and let the z-th process consist of k_row(z) rows. Every process z has as input a matrix A ∈ R^{k_row(z),n} with Σ_{z=0}^{q−1} k_row(z) = m. For every process let the order of the rows be equal to the original order. At any time we can obtain the complete matrix by putting these matrices together. We choose the following uniform distribution: the process with the number z receives the rows z+1, z+1+q, ..., z+1+⌊(m−z−1)/q⌋·q (see Fig. 2(a) for an example). The most important point of this distribution is that each process receives rows from different parts of the whole matrix (compare section 5.3). For 1 ≤ l ≤ m, let ROW-TASK(l) return the number of the process where the l-th row lies (for example in Fig. 2(a): ROW-TASK(7) = 2). For 1 ≤ l ≤ m, let ROW-NUM(l) return the position the original l-th row has (or would have) in the local list of rows of a process, even if row l is not present in that list (for example in Fig. 2(a): ROW-NUM(7) = 3 on processes 0, 1 and ROW-NUM(7) = 2 on processes 2, 3).
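The distribution and the two index functions can be written down directly. The following Python sketch (our own illustration with 1-based row numbers; the function names mirror the text) reproduces the example of Fig. 2(a) with m = 11 and q = 4.

    def local_rows(z, m, q):
        # rows owned by process z under our distribution: z+1, z+1+q, z+1+2q, ...
        return list(range(z + 1, m + 1, q))

    def row_task(l, q):
        # ROW-TASK(l): number of the process that owns row l
        return (l - 1) % q

    def row_num(l, z, q):
        # ROW-NUM(l) on process z: position row l has (or would have, if inserted)
        # in the local row list of process z
        return max(0, (l - z - 2 + q) // q) + 1

    m, q = 11, 4
    print([local_rows(z, m, q) for z in range(q)])   # [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8]]
    print(row_task(7, q))                            # 2
    print([row_num(7, z, q) for z in range(q)])      # [3, 3, 2, 2]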

Figure 2: Example of a matrix with 11 rows and 9 columns for 4 processes: (a) row distribution, (b) column distribution, (c) broadcast of row 6

Analogously, we define the column distribution, where the z-th process consists of k_col(z) columns and every process z has as input a matrix A ∈ R^{m,k_col(z)} with Σ_{z=0}^{q−1} k_col(z) = n, and the functions COL-TASK and COL-NUM (see Fig. 2(b) for an example). Since only column operations are performed on V, we choose for V a row distribution. Analogously, a column distribution is chosen for U.

4.2. Parallel HNF algorithm column by column

In the following we only consider the HNF computation of a row distributed matrix. Obviously, the LHNF computation of a column distributed matrix works analogously. Considering the FOR loop of step 3 of the HNF Algorithm 3.1 over s, we observe that the column operations only depend on the s-th row. Thus it is a good idea to send the s-th row to all processes, so that each process can execute its column operations.

INPUT: Cardinality of processes q, number of its own process z,
A = [a'_1, ..., a'_{k_row(z)}]^T = [a_1, ..., a_n] ∈ R^{k_row(z),n} with Σ_{z=0}^{q−1} k_row(z) = m,
V = [v_1, ..., v_n] ∈ R^{k_col(z),n} with Σ_{z=0}^{q−1} k_col(z) = n
 1  FOR t = 1, ..., n   (Compute HNF of first t columns)
 2    r = 0
 3    FOR s = 1, ..., m
 4      y = ROW-TASK(s)
 5      h = ROW-NUM(s)
 6      IF y = z
 7        THEN BROADCAST vector a'_h
 8        ELSE BROADCAST-RECEIVE vector g FROM y
 9             Insert g as h-th row vector of A
10      IF A_{h,r+1} ≠ 0 ∨ A_{h,t} ≠ 0
11        THEN r = r + 1
12             i_r = s
13             IF t = r
14               THEN IF A_{h,t} ∉ 𝓡
15                 THEN a_t = β(A_{h,t}) · a_t
16                      v_t = β(A_{h,t}) · v_t
17               ELSE ROW-ONE-GCD(A, V, h, r, t)
18             FOR l = 1, ..., r − 1
19               a_l = a_l − ψ(A_{h,l}, A_{h,r}) · a_r
20               v_l = v_l − ψ(A_{h,l}, A_{h,r}) · v_r
21             IF t = r
22               THEN IF NOT y = z
23                 THEN Remove the h-th row vector of A
24                      GOTO 1 with next t
25      IF NOT y = z
26        THEN Remove the h-th row vector of A
OUTPUT: (A, V) = PAR-HNF(A, V) with rank r

Correctness: In this algorithm (and in all following algorithms) the same steps are executed as in the corresponding original algorithm, but on the processes on which the current elements lie. For fixed t and for s = 1, ..., m the s-th row is sent from its process to all other processes. There it is inserted at position h, below all rows which in the whole matrix lie above row s (see Fig. 2(a) and Fig. 2(c) for an example). On all processes the row h and the rows below row h are transformed according to the original algorithm. After that, on all processes except the sending process, the received row is removed again.
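The paper's implementation uses C++ with MPI. Purely as an illustration of the communication pattern of steps 4 to 9, here is a Python/mpi4py sketch of one broadcast step (all names are ours; A_local denotes the list of local rows of the calling process, and the column operations themselves are omitted).

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    q, z = comm.Get_size(), comm.Get_rank()

    def broadcast_row_step(A_local, s):
        # One elimination step for the global row s: the owner broadcasts its local
        # copy of row s, every other process inserts the received row at the local
        # position h, all processes then apply the same column operations to their
        # local rows, and finally the non-owners remove the inserted row again.
        y = (s - 1) % q                            # ROW-TASK(s)
        h = max(0, (s - z - 2 + q) // q)           # ROW-NUM(s) - 1 (0-based)
        row = A_local[h] if y == z else None
        row = comm.bcast(row, root=y)              # BROADCAST / BROADCAST-RECEIVE
        if y != z:
            A_local.insert(h, row)
        # ... column operations on all rows of A_local (and on V) would go here ...
        if y != z:
            del A_local[h]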

Theorem 4.1. Let p = max{m, n}. Algorithm 4.2 needs O(p^2) BROADCAST operations, and O(p^3) ring elements are sent.

Proof. Each BROADCAST operation lies inside 2 nested FOR loops, and with each BROADCAST at most p ring elements are sent.

It is easy to see that a parallel Chou-Collins version of HNF Algorithm 4.2 would need O(p^3) BROADCAST operations, which is too much for an efficient parallelization of this algorithm. For this reason, we have only parallelized the original HNF Algorithm 3.1.

4.3. Algorithm PAR-DIAGTOSMITH

As we did not find an efficient parallelization of DIAGTOSMITH and as the original procedure is very fast in practice, we decided to parallelize it trivially. More precisely, each diagonal element is broadcast from the process it lies on to all other processes, so that the operations of the original algorithm can be performed successively. Although this parallelization does not save time, it is in general necessary, because for large matrices the memory of one process might not be large enough for the complete transformation matrices.

INPUT: Cardinality of processes q, number of its own process z,
A ∈ R^{k_row(z),n} with Σ_{z=0}^{q−1} k_row(z) = m, whole matrix Ā ∈ R^{m,n} in diagonal form,
U = [u_1, ..., u_m]^T ∈ R^{m,k_row(z)}, V = [v_1, ..., v_n] ∈ R^{k_col(z),n} with Σ_{z=0}^{q−1} k_col(z) = n
 1  FOR k = 1, ..., min{m, n} − 1
 2    FOR l = min{m, n} − 1, ..., k
 3      y_1 = ROW-TASK(l)
 4      y_2 = ROW-TASK(l + 1)
 5      IF y_1 = z
 6        THEN h_1 = ROW-NUM(l)
 7             g_1 = A_{h_1,l}
 8             BROADCAST number g_1
 9        ELSE BROADCAST-RECEIVE number g_1
10      IF y_2 = z
11        THEN h_2 = ROW-NUM(l + 1)
12             g_2 = A_{h_2,l+1}
13             BROADCAST number g_2
14        ELSE BROADCAST-RECEIVE number g_2
15      Compute d := gcd(g_1, g_2) and u, v with d = u·g_1 + v·g_2
16      IF y_1 = z
17        THEN IF g_1 ∤ g_2
18          THEN A_{h_1,l} = gcd(g_1, g_2)
19      IF y_2 = z
20        THEN IF g_1 ∤ g_2
21          THEN A_{h_2,l+1} = (g_1·g_2) / gcd(g_1, g_2)
22      IF g_1 ∤ g_2
23        THEN [u_l, u_{l+1}]^T = ( u, v ; −g_2/d, g_1/d ) · [u_l, u_{l+1}]^T
24             [v_l, v_{l+1}] = [v_l, v_{l+1}] · ( 1, −v·g_2/d ; 1, u·g_1/d )
25  FOR l = 1, ..., min{m, n}
26    y = ROW-TASK(l)
27    IF y = z
28      THEN h = ROW-NUM(l)
29           IF A_{h,l} ≠ 0
30             THEN A_{h,l} = β(A_{h,l}) · A_{h,l}
31                  v_l = β(A_{h,l}) · v_l
OUTPUT: (A, U, V) = PAR-DIAGTOSMITH(A, U, V)

Correctness: The elements Ā_{l,l} and Ā_{l+1,l+1} of the whole matrix lie on the processes y_1 and y_2, where they are the elements A_{h_1,l} and A_{h_2,l+1}, respectively. A_{h_1,l} is broadcast from y_1 and A_{h_2,l+1} from y_2, so that the elements Ā_{l,l} and Ā_{l+1,l+1} are known on all processes. The new A_{h_1,l} is computed on process y_1 and the new A_{h_2,l+1} on process y_2. The computation of the transformation matrices comes from (1).

In this algorithm O(p^2) BROADCAST operations are performed and O(p^2) ring elements are sent with BROADCAST. We use two versions of this algorithm, one for row distribution and one for column distribution.

4.4. Auxiliary algorithms PAR-ROWTOCOL and PAR-COLTOROW

For the SNF algorithms, we have to change between a row distributed matrix and a column distributed matrix and vice versa. We call these algorithms PAR-ROWTOCOL and PAR-COLTOROW, respectively. In fact, both procedures are parallel matrix transpositions. As in the following SNF algorithms matrix transpositions are only called a few times (for the Kannan-Bachem SNF Algorithm 4.5 on average not more than 4 to 5 times, for the Hartley-Hawkes SNF Algorithm 4.6 on average not much more than min{m, n} times), we use the following simple implementation (see Fig. 2(a) and Fig. 2(b)).

Let A ∈ R^{k_row(z),n} with Σ_{z=0}^{q−1} k_row(z) = m be row distributed. A is transformed into a matrix B ∈ R^{m,k_col(z)} with Σ_{z=0}^{q−1} k_col(z) = n. Every process communicates with every other process. For a process x the SEND operation has the form SEND {A_{s,t} | 1 ≤ s ≤ k_row(x), 1 ≤ t ≤ n} TO COL-TASK(t). For a process x the SEND-RECEIVE operation has the form SEND-RECEIVE {B_{s,t} | 1 ≤ s ≤ m, 1 ≤ t ≤ k_col(x)} FROM ROW-TASK(s). This algorithm does not need to be applied to the corresponding transformation matrices, as the left hand transformation matrix U is always transformed by row operations, i.e., it is column distributed, and the right hand transformation matrix V is always transformed by column operations, i.e., it is row distributed. We obtain the algorithm PAR-COLTOROW from the algorithm PAR-ROWTOCOL by exchanging the roles of rows and columns.
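Which entries have to be sent where follows directly from the two distributions. The following Python sketch (our own helper with our own names, without the actual MPI calls) computes the SEND sets of one process for PAR-ROWTOCOL.

    def rowtocol_send_sets(A_local, z, q, n):
        # A_local: the local rows of process z under the cyclic row distribution
        # (global row indices z+1, z+1+q, ...); n: number of columns of the whole matrix.
        # Returns a dict mapping each destination process x to the entries
        # (global_row, global_col, value) that z has to send to x, because
        # column t belongs to process COL-TASK(t) = (t - 1) mod q.
        send = {x: [] for x in range(q)}
        for i, row in enumerate(A_local):
            s = z + 1 + i * q                      # global index of the i-th local row
            for t in range(1, n + 1):
                send[(t - 1) % q].append((s, t, row[t - 1]))
        return send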

4.5. Parallel Kannan-Bachem SNF algorithm

INPUT: Cardinality of processes q, number of its own process z,
A ∈ R^{k_row(z),n} with Σ_{z=0}^{q−1} k_row(z) = m, whole matrix Ā ∈ R^{m,n},
U = [u_1, ..., u_m]^T ∈ R^{m,k_row(z)}, V = [v_1, ..., v_n] ∈ R^{k_col(z),n} with Σ_{z=0}^{q−1} k_col(z) = n
1  WHILE (Ā is not in diagonal form)
2    (A, V) = PAR-HNF(A, V)
3    B = PAR-ROWTOCOL(A)   (B ∈ R^{m,k_col(z)})
4    (B, U) = PAR-LHNF(B, U)
5    A = PAR-COLTOROW(B)
6  (A, U, V) = PAR-DIAGTOSMITH(A, U, V)
OUTPUT: (A, U, V) = PAR-KB-SNF(A, U, V)

Theorem 4.2. [20] Let Ā ∈ R^{m,n} with p = max{m, n}, and R = Z and R = F[x], respectively. In Algorithm 4.5, O(p^4 log_2(p‖Ā‖_∞)) and O(p^4 ⌈Ā⌉_deg) BROADCAST operations are performed, respectively, and O(p^5 log_2(p‖Ā‖_∞)) and O(p^5 ⌈Ā⌉_deg) ring elements are sent with BROADCAST, respectively. Further, O(q^2 p^2 log_2(p^2 ‖Ā‖_∞)) and O(q^2 p^2 ⌈Ā⌉_deg) SEND operations are performed, respectively, and O(p^4 log_2(p‖Ā‖_∞)) and O(p^4 ⌈Ā⌉_deg) ring elements are sent with SEND, respectively.

4.6. Parallel Hartley-Hawkes SNF algorithm

INPUT: Cardinality of processes q, number of its own process z,
A = [a_1, ..., a_{k_row(z)}]^T ∈ R^{k_row(z),n} with Σ_{z=0}^{q−1} k_row(z) = m,
U = [u_1, ..., u_m]^T ∈ R^{m,k_row(z)}, V = [v_1, ..., v_n] ∈ R^{k_col(z),n} with Σ_{z=0}^{q−1} k_col(z) = n
 1  l = 1
 2  WHILE l ≤ min{m, n}
 3    y = ROW-TASK(l)
 4    h = ROW-NUM(l)
 5    IF y = z
 6      THEN BROADCAST vector a_h
 7      ELSE BROADCAST-RECEIVE vector v FROM y
 8           Insert v as h-th row vector of A
 9    IF NOT (A_{h,l+1:n}) = 0
10      THEN A = ROWGCD(A, V, h, l)
11    IF NOT y = z
12      THEN Remove the h-th row vector of A
13    B = PAR-ROWTOCOL(A)   (B ∈ R^{m,k_col(z)})
14    y = COL-TASK(l)
15    h = COL-NUM(l)
16    IF y = z
17      THEN BROADCAST vector b_h (the h-th column of B)
18      ELSE BROADCAST-RECEIVE vector g FROM y
19           Insert g as h-th column vector of B
20    IF (B_{l+1:m,h}) = 0
21      THEN l = l + 1
22      ELSE B = COLGCD(B, U, l, h)
23    IF NOT y = z
24      THEN Remove the h-th column vector of B
25    A = PAR-COLTOROW(B)
26  (A, U, V) = PAR-DIAGTOSMITH(A, U, V)
OUTPUT: (A, U, V) = PAR-HH-SNF(A, U, V)

For the proof of the correctness we refer to [20].

Theorem 4.3. [20] Let Ā ∈ R^{m,n} with p = max{m, n}, and R = Z and R = F[x], respectively. In Algorithm 4.6, O(p^2 log_2(p‖Ā‖_∞)) and O(p^2 ⌈Ā⌉_deg) BROADCAST operations are performed, respectively, and O(p^3 log_2(p‖Ā‖_∞)) and O(p^3 ⌈Ā⌉_deg) ring elements are sent with BROADCAST, respectively. Further, O(q^2 p^2 log_2(p^2 ‖Ā‖_∞)) and O(q^2 p^2 ⌈Ā⌉_deg) SEND operations are performed, respectively, and O(p^4 log_2(p‖Ā‖_∞)) and O(p^4 ⌈Ā⌉_deg) ring elements are sent with SEND, respectively.

In comparison to the PAR-KB-SNF Algorithm, the complexity of the BROADCAST operations is improved by a factor of p^2, whereas the complexity of the SEND operations remains unchanged.

5. Experiments with the parallel versions of the normal form algorithms

The original algorithms of this paper were implemented in the language C++ with the compiler g++, version 3.4.4, and the parallel programs with mpicc, version 3.4.4. The sequential and parallel experiments were made on 32 nodes with 2 Intel Xeon 2.4 GHz processors each, with 1 GB of main memory for the first 32 processors and 0.5 GB for the last 32 processors. For the parallel experiments we use up to 64 processors, where 2 processors belong to a node. Every process of our parallel program runs on one of these processors. Additionally, we compare our results with the program package MAGMA (abbreviated by "MM"), V2.13-6, under Linux on a GenuineIntel Intel(R) Pentium(R) 4 CPU 3.00 GHz processor with main memory 1 GB, and with MAPLE (abbreviated by "MP") 6 under SunOS on 4 sparcv9 floating point processors with 1281 MHz and main memory 16 GB.

We do experiments with matrices over the rings Q[x], F3[x], and F5[x] (for the results for the rings Z and F2[x] we refer to [20]). Note that MAPLE only works for the ring Q[x]. The execution times are given in the form "hh:mm:ss" (hours:minutes:seconds). The tests with MAGMA and MAPLE were stopped after 24 hours. If a time is not listed, it means that the corresponding algorithm needed more than 24 hours.

As first test class we used the matrices B_n = (b_{s,t})_{1≤s,t≤n} − x · E_n, where the b_{s,t} are randomly chosen from [−99, 100] for Q[x], from {0, 1, 2} for F3[x], and from {0, 1, 2, 3, 4} for F5[x]. These are characteristic matrices with full rank. For the rings F3[x] and F5[x] we also used a second test class, namely the matrices C_n^q = (c_{s,t} = p_{s−1}^{t−1} mod q) for 1 ≤ s, t ≤ n, where (p_s(x))_{s≥0} = (0, 1, 2, x, x+1, x+2, x^2, x^2+1, ...) for the ring F3[x] and (p_s(x))_{s≥0} = (0, 1, 2, 3, 4, x, x+1, x+2, x+3, x+4, x^2, x^2+1, ...) for the ring F5[x], and where q is one of the irreducible polynomials q1(x) = x^5 + x^4 + x + 2, q2(x) = 2x^4 + 3x + 3.
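As an illustration of the first test class, characteristic matrices of the type B_n over F3[x] can be generated as follows (a Python sketch with polynomials stored as coefficient lists; this is our own illustration with our own names, not the generator used for the experiments).

    import random

    def random_char_matrix(n, p=3, seed=0):
        # B_n = (b_{s,t}) - x * E_n over F_p[x]: random constants b_{s,t} in {0, ..., p-1},
        # with b_{s,s} - x on the diagonal; each entry is a coefficient list [c0, c1, ...]
        # (so [b, p - 1] represents b - x modulo p).
        rng = random.Random(seed)
        B = []
        for s in range(n):
            row = []
            for t in range(n):
                b = rng.randrange(p)
                row.append([b, p - 1] if s == t else [b])
            B.append(row)
        return B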

5.1. Efficiency

An important criterion for the quality of a parallel algorithm is the efficiency E, depending on the cardinality of processes q, which we define as E(q) = Ts / (q · Tq), where Tq is the execution time of the parallel algorithm with q processes and Ts the execution time of the corresponding sequential (original) algorithm. The efficiency is the percentage of the total parallel execution time devoted to computation in comparison to that devoted to communication. In general, if the efficiency is large, the quality of the parallel program is quite good, but it is also rather important that the efficiency remains constant (or nearly constant) for a larger cardinality of processes. The efficiency is at most 1.

We computed the normal forms with corresponding transformation matrices of a matrix of medium size on 1, 2, 4, 8, 16, 32 and 64 processes with the PAR-HNF Algorithm, the PAR-KB-SNF Algorithm and the PAR-HH-SNF Algorithm. We also used the corresponding original algorithms. For every parallel normal form computation and for each cardinality of processes 1, 2, 4, 8, 16, 32 and 64, we give the efficiency in % and also show the behaviour graphically. The results of the PAR-HNF Algorithm, the PAR-KB-SNF Algorithm, the PAR-HH-SNF Algorithm and of MAGMA and MAPLE for the full-rank matrix B16 of the ring Q[x], the full-rank matrix C240^q1 of the ring F3[x], and the full-rank matrix C200^q2 of the ring F5[x] can be found in Table/Fig. 3. Note that MAGMA and MAPLE supply only one SNF version (we do not know whether KB or HH or another version is implemented).

Results for the ring Q[x]: Q[x] is a very difficult ring, as the entries can explode in two ways: in the degrees of the polynomials and in the numerators/denominators of the coefficients. Therefore it is only possible to test such a small matrix as B16. Obviously it makes no sense to use more processes than the row or column number of the matrix (in this case 16). The PAR-KB-SNF Algorithm and the PAR-HNF Algorithm are a little bit faster than the PAR-HH-SNF Algorithm, but the efficiency is nearly the same. The efficiency is smaller for a large cardinality of processes. MAGMA and MAPLE are faster for the HNF, but not competitive for the SNF.

Results for the ring F3[x]: The PAR-HH-SNF Algorithm is faster than the PAR-HNF Algorithm and the PAR-KB-SNF Algorithm. The efficiency of the PAR-HNF Algorithm is the best one, and the efficiency of the PAR-KB-SNF Algorithm is larger than that of the PAR-HH-SNF Algorithm for a small cardinality of processes and smaller for a large cardinality of processes. MAGMA is faster than 4 processes of the PAR-HNF Algorithm and not competitive for the SNF.

Results for the ring F5[x]: The results are similar to those for the ring F3[x]. The PAR-HH-SNF Algorithm is much faster than the PAR-HNF Algorithm and the PAR-KB-SNF Algorithm, but considering the efficiency the PAR-HNF Algorithm is the best one, and the PAR-KB-SNF Algorithm is better than the PAR-HH-SNF Algorithm. Again, MAGMA is faster than 4 processes of the PAR-HNF Algorithm and not competitive for the SNF.

5.2. Large example matrices

For all rings we want to find out the maximum number of rows and columns of an input matrix for which we are able to compute the HNF/SNF. The results of the PAR-HNF Algorithm, the algorithms PAR-KB-SNF and PAR-HH-SNF, and of MAGMA and MAPLE for the largest possible example matrices can be found in Table 1, where one process per row/column is used for Q[x] and 64 processes are used for F3[x] and F5[x]. Note that MAGMA and MAPLE supply only one SNF version.

Results for the ring Q[x]: We succeeded in computing the normal forms of B26, B28, B30, and B32, where the maximum memory (80 MB) was used by the PAR-HH-SNF Algorithm for B32. As for B16 from section 5.1, MAGMA and MAPLE are faster for the HNF, but not competitive for the SNF.

Results for the ring F3[x]: We succeeded in computing the normal forms of the full-rank matrices B700, B800 and the matrices C340^q1, C360^q1 of rank 243, where the maximum memory (397 MB) was used by the PAR-HH-SNF Algorithm for C360^q1, i.e., the available memory is nearly exhausted, and thus matrices with much larger row/column numbers cannot be computed.

Q[x]: B16        PAR-HNF             PAR-KB-SNF          PAR-HH-SNF
Processes        Time       Eff.     Time       Eff.     Time       Eff.
1 (orig.)        00:02:31   –        00:02:31   –        00:03:10   –
1 (par.)         00:02:33   99 %     00:02:34   98 %     00:03:14   98 %
2                00:01:24   90 %     00:01:24   90 %     00:01:46   90 %
4                00:00:49   77 %     00:00:49   77 %     00:01:02   77 %
8                00:00:32   59 %     00:00:33   57 %     00:00:41   58 %
16               00:00:24   39 %     00:00:24   39 %     00:00:30   40 %
MAGMA            00:00:06   –        –          –        –          –
MAPLE            00:00:21   –        01:25:54   –        01:25:54   –

F3[x]: C240^q1   PAR-HNF             PAR-KB-SNF          PAR-HH-SNF
Processes        Time       Eff.     Time       Eff.     Time       Eff.
1 (orig.)        03:03:49   –        03:03:56   –        02:06:21   –
1 (par.)         03:07:49   98 %     03:08:54   97 %     02:11:26   96 %
2                01:36:28   95 %     01:37:44   94 %     01:07:55   93 %
4                00:48:44   94 %     00:50:00   92 %     00:34:46   91 %
8                00:25:33   90 %     00:26:50   86 %     00:18:16   86 %
16               00:13:44   84 %     00:15:03   76 %     00:09:54   80 %
32               00:07:56   72 %     00:09:20   62 %     00:05:52   67 %
64               00:04:57   58 %     00:06:32   44 %     00:04:01   49 %
MAGMA            00:32:45   –        –          –        –          –

F5[x]: C200^q2   PAR-HNF             PAR-KB-SNF          PAR-HH-SNF
Processes        Time       Eff.     Time       Eff.     Time       Eff.
1 (orig.)        02:57:48   –        02:58:13   –        00:31:48   –
1 (par.)         03:06:26   95 %     03:07:20   95 %     00:35:09   90 %
2                01:36:32   92 %     01:37:39   91 %     00:18:30   86 %
4                00:50:55   87 %     00:51:55   86 %     00:09:32   83 %
8                00:26:32   84 %     00:27:31   81 %     00:05:03   79 %
16               00:14:20   78 %     00:15:19   73 %     00:02:48   71 %
32               00:08:26   66 %     00:09:27   59 %     00:01:46   56 %
64               00:05:26   51 %     00:06:39   42 %     00:01:15   40 %
MAGMA            00:32:56   –        08:26:18   –        08:26:18   –
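As a consistency check of the efficiency definition E(q) = Ts / (q · Tq) from section 5.1 against the tables above: for the PAR-HNF Algorithm on C240^q1 with 64 processes we have Ts = 03:03:49 = 11029 s and T64 = 00:04:57 = 297 s, so E(64) = 11029 / (64 · 297) ≈ 0.58, i.e., the listed 58 %.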

Figure 3: Execution time and efficiency for a special instance for the rings Q[x], F3[x], F5[x]: the three panels plot the efficiency in % against the cardinality of processes for (a) Q[x]: B16, (b) F3[x]: C240^q1, (c) F5[x]: C200^q2

Q[x]       PAR-HNF    MM-HNF     MP-HNF     PAR-KB-SNF   PAR-HH-SNF
B26        00:23:58   00:03:20   00:05:44   00:23:59     00:46:30
B28        00:45:26   00:05:23   00:09:15   00:45:28     01:34:28
B30        01:22:59   00:08:46   00:15:35   01:23:05     03:03:48
B32        02:28:14   00:14:08   00:24:41   02:28:17     05:51:45

F3[x]      PAR-HNF    MM-HNF     PAR-KB-SNF   PAR-HH-SNF
B700       03:21:21   –          03:39:27     00:23:43
B800       06:30:10   –          06:47:33     00:43:25
C340^q1    00:06:45   00:33:45   00:12:09     00:08:55
C360^q1    00:06:56   00:37:53   00:13:37     00:10:18

F5[x]      PAR-HNF    MM-HNF     PAR-KB-SNF   PAR-HH-SNF
B650       03:15:37   –          03:24:04     00:16:17
B700       04:29:01   –          04:40:12     00:21:54
C320^q2    00:41:13   05:03:19   00:45:52     00:08:38
C340^q2    00:57:01   05:55:02   01:02:50     00:11:15

Table 1: Execution time for large example matrices of the rings Q[x], F3[x], F5[x]

Results for the ring F5[x]: We succeeded in computing the normal forms of the full-rank matrices B650, B700 and the full-rank matrices C320^q2, C340^q2, i.e., in comparison to the ring F3[x], matrices with a little bit smaller dimensions can be computed. Here the maximum memory (240 MB) was used by the PAR-KB-SNF Algorithm for C240^q1. In most cases, matrices with larger row/column numbers could not be computed by at least one algorithm. For example, the SNF of the matrix B800 could not be computed by the PAR-HH-SNF Algorithm.

5.3. Data distribution

Another important criterion for the quality of a parallel program is the data distribution, i.e., whether the data are nearly equally distributed to all processes. Especially in our case, where the data might be too large for one process, this criterion is essential for the algorithms. One hint that the data distribution does not lead to load imbalances is the quite good efficiency shown in the experiments of section 5.1, as with a bad data distribution one process would probably receive more work than another one, leading to a bad efficiency.

To show the data distribution, we consider the combination of algorithm and matrix from section 5.1 with the largest memory requirement for one process, which is the PAR-HH-SNF Algorithm for the ring F3[x], applied to the matrix C240^q1. In Fig. 4(a) we graphically show the maximum used memory in MB for 1, 2, 4, 8, 16, 32, and 64 processes. We observe that the used memory is significantly reduced by each step from one cardinality of processes to the double of this cardinality. Overall the used memory is reduced from 919 MB for 1 process to 118 MB for 64 processes, i.e., we obtain nearly a reduction factor of 8.

In section 4.1 we have suggested a rather natural data distribution. As shown in section 5.1, this distribution leads to a good efficiency, and as shown by the previous example, the memory reduction for increasing cardinality of

17 1000 130 HH-SNF Distribution 1 900 Distribution 2 125 Distribution 3 800 120 700

600 115

500 110

400 105 300 Used memory in MB 100 Maximum used memory in MB 200

100 95 1 2 4 8 16 32 64 1 2 4 8 16 32 64 Cardinality of processes Process number (a) Maximum used memory of the PAR-HH- (b) Used memory of each process of the PAR- SNF Algorithm for the ring F3[x], applied to HNF Algorithm with 64 processes for the ring q1 the matrix C240 Z, applied to the matrix A991

Figure 4: Memory requirements

processes is also good. But it is still not clear whether our data distribution can essentially be improved or not. As already mentioned in section 4.1, the data distribution suggested in [29] is a possible alternative choice. To analyse this effect, we additionally apply three versions of the PAR-HNF Algorithm for the ring Z to the full-rank matrix A991. The three versions only differ in the row distribution used. The first distribution is our original distribution, the second one is that of [29]. As the third one we test the following even more natural distribution: if the cardinality of processes q is a divisor of m, the process with the number z receives the rows z·(m/q)+1, z·(m/q)+2, ..., z·(m/q)+m/q for z = 0, ..., q−1. If q is not a divisor of m, with m = q·s+t and 0 < t < q, the first t processes receive an additional row (which also holds for our original distribution).

Fig. 4(b) shows the used memory for each of the three distributions and for each of the 64 processes. For distribution 1 we observe a nearly constant memory with only one strong decrease at process 30. This decrease comes from the equality 31 = 991 mod 64, as up to process 30 the processes receive an additional row. Distribution 2 also has a nearly constant memory, but the memory values are permuted in comparison to distribution 1. Distribution 3 is the worst one, as it has small memory for the first processes and very large memory for the last processes. The reason for this bad behaviour is that the coefficient explosion during the PAR-HNF Algorithm is stronger for rows with larger numbers. For example, the last process receives all rows with the largest numbers and with the strongest coefficient explosion. Then this process has to do the most work and needs the largest memory.

The effects of the three distributions can also be seen in the execution times measured on the Intel Xeon 2.4 GHz machines. The PAR-HNF Algorithm based on distribution 1 leads to an execution time of 01:56:17 and distribution 2 to 01:55:37, where (as shown by further experiments) the difference comes only from execution time inaccuracies. In comparison, distribution 3 leads to a much worse execution time of 05:14:48, i.e., this distribution would lead to a much worse efficiency.
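For illustration, the owner functions of distribution 1 (ours) and distribution 3 (contiguous blocks) can be written down directly; the following Python sketch uses our own names, and distribution 2 from [29] is omitted.

    def owner_cyclic(l, m, q):
        # Distribution 1: row l (1-based) goes to process (l - 1) mod q.
        return (l - 1) % q

    def owner_contiguous(l, m, q):
        # Distribution 3: consecutive blocks of rows; if q does not divide m
        # (m = q*s + t, 0 < t < q), the first t processes get one extra row.
        s, t = divmod(m, q)
        cut = t * (s + 1)                      # rows 1..cut live on processes 0..t-1
        return (l - 1) // (s + 1) if l <= cut else t + (l - cut - 1) // s

    m, q = 991, 64                             # as in the experiment: 991 = 64*15 + 31
    print(owner_contiguous(991, m, q))         # 63: the last rows all go to the last process
    print(sorted({owner_cyclic(l, m, q) for l in range(976, 992)}))
    # [15, 16, ..., 30]: the last 16 rows are spread over 16 different processes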

Acknowledgement

We would like to thank Volker Gebhardt for some helpful discussions, the University of Halle-Wittenberg for the platform for the parallel experiments and the anonymous referees for their comments, which helped us to improve the paper. The research of both authors was supported by DFG (Germany).

References

[1] W.A. Blankinship, Algorithm 287: Matrix Triangulation with Integer Arithmetic [F1], Comm. ACM 9(7) (1966) 513.

[2] G.H. Bradley, Algorithms for Hermite and Smith Normal Matrices and Linear Diophantine Equations, Math. Comp. 25(116) (1971) 897-907.

[3] R.P. Brent, Parallel Algorithms in Linear Algebra, Algorithms and Architectures: Proc. Second NEC Research Symposium, 1993, pp. 54-72.

[4] S. Chatterjee, S. Sen, Cache-Efficient Matrix Transposition, in: Proc. 6th International Sym- posium on High-Performance Computer Architecture (IEEE Computer Society), 2000, pp. 195-205.

[5] J. Choi, J. Dongarra, D.W. Walker, Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers, Parallel Comput. 21(9) (1995) 1387-1405.

[6] T.W.J. Chou, G.E. Collins, Algorithms for the Solution of Systems of Linear Diophantine Equations, SIAM J. Comput. 11(4) (1982) 687-708.

[7] J. Dongarra et al., Sourcebook of Parallel Computing, Morgan Kaufmann Publishers, Inc., San Francisco, 2003

[8] X.G. Fang, G. Havas, On the Worst-Case Complexity of Integer Gaussian Elimination, in: Proc. International Symposium on Symbolic and Algebraic Computation, ACM Press, 1997, pp. 28-31.

[9] A. Fujii, R. Suda, A. Nishida, Parallel Matrix Distribution Library for Sparse Matrix Solvers, in: Proc. 8th International Conference on High-Performance Computing in Asia-Pacific Region, IEEE Computer Society, 2005, pp. 213-219.

[10] M. Giesbrecht, Fast Computation of the Smith Normal Form of an Integer Matrix, in: Proc. International Symposium on Symbolic and Algebraic Computation, ACM Press, 1995, pp. 110-118.

[11] J.L. Hafner, K.S. McCurley, Asymptotically Fast Triangularization of Matrices over Rings, SIAM J. Comput. 20(6) (1991) 1068-1083.

[12] B. Hartley, T.O. Hawkes, Rings, Modules and Linear Algebra, Chapman and Hall, London, 1970.

[13] G. Havas, D.F. Holt, S. Rees, Recognizing Badly Presented Z-Modules, Linear Algebra Appl. 192 (1993) 137-163.

[14] G. Havas, B.S. Majewski, Hermite Normal Form Computation for Integer Matrices, Congr. Numer. 105 (1994) 87-96.

[15] G. Havas, B.S. Majewski, Integer Matrix Diagonalization, J. Symbolic Comput. 24(3/4) (1997) 399-408.

[16] G. Havas, L.S. Sterling, Integer Matrices and Abelian Groups, in: Proc. International Sympo- sium on Symbolic and Algebraic Manipulation, Lecture Notes in Comput. Sci., 72, Springer, New York, 1979, pp. 431-451.

[17] G. Havas, C. Wagner, Matrix Reduction Algorithms for Euclidean Rings, in: Proc. Asian Symposium on Computer Mathematics, Lanzhou University Press, 1998, pp. 65-70.

[18] D. Heller, A Survey of Parallel Algorithms in Numerical Linear Algebra, SIAM Review 20(4) (1978) 740-777.

[19] C. Hermite, Sur l'introduction des variables continues dans la théorie des nombres, J. Reine Angew. Math. 41 (1851) 191-216.

[20] G. Jäger, Parallel Algorithms for Computing the Smith Normal Form of Large Matrices, in: Proc. 10th European PVM/MPI, Lecture Notes in Comput. Sci. 2840, Springer, Berlin-Heidelberg, 2003, pp. 170-179.

[21] E. Kaltofen, M.S. Krishnamoorthy, B.D. Saunders, Fast Parallel Computation of Hermite and Smith Forms of Polynomial Matrices, SIAM J. Algebraic and Discrete Methods 8(4) (1987) 683-690.

[22] E. Kaltofen, M.S. Krishnamoorthy, B.D. Saunders, Parallel Algorithms for Matrix Normal Forms, Linear Algebra Appl. 136 (1990) 189-208.

[23] M. Kaminski, A. Paz, Computing the Hermite Normal Form of an Integral Matrix, Tech. Rep. Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel, June 1986.

[24] R. Kannan, Polynomial-Time Algorithms for Solving Systems of Linear Equations over Poly- nomials, Theoret. Comput. Sci. 39 (1985) 69-88.

[25] R. Kannan, A. Bachem, Polynomial Algorithms for Computing the Smith and Hermite Normal Forms of an Integer Matrix, SIAM J. Comput. 8(4) (1979) 499-507.

[26] H.-J. Lee, J.A.B. Fortes, Toward Data Distribution Independent Parallel Matrix Multiplication, in: Proc. 9th International Parallel Processing Symposium, IEEE Computer Society, 1995, pp. 436-440.

[27] F.T. Leighton, Introduction to Parallel Algorithms and Architectures, Morgan Kaufmann Publishers, Inc., San Francisco, 1992.

[28] G.O. Michler, R. Staszewski, Diagonalizing Characteristic Matrices on Parallel Machines, Preprint 27, Institut für Experimentelle Mathematik, Universität/GH Essen, 1995.

[29] M. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw Hill, Columbus, 2004.

[30] C.C. Sims, Computation with Finitely Presented Groups, Cambridge University Press, 1994.

[31] H.J.S. Smith, On Systems of Linear Indeterminate Equations and Congruences, Philos. Trans. R. Soc. Lond. 151 (1861) 293-326.

[32] A. Storjohann, Near Optimal Algorithms for Computing Smith Normal Forms of Integer Matrices, in: Proc. International Symposium on Symbolic and Algebraic Computation, ACM Press, 1996, pp. 267-274.

[33] A. Storjohann, Computing Hermite and Smith Normal Forms of Triangular Integer Matrices, Linear Algebra Appl. 282(1-3) (1998) 25-45.

[34] A. Storjohann, G. Labahn, A Fast Las Vegas Algorithm for Computing the Smith Normal Form of a Polynomial Matrix, Linear Algebra Appl. 253(1) (1997) 155-173.

[35] J. Suh, V.K. Prasanna, An Efficient Algorithm for Out-of-Core Matrix Transposition, IEEE Trans. Computers 51(4) (2002) 420-438.

[36] H.A. van der Vorst, P. van Dooren (Eds.), Parallel Algorithms for Numerical Linear Algebra, Advances in Parallel Computing 1, North-Holland, 1990.

[37] G. Villard, Fast Parallel Computation of the Smith Normal Form of Polynomial Matrices, in: Proc. International Symposium on Symbolic and Algebraic Computation, ACM Press, 1994, pp. 312-317.
