Implementation Techniques for Fast Polynomial Arithmetic in a High-Level Programming Environment
Total Page:16
File Type:pdf, Size:1020Kb
Implementation Techniques For Fast Polynomial Arithmetic In A High-level Programming Environment Akpodigha Filatei Xin Li Marc Moreno Maza ORCCA, University of Western ORCCA, University of Western ORCCA, University of Western Ontario (UWO) Ontario (UWO) Ontario (UWO) London, Ontario, Canada London, Ontario, Canada London, Ontario, Canada afi[email protected] [email protected] [email protected] Eric´ Schost LIX, Ecole´ polytechnique 91128 Palaiseau, France [email protected] ABSTRACT multiplication is “better than the classical method approxi- Though there is increased activity in the implementation of mately when n + m ≥ 600”, where n and m are the degrees asymptotically fast polynomial arithmetic, little is reported of the input polynomials. In [22] p. 501, quoting [3], Knuth on the details of such effort. In this paper, we discuss how we writes “He (R. P. Brent) estimated that Strassen’s scheme achieve high performance in implementing some well-studied would not begin to excel over Winograd’s until n ≈ 250 and fast algorithms for polynomial arithmetic in two high-level such enormous matrices rarely occur in practice unless they programming environments, AXIOM and Aldor. are very sparse, when other techniques apply.” Two approaches are investigated. With Aldor we rely The implementation of asymptotically fast arithmetic was only on high-level generic code, whereas with AXIOM we not the primary concern of the early computer algebra sys- endeavor to mix high-level, middle-level and low-level spe- tems, which had many other challenges to face. For in- stance, one of the main motivations for the development of cialized code. We show that our implementations are satis- AXIOM factory compared with other known computer algebra sys- the computer algebra system [19] was the design tems or libraries such as Magma v2.11-2 and NTL v5.4. of a language where mathematical properties and algorithms could be expressed in a natural and efficient manner. Nev- Categories and Subject Descriptors: ertheless, successful implementations of the FFT-based uni- I.1.2 [Computing Methodologies]: Symbolic and Alge- variate polynomial multiplication [25] and Strassen’s matrix braic Manipulation – Algebraic Algorithms multiplication [2] have been reported for several decades. General Terms: In the last decade, several software for performing sym- Algorithms, Experimentation, Performance, Theory bolic computations have put a great deal of effort in pro- Keywords: viding outstanding performances, including successful im- High-performance, polynomials, Axiom, Aldor. plementation of asymptotically fast arithmetic. As a result, the general-purpose computer algebra system Magma [5] NTL 1. INTRODUCTION and the Number Theory Library [26, 18] have set world records for polynomial factorization and determining orders Asymptotically fast algorithms for exact polynomial and of elliptic curves. The book Modern Computer Algebra [12] matrix arithmetic have been known for more than forty has also contributed to increase the general interest of the years. Among others, the work of Karatsuba [21], Cooley computer algebra community for these algorithms. and Tukey [6], and Strassen [27] has initiated an intense ac- As to linear algebra, in addition to Magma, let us men- tivity in this area. Unfortunately, its impact on computer tion the C++ template library LinBox [17] for exact linear algebra systems has been reduced until recently. One reason algebra computation with dense, sparse, and structured ma- was, probably, the belief that these algorithms were of very trices over the integers and over finite fields. A cornerstone limited practical interest. In [13] p. 132, referring to [25], of this library is the use of BLAS libraries such as ATLAS the authors state that the FFT-based univariate polynomial to provide high-speed routines for matrices over small finite fields, through floating-point computations [9]. Today, it is common practice to assume that a new al- gorithm, say for GCD computations over products of fields Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are as in [8], can rely on asymptotically fast polynomial multi- not made or distributed for profit or commercial advantage and that copies plication. Therefore, it is desirable not only to offer im- bear this notice and the full citation on the first page. To copy otherwise, to plementations of asymptotically fast arithmetic, but also republish, to post on servers or to redistribute to lists, requires prior specific programming environments for developing new such algo- permission and/or a fee. rithms. In addition, it is also a demand to achieve this goal ISSAC'06, July 9–12, 2006, Genova, Italy. in the context of high-level programming languages, where Copyright 2006 ACM 1-59593-276-3/06/0007 ...$5.00. new ideas can be tested quickly and where algorithms can a given AXIOM or Aldor ring R, the domains SUP(R) and be easily made generic. These are the goals of this paper, DUP(R), for sparse and dense univariate polynomials respec- which reports on implementation techniques for asymptot- tively, provide exactly the same operations; that is they have ically fast algorithms in two high-level programming envi- the same user interface, which is defined by the category ronments, AXIOM and Aldor. We focus on polynomial UnivariatePolynomialCategory(R). But, of course, the im- arithmetic and our test-operations are univariate and mul- plementation of the operations of SUP(R) and DUP(R) is quite tivariate multiplication, and computations of power series different. While SUP(R) implements polynomials with linked inverses as well as GCDs for univariate polynomials. lists of terms, DUP(R) implements them with arrays of coef- Implementing asymptotically fast algorithms for these op- ficients indexed by their degrees. This allows us to specify erations in a high-level programming environment presents a package, FFTPolynomialMultiplication(R, U), parame- several difficulties. First, compilation of high-level generic terized by R, an FFTRing, that is, a ring supporting the FFT; code to machine code through middle-level, say Lisp, and and by U, a domain of UnivariatePolynomialCategory(R). low-level, say C, may lead to a running-time overhead, with respect to carefully hand-written C code. This may reduce 2.1 The Aldor environment the benefit of these algorithms, since they generally involve Aldor can be used both as a compiled and interpreted changes of data representation, whereas classical algorithms language. Code optimization is however only available when work usually in a straightforward manner. Minimizing this used in compiled mode. An Aldor program can be com- overhead is the motivation of our work in Aldor where piled into: stand-alone executable programs; object libraries our entire code is written for univariate polynomials over an in native operating system formats (which can be linked arbitrary field supporting the FFT. Second, compiled and with one another, or with C or Fortran code to form ap- optimized high-level code may not take advantage of some plication programs); portable byte code libraries; and C or hardware features. If writing architecture-aware code can be Lisp source [16]. Code improvements by techniques such done in C [20], this remains a challenge in a non-imperative as program specialization, cross-file procedural integration language like Lisp. Thus, in our second high-level program- and data structure elimination, are performed at intermedi- ming environment, namely AXIOM, we take advantage of ate stages of compilation [28]. This produces code that is every component of the system, by mixing low-level code comparable to hand-optimized C. (C and assembly code), middle-level code (Lisp) and high- level code in the AXIOM language. We develop specialized 2.2 The AXIOM environment Z Z code for univariate and multivariate polynomials over /p AXIOM has both an interactive mode for user interac- where p is a prime; we distinguish also the cases where p is tions and a high level programming language, called SPAD, machine word size and a big integer. for building library modules. In the interactive mode, users AXIOM Section 2 contains an overview of the features of can evaluate arithmetic expressions, declare and define vari- Aldor and systems. In Sections 3 and 4, we discuss our ables, call library functions and define their own functions. Aldor AXIOM implementation techniques in the and en- Programmers can also add new functions to the local AX- vironments. We compare our implementation of asymptot- IOM library. To do so, they need to integrate their code in Magma NTL ically fast algorithms with those of and . In AXIOM type constructors. Section 5 we report on our experiments. Our generic imple- SPAD code is translated into Common Lisp code by a Aldor mentations in are only, approximately, twice slower built-in compiler, then translated into C code by the GCL NTL than those of for comparable operations. Our special- compiler. Finally, GCL makes use of a native C compiler, AXIOM ized implementation in leads to comparable per- such as GCC, to generate machine code. Since these compil- Magma formances and sometimes outperforms those of and ers can generate fairly efficient code, programmers can con- NTL . A review of the algorithms we implemented is given centrate on their mathematical algorithms and write them in appendix. All timings given in this article are obtained in SPAD. However, to achieve higher performance, our im- on a bi-Pentium 4, 2.80GHz machine, with 1 Gb of RAM. plementation also involves Lisp, C, and assembly level code. By modifying the AXIOM makefiles, new Lisp functions can be compiled and made available at SPAD level. Moreover, 2. HIGH LEVEL PROGRAMMING by using the GCL system provided make-function macro, ENVIRONMENT one can add new C functions into the GCL system, then AXIOM and Aldor designers attempted to surmount use them at the GCL and SPAD level.