Parallel Arbitrary-Precision Integer Arithmetic


Western University — Scholarship@Western
Electronic Thesis and Dissertation Repository
3-19-2021 1:30 PM

Parallel Arbitrary-precision Integer Arithmetic
Davood Mohajerani, The University of Western Ontario
Supervisor: Marc Moreno Maza, The University of Western Ontario

A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Computer Science

© Davood Mohajerani 2021

Follow this and additional works at: https://ir.lib.uwo.ca/etd
Part of the Numerical Analysis and Scientific Computing Commons, and the Theory and Algorithms Commons

Recommended Citation
Mohajerani, Davood, "Parallel Arbitrary-precision Integer Arithmetic" (2021). Electronic Thesis and Dissertation Repository. 7674. https://ir.lib.uwo.ca/etd/7674

This Dissertation/Thesis is brought to you for free and open access by Scholarship@Western. It has been accepted for inclusion in Electronic Thesis and Dissertation Repository by an authorized administrator of Scholarship@Western. For more information, please contact [email protected].

Abstract

Arbitrary-precision integer arithmetic computations are driven by applications in solving systems of polynomial equations and public-key cryptography. Such computations arise when high precision is required (with large input values that fit into multiple machine words), or to avoid coefficient overflow due to intermediate expression swell. Meanwhile, the growing demand for faster computation, alongside recent advances in hardware technology, has led to the development of a vast array of many-core and multi-core processors, accelerators, programming models, and language extensions (e.g., CUDA and OpenCL for GPUs, and OpenMP and Cilk for multi-core CPUs). The massive computational power of parallel processors makes them attractive targets for carrying out arbitrary-precision integer arithmetic.
At the same time, developing parallel algorithms, followed by implementing and optimizing them as multi-threaded parallel programs, imposes a set of challenges. This work explains the current state of research on parallel arbitrary-precision integer arithmetic on GPUs and CPUs, and proposes a number of solutions for some of the challenging problems related to this subject.

Keywords: Arbitrary-precision integer arithmetic, FFT, GPU, Multi-core

Summary

Arbitrary-precision integer arithmetic computations are driven by applications in solving systems of polynomial equations and public-key cryptography. Such computations arise when high precision is required. Meanwhile, the growing demand for faster computation, alongside recent advances in hardware technology, has led to the development of a vast array of many-core and multi-core processors, accelerators, programming models, and language extensions. The massive computational power of parallel processors makes them attractive targets for carrying out arbitrary-precision integer arithmetic. At the same time, developing parallel algorithms, followed by implementing and optimizing them as multi-threaded parallel programs, imposes a set of challenges. This work explains the current state of research on parallel arbitrary-precision integer arithmetic on GPUs and CPUs, and proposes a number of solutions for some of the challenging problems related to this subject.

Co-Authorship Statement

• Chapter 2 is a joint work with Liangyu Chen, Svyatoslav Covanov, and Marc Moreno Maza. This work is published in [1]. The contributions of the thesis author include: a new GPU implementation, optimization of arithmetic operations on the GPU, and experimental results (tables, diagrams, profiling information) comparing the presented algorithms against computationally equivalent solutions on CPUs and GPUs.

• Chapter 3 is a joint work with Svyatoslav Covanov, Marc Moreno Maza, and Linxiao Wang. This work is published in [2].
The contributions of the thesis author include: adaptation of the convolution code (originally developed by Svyatoslav Covanov as part of the BPAS library) for multiplying arbitrary elements of a big prime field, specialized arithmetic for the CRT, implementation and tuning of base-case DFT functions, parallelization of the code using Cilk, and collection of experimental results.

• Chapter 4 is a joint work with Alexander Brandt, Marc Moreno Maza, Jeeva Paudel, and Linxiao Wang. A preprint of this work is published in [3]. The contributions of the thesis author include: processing of the annotated CUDA kernel code, generation of instrumented binary code using the LLVM Pass Framework, development of a customized profiler using NVIDIA's EVENT API within the CUPTI API, and collection of experimental results.

• Chapter 5 is a joint work with Marc Moreno Maza. The contributions of the thesis author include: algorithm design and analysis, implementation, code optimization, and collection of experimental results.

Acknowledgements

First and foremost, I would like to thank my supervisor, Professor Marc Moreno Maza. I am grateful for his advice, his support, and for providing me the opportunity to work on multiple fascinating problems. Through our discussions, I have had the chance to learn a handful of lessons, including, but not limited to, the emphasis on first principles, the clear and concise expression of ideas, and trying to systematically model problems with algebraic structures. There were multiple times when I assumed a problem could not be further studied or optimized, and more than once I was proven wrong. This, in essence, has taught me to distinguish the real barriers from the ones that are a creation of my own thinking. Everything that I have worked on has taught me another important lesson that I must always remember, and Voltaire said it best: "Perfect is the enemy of the good."
I am grateful for the insightful comments and questions of the thesis examiners: Professor Michael Bauer, Professor Mark Daley, Professor Arash Reyhani-Masoleh, and Professor Éric Schost. I would like to thank the members of the Ontario Research Centre for Computer Algebra (ORCCA) and the Computer Science Department of the University of Western Ontario. I am thankful to my colleagues and co-authors; I have learned a little bit of what to do and what not to do from each of you. Specifically, I am thankful to Svyatoslav Covanov for laying the theoretical foundation of, and for his further contributions to, our joint work on big prime field FFT. I would also like to thank Alexander Brandt for his help with proofreading Chapter 5 of this thesis as well as our ISSAC 2019 paper (FFT on multi-core CPUs), although he was not a co-author of that work. Last but not least, I am very thankful to my family and friends for their endless support.

Contents

Abstract ii
Summary iii
Co-Authorship Statement iv
Acknowledgements v
List of Figures viii
List of Tables ix

1 Introduction 1
  1.1 Background and motivation 1
  1.2 Challenges and objectives 2

2 Big Prime Field FFT on GPUs 4
  2.1 Introduction 4
  2.2 Complexity analysis 5
  2.3 Generalized Fermat numbers 8
  2.4 FFT Basics 13
  2.5 Blocked FFT on the GPU 14
  2.6 Implementation 16
  2.7 Experimentation 17
  2.8 Conclusion 22
  2.9 Appendix: modular methods and unlucky primes 22

3 Big Prime Field FFT on Multi-core Processors 26
  3.1 Introduction 26
  3.2 Generalized Fermat prime fields 28
  3.3 Optimizing multiplication in generalized Fermat prime fields 29
  3.4 A generic implementation of FFT over prime fields 32
  3.5 Experimentation 37
  3.6 Conclusions and future work 44

4 KLARAPTOR: Finding Optimal Kernel Launch Parameters 45
  4.1 Introduction 45
  4.2 Theoretical foundations 48
  4.3 KLARAPTOR: a dynamic optimization tool for CUDA 54
  4.4 An algorithm to build and deploy rational programs 55
  4.5 Runtime selection of thread block configuration for a CUDA kernel 57
  4.6 The implementation of KLARAPTOR 59
  4.7 Experimentation 62
  4.8 Conclusions and future work 64

5 Arbitrary-precision Integer Multiplication on GPUs 66
  5.1 Introduction 66
  5.2 Essential definitions for designing parallel algorithms 67
  5.3 Choice of algorithm for parallelization 67
  5.4 Problem definition 68
  5.5 A fine-grained parallel multiplication algorithm 69
  5.6 Complexity analysis 70
  5.7 Experimentation 75
  5.8 Discussion 80

6 Conclusion 81

Bibliography 82

Curriculum Vitae 89

List of Figures

2.1 Speedup diagram for computing the benchmark for a vector of size N = K^e (K = 16) for P3 = (2^63 + 2^34)^8 + 1 ... 21
2.2 Speedup diagram for computing the benchmark for a vector of size N = K^e (K = 32) for P4 = (2^62 + 2^36)^16 + 1 ... 21
2.3 Running time of computing DFT_N with N = K^4 on a GTX 760M GPU ... 21
3.1 Ratio (t/t_gmp) of average time spent in one multiplication operation measured
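For concreteness, the two generalized Fermat primes named in the figure captions above can be written down directly. A minimal Python sketch (the names P3 and P4 follow the captions; the bit-length checks and the word-count remarks are my own illustration, not taken from the thesis):

```python
# The generalized Fermat primes referred to as P3 and P4 in the figure
# captions, of the form p = r^k + 1 where the radix r is a sum of two
# powers of two, so that arithmetic modulo p maps well onto fixed-size
# machine-word digits.
P3 = (2**63 + 2**34) ** 8 + 1   # radix r = 2^63 + 2^34, k = 8
P4 = (2**62 + 2**36) ** 16 + 1  # radix r = 2^62 + 2^36, k = 16

# Elements of Z/P3 fit into eight 64-bit machine words, and elements of
# Z/P4 into sixteen, which is what makes these fields "big prime fields".
print(P3.bit_length())  # 505 bits, i.e. at most 8 x 64-bit words
print(P4.bit_length())  # 993 bits, i.e. at most 16 x 64-bit words
```

Python's built-in integers are themselves arbitrary-precision, which makes them a convenient reference point when checking fixed-word implementations against known values.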