Mechanically Verifying Real-Valued Algorithms in ACL2

Copyright by Ruben Antonio Gamboa, 1999

Mechanically Verifying Real-Valued Algorithms in ACL2

by Ruben Antonio Gamboa, B.S., M.C.S.

Dissertation Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

The University of Texas at Austin, May 1999

Approved by Dissertation Committee:

This work is dedicated to my daughter, who inspired me to finish, and to my wife, who would not let me quit.

Acknowledgments

The members of my committee have all contributed greatly to my dissertation, either directly with comments and ideas, or indirectly by providing me with the necessary background. Special thanks are due to Robert Boyer and Matt Kaufmann, the former for sticking with me through the early years of learning ACL2 and choosing a topic, and the latter for serving as a bountiful source of ideas and enthusiasm.

Many other people deserve recognition. Among them are my family for believing in me and supporting me, and my best friend and boss for giving me the time away from work to indulge my academic hobby.

RUBEN ANTONIO GAMBOA
The University of Texas at Austin
May 1999

Abstract

Mechanically Verifying Real-Valued Algorithms in ACL2

Publication No.

Ruben Antonio Gamboa, Ph.D.
The University of Texas at Austin, 1999
Supervisor: Robert S. Boyer

ACL2 is a theorem prover over a total, first-order, mostly quantifier-free logic, supporting defined and constrained functions, equality and congruence rewriting, induction, and other reasoning techniques. With its powerful induction engine and direct support for the rational and complex-rational numbers, ACL2 can be used to verify recursive rational algorithms. However, ACL2 can not be used to reason about real-valued algorithms that involve the irrational numbers. For example, there are elegant proofs of the correctness of the Fast Fourier Transform (FFT) which could be formulated in ACL2, but since ACL2’s sparse number system permits the proof that the square root of two, and hence the eighth principal root of one, does not exist, it is impossible to reason directly about the FFT in ACL2.

This research extends ACL2 to allow reasoning about the real and complex irrational numbers. The modifications are based on non-standard analysis, since infinitesimals are more natural than limits in a quantifier-free context. It is also shown how the trigonometric functions can be defined in the modified ACL2. These definitions are then used to prove that the FFT correctly implements the Fourier Transform.

Contents

Acknowledgments
Abstract
List of Tables
Chapter 1  Introduction
  1.1 Related Work
  1.2 Outline
Chapter 2  Learning from the Square Root Function
  2.1 The Non-Existence of the Square Root of Two
  2.2 Approximating the Square Root Function
Chapter 3  Non-Standard Analysis in ACL2
  3.1 An Introduction to Non-Standard Analysis
  3.2 Non-Standard Analysis Primitives in ACL2
  3.3 The Non-Standard Analysis ACL2 Library
Chapter 4  The Square Root Function Revisited
  4.1 Defining the Square Root Function in ACL2
  4.2 Properties of the Square Root Function
Chapter 5  The Exponential Function
  5.1 Defining the Exponential Function in ACL2
    5.1.1 A Complex Norm
    5.1.2 Geometric Series
    5.1.3 The Definition of the Exponential Function
  5.2 The Exponent of a Sum: e^{x+y} = e^x · e^y
    5.2.1 The Binomial Theorem
    5.2.2 Nested Summations
    5.2.3 Proving e^{x+y} = e^x · e^y
  5.3 The Continuity of the Exponential Function
Chapter 6  Trigonometric Functions
  6.1 Defining the Trigonometric Functions in ACL2
    6.1.1 The Definition of Sine and Cosine
    6.1.2 Alternating Sequences
    6.1.3 The Intermediate Value Theorem
    6.1.4 π for Dessert
  6.2 Basic Trigonometric Identities
  6.3 Computations with the Trigonometric Functions
    6.3.1 Estimating π
    6.3.2 Approximating sine and cosine
Chapter 7  Powerlists in ACL2
  7.1 Regular Powerlists
  7.2 Defining Powerlists in ACL2
    7.2.1 Similar Powerlists
    7.2.2 Regular Powerlists
    7.2.3 Functions on Powerlists
  7.3 Simple Examples
  7.4 Sorting Powerlists
    7.4.1 Merge Sorting
    7.4.2 Batcher Sorting
    7.4.3 A Comparison with the Hand-Proof
  7.5 Prefix Sums of Powerlists
    7.5.1 Simple Prefix Sums
    7.5.2 Ladner-Fischer Prefix Sums
    7.5.3 Comparing with the Hand-Proof Again
    7.5.4 L'agniappe: A Carry-Lookahead Adder
Chapter 8  The Fast Fourier Transform
  8.1 The Fast Fourier Transform
  8.2 Verifying the Fast Fourier Transform in ACL2
  8.3 Specializing the ACL2 Proof
Chapter 9  Conclusion
Appendix A  A Simple Introduction to ACL2
Bibliography
Vita

List of Tables

9.1 An Estimate of the Proof Effort

Chapter 1  Introduction

ACL2 is a total, first-order, mostly quantifier-free logic based on the programming language Common LISP. ACL2 is also an automated theorem prover for this logic. The theorem prover excels in using induction to prove theorems about recursive functions. At its heart is a rewriter, which uses a database of previously proved theorems to transform a term while maintaining an appropriate equivalence relation, e.g., equality or if-and-only-if. In addition, it supports many other inference mechanisms. For instance, numeric inequalities are automatically verified using a linear arithmetic decision procedure, and propositional tautologies can be proved using a decision procedure based on binary decision diagrams (BDDs). Moreover, ACL2 supports the introduction of constrained functions, allowing a limited amount of higher-order reasoning.

ACL2 is a direct descendant of Nqthm, also known as the Boyer-Moore theorem prover. It is fair to say that ACL2 grew out of a desire to “do Nqthm, only better” [40, 12].
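To give a concrete flavor of the recursive definitions and induction proofs just described, here is a minimal ACL2 sketch. It is an illustrative example of the classic kind, not code taken from this dissertation: a recursive append function and an associativity theorem that ACL2 proves by induction on the structure of x.

```lisp
;; Illustrative ACL2 sketch (not from the dissertation): a recursive
;; function over lists and a theorem about it proved by induction.

(defun app (x y)
  ;; Append the elements of the list X onto the front of Y.
  (if (consp x)
      (cons (car x) (app (cdr x) y))
    y))

(defthm app-is-associative
  ;; ACL2 proves this automatically, inducting on X and rewriting
  ;; with the definition of APP.
  (equal (app (app x y) z)
         (app x (app y z))))
```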
Among the enhancements of ACL2 over Nqthm is a richer number system. Nqthm had some native support for the integers, and it was primarily designed for working with the naturals. ACL2, on the other hand, introduced the rational and the complex-rational numbers. However, the irrationals were deliberately excluded from the ACL2 number system. This exclusion stems from the desire to keep ACL2 as close to Common LISP as possible, and Common LISP does not include the irrationals. Since the primary goal of ACL2 is to prove properties about Common LISP functions, it makes sense to exclude objects that do not exist in the Common LISP universe.

However, the omission of the reals places an artificial limit on the theories that ACL2 can verify. For example, an important application of ACL2 is the proof of correctness of various microprocessor algorithms, such as floating-point arithmetic operations, roots, and other transcendental functions. In [54], Russinoff uses ACL2 to prove the correctness of the square root floating-point microcode in the AMD K5 processor. However, since the number √x is possibly irrational for a given rational x, his theory “meticulously avoids any reference” to √x. In particular, the correctness theorem has the (simplified) form

    l² ≤ x ≤ h²  ⟹  rnd(l) ≤ sqrt(x) ≤ rnd(h)

where rnd rounds a number to its nearest floating-point representation and sqrt is the AMD K5 microcode algorithm for finding floating-point square roots. This theorem is equivalent to the desired statement of correctness:

    sqrt(x) = rnd(√x)

However, this equivalence can not be stated in ACL2.

The difference between these two theorems is more than a matter of aesthetics. Consider the norm given by ‖x‖ = √(Re²(x) + Im²(x)), where Re(x) and Im(x) are the real and imaginary parts of x, respectively. An algorithm may require the value ‖x · y‖, but it may compute it using ‖x‖ · ‖y‖ because those values are already known. The results are mathematically equivalent, even though the equality is false when the approximation sqrt(x) is used instead of the function √x.

This is a general phenomenon. The correctness of many algorithms depends on properties that are true of real-valued functions, even though the algorithms are implemented entirely in terms of rational or floating-point values. That is, an algorithm may compute the value of the function f at a point x not directly by approximating the equation defining f(x), but indirectly by approximating the equation defining g(x), where it can be proved that the functions f and g are equal. In the example above, the equation f(x, y) would suggest multiplying x and y and taking the norm of the product, while g(x, y) multiplies the norms ‖x‖ and ‖y‖. The key fact is that in order to prove the correctness of the algorithm following the defining equation g(x), it may be necessary to prove that f(x) = g(x), and such arguments are often impossible without using the language of irrational numbers and irrational functions.

Note, it is quite possible that the defining equation for g(x) is more amenable to approximation using floating-point arithmetic than f(x). This happens when the computation of f(x) accumulates errors, whereas the errors tend to cancel out during the computation of g(x). In the language of numerical analysis, g(x) is called stable and f(x) unstable.
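To make the contrast concrete, here is a small illustrative sketch of my own, not the dissertation's development; the name norm-sq is invented for this example. The squared complex norm never leaves ACL2's complex-rationals, so it can be defined and its product property stated in unmodified ACL2; it is only the square-root step, and hence ‖x‖ itself, that requires the extension to the reals described in this dissertation. The defthm below is true, but ACL2 may need nonlinear-arithmetic lemmas (for example, from the community arithmetic books) before it admits the proof automatically.

```lisp
;; Illustrative sketch only -- not from the dissertation.
;; The squared complex norm stays within ACL2's complex-rationals,
;; so it can be defined without any square-root function.

(defun norm-sq (x)
  ;; Re(x)^2 + Im(x)^2; REALPART and IMAGPART are built into ACL2.
  (+ (* (realpart x) (realpart x))
     (* (imagpart x) (imagpart x))))

;; The squared form of ||x * y|| = ||x|| * ||y||.  This statement is
;; expressible in unmodified ACL2; proving it automatically may require
;; arithmetic lemmas from the community books.
(defthm norm-sq-of-product
  (equal (norm-sq (* x y))
         (* (norm-sq x) (norm-sq y))))
```

The un-squared statement, by contrast, needs a square-root function on the reals, which is exactly what the extension developed in the dissertation provides.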
Recommended publications
  • Parallel Prefix Sum (Scan) with CUDA
    Parallel Prefix Sum (Scan) with CUDA. Mark Harris, [email protected], April 2007. Document Change History: February 14, 2007, Mark Harris, initial release. Abstract: Parallel prefix sum, also known as parallel Scan, is a useful building block for many parallel algorithms including sorting and building data structures. In this document we introduce Scan and describe step-by-step how it can be implemented efficiently in NVIDIA CUDA. We start with a basic naïve algorithm and proceed through more advanced techniques to obtain best performance. We then explain how to scan arrays of arbitrary size that cannot be processed with a single block of threads. Table of Contents: Abstract; Table of Contents; Introduction; Inclusive and Exclusive Scan; Sequential Scan; A Naïve Parallel Scan; A Work-Efficient ...
  • Generic Implementation of Parallel Prefix Sums and Their
    Generic Implementation of Parallel Prefix Sums and Their Applications. A thesis by Tao Huang (B.E.; M.E., University of Electronic Science and Technology of China), submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of Master of Science, May 2007. Major Subject: Computer Science. Approved by: Chair of Committee, Lawrence Rauchwerger; Committee Members, Nancy M. Amato, Jennifer L. Welch, Marvin L. Adams; Head of Department, Valerie E. Taylor. Abstract: Parallel prefix sums algorithms are one of the simplest and most useful building blocks for constructing parallel algorithms. A generic implementation is valuable because of the wide range of applications for this method. This thesis presents a generic C++ implementation of parallel prefix sums. The implementation applies two separate parallel prefix sums algorithms: a recursive doubling (RD) algorithm and a binary-tree based (BT) algorithm. This implementation shows how common communication patterns can be separated from the concrete parallel prefix sums algorithms and thus simplify the work of parallel programming. For each algorithm, the implementation uses two different synchronization options: barrier synchronization and point-to-point synchronization. These synchronization options lead to different communication patterns in the algorithms, which are represented by dependency graphs between tasks.
  • PRAM Algorithms Parallel Random Access Machine
    PRAM Algorithms. Arvind Krishnamurthy, Fall 2004. Parallel Random Access Machine (PRAM): a collection of numbered processors accessing shared memory cells. Each processor could have local memory (registers), and each processor can access any shared memory cell in unit time. Input is stored in shared memory cells, and output also needs to be stored in shared memory. [Figure: processors P1 through Pp, each with private memory, connected to a global shared memory under a common control unit.] PRAM instructions execute in 3-phase cycles: a read (if any) from a shared memory cell, a local computation (if any), and a write (if any) to a shared memory cell. Processors execute these 3-phase PRAM instructions synchronously. Shared memory access conflicts come in different variations: Exclusive Read Exclusive Write (EREW) PRAM: no two processors are allowed to read or write the same shared memory cell simultaneously. Concurrent Read Exclusive Write (CREW): simultaneous reads are allowed, but only one processor can write. Concurrent Read Concurrent Write (CRCW), with several rules for concurrent writes: Priority CRCW: processors are assigned fixed distinct priorities, and the highest priority wins; Arbitrary CRCW: one randomly chosen write wins; Common CRCW: all processors are allowed to complete the write if and only if all the values to be written are equal. A basic PRAM algorithm: let there be n processors and 2n inputs, using the EREW PRAM model. Construct a tournament where values are compared; processor k is active in step j if (k % 2^j) == 0. At each step, an active processor compares two inputs, takes the max of the inputs, and writes the result into shared memory. [Figure: a binary tournament tree with processors P0, P2, P4, P6 comparing pairs at the leaves and P0 producing the final result at the root.]
  • Parallel Prefix Sum on the GPU (Scan)
    Parallel Prefix Sum on the GPU (Scan). Presented by Adam O’Donovan; slides adapted from the online course slides for ME964 at Wisconsin taught by Prof. Dan Negrut and from slides presented by David Luebke. Parallel Prefix Sum (Scan), definition: the all-prefix-sums operation takes a binary associative operator ⊕ with identity I and an array of n elements [a_0, a_1, ..., a_{n-1}], and returns the ordered set [I, a_0, (a_0 ⊕ a_1), ..., (a_0 ⊕ a_1 ⊕ ... ⊕ a_{n-2})]. Example: if ⊕ is addition, then an exclusive scan on the set [3 1 7 0 4 1 6 3] returns the set [0 3 4 11 11 15 16 22]; in an exclusive scan the last input element is not included in the result. (From Blelloch, 1990, “Prefix Sums and Their Applications”.) Applications of Scan: scan is a simple and useful parallel building block. It converts sequential recurrences such as "for(j=1; j<n; j++) out[j] = out[j-1] + f(j);" into parallel form: "forall(j) in parallel temp[j] = f(j); scan(out, temp);". It is useful in the implementation of several parallel algorithms: radix sort, quicksort, string comparison, lexical analysis, stream compaction, polynomial evaluation, solving recurrences, tree operations, histograms, etc. Scan on the CPU:
        void scan(float* scanned, float* input, int length) {
          scanned[0] = 0;
          for (int i = 1; i < length; ++i) {
            scanned[i] = scanned[i-1] + input[i-1];
          }
        }
    Just add each element to the sum of the elements before it: trivial, but sequential; exactly n-1 adds, optimal in terms of work efficiency. Parallel Scan Algorithm, Solution One: Hillis & Steele (1986). Note that an implementation of the algorithm shown in the picture requires two buffers of length n (shown is the case n = 8 = 2^3). Assumption: the number n of elements is a power of 2: n = 2^M. (Picture courtesy of Mark Harris.) The plain English perspective: in the first iteration, I go with stride 1 = 2^0; start at x[2^M] and apply this stride to all the array elements before x[2^M] to find the mate of each of them. (A small sequential sketch of exclusive scan, in Lisp, appears after this list of recommended publications.)
  • Download This PDF File
    The New Integer Factorization Algorithm Based on Fermat's Factorization Algorithm and Euler's Theorem. Kritsanapong Somsuk, Department of Computer and Communication Engineering, Faculty of Technology, Udon Thani Rajabhat University, Udon Thani, 41000, Thailand. International Journal of Electrical and Computer Engineering (IJECE), Vol. 10, No. 2, April 2020, pp. 1469-1476, ISSN 2088-8708, DOI: 10.11591/ijece.v10i2.pp1469-1476. Article history: received Apr 7, 2019; revised Oct 7, 2019; accepted Oct 20, 2019. Keywords: computation loops; Euler's theorem; Fermat's factorization; integer factorization; prime factors. Abstract: Although integer factorization is one of the hard problems underlying the security of RSA, many factoring techniques are still being developed. Fermat's Factorization Algorithm (FFA), which has very high performance when the prime factors are close to each other, is one type of integer factorization algorithm. In fact, there are two ways to implement FFA. The first, called FFA-1, finds the integer by computing square roots; because this operation has a high computation cost, it consumes a large amount of computation time. The other method, called FFA-2, uses a different technique to find the prime factors: although the computation loops are quite large, no square root computation is included. In this paper, a new efficient factorization algorithm is introduced. Euler's theorem is chosen to apply with FFA to find the sum of the two prime factors. The advantage of the proposed method is that almost all square root operations are left out of the computation while the loops are not increased; they are equal to the first method.
  • Combinatorial Species and Labelled Structures Brent Yorgey University of Pennsylvania, [email protected]
    University of Pennsylvania ScholarlyCommons, Publicly Accessible Penn Dissertations, 1-1-2014. Combinatorial Species and Labelled Structures. Brent Yorgey, University of Pennsylvania, [email protected]. Recommended Citation: Yorgey, Brent, "Combinatorial Species and Labelled Structures" (2014). Publicly Accessible Penn Dissertations. 1512. http://repository.upenn.edu/edissertations/1512. Abstract: The theory of combinatorial species was developed in the 1980s as part of the mathematical subfield of enumerative combinatorics, unifying and putting on a firmer theoretical basis a collection of techniques centered around generating functions. The theory of algebraic data types was developed, around the same time, in functional programming languages such as Hope and Miranda, and is still used today in languages such as Haskell, the ML family, and Scala. Despite their disparate origins, the two theories have striking similarities. In particular, both constitute algebraic frameworks in which to construct structures of interest. Though the similarity has not gone unnoticed, a link between combinatorial species and algebraic data types has never been systematically explored. This dissertation lays the theoretical groundwork for a precise (and, hopefully, useful) bridge between the two theories. One of the key contributions is to port the theory of species from a classical, untyped set theory to a constructive type theory. This porting process is nontrivial, and involves fundamental issues related to equality and finiteness; the recently developed homotopy type theory is put to good use formalizing these issues in a satisfactory way.
  • Modern Computer Arithmetic (Version 0.5. 1)
    Modern Computer Arithmetic. Richard P. Brent and Paul Zimmermann, Version 0.5.1, arXiv:1004.4710v1 [cs.DS], 27 Apr 2010. Copyright © 2003-2010 Richard P. Brent and Paul Zimmermann. This electronic version is distributed under the terms and conditions of the Creative Commons license “Attribution-Noncommercial-No Derivative Works 3.0”. You are free to copy, distribute and transmit this book under the following conditions: Attribution: you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Noncommercial: you may not use this work for commercial purposes. No Derivative Works: you may not alter, transform, or build upon this work. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to the web page below. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights. For more information about the license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/ Contents: Preface; Acknowledgements; Notation; 1 Integer Arithmetic: 1.1 Representation and Notations; 1.2 Addition and Subtraction; 1.3 Multiplication: 1.3.1 Naive Multiplication; 1.3.2 Karatsuba's Algorithm; 1.3.3 Toom-Cook Multiplication; 1.3.4 Use of the Fast Fourier Transform (FFT); 1.3.5 Unbalanced Multiplication; 1.3.6 Squaring; 1.3.7 Multiplication by a Constant ...
  • Algorithms Based on Parallel Prefix (Scan) Operations, Cont
    COMP 322: Fundamentals of Parallel Programming. Lecture 38: Algorithms based on Parallel Prefix (Scan) operations, cont. Mack Joyner and Zoran Budimlić, {mjoyner, zoran}@rice.edu, http://comp322.rice.edu, April 2018. Acknowledgements: book chapter on “Prefix Sums and Their Applications”, Guy E. Blelloch, CMU; slides on “Parallel prefix adders”, Kostas Vitoroulis, Concordia University. Worksheet #37 problem statement: Parallelizing the Split step in Radix Sort. The Radix Sort algorithm loops over the bits in the binary representation of the keys, starting at the lowest bit, and executes a split operation for each bit as shown below. The split operation packs the keys with a 0 in the corresponding bit to the bottom of a vector, and packs the keys with a 1 to the top of the same vector. It maintains the order within both groups. The sort works because each split operation sorts the keys with respect to the current bit and maintains the sorted order of all the lower bits. Your task is to show how the split operation can be performed in parallel using scan operations, and to explain your answer. (A functional sketch of a scan-based split, in Lisp, appears after this list of recommended publications.)
        [101 111 011 001 100 010 111 010]
        1. A = [5 7 3 1 4 2 7 2]
        2. A⟨0⟩ = [1 1 1 1 0 0 1 0]   // lowest bit
        3. A ← split(A, A⟨0⟩) = [4 2 2 5 7 3 1 7]
        4. A⟨1⟩ = [0 1 1 0 1 1 0 1]   // middle bit
        5. A ← split(A, A⟨1⟩) = [4 5 1 2 2 7 3 7]
        6. A⟨2⟩ = [1 1 0 0 0 1 0 1]   // highest bit
        7. A ← split(A, A⟨2⟩) = [1 2 2 3 4 5 7 7]
  • MPI Collectives I: Reductions and Broadcast Calculating Pi with a Broadcast and Reduction
    MPI Collectives I: Reductions and broadcast. Calculating Pi with a broadcast and reduction. University of Illinois at Urbana-Champaign, © 2018 L. V. Kale. What does “Collective” mean: everyone within the communicator (named in the call) must make that call before the call can be effected; essentially, a collective call requires coordination among all the processes of a communicator. Some basic MPI collective calls: MPI_Barrier(MPI_Comm comm) blocks the caller until all processes have entered the call. MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm) broadcasts a message from rank ‘root’ to all processes of the group; it is called by all members of the group using the same arguments. MPI_Allreduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm). MPI_Reduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm). Pi example with broadcast and reductions:
        int main(int argc, char **argv) {
          int myRank, numPes;
          MPI_Init(&argc, &argv);
          MPI_Comm_size(MPI_COMM_WORLD, &numPes);
          MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
          int count, i, numTrials, myTrials;
          if (myRank == 0) {
            scanf("%d", &numTrials);
            myTrials = numTrials / numPes;
            numTrials = myTrials * numPes; // takes care of rounding
          }
          MPI_Bcast(&myTrials, 1, MPI_INT, 0, MPI_COMM_WORLD);
          count = 0;
          srandom(myRank);
          double x, y, pi;
          for (i = 0; i < myTrials; i++) {
            x = (double) random()/RAND_MAX;
            y = (double) random()/RAND_MAX;
            if (x*x + y*y < 1.0) count++;
          }
          int globalCount = 0;
          MPI_Allreduce(&count, &globalCount, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
          pi = (double)(4 * globalCount) / (myTrials * numPes);
          printf("[%d] pi = %f\n", myRank, pi);
          MPI_Finalize();
          return 0;
        } /* end function main */
    MPI_Reduce: often, you want the result of a reduction only on one processor. E.g.
  • Appendix Mathematical Background A
    Appendix A: Mathematical Background (from A. Laaksonen, Guide to Competitive Programming, Undergraduate Topics in Computer Science, Springer, 2017, https://doi.org/10.1007/978-3-319-72547-5). Sum formulas: each sum of the form Σ_{x=1}^{n} x^k = 1^k + 2^k + 3^k + ··· + n^k, where k is a positive integer, has a closed-form formula that is a polynomial of degree k+1. (There is even a general formula for such sums, called Faulhaber's formula, but it is too complex to be presented here.) For example, Σ_{x=1}^{n} x = 1 + 2 + 3 + ··· + n = n(n+1)/2 and Σ_{x=1}^{n} x² = 1² + 2² + 3² + ··· + n² = n(n+1)(2n+1)/6. An arithmetic progression is a sequence of numbers where the difference between any two consecutive numbers is constant. For example, 3, 7, 11, 15 is an arithmetic progression with constant 4. The sum of an arithmetic progression can be calculated using the formula a + ··· + b = n(a+b)/2, where a is the first number, b is the last number, and n is the amount of numbers. For example, 3 + 7 + 11 + 15 = 4·(3+15)/2 = 36. The formula is based on the fact that the sum consists of n numbers and the value of each number is (a+b)/2 on average. A geometric progression is a sequence of numbers where the ratio between any two consecutive numbers is constant. For example, 3, 6, 12, 24 is a geometric progression with constant 2.
  • A Sound and Complete Abstraction for Reasoning About Parallel Prefix
    In proceedings of POPL '14: A Sound and Complete Abstraction for Reasoning about Parallel Prefix Sums. Nathan Chong, Alastair F. Donaldson, Jeroen Ketema, Imperial College London, {nyc04, afd, jketema}@imperial.ac.uk. Abstract: Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance. We introduce a novel abstraction, the interval of summations, that allows scalable reasoning about implementations of prefix sums. We present this abstraction as a monoid, and prove a soundness and completeness result showing that a generic sequential prefix sum implementation is correct for an array of length n if and ... 1. Introduction: The prefix sum operation, which given an array A computes an array B consisting of all sums of prefixes of A, is an important building block in many high performance computing applications. A key example is stream compaction: suppose a group of n threads has calculated a number of data items in parallel, each thread t (1 ≤ t ≤ n) having computed d_t items; now the threads must write the resulting items to a shared array out in a compact manner, i.e. thread t should write its items to a series of d_t indices of out starting from position d_1 + ··· + d_{t-1}. Compacting the data stream by serialising the threads, so that thread 1 writes its results, followed by thread 2, etc., would be slow.
  • User's Guide to PARI / GP
    User’s Guide to PARI / GP. C. Batut, K. Belabas, D. Bernardi, H. Cohen, M. Olivier. Laboratoire A2X, U.M.R. 9936 du C.N.R.S., Université Bordeaux I, 351 Cours de la Libération, 33405 TALENCE Cedex, FRANCE. E-mail: [email protected]. Home Page: http://pari.math.u-bordeaux.fr/. Primary ftp site: ftp://pari.math.u-bordeaux.fr/pub/pari/. Last updated 11 December 2003 for version 2.2.7. Copyright © 2000–2003 The PARI Group. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions, or translations, of this manual under the conditions for verbatim copying, provided also that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. PARI/GP is Copyright © 2000–2003 The PARI Group. PARI/GP is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. It is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY WHATSOEVER. Table of Contents: Chapter 1: Overview of the PARI system (1.1 Introduction; 1.2 The PARI types; 1.3 Operations and functions). Chapter 2: Specific Use of the GP Calculator (2.1 Defaults and output formats; 2.2 Simple metacommands; 2.3 Input formats for the PARI types; 2.4 GP operators; 2.5 The general GP input line; 2.6 The GP/PARI programming language; 2.7 Errors and error recovery; 2.8 Interfacing GP with other languages ...)
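Several of the entries above (the CUDA scan document, the GPU scan slides, the generic prefix-sums thesis, and the COMP 322 lecture), as well as Chapter 7 of the dissertation itself, center on the prefix-sum (scan) operation. For reference, here is a minimal sequential sketch of an exclusive scan, written in Lisp since that is the language family underlying ACL2. It is purely illustrative: it is not code from any of the documents listed, and the function name is invented here.

```lisp
;; Illustrative sketch only: a sequential exclusive scan over a list,
;; i.e. output position i holds the sum of all inputs before position i.
;; Not taken from any of the documents above; the name is invented here.

(defun exclusive-scan (xs acc)
  ;; ACC accumulates the running sum of the elements already consumed.
  (if (endp xs)
      nil
    (cons acc
          (exclusive-scan (cdr xs) (+ acc (car xs))))))

;; Example matching Blelloch's exclusive-scan example quoted above:
;; (exclusive-scan '(3 1 7 0 4 1 6 3) 0)  =>  (0 3 4 11 11 15 16 22)
```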
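The COMP 322 worksheet above asks how radix sort's split step can be built from scan operations. The sketch below, also in Lisp and also purely illustrative (every function name is invented here, and exclusive-scan is the function from the previous sketch), follows the standard construction: scan the complemented flag bits to get destinations for the keys whose bit is 0, offset a scan of the flag bits by the number of 0-keys to get destinations for the keys whose bit is 1, and then scatter each key to its destination.

```lisp
;; Illustrative sketch of a scan-based split (one radix-sort step).
;; Not from the worksheet or the dissertation; names are invented here.
;; Relies on EXCLUSIVE-SCAN from the previous sketch.

(defun complement-bits (flags)
  ;; Turn each 0 into 1 and each 1 into 0.
  (if (endp flags)
      nil
    (cons (- 1 (car flags)) (complement-bits (cdr flags)))))

(defun sum-list (xs)
  (if (endp xs) 0 (+ (car xs) (sum-list (cdr xs)))))

(defun add-to-each (n xs)
  (if (endp xs) nil (cons (+ n (car xs)) (add-to-each n (cdr xs)))))

(defun choose-dests (flags zero-dests one-dests)
  ;; For each position, pick the 0-destination or the 1-destination.
  (if (endp flags)
      nil
    (cons (if (eql (car flags) 0) (car zero-dests) (car one-dests))
          (choose-dests (cdr flags) (cdr zero-dests) (cdr one-dests)))))

(defun scatter (keys dests out)
  ;; Place each key at its destination index in OUT.
  (if (endp keys)
      out
    (scatter (cdr keys) (cdr dests)
             (update-nth (car dests) (car keys) out))))

(defun split-via-scan (keys flags)
  (let* ((zeros      (complement-bits flags))
         (zero-dests (exclusive-scan zeros 0))               ; slots for 0-keys
         (one-dests  (add-to-each (sum-list zeros)           ; 1-keys go after
                                  (exclusive-scan flags 0))) ; all the 0-keys
         (dests      (choose-dests flags zero-dests one-dests)))
    (scatter keys dests (make-list (len keys)))))

;; (split-via-scan '(5 7 3 1 4 2 7 2) '(1 1 1 1 0 0 1 0))
;;   => (4 2 2 5 7 3 1 7)
```

On the worksheet's first step, keys (5 7 3 1 4 2 7 2) with bit vector (1 1 1 1 0 0 1 0) map to (4 2 2 5 7 3 1 7), matching the expected result of the first split.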