Emulating Round-to-Nearest Ties-to-Zero "Augmented" Floating-Point Operations Using Round-to-Nearest Ties-to-Even Arithmetic


Emulating round-to-nearest ties-to-zero "augmented" floating-point operations using round-to-nearest ties-to-even arithmetic

Sylvie Boldo, Christoph Lauter, Jean-Michel Muller

To cite this version: Sylvie Boldo, Christoph Lauter, Jean-Michel Muller. Emulating round-to-nearest ties-to-zero "augmented" floating-point operations using round-to-nearest ties-to-even arithmetic. IEEE Transactions on Computers, Institute of Electrical and Electronics Engineers, In press, 10.1109/TC.2020.3002702. hal-02137968v4

HAL Id: hal-02137968
https://hal.archives-ouvertes.fr/hal-02137968v4
Submitted on 13 Mar 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Emulating Round-to-Nearest Ties-to-Zero "Augmented" Floating-Point Operations Using Round-to-Nearest Ties-to-Even Arithmetic

Sylvie Boldo*, Christoph Lauter†, Jean-Michel Muller‡

* Université Paris-Saclay, Univ. Paris-Sud, CNRS, Inria, Laboratoire de recherche en informatique, 91405 Orsay, France
† University of Alaska Anchorage, College of Engineering, Computer Science Department, Anchorage, AK, USA
‡ Univ Lyon, CNRS, ENS de Lyon, Inria, Université Claude Bernard Lyon 1, LIP UMR 5668, F-69007 Lyon, France

Abstract—The 2019 version of the IEEE 754 Standard for Floating-Point Arithmetic recommends that new "augmented" operations should be provided for the binary formats. These operations use a new "rounding direction": round-to-nearest ties-to-zero. We show how they can be implemented using the currently available operations, using round-to-nearest ties-to-even, with a partial formal proof of correctness.

Keywords. Floating-point arithmetic, Numerical reproducibility, Rounding error analysis, Error-free transforms, Rounding mode, Formal proof.

I. INTRODUCTION AND NOTATION

The new IEEE 754-2019 Standard for Floating-Point (FP) Arithmetic [8] supersedes the 2008 version. It recommends that new "augmented" operations should be provided for the binary formats (see [15] for history and motivation). These operations are called augmentedAddition, augmentedSubtraction, and augmentedMultiplication. They use a new "rounding direction": round-to-nearest ties-to-zero. The reason behind this recommendation is that these operations would significantly help to implement reproducible summation and dot product, using an algorithm due to Demmel, Ahrens, and Nguyen [5]. Obtaining very fast reproducible summation with that algorithm may require a direct hardware implementation of these operations. However, having these operations available on common processors will certainly take time, and they may not be available on all platforms. The purpose of this paper is to show that, in the meantime, one can emulate these operations with conventional FP operations (with the usual round-to-nearest ties-to-even rounding direction), with reasonable efficiency. In this paper, we present the first proposed emulation algorithms, with proof of their correctness and experimental results. This allows, for instance, the design of programs that use these operations, and that will be ready for use with full efficiency as soon as the augmented operations are available in hardware. Also, when these operations are available in hardware on some systems, this will improve the portability of programs using these operations by allowing them to still work (with degraded performance, however) on other systems.

In the following, we assume radix-2, precision-p floating-point arithmetic [13]. The minimum floating-point exponent is e_min < 0 and the maximum exponent is e_max. A floating-point number is a number of the form

    x = M_x · 2^(e_x − p + 1),    (1)

where M_x is an integer satisfying

    |M_x| ≤ 2^p − 1,    (2)

and

    e_min ≤ e_x ≤ e_max.    (3)

If |M_x| is maximum under the constraints (1), (2), and (3), then e_x is the floating-point exponent of x. The number 2^e_min is the smallest positive normal number (a FP number of absolute value less than 2^e_min is called subnormal), and 2^(e_min − p + 1) is the smallest positive FP number. The largest positive FP number is

    Ω = (2 − 2^(−p+1)) · 2^e_max.

We will assume

    3p ≤ e_max + 1,    (4)

which is satisfied by all binary formats of the IEEE 754 Standard, with the exception of binary16 (which is an interchange format but not a basic format [8]). The usual round-to-nearest, ties-to-even function (which is the default in the IEEE 754 Standard) will be noted RN_e. We recall its definition [8]:

    RN_e(t) (where t is a real number) is the floating-point number nearest to t. If the two nearest floating-point numbers bracketing t are equally near, RN_e(t) is the one whose least significant bit is zero. If |t| ≥ Ω + 2^(e_max − p), then RN_e(t) = ∞, with the same sign as t.

We will also assume that an FMA (fused multiply-add) instruction is available. This is the case on all recent FP units. As said above, the new recommended operations use a new "rounding direction": round-to-nearest ties-to-zero. It corresponds to the rounding function RN_0 defined as follows [8]:

    RN_0(t) (where t is a real number) is the floating-point number nearest t. If the two nearest floating-point numbers bracketing t are equally near, RN_0(t) is the one with smaller magnitude. If |t| > Ω + 2^(e_max − p), then RN_0(t) = ∞, with the same sign as t.

This is illustrated in Fig. 1. As one can infer from the definitions, RN_e(t) and RN_0(t) can differ in only two circumstances (called halfway cases): when t is halfway between two consecutive floating-point numbers, and when t = ±(Ω + 2^(e_max − p)).

    Fig. 1: Round-to-nearest ties-to-zero (assuming we are in the positive range). Number x is rounded to the (unique) FP number nearest to x. Number y is a halfway case: it is exactly halfway between two consecutive FP numbers; it is rounded to the one that has the smallest magnitude.

The augmented operations are required to behave as follows [8], [15]:

∙ augmentedAddition(x, y) delivers (a_0, b_0) such that a_0 = RN_0(x + y) and, when a_0 ∉ {±∞, NaN}, b_0 = (x + y) − a_0. When b_0 = 0, it is required to have the same sign as a_0. One easily shows that b_0 is a FP number. For special rules when a_0 ∈ {±∞, NaN}, see [15];

∙ augmentedSubtraction(x, y) is exactly the same as augmentedAddition(x, −y), so we will not discuss that operation further;

∙ augmentedMultiplication(x, y) delivers (a_0, b_0) such that a_0 = RN_0(x · y) and, when a_0 ∉ {±∞, NaN}, b_0 = RN_0((x · y) − a_0). When (x · y) − a_0 = 0, the floating-point number b_0 (equal to zero) is required to have the same sign as a_0. Note that in some corner cases (an example is given in Section IV-A), b_0 may differ from (x · y) − a_0 (in other words, (x · y) − a_0 is not always a floating-point number). Again, rules for handling infinities, NaNs and the signs of zeroes are given in [8], [15].

Because of the different rounding function, these augmented operations differ from the well-known Fast2Sum, 2Sum, and Fast2Mult algorithms (Algorithms 1, 2 and 3 below). As said above, the goal of this paper is to show that one can implement these augmented operations just by using rounded-to-nearest ties-to-even FP operations, with reasonable efficiency.

We will need a variant of the usual ulp function (ulp means "unit in the last place"). If x is a floating-point number different from −Ω, first define pred(x) as the floating-point predecessor of x, i.e., the largest floating-point number < x. We define ulp_H(x) as follows.

Definition 1 (Harrison's ulp). If x is a floating-point number, then ulp_H(x) is

    |x| − pred(|x|).

The notation ulp_H is used to avoid confusion with the usual definition of function ulp. The usual ulp and the function ulp_H differ at powers of 2, except in the subnormal domain. For instance, ulp(1) = 2^(−p+1), whereas ulp_H(1) = 2^(−p). One easily checks that if |t| is not a power of 2, then ulp(t) = ulp_H(t), and if |t| = 2^k, then ulp(t) = 2^(k−p+1) = 2·ulp_H(t), except in the subnormal range where ulp(t) = ulp_H(t) = 2^(e_min − p + 1).

The reason for choosing the function ulp_H instead of the function ulp is twofold:

∙ if t > 0 is a real number, each time RN_0(t) differs from RN_e(t), RN_0(t) will be the floating-point predecessor of RN_e(t), because RN_0(t) ≠ RN_e(t) implies that t is what we call a "halfway case" in Section II: it is exactly halfway between two consecutive floating-point numbers, and in that case, RN_0(t) is the one of these two FP numbers which is closest to zero and RN_e(t) is the other one. Hence, in these cases, to obtain RN_0(t) we will have to subtract from RN_e(t) a number which is exactly ulp_H(RN_e(t)) (for negative t, for symmetry reasons, we will have to add ulp_H(RN_e(t)) to RN_e(t));

∙ there is a very simple algorithm for computing ulp_H(t) in the range where we need it (Algorithm 4 below).

Let us now briefly recall the classical Algorithms Fast2Sum, 2Sum, and Fast2Mult.

ALGORITHM 1: Fast2Sum(x, y). The Fast2Sum algorithm [4].

    a_e ← RN_e(x + y)
    y' ← RN_e(a_e − x)
    b_e ← RN_e(y − y')

Lemma 1.
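As an aside, the two rounding functions are easy to contrast numerically. The sketch below is our own illustration, not an algorithm from the paper: in binary64 Python, correctly-rounded integer division provides RN_e, and a toy RN_0 can be built on top of it by stepping a halfway case one float back toward zero. It assumes Python 3.9+ for math.nextafter and ignores overflow corner cases.

```python
import math
from fractions import Fraction

def rn_e(t: Fraction) -> float:
    # CPython's int/int true division is correctly rounded to nearest,
    # ties-to-even, i.e. exactly the RN_e of the binary64 format.
    return t.numerator / t.denominator

def rn_0(t: Fraction) -> float:
    # Toy round-to-nearest ties-to-zero: start from RN_e and, if t was
    # exactly halfway and RN_e resolved the tie away from zero, step one
    # float back toward zero.
    a = rn_e(t)
    if Fraction(a) != t:
        toward_zero = math.nextafter(a, 0.0)  # neighbor of smaller magnitude
        if abs(t - Fraction(a)) == abs(t - Fraction(toward_zero)):
            return toward_zero                # tie: pick the smaller magnitude
    return a

# t is exactly halfway between 1 + 2^-52 (odd significand) and
# 1 + 2^-51 (even significand): the two rounding functions disagree.
t = 1 + Fraction(3, 2**53)
print(rn_e(t) == 1 + 2**-51)  # True: ties-to-even picks the even neighbor
print(rn_0(t) == 1 + 2**-52)  # True: ties-to-zero picks the smaller one
```

On non-halfway inputs the two functions agree, which is why the paper's emulation only has to detect halfway cases and correct them by one ulp_H.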
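Algorithm 1 (Fast2Sum) can be transcribed directly into binary64 Python, where every native float operation is an RN_e; exact rational arithmetic then confirms the error-free transform property a_e + b_e = x + y, which holds when |x| ≥ |y|. This is an illustrative sketch with our own helper name, not code from the paper:

```python
from fractions import Fraction

def fast2sum(x: float, y: float):
    # Fast2Sum (Algorithm 1); requires |x| >= |y| (or x = 0).
    # Every native float operation below rounds with RN_e.
    a_e = x + y        # a_e <- RN_e(x + y)
    y_prime = a_e - x  # y'  <- RN_e(a_e - x)
    b_e = y - y_prime  # b_e <- RN_e(y - y')
    return a_e, b_e

x, y = 1.0, 2.0**-60
a_e, b_e = fast2sum(x, y)
# b_e recovers the rounding error of the addition exactly:
print(Fraction(a_e) + Fraction(b_e) == Fraction(x) + Fraction(y))  # True
print(b_e == 2.0**-60)  # True: all of y was lost in the rounded sum
```

Note that b_e here is the error of the ties-to-even sum; the paper's contribution is precisely how to turn such quantities into the ties-to-zero results a_0 and b_0 without hardware support.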