Computer Arithmetic (Temporary Title, Work in Progress)

Current additions done by Hossam A. H. Fahmy with permission from Michael J. Flynn.

Starting material based partly on the book: Introduction to Arithmetic for Digital Systems Designers by Shlomo Waser and Michael J. Flynn. Originally published by Holt, Rinehart & Winston, New York, 1982 (out of print).

Contents

1 Numeric Data Representation
  1.1 Infinite aspirations and finite resources
  1.2 Natural Numbers, Finitude, and Modular Arithmetic
    1.2.1 Properties
    1.2.2 Extending Peano's Numbers
  1.3 Integer Representation
    1.3.1 Complement Coding
    1.3.2 Radix Complement Code: Subtraction Using Addition
    1.3.3 Diminished Radix Complement Code
  1.4 Implementation of Integer Operations
    1.4.1 Negation
    1.4.2 Two's Complement Addition
    1.4.3 Ones Complement Addition
    1.4.4 Computing Through the Overflows
    1.4.5 Arithmetic Shifts
    1.4.6 Multiplication
    1.4.7 Division
  1.5 Going far and beyond
    1.5.1 Fractions
    1.5.2 Is the radix a natural number?
    1.5.3 Redundant representations
    1.5.4 Mixed radix systems
  1.6 Further readings
  1.7 Summary
  1.8 Problems

2 Floating over the vast seas
  2.1 Motivation and Terminology; or the why? and what? of floating point
  2.2 Properties of Floating Point Representation
    2.2.1 Lack of Unique Representation
    2.2.2 Range and Precision
    2.2.3 Mapping Errors: Overflows, Underflows, and Gap
  2.3 Problems in Floating Point Computations
    2.3.1 Representational error analysis and radix tradeoffs
    2.3.2 Loss of Significance
    2.3.3 Rounding: Mapping the Reals into the Floating Point Numbers
  2.4 History of floating point standards
    2.4.1 IEEE binary formats
    2.4.2 Prior formats
    2.4.3 Comparing the different systems
    2.4.4 Who needs decimal and why?
    2.4.5 IEEE decimal formats
  2.5 Floating Point Operations
    2.5.1 Addition and Subtraction
    2.5.2 Multiplication
    2.5.3 Division
    2.5.4 Fused Multiply Add
  2.6 Reading the fine print in the standard
    2.6.1 Rounding
    2.6.2 Exceptions and What to Do in Each Case
    2.6.3 Analysis of the IEEE 754 standard
  2.7 Cray Floating Point
    2.7.1 Data Format
    2.7.2 Machine Maximum
    2.7.3 Machine Minimum
    2.7.4 Treatment of Zero
    2.7.5 Operations
    2.7.6 Overflow
  2.8 Additional Readings
  2.9 Summary
  2.10 Problems

3 Are there any limits?
  3.1 The logic level and the technology level
  3.2 The Residue Number System
    3.2.1 Representation
    3.2.2 Operations in the Residue Number System
    3.2.3 Selection of the Moduli
    3.2.4 Operations with General Moduli
    3.2.5 Conversion To and From Residue Representation
    3.2.6 Uses of the Residue Number System
  3.3 The limits of fast arithmetic
    3.3.1 Background
    3.3.2 Levels of evaluation
    3.3.3 The (r, d) Circuit Model
    3.3.4 First Approximation to the Lower Bound
    3.3.5 Spira/Winograd bound applied to residue arithmetic
    3.3.6 Winograd's Lower Bound on Multiplication
  3.4 Modeling the speed of memories
  3.5 Modeling the multiplexers and shifters
  3.6 Additional Readings
  3.7 Summary
  3.8 Problems

4 Addition and Subtraction (Incomplete chapter)
  4.1 Fixed Point Algorithms
    4.1.1 Historical Review
    4.1.2 Conditional Sum
    4.1.3 Carry-Look-Ahead Addition
    4.1.4 Canonic Addition: Very Fast Addition and Incrementation
    4.1.5 Ling Adders
    4.1.6 Simultaneous Addition of Multiple Operands: Carry-Save Adders
  4.2 Problems

5 Go forth and multiply (Incomplete chapter)
  5.1 Simple multiplication methods
  5.2 Simultaneous Matrix Generation and Reduction
    5.2.1 Partial Products Generation: Booth's Algorithm
    5.2.2 Using ROMs to Generate Partial Products
    5.2.3 Partial Products Reduction
  5.3 Iteration and Partial Products Reduction
    5.3.1 A Tale of Three Trees
  5.4 Iterative Array of Cells
  5.5 Detailed Design of Large Multipliers
    5.5.1 Design Details of a 64 × 64 Multiplier
    5.5.2 Design Details of a 56 × 56 Single Length Multiplier
  5.6 Problems

6 Division (Incomplete chapter)
  6.1 Subtractive Algorithms: General Discussion
    6.1.1 Restoring and Nonrestoring Binary Division
    6.1.2 Pencil and Paper Division
  6.2 Multiplicative Algorithms
    6.2.1 Division by Series Expansion
    6.2.2 The Newton-Raphson Division
  6.3 Additional Readings
  6.4 Exercises

7 Solutions
  Solutions to Exercises

List of Figures

  2.1 Rounding methods on the real number axis
  2.2 IEEE single (binary32), double (binary64), and quad (binary128) floating point number formats
  2.3 IEEE decimal64 and decimal128 floating point formats
  2.4 Alignment shift for the FMA
  3.1 The (r, d) circuit
  3.2 Time delays in a circuit with 10 inputs and (r, d) = (4, 2)
  3.3 The (r, d) network
  3.4 A simple memory model
  4.1 Example of the conditional sum mechanism
  4.2 4-bit conditional sum adder slice with carry-look-ahead (gate count = 45)
  4.3 16-bit conditional sum adder. The dotted line encloses a 4-bit slice with internal look ahead. The rectangular box (on the bottom) accepts conditional carries and generates fast true carries between slices. The worst case path delay is seven gates.
  4.4 4-bit adder slice with internal carry-look-ahead (gate count = 30)
  4.5 Four group carry-look-ahead generator (gate count = 14)
  4.6 64-bit addition using full carry-look-ahead
  4.7 Addition of three n-bit numbers
  4.8 Addition of four n-bit numbers
  5.1 A simple implementation of the add and shift multiplication
  5.2 A variation of the add and shift multiplication
  5.3 Multiplying two 8-bit operands
  5.4 Generation of five partial products in 8×8 multiplication, using modified Booth's algorithm (only four partial products are generated if the representation is restricted to two's complement)
  5.5 Implementation of 8×8 multiplication using four 256×8 ROMs, where each ROM performs 4 × 4 multiplication
  5.6 Using ROMs for various multiplier arrays
  5.7 Wallace tree
  5.8 Wallace tree reduction of 8 × 8 multiplication,

Recommended publications
  • Dynamical Directions in Numeration
    Annales de l'Institut Fourier. Guy Barat, Valérie Berthé, Pierre Liardet & Jörg Thuswaldner, Dynamical directions in numeration, Tome 56, no 7 (2006), p. 1987-2092. <http://aif.cedram.org/item?id=AIF_2006__56_7_1987_0> © Association des Annales de l'institut Fourier, 2006, all rights reserved. Access to the articles of the journal "Annales de l'institut Fourier" (http://aif.cedram.org/) implies agreement with the general conditions of use (http://aif.cedram.org/legal/). Any reproduction of this article, in whole or in part and in any form, for any use other than the strictly personal use of the copier, constitutes a criminal offence. Any copy or printout of this file must contain this copyright notice. Article put online by cedram as part of the Centre de diffusion des revues académiques de mathématiques, http://www.cedram.org/ Abstract. This survey aims at giving a consistent presentation of numeration from a dynamical viewpoint: we focus on numeration systems, their associated compactification, and dynamical systems that can be naturally defined on them. The exposition is unified by the fibred numeration system concept. Many examples are discussed. Various numerations on rational integers, real or complex numbers are presented with special attention paid to β-numeration and its generalisations, abstract numeration systems and shift radix systems, as well as G-scales and odometers.
  • Basic Computer Arithmetic
    Basic Computer Arithmetic, by Tsogtgerel Gantumur. Abstract. First, we consider how integers and fractional numbers are represented and manipulated internally on a computer. The focus is on the principles behind the algorithms, rather than on implementation details. Then we develop a basic theoretical framework for analyzing algorithms involving inexact arithmetic. Contents: 1. Introduction; 2. Integers; 3. Simple division algorithms; 4. Long division; 5. The SRT division algorithm; 6. Floating point numbers; 7. Floating point arithmetic; 8. Propagation of error; 9. Summation and product. 1. Introduction. There is no way to encode all real numbers by using finite length words, even if we use an alphabet with countably many characters, because the set of finite sequences of integers is countable. Fortunately, the real numbers support many countable dense subsets, and hence the encoding problem of real numbers may be replaced by the question of choosing a suitable countable dense subset. Let us look at some practical examples of real number encodings.
    ● Decimal notation. Examples: 36000, 2.35(75), −0.000072.
    ● Scientific notation. Examples: $3.6 \cdot 10^{4}$, $-72 \cdot 10^{-6}$. The general form is $m \cdot 10^{e}$. In order to have a unique (or near unique) representation of each number, one can impose a normalization, such as requiring $1 \le |m| < 10$.
    ● System with base/radix β. Example: $m_2 m_1 m_0 . m_{-1} = m_2\beta^{2} + m_1\beta + m_0 + m_{-1}\beta^{-1}$. The dot separating the integer and fractional parts is called the radix point.
    ● Binary (β = 2), octal (β = 8), and hexadecimal (β = 16) numbers.
    ● Babylonian sexagesimal (β = 60) numbers.
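The normalization requirement $1 \le |m| < 10$ in the scientific-notation item can be sketched in Python. This is only an illustration, not code from the paper: the function name `normalize` is hypothetical, and the sketch assumes the standard-library `decimal` module so that moving the decimal point stays exact for inputs within the default context precision.

```python
from decimal import Decimal


def normalize(x: Decimal) -> tuple[Decimal, int]:
    """Return (m, e) with x == m * 10**e and 1 <= |m| < 10 (x must be nonzero)."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    e = x.adjusted()      # position of the most significant digit
    m = x.scaleb(-e)      # move the decimal point by e places
    return m, e


# The example -72 * 10**-6 from the text is not normalized; its normalized form is -7.2 * 10**-5.
m, e = normalize(Decimal("-0.000072"))
print(m, e)
```

Under this rule every nonzero number has exactly one pair (m, e), which is the uniqueness the excerpt is after.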
  • Introduction to Numerical Computing: Number Systems
    Statistics 580, Introduction to Numerical Computing. Number Systems. In the decimal system we use the 10 numeric symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 to represent numbers. The relative position of each symbol determines the magnitude or the value of the number. Example: 6594 is expressible as $6594 = 6 \cdot 10^{3} + 5 \cdot 10^{2} + 9 \cdot 10^{1} + 4 \cdot 10^{0}$. Example: 436.578 is expressible as $436.578 = 4 \cdot 10^{2} + 3 \cdot 10^{1} + 6 \cdot 10^{0} + 5 \cdot 10^{-1} + 7 \cdot 10^{-2} + 8 \cdot 10^{-3}$. We call the number 10 the base of the system. In general we can represent numbers in any system in the polynomial form $z = (\cdots a_k a_{k-1} \cdots a_1 a_0 . b_0 b_1 \cdots b_m \cdots)_B$, where B is the base of the system and the period in the middle is the radix point. In computers, numbers are represented in the binary system. In the binary system, the base is 2 and the only numeric symbols we need to represent numbers in binary are 0 and 1. Example: 0100110 is a binary number whose value in the decimal system is calculated as follows: $0 \cdot 2^{6} + 1 \cdot 2^{5} + 0 \cdot 2^{4} + 0 \cdot 2^{3} + 1 \cdot 2^{2} + 1 \cdot 2^{1} + 0 \cdot 2^{0} = 32 + 4 + 2 = (38)_{10}$. The hexadecimal system is another useful number system in computer work. In this system, the base is 16 and the 16 numeric and alphabetic symbols used to represent numbers are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. Example: Let 26 be a hexadecimal number.
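The positional expansions worked out above all follow one rule: each digit is multiplied by a power of the base fixed by its distance from the radix point. A minimal Python sketch of that rule follows; the function name `positional_value` and the choice to pass digits as lists are assumptions made for illustration, and `fractions.Fraction` is used only so that fractional digits are summed exactly.

```python
from fractions import Fraction


def positional_value(int_digits, frac_digits, base):
    """Exact value of the digit string int_digits . frac_digits in the given base.

    Both digit lists are ordered left to right, as the number is written.
    """
    value = Fraction(0)
    for d in int_digits:          # Horner's rule on the integer part
        value = value * base + d
    weight = Fraction(1, base)    # first digit after the radix point has weight base**-1
    for d in frac_digits:
        value += d * weight
        weight /= base
    return value


print(positional_value([0, 1, 0, 0, 1, 1, 0], [], 2))   # the binary example 0100110
print(positional_value([4, 3, 6], [5, 7, 8], 10))       # the decimal example 436.578
```

Horner's rule avoids computing each power of the base separately, which is why the integer loop needs one multiplication per digit.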
  • Version 13.0 Release Notes
    Release Notes. The tool of thought for expert programming. Version 13.0. Dyalog is a trademark of Dyalog Limited. Copyright 1982-2011 by Dyalog Limited. All rights reserved. Version 13.0, First Edition, April 2011. No part of this publication may be reproduced in any form by any means without the prior written permission of Dyalog Limited. Dyalog Limited makes no representations or warranties with respect to the contents hereof and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. Dyalog Limited reserves the right to revise this publication without notification. TRADEMARKS: SQAPL is copyright of Insight Systems ApS. UNIX is a registered trademark of The Open Group. Windows, Windows Vista, Visual Basic and Excel are trademarks of Microsoft Corporation. All other trademarks and copyrights are acknowledged. Contents. Chapter 1, Introduction: Summary; System Requirements (Microsoft Windows; Microsoft .Net Interface; Unix and Linux
  • Data Representation in Computer Systems
    © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. "There are 10 kinds of people in the world: those who understand binary and those who don't." (Anonymous) Chapter 2: Data Representation in Computer Systems. 2.1 Introduction. The organization of any computer depends considerably on how it represents numbers, characters, and control information. The converse is also true: standards and conventions established over the years have determined certain aspects of computer organization. This chapter describes the various ways in which computers can store and manipulate numbers and characters. The ideas presented in the following sections form the basis for understanding the organization and function of all types of digital systems. The most basic unit of information in a digital computer is called a bit, which is a contraction of binary digit. In the concrete sense, a bit is nothing more than a state of "on" or "off" (or "high" and "low") within a computer circuit. In 1964, the designers of the IBM System/360 mainframe computer established a convention of using groups of 8 bits as the basic unit of addressable computer storage.
  • Number Systems
    1 Number Systems. The study of number systems is important from the viewpoint of understanding how data are represented before they can be processed by any digital system including a digital computer. It is one of the most basic topics in digital electronics. In this chapter we will discuss different number systems commonly used to represent data. We will begin the discussion with the decimal number system. Although it is not important from the viewpoint of digital electronics, a brief outline of this will be given to explain some of the underlying concepts used in other number systems. This will then be followed by the more commonly used number systems such as the binary, octal and hexadecimal number systems. 1.1 Analogue Versus Digital. There are two basic ways of representing the numerical values of the various physical quantities with which we constantly deal in our day-to-day lives. One of the ways, referred to as analogue, is to express the numerical value of the quantity as a continuous range of values between the two expected extreme values. For example, the temperature of an oven settable anywhere from 0 to 100 °C may be measured to be 65 °C or 64.96 °C or 64.958 °C or even 64.9579 °C and so on, depending upon the accuracy of the measuring instrument. Similarly, voltage across a certain component in an electronic circuit may be measured as 6.5 V or 6.49 V or 6.487 V or 6.4869 V. The underlying concept in this mode of representation is that variation in the numerical value of the quantity is continuous and could have any of the infinite theoretically possible values between the two extremes.
  • The Mayan Long Count Calendar Thomas Chanier
    The Mayan Long Count Calendar, by Thomas Chanier. To cite this version: Thomas Chanier. The Mayan Long Count Calendar. 2015. hal-00750006v11. HAL Id: hal-00750006, https://hal.archives-ouvertes.fr/hal-00750006v11. Preprint submitted on 8 Dec 2015 (v11), last revised 16 Dec 2015 (v12). HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Author affiliation: Université de Brest, 6 avenue Victor le Gorgeu, F-29285 Brest Cedex, France. The Mayan Codices, bark-paper books from the Late Postclassic period (1300 to 1521 CE) contain many astronomical tables correlated to ritual cycles, evidence of the achievement of Mayan naked-eye astronomy and mathematics in connection to religion. In this study, a calendar supernumber is calculated by computing the least common multiple of 8 canonical astronomical periods. The three major calendar cycles, the Calendar Round, the Kawil and the Long Count Calendar are shown to derive from this supernumber. The 360-day Tun, the 365-day civil year Haab' and the 3276-day Kawil-direction-color cycle are determined from the prime factorization of the 8 canonical astronomical input parameters.
  • Automatic Data Processing Glossary (ED 040 737)
    DOCUMENT RESUME. ED 040 737; LI 002 060.
    TITLE: Automatic Data Processing Glossary.
    INSTITUTION: Bureau of the Budget, Washington, D.C.
    NOTE: 65p.
    AVAILABLE FROM: Reprinted and distributed by Datamation Magazine, 35 Mason St., Greenwich, Conn. 06830 ($1.00).
    EDRS PRICE: MF-$0.50 HC-$3.35.
    DESCRIPTORS: *Electronic Data Processing, *Glossaries, *Word Lists.
    ABSTRACT: The technology of the automatic information processing field has progressed dramatically in the past few years and has created a problem in common term usage. As a solution, "Datamation" Magazine offers this glossary which was compiled by the U.S. Bureau of the Budget as an official reference. The terms appear in a single alphabetic sequence, ignoring commas or hyphens. Definitions are given only under "key word" entries. Modifiers consisting of more than one word are listed in the normally used sequence (record, fixed length). In cases where two or more terms have the same meaning, only the preferred term is defined; all synonymous terms are given at the end of the definition. Other relationships between terms are shown by descriptive referencing expressions. Hyphens are used sparingly to avoid ambiguity. The derivation of an acronym is shown by underscoring the appropriate letters in the words from which the acronym is formed. Although this glossary is several years old, it is still considered the best one available. (NH)
    U.S. Department of Health, Education & Welfare, Office of Education. This document has been reproduced exactly as received from the person or organization originating it. Points of view or opinions stated do not necessarily represent official Office of Education position or policy.
    Automatic Data Processing Glossary. DATAMATION Magazine reprints this Glossary of Terms as a service to the data processing field.
  • Floating Point
    Contents. Articles: Floating point; Positional notation. References: Article Sources and Contributors; Image Sources, Licenses and Contributors. Article Licenses: License. Floating point. In computing, floating point describes a method of representing an approximation of a real number in a way that can support a wide range of values. The numbers are, in general, represented approximately to a fixed number of significant digits (the significand) and scaled using an exponent. The base for the scaling is normally 2, 10 or 16. The typical number that can be represented exactly is of the form: $\text{significant digits} \times \text{base}^{\text{exponent}}$. The idea of floating-point representation over intrinsically integer fixed-point numbers, which consist purely of significand, is that expanding it with the exponent component achieves greater range. For instance, to represent large values, e.g. distances between galaxies, there is no need to keep all 39 decimal places down to femtometre-resolution (employed in particle physics). Assuming that the best resolution is in light years, only the 9 most significant decimal digits matter, whereas the remaining 30 digits carry pure noise, and thus can be safely dropped. This represents a savings of 100 bits of computer data storage. Instead of these 100 bits, much fewer are used to represent the scale (the exponent), e.g. 8 bits or 2 decimal digits. Given that one number can encode both astronomic and subatomic distances with the same nine digits of accuracy, but
    [Figure: An early electromechanical programmable computer, the Z3, included floating-point arithmetic (replica on display at Deutsches Museum in Munich).]
    [Figure: A diagram showing a representation of a decimal floating-point number using a mantissa and an exponent.]
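The significand-times-base-to-the-exponent form can be observed directly on Python floats. The sketch below assumes base 2 (the usual hardware choice, one of the three bases the excerpt lists) and uses only the standard-library calls `math.frexp`, `math.ldexp` and `math.ulp` (the last needs Python 3.9 or later); the helper name `decompose` is hypothetical.

```python
import math


def decompose(x: float) -> tuple[float, int]:
    """Split x into (significand, exponent) such that x == significand * 2**exponent."""
    significand, exponent = math.frexp(x)   # 0.5 <= |significand| < 1 for nonzero finite x
    return significand, exponent


s, e = decompose(6.5)
assert math.ldexp(s, e) == 6.5              # recombining the two parts is exact

# The number of significant bits is fixed, so the gap between neighbouring
# floats grows with the exponent: wide range, constant relative precision.
print(math.ulp(1.0), math.ulp(2.0 ** 60))
```

The growing gap is the trade the excerpt describes: digits that would only carry noise at a large scale are simply not stored.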
  • Non-Power Positional Number Representation Systems, Bijective Numeration, and the Mesoamerican Discovery of Zero
    Non-Power Positional Number Representation Systems, Bijective Numeration, and the Mesoamerican Discovery of Zero. Berenice Rojo-Garibaldi (a), Costanza Rangoni (b), Diego L. González (b, c), and Julyan H. E. Cartwright (d, e). (a) Posgrado en Ciencias del Mar y Limnología, Universidad Nacional Autónoma de México, Av. Universidad 3000, Col. Copilco, Del. Coyoacán, Cd. Mx. 04510, México. (b) Istituto per la Microelettronica e i Microsistemi, Area della Ricerca CNR di Bologna, 40129 Bologna, Italy. (c) Dipartimento di Scienze Statistiche "Paolo Fortunati", Università di Bologna, 40126 Bologna, Italy. (d) Instituto Andaluz de Ciencias de la Tierra, CSIC-Universidad de Granada, 18100 Armilla, Granada, Spain. (e) Instituto Carlos I de Física Teórica y Computacional, Universidad de Granada, 18071 Granada, Spain. arXiv:2005.10207v2 [math.HO] 23 Mar 2021. Keywords: Zero, Maya, Pre-Columbian Mesoamerica, Number representation systems, Bijective numeration. Abstract. Pre-Columbian Mesoamerica was a fertile crescent for the development of number systems. A form of vigesimal system seems to have been present from the first Olmec civilization onwards, to which succeeding peoples made contributions. We discuss the Maya use of the representational redundancy present in their Long Count calendar, a non-power positional number representation system with multipliers $1, 20, 18 \times 20, \ldots, 18 \times 20^{n}$. We demonstrate that the Mesoamericans did not need to invent positional notation and discover zero at the same time because they were not afraid of using a number system in which the same number can be written in different ways. A Long Count number system with digits from 0 to 20 is seen later to pass to one using digits 0 to 19, which leads us to propose that even earlier there may have been an initial zeroless bijective numeration system whose digits ran from 1 to 20.
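The multipliers 1, 20, 18×20, ..., 18×20^n named in the abstract define a mixed-radix system, and the redundancy it mentions appears as soon as the digit 20 is allowed. The Python sketch below is an illustration under stated assumptions, not the authors' method: the function names are hypothetical, digits are written most significant first, and the canonical form chosen by `from_days` (second-lowest digit below 18, other digits below 20, top digit unbounded) is this sketch's own convention.

```python
def long_count_weights(n_places):
    """Place values 1, 20, 18*20, 18*20**2, ... for the lowest n_places positions."""
    return [1 if i == 0 else 20 if i == 1 else 18 * 20 ** (i - 1)
            for i in range(n_places)]


def to_days(digits):
    """Value of a digit list written most significant digit first."""
    weights = long_count_weights(len(digits))
    return sum(d * w for d, w in zip(reversed(digits), weights))


def from_days(days, n_places=5):
    """Greedy conversion to the canonical digit list, most significant digit first."""
    digits = []
    for w in reversed(long_count_weights(n_places)):
        q, days = divmod(days, w)
        digits.append(q)
    return digits


# Allowing the digit 20 makes the system redundant: two spellings, one value.
print(to_days([0, 0, 0, 0, 20]), to_days([0, 0, 0, 1, 0]))
```

The greedy conversion works because every place value is an exact multiple of the one below it.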
  • A Tutorial on Data Representation - Integers, Floating-point numbers, and characters
    A Tutorial on Data Representation. 1. Number Systems. Human beings use decimal (base 10) and duodecimal (base 12) number systems for counting and measurements (probably because we have 10 fingers and two big toes). Computers use binary (base 2) number system, as they are made from binary digital components (known as transistors) operating in two states - on and off. In computing, we also use hexadecimal (base 16) or octal (base 8) number systems, as a compact form for representing binary numbers. 1.1 Decimal (Base 10) Number System. Decimal number system has ten symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, called digits. It uses positional notation. That is, the least-significant digit (right-most digit) is of the order of 10^0 (units or ones), the second right-most digit is of the order of 10^1 (tens), the third right-most digit is of the order of 10^2 (hundreds), and so on. For example, 735 = 7×10^2 + 3×10^1 + 5×10^0. We shall denote a decimal number with an optional suffix D if ambiguity arises. 1.2 Binary (Base 2) Number System. Binary number system has two symbols: 0 and 1, called bits. It is also a positional notation, for example, 10110B = 1×2^4 + 0×2^3 + 1×2^2 + 1×2^1 + 0×2^0. We shall denote a binary number with a suffix B. Some programming languages denote binary numbers with prefix 0b (e.g., 0b1001000), or prefix b with the bits quoted (e.g., b'10001111').
  • New Directions in Floating-Point Arithmetic
    New directions in floating-point arithmetic. Nelson H. F. Beebe, University of Utah, Department of Mathematics, 110 LCB, 155 S 1400 E RM 233, Salt Lake City, UT 84112-0090, USA. Abstract. This article briefly describes the history of floating-point arithmetic, the development and features of IEEE standards for such arithmetic, desirable features of new implementations of floating-point hardware, and discusses work-in-progress aimed at making decimal floating-point arithmetic widely available across many architectures, operating systems, and programming languages. Keywords: binary arithmetic; decimal arithmetic; elementary functions; fixed-point arithmetic; floating-point arithmetic; interval arithmetic; mathcw library; range arithmetic; special functions; validated numerics. PACS: 02.30.Gp Special functions. DEDICATION. This article is dedicated to N. Yngve Öhrn, my mentor, thesis co-advisor, and long-time friend, on the occasion of his retirement from academia. It is also dedicated to William Kahan, Michael Cowlishaw, and the late James Wilkinson, with much thanks for inspiration. WHAT IS FLOATING-POINT ARITHMETIC? Floating-point arithmetic is a technique for storing and operating on numbers in a computer where the base, range, and precision of the number system are usually fixed by the computer design. Conceptually, a floating-point number has a sign, an exponent, and a significand (the older term mantissa is now deprecated), allowing a representation of the form $(-1)^{\text{sign}} \times \text{significand} \times \text{base}^{\text{exponent}}$. The base point in the significand may be at the left, or after the first digit, or at the right. The point and the base are implicit in the representation: neither is stored. The sign can be compactly represented by a single bit, and the exponent is most commonly a biased unsigned bitfield, although some historical architectures used a separate exponent sign and an unbiased exponent.
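The sign bit and biased exponent field described above can be inspected with a few lines of Python. The excerpt does not name a concrete format, so this sketch assumes the IEEE 754 binary32 layout (1 sign bit, 8 exponent bits with bias 127, 23 stored fraction bits, base 2 and the leading 1 both implicit); the function names are hypothetical and only the standard-library `struct` module is used.

```python
import struct


def decode_binary32(x: float):
    """Return (sign, biased_exponent, fraction) of x after rounding to binary32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))   # reinterpret the 4 bytes as an integer
    sign = bits >> 31                       # 1 bit
    biased_exponent = (bits >> 23) & 0xFF   # 8-bit unsigned field, bias 127 (assumed format)
    fraction = bits & 0x7FFFFF              # 23 stored bits; the leading 1 is not stored
    return sign, biased_exponent, fraction


def value_of_normal(sign, biased_exponent, fraction):
    """Rebuild a normal number: (-1)**sign * (1 + fraction / 2**23) * 2**(biased_exponent - 127)."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (biased_exponent - 127)


fields = decode_binary32(-6.5)
print(fields, value_of_normal(*fields))
```

Neither the base nor the position of the point appears among the three fields, which matches the remark that both are implicit. Zero, subnormal numbers, infinities and NaNs use reserved exponent values and are outside what `value_of_normal` handles.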