Floating Point Representation


Michael L. Overton, © copyright 1996

1 Computer Representation of Numbers

Computers which work with real arithmetic use a system called floating point. Suppose a real number x has the binary expansion

    x = m × 2^E,  where 1 ≤ m < 2  and  m = (b_0.b_1 b_2 b_3 ...)_2.

To store a number in floating point representation, a computer word is divided into 3 fields, representing the sign, the exponent E, and the significand m respectively. A 32-bit word could be divided into fields as follows: 1 bit for the sign, 8 bits for the exponent and 23 bits for the significand. Since the exponent field is 8 bits, it can be used to represent exponents between −128 and 127. The significand field can store the first 23 bits of the binary representation of m, namely b_0.b_1...b_22. If b_23, b_24, ... are not all zero, this floating point representation of x is not exact but approximate. A number is called a floating point number if it can be stored exactly on the computer using the given floating point representation scheme, i.e. in this case, b_23, b_24, ... are all zero. For example, the number

    11/2 = (1.011)_2 × 2^2

would be represented by

    0 | E = 2 | 1.0110000000000000000000,

and the number

    71 = (1.000111)_2 × 2^6

would be represented by

    0 | E = 6 | 1.0001110000000000000000.

To avoid confusion, the exponent E, which is actually stored in a binary representation, is shown in decimal for the moment.

The floating point representation of a nonzero number is unique as long as we require that 1 ≤ m < 2. If it were not for this requirement, the number 11/2 could also be written

    11/2 = (0.01011)_2 × 2^4

and could therefore be represented by

    0 | E = 4 | 0.0101100000000000000000.

However, this is not allowed since b_0 = 0 and so m < 1. A more interesting example is

    1/10 = (0.0001100110011...)_2.

Since this binary expansion is infinite, we must truncate the expansion somewhere. (An alternative, namely rounding, is discussed later.)
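The expansions above can be generated mechanically. The following is a small sketch, not part of the original text, that peels off the leading bits of x = m × 2^E with 1 ≤ m < 2, using Python's math.frexp to normalize first:

```python
import math

def binary_expansion(x, nbits):
    """Return (E, bits) with x = (b0.b1 ... b_{nbits-1})_2 * 2**E and b0 = 1."""
    m, e = math.frexp(abs(x))        # x = m * 2**e with 0.5 <= m < 1
    m, E = 2 * m, e - 1              # renormalize so that 1 <= m < 2
    bits = []
    for _ in range(nbits):
        b = int(m)                   # next bit of the expansion
        bits.append(str(b))
        m = (m - b) * 2
    return E, "".join(bits)

print(binary_expansion(11 / 2, 6))   # (2, '101100'): (1.011)_2 * 2**2
print(binary_expansion(71, 8))       # (6, '10001110'): (1.000111)_2 * 2**6
print(binary_expansion(0.1, 12))     # (-4, '110011001100'): the repeating pattern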
The simplest way to truncate the expansion to 23 bits would give the representation

    0 | E = 0 | 0.0001100110011001100110,

but this means m < 1 since b_0 = 0. An even worse choice of representation would be the following: since

    1/10 = (0.00000001100110011...)_2 × 2^4,

the number could be represented by

    0 | E = 4 | 0.0000000110011001100110.

This is clearly a bad choice since less of the binary expansion of 1/10 is stored, due to the space wasted by the leading zeros in the significand field. This is the reason why m < 1, i.e. b_0 = 0, is not allowed. The only allowable representation for 1/10 uses the fact that

    1/10 = (1.100110011...)_2 × 2^(−4),

giving the representation

    0 | E = −4 | 1.1001100110011001100110.

This representation includes more of the binary expansion of 1/10 than the others, and is said to be normalized, since b_0 = 1, i.e. m ≥ 1. Thus none of the available bits is wasted by storing leading zeros. We can see from this example why the name floating point is used: the binary point of the number 1/10 can be floated to any position in the bit string we like by choosing the appropriate exponent; the normalized representation, with b_0 = 1, is the one which should always be used when possible. It is clear that an irrational number such as π is also represented most accurately by a normalized representation: significand bits should not be wasted by storing leading zeros.

However, the number zero is special. It cannot be normalized, since all the bits in its representation are zero. The exponent E is irrelevant and can be set to zero. Thus, zero could be represented as

    0 | E = 0 | 0.0000000000000000000000.

The gap between the number 1 and the next largest floating point number is called the precision of the floating point system, or, often, the machine precision, and we shall denote this by ε.[1] In the system just described, the next floating point number bigger than 1 is

    1.0000000000000000000001,

with the last bit b_22 = 1. Therefore, the precision is ε = 2^(−22).
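The same gap-above-1 definition can be probed on Python's built-in float, which is an IEEE double with 52 fraction bits rather than the 22 of the system described here. This sketch is not from the text, and math.ulp assumes Python 3.9 or later:

```python
import math

# Machine precision in the sense defined above: the gap between 1.0 and
# the next larger floating point number.  For IEEE double this is 2**-52.
eps = math.ulp(1.0)          # Python 3.9+
print(eps == 2.0 ** -52)     # True

# The same value found by halving: the last power of two whose addition
# to 1.0 still produces a number greater than 1.0.
g = 1.0
while 1.0 + g / 2.0 > 1.0:
    g = g / 2.0
print(g == 2.0 ** -52)       # True
```

The halving loop stops at 2^(−52) because adding 2^(−53) to 1.0 rounds back to 1.0.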
Exercise 1  What is the smallest possible positive normalized floating point number using the system just described?

Exercise 2  Could nonzero numbers instead be normalized so that 1/2 ≤ m < 1? Would this be just as good?

It is quite instructive to suppose that the computer word size is much smaller than 32 bits and work out in detail what all the possible floating point numbers are in such a case. Suppose that the significand field has room only to store b_0.b_1 b_2, and that the only possible values for the exponent E are −1, 0 and 1. We shall call this system our toy floating point system. The set of toy floating point numbers is shown in Figure 1.

    [Figure 1: The Toy Floating Point Numbers (a number line marking 0, 1, 2, 3, with ticks at each toy number)]

The largest number is (1.11)_2 × 2^1 = (3.5)_10, and the smallest positive normalized number is (1.00)_2 × 2^(−1) = (0.5)_10. All of the numbers shown are normalized except zero. Since the next floating point number bigger than 1 is 1.25, the precision of the toy system is ε = 0.25. Note that the gap between floating point numbers becomes smaller as the magnitude of the numbers themselves gets smaller, and bigger as the numbers get bigger. Note also that the gap between zero and the smallest positive number is much bigger than the gap between the smallest positive number and the next positive number. We shall show in the next section how this gap can be "filled in" with the introduction of "subnormal numbers".

[1] Actually, the usual definition of precision is one half of this quantity, for reasons that will become apparent in the next section. We prefer to omit the factor of one half in the definition.

2 IEEE Floating Point Representation

In the 1960's and 1970's, each computer manufacturer developed its own floating point system, leading to a lot of inconsistency as to how the same program behaved on different machines. For example, although most machines used binary floating point systems, the IBM 360/370 series, which dominated computing during this period, used a hexadecimal base, i.e.
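The nonnegative toy numbers can be enumerated exactly. This sketch, not part of the text, builds the set with exact rationals so no rounding intrudes:

```python
from fractions import Fraction

# Enumerate the nonnegative toy floating point numbers: significands
# (1.b1 b2)_2 with exponents E in {-1, 0, 1}, plus the special number zero.
toys = {Fraction(0)}
for E in (-1, 0, 1):
    for b1 in (0, 1):
        for b2 in (0, 1):
            m = Fraction(4 + 2 * b1 + b2, 4)   # (1.b1 b2)_2 = 1 + b1/2 + b2/4
            toys.add(m * Fraction(2) ** E)
toys = sorted(toys)
print([float(t) for t in toys])
# [0.0, 0.5, 0.625, 0.75, 0.875, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5]
```

The printed list makes the text's observations visible: the gap doubles each time the magnitude crosses a power of two, and the gap between 0 and 0.5 dwarfs the gap between 0.5 and 0.625.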
numbers were represented as m × 16^E. Other machines, such as HP calculators, used a decimal floating point system. Through the efforts of several computer scientists, particularly W. Kahan, a binary floating point standard was developed in the early 1980's and, most importantly, followed very carefully by the principal manufacturers of floating point chips for personal computers, namely Intel and Motorola. This standard has become known as the IEEE floating point standard since it was developed and endorsed by a working committee of the Institute of Electrical and Electronics Engineers.[2] (There is also a decimal version of the standard but we shall not discuss this.)

The IEEE standard has three very important requirements:

  - consistent representation of floating point numbers across all machines adopting the standard;
  - correctly rounded arithmetic (to be explained in the next section);
  - consistent and sensible treatment of exceptional situations such as division by zero (to be discussed in the following section).

We will not describe the standard in detail, but we will cover the main points. We start with the following observation. In the last section, we chose to normalize a nonzero number x so that x = m × 2^E, where 1 ≤ m < 2, i.e.

    m = (b_0.b_1 b_2 b_3 ...)_2,

with b_0 = 1. In the simple floating point model discussed in the previous section, we stored the leading nonzero bit b_0 in the first position of the field provided for m. Note, however, that since we know this bit has the value one, it is not necessary to store it. Consequently, we can use the 23 bits of the significand field to store b_1, b_2, ..., b_23 instead of b_0, b_1, ..., b_22, changing the machine precision from ε = 2^(−22) to ε = 2^(−23). Since the bit string stored in the significand field is now actually the fractional part of the significand, we shall refer henceforth to the field as the fraction field.

[2] ANSI/IEEE Std 754-1985. Thanks to Jim Demmel for introducing the author to the standard.
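The 1 + 8 + 23 layout can be inspected on any machine that follows the standard. This sketch, not from the text, packs a Python float as an IEEE single and splits the raw bit pattern into its three fields; note that the stored exponent field holds E plus a bias of 127, a detail of the standard's encoding:

```python
import struct

# Split an IEEE single precision bit pattern into its three fields:
# 1 sign bit, 8 exponent bits, 23 fraction bits (the hidden bit is not stored).
def fields(x):
    (w,) = struct.unpack(">I", struct.pack(">f", x))   # the raw 32 bits
    sign = w >> 31
    exp_bits = (w >> 23) & 0xFF    # stored exponent = E + 127 (biased)
    frac = w & 0x7FFFFF            # the 23 bits b1 b2 ... b23
    return sign, exp_bits, frac

# 11/2 = (1.011)_2 * 2**2: sign 0, stored exponent 2 + 127 = 129,
# fraction bits 011 followed by twenty zeros.
print(fields(5.5))    # (0, 129, 0b01100000000000000000000)
```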
Given a string of bits in the fraction field, it is necessary to imagine that the symbols "1." appear in front of the string, even though these symbols are not stored. This technique is called hidden bit normalization and was used by Digital for the Vax machine in the late 1970's.

Exercise 3  Show that the hidden bit technique does not result in a more accurate representation of 1/10. Would this still be true if we had started with a field width of 24 bits before applying the hidden bit technique?

Note an important point: since zero cannot be normalized to have a leading nonzero bit, hidden bit representation requires a special technique for storing zero. We shall see what this is shortly. A pattern of all zeros in the fraction field of a normalized number represents the significand 1.0, not 0.0.
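The decoding direction makes the hidden bit concrete. This sketch, again assuming the standard single format with its bias of 127 (not part of the original text), reinterprets the stored fraction f as the significand 1.f. It deliberately handles only normalized numbers: feeding it the all-zeros pattern for zero would yield 1.0 × 2^(−127) rather than 0, which is exactly why zero needs the special convention just noted.

```python
import struct

# Decode an IEEE single bit pattern, supplying the hidden bit: for a
# normalized number the significand is m = (1.f)_2 = 1 + frac / 2**23.
def decode(x):
    (w,) = struct.unpack(">I", struct.pack(">f", x))
    sign = -1.0 if (w >> 31) else 1.0
    exp_bits = (w >> 23) & 0xFF
    frac = w & 0x7FFFFF
    m = 1.0 + frac / 2.0 ** 23                 # the hidden "1." is not stored
    return sign * m * 2.0 ** (exp_bits - 127)  # undo the exponent bias

# These values are exactly representable in single precision, so decoding
# reproduces them exactly.
for x in (5.5, 71.0, -0.75):
    print(x, decode(x))
```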