Part V Real Arithmetic

Parts and Chapters

I. Number Representation
   1. Numbers and Arithmetic
   2. Representing Signed Numbers
   3. Redundant Number Systems
   4. Residue Number Systems

II. Addition / Subtraction
   5. Basic Addition and Counting
   6. Carry-Lookahead Adders
   7. Variations in Fast Adders
   8. Multioperand Addition

III. Multiplication
   9. Basic Multiplication Schemes
   10. High-Radix Multipliers
   11. Tree and Array Multipliers
   12. Variations in Multipliers

IV. Division
   13. Basic Division Schemes
   14. High-Radix Dividers
   15. Variations in Dividers
   16. Division by Convergence

(Parts I-IV: Elementary Operations)

V. Real Arithmetic
   17. Floating-Point Representations
   18. Floating-Point Operations
   19. Errors and Error Control
   20. Precise and Certifiable Arithmetic

VI. Function Evaluation
   21. Square-Rooting Methods
   22. The CORDIC Algorithms
   23. Variations in Function Evaluation
   24. Arithmetic by Table Lookup

VII. Implementation Topics
   25. High-Throughput Arithmetic
   26. Low-Power Arithmetic
   27. Fault-Tolerant Arithmetic
   28. Reconfigurable Arithmetic

Appendix: Past, Present, and Future

About This Presentation

This presentation is intended to support the use of the textbook Computer Arithmetic: Algorithms and Hardware Designs (Oxford U. Press, 2nd ed., 2010, ISBN 978-0-19-532848-6). It is updated regularly by the author as part of his teaching of the graduate course ECE 252B, Computer Arithmetic, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Unauthorized uses are strictly prohibited. © Behrooz Parhami

Edition   Released    Revised
First     Jan. 2000   Sep. 2001, Sep. 2003, Oct. 2005, May 2007, May 2008, May 2009
Second    May 2010    Apr. 2011, May 2012, May 2015

V Real Arithmetic

Review floating-point numbers, arithmetic, and errors:
  • How to combine wide range with high precision
  • Format and arithmetic ops; the IEEE standard
  • Causes and consequences of computation errors
  • When can we trust computation results?

Topics in This Part
  Chapter 17  Floating-Point Representations
  Chapter 18  Floating-Point Operations
  Chapter 19  Errors and Error Control
  Chapter 20  Precise and Certifiable Arithmetic

[Cartoon] “According to my calculation, you should float now ... I think ...”  “It’s an inexact science.”

17 Floating-Point Representations

Chapter Goals

Study a representation method offering both wide range (e.g., astronomical distances) and high precision (e.g., atomic distances)

Chapter Highlights

Floating-point formats and related tradeoffs
The need for a floating-point standard
Finiteness of precision and range
Fixed-point and logarithmic representations as special cases at the two extremes

Floating-Point Representations: Topics

Topics in This Chapter

17.1 Floating-Point Numbers

17.2 The IEEE Floating-Point Standard

17.3 Basic Floating-Point Algorithms

17.4 Conversions and Exceptions

17.5 Rounding Schemes

17.6 Logarithmic Number Systems

17.1 Floating-Point Numbers

No finite number system can represent all real numbers. Various systems can be used for a subset of the real numbers:

  Fixed-point:     ±w.f         Low precision and/or range
  Rational:        ±p/q         Difficult arithmetic
  Floating-point:  ±s × b^e     Most common scheme
  Logarithmic:     ±log_b x     Limiting case of floating-point

Fixed-point numbers:
  x = (0000 0000 . 0000 1001)two    Small number
  y = (1001 0000 . 0000 0000)two    Large number
Neither the square of x nor the square of y is representable in this format.

Floating-point numbers: x = ±s × b^e, i.e., ±significand × base^exponent, e.g.:
  x = 1.001 × 2^–5
  y = 1.001 × 2^+7

A floating-point number comes with two signs:
  Number sign, usually appears as a separate bit
  Exponent sign, usually embedded in the biased exponent

Floating-Point Number Format and Distribution

Fig. 17.1 Typical floating-point number format: a sign bit (0: +, 1: –), an exponent, and a significand.
  Exponent: a signed integer, often represented as an unsigned value by adding a bias; range with h bits: [–bias, 2^h – 1 – bias].
  Significand: represented as a fixed-point number, usually normalized by shifting so that the MSB becomes nonzero. In radix 2, the fixed leading 1 can be removed to save one bit; this bit is known as the “hidden 1”.

Fig. 17.2 Subranges and special values in floating-point number representations: overflow regions beyond –max and +max, an underflow region between –min and +min around ±0, and the representable negative and positive FLP numbers in between, denser toward ±min and sparser toward ±max (typical examples: 1.001 × 2^–5 and 1.001 × 2^+7).

Floating-Point Before the IEEE Standard

Computer manufacturers tended to have their own hardware-level formats This created many problems, as floating-point computations could produce vastly different results (not just differing in the last few significant bits)

To get a sense for the wide variations in floating-point formats, visit: http://www.mrob.com/pub/math/floatformats.html

In computer arithmetic, we talked about IBM, CDC, DEC, Cray, . . . formats and discussed their relative merits.
The first IEEE standard for binary floating-point arithmetic was adopted in 1985, after years of discussion.
The 1985 standard was continuously discussed, criticized, and clarified for a couple of decades.
In 2008, after several years of discussion, a revised standard was issued.

17.2 The IEEE Floating-Point Standard

Short (32-bit) format: a sign bit, an 8-bit exponent (bias = 127, exponent range –126 to 127), and 23 bits for the fractional part of the significand (plus the hidden 1 in the integer part).

Long (64-bit) format: a sign bit, an 11-bit exponent (bias = 1023, exponent range –1022 to 1023), and 52 bits for the fractional part of the significand (plus the hidden 1 in the integer part).

The IEEE 754-2008 standard (which supersedes IEEE 754-1985) also includes half- and quad-word binary formats, plus some decimal formats.

Fig. 17.3 The IEEE standard floating-point number representation formats.

Overview of IEEE 754-2008 Standard Formats

Table 17.1 Some features of the IEEE 754-2008 standard floating-point number representation formats
––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Feature              Single/Short                Double/Long
––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Word width (bits)    32                          64
Significand bits     23 + 1 hidden               52 + 1 hidden
Significand range    [1, 2 – 2^–23]              [1, 2 – 2^–52]
Exponent bits        8                           11
Exponent bias        127                         1023
Zero (±0)            e + bias = 0, f = 0         e + bias = 0, f = 0
Denormal             e + bias = 0, f ≠ 0         e + bias = 0, f ≠ 0
                     represents ±0.f × 2^–126    represents ±0.f × 2^–1022
Infinity (±∞)        e + bias = 255, f = 0       e + bias = 2047, f = 0
Not-a-number (NaN)   e + bias = 255, f ≠ 0       e + bias = 2047, f ≠ 0
Ordinary number      e + bias ∈ [1, 254]         e + bias ∈ [1, 2046]
                     e ∈ [–126, 127]             e ∈ [–1022, 1023]
                     represents 1.f × 2^e        represents 1.f × 2^e
min                  2^–126 ≈ 1.2 × 10^–38       2^–1022 ≈ 2.2 × 10^–308
max                  ≈ 2^128 ≈ 3.4 × 10^38       ≈ 2^1024 ≈ 1.8 × 10^308
––––––––––––––––––––––––––––––––––––––––––––––––––––––––
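To make Table 17.1 concrete, the following Python sketch (function name and interface are ours, not the book's) unpacks the three fields of a 32-bit float and classifies the encoding:

```python
import struct

def decode_float32(x):
    """Split an IEEE 754 single-precision value into its sign, biased
    exponent, and fraction fields, and classify the encoding."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    biased = (bits >> 23) & 0xFF        # 8 exponent bits, bias = 127
    frac = bits & 0x7FFFFF              # 23 fractional significand bits
    if biased == 255:
        kind = "infinity" if frac == 0 else "NaN"
    elif biased == 0:
        kind = "zero" if frac == 0 else "subnormal"   # 0.f x 2**-126
    else:
        kind = "ordinary"               # 1.f x 2**(biased - 127)
    return sign, biased, frac, kind

print(decode_float32(1.0))         # (0, 127, 0, 'ordinary')
print(decode_float32(2.0**-149))   # smallest positive subnormal
print(decode_float32(float("inf")))
```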

Exponent Encoding

Exponent encoding in 8 bits for the single/short (32-bit) IEEE 754 format:

Decimal code      0    1  . . .  126   127   128  . . .  254   255
Hex code          00   01 . . .  7E    7F    80   . . .  FE    FF
Exponent value         –126 . . . –1    0    +1   . . . +127

Codes 1 through 254 represent ordinary numbers, ±1.f × 2^e.
f = 0 with biased exponent 0: representation of ±0
f ≠ 0 with biased exponent 0: representation of subnormals, ±0.f × 2^–126
f = 0 with biased exponent 255: representation of ±∞
f ≠ 0 with biased exponent 255: representation of NaNs

Exponent encoding in 11 bits for the double/long (64-bit) format is similar.

Special Operands and Subnormals

Operations on special operands:
  Ordinary number ÷ (+∞) = ±0
  (+∞) × Ordinary number = ±∞
  NaN + Ordinary number = NaN

Biased exponent values: 0 encodes ±0 and the subnormals (±0.f × 2^–126); 1 through 254 encode ordinary FLP numbers (1.f × 2^e, e ∈ [–126, 127]); 255 encodes ±∞ and NaNs.

Subnormals fill the gap between 0 and 2^–126 with uniform spacing; the smallest positive subnormal is

min = (1.00 . . . 01 – 1.00 . . . 00) × 2^–126 = 2^–149

Fig. 17.4 Subnormals in the IEEE single-precision format.

Extended Formats

The bias is unspecified, but the exponent range must include:

  Single extended: [–1022, 1023]       (≥ 11 exponent bits, ≥ 32 bits in all)
  Double extended: [–16 382, 16 383]   (≥ 15 exponent bits, ≥ 64 bits in all)

Requirements for Arithmetic

Results of the 4 basic arithmetic operations (+, –, ×, ÷) as well as square-rooting must match those obtained if all intermediate computations were infinitely precise. That is, a floating-point arithmetic operation should introduce no more imprecision than the error attributable to the final rounding of a result that has no exact representation (this is the best possible).

Example: (1 + 2^–1) × (1 + 2^–23)

Exact result:   1 + 2^–1 + 2^–23 + 2^–24

Chopped result: 1 + 2^–1 + 2^–23     Error = –½ ulp
Rounded result: 1 + 2^–1 + 2^–22     Error = +½ ulp

17.3 Basic Floating-Point Algorithms

Addition

Assume e1 ≥ e2; an alignment shift (preshift) is needed if e1 > e2:

(±s1 × b^e1) + (±s2 × b^e2) = (±s1 × b^e1) + (±s2 / b^(e1–e2)) × b^e1
                            = (±s1 ± s2 / b^(e1–e2)) × b^e1 = ±s × b^e

Example: Numbers to be added:
  x = 2^5 × 1.00101101
  y = 2^1 × 1.11101101      (operand with smaller exponent, to be preshifted)
Operands after alignment shift:
  x = 2^5 × 1.00101101
  y = 2^5 × 0.000111101101  (extra bits, to be rounded off)
Result of addition:
  s = 2^5 × 1.010010111101
  s = 2^5 × 1.01001100      (rounded sum)
Rounding, overflow, and underflow issues are discussed later.
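A minimal software rendering of the alignment-and-add steps above (a sketch under our own conventions: positive operands, significands held as integers with 8 fractional bits, and extra bits simply chopped):

```python
def fp_add(e1, s1, e2, s2, frac_bits=8):
    """Add two positive FLP values given as (exponent, significand);
    significands are integers scaled by 2**frac_bits."""
    if e1 < e2:                          # ensure e1 >= e2
        e1, s1, e2, s2 = e2, s2, e1, s1
    s2 >>= (e1 - e2)                     # alignment preshift; shifted-out
                                         # bits are chopped in this sketch
    s = s1 + s2
    if s >= 2 << frac_bits:              # sum in [2, 4): 1-bit right shift
        s >>= 1
        e1 += 1
    return e1, s

# x = 2**5 * 1.00101101, y = 2**1 * 1.11101101 (the example above)
e, s = fp_add(5, 0b100101101, 1, 0b111101101)
print(e, bin(s))   # 5 0b101001011, i.e., 2**5 * 1.01001011 (chopped)
```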

Floating-Point Multiplication and Division

Multiplication

(±s1 × b^e1) × (±s2 × b^e2) = ±(s1 × s2) × b^(e1+e2)

Because s1 × s2 ∈ [1, 4), postshifting may be needed for normalization.

Overflow or underflow can occur during multiplication or normalization.

Division

(±s1 × b^e1) / (±s2 × b^e2) = ±(s1 / s2) × b^(e1–e2)

Because s1 / s2 ∈ (0.5, 2), postshifting may be needed for normalization.

Overflow or underflow can occur during division or normalization.

Floating-Point Square-Rooting

For e even:  √(s × b^e) = √s × b^(e/2)

For e odd:   √(bs × b^(e–1)) = √(bs) × b^((e–1)/2)

After the adjustment of s to bs and e to e – 1, if needed, we have:

√(s* × b^e*) = √(s*) × b^(e*/2)

where e* is even, s* ∈ [1, 4) for IEEE 754, and √(s*) ∈ [1, 2) for IEEE 754.

Overflow or underflow is impossible; no postnormalization is needed.

17.4 Conversions and Exceptions

Conversions from fixed- to floating-point
Conversions between floating-point formats
Conversion from high to lower precision: rounding

The IEEE 754-2008 standard includes five rounding modes:
  Round to nearest, ties away from 0 (rtna)
  Round to nearest, ties to even (rtne) [default rounding mode]
  Round toward zero (inward)
  Round toward +∞ (upward)
  Round toward –∞ (downward)
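Python's decimal module implements the same five modes for decimal arithmetic, which makes them easy to try out (the mode-name mapping in the comments is ours):

```python
from decimal import (Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN,
                     ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR)

x = Decimal("2.5")   # a midpoint value, so the ties rules differ
for mode, name in [(ROUND_HALF_UP,   "rtna (ties away from 0)"),
                   (ROUND_HALF_EVEN, "rtne (ties to even)"),
                   (ROUND_DOWN,      "toward zero"),
                   (ROUND_CEILING,   "toward +inf"),
                   (ROUND_FLOOR,     "toward -inf")]:
    print(name, "->", x.quantize(Decimal("1"), rounding=mode))
# rtna -> 3, rtne -> 2, toward zero -> 2, toward +inf -> 3, toward -inf -> 2
```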

Exceptions in Floating-Point Arithmetic

Divide by zero
Overflow
Underflow

Inexact result: rounded value not the same as original

Invalid operation, with examples including the following (all produce NaN as their results):
  Addition: (+∞) + (–∞)
  Multiplication: 0 × ∞
  Division: 0/0 or ∞/∞
  Square-rooting: operand < 0

17.5 Rounding Schemes

Rounding maps a value with a whole part and a fractional part to a whole value (position 0 has weight ulp):

Round:  xk–1xk–2 . . . x1x0 . x–1x–2 . . . x–l  →  yk–1yk–2 . . . y1y0

The simplest possible rounding scheme is chopping, or truncation:

Chop:   xk–1xk–2 . . . x1x0 . x–1x–2 . . . x–l  →  xk–1xk–2 . . . x1x0

Truncation or Chopping

[Plots of chop(x) for –4 ≤ x ≤ 4; in the 2’s-complement case, chop(x) = down(x).]

Fig. 17.5 Truncation or chopping of a signed-magnitude number (same as round toward 0).

Fig. 17.6 Truncation or chopping of a 2’s-complement number (same as downward-directed rounding).

Round to Nearest Number

[Plot of rtna(x) for –4 ≤ x ≤ 4.]

Fig. 17.7 Rounding of a signed-magnitude value to the nearest number.

Rounding has a slight upward bias. Consider rounding (xk–1xk–2 . . . x1x0 . x–1x–2)two to an integer (yk–1yk–2 . . . y1y0 .)two. The four possible cases, and their representation errors, are:

x–1x–2   Round   Error
00       down    0
01       down    –0.25
10       up      0.5
11       up      0.25

With equal probabilities, the mean error is 0.125. For certain calculations, the probability of getting a midpoint value can be much higher than 2^–l.
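The 0.125 mean error is easy to verify with exact rational arithmetic (a small sketch; rtna is implemented for nonnegative values only):

```python
from fractions import Fraction

def rtna(x):
    """Round to nearest, ties away from zero, for x >= 0."""
    return int(x + Fraction(1, 2))

# The four 2-bit fractional patterns .00, .01, .10, .11
cases = [Fraction(n, 4) for n in range(4)]
bias = sum(rtna(x) - x for x in cases) / len(cases)
print(bias)   # 1/8 = 0.125, the slight upward bias noted above
```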

Round to Nearest Even Number

[Plots of rtne(x) and R*(x) for –4 ≤ x ≤ 4.]

Fig. 17.8 Rounding to the nearest even number.

Fig. 17.9 R* rounding or rounding to the nearest odd number.

A Simple Symmetric Rounding Scheme

Chop and force the LSB of the result to 1.

This combines the simplicity of chopping with the near-symmetry of ordinary rounding.

The max error is comparable to chopping (double that of rounding).

[Plot of jam(x) for –4 ≤ x ≤ 4.]

Fig. 17.10 Jamming or von Neumann rounding.

ROM Rounding

[Plot of ROM(x) for –4 ≤ x ≤ 4.]

Fig. 17.11 ROM rounding with an 8 × 2 table.

Example: Rounding with a 32 × 4 table. The rounding result is the same as that of the round-to-nearest scheme in 31 of the 32 possible cases, but a larger error is introduced when x3 = x2 = x1 = x0 = x–1 = 1:

xk–1 . . . x4 x3x2x1x0 . x–1x–2 . . . x–l  → ROM →  xk–1 . . . x4 y3y2y1y0
(ROM address: x3x2x1x0x–1; ROM data: y3y2y1y0)

Directed Rounding: Motivation

We may need result errors to be in a known direction

Example: in computing upper bounds, larger results are acceptable, but results that are smaller than correct values could invalidate the upper bound

This leads to the definition of two directed rounding modes: upward-directed rounding (round toward +∞) and downward-directed rounding (round toward –∞), both required features of the IEEE floating-point standard.

Directed Rounding: Visualization

[Plots of up(x) and chop(x) = down(x) for –4 ≤ x ≤ 4.]

Fig. 17.12 Upward-directed rounding or rounding toward +∞.

Fig. 17.6 Truncation or chopping of a 2’s-complement number (same as downward-directed rounding).

17.6 Logarithmic Number Systems

Sign-and-logarithm number system: the limiting case of FLP representation

x = ±b^e × 1,    e = log_b |x|

We usually call b the logarithm base, not the exponent base.

Using an integer-valued e wouldn't be very useful, so we consider e to be a fixed-point number.

Format: a sign ±, followed by a fixed-point exponent e with an implied radix point.

Fig. 17.13 Logarithmic number representation with sign and fixed-point exponent.

Properties of Logarithmic Representation

The logarithm is often represented as a 2’s-complement number:

(Sx, Lx) = (sign(x), log2 |x|)

Simple multiplication and division; harder addition and subtraction:

L(xy) = Lx + Ly        L(x/y) = Lx – Ly

Example: 12-bit, base-2, logarithmic number system:

1 | 10110.001011    (sign bit, then a fixed-point exponent with an implied radix point)

The bit string above represents –2^(–9.828125) ≈ –(0.0011)ten

Number range ≈ (–2^16, 2^16); min = 2^–16

Advantages of Logarithmic Representation

16 14 12 10 8 6 4 2 0 2 4 6 8 10 12 14 16 Number format Unsigned integers

Signed-magnitude 

3 + 1 fixed-point, xxx.x

Signed fraction, .xxx 

2’s-compl. fraction, x.xxx

2 + 2 floating-point, s  2 e e s e in [2, 1], s in [0, 3]

2 + 2 logarithmic (log = xx.xx) log x Fig. 1.2 Some of the possible ways of assigning 16 distinct codes to represent numbers.

18 Floating-Point Operations

Chapter Goals

See how adders, multipliers, and dividers are designed for floating-point operands (square-rooting is postponed to Chapter 21)

Chapter Highlights

Floating-point operation = preprocessing + exponent and significand arithmetic + postprocessing (+ exception handling)
Adders need preshift, postshift, rounding
Multipliers and dividers are easy to design

Floating-Point Operations: Topics

Topics in This Chapter

18.1 Floating-Point Adders / Subtractors

18.2 Pre- and Postshifting

18.3 Rounding and Exceptions

18.4 Floating-Point Multipliers and Dividers

18.5 Fused-Multiply-Add Units

18.6 Logarithmic Arithmetic Units

18.1 Floating-Point Adders/Subtractors

Floating-Point Addition Algorithm

Assume e1 ≥ e2; an alignment shift (preshift) is needed if e1 > e2:

(±s1 × b^e1) + (±s2 × b^e2) = (±s1 × b^e1) + (±s2 / b^(e1–e2)) × b^e1
                            = (±s1 ± s2 / b^(e1–e2)) × b^e1 = ±s × b^e

Example: Numbers to be added:
  x = 2^5 × 1.00101101
  y = 2^1 × 1.11101101      (operand with smaller exponent, to be preshifted)
Operands after alignment shift:
  x = 2^5 × 1.00101101
  y = 2^5 × 0.000111101101  (extra bits, to be rounded off)
Result of addition:
  s = 2^5 × 1.010010111101
  s = 2^5 × 1.01001100      (rounded sum)

Like signs: possible 1-position normalizing right shift.
Different signs: left shift, possibly by many positions.
Overflow/underflow can occur during addition or normalization.

FLP Addition Hardware

Fig. 18.1 Block diagram of a floating-point adder/subtractor: unpack the operands x and y (isolate the sign, exponent, and significand; reinstate the hidden 1; convert to internal format; identify special operands and exceptions); subtract the exponents and swap the operands if needed; align the significands (preshift); selectively complement and add; normalize; round and selectively complement; add and normalize once more; then pack (combine sign, exponent, and significand; hide/remove the leading 1; identify special outcomes and exceptions) to form the sum/difference s.

Other key parts of the adder:
  Significand aligner (preshifter): Sec. 18.2
  Result normalizer (postshifter), including the leading 0s detector/predictor: Sec. 18.2
  Rounding unit: Sec. 18.3
  Sign logic: Problem 18.2

Converting from the internal to the external representation, if required, must be done at the rounding stage.

Types of Post-Normalization

(±s1 × b^e1) + (±s2 × b^e2) = (±s1 ± s2 / b^(e1–e2)) × b^e1 = ±s × b^e

The magnitude of the result significand is in [0, 4):
  In [0, 1): arbitrary left shift needed
  In [1, 2): no shift
  In [2, 4): 1-bit right shift

18.2 Pre- and Postshifting

Fig. 18.2 One bit-slice of a single-stage preshifter: a 32-to-1 mux with enable selects one of the bits xi through xi+31, according to the 5-bit shift amount, as the output yi.

Fig. 18.3 Four-stage combinational shifter for preshifting an operand by 0 to 15 bits: each stage shifts by 0 or by a fixed power of 2, controlled by one bit of the 4-bit shift amount (LSB through MSB).

Leading Zeros/Ones Detection or Prediction

Leading zeros/ones counting: count the leading 0s/1s at the significand adder output, then use the count as the post-shift amount and to adjust the exponent.

Leading zeros/ones prediction: with adder inputs (0x0.x–1x–2 . . .)2’s-compl and (0y0.y–1y–2 . . .)2’s-compl, predict the shift amount directly from the operands, so prediction proceeds in parallel with the addition itself. Ways in which leading 0s/1s are generated (p = propagate, g = generate, a = annihilate):

  p p . . . p p g a a . . . a a g . . .
  p p . . . p p g a a . . . a a p . . .
  p p . . . p p a g g . . . g g a . . .
  p p . . . p p a g g . . . g g p . . .

Prediction might be done in two stages: a coarse estimate, used for a coarse shift, followed by fine tuning of the estimate, used for a fine shift. In this way, prediction can be partially overlapped with shifting.

Fig. 18.4 Leading zeros/ones counting versus prediction.

18.3 Rounding and Exceptions

Adder result = (cout z1z0 . z–1z–2 . . . z–l G R S)2’s-compl
  G: guard bit
  R: round bit
  S: sticky bit (the OR of all bits shifted past R)

Why only 3 extra bits? Consider the amount of alignment right-shift:
  One bit: G holds the bit that is shifted out; no precision is lost.
  Two bits or more: the shifted significand has a magnitude in [0, 1/2) and the unshifted significand has a magnitude in [1, 2), so the difference of the aligned significands has a magnitude in (1/2, 2) and the normalization left-shift will be by at most one bit.

If a normalization left-shift actually takes place:
  R = 0: round down; the discarded part is < ulp/2
  R = 1: round up; the discarded part is ≥ ulp/2
The only remaining question is establishing whether the discarded part is exactly ulp/2 (for round to nearest even); S provides this information.
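A bit-level sketch of extracting G, R, and S during an alignment right-shift (names and interface are ours):

```python
def align_with_grs(sig, shift):
    """Right-shift a significand (an integer) and return the shifted
    value plus the guard, round, and sticky bits."""
    if shift == 0:
        return sig, 0, 0, 0
    shifted = sig >> shift
    dropped = sig & ((1 << shift) - 1)     # the bits shifted out
    g = (dropped >> (shift - 1)) & 1       # guard: first bit shifted out
    r = (dropped >> (shift - 2)) & 1 if shift >= 2 else 0
    mask = (1 << max(shift - 2, 0)) - 1
    s = int((dropped & mask) != 0)         # sticky: OR of the remaining bits
    return shifted, g, r, s

# y = 1.11101101 preshifted by 4 positions (from the addition example)
print(align_with_grs(0b111101101, 4))      # (30, 1, 1, 1); 30 = 0b11110
```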

Floating-Point Adder with Dual Data Paths

Amount of alignment right-shift:
  One bit (near path): an arbitrary left shift may be needed due to cancellation
  Two bits or more (far path): the normalization left-shift will be by at most one bit

Fig. 18.5 Conceptual view of significand handling in a dual-path floating-point adder: the near path applies a 0- or 1-bit preshift, adds, and allows an arbitrary postshift; the far path applies an arbitrary preshift (2 or more bits), adds, and needs only a 0- or 1-bit postshift.

Implementation of Rounding for Addition

The effect of 1-bit normalization shifts on the rightmost few bits of the significand adder output is as follows:

Before postshifting (z) . . . z–l+1 z–l | GRS 1-bit normalizing right-shift . . . z–l+2 z–l+1 | z–l GRS 1-bit normalizing left-shift . . . z–l G | RS0 After normalization (Z) . . . Z–l+1 Z–l | Z–l–1 Z–l–2 Z–l–3

Note that no rounding is needed in case of multibit left-shift, because full precision is preserved in this case

Round to nearest even:

Do nothing if Z–l–1 = 0 or Z–l = Z–l–2 = Z–l–3 = 0 Add ulp = 2–l otherwise

Exceptions in Floating-Point Addition

Overflow/underflow is detected by the exponent adjustment block in Fig. 18.1.

Overflow can occur only for a normalizing right-shift; underflow is possible only with normalizing left shifts.

Exceptions involving NaNs and invalid operations are handled by the unpacking and packing blocks in Fig. 18.1.

Zero detection: a special case of leading 0s detection.

Determining when the “inexact” exception must be signaled is left as an exercise.

18.4 Floating-Point Multipliers and Dividers

(±s1 × b^e1) × (±s2 × b^e2) = ±(s1 × s2) × b^(e1+e2)

s1 × s2 ∈ [1, 4): may need postshifting. Overflow or underflow can occur during multiplication or normalization.

Speed considerations:
  Many multipliers produce the lower half of the product (rounding info) early
  The need for a normalizing right-shift is known at or near the end
  Hence, rounding can be integrated in the generation of the upper half, by producing two versions of these bits

Fig. 18.6 Block diagram of a floating-point multiplier (divider): unpack; XOR the signs; add the exponents; multiply the significands; normalize and round; adjust the exponent; normalize again; pack the product.

Floating-Point Dividers

(±s1 × b^e1) / (±s2 × b^e2) = ±(s1 / s2) × b^(e1–e2)

s1 / s2 ∈ (0.5, 2): may need postshifting. Overflow or underflow can occur during division or normalization.

Note: square-rooting never leads to overflow or underflow.

Rounding considerations: the quotient must be produced with two extra bits (G and R), in case of the need for a normalizing left shift; the remainder acts as the sticky bit.

Fig. 18.6 Block diagram of a floating-point multiplier (divider), here with the exponents subtracted and the significands divided to form the quotient.

18.5 Fused-Multiply-Add Units

Multiply-add operation: p = ax + b, the most useful operation beyond the five basic ones.

Application 1: Polynomial evaluation f(z) = c(n–1)z^(n–1) + c(n–2)z^(n–2) + . . . + c(1)z + c(0):
  s := sz + c(j) for j from n – 1 downto 0; initialize s to 0

Application 2: Dot-product computation u . v = u(0)v(0) + u(1)v(1) + . . . + u(n–1)v(n–1):
  s := s + u(j)v(j) for j from 0 upto n – 1; initialize s to 0

Straightforward implementation: Use a multiplier that keeps its entire double-width product, followed by a double-width adder
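Both applications map to one fused multiply-add per step. Python 3.13 exposes a correctly rounded FMA as math.fma; on earlier versions, replace it with s*z + c, which rounds twice (the function names below are ours):

```python
import math   # math.fma requires Python 3.13+

def poly_eval(c, z):
    """Horner evaluation of c[n-1]*z**(n-1) + ... + c[1]*z + c[0],
    one multiply-add (s := s*z + c[j]) per coefficient."""
    s = 0.0
    for j in range(len(c) - 1, -1, -1):   # j from n-1 downto 0
        s = math.fma(s, z, c[j])
    return s

def dot(u, v):
    """Dot product with one multiply-add (s := s + u[j]*v[j]) per term."""
    s = 0.0
    for uj, vj in zip(u, v):
        s = math.fma(uj, vj, s)
    return s

print(poly_eval([1.0, -3.0, 2.0], 0.5))   # 2(0.25) - 3(0.5) + 1 = 0.0
print(dot([1.0, 2.0], [3.0, 4.0]))        # 11.0
```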

Design of a Fast FMA Unit

Multiply-add operation: p = ax + b. The unit can act as a simple adder (x = 1) or multiplier (b = 0).

Fig. 18.7 Block diagram of a fast FMA unit: multiples formation for the significands sa and sx feeds a carry-save adder tree, while sb is alignment-preshifted by ea + ex – eb positions (Optimization 2); the preshift may be to the right or to the left (Optimization 1); the product, kept in stored-carry form, and the aligned sb pass through a carry-save adder and a final adder, with leading 0s/1s prediction driving normalization (Optimization 3); the normalized result then goes to rounding.

18.6 Logarithmic Arithmetic Unit

Multiply/divide algorithm in LNS:
  log(xy) = log x + log y
  log(x/y) = log x – log y

Add/subtract algorithm in LNS: (Sx, Lx) ± (Sy, Ly) = (Sz, Lz)

Assume x > y > 0 (other cases are similar):
  Lz = log z = log(x ± y) = log(x(1 ± y/x)) = log x + log(1 ± y/x)

Given Δ = –(log x – log y), the term log(1 ± y/x) = log(1 ± log^–1 Δ) is obtained from a table (two tables, φ+ and φ–, are needed):
  log(x + y) = log x + φ+(Δ)
  log(x – y) = log x + φ–(Δ)
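A quick numeric check of the φ+ identity, computing the function directly instead of reading it from a table (a sketch; the names are ours):

```python
import math

def phi_plus(delta):             # delta = -(log2 x - log2 y) <= 0
    return math.log2(1 + 2.0**delta)

def lns_add(lx, ly):
    """log2(x + y) from lx = log2 x and ly = log2 y, assuming x > y > 0."""
    return lx + phi_plus(ly - lx)

x, y = 48.0, 16.0
print(2 ** lns_add(math.log2(x), math.log2(y)))   # ~64.0
```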

Four-Function Logarithmic Arithmetic Unit

log(xy) = log x + log y         log(x + y) = log x + φ+(Δ)
log(x/y) = log x – log y        log(x – y) = log x + φ–(Δ)

Fig. 18.8 Arithmetic unit for a logarithmic number system: a comparator (Lx > Ly?) and a first add/subtract unit form Δ from Lx and Ly; Δ addresses a ROM holding the φ+ and φ– tables; a second add/subtract unit, steered by the op control and the signs Sx, Sy, Sz, then combines Lx with either Ly (for multiply/divide) or the table output (for add/subtract). Lm, the log of a scale factor m, allows values in [0, 1] to be represented as unsigned logs.

LNS Arithmetic for Wider Words

log(x + y) = log x + φ+(Δ)    φ+ is well-behaved; easy to interpolate
log(x – y) = log x + φ–(Δ)    φ– causes difficulties in [–1, 0]

Use nonuniform segmentation for direct table lookup or for a scheme based on linear interpolation:

  10xxx.xxxxxxx
  110xx.xxxxxxx
  1110x.xxxxxxx
  11110.xxxxxxx
  11111.0xxxxxx
  . . .

19 Errors and Error Control

Chapter Goals

Learn about sources of computation errors, consequences of inexact arithmetic, and methods for avoiding or limiting errors

Chapter Highlights

Representation and computation errors
Absolute versus relative error
Worst-case versus average error
Why 3 × (1/3) does not necessarily yield 1
Error analysis and bounding

Errors and Error Control: Topics

Topics in This Chapter

19.1 Sources of Computational Errors

19.2 Invalidated Laws of Algebra

19.3 Worst-Case Error Accumulation

19.4 Error Distribution and Expected Errors

19.5 Forward Error Analysis

19.6 Backward Error Analysis

19.1 Sources of Computational Errors

FLP approximates exact computation with real numbers

Two sources of errors to understand and counteract:

Representation errors: e.g., no machine representation exists for 1/3, √2, or π

Arithmetic errors: e.g., (1 + 2^–12)^2 = 1 + 2^–11 + 2^–24 is not representable in the IEEE 754 short format

We saw early in the course that errors due to finite precision can lead to disasters in life-critical applications

Example Showing Representation and Arithmetic Errors

Example 19.1: Compute 1/99 – 1/100, using a decimal floating-point format with a 4-digit significand in [1, 10) and a single-digit signed exponent.

Precise result = 1/9900 ≈ 1.010 × 10^–4 (chopped to 3 decimals; error ≈ 10^–8 or 0.01%)

x = 1/99 ≈ 1.010 × 10^–2      Error ≈ 10^–6 or 0.01%

y = 1/100 = 1.000 × 10^–2     Error = 0

z = x –fp y = 1.010 × 10^–2 – 1.000 × 10^–2 = 1.000 × 10^–4

Error ≈ 10^–6 or 1%
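Example 19.1 can be replayed with Python's decimal module set to a 4-digit significand (a sketch; the single-digit exponent restriction is not modeled, and decimal rounds rather than chops):

```python
from decimal import Decimal, getcontext

getcontext().prec = 4               # 4 significant decimal digits
x = Decimal(1) / Decimal(99)        # 1.010E-2, representation error ~0.01%
y = Decimal(1) / Decimal(100)       # 1.000E-2, exact
z = x - y
print(x, y, z)                      # 0.01010  0.01  0.00010
exact = Decimal(1) / Decimal(9900)  # ~1.010E-4
print((z - exact) / exact)          # relative error of about -1%
```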

Notation for a General Floating-Point System

Number representation in FLP(r, p, A):
  Radix r (assumed to be the same as the exponent base b)
  Precision p in terms of radix-r digits
  Approximation scheme A ∈ {chop, round, rtne, chop(g), . . .}

Let x = r^e · s be an unsigned real number, normalized such that 1/r ≤ s < 1, and assume xfp is the representation of x in FLP(r, p, A):

xfp = r^e · sfp = (1 + η)x,  where η is the relative representation error

A = chop:   –ulp < sfp – s ≤ 0,        so  –r × ulp < η ≤ 0
A = round:  –ulp/2 < sfp – s ≤ ulp/2,  so  |η| ≤ r × ulp/2

Arithmetic in FLP(r, p, A): obtain an infinite-precision result, then chop, round, . . . Real machines approximate this process by keeping g > 0 guard digits, thus doing arithmetic in FLP(r, p, chop(g)).

Error Analysis for Multiplication and Division

Errors in floating-point multiplication

Consider the positive operands xfp = (1 + σ)x and yfp = (1 + τ)y, with η the error of the operation itself:

xfp ×fp yfp = (1 + η) xfp yfp
            = (1 + η)(1 + σ)(1 + τ) xy
            = (1 + η + σ + τ + ησ + ητ + στ + ηστ) xy
            ≈ (1 + η + σ + τ) xy

Errors in floating-point division

Again, consider positive operands xfp and yfp:

xfp /fp yfp = (1 + η) xfp / yfp
            = (1 + η)(1 + σ)x / [(1 + τ)y]
            = (1 + η)(1 + σ)(1 – τ)(1 + τ^2)(1 + τ^4)( . . . )(x/y)
            ≈ (1 + η + σ – τ)(x/y)

Error Analysis for Addition and Subtraction

Errors in floating-point addition

Consider the positive operands xfp and yfp:

xfp +fp yfp = (1 + η)(xfp + yfp)
            = (1 + η)(x + σx + y + τy)
            = (1 + η)(1 + (σx + τy)/(x + y))(x + y)

The magnitude of (σx + τy)/(x + y) is upper-bounded by max(|σ|, |τ|), so the overall error is no more than |η| + max(|σ|, |τ|).

Errors in floating-point subtraction

Again, consider positive operands xfp and yfp:

xfp –fp yfp = (1 + η)(xfp – yfp)
            = (1 + η)(x + σx – y – τy)
            = (1 + η)(1 + (σx – τy)/(x – y))(x – y)

The term (σx – τy)/(x – y) is unbounded for subtraction: its magnitude can be very large if x and y are both large but x – y is relatively small (recall that σ and τ can be negative).

Cancellation Error in Subtraction

x y x  y =(1+)(1 + )(x  y) Subtraction result fp fp fp x  y

Example 19.2: Decimal FLP system, r = 10, p = 6, no guard digit x = 0.100 000 000  103 y = –0.999 999 456  102 3 2 xfp = .100 000  10 yfp = – .999 999  10

–4 –3 x + y = 0.544  10 and xfp + yfp =0.1 10 3 3 2 xfp +fp yfp = 0.100 000  10 fp 0.099 999  10 = 0.100 000  10 Relative error = (10–3 – 0.544  10–4) / (0.544  10–4)  17.38 = 1738% Now, ignore representation errors, so as to focus on the effect of 

(measure relative error with respect to xfp + yfp, not x + y) Relative error = (10–3 –10–4)/10–4 = 9 = 900%

Bringing Cancellation Errors in Check

x y x  y =(1+)(1 + )(x  y) Subtraction result fp fp fp x  y

Example 19.2 (cont.): Decimal FLP system, r = 10, p = 6, 1 guard digit x = 0.100 000 000  103 y = –0.999 999 456  102 3 2 xfp = .100 000  10 yfp = – .999 999  10

–4 –3 x + y = 0.544  10 and xfp + yfp =0.1 10 3 3 3 xfp +fp yfp = 0.100 000  10 fp 0.099 999 9  10 = 0.100 000  10 Relative error = (10–4 – 0.544  10–4) / (0.544  10–4)  0.838 = 83.8% Now, ignore representation errors, so as to focus on the effect of 

(measure relative error with respect to xfp + yfp, not x + y) Relative error = 0 Significantly better than 900%!

How Many Guard Digits Do We Need?

x y x  y =(1+)(1 + )(x  y) Subtraction result fp fp fp x  y

Theorem 19.1: In the floating-point system FLP(r, p, chop(g)) with g  1 and –x < y < 0 < x, we have: –p+1 –p–g+2 xfp +fp yfp = (1 + )(xfp + yfp) with –r <  < r

Corollary: In FLP(r, p, chop(1)) –p+1 xfp +fp yfp =(1+)(xfp + yfp)with <–r

So, a single guard digit is sufficient to make the relative arithmetic error in floating-point addition or subtraction comparable to relative representation error with truncation

19.2 Invalidated Laws of Algebra

Many laws of algebra do not hold for floating-point arithmetic (some don't even hold approximately). This can be a source of confusion and incompatibility.

Associative law of addition: a + (b + c) = (a + b) + c

a = 0.123 41 × 10^5    b = –0.123 40 × 10^5    c = 0.143 21 × 10^1

a +fp (b +fp c)
  = 0.123 41 × 10^5 +fp (–0.123 40 × 10^5 +fp 0.143 21 × 10^1)
  = 0.123 41 × 10^5 –fp 0.123 39 × 10^5
  = 0.200 00 × 10^1

(a +fp b) +fp c
  = (0.123 41 × 10^5 –fp 0.123 40 × 10^5) +fp 0.143 21 × 10^1
  = 0.100 00 × 10^1 +fp 0.143 21 × 10^1
  = 0.243 21 × 10^1

The results differ by more than 20%!
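The decimal module with a 5-digit significand reproduces these numbers exactly (a sketch; decimal's default rounding is round-to-nearest-even):

```python
from decimal import Decimal, getcontext

getcontext().prec = 5           # 5-digit significand
a = Decimal("12341")            #  0.12341e5
b = Decimal("-12340")           # -0.12340e5
c = Decimal("1.4321")           #  0.14321e1

print(a + (b + c))   # 2       (b + c rounds to -12339)
print((a + b) + c)   # 2.4321  (differs by more than 20%)
```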

Elaboration on the Non-Associativity of Addition

Associative law of addition: a + (b + c) = (a + b) + c, with a = 0.123 41 × 10^5, b = –0.123 40 × 10^5, c = 0.143 21 × 10^1. On the number line of Fig. 17.2, a and b are large values of nearly equal magnitude and opposite signs, while c is small.

When we first compute s1 = b + c, the small value of c barely makes a dent, yielding a value for a + s1 that is not much affected by c.

When we first compute s2 = a + b, the result will be nearly 0, making the effect of c on the final sum s2 + c more pronounced.

Do Guard Digits Help with Laws of Algebra?

Invalidated laws of algebra are intrinsic to FLP arithmetic; problems are reduced, but don't disappear, with guard digits. Let's redo our example with 2 guard digits.

Associative law of addition: a + (b + c) = (a + b) + c

a = 0.123 41 × 10^5    b = –0.123 40 × 10^5    c = 0.143 21 × 10^1

a +fp (b +fp c)
  = 0.123 41 × 10^5 +fp (–0.123 40 × 10^5 +fp 0.143 21 × 10^1)
  = 0.123 41 × 10^5 –fp 0.123 385 7 × 10^5
  = 0.243 00 × 10^1

(a +fp b) +fp c
  = (0.123 41 × 10^5 –fp 0.123 40 × 10^5) +fp 0.143 21 × 10^1
  = 0.100 00 × 10^1 +fp 0.143 21 × 10^1
  = 0.243 21 × 10^1

A difference of about 0.1% is better, but still too high!

Unnormalized Floating-Point Arithmetic

One way to reduce problems resulting from invalidated laws of algebra is to avoid normalizing computed floating-point results. Let's redo our example with unnormalized arithmetic.

Associative law of addition: a + (b + c) = (a + b) + c

a = 0.123 41 × 10^5    b = –0.123 40 × 10^5    c = 0.143 21 × 10^1

a +fp (b +fp c)
  = 0.123 41 × 10^5 +fp (–0.123 40 × 10^5 +fp 0.143 21 × 10^1)
  = 0.123 41 × 10^5 –fp 0.123 39 × 10^5
  = 0.000 02 × 10^5

(a +fp b) +fp c
  = (0.123 41 × 10^5 –fp 0.123 40 × 10^5) +fp 0.143 21 × 10^1
  = 0.000 01 × 10^5 +fp 0.143 21 × 10^1
  = 0.000 02 × 10^5

The results are the same, and they also carry a kind of warning about their low precision.

Other Invalidated Laws of Algebra with FLP Arithmetic

Associative law of multiplication:     a × (b × c) = (a × b) × c

Cancellation law (for a > 0):          a × b = a × c implies b = c

Distributive law:                      a × (b + c) = (a × b) + (a × c)

Multiplication canceling division:     a × (b / a) = b

Before the IEEE 754 floating-point standard became available and widely adopted, these problems were exacerbated by the use of many incompatible formats

Effects of Algorithms on Result Precision

Example 19.3: The formula x = –b ± d, with d = (b^2 – c)^(1/2), yielding the roots of the quadratic equation x^2 + 2bx + c = 0, can be rewritten as x = –c / (b ± d). When c is small compared with b^2, the root –b + d will have a large error due to cancellation; in such a case, use –c / (b + d) for that root. Confirmation that –b + d = –c / (b + d): –c = d^2 – b^2.

Example 19.4: The area of a triangle with sides a, b, and c (assume a ≥ b ≥ c) is given by the formula A = [s(s – a)(s – b)(s – c)]^(1/2), where s = (a + b + c)/2. When the triangle is very flat (needlelike), such that a ≈ b + c, Kahan's version returns accurate results:

A = ¼ [(a + (b + c))(c – (a – b))(c + (a – b))(a + (b – c))]^(1/2)
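A small demonstration of Example 19.3's rewriting, with values chosen by us to force cancellation:

```python
import math

b, c = 1e8, 1.0                 # roots of x**2 + 2bx + c = 0, with c << b**2
d = math.sqrt(b*b - c)
root_naive = -b + d             # catastrophic cancellation
root_stable = -c / (b + d)      # algebraically equal, numerically safe
print(root_naive)               # 0.0: every significant digit is lost
print(root_stable)              # -5e-09, essentially the true root
```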

19.3 Worst-Case Error Accumulation

In a sequence of operations, round-off errors might add up. The larger the number of cascaded computation steps (that depend on results from previous steps), the greater the chance for, and the magnitude of, accumulated errors. With rounding, errors of opposite signs tend to cancel each other out in the long run, but one cannot count on such cancellations.

Practical implications:
  Perform intermediate computations with a higher precision than what is required in the final result
  Implement multiply-accumulate in hardware (DSP chips)
  Reduce the number of cascaded arithmetic operations; thus, using computationally more efficient algorithms has the double benefit of reducing the execution time as well as the accumulated errors

Example: Inner-Product Calculation

Consider the computation z = Σ x(i) y(i), for i ∈ [0, 1023].

Max error per multiply-add step = ulp/2 + ulp/2 = ulp

Total worst-case absolute error = 1024 ulp (equivalent to losing 10 bits of precision)

A possible cure: keep the double-width products in their entirety and add them to compute a double-width result, which is rounded to single-width at the very last step.
  Multiplications do not introduce any round-off error
  Max error per addition = ulp^2 / 2
  Total worst-case error = 1024 × ulp^2/2 + ulp/2

Therefore, provided that overflow is not a problem, a highly accurate result is obtained.

Kahan’s Summation Algorithm

To compute s = Σ x(i), for i ∈ [0, n – 1], more accurately:

s ← x(0)
c ← 0                   {c is a correction term}
for i = 1 to n – 1 do
    y ← x(i) – c        {subtract correction term}
    z ← s + y
    c ← (z – s) – y     {find next correction term}
    s ← z
endfor
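The pseudocode translates directly into Python (the parenthesization is the whole point; an optimizer must not be allowed to simplify (z – s) – y to 0):

```python
def kahan_sum(xs):
    """Compensated summation: c captures the low-order bits that each
    addition to the running sum s would otherwise lose."""
    s, c = 0.0, 0.0
    for x in xs:
        y = x - c            # subtract correction term
        z = s + y
        c = (z - s) - y      # find next correction term
        s = z
    return s

data = [1.0] + [1e-16] * 4
naive = 0.0
for x in data:
    naive += x               # each tiny term is rounded away
print(naive)                 # 1.0
print(kahan_sum(data))       # 1.0000000000000004, the correctly rounded sum
```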

19.4 Error Distribution and Expected Errors

The probability density function for the distribution of radix-r floating-point significands is 1/(x ln r).

Fig. 19.1 Probability density function 1/(x ln 2) for the distribution of normalized significands in FLP(r = 2, p, A), plotted over the significand range [1/2, 1].

Maximum Relative Representation Error

MRRE = maximum relative representation error

MRRE(FLP(r, p, chop)) = r^(–p+1)
MRRE(FLP(r, p, round)) = r^(–p+1) / 2

From a practical standpoint, the distribution of errors and their expected values may be more important.

Limiting ourselves to positive significands, we define the average relative representation error:

ARRE(FLP(r, p, A)) = ∫ from 1/r to 1 of (|xfp – x| / x) × dx / (x ln r)

where 1/(x ln r) is a probability density function.

19.5 Forward Error Analysis

Consider the computation y = ax + b and its floating-point version:

yfp = (afp ×fp xfp) +fp bfp = (1 + η)y

Can we establish any useful bound on the magnitude of the relative error η, given the relative errors in the input operands afp, bfp, xfp?

The answer is “no.”

Forward error analysis = finding out how far yfp can be from ax + b, or at least from afp xfp + bfp, in the worst case.

Some Error Analysis Methods

Automatic error analysis: run selected test cases with higher precision and observe the differences between the new, more precise results and the original ones.

Significance arithmetic: roughly speaking, the same as unnormalized arithmetic, although there are fine distinctions. The result of the unnormalized decimal addition .1234 × 10^5 +fp .0000 × 10^10 = .0000 × 10^10 warns us about precision loss.

Noisy-mode computation: random digits, rather than 0s, are inserted during normalizing left shifts. If several runs of the computation in noisy mode yield comparable results, then we are probably safe.

Interval arithmetic: an interval [xlo, xhi] represents x, with xlo ≤ x ≤ xhi. With xlo, xhi, ylo, yhi > 0, to find z = x / y we compute [zlo, zhi] = [xlo /fp yhi, xhi /fp ylo], the first division rounded downward and the second rounded upward. Drawback: intervals tend to widen after many computation steps.

19.6 Backward Error Analysis

Backward error analysis replaces the original question

How much does yfp = afp fp xfp + bfp deviate from y?

with another question:

What input changes produce the same deviation?

In other words, if the exact identity yfp = aalt xalt + balt holds for alternate parameter values aalt, balt, and xalt, we ask how far aalt, balt, xalt can be from afp, bfp, xfp

Thus, computation errors are converted or compared to additional input errors

Example of Backward Error Analysis

yfp = afp ×fp xfp +fp bfp
    = (1 + η)[afp ×fp xfp + bfp]            with |η| < r^(–p+1) = r × ulp
    = (1 + η)[(1 + ν) afp xfp + bfp]        with |ν| < r^(–p+1) = r × ulp
    = (1 + η)(1 + ν) afp xfp + (1 + η) bfp
    = (1 + η)(1 + ν)(1 + σ)a (1 + δ)x + (1 + η)(1 + τ)b
    ≈ (1 + σ + η)a (1 + δ + ν)x + (1 + τ + η)b

So the approximate solution of the original problem is the exact solution of a problem close to the original one.

The analysis assures us that the effect of arithmetic errors on the result yfp is no more severe than that of r × ulp of additional error in each of the inputs a, b, and x.

20 Precise and Certifiable Arithmetic

Chapter Goals

Discuss methods for doing arithmetic when results of high accuracy or guaranteed correctness are required

Chapter Highlights

More precise computation through multi- or variable-precision arithmetic
Result certification by means of exact or error-bounded arithmetic
Precise / exact arithmetic with low overhead

Precise and Certifiable Arithmetic: Topics

Topics in This Chapter

20.1 High Precision and Certifiability

20.2 Exact Arithmetic

20.3 Multiprecision Arithmetic

20.4 Variable-Precision Arithmetic

20.5 Error-Bounding via Interval Arithmetic

20.6 Adaptive and Lazy Arithmetic

20.1 High Precision and Certifiability

There are two aspects of precision to discuss:
  Results possessing adequate precision
  Being able to provide assurance of the same

We consider 3 distinct approaches for coping with precision issues:
1. Obtaining completely trustworthy results via exact arithmetic
2. Making the arithmetic highly precise to raise our confidence in the validity of the results: multi- or variable-precision arithmetic
3. Doing ordinary or high-precision calculations, while tracking potential error accumulation (can lead to fail-safe operation)

We take the hardware to be completely trustworthy; hardware reliability issues are dealt with in Chapter 27.

20.2 Exact Arithmetic

Continued fractions

Any unsigned rational number x = p/q has a unique continued-fraction expansion

x = p/q = a0 + 1/(a1 + 1/(a2 + . . . + 1/(am–1 + 1/am)))

with a0 ≥ 0, am ≥ 2, and ai ≥ 1 for 1 ≤ i ≤ m – 1.

Example: continued-fraction representation of 277/642:

277/642 = 0 + 1/(2 + 1/(3 + 1/(6 + 1/(1 + 1/(3 + 1/3))))) = [0/2/3/6/1/3/3]

One can get approximations for finite representation by limiting the number of “digits” in the continued-fraction representation: [0/2] = 1/2, [0/2/3] = 3/7, [0/2/3/6] = 19/44, . . .
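Computing the expansion is just the Euclidean algorithm; a short sketch (helper names are ours):

```python
from fractions import Fraction

def cf_digits(p, q):
    """Continued-fraction 'digits' of p/q via the Euclidean algorithm."""
    digits = []
    while q:
        digits.append(p // q)
        p, q = q, p % q
    return digits

def cf_value(digits):
    """Evaluate a (possibly truncated) continued fraction as a rational."""
    x = Fraction(digits[-1])
    for a in reversed(digits[:-1]):
        x = a + 1 / x
    return x

print(cf_digits(277, 642))     # [0, 2, 3, 6, 1, 3, 3]
print(cf_value([0, 2, 3, 6]))  # Fraction(19, 44), approximating 277/642
```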

May 2015 Computer Arithmetic, Real Arithmetic Slide 78 Fixed-Slash Number Systems k bits m bits ± p q Represents p/q Sign / Implied Fig. 20.1 Example fixed-slash Inexact slash number representation format. position

Rational number if p > 0 q > 0 “rounded” to nearest value 0if p = 0 q odd  if p odd q = 0 NaN (not a number) otherwise

Waste due to multiple representations such as 3/5 = 6/10 = 9/15 = . . . is no more than one bit, because: 2 2 limn {p/q  1  p,q  n, gcd(p, q) = 1}/n = 6/ = 0.608

Floating-Slash Number Systems

Format: a sign ±, an h-bit field m giving the (floating) slash position, a (k – m)-bit numerator p, an m-bit denominator q, and an “inexact” flag; the number represented is p/q.

Fig. 20.2 Example floating-slash representation format.

Set of numbers represented:

{±p/q : p, q ≥ 1, gcd(p, q) = 1, ⌊log2 p⌋ + ⌊log2 q⌋ ≤ k – 2}

Again, the following mathematical result, due to Dirichlet, shows that the space waste is no more than one bit:

lim (n→∞) |{p/q : pq ≤ n, gcd(p, q) = 1}| / |{p/q : pq ≤ n, p, q ≥ 1}| = 6/π^2 ≈ 0.608

20.3 Multiprecision Arithmetic

Fig. 20.3 Example quadruple-precision integer format: a sign ± followed by four words, x(3) at the MSB end through x(0) at the LSB end.

Fig. 20.4 Example quadruple-precision floating-point format: a sign ±, an exponent e, and a significand spread over four words, x(3) at the MSB end through x(0) at the LSB end.

Multiprecision Floating-Point Addition

The significand of x, ±x(3) x(2) x(1) x(0), is sign-extended at the left; the significand of y, ±y(3) y(2) y(1) y(0), is aligned against it, and the words shifted past the right end can be used to derive the guard, round, and sticky bits of the result z(3) z(2) z(1) z(0) G R S.

Fig. 20.5 Quadruple-precision significands aligned for the floating-point addition z = x +fp y.

Quad-Precision Arithmetic Using Two Doubles

x = xH + xL, for example:

xH = 1.011100 . . . 101 × 2^20
xL = 1.110101 . . . 110 × 2^–33

x  = 1.011100 . . . 101 1110101 . . . 110 × 2^20

The following website provides links to downloadable software packages for double-double and quad-double arithmetic

http://crd.lbl.gov/~dhbailey/mpdist/
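The error-term trick, known as Fast2Sum and valid when |x| ≥ |y|, looks like this in Python:

```python
def fast_two_sum(x, y):
    """Return (s, e) with s = fl(x + y) and x + y = s + e exactly,
    assuming |x| >= |y| (Dekker's Fast2Sum)."""
    s = x + y
    e = y - (s - x)      # the rounding error of the addition, recovered
    return s, e

s, e = fast_two_sum(1.0, 2.0**-60)
print(s, e)   # 1.0 8.673617379884035e-19 (= 2**-60; nothing is lost)
```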

20.4 Variable-Precision Arithmetic

Fig. 20.6 Example variable-precision integer format: a sign ±, a width field w (the number of additional words), and words x(1) at the LSB end through x(w) at the MSB end.

Fig. 20.7 Example variable-precision floating-point format: a sign ±, a width field w, flags, an exponent e, and significand words x(1) at the LSB end through x(w) at the MSB end.

Variable-Precision Floating-Point Addition

Operand x consists of words x(u) . . . x(u–h) . . . x(1); operand y consists of words y(v) . . . y(1) and undergoes an alignment shift of h words = hk bits. With g = v + h – u, the figure distinguishes Case 1, g < 0, from Case 2, g ≥ 0, in which only words y(v) . . . y(g+1) line up against x's words.

Fig. 20.8 Variable-precision floating-point addition.

20.5 Error-Bounding via Interval Arithmetic

Interval definition:

[a, b], with a ≤ b, is an interval enclosing x: a ≤ x ≤ b (intervals model uncertainty in real-valued parameters)
[a, a] represents the real number x = a
[a, b], with a > b, is the empty interval

Combining and comparing intervals:

[xlo, xhi] ∩ [ylo, yhi] = [max(xlo, ylo), min(xhi, yhi)]

[xlo, xhi] ∪ [ylo, yhi] = [min(xlo, ylo), max(xhi, yhi)]

[xlo, xhi] ⊇ [ylo, yhi]  iff  xlo ≤ ylo and xhi ≥ yhi

[xlo, xhi] = [ylo, yhi]  iff  xlo = ylo and xhi = yhi

[xlo, xhi] < [ylo, yhi]  iff  xhi < ylo

Arithmetic Operations on Intervals

Additive and multiplicative inverses:

–[xlo, xhi] = [–xhi, –xlo]

1 / [xlo, xhi] = [1/xhi, 1/xlo], provided that 0 ∉ [xlo, xhi];
when 0 ∈ [xlo, xhi], the multiplicative inverse is [–∞, +∞]

The four basic arithmetic operations:

[xlo, xhi] + [ylo, yhi] = [xlo + ylo, xhi + yhi]

[xlo, xhi] – [ylo, yhi] = [xlo – yhi, xhi – ylo]

[xlo, xhi] × [ylo, yhi] = [min(xlo ylo, xlo yhi, xhi ylo, xhi yhi),
                           max(xlo ylo, xlo yhi, xhi ylo, xhi yhi)]

[xlo, xhi] / [ylo, yhi] = [xlo, xhi] × [1/yhi, 1/ylo]
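These rules drop straight into a small interval class (a sketch: a production version would also round lower bounds downward and upper bounds upward at every operation):

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o):
        p = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(p), max(p))

    def __truediv__(self, o):
        assert not (o.lo <= 0.0 <= o.hi), "divisor interval contains 0"
        return self * Interval(1.0 / o.hi, 1.0 / o.lo)

x, y = Interval(1.0, 2.0), Interval(3.0, 4.0)
print(x + y)   # Interval(lo=4.0, hi=6.0)
print(x * y)   # Interval(lo=3.0, hi=8.0)
print(x / y)   # Interval(lo=0.25, hi=0.666...)
```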

Getting Narrower Result Intervals

Theorem 20.1: If f(x(1), x(2), . . . , x(n)) is a rational expression in the interval variables x(1), x(2), . . . , x(n), that is, f is a finite combination of x(1), x(2), . . . , x(n) and a finite number of constant intervals by means of interval arithmetic operations, then x(i) ⊆ y(i), i = 1, 2, . . . , n, implies:

f(x(1), x(2), . . . , x(n)) ⊆ f(y(1), y(2), . . . , y(n))

Thus, arbitrarily narrow result intervals can be obtained by simply performing arithmetic with sufficiently high precision.

With reasonable assumptions about machine arithmetic, we have:

Theorem 20.2: Consider the execution of an algorithm on real numbers using machine interval arithmetic with directed rounding in FLP(r, p, ∇). If the same algorithm is executed using the precision q, with q > p, the bounds for both the absolute error and the relative error are reduced by the factor r^(q–p). (The absolute or relative error itself may not be reduced by this factor; the guarantee applies only to the upper bound.)

A Strategy for Accurate Interval Arithmetic

Theorem 20.2: Consider the execution of an algorithm on real numbers using machine interval arithmetic with directed rounding in FLP(r, p, ∇). If the same algorithm is executed using the precision q, with q > p, the bounds for both the absolute error and the relative error are reduced by the factor r^(q–p). (The absolute or relative error itself may not be reduced by this factor; the guarantee applies only to the upper bound.)

Let wmax be the maximum width of a result interval when interval arithmetic is used with p radix-r digits of precision.

If wmax ≤ ε, then we are done. Otherwise, interval calculations with the higher precision

q = p + ⌈ logr wmax – logr ε ⌉

are guaranteed to yield the desired accuracy.

The Interval Newton Method

Point form:     x(i+1) = x(i) – f(x(i)) / f′(x(i))

Interval form:  N(I(i)) = c(i) – f(c(i)) / f′(I(i)),  where c(i) is a point in I(i) (typically its midpoint)

                I(i+1) = I(i) ∩ N(I(i))

Fig. 20.9 Illustration of the interval Newton method for computing 1/d, using f(x) = 1/x – d: the next interval I(1) is the intersection of I(0) with N(I(0)).

Laws of Algebra in Interval Arithmetic

As in FLP arithmetic, laws of algebra may not hold for interval arithmetic

For example, one can readily construct an example where, for intervals x, y, and z, the following two expressions yield different interval results, thus demonstrating the violation of the distributive law:

x(y + z)    versus    xy + xz

Can you find other laws of algebra that may be violated?

20.6 Adaptive and Lazy Arithmetic

Need-based incremental precision adjustment can avoid high-precision calculations dictated by worst-case errors.

Lazy evaluation is a powerful paradigm that has been and is being used in many different contexts. For example, in evaluating composite conditionals such as

    if cond1 and cond2 then action

evaluation of cond2 may be skipped if cond1 yields “false.”

More generally, lazy evaluation means postponing all computations or actions until they become irrelevant or unavoidable.

The opposite of lazy evaluation (speculative or aggressive execution) has been applied extensively.

Lazy Arithmetic with Redundant Representations

Redundant number representations offer some advantages for lazy arithmetic

Because redundant representations support MSD-first arithmetic, it is possible to produce a small number of result digits by using correspondingly less computational effort, until more precision is actually needed
