AdaptivFloat: A Floating-Point Based Data Type for Resilient Deep Learning Inference

Thierry Tambe¹, En-Yu Yang¹, Zishen Wan¹, Yuntian Deng¹, Vijay Janapa Reddi¹, Alexander Rush², David Brooks¹, Gu-Yeon Wei¹

¹Harvard University, Cambridge, MA, USA. ²Cornell Tech, New York, NY, USA. Preprint.

arXiv:1909.13271v3 [cs.LG] 11 Feb 2020

ABSTRACT

Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes as their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point inspired number representation format for deep learning that dynamically maximizes and optimally clips its available dynamic range, at a layer granularity, in order to create faithful encodings of neural network parameters. AdaptivFloat consistently produces higher inference accuracies compared to block floating-point, uniform, IEEE-like float or posit encodings at very low precision (≤ 8-bit) across a diverse set of state-of-the-art neural network topologies. Notably, AdaptivFloat surpasses baseline FP32 performance by up to +0.3 in BLEU score and -0.75 in word error rate at weight bit widths that are ≤ 8-bit. Experimental results on a deep neural network (DNN) hardware accelerator, exploiting AdaptivFloat logic in its computational datapath, demonstrate per-operation energy and area that are 0.9× and 1.14×, respectively, those of equivalent bit width integer-based accelerator variants.

1 INTRODUCTION

Deep learning approaches have transformed representation learning in a multitude of tasks. Recurrent Neural Networks (RNNs) are now the standard solution for speech recognition, exhibiting remarkably low word error rates (Chiu et al., 2017), while neural machine translation has narrowed the performance gap versus human translators (Wu et al., 2016). Convolutional Neural Networks (CNNs) are now the dominant engine behind image processing and have been pushing the frontiers in many computer vision applications (Krizhevsky et al., 2012; He et al., 2016). Today, deep neural networks (DNNs) are deployed at all computing scales, from resource-constrained IoT edge devices to massive data center farms. In order to exact higher compute density and energy efficiency on these compute platforms, a plethora of reduced-precision quantization techniques have been proposed.

[Figure 1. Histogram of the weight distribution for (a) ResNet-50 (max 1.32, min -0.78), (b) Inception-v3 (max 1.27, min -1.20), (c) DenseNet-201 (max 1.33, min -0.92) and (d) Transformer (max 20.41, min -12.46), whose weight values are more than 10× higher than the maximum weight value from popular CNNs.]
In this line of research, a large body of work has focused on fixed-point encodings (Choi et al., 2019; Gupta et al., 2015; Lin et al., 2015; Hwang & Sung, 2014; Courbariaux et al., 2015) or uniform quantization via integer (Migacz, 2017; Jacob et al., 2017). These fixed-point techniques are frequently evaluated on shallow models or on CNNs exhibiting relatively narrow weight distributions. However, as can be seen from Figure 1, sequence transduction models with layer normalization, such as the Transformer (Vaswani et al., 2017), can contain weights more than an order of magnitude larger than those from popular CNN models with batch normalization, such as ResNet-50, Inception-v3 or DenseNet-201. The reason for this phenomenon is that batch normalization effectively produces a weight normalization side effect (Salimans & Kingma, 2016), whereas layer normalization adopts invariance properties that do not reparameterize the network (Ba et al., 2016).

In the pursuit of wider dynamic range and improved numerical accuracy, there has been surging interest in floating-point based (Drumond et al., 2018; Köster et al., 2017), logarithmic (Johnson, 2018; Miyashita et al., 2016) and posit representations (Gustafson & Yonemoto, 2017), which also form the inspiration of this work.

AdaptivFloat improves on the aforementioned techniques by dynamically maximizing its available dynamic range at a neural network layer granularity. And unlike block floating-point (BFP) approaches with shared exponents, which may lead to degraded rendering of smaller magnitude weights, AdaptivFloat achieves higher inference accuracy by remaining committed to the standard floating-point delineation of independent exponent and mantissa bits for each tensor element. However, we break from IEEE 754 standard compliance with a unique clamping strategy for denormal numbers and with a customized proposition for zero assignment, which enables us to engineer leaner hardware.
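For illustration, the following is a minimal NumPy sketch of such a layer-wise encoding: the exponent range is shifted per tensor so the format just covers the layer's largest magnitude, values are clipped into the representable range, and magnitudes below the smallest normalized value snap to zero. The function name, the sign/exponent/mantissa split, and the bias-selection and rounding rules here are our illustrative assumptions; the exact AdaptivFloat algorithm is specified in Section 3.

```python
import numpy as np

def adaptivfloat_quantize(w, n_bits=8, n_exp=3):
    """Sketch of AdaptivFloat-style quantization: 1 sign bit,
    n_exp exponent bits, n_bits - 1 - n_exp mantissa bits, and a
    per-tensor exponent bias derived from max|w|."""
    n_mant = n_bits - 1 - n_exp
    max_abs = float(np.max(np.abs(w)))
    assert max_abs > 0.0

    # Shift the exponent range so the largest representable value
    # just covers the layer's largest weight magnitude.
    top_sig = 2.0 - 2.0 ** -n_mant          # largest significand
    exp_max = int(np.ceil(np.log2(max_abs / top_sig)))
    exp_min = exp_max - 2 ** n_exp + 1

    sign = np.sign(w)
    mag = np.abs(w).astype(np.float64)

    max_norm = 2.0 ** exp_max * top_sig
    min_norm = 2.0 ** exp_min
    mag = np.minimum(mag, max_norm)         # clip overflow
    underflow = mag < min_norm              # below the +/-min datapoint

    # Round the significand to n_mant fractional bits within each
    # value's own binade.
    safe = np.where(underflow, min_norm, mag)
    exp = np.clip(np.floor(np.log2(safe)), exp_min, exp_max)
    step = 2.0 ** (exp - n_mant)
    q = np.round(mag / step) * step
    q[underflow] = 0.0   # the +/-min code points are reused for +/-0

    return sign * q
```

With an 8-bit budget, for example, `adaptivfloat_quantize(w, n_bits=8, n_exp=3)` keeps a Transformer layer's large-magnitude outliers representable while still resolving its many small weights, which a fixed-point format of the same width cannot do simultaneously.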
Rather than proposing binary or ternary quantization techniques evaluated on a small number of carefully selected models, through AdaptivFloat we aim to inform a generalized floating-point based mathematical blueprint for adaptive and resilient DNN quantization that can be easily applied on neural models of various categories (CNN, RNN or MLP), layer depths and parameter statistics. By virtue of an algorithm-hardware co-design, we also propose a processing element implementation that exploits the AdaptivFloat arithmetic in its computational datapath in order to yield energy efficiencies that surpass those of integer-based variants. Furthermore, owing to the superior performance of AdaptivFloat at very low word sizes, as will be shown, higher compute density can be acquired at a lower penalty for computational accuracy compared to block floating-point, integer, or non-adaptive IEEE-like float or posit encodings. Altogether, the AdaptivFloat algorithm-hardware co-design framework offers a compelling alternative to integer or fixed-point solutions.

Finally, we note that the AdaptivFloat encoding scheme is self-supervised, as it only relies on unlabeled data distributions in the network.

This paper makes the following contributions:

• We propose and describe AdaptivFloat: a floating-point based data encoding algorithm for deep learning, which maximizes its dynamic range at a neural network layer granularity by dynamically shifting its exponent range and by optimally clipping its representable datapoints.

• We evaluate AdaptivFloat across a diverse set of DNN models and tasks and show that it achieves higher classification and prediction accuracies compared to equivalent bit width uniform, block floating-point and non-adaptive posit and float quantization techniques.

• We propose a hybrid float-integer (HFINT) PE implementation that exploits the AdaptivFloat mechanism and provides a cost-effective compromise between the high accuracy of floating-point computations and the greater hardware density of fixed-point post-processing. We show that the HFINT PE produces higher energy efficiencies compared to conventional monolithic integer-based PEs.

• We design and characterize an accelerator system targeted for sequence-to-sequence neural networks and show that, when integrated with HFINT PEs, it obtains lower overall power consumption compared to an integer-based adaptation.

The rest of the paper is structured as follows. A summary of prominent number and quantization schemes used in deep learning is narrated in Section 2. We present the intuition and a detailed description of the AdaptivFloat algorithm in Section 3. The efficacy and resiliency of AdaptivFloat is demonstrated in Section 4 across DNN models of varying parameter distributions. Section 5 describes the hardware modeling, with energy, area and performance efficiency results reported in Section 6. Section 7 concludes the paper.

2 RELATED WORK

Quantization Techniques. Low-precision DNN training and inference have been researched heavily in recent years with the aim of saving energy and memory costs. A rather significant percentage of prior work in this domain (Wu et al., 2015; Mishra et al., 2017; Park et al., 2017; Zhou et al., 2016; Cai et al., 2017; Zhang et al., 2018; Han et al., 2015) has focused on or evaluated low-precision strategies strictly on CNNs or on models with narrow parameter distributions. Notably, inference performance with modest accuracy degradation has been demonstrated with binary (Courbariaux & Bengio, 2016), ternary (Zhu et al., 2016), and quaternary weight precision (Choi et al., 2019). Often, tricks such as skipping quantization on the sensitive first and last layers are employed in order to escape steeper end-to-end accuracy loss.

Extending these aggressive quantization techniques to RNNs has been reported (Alom et al., 2018), although still with recurrent models exhibiting the same narrow distributions seen in many CNNs. Park et al. (2018) noticed that large magnitude weights bear a higher impact on model performance and proposed outlier-aware quantization, which requires separate low and high bit-width precision for small and outlier weight values, respectively. However, this technique complicates the hardware implementation by requiring two separate PE datapaths for the small and the outlier weights.
Hardware-Friendly Encodings. Linear fixed-point or uniform integer quantization is commonly used for deep learning hardware acceleration (Jouppi et al., 2017; Jacob et al., 2017; Reagen et al., 2016), as it presents an area- and energy-cost-effective solution compared to floating-point based processors.

[Figure: datapoints of a floating-point encoding without denormals (±0.25, ±0.375, ±0.5, ...) versus the same encoding with the ±min datapoints sacrificed for ±0 (±0, ±0.375, ±0.5, ...).]
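The substitution in that figure can be reproduced with a short enumeration. The toy parameters below (2 exponent bits, 1 mantissa bit, minimum exponent of -2) are our assumption, chosen so the datapoints land on the figure's ±0.25/±0.375/±0.5 values; they are not necessarily the configuration used in the paper's figure.

```python
# Positive datapoints of a toy normalized float format (no denormals):
# value = 2^e * (1 + f / 2^n_mant) for each exponent e and fraction f.
n_exp, n_mant, exp_min = 2, 1, -2     # exponents -2 .. 1

vals = [2.0 ** e * (1 + f / 2 ** n_mant)
        for e in range(exp_min, exp_min + 2 ** n_exp)
        for f in range(2 ** n_mant)]
print(vals)            # [0.25, 0.375, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0]

# AdaptivFloat-style zero assignment: sacrifice the smallest
# magnitude (+/-0.25 here) and reuse its code point for +/-0.
vals_with_zero = [0.0] + vals[1:]
print(vals_with_zero)  # [0.0, 0.375, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0]
```

Without this substitution the format has no exact zero at all, since every normalized code point carries a nonzero significand; giving up the least useful datapoint (±min) restores a true zero without spending any extra encoding.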