Floating-Point Extensions for C — 10 Part 2: Decimal Floating-Point Arithmetic

Total Page:16

File Type:pdf, Size:1020Kb

Floating-Point Extensions for C — 10 Part 2: Decimal Floating-Point Arithmetic © ISO/IEC 2012 – All rights reserved Study Group Draft – September 23, 2012 WG 14 N1632 ISO/IEC JTC 1/SC 22 0000 Date: yyyy-mm-dd 5 Reference number of document: ISO/IEC nnn-n Committee identification: ISO/IEC JTC 1/SC 22/WG 14 Secretariat: ANSI Information Technology — Programming languages, their environments, and system software interfaces — Floating-point extensions for C — 10 Part 2: Decimal floating-point arithmetic Technologies de l’information — Langages de programmation, leurs environnements et interfaces du logiciel système — Extensions à virgule flottante pour C — Partie 2: décimal arithmétique flottante Warning 15 This document is not an ISO International Standard. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an International Standard. Recipients of this draft are invited to submit, with their comments, notification of any relevant patent rights of which they are aware and to provide supporting documentation. Document type: Technical Specification Document subtype: Document stage: (20) Preparation Document language: E ISO/IEC nnn-n Study Group Draft – September 23, 2012 WG 14 N1632 Copyright notice This ISO document is a working draft or committee draft and is copyright-protected by ISO. While the reproduction of working drafts or committee drafts in any form for use by participants in the ISO standards development process is permitted without prior permission from ISO, neither this document 5 nor any extract from it may be reproduced, stored or transmitted in any form for any other purpose without prior written permission from ISO. Requests for permission to reproduce this document for the purpose of selling it should be addressed as shown below or to ISO’s member body in the country of the requester: ISO copyright office 10 Case postale 56 CH-1211 Geneva 20 Tel. +41 22 749 01 11 Fax + 41 22 749 09 47 E-mail [email protected] Web www.iso.org 15 Reproduction for sales purposes may be subject to royalty payments or a licensing agreement. Violators may be prosecuted. ii © ISO/IEC 2012 – All rights reserved WG 14 N1632 Study Group Draft – September 23, 2012 ISO/IEC nnn-n Contents Page Foreword ...........................................................................................................................................................iv 0 Introduction ................................................................................................................................................iv 0.1 Background .............................................................................................................................................iv 5 0.2 Purpose.....................................................................................................................................................v 0.3 The arithmetic model ...............................................................................................................................v 0.4 The formats..............................................................................................................................................vi 1 Scope ...........................................................................................................................................................1 2 Conformance ...............................................................................................................................................1 10 3 Normative references .................................................................................................................................2 4 Terms and definitions.................................................................................................................................2 5 Predefined macros......................................................................................................................................2 6 Decimal floating types ................................................................................................................................2 7 Characteristics of decimal floating types <float.h>.............................................................................4 15 8 Operation binding .......................................................................................................................................6 9 Conversions ................................................................................................................................................7 9.1 Conversions between decimal floating and integer .............................................................................7 9.2 Conversions among decimal floating types, and between decimal floating types and generic floating types ........................................................................................................................................7 20 9.3 Conversions between decimal floating and complex...........................................................................8 9.4 Usual arithmetic conversions .................................................................................................................8 9.5 Default argument promotion...................................................................................................................8 10 Constants...................................................................................................................................................8 11 Arithmetic operations ...............................................................................................................................9 25 11.1 Operators ................................................................................................................................................9 11.2 Functions ..............................................................................................................................................10 11.3 Conversions .........................................................................................................................................11 12 Library ......................................................................................................................................................11 12.1 Standard headers.................................................................................................................................11 30 12.2 Floating-point environment <fenv.h> ..............................................................................................11 12.3 Decimal mathematics <math.h> ........................................................................................................12 12.4 New <math.h> functions ....................................................................................................................21 12.4.1 Quantum exponent functions............................................................................................................21 12.4.2 Decimal re-encoding functions .........................................................................................................22 35 12.5 Formatted input/output specifiers......................................................................................................24 12.6 strtod32, strtod64, and strtod128 functions <stdlib.h> ......................................................26 12.7 wcstod32, wcstod64, and wcstod128 functions <wchar.h> ........................................................29 12.8 Type-generic macros <tgmath.h> ....................................................................................................30 Bibliography.....................................................................................................................................................32 40 © ISO/IEC 2012 – All rights reserved iii ISO/IEC nnn-n Study Group Draft – September 23, 2012 WG 14 N1632 Foreword ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been 5 established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization. International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2. The main task of technical committees is to prepare International Standards. Draft International Standards 10 adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote. Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. ISO nnn-n was prepared by Technical Committee ISO JTC 1, Information Technology, Subcommittee SC 22, 15 Programming languages, their environments, and system software interfaces. ISO nnn consists of the following parts, under the general title Floating-point extensions for C: ⎯ Part 1: Binary floating-point arithmetic ⎯ Part 2: Decimal floating-point arithmetic
Recommended publications
  • Version 13.0 Release Notes
    Release Notes The tool of thought for expert programming Version 13.0 Dyalog is a trademark of Dyalog Limited Copyright 1982-2011 by Dyalog Limited. All rights reserved. Version 13.0 First Edition April 2011 No part of this publication may be reproduced in any form by any means without the prior written permission of Dyalog Limited. Dyalog Limited makes no representations or warranties with respect to the contents hereof and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. Dyalog Limited reserves the right to revise this publication without notification. TRADEMARKS: SQAPL is copyright of Insight Systems ApS. UNIX is a registered trademark of The Open Group. Windows, Windows Vista, Visual Basic and Excel are trademarks of Microsoft Corporation. All other trademarks and copyrights are acknowledged. iii Contents C H A P T E R 1 Introduction .................................................................................... 1 Summary........................................................................................................................... 1 System Requirements ....................................................................................................... 2 Microsoft Windows .................................................................................................... 2 Microsoft .Net Interface .............................................................................................. 2 Unix and Linux ..........................................................................................................
    [Show full text]
  • New Directions in Floating-Point Arithmetic
    New directions in floating-point arithmetic Nelson H. F. Beebe University of Utah Department of Mathematics, 110 LCB 155 S 1400 E RM 233 Salt Lake City, UT 84112-0090 USA Abstract. This article briefly describes the history of floating-point arithmetic, the development and features of IEEE standards for such arithmetic, desirable features of new implementations of floating-point hardware, and discusses work- in-progress aimed at making decimal floating-point arithmetic widely available across many architectures, operating systems, and programming languages. Keywords: binary arithmetic; decimal arithmetic; elementary functions; fixed-point arithmetic; floating-point arithmetic; interval arithmetic; mathcw library; range arithmetic; special functions; validated numerics PACS: 02.30.Gp Special functions DEDICATION This article is dedicated to N. Yngve Öhrn, my mentor, thesis co-advisor, and long-time friend, on the occasion of his retirement from academia. It is also dedicated to William Kahan, Michael Cowlishaw, and the late James Wilkinson, with much thanks for inspiration. WHAT IS FLOATING-POINT ARITHMETIC? Floating-point arithmetic is a technique for storing and operating on numbers in a computer where the base, range, and precision of the number system are usually fixed by the computer design. Conceptually, a floating-point number has a sign,anexponent,andasignificand (the older term mantissa is now deprecated), allowing a representation of the form .1/sign significand baseexponent.Thebase point in the significand may be at the left, or after the first digit, or at the right. The point and the base are implicit in the representation: neither is stored. The sign can be compactly represented by a single bit, and the exponent is most commonly a biased unsigned bitfield, although some historical architectures used a separate exponent sign and an unbiased exponent.
    [Show full text]
  • A Software Implementation of the IEEE 754R Decimal Floating-Point Arithmetic Using the Binary Encoding Format
    148 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 2, FEBRUARY 2009 A Software Implementation of the IEEE 754R Decimal Floating-Point Arithmetic Using the Binary Encoding Format Marius Cornea, Member, IEEE, John Harrison, Member, IEEE, Cristina Anderson, Ping Tak Peter Tang, Member, IEEE, Eric Schneider, and Evgeny Gvozdev Abstract—The IEEE Standard 754-1985 for Binary Floating-Point Arithmetic [19] was revised [20], and an important addition is the definition of decimal floating-point arithmetic [8], [24]. This is intended mainly to provide a robust reliable framework for financial applications that are often subject to legal requirements concerning rounding and precision of the results, because the binary floating-point arithmetic may introduce small but unacceptable errors. Using binary floating-point calculations to emulate decimal calculations in order to correct this issue has led to the existence of numerous proprietary software packages, each with its own characteristics and capabilities. The IEEE 754R decimal arithmetic should unify the ways decimal floating-point calculations are carried out on various platforms. New algorithms and properties are presented in this paper, which are used in a software implementation of the IEEE 754R decimal floating-point arithmetic, with emphasis on using binary operations efficiently. The focus is on rounding techniques for decimal values stored in binary format, but algorithms are outlined for the more important or interesting operations of addition, multiplication, and division, including the case of nonhomogeneous operands, as well as conversions between binary and decimal floating-point formats. Performance results are included for a wider range of operations, showing promise that our approach is viable for applications that require decimal floating-point calculations.
    [Show full text]
  • Floatingpoint Decode Floating Operand(S) Point Stictor
    US 20170 199724A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2017/0199724 A1 Cowlishaw et al. (43) Pub. Date: Jul. 13, 2017 (54) ROUND FOR REROUND MODE IN A (2006.01) DECMAL FLOATING POINT INSTRUCTION (2006.01) (52) U.S. C. (71) Applicant: International Business Machines CPC .......... G06F 7/49957 (2013.01); G06F 7/491 Corporation, Armonk, NY (US) (2013.01); G06F 7/49963 (2013.01); G06F 7/49968 (2013.01); G06F 7/49973 (2013.01); (72) Inventors: Michael F. Cowlishaw, Coventry (GB); G06F 7/49978 (2013.01); G06F 7/483 Eric M. Schwarz, Gardiner, NY (US); (2013.01); G06F 7/49984 (2013.01); G06F Ronald M. Smith, SR., Wappingers 7/49915 (2013.01); G06F 7/386 (2013.01); Falls, NY (US); Phil C. Yeh, Poughkeepsie, NY (US) G06F 7/499.47 (2013.01) (57) ABSTRACT (21) Appl. No.: 15/470,692 A round-for-reround mode (preferably in a BID encoded Decimal format) of a floating point instruction prepares a (22) Filed: Mar. 27, 2017 result for later rounding to a variable number of digits by detecting that the least significant digit may be a 0, and if so Related U.S. Application Data changing it to 1 when the trailing digits are not all 0. A (63) Continuation of application No. 14/943.254, filed on Subsequent reround instruction is then able to round the Nov. 17, 2015, which is a continuation of application result to any number of digits at least 2 fewer than the No. 13/848,885, filed on Mar. 22, 2013, now Pat.
    [Show full text]
  • Design and Implementation of IEEE-754 Addition and Subtraction for Floating Point Arithmetic Logic Unit
    Design and Implementation of IEEE-754 Addition and Subtraction for Floating Point Arithmetic Logic Unit V.vinay chamkur [email protected] M.Tech ,VLSI DESIGN and EMBEDDED SYSTEMS Chethana.R LecturerECE Dept, SJBIT Bengaluru-60. ABSTRACT--- This paper describes the FPGA implementation of a Decimal Floating Point (DFP) adder/subtractor using IEEE 754-2008 format. In this paper we describe an efficient implementation of an IEEE 754 single precision Standard for Binary Floating- Point Arithmetic to include specifications for decimal floating-point arithmetic. As processor support for decimal floating-point arithmetic emerges, it is important to investigate efficient algorithms and hardware designs for common decimal floating-point arithmetic algorithms. This paper presents novel designs for a decimal floating-point addition and subtraction. They are fully synthesizable hardware descriptions in VERILOG. Each one is presented for high speed computing. Figure 1.IEEE floating point format Keywords- IEEE-754 Floating Point Standard; S (E - Bias) Addition and Subtraction Algorithm. Z = (-1 ) * 2 * (1.M) -1 -2 -3 -22 -23 Where M = m22 2 + m21 2 + m20 2 +…+ m1 2 + m0 2 ; Bias = 127. I.INTRODUCTION FIG : -1 Floating point numbers are one possible way of a) 1-bit sign s. representing real numbers in binary format; the IEEE 754 [1] standard presents two different floating point formats, b) A w + 5 bit combination field G encoding Binary interchange format classification and, if the encoded datum is a finite number,the exponent q and four significand bits (1 or 3 of which are implied). and Decimal interchange format. Multiplying floating The biased exponent E is a w + 2 bit quantity q + bias, where the point numbers is a critical requirement for DSP applications value of the first two bits of the biased exponent taken together is involving large dynamic range.
    [Show full text]
  • Hardware Design of a Binary Integer Decimal-Based IEEE P754 Rounding Unit
    Hardware Design of a Binary Integer Decimal-based IEEE P754 Rounding Unit Charles Tsen Michael Schulte Sonia González-Navarro University of Wisconsin University of Wisconsin Universidad de Málaga [email protected] [email protected] [email protected] Abstract BigDecimal library [14]. Also, Intel recently published results for a prototype software library using the Binary Because of the growing importance of decimal Integer Decimal (BID) encoding [3,4]. These software floating-point (DFP) arithmetic, specifications for it were packages are adequate for many of today’s applications, but recently added to the draft revision of the IEEE 754 as trends towards globalization and e-commerce continue, Standard (IEEE P754). In this paper, we present a the performance of software for DFP arithmetic may not hardware design for a rounding unit for 64-bit DFP suffice. numbers (decimal64) that use the IEEE P754 binary Several hardware designs for DFP arithmetic have been encoding of DFP numbers, which is widely known as the developed using encodings other than BID [1, 2, 12, 17, 18, Binary Integer Decimal (BID) encoding. We summarize the 19]. Recently, IBM announced DFP hardware on its technique used for rounding, present the theory and design Power6 server processor [13] and System z9 processor [8] of the BID rounding unit, and evaluate its critical path using the Densely Packed Decimal (DPD) encoding, delay, latency, and area for combinational and pipelined discussed in Section 2. Previous designs for DFP designs. Over 86% of the rounding unit’s area is due to a arithmetic differ from our design in that they operate on 55-bit by 54-bit binary multiplier, which can be shared with significands with a decimal radix of 10, such as Binary a double-precision binary floating-point multiplier.
    [Show full text]
  • Algorithms and Hardware Designs for Decimal Multiplication
    Algorithms and Hardware Designs for Decimal Multiplication by Mark A. Erle Presented to the Graduate and Research Committee of Lehigh University in Candidacy for the Degree of Doctor of Philosophy in Computer Engineering Lehigh University November 21, 2008 Approved and recommended for acceptance as a dissertation in partial ful¯llment of the requirements for the degree of Doctor of Philosophy. Date Accepted Date Dissertation Committee: Dr. Mark G. Arnold Chair Dr. Meghanad D. Wagh Member Dr. Brian D. Davison Member Dr. Michael J. Schulte External Member Dr. Eric M. Schwarz External Member Dr. William M. Pottenger External Member iii This dissertation is dedicated to Michele, Laura, Kristin, and Amy. iv Acknowledgements Along the path to completing this dissertation, I have traversed the birth of my third daughter, the passing of my mother, and the passing of my father. Fortunately, I have not been alone, for I have been traveling with God, family, and friends. I am appreciative of the support I received from IBM all along the way. There are several colleagues with whom I had the good fortune to collaborate on several papers, and to whom I am very grateful, namely Dr. Eric Schwarz, Mr. Brian Hickmann, and Dr. Liang-Kai Wang. Additionally, I am thankful for the guidance and assistance from my dissertation committee which includes Dr. Michael Schulte, Dr. Mark Arnold, Dr. Meghanad Wagh, Dr. Brian Davison, Dr. Eric Schwarz, and Dr. William Pottenger. In particular, I am indebted to Dr. Schulte for his mentorship, encouragement, and rapport. Now, ¯nding myself at the end of this path, I see two roads diverging..
    [Show full text]
  • 2014 03 Dpdbid.Pdf
    List of Abbreviation ASIC Application-Specific Integrated Circuit BCD Binary Coded Decimal BID Binary Integer Decimal CSA Carry Save Adder DFP Decimal Floating Point DPD Densely Packed Decimal FPGA Field Programmable Gate Array IEEE Institute of Electrical and Electronics Engineers Inf Infinity LSB Least Significant Bit MSB Most Significant Bit MUX Multiplexer NaN Not a Number qNaN quiet NaN sNaN signaling NaN x Definitions Biased exponent: The sum of the exponent and a constant (bias) are chosen to • make the biased exponent’s range nonnegative. Binary floating-point number: A floating-point number with radix two. • Cohort: In a given format, the set of floating-point representations with the same • numerical value. Decimal floating-point number: A floating-point number with radix ten. • Declet: An encoding of three decimal digits into ten bits using the densely packed • decimal encoding scheme. Of the 1024 possible declets, 1000 canonical declets are produced by computational operations, while 24 noncanonical declets are not produced by computational operations, but are accepted in operands. Exception: An event that occurs when an operation has no outcome suitable for • every reasonable application. Exponent: The component of a binary floating-point number that normally signifies • the integer power to which the radix two is raised in determining the value of the represented number. Occasionally the exponent is called the signed or unbiased exponent. Floating-point number: A bit-string encoding characterized by three components: • a sign, a signed exponent, and a significand. All floating-point numbers, including zeros and infinities, are signed. Format: A set of representations of numerical values and symbols, perhaps accom- • panied by an encoding.
    [Show full text]
  • Algorithms and Architectures for Decimal Transcendental Function Computation
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by University of Saskatchewan's Research Archive Algorithms and Architectures for Decimal Transcendental Function Computation A Thesis Submitted to the College of Graduate Studies and Research in Partial Fulfillment of the Requirements for the degree of Doctor of Philosophy in the Department of Electrical and Computer Engineering University of Saskatchewan Saskatoon, Saskatchewan, Canada By Dongdong Chen c Dongdong Chen, January, 2011. All rights reserved. Permission to Use In presenting this thesis in partial fulfilment of the requirements for a Postgraduate degree from the University of Saskatchewan, I agree that the Libraries of this University may make it freely available for inspection. I further agree that permission for copying of this thesis in any manner, in whole or in part, for scholarly purposes may be granted by the professor or professors who supervised my thesis work or, in their absence, by the Head of the Department or the Dean of the College in which my thesis work was done. It is understood that any copying or publication or use of this thesis or parts thereof for financial gain shall not be allowed without my written permission. It is also understood that due recognition shall be given to me and to the University of Saskatchewan in any scholarly use which may be made of any material in my thesis. Requests for permission to copy or to make other use of material in this thesis in whole or part should be addressed to: Head of the Department of Electrical and Computer Engineering 57 Campus Drive University of Saskatchewan Saskatoon, Saskatchewan Canada S7N 5A9 i Abstract Nowadays, there are many commercial demands for decimal floating-point (DFP) arith- metic operations such as financial analysis, tax calculation, currency conversion, Internet based applications, and e-commerce.
    [Show full text]
  • A Ada Interface
    A Ada interface AUGUSTA ADA (1815–1852), COUNTESS OF LOVELACE AND DAUGHTER OF LORD BYRON, ASSISTANT AND BIOGRAPHER TO CHARLES BABBAGE, AND THE WORLD’S FIRST COMPUTER PROGRAMMER. The Ada programming language [Ada95] defines a standard interface to routines written in other languages, and the GNU Ada translator, gnat, supports that interface. The standard Ada floating-point data types Float, Long_Float, and Long_Long_Float correspond directly to the C data types float, double, and long double. The Ada integer data types Integer, Long_Integer, and Long_Long_Integer are defined to be 32-bit, 32-bit, and 64-bit signed values, respectively, by the gnat compiler. At the time of writing this, this author has access only to one Ada compiler family, gnat, running on IA-32, IA-64, and AMD64 systems, so interplatform portability of the Ada interface to the mathcw library cannot yet be extensively checked. Nevertheless, any changes needed for other platforms are expected to be relatively small, and related to the mapping of integer types. Ada is a complex and strongly typed language, with a large collection of packages, several of which must be imported into Ada programs in order to use library functions. It is traditional in Ada to separate the specification of a package from its implementation, and with the gnat compiler, the two parts are stored in separate files with extensions .ads (Ada specification) and .adb (Ada body). For example, commercial software vendors can supply the specification source files and implementation object library files to customers, without revealing the implementation source code. Ada specification files therefore are similar in purpose to C header files.
    [Show full text]
  • Draft Standard for Interval Arithmetic 14.4 §
    P1788/D9.3, June 23, 2014 Chapter 2 Draft Standard For Interval Arithmetic 14.4 § The standard does not say which of these results is “correct”. Rather than change the implementation, it might be better to document that such code must be avoided if reproducible behavior is required.] 14.4. Interchange representations and encodings. The purpose of interchange repre- sentations and encodings is to allow the loss-free exchange of Level 2 interval data between 754- conforming implementations. This is done by imposing a standard Level 3 representation using Level 2 number datums and delegating interchange encoding of number datums to the IEEE 754 standard. Letx be a datum of the bare interval inf-sup typeT derived from a supported 754 formatF. Its standard Level 3 representative is an ordered pair (inf(x), sup(x)) of two Level 2F-numbers as defined in 12.12.8. For example, the only representative of Empty is the pair (+ , ) and the only representative§ of [0, 0] is the pair ( 0, +0). ∞ −∞ − Letx dx be a datum of the decorated interval typeDT derived fromT. Its standard Level 3 representative is an ordered triple (inf(x), sup(x), dx) of two Level 2F-datums and a decora- tion. For example, the only representative of Emptytrv is the triple (+ , , trv) and the only representative of NaI is the triple (NaN, NaN, ill). ∞ −∞ Interchange Level 4 encoding of an interval datum is a bit string that comprises the interchange encodings offields of the ordered pair/triple above, in the order given. Interchange encoding of theF-datumfields is a bit string encoding of one of the 754 interchange formats (754 3.6).
    [Show full text]
  • Hardware–Software Co-Design for Decimal Multiplication
    computers Article Hardware–Software Co-Design for Decimal Multiplication Riaz-ul-haque Mian * , Michihiro Shintani and Michiko Inoue Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma 630-0192, Japan; [email protected] (M.S.); [email protected] (M.I.) * Correspondence: [email protected] Abstract: Decimal arithmetic using software is slow for very large-scale applications. On the other hand, when hardware is employed, extra area overhead is required. A balanced strategy can overcome both issues. Our proposed methods are compliant with the IEEE 754-2008 standard for decimal floating-point arithmetic and combinations of software and hardware. In our methods, software with some area-efficient decimal component (hardware) is used to design the multiplication process. Analysis in a RISC-V-based integrated co-design evaluation framework reveals that the proposed methods provide several Pareto points for decimal multiplication solutions. The total execution process is sped up by 1.43× to 2.37× compared with a full software solution. In addition, 7–97% less hardware is required compared with an area-efficient full hardware solution. Keywords: decimal arithmetic; decimal multiplication; decimal floating-point; hardware-software co- design 1. Introduction Decimal arithmetic is very important for banking, commercial, and financial transac- tions. Moreover, it is widely used in scientific applications. A study by IBM has indicated that more than half of the numerical data in a commercial database are stored in decimal format [1]. Decimal arithmetic using hardware or software is typically much more complex Citation: Mian, R.-u.-h.; Shintani, M.; and slower than its counterpart binary arithmetic.
    [Show full text]