Floating Point - Wikipedia, the Free Encyclopedia Page 1 of 12 Floating Point
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Type-Safe Composition of Object Modules*
International Conference on Computer Systems and Education I ISc Bangalore Typ esafe Comp osition of Ob ject Mo dules Guruduth Banavar Gary Lindstrom Douglas Orr Department of Computer Science University of Utah Salt LakeCity Utah USA Abstract Intro duction It is widely agreed that strong typing in We describ e a facility that enables routine creases the reliability and eciency of soft typ echecking during the linkage of exter ware However compilers for statically typ ed nal declarations and denitions of separately languages suchasC and C in tradi compiled programs in ANSI C The primary tional nonintegrated programming environ advantage of our serverstyle typ echecked ments guarantee complete typ esafety only linkage facility is the ability to program the within a compilation unit but not across comp osition of ob ject mo dules via a suite of suchunits Longstanding and widely avail strongly typ ed mo dule combination op era able linkers comp ose separately compiled tors Such programmability enables one to units bymatching symb ols purely byname easily incorp orate programmerdened data equivalence with no regard to their typ es format conversion stubs at linktime In ad Such common denominator linkers accom dition our linkage facility is able to automat mo date ob ject mo dules from various source ically generate safe co ercion stubs for com languages by simply ignoring the static se patible encapsulated data mantics of the language Moreover com monly used ob ject le formats are not de signed to incorp orate source language typ e -
Fujitsu SPARC64™ X+/X Software on Chip Overview for Developers]
White paper [Fujitsu SPARC64™ X+/X Software on Chip Overview for Developers] White paper Fujitsu SPARC64™ X+/X Software on Chip Overview for Developers Page 1 of 13 www.fujitsu.com/sparc White paper [Fujitsu SPARC64™ X+/X Software on Chip Overview for Developers] Table of Contents Table of Contents 1 Software on Chip Innovative Technology 3 2 SPARC64™ X+/X SIMD Vector Processing 4 2.1 How Oracle Databases take advantage of SPARC64™ X+/X SIMD vector processing 4 2.2 How to use of SPARC64™ X+/X SIMD instructions in user applications 5 2.3 How to check if the system and operating system is capable of SIMD execution? 6 2.4 How to check if SPARC64™ X+/X SIMD instructions indeed have been generated upon compilation of a user source code? 6 2.5 Is SPARC64™ X+/X SIMD implementation compatible with Oracle’s SPARC SIMD? 6 3 SPARC64™ X+/X Decimal Floating Point Processing 8 3.1 How Oracle Databases take advantage of SPARC64™ X+/X Decimal Floating-Point 8 3.2 Decimal Floating-Point processing in user applications 8 4 SPARC64™ X+/X Extended Floating-Point Registers 9 4.1 How Oracle Databases take advantage of SPARC64™ X+/X Decimal Floating-Point 9 5 SPARC64™ X+/X On-Chip Cryptographic Processing Capabilities 10 5.1 How to use the On-Chip Cryptographic Processing Capabilities 10 5.2 How to use the On-Chip Cryptographic Processing Capabilities in user applications 10 6 Conclusions 12 Page 2 of 13 www.fujitsu.com/sparc White paper [Fujitsu SPARC64™ X+/X Software on Chip Overview for Developers] Software on Chip Innovative Technology 1 Software on Chip Innovative Technology Fujitsu brings together innovations in supercomputing, business computing, and mainframe computing in the Fujitsu M10 enterprise server family to help organizations meet their business challenges. -
CWM) Specification
Common Warehouse Metamodel (CWM) Specification Volume 1 Version 1.0 October 2001 Copyright © 1999, Dimension EDI Copyright © 1999, Genesis Development Corporation Copyright © 1999, Hyperion Solutions Corporation Copyright © 1999, International Business Machines Corporation Copyright © 1999, NCR Corporation Copyright © 2000, Object Management Group Copyright © 1999, Oracle Corporation Copyright © 1999, UBS AG Copyright © 1999, Unisys Corporation The companies listed above have granted to the Object Management Group, Inc. (OMG) a nonexclusive, royalty-free, paid up, worldwide license to copy and distribute this document and to modify this document and distribute copies of the mod- ified version. Each of the copyright holders listed above has agreed that no person shall be deemed to have infringed the copyright in the included material of any such copyright holder by reason of having used the specification set forth herein or having conformed any computer software to the specification. PATENT The attention of adopters is directed to the possibility that compliance with or adoption of OMG specifications may require use of an invention covered by patent rights. OMG shall not be responsible for identifying patents for which a license may be required by any OMG specification, or for conducting legal inquiries into the legal validity or scope of those patents that are brought to its attention. OMG specifications are prospective and advisory only. Prospective users are responsible for protecting themselves against liability for infringement of patents. NOTICE The information contained in this document is subject to change without notice. The material in this document details an Object Management Group specification in accordance with the license and notices set forth on this page. -
Arithmetic Algorithms for Extended Precision Using Floating-Point Expansions Mioara Joldes, Olivier Marty, Jean-Michel Muller, Valentina Popescu
Arithmetic algorithms for extended precision using floating-point expansions Mioara Joldes, Olivier Marty, Jean-Michel Muller, Valentina Popescu To cite this version: Mioara Joldes, Olivier Marty, Jean-Michel Muller, Valentina Popescu. Arithmetic algorithms for extended precision using floating-point expansions. IEEE Transactions on Computers, Institute of Electrical and Electronics Engineers, 2016, 65 (4), pp.1197 - 1210. 10.1109/TC.2015.2441714. hal- 01111551v2 HAL Id: hal-01111551 https://hal.archives-ouvertes.fr/hal-01111551v2 Submitted on 2 Jun 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. IEEE TRANSACTIONS ON COMPUTERS, VOL. , 201X 1 Arithmetic algorithms for extended precision using floating-point expansions Mioara Joldes¸, Olivier Marty, Jean-Michel Muller and Valentina Popescu Abstract—Many numerical problems require a higher computing precision than the one offered by standard floating-point (FP) formats. One common way of extending the precision is to represent numbers in a multiple component format. By using the so- called floating-point expansions, real numbers are represented as the unevaluated sum of standard machine precision FP numbers. This representation offers the simplicity of using directly available, hardware implemented and highly optimized, FP operations. -
Variable Precision in Modern Floating-Point Computing
Variable precision in modern floating-point computing David H. Bailey Lawrence Berkeley Natlional Laboratory (retired) University of California, Davis, Department of Computer Science 1 / 33 August 13, 2018 Questions to examine in this talk I How can we ensure accuracy and reproducibility in floating-point computing? I What types of applications require more than 64-bit precision? I What types of applications require less than 32-bit precision? I What software support is required for variable precision? I How can one move efficiently between precision levels at the user level? 2 / 33 Commonly used formats for floating-point computing Formal Number of bits name Nickname Sign Exponent Mantissa Hidden Digits IEEE 16-bit “IEEE half” 1 5 10 1 3 (none) “ARM half” 1 5 10 1 3 (none) “bfloat16” 1 8 7 1 2 IEEE 32-bit “IEEE single” 1 7 24 1 7 IEEE 64-bit “IEEE double” 1 11 52 1 15 IEEE 80-bit “IEEE extended” 1 15 64 0 19 IEEE 128-bit “IEEE quad” 1 15 112 1 34 (none) “double double” 1 11 104 2 31 (none) “quad double” 1 11 208 4 62 (none) “multiple” 1 varies varies varies varies 3 / 33 Numerical reproducibility in scientific computing A December 2012 workshop on reproducibility in scientific computing, held at Brown University, USA, noted that Science is built upon the foundations of theory and experiment validated and improved through open, transparent communication. ... Numerical round-off error and numerical differences are greatly magnified as computational simulations are scaled up to run on highly parallel systems. -
Floating Points
Jin-Soo Kim ([email protected]) Systems Software & Architecture Lab. Seoul National University Floating Points Fall 2018 ▪ How to represent fractional values with finite number of bits? • 0.1 • 0.612 • 3.14159265358979323846264338327950288... ▪ Wide ranges of numbers • 1 Light-Year = 9,460,730,472,580.8 km • The radius of a hydrogen atom: 0.000000000025 m 4190.308: Computer Architecture | Fall 2018 | Jin-Soo Kim ([email protected]) 2 ▪ Representation • Bits to right of “binary point” represent fractional powers of 2 • Represents rational number: 2i i i–1 k 2 bk 2 k=− j 4 • • • 2 1 bi bi–1 • • • b2 b1 b0 . b–1 b–2 b–3 • • • b–j 1/2 1/4 • • • 1/8 2–j 4190.308: Computer Architecture | Fall 2018 | Jin-Soo Kim ([email protected]) 3 ▪ Examples: Value Representation 5-3/4 101.112 2-7/8 10.1112 63/64 0.1111112 ▪ Observations • Divide by 2 by shifting right • Multiply by 2 by shifting left • Numbers of form 0.111111..2 just below 1.0 – 1/2 + 1/4 + 1/8 + … + 1/2i + … → 1.0 – Use notation 1.0 – 4190.308: Computer Architecture | Fall 2018 | Jin-Soo Kim ([email protected]) 4 ▪ Representable numbers • Can only exactly represent numbers of the form x / 2k • Other numbers have repeating bit representations Value Representation 1/3 0.0101010101[01]…2 1/5 0.001100110011[0011]…2 1/10 0.0001100110011[0011]…2 4190.308: Computer Architecture | Fall 2018 | Jin-Soo Kim ([email protected]) 5 Fixed Points ▪ p.q Fixed-point representation • Use the rightmost q bits of an integer as representing a fraction • Example: 17.14 fixed-point representation -
Faster Ffts in Medium Precision
Faster FFTs in medium precision Joris van der Hoevena, Grégoire Lecerfb Laboratoire d’informatique, UMR 7161 CNRS Campus de l’École polytechnique 1, rue Honoré d’Estienne d’Orves Bâtiment Alan Turing, CS35003 91120 Palaiseau a. Email: [email protected] b. Email: [email protected] November 10, 2014 In this paper, we present new algorithms for the computation of fast Fourier transforms over complex numbers for “medium” precisions, typically in the range from 100 until 400 bits. On the one hand, such precisions are usually not supported by hardware. On the other hand, asymptotically fast algorithms for multiple precision arithmetic do not pay off yet. The main idea behind our algorithms is to develop efficient vectorial multiple precision fixed point arithmetic, capable of exploiting SIMD instructions in modern processors. Keywords: floating point arithmetic, quadruple precision, complexity bound, FFT, SIMD A.C.M. subject classification: G.1.0 Computer-arithmetic A.M.S. subject classification: 65Y04, 65T50, 68W30 1. Introduction Multiple precision arithmetic [4] is crucial in areas such as computer algebra and cryptography, and increasingly useful in mathematical physics and numerical analysis [2]. Early multiple preci- sion libraries appeared in the seventies [3], and nowadays GMP [11] and MPFR [8] are typically very efficient for large precisions of more than, say, 1000 bits. However, for precisions which are only a few times larger than the machine precision, these libraries suffer from a large overhead. For instance, the MPFR library for arbitrary precision and IEEE-style standardized floating point arithmetic is typically about a factor 100 slower than double precision machine arithmetic. -
Chapter 2 DIRECT METHODS—PART II
Chapter 2 DIRECT METHODS—PART II If we use finite precision arithmetic, the results obtained by the use of direct methods are contaminated with roundoff error, which is not always negligible. 2.1 Finite Precision Computation 2.2 Residual vs. Error 2.3 Pivoting 2.4 Scaling 2.5 Iterative Improvement 2.1 Finite Precision Computation 2.1.1 Floating-point numbers decimal floating-point numbers The essence of floating-point numbers is best illustrated by an example, such as that of a 3-digit floating-point calculator which accepts numbers like the following: 123. 50.4 −0.62 −0.02 7.00 Any such number can be expressed as e ±d1.d2d3 × 10 where e ∈{0, 1, 2}. (2.1) Here 40 t := precision = 3, [L : U] := exponent range = [0 : 2]. The exponent range is rather limited. If the calculator display accommodates scientific notation, e g., 3.46 3 −1.56 −3 then we might use [L : U]=[−9 : 9]. Some numbers have multiple representations in form (2.1), e.g., 2.00 × 101 =0.20 × 102. Hence, there is a normalization: • choose smallest possible exponent, • choose + sign for zero, e.g., 0.52 × 102 → 5.20 × 101, 0.08 × 10−8 → 0.80 × 10−9, −0.00 × 100 → 0.00 × 10−9. −9 Nonzero numbers of form ±0.d2d3 × 10 are denormalized. But for large-scale scientific computation base 2 is preferred. binary floating-point numbers This is an important matter because numbers like 0.2 do not have finite representations in base 2: 0.2=(0.001100110011 ···)2. -
Harnessing Numerical Flexibility for Deep Learning on Fpgas.Pdf
WHITE PAPER FPGA Inline Acceleration Harnessing Numerical Flexibility for Deep Learning on FPGAs Authors Abstract Andrew C . Ling Deep learning has become a key workload in the data center and the edge, leading [email protected] to a race for dominance in this space. FPGAs have shown they can compete by combining deterministic low latency with high throughput and flexibility. In Mohamed S . Abdelfattah particular, FPGAs bit-level programmability can efficiently implement arbitrary [email protected] precisions and numeric data types critical in the fast evolving field of deep learning. Andrew Bitar In this paper, we explore FPGA minifloat implementations (floating-point [email protected] representations with non-standard exponent and mantissa sizes), and show the use of a block-floating-point implementation that shares the exponent across David Han many numbers, reducing the logic required to perform floating-point operations. [email protected] The paper shows this technique can significantly improve FPGA performance with no impact to accuracy, reduce logic utilization by 3X, and memory bandwidth and Roberto Dicecco capacity required by more than 40%.† [email protected] Suchit Subhaschandra Introduction [email protected] Deep neural networks have proven to be a powerful means to solve some of the Chris N Johnson most difficult computer vision and natural language processing problems since [email protected] their successful introduction to the ImageNet competition in 2012 [14]. This has led to an explosion of workloads based on deep neural networks in the data center Dmitry Denisenko and the edge [2]. [email protected] One of the key challenges with deep neural networks is their inherent Josh Fender computational complexity, where many deep nets require billions of operations [email protected] to perform a single inference. -
Bibliography
Bibliography [1] M. Abramowitz, I.A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables. Applied Math. Series 55 (National Bureau of Standards, Washington, D.C., 1964) [2] R.C. Agarwal, J.C. Cooley, F.G. Gustavson, J.B. Shearer, G. Slishman, B. Tuckerman, New scalar and vector elementary functions for the IBM system/370. IBM J. Res. Dev. 30(2), 126–144 (1986) [3] H.M. Ahmed, Efficient elementary function generation with multipliers, in Proceedings of the 9th IEEE Symposium on Computer Arithmetic (1989), pp. 52–59 [4] H.M. Ahmed, J.M. Delosme, M. Morf, Highly concurrent computing structures for matrix arithmetic and signal processing. Computer 15(1), 65–82 (1982) [5] H. Alt, Comparison of arithmetic functions with respect to Boolean circuits, in Proceedings of the 16th ACM STOC (1984), pp. 466–470 [6] American National Standards Institute and Institute of Electrical and Electronic Engineers. IEEE Standard for Binary Floating-Point Arithmetic. ANSI/IEEE Standard 754–1985 (1985) [7] C. Ancourt, F. Irigoin, Scanning polyhedra with do loops, in Proceedings of the 3rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’91), Apr 1991 (ACM Press, New York, NY, 1991), pp. 39–50 [8] I.J. Anderson, A distillation algorithm for floating-point summation. SIAM J. Sci. Comput. 20(5), 1797–1806 (1999) [9] M. Andrews, T. Mraz, Unified elementary function generator. Microprocess. Microsyst. 2(5), 270–274 (1978) [10] E. Antelo, J.D. Bruguera, J. Villalba, E. Zapata, Redundant CORDIC rotator based on parallel prediction, in Proceedings of the 12th IEEE Symposium on Computer Arithmetic, July 1995, pp. -
Midterm-2020-Solution.Pdf
HONOR CODE Questions Sheet. A Lets C. [6 Points] 1. What type of address (heap,stack,static,code) does each value evaluate to Book1, Book1->name, Book1->author, &Book2? [4] 2. Will all of the print statements execute as expected? If NO, write print statement which will not execute as expected?[2] B. Mystery [8 Points] 3. When the above code executes, which line is modified? How many times? [2] 4. What is the value of register a6 at the end ? [2] 5. What is the value of register a4 at the end ? [2] 6. In one sentence what is this program calculating ? [2] C. C-to-RISC V Tree Search; Fill in the blanks below [12 points] D. RISCV - The MOD operation [8 points] 19. The data segment starts at address 0x10000000. What are the memory locations modified by this program and what are their values ? E Floating Point [8 points.] 20. What is the smallest nonzero positive value that can be represented? Write your answer as a numerical expression in the answer packet? [2] 21. Consider some positive normalized floating point number where p is represented as: What is the distance (i.e. the difference) between p and the next-largest number after p that can be represented? [2] 22. Now instead let p be a positive denormalized number described asp = 2y x 0.significand. What is the distance between p and the next largest number after p that can be represented? [2] 23. Sort the following minifloat numbers. [2] F. Numbers. [5] 24. What is the smallest number that this system can represent 6 digits (assume unsigned) ? [1] 25. -
A Library for Interval Arithmetic Was Developed
1 Verified Real Number Calculations: A Library for Interval Arithmetic Marc Daumas, David Lester, and César Muñoz Abstract— Real number calculations on elementary functions about a page long and requires the use of several trigonometric are remarkably difficult to handle in mechanical proofs. In this properties. paper, we show how these calculations can be performed within In many cases the formal checking of numerical calculations a theorem prover or proof assistant in a convenient and highly automated as well as interactive way. First, we formally establish is so cumbersome that the effort seems futile; it is then upper and lower bounds for elementary functions. Then, based tempting to perform the calculations out of the system, and on these bounds, we develop a rational interval arithmetic where introduce the results as axioms.1 However, chances are that real number calculations take place in an algebraic setting. In the external calculations will be performed using floating-point order to reduce the dependency effect of interval arithmetic, arithmetic. Without formal checking of the results, we will we integrate two techniques: interval splitting and taylor series expansions. This pragmatic approach has been developed, and never be sure of the correctness of the calculations. formally verified, in a theorem prover. The formal development In this paper we present a set of interactive tools to automat- also includes a set of customizable strategies to automate proofs ically prove numerical properties, such as Formula (1), within involving explicit calculations over real numbers. Our ultimate a proof assistant. The point of departure is a collection of lower goal is to provide guaranteed proofs of numerical properties with minimal human theorem-prover interaction.