Floating-Point Architectures for Energy-Efficient Transprecision Computing
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Type-Safe Composition of Object Modules*
International Conference on Computer Systems and Education I ISc Bangalore Typ esafe Comp osition of Ob ject Mo dules Guruduth Banavar Gary Lindstrom Douglas Orr Department of Computer Science University of Utah Salt LakeCity Utah USA Abstract Intro duction It is widely agreed that strong typing in We describ e a facility that enables routine creases the reliability and eciency of soft typ echecking during the linkage of exter ware However compilers for statically typ ed nal declarations and denitions of separately languages suchasC and C in tradi compiled programs in ANSI C The primary tional nonintegrated programming environ advantage of our serverstyle typ echecked ments guarantee complete typ esafety only linkage facility is the ability to program the within a compilation unit but not across comp osition of ob ject mo dules via a suite of suchunits Longstanding and widely avail strongly typ ed mo dule combination op era able linkers comp ose separately compiled tors Such programmability enables one to units bymatching symb ols purely byname easily incorp orate programmerdened data equivalence with no regard to their typ es format conversion stubs at linktime In ad Such common denominator linkers accom dition our linkage facility is able to automat mo date ob ject mo dules from various source ically generate safe co ercion stubs for com languages by simply ignoring the static se patible encapsulated data mantics of the language Moreover com monly used ob ject le formats are not de signed to incorp orate source language typ e -
Variable Precision in Modern Floating-Point Computing
Variable precision in modern floating-point computing David H. Bailey Lawrence Berkeley Natlional Laboratory (retired) University of California, Davis, Department of Computer Science 1 / 33 August 13, 2018 Questions to examine in this talk I How can we ensure accuracy and reproducibility in floating-point computing? I What types of applications require more than 64-bit precision? I What types of applications require less than 32-bit precision? I What software support is required for variable precision? I How can one move efficiently between precision levels at the user level? 2 / 33 Commonly used formats for floating-point computing Formal Number of bits name Nickname Sign Exponent Mantissa Hidden Digits IEEE 16-bit “IEEE half” 1 5 10 1 3 (none) “ARM half” 1 5 10 1 3 (none) “bfloat16” 1 8 7 1 2 IEEE 32-bit “IEEE single” 1 7 24 1 7 IEEE 64-bit “IEEE double” 1 11 52 1 15 IEEE 80-bit “IEEE extended” 1 15 64 0 19 IEEE 128-bit “IEEE quad” 1 15 112 1 34 (none) “double double” 1 11 104 2 31 (none) “quad double” 1 11 208 4 62 (none) “multiple” 1 varies varies varies varies 3 / 33 Numerical reproducibility in scientific computing A December 2012 workshop on reproducibility in scientific computing, held at Brown University, USA, noted that Science is built upon the foundations of theory and experiment validated and improved through open, transparent communication. ... Numerical round-off error and numerical differences are greatly magnified as computational simulations are scaled up to run on highly parallel systems. -
JHDF5 (HDF5 for Java) 19.04
JHDF5 (HDF5 for Java) 19.04 Introduction HDF5 is an efficient, well-documented, non-proprietary binary data format and library developed and maintained by the HDF Group. The library provided by the HDF Group is written in C and available under a liberal BSD-style Open Source software license. It has over 600 API calls and is very powerful and configurable, but it is not trivial to use. SIS (formerly CISD) has developed an easy-to-use high-level API for HDF5 written in Java and available under the Apache License 2.0 called JHDF5. The API works on top of the low-level API provided by the HDF Group and the files created with the SIS API are fully compatible with HDF5 1.6/1.8/1.10 (as you choose). Table of Content Introduction ................................................................................................................................ 1 Table of Content ...................................................................................................................... 1 Simple Use Case .......................................................................................................................... 2 Overview of the library ............................................................................................................... 2 Numeric Data Types .................................................................................................................... 3 Compound Data Types ................................................................................................................ 4 System -
Faster Ffts in Medium Precision
Faster FFTs in medium precision Joris van der Hoevena, Grégoire Lecerfb Laboratoire d’informatique, UMR 7161 CNRS Campus de l’École polytechnique 1, rue Honoré d’Estienne d’Orves Bâtiment Alan Turing, CS35003 91120 Palaiseau a. Email: [email protected] b. Email: [email protected] November 10, 2014 In this paper, we present new algorithms for the computation of fast Fourier transforms over complex numbers for “medium” precisions, typically in the range from 100 until 400 bits. On the one hand, such precisions are usually not supported by hardware. On the other hand, asymptotically fast algorithms for multiple precision arithmetic do not pay off yet. The main idea behind our algorithms is to develop efficient vectorial multiple precision fixed point arithmetic, capable of exploiting SIMD instructions in modern processors. Keywords: floating point arithmetic, quadruple precision, complexity bound, FFT, SIMD A.C.M. subject classification: G.1.0 Computer-arithmetic A.M.S. subject classification: 65Y04, 65T50, 68W30 1. Introduction Multiple precision arithmetic [4] is crucial in areas such as computer algebra and cryptography, and increasingly useful in mathematical physics and numerical analysis [2]. Early multiple preci- sion libraries appeared in the seventies [3], and nowadays GMP [11] and MPFR [8] are typically very efficient for large precisions of more than, say, 1000 bits. However, for precisions which are only a few times larger than the machine precision, these libraries suffer from a large overhead. For instance, the MPFR library for arbitrary precision and IEEE-style standardized floating point arithmetic is typically about a factor 100 slower than double precision machine arithmetic. -
Harnessing Numerical Flexibility for Deep Learning on Fpgas.Pdf
WHITE PAPER FPGA Inline Acceleration Harnessing Numerical Flexibility for Deep Learning on FPGAs Authors Abstract Andrew C . Ling Deep learning has become a key workload in the data center and the edge, leading [email protected] to a race for dominance in this space. FPGAs have shown they can compete by combining deterministic low latency with high throughput and flexibility. In Mohamed S . Abdelfattah particular, FPGAs bit-level programmability can efficiently implement arbitrary [email protected] precisions and numeric data types critical in the fast evolving field of deep learning. Andrew Bitar In this paper, we explore FPGA minifloat implementations (floating-point [email protected] representations with non-standard exponent and mantissa sizes), and show the use of a block-floating-point implementation that shares the exponent across David Han many numbers, reducing the logic required to perform floating-point operations. [email protected] The paper shows this technique can significantly improve FPGA performance with no impact to accuracy, reduce logic utilization by 3X, and memory bandwidth and Roberto Dicecco capacity required by more than 40%.† [email protected] Suchit Subhaschandra Introduction [email protected] Deep neural networks have proven to be a powerful means to solve some of the Chris N Johnson most difficult computer vision and natural language processing problems since [email protected] their successful introduction to the ImageNet competition in 2012 [14]. This has led to an explosion of workloads based on deep neural networks in the data center Dmitry Denisenko and the edge [2]. [email protected] One of the key challenges with deep neural networks is their inherent Josh Fender computational complexity, where many deep nets require billions of operations [email protected] to perform a single inference. -
Midterm-2020-Solution.Pdf
HONOR CODE Questions Sheet. A Lets C. [6 Points] 1. What type of address (heap,stack,static,code) does each value evaluate to Book1, Book1->name, Book1->author, &Book2? [4] 2. Will all of the print statements execute as expected? If NO, write print statement which will not execute as expected?[2] B. Mystery [8 Points] 3. When the above code executes, which line is modified? How many times? [2] 4. What is the value of register a6 at the end ? [2] 5. What is the value of register a4 at the end ? [2] 6. In one sentence what is this program calculating ? [2] C. C-to-RISC V Tree Search; Fill in the blanks below [12 points] D. RISCV - The MOD operation [8 points] 19. The data segment starts at address 0x10000000. What are the memory locations modified by this program and what are their values ? E Floating Point [8 points.] 20. What is the smallest nonzero positive value that can be represented? Write your answer as a numerical expression in the answer packet? [2] 21. Consider some positive normalized floating point number where p is represented as: What is the distance (i.e. the difference) between p and the next-largest number after p that can be represented? [2] 22. Now instead let p be a positive denormalized number described asp = 2y x 0.significand. What is the distance between p and the next largest number after p that can be represented? [2] 23. Sort the following minifloat numbers. [2] F. Numbers. [5] 24. What is the smallest number that this system can represent 6 digits (assume unsigned) ? [1] 25. -
System Design for a Computational-RAM Logic-In-Memory Parailel-Processing Machine
System Design for a Computational-RAM Logic-In-Memory ParaIlel-Processing Machine Peter M. Nyasulu, B .Sc., M.Eng. A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Doctor of Philosophy Ottaw a-Carleton Ins titute for Eleceical and Computer Engineering, Department of Electronics, Faculty of Engineering, Carleton University, Ottawa, Ontario, Canada May, 1999 O Peter M. Nyasulu, 1999 National Library Biôiiothkque nationale du Canada Acquisitions and Acquisitions et Bibliographie Services services bibliographiques 39S Weiiington Street 395. nie WeUingtm OnawaON KlAW Ottawa ON K1A ON4 Canada Canada The author has granted a non- L'auteur a accordé une licence non exclusive licence allowing the exclusive permettant à la National Library of Canada to Bibliothèque nationale du Canada de reproduce, ban, distribute or seU reproduire, prêter, distribuer ou copies of this thesis in microform, vendre des copies de cette thèse sous paper or electronic formats. la forme de microficbe/nlm, de reproduction sur papier ou sur format électronique. The author retains ownership of the L'auteur conserve la propriété du copyright in this thesis. Neither the droit d'auteur qui protège cette thèse. thesis nor substantial extracts fkom it Ni la thèse ni des extraits substantiels may be printed or otherwise de celle-ci ne doivent être imprimés reproduced without the author's ou autrement reproduits sans son permission. autorisation. Abstract Integrating several 1-bit processing elements at the sense amplifiers of a standard RAM improves the performance of massively-paralle1 applications because of the inherent parallelism and high data bandwidth inside the memory chip. -
A Library for Interval Arithmetic Was Developed
1 Verified Real Number Calculations: A Library for Interval Arithmetic Marc Daumas, David Lester, and César Muñoz Abstract— Real number calculations on elementary functions about a page long and requires the use of several trigonometric are remarkably difficult to handle in mechanical proofs. In this properties. paper, we show how these calculations can be performed within In many cases the formal checking of numerical calculations a theorem prover or proof assistant in a convenient and highly automated as well as interactive way. First, we formally establish is so cumbersome that the effort seems futile; it is then upper and lower bounds for elementary functions. Then, based tempting to perform the calculations out of the system, and on these bounds, we develop a rational interval arithmetic where introduce the results as axioms.1 However, chances are that real number calculations take place in an algebraic setting. In the external calculations will be performed using floating-point order to reduce the dependency effect of interval arithmetic, arithmetic. Without formal checking of the results, we will we integrate two techniques: interval splitting and taylor series expansions. This pragmatic approach has been developed, and never be sure of the correctness of the calculations. formally verified, in a theorem prover. The formal development In this paper we present a set of interactive tools to automat- also includes a set of customizable strategies to automate proofs ically prove numerical properties, such as Formula (1), within involving explicit calculations over real numbers. Our ultimate a proof assistant. The point of departure is a collection of lower goal is to provide guaranteed proofs of numerical properties with minimal human theorem-prover interaction. -
Signedness-Agnostic Program Analysis: Precise Integer Bounds for Low-Level Code
Signedness-Agnostic Program Analysis: Precise Integer Bounds for Low-Level Code Jorge A. Navas, Peter Schachte, Harald Søndergaard, and Peter J. Stuckey Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia Abstract. Many compilers target common back-ends, thereby avoid- ing the need to implement the same analyses for many different source languages. This has led to interest in static analysis of LLVM code. In LLVM (and similar languages) most signedness information associated with variables has been compiled away. Current analyses of LLVM code tend to assume that either all values are signed or all are unsigned (except where the code specifies the signedness). We show how program analysis can simultaneously consider each bit-string to be both signed and un- signed, thus improving precision, and we implement the idea for the spe- cific case of integer bounds analysis. Experimental evaluation shows that this provides higher precision at little extra cost. Our approach turns out to be beneficial even when all signedness information is available, such as when analysing C or Java code. 1 Introduction The “Low Level Virtual Machine” LLVM is rapidly gaining popularity as a target for compilers for a range of programming languages. As a result, the literature on static analysis of LLVM code is growing (for example, see [2, 7, 9, 11, 12]). LLVM IR (Intermediate Representation) carefully specifies the bit- width of all integer values, but in most cases does not specify whether values are signed or unsigned. This is because, for most operations, two’s complement arithmetic (treating the inputs as signed numbers) produces the same bit-vectors as unsigned arithmetic. -
GNU MPFR the Multiple Precision Floating-Point Reliable Library Edition 4.1.0 July 2020
GNU MPFR The Multiple Precision Floating-Point Reliable Library Edition 4.1.0 July 2020 The MPFR team [email protected] This manual documents how to install and use the Multiple Precision Floating-Point Reliable Library, version 4.1.0. Copyright 1991, 1993-2020 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back- Cover Texts. A copy of the license is included in Appendix A [GNU Free Documentation License], page 59. i Table of Contents MPFR Copying Conditions ::::::::::::::::::::::::::::::::::::::: 1 1 Introduction to MPFR :::::::::::::::::::::::::::::::::::::::: 2 1.1 How to Use This Manual::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 2 2 Installing MPFR ::::::::::::::::::::::::::::::::::::::::::::::: 3 2.1 How to Install ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 3 2.2 Other `make' Targets :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 4 2.3 Build Problems :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 4 2.4 Getting the Latest Version of MPFR ::::::::::::::::::::::::::::::::::::::::::::::: 4 3 Reporting Bugs::::::::::::::::::::::::::::::::::::::::::::::::: 5 4 MPFR Basics ::::::::::::::::::::::::::::::::::::::::::::::::::: 6 4.1 Headers and Libraries :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 6 -
Sun 64-Bit Binary Alignment Proposal
1 KMIP 64-bit Binary Alignment Proposal 2 3 To: OASIS KMIP Technical Committee 4 From: Matt Ball, Sun Microsystems, Inc. 5 Date: May 1, 2009 6 Version: 1 7 Purpose: To propose a change to the binary encoding such that each part is aligned to an 8-byte 8 boundary 9 10 Revision History 11 Version 1, 2009-05-01: Initial version 12 Introduction 13 The binary encoding as defined in the 1.0 version of the KMIP draft does not maintain alignment to 8-byte 14 boundaries within the message structure. This causes problems on hard-aligned processors, such as the 15 ARM, that are not able to easily access memory on addresses that are not aligned to 4 bytes. 16 Additionally, it reduces performance on modern 64-bit processors. For hard-aligned processors, when 17 unaligned memory contents are requested, either the compiler has to add extra instructions to perform 18 two aligned memory accesses and reassemble the data, or the processor has to take a „trap‟ (i.e., an 19 interrupt generated on unaligned memory accesses) to correctly assemble the memory contents. Either 20 of these options results in reduced performance. On soft-aligned processors, the hardware has to make 21 two memory accesses instead of one when the contents are not properly aligned. 22 This proposal suggests ways to improve the performance on hard-aligned processors by aligning all data 23 structures to 8-byte boundaries. 24 Summary of Proposed Changes 25 This proposal includes the following changes to the KMIP 0.98 draft submission to the OASIS KMIP TC: 26 Change the alignment of the KMIP binary encoding such that each part is aligned to an 8-byte 27 boundary. -
IEEE Standard 754 for Binary Floating-Point Arithmetic
Work in Progress: Lecture Notes on the Status of IEEE 754 October 1, 1997 3:36 am Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic Prof. W. Kahan Elect. Eng. & Computer Science University of California Berkeley CA 94720-1776 Introduction: Twenty years ago anarchy threatened floating-point arithmetic. Over a dozen commercially significant arithmetics boasted diverse wordsizes, precisions, rounding procedures and over/underflow behaviors, and more were in the works. “Portable” software intended to reconcile that numerical diversity had become unbearably costly to develop. Thirteen years ago, when IEEE 754 became official, major microprocessor manufacturers had already adopted it despite the challenge it posed to implementors. With unprecedented altruism, hardware designers had risen to its challenge in the belief that they would ease and encourage a vast burgeoning of numerical software. They did succeed to a considerable extent. Anyway, rounding anomalies that preoccupied all of us in the 1970s afflict only CRAY X-MPs — J90s now. Now atrophy threatens features of IEEE 754 caught in a vicious circle: Those features lack support in programming languages and compilers, so those features are mishandled and/or practically unusable, so those features are little known and less in demand, and so those features lack support in programming languages and compilers. To help break that circle, those features are discussed in these notes under the following headings: Representable Numbers, Normal and Subnormal, Infinite