Computer Architecture and System Programming Laboratory
Recommended publications
Arithmetic Algorithms for Extended Precision Using Floating-Point Expansions Mioara Joldes, Olivier Marty, Jean-Michel Muller, Valentina Popescu
To cite this version: Mioara Joldes, Olivier Marty, Jean-Michel Muller, Valentina Popescu. Arithmetic algorithms for extended precision using floating-point expansions. IEEE Transactions on Computers, Institute of Electrical and Electronics Engineers, 2016, 65 (4), pp. 1197-1210. 10.1109/TC.2015.2441714. hal-01111551v2
HAL Id: hal-01111551, https://hal.archives-ouvertes.fr/hal-01111551v2, submitted on 2 Jun 2015.
Abstract: Many numerical problems require a higher computing precision than the one offered by standard floating-point (FP) formats. One common way of extending the precision is to represent numbers in a multiple-component format. By using the so-called floating-point expansions, real numbers are represented as the unevaluated sum of standard machine-precision FP numbers. This representation offers the simplicity of using directly available, hardware-implemented and highly optimized FP operations.
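The idea of an "unevaluated sum" of machine floats can be illustrated with the classic TwoSum error-free transformation, which is a common building block for such expansions. The sketch below is an illustration only, not the algorithms of the paper; the function name `two_sum` and the sample inputs are chosen here for the example.

```python
from fractions import Fraction


def two_sum(a, b):
    """Return (s, e) where s is the rounded sum a + b and e is the rounding error.

    For IEEE doubles (and no overflow), a + b == s + e holds exactly,
    so the pair (s, e) is a two-term floating-point expansion of a + b.
    """
    s = a + b
    b_virtual = s - a          # the part of b that actually made it into s
    a_virtual = s - b_virtual  # the part of a that actually made it into s
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


# 1e-20 is far below the precision of 1.0, so a plain sum loses it;
# the expansion keeps it in the second component.
hi, lo = two_sum(1.0, 1e-20)

# Check exactness with rational arithmetic (Fraction converts floats exactly).
s, e = two_sum(0.1, 0.2)
exact = Fraction(0.1) + Fraction(0.2)
```

Here `hi` holds the rounded result and `lo` the part that a single double would have discarded; together they carry more precision than one machine number.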
Floating Points
Jin-Soo Kim ([email protected]), Systems Software & Architecture Lab., Seoul National University
4190.308: Computer Architecture | Fall 2018 | Floating Points
▪ How to represent fractional values with a finite number of bits?
  • 0.1
  • 0.612
  • 3.14159265358979323846264338327950288...
▪ Wide ranges of numbers
  • 1 light-year = 9,460,730,472,580.8 km
  • The radius of a hydrogen atom: 0.000000000025 m
▪ Representation
  • Bits to the right of the "binary point" represent fractional powers of 2
  • The bit string $b_i b_{i-1} \cdots b_2 b_1 b_0 \,.\, b_{-1} b_{-2} b_{-3} \cdots b_{-j}$, with weights $2^i, 2^{i-1}, \ldots, 4, 2, 1$ to the left of the binary point and $1/2, 1/4, 1/8, \ldots, 2^{-j}$ to the right, represents the rational number $\sum_{k=-j}^{i} b_k \times 2^k$
▪ Examples:
  Value    Representation
  5 3/4    101.11 (base 2)
  2 7/8    10.111 (base 2)
  63/64    0.111111 (base 2)
▪ Observations
  • Divide by 2 by shifting right
  • Multiply by 2 by shifting left
  • Numbers of the form 0.111111... (base 2) are just below 1.0
    – 1/2 + 1/4 + 1/8 + … + 1/2^i + … → 1.0
    – Use notation 1.0 – ε
▪ Representable numbers
  • Can only exactly represent numbers of the form x / 2^k
  • Other numbers have repeating bit representations
  Value    Representation
  1/3      0.0101010101[01]… (base 2)
  1/5      0.001100110011[0011]… (base 2)
  1/10     0.0001100110011[0011]… (base 2)
Fixed Points
▪ p.q fixed-point representation
  • Use the rightmost q bits of an integer to represent a fraction
  • Example: 17.14 fixed-point representation
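The slide examples can be checked with exact rational arithmetic. The following is a minimal sketch; the helper names `binary_fraction_value` and `is_exact_in_binary` are made up for this illustration and do not come from the slides.

```python
from fractions import Fraction


def binary_fraction_value(bits):
    """Exact value of a binary string with an optional binary point, e.g. '101.11'."""
    int_part, _, frac_part = bits.partition(".")
    value = Fraction(int(int_part, 2) if int_part else 0)
    # The k-th bit to the right of the binary point has weight 2**-k.
    for k, bit in enumerate(frac_part, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** k)
    return value


def is_exact_in_binary(x):
    """True if x has the form n / 2**k, i.e. a finite binary expansion."""
    d = x.denominator  # Fraction keeps this in lowest terms
    return d & (d - 1) == 0  # power-of-two test


five_three_quarters = binary_fraction_value("101.11")
two_seven_eighths = binary_fraction_value("10.111")
just_below_one = binary_fraction_value("0.111111")
```

`is_exact_in_binary(Fraction(1, 10))` is false, which is why 0.1 can only be approximated by a finite bit string, while 5 3/4 is represented exactly.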
Hacking in C 2020 the C Programming Language Thom Wiggers
Hacking in C 2020: The C programming language. Thom Wiggers
Table of Contents: Introduction; Undefined behaviour; Abstracting away from bytes in memory; Integer representations
The C programming language
• Invented by Dennis Ritchie in 1972–1973
  – Another predecessor is B.
• Not one of the first programming languages: ALGOL, for example, is older.
• Closely tied to the development of the Unix operating system
• Unix and Linux are mostly written in C
• Compilers are widely available for many, many, many platforms
• Still in development: the latest release of the standard is C18. Popular versions are C99 and C11.
• Many compilers implement extensions, leading to versions such as gnu18, gnu11.
• Default version in GCC: gnu11
Floating Point Formats
Telemetry Standard RCC Document 106-07, Appendix O, September 2007
APPENDIX O: FLOATING POINT FORMATS
Paragraph  Title
1.0   Introduction
2.0   IEEE 32 Bit Single Precision Floating Point
3.0   IEEE 64 Bit Double Precision Floating Point
4.0   MIL STD 1750A 32 Bit Single Precision Floating Point
5.0   MIL STD 1750A 48 Bit Double Precision Floating Point
6.0   DEC 32 Bit Single Precision Floating Point
7.0   DEC 64 Bit Double Precision Floating Point
8.0   IBM 32 Bit Single Precision Floating Point
9.0   IBM 64 Bit Double Precision Floating Point
10.0  TI (Texas Instruments) 32 Bit Single Precision Floating Point
11.0  TI (Texas Instruments) 40 Bit Extended Precision Floating Point
LIST OF TABLES
Table O-1. Floating Point Formats
Unit – I Computer Architecture and Operating System – Scs1315
SCHOOL OF ELECTRICAL AND ELECTRONICS
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
UNIT – I COMPUTER ARCHITECTURE AND OPERATING SYSTEM – SCS1315
UNIT 1: INTRODUCTION
Central Processing Unit - Introduction - General Register Organization - Stack Organization - Basic Computer Organization - Computer Registers - Computer Instructions - Instruction Cycle. Arithmetic, Logic, Shift Microoperations - Arithmetic Logic Shift Unit - Example Architectures: MIPS, PowerPC, RISC, CISC
Central Processing Unit
The part of the computer that performs the bulk of data-processing operations is called the central processing unit (CPU). The CPU is made up of three major parts, as shown in Fig. 1.
Fig. 1. Major components of CPU.
The register set stores intermediate data used during the execution of the instructions. The arithmetic logic unit (ALU) performs the required microoperations for executing the instructions. The control unit supervises the transfer of information among the registers and instructs the ALU as to which operation to perform.
General Register Organization
When a large number of registers are included in the CPU, it is most efficient to connect them through a common bus system. The registers communicate with each other not only for direct data transfers, but also while performing various microoperations. Hence it is necessary to provide a common unit that can perform all the arithmetic, logic, and shift microoperations in the processor. A bus organization for seven CPU registers is shown in Fig. 2. The output of each register is connected to two multiplexers (MUX) to form the two buses A and B. The selection lines in each multiplexer select one register or the input data for the particular bus. The A and B buses form the inputs to a common arithmetic logic unit (ALU).
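The bus organization described above can be mimicked in a few lines of software. This is a toy model under stated assumptions: the selector encoding (0 for the external input, 1 to 7 for R1 to R7), the set of ALU operations, and the names `mux` and `micro_op` are all invented for the illustration and are not taken from the course notes.

```python
import operator

# A small, assumed set of ALU operations for the illustration.
ALU_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}


def mux(select, registers, input_data):
    """Model one bus multiplexer: 0 selects the external input, 1..7 select R1..R7."""
    return input_data if select == 0 else registers[select - 1]


def micro_op(registers, sel_a, sel_b, op, dest, input_data=0):
    """Drive buses A and B from the two multiplexers, run the ALU, store into R<dest>."""
    bus_a = mux(sel_a, registers, input_data)
    bus_b = mux(sel_b, registers, input_data)
    result = ALU_OPS[op](bus_a, bus_b)
    registers[dest - 1] = result
    return result


regs = [0, 5, 7, 0, 0, 0, 0]                 # R1..R7
total = micro_op(regs, 2, 3, "add", 1)       # R1 <- R2 + R3
loaded = micro_op(regs, 0, 1, "or", 4, 8)    # R4 <- input OR R1
```

Each call corresponds to one microoperation: two source selections, one ALU function, and one destination register.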
S.D.M COLLEGE of ENGINEERING and TECHNOLOGY Sridhar Y
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
S.D.M COLLEGE OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE ENGINEERING
A seminar report on CUDA
Submitted by Sridhar Y, USN 2SD06CS108, 8th semester, 2009-10
CERTIFICATE
Certified that the seminar work entitled “CUDA” is a bonafide work presented by Sridhar Y, bearing USN 2SD06CS108, in partial fulfillment for the award of the degree of Bachelor of Engineering in Computer Science Engineering of the Visvesvaraya Technological University, Belgaum, during the year 2009-10. The seminar report has been approved as it satisfies the academic requirements with respect to seminar work presented for the Bachelor of Engineering degree.
Staff in charge    H.O.D CSE
Contents
1. Introduction
2. Evolution of GPU programming and CUDA
3. CUDA Structure for parallel processing
4. Programming model of CUDA
5. Portability and Security of the code
6. Managing Threads with CUDA
7. Elimination of Deadlocks in CUDA
8. Data distribution among the Thread Processes in CUDA
9. Challenges in CUDA for the Developers
10. The Pros and Cons of CUDA
11. Conclusions
Bibliography
Abstract
Parallel processing on multicore processors is the industry’s biggest software challenge, but the real problem is that there are too many solutions. One of them is Nvidia’s Compute Unified Device Architecture (CUDA), a software platform for massively parallel high-performance computing on the powerful Graphics Processing Units (GPUs).
Extended Precision Floating Point Arithmetic
Extended Precision Floating Point Numbers for Ill-Conditioned Problems
Daniel Davis, Advisor: Dr. Scott Sarra, Department of Mathematics, Marshall University
Floating Point Number Systems
Floating point representation is based on scientific notation, where a nonzero real decimal number, x, is expressed as $x = \pm S \times 10^{E}$, where $1 \le S < 10$. The values of S and E are known as the significand and exponent, respectively. When discussing floating point numbers, we are interested in the computer representation of numbers, so we must consider base 2, or binary, rather than base 10.
Precision
• The precision, p, of a floating point number system is the number of bits in the significand.
• This means that any normalized floating point number with precision p can be written as: $x = \pm (1.b_1 b_2 \ldots b_{p-2} b_{p-1})_2 \times 2^{E}$
• The smallest x such that x > 1 is then: ...
Extended Precision
An increasing number of problems exist for which IEEE double precision is insufficient. These include modeling of dynamical systems, such as our solar system or the climate, and numerical cryptography. Several arbitrary precision libraries have been developed, but they are too slow to be practical for many complex applications. David Bailey’s QD library may be used ...
Patriot Missile Failure
Patriot missile defense modules have been used by the U.S. Army since the 1980s. On February 25, 1991, a Patriot protecting an Army barracks in Dhahran, Saudi Arabia, failed to intercept a Scud missile, leading to the death of 28 Americans. This failure was caused by floating point rounding error. The system measured time in tenths of seconds, using ...
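The definition of precision in the poster excerpt can be checked against Python's own floats, which are IEEE doubles with p = 53 on common platforms. The sketch below is illustrative; `smallest_above_one` is a name invented here, and the formula $1 + 2^{-(p-1)}$ follows from the normalized form with p − 1 fraction bits rather than from the truncated excerpt.

```python
import sys


def smallest_above_one(p):
    """Smallest number greater than 1 in a binary system with p significand bits.

    With the form (1.b1 b2 ... b_{p-1})_2 x 2^E, the last fraction bit has
    weight 2**-(p-1), so the successor of 1 is 1 + 2**-(p-1).
    Exact in Python floats only for p <= 53.
    """
    return 1.0 + 2.0 ** -(p - 1)


p_double = sys.float_info.mant_dig      # number of significand bits of a Python float
eps_double = sys.float_info.epsilon     # gap between 1.0 and the next float

next_double = smallest_above_one(p_double)
next_single = smallest_above_one(24)    # IEEE single precision has p = 24

# Adding only half the gap to 1.0 rounds back to 1.0 (round-to-nearest-even).
rounded_back = 1.0 + eps_double / 2
```

The gap above 1.0 is what limits the accuracy of ill-conditioned computations in double precision, which motivates the extended precision formats discussed in the poster.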
Lecture 6: Instruction Set Architecture and the 80x86
Professor Randy H. Katz, Computer Science 252, Spring 1996 (RHK.S96)
Review From Last Time
• Given that sales are a function of performance relative to the competition, there is tremendous investment in improving the product as reported by the performance summary
• Good products are created when we have:
  – Good benchmarks
  – Good ways to summarize performance
• Without good benchmarks and a good summary, the choice is between improving the product for real programs and improving the product to get more sales; sales almost always wins
• Time is the measure of computer performance!
• What about cost?
Review: Integrated Circuits Costs
$\text{IC cost} = \dfrac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$
$\text{Die cost} = \dfrac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$
$\text{Dies per wafer} = \dfrac{\pi \times (\text{Wafer\_diam}/2)^2}{\text{Die\_Area}} - \dfrac{\pi \times \text{Wafer\_diam}}{\sqrt{2 \times \text{Die\_Area}}} - \text{Test dies}$
$\text{Die yield} = \text{Wafer yield} \times \left\{ 1 + \dfrac{\text{Defects\_per\_unit\_area} \times \text{Die\_Area}}{\alpha} \right\}^{-\alpha}$
Die cost goes roughly with $(\text{die area})^4$
Review From Last Time: Price vs. Cost
[Figure: two bar charts for Mini, W/S, and PC systems, one in percent of price and one as a multiple of component cost, each split into component costs, direct costs, gross margin, and average discount]
Today: Instruction Set Architecture
• 1950s to 1960s: Computer Architecture Course = Computer Arithmetic
• 1970 to mid 1980s: Computer Architecture Course = Instruction Set Design, especially ISA appropriate for compilers
• 1990s: Computer Architecture Course = Design of CPU, memory system, I/O system, multiprocessors
Computer Architecture?
... the attributes of a [computing] system as seen by the programmer, i.e.
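The integrated-circuit cost relations reviewed in the lecture translate directly into code. This is a sketch under assumptions: it uses the common textbook yield model with a parameter alpha, whose default of 3.0 here is an illustrative value, and the function names and sample numbers are invented for the example rather than taken from the slides.

```python
import math


def dies_per_wafer(wafer_diam, die_area, test_dies=0):
    """Wafer area over die area, minus dies lost along the round edge and test dies."""
    return (math.pi * (wafer_diam / 2) ** 2 / die_area
            - math.pi * wafer_diam / math.sqrt(2 * die_area)
            - test_dies)


def die_yield(wafer_yield, defects_per_unit_area, die_area, alpha=3.0):
    """Fraction of good dies; alpha is an assumed process-complexity parameter."""
    return wafer_yield * (1 + defects_per_unit_area * die_area / alpha) ** (-alpha)


def die_cost(wafer_cost, wafer_diam, die_area, defects_per_unit_area,
             wafer_yield=1.0, test_dies=0, alpha=3.0):
    """Wafer cost spread over the good dies on the wafer."""
    good_dies = (dies_per_wafer(wafer_diam, die_area, test_dies)
                 * die_yield(wafer_yield, defects_per_unit_area, die_area, alpha))
    return wafer_cost / good_dies


def ic_cost(die, testing, packaging, final_test_yield):
    """Total cost per working packaged part."""
    return (die + testing + packaging) / final_test_yield


# Hypothetical numbers: 20 cm wafer, 1 defect per cm^2, dies of 1 and 2 cm^2.
small = die_cost(1000.0, 20.0, 1.0, 1.0)
large = die_cost(1000.0, 20.0, 2.0, 1.0)
```

Doubling the die area in this hypothetical case more than doubles the die cost, because fewer dies fit on the wafer and a smaller fraction of them work.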
Low-Power Microprocessor Based on Stack Architecture
Master’s Thesis
Low-power Microprocessor based on Stack Architecture
Girish Aramanekoppa Subbarao
Series of Master’s theses, Department of Electrical and Information Technology, LU/LTH-EIT 2015-464, http://www.eit.lth.se
Department of Electrical and Information Technology, Faculty of Engineering, LTH, Lund University, September 2015.
Supervisors: Prof. Joachim Rodrigues, Prof. Anders Ardö
© The Department of Electrical and Information Technology, Lund University, Box 118, S-221 00 LUND, SWEDEN. This thesis is set in Computer Modern 10pt, with the LaTeX Documentation System. © Girish Aramanekoppa Subbarao 2015. Printed in E-huset, Lund, Sweden, Sep. 2015.
Abstract
Microprocessors are used in many embedded applications where power efficiency is a critical requirement, e.g. wearable or mobile devices in healthcare, space instrumentation and handheld devices. One method of achieving low-power operation is to simplify the device architecture. RISC/CISC processors consume considerable power because of their complexity, which is due to the multiplexer system connecting the register file to the functional units and to their instruction pipeline. Stack machines, on the other hand, are comparatively less complex because of their implied addressing of the top two registers of the stack and their smaller operation codes. This simplifies the instruction and address decoder circuits by eliminating the multiplexer switches for the read and write ports of the register file.
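The "implied addressing" of a stack machine means that arithmetic instructions name no operands: they always consume the top two stack entries. The toy interpreter below sketches that idea; the instruction names and the `run` function are invented for the illustration and do not describe the instruction set of the thesis.

```python
import operator

# Operand-free arithmetic instructions: each pops two values and pushes one.
BINARY_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


def run(program):
    """Execute a list of instruction tuples and return the final stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])       # the only instruction carrying an operand
        elif op in BINARY_OPS:
            right = stack.pop()          # top of stack
            left = stack.pop()           # next on stack
            stack.append(BINARY_OPS[op](left, right))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


# (2 + 3) * 4 with no register names anywhere in the arithmetic instructions.
result = run([("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)])
difference = run([("push", 10), ("push", 4), ("sub",)])
```

Because `add`, `sub` and `mul` carry no register fields, their encodings can be short and no register-select multiplexers are needed to fetch operands, which is the simplification the abstract points to.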
A Variable Precision Hardware Acceleration for Scientific Computing Andrea Bocco
To cite this version: Andrea Bocco. A variable precision hardware acceleration for scientific computing. Discrete Mathematics [cs.DM]. Université de Lyon, 2020. English. NNT: 2020LYSEI065. tel-03102749
HAL Id: tel-03102749, https://tel.archives-ouvertes.fr/tel-03102749, submitted on 7 Jan 2021.
Order number NNT: 2020LYSEI065
DOCTORAL THESIS OF THE UNIVERSITÉ DE LYON, prepared at CEA Grenoble
Doctoral school: InfoMaths, ED No. 512 (Computer Science and Mathematics)
Doctoral specialty: Computer Science
Publicly defended on 29/07/2020 by Andrea Bocco
A variable precision hardware acceleration for scientific computing
Jury:
Frédéric Pétrot, President and Reviewer, Professor, TIMA, Grenoble, France
Marc Dumas, Reviewer, Professor, École Normale Supérieure de Lyon, France
Nathalie Revol, Examiner, Doctor, École Normale Supérieure de Lyon, France
Fabrizio Ferrandi, Examiner, Associate Professor, Politecnico di Milano, Italy
Florent de Dinechin, Thesis supervisor, Professor, INSA Lyon, France
Yves Durand, Thesis co-supervisor, Doctor, CEA Grenoble, France
This thesis is available at: http://theses.insa-lyon.fr/publication/2020LYSEI065/these.pdf © [A.
x86-64 Machine-Level Programming
Randal E. Bryant, David R. O'Hallaron, September 9, 2005
Intel’s IA32 instruction set architecture (ISA), colloquially known as “x86”, is the dominant instruction format for the world’s computers. IA32 is the platform of choice for most Windows and Linux machines. The ISA we use today was defined in 1985 with the introduction of the i386 microprocessor, extending the 16-bit instruction set defined by the original 8086 to 32 bits. Even though subsequent processor generations have introduced new instruction types and formats, many compilers, including GCC, have avoided using these features in the interest of maintaining backward compatibility.
A shift is underway to a 64-bit version of the Intel instruction set. Originally developed by Advanced Micro Devices (AMD) and named x86-64, it is now supported by high-end processors from AMD (who now call it AMD64) and by Intel, who refer to it as EM64T. Most people still refer to it as “x86-64,” and we follow this convention. Newer versions of Linux and GCC support this extension. In making this switch, the developers of GCC saw an opportunity to also make use of some of the instruction-set features that had been added in more recent generations of IA32 processors. This combination of new hardware and revised compiler makes x86-64 code substantially different in form and in performance from IA32 code.
In creating the 64-bit extension, the AMD engineers also adopted some of the features found in reduced instruction set computers (RISC) [7] that made them the favored targets for optimizing compilers.
Effectiveness of Floating-Point Precision on the Numerical Approximation by Spectral Methods
Mathematical and Computational Applications, Article
Effectiveness of Floating-Point Precision on the Numerical Approximation by Spectral Methods
José A. O. Matos 1,2,† and Paulo B. Vasconcelos 1,2,*,†
1 Center of Mathematics, University of Porto, R. Dr. Roberto Frias, 4200-464 Porto, Portugal; [email protected]
2 Faculty of Economics, University of Porto, R. Dr. Roberto Frias, 4200-464 Porto, Portugal
* Correspondence: [email protected]
† These authors contributed equally to this work.
Abstract: With the fast advances in computational sciences, there is a need for more accurate computations, especially in large-scale solutions of differential problems and long-term simulations. Amid the many numerical approaches to solving differential problems, including both local and global methods, spectral methods can offer greater accuracy. The downside is that spectral methods often require high-order polynomial approximations, which bring numerical instability issues to the problem resolution. In particular, large condition numbers associated with the large operational matrices prevent stable algorithms from working within machine precision. Software-based solutions that implement arbitrary precision arithmetic are available and should be explored to obtain higher accuracy when needed, despite the associated higher computing time. In this work, experimental results on the computation of approximate solutions of differential problems via spectral methods are detailed with recourse to quadruple precision arithmetic. Variable precision arithmetic was used in Tau Toolbox, a mathematical software package to solve integro-differential problems via the spectral Tau method.
Keywords: floating-point arithmetic; variable precision arithmetic; IEEE 754-2008 standard; quadru-
Citation: Matos, J.A.O.; Vasconcelos, P.B.