CUDA Math API

Total Page:16

File Type:pdf, Size:1020Kb

CUDA Math API CUDA Math API API Reference Manual v11.4 | September 2021 Table of Contents Chapter 1. Modules.............................................................................................................. 1 1.1. Half Precision Intrinsics........................................................................................................... 2 Half Arithmetic Functions............................................................................................................2 Half2 Arithmetic Functions..........................................................................................................2 Half Comparison Functions......................................................................................................... 2 Half2 Comparison Functions....................................................................................................... 2 Half Precision Conversion and Data Movement.........................................................................2 Half Math Functions.....................................................................................................................2 Half2 Math Functions...................................................................................................................2 1.1.1. Half Arithmetic Functions..................................................................................................2 __habs........................................................................................................................................2 __hadd....................................................................................................................................... 3 __hadd_sat................................................................................................................................ 3 __hdiv.........................................................................................................................................3 __hfma....................................................................................................................................... 4 __hfma_relu.............................................................................................................................. 4 __hfma_sat................................................................................................................................ 4 __hmul....................................................................................................................................... 5 __hmul_sat................................................................................................................................5 __hneg....................................................................................................................................... 6 __hsub........................................................................................................................................6 __hsub_sat................................................................................................................................ 6 1.1.2. Half2 Arithmetic Functions................................................................................................6 __h2div....................................................................................................................................... 7 __habs2......................................................................................................................................7 __hadd2......................................................................................................................................7 __hadd2_sat.............................................................................................................................. 7 __hcmadd.................................................................................................................................. 8 __hfma2..................................................................................................................................... 9 __hfma2_relu............................................................................................................................ 9 __hfma2_sat............................................................................................................................ 10 __hmul2................................................................................................................................... 10 __hmul2_sat............................................................................................................................ 11 __hneg2....................................................................................................................................11 __hsub2....................................................................................................................................11 __hsub2_sat.............................................................................................................................12 CUDA Math API vRelease Version | ii 1.1.3. Half Comparison Functions............................................................................................. 12 __heq........................................................................................................................................12 __hequ..................................................................................................................................... 13 __hge........................................................................................................................................13 __hgeu..................................................................................................................................... 14 __hgt........................................................................................................................................ 14 __hgtu...................................................................................................................................... 15 __hisinf.....................................................................................................................................15 __hisnan...................................................................................................................................16 __hle.........................................................................................................................................16 __hleu...................................................................................................................................... 16 __hlt......................................................................................................................................... 17 __hltu....................................................................................................................................... 17 __hmax.....................................................................................................................................18 __hmax_nan............................................................................................................................ 18 __hmin..................................................................................................................................... 18 __hmin_nan............................................................................................................................. 19 __hne....................................................................................................................................... 19 __hneu..................................................................................................................................... 19 1.1.4. Half2 Comparison Functions........................................................................................... 20 __hbeq2....................................................................................................................................20 __hbequ2................................................................................................................................. 21 __hbge2....................................................................................................................................21 __hbgeu2................................................................................................................................. 22 __hbgt2.................................................................................................................................... 22 __hbgtu2.................................................................................................................................. 23 __hble2.....................................................................................................................................24 __hbleu2.................................................................................................................................. 24 __hblt2..................................................................................................................................... 25 __hbltu2................................................................................................................................... 25 __hbne2....................................................................................................................................26 __hbneu2................................................................................................................................
Recommended publications
  • Avoiding the Undefined by Underspecification
    Avoiding the Undefined by Underspecification David Gries* and Fred B. Schneider** Computer Science Department, Cornell University Ithaca, New York 14853 USA Abstract. We use the appeal of simplicity and an aversion to com- plexity in selecting a method for handling partial functions in logic. We conclude that avoiding the undefined by using underspecification is the preferred choice. 1 Introduction Everything should be made as simple as possible, but not any simpler. --Albert Einstein The first volume of Springer-Verlag's Lecture Notes in Computer Science ap- peared in 1973, almost 22 years ago. The field has learned a lot about computing since then, and in doing so it has leaned on and supported advances in other fields. In this paper, we discuss an aspect of one of these fields that has been explored by computer scientists, handling undefined terms in formal logic. Logicians are concerned mainly with studying logic --not using it. Some computer scientists have discovered, by necessity, that logic is actually a useful tool. Their attempts to use logic have led to new insights --new axiomatizations, new proof formats, new meta-logical notions like relative completeness, and new concerns. We, the authors, are computer scientists whose interests have forced us to become users of formal logic. We use it in our work on language definition and in the formal development of programs. This paper is born of that experience. The operation of a computer is nothing more than uninterpreted symbol ma- nipulation. Bits are moved and altered electronically, without regard for what they denote. It iswe who provide the interpretation, saying that one bit string represents money and another someone's name, or that one transformation rep- resents addition and another alphabetization.
    [Show full text]
  • Fortran 90 Overview
    1 Fortran 90 Overview J.E. Akin, Copyright 1998 This overview of Fortran 90 (F90) features is presented as a series of tables that illustrate the syntax and abilities of F90. Frequently comparisons are made to similar features in the C++ and F77 languages and to the Matlab environment. These tables show that F90 has significant improvements over F77 and matches or exceeds newer software capabilities found in C++ and Matlab for dynamic memory management, user defined data structures, matrix operations, operator definition and overloading, intrinsics for vector and parallel pro- cessors and the basic requirements for object-oriented programming. They are intended to serve as a condensed quick reference guide for programming in F90 and for understanding programs developed by others. List of Tables 1 Comment syntax . 4 2 Intrinsic data types of variables . 4 3 Arithmetic operators . 4 4 Relational operators (arithmetic and logical) . 5 5 Precedence pecking order . 5 6 Colon Operator Syntax and its Applications . 5 7 Mathematical functions . 6 8 Flow Control Statements . 7 9 Basic loop constructs . 7 10 IF Constructs . 8 11 Nested IF Constructs . 8 12 Logical IF-ELSE Constructs . 8 13 Logical IF-ELSE-IF Constructs . 8 14 Case Selection Constructs . 9 15 F90 Optional Logic Block Names . 9 16 GO TO Break-out of Nested Loops . 9 17 Skip a Single Loop Cycle . 10 18 Abort a Single Loop . 10 19 F90 DOs Named for Control . 10 20 Looping While a Condition is True . 11 21 Function definitions . 11 22 Arguments and return values of subprograms . 12 23 Defining and referring to global variables .
    [Show full text]
  • Quick Overview: Complex Numbers
    Quick Overview: Complex Numbers February 23, 2012 1 Initial Definitions Definition 1 The complex number z is defined as: z = a + bi (1) p where a, b are real numbers and i = −1. Remarks about the definition: • Engineers typically use j instead of i. • Examples of complex numbers: p 5 + 2i; 3 − 2i; 3; −5i • Powers of i: i2 = −1 i3 = −i i4 = 1 i5 = i i6 = −1 i7 = −i . • All real numbers are also complex (by taking b = 0). 2 Visualizing Complex Numbers A complex number is defined by it's two real numbers. If we have z = a + bi, then: Definition 2 The real part of a + bi is a, Re(z) = Re(a + bi) = a The imaginary part of a + bi is b, Im(z) = Im(a + bi) = b 1 Im(z) 4i 3i z = a + bi 2i r b 1i θ Re(z) a −1i Figure 1: Visualizing z = a + bi in the complex plane. Shown are the modulus (or length) r and the argument (or angle) θ. To visualize a complex number, we use the complex plane C, where the horizontal (or x-) axis is for the real part, and the vertical axis is for the imaginary part. That is, a + bi is plotted as the point (a; b). In Figure 1, we can see that it is also possible to represent the point a + bi, or (a; b) in polar form, by computing its modulus (or size), and angle (or argument): p r = jzj = a2 + b2 θ = arg(z) We have to be a bit careful defining φ, since there are many ways to write φ (and we could add multiples of 2π as well).
    [Show full text]
  • A Quick Algebra Review
    A Quick Algebra Review 1. Simplifying Expressions 2. Solving Equations 3. Problem Solving 4. Inequalities 5. Absolute Values 6. Linear Equations 7. Systems of Equations 8. Laws of Exponents 9. Quadratics 10. Rationals 11. Radicals Simplifying Expressions An expression is a mathematical “phrase.” Expressions contain numbers and variables, but not an equal sign. An equation has an “equal” sign. For example: Expression: Equation: 5 + 3 5 + 3 = 8 x + 3 x + 3 = 8 (x + 4)(x – 2) (x + 4)(x – 2) = 10 x² + 5x + 6 x² + 5x + 6 = 0 x – 8 x – 8 > 3 When we simplify an expression, we work until there are as few terms as possible. This process makes the expression easier to use, (that’s why it’s called “simplify”). The first thing we want to do when simplifying an expression is to combine like terms. For example: There are many terms to look at! Let’s start with x². There Simplify: are no other terms with x² in them, so we move on. 10x x² + 10x – 6 – 5x + 4 and 5x are like terms, so we add their coefficients = x² + 5x – 6 + 4 together. 10 + (-5) = 5, so we write 5x. -6 and 4 are also = x² + 5x – 2 like terms, so we can combine them to get -2. Isn’t the simplified expression much nicer? Now you try: x² + 5x + 3x² + x³ - 5 + 3 [You should get x³ + 4x² + 5x – 2] Order of Operations PEMDAS – Please Excuse My Dear Aunt Sally, remember that from Algebra class? It tells the order in which we can complete operations when solving an equation.
    [Show full text]
  • Fortran Math Special Functions Library
    IMSL® Fortran Math Special Functions Library Version 2021.0 Copyright 1970-2021 Rogue Wave Software, Inc., a Perforce company. Visual Numerics, IMSL, and PV-WAVE are registered trademarks of Rogue Wave Software, Inc., a Perforce company. IMPORTANT NOTICE: Information contained in this documentation is subject to change without notice. Use of this docu- ment is subject to the terms and conditions of a Rogue Wave Software License Agreement, including, without limitation, the Limited Warranty and Limitation of Liability. ACKNOWLEDGMENTS Use of the Documentation and implementation of any of its processes or techniques are the sole responsibility of the client, and Perforce Soft- ware, Inc., assumes no responsibility and will not be liable for any errors, omissions, damage, or loss that might result from any use or misuse of the Documentation PERFORCE SOFTWARE, INC. MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THE DOCUMENTATION. THE DOCU- MENTATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. PERFORCE SOFTWARE, INC. HEREBY DISCLAIMS ALL WARRANTIES AND CONDITIONS WITH REGARD TO THE DOCUMENTATION, WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PAR- TICULAR PURPOSE, OR NONINFRINGEMENT. IN NO EVENT SHALL PERFORCE SOFTWARE, INC. BE LIABLE, WHETHER IN CONTRACT, TORT, OR OTHERWISE, FOR ANY SPECIAL, CONSEQUENTIAL, INDIRECT, PUNITIVE, OR EXEMPLARY DAMAGES IN CONNECTION WITH THE USE OF THE DOCUMENTATION. The Documentation is subject to change at any time without notice. IMSL https://www.imsl.com/ Contents Introduction The IMSL Fortran Numerical Libraries . 1 Getting Started . 2 Finding the Right Routine . 3 Organization of the Documentation . 4 Naming Conventions .
    [Show full text]
  • Undefinedness and Soft Typing in Formal Mathematics
    Undefinedness and Soft Typing in Formal Mathematics PhD Research Proposal by Jonas Betzendahl January 7, 2020 Supervisors: Prof. Dr. Michael Kohlhase Dr. Florian Rabe Abstract During my PhD, I want to contribute towards more natural reasoning in formal systems for mathematics (i.e. closer resembling that of actual mathematicians), with special focus on two topics on the precipice between type systems and logics in formal systems: undefinedness and soft types. Undefined terms are common in mathematics as practised on paper or blackboard by math- ematicians, yet few of the many systems for formalising mathematics that exist today deal with it in a principled way or allow explicit reasoning with undefined terms because allowing for this often results in undecidability. Soft types are a way of allowing the user of a formal system to also incorporate information about mathematical objects into the type system that had to be proven or computed after their creation. This approach is equally a closer match for mathematics as performed by mathematicians and has had promising results in the past. The MMT system constitutes a natural fit for this endeavour due to its rapid prototyping ca- pabilities and existing infrastructure. However, both of the aspects above necessitate a stronger support for automated reasoning than currently available. Hence, a further goal of mine is to extend the MMT framework with additional capabilities for automated and interactive theorem proving, both for reasoning in and outside the domains of undefinedness and soft types. 2 Contents 1 Introduction & Motivation 4 2 State of the Art 5 2.1 Undefinedness in Mathematics .
    [Show full text]
  • Complex Analysis Class 24: Wednesday April 2
    Complex Analysis Math 214 Spring 2014 Fowler 307 MWF 3:00pm - 3:55pm c 2014 Ron Buckmire http://faculty.oxy.edu/ron/math/312/14/ Class 24: Wednesday April 2 TITLE Classifying Singularities using Laurent Series CURRENT READING Zill & Shanahan, §6.2-6.3 HOMEWORK Zill & Shanahan, §6.2 3, 15, 20, 24 33*. §6.3 7, 8, 9, 10. SUMMARY We shall be introduced to Laurent Series and learn how to use them to classify different various kinds of singularities (locations where complex functions are no longer analytic). Classifying Singularities There are basically three types of singularities (points where f(z) is not analytic) in the complex plane. Isolated Singularity An isolated singularity of a function f(z) is a point z0 such that f(z) is analytic on the punctured disc 0 < |z − z0| <rbut is undefined at z = z0. We usually call isolated singularities poles. An example is z = i for the function z/(z − i). Removable Singularity A removable singularity is a point z0 where the function f(z0) appears to be undefined but if we assign f(z0) the value w0 with the knowledge that lim f(z)=w0 then we can say that we z→z0 have “removed” the singularity. An example would be the point z = 0 for f(z) = sin(z)/z. Branch Singularity A branch singularity is a point z0 through which all possible branch cuts of a multi-valued function can be drawn to produce a single-valued function. An example of such a point would be the point z = 0 for Log (z).
    [Show full text]
  • Control Systems
    ECE 380: Control Systems Course Notes: Winter 2014 Prof. Shreyas Sundaram Department of Electrical and Computer Engineering University of Waterloo ii c Shreyas Sundaram Acknowledgments Parts of these course notes are loosely based on lecture notes by Professors Daniel Liberzon, Sean Meyn, and Mark Spong (University of Illinois), on notes by Professors Daniel Davison and Daniel Miller (University of Waterloo), and on parts of the textbook Feedback Control of Dynamic Systems (5th edition) by Franklin, Powell and Emami-Naeini. I claim credit for all typos and mistakes in the notes. The LATEX template for The Not So Short Introduction to LATEX 2" by T. Oetiker et al. was used to typeset portions of these notes. Shreyas Sundaram University of Waterloo c Shreyas Sundaram iv c Shreyas Sundaram Contents 1 Introduction 1 1.1 Dynamical Systems . .1 1.2 What is Control Theory? . .2 1.3 Outline of the Course . .4 2 Review of Complex Numbers 5 3 Review of Laplace Transforms 9 3.1 The Laplace Transform . .9 3.2 The Inverse Laplace Transform . 13 3.2.1 Partial Fraction Expansion . 13 3.3 The Final Value Theorem . 15 4 Linear Time-Invariant Systems 17 4.1 Linearity, Time-Invariance and Causality . 17 4.2 Transfer Functions . 18 4.2.1 Obtaining the transfer function of a differential equation model . 20 4.3 Frequency Response . 21 5 Bode Plots 25 5.1 Rules for Drawing Bode Plots . 26 5.1.1 Bode Plot for Ko ....................... 27 5.1.2 Bode Plot for sq ....................... 28 s −1 s 5.1.3 Bode Plot for ( p + 1) and ( z + 1) .
    [Show full text]
  • Division by Zero in Logic and Computing Jan Bergstra
    Division by Zero in Logic and Computing Jan Bergstra To cite this version: Jan Bergstra. Division by Zero in Logic and Computing. 2021. hal-03184956v2 HAL Id: hal-03184956 https://hal.archives-ouvertes.fr/hal-03184956v2 Preprint submitted on 19 Apr 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. DIVISION BY ZERO IN LOGIC AND COMPUTING JAN A. BERGSTRA Abstract. The phenomenon of division by zero is considered from the per- spectives of logic and informatics respectively. Division rather than multi- plicative inverse is taken as the point of departure. A classification of views on division by zero is proposed: principled, physics based principled, quasi- principled, curiosity driven, pragmatic, and ad hoc. A survey is provided of different perspectives on the value of 1=0 with for each view an assessment view from the perspectives of logic and computing. No attempt is made to survey the long and diverse history of the subject. 1. Introduction In the context of rational numbers the constants 0 and 1 and the operations of addition ( + ) and subtraction ( − ) as well as multiplication ( · ) and division ( = ) play a key role. When starting with a binary primitive for subtraction unary opposite is an abbreviation as follows: −x = 0 − x, and given a two-place division function unary inverse is an abbreviation as follows: x−1 = 1=x.
    [Show full text]
  • IEEE Standard 754 for Binary Floating-Point Arithmetic
    Work in Progress: Lecture Notes on the Status of IEEE 754 October 1, 1997 3:36 am Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic Prof. W. Kahan Elect. Eng. & Computer Science University of California Berkeley CA 94720-1776 Introduction: Twenty years ago anarchy threatened floating-point arithmetic. Over a dozen commercially significant arithmetics boasted diverse wordsizes, precisions, rounding procedures and over/underflow behaviors, and more were in the works. “Portable” software intended to reconcile that numerical diversity had become unbearably costly to develop. Thirteen years ago, when IEEE 754 became official, major microprocessor manufacturers had already adopted it despite the challenge it posed to implementors. With unprecedented altruism, hardware designers had risen to its challenge in the belief that they would ease and encourage a vast burgeoning of numerical software. They did succeed to a considerable extent. Anyway, rounding anomalies that preoccupied all of us in the 1970s afflict only CRAY X-MPs — J90s now. Now atrophy threatens features of IEEE 754 caught in a vicious circle: Those features lack support in programming languages and compilers, so those features are mishandled and/or practically unusable, so those features are little known and less in demand, and so those features lack support in programming languages and compilers. To help break that circle, those features are discussed in these notes under the following headings: Representable Numbers, Normal and Subnormal, Infinite
    [Show full text]
  • A Practical Introduction to Python Programming
    A Practical Introduction to Python Programming Brian Heinold Department of Mathematics and Computer Science Mount St. Mary’s University ii ©2012 Brian Heinold Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported Li- cense Contents I Basics1 1 Getting Started 3 1.1 Installing Python..............................................3 1.2 IDLE......................................................3 1.3 A first program...............................................4 1.4 Typing things in...............................................5 1.5 Getting input.................................................6 1.6 Printing....................................................6 1.7 Variables...................................................7 1.8 Exercises...................................................9 2 For loops 11 2.1 Examples................................................... 11 2.2 The loop variable.............................................. 13 2.3 The range function............................................ 13 2.4 A Trickier Example............................................. 14 2.5 Exercises................................................... 15 3 Numbers 19 3.1 Integers and Decimal Numbers.................................... 19 3.2 Math Operators............................................... 19 3.3 Order of operations............................................ 21 3.4 Random numbers............................................. 21 3.5 Math functions............................................... 21 3.6 Getting
    [Show full text]
  • A Fast-Start Method for Computing the Inverse Tangent
    A Fast-Start Method for Computing the Inverse Tangent Peter Markstein Hewlett-Packard Laboratories 1501 Page Mill Road Palo Alto, CA 94062, U.S.A. [email protected] Abstract duction method is handled out-of-loop because its use is rarely required.) In a search for an algorithm to compute atan(x) For short latency, it is often desirable to use many in- which has both low latency and few floating point in- structions. Different sequences may be optimal over var- structions, an interesting variant of familiar trigonom- ious subdomains of the function, and in each subdomain, etry formulas was discovered that allow the start of ar- instruction level parallelism, such as Estrin’s method of gument reduction to commence before any references to polynomial evaluation [3] reduces latency at the cost of tables stored in memory are needed. Low latency makes additional intermediate calculations. the method suitable for a closed subroutine, and few In searching for an inverse tangent algorithm, an un- floating point operations make the method advantageous usual method was discovered that permits a floating for a software-pipelined implementation. point argument reduction computation to begin at the first cycle, without first examining a table of values. While a table is used, the table access is overlapped with 1Introduction most of the argument reduction, helping to contain la- tency. The table is designed to allow a low degree odd Short latency and high throughput are two conflict- polynomial to complete the approximation. A new ap- ing goals in transcendental function evaluation algo- proach to the logarithm routine [1] which motivated this rithm design.
    [Show full text]