Improving Particle Filter Performance Using SSE Instructions

Total Page:16

File Type:pdf, Size:1020Kb

Improving Particle Filter Performance Using SSE Instructions To appear in Proceedings of IROS 2009-IEEE/RSJ International Conference on Intelligent RObots and Systems (IROS 09), St. Louis, USA, October 2009. Improving Particle Filter Performance Using SSE Instructions Peter Djeu and Michael Quinlan and Peter Stone Abstract— Robotics researchers are often faced with real- advantage of the SSE vector unit in developing his or her time constraints, and for that reason algorithmic and code/algorithms. implementation-level optimization can dramatically increase the In this paper we present ground-up SSE implementations overall performance of a robot. In this paper we illustrate how a substantial run-time gain can be achieved by taking advantage of key functions required in the robotics domain (atan, atan2, of the extended instruction sets found in modern processors, exp) and apply them to a commonly used and practical in particular the SSE1 and SSE2 instruction sets. We present robotics algorithm: Monte Carlo Localization (MCL) [3]. In an SSE version of Monte Carlo Localization that results in an our implementation we achieve an impressive 9x speedup, impressive 9x speedup over an optimized scalar implementation. surpassing the ideal speedup of 4x. In the process, we discuss SSE implementations of atan, atan2 and exp that achieve up to a 4x speedup in these mathematical While this improvement in and of itself is noteworthy, one operations alone. of the main objectives of this paper is to encourage and also instruct the robotics community on how to implement its own I. INTRODUCTION algorithms using the SSE instruction set. To help facilitate In robotics, it is often critical to process sensory data or this task we are releasing the full source code for our MCL to compute decisions in real-time, that is to say at the rate implementation, including the SSE components. at which sensory data is captured. For example, a robot with The remainder of this paper is structured as follows: Sec- a camera that captures frames at 30 Hz should complete tion II provides an overview of the basics of SSE. Section III its sense-act loop at the same rate or risk missing sensing introduces the Monte Carlo Localization algorithm and steps opportunities. Sometimes this constrains the processing al- through the construction of the SSE version, with particular gorithms that can be used (e.g. Hough transforms may be focus on the development of the SSE math extensions (such ruled out entirely) and sometimes it constrains the quality as atan2). Section IV will review and discuss the run-time of the processing (e.g. the number of particles that can be performance results of both the overall MCL algorithm and processed in a particle filtering algorithm). As a result, an the individual math operations. We will conclude and present important component of robotics is code streamlining and future work in Section V. optimization. A common source of untapped potential for II. USING THE SSE INSRUCTION SET code optimization is the use of the vector unit on CPUs to perform efficient and parallel computation. Traditional CPU instructions take single values as their Since 19981 most modern processors, including desktop operands. A typical floating point addition instruction has and notebook/netbook processors, can perform SIMD (Single the following form, which we call 1-wide addition. § Instruction Multiple Data) operations which allow a single f l o a t a = 1.0f; instruction stream to drive parallel computation. In particular f l o a t b = 5.0f; f l o a t c=a+b; // c: 6.0f both Intel and AMD support the SSE (Streaming SIMD ¦ ¥ Extension) instruction set and the follow-up instruction set In SSE, the data types are expanded from ‘float’ to SSE2 via an SSE vector unit [1]. Using the SSE vector unit ‘sse4Floats’2, which include the values of 4 separate floating allows the CPU to perform 4-wide operations in place of 1- points. When an addition is performed on two SSE operands, wide operations; four additions can be performed in the time each of the 4 elements is independently summed. In other it usually takes to perform one. Under ideal settings, an SSE words, element 0 in the first operand is added to element vector implementation can be up to 4x faster than the scalar 0 in the second operand, element 1 in the first operand is implementation, and in cases such as the ones described in added to element 1 in the second operand, and so on. We this paper, it can be even faster. will refer to this operation as a 4-wide addition. However, developing code that effectively uses the SSE In the following example, we will use a C++ constructor instruction set has generally been restricted to graphics for sse4Floats that takes 4 floating point values. The ‘+’ researchers or to the developers of heavily optimized libraries operator is overloaded so that if both operands are sse4Floats, for specific tasks such as vision [2] and linear algebra [4]. the SSE unit is invoked to perform a 4-wide addition. While these libraries provide excellent performance on their § sse4Floats a = sse4Floats (1.0f, 2.0f, 3.0f, 4.0f); intended tasks, the average roboticist has failed to take sse4Floats b = sse4Floats (5.0f, 6.0f, 7.0f, 8.0f); sse4Floats c=a+b; // c: (6.0f, 8.0f, 10.0f, 12.0f) All authors are in the Department of Computer Science, The University of ¦ ¥ Texas at Austin. djeu,mquinlan,[email protected] 1In 1998 AMD released the K6-2 processors, this was followed in 1999 2The built-in SSE data type is actually ‘ m128’, however we encapsulate by Intel’s Pentium III processors this type into a C++ class, which we call ‘sse4Floats’. We will use the term 1-wide and 4-wide to refer to § sse4Floats a = (1.0f, 0.0f, −1.0f , −2.0f ); mathematical operations that are performed analogously to sse4Floats b = (0.0f, 0.0f, 0.0f, 0.0f); the addition operations. The term scalar will be used in- sseMask mask = ( a >= b); // mask: (MASK TRUE , MASK TRUE , MASK FALSE, MASK FALSE ) terchangeably with 1-wide and vector interchangeably with ¦ ¥ 4-wide. This sseMask can then be used in a 4-wide blend operation Apart from 4-wide floating point addition, the SSE vector which takes three operands: an sseMask, a true operand, and unit also provides support for other arithmetic operations, a false operand. The return value is an sse4Floats that is such as subtraction, multiplication, division, minimum, max- set to the value of the true operand at all elements where imum, and bitwise operations. The vector unit supports the sseMask had value MASK TRUE. The return value is vectors of four 32-bit integers (aka ints) and has built-in set to the false operand at all other locations. The operators support for addition, subtraction, and bitwise operations. ‘&’, ‘|’, and ‘∼’ are overloaded to invoke the corresponding Processors which support later versions of SSE also pro- bitwise operations in the SSE unit. vide multiplication, minimum, and maximum operations for § sse4Floats blend4 ( sseMask mask , sse4Floats i n true , integers. sse4Floats i n f a l s e ) { return (mask & in t r u e ) | ( ∼mask & in false ); A. SSE Masks } One difficulty that occurs when using SSE is that a single ¦ ¥ instruction stream must be used for all processing. This The following is an SSE implementation of absolute value. is especially troubling when implementing algorithms that The ‘>=’ operator is overloaded to invoke the greater-than- would normally require branching. Consider the following or-equal-to comparison operator in the SSE unit, while the ‘-’ example, which returns the absolute value of f. operator is overloaded to invoke the unary negation operator § in the SSE unit. f l o a t abs ( f l o a t f ) { return ( f >= 0.0f) ? f : −f ; } § ¦ ¥ sse4Floats abs ( sse4Floats f ) { When implementing a 4-wide version of absolute value for sse4Floats z e r o s = sse4Floats (0.0f, 0.0f, 0.0f, 0.0f); sseMask i s n o n n e g = ( f >= zeros); an sse4Floats consisting of the values (-1.0f, -2.0f, +3.0f, return blend4(is n o n neg, f, −f ) ; +4.0f), the first 2 elements need to be negated, while the } ¦ ¥ last 2 elements do not. However, we must use the same instruction stream on all elements as per the definition of To summarize, masking allows us to conditionally perform SIMD. operations on only certain elements within an sse4Floats The solution is to evaluate both sides of the branch and rather than on all elements. Masking comes with a cost, to mask out the result from one branch and mask in the however, since both the true input and the false input must result from the other. The two results are then combined be evaluated whenever a mask is used. In scalar code, work and only the masked-in result survives. We now revisit the can often be saved by descending down only one branch of a scalar implementation to support this branching paradigm conditional, while in pure SSE code both sides of the branch using bitwise operations. For brevity, the implementation of must always be taken, even in cases where the sseMask the two reinterpretation helper functions are not shown. consists of 4 MASK TRUE’s or 4 MASK FALSE’s. § i n t reint f 2 i ( f l o a t f ) ; // reinterpret float as int B. Reduction Operations f l o a t reint i 2 f ( i n t i ) ; // reinterpret int as float f l o a t abs ( f l o a t f ) { It is often necessary to perform an operation which reduces i n t MASK TRUE = 0xffffffff; the 4 values within an sse4Floats to a single floating point i n t MASK FALSE = 0x00000000; i n t mask = ( f >= 0 .
Recommended publications
  • Fortran 90 Overview
    1 Fortran 90 Overview J.E. Akin, Copyright 1998 This overview of Fortran 90 (F90) features is presented as a series of tables that illustrate the syntax and abilities of F90. Frequently comparisons are made to similar features in the C++ and F77 languages and to the Matlab environment. These tables show that F90 has significant improvements over F77 and matches or exceeds newer software capabilities found in C++ and Matlab for dynamic memory management, user defined data structures, matrix operations, operator definition and overloading, intrinsics for vector and parallel pro- cessors and the basic requirements for object-oriented programming. They are intended to serve as a condensed quick reference guide for programming in F90 and for understanding programs developed by others. List of Tables 1 Comment syntax . 4 2 Intrinsic data types of variables . 4 3 Arithmetic operators . 4 4 Relational operators (arithmetic and logical) . 5 5 Precedence pecking order . 5 6 Colon Operator Syntax and its Applications . 5 7 Mathematical functions . 6 8 Flow Control Statements . 7 9 Basic loop constructs . 7 10 IF Constructs . 8 11 Nested IF Constructs . 8 12 Logical IF-ELSE Constructs . 8 13 Logical IF-ELSE-IF Constructs . 8 14 Case Selection Constructs . 9 15 F90 Optional Logic Block Names . 9 16 GO TO Break-out of Nested Loops . 9 17 Skip a Single Loop Cycle . 10 18 Abort a Single Loop . 10 19 F90 DOs Named for Control . 10 20 Looping While a Condition is True . 11 21 Function definitions . 11 22 Arguments and return values of subprograms . 12 23 Defining and referring to global variables .
    [Show full text]
  • Quick Overview: Complex Numbers
    Quick Overview: Complex Numbers February 23, 2012 1 Initial Definitions Definition 1 The complex number z is defined as: z = a + bi (1) p where a, b are real numbers and i = −1. Remarks about the definition: • Engineers typically use j instead of i. • Examples of complex numbers: p 5 + 2i; 3 − 2i; 3; −5i • Powers of i: i2 = −1 i3 = −i i4 = 1 i5 = i i6 = −1 i7 = −i . • All real numbers are also complex (by taking b = 0). 2 Visualizing Complex Numbers A complex number is defined by it's two real numbers. If we have z = a + bi, then: Definition 2 The real part of a + bi is a, Re(z) = Re(a + bi) = a The imaginary part of a + bi is b, Im(z) = Im(a + bi) = b 1 Im(z) 4i 3i z = a + bi 2i r b 1i θ Re(z) a −1i Figure 1: Visualizing z = a + bi in the complex plane. Shown are the modulus (or length) r and the argument (or angle) θ. To visualize a complex number, we use the complex plane C, where the horizontal (or x-) axis is for the real part, and the vertical axis is for the imaginary part. That is, a + bi is plotted as the point (a; b). In Figure 1, we can see that it is also possible to represent the point a + bi, or (a; b) in polar form, by computing its modulus (or size), and angle (or argument): p r = jzj = a2 + b2 θ = arg(z) We have to be a bit careful defining φ, since there are many ways to write φ (and we could add multiples of 2π as well).
    [Show full text]
  • Fortran Math Special Functions Library
    IMSL® Fortran Math Special Functions Library Version 2021.0 Copyright 1970-2021 Rogue Wave Software, Inc., a Perforce company. Visual Numerics, IMSL, and PV-WAVE are registered trademarks of Rogue Wave Software, Inc., a Perforce company. IMPORTANT NOTICE: Information contained in this documentation is subject to change without notice. Use of this docu- ment is subject to the terms and conditions of a Rogue Wave Software License Agreement, including, without limitation, the Limited Warranty and Limitation of Liability. ACKNOWLEDGMENTS Use of the Documentation and implementation of any of its processes or techniques are the sole responsibility of the client, and Perforce Soft- ware, Inc., assumes no responsibility and will not be liable for any errors, omissions, damage, or loss that might result from any use or misuse of the Documentation PERFORCE SOFTWARE, INC. MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THE DOCUMENTATION. THE DOCU- MENTATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. PERFORCE SOFTWARE, INC. HEREBY DISCLAIMS ALL WARRANTIES AND CONDITIONS WITH REGARD TO THE DOCUMENTATION, WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PAR- TICULAR PURPOSE, OR NONINFRINGEMENT. IN NO EVENT SHALL PERFORCE SOFTWARE, INC. BE LIABLE, WHETHER IN CONTRACT, TORT, OR OTHERWISE, FOR ANY SPECIAL, CONSEQUENTIAL, INDIRECT, PUNITIVE, OR EXEMPLARY DAMAGES IN CONNECTION WITH THE USE OF THE DOCUMENTATION. The Documentation is subject to change at any time without notice. IMSL https://www.imsl.com/ Contents Introduction The IMSL Fortran Numerical Libraries . 1 Getting Started . 2 Finding the Right Routine . 3 Organization of the Documentation . 4 Naming Conventions .
    [Show full text]
  • IEEE Standard 754 for Binary Floating-Point Arithmetic
    Work in Progress: Lecture Notes on the Status of IEEE 754 October 1, 1997 3:36 am Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic Prof. W. Kahan Elect. Eng. & Computer Science University of California Berkeley CA 94720-1776 Introduction: Twenty years ago anarchy threatened floating-point arithmetic. Over a dozen commercially significant arithmetics boasted diverse wordsizes, precisions, rounding procedures and over/underflow behaviors, and more were in the works. “Portable” software intended to reconcile that numerical diversity had become unbearably costly to develop. Thirteen years ago, when IEEE 754 became official, major microprocessor manufacturers had already adopted it despite the challenge it posed to implementors. With unprecedented altruism, hardware designers had risen to its challenge in the belief that they would ease and encourage a vast burgeoning of numerical software. They did succeed to a considerable extent. Anyway, rounding anomalies that preoccupied all of us in the 1970s afflict only CRAY X-MPs — J90s now. Now atrophy threatens features of IEEE 754 caught in a vicious circle: Those features lack support in programming languages and compilers, so those features are mishandled and/or practically unusable, so those features are little known and less in demand, and so those features lack support in programming languages and compilers. To help break that circle, those features are discussed in these notes under the following headings: Representable Numbers, Normal and Subnormal, Infinite
    [Show full text]
  • A Practical Introduction to Python Programming
    A Practical Introduction to Python Programming Brian Heinold Department of Mathematics and Computer Science Mount St. Mary’s University ii ©2012 Brian Heinold Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported Li- cense Contents I Basics1 1 Getting Started 3 1.1 Installing Python..............................................3 1.2 IDLE......................................................3 1.3 A first program...............................................4 1.4 Typing things in...............................................5 1.5 Getting input.................................................6 1.6 Printing....................................................6 1.7 Variables...................................................7 1.8 Exercises...................................................9 2 For loops 11 2.1 Examples................................................... 11 2.2 The loop variable.............................................. 13 2.3 The range function............................................ 13 2.4 A Trickier Example............................................. 14 2.5 Exercises................................................... 15 3 Numbers 19 3.1 Integers and Decimal Numbers.................................... 19 3.2 Math Operators............................................... 19 3.3 Order of operations............................................ 21 3.4 Random numbers............................................. 21 3.5 Math functions............................................... 21 3.6 Getting
    [Show full text]
  • A Fast-Start Method for Computing the Inverse Tangent
    A Fast-Start Method for Computing the Inverse Tangent Peter Markstein Hewlett-Packard Laboratories 1501 Page Mill Road Palo Alto, CA 94062, U.S.A. [email protected] Abstract duction method is handled out-of-loop because its use is rarely required.) In a search for an algorithm to compute atan(x) For short latency, it is often desirable to use many in- which has both low latency and few floating point in- structions. Different sequences may be optimal over var- structions, an interesting variant of familiar trigonom- ious subdomains of the function, and in each subdomain, etry formulas was discovered that allow the start of ar- instruction level parallelism, such as Estrin’s method of gument reduction to commence before any references to polynomial evaluation [3] reduces latency at the cost of tables stored in memory are needed. Low latency makes additional intermediate calculations. the method suitable for a closed subroutine, and few In searching for an inverse tangent algorithm, an un- floating point operations make the method advantageous usual method was discovered that permits a floating for a software-pipelined implementation. point argument reduction computation to begin at the first cycle, without first examining a table of values. While a table is used, the table access is overlapped with 1Introduction most of the argument reduction, helping to contain la- tency. The table is designed to allow a low degree odd Short latency and high throughput are two conflict- polynomial to complete the approximation. A new ap- ing goals in transcendental function evaluation algo- proach to the logarithm routine [1] which motivated this rithm design.
    [Show full text]
  • Appendix a Mathematical Fundamentals
    Appendix A Mathematical Fundamentals In this book, it is assumed that the reader has familiarity with the mathematical definitions from the areas of Linear Algebra, Trigonometry, and other related areas. Some of them that are introduced in this Appendix to make the reader quickly understand the derivations and notations used in different chapters of this book. A.1 Function ‘Atan2’ The usual inverse tangent function denoted by Atan(z), where z=y/x, returns an angle in the range (-π/2, π/2). In order to express the full range of angles it is useful to define the so called two- argument arctangent function denoted by Atan2(y,x), which returns angle in the entire range, i.e., (- π, π). This function is defined for all (x,y)≠0, and equals the unique angle θ such that x y cosθ = , and sinθ = …(A.1) x 2 + y 2 x 2 + y 2 The function uses the signs of x and y to select the appropriate quadrant for the angle θ, as explained in Table A.1. Table A.1 Evaluation of ‘Atan2’ function x Atan2(y,x) +ve Atan(z) 0 Sgn(y) π/2 -ve Atan(z)+ Sgn(y) π In Table A.1, z=y/x, and Sgn(.) denotes the usual sign function, i.e., its value is -1, 0, or 1 depending on the positive, zero, and negative values of y, respectively. However, if both x and y are zeros, ‘Atan2’ is undefined. Now, using the table, Atan2(-1,1)= -π/4 and Atan2(1,-1)= 3π/4, whereas the Atan(-1) returns -π/4 in both the cases.
    [Show full text]
  • Hardware Implementations of Fixed-Point Atan2
    Hardware Implementations of Fixed-Point Atan2 Hardware Implementations of Fixed-Point Atan2 Florent de Dinechin Matei I¸stoan Universit´ede Lyon, INRIA, INSA-Lyon, CITI-Lab ARITH22 Florent de Dinechin, Matei I¸stoan Hardware Implementations of Fixed-Point Atan2 Hardware Implementations of Fixed-Point Atan2 Introduction: Methods for computing Atan2 Methods for Computing atan2 in Hardware Yet another arithmetic function . • . that is useful in telecom (to recover the phase of a signal) (12{24 bits of precision) • . and in general for cartesian to polar coordinate transformation • and an interesting function, nonetheless 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 1 0.8 1 0.6 0.8 0.4 x 0.6 0.4 0.2 0.2 y 0 0 Florent de Dinechin, Matei I¸stoan Hardware Implementations of Fixed-Point Atan2 ARITH22 2 / 24 Hardware Implementations of Fixed-Point Atan2 Introduction: Methods for computing Atan2 Common Specification • target function y 1 (x; y) 1 y α = atan2(y; x) f (x; y) = arctan ( ) π x −1 1 x • input: fixed-point format −1 −1 0 1 ky y arctan ( ) = arctan ( ) [ ) kx x • output: fixed-point format and binary angles y (0; 1) π 2 −1 0 1 (−1; 0) π 0 (1; 0) −π x =) [ ) Florent de Dinechin, Matei I¸stoan Hardware Implementations of Fixed-Point Atan2 ARITH22 3 / 24 Hardware Implementations of Fixed-Point Atan2 Introduction: Methods for computing Atan2 Common Specification • target function y 1 (x; y) 1 y α = atan2(y; x) f (x; y) = arctan ( ) π x −1 1 x • input: fixed-point format −1 −1 0 1 ky y arctan ( ) = arctan ( ) [ ) kx x • output: fixed-point format and binary
    [Show full text]
  • Taichi Documentation Release 0.6.26
    taichi Documentation Release 0.6.26 Taichi Developers Aug 12, 2020 Overview 1 Why new programming language1 1.1 Design decisions.............................................1 2 Installation 3 2.1 Troubleshooting.............................................3 2.1.1 Windows issues.........................................3 2.1.2 Python issues..........................................3 2.1.3 CUDA issues..........................................4 2.1.4 OpenGL issues.........................................4 2.1.5 Linux issues...........................................5 2.1.6 Other issues...........................................5 3 Hello, world! 7 3.1 import taichi as ti.............................................8 3.2 Portability................................................8 3.3 Fields...................................................9 3.4 Functions and kernels..........................................9 3.5 Parallel for-loops............................................. 10 3.6 Interacting with other Python packages................................. 11 3.6.1 Python-scope data access.................................... 11 3.6.2 Sharing data with other packages................................ 11 4 Syntax 13 4.1 Taichi-scope vs Python-scope...................................... 13 4.2 Kernels.................................................. 13 4.2.1 Arguments........................................... 14 4.2.2 Return value........................................... 14 4.2.3 Advanced arguments...................................... 15 4.3 Functions................................................
    [Show full text]
  • Trigonometric Functions
    Title stata.com Trigonometric functions Contents Functions Reference Also see Contents acos(x) the radian value of the arccosine of x acosh(x) the inverse hyperbolic cosine of x asin(x) the radian value of the arcsine of x asinh(x) the inverse hyperbolic sine of x atan(x) the radian value of the arctangent of x atan2(y, x) the radian value of the arctangent of y=x, where the signs of the parameters y and x are used to determine the quadrant of the answer atanh(x) the inverse hyperbolic tangent of x cos(x) the cosine of x, where x is in radians cosh(x) the hyperbolic cosine of x sin(x) the sine of x, where x is in radians sinh(x) the hyperbolic sine of x tan(x) the tangent of x, where x is in radians tanh(x) the hyperbolic tangent of x Functions acos(x) Description: the radian value of the arccosine of x Domain: −1 to 1 Range: 0 to π acosh(x) Description: the inverse hyperbolic cosinep of x acosh(x) = ln(x + x2 − 1) Domain: 1 to 8.9e+307 Range: 0 to 709.77 asin(x) Description: the radian value of the arcsine of x Domain: −1 to 1 Range: −π=2 to π=2 asinh(x) Description: the inverse hyperbolic sinep of x asinh(x) = ln(x + x2 + 1) Domain: −8.9e+307 to 8.9e+307 Range: −709.77 to 709.77 1 2 Trigonometric functions atan(x) Description: the radian value of the arctangent of x Domain: −8e+307 to 8e+307 Range: −π=2 to π=2 atan2(y, x) Description: the radian value of the arctangent of y=x, where the signs of the parameters y and x are used to determine the quadrant of the answer Domain y: −8e+307 to 8e+307 Domain x: −8e+307 to 8e+307 Range: −π to
    [Show full text]
  • Hardware Implementations of Fixed-Point Atan2 Florent De Dinechin, Matei Istoan
    Hardware implementations of fixed-point Atan2 Florent de Dinechin, Matei Istoan To cite this version: Florent de Dinechin, Matei Istoan. Hardware implementations of fixed-point Atan2. 22nd IEEE Symposium on Computer Arithmetic, Jun 2015, Lyon, France. hal-01091138 HAL Id: hal-01091138 https://hal.inria.fr/hal-01091138 Submitted on 4 Dec 2014 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License Hardware implementations of fixed-point Atan2 Florent de Dinechin, Matei Istoan Universite´ de Lyon, INRIA, INSA-Lyon, CITI-INRIA, F-69621, Villeurbanne, France Abstract—The atan2 function computes the polar angle y arctan(x/y) of a point given by its cartesian coordinates. 1 It is widely used in digital signal processing to recover (x,y) the phase of a signal. This article studies for this context α = atan2(y,x) the implementation of atan2 with fixed-point inputs and outputs. It compares the prevalent CORDIC shift-and-add -1 1 x algorithm to two multiplier-based techniques. The first one reduces the bivariate atan2 function to two functions of one variable: the reciprocal, and the arctangent.
    [Show full text]
  • [6.48] Use R▷P () for Four-Quadrant Arc Tangent Function
    [6.48] Use R▶P✕ () for four-quadrant arc tangent function The built-in arc tangent function, tan-1(), cannot return the correct quadrant of an angle specified by x- and y-coordinates, because the argument does not contain enough information. Suppose we want the angle between the x-axis and a ray with origin (0,0) and passing through point (1,1). Since y − tan( ) = we havetan( ) = 1 or= 1( ) so = ✜ ✕ x ✕ 1 ✕ tan 1 ✕ 4 However, if the ray instead passes through (-1,-1), we get the same result since (-1/-1) is also equal to 1, but the angle is actually (−3✜)/4. This difficulty was addressed by the designers of the Fortran programming language, which includes a function called atan2(y,x) to find the arc tangent of y/x correctly in any of the four quadrants, by accounting for the signs of x and y. It is a simple matter to accomplish this in TI Basic with the built-in R▶Pθ() function, if we account for the special case (0,0). In fact, it can be done with a single line of TI Basic code: when(x=0 and y=0,undef,R▶Pθ(x,y)) undef is returned for (0,0), otherwise R▶Pθ(0,0) returns itself in Radian angle mode, and this expression in Degree angle mode: 180*R▶Pθ(0,0)/Œ Neither of these results are useful. A more elaborate function can be written which also handles list and matrix arguments: atan2(αx,αy) Func ©(x,y) 4-quadrant arctan(y/x) ©Must be installed in math\ ©6jan02/[email protected] local αt,εm,τx © Function name, error message, αx type "atan2 error"→εm © Initialize error message define αt(α,β)=func © Function finds atan2() of simple elements
    [Show full text]