A Novel Coordinate Rotation Digital Computer Method for Energy and Latency Saving by Trigonometric Operations Spatial Locality Principle

Copyright © 2019 American Scientific Publishers. All rights reserved. Printed in the United States of America.
Journal of Low Power Electronics, Vol. 15, 338–350, 2019

Giuseppe Visalli
Independent Researcher, Via G. Guerzoni, 1, Padova, 35126, Italy
(Received: 23 June 2019; Accepted: 7 October 2019)

In this work, we propose an approximate and energy-efficient CORDIC method based on a spatial locality principle for trigonometric functions, derived from benchmark profiling. More than 50% of successive sine/cosine computation requests have an absolute phase difference of at most ten degrees. This property suggests an optimized circuit implementation, either iterative or built as a succession of micro-rotation modules, where the second CORDIC instance requires fewer iterations, reducing the latency and the total energy budget at the same precision as two separate and independent instances. This simple design strategy allows significant area and energy savings in general-purpose VLSI architectures, and it also enables dramatic optimizations in application-specific embedded systems used in signal processing and radio frequency communication. In this contribution, we present the method, its hardware overhead, and the energy budget per single cycle. Simulation results show a total energy saving of 40% in the considered benchmarks for pipelined and iterative general-purpose CORDIC. Furthermore, our application-specific systems (fast Fourier transform and digital oscillators for radio frequency down conversion) show remarkable cycle savings when the successive sine/cosine computation requests are more than 70%.
Finally, in this work, we extend the proposed approach to any phase difference below 26.56°, used as a variable that sets the number of angle rotations of the second CORDIC.

Keywords: CORDIC, Digital Arithmetic, Digital Signal Processing Chip, Energy Estimation, Fixed-Point Arithmetic, Very-Large Scale Integration Systems.

J. Low Power Electron. 2019, Vol. 15, No. 4, 1546-1998/2019/15/338/013, doi:10.1166/jolpe.2019.1619

1. INTRODUCTION AND BACKGROUND

Due to the rapid advances in very-large-scale integration (VLSI) technology and digital signal processors (DSPs), the design of real-time applications requires the computation of trigonometric functions as a primary task in image processing, signal processing, and telecommunication. The CORDIC (Coordinate Rotation Digital Computer) algorithm [1] is the most commonly used method to compute trigonometric, logarithmic, exponential, and hyperbolic functions, real and complex multiplication, eigenvalue estimation, square root, division, singular value decomposition, and many more [2]. Walther [3] extended the initial implementation, which many others later refined [4]. The CORDIC algorithm can be used in image and signal processing applications [5, 6], telecommunication (e.g., radio baseband processors) [7–9], 3D graphics manipulation [10], and the Internet of things (IoT) [11].

Moreover, this algorithm has applications in special-purpose systems for real-time signal processing [12, 13]. These typical applications require the computation of trigonometric functions with reduced hardware, to save area and energy and to achieve the minimum latency. The CORDIC algorithm calculates the sine and cosine of any angle by successive rotations until the total residual angle is very close to zero. Its principal advantage is that it does not require signed integer multiplications; it performs the calculation by successive shift-and-add operations. However, its most important drawback is the total latency, which is proportional to the number of iterations (N). For this reason, many applications improve on the conventional CORDIC, reducing the required cycles at the same result precision. Control CORDIC [14] and the Angle Re-coding (AR) CORDIC [15] reduce the number of iterations; the latter requires an average of N/3 and a maximum of N/2 iterations. Additionally, since a single CORDIC rotation is not a pure rotation but a rotation-extension, the number of rotations for any angle should be constant and independent of the operand so that a constant scale factor can be used. Hence, the use of redundant number representations or carry-save adders (CSA) [16, 17] in addition/subtraction reduces the total latency, allowing a higher clock rate [18, 19].

In this work, we simulated several representative benchmarks in the fields of signal processing, computer arithmetic, and telecommunication for general-purpose and application-specific embedded cores. We trace the accesses to two successive trigonometric (sine/cosine) functions whose angles differ by at most 10°. Our profile shows that 50% of the total successive sine/cosine computations fall into this region; this observation is equivalent to the spatial locality principle used in microprocessor cache memory design [20]. Low-power CORDIC implementations generally reduce the total energy at the price of increased latency, as depicted in Ref. [21]. Other research in the area of VLSI signal processing considers the precomputation of each CORDIC direction, accelerating every shift-and-add stage [22]. However, these timing optimizations are far from our approach, which excludes some unneeded rotations under trigonometric operand constraints and so removes a large portion of the required delay. Thus, our proposed approach is reconfigurable hardware that reduces the energy dissipation and latency by removing unnecessary angle rotations when two successive sin/cos requests fall within an angle difference of at most ten degrees. Otherwise, the hardware dissipation is almost aligned with the current state of the art and implementation technology. Our proposed hardware does not dissipate more energy than known CORDICs: a miss of the locality principle costs no additional energy or latency. Our idea is to compute the second CORDIC algorithm without resetting the rotation results from the previous instance. In this way, the hardware starts from a non-reset value and the next rotation begins from an angle slightly larger than our reference offset value of 10°. This approach could be used in any CORDIC implementation based on either redundant or non-redundant binary representation, and with the single or double rotation-extension algorithm implementation, the latter introduced in Ref. [23].

The spatial locality of sine/cosine angles allows significant hardware savings in application-specific VLSI. We explored two representative benchmarks: fast Fourier transform (FFT) processors [24–27] and the implementation of an accurate numerically controlled oscillator (NCO) used in spectroscopy [28, 29] or in the modulated wave down conversion section of a typical digital telecommunication receiver [30]. We consider a discrete number of reference approaches illustrated in Refs. [31, 32], estimating the hardware costs (area and power) and the maximum operating frequency.

We translated the reference and new hardware into a 90 nm CMOS technology, evaluating the energy efficiency and the quality of result in our considered benchmarks. The rest of the paper is organized as follows. Section 2 is an overview of the CORDIC algorithms. Subsequently, Section 3 enhances the conventional CORDIC to perform a second instance with a minimal incremental angle. Further, we formalize our approach in terms of algorithm, VLSI implementation, and relative error measurement in Section 4, where we consider both general-purpose and application-specific problems. Section 5 summarizes our results in terms of energy savings in our considered benchmarks and the measured precision of the second CORDIC instance. Finally, Section 6 concludes our work.

2. THE CORDIC ALGORITHM

This section introduces the conventional CORDIC and its specializations for reducing the total number of iterations responsible for the full latency. The algorithm is a succession of phase rotations, translated into hardware by an iterative VLSI implementation or by a succession of micro-rotation modules; the latter is preferred to achieve a high throughput. However, in this paper, we focus on the CORDIC algorithm with variable and unknown angles of rotation, a property used in the implementation of low-power DSPs and microprocessors.

2.1. Conventional CORDIC

The CORDIC algorithm is a succession of rotations by angles of the form $e_n = \arctan 2^{-n}$, where $n$ is an integer [4]. A read-only memory (ROM) stores these angles in a fixed-precision format; the memory stores these angles measured in degrees. Hereafter, we consider a radix-2 32-bit fixed-point CORDIC algorithm. The trigonometric functions sine and cosine use a rotation starting from a pair of fixed-point numbers $(x_0, y_0)$, and we obtain the point $(x_{n+1}, y_{n+1})$ by a rotation of angle $d_n \cdot e_n$, where $d_n \in \{+1, -1\}$, so we have:

$$
\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix}
= \cos e_n \cdot
\begin{pmatrix} 1 & -d_n \cdot 2^{-n} \\ d_n \cdot 2^{-n} & 1 \end{pmatrix}
\begin{pmatrix} x_n \\ y_n \end{pmatrix}
\qquad (1)
$$

The shift-and-add hardware applies only the matrix in Eq. (1) and omits the factor $\cos e_n$, so each step is a rotation-extension with gain $1/\cos e_n$; this is the origin of the scale factor $K$ in Eq. (3). Here, the CORDIC algorithm uses the iterative equation for $z_n$ as follows:

$$
z_{n+1} = z_n - d_n \cdot \arctan 2^{-n}
\qquad (2)
$$

This new variable is the accumulated partial sum of step angles; it controls the sign of the angle rotation. The sign of $z_n$ sets the rotation mode: clockwise (+1) or not. If $z_0$ is less than or equal to $\sum_{k=0}^{\infty} \arctan 2^{-k} = 1.743$, we have:

$$
\lim_{n \to \infty}
\begin{pmatrix} x_n \\ y_n \\ z_n \end{pmatrix}
= K \cdot
\begin{pmatrix} x_0 \cos z_0 - y_0 \sin z_0 \\ x_0 \sin z_0 + y_0 \cos z_0 \\ 0 \end{pmatrix}
\qquad (3)
$$

The scale factor in Eq. (3) is equal to $K = \prod_{n=0}^{\infty} 1/\cos(d_n \cdot \arctan 2^{-n}) = \prod_{n=0}^{\infty} \sqrt{1 + 2^{-2n}} = 1.646$. We calculate the sine and cosine functions starting from $x_0 = 1/K = 0.607252$, $y_0 = 0$, and $z_0$ equal to the desired angle.

binary position; it computes two rotation-extension coeffi-
