Area- and Energy-Efficient CORDIC Accelerators in Deep Sub-Micron

Adv. Radio Sci., 10, 207–213, 2012 Advances in www.adv-radio-sci.net/10/207/2012/ doi:10.5194/ars-10-207-2012 Radio Science © Author(s) 2012. CC Attribution 3.0 License. Area- and energy-efficient CORDIC accelerators in deep sub-micron CMOS technologies U. Vishnoi and T. G. Noll Chair of Electrical Engineering and Computer Systems, RWTH Aachen University, Aachen, Germany Correspondence to: U. Vishnoi ([email protected]) Abstract. The COordinate Rotate DIgital Computer a 40-nm CMOS technology and occupies a silicon area of (CORDIC) algorithm is a well known versatile approach 1560 µm2 only. Maximum clock frequency from circuit sim- and is widely applied in today’s SoCs for especially but not ulation of extracted netlist is 768 MHz under typical, and restricted to digital communications. Dedicated CORDIC 463 MHz under worst case technology and application cor- blocks can be implemented in deep sub-micron CMOS tech- ner conditions, respectively. Simulated dynamic power dis- nologies at very low area and energy costs and are attrac- sipation is 0.24 uW MHz−1 at 0.9 V; static power is 38 uW in tive to be used as hardware accelerators for Application Spe- slow corner, 65 uW in typical corner and 518 uW in fast cor- cific Instruction Processors (ASIPs). Thereby, overcoming ner, respectively. The latter can be reduced by 43 % in a 40- the well known energy vs. flexibility conflict. Optimiz- nm CMOS technology using 0.5 V reverse-backbias. These ing Global Navigation Satellite System (GNSS) receivers features are compared with the results from different design to reduce the hardware complexity is an important research styles as well as with an implementation in 28-nm CMOS topic at present. In such receivers CORDIC accelerators technology. It is interesting that in the latter case area scales can be used for digital baseband processing (fixed-point) as expected, but worst case performance and energy do not and in Position-Velocity-Time estimation (floating-point). A scale well anymore. micro architecture well suited to such applications is presented. This architecture is parameterized according to the wordlengths as well as the number of iterations and can be 1 Introduction easily extended for floating point data format. Moreover, area can be traded for throughput by partially or even fully The CORDIC-approach is a well known versatile algorithm, unrolling the iterations, whereby the degree of pipelining which can be used to implement a large variety of arith- is organized with one CORDIC iteration per cycle. From metic functions being quite complex to implement on stan- the architectural description, the macro layout can be gen- dard processor architectures. Consequently, it is widely ap- erated fully automatically using an in-house datapath gen- plied in Digital Signal Processing (DSP) on today’s SoCs erator tool. Since the adders and shifters play an impor- for especially but not restricted to digital communications. tant role in optimizing the CORDIC block, they must be Typical tasks comprise (Numerically Controlled Oscillator) carefully optimized for high area and energy efficiency in NCO, phase rotating, filtering, modulation and demodulation the underlying technology. So, for this purpose carry-select to name a few. A very attractive application of this is to use adders and logarithmic shifters have been chosen. Device dedicated CORDIC macros as accelerators for certain arith- dimensioning was automatically optimized with respect to metic functions to Application Specific Instruction Proces- dynamic and static power, area and performance using the sors (ASIPs). If the CORDIC macro is properly optimized in-house tool. The fully sequential CORDIC block for fixed- for very low area and energy costs, the overhead becomes point digital baseband processing features a wordlength of very small and by that a remedy to the well known “flexibility 16 bits, requires 5232 transistors, which is implemented in vs. energy” conflict can be achieved. Published by Copernicus Publications on behalf of the URSI Landesausschuss in der Bundesrepublik Deutschland e.V. 208 U. Vishnoi and T. G. Noll: Area- and energy-efficient CORDIC accelerators In this contribution, CORDIC macros are considered for (m = 0), or on a hyperbola (m = −1). There are two modes the use in highly area- and energy efficient digital GNSS re- of CORDIC operations for each of the above defined rota- ceivers. It will be shown in the later sections of the paper that tions: the “rotate mode” and the “vectoring mode”. For this CORDIC can be used in the baseband signal processing part application it is sufficient to consider the case of (m = 1) with as well as in the so-called Position Velocity Time (PVT) es- the two possible CORDIC operations. In the rotate mode, the timation unit. In the future, further applications of CORDIC vector xin,yin has to be rotated by a given angle zin and in building blocks can be expected. For example, it can be used order to converge zi to zero the direction di of rotation in in Singular Value Decomposition- (SVD) based jammer or each individual iteration is determined by the sign of the re- interferer supression in multisystem GNSS receivers. maining angle zi In order to cover the different specifications for these = GNSS receiver applications and also to ensure re-usability in di sgn(zi). (3) other application domains, a highly flexible, parameterized In the vectoring mode, the vector has to be rotated into the CORDIC architecture was elaborated. Thereby, a major goal x-axis and therefore, in order to converge yi to zero the direc- was to achieve an architecture being well suited for imple- tion di of rotation is determined by the sign of the remaining mentation in today’s deep sub-micron CMOS technologies. angle zi This paper is divided into 5 sections. A brief description of CORDIC algorithm is presented in Sect. 2. Section 3 di = −sgn(yi). (4) refers to the proposed flexible CORDIC architecture tem- As Eq. (2) actually perform so-called pseudo-rotations only, plate and the building blocks required for its implementa- at the very end, after n iterations the x- and y-components tion are described in detail. Implementation of an exemplary have to be corrected by a constant K . CORDIC macro and results also are presented in the end of Sect. 3. In Sect. 4, as an exemplary application, a CORDIC xout = K ·xn macro well-suited as a floating-point co-processor to an ASIP is discussed. yout = K ·yn (5) zout = K ·zn. 2 CORDIC algorithm This constant is determined by the series of step angles only The Coordinate Rotation Digital Computer (CORDIC) algo- and especially being independent from the individual direc- rithm was first introduced by Volder (1959) for the compu- tions of rotation di. The series of step angles can be opti- tation of trigonometric functions. Later on, it was general- mized by repeating individual elements towards a correction ized to hyperbolic functions, multiplication, division etc. by constant K featuring a small number of non-zero digits in Walther (1971). The algorithm allows for a hardware effi- canonical signed digit representation easing these final mul- cient implementation of arithmetic and trigonometric func- tiplications. tions by rotating a vector in a two-dimensional plane using only shift- and addition/subtraction operations being well 3 CORDIC architecture template suited for VLSI-CMOS realization. Thereby, a given input vector xin,yin is rotated in an iterative approach by a series As can be seen from Eq. (2), only additions/subtractions and of predetermined angles in each step; the direction of each shift operations are required for CORDIC iterations. In ad- elementary rotation is determined similarly to non-restoring dition to that, registers are required in pipelined CORDIC division. architectures and multiplexers are required in the feedback In more detail, after an initialization time-shared architectures. Figure 1 shows the block diagram of a fully iterative x0 = xin,y0 = y ,z0 = zin (1) in CORDIC macro performing rotations on a circle (m = 1). the vector is iteratively rotated by applying the CORDIC The input rot/vect is used to switch between rotate and vec- equations toring mode and Initialization is controlled by the input init. A single CORDIC stage is multiplexed in time i.e. for each = − · · · −i xi+1 xi m yi di 2 iteration the intermediate results are fed back. The multiplication by 2−i is implemented using shifters which are con- y = y +x ·d ·2−i (2) i+1 i i i trolled by the iteration index i. A look-up table (not shown in the figure) feeds the step-angle values ei, 2’s complement zi+1 = zi −di ·ei, subtraction is implemented by using XOR stages and proper −1 −i where, ei = tan 2 is the i-th element of the predefined LSB 2’s complement correction. Final multiplication by the step angle sequence. The parameter m specifies whether the correction constant K, can be implemented by simple shift rotations are performed on a circle (m = 1), on a straight line and add operations (not shown in the Fig. 1). Adv. Radio Sci., 10, 207–213, 2012 www.adv-radio-sci.net/10/207/2012/ U. Vishnoi and T. G. Noll: Area- and energy-efficient CORDIC accelerators 209 1 2 Fig. 3a. The basic cells of carry select adder. 3 Figure 3(a). The basic cells of carry select adder. 1 2 Fig. 1. Basic structure of a fully iterative CORDIC macro. 3 Figure 1. Basic structure of a fully iterative CORDIC macro. 1 2 Fig. 3b. The basic cells of carry select adder. 3 Figure 3(b). The basic cells of carry select adder.

Area- and Energy-Efficient CORDIC Accelerators in Deep Sub-Micron

3.2 the CORDIC Algorithm

CORDIC-Like Method for Solving Kepler's Equation

CORDIC V6.0 Logicore IP Product Guide

A Review on Hardware Accelerator Design and Implementation of CORDIC Algorithm for a Gaming Application

A.1 CORDIC Algorithm

An Optimization of CORDIC Algorithm and FPGA Implementation

FPGA Technology in Beam Instrumentation and Related Tools

A Unified Reconfigurable CORDIC Processor for Floating-Point Arithmetic

Design and Analysis of Double Precision Floating Point Division Operator Based on CORDIC Algorithm

A Highly Optimized Arithmetic Software Library and Hardware Co

A Trigonometric Hardware Acceleration in 32-Bit RISC-V

Cordic Algorithm and Its Applications In