Document downloaded from:
http://hdl.handle.net/10251/108350
This paper must be cited as:
Torres Carot, V.; Valls Coquillat, J.; Canet Subiela, MJ. (2017). Optimised CORDIC-based atan2 computation for FPGA implementations. Electronics Letters. 53(19):1296-1298. doi:10.1049/el.2017.2090
The final publication is available at
http://doi.org/10.1049/el.2017.2090
Copyright Institution of Electrical Engineers
Optimized CORDIC-based atan2 computation for FPGA implementations
V. Torres, J. Valls and M.J. Canet 1st June 2017
Abstract
In this article we describe a method for implementing the atan2 operator based on the CORDIC algorithm. In our proposal, the computation of the z-path takes advantage of the LUT-based resources of Field Programmable Gate Arrays (FPGAs) to reduce the overall area of the unrolled architecture by between 17% and 25% without performance deterioration.
1 Introduction
The atan2(y,x) function is an operation frequently needed in hardware systems. For example, it is required to compute the angle of complex numbers in the time/frequency/phase synchronization stages of digital communication systems. Since the atan2 function is highly nonlinear and has two input variables, the overall implementation cost of the operator can be significant. In this regard, hardware-optimized operators are always desirable in real-world designs, either to reduce their economic cost or to improve performance without increasing that cost. The iterative COordinate Rotation DIgital Computer (CORDIC) algorithm [1] in its vectoring mode is probably the best option for hardware implementations of atan2(y,x), and the conventional version is the best choice for FPGAs because it profits from the fast carry propagation logic, which is faster than general logic [2]. In vectoring mode, the aim of the CORDIC algorithm is to rotate its input vector (x,y) so as to align it with the x axis and obtain the atan2(y,x) value in the z variable. The direction $d_i$ of the rotation at iteration i is decided by the sign of $y_i$: $d_i = -\mathrm{sign}(y_i)$. The initial values for the algorithm are $z_0 = 0$, $x_0 = x$, $y_0 = y$, and the CORDIC equations for iteration i are as follows:
$$x_{i+1} = x_i - d_i \cdot y_i \cdot 2^{-i}, \qquad y_{i+1} = y_i + d_i \cdot x_i \cdot 2^{-i}, \tag{1}$$
$$z_{i+1} = z_i - d_i \cdot \operatorname{atan}(2^{-i}). \tag{2}$$
Since the range of convergence of the algorithm is $|z| \le 1.74329$, some sort of preprocessing, and possibly also postprocessing, must be done to ensure convergence for any possible input vector. Without loss of generality, we will assume the use of the following preprocessing stage [1], which guarantees convergence for any input vector because it moves the initial angle of the regular CORDIC to the $[-\pi/2, \pi/2]$ range: $x_0 = -d_{-1} \cdot y$, $y_0 = d_{-1} \cdot x$, $z_0 = -d_{-1} \cdot \pi/2$, with $d_{-1} = -\mathrm{sign}(y)$, so that $x_0 = |y| \ge 0$. To obtain a w-bit output precision, w+1 iterations are needed. In the present work we propose a method to reduce the cost of the z-path in the conventional CORDIC for FPGA devices.
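As an illustration of equations (1) and (2) together with the ±π/2 preprocessing stage, the following is a minimal floating-point sketch in Python. It is not the fixed-point hardware operator: the function name `cordic_atan2` and the parameter `w` are hypothetical, the pre-rotation direction is assumed to be chosen so that $x_0 = |y| \ge 0$, and a zero input to the sign decision is assumed to count as positive.

```python
import math

def cordic_atan2(y, x, w=12):
    """Floating-point model of vectoring-mode CORDIC (illustrative only)."""
    # Pre-rotation by +/-pi/2. Assumption: the direction is chosen so that
    # x0 = -d*y = |y| >= 0, which places the vector in [-pi/2, pi/2].
    d = -1 if y >= 0 else 1
    xi, yi, zi = -d * y, d * x, -d * math.pi / 2
    # w+1 regular iterations, as stated in the text for w-bit precision.
    for i in range(w + 1):
        d = -1 if yi >= 0 else 1                      # d_i = -sign(y_i)
        xi, yi = xi - d * yi * 2.0**-i, yi + d * xi * 2.0**-i   # eq. (1)
        zi -= d * math.atan(2.0**-i)                  # eq. (2)
    return zi

if __name__ == "__main__":
    for y, x in [(1.0, 1.0), (1.0, -1.0), (-5.0, 0.01)]:
        print(y, x, cordic_atan2(y, x), math.atan2(y, x))
```

In this model the residual angle after iteration i is at most atan(2^-i), so with w = 12 the result should agree with `math.atan2` to within about 2^-12 rad; a fixed-point implementation adds quantisation error on top of this.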
2 Optimizations for the CORDIC atan2
Several optimizations can be applied when the only output of the algorithm is atan2(y,x): a) the width of the y-path can be progressively reduced, since $|y_i|$ has a bound that is halved with each iteration; b) the x-path can be frozen, that is, it is no longer updated once this does not negatively affect the output accuracy, since the only output of the operator is z; c) the adder/subtractors in the x-path can have a shorter word length than those in the y-path, since the latter data-path needs more precision to guarantee the right decisions; and d) the y-path adder/subtractor for the final iteration can be omitted because its output is irrelevant for the atan2 computation. All the optimizations explained above can be applied while still meeting the accuracy goal at the output of the operator, which in this work we assume to be last-bit accuracy (LBA), i.e. the maximum error magnitude is less than or equal to the weight of the least significant bit (LSB) of the output. For example, the operator in Fig. 1 is a 12-bit CORDIC where the x-path is frozen after the 4th regular CORDIC stage and the sizes of the adder/subtractors are drawn in accordance with their real word length.
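Optimization a) can be checked numerically. The sketch below is a floating-point illustration under stated assumptions, not part of the proposed operator: it traces the ratio $|y_{i+1}|/x_{i+1}$ after each regular iteration and compares it with $2^{-i}$, a bound that follows from the residual angle after iteration i being at most atan(2^-i). The function name `y_over_x_trace` is hypothetical, and the pre-rotation direction is assumed to be chosen so that $x_0 = |y| \ge 0$.

```python
def y_over_x_trace(y, x, iterations=13):
    """Return |y_{i+1}| / x_{i+1} after each regular CORDIC iteration i."""
    # Pre-rotation (assumption: direction chosen so that x0 = |y| >= 0).
    d = -1 if y >= 0 else 1
    xi, yi = -d * y, d * x
    ratios = []
    for i in range(iterations):
        d = -1 if yi >= 0 else 1                      # d_i = -sign(y_i)
        xi, yi = xi - d * yi * 2.0**-i, yi + d * xi * 2.0**-i
        ratios.append(abs(yi) / xi)                   # tangent of the residual angle
    return ratios

if __name__ == "__main__":
    for i, r in enumerate(y_over_x_trace(3.0, -4.0)):
        print(f"i={i:2d}  |y|/x={r:.6f}  bound={2.0**-i:.6f}")
```

Because x stays bounded by the CORDIC gain times the input magnitude, a ratio bound that halves per iteration translates into a halving bound on $|y_i|$, which is what allows the y-path word length to shrink stage by stage.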
3 LUT-based computation of the z-path
In the conventional CORDIC, the angle $z_{i+1}$ is computed from the angle $z_i$ accumulated in the previous iterations and the angle rotated in the current iteration, according to (2). Therefore, the update of the z-path requires an adder/subtractor for each CORDIC iteration. In the present paper we propose a novel implementation scheme that profits from the resources offered by modern FPGA devices. These devices provide look-up tables (LUTs) connected to carry-chain logic to implement fast ripple-carry structures, as shown in Fig. 2a. These resources can be used to store precomputed values for all the possible rotated angles for n consecutive CORDIC iterations and also to add that value to the angle $z_g$ accumulated in previous stages. Note that, despite the reduction in the use of resources that we will show in the following sections, this proposal still benefits from the carry-chain logic, as shown in Fig. 2b. Hence, if the table that stores the precomputed values for n iterations is named LUTn, the new value for the angle is obtained as:
$$z_{g+n} = \mathrm{LUT}n(d_g, \ldots, d_{g+n-1}) + z_g. \tag{3}$$
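Equation (3) can be modelled in software as follows. This is a floating-point sketch under assumptions, not the FPGA mapping itself: each table entry is taken to be the sum of the n signed micro-rotation angles implied by (2), and the grouping of stages (a first table of 6 stages followed by tables of up to 5 stages) is an assumption made here for illustration. The names `build_lut`, `group_sizes` and `z_from_luts` are hypothetical.

```python
import math
from itertools import product

def build_lut(g, n):
    """Table indexed by (d_g, ..., d_{g+n-1}), each entry +1 or -1.

    Assumption: the stored value is -sum(d_i * atan(2^-i)), which is what
    n consecutive applications of eq. (2) add to the accumulated angle.
    """
    return {ds: -sum(d * math.atan(2.0**-(g + k)) for k, d in enumerate(ds))
            for ds in product((-1, 1), repeat=n)}

def group_sizes(iterations, first=6, rest=5):
    """Hypothetical grouping: one table of `first` stages, then up to `rest` each."""
    sizes = [min(first, iterations)]
    remaining = iterations - sizes[0]
    while remaining > 0:
        sizes.append(min(rest, remaining))
        remaining -= sizes[-1]
    return sizes

def z_from_luts(z0, ds, sizes):
    """Accumulate the angle with one table look-up and one addition per group (eq. 3)."""
    z, g = z0, 0
    for n in sizes:
        z += build_lut(g, n)[tuple(ds[g:g + n])]
        g += n
    return z

if __name__ == "__main__":
    print(group_sizes(25))   # grouping for w = 24, i.e. 25 iterations
    print(group_sizes(13))   # grouping for w = 12, i.e. 13 iterations
```

With this grouping, a 25-iteration z-path needs five look-ups instead of 25 add/subtract steps, and the result matches the iteration-by-iteration update of (2) up to floating-point rounding.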
Assuming that all the iterations follow the set of equations (1)–(2), the content of the LUTn for a set of n stages is filled using the following expression:
$$\mathrm{LUT}n(d_g, \ldots, d_{g+n-1}) = -\sum_{i=g}^{g+n-1} d_i \cdot \operatorname{atan}(2^{-i}). \tag{4}$$
Specifically, current Xilinx slices provide sets of 5-input LUTs whose output can be added to another input, as shown in Fig. 2a. Therefore, the rotated angle for n ≤ 5 iterations can be stored using one 5-input LUT per bit of the angle. Since for the first 6 iterations there is no accumulated angle, the basic structure of the FPGA can be configured to act as a LUT6 without an adder. After that, as many LUT5 plus adder structures should be used as needed (e.g. one of these structures is needed in Fig. 1). The LUTn for the final set of iterations may be smaller than in previous stages (e.g. a LUT2 is needed in Fig. 1). The required amount of LUTns for a w-bit precision CORDIC is:
$$s = 1 + \left\lceil \frac{w + 1 - 6}{5} \right\rceil. \tag{5}$$
For example, for a 24-bit CORDIC, instead of the 25 adder/subtractors required by a conventional z-path, only 5 LUTns are needed: one LUT6, three LUT5 plus adder, and a final LUT4 plus adder.
Finally, the output of the operator can be rounded by adding a rounding constant to one of the LUTns that store the combinations of atan values and then simply truncating the output of the operator.

Figure 1: Schematic diagram of the operator. The first stage of the operator is a ±π/2 rotator, as described in the text. The sgn block is an operator that extracts the sign bit of its input. The LUTn blocks and adders are implemented using 6-input LUTs and carry-logic resources.