A High Speed 32-Bit FPGA Based CORDIC Architecture for Sine and Cosine Function Evaluation

International Journal of Computer Applications (0975 – 8887) Volume 73– No.7, July 2013

Burhan Khurshid, Department of Computer Science Engineering, NIT Srinagar, India
Roohie Naz Mir, Department of Computer Science Engineering, NIT Srinagar, India
Hakim Najeeb-ud-din Shah, Department of Electronics and Communication Engineering, NIT Srinagar, India

1. ABSTRACT
Digital Signal Processing (DSP) algorithms frequently need to evaluate linear, trigonometric, hyperbolic, logarithmic and other transcendental functions. CORDIC based algorithms have long been used to evaluate these functions. Traditional approaches have, however, been limited to the software domain. The simplicity of the CORDIC algorithm encourages its implementation in hardware. In this paper a novel CORDIC architecture for sine and cosine function evaluation is proposed. The hardware integration is carried out using Field Programmable Gate Arrays (FPGAs). The proposed algorithm is based on modified carry save addition and incorporates bit-truncation. The structure offers extremely low latency and, when pipelined, high operating frequencies. The novelty of the proposed architecture is that it offers a flat timing response for varying input word lengths. The structure has an inherent capability of supporting an additional internal pipeline within each stage, enabling it to operate at high frequencies, typically four times that of the normal CORDIC. The performance analysis is carried out by comparing the proposed architecture against existing non-redundant (basic) and redundant (modified) architectures.

General Terms
DSP algorithm, Reconfigurable computing

Keywords
Carry save Addition; CORDIC Algorithm; Digital Signal Processing; FPGA; Pipelining; Rotation Mode

2. INTRODUCTION
CORDIC (COordinate Rotation DIgital Computer) [1, 2] is a shift-and-add algorithm that is widely used in Very Large Scale Integration (VLSI) DSP systems [3] due to its simple hardware and versatility. The algorithm is the best compromise between the look-up table method (which requires an enormous memory) and the polynomial approximation method (which is slow to converge).

Traditionally, hardware implementations have focused on sequential structures. However, these structures do not map well onto FPGAs [4]. The major drawback of the conventional CORDIC algorithm was its relatively high latency and low throughput [5]. This was due to the sequential nature of the iteration process, with carry propagate addition and variable shifting in every iteration [6]. Unfolded parallel implementations were proposed to overcome these drawbacks. However, the carry propagate addition still remained a bottleneck for further latency improvements [7, 8]. To improve the latency of CORDIC architectures, high radix CORDIC and redundant signed arithmetic have been proposed and implemented. The use of redundant signed arithmetic prevents the propagation of carry, thereby making it possible to create parallel adders with constant delay irrespective of the operand word-length [9]. Thus low latency results are produced in a redundant architecture, resulting in high-performance designs.

Carry save arithmetic also leads to a form of redundant number representation, resulting in low latency operations. This paper proposes a novel architecture that, besides reducing the latency of the CORDIC algorithm, stabilizes its timing response by providing a constant operating frequency for varying input word lengths.

The use of FPGAs as an implementing platform for various custom DSPs has increased since the beginning of this decade [10, 11]. However, earlier FPGAs were not able to implement the parallel CORDIC architecture due to the limited chip size and the difficulty in routing the hard-wired shifters [4, 12, 13]. Consequently, FPGAs were used to implement only the iterative CORDIC architectures [14, 15]. The implementation of parallel CORDIC architectures was, therefore, restricted only to the programmable platforms.

However, the advancement in VLSI technology has resulted in high density FPGAs, which provide an attractive platform for parallel CORDIC architectures [16, 17, 18, 19]. This paper emphasizes the implementation of parallel CORDIC architectures in FPGAs.

3. CORDIC ALGORITHM
The CORDIC algorithm provides an iterative method of performing vector rotations by arbitrary angles using only shift and add operations. The algorithm, credited to Volder, is derived by rotating a vector (x, y) in a Cartesian plane by some angle.

For a single CORDIC micro-rotation the resulting equations are:

$$x_{i+1} = x_i - y_i \, d_i \, 2^{-i} \tag{1}$$

$$y_{i+1} = y_i + x_i \, d_i \, 2^{-i} \tag{2}$$

$$z_{i+1} = z_i - d_i \tan^{-1}\left(2^{-i}\right) \tag{3}$$

Where,

$$d_i = \begin{cases} -1 & \text{if } z_i < 0 \\ +1 & \text{otherwise} \end{cases}$$

Equations 1 and 2 are the coordinate equations, giving the coordinates for the next micro-rotation. Equation 3 is the angle equation, giving the amount of rotation for the next micro-rotation.

The CORDIC rotator is normally operated in one of two modes. In rotation mode, the angle accumulator is initialized with the desired rotation angle. The rotation decision at each iteration is made to diminish the magnitude of the residual angle in the angle accumulator. The decision at each iteration is therefore based on the sign of the residual angle after each step.

After n iterations we get the following results:

$$x_n = A_n \left[ x_0 \cos z_0 - y_0 \sin z_0 \right] \tag{4}$$

$$y_n = A_n \left[ y_0 \cos z_0 + x_0 \sin z_0 \right] \tag{5}$$

$$z_n = 0 \tag{6}$$

Setting the y component of the input vector to zero reduces the rotation mode result to:

$$x_n = A_n \, x_0 \cos z_0 \tag{7}$$

$$y_n = A_n \, x_0 \sin z_0 \tag{8}$$

By setting x0 equal to 1/An, the rotation produces the unscaled sine and cosine of the angle argument z0.

4. CARRY SAVE ADDITION
Carry-save arithmetic leads to a form of redundant number representation. Due to its similarity with redundant arithmetic, most redundant architectures can be adapted to the carry-save case. In carry save addition, we refrain from directly passing on the carry information until the very last step. This is done by taking three numbers, x, y, z and converting them into two numbers c and s such that:

$$x + y + z = c + s \tag{9}$$

Where, s represents the sum and c represents the carry.

The important consideration is that c and s can be computed independently, and furthermore, each ci (and si) can also be computed independently from all of the other c's (and s's).

In order to add m different n-bit numbers together, the simple approach would be to repeat the procedure approximately m times over. This would require (m−2) carry save adder (CSA) blocks and a final ripple carry adder (RCA) block. Note that every time we pass through a CSA block, our number increases in size by one bit. Therefore, the number that goes to the RCA will be at most (n + m – 2) bits long. So the final RCA will have a gate delay of O(log (n + m)). Therefore the total gate delay is O(m + log (n + m)). Two points are worth noting here: firstly, the major part of the delay in the structure is due to the final RCA stage; and secondly, after every CSA stage an additional bit gets added to the sum vector. The proposed architecture takes care of both these issues.

5. PROPOSED ARCHITECTURE
The proposed architecture is based on carry save addition. A carry save adder consists of a carry save stage and a final ripple carry stage that adds (or subtracts) the sum (or difference) and the carry (or borrow) bits [20]. The performance of the adder is limited by the ripple carry stage, which accounts for the major portion of the delay in the circuit.

Unfolded implementation of CORDIC structures is carried out by decomposing the entire core into several subsequent stages, with the output of one stage serving as input to the subsequent one. At each stage a sum (or difference) vector is obtained that is propagated to the next stage. The proposed architecture eliminates the ripple carry stage from the structure, such that instead of propagating only the final sum (or difference) vector we are actually propagating both the sum (or difference) and carry (or borrow) vectors. This takes care of the large delay introduced by the RCA block at each stage. However, this results in certain modifications to the overall algorithm, which is depicted in the flow chart of figure 1. The block diagram of the proposed CORDIC follows and is shown in figure 3.

Fig 1: Flow chart for proposed CORDIC.

Fig 2: Proposed CORDIC logic diagram

As can be seen in figure 2 above, an additional redundant bit is added after every CSA stage, such that, if the initial word length is N bits, then after n stages the word length will be N+2n-1. This causes a high fan-in for the subsequent stages, requiring several layers of logic and thus slowing the process. This has been overcome by truncating the most significant bit after every carry save addition/subtraction. The truncation of the most significant bit does not affect the precision of the result as the bit is redundant. The bit truncated CORDIC is shown in figure 3.

Fig 3: Bit truncated proposed CORDIC logic diagram

As seen in the above figure, in each stage a redundant bit is generated. This MSB is eliminated in each stage, so that if the initial wordlength is N bits, the wordlength after n stages will remain N bits.

6.
different parameter values.

6.2 Implementation
The core is implemented with the following synthesis description:
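As a software illustration of the rotation-mode recurrences in Section 3 (Equations 1 to 3), and not the synthesis description referred to above, the following is a minimal sketch in Python. It makes several assumptions that are not part of the paper: floating-point values stand in for the fixed-point datapath, the function name `cordic_cos_sin` and its `iterations` parameter are hypothetical, and the gain is taken to be the standard CORDIC gain A_n = ∏ sqrt(1 + 2^(-2i)). The input angle is assumed to lie within the usual convergence range of roughly ±1.74 rad.

```python
import math

def cordic_cos_sin(angle, iterations=32):
    """Rotation-mode CORDIC sketch: returns (cos(angle), sin(angle)).

    Assumption: floats model the datapath; real hardware would use
    fixed-point words and a precomputed arctangent table.
    """
    # Standard CORDIC gain A_n for the chosen number of iterations.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # x0 = 1/A_n and y0 = 0 yield the unscaled cosine and sine.
    x, y, z = 1.0 / gain, 0.0, angle

    for i in range(iterations):
        d = -1.0 if z < 0 else 1.0       # rotation direction d_i
        shift = 2.0 ** (-i)              # models the hard-wired shift 2^-i
        x, y, z = (
            x - y * d * shift,           # Equation 1
            y + x * d * shift,           # Equation 2
            z - d * math.atan(shift),    # Equation 3
        )
    return x, y

if __name__ == "__main__":
    c, s = cordic_cos_sin(0.5)
    print(c, s, math.cos(0.5), math.sin(0.5))
```

Each pass of the loop corresponds to one stage of an unfolded implementation; in hardware the multiplications by `shift` become wired shifts and `math.atan(shift)` becomes a stored constant.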

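The carry save relation of Equation 9 and the bit truncation described in Section 5 can be sketched in software as follows. This is one possible reading, not the authors' circuit: it assumes two's complement operands and assumes the true result fits in the chosen word length, in which case discarding bits above the word length leaves the result unchanged because the arithmetic is modulo 2^N. The helper names `carry_save_add`, `to_signed` and `multi_operand_sum` are hypothetical.

```python
def carry_save_add(x, y, z, width):
    """3:2 carry save step: returns (c, s) with x + y + z == c + s (mod 2**width)."""
    mask = (1 << width) - 1
    s = (x ^ y ^ z) & mask                              # bitwise sum, no carry propagation
    c = (((x & y) | (x & z) | (y & z)) << 1) & mask     # majority carries, MSB truncated
    return c, s

def to_signed(value, width):
    """Interpret the low `width` bits of value as a two's complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value

def multi_operand_sum(operands, width):
    """Accumulate operands as a (carry, sum) pair; resolve carries only once at the end."""
    mask = (1 << width) - 1
    c, s = 0, 0
    for v in operands:
        c, s = carry_save_add(c, s, v & mask, width)
    return to_signed(c + s, width)   # the single carry-propagate addition

if __name__ == "__main__":
    c, s = carry_save_add(100, -37, 5, 8)
    print(to_signed(c + s, 8))       # prints 68
```

Every bit of `c` and `s` depends only on the corresponding bits of the three inputs, which is why the stage delay does not grow with the word length; the word length also stays fixed at `width` bits because of the masking.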