FPGA Implementation of a High Speed Multistage Pipelined Adder Based CORDIC Structure for Large Operand Word Lengths ISSN 2047-3338

International Journal of Computer Science and Telecommunications [Volume 3, Issue 5, May 2012] 105 FPGA Implementation of a High Speed Multistage Pipelined Adder Based CORDIC Structure for Large Operand Word Lengths ISSN 2047-3338 Burhan Khurshid 1, Ghulam Mohd Rather 1 and Najeeb-ud-din Hakim 1 1Department of ECE, National Institute of Technology, Srinagar, J & K, India Abstract — CORDIC is an acronym for COordinate Rotation time consuming as it was too slow to converge to a desired Digital Computer. It is a hardware-efficient, shift and add precision. algorithm that is used in various digital signal processing With digital signal processing switching back to hardware applications for computing trigonometric, logarithmic, hyperbolic efficient solutions [2], CORDIC came to picture. The and other linear and transcendental functions. Traditionally the CORDIC method of computation basically represents a algorithm is implemented in hardware in two different styles: folded and unfolded. Unfolded structures are pipelined to compromise between the look-up table method and the power increase the throughput of the structure. This paper implements series method, wherein the precision of the result is preserved the algorithm in normal unfolded style, but using multistage without having to use any considerable amount of on chip pipelined adders. The resulting structure is compared against the memory. With the advancement in VLSI technology, the traditional unfolded and pipelined approaches and is shown to design of high speed VLSI architectures for modern digital have an improved throughput. The structure has been coded in signal processing systems became a reality. This provided VHDL and implemented using Xilinx FPGA synthesis tool. The designers with significant impetus to map CORDIC algorithm algorithm has been simulated for sine and cosine function into hardware structures. However, the frequent use of these evaluation. The simulations are carried out using Xilinx ISim tool. architectures in modern DSP systems requires a rapid increase in performance accompanied by a reduction in cost and shorter Index Terms —FPGA, Rotation Mode, Pipelined Adders and Unfolded Architecture time to market [1]. This is achieved by optimizing these structures for various performance parameters such as speed, power and area. One way of optimizing these structures is to I. INTRODUCTION incorporate such architectural modifications that enable them to be clocked at higher operating frequencies. The optimized ANY of the algorithms used today in DSP require structures are then implemented using suitable hardware Melementary functions such as trigonometric, inverse platforms. trigonometric, logarithmic, exponential, multiplication and IC technology provides a variety of implementation formats division functions [1]. The traditional approach for for system designers. The implementation format defines the implementing these functions was to look for some software technology to be used, how the switching elements are solutions. These included the use of look-up tables, power organized and how the system functionality will be series and polynomial expansion methods. These software materialized. Perhaps the most advanced implementation based approaches, however, suffered from many drawbacks. platform used today is the reconfigurable platform. FPGAs are Look-up table methods, although fast, required large reconfigurable systems that are often used as co-processors to memory for high precision results. The use of power series was implement CORDIC architectures. Historically, FPGAs have been slower, less energy efficient and generally achieved less functionality than their fixed ASIC counterparts. This has provided enough scope for designers to optimize FPGA based This work has been carried out in SMDP-II VLSI laboratory of the structures for parameters like throughput, power, area etc. Electronics and Communication Engineering Department, of National The rest of the paper is organized in the following manner. Institute of Technology Srinagar, India. This SMDP – II VLSI project is Section II discusses the CORDIC algorithm and its operating funded by Ministry of Communication and Information Technology, Government of India. Authors are grateful to the Ministry for the facilities mode for sine and cosine function evaluation. Section III provided under this project. discusses the folded and unfolded CORDIC architectures. B. Khurshid is with the Electronics and Communication Department Section IV discusses the pipelined addition. Section V National Institute of Technology, Srinagar, J & K, India (phone: 9797875163; provides the implementation and simulation results and based e-mail: [email protected]). G. M. Rather is with the Electronics and Communication Department on the comparison metrics, performance evaluation of folded, National Institute of Technology, Srinagar, J & K, India. unfolded, pipelined and proposed CORDIC architectures is N. Hakim is with the Electronics and Communication Department carried out. National Institute of Technology, Srinagar, J & K, India. Journal Homepage: www.ijcst.org Burhan Khurshid et al. 106 -2i 1/2 II. CORDIC ALGORITHM Ki = 1/(1+2 ) ; known as scale constant. The CORDIC algorithm provides an iterative method of di = ±1 ; known as decision function. performing vector rotations by arbitrary angles using only shift and add operations [3]. The algorithm, credited to Volder [4], Removing the scale constant from the iterative equations is derived from the general (Givens) rotation transform: yields a shift-add algorithm for vector rotation. The product of the Ki’s can be applied elsewhere in the system or treated as x'= x cos ø – y sin ø (1) part of a system processing gain. That product approaches 0.6073 as the number of iterations goes to infinity. Therefore, y'= x sin ø + y cos ø (2) the rotation algorithm has a gain, An , of approximately 1.647. The exact gain depends on the number of iterations, and obeys This rotates a vector in a Cartesian plane by the angle ø. These the relation: can be rearranged so that: -2i 1/2 An = Π [1+2 ] x'= cos ø [ x - y tan ø ] (3) The angle accumulator adds a third difference equation to the y'= cos ø [ y + x tan ø ] (4) CORDIC algorithm: -i The rotation angles are restricted so that, tan ø = ±2 . This z i+1 = z i – αi reduces the multiplication operation by the tangent term to -1 -i -i simple shift operation. Any given target angle ø can be z i+1 = z i – di tan ( 2 ) { tan αi = ±2 } decomposed into a sequence of smaller micro rotations. Thus ø is decomposed as a sequence of elementary rotations: For a single CORDIC micro-rotation the resulting equations are: ø =Σ α i (5) -I x i+1 = x i – yi .d i. 2 (10) Using these basic ideas we have the basic iterative rotations -I as: y i+1 = y i + xi .d i. 2 (11) -1 -i x i+1 = cos αi [ x i – yi tan αi ] (6) z i+1 = z i – di tan ( 2 ) (12) y i+1 = cos αi [y i + x i tan αi ] (7) The CORDIC rotator is normally operated in one of two modes. In rotation mode, the angle accumulator is initialized The rotation angles are restricted so that: with the desired rotation angle. The rotation decision at each iteration is made to diminish the magnitude of the residual -i tan αi = ±2 angle in the angle accumulator. The decision at each iteration is therefore based on the sign of the residual angle after each This assures that the multiplication by the tangent term is step. The CORDIC equations are: reduced to simple shifting operation. -i x i+1 = x i – yi .d i. 2 2 1/2 x i+1 = [ x i – yi tan αi ] / ( 1 + tan αi ) -i y i+1 = y i + x i .d i. 2 2 1/2 y i+1 = [ y i + x i tan αi ] / ( 1 + tan αi ) -1 -i z i+1 = z i – di tan ( 2 ), Rearranging: Where, -i -2i 1/2 x i+1 = [ x i – yi (±2 ) ] / ( 1 + 2 ) di = -1 if z i < 0 -i -2i 1/2 y i+1 = [ y i + x i (±2 ) ] / ( 1 + 2 ) +1, otherwise Or After n iterations we are provided with following results: -i x i+1 = K i. [ x i – yi .d i. 2 ] (8) xn = A n [ x 0 cosz 0 – y0 sinz 0 ] (13) -i y i+1 = K i. [ y i + x i .d i. 2 ] (9) yn = A n [ y 0 cosz 0 + x 0 sinz 0 ] (14) Where, zn = 0 (15) International Journal of Computer Science and Telecommunications [Volume 3, Issue 5, May 2012] 107 Setting the y component of the input vector to zero reduces the the shifters in each path are shared on time basis this rotation mode result to: conventional approach of implementing the CORDIC algorithm is not suitable for high speed applications [7]. xn = A n . x 0 cosz 0 (16) Another disadvantage is with respect to the shift operations. When implemented in hardware the shifters have to change the yn = A n . x 0 sinz 0 (17) shift distance with the number of iteration. For large number of iterations these require a high fan in and reduce the maximum By setting x0 equal to 1/A n, the rotation produces the unscaled speed for the application [7, 9].These shifters do not map well sine and cosine of the angle argument z0. into FPGA architectures and if implemented require several layers of logic. The result is a slow design that uses large number of logic cells. In addition the output rate is also limited III. CORDIC ARCHITECTURES by the fact that the operation is performed iteratively and therefore the maximum output rate equals 1/n times the clock In general, CORDIC architectures can be broadly classified rate, where n is the number of iterations. as folded and unfolded, based upon the hardware realization of the three iterative equations [5]. A direct duplication of B. Unfolded parallel design equations 10, 11 and 12 into hardware results in folded architecture.

Load more