A Noniterative Radix-8 CORDIC Algorithm with Low Latency and High Efficiency

electronics
Article
A Noniterative Radix-8 CORDIC Algorithm with Low Latency and High Efficiency
Wenming Tang * and Feng Xu
The Key Laboratory for Information Science of Electromagnetic Waves, School of Information Science and Technology, Fudan University, Shanghai 200433, China; [email protected]
* Correspondence: [email protected]
Received: 6 August 2020; Accepted: 1 September 2020; Published: 17 September 2020

Abstract: An efficient, noniterative Radix-8 (NR-8) coordinate rotation digital computer (CORDIC) algorithm is proposed for low-latency, high-efficiency computation of the sine, cosine and phase-shift functions. The function values are computed precisely using only input angles in the narrow range [0, π/12] rather than the wide range [0, π/2]. The algorithm is expressed by a formula that replaces the traditional iterative process with a complex multiplier. Results from simulation and from experiments on an FPGA show that the NR-8 CORDIC algorithm works well: with a 16-bit output, the absolute error in computing the sine or cosine function is only 0.012% when the angle is stepped in increments of 0.001°. Compared with the best conventional CORDIC algorithm, the clock latency drops to less than 50%, and only half of the logic resources and half of the power are required. The algorithm also outperforms other recently improved CORDIC algorithms, requiring less than half of their clock latency even for a 23-bit precision output. Therefore, the algorithm has potential applications in real-time systems such as radar digital beamforming.
Keywords: CORDIC; sine and cosine; phase shift; FPGA; digital beamforming
Electronics 2020, 9, 1521; doi:10.3390/electronics9091521 www.mdpi.com/journal/electronics

1. Introduction
As one of the most common transcendental functions, the sine or cosine function has been widely used in real-time digital signal processing systems such as radar, ultrasound, robotics and communications [1–7]. The accuracy and efficiency with which these functions are computed are two key requirements for evaluating the performance of such systems. For this purpose, many methods of calculating the sine or cosine function have been developed, such as lookup tables, Taylor series and polynomial approximation [8–10]. However, these methods suffer from either high complexity or high latency, so an efficient method is urgently needed for accurate and efficient computation in real-time systems. Fortunately, the coordinate rotation digital computer (CORDIC) algorithm [11] provides accurate and efficient computation by working iteratively and decomposing each calculation into a series of addition, subtraction and shift operations, which has made it widely used in digital circuits for computing trigonometric, exponential and other functions [12]. However, because CORDIC is iterative, its accuracy depends strongly on the number of iterations: more iterations mean higher clock latency and therefore lower computational efficiency. To further enhance efficiency, the CORDIC architecture has been improved, yielding variants such as the Scaling-Free (SF) CORDIC, Radix-4 CORDIC, Radix-8 CORDIC and low-latency Hybrid (LLH) CORDIC algorithms [13–17].
Some of the improved CORDIC algorithms have been widely used in radar digital beamforming (DBF) systems. For instance, Lee et al.
developed a CORDIC-based algorithm for multi-Gbps MIMO systems, implemented on a Virtex-6 FPGA using 49,752 slices; because of the many iterations required, the algorithm needs 260 ns of latency (65 clock periods at 250 MHz) [4]. Similarly, Jun et al. described look-ahead, pipelined CORDIC-based adaptive filters and their application to adaptive beamforming [5], where the pipeline level m is determined by the m-bit precision. However, the CORDIC algorithm often requires many iterations to converge, which has become a major bottleneck for real-time applications.
In this work, a new noniterative Radix-8 (NR-8) CORDIC algorithm is proposed for low-latency implementation on FPGAs. The NR-8 CORDIC algorithm was developed in three steps: (1) The NR-8 CORDIC algorithm was derived from the conventional Radix-2 CORDIC algorithm. (2) The input angle θ was restricted to a narrow range by simultaneously transforming the input variables $x_0$ and $y_0$.
(3) A formula was deduced and optimized. These steps narrow the range of the selected iteration angle and yield a noniterative formula for the CORDIC algorithm; in addition, the algorithm can be accelerated by the multiplier modules readily available in FPGAs [18]. As a result, the algorithm reduces the 7–17-clock latency of the conventional CORDIC algorithm (16-bit precision) to three clocks, needs fewer logic resources and consumes less power. Compared with the LLH algorithm [16], it has great advantages in terms of time and resources. The rest of this paper is organized as follows. Section 2 presents the derivation from the conventional CORDIC algorithm. Section 3 introduces the proposed NR-8 CORDIC. Section 4 presents its FPGA implementation and analysis. Section 5 introduces the application of the NR-8 CORDIC in radar DBF. Finally, a conclusion is drawn
according to the results obtained from the above sections.

2. Conventional CORDIC Rotator Algorithm
The CORDIC algorithm usually operates in rotation mode or vector mode [11,12], following linear, circular or hyperbolic coordinate trajectories. In this paper, we focus on the rotation mode with a circular trajectory.
The rotation mode is depicted in Figure 1, where θ is the angle between the vectors $\vec{V}_0(x_0, y_0)$ and $\vec{V}_d(x_d, y_d)$. As the vector $\vec{V}_0$ rotates counterclockwise to the vector $\vec{V}_d$, the coordinates $(x_d, y_d)$ can be described as in Equation (1):
$$\begin{bmatrix} x_d \\ y_d \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \cos\theta \begin{bmatrix} 1 & -\tan\theta \\ \tan\theta & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \tag{1}$$
Figure 1. The CORDIC vector rotation model (successive micro-rotations θ0 (1st rotation) and θ1 (2nd rotation) take V0(x0, y0) through V1(x1, y1) and V2(x2, y2) toward Vd(xd, yd)).
If the initial vector $(x_0, y_0)$ is set to $x_0 = 1$, $y_0 = 0$, Equation (1) gives $x_d = \cos\theta$ and $y_d = \sin\theta$, so it can be used to compute cos θ and sin θ.
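Equation (1) can be checked numerically. The Python sketch below (function names are illustrative, not from the paper) evaluates the rotation three equivalent ways: the plain rotation matrix, the cos θ-factored tangent form that CORDIC builds on, and a single complex multiplication $(x_0 + jy_0)e^{j\theta}$. The last form is consistent with the abstract's remark that the NR-8 formula uses a complex multiplier, although the paper's exact formula is not reproduced here.

```python
import cmath
import math


def rotate_matrix(x0: float, y0: float, theta: float) -> tuple[float, float]:
    """Rotate (x0, y0) counterclockwise by theta with the rotation matrix of Equation (1)."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x0 - s * y0, s * x0 + c * y0


def rotate_tan_form(x0: float, y0: float, theta: float) -> tuple[float, float]:
    """Same rotation in the factored form cos(theta) * [[1, -tan], [tan, 1]].

    CORDIC builds on this form: the bracketed matrix needs only tan(theta),
    and the common cos(theta) factor can be applied once at the end.
    """
    t = math.tan(theta)
    c = math.cos(theta)
    return c * (x0 - t * y0), c * (t * x0 + y0)


def rotate_complex(x0: float, y0: float, theta: float) -> tuple[float, float]:
    """Same rotation written as one complex product (x0 + j*y0) * e^(j*theta)."""
    z = complex(x0, y0) * cmath.exp(1j * theta)
    return z.real, z.imag


if __name__ == "__main__":
    theta = math.radians(10.0)
    # With x0 = 1, y0 = 0 every form yields (cos(theta), sin(theta)).
    for rotate in (rotate_matrix, rotate_tan_form, rotate_complex):
        xd, yd = rotate(1.0, 0.0, theta)
        print(f"{rotate.__name__:16s} cos={xd:.6f} sin={yd:.6f}")
```

All three functions agree to floating-point rounding; the tangent form is the one the micro-rotations of CORDIC decompose, since its bracketed matrix can be built from shifts and adds when tan θ is a power-of-two fraction.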
The angle θ is decomposed into a series of micro-angles, each of which corresponds to one rotation step, as shown in Figure 1 and described by Equation (2):
$$\theta = \sum_{i=0}^{n-1} \theta_i, \qquad \theta_i = \tan^{-1}\left(\sigma_i R^{-(i+1)}\right) \tag{2}$$
where $n$ denotes the number of rotations, $R$ denotes the radix ($R = 2^l$, $l \in \mathbb{N}$), $\theta_i$ denotes the micro-angles, and $\sigma_i$ are the selection factors, each an integer in the interval $[-R/2, R/2]$.
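The Python sketch below illustrates Equation (2) for a generic radix R, as the conventional iterative scheme rather than the noniterative NR-8 formula. It picks a selection factor $\sigma_i$ for each micro-angle, applies the micro-rotations in the tangent form of Equation (1), and removes the accumulated scale factor at the end. The greedy nearest-angle rule for $\sigma_i$, the default R = 8 and the six-step depth are assumptions for illustration; the text above does not state how the authors select $\sigma_i$. Because $\sigma_i$ may be any integer in $[-R/2, R/2]$, including 0, each micro-rotation stretches the vector by $\sqrt{1+\sigma_i^2 R^{-2(i+1)}}$, so the scale factor depends on the chosen digits and is accumulated per step instead of being a fixed constant.

```python
import math


def decompose_angle(theta: float, radix: int = 8, n: int = 6) -> list[int]:
    """Pick the selection factors sigma_i of Equation (2) for an angle theta (radians).

    Hypothetical greedy rule (an assumption, not taken from the paper): at step i,
    choose the integer sigma in [-R/2, R/2] whose micro-angle atan(sigma * R**-(i+1))
    lands closest to the angle that is still left to rotate.
    """
    half = radix // 2
    residual = theta
    sigmas: list[int] = []
    for i in range(n):
        weight = radix ** -(i + 1)  # R^-(i+1); a right shift when R is a power of two
        sigma = min(range(-half, half + 1),
                    key=lambda s: abs(residual - math.atan(s * weight)))
        sigmas.append(sigma)
        residual -= math.atan(sigma * weight)  # subtract theta_i from Equation (2)
    return sigmas


def cordic_rotate(x0: float, y0: float, sigmas: list[int],
                  radix: int = 8) -> tuple[float, float]:
    """Apply the micro-rotations in the tan form of Equation (1), then undo the gain."""
    x, y = x0, y0
    gain = 1.0
    for i, sigma in enumerate(sigmas):
        t = sigma * radix ** -(i + 1)  # tan(theta_i) = sigma_i * R^-(i+1)
        x, y = x - t * y, y + t * x  # matrix [[1, -t], [t, 1]] applied to (x, y)
        gain *= math.sqrt(1.0 + t * t)  # this step stretched the vector by 1/cos(theta_i)
    return x / gain, y / gain


if __name__ == "__main__":
    theta = math.radians(12.0)  # inside the narrow [0, pi/12] range
    sigmas = decompose_angle(theta)
    cos_t, sin_t = cordic_rotate(1.0, 0.0, sigmas)
    print("sigma_i:", sigmas)
    print(f"cos error={abs(cos_t - math.cos(theta)):.2e}, "
          f"sin error={abs(sin_t - math.sin(theta)):.2e}")
```

With R = 8, the largest angle Equation (2) can reach is $\sum_i \tan^{-1}(4 \cdot 8^{-(i+1)}) \approx 30.65°$, which covers the narrow [0, π/12] input range mentioned in the abstract, and each additional step refines the angle by a factor of 8 (about three bits). The n sequential steps here are exactly the latency that a noniterative formulation aims to remove.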