
Research on Hardware Intellectual Property

Cores Based on Look-Up Table Architecture for

DSP Applications

HOANG VAN PHUC

Doctoral Program in Electronic Engineering Graduate School of Electro-Communications The University of Electro-Communications

A thesis submitted for the degree of DOCTOR OF ENGINEERING

September 2012

I would like to dedicate this dissertation to my parents, my wife and my daughter.

Research on Hardware Intellectual Property

Cores Based on Look-Up Table Architecture for

DSP Applications

APPROVED

Assoc. Prof. Cong-Kha PHAM, Chairman

Prof. Kazushi NAKANO

Prof. Yoshinao MIZUGAKI

Prof. Koichiro ISHIBASHI

Assoc. Prof. Takayuki NAGAI

Date Approved by Chairman

© Copyright 2012 by HOANG Van Phuc. All Rights Reserved.

Abstract (in Japanese)

Research on Hardware IP Cores Based on Look-Up Table Architecture for DSP Applications

HOANG Van Phuc

Doctoral Program in Electronic Engineering, Graduate School of Electro-Communications, The University of Electro-Communications

This dissertation aims to propose area-efficient, high-performance hardware IP cores based on the look-up table (LUT) architecture and to apply them to DSP applications. LUT-based IP cores and a new computation system built on them are proposed; the proposed computation system combines conventional arithmetic units with the proposed IP cores. The proposed IP cores cover both basic arithmetic operations, such as multiplication and squaring, and elementary functions, such as sine and logarithm/anti-logarithm computations. First, two methods are proposed for the design of efficient LUT-based multiplier and squarer circuits. One is a truncated constant multiplier applicable to DSP applications in which the full-width result is not required; its architecture combines the two approaches of LUT-based computation and truncated constant multiplication for DSP. An LUT optimization algorithm is studied to find the optimal parameters and LUT contents. In addition, an improved hybrid LUT-based architecture is developed for fixed-width squarer circuits; this technique employs both LUT-based and conventional logic circuits to obtain a good compromise among the performance, error rate and complexity of the squarer. For the computation of elementary functions, two architectures combining LUT-based computation with linear difference methods are proposed. The first is an improved linear difference method for sine function computation, applicable to digital frequency synthesizers, adaptive signal processing and sine function generators; numerical analysis and optimization are performed to find the optimal parameters that reduce the LUT size and complexity while keeping the same error performance as other methods. The other is a quasi-symmetrical linear approach for logarithm and anti-logarithm computation, together with a two-step optimization algorithm developed to find the optimal parameters of its architecture. By using this optimization algorithm and an LUT size reduction technique, the proposed logarithm and anti-logarithm computation modules avoid increased hardware complexity while maintaining accuracy comparable with other methods. The proposed methods are applicable both to logarithm/anti-logarithm function generators in DSP systems and to the domain converters in hybrid number system processors. Finally, application-specific DSP designs and a computation system using the proposed IP cores are developed. In the future, these methods can be used in application-specific DSP applications and as arithmetic units of digital signal processors.

Abstract

Research on Hardware Intellectual Property Cores Based on Look-Up Table Architecture for DSP Applications

HOANG VAN PHUC

Doctoral Program in Electronic Engineering The University of Electro-Communications

Recently, high performance hardware intellectual property (IP) cores for arithmetic operations are highly required to meet the increasing demand of digital signal processing (DSP) applications in multimedia, wireless communications, mobile and handheld devices. Traditionally, high performance arithmetic function generators can be implemented by logic circuits with some advanced techniques such as the parallel architecture. However, these techniques have to trade off between computation performance and system complexity. In other words, high performance computation circuits may occupy much hardware area and lead to high power consumption. As a result, alternative approaches should be proposed for modern and future DSP applications. On the other hand, the look-up table (LUT)-based computation approach is promisingly suitable as such an alternative. LUT-based computation circuits provide the output by only accessing pre-stored tables rather than performing actual computations in real time. This LUT-access-only operation results in high speed and a low switching rate that leads to low power consumption. Moreover, according to the report of the International Technology Roadmap for Semiconductors (ITRS), embedded memories are becoming faster, achieving higher density and lower dynamic power consumption, and will dominate the content of future System-on-Chip DSP applications. Also, advanced LUT technologies lead to high improvements in LUT performance and density. Memory-based computation and LUT-based computation are also well suited for many DSP applications which require squaring, constant multiplications and elementary function computations. However, in simple direct LUT-based computation, the LUT size grows exponentially when the operand width increases. Therefore, much research focuses on reducing the LUT size. Although there are some existing methods for LUT-based computation components and systems, new and more improved methods are desired because future DSP applications require more efficient computation circuits. The research presented in this dissertation focuses on efficient hardware IP cores based on LUT architectures for DSP applications. Four methods for efficient specific hardware IP cores are proposed. Moreover, some detailed design examples and a computation system based on these IP cores are developed. The proposed IP cores include both basic arithmetic operations (such as multiplication and squaring) and elementary functions (such as sine and logarithm/anti-logarithm function computations). Two methods are proposed for the design of efficient LUT-based multiplier and squarer circuits. Firstly, a novel and efficient LUT-based truncated multiplier is presented for specific DSP applications in which the full width results are not required. This method combines the two approaches of LUT-based computation and truncated multipliers for DSP applications. An LUT optimization algorithm is also developed to find the optimal parameters and LUT content. Secondly, an improved hybrid LUT-based architecture is employed for fixed-width squarer circuits. This hybrid technique takes the advantages of both LUT-based and conventional logic circuits to achieve a good trade-off among squarer performance, error and complexity. For the computation of elementary functions using the LUT-based architecture, two novel architectures which combine LUT-based computation and the linear difference approach are proposed.
The first one is the improved linear difference method for sine function computation, which can be used in digital frequency synthesizers, adaptive signal processing and sine function generators. Numerical analysis and optimization are employed to find the optimal parameters which minimize the LUT size and hardware complexity while maintaining the same error performance as other methods. The second one is a novel quasi-symmetrical approach for the logarithm and anti-logarithm computation. An optimization algorithm is also developed to find the optimal parameters of the hardware architecture. Thanks to the optimization algorithm and the LUT size reduction method, the hardware complexity and computation delay of the logarithm and anti-logarithm computation modules can be reduced significantly with the same accuracy as other methods. This method can be applied both to logarithm/anti-logarithm function generators in DSP systems and to the domain converters in hybrid number system processors. Finally, some specific applications and a prototype computation system based on these IP cores are developed to clarify the improvements of the proposed methods. With the improvements achieved, the proposed methods can be considered as potential candidates for modern and future DSP systems.

List of Abbreviations

ADC Analog to Digital Converter
ADP Area-Delay Product
ALOGC Anti-logarithmic Converter
ALU Arithmetic Logic Unit
AU Arithmetic Unit
ASIC Application Specific Integrated Circuit
CMOS Complementary Metal Oxide Semiconductor
CORDIC COordinate Rotation DIgital Computer
CSC Color Space Conversion
CSD Canonical Signed Digit
DA Distributed Arithmetic
DAC Digital to Analog Converter
DSP Digital Signal Processing
FET Field Effect Transistor
FPGA Field Programmable Gate Array
GPP General Purpose Processor
HIL Hardware In the Loop
HNS Hybrid Number System
IP Intellectual Property
ITRS International Technology Roadmap for Semiconductors
KCM Constant Coefficient Multiplier
LE Logic Element
LMS Least Mean Square
LNS Logarithmic Number System
LOGC Logarithm Converter
LSB Least Significant Bit
LUT Look-up Table
MCU Microcontroller Unit
MRAM Magnetic Random Access Memory
MSB Most Significant Bit
MTM Multipartite Table Method
MFCC Mel Frequency Cepstral Coefficients

PAC Phase to Amplitude Converter
PPM Partial Product Matrix
RAM Random Access Memory
ROM Read-Only Memory
SoC System-on-Chip
SR Speech Recognition
TTA Transport Triggered Architecture
VHDL Very High Speed Integrated Circuits Hardware Description Language
VLSI Very Large Scale Integration

Contents

1 Introduction 1 1.1 Context of Research and Motivation ...... 1 1.1.1 Trend of DSP Market ...... 1 1.1.2 Trend of DSP Features ...... 1 1.1.3 Motivation ...... 2 1.2 Objective and Scope of the Dissertation ...... 5 1.3 Original Contributions ...... 7 1.4 Dissertation Overview ...... 8

2 Background and General Approach 13 2.1 Introduction to DSP Systems ...... 13 2.2 Hardware Platforms and DSP Implementation Methods ...... 14 2.2.1 General Purpose Processor (GPP)-Based Implementation .14 2.2.2 DSP Processor-Based Implementation ...... 15 2.2.3 ASIC-Based DSP Implementation ...... 15 2.2.4 FPGA-Based DSP Implementation ...... 16 2.2.5 Implementation Methods for DSP Applications ...... 17 2.3 Concept of LUT-Based Computation and Design Issues ...... 17 2.4 Literature Review of LUT-Based Computation Methods ..... 20 2.5 General Approach for Efficient LUT-Based IP Cores ...... 23

3 LUT-Based Truncated Multiplier 27 3.1 Introduction to LUT-Based Truncated Multiplier ...... 29 3.1.1 Truncated Multiplier ...... 29 3.1.2 LUT-Based Truncated Multiplier ...... 29


3.2 Proposed LUT-Based Truncated Multiplier ...... 32 3.2.1 Proposed Multiplier Architecture ...... 32 3.2.2 LUT Optimization ...... 33 3.2.3 Synthesis and Implementation Results in FPGA Hardware 36 3.3 General Multipartite LUT Split Technique ...... 37 3.4 Chapter Conclusions ...... 39

4 Hybrid LUT-Based Architecture for Fixed-Width Squarer 41 4.1 Fixed-Width Squarer ...... 42 4.2 LUT-Based Fixed-Width Squarer ...... 44 4.3 Proposed Hybrid LUT-Based Fixed-Width Squarer ...... 45 4.4 Error Analysis of Proposed Fixed-Width Squarer ...... 48 4.5 Implementation and Measurement Results ...... 49 4.6 Chapter Conclusions ...... 50

5 LUT-Based Sine Function Computation 59 5.1 Sine Function Computation ...... 59 5.2 Direct Digital Frequency Synthesizer (DDFS) ...... 60 5.3 Phase to Amplitude Converter in DDFS ...... 60 5.4 LUT Compression Methods for Phase to Amplitude Converter .. 61 5.4.1 Sine Wave Symmetry Exploitation ...... 62 5.4.2 Coarse/Fine LUT Splitting Method ...... 62 5.4.3 Sunderland and Nicholas Architectures ...... 63 5.4.4 Difference Method ...... 64 5.4.5 Multipartite Table Method ...... 65 5.5 Improved Linear Difference Method ...... 65 5.6 Implementation Results ...... 67 5.7 Chapter Conclusions ...... 70

6 LUT-Based Logarithm and Anti-logarithm Computation 73 6.1 Introduction ...... 73 6.2 Logarithm and Anti-logarithm Approximation Methods ...... 74 6.2.1 Mitchell Approximation ...... 75 6.2.2 Shift-Add Piece-Wise Linear Approximation ...... 76


6.2.3 Table-Based Approximation Methods ...... 77 6.2.4 Combined LUT-Based/Difference Method ...... 77 6.3 Proposed Quasi-Symmetrical Approach ...... 78 6.4 Implementation of the Fundamental Functions ...... 82 6.5 Applications in Logarithm Generator and HNS Processors .... 84 6.5.1 Logarithm Function Generator for DSP Applications ... 84 6.5.2 Hybrid Arithmetic Unit For HNS Processors ...... 86 6.6 Chapter Conclusions ...... 87

7 Prototype and Applications of Proposed IP Cores 91 7.1 Chapter Objectives and Targeted Applications ...... 91 7.1.1 Chapter Objectives ...... 91 7.1.2 Targeted Applications of Proposed IP Cores ...... 92 7.2 Design Flow for Applications using Proposed IP Cores ...... 93 7.2.1 Design Flow ...... 93 7.2.2 Advantages of Using Proposed IP Cores for DSP Applications 93 7.3 A Prototype Computation System for the Proposed IP Cores ... 94 7.3.1 Prototype Computation System Architecture ...... 95 7.3.2 System Implementation Results ...... 98 7.3.3 Verification and Measurement Method ...... 99 7.4 Design Examples Using Proposed IP Cores ...... 100 7.4.1 RGB to YCbCr Color Space Conversion Using Proposed Multiplier IP Core ...... 100 7.4.1.1 Color Space Conversion and Its Hardware Archi- tecture ...... 100 7.4.1.2 FPGA Implementation and Verification Results . 103 7.4.2 2-D Convolver for Image Processing ...... 105 7.4.2.1 Image Convolver ...... 105 7.4.2.2 Convolver Hardware Architecture and Implemen- tation Results ...... 106 7.4.3 Speech Feature Extraction Using Proposed Logarithm Gen- erator ...... 107 7.4.3.1 Speech Recognition ...... 107


7.4.3.2 Speech Feature Extraction ...... 108 7.4.3.3 Hardware Architecture and Implementation Results108 7.5 Chapter Conclusions ...... 109

8 Conclusions and Future Work 117 8.1 Conclusions ...... 117 8.2 Future Work ...... 118

A Sine/Cosine Computation Using Pipelined CORDIC 121 A.1 CORDIC Algorithm ...... 121 A.2 Pipelined CORDIC Architecture ...... 123 A.3 Implementation Results ...... 124

B Operation List of the Conventional ALU Component 127

C List of Publications 129 C.1 Journal Papers ...... 129 C.2 Book Chapter ...... 130 C.3 International Conference Presentations ...... 130 C.4 Technical Report ...... 131

List of Figures

1.1 Operation percentage for OpenGL TnL running on a conventional DSP processor ([3])...... 3 1.2 Trend in memory domination (%) in SoC content ([4])...... 6 1.3 Product function size trends (Source: ORTC 2011 ITRS Korea Winter Public Conference - Executive summary [4])...... 10 1.4 Summary of the motivation and the dissertation overview ..... 11

2.1 General block diagram of a DSP system...... 14 2.2 Hardware platforms for implementation of DSP applications. ... 16 2.3 General LUT-based computation system...... 18 2.4 Combined APC-OMS architecture for LUT-based multiplier [19]. .22 2.5 General approach for efficient LUT-based IP cores...... 25

3.1 Introduction to LUT-Based Truncated Multiplier...... 28 3.2 General LUT-based multiplier...... 31 3.3 Proposed W × M-bit LUT-based truncated multiplier architecture (a) and its partial product matrix with W = M = 8 and δ = 2 (b). 34 3.4 Proposed sub-optimal LUT content optimization algorithm. ... 36 3.5 Error analysis results for the proposed 8-bit multiplier with different numbers of guard bits (solid line), compared with conventional truncated multiplier (dashed line)...... 37

4.1 Partial product matrix of the 8-bit fixed-width squarer...... 43 4.2 General LUT-based full-width squarer...... 45 4.3 Proposed hybrid LUT-based architecture for fixed-width squarer. 46 4.4 Coarse/fine LUT splitting method to further reduce the LUT size. 48


4.5 ADP results for different values of W ...... 50 4.6 ADP reduction results for different values of W (compared with Garofalo et al. method in [47] )...... 51 4.7 Chip microphotograph of the proposed 8-bit fixed-width squarer. .53 4.8 Test and measurement flow for the proposed fixed-width squarer. 55 4.9 Simulation result in Modelsim software (left) and measurement result in logic analyzer (right)...... 56 4.10 Critical path analysis result in Synopsys Design Compiler. .... 57 4.11 Longest path delay measurement result...... 57

5.1 General block diagram and signal flow of the DDFS...... 61 5.2 Operation principle of DDFS with the digital phase wheel. .... 62 5.3 General DDFS with sine wave symmetry exploitation...... 63 5.4 Coarse/Fine LUT splitting...... 63 5.5 Sunderland method...... 65 5.6 General difference method...... 66 5.7 Proposed function (J = 4) vs Hsu difference function (J =3). .. 68 5.8 Architecture of phase to amplitude converter using the improved linear difference method...... 69 5.9 Digital sine wave output of the proposed DDFS in Modelsim sim- ulation...... 70

6.1 Area breakdown of a 3-D graphics chip in [3]...... 75 6.2 Logarithmic and anti-logarithmic Mitchell error functions. .... 76 6.3 General difference method for logarithm approximation...... 78 6.4 The 4-segment linear method with small error LUT proposed in [78]. 79 6.5 The idea for the proposed quasi-symmetrical approach...... 80 6.6 Proposed 2-step parameter optimization algorithm for the hardware approximation of log2(1 + x)...... 83 6.7 Proposed quasi-symmetrical linear approximation method (after the parameter optimization)...... 84

6.8 Proposed architecture for the approximation of log2(1 + x) with W = 13...... 85 6.9 A 16-bit logarithm function generator using the proposed method. 88


6.10 An arithmetic unit for HNS processors using proposed converters. 89 6.11 Layout of the proposed 16-bit logarithm function generator. ... 89

7.1 Chapter objectives...... 92 7.2 Design flow for DSP applications using proposed IP cores. .... 94 7.3 Merged DSP core/MCU/coprocessor architecture as a trend for future DSP system architecture...... 96 7.4 Hardware architecture of the prototype computation system. ... 97 7.5 Layout of the prototype computation system...... 99 7.6 Area breakdown of the prototype computation system...... 100 7.7 Verification and measurement flow for the prototype computation system...... 102 7.8 Proposed RGB to YCbCr color space conversion using LUT-based constant multiplier...... 104 7.9 Verification flow with DSP Builder using Matlab simulation and HIL...... 110 7.10 Operation of a 2-D convolver for image processing...... 112 7.11 Hardware architecture of a 2-D convolver for image processing. . . 112 7.12 Block diagram of speech recognition system...... 114 7.13 Feature extraction hardware architecture for the speech recognition system...... 114 7.14 Hardware architecture of the logarithm computation block for MFCC feature extraction in the speech recognition application ...... 115 7.15 Verification results of the logarithm computation block for MFCC feature extraction in the speech recognition application ...... 116

8.1 TTA processor architecture...... 119

A.1 Vector rotation in CORDIC algorithm...... 123 A.2 A stage in the pipelined CORDIC architecture...... 125 A.3 Block diagram of the CORDIC-based DDFS...... 125

List of Tables

1.1 Class of application most suitable for each emerging research tech- nology entry evaluated (Source: ITRS [4])...... 5

2.1 Comparison of different hardware platforms for DSP implementation. 17 2.2 The truth table of an LUT-based multiplier using APC method with two operands X and A...... 23 2.3 The truth table of an LUT-based multiplier using OMS method with two operands X and A...... 24

3.1 Error analysis results for the proposed 8-bit multiplier with different numbers of guard bits (δ), compared with conventional truncated multiplier...... 35 3.2 Implementation results of different KCM methods in Altera Cyclone II FPGA...... 38

4.1 Four-folding method for hybrid LUT-based fixed-width squarer. .47 4.2 Error analysis for different fixed-width squarer architectures. ... 52 4.3 LUT parameters and compression ratio of the proposed fixed-width squarer designs with LUT splitting technique...... 53 4.4 Implementation results of different fixed-width squarer architec- tures using 0.18-μm CMOS technology...... 54 4.5 ADP reduction with different values of W (compared with the method in [47] )...... 55

5.1 Computation results of word size reduction for different numbers of segments...... 69


5.2 Comparison with the previous methods...... 70

6.1 Proposed 2-step optimization algorithm...... 82 6.2 Optimal parameter results using the 2-step optimization algorithm. 82 6.3 Summary of optimal parameters and error analysis results for the proposed method compared with the method presented in [78] .. 85 6.4 FPGA implementation results of different methods for approximation of log2(1 + x) and (2^x − 1) with W = 13...... 86 6.5 Implementation results in 0.18-μm CMOS technology of different methods for approximation of log2(1 + x) and (2^x − 1) with W = 13. 87 6.6 FPGA implementation results of different methods for the 16-bit logarithm generator...... 87 6.7 Implementation results in 0.18-μm CMOS technology of different methods for the 16-bit logarithm generator...... 88

7.1 List of operations supported by the prototype computation system. 98 7.2 Main parameters for the ASIC implementation of the prototype computation system...... 101 7.3 Area consumption by components in the prototype computation system (×10^3 μm^2)...... 101 7.4 Comparison of FPGA implementation results of the color space converter...... 105 7.5 Verification results using Altera DSP flow by hardware co-simulation with Cyclone II FPGA in Matlab-Simulink environment...... 111 7.6 ASIC implementation results of the proposed RGB to YCbCr color space converter using a standard cell library with 0.18-μm CMOS technology...... 112 7.7 FPGA implementation results of 2-D convolver with the input/output resolution of 8 bit...... 113 7.8 Verification result of 2-D convolver...... 113 7.9 ASIC implementation results of 2-D convolver with 0.18-μm CMOS technology using a standard cell library...... 114


7.10 FPGA implementation results of logarithm computation block in MFCC feature extraction with the input/output resolution of 16 bit ...... 115 7.11 ASIC implementation results of logarithm computation block in MFCC feature extraction with 0.18-μm CMOS technology using a standard cell library...... 115

A.1 Implementation results for the pipelined CORDIC-based DDFS. . 125

B.1 List of operations supported by the conventional ALU component. 128

Chapter 1

Introduction

1.1 Context of Research and Motivation

1.1.1 Trend of DSP Market

The digital signal processing (DSP) global market is growing strongly due to the fast increasing demand of DSP applications. According to a report in [1], the global DSP market by revenue (in Intellectual Property (IP), Design Architecture and Applications) is estimated to grow from $6.20 billion in 2011 to $9.58 billion in 2016, with a Compound Annual Growth Rate (CAGR) of 9.09%. The percentage share of the DSP industry in the global semiconductor revenue has been approximately between 1% and 3% over the years, standing at 1.97% in 2011. The rising demand for DSPs in the wireless infrastructure sector is one of the most prominent reasons for this growth. The rising trend of several advanced performance technologies, such as DSP-based System-on-Chip (SoC), is another reason accountable for the significant potential growth of the DSP market.

1.1.2 Trend of DSP Features

With the increasing demand of high performance, low area and low power DSP applications and the continuous development of integrated circuit technology, the following trends of DSP features can be predicted.


• Modern DSP cores are coming to support higher clock frequencies. For example, Texas Instruments released the C6000 series DSP processors, which have clock speeds of 1.2 GHz. However, the clock frequency will be limited, and other approaches should be considered.

• Multicore/multiprocessor architectures are employed more and more in modern DSP processor cores to increase the performance of the DSP system and to work around the clock frequency limitation. Moreover, DSP processor architecture is moving toward many-core approaches to achieve higher performance.

• The hybrid (or merged) DSP core/Microcontroller Unit (MCU)/coprocessor architecture is also a trend for modern and future DSP systems due to the high convergence of multiple application types in DSP and multimedia systems.

• More dedicated components are predicted to be present in DSP systems to accelerate the system and provide higher performance adaptive signal processing.

• Mixed and hybrid number system support is a promising approach to achieve a good trade-off between computation performance and power consumption.

1.1.3 Motivation

With the above trends of the DSP market and architecture, modern and future DSP systems are expected to employ more and more multiplication, squaring and other complex operations, and more dedicated computation modules, to meet the requirements of more complicated algorithms and applications, especially for 3-D graphics and high-speed multimedia processing systems. Employing many multiplier circuits, especially parallel architectures, can lead to high system complexity and high power consumption. As pointed out in [2], the large majority (about 90%) of the operations in a 3-D graphics rendering application are multiplication and related arithmetic operations. According to the research by B. G. Nam et al. [3], a typical 3-D graphics processor employs a lot of complex operations such as multiplication (MUL), division (DIV), squaring (SQ) and powering (POW), as presented in Figure 1.1, together with basic operations such as addition (ADD).


Figure 1.1: Operation percentage for OpenGL TnL running on a conventional DSP processor ([3]): MUL 30%, ADD 24%, SQ 24%, POW 16%, DIV 5%, others 1%.

Hence, low complexity arithmetic components are highly required for future DSP applications. Moreover, the continuous development of integration technology allows the design and implementation of sophisticated processors for complex algorithms with high computation speed. Also, the higher integration density makes DSP circuits smaller. However, with the fast increasing demand for wireless, mobile and portable devices such as smart phones, tablets and wireless-based intelligent electronic devices, the requirements of low area and low power DSP and multimedia systems are becoming more and more pressing. For these applications, not only high speed but also low area and low power consumption circuits are highly required (especially for battery-based devices). Meanwhile, parallel multipliers and the above mentioned complex operations often result in high system complexity and high power consumption. Therefore, an alternative approach should be considered to derive efficient computation systems without complicated multipliers while achieving high computation performance.

Using memory-based computation, or LUT-based computation more generally, is such an alternative approach, and it has been presented in the literature. Meanwhile, according to the 2010 report of the International Technology Roadmap for Semiconductors (ITRS) [4], embedded memories are going beyond the 16-nm technology generation and becoming faster, achieving higher density and lower dynamic power consumption, and will dominate the content of System-on-Chip (SoC) applications. Figure 1.2 presents the trend of memory domination in SoC content. Table 1.1 shows 8 candidate memory technologies which have been evaluated in the ITRS report. In this report, the Emerging Research Devices (ERD) and Emerging Research Materials (ERM) working groups of the ITRS identified Spin Transfer Torque MRAM and Redox RRAM as emerging memory technologies recommended for accelerated research and development, leading to scaling and commercialization of non-volatile RAM to and beyond the 16-nm generation. Magnetic memory/logic is predicted to dominate future electronic systems ([5] and [6]). Also, the development of advanced 3-D integration technology will further promote this trend [7]. Figure 1.3 depicts the trend of product function size reported at the ORTC 2011 ITRS Korea Winter Public Conference. It can be seen that the function size is becoming smaller, corresponding to higher chip density. On the other hand, the density of a memory circuit is higher than that of a logic circuit. Moreover, P. Meinerzhagen et al. [9] present an approach of using a standard cell library to implement specific memories in 65-nm technology. With the availability of different methods for efficient memory and LUT implementation, the LUT-based architectures and methods presented in this dissertation will be useful for DSP designers when choosing a design method and architecture for the arithmetic functions and modules in DSP applications. Especially when the advanced memory technologies mentioned previously become popular, these architectures and methods can be employed even more efficiently. Therefore, using LUT-based computation combined with some simple logic circuits is a promising solution for this alternative approach.


Table 1.1: Class of application most suitable for each emerging research technol- ogy entry evaluated (Source: ITRS [4]). Emerging Research Memory Technology Entry Standalone Embedded √ Ferroelectric-gate FET √ Nanoelectromechanical RAM √ Spin Transfer Torque MRAM √ √ Nanoionic or Redox Memory √ √ Nanowire Phase Change Memory (PCM) √ Electronic Effects Memory √ √ Macromolecular memory √ √ Molecular memory

LUT-based computation refers to a type of digital circuit/system which provides the computation results by accessing pre-stored tables rather than performing actual computations. The LUT-based circuit can lead to high speed computation because it mainly involves only the time for accessing the pre-stored tables. It can also reduce the dynamic power consumption because of the minimal switching rate. The LUT can be implemented by either application-specific embedded memory circuits or optimized logic circuits. Some examples of application-specific memories are the low power memories for mobile devices and consumer electronics products, the high-speed memories for multimedia applications, the wide temperature memories for automotive applications, the high reliability memories for biomedical instruments and the radiation hardened memories for space applications.

1.2 Objective and Scope of the Dissertation

The objective of the research presented in this dissertation is to find efficient architectures and methods for arithmetic components based on the LUT architecture for specific DSP applications. Since DSP applications have some specific characteristics which differ from general computation applications, the architectures and implementation methods are required to be adapted appropriately. Moreover, the following important characteristics of DSP applications should also be considered:


Figure 1.2: Trend in memory domination (%) in SoC content ([4]).

• Error-tolerance: Error is acceptable in many DSP applications such as digital filtering, video and image processing. Therefore, the hardware complexity can be reduced significantly by employing some techniques such as truncated multipliers and squarers, LUT size reduction and so on.

• Low complexity DSP applications are highly desired because of the high demand for portable, hand-held and wearable electronic devices. Electronic devices are becoming smaller, lighter and thinner. Even though integration technology is moving to deep sub-micron nodes, it is still highly required to develop more efficient methods at the architecture and system levels for lower hardware complexity DSP applications.



• Real-time operation. Since DSP applications are often real-time, high-speed computation circuits are desired, and the speed (or delay) of the computation module has to be taken into account in the design methodology and optimization algorithms.

• Low power DSP applications are becoming more and more popular due to the increasing demand of battery-based devices. Hence, the power consumption should be reduced in future DSP systems.

Therefore, the research work presented in this dissertation aims to find efficient methods which are suitable for DSP applications considering the above characteristics. For example, due to the error-tolerance of DSP systems, methods for low error truncated multipliers and squarers using LUT-based architectures are proposed to reduce the hardware area significantly while achieving acceptable computation accuracy. Also, to meet the requirement of low complexity DSP applications, research on finding an efficient method for low area logarithmic and anti-logarithmic converters is carried out for applications in DSP systems and in hybrid number system processors.

1.3 Original Contributions

Several contributions on the architectures and design methodologies for low area, high performance LUT-based IP cores and systems have been made in this work. Parts of these contributions have been published or submitted for publication. The following list summarizes the main contributions within the scope of this work.

1. Two efficient architectures for the LUT-based truncated multiplier and fixed-width squarer are proposed. Since multipliers and squarers are heavily employed in many DSP applications, the proposed methods can open up new methods and suggestions for designers and system developers of digital


hardware design related to DSP systems. Two papers based on these methods were accepted for publication in the IEICE Transactions on Fundamentals in June and July 2012, respectively.

2. A new, efficient and more general linear difference method combined with LUT-based error correction is proposed for the design of the phase to amplitude converter in the Direct Digital Frequency Synthesizer (DDFS). The paper based on this method was published in the IEICE Transactions on Fundamentals in March 2011.

3. The novel quasi-symmetrical approach for low area, efficient logarithmic and anti-logarithmic converters is proposed. Since logarithmic and anti-logarithmic converters are essential components in hybrid number system processors and many DSP applications, the proposed approach can have a great impact on improving the system performance in those applications. The paper based on this method was submitted for publication in the IEICE Transactions on Fundamentals.

1.4 Dissertation Overview

The dissertation is divided into 8 chapters. Figure 1.4 presents the relationship between the chapters of this dissertation. After this introduction chapter, Chapter 2 provides the background knowledge and design issues of DSP applications and LUT-based computation systems. The existing methods and architectures for LUT-based computation are also reviewed and discussed in this chapter. Chapter 3 presents the efficient LUT-based truncated multiplier for constant multiplication, which is heavily employed in many DSP applications. By using the proposed parameter and LUT content optimization, significant improvements in both circuit area and delay can be achieved. Squaring is also a fundamental operation in many DSP applications. Therefore, an improved hybrid LUT-based architecture for the design of a low error, efficient fixed-width squarer is proposed in Chapter 4.

The mathematical identities of the squaring operation are exploited in a new way so that a low error, efficient architecture for the fixed-width squarer can be achieved. The implementation and chip measurement results in 0.18-μm CMOS technology are presented and discussed. In Chapter 5, an improved linear difference approximation combined with the LUT-based architecture is proposed for sine function computation, targeted at the phase to sine converter in the Direct Digital Frequency Synthesizer (DDFS). By employing parameter optimization in Matlab software, an optimal architecture for the linear difference method is derived. Chapter 6 presents a novel approach to logarithmic and anti-logarithmic converters, which are essential components in hybrid number system processors and many DSP applications. The novel quasi-symmetrical approach and the proposed parameter optimization algorithm to achieve more area-speed efficient converters are also presented. Moreover, the applications of the proposed converters to logarithm/exponent function generators and the arithmetic unit for hybrid number system processors are proposed in this chapter. The implementation results on both FPGA and 0.18-μm CMOS technology platforms are also presented. Chapter 7 presents the applications of the proposed IP cores and a prototype computation system based on these cores. Moreover, the design flow and detailed design examples are also provided in this chapter. Finally, Chapter 8 summarizes the main results of this work, concludes this dissertation and suggests a list of open topics for future research.


Figure 1.3: Product function size trends (Source: ORTC 2011 ITRS Korea Winter Public Conference - Executive summary [4]).


Figure 1.4: Summary of the motivation and the dissertation overview: DSP applications (high speed and low power requirements) motivate LUT-based computation; basic arithmetic operations: multiplier (Chapter 3), squarer (Chapter 4); elementary functions: sine generator (Chapter 5), logarithm generator (Chapter 6); prototype and applications (Chapter 7).


Chapter 2

Background and General Approach

2.1 Introduction to DSP Systems

The signals in the real world are analog by nature. However, digital computers and many electronic devices operate on data represented in binary format, composed of a finite number of bits. In a DSP system, the original analog signals are converted to digital ones which are represented by sequences of finite-precision numbers, typically binary numbers in which only the two logic values '0' and '1' are used. The signal processing tasks are performed by digital computation components [10]. Figure 2.1 presents the general block diagram of a typical DSP system. As shown in this figure, a DSP system receives the input signal, processes it in the digital domain and generates the outputs according to the given algorithms. The analog and digital parts in this system interact through analog to digital converters (ADC) and digital to analog converters (DAC). The main component in this system is the DSP module which performs the signal processing algorithms in the digital domain. The input and output filters are also required to guarantee the suitable frequency band for the DSP system.



Figure 2.1: General block diagram of a DSP system.

2.2 Hardware Platforms and DSP Implementation Methods

2.2.1 General Purpose Processor (GPP)-Based Implementation

General purpose processors (GPP) have been employed widely in many computing systems over the past decades. Although DSP applications require specific operations and are computation-intensive, general purpose processors can be used to implement them. The MMX processor is a typical GPP which supports DSP and multimedia applications [11]. The PowerPC 604e processor can also be used to implement DSP applications [12]. The high applicability of this type of processor has led to its wide utilization. Due to their merit of high flexibility, GPPs can perform a wide range of applications. The flexibility inherent in GPPs was a key component of the computer revolution. To date, processors have been the driving engine behind general-purpose computing. Originally, due to the limitation of active chip area, processors focused on the heavy reuse of a single or small number of functional units. With the continuous development of Very Large Scale Integration (VLSI) technology, we can now integrate complete and powerful processors and other computation modules onto a single integrated circuit. Therefore, some alternatives to GPP-based DSP implementation will be introduced in the following sections.


2.2.2 DSP Processor-Based Implementation

Digital Signal Processors are often abbreviated as DSP (the same abbreviation as for Digital Signal Processing). Therefore, in this dissertation, the term 'DSP processor' is used to prevent confusion between the two concepts. In this kind of DSP implementation, the specific DSP application is developed as software in a specific programming language and is then executed by a DSP processor. Moreover, a DSP processor is a dedicated processor with the following characteristics to support DSP applications:

• Real-time digital signal processing capabilities. DSP processors typically have to process data in real time, i.e., the correctness of the operation depends heavily on the time when the data processing is completed.

• High throughput. DSP processors can sustain processing of high-speed streaming data, such as audio and multimedia data processing.

• Deterministic operation. The execution time of DSP programs can be foreseen accurately, thus guaranteeing a repeatable, desired performance.

• Re-programmability by software. Different system behaviors might be obtained by re-coding the algorithm executed by the DSP processor instead of hardware modifications.

DSP processors appeared on the market in the early 1980s. Over the last three decades, they have been the key enabling technology for many electronics products in fields such as communication systems, multimedia, automotive, instrumentation and military applications.

2.2.3 ASIC-Based DSP Implementation

Application Specific Integrated Circuit (ASIC)-based implementation can be employed for DSP applications to design high performance DSP systems. With the hardware optimized for a specific application, the ASIC-based DSP implementation method can lead to high performance, area efficient, low power DSP designs. However, it requires high design effort and time. Therefore, this approach is well-suited for high-volume DSP products.


Figure 2.2: Hardware platforms for implementation of DSP applications: GPP, DSP processor, FPGA and ASIC, ordered from high programmability to high specialization.

2.2.4 FPGA-Based DSP Implementation

FPGA, a typical reconfigurable hardware platform, can provide rapid prototyping and implementation for DSP applications [13]. It can combine the advantages of both the dedicated ASIC-based and the software-based DSP implementation approaches. With the development of reconfigurable hardware technologies, this approach is very promising for providing high performance, rapid implementation of DSP applications. Figure 2.2 presents the design spectrum of DSP applications over the above hardware platforms and their degrees of programmability and specialization. Table 2.1 summarizes the comparison of several factors when using different hardware platforms to implement DSP applications. Depending on the specific features and requirements of each DSP application, the designer can choose a suitable hardware platform for the implementation.


Table 2.1: Comparison of different hardware platforms for DSP implementation.
Platform      | Performance | Cost   | Power  | Flexibility | Design effort
ASIC          | high        | high   | low    | low         | high
DSP processor | medium      | medium | medium | medium      | medium
GPP           | low         | low    | medium | high        | low
FPGA-based    | medium      | medium | high   | high        | medium

2.2.5 Implementation Methods for DSP Applications

The following methods can be used to implement a DSP application.

• Dedicated hardware design method. In this type of implementation, a DSP function/application is performed by a specific and dedicated hardware module. Digital filters, encoders and decoders, and in particular the Fast Fourier Transform (FFT) and Finite Impulse Response (FIR) filters, are typical examples of this approach in DSP implementation. The hardware platform for this method can be FPGA or ASIC hardware.

• Software-based design using a general purpose processor (GPP) or a DSP processor. The software to implement the DSP algorithm is developed based on an architecture of a specific GPP or DSP processor.

• The combination of the two above approaches, called the hardware/software co-design method, can be employed for a complicated DSP application ([14] and [15]).

The proposed hardware IP cores in this research can be used with all above implementation methods for DSP applications.

2.3 Concept of LUT-Based Computation and Design Issues

The concept of memory-based computation, first in the form of ROM-based computation, was employed in the IBM 1620 computer announced by IBM in 1959 [16]. In this computer, addition, subtraction and multiplication are accomplished by automatic table look-up in the core storage, whereas division is accomplished by available subroutines or by an optional automatic divide feature. This concept was also presented in some textbooks such as [17] and [18]. The basic idea behind this concept is that a memory array often has higher density than a logic circuit [18]. It promisingly results in high speed computation because only the time for accessing pre-stored memory arrays is required, without actual computation circuits. It can also provide some other advantages, as discussed later. However, since the required memory size grows exponentially when the operand bit-width increases, and since the cost of memory devices was high and embedded memories were not popular at that time, this approach was not widely used in the past. Currently, as a result of the continuous development of integrated circuit and memory technologies mentioned in Section 1.1, the approach of memory-based computation, and the more general concept of LUT-based computation, can be realized efficiently [5]. Bipul C. Paul and his colleagues at Toshiba Corporation have proposed an efficient implementation of a 16-bit multiplier using a ROM-based method with a full custom design flow [8]. The following paragraphs provide the basic concept of LUT-based computation and its background knowledge in more detail. Firstly, LUT-based computation refers to a type of digital circuit/system which provides the computation results by accessing pre-stored tables rather than by actual computation. Figure 2.3 shows the block diagram of a general LUT-based computation circuit with a W-bit input X used as the address for accessing an LUT of 2^W words to provide the output Y with a width of (W + M) bits.

Figure 2.3: General LUT-based computation system.
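To make this concept concrete, the following small C sketch models the system of Figure 2.3 in software (an illustration only, not a module from this dissertation): the table is filled once, off-line, for an arbitrary single-operand function, and the run-time evaluation is a single array access. The widths W and M and the squaring example are arbitrary illustrative choices.

#include <stdint.h>
#include <stdlib.h>

/* Toy C model of the LUT-based computation unit of Figure 2.3: a
 * W-bit input X addresses a table of 2^W pre-computed words, so
 * evaluating f(X) at run time is one memory access, no arithmetic. */
#define W 8                        /* input (address) width          */
#define M 8                        /* extra output bits              */

typedef uint16_t word_t;           /* a (W + M)-bit result fits here */

/* Example stored function: squaring of an 8-bit operand. */
static word_t square8(uint8_t x) { return (word_t)(x * x); }

/* Build the table once, off-line, for any single-operand function f. */
static word_t *build_lut(word_t (*f)(uint8_t)) {
    word_t *lut = malloc(sizeof(word_t) << W);      /* 2^W words */
    for (unsigned x = 0; x < (1u << W); x++)
        lut[x] = f((uint8_t)x);
    return lut;
}

/* Run-time "computation": table access only. */
static word_t lut_eval(const word_t *lut, uint8_t x) { return lut[x]; }

int main(void) {
    word_t *sq_lut = build_lut(square8);
    int ok = (lut_eval(sq_lut, 13) == 169);          /* 13^2 looked up */
    free(sq_lut);
    return ok ? 0 : 1;
}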


Using the LUT-based computation approach has several advantages [19]. The following list summarizes the main advantages of the LUT-based computation approach.

• The LUT-based approach can result in high speed computation and has the potential for high-throughput and reduced latency implementation because it mainly relies on the time for accessing the pre-stored tables.

• LUT-based computation systems promise to consume less dynamic power due to the minimization of switching activity.

• LUT-based architectures are easier to design and more regular compared with conventional logic circuits such as the multiply-add based method.

• The LUT-based computation approach is close to human-like computing. It is also suitable for the implementation of the activation functions in artificial neural networks [20].

With these advantages, LUT-based computation can be employed for a wide range of applications such as constant multiplication, squaring, digital filtering and orthogonal transforms for DSP applications, as well as for computing elementary functions (e.g., the sine and logarithm functions) and activation functions in artificial neural networks. However, as mentioned previously, the LUT-based computation approach has the disadvantage that the LUT size grows exponentially when the operand length increases. This may lead to high hardware complexity for high resolution designs. Therefore, much research has focused on methods to reduce the LUT size at the cost of some simple additional logic circuits. In this dissertation, some novel and improved methods are proposed for efficient LUT-based computation components and systems. Before providing the details of these methods in the next chapters, the following sections review the existing methods for LUT-based computation and its design issues. An LUT is characterized by its size, word length and content. The LUT size can be expressed in number of words or number of bits. The LUT size as shown in Figure 2.3 is 2^W words, or 2^W × (W + M) bits, since the word length in this case is (W + M) bits.

19 2.4 Literature Review of LUT-Based Computation Methods is (W + M) bits. The word length is defined by the number of data bit stored in each LUT word. Moreover, the following design issues needed to be considered when imple- menting LUT-based computation components for DSP applications.

• As mentioned previously, in a general LUT-based computation system, the LUT size grows exponentially when the input width (or address width of the LUT) increases. Therefore, the relationship between the LUT size and the input width should be considered in LUT-based computation, and many researchers have tried to reduce the growth rate of the LUT size with various proposed techniques (a small numerical illustration follows this list).

• Since DSP applications can tolerate errors, the LUT size can be reduced significantly by discarding some least significant bits of the results. However, the trade-off between accuracy and hardware complexity should be taken into account. Hence, an efficient LUT optimization method (for both LUT parameters and content) is required to achieve a good trade-off.

• LUT-based computation can be combined with other methods to derive hybrid LUT-based/logic architectures. This approach also requires considering the trade-off between the LUT-based computation part and the conventional logic part.
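As a hypothetical numerical illustration of these issues (the widths are example values, not design choices from this dissertation): a direct LUT for a W = 16-bit operand with a (W + M) = 32-bit result occupies 2^16 × 32 bits = 2 Mbit. If the operand is split into two 8-bit halves, X = X_H·2^8 + X_L, a linear function such as the constant product A·X can be formed as A·X_H·2^8 + A·X_L from two LUTs of 2^8 × 24 bits each (about 12 kbit in total) plus one adder, turning the exponential growth into a roughly linear one at the price of a small amount of extra logic.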

2.4 Literature Review of LUT-Based Computation Methods

In the IBM 1620 computer [16], the simple direct memory-based method was used to implement the arithmetic functions. However, due to this simple method and the low memory density at that time, this principle was not used widely in commercial computers. H. Ling in [21] and B. Vinnakota in [22] have proposed some methods of using LUTs to implement multiplier and squarer circuits. B. Parhami et al. also presented some improved methods for LUT-based computation in [23] and [24].


Memory-based and LUT-based methods are quite popular in digital filter design, as presented in [25], [26]. Also, LUT-based computation is applied to computing trigonometric functions such as the arctangent function [27] and the sine function, which will be presented in detail in Chapter 5. G. Inoue [28] announced an invention using semiconductor memory combined with an adder and some small control logic to implement an 8-bit multiplier. The LUT-based method is also employed widely to implement some transformations in digital signal processing such as the DFT and DCT [29]. Moreover, Bipul C. Paul et al. [8] present another approach using full custom specific memory arrays to implement arithmetic functions. Due to the advantages of the full custom flow and advanced integrated circuit technology, this method can improve the speed and area efficiency of the circuits significantly. However, it has the disadvantages of a lack of flexibility and a long design process. P. K. Meher ([19] and [30]) has proposed some methods for LUT-based computation, including optimizing methods for LUT-based constant multipliers such as the odd multiple storage (OMS) scheme, the anti-symmetric product coding (APC) scheme, the input coding scheme and their combined techniques. The main idea of these methods is to reduce the LUT size by exploiting mathematical identities. It is reported that these methods outperform other methods in the literature and can be promising candidates for future DSP and wireless communication systems. The truth tables for the APC and OMS methods are presented in Tables 2.2 and 2.3, respectively. Figure 2.4 shows the combined APC-OMS architecture for the LUT-based multiplier with the operand X and a constant A. In the APC method, by exploiting the mathematical identities of anti-symmetric product coding, the input X is recoded to the address x3x2x1x0 (for the case of a 4-bit address) for the LUT so that the LUT size is reduced by half. In the APC-OMS method, by additionally employing odd multiple storage, the LUT size can be further reduced by half compared with the APC method. The addresses for the modified LUT in these architectures are generated by an address generator and control circuit and an address decoder, as shown in Figure 2.4. The simple shifting operations in Table 2.3 are performed by a barrel shifter.



Figure 2.4: Combined APC-OMS architecture for LUT-based multiplier [19].
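To illustrate the odd-multiple-storage idea of Table 2.3 in software, the following C sketch stores only the eight odd multiples of a constant and reconstructs every other product with a shift, mirroring the role of the barrel shifter in Figure 2.4. The constant A, the 4-bit operand width and the omission of the X = 0 case (handled by the control circuit in the real architecture) are illustrative simplifications, not the hardware of [19] or of this dissertation.

#include <stdint.h>
#include <stdio.h>

/* Sketch of odd-multiple storage (OMS): only A, 3A, ..., 15A are
 * stored; a non-zero 4-bit input X is written as X = X_odd << s,
 * so the product is one stored word followed by a left shift. */
#define A 9u                       /* example constant coefficient */

static uint32_t oms_lut[8];        /* P0..P7 = A, 3A, 5A, ..., 15A */

static void oms_init(void) {
    for (int k = 0; k < 8; k++)
        oms_lut[k] = (2u * k + 1u) * A;
}

static uint32_t oms_multiply(uint8_t x) {      /* x in 1..15        */
    unsigned s = 0;
    while ((x & 1u) == 0) { x >>= 1; s++; }    /* peel trailing zeros  */
    return oms_lut[x >> 1] << s;               /* odd x: index (x-1)/2 */
}

int main(void) {
    oms_init();
    for (uint8_t x = 1; x < 16; x++)
        printf("%2u * %u = %3u\n", (unsigned)x, A, (unsigned)oms_multiply(x));
    return 0;
}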

Also, LUT-based computation can be applied efficiently to the activation functions in artificial neural networks [20]. Te-Jen Chang et al. [31] proposed an efficient LUT-based method for the squarer circuit. This method targets only the full-width squarer, with an application in cryptography. However, the above methods only focus on architectures for full-length computation, and only synthesis results are presented. Therefore, more research should be done to derive new and improved methods, especially for truncated operations in which the full length results are not compulsory due to the error tolerance of DSP applications. Moreover, the hardware complexity can be reduced significantly when truncation can be applied. Also, real chip implementations should be carried out to clarify the improvements of the proposed methods. To sum up, no method has been presented in the literature for truncated/fixed-width computation using the LUT-based method. This leaves room for more research in this field. Also, the existing methods should be improved for new algorithms and applications. Furthermore, some technology-independent architectures and methods need to be considered for more general methods that can be applied on multiple hardware platforms. Therefore, the research presented in this dissertation targets some novel and improved methods for LUT-based computation components and systems which can be employed in modern and future DSP applications.


Table 2.2: The truth table of an LUT-based multiplier using the APC method with two operands X and A.
X     | Product | X     | Product | x3x2x1x0 | APC word
00001 | A       | 11111 | 31A     | 1111     | 15A
00010 | 2A      | 11110 | 30A     | 1110     | 14A
00011 | 3A      | 11101 | 29A     | 1101     | 13A
00100 | 4A      | 11100 | 28A     | 1100     | 12A
00101 | 5A      | 11011 | 27A     | 1011     | 11A
00110 | 6A      | 11010 | 26A     | 1010     | 10A
00111 | 7A      | 11001 | 25A     | 1001     | 9A
01000 | 8A      | 11000 | 24A     | 1000     | 8A
01001 | 9A      | 10111 | 23A     | 0111     | 7A
01010 | 10A     | 10110 | 22A     | 0110     | 6A
01011 | 11A     | 10101 | 21A     | 0101     | 5A
01100 | 12A     | 10100 | 20A     | 0100     | 4A
01101 | 13A     | 10011 | 19A     | 0011     | 3A
01110 | 14A     | 10010 | 18A     | 0010     | 2A
01111 | 15A     | 10001 | 17A     | 0001     | A
10000 | 16A     | 10000 | 16A     | 0000     | 0

2.5 General Approach for Efficient LUT-Based IP Cores

Figure 2.5 presents the general approach for the IP cores proposed in this dissertation. To obtain a good trade-off between hardware complexity and computation performance, the original direct LUT is decomposed into a smaller LUT (the size-reduced LUT) and a simple logic circuit. A combinational circuit (Comb.) is also required to provide the final computation result. The existing and proposed methods to reduce the LUT size will be presented in the next chapters. The LUTs in this research are implemented on both FPGA and ASIC hardware platforms. For the purpose of a first evaluation of the proposed methods, the LUTs in FPGA devices and standard cells are used to realize the proposed LUT-based architectures.


Table 2.3: The truth table of an LUT-based multiplier using the OMS method with two operands X and A.
X (x3x2x1x0)           | Product value    | Number of shifts | Shifted input X | Stored APC word | Address (d3d2d1d0)
0001, 0010, 0100, 1000 | A, 2×A, 4×A, 8×A | 0, 1, 2, 3       | 0001            | P0 = A          | 0000
0011, 0110, 1100       | 3A, 2×3A, 4×3A   | 0, 1, 2          | 0011            | P1 = 3A         | 0001
0101, 1010             | 5A, 2×5A         | 0, 1             | 0101            | P2 = 5A         | 0010
0111, 1110             | 7A, 2×7A         | 0, 1             | 0111            | P3 = 7A         | 0011
1001                   | 9A               | 0                | 1001            | P4 = 9A         | 0100
1011                   | 11A              | 0                | 1011            | P5 = 11A        | 0101
1101                   | 13A              | 0                | 1101            | P6 = 13A        | 0110
1111                   | 15A              | 0                | 1111            | P7 = 15A        | 0111

However, in future research, advanced LUT hardware will be considered for the implementation to improve the performance, power and area efficiency. The details of the implementation issues for the proposed IP cores will be presented in the next chapters.



Figure 2.5: General approach for efficient LUT-based IP cores.


Chapter 3

LUT-Based Truncated Multiplier

High performance multipliers are essential components in most DSP and communication systems. In these systems, parallel multipliers are often employed to carry out high speed computations and transformations. However, general parallel multipliers often result in high hardware complexity and contribute large power consumption to the overall system. On the other hand, the rapid increase of mobile and portable devices leads to an emerging requirement for low power consumption and low complexity digital circuits [32]. Therefore, recently, much research has focused on architectures for low power, high speed multipliers. Truncated multiplication is an efficient method to significantly reduce the area and power consumption of multipliers for DSP applications in which the full length multiplication result is not required [33]. Moreover, in many DSP applications, when one of the multiplier inputs is constant, the hardware complexity can be reduced further by using a constant coefficient multiplier (KCM). For example, in digital filters, the coefficient set is fixed for a given transfer function. In this case, the KCM is more suitable than other approaches like the distributed arithmetic (DA)-based method [34]. The disadvantage of the KCM caused by its fixed hardware configuration for each constant coefficient set can be solved by using reconfigurable hardware such as FPGAs, resulting in a reconfigurable KCM. Moreover, LUT-based computation is promisingly suitable for the implementation of KCMs in DSP systems because the address width is reduced significantly when only one input of the multiplier is used for the LUT address.

Figure 3.1: Introduction to LUT-Based Truncated Multiplier.

Some recent research has pointed out that LUT-based computation is a highly potential candidate for future DSP systems due to its excellent speed merit, because it mainly relies on LUT access-only operations [34]. However, the direct LUT-based implementation of a high resolution KCM leads to high hardware complexity because the LUT size grows exponentially as the input length increases. Some methods have been proposed to reduce the LUT size in LUT-based full length multipliers ([34]-[37] and [19]), but more improvements are desired. Moreover, no research in the literature has considered the combination of LUT-based computation and the truncated multiplier method. Therefore, in this chapter, we propose an efficient LUT-based architecture for truncated multipliers which can be applied to modern and future DSP systems. Figure 3.1 presents the introduction and motivation for the research on the truncated LUT-based KCM.


3.1 Introduction to LUT-Based Truncated Multiplier

3.1.1 Truncated Multiplier

Truncated multiplication can be employed to significantly reduce the hardware area and power consumption of multipliers in DSP applications where the full length multiplication results are not necessary. The truncating operation can be implemented by rounding the final result or simply discarding some least significant bits (LSB) of the full bit-width product to get the desired bit-width output. However, the main drawback of these methods is the waste of hardware resources, because the unnecessary LSB part is generated before truncation. Hence, some methods have been proposed to approximate the discarded least significant part of the partial product matrix of the multiplier so that the hardware complexity is reduced significantly compared with standard multipliers. James E. Stine et al. [33] presented some popular methods for truncated multiplication, including Constant Correction Truncated (CCT), Variable Correction Truncated (VCT) and Hybrid Correction Truncated (HCT) multipliers, and their implementation in FPGA hardware. The reported results show that hardware and power consumption can be remarkably reduced when employing truncated multiplication. Recently, much research has focused on the improved VCT method with an optimal compensation technique to minimize the overall error. Nicola Petra et al. [40] proposed a sub-optimal linear compensation method and its implementation to provide a good trade-off between hardware complexity and multiplication accuracy. However, more improvements in multiplier area and performance are desired to meet the requirements of future DSP applications. Therefore, in this chapter, we propose a novel and efficient architecture for the LUT-based truncated multiplier that combines the two approaches of truncated multiplication and LUT-based computation.

3.1.2 LUT-Based Truncated Multiplier

As presented in Chapter 2, the LUT-based computation circuit provides the outputs by accessing the pre-stored LUT rather than performing actual computations.

This look-up-only operation leads to high speed computation because it mainly requires only the time for accessing the pre-stored tables. Also, the LUT-based circuits consume less dynamic power because of the lower bit-switching rate [19]. A typical LUT-based computation architecture, called the LUT-based multiplier, is shown in Figure 3.2, in which an LUT-based W × M (W-bit by M-bit) multiplier requires an LUT size of 2^{W+M} × (W + M) bits to store 2^{W+M} words with a word length of (W + M) bits. As a result, the LUT size increases exponentially as the input lengths W and M increase. In a KCM, however, if the second multiplier input (with the length of M bits) is constant, the LUT size is reduced to 2^W × (W + M) bits. In this chapter, we consider the constant multiplication of a W-bit input X with an M-bit binary constant coefficient A to produce the product Y. In full-width multiplication, the product Y has a length of (W + M) bits. The mathematical representations of the constant A and the product Y are:

A = \sum_{i=0}^{M-1} a_i 2^i    (3.1)

Y = \sum_{i=0}^{M-1} a_i X 2^i    (3.2)

where a_i denotes the i-th binary digit of the constant A. According to Equation (3.2), the simplest method, called the straightforward shift-add KCM, adds the shifted partial products X 2^i, which are computed by left-shifting the input X by i bits. It can be observed that the number of non-zero bits of A defines the number of adders required in the hardware implementation. To reduce the number of addition operations required in the straightforward shift-add KCM, some recoding methods such as canonical signed digit (CSD) and Booth recoding have been proposed, in which the constant coefficients are represented in signed digit form so that the number of non-zero digits is decreased. The comparison between the two recoding schemes in [39] shows that CSD recoding results in better performance than the Booth method for the KCM implementation in most cases, since the CSD scheme always produces the minimal number of non-zero digits in the recoded representation. Tom Kean et al. [36] presented the technique of using a split LUT structure to implement a constant coefficient multiplier, resulting in a reduction of both the number of words and the word length.
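To make the recoding idea concrete, the short Python sketch below converts a constant to CSD (non-adjacent) form and compares its number of non-zero digits with the plain binary representation, which approximates the number of add/subtract units a shift-add KCM would need. This is an illustrative sketch only; the coefficient value is arbitrary and is not taken from this work.

def to_csd(k):
    # Convert a non-negative integer to canonical signed digit (CSD) form.
    # Returns digits in {-1, 0, +1}, least significant first, with no two
    # adjacent non-zero digits, such that sum(d * 2**i) == k.
    digits = []
    while k != 0:
        if k & 1:                    # current bit is 1
            d = 2 - (k & 3)          # +1 if the next bit is 0, -1 if it is 1
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits

if __name__ == "__main__":
    A = 2012                         # arbitrary example coefficient
    csd = to_csd(A)
    assert sum(d << i for i, d in enumerate(csd)) == A
    print("binary non-zero bits :", bin(A).count("1"))
    print("CSD non-zero digits  :", sum(1 for d in csd if d != 0))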



Figure 3.2: General LUT-based multiplier.

In this structure, the input X is divided into two fractions X_H and X_L, which are formed by the W/2 high bits and the W/2 low bits of X, respectively. Therefore, the operand X and the product Y can be written as:

X = X_H 2^k + X_L    (3.3)

Y = X_H A 2^k + X_L A    (3.4)

where k = W/2. Based on Equation (3.4), the LUT with a word length of (W + M) bits can be divided into two LUTs with word lengths of (W/2 + M) bits, which store the X_H A and X_L A values, under the supposition that W is an even number. However, the reduction of the LUT size has to be traded off against the extra adder tree needed to get the result Y from the split LUT outputs. An application of the LUT-based multiplier, called constant coefficient convolution, was reported in [37]. In [19], P. K. Meher also proposed a technique called combined OMS-APC to optimize the LUT content so that the number of words is reduced, but the word length is not. In this work, to further reduce the hardware area of the multiplier, we propose a novel method which employs both the LUT-based and truncated multiplication approaches to design low area and high speed constant multipliers for specific DSP applications.
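As a sanity check of the split-LUT decomposition in (3.3) and (3.4), the following minimal Python sketch builds the two small LUTs and recombines their outputs with a shift and an add. The widths and the coefficient are illustrative values only, not parameters used in this work.

# Split-LUT constant multiplication: X is split into X_H and X_L (k = W/2 bits
# each), two small LUTs hold X_H*A and X_L*A, and a shift-and-add combines them.
W, M, A = 8, 8, 217
k = W // 2

lut_hi = [xh * A for xh in range(2 ** k)]   # stores X_H * A
lut_lo = [xl * A for xl in range(2 ** k)]   # stores X_L * A

def split_lut_kcm(x):
    xh, xl = x >> k, x & (2 ** k - 1)
    return (lut_hi[xh] << k) + lut_lo[xl]   # Y = X_H*A*2^k + X_L*A

assert all(split_lut_kcm(x) == x * A for x in range(2 ** W))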


3.2 Proposed LUT-Based Truncated Multiplier

3.2.1 Proposed Multiplier Architecture

In this section, we present the proposed architecture, which combines an improved bipartite LUT split technique and truncated multiplication. The bipartite split technique means that the input operand X is divided into two fractions as shown in (3.3). A more general method, called the multipartite LUT split technique for the LUT-based truncated multiplier, will be discussed in Section 3.3. Figure 3.3 presents the hardware architecture of the proposed W × M-bit LUT-based truncated multiplier and its partial product matrix for the case of W = M = 8. The proposed multiplier includes two optimized LUTs and an adder block.

The LUT-1, with a W_1-bit word length, and the LUT-2, with a W_2-bit word length, are used to store truncated values of X_L A and X_H A 2^k as described in (3.4), respectively. These truncated values are calculated as either the floor or the ceiling function of the full-width product, depending on the LUT content optimization. The LUT parameters and content are optimized to minimize the average relative error of the final result while achieving a good trade-off with the hardware complexity. For each set of LUT parameters and each constant coefficient value, the LUT content is optimized to minimize the average error of the proposed multiplier. The output Y_trunc of the proposed multiplier is computed by adding the LUT-1 and LUT-2 outputs as:

Y_trunc = Y_1 + Y_2    (3.5)

where Y_1 and Y_2 denote the outputs of LUT-1 and LUT-2, respectively, and they can be expressed as:

Y_1 = Trunc_{W_1}(X_H A 2^k)    (3.6)

Y_2 = Trunc_{W_2}(X_L A)    (3.7)

where Trunc_J(x) denotes the truncation of x to a J-bit result. To compensate for the accumulation of error possibly caused by adding two truncated values from the two LUTs, some guard bits are used in the two LUT outputs, as depicted in Figure 3.3.

Moreover, to produce a regular partial product matrix and to retain an equal effect of each fraction on the multiplier error performance, it is assumed that:

W_2 = W_1 + k    (3.8)

and

W_1 = k + δ    (3.9)

where δ is the number of guard bits. The absolute error is calculated as the difference between the truncated output of the proposed multiplier and the full length product as follows:

E = |Y_trunc − Y|    (3.10)

where E is the absolute error, Y_trunc is the multiplier output and Y is the exact product value. Then, replacing Y in (3.10) by (3.4) leads to the following equation for the absolute error E:

E = |E_1 + E_2|    (3.11)

where the two error components E_1 and E_2 are defined as:

E_1 = Y_1 − X_H A 2^k    (3.12)

E_2 = Y_2 − X_L A    (3.13)

Another useful factor used in this chapter for error analysis is the relative error E_rel, defined as the ratio between the absolute error and the exact product:

E_rel = E/Y    (3.14)
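For illustration, the sketch below evaluates the absolute and relative errors of (3.10) and (3.14) for the conventional truncation baseline that simply discards least significant bits of the full-width product. The width, coefficient and number of discarded bits are example values, and the printed figure is not meant to reproduce the results in Table 3.1.

# Error measures (3.10) and (3.14) applied to a conventional truncation
# baseline (discarding the t least significant bits of the full-width product).
W, M, t = 8, 8, 8
A = 151                                       # example constant coefficient

def truncated_product(x):
    y = x * A                                 # exact full-width product
    return (y >> t) << t                      # discard t LSBs, keep the weight

errors = []
for x in range(1, 2 ** W):                    # skip x = 0 so E_rel is defined
    y_exact = x * A
    e_abs = abs(truncated_product(x) - y_exact)   # E = |Y_trunc - Y|
    errors.append(e_abs / y_exact)                # E_rel = E / Y

print("average relative error: %.4f%%" % (100 * sum(errors) / len(errors)))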

3.2.2 LUT Optimization

The proposed LUT optimization method includes choosing the design parameters (parameter optimization) and calculating the LUT content (LUT content optimization). To choose suitable design parameters, an error analysis is performed for each value of δ (or each set of W_1 and W_2 values defined by (3.8) and (3.9)).



Figure 3.3: Proposed W × M-bit LUT-based truncated multiplier architecture (a) and its partial product matrix with W = M = 8 and δ = 2 (b).

A Matlab program is developed to find the average relative error of the proposed multiplier with different design parameter values for all cases of the coefficient A and then to choose the optimal parameter values. The error analysis results of the proposed multiplier are also compared with those of the conventional truncated multiplier, in which the truncated product is generated from the full width product by discarding some least significant bits. After choosing the design parameter (δ), to further reduce the average error of the proposed multiplier, the LUT content optimization is performed by searching all possible cases of the stored values for LUT-1 and LUT-2. It would be optimal if a full search of all possible cases for the LUT content could be performed after choosing the design parameter.


Table 3.1: Error analysis results for the proposed 8-bit multiplier with different numbers of guard bits (δ), compared with conventional truncated multiplier.

Design                                          Average E_rel (%)
Conventional truncation                         1.74
Proposed method with δ=0 (W_1=4 and W_2=8)      3.28
Proposed method with δ=1 (W_1=5 and W_2=9)      1.78
Proposed method with δ=2 (W_1=6 and W_2=10)     0.95
Proposed method with δ=3 (W_1=7 and W_2=11)     0.48
Proposed method with δ=4 (W_1=8 and W_2=12)     0.22

However, such a full search involves an extensive simulation process and results in a very long simulation time, so a sub-optimal method is proposed for the LUT content optimization that reduces the optimization time significantly by reducing the number of search cases, as shown in Figure 3.4. For each constant coefficient value, firstly, LUT-1 is filled with rounded values of X_L A and a search over all cases of LUT-2 is performed (each entry taken as the floor or the ceiling function of the full width product). Then, LUT-2 is filled with rounded values of X_H A 2^k and a similar search over all cases of LUT-1 is performed. Finally, the error results from the two searching processes are compared to get the optimized content of the two LUTs. Table 3.1 and Figure 3.5 present the error analysis results of the optimized design with different values of δ and the full set of constant coefficients A. It is shown that the average relative error decreases when the number of guard bits is increased. In the case of the proposed method with δ = 2 (W_1 = 6 and W_2 = 10), the average relative error (0.95%) is much lower than that of the conventional truncation method (1.74%). Therefore, this value of δ is chosen for our design in this work. For applications that require higher accuracy (or a lower error target), higher values of δ can be used. Then, the entries of each optimized LUT are pre-calculated based on the chosen parameters with the above algorithm. The adder module shown in Figure 3.3 (a) is also optimized independently to achieve the highest multiplier performance.


(Flowchart: LUT-1 <= Round_{W_1}(X_L A) followed by a search over all cases of LUT-2; LUT-2 <= Round_{W_2}(X_H A 2^k) followed by a search over all cases of LUT-1; error comparison; optimized LUT content.)

Figure 3.4: Proposed sub-optimal LUT content optimization algorithm.
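To illustrate the two-pass idea of Figure 3.4, the toy-scale Python sketch below performs both passes and keeps the better result. The bit-level output format is simplified here: each LUT entry is either the floor or the ceiling of the scaled partial product, and the output is their sum re-scaled by 2^t. The values of W, k, t and A are illustrative toy values chosen so that the exhaustive per-pass search stays tiny; they are not the parameters used in this work.

import itertools, math

W, k, t, A = 4, 2, 3, 11
SCALE = 2 ** t

def avg_rel_error(lut_lo, lut_hi):
    errs = []
    for x in range(1, 2 ** W):
        xh, xl = x >> k, x & (2 ** k - 1)
        y_trunc = (lut_lo[xl] + lut_hi[xh]) * SCALE
        y = x * A
        errs.append(abs(y_trunc - y) / y)
    return sum(errs) / len(errs)

def rounded(values):
    return [int(round(v)) for v in values]

def all_floor_ceil(values):
    # Every combination in which each entry is floored or ceiled.
    choices = [(math.floor(v), math.ceil(v)) for v in values]
    return itertools.product(*choices)

exact_lo = [xl * A / SCALE for xl in range(2 ** k)]           # X_L * A / 2^t
exact_hi = [xh * A * 2 ** k / SCALE for xh in range(2 ** k)]  # X_H * A * 2^k / 2^t

# Pass 1: LUT-1 (low part) fixed to rounded values, search all cases of LUT-2.
pass1 = min(all_floor_ceil(exact_hi),
            key=lambda hi: avg_rel_error(rounded(exact_lo), hi))
# Pass 2: LUT-2 fixed to rounded values, search all cases of LUT-1.
pass2 = min(all_floor_ceil(exact_lo),
            key=lambda lo: avg_rel_error(lo, rounded(exact_hi)))

e1 = avg_rel_error(rounded(exact_lo), pass1)
e2 = avg_rel_error(pass2, rounded(exact_hi))
print("pass 1 error %.4f, pass 2 error %.4f -> keep pass %d"
      % (e1, e2, 1 if e1 <= e2 else 2))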

3.2.3 Synthesis and Implementation Results in FPGA Hardware

We have implemented the 8-bit truncated multiplier using the proposed LUT-based method and comparable reference architectures, for all cases of the constant coefficient, with the same design parameters and constraints, in an Altera Cyclone II FPGA using the Quartus-II 9.1 design tool and VHDL coding. The comparison of the average values of area, delay and area-delay product (ADP) is presented in Table 3.2. ADP is a popular factor for comparing the general performance of different designs. For the FPGA implementation, ADP is calculated as the product of the number of logic elements (LE) and the delay (in nanoseconds); the most efficient method is the one with the lowest ADP. The straightforward shift-add KCM and the CSD-based KCM, as described above, are specific hardware architectures of the conventional truncation technique chosen for the performance comparison. The Altera Megacore-based KCM, which is the parameterized constant multiplier core provided by Altera, and the conventional split LUT-based KCM (without LUT optimization) are also chosen for the comparison. From the results presented in Table 3.2, it is clear that the proposed multiplier results in a lower ADP than the other constant coefficient multipliers.



Figure 3.5: Error analysis results for the proposed 8-bit multiplier with different numbers of guard bits (solid line), compared with the conventional truncated multiplier (dashed line).

The proposed KCM reduces the ADP by around 28% compared with the CSD-based KCM, the Altera Megacore-based KCM and the conventional split LUT-based KCM, and by 52% compared with the straightforward shift-add KCM.

3.3 General Multipartite LUT Split Technique

This section presents a more general method for the LUT-based truncated multiplier, called the multipartite LUT split technique, in which the input operand is divided into more than two fractions. Then, each fraction is used as the address of an optimized LUT. Consider the multiplication of the W-bit binary number X with a constant A and suppose that W = l × k (if not, some leading zero bits can be added to X so that this condition is satisfied).


Table 3.2: Implementation results of different KCM methods in Altera Cyclone II FPGA.

Method                              Area (number of LEs)   Delay (ns)   ADP
Straightforward shift-add KCM       33                     17.5         578
CSD-based KCM                       25                     15.3         383
Altera Megacore-based KCM           28                     13.3         372
Conventional split LUT-based KCM    28                     13.8         386
Proposed LUT-based KCM              23                     12.0         276

Then X can be decomposed into l fractions, each k bits wide, and Equation (3.3) can be extended as:

X = \sum_{i=0}^{l-1} X_i 2^{ik}    (3.15)

where X_i denotes a k-bit fraction of X. The product Y of the multiplication of X and a constant coefficient A can be written as:

Y = \sum_{i=0}^{l-1} A X_i 2^{ik}    (3.16)

l−1 ik X = Xi2 (3.15) i=0 where Xi denotes a k-bit fraction of X. The product Y of the multiplication of X and a constant coefficient A can be written as: l−1 ik Y = AXi2 (3.16) i=0

With this decomposition, each component A X_i in Equation (3.16) can be stored in an LUT of 2^k words. The word length of each LUT depends on the number of guard bits chosen, as in the bipartite method presented in Section 3.2. An adder tree and a rounding circuit can then be used to get the final truncated product. A general optimization algorithm based on the method proposed in Section 3.2 can be employed for this general architecture. However, with more LUTs and a larger adder tree, the optimization algorithm needs to be changed considerably and more parameters should be taken into account in the optimization process. Therefore, it is left as an open topic for further research. Moreover, a more general architecture in which the input operand X is divided into fractions with different bit-widths will also be considered.


3.4 Chapter Conclusions

LUT-based computation and truncated multiplication are two efficient approaches for high speed and low power DSP applications. In this chapter, an efficient architecture for the LUT-based truncated multiplier that combines these two approaches for modern and future DSP systems has been proposed. The hardware implementation results of the proposed multiplier show that, compared with other methods, the proposed architecture can significantly reduce the area-delay product. Hence, it has great potential to be selected as an architecture for high speed, low power multipliers in future DSP applications. However, the proposed LUT-based KCM method has a scalability limitation, because for very high bit-width designs the optimization has to be changed, as discussed in Section 3.3. Therefore, in future research, the general multipartite LUT split technique and its optimization algorithm for the LUT-based truncated multiplier will be considered.


Chapter 4

Hybrid LUT-Based Architecture for Fixed-Width Squarer

Squaring is a fundamental arithmetic operation in many applications such as Viterbi decoding, adaptive filtering, vector quantization, pattern recognition and image compression. However, the direct implementation of general parallel squarer circuits often leads to high hardware complexity and high power consumption. Moreover, the rapid increase of mobile and portable devices leads to an emerging requirement for low power consumption and low complexity digital circuits [41]. Therefore, recently, much research has focused on architectures for low area, low power and high performance squarer circuits. Some proposed methods, such as the folded squarer [42], the merged technique [43] and the K.-J. Cho et al. method [44], employ mathematical identities to reduce the hardware complexity and improve the performance of the full-width squarer circuit. A. G. M. Strollo et al. [45] proposed an improved method that combines Booth recoding with the folded technique and some specific sub-circuits. On the other hand, in many DSP applications, when the square result is not required to be full-width but to have the same bit-width as the operand, the architecture of a high performance, low complexity fixed-width squarer becomes an emerging topic with many issues to be considered, such as error compensation, hardware efficiency and squarer performance. The authors in [46] and [47] have proposed some methods to improve the accuracy and performance of truncated and fixed-width squarer circuits, but more improvements are desired, especially for future DSP systems, which require high performance, low power circuits to meet the high demand for portability and mobility.


Moreover, LUT-based computation has great potential to be employed in future DSP and communication systems, because it mainly relies on memory access operations, resulting in high speed computation and low power consumption [19]. The disadvantage that limits the popularity of the LUT-based architecture in current systems is the exponential growth of the LUT size as the operand width increases. Therefore, it is reasonable to find a proper hybrid architecture that takes the advantages of both LUT-based and conventional logic circuits, so that a good trade-off between squarer performance and hardware complexity can be achieved. In this chapter, we present a new approach for designing fixed-width squarer circuits for specific DSP applications by employing an improved hybrid LUT-based architecture to reduce the error, delay and area of the squarer circuit. The main contribution of this research is that an efficient architecture for the fixed-width squarer is proposed by employing a known specific mathematical identity of the squaring operation in a new way, with some improvements, for low error and high area-delay efficiency.

4.1 Fixed-Width Squarer

In this chapter, we consider the square of a W-bit binary number X; the full-width (2W-bit) square X^2 can be written as:

S = X^2 = \left( \sum_{i=0}^{W-1} x_i 2^i \right)^2 = \sum_{i=0}^{W-1} \sum_{j=0}^{W-1} x_i x_j 2^{i+j}    (4.1)

where x_i and x_j denote the binary bits of X. Figure 4.1 shows the partial product matrix (PPM) of an 8-bit fixed-width squarer generated from the full-width squarer, in which each partial product bit p_{ij} is computed by an AND logic operation: p_{ij} = x_i x_j. In a full-width squarer, all partial product bits in the PPM are summed by using a compression tree and an adder [42]. Some methods proposed in [42]-[44] exploit the mathematical identities of the squaring operation to reduce the complexity of the PPM summation.


(Partial product bits p_{ij} arranged in columns (15) down to (0) and partitioned into the MSP, IC and LSP parts.)

Figure 4.1: Partial product matrix of the 8-bit fixed-width squarer.

However, in a fixed-width squarer, the square result is required to have the same bit-width as the operand. For example, in the 8-bit fixed-width squarer shown in Figure 4.1, the 8-bit square result corresponds to the 8 most significant bits of the full-width square result. The most accurate fixed-width squaring method is ideal rounding, in which all partial products in the PPM are summed and the full-width (2W-bit) result is rounded to provide the fixed-width (W-bit) result. For mathematical convenience, as shown in Figure 4.1, the PPM is divided into three parts: MSP (Maximum Significant Part), IC (Input Correction) and LSP (Least Significant Part). The ideal rounding operation can be implemented by adding '1' to the IC column as the correction constant before summing and then simply dropping some least significant bits of the full-width result to get the fixed-width square result. These two methods lead to high complexity and long computation delay because all partial products of the IC and LSP are generated and accumulated. Therefore, to reduce the hardware complexity and delay, in more advanced fixed-width squarer methods, the LSP is discarded and the IC is used to compensate the error caused by discarding the LSP. E. G. Walters III et al. [48] proposed an architecture for the truncated squarer with a variable correction scheme. Kyung-Ju Cho et al. [46] presented an adaptive error compensation method for the fixed-width squarer with more mathematical analysis and modifications of the Booth folding encoding method.


Furthermore, V. Garofalo et al. [47] proposed an improved method applying the analytical technique presented in [49] and [50] to find the sub-optimal linear compensation function, so that a mean error reduction of 5% to 20% as well as a performance improvement can be achieved.
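As a quick numerical check of the ideal rounding step described above, the following minimal sketch assumes that the IC column carries the weight 2^(W-1), as suggested by Figure 4.1, and verifies that adding '1' there before dropping the W least significant bits equals rounding the full square to W result bits. It is an illustrative sketch, not the proposed squarer.

W = 8

def fixed_width_square(x):
    s = x * x                               # full 2W-bit square (all PPM bits)
    return (s + (1 << (W - 1))) >> W        # '1' added at the IC column, LSP dropped

# The result equals the ideally rounded W-bit square for every 8-bit operand.
assert all(fixed_width_square(x) == int(x * x / 2 ** W + 0.5) for x in range(2 ** W))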

4.2 LUT-Based Fixed-Width Squarer

As mentioned in Chapter 2, the LUT-based circuit provides the outputs by accessing the pre-stored LUT rather than performing actual computations, resulting in high speed computation and less dynamic power because of the lower bit-switching rate [35]. As shown in Figure 4.2, an LUT-based full-width squarer requires an LUT of 2^W × 2W bits to store 2^W words with a word length of 2W bits. For the case of a fixed-width squarer using this direct LUT-based architecture, an LUT of 2^W × W bits is required to store the ideally rounded values of the squaring computation. The maximum error in this ideally rounded direct LUT-based squarer is restricted to only 0.5 LSB (Least Significant Bit) of the squaring result. However, this direct LUT-based implementation leads to exponential growth of the LUT size when the operand width increases. Therefore, some methods have been proposed to reduce the LUT size for LUT-based computation designs. Pramod Kumar Meher [19] has proposed some LUT optimization methods that can reduce the LUT size significantly for the LUT-based constant multiplier, but these methods cannot be applied directly to the fixed-width squarer because of the different PPM and the truncation operation required in fixed-width squaring. Besides, some improved methods for fixed-width and truncated multipliers cannot be employed directly for the design of the fixed-width squarer [50]. Te-Jen Chang et al. [31] proposed an efficient LUT-based method for the squarer circuit; however, it is only applicable to the full-width squarer in cryptography applications. As a result, a more accurate and efficient architecture for the fixed-width squarer is highly desired. It is also reasonable to achieve a good trade-off between the LUT size and the squarer performance by employing a hybrid LUT-based structure for the squarer, using both LUT and conventional logic circuits properly.



Figure 4.2: General LUT-based full-width squarer.

4.3 Proposed Hybrid LUT-Based Fixed-Width Squarer

Again, consider the square of the W-bit binary number as presented in (4.1). In this section, we apply the mathematical characteristic presented in [51] and [52] to the case of four-folding, with some improvements as described below.

If X = (x_{W-1} x_{W-2} ... x_1 x_0) and Y = (y_{W-1} y_{W-2} ... y_1 y_0) are two W-bit binary numbers satisfying y_i = \bar{x}_i, 0 ≤ i ≤ (W − 1), the difference D between the squares of the two numbers can be expressed as:

D = |X|^2 − |Y|^2 = −y_{W-1} 2^W (2^W − 1) + |x_{W-2} x_{W-3} \ldots x_1 x_0 \, 0 \, y_{W-2} y_{W-3} \ldots y_1 y_0 \, 1|    (4.2)

where |X| denotes the value of the binary number X. Therefore, when y_{W-1} = 0, i.e. x_{W-1} = 1, it is derived that:

D = |x_{W-2} x_{W-3} \ldots x_1 x_0 \, 0 \, y_{W-2} y_{W-3} \ldots y_1 y_0 \, 1|    (4.3)

The four-folding method means that the two most significant bits of the operand are used as selecting bits to produce the square result. Applying some mathematical properties, we can derive the relationship between the square of the number X and its binary bits as shown in Table 4.1 (in which MSB12 denotes the two most significant bits of X), together with the following equation:

|X|^2 = |Y|^2 + D    (4.4)

The novelty of the proposed approach is that a specific mathematical identity of the squaring operation presented in [51] and [52] is employed in a new way to derive an efficient architecture for the fixed-width squarer circuit with lower error and higher area-delay efficiency.



Figure 4.3: Proposed hybrid LUT-based architecture for fixed-width squarer.

By exploiting (4.2), to get the fixed-width result, the rounding of the square can be approximated by summing the rounded values of its components as follows:

round_W{|X|^2} ≈ round_W{|Y|^2} + round_W{D}    (4.5)

in which round_W{x} denotes the rounding of x to a W-bit result. This approximation is optimal if one of the two rounding operations on the right-hand side of (4.5) is error-free. In this section, we present a sub-optimal approximation method using the hybrid LUT architecture to reduce the error as much as possible. In Table 4.1, the component D is divided into two parts, DH (high part) and DL (low part), each with the same bit length of W. Therefore, it is obvious that:

D = |DH| 2^W + |DL|    (4.6)

As one can see in the fourth column of Table 4.1, the most significant bit of DL is always zero. Hence, the rounding operation round_W{D} can be approximated by discarding the DL part, so that its result is approximated by the DH part. The maximum error of this approximation can be easily estimated as 1/2 LSB of the result compared with the ideal rounded result.


Table 4.1: Four-folding method for the hybrid LUT-based fixed-width squarer.

MSB12   Y                       DH                                DL
00      x_{W-3} ... x_1 x_0     00 0 ...... 0 (DH0)               00 ...... 0
01      x_{W-3} ... x_1 x_0     00 x_{W-3} ... x_1 x_0 (DH1)      0 x_{W-3} ... x_0 1
10      x_{W-3} ... x_1 x_0     01 x_{W-3} ... x_1 x_0 (DH2)      00 ...... 0
11      x_{W-3} ... x_1 x_0     1 x_{W-3} ... x_1 x_0 0 (DH3)     0 x_{W-3} ... x_0 1

Assuming that the operation round_W(|Y|^2) can be performed by ideal rounding to W bits, the overall maximum error of the approximation in (4.5) will be 1 LSB compared with the ideal rounded result. Therefore, it is promising that this method can result in a low error squarer circuit. The results of the error analysis will be shown in the next section. The block diagram of the proposed fixed-width squarer using the improved hybrid LUT-based architecture is shown in Figure 4.3. The AMU (Address Mapping Unit) has the function of inverting (one's complementing) the input operand and generating the (W−2)-bit address for the LUT. The LUT block provides the square values of its address input. The 4-input multiplexer (MUX4) generates the DH part as shown in Table 4.1, using the two most significant bits (MSB12) as the selecting bits. Then an adder provides the final fixed-width (W-bit) square result. With the proposed architecture, the LUT size for storing Y^2 is reduced to 2^{W-2} × (W − 3) bits, because it stores rounded values of the 2(W − 2)-bit square Y^2. However, for designs with W ≥ 10, the coarse/fine LUT splitting method [53] is also employed to further reduce the LUT size, as shown in Figure 4.4. The input Y, with a bit-width of (A + B), is decomposed into two components: one with a lower bit-width (A bits) for addressing the coarse LUT and another with the full bit-width of (A + B) bits for addressing the fine LUT. The coarse LUT has the same width but fewer storage entries than the traditional one, while the fine LUT has a smaller width and the same number of storage entries as the traditional one, storing the difference between the rounded Y^2 and the coarse LUT values. An additional adder is needed to provide the final square result Y^2 for the overall LUT input.



Figure 4.4: Coarse/fine LUT splitting method to further reduce the LUT size.

4.4 Error Analysis of Proposed Fixed-Width Squarer

In this work, the mean error (ME), maximum error (MaxE) and mean square error (MSE) are used as error criteria for the error analysis. The mean error and mean square error are related to the power of the error and hence play a very important role in DSP applications. However, the maximum error becomes the most important concern in safety critical applications [54]. The error caused by the approximation in (4.5) together with ME and MSE can be expressed as:

Er = X^2 − (round_W{|Y|^2} + round_W{D})    (4.7)

ME = E{Er} (4.8)

MSE = E{Er^2}    (4.9)

where E{x} denotes the averaging operation to get the mean value of the variable x. Table 4.2 summarizes the error analysis results and the comparison between the proposed squarer and other architectures. It is shown that the proposed fixed-width squarer results in lower error than the others. The mean error and mean square error of the proposed squarer are reduced by up to 40% compared with those of the V. Garofalo et al. method presented in [47], whereas the maximum error is reduced by up to 22%.

This low error merit makes the proposed method highly applicable for future DSP applications which require not only high speed and low power but also low error squarer circuits.
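For illustration, the sketch below shows how the error criteria in (4.8) and (4.9), together with the maximum error, can be evaluated, using a deliberately simple approximation (discarding the LSP with no compensation) as the squarer under test; it is not the proposed architecture. Errors are expressed in LSBs of the W-bit output, with the ideally rounded fixed-width square as the reference.

W = 8

def ideal(x):
    return (x * x + (1 << (W - 1))) >> W      # ideally rounded W-bit square

def no_compensation(x):
    return (x * x) >> W                       # LSP simply discarded

errs = [ideal(x) - no_compensation(x) for x in range(2 ** W)]
me   = sum(errs) / len(errs)                  # mean error, E{Er}
maxe = max(abs(e) for e in errs)              # maximum error
mse  = sum(e * e for e in errs) / len(errs)   # mean square error, E{Er^2}
print("ME = %.3f LSB, MaxE = %d LSB, MSE = %.3f LSB^2" % (me, maxe, mse))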

4.5 Implementation and Measurement Results

The proposed fixed-width squarer and other architectures have been implemented in 0.18-μm CMOS technology using a standard cell library, with the same design parameters and constraints, for different values of W, using the Synopsys Design Compiler and Synopsys IC Compiler tools. For designs with W ≥ 10, the coarse/fine LUT splitting method is also employed to further reduce the LUT size, as shown in Figure 4.4. Table 4.3 shows the LUT parameters, size and compression ratio of the different fixed-width squarer designs using the proposed hybrid LUT-based architecture and the coarse/fine LUT splitting method, in which A and B denote the input widths of the two split LUTs as depicted in Figure 4.4. The compression ratio is calculated as the ratio between the direct LUT size and the proposed LUT size. The implementation results in 0.18-μm CMOS technology are presented in Table 4.4 and Figure 4.5. It is shown that the proposed fixed-width squarer can reduce the area, delay and average power consumption significantly compared with the other methods. To compare the overall squarer performance of the different methods, the area-delay product (ADP) is used as a figure of merit. The bold numbers in parentheses in the fifth column of Table 4.4 present the normalized ADP results of the different fixed-width squarer designs for each value of W. All power consumption estimations in this work are performed with an operating clock frequency of 50 MHz. Compared with the fixed-width squarer method presented in [47], the proposed hybrid LUT-based architecture leads to an ADP reduction of up to 42% and a power consumption reduction of up to 20%. To show the trade-off between ADP reduction and operand bit-width, Table 4.5 and Figure 4.6 summarize the ADP reduction results for different values of the bit-width W. It can be seen that the ADP reduction (in %) tends to decrease with higher W values. This is one limitation of the proposed architecture which should be improved in future work.



Figure 4.5: ADP results for different values of W .

Figure 4.7 is the chip microphotograph of the proposed 8-bit fixed-width squarer fabricated in 0.18-μm CMOS technology, and Figure 4.8 shows the test and measurement flow used to verify the improvement of the proposed squarer method. The test patterns are generated by FPGA hardware. The oscilloscope and logic analyzer are then used to measure the output waveforms and logic values, respectively. Matlab software is also used to compare the practical results obtained from the chip with the expected values for fixed-width squaring and to obtain the error analysis results. Figure 4.9 presents the chip functional measurement result obtained with the logic analyzer (right side), which is the same as the simulation result in Modelsim software (left side). Figure 4.10 and Figure 4.11 show the longest (critical) path analysis result in Synopsys Design Compiler and its chip measurement result (the path between the input x6 and the output s7).

4.6 Chapter Conclusions

In this chapter, an efficient hybrid LUT-based architecture for fixed-width squarer circuits has been presented.



Figure 4.6: ADP reduction results for different values of W (compared with Garofalo et al. method in [47]).

The proposed hybrid architecture takes the advantages of both LUT-based and conventional logic circuits to achieve a good trade-off between circuit performance and complexity. The implementation and chip measurement results in 0.18-μm CMOS technology have been shown and discussed. Compared with other methods presented in the literature for the design of fixed-width and truncated squarer circuits, the proposed method not only reduces the error significantly, but also improves the squarer performance and area efficiency. Therefore, it has great potential to be applied in modern and future DSP and multimedia systems. However, for the case of very high bit-width designs, more efficient methods should be considered in future research.


Table 4.2: Error analysis for different fixed-width squarer architectures.

W    Method                  ME (LSB)   MaxE (LSB)   MSE (LSB^2)
8    Walters et al. [48]     0.166      1.215        0.207
     Garofalo et al. [50]    0.334      -            0.441
     Garofalo et al. [47]    0.166      1.260        0.163
     Proposed                0.100      0.106        0.105
10   Walters et al. [48]     0.166      -            0.226
     Garofalo et al. [50]    0.333      -            0.529
     Garofalo et al. [47]    0.166      1.263        0.181
     Proposed                0.119      1.062        0.120
12   Walters et al. [48]     0.167      -            0.246
     Garofalo et al. [50]    0.333      -            0.607
     Garofalo et al. [47]    0.167      1.308        0.200
     Proposed                0.118      1.103        0.119
14   Walters et al. [48]     0.167      -            0.266
     Garofalo et al. [50]    0.333      -            0.693
     Garofalo et al. [47]    0.167      1.365        0.220
     Proposed                0.120      1.117        0.124
16   Walters et al. [48]     0.167      1.862        0.287
     Garofalo et al. [50]    0.333      -            0.776
     Garofalo et al. [47]    0.167      1.432        0.240
     Proposed                0.122      1.120        0.125


Table 4.3: LUT parameters and compression ratio of the proposed fixed-width squarer designs with the LUT splitting technique.

W                       8        10       12       14       16
A                       -        5        7        8        9
B                       -        3        3        4        5
Proposed LUT (bits)     320      704      3072     10752    38912
Direct LUT (bits)       2048     10240    49152    229376   1048576
Compression ratio       6.4:1    14.5:1   16:1     21.3:1   26.9:1


Figure 4.7: Chip microphotograph of the proposed 8-bit fixed-width squarer.


Table 4.4: Implementation results of different fixed-width squarer architectures using 0.18-μm CMOS technology.

W    Method                  Area (x10^3 μm^2)   Delay (ns)   ADP (x10^3)      Power (mW)
8    Folded                  2.3                 6.6          15.3 (1.34)      0.586
     Direct LUT-based        4.0                 4.8          19.2 (1.68)      0.621
     Garofalo et al. [47]    1.9                 5.9          11.4 (1.00)      0.425
     Proposed                1.5                 4.4          6.6 (0.58)       0.354
10   Folded                  3.8                 7.4          28.3 (1.33)      0.792
     Direct LUT-based        19.6                6.2          121.7 (5.73)     1.202
     Garofalo et al. [47]    3.3                 6.5          21.2 (1.00)      0.643
     Proposed                2.7                 5.6          15.1 (0.71)      0.512
12   Folded                  6.1                 8.6          52.9 (1.20)      1.371
     Direct LUT-based        48.8                6.9          336.7 (7.66)     3.264
     Garofalo et al. [47]    5.4                 8.2          43.9 (1.00)      1.125
     Proposed                5.0                 6.7          33.5 (0.76)      0.934
14   Folded                  8.7                 9.9          86.6 (1.23)      1.624
     Direct LUT-based        98.9                7.9          762.4 (11.1)     5.421
     Garofalo et al. [47]    7.7                 9.1          70.5 (1.00)      1.482
     Proposed                7.4                 7.6          56.2 (0.80)      1.326
16   Folded                  10.5                11.0         115.7 (1.21)     2.271
     Direct LUT-based        208.7               9.4          2024.4 (21.2)    9.560
     Garofalo et al. [47]    9.2                 10.4         95.4 (1.00)      1.784
     Proposed                8.9                 9.0          80.3 (0.84)      1.623


Table 4.5: ADP reduction with different values of W (compared with the method in [47]).

W     ADP reduction (%)
8     42
10    29
12    24
14    20
16    16


Figure 4.8: Test and measurement flow for the proposed fixed-width squarer.



Figure 4.9: Simulation result in Modelsim software (left) and measurement result in logic analyzer (right).



Figure 4.10: Critical path analysis result in Synopsys Design Compiler.

(Measured longest-path delay from input x6 to output s7: 4.7 ns.)

Figure 4.11: Longest path delay measurement result.


Chapter 5

LUT-Based Sine Function Computation

5.1 Sine Function Computation

Like other trigonometric functions, the sine function generator is highly required in real scientific computation systems and DSP applications. The software implementation of this function has been well investigated in textbooks and papers. However, due to the high demand for high speed computation and real-time DSP, its hardware implementation needs to be enhanced as well. There are some methods for sine function approximation, such as the CORDIC-based architecture [55]. However, this method involves iterative computation, so it is not suitable for high speed and real-time DSP applications. Moreover, some other sine approximation methods, such as high-order and Taylor approximations, often result in high hardware complexity. On the other hand, the LUT-based method is more attractive for DSP applications because of its merit of high speed computation. Therefore, this chapter focuses on the LUT-based method and its combination with linear approximation for sine computation, with the main targeted application being the digital frequency synthesizer for DSP systems.


5.2 Direct Digital Frequency Synthesizer (DDFS)

Modern digital communication systems require high performance frequency synthesizers to meet the increasing demand for high speed communication services. With many significant advantages over the traditional analog approach [56], direct digital frequency synthesizers (DDFS) are fast becoming an alternative to analog frequency synthesizers. A DDFS can also be employed in digital modulation, digital mixers and many other applications in DSP and digital communication systems. In a DDFS, a frequency-tunable and/or phase-tunable output signal is generated from a high precision reference clock by using some data processing blocks. A general DDFS includes a phase accumulator, a phase-to-amplitude converter (PAC) and a digital-to-analog converter (DAC). Figure 5.1 shows the simplified block diagram of a DDFS and its signal flow. Figure 5.2 presents the operation principle of a DDFS using the digital phase wheel representation [57]. By changing the value of the tuning word M, the output frequency can be changed accordingly. Moreover, by using an additional adder after the phase accumulator, the phase can be controlled as well. In other words, the reference clock frequency is divided down by scaling with the programmable binary tuning word. Therefore, another advantage of a DDFS is that its output frequency, phase and amplitude can be precisely and rapidly controlled by digital processors. The basic tuning equation for a DDFS is:

F_Out = (M × F_Clk) / 2^N    (5.1)

where F_Out is the output frequency of the DDFS, M is the binary frequency tuning word, which is also the jump size on the digital phase wheel presented in Figure 5.2,

F_Clk is the internal reference clock frequency (system clock) and N is the length in bits of the phase accumulator.
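A small numerical example of the tuning equation (5.1) is given below; the clock frequency and target output frequency are arbitrary illustrative values, not those of the design in this work.

F_CLK = 100e6          # reference clock, Hz (illustrative value)
N = 32                 # phase accumulator length, bits

def output_frequency(m):
    return m * F_CLK / 2 ** N               # F_Out = M * F_Clk / 2^N

def tuning_word(f_out):
    return round(f_out * 2 ** N / F_CLK)    # nearest tuning word for a target

m = tuning_word(1.0e6)                      # tuning word for roughly 1 MHz
print("M = %d -> F_out = %.6f MHz" % (m, output_frequency(m) / 1e6))
print("frequency resolution = %.6f Hz" % (F_CLK / 2 ** N))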

5.3 Phase to Amplitude Converter in DDFS

A typical solution for a DDFS and its phase-to-amplitude converter, first introduced by Tierney et al. [58], is shown in Figure 5.3. The converter generates sine values from an LUT. The address of this LUT is provided by a phase accumulator.



Figure 5.1: General block diagram and signal flow of the DDFS.

The phase accumulator is a binary feedback adder whose value is incremented by the frequency tuning word every clock cycle. The main disadvantage of this solution is that the LUT size approximately doubles for every additional bit of resolution, resulting in high hardware complexity, especially when high frequency resolution is required. Therefore, many LUT compression methods for DDFS have been proposed. Among these methods, the linear difference approach promises low hardware complexity when shift-add logic is used for the linear interpolation. In this chapter, an improved linear difference method is proposed to achieve a very high LUT compression ratio. The analysis and comparison of our synthesis and implementation results with previous work are also included.

5.4 LUT Compression Methods for Phase to Amplitude Converter

This section summarizes the existing LUT compression methods for sine computation in the phase to sine converter of a DDFS. Phase truncation is a simple method to significantly reduce the LUT size by discarding some least significant bits (LSB) of the phase accumulator output. However, this truncation also results in more and larger spurs in the output spectrum. Therefore, phase truncation should be used together with an LUT compression method to reduce the LUT size while obtaining reasonable accuracy.


Figure 5.2: Operation principle of DDFS with the digital phase wheel.

5.4.1 Sine Wave Symmetry Exploitation

This technique exploits the quarter-wave symmetry of the sine function. The LUT stores sine values for only π/2 (one quadrant) of the sine wave, and the LUT size is reduced to one-fourth. The cost of the LUT size reduction is the additional logic required to generate the full sine wave. The two most significant bits of the phase accumulator output are used as control bits for this purpose. Therefore, two complementors are used, as shown in Figure 5.3. The first complementor uses the second most significant bit (MSB2) of the phase accumulator output to invert the phase, and the second one uses the first most significant bit (MSB1) to invert the output amplitudes of the PAC.

5.4.2 Coarse/Fine LUT Splitting Method

As shown in Figure 5.4, this technique is based on the idea of a hierarchical LUT structure. The phase value P is divided into two parts, one used as the address of the coarse LUT and the other as the address of the fine LUT.



Figure 5.3: General DDFS with sine wave symmetry exploitation.


Figure 5.4: Coarse/Fine LUT splitting.

The coarse LUT has the same word size but fewer storage entries than the traditional method, while the fine LUT has a smaller word size and the same number of storage entries as the traditional method, storing the difference between the sine function values and the coarse LUT values. An extra adder is needed to provide the sine outputs from these LUTs.

5.4.3 Sunderland and Nicholas Architectures

The original Sunderland architecture and its modified version that employs the splitting technique are presented in [59] and [60], respectively. In the Sunderland architecture, the LUT size is reduced by applying trigonometric identities with the following approximation.


sin(π/2 (a + b + c)) = sin(π/2 (a + b)) cos(π/2 c) + cos(π/2 (a + b)) sin(π/2 c)    (5.2)

As an example, the 12-bit phase value can be decomposed into three 4-bit fractions as P = a + b + c so that a < 1, b < 2^{-4}, c < 2^{-8}. With this assumption, the approximation can be rewritten as:

sin(π/2 (a + b + c)) ≈ sin(π/2 (a + b)) + cos(π/2 (a + \bar{b})) sin(π/2 c)    (5.3)

in which the average value of the b component, \bar{b}, is added to the component a in the second term to improve the accuracy of the approximation. The first term is stored in the coarse LUT with low resolution, and the second term is stored in the fine LUT, which gives additional resolution by approximating the error between each coarse value and the equivalent sine value. Figure 5.5 presents the block diagram of a phase to sine converter using the Sunderland method. The Nicholas architecture applies the same coarse/fine LUT structure as the Sunderland method. However, the stored samples are calculated by numerical optimization to minimize the error or the mean-square error [61].
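The sketch below evaluates the Sunderland approximation (5.3) numerically for a 12-bit phase split into three 4-bit fractions a, b and c; the table sizes and the printed error figure are illustrative only and are not results claimed in this work.

import math

BITS = 4
b_avg = (2 ** BITS - 1) / 2 / 2 ** (2 * BITS)          # average value of b

# Coarse table: sin(pi/2 * (a + b)), addressed by the 8 bits of (a, b).
coarse = [math.sin(math.pi / 2 * ab / 2 ** (2 * BITS))
          for ab in range(2 ** (2 * BITS))]
# Fine table: cos(pi/2 * (a + b_avg)) * sin(pi/2 * c), addressed by (a, c).
fine = [[math.cos(math.pi / 2 * (a / 2 ** BITS + b_avg)) *
         math.sin(math.pi / 2 * c / 2 ** (3 * BITS))
         for c in range(2 ** BITS)] for a in range(2 ** BITS)]

max_err = 0.0
for p in range(2 ** (3 * BITS)):                        # all 12-bit phase words
    a = p >> (2 * BITS)
    b = (p >> BITS) & (2 ** BITS - 1)
    c = p & (2 ** BITS - 1)
    approx = coarse[(a << BITS) | b] + fine[a][c]
    exact = math.sin(math.pi / 2 * p / 2 ** (3 * BITS))
    max_err = max(max_err, abs(approx - exact))
print("maximum approximation error: %.2e" % max_err)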

5.4.4 Difference Method

In the difference method, as shown in Figure 5.6, the LUT stores the error between the sine function and a function D(P) which can be implemented with some basic arithmetic operations. An adder is also needed to provide the sine values. The maximum value of the stored entries is less than that of the original sine function; therefore, the number of bits needed for the LUT words is reduced. The simplest form of the difference algorithm is the sine-phase difference method, in which the LUT stores the error between the sine function and the simple linear function D(P) = P. This method reduces the LUT word size for storing sine values by two bits. To improve the compression ratio, many researchers have tried to find better difference functions.



Figure 5.5: Sunderland method.

Among previous works on the difference method, the paper of Hsu et al. [62] shows the highest compression ratio, employing an optimized three-segment linear difference method.

5.4.5 Multipartite Table Method

Recently, Davide De Caro et al. [63] have investigated and implemented an LUT compression technique for DDFS based on the multipartite table method (MTM). This is a new LUT compression technique in which the sine LUT is divided into A+1 tables: a table of initial values and A tables of offsets. The outputs of these tables are added together to provide the sine values of the equivalent phases. The reported implementation results of MTM show that this technique is very promising in terms of compression ratio and hardware efficiency.

5.5 Improved Linear Difference Method

For mathematical convenience, the range of the phase value is scaled to [0, 1] instead of [0, π/2] for the first quadrant of the sine wave, with the supposition that only the values of the first quadrant are stored in the LUT. The basic idea of the difference algorithm is to reduce the dynamic range of the stored values in order to reduce the LUT size.



Figure 5.6: General difference method.

With the simple sine-phase difference algorithm, the maximum difference value is 0.21 (1/4.75). Therefore, the number of bits needed to store this difference value (the LUT word size) is decreased by 2. To make the LUT word size smaller, some other difference functions have been proposed. Also, the compression ratio must be considered in a trade-off with the complexity of the targeted hardware. The function D(P) can be implemented by a multiple-segment piece-wise linear, parabolic or higher-order function. Normally, with the same number of segments, parabolic and especially higher-order functions provide a better approximation to the sine function than their linear counterparts. However, the implementation of parabolic or higher-order functions requires much more hardware resources because of the multiplications needed for such non-linear functions. On the other hand, modern electronic systems require high power efficiency and low complexity. Therefore, piece-wise linear functions are a good choice, since they can be implemented with low complexity shift/add logic. In the piece-wise linear difference method, the difference function D(P) consists of J linear segments described as:

D(P) =
  a_0 P + b_0              (0 ≤ P < P_1)
  a_1 P + b_1              (P_1 ≤ P < P_2)
  ...
  a_{J-1} P + b_{J-1}      (P_{J-1} ≤ P ≤ 1)


in which the slope values (a_0, a_1, ..., a_{J-1}) and offset values (b_0, b_1, ..., b_{J-1}) of all segments are calculated in advance based on the break-point phase values P_1, P_2, ..., P_{J-1}.

To get the highest compression ratio for each given J value, numerical optimization is performed to find the optimal phase break-point values P_i for each segment by minimizing the maximum value of the difference function sin(P·π/2) − D(P), because this maximum value defines the number of bits needed for each LUT entry. This parameter optimization was done with Matlab software. Table 5.1 lists the optimization results for each J value and the corresponding word size reduction in the LUT. In this table, MaxDiff denotes the maximum value of the difference function. The case of J = 1 corresponds to the sine-phase difference method. It is shown that, for J ≤ 4, one more bit of reduction in the LUT word size is achieved each time J is increased by one. Also, the optimization results for the case of J = 3 are the same as those in the work of Hsu et al. [62]. This means that the design in [62] is a special case of this work. However, for J > 4, the word size reduction does not continue to grow at the same rate. Hence, in order to achieve a good trade-off between compression ratio and hardware complexity, the value J = 4 is chosen for our design, and the word size in the LUT can be reduced by 6 bits. In this case, the optimal values of the phase break points are: P_1 = 0.363, P_2 = 0.598 and P_3 = 0.804. In the hardware implementation, the computation in each segment, consisting of a constant multiplication (by the slope value) and an addition (of the offset value), is performed by shift-add logic. Figure 5.7 presents the proposed difference function compared with the one in [62]. It can be seen that the proposed function leads to a lower maximum difference value. Therefore, the LUT word length can be reduced.
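A simplified Python sketch of this break-point optimization is given below. It assumes, for illustration only, that each segment of D(P) is the chord of sin(P·π/2) between its break points (the slopes and offsets actually used in this work may differ) and minimizes the maximum of |sin(P·π/2) − D(P)| over the interior break points with a generic optimizer.

import math
import numpy as np
from scipy.optimize import minimize

J = 4
P = np.linspace(0.0, 1.0, 4097)
SINE = np.sin(P * math.pi / 2)

def max_difference(interior):
    pts = np.concatenate(([0.0], np.clip(np.sort(interior), 0.0, 1.0), [1.0]))
    d = np.interp(P, pts, np.sin(pts * math.pi / 2))   # piece-wise linear D(P)
    return np.max(np.abs(SINE - d))

x0 = np.linspace(0, 1, J + 1)[1:-1]                    # evenly spaced start
res = minimize(max_difference, x0, method="Nelder-Mead")
print("break points:", np.round(np.sort(res.x), 3))
print("MaxDiff     :", round(float(max_difference(res.x)), 4))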

5.6 Implementation Results

Figure 5.8 shows the structure of the phase to amplitude converter block using the improved linear difference method. In this design, the compare-and-select module compares the current phase value with the P_1, P_2, P_3 values and selects the slope and offset values for each segment. The shift-add logic module performs the constant multiplication and addition to compute the optimized 4-segment linear function D(P).


Figure 5.7: Proposed function (J = 4) vs Hsu difference function (J = 3).

The Canonical Signed Digit (CSD) technique is applied to the shift-add logic module to reduce its complexity. The split LUT stores the difference values sin(P) − D(P). An extra adder is also needed to provide the final sine output. To get the highest possible compression ratio, the improved difference method is used together with the exploitation of the sine wave symmetry and the coarse/fine splitting method. Therefore, only the first-quadrant values of the sine function are stored, and the sine LUT is split into coarse and fine LUTs. The implementation targets the Xilinx Spartan-3E FPGA device with the Xilinx ISE 10.1 design and synthesis tool suite. The higher level simulation was done with the Xilinx System Generator tool in the MATLAB-Simulink environment. The DDFS in this work was designed with a phase accumulator length (tuning word length) of N = 32, a 12-bit phase word and an 11-bit sine output (the same output resolution as the design in [62]).


Table 5.1: Computation results of word size reduction for different numbers of segments.

Number of segments   MaxDiff           Word size reduction in LUT (bits)
1                    0.21 (1/4.75)     2
2                    0.0490 (1/20)     4
3                    0.0220 (1/45)     5
4                    0.0120 (1/83)     6
5                    0.0080 (1/125)    6
6                    0.0058 (1/172)    7


Figure 5.8: Architecture of the phase to amplitude converter using the improved linear difference method.

Table 5.2 compares the results of the proposed DDFS design with previous work, and Figure 5.9 demonstrates the sine wave output in Modelsim simulation. To clarify the improvement of our method, the comparison also includes a DDFS design with the conventional 4-segment linear difference method, in which the two most significant bits of the phase word are used to select the segments. The designs for all cases target a spurious loss value of 75 dBc. The spurious loss (in dBc) is calculated as the relative ratio (in logarithmic scale) between the amplitudes of the highest spur component and the primary frequency component. The area of the FPGA implementation is estimated by the number of slices used; the slice is the primitive component of the Xilinx Spartan-3E FPGA.


Figure 5.9: Digital sine wave output of the proposed DDFS in Modelsim simulation.

Table 5.2: Comparison with the previous methods.

Method              Sunderland   Sine-phase   Conv. lin.   In [62]   Proposed
LUT size (bits)     4096         3854         2688         2560      1536
Compression ratio   44:1         50:1         67:1         70:1      117.3:1
Number of slices    144          146          133          176       136
Spurious loss       -            -            75 dBc       75 dBc    75 dBc

It is also shown in Table 5.2 that our design achieves a higher compression ratio than the other linear difference methods while maintaining quite low hardware complexity, expressed as the number of slices used in the FPGA device. In this table, Conv. lin. denotes the conventional 4-segment linear difference method.

5.7 Chapter Conclusions

The improved piece-wise linear difference algorithm presented in this chapter provides a method to obtain a good trade-off between LUT compression ratio and hardware complexity among the linear difference methods for LUT-based DDFS design.


In general, the LUT block is considered the most power consuming part of a conventional LUT-based DDFS. The limitation of the proposed method is that it mainly focuses on word length reduction, while the reduction of the number of words is based on known methods. However, with a high compression ratio of 117.3:1 and low hardware complexity, the proposed DDFS has the potential to meet the requirements of high hardware efficiency and low power consumption in advanced digital VLSI systems.


Chapter 6

LUT-Based Logarithm and Anti-logarithm Computation

This chapter presents an efficient approach for logarithm and anti-logarithm hardware computation which can be used for the arithmetic unit in hybrid number system processors and for logarithm/exponent function generators in DSP applications. By employing the novel quasi-symmetrical difference method with only simple shift-add logic and a look-up table, the proposed approach can reduce the hardware area and improve the computation speed significantly while achieving similar accuracy compared with the best previous method. The implementation results in both FPGA and 0.18-μm CMOS technology are also presented and discussed.

6.1 Introduction

In many modern DSP applications, e.g. 3-D graphics [64], many complex operations are required, such as multiplication, division, square root, powering and so on. By using the logarithmic-scaled domain, these operations can be simplified. For example, multiplication can be performed by addition, while square root can be replaced by shifting in the logarithmic domain. The logarithmic number system (LNS), which is based on a logarithmically scaled representation and arithmetic, has been shown to be an alternative to floating point for precisions up to 32 bits because of its merit in the above complex operations [65]. Also, LNS is promisingly suitable

for low power applications [66]. However, the addition and subtraction operations in the LNS are more complicated [64], resulting in its low popularity in current systems. Therefore, it is reasonable to combine LNS and the conventional binary number system to derive the hybrid number system (HNS), which takes the advantages of both number systems [64]. In HNS, the linear binary to logarithm converters (LOGC) and logarithm to linear binary converters (anti-logarithmic converters or ALOGC) are essential components. In the implementation of 3-D graphic processors using HNS in [3], the area of LOGC and ALOGC consumes 64% of the total chip area, as shown in Figure 6.1. Hence, reducing the hardware complexity of the LOGC and ALOGC components can have a great impact on overall HNS-DSP systems. Moreover, in many real-time DSP applications such as wireless communication systems, high performance, low area and low power logarithm and anti-logarithm function generators are also required. Therefore, the objective of this research is to find a novel and efficient architecture for logarithm and anti-logarithm hardware computation that can be applied to LNS/HNS systems and general DSP applications. The main contribution of this work is that a new approach called the quasi-linear difference method is proposed to provide an efficient architecture and method for low complexity and high speed LOGC and ALOGC.

6.2 Logarithm and Anti-logarithm Approxima- tion Methods

Without loss of generality, consider the binary (base-2) logarithm of an unsigned number N for LOGC, in which N can be decomposed as [67]:

N = 2^n (1 + x) (6.1)

where n, called the characteristic of N, corresponds to the position of the most significant '1' bit of N in its binary representation and x is the fraction part with 0 ≤ x < 1. As a result, the binary logarithm of N can be expressed as:

log2 N = n + log2(1 + x) (6.2)


Figure 6.1: Area breakdown of a 3-D graphics chip in [3] (LOGC 47%, LNS 24%, ALOGC 17%, FXP 11%, test vector 1%).

Therefore, log2 N can be computed by detecting the most significant '1' bit of N in its binary representation and approximating log2(1 + x), called the fundamental function of LOGC. Hence, many researchers have focused on finding efficient methods to compute log2(1 + x), since it is the essential step in the LOGC. On the other hand, it is also required to convert a value from the logarithmic to the linear scale by an ALOGC, which requires the efficient computation of its fundamental function (2^x − 1) over the same range of x.
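To make the decomposition in (6.1)-(6.2) concrete, the following minimal Python sketch checks the identity for unsigned integers. It is only a software reference and not the hardware architecture; the function names are illustrative.

```python
import math

def log2_decompose(N: int):
    """Decompose N = 2^n * (1 + x) as in (6.1): n is the position of the
    most significant '1' bit and x (0 <= x < 1) is the fraction part."""
    assert N > 0
    n = N.bit_length() - 1            # characteristic n
    x = N / (1 << n) - 1.0            # fraction part
    return n, x

def log2_via_decomposition(N: int) -> float:
    """Evaluate (6.2): log2(N) = n + log2(1 + x)."""
    n, x = log2_decompose(N)
    return n + math.log2(1.0 + x)

if __name__ == "__main__":
    for N in (1, 7, 100, 65535):
        assert abs(log2_via_decomposition(N) - math.log2(N)) < 1e-12
```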

6.2.1 Mitchell Approximation

J. N. Mitchell proposed a simple linear approximation method [67] as follows:

log2(1 + x) ≈ x (6.3)

2^x ≈ x + 1 (6.4)


Figure 6.2: Logarithmic and anti-logarithmic Mitchell error functions.

where 0 ≤ x < 1. Figure 6.2 depicts the two error functions due to this approximation method:

EL = log2(1 + x) − x (6.5)

EA = x − (2^x − 1) (6.6)

where EL and EA, called Mitchell errors, denote the error functions for the approximations in (6.3) and (6.4), respectively. The maximum value of both error functions is 0.08639, and the accuracy is only 3.5 bits, which is too low for many DSP applications. Therefore, much research has focused on more accurate approximation methods.
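As a quick numerical illustration (not part of the original design flow), the Mitchell error functions (6.5) and (6.6) can be evaluated directly; both peak at roughly 0.086 near the middle of the range.

```python
import numpy as np

# Mitchell errors EL(x) = log2(1+x) - x and EA(x) = x - (2^x - 1), 0 <= x < 1.
x = np.linspace(0.0, 1.0, 100001, endpoint=False)
EL = np.log2(1.0 + x) - x
EA = x - (2.0 ** x - 1.0)

# Both error curves share the same maximum (about 0.086), which corresponds to
# only a few bits of accuracy.
print(EL.max(), EA.max(), x[EL.argmax()])
```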

6.2.2 Shift-Add Piece-Wise Linear Approximation

In the shift-add piece-wise linear approximation method, the range [0, 1) of the fraction part x in logarithm and anti-logarithm computation is divided into a number of regions. Within each region, EL or EA is approximated by a linear

function called a segment. A linear function can be expressed as:

y = Slope ∗ x + Offset (6.7)

Alternatively, a linear segment can be defined by two of the following parameters: slope, offset and one point (with x and y coordinates); it can also be defined by two points. The complicated multiplication by Slope in (6.7) can be avoided by using shift-add logic. Some shift-add piece-wise linear approximation methods with different numbers of segments and parameter values of the linear function are presented in [64], [68]-[73]. In [68]-[72], methods were proposed for 2, 4 and 6 segments, in which the parameters of each linear segment are selected by the "trial and error" method. B.-G. Nam et al. [64] presented a method of dividing the input range into 24 regions for the logarithmic and 16 regions for the anti-logarithmic approximation. In [73], the authors proposed an optimization method for the integer binary number to logarithm conversion. However, these methods should be improved before they can be employed in high accuracy applications. Increasing the number of segments [64], [73] or using a higher order approximation [74] can improve the accuracy, but also leads to higher hardware complexity.
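A small sketch of one such segment is given below, assuming the slope is a power of two so that the multiplication in (6.7) collapses to a single shift on fixed-point data; the slope and offset values are placeholders, not the coefficients of any cited design.

```python
def segment_shift_add(x_fixed: int, shift: int, offset_fixed: int) -> int:
    """One linear segment y = Slope * x + Offset from (6.7), with
    Slope = 2^(-shift), so the multiplication becomes a right shift."""
    return (x_fixed >> shift) + offset_fixed

# Example in a 13-bit fixed-point format: Slope = 1/4, Offset = 0.004.
W = 13
x = int(0.3 * (1 << W))                   # x = 0.3
offset = int(0.004 * (1 << W))
y = segment_shift_add(x, 2, offset)
print(y / (1 << W))                       # close to 0.3/4 + 0.004 = 0.079
```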

6.2.3 Table-Based Approximation Methods

The multipartite table method (MTM) to approximate elementary functions including logarithm and anti-logarithm was presented in [75], in which only tables and adders are used. This approach can reduce the table size considerably compared with the direct LUT-based method, in which only an LUT is used to provide the conversion results. S. Paul et al. [76] presented a table-based method for LOGC and ALOGC by combining an LUT and a multiplier-less linear interpolation so that the table size can be reduced compared with some other table-based methods for the same accuracy.

6.2.4 Combined LUT-Based/Difference Method

The difference method can be employed for high accuracy conversions. The basic idea is that an LUT is used to store the difference function value, which



Figure 6.3: General difference method for logarithm approximation.

is defined by the difference between the original function and the approximated one, as depicted in Figure 6.3. In LOGC, the difference function is defined by the difference between EL and the approximated function. In a simple method proposed in [77], EL is stored in an LUT and the LUT output is added to the Mitchell approximation function to get the final result. R. Gutierrez et al. [78] proposed an improved method using the 4-segment linear approximation together with a small error LUT. Figure 6.4 presents the 4-segment linear approximation and its error function as proposed in [78]. It is reported in [78] that this method outperforms previous methods for LOGC in both area efficiency and performance. However, further improvements are desired, and the selection of the linear function parameters and coefficients by the "trial and error" method may lead to a non-optimal architecture. Therefore, in this research work, an optimization algorithm is developed to find the optimal parameters for the approximations.

6.3 Proposed Quasi-Symmetrical Approach

Since EL(x) and EA(x) have similar curves and the same maximum value as mentioned previously, a unified approach for both LOGC and ALOGC is developed. This section mainly describes the method for LOGC, but it can be applied to ALOGC as well. Moreover, a new parameter optimization algorithm combined with the LUT-based correction is presented so that the proposed method is much


Figure 6.4: The 4-segment linear method with small error LUT proposed in [78].

better than the "trial and error" method [78] which is often used in many previous works.

Firstly, as depicted in Figure 6.5, consider the curves of the original EL(x), its mirror function EL(1 − x) and their mean function EM(x) over the operand range 0 ≤ x < 1. From the definitions of EM(x) and EM(1 − x), their relationship can be derived as follows:

EM(x) = 0.5 × [EL(x) + EL(1 − x)] = EM(1 − x) (6.8)

It can be seen that EM(x) is symmetrical about the center line x = 0.5. Also, the maximum value of the difference function (EM − EL)(x) is quite small (about 0.0075) compared with the maximum value of EL (0.08639). As a result, it will be promisingly efficient if we can approximate this mean function, because it requires the approximation of only half the range of x and the other half can be interpolated easily by a simple complement circuit which uses the most significant bit (MSB) as the control bit. Moreover, the LUT size can be further reduced by half if we can approximate the mean curve accurately, because the difference function is also symmetrical.
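The symmetry claim in (6.8) and the quoted bound of about 0.0075 on (EM − EL)(x) can be checked numerically with a few lines of Python; this is only a sanity check of the idea, not part of the hardware design.

```python
import numpy as np

# Check (6.8): EM(x) = 0.5*(EL(x) + EL(1-x)) is symmetric about x = 0.5, and
# the residual (EM - EL)(x) stays small compared with the peak of EL.
x = np.linspace(0.0, 1.0, 100001)
EL = lambda t: np.log2(1.0 + t) - t
EM = 0.5 * (EL(x) + EL(1.0 - x))

print(np.max(np.abs(EM - EM[::-1])))   # symmetry: essentially zero
print(np.max(np.abs(EM - EL(x))))      # about 0.0075, versus ~0.086 for EL
```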



Figure 6.5: The idea for the proposed quasi-symmetrical approach.

Since the difference method involves an approximated function and an error correction LUT, the optimization algorithm has to take into account both the complexity of the approximated function and the LUT size. To find the architecture optimized for the smallest LUT size with the piece-wise linear approximation, we can perform an exhaustive search to find the slope and offset values which minimize the maximum value of the difference function. However, the optimal slope value of this method may lead to a high complexity of the approximate function because of the multiplying operation. A possible solution is to use a shift-add multiplier, but it may result in a low computation speed. On the other hand, using a simple slope value (for example, a power-of-two value) can reduce the complexity significantly because the multiplication can be performed by only one shift operation, but it also leads to a large LUT size. Therefore, we propose the 2-step optimization algorithm to get a good trade-off between hardware complexity and computation speed. The objective of the proposed optimization

algorithm is to find the optimal parameters for two segments that reduce the LUT size as much as possible while achieving low hardware complexity. Therefore, the optimization algorithm is developed to minimize the maximum value of the difference function, because the LUT size depends on this value. The quasi-symmetrical approach means that EL is actually not symmetrical but only nearly symmetrical; however, it can be treated like a symmetrical function with some modifications. Firstly, the entire range of x (0 ≤ x < 1) is divided into two halves so that the quasi-symmetrical method can be applied. Then, the left half range of the operand x (0 ≤ x ≤ 0.5) is further divided into two equal intervals [0, 0.25) and [0.25, 0.5] to keep the selecting circuit simple. Table 6.1 presents the optimization algorithm to find the optimal parameters of the linear segments, in which Peak point denotes the value of the approximate function at the point x = 0.5 and the difference function is defined by the difference between EL and the linear function. Figure 6.6 depicts this optimization algorithm, in which the two linear segments are chosen independently. In step 1, a full search in the restricted ranges of Peak point and Offset1 is performed to find the optimal values of Slope1 and Slope2 that minimize the maximum value (MaxDiff) of the difference function, which is stored in the LUT. Then, in step 2, Slope1 and Slope2 are reassigned to the adjacent power-of-two values and another search is performed to find the optimal offset values which minimize MaxDiff. The power-of-two slope values are used to avoid multiplications and use a one-shift operation. The ranges of Peak point and Offset1 are chosen to guarantee the acceptable accuracy of the approximated results. The optimization results in each step of the proposed algorithm for both the log2(1 + x) and (2^x − 1) functions are summarized in Table 6.2. It can be seen that after step 2, MaxDiff increases slightly but the LUT size remains the same as the result of step 1. The error analysis results are presented in Table 6.3 for the case of W = 13, in which W denotes the input and output bit-width. Figure 6.7 depicts the proposed 2-segment quasi-symmetrical method for LOGC. It can be seen that the proposed method achieves similar accuracy and LUT size compared with the method in [78]. However, the proposed method can reduce the hardware complexity significantly because EL is approximated by four linear segments, but only two segments actually need to be computed.


Table 6.1: Proposed 2-step optimization algorithm.

Step 1: For {Offset1_L ≤ Offset1 ≤ Offset1_H and Peak point_L ≤ Peak point ≤ Peak point_H}: find the optimal values of Slope1 and Slope2.
Step 2: Re-assign the optimal Slope1 and Slope2 values from step 1 to the adjacent power-of-two values and find the optimal offset values.

Table 6.2: Optimal parameter results using the 2-step optimization algorithm.

  Function       Step     Slope1   Offset1   Slope2   Offset2   MaxDiff
  log2(1 + x)    Step 1   0.2332   0.008     0.728    0.0341    0.0089 (1/112)
                 Step 2   0.25     0.004     0.0625   0.0518    0.0101 (1/99)
  (2^x − 1)      Step 1   0.2617   0.006     0.891    0.0495    0.0072 (1/139)
                 Step 2   0.25     0.003     0.125    0.0305    0.0075 (1/133)
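The following Python sketch illustrates how a two-step search of this kind can be organized: a coarse grid search over the slopes and offsets of the two segments, followed by snapping the slopes to powers of two and re-searching the offsets. The grids, the fixed split at x = 0.25 and the use of EL as the target are simplifications of this example, so the values it returns are not expected to reproduce Table 6.2 exactly.

```python
import numpy as np

EL = lambda x: np.log2(1.0 + x) - x          # target function on [0, 0.5]

def maxdiff(s1, o1, s2, o2, xs):
    """MaxDiff of a 2-segment approximation split at x = 0.25."""
    seg2 = xs >= 0.25
    approx = np.where(seg2, s2 * xs + o2, s1 * xs + o1)
    return np.max(np.abs(EL(xs) - approx))

def two_step_search(slope_grid, offset_grid):
    xs = np.linspace(0.0, 0.5, 2001)
    # Step 1: unconstrained grid search over both segments.
    best = min(((maxdiff(s1, o1, s2, o2, xs), (s1, o1, s2, o2))
                for s1 in slope_grid for o1 in offset_grid
                for s2 in slope_grid for o2 in offset_grid),
               key=lambda t: t[0])
    s1, o1, s2, o2 = best[1]
    # Step 2: snap slopes to adjacent powers of two, re-search the offsets.
    snap = lambda s: 2.0 ** np.round(np.log2(s))
    s1p, s2p = snap(s1), snap(s2)
    best2 = min(((maxdiff(s1p, o1, s2p, o2, xs), (s1p, o1, s2p, o2))
                 for o1 in offset_grid for o2 in offset_grid),
                key=lambda t: t[0])
    return best, best2

step1, step2 = two_step_search(np.linspace(0.05, 0.3, 11),
                               np.linspace(0.0, 0.06, 25))
print(step1[0], step2[0])   # compare the MaxDiff of the two steps
```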

Figure 6.8 shows the proposed architecture to compute the log2(1 + x) function for LOGC. A similar architecture is also employed to approximate the (2^x − 1) function for ALOGC.

6.4 Implementation of the Fundamental Functions

The proposed and reference architectures for the log2(1 + x) and (2^x − 1) functions are implemented in both an Altera Cyclone II FPGA (using the Quartus II design tool) and 0.18-μm CMOS technology (using the Synopsys Design Compiler and Synopsys IC Compiler tools with a standard cell library), with the same design parameters and constraints for the case of W = 13. The area in the FPGA implementation is measured by the number of logic elements (LE) used. To compare the overall converter performance, the area-delay product (ADP) results are also presented. In [78], a comparison for log2(1 + x) computation shows that the method proposed in [78] can reduce the hardware complexity by about 43% compared with the method presented in [76] while achieving similar conversion


Figure 6.6: Proposed 2-step parameter optimization algorithm for the hardware approximation of log2(1 + x).

speed and accuracy. Therefore, the implementation results of our method are compared with the method in [78] on the same hardware platforms to clarify the improvements of the proposed approach. Table 6.4 shows that in the FPGA implementation, the proposed method reduces the ADP value by 39% compared with the method in [78] for the log2(1 + x) function and by 57% compared with the MTM-based method for the (2^x − 1) function. Moreover, in ASIC implementations with 0.18-μm CMOS technology, as presented in Table 6.5, the proposed method reduces the ADP value by 44% compared with the method in [78] for the log2(1 + x) function and by 62% compared with the MTM-based method [75] for the (2^x − 1) function. The average power consumption results are also presented in Table 6.5, in which the proposed method leads to a significant reduction of power consumption compared with the other methods.



Figure 6.7: Proposed quasi-symmetrical linear approximation method (after the parameter optimization).

6.5 Applications in Logarithm Generator and HNS Processors

In this section, the proposed architectures for the approximation of the log2(1 + x) and (2^x − 1) functions are applied to logarithm/exponent function generators in DSP systems and to the arithmetic unit (AU) in HNS processors.

6.5.1 Logarithm Function Generator for DSP Applications

Logarithm and exponent function generators are highly required in many DSP applications such as digital communication systems, artificial neural networks and 3-D graphics. Firstly, consider the binary logarithm generator for a 16-bit operand; more general architectures for both logarithm and exponent functions can be built similarly. According to Equations (6.1) and (6.2), the proposed architecture for a 16-bit logarithm function generator consists of four main components as


Table 6.3: Summary of optimal parameters and error analysis results for the proposed method compared with the method presented in [78].

  Method            In [78] (for log2(1 + x))   Proposed (for log2(1 + x))   Proposed (for (2^x − 1))
  No. of segments   4                           2 (4)                        2 (4)
  MaxDiff           0.0080 (1/125)              0.0101 (1/99)                0.0075 (1/133)
  LUT size (bit)    640                         640                          512
  Mean error        2.3 × 10^−4                 2.3 × 10^−4                  2.4 × 10^−4
  Maximum error     8.0 × 10^−4                 8.0 × 10^−4                  8.2 × 10^−4


Figure 6.8: Proposed architecture for the approximation of log2(1 + x) with W = 13.

presented in Figure 6.9. The leading one detector and encoder (LODE) detects the most significant '1' bit and generates the binary coded result of the characteristic n corresponding to the operand N. The inverter block (INV) and the barrel shifter are used to generate the fraction part (x) of the operand. Finally, the logarithm approximation block provides the fraction part of the logarithm result by approximating the log2(1 + x) function. The zero flag z is also generated by the LODE circuit to indicate the exception of a zero input. Tables 6.6 and 6.7 present the implementation results for the logarithm generator using several different methods on Altera Cyclone II FPGA and 0.18-μm CMOS technology platforms, respectively. They show that the proposed method outperforms the other methods in speed, area efficiency and ADP as well.
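A simplified bit-level Python model of this datapath is sketched below. The leading-one detection and the barrel shift follow the description above (13-bit fraction, 12-bit result fraction); the log2(1 + x) block is left as a plug-in and is evaluated exactly here, whereas the hardware uses the proposed shift-add plus LUT architecture of Figure 6.8.

```python
import math

def logarithm_generator_16bit(N: int, frac_block):
    """Model of the 16-bit logarithm generator in Figure 6.9.

    Returns (z, n, fr): zero flag, 4-bit characteristic and 12-bit fraction of
    log2(N).  `frac_block` maps the 13-bit fraction x to the 12-bit result and
    stands in for the log2(1+x) approximation block.
    """
    assert 0 <= N < (1 << 16)
    if N == 0:
        return 1, 0, 0                              # zero flag from the LODE
    n = N.bit_length() - 1                          # LODE: leading-one position
    x13 = ((N << 13) >> n) & ((1 << 13) - 1)        # barrel shifter: fraction x
    return 0, n, frac_block(x13)

# Stand-in fraction block using the exact function (truncated to 12 bits).
exact_frac = lambda x13: int(math.log2(1.0 + x13 / 8192.0) * 4096)

z, n, fr = logarithm_generator_16bit(1000, exact_frac)
print(n + fr / 4096.0, math.log2(1000))             # both are about 9.9658
```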


Table 6.4: FPGA implementation results of different methods for the approximation of log2(1 + x) and (2^x − 1) with W = 13.

  Function       Method             Area (LE)   Delay (ns)   ADP (×10^3)
  log2(1 + x)    Direct LUT-based   980         19.2         18.82
                 MTM-based          216         20.0         7.49
                 Method in [78]     126         22.7         2.86
                 Proposed           100         17.3         1.73
  2^x − 1        Direct LUT-based   948         19.0         18.01
                 MTM-based          185         19.8         3.66
                 Proposed           93          16.8         1.56

It can be seen in Table 6.7 that the proposed method leads to an ADP reduction of 37% and a power consumption reduction of 26% compared with the method in [78].

6.5.2 Hybrid Arithmetic Unit For HNS Processors

In computation-intensive DSP applications such as 3-D graphics and video/image processing, many complex operations (multiplication, division, squaring, etc.) are required. Using the logarithmic number system (LNS) for the arithmetic unit (AU) can reduce the hardware complexity and improve the computation speed for these complex operations, because they are replaced by simple operations such as addition, subtraction and shifting in the logarithmic scale. However, the addition and subtraction operations in LNS are more complicated than in the conventional number system. Therefore, a good trade-off can be achieved by employing the hybrid number system (HNS), which combines LNS and the conventional number system. Figure 6.10 presents the block diagram of an HNS AU, in which X and Y denote two operands and S is the computation result. The input OP is used to select the desired operation by controlling the multiplexers (MUXes).
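The following toy Python example illustrates why such an AU is attractive: a multiplication becomes S = ALOGC(LOGC(X) + LOGC(Y)). Mitchell's approximations (6.3)/(6.4) stand in for the LOGC/ALOGC blocks here, so the product is only approximate; the real unit would use the proposed higher-accuracy converters.

```python
import math

def mitchell_log2(N: float) -> float:
    """LOGC stand-in: n + x, using log2(1 + x) ~ x from (6.3)."""
    m, e = math.frexp(N)                   # N = m * 2^e with 0.5 <= m < 1
    return (e - 1) + (2.0 * m - 1.0)

def mitchell_alog2(v: float) -> float:
    """ALOGC stand-in: 2^n * (1 + x), using 2^x ~ 1 + x from (6.4)."""
    n = math.floor(v)
    return 2.0 ** n * (1.0 + (v - n))

X, Y = 37.0, 113.0
S = mitchell_alog2(mitchell_log2(X) + mitchell_log2(Y))
print(S, X * Y)    # about 3936 versus the exact product 4181 (Mitchell error)
```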


Table 6.5: Implementation results in 0.18-μm CMOS technology of different methods for the approximation of log2(1 + x) and (2^x − 1) with W = 13.

  Function       Method             Area (×10^3 μm^2)   Delay (ns)   ADP (×10^3)   Power (mW)
  log2(1 + x)    Direct LUT-based   37.2                6.3          234.4         6.238
                 MTM-based          7.5                 9.6          72.0          1.526
                 Method in [78]     5.2                 8.7          45.2          1.304
                 Proposed           4.2                 6.0          25.2          0.931
  2^x − 1        Direct LUT-based   34.7                6.0          208.2         6.184
                 MTM-based          7.9                 8.0          63.8          1.795
                 Proposed           4.1                 5.9          24.2          0.926

Table 6.6: FPGA implementation results of different methods for the 16-bit logarithm generator.

  Method             Area (LE)   Delay (ns)   ADP (×10^3)
  Direct LUT-based   1064        26           27.7
  MTM-based          481         27.9         13.4
  Method in [78]     201         22.8         4.58
  Proposed           175         16.8         2.94

6.6 Chapter Conclusions

Logarithmic and anti-logarithmic converters are essential components in HNS processors and many DSP applications. The proposed quasi-symmetrical approach can lead to large improvements of these converters in both hardware complexity and conversion speed. The implementation results in both FPGA and 0.18-μm CMOS technology show that the proposed approach is efficient and can be a promising candidate for DSP applications that employ an HNS arithmetic unit or logarithm/exponent computation modules. However, a remaining limitation of the proposed method is that it is difficult to approximate exactly



Figure 6.9: A 16-bit logarithm function generator using the proposed method.

Table 6.7: Implementation results in 0.18-μm CMOS technology of different methods for the 16-bit logarithm generator.

  Method             Area (×10^3 μm^2)   Delay (ns)   ADP (×10^3)   Power (mW)
  Direct LUT-based   32.6                12.2         397.7         6.983
  MTM-based          23.4                13.0         304.2         5.064
  Method in [78]     9.4                 10.3         96.8          2.269
  Proposed           7.6                 8.0          60.8          1.680

the EM function, which would further reduce the LUT size by half, as presented previously. Therefore, more efficient methods will be considered in future research.



Figure 6.10: An arithmetic unit for HNS processors using proposed converters.

Figure 6.11: Layout of the proposed 16-bit logarithm function generator (116 µm × 116 µm).


Chapter 7

Prototype and Applications of Proposed IP Cores

7.1 Chapter Objectives and Targeted Applications

7.1.1 Chapter Objectives

With the increasing demand for new DSP algorithms and applications, especially for mobile, hand-held and wearable devices, efficient arithmetic components and modules are highly required. Conventional parallel architectures for arithmetic functions often lead to high hardware complexity and power consumption. Therefore, some alternatives need to be proposed for future DSP applications. In previous chapters, several efficient methods have been proposed to derive efficient hardware IP cores based on LUT architectures for specific arithmetic operations in DSP applications. The implementation results have clarified the advantages of the proposed approach of combining LUT-based and simple conventional logic circuits. This chapter provides a general application view of the hardware IP cores presented in this dissertation. Moreover, the design flow and detailed examples are presented to show how to implement the proposed IP cores for specific DSP applications. Besides, a full prototype system using these cores is developed for test purposes.


Figure 7.1: Chapter objectives.

7.1.2 Targeted Applications of Proposed IP Cores

The proposed IP cores can be used both for function-specific DSP applications and as special functional units in the datapath of modern DSP processors. For the case of function-specific DSP applications, the targeted applications of the proposed IP cores can be considered as follows:

• KCM core can be used for applications which employ constant multiplications such as FIR filter, image convolver, color space converter, etc.

• Squarer core is suitable for real-time DSP applications which require a large amount of squaring operations, such as Euclidean distance computation for Viterbi decoding and similarity evaluation for speech and image recognition.

• Sine function computation core can be employed for frequency synthesizers, digital mixers, modulators/demodulators and many other applications in digital communication systems.


• Logarithm and anti-logarithm function computation cores are applicable to many speech and graphics applications. The logarithmic and anti-logarithmic converters are also very important components in DSP processors which employ the logarithmic number system or the hybrid number system, as presented in Chapter 6.

Moreover, these IP cores can be used as special functional units in the datapath of a DSP processor for specific application classes which require high numbers of the corresponding function evaluations.

7.2 Design Flow for Applications using Proposed IP Cores

7.2.1 Design Flow

Figure 7.2 presents the general design flow for DSP applications using the IP cores proposed in this dissertation. These cores are provided as components in a library. Based on the detailed application specifications, the hardware description language (HDL) code for the arithmetic components based on these IP cores can be generated automatically by an automation software tool (a Matlab program). After that, the HDL description for the whole application is developed. Then, the resulting HDL code is synthesized and implemented on FPGA or ASIC hardware platforms. Finally, verification is performed to confirm the functionality and performance of the application.

7.2.2 Advantages of Using Proposed IP Cores for DSP Applications

The proposed IP cores are optimized for high area-delay efficiency as presented in previous chapters. Therefore, using these cores in applications can lead to the following advantages.

• Low area and high speed, due to the use of efficiently optimized IP cores.



Figure 7.2: Design flow for DSP applications using proposed IP cores.

• High flexibility, because these IP cores can be parameterized for the specific applications.

• Reduced design time, since the VHDL code of each IP core can be generated automatically by a Matlab program.

7.3 A Prototype Computation System for the Proposed IP Cores

Traditionally, a digital signal processor is a dedicated processor targeted for digital signal processing, with a specific hardware architecture to support real-time and high speed applications. It differs from general-purpose processors, which are more flexible and targeted for a wide range of applications. Currently, a trend in DSP processor architecture is the merged DSP

core/Micro-controller Unit (MCU) architecture, due to the increasing demand for new applications. The Blackfin processor family released by Analog Devices, Inc. is one typical processor family which employs this merged architecture [82]. A merged DSP processor which combines a 32-bit RISC MCU core and a 16-bit DSP core is presented in [83]. Tay-Jyi Lin et al. [84] presented a unified processor architecture for the RISC-VLIW DSP processor. Moreover, Texas Instruments Inc. has announced its first VLIW family called C6x [85]. With the increasing demand for high performance DSP applications and the continuous development of integrated circuit technology, it can be predicted that one of the major trends in future DSP system architecture is a higher degree of merging of DSP core, MCU and coprocessor. Figure 7.3 presents the trend of merging the DSP core with MCU and coprocessor. A coprocessor is a type of dedicated processor used to support the operations of the primary processor. Some examples of operations supported by a coprocessor are floating point arithmetic, signal processing, string processing, graphics and encryption. By taking the computation-intensive tasks from the main processor, coprocessors can help to accelerate the system performance. H. Parandeh-Afshar et al. [86] presented a parallel merged multiplier-accumulator coprocessor optimized for digital filters. Therefore, it is also required to develop new architectures of the arithmetic unit for merged DSP processors in future DSP applications. The computation system presented in this section is developed for two purposes. Firstly, it is the full prototype in which the separate components (the proposed IP cores presented in previous chapters) are implemented as a whole. Secondly, it can be considered as a new idea for the arithmetic unit which includes both conventional and specific dedicated arithmetic components for the merged DSP core/MCU/coprocessor architecture.

7.3.1 Prototype Computation System Architecture

Future DSP applications require not only the basic computation components such as adders, multipliers and MAC (multiply-accumulator), but also some dedicated components and specific complex function generators, such as squaring, sine function and logarithm function generators for digital communication applications.



Figure 7.3: Merged DSP core/MCU/coprocessor architecture as a trend for future DSP system architecture.

The system performance can be improved significantly if these frequently used operations are available as dedicated components. For example, the computation of the logarithm function is reduced to only one clock cycle by using such a dedicated function module, instead of using many multiplication, addition and powering operations. Certainly, this approach trades off against an increase of the overall system complexity. However, with the improved methods for the above components presented in previous chapters, the system complexity can be reduced to an acceptable value. Figure 7.4 shows the general architecture of the prototype computation system, which includes two inputs (X and Y), the operation selecting signal (OP), the output (R) and eight building blocks (two blocks are the control and output selection units and the six other blocks are computing components). For the purpose of this first prototype of the proposed computation system, the input bit-width is chosen to be 16 bits and the result bit-width is 32 bits. The next section describes each component in this computation system in more detail. Together with the control and output selection units, six functional computation blocks are included in the computation system. The following part explains these blocks in detail.



Figure 7.4: Hardware architecture of the prototype computation system.

• Control unit and output selection unit: The control unit generates the control signals for the other components and synchronizes the operation of the whole computation system. The output selection unit has the role of selecting the appropriate output signal that corresponds to the desired operation. The operation selecting signal is 4 bits wide so that 16 operation types can be carried out. Table 7.1 lists all operations supported by the prototype computation system, in which the conventional ALU operations indicate the basic operations performed by the conventional ALU module as presented in Appendix B.

• Squaring, sine, logarithm and anti-logarithm function generators: The LUT-based squarer circuit, sine function generator and logarithm function generator,


Table 7.1: List of operations supported by the prototype computation system.

  OP            Operation
  0000          Logarithm
  0001          Sine
  0010          Squaring
  0011          Multiply
  0100          Anti-logarithm
  0101 - 0111   Reserved
  1000 - 1111   Conventional ALU operations

as presented in previous chapters, are also included in the proposed computation system.

• Conventional ALU and multiplier units: The conventional arithmetic and logic unit (ALU) is used to compute basic logic operations such as AND, OR, XOR and NOT, and arithmetic operations such as addition and subtraction. The detailed list of operations supported by this ALU unit is presented in Appendix B.
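As a purely behavioural illustration of Table 7.1, the Python sketch below models the operation dispatch of the prototype system. All operation bodies are software stand-ins for the dedicated LUT-based hardware blocks, and the fixed-point scalings are assumptions made only for this example.

```python
import math

def prototype_dispatch(op: int, x: int, y: int) -> int:
    """Behavioural stand-in for the OP decoding of Table 7.1."""
    if op == 0b0000:                       # Logarithm (12-bit fraction assumed)
        return int(math.log2(x) * 4096) if x > 0 else 0
    if op == 0b0001:                       # Sine (16-bit phase -> signed 16-bit)
        return int(math.sin(2.0 * math.pi * x / 65536.0) * 32767)
    if op == 0b0010:                       # Squaring
        return x * x
    if op == 0b0011:                       # Multiply
        return x * y
    if op == 0b0100:                       # Anti-logarithm
        return int(2.0 ** (x / 4096.0))
    if 0b1000 <= op <= 0b1111:             # Conventional ALU operations
        return (x + y) & 0xFFFFFFFF        # placeholder: only ADD is modelled
    raise ValueError("reserved opcode")

print(prototype_dispatch(0b0010, 255, 0))  # 65025
```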

7.3.2 System Implementation Results

The computation system is developed with the design flow depicted in Figure 7.2. It can be seen that the proposed approach for the dedicated computation system leads to higher speed computation with a small extra area (area overhead) for the dedicated components. The computation of complex operations (such as sine, logarithm and anti-logarithm) can be performed in a single cycle, which is far better than the conventional approach of using a Taylor approximation to compute the logarithm function as suggested by Texas Instruments (TI) in [87]. For example, the implementation in the TI TMS320C67X processor needs 134 clock cycles to complete a logarithm computation [88]. Moreover, it is shown in Figure 7.6 that the area for the proposed components is quite small compared


Figure 7.5: Layout of the prototype computation system (650 µm × 650 µm).

with the conventional multiplier block (MUL) and ALU. Therefore, the additional area for these dedicated computation components is not a critical issue. The whole computation system is implemented in 0.18-μm CMOS technology using a standard cell library. Figure 7.5 shows the layout of this computation system, which is planned to be fabricated in 0.18-μm CMOS technology. Table 7.2 presents the main parameters for the ASIC implementation of the prototype system and Table 7.3 lists the area consumed by each component, as also presented in Figure 7.6.

7.3.3 Verification and Measurement Method

Figure 7.7 presents the verification and measurement flow for the prototype computation system. The pattern generator using FPGA hardware generates the test vectors as the input for the circuit under test (the computation system) on the chip board. The logic analyzer is used to record the corresponding outputs.


Figure 7.6: Area breakdown of the prototype computation system (MUL 44%, SINE 13%, ALU 10%, LOG 10%, SQ 9%, ALOGC 9%, OUT 4%, CON 2%).

The output results are saved in csv files which are used as the input data for Matlab software to verify the computation results by comparing them with the pre-calculated expected results.

7.4 Design Examples Using Proposed IP Cores

This section provides some detailed design examples of DSP applications using the proposed IP cores. It can help designers and other researchers understand the design methodology clearly.

7.4.1 RGB to YCbCr Color Space Conversion Using Proposed Multiplier IP Core

7.4.1.1 Color Space Conversion and Its Hardware Architecture

A color space (or color model) is a mathematical model describing the way colors can be represented, created and specified. Currently, many color spaces are used


Table 7.2: Main parameters for the ASIC implementation of the prototype computation system.

  Technology               0.18-μm CMOS technology
  Supply voltage (VDD)     1.8 V
  Layout dimension         0.61 x 0.61 mm
  Power consumption (mW)   2.10
  Throughput               1 cycle/operation
  Maximum latency          2 cycles

Table 7.3: Area consumption by components in the prototype computation system (×10^3 μm^2).

  Multiplier (MUL)            21.70
  ALU                         4.76
  Squarer (SQ)                4.23
  Sine generator (SINE)       6.35
  Logarithm generator (LOG)   4.76
  Anti-logarithm (ALOGC)      4.68
  Control Unit (CTRL)         1.06
  Output Select Unit (OUT)    2.12

for a wide range of image, video and multimedia applications [89]. Among these color spaces, RGB is used widely in computer graphics, in which each color is represented as a combination of three primary color components (Red, Green and Blue). However, processing an image in the RGB representation is not the most efficient method in real-time image/video applications because of the high correlation between the color components [90]. For example, the intensity of all three color components has to be adjusted when we want to change the intensity of a pixel. If each pixel is represented in the form of intensity and color separately, the processing speed can be improved because the color components are uncorrelated. Therefore,



Figure 7.7: Verification and measurement flow for the prototype computation system.

many video, digital TV and imaging standards use independent luminance and color difference signals. In ITU-R digital TV standards such as BT.601 and BT.709, the YCbCr color space is used, in which each pixel is represented by a luminance component (Y) and two chrominance components (Cb and Cr). Due to the mathematical properties of the YCbCr color space representation, the component Y has a range of 16 to 235, while the Cb and Cr components are limited to the range of 16 to 240. Moreover, many image processing algorithms require the conversion from the RGB color space to YCbCr (or the YUV color space, of which YCbCr is a scaled and offset version) to speed up some essential processing steps, such as JPEG compression [91] and digital image watermarking [92]. For the above reasons, it is required to design an efficient RGB to YCbCr color space converter (CSC) for image and video applications. In this section, we present an improved conversion design which employs the proposed LUT-based KCM.


According to the ITU-R BT.601 digital TV standard, we have the basic equations of the RGB to YCbCr conversion described as:

  Y  = 16  + 0.257R + 0.504G + 0.098B
  Cb = 128 − 0.148R − 0.291G + 0.439B        (7.1)
  Cr = 128 + 0.439R − 0.368G − 0.071B

To be implemented in digital hardware using an 8-bit binary representation for the inputs and coefficients, the above equations can be approximated as follows:

  Y  = 16  + (66R + 129G + 25B)/256
  Cb = 128 + (112B − 38R − 74G)/256          (7.2)
  Cr = 128 + (112R − 94G − 18B)/256

Exploiting this mathematical expression, the RGB to YCbCr color space conversion can be implemented with constant multipliers and adders/subtractors. In [93], the authors presented a method to implement this conversion using conventional constant multipliers and adders/subtractors, which leads to high hardware complexity and computation delay. Moreover, the output color components are not required to have the full bit width but often have the same bit width as the input color components. Therefore, in this work, to reduce the hardware area and circuit delay, a CSC is developed using the LUT-based KCM core presented in Chapter 3 and some multi-input adders, as depicted in Figure 7.8. The area and delay improvement comes at the cost of some computation error due to the use of truncated multipliers. The LUT-based multipliers are optimized with the previously mentioned algorithm so that the conversion error is minimized.
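For reference, a plain Python model of the integer conversion in (7.2) is given below; it uses exact 8-bit constant multiplications and an arithmetic shift for the division by 256, whereas the proposed CSC replaces each constant product with the LUT-based truncated KCM.

```python
def rgb_to_ycbcr_bt601(r: int, g: int, b: int):
    """Integer RGB -> YCbCr conversion following (7.2) (ITU-R BT.601, 8 bit)."""
    y  =  16 + ((66 * r + 129 * g +  25 * b) >> 8)
    cb = 128 + ((112 * b -  38 * r -  74 * g) >> 8)
    cr = 128 + ((112 * r -  94 * g -  18 * b) >> 8)
    return y, cb, cr

print(rgb_to_ycbcr_bt601(255, 0, 0))    # pure red -> roughly (81, 90, 239)
```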

7.4.1.2 FPGA Implementation and Verification Results

The proposed CSC and comparable reference architectures are implemented on the Altera Cyclone II FPGA with the same design parameters and constraints. Table 7.4 presents the comparison of the proposed design with the others in terms of implementation results and average relative error. Again, ADP is used as the figure of merit for comparing the different methods. The table shows that our proposed CSC leads to a significant reduction of both area and delay, resulting in a large reduction of ADP. Compared with the results in [93], the proposed CSC reduces the ADP by 48% due to the use of the proposed efficient LUT-based truncated multiplier.



Figure 7.8: Proposed RGB to YCbCr color space conversion using LUT-based constant multipliers.

Moreover, the average relative error of the proposed design is also reduced significantly compared with the conventional methods. To further clarify the improvement of the proposed method and perform more error analysis of the overall CSC design, the FPGA implementation of the proposed conversion architecture was also verified with the Altera DSP design flow using the DSP Builder tool together with the Matlab-Simulink environment, as shown in Figure 7.9, in which the hardware co-simulation is carried out by the Hardware In the Loop (HIL) block of the Altera DSP Builder Blockset Library. For each sample image, both a conventional simulation with Matlab software only and a hardware/software co-simulation with the FPGA board are performed. The converted images of the two methods are then compared to obtain the error analysis results. Table 7.5 presents the verification results using Matlab simulation and Altera hardware/software co-simulation, together with the maximum and average relative error for each color component. In this table, the relative error (Erel) for each color component in the YCbCr space is calculated as the difference


Table 7.4: Comparison of FPGA implementation results of the color space converter.

  Method                            Area (no. of LEs)   Delay (ns)   ADP    Average Erel (%)
  Using shift-add KCM               226                 20.6         4656   1.62
  Using Altera Megacore-based KCM   224                 17.6         3942   1.45
  Proposed in [93]                  370                 17.5         6475   1.32
  Using proposed LUT-based KCM      206                 16.4         3378   1.26

between its values generated by the FPGA hardware and by the Matlab software, respectively. It can be seen that the proposed CSC results in only a small amount of error, which is caused by the truncation operations in the proposed multipliers and the different data representation methods used in the Matlab software and the FPGA hardware implementation. Moreover, the proposed CSC is synthesized and implemented in 0.18-μm CMOS technology using a standard cell library, as shown in Table 7.6.

7.4.2 2-D Convolver for Image Processing

7.4.2.1 Image Convolver

Mathematically, convolution is an operation on two functions h and g that produces another function which can be considered as a modified version of one of the original functions. In other words, convolution is a way of multiplying together two arrays of numbers of different sizes to produce a third array of numbers. The convolution operation is used widely in many image and signal processing applications, probability, statistics, computer vision and electrical engineering. In image processing, convolution is used to implement operators whose output pixel values are simple linear combinations of certain input pixels of the input image. Convolution belongs to a class of algorithms called spatial filters. Spatial filters use a wide variety of masks (kernels) to calculate different results, depending on the desired function. 2-D convolution is the most important one in modern image processing

and is used popularly in image processing for many applications such as image filtering, edge detection and image recognition [94]. The basic idea is to scan a window of some finite size over an image. The output pixel value is the weighted sum of the input pixels within the window, where the weights are the values of the filter assigned to every pixel of the window. The window with its weights is called the convolution mask (or kernel matrix) and the hardware block for the convolution computation is called the convolver. Figure 7.10 presents the operation of a 2-D convolver with a given kernel matrix K. To compute the convolution of each pixel of the input image, the window around this pixel is multiplied with the kernel matrix by a per-element multiplication. Then, the accumulative addition is performed to provide the output pixels. Moreover, the 2-D convolution can be expressed as a linear combination or sum of products of the mask coefficients with the input function. Particularly, the mathematical operation of a 2-D convolver can be expressed as:

  PO(x, y) = Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} PI(x + i, y + j) × K(i, j)    (7.3)

where PI(x, y) and PO(x, y) denote the input and output pixels, while K(i, j) is the coefficient value of the kernel matrix with the size of M × N. In this design example, the popular value of M = N = 3 is chosen for the kernel matrix size.

7.4.2.2 Convolver Hardware Architecture and Implementation Results

Figure 7.11 shows the hardware architecture of a typical image convolver, in which the kernel matrix multiply module is the most important part. Firstly, the input image is taken by a window with the same size as the kernel matrix. Then, this window for each input pixel is multiplied with the kernel matrix and an adder tree is used to provide the output pixel value via an output buffer. For the purpose of a design example, the kernel matrix weights are chosen to perform a high-pass filter as follows:

      ⎡ −1  −2  −1 ⎤
  K = ⎢ −2  13  −2 ⎥
      ⎣ −1  −2  −1 ⎦
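A software reference for (7.3) with this 3×3 kernel is sketched below in Python; in the hardware convolver each constant product is instead mapped onto the proposed KCM core and summed by the adder tree.

```python
import numpy as np

K = np.array([[-1, -2, -1],
              [-2, 13, -2],
              [-1, -2, -1]])                    # high-pass kernel from above

def convolve_3x3(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Direct evaluation of (7.3) for M = N = 3 (borders are not padded)."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2), dtype=np.int32)
    for y in range(h - 2):
        for x in range(w - 2):
            window = image[y:y + 3, x:x + 3].astype(np.int32)
            out[y, x] = int(np.sum(window * kernel))   # weighted sum
    return out

img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
print(convolve_3x3(img, K).shape)               # (6, 6)
```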


Using the design flow in Figure 7.2, the implementations on both FPGA and ASIC platforms were completed. With the standard 8-bit pixel format, the VHDL code for the constant multipliers based on the proposed KCM core is generated automatically by a Matlab program. After that, the other components in the image convolver are developed. The resulting VHDL code of the entire convolver is then synthesized and implemented on both FPGA and ASIC hardware platforms. Table 7.7 presents the FPGA implementation results of the proposed convolver compared with the design in [95] for the same case of a 3×3 convolver. It can be seen that the proposed convolver achieves higher performance and area efficiency due to the use of the optimized KCM core. The verification method for the FPGA implementation is similar to that of the color space converter, as shown in Figure 7.9, and the results are presented in Table 7.8, in which the input sample images are in grayscale format. Moreover, Table 7.9 summarizes the ASIC implementation results in 0.18-μm CMOS technology using a standard cell library.

7.4.3 Speech Feature Extraction Using Proposed Logarithm Generator

7.4.3.1 Speech Recognition

In computer science, speech recognition (SR) is the translation of spoken words into text or other data formats for other applications [96]. Speech recognition is employed in many application fields such as health care systems, fighter aircraft, helicopters, air traffic control, etc. Algorithms and software implementations of SR have been deeply explored and proposed in many research papers and presented in many real systems. In speech recognition, and speech processing in general, feature extraction is one of the essential computation steps. This section presents the implementation of the logarithm computation block for the feature extraction part in a typical SR system. As presented in Figure 7.12, a general speech recognition system includes three main components. The sampled input speech signal is fed to the feature extraction block to extract the feature information of the input speech. Then, this feature information is compared with the stored reference data by using a similarity measurement circuit. Based on the

results of this measurement, the decision logic circuit generates the recognized results (e.g. text data).

7.4.3.2 Speech Feature Extraction

Mel frequency cepstral coefficients (MFCC) are an efficient method for speech feature extraction for both speech and speaker recognition applications [97]. Figure 7.13 presents the hardware architecture of an MFCC feature extraction module. Firstly, the speech sample is windowed (for example, by a Hamming window). Then, the Fast Fourier Transform (FFT) is performed to convert the speech signal from the time domain to the frequency domain. After the Mel-scaled filter banks, the logarithm computation is carried out as presented in Equation 7.4. Finally, the Discrete Cosine Transform (DCT) is performed to provide the Mel cepstral coefficient results.

FMel = 781 × log2(1 + fnorm) (7.4)

where FMel indicates the MFCC feature result and fnorm is the normalized frequency value obtained from the FFT module. The normalized frequency can be computed from the frequency output f of the FFT computation as:

fnorm = f/700 (7.5)

This frequency normalization can be performed by a constant division (by 700) which is integrated with the Mel-scale filter banks.

7.4.3.3 Hardware Architecture and Implementation Results

Figure 7.14 shows the proposed hardware architecture for the logarithm function computation block in the MFCC speech feature extraction. Assume that the speech sample has a bit-width of 16 bits with a 12-bit fraction (in fixed-point representation). The Zero Magnitude Detection (ZMD) circuit is used to detect the case of a zero-magnitude input (fnorm < 1) and directly forward fnorm to the log2(1 + x) block. The scaler circuit has the role of detecting the most significant one bit and providing the fraction value to the log2(1 + x) block in the other case of fnorm ≥ 1.


Finally, the KCM circuit performs the constant multiplication by 781 to obtain the MFCC feature result. Tables 7.10 and 7.11 summarize the implementation of the logarithm computation block for MFCC feature extraction on FPGA and ASIC hardware platforms. The comparison with the method of using an Altera Mega-Core for this block is also presented in Table 7.10. It can be seen that by using the proposed logarithm computation IP core, the performance and area efficiency of this computation block can be improved. Moreover, the verification result of this FPGA implementation is depicted in Figure 7.15, which plots the computation results obtained by the hardware implementation. The mean and maximum errors are the same as the results in Table 6.3 of Chapter 6.
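The behaviour of this block can be sketched in Python as follows. The way the ZMD and scaler paths are combined around (1 + fnorm) is this sketch's reading of the description above, the 12-bit fraction format follows the assumption stated earlier, and log2(1 + x) is evaluated exactly, standing in for the proposed LUT/shift-add block.

```python
import math

def mel_feature(f_norm_fixed: int, frac_bits: int = 12) -> float:
    """Model of Figure 7.14: F_Mel = 781 * log2(1 + f_norm), see (7.4)."""
    one = 1 << frac_bits
    v = one + f_norm_fixed                        # 1 + f_norm in fixed point
    if f_norm_fixed < one:                        # ZMD path: n = 0, x = f_norm
        n, x = 0, f_norm_fixed / one
    else:                                         # scaler: leading-one detection
        n = v.bit_length() - 1 - frac_bits
        x = v / (1 << (n + frac_bits)) - 1.0
    return 781.0 * (n + math.log2(1.0 + x))       # KCM: constant multiply by 781

f = 4000.0                                        # input frequency in Hz
print(mel_feature(int(f / 700.0 * 4096)))         # about 2146 for f = 4 kHz
```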

7.5 Chapter Conclusions

In this chapter, the applications and design method for DSP applications have been presented together with some detailed design examples. Moreover, a computation system based on the proposed LUT-based IP cores has been developed for prototyping purposes. By showing the design flow and detailed design examples, this chapter can help other researchers and designers understand the method and advantages of using the proposed IP cores.



Figure 7.9: Verification flow with DSP builder using Matlab simulation and HIL.


Table 7.5: Verification results using the Altera DSP flow by hardware co-simulation with a Cyclone II FPGA in the Matlab-Simulink environment (for each sample image, the RGB input image and the YCbCr output images from the Matlab function and from the proposed method are shown).

  Sample image        Lena                          Pepper
  Maximum Erel (%)    Y: 0.85  Cb: 1.01  Cr: 0.77   Y: 0.95  Cb: 1.09  Cr: 0.97
  Average Erel (%)    Y: 0.13  Cb: 0.52  Cr: 0.45   Y: 0.15  Cb: 0.51  Cr: 0.48


Table 7.6: ASIC implementation results of the proposed RGB to YCbCr color space converter using a standard cell library with 0.18-μm CMOS technology.

  Area (×10^3 μm^2)   7.42
  Delay (ns)          6.6
  Power (mW)          1.74

Figure 7.10: Operation of a 2-D convolver for image processing.


Figure 7.11: Hardware architecture of a 2-D convolver for image processing.


Table 7.7: FPGA implementation results of the 2-D convolver with an input/output resolution of 8 bits.

  Method            Area (number of LEs)   Delay (ns)
  Method in [95]    322                    32.4
  Proposed method   265                    18.9

Table 7.8: Verification result of 2-D convolver.

  Original image | Convolved image (two grayscale sample images and their convolution outputs)


Table 7.9: ASIC implementation results of the 2-D convolver in 0.18-μm CMOS technology using a standard cell library.

  Input/output resolution   8 bit
  Area (×10^3 μm^2)         9.35
  Delay (ns)                8.6
  Power (mW)                1.84


Figure 7.12: Block diagram of speech recognition system.


Figure 7.13: Feature extraction hardware architecture for the speech recognition system.



Figure 7.14: Hardware architecture of the logarithm computation block for MFCC feature extraction in the speech recognition application.

Table 7.10: FPGA implementation results of the logarithm computation block in MFCC feature extraction with an input/output resolution of 16 bits.

  Method                   Area (number of LEs)   Delay (ns)
  Using Altera Mega-Core   443                    28.4
  Proposed method          340                    22.9

Table 7.11: ASIC implementation results of the logarithm computation block in MFCC feature extraction in 0.18-μm CMOS technology using a standard cell library.

  Input/output resolution   16 bit
  Area (×10^3 μm^2)         15.47
  Delay (ns)                11.3
  Power (mW)                2.86



Figure 7.15: Verification results of the logarithm computation block for MFCC feature extraction in the speech recognition application.

Chapter 8

Conclusions and Future Work

This chapter summarizes the main results of this work and describes some open topics for future research. Moreover, the conclusion of the work is included.

8.1 Conclusions

In this dissertation, the fundamentals of LUT-based computation and DSP systems were reviewed and discussed in Chapter 2. After that, four efficient IP cores based on the LUT architecture were proposed. Chapter 3 presented an efficient LUT-based truncated multiplier for the constant multiplication which is required in many DSP applications. An improved hybrid LUT-based architecture for the design of a low error, efficient fixed-width squarer was described in Chapter 4. In Chapter 5, an improved linear difference approximation combined with an LUT-based architecture was proposed for the sine function computation in the Direct Digital Frequency Synthesizer. Chapter 6 presented a novel quasi-symmetrical approach for logarithm and anti-logarithm function computation which can be used in the domain converters of hybrid number system processors and as function generators in many DSP applications. Then, in Chapter 7, the applications and design method for DSP applications using the proposed IP cores were presented together with some detailed design examples. Moreover, a computation system based on the proposed LUT-based IP cores was developed for the prototype purpose.


According to the architecture proposals, numerical analyses, implementation and measurement results presented in this dissertation, the following concluding remarks can be drawn:

• The approach of combining LUT-based circuits with simple conventional logic circuits for arithmetic components and units in DSP applications is very promising for future DSP applications, which require not only high performance but also low-area and low-power computation modules.

• Moreover, by using the proposed parameter and LUT content optimization algorithms, the computation performance and area efficiency can be further improved.

• The proposed methods presented in this work will be useful for system designers and developers to choose a suitable method for each specific DSP application.

8.2 Future Work

Referring to the above concluding remarks, there are several areas of this work which could be extended for future research, as follows:

• Modern and future DSP systems require many high-performance and low-area computation components for specific applications. Therefore, efficient architectures for other arithmetic operations such as square root, division and sigmoid functions will be considered.

• More efficient architectures for the LUT-based computation components and systems presented in this dissertation, as well as their applications in both DSP and general computation systems, should be developed, especially for LUT-based computation components with high bit-width operands.

• The new LUT and memory technologies can be applied together with the architectural and parameter optimization methods so that higher performance and area efficiency can be achieved.


[Block diagram blocks: Instruction Memory, Instruction Fetch, Instruction Decode, Interconnection Network, Register File, LSU, Data Memory, SFU, MUL and ALU.]

Figure 8.1: TTA processor architecture.

• The proposed LUT-based architectures should be combined with design techniques for low-power DSP applications. In particular, the future research plan also targets an ultra-low-power unified arithmetic unit for future DSP processors, due to the increasing demand for low-power electronic devices and multimedia applications.

• An improved speech recognition application using the proposed logarithm generator and an efficient processing method, as mentioned in Section 7.4.3, will be considered.

• The future research will also target the advanced Transport Triggered Architecture (TTA) processor for speech processing, as shown in Figure 8.1, in which the special functional unit (SFU) can be implemented by one of the proposed IP cores for the specific application [98] and LSU indicates the load/store unit. The philosophy behind this architecture is that the instructions define the transport between functional units. Moreover, it can be customized for a given application by extending the instruction set with user-defined application-specific operations that are implemented in hardware as SFUs.


Appendix A

Sine/Cosine Computation Using Pipelined CORDIC

This appendix presents the pipelined CORDIC-based sine/cosine computation targeted at the Direct Digital Frequency Synthesizer (DDFS).

A.1 CORDIC Algorithm

Coordinate Rotation Digital Computer (CORDIC), first described in 1959 by Jack E. Volder, is a simple and efficient algorithm to calculate trigonometric and hyperbolic functions, multiplications, divisions and data type conversions [99]. One of its applications is to compute the sine and cosine values in a DDFS, called a CORDIC-based DDFS [100]. Two basic CORDIC modes have been developed for the computation of different functions: the rotation mode and the vectoring mode. For the purpose of sine/cosine computation, this appendix only deals with the rotation mode of the CORDIC algorithm. However, in both modes, the CORDIC algorithm performs a planar rotation. Graphically, a planar rotation means transforming a vector (X0, Y0) into a new vector (Xi, Yi), as depicted in Figure A.1. This transform can be expressed by the following equations:

Xi = X0 cos θ − Y0 sin θ   (A.1)

Yi = Y0 cos θ + X0 sin θ   (A.2)

Then, these equations can be rewritten as:

Xi = cos θ (X0 − Y0 tan θ)   (A.3)

Yi = cos θ (Y0 + X0 tan θ)   (A.4)

The CORDIC algorithm consists of a series of rotations starting from the vector (X0, Y0). After n iterations, the vector (Xn, Yn) is obtained as in (A.3) and (A.4), with the index changed to n. In each iteration, the vector is rotated by a micro-angle θi, called a micro-rotation. The new vector is computed by equations similar to (A.3) and (A.4), in which di = ±1 indicates the direction of the i-th rotation step and the value of tan θi is restricted to 2^−i so that each multiplication becomes an arithmetic right shift:

Xi+1 = cos θi (Xi − di Yi 2^−i)   (A.5)

Yi+1 = cos θi (Yi + di Xi 2^−i)   (A.6)

or, in another form, with Ki = cos θi = cos(arctan 2^−i):

Xi+1 = Ki (Xi − di Yi 2^−i)   (A.7)

Yi+1 = Ki (Yi + di Xi 2^−i)   (A.8)

However, the multiplication by Ki is avoided by treating it as a gain factor common to all rotation steps. After n iterations, the overall scale factor is K = K0·K1·…·Kn, and when n → ∞, K = 0.607252935.... Therefore, the multiplication by the constant K needs to be performed only once, at the end of all iterations.


Figure A.1: Vector rotation in CORDIC algorithm.

In the application of computing the sine and cosine functions for a DDFS, to avoid this multiplication while still maintaining sufficient accuracy, the starting vector (X0, Y0) is initialized as (|K|, 0). Each iteration performs the computation in (A.5) and (A.6). In order to decide whether di = 1 or di = −1, a new variable is defined as:

zi+1 = zi − di arctan(2^−i)   (A.9)

The value of z0 is initialized with the angle θ for which the sine and cosine functions are to be computed. Then, di = 1 when zi > 0, and di = −1 otherwise. In summary, with the above rotation steps, if the starting vector (X0, Y0) (the input of the CORDIC module) is initialized as (|K|, 0), then after n iterations the outputs of this module are given, with n-bit precision, as Xn = cos θ and Yn = sin θ.
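As an informal illustration of these rotation steps, the following C sketch models the rotation-mode iterations (A.5), (A.6) and (A.9) in floating point. The number of iterations, the on-the-fly computation of the gain K and the function names are assumptions of this sketch; the actual design is described in VHDL as noted in Section A.3.

#include <math.h>
#include <stdio.h>

#define N_ITER 16   /* number of micro-rotations (assumed) */

/* Rotation-mode CORDIC: compute cos(theta) and sin(theta) for theta in [0, pi/2]. */
static void cordic_sincos(double theta, double *cos_out, double *sin_out)
{
    /* Gain K = product of cos(arctan 2^-i) ~ 0.607252935; starting from
     * (K, 0) removes the final multiplication by K, as explained above. */
    double K = 1.0;
    for (int i = 0; i < N_ITER; i++)
        K *= cos(atan(ldexp(1.0, -i)));

    double x = K, y = 0.0, z = theta;
    for (int i = 0; i < N_ITER; i++) {
        double d  = (z >= 0.0) ? 1.0 : -1.0;     /* rotation direction d_i      */
        double xn = x - d * y * ldexp(1.0, -i);  /* (A.5); a shift in hardware  */
        double yn = y + d * x * ldexp(1.0, -i);  /* (A.6)                       */
        z -= d * atan(ldexp(1.0, -i));           /* (A.9)                       */
        x = xn;
        y = yn;
    }
    *cos_out = x;   /* Xn ~ cos(theta) */
    *sin_out = y;   /* Yn ~ sin(theta) */
}

int main(void)
{
    double c, s;
    cordic_sincos(0.5235987756 /* pi/6 */, &c, &s);
    printf("cos = %.6f, sin = %.6f\n", c, s);   /* ~0.866025, ~0.500000 */
    return 0;
}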

A.2 Pipelined CORDIC Architecture

As mentioned previously, the CORDIC algorithm involves n iterations to produce n-bit precision results. Therefore, the typical CORDIC implementation is an iterative architecture which has only one stage to perform all micro-rotations. In the iterative architecture, an additional control logic module and registers are also required to control the one-stage CORDIC module at each iteration step. This architecture is hardware efficient but has the disadvantage of a high computation delay, because it requires more than one clock cycle for each output value. To improve the speed of the CORDIC computation and obtain a good trade-off between performance and hardware complexity, the pipelined CORDIC architecture has been proposed by dividing the CORDIC computation module into a number of stages, where each stage performs one iteration. Figure A.2 shows the structure of the i-th stage in this architecture. X0, Y0 and Z0 are initialized with the values mentioned in the previous section. In a CORDIC-based DDFS, the CORDIC module is used as the phase-to-amplitude converter, as depicted in Figure A.3. Since the CORDIC algorithm has the restricted convergence region of [0, π/2], the two most significant bits (MSB1 and MSB2) of the phase accumulator output are used as control bits to generate the full sine wave. Therefore, a CORDIC-based DDFS has two complementors, as shown in Figure A.3, in which the CORDIC module is pipelined as described above.
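As an informal illustration of this quadrant handling, the C sketch below folds a phase-accumulator word into the convergence region [0, π/2] and flags when the sine output must be negated. The phase width, the complementing scheme and the identifier names are assumptions of this sketch; only the use of MSB1 and MSB2 as control bits comes from the description above.

#include <stdint.h>

#define PHASE_BITS 12                               /* assumed accumulator width */
#define LOWER_MASK ((1u << (PHASE_BITS - 2)) - 1u)  /* bits below MSB1 and MSB2  */

/* Fold a full-circle phase word into [0, pi/2] (1st complementor) and report
 * whether the sine amplitude must be negated afterwards (2nd complementor). */
uint32_t fold_phase(uint32_t phase, int *negate_out)
{
    int msb1 = (phase >> (PHASE_BITS - 1)) & 1;   /* selects the sine half-period    */
    int msb2 = (phase >> (PHASE_BITS - 2)) & 1;   /* selects the quadrant within it  */
    uint32_t lower = phase & LOWER_MASK;

    /* 2nd and 4th quadrants: sin(pi - x) = sin(x), so mirror the phase. */
    uint32_t folded = msb2 ? (LOWER_MASK - lower) : lower;

    /* 3rd and 4th quadrants: sin(pi + x) = -sin(x), so negate the sine output. */
    *negate_out = msb1;
    return folded;
}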

A.3 Implementation Results

The DDFS design was done with the Xilinx ISE 10.1 software suite. All modules were written in VHDL and simulated as components in the ModelSim XE III 6.3c software. The simulation results show that this DDFS can produce the sine and cosine waveforms with high frequency accuracy and fast settling time. The full design was also targeted to a Xilinx Spartan-3E xc3s500e-5fg320 FPGA device. Table A.1 shows the synthesis and implementation results. Owing to the pipelining technique, the speed of the DDFS can be improved, with a maximum clock frequency of 183 MHz in the Xilinx xc3s500e FPGA. In CORDIC-based systems, high-accuracy waveforms can be produced without large sine LUTs. The pipelining technique is employed for both the phase accumulator and the CORDIC module to improve the speed of the DDFS by generating a sample output in each clock cycle.


Figure A.2: A stage in the pipelined CORDIC architecture.

[Block diagram: Phase Accumulator → 1st Complementor → pipelined CORDIC → 2nd Complementor → to DAC, with MSB1 and MSB2 used as control bits.]

Figure A.3: Block diagram of the CORDIC-based DDFS.

Table A.1: Implementation results for the pipelined CORDIC-based DDFS.

Number of slices             305
Number of slice flip-flops   513
Number of 4-input LUTs       556
Maximum clock frequency      183 MHz


Appendix B

Operation List of the Conventional ALU Component

In this appendix, the list of all operations supported by the conventional ALU component in the prototype computation system (Chapter 7) is presented in Table B.1.

Table B.1: List of operations supported by the conventional ALU component.

OP     Operation
000    AND
001    OR
010    XOR
011    NOT
100    Add
101    Subtract
110    Left shift
111    Right shift
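As an informal illustration, a bit-accurate C model of this operation table might look as follows; the 16-bit data width, the single-bit shift amount and the treatment of NOT as operating on the first operand are assumptions of this sketch rather than details of the actual ALU component.

#include <stdint.h>

/* Informal C model of the op-code decoding in Table B.1. */
uint16_t alu(uint8_t op, uint16_t a, uint16_t b)
{
    switch (op & 0x7) {
    case 0x0: return a & b;          /* 000: AND         */
    case 0x1: return a | b;          /* 001: OR          */
    case 0x2: return a ^ b;          /* 010: XOR         */
    case 0x3: return (uint16_t)~a;   /* 011: NOT         */
    case 0x4: return a + b;          /* 100: Add         */
    case 0x5: return a - b;          /* 101: Subtract    */
    case 0x6: return a << 1;         /* 110: Left shift  */
    default:  return a >> 1;         /* 111: Right shift */
    }
}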

Appendix C

List of Publications

C.1 Journal Papers

[1] Van-Phuc Hoang and Cong-Kha Pham, “An Improved Linear Difference Method with High ROM Compression Ratio in Direct Digital Frequency Synthesizer,” IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences, vol. E94.A, no. 3, pp. 995-998, Mar. 2011.

[2] Van-Phuc Hoang and Cong-Kha Pham, “Efficient LUT-Based Truncated Multiplier and Its Application in RGB to YCbCr Color Space Conversion,” IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences, vol. E95.A, no. 6, pp. 999-1006, Jun. 2012.

[3] Van-Phuc Hoang and Cong-Kha Pham, “An Improved Hybrid LUT-Based Architecture for Low-Error and Efficient Fixed-Width Squarer,” IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences, vol. E95.A, no. 7, pp. 1180-1184, Jul. 2012.


C.2 Book Chapter

[1] Van-Phuc Hoang and Cong-Kha Pham, “Low Error, Efficient Fixed Width Squarer Using Hybrid LUT-based Architecture,” in Advances in Electrical Engineering and Electrical Machines, edited by Dehuai Zeng, Lecture Notes in Electrical Engineering (LNEE), vol. 134, pp. 223-230, Springer, ISSN 1876-1100, 2011.

C.3 International Conference Presentations

[1] Van-Phuc Hoang, Thi-Tam Hoang and Cong-Kha Pham, “FPGA Implementation of a Direct Digital Synthesizer Using Pipelined CORDIC-Based Approach,” Proc. Triangle Symposium on Advanced ICT 2009 (TriSAI2009), pp. 105-108, Oct. 2009.

[2] Van-Phuc Hoang and Cong-Kha Pham, “Improved Linear Difference Method for Sine ROM Compression in Direct Digital Frequency Synthesizer,” Proc. The First Solid-State Systems Symposium-VLSI & Related Technologies (4S-2010), pp. 192-195, Jun. 2010.

[3] Van-Phuc Hoang and Cong-Kha Pham, “Efficient LUT-Based Multiplier and Squarer for DSP Applications,” Proc. IEICE International Conference on Integrated Circuits and Devices in Vietnam (ICDV 2011), pp. 148-153, Aug. 2011.

[4] Van-Phuc Hoang and Cong-Kha Pham, “Low Error, Efficient Fixed Width Squarer Using Hybrid LUT-based Architecture,” Proc. Springer 2011 International Conference of Electrical and Electronics Engineering (ICEEE2011), pp. 223-230, Dec. 2011.

[5] Van-Phuc Hoang and Cong-Kha Pham, “Low-Area, High-Speed Logarithmic and Anti-logarithmic Converters for Digital Signal Processors Based on Hybrid Number System,” IEEE Symposium on Low-Power and High-Speed Chips (COOL Chips XV), Poster no. 8, Apr. 2012.


[6] Van-Phuc Hoang and Cong-Kha Pham, “Novel Quasi-Symmetrical Approach for Efficient Logarithmic and Anti-logarithmic Converters,” Proc. IEEE/VDE 8th Conference on Ph.D. Research in Microelectronics & Electronics (PRIME 2012), pp. 111-114, Jun. 2012.

[7] Van-Phuc Hoang and Cong-Kha Pham, “Low-Error and Efficient Fixed-Width Squarer for Digital Signal Processing Applications,” Proc. IEEE 4th International Conference on Communications and Electronics (ICCE2012), pp. 477-482, Aug. 2012.

C.4 Technical Report

[1] Van-Phuc Hoang and Cong-Kha Pham, “Design of a low error LUT-based truncated multiplier,” IEICE Tech. Rep., vol. 110, no. 344 (ICD2010-126), pp. 159-162, Dec. 2010.

Bibliography

[1] Global Digital Signal Processors (DSP) Market by Intellectual Property (IP), Design Architecture & Applications (2011 - 2016). [Online]. Available: http://www.marketsandmarkets.com.

[2] K. Yosida, T. Sakamoto, and T. Hase, “A 3D graphics library for 32-bit for embedded systems,” IEEE Transactions on Consumer Electronics, vol. 44, no. 4, pp. 1107-1114, Aug. 1998.

[3] Byeong-Gyu Nam, Hyejung Kim, and Hoi-Jun Yoo, “Power and Area-Efficient Unified Computation of Vector and Elementary Functions for Handheld 3D Graphics Systems,” IEEE Transactions on Computers, vol. 57, no. 4, pp. 490-504, Apr. 2008.

[4] International Technology Roadmap for Semiconductors. [Online]. Available: http://public.itrs.net/

[5] Kiyoo Itoh, “Embedded Memories: Progress and a Look into the Future,” IEEE Design and Test of Computers, vol. 28, no. 1, pp. 10-13, Jan.-Feb. 2011.

[6] S. Matsunaga, J. Hayakawa, S. Ikeda, K. Miura, T. Endoh, H. Ohno, T. Hanyu, “MTJ-based nonvolatile logic-in-memory circuit, future prospects and issues,” Proc. Design, Automation & Test in Europe Conference & Exhibition (DATE ’09), pp. 433-435, Apr. 2009.

[7] Hojung Kim, Sanghun Jeon, Myoung-Jae Lee, Jaechul Park, Sangbeom Kang, Hyun-Sik Choi, Churoo Park, Hong-Sun Hwang, Changjung Kim, Jaikwang Shin, U-In Chung, “Three-Dimensional Integration Approach to High-Density Memory Devices,” IEEE Transactions on Electron Devices, vol. 58, no. 11, pp. 3820-3828, Nov. 2011.

[8] Bipul C. Paul, Shinobu Fujita, and Masaki Okajima, “ROM-Based Logic (RBL) Design: A Low-Power 16 Bit Multiplier,” IEEE Journal of Solid-State Circuits, vol. 44, no. 11, pp. 2935-2942, Nov. 2009.

[9] Pascal Meinerzhagen, S. M. Yasser Sherazi, Andreas Burg, and Joachim Neves Rodrigues, “Benchmarking of Standard-Cell Based Memories in the Sub-VT Domain in 65-nm CMOS Technology,” IEEE Transactions on Emerging and Selected Topics in Circuits and Systems, vol. 1, no. 2, pp. 173-182, Jun. 2011.

[10] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, USA, 1989.

[11] Intel Corp., “MMX technology architecture overview,” Intel Technology Journal, Sep. 1997.

[12] M. Denman, P. Anderson, M. Snyder, “Design of the PowerPC 604e microprocessor,” Proc. 41st IEEE International Computer Conference (COMPCON ’96), pp. 126-131, Feb. 1996.

[13] Linda Kaouane, Mohamed Akil, Thierry Grandpierre and Yves Sorel, “A Methodology to Implement Real-Time Applications onto Reconfigurable Circuits,” The Journal of Supercomputing, Kluwer Academic Publishers, vol. 30, no. 3, pp. 283-301, 2004.

[14] A. Kalavade, E. A. Lee, “A hardware-software codesign methodology for DSP applications,” IEEE Design & Test of Computers, vol. 10, no. 3, pp. 16-28, Sept. 1993.

[15] M. Brogioli, P. Radosavljevic, J. R. Cavallaro, “A General Hardware/Software Co-design Methodology for Embedded Signal Processing and Multimedia Workloads,” Proc. Fortieth Asilomar Conference on Signals, Systems and Computers (ACSSC ’06), pp. 1486-1490, Nov. 2006.


[16] Daniel N. Leeson and Donald L. Dimitry, Basic Programming Concepts and the IBM 1620 Computer, Holt, Rinehart and Winston, 1962.

[17] M. Rafiqzzaman and R. Chandra, Modern , West Publishing Co., St. Paul, MN, 1988.

[18] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, chapter 24, Oxford University Press, 2000.

[19] Pramod Kumar Meher, “LUT Optimization for Memory-Based Computation,” IEEE Transactions on Circuits and Systems-II: Express Briefs, vol. 57, no. 4, pp. 285-289, Apr. 2010.

[20] P. K. Meher, “An optimized lookup-table for the evaluation of sigmoid function for artificial neural networks,” 2010 18th IEEE/IFIP VLSI System on Chip Conference (VLSI-SoC), pp. 91-95, Sep. 2010.

[21] H. Ling, “An approach to implementing multiplication with small tables,” IEEE Transactions on Computers, vol. 39, no. 5, pp. 717-718, May 1990.

[22] B. Vinnakota, “Implementing multiplication with split read-only memory,” IEEE Transactions on Computers, vol. 44, no. 11, pp. 1352-1356, Nov. 1995.

[23] Behrooz Parhami and Hsun-Feng Lai, “Alternate memory compression scheme for modular multiplication,” IEEE Transactions on Signal Processing, vol. 41, no. 3, pp. 1378-1385, Mar. 1993.

[24] Behrooz Parhami, “Modular reduction by multi-level table lookup,” Proceedings of the 40th Midwest Symposium on Circuits and Systems, pp. 381-384, Aug. 1997.

[25] Hwan-Rei Lee, Chein-Wei Jen, and Chi-Min Liu, “On the design automation of the memory-based VLSI architectures for FIR filters,” IEEE Transactions on Consumer Electronics, vol. 39, no. 3, Aug. 1993.

[26] P. K. Meher, “New Approach to Look-Up-Table Design and Memory-Based Realization of FIR Digital Filter,” IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 57, no. 3, pp. 592-603, Mar. 2010.


[27] R. Gutierrez and J. Valls, “Low-Power FPGA-Implementation of atan(Y/X) Using Look-Up Table Methods for Communication Applications,” Journal of Signal Processing Systems, Springer, vol. 56, no. 1, pp. 25-33, Jul. 2009.

[28] G. Inoue, “Multiplication device using semiconductor memory,” US Patent no. 5617346, Apr. 1997.

[29] Jiun-In Guo, Chi-Min Liu, and Chein-Wei Jen, “The efficient memory-based VLSI array design for DFT and DCT,” IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 39, no. 10, pp. 723-733, Oct. 1992.

[30] P. K. Meher, “Memory-based hardware for resource-constraint digital signal processing systems,” Proc. 6th International Conference on Information, Communications & Signal Processing, pp. 1-4, Dec. 2007.

[31] Te-Jen Chang, Chia-Long Wu, Der-Chyuan Lou, Ching-Yin Chen, “A low-complexity LUT-based squaring algorithm,” Journal of Computers & Mathematics with Applications, vol. 57, Issue 9, pp. 1494-1501, May 2009.

[32] Anantha P. Chandrakasan, Samuel Sheng, and Robert W. Brodersen, “Low-Power CMOS Digital Design,” IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 473-484, Apr. 1992.

[33] James E. Stine and Oliver M. Duverne, “Variations on Truncated Multiplication,” Proceedings of the Euromicro Symposium on Digital System Design (DSD’03), pp. 112-119, Sep. 2003.

[34] Pramod Kumar Meher, “New Approach to Look-Up-Table Design and Memory-Based Realization of FIR Digital Filter,” IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 57, no. 3, pp. 592-603, Mar. 2010.

[35] Pramod Kumar Meher, “LUT-Based Circuits for Future Wireless Systems,” Proceedings of IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 696-699, Aug. 2010.

[36] Tom Kean, Bernie New, Robert Slous, “A Fast Constant Coefficient Multiplier for the XC6200,” Proceedings of the 6th International Workshop on Field-Programmable Logic, Smart Applications, New Paradigms and Compilers, pp. 230-236, Sep. 1996.

[37] Ernest Jamro, Kazimierz Wiatr, “Constant Coefficient Convolution Implemented in FPGAs,” Proceedings of the Euromicro Symposium on Digital System Design (DSD’02), pp. 291-198, Sep. 2002.

[38] Ian Kuon, Russell Tessier and Jonathan Rose, “FPGA Architecture: Survey and Challenges,” Foundations and Trends in Electronic Design Automation, vol. 2, no. 2, pp. 135-253, Feb. 2008.

[39] Aiping Hu, A. J. Al-Khalili, “Comparison of Constant Coefficient Multipliers for CSD and Booth Recoding,” Proceedings of the 14th International Conference on Microelectronics, pp. 66-69, Dec. 2002.

[40] Nicola Petra, Davide De Caro, Valeria Garofalo, Ettore Napoli, and Antonio G. M. Strollo, “Truncated Binary Multipliers With Variable Correction and Minimum Mean Square Error,” IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 57, no. 6, pp. 1312-1325, Jun. 2010.

[41] Anantha P. Chandrakasan, Samuel Sheng, and Robert W. Brodersen, “Low-power CMOS digital design,” IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473-484, Apr. 1992.

[42] J. Pihl and E. Aas, “A multiplier and squarer generator for high performance DSP applications,” Proc. 39th Midwest IEEE Symposium on Circuit and Systems, pp. 109-112, Aug. 1996.

[43] R. K. Kolagotla, N. R. Griesbach and H. R. Srinivas, “VLSI implementation of 350 MHz 0.35 μm 8 bit merged squarer,” Electronics Letters, vol. 34, no. 1, pp. 47-48, Jan. 1998.

[44] K.-J. Cho and J.-G. Chung, “Parallel squarer design using pre-calculated sums of partial products,” Electronics Letters, vol. 43, no. 25, pp. 1414-1416, Dec. 2007.


[45] A. G. M. Strollo and D. De Caro, “Booth folding encoding for high performance squarer circuits,” IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 50, no. 5, pp. 250-254, May 2003.

[46] Kyung-Ju Cho and Jin-Gyun Chung, “Adaptive error compensation for low error fixed-width squarers,” IEICE Trans. Information and Systems, vol. E90-D, no. 3, pp. 621-626, Mar. 2007.

[47] Valeria Garofalo, Marino Coppola, Davide De Caro, Ettore Napoli, Nicola Petra, Antonio G. M. Strollo, “A novel truncated squarer with linear compensation function,” Proc. 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 4157-4160, Jun. 2010.

[48] E. G. Walters III and M. J. Schulte, “Efficient function approximation using truncated multipliers and squarers,” Proc. 17th IEEE Symposium on Computer Arithmetic, pp. 232-239, Jun. 2005.

[49] N. Petra, D. De Caro, V. Garofalo, E. Napoli, and A. G. M. Strollo, “Truncated binary multipliers with variable correction and minimum mean square error,” IEEE Trans. Circuits and Systems-I: Regular Papers, vol. 57, no. 6, pp. 1312-1325, Jun. 2010.

[50] V. Garofalo, N. Petra, D. De Caro, A. G. M. Strollo and E. Napoli, “Low error truncated multipliers for DSP applications,” Proc. 15th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2008), pp. 29-32, Sep. 2008.

[51] Chin-Long Wey and Ming-Der Shieh, “Design of a high speed square generator,” IEEE Trans. Computers, vol. 47, no. 9, pp. 1021-1026, Sep. 1998.

[52] Wei-Chang Tsai, Ming-Der Shieh, Wen-Chin Lin, and Chin-Long Wey, “Design of square generator with small look-up table,” Proc. IEEE Asia Pacific Conference on Circuits and Systems (APCCAS 2008), pp. 172-175, Dec. 2008.

[53] Jouko Vankka, Digital Synthesizers and Transmitters for Software Radio, chapter 9, Springer, Jul. 2005.

[54] V. Garofalo, N. Petra and E. Napoli, “Analytical calculation of the maximum error for a family of truncated multipliers providing minimum mean square error,” IEEE Trans. Computers, vol. 60, no. 9, pp. 1366-1371, Sep. 2011.

[55] P. K. Meher, J. Valls, Tso-Bing Juang, K. Sridharan, and K. Maharatna, “50 Years of CORDIC: Algorithms, Architectures, and Applications,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 56, no. 9, pp. 1893-1907, Sep. 2009.

[56] J. Vankka, M. Waltari, M. Kosunen, and K. A.I. Halonen, “A Direct Digital Synthesizer with an on-chip D/A-converter,” IEEE Journal of Solid-State Circuits, vol. 33, no. 2, pp. 218-227, Feb. 1998.

[57] A Technical Tutorial on Digital Signal Synthesis, Analog Devices, Inc, 1999.

[58] J. Tierney, C. M. Rader, and B. Gold, “A Digital Frequency Synthesizer,” IEEE Trans. Audio Electroacoust., vol. AU-19, no. 1, pp. 48-57, Mar. 1971.

[59] D. A. Sunderland, R. A. Strauch, S. S. Wharfield, H. T. Peterson, and C. R. Cole, “CMOS/SOS Frequency Synthesizer LSI Circuit for Spread Spectrum Communications,” IEEE J. of Solid State Circuits, vol. SC-19, no. 4, pp. 497-505, Aug. 1984.

[60] H. T. Nicholas, H. Samueli, and B. Kim, “The Optimization of Direct Digital Frequency Synthesizer Performance in the Presence of Finite Word Length Effects,” Proceedings of 42nd Annual Frequency Control Symposium, pp. 357-363, Jun. 1988.

[61] H. T. Nicholas, III and H. Samueli, “A 150-MHz Direct Digital Frequency Synthesizer in 1.25-micron CMOS with -90 dBc Spurious Performance,” IEEE Journal of Solid-State Circuits, vol. 26, no. 12, pp. 1959-1969, Dec. 1991.

[62] Li-Wen Hsu and Dah-Chung Chang, “Design of Direct Digital Frequency Synthesizer with high ROM Compression Ratio,” Proc. 12th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2005), pp. 1-4, Dec. 2005.


[63] D. De Caro, N. Petra, and A. G. M. Strollo, “Reducing Lookup-Table Size in Direct Digital Frequency Synthesizers Using Optimized Multipartite Table Method,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 55, no. 7, pp. 2116-2127, Aug. 2008.

[64] B.-G. Nam, H. Kim and H.-J. Yoo, “A low-power unified arithmetic unit for programmable handheld 3-D graphics systems,” IEEE J. Solid-State Circuits, vol. 42, no. 8, pp. 1767-1778, Aug. 2007.

[65] Jeremie Detrey and Florent de Dinechin, “A VHDL library of LNS operators,” Proc. 37th Asilomar Conference on Signals, Systems & Computers, vol. 2, pp. 2227-2231, Nov. 2003.

[66] V. Paliouras and T. Stouraitis, “Low-power properties of the logarithmic number system,” Proc. 15th IEEE Symposium on Computer Arithmetic, pp. 229-236, Jun. 2001.

[67] J. N. Mitchell, “Computer multiplication and division using binary logarithms,” IEEE Trans. Electron. Comput., vol. 11, no. 11, pp. 512-517, Aug. 1962.

[68] M. Combet, H. Van Zonneveld, and L. Verbeek, “Computation of the base two logarithm of binary numbers,” IEEE Trans. Electron. Comput., vol. EC-14, no. 6, pp. 863-867, Dec. 1965.

[69] E. L. Hall, D. D. Lynch, and S. J. Dwyer, “Generation of products and quotients using approximate binary logarithms for digital filtering applications,” IRE (now IEEE) Trans. Comput., vol. 19, pp. 97-105, Feb. 1970.

[70] S. Sangregory, C. Brothers, D. Gallagher, and R. Siferd, “A fast, low-power logarithm approximation with CMOS VLSI implementation,” Proc. 42nd Midwest Symp. Circuits Syst., pp. 388-391, Aug. 1999.

[71] K. H. Aded and R. E. Siferd, “CMOS VLSI implementation of low power logarithmic converter,” IEEE Trans. Comput., vol. 52, no. 9, pp. 1221-1228, Nov. 2003.


[72] Tso-Bing Juang, Sheng-Hung Chen and Huang-Jia Cheng, “A lower error and ROM-free logarithmic converter for digital signal processing applications,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 12, pp. 931-935, Dec. 2009.

[73] D. De Caro, N. Petra and A. G. M. Strollo, “Efficient logarithmic converters for digital signal processing applications,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 10, pp. 667-671, Oct. 2011.

[74] S. Nasayama, T. Sasao, and J. T. Butler, “Programmable numerical function generators based on quadratic approximation: architecture and synthesis method,” Proc. Asia and South Pacific Conference on Design Automation, pp. 378-383, Jan. 2006.

[75] Florent de Dinechin and Arnaud Tisserand, “Multipartite table methods,” IEEE Trans. Comput., vol. 54, no. 3, pp. 319-330, Mar. 2005.

[76] S. Paul, N. Jayakumar and S.P. Khatri, “A fast hardware approach for approximate, efficient logarithm and antilogarithm computations,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 2, pp. 269-277, Feb. 2009.

[77] G. L. Kmetz, “Floating point/logarithmic conversion systems,” U.S. patent 4583180, Apr. 1986.

[78] R. Gutierrez and J. Valls, “Low cost hardware implementation of logarithm approximation,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 12, pp. 2326-2330, Dec. 2011.

[79] Paulo S. R. Diniz, Adaptive Filtering: Algorithms and Practical Implementation, Third Edition, Kluwer Academic Publishers, 2008.

[80] S. S. Mahant-Shetti, S. Hosur, A. Gatherer, “The log-log LMS algorithm,” Proc. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-97), vol. 3, pp. 2357-2360, Apr. 1997.


[81] Mansour A. Aldajani, “Logarithmic quantization in the least mean squares algorithm,” Digital Signal Processing, Elsevier, Issue 18, pp. 321-333, May 2008.

[82] Website of Analog Devices, Inc. [Online]. Available: http://www.analog.com

[83] Wookyeong Jeong, Sangjun An, Moongyung Kim, Sangkyong Heo, Youngjun Kim, Sangook Moon, Yongsurk Lee, “Design of a combined processor containing a 32-bit RISC microprocessor and a 16-bit fixed-point DSP on a chip,” Proc. 6th International Conference on VLSI and CAD (ICVC ’99), pp. 305-308, 1999.

[84] Tay-Jyi Lin, Shin-Kai Chen, Yu-Ting Kuo, Chih-Wei Liu, and Pi-Chen Hsiao, “Design and Implementation of a High-Performance and Complexity-Effective VLIW DSP for Multimedia Applications,” Journal of Signal Processing Systems, Kluwer Academic Publishers, vol. 51, Issue 3, pp. 209-223, Jun. 2008.

[85] R. Simar, R. Tatge, “How TI adopted VLIW in digital signal processors,” IEEE Solid-State Circuits Magazine, vol. 1, Issue 3, pp. 10-14, Summer 2009.

[86] H. Parandeh-Afshar, S. M. Fakhraie, O. Fatemi, “Parallel merged multiplier-accumulator coprocessor optimized for digital filters,” Computers and Electrical Engineering, Elsevier, vol. 36, Issue 5, pp. 864-873, Sep. 2010.

[87] Jason Jiang, “General Guide to Implement Logarithmic and Exponential Operations on a Fixed-Point DSP,” Application Report (SPRA619), Texas Instruments Inc., Dec. 1999.

[88] Mei Yang, Yuke Wang, Jinchu Wang and S. Q. Zheng, “Optimized scheduling and mapping of logarithm and arctangent functions on TI TMS320C67X processor,” Proc. 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 3, pp. III-3156-III-3159, May 2002.


[89] Marko Tkalcic, Jurij F. Tasic, “Color Spaces: Perceptual, Historical and Applicational Background,” Proceedings of IEEE International Conference on Computer as a Tool (IEEE EUROCON 2003), vol.1, pp. 304-308, Sep. 2003.

[90] F. Bensaali, A. Amira, “Design and Implementation of Efficient Architectures for Color Space Conversion,” ICGST International Journal on Graphics, Vision and Image Processing, vol. 5, no. 1, pp. 37-47, Dec. 2004.

[91] Hideki Noda, Nobuteru Takao, Michiharu Niimi, “Colorization in YCbCr Space and its Application to Improve Quality of JPEG Color Images,” Proceedings of IEEE International Conference on Image Processing (ICIP 2007), pp. IV-385-IV-388, Oct. 2007.

[92] S. Mabtoul, E. Hassan, I. Elhaj, and D. Aboutajdine, “Robust Color Image Watermarking Based on Singular Value Decomposition and Dual Tree Complex Wavelet Transform,” Proceedings of 14th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2007), pp. 534-537, Dec. 2007.

[93] A. M. Sapkal, Mousami Munot, Dr. M. A. Joshi, “R’G’B’ to Y’CbCr color space conversion Using FPGA,” Proceedings of IET International Conference on Wireless, Mobile and Multimedia Networks, pp. 255-258, Mar. 2008.

[94] Kazimierz Wiatr, Ernest Jamro, “Implementation Image Data Convolutions Operations in FPGA Reconfigurable Structures for Real-Time Vision Systems,” Proc. International Conference on Information Technology: Coding and Computing, pp. 152-157, 2000.

[95] M. A. Vega-Rodriguez, J. M. Sanchez-Perez and J. A. Gomez-Pulido, “An optimized architecture for implementing image convolution with reconfigurable hardware,” Proc. Automation Congress 2004, vol. 16, pp. 131-136, 2004.

[96] R. D. Peacocke and D. H. Graf, “An introduction to speech and speaker recognition,” Computer, vol. 23, Issue 8, pp. 26-33, Aug. 1990.


[97] J.-C. Wang, J.-F. Wang, and Y.-S. Weng, “Chip design of MFCC extraction for speech recognition,” Integration, the VLSI Journal, vol. 32, no. 1-3, pp. 111-131, Nov. 2002.

[98] H. Corporaal, Microprocessor Architectures: From VLIW to TTA, John Wiley & Sons, Chichester, UK, 1997.

[99] J. E. Volder, “The CORDIC trigonometric computing technique,” IRE (now IEEE) Transactions on Electronic Computers, pp. 330-334, Sep. 1959.

[100] D. De Caro, N. Petra and A. G. M. Strollo, “A 380 MHz direct digital synthesizer/mixer with hybrid CORDIC architecture in 0.25-μm CMOS,” IEEE Journal of Solid-State Circuits, vol. 42, no. 1, Jan. 2007.

Acknowledgements

First of all, I would like to express my deepest gratitude to my principal advisor, Asso. Prof. Cong-Kha PHAM, for his infinite encouragement, guidance and support throughout my doctoral course. While carrying out the research and preparing the dissertation, I received much invaluable guidance and many meaningful discussions, which led to the successful accomplishment of my dissertation. It has been a pleasure working under his supervision and I would like to maintain this good relationship with him. Also, I would like to express my sincere appreciation to the other committee members, including Prof. Kazushi NAKANO, Prof. Yoshinao MIZUGAKI, Prof. Koichiro ISHIBASHI and Asso. Prof. Takayuki NAGAI, for their interesting questions and valuable comments on this dissertation. Moreover, I would like to thank the Information and Communication Technology (ICT) International Program, The University of Electro-Communications (UEC) and the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan for providing me with an excellent opportunity to study in Japan and for the MEXT scholarship that I received during the time of this research. Thanks also go to the members of the PHAM Lab for their kindness and support during my doctoral course. Furthermore, I would like to send my deep gratitude to my parents for their encouragement and for providing me with the best educational opportunities. I am thankful to my wife, Pham Thi Kim Anh, for being together with me and giving me love and encouragement during the most important time of my doctoral course. Especially, many thanks go to my lovely daughter, Hoang Phuong Thao, for her love and fun every day after my school time.


Last but not least, I would like to acknowledge the VLSI Design and Education Center (VDEC), the University of Tokyo and ROHM Ltd. for their technical support.

Author Biography

HOANG Van Phuc was born in Hung Yen, Vietnam on June 15th, 1982. He received the B.E. (Bachelor of Engineering) degree in Electronics & Telecommunications and the M.E. (Master of Engineering) degree in Electronic Engineering, both from Le Quy Don Technical University, Hanoi, Vietnam, in 2006 and 2008, respectively. He is currently working toward the Ph.D. degree in Electronic Engineering at The University of Electro-Communications, Tokyo, Japan. His research interests include digital circuits and systems, reconfigurable hardware systems and VLSI architecture for digital signal processing. Mr. Hoang Van Phuc is a student member of the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan.
