Floating-Point Division Operator Based on CORDIC Algorithm 79

Floating-Point Division Operator based on CORDIC Algorithm 79 Floating-Point Division Operator based on CORDIC Algorithm Pongyupinpanich Surapong1 and Faizal Arya Samman2, Non-members ABSTRACT the complexity to implement the operation, especially when using floating-point data operands. Design and evaluation of a CORDIC (COordinate Rotation DIgital Computer) algorithm for a floating- Compared to other arithmetic operator, designing point division operation is presented in this paper. In a divider unit requires a special attention because of general, division operation based on CORDIC algo- the complexity to implement the operation, especially rithm has a limitation in term of the range of inputs when using floating-point data operands. The special that can be processed by the CORDIC machine to attention opens a challenging issue for many scientists give proper convergence and precise division opera- and researchers to introduce new efficient algorithms tion result. A hardware architecture of CORDIC al- and methods to design the divider unit. [1] gorithm capable of processing broader input ranges Division operations are often used in many sci- is implemented and presented in this paper by using entific computations of image and signal processing a pre-processing and a post-processing stage. The algorithms. Image convolution and Gaussian Filter- performance as well as the calculation error statis- ing are example computations [2] that require divider tics over exhaustive sets of input tests are evaluated. operator as presented in Equ. (1) and Equ. (2), re- The results show that the CORDIC algorithm can spectively. Equ. (1) is the equation of pixel output of be well-convergence and gives precise division opera- a 3 × 3 convolution masks window, where wi is the tion results with broader input ranges. The proposed weight of the j adjacent input image pixel within the hardware architecture is modeled in VHDL and syn- window. thesized on a CMOS standard-cell technology and a FPGA device, resulting 1 GFlops on the CMOS and P 210.812 MFlops on the FPGA device. 9 p w P = Pi=0 i i (1) 9 w Keywords: Floating-Point Operators, Accelerator i=0 i Processor, Product-of-Sum, Sum-of-Product, 32-bit IEEE Standard Single-Precision. Another important equation with division operation is presented by the 2D Gaussian distribution as 1. INTRODUCTION shown in Equ. (2), where x and y is the two dimen- In modern digital computer architecture, floating- sional image signal and σ is the standard deviation point arithmetic units have been important compo- of the distribution. The Gaussian filtering is used to nents to improve the performance of the digital com- \blur" images and remove detail and noise [2]. puter. Arithmetic components such as adder/sub- tractor, multiplier and divider units are generally 1 − x2+y2 basic operators in a scientific computations beside 2σ2 G(x; y) = 2 e (2) other trigonometric functions such as sine, cosine, 2πσ logarithm, exponent, etc. Compared to fixed point arithmetic units, the floating-point arithmetic units In adaptive digital signal processing applications provides better accuracy and precision and it cov- for instance, division operations are used in a nor- ers larger data ranges, which is suitable for scien- malized adaptive least-mean-square (LMS) algorithm tific computations in engineering application areas. presented in Equ. (5) to update the parameters of an Compared to other arithmetic operator, designing a adaptive filter. The filter output signal, Equ. (3), is divider unit requires a special attention because of compared to the desired system model output signal (d), resulting in an error signal, Equ. (4). This Manuscript received on July 2, 2012. error signal is then used to drive the adaptive filter 1 The author is with Ramkhamhaeng University, Fac- parameters, in such a way that finally the adaptive ulty of Engineering, Department of Computer Engineering, Ramkhamhaeng Road, Hua Mark, Bangkapi, Bangkok 10240, filter parameters (wj) will be equal (almost equal in Thailand , E-mail: [email protected] practice) to system model parameters. The signal di- 2 The author is with Universitas Hasanuddin at Makassar, visor in this case (γ + x(k − j)2) is used to improve Faculty of Engineering, Department of Electrical Engineering, Jl. Perintis Kemerdekaan Km. 10, Tamalanrea, Makassar the stability of the adaptive parameter identification 90245, Indonesia. , E-mail: [email protected] algorithm. 80 ECTI TRANSACTIONS ON COMPUTER AND INFORMATION TECHNOLOGY VOL.7, NO.1 May 2013 malization or division by convergence [12]. The dis- advantage of the Goldschmidt's algorithm in term of NXtap the area overhead is the need for two independent − 8 2 y(k) = wj(k)x(k j); j Ψ (3) parallel multiplication. As we know, a multiplier re- j=1 quires large number of logic area, especially when it e(k) = d(k) − y(k) (4) is implemented in floating-point arithmetic. β e(k)x(k − j) 5. Newton-Raphson Algorithm: The Newton- w (k + 1) = w (k) + ; 8j 2 Ψ (5) j j γ + x(k − j)2 Raphson division algorithm is almost similar with the Goldschmidt's algorithm. In the Newton-Raphson 2. STATE-OF-THE-ARTS OF DIVISION method however, the iterative refinement is applied METHODS only to the reciprocal value of the divisor, which will be convergent after several iterations [13]. The divi- In this section, we will present brief descriptions sion operation of the Newton-Raphson method can on the state-of-the-art of the methodologies or algo- be divided into three steps, i.e. the initial estimation rithms to implement the binary division operation. of the divisor's reciprocal, the iterative refinement of The methodologies are described as follows. the divisor's reciprocal and the multiplication step be- 1. Adder-Cell-based Method: The design of divi- tween the divided and the final convergent divisor's sion operator using the adder-cell-based method will reciprocal. The work in [14] has presented for ex- always result in a very compact divider architecture. ample a decimal floating-point divider using newton- This method is classified as non iterative technique, raphson iteration, where an accurate piece-wise linear where the divider unit consists of half-adder and full- approximation is used to obtain an initial estimate of adder cells as well as other logic gate units and sup- a divisor's reciprocal. porting modules [3]. A binary divider that uses carry- 6. CORDIC Algorithm: Beside the aforemen- save adder units is presented for example in [4]. tioned method to implement the division operation, 2. Digit Recurrence Algorithm: In modern float- there is also another powerful algorithm to implement ing point arithmetic units the most common algo- the divider unit called CORDIC (COrdinate Rotation rithm employed to division function is a digit recur- DIgital Computer) algorithm [15]. Like digit recurrence algorithm [5] [6] [7]. The algorithm performs rence method, CORDIC is also classified into iter- both operations based on shifting and subtraction ative method. The main powerful characteristic of as the fundamental operators. A combined floating- the CORDIC algorithm is the capability to imple- point square-root and division operation can also be ment several trigonometric function [16], [15], phase implemented by using a subtractive SRT (Sweeney, and magnitude functions [17], and hyperbolic func- Robertson and Tocher) algorithm [8], which can be tions [18] as well as linear operational function such classified as a digit recurrence algorithm. The sub- as multiplication and division functions. tractive SRT algorithm can be extended by using By using CORDIC algorithms, we can also easily im- Radix-8 IDS (Interleaved Digit Set) algorithm to implement all the function in a single CORDIC hard- prove the performance of the traditional digit re- ware architecture [19]. Some basic standard and currence algorithm. Another variant of the digit- non-standard operators such as sum-of-product and recurrence method is svoboda algorithm. A new product-of-sum [20], which can be used to accelerate Svoboda-Tung Division algorithm is for instance pro- floating-point operations, can also be implemented by posed in [9]. using CORDIC algorithm. The work in [21] presents 3. Taylor's Series Expansion Algorithm: A Tay- for example a flexible FPGA implementation of a pa- lor's Series Expansion Algorithm [10] for example can rameterizable floating-point library allowing to com- be used to calculate division operation using a sequen- pute the sine, cosine or arctangent functions. The tial series of a harmonic equation. However, the Tay- CORDIC algorithm can also even be used to solve lor's Series Expansion algorithm is rarely used and problematic operation in a fuzzy logic controller cir- perform slow computation to calculate the division cuit [22]. Moreover, the CORDIC algorithm can operations. be used to implement a unified frequency analysis 4. Goldschmidt's Algorithm: The basic idea be- or transformations functions such as DFT (Discrete- hind the Goldschmidt's Algorithm is the iterative Fourier Transform), DHT (Discrete Hartley Trans- parallel multiplication of the dividend and divisor by form), DCT (Discrete Cosine Transform) and DST updated factors in such as a way that the final divi- (Discrete Sine Transform) [23]. sor will be driven to one. Thus, the final dividend gives the quotient (the division result). Oberman et There are two main issues, in which CORDIC al- al. for example [11] proposes a floating-point divider gorithm is preferable to design of the floating-point and square root for AMD-K7 by using Goldschmidt's division operator i.e., algorithm. The Goldschmidt's algorithm has been 1. The benefit of CORDIC Algorithm: The broadly used on many commercial microprocessors CORDIC algorithm provides advantages in the per- and is also known as division by multiplicative nor- formability of fundamental function for scientific and Floating-Point Division Operator based on CORDIC Algorithm 81 engineering, the low algorithmic complexity, and the 20 simplicity for VLSI implementation.

Floating-Point Division Operator Based on CORDIC Algorithm 79

3.2 the CORDIC Algorithm

CORDIC-Like Method for Solving Kepler's Equation

CORDIC V6.0 Logicore IP Product Guide

A Review on Hardware Accelerator Design and Implementation of CORDIC Algorithm for a Gaming Application

A.1 CORDIC Algorithm

An Optimization of CORDIC Algorithm and FPGA Implementation

FPGA Technology in Beam Instrumentation and Related Tools

A Unified Reconfigurable CORDIC Processor for Floating-Point Arithmetic

Design and Analysis of Double Precision Floating Point Division Operator Based on CORDIC Algorithm

A Highly Optimized Arithmetic Software Library and Hardware Co

A Trigonometric Hardware Acceleration in 32-Bit RISC-V

Cordic Algorithm and Its Applications In