Computation of Decimal Transcendental Functions Using the CORDIC Algorithm

Álvaro Vázquez, Julio Villalba* and Elisardo Antelo
University of Santiago de Compostela, *University of Málaga, Spain
19th IEEE Symposium on Computer Arithmetic, Portland (USA), June 8-10, 2009

Motivation
• Renewed interest in decimal floating point (IEEE 754-2008; hardware implementations).
• Research has so far focused mainly on the basic floating-point operations (+, -, ×, /).
• We consider transcendental functions (sin, cos, exp, log, …) to fully support the IEEE 754-2008 decimal number system in the DPD (densely packed decimal) formats.
• Decimal multiplication is slow, so CORDIC might be an option for transcendentals.

Commercial Hardware Implementations
Cycles required to execute Decimal128 operations on the IBM POWER6 (13 FO4 cycle, 5 GHz, 65 nm SOI):

| Operation | Cycles |
|---|---|
| FP add/sub | 11 to 19 (depending on the specific case) |
| FP multiplication | 21 + 2N, worst case 89 (N: number of digits excluding leading zeros) |
| FP division | 154 |
| Fixed-point (FXP) add/sub | 2 |

Transcendentals
• Transcendental functions: sin, cos, …
• We aim to implement transcendentals with CORDIC, a serial (digit-by-digit) method.
• Why? Decimal multiplication is slow, so a table plus polynomial approximation might also be slow.
• The situation resembles the one in which CORDIC was used in x86 processors.

CORDIC
• Rotation mode (circular): input vector and input angle; output: the rotated vector.
• Vectoring mode (circular): input vector; output: its modulus and angle.
• Circular functions: sin, cos, tan, norm of a vector, $\tan^{-1}(y/x)$, …
• Extension to "hyperbolic rotation": sinh, cosh, sqrt, $\tanh^{-1}(y/x)$, ln, exp, …

CORDIC
A rotation is decomposed into elementary rotations:
$$\begin{pmatrix}\cos\alpha & \sin\alpha\\ -\sin\alpha & \cos\alpha\end{pmatrix} = \begin{pmatrix}\cos\alpha_1 & \sin\alpha_1\\ -\sin\alpha_1 & \cos\alpha_1\end{pmatrix}\begin{pmatrix}\cos\alpha_2 & \sin\alpha_2\\ -\sin\alpha_2 & \cos\alpha_2\end{pmatrix}\cdots$$
The rotation angle is the sum of the elementary rotation angles, $\alpha = \sum_i \alpha_i$, and each elementary rotation factors as
$$\begin{pmatrix}\cos\alpha_i & \sin\alpha_i\\ -\sin\alpha_i & \cos\alpha_i\end{pmatrix} = \cos(\alpha_i)\begin{pmatrix}1 & \tan\alpha_i\\ -\tan\alpha_i & 1\end{pmatrix}$$

Radix-2 CORDIC: basic step
• Basic step: the elementary rotation $\begin{pmatrix}1 & \tan\alpha_i\\ -\tan\alpha_i & 1\end{pmatrix}$, which rotates the vector by $\alpha_i$ and increases its modulus by $1/\cos(\alpha_i)$.
• With $\alpha_i = \sigma_i \tan^{-1}(2^{-i})$ and $\sigma_i = +1$ or $-1$:
$$\begin{pmatrix}x[i+1]\\ y[i+1]\end{pmatrix} = \begin{pmatrix}1 & \sigma_i 2^{-i}\\ -\sigma_i 2^{-i} & 1\end{pmatrix}\begin{pmatrix}x[i]\\ y[i]\end{pmatrix}$$
• $\cos(\alpha_i)$ is independent of $\sigma_i$, so $K = \prod_i \cos(\alpha_i)$ is a constant scale factor: it is precomputed and compensated after the basic steps.

CORDIC Convergence
• Worst case: the vector is already on the target position (remaining angle = 0), but the next elementary rotation is still performed, so the next residual angle is $\alpha_i$.
• The remaining elementary rotations must be able to cancel it, which gives the convergence condition
$$\alpha_i < \sum_{k=i+1}^{\infty} \alpha_k$$

Our Decimal CORDIC
• We want a constant scale factor, so that it can be precomputed and is simple to compensate.
• A constant scale factor requires two possible angles for each elementary rotation, with the same magnitude and opposite signs.
• Conventional radix-2 CORDIC is not a good fit for decimal operands: in BCD, each digit of $x[i]$ and $y[i]$ (weights $10^{-k}$, $10^{-(k+1)}$, …) is coded with bit weights 8, 4, 2, 1, so multiplying by $\sigma_i 2^{-i}$ is not a simple right shift.

Elementary angles
Key issue: the set of elementary angles. Requirements:
• Two possible angles per iteration, $+\alpha_i$ and $-\alpha_i$: constant scale factor.
• $\tan\alpha_i$ proportional to a power of ten: the off-diagonal terms of the elementary rotation become a simple right shift of the BCD operands.

Angle set for decimal CORDIC
$$\alpha_i = \tan^{-1}\left(\sigma_i\, C[i]\, 10^{-\lceil i/4 \rceil}\right), \qquad \sigma_i = -1 \text{ or } +1$$
Within each group of four iterations (one decimal digit), C[i] follows one of two weight sequences:
• 8, 4, 2, 1: the binary weights of a decimal (BCD) digit, as in the RNC8 paper [3].
• 5, 2, 1, 1: easier to implement, since ×5 and ×2 are easy multiples in decimal.
Convergence: $\alpha_i < \alpha_{i+1} + \alpha_{i+2} + \dots + \alpha_{i+k} + \dots$

Contributions in this work
• Extend the algorithm to support hyperbolic coordinates ($\tanh^{-1}$ angles).
• Extend it to floating point.
• Map the algorithm onto a state-of-the-art decimal floating-point unit (DFPU).
• Provide a version of the algorithm with a redundant (carry-save) adder.
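A minimal floating-point sketch of the radix-2 basic step described above, in rotation mode: each $\sigma_i$ is taken from the sign of the residual angle z, and the constant scale factor K is applied once after the iterations. The function name and the iteration count are our own choices, and the sketch uses the usual counterclockwise sign convention (the transpose of the slides' matrix) with binary floats instead of a fixed-point datapath.

```python
import math

def radix2_cordic_sincos(theta, n=40):
    """Radix-2 CORDIC, rotation mode: approximate (cos(theta), sin(theta)).

    Elementary angles alpha_i = atan(2**-i), i = 0..n-1, so the method
    converges for |theta| up to sum(alpha_i), about 1.74 rad.
    """
    x, y, z = 1.0, 0.0, theta
    K = 1.0  # constant in hardware (precomputed); accumulated here for brevity
    for i in range(n):
        sigma = 1.0 if z >= 0 else -1.0  # rotate toward z = 0
        t = 2.0 ** -i                    # tan(alpha_i): a right shift by i bits
        x, y = x - sigma * t * y, y + sigma * t * x
        z -= sigma * math.atan(t)
        K *= math.cos(math.atan(t))      # cos(alpha_i) does not depend on sigma
    return K * x, K * y                  # compensate the scale factor at the end

print(radix2_cordic_sincos(1.0))
```

Because only the sign of each step changes with the input, K is the same for every angle, which is exactly why it can be folded into a single final multiplication.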
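Why a radix-2 shift does not suit BCD operands can be seen on a single word. In the sketch below (the helpers `to_bcd`, `from_bcd` and `is_valid_bcd` are hypothetical names), shifting a BCD word right by one bit either produces an invalid digit or silently yields the wrong value, while shifting by four bits (one digit) divides by ten exactly.

```python
def to_bcd(n):
    """Pack a non-negative integer into BCD (4 bits per decimal digit)."""
    bcd, shift = 0, 0
    while True:
        bcd |= (n % 10) << shift
        n //= 10
        shift += 4
        if n == 0:
            return bcd

def from_bcd(bcd):
    """Unpack a BCD word; raise ValueError if a nibble is not a decimal digit."""
    n, scale = 0, 1
    while bcd:
        nibble = bcd & 0xF
        if nibble > 9:
            raise ValueError(f"invalid BCD digit {nibble:#x}")
        n += nibble * scale
        scale *= 10
        bcd >>= 4
    return n

def is_valid_bcd(bcd):
    try:
        from_bcd(bcd)
        return True
    except ValueError:
        return False

w = to_bcd(36)                    # 0x36
print(is_valid_bcd(w >> 1))       # one-bit shift gives 0x1B: digit 0xB is invalid
print(from_bcd(to_bcd(10) >> 1))  # 0x10 >> 1 = 0x08 reads as 8, not 10 / 2 = 5
print(from_bcd(w >> 4))           # one-digit shift: 36 // 10 = 3
```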
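The convergence condition for the decimal angle set $\alpha_i = \tan^{-1}(C[i]\,10^{-\lceil i/4\rceil})$ can be spot-checked numerically for both weight sequences, 8, 4, 2, 1 and 5, 2, 1, 1. This is a sketch under stated assumptions, not a proof: indices start at i = 1, the infinite sum is truncated after 15 digits, and only the first four digits are checked. The helper names are hypothetical. It also computes $K = \prod_i \cos(\alpha_i)$, which does not depend on the signs $\sigma_i$.

```python
import math

def decimal_angles(W, n, atan=math.atan):
    """Elementary angles alpha_i = atan(C[i] * 10**-ceil(i/4)) for i = 1..n.

    W holds the per-digit weights for i = 4d-3 .. 4d, so C[i] = W[(i - 1) % 4]
    (in the slides' notation R = (W[3], W[0], W[1], W[2])).
    """
    return [atan(W[(i - 1) % 4] * 10.0 ** -math.ceil(i / 4)) for i in range(1, n + 1)]

def converges(alphas, check):
    """Check alpha_i < sum_{k>i} alpha_k for the first `check` angles.

    The infinite tail is truncated at len(alphas), so `check` must stay
    well below len(alphas).
    """
    suffix, tail = [0.0] * len(alphas), 0.0
    for k in range(len(alphas) - 1, -1, -1):  # add the small terms first
        suffix[k] = tail
        tail += alphas[k]
    return all(alphas[k] < suffix[k] for k in range(check))

def scale_factor(alphas):
    """K = prod cos(alpha_i); the same for every choice of the signs sigma_i."""
    K = 1.0
    for a in alphas:
        K *= math.cos(a)
    return K

for W in [(8, 4, 2, 1), (5, 2, 1, 1)]:
    alphas = decimal_angles(W, 60)  # 15 digits, 4 elementary rotations per digit
    print(W, converges(alphas, 16), scale_factor(alphas))
```

For 5, 2, 1, 1 the first angle of each digit is only barely covered (to first order, 5 = 2 + 1 + 1 + 1), and it is the curvature of $\tan^{-1}$ that keeps the condition strict.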
Angle set for circular and hyperbolic coordinates
$$\alpha_i = \tan^{-1}\left(\sigma_i\, C[i]\, 10^{-\lceil i/4 \rceil}\right) \text{ (circular)}, \qquad \alpha_i = \tanh^{-1}\left(\sigma_i\, C[i]\, 10^{-\lceil i/4 \rceil}\right) \text{ (hyperbolic)}$$
• $C[i] = R[i \bmod 4]$ with $R = (1, 5, 2, 1)$ for both coordinate systems does not assure convergence.
• $R = (1, 5, 2, 2)$ provides enough redundancy to assure convergence in hyperbolic coordinates.

Unified Algorithm
• Roughly one iteration per bit of the input operands (4 iterations per decimal digit).
• $c = 1$ (circular), $c = -1$ (hyperbolic); $C[i]$ takes the values 5, 2 or 1.

Floating-Point Extension
• Range reduction with standard methods (see paper) for sin/cos, $\tan^{-1}$, sinh/cosh, $e^f$, $10^f$, $\tanh^{-1}$, ln(f), log(f) and sqrt.
• After range reduction, one of the inputs is $y_{in} = M_{yin}\,10^{-E_{yin}}$ or $z_{in} = M_{zin}\,10^{-E_{zin}}$, with $|M_{zin}|, |M_{yin}| \in [1, 10)$ and $E_{yin}, E_{zin} \ge 0$.
• We therefore need a floating-point version of the iterations: the y or z iterations (or both) are scaled to skip the $E_{yin}$ or $E_{zin}$ leading zeros (or nines).

Floating-Point Extension
Floating-point iterations:
• Scale the y and/or z iterations by $10^{E_{yin}}$ or $10^{E_{zin}}$.
• Start the iterations at index $J = 4E_{zin} - 3$ or $J = 4E_{yin} - 3$.

| Function | P | Q |
|---|---|---|
| sin/cos; sinh/cosh (N = 0) | $E_{zin}$ | $E_{zin}$ |
| sinh/cosh (N ≠ 0) | 0 | $E_{zin}$ |
| $\tan^{-1}$, $\tanh^{-1}$ | $E_{yin}$ | $E_{yin}$ |
| ln, log, sqrt | 0 | 0 |

• For p decimal digits with 1 ulp accuracy: a datapath of 1 integer digit and p + 3 fractional digits, with m = p + 1, that is, 4(p + 1) elementary rotations (140 for Decimal128, where p = 34).

Mapping to a DFPU (Decimal128 format)

Pipelining the Algorithm
• Interleave the x and y iterations for effective pipelining.

Pipelining the Algorithm: number of cycles for Decimal128
• Cycle counts exclude the pre- and post-processing due to range reduction.
• Optimization performed: after about half of the rotations, perform a single final rotation, since cos and sin can then be linearly approximated. This requires a multiplication (rotation mode) or a division (vectoring mode), both of about m/2 digits.

Algorithm for Carry-Save Representation
• Perform the iterations with a redundant (carry-save) adder instead of a carry-propagate adder (CPA); the same adder may be used for division.
• Why use a redundant adder? Not for speed (the algorithm is actually slower). Our aim is to use a simple adder for the iterations instead of the complex carry-propagate adder.

Algorithm for Carry-Save Representation
• The direction of rotation is obtained from the sign of an estimate of y or z.
• With a redundant representation, the sign is estimated from a few leading digits. When the real value is close to 0, the estimate may have the wrong sign and the rotation goes in the "wrong" direction; 2D[t] denotes the estimation error.
• Convergence condition (without repeating rotations):
$$\alpha_i + D[t] < \alpha_{i+1} + \alpha_{i+2} + \dots + \alpha_{i+k} + \dots$$

Algorithm for Carry-Save Representation
Example for rotation mode:
• Scale the recurrence for z so that the sign information is in the most significant digits.
• With the scaled recurrence for z, estimating one fractional digit is enough for convergence (see paper):
$$D[t] < -\alpha_i + \alpha_{i+1} + \alpha_{i+2} + \dots + \alpha_{i+k} + \dots$$
The right-hand side (the angular slack) is positive thanks to the redundant angle representation.

Pipelining the Redundant Algorithm
• Because of the redundant representation, two shift-scale-add operations (3:2 CSA) are scheduled for each x and y iteration.

Related Works
• Early versions of decimal CORDIC (1960s-70s), used in pocket calculators:
  – Elementary angle set: $\tan^{-1}(10^{-j})$.
  – Convergence is not achieved: $\tan^{-1}(10^{-j}) > \sum_{k=j+1}^{\infty} \tan^{-1}(10^{-k})$.
  – Solution: repeat each elementary rotation 9 times, so that $\tan^{-1}(10^{-j}) < \sum_{k=j+1}^{\infty} 9\tan^{-1}(10^{-k})$.
  – Comparison: 9 rotations per digit vs. 4 rotations per digit, i.e., 2.25 times more elementary rotations.

Related Works
• Radix-10 BKM (ref. [4]):
  – log and exp in the complex domain (similar to a combined CORDIC + log + exp).
  – More complex iterations: 1 iteration per digit, with an initial overhead; the digits $d_i^x$, $d_i^y$ take values in {-6, …, 0, …, 6}.
  – Main drawbacks: complex scheduling in the DFPU and high storage requirements (380 Kbits vs. 14 Kbits for fixed point).

Comparison with Table-based Polynomial Approximation

| Scheme | # digits (look-up table) | Polynomial degree | Latency (# cycles) | Storage (K BCD digits) |
|---|---|---|---|---|
| LUT + poly | 2 | 10 | 560 | 120 (one function) |
| LUT + poly | 1 | 14 | 880 | 4.5 (one function) |
| Our CORDIC | - | - | 300-400 | 9.4 (several functions) |

Conclusions
• Floating-point decimal CORDIC to compute transcendentals on long formats (Decimal128).
• Constant scale factor, and number of iterations = number of bits of the input operands (as in standard binary CORDIC).
• Novel decimal angle set supporting a unified algorithm for circular and hyperbolic coordinates.
• Floating-point extension and mapping onto a state-of-the-art DFPU.
• Redundant version of the algorithm using a simple 3:2 decimal carry-save adder.
• Compares favourably with other decimal CORDIC proposals.
• Reduces latency and/or storage compared with table-driven polynomial implementations.
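The claim about the hyperbolic angle set can be spot-checked numerically: with $R = (1, 5, 2, 1)$ the condition $\alpha_i < \sum_{k>i}\alpha_k$ already fails at i = 1 for $\tanh^{-1}$ angles, while $R = (1, 5, 2, 2)$ satisfies it over the first four digits. As with any truncated check, this is evidence rather than a proof; indices start at i = 1 (since $\tanh^{-1}(1)$ is infinite), the tail is cut after 15 digits, and the helper names are hypothetical.

```python
import math

def angles(W, n, f):
    """alpha_i = f(C[i] * 10**-ceil(i/4)) for i = 1..n, with C[i] = W[(i - 1) % 4]."""
    return [f(W[(i - 1) % 4] * 10.0 ** -math.ceil(i / 4)) for i in range(1, n + 1)]

def first_violation(alphas, check):
    """1-based index of the first i <= check with alpha_i >= sum_{k>i} alpha_k,
    or None. The infinite tail is truncated at len(alphas)."""
    suffix, tail = [0.0] * len(alphas), 0.0
    for k in range(len(alphas) - 1, -1, -1):  # add the small terms first
        suffix[k] = tail
        tail += alphas[k]
    for k in range(check):
        if alphas[k] >= suffix[k]:
            return k + 1
    return None

print(first_violation(angles((5, 2, 1, 1), 60, math.atanh), 16))  # R = 1,5,2,1 -> 1
print(first_violation(angles((5, 2, 2, 1), 60, math.atanh), 16))  # R = 1,5,2,2 -> None
```

The failure comes from the convexity of $\tanh^{-1}$: $\tanh^{-1}(0.5) \approx 0.549$ exceeds the sum of all later angles (about 0.503), and the extra weight 2 in each digit restores the margin.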
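A floating-point sketch of the unified algorithm in rotation mode, with c = 1 (circular, digit weights 5, 2, 1, 1) and c = -1 (hyperbolic, digit weights 5, 2, 2, 1). The slides do not spell out the iteration equations, so the form used here ($x \leftarrow x - c\,\sigma_i t_i y$, $y \leftarrow y + \sigma_i t_i x$, $z \leftarrow z - \sigma_i \alpha_i$, with $t_i = C[i]\,10^{-\lceil i/4\rceil}$) is the standard unified CORDIC recurrence and an assumption, as is starting the circular iterations at i = 0 ($\tan\alpha_0 = 1$). Binary floats stand in for the decimal datapath, and the function name is hypothetical.

```python
import math

# Per-digit weights for i = 4d-3 .. 4d, so C[i] = W[(i - 1) % 4]:
# circular 5,2,1,1 (R = 1,5,2,1) and hyperbolic 5,2,2,1 (R = 1,5,2,2).
W_CIRC, W_HYP = (5, 2, 1, 1), (5, 2, 2, 1)

def unified_rotation(theta, c, digits=12):
    """Rotation mode of a unified decimal-angle CORDIC (float sketch).

    c = 1 returns (cos theta, sin theta); c = -1 returns (cosh theta, sinh theta).
    """
    W = W_CIRC if c == 1 else W_HYP
    atan = math.atan if c == 1 else math.atanh
    first = 0 if c == 1 else 1  # atanh(1) is infinite, so skip i = 0
    x, y, z, K = 1.0, 0.0, theta, 1.0
    for i in range(first, 4 * digits + 1):
        t = W[(i - 1) % 4] * 10.0 ** -math.ceil(i / 4)  # tan/tanh of alpha_i
        sigma = 1.0 if z >= 0 else -1.0
        x, y = x - c * sigma * t * y, y + sigma * t * x
        z -= sigma * atan(t)
        K *= math.sqrt(1.0 + c * t * t)  # modulus change, the same for both sigma
    return x / K, y / K

print(unified_rotation(1.2, 1))   # approximately (cos 1.2, sin 1.2)
print(unified_rotation(0.8, -1))  # approximately (cosh 0.8, sinh 0.8)
```

The only differences between the two coordinate systems are the sign c, the angle function, and one weight per digit; the loop structure and the constant scale factor are shared.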
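Why a single final rotation suffices after about half of the rotations: once the residual angle satisfies $|z| < 10^{-m/2}$, replacing $\cos z$ by 1 and $\sin z$ by z costs about $z^2/2 < 10^{-m}$. The check below uses Python's `decimal` module with a Taylor-series reference; m = 34 (the Decimal128 precision) and the values of x, y and z are illustrative choices, not figures from the paper. Only the rotation-mode case (a multiplication) is shown, and the function names are hypothetical.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # far beyond the 34-digit target, so rounding is negligible

def sin_cos_series(z, terms=20):
    """Reference sin(z), cos(z) by Taylor series (accurate for tiny |z|)."""
    s = c = Decimal(0)
    term = Decimal(1)  # z**n / n!
    for n in range(terms):
        if n % 4 == 0:
            c += term
        elif n % 4 == 1:
            s += term
        elif n % 4 == 2:
            c -= term
        else:
            s -= term
        term = term * z / (n + 1)
    return s, c

def exact_rotation(x, y, z):
    s, c = sin_cos_series(z)
    return x * c - y * s, x * s + y * c

def linear_rotation(x, y, z):
    """Final rotation with cos z ~ 1 and sin z ~ z: one multiplication per coordinate."""
    return x - y * z, y + x * z

m = 34                                          # target precision in digits
z = Decimal(7) * Decimal(10) ** -(m // 2 + 1)   # residual angle below 10**-(m/2)
x, y = Decimal("0.6"), Decimal("0.8")
xe, ye = exact_rotation(x, y, z)
xl, yl = linear_rotation(x, y, z)
err = max(abs(xe - xl), abs(ye - yl))
print(err < Decimal(10) ** -m)
```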
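Estimating the sign of a carry-save value from its leading digits, in general terms: if both the sum and the carry words are truncated (toward minus infinity) to t fractional digits and added, the estimate never exceeds the true value and falls short of it by less than $2 \cdot 10^{-t}$. This bound is a generic property of truncation; the paper's actual estimator and its D[t] are not reproduced here, and the helper names are hypothetical.

```python
import random
from decimal import Decimal, ROUND_FLOOR

def truncate(v, t):
    """Keep t fractional digits, rounding toward -infinity."""
    return v.quantize(Decimal(1).scaleb(-t), rounding=ROUND_FLOOR)

def estimate(s, c, t):
    """Estimate of the carry-save value s + c from t fractional digits of each word."""
    return truncate(s, t) + truncate(c, t)

random.seed(0)
t = 1  # one fractional digit, echoing the slide's rotation-mode example
worst = Decimal(0)
for _ in range(10000):
    # random sum and carry words with 8 fractional digits
    s = Decimal(random.randint(-10**9, 10**9)).scaleb(-8)
    c = Decimal(random.randint(-10**9, 10**9)).scaleb(-8)
    err = (s + c) - estimate(s, c, t)
    assert 0 <= err < 2 * Decimal(10) ** -t
    worst = max(worst, err)
print(worst)
```

Since the estimate is a multiple of $10^{-t}$, a negative estimate implies a true value below $10^{-t}$: the "wrong" direction can only be chosen when the real value is already close to zero.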
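The related-work argument for the early angle set $\tan^{-1}(10^{-j})$ can be checked numerically: with one rotation per angle the convergence condition fails, while nine repetitions satisfy it (9 rotations per digit against 4, a factor of 9/4 = 2.25). The sums are truncated at $10^{-19}$, and the names are our own.

```python
import math

J = 20  # truncation of the infinite sums; later terms are negligible here
alpha = [math.atan(10.0 ** -j) for j in range(J)]

def tail(j, reps):
    """reps * sum_{k=j+1}^{J-1} atan(10**-k), adding the smallest terms first."""
    return sum(reps * alpha[k] for k in range(J - 1, j, -1))

for j in range(4):
    # single rotation per angle fails; nine repetitions satisfy the condition
    print(j, alpha[j] < tail(j, 1), alpha[j] < tail(j, 9))
```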
