Computation of Decimal Transcendental Functions Using the CORDIC Algorithm

Álvaro Vázquez, Julio Villalba* and Elisardo Antelo
University of Santiago de Compostela
*University of Málaga
SPAIN

19th IEEE Symposium on Computer Arithmetic, Portland (USA), June 8-10, 2009

Motivation

• Renewed interest in decimal floating point (IEEE 754-2008; hardware implementations).

• Research now mainly focused on basic floating point operations (+/-, x, /).

• We consider transcendentals (sin, cos, exp, log…) to fully support the IEEE 754-2008 Decimal Number System for DPD formats.

• Decimal multiplication is slow -> CORDIC might be an option for transcendentals.

Commercial Hardware Implementations

Cycles required for execution for Dec128 operands (Power6, 13 FO4 cycle, 5 GHz @ 65 nm SOI)

FP add/sub 11 to 19 (depending on specific case)

FP Multiplication 21+2N (worst case: 89) (N: number of digits excluding leading zeros)

FP 154

FXP add/sub 2

Transcendentals

• Transcendental functions: sin, cos, …

• We aim to implement transcendentals based on CORDIC (serial method).

• Why? Decimal multiplication is slow -> table + polynomial approximation might be slow.

• This resembles the context in which CORDIC was used in processors.

CORDIC

Rotation mode (circular): inputs are an angle and a vector; output: rotated vector.

Vectoring mode (circular): input is a vector; output: modulus and angle.

Circular functions: sin, cos, tan, norm of a vector, $\tan^{-1}(y/x)$, …

Extension to “hyperbolic”: sinh, cosh, sqrt, $\tanh^{-1}(y/x)$, ln, exp, …

CORDIC

Rotation decomposed into elementary rotations

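$$\begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} = \begin{pmatrix} \cos\alpha_1 & \sin\alpha_1 \\ -\sin\alpha_1 & \cos\alpha_1 \end{pmatrix} \begin{pmatrix} \cos\alpha_2 & \sin\alpha_2 \\ -\sin\alpha_2 & \cos\alpha_2 \end{pmatrix} \cdots$$

Rotation angle decomposed as sum of elementary rotation angles: $\alpha = \sum_i \alpha_i$

Elementary rotation:

$$\cos(\alpha_i) \begin{pmatrix} 1 & \tan(\alpha_i) \\ -\tan(\alpha_i) & 1 \end{pmatrix}$$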

Radix-2 CORDIC: basic step

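Elementary rotation:

$$\cos(\alpha_i) \begin{pmatrix} 1 & \tan(\alpha_i) \\ -\tan(\alpha_i) & 1 \end{pmatrix}$$

Basic step: the matrix part, an elementary rotation with increase in the modulus, taking $(x[i], y[i])$ to $(x[i+1], y[i+1])$.

With $\alpha_i = +\tan^{-1}(2^{-i})$ or $-\tan^{-1}(2^{-i})$, the basic step is

$$\begin{pmatrix} 1 & \sigma_i 2^{-i} \\ -\sigma_i 2^{-i} & 1 \end{pmatrix}, \qquad \sigma_i = +1 \text{ or } -1$$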
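A minimal Python sketch of the radix-2 basic step in rotation mode, using the slide's matrix convention (so a positive angle turns (1, 0) into (cos, -sin)). The function name, the starting index i = 0, the iteration count n and the use of floating point instead of shifts are assumptions made for illustration only.

```python
import math

def cordic_rotate_radix2(x, y, angle, n=40):
    """Rotate (x, y) by `angle` using the matrix [[cos, sin], [-sin, cos]]."""
    # Constant scale factor K = prod cos(alpha_i); it does not depend on sigma_i.
    k = 1.0
    for i in range(n):
        k *= math.cos(math.atan(2.0 ** -i))
    z = angle  # residual angle still to be rotated
    for i in range(n):
        sigma = 1.0 if z >= 0 else -1.0
        t = sigma * 2.0 ** -i        # tan(sigma_i * alpha_i); a right shift in binary hardware
        x, y = x + t * y, y - t * x  # basic step: the modulus grows by 1 / cos(alpha_i)
        z -= sigma * math.atan(2.0 ** -i)
    return k * x, k * y              # compensate the accumulated modulus growth

xr, yr = cordic_rotate_radix2(1.0, 0.0, 0.5)
print(xr, yr)  # compare with math.cos(0.5) and -math.sin(0.5)
```

The input angle has to stay within the sum of all elementary angles (about 1.74 rad for this angle set) for the residual to be driven towards zero.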

$\cos(\alpha_i)$ is independent of $\sigma_i$, so $K = \prod_i \cos(\alpha_i)$ is a constant scale factor -> precomputed and compensated after the basic steps.

CORDIC Convergence

Worst case: vector on target position (remainder angle = 0). The next elementary rotation moves $(x[i], y[i])$ to $(x[i+1], y[i+1])$, so the next residual angle value is $\alpha_i$.

Convergence condition: $\alpha_i < \sum_{k=i+1}^{\infty} \alpha_k$

Our Decimal CORDIC

• We want constant scale factor -> precomputed and simple to compensate.

• Constant scale factor means two possible angles (same magnitude, different sign) for each elementary rotation.

• Conventional radix-2 CORDIC not good for decimal operands

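BCD representation: each decimal digit of x[i] and y[i] (weights $10^{-k}$, $10^{-(k+1)}$, …) is stored as four bits with weights 8, 4, 2, 1. For such operands the multiplication by $2^{-i}$ in the basic step

$$\begin{pmatrix} 1 & \sigma_i 2^{-i} \\ -\sigma_i 2^{-i} & 1 \end{pmatrix}$$

is not a simple right shift.

Elementary angles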

Key issue: elementary angles set

Requirements:

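• Two possible angles per iteration: $+\alpha_i$ and $-\alpha_i$.

• $\tan(\sigma_i \alpha_i)$ proportional to a power of ten.

$$\cos(\alpha_i) \begin{pmatrix} 1 & \tan(\sigma_i \alpha_i) \\ -\tan(\sigma_i \alpha_i) & 1 \end{pmatrix}$$

The matrix entries give a simple right shift for BCD operands; the factor $\cos(\alpha_i)$ gives a constant scale factor.

Angle set for Dec. CORDIC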

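Binary weights of a decimal representation (RNC8 paper [3]).

$$\alpha_i = \tan^{-1}\left(\sigma_i \, C[i] \, 10^{-\lceil i/4 \rceil}\right), \qquad \sigma_i = -1 \text{ or } +1$$

C[i] within each digit: 8, 4, 2, 1 or 5, 2, 1, 1. The set 5, 2, 1, 1 is easier to implement (easy multiples for decimal).

Convergence: $\alpha_i < \alpha_{i+1} + \alpha_{i+2} + \dots + \alpha_{i+k} + \dots$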
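To illustrate why a tangent of the form C[i]·10^-ceil(i/4) suits decimal operands, the sketch below forms the term C[i]·10^-ceil(i/4)·v with Python's `decimal` module: one easy multiple (×5, ×2 or ×1) followed by a digit shift. The lookup tuple `R` and the function name `scaled_term` are hypothetical, chosen only so that each digit uses 5, 2, 1, 1 in that order.

```python
from decimal import Decimal
from math import ceil

# Hypothetical lookup: C[i] = R[i % 4] gives 5, 2, 1, 1 for i = 1, 2, 3, 4.
R = (1, 5, 2, 1)

def scaled_term(v, i):
    """Return C[i] * 10^-ceil(i/4) * v as an easy multiple plus a digit shift."""
    multiple = v * R[i % 4]              # x5, x2 or x1
    return multiple.scaleb(-ceil(i / 4))  # shift right by ceil(i/4) decimal digits

print(scaled_term(Decimal("3.14159"), 1))  # 5 * v shifted by one digit
print(scaled_term(Decimal("3.14159"), 6))  # 2 * v shifted by two digits
```

In hardware the same term would be produced by BCD multiple generation and wiring rather than by a general multiplier; the snippet only mirrors the arithmetic.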

Contributions in this work

• Extend the algorithm to support hyperbolic coordinates ($\tanh^{-1}$ angles).

• Extend to floating point

• Mapping the algorithm to a state of the art Decimal Floating Point Unit.

• Algorithm with redundant (carry-save) representation.

Angle set for circular and hyperbolic

Circular: $\alpha_i = \tan^{-1}\left(\sigma_i \, C[i] \, 10^{-\lceil i/4 \rceil}\right)$

Hyperbolic: $\alpha_i = \tanh^{-1}\left(\sigma_i \, C[i] \, 10^{-\lceil i/4 \rceil}\right)$

C[i] = R[i mod 4] = 1, 5, 2, 1: it does not assure convergence for both.

C[i] = R[i mod 4] = 1, 5, 2, 2: enough redundancy to assure convergence for hyperbolic coord.
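The two choices of R can be checked numerically against the convergence condition $\alpha_i < \alpha_{i+1} + \alpha_{i+2} + \dots$. The sketch below is an illustration under assumptions: the function names are hypothetical, the infinite sum is truncated after `tail_digits` digits (which only makes the test stricter), and only the first `check_digits` digits are tested so that the truncation stays negligible.

```python
import math

def angle_set(r, hyperbolic, digits):
    """Elementary angle magnitudes for i = 1 .. 4*digits with C[i] = r[i mod 4]."""
    inv = math.atanh if hyperbolic else math.atan
    return [inv(r[i % 4] * 10.0 ** -math.ceil(i / 4))
            for i in range(1, 4 * digits + 1)]

def converges(r, hyperbolic, check_digits=3, tail_digits=12):
    """Check alpha_i < sum of all later angles for the first check_digits digits."""
    angles = angle_set(r, hyperbolic, tail_digits)
    return all(angles[i] < sum(angles[i + 1:]) for i in range(4 * check_digits))

for r in ((1, 5, 2, 1), (1, 5, 2, 2)):
    print(r, "circular:", converges(r, False), "hyperbolic:", converges(r, True))
```

For R = 1, 5, 2, 1 the hyperbolic case already fails at i = 1: $\tanh^{-1}(0.5) \approx 0.549$, while the remaining angles add up to only about 0.503.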

Unified Algorithm

Roughly one iteration per bit of the input operands. (4 iter./digit)
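A possible reading of the unified iteration in rotation mode is sketched below in Python. The recurrences are the usual unified CORDIC form with $2^{-i}$ replaced by $C[i] \cdot 10^{-\lceil i/4 \rceil}$; they are an assumption, not copied from the slides, and they use the counterclockwise sign convention so that y ends as +sin or +sinh. Here c = 1 selects circular and c = -1 hyperbolic coordinates. The name `decimal_cordic_rotate`, the set R = 1, 5, 2, 2 for both coordinate systems, the 8-digit default and the use of binary floats are likewise illustrative choices.

```python
import math

R = (1, 5, 2, 2)  # C[i] = R[i mod 4]; assumed here for both coordinate systems

def decimal_cordic_rotate(z, c=1, digits=8):
    """Return (cos z, sin z) for c = 1 or (cosh z, sinh z) for c = -1."""
    inv = math.atan if c == 1 else math.atanh
    steps = [R[i % 4] * 10.0 ** -math.ceil(i / 4)
             for i in range(1, 4 * digits + 1)]
    # Constant scale factor: every step scales the modulus by sqrt(1 + c * t^2),
    # whatever the sign sigma_i, so it can be precomputed.
    gain = math.prod(math.sqrt(1.0 + c * t * t) for t in steps)
    x, y = 1.0 / gain, 0.0  # pre-compensate the scale factor
    for t in steps:
        sigma = 1.0 if z >= 0 else -1.0
        x, y = x - c * sigma * t * y, y + sigma * t * x
        z -= sigma * inv(t)  # drive the residual angle towards zero
    return x, y

print(decimal_cordic_rotate(0.7, c=1))   # compare with cos(0.7), sin(0.7)
print(decimal_cordic_rotate(0.5, c=-1))  # compare with cosh(0.5), sinh(0.5)
```

The input must lie within the sum of the elementary angles (roughly 1.07 in circular and 1.17 in hyperbolic coordinates for this set); larger arguments need range reduction first.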

c=1 (circular); c=-1 (hyperbolic); C[i] takes values 5, 2 or 1.

Floating-Point Extension

Range reduction: standard methods (see paper)

sin/cos, $\tan^{-1}$, sinh/cosh, $e^f$, $10^f$, $\tanh^{-1}$, ln(f), log(f), sqrt

Result after range reduction:

One of the inputs:

$y_{in} = M_{yin} \cdot 10^{-E_{yin}}$ or $z_{in} = M_{zin} \cdot 10^{-E_{zin}}$, with $|M_{zin}|, |M_{yin}| \in [1, 10)$ and $E_{yin}, E_{zin} \ge 0$

We need a floating point version of the iterations:

Scale y or z iterations (or both) to remove the Eyin or Ezin leading zeros (or nines).

Floating-Point Extension

Floating-point iterations:

Scale y and/or z iterations by $10^{E_{yin}}$ or $10^{E_{zin}}$. Start iterations with index $J = 4E_{zin} - 3$ or $J = 4E_{yin} - 3$.

Funct                       P      Q
sin/cos; sinh/cosh (N=0)    Ezin   Ezin
sinh/cosh (N!=0)            0      Ezin
tan^-1, tanh^-1, ln, log    Eyin   Eyin
sqrt                        0      0

For p decimal digits and 1 ulp accuracy: 1 integer and p+3 fractional digits, and m = p+1 (4(p+1) elementary rotations).

Mapping to a DFPU (for Decimal128 format)

Pipelining the Algorithm

Interleave x and y iterations for effective pipelining.

Pipelining the Algorithm: Number of cycles for Decimal128

- Number of cycles excludes the pre- and post-processing due to range reduction.

- Optimization performed: after about half of the rotations, perform a single final rotation, since cos and sin can be linearly approximated.

This requires a multiplication for rotation and a division for vectoring, both of about m/2 digits.

Algorithm for Carry-Save Representation

[Figure: carry-save adder and CPA]

Perform iterations with redundant adder (may be used for division).

Why use a redundant adder?

Not for speed (actually the algorithm will be slower)

Our aim is to use a simple adder for the iteration instead of the complex carry-propagate adder.

Algorithm for Carry-Save Representation

Direction of rotation obtained as the sign of estimation of y or z.

Redundant representation: estimation of sign from a few leading digits.

[Figure: real value and estimated value of y or z relative to 0 and $\alpha_i$; the estimation error, of width 2D[t], can give the “wrong” direction of rotation.]

Convergence condition (without rotation repetition): $\alpha_i + D[t] < \alpha_{i+1} + \alpha_{i+2} + \dots + \alpha_{i+k} + \dots$

Algorithm for Carry-Save Representation

Example for rotation mode:

Scale recurrence to have sign information at the most sig. digits

Scaled recurrence for z:

Estimation of one fractional digit enough for convergence (see paper): $D[t] < -\alpha_i + \alpha_{i+1} + \alpha_{i+2} + \dots + \alpha_{i+k} + \dots$

Angular slack > 0 due to redundant angular representation.

Pipelining the Redundant Algorithm

Schedule two shift-scale-add (3-2 CSA) operations for each x and y iteration due to redundant representation.

Related Works

• Early versions of decimal CORDIC (60’s-70’s)

– Pocket calculators.

– Elementary angle set: $\tan^{-1}(10^{-j})$

– Convergence not achieved: $\tan^{-1}(10^{-j}) > \sum_{k=j+1} \tan^{-1}(10^{-k})$

– Solution: repeat each elementary rotation 9 times: $\tan^{-1}(10^{-j}) < \sum_{k=j+1} 9 \times \tan^{-1}(10^{-k})$

– Comparison: 9 rots./digit vs 4 rots./digit, i.e. 2.25 times more elementary rotations.
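The two inequalities for the early angle set can be checked numerically. The sketch below is a quick illustration with a hypothetical helper `tail`; the infinite sums are truncated at k = 16 and only j = 1, 2, 3 are tested so that the truncation and rounding stay far below the margins involved.

```python
import math

def tail(j, repeat, last=16):
    """repeat * sum of atan(10^-k) for k = j+1 .. last (truncated tail)."""
    return repeat * sum(math.atan(10.0 ** -k) for k in range(j + 1, last + 1))

for j in range(1, 4):
    a = math.atan(10.0 ** -j)
    # First flag: a single pass over the later angles cannot cover alpha_j.
    # Second flag: nine repetitions of each later angle can.
    print(j, a > tail(j, 1), a < tail(j, 9))

print(9 / 4)  # ratio of elementary rotations per digit: 2.25
```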

Related Works

• Radix-10 BKM (ref[4])

– log and exp in the complex domain (similar to combined + log+ exp).

– More complex iterations: 1 iteration/digit; initial overhead

Main drawbacks: complex schedule in DFPU and high storage requirements (380 Kbits vs 14 Kbits for fixed point)

$d_i^x$, $d_i^y$ take values in {-6,…,0,…,6}

Comparison with Table-based Polynomial Approx.

Scheme                          # digits        Polynom.   Latency      Storage
                                look-up table   degree     (# cycles)   (K BCD digits)
LuT+Poly (one funct.)           2               10         560          120
LuT+Poly (one funct.)           1               14         880          4.5
Our CORDIC (several functions)  -               -          300-400      9.4

Conclusions

• Floating point decimal CORDIC to compute transcendentals on long formats (Dec128).

• Constant scale factor and number of iterations = number of bits of the input operands (as in standard binary CORDIC).

• Novel decimal angle set to support a unified algorithm for circular and hyperbolic coordinates.

• Floating-point extension and mapping to a state of the art DFPU.

• Redundant version of the algorithm to use simple 3-2 decimal carry-save adder.

• Compares favourably with other decimal CORDIC proposals.

• Allows latency and/or storage to be reduced compared to table-driven polynomial implementations.