Computer Arithmetic: Elementary functions
Hossam A. H. Fahmy
Electronics and Communications Department, Cairo University

What are the elementary functions?

• Originally, calculators and computers only had: sin(x), log(x), n-th roots, tanh(x), ...
• For mathematicians, they are elementary. For hardware people, they are higher level functions.
• Now, some DSPs may include other operations such as the gamma function.
• Simply, they are any functions that can be easily tabulated or represented in a series, a polynomial, or a ratio of polynomials.

© Hossam A. H. Fahmy

Implementation steps

• The approximation of an elementary function is more efficient (time delay and area) if the argument is constrained to a small interval.

Hence, there are three main steps in any elementary function calculation:
1. range reduction,
2. approximation, and
3. reconstruction.

Reduction and reconstruction

• The range reduction and reconstruction steps are related, and they are function-dependent.
• There is no single reduction and reconstruction technique that is applicable to all functions.
• Modular reduction is applicable to the exponential and sinusoidal functions; the case of the logarithm is even simpler.

  e^x = e^(N ln(2) + y) = 2^N × e^y

  sin(x) = sin(N × π/2 + y):
      sin(x) =  sin(y)   if N mod 4 = 0
      sin(x) =  cos(y)   if N mod 4 = 1
      sin(x) = −sin(y)   if N mod 4 = 2
      sin(x) = −cos(y)   if N mod 4 = 3

  log(x) = log(2^exp × 1.f) = exp × log(2) + log(1.f)
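The three steps for e^x can be sketched in software (a toy model: N = round(x / ln 2) gives |y| ≤ ln(2)/2, and a short Taylor polynomial stands in for the hardware approximation step; the function name is ours):

```python
import math

def exp_reduced(x):
    """Sketch of e^x = 2^N * e^y with modular range reduction."""
    N = round(x / math.log(2))            # range reduction: pick N
    y = x - N * math.log(2)               # reduced argument, |y| <= ln(2)/2
    # approximation on the small interval (short Taylor polynomial here)
    e_y = sum(y ** k / math.factorial(k) for k in range(12))
    return math.ldexp(e_y, N)             # reconstruction: scale by 2^N
```

Because |y| is small, a low-degree polynomial already reaches double precision; over the full range the same polynomial would be useless.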

The approximation step

The approximation algorithms are classified as:
1. digit recurrence techniques,
2. functional recurrence techniques,
3. polynomial approximation techniques, and
4. rational approximation techniques.

The digit and functional recurrence

Digit recurrence techniques:
• converge linearly.
• use addition, subtraction, shift, and single digit multiplication.
• restoring/non-restoring division, SRT, Cordic, Briggs and DeLugish, ...

Functional recurrence techniques:
• converge quadratically (or better for higher orders).
• use addition, subtraction, multiplication, and table lookup.
• Newton-Raphson of any order.
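As a sketch of a functional recurrence, the Newton-Raphson iteration for the reciprocal doubles the number of correct bits per step (the seed constants below are the classic linear approximation for b in [0.5, 1); in hardware the seed would come from a table lookup):

```python
def reciprocal_nr(b, iterations=5):
    """Newton-Raphson for 1/b: x <- x * (2 - b*x), quadratic convergence."""
    x = 48.0 / 17.0 - 32.0 / 17.0 * b    # seed for b in [0.5, 1)
    for _ in range(iterations):
        x = x * (2.0 - b * x)            # only multiply and subtract
    return x
```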

The polynomial approximation

Polynomial approximation techniques:
• depending on the function and the implementation details, may converge directly to the required precision.
• use addition, subtraction, multiplication, and table lookup.
• divide the interval of the argument into a number of sub-intervals where the elementary function is approximated by a polynomial of a suitable degree. One or more tables contain the coefficients of the polynomials.

The rational approximation

Rational approximation techniques:
• depending on the function and the implementation details, may converge directly to the required precision.
• use addition, subtraction, multiplication, tables, and division.
• for each sub-interval, approximate the given function by a rational function (a polynomial divided by another polynomial).
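A minimal sketch of the sub-interval scheme (our own toy construction: one quadratic per segment obtained by interpolation at the endpoints and midpoint, where a real design would store minimax coefficients in the tables):

```python
import math

def build_table(f, a, b, n_seg):
    """Per-sub-interval quadratic coefficients (toy stand-in for the tables)."""
    w = (b - a) / n_seg
    table = []
    for s in range(n_seg):
        x0 = a + s * w
        xm, x1 = x0 + w / 2, x0 + w
        y0, ym, y1 = f(x0), f(xm), f(x1)
        d1 = (ym - y0) / (w / 2)                 # first divided difference
        d2 = ((y1 - ym) / (w / 2) - d1) / w      # second divided difference
        table.append((x0, xm, y0, d1, d2))
    return table, w

def eval_table(table, w, a, x):
    s = min(int((x - a) / w), len(table) - 1)    # segment index
    x0, xm, y0, d1, d2 = table[s]
    return y0 + (x - x0) * (d1 + (x - xm) * d2)  # Newton form, Horner-style

# 8 segments on [0, pi/4] already give roughly 1e-5 accuracy for sin
table, w = build_table(math.sin, 0.0, math.pi / 4, 8)
```

In hardware, the segment index comes for free from the leading bits of the argument.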

Digit recurrence: Cordic

• J. Volder in 1959 developed a digit by digit algorithm to compute all the trigonometric functions with minimal hardware support.
• The generalized algorithm also calculates the hyperbolic and the arc functions.
• Cordic has been widely used in calculators and in some processors.

Basic idea of Cordic

[Figure: an initial vector (x0, y0) rotated through a total angle θ to reach (xn, yn).]

To reach an angle of θ, we rotate the initial vector in each iteration i by a small angle α_i = ± tan^-1(2^-i) and watch the error z_i = θ − Σ_{j=0}^{i} (±α_j). At the end (when z_n ≈ 0), we reach

  x_n = x_0 cos θ
  y_n = x_0 sin θ.

Setting x_0 = 1, we directly get the cosine and sine functions.

Derivation of Cordic

[Figure: the vector (x_i, y_i) of magnitude R at angle φ_i, rotated by α_i to (x_{i+1}, y_{i+1}).]

We rotate the current vector (x_i, y_i) by an angle α_i = ± tan^-1(2^-i):

  x'_{i+1} = R cos(φ_i + α_i)
           = R (cos φ_i cos α_i − sin φ_i sin α_i)
           = x_i cos α_i − y_i sin α_i

  x'_{i+1} / cos α_i = x_i − y_i tan α_i
  x_{i+1} = x_i − d_i y_i 2^-i

Notice that, with each iteration,
• the magnitude of |z_i| is decreasing by α_i, and
• the magnitude of the vector is increasing due to the division by cos α_i.

  i               0    1     2    3    4    5    6    7    8    9
  α_i (degrees)   45   26.6  14   7.1  3.6  1.8  0.9  0.4  0.2  0.1

The simple Cordic iteration

Volder's algorithm is based on

  x_{i+1} = x_i − d_i y_i 2^-i
  y_{i+1} = y_i + d_i x_i 2^-i
  z_{i+1} = z_i − d_i tan^-1(2^-i)

  d_i = 1 if z_i ≥ 0,
  d_i = −1 otherwise.
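The iteration above can be sketched directly in software (rotation mode; the 1/k pre-scaling compensates the cos α_i growth of the vector, and hardware would replace the multiplications by 2^-i with shifts):

```python
import math

def cordic_sincos(theta, n=32):
    """Rotation-mode circular Cordic: returns (cos theta, sin theta)."""
    k = 1.0
    for i in range(n):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))   # accumulated 1/cos(alpha_i)
    x, y, z = 1.0 / k, 0.0, theta               # pre-scale to compensate
    for i in range(n):
        d = 1.0 if z >= 0.0 else -1.0
        x, y, z = (x - d * y * 2.0 ** -i,       # shift-and-add in hardware
                   y + d * x * 2.0 ** -i,
                   z - d * math.atan(2.0 ** -i))
    return x, y
```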

Compensation

Since α_i = ± tan^-1(2^-i), then 1/cos α_i = sqrt(1 + 2^-2i). Let us define

  k = Π_{i=0}^{∞} sqrt(1 + 2^-2i) = 1.646760258...

and start from

  x_0 = 1/k = 0.60725293...
  y_0 = 0
  z_0 = θ.

Is there a maximum for θ? What is it? What if you want to calculate for a larger angle?

Concluding Cordic

• What we have just explained is the rotation mode of the circular type. There is also a vectoring mode and two other types: linear and hyperbolic.
• With the generalized Cordic, it is possible to compute many functions with minimal hardware support (three adders and a comparison).
• Cordic is slow but area efficient. It has been used in calculators and in the 8087 coprocessor.
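The compensation constant, and a hint for the range question, can be checked numerically: the reachable angle is bounded by the sum of all the α_i (a quick sketch):

```python
import math

k = 1.0
max_angle = 0.0
for i in range(60):
    k *= math.sqrt(1.0 + 2.0 ** (-2 * i))   # converges to 1.646760258...
    max_angle += math.atan(2.0 ** -i)       # sum of the alpha_i

# 1/k ~ 0.60725293..., and max_angle ~ 1.7433 rad (just under 100 degrees);
# larger angles need an extra reduction step such as quadrant folding.
```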


Polynomial approximations

• Taylor series are available for most functions if we know their derivatives. However, such series do not provide the minimum error term.
• Chebyshev polynomials minimize the maximum error (mini-max) in the domain of the approximation. However, the calculation of the coefficients of the polynomial may take some effort.

Note that we actually use truncated coefficients, so we need to compensate for that.

Different polynomials

• To improve the precision, it is better to divide the domain of the input into sub-intervals and to have a specific polynomial for each sub-interval.
• Many different polynomials may approximate the same function; how do we choose the best? ⇒ How do we define best?
• Instead of saving the coefficients in a table, we can save the values of the function at various points and interpolate.
• How do we actually make the calculation? What is the best way?
  – How many points? Which points?
  – How to interpolate between those points?
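To make the choice concrete, a small experiment of our own (same degree 5, sin on [0, π/4]) compares a truncated Taylor series with interpolation at Chebyshev nodes, which is close to the mini-max fit:

```python
import math

def divided_diffs(xs, ys):
    """Newton divided-difference coefficients for interpolation at xs."""
    c = list(ys)
    for j in range(1, len(xs)):
        for i in range(len(xs) - 1, j - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - j])
    return c

def newton_eval(c, xs, x):
    acc = c[-1]
    for ci, xi in zip(reversed(c[:-1]), reversed(xs[:-1])):
        acc = acc * (x - xi) + ci                # Horner-like evaluation
    return acc

a, b, deg = 0.0, math.pi / 4, 5
nodes = [0.5 * (a + b)
         + 0.5 * (b - a) * math.cos((2 * i + 1) * math.pi / (2 * deg + 2))
         for i in range(deg + 1)]                # Chebyshev nodes on [a, b]
coefs = divided_diffs(nodes, [math.sin(t) for t in nodes])

def taylor5(x):                                  # truncated Taylor, same degree
    return x - x ** 3 / 6 + x ** 5 / 120

grid = [a + (b - a) * t / 1000 for t in range(1001)]
err_taylor = max(abs(taylor5(x) - math.sin(x)) for x in grid)
err_cheb = max(abs(newton_eval(coefs, nodes, x) - math.sin(x)) for x in grid)
```

Both polynomials have degree 5, but the maximum error of the near-mini-max choice is about two orders of magnitude smaller; "best" here means smallest maximum error.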

Evaluating a polynomial

It is possible to optimize the evaluation of a polynomial f(x) = c_n x^n + c_{n-1} x^{n-1} + ... + c_0 in order to minimize the hardware or to increase the speed.

For hardware: use Horner's rule

  f(x) = ( ... (((c_n x + c_{n-1}) x + c_{n-2}) x + ... ) x + c_0 )

and iterate. This might even be driven and controlled by software.

For speed:
• use parallel powering units to reach the required precision in one iteration if possible.
• use the PPA of a multiplier.

Rational approximations

If a polynomial approximation requires too many terms, there might be a rational function R(x) = P(x)/Q(x) of a lower degree that gives a good approximation.

• To reduce the calculation order, you may use R_{m,n} = x P_m(x²) / Q_n(x²).
• Typically, R_{4,4} is enough for over sixty bits of precision. However, some functions require higher degrees.
• Rational approximations are "better" for double or extended precisions; Cordic is preferred for single precision.
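A minimal software version of the Horner iteration (one multiply-add per coefficient, which is why it maps onto a small hardware loop):

```python
def horner(coeffs, x):
    """Evaluate c_n x^n + ... + c_0; coeffs ordered [c_n, ..., c_0]."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c        # one multiply-add per step
    return acc

horner([2.0, -6.0, 2.0, -1.0], 3.0)   # 2x^3 - 6x^2 + 2x - 1 at x = 3, = 5.0
```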

Using the PPA of a multiplier

In 1993, Eric Schwarz proposed to use the PPA (partial product array) to evaluate the reciprocal. It is possible to generalize the idea for any polynomial.

• Write the operation in terms of the bits and expand the polynomial symbolically.
• Group the terms with the same power of 2 numerical weight and write them in columns.
• The resulting matrix is similar to the partial products. Hence, we add the rows of this matrix using the reduction tree and the carry propagate adder of the multiplier.

Reciprocal with PPA

If we want to calculate q such that b × q = 0.1111... ≈ 1.0 where b = 0.1 b2 b3 b4 ..., we write

        0.1    b2    b3    b4    b5   ...   = b
    ×   q0     q1    q2    q3    q4    q5   ...   = q
    -------------------------------------------------
                                 q4   q4b2  q4b3  q4b4  q4b5 ...
                           q3   q3b2  q3b3  q3b4  q3b5 ...
                     q2   q2b2  q2b3  q2b4  q2b5 ...
               q1   q1b2  q1b3  q1b4  q1b5 ...
        q0    q0b2  q0b3  q0b4  q0b5 ...
    -------------------------------------------------
      ≈ 0.1    1     1     1     1     1   ...

and use redundant digits for the q_i to make each column independent. Hence,

  q0 = 1
  q1 + q0 b2 = 1
  q2 + q1 b2 + q0 b3 = 1
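The column equations can be solved one digit at a time, and the result checked numerically for every 4-bit pattern of b (a sketch; note that the redundant digits q_i may be negative):

```python
def reciprocal_digits(b2, b3, b4, b5):
    """Solve the column equations q_k + q_{k-1} b2 + ... + q0 b_{k+1} = 1."""
    q0 = 1
    q1 = 1 - q0 * b2
    q2 = 1 - q1 * b2 - q0 * b3
    q3 = 1 - q2 * b2 - q1 * b3 - q0 * b4
    q4 = 1 - q3 * b2 - q2 * b3 - q1 * b4 - q0 * b5
    return q0, q1, q2, q3, q4

for bits in range(16):
    b2, b3, b4, b5 = bits >> 3 & 1, bits >> 2 & 1, bits >> 1 & 1, bits & 1
    b = 0.5 + b2 / 4 + b3 / 8 + b4 / 16 + b5 / 32
    digits = reciprocal_digits(b2, b3, b4, b5)
    q = sum(d * 2.0 ** -i for i, d in enumerate(digits))
    assert abs(b * q - 1.0) < 2.0 ** -4   # first five product columns match 0.11111
```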

Steps of PPA implementation

• Solve the equations:

  q0 = 1
  q1 = 1 − q0 b2 = 1 − b2
  q2 = 1 − q1 b2 − q0 b3 = 1 − b3     (bits satisfy b·b = b)

• Put in a PPA form:

  q0     q1     q2     q3        q4       ...
   1      1      1      1         1
         −b2    −b3    −b2      −b2b3
                       −b3      −b4
                       −b4      −b5
                      +2b2b3   +2b2b4

Reduction rules for the PPA evaluation

1. Any M × a ⇒ (Σ k_i 2^i) a. For example, 5a ⇒ (a, 0, a) over three columns.
2. Algebraic reductions. For example, 2a − a ⇒ a.
3. Boolean reductions:
   • a − ab = a(1 − b) ⇒ a b̄
   • a + b − ab = a + ā b ⇒ a OR b
   • a + b − 2ab ⇒ a ⊕ b
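The reduction rules hold as integer identities over bits, which a short exhaustive check confirms:

```python
for a in (0, 1):
    for b in (0, 1):
        assert 5 * a == 4 * a + a              # rule 1: 5a -> (a, 0, a) columns
        assert 2 * a - a == a                  # rule 2: algebraic reduction
        assert a - a * b == a * (1 - b)        # rule 3: a - ab -> a AND (NOT b)
        assert a + b - a * b == (a | b)        #         a + b - ab -> a OR b
        assert a + b - 2 * a * b == (a ^ b)    #         a + b - 2ab -> a XOR b
```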


The rest of the steps

After applying the reduction rules,

• Compensate for any approximation errors to improve the accuracy.
• Complement the negative elements and subtract one (remember that ā = 1 − a).
• Reduce all the constants.

After these steps, the first columns reduce to q0 = 1, q1 = b̄2, q2 = b̄3, while the q3 and q4 columns contain terms such as b̄4, b̄5, b̄2 b̄4, (b2 OR b̄3), and (b̄2 OR b̄3). Reducing all of this, we get a non-redundant representation of the quotient.

Other functions using the PPA

If the function is expressed as a polynomial, we represent the coefficients and the parameter using their bits and expand symbolically. For example, let us evaluate P(h) = c0 + c1 h + c2 h² where

  h  = h1 2^-5 + h2 2^-6
  c0 = c00 + c01 2^-1
  c1 = c10 + c11 2^-1
  c2 = c20 + c21 2^-1

We expand P(h) symbolically and group the terms:

  P(h) = c00 + c01 2^-1 + c10 h1 2^-5 + (c10 h2 + c11 h1) 2^-6 + c11 h2 2^-7
       + (c20 h1 + c20 h1 h2) 2^-10 + (c21 h1 + c21 h1 h2) 2^-11
       + c20 h2 2^-12 + c21 h2 2^-13

then write them in the form of a partial product array:

  c00  c01  0  0  0  c10h1  c10h2  c11h2  0  0  c20h1    c21h1    c20h2  c21h2
                            c11h1              c20h1h2  c21h1h2
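The symbolic grouping above can be verified numerically for all four (h1, h2) bit patterns (the coefficient values below are arbitrary dyadic constants chosen just for the check):

```python
def grouping_error(c00, c01, c10, c11, c20, c21, h1, h2):
    """|direct P(h) - grouped expansion|; h1, h2 are bits, so h_i^2 = h_i."""
    h = h1 * 2.0 ** -5 + h2 * 2.0 ** -6
    c0, c1, c2 = c00 + c01 / 2, c10 + c11 / 2, c20 + c21 / 2
    direct = c0 + c1 * h + c2 * h * h
    grouped = (c00 + c01 * 2.0 ** -1
               + c10 * h1 * 2.0 ** -5 + (c10 * h2 + c11 * h1) * 2.0 ** -6
               + c11 * h2 * 2.0 ** -7
               + (c20 * h1 + c20 * h1 * h2) * 2.0 ** -10
               + (c21 * h1 + c21 * h1 * h2) * 2.0 ** -11
               + c20 * h2 * 2.0 ** -12 + c21 * h2 * 2.0 ** -13)
    return abs(direct - grouped)

for h1 in (0, 1):
    for h2 in (0, 1):
        assert grouping_error(0.75, 0.5, 1.0, 0.25, 0.5, 1.0, h1, h2) < 1e-15
```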

Conclusion on the use of PPA

• The PPA of a multiplier is modified by adding a multiplexer and some logic gates to generate the required bit patterns for the function.
• That minimal hardware allows us to compute many functions at the speed of a multiplication.
• Almost all the elementary functions can be computed (usually to within 12–20 bits of precision).
• The reconfiguration of the multiplier may be done in less than a clock cycle.

Summary of elementary functions

• We presented a general classification of how to implement them.
• What is "best" depends on the goal of the specific unit.
• It is possible to have hardware intensive and extremely fast evaluation. At the opposite end of the spectrum, it is possible to delegate the computations to software.
