IEEE TRANSACTIONS ON COMPUTERS, VOL. C-19, NO. 2, FEBRUARY 1970

Generation of Products and Quotients Using Approximate Binary Logarithms for Digital Filtering Applications

ERNEST L. HALL, MEMBER, IEEE, DAVID D. LYNCH, MEMBER, IEEE, AND SAMUEL J. DWYER, III, MEMBER, IEEE

Abstract-An approximate method for rapid multiplication or division with relatively simple digital circuitry is described. The algorithm consists of computing approximate binary logarithms, adding or subtracting the logarithms, and computing the approximate antilogarithm of the resultant. Using a criterion of minimum mean-square error, coefficients for the approximations are developed. An error analysis is given for three cases in which the algorithm is useful. Finally, applications to digital filtering computations are considered which illustrate that log-antilog multiplication is not simpler than an array multiplier for computing single products, but is useful for parallel digital filter banks and multiplicative digital filters.

Index Terms-Antilogarithm converter, binary-to-binary logarithm converter, computer multiplication, digital filter realization.

INTRODUCTION

An algorithm for computer multiplication by binary logarithms was described by Mitchell [1] and expanded by Combet, Van Zonneveld, and Verbeek [2]. This method has only modest accuracy and limited application for general purpose computation. However, there is a class of digital filtering problems in which the speed, nature of the signals, and component count offset accuracy considerations. Real-time digital filtering of radar video for moving target detection, synthetic aperture processing, and pulse compression are in this class. Because of the statistical nature of the sampled signals, the large amount of signal integration required, and the characterization of detection performance on a probabilistic basis, the accuracy of a single computation has less importance than the mean and variance of the operation on the signal ensemble. Radar video is characterized by broad bandwidth and correspondingly high data flow rates, which make real-time multiplication with readily available logic very difficult. Furthermore, multiple filters are usually required because the noise is colored and the filter bandpass is but a small fraction of the actual signal bandwidth.

In Section I the binary-to-binary logarithm conversion is reviewed. In Section II the antilogarithm conversion is developed. In Section III a detailed error analysis is given. Finally, Section IV contains examples of the use of the log-antilog technique for digital filter applications.

Manuscript received May 1, 1968; revised July 14, 1969.
E. L. Hall is with the Department of Electrical Engineering, University of Missouri, Columbia, and Emerson Electric Company, St. Louis, Mo.
D. D. Lynch is with the Emerson Electric Company, Electronics and Space Division, St. Louis, Mo.
S. J. Dwyer, III, is with the Department of Electrical Engineering, University of Missouri, Columbia, Mo.

I. BINARY-TO-BINARY LOGARITHM CONVERSION

A simple method for the computation of the base two logarithm of a binary number was developed by Mitchell [1]. The method consists of encoding the binary number into a form from which the characteristic is easily determined and the mantissa is easily approximated.

Let $N$ be a nonzero finite length binary number and let $m$ and $j$ represent the binary power of the most and least significant bits of $N$, respectively. The case of $N = 0$ is easily handled separately. $N$ may be written as

$$N = z_m z_{m-1} \cdots z_{j+1} z_j$$

with

$$z_i = 0 \text{ or } 1; \quad m, j = 0, \pm 1, \pm 2, \ldots; \quad m \ge j.$$

Clearly,

$$2^{m+1} > N \ge 2^{j}.$$

$N$ is also given by

$$N = \sum_{i=j}^{m} z_i 2^{i}.$$

Now let $z_k$ be the most significant nonzero bit of $N$, $m \ge k \ge j$. Then,

$$N = 2^{k} + \sum_{i=j}^{k-1} z_i 2^{i}.$$

Factoring out $2^{k}$ results in

$$N = 2^{k}\left(1 + \sum_{i=j}^{k-1} z_i 2^{i-k}\right) = 2^{k}(1 + x)$$

where

$$x = \sum_{i=j}^{k-1} z_i 2^{i-k} \quad \text{and} \quad 1 > x \ge 0 \text{ since } k \ge j.$$

Thus, $N$ has been encoded into the form $N = 2^{k}(1 + x)$. The base two logarithm of $N$ is

$$\log_2 N = k + \log_2 (1 + x).$$
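The encoding $N = 2^{k}(1+x)$ can be illustrated with a short sketch. The function names `encode` and `mitchell_log2` are hypothetical and not from the paper; the sketch assumes positive integer inputs and uses the simplest mantissa approximation $\log_2(1+x) \approx x$ attributed to Mitchell [1].

```python
import math

def encode(n):
    """Return (k, x) such that n == 2**k * (1 + x) with 0 <= x < 1.

    Assumes n is a positive integer; k is the position of the most
    significant nonzero bit and x is the remaining bits read as a fraction.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer (N = 0 is handled separately)")
    k = n.bit_length() - 1           # characteristic
    x = (n - (1 << k)) / (1 << k)    # mantissa argument, 0 <= x < 1
    return k, x

def mitchell_log2(n):
    """Approximate log2(n) as k + x, i.e. log2(1 + x) is replaced by x."""
    k, x = encode(n)
    return k + x

# Worst-case error of the k + x approximation over all 12-bit integers.
max_err = max(math.log2(n) - mitchell_log2(n) for n in range(1, 4096))
```

For example, `encode(13)` gives `(3, 0.625)` because 13 = 8 × 1.625, and the worst-case error of the $k + x$ form stays just below 0.0861.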

Fig. 1. Piecewise linear approximation to binary logarithm.

Since $1 > x \ge 0$, the logarithm characteristic is $k$ and the mantissa is only a function of $x$.

A linear approximation of $\log_2 N$ is of the form

$$LA(N) = k + ax + b.$$

The geometrical interpretation of this approximation is shown in Fig. 1 and consists of a piecewise linear approximation between the points where $\log_2 N$ attains integral values.

The linear coefficients $a$ and $b$ may be selected to maximize some return function. If simplicity is the desired return function, then $a = 1$ and $b = 0$ are the best coefficients. As shown by Mitchell, the maximum error

$$E = \log_2 N - LA(N)$$

with $a = 1$, $b = 0$, is 0.086.

If an easily computed set of coefficients is desired, then one may use the linear terms of a Taylor series expansion of $\log_2 (1 + x)$ about the point $x = x_0$, $1 > x_0 \ge 0$, to obtain

$$a = \frac{\log_2 e}{1 + x_0}$$

$$b = \log_2 (1 + x_0) - \frac{x_0 \log_2 e}{1 + x_0}.$$

The error in this approximation is $E = \log_2 (1 + x) - (ax + b)$ evaluated with these values of $a$ and $b$.

Combet et al. [2] described another method for selecting the coefficients. This method consists of partitioning the range of $x$ into four parts and again making a piecewise linear approximation. The linear equations given in [2] were reportedly found by trial and error using a criterion of minimum error, and constraining the coefficients to be easily implemented with binary circuitry. That is, the coefficients were chosen to be fractions with simple numerators and denominators. With a four subinterval partition, the single division error was reduced by a factor of six.

The authors propose that for many applications, including digital filtering, mean-square error is a desirable error criterion, although maximum error and easy implementation must also be considered. The coefficients for a linear least squares fit to $\log_2 (1 + x)$ over $x_1 \le x \le x_2$, $1 \ge x_2 > x_1 \ge 0$, will now be developed.

The mean-squared error is defined as

$$E^2 = \frac{1}{x_2 - x_1} \int_{x_1}^{x_2} \{\log_2 (1 + x) - (ax + b)\}^2 \, dx.$$

To minimize $E^2$ with respect to $a$ and $b$ it is necessary that

$$\frac{\partial E^2}{\partial a} = 0 = \int_{x_1}^{x_2} -2x\{\log_2 (1 + x) - (ax + b)\} \, dx$$

$$\frac{\partial E^2}{\partial b} = 0 = \int_{x_1}^{x_2} -2\{\log_2 (1 + x) - (ax + b)\} \, dx.$$

Let

$$I_1 = \int_{x_1}^{x_2} \log_2 (1 + x) \, dx = \log_2 e \, \Big\{ Y \log_e Y - Y \Big\}_{Y = 1 + x_1}^{Y = 1 + x_2}$$

$$I_2 = \int_{x_1}^{x_2} x \log_2 (1 + x) \, dx = \log_2 e \, \Big\{ \Big(\frac{Y^2}{2} - Y\Big) \log_e Y - \frac{Y^2}{4} + Y \Big\}_{Y = 1 + x_1}^{Y = 1 + x_2}$$

then

$$a = \frac{I_2 - (x_2 + x_1) I_1 / 2}{\dfrac{x_2^3 - x_1^3}{3} - \dfrac{(x_2^2 - x_1^2)(x_2 + x_1)}{4}}$$

$$b = \frac{I_1}{x_2 - x_1} - \frac{a (x_1 + x_2)}{2}.$$

Thus, for any partition of the interval [0, 1], the best linear mean-square coefficients can be determined and $E^2$ evaluated. Also, the maximum error for any subinterval is easily determined. The coefficients, the mean-square error, and the maximum absolute error are given in Table I for 1, 2, 4, and 8 equispaced subinterval partitions.

For a particular realization, it is convenient to work with the linear equations in the form

$$x + cx + d \quad \text{if } a > 1$$

$$x - cx + d \quad \text{if } a < 1.$$

TABLE I
MEAN-SQUARE ERROR AND COEFFICIENTS FOR LOGARITHM APPROXIMATION

Number of      Subinterval   a          b          E^2            E_max
subintervals
1              1             0.984255   0.065176   0.641074E-3    0.065176
2              1             1.163555   0.021303   0.192903E-4    0.021303
2              2             0.827788   0.181567   0.581653E-5    0.010518
4              1             1.285610   0.006243   0.278225E-6    0.006243
4              2             1.050957   0.063330   0.387113E-6    0.004141
4              3             0.888761   0.143537   0.642476E-6    0.002186
4              4             0.770244   0.231857   0.289856E-7    0.002186
8              1             1.359165   0.001681   0.173267E-7    0.001681
8              2             1.215426   0.019368   0.550871E-7    0.001371
8              3             1.099427   0.048200   0.814192E-7    0.001129
8              4             1.003868   0.083914   0.297152E-7    0.000933
8              5             0.923414   0.124049   0.130118E-6    0.000794
8              6             0.854749   0.166916   0.128720E-6    0.000695
8              7             0.806959   0.202033   0.186847E-6    0.001232
8              8             0.734065   0.265769   0.150003E-6    0.001186
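The least-squares coefficients for $\log_2(1+x) \approx ax + b$ on a subinterval $[x_1, x_2]$ follow from the two normal equations, and the leading rows of Table I can be checked numerically. The following is a minimal sketch; the names `log_fit`, `_F1` and `_F2` are hypothetical helpers introduced here for illustration, not part of the paper.

```python
import math

LOG2E = 1.0 / math.log(2.0)  # log2(e)

def _F1(Y):
    # Antiderivative of ln(Y) with respect to Y.
    return Y * math.log(Y) - Y

def _F2(Y):
    # Antiderivative of (Y - 1) * ln(Y) with respect to Y, where Y = 1 + x.
    return (Y * Y / 2.0 - Y) * math.log(Y) - Y * Y / 4.0 + Y

def log_fit(x1, x2):
    """Least-squares (a, b) for log2(1 + x) ~ a*x + b on [x1, x2]."""
    I1 = LOG2E * (_F1(1.0 + x2) - _F1(1.0 + x1))   # integral of log2(1+x)
    I2 = LOG2E * (_F2(1.0 + x2) - _F2(1.0 + x1))   # integral of x*log2(1+x)
    denom = (x2**3 - x1**3) / 3.0 - (x2**2 - x1**2) * (x2 + x1) / 4.0
    a = (I2 - (x2 + x1) * I1 / 2.0) / denom
    b = I1 / (x2 - x1) - a * (x1 + x2) / 2.0
    return a, b

a_full, b_full = log_fit(0.0, 1.0)   # single-interval row of Table I
a_half, b_half = log_fit(0.0, 0.5)   # first row of the two-subinterval case
```

Under these assumptions the single-interval result agrees with the first row of Table I (a ≈ 0.98426, b ≈ 0.06518) to about five decimal places.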

TABLE II
LOGARITHM EQUATIONS
(columns: Range, Mantissa)

TABLE III

Number of      Subinterval   a          b          E^2            E_max
subintervals
1              1             0.992089   0.946650   0.656127E-3    0.061261
2              1             0.826852   0.988453   0.813603E-5    0.012333
2              2             1.169200   0.813322   0.143051E-4    0.017487
4              1             0.756609   0.997295   0.894069E-7    0.002759
4              2             0.900147   0.960905   0.283122E-6    0.003275
4              3             1.068757   0.876172   0.23818E-6     0.004054
4              4             1.273014   0.722413   0.730156E-6    0.004519
8              1             0.726814   0.999170   0.223517E-7    0.000839
8              2             0.787594   0.991472   0.223517E-7    0.000846
8              3             0.859510   0.973650   0.372529E-7    0.000872
8              4             0.945204   0.941145   0.149011E-7    0.001242
8              5             1.021377   0.902764   0.134110E-6    0.001095
8              6             1.110733   0.847371   0.208616E-6    0.001372
8              7             1.221244   0.764536   0.819563E-7    0.001323
8              8             1.331134   0.667864   0.163912E-6    0.001400
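The antilogarithm coefficients in this table can be reproduced the same way as the logarithm coefficients, by fitting a line $ax + b$ to $2^{x}$ in the least-squares sense. The sketch below is illustrative only; `antilog_fit` is a hypothetical name, and the closed-form integrals are the standard antiderivatives of $2^{x}$ and $x\,2^{x}$.

```python
import math

LN2 = math.log(2.0)

def antilog_fit(x1, x2):
    """Least-squares (a, b) for 2**x ~ a*x + b on [x1, x2]."""
    # I1 = integral of 2**x dx = (2**x2 - 2**x1) / ln 2
    I1 = (2.0**x2 - 2.0**x1) / LN2

    # I2 = integral of x * 2**x dx, antiderivative 2**x * (x ln 2 - 1) / (ln 2)**2
    def G(x):
        return 2.0**x * (x * LN2 - 1.0) / LN2**2
    I2 = G(x2) - G(x1)

    denom = (x2**3 - x1**3) / 3.0 - (x2**2 - x1**2) * (x1 + x2) / 4.0
    a = (I2 - (x2 + x1) * I1 / 2.0) / denom
    b = (I1 - a * (x2**2 - x1**2) / 2.0) / (x2 - x1)
    return a, b

a_one, b_one = antilog_fit(0.0, 1.0)   # single-interval row
```

The single-interval values come out near a ≈ 0.99208 and b ≈ 0.94665, which agrees with the first row of the table to within the precision one would expect from 1970-era single-precision arithmetic.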

TABLE IV
ANTILOGARITHM EQUATIONS

Let

$$I_1 = \int_{x_1}^{x_2} 2^{x} \, dx = \log_2 e \, \{2^{x_2} - 2^{x_1}\}$$

$$I_2 = \int_{x_1}^{x_2} x \, 2^{x} \, dx = \left[ \frac{2^{x} (x \log_e 2 - 1)}{(\log_e 2)^2} \right]_{x_1}^{x_2}.$$

Then

$$a = \frac{I_2 - (x_2 + x_1) I_1 / 2}{\dfrac{x_2^3 - x_1^3}{3} - \dfrac{(x_2^2 - x_1^2)(x_1 + x_2)}{4}}$$

$$b = \frac{I_1 - a (x_2^2 - x_1^2)/2}{x_2 - x_1}.$$

Thus, for any partition of [0, 1], the linear mean-square error coefficients $a$ and $b$ may be determined and $E^2$ evaluated. The absolute maximum error may also be determined over each subinterval. These values are given in Table III for 1, 2, 4, and 8 subinterval partitions.

The linear antilogarithm equations for a four subinterval partition are given in Table IV.

III. ERROR ANALYSIS

In this section, an error analysis is given for several cases in which a log-antilog conversion would be desirable. The first case considered will be the product of two variables. Next, a special case of a product of a constant and a variable is considered. This special case arises in many digital filter applications. Finally, the error for a quotient of two variables is considered.

This analysis is mainly concerned with errors due to the approximation. Quantization error, truncation error, and coefficient error have been dealt with in other papers and would depend on the actual hardware used to implement the conversions. For one of the authors' applications [3] an exact digital simulation was made to study these effects.

A. Product Error

Suppose that the log-antilog conversion was used to approximate the product of two numbers. How much error would be incurred? Let $M_1$ and $M_2$ represent two binary numbers which are encoded into the form:

$$M_1 = 2^{k_1}(1 + x_1) \quad \text{where } 0 \le x_1 < 1, \; k_1 \text{ an integer}$$

$$M_2 = 2^{k_2}(1 + x_2) \quad \text{where } 0 \le x_2 < 1, \; k_2 \text{ an integer.}$$


Fig. 3. Curve of constant error E1 = C.

The approximate logarithms of $M_1$ and $M_2$ are $LA(M_1) = k_1 + x_1 + y_1$ and $LA(M_2) = k_2 + x_2 + y_2$, where

$$y_1 = a_1 x_1 + b_1$$

$$y_2 = a_2 x_2 + b_2.$$

The values of $a_i$ and $b_i$, $i = 1, 2$, are constants over a certain interval of $x_i$ and are given in Table II. No carry can occur in the sum $x_i + y_i$ with the coefficients given in Table II so that

$$0 \le x_1 + y_1 < 1$$

$$0 \le x_2 + y_2 < 1.$$

Adding the approximate logarithms of $M_1$ and $M_2$ gives the approximate logarithm $LP$ of the product $P = M_1 M_2$. Thus,

$$LP = k_1 + k_2 + x_1 + y_1 + x_2 + y_2.$$

Since a carry from the mantissa to the characteristic can occur, there are two cases to consider.

For Case 1, no carry: $x_1 + y_1 + x_2 + y_2 < 1$.
For Case 2, carry: $x_1 + y_1 + x_2 + y_2 \ge 1$.

Case 1: The approximate binary exponent $EA(P)$ is given by

$$EA(P) = 2^{k_1 + k_2}(1 + x_1 + x_2 + y_1 + y_2 + z_{12})$$

where $z_{12}$ is the linear correction term, i.e.,

$$z_{12} = a_3 (x_1 + x_2 + y_1 + y_2) + b_3.$$

Case 2:

$$EA(P) = 2^{k_1 + k_2 + 1}(x_1 + x_2 + y_1 + y_2 + z_{12})$$

where $z_{12}$ is the linear correction term, i.e.,

$$z_{12} = a_3 (x_1 + y_1 + x_2 + y_2 - 1) + b_3.$$

The coefficients $a_3$ and $b_3$ are constants over certain regions and are given in Table IV. The error in using the approximate product is given by

$$E = P - EA(P) = M_1 M_2 - EA(P).$$

Case 1:

$$E = 2^{k_1 + k_2}\{(1 + x_1)(1 + x_2) - (1 + x_1 + y_1 + x_2 + y_2 + z_{12})\}$$

$$E = 2^{k_1 + k_2}\{x_1 x_2 - (y_1 + y_2 + z_{12})\}.$$

Let the normalized error be defined as

$$E_1 = E / 2^{k_1 + k_2}.$$

Case 2:

$$E = 2^{k_1 + k_2}\{(1 + x_1)(1 + x_2) - 2(x_1 + x_2 + y_1 + y_2 + z_{12})\}$$

$$E = 2^{k_1 + k_2 + 1}\{(1 - x_1)(1 - x_2)/2 - (y_1 + y_2 + z_{12})\}.$$

For this case, let the normalized error be defined as

$$E_2 = E / 2^{k_1 + k_2 + 1}.$$

A direct attempt at finding the critical points of $E_1$ and $E_2$ would involve setting the partial derivatives equal to zero. For Case 1,

$$\frac{\partial E_1}{\partial x_1} = 0 = x_2 - a_1 - a_3 - a_1 a_3$$

$$\frac{\partial E_1}{\partial x_2} = 0 = x_1 - a_2 - a_3 - a_2 a_3$$

which would indicate that the critical point is

$$(x_1, x_2) = (a_2 + a_3 + a_2 a_3, \; a_1 + a_3 + a_1 a_3).$$

However, this point never falls in the interval of interest for the given coefficients. This fact is clearly indicated by the $E_1 = \text{constant}$ curves shown in Fig. 3. A similar result holds for $E_2$. If

$$E_1 = C = x_1 x_2 - (a_1 + a_3 + a_1 a_3) x_1 - (a_2 + a_3 + a_2 a_3) x_2 - (b_1 + b_2)(1 + a_3) - b_3$$

then

$$x_1 = \frac{C + (b_1 + b_2)(1 + a_3) + b_3 + (a_2 + a_3 + a_2 a_3) x_2}{x_2 - (a_1 + a_3 + a_1 a_3)}.$$

A graph of this equation is shown in Fig. 3.

In fact, $E$ is a hyperbolic paraboloid and is a monotonic function over the regions of interest, and therefore, it attains its maximum and minimum values at the boundary of the region.

The maximum absolute values of the normalized error $E$ over the 16 regions of $x_1$ and $x_2$ are given in Table V. The maximum error is 0.01907 and occurs at a point with $x_1 = x_2$. The values in Table V were arrived at by computing $E$ at 4096 equispaced points in the $(x_1, x_2)$ plane. The maximum product error is only a fraction as large as the product error computed by Mitchell [1] for single interval approximations to the logarithm and exponential function.

B. Product Error-Special Case

The special case of the product of a variable and a constant will now be considered. This case arises in all linear constant coefficient digital filter applications. It is assumed that the logarithm of the constant is exact. As one would expect, the resulting product error is smaller.

Let $N$ and $C$ represent two binary numbers encoded into the form

$$N = 2^{k_1}(1 + x_1)$$

$$C = 2^{k_2}(1 + x_2).$$

The approximate logarithm of $N$ is given by

$$LA(N) = k_1 + x_1 + y_1 \quad \text{where } y_1 = a_1 x_1 + b_1.$$

The binary logarithm of $C$ is given by

$$\log_2 (C) = k_2 + \log_2 (1 + x_2).$$

The approximate logarithm of the product $P = CN$ is given by

$$LP = k_1 + k_2 + x_1 + y_1 + \log_2 (1 + x_2).$$

Two cases must again be considered.

For Case 1, no carry: $x_1 + y_1 + \log_2 (1 + x_2) < 1$.
For Case 2, carry: $x_1 + y_1 + \log_2 (1 + x_2) \ge 1$.

The approximate binary exponent of $P$ is given by:

Case 1:

$$EA(P) = 2^{k_1 + k_2}\{1 + x_1 + y_1 + \log_2 (1 + x_2) + z_{12}\}$$

where

$$z_{12} = a_3 \{x_1 + y_1 + \log_2 (1 + x_2)\} + b_3.$$

Case 2:

$$EA(P) = 2^{k_1 + k_2 + 1}\{x_1 + y_1 + \log_2 (1 + x_2) + z_{12}\}$$

where

$$z_{12} = a_3 \{x_1 + y_1 + \log_2 (1 + x_2) - 1\} + b_3.$$

The product error is defined as

$$E = P - EA(P).$$

Case 1:

$$E = 2^{k_1 + k_2}\{(1 + x_1)(1 + x_2) - (1 + x_1 + y_1 + \log_2 (1 + x_2) + z_{12})\}.$$

Let the normalized error be defined as

$$E_1 = E / 2^{k_1 + k_2}.$$

Case 2:

$$E = 2^{k_1 + k_2 + 1}\{(1 + x_1)(1 + x_2)/2 - (x_1 + y_1 + \log_2 (1 + x_2) + z_{12})\}.$$

Let the normalized error be defined by

$$E_2 = E / 2^{k_1 + k_2 + 1}.$$

The normalized error is again a monotonic function over certain regions and attains its maximum and minimum values at the boundaries of these regions. The maximum absolute values of the error are shown in Table VI. The largest error is 0.01321, which occurs at $x_1 = 25/64$, $x_2 = 14/32$, and is substantially smaller than the product error for the general case. Although this point $(x_1, x_2)$ is not on the boundary of one of the 16 main regions, it is on a product boundary since $a_3$ changes inside the region.

C. Quotient Error

The error incurred in a division operation will now be considered. Let the dividend $D_1$ and the divisor $D_2$ be binary numbers encoded into the form

$$D_1 = 2^{k_1}(1 + x_1), \quad 0 \le x_1 < 1$$

$$D_2 = 2^{k_2}(1 + x_2), \quad 0 \le x_2 < 1.$$

The approximate logarithms of $D_1$ and $D_2$ are given by

$$LA(D_1) = k_1 + x_1 + y_1, \quad y_1 = a_1 x_1 + b_1$$

$$LA(D_2) = k_2 + x_2 + y_2, \quad y_2 = a_2 x_2 + b_2.$$

Subtracting these logarithms gives the approximate logarithm $LQ$ of the quotient $Q = D_1 / D_2$:

$$LQ = k_1 - k_2 + x_1 - x_2 + y_1 - y_2.$$

TABLE V
MODULUS VALUES OF PRODUCT ERROR

TABLE VI
MODULUS VALUES OF PRODUCT ERROR, SPECIAL CASE

TABLE VII
MODULUS VALUES OF QUOTIENT ERROR

Again, two cases must be considered depending on the occurrence of a borrow from the characteristic to the mantissa.

Case 1, no borrow: $x_1 + y_1 \ge x_2 + y_2$.
Case 2, borrow occurs: $x_1 + y_1 < x_2 + y_2$.

Case 1:

$$EA(Q) = 2^{k_1 - k_2}\{1 + x_1 + y_1 - x_2 - y_2 + z_{12}\}$$

where

$$z_{12} = a_3 \{x_1 + y_1 - x_2 - y_2\} + b_3.$$

Case 2:

$$EA(Q) = 2^{k_1 - k_2 - 1}\{2 + x_1 + y_1 - x_2 - y_2 + z_{12}\}$$

where

$$z_{12} = a_3 \{1 + x_1 + y_1 - x_2 - y_2\} + b_3.$$

The actual quotient is given by

$$Q = D_1 / D_2 = 2^{k_1 - k_2}\left\{\frac{1 + x_1}{1 + x_2}\right\}.$$

The error is defined as

$$E = Q - EA(Q)$$

which for the two cases is the following.

Case 1:

$$E = 2^{k_1 - k_2}\left\{\frac{1 + x_1}{1 + x_2} - (1 + x_1 + y_1 - x_2 - y_2 + z_{12})\right\}.$$

Let the normalized error be defined by

$$E_1 = E / 2^{k_1 - k_2}.$$

Case 2:

$$E = 2^{k_1 - k_2}\left\{\frac{1 + x_1}{1 + x_2} - \frac{1}{2}(2 + x_1 + y_1 - x_2 - y_2 + z_{12})\right\}$$

$$E_2 = E / 2^{k_1 - k_2}.$$

The maximum value of $E$ for the 16 regions of the $(x_1, x_2)$ plane is listed in Table VII. This error is five times smaller than the quotient error computed by Mitchell.

IV. APPLICATIONS TO DIGITAL FILTERING

To illustrate the applications in which the log-antilog conversion would be advantageous, three examples are given. The first example shows that for a single multiplication a cobweb array is simpler. The second example illustrates how the log-antilog conversion can be used advantageously for a parallel filter bank. The last example of a multiplicative filter illustrates a situation in which a log-antilog conversion is necessary.

Example 1-Nonrecursive Digital Filter: The difference equation of a nonrecursive digital filter [4] may be written as

$$y_n = \sum_{i=0}^{N-1} a_i x_{n-i}.$$

Using the log-antilog conversion the computation may be performed by

$$y_n = \sum_{i=0}^{N-1} \exp \{\log |a_i| + \log |x_{n-i}|\}.$$
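The log-antilog form of the nonrecursive filter can be sketched in software as follows. This is an illustration under stated assumptions, not the authors' hardware design: it uses the four-subinterval least-squares coefficients printed in Tables I and III directly (rather than the easily implemented equations of Tables II and IV), works in floating point, and handles signs and zeros outside the logarithm path. All function names are hypothetical.

```python
import math

# (upper edge, a, b): four-subinterval rows of Table I, log2(1 + x) ~ a*x + b.
LOG_SEG = [(0.25, 1.285610, 0.006243), (0.50, 1.050957, 0.063330),
           (0.75, 0.888761, 0.143537), (1.00, 0.770244, 0.231857)]
# Four-subinterval rows of Table III, 2**f ~ a*f + b.
ANTILOG_SEG = [(0.25, 0.756609, 0.997295), (0.50, 0.900147, 0.960905),
               (0.75, 1.068757, 0.876172), (1.00, 1.273014, 0.722413)]

def _piecewise(segments, t):
    """Evaluate the linear piece whose subinterval contains t."""
    for upper, a, b in segments:
        if t < upper:
            return a * t + b
    _, a, b = segments[-1]
    return a * t + b

def approx_log2(n):
    """Approximate log2(n) for n > 0 as k + (a*x + b), with n = 2**k * (1 + x)."""
    m, e = math.frexp(n)              # n = m * 2**e with 0.5 <= m < 1
    return (e - 1) + _piecewise(LOG_SEG, 2.0 * m - 1.0)

def approx_antilog2(v):
    """Approximate 2**v by splitting v into characteristic and fraction."""
    k = math.floor(v)
    return math.ldexp(_piecewise(ANTILOG_SEG, v - k), k)

def fir_log_antilog(coeffs, samples):
    """y[n] = sum_i a[i] * x[n-i], each product formed by log, add, antilog."""
    # Coefficient logarithms are computed once, ahead of the data.
    log_c = [(approx_log2(abs(c)) if c != 0 else None, c < 0) for c in coeffs]
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for i, (lc, c_neg) in enumerate(log_c):
            if n - i < 0:
                break
            x = samples[n - i]
            if lc is None or x == 0:
                continue              # zero operand: product is cleared
            mag = approx_antilog2(lc + approx_log2(abs(x)))
            acc += -mag if c_neg != (x < 0) else mag   # restore the sign
        out.append(acc)
    return out
```

With these coefficients each individual product should stay within roughly 1.5 percent of the exact value, so the filter output tracks the exact convolution to within a few percent of the summed term magnitudes. The floor-and-fraction split in `approx_antilog2` absorbs the mantissa carry that the text treats as Case 1 and Case 2.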

Fig. 4. Parallel filter bank.


Fig. 5. Multiplicative digital filter.

The special cases of $a_i < 0$ or $x_{n-i} < 0$ are easily handled by either complementing or clearing the antilogarithm result. Also, the computation of $\log |a_i|$ may be done a priori.

The tradeoff between the two computations is: a direct multiplication versus a log conversion, an addition, and an exponentiation. A comparison of hardware complexity and computation time can be made for the particular example of multiplying two six-bit numbers. An indicator of hardware complexity is the number of full adder circuits required. For a cobweb array multiplier, 30 adders are required if all product bits are retained; however, only 21 adders are required if the product is truncated to six bits. For the log conversion, twelve adders are required, for the log addition eight are used, and for the antilog conversion ten are needed, or a total of 30 adders are required to obtain 6-bit accuracy. Thus, for a single multiplication the cobweb array is simpler than the log-antilog conversion.

Example 2-Parallel Digital Filter Bank: A particular digital filter configuration which often arises is the parallel filter bank, which may be described by a z-transfer function of the form

$$H(z) = H_1(z) + H_2(z) + \cdots + H_r(z).$$

The block diagram of this filter is shown in Fig. 4(a). If each of the $H_i$ is of the nonrecursive type, then the log conversion of $x_i$ may be performed first as shown in Fig. 4(b). The resulting filters, $G_i$, would then require only an addition and an exponentiation for each multiplication. If any of the $H_i$ are of the recursive [5] type described by a difference equation of the form

$$y_n = \sum_{i=0}^{N-1} a_i x_{n-i} - \sum_{i=1}^{M} b_i y_{n-i}$$

then, again, the computation of $\log x_i$ may be moved ahead of the remaining filters.

Once more assuming 6-bit numbers, the cobweb array truncated to 6 bits would require 21 adders for each multiplication. An addition and exponentiation require 18 adders. Thus, if the number of filters $R$ is four or more, the log-antilog conversion would require less hardware.

Example 3-A Multiplicative Digital Filter: Recently, Oppenheim et al. [6] presented a general method for nonlinear filtering of multiplied and convolved signals. The multiplicative techniques were applied to audio compression and expansion and image enhancement. The block diagram of a multiplicative filter is shown in Fig. 5. If the input signal $S(t)$ consists of the product of two components $e(t)$ and $b(t)$, then the logarithm conversion reduces the process to the familiar additive process which may be filtered using linear techniques. The antilogarithm conversion reconstitutes the filtered signals.

The log and antilog conversions developed in this paper could be used for any digital realization of the multiplicative filter.

V. SUMMARY

Algorithms for approximate binary logarithms and exponents and some applications of these algorithms to digital filtering have been described. Since the maximum product and division errors occur as percentages of the operands, these algorithms are suited for high-speed hardware rather than general purpose computer applications. One such application is real-time digital filtering. The computations involved are usually sums of products of a variable and constant coefficients. Using the log-antilog algorithms gives complete freedom of coefficient selection. For single multiplications a cobweb array multiplier is simpler. However, for other configurations, such as a parallel filter bank, the digital log-antilog technique is less complex. Also, there are applications, such as multiplicative digital filters, where log and exponent conversions are necessary.

REFERENCES

[1] J. N. Mitchell, Jr., "Computer multiplication and division using binary logarithms," IRE Trans. Electronic Computers, vol. EC-11, pp. 512-517, August 1962.
[2] M. Combet, H. Van Zonneveld, and L. Verbeek, "Computation of the base two logarithm of binary numbers," IEEE Trans. Electronic Computers, vol. EC-14, pp. 863-867, December 1965.
[3] E. L. Hall, D. D. Lynch, and R. E. Young, "A digital modified discrete Fourier transform Doppler radar processor," 1968 EASCON Rec., pp. 150-159.
[4] J. F. Kaiser and F. Kuo, System Analysis by Digital Computer. New York: Wiley, 1966, pp. 218-277.
[5] C. M. Rader and B. Gold, "Digital filter design techniques in the frequency domain," Proc. IEEE, vol. 55, pp. 149-171, February 1967.
[6] A. V. Oppenheim, R. W. Schafer, and T. G. Stockham, Jr., "Nonlinear filtering of multiplied and convolved signals," Proc. IEEE, vol. 56, pp. 1264-1291, August 1968.

A Generalization of the Fast Fourier Transform

J. A. GLASSMAN

Abstract-A procedure for factoring of the N×N matrix representing the discrete Fourier transform is presented which does not produce shuffled data. Exactly one factor is produced for each factor of N, resulting in a fast Fourier transform valid for any N. The factoring algorithm enables the fast Fourier transform to be implemented in general with four nested loops, and with three loops if N is a power of two. No special logical organization, such as binary indexing, is required to unshuffle data. Included are two sample programs, one which writes the equations of the matrix factors employing the four key loops, and one which implements the algorithm in a fast Fourier transform for N a power of two. The algorithm is shown to be most efficient for N a power of two.

Index Terms-Cooley-Tukey algorithm, discrete Fourier transform, fast Fourier transform, mixed radix, spectral analysis.

The fast Fourier transform, extensively covered in current journals [1]-[7] and popularly termed the Cooley-Tukey algorithm [4], can be generalized for any number of coefficients N by a factoring which does not shuffle the data. This factoring has three major effects on the fast Fourier transform. First, the transform is more easily explained, since the complexity of the tree graph to trace data is eliminated. Second, the mechanization is simplified since the final data need not be unshuffled, and third, the fast Fourier transform may be conveniently applied to any number of coefficients, although an application to anything but a power of two or four may be only of academic interest. The application of the fast Fourier transform to any N, which may be termed a mixed radix algorithm, has appeared in recent articles [7]-[9], but no other algorithm has been found which does not involve scrambled data. This paper develops the basic matrix, demonstrates the algorithm for its factoring, and illustrates the factoring and the resulting fast Fourier transform with programs for time-sharing operation.

Manuscript received January 16, 1969; revised August 4, 1969.
The author is with Hughes Aircraft Company, Canoga Park, Calif.

THE DISCRETE FOURIER TRANSFORM

The Fourier transform $Y(\omega)$ of a function $x(t)$ is defined by the relation

$$Y(\omega) = \int_{-\infty}^{\infty} x(t) e^{-i\omega t} \, dt. \tag{1}$$

If $x(t)$ is sampled $N$ times, a sampled function $x^*(t)$ is produced, defined by

$$x^*(t) = \sum_{k=0}^{N-1} x(t) \, \delta(t - kT) \tag{2}$$

where $\delta(t)$ is the Dirac delta function. The Fourier transform of $x^*(t)$ is the discrete Fourier transform $Y^*(\omega)$:

$$Y^*(\omega) = \int_{-\infty}^{\infty} x^*(t) e^{-i\omega t} \, dt = \int_{-\infty}^{\infty} \sum_{k=0}^{N-1} x(t) \, \delta(t - kT) e^{-i\omega t} \, dt \tag{3}$$

$$Y^*(\omega) = \sum_{k=0}^{N-1} x(kT) e^{-i\omega kT}.$$
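The final sum for $Y^*(\omega)$ can be evaluated directly, which is the slow computation that a fast Fourier transform reorganizes. The sketch below is only an illustration of that sum; the function name `sampled_transform` and its argument order are assumptions made here, not notation from the paper.

```python
import cmath

def sampled_transform(samples, T, w):
    """Evaluate Y*(w) = sum_k x(kT) * exp(-i * w * k * T) by direct summation.

    samples: the values x(0), x(T), ..., x((N-1)T)
    T:       sampling interval
    w:       angular frequency at which the transform is evaluated
    """
    return sum(x * cmath.exp(-1j * w * k * T) for k, x in enumerate(samples))

# A constant signal concentrates at w = 0 and cancels at w = pi / T.
dc_value = sampled_transform([1.0, 1.0, 1.0, 1.0], 1.0, 0.0)
nyquist_value = sampled_transform([1.0, 1.0, 1.0, 1.0], 1.0, cmath.pi)
```

Evaluating this sum at N equally spaced frequencies costs on the order of N² complex multiplications, which is the cost the matrix factoring is meant to reduce.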