A Dual-Purpose Real/Complex Logarithmic Number System ALU
Mark G. Arnold
Computer Science and Engineering, Lehigh University

Sylvain Collange
ELIAUS, Université de Perpignan Via Domitia

Abstract: The real Logarithmic Number System (LNS) allows fast and inexpensive multiplication and division but more expensive addition and subtraction as precision increases. Recent advances in higher-order and multipartite table methods, together with cotransformation, allow real LNS ALUs to be implemented effectively on FPGAs for a wide variety of medium-precision special-purpose applications. The Complex LNS (CLNS) is a generalization of LNS which represents complex values in log-polar form. CLNS is a more compact representation than traditional rectangular methods, reducing the cost of busses and memory in intensive complex-number applications like the FFT; however, prior CLNS implementations were either slow CORDIC-based or expensive 2D-table-based approaches. This paper attempts to leverage the recent advances made in real-valued LNS units for the more specialized context of CLNS. This paper proposes a novel approach to reduce the cost of CLNS addition by re-using a conventional real-valued LNS ALU with specialized CLNS hardware that is smaller than the real-valued LNS ALU to which it is attached. The resulting ALU is much less expensive than prior fast CLNS units at the cost of some extra delay. The extra hardware added to the ALU is for trigonometric-related functions, and may be useful in LNS applications other than CLNS. The novel algorithm proposed here is implemented using the FloPoCo library (which incorporates recent HOTBM advances in function-unit generation), and FPGA synthesis results are reported.

Keywords: Complex Arithmetic, Logarithmic Number System, hardware function evaluation, FPGA.

I. INTRODUCTION

The usual approach to complex arithmetic works with pairs of real numbers that represent points in a rectangular-coordinate system. Multiplying two complex numbers, denoted in this paper by upper-case variables, $\bar{X}$ and $\bar{Y}$, using rectangular coordinates involves four real multiplications:

$$\Re[\bar{X}\bar{Y}] = \Re[\bar{X}] \cdot \Re[\bar{Y}] - \Im[\bar{X}] \cdot \Im[\bar{Y}] \tag{1}$$
$$\Im[\bar{X}\bar{Y}] = \Re[\bar{X}] \cdot \Im[\bar{Y}] + \Im[\bar{X}] \cdot \Re[\bar{Y}], \tag{2}$$

where $\Re[\bar{X}]$ is the real part and $\Im[\bar{X}]$ is the imaginary part of a complex value, $\bar{X}$. The bar indicates this is an exact linearly-represented unbounded-precision number. There are many implementation alternatives for the real arithmetic in (1) and (2): fixed-point, Floating-Point (FP), or more unusual systems like the scaled Residue Number System (RNS) [8] or the real Logarithmic Number System (LNS) [19]. Swartzlander et al. [19] analyzed fixed-point, FP, and LNS usage in an FFT implemented with rectangular coordinates, and found several advantages to using logarithmic arithmetic to represent $\Re[\bar{X}]$ and $\Im[\bar{X}]$. Several hundred papers [26] have considered conventional, real-valued LNS; most have found some advantages for low-precision implementation of multiply-rich algorithms, like (1) and (2).

This paper considers an even more specialized number system in which complex values are represented in log-polar coordinates. The advantage is that the cost of complex multiplication and division is reduced even more than in rectangular LNS, at the cost of a very complicated addition algorithm. This paper proposes a novel approach to reduce the cost of this log-polar addition algorithm by using a conventional real-valued LNS ALU, together with additional hardware that is less complex than the real-valued LNS ALU to which it is attached.

Arnold et al. [1] introduced a generalization of logarithmic arithmetic, known as the Complex-Logarithmic Number System (CLNS), which represents each complex point in log-polar coordinates. CLNS was inspired by a nineteenth-century paper by Mehmke [16] on manual usage of such log-polar representation, and shares only the polar aspect with the highly unusual complex-level-index representation [20]. The initial CLNS implementations, like the manual approach, were based on straight table lookup, which grows quite expensive as precision increases since the lookup involves a two-dimensional table.

Cotransformation [2] cuts the cost of CLNS significantly by converting difficult cases for addition and subtraction into easier cases, but the table sizes are still large.

Lewis [14] overcame such large area requirements in the design of a 32-bit CLNS ALU by using a CORDIC algorithm, which is significantly less expensive; however, the implementation involves many steps, making it rather slow. Despite the implementation cost, CLNS may be preferred in certain applications. For example, Arnold et al. [3], [4] showed that CLNS is significantly more compact than a comparable rectangular fixed-point representation for a radix-two FFT, making the memory and busses in the system much less expensive. These savings counterbalance the extra cost of the CLNS ALU if the precision requirements are low enough. Vouzis et al. [21] analyzed a CLNS approach for Orthogonal Frequency Division Multiplexing (OFDM) demodulation of Ultra-Wide Band (UWB) receivers. To reduce the implementation cost of such CLNS applications, Vouzis et al. [22] proposed using a Range-Addressable Lookup Table (RALUT), similar to [17].

II. REAL-VALUED LNS

The format of the real-valued LNS [18] has a base-$b$ (usually, $b = 2$) logarithm (consisting of $k_L$ signed integer bits and $f_L$ fractional bits) to represent the absolute value of a real number and often an additional sign bit to allow for that real to be negative. We can describe the ideal logarithmic transformation and the quantization with distinct notations. For an arbitrary non-zero real, $\bar{x}$, the ideal (infinite precision) LNS value is $x_L = \log_b |\bar{x}|$. The exact linear $\bar{x}$ has been transformed into an exact but non-linear $x_L$, which is then quantized as $\hat{x}_L = \lfloor x_L 2^{f_L} + 0.5 \rfloor 2^{-f_L}$. In analogy to the FP number system, the $k_L$ integer bits behave like an FP exponent (negative exponents mean smaller than unity; positive exponents mean larger than unity); the $f_L$ fractional bits are similar to the FP mantissa. Multiplication or division simply requires adding or subtracting the logarithms and exclusive-ORing the sign bits.

To compute real LNS sums, a conventional LNS ALU computes a special function, known as $s_b(z) = \log_b(1 + b^z)$, which, in effect, increments the LNS representation by 1.0. (The $s$ reminds us this function is only used for sums when the sign bits are the same.) The LNS addition algorithm starts with $x_L = \log_b |\bar{x}|$ and $y_L = \log_b |\bar{y}|$ already in LNS format. The result, $t_L = \log_b(\bar{x} + \bar{y})$, is computed as $t_L = y_L + s_b(x_L - y_L)$, which is justified by the fact that $\bar{t} = \bar{x} + \bar{y} = \bar{y}(\bar{x}/\bar{y} + 1)$, even though such a step never occurs in the hardware. Since $\bar{x} + \bar{y} = \bar{y} + \bar{x}$, we can always choose $z_L = -|x_L - y_L|$ [18] by simultaneously choosing the maximum of $x_L$ and $y_L$. In other words, $t_L = \max(x_L, y_L) + s_b(-|x_L - y_L|)$, thereby restricting $z < 0$.

Analogously to $s_b$, there is a similar function to compute the logarithm of the differences with $t_L = \max(x_L, y_L) + d_b(-|x_L - y_L|)$, where $d_b(z) = \log_b |1 - b^z|$. The decision whether to compute $s_b$ or $d_b$ is based on the signs of the real values.

Early LNS implementations [18] used ROM to look up $\hat{s}_b$ and $\hat{d}_b$, achieving moderate ($f_L = 12$ bits) accuracy. More recent methods [11], [12], [13], [23], [24] can obtain accuracy near single-precision floating point at reasonable cost. The goal here is to leverage the recent advances made in real-valued ...

... back exactly to rectangular form:

$$\Re[\bar{X}] = b^{X_L} \cos(X_\theta), \qquad \Im[\bar{X}] = b^{X_L} \sin(X_\theta). \tag{4}$$

In a practical implementation, both the logarithm and angle will be quantized:

$$\hat{X}_L = \lfloor X_L 2^{f_L} + 0.5 \rfloor 2^{-f_L}, \qquad \hat{X}_\theta = \lfloor X_\theta 2^{f_\theta + 2}/\pi + 0.5 \rfloor 2^{-f_\theta}. \tag{5}$$

We scale the angle by $4/\pi$ so that the quantization near the unit circle will be roughly the same in the angular and radial axes when $f_L = f_\theta$ while allowing the complete circle of $2\pi$ radians to be represented by a power of two. The rectangular value, $\tilde{X}$, represented by the quantized CLNS has the following real and imaginary parts:

$$\Re[\tilde{X}] = b^{\hat{X}_L} \cos\left(\tfrac{\pi}{4}\hat{X}_\theta\right), \qquad \Im[\tilde{X}] = b^{\hat{X}_L} \sin\left(\tfrac{\pi}{4}\hat{X}_\theta\right). \tag{6}$$

To summarize our notation, an arbitrary-precision complex value $\bar{X} \neq 0$ is transformed losslessly to its ideal CLNS representation, $X$. Quantizing $X$ to $\hat{X}$ produces an absolute error perceived by the user in the rectangular system as $|\bar{X} - \tilde{X}|$.

Complex multiplication and division are trivial in CLNS. Given the exact CLNS representations, $X$ and $Y$, the result of multiplication, $Z = X + Y$, can be computed by two parallel adders:

$$Z_L = X_L + Y_L, \qquad Z_\theta = (X_\theta + Y_\theta) \bmod 2\pi. \tag{7}$$

The modular operation for the imaginary part is not strictly necessary, but reduces the range of angles that the hardware needs to accept. In the quantized system, this reduction comes at no cost in the underlying binary adder because of the $4/\pi$ scaling.

For example, if $\bar{X} = -1 + i$ and $\bar{Y} = 4i$, the $b = 2$ polar-logarithmic representations are $X_L = 0.5 \log_b(1^2 + 1^2) = 0.5$, $X_\theta = 3\pi/4$ and $Y_L = 0.5 \log_b(4^2) = 2.0$, $Y_\theta = \pi/2$. The representation of the product is computed simply as $Z_L = X_L + Y_L = 0.5 + 2.0 = 2.5$ and $Z_\theta = X_\theta + Y_\theta = 3\pi/4 + \pi/2 = 5\pi/4$. We can verify manually that this is correct: $\bar{Z} = b^{2.5}(\cos(5\pi/4) + i \sin(5\pi/4)) = 4\sqrt{2}(-\sqrt{2}/2 - \sqrt{2}/2\, i) = -4 - 4i$.
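The operations described so far can be checked numerically with a small floating-point model. The sketch below is only an illustration, not the hardware algorithm of this paper: it evaluates $s_b$ and $d_b$ directly instead of by table lookup, and all function names, as well as the choice $f_L = f_\theta = 10$, are assumptions made for this example. It covers real LNS addition and subtraction via $\max(x_L, y_L) + s_b(-|x_L - y_L|)$ and its $d_b$ counterpart, and CLNS quantization, multiplication, and conversion following (4) through (7).

```python
import math

B = 2.0  # base b; the text notes that b = 2 is the usual choice


def s_b(z):
    """Sum function s_b(z) = log_b(1 + b^z)."""
    return math.log(1.0 + B ** z, B)


def d_b(z):
    """Difference function d_b(z) = log_b|1 - b^z|; singular at z = 0."""
    return math.log(abs(1.0 - B ** z), B)


def lns_add(x_l, y_l):
    """log_b(x + y) for positive x, y given as x_l = log_b(x), y_l = log_b(y)."""
    return max(x_l, y_l) + s_b(-abs(x_l - y_l))


def lns_sub(x_l, y_l):
    """log_b|x - y| for positive x != y given by their logarithms."""
    return max(x_l, y_l) + d_b(-abs(x_l - y_l))


def to_clns(x):
    """Ideal CLNS pair (X_L, X_theta) of a non-zero complex x, angle in [0, 2*pi)."""
    return math.log(abs(x), B), math.atan2(x.imag, x.real) % (2.0 * math.pi)


def clns_mul(x, y):
    """Eq. (7): add logarithms, add angles modulo 2*pi."""
    return x[0] + y[0], (x[1] + y[1]) % (2.0 * math.pi)


def from_clns(z):
    """Eq. (4): convert an ideal CLNS pair back to rectangular form."""
    mag = B ** z[0]
    return complex(mag * math.cos(z[1]), mag * math.sin(z[1]))


def quantize(z, f_l, f_theta):
    """Eq. (5): round the logarithm; scale the angle by 4/pi, then round."""
    q_l = math.floor(z[0] * 2 ** f_l + 0.5) * 2.0 ** -f_l
    q_t = math.floor(z[1] * 2 ** (f_theta + 2) / math.pi + 0.5) * 2.0 ** -f_theta
    return q_l, q_t


def clns_mul_quantized(x, y):
    """Eq. (7) on quantized values: the full circle is 8 after 4/pi scaling."""
    return x[0] + y[0], (x[1] + y[1]) % 8.0


def from_quantized(q):
    """Eq. (6): rectangular value represented by a quantized CLNS pair."""
    mag = B ** q[0]
    ang = math.pi / 4.0 * q[1]
    return complex(mag * math.cos(ang), mag * math.sin(ang))


# Worked example from the text: (-1 + i) * 4i
x, y = to_clns(-1 + 1j), to_clns(4j)
z = clns_mul(x, y)
zq = clns_mul_quantized(quantize(x, 10, 10), quantize(y, 10, 10))
print(z, from_clns(z))
print(zq, from_quantized(zq))
```

In the quantized path the angle sum wraps modulo 8 rather than modulo $2\pi$, which is the power-of-two property the $4/\pi$ scaling is meant to provide; in a fixed-point adder this wrap would simply be overflow of the integer bits.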