
EURASIP Journal on Applied Signal Processing 2002:9, 879–892 © 2002 Hindawi Publishing Corporation

Lightweight Floating-Point Arithmetic: Case Study of Inverse Discrete Cosine Transform

Fang Fang, Department of Electrical and Computer Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA. Email: [email protected]

Tsuhan Chen Department of Electrical and Computer Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA Email: [email protected]

Rob A. Rutenbar Department of Electrical and Computer Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA Email: [email protected]

Received 15 May 2001 and in revised form 9 May 2002

To enable floating-point (FP) signal processing applications in low-power mobile devices, we propose lightweight floating-point arithmetic. It offers a wider range of precision/power/speed/area trade-offs, but is wrapped in forms that hide the complexity of the underlying implementations from both multimedia software designers and hardware designers. Libraries implemented in C++ and Verilog provide flexible and robust floating-point units with variable bit-width formats, multiple rounding modes, and other features. This solution bridges the design gap between software and hardware, and accelerates the design cycle from algorithm to chip by avoiding the translation to fixed-point arithmetic. We demonstrate the effectiveness of the proposed scheme using the inverse discrete cosine transform (IDCT), in the context of video coding, as an example. Further, we implement the lightweight floating-point IDCT in hardware and demonstrate the power and area reduction.

Keywords and phrases: floating-point arithmetic, customizable bit-width, rounding modes, low-power, inverse discrete cosine transform, video coding.

1. INTRODUCTION

Multimedia processing has been finding more and more applications in mobile devices. A lot of effort must be spent to manage the complexity, power consumption, and time-to-market of modern multimedia system-on-chip (SoC) designs. However, multimedia algorithms are computationally intensive, rich in costly FP arithmetic operations rather than simple logic. FP arithmetic hardware offers a wide dynamic range and high computation precision, yet occupies large fractions of the total chip area and energy budget. Therefore, its application in mobile computing chips is highly limited. Many embedded microprocessors such as the StrongARM [1] do not include an FP unit due to its unacceptable hardware cost.

So there is an obvious gap in multimedia system development: software designers prototype these algorithms using high-precision FP operations to understand how the algorithm behaves, while the silicon designers ultimately implement these algorithms in integer-like hardware, that is, fixed-point units. This seemingly minor technical choice actually creates severe consequences: the need to use fixed-point operations often distorts the natural form of the algorithm, forces awkward design trade-offs, and even introduces perceptible artifacts. Error analysis and word length optimization of the fixed-point 2D IDCT (inverse discrete cosine transform) algorithm have been studied in [2], and a tool for translating FP algorithms to fixed-point algorithms was presented in [3]. However, such optimization and translation are based on human knowledge of the dynamic range, the precision requirements, and the relationship between the algorithm's architecture and its precision. This time-consuming and error-prone procedure often becomes the bottleneck of the entire system design flow.

In this paper, we propose an effective solution: lightweight FP arithmetic. This is essentially a family of customizable FP data formats that offer a wider range of precision/power/speed/area trade-offs, but wrapped in forms that hide the complexity of the underlying implementations from both multimedia algorithm designers and silicon designers. Libraries implemented in C++ and Verilog provide flexible and robust FP units with variable bit-width formats, multiple rounding modes, and other features. This solution bridges the design gap between software and hardware and accelerates the design cycle from algorithm to chip. Algorithm designers can translate FP arithmetic computations transparently to lightweight FP arithmetic and adjust the precision easily to what is needed. Silicon designers can use the standard ASIC or FPGA design flow to implement these algorithms using the arithmetic cores we provide, which consume less power than standard FP units. Manual translation from FP algorithms to fixed-point algorithms can be eliminated from the design cycle.

We test the effectiveness of our lightweight arithmetic library using an H.263 video decoder. Typical multimedia applications working with modest-resolution human sensory data such as audio and video do not need the whole dynamic range and precision that IEEE-standard FP offers. By reducing the complexity of FP arithmetic in many dimensions, such as narrowing the bit-width, simplifying the rounding methods and the exception handling, and even increasing the radix, we explore the impact of such lightweight arithmetic on both the algorithm performance and the hardware cost. Our experiments show that for the H.263 video decoder, an FP representation with less than half of the IEEE standard FP bit-width can produce almost the same perceptual video quality. Specifically, only 5 exponent bits and 8 mantissa bits for a radix-2 FP representation, or 3 exponent bits and 11 mantissa bits for a radix-16 FP representation, are all we need to maintain the video quality. We also demonstrate that a simple rounding mode is sufficient for video decoding and offers an enormous reduction in hardware cost. In addition, we implement a core algorithm in the video codec, the IDCT, in hardware using the lightweight arithmetic units. Compared to a conventional 32-bit FP IDCT, our approach reduces the power consumption by 89.5%.

The paper is organized as follows. Section 2 briefly introduces the relevant background on FP and fixed-point representations. Section 3 describes our C++ and Verilog libraries of lightweight FP arithmetic and the usage of the libraries. Section 4 explores the complexity reduction we can achieve for the IDCT built with our customizable library. Based on the results in this section, we present the implementation of lightweight FP arithmetic units and analyze the hardware cost reduction in Section 5. In Section 6, we compare the area/speed/power of a standard FP IDCT, a lightweight FP IDCT, and a fixed-point IDCT. Concluding remarks follow in Section 7.

2. BACKGROUND

2.1. Floating-point representation versus fixed-point representation

There are two common ways to specify real numbers: FP and fixed-point representations. FP can represent numbers on an exponential scale and is reputed for a wide dynamic range. The data format consists of three fields: sign, exponent, and fraction (also called mantissa), as shown in Figure 1.

Figure 1: FP representation: a sign bit s, an 8-bit exponent exp, and a 23-bit fraction frac; FP value = (−1)^s · 2^(exp − bias) · 1.frac (the leading 1 is implicit), 0 ≤ exp ≤ 255, bias = 127.

Dynamic range is determined by the exponent bit-width, and resolution is determined by the fraction bit-width. The widely adopted IEEE single-precision FP standard [4] uses an 8-bit exponent that can reach a dynamic range roughly from 2^−126 to 2^127, and a 23-bit fraction that can provide a resolution of 2^(exp−127) · 2^−23, where exp stands for the value represented by the exponent field.

In contrast, the fixed-point representation is on a uniform scale, that is, essentially the same as the integer representation, except for the fixed radix point. For instance (see Figure 2), a 32-bit fixed-point number with a 16-bit integer part and a 16-bit fraction part can provide a dynamic range of 2^−16 to 2^16 and a resolution of 2^−16.

Figure 2: Fixed-point number representation: a 16-bit integer part and a 16-bit fraction part.

When prototyping algorithms with FP, programmers do not have to be concerned about dynamic range and precision, because IEEE standard FP provides more than necessary for most general applications. Hence, float and double are standard parts of programming languages like C++, and are supported by most compilers. However, in terms of hardware, FP arithmetic operations need to deal with the three parts (sign, exponent, fraction) individually, which adds substantially to the complexity of the hardware, especially in the aspect of power consumption, while fixed-point operations are almost as simple as integer operations. If the system has a stringent power budget, the application of FP units has to be limited; on the other hand, a lot of manual work is spent in implementing and optimizing fixed-point algorithms to provide the necessary dynamic range and precision.

2.2. IEEE-754 floating-point standard

IEEE-754 is a standard for binary FP arithmetic [4]. Since our later discussion of lightweight FP is based on this standard, we give a brief review of its main features in this section.

Data format

The standard defines two primary formats, single precision (32 bits) and double precision (64 bits). The bit-widths of the three fields and the dynamic ranges of single- and double-precision FP are listed in Table 1.

Table 1: IEEE FP number format and dynamic range.

Format   Sign   Exp   Frac   Bias   Max            Min
Single   1      8     23     127    3.4 · 10^38    1.4 · 10^−45
Double   1      11    52     1023   1.8 · 10^308   4.9 · 10^−324

NaN is assigned when invalid operations occur, like (+∞) + (−∞), 0 · ∞, 0/0, etc.

Figure 4: Infinity and NaN representations: exponent field all 1s (11111111), with frac == 0 encoding infinity and frac != 0 encoding NaN.
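To make the encodings above concrete, the following C++ fragment (an illustrative sketch with our own naming, not part of the paper's library) decodes a normalized number with parameterizable exponent and fraction widths according to the formula in Figure 1; the denormalized and exceptional patterns of Figures 3 and 4 are deliberately ignored.

#include <cmath>
#include <cstdint>

// Illustrative decoder for a normalized FP word packed as [sign|exponent|fraction]:
// value = (-1)^s * 2^(exp - bias) * 1.frac, with 'ebits' exponent bits and
// 'fbits' fraction bits.
double decode_fp(uint32_t word, int ebits, int fbits) {
    uint32_t frac = word & ((1u << fbits) - 1u);            // fraction field
    uint32_t exp  = (word >> fbits) & ((1u << ebits) - 1u); // exponent field
    uint32_t sign = (word >> (fbits + ebits)) & 1u;         // sign bit
    int bias = (1 << (ebits - 1)) - 1;                      // 127 for 8 bits, 15 for 5 bits
    double mant = 1.0 + std::ldexp(double(frac), -fbits);   // implicit leading 1
    double val  = std::ldexp(mant, int(exp) - bias);        // apply 2^(exp - bias)
    return sign ? -val : val;
}

With ebits = 8 and fbits = 23 this reproduces the IEEE single-precision value of Figure 1; with ebits = 5 and fbits = 8 it decodes the 14-bit lightweight format studied in Section 4.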

Figure 3: Denormalized number format: exponent field all 0s (00000000), frac != 0, no implicit leading 1.

Rounding

The default rounding mode is round-to-nearest. If there is a tie between the two nearest neighbors, the number is rounded to the one whose least significant bit is zero. The three user-selectable rounding modes are round-toward +∞, round-toward −∞, and round-toward-zero (also called truncation).

Denormalization

Denormalization is a way to allow gradual underflow. For normalized numbers, because there is an implicit leading 1, the smallest positive value is 2^−126 · 1.0 for single precision (it is not 2^−127 · 1.0 because the exponent with all zeros is reserved for denormalized numbers). Values below this can be represented by a so-called denormalized format (Figure 3) that does not have the implicit leading 1. The value of a denormalized number is 2^−126 · 0.frac, and the smallest representable value is hence scaled down to 2^−149 (2^−126 · 2^−23). Denormalization provides graceful degradation of precision for computations on very small numbers. However, it complicates the hardware significantly and slows down the more common normalized cases.

Exception handling

There are five types of exceptions defined in the standard: invalid operation, division by zero, overflow, underflow, and inexact. As shown in Figure 4, some bit patterns are reserved for these exceptions. When numbers simply cannot be represented, the format returns a pattern called NaN (not-a-number) with information about the problem. NaNs provide an escape mechanism to prevent system crashes in case of invalid operations. In addition to assigning specific NaN bit patterns, some status flags and trapping signals are used to indicate exceptions.

From the above review, we can see that IEEE standard FP arithmetic has a strong capability to represent real numbers as accurately as possible and is very robust when exceptions occur. However, if the FP arithmetic unit is dedicated to a particular application, the IEEE-mandated 32- or 64-bit width may provide more precision and dynamic range than needed, and many other features may be unnecessary as well.

3. CUSTOMIZABLE LIGHTWEIGHT FLOATING-POINT LIBRARY

The goal of our customizable lightweight FP library is to provide more flexibility than IEEE standard FP in bit-width, rounding, and exception handling. We created matched C++ and Verilog FP arithmetic libraries that can be used during algorithm/circuit simulation and circuit synthesis. With the C++ library, software designers can simulate the algorithms with lightweight arithmetic and decide the minimal bit-width, rounding mode, and so forth, according to the numerical performance. Then, with the Verilog library, hardware designers can plug the parameterized FP arithmetic cores into the system and synthesize it to a gate-level circuit. Our libraries provide a way to move the FP design choices (bit-width, rounding, ...) upwards to the algorithm design stage and to better predict the performance during early algorithm simulation.

3.1. Easy-to-use C++ class Cmufloat for algorithm designers

Our lightweight FP class is called Cmufloat and is implemented by overloading the existing C++ arithmetic operators (+, −, ∗, /, ...). It allows direct operations, including assignment, between Cmufloat and any C++ data type except char. The bit-width of Cmufloat varies from 1 to 32, including sign, fraction, and exponent bits, and is specified during the variable declaration. Three rounding modes are supported: round-to-nearest, Jamming, and truncation, one of which is chosen by defining a symbol in an appropriate configuration file. Our rounding modes are explained in detail later. In Figure 5, we summarize the operators of Cmufloat and give some examples of its use.

Our implementation of lightweight FP offers two advantages. First, it provides a transparent mechanism for embedding Cmufloat numbers in programs. As shown in the example, designers can use Cmufloat as a standard C++ data type. Therefore, the overall structure of the source code can be preserved, and a minimal amount of work is spent in translating a standard FP program to a lightweight FP program. Second, the arithmetic operators are implemented by bit-level manipulation, which carefully emulates the hardware implementation. We believe the correspondence between software and hardware is more exact than in previous work [5, 6]. These other approaches appear to have implemented the operators by simply quantizing the results of standard FP operations into limited bits. That approach actually has more bit-width for the intermediate operations, while our approach guarantees that the results of all operations, including the intermediate results, are consistent with the hardware implementation. Hence, the numerical performance of the system observed during early algorithm simulation is more trustworthy.
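The paper does not reproduce the class internals; the sketch below shows, under our own simplifications, the operator-overloading pattern such a class can follow. The hypothetical requantize() helper truncates each result to FRAC fraction bits, standing in for the bit-level rounding and exponent handling described above.

#include <cmath>

// A minimal sketch of the operator-overloading idea behind Cmufloat
// (illustrative only: the real library works at the bit level, supports
// three rounding modes, and interoperates with all built-in types).
template <int FRAC, int EXP>
class LwFloat {
public:
    LwFloat(double v = 0.0) : val(requantize(v)) {}
    friend LwFloat operator+(LwFloat a, LwFloat b) { return LwFloat(a.val + b.val); }
    friend LwFloat operator-(LwFloat a, LwFloat b) { return LwFloat(a.val - b.val); }
    friend LwFloat operator*(LwFloat a, LwFloat b) { return LwFloat(a.val * b.val); }
    double value() const { return val; }
private:
    // Hypothetical helper: keep FRAC fraction bits plus the implicit leading 1.
    // Exponent-range clamping to EXP bits is omitted for brevity.
    static double requantize(double v) {
        if (v == 0.0) return 0.0;
        int e;
        double m = std::frexp(v, &e);                         // v = m * 2^e, 0.5 <= |m| < 1
        double scaled = std::trunc(std::ldexp(m, FRAC + 1));  // keep FRAC+1 significant bits
        return std::ldexp(scaled, e - (FRAC + 1));            // truncation (round-toward-zero)
    }
    double val;
};

A declaration such as LwFloat<8, 5> a = 0.5; then mirrors the Cmufloat declarations shown in Figure 5, with every stored value already quantized to the lightweight format.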

(a) Operators with Cmufloat: arithmetic (+, −, ∗, /), comparison (==, >=, >, <=, <, !=), and assignment (=) between Cmufloat and Cmufloat, double, float, int, or short operands.

(b) Examples of Cmufloat:

Cmufloat<14, 5> a = 0.5;  // 14-bit fraction and 5-bit exponent
Cmufloat<> b = 1.5;       // Default is IEEE-standard float
Cmufloat<18, 6> c[2];     // Define an array
float fa;

c[1] = a + b;
fa = a * b;               // Assign the result to float
c[2] = fa + c[1];         // Operation between float and Cmufloat
cout << c[2];             // I/O stream
func(a);                  // Function call

Figure 5: Operators and examples of Cmufloat.

3.2. Parameterized Verilog library for silicon designers

We provide a rich set of lightweight FP arithmetic units (adders, multipliers) in the form of parameterized Verilog. First, designers can choose implementations according to the rounding mode and the exception handling. Then they can specify the bit-widths of the fraction and exponent by parameters. With this library, silicon designers are able to simulate the circuit at the behavioral level and synthesize it into a gate-level netlist. The availability of such cores makes possible a wider set of design trade-offs (power, speed, area, accuracy) for multimedia tasks.

4. REDUCING THE COMPLEXITY OF FLOATING-POINT ARITHMETIC FOR A VIDEO CODEC IN MULTIPLE DIMENSIONS

Most multimedia applications process modest-resolution human sensory data, which allows the hardware implementation to use low-precision arithmetic computations. Our work aims to find out how much the precision and the dynamic range can be reduced from the IEEE standard FP without perceptual quality degradation. In addition, the impact of other features of the IEEE standard, such as the rounding mode, denormalization, and the radix choice, is also studied.

Specifically, we target the IDCT algorithm in an H.263 video codec, since it is the only module that really uses FP computations in the codec, and it is also common in many other media applications, such as image processing, audio compression, and so forth.

In Figure 6, we give a simplified diagram of a video codec. On the decoder side, the input compressed video is put into the IDCT after inverse quantization. After some FP computations in the IDCT, the DCT coefficients are converted to pixel values that are the differences between the previous frame and the current frame. The last step is to obtain the current frame by adding up the outputs of the IDCT and the previous frame after motion compensation.

Considering the IEEE representation of FP numbers, there are five dimensions that we can explore in order to reduce the hardware complexity (Table 2). An accuracy versus hardware cost trade-off is made in each dimension. In order to measure the accuracy quantitatively, we integrate the Cmufloat IDCT into a complete video codec and measure the PSNR (peak signal-to-noise ratio) of the decoded video, which reflects the decoded video quality:

PSNR = 10 \log_{10} \frac{255^2}{\frac{1}{N}\sum_{i=0}^{N-1} (p_i - f_i)^2},    (1)

where N is the total number of pixels, p_i stands for the pixel value decoded by the lightweight FP algorithm, and f_i stands for the reference pixel value of the original video.
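For reference, (1) translates directly into code; in the sketch below (our own helper, not part of the codec), p and f hold the decoded and original pixels.

#include <cmath>
#include <cstddef>

// PSNR of a decoded frame p[] against the reference f[], following (1).
double psnr(const unsigned char* p, const unsigned char* f, std::size_t n) {
    double mse = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double d = double(p[i]) - double(f[i]);    // per-pixel difference
        mse += d * d;
    }
    mse /= double(n);                              // mean squared error
    return 10.0 * std::log10(255.0 * 255.0 / mse); // peak signal is 255
}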

Figure 6: Video codec diagram (encoder: DCT, Q, IQ, IDCT, motion compensation, delay; decoder: IQ, IDCT, motion compensation, delay; the Cmufloat algorithm replaces the FP blocks). Q: quantizer, IQ: inverse quantizer, D: delay.

Table 2: Working dimensions for lightweight FP.

Dimension                        Description
Smaller exponent bit-width       Reduce the number of exponent bits at the expense of dynamic range
Smaller fraction bit-width       Reduce the number of fraction bits at the expense of precision
Simpler rounding mode            Choose a simpler rounding mode at the expense of precision
No support for denormalization   Do not support denormalization at the expense of precision
Higher radix                     Increase the implied radix from 2 to 16 for the FP exponent (higher-radix FP needs fewer exponent bits and more fraction bits than radix-2 FP to achieve comparable dynamic range and precision)

4.1. Reducing the exponent and fraction bit-width

Reducing the exponent bit-width

The exponent bit-width determines the dynamic range. Using a 5-bit exponent as an example, we derive the dynamic range in Table 3. Complying with the IEEE standard, the exponent with all 1s is reserved for infinity and NaN. With a bias of 15, the dynamic range for a 5-bit exponent is from 2^−14 or 2^−15 to 2^16, depending on the support for denormalization.

Table 3: Dynamic range of a 5-bit exponent.

Exponent                                                   Value of exp − bias   Dynamic range
Biggest exponent: 11110                                    15                    2^16
Smallest exponent (with support for denormalization): 00001   −14                2^−14
Smallest exponent (no support for denormalization): 00000     −15                2^−15

Figure 7: Histogram of the exponent values collected during decoding.

In order to decide the necessary exponent bit-width for our IDCT, we collected the exponents of all the variables in the IDCT algorithm during the decoding of a video sequence (see Figure 7). The range of these exponents lies in [−22, 10], which is consistent with the theoretical result in [7]. From the dynamic range analysis above, we know that a 5-bit exponent can almost cover such a range, except when numbers are extremely small, while a 6-bit exponent can cover the entire range. However, our experiment shows that a 5-bit exponent is able to produce the same PSNR as an 8-bit exponent.
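The entries of Table 3 follow mechanically from the biased-exponent rules; the small helper below (our own illustration) prints the normalized dynamic range for any exponent width.

#include <cstdio>

// Normalized dynamic range implied by an e-bit exponent (IEEE-style bias).
// With denormal support the all-0s exponent is reserved, so the smallest
// normalized exponent is 1 - bias; without it, 0 - bias is usable.
void dynamic_range(int ebits, bool denorm_support) {
    int bias   = (1 << (ebits - 1)) - 1;   // 15 for a 5-bit exponent
    int maxExp = (1 << ebits) - 2;         // all-1s is reserved for Inf/NaN
    int minExp = denorm_support ? 1 : 0;
    std::printf("range: 2^%d .. 2^%d\n", minExp - bias, maxExp - bias + 1);
}

For ebits = 5 this prints 2^−14 .. 2^16 with denormal support and 2^−15 .. 2^16 without, matching Table 3.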

Reducing the fraction bit-width

Reducing the fraction bit-width is the most practical way to lower the hardware cost, because the complexity of an integer multiplier decreases quadratically with decreasing bit-width [8]. On the other hand, the accuracy is degraded when narrowing the bit-width. The influence of decreasing bit-width on video quality is shown in Figure 8.

As we can see in the curve, PSNR remains almost constant across a rather wide range of fraction bit-widths, which means that the fraction width does not affect the decoded video quality in this range. The cutoff point where PSNR starts dropping is as small as 8 bits, which is about 1/3 of the fraction width of IEEE standard FP. The difference in PSNR between an 8-bit fraction and a 23-bit fraction is only 0.22 dB, which is almost imperceptible to human eyes. One frame of the video sequence is compared in Figure 9. The top one is decoded by a full-precision FP IDCT, and the bottom one by a 14-bit FP IDCT. From the perceptual quality point of view, it is hard to tell the difference between the two.

From the above analysis, we can reduce the total bit-width from 32 bits to 14 bits (1-bit sign + 5-bit exponent + 8-bit fraction), while preserving good perceptual quality. In order to generalize our result, the same experiment was carried out on three other video sequences: Akiyo, Stefan, and Mobile, all of which have around 0.2 dB degradation in PSNR when 14-bit FP is applied.

The relationship between video compression ratio and the minimal bit-width

For streaming video, the lower the bit rate, the worse the video quality. The difference in bit rate is mainly caused by the quantization step size during encoding. A larger step size means coarser quantization and therefore worse video quality. Considering the limited wireless network bandwidth and the low display quality of mobile devices, a relatively low bit rate is preferred for transferring video in this situation. Hence, we want to study the relationship between the compression ratio, or quantization step size, and the minimal bit-width.

Figure 8: PSNR versus fraction width (the PSNR stays flat down to the cutoff point near an 8-bit fraction).

Figure 10: Comparison of different quantization step sizes (4, 8, and 16): PSNR versus fraction width.

Figure 11: Comparison of intracoding and intercoding: PSNR versus fraction width.

Figure 9: Video quality comparison. (a) One frame decoded by 32-bit FP (1-bit sign + 8-bit exponent + 23-bit fraction); (b) the same frame decoded by 14-bit FP (1-bit sign + 5-bit exponent + 8-bit fraction).

We compare the PSNR curves obtained by experiments with different quantization step sizes in Figure 10. From the figure, we can see that for a larger quantization step size, the PSNR is lower, but at the same time the curve drops more slowly, and the minimal bit-width can be reduced further. This is because coarse quantization hides more computation error under the quantization noise. Therefore, less computational precision is needed for a video codec using a larger quantization step size.

The relationship between inter/intra-coding and the minimal bit-width

Intercoding refers to the coding of each video frame with reference to the previous video frame. That is, only the difference between the current frame and the previous frame is coded, after motion compensation. Since in most videos adjacent frames are highly correlated, intercoding provides very high efficiency.

Based on the assumption that video frames have some correlation, for intercoding, the differences coded as the inputs to the DCT are typically much smaller than regular pixel values. Accordingly, the DCT coefficients, or inputs to the IDCT, are also smaller than those in intracoding. One property of FP numbers is that the representation error is smaller when the number to be represented is smaller. From Figure 11, the PSNR of intercoding drops more slowly than that of intracoding, or the minimum bit-width of intercoding can be 1 bit less than that of intracoding.

However, if the encoder uses the full-precision FP IDCT while the decoder uses the lightweight FP IDCT, then the error propagation effect of intercoding cannot be eliminated. In that case, intercoding does not have the above advantage.

The results in this section demonstrate that some programs do not need the extreme precision and dynamic range provided by IEEE standard FP. Applications dealing with modest-resolution human sensory data can tolerate some computation error in the intermediate or even final results, while giving similar human perceptual results. Some experiments on other applications also agree with this assertion. We also applied Cmufloat to an MP3 decoder. The bit-width can be cut down to 14 bits (1-bit sign + 6-bit exponent + 7-bit fraction) and the noise behind the music is still not perceptible. In the CMU Sphinx application, a speech recognizer, 11-bit FP (1-bit sign + 6-bit exponent + 4-bit fraction) can maintain the same recognition accuracy as 32-bit FP. Such dramatic bit-width reduction offers an enormous advantage that may broaden the application of lightweight FP units in mobile devices.

Finally, we note that the numerical analysis and precision optimization in this section can be implemented in a semiautomated way with appropriate compiler support. We can extend an existing C++ compiler to handle lightweight arithmetic operations and assist the process of exploring the precision trade-offs with less programmer intervention. This will unburden designers from translating code manually into proper limited-precision formats.
4.2. Rounding modes

When an FP number cannot be represented exactly, or the intermediate result exceeds the allowed bit-width during computation, the number is rounded, introducing an error less than the value of the least significant bit. Among the four rounding modes specified by the IEEE FP standard, round-to-nearest, round-to-(+∞), and round-to-(−∞) need an extra adder in the critical path, while round-toward-zero is the simplest in hardware but the least accurate. Since round-to-nearest has the most accuracy, we implement it as our baseline of comparison. There is another classical alternative mode that may have potential in both accuracy and hardware cost: von Neumann rounding, also called Jamming [9]. We discuss these three rounding modes (round-to-nearest, Jamming, round-toward-zero) in detail.

Round-to-nearest

In the standard FP arithmetic implementation, there are three bits beyond the significant bits that are kept for intermediate results [10] (see Figure 12); the sticky bit is the logical OR of all bits thereafter.

Figure 12: Guard, round, and sticky bits beyond the least significant bit of the fraction.

These three tail bits participate in rounding in the following way, where b is the least significant bit that is kept:

tail bits 000 to 011: truncate the tail bits;
tail bits 100 (a tie): if b is 1, add 1 to b; if b is 0, truncate the tail bits;
tail bits 101 to 111: add 1 to b.

Round-to-nearest is the most accurate rounding mode, but it needs some comparison logic and a carry-propagate adder in hardware. Further, since the rounding can actually increase the fraction magnitude, it may require extra normalization steps, which cause additional fraction and exponent calculations.

Jamming

The rule for Jamming rounding is as follows: if b is 1, truncate the 3 tail bits; if b is 0 and there is a 1 among the 3 tail bits, add 1 to b; if b and the 3 tail bits are all 0, truncate the tail bits. Essentially, it is the function of an OR gate (see Figure 13).

Figure 13: Jamming rounding: the tail bits are ORed into the least significant bit b.

Jamming is extremely simple as hardware, almost as simple as truncation, but numerically more attractive for one subtle but important reason. The rounding created by truncation is biased; the rounded result is always smaller than the correct value. Jamming, by sometimes forcing a 1 into the least significant bit position, is unbiased. The magnitude of Jamming errors is not different from truncation, but the mean of the errors is zero. This important distinction was recognized by von Neumann almost 50 years ago [9].

Round-toward-zero

The operation of round-toward-zero is just truncation. This mode has no overhead in hardware, and it does not have to keep 3 more bits for the intermediate results. So it is much simpler in hardware than the first two modes.

The PSNR curves for the same video sequence obtained using these three rounding modes are shown in Figure 14. The three rounding modes produce almost the same PSNR when the fraction bit-width is more than 8 bits. At the point of 8 bits, the PSNR of truncation is about 0.2 dB worse than the other two. On the other hand, from the hardware point of view, Jamming is much simpler than round-to-nearest, and truncation is the simplest among the three modes. So the trade-off between quality and complexity is between Jamming and truncation. We will finalize the choice of rounding mode in the hardware implementation section.
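The three rules can be stated compactly in code. The sketch below (an illustrative model of our own, not the Verilog implementation) operates on a fraction frac that has already been truncated to its final width and on the 3-bit tail (guard, round, sticky) of Figure 12; a carry out of the fraction, which would trigger renormalization, is ignored here.

#include <cstdint>

// 'frac' is the fraction kept at its final width; 'tail' holds the 3 bits
// beyond it (guard, round, sticky). Each function returns the rounded fraction.
uint32_t round_to_nearest(uint32_t frac, uint32_t tail) {
    if (tail > 0x4) return frac + 1;         // above the halfway point: round up
    if (tail < 0x4) return frac;             // below: truncate
    return (frac & 1) ? frac + 1 : frac;     // tie: round so that the LSB is 0
}

uint32_t round_jamming(uint32_t frac, uint32_t tail) {
    return frac | (tail != 0);               // OR the tail into the LSB
}

uint32_t round_toward_zero(uint32_t frac, uint32_t /*tail*/) {
    return frac;                             // plain truncation
}

The single OR in round_jamming is exactly the OR-gate behavior of Figure 13, which is why its hardware cost is so close to truncation.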

Figure 14: Comparison of rounding modes (round-to-nearest, Jamming, truncation): PSNR versus fraction width.

4.3. Denormalization

The IEEE standard allows for a special set of non-normalized numbers that represent magnitudes very close to zero. We illustrate this in Figure 15 with an example of a 3-bit fraction. Without denormalization, there is an implicit 1 before the fraction, so the actual smallest fraction is 1.000, while with denormalization the leading 1 is not enforced, so the smallest fraction is scaled down to 0.001. This mechanism provides more precision for scientific computation with small numbers, but for multimedia applications, especially for a video codec, do those small numbers during the computation affect the video quality?

Figure 15: Denormalization with a 3-bit fraction: without denormalization the fractions span 1.000 to 1.111; with denormalization they extend down from 0.111 to 0.001.

We experimented on the IDCT with a 5-bit exponent Cmufloat representation. Five bits were chosen to ensure that no overflow would happen during the computation. But from the histogram of Figure 7, there are still some numbers below the threshold of normalized numbers. That means that if denormalization is not supported, these numbers will be rounded to zero. However, the experiment shows that the PSNRs with and without denormalization are the same, which means that denormalization does not affect the decoded video quality at all.

4.4. Higher radix for FP exponent

The exponent of the IEEE standard FP is based on radix 2. Historically, there are also systems based on radix 16, for example, the IBM 390 [11]. The advantage of radix 16 lies mainly in fewer types of shifting during prealignment and normalization, which can reduce the shifter complexity in the FP adder and multiplier. We will discuss this issue in Section 5.4 when we discuss the hardware implementation in more detail.

The potential advantage of a higher radix such as 16 is that a smaller exponent bit-width is needed for the same dynamic range as radix-2 FP, while the disadvantage is that a larger fraction bit-width has to be chosen to maintain comparable precision. We analyze these features in the following.

Exponent bit-width

The dynamic range represented by an i-bit exponent is approximately from β^{−2^{i−1}} to β^{2^{i−1}}, where β is the radix. Assume we use an i-bit exponent for radix-2 FP and a j-bit exponent for radix-16 FP. If they have the same dynamic range, then 2^{2^{i−1}} = 16^{2^{j−1}} = 2^{4·2^{j−1}} = 2^{2^{j+1}}, so i − 1 = j + 1, or j = i − 2. Specifically, if the exponent bit-width for radix-2 is 5, then only 3 bits are needed for the radix-16 FP to reach approximately the same range.

Fraction bit-width

The precision of an FP number is mainly determined by the fraction bit-width, but the radix of the exponent also plays a role, due to the way that normalization works. Normalization ensures that no number can be represented with two or more bit patterns in the FP format, thus maximizing the use of the finite number of bit patterns. Radix-2 numbers are normalized by shifting to ensure a leading 1 in the most significant fraction bit. The IEEE format actually makes this leading 1 implicit, that is, it is not physically stored in the number.

For radix 16, however, normalization means that the first digit of the fraction, that is, the most significant 4 bits after the radix point, is never 0000. Hence there are four bit patterns that can appear as the leading digit of a radix-16 fraction (see Table 4). In other words, the radix-16 fraction uses its available bits in a less efficient way, because the leading zeros reduce the number of significant bits of precision. We analyze the loss of significant precision bits in Table 4.

The significant bit-width of a radix-2 i-bit fraction is i + 1, while for a radix-16 j-bit fraction the significant bit-width is j, j − 1, j − 2, or j − 3, each with probability 1/4. The minimum fraction bit-width of a radix-16 FP that can guarantee precision no less than radix-2 must satisfy the following inequality:

\min\{j, j − 1, j − 2, j − 3\} \ge i + 1,    (2)

so the minimum fraction bit-width is j = i + 4. Actually, it can provide more precision than radix-2 FP, since j, j − 1, and j − 2 are then larger than i + 1.

From the previous discussions, we know that 14-bit radix-2 FP (1-bit sign + 5-bit exponent + 8-bit fraction) can produce good video quality. Moving to radix-16, two fewer exponent bits (3 bits) and four more fraction bits (12 bits), or 16 bits in total, guarantee comparable video quality.
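Normalization in radix 16 moves the fraction only in 4-bit steps, which is the property exploited by the hardware in Section 5.4. The following sketch (our own illustration, with a 12-bit fraction held in an unsigned integer) makes this explicit: for a nonzero fraction only the shift amounts 0, 4, and 8 can occur, so the normalizer reduces to a 3-to-1 mux.

// Normalize a radix-16 fraction: shift left by one hex digit (4 bits) until
// the leading digit is nonzero, adjusting the radix-16 exponent by 1 per step.
// frac12 holds a 12-bit fraction (three hex digits); exp16 is its exponent.
void normalize16(unsigned& frac12, int& exp16) {
    while (frac12 != 0 && (frac12 & 0xF00) == 0) {
        frac12 = (frac12 << 4) & 0xFFF;  // one-digit (4-bit) shift step
        exp16 -= 1;                      // value preserved: frac * 16^exp
    }
}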

Table 4: Comparison of radix-2 to radix-16.

FP format                  Normal form of fraction   Range of fraction   Significant bits
Radix-2 (i-bit fraction)   1.xx···xx                 1 ≤ f < 2           i + 1
Radix-16 (j-bit fraction)  .1xx···x                  1/2 ≤ f < 1         j
                           .01xx···x                 1/4 ≤ f < 1/2       j − 1
                           .001x···x                 1/8 ≤ f < 1/4       j − 2
                           .0001x···x                1/16 ≤ f < 1/8      j − 3

Since the minimum fraction bit-width is derived from a worst-case analysis, it could be reduced further from the perspective of average precision. Applying radix-16 Cmufloat to the video decoder, we can see that the cutoff point is actually at an 11-bit, not a 12-bit, fraction width (Figure 16 and Table 5).

Figure 16: Comparison of radix-2 and radix-16: PSNR versus fraction width.

Table 5: An 11-bit fraction for radix-16 is sufficient.

FP format                    PSNR
Radix-2 (8-bit fraction)     38.529
Radix-16 (12-bit fraction)   38.667
Radix-16 (11-bit fraction)   38.536
Radix-16 (10-bit fraction)   38.007

After the discussion in each working dimension, we summarize the lightweight FP design choices for the H.263 video codec in the following:

Data format: 14-bit radix-2 FP (5-bit exponent + 8-bit fraction) or 15-bit radix-16 FP (3-bit exponent + 11-bit fraction).
Rounding: Jamming or truncation.
Denormalization: not supported.

The final choice of data format and rounding mode is made in Section 5 according to the hardware cost.

In all discussions in this section, we use PSNR as the measurement of the algorithm. However, we need to mention that there is an IEEE standard specifying the precision requirement for 8 × 8 IDCT implementations [12]. The standard gives the following specification:

\mathrm{omse} = \frac{\sum_{i=0}^{7}\sum_{j=0}^{7}\sum_{k=1}^{10000} e_k^2(i, j)}{64 \times 10000} \le 0.02,    (3)

where omse is the overall mean square error, e_k(i, j) is the pixel difference between the reference and the proposed IDCT, and (i, j) is the position of the pixel in the 8 × 8 block. A PSNR specification can be derived from omse: PSNR = 10 \log_{10}(255^2/\mathrm{omse}) \ge 65.1 dB, which is too tight for videos/images displayed by mobile devices. The other reason we did not choose this standard is that it uses uniformly distributed random numbers as input pixels, which eliminates the correlation between pixels and enlarges the FP computation error. We also did experiments based on the IEEE standard. It turns out that around a 17-bit fraction is required to meet all the constraints. From the PSNR curves in this section, we know that PSNR stays almost constant for fraction widths from 17 bits down to 9 bits. The experimental results thus support our claim that the IEEE standard specification for the IDCT is too strict for encoding/decoding real video sequences.
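The pass criterion in (3) can be checked with a few lines. The sketch below is our own illustration, with hypothetical array arguments ref and test holding the reference and proposed IDCT outputs for nblocks test blocks; it accumulates omse exactly as in the formula.

#include <cstddef>

// omse of (3): mean squared pixel difference over all 8x8 blocks.
double omse(const short ref[][8][8], const short test[][8][8], std::size_t nblocks) {
    double sum = 0.0;
    for (std::size_t k = 0; k < nblocks; ++k)
        for (int i = 0; i < 8; ++i)
            for (int j = 0; j < 8; ++j) {
                double e = double(ref[k][i][j]) - double(test[k][i][j]);
                sum += e * e;
            }
    return sum / (64.0 * double(nblocks));  // pass criterion: omse <= 0.02
}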

5. HARDWARE IMPLEMENTATION OF LIGHTWEIGHT FP ARITHMETIC UNITS

FP addition and multiplication are the most frequent FP operations. A lot of work has been published on IEEE-compliant FP adders and multipliers, focusing on reducing the latency of the computation. In the IBM RISC System/6000, a leading-zero anticipator was introduced in the FP adder [13]. The SNAP project proposed a two-path approach for the FP adder [14]. However, the benefit of these algorithms is not significant, and the penalty in area is not ignorable, when the bit-width is very small. In this section, we present structures of the FP adder and multiplier appropriate for narrow bit-widths and study the impact of different rounding/exception handling/radix schemes.

Our design is based on the Synopsys DesignWare library and the STMicroelectronics 0.18 µm technology library. The area and latency are measured on the gate-level circuit by Synopsys DesignCompiler, and the power consumption is measured by the Cadence Verilog-XL simulator and Synopsys DesignPower.

Figure 17: Diagrams of the FP adder and multiplier. (a) Adder: prealignment, exception detection, fraction addition, normalization, rounding, exception handling. (b) Multiplier: exponent addition, fraction multiplication, normalization, rounding, exception handling.

Figure 18: Area (µm²) and delay (ns) comparisons of three shifter structures (N-to-1 Mux, two-stage, logarithmic) at 8-, 16-, and 32-bit widths.

5.1. Structure

Shifter 5.2. Rounding The core component in the prealignment and normalization In our lightweight FP arithmetic operations, round-to- is a shifter. There are three common architectures for shifter: nearest is not considered because of the heavy hardware over- N-to-1 Mux shifter is appropriate when N is small. head. Jamming rounding demonstrates similar performance Logarithmic shifter [15] uses log(N) stages and each stage as round-to-nearest in the example of video codec. But it handles a single, power-of-2 shifts. This architecture has still has to keep three more bits in each stage in the FP compact area, but the timing path is long when N is big. adder, which becomes significant, especially for narrow bit- Fast two-stage shifter is used in IBM RISC System/6000 width cases. The other candidate is round-toward-zero be- [16]. The first stage shifts (0, 4, 8, 12,...) bit positions, and cause its performance is close to Jamming rounding at the the second stage shifts (0, 1, 2, 3) bit position. cutoff point. Table 6 shows the reduction in both of the area The comparison of these three architectures over differ- and delay when changing rounding mode from Jamming to ent bit-widths is shown in Figure 18. It indicates that for round-toward-zero. Since 15% reduction in the area of FP narrow bit-width (8 ∼ 16), logarithmic shifter is the best adder can be obtained, we finally choose truncation as the choice considering both of area and delay, but when the rounding mode. Lightweight Floating-Point Arithmetic: Case Study of Inverse Discrete Cosine Transform 889

5.2. Rounding

In our lightweight FP arithmetic operations, round-to-nearest is not considered because of its heavy hardware overhead. Jamming rounding demonstrates performance similar to round-to-nearest in the video codec example, but it still has to keep three more bits in each stage of the FP adder, which becomes significant, especially for narrow bit-widths. The other candidate is round-toward-zero, because its performance is close to Jamming rounding at the cutoff point. Table 6 shows the reduction in both area and delay when changing the rounding mode from Jamming to round-toward-zero. Since a 15% reduction in the area of the FP adder can be obtained, we finally choose truncation as the rounding mode.

Table 6: Comparison of rounding modes.

(a) 14-bit FP adder.
Rounding mode   Area (µm²)     Delay (ns)
Jamming         8401           10.21
Truncation      7123 (−15%)    9.43 (−7.6%)

(b) 14-bit FP multiplier.
Rounding mode   Area (µm²)     Delay (ns)
Jamming         7893           5.8
Truncation      7741 (−1.9%)   5.71 (−1.6%)

Table 7: Comparison of exception handling.

(a) 14-bit FP adder.
Exception handling   Area (µm²)    Delay (ns)
Full                 8401          10.21
Partial              7545 (−10%)   9.26 (−9.3%)

(b) 14-bit FP multiplier.
Exception handling   Area (µm²)    Delay (ns)
Full                 7893          5.8
Partial              7508 (−4.9%)  4.62 (−20%)

Table 8: Comparison of radix 2 and radix 16.

(a) FP adder.
Radix   Area (µm²)     Delay (ns)
2       8401           10.21
16      7389 (−12%)    8.48 (−17%)

(b) FP multiplier.
Radix   Area (µm²)     Delay (ns)
2       7893           5.8
16      11284 (+43%)   6.62 (+14%)

Table 9: Comparison of lightweight FP and IEEE standard FP units.

(a) FP adder.
Data format     Area (µm²)   Delay (ns)   Power (mW)
Lightweight     10666        5.67         16.5
IEEE standard   51830        14.5         100.1

(b) FP multiplier.
Data format     Area (µm²)   Delay (ns)   Power (mW)
Lightweight     5206         7.49         6.9
IEEE standard   19943        21.7         29.3

5.3. Exception handling

For an IEEE-compliant FP arithmetic unit, a large portion of the hardware in the critical timing path is dedicated to rare exceptional cases, for example, overflow, underflow, infinity, NaN, and so forth. If the exponent bit-width is enough to avoid overflow, then infinity and NaN will not occur during computation. Then, in the FP adder and multiplier diagrams (Figure 17), exception detection is not needed, and only underflow is detected in exception handling. As a result, the delay of the FP adder and multiplier is reduced by 9.3% and 20%, respectively, using partial exception handling (Table 7).

5.4. Radix

The advantage of higher-radix FP is a less complex shifter. From Section 4, we know that in the video codec, the precision of radix-16 FP with an 11-bit fraction is close to that of radix-2 FP with an 8-bit fraction. We illustrate the difference in the shifters in Figure 19. The step size of shifting for radix-16 is four, and only three shifting positions are needed. Such a simple shifter can be implemented by the structure of a 3-to-1 Mux. Although there are 3 more bits in the fraction, the FP adder still benefits from the higher radix in both area and delay (Table 8).

Figure 19: Shifting positions for (a) radix-2 FP (8-bit fraction + 1 leading bit) and (b) radix-16 FP (11-bit fraction).

On the other hand, more fraction width increases the complexity of the FP multiplier. The size of a multiplier increases about quadratically with the bit-width, so only 3 more bits increase the multiplier's area by 43% (Table 8). From the table, it is clear that radix-16 is not always better than radix-2. In a certain application, if there are more adders than multipliers, then radix-16 is a better choice than radix-2. In our IDCT structure, there are 29 adders and 11 multipliers. Therefore, radix-16 is chosen for the implementation of the IDCT.

Combining all the optimization strategies in this section, the final 15-bit radix-16 FP adder and multiplier are compared with the IEEE-standard-compliant single-precision FP adder and multiplier in Table 9. By reducing the bit-width and simplifying the rounding/shifting/exception handling, the power consumption of the FP arithmetic unit is cut down to around 1/5 of the IEEE standard FP unit.

Further optimization can be conducted in two directions. One is a low-power design approach. As proposed in [17], a triple-datapath FP adder structure can reduce the power-delay product by 16X. The other is transistor-level optimization. Shifters and multiplexors designed at the transistor level are much smaller and faster than those implemented as logic gates [15, 18].

Figure 20: Structure of the IDCT: butterfly-based flowgraph of the algorithm in [19], with constants Ci = 2 cos(iπ/16).

6. IMPLEMENTATION OF IDCT

The IDCT performs a linear transform on an 8-input data set. The algorithm developed in [19] uses minimum resources, 11 multiplications and 29 additions, to compute a one-dimensional IDCT (Figure 20).

6.1. Optimization in the butterfly

In the architecture of the IDCT, there are 12 butterflies, and each one is composed of two adders. Because these two adders have the same inputs, some operations such as prealignment and negation can be shared. Further, in the butterfly structure, normalization and leading-one detection can be separated into two paths and executed in parallel, which reduces the timing-critical path.

Figure 21a is a diagram of the FP adder, and Figure 21b is a diagram of the butterfly with the function of two FP adders. From the figure, we can see that a butterfly is similar to one FP adder in structure, except for one extra integer adder and some selection logic. Table 10 shows that such a butterfly structure saves 37% area compared with two simple adders. This reduction is due to the properties of FP addition. For a fixed-point butterfly, no operations can be shared.

Figure 21: Diagrams of (a) an FP adder (alignment, negation, fraction addition, leading-one detection and normalization) and (b) an FP butterfly that produces fa + fb and fa − fb with shared alignment and with normalization and leading-one detection run in parallel.
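In spirit, the butterfly evaluates both outputs from one shared alignment. The sketch below is our own illustration on doubles; the hardware of Figure 21b works on sign/exponent/fraction fields, shares negation, and runs leading-one detection in parallel with normalization, none of which is modeled here.

#include <cmath>
#include <utility>

// Butterfly sketch: align the two operands once, then produce both the sum
// and the difference with two additions (cf. Figure 21b). Rounding and
// exception handling are omitted.
std::pair<double, double> butterfly(double fa, double fb) {
    int ea, eb;
    double ma = std::frexp(fa, &ea);      // fa = ma * 2^ea
    double mb = std::frexp(fb, &eb);      // fb = mb * 2^eb
    int e = (ea > eb) ? ea : eb;          // shared prealignment exponent
    ma = std::ldexp(ma, ea - e);          // shift the smaller operand
    mb = std::ldexp(mb, eb - e);
    return { std::ldexp(ma + mb, e),      // fa + fb
             std::ldexp(ma - mb, e) };    // fa - fb
}

The single alignment step is what a pair of independent FP adders would otherwise each repeat, which is where the area saving of Table 10 comes from.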

Table 10: Comparison of two FP adders and a "butterfly."

Structure       Area (µm²)   Delay (ns)
Two FP adders   10412        7.49
Butterfly       6545         8.14

6.2. Results comparison

We implemented the IDCT in 32-bit IEEE FP, 15-bit radix-16 lightweight FP, and fixed-point arithmetic (see the comparison in Table 11). In the fixed-point implementation, we preserve 12-bit accuracy for the constants, and the widest bit-width is 24 in the whole algorithm (not fine-tuned). From the perspective of power, the lightweight FP IDCT consumes only around 1/10 of the power of the IEEE FP IDCT and is comparable with the fixed-point implementation.

Table 11: Comparison of three implementations of the IDCT.

Implementation   Area (µm²)   Delay (ns)   Power (mW)
IEEE FP          926810       111          1360
Lightweight FP   216236       46.75        143
Fixed-point      106598       36.11        110

7. CONCLUSION

In this paper, we introduce C++ and Verilog libraries of lightweight FP arithmetic, focusing on the most critical arithmetic operators (addition, multiplication) and the most common parameterizations useful for multimedia tasks (bit-width, rounding modes, exception handling, radix). With these libraries, we can easily translate a standard FP program to a lightweight FP program and explore the system's numerical performance versus hardware complexity trade-off.

An H.263 video codec was chosen as our benchmark. Such media applications do not need a wide dynamic range and high precision in computations, so the lightweight FP can be applied efficiently. By examining the histogram information of the FP numbers and the relationship between PSNR and bit-width, we demonstrate that our video codec has almost no quality degradation even when the bit-width of standard FP is reduced by more than half. Other features specified in standard FP, such as the rounding modes, exception handling, and the radix choice, are also discussed for this particular application. Such optimization offers a huge reduction in hardware cost. In the hardware implementation of the IDCT, we combined two FP adders into a butterfly (the basic component of the IDCT), which further reduced the hardware cost. Finally, we show that the power consumption of the lightweight FP IDCT is only 10.5% of that of the standard FP IDCT, and comparable to that of the fixed-point IDCT.

ACKNOWLEDGMENT

This work was funded in part by the Pittsburgh Digital Greenhouse and the Semiconductor Research Corporation.

REFERENCES

[1] D. Dobberpuhl, "The design of a high performance low power microprocessor," in International Symposium on Low Power Electronics and Design, pp. 11–16, Monterey, Calif, USA, August 1996.
[2] S. Kim and W. Sung, "Fixed-point error analysis and word length optimization of 8 × 8 IDCT architectures," IEEE Trans. Circuits and Systems for Video Technology, vol. 8, no. 8, pp. 935–940, 1998.
[3] K. Kum, J. Kang, and W. Sung, "AUTOSCALER for C: An optimizing floating-point to integer C program converter for fixed-point digital signal processors," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 9, pp. 840–848, 2000.
[4] The Institute of Electrical and Electronics Engineers, Inc., "IEEE Standard for Binary Floating-Point Arithmetic," ANSI/IEEE Std 754-1985, 1985.
[5] D. M. Samanj, J. Ellinger, E. J. Powers, and E. E. Swartzlander, "Simulation of variable precision IEEE floating point using C++ and its application in digital signal processor design," in Proc. 36th Midwest Symposium on Circuits and Systems, pp. 1509–1514, 1993.
[6] R. Ignatowski and E. E. Swartzlander, "Creating new algorithms and modifying old algorithms to use the variable precision floating point simulator," in Conference Record of the 28th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 152–156, 1994.
[7] X. Wan, Y. Wang, and W. H. Chen, "Dynamic range analysis for the implementation of fast transform," IEEE Trans. Circuits and Systems for Video Technology, vol. 5, no. 2, pp. 178–180, 1995.
[8] P. C. H. Meier, R. A. Rutenbar, and L. R. Carley, "Exploring multiplier architecture and layout for low power," in Proc. IEEE 1996 Custom Integrated Circuits Conference, pp. 513–516, San Diego, Calif, USA, May 1996.
[9] A. W. Burks, H. H. Goldstine, and J. von Neumann, "Preliminary discussion of the logical design of an electronic computing instrument," in Computer Structures: Readings and Examples, McGraw-Hill, 1971.
[10] The Institute of Electrical and Electronics Engineers, Inc., "A Proposed Standard for Binary Floating-Point Arithmetic," Draft 8.0 of IEEE Task P754, 1981.
[11] S. F. Anderson, J. G. Earle, R. E. Goldschmidt, and D. M. Powers, "The IBM System/360 Model 91: floating-point execution unit," IBM Journal of Research and Development, vol. 11, no. 1, pp. 34–53, 1967.
[12] The Institute of Electrical and Electronics Engineers, Inc., "IEEE standard specifications for the implementations of 8 × 8 inverse discrete cosine transform," IEEE Std 1180-1990, 1990.
[13] E. Hokenek and R. K. Montoye, "Leading-zero anticipator (LZA) in the IBM RISC System/6000 floating-point execution unit," IBM Journal of Research and Development, vol. 34, no. 1, pp. 71–77, 1990.
[14] S. F. Oberman, H. Al-Twaijry, and M. J. Flynn, "The SNAP project: design of floating point arithmetic units," in Proc. 13th Symposium on Computer Arithmetic, pp. 156–165, Asilomar, Calif, USA, July 1997.
[15] K. P. Acken, M. J. Irwin, and R. M. Owens, "Power comparisons for barrel shifters," in International Symposium on Low Power Electronics and Design, pp. 209–212, Monterey, Calif, USA, August 1996.
[16] R. K. Montoye, E. Hokenek, and S. L. Runyon, "Design of the IBM RISC System/6000 floating-point execution unit," IBM Journal of Research and Development, vol. 34, no. 1, pp. 59–70, 1990.
[17] R. V. K. Pillai, D. Al-Khalili, and A. J. Al-Khalili, "A low power approach to floating point adder design," in Proc. IEEE International Conference on Computer Design: VLSI in Computers and Processors, pp. 178–185, Austin, Tex, USA, October 1997.
[18] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective, Prentice-Hall, New Jersey, USA, 1996.
[19] A. Artieri and O. Colavin, "A chip set core for image compression," IEEE Trans. on Consumer Electronics, vol. 36, no. 3, pp. 395–402, August 1990.

Fang Fang received the B.S. degree in electrical engineering from Southeast University, China, in 1999 and the M.S. degree from Carnegie Mellon University, Pittsburgh, PA, in 2001. Currently, she is a Ph.D. candidate in the Center for Silicon System Implementation and the Advanced Multimedia Processing Lab in the Electrical and Computer Engineering Department at Carnegie Mellon University. Her research interests include the numerical performance of signal processing algorithms and high-level hardware synthesis of DSP algorithms. Her current work focuses on an automatic design flow for the lightweight floating-point system, and hardware synthesis of WHT and FFT transforms.

Tsuhan Chen received the B.S. degree in electrical engineering from the National Taiwan University in 1987, and the M.S. and Ph.D. degrees in electrical engineer- ing from the California Institute of Tech- nology, Pasadena, California, in 1990 and 1993, respectively. Since October 1997, Tsuhan Chen has been with the Depart- ment of Electrical and Computer Engi- neering, Carnegie Mellon University, Pitts- burgh, Pennsylvania, where he is now a Professor. He directs the Advanced Multimedia Processing Laboratory. His research interests include multimedia signal processing and communication, audio- visual interaction, biometrics, processing of 2D/3D graphics, bioin- formatics, and building collaborative virtual environments. From August 1993 to October 1997, he worked in the Visual Communi- cations Research Department, AT&T Bell Laboratories, Holmdel, New Jersey, and later at AT&T Labs-Research, Red Bank, New Jer- sey. Tsuhan helped create the Technical Committee on Multime- dia Signal Processing, as the founding chair, and the Multimedia Signal Processing Workshop, both in the IEEE Signal Processing Society. He has recently been appointed as the Editor-in-Chief for IEEE Transactions on Multimedia for the period 2002–2004. He has coedited a book titled “Advances in Multimedia: Systems, Stan- dards, and Networks.” He is a recipient of the National Science Foundation Career Award. Rob A. Rutenbar received the Ph.D. de- gree from the University of Michigan in 1984, and subsequently joined the faculty of Carnegie Mellon University. He is currently the Stephen J. Jatras Professor of Electrical and Computer Engineering, and (by cour- tesy) of Computer Science. He is the found- ing Director of the MARCO/DARPA Cen- ter for Circuits, Systems, Software (C2S2), a consortium of U.S. Universities chartered in 2001 to explore long-term solutions for next-generation cir- cuit challenges. His research interests focus on circuit and lay- out synthesis algorithms for mixed-signal ASICs, and for high- speed digital systems. In 1987, Dr. Rutenbar received a Presiden- tial Young Investigator Award from the National Science Founda- tion. He was General Chair of the 1996 International Conference on CAD. From 1992 through 1996, he chaired the Analog Techni- cal Advisory Board for Cadence Design Systems. In 2001 he was cowinner of the Semiconductor Research Corporation’s Aristotle Award for contributions to graduate education. He is a Fellow of the IEEE and a member of the ACM and Eta Kappa Nu.