A Unified Rate-Distortion Analysis Framework for Transform Coding Zhihai He, Member, IEEE, and Sanjit K

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 11, NO. 12, DECEMBER 2001 1221 A Unified Rate-Distortion Analysis Framework for Transform Coding Zhihai He, Member, IEEE, and Sanjit K. Mitra, Life Fellow, IEEE Abstract—In our previous work, we have developed a rate-distortion (R-D) modeling framework H.263 video coding by introducing the new concepts of characteristic rate curves and rate curve decomposition. In this paper, we further show it is a unified R-D analysis framework for all typical image/video transform coding systems, such as EZW, SPIHT and JPEG image coding; MPEG-2, H.263, and MPEG-4 video coding. Based on this framework, a unified R-D estimation and control algorithm is proposed for all typical transform coding systems. We have also provided a theoretical justification for the unique properties of the characteristic rate curves. A linear rate regulation scheme is designed to further improve the estimation accuracy and robustness, as well as to reduce the computational complexity of the R-D estimation algorithm. Our extensive experimental results show that with the proposed algorithm, we can accurately estimate the Fig. 1. Generic transform coding system for images and videos. R-D functions and robustly control the output bit rate or picture quality of the image/video encoder. Index Terms—Rate control, rate-distortion analysis, source form coding has become the dominant approach for image modeling, transform coding, video coding and transmission. and video compression. A generic transform coding system is depicted in Fig. 1. The transform, either discrete wavelet transform (DWT) or discrete cosine transform (DCT), is I. INTRODUCTION applied to the input picture. Here, a picture can be either a still ECENT advances in computing and communication image or motion-compensated video frame. After quantization, R technology have stimulated the research interest in digital the quantized coefficients are converted into symbols according techniques for recording and transmitting visual information. to some data representation scheme. For example, zig–zag The exponential growth in the amount of visual data to be scan and run-level data representation are employed in JPEG stored, transferred, and processed has created a huge need for and MPEG coding [2], [4]. In embedded zero-tree wavelet data compression. Compression of visual data, such as images (EZW) coding [7], all insignificant coefficients in a spatial and videos, can significantly improve the utilization efficiency orientation tree are represented by one zero-tree symbol. After of the limited communication channel bandwidth or storage data representation, the output symbols are finally encoded by capacity. a Huffman or arithmetic coder [13]. A. Transform Coding B. R-D Analysis The demand for image and video compression has triggered In transform coding of images and videos, the two most im- the development of several compression standards, such portant factors are the coding bit rate and picture quality. The as JPEG [2], JPEG-2000 [3], MPEG-2 [4], H.263 [5], and coding bit rate determines the channel bandwidth required MPEG-4 [6]. Besides the standard image/video compression to transfer the coded visual data. One direct and widely used algorithm, many other algorithms have also been reported in measure for the picture quality is the mean-square error (MSE) the literature, such as embedded zero-tree wavelet (EZW) [7] between the coded image/video and the original one. The recon- image coding, set partitioning in hierarchical trees (SPIHT) [8] struction error introduced by compression, often referred to as and stack-run (SR) [9] image coding. In both the compression distortion, is denoted by . In typical transform coding, both standards and the algorithms reported in the literature, trans- and are controlled by the quantization parameter of the quantizer . The major issue here is how to determine the value Manuscript received May 8, 2000; revised September 28, 2001. This paper of to achieve the target coding bit rate , or target picture was recommended by Associate Editor S. U. Lee. quality . To this end, we need to analyze and estimate the Z. He was with the Department of Electrical and Computer Engineering, Uni- R-D behavior of the image/video encoder; this behavior is char- versity of California, Santa Barbara, CA 93106 USA. He is now with the In- teractive Media Group, Sarnoff Corporation, Princeton, NJ 08543-5300 USA acterized by its rate-quantization (R-Q) and distortion-quantiza- (e-mail: [email protected]). tion (D-Q) functions, and , respectively [10], [11]. In S. K. Mitra is with the Department of Electrical and Computer Engi- this work, they are collectively called R-D functions or curves. neering, University of California, Santa Barbara, CA 93106 USA (e-mail: [email protected]). Based on the R-D functions, the quantization parameter can Publisher Item Identifier S 1051-8215(01)11031-1. be readily determined to achieve the target bit rate or picture 1051–8215/01$10.00 © 2001 IEEE 1222 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 11, NO. 12, DECEMBER 2001 quality [12], [14]. Therefore, the major issue becomes this: explore these characteristics. The R-D models reported in the how to analyze, model and estimate the R-D functions for the literature try to use some statistics of the input source data, image/video encoder. such as variance, to describe the input image or video data [10], Analysis and estimation of the R-D functions have important [12], [14]. They also try to analyze and model each step of the applications in visual coding and communication. First, with the coding algorithms and formulate an explicit expression of the estimated R-D functions we can adjust the quantization setting coding bit rate. To achieve high coding performance, an effi- of the encoder and control the output bit rate or picture quality cient coding algorithm must employ a sophisticated data repre- according to the channel condition, the storage capacity, or the sentation scheme as well as an efficient entropy coding scheme. user’s requirement [14]–[16]. Second, based on the estimated To improve the rate estimation accuracy for these coding algo- R-D functions, optimum bit allocation as well as other R-D opti- rithms, the rate models are becoming very complex [12], [14], mization procedures can be performed to improve the efficiency [24], [25]. However, with complex and highly nonlinear expres- of the coding algorithm and, consequently, to improve the image sions, the estimation and rate control process becomes increas- quality or video presentation quality [17], [18]. ingly complicated and even unstable with the image-dependent There are two basic approaches for R-D modeling. The first variations [23]. is the analytic approach. Its objective is to derive a set of math- It should also be noted that, for different coding algorithms, ematical formulas for the R-D functions based on the statistical the R-D models and rate control algorithms reported in the lit- properties of the source data. In this approach, both the coding erature are quite different from each other [12], [14]–[16], [24], system and the image are first decomposed into components [25]. It would be ideal to develop a simple, accurate, and uni- whose statistical models are already known. These models are fied rate model for all typical transform coding systems. Based then combined to form a complete analytic model for the whole on this simple model, we could then develop a unified rate and coding system. The R-D functions for a simple quantizer have picture quality control algorithm which could be applied to all been developed for a long time [10], [11]. In the analytic source typical transform coding systems. To this end, we need to un- model proposed by Hang and Chen [12], a theoretical entropy cover the common rules that govern the R-D behaviors of all formula for the quantized DCT coefficients is developed based transform coding systems. Obviously, this will provide us with on the R-D theory of the Gaussian source and the uniform quan- valuable insights into the mechanism of transform coding. From tizer. The mismatches between the theoretical entropy and the a practical point of view, the simple and unified rate model and actual coding bit rate of the entropy encoder is, however, com- control algorithm would enable us to control the image/video pensated by empirical estimation. encoder accurately and robustly with very low computational The second approach is the empirical approach. Here, the complexity and implementation cost. R-D functions are constructed by mathematical processing of In this work, based on the so-called -domain analysis the observed R-D data. In the R-D estimation algorithm pro- method proposed in [1], [22], we develop a generic source posed by Lin and Ortega [23], eight control points on the R-D modeling framework for transform coding of images and curves are first computed by running the coding system eight videos by the following two major steps. In the first step, we times. The whole R-D curves are then constructed by cubic in- introduce the concepts of characteristic rate curves and rate terpolation. In the MPEG rate control algorithm proposed by curve decomposition to characterize the input source date and Ding and Liu [15], the R-D curves are fitted by mathematical to model the coding algorithm, respectively. In the second step, functions with several control parameters which are estimated we propose a linear regulation scheme to improve the accuracy from the observed R-D data of the coding system. In general, and robustness of the R-D estimation. Our extensive simulation this type of R-D estimation algorithms have very high compu- results show that the proposed framework is a unified R-D tational complexity.

Load more