Low-Complexity Enhanced Lapped Transform for Image Coding in JPEG XR / HD Photo

Aldo Maalouf, Member IEEE, Mohamed-Chaker Larabi, Senior Member IEEE
XLIM Laboratory, Department of Signal, Image and Communication, UMR CNRS 6172
University of Poitiers, SP2MI-2 Bd Marie et Pierre Curie, PO Box 30179
86962 Futuroscope Chasseneuil, France
{maalouf, larabi}@sic.sp2mi.univ-poitiers.fr

978-1-4244-5654-3/09/$26.00 ©2009 IEEE, ICIP 2009

Abstract— JPEG-XR is a new image compression standard that aims at achieving state-of-the-art image compression while keeping the encoder and decoder complexities as low as possible. JPEG-XR [1] is based on Microsoft technology known as HDPHOTO and makes use of a block transform. This transform, known as the Lapped Biorthogonal Transform (LBT), requires only a small memory footprint while providing the compression benefits of a larger block transform. In this work, we propose to replace the LBT by a representation in a Legendre orthogonal polynomial basis. The motivation behind using the Legendre polynomials is that, in general, moment functions of orthogonal polynomials provide better feature representations than other types of moments [7] [11] and have some properties related to the human visual system (HVS) [2]. In particular, Legendre polynomials have a unit weight function and a recurrence relation involving real coefficients, which make them suitable for defining an image representation. We show that the expansion in the Legendre polynomial basis can be implemented via lifting operations and has the same computational complexity as the LBT. The experimental evaluation of our modified JPEG-XR scheme shows improvements in visual quality over the standard JPEG-XR.

Index Terms— JPEG-XR, image coding, HDPHOTO, orthogonal polynomials.

I. INTRODUCTION

JPEG-XR is a recent compression algorithm developed by Microsoft for digital imaging applications [12] [13]. JPEG-XR is characterized by a block-based image compression scheme. The codec design aims at optimizing image quality and compression efficiency while requiring low complexity in encoder and decoder implementations. As a result, even though it uses many of the same fundamental building blocks as other traditional image and video compression schemes, i.e. color conversion, domain transform, quantization, coefficient scanning and entropy coding, JPEG-XR achieves improvements in complexity compared with other state-of-the-art compression methods.

To convert spatial domain image data to the frequency domain, JPEG-XR uses a hierarchical two-stage LBT [5], which is based on a flexible concatenation of two operators: the Photo Core Transform (PCT) and the Photo Overlap Transform (POT). The PCT is similar to the widely used DCT and exploits spatial correlation within the block. However, it fails to exploit redundancy across block boundaries and may introduce blocking artifacts at low bit rates. To alleviate these drawbacks, the POT is designed to exploit the correlation across block boundaries. Hence, the POT improves compression efficiency while reducing blocking artifacts.

In this work, we propose to replace the LBT by a representation in a Legendre polynomial basis. The motivation is that orthogonal polynomials have demonstrated considerable strength in the field of image processing [7], especially in pattern recognition [4], edge detection [6] and texture analysis [8]. Moreover, it has been shown in [2] that orthogonal polynomials have some properties related to the human visual system (HVS). In fact, visual analysis in the visual cortex can be regarded, from the mathematical point of view, as a process of expansion of the curve of spatial distribution of brightness within the receptive fields in an orthogonal polynomial basis. Therefore, orthogonal polynomials provide better opportunities to qualitatively optimize image data representation. Legendre polynomials are of special interest in our case because they have unit weight and algebraic recurrence relations involving real coefficients, which make them suitable for defining image transforms for compression. The integration of the Legendre polynomial expansion allows incorporating HVS properties in the JPEG-XR compression scheme. As a result, our modified JPEG-XR scheme improves the visual quality of the compressed images.

Note that the image representation in the Legendre polynomial basis can be implemented via lifting operations. Therefore, our modified approach does not add any complexity to the standard JPEG-XR scheme.

The remainder of the paper is organized as follows. Section II gives an overview of JPEG-XR. Section III describes our modified Legendre polynomial-based JPEG-XR approach. Section IV is devoted to experimental results. Finally, Section V draws some concluding remarks.

II. OVERVIEW OF JPEG-XR

The coding structure of JPEG-XR, which shares some similarities with traditional image coding techniques, is composed of the following steps: color conversion, reversible Lapped Biorthogonal Transform (LBT), flexible quantization, inter-block prediction, adaptive coefficient scanning, and entropy coding of transform coefficients [12]. The distinguishing features are the LBT and the advanced coding of coefficients.

To convert spatial domain image data to the frequency domain, JPEG-XR uses a hierarchical two-stage LBT, which is based on a flexible concatenation of two operators: the Photo Core Transform (PCT) and the Photo Overlap Transform (POT). The PCT is similar to the widely used DCT and exploits spatial correlation within the block. However, it fails to exploit redundancy across block boundaries and may introduce blocking artifacts at low bit rates. To alleviate these drawbacks, the POT is designed to exploit the correlation across block boundaries. Hence, the POT improves compression efficiency while reducing blocking artifacts.

The transform is performed in a two-stage hierarchical structure. For the sake of simplicity, we consider the case of the luminance channel. At the first stage, a 4 × 4 POT is optionally applied, followed by a compulsory 4 × 4 PCT. The resulting 16 DC coefficients of all 4 × 4 blocks within a 16 × 16 macroblock are grouped into a single 4 × 4 DC block. The remaining 240 AC coefficients are referred to as the High Pass (HP) coefficients. At the second stage, the DC blocks are further processed. Another optional 4 × 4 POT is first performed on the DC blocks, followed by a compulsory 4 × 4 PCT. This yields 16 new coefficients: one second-stage DC coefficient and 15 second-stage AC coefficients, referred to as the DC and Low Pass (LP) coefficients respectively. The DC, LP and HP bands are then quantized and coded independently. All transforms are implemented by lifting steps [3]. The chrominance channels are processed in a similar way. Whenever the POT and PCT are concatenated, the transform becomes equivalent to the LBT [5].

In order to enable the optimization of the Quantization Parameters (QP) based on the sensitivity of the HVS and the coefficient statistics, JPEG-XR uses a flexible coefficient quantization approach. To further improve compression efficiency, inter-block coefficient prediction is then used to remove inter-block redundancy in the quantized transform coefficients. Adaptive coefficient scanning is then used to convert the 2-D transform coefficients within a block into a 1-D vector to be encoded. Scan patterns are adapted dynamically based on the local statistics of coded coefficients. Coefficients with a higher probability of non-zero values are scanned earlier. Finally, the transform coefficients are entropy coded.

III. IMAGE REPRESENTATION IN LEGENDRE BASIS

In this section, we make use of the Legendre orthogonal basis in order to represent the image and improve the quality of the JPEG-XR compression scheme. Our modified JPEG-XR scheme is similar to the JPEG-XR baseline. The only difference is that we replaced the PCT with a lapped representation in the Legendre polynomial basis. First, we describe this representation; then, we show how it can be implemented by using lifting operators.

[...] on a discrete domain $x = 0, 1, \ldots, N-1$.

In this paper, we propose to use an image representation scheme similar to a 4 × 4 PCT, based on an image transform requiring only polynomials of degree 0 through 3. For an image $I(x, y)$, this representation is defined by:

$$L_{pq} = \sum_{x=0}^{3} \sum_{y=0}^{3} l_p(x)\, l_q(y)\, I(x, y), \qquad p, q = 0, \ldots, 3 \tag{2}$$

The inverse transformation of (2) has the form

$$I(x, y) = \sum_{p=0}^{3} \sum_{q=0}^{3} L_{pq}\, l_p(x)\, l_q(y) \tag{3}$$

B. Implementation via lifting operators

From equation (2), a 4 × 4 transform matrix, denoted by $A$, can be defined by:

$$A = \begin{bmatrix} l_p(0)\, l_q(0) & l_p(0)\, l_q(1) & \cdots & l_p(0)\, l_q(3) \\ l_p(1)\, l_q(0) & l_p(1)\, l_q(1) & \cdots & l_p(1)\, l_q(3) \\ \vdots & \vdots & \ddots & \vdots \\ l_p(3)\, l_q(0) & l_p(3)\, l_q(1) & \cdots & l_p(3)\, l_q(3) \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 & 0.5 & 0.5 \\ 0.67 & 0.224 & -0.224 & -0.67 \\ 0.5 & -0.5 & -0.5 & 0.5 \\ 0.224 & -0.67 & 0.67 & -0.224 \end{bmatrix} \tag{4}$$

The matrix $A$ indicates that the image representation in the Legendre polynomial basis is an even/odd transform; that is, one half of its rows are odd vectors:

$$v_i = -v_{M+1-i}, \qquad i = 1, 2, \ldots, M/2, \tag{5}$$

while the others are even vectors:

$$v_i = v_{M+1-i}, \qquad i = 1, 2, \ldots, M/2. \tag{6}$$

By re-arranging the rows of the transform matrix $A$ such that rows $1, 3, \ldots, M-1$ are the first $M/2$ rows and rows $2, 4, \ldots, M$ are the last $M/2$ rows, the transform matrix can be written as a partitioned matrix:

$$A = \begin{bmatrix} Q & \bar{Q} \\ D & -\bar{D} \end{bmatrix}, \tag{7}$$

where $Q$ and $D$ are $M/2 \times M/2$ orthogonal matrices, and $\bar{Q}$ and $\bar{D}$ are formed by reversing the order of the columns in $Q$ and $D$; that is,

$$\bar{Q} = Q\bar{I} \quad \text{and} \quad \bar{D} = D\bar{I},$$

the permutation matrix $\bar{I}$ being the opposite diagonal identity matrix. The matrix $A$ can then be factored into the product of two matrices