Low-Complexity Enhanced Lapped Transform for Image Coding in JPEG XR / HD Photo

Aldo Maalouf, Member IEEE, Mohamed-Chaker Larabi, Senior Member IEEE

XLIM Laboratory, Department of Signal, Image and Communication, UMR CNRS 6172, University of Poitiers, SP2MI-2 Bd Marie et Pierre Curie, PO Box 30179, 86962 Futuroscope Chasseneuil, France
{maalouf, larabi}@sic.sp2mi.univ-poitiers.fr

Abstract: JPEG-XR is a new standard that aims at achieving state-of-the-art image compression while keeping the encoder and decoder complexities as low as possible. JPEG-XR [1] is based on the Microsoft technology known as HDPHOTO and makes use of a block transform. This transform, known as the Lapped Biorthogonal Transform (LBT), requires only a small memory footprint while providing the compression benefits of a larger block transform. In this work, we propose to replace the LBT by a representation in a Legendre orthogonal polynomial basis. The motivation behind using Legendre polynomials is that, in general, moment functions of orthogonal polynomials provide better feature representations than other types of moments [7] [11] and have some properties related to the human visual system (HVS) [2]. In addition, Legendre polynomials have a unit weight function and a recurrence relation involving real coefficients, which make them suitable for defining image representations. We show that the expansion in the Legendre polynomial basis can be implemented via lifting operations and has the same computational complexity as the LBT. The experimental evaluation of our modified JPEG-XR scheme shows improvements in terms of visual quality over the standard JPEG-XR.

Index Terms: JPEG-XR, image coding, HDPHOTO, orthogonal polynomials.

978-1-4244-5654-3/09/$26.00 ©2009 IEEE. ICIP 2009.

I. INTRODUCTION

JPEG-XR is a recent compression algorithm developed by Microsoft for digital imaging applications [12] [13]. JPEG-XR is characterized by a block-based image compression scheme. The design aims at optimizing image quality and compression efficiency while at the same time requiring low complexity in encoder and decoder implementations. As a result, even though it uses many of the same fundamental building blocks as traditional image compression schemes, i.e. color conversion, domain transform, quantization, coefficient scanning and entropy coding, JPEG-XR achieves several improvements in terms of complexity when compared to other state-of-the-art compression methods.

To convert spatial domain image data to the frequency domain, JPEG-XR uses a hierarchical two-stage LBT [5], which is based on a flexible concatenation of two operators: the Photo Core Transform (PCT) and the Photo Overlap Transform (POT). PCT is similar to the widely used DCT and exploits spatial correlation within the block. However, it fails to exploit redundancy across block boundaries and may introduce blocking artifacts at low bit rates. To alleviate these drawbacks, POT is designed to exploit the correlation across block boundaries. Hence, POT improves compression efficiency, while at the same time reducing blocking artifacts.

In this work, we propose to replace the LBT by a representation in a Legendre polynomial basis. The motivation is that orthogonal polynomials have demonstrated desirable strengths in the field of image processing [7], especially in pattern recognition [4], edge detection [6] and texture analysis [8]. Moreover, it has been shown in [2] that orthogonal polynomials have some properties related to the human visual system (HVS). In fact, visual analysis in the visual cortex, from the mathematical point of view, can be regarded as a process of expansion of the curve of spatial distribution of brightness within the receptive fields in an orthogonal polynomial basis. Therefore, orthogonal polynomials provide better opportunities to qualitatively optimize image data representation. Legendre polynomials are of special interest in our case, because they have unit weight and algebraic recurrence relations involving real coefficients, which make them suitable for defining image transforms for compression.

The integration of the Legendre polynomial expansion allows incorporating human visual system (HVS) properties in the JPEG-XR compression scheme. As a result, our modified JPEG-XR scheme improves the visual quality of the compressed images.

It is to be noted that the image representation in the Legendre polynomial basis can be implemented via lifting operations. Therefore, our modified approach does not add any complexity to the standard JPEG-XR scheme.

The remainder of the paper is organized as follows. In section 2 an overview of JPEG-XR is given. In section 3 our modified Legendre polynomial-based JPEG-XR approach is described. Section 4 is devoted to experimental results. Finally, section 5 draws some concluding remarks.

II. OVERVIEW OF JPEG-XR

The coding structure of JPEG-XR, which shares some similarities with traditional image coding techniques, is composed of the following steps: color conversion, reversible Lapped Biorthogonal Transform (LBT), flexible quantization, inter-block prediction, adaptive coefficient scanning, and entropy coding of transform coefficients [12]. The distinguishing features are the LBT and the advanced coding of coefficients.

To convert spatial domain image data to the frequency domain, JPEG-XR uses a hierarchical two-stage LBT, which is based on a flexible concatenation of two operators: the Photo Core Transform (PCT) and the Photo Overlap Transform (POT). PCT is similar to the widely used DCT and exploits spatial correlation within the block. However, it fails to exploit redundancy across block boundaries and may introduce blocking artifacts at low bit rates. To alleviate these drawbacks, POT is designed to exploit the correlation across block boundaries. Hence, POT improves compression efficiency, while at the same time reducing blocking artifacts.

The transform is performed in a two-stage hierarchical structure. For the sake of simplicity, we consider the case of the luminance channel. At the first stage, a 4 × 4 POT is optionally applied, followed by a compulsory 4 × 4 PCT. The resulting 16 DC coefficients of all 4 × 4 blocks within a 16 × 16 macroblock are grouped into a single 4 × 4 DC block. The remaining 240 AC coefficients are referred to as the High Pass (HP) coefficients. At the second stage, the 4 × 4 DC blocks are further processed. Another optional POT is first performed on the DC blocks, followed by the application of a compulsory 4 × 4 PCT. This yields 16 new coefficients: one second-stage DC coefficient and 15 second-stage AC coefficients, referred to as the DC and Low Pass (LP) coefficients respectively. The DC, LP and HP bands are then quantized and coded independently. All transforms are implemented by lifting steps [3]. The chrominance channels are processed in a similar way. Whenever POT and PCT are concatenated, the transform becomes equivalent to the LBT [5].

In order to enable the optimization of the Quantization Parameters (QP) based on the sensitivity of the HVS and the coefficient statistics, JPEG-XR uses a flexible coefficient quantization approach. To further improve compression efficiency, inter-block coefficient prediction is then used to remove inter-block redundancy in the quantized transform coefficients. Adaptive coefficient scanning is then used to convert the 2-D transform coefficients within a block into a 1-D vector to be encoded. Scan patterns are adapted dynamically based on the local statistics of coded coefficients. Coefficients with a higher probability of non-zero values are scanned earlier. Finally, the transform coefficients are entropy coded.

III. IMAGE REPRESENTATION IN LEGENDRE BASIS

In this section, we make use of the Legendre orthogonal basis in order to represent the image and improve the quality of the JPEG-XR compression scheme. Our modified JPEG-XR scheme is similar to the JPEG-XR baseline. The only difference is that we replaced the PCT with a lapped representation in the Legendre polynomial basis. First, we describe this representation; then, we show how it can be implemented by using lifting operators.

A. Image representation using Legendre polynomial basis

The Legendre polynomial of order p, denoted by $l_p(x)$, is given as follows,

$$l_0(x) = 1, \qquad (p+1)\,l_{p+1}(x) = (2p+1)\,x\,l_p(x) - p\,l_{p-1}(x), \quad p = 2, 3, \ldots, N-1, \tag{1}$$

on a discrete domain $x = 0, 1, \ldots, N-1$.

In this paper, we propose to use an image representation scheme similar to a 4 × 4 PCT, based on an image transform requiring only polynomials of degree 0 through 3. For an image $I(x, y)$, this representation is defined by:

$$L_{pq} = \sum_{x=0}^{3}\sum_{y=0}^{3} l_p(x)\,l_q(y)\,I(x, y), \qquad p, q = 0, \ldots, 3. \tag{2}$$

The inverse transformation of (2) has the form

$$I(x, y) = \sum_{p=0}^{3}\sum_{q=0}^{3} L_{pq}\,l_p(x)\,l_q(y). \tag{3}$$

B. Implementation via lifting operators

From equation (2), a 4 × 4 transform matrix, denoted by A, can be defined by:

$$A = \begin{bmatrix} l_p(0)\,l_q(0) & l_p(0)\,l_q(1) & \cdots & l_p(0)\,l_q(3) \\ l_p(1)\,l_q(0) & l_p(1)\,l_q(1) & \cdots & l_p(1)\,l_q(3) \\ \vdots & \vdots & & \vdots \\ l_p(3)\,l_q(0) & l_p(3)\,l_q(1) & \cdots & l_p(3)\,l_q(3) \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 & 0.5 & 0.5 \\ 0.67 & 0.224 & -0.224 & -0.67 \\ 0.5 & -0.5 & -0.5 & 0.5 \\ 0.224 & -0.67 & 0.67 & -0.224 \end{bmatrix} \tag{4}$$

The matrix A indicates that the image representation in the Legendre polynomial basis is an even/odd transform; that is, one half of its rows are odd vectors:

$$v_i = -v_{M+1-i}, \qquad i = 1, 2, \ldots, M/2, \tag{5}$$

while the others are even vectors:

$$v_i = v_{M+1-i}, \qquad i = 1, 2, \ldots, M/2. \tag{6}$$

By re-arranging the rows of the transform matrix A, such that rows $1, 3, \ldots, M-1$ are the first M/2 rows and rows $2, 4, \ldots, M$ are the last M/2 rows, the transform matrix can be written as a partitioned matrix:

$$A = \begin{bmatrix} Q & \bar{Q} \\ D & -\bar{D} \end{bmatrix}, \tag{7}$$

where Q and D are M/2 × M/2 orthogonal matrices, and $\bar{Q}$ and $\bar{D}$ are formed by reversing the order of the columns in Q and D; that is,

$$\bar{Q} = Q\bar{I} \quad \text{and} \quad \bar{D} = D\bar{I},$$

the permutation matrix $\bar{I}$ being the opposite diagonal identity matrix. The matrix A can then be factored into the product of two matrices

$$A = \begin{bmatrix} Q & 0 \\ 0 & D \end{bmatrix}\begin{bmatrix} I & \bar{I} \\ I & -\bar{I} \end{bmatrix}. \tag{8}$$

Because the transformation defined by the matrix A is an odd-even transform, its fast algorithm can be obtained via the Hadamard transform through a conversion matrix S. If W is the Hadamard matrix of the same order as A, then

$$A = SW, \qquad S = AW^{T}.$$
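The definitions above can be checked numerically with a small sketch. It takes the numeric matrix A of equation (4), applies the forward and inverse transforms of equations (2) and (3) to one 4 × 4 block, and forms the conversion matrix S = AW^T with the 4th-order Hadamard matrix scaled by 0.5. Two points are assumptions rather than statements from the paper: the rounded entries 0.67 and 0.224 are taken to stand for 3/√20 and 1/√20, and the helper names (`matmul`, `forward`, `inverse`, `conversion_matrix`) are purely illustrative. The rows are kept in their natural order, so the result is not the rearranged partitioned form, and its overall scale depends on how W is normalised.

```python
import math

# Assumption: the rounded entries 0.67 and 0.224 of equation (4) stand for
# 3/sqrt(20) and 1/sqrt(20); with these values the rows of A are orthonormal.
C = 1.0 / math.sqrt(20.0)
A = [
    [0.5, 0.5, 0.5, 0.5],
    [3 * C, C, -C, -3 * C],
    [0.5, -0.5, -0.5, 0.5],
    [C, -3 * C, 3 * C, -C],
]

# 4th-order Hadamard matrix, every entry scaled by 0.5 so that W is orthonormal.
W = [[0.5 * v for v in row] for row in (
    (1, 1, 1, 1),
    (1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
)]


def matmul(X, Y):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def forward(block):
    """Equation (2) in matrix form: L = A * I * A^T for one 4x4 block."""
    return matmul(matmul(A, block), transpose(A))


def inverse(coeffs):
    """Equation (3) in matrix form: I = A^T * L * A (A is orthonormal)."""
    return matmul(matmul(transpose(A), coeffs), A)


def conversion_matrix():
    """S = A * W^T, so that A = S * W."""
    return matmul(A, transpose(W))


if __name__ == "__main__":
    block = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    restored = inverse(forward(block))
    err = max(abs(a - b) for ra, rb in zip(block, restored) for a, b in zip(ra, rb))
    print("max reconstruction error:", err)
    for row in conversion_matrix():
        print(["%.3f" % v for v in row])
```

Under these assumptions, only six of the sixteen entries of S are non-zero, which is the sparsity that makes the Hadamard-based fast algorithm attractive.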

With the Hadamard matrix written in the same partitioned form as A,

$$W = \begin{bmatrix} Y & \bar{Y} \\ G & -\bar{G} \end{bmatrix},$$

the conversion matrix becomes

$$S = \begin{bmatrix} Q & \bar{Q} \\ D & -\bar{D} \end{bmatrix}\begin{bmatrix} Y^{T} & G^{T} \\ \bar{Y}^{T} & -\bar{G}^{T} \end{bmatrix} = \begin{bmatrix} QY^{T} + \bar{Q}\bar{Y}^{T} & QG^{T} - \bar{Q}\bar{G}^{T} \\ DY^{T} - \bar{D}\bar{Y}^{T} & DG^{T} + \bar{D}\bar{G}^{T} \end{bmatrix} = 2\begin{bmatrix} QY^{T} & 0 \\ 0 & DG^{T} \end{bmatrix}. \tag{9}$$

The last step follows from

$$\bar{Q}\bar{Y}^{T} = Q\bar{I}\,(Y\bar{I})^{T} = Q\bar{I}\bar{I}Y^{T} = QY^{T}, \qquad \bar{D}\bar{G}^{T} = DG^{T}.$$

For p = q = 4, the conversion matrix can be computed from (9), as follows:

$$Q = 0.5\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad D = 0.224\begin{bmatrix} 3 & 1 \\ 1 & -3 \end{bmatrix}.$$

The Hadamard matrix of the 4th order is

$$W_4 = 0.5\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix},$$

from which

$$Y = 0.707\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \quad \text{and} \quad G = 0.707\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix},$$

and

$$S = \operatorname{diag}\!\left(\sqrt{2},\ \sqrt{2},\ 0.632,\ 0.632\right)\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 2 & -1 \end{bmatrix}.$$

It is clear that S is a sparse matrix. The algorithm of the fast implementation of the expansion in the Legendre basis is shown in figure 1.

Fig. 1. Fast implementation of the expansion in Legendre polynomial basis via the fast Hadamard transform.

In total, each 4 × 4 transform of figure 1 requires 96 trivial operations and 8 non-trivial operations per block. Compared to the PCT, which requires 91 trivial and 11 non-trivial operations per block, the transform of figure 1 has approximately the same complexity as the PCT but with better visual quality, as we will show in the next section.

In order to exploit the redundancy across block boundaries, we used the POT as in the JPEG-XR baseline.

IV. EXPERIMENTAL RESULTS

We integrated our modified lapped transform into the JPEG-XR reference software [1] so as to compare our method with the JPEG-XR baseline. We also compare our modified JPEG-XR scheme with another improved version of JPEG-XR that was proposed by Richter in [14].

The first experimental test consists of applying our modified JPEG-XR scheme to classical images. For that we used the image 'Lenna', which was compressed by using the JPEG-XR baseline and our modified JPEG-XR scheme. Zooms on both results are shown in figure 2. It is clear that our modified JPEG-XR scheme provides better detail preservation than the baseline approach.

Fig. 2. Results obtained by using (a) our modified JPEG-XR scheme (PSNR = 37.64 dB) and (b) JPEG-XR baseline (PSNR = 36.04 dB). The bit rate is 0.8 bpp for both images.

For a better objective evaluation, we followed the evaluation method pursued in [15], corresponding to the core experiments defined in the framework of the JPEG committee for the evaluation of JPEG-XR. For that, we compare our modified JPEG-XR scheme with the JPEG-XR reference software and the modified version proposed in [14] using other metrics, namely the Multiscale Structural Similarity Index (MS-SSIM) [16] and the Visual Difference Predictor (VDP) [9], [10]. Both metrics follow a top-down design by trying to model the physical properties of the HVS. The MS-SSIM follows a bottom-up modeling paradigm that first decomposes images into several scales and then measures contrast and structure in each scale. In addition, the luminance of the lowest scale is also measured. Finally, all the data is pooled into a single score. MS-SSIM has the advantage that it is computationally tractable while still providing reasonable correlations to subjective measurements [15]. The VDP follows a more traditional approach. Unlike MS-SSIM, which defines a quality metric, VDP is a fidelity metric, as it predicts how many pixels a standard observer is likely to perceive as different from the original. That is, VDP does not try to judge how irritating the image artifacts generated by the compression technology are. It only tries to predict whether they are detectable.

The set of images used for this experiment is taken from the ISO images originally used for JPEG-XR evaluation. Results are given in figures 3 and 4 for the "honolulu cathedral", "oahu waimea2" and "waikiki at night" images. Both MS-SSIM and VDP have been plotted in the logarithmic domain, i.e. the first set of graphs shows −20 log(1 − MSSIM), the second −20 log(r), where r is the ratio of pixels the VDP standard observer would detect as different with a probability of p ≥ 75%. Note that an MS-SSIM index of 1 indicates a perfect match, which will be mapped to ∞ in our plots. As seen from the plots shown in figures 3 and 4, the proposed scheme can greatly improve the MS-SSIM and the VDP metrics.

Fig. 3. Logarithmic SSIM plots for the images "honolulu cathedral", "oahu waimea2" and "waikiki at night" (top to bottom). JPEG-XR baseline is in '- -', the scheme proposed in [14] (enhanced JPEG-XR) is in '-o' and our scheme in continuous line.

Fig. 4. Logarithmic VDP plots for the images "honolulu cathedral", "oahu waimea2" and "waikiki at night" (top to bottom). JPEG-XR baseline is in '- -', the scheme proposed in [14] (enhanced JPEG-XR) is in '-o' and our scheme in continuous line.

V. CONCLUSION

In this work, an improvement of the JPEG-XR compression scheme is proposed. The proposed approach consists of replacing the PCT with an expansion in the Legendre polynomial basis. The motivation behind using this basis is that it exhibits some properties related to the human visual system and, therefore, it can be used to improve image representation. Until now, we have used objective quality tests to quantify our results. It has been shown in [15] that the correlation between MS-SSIM and subjective judgement is very high (about 90%). This allows it to be used directly. However, it will be interesting to assess subjectively the results obtained by our modified JPEG-XR scheme.

REFERENCES

[1] ISO, Reference software implementation of JPEG XR, ISO/IEC JTC1 SC29/WG1/N4560, April 2007.
[2] A. S. Blaivas, Visual analysis in unspecialized receptive fields as an orthogonal series expansion, Neurophysiology, Springer New York 6 (1974), no. 2, 1573–9007.
[3] C. Tu, S. Srinivasan, G. Sullivan, S. Regunathan, and H. S. Malvar, Low-complexity hierarchical lapped transform for lossy-to-lossless image coding in JPEG XR / HD Photo, Applications of Digital Image Processing XXXI, Proc. of SPIE 7073 (2008), 1–12.
[4] C. H. Lo, H. S. Don, 3D moment forms: their construction and application to object identification and positioning, IEEE Trans. Pattern Anal. Mach. Intell. 11 (1989), 1053–1064.
[5] H. S. Malvar, Biorthogonal and nonuniform lapped transforms for transform coding with reduced blocking and ringing artifacts, IEEE Trans. Signal Processing (1998), 1043–1053.
[6] L. M. Luo, X. H. Xie, X. D. Bao, A modified moment-based edge operator for rectangular pixel image, IEEE Trans. Circuits Systems Video Technol. 4 (1994), 552–554.
[7] M. R. Teague, Image Analysis via the General Theory of Moments, J. Optical Soc. of America 70 (1980), 920–930.
[8] M. Tuceryan, Moment-based texture segmentation, Pattern Recog. Lett. 15 (1994), 115123.
[9] R. Mantiuk, K. Myszkowski and H. P. Seidel, Visible Difference Predicator for High Dynamic Range Images, Proc. of IEEE International Conference on Systems, Man and Cybernetics (2004), 2763–2769.
[10] R. Mantiuk, S. Daly, K. Myszkowski and H. P. Seidel, Predicting Visible Differences in High Dynamic Range Images: Model and its Calibration, Proc. of Human Vision and Electronic Imaging X, IS&T/SPIE's 17th Annual Symposium on Electronic Imaging (2005), 204–214.
[11] R. Mukundan, S. H. Ong, P. A. Lee, Discrete vs Continuous Orthogonal Moments in Image Analysis, Proc. Int. Conf. on Imaging Systems, Science and Technology, CISST'2001, Las Vegas (2001), 23–29.
[12] S. Srinivasan, C. Tu, S. L. Regunathan, R. A. Rossi, and G. J. Sullivan, HD Photo: a new image coding technology for digital photography, Applications of Digital Image Processing XXX, Proceedings of SPIE, San Diego, CA USA 6696 (2007).
[13] S. Srinivasan, C. Tu, Zhi Zhou, D. Ray, S. Regunathan and G. J. Sullivan, An Introduction to the HD Photo Technical Design, JPEG document WG1 N4183.
[14] T. Richter, Visual quality improvement techniques of HDPHOTO/JPEG-XR, IEEE International Conference on Image Processing, ICIP 2008 (2008), 2888–2891.
[15] T. Richter and C. Larabi, Subjective and Objective Assessment of Visual Image Quality Metrics and Still Image Codecs, Proc. of the Data Compression Conference, DCC08 (2008).
[16] Z. Wang, E. P. Simoncelli and A. C. Bovik, Multi-scale structural similarity for image quality assessment, IEEE Asilomar Conf. on Signals, Systems and Computers (2003).
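As a supplementary note to the logarithmic plots of Section IV, the mapping of the two metrics to the logarithmic domain can be sketched as follows. This is only an illustration: the paper writes "log" without a base, so base 10 is an assumption here, and the function names `log_ms_ssim` and `log_vdp` are hypothetical. The scores themselves would come from external MS-SSIM and VDP implementations, which are not shown.

```python
import math


def log_ms_ssim(ms_ssim):
    """Map an MS-SSIM index to -20*log10(1 - MS-SSIM).

    Assumption: base-10 logarithm. An index of exactly 1 (perfect match)
    is mapped to +infinity.
    """
    if ms_ssim > 1.0:
        raise ValueError("MS-SSIM cannot exceed 1")
    if ms_ssim == 1.0:
        return math.inf
    return -20.0 * math.log10(1.0 - ms_ssim)


def log_vdp(ratio):
    """Map the VDP pixel ratio r (0 <= r <= 1) to -20*log10(r).

    r is the fraction of pixels the standard observer would detect as
    different; r = 0 (nothing detectable) is mapped to +infinity.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if ratio == 0.0:
        return math.inf
    return -20.0 * math.log10(ratio)


if __name__ == "__main__":
    for index in (0.9, 0.99, 0.999):
        print(index, round(log_ms_ssim(index), 2))
    print(0.05, round(log_vdp(0.05), 2))
```

On this scale, each additional 20 units corresponds to a tenfold reduction of 1 − MS-SSIM, or of the fraction of visibly different pixels, which is why small differences near a perfect score remain readable in the plots.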
