New Image Compression Artifact Measure Using Wavelets
Yung-Kai Lai and C.-C. Jay Kuo
Integrated Media Systems Center
Department of Electrical Engineering-Systems
University of Southern California
Los Angeles, CA 90089-2564

Jin Li
Sharp Labs. of America
5750 NW Pacific Rim Blvd., Camas, WA 98607

ABSTRACT

Traditional objective metrics for the quality of coded images, such as the mean squared error (MSE) and the peak signal-to-noise ratio (PSNR), do not correlate well with subjective human visual experience, since they do not take human perception into account. Quantification of artifacts resulting from lossy image compression techniques is studied based on a human visual system (HVS) model, and the time-space localization property of the wavelet transform is exploited to simulate HVS in this research. As a result, a new image quality measure based on wavelet basis functions is proposed. This new metric works for a wide variety of compression artifacts. Experimental results are given to demonstrate that it is more consistent with human subjective ranking.

KEYWORDS: image quality assessment, compression artifact measure, visual fidelity criterion, human visual system (HVS), wavelets.

1 INTRODUCTION

The objective of lossy image compression is to store image data efficiently by reducing the redundancy of image content and discarding unimportant information while keeping the quality of the image acceptable. Thus, the tradeoff in lossy image compression is between the number of bits required to represent an image and the quality of the compressed image. This is usually known as the rate-distortion tradeoff. The number of bits used to record the compressed image can be measured easily. However, the "closeness" between the compressed and the original images is not a purely objective measure, since human perception also plays an important role in determining the quality of the compressed image.
At present, the most widely used objective distortion measure is the mean squared error (MSE) or the related peak signal-to-noise ratio (PSNR). It is well known that MSE does not correlate very well with the visual quality perceived by human beings. This can be easily explained by the fact that MSE is computed by adding the squared differences of individual pixels without considering the visual interaction between adjacent pixels.

In comparison with the amount of research on new compression techniques, work on visual quality measures for compressed images is surprisingly scarce. Most visual quality assessments use rating scales ranging from "excellent" to "unsatisfactory" in measuring the perceived image quality. Human assistance is needed in the establishment of quality scales and in the comparison between compressed images and the set of training images. It is difficult to obtain objective and quantitative measures by using this approach [15]. Results achieved are subjective and qualitative. Some work tries to modify existing quantitative measures to accommodate the factor of human visual perception. One approach is to improve MSE by assigning different weights to neighboring regions at different distances from the focal pixel [17]. Most of these methods can be viewed as curve fitting to comply with the rating scale method.

Since the late 1970s, researchers have started to pay attention to the importance of the human visual system (HVS) and tried to include the HVS model in the image quality metric [10], [11]. The development of the HVS model at that time was not mature enough, and the proposed model could not interpret human visual perceptual phenomena very well. Recently, Karunasekera proposed an objective distortion measure to evaluate the blocking artifact of the block-based compression technique [12]. Watson [23] and van den Branden Lambrecht [1], [22] proposed more complete models and extended their use to compressed videos.
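To make the pixelwise nature of MSE concrete, the following is a minimal sketch of MSE and PSNR for grayscale images stored as lists of rows. The function names, the 8-bit peak value of 255, and the two toy distortions are assumptions chosen for illustration and do not come from the paper. The example builds two distorted images with identical MSE: a uniform brightness shift and a single isolated spike. MSE cannot tell them apart because it ignores where the errors fall relative to one another.

```python
import math


def mse(original, distorted):
    """Mean squared error between two equally sized grayscale images."""
    if len(original) != len(distorted):
        raise ValueError("images must have the same number of rows")
    total = 0.0
    count = 0
    for row_o, row_d in zip(original, distorted):
        if len(row_o) != len(row_d):
            raise ValueError("images must have the same number of columns")
        for o, d in zip(row_o, row_d):
            total += (o - d) ** 2
            count += 1
    return total / count


def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(original, distorted)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


# Toy 4x4 flat image (hypothetical data).
original = [[100] * 4 for _ in range(4)]

# Distortion A: every pixel is brightened by 2.
shifted = [[102] * 4 for _ in range(4)]

# Distortion B: a single pixel is brightened by 8, all others untouched.
spike = [row[:] for row in original]
spike[1][2] += 8

print(mse(original, shifted), mse(original, spike))
print(psnr(original, shifted), psnr(original, spike))
```

Both distortions yield the same MSE, and therefore the same PSNR, although one is a smooth global change and the other is a localized artifact.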
It has been shown by psychophysical experiments that the human visual system is composed of many units, each of which is sensitive to the contrast in some space-frequency localized region and functions independently. The overall visual perception of the luminance or contrast of an object is the aggregate performance of the bandpass frequency responses of all units [4]. Based upon this space-frequency localized model, the visual perception of coding artifacts can be interpreted as the perception of local frequency inconsistency. When an image is compressed and decompressed, the decoded image has different local frequency responses than the original image, thus resulting in compression artifacts.

Since each cell can sense only a finite region with respect to the focal pixel in an image, Fourier analysis is not suitable for the modeling of HVS due to its global analytical property. This issue has been addressed in the literature, and inconsistencies between theory and observed results have been found when pure frequency analysis is used. Another observation is that the common bandwidth of cortical cells is 1 octave [4]. It is known from experiments that the frequency resolution of the neighborhood is reduced by an octave when the distance from the focal point doubles [6]. This observation suggests the use of dyadic wavelets for the analysis and modeling of HVS.

In this work, we consider a wavelet approach for the modeling of HVS and then build an objective quality metric based on this model. To be more precise, we first propose a new definition of contrast with respect to complex images by taking the wavelet transform of the image and using wavelet coefficients to estimate the contrast at each resolution in the image.
By using the multiresolution and space-frequency localization properties, many known inconsistencies in the psychophysical literature can be understood naturally. Known human visual system functions such as the contrast sensitivity threshold [20] and the frequency masking effect [21] can also be incorporated in this model. We then propose an objective metric to measure the extent of the perceived contrast at every resolution, and derive an error measure by examining the weighted differences of wavelet components between the original and compressed images at each resolution.

This paper is organized as follows. In Section 2, we briefly review existing HVS theory and models. Then, the wavelet approach is introduced to simulate the localization phenomenon of HVS in Section 3. A new HVS model based on wavelets is proposed and an image quality assessment system is described in Section 4. Experimental results are given in Section 5 to demonstrate that this new metric works for a wide range of compression artifacts and that the resulting measure is consistent with human subjective ranking.

2 MODEL FOR HUMAN VISUAL SYSTEM (HVS)

2.1 Contrast thresholds

The human visual system (HVS) responds to light stimuli from its surroundings. One important observation is that visual perception is sensitive to luminance difference rather than absolute luminance. Let $L_{\max}$ and $L_{\min}$ be the maximum and minimum luminances of the waveform around the point of interest. Michelson's contrast, defined as

$$C = \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}}, \tag{1}$$

is found to be nearly constant when used to represent the just noticeable luminance difference. Psychophysical experiments showed that HVS is composed of many units, each of which is focused on a certain point in the visual field and is sensitive only to the contrast in a certain frequency band. The overall visual perception of object luminance or contrast is the aggregate performance of each cell's frequency response [4].
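As a small illustration of Eq. (1), the sketch below computes Michelson's contrast from a set of luminance samples taken around a point of interest. The function name and the sample values are hypothetical. The example shows why this definition reflects relative rather than absolute luminance: scaling all luminances by the same factor leaves the contrast unchanged.

```python
def michelson_contrast(luminances):
    """Michelson's contrast (L_max - L_min) / (L_max + L_min) of a local waveform."""
    l_max = max(luminances)
    l_min = min(luminances)
    if l_min < 0:
        raise ValueError("luminance values must be non-negative")
    if l_max + l_min == 0:
        # A completely dark region has no defined contrast; report zero.
        return 0.0
    return (l_max - l_min) / (l_max + l_min)


# Hypothetical luminance samples of a dim and a bright version of one pattern.
dim_pattern = [40.0, 50.0, 60.0, 50.0]
bright_pattern = [10.0 * v for v in dim_pattern]

print(michelson_contrast(dim_pattern))
print(michelson_contrast(bright_pattern))
```

Both calls give the same contrast even though the absolute luminance difference is ten times larger in the second pattern.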
Since HVS cannot provide infinite luminance resolution, a contrast threshold exists in each frequency band. The contrast threshold value is a function of the spatial frequency and can be determined experimentally. A typical contrast sensitivity curve, defined as the reciprocal of the contrast threshold curve, is shown in Fig. 1 [4]. As shown in the figure, HVS has the highest luminance resolution around 3-10 cycles per degree, and the sensitivity attenuates at both the high and low frequency ends. An image artifact can be sensed if its contrast is above the threshold at specific spatial frequencies.

[Figure 1: A typical contrast sensitivity curve of human beings. Horizontal axis: spatial frequency (cycles/deg); vertical axis: sensitivity threshold; both axes logarithmic.]

2.2 Suprathreshold contrast

Subthreshold stimuli, i.e., stimuli with a contrast lower than the threshold, cannot be sensed by humans. The suprathreshold contrast is of concern in this subsection. Visual response to the suprathreshold contrast involves human subjective rating. Even though it is difficult to find a precise formula for its modeling, it is generally agreed that the estimated response $R$ is a function of the spatial frequency and follows a power law [14]:

$$R = k\,(C - C_T)^p, \tag{2}$$

where $C$ is the suprathreshold contrast, $C_T$ is the contrast threshold, and the exponent $p$ varies between 0.38 and some value more than 1.0 [7]. Moreover, the contrast constancy phenomenon [9] states that the suprathreshold response is almost constant for a given contrast over a wide range of spatial frequencies. These two seemingly contradictory observations were reconciled first by Cannon [26] and then by Georgeson's 3-stage model [8].
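The threshold and power-law relations above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the defaults k = 1.0 and p = 0.5 are arbitrary picks (p merely lies inside the range quoted above), and the sensitivity value of 100 is invented rather than read from Fig. 1. Subthreshold stimuli are mapped to a zero response, following the statement that they cannot be sensed.

```python
def threshold_from_sensitivity(sensitivity):
    """Contrast threshold as the reciprocal of contrast sensitivity."""
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    return 1.0 / sensitivity


def suprathreshold_response(contrast, threshold, k=1.0, p=0.5):
    """Estimated response R = k * (C - C_T)**p, and 0 at or below threshold.

    k and p are illustrative assumptions, not values taken from the paper.
    """
    if contrast <= threshold:
        return 0.0
    return k * (contrast - threshold) ** p


# Hypothetical sensitivity at some mid-range spatial frequency.
c_t = threshold_from_sensitivity(100.0)

for c in (0.005, 0.01, 0.05, 0.26):
    print(c, suprathreshold_response(c, c_t))
```

With these assumed parameters, contrasts at or below the threshold produce no response, and the response grows sublinearly once the threshold is exceeded.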