IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 13, NO. 4, APRIL 2004

Image Quality Assessment: From Error Visibility to Structural Similarity

Zhou Wang, Member, IEEE, Alan C. Bovik, Fellow, IEEE, Hamid R. Sheikh, Student Member, IEEE, and Eero P. Simoncelli, Senior Member, IEEE

The work of Z. Wang and E. P. Simoncelli was supported by the Howard Hughes Medical Institute. The work of A. C. Bovik and H. R. Sheikh was supported by the National Science Foundation and the Texas Advanced Research Program. Z. Wang and E. P. Simoncelli are with the Howard Hughes Medical Institute, the Center for Neural Science and the Courant Institute for Mathematical Sciences, New York University, New York, NY 10012 USA (email: [email protected]; [email protected]). A. C. Bovik and H. R. Sheikh are with the Laboratory for Image and Video Engineering (LIVE), Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712 USA (email: [email protected]; [email protected]).

Abstract— Objective methods for assessing perceptual image quality have traditionally attempted to quantify the visibility of errors between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.¹

¹A MatLab implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.

Keywords— Error sensitivity, human visual system (HVS), image coding, image quality assessment, JPEG, JPEG2000, perceptual quality, structural information, structural similarity (SSIM).

I. Introduction

Digital images are subject to a wide variety of distortions during acquisition, processing, compression, storage, transmission and reproduction, any of which may result in a degradation of visual quality. For applications in which images are ultimately to be viewed by human beings, the only "correct" method of quantifying visual image quality is through subjective evaluation. In practice, however, subjective evaluation is usually too inconvenient, time-consuming and expensive. The goal of research in objective image quality assessment is to develop quantitative measures that can automatically predict perceived image quality.

An objective image quality metric can play a variety of roles in image processing applications. First, it can be used to dynamically monitor and adjust image quality. For example, a network digital video server can examine the quality of video being transmitted in order to control and allocate streaming resources. Second, it can be used to optimize algorithms and parameter settings of image processing systems. For instance, in a visual communication system, a quality metric can assist in the optimal design of prefiltering and bit assignment algorithms at the encoder, and of optimal reconstruction, error concealment and postfiltering algorithms at the decoder. Third, it can be used to benchmark image processing systems and algorithms.

Objective image quality metrics can be classified according to the availability of an original (distortion-free) image, with which the distorted image is to be compared. Most existing approaches are known as full-reference, meaning that a complete reference image is assumed to be known. In many practical applications, however, the reference image is not available, and a no-reference or "blind" quality assessment approach is desirable. In a third type of method, the reference image is only partially available, in the form of a set of extracted features made available as side information to help evaluate the quality of the distorted image. This is referred to as reduced-reference quality assessment. This paper focuses on full-reference image quality assessment.

The simplest and most widely used full-reference quality metric is the mean squared error (MSE), computed by averaging the squared intensity differences of distorted and reference image pixels, along with the related quantity of peak signal-to-noise ratio (PSNR). These are appealing because they are simple to calculate, have clear physical meanings, and are mathematically convenient in the context of optimization. But they are not very well matched to perceived visual quality (e.g., [1]–[9]). In the last three decades, a great deal of effort has gone into the development of quality assessment methods that take advantage of known characteristics of the human visual system (HVS). The majority of the proposed perceptual quality assessment models have followed a strategy of modifying the MSE measure so that errors are penalized in accordance with their visibility. Section II summarizes this type of error-sensitivity approach and discusses its difficulties and limitations. In Section III, we describe a new paradigm for quality assessment, based on the hypothesis that the HVS is highly adapted for extracting structural information. As a specific example, we develop a measure of structural similarity that compares local patterns of pixel intensities that have been normalized for luminance and contrast. In Section IV, we compare the test results of different quality assessment models against a large set of subjective ratings gathered for a database of 344 images compressed with JPEG and JPEG2000.
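For concreteness, the computations just described can be written in a few lines. The following Python sketch is an editorial illustration, not part of the original paper; the peak value of 255 is an assumption appropriate for 8-bit images and would change with bit depth.

```python
import numpy as np

def mse(reference: np.ndarray, distorted: np.ndarray) -> float:
    # Average of squared intensity differences between corresponding pixels.
    ref = reference.astype(np.float64)
    dis = distorted.astype(np.float64)
    return float(np.mean((ref - dis) ** 2))

def psnr(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
    # Peak signal-to-noise ratio in dB; `peak` assumes 8-bit images.
    err = mse(reference, distorted)
    if err == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / err)
```

Both measures depend only on pointwise intensity differences, which is exactly why two images with equal MSE can look very different, as discussed in Section II.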
[Fig. 1 diagram: reference and distorted signals → pre-processing → CSF filtering → channel decomposition → error normalization → error pooling → quality/distortion measure.] Fig. 1. A prototypical quality assessment system based on error sensitivity. Note that the CSF feature can be implemented either as a separate stage (as shown) or within "Error Normalization".

II. Image Quality Assessment Based on Error Sensitivity

An image signal whose quality is being evaluated can be thought of as a sum of an undistorted reference signal and an error signal. A widely adopted assumption is that the loss of perceptual quality is directly related to the visibility of the error signal. The simplest implementation of this concept is the MSE, which objectively quantifies the strength of the error signal. But two distorted images with the same MSE may have very different types of errors, some of which are much more visible than others. Most perceptual image quality assessment approaches proposed in the literature attempt to weight different aspects of the error signal according to their visibility, as determined by psychophysical measurements in humans or physiological measurements in animals. This approach was pioneered by Mannos and Sakrison [10], and has been extended by many other researchers over the years. Reviews on image and video quality assessment algorithms can be found in [4], [11]–[13].

A. Framework

Fig. 1 illustrates a generic image quality assessment framework based on error sensitivity. Most perceptual quality assessment models can be described with a similar diagram, although they differ in detail. The stages of the diagram are as follows:

Pre-processing. This stage typically performs a variety of basic operations to eliminate known distortions from the images being compared. First, the distorted and reference signals are properly scaled and aligned. Second, the signal might be transformed into a color space (e.g., [14]) that is more appropriate for the HVS.

CSF Filtering. The contrast sensitivity function (CSF) describes the sensitivity of the HVS to the different spatial and temporal frequencies that are present in the visual stimulus. Some quality metrics include a stage that weights the signal according to this function (typically implemented using a linear filter that approximates the frequency response of the CSF). However, many recent metrics choose to implement CSF as a base-sensitivity normalization factor after channel decomposition.

Channel Decomposition. The images are typically separated into subbands (commonly called "channels" in the psychophysics literature) that are selective for spatial and temporal frequency as well as orientation. While some quality assessment methods implement sophisticated channel decompositions that are believed to be closely related to the neural responses in the primary visual cortex [2], [15]–[19], many metrics use simpler transforms such as the discrete cosine transform (DCT) [20], [21] or separable wavelet transforms [22]–[24]. Channel decompositions tuned to various temporal frequencies have also been reported for video quality assessment [5], [25].
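As an illustration of the simpler end of this design space, the sketch below (an editorial example, not taken from any of the cited metrics) performs one level of a separable Haar wavelet decomposition, splitting an image into four subbands selective for coarse structure and for horizontal, vertical and diagonal detail:

```python
import numpy as np

def haar_channels(img: np.ndarray) -> dict:
    # One level of an orthonormal separable Haar transform: four subbands
    # ("channels") selective for low frequencies (LL) and for detail in
    # different orientations (LH, HL, HH).
    x = img.astype(np.float64)
    h, w = x.shape
    x = x[: h - h % 2, : w - w % 2]                 # crop to even dimensions
    lo = (x[0::2, :] + x[1::2, :]) / np.sqrt(2.0)   # lowpass along rows
    hi = (x[0::2, :] - x[1::2, :]) / np.sqrt(2.0)   # highpass along rows
    return {
        "LL": (lo[:, 0::2] + lo[:, 1::2]) / np.sqrt(2.0),
        "LH": (lo[:, 0::2] - lo[:, 1::2]) / np.sqrt(2.0),
        "HL": (hi[:, 0::2] + hi[:, 1::2]) / np.sqrt(2.0),
        "HH": (hi[:, 0::2] - hi[:, 1::2]) / np.sqrt(2.0),
    }
```

Recursing on the LL subband yields a multi-scale decomposition akin to the separable wavelet transforms cited above; cortically inspired decompositions add finer orientation tuning at considerably higher cost.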
Error Normalization. The error (difference) between the decomposed reference and distorted signals in each channel is calculated and normalized according to a certain masking model, which takes into account the fact that the presence of one image component will decrease the visibility of another image component that is proximate in spatial or temporal location, spatial frequency, or orientation. The normalization mechanism weights the error signal in a channel by a space-varying visibility threshold [26]. The visibility threshold at each point is calculated based on the energy of the reference and/or distorted coefficients in a neighborhood (which may include coefficients from within a spatial neighborhood of the same channel as well as other channels) and the base-sensitivity for that channel. The normalization process is intended to convert the error into units of just noticeable difference (JND). Some methods also consider the effect of contrast response saturation (e.g., [2]).
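A minimal sketch of such a normalization for a single channel follows, assuming a crude intra-channel masking model in which the local energy of the reference coefficients raises the visibility threshold. The constant base_threshold and the 5×5 neighborhood are illustrative placeholders, not values from any published metric:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def jnd_normalized_error(ref_band: np.ndarray, dist_band: np.ndarray,
                         base_threshold: float = 1.0, nbhd: int = 5) -> np.ndarray:
    # Per-coefficient error in one channel, divided by a space-varying
    # visibility threshold so that magnitudes near 1 correspond to roughly
    # one just noticeable difference (JND).
    error = ref_band.astype(np.float64) - dist_band.astype(np.float64)
    local_energy = uniform_filter(ref_band.astype(np.float64) ** 2, size=nbhd)
    # Masking: stronger local reference activity raises the threshold.
    threshold = base_threshold * np.sqrt(1.0 + local_energy)
    return error / threshold
```

A full metric would apply this to every channel, with a per-channel base sensitivity (the CSF factor mentioned above), and then pool the normalized errors into a single quality score.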