Comparing a Spatial Extension of ICTCP Color Representation with S-CIELAB and Other Recent Color Metrics for HDR and WCG Quality Assessment

https://doi.org/10.2352/ISSN.2470-1173.2020.15.COLOR-162 © 2020, Society for Imaging Science and Technology Comparing a spatial extension of ICTCP color representation with S-CIELAB and other recent color metrics for HDR and WCG quality assessment Anustup Choudhury, Scott Daly; Dolby Laboratories Inc.; Sunnyvale, CA, USA Abstract of lower spatial frequencies and non-contiguous regions, while Content created in High Dynamic Range (HDR) and Wide the natural imagery has much higher frequencies, masking due Color Gamut (WCG) is becoming more ubiquitous, driving the to textures, as well contiguous color region effects and gradients. need for reliable tools for evaluating the quality across the imag- Furthermore, while there has been significant evaluation for color ing ecosystem. One of the simplest techniques to measure the differences in complex (natural and civilized) imagery [2, 3], most quality of any video system is to measure the color errors. The tra- of this evaluation has been dominated by still images as opposed ditional color difference metrics such as DE00 and the newer HDR to video, and has been almost entirely limited to Standard Dy- specific metrics such as DEZ and DEITP compute color difference namic Range (SDR) content. on a pixel-by-pixel basis which do not account for the spatial ef- For this work, we evaluated several color difference met- fects (optical) and active processing (neural) done by the human rics on four publicly available HDR databases consisting of com- visual system. In this work, we improve upon the per-pixel DEITP plex images with various distortions, along with the correspond- color difference metric by performing a spatial extension simi- ing subjective scores. The different databases focus on differ- lar to what was done during the design of S-CIELAB. We quan- ent distortions and the aggregation of these covers a wide vari- tified the performance using four standard evaluation procedures ety of both luminance and chromatic distortions. Some of these on four publicly available HDR and WCG image databases and distortions result from tone-mapping and gamut mapping opera- found that the proposed metric results in a marked improvement tions, which tend to be dominated by lower frequencies due to with subjective scores over existing per-pixel color difference met- shallow gradients and easily-visible regions, but also can con- rics. tain step edge artifacts (i.e., having a 1/f spectrum). Other distortions include higher frequency distortions resulting from com- Introduction pression artifacts by various compression schemes such as JPEG, Millions of devices now support High Dynamic Range JPEG-XT, JPEG2000, and HEVC. These typically include ring- (HDR) and Wide Color Gamut (WCG) content. Display design, ing around sharp edges (‘mosquito noise’) and visible transform video processing algorithm design, system format development block boundaries (‘blocking’). While perceptually dominated by and comparison across products all require being able to evalu- luminance distortions, these also contain chromatic distortions ate the quality of HDR images in a perceptually relevant manner. due to chroma subsampling and different processes acting on Y, There is a vast body of literature on quality metrics relating to Cr and Cb signals. Image statistics play a key role in imaging image data compression for traditional standard dynamic range product design, as the era of displays being able to expect a certain (SDR) images and a steadily increasing amount of research on class of imagery (e.g., optically-captured, or text, or computer- HDR [1], but these are almost exclusively based exclusively on generated) has given way to systems being used for all types of the luminance component. However, color distortions are also imagery. Some key aspects of image statistics have been studies very important to assess because they are increasingly likely in for SDR, such as the 1= f N spatial frequency power spectra, the HDR1 systems, due to the additional gamut and tone mapping log-normal luminance histograms, and the principal components operations that are required to convert the source color volume to in descending variances of an achromatic and two uncorrelated a typically reduced display color volume. Most commonly used chromatic components. However, the same statistics for HDR color difference metrics have been designed to measure differ- images are less well understood, and even for SDR these statis- ences between simple test patches, as opposed to natural (com- tics do not take into account characteristics such as as texture vs. plex2) imagery. Typically, the size of the test patch does not smooth gradients, mixed illumination, frequency of emissive light match the size of the objects in an image, which are often sub- sources, depth of field, etc [4]. Consequently, to allow for ro- stantially smaller in terms of visual degrees (by a factor of 1/300 bustness, it is desirable to evaluate as many images as possible. in consumer TV applications). Other key differences between test Toward that goal, this work includes a total of 46 source images patches and complex imagery is that test patches typically consist and a total of 532 distorted images as evaluated by 94 observers across all four databases. 1 Since most HDR systems are also WCG, we will use the term ‘HDR’ In addition to the commonly used (DE00) color difference to include both types of advances metric, we compare several recent metrics derived for HDR ap- 2There are specific categories of imagery such as natural (i.e., no human-made objects), civilized (including human-made objects), real- plications: DEZ based on the Jzazbz color space, and DEITP based world (optically captured), synthetic (computer generated) so in this paper on the ICTCP color space. While the CIE L*a*b*-based met- we will use the term ‘complex imagery’ to include all cases rics have been shown to perform well for many SDR applications IS&T International Symposium on Electronic Imaging 2020 Color Imaging: Displaying, Processing, Hardcopy, and Applications 162-1 (product surface colors, graphic arts printing), they are known to ing above threshold (9 luminance steps from white to black and have significant problems with new image characteristics enabled typically less than 9 steps from neutral to the maximum satura- by HDR, such as shadow detail spanning several log units, emis- tion) and consequently under-predicts threshold visibility [15]. sive colors, specular reflections, scenes of mixed illumination, and So, it is a good candidate to describe as a macro-uniform color interscene adaptations (e.g., from scene changes like going into a space. There is current debate on whether the micro-uniform or cave, or turning on a light source). One source of the CIE L*a*b* macro-uniform color space will better predict the kinds of color problems is the cube-root based nonlinearity forming the back- distortions in complex imagery and of practical interest to busi- bone of L*, since it has the inability to handle the continuously ness. One goal of this paper is to see how effective the appearance differing non-linearities such as log behavior (Weber’s Law) for and threshold-based approaches are in predicting the kinds of dis- greater than 200cd=m2, the square-root behavior (Rose-Devries tortions in the databases. To quantify the performance of the dif- law) for less than 0:001cd=m2, and the continuum of changes be- ferent color difference metrics, we use four standard performance tween these extremes. On the other hand, CIE L*a*b* is vetted by evaluation procedures – Root Mean Square Error, Pearson Linear standards and has a long history of use, with many experts familiar Correlation Coefficient, Spearman Rank-Order Correlation Coef- with how to apply it. While we have omitted DE94 in this anal- ficient and Outlier Ratio. ysis, it was found to perform worse than DE00 in a HDR-WCG Typical color difference metrics are pixel-based operations study [5]. and results are shown on test patches. In our previous work [5], Both of the new HDR metrics should have better perfor- we measured the performance of various color difference metrics mance over the larger luminance ranges common to HDR, as on a database of natural HDR/WCG images. However, apply- they have been derived from the behavior of the CSF (contrast ing these metrics on a pixel-by-pixel basis, as done with DEZ and sensitivity function 6= spatial MTF) across a large range of light DEITP, without accounting for spatial information from neighbor- adaptation [6, 7], whereas the L* essentially models a single fixed ing pixels does not really mimic the Human Visual System (HVS). adaptation level. In addition, they are both physiologically more With such approaches, for the same magnitude of color distortion, realistic models since they apply a nonlinearity to the signals for a single pixel can have as large of an effect as a large image region the actual known L, M, S, cones in the retina, as opposed to the if a maximum error criteria is used. Averaging the results across psychophysically-derived X, Y, Z color matching signals as do an entire image can be used to avoid this kind of mis-assessment, the CIE L*a*b*-based metrics (which were developed before the but may hide large errors that span only a few pixels. Consider known L, M, S responses were measured). Working with LMS two cases of distortion. One is a 100x100 contiguous pixel re- cone responses allows for chromatic adaptation to be modeled gion while the other has the same number of pixels (10,000) and without the well-known ‘Wrong Von-Kries Adaptation’ distor- magnitude but the pixels are scattered across the image individu- tions that occur when working on the XYZ signals [8].

Load more