Video Quality Assessment Using Spatio-Velocity Contrast Sensitivity Function
IEICE TRANS. INF. & SYST., VOL.Exx-??, NO.xx XXXX 200x

PAPER

Video Quality Assessment Using Spatio-Velocity Contrast Sensitivity Function

Keita HIRAI†a), Jambal TUMURTOGOO†, Student Members, Ayano KIKUCHI†, Nonmember, Norimichi TSUMURA†, Toshiya NAKAGUCHI†, and Yoichi MIYAKE††, Members

SUMMARY With the development and popularization of high-definition televisions, digital video cameras, Blu-ray discs, digital broadcasting, IP television, and so on, identifying and quantifying video quality degradations has become an important task. In this paper, we propose SV-CIELAB, an objective video quality assessment (VQA) method that uses a spatio-velocity contrast sensitivity function (SV-CSF). In SV-CIELAB, motion information in videos is effectively utilized to filter out imperceptible information in the spatial frequency domain. As the filter applied to videos, we use the SV-CSF, a modulation transfer function of the human visual system that describes the relationship among contrast sensitivity, spatial frequency, and the velocity of perceived stimuli. In the filtering process, the SV-CSF cannot be applied directly in the spatial frequency domain, because velocity information is defined in spatial coordinates. For filtering by the SV-CSF, we therefore separate each video frame into bands of limited spatial frequency; using the velocity information, the separated frames are weighted by the contrast sensitivities of the SV-CSF model. In SV-CIELAB, the quality criterion is obtained by calculating the image difference between the filtered original and distorted videos. To validate SV-CIELAB, we conducted subjective evaluation experiments. The subjective results were compared with SV-CIELAB and conventional VQA methods such as the CIELAB color difference, Spatial-CIELAB, and the signal-to-noise ratio. The experimental results show that SV-CIELAB is a more effective VQA method than the conventional ones.

key words: objective video quality assessment, spatio-velocity contrast sensitivity function, image difference, subjective evaluation, rank order correlation coefficient

1. Introduction

Recently, with the development and popularization of high-definition televisions, digital video cameras, Blu-ray discs, digital broadcasting, IP television, and so on, high-quality videos have come into wide use in daily life. As high-quality videos become popular, it is increasingly important to identify and quantify video quality degradations caused by video data compression.

Since humans are the ultimate evaluators of video quality, the most reliable approach to video quality assessment (VQA) is subjective evaluation. However, subjective evaluation is expensive and is therefore not appropriate as a versatile VQA method. Objective VQA methods, on the other hand, are inexpensive compared with subjective ones. In objective evaluations, image quality factors are generally quantified by criteria such as sharpness, color reproduction, tone reproduction, and noise characteristics. The simplest and most widely used image quality assessment (IQA) measures are the mean square error (MSE) and the peak signal-to-noise ratio (PSNR), but they are not well correlated with perceived visual quality [1-5]. Since images are perceived through the human visual system, objective IQA and VQA methods should be designed to incorporate human visual characteristics [1]. Based on this observation, Wang et al. proposed the structural similarity index (SSIM) [6-8]. Although SSIM is motivated by assumptions about human visual perception, it does not model exact human visual characteristics.

Conventional IQA and VQA methods that do incorporate human visual characteristics frequently use contrast sensitivity functions (CSFs). A CSF is a modulation transfer function of the human visual system and has a band-pass characteristic in the spatial frequency domain [9]. In 1996, Zhang et al. proposed Spatial-CIELAB (S-CIELAB) [10]. To compute image degradations between original and distorted images, they applied a CSF to filter the images; the quality criterion is obtained by calculating the CIELAB color difference between the filtered original and distorted images at each pixel. Although S-CIELAB is useful for still-image quality assessment, it does not consider the temporal characteristics of the human visual system and is therefore insufficient for VQA. To evaluate video quality, Tong et al. proposed spatio-temporal CIELAB (ST-CIELAB), an extension of S-CIELAB [11]. They used a spatio-temporal CSF [12][13] to filter the original and distorted videos. Although ST-CIELAB was proposed as a VQA method, a previous study reported that it is not much more effective than S-CIELAB [14]. The reason is that the spatio-temporal CSF does not contain eye movement characteristics: when observers evaluate video quality, their eyes are considered to track moving objects in the video. To build more useful VQA methods, eye movement characteristics should be incorporated.

In this paper, therefore, we propose SV-CIELAB, a VQA method that uses a spatio-velocity contrast sensitivity function (SV-CSF) [15-17]. The SV-CSF describes the relationship among contrast sensitivity, spatial frequency, and stimulus velocity, and it incorporates the eye movement characteristics of following moving objects. We assume that the eyes track moving objects in a video while observers evaluate its quality. Based on this assumption, the SV-CSF is applied to filter the original and distorted videos, and the quality criterion of our method is obtained by calculating the image difference between the filtered videos. Furthermore, we conducted subjective experiments to validate SV-CIELAB. The subjective results were compared with SV-CIELAB and conventional VQA methods, namely PSNR, SSIM, the CIELAB color difference, S-CIELAB, and ST-CIELAB.

†Graduate School of Advanced Integration Science, Chiba University, 1-33, Yayoi-cho, Inage-ku, Chiba, 263-8522, Japan
††Research Center for Frontier Medical Engineering, Chiba University, 1-33, Yayoi-cho, Inage-ku, Chiba, 263-8522, Japan
a) E-mail: [email protected]
DOI: 10.1587/transinf.E0.D.1
Copyright © 200x The Institute of Electronics, Information and Communication Engineers

2. Related Work

2.1 Spatial-CIELAB and Spatio-Temporal CIELAB

Figure 1 shows an overview of the color difference calculation in S-CIELAB and ST-CIELAB [10-14]. The original and distorted R, G, B images are each transformed into the opponent color components A (luminance channel), T (r/g channel), and D (b/y channel). Then spatial frequency filtering (in S-CIELAB) or spatio-temporal frequency filtering (in ST-CIELAB) is performed. The filters are CSFs, which represent the band-pass characteristics of the human visual system. The filtered images are transformed into X, Y, Z colorimetric values and then into L*, a*, b* values. Finally, the color difference is calculated pixel by pixel, and the mean difference is computed.

Fig. 1 Overview of S-CIELAB and ST-CIELAB (original and distorted R, G, B images → opponent color space A, T, D → CSF filtering → X, Y, Z → L*, a*, b* → color difference calculation).

2.2 Spatio-Velocity Contrast Sensitivity Function

The SV-CSF model was originally proposed by S. Daly et al. [15][16], based on the experimental data of previous works [9][12] and their own brief experiments. However, the parameters of the model were not validated sufficiently. In 2007, Hirai et al. conducted more detailed experiments to measure contrast sensitivities [17]. In their experiments, Gabor patterns moving from left to right on a display device were used as stimuli. A fixation point was also displayed at the center of each stimulus and moved along with it. The observers were instructed to fix their gaze on the fixation point, and contrast sensitivities were measured. Based on the measured data, the parameters of the SV-CSF model were optimized.

Figure 2 shows the SV-CSF model [17]. The SV-CSF describes the relationship among contrast sensitivity, spatial frequency, and stimulus velocity. As shown in Fig. 2, the SV-CSF has band-pass characteristics, and the peak of the contrast sensitivity at 0 degrees/second ("degree" means visual degree) lies at around 3 cycles/degree. As the velocity increases, the peak shifts toward lower frequencies. Ten subjects participated in the contrast sensitivity measurements [17], and the results of all subjects showed the same tendency: the SV-CSF has band-pass characteristics, and its peak moves to lower frequencies as the stimulus velocity increases.

Fig. 2 SV-CSF model (contrast sensitivity as a function of spatial frequency (cyc/deg) and velocity (deg/sec)).

As shown in Fig. 2, the contrast sensitivities depend on the velocity of the moving stimulus. If the eyes followed the fixation point perfectly, the retinal image of a moving stimulus would ideally be the same as that of a still stimulus. In practice, however, the contrast sensitivities changed with the stimulus velocity. S. Daly explained that this is caused by smooth pursuit eye movements [15]: smooth pursuit cannot track moving stimuli with perfect accuracy.

Fig. 3 Overview of the proposed SV-CIELAB (Y channel of CIEXYZ of the original and distorted videos → Fourier transform → masking into spatial frequency bands from low to high → inverse Fourier transform → per-pixel velocity calculation and weighting by the SV-CSF → image difference calculation).
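As a concrete illustration of the MSE/PSNR baselines discussed in Sect. 1, a minimal sketch follows. The function names are ours, not the paper's, and 8-bit images are assumed for the default peak value.

```python
import numpy as np

def mse(ref: np.ndarray, dist: np.ndarray) -> float:
    """Mean square error between two images of equal shape."""
    diff = ref.astype(np.float64) - dist.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(ref: np.ndarray, dist: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (peak=255 assumes 8-bit images)."""
    m = mse(ref, dist)
    if m == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / m)
```

As the paper notes, such signal-level measures ignore the human visual system, which is precisely what the CSF-based methods address.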
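The final step of the S-CIELAB-style pipeline in Sect. 2.1, the per-pixel color difference and its mean, can be sketched as follows. This uses the simple CIE76 ΔE*ab formula; the CSF filtering itself is abstracted as a caller-supplied function, since the exact filters of [10][11] are not reproduced here.

```python
import numpy as np

def delta_e_ab(lab1: np.ndarray, lab2: np.ndarray) -> np.ndarray:
    """Per-pixel CIE76 color difference between two H x W x 3 Lab images."""
    return np.sqrt(np.sum((lab1 - lab2) ** 2, axis=-1))

def mean_color_difference(lab1, lab2, csf_filter) -> float:
    """Apply a CSF-style filter to both Lab images, then average the
    per-pixel color difference (the S-CIELAB/ST-CIELAB criterion)."""
    f1 = csf_filter(lab1)
    f2 = csf_filter(lab2)
    return float(np.mean(delta_e_ab(f1, f2)))
```

In the actual methods the filtering is done in the opponent color space A, T, D before conversion to L*, a*, b*; the sketch above only shows where the pixel-wise difference and the mean enter the computation.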
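The SV-CSF of Daly builds on Kelly's spatio-velocity model. The sketch below uses the commonly cited form of Kelly's formula with Daly's coefficients and smooth-pursuit compensation; the constants are illustrative values from that literature and may differ from the parameters optimized in [17], so treat this as a qualitative sketch, not the paper's exact model.

```python
import math

# Coefficients as commonly cited for Daly's tuning of Kelly's model
# (assumed values, not the parameters fitted in [17]).
C0, C1, C2 = 1.14, 0.67, 1.7

def sv_csf(rho: float, v: float) -> float:
    """Contrast sensitivity at spatial frequency rho (cyc/deg)
    and retinal velocity v (deg/sec), Kelly/Daly form."""
    v = max(v, 0.1)  # avoid log10(0) for near-static stimuli
    k = 6.1 + 7.3 * abs(math.log10(C2 * v / 3.0)) ** 3
    rho_max = 45.9 / (C2 * v + 2.0)  # peak frequency falls with velocity
    return (k * C0 * C2 * v * (C1 * 2.0 * math.pi * rho) ** 2
            * math.exp(-C1 * 4.0 * math.pi * rho / rho_max))

def retinal_velocity(v_image: float) -> float:
    """Residual retinal velocity after imperfect smooth pursuit
    (Daly-style eye movement model; constants are assumptions)."""
    v_eye = min(0.98 * v_image + 0.15, 80.0)  # pursuit gain, capped
    return abs(v_image - v_eye)
```

The code reflects the two properties described in Sect. 2.2: the function is band-pass in rho, and its peak shifts to lower frequencies as v grows; the pursuit model explains why sensitivity still varies with stimulus velocity even when observers fixate a tracking point.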
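The band-separation step of Fig. 3 (mask in the Fourier domain, inverse transform each band, then weight per pixel by the SV-CSF at that band's frequency and the local velocity) can be sketched as below. The annular masks and uniform band spacing are our simplification, not the paper's exact separation scheme.

```python
import numpy as np

def band_decompose(y: np.ndarray, n_bands: int):
    """Split a grayscale image into spatial-frequency bands using
    annular masks in the shifted Fourier domain (simplified)."""
    h, w = y.shape
    fy = np.fft.fftshift(np.fft.fft2(y))
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)  # radial frequency index
    r_max = r.max()
    bands = []
    for i in range(n_bands):
        lo = i * r_max / n_bands
        hi = (i + 1) * r_max / n_bands
        # last band is closed at the top so the masks partition the plane
        mask = (r >= lo) if i == n_bands - 1 else (r >= lo) & (r < hi)
        bands.append(np.real(np.fft.ifft2(np.fft.ifftshift(fy * mask))))
    return bands

def sv_filter(y, velocity, csf, band_freqs):
    """Weight each band at every pixel by csf(band frequency, local
    velocity) and sum the bands (the weighting process of Fig. 3)."""
    bands = band_decompose(y, len(band_freqs))
    out = np.zeros_like(y, dtype=float)
    for band, freq in zip(bands, band_freqs):
        out += band * csf(freq, velocity)  # csf may broadcast over velocity
    return out
```

Because the masks partition the frequency plane, a constant unit weighting reconstructs the original frame exactly; the SV-CSF weights then attenuate each band according to the local motion, which is why the method needs the spatial decomposition rather than a single frequency-domain multiplication.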