Impact Analysis of Baseband Quantizer on Coding Efficiency for HDR Video

1 Impact Analysis of Baseband Quantizer on Coding Efficiency for HDR Video Chau-Wai Wong, Member, IEEE, Guan-Ming Su, Senior Member, IEEE, and Min Wu, Fellow, IEEE Abstract—Digitally acquired high dynamic range (HDR) video saving the running time of the codec via computing numbers baseband signal can take 10 to 12 bits per color channel. It is in a smaller range, ii) handling the event of instantaneous economically important to be able to reuse the legacy 8 or 10- bandwidth shortage as a coding feature provided in VC-1 bit video codecs to efficiently compress the HDR video. Linear or nonlinear mapping on the intensity can be applied to the [15]–[17], or iii) removing the color precision that cannot be baseband signal to reduce the dynamic range before the signal displayed by old screens. is sent to the codec, and we refer to this range reduction step as Hence, it is important to ask whether reducing the bitdepth a baseband quantization. We show analytically and verify using for baseband signal is bad for coding efficiency measured test sequences that the use of the baseband quantizer lowers in HDR. Practitioners would say “yes”, but if one starts to the coding efficiency. Experiments show that as the baseband quantizer is strengthened by 1.6 bits, the drop of PSNR at a tackle this question formally, the answer is not immediately high bitrate is up to 1.60 dB. Our result suggests that in order clear as the change of the rate-distortion (RD) performance to achieve high coding efficiency, information reduction of videos is non-trivial: reducing the bitdepth for baseband signal while in terms of quantization error should be introduced in the video maintaining the compression strength of the codec will lead to codec instead of on the baseband signal. a smaller size of encoded bitstream and a larger error measured Index Terms—Reshaping, Quantization, High Dynamic Range in HDR. (HDR), Bitdepth, Transform Coding, HEVC/H.265 We approach this problem by establishing the relationship between the strength of the baseband quantizer and the coding I. INTRODUCTION efficiency measured in (peak) signal-to-noise ratio [(P)SNR]. Realizing more vivid digital videos relies on two main The (P)SNR measure on video signal stored in the PQ format aspects: more pixels and better pixels [1], [2]. The latter is can be approximately considered perceptually uniform because more important than the former nowadays when the resolution the PQ is by design a perceptually uniform representation goes beyond the high definition. At the signal level, achieving in its signal domain [8], [9]. It is beneficial to first model better pixels means adopting a wide color gamut (WCG), and the problem of quantifying the error in the reconstructed using a high dynamic range (HDR) to represent all colors with images [18] as the problem of quantifying the error in the small quantization errors [3]–[7]. reconstructed residues. We then examine the error of a single One efficient color coding standard that keeps the visibility quantizer, and arrive at Lemma 2 that serves as a primitive of quantization artifacts to a uniformly small level is the to facilitate the joint analysis on the effects of baseband and perceptual quantizer (PQ) [8], [9], but it still takes 12 bits to codec quantizers with a linear transform. represent all luminance levels. Economically, it is important The paper is organized as follows. In Section II, we simplify to be able to reuse the legacy 8 or 10-bit video codecs the practical HDR video coding pipeline into a theoretically such as H.264/AVC [10] and H.265/HEVC (without range tractable model, and then present the main derivation in Sec- extensions) [11] in order to efficiently compress HDR videos. tion III-A. Simulation results are presented in Section III-B to Linear or nonlinear mapping (e.g., reshaping [12]–[14]) on validate the derivation, and experimental results on videos are the intensity can be applied to the baseband signal to reduce presented in Section IV to confirm the theoretical explanation. the dynamic range before the signal is sent to the encoder, arXiv:1603.02980v3 [cs.MM] 1 Aug 2016 and we refer to this range reduction step as a baseband II. HDR VIDEO CODING PIPELINE MODELING quantization. Details of the baseband quantizer can be sent A. Quantifying Frame Error by Residue Error as side information to the decoder to recover the baseband signal. Even if a codec supports the dynamic range of a video, Block diagram shown in Fig. 1 (a) models the video coding range reduction can also be motivated by the needs of i) pipeline with the effect of baseband signal quantization. The HDR input to the pipeline is the HDR frame at time index t, It , Copyright (c) 2016 IEEE. Personal use of this material is permitted. with L pixels. The immediate input to the video codec It and However, permission to use this material for any other purposes must be ^HDR obtained from the IEEE by sending a request to [email protected]. final reconstructed output It are limited by the precision Manuscript received March 8, 2016; revised July 15, 2016; accepted July of the finite bits container, so pixels take values on the set 16, 2016. Date of publication July xx, 2016; date of current version July xx, q1 = fnq1jn 2 g. The immediate output pixels from the 2016. The associate editor coordinating the review of this manuscript and Z Z approving it for publication was Prof. Yao Zhao. codec take integer values due to the rounding operation at the C.-W. Wong and Min Wu are with the Department of Electrical and final stage of the codec, and the integer-valued vector Ît−1 Computer Engineering, and the Institute for Advanced Computer Studies, is used by intra- and inter-predictors collectively modeled as University of Maryland, College Park, MD 20742, USA. This work was initiated when C.-W. Wong was a research intern at Dolby Laboratories in pred(·). 2014. E-mail: (cwwong, minwu)@umd.edu. G.-M. Su is with Dolby Laboratories, Sunnyvale, CA 94085, USA. E-mail: Lemma 1 (frame error by residue error). The problem of [email protected]. quantifying the error of predictively coded video frames can 2 y 퐿 video codec 퐿 푢푣 frame 퐈푡 ∈ 푞1ℤ 퐈푡 ∈ ℤ 5 푅 1,1 signal y -1 푥푦 v u Q1 iQ1 T Q2 iQ2 T R Q1 iQ1 푅 0,2 HDR - HDR 퐈푡 퐈 푡 퐿 퐿 a ∈ ℝ ∈ 푞1ℤ 3 퐉푡 = pred 퐈푡−1 푥푦 (x0, y0) 푅 0,1 (a) 2 residual video codec signal -a 0 a -1 x 1 Q1 iQ1 T Q2 iQ2 T R Q1 iQ1 푅푥푦 0,0 HDR - HDR 퐫푡 퐫 푡 퐿 퐿 -1 1 ∈ ℝ ∈ 푞1ℤ x 퐉푡%퐪1 (b) -a Fig. 1: (a) Block diagram for the video coding process with the effect (a) (b) of baseband signal quantization, and (b) equivalent diagram of (a). Block R is the rounding to the nearest integer operation, round(x). Fig. 2: Illustration for MSE calculation for the cases that (a) any def def point (x; y) located within the 2a-by-2a square is quantized to Qi(x) = round (x=qi) ; iQi(x) = qi ·x; i = 1; 2 are quantization and dequantization, respectively. All operations are applied separately to the reconstruction centriod (x0; y0) that may or may not located each entry of x when x is a vector. within the square, and (b) a quantization in xy-plane is followed by a transform, a quantization in uv-plane, inverse transform, and a quantization in xy-plane. be reduced approximately to quantifying the error of non- predictively coded residues. 2 Fig. 2 (a) is located at (x0; y0) 2 R , not limited to be Proof: For simplicity, define the quantizer function within the region. We further assume that the point (X; Y ) ^ Qi(x) = iQi (Qi(x)). Denote the predicted frame pred(It−1) is uniformly distributed over the square, namely, the joint 1 2 as Jt, and it can be decomposed into the residue vector with distribution fX;Y (x; y) = 4a2 ; (x; y) 2 [−a; a] . The mean- the smallest absolute value for each coordinate, and a vector squared error (MSE) for the random vector (X; Y ) quantized of integer multiples of q1, namely, to/reconstructed at (x0; y0) is 2 Jt = Jt%q1 + Q1(Jt) (1) MSE = E k(X; Y ) − (x0; y0)k Z a Z a 2 where % is the modulo operation. Following Fig. 1 (a), the = k(x; y) − (x0; y0)k fX;Y (x; y) dx dy −a −a error due to the joint effect of baseband quantization and video a a ^HDR HDR 1 Z Z (5) compression It − It can be written as: = dy (x − x )2 + (y − y )2 dx 4a2 0 0 −1 HDR HDR −a −a Q1 T Q2 T Q1(It ) − Jt + Jt − It : (2) 2 2 2 L = d + a Substituting Eqn. (1) into (2) and moving Q1(Jt) 2 (q1Z) 3 q 2 2 2 into and out of the quantizer with step size 1, we obtain: where d = x0 + y0 is the squared Euclidean distance to the geometric center of the region, (0; 0), and 2 a2 is related to −1 HDR 3 Q1 T Q2 T Q1(It − Q1(Jt)) − Jt%q1 the strength of the quantizer. It is straight forward to extend HDR the result to the N-dimensional (N-d) case shown as follows: + Jt%q1 − It − Q1(Jt) : (3) Lemma 2 (quantization error). The mean-squared error for a HDR Here, It − Q1(Jt) can be considered as an intra- or inter- point that is uniformly distributed within an N-d hypercube HDR prediction residue, and we define it as rt .

Impact Analysis of Baseband Quantizer on Coding Efficiency for HDR Video

HDR PROGRAMMING Thomas J

A High Dynamic Range Video Codec Optimized by Large Scale Testing

SMPTE ST 2094 and Dynamic Metadata

Perceptual Signal Coding for More Efficient Usage of Bit Codes Scott Miller Mahdi Nezamabadi Scott Daly Dolby Laboratories, Inc

DVB SCENE | March 2017

Visualization & Analysis of HDR/WCG Content

Effect of Peak Luminance on Perceptual Color Gamut Volume

Measuring Perceptual Color Volume V7.1

Quick Reference HDR Glossary

High Dynamic Range Broadcasting

High Dynamic Range Versus Standard Dynamic Range Compression Efficiency

Content Aware Quantization: Requantization of High Dynamic Range Baseband Signals Based on Visual Masking by Noise and Texture