
arXiv:2012.11152v1 [eess.IV] 21 Dec 2020

Explainable Machine Learning based Transform Coding for High Efficiency Intra Prediction

Na Li, Yun Zhang, Senior Member, IEEE, C.-C. Jay Kuo, Fellow, IEEE

Na Li and Yun Zhang are with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, e-mail: {na.li1, yun.zhang}@siat.ac.cn. C.-C. Jay Kuo is with the Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California, USA, email: [email protected].

Abstract—Machine learning techniques provide a chance to explore the coding performance potential of transform. In this work, we propose an explainable machine learning based transform coding to improve intra video coding efficiency. Firstly, we model the machine learning based transform design as an optimization problem of maximizing the energy compaction or decorrelation capability. The explainable machine learning based transform, i.e., Subspace Approximation with Adjusted Bias (Saab) transform, is analyzed and compared with the mainstream Discrete Cosine Transform (DCT) on their energy compaction and decorrelation capabilities. Secondly, we propose a Saab transform based intra video coding framework with off-line Saab transform learning. Meanwhile, intra mode dependent Saab transform is developed. Then, the Rate-Distortion (RD) gain of Saab transform based intra video coding is theoretically and experimentally analyzed in detail. Finally, three strategies on integrating the Saab transform into intra video coding are developed to improve the coding efficiency. Experimental results demonstrate that the proposed 8×8 Saab transform based intra coding can achieve Bjøntegaard Delta Bit Rate (BDBR) from -1.19% to -10.00%, and -3.07% on average, as compared with the mainstream 8×8 DCT based coding scheme.

Index Terms—Video Coding, Explainable Machine Learning, Transform based Coding, Intra prediction, Subspace Approximation with Adjusted Bias (Saab), Rate-Distortion Optimization (RDO), Energy Compaction, Decorrelation.

I. INTRODUCTION

VIDEO data contributes the most to the increasing data volume in the era of big data due to its realistic representation and wide application. Commercial broadcasting video resolutions are expected to be extended from 4K Ultra-High Definition (UHD) to 8K UHD in the near future. Meanwhile, High Dynamic Range (HDR), Three-Dimension (3D) and holographic videos boost the attractiveness of video by providing realistic, immersive and 3D visual experiences, e.g., in Virtual Reality (VR). In addition, the usage of these video applications keeps growing with the increasing number of video devices connected to the internet or the Internet of Things (IoT), e.g., smartphones, surveillance cameras, drones, TVs and laptops. Along with the increase of both video usage and video quality, the global video data volume doubles every two years, which becomes the bottleneck for video storage and transmission over the network. In the development of video coding standards, from MPEG-2 and H.264/Advanced Video Coding (AVC) to H.265/High Efficiency Video Coding (HEVC) and the latest Versatile Video Coding (VVC) [1], the video compression ratio has been doubled almost every ten years.
Although researchers have kept improving video coding techniques in the past decades, there is still a gap between the improvement in compression ratio and the increase of global video data volume, so video coding techniques with higher compression ratio are still highly desired. To compress the big video data, a hybrid video coding framework composed of predictive coding, transform coding and entropy coding has been adopted. Firstly, predictive coding removes the spatial and temporal redundancies of video content by exploiting the correlation among spatially neighboring blocks or temporally successive frames. Higher prediction accuracy leads to fewer residuals to be encoded and thus a higher compression ratio. Predictive coding can be classified into intra prediction and inter prediction based on whether its reference pixels come from the spatial or the temporal domain. Secondly, transform coding mainly consists of transform and quantization, in which residuals from predictive coding are transformed to the spectral domain and then quantized to further exploit spatial and spectral perceptual redundancies. For example, larger quantization scales could generally be given to high frequency coefficients than to low frequency ones, as the Human Vision System (HVS) is more sensitive to low frequencies. Finally, entropy coding approaches the entropy of the transformed coefficients by exploiting their statistical properties, where symbols with higher probability are encoded with fewer bits whereas symbols of lower probability are encoded with more bits. In this paper, we focus on developing an explainable machine learning based transform coding that improves the coding efficiency in the hybrid video coding framework.

In the latest three generations of video coding standards, the Karhunen-Loève Transform (KLT) is an ideal transform for energy compaction and decorrelation, which requires calculating an autocorrelation matrix for the input source. In video coding, the autocorrelation matrix associated with each coding block shall be encoded and transmitted with the transformed coefficients, which brings an additional bit rate overhead; there are also cases where the autocorrelation matrix can be derived and stored without transmission. The outstanding energy compaction and decorrelation capabilities of KLT attract researchers to study data-driven transforms. Dvir et al. [2] constructed a new transform from a discrete directional Laplacian system matrix. Lan et al. [3] trained one dimensional KLTs with eigen-decomposition by searching similar patches from reconstructed frames for the current block, at the cost of computational overhead. Koo et al. [4] learned non-separable secondary transforms from both video sequences and still images out of the Call for Proposals (CfP), and adopted five of the transforms with the best Rate-Distortion (RD) cost in the reference encoder as the final non-separable transform. Cai et al. [5] proposed to only estimate the residual covariance as a function of the coded boundary gradient, considering that prediction is very sensitive to the accuracy of the prediction direction in image regions with sharp discontinuities. Wang et al. [6] proposed to optimize transform and quantization together with RD optimization (RDO). Zhang et al. [7] designed a highly efficient KLT based transform coding algorithm. Graph Based Transform (GBT) was proposed as a derivation of the traditional KLT in [8], which incorporates a Laplacian with structural constraints to reflect the inherent model assumptions about the video signal. Arrufat et al. [9] designed a KLT based transform for each intra prediction mode in Mode-Dependent Directional Transform (MDDT). Takamura et al. [10] proposed a non-separable mode-dependent data-dependent transform and created offline 2D-KLT kernels for each intra prediction mode and each Transform Unit (TU) size. In these recent studies, researchers focused on generating the autocorrelation matrix for the data-dependent KLT and optimizing the data-dependent KLT with the constrained autocorrelation matrix. It is not only computationally difficult to estimate or encode the autocorrelation matrix for dynamic block residuals, but also memory consuming to store the transform kernels offline.

The Discrete Cosine Transform (DCT) performs similarly to KLT on energy compaction when the input signal approximates a Gaussian distribution. Due to its good energy compaction capability and relatively low complexity, DCT has been widely used in video coding standards, including MPEG-1, MPEG-2, MPEG-4, H.261, H.262 and H.263. H.264/AVC and later coding standards adopted the Integer DCT (ICT) to approximate the float-point DCT calculation for lower complexity and hardware cost. ICT is computed exactly in integer arithmetic instead of float-point computation, and multiplications are replaced with additions and shifts. Since DCT transform kernels are fixed and difficult to adapt to all video contents and modes, advanced DCT optimizations have been proposed to improve the transform coding efficiency through jointly utilizing multiple transforms and RDO [11]. For intra video coding in HEVC, an integer Discrete Sine Transform (DST) [12] was further applied to 4×4 luminance (Y) block residuals. Han et al. [13] proposed a variant of the DST named Asymmetric DST (ADST) [13] regarding the prediction direction and boundary information.

Furthermore, due to the diversity of video contents and their distributions, multiple transforms from the DCT/DST families were jointly utilized rather than one transform to enhance the coding efficiency. Zhao et al. [14] presented the Enhanced Multiple Transform (EMT), which selects the optimal transform from multiple candidates based on the source properties and distributions. EMT is intra mode dependent, where DCT/DST transform kernels are selected based on the intra direction. As the coding efficiency of EMT comes at the cost of higher computational complexity at the encoder side, Kammoun et al. [15] proposed an efficient pipelined hardware implementation. In 2018, EMT was simplified as Multiple Transform Selection (MTS) and adopted in VVC by the Joint Video Expert Team (JVET) [16], which consists of experts from ISO/IEC MPEG and ITU-T VCEG. Zeng et al. [17] presented a two-stage transform framework, where coefficients at the first stage produced by all directional transforms were arranged appropriately for the secondary transform. Considering that multi-core transforms and non-separable transforms can capture diverse directional texture patterns more efficiently, EMT [14] and the Non-Separable Secondary Transform (NSST) [18] were combined to provide better coding performance in the reference software of the next generation video coding standard. Zhang et al. [19] presented a method on Implicit-Selected Transform (IST) to improve the performance of transform coding for AVS-3. Pfaff et al. [20] applied mode dependent transform with primary and secondary transforms to improve transform coding in HEVC. Park et al. [21] introduced fast computation methods of N-point DCT-V and DCT-VIII. Garrido et al. [22] proposed an architecture to implement transform types for different sizes. DCT is a pre-defined and fixed transform that approaches KLT's performance for a Gaussian distributed source. However, the Gaussian source assumption cannot always be guaranteed due to the diversity of video contents and variable block patterns, which enlarges the gap between the coding efficiency of DCT and that of the best KLT. Video content with complex textures hardly follows a Gaussian distribution. In addition, using multiple transform kernels for different source assumptions and selecting the optimal one with RDO significantly increases the coding complexity.

Machine learning based transform is a possible solution to achieve a good trade-off between the data dependent KLT and the fixed DCT and to improve the video coding performance. Lu et al. [23] utilized a non-linear residual encoder-decoder network implemented as a Convolutional Neural Network (CNN) to replace the linear transform. Parameters of these CNNs are determined by backpropagation, which is not algorithmically transparent, short of human comprehensibility, not robust to perturbations and hard to improve layer by layer. To handle these problems, explainable machine learning is a possible solution to improve the interpretability, scalability and robustness of learning. Therefore, Kuo et al. [24] proposed an interpretable feedforward design, noted as the Subspace Approximation with Adjusted Bias (Saab) transform, which is statistics-centric and learned in an unsupervised manner. Motivated by the analyses on the nonlinear activation of CNNs in [25], [26], Kuo et al. [27] proposed the Subspace Approximation with Augmented Kernels (Saak), where each transform kernel was the KLT kernel augmented with its negative so as to resolve the sign confusion problem. In the Saab transform [24], the sign confusion problem is instead solved by shifting the transform input to the next layer with a relatively large bias. As an explainable machine learning method, the Saab transform [24] interprets the cascaded convolutional layers as a sequence of approximating spatial-spectral subspaces. The data-dependent, multi-stage and non-separable Saab transform was proposed to interpret CNNs for image recognition tasks, and it also has good energy compaction capability. In [28], Saab transforms were learned from a video coding dataset and showed the potential to outperform DCT on energy compaction capability for variable block size residuals from intra prediction. However, the coding performance of Saab transform for intra coding needs to be further investigated.
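To make the contrast between the fixed DCT and the data-dependent KLT discussed above concrete, the following sketch (an illustrative example of our own, not code from any codec) derives a KLT from sample residual vectors via eigen-decomposition of their autocorrelation matrix and builds the orthonormal DCT-II basis for comparison; the AR(1) toy source and all names are assumptions.

```python
import numpy as np

def dct2_basis(n):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0, :] *= 1.0 / np.sqrt(n)
    basis[1:, :] *= np.sqrt(2.0 / n)
    return basis

def klt_basis(residuals):
    """Data-dependent KLT: eigenvectors of the sample autocorrelation matrix.

    residuals: array of shape (T, n), one residual vector per row.
    """
    r = np.asarray(residuals, dtype=np.float64)
    autocorr = r.T @ r / r.shape[0]               # n x n autocorrelation matrix
    eigval, eigvec = np.linalg.eigh(autocorr)     # ascending eigenvalues
    order = np.argsort(eigval)[::-1]              # sort by decreasing energy
    return eigvec[:, order].T                     # rows are KLT basis vectors

# Toy usage: a smooth, roughly Gaussian AR(1) source where DCT tracks KLT well.
rng = np.random.default_rng(0)
T, n, rho = 10000, 8, 0.9
cov = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
samples = rng.multivariate_normal(np.zeros(n), cov, size=T)
A_klt, A_dct = klt_basis(samples), dct2_basis(n)
energy = lambda A: np.cumsum(np.var(samples @ A.T, axis=0))
print(energy(A_klt)[:4])   # cumulative energy compacted by the KLT
print(energy(A_dct)[:4])   # DCT is close for this near-Gaussian source
```

For less Gaussian sources (e.g., residuals with sharp directional edges), the gap between the two cumulative-energy curves widens, which is the motivation for the data-driven transforms surveyed above.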

Saab transform based intra video coding for a specific block size could be reproduced for other block sizes under the same methodology. Therefore, we mainly explore the 8×8 Saab transform for intra predicted 8×8 block residuals to illustrate the coding performance potential of Saab transform.

In this work, we propose Saab transform based intra video coding to improve the coding efficiency. The paper is organized as follows. The Saab transform and its transform performances are analyzed in Section II. A framework of Saab transform based intra video coding and the intra mode dependent Saab transform are illustrated in Section III. Then, the RD performances and computational complexity are analyzed in Section IV. Extensive experimental results and analyses are presented in Section V. Finally, conclusions are drawn in Section VI.

II. EXPLAINABLE MACHINE LEARNING BASED TRANSFORM AND ANALYSIS

A. Problem Formulation

Transform in image/video compression is generally linear and aims to improve transform performances, such as energy compaction and decorrelation, of the transformed coefficients. Let x = {x_i} be an input source, which is forward transformed to output y = {y_k} in another transform domain as

y_k = \sum_{i=0}^{K-1} x_i a_{k,i}, or y = Ax, (1)

where a_{k,i} is the transform element in the forward transform kernel A. The inverse transform from y_k to x_i is presented as

x_i = \sum_{k=0}^{K-1} y_k u_{k,i}, or x = Uy, (2)

where u_{k,i} is the transform element in the inverse transform matrix U. U is the inverse matrix of A satisfying U = A^{-1} and UA = AU = I, where I is the identity matrix. If the transform is orthogonal, which means the rows of the transform matrix form an orthogonal basis set, the inverse transform matrix U satisfies U = A^{-1} = A^T.

As a machine learning based transform, A is estimated from data samples D = [d_0, ..., d_{T-1}]. Generally, transforms are learned from subspaces of data samples with different learning strategies. Given a transform set \mathcal{A}, the optimal transform A^* is selected from \mathcal{A} by solving the optimization problem expressed as

A^* = \arg\max_{A_i \in \mathcal{A}} M(y), (3)

where M(y) indicates a target transform performance of the transformed coefficient y. The optimal transform is selected by maximizing the value of M(y). In video coding, M(y) can be defined as, but is not limited to, the energy compaction or decorrelation capabilities, which relate to the compression efficiency. For example, DCT is predefined as a general orthogonal transform for all block residuals. The KLT kernel is derived by maximizing the decorrelation, which varies for each block residual. Although KLT outperforms DCT on energy compaction and decorrelation, KLT based video coding requires transmitting kernel information for each block, which causes a large bit overhead. The Saab transform is able to learn statistics from a large number of blocks, which is a possible solution for improving the coding performance of existing codecs.

B. Saab Transform

The Saab transform [24] is a data-dependent, multi-stage and nonseparable transform conducted in a local window to get a local spectral vector. It is an explainable machine learning based transform which is algorithmically transparent, human comprehensible, more robust to perturbations and can be improved layer by layer. The diagram of testing and training the 2D one-stage Saab transform is presented in Fig. 1, where the left part is the Saab transform and the right part is learning the transform kernels.

Fig. 1. Diagram of learning and testing of an one-stage Saab transform.

Given an M×N dimensional input x in the space R^{M×N}, which is rearranged into a vector in lexicographic order as

x = [x_{0,0}, x_{0,1}, ..., x_{0,N-1}, x_{1,0}, x_{1,1}, ..., x_{1,N-1}, ..., x_{M-1,0}, ..., x_{M-1,N-1}]^T, (4)

the output transformed coefficients of the Saab transform are computed as

y_k = \sum_{j=0}^{K-1} a_{k,j} x_j + b_k = a_k^T x + b_k, (5)

where a_k are the transform kernels and b_k is the bias, K = M×N, k = 0, 1, ..., K-1. In the Saab transform, the DC kernel and AC kernels, composed of A_{Saab} = {a_k}_{k=0}^{K-1} and b = {b_k}_{k=0}^{K-1}, are unsupervisedly learned from the training dataset D = [d_0, ..., d_{T-1}], as illustrated in the right part of Fig. 1.
The number of samples in the training dataset, i.e., T, is around 60K. y is generally organized as a coefficient grid. In the forward Saab transform, for input x in the space R^{M×N}, the DC and AC coefficients of y are computed separately as:

- DC Coefficient: The DC coefficient is computed as y_0 = \frac{1}{\sqrt{K}}\sum_{j=0}^{K-1} x_j + b_0, where the DC kernel is a_0 = \frac{1}{\sqrt{K}}(1, ..., 1)^T and the corresponding bias is b_0 = 0.

- AC Coefficients: Firstly, z' is computed as z' = x - (a_0^T x + b_0)\frac{\mathbf{1}}{\|\mathbf{1}\|}, where \mathbf{1} = (1, 1, ..., 1) is the constant unit vector. Then, the AC coefficient is computed as y_k = \sum_{j=0}^{K-1} a_{k,j} z'_j + b_k = a_k^T z' + b_k. The AC kernel a_k is the eigenvector w_k of the covariance matrix C = E\{ZZ^T\}, where Z = [z_0, ..., z_t, ..., z_{T-1}] and z = d - (a_0^T d + b_0)\mathbf{1} are derived from the training dataset D. The corresponding bias is b_k = \max_{d \in D}\|d\|.

The inverse Saab transform process is symmetric to the forward Saab transform. Since the one-stage Saab transform is orthogonal [24], the inverse Saab transform U_{Saab} is simply the transpose of the forward Saab transform, noted as U_{Saab} = A_{Saab}^T. The vector y is inverse transformed into x' with x'_k = a'_k(\{y_k\}_1^{K-1} - \{b_k\}_1^{K-1}) + y_0, where a'_k = [a_{1,k}, a_{2,k}, ..., a_{K-1,k}], k = 0, 1, ..., K-1.

A multi-stage Saab transform [24] can be built by cascading multiple one-stage Saab transforms, which can be used to extract high-level recognition features. To solve the sign confusion problem for pattern recognition, the input of the next stage is shifted to be positive by the bias. This has been exploited in handwritten digit recognition and object classification [24]. In this paper, we explore the potential of the Saab transform for video coding.

C. Energy Compaction and Decorrelation Capabilities of Saab Transform

In video data compression, one discipline of transform is to save bits by transforming the input x to another domain with fewer non-zero elements, which is noted as energy compaction. The energy compaction is mathematically defined as [29]

E(y) = \frac{\sum_{k=0}^{i} y_k^2}{\sigma_x^2}, (6)

where \sigma_x^2 is the variance of the input x and i is the number of coefficients. Accordingly, we analyze the energy compaction of 8×8 transforms for video coding, i.e., KLT, DCT, the one-stage Saab transform and the two-stage Saab transform. Without losing generality, both the one-stage Saab transform (denoted as "Saab Transform [8×8]") and the two-stage Saab transform (denoted as "Saab Transform [4×4,2×2]") were learned from over 70K 8×8 luminance (Y) block residuals of "Planar" mode collected from encoding the video sequence "FourPeople" with Quantization Parameters (QPs) ∈ {22, 27, 32, 37} in HEVC. Only one Saab transform was trained off-line and applied to transform all blocks for "Saab Transform [8×8]" and "Saab Transform [4×4,2×2]", respectively. Then, another 500 of the 8×8 luminance (Y) block residuals were randomly selected to compute the energy compaction of KLT, DCT and these two Saab transforms.

Fig. 2. Energy compaction E(y) comparison among KLT, DCT and Saab transforms.

Fig. 2 shows the energy compaction E(y) comparison among KLT, DCT, the one-stage Saab transform and the two-stage Saab transform, from which we can have the following two key observations: 1) KLT outperforms DCT by a large margin on energy compaction, since KLT is specified for each block while DCT is fixed and pre-defined for all blocks. DCT achieves the best energy compaction performance for a source following a Gaussian distribution. In many cases, the Gaussian source assumption is not satisfied in video coding of various contents and settings. For video content with complex texture, which is not an exact Gaussian source, the energy compaction gap between DCT and the best KLT is larger than for content with smooth texture, which is closer to a Gaussian source. 2) The one-stage and two-stage 8×8 Saab transforms, which learn fixed transform kernels off-line from training data and are then applied to transform all testing blocks, perform better than DCT in energy compaction. Therefore, the 8×8 Saab transform has the potential to improve the coding performance of existing video codecs that rely on the single choice of DCT.

Another discipline of transform in video coding is removing redundancy or correlation of the input signals x via the transformation, i.e., decorrelation. To evaluate the decorrelation capability of a transform, we measure the decorrelation cost of the transformed coefficients y with its covariance as

C(y) = \sum_{i \neq j} |cov(y_i, y_j)| = \sum_{i \neq j} |E\{(y_i - \mu_i)(y_j - \mu_j)\}|, \quad 0 \leq i, j \leq K-1, (7)

where cov(y_i, y_j) is the covariance between y_i and y_j, i ≠ j, and \mu_i and \mu_j are the means of y_i and y_j. A smaller C(y) value indicates a better decorrelation capability of a transform. The value of C(y) in the transform domain of KLT is 0, which means y_i and y_j are completely independent and their correlation is 0 for i ≠ j. In other words, the redundancy among the elements y_i of the transformed coefficients is minimized to 0.
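As an illustration of the one-stage Saab construction and the two metrics defined in Eqs. 6 and 7, the sketch below is our own simplified reading of [24] and of this section, not the authors' released code; the random stand-in residuals and all helper names are assumptions.

```python
import numpy as np

def learn_saab_1stage(train):
    """Learn one-stage Saab kernels and biases from flattened residuals (T, K)."""
    T, K = train.shape
    a0 = np.ones(K) / np.sqrt(K)                        # DC kernel, bias b0 = 0
    dc = train @ a0                                     # per-sample DC response
    z = train - dc[:, None] * a0[None, :]               # remove the DC component
    cov = z.T @ z / T
    eigval, eigvec = np.linalg.eigh(cov)
    ac = eigvec[:, np.argsort(eigval)[::-1][:K - 1]].T  # K-1 AC kernels (rows)
    kernels = np.vstack([a0, ac])                       # A_Saab, shape (K, K)
    bias = np.full(K, np.linalg.norm(train, axis=1).max())
    bias[0] = 0.0                                       # b0 = 0, b_k = max ||d||
    return kernels, bias

def saab_forward(x, kernels, bias):                     # Eq. 5: y = A_Saab x + b
    return kernels @ x + bias

def saab_inverse(y, kernels, bias):                     # orthogonal, so U = A^T
    return kernels.T @ (y - bias)

def energy_compaction(coeffs, var_x, i):                # Eq. 6 over first i coefficients
    return (coeffs[:, :i] ** 2).sum(axis=1).mean() / var_x

def decorrelation_cost(coeffs):                         # Eq. 7, off-diagonal |cov| sum
    c = np.abs(np.cov(coeffs, rowvar=False))
    return c.sum() - np.trace(c)

# toy usage: learn from random stand-in residuals, evaluate E(y) and C(y)
rng = np.random.default_rng(0)
A, b = learn_saab_1stage(rng.normal(scale=10.0, size=(60000, 64)))
test = rng.normal(scale=10.0, size=(500, 64))
coeffs = test @ A.T                                     # bias omitted for the metrics
print(energy_compaction(coeffs, test.var(), 10), decorrelation_cost(coeffs))
```

With real intra prediction residuals in place of the random samples, this is the procedure whose E(y) and C(y) values are compared against KLT and DCT in Fig. 2 and Table I.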

TABLE I
DECORRELATION CAPABILITY C(y) COMPARISON AMONG KLT, DCT AND SAAB TRANSFORMS.

Decorrelation cost C(y)
Sequence          QP    DCT       Saab Transform [8×8]   Saab Transform [4×4,2×2]
BasketballDrill   22    581.78    574.26                 582.46
                  27    1453.53   1309.17                1357.91
                  32    3266.85   2691.37                2769.59
                  37    5886.47   4574.13                4802.21
RaceHorses        22    588.19    586.50                 591.71
                  27    1385.87   1361.44                1382.41
                  32    3911.97   3814.71                3887.13
                  37    8281.33   7818.34                8148.87
FourPeople        22    358.67    371.54                 381.89
                  27    1042.21   1010.10                1054.60
                  32    1348.89   1331.15                1394.65
                  37    2842.04   2691.33                2950.05
Average                 2578.98   2344.50                2441.96

Experimental analyses on the decorrelation capability of the one-stage Saab transform, the two-stage Saab transform and DCT for 8×8 block residuals were performed. Saab transform kernels were learned from three video sequences, {"BasketballDrill", "RaceHorses", "FourPeople"}. For each video sequence, the value of C(y) was computed for 500 blocks of coefficients randomly selected in the transform domain of each transform. As elements in the block residual are either negative or positive integers, the expectation of a block residual element is assumed to be zero, i.e., \mu_i = \mu_j = 0, in computing the decorrelation cost C(y). In Table I, the decorrelation costs in terms of Eq. 7 of KLT, the one-stage Saab transform, the two-stage Saab transform and DCT are 0, 2344.50, 2441.96 and 2578.98 on average. We can have the following three key observations: 1) the decorrelation cost C(y) of KLT is 0, which is confirmed to be the best. 2) Saab transform performs better than DCT on average. For smaller QP, i.e., 22, as well as for some specific video sequences, e.g., "FourPeople", DCT performs better on decorrelation than the two-stage Saab transform, which reflects the good design of DCT in some cases but not all of them. 3) The one-stage Saab transform is better than the two-stage Saab transform on decorrelation, mainly because increasing the number of stages in the Saab transform is designed to be beneficial for distinguishing the category of blocks, but not necessarily for the decorrelation C(y). This motivates us to explore one-stage 8×8 Saab transform based video coding to improve the coding efficiency.

III. SAAB TRANSFORM BASED INTRA VIDEO CODING

In this section, the Saab transform is applied to the video codec to improve intra video coding efficiency. Firstly, a framework of Saab transform based intra video coding is presented. Then, the intra mode dependent Saab transform is developed. Finally, three integration strategies of Saab transform for intra video coding are proposed to improve the coding performance.

A. Framework of Saab Transform based Intra Video Coding

Fig. 3. Framework of Saab transform based intra video coding.

Fig. 3 illustrates the proposed coding framework of Saab transform based intra video coding, where A_Saab denotes the Saab transform, β denotes the quantizer and γ denotes entropy coding. The Saab transform contains a kernel learning stage and a transform stage, which consists of four key components: (a) collecting intra prediction block residuals, (b) dividing block residuals into groups based on the intra mode, (c) learning off-line a set of intra mode dependent Saab transforms, and (d) selecting the kernel based on the best intra prediction mode to perform the forward and inverse Saab transforms.

At the stage of learning Saab transform kernels, the intra prediction block residuals D_Train = {x} are collected off-line from a conventional DCT based video encoder. Since the distribution of the residual data highly depends on the intra mode [14], all intra modes are divided into n mode sets M_i, i ∈ [0, n] and n <= 35 for HEVC. Then, the block residuals D_Train = {x} are divided into groups {g_i} according to whether their intra mode is in M_i or not. A number of Saab transform kernels {SBT_0, ..., SBT_n} are learned individually based on the intra modes in M_i and their block residual groups {g_i}; n is 23 in the proposed scheme. For example, only blocks in g_i are used to train SBT_i. Note that this is an off-line training process and various video sequences and settings can be used to train the Saab transform kernels. The complexity of Saab transform training is negligible to the codec if it is performed off-line. Also, the trained Saab transform kernels are transmitted only once and stored at the client side for the inverse transform in decoding.

At the stage of using the Saab transform kernels, the learned Saab transform kernels {SBT_0, ..., SBT_n} are utilized to transform block residuals based on the intra mode, according to the learning schemes in Section III-B, which is different from existing mode dependent transform selection strategies, e.g., MDDT [9]. For example, SBT_i will be used to transform the block residuals from modes in M_i. Note that there are several ways to integrate the Saab transform into the video encoder. One is to replace the conventional DCT with the Saab transform. The other is to add the Saab transform as an alternative transform option and select the optimal one between the Saab transform A_Saab and the conventional transform A†, i.e., DCT, by RD cost comparison.
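A minimal sketch of the mode-dependent training and kernel dispatch described above is given below. It reuses the learn_saab_1stage and saab_forward helpers from the sketch in Section II; the grouping function, the mode-to-group mapping and the residual containers are placeholders of our own, not the paper's implementation.

```python
import numpy as np
# assumes learn_saab_1stage / saab_forward from the sketch in Section II

def train_mode_dependent_sbts(residuals_by_mode, mode_to_group):
    """residuals_by_mode: dict intra_mode -> (T_m, 64) array of flattened 8x8 residuals.
    mode_to_group: dict intra_mode -> group index i (defines the mode sets M_i)."""
    groups = {}
    for mode, blocks in residuals_by_mode.items():
        groups.setdefault(mode_to_group[mode], []).append(blocks)
    # one SBT_i (kernels, bias) learned per group g_i, entirely off-line
    return {i: learn_saab_1stage(np.vstack(blks)) for i, blks in groups.items()}

def transform_block(residual_8x8, intra_mode, sbts, mode_to_group):
    """Pick SBT_i from the best intra mode and apply the forward Saab transform."""
    kernels, bias = sbts[mode_to_group[intra_mode]]
    return saab_forward(residual_8x8.reshape(-1), kernels, bias)
```

The decoder mirrors transform_block with the inverse kernels, using either the intra mode alone (when DCT is fully replaced) or the intra mode plus the signalled flag (when both transforms are available).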

In the latter case, a signaling flag indicating the choice of Saab transform or DCT is required to be encoded and transmitted to the client side for decoding.

At the decoder side, if the conventional DCT is replaced with the Saab transform, then based on the intra mode in M_i, the Saab transform kernel SBT_i is used in the inverse Saab transform to reconstruct block residuals. Otherwise, based on the signalling flag and the intra mode in M_i, either DCT or SBT_i is selectively used in the inverse transform.

B. Intra Mode Dependent Saab Transform

The Saab transform is a data-driven transform which is learned based on the statistical characteristics of the source input. However, the statistical characteristics of intra prediction block residuals {x} depend on the intra prediction accuracy [5] as well as the image texture [30]. There is still significant directional information left in the intra prediction block residuals, and the block residuals generated by the same intra prediction mode still have highly varying statistics. Therefore, it is necessary to learn Saab transforms based on the statistical characteristics of the angular intra modes and develop an intra mode dependent Saab transform.

TABLE II
TWO TRAINING STRATEGIES FOR MODE DEPENDENT SAAB TRANSFORM.

Category    Mode ID                  Saab Learning Scheme
A           2∼7, 13∼23, 29∼34        CGSL
B           0∼1, 8∼12, 24∼28         FGSL

Fig. 4. Percentage of luminance (Y) blocks coded with each intra prediction mode indexed by 0 ∼ 34.

Fig. 5. Grouping intra prediction block residuals on the basis of intra prediction modes. Saab transforms, noted as SBTk, 1 ≤ k ≤ 24, are applied to the corresponding block residuals.

We propose to divide intra prediction block residuals into groups g_i in terms of intra prediction modes M_i, which are then used to learn the Saab transform kernels SBT_i. The statistical characteristics of block residuals from a single intra prediction mode are relatively easier to represent than those from multiple intra prediction modes. On the other hand, a Saab transform learned unsupervisedly for block residuals from a single intra prediction mode, noted as Fine-Grained Saab Learning (FGSL), may have better performance than those learned for multiple intra prediction modes, noted as Coarse-Grained Saab Learning (CGSL). However, FGSL trains an SBT_i for each intra mode, 0 ≤ i ≤ 35, which means there are 35 SBTs for each Transform Unit (TU) size in HEVC, and even more kernels would have to be learned for standards beyond HEVC, significantly increasing the difficulty of codec design. In addition, the ratio of blocks for each mode is distributed unevenly. The distributions of the 35 intra prediction modes were statistically analyzed on the number of 8×8 luminance (Y) block residuals encoded by each of these modes. 100 frames of each video sequence in {"BasketballDrill", "FourPeople", "RaceHorses"} were encoded with four QPs in {22, 27, 32, 37} in HEVC. As shown in Fig. 4, block residuals of "Planar", "DC", "Horizontal" and "Vertical" and their neighboring modes have larger percentages than those of the rest modes, which indicates these modes have higher impacts on the coding efficiency than the others. Therefore, considering both coding efficiency and the complexity of designing the codec, we train Saab transform kernels for "Planar (0)", "DC (1)", "Horizontal (10)" and "Vertical (26)" and their neighboring modes (8∼12 and 24∼28), shown as Category B in Table II, with the FGSL scheme. Saab transform kernels for the rest modes, shown as Category A in Table II, are trained with the CGSL scheme. In total, 24 Saab transforms for intra predicted block residuals are trained with the FGSL and CGSL schemes. In comparison to the KLT derivation based intra prediction mode dependent transform selection methods, which design an adapted transform for each of the 35 intra prediction modes, FGSL and CGSL save memory for storing the transform kernels as well as the complexity of designing them. Compared with the DCT/DST based intra prediction mode dependent multiple transforms, FGSL and CGSL are expected to learn the statistics of block residuals better through grouping block residuals in terms of intra prediction modes. The details of the FGSL and CGSL schemes are:

- CGSL scheme for modes in Category A: Group the intra prediction block residuals generated by the intra prediction mode indexed by mode ID i with the block residuals related to intra prediction mode ID i−1 to learn SBT_k. SBT_k is applied to the block residuals of the intra prediction mode with mode ID i.

- FGSL scheme for modes in Category B: Group the intra prediction block residuals generated by the intra prediction mode indexed by mode ID i (i ≥ 3) with the block residuals related to the intra prediction mode indexed by mode ID i−1 to learn SBT_k. Different from the CGSL scheme, the learned Saab transform SBT_k is applied to block residuals generated by the intra prediction modes indexed by mode ID i−1 and i. Residuals generated by "Planar" (indexed with i = 0) and "DC" (indexed with i = 1) are grouped respectively as two groups for learning and testing their SBT_k.

Results of the proposed grouping scheme are illustrated in Fig. 5, where 24 Saab transform kernels are learned and applied to intra prediction block residuals, noted as SBT_k, 0 ≤ k ≤ 23. FGSL is adopted for block residuals generated by intra prediction modes with mode ID in {0∼7, 13∼23, 29∼34}. According to FGSL, SBT_0 and SBT_1 are learned for block residuals of "Planar" and "DC" separately. SBT_k with k in {2∼4, 10∼15, 21∼23} are learned from block residuals generated by the intra prediction mode pairs in {(2,3), (4,5), (6,7), (13,14), (15,16), (17,18), (19,20), (21,22), (22,23), (29,30), (31,32), (33,34)}. These Saab transforms are applied to the corresponding block residuals according to FGSL, except SBT_15, which is learned from block residuals corresponding to the intra prediction mode pair (22,23) rather than (23,24), and is only applied to the block residuals generated by the intra prediction mode indexed by 23, as CGSL is utilized to learn the Saab transform for block residuals of the intra prediction mode indexed by 24. CGSL is adopted for intra prediction modes with mode ID in {8∼12, 23∼28}. Under CGSL, SBT_k with k in {5∼9, 15∼20} are learned from block residuals generated by the intra prediction mode pairs in {(7,8), (8,9), (9,10), (10,11), (11,12), (12,13), (23,24), (24,25), (25,26), (26,27), (27,28)}. These Saab transforms are applied to the block residuals according to CGSL.

C. Integration Strategies for Saab Transform

Since the Saab transform has good performance on energy compaction and decorrelation, as analyzed in Subsection II-C, we propose three integration strategies that integrate the intra mode dependent Saab transform with DCT in the intra video codec to improve the coding efficiency. Table III shows these three integration strategies, noted as s_I, s_II and s_III, for the Saab transform based intra video codec. In s_I, each intra prediction mode adopts either Saab transform or DCT for transform coding. Intra prediction modes in {0∼7, 13∼23, 29∼34} utilize SBT_k with index k in {0∼4, 10∼15, 21∼23} as their transforms. For the intra prediction modes around the horizontal and vertical directions, the block residuals retain more dynamical statistical characteristics of the block content than the other modes; directly replacing DCT with the Saab transform for these modes obtains little or negative RD performance gain. Detailed analysis can be found in Subsection IV-B. So, the original DCT in the existing intra video codec is used for these modes. To complement the Saab transform with DCT, strategy s_II in the middle row of Table III is derived, where SBT_k is applied to intra prediction modes in {0∼7, 13∼23, 29∼34}. Meanwhile, the DCT is also activated, and the optimal transform is selected from DCT and SBT_k by choosing the one with the lower RD cost, i.e., RDO. N/A indicates there is no available SBT_k for modes in {8∼12, 24∼28}, so DCT is used for them directly. Furthermore, to maximize the coding efficiency, strategy s_III is proposed, as shown in the bottom row of Table III. SBT_k, 0 ≤ k ≤ 23, are applied to all 35 intra prediction modes in intra coding, the DCT is also activated, and the optimal transform is selected from SBT_k and DCT with RD cost comparison. In integration strategies s_II and s_III, where the optimal transform is selected with RDO from SBT_k and DCT, a signaling flag of 1 bit shall be added in the bitstream for each TU, where 0/1 indicates using DCT/SBT_k, respectively. Note that k in SBT_k is determined by the intra mode according to the learning scheme. The RD performance gains of the Saab transform and the three integration strategies in intra video coding are analyzed in the following sections.

IV. RD PERFORMANCE AND COMPUTATIONAL COMPLEXITY OF SAAB TRANSFORM BASED INTRA CODING

In this section, the RD performance and computational complexity of Saab transform based intra video coding are analyzed theoretically and experimentally. Firstly, we analyze the RD cost of Saab transform based intra video coding, and two sufficient conditions for the Saab transform to improve the RD performance are derived. Then, these two conditions of using SBT_k are validated individually with coding experiments. Finally, the computational complexity of the one-stage Saab transform is compared with DCT.

A. Theoretical RD Cost Analysis on Saab Transform

The objective of video coding is to minimize the distortion (D) subject to the bit rate (R) being lower than a target bit rate. By introducing the Lagrangian multiplier λ, the RD optimization problem in video coding can be formulated as minimizing the RD cost J(Q),

\min J(Q), \quad J(Q) = D(Q) + λ·R(Q), (8)

where Q is the quantization step, and D(Q) and R(Q) are the distortion and bit rate at a given Q. So, it is necessary to analyze the RD cost of Saab transform based intra coding and that of DCT based intra coding to validate its effectiveness.

The rate R(Q) and distortion D(Q) of using the Saab transform are modelled and theoretically analyzed. The bit rate R can be modelled with the entropy of the transformed coefficient y. When the transformed coefficient y is quantized with quantization step Q, the bit rate R(Q) can be modelled as its entropy minus log Q, which is [31]

R(Q) ≈ -\int_{-\infty}^{+\infty} f_y(y) \log f_y(y) dy - \log Q. (9)

TABLE III
INTRA MODE DEPENDENT SAAB TRANSFORM SET {SBT_k} AND INTEGRATION STRATEGIES.

Integration strategy s_I ^1 (either DCT or SBT_k, depending on the intra mode):
  modes 0∼7:   SBT 0 1 2 2 3 3 4 4
  modes 8∼12:  DCT
  modes 13∼23: SBT 10 10 11 11 12 12 13 13 14 14 15
  modes 24∼28: DCT
  modes 29∼34: SBT 21 21 22 22 23 23

Integration strategy s_II ^2 (SBT_k competes with DCT; N/A means DCT only):
  modes 0∼7:   SBT 0 1 2 2 3 3 4 4 / DCT
  modes 8∼12:  N/A (DCT)
  modes 13∼23: SBT 10 10 11 11 12 12 13 13 14 14 15 / DCT
  modes 24∼28: N/A (DCT)
  modes 29∼34: SBT 21 21 22 22 23 23 / DCT

Integration strategy s_III ^2 (SBT_k competes with DCT for all modes):
  modes 0∼34:  SBT 0 1 2 2 3 3 4 4 5 6 7 8 9 10 10 11 11 12 12 13 13 14 14 15 16 17 18 19 20 21 21 22 22 23 23 / DCT

^1 Either DCT or SBT_k is used depending on the intra mode.
^2 The optimal transform is selected from DCT and SBT_k with RDO. A one-bit signalling flag is transmitted to indicate the type, where 0 / 1 indicates DCT / SBT_k.
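For strategies s_II and s_III, the encoder-side choice reduces to comparing two RD costs and writing a one-bit flag, roughly as in the sketch below. The cost callbacks are hypothetical stand-ins for transform-quantize-code passes, not HM functions.

```python
def choose_transform(residual, intra_mode, sbts, mode_to_group,
                     rd_cost_dct, rd_cost_saab, lam):
    """Return (use_saab_flag, rd_cost). rd_cost_* are hypothetical callbacks that
    transform, quantize and code the block, returning (distortion, bits)."""
    d_dct, r_dct = rd_cost_dct(residual)
    d_saab, r_saab = rd_cost_saab(residual, sbts[mode_to_group[intra_mode]])
    # one signalling bit per TU is charged to whichever transform is chosen
    j_dct = d_dct + lam * (r_dct + 1)
    j_saab = d_saab + lam * (r_saab + 1)
    return (1, j_saab) if j_saab < j_dct else (0, j_dct)
```

Under strategy s_I there is no competition and no flag; the mode alone decides whether SBT_k or DCT is applied.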

To analyze the transformed coefficient y output from the Saab transform, we collected 1000 of the 8×8 luminance (Y) intra prediction block residuals generated by "Planar" mode from encoding "FourPeople" in HEVC. Histograms of the Saab transformed coefficients at two locations of the coefficient grid, i.e., (0,2) and (5,2), are presented in Fig. 6. Saab transform coefficients for 8×8 block residuals are organized as an 8×8 grid in zigzag scan order. The histograms of the Saab transformed coefficients are approximated respectively with a Laplacian distribution and a Gaussian distribution, noted as y_{Saab} ∼ Laplace(µ_{y_{Saab}}, σ_{y_{Saab}}) and y_{Saab} ∼ N(µ_{y_{Saab}}, σ_{y_{Saab}}).

Fig. 6. Distributions of transformed coefficients via Saab transform for 8×8 block residuals generated by "Planar" mode. (a) Transformed coefficient at location (0,2) from Saab, (b) Transformed coefficient at location (5,2) from Saab.

We can observe that the distributions of the Saab transformed coefficients generally obey Laplacian and Gaussian distributions, and higher accuracy is obtained by modelling these transformed coefficients with the Laplacian distribution. So, the Laplacian distribution is used for modelling the transformed coefficient y of the Saab transform, i.e., f_y(y) = \frac{1}{\sqrt{2}\sigma_y} e^{-\frac{\sqrt{2}}{\sigma_y}|y|}. By applying f_y(y) to Eq. 9, we can obtain the rate R(Q) as

R(Q) ≈ \log\frac{\sqrt{2}e\sigma_y}{Q}. (10)

Since a uniform quantizer is used to quantize the transformed coefficient y, the range of y is partitioned into an infinite number of intervals I_t = (q_t, q_{t+1}), and y in the interval I_t is mapped to s_t. As s_t is independent of t, given the quantization step size Q = q_{t+1} - q_t, the distortion caused by quantization D(Q) can be calculated as [32]

D(Q) = \sum_{t=-\infty}^{+\infty} \int_{s_t-0.5}^{s_t+0.5} (y - s_t)^2 f_y(y) dy, (11)

where Q is the quantization step size. As the distribution of the transformed coefficient y after uniform quantization still obeys the Laplacian distribution, the distortion D(Q) can be approximated as [30]

D(Q) ≈ \sigma_y^2 \frac{Q^2}{12\sigma_y^2 + Q^2}. (12)
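The rate and distortion models of Eqs. 10–12 can be evaluated numerically as in the short sketch below (our own illustration under the Laplacian assumption; the natural logarithm and the example σ_y and Q values are assumptions, since the paper does not fix them here).

```python
import numpy as np

def rate_laplacian(sigma_y, q):
    """Eq. 10: R(Q) ~ log(sqrt(2)*e*sigma_y / Q), clipped at zero bits."""
    return max(0.0, np.log(np.sqrt(2.0) * np.e * sigma_y / q))

def distortion_laplacian(sigma_y, q):
    """Eq. 12: D(Q) ~ sigma_y^2 * Q^2 / (12*sigma_y^2 + Q^2)."""
    return sigma_y ** 2 * q ** 2 / (12.0 * sigma_y ** 2 + q ** 2)

# example: per-coefficient rate/distortion for a coefficient with sigma_y = 5
for q in (4.0, 16.0, 64.0):
    print(q, rate_laplacian(5.0, q), distortion_laplacian(5.0, q))
```

As Q grows, the modelled rate falls to zero while the distortion saturates toward σ_y², which is exactly the two-branch behaviour captured by Eq. 13 below.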

Therefore, the RD cost of the Saab transform based intra coding scheme can be calculated as

J ≈ κ_y = \begin{cases} \sigma_y^2\frac{Q^2}{12\sigma_y^2+Q^2} + λ\cdot\log\frac{\sqrt{2}e\sigma_y}{Q}, & \sigma_y > \frac{Q}{\sqrt{2}e} \\ \sigma_y^2\frac{Q^2}{12\sigma_y^2+Q^2}, & \sigma_y \le \frac{Q}{\sqrt{2}e} \end{cases} (13)

where the right-hand side is defined as κ_y for further illustration. When σ_y ≤ Q/(√2e), R(Q) = 0, so J ≈ κ_y = σ_y^2 Q^2/(12σ_y^2 + Q^2).

Fig. 7. Distributions of DCT transformed coefficients for 8×8 residual blocks generated by "Planar" mode. (a) Transformed coefficient at location (0,2) from DCT, (b) Transformed coefficient at location (5,2) from DCT.

Similarly, we also analyze the distributions of the transformed coefficients from DCT, as shown in Fig. 7. These distributions of DCT transform coefficients are closer to the Laplacian distribution y_{DCT} ∼ Laplace(µ_{y_{DCT}}, σ_{y_{DCT}}) [30] than to the Gaussian distribution y_{DCT} ∼ N(µ_{y_{DCT}}, σ_{y_{DCT}}). Therefore, Eq. 13 can also be derived for DCT based intra coding. To differentiate the Saab transform from DCT, the transformed coefficients of Saab transform are noted as y_{Saab} and those of DCT as y_{DCT}, and κ_y for Saab and DCT are denoted as κ_{y_{Saab}} and κ_{y_{DCT}}. Therefore, for block residuals, RD gain can be achieved if the transformed coefficient of the Saab transform satisfies the condition

κ_{y_{Saab}} < κ_{y_{DCT}}. (14)

Applying Eq. 13 to Eq. 14, we can obtain an inequality relating σ_{y_{Saab}}^2, σ_{y_{DCT}}^2 and the quantization step Q. For simplicity, we find that Eq. 14 is satisfied for all quantization steps Q when the variances of the transformed coefficients, σ_{y_{Saab}}^2 and σ_{y_{DCT}}^2, satisfy the condition

σ_{y_{Saab}}^2 < σ_{y_{DCT}}^2. (15)

This inequality is a sufficient but not necessary condition for Eq. 14, and is more strict. It means a transform that minimizes the output variances of the transformed coefficients will improve the RD performance of a codec. Both conditions in Eq. 14 and Eq. 15 will be experimentally analyzed in detail in the following subsection so as to testify the effectiveness of the Saab transform.

B. Experimental RD Cost Analysis on Saab Transform

The two conditions in Eq. 14 and Eq. 15 were analyzed by comparing σ_{y_{Saab}}^2 with σ_{y_{DCT}}^2 and κ_{y_{Saab}} with κ_{y_{DCT}} of the transformed coefficients from Saab and DCT, respectively. In the experiment, the coding configurations were generally the same as those in Section II-C. Saab transforms were learned from 80K 8×8 luminance (Y) block residuals for intra prediction modes in {"Planar", "DC", "Horizontal", "Vertical"}. Hundreds of 8×8 intra prediction block residuals from each mode were randomly collected among thousands of blocks to compute κ_{y_{Saab}} and κ_{y_{DCT}}, where these block residuals were generated from encoding video sequences with the Saab transform based intra video encoder and the conventional DCT based intra video encoder using "Planar" mode only. Three sequences with different resolutions, "PeopleOnStreet" at 2560×1600, "Johnny" at 1280×720 and "RaceHorses" at 416×240, were tested with QP ∈ {22, 27, 32, 37}. To quantify the difference between κ_{y_{Saab}} and κ_{y_{DCT}}, ∆κ is defined as

∆κ = κ_{y_{Saab}} - κ_{y_{DCT}}, (16)

where a negative ∆κ indicates a better RD performance of the Saab transform as compared with DCT, while a positive ∆κ indicates a worse RD performance. Table IV shows κ_{y_{Saab}}, κ_{y_{DCT}} and ∆κ for different QPs and video sequences. We can observe that κ_{y_{Saab}} is generally smaller than κ_{y_{DCT}}, and the average ∆κ are 0.0001, -0.0001, -0.0010 and -0.0045 when QP is 22, 27, 32 and 37, respectively. It means that for the "Planar" mode the Saab transform achieves better RD performance on average when QP is 27, 32 and 37, and is a little worse than DCT when QP is 22. So, the Saab transform is actually more effective than DCT on average.

In addition, σ_{y_{Saab}}^2 and σ_{y_{DCT}}^2 are also analyzed and compared to validate the effectiveness of the Saab transform. Four Saab transforms were learned from 80K 8×8 luminance (Y) block residuals for intra prediction modes in {"Planar", "DC", "Horizontal", "Vertical"}, respectively. Then, these Saab transforms were applied to block residuals of {"Planar", "DC", "Horizontal", "Vertical"} correspondingly. As a comparison, the same set of block residuals was also transformed by DCT. Then, σ_{y_{Saab}}^2 and σ_{y_{DCT}}^2 were computed from the transformed coefficients of Saab transform and DCT. Four intra modes {"Planar", "DC", "Horizontal", "Vertical"} and three video sequences {"PeopleOnStreet", "RaceHorses", "Johnny"} were tested, and QP was fixed as 37. The difference between σ_{y_{Saab}}^2 and σ_{y_{DCT}}^2, i.e., ∆σ^2, is defined as

∆σ^2 = σ_{y_{Saab}}^2 - σ_{y_{DCT}}^2, (17)

where a negative ∆σ^2 indicates a better RD performance of the Saab transform and a positive ∆σ^2 indicates a worse RD performance as compared with DCT. σ_{y_{Saab}}^2 and σ_{y_{DCT}}^2 are the variances of the transformed coefficients of Saab transform and DCT, respectively. Table V shows σ_{y_{Saab}}^2, σ_{y_{DCT}}^2 and ∆σ^2 for four different intra modes. We can observe that the average ∆σ^2 of intra prediction modes "Planar", "DC", "Horizontal" and "Vertical" are -0.005, -0.032, -0.332 and 0.005, respectively, which means the Saab transform performs better than DCT for intra prediction modes "Planar", "DC" and "Horizontal" on average, but is a little inferior to DCT for the "Vertical" mode on average. Other intra modes can be compared accordingly, and the Saab transform performs better than DCT for most modes and sequences, which can be used to improve the video coding efficiency.

Overall, the Saab transform has better performance than DCT for different sequences, QPs and intra modes on average, which means the Saab transform can be used to replace DCT to improve the coding efficiency. However, it is inferior to DCT in some cases, such as QP 22 or the "Vertical" intra prediction mode. Therefore, to maximize the coding efficiency, an alternative way is to combine the Saab transform with DCT and select the optimal one with RD cost comparison.
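The sufficient conditions in Eqs. 14 and 15 can be checked directly from measured coefficient variances, as in the small self-contained sketch below; the variances are taken from the "Horizontal", QP 37, "PeopleOnStreet" entry of Table V, while the quantization step and λ are illustrative values of our own.

```python
import numpy as np

def kappa(sigma_y, q, lam):
    """Eq. 13: per-coefficient RD cost model under the Laplacian assumption."""
    d = sigma_y ** 2 * q ** 2 / (12.0 * sigma_y ** 2 + q ** 2)      # Eq. 12
    r = max(0.0, np.log(np.sqrt(2.0) * np.e * sigma_y / q))         # Eq. 10
    return d + lam * r

# "Horizontal" mode, QP 37, PeopleOnStreet (Table V): var_dct=32.419, var_saab=31.416
var_dct, var_saab = 32.419, 31.416
q, lam = 10.0, 10.0                     # illustrative quantization step and lambda
print(var_saab < var_dct)                                              # condition (15)
print(kappa(var_saab ** 0.5, q, lam) < kappa(var_dct ** 0.5, q, lam))  # condition (14)
```

Both checks print True for this example, matching the negative ∆σ^2 reported for the "Horizontal" mode.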
C. Computational Complexity Analysis on Saab Transform

We measure the transform complexity via the number of float-point multiplications or divisions; the practical complexity is left to be explored in the future. The computational complexity of applying DCT to a block of size M×N is O(2MN^2 + 2M^2N). The Saab transform for blocks of size M×N differs slightly from DCT in computational complexity: it requires an extra 3MN float-point computations in the one-stage Saab transform before mapping one block of size M×N to one DC coefficient and M×N-1 AC coefficients. Therefore, the computational complexity of the one-stage Saab transform is O(3MN + 2(MN)^2). Theoretically, the complexity of DCT is relatively lower than that of the one-stage Saab transform. The ICT is a low-complexity approximation of DCT, implemented with integer arithmetic and avoiding float-point multiplication; its complexity is much lower than that of DCT as well as of the one-stage Saab transform computed in float-point arithmetic, whose implementation has not been optimized.

V. EXPERIMENTAL RESULTS AND ANALYSIS

We evaluate the RD performance of the Saab transform in comparison with DCT for intra video coding in HEVC. In learning the Saab transforms, 24 Saab transform kernels, noted as SBT_k, 0 ≤ k ≤ 23, were learned off-line from around 80K block residuals separately. These 80K block residuals were collected evenly from encoding frames of "PeopleOnStreet" at resolution 2560×1600, "BasketballDrill" at 832×480, "BasketballPass" at 416×240 and "FourPeople" at 1280×720 with QP in {22, 27, 32, 37}. In these sequences used for training, the frames besides those utilized to collect the training dataset are tested at the testing stage.

TABLE IV
COMPARISONS BETWEEN κ_{ySaab} AND κ_{yDCT} FOR 8×8 LUMINANCE (Y) BLOCK RESIDUALS FROM "PLANAR" MODE.

Sequence          QP 22: κ_yDCT  κ_ySaab  ∆κ      QP 27: κ_yDCT  κ_ySaab  ∆κ       QP 32: κ_yDCT  κ_ySaab  ∆κ       QP 37: κ_yDCT   κ_ySaab   ∆κ
PeopleOnStreet           0.1035   0.1036   0.0001        0.6765   0.6764   -0.0001        4.3067   4.3052   -0.0015        21.6978   21.6912   -0.0066
RaceHorses               0.1263   0.1264   0.0001        0.9002   0.9001   -0.0001        5.3745   5.3743   -0.0002        22.6707   22.6658   -0.0049
Johnny                   0.0751   0.0751   0.0000        0.2819   0.2818   -0.0001        1.3996   1.3982   -0.0014        11.1232   11.1211   -0.0021
Average                  -        -        0.0001        -        -        -0.0001        -        -        -0.0010        -         -         -0.0045

TABLE V
COMPARISONS BETWEEN σ²_{ySaab} AND σ²_{yDCT} FOR 8×8 BLOCK RESIDUALS WHEN QP IS 37. FOUR INTRA PREDICTION MODES "PLANAR", "DC", "HORIZONTAL" AND "VERTICAL" ARE TESTED.

Sequence          Planar: σ²_yDCT  σ²_ySaab  ∆σ²     DC: σ²_yDCT  σ²_ySaab  ∆σ²      Horizontal: σ²_yDCT  σ²_ySaab  ∆σ²     Vertical: σ²_yDCT  σ²_ySaab  ∆σ²
PeopleOnStreet            24.899    24.891   -0.008      32.741    32.735   -0.006               32.419    31.416   -1.003             21.673    21.673    0.000
RaceHorses                35.890    35.884   -0.006      54.590    54.589   -0.001               74.434    74.443    0.009             53.016    53.031    0.015
Johnny                    11.908    11.906   -0.002      18.887    18.797   -0.090               30.122    30.119   -0.003             11.460    11.461    0.001
Average                   -         -        -0.005      -         -        -0.032               -         -        -0.332             -         -         0.005

TABLE VI
BDBR OF SAAB TRANSFORM BASED INTRA CODING WHERE DCT IS REPLACED WITH SBT_k INDIVIDUALLY FOR EACH INTRA MODE.

Transform set with one SBT:  s0   s1   s2   s3   s4   s5   s6   s7   s8   s9   s10   s11   s12   s13   s14   s15  s16  s17  s18  s19  s20  s21   s22   s23
SBT index:                   0    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15   16   17   18   19   20   21    22    23
Intra prediction modes:      0    1    2,3  4,5  6,7  8    9    10   11   12   13,14 15,16 17,18 19,20 21,22 23   24   25   26   27   28   29,30 31,32 33,34

BDBR (%)
Class A  Traffic:            -0.21 -0.06 -0.25 -0.61 -0.41 -0.14 0.04 0.02 0.17 0.05 -0.01 0.00 0.03 -0.04 -0.03 0.02 0.04 0.10 0.11 -0.01 0.03 0.05 -0.07 -0.15
Class A  PeopleOnStreet *:   -0.42 -0.14 -0.12 -0.29 -0.28 -0.08 0.14 0.08 0.09 0.04 0.03 -0.24 0.04 0.01 -0.08 0.16 0.14 0.06 0.06 0.10 -0.05 0.12 -0.02 -0.20
Class C  BQMall:             -0.19 -0.08 -0.28 -0.04 -0.06 -0.14 0.15 0.02 0.13 -0.09 -0.02 -0.18 -0.26 -0.34 -0.15 -0.11 -0.05 0.23 1.02 0.09 -0.01 -0.02 -0.23 -0.19
Class C  BasketballDrill *:  -0.13 0.08 -0.50 -0.51 -0.69 -0.37 -0.17 -0.22 -0.07 0.03 -0.42 -0.65 -3.81 -2.34 -1.65 -0.50 -0.43 -0.34 -0.28 -0.21 -0.43 -0.18 -0.05 0.11
Class D  BQSquare:           -0.32 -0.23 -0.20 -0.26 -0.06 -0.15 -0.19 -0.12 -0.26 -0.08 -0.11 -0.23 -0.19 -0.21 -0.12 -0.22 0.01 -0.04 0.07 0.04 -0.14 -0.16 -0.12 -0.11
Class D  BasketballPass *:   -0.21 -0.16 -0.04 -0.20 -0.17 0.15 0.00 0.00 0.15 0.35 -0.06 -0.03 -0.50 -0.10 -0.45 -0.13 -0.25 -0.04 0.21 -0.22 -0.07 -0.25 -0.24 -0.11
Class E  KristenAndSara:     0.34 0.08 -0.05 -0.24 -0.01 0.08 0.06 0.07 0.22 -0.06 0.12 -0.38 0.10 -0.23 -0.21 -0.06 0.20 0.25 0.66 0.72 -0.01 -0.12 0.04 0.02
Class E  FourPeople *:       -0.16 0.02 -0.17 -0.25 0.00 0.00 -0.09 0.15 0.16 -0.09 0.07 -0.23 -0.23 -0.18 -0.35 -0.06 -0.10 0.47 -0.04 0.20 -0.01 -0.02 -0.23 -0.25
Average:                     -0.16 -0.06 -0.20 -0.30 -0.21 -0.08 -0.01 0.00 0.07 0.02 -0.05 -0.24 -0.60 -0.43 -0.38 -0.11 -0.05 0.09 0.23 0.09 -0.09 -0.07 -0.11 -0.11
* Partial blocks of these video sequences were utilized to learn the Saab transforms.

The Saab transform based intra video codecs were implemented on HEVC test model version 16.9 (HM16.9), and the Saab transform was implemented in C++. To minimize the mutual influence of variable Coding Unit (CU) and TU sizes, the CU size was fixed as 16×16 and the TU size was fixed as 8×8, as we would like to analyze the performance of the 8×8 Saab transform without being influenced by the block size. RDO Quantization (RDOQ) was disabled so as to compare Saab transform and DCT without influence from quantization optimization. The coding experiments were performed under the All Intra (AI) configuration, where four QPs ∈ {22, 27, 32, 37} were tested. Note that since the CU size was fixed as 16×16, video sequences of Class B were clipped from 1920×1080 to 1920×1072, and video sequences in Class A were encoded and decoded conforming to the main profile at level 4 for alignment. All experiments were carried out on a workstation with a 3.3GHz CPU and 96.0GB memory running the Windows 10 operating system. Peak Signal-to-Noise Ratio (PSNR) and bit rate were utilized to evaluate the video quality and bit rate of the proposed Saab transform based intra video coding, while Bjøntegaard Delta PSNR (BDPSNR) and Bjøntegaard Delta Bit Rate (BDBR) [36] were adopted to represent the coding gain.

A. Coding Efficiency Analysis

We evaluated the coding performance of Saab transform based intra video coding in two phases. Firstly, the coding performance of each Saab transform kernel was validated one-by-one. In this coding experiment, the DCT of only one intra mode was replaced by SBT_k while the rest of the intra modes still used DCT, which gives 24 combinations denoted as s_k, k ∈ [0, 23]. Eight sequences were encoded for each s_k. Table VI shows the coding performance of the proposed Saab based intra video codec for each s_k as compared with the original DCT based codec, where a negative BDBR value indicates coding gain and a positive value means coding loss. We have three observations: 1) BDBR from -0.01% to -0.60% can be achieved on average for most intra modes. 2) SBT_12 achieves a BDBR of -0.60% on average when applied to the block residuals generated by intra modes 17 and 18, which is significant. 3) BDBR values are positive for several intra modes, such as {10, 11, 12, 25, 26, 27}, which indicates that the RD performance of Saab transforms around the horizontal and vertical directions is inferior to DCT on average. Based on these results, we propose not to replace DCT with SBT_k for intra modes in {8∼12, 24∼28}, i.e., integration strategy s_I, if RDO is not used. With RDO, s_II and s_III in Table III are proposed.

In addition to evaluating each SBT_i, the joint RD performance of Saab transforms for intra video coding was also evaluated, which includes the three strategies s_I, s_II and s_III. Twenty-three video sequences with various contents and resolutions in {416×240, 832×480, 1280×720, 1920×1080, 2560×1600} were encoded with the proposed Saab transform based intra video codec and the benchmark in the coding experiment.

100 frames were encoded for each test sequence. Table VII shows the RD performances of Saab transform based intra 1 video codec as compared with the state-of-the-art DCT based HEVC codec. We can observe that three schemes sI , sII and sIII can achieve BDBR gain -1.41%, -2.59% and -3.07% on Fig. 8. BDBR of sIII with Saab transforms of different decimal digits. average. Scheme sI can improve the coding efficiency for most sequences while schemes sII and sIII can improve the BDBR for all test sequences. In addition, maximum BDBR gains 134.2% and 138.4% on average respectively. Similarly, the are up to -9.10%,-9.72% and -10.00% for schemes sI , sII complexity is mainly brought by the implementation of Saab and sIII , respectively, which are significant and promising. transform. The decoding time of sII and sIII are reduced The competition between Saab transform and DCT with RDO as compared with sI because partial blocks are decoded with improves the coding performance of replacing DCT with Saab DCT in sII and sIII , which has a little lower computational transforms, i.e., sI , with a large margin. The overhead of the complexity. binary bit that indicating the optimal transform is negligible The memory consumption of these 24 8 8 one-stage 8 8 × × in comparison to the bit rate saving. Saab transform kernels for encoding and decoding is close to 3 MB, which is much larger than ICT implemented in HEVC. Each one-stage Saab transform kernel are stored with B. Coding Complexity Analysis 20 decimal digits. Besides transform matrix, the transform In addition to the coding efficiency, the coding com- kernel requires extra storage of parameters to do float-point plexity of the proposed Saab transform based intra video computations in one-stage Saab transform before mapping one coding was also analyzed. The ratios of the computational block to coefficients. complexities of the proposed Saab transform based intra video encoder/decoder to those of the DCT based anchor encoder/decoder are defined as C. RD Impacts from Different Computation Precisions in Saab Transform 4 EncR = 1 TEnc,Saab(QPi) 100% In the coding experiments, computation precision was set as  4 TEnc,DCT (QPi) × iP=1 , (18) 20 decimal digits in Saab transform. We evaluated the affection  4  DecR = 1 TDec,Saab(QPi) 100% of float-point precision on coding performance, i.e., BDBR. 4 TDec,DCT (QPi) i=1 ×  P To scheme sIII , video sequences “Traffic” and “BQMall”  where TEnc,Saab(QPi) and TDec,Saab(QPi) are encoding and were encoded by Saab transform based intra video coding with decoding time for QPi in Saab transform based intra video different decimal digits, i.e., 1, 2, 3, 5 and 20. In Fig. 8, BDBR codec, and TEnc,DCT (QPi) and TDec,DCT (QPi) are encod- for “Traffic” and “BQMall” were converged from -2.35% and ing/decoding time for QPi in DCT based intra video codec. -1.72% to -2.80% and -2.01% by increasing the number of In Table VII, in comparison to the fast algorithm for DCT decimal digits from 2 to 3. Increasing of the decimal digits in the existing codec, the EncR of intra video codecs with from 3 to 5 can achieve BDBR as -2.81% and -2.03% for schemes sI , sII and sIII to codec with DCT are 202.4%, “Traffic” and “BQMall” respectively, which is relatively small 240.0% and 284.4% on average respectively. Theoretically, the for bit rate reduction. 
In Table VII, compared with the fast DCT algorithm in the existing codec, the EncR of the intra video codecs with schemes sI, sII and sIII relative to the DCT based codec is 202.4%, 240.0% and 284.4% on average, respectively. Theoretically, the computational complexity of the Saab transform is a little lower than that of DCT, as illustrated in Section IV-C. However, the complexity increases to 202.4% for sI. The main reason is that the DCT is optimized with butterfly operations in HEVC, whereas the Saab transform is implemented directly with floating-point multiplications. In fact, the implementation of the Saab transform can be further optimized in the future. As for sII and sIII, the complexities increase from 202.4% to 240.0% and 284.4% because the optimal transform is selected with RDO.

In addition to the encoding complexity, the decoding complexity of the Saab transform based intra video codec was also evaluated. The DecR of the intra video codecs with schemes sI, sII and sIII ranges from 117.0% to 197.5% and is 147.6%, 134.2% and 138.4% on average, respectively. Similarly, the complexity is mainly brought by the implementation of the Saab transform. The decoding times of sII and sIII are reduced as compared with sI because partial blocks are decoded with DCT in sII and sIII, which has a slightly lower computational complexity.

The memory consumption of the 24 one-stage 8×8 Saab transform kernels for encoding and decoding is close to 3 MB, which is much larger than that of the ICT implemented in HEVC. Each one-stage Saab transform kernel is stored with 20 decimal digits. Besides the transform matrices, the kernels require extra storage of parameters for the floating-point computations of the one-stage Saab transform before mapping one block to coefficients.

Fig. 8. BDBR of sIII with Saab transforms of different decimal digits.

C. RD Impacts from Different Computation Precisions in Saab Transform

In the coding experiments, the computation precision was set to 20 decimal digits in the Saab transform. We evaluated the effect of the floating-point precision on the coding performance, i.e., BDBR. For scheme sIII, the video sequences "Traffic" and "BQMall" were encoded by the Saab transform based intra video coding with different numbers of decimal digits, i.e., 1, 2, 3, 5 and 20. In Fig. 8, the BDBR for "Traffic" and "BQMall" converges from -2.35% and -1.72% to -2.80% and -2.01% when the number of decimal digits is increased from 2 to 3. Increasing the number of decimal digits from 3 to 5 achieves BDBR of -2.81% and -2.03% for "Traffic" and "BQMall", respectively, which is a relatively small additional bit rate reduction. Experimental results indicate that Saab transforms with no fewer than 3 decimal digits of floating-point precision do not sacrifice the bit rate saving on average in the Saab transform based intra video codec. The reason behind these results is that the competition between the Saab transform and DCT minimizes the loss of RD performance caused by reducing the floating-point precision of the Saab transform. The impact of reducing the precision down to 3 decimal digits is negligible.
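The precision experiment above can be mimicked with the following sketch, which rounds a transform kernel to d decimal digits before applying it. The random 64×64 matrix merely stands in for a learned SBTk kernel (and the bias/mean parameters of the Saab transform are omitted for brevity); it is not the trained kernel from the paper.

```python
# Sketch of Section V-C: round the stored Saab kernel to d decimal digits and
# observe how far the transform coefficients drift from full precision.

import numpy as np

def quantize_kernel(kernel, decimal_digits):
    """Round every transform coefficient to the given number of decimal digits."""
    return np.round(kernel, decimals=decimal_digits)

rng = np.random.default_rng(0)
kernel = rng.standard_normal((64, 64)) / 8.0   # stand-in for a learned 8x8 Saab kernel
block = rng.standard_normal(64)                # flattened 8x8 residual block

for d in (1, 2, 3, 5, 20):
    coeffs = quantize_kernel(kernel, d) @ block
    err = np.linalg.norm(kernel @ block - coeffs)
    print(f"{d:2d} decimal digits: coefficient error = {err:.6f}")
```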

TABLE VII
RD PERFORMANCES AND COMPUTATIONAL COMPLEXITY OF SAAB TRANSFORM BASED INTRA VIDEO CODEC AS COMPARED WITH THE STATE-OF-THE-ART DCT BASED HEVC CODEC.

Transform set                      sI                                     sII                                    sIII
Class  Sequence name         BDBR(%) BDPSNR(dB) EncR(%) DecR(%)   BDBR(%) BDPSNR(dB) EncR(%) DecR(%)   BDBR(%)  BDPSNR(dB) EncR(%) DecR(%)
A      NebutaFestival          0.25   -0.019     212.9   159.3     -2.13    0.154     235.9   136.1     -2.30     0.167     267.4   139.3
A      SteamLocomotive         0.87   -0.045     224.3   153.8     -1.40    0.074     254.3   131.4     -1.69     0.089     290.8   134.0
A      Traffic                -0.88    0.047     216.2   143.4     -2.06    0.112     236.2   125.1     -2.81     0.154     295.4   137.0
A      PeopleOnStreet *       -1.07    0.061     225.2   148.7     -2.37    0.137     260.2   127.9     -3.00     0.174     318.8   137.6
B      Kimono                  0.74   -0.026     213.9   158.2     -0.97    0.033     251.0   141.8     -1.19     0.040     291.1   136.4
B      ParkScene              -0.11    0.05      198.1   144.8     -1.74    0.080     236.9   125.8     -2.07     0.096     285.9   133.7
B      Cactus                 -0.94    0.036     212.6   156.9     -2.28    0.089     232.4   126.0     -2.91     0.115     296.1   139.5
B      BQTerrace              -0.37    0.008     203.7   136.8     -1.76    0.105     230.4   124.3     -2.32     0.136     290.4   133.0
B      BasketballDrive        -0.62    0.012     193.2   132.2     -1.60    0.046     223.3   126.5     -2.24     0.065     287.2   125.8
C      RaceHorses             -0.91    0.060     217.1   161.4     -2.34    0.158     249.8   137.8     -2.67     0.180     296.7   142.7
C      PartyScene             -1.18    0.091     198.3   145.9     -1.99    0.157     238.7   133.0     -2.69     0.214     296.9   144.0
C      BQMall                 -0.19    0.012     195.0   139.2     -1.38    0.083     231.6   125.0     -2.03     0.124     297.8   134.1
C      BasketballDrill *      -9.10    0.463     212.6   155.4     -9.72    0.498     265.0   157.4    -10.00     0.514     273.3   150.5
D      RaceHorses             -2.37    0.158     199.9   169.7     -3.45    0.233     268.0   160.4     -3.87     0.262     303.2   161.5
D      BlowingBubbles         -2.19    0.132     195.5   197.5     -3.12    0.190     265.5   157.2     -3.78     0.232     292.6   162.9
D      BQSquare *             -0.68    0.058     188.2   141.8     -1.87    0.171     243.6   152.9     -2.47     0.227     304.0   146.3
D      BasketballPass         -0.34    0.019     175.7   139.9     -1.41    0.084     223.0   155.5     -2.04     0.124     277.2   119.4
E      Johnny                 -1.38    0.062     184.6   120.2     -2.20    0.101     226.4   120.2     -2.51     0.116     256.5   128.9
E      KristenAndSara *       -1.15    0.061     198.6   131.6     -1.89    0.103     218.6   117.5     -2.47     0.135     271.9   122.7
E      FourPeople             -0.99    0.058     199.6   135.4     -1.89    0.109     228.1   120.5     -2.58     0.149     269.4   122.6
F      BasketballDrillText    -7.45    0.413     223.1   162.6     -8.10    0.453     259.4   157.5     -8.39     0.470     275.4   151.2
F      ChinaSpeed             -0.43    0.040     181.3   126.3     -1.14    0.106     227.8   118.7     -1.66     0.156     262.6   117.0
F      SlideShow              -1.89    0.176     184.8   147.0     -2.70    0.262     213.0   134.4     -2.83     0.272     240.9   140.3
       Average                -1.41    0.082     202.4   147.6     -2.59    0.154     240.0   134.2     -3.07     0.183     284.4   138.4
* Block residuals were partially utilized to learn the Saab transforms.
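The BDBR figures in Table VII follow the standard Bjøntegaard measurement [36]. A compact sketch of that calculation is given below, assuming four (bit rate, PSNR) points per codec; it is the common cubic-fit formulation, not code from this work. A negative result means the tested codec needs fewer bits than the anchor at the same quality.

```python
# Bjontegaard delta bit rate (%) between two four-point RD curves.

import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    # fit log-rate as a cubic polynomial of PSNR for each codec
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # integrate over the overlapping PSNR interval
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0
```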

D. Ratio of Blocks Using Saab Transform

We analyzed the percentage of 8×8 blocks that adopt SBTk as the optimal transform for encoding their luminance (Y) block residuals. The percentage of blocks that adopt SBTk in encoding video sequences is defined as

\[ P_{Saab}(QP_i) = \frac{n_{Saab}(QP_i)}{n_{Total}(QP_i)}\times 100\%, \]   (19)

where nSaab(QPi) is the number of 8×8 blocks that adopt SBTk as the optimal transform when the QP is QPi, and nTotal(QPi) is the total number of encoded 8×8 blocks. Eleven different video sequences were encoded by the intra video codec with scheme sIII and QP ∈ {22, 27, 32, 37}. In Table VIII, PSaab is reported for each QP, with the per-QP averages in the last row and the per-sequence averages over the four QPs in the last column. With RDO, over 80% of blocks selected SBTk rather than DCT as the optimal transform when encoding "BasketballDrill" and "BasketballDrillText" with the intra video codec based on sIII. The percentage of blocks that select SBTk as the optimal transform with RDO is 46.05% on average. The best and worst cases of the percentage of SBTk selected as the optimal transform over DCT are 90.34% and 11.21%, for encoding "BasketballDrillText" and "KristenAndSara" with QP 37, respectively. The distributions of 8×8 blocks that selected SBTk and DCT in the first frame of "BasketballDrillText" and "KristenAndSara" with QP 37 are visualized in Fig. 9(a) and (b). We can observe that a large proportion of blocks select the Saab transform (blocks in white) as compared with DCT, which validates the effectiveness of the Saab transform.
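Eq. (19) amounts to simple counting of the per-block transform decisions. A minimal sketch follows, where the list of decisions is assumed to be collected from the encoder; the sample data is hypothetical.

```python
# Sketch of Eq. (19): percentage of 8x8 blocks that chose the Saab transform
# over DCT for one QP, given the per-block decisions ("saab" or "dct").

def p_saab(selected_transforms):
    n_saab = sum(1 for t in selected_transforms if t == "saab")
    return 100.0 * n_saab / len(selected_transforms)

decisions = ["saab", "dct", "saab", "saab", "dct"]   # hypothetical decisions
print(f"PSaab = {p_saab(decisions):.2f}%")           # -> PSaab = 60.00%
```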

TABLE VIII
PERCENTAGES OF SAAB TRANSFORM USED IN INTRA CODING FOR sIII.

Class  Sequence name          PSaab(QPi) (%)                    Average
                              QP=22   QP=27   QP=32   QP=37
A      Traffic                43.72   43.06   37.93   28.66     38.34
A      PeopleOnStreet         43.21   41.79   36.07   31.80     38.22
B      ParkScene              43.27   35.36   34.21   39.24     38.02
B      Cactus                 48.83   41.86   33.70   32.43     39.20
C      PartyScene             50.77   48.10   45.33   42.21     46.60
C      BasketballDrill        79.20   83.81   86.70   85.89     83.90
D      RaceHorses             49.68   50.57   55.97   64.70     55.23
D      BQSquare               40.38   45.51   43.06   34.68     38.26
E      KristenAndSara         33.73   19.79   15.26   11.21     20.00
E      FourPeople             36.18   25.57   23.05   21.67     26.61
F      BasketballDrillText    78.77   74.52   76.55   90.34     80.04
       Average                49.60   46.36   44.35   43.89     46.05

VI. CONCLUSIONS

Machine learning based transform is good at capturing the diverse statistical characteristics of data in video coding, as compared with the fixed Discrete Cosine Transform (DCT). In this work, we formulate the optimization problem in machine learning based transform and analyze the explainable machine learning based transform, i.e., the Subspace approximation with adjusted bias (Saab) transform. Then, the framework of Saab transform based intra video coding and intra mode dependent Saab transform learning are presented. The rate-distortion performance of the one-stage Saab transform over DCT for intra video coding is theoretically analyzed and experimentally verified. Finally, three integration schemes of the one-stage Saab transform based intra video codec are evaluated in comparison with the DCT based encoder. Extensive experiments have proved that the proposed intra video coding based on the 8×8 Saab transform, computed in floating-point arithmetic, is highly effective and can significantly improve the coding efficiency in comparison with the DCT based video codec. As the Saab transform is nonseparable, it requires more memory than DCT to store the transform kernels in implementation.

Fig. 9. The blocks using Saab transform and DCT in the proposed scheme sIII with QP 37, where white and black blocks indicate Saab and DCT, respectively. (a) "BasketballDrillText". (b) "KristenAndSara".

REFERENCES

[1] B. Bross, J. Chen, S. Liu, and Y.-K. Wang, "Versatile video coding (draft 7)," Joint Video Exploration Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC29/WG11, Geneva, CH, Oct. 2019.
[2] I. Dvir, D. Irony, D. Drezner et al., "Orthogonal directional transforms using discrete directional Laplacian eigen solutions for beyond HEVC intra coding," IEEE Trans. Image Process., vol. 29, pp. 5136-5146, 2020.
[3] C. Lan, J. Xu, W. Zeng et al., "Variable block-sized signal-dependent transform for video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 8, pp. 1920-1933, Aug. 2018.
[4] M. Koo, M. Salehifar, J. Lim et al., "CE6: reduced secondary transform (RST) (CE6-3.1)," Joint Video Exploration Team (JVET) of ITU-T and ISO/IEC, document JVET-N0193, Geneva, Mar. 2019.
[5] X. Cai and J. S. Lim, "Transforms for intra prediction residuals based on prediction inaccuracy modeling," IEEE Trans. Image Process., vol. 24, no. 12, pp. 5505-5515, Dec. 2015.
[6] M. Wang, W. Xie, J. Xiong et al., "Joint optimization of transform and quantization for high efficiency video coding," IEEE Access, vol. 7, pp. 62534-62544, 2019.
[7] X. Zhang, C. Yang, X. Li et al., "Image coding with data-driven transforms: methodology, performance and potential," IEEE Trans. Image Process., vol. 29, pp. 9292-9304, 2020.
[8] H. E. Egilmez, Y.-H. Chao and A. Ortega, "Graph-based transforms for video coding," IEEE Trans. Image Process., vol. 29, pp. 9330-9344, 2020.
[9] A. Arrufat, P. Philippe and O. Deforges, "Non-separable mode dependent transforms for intra coding in HEVC," Proc. IEEE Visual Commun. Image Process., pp. 61-64, Dec. 2014.
[10] S. Takamura and A. Shimizu, "On intra coding using mode dependent 2D-KLT," Pic. Coding Symp., pp. 137-140, 2013.
[11] X. Zhao, L. Zhang, S. Ma et al., "Video coding with rate-distortion optimized transform," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 1, pp. 138-151, Jan. 2012.
[12] ITU-T and ISO/IEC, "High efficiency video coding," Rec. ITU-T H.265, April 2013.
[13] J. Han, A. Saxena, V. Melkote et al., "Jointly optimized spatial prediction and block transform for video and image coding," IEEE Trans. Image Process., vol. 21, no. 4, pp. 1874-1884, April 2012.
[14] X. Zhao, J. Chen, M. Karczewicz et al., "Enhanced multiple transform for video coding," Data Compress. Conf., pp. 73-82, 2016.
[15] A. Kammoun, W. Hamidouche, F. Belghith et al., "Hardware design and implementation of adaptive multiple transforms for the versatile video coding standard," IEEE Trans. Consum. Electron., vol. 64, no. 4, pp. 424-432, Nov. 2018.
[16] X. Zhao, X. Li and S. Liu, "CE6: on 8-bit primary transform core (Test 6.1.3)," Joint Video Exploration Team (JVET) of ITU-T and ISO/IEC, document JVET-L0285, Macao, CN, Oct. 2018.
[17] B. Zeng and J. Fu, "Directional discrete cosine transforms - a new framework for image coding," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 3, pp. 305-313, March 2008.
[18] X. Zhao, J. Chen, M. Karczewicz et al., "Joint separable and non-separable transforms for next-generation video coding," IEEE Trans. Image Process., vol. 27, no. 5, pp. 2514-2525, 2018.
[19] Y. Zhang, K. Zhang, L. Zhang et al., "Implicit-selected transform in video coding," 2020 IEEE Int'l Conf. Multimedia Expo Workshops (ICMEW), pp. 1-6, 2020.
[20] J. Pfaff, H. Schwarz, D. Marpe et al., "Video compression using generalized binary partitioning, trellis coded quantization, perceptually optimized encoding, and advanced prediction and transform coding," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 5, pp. 1281-1295, 2020.
[21] W. Park, B. Lee and M. Kim, "Fast computation of integer DCT-V, DCT-VIII, and DST-VII for video coding," IEEE Trans. Image Process., vol. 28, no. 12, pp. 5839-5851, Dec. 2019.
[22] M. J. Garrido, F. Pescador, M. Chavarrías et al., "A 2-D multiple transform processor for the versatile video coding standard," IEEE Trans. Consum. Electron., vol. 65, no. 3, pp. 274-283, Aug. 2019.
[23] G. Lu, X. Zhang, W. Ouyang et al., "An end-to-end learning framework for video compression," IEEE Trans. Pattern Anal. Mach. Intell., pp. 1-1, 2020, doi: 10.1109/TPAMI.2020.2988453.
[24] C.-C. J. Kuo, M. Zhang, S. Li et al., "Interpretable convolutional neural networks via feedforward design," J. Vis. Commun. Image Represent., vol. 60, pp. 346-359, 2019.
[25] C.-C. J. Kuo, "Understanding convolutional neural networks with a mathematical model," J. Vis. Commun. Image Represent., vol. 41, pp. 406-413, 2016.
[26] C.-C. J. Kuo, "The CNN as a guided multilayer RECOS transform [lecture notes]," IEEE Signal Process. Mag., vol. 34, no. 3, pp. 81-89, 2017.
[27] C.-C. J. Kuo and Y. Chen, "On data-driven Saak transform," J. Vis. Commun. Image Represent., vol. 50, pp. 237-246, 2018.
[28] N. Li, Y. Zhang, Y. Zhang et al., "On energy compaction of 2D Saab image transforms," Asia-Pacific Signal Info. Process. Assoc. Annu. Summit Conf., Lanzhou, China, Nov. 18-21, 2019.
[29] H. Lohscheller, "A subjectively adapted image communication system," IEEE Trans. Commun., vol. 32, no. 12, pp. 1316-1322, Dec. 1984.
[30] L. Xu, X. Ji, W. Gao et al., "Laplacian distortion model (LDM) for rate control in video coding," Proc. 8th Pacific Rim Conf. Multimedia, Berlin, Heidelberg, pp. 638-646, 2007.
[31] H. Gish and J. Pierce, "Asymptotically efficient quantizing," IEEE Trans. Inf. Theory, vol. 14, no. 5, pp. 676-683, Sept. 1968.
[32] R. M. Gray and D. L. Neuhoff, "Quantization," IEEE Trans. Inf. Theory, vol. 44, no. 6, Oct. 1998.
[33] T. Wiegand and H. Schwarz, "Source coding: Part I of Fundamentals of Source and Video Coding," Found. Trends Signal Process., pp. 1-222, Jan. 2011.
[34] E. Y. Lam and J. W. Goodman, "A mathematical analysis of the DCT coefficient distributions for images," IEEE Trans. Image Process., vol. 9, no. 10, pp. 1661-1666, Oct. 2000.
[35] G. J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Process. Mag., vol. 15, no. 6, pp. 74-90, Nov. 1998.
[36] G. Bjøntegaard, "Calculation of average PSNR differences between RD-curves," Video Coding Experts Group (VCEG) of ITU-T, document VCEG-M33, 13th Meeting, Austin, Texas, USA, April 2001.