PAPER UNDER REVIEW, 2021

Versatile Video Coding Standard: A Review from Coding Tools to Consumers Deployment

Wassim Hamidouche, Member, IEEE, Thibaud Biatek, Mohsen Abdoli, Edouard François, Fernando Pescador, Senior Member, IEEE, Miloš Radosavljević, Daniel Menard and Mickael Raulet, Senior Member, IEEE

Abstract—The amount of video content and the number of applications based on multimedia information increase each day. Developing new video coding standards that increase the compression rate and improve other important features, with only a reasonable increase in computational load, is a challenge. The Joint Video Experts Team (JVET) of ITU-T and the MPEG group within ISO/IEC have worked together to standardize Versatile Video Coding, finally approved in July 2020 as the ITU-T H.266 / MPEG-I Part 3 (ISO/IEC 23090-3) standard. This paper overviews some interesting consumer electronic use cases, the compression tools described in the standard, the currently available real-time implementations and the first industrial trials done with this standard.

Index Terms—Encoding/decoding, Application/implementation, Real-time video.

arXiv:2106.14245v1 [eess.IV] 27 Jun 2021

Wassim Hamidouche and Daniel Menard are with Univ. Rennes, INSA Rennes, CNRS, IETR - UMR 6164, Rennes, France (e-mail: [email protected] and [email protected]). Thibaud Biatek, Mohsen Abdoli and Mickael Raulet are with ATEME, Rennes, France. Edouard François and Miloš Radosavljević are with InterDigital, Cesson-Sévigné, France. F. Pescador is with CITSEM at Universidad Politécnica de Madrid, Madrid, Spain (e-mail: [email protected]).

I. INTRODUCTION

The last two decades have witnessed exciting developments in consumer electronics applications. In this framework, multimedia applications, and more specifically those in charge of video encoding, broadcasting, storage and decoding, play a key role. Video content represents today around 82% of the global Internet traffic according to a study recently conducted by Cisco [1], and video streaming represents 58% of the Internet traffic [2]. All these new trends will increase the share of video traffic, the storage requirements and especially the energy footprint. For instance, video streaming contributes today to 1% of the global greenhouse gas emissions, which represents the emissions of a country like Spain [3]. It is expected that in 2025 the CO2 emissions induced by video streaming will reach the global CO2 emissions of cars [3].

The impressive consumption of multimedia content on different consumer electronic products (mobile devices, smart TVs, video consoles, immersive and 360° video, or augmented and virtual reality devices) requires more efficient video coding algorithms to reduce the bandwidth and storage requirements while increasing the quality. Nowadays, mass market products demand videos with higher resolutions (greater than 4K), higher quality (HDR or 10-bit resolution) and higher frame rates (100/120 frames per second). All of these features must be integrated in devices with low resources and limited batteries. Therefore, a balance between the complexity of the algorithms and the efficiency of the implementations is a challenge in the development of new consumer electronic devices.

Taking this situation into account, the Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T and ISO/IEC started working in 2010 [4] on the development of more efficient video coding standards. An example of the success of this collaboration is the high efficiency video coding (HEVC) [5] standard, which reduces the bit-rate of the previous video standard, advanced video coding (AVC) [6], by 50% for a similar visual quality [7]. Presently, Versatile Video Coding [8] is the most recent video standard and therefore the one that defines the current state of the art. The challenge of this new video standard is to ensure that ubiquitous, embedded, resource-constrained systems are able to respond in real time to the requirements imposed by the increasingly complex and computationally demanding consumer electronics video applications.

Versatile Video Coding (VVC) [8] brings important improvements compared to its predecessors, although it is also based on the conventional hybrid prediction/transform video coding design. VVC achieves a 50% [9], [10] bitrate reduction compared to HEVC by implementing a set of new tools and features distributed over the main modules of the traditional hybrid prediction/transform coding scheme. On the other hand, the complexity of both the encoder and the decoder has increased [11], as explained in further sections of this paper.

The VVC standard has been published and, at present, several research institutions and companies are working on efficient implementations that will be included in new consumer electronics devices very soon. This paper reports in further sections some efficient implementations and trials done recently in real scenarios.

The remainder of this paper is organized as follows. Section II describes some use cases and the integration of the standard with other standards included in the video ecosystem. Sections III and IV outline the basic tools of the VVC standard and the complexity of the algorithm, respectively. In Section V, some state-of-the-art implementations for consumer electronic devices are reported, and the first commercial trials are then presented in Section VI. Finally, Section VII concludes the paper.

Fig. 1: Potentials of VVC from two different points of view: improving existing services and enabling emerging ones.

II. USE-CASES AND STANDARD INTEGRATION

Fig. 2: VVC encoder block diagram.

The need for more efficient codecs has arisen from different sectors of the video delivery ecosystem, as the choice of codec plays a critical role in their success during the coming years. This includes different applications on different transport mediums, and VVC is consistently being considered as one of the main options. From the use-case point of view, the VVC specification potentially covers a significantly wider range of applications compared to previous video codecs. This aspect is likely to have a positive impact on the deployment cost and interoperability issues of solutions based on VVC. Thanks to its versatility and its high capacity to address the upcoming compression challenges, VVC can be used both for improving existing video communication applications and for enabling new ones relying on emerging technologies, as illustrated in Figure 1.

To properly address market needs and be deployed at scale, VVC shall be referenced and adopted by application-oriented standards developing organization (SDO) specifications. Organizations such as digital video broadcast (DVB), the 3rd generation partnership project (3GPP) or the advanced television systems committee (ATSC) are defining receivers' capabilities for broadcast and broadband applications and are thus critical to foster VVC adoption in the ecosystem. Apart from its intrinsic performance (complexity and compression), the successful adoption of a new codec also relies on its licensing structure.

DVB, which is a set of international open standards for digital television, is currently working to include next-generation video coding solutions in the DVB specification. In late 2020, before starting the standardization activities, DVB organized a workshop on new video codecs. During this workshop, five potential codecs were presented and discussed as candidates to address DVB customers' needs: VVC, essential video coding (EVC), AOMedia Video 1 (AV1) from the alliance for open media (AOM), low complexity enhancement video coding (LC-EVC) and the audio video coding standard (AVS3). Since then, the commercial and technical work on the inclusion of new video codecs in the DVB toolbox has started and is in progress, with a new TS-101-154 specification expected in 2022.

Similarly, 3GPP, which specifies mobile technologies from the physical layer to the application layer (e.g. 4G and 5G), is also investigating the adoption of new codecs for 5G applications. Currently, three video codecs are being characterized, namely VVC, AV1 and EVC. In TR 26.955 [12], these codecs are investigated for several scenarios, such as HD-Streaming, 4K-TV, Screen-Content, Messaging, Social-Sharing and Online-Gaming.

To limit the risk of reproducing the same licensing uncertainty as HEVC, VVC has taken a different approach. First, the media coding industry forum (MC-IF) has been created to deal with all non-technical issues related to VVC, such as licensing and commercial development. Second, the specification of supplemental enhancement information (SEI) messages has been shifted to a dedicated specification called VSEI (versatile SEI), published as ITU-T H.274 or ISO/IEC 23002-7. Finally, VVC has defined in its high-level syntax (HLS) a structure named the adaptation parameter set (APS), enabling tools to be switched off in a normative way in case the licensing of specific IP becomes an issue.

Finally, the integration of VVC with all these standards and initiatives will allow its use in different consumer electronic devices.

III. VVC CODING TOOLS

In this section, we give a brief description of the VVC coding tools to understand the improvements with regard to its predecessors. Figure 2 illustrates the block diagram of a VVC encoder, which relies on a conventional hybrid prediction/transform architecture. The VVC encoder is composed of six main blocks: 1) preprocessing, 2) frame partitioning, 3) prediction, 4) transform/quantization, 5) decoder and in-loop filters, and 6) entropy coding. These six blocks are described in more detail in this section. The preprocessing step relies on the luma mapping with chroma scaling (LMCS) tool, which is described in the in-loop filters section.
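To give a flavour of what such a luma-remapping preprocessing step does, the sketch below builds a piecewise-linear forward mapping look-up table. The segment weighting and its derivation are invented for this illustration; the normative LMCS construction in VVC is different and more involved.

```python
# Toy piecewise-linear forward luma mapping in the spirit of LMCS.
# `weights` (made up for the example) gives the relative number of
# output codewords assigned to each equal-width input segment; the
# returned LUT remaps every input luma value accordingly.

def build_forward_lut(weights, bit_depth=10):
    """Build a forward-mapping LUT from per-segment codeword weights."""
    n_seg = len(weights)
    max_val = 1 << bit_depth          # number of input codewords
    seg_in = max_val // n_seg         # input width of each segment
    total = sum(weights)
    lut = []
    out_base = 0.0
    for w in weights:
        # Slope of this segment: output codewords per input codeword.
        slope = (w / total) * max_val / seg_in
        for i in range(seg_in):
            lut.append(int(out_base + slope * i))
        out_base += (w / total) * max_val
    return lut
```

With uniform weights the mapping is the identity; non-uniform weights expand the segments that receive more codewords while keeping the LUT monotonic, which is the property the chroma-scaling side relies on.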

A. Frame partitioning

The first step of the frame partitioning block splits the picture into blocks of equal size, named coding tree units (CTUs). The maximum CTU size is 128 × 128 pixels in VVC, and a CTU is composed of one or three coding tree blocks (CTBs), depending on whether the video signal is monochrome or contains three color components. The CTUs are then processed in raster scan order, from top left to bottom right. In order to adapt the prediction block size to the local activity of the pixels, each CTU is then recursively split into smaller rectangular coding units (CUs), according to the multi-type tree (MTT) partitioning scheme. The MTT partitioning is an extension of the quad-tree (QT) partitioning adopted in HEVC. The blocks resulting from the partitioning process are named CUs and may have a size between 64 × 64 and 4 × 4. In Intra slices, the luma and chroma components can be recursively split according to their own coding trees (separate luma and chroma coding trees).

As in HEVC, the QT divides a CU into four equal sub-CUs. In addition, VVC allows rectangular CU shapes through its novel binary-tree (BT) and ternary-tree (TT) splits. The BT divides a CU into two sub-CUs, while the TT divides a CU into three sub-CUs with the ratio 1:2:1. Both BT and TT can split a CU horizontally or vertically. The rate-distortion optimization (RDO) process performs an exhaustive search that calculates the rate-distortion (RD) cost of each CU and selects the CTU partition that leads to the lowest RD cost. In Intra slices, the VVC test model (VTM) constrains the first split of the CTU to be a QT. Additional constraints are also applied on CUs; for instance, a QT split is not allowed after a BT or a TT split.

B. Intra coding tools

Intra coding takes advantage of the spatial correlation existing in local image textures. To decorrelate them, a series of coding tools is provided in VVC, tied to the partitioning and to a set of Intra prediction modes (IPMs) [13]. For the block partitioning, in addition to the principles explained in the previous section, a new tool called dual-tree is introduced, which allows separate partitioning trees for the luma and chroma channel types. As opposed to the single-tree partitioning used in inter coding, the intra-specific dual-tree tool offers a higher level of freedom for the coding decisions of chroma blocks [14].

The set of IPMs in VVC has been extended to 67 modes, compared to 35 in HEVC. This set consists of two modes, DC and planar, for modeling homogeneous textures, as well as 65 directional modes for modeling angular textures. The IPMs are coded in VVC through a most probable modes (MPMs) list of six IPMs, while a list of three MPMs is used in HEVC [15]. For square blocks, these directional modes cover the same range of directional angles as in HEVC, with twice the precision. Moreover, thanks to a new tool in VVC called wide-angle Intra prediction (WAIP), the set of 65 angular directional IPMs is adaptively shifted for non-square blocks [16]. This tool assigns additional directional modes to the longer side of non-square blocks. By doing so, prediction directions with an angle greater than 45° relative to the pure horizontal or vertical modes are also possible.

Intra prediction references are improved in VVC to more efficiently manage noisy and less correlated neighboring pixels. A reference selection tool, called multiple reference lines (MRL), has been introduced, which allows the encoder to choose among three reference lines and to explicitly signal the best one [17]. Once the reference line is selected, reference sample smoothing and reference sample interpolation are the two mechanisms in VVC used to denoise the reference pixels. The choice between these two mechanisms is made implicitly (i.e. without signaling), based on the block characteristics.

Position-dependent prediction combination (PDPC) is yet another new tool in VVC, which implicitly combines the prediction signal of a block with its unfiltered and filtered boundaries [18]. Although PDPC is applied differently to DC, planar, horizontal/vertical and the other directional modes, its general functionality can be formulated as follows:

P̃(x, y) = ⌊(ω_L R_L + ω_T R_T + (64 − ω_L − ω_T) P̃(x, y) + 32)/2^6⌋,  (1)

where P̃(x, y) is the predicted sample at position (x, y), and R_T and R_L are the reference pixels aligned, respectively, with the top and the left of this position. The weights of these reference pixels, ω_T and ω_L, are determined based on the position (x, y) as well as on the selected intra mode. Finally, in this equation, the operator ⌊x⌋ is the lower integer value of a real x, used together with the division by 2^6 to normalize the modified prediction value into its range.

The Intra sub-partitions (ISP) tool allows splitting an intra block into two or four sub-blocks, each having its own residual block while sharing one single intra mode [19]. The initial motivation behind this tool is to allow short-distance intra prediction of blocks with non-stationary textures, by sequential processing of the sub-partitions. On thin non-square blocks, this scheme can result in sub-partitions coded with 1-D residual blocks, providing the closest possible reference lines for intra prediction.

The matrix-based Intra prediction (MIP) in VVC is a new tool designed by an AI-based, data-driven method [20]. MIP modes replace the functionality of conventional IPMs by a matrix multiplication of the reference lines, instead of their directional projection. The process of predicting a block with MIP consists of three main steps: 1) averaging with sub-sampling, 2) matrix-vector multiplication and 3) linear interpolation. The data-driven aspect of MIP is expressed in the second step, where a set of matrices is pre-trained and hard-coded in the VVC specification. This set is designed to provide distinct matrices for the different combinations of block size and internal MIP mode.
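Many of these intra refinements reduce to small integer kernels. As an illustration, the PDPC blending of (1) transcribes directly to the sketch below; the position-dependent weights are left to the caller, since the normative per-mode weight schedules are not reproduced here.

```python
# Direct transcription of the PDPC combination in (1):
#   P' = floor((wL*RL + wT*RT + (64 - wL - wT)*P + 32) / 2**6)
# The weight derivation per intra mode and position is NOT modelled;
# callers pass wL/wT explicitly.

def pdpc_sample(pred, r_left, r_top, w_left, w_top):
    """Blend a predicted sample with its left/top reference samples."""
    return (w_left * r_left + w_top * r_top
            + (64 - w_left - w_top) * pred + 32) >> 6
```

With both weights at zero the sample is returned unchanged, and with a weight of 64 the corresponding reference sample fully replaces the prediction, which matches the normalization by 2^6 in (1).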

Cross-component linear model (CCLM) is a new tool in VVC for exploiting the local correlations between the luma and chroma channels [21]. This tool is based on a similar concept in the HEVC range extension standard, where the inter-channel correlation is modeled in the residual domain [22]. In VVC, this modeling is carried out in the reconstructed domain, in the form of:

P̃_c(x, y) = α P̂(x, y) + β,  (2)

where P̃_c(x, y) is the predicted chroma value at position (x, y), based on the co-located reconstructed luma value P̂(x, y). The model parameters α and β are derived from the relation between the neighboring luma and chroma pixels.

C. Inter coding tools

Inter coding relies on the inter-prediction of motion and texture data from previously reconstructed pictures stored in the decoded picture buffer (DPB). A simplified block diagram of the VVC inter-prediction process is provided in Figure 3. The process first involves motion prediction, based on a list of motion data candidates. The motion prediction can be corrected by residual motion information signaled in the bitstream. The reconstructed motion vectors are then used to perform one or two motion compensations, depending on whether the coding block is uni- or bi-predicted. When bi-prediction is performed, a blending process is applied to mix the two motion-compensated blocks. Finally, a prediction enhancement step is performed as a post-prediction filtering. The resulting predicted signal can be further corrected by a residual block signaled in the bitstream.

Fig. 3: Block diagram of the VVC inter-prediction process.

Two different motion models are supported in VVC, the translational model and the affine model, controlled at CU level. The affine model can rely on 4 or 6 degrees of freedom. When a CU is coded with affine motion, it is split into 4×4 sub-blocks whose motion vectors (MVs) are derived from the affine parameters of the CU. These parameters are deduced from 2 or 3 control-point motion vectors positioned in the top-left, top-right and possibly bottom-left corners of the CU, depending on the number of affine parameters.

In VVC, the motion vector accuracy is 1/16th of a pixel for the luma component and 1/32nd for the chroma components (for the 4:2:0 chroma format). An inter CU can be coded according to three main modes: 1) the skip mode, which only specifies a motion data predictor (motion vector and reference picture index) among a set of motion data candidates, and does not add any motion residual or texture residual to the predicted motion and texture block; 2) the merge mode, which additionally signals a texture residual; and 3) the advanced motion vector prediction (AMVP) mode, which is a superset of the merge mode where the whole motion information is signaled. Each of these modes uses a list of motion information candidates, and the list construction differs for the translational and affine modes. As in HEVC, the lists are built from the neighboring spatial motion information of the current CU (spatial MVs) and from the motion information of reference pictures (TMVP). In addition, VVC specifies two new types of motion candidates: history-based motion vector prediction (HMVP) and the pairwise MV.
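The history-based candidate idea can be sketched as a small FIFO of recently coded motion vectors with duplicate pruning. The five-entry limit below matches the size stated for VVC, while the pruning rule is simplified (VVC compares the full motion information, not just the vector).

```python
from collections import deque

# Toy sketch of a history-based MV predictor table: recently coded
# motion vectors enter a bounded FIFO; a vector equal to an existing
# entry is moved to the most-recent position instead of duplicated.

class HmvpTable:
    def __init__(self, max_size=5):
        self.max_size = max_size
        self.entries = deque()

    def push(self, mv):
        if mv in self.entries:        # prune duplicate, re-insert as newest
            self.entries.remove(mv)
        self.entries.append(mv)
        if len(self.entries) > self.max_size:
            self.entries.popleft()    # drop the oldest entry

    def candidates(self):
        """Most recent first, as candidates for the merge/AMVP lists."""
        return list(reversed(self.entries))
```

The table gives the predictor access to motion vectors that are no longer in the spatial or temporal neighborhood of the current CU, at the cost of a constant-size buffer.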
HMVP stores in a FIFO buffer up to 5 MVs, which allows accessing MVs not present in the spatial or temporal neighborhood of the CU. The pairwise MV is an average of the first two candidates in the list. VVC also supports a new mode, sub-block temporal motion vector prediction (SbTMVP), which performs a temporal motion field prediction of a CU at a 4×4 block granularity.

Residual motion signaling applies in AMVP mode and consists in coding corrective motion information added to the motion information prediction. Compared to HEVC, VVC supports new residual motion coding tools. Adaptive motion vector resolution (AMVR) allows adapting, at CU level, the coding precision of the motion vector difference (MVD). Symmetrical motion vector difference (SMVD) applies in the case of bi-directional prediction and consists in coding a single corrective MV applied symmetrically to the MV of each direction. In the merge mode case, the merge with motion vector difference (MMVD) mode allows slightly correcting the predicted motion information. Finally, in the case of bi-directional prediction, it is also possible to refine the MVs of a CU at a 4×4 granularity using the decoder-side MV refinement (DMVR) mode, which performs a motion search of limited range per 4×4 sub-block.

The motion compensation is based on separable linear 8-tap filters with 16 phases for luma, and 4-tap filters with 32 phases for chroma. In VVC, it is also possible to adaptively change the coded picture resolution using the reference picture resampling (RPR) tool. Four different sets of interpolation filters are used depending on the motion model, the block size, and the scaling ratio between the current picture and the reference picture (maximum downsampling ratio 2 and upsampling ratio 8). RPR offers new capabilities for bit-rate control to adapt to network bandwidth variations, and is also applicable to scalability and adaptive resolution coding. For 360° video content using the equi-rectangular projection format, horizontal wrap-around motion compensation limits seam artifacts by performing wrapping, instead of padding, of the samples located at the reference picture vertical borders.

Regarding the motion prediction blending, several new modes are supported in VVC compared to HEVC. Bi-prediction with CU-level weights (BCW) is an enhanced version of the bi-prediction blending of HEVC, performing a weighted averaging of the two prediction signals P̂_0 and P̂_1 according to the following formula:

P̃ = ⌊((8 − w) P̂_0 + w P̂_1 + 4)/2^3⌋,  (3)

where the weight w is selected among a pre-defined set of weights. VVC also supports the geometric partitioning mode (GPM), which splits a CU into non-rectangular sub-partitions, each partition embedding a translational MV.

Combined inter-intra prediction (CIIP) generates a mixed version of the temporal prediction P̃_Inter and spatial prediction P̃_Intra samples according to the following formula:

P̃ = (w P̃_Inter + (4 − w) P̃_Intra)/4,  (4)

where the mixing weight w depends on the coding modes (intra or inter) of the top and left CUs.

The final prediction enhancement step is a new coding step in VVC, not present in HEVC. It consists in slightly adjusting the prediction signal according to two possible modes, both relying on optical flow principles. Bi-directional optical flow (BDOF) applies in the case of bi-directional prediction and consists in correcting the sample values based on the MVs and on the spatio-temporal gradients derived from the two reference pictures. Prediction refinement with optical flow (PROF) applies to CUs coded with the affine model and performs a per-sample correction that takes into account the difference between the true per-sample motion field derived from the affine model parameters and the approximated 4×4 motion field used in the actual affine motion compensation step.

D. Transform and quantization

The transform module in VVC is composed of two blocks, namely multiple transform selection (MTS) and low-frequency non-separable transform (LFNST), which perform separable and non-separable transforms, respectively [23].

MTS: The MTS block in VVC involves three transform types: the discrete cosine transform (DCT)-II, the DCT-VIII and the discrete sine transform (DST)-VII. The kernels of DCT-II (C_2), DST-VII (S_7) and DCT-VIII (C_8) are derived from (5), (6) and (7), respectively:

C_2^N(i, j) = γ_i √(2/N) cos(π(i − 1)(2j − 1)/(2N)),  (5)

with γ_i = √(1/2) if i = 1, and γ_i = 1 for i ∈ {2, ..., N}.

S_7^N(i, j) = √(4/(2N + 1)) sin(π(2i − 1)j/(2N + 1)),  (6)

C_8^N(i, j) = √(4/(2N + 1)) cos(π(2i − 1)(2j − 1)/(2(2N + 1))),  (7)

with (i, j) ∈ {1, 2, ..., N}² and N the transform size.

The MTS concept selects, for luma blocks of size lower than 64, the set of transforms that minimizes the rate-distortion cost among five transform sets and the skip configuration. However, only DCT-II is considered for the chroma components and for luma blocks of size 64. The sps_mts_enabled_flag flag, defined at the sequence parameter set (SPS) level, enables the MTS at the encoder side. Two other flags are defined at the SPS level to signal whether implicit or explicit MTS signalling is used for Intra and Inter coded blocks, respectively. For the explicit signalling, used by default in the reference software, a syntax element signals the selected horizontal and vertical transforms. To reduce the computational cost of large block-size transforms, the effective height M′ and width N′ of the coding block (CB) are reduced depending on the CB size and transform type:

N′ = min(N, 16) if trTypeHor > 0; min(N, 32) otherwise,  (8)

M′ = min(M, 16) if trTypeVer > 0; min(M, 32) otherwise.  (9)

In (8) and (9), M′ and N′ are the effective width and height sizes, trTypeHor and trTypeVer are respectively the types of the horizontal and vertical transforms (0: DCT-II, 1: DCT-VIII and 2: DST-VII), and the min(a, b) function returns the minimum of a and b. The sample values beyond the limits of the effective N′ and M′ are considered to be zero, thus reducing the computational cost of the 64-size DCT-II and of the 32-size DCT-VIII/DST-VII transforms. This concept is called zeroing in the VVC specification.

LFNST: The LFNST [24], [25] has been adopted in VVC since VTM version 5. The LFNST relies on a matrix multiplication applied between the forward primary transform and the quantisation at the encoder side:

Z⃗ = T · Y⃗,  (10)

where the vector Y⃗ contains the coefficients of the residual block rearranged in a vector, and the matrix T contains the transform kernel coefficients. The inverse LFNST is expressed in (11):

Ỹ⃗ = T^T · Z̃⃗.  (11)

Four sets of two LFNST kernels of sizes 16×16 and 64×64 are applied on 16 coefficients of small blocks (min(width, height) < 8) and on 64 coefficients of larger blocks (min(width, height) > 4), respectively. The VVC specification defines four different transform sets, selected depending on the Intra prediction mode, and each set defines two transform kernels. The kernel used within a set is signaled in the bitstream. The transform index within a set is coded with a truncated Rice code with Rice parameter p = 0 and cMax = 2 (TRp), and only the first bin is context coded. The LFNST is applied on Intra CUs for both Intra and Inter slices and concerns the luma and chroma components. Finally, LFNST is enabled only when DCT-II is used as the primary transform.

E. In-loop filters

The picture partitioning and the quantization steps used in VVC may cause coding artifacts such as block discontinuities, ringing, mosquito noise, or texture and edge smoothing. Four in-loop filters are thus defined in VVC to alleviate these artifacts and enhance the overall coding efficiency [26]. The VVC in-loop filters are the deblocking filter (DBF), the sample adaptive offset (SAO), the adaptive loop filter (ALF) and the cross-component adaptive loop filter (CC-ALF). In addition, LMCS is a novel tool introduced in VVC that performs both a luma mapping of the luma prediction signal in inter mode and a chroma scaling of the residuals after inverse transform and inverse quantisation. The DBF is applied on block boundaries to reduce the blocking artifacts. The SAO filter is then applied on the deblocked samples. The SAO filter first classifies the reconstructed samples into different categories; then, for each category, an offset value retrieved by the entropy decoder is added to each sample of the category. The SAO is particularly efficient to alleviate ringing artifacts and to correct local average intensity changes. The last in-loop filters, ALF and CC-ALF, perform block-based linear filtering and adaptive clipping. The ALF performs adaptive filtering to minimize the mean squared error (MSE) between the original and reconstructed samples, relying on Wiener filtering [27]. A 7×7 diamond filter shape is applied on the luma components.
The filter coefficients are derived from a 4×4 block classification based on local sample gradients. According to the computed class, filter coefficients are selected from a set of filters which are either fixed or signaled in the bitstream at the level of the APS. Geometric transformations such as 90-degree rotation, diagonal or vertical flip may also be applied to the filter coefficients according to the block class. For chroma samples, a 5×5 diamond ALF filter shape is first applied. The chroma filter coefficients can only be signaled in the APS. The CC-ALF uses co-located luma samples to generate a correction for chroma samples. The CC-ALF is applied only on chroma samples and is performed in parallel with the ALF.

IV. COMPLEXITY AND CODING PERFORMANCE

In order to assess the benefits of the VVC coding tools described in the previous sections, a "tool off" test has been performed, consisting, for each individual tool, in evaluating the coding cost variations between an encoding setting with all tools enabled and an encoding setting with all tools enabled except the tested tool. A set of 42 UHD sequences, not included in the common test sequences used during the VVC development process, has been used. This set includes sequences of various texture complexity, motion amplitude, local variations and frame rates (from 30 to 60 frames per second). The evaluation has been performed in random access (RA) configuration, with a group of pictures (GOP) size of 32 and one intra frame inserted every second, using the VVC reference software (VTM12.0). The evaluation focuses on the main new tools supported by VVC and not present in HEVC (except the partitioning part, which is not considered in this evaluation). The Bjøntegaard delta (BD-rate) metric [28] is used to estimate the bit-rate variations, using PSNR, VMAF [29] and MS-SSIM [30] as objective quality metrics. For the PSNR metric, the BD-rate variations are computed for the PSNR of each color component (Y, U, V), then a weighted BD-rate value is computed from the BD-rate of each component (using a weight of 10/12 for luma and 1/12 for each chroma component). VMAF and MS-SSIM are only computed on the luma (Y) component. A positive BD-rate variation indicates a bit-rate increase when disabling the tool. Encoding and decoding runtime variations are also reported. These latter figures are provided as indicative data but must be considered with care, since the VTM decoder implementation is far from a real product implementation and from an efficient implementation. Detailed results are reported in Table I.

TABLE I: Per-tool VVC performance. The BD-rate performance is reported per disabled tool in terms of PSNR, VMAF and MS-SSIM. The relative encoding (EncT) and decoding (DecT) runtimes are reported with respect to the anchor.

Tools          PSNRYUV   VMAF    MS-SSIM    EncT      DecT
MIP†            0.33%    0.57%    0.46%     96.0%    100.0%
MRL†            0.12%    0.16%    0.13%    100.0%    100.0%
LMchroma†       3.80%    1.47%    1.31%     99.0%    100.0%
ISP†            0.36%    0.32%    0.33%     96.0%    100.0%
MTS⊕            0.88%    1.07%    1.03%     93.7%     99.2%
SBT⊕            0.35%    0.52%    0.27%     95.0%    100.0%
LFNST⊕          0.62%    0.97%    0.73%     96.1%     99.8%
JCCR⊕           0.29%    0.38%    0.32%     99.0%    100.0%
DepQuant⊕       2.09%    1.74%    2.20%     99.1%    100.0%
DBF⊗            0.43%    0.71%    0.20%    100.8%     85.0%
SAO⊗            0.06%    0.15%    0.02%     99.9%     98.1%
ALF+CCALF⊗      5.89%    6.09%    0.66%     95.1%     90.0%
LMCS⊗           0.09%   -1.01%    0.85%     97.6%    100.0%
AFFINE★         2.37%    2.32%    2.61%     80.8%     97.0%
TMVP★           1.29%    1.56%    1.43%     99.4%    100.0%
SbTMVP★         0.35%    0.53%    0.41%    101.1%     99.0%
AMVR★           1.16%    0.87%    1.16%     82.6%    100.0%
MMVD★           0.28%    0.19%    0.23%     90.3%    100.0%
SMVD★           0.21%    0.19%    0.17%     96.2%    100.0%
BCW•            0.25%    0.17%    0.15%     93.7%     98.0%
GPM•            0.63%    0.73%    0.63%     95.1%    100.0%
CIIP•           0.11%    0.17%    0.16%     96.7%    100.0%
DMVR◦           0.84%    0.93%    1.00%     99.9%     95.8%
BDOF◦           0.67%    1.29%    0.74%     96.7%     98.0%
PROF◦           0.35%    0.41%    0.31%     98.3%     99.0%
† Intra tools, ⊕ transform tools, ⊗ in-loop filter tools, ★ inter motion tools, • prediction blending tools, ◦ prediction enhancement tools.

Tools are grouped per category. When all these tools are disabled, BD-rate variations are 38.75% for PSNRYUV, 30.22% for VMAF and 27.67% for MS-SSIM, with encoding and decoding runtimes of 19% and 49% compared to the setting with all tools enabled (anchor). This demonstrates the substantial coding performance brought by the new VVC tools.

Figure 4a depicts the PSNRYUV BD-rate variations per main tool category, and relative runtime ratios are depicted in Figure 4b and Figure 4c for the encoder and decoder, respectively. All tool categories turn out to make an important contribution to the overall coding gain. The most computationally demanding parts of the decoder are inter coding (inter motion, prediction blending and prediction enhancement) and loop filtering. The intra and transform categories have a negligible impact on the VTM decoding time. At the VTM encoding side, the most demanding part is the inter part, which represents about two thirds of the encoding time. Another substantial part of the encoding runtime is consumed by the partitioning, which is not assessed in this paper. The decoding runtime impact of these tools remains limited in this particular RA coding configuration [31].

V. REAL-TIME IMPLEMENTATIONS

As of now, VVC benefits from both industrial and open-source implementations, contributing to the emergence of an end-to-end value chain. For example, manufacturers, universities and research institutes have announced the availability of fast VVC implementations [32]. Among existing solutions, the INSA Rennes real-time OpenVVC decoder, the Fraunhofer Heinrich Hertz Institute VVdeC decoder [33] and VVenC encoder [34], and the ATEME TitanLive encoding platform have been developed in recent months.
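The BD-rate computation and the luma/chroma weighting described in Section IV can be sketched as follows. This is a simplified illustration, not the reference implementation: it interpolates log-rate piecewise-linearly over the shared PSNR interval instead of using the cubic fit of VCEG-M33 [28], and all function names are illustrative.

```python
import math

def bd_rate(psnr_ref, kbps_ref, psnr_test, kbps_test, samples=1000):
    """Approximate BD-rate (%) between two rate-distortion curves.

    Averages the gap between the two curves in the (PSNR, log10(rate))
    plane over the shared PSNR interval, using piecewise-linear
    interpolation (a simplification of the cubic fit in VCEG-M33).
    """
    def log_curve(psnr, kbps):
        pts = sorted(zip(psnr, (math.log10(r) for r in kbps)))
        def interp(x):
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= x <= x1:
                    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
            raise ValueError("PSNR value outside curve support")
        return interp

    f_ref = log_curve(psnr_ref, kbps_ref)
    f_test = log_curve(psnr_test, kbps_test)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    xs = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    avg_log_diff = sum(f_test(x) - f_ref(x) for x in xs) / len(xs)
    # Positive result: the test curve needs more rate for the same quality.
    return (10 ** avg_log_diff - 1) * 100

def weighted_bd_rate_yuv(bd_y, bd_u, bd_v):
    """Combine per-component PSNR BD-rates with the 10/12, 1/12, 1/12 weights."""
    return (10 * bd_y + bd_u + bd_v) / 12.0
```

For instance, a test curve that spends exactly 10% more rate than the anchor at every quality point yields a BD-rate of 10%.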

Fig. 4: Relative contributions of the main VVC tools’ categories in RA coding configuration.
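As a rough illustration of how the per-category view of Fig. 4a relates to Table I, the per-tool PSNRYUV BD-rates can be grouped by category. This is only a sketch: BD-rates measured one disabled tool at a time are not strictly additive, and only the intra, transform and in-loop rows of Table I are reproduced here.

```python
# Per-tool PSNRYUV BD-rate losses (%) when each tool is disabled (from Table I).
TOOLS = {
    "MIP": ("intra", 0.33), "MRL": ("intra", 0.12),
    "LMchroma": ("intra", 3.80), "ISP": ("intra", 0.36),
    "MTS": ("transform", 0.88), "SBT": ("transform", 0.35),
    "LFNST": ("transform", 0.62), "JCCR": ("transform", 0.29),
    "DepQuant": ("transform", 2.09),
    "DBF": ("in-loop", 0.43), "SAO": ("in-loop", 0.06),
    "ALF+CCALF": ("in-loop", 5.89), "LMCS": ("in-loop", 0.09),
}

def bd_rate_by_category(tools):
    """Sum the per-tool BD-rates per category (a rough, non-additive aggregate)."""
    totals = {}
    for category, bd in tools.values():
        totals[category] = totals.get(category, 0.0) + bd
    return totals
```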

A. Real-Time VVC Decoding with OpenVVC

OpenVVC is the world-first real-time software VVC decoder developed from scratch in the C programming language. The OpenVVC project is intended to provide consumers with an open-source library enabling UHD real-time VVC decoding capability (https://github.com/OpenVVC/OpenVVC). The VVC main profile tools are supported by the OpenVVC decoder, and the most complex operations are optimized with single instruction multiple data (SIMD) instructions for both Intel x86 and ARM Neon platforms. The decoder is parallel-friendly, supporting high-level parallelism such as parallel decoding of slices, tiles, wavefronts and frames (frame-based parallelism). These leverage multi-core platforms to further speed up the decoding process and reduce the decoded frame latency. Finally, the decoder is compatible with well-known video players such as FFplay, GPAC and VLC.

B. Real-Time VVC Decoding with VVdeC

It is important to consider that all the improvements added to VVC in order to achieve better coding performance come with a cost in terms of computational complexity, which has been determined to be around 10x in the encoder and 2x in the decoder compared to HEVC. The Fraunhofer Heinrich Hertz Institute has been working on the Versatile Video deCoder (VVdeC) [35] since October 2020, aiming to provide a publicly available optimized VVC software decoder (https://github.com/fraunhoferhhi/vvdec). This implementation supports multicore architectures and is optimized for Intel-based platforms. The decoder takes advantage of functional (multi-threading) and data (SIMD instructions) parallelization to achieve maximum performance. Compared to the VTM decoder, this optimized implementation has reached a 50% to 90% decoding time reduction on an x86 platform [36].

C. Real-Time VVC Encoding with VVenC

VVenC [34] is an open-source fast implementation of the VVC encoder (https://github.com/fraunhoferhhi/vvenc) developed by the Fraunhofer Heinrich Hertz Institute. VVenC is developed in the C++ programming language and includes low-level optimizations through SIMD instructions targeting x86 platforms. The encoder is parallel-friendly, enabling a set of frames to be processed in parallel on multi-core platforms. Five presets are defined by the encoder, offering a wide range of trade-offs between coding efficiency (quality) and speed (complexity). Perceptual optimization is also integrated to improve subjective video quality, based on the XPSNR visual model [37]. Finally, frame-level single-pass and two-pass rate control with variable bit-rate (VBR) encoding are supported by the encoder.

D. Real-Time VVC Encoding and Packaging with TitanLive

The ATEME TitanLive solution provides a software-based implementation of a wide variety of standards for audio/video coding, packaging and transport. This solution is currently used worldwide for broadcast and over-the-top (OTT) head-end deployments. In order to support VVC, a number of components were upgraded.

As further described in [38], VVC and HEVC present some structural similarities, making an upgrade from HEVC to VVC feasible in a cost-effective manner. In order to do so, the VVC syntax has been implemented with support for the tools already implemented in HEVC, disabling the other ones in the APS. Then, the HEVC tools have been upgraded to comply with the VVC specification, and some tools offering a good complexity-vs-gains trade-off were implemented. Relying on the same core coding engine made it possible to leverage the existing optimized functions (assembly, intrinsics) to achieve VVC real-time encoding with interesting gains over HEVC, from 10% to 15% depending on the video content. The packager has been upgraded as well to support VVC encapsulation into MPEG2-TS and ISOBMFF. Since the final draft international standard (FDIS) has not been issued yet for the MPEG2-TS and ISOBMFF bindings, some draft amendment versions (DAM) were implemented [39], [40], assuming that these versions are close to the FDIS ones.

VI. FIRST COMMERCIAL TRIALS

Using the previously described tools, some world-first VVC field trials [41], [42] have been carried out, as described in the following subsections.

A. World-First OTA Broadcasting with VVC

1) Overview: The trial depicted in Figure 5a took place in June 2020 and is the result of a collaboration between the following entities:
• ATEME provided the encoding and packaging units.
• SES provided the satellite transponder used for the experiment as well as the gateways needed at the transmission and reception sides.
• VideoLabs provided the media player (VLC), including demuxing and playback.
• IETR provided the VVC real-time decoding library used by the VLC player.
As illustrated, the UHD source provided by The Explorers is encoded with VVC and encapsulated in MPEG-TS using the ATEME video processing platform. The streams are modulated using DVB-S2 and broadcast by SES on an Astra 2E transponder, covering the whole of Europe. The signal is demodulated and forwarded over IP to a VLC player that displays the video thanks to the real-time OpenVVC decoder developed by IETR.

2) VVC Encoding and Encapsulation: The ATEME encoding engine used in this experiment followed the VVC draft specification [43] and produced a bitstream decodable by the VTM software (tag version 6.1). The 2160p-10b-SDR video input was encoded offline using an RA GOP structure with a 1 sec RAP period at a 20 Mbps constant bit-rate (CBR), with deltaQP enabled. The produced elementary stream was encapsulated in MPEG-TS following the draft specification incorporating the VVC amendments [44]. Hence, a stream embedding stream type 0x32 with video descriptor 57 was generated, ready to be delivered over existing broadcast infrastructure.

3) Satellite Transmission: The MPEG-TS provided by ATEME was rate-adapted and played out on Transponder 2.014 at the SES prime UK position 28.2° East. The transponder used for this transmission had previously been used for SES 8K test transmissions, and therefore the parameters and link budget were slightly unusual for transmissions at this position (Freq: 11.973, Pol: Vertical, SR 31 MS/s, DVB-S2, 8PSK 9/10). The uplink was done in Betzdorf, Luxembourg, the location of the SES headquarters. Reception was done using a DVB-S2-to-IP gateway (in our instance the Kathrein EXIP 4124, which is a SAT>IP server). The gateway was statically tuned to the corresponding transponder and set up to forward the relevant PIDs of the received TS, encapsulated in RTP multicast (RFC 2250), to the local network. On that network, a powerful PC was used to decode the 4K-UHD stream in real time and display it on several TVs. The PC framerate was set to the framerate of the source content.

4) Player and Decoder: The OpenVVC decoder described in Section V-A was used as the decoding library for this trial. The VLC player wraps both OpenVVC and the libavformat demuxer. The libavformat demuxer is modified to handle MPEG-TS streams carrying VVC, and the extracted NALUs are processed by OpenVVC. The synchronization and frame presentation are managed by the VLC player.

B. World-First OTT Delivery with VVC

1) Overview: The trial depicted in Figure 5b took place in June 2020 and is the result of a collaboration between the following entities:
• ATEME provided the encoding unit.
• Telecom Paris provided the DASH packager (MP4Box) and the player (MP4Client) from GPAC.
• IETR provided the VVC real-time decoding library used by MP4Client.
As illustrated, the UHD source provided by The Explorers is encoded with VVC and formatted into ISOBMFF mp4 files using the ATEME video processing platform. The mp4 files are encapsulated into DASH with multiple representations using the Telecom Paris MP4Box software. The generated DASH content is pushed to an origin server publicly accessible on the internet. The MP4Client demultiplexes and plays the content thanks to the real-time OpenVVC decoder developed by IETR.

2) VVC Encoding: The ATEME encoding engine used in this experiment followed the VVC draft specification [43] and produced a bitstream decodable by the VTM software (tag version 6.1). The video input was encoded offline using an RA closed-GOP structure with a 1 sec RAP period, producing the following bitrate ladder:
• 540p @ 1.6 Mbps
• 720p @ 3.4 Mbps
• 1080p @ 5.8 Mbps
• 2160p @ 16.8 Mbps

3) Client and Player: Support for VVC transport was added to GPAC [45] as follows:
• ISOBMFF demultiplexing for 'vvc1' and 'vvi1' sample description entries has been added.
• ISOBMFF multiplexing for 'vvc1' and 'vvi1' sample description entries has been added.
• MPEG-2 TS demultiplexing for VVC (stream type 0x32) has been added.
• Inspection of files containing VVC tracks has been added (partial support, no bitstream parsing of VVC has been added yet).
The tools used in this demo were based on GPAC 1.0. In this version, the DASH segmenter is independent of the media packaging format and did not require any modification. It will however require a further update once the "codecs" MIME parameter for VVC has been defined; for the purpose of the demo, the "codecs" MIME parameter for VVC is set to "vvc1" only. Since the demo relies on GPAC 1.0, the packager can output to both MPEG-DASH and HLS formats.
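To make the role of the bitrate ladder above concrete, a client-side selection policy could look like the following sketch. The 0.8 headroom factor and the selection logic are illustrative assumptions, not the behavior described for the trial players, and the label of the 5.8 Mbps rung is assumed.

```python
# Bitrate ladder from the OTT trial, as (resolution label, Mbps) pairs.
LADDER = [("540p", 1.6), ("720p", 3.4), ("1080p", 5.8), ("2160p", 16.8)]

def select_representation(throughput_mbps, ladder=LADDER, headroom=0.8):
    """Pick the highest-bitrate rung that fits within the measured
    throughput scaled by a safety headroom; fall back to the lowest rung."""
    budget = throughput_mbps * headroom
    viable = [rep for rep in ladder if rep[1] <= budget]
    if viable:
        return max(viable, key=lambda rep: rep[1])
    return min(ladder, key=lambda rep: rep[1])
```

For example, a measured throughput of 10 Mbps leaves a budget of 8 Mbps, so the 5.8 Mbps rung is chosen rather than the 16.8 Mbps one.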

(a) End-to-end over the air video broadcasting with VVC.

(b) End-to-end OTT video delivery with VVC.

(c) Live and Low-Latency 4K video OTT channel using VVC.

Fig. 5: Illustration of different end-to-end demonstrations with the VVC standard on different transmission mediums.

Similarly, the DASH access engine in GPAC is independent of the media packaging format and did not require any modification. The OpenVVC decoder is integrated through a patched version of . The playback chain has been tested under Windows, and Mac OSX platforms. Since the experiment, VVC support has been merged into the master branch of GPAC's main code repository (https://github.com/gpac/gpac/tree/master).

C. World-First 4K Live OTT Channel with VVC

The trial depicted in Figure 5c took place in September 2020 and is the result of a collaboration between the following entities:
• ATEME provided the encoding unit.
• Telecom Paris provided the DASH packager (MP4Box) and the player (MP4Client) from GPAC.
• IETR provided the VVC real-time decoding library used by MP4Client.
• Akamai provided the content delivery network (CDN) infrastructure supporting HTTP chunked transfer encoding to enable low latency.
In this experiment, an end-to-end live 4K TV channel was demonstrated over a period of one month. The input video, provided by The Explorers, was timestamped with UTC time and live-encoded using the TitanLive platform. The VVC encoding was carried out with low-latency CMAF packaging, issuing 100 ms chunks within 2000 ms segments, pushed to the Akamai CDN thanks to HTTP chunked transfer encoding. The GPAC player described in the previous section was used and highlighted the low-latency delivery when compared to the UTC time at the receiver side (typically measuring a 2 s glass-to-glass latency).

VII. CONCLUSION

In this paper, we have addressed several important aspects of the latest video coding standard, VVC, from market use-cases, coding tool descriptions, and per-tool coding efficiency and complexity assessments, to the description of the real-time implementations of VVC codecs used in early commercial trials. All these aspects are of prominent importance to ensure a wide adoption and a successful deployment of the VVC standard. The current status of the developed real-time VVC codecs and the demonstrated end-to-end VVC transmissions over broadcast and OTT communication mediums clearly show that the VVC technology is mature enough and ready for real deployment in consumer electronic products. Our prediction
is that VVC will be integrated in most of the Consumer Electronics devices in the near future.

REFERENCES

[1] U. Cisco, "Cisco annual internet report (2018–2023) white paper," 2020.
[2] N. Arulrajah and E. Marketing, "The world is in a state of flux. Is your website traffic, too?" March 2021.
[3] M. Efoui-Hess, "Climate crisis: The unsustainable use of online video. The practical case for digital sobriety," July 2019.
[4] "JCT-VC - Joint Collaborative Team on Video Coding, https://www.itu.int/en/itu-t/studygroups/2017-2020/16/pages/video/jctvc.aspx," 2010.
[5] G. J. Sullivan, J. M. Boyce, Y. Chen, J. Ohm, C. A. Segall, and A. Vetro, "Standardized extensions of high efficiency video coding (HEVC)," IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 1001–1016, December 2013.
[6] G. J. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard," Proceedings of the IEEE, vol. 93, no. 1, pp. 18–31, January 2005.
[7] T. K. Tan, R. Weerakkody, M. Mrak, N. Ramzan, V. Baroncini, J.-R. Ohm, and G. J. Sullivan, "Video quality evaluation methodology and verification testing of HEVC compression performance," IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 1, pp. 76–90, 2016.
[8] B. Bross, J. Chen, J. R. Ohm, G. J. Sullivan, and Y.-K. Wang, "Developments in international video coding standardization after AVC, with an overview of versatile video coding (VVC)," Proceedings of the IEEE, pp. 1–31, January 2021.
[9] N. Sidaty, W. Hamidouche, O. Déforges, P. Philippe, and J. Fournier, "Compression performance of the versatile video coding: HD and UHD visual quality monitoring," in 2019 Picture Coding Symposium (PCS), 2019, pp. 1–5.
[10] A. Wieckowski, G. Hege, C. Bartnik, C. Lehmann, C. Stoffers, B. Bross, and D. Marpe, "Towards a live software decoder implementation for the upcoming versatile video coding (VVC) codec," IEEE International Conference on Image Processing (ICIP), pp. 3124–3128, October 2020.
[11] F. Bossen, K. Sühring, A. Wieckowski, and S. Liu, "VVC complexity and software implementation analysis," IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–1, 2021.
[12] "TR26.955: Video codec characteristics for 5G-based services and applications," 3GPP.
[13] J. Pfaff, A. Filippov, S. Liu, X. Zhao, J. Chen, S. De-Luxán-Hernández, T. Wiegand, V. Rufitskiy, A. Ramasubramonian, and G. Van der Auwera, "Intra prediction and mode coding in VVC," IEEE Transactions on Circuits and Systems for Video Technology, 2021.
[14] Y.-W. Huang, J. An, H. Huang, X. Li, S. Hsiang, K. Zhang, H. Gao, J. Ma, and O. Chubach, "Block partitioning structure in the VVC standard," IEEE Transactions on Circuits and Systems for Video Technology, 2021.
[15] X. Zhang, S. Liu, and S. Lei, "Intra mode coding in HEVC standard," in 2012 Visual Communications and Image Processing. IEEE, 2012, pp. 1–6.
[16] L. Zhao, X. Zhao, S. Liu, X. Li, J. Lainema, G. Rath, F. Urban, and F. Racapé, "Wide angular intra prediction for versatile video coding," in 2019 Data Compression Conference (DCC). IEEE, 2019, pp. 53–62.
[17] J. Li, B. Li, J. Xu, and R. Xiong, "Intra prediction using multiple reference lines for video coding," in 2017 Data Compression Conference (DCC). IEEE, 2017, pp. 221–230.
[18] G. Van der Auwera, V. Seregin, A. Said, and M. Karczewicz, "Extension of simplified PDPC to diagonal intra modes," in Document JVET-J0069, San Diego, CA, USA, April 2018.
[19] S. De-Luxán-Hernández, V. George, J. Ma, T. Nguyen, H. Schwarz, D. Marpe, and T. Wiegand, "An intra subpartition coding mode for VVC," in 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019, pp. 1203–1207.
[20] J. Pfaff, P. Helle, P. Merkle, M. Schäfer, B. Stallenberger, T. Hinz, H. Schwarz, D. Marpe, and T. Wiegand, "Data-driven intra-prediction modes in the development of the versatile video coding standard," ITU Journal: ICT Discoveries, vol. 3, 2020.
[21] P. Hanhart and Y. He, "Modified CCLM downsampling filter for "type-2" content," in Document JVET-M0142, Marrakech, Morocco, January 2019.
[22] X. Zhang, C. Gisquet, E. François, F. Zou, and O. C. Au, "Chroma intra prediction based on inter-channel correlation for HEVC," IEEE Transactions on Image Processing, vol. 23, no. 1, pp. 274–286, 2013.
[23] X. Zhao, S.-H. Kim, Y. Zhao, H. E. Egilmez, M. Koo, S. Liu, J. Lainema, and M. Karczewicz, "Transform coding in the VVC standard," IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–1, 2021.
[24] X. Zhao, J. Chen, A. Said, V. Seregin, H. E. Egilmez, and M. Karczewicz, "NSST: Non-separable secondary transforms for next generation video coding," in 2016 Picture Coding Symposium (PCS), 2016, pp. 1–5.
[25] M. Koo, M. Salehifar, J. Lim, and S.-H. Kim, "Low frequency non-separable transform (LFNST)," in Picture Coding Symposium (PCS), November 2019.
[26] M. Karczewicz, N. Hu, J. Taquet, C.-Y. Chen, K. Misra, K. Andersson, P. Yin, T. Lu, E. François, and J. Chen, "VVC in-loop filters," IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–1, 2021.
[27] C.-Y. Tsai, C.-Y. Chen, T. Yamakage, I. S. Chong, Y.-W. Huang, C.-M. Fu, T. Itoh, T. Watanabe, T. Chujoh, M. Karczewicz, and S.-M. Lei, "Adaptive loop filtering for video coding," IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 934–945, 2013.
[28] G. Bjontegaard, "Calculation of average PSNR differences between RD curves," ITU-T SG16/Q6 VCEG 13th meeting, Austin, Texas, USA, Doc. VCEG-M33, April 2001.
[29] M. Orduna, C. Díaz, L. Muñoz, P. Pérez, I. Benito, and N. García, "Video multimethod assessment fusion (VMAF) on 360VR contents," IEEE Transactions on Consumer Electronics, vol. 66, no. 1, Feb. 2020.
[30] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multi-scale structural similarity for image quality assessment," in Proc. 37th IEEE Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, USA, Nov. 2003.
[31] E. François, M. Kerdranvat, R. Jullian, C. Chevance, P. de Lagrange, F. Urban, T. Poirier, and Y. Chen, "VVC per-tool performance evaluation compared to HEVC," IBC 2020, Amsterdam, NL, Sept. 2020.
[32] G.-J. Sullivan, "Deployment status of the VVC standard," in Document JVET-V0021, Online meeting, March 2021.
[33] A. Wieckowski, G. Hege, C. Bartnik, C. Lehmann, C. Stoffers, B. Bross, and D. Marpe, "Towards a live software decoder implementation for the upcoming versatile video coding (VVC) codec," in 2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 3124–3128.
[34] J. Brandenburg, A. Wieckowski, T. Hinz, A. Henkel, V. George, I. Zupancic, C. Stoffers, B. Bross, H. Schwarz, and D. Marpe, "Towards fast and efficient VVC encoding," in 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), 2020, pp. 1–6.
[35] A. Wieckowski and B. Bross, "VVdeC: Fraunhofer versatile video decoder, https://www.hhi.fraunhofer.de/fileadmin/departments/vca/mc/vvc/vvdec-v0.2-v1.," Heinrich Hertz Institute (HHI) White Paper, 2020.
[36] A. Wieckowski et al., "Towards a live software decoder implementation for the upcoming versatile video coding (VVC) codec," IEEE International Conference on Image Processing, pp. 3124–3128, 2020.
[37] C. R. Helmrich, M. Siekmann, S. Becker, S. Bosse, D. Marpe, and T. Wiegand, "XPSNR: A low-complexity extension of the perceptually weighted peak signal-to-noise ratio for high-resolution video quality assessment," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 2727–2731.
[38] T. Biatek, M. Abdoli, T. Guionnet, A. Nasrallah, and M. Raulet, "Future MPEG standards VVC and EVC: 8K broadcast enabler," in International Broadcasting Convention (IBC), September 2020.
[39] "Text of ISO/IEC 13818-1:2019 DAM 2 Carriage of VVC in MPEG-2 TS," MPEG document w19436, July 2020.
[40] "Text of ISO/IEC 14496-15:2019 DAM 2 Carriage of VVC and EVC in ISOBMFF," MPEG document w19454, July 2020.
[41] T. Biatek, M. Abdoli, T. Guionnet, M. Raulet, T. Wrede, J. Outters, T. Christophory, H. Bauzée-Luyssen, S. Latapie, J.-B. Kempf, P.-L. Cabarat, and W. Hamidouche, "End-to-end UHD satellite broadcast transmission using VVC," MPEG document m54377, 2020.
[42] T. Biatek, M. Abdoli, T. Guionnet, M. Raulet, J. L. Feuvre, P.-L. Cabarat, and W. Hamidouche, "End-to-end OTT streaming using DASH/VVC," MPEG document m54379, 2020.
[43] B. Bross, J. Chen, and S. Liu, "Versatile Video Coding (Draft 6)," JVET document Q2001, 2019.
[44] "ISO/IEC 13818-1:2019 CDAM2 Carriage of VVC in MPEG-2 TS," JVET document N19048, 2020.
[45] GPAC, "https://gpac.wp.imt.fr/," 2020.