Fast VP9-To-AV1 Transcoding Based on Block Partitioning Inheritance
Alex Borges, Daniel Palomino, Bruno Zatt, Marcelo Porto, Guilherme Correa
Video Technology Research Group (ViTech)
Graduate Program in Computing (PPGC), Federal University of Pelotas (UFPel), Brazil
{amborges, dpalomino, zatt, porto, gcorrea}@inf.ufpel.edu.br

Abstract— This paper proposes a fast VP9-to-AV1 video transcoding algorithm based on block partitioning inheritance. The proposed algorithm relies on the reuse of VP9 block partitioning during the AV1 re-encoding process. This way, the exhaustive search for the best block size option is avoided to save encoding time. The reuse of VP9 block partitioning is proposed based on a statistical analysis that shows the relation of block partitioning sizes between VP9 and AV1. The analysis demonstrates that there is a high probability that the AV1 encoding process chooses blocks of the same size as in the VP9 encoding. Experimental results show that the proposed algorithm is able to accelerate the VP9-to-AV1 transcoding process by 28% on average at the cost of only a 4% increase in BD-Rate when compared with the complete decoding and re-encoding process.

Keywords—AV1, VP9, transcoding, video coding

I. INTRODUCTION

In 2015, the Alliance for Open Media (AOMedia) consortium was created to develop modern royalty-free video coding formats for online applications, such as on-demand video transmission, videoconferences, and live streaming. Partially based on the VP9 [1], Daala [2] and Thor [3] codecs, AOMedia launched the AOMedia Video 1 (AV1) [4] format. Along with the specification, the libaom [5] reference software was released in 2018. Since then, many other fast AV1 codec versions have been developed and released by AOMedia members, such as the Scalable Video Technology for AV1 Encoder (SVT-AV1) [6] developed by Intel and Netflix, the CISCO-AV1 [7] developed by CISCO, and the rav1e [8] developed by XIPH.

One of the main goals of AV1 is to surpass the compression efficiency achieved by VP9 and replace it as the current state of the art among royalty-free codecs. To accomplish that, AV1 includes several new tools and features with much more efficient signal processing operations and frame partitioning structures in comparison to VP9. However, this efficiency is achieved at the cost of a considerable complexity increase in comparison to VP9. The authors in [9] and [10] show that the reference libaom codec requires an encoding time more than 100 times larger in comparison to VP9. Thus, time-saving strategies for AV1, especially those leading to small or no penalties in terms of compression efficiency, are currently essential to reduce this gap and enable the deployment of AV1 codecs.

Google, the owner and developer of VP9, is the main company that makes use of VP9 in its video services, such as the YouTube platform [11], one of the most popular free video platforms in the world. According to [12], more than 500 hours of video content are published on YouTube every minute. All these videos are stored in large data centers and require a huge amount of storage space. In [9], [10], and [4] the authors demonstrate that AV1 can achieve a compression efficiency gain of 20%, 18%, and 28%, respectively, when compared to VP9 (considering the same image quality). This corresponds to an average gain of 22% for AV1 over VP9, a saving of roughly one fifth in storage resources and in other costs involving video transmission services.

Video transcoding is the process that converts one video bitstream format to the same format with different configurations (homogeneous transcoding) or to another bitstream format (heterogeneous transcoding), as presented in Fig. 1. With the advent of AV1, converting legacy content from previous formats, such as VP9, becomes an essential task for service providers that intend to benefit from its compression efficiency. However, as the computational cost required by libaom is too high, speeding up the encoding process is important to allow fast transcoding without decreasing compression efficiency significantly.

VP9 and AV1 share several characteristics that can be harnessed during the transcoding process. Both VP9 and AV1 follow a block-based hybrid video coding process, as they divide the frame into smaller parts called blocks for prediction, transform, and quantization. Blocks can assume different sizes and shapes, as defined by the codec specification. To achieve the best compression efficiency, the encoder needs to find the best block size to use in each region of the video. Usually, this is performed by exhaustive search over all block size possibilities, which requires a huge computational effort. In [13], a method for inheriting the best block size from H.264/AVC to H.265/HEVC during the transcoding process is proposed. In [14], the H.265/HEVC Coding Unit depth information is used to accelerate the transcoding from H.265/HEVC to AV1. Although the methods proposed in [13] and [14] present good results and are based on block partitioning inheritance, they focus on the H.264/AVC and H.265/HEVC standards, which employ a very different set of block sizes, partitioning modes and coding tree structure in comparison to VP9. Thus, they cannot be directly employed to accelerate the VP9-to-AV1 transcoding process. To the best of the authors’ knowledge, there is no other work focusing on VP9-to-AV1 transcoding.

This paper proposes a fast VP9-to-AV1 transcoding process based on Block Partitioning Inheritance. The proposed solution saves time by reusing the VP9 block partitioning direction to filter out AV1 partitioning possibilities during the re-encoding process. The idea is based on a statistical analysis performed over a set of VP9 and AV1 bitstreams, which allowed identifying partition modes rarely used under certain circumstances.

[Figure: bitstream (*.vp9) → VP9 decoder → decoded video → AV1 coder → bitstream (*.av1)]
Fig. 1. Tandem VP9-to-AV1 transcoder.

978-9-0827-9705-3 555 EUSIPCO 2020

II. VP9/AV1 PARTITIONING CORRELATION ANALYSIS

A block can assume square or rectangular shapes organized in some configurations, called partition types. In VP9 there are three partition types: square (named None), vertical (Vert) and horizontal (Horz), as shown in Fig. 2. In AV1, blocks can assume nine partition types based on the three directions observed in VP9. In addition, both codecs allow a Split partition type, which recursively subdivides the current block into four square blocks. This process follows a coding block tree structure, as shown in Fig. 3. It is likely that the same block partitioning will be used in both VP9 and AV1 codecs, since the same video region is being encoded. Considering this, we performed a correlation analysis between the partitioning chosen by each codec to use it as a basis for the proposed fast transcoding algorithm.

Fig. 2. Block partitioning allowed in AV1 and VP9 (highlighted).

[Figure: tree levels 128x128, 64x64, 32x32, 16x16, 8x8, 4x4; panels (a) and (b)]
Fig. 3. An AV1 superblock partitioned into blocks: (a) block view, (b) tree view. Gray blocks represent Split partitioning mode.

A. Experimental Setup

The Spatial Information and Temporal Information (SI-TI) analysis [15] was performed over all the test sequences available in [16], section “objective-2-slow”, to identify those with the most heterogeneous characteristics, so that a diverse set of videos could be used for testing. The video sequences selected for the statistical analysis were Blue Sky, BQ Highway, Dirt Bike, Guitar HDR Amazon, Netflix Dinner Scene, Netflix Food Market 2, Netflix Tunnel Flag, and Water HDR Amazon, as available in [17]. To perform the experiments, 60 frames of each video sequence were encoded.

The reference codec software for both VP9 and AV1 was used in the experiments. For VP9, the libvpx [18] codec, version 1.8.2 (hash code 50d1a4), was used. For AV1, the libaom [5] codec, version 1.0.0 (hash code db8f27), was used. The reference software implementations were chosen because they represent the most complete versions of the encoder specifications, including all the available modes and partitioning possibilities allowed in both formats.

IETF-NETVC-T [16] is the documentation that defines the test configurations used for both video codecs. Following the document, the High Latency CQP configuration was used in the experiments and the Constrained Quality (CQ) parameter was set to values 20, 32, 43, and 55. All experiments were executed sequentially on the same workstation (Intel [email protected] GHz processor, 8 GB RAM, Ubuntu OS), in terminal mode.

The CQ parameter was set to 20 for quantization in the VP9 encoder, aiming at transcoding from the best quality available in the recommended settings. For AV1, CQ values 20, 32, 43, and 55 were used, as defined in [16]. As AV1 introduces new partitioning possibilities in both the horizontal and vertical directions, the occurrence rates of the modes belonging to each direction were summed up in the analysis. Thus, modes 2, 3, 4, and 8 in Fig. 2 are considered Horz modes, whereas modes 1, 5, 6, and 7 are considered Vert modes.

B. Correlation Analysis

For each block in a VP9-encoded video, the same region in the AV1-encoded video was observed. For that, a label was attributed to each 4×4 area in the frame, indicating which block size and partition mode were chosen during the

TABLE I. CORRELATION ANALYSIS BETWEEN PARTITION TYPES CHOSEN BY VP9 AND AV1 (CQ 20).
VP9 64×64 32×32 16×16 8×8
AV1 None Horz Vert None Horz Vert None Horz Vert None Horz Vert
None 39.18 25.16 10.92 4.56 1.86 1.72 1.61