SIP (2020), vol. 9, e6, page 1 of 15 © The Authors, 2020. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited. doi:10.1017/ATSIP.2020.2

original paper

An Overview of Coding Tools in AV1: the First Video Codec from the Alliance for Open Media

Yue Chen,1 Debargha Mukherjee,1 Jingning Han,1 Adrian Grange,1 Yaowu Xu,1 Sarah Parker,1 Cheng Chen,1 Hui Su,1 Urvang Joshi,1 Ching-Han Chiang,1 Yunqing Wang,1 Paul Wilkins,1 Jim Bankoski,1 Luc Trudeau,2 Nathan Egge,2 Jean-Marc Valin,3* Thomas Davies,4 Steinar Midtskogen,4 Andrey Norkin,5 Peter de Rivaz6 and Zoe Liu7**

1 Google, USA
2 Mozilla, USA
3 Amazon, USA
4 Cisco, UK and Norway
5 Netflix, USA
6 Argon Design, UK
7 Visionular, USA
*Work performed while with Mozilla.
**Work performed while with Google.
Corresponding authors: Y. Chen and D. Mukherjee

In 2018, the Alliance for Open Media (AOMedia) finalized its first video compression format, AV1, which was jointly developed by an industry consortium of leading video technology companies. The main goal of AV1 is to provide an open source and royalty-free video coding format that substantially outperforms state-of-the-art codecs available on the market in compression efficiency, while retaining practical decoding complexity and being optimized for hardware feasibility and scalability on modern devices. To give detailed insights into how the targeted performance and feasibility are realized, this paper provides a technical overview of key coding techniques in AV1. In addition, the coding performance gains are validated by video compression tests performed with the libaom AV1 encoder against the libvpx VP9 encoder. A preliminary comparison with two leading HEVC encoders, x265 and HM, and the reference software of VVC is also conducted on AOM's common test set and an open 4k set.

Keywords: Video compression, Open-source video coding, AV1, Alliance for Open Media

Received 11 July 2019; Revised 19 December 2019

I. INTRODUCTION

Over the last decade, web-based video applications have become prevalent, with modern devices and network infrastructure driving rapid growth in the consumption of high-resolution, high-quality content. Therefore, predominant bandwidth consumers such as video-on-demand (VoD), live streaming, and conversational video, along with emerging new applications including virtual reality and cloud gaming that critically rely on high resolution and low latency, are imposing severe challenges on delivery infrastructure and hence creating an even stronger need for high efficiency video compression technology.

It is widely acknowledged that the success of web applications is enabled by open, fast iterating, and freely implementable foundation technologies, for example, HTML, web browsers (Firefox, Chrome, etc.), and operating systems like Android. Therefore, in an effort to create an open video format on par with the leading commercial choices, in mid 2013, Google launched and deployed the VP9 video codec [1]. As a codec for production usage, libvpx-VP9 considerably outperforms x264 [2], a popular open source encoder for the most widely used format H.264/AVC [3], while also being a strong competitor to x265, the open source encoder of the state-of-the-art royalty-bearing format H.265/HEVC [4], on HD content [5]. Since the release, YouTube has progressively pushed VP9 adoption; as of now, a good amount of YouTube content is streamed in VP9 format whenever the client supports it.

However, as the demand for high efficiency video applications rose and diversified, it soon became imperative to continue the advances in compression performance, as well as to incorporate designs facilitating efficient streaming in scenarios beyond typical VoD. To that end, in late 2015, a few leading video-on-demand providers, along with firms in the semiconductor and web browser industries, co-founded the Alliance for Open Media (AOMedia) [6], which is now a consortium of more than 40 leading hi-tech companies, to work jointly toward a next-generation open video coding format called AV1.

The focus of AV1 development includes, but is not limited to, achieving: consistent high-quality real-time video delivery, scalability to modern devices at various bandwidths, tractable computational footprint, optimization for hardware, and flexibility for both commercial and non-commercial content. With VP9 tools and enhancements as the foundation, the codec was first initialized by solidifying promising tools from the separate VP10, Daala, and Thor codecs that founding members had worked on. Coding tools and features were proposed, tested, discussed, and iterated in AOMedia's codec, hardware, and testing workgroups, and were at the same time reviewed to work toward the goal of being royalty-free. The AV1 video compression format was finalized at the end of 2018, incorporating a variety of new compression tools, high-level syntax, and parallelization features designed for specific use cases.

As has been the case for other open-source projects, the development of the AV1 reference software, libaom, is conducted openly in a public repository. The reference encoder and decoder can be built from downloaded source code [7] by following the guide [8]. Since the soft freeze of coding tools in early 2018, libaom has made significant progress in terms of productization-readiness, through a radical acceleration mainly achieved via machine learning powered search space pruning and early termination, and through extra bitrate savings via adaptive quantization and improved future reference frame generation/structure. Besides libaom, there are now other AV1 decoder and encoder implementations available or to be released soon, designed for various goals and use cases. The royalty-free AV1 decoder dav1d by the VideoLAN and FFmpeg communities [9], which targets smooth playback with low CPU utilization, has now been enabled by default in the Firefox desktop version and will potentially also be included in Chrome soon. In the meantime, encoders for a variety of productization purposes are equally important to AV1's ecosystem. For example, the SVT-AV1 (by Intel and Netflix), rav1e (by Mozilla and Xiph.org), and Aurora (by Visionular) AV1 encoders are emerging AV1 encoding solutions focusing on one or more specific goals such as high performance, real-time encoding, on-demand content, immersive/interactive content, perceptual quality, etc.

Due to the increasing interest in AV1 performance, many efforts [10–17] have been made to compare the performance of libaom-AV1 with mainstream encoders of other formats. The conclusions of these works differ, and some of them do not list key libaom encoder configurations. To address the community's concerns about the contradictory conclusions, we would like to point out some issues that can be spotted in the provided information. References [15–17] claim that libaom-AV1 performs better than x265 under the PSNR metric. Reference [16] presents over −30% BDRates for HD (≥720p) content and −17% overall BDRate, while [15] shows a smaller gain of 15%, possibly because of the quality loss introduced by [...] claims much worse AV1 performance. We would like to suggest solutions to a few common issues of libaom configuration in the comparison between libaom, HEVC encoders, and VTM (the reference software of the upcoming format VVC [18]). Firstly, when random access mode is used for HEVC and VVC codecs, the counterpart mode of libaom is two-pass coding with non-zero lag-in-frames (19 is recommended), rather than 1-pass mode with zero lag-in-frames, which disables all bi-directional prediction tools, as in Refs. [10–12]. Secondly, as constant quality mode is usually used for HEVC and VVC codecs in the comparison, to match the setting, libaom needs to use that mode as well by specifying end-usage=q and passing in the QP (cq-level), rather than either using libaom in vbr mode to match the rates produced by other codecs [13], or manually restraining libaom's QP by setting min-q and max-q [14]. Thirdly, in tests enforcing a 1s intra period, as the default configurations of HM and VTM use an open-loop GOP structure, forward key frames need to be manually enabled for libaom to match it, by specifying enable-fwd-kf=1; otherwise [10–12,14], libaom encodes in a closed GOP structure, losing the benefit of cross-GOP referencing at every intra frame. We would like to note that we appreciate all of this work, which greatly encourages community discussion and pushes AOM developers to produce better documentation on how to use AV1 encoders.

In this paper, we present the core coding tools in AV1 that contribute to the majority of the 30% reduction in average bitrate compared with the most performant libvpx VP9 encoder at the same quality. A preliminary compression performance comparison between the libaom AV1 encoder and four popular encoders, including the libvpx VP9 encoder, two widely recognized HEVC encoders (x265 and the HM reference software), as well as VTM, is conducted on test sets (AOM's common test set and an open 4k set) covering various resolutions and content types under conditions that approximate common VoD encoding configurations.

II. AV1 CODING TECHNIQUES

A) Coding block partition

VP9 uses a four-way partition tree starting from the 64 × 64 level down to the 4 × 4 level, with some additional restrictions for blocks below 8 × 8, where within an 8 × 8 block all the sub-blocks should have the same reference frame, as shown in the top half of Fig. 1, so as to ensure that the chroma component can be processed in units of at least 4 × 4. Note that partitions designated as "R" are recursive, in that the same partition tree is repeated at a lower scale until the lowest 4 × 4 level is reached.
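The recursive structure of the VP9 partition tree described above can be illustrated with a minimal Python sketch. It is not taken from libvpx or libaom: the function names are hypothetical, and the four labels (NONE, HORZ, VERT, SPLIT) are assumed names for the four choices, with SPLIT playing the role of the recursive "R" partition. The sketch looks only at block shapes and ignores the reference-frame restriction for blocks below 8 × 8.

```python
def vp9_partition_choices(size):
    """Four ways to partition a square size x size block, as lists of (width, height).

    The labels are illustrative; SPLIT corresponds to the recursive "R" partition.
    """
    half = size // 2
    return {
        "NONE": [(size, size)],       # keep the block whole
        "HORZ": [(size, half)] * 2,   # two stacked halves
        "VERT": [(half, size)] * 2,   # two side-by-side halves
        "SPLIT": [(half, half)] * 4,  # four quadrants, each partitioned again
    }


def count_partitionings(size=64, min_size=4):
    """Count distinct block layouts reachable from a size x size block.

    A min_size block cannot be partitioned further. Any larger block has three
    non-recursive choices plus SPLIT, where each of the four quadrants
    independently repeats the same tree one level down.
    """
    if size == min_size:
        return 1
    per_quadrant = count_partitionings(size // 2, min_size)
    return 3 + per_quadrant ** 4


if __name__ == "__main__":
    print(vp9_partition_choices(8)["HORZ"])  # [(8, 4), (8, 4)]
    print(count_partitionings(8))            # 4
    print(count_partitionings(16))           # 259
```

The count grows very quickly with block size, which is one way to see why an encoder cannot search the tree exhaustively and needs pruning and early termination.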