An In-Depth Review on the AV1 Intra Prediction Marcel M

Total Page:16

File Type:pdf, Size:1020Kb

An In-Depth Review on the AV1 Intra Prediction Marcel M An In-depth Review on the AV1 Intra Prediction Marcel M. Corrêa (Ph.D. student), Daniel M. Palomino, Guilherme R. Corrêa (Co-advisors), Luciano V. Agostini (Advisor) Video Technology Research Group (ViTech) Programa de Pós-Graduação em Computação (PPGC) Universidade Federal de Pelotas (UFPel) {mmcorrea, dpalomino, gcorrea, agostini}@inf.ufpel.edu.br Abstract. The AOMedia Video 1 (AV1) is an open-source and royalty-free video coding format, which was developed by the Alliance for Open Media (AOM) industry consortium and released in June 2018. The main goal of AV1 development was to achieve substantial compression gain over state-of-the-art codecs, while keeping a practical decoding complexity, hardware feasibility and its open and free status. This monograph provides a brief technical overview on the AV1 block partitioning and transform coding, and an in-depth technical review of its intra prediction tools, along with a brief presentation of the VP9 and HEVC intra prediction tools. 1. Introduction The continuous growth in consumption of digital videos over the internet, including services such as video-on-demand, video sharing on social networks, as well as high definition video conferences (even on mobile devices), are reaching the limits of available bandwidth of the telecommunication infrastructures. To address this situation, standardization bodies have been developing video coding standards during the last years. The most recent ISO/IEC and ITU-T standard is the High Efficiency Video Coding (HEVC) [Sullivan 2012; ISO/IEC 2013], which has many of its techniques protected by patents. A key factor in the success of the internet is that its core technologies are open and freely implementable, and digital videos certainly are a central part of the internet experience nowadays. According to Cisco Systems, digital videos are consuming the majority of all the internet traffic and are expected to consume even more in the near future [Cisco Systems 2017], which makes open-source and royalty-free video formats highly desirable. Based on that, Google LLC purchased the company On2 Technologies, which originally developed the VP8 video format [Bankoski 2011], and released the VP8 codec code under a BSD-like license and the bitstream specification under an irrevocable free patent license. This episode marked the start of the WebM project, as an effort to develop open media formats, later resulting in the release of the VP9 video format [Mukherjee 2013; Grange 2013]. Although Google LLC has been using VP9 successfully on YouTube, it is considered to be not enough to handle new tendencies, such as Ultra High Definition (UHD) 4K resolution which is four times larger than 1080p, UHD 8K resolution which is four times larger than 4K, High Dynamic Range (HDR) which has 25% to 50% more data than Standard Dynamic Range (SDR) and, also, higher frame rates. These tendencies result in an expressive increase of data to be compressed. As Google LLC was working on a successor for VP9, called VP10 [Mukherjee 2015], and Cisco Systems and Mozilla Corporation were working on their own video formats, respectively called Thor [Bjøntegaard 2016] and Daala [Valin 2016], an industry consortium, called Alliance for Open Media (AOM), was formed. This alliance was made mainly because the paradigm of developing the technology first and solving the licensing deals later was not working for the majority of the companies involved, as Rosenberg (2015) explained: Unfortunately, the patent licensing situation for H.265 [HEVC] has recently taken a turn for the worse. Two distinct patent licensing pools have formed so far, and many license holders are not represented in either. There is just one license pool for H.264. The total costs to license H.265 from these two pools is up to sixteen times more expensive than H.264, per unit. H.264 had an upper bound on yearly licensing costs, whereas H.265 has no such upper limit. […] We [Cisco Systems] also hired patent lawyers and consultants familiar with this technology area. We created a new codec [Thor] development process which would allow us to work through the long list of patents in this space, and continually evolve our codec to work around or avoid those patents. As result, the AOMedia Video 1 (AV1) [Chen 2018; Rivaz 2018] was released in June 2018 as the VP9 successor, to be a next-generation, open-source and royalty-free video format developed based on many elements of VP10, Thor and Daala. This monograph provides a brief technical overview on the AV1 block partitioning and transform coding, and an in-depth technical review of its intra prediction tools, along with a brief presentation of the VP9 and HEVC intra prediction tools. 2. AV1 Block Structure and Transform Coding In AV1, a video frame is partitioned into superblocks (SBs) of size 128x128 or 64x64 pixels, according to a bitstream flag. Then, SBs are predicted in raster order within the frame. To deliver a locally optimal prediction for each SB, the encoder can further divide each SB using a 10-way partition tree structure [Chen 2018], as illustrated in Figure 1. In this figure, blue blocks are final partition modes, but all four partitions of the white block can be recursively divided based on the same 10-way tree structure, down to 4x4 blocks, which is the smallest supported block size. The numbers inside each partition indicate the raster prediction order within the block. Each final partition can be either intra or inter predicted, or even by a compound inter-intra mode, which combines a single-reference inter prediction and an intra prediction separated by a smoothing function [Chen 2018]. Specifically, for intra prediction, the process is not invoked at block level, but at transform block level instead. That is, when the transform size is smaller than the block size, the prediction is invoked multiple times in raster order within the block [Rivaz 2018], thus allowing the prediction of a transform block to use the previously predicted and reconstructed transform block as a better reference. The AV1 specifies many square and non-square sizes for transform blocks, from 4x4 to 64x64, as listed in Table 1. In this table, TxSize represents how a given transform size is signaled in the bitstream. Super Can start at Block 128x128 or (SB) 64x64 pixels. Each block can be further divided. 1 1 2 1 2 1 2 3 4 1 3 2 4 VERT HORZ VERT_4 HORZ_4 NONE 1 2 1 2 1 3 1 2 3 3 2 3 VERT_A VERT_B HORZ_A HORZ_B SPLIT Figure 1. AV1 10-way block partition tree structure. Table 1. AV1 supported transform sizes [Rivaz 2018]. TxSize Transform Size 0 4x4 1 8x8 2 16x16 3 32x32 4 64x64 5 4x8 6 8x4 7 8x16 8 16x8 9 16x32 10 32x16 11 32x64 12 64x32 13 4x16 14 16x4 15 8x32 16 32x8 17 16x64 18 64x16 A very rich set of transform kernels is defined for both inter and intra blocks. The full set consists of 16 combinations of vertical and horizontal Discrete Cosine Transform (DCT), Asymmetrical Discrete Sine Transform (ADST), flipADST and Identity Transform (IDTX). Intra prediction tends to be less accurate for predicted samples further away from the reference samples, resulting in a residue with higher concentration on the bottom right corner. For these cases, the asymmetrical transforms, such as the ADST and flipADST (the ADST applied in reverse order), are the most appropriate [Parker 2016]. Also, the IDTX, which skips the transform in the applied direction, shows good results for coding residue with very sharp edges. Moreover, the specific combination of IDTX vertical and IDTX horizontal results in a full transform skip, which is very effective for Screen Content Coding (SCC) [Parker 2016]. 3. AV1 Intra Prediction As explained in section 2, although AV1 supports many different block sizes ranging from 128x128 to 4x4 due to its flexible partition structure, only partitions of size 64x64 or smaller are supported by the intra prediction modes (predictors), according to the 19 transform sizes listed in Table 1. AV1 supports a variety of non-directional predictors that consider gradients, spatial correlation of samples and coherence of luminance and chrominance planes. These modes are: DC, Paeth, Smooth, Smooth Vertical, Smooth Horizontal, Recursive-based- filtering (modes 0 to 4) and Chroma-from-luma (CfL) [Trudeau 2018]. There are also two modes particularly efficient for Screen Content Coding, namely the Intra Block Copy [Li 2018] and Palette [Guo 2014] modes. Finally, AV1 also supports many different directional predictors, corresponding to angles between 36 and 212 degrees, aiming at several varieties of spatial redundancies in directional textures. Generally, the intra prediction of a single block, referred as a 2D array called Pred of size width by height, requires reference samples from previously reconstructed blocks located on the left and above the current block. The number of reference samples needed for each reference array, referred as AboveRow (row of samples above the current block) and LeftCol (column of samples to the left of the current block), is equal to width+height+1, as illustrated in Figure 2. AboveRow[−1 to 11] Pred[0 to 3][0 to 7] 11] [−1 to [−1 LeftCol Figure 2. Example of a 4x8 block, which needs a total of 25 reference samples from previously encoded adjacent blocks. The position -1 of both reference arrays always share the same value. The following subsections present detailed information about each intra prediction mode supported by the AV1. 3.1. DC Mode This well-known predictor is also found in many older video formats, although this one was based on the VP9 version, which also considers the availability of reference samples. That is, because of the block processing order or data dependency break introduced by parallel encoding tools, the top and/or left neighboring blocks may not be available to be used as reference in intra prediction.
Recommended publications
  • On Audio-Visual File Formats
    On Audio-Visual File Formats Summary • digital audio and digital video • container, codec, raw data • different formats for different purposes Reto Kromer • AV Preservation by reto.ch • audio-visual data transformations Film Preservation and Restoration Hyderabad, India 8–15 December 2019 1 2 Digital Audio • sampling Digital Audio • quantisation 3 4 Sampling • 44.1 kHz • 48 kHz • 96 kHz • 192 kHz digitisation = sampling + quantisation 5 6 Quantisation • 16 bit (216 = 65 536) • 24 bit (224 = 16 777 216) • 32 bit (232 = 4 294 967 296) Digital Video 7 8 Digital Video Resolution • resolution • SD 480i / SD 576i • bit depth • HD 720p / HD 1080i • linear, power, logarithmic • 2K / HD 1080p • colour model • 4K / UHD-1 • chroma subsampling • 8K / UHD-2 • illuminant 9 10 Bit Depth Linear, Power, Logarithmic • 8 bit (28 = 256) «medium grey» • 10 bit (210 = 1 024) • linear: 18% • 12 bit (212 = 4 096) • power: 50% • 16 bit (216 = 65 536) • logarithmic: 50% • 24 bit (224 = 16 777 216) 11 12 Colour Model • XYZ, L*a*b* • RGB / R′G′B′ / CMY / C′M′Y′ • Y′IQ / Y′UV / Y′DBDR • Y′CBCR / Y′COCG • Y′PBPR 13 14 15 16 17 18 RGB24 00000000 11111111 00000000 00000000 00000000 00000000 11111111 00000000 00000000 00000000 00000000 11111111 00000000 11111111 11111111 11111111 11111111 00000000 11111111 11111111 11111111 11111111 00000000 11111111 19 20 Compression Uncompressed • uncompressed + data simpler to process • lossless compression + software runs faster • lossy compression – bigger files • chroma subsampling – slower writing, transmission and reading • born
    [Show full text]
  • Amphion Video Codecs in an AI World
    Video Codecs In An AI World Dr Doug Ridge Amphion Semiconductor The Proliferance of Video in Networks • Video produces huge volumes of data • According to Cisco “By 2021 video will make up 82% of network traffic” • Equals 3.3 zetabytes of data annually • 3.3 x 1021 bytes • 3.3 billion terabytes AI Engines Overview • Example AI network types include Artificial Neural Networks, Spiking Neural Networks and Self-Organizing Feature Maps • Learning and processing are automated • Processing • AI engines designed for processing huge amounts of data quickly • High degree of parallelism • Much greater performance and significantly lower power than CPU/GPU solutions • Learning and Inference • AI ‘learns’ from masses of data presented • Data presented as Input-Desired Output or as unmarked input for self-organization • AI network can start processing once initial training takes place Typical Applications of AI • Reduce data to be sorted manually • Example application in analysis of mammograms • 99% reduction in images send for analysis by specialist • Reduction in workload resulted in huge reduction in wrong diagnoses • Aid in decision making • Example application in traffic monitoring • Identify areas of interest in imagery to focus attention • No definitive decision made by AI engine • Perform decision making independently • Example application in security video surveillance • Alerts and alarms triggered by AI analysis of behaviours in imagery • Reduction in false alarms and more attention paid to alerts by security staff Typical Video Surveillance
    [Show full text]
  • A Practical Approach to Spatiotemporal Data Compression
    A Practical Approach to Spatiotemporal Data Compres- sion Niall H. Robinson1, Rachel Prudden1 & Alberto Arribas1 1Informatics Lab, Met Office, Exeter, UK. Datasets representing the world around us are becoming ever more unwieldy as data vol- umes grow. This is largely due to increased measurement and modelling resolution, but the problem is often exacerbated when data are stored at spuriously high precisions. In an effort to facilitate analysis of these datasets, computationally intensive calculations are increasingly being performed on specialised remote servers before the reduced data are transferred to the consumer. Due to bandwidth limitations, this often means data are displayed as simple 2D data visualisations, such as scatter plots or images. We present here a novel way to efficiently encode and transmit 4D data fields on-demand so that they can be locally visualised and interrogated. This nascent “4D video” format allows us to more flexibly move the bound- ary between data server and consumer client. However, it has applications beyond purely scientific visualisation, in the transmission of data to virtual and augmented reality. arXiv:1604.03688v2 [cs.MM] 27 Apr 2016 With the rise of high resolution environmental measurements and simulation, extremely large scientific datasets are becoming increasingly ubiquitous. The scientific community is in the pro- cess of learning how to efficiently make use of these unwieldy datasets. Increasingly, people are interacting with this data via relatively thin clients, with data analysis and storage being managed by a remote server. The web browser is emerging as a useful interface which allows intensive 1 operations to be performed on a remote bespoke analysis server, but with the resultant information visualised and interrogated locally on the client1, 2.
    [Show full text]
  • (A/V Codecs) REDCODE RAW (.R3D) ARRIRAW
    What is a Codec? Codec is a portmanteau of either "Compressor-Decompressor" or "Coder-Decoder," which describes a device or program capable of performing transformations on a data stream or signal. Codecs encode a stream or signal for transmission, storage or encryption and decode it for viewing or editing. Codecs are often used in videoconferencing and streaming media solutions. A video codec converts analog video signals from a video camera into digital signals for transmission. It then converts the digital signals back to analog for display. An audio codec converts analog audio signals from a microphone into digital signals for transmission. It then converts the digital signals back to analog for playing. The raw encoded form of audio and video data is often called essence, to distinguish it from the metadata information that together make up the information content of the stream and any "wrapper" data that is then added to aid access to or improve the robustness of the stream. Most codecs are lossy, in order to get a reasonably small file size. There are lossless codecs as well, but for most purposes the almost imperceptible increase in quality is not worth the considerable increase in data size. The main exception is if the data will undergo more processing in the future, in which case the repeated lossy encoding would damage the eventual quality too much. Many multimedia data streams need to contain both audio and video data, and often some form of metadata that permits synchronization of the audio and video. Each of these three streams may be handled by different programs, processes, or hardware; but for the multimedia data stream to be useful in stored or transmitted form, they must be encapsulated together in a container format.
    [Show full text]
  • Multimedia Compression Techniques for Streaming
    International Journal of Innovative Technology and Exploring Engineering (IJITEE) ISSN: 2278-3075, Volume-8 Issue-12, October 2019 Multimedia Compression Techniques for Streaming Preethal Rao, Krishna Prakasha K, Vasundhara Acharya most of the audio codes like MP3, AAC etc., are lossy as Abstract: With the growing popularity of streaming content, audio files are originally small in size and thus need not have streaming platforms have emerged that offer content in more compression. In lossless technique, the file size will be resolutions of 4k, 2k, HD etc. Some regions of the world face a reduced to the maximum possibility and thus quality might be terrible network reception. Delivering content and a pleasant compromised more when compared to lossless technique. viewing experience to the users of such locations becomes a The popular codecs like MPEG-2, H.264, H.265 etc., make challenge. audio/video streaming at available network speeds is just not feasible for people at those locations. The only way is to use of this. FLAC, ALAC are some audio codecs which use reduce the data footprint of the concerned audio/video without lossy technique for compression of large audio files. The goal compromising the quality. For this purpose, there exists of this paper is to identify existing techniques in audio-video algorithms and techniques that attempt to realize the same. compression for transmission and carry out a comparative Fortunately, the field of compression is an active one when it analysis of the techniques based on certain parameters. The comes to content delivering. With a lot of algorithms in the play, side outcome would be a program that would stream the which one actually delivers content while putting less strain on the audio/video file of our choice while the main outcome is users' network bandwidth? This paper carries out an extensive finding out the compression technique that performs the best analysis of present popular algorithms to come to the conclusion of the best algorithm for streaming data.
    [Show full text]
  • Opus, a Free, High-Quality Speech and Audio Codec
    Opus, a free, high-quality speech and audio codec Jean-Marc Valin, Koen Vos, Timothy B. Terriberry, Gregory Maxwell 29 January 2014 Xiph.Org & Mozilla What is Opus? ● New highly-flexible speech and audio codec – Works for most audio applications ● Completely free – Royalty-free licensing – Open-source implementation ● IETF RFC 6716 (Sep. 2012) Xiph.Org & Mozilla Why a New Audio Codec? http://xkcd.com/927/ http://imgs.xkcd.com/comics/standards.png Xiph.Org & Mozilla Why Should You Care? ● Best-in-class performance within a wide range of bitrates and applications ● Adaptability to varying network conditions ● Will be deployed as part of WebRTC ● No licensing costs ● No incompatible flavours Xiph.Org & Mozilla History ● Jan. 2007: SILK project started at Skype ● Nov. 2007: CELT project started ● Mar. 2009: Skype asks IETF to create a WG ● Feb. 2010: WG created ● Jul. 2010: First prototype of SILK+CELT codec ● Dec 2011: Opus surpasses Vorbis and AAC ● Sep. 2012: Opus becomes RFC 6716 ● Dec. 2013: Version 1.1 of libopus released Xiph.Org & Mozilla Applications and Standards (2010) Application Codec VoIP with PSTN AMR-NB Wideband VoIP/videoconference AMR-WB High-quality videoconference G.719 Low-bitrate music streaming HE-AAC High-quality music streaming AAC-LC Low-delay broadcast AAC-ELD Network music performance Xiph.Org & Mozilla Applications and Standards (2013) Application Codec VoIP with PSTN Opus Wideband VoIP/videoconference Opus High-quality videoconference Opus Low-bitrate music streaming Opus High-quality music streaming Opus Low-delay
    [Show full text]
  • Encoding H.264 Video for Streaming and Progressive Download
    W4: KEY ENCODING SKILLS, TECHNOLOGIES TECHNIQUES STREAMING MEDIA EAST - 2019 Jan Ozer www.streaminglearningcenter.com [email protected]/ 276-235-8542 @janozer Agenda • Introduction • Lesson 5: How to build encoding • Lesson 1: Delivering to Computers, ladder with objective quality metrics Mobile, OTT, and Smart TVs • Lesson 6: Current status of CMAF • Lesson 2: Codec review • Lesson 7: Delivering with dynamic • Lesson 3: Delivering HEVC over and static packaging HLS • Lesson 4: Per-title encoding Lesson 1: Delivering to Computers, Mobile, OTT, and Smart TVs • Computers • Mobile • OTT • Smart TVs Choosing an ABR Format for Computers • Can be DASH or HLS • Factors • Off-the-shelf player vendor (JW Player, Bitmovin, THEOPlayer, etc.) • Encoding/transcoding vendor Choosing an ABR Format for iOS • Native support (playback in the browser) • HTTP Live Streaming • Playback via an app • Any, including DASH, Smooth, HDS or RTMP Dynamic Streaming iOS Media Support Native App Codecs H.264 (High, Level 4.2), HEVC Any (Main10, Level 5 high) ABR formats HLS Any DRM FairPlay Any Captions CEA-608/708, WebVTT, IMSC1 Any HDR HDR10, DolbyVision ? http://bit.ly/hls_spec_2017 iOS Encoding Ladders H.264 HEVC http://bit.ly/hls_spec_2017 HEVC Hardware Support - iOS 3 % bit.ly/mobile_HEVC http://bit.ly/glob_med_2019 Android: Codec and ABR Format Support Codecs ABR VP8 (2.3+) • Multiple codecs and ABR H.264 (3+) HLS (3+) technologies • Serious cautions about HLS • DASH now close to 97% • HEVC VP9 (4.4+) DASH 4.4+ Via MSE • Main Profile Level 3 – mobile HEVC (5+)
    [Show full text]
  • The Daala Video Codec Project Next-Next Generation Video
    The Daala Video Codec Project Next-next Generation Video Timothy B. Terriberry Mozilla & The Xiph.Org Foundation ● Patents are no longer a problem for free software – We can all go home 2 Mozilla & The Xiph.Org Foundation ● Except... not quite 3 Mozilla & The Xiph.Org Foundation Carving out Exceptions in OIN (Table 0 contains one Xiph codec: FLAC) 4 Mozilla & The Xiph.Org Foundation Why This Matters ● Encumbered codecs are a billion dollar toll-tax on communications – Every cost from codecs is repeated a million fold in all multimedia software ● Codec licensing is anti-competitive – Licensing regimes are universally discriminatory – An excuse for proprietary software (Flash) ● Ignoring licensing creates risks that can show up at any time – A tax on success 5 Mozilla & The Xiph.Org Foundation The Royalty-Free Video Challenge ● Creating good codecs is hard – But we don’t need many – The best implementations of patented codecs are already free software ● Network effects decide – Where RF is established, non-free codecs see no adoption (JPEG, PNG, FLAC, …) ● RF is not enough – People care about different things – Must be better on all fronts 6 Mozilla & The Xiph.Org Foundation We Did This for Audio 7 Mozilla & The Xiph.Org Foundation The Daala Project ● Goal: Better than HEVC without infringing IPR ● Need a better strategy than “read a lot of patents” – People don’t believe you – Analysis is error-prone ● Try to stay far away from the line, but... ● One mistake can ruin years of development effort ● See: H.264 Baseline 8 Mozilla & The Xiph.Org
    [Show full text]
  • IP-Soc Shanghai 2017 ALLEGRO Presentation FINAL
    Building an Area-optimized Multi-format Video Encoder IP Tomi Jalonen VP Sales www.allegrodvt.com Allegro DVT Founded in 2003 Privately owned, based in Grenoble (France) Two product lines: 1) Industry de-facto standard video compliance streams Decoder syntax, performance and error resilience streams for H.264|MVC, H.265/SHVC, VP9, AVS2 and AV1 System compliance streams 2) Leading semiconductor video IP Multi-format encoder IP for H.264, H.265, VP9, JPEG Multi-format decoder IP for H.264, H.265, VP9, JPEG WiGig IEEE 802.11ad WDE CODEC IP 2 Evolution of Video Coding Standards International standards defined by standardization bodies such as ITU-T and ISO/IEC H.261 (1990) MPEG-1 (1993) H.262 / MPEG-2 (1995) H.263 (1996) MPEG-4 Part 2 (1999) H.264 / AVC / MPEG-4 Part 10 (2003) H.265 / HEVC (2013) Future Video Coding (“FVC”) MPEG and ISO "Preliminary Joint Call for Evidence on Video Compression with Capability beyond HEVC.” (202?) Incremental improvements of transform-based & motion- compensated hybrid video coding schemes to meet the ever increasing resolution and frame rate requirements 3 Regional Video Standards SMPTE standards in the US VC-1 (2006) VC-2 (2008) China Information Industry Department standards AVS (2005) AVS+ (2012) AVS2.0 (2016) 4 Proprietary Video Formats Sorenson Spark On2 VP6, VP7 RealVideo DivX Popular in the past partly due to technical merits but mainly due to more suitable licensing schemes to a given application than standard video video formats with their patent royalties. 5 Royalty-free Video Formats Xiph.org Foundation
    [Show full text]
  • PVQ Applied Outside of Daala (IETF 97 Draft)
    PVQ Applied outside of Daala (IETF 97 Draft) Yushin Cho Mozilla Corporation November, 2016 Introduction ● Perceptual Vector Quantization (PVQ) – Proposed as a quantization and coefficient coding tool for NETVC – Originally developed for the Daala video codec – Does a gain-shape coding of transform coefficients ● The most distinguishing idea of PVQ is the way it references a predictor. – PVQ does not subtract the predictor from the input to produce a residual Mozilla Corporation 2 Integrating PVQ into AV1 ● Introduction of a transformed predictors both in encoder and decoder – Because PVQ references the predictor in the transform domain, instead of using a pixel-domain residual as in traditional scalar quantization ● Activity masking, the major benefit of PVQ, is not enabled yet – Encoding RDO is solely based on PSNR Mozilla Corporation 3 Traditional Architecture Input X residue Subtraction Transform T signal R predictor P + decoded X Inverse Inverse Scalar decoded R Transform Quantizer Quantizer Coefficient bitstream of Coder coded T(X) Mozilla Corporation 4 AV1 with PVQ Input X Transform T T(X) PVQ Quantizer PVQ Coefficient predictor P Transform T T(X) Coder PVQ Inverse Quantizer Inverse dequantized bitstream of decoded X Transform T(X) coded T(X) Mozilla Corporation 5 Coding Gain Change Metric AV1 --> AV1 with PVQ PSNR Y -0.17 PSNR-HVS 0.27 SSIM 0.93 MS-SSIM 0.14 CIEDE2000 -0.28 ● For the IETF test sequence set, "objective-1-fast". ● IETF and AOM for high latency encoding options are used. Mozilla Corporation 6 Speed ● Increase in total encoding time due to PVQ's search for best codepoint – PVQ's time complexity is close to O(n*n) for n coefficients, while scalar quantization has O(n) ● Compared to Daala, the search space for a RDO decision in AV1 is far larger – For the 1st frame of grandma_qcif (176x144) in intra frame mode, Daala calls PVQ 3843 times, while AV1 calls 632,520 times, that is ~165x.
    [Show full text]
  • Efficient Multi-Codec Support for OTT Services: HEVC/H.265 And/Or AV1?
    Efficient Multi-Codec Support for OTT Services: HEVC/H.265 and/or AV1? Christian Timmerer†,‡, Martin Smole‡, and Christopher Mueller‡ ‡Bitmovin Inc., †Alpen-Adria-Universität San Francisco, CA, USA and Klagenfurt, Austria, EU ‡{firstname.lastname}@bitmovin.com, †{firstname.lastname}@itec.aau.at Abstract – The success of HTTP adaptive streaming is un- multiple versions (e.g., different resolutions and bitrates) and disputed and technical standards begin to converge to com- each version is divided into predefined pieces of a few sec- mon formats reducing market fragmentation. However, other onds (typically 2-10s). A client first receives a manifest de- obstacles appear in form of multiple video codecs to be sup- scribing the available content on a server, and then, the client ported in the future, which calls for an efficient multi-codec requests pieces based on its context (e.g., observed available support for over-the-top services. In this paper, we review the bandwidth, buffer status, decoding capabilities). Thus, it is state of the art of HTTP adaptive streaming formats with re- able to adapt the media presentation in a dynamic, adaptive spect to new services and video codecs from a deployment way. perspective. Our findings reveal that multi-codec support is The existing different formats use slightly different ter- inevitable for a successful deployment of today's and future minology. Adopting DASH terminology, the versions are re- services and applications. ferred to as representations and pieces are called segments, which we will use henceforth. The major differences between INTRODUCTION these formats are shown in Table 1. We note a strong differ- entiation in the manifest format and it is expected that both Today's over-the-top (OTT) services account for more than MPEG's media presentation description (MPD) and HLS's 70 percent of the internet traffic and this number is expected playlist (m3u8) will coexist at least for some time.
    [Show full text]
  • Bitmovin's “Video Developer Report 2018,”
    MPEG MPEG VAST VAST HLS HLS DASH DASH H.264 H.264 AV1 AV1 HLS NATIVE NATIVE CMAF CMAF RTMP RTMP VP9 VP9 ANDROID ANDROID ROKU ROKU HTML5 HTML5 Video Developer MPEG VAST4.0 MPEG VAST4.0 HLS HLS DASH DASH Report 2018 H.264 H.264 AV1 AV1 NATIVE NATIVE CMAF CMAF ROKU RTMP ROKU RTMP VAST4.0 VAST4.0 VP9 VP9 ANDROID ANDROID HTML5 HTML5 DRM DRM MPEG MPEG VAST VAST DASH DASH AV1 HLS AV1 HLS NATIVE NATIVE H.264 H.264 CMAF CMAF RTMP RTMP VP9 VP9 ANDROID ANDROID ROKU ROKU MPEG VAST4.0 MPEG VAST4.0 HLS HLS DASH DASH H.264 H.264 AV1 AV1 NATIVE NATIVE CMAF CMAF ROKU ROKU Welcome to the 2018 Video Developer Report! First and foremost, I’d like to thank everyone for making the 2018 Video Developer Report possible! In its second year the report is wider both in scope and reach. With 456 survey submissions from over 67 countries, the report aims to provide a snapshot into the state of video technology in 2018, as well as a vision into what will be important in the next 12 months. This report would not be possible without the great support and participation of the video developer community. Thank you for your dedication to figuring it out. To making streaming video work despite the challenges of limited bandwidth and a fragmented consumer device landscape. We hope this report provides you with insights into what your peers are working on and the pain points that we are all experiencing. We have already learned a lot and are looking forward to the 2019 Video Developer Survey a year from now! Best Regards, StefanStefan Lederer Lederer CEO, Bitmovin Page 1 Key findings In 2018 H.264/AVC dominates video codec usage globally, used by 92% of developers in the survey.
    [Show full text]