The Daala Directional Deringing Filter Jean-Marc Valin, Member, IEEE

Total Page:16

File Type:pdf, Size:1020Kb

The Daala Directional Deringing Filter Jean-Marc Valin, Member, IEEE 1 The Daala Directional Deringing Filter Jean-Marc Valin, Member, IEEE Abstract—This paper presents the deringing filter used in 0 0 1 1 2 2 3 3 the Daala royalty-free video codec. The filter is based on a 1 1 2 2 3 3 4 4 non-linear conditional replacement filter and is designed for vectorization efficiency. It takes into account the direction 2 2 3 3 4 4 5 5 of edges and patterns being filtered. The filter works by 3 3 4 4 5 5 6 6 identifying the direction of each block and then adaptively filtering along the identified direction. In a second pass, the 4 4 5 5 6 6 7 7 blocks are also filtered in a different direction, with more 5 5 6 6 7 7 8 8 conservative thresholds to avoid blurring edges. The proposed deringing filter is shown to improve the quality of both Daala 6 6 7 7 8 8 9 9 and the Alliance for Open Media (AOM) AV1 video codec. 7 7 8 8 9 9 10 10 Figure 1. Line number k for pixels following direction d = 1 in an 8 × 8 I. INTRODUCTION block. The Daala video codec has been under development since 2012 with the goal of achieving royalty-free status by using technology that differs greatly from more traditional video The filter works by identifying the direction of each block codecs such as H.264 and HEVC. For example, Daala and then adaptively filtering along the identified direction. In uses lapped transforms [1], [2] rather than relying on more a second pass, the blocks are also filtered in a different di- traditional deblocking filters. It is also based on the percep- rection, with more conservative thresholds to avoid blurring tual vector quantization(PVQ) [3] technique rather than on edges. The deringing filter also vectorizes well, requiring no scalar quantization of a transformed residual. Some of these per-pixel scalar operation. An high-level interactive demon- techniques achieve better quality on textured content, at the stration of the algorithm is available at [7]. cost of more ringing artefacts. These are especially present in sharp edges due to Daala’s lack of intra prediction modes capable of predicting diagonal directions. For these reasons, II. DIRECTION SEARCH Daala greatly benefits from a deringing filter. The main goal of deringing is to filter out ringing artifacts The proposed deringing filter is based on the direction of while retaining all the details of the image. In HEVC, this is edges, so we will start by describing the direction search. achieved by the Sample Adaptive Offset (SAO) [4] algorithm For this, the image is first divided into blocks of 8 × 8. The that defines signal offsets for different classes of pixels. block size is chosen to be fine enough to adequately handle Unlike SAO, the approach we take in Daala is that of a non-straight edges, while being large enough to reliably non-linear spatial filter. From the very beginning, the design estimate directions when applied to a quantized image. of the filter was constrained to be easily vectorizable (i.e. Having a constant direction over an 8 × 8 region also makes implementable with SIMD operations), which was not the vectorization of the filter easier. case for other non-linear filters like the median filter [5] and For each block we want to determine the direction that the bilateral filter [6]. best matches the pattern in the block. This is done by The design of the deringing filter originates from the minimizing the sum of squared differences (SSD) between following observations. The amount of ringing in a coded the quantized block and a perfectly directional block. A image tends to be roughly proportional to the quantization perfectly directional block is a block for which each line step size. The amount of detail is a property of the input im- along a certain direction has a constant value. For each age, but the smallest detail actually retained in the quantized direction, we assign a line number k to each pixel, as shown image tends to also be proportional to the quantization step in Fig.1. size. For a given quantization step size, the amplitude of the For each direction d, the pixel average for line k is ringing is generally less than the amplitude of the details. determined by: This paper describes a deringing filter that takes into account the direction of edges and patterns being filtered. 1 X µd;k = xp ; (1) Nd;k Jean-Marc Valin is with Mozilla Corporation and Xiph.Org Foundation. p2Pd;k Corresponding email: [email protected] Copyright 2014-2016 Mozilla Foundation. This work is licensed under CC-BY 4.0. Send correspondence to Jean-Marc Valin <jm- where xp is the value of pixel p, Pd;k is the set of pixels in [email protected]>. line k following direction d and Nd;k is the cardinality of 2 Figure 3. Conditional replacement filter computation Figure 2. Example of direction search for an 8 × 8 block. The patterns III. CONDITIONAL REPLACEMENT FILTER shown are based on the µd;k values. In this case, the 45-degree direction 2 The conditional replacement filter is designed to remove is the one that minimizes σd, so it would be selected by the search. Note that the error values σd shown are never computed in practice (only sd is). noise without blurring sharp edges. A low-pass finite impulse response (FIR) filter with (2M + 1) taps in one dimension can be expressed as P (for example, in Fig.1, N = 2 and N = 8). The k=M d;k 1;0 1;4 1 X SSD is then defined as: y (n) = w x (n + k) ; (6) W k 2 3 k=−M 2 X X 2 M σd = (xp − µd;k) : (2) P 4 5 where W = k=−M wk. Although it removes noise from k2block;d p2Pd;k a signal, it also blurs any details and sharp edges, which Expanding (2) and then substituting (1) into it, we get is undesirable. Bilateral filters avoid the blurring effect by making each weights w dependent on the difference signal 2 3 k x (n + k) − x (n), typically using a Gaussian function. One 2 X X 2 X X 2 σd = 4 xp − 2 xpµd;k + µd;k5 disadvantage of this approach is that it requires keeping track k2block;d p2Pd;k p2Pd;k p2Pd;k of the sum of the weights for each sample n being filtered. 2 3 1 It also results in having a different W normalization factor X X 2 X 2 = 4 xp − 2µd;k xp + Nd;kµd;k5 for each pixel, making vectorization harder. k2block;d p2Pd;k p2Pd;k Rather than change the wk values, the conditional replace- 2 0 123 ment filter changes the signal x (n + k) used in the filter. X X 2 1 X When filtering sample n, any of the x (n + k) tap inputs that = 6 xp − @ xpA 7 4 Nd;k 5 differs from x (n) by more than a threshold T , is replaced k2block;d p2P p2P d;k d;k by x (n) in the calculation: 0 12 k=M X 2 X 1 X 1 X = xp − @ xpA : (3) y (n) = wkR (x (n) ; x (n + k) ;T ) ; (7) Nd;k W p2block k2block;d p2Pd;k k=−M Note that the simplifications leading to (3) are the same with 2 as to those allowing a variance to be computed as σx = ( P 2 P 2 xk ; jxk − x0j < T x − ( x) =N. Considering that the first term of (3) R (x0; xk;T ) = : (8) is constant with respect to d, we simply find the optimal x0 ; otherwise direction dopt by maximizing the second term: The filter computation is illustrated in Fig.3 and an example is shown in Fig.4. dopt = max sd ; (4) d Through algebraic simplifications, we can express the where filter (7) in terms of the differences x (n + k)−x (n), which 0 12 yields X 1 X sd = @ xpA : (5) k=M Nd;k 1 X k2block;d p2Pd;k y (n) = x (n)+ w f (x (n + k) − x (n) ;T ) ; W k k=−M;k6=0 We can avoid the division in (5) by multiplying sd by (9) 840, the least common multiple of the possible Nd;k values with the threshold function (1 ≤ N ≤ 8). When using 8-bit pixel data (for higher d;k d ; jdj < T bit depths, we downscale to 8 bits), and centering the values f (d; T ) = : (10) 0 ; otherwise such that −128 ≤ xp ≤ 127, then 840sd and all calculations leading to that value fit in a 32-bit signed integer. The advantage of this formulation is that the normalization 1 Fig.2 shows an example of a direction search for an 8×8 by W can be approximated without causing any bias, even block containing a line. when W is not a power of two. Also, because W does 3 50 50 80 45 45 23 24 40 40 25 Second filter 35 35 22 8 Filtered pixel 30 30 26 25 25 Effective spatial support 20 20 0 10 20 30 40 50 0 10 20 30 40 50 sample sample 50 50 Figure 5. Filtering along the direction (left) and filtering across the direction (right). 45 45 40 40 Table II 35 35 SECOND STAGE FILTER PARAMETERS 30 30 Index Direction dx dy 25 25 0 $ 1 0 1 0 1 20 20 $ 0 10 20 30 40 50 0 10 20 30 40 50 2 0 1 sample sample $ 3 0 1 $$ 1 0 Figure 4.
Recommended publications
  • Network Working Group A
    Network Working Group A. Filippov Internet Draft Huawei Technologies Intended status: Informational A. Norkin Netflix J.R. Alvarez Huawei Technologies Expires: May 17, 2017 November 17, 2016 <Video Codec Requirements and Evaluation Methodology> draft-ietf-netvc-requirements-04.txt Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. The list of current Internet-Drafts can be accessed at http://datatracker.ietf.org/drafts/current/ Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This Internet-Draft will expire on May 17, 2017. Copyright Notice Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust’s Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with <Filippov> Expires May 17, 2017 [Page 1] Internet-Draft Video Codec Requirements and Evaluation November 2016 respect to this document.
    [Show full text]
  • Automatic Detection of Perceived Ringing Regions in Compressed Images
    International Journal of Electronics and Communication Engineering. ISSN 0974-2166 Volume 4, Number 5 (2011), pp. 491-516 © International Research Publication House http://www.irphouse.com Automatic Detection of Perceived Ringing Regions in Compressed Images D. Minola Davids Research Scholar, Singhania University, Rajasthan, India Abstract An efficient approach toward a no-reference ringing metric intrinsically exists of two steps: first detecting regions in an image where ringing might occur, and second quantifying the ringing annoyance in these regions. This paper presents a novel approach toward the first step: the automatic detection of regions visually impaired by ringing artifacts in compressed images. It is a no- reference approach, taking into account the specific physical structure of ringing artifacts combined with properties of the human visual system (HVS). To maintain low complexity for real-time applications, the proposed approach adopts a perceptually relevant edge detector to capture regions in the image susceptible to ringing, and a simple yet efficient model of visual masking to determine ringing visibility. Index Terms: Luminance masking, per-ceptual edge, ringing artifact, texture masking. Introduction In current visual communication systems, the most essential task is to fit a large amount of visual information into the narrow bandwidth of transmission channels or into a limited storage space, while maintaining the best possible perceived quality for the viewer. A variety of compression algorithms, for example, such as JPEG[1] and MPEG/H.26xhave been widely adopted in image and video coding trying to achieve high compression efficiency at high quality. These lossy compression techniques, however, inevitably result in various coding artifacts, which by now are known and classified as blockiness, ringing, blur, etc.
    [Show full text]
  • On Audio-Visual File Formats
    On Audio-Visual File Formats Summary • digital audio and digital video • container, codec, raw data • different formats for different purposes Reto Kromer • AV Preservation by reto.ch • audio-visual data transformations Film Preservation and Restoration Hyderabad, India 8–15 December 2019 1 2 Digital Audio • sampling Digital Audio • quantisation 3 4 Sampling • 44.1 kHz • 48 kHz • 96 kHz • 192 kHz digitisation = sampling + quantisation 5 6 Quantisation • 16 bit (216 = 65 536) • 24 bit (224 = 16 777 216) • 32 bit (232 = 4 294 967 296) Digital Video 7 8 Digital Video Resolution • resolution • SD 480i / SD 576i • bit depth • HD 720p / HD 1080i • linear, power, logarithmic • 2K / HD 1080p • colour model • 4K / UHD-1 • chroma subsampling • 8K / UHD-2 • illuminant 9 10 Bit Depth Linear, Power, Logarithmic • 8 bit (28 = 256) «medium grey» • 10 bit (210 = 1 024) • linear: 18% • 12 bit (212 = 4 096) • power: 50% • 16 bit (216 = 65 536) • logarithmic: 50% • 24 bit (224 = 16 777 216) 11 12 Colour Model • XYZ, L*a*b* • RGB / R′G′B′ / CMY / C′M′Y′ • Y′IQ / Y′UV / Y′DBDR • Y′CBCR / Y′COCG • Y′PBPR 13 14 15 16 17 18 RGB24 00000000 11111111 00000000 00000000 00000000 00000000 11111111 00000000 00000000 00000000 00000000 11111111 00000000 11111111 11111111 11111111 11111111 00000000 11111111 11111111 11111111 11111111 00000000 11111111 19 20 Compression Uncompressed • uncompressed + data simpler to process • lossless compression + software runs faster • lossy compression – bigger files • chroma subsampling – slower writing, transmission and reading • born
    [Show full text]
  • Amphion Video Codecs in an AI World
    Video Codecs In An AI World Dr Doug Ridge Amphion Semiconductor The Proliferance of Video in Networks • Video produces huge volumes of data • According to Cisco “By 2021 video will make up 82% of network traffic” • Equals 3.3 zetabytes of data annually • 3.3 x 1021 bytes • 3.3 billion terabytes AI Engines Overview • Example AI network types include Artificial Neural Networks, Spiking Neural Networks and Self-Organizing Feature Maps • Learning and processing are automated • Processing • AI engines designed for processing huge amounts of data quickly • High degree of parallelism • Much greater performance and significantly lower power than CPU/GPU solutions • Learning and Inference • AI ‘learns’ from masses of data presented • Data presented as Input-Desired Output or as unmarked input for self-organization • AI network can start processing once initial training takes place Typical Applications of AI • Reduce data to be sorted manually • Example application in analysis of mammograms • 99% reduction in images send for analysis by specialist • Reduction in workload resulted in huge reduction in wrong diagnoses • Aid in decision making • Example application in traffic monitoring • Identify areas of interest in imagery to focus attention • No definitive decision made by AI engine • Perform decision making independently • Example application in security video surveillance • Alerts and alarms triggered by AI analysis of behaviours in imagery • Reduction in false alarms and more attention paid to alerts by security staff Typical Video Surveillance
    [Show full text]
  • DVP Tutorial
    IP Video Conferencing: A Tutorial Roman Sorokin, Jean-Louis Rougier Abstract Video conferencing is a well-established area of communications, which have been studied for decades. Recently this area has received a new impulse due to significantly increased bandwidth of Local and Wide area networks, appearance of low-priced video equipment and development of web based media technologies. This paper presents the main techniques behind the modern IP-based videoconferencing services, with a particular focus on codecs, network protocols, architectures and standardization efforts. Questions of security and topologies are also tackled. A description of a typical video conference scenario is provided, demonstrating how the technologies, responsible for different conference aspects, are working together. Traditional industrial disposition as well as modern innovative approaches are both addressed. Current industry trends are highlighted in respect to the topics, described in the tutorial. Legacy analog/digital technologies, together with the gateways between the traditional and the IP videoconferencing systems, are not considered. Keywords Video Conferencing, codec, SVC, MCU, SIP, RTP Roman Sorokin ALE International, Colombes, France e-mail: [email protected] Jean-Louis Rougier Télécom ParisTech, Paris, France e-mail: [email protected] 1 1 Introduction Video conferencing is a two-way interactive communication, delivered over networks of different nature, which allows people from several locations to participate in a meeting. Conference participants use video conferencing endpoints of different types. Generally a video conference endpoint has a camera and a microphone. The video stream, generated by the camera, and the audio stream, coming from the microphone, are both compressed and sent to the network interface.
    [Show full text]
  • Opus, a Free, High-Quality Speech and Audio Codec
    Opus, a free, high-quality speech and audio codec Jean-Marc Valin, Koen Vos, Timothy B. Terriberry, Gregory Maxwell 29 January 2014 Xiph.Org & Mozilla What is Opus? ● New highly-flexible speech and audio codec – Works for most audio applications ● Completely free – Royalty-free licensing – Open-source implementation ● IETF RFC 6716 (Sep. 2012) Xiph.Org & Mozilla Why a New Audio Codec? http://xkcd.com/927/ http://imgs.xkcd.com/comics/standards.png Xiph.Org & Mozilla Why Should You Care? ● Best-in-class performance within a wide range of bitrates and applications ● Adaptability to varying network conditions ● Will be deployed as part of WebRTC ● No licensing costs ● No incompatible flavours Xiph.Org & Mozilla History ● Jan. 2007: SILK project started at Skype ● Nov. 2007: CELT project started ● Mar. 2009: Skype asks IETF to create a WG ● Feb. 2010: WG created ● Jul. 2010: First prototype of SILK+CELT codec ● Dec 2011: Opus surpasses Vorbis and AAC ● Sep. 2012: Opus becomes RFC 6716 ● Dec. 2013: Version 1.1 of libopus released Xiph.Org & Mozilla Applications and Standards (2010) Application Codec VoIP with PSTN AMR-NB Wideband VoIP/videoconference AMR-WB High-quality videoconference G.719 Low-bitrate music streaming HE-AAC High-quality music streaming AAC-LC Low-delay broadcast AAC-ELD Network music performance Xiph.Org & Mozilla Applications and Standards (2013) Application Codec VoIP with PSTN Opus Wideband VoIP/videoconference Opus High-quality videoconference Opus Low-bitrate music streaming Opus High-quality music streaming Opus Low-delay
    [Show full text]
  • High Efficiency, Moderate Complexity Video Codec Using Only RF IPR
    Thor High Efficiency, Moderate Complexity Video Codec using only RF IPR draft-fuldseth-netvc-thor-00 Arild Fuldseth, Gisle Bjontegaard (Cisco) IETF 93 – Prague, CZ – July 2015 1 Design principles • Moderate complexity to allow real-time implementation in SW on common HW, as well as new HW designs • Basic building blocks from well-known hybrid approach (motion compensated prediction and transform coding) • Common design elements in modern codecs – Larger block sizes and transforms, up to 64x64 – Quarter pixel interpolation, motion vector prediction, etc. • Cisco RF IPR (note well: declaration filed on draft) – Deblocking, transforms, etc. (some also essential in H.265/4) • Avoid non-RF IPR – If/when others offer RF IPR, design/performance will improve 2 Encoder Architecture Input Transform Quantizer Entropy Output video Coding bitstream - Inverse Transform Intra Frame Prediction Loop filters Inter Frame Prediction Reconstructed Motion Frame Estimation Memory 3 Decoder Architecture Input Entropy Inverse Bitstream Decoding Transform Intra Frame Prediction Loop filters Inter Frame Prediction Output video Reconstructed Frame Memory 4 Block Structure • Super block (SB) 64x64 • Quad-tree split into coding blocks (CB) >= 8x8 • Multiple prediction blocks (PB) per CB • Intra: 1 PB per CB • Inter: 1, 2 (rectangular) or 4 (square) PBs per CB • 1 or 4 transform blocks (TB) per CB 5 Coding-block modes • Intra • Inter0 MV index, no residual information • Inter1 MV index, residual information • Inter2 Explicit motion vector information, residual information
    [Show full text]
  • Encoding H.264 Video for Streaming and Progressive Download
    W4: KEY ENCODING SKILLS, TECHNOLOGIES TECHNIQUES STREAMING MEDIA EAST - 2019 Jan Ozer www.streaminglearningcenter.com [email protected]/ 276-235-8542 @janozer Agenda • Introduction • Lesson 5: How to build encoding • Lesson 1: Delivering to Computers, ladder with objective quality metrics Mobile, OTT, and Smart TVs • Lesson 6: Current status of CMAF • Lesson 2: Codec review • Lesson 7: Delivering with dynamic • Lesson 3: Delivering HEVC over and static packaging HLS • Lesson 4: Per-title encoding Lesson 1: Delivering to Computers, Mobile, OTT, and Smart TVs • Computers • Mobile • OTT • Smart TVs Choosing an ABR Format for Computers • Can be DASH or HLS • Factors • Off-the-shelf player vendor (JW Player, Bitmovin, THEOPlayer, etc.) • Encoding/transcoding vendor Choosing an ABR Format for iOS • Native support (playback in the browser) • HTTP Live Streaming • Playback via an app • Any, including DASH, Smooth, HDS or RTMP Dynamic Streaming iOS Media Support Native App Codecs H.264 (High, Level 4.2), HEVC Any (Main10, Level 5 high) ABR formats HLS Any DRM FairPlay Any Captions CEA-608/708, WebVTT, IMSC1 Any HDR HDR10, DolbyVision ? http://bit.ly/hls_spec_2017 iOS Encoding Ladders H.264 HEVC http://bit.ly/hls_spec_2017 HEVC Hardware Support - iOS 3 % bit.ly/mobile_HEVC http://bit.ly/glob_med_2019 Android: Codec and ABR Format Support Codecs ABR VP8 (2.3+) • Multiple codecs and ABR H.264 (3+) HLS (3+) technologies • Serious cautions about HLS • DASH now close to 97% • HEVC VP9 (4.4+) DASH 4.4+ Via MSE • Main Profile Level 3 – mobile HEVC (5+)
    [Show full text]
  • The Daala Video Codec Project Next-Next Generation Video
    The Daala Video Codec Project Next-next Generation Video Timothy B. Terriberry Mozilla & The Xiph.Org Foundation ● Patents are no longer a problem for free software – We can all go home 2 Mozilla & The Xiph.Org Foundation ● Except... not quite 3 Mozilla & The Xiph.Org Foundation Carving out Exceptions in OIN (Table 0 contains one Xiph codec: FLAC) 4 Mozilla & The Xiph.Org Foundation Why This Matters ● Encumbered codecs are a billion dollar toll-tax on communications – Every cost from codecs is repeated a million fold in all multimedia software ● Codec licensing is anti-competitive – Licensing regimes are universally discriminatory – An excuse for proprietary software (Flash) ● Ignoring licensing creates risks that can show up at any time – A tax on success 5 Mozilla & The Xiph.Org Foundation The Royalty-Free Video Challenge ● Creating good codecs is hard – But we don’t need many – The best implementations of patented codecs are already free software ● Network effects decide – Where RF is established, non-free codecs see no adoption (JPEG, PNG, FLAC, …) ● RF is not enough – People care about different things – Must be better on all fronts 6 Mozilla & The Xiph.Org Foundation We Did This for Audio 7 Mozilla & The Xiph.Org Foundation The Daala Project ● Goal: Better than HEVC without infringing IPR ● Need a better strategy than “read a lot of patents” – People don’t believe you – Analysis is error-prone ● Try to stay far away from the line, but... ● One mistake can ruin years of development effort ● See: H.264 Baseline 8 Mozilla & The Xiph.Org
    [Show full text]
  • IP-Soc Shanghai 2017 ALLEGRO Presentation FINAL
    Building an Area-optimized Multi-format Video Encoder IP Tomi Jalonen VP Sales www.allegrodvt.com Allegro DVT Founded in 2003 Privately owned, based in Grenoble (France) Two product lines: 1) Industry de-facto standard video compliance streams Decoder syntax, performance and error resilience streams for H.264|MVC, H.265/SHVC, VP9, AVS2 and AV1 System compliance streams 2) Leading semiconductor video IP Multi-format encoder IP for H.264, H.265, VP9, JPEG Multi-format decoder IP for H.264, H.265, VP9, JPEG WiGig IEEE 802.11ad WDE CODEC IP 2 Evolution of Video Coding Standards International standards defined by standardization bodies such as ITU-T and ISO/IEC H.261 (1990) MPEG-1 (1993) H.262 / MPEG-2 (1995) H.263 (1996) MPEG-4 Part 2 (1999) H.264 / AVC / MPEG-4 Part 10 (2003) H.265 / HEVC (2013) Future Video Coding (“FVC”) MPEG and ISO "Preliminary Joint Call for Evidence on Video Compression with Capability beyond HEVC.” (202?) Incremental improvements of transform-based & motion- compensated hybrid video coding schemes to meet the ever increasing resolution and frame rate requirements 3 Regional Video Standards SMPTE standards in the US VC-1 (2006) VC-2 (2008) China Information Industry Department standards AVS (2005) AVS+ (2012) AVS2.0 (2016) 4 Proprietary Video Formats Sorenson Spark On2 VP6, VP7 RealVideo DivX Popular in the past partly due to technical merits but mainly due to more suitable licensing schemes to a given application than standard video video formats with their patent royalties. 5 Royalty-free Video Formats Xiph.org Foundation
    [Show full text]
  • PVQ Applied Outside of Daala (IETF 97 Draft)
    PVQ Applied outside of Daala (IETF 97 Draft) Yushin Cho Mozilla Corporation November, 2016 Introduction ● Perceptual Vector Quantization (PVQ) – Proposed as a quantization and coefficient coding tool for NETVC – Originally developed for the Daala video codec – Does a gain-shape coding of transform coefficients ● The most distinguishing idea of PVQ is the way it references a predictor. – PVQ does not subtract the predictor from the input to produce a residual Mozilla Corporation 2 Integrating PVQ into AV1 ● Introduction of a transformed predictors both in encoder and decoder – Because PVQ references the predictor in the transform domain, instead of using a pixel-domain residual as in traditional scalar quantization ● Activity masking, the major benefit of PVQ, is not enabled yet – Encoding RDO is solely based on PSNR Mozilla Corporation 3 Traditional Architecture Input X residue Subtraction Transform T signal R predictor P + decoded X Inverse Inverse Scalar decoded R Transform Quantizer Quantizer Coefficient bitstream of Coder coded T(X) Mozilla Corporation 4 AV1 with PVQ Input X Transform T T(X) PVQ Quantizer PVQ Coefficient predictor P Transform T T(X) Coder PVQ Inverse Quantizer Inverse dequantized bitstream of decoded X Transform T(X) coded T(X) Mozilla Corporation 5 Coding Gain Change Metric AV1 --> AV1 with PVQ PSNR Y -0.17 PSNR-HVS 0.27 SSIM 0.93 MS-SSIM 0.14 CIEDE2000 -0.28 ● For the IETF test sequence set, "objective-1-fast". ● IETF and AOM for high latency encoding options are used. Mozilla Corporation 6 Speed ● Increase in total encoding time due to PVQ's search for best codepoint – PVQ's time complexity is close to O(n*n) for n coefficients, while scalar quantization has O(n) ● Compared to Daala, the search space for a RDO decision in AV1 is far larger – For the 1st frame of grandma_qcif (176x144) in intra frame mode, Daala calls PVQ 3843 times, while AV1 calls 632,520 times, that is ~165x.
    [Show full text]
  • Efficient Multi-Codec Support for OTT Services: HEVC/H.265 And/Or AV1?
    Efficient Multi-Codec Support for OTT Services: HEVC/H.265 and/or AV1? Christian Timmerer†,‡, Martin Smole‡, and Christopher Mueller‡ ‡Bitmovin Inc., †Alpen-Adria-Universität San Francisco, CA, USA and Klagenfurt, Austria, EU ‡{firstname.lastname}@bitmovin.com, †{firstname.lastname}@itec.aau.at Abstract – The success of HTTP adaptive streaming is un- multiple versions (e.g., different resolutions and bitrates) and disputed and technical standards begin to converge to com- each version is divided into predefined pieces of a few sec- mon formats reducing market fragmentation. However, other onds (typically 2-10s). A client first receives a manifest de- obstacles appear in form of multiple video codecs to be sup- scribing the available content on a server, and then, the client ported in the future, which calls for an efficient multi-codec requests pieces based on its context (e.g., observed available support for over-the-top services. In this paper, we review the bandwidth, buffer status, decoding capabilities). Thus, it is state of the art of HTTP adaptive streaming formats with re- able to adapt the media presentation in a dynamic, adaptive spect to new services and video codecs from a deployment way. perspective. Our findings reveal that multi-codec support is The existing different formats use slightly different ter- inevitable for a successful deployment of today's and future minology. Adopting DASH terminology, the versions are re- services and applications. ferred to as representations and pieces are called segments, which we will use henceforth. The major differences between INTRODUCTION these formats are shown in Table 1. We note a strong differ- entiation in the manifest format and it is expected that both Today's over-the-top (OTT) services account for more than MPEG's media presentation description (MPD) and HLS's 70 percent of the internet traffic and this number is expected playlist (m3u8) will coexist at least for some time.
    [Show full text]