Daala: a Perceptually-Driven Next Generation Video Codec
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Network Working Group A
Network Working Group A. Filippov Internet Draft Huawei Technologies Intended status: Informational A. Norkin Netflix J.R. Alvarez Huawei Technologies Expires: May 17, 2017 November 17, 2016 <Video Codec Requirements and Evaluation Methodology> draft-ietf-netvc-requirements-04.txt Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. The list of current Internet-Drafts can be accessed at http://datatracker.ietf.org/drafts/current/ Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This Internet-Draft will expire on May 17, 2017. Copyright Notice Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust’s Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with <Filippov> Expires May 17, 2017 [Page 1] Internet-Draft Video Codec Requirements and Evaluation November 2016 respect to this document. -
Music 422 Project Report
Music 422 Project Report Jatin Chowdhury, Arda Sahiner, and Abhipray Sahoo October 10, 2020 1 Introduction We present an audio coder that seeks to improve upon traditional coding methods by implementing block switching, spectral band replication, and gain-shape quantization. Block switching improves on coding of transient sounds. Spectral band replication exploits low sensitivity of human hearing towards high frequencies. It fills in any missing coded high frequency content with information from the low frequencies. We chose to do gain-shape quantization in order to explicitly code the energy of each band, preserving the perceptually important spectral envelope more accurately. 2 Implementation 2.1 Block Switching Block switching, as introduced by Edler [1], allows the audio coder to adaptively change the size of the block being coded. Modern audio coders typically encode frequency lines obtained using the Modified Discrete Cosine Transform (MDCT). Using longer blocks for the MDCT allows forbet- ter spectral resolution, while shorter blocks have better time resolution, due to the time-frequency trade-off known as the Fourier uncertainty principle [2]. Due to their poor time resolution, using long MDCT blocks for an audio coder can result in “pre-echo” when the coder encodes transient sounds. Block switching allows the coder to use long MDCT blocks for normal signals being encoded and use shorter blocks for transient parts of the signal. In our coder, we implement the block switching algorithm introduced in [1]. This algorithm consists of four types of block windows: long, short, start, and stop. Long and short block windows are simple sine windows, as defined in [3]: (( ) ) 1 π w[n] = sin n + (1) 2 N where N is the length of the block. -
DVP Tutorial
IP Video Conferencing: A Tutorial Roman Sorokin, Jean-Louis Rougier Abstract Video conferencing is a well-established area of communications, which have been studied for decades. Recently this area has received a new impulse due to significantly increased bandwidth of Local and Wide area networks, appearance of low-priced video equipment and development of web based media technologies. This paper presents the main techniques behind the modern IP-based videoconferencing services, with a particular focus on codecs, network protocols, architectures and standardization efforts. Questions of security and topologies are also tackled. A description of a typical video conference scenario is provided, demonstrating how the technologies, responsible for different conference aspects, are working together. Traditional industrial disposition as well as modern innovative approaches are both addressed. Current industry trends are highlighted in respect to the topics, described in the tutorial. Legacy analog/digital technologies, together with the gateways between the traditional and the IP videoconferencing systems, are not considered. Keywords Video Conferencing, codec, SVC, MCU, SIP, RTP Roman Sorokin ALE International, Colombes, France e-mail: [email protected] Jean-Louis Rougier Télécom ParisTech, Paris, France e-mail: [email protected] 1 1 Introduction Video conferencing is a two-way interactive communication, delivered over networks of different nature, which allows people from several locations to participate in a meeting. Conference participants use video conferencing endpoints of different types. Generally a video conference endpoint has a camera and a microphone. The video stream, generated by the camera, and the audio stream, coming from the microphone, are both compressed and sent to the network interface. -
(A/V Codecs) REDCODE RAW (.R3D) ARRIRAW
What is a Codec? Codec is a portmanteau of either "Compressor-Decompressor" or "Coder-Decoder," which describes a device or program capable of performing transformations on a data stream or signal. Codecs encode a stream or signal for transmission, storage or encryption and decode it for viewing or editing. Codecs are often used in videoconferencing and streaming media solutions. A video codec converts analog video signals from a video camera into digital signals for transmission. It then converts the digital signals back to analog for display. An audio codec converts analog audio signals from a microphone into digital signals for transmission. It then converts the digital signals back to analog for playing. The raw encoded form of audio and video data is often called essence, to distinguish it from the metadata information that together make up the information content of the stream and any "wrapper" data that is then added to aid access to or improve the robustness of the stream. Most codecs are lossy, in order to get a reasonably small file size. There are lossless codecs as well, but for most purposes the almost imperceptible increase in quality is not worth the considerable increase in data size. The main exception is if the data will undergo more processing in the future, in which case the repeated lossy encoding would damage the eventual quality too much. Many multimedia data streams need to contain both audio and video data, and often some form of metadata that permits synchronization of the audio and video. Each of these three streams may be handled by different programs, processes, or hardware; but for the multimedia data stream to be useful in stored or transmitted form, they must be encapsulated together in a container format. -
Opus, a Free, High-Quality Speech and Audio Codec
Opus, a free, high-quality speech and audio codec Jean-Marc Valin, Koen Vos, Timothy B. Terriberry, Gregory Maxwell 29 January 2014 Xiph.Org & Mozilla What is Opus? ● New highly-flexible speech and audio codec – Works for most audio applications ● Completely free – Royalty-free licensing – Open-source implementation ● IETF RFC 6716 (Sep. 2012) Xiph.Org & Mozilla Why a New Audio Codec? http://xkcd.com/927/ http://imgs.xkcd.com/comics/standards.png Xiph.Org & Mozilla Why Should You Care? ● Best-in-class performance within a wide range of bitrates and applications ● Adaptability to varying network conditions ● Will be deployed as part of WebRTC ● No licensing costs ● No incompatible flavours Xiph.Org & Mozilla History ● Jan. 2007: SILK project started at Skype ● Nov. 2007: CELT project started ● Mar. 2009: Skype asks IETF to create a WG ● Feb. 2010: WG created ● Jul. 2010: First prototype of SILK+CELT codec ● Dec 2011: Opus surpasses Vorbis and AAC ● Sep. 2012: Opus becomes RFC 6716 ● Dec. 2013: Version 1.1 of libopus released Xiph.Org & Mozilla Applications and Standards (2010) Application Codec VoIP with PSTN AMR-NB Wideband VoIP/videoconference AMR-WB High-quality videoconference G.719 Low-bitrate music streaming HE-AAC High-quality music streaming AAC-LC Low-delay broadcast AAC-ELD Network music performance Xiph.Org & Mozilla Applications and Standards (2013) Application Codec VoIP with PSTN Opus Wideband VoIP/videoconference Opus High-quality videoconference Opus Low-bitrate music streaming Opus High-quality music streaming Opus Low-delay -
High Efficiency, Moderate Complexity Video Codec Using Only RF IPR
Thor High Efficiency, Moderate Complexity Video Codec using only RF IPR draft-fuldseth-netvc-thor-00 Arild Fuldseth, Gisle Bjontegaard (Cisco) IETF 93 – Prague, CZ – July 2015 1 Design principles • Moderate complexity to allow real-time implementation in SW on common HW, as well as new HW designs • Basic building blocks from well-known hybrid approach (motion compensated prediction and transform coding) • Common design elements in modern codecs – Larger block sizes and transforms, up to 64x64 – Quarter pixel interpolation, motion vector prediction, etc. • Cisco RF IPR (note well: declaration filed on draft) – Deblocking, transforms, etc. (some also essential in H.265/4) • Avoid non-RF IPR – If/when others offer RF IPR, design/performance will improve 2 Encoder Architecture Input Transform Quantizer Entropy Output video Coding bitstream - Inverse Transform Intra Frame Prediction Loop filters Inter Frame Prediction Reconstructed Motion Frame Estimation Memory 3 Decoder Architecture Input Entropy Inverse Bitstream Decoding Transform Intra Frame Prediction Loop filters Inter Frame Prediction Output video Reconstructed Frame Memory 4 Block Structure • Super block (SB) 64x64 • Quad-tree split into coding blocks (CB) >= 8x8 • Multiple prediction blocks (PB) per CB • Intra: 1 PB per CB • Inter: 1, 2 (rectangular) or 4 (square) PBs per CB • 1 or 4 transform blocks (TB) per CB 5 Coding-block modes • Intra • Inter0 MV index, no residual information • Inter1 MV index, residual information • Inter2 Explicit motion vector information, residual information -
Fast Computational Structures for an Efficient Implementation of The
ARTICLE IN PRESS Signal Processing ] (]]]]) ]]]–]]] Contents lists available at ScienceDirect Signal Processing journal homepage: www.elsevier.com/locate/sigpro Fast computational structures for an efficient implementation of the complete TDAC analysis/synthesis MDCT/MDST filter banks Vladimir Britanak a,Ã, Huibert J. Lincklaen Arrie¨ns b a Institute of Informatics, Slovak Academy of Sciences, Dubravska cesta 9, 845 07 Bratislava, Slovak Republic b Delft University of Technology, Department of Electrical Engineering, Mathematics and Computer Science, Mekelweg 4, 2628 CD Delft, The Netherlands article info abstract Article history: A new fast computational structure identical both for the forward and backward Received 27 May 2008 modified discrete cosine/sine transform (MDCT/MDST) computation is described. It is Received in revised form the result of a systematic construction of a fast algorithm for an efficient implementa- 8 January 2009 tion of the complete time domain aliasing cancelation (TDAC) analysis/synthesis MDCT/ Accepted 10 January 2009 MDST filter banks. It is shown that the same computational structure can be used both for the encoder and the decoder, thus significantly reducing design time and resources. Keywords: The corresponding generalized signal flow graph is regular and defines new sparse Modified discrete cosine transform matrix factorizations of the discrete cosine transform of type IV (DCT-IV) and MDCT/ Modified discrete sine transform MDST matrices. The identical fast MDCT computational structure provides an efficient Modulated -
The Daala Video Codec Project Next-Next Generation Video
The Daala Video Codec Project Next-next Generation Video Timothy B. Terriberry Mozilla & The Xiph.Org Foundation ● Patents are no longer a problem for free software – We can all go home 2 Mozilla & The Xiph.Org Foundation ● Except... not quite 3 Mozilla & The Xiph.Org Foundation Carving out Exceptions in OIN (Table 0 contains one Xiph codec: FLAC) 4 Mozilla & The Xiph.Org Foundation Why This Matters ● Encumbered codecs are a billion dollar toll-tax on communications – Every cost from codecs is repeated a million fold in all multimedia software ● Codec licensing is anti-competitive – Licensing regimes are universally discriminatory – An excuse for proprietary software (Flash) ● Ignoring licensing creates risks that can show up at any time – A tax on success 5 Mozilla & The Xiph.Org Foundation The Royalty-Free Video Challenge ● Creating good codecs is hard – But we don’t need many – The best implementations of patented codecs are already free software ● Network effects decide – Where RF is established, non-free codecs see no adoption (JPEG, PNG, FLAC, …) ● RF is not enough – People care about different things – Must be better on all fronts 6 Mozilla & The Xiph.Org Foundation We Did This for Audio 7 Mozilla & The Xiph.Org Foundation The Daala Project ● Goal: Better than HEVC without infringing IPR ● Need a better strategy than “read a lot of patents” – People don’t believe you – Analysis is error-prone ● Try to stay far away from the line, but... ● One mistake can ruin years of development effort ● See: H.264 Baseline 8 Mozilla & The Xiph.Org -
PVQ Applied Outside of Daala (IETF 97 Draft)
PVQ Applied outside of Daala (IETF 97 Draft) Yushin Cho Mozilla Corporation November, 2016 Introduction ● Perceptual Vector Quantization (PVQ) – Proposed as a quantization and coefficient coding tool for NETVC – Originally developed for the Daala video codec – Does a gain-shape coding of transform coefficients ● The most distinguishing idea of PVQ is the way it references a predictor. – PVQ does not subtract the predictor from the input to produce a residual Mozilla Corporation 2 Integrating PVQ into AV1 ● Introduction of a transformed predictors both in encoder and decoder – Because PVQ references the predictor in the transform domain, instead of using a pixel-domain residual as in traditional scalar quantization ● Activity masking, the major benefit of PVQ, is not enabled yet – Encoding RDO is solely based on PSNR Mozilla Corporation 3 Traditional Architecture Input X residue Subtraction Transform T signal R predictor P + decoded X Inverse Inverse Scalar decoded R Transform Quantizer Quantizer Coefficient bitstream of Coder coded T(X) Mozilla Corporation 4 AV1 with PVQ Input X Transform T T(X) PVQ Quantizer PVQ Coefficient predictor P Transform T T(X) Coder PVQ Inverse Quantizer Inverse dequantized bitstream of decoded X Transform T(X) coded T(X) Mozilla Corporation 5 Coding Gain Change Metric AV1 --> AV1 with PVQ PSNR Y -0.17 PSNR-HVS 0.27 SSIM 0.93 MS-SSIM 0.14 CIEDE2000 -0.28 ● For the IETF test sequence set, "objective-1-fast". ● IETF and AOM for high latency encoding options are used. Mozilla Corporation 6 Speed ● Increase in total encoding time due to PVQ's search for best codepoint – PVQ's time complexity is close to O(n*n) for n coefficients, while scalar quantization has O(n) ● Compared to Daala, the search space for a RDO decision in AV1 is far larger – For the 1st frame of grandma_qcif (176x144) in intra frame mode, Daala calls PVQ 3843 times, while AV1 calls 632,520 times, that is ~165x. -
Efficient Multi-Codec Support for OTT Services: HEVC/H.265 And/Or AV1?
Efficient Multi-Codec Support for OTT Services: HEVC/H.265 and/or AV1? Christian Timmerer†,‡, Martin Smole‡, and Christopher Mueller‡ ‡Bitmovin Inc., †Alpen-Adria-Universität San Francisco, CA, USA and Klagenfurt, Austria, EU ‡{firstname.lastname}@bitmovin.com, †{firstname.lastname}@itec.aau.at Abstract – The success of HTTP adaptive streaming is un- multiple versions (e.g., different resolutions and bitrates) and disputed and technical standards begin to converge to com- each version is divided into predefined pieces of a few sec- mon formats reducing market fragmentation. However, other onds (typically 2-10s). A client first receives a manifest de- obstacles appear in form of multiple video codecs to be sup- scribing the available content on a server, and then, the client ported in the future, which calls for an efficient multi-codec requests pieces based on its context (e.g., observed available support for over-the-top services. In this paper, we review the bandwidth, buffer status, decoding capabilities). Thus, it is state of the art of HTTP adaptive streaming formats with re- able to adapt the media presentation in a dynamic, adaptive spect to new services and video codecs from a deployment way. perspective. Our findings reveal that multi-codec support is The existing different formats use slightly different ter- inevitable for a successful deployment of today's and future minology. Adopting DASH terminology, the versions are re- services and applications. ferred to as representations and pieces are called segments, which we will use henceforth. The major differences between INTRODUCTION these formats are shown in Table 1. We note a strong differ- entiation in the manifest format and it is expected that both Today's over-the-top (OTT) services account for more than MPEG's media presentation description (MPD) and HLS's 70 percent of the internet traffic and this number is expected playlist (m3u8) will coexist at least for some time. -
LOW COMPLEXITY H.264 to VC-1 TRANSCODER by VIDHYA
LOW COMPLEXITY H.264 TO VC-1 TRANSCODER by VIDHYA VIJAYAKUMAR Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE IN ELECTRICAL ENGINEERING THE UNIVERSITY OF TEXAS AT ARLINGTON AUGUST 2010 Copyright © by Vidhya Vijayakumar 2010 All Rights Reserved ACKNOWLEDGEMENTS As true as it would be with any research effort, this endeavor would not have been possible without the guidance and support of a number of people whom I stand to thank at this juncture. First and foremost, I express my sincere gratitude to my advisor and mentor, Dr. K.R. Rao, who has been the backbone of this whole exercise. I am greatly indebted for all the things that I have learnt from him, academically and otherwise. I thank Dr. Ishfaq Ahmad for being my co-advisor and mentor and for his invaluable guidance and support. I was fortunate to work with Dr. Ahmad as his research assistant on the latest trends in video compression and it has been an invaluable experience. I thank my mentor, Mr. Vishy Swaminathan, and my team members at Adobe Systems for giving me an opportunity to work in the industry and guide me during my internship. I would like to thank the other members of my advisory committee Dr. W. Alan Davis and Dr. William E Dillon for reviewing the thesis document and offering insightful comments. I express my gratitude Dr. Jonathan Bredow and the Electrical Engineering department for purchasing the software required for this thesis and giving me the chance to work on cutting edge technologies. -
Perceptual Audio Coding Contents
12/6/2007 Perceptual Audio Coding Henrique Malvar Managing Director, Redmond Lab UW Lecture – December 6, 2007 Contents • Motivation • “Source coding”: good for speech • “Sink coding”: Auditory Masking • Block & Lapped Transforms • Audio compression •Examples 2 1 12/6/2007 Contents • Motivation • “Source coding”: good for speech • “Sink coding”: Auditory Masking • Block & Lapped Transforms • Audio compression •Examples 3 Many applications need digital audio • Communication – Digital TV, Telephony (VoIP) & teleconferencing – Voice mail, voice annotations on e-mail, voice recording •Business – Internet call centers – Multimedia presentations • Entertainment – 150 songs on standard CD – thousands of songs on portable music players – Internet / Satellite radio, HD Radio – Games, DVD Movies 4 2 12/6/2007 Contents • Motivation • “Source coding”: good for speech • “Sink coding”: Auditory Masking • Block & Lapped Transforms • Audio compression •Examples 5 Linear Predictive Coding (LPC) LPC periodic excitation N coefficients x()nen= ()+−∑ axnrr ( ) gains r=1 pitch period e(n) Synthesis x(n) Combine Filter noise excitation synthesized speech 6 3 12/6/2007 LPC basics – analysis/synthesis synthesis parameters Analysis Synthesis algorithm Filter residual waveform N en()= xn ()−−∑ axnr ( r ) r=1 original speech synthesized speech 7 LPC variant - CELP selection Encoder index LPC original gain coefficients speech . Synthesis . Filter Decoder excitation codebook synthesized speech 8 4 12/6/2007 LPC variant - multipulse LPC coefficients excitation Synthesis