An Overview of Coding Tools in AV1: the First Video Codec from The

Total Page:16

File Type:pdf, Size:1020Kb

An Overview of Coding Tools in AV1: the First Video Codec from The SIP (2020), vol. 9, e6, page 1 of 15 © The Authors, 2020. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited. doi:10.1017/ATSIP.2020.2 original paper An Overview of Coding Tools in AV1: the First Video Codec from the Alliance for Open Media yue chen,1 debargha mukherjee,1 jingning han,1 adrian grange,1 yaowu xu, 1 sarah parker,1 cheng chen,1 hui su,1 urvang joshi,1 ching-han chiang,1 1 1 1 2 2 yunqing wang, paul∗ wilkins, jim bankoski, luc trudeau, nathan egge, 3 4 4 5 jean-marc valin, thomas davies,∗∗ steinar midtskogen, andrey norkin, peterderivaz6 and zoe liu7 In 2018, the Alliance for Open Media (AOMedia) finalized its first video compression format AV1, which is jointly developed by the industry consortium of leading video technology companies. The main goal of AV1 is to provide an open source and royalty-free video coding format that substantially outperforms state-of-the-art codecs available on the market in compression efficiency while remaining practical decoding complexity as well as being optimized for hardware feasibility and scalability on modern devices. To give detailed insights into how the targeted performance and feasibility is realized, this paper provides a technical overview of key coding techniques in AV1. Besides, the coding performance gains are validated by video compression tests performed with the libaom AV1 encoder against the libvpx VP9 encoder. Preliminary comparison with two leading HEVC encoders, x265 and HM, and the reference software of VVC is also conducted on AOM’s common test set and an open 4k set. Keywords: Video compression, Open-source video coding, AV1, Alliance for Open Media Received 11 July 2019; Revised 19 December 2019 I. INTRODUCTION web browsers (Firefox, Chrome, etc.), and operating systems like Android. Therefore, in an effort to create an Over the last decade, web-based video applications have open video format at par with the leading commercial become prevalent, with modern devices and network choices, in mid 2013, Google launched and deployed the infrastructure driving rapid growth in the consump- VP9 video codec [1]. As a codec for production usage, tion of high-resolution, high-quality content. Therefore libvpx-VP9 considerably outperforms x264 [2], a popu- predominant bandwidth consumers such as video-on- lar open source encoder for the most widely used format demand(VoD), live streaming, and conversational video, H.264/AVC [3], while is also a strong competitor to x265, the along with emerging new applications including virtual open source encoder of the state-of-the-art royalty-bearing reality and cloud gaming, that critically rely on high reso- format H.265/HEVC [4]codeconHDcontent[5]. Thereby lution and low latency, are imposing severe challenges on after the release YouTube progressively pushes through VP9 delivery infrastructure and hence creating an even stronger adoption, as of now a good amount of YouTube content will need for high efficiency video compression technology. be streamed in VP9 format when it is possible at client side. It is widely acknowledged that the success of web appli- However, as the demand for high efficiency video appli- cationsisenabledbyopen,fastiterating,andfreelyimple- cationsroseanddiversified,itsoonbecameimperativeto mentable foundation technologies, for example, HTML, continue the advances in compression performance, as well as to incorporate designs facilitating efficient streaming in 1Google, USA 2Mozilla, USA scenarios beyond typical VoD. To that end, in late 2015, a 3Amazon, USA few leading video-on-demand providers, along with firms 4 Cisco, UK and Norway in semiconductor and web browser industry, co-founded 5Netflix, USA 6Argon Design, UK the Alliance for Open Media (AOMedia) [6], which is now a 7Visionular, USA consortium of more than 40 leading hi-tech companies, to *Work performed while with Mozilla. **Work performed while with Google. work jointly toward a next-generation open video coding format called AV1. Corresponding authors: Y. Chen and D. Mukherjee ThefocusofAV1developmentincludes,butisnotlim- E-mails: [email protected] and [email protected] ited to achieving: consistent high-quality real-time video Downloaded from https://www.cambridge.org/core. IP address: 104.132.29.65, on 24 Feb 2020 at 18:59:05, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/ATSIP.2020.2 1 2 yue chen et al. delivery, scalability to modern devices at various band- claims much worse AV1 performance. We would like to widths, tractable computational footprint, optimization for suggest solutions to a few common issues of libaom configu- hardware, and flexibility for both commercial and non- ration in the comparison between libaom, HEVC encoders, commercial content. With VP9 tools and enhancements as and VTM (the reference software of the upcoming format the foundation, the codec was first initialized by solidify- VVC [18]). Firstly, when random access mode is used for ing promising tools from the separate VP10, Daala, and HEVC and VVC codecs, the counterpart mode of libaom Thor codecs that founding members have worked on. Cod- is two-pass coding with non-zero (19 is recommended) lag- ing tools and features were proposed, tested, discussed, and in-frames, rather than 1-pass mode and zero lag-in-frames iterated in AOMedia’s codec, hardware, and testing work- which will disable all bi-directional prediction tools as in groups, at the same time would be reviewed to work toward Refs. [10–12]. Secondly, as constant quality mode is usu- the goal of royalty-free. The AV1 video compression for- ally used for HEVC and VVC codecs in the comparison, mat has been finalized in the end of 2018, incorporating to match the setting, libaom needs to choose such mode a variety of new compression tools, high-level syntax, and as well by specifying end-usage = qandpassingintheQP parallelization features designed for specific use cases. (cq-level), rather than either using libaom in vbr mode to As has been the case for other open-source projects, the match the rates produced by other codecs [13], or manu- development of AV1 reference software – libaom,iscon- ally restraining libaom’s QP by setting min-q and max-q ducted openly in a public repository. The reference encoder [14]. Thirdly, in tests enforcing a 1s intra period, as the and decoder can be built from downloaded source code [7] default configurations of HM and VTM use open-loop GOP by following the guide [8]. Since the soft freeze of coding structure, to match it, forward key frames need to be man- tools in early 2018, libaom has made significant progress uallyenabledforlibaombyspecifyingenable-fwd-kf= 1, in terms of productization-readiness, by a radical accelera- otherwise [10–12,14], libaom encodes in closed gop struc- tion mainly achieved via machine learning powered search ture losing the benefit of cross gop referencing at every space pruning and early termination, and extra bitrate sav- intra frame. Need to mention that we appreciate all those ings via adaptive quantization and improved future refer- work, which greatly encourages community discussions and ence frame generation/structure. Besides libaom,nowthere pushes AOM developers to make better documentation on are other AV1 decoder and encoder implementations avail- how to use AV1 encoders. able or to be released soon designed for various goals and In this paper, we present the core coding tools in AV1 usage cases. Royalty-free AV1 decoder dav1d by VideoLAN that contribute to the majority of the 30 reduction in aver- and FFmpeg community [9], targeting for smooth play- age bitrate compared with the most performant libvpx VP9 back with low CPU utilization, now has been enabled by encoder at the same quality. Preliminary compression per- default in Firefox desktop version and will potentially also formance comparison between the libaom AV1 encoder and be included in Chrome soon. In the meantime, encoders four popular encoders, including the libvpx VP9 encoder, foravarietyofproductizationpurposesareequallyimpor- two widely recognized HEVC encoders – x265 and HM ref- tant to AV1’s ecosystem, for example, SVT-AV1 (by Intel erence software, as well as VTM,areoperatedontestsets and Netflix), rav1e (by Mozilla and Xiph.org), and Aurora (AOM’s common test set and an open 4k set) covering (by Visionular) AV1 encoders are emerging AV1 encod- various resolution and content types under conditions that ing solutions focusing on one or more specific goals like approximate common VoD encoding configurations. high-performance, real-time encoding, on-demand con- tent, immerse/interactive content, perceptual quality, etc. Due to the increasing interest in AV1 performance, many II. AV1 CODING TECHNIQUES efforts [10–17] have been made to conduct performance comparison between libaom-AV1 and other mainstream A) Coding block partition encoders of other formats. Among other work, the conclu- sions are different, and some of them have not listed key VP9 uses a four-way partition tree starting from the 64 × 64 libaom encoder configurations. To address community’s leveldownto4× 4 level, with some additional restrictions concerns on the contradictory conclusions, we would like for blocks below 8 × 8wherewithinan8x8blockallthe to point out some issues that can be spotted in the pro- sub-blocks should have the same reference frame, as shown vided information. References [15–17]claimsthatlibaom- in the top half of Fig. 1,soastoensurethechromacompo- AV1 performs better than x265 under PSNR metric. Ref- nent can be processed in a minimum of 4 × 4 block unit. erence [16]presentsover−30 BDRates for HD (≥720p) Note that partitions designated as “R” refer to as recursive content and −17 overall BDRate, [15]showslessgain in that the same partition tree is repeated at a lower scale of 15 possibly because of the quality loss introduced by until we reach the lowest 4 × 4level.
Recommended publications
  • FCC-06-11A1.Pdf
    Federal Communications Commission FCC 06-11 Before the FEDERAL COMMUNICATIONS COMMISSION WASHINGTON, D.C. 20554 In the Matter of ) ) Annual Assessment of the Status of Competition ) MB Docket No. 05-255 in the Market for the Delivery of Video ) Programming ) TWELFTH ANNUAL REPORT Adopted: February 10, 2006 Released: March 3, 2006 Comment Date: April 3, 2006 Reply Comment Date: April 18, 2006 By the Commission: Chairman Martin, Commissioners Copps, Adelstein, and Tate issuing separate statements. TABLE OF CONTENTS Heading Paragraph # I. INTRODUCTION.................................................................................................................................. 1 A. Scope of this Report......................................................................................................................... 2 B. Summary.......................................................................................................................................... 4 1. The Current State of Competition: 2005 ................................................................................... 4 2. General Findings ....................................................................................................................... 6 3. Specific Findings....................................................................................................................... 8 II. COMPETITORS IN THE MARKET FOR THE DELIVERY OF VIDEO PROGRAMMING ......... 27 A. Cable Television Service ..............................................................................................................
    [Show full text]
  • How to Exploit the Transferability of Learned Image Compression to Conventional Codecs
    How to Exploit the Transferability of Learned Image Compression to Conventional Codecs Jan P. Klopp Keng-Chi Liu National Taiwan University Taiwan AI Labs [email protected] [email protected] Liang-Gee Chen Shao-Yi Chien National Taiwan University [email protected] [email protected] Abstract Lossy compression optimises the objective Lossy image compression is often limited by the sim- L = R + λD (1) plicity of the chosen loss measure. Recent research sug- gests that generative adversarial networks have the ability where R and D stand for rate and distortion, respectively, to overcome this limitation and serve as a multi-modal loss, and λ controls their weight relative to each other. In prac- especially for textures. Together with learned image com- tice, computational efficiency is another constraint as at pression, these two techniques can be used to great effect least the decoder needs to process high resolutions in real- when relaxing the commonly employed tight measures of time under a limited power envelope, typically necessitating distortion. However, convolutional neural network-based dedicated hardware implementations. Requirements for the algorithms have a large computational footprint. Ideally, encoder are more relaxed, often allowing even offline en- an existing conventional codec should stay in place, ensur- coding without demanding real-time capability. ing faster adoption and adherence to a balanced computa- Recent research has developed along two lines: evolu- tional envelope. tion of exiting coding technologies, such as H264 [41] or As a possible avenue to this goal, we propose and investi- H265 [35], culminating in the most recent AV1 codec, on gate how learned image coding can be used as a surrogate the one hand.
    [Show full text]
  • Download Media Player Codec Pack Version 4.1 Media Player Codec Pack
    download media player codec pack version 4.1 Media Player Codec Pack. Description: In Microsoft Windows 10 it is not possible to set all file associations using an installer. Microsoft chose to block changes of file associations with the introduction of their Zune players. Third party codecs are also blocked in some instances, preventing some files from playing in the Zune players. A simple workaround for this problem is to switch playback of video and music files to Windows Media Player manually. In start menu click on the "Settings". In the "Windows Settings" window click on "System". On the "System" pane click on "Default apps". On the "Choose default applications" pane click on "Films & TV" under "Video Player". On the "Choose an application" pop up menu click on "Windows Media Player" to set Windows Media Player as the default player for video files. Footnote: The same method can be used to apply file associations for music, by simply clicking on "Groove Music" under "Media Player" instead of changing Video Player in step 4. Media Player Codec Pack Plus. Codec's Explained: A codec is a piece of software on either a device or computer capable of encoding and/or decoding video and/or audio data from files, streams and broadcasts. The word Codec is a portmanteau of ' co mpressor- dec ompressor' Compression types that you will be able to play include: x264 | x265 | h.265 | HEVC | 10bit x265 | 10bit x264 | AVCHD | AVC DivX | XviD | MP4 | MPEG4 | MPEG2 and many more. File types you will be able to play include: .bdmv | .evo | .hevc | .mkv | .avi | .flv | .webm | .mp4 | .m4v | .m4a | .ts | .ogm .ac3 | .dts | .alac | .flac | .ape | .aac | .ogg | .ofr | .mpc | .3gp and many more.
    [Show full text]
  • The Interplay of Compile-Time and Run-Time Options for Performance Prediction Luc Lesoil, Mathieu Acher, Xhevahire Tërnava, Arnaud Blouin, Jean-Marc Jézéquel
    The Interplay of Compile-time and Run-time Options for Performance Prediction Luc Lesoil, Mathieu Acher, Xhevahire Tërnava, Arnaud Blouin, Jean-Marc Jézéquel To cite this version: Luc Lesoil, Mathieu Acher, Xhevahire Tërnava, Arnaud Blouin, Jean-Marc Jézéquel. The Interplay of Compile-time and Run-time Options for Performance Prediction. SPLC 2021 - 25th ACM Inter- national Systems and Software Product Line Conference - Volume A, Sep 2021, Leicester, United Kingdom. pp.1-12, 10.1145/3461001.3471149. hal-03286127 HAL Id: hal-03286127 https://hal.archives-ouvertes.fr/hal-03286127 Submitted on 15 Jul 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. The Interplay of Compile-time and Run-time Options for Performance Prediction Luc Lesoil, Mathieu Acher, Xhevahire Tërnava, Arnaud Blouin, Jean-Marc Jézéquel Univ Rennes, INSA Rennes, CNRS, Inria, IRISA Rennes, France [email protected] ABSTRACT Both compile-time and run-time options can be configured to reach Many software projects are configurable through compile-time op- specific functional and performance goals. tions (e.g., using ./configure) and also through run-time options (e.g., Existing studies consider either compile-time or run-time op- command-line parameters, fed to the software at execution time).
    [Show full text]
  • On Audio-Visual File Formats
    On Audio-Visual File Formats Summary • digital audio and digital video • container, codec, raw data • different formats for different purposes Reto Kromer • AV Preservation by reto.ch • audio-visual data transformations Film Preservation and Restoration Hyderabad, India 8–15 December 2019 1 2 Digital Audio • sampling Digital Audio • quantisation 3 4 Sampling • 44.1 kHz • 48 kHz • 96 kHz • 192 kHz digitisation = sampling + quantisation 5 6 Quantisation • 16 bit (216 = 65 536) • 24 bit (224 = 16 777 216) • 32 bit (232 = 4 294 967 296) Digital Video 7 8 Digital Video Resolution • resolution • SD 480i / SD 576i • bit depth • HD 720p / HD 1080i • linear, power, logarithmic • 2K / HD 1080p • colour model • 4K / UHD-1 • chroma subsampling • 8K / UHD-2 • illuminant 9 10 Bit Depth Linear, Power, Logarithmic • 8 bit (28 = 256) «medium grey» • 10 bit (210 = 1 024) • linear: 18% • 12 bit (212 = 4 096) • power: 50% • 16 bit (216 = 65 536) • logarithmic: 50% • 24 bit (224 = 16 777 216) 11 12 Colour Model • XYZ, L*a*b* • RGB / R′G′B′ / CMY / C′M′Y′ • Y′IQ / Y′UV / Y′DBDR • Y′CBCR / Y′COCG • Y′PBPR 13 14 15 16 17 18 RGB24 00000000 11111111 00000000 00000000 00000000 00000000 11111111 00000000 00000000 00000000 00000000 11111111 00000000 11111111 11111111 11111111 11111111 00000000 11111111 11111111 11111111 11111111 00000000 11111111 19 20 Compression Uncompressed • uncompressed + data simpler to process • lossless compression + software runs faster • lossy compression – bigger files • chroma subsampling – slower writing, transmission and reading • born
    [Show full text]
  • Blackberry QNX Multimedia Suite
    PRODUCT BRIEF QNX Multimedia Suite The QNX Multimedia Suite is a comprehensive collection of media technology that has evolved over the years to keep pace with the latest media requirements of current-day embedded systems. Proven in tens of millions of automotive infotainment head units, the suite enables media-rich, high-quality playback, encoding and streaming of audio and video content. The multimedia suite comprises a modular, highly-scalable architecture that enables building high value, customized solutions that range from simple media players to networked systems in the car. The suite is optimized to leverage system-on-chip (SoC) video acceleration, in addition to supporting OpenMAX AL, an industry open standard API for application-level access to a device’s audio, video and imaging capabilities. Overview Consumer’s demand for multimedia has fueled an anywhere- o QNX SDK for Smartphone Connectivity (with support for Apple anytime paradigm, making multimedia ubiquitous in embedded CarPlay and Android Auto) systems. More and more embedded applications have require- o Qt distributions for QNX SDP 7 ments for audio, video and communication processing capabilities. For example, an infotainment system’s media player enables o QNX CAR Platform for Infotainment playback of content, stored either on-board or accessed from an • Support for a variety of external media stores external drive, mobile device or streamed over IP via a browser. Increasingly, these systems also have streaming requirements for Features at a Glance distributing content across a network, for instance from a head Multimedia Playback unit to the digital instrument cluster or rear seat entertainment units. Multimedia is also becoming pervasive in other markets, • Software-based audio CODECs such as medical, industrial, and whitegoods where user interfaces • Hardware accelerated video CODECs are increasingly providing users with a rich media experience.
    [Show full text]
  • SA1OPS English User Manual
    Register your product and get support at www.philips.com/welcome SA1OPS08 SA1OPS16 SA1OPS32 EN User manual Select files and playlists for manual Contents sync 15 Copy files from GoGear Opus to your computer 16 English 1 Important safety information 3 WMP11 playlists 16 General maintenance 3 Create a regular playlist 16 Recycling the product 4 Create an auto playlist 16 Edit playlist 17 2 Your new GoGear Opus 6 Transfer playlists to GoGear Opus 17 What’s in the box 6 Search for music or pictures with WMP11 17 Delete files and playlists from WMP11 3 Getting started 7 library 17 Overview of the controls and Delete files and playlists from GoGear connections 7 Opus 18 Overview of the main menu 7 Edit song information with WMP11 18 Install software 8 Format GoGear Opus with WMP11 19 Connect and charge 8 Connect GoGear Opus to a computer 8 6 Music 20 Battery level indication 8 Listen to music 20 Battery level indication 9 Find your music 20 Disconnect GoGear Opus safely 9 Delete music tracks 20 Turn GoGear Opus on and off 9 Automatic standby and shut-down 9 7 Audiobooks 21 Add audiobooks to GoGear Opus 21 4 Use GoGear Opus to carry files 10 Audiobook controls 21 Select audiobook by book title 21 Adjust audiobook play speed 22 5 Windows Media Player 11 Add a bookmark in an audiobook 22 (WMP11) 11 Find a bookmark in an audiobook 22 Install Windows Media Player 11 Delete a bookmark in an audiobook 22 (WMP11) 11 Transfer music and picture files to WMP11 library 11 8 Video 23 Switch between music and pictures Download, convert and transfer library
    [Show full text]
  • Screen Capture Tools to Record Online Tutorials This Document Is Made to Explain How to Use Ffmpeg and Quicktime to Record Mini Tutorials on Your Own Computer
    Screen capture tools to record online tutorials This document is made to explain how to use ffmpeg and QuickTime to record mini tutorials on your own computer. FFmpeg is a cross-platform tool available for Windows, Linux and Mac. Installation and use process depends on your operating system. This info is taken from (Bellard 2016). Quicktime Player is natively installed on most of Mac computers. This tutorial focuses on Linux and Mac. Table of content 1. Introduction.......................................................................................................................................1 2. Linux.................................................................................................................................................1 2.1. FFmpeg......................................................................................................................................1 2.1.1. installation for Linux..........................................................................................................1 2.1.1.1. Add necessary components........................................................................................1 2.1.2. Screen recording with FFmpeg..........................................................................................2 2.1.2.1. List devices to know which one to record..................................................................2 2.1.2.2. Record screen and audio from your computer...........................................................3 2.2. Kazam........................................................................................................................................4
    [Show full text]
  • Chapter 9 Image Compression Standards
    Fundamentals of Multimedia, Chapter 9 Chapter 9 Image Compression Standards 9.1 The JPEG Standard 9.2 The JPEG2000 Standard 9.3 The JPEG-LS Standard 9.4 Bi-level Image Compression Standards 9.5 Further Exploration 1 Li & Drew c Prentice Hall 2003 ! Fundamentals of Multimedia, Chapter 9 9.1 The JPEG Standard JPEG is an image compression standard that was developed • by the “Joint Photographic Experts Group”. JPEG was for- mally accepted as an international standard in 1992. JPEG is a lossy image compression method. It employs a • transform coding method using the DCT (Discrete Cosine Transform). An image is a function of i and j (or conventionally x and y) • in the spatial domain. The 2D DCT is used as one step in JPEG in order to yield a frequency response which is a function F (u, v) in the spatial frequency domain, indexed by two integers u and v. 2 Li & Drew c Prentice Hall 2003 ! Fundamentals of Multimedia, Chapter 9 Observations for JPEG Image Compression The effectiveness of the DCT transform coding method in • JPEG relies on 3 major observations: Observation 1: Useful image contents change relatively slowly across the image, i.e., it is unusual for intensity values to vary widely several times in a small area, for example, within an 8 8 × image block. much of the information in an image is repeated, hence “spa- • tial redundancy”. 3 Li & Drew c Prentice Hall 2003 ! Fundamentals of Multimedia, Chapter 9 Observations for JPEG Image Compression (cont’d) Observation 2: Psychophysical experiments suggest that hu- mans are much less likely to notice the loss of very high spatial frequency components than the loss of lower frequency compo- nents.
    [Show full text]
  • Amphion Video Codecs in an AI World
    Video Codecs In An AI World Dr Doug Ridge Amphion Semiconductor The Proliferance of Video in Networks • Video produces huge volumes of data • According to Cisco “By 2021 video will make up 82% of network traffic” • Equals 3.3 zetabytes of data annually • 3.3 x 1021 bytes • 3.3 billion terabytes AI Engines Overview • Example AI network types include Artificial Neural Networks, Spiking Neural Networks and Self-Organizing Feature Maps • Learning and processing are automated • Processing • AI engines designed for processing huge amounts of data quickly • High degree of parallelism • Much greater performance and significantly lower power than CPU/GPU solutions • Learning and Inference • AI ‘learns’ from masses of data presented • Data presented as Input-Desired Output or as unmarked input for self-organization • AI network can start processing once initial training takes place Typical Applications of AI • Reduce data to be sorted manually • Example application in analysis of mammograms • 99% reduction in images send for analysis by specialist • Reduction in workload resulted in huge reduction in wrong diagnoses • Aid in decision making • Example application in traffic monitoring • Identify areas of interest in imagery to focus attention • No definitive decision made by AI engine • Perform decision making independently • Example application in security video surveillance • Alerts and alarms triggered by AI analysis of behaviours in imagery • Reduction in false alarms and more attention paid to alerts by security staff Typical Video Surveillance
    [Show full text]
  • (A/V Codecs) REDCODE RAW (.R3D) ARRIRAW
    What is a Codec? Codec is a portmanteau of either "Compressor-Decompressor" or "Coder-Decoder," which describes a device or program capable of performing transformations on a data stream or signal. Codecs encode a stream or signal for transmission, storage or encryption and decode it for viewing or editing. Codecs are often used in videoconferencing and streaming media solutions. A video codec converts analog video signals from a video camera into digital signals for transmission. It then converts the digital signals back to analog for display. An audio codec converts analog audio signals from a microphone into digital signals for transmission. It then converts the digital signals back to analog for playing. The raw encoded form of audio and video data is often called essence, to distinguish it from the metadata information that together make up the information content of the stream and any "wrapper" data that is then added to aid access to or improve the robustness of the stream. Most codecs are lossy, in order to get a reasonably small file size. There are lossless codecs as well, but for most purposes the almost imperceptible increase in quality is not worth the considerable increase in data size. The main exception is if the data will undergo more processing in the future, in which case the repeated lossy encoding would damage the eventual quality too much. Many multimedia data streams need to contain both audio and video data, and often some form of metadata that permits synchronization of the audio and video. Each of these three streams may be handled by different programs, processes, or hardware; but for the multimedia data stream to be useful in stored or transmitted form, they must be encapsulated together in a container format.
    [Show full text]
  • Opus, a Free, High-Quality Speech and Audio Codec
    Opus, a free, high-quality speech and audio codec Jean-Marc Valin, Koen Vos, Timothy B. Terriberry, Gregory Maxwell 29 January 2014 Xiph.Org & Mozilla What is Opus? ● New highly-flexible speech and audio codec – Works for most audio applications ● Completely free – Royalty-free licensing – Open-source implementation ● IETF RFC 6716 (Sep. 2012) Xiph.Org & Mozilla Why a New Audio Codec? http://xkcd.com/927/ http://imgs.xkcd.com/comics/standards.png Xiph.Org & Mozilla Why Should You Care? ● Best-in-class performance within a wide range of bitrates and applications ● Adaptability to varying network conditions ● Will be deployed as part of WebRTC ● No licensing costs ● No incompatible flavours Xiph.Org & Mozilla History ● Jan. 2007: SILK project started at Skype ● Nov. 2007: CELT project started ● Mar. 2009: Skype asks IETF to create a WG ● Feb. 2010: WG created ● Jul. 2010: First prototype of SILK+CELT codec ● Dec 2011: Opus surpasses Vorbis and AAC ● Sep. 2012: Opus becomes RFC 6716 ● Dec. 2013: Version 1.1 of libopus released Xiph.Org & Mozilla Applications and Standards (2010) Application Codec VoIP with PSTN AMR-NB Wideband VoIP/videoconference AMR-WB High-quality videoconference G.719 Low-bitrate music streaming HE-AAC High-quality music streaming AAC-LC Low-delay broadcast AAC-ELD Network music performance Xiph.Org & Mozilla Applications and Standards (2013) Application Codec VoIP with PSTN Opus Wideband VoIP/videoconference Opus High-quality videoconference Opus Low-bitrate music streaming Opus High-quality music streaming Opus Low-delay
    [Show full text]