Convention Paper Presented at the 111Th Convention 2001 September 21–24 New York, NY, USA

Total Page:16

File Type:pdf, Size:1020Kb

Convention Paper Presented at the 111Th Convention 2001 September 21–24 New York, NY, USA View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Infoscience - École polytechnique fédérale de Lausanne Audio Engineering Society Convention Paper Presented at the 111th Convention 2001 September 21–24 New York, NY, USA This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society. Audio Coding Using Perceptually Controlled Bitstream Buffering Christof Faller Media Signal Processing Research, Agere Systems 600 Mountain Avenue, Murray Hill, New Jersey 07974, USA [email protected] ABSTRACT Perceptual audio coders use a varying number of bits to encode subsequent frames according to the perceptual entropy of the audio signal. For transmission over a constant bitrate channel the bitstream must be buffered. The buffer must be large enough to absorb variations in the bitrate, otherwise the quality of the audio will be compromised. We present a new scheme for buffer control of perceptual audio coders. In contrast to conventional schemes the proposed scheme systematically reduces the variation in a perceptual distortion measure over time. The new scheme applied to a perceptual audio coder (PAC) improves the quality of the encoded signal for a given buffer size. The same technique can be used to increase the performance of other coders such as MPEG-1 Layer III or MPEG-2 AAC while maintaining backward compatibility. 1 INTRODUCTION noise threshold determined by the perceptual model. For transparent audio coding the perceptual model computes Typical non-stationary signals such as audio signals the masked threshold [4]. For non-transparent audio have a variable inherent perceptual entropy [1]. To ap- coding the parameters of the perceptual model can be proach the compression limit (i.e. perceptual entropy tuned such that it generates a supra-threshold, i.e. a for transparent audio coding) of such signals, variable noise threshold which is above the masked threshold. In bitrate compression techniques are used. Perceptual au- this case, more quantization noise is introduced into the dio coders [2, 3] quantize the spectral components of an encoded audio signal and the bitrate will be lower. audio signal such that the quantization noise follows the FALLER Audio Coding Using Perceptually Controlled Bitstream Buffering In a general sense, a perceptual model is an algorithm SIGNAL AMPLITUDE which computes for each frame k of an audio signal a noise threshold for each spectral component of this sig- nal with the perceived distortion D[k] as a model pa- D[k] rameter. Ideally the computed noise threshold has the DR property that it is the spectrally shaped noise with the largest possible variance for a perceived distortion of M[k]|DR D[k]. For D[k] = 0 the perceptual model computes the R masked threshold (for transparent audio coding) and for 1 2 3 4 5 D[k] > 0 it computes a supra-threshold (for non trans- TIME (FRAME #) parent audio coding). Ideally a perceptual model com- putes the noise thresholds such that for a constant pa- Fig. 1: Perceptual audio coders are by nature variable bi- rameter D[k] the perceived distortion of the entire coded trate source coders. For encoding an audio signal (top) at a audio signal is constant. constant distortion D[k] (middle) each frame has a different In this paper, we assume that an audio coder without bitrate M[k] (bottom). rate control or buffer control is the ideal case of encod- ing an audio signal. In this case the audio signal is en- SIGNAL AMPLITUDE coded with quantization noise equal to the noise thresh- old given by the perceptual model with the distortion D[k] held constant D[k] = DR. We call the frame bitrates D M[k]|DR in this case inherent bitrates with an average R bitrate of R. The average bitrate M[k] N 1 R R = M[k]|D (1) N X R k=1 1 2 3 4 5 TIME (FRAME #) of an audio signal encoded with a constant distortion D[k] = DR is not known prior to encoding the signal. Fig. 2: For encoding an audio signal (top) at a constant The number of frames to be encoded is N. bitrate M[k] = const. (bottom) the distortion D[k] (middle) For a specific desired bitrate R the audio signal is ideally must be varied in time. encoded with frame bitrates M[k] (bits/frame) equal to the inherent bitrates M[k]|D . A criterion for the 2 R bitrates M[k]|DR leading to a large σR and to significant optimality of the encoding process is variations of the distortion D[k] over time. The encoded 2 2 audio signal has now a significantly worse quality than σD = E{(D[k]|M[k] − DR) } . (2) in the ideal case of encoding with M[k] = M[k]|DR for 2 The smaller σD is the less is the distortion varying over the same average bitrate R. 2 time with no variation at all for σD = 0. The more Figure 3 shows conceptionally these two extreme cases of the distortion D[k] differs from DR the more the bitrate operation for an average bitrate of R. In one case (a) only M[k] differs from the inherent bitrate M[k]|DR and a the bitrate is varied (M[k] = M[k]|DR , D[k] = const.) related measure for optimality is and in the other case (b) only the distortion is varied 2 2 (M[k]= R, D[k]). σR = E{(M[k] − M[k]|DR ) } . (3) Many applications for constant bitrate transmission, In the optimal case when an audio signal is encoded such as digital radio broadcasting [5, 6] or Internet with a constant distortion D[k]= DR the bitrate of each streaming [7], can afford an end-to-end delay. For such frame M[k] = M[k]|DR will vary significantly as shown applications the bitstream of the audio coder can be in Fig. 1. However, many applications require a constant buffered. The buffer can absorb a certain degree of varia- bitrate R transmission from the encoder to the decoder. tion in the bitrate M[k], leading to reduced variations in To operate an audio encoder at a constant bitrate one the distortion D[k] over the case of an audio coder with- can iteratively encode each frame k with various distor- out bitstream buffering. A tradeoff can be made between tions D[k] until the bitrate of the frame M[k] is equal to variation in bitrate and variation in the distortion. The the desired bitrate R as shown in Fig. 2. In this case the smaller the variation in the distortion, the larger the vari- bitrates M[k] = R differ significantly from the inherent ation in the bitrate, and visa versa. The buffer size and AES 111TH CONVENTION, NEW YORK, NY, USA, 2001 SEPTEMBER 21–24 2 FALLER Audio Coding Using Perceptually Controlled Bitstream Buffering M[k] M[k] d c b b a R R a D[k] D D[k] DR R Fig. 3: The two extreme cases of operation of an audio coder: Fig. 4: Different scenarios for audio coding over a constant (a) ideally the audio coder is unconstrained and only varying bitrate transmission channel: (a) constant bitrate audio cod- the bitrate M[k], (b) the audio coder forced to be a constant ing without bitstream buffering, (b) buffered bitstream, (c) bitrate coder by varying only the distortion. buffered bitstream with larger buffer or better buffer control, (d) infinitely large buffer. the buffer control scheme determine how much variation AUDIO M[k] in the bitrate can be absorbed by the buffer. Therefore IN ENCODER BUFFER the buffer size and the buffer control scheme also deter- mine the amount of variation in the distortion. Figure d[k] l[k] 4 schematically shows the regions in which the bitrates CONSTANT M[k] and distortions D[k] lie for different scenarios: BUFFER CONTROL BITRATE Rd a) Constant bitrate audio coding M[k] = R (no bit- stream buffering). AUDIO OUT DECODER BUFFER b) Audio coding with bitstream buffering. c) Audio coding with bitstream buffering (larger buffer Fig. 5: An audio encoder and decoder with a constant bitrate or better buffer control scheme than b). transmission channel. d) Unconstrained audio coding M[k] = M[k]|DR (in- finitely large buffer). complexity restrictions of the encoder or decoder. For broadcasting applications the desired end-to-end delay Figure 5 shows an audio encoder and decoder with a is limited by the cost of the decoder and the tune-in buffered bitstream for a constant bitrate transmission time, i.e. the time it takes from a request for playback of the bits. In this scenario, the M[k] bits of the en- until the audio actually plays back. Therefore there is coded frame, at the time of each frame k, are put into a a need of a buffer control scheme which minimizes the FIFO (first-in-first-out) buffer while Rd bits are removed variation in the distortion for a given limited buffer size. from the FIFO buffer by the constant bitrate transmis- In this paper, we present a novel buffer control scheme sion channel. The number of data bits in the encoder for audio coding which significantly reduces the variation buffer can be expressed iteratively as in the distortion D[k] for a given buffer size.
Recommended publications
  • Optimized Bitrate Ladders for Adaptive Video Streaming with Deep Reinforcement Learning
    Optimized Bitrate Ladders for Adaptive Video Streaming with Deep Reinforcement Learning ∗ Tianchi Huang1, Lifeng Sun1,2,3 1Dept. of CS & Tech., 2BNRist, Tsinghua University. 3Key Laboratory of Pervasive Computing, China ABSTRACT Transcoding Online Stage Video Quality Stage In the adaptive video streaming scenario, videos are pre-chunked Storage Cost and pre-encoded according to a set of resolution-bitrate/quality Deploy pairs on the server-side, namely bitrate ladder. Hence, we pro- … … pose DeepLadder, which adopts state-of-the-art deep reinforcement learning (DRL) method to optimize the bitrate ladder by consid- Transcoding Server ering video content features, current network capacities, as well Raw Videos Video Chunks NN-based as the storage cost. Experimental results on both Constant Bi- Decison trate (CBR) and Variable Bitrate (VBR)-encoded videos demonstrate Network & ABR Status Feedback that DeepLadder significantly improvements on average video qual- ity, bandwidth utilization, and storage overhead in comparison to Figure 1: An Overview of DeepLadder’s System. We leverage prior work. a NN-based decision model for constructing the proper bi- trate ladders, and transcode the video according to the as- CCS CONCEPTS signed settings. • Information systems → Multimedia streaming; • Computing and solve the problem mathematically. In this poster, we propose methodologies → Neural networks; DeepLadder, a per-chunk video transcoding system. Technically, we set video contents, current network traffic distributions, past KEYWORDS actions as the state, and utilize a neural network (NN) to deter- Bitrate Ladder Optimization, Deep Reinforcement Learning. mine the proper action for each resolution autoregressively. Unlike the traditional bitrate ladder method that outputs all candidates ACM Reference Format: at one step, we model the optimization process as a Markov Deci- Tianchi Huang, Lifeng Sun.
    [Show full text]
  • Cube Encoder and Decoder Reference Guide
    CUBE ENCODER AND DECODER REFERENCE GUIDE © 2018 Teradek, LLC. All Rights Reserved. TABLE OF CONTENTS 1. Introduction ................................................................................ 3 Support Resources ........................................................ 3 Disclaimer ......................................................................... 3 Warning ............................................................................. 3 HEVC Products ............................................................... 3 HEVC Content ................................................................. 3 Physical Properties ........................................................ 4 2. Getting Started .......................................................................... 5 Power Your Device ......................................................... 5 Connect to a Network .................................................. 6 Choose Your Application .............................................. 7 Choose a Stream Mode ............................................... 9 3. Encoder Configuration ..........................................................10 Video/Audio Input .......................................................12 Color Management ......................................................13 Encoder ...........................................................................14 Network Interfaces .....................................................15 Cloud Services ..............................................................17
    [Show full text]
  • How to Encode Video for the Future
    How to Encode Video for the Future Dror Gill, CTO, Beamr Abstract The quality expectations of viewers paired with the ever-increasing shift to over-the-top (OTT) and mobile video consumption, are driving today’s networks to be more congested with video than ever before. To counter this congestion, this paper will cover advanced techniques for applying content-adaptive encoding and optimization methods to video workflows while lowering the bitrate of encoded video without compromising quality. Intended Audience: Business decision makers Video encoding engineers To learn more about optimized content-adaptive encoding, email [email protected] ©Beamr Imaging Ltd. 2017 | beamr.com Table of contents 3 Encoding for the future. 3 The trade off between bitrate and quality. 3 Legacy approaches to encoding your video content. 3 What is constant bitrate (CBR) encoding? 4 What about variable bitrate encoding? 4 Encoding content with constant rate factor encoding. 4 Capped content rate factor encoding for high complexity scenes. 4 Encoding content for the future. 5 Manually encoding content by title. 5 Manually encoding content by the category. 6 Content-adaptive encoding by the title and chunk. 6 Content-adaptive encoding using neural networks. 7 Closed loop content-adaptive encoding by the frame. 9 How should you be encoding your content? 10 References ©Beamr Imaging Ltd. 2017 | beamr.com Encoding for the future. how it will impact the file size and perceived visual quality of the video. The standard method of encoding video for delivery over the Internet utilizes a pre-set group of resolutions The rate control algorithm adjusts encoder parameters and bitrates known as adaptive bitrate (ABR) sets.
    [Show full text]
  • Why “Not- Compressing” Simply Doesn't Make Sens ?
    WHY “NOT- COMPRESSING” SIMPLY DOESN’T MAKE SENS ? Confidential IMAGES AND VIDEOS ARE LIKE SPONGES It seems to be a solid. But what happens when you squeeze it? It gets smaller! Why? If you look closely at the sponge, you will see that it has lots of holes in it. The sponge is made up of a mixture of solid and gas. When you squeeze it, the solid part changes it shape, but stays the same size. The gas in the holes gets smaller, so the entire sponge takes up less space. Confidential 2 IMAGES AND VIDEOS ARE LIKE SPONGES There is a lot of data that are not bringing any information to our human eyes, and we can remove it. It does not make sense to transport, store uncompressed images/videos. It is adding data that have an undeniable cost and that are not bringing any additional valuable information to the viewers. Confidential 3 A 4K TV SHOW WITHOUT ANY CODEC ▪ 60 MINUTES STORAGE: 4478.85 GB ▪ STREAMING: 9.953 Gbit per sec Confidential 4 IMAGES AND VIDEOS ARE LIKE SPONGES Remove Remove Remove Remove Temporal redundancy Spatial redundancy Visual redundancy Coding redundancy (interframe prediction,..) (transforms,..) (quantization,…) (entropy coding,…) Confidential 5 WHAT IS A CODEC? Short for COder & DECoder “A codec is a device or computer program for encoding or decoding a digital data stream or signal.” Note : ▪ It does not (necessarily) define quality ▪ It does not define transport / container format method. Confidential 6 WHAT IS A CODEC? Quality, Latency, Complexity, Bandwidth depend on how the actual algorithms are processing the content.
    [Show full text]
  • Bosch Intelligent Streaming
    Intelligent streaming | Bitrate-optimized high-quality video 1 | 22 Intelligent streaming Bitrate-optimized high-quality video – White Paper Data subject to change without notice | May 31st, 2019 BT-SC Intelligent streaming | Bitrate-optimized high-quality video 2 | 22 Table of contents 1 Introduction 3 2 Basic structures and processes 4 2.1 Image capture process ..................................................................................................................................... 4 2.2 Image content .................................................................................................................................................... 4 2.3 Scene content .................................................................................................................................................... 5 2.4 Human vision ..................................................................................................................................................... 5 2.5 Video compression ........................................................................................................................................... 6 3 Main concepts and components 7 3.1 Image processing .............................................................................................................................................. 7 3.2 VCA ..................................................................................................................................................................... 7 3.3 Smart Encoder
    [Show full text]
  • Isize Bitsave:High-Quality Video Streaming at Lower Bitrates
    Datasheet v20210518 iSIZE BitSave: High-Quality Video Streaming at Lower Bitrates iSIZE’s innovative BitSave technology comprises an AI-based perceptual preprocessing solution that allows conventional, third-party encoders to produce higher quality video at lower bitrate. BitSave achieves this by preprocessing the video content before it reaches the video encoder (Fig. 1), such that the output after compressing with any standard video codec is perceptually optimized with less motion artifacts or blurring, for the same or lower bitrate. BitSave neural networks are able to isolate areas of perceptual importance, such as those with high motion or detailed texture, and optimize their structure so that they are preserved better at lower bitrate by any subsequent encoder. As shown in our recent paper, when integrated as a preprocessor prior to a video encoder, BitSave is able to reduce bitrate requirements for a given quality level by up to 20% versus that encoder. When coupled with open-source AVC, HEVC and AV1 encoders, BitSave allows for up to 40% bitrate reduction at no quality compromise in comparison to leading third-party AVC, HEVC and AV1 encoding services. Lower bitrates equate to smaller filesizes, lower storage, transmission and distribution costs, reduced energy consumption and more satisfied customers. Source BitSave Encoder Delivery AI-based pre- Single pass processing processing prior One frame latency per content for an to encoding (AVC, entire ABR ladder HEVC, VP9, AV1) Improves encoding quality Integrated within Intel as measured by standard OpenVINO, ONNX and Dolby perceptual quality metrics Vision, easy to plug&play (VMAF, SSIM, VIF) within any existing workflow Figure 1.
    [Show full text]
  • Towards Optimal Compression of Meteorological Data: a Case Study of Using Interval-Motivated Overestimators in Global Optimization
    Towards Optimal Compression of Meteorological Data: A Case Study of Using Interval-Motivated Overestimators in Global Optimization Olga Kosheleva Department of Electrical and Computer Engineering and Department of Teacher Education, University of Texas, El Paso, TX 79968, USA, [email protected] 1 Introduction The existing image and data compression techniques try to minimize the mean square deviation between the original data f(x; y; z) and the compressed- decompressed data fe(x; y; z). In many practical situations, reconstruction that only guaranteed mean square error over the data set is unacceptable. For example, if we use the meteorological data to plan a best trajectory for a plane, then what we really want to know are the meteorological parameters such as wind, temperature, and pressure along the trajectory. If along this line, the values are not reconstructed accurately enough, the plane may crash; the fact that on average, we get a good reconstruction, does not help. In general, what we need is a compression that guarantees that for each (x; y), the di®erence jf(x; y; z)¡fe(x; y; z)j is bounded by a given value ¢, i.e., that the actual value f(x; y; z) belongs to the interval [fe(x; y; z) ¡ ¢; fe(x; y; z) + ¢]: In this chapter, we describe new e±cient techniques for data compression under such interval uncertainty. 2 Optimal Data Compression: A Problem 2.1 Data Compression Is Important At present, so much data is coming from measuring instruments that it is necessary to compress this data before storing and processing.
    [Show full text]
  • Adaptive Bitrate Streaming Over Cellular Networks: Rate Adaptation and Data Savings Strategies
    Adaptive Bitrate Streaming Over Cellular Networks: Rate Adaptation and Data Savings Strategies Yanyuan Qin, Ph.D. University of Connecticut, 2021 ABSTRACT Adaptive bitrate streaming (ABR) has become the de facto technique for video streaming over the Internet. Despite a flurry of techniques, achieving high quality ABR streaming over cellular networks remains a tremendous challenge. First, the design of an ABR scheme needs to balance conflicting Quality of Experience (QoE) metrics such as video quality, quality changes, stalls and startup performance, which is even harder under highly dynamic bandwidth in cellular network. Second, streaming providers have been moving towards using Variable Bitrate (VBR) encodings for the video content, which introduces new challenges for ABR streaming, whose nature and implications are little understood. Third, mobile video streaming consumes a lot of data. Although many video and network providers currently offer data saving options, the existing practices are suboptimal in QoE and resource usage. Last, when the audio and video arXiv:2104.01104v2 [cs.NI] 14 May 2021 tracks are stored separately, video and audio rate adaptation needs to be dynamically coordinated to achieve good overall streaming experience, which presents interesting challenges while, somewhat surprisingly, has received little attention by the research community. In this dissertation, we tackle each of the above four challenges. Firstly, we design a framework called PIA (PID-control based ABR streaming) that strategically leverages PID control concepts and novel approaches to account for the various requirements of ABR streaming. The evaluation results demonstrate that PIA outperforms state-of-the-art schemes in providing high average bitrate with signif- icantly lower bitrate changes and stalls, while incurring very small runtime overhead.
    [Show full text]
  • Download Full Text (Pdf)
    Bitrate Requirements of Non-Panoramic VR Remote Rendering Viktor Kelkkanen Markus Fiedler David Lindero Blekinge Institute of Technology Blekinge Institute of Technology Ericsson Karlskrona, Sweden Karlshamn, Sweden Luleå, Sweden [email protected] [email protected] [email protected] ABSTRACT This paper shows the impact of bitrate settings on objective quality measures when streaming non-panoramic remote-rendered Virtual Reality (VR) images. Non-panoramic here refers to the images that are rendered and sent across the network, they only cover the viewport of each eye, respectively. To determine the required bitrate of remote rendering for VR, we use a server that renders a 3D-scene, encodes the resulting images using the NVENC H.264 codec and transmits them to the client across a network. The client decodes the images and displays them in the VR headset. Objective full-reference quality measures are taken by comparing the image before encoding on the server to the same image after it has been decoded on the client. By altering the average bitrate setting of the encoder, we obtain objective quality Figure 1: Example of the non-panoramic view and 3D scene scores as functions of bitrates. Furthermore, we study the impact that is rendered once per eye and streamed each frame. of headset rotation speeds, since this will also have a large effect on image quality. We determine an upper and lower bitrate limit based on headset 1 INTRODUCTION rotation speeds. The lower limit is based on a speed close to the Remote rendering for VR is commonly done by using panorama average human peak head-movement speed, 360°/s.
    [Show full text]
  • Comments on `Area and Power Efficient DCT Architecture for Image
    Cintra and Bayer EURASIP Journal on Advances in Signal EURASIP Journal on Advances Processing (2017) 2017:50 DOI 10.1186/s13634-017-0486-8 in Signal Processing RESEARCH Open Access Comments on ‘Area and power efficient DCT architecture for image compression’ by Dhandapani and Ramachandran Renato J. Cintra1* and Fábio M. Bayer2 Abstract In [Dhandapani and Ramachandran, “Area and power efficient DCT architecture for image compression”, EURASIP Journal on Advances in Signal Processing 2014, 2014:180] the authors claim to have introduced an approximation for the discrete cosine transform capable of outperforming several well-known approximations in literature in terms of additive complexity. We could not verify the above results and we offer corrections for their work. Keywords: DCT approximations, Low-complexity transforms 1 Introduction 2 Criticisms In a recent paper [1], a low-complexity transformation 2.1 Inverse transformation was introduced, which is claimed to be a good approxima- The authors of [1] claim that inverse transformation T−1 tion to the discrete cosine transform (DCT). We wish to is given by evaluate this claim. ⎡ ⎤ The introduced transformation is given by the following 10001000 ⎢ ⎥ matrix: ⎢ −1100−11 0 0⎥ ⎢ ⎥ ⎢ 00100010⎥ ⎡ ⎤ ⎢ ⎥ 1 ⎢ 00−11 0 0−11⎥ 10000001 · ⎢ ⎥ . ⎢ ⎥ 2 ⎢ 00−11 0 0 1−1 ⎥ ⎢ 11000011⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 001000−10⎥ ⎢ 00100100⎥ ⎣ ⎦ ⎢ ⎥ −11001−10 0 ⎢ 00111100⎥ T = ⎢ ⎥ . 1000−10 0 0 ⎢ 00 1 1−1 −10 0⎥ ⎢ ⎥ ⎢ 00 1 0 0−10 0⎥ ⎣ ⎦ However, simple computation reveal that this is not 110000−1 −1 accurate, being the actual inverse given by: 1000000−1 ⎡ ⎤ 10000001 ⎢ ⎥ We aim at analyzing the above matrix and showing that ⎢ −1100001−1 ⎥ ⎢ ⎥ it does not consist of a meaningful approximation for ⎢ 00100100⎥ ⎢ ⎥ the8-pointDCT.Inthefollowing,weadoptedthesame − 1 ⎢ 00−11 1−10 0⎥ T 1 = · ⎢ ⎥ .
    [Show full text]
  • Lec 15 - HEVC Rate Control
    ECE 5578 Multimedia Communication Lec 15 - HEVC Rate Control Guest Lecturer: Li Li Dept of CSEE, UMKC Office: FH560E, Email: [email protected], Ph: x 2346. http://l.web.umkc.edu/lizhu slides created with WPS Office Linux and EqualX LaTex equation editor Z. Li, ECE 5578 Multimedia Communciation, 2020 Outline 1 Background 2 Related Work 3 Proposed Algorithm 4 Experimental Results 5 Conclusion Rate control introduction Assuming the bandwidth is 2Mbps, the encode parameters should be adjusted to reach the bandwidth Loss of data > 2 2 X Unable to make full use of the < 2 bandwidth How to reach the bandwidth accurately and provide the best video quality ? min D s.t. R £ Rt paras Optimal Rate Control Rate control applications Rate control can be used in various scenarios CBR: Constant Bitrate VBR: Variable Bitrate wire wireless Rate control storage ABR: Average Bitrate Rate distortion optimization Rate distortion optimization min D s.t. R £ Rt Û min D + l(Rt )R paras paras λ determines the optimization target λ Background Rate control Non-normative part of video coding standards Quite widely used: of great importance Considering its importance , rate control algorithm is always included in the reference software of standards MPEG2: TM5 H.263: VM8 H.264: Q-domain rate control HEVC: λ-domain rate control (JCTVC K0103, M0036) Outline 1 Background 2 Related Work 3 Proposed Algorithm 4 Experimental Results 5 Conclusion Related work Q-domain rate control algorithm Yaqin Zhang:Second-order R-Q model The improvement of RMSE in model prediction diminishes as the degree exceeds two.
    [Show full text]
  • Soundstream: an End-To-End Neural Audio Codec Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi
    1 SoundStream: An End-to-End Neural Audio Codec Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi Abstract—We present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs. 80 SoundStream relies on a model architecture composed by a fully EVS convolutional encoder/decoder network and a residual vector SoundStream quantizer, which are trained jointly end-to-end. Training lever- ages recent advances in text-to-speech and speech enhancement, SoundStream - scalable which combine adversarial and reconstruction losses to allow Opus the generation of high-quality audio content from quantized 60 embeddings. By training with structured dropout applied to quantizer layers, a single model can operate across variable bitrates from 3 kbps to 18 kbps, with a negligible quality loss EVS when compared with models trained at fixed bitrates. In addition, MUSHRA score the model is amenable to a low latency implementation, which 40 supports streamable inference and runs in real time on a smartphone CPU. In subjective evaluations using audio at 24 kHz Lyra sampling rate, SoundStream at 3 kbps outperforms Opus at 12 kbps and approaches EVS at 9.6 kbps. Moreover, we are able to Opus perform joint compression and enhancement either at the encoder 20 or at the decoder side with no additional latency, which we 3 6 9 12 demonstrate through background noise suppression for speech. Bitrate (kbps) Fig. 1: SoundStream @3 kbps vs. state-of-the-art codecs. I. INTRODUCTION Audio codecs can be partitioned into two broad categories: machine learning models have been successfully applied in the waveform codecs and parametric codecs.
    [Show full text]