Compressing Variational Posteriors

Total Page:16

File Type:pdf, Size:1020Kb

Compressing Variational Posteriors Compressing Variational Posteriors Yibo Yang∗ Robert Bamler∗ Stephan Mandt UC Irvine UC Irvine UC Irvine [email protected] [email protected] [email protected] Abstract Deep Bayesian latent variable models have enabled new approaches to both model and data compression. Here, we propose a new algorithm for compressing latent representations in deep probabilistic models, such as VAEs, in post-processing. Our algorithm generalizes arithmetic coding to the continuous domain, using adaptive discretization accuracy that exploits posterior uncertainty. Our approach separates model design and training from the compression task, and thus allows for various rate-distortion trade-offs with a single trained model, eliminating the need to train multiple models for different bit rates. We obtain promising experimental results on compressing Bayesian neural word embeddings, and outperform JPEG on image compression over a wide range of bit rates using only a single standard VAE. 1 Introduction and Related Work One natural application of deep generative modeling is data compression. Here we focus on lossy compression of continuous data, which entails quantization, i.e., discretizing arbitrary real numbers to a countable number of representative values, resulting in the classic trade-off between rate (entropy of the discretized values) and distortion (error from discretization). State-of-the-art deep learning based approaches to lossy data compression [1–5] train a machine learning model to perform lossy compression by optimizing a rate-distortion objective, substituting the quantization step with a differentiable approximation during training. Consequently, a new model has to be trained for any desired combination of rate and distortion. To support variable-bitrate compression, one has to train and store several deep learning models for different rates, resulting in possibly very large codec size. As an additional downside, in real-time applications such as video streaming, the codec has to load a new model into memory every time a change in bandwidth requires adjusting the bitrate. By contrast, we consider a different but related problem: given a trained model, what is the best way to encode the information contained in its continuous latent variables? Our proposed solution is a lossy neural compression method that decouples training from compression, and enables variable- bitrate compression with a single model. We generalize the classical entropy-coding algorithm, Arithmetic Coding [6, 7], from discrete to continuous domain. The proposed method, Bayesian Arithmetic Coding, is in essence an adaptive quantization scheme that exploits posterior uncertainty to automatically reduce the accuracy of latent variables for which the model is uncertain anyway. 2 Method and Experimental Results Problem Setup We consider a wide class of generative models, such as variational autoencoders (VAEs), defined by a joint distribution, p(x; z) = p(z) p(xjz), where z 2 RK are continuous latent variables. Although we discuss the unsupervised setting, our framework also captures the supervised setup, in terms of a conditional model for target y given auxiliary input x: p(y; zjx) = p(z) p(yjz; x). ∗Corresponding authors; equal contribution. 1 35 1:0 7 8 (0:111)2 = = 30 0:8 (0:11) = 3=4 2 F<(m) 25 5 Bayesian AC (proposed) (0:101)2 = =8 F (m) 0:6 ≤ 20 Uniform Quantization 1 (0:1)2 = =2 p(m) k-means Quantization 15 3 p(zi) 0:4 JPEG (0:011)2 = =8 µi F (zi) Average PSNR10 (RGB) Ball´eet al. (2017) (0:01) = 1=4 Average MS-SSIM (RGB) 2 q(zi x) σσi σσiii 0:0 0:2 0:4 0:6 0:8 1:0 1:2 1 g(ξ j) ± 0:0 0:2 0:4 0:6 0:8 1:0 1:2 (0:001)2 = =8 i Average Bits per Pixel (BPP) Average Bits per Pixel (BPP) 0 0 1 2 3 4 5 6 7 8 9 10 2 0 2 − Figure 2: Average rate-distortion performance discrete message m latent variable zi R 2 on the Kodak dataset, measured in PSNR and Figure 1: Comparison of standard Arithmetic MS-SSIM [8] (higher is better). Bayesian AC Coding (AC, left) and Bayesian AC (right, pro- (blue, proposed) outperforms JPEG for all tested posed). Both methods use the prior CDF (or- bitrates with a single model. By contrast, [1] ange) to map non-uniformly distributed data to (black squares) relies on individually optimized a number ξ ∼ U(0; 1), and truncate it to finite models for each bitrate that all have to be in- binary precision based on a notion of uncertainty. cluded in the codec. We assume two parties in communication, a sender and a receiver. Both have access to the model p(x; z) = p(z)p(xjz), and given a data sample x, the sender wishes to send it to the receiver as efficiently as possible. The sender first infers a variational distribution q(zjx), then uses our proposed algorithm to select a latent vector z^ that has high probability under q, and that can be encoded in a compressed bitstring, which is sent to the receiver. The receiver losslessly decodes the bitstring back into ^z and uses the likelihood p(xj^z) to generate a reconstructed data point ^x, typically setting ^x = arg maxx p(xj^z). Currently our algorithm assumes factorized prior p(z) and posterior q(zjx). Bayesian Arithmetic Coding Given a discrete source p, standard arithmetic coding (AC) uses its CDF to map a data sample m to an interval Im 2 [0; 1) of width p(m), and uniquely identifies it ^ with a real number ξ 2 Im using only dlog2 p(m)e binary digits. In the continuous case, considering the ith latent variable zi, we make the similar observation that the CDF F (zi) transforms zi ∼ p(zi) into ξi ∼ U(0; 1). In binary, the bits of ξi = (0:b1b2; :::)2 are an infinite sequence of Bernoulli(0:5) random variables and cannot be compressed further. Our goal then, is to truncate ξi into a finite number ^ −1 ^ of bits, corresponding to a value ξi 2 [0; 1), which then identifies a quantized value z^i = F (ξi) in the latent space; we would like z^i to represent the original value zi well, while spending as few bits on ^ ξi as possible. Similar to how standard AC performs this truncation using the width of an “uncertainty interval" Im 2 [0; 1), in the continuous case we consider the posterior uncertainty in zi-space and map it to ξi-space, as illustrated in Figure 1. Approximating the posterior p(zijx) by the variational −1 distribution q(zijx), we consider the (negative) distortion function g(ξi) := q(F (ξi) j x), and search for the optimal ξ^by minimizing the rate-distortion objective: XK ^ XK ^ ^ L(ξ^jx) = − log q(^zjx) + λjξ^j = − log q(^zijx) + λjξij = − log g(ξi) + λjξij i=1 i=1 ^ ^ where λ > 0 controls the rate-distortion trade-off, and jξij is the number of bits for expressing ξi in binary. The optimization decouples across the K dimensions and can be efficiently solved in parallel. ^ −1 ^ K The resulting optimal ξ is transmitted lossessly to the receiver, who then recovers ^z = (F (ξi))i=1. Results on Image Compression We trained a VAE with standard Gaussian prior and diagonal Gaussian posterior, using the same ImageNet dataset and neural architecture as in [1]. See Figure 2 for results, where we compare to Ballé et al. [1], JPEG, and a uniform quantization baseline that ignores posterior variance and quantizes posterior means using a uniformly-spaced grid. Although we fall short of Ballé et al. [1], which represents the end-to-end optimized performance on this task, we outperform JPEG on all bitrates tested, both quantitatively and visually, using a single VAE. Results on Compressing Bayesian Neural Word Embeddings We trained a Bayesian Skip-gram model [9] and applied our method to compress the learned word embeddings. We compare again to baselines based on uniformly quantizating the posterior means, using more sophisticated entropy- coding algorithms such as gzip and bzip2. On all performance metrics considered (semantic and syntactic reasoning tasks [10]), our algorithm outperforms all baselines by a wide margin (full results omitted due to space constraint), and demonstrates its flexibility and potential for model compression. 2 References [1] Johannes Ballé, Valero Laparra, and Eero P Simoncelli. End-to-end optimized image compres- sion. International Conference on Learning Representations, 2017. [2] Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. Variational image compression with a scale hyperprior. In ICLR, 2018. [3] Oren Rippel and Lubomir Bourdev. Real-time adaptive image compression, 2017. [4] Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, and Luc Van Gool. Conditional probability models for deep image compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4394–4402, 2018. [5] Salvator Lombardo, Jun Han, Christopher Schroers, and Stephan Mandt. Deep generative video compression. In Advances in Neural Information Processing Systems, pages 9283–9294, 2019. [6] Ian H Witten, Radford M Neal, and John G Cleary. Arithmetic coding for data compression. Communications of the ACM, 30(6):520–540, 1987. [7] David JC MacKay. Information theory, inference and learning algorithms. Cambridge Univer- sity Press, 2003. [8] Zhou Wang, Eero P Simoncelli, and Alan C Bovik. Multiscale structural similarity for im- age quality assessment. In The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, volume 2, pages 1398–1402. Ieee, 2003. [9] Robert Bamler and Stephan Mandt. Dynamic word embeddings. In International conference on Machine learning, 2017. [10] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed repre- sentations of words and phrases and their compositionality.
Recommended publications
  • Challenges in Relaying Video Back to Mission Control
    Challenges in Relaying Video Back to Mission Control CONTENTS 1 Introduction ..................................................................................................................................................................................... 2 2 Encoding ........................................................................................................................................................................................... 3 3 Time is of the Essence .............................................................................................................................................................. 3 4 The Reality ....................................................................................................................................................................................... 5 5 The Tradeoff .................................................................................................................................................................................... 5 6 Alleviations and Implementation ......................................................................................................................................... 6 7 Conclusion ....................................................................................................................................................................................... 7 White Paper Revision 2 July 2020 By Christopher Fadeley Using a customizable hardware-accelerated encoder is essential to delivering the high
    [Show full text]
  • Hikvision H.264+ Encoding Technology
    WHITE PAPER Hikvision H.264+ Encoding Technology Encoding Improvement / Higher Transmission / Efficiency Storage Savings 2 Contents 1. Introduction .............................................................................................. 3 2. Background ............................................................................................... 3 3. Key Technologies .................................................................................... 4 3.1 Predictive Encoding ........................................................................ 4 3.2 Noise Suppression.......................................................................... 8 3.3 Long-Term Bitrate Control........................................................... 9 4. Applications ............................................................................................ 11 5. Conclusion............................................................................................... 11 Hikvision H.264+ Encoding Technology 3 1. INTRODUCTION As the global market leader in video surveillance products, Hikvision Digital Technology Co., Ltd., continues to strive for enhancement of its products through application of the latest in technology. H.264+ Advanced Video Coding (AVC) optimizes compression beyond the current H.264 standard. Through the combination of intelligent analysis technology with predictive encoding, noise suppression, and long-term bitrate control, Hikvision is meeting the demand for higher resolution at reduced bandwidths. Our customers will benefit
    [Show full text]
  • A Packetization and Variable Bitrate Interframe Compression Scheme for Vector Quantizer-Based Distributed Speech Recognition
    A Packetization and Variable Bitrate Interframe Compression Scheme For Vector Quantizer-Based Distributed Speech Recognition Bengt J. Borgstrom¨ and Abeer Alwan Department of Electrical Engineering, University of California, Los Angeles [email protected], [email protected] Abstract For example, the ETSI standard [2] packetization scheme sim- ply concatenates 2 adjacent 44-bit source coded frames, and We propose a novel packetization and variable bitrate com- transmits the resulting signal along with header information. pression scheme for DSR source coding, based on the Group The proposed algorithm performs further compression using of Pictures concept from video coding. The proposed algorithm interframe correlation. The high time correlation present in simultaneously packetizes and further compresses source coded speech has previously been studied for speech coding [3] and features using the high interframe correlation of speech, and is DSR coding [4]. compatible with a variety of VQ-based DSR source coders. The The proposed algorithm allows for lossless compression of algorithm approximates vector quantizers as Markov Chains, the quantized speech features. However, the algorithm is also and empirically trains the corresponding probability parame- robust to various degrees of lossy compression, either through ters. Feature frames are then compressed as I-frames, P-frames, VQ pruning or frame puncturing. VQ pruning refers to ex- or B-frames, using Huffman tables. The proposed scheme can clusion of low probability VQ codebook labels prior to Huff- perform lossless compression, but is also robust to lossy com- man coding, and thus excludes longer Huffman codewords, pression through VQ pruning or frame puncturing. To illustrate and frame puncturing refers to the non-transmission of certain its effectiveness, we applied the proposed algorithm to the ETSI frames, which drastically reduces the final bitrate.
    [Show full text]
  • AVC to the Max: How to Configure Encoder
    Contents Company overview …. ………………………………………………………………… 3 Introduction…………………………………………………………………………… 4 What is AVC….………………………………………………………………………… 6 Making sense of profiles, levels, and bitrate………………………………………... 7 Group of pictures and its structure..………………………………………………… 11 Macroblocks: partitioning and prediction modes….………………………………. 14 Eliminating spatial redundancy……………………………………………………… 15 Eliminating temporal redundancy……...……………………………………………. 17 Adaptive quantization……...………………………………………………………… 24 Deblocking filtering….….…………………………………………………………….. 26 Entropy encoding…………………………………….……………………………….. 2 8 Conclusion…………………………………………………………………………….. 29 Contact details..………………………………………………………………………. 30 2 www.elecard.com Company overview Elecard company, founded in 1988, is a leading provider of software products for encoding, decoding, processing, monitoring and analysis of video and audio data in 9700 companies various formats. Elecard is a vendor of professional software products and software development kits (SDKs); products for in - depth high - quality analysis and monitoring of the media content; countries 1 50 solutions for IPTV and OTT projects, digital TV broadcasting and video streaming; transcoding servers. Elecard is based in the United States, Russia, and China with 20M users headquarters located in Tomsk, Russia. Elecard products are highly appreciated and widely used by the leaders of IT industry such as Intel, Cisco, Netflix, Huawei, Blackmagic Design, etc. For more information, please visit www.elecard.com. 3 www.elecard.com Introduction Video compression is the key step in video processing. Compression allows broadcasters and premium TV providers to deliver their content to their audience. Many video compression standards currently exist in TV broadcasting. Each standard has different properties, some of which are better suited to traditional live TV while others are more suited to video on demand (VoD). Two basic standards can be identified in the history of video compression: • MPEG-2, a legacy codec used for SD video and early digital broadcasting.
    [Show full text]
  • Adaptive Bitrate Streaming Over Cellular Networks: Rate Adaptation and Data Savings Strategies
    Adaptive Bitrate Streaming Over Cellular Networks: Rate Adaptation and Data Savings Strategies Yanyuan Qin, Ph.D. University of Connecticut, 2021 ABSTRACT Adaptive bitrate streaming (ABR) has become the de facto technique for video streaming over the Internet. Despite a flurry of techniques, achieving high quality ABR streaming over cellular networks remains a tremendous challenge. First, the design of an ABR scheme needs to balance conflicting Quality of Experience (QoE) metrics such as video quality, quality changes, stalls and startup performance, which is even harder under highly dynamic bandwidth in cellular network. Second, streaming providers have been moving towards using Variable Bitrate (VBR) encodings for the video content, which introduces new challenges for ABR streaming, whose nature and implications are little understood. Third, mobile video streaming consumes a lot of data. Although many video and network providers currently offer data saving options, the existing practices are suboptimal in QoE and resource usage. Last, when the audio and video arXiv:2104.01104v2 [cs.NI] 14 May 2021 tracks are stored separately, video and audio rate adaptation needs to be dynamically coordinated to achieve good overall streaming experience, which presents interesting challenges while, somewhat surprisingly, has received little attention by the research community. In this dissertation, we tackle each of the above four challenges. Firstly, we design a framework called PIA (PID-control based ABR streaming) that strategically leverages PID control concepts and novel approaches to account for the various requirements of ABR streaming. The evaluation results demonstrate that PIA outperforms state-of-the-art schemes in providing high average bitrate with signif- icantly lower bitrate changes and stalls, while incurring very small runtime overhead.
    [Show full text]
  • Comments on `Area and Power Efficient DCT Architecture for Image
    Cintra and Bayer EURASIP Journal on Advances in Signal EURASIP Journal on Advances Processing (2017) 2017:50 DOI 10.1186/s13634-017-0486-8 in Signal Processing RESEARCH Open Access Comments on ‘Area and power efficient DCT architecture for image compression’ by Dhandapani and Ramachandran Renato J. Cintra1* and Fábio M. Bayer2 Abstract In [Dhandapani and Ramachandran, “Area and power efficient DCT architecture for image compression”, EURASIP Journal on Advances in Signal Processing 2014, 2014:180] the authors claim to have introduced an approximation for the discrete cosine transform capable of outperforming several well-known approximations in literature in terms of additive complexity. We could not verify the above results and we offer corrections for their work. Keywords: DCT approximations, Low-complexity transforms 1 Introduction 2 Criticisms In a recent paper [1], a low-complexity transformation 2.1 Inverse transformation was introduced, which is claimed to be a good approxima- The authors of [1] claim that inverse transformation T−1 tion to the discrete cosine transform (DCT). We wish to is given by evaluate this claim. ⎡ ⎤ The introduced transformation is given by the following 10001000 ⎢ ⎥ matrix: ⎢ −1100−11 0 0⎥ ⎢ ⎥ ⎢ 00100010⎥ ⎡ ⎤ ⎢ ⎥ 1 ⎢ 00−11 0 0−11⎥ 10000001 · ⎢ ⎥ . ⎢ ⎥ 2 ⎢ 00−11 0 0 1−1 ⎥ ⎢ 11000011⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 001000−10⎥ ⎢ 00100100⎥ ⎣ ⎦ ⎢ ⎥ −11001−10 0 ⎢ 00111100⎥ T = ⎢ ⎥ . 1000−10 0 0 ⎢ 00 1 1−1 −10 0⎥ ⎢ ⎥ ⎢ 00 1 0 0−10 0⎥ ⎣ ⎦ However, simple computation reveal that this is not 110000−1 −1 accurate, being the actual inverse given by: 1000000−1 ⎡ ⎤ 10000001 ⎢ ⎥ We aim at analyzing the above matrix and showing that ⎢ −1100001−1 ⎥ ⎢ ⎥ it does not consist of a meaningful approximation for ⎢ 00100100⎥ ⎢ ⎥ the8-pointDCT.Inthefollowing,weadoptedthesame − 1 ⎢ 00−11 1−10 0⎥ T 1 = · ⎢ ⎥ .
    [Show full text]
  • Neural Rate Control for Video Encoding Using Imitation Learning
    Neural Rate Control for Video Encoding using Imitation Learning Hongzi Mao†? Chenjie Gu† Miaosen Wang† Angie Chen† Nevena Lazic† Nir Levine† Derek Pang* Rene Claus* Marisabel Hechtman* Ching-Han Chiang* Cheng Chen* Jingning Han* †DeepMind *Google ?MIT CSAIL Abstract module manages the distribution of available bandwidth, usually constrained by the network condition. It directly In modern video encoders, rate control is a critical com- determines how many bits to spend to encode each video ponent and has been heavily engineered. It decides how frame, by assigning a quantization parameter (QP). The many bits to spend to encode each frame, in order to opti- goal of rate control is to maximize the encoding efficiency mize the rate-distortion trade-off over all video frames. This (often measured by the Bjontegaard delta rate) as well as is a challenging constrained planning problem because of maintaining the bitrate under a user-specified target bitrate. the complex dependency among decisions for different video Rate control is a constrained planning problem and can frames and the bitrate constraint defined at the end of the be formulated as a Partially Observable Markov Decision episode. Process. For a particular frame, the QP assignment decision We formulate the rate control problem as a Partially depends on (1) the frame’s spatial and temporal complexity, Observable Markov Decision Process (POMDP), and ap- (2) the previously encoded frames this frame refers to, and ply imitation learning to learn a neural rate control pol- (3) the complexity of future frames. In addition, the bitrate icy. We demonstrate that by learning from optimal video constraint imposes an episodic constraint on the planning encoding trajectories obtained through evolution strategies, problem.
    [Show full text]
  • Forensic Reconstruction of Fragmented Variable Bitrate MP3 Files
    CORE Metadata, citation and similar papers at core.ac.uk Provided by ScholarWorks @ The University of New Orleans University of New Orleans ScholarWorks@UNO University of New Orleans Theses and Dissertations Dissertations and Theses 12-17-2010 Forensic Reconstruction of Fragmented Variable Bitrate MP3 files Abhilash Sajja University of New Orleans Follow this and additional works at: https://scholarworks.uno.edu/td Recommended Citation Sajja, Abhilash, "Forensic Reconstruction of Fragmented Variable Bitrate MP3 files" (2010). University of New Orleans Theses and Dissertations. 1258. https://scholarworks.uno.edu/td/1258 This Thesis is protected by copyright and/or related rights. It has been brought to you by ScholarWorks@UNO with permission from the rights-holder(s). You are free to use this Thesis in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights- holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/or on the work itself. This Thesis has been accepted for inclusion in University of New Orleans Theses and Dissertations by an authorized administrator of ScholarWorks@UNO. For more information, please contact [email protected]. Forensic Reconstruction of Fragmented Variable Bitrate MP3 files A Thesis Submitted to the Graduate Faculty of the University of New Orleans In partial fulfillment of the requirements for the degree of Master of Science in Computer Science by Abhilash Sajja December, 2010 Acknowledgements First I would like to thank my parents for their support and encouragement. Special thanks to my advisor Dr.Vassil Roussev for his guidance and feedback without which this thesis would not have been possible.
    [Show full text]
  • A/330, "Link Layer Protocol"
    ATSC A/330:2016 Link-Layer Protocol 19 September 2016 ATSC Standard: Link-Layer Protocol (A/330) Doc. A/330:2016 19 September 2016 Advanced Television Systems Committee 1776 K Street, N.W. Washington, D.C. 20006 202-872-9160 i ATSC A/330:2016 Link-Layer Protocol 19 September 2016 The Advanced Television Systems Committee, Inc., is an international, non-profit organization developing voluntary standards and recommended practices for digital television. ATSC member organizations represent the broadcast, broadcast equipment, motion picture, consumer electronics, computer, cable, satellite, and semiconductor industries. ATSC also develops digital television implementation strategies and supports educational activities on ATSC standards. ATSC was formed in 1983 by the member organizations of the Joint Committee on InterSociety Coordination (JCIC): the Electronic Industries Association (EIA), the Institute of Electrical and Electronic Engineers (IEEE), the National Association of Broadcasters (NAB), the National Cable Telecommunications Association (NCTA), and the Society of Motion Picture and Television Engineers (SMPTE). For more information visit www.atsc.org. Note: The user's attention is called to the possibility that compliance with this standard may require use of an invention covered by patent rights. By publication of this standard, no position is taken with respect to the validity of this claim or of any patent rights in connection therewith. One or more patent holders have, however, filed a statement regarding the terms on which such patent holder(s) may be willing to grant a license under these rights to individuals or entities desiring to obtain such a license. Details may be obtained from the ATSC Secretary and the patent holder.
    [Show full text]
  • A Psychoacoustic-Based Multiple Audio Object Coding Approach Via Intra- Object Sparsity
    University of Wollongong Research Online Faculty of Engineering and Information Faculty of Engineering and Information Sciences - Papers: Part B Sciences 2017 A psychoacoustic-based multiple audio object coding approach via intra- object sparsity Maoshen Jia Beijing University of Technology Jiaming Zhang Beijing University of Technology Changchun Bao Beijing University of Technology, [email protected] Xiguang Zheng University of Wollongong, Dolby Laboratories, [email protected] Follow this and additional works at: https://ro.uow.edu.au/eispapers1 Part of the Engineering Commons, and the Science and Technology Studies Commons Recommended Citation Jia, Maoshen; Zhang, Jiaming; Bao, Changchun; and Zheng, Xiguang, "A psychoacoustic-based multiple audio object coding approach via intra-object sparsity" (2017). Faculty of Engineering and Information Sciences - Papers: Part B. 882. https://ro.uow.edu.au/eispapers1/882 Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: [email protected] A psychoacoustic-based multiple audio object coding approach via intra-object sparsity Abstract Rendering spatial sound scenes via audio objects has become popular in recent years, since it can provide more flexibility for different auditory scenarios, such as 3D movies, spatial audio communication and virtual classrooms. To facilitate high-quality bitrate-efficient distribution for spatial audio objects, an encoding scheme based on intra-object sparsity (approximate k-sparsity of the audio object itself) is proposed in this paper. The statistical analysis is presented to validate the notion that the audio object has a stronger sparseness in the Modified Discrete Cosine Transform (MDCT) domain than in the Short Time Fourier Transform (STFT) domain.
    [Show full text]
  • NETWORK VIDEO ENCODER User Manual
    NETWORK VIDEO ENCODER User Manual SPE-110 Network Video Encoder User Manual Copyright ©2018 Hanwha Techwin Co., Ltd. All rights reserved. Trademark Each of trademarks herein is registered. The name of this product and other trademarks mentioned in this manual are the registered trademark of their respective company. Restriction Copyright of this document is reserved. Under no circumstances, this document shall be reproduced, distributed or changed, partially or wholly, without formal authorization. Disclaimer Hanwha Techwin makes the best to verify the integrity and correctness of the contents in this document, but no formal guarantee shall be provided. Use of this document and the subsequent results shall be entirely on the user’s own responsibility. Hanwha Techwin reserves the right to change the contents of this document without prior notice. Design and specifications are subject to change without prior notice. The initial administrator ID is “admin” and the password should be set when logging in for the first time. Please change your password every three months to safely protect personal information and to prevent the damage of the information theft. Please, take note that it’s a user’s responsibility for the security and any other problems caused by mismanaging a password. overview IMPORTANT SAFETY INSTRUCTIONS WARNING TO REDUCE THE RISK OF FIRE OR ELECTRIC SHOCK, DO NOT EXPOSE THIS PRODUCT 1. Read these instructions. TO RAIN OR MOISTURE. DO NOT INSERT ANY METALLIC OBJECT THROUGH THE 2. Keep these instructions. VENTILATION GRILLS OR OTHER OPENNINGS ON THE EQUIPMENT. ● Heed all warnings. ● 3. Apparatus shall not be exposed to dripping or splashing and that no objects filled with liquids, OVERVIEW 4.
    [Show full text]
  • Constant Bit-Rate Control Efficiency with Fast Motion Estimation in H.264/Avc Video Coding Standard
    CONSTANT BIT-RATE CONTROL EFFICIENCY WITH FAST MOTION ESTIMATION IN H.264/AVC VIDEO CODING STANDARD Daniele Alfonso, Daniele Bagni, Luca Celetto and Simone Milani* Advanced System Technology labs, STMicroelectronics, Agrate Brianza, Italy (Europe) * Department of Information Engineering, University of Padova, Italy (Europe) emails: [email protected], [email protected] ABSTRACT greatly reduces the computation with respect to FSBM with- The emerging H.264/AVC video coding standard provides out visible quality losses at SD-TV resolution. ρ significant enhancements in compression efficiency with The JVT-D030/E069 and -domain algorithms are pre- respect to its ancestors in the MPEG and H.26x families. In sented in sections 2 and 3 respectively, whereas section 4 this paper, we analyse the performance of two different Con- introduces the motion estimation method. The numerical stant Bit-Rate control methods, suitable for H.264/AVC en- results are showed in section 5, followed by our conclusions coding at SD-TV resolution, where the motion estimation is in section 6. performed by a fast proprietary predictive algorithm. 2. JVT-D030/E069 RATE CONTROL 1. INTRODUCTION The JVT-D030/E069 [4, 5] is a CBR control method, devel- The H.264/AVC video coding standard [1] greatly outper- oped from the widely known TM5 [6], originally conceived forms its ancestors in the MPEG and H.26x families, provid- for MPEG-2 video coding. The main difference between the ing about 50% more compression at equal quality [2, 3] in two algorithms consists in the definition of luminance mac- all kinds of applications, from low-bitrate streaming to high- roblock activity, which is the Sum of Absolute Differences quality storage.
    [Show full text]