Compression Method LZFSE Student: Martin Hron Supervisor: Ing

Total Page:16

File Type:pdf, Size:1020Kb

Compression Method LZFSE Student: Martin Hron Supervisor: Ing ASSIGNMENT OF BACHELOR’S THESIS Title: Compression method LZFSE Student: Martin Hron Supervisor: Ing. Jan Baier Study Programme: Informatics Study Branch: Computer Science Department: Department of Theoretical Computer Science Validity: Until the end of summer semester 2018/19 Instructions Introduce yourself with LZ family compression methods. Analyze compression method LZFSE [1] and explain its main enhancements. Implement this method and/or these enhancements into the ExCom library [2], perform evaluation tests using the standard test suite and compare performance with other implemented LZ methods. References [1] https://github.com/lzfse/lzfse [2] http://www.stringology.org/projects/ExCom/ doc. Ing. Jan Janoušek, Ph.D. doc. RNDr. Ing. Marcel Jiřina, Ph.D. Head of Department Dean Prague January 17, 2018 Bachelor’s thesis Compression method LZFSE Martin Hron Department of Theoretical Computer Science Supervisor: Ing. Jan Baier May 12, 2018 Acknowledgements I would like to thank my supervisor, Ing. Jan Baier, for guiding this thesis and for all his valuable advice. Declaration I hereby declare that the presented thesis is my own work and that I have cited all sources of information in accordance with the Guideline for adhering to ethical principles when elaborating an academic final thesis. I acknowledge that my thesis is subject to the rights and obligations stip- ulated by the Act No. 121/2000 Coll., the Copyright Act, as amended. In accordance with Article 46(6) of the Act, I hereby grant a nonexclusive au- thorization (license) to utilize this thesis, including any and all computer pro- grams incorporated therein or attached thereto and all corresponding docu- mentation (hereinafter collectively referred to as the “Work”), to any and all persons that wish to utilize the Work. Such persons are entitled to use the Work in any way (including for-profit purposes) that does not detract from its value. This authorization is not limited in terms of time, location and quan- tity. However, all persons that makes use of the above license shall be obliged to grant a license at least in the same scope as defined above with respect to each and every work that is created (wholly or in part) based on the Work, by modifying the Work, by combining the Work with another work, by including the Work in a collection of works or by adapting the Work (including trans- lation), and at the same time make available the source code of such work at least in a way and scope that are comparable to the way and scope in which the source code of the Work is made available. In Prague on May 12, 2018 . Czech Technical University in Prague Faculty of Information Technology c 2018 Martin Hron. All rights reserved. This thesis is school work as defined by Copyright Act of the Czech Republic. It has been submitted at Czech Technical University in Prague, Faculty of Information Technology. The thesis is protected by the Copyright Act and its usage without author’s permission is prohibited (with exceptions defined by the Copyright Act). Citation of this thesis Hron, Martin. Compression method LZFSE. Bachelor’s thesis. Czech Techni- cal University in Prague, Faculty of Information Technology, 2018. Abstract This thesis focuses on LZFSE compression method, which combines a dictio- nary compression scheme with a technique based on ANS (asymmetric nu- meral systems). It describes the principles on which the method works and analyses the reference implementation of LZFSE by Eric Bainville. As part of this thesis, the LZFSE method is added as a new module into the ExCom library and compared with other implemented compression methods using the files from the Prague Corpus. The impacts that different settings of adjustable LZFSE parameters have are also examined. Keywords LZFSE, ExCom library, data compression, dictionary compres- sion methods, lossless compression, finite state entropy, asymmetric numeral systems vii Abstrakt Tato pr´acese zab´yv´akompresn´ımetodou LZFSE, kter´akombinuje slovn´ıkovou kompresi s technikou zaloˇzenouna ANS (asymmetric numeral systems). Pr´ace popisuje principy, na kter´ych tato metoda funguje, a analyzuje referenˇcn´ıim- plementaci, jej´ıˇzautorem je Eric Bainville. V r´amcit´etopr´aceje metoda LZFSE pˇrid´anajako nov´ymodul do knihovny ExCom a porovn´anas ostatn´ımi implementovan´ymimetodami na datech Praˇzsk´ehoKorpusu. D´aleje prozkoum´anvliv nastaviteln´ych parametr˚umetody LZFSE. Kl´ıˇcov´a slova LZFSE, knihovna ExCom, komprese dat, slovn´ıkov´ekom- presn´ımetody, bezeztr´atov´akomprese, finite state entropy, asymmetric nu- meral systems viii Contents Introduction 1 1 Data compression 3 1.1 Basic data compression concepts . 3 1.2 Information entropy and redundancy . 3 1.3 Classification of compression methods . 4 1.4 Measures of performance . 6 1.5 ExCom library . 7 1.6 Hash function and hash table . 7 2 LZ family algorithms 9 2.1 LZ77 . 9 2.2 LZ78 . 11 3 Asymmetric numeral systems 15 3.1 Entropy coding . 15 3.2 Asymmetric numeral systems . 16 3.3 Finite state entropy . 19 4 LZFSE 21 4.1 Compression . 21 4.2 Decompression . 29 5 Implementation 33 5.1 Implementation of LZFSE module . 33 6 Benchmarks 37 6.1 Methodology . 37 6.2 Testing platform specifications . 37 6.3 Results . 38 ix Conclusion 51 Bibliography 53 A Acronyms 55 B Reference implementation 57 B.1 Source files . 57 B.2 The API functions . 59 C Prague Corpus files 61 D Building the ExCom library 63 E Scripts used for benchmarking 65 F Detailed benchmark results 69 F.1 Tables . 70 F.2 Additional graphs . 76 G Contents of enclosed CD 79 x List of Figures 2.1 LZ77 sliding window . 9 2.2 Example of LZ77 compression . 11 2.3 Example of LZ77 decompression . 12 3.1 Standard numeral system and ANS . 18 4.1 LZFSE history table . 24 6.1 Comparison of compression time of LZFSE, LZ78 and LZW . 39 6.2 Comparison of decompression time of LZFSE, LZ78 and LZW . 39 6.3 Comparison of compression ratio of dictionary methods . 41 6.4 Comparison of compression time of LZFSE and non-dictionary methods . 43 6.5 Comparison of decompression time of LZFSE and non-dictionary methods . 43 6.6 Comparison of compression ratio of LZFSE and non-dictionary methods . 44 6.7 Comparison of compression time and compression ratio of all tested methods on flower file . 46 6.8 Comparison of decompression time and compression ratio of all tested methods on flower file . 46 6.9 Impact of good match parameter on compression time and ratio . 47 6.10 Impact of hash bits parameter on compression time and ratio . 48 F.1 Comparison of compression time of LZFSE, LZ78 and LZW on all files . 76 F.2 Comparison of decompression time of LZFSE, LZ78 and LZW on all files . 77 F.3 Comparison of compression ratio of LZFSE, LZ78 and LZMW on all files . 78 xi List of Tables C.1 The Prague Corpus files [19] . 61 F.1 Compression time of dictionary methods . 70 F.2 Decompression time of dictionary methods . 71 F.3 Compression ratio of dictionary methods . 72 F.4 Compression time of non-dictionary methods . 73 F.5 Decompression time of non-dictionary methods . 74 F.6 Compression ratio of non-dictionary methods . 75 List of Algorithms 1 LZ77 compression . 10 2 LZ77 decompression . 11 3 LZ78 compression [12] . 13 4 LZ78 decompression . 13 5 FSE symbol encoding . 29 6 LZFSE match decoding step . 31 7 Decoding of a literal using FSE . 32 8 Decoding of one L, M or D value using FSE . 32 xiii Introduction Motivation The amount of computer data produced by various information systems and applications is huge and grows every day. Therefore, vast quantity of data needs to be transmitted over the network or stored on some medium. To speed up data transmission and conserve storage space it is useful to reduce size of computer data while still maintaining the information that is contained in it. This is the purpose of data compression. Compression achieves data size reduction by decreasing redundancy and changing information representation to one that is more efficient. Data compression is used very commonly even by non-professional com- puter users. Often we want to archive some files, transfer them to another computer or share them with other users. In those common cases, compres- sion ratio (i.e. how much space is saved) and speed are typically more or less equally important, so we generally need to make a compromise between speed and compression quality. Also for compressing general files, a lossless method is usually required so that no vital information is lost. For example, ZIP is frequently used archive file format that is probably known to most computer users. ZIP archives are usually compressed using a lossless algorithm called DEFLATE, which is designed to keep good balance between compression ratio and speed. LZFSE is a new open source data compression algorithm developed by Apple Inc. It is designed for similar purpose as DEFLATE algorithm but is claimed to be significantly faster (up to three times) and to use less resources while preserving comparable compression ratio. LZFSE combines dictionary compression with an implementation of asymmetric numeral systems. Asym- metric numeral systems is a recent entropy coding method based on the work of Jaros law Duda, which aims to end the tradeoff between compression ratio and speed. 1 Introduction Various compression methods are collected in the ExCom library. ExCom is a library developed as part of Filip Simek’sˇ thesis on CTU in 2009, which supports benchmarking and experimenting with compression methods. ExCom contains many dictionary algorithms. However, it does not yet contain the LZFSE method, nor any other method that would use asymmetric numeral systems. Integrating LZFSE into the ExCom library will allow to run, experi- ment with this method and perform benchmarks on it to confirm its enhance- ments.
Recommended publications
  • Data Compression: Dictionary-Based Coding 2 / 37 Dictionary-Based Coding Dictionary-Based Coding
    Dictionary-based Coding already coded not yet coded search buffer look-ahead buffer cursor (N symbols) (L symbols) We know the past but cannot control it. We control the future but... Last Lecture Last Lecture: Predictive Lossless Coding Predictive Lossless Coding Simple and effective way to exploit dependencies between neighboring symbols / samples Optimal predictor: Conditional mean (requires storage of large tables) Affine and Linear Prediction Simple structure, low-complex implementation possible Optimal prediction parameters are given by solution of Yule-Walker equations Works very well for real signals (e.g., audio, images, ...) Efficient Lossless Coding for Real-World Signals Affine/linear prediction (often: block-adaptive choice of prediction parameters) Entropy coding of prediction errors (e.g., arithmetic coding) Using marginal pmf often already yields good results Can be improved by using conditional pmfs (with simple conditions) Heiko Schwarz (Freie Universität Berlin) — Data Compression: Dictionary-based Coding 2 / 37 Dictionary-based Coding Dictionary-Based Coding Coding of Text Files Very high amount of dependencies Affine prediction does not work (requires linear dependencies) Higher-order conditional coding should work well, but is way to complex (memory) Alternative: Do not code single characters, but words or phrases Example: English Texts Oxford English Dictionary lists less than 230 000 words (including obsolete words) On average, a word contains about 6 characters Average codeword length per character would be limited by 1
    [Show full text]
  • Package 'Brotli'
    Package ‘brotli’ May 13, 2018 Type Package Title A Compression Format Optimized for the Web Version 1.2 Description A lossless compressed data format that uses a combination of the LZ77 algorithm and Huffman coding. Brotli is similar in speed to deflate (gzip) but offers more dense compression. License MIT + file LICENSE URL https://tools.ietf.org/html/rfc7932 (spec) https://github.com/google/brotli#readme (upstream) http://github.com/jeroen/brotli#read (devel) BugReports http://github.com/jeroen/brotli/issues VignetteBuilder knitr, R.rsp Suggests spelling, knitr, R.rsp, microbenchmark, rmarkdown, ggplot2 RoxygenNote 6.0.1 Language en-US NeedsCompilation yes Author Jeroen Ooms [aut, cre] (<https://orcid.org/0000-0002-4035-0289>), Google, Inc [aut, cph] (Brotli C++ library) Maintainer Jeroen Ooms <[email protected]> Repository CRAN Date/Publication 2018-05-13 20:31:43 UTC R topics documented: brotli . .2 Index 4 1 2 brotli brotli Brotli Compression Description Brotli is a compression algorithm optimized for the web, in particular small text documents. Usage brotli_compress(buf, quality = 11, window = 22) brotli_decompress(buf) Arguments buf raw vector with data to compress/decompress quality value between 0 and 11 window log of window size Details Brotli decompression is at least as fast as for gzip while significantly improving the compression ratio. The price we pay is that compression is much slower than gzip. Brotli is therefore most effective for serving static content such as fonts and html pages. For binary (non-text) data, the compression ratio of Brotli usually does not beat bz2 or xz (lzma), however decompression for these algorithms is too slow for browsers in e.g.
    [Show full text]
  • Third Party Software Component List: Targeted Use: Briefcam® Fulfillment of License Obligation for All Open Sources: Yes
    Third Party Software Component List: Targeted use: BriefCam® Fulfillment of license obligation for all open sources: Yes Name Link and Copyright Notices Where Available License Type OpenCV https://opencv.org/license.html 3-Clause Copyright (C) 2000-2019, Intel Corporation, all BSD rights reserved. Copyright (C) 2009-2011, Willow Garage Inc., all rights reserved. Copyright (C) 2009-2016, NVIDIA Corporation, all rights reserved. Copyright (C) 2010-2013, Advanced Micro Devices, Inc., all rights reserved. Copyright (C) 2015-2016, OpenCV Foundation, all rights reserved. Copyright (C) 2015-2016, Itseez Inc., all rights reserved. Apache Logging http://logging.apache.org/log4cxx/license.html Apache Copyright © 1999-2012 Apache Software Foundation License V2 Google Test https://github.com/abseil/googletest/blob/master/google BSD* test/LICENSE Copyright 2008, Google Inc. SAML 2.0 component for https://github.com/jitbit/AspNetSaml/blob/master/LICEN MIT ASP.NET SE Copyright 2018 Jitbit LP Nvidia Video Codec https://github.com/lu-zero/nvidia-video- MIT codec/blob/master/LICENSE Copyright (c) 2016 NVIDIA Corporation FFMpeg 4 https://www.ffmpeg.org/legal.html LesserGPL FFmpeg is a trademark of Fabrice Bellard, originator v2.1 of the FFmpeg project 7zip.exe https://www.7-zip.org/license.txt LesserGPL 7-Zip Copyright (C) 1999-2019 Igor Pavlov v2.1/3- Clause BSD Infralution.Localization.Wp http://www.codeproject.com/info/cpol10.aspx CPOL f Copyright (C) 2018 Infralution Pty Ltd directShowlib .net https://github.com/pauldotknopf/DirectShow.NET/blob/ LesserGPL
    [Show full text]
  • ROOT I/O Compression Improvements for HEP Analysis
    EPJ Web of Conferences 245, 02017 (2020) https://doi.org/10.1051/epjconf/202024502017 CHEP 2019 ROOT I/O compression improvements for HEP analysis Oksana Shadura1;∗ Brian Paul Bockelman2;∗∗ Philippe Canal3;∗∗∗ Danilo Piparo4;∗∗∗∗ and Zhe Zhang1;y 1University of Nebraska-Lincoln, 1400 R St, Lincoln, NE 68588, United States 2Morgridge Institute for Research, 330 N Orchard St, Madison, WI 53715, United States 3Fermilab, Kirk Road and Pine St, Batavia, IL 60510, United States 4CERN, Meyrin 1211, Geneve, Switzerland Abstract. We overview recent changes in the ROOT I/O system, enhancing it by improving its performance and interaction with other data analysis ecosys- tems. Both the newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques have the potential to significantly improve experiment’s software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically in- creased over the last couple of years. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly, because there are significant trade-offs between the increased CPU cost for reading and writing files and the reduces storage space. 1 Introduction In the past years, Large Hadron Collider (LHC) experiments are managing about an exabyte of storage for analysis purposes, approximately half of which is stored on tape storages for archival purposes, and half is used for traditional disk storage. Meanwhile for High Lumi- nosity Large Hadron Collider (HL-LHC) storage requirements per year are expected to be increased by a factor of 10 [1].
    [Show full text]
  • Arxiv:2004.10531V1 [Cs.OH] 8 Apr 2020
    ROOT I/O compression improvements for HEP analysis Oksana Shadura1;∗ Brian Paul Bockelman2;∗∗ Philippe Canal3;∗∗∗ Danilo Piparo4;∗∗∗∗ and Zhe Zhang1;y 1University of Nebraska-Lincoln, 1400 R St, Lincoln, NE 68588, United States 2Morgridge Institute for Research, 330 N Orchard St, Madison, WI 53715, United States 3Fermilab, Kirk Road and Pine St, Batavia, IL 60510, United States 4CERN, Meyrin 1211, Geneve, Switzerland Abstract. We overview recent changes in the ROOT I/O system, increasing per- formance and enhancing it and improving its interaction with other data analy- sis ecosystems. Both the newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques have the potential to significantly to improve experiment’s software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically in- creased during the LHC era. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly: there are sig- nificant trade-offs between the increased CPU cost for reading and writing files and the reduce storage space. 1 Introduction In the past years LHC experiments are commissioned and now manages about an exabyte of storage for analysis purposes, approximately half of which is used for archival purposes, and half is used for traditional disk storage. Meanwhile for HL-LHC storage requirements per year are expected to be increased by factor 10 [1]. arXiv:2004.10531v1 [cs.OH] 8 Apr 2020 Looking at these predictions, we would like to state that storage will remain one of the major cost drivers and at the same time the bottlenecks for HEP computing.
    [Show full text]
  • Arithmetic Coding
    Arithmetic Coding Arithmetic coding is the most efficient method to code symbols according to the probability of their occurrence. The average code length corresponds exactly to the possible minimum given by information theory. Deviations which are caused by the bit-resolution of binary code trees do not exist. In contrast to a binary Huffman code tree the arithmetic coding offers a clearly better compression rate. Its implementation is more complex on the other hand. In arithmetic coding, a message is encoded as a real number in an interval from one to zero. Arithmetic coding typically has a better compression ratio than Huffman coding, as it produces a single symbol rather than several separate codewords. Arithmetic coding differs from other forms of entropy encoding such as Huffman coding in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, a fraction n where (0.0 ≤ n < 1.0) Arithmetic coding is a lossless coding technique. There are a few disadvantages of arithmetic coding. One is that the whole codeword must be received to start decoding the symbols, and if there is a corrupt bit in the codeword, the entire message could become corrupt. Another is that there is a limit to the precision of the number which can be encoded, thus limiting the number of symbols to encode within a codeword. There also exist many patents upon arithmetic coding, so the use of some of the algorithms also call upon royalty fees. Arithmetic coding is part of the JPEG data format.
    [Show full text]
  • The Basic Principles of Data Compression
    The Basic Principles of Data Compression Author: Conrad Chung, 2BrightSparks Introduction Internet users who download or upload files from/to the web, or use email to send or receive attachments will most likely have encountered files in compressed format. In this topic we will cover how compression works, the advantages and disadvantages of compression, as well as types of compression. What is Compression? Compression is the process of encoding data more efficiently to achieve a reduction in file size. One type of compression available is referred to as lossless compression. This means the compressed file will be restored exactly to its original state with no loss of data during the decompression process. This is essential to data compression as the file would be corrupted and unusable should data be lost. Another compression category which will not be covered in this article is “lossy” compression often used in multimedia files for music and images and where data is discarded. Lossless compression algorithms use statistic modeling techniques to reduce repetitive information in a file. Some of the methods may include removal of spacing characters, representing a string of repeated characters with a single character or replacing recurring characters with smaller bit sequences. Advantages/Disadvantages of Compression Compression of files offer many advantages. When compressed, the quantity of bits used to store the information is reduced. Files that are smaller in size will result in shorter transmission times when they are transferred on the Internet. Compressed files also take up less storage space. File compression can zip up several small files into a single file for more convenient email transmission.
    [Show full text]
  • Information Theory Revision (Source)
    ELEC3203 Digital Coding and Transmission – Overview & Information Theory S Chen Information Theory Revision (Source) {S(k)} {b i } • Digital source is defined by digital source source coding 1. Symbol set: S = {mi, 1 ≤ i ≤ q} symbols/s bits/s 2. Probability of occurring of mi: pi, 1 ≤ i ≤ q 3. Symbol rate: Rs [symbols/s] 4. Interdependency of {S(k)} • Information content of alphabet mi: I(mi) = − log2(pi) [bits] • Entropy: quantifies average information conveyed per symbol q – Memoryless sources: H = − pi · log2(pi) [bits/symbol] i=1 – 1st-order memory (1st-order Markov)P sources with transition probabilities pij q q q H = piHi = − pi pij · log2(pij) [bits/symbol] Xi=1 Xi=1 Xj=1 • Information rate: tells you how many bits/s information the source really needs to send out – Information rate R = Rs · H [bits/s] • Efficient source coding: get rate Rb as close as possible to information rate R – Memoryless source: apply entropy coding, such as Shannon-Fano and Huffman, and RLC if source is binary with most zeros – Generic sources with memory: remove redundancy first, then apply entropy coding to “residauls” 86 ELEC3203 Digital Coding and Transmission – Overview & Information Theory S Chen Practical Source Coding • Practical source coding is guided by information theory, with practical constraints, such as performance and processing complexity/delay trade off • When you come to practical source coding part, you can smile – as you should know everything • As we will learn, data rate is directly linked to required bandwidth, source coding is to encode source with a data rate as small as possible, i.e.
    [Show full text]
  • I Introduction
    PPM Performance with BWT Complexity: A New Metho d for Lossless Data Compression Michelle E ros California Institute of Technology e [email protected] Abstract This work combines a new fast context-search algorithm with the lossless source co ding mo dels of PPM to achieve a lossless data compression algorithm with the linear context-search complexity and memory of BWT and Ziv-Lemp el co des and the compression p erformance of PPM-based algorithms. Both se- quential and nonsequential enco ding are considered. The prop osed algorithm yields an average rate of 2.27 bits per character bp c on the Calgary corpus, comparing favorably to the 2.33 and 2.34 bp c of PPM5 and PPM and the 2.43 bp c of BW94 but not matching the 2.12 bp c of PPMZ9, which, at the time of this publication, gives the greatest compression of all algorithms rep orted on the Calgary corpus results page. The prop osed algorithm gives an average rate of 2.14 bp c on the Canterbury corpus. The Canterbury corpus web page gives average rates of 1.99 bp c for PPMZ9, 2.11 bp c for PPM5, 2.15 bp c for PPM7, and 2.23 bp c for BZIP2 a BWT-based co de on the same data set. I Intro duction The Burrows Wheeler Transform BWT [1] is a reversible sequence transformation that is b ecoming increasingly p opular for lossless data compression. The BWT rear- ranges the symb ols of a data sequence in order to group together all symb ols that share the same unb ounded history or \context." Intuitively, this op eration is achieved by forming a table in which each row is a distinct cyclic shift of the original data string.
    [Show full text]
  • Dc5m United States Software in English Created at 2016-12-25 16:00
    Announcement DC5m United States software in english 1 articles, created at 2016-12-25 16:00 articles set mostly positive rate 10.0 1 3.8 Google’s Brotli Compression Algorithm Lands to Windows Edge Microsoft has announced that its Edge browser has started using Brotli, the compression algorithm that Google open-sourced last year. 2016-12-25 05:00 1KB www.infoq.com Articles DC5m United States software in english 1 articles, created at 2016-12-25 16:00 1 /1 3.8 Google’s Brotli Compression Algorithm Lands to Windows Edge Microsoft has announced that its Edge browser has started using Brotli, the compression algorithm that Google open-sourced last year. Brotli is on by default in the latest Edge build and can be previewed via the Windows Insider Program. It will reach stable status early next year, says Microsoft. Microsoft touts a 20% higher compression ratios over comparable compression algorithms, which would benefit page load times without impacting client-side CPU costs. According to Google, Brotli uses a whole new data format , which makes it incompatible with Deflate but ensures higher compression ratios. In particular, Google says, Brotli is roughly as fast as zlib when decompressing and provides a better compression ratio than LZMA and bzip2 on the Canterbury Corpus. Brotli appears to be especially tuned for the web , that is for offline encoding and online decoding of Web assets, or Android APKs. Google claims a compression ratio improvement of 20–26% over its own Zopfli algorithm, which still provides the best compression ratio of any deflate algorithm.
    [Show full text]
  • Probability Interval Partitioning Entropy Codes Detlev Marpe, Senior Member, IEEE, Heiko Schwarz, and Thomas Wiegand, Senior Member, IEEE
    SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY 1 Probability Interval Partitioning Entropy Codes Detlev Marpe, Senior Member, IEEE, Heiko Schwarz, and Thomas Wiegand, Senior Member, IEEE Abstract—A novel approach to entropy coding is described that entropy coding while the assignment of codewords to symbols provides the coding efficiency and simple probability modeling is the actual entropy coding. For decades, two methods have capability of arithmetic coding at the complexity level of Huffman dominated practical entropy coding: Huffman coding that has coding. The key element of the proposed approach is given by a partitioning of the unit interval into a small set of been invented in 1952 [8] and arithmetic coding that goes back disjoint probability intervals for pipelining the coding process to initial ideas attributed to Shannon [7] and Elias [9] and along the probability estimates of binary random variables. for which first practical schemes have been published around According to this partitioning, an input sequence of discrete 1976 [10][11]. Both entropy coding methods are capable of source symbols with arbitrary alphabet sizes is mapped to a approximating the entropy limit (in a certain sense) [12]. sequence of binary symbols and each of the binary symbols is assigned to one particular probability interval. With each of the For a fixed probability mass function, Huffman codes are intervals being represented by a fixed probability, the probability relatively easy to construct. The most attractive property of interval partitioning entropy (PIPE) coding process is based on Huffman codes is that their implementation can be efficiently the design and application of simple variable-to-variable length realized by the use of variable-length code (VLC) tables.
    [Show full text]
  • Fast Algorithm for PQ Data Compression Using Integer DTCWT and Entropy Encoding
    International Journal of Applied Engineering Research ISSN 0973-4562 Volume 12, Number 22 (2017) pp. 12219-12227 © Research India Publications. http://www.ripublication.com Fast Algorithm for PQ Data Compression using Integer DTCWT and Entropy Encoding Prathibha Ekanthaiah 1 Associate Professor, Department of Electrical and Electronics Engineering, Sri Krishna Institute of Technology, No 29, Chimney hills Chikkabanavara post, Bangalore-560090, Karnataka, India. Orcid Id: 0000-0003-3031-7263 Dr.A.Manjunath 2 Principal, Sri Krishna Institute of Technology, No 29, Chimney hills Chikkabanavara post, Bangalore-560090, Karnataka, India. Orcid Id: 0000-0003-0794-8542 Dr. Cyril Prasanna Raj 3 Dean & Research Head, Department of Electronics and communication Engineering, MS Engineering college , Navarathna Agrahara, Sadahalli P.O., Off Bengaluru International Airport,Bengaluru - 562 110, Karnataka, India. Orcid Id: 0000-0002-9143-7755 Abstract metering infrastructures (smart metering), integration of distributed power generation, renewable energy resources and Smart meters are an integral part of smart grid which in storage units as well as high power quality and reliability [1]. addition to energy management also performs data By using smart metering Infrastructure sustains the management. Power Quality (PQ) data from smart meters bidirectional data transfer and also decrease in the need to be compressed for both storage and transmission environmental effects. With this resilience and reliability of process either through wired or wireless medium. In this power utility network can be improved effectively. Work paper, PQ data compression is carried out by encoding highlights the need of development and technology significant features captured from Dual Tree Complex encroachment in smart grid communications [2].
    [Show full text]