Understanding Compression

Total Pages: 16

File Type: PDF, Size: 1020 KB

Understanding Compression: Data Compression for Modern Developers
by Colt McAnlis and Aleks Haecky

Copyright © 2016 Colton McAnlis and Aleks Haecky. All rights reserved. Printed in the United States of America. Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Editor: Tim McGovern
Production Editor: Melanie Yarbrough
Copyeditor: Octal Publishing, Inc.
Proofreader: Jasmine Kwityn
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Melanie Yarbrough

July 2016: First Edition
Revision History for the First Edition: 2016-07-11: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781491961537 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Understanding Compression, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-96153-7 [LSI]

From CLM:
To JAM and MLM: I swear to Zuul, that if you don't eat your broccoli right now, I'm going to write a book. And in the dedication of that book, I'm going to call you out as being afraid of a piece of foliage that humans have been eating for thousands of generations. Then, 20 years from now, when you have kids of your own, I'm going to pull that book out, and show you what I wrote, and laugh in your face, because you'll know how crazy you're making me right now. #parenting
To KMKM: How about another decade, just for good measure?

From AH:
To AHS and GHS: I hoped you'd learn to cook. Instead, you proved that humankind can survive on fresh apples and stale supermarket sushi.

Table of Contents

Foreword
Preface
  Chapter Synopsis
1. Let's Not Be Boring
  The Five Buckets of Compression Algorithms
  Claude Shannon Is Infuriating!
  The Only Thing You Need to Know about Data Compression
  A World Built on Data Compression
2. Do Not Skip This Chapter
  Understanding Binary
  Base 10 System
  Binary Number System
  Information Theory
  An Excursion into Binary Search
  Entropy: The Minimum Bits Needed to Represent a Number
  Standard Number Lengths
3. Breaking Entropy
  Understanding Entropy
  What This Entropy Stuff Is Good For
  Understanding Probability
  Breaking Entropy
  Example 1: Delta Coding
  Example 2: Symbol Grouping
  Example 3: Permutations
  Information Theory Versus Data Compression
4. Variable-Length Codes
  Morse Code
  Probability, Entropy, and Codeword Size
  Variable-Length Codes
  Using VLCs
  Creating VLCs
  A Handful of Example VLCs
  Finding the Right Code for Your Data Set
5. Statistical Encoding
  Statistically Compressing to Entropy
  Huffman Coding
  Building a Huffman Tree
  Generating Codewords
  Encoding and Decoding
  Practical Implementations
  Arithmetic Coding
  Finding the Right Number
  Encoding
  Picking the Right Output Value
  Decoding
  Practical Implementations
  Asymmetric Numeral Systems
  Encoding and Decoding Using a Transform Table
  Creating the Reference Table
  Using ANS for Compression
  Decoding Example
  So Where Does the Compression Come From?
  Practical Compression: Which Statistical Algorithm Do I Choose?
6. Adaptive Statistical Encoding
  Locality Matters for Entropy
  Adaptive VLC Encoding
  Dynamically Building a VLC Table
  Literals
  Resets
  Knowing When to Reset
  Using This in Practice
  Adaptive Arithmetic Coding
  Adaptive Huffman Coding
  The Modern Choice
7. Dictionary Transforms
  A Basic Dictionary Transform
  Finding the Right "Words"
  The Lempel-Ziv Algorithm
  How LZ Works
  Encoding
  Decoding
  Compressing LZ Output
  LZ Variants
  Collect Them All!
8. Contextual Data Transforms
  Run-Length Encoding
  Dealing with Short Runs
  Compressing
  Delta Coding
  XOR Delta Coding
  Frame of Reference Delta Coding
  Patched Frame of Reference Delta Coding
  Compressing Delta-Encoded Data
  Does It Work on Text?
  Move-to-Front Coding
  Avoiding Rogue Symbols
  Compressing MTF
  Burrows–Wheeler Transform
  Ordering Is Important!
  How BWT Works
  Inverse BWT
  Practical Implementations
  Compressing BWT
9. Data Modeling
  The Chains of Markov
  Markov and Compression
  Practical Implementations
  Prediction by Partial Matching
  The Search Trie
  Compressing a Symbol
  Choosing a Sensible N Value
  Dealing with Unknown Symbols
  Context Mixing
  Types of Models
  Types of Mixing
  The Next Big Thing?
10. Switching Gears
  Media-Specific Compression
  General-Purpose Compression
  Compression in Practice
11. Evaluating Compression
  Compression Usage Scenarios
  Compressed Offline, Decompressed On-Client
  Compressed On-Client, Decompressed In-Cloud
  Compressed In-Cloud, Decompressed On-Client
  Compressed On-Client, Decompressed On-Client
  Compression Need
  Compression Ratio
  Compression Performance
  Decompression Performance
  Ability to Decode-Stream
  Comparing Compressors
12. Compressing Image Data Types
  Understanding Quality Versus File Size
  What Reduces Image Quality?
  Measuring Image Quality
  Making This Work
  Image Dimensions Are Important
  Choosing the Correct Image Format
  PNG
  JPG
  GIF
  WebP
  And Now for Choosing...
  GPU Texture Formats
  Vector Formats
  Eyes on the Prize
13. Serialized Data
  Understanding Common Use Cases
  Dynamically Server-Built Data
  Statically Built Server-Owned Data
  Dynamically Client-Built Data
  Statically Client-Owned Data
  Issues with Serialized Formats
  Human-Readable Text
  Slow Decode Times
  Smaller Serialized Data
  Use a Binary Serialization Format
  Restructure Lists for Better Compression
  Organize for Efficient Fetching
  Segment Out Data into the Proper Compression Format
14. Lossy Data Compression
15. Making the World a Little Smaller
  Data Compression and You
  Data Compression and the Bottom Line
  User Acquisition and Retention
  Running Costs
  Planning Ahead
  Making Your Users' Lives a Little More Magical and Less Expensive
  Thinking About What's Next in Technology
  The Next Five Billion Users
  Mobile Networks
  ...Starting Now
Glossary of Compression Words
Index

Foreword

When I first began programming, I had no idea what data compression was nor why it mattered. Luckily, my Apple II Plus computer came with 0.000048 GB of memory (48 KB), which was quite a lot in 1979, and was enough to let me explore programming and computer graphics without realizing that my programs and data were constantly being compressed and decompressed behind the scenes in order to reduce their size in memory. Thanks, Woz!

After programming for a few years, I had discovered:

• Data compression took time and could slow down my software.
• Changing my data organization could make the compressed data smaller.
• There are a bewildering variety of complicated data compression algorithms.

This led to the realization that compression was not a rigid black box; rather, it's a flexible tool that greatly influenced the quality of my software and could be manipulated in several ways:

• Changing compression algorithms could make my software run faster.
• Pairing my data organization with the right compression algorithm could make my data smaller.
• Choosing the wrong data organization or algorithm could make my data larger (and/or run slower).

Ah! Now I knew why data compression mattered. If things weren't fitting into memory or were decompressing too slowly, I could slightly change my data organization to better fit the compression algorithm. I'd simply put numbers together in one group, strings in another, build tables of recurring data types, or truncate fractions into integers. I didn't need to do the hard work of evaluating and adopting new compression algorithms if I could fit my data to the algorithm.

Then, I began making video games professionally, and most of the game data was created by not-so-technical artists, designers, and musicians. It turned out that math was not their favorite topic of discussion, and they were less than excited about changing the game data so that it would take advantage of my single go-to compression algorithm. Well, if the data organization couldn't be improved, that left choosing the best compression algorithm to pair up with all of this great artistic data.

I surveyed the various compression algorithms and found there were a couple of broad categories suitable for my video game data:

Lossless
• De-duplication (LZ)
• Entropy (Huffman, Arithmetic)

Lossy
• Reduced precision (truncation or decimation)
• Image/video
• Audio

For text strings and binary data, I used LZ to compress away repeating duplicate data patterns.
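The de-duplication idea mentioned in that last sentence can be made concrete with a rough LZ77-style sketch; it is illustrative only (no window limits, bit packing, or tuning) and is not code from the book:

```python
# Very small LZ77-style "compressor": emit literals, or (distance, length)
# back-references when the upcoming bytes already appeared earlier.
# Illustrative sketch only; real LZ variants add windowing and bit packing.

def lz_compress(data: bytes, min_match: int = 3):
    out, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - 255), i):          # search earlier data
            length = 0
            while (i + length < len(data)
                   and data[j + length] == data[i + length]
                   and j + length < i):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            out.append(("ref", best_dist, best_len))
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out

def lz_decompress(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:                                        # ("ref", distance, length)
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)

data = b"this is this is this is a test"
tokens = lz_compress(data)
assert lz_decompress(tokens) == data
```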
Recommended publications
  • Third Party Software Component List: Targeted Use: BriefCam®; Fulfillment of License Obligation for All Open Sources: Yes
Name | License Type | Link and Copyright Notices Where Available
OpenCV | 3-Clause BSD | https://opencv.org/license.html | Copyright (C) 2000-2019, Intel Corporation, all rights reserved. Copyright (C) 2009-2011, Willow Garage Inc., all rights reserved. Copyright (C) 2009-2016, NVIDIA Corporation, all rights reserved. Copyright (C) 2010-2013, Advanced Micro Devices, Inc., all rights reserved. Copyright (C) 2015-2016, OpenCV Foundation, all rights reserved. Copyright (C) 2015-2016, Itseez Inc., all rights reserved.
Apache Logging | Apache License V2 | http://logging.apache.org/log4cxx/license.html | Copyright © 1999-2012 Apache Software Foundation
Google Test | BSD* | https://github.com/abseil/googletest/blob/master/googletest/LICENSE | Copyright 2008, Google Inc.
SAML 2.0 component for ASP.NET | MIT | https://github.com/jitbit/AspNetSaml/blob/master/LICENSE | Copyright 2018 Jitbit LP
Nvidia Video Codec | MIT | https://github.com/lu-zero/nvidia-video-codec/blob/master/LICENSE | Copyright (c) 2016 NVIDIA Corporation
FFMpeg 4 | LesserGPL v2.1 | https://www.ffmpeg.org/legal.html | FFmpeg is a trademark of Fabrice Bellard, originator of the FFmpeg project
7zip.exe | LesserGPL v2.1 / 3-Clause BSD | https://www.7-zip.org/license.txt | 7-Zip Copyright (C) 1999-2019 Igor Pavlov
Infralution.Localization.Wpf | CPOL | http://www.codeproject.com/info/cpol10.aspx | Copyright (C) 2018 Infralution Pty Ltd
directShowlib .net | LesserGPL | https://github.com/pauldotknopf/DirectShow.NET/blob/
  • Compressed Transitive Delta Encoding
Compressed Transitive Delta Encoding

Dana Shapira
Department of Computer Science, Ashkelon Academic College, Ashkelon 78211, Israel
[email protected]

Abstract: Given a source file S and two differencing files ∆(S, T) and ∆(T, R), where ∆(X, Y) is used to denote the delta file of the target file Y with respect to the source file X, the objective is to be able to construct R. This is intended for the scenario of upgrading software where intermediate releases are missing, or for the case of file system backups, where non consecutive versions must be recovered. The traditional way is to decompress ∆(S, T) in order to construct T and then apply ∆(T, R) on T and obtain R. The Compressed Transitive Delta Encoding (CTDE) paradigm, introduced in this paper, is to construct a delta file ∆(S, R) working directly on the two given delta files, ∆(S, T) and ∆(T, R), without any decompression or the use of the base file S. A new algorithm for solving CTDE is proposed and its compression performance is compared against the traditional "double delta decompression". Not only does it use constant additional space, as opposed to the traditional method which uses linear additional memory storage, but experiments show that the size of the delta files involved is reduced by 15% on average.

1. Introduction

Differential file compression represents a target file T with respect to a source file S. That is, both the encoder and decoder have available identical copies of S. A new file T is encoded and subsequently decoded by making use of S.
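The "double delta decompression" baseline that this abstract contrasts against can be sketched in a few lines. The (copy, insert) directive format and the apply_delta helper below are assumptions for illustration, not the paper's delta format or algorithm:

```python
# Toy delta format: a list of directives, either ("copy", offset, length)
# referencing the source file, or ("insert", literal_bytes).
# Assumed for illustration only.

def apply_delta(source: bytes, delta) -> bytes:
    """Materialize the target file from a source file and a delta."""
    out = bytearray()
    for directive in delta:
        if directive[0] == "copy":
            _, offset, length = directive
            out += source[offset:offset + length]
        else:                      # "insert"
            out += directive[1]
    return bytes(out)

# Traditional double delta decompression: rebuild T from S, then R from T.
S = b"hello brave new world"
delta_S_T = [("copy", 0, 6), ("insert", b"old"), ("copy", 15, 6)]
delta_T_R = [("copy", 0, 6), ("insert", b"cruel"), ("copy", 9, 6)]

T = apply_delta(S, delta_S_T)      # b"hello old world"
R = apply_delta(T, delta_T_R)      # b"hello cruel world"
print(T.decode(), "->", R.decode())
```

CTDE, by contrast, would produce ∆(S, R) directly from ∆(S, T) and ∆(T, R) without materializing T at all.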
  • Arithmetic Coding
Arithmetic Coding

Arithmetic coding is the most efficient method to code symbols according to the probability of their occurrence. The average code length corresponds exactly to the possible minimum given by information theory. Deviations caused by the bit-resolution of binary code trees do not exist. In contrast to a binary Huffman code tree, arithmetic coding offers a clearly better compression rate, but its implementation is more complex. In arithmetic coding, a message is encoded as a real number in an interval between zero and one. Arithmetic coding typically has a better compression ratio than Huffman coding, because it produces a single codeword for the whole message rather than a separate codeword for each symbol. Arithmetic coding differs from other forms of entropy encoding such as Huffman coding in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, a fraction n where 0.0 ≤ n < 1.0. Arithmetic coding is a lossless coding technique. There are a few disadvantages of arithmetic coding. One is that the whole codeword must be received to start decoding the symbols, and if there is a corrupt bit in the codeword, the entire message could become corrupt. Another is that there is a limit to the precision of the number which can be encoded, thus limiting the number of symbols that can be encoded within a codeword. There also exist many patents on arithmetic coding, so the use of some of the algorithms may require royalty fees. Arithmetic coding is part of the JPEG data format.
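A rough float-based sketch of the interval-narrowing idea described above; the three-symbol model and its probabilities are assumed for illustration, and a practical coder would use integer arithmetic with renormalization to sidestep the precision limit the text mentions:

```python
# Assumed toy model: cumulative probability ranges per symbol.
MODEL = {"A": (0.0, 0.6), "B": (0.6, 0.9), "C": (0.9, 1.0)}

def encode(message: str) -> float:
    low, high = 0.0, 1.0
    for sym in message:
        s_low, s_high = MODEL[sym]
        span = high - low
        high = low + span * s_high
        low = low + span * s_low
    return (low + high) / 2        # any number in [low, high) identifies the message

def decode(code: float, length: int) -> str:
    out, low, high = [], 0.0, 1.0
    for _ in range(length):
        span = high - low
        value = (code - low) / span
        for sym, (s_low, s_high) in MODEL.items():
            if s_low <= value < s_high:
                out.append(sym)
                high = low + span * s_high
                low = low + span * s_low
                break
    return "".join(out)

code = encode("ABAC")
print(code, decode(code, 4))       # round-trips for short messages
```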
  • Implementing Compression on Distributed Time Series Database
Implementing compression on distributed time series database

Michael Burman, School of Science
Thesis submitted for examination for the degree of Master of Science in Technology. Espoo, 05.11.2017
Supervisor: Prof. Kari Smolander
Advisor: Mgr. Jiri Kremser
Aalto University, P.O. BOX 11000, 00076 AALTO, www.aalto.fi

Abstract of the master's thesis
Author: Michael Burman
Title: Implementing compression on distributed time series database
Major: Computer Science; Code of major: SCI3042
Date: 05.11.2017; Number of pages: 70+4; Language: English

Abstract: The rise of microservices and distributed applications in containerized deployments is putting an increasing amount of burden on monitoring systems. They push the storage requirements to provide suitable performance for large queries. In this paper we present the changes we made to our distributed time series database, Hawkular-Metrics, and how it stores data more effectively in Cassandra. We show that using our methods provides significant space savings, ranging from 50 to 95% reduction in storage usage, while reducing query times by over 90% compared to the nominal approach when using Cassandra. We also provide our unique algorithm, modified from the Gorilla compression algorithm, that we use in our solution, which provides almost three times the throughput in compression with equal compression ratio.

Keywords: timeseries, compression, performance, storage

Tiivistelmä (Finnish abstract, first sentence translated): The proliferation of distributed systems has caused growth in the amount of data in monitoring systems, since the number of time series has grown and data is stored in them more frequently.
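The Gorilla-style approach mentioned in this abstract stores timestamps as deltas of deltas (and values as XORs against the previous value). The sketch below shows only the timestamp transform, with example timestamps assumed, and omits Gorilla's variable-length bit packing:

```python
# Illustrative delta-of-delta encoding for near-regular timestamps.
# Gorilla additionally bit-packs each delta-of-delta with variable-length
# codes; this sketch only shows why the transformed stream compresses well.

def delta_of_delta(timestamps):
    """Return (first, first_delta, list of delta-of-deltas)."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], dods

def reconstruct(first, first_delta, dods):
    timestamps, delta = [first, first + first_delta], first_delta
    for dod in dods:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps

ts = [1000, 1060, 1120, 1180, 1241, 1300]    # roughly 60 s sampling interval
first, first_delta, dods = delta_of_delta(ts)
print(dods)                                   # mostly zeros and small values
assert reconstruct(first, first_delta, dods) == ts
```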
  • Information Theory Revision (Source)
ELEC3203 Digital Coding and Transmission – Overview & Information Theory (S Chen)

Information Theory Revision (Source)

(Block diagram: a digital source emits symbols {S(k)} at Rs symbols/s; the source coder turns them into bits {b_i} at Rb bits/s.)

• A digital source is defined by:
  1. Symbol set: S = {m_i, 1 ≤ i ≤ q}
  2. Probability of occurrence of m_i: p_i, 1 ≤ i ≤ q
  3. Symbol rate: Rs [symbols/s]
  4. Interdependency of {S(k)}
• Information content of symbol m_i: I(m_i) = −log2(p_i) [bits]
• Entropy: quantifies the average information conveyed per symbol
  – Memoryless sources: H = −Σ_{i=1}^{q} p_i · log2(p_i) [bits/symbol]
  – 1st-order memory (1st-order Markov) sources with transition probabilities p_ij: H = Σ_{i=1}^{q} p_i · H_i = −Σ_{i=1}^{q} p_i Σ_{j=1}^{q} p_ij · log2(p_ij) [bits/symbol]
• Information rate: tells you how many bits/s of information the source really needs to send out
  – Information rate R = Rs · H [bits/s]
• Efficient source coding: get the rate Rb as close as possible to the information rate R
  – Memoryless source: apply entropy coding, such as Shannon-Fano and Huffman, and RLC if the source is binary with mostly zeros
  – Generic sources with memory: remove redundancy first, then apply entropy coding to the "residuals"

Practical Source Coding

• Practical source coding is guided by information theory, with practical constraints such as the performance and processing complexity/delay trade-off
• When you come to the practical source coding part, you can smile – as you should know everything
• As we will learn, data rate is directly linked to required bandwidth; source coding is to encode the source with a data rate as small as possible, i.e.
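A quick numeric companion to the entropy and information-rate formulas above; the distribution and symbol rate are arbitrary example values:

```python
import math

# Memoryless source entropy: H = -sum(p_i * log2(p_i)) [bits/symbol]
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]   # assumed example distribution
Rs = 1000                            # assumed symbol rate [symbols/s]

H = entropy(probs)                   # 1.75 bits/symbol for this distribution
R = Rs * H                           # information rate R = Rs * H [bits/s]
print(H, R)                          # 1.75, 1750.0
```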
  • In-Place Reconstruction of Delta Compressed Files
In-Place Reconstruction of Delta Compressed Files

Randal C. Burns, IBM Almaden Research Center, 650 Harry Rd., San Jose, CA 95120, [email protected]
Darrell D. E. Long, Department of Computer Science, University of California, Santa Cruz, CA 95064, [email protected]

Abstract: We present an algorithm for modifying delta compressed files so that the compressed versions may be reconstructed without scratch space. This allows network clients with limited resources to efficiently update software by retrieving delta compressed versions over a network. Delta compression for binary files, compactly encoding a version of data with only the changed bytes from a previous version, may be used to efficiently distribute software over low bandwidth channels, such as the Internet. Traditional methods for rebuilding these delta files require memory or storage space on the target machine for both the old and new version of the file to be reconstructed.

[...] results in high latency and low bandwidth to web-enabled clients and prevents the timely delivery of software. Differential or delta compression [5, 11], compactly encoding a new version of a file using only the changed bytes from a previous version, can be used to reduce the size of the file to be transmitted and consequently the time to perform software update. Currently, decompressing delta encoded files requires scratch space, additional disk or memory storage, used to hold a required second copy of the file. Two copies of the compressed file must be concurrently available, as the delta file contains directives to read data from the old file version while the new file version is being materialized in another region of storage. This presents a problem.
  • Delta Compression Techniques
Delta Compression Techniques

Torsten Suel
Department of Computer Science and Engineering, Tandon School of Engineering, New York University, Brooklyn, NY, USA

Synonyms: Data differencing; Delta encoding; Differential compression

Definition: Delta compression techniques encode a target file with respect to one or more reference files, such that a decoder who has access to the same reference files can recreate the target file from the compressed data. Delta compression is usually applied in cases where there is a high degree of redundancy between target and reference files, leading to a much smaller compressed size than could be achieved by just compressing the target file by itself.

[...] but the concept can also be applied to multimedia and structured data. Delta compression should not be confused with Elias delta codes, a technique for encoding integer values, or with the idea of coding sorted sequences of integers by first taking the difference (or delta) between consecutive values. Also, delta compression requires the encoder to have complete knowledge of the reference files and thus differs from more general techniques for redundancy elimination in networks and storage systems where the encoder has limited or even no knowledge of the reference files, though the boundaries with that line of work are not clearly defined.

Overview: Many applications of big data technologies involve very large data sets that need to be stored on disk or transmitted over networks. Consequently, data compression techniques are widely used to reduce data sizes. However, there are many scenarios where there are significant redundancies between different data files that cannot be ex[...]
  • Probability Interval Partitioning Entropy Codes Detlev Marpe, Senior Member, IEEE, Heiko Schwarz, and Thomas Wiegand, Senior Member, IEEE
Probability Interval Partitioning Entropy Codes
Detlev Marpe, Senior Member, IEEE, Heiko Schwarz, and Thomas Wiegand, Senior Member, IEEE
(Submitted to IEEE Transactions on Information Theory)

Abstract—A novel approach to entropy coding is described that provides the coding efficiency and simple probability modeling capability of arithmetic coding at the complexity level of Huffman coding. The key element of the proposed approach is given by a partitioning of the unit interval into a small set of disjoint probability intervals for pipelining the coding process along the probability estimates of binary random variables. According to this partitioning, an input sequence of discrete source symbols with arbitrary alphabet sizes is mapped to a sequence of binary symbols and each of the binary symbols is assigned to one particular probability interval. With each of the intervals being represented by a fixed probability, the probability interval partitioning entropy (PIPE) coding process is based on the design and application of simple variable-to-variable length [...]

[...] entropy coding while the assignment of codewords to symbols is the actual entropy coding. For decades, two methods have dominated practical entropy coding: Huffman coding that has been invented in 1952 [8] and arithmetic coding that goes back to initial ideas attributed to Shannon [7] and Elias [9] and for which first practical schemes have been published around 1976 [10][11]. Both entropy coding methods are capable of approximating the entropy limit (in a certain sense) [12]. For a fixed probability mass function, Huffman codes are relatively easy to construct. The most attractive property of Huffman codes is that their implementation can be efficiently realized by the use of variable-length code (VLC) tables.
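A minimal sketch of the partitioning idea from this abstract: each binary symbol, together with its probability estimate, is routed to one of a few fixed probability intervals, and each interval's bit sequence can then be coded with its own simple code. The four bin edges below are assumed for illustration and are not the authors' actual interval design or V2V codes:

```python
# Route binary symbols into a small set of probability intervals ("bins").
# In PIPE coding each bin is then compressed with a simple fixed
# variable-to-variable length code designed for that bin's representative
# probability; this sketch only shows the partitioning/routing step.

BIN_EDGES = [0.20, 0.35, 0.50]   # assumed partition of (0, 0.5] into 4 intervals

def bin_index(p_lps: float) -> int:
    """Map an estimated probability of the less-probable symbol to a bin."""
    for i, edge in enumerate(BIN_EDGES):
        if p_lps <= edge:
            return i
    return len(BIN_EDGES)

def route(decisions):
    """decisions: list of (bit, p_lps). Returns one bit list per bin."""
    bins = [[] for _ in range(len(BIN_EDGES) + 1)]
    for bit, p_lps in decisions:
        bins[bin_index(p_lps)].append(bit)
    return bins

decisions = [(0, 0.05), (1, 0.40), (0, 0.30), (1, 0.45), (0, 0.10)]
print(route(decisions))           # [[0, 0], [0], [1, 1], []]
```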
  • Fast Algorithm for PQ Data Compression Using Integer DTCWT and Entropy Encoding
Fast Algorithm for PQ Data Compression using Integer DTCWT and Entropy Encoding
International Journal of Applied Engineering Research, ISSN 0973-4562, Volume 12, Number 22 (2017), pp. 12219-12227

Prathibha Ekanthaiah, Associate Professor, Department of Electrical and Electronics Engineering, Sri Krishna Institute of Technology, No 29, Chimney Hills, Chikkabanavara Post, Bangalore-560090, Karnataka, India (ORCID 0000-0003-3031-7263)
Dr. A. Manjunath, Principal, Sri Krishna Institute of Technology, No 29, Chimney Hills, Chikkabanavara Post, Bangalore-560090, Karnataka, India (ORCID 0000-0003-0794-8542)
Dr. Cyril Prasanna Raj, Dean & Research Head, Department of Electronics and Communication Engineering, MS Engineering College, Navarathna Agrahara, Sadahalli P.O., Off Bengaluru International Airport, Bengaluru-562110, Karnataka, India (ORCID 0000-0002-9143-7755)

Abstract: Smart meters are an integral part of the smart grid which, in addition to energy management, also performs data management. Power Quality (PQ) data from smart meters need to be compressed for both the storage and transmission process, either through a wired or wireless medium. In this paper, PQ data compression is carried out by encoding significant features captured from Dual Tree Complex [...]

[...] metering infrastructures (smart metering), integration of distributed power generation, renewable energy resources and storage units, as well as high power quality and reliability [1]. Using a smart metering infrastructure sustains bidirectional data transfer and also decreases environmental effects. With this, the resilience and reliability of the power utility network can be improved effectively. The work highlights the need for development and technology encroachment in smart grid communications [2].
  • Entropy Encoding in Wavelet Image Compression
Entropy Encoding in Wavelet Image Compression

Myung-Sin Song, Department of Mathematics and Statistics, Southern Illinois University Edwardsville
[email protected]

Summary. Entropy encoding is a form of lossless compression that is applied to an image after the quantization stage. It makes it possible to represent an image more efficiently, with the smallest memory for storage or transmission. In this paper we will explore various schemes of entropy encoding and how they work mathematically where they apply.

1 Introduction

In the process of wavelet image compression, there are three major steps that make the compression possible, namely, the decomposition, quantization, and entropy encoding steps. While quantization may be a lossy step where some quantity of data may be lost and may not be recovered, entropy encoding enables a lossless compression that further compresses the data. [13], [18], [5] In this paper we discuss various entropy encoding schemes that are used by engineers (in various applications).

1.1 Wavelet Image Compression

In wavelet image compression, after the quantization step (see Figure 1) entropy encoding, which is a lossless form of compression, is performed on a particular image for more efficient storage. Either 8 bits or 16 bits are required to store a pixel on a digital image. With efficient entropy encoding, we can use a smaller number of bits to represent a pixel in an image; this results in less memory usage to store or even transmit an image. The Karhunen-Loève theorem enables us to pick the best basis, thus to minimize the entropy and error, to better represent an image for optimal storage or transmission.
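A toy illustration of the pipeline the summary describes: quantize (the lossy step), then estimate how many bits an ideal entropy coder would need for the quantized coefficients. The uniform quantizer and the coefficient values are assumptions for illustration, not the paper's scheme:

```python
import math
from collections import Counter

# Step 1: uniform quantization of wavelet coefficients (the lossy step).
def quantize(coeffs, step):
    return [round(c / step) for c in coeffs]

# Step 2: entropy of the quantized symbols, a lower bound on the average
# bits/symbol any lossless entropy coder (Huffman, arithmetic) can achieve.
def entropy_bits_per_symbol(symbols):
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

coeffs = [0.1, -0.2, 3.9, 0.05, -4.2, 0.0, 0.3, 7.8]   # toy coefficient values
q = quantize(coeffs, step=1.0)        # many small coefficients collapse to 0
print(q, entropy_bits_per_symbol(q))
```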
  • The Pillars of Lossless Compression Algorithms a Road Map and Genealogy Tree
The Pillars of Lossless Compression Algorithms: a Road Map and Genealogy Tree
International Journal of Applied Engineering Research, ISSN 0973-4562, Volume 13, Number 6 (2018), pp. 3296-3414

Evon Abu-Taieh, PhD, Information System Technology Faculty, The University of Jordan, Aqaba, Jordan

Abstract: This paper presents the pillars of lossless compression algorithms, methods, and techniques. The paper counted more than 40 compression algorithms. Although each algorithm is independent in its own right, these algorithms still interrelate genealogically and chronologically. The paper then presents the genealogy tree suggested by the researcher. The tree shows the interrelationships between the 40 algorithms. Also, the tree shows the chronological order in which the algorithms came to life. The time relation shows the cooperation within the scientific community and how they amended each other's work. The paper presents the 12 pillars researched in this paper, and a comparison table is developed. The genealogy tree is presented in the last section of the paper, after presenting the 12 main compression algorithms, each with a practical example.

The paper first introduces Shannon–Fano code, showing its relation to Shannon (1948), Huffman coding (1952), Fano (1949), Run-Length Encoding (1967), Peter's Version (1963), Enumerative Coding (1973), LIFO (1976), FIFO Pasco (1976), Stream (1979), and P-Based FIFO (1981). Two examples are presented, one for Shannon–Fano code and the other for arithmetic coding. Next, Huffman code is presented with a simulation example and algorithm. The third is the Lempel–Ziv–Welch (LZW) algorithm, which hatched more than 24 [...]
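Since Huffman coding is one of the pillar algorithms this abstract walks through, here is a compact, generic Huffman code construction in the standard textbook style (not the paper's simulation example):

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Build a Huffman code table from symbol frequencies in the input."""
    heap = [[count, i, {sym: ""}]
            for i, (sym, count) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol input
        return {sym: "0" for sym in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)             # two least-frequent subtrees
        hi = heapq.heappop(heap)
        merged = {}
        for sym, code in lo[2].items():      # left subtree gets prefix "0"
            merged[sym] = "0" + code
        for sym, code in hi[2].items():      # right subtree gets prefix "1"
            merged[sym] = "1" + code
        heapq.heappush(heap, [lo[0] + hi[0], tiebreak, merged])
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("ABRACADABRA")
encoded = "".join(codes[c] for c in "ABRACADABRA")
print(codes, len(encoded), "bits")           # frequent symbols get shorter codes
```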
  • The Deep Learning Solutions on Lossless Compression Methods for Alleviating Data Load on Iot Nodes in Smart Cities
The Deep Learning Solutions on Lossless Compression Methods for Alleviating Data Load on IoT Nodes in Smart Cities
Sensors, Article

Ammar Nasif *, Zulaiha Ali Othman and Nor Samsiah Sani
Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science & Technology, University Kebangsaan Malaysia, Bangi 43600, Malaysia; [email protected] (Z.A.O.); [email protected] (N.S.S.)
* Correspondence: [email protected]

Abstract: Networking is crucial for smart city projects nowadays, as it offers an environment where people and things are connected. This paper presents a chronology of factors on the development of smart cities, including IoT technologies as network infrastructure. Increasing IoT nodes leads to increasing data flow, which is a potential source of failure for IoT networks. The biggest challenge of IoT networks is that the IoT may have insufficient memory to handle all transaction data within the IoT network. We aim in this paper to propose a potential compression method for reducing IoT network data traffic. Therefore, we investigate various lossless compression algorithms, such as entropy- or dictionary-based algorithms, and general compression methods to determine which algorithm or method adheres to the IoT specifications. Furthermore, this study conducts compression experiments using entropy coding (Huffman, Adaptive Huffman) and dictionary coding (LZ77, LZ78), as well as five different types of datasets of IoT data traffic. Though the above algorithms can alleviate the IoT data traffic, Adaptive Huffman gave the best compression. Therefore, in this paper, we aim to propose a conceptual compression method for IoT data traffic by improving an adaptive [...]