Compression Algorithms

École Polytechnique Fédérale de Lausanne
Master Semester Project

Author: Ludovic Favre
Supervisor: Ghid Maatouk
Professor: Amin Shokrollahi

June 11, 2010

Contents

1 Theory for Data Compression
  1.1 Model
  1.2 Entropy
  1.3 Source Coding
    1.3.1 Bound on the optimal code length
    1.3.2 Other properties
2 Source Coding Algorithms
  2.1 Huffman coding
    2.1.1 History
    2.1.2 Description
    2.1.3 Optimality
  2.2 Arithmetic coding
3 Adaptive Dictionary techniques: Lempel-Ziv
  3.1 History
  3.2 LZ77
    3.2.1 LZ77 encoding and decoding
    3.2.2 Performance discussion
  3.3 LZ78
    3.3.1 LZ78 encoding and decoding
    3.3.2 Optimality
  3.4 Improvements for LZ77 and LZ78
4 Burrows-Wheeler Transform
  4.1 History
  4.2 Description
    4.2.1 Encoding
    4.2.2 Decoding
    4.2.3 Why it compresses well
  4.3 Algorithms used in combination with BWT
    4.3.1 Run-length encoding
    4.3.2 Move-to-front encoding
5 Implementation
  5.1 LZ78
    5.1.1 Code details
  5.2 Burrows-Wheeler Transform
  5.3 Huffman coding
    5.3.1 Binary input and output
    5.3.2 Huffman implementation
  5.4 Move-to-front
  5.5 Run-length encoding
  5.6 Overview of source files
6 Practical Results
  6.1 Benchmark files
    6.1.1 Notions used for comparison
    6.1.2 Other remarks
  6.2 Lempel-Ziv 78
    6.2.1 Lempel-Ziv 78 with dictionary reset
    6.2.2 Comparison between with and without dictionary reset version
    6.2.3 Comparison between my LZ78 implementation and GZIP
  6.3 Burrows-Wheeler Transform
    6.3.1 Comparison of BWT schemes to LZ78
    6.3.2 Influence of the block size
    6.3.3 Comparison between my optimal BWT method and BZIP2
  6.4 Global comparison
7 Supplementary Material
  7.1 Using the program
    7.1.1 License
    7.1.2 Building the program
  7.2 Collected data
    7.2.1 Scripts
    7.2.2 Spreadsheets
    7.2.3 Repository
8 Conclusion

Introduction

With the increasing amount of data traveling over various channels, such as the wireless network links from mobile phones to servers, lossless data compression has become an important factor in optimizing spectrum utilization. As a computer science student with an interest in domains like computational biology, data processing mattered to me for understanding how to handle the large amounts of data produced by high-throughput sequencing technologies. Since I am interested in both algorithms and concrete data processing, getting in touch with data compression techniques immediately appealed to me.

During this semester project, it was decided to focus on two lossless compression algorithms to allow some comparison. The algorithms chosen for implementation were Lempel-Ziv 78 and the more recent Burrows-Wheeler Transform, which enables well-known techniques such as run-length encoding, move-to-front and Huffman coding to easily outperform, in most situations, the more complicated Lempel-Ziv-based techniques. These two compression techniques take very different approaches to compressing data and are commonly used, for example, in the GZip¹ and BZip2² software.

In this report, I will first introduce some information theory material and present the theoretical part of the project, in which I learnt how popular compression techniques attempt to reduce the size required for heterogeneous types of data. The subsequent chapters will then detail my practical work during the semester and present the implementation I have done in C/C++. Finally, the last two chapters consist of the results obtained on the famous Calgary Corpus benchmark files, where I will highlight the differences in performance and explain the choices made in actual compression software.

¹ http://www.gnu.org/software/gzip (May 31, 2010)
² http://www.bzip.org (May 31, 2010)

Chapter 1  Theory for Data Compression

1.1 Model

The general model used for the theoretical part is the first-order model [1, theory]: in this model, the symbols are independent of one another, and the probability distribution of the symbols is determined by the source X. We consider X as a discrete random variable with alphabet A (the alphabet A of X is the set of all possible symbols X can output). We also assume that there is a probability mass function p(x) over A, and we denote a finite sequence of length n by X^n.

1.2 Entropy

In information theory, the concept of entropy is due to Claude Shannon in 1948 (see http://en.wikipedia.org/wiki/Entropy_(information_theory)). It is used to quantify the minimal average number of bits required to encode a source X.

Definition 1. The entropy H(X) of a discrete random variable X is defined as

    H(X) = -\sum_{x \in A} p(x) \log p(x)

where p(x) is the probability mass function, i.e. the probability that the symbol x ∈ A is encountered.

1.3 Source Coding

1.3.1 Bound on the optimal code length

Before relating the optimal code length to the entropy, we have to introduce some definitions. The first one introduces the notions of codeword and binary code.

Definition 2. A binary code C for the random variable X is a mapping from A to finite binary strings. We denote by C(x) the codeword mapped to x ∈ A and by l(x) the length of C(x).

Moreover, a property that is often wanted for a binary code is to be instantaneous:

Definition 3. A code is said to be instantaneous (or prefix-free) if no codeword is a prefix of any other codeword.
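To make Definition 3 concrete, the sketch below (an illustration added here, not code from the report's implementation) builds a small, hypothetical prefix-free code and decodes a concatenation of its codewords greedily; because no codeword is a prefix of another, a symbol can be emitted as soon as its codeword has been read.

    #include <cstdio>
    #include <map>
    #include <string>

    int main() {
        // Hypothetical prefix-free (instantaneous) code for A = {a, b, c, d}:
        // no codeword below is a prefix of any other codeword.
        std::map<std::string, char> code = {
            {"0", 'a'}, {"10", 'b'}, {"110", 'c'}, {"111", 'd'}
        };

        // Concatenated codewords for the sequence a, b, c, d, a.
        std::string bits = "0101101110";

        // Decode greedily: grow the current buffer bit by bit and emit a
        // symbol as soon as the buffer matches a codeword.  The prefix
        // property guarantees that this match is never ambiguous.
        std::string buf;
        for (char bit : bits) {
            buf += bit;
            auto it = code.find(buf);
            if (it != code.end()) {
                std::printf("%c", it->second);
                buf.clear();
            }
        }
        std::printf("\n");   // prints: abcda
        return 0;
    }

Compiled with any C++11 compiler, this prints abcda, i.e. exactly the symbol sequence whose codewords were concatenated.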
The property of an instantaneous code is quite interesting, since it permits transmitting the codewords for multiple input symbols x_1, x_2, x_3, ... by simply concatenating the codewords C(x_1)C(x_2)C(x_3)..., while still being able to decode each x_i as soon as C(x_i) has been received.

Another definition required for the entropy bound theorem concerns the expected length of a binary code C.

Definition 4. Given a binary code C, the expected length of C is given by

    L(C) = \sum_{x \in A} p(x) l(x)

Finally, the Kraft inequality connects the instantaneous property of a code to its codeword lengths.

The Kraft inequality

The theorem formalizing the Kraft inequality is given below:

Theorem 1 (Kraft inequality [7, p.107]). For any instantaneous (prefix) code over a code alphabet of size D, the codeword lengths l(x_1), l(x_2), ..., l(x_m) must satisfy the inequality

    \sum_i D^{-l(x_i)} \le 1

Conversely, given a set of codeword lengths that satisfies this inequality, there exists an instantaneous code with these word lengths.

The Kraft inequality will not be proven here; the proof can be found in [7, pp. 107-109]. We are now able to give the entropy bound on the expected length of a binary code C.

Theorem 2. There exists an instantaneous code C whose expected length satisfies the double inequality

    H(X) \le L(C) \le H(X) + 1

Proof. Since C is a binary code, we take D = 2 below, so that \log_D is the base-2 logarithm used in the definition of H(X). The proof proceeds in two steps.

1. We first prove the upper bound, that is, L(C) \le H(X) + 1. We choose an integer word-length assignment for the symbol x_i:

    l(x_i) = \left\lceil \log_D \frac{1}{p(x_i)} \right\rceil

These lengths satisfy the Kraft inequality because

    \sum_i D^{-\left\lceil \log_D \frac{1}{p(x_i)} \right\rceil} \le \sum_i D^{-\log_D \frac{1}{p(x_i)}} = \sum_i p(x_i) = 1

hence, by Theorem 1, there exists an instantaneous code with these word lengths. The upper bound then follows:

    L(C) = \sum_{x \in A} p(x) l(x)
         = \sum_{x \in A} p(x) \left\lceil \log_D \frac{1}{p(x)} \right\rceil
         \le \sum_{x \in A} p(x) \left( \log_D \frac{1}{p(x)} + 1 \right)
         = H(X) + 1

which proves the upper bound.

2. The lower bound is obtained as follows. By our word-length assignment, we have

    \log_D \frac{1}{p(x_i)} \le l(x_i)

and therefore

    L(C) = \sum_{x \in A} p(x) l(x) \ge \sum_{x \in A} p(x) \log_D \frac{1}{p(x)} = H(X)

which proves the lower bound and hence both parts of the inequality in Theorem 2.

1.3.2 Other properties

The entropy can also be used to quantify multiple sources (random variables). For such cases, we use the joint entropy.

Definition 5. The joint entropy H(X, Y) of a pair of discrete random variables (X, Y) with joint distribution p(x, y) is defined as [7, p.16]

    H(X, Y) = -\sum_{(x,y)} p(x, y) \log p(x, y)

It is also possible to use the conditional entropy:

Definition 6. The conditional entropy H(Y|X) is defined as

    H(Y|X) = \sum_x p(x) H(Y|X = x)
           = -\sum_x p(x) \sum_y p(y|x) \log p(y|x)
           = -\sum_x \sum_y p(x, y) \log p(y|x)

Theorem 3. From the previous definitions, we obtain the following chain rule:

    H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

Proof. Using the previous definitions and properties, proving Theorem 3 is simply a matter of expanding the formulas:

    H(X, Y) = -\sum_x \sum_y p(x, y) \log p(x, y)
            = -\sum_x \sum_y p(x, y) \log \big( p(x) \, p(y|x) \big)
            = -\sum_x \sum_y p(x, y) \log p(x) - \sum_x \sum_y p(x, y) \log p(y|x)
            = -\sum_x p(x) \log p(x) - \sum_x \sum_y p(x, y) \log p(y|x)
            = H(X) + H(Y|X)

The proof is similar for the second part of the equality.
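The bounds in this chapter are easy to check numerically. The sketch below (an added illustration with assumed probabilities, not code from the report) assigns the Shannon lengths l(x_i) = ⌈log_D 1/p(x_i)⌉ from the proof of Theorem 2 with D = 2, verifies the Kraft inequality of Theorem 1, and checks the double inequality H(X) ≤ L(C) ≤ H(X) + 1.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        // Hypothetical source distribution over a 5-symbol alphabet A.
        std::vector<double> p = {0.4, 0.3, 0.15, 0.1, 0.05};

        double H = 0.0, L = 0.0, kraft = 0.0;
        for (double px : p) {
            // Shannon length l(x) = ceil(log2(1 / p(x))), as in the proof of Theorem 2.
            double l = std::ceil(std::log2(1.0 / px));
            H     -= px * std::log2(px);   // entropy H(X), Definition 1
            L     += px * l;               // expected length L(C), Definition 4
            kraft += std::pow(2.0, -l);    // Kraft sum with D = 2, Theorem 1
        }

        std::printf("Kraft sum = %.5f (<= 1, so a prefix code with these lengths exists)\n", kraft);
        std::printf("H(X) = %.4f, L(C) = %.4f, H(X)+1 = %.4f\n", H, L, H + 1.0);
        // Expected: H(X) <= L(C) <= H(X) + 1, illustrating Theorem 2.
        return 0;
    }

For these probabilities the Kraft sum is about 0.72 ≤ 1, and L(C) = 2.5 lies between H(X) ≈ 2.01 and H(X) + 1, as Theorem 2 predicts.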