Lempel Ziv Welch Data Compression Using Associative Processing as an Enabling Technology for Real Time Applications
Lempel Ziv Welch Data Compression using Associative Processing as an Enabling Technology for Real Time Applications

by Manish Narang

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of
MASTER OF SCIENCE in Computer Engineering

Approved by:
Graduate Advisor - Tony H. Chang, Professor
Roy Czernikowski, Professor
Muhammad Shabaan, Assistant Professor

Department of Computer Engineering
College of Engineering
Rochester Institute of Technology
Rochester, New York
March, 1998

Thesis Release Permission Form

Rochester Institute of Technology
College of Engineering

Title: Lempel Ziv Welch Data Compression using Associative Processing as an Enabling Technology for Real Time Applications.

I, Manish Narang, hereby grant permission to the Wallace Memorial Library to reproduce my thesis in whole or in part.

Signature: ________________    Date: ________________

Abstract

Data compression refers to the reduction of data representation requirements in storage and/or in transmission. A commonly used algorithm for compression is the Lempel-Ziv-Welch (LZW) method proposed by Terry A. Welch [1]. LZW is an adaptive, dictionary-based, lossless algorithm. This provides a general compression mechanism that is applicable to a broad range of inputs. Furthermore, the lossless nature of LZW implies that it is a reversible process, so the original file or message is fully recoverable from the compressed data. A variant of this algorithm is currently the foundation of the UNIX "compress" program. Additionally, LZW is one of the compression schemes defined in the TIFF standard [12], as well as in the CCITT V.42bis standard. One of the challenges in designing an efficient compression mechanism such as LZW that can be used in real time applications is the speed of the search into the data dictionary. In this paper an Associative Processing implementation of the LZW algorithm is presented. This approach provides an efficient solution to this requirement. Additionally, it is shown that Associative Processing (ASP) allows for rapid and elegant development of the LZW algorithm that will generally outperform standard approaches in complexity, readability, and performance.

Acknowledgment

I would like to acknowledge my family, Gurprasad, Vimal and Vibhu Narang, for the guidance and support they offered throughout the implementation of this work. Also, I would like to thank the thesis committee for their input and time in reviewing my work.

Table of Contents

ABSTRACT III
ACKNOWLEDGMENT IV
TABLE OF CONTENTS V
LIST OF FIGURES VII
LIST OF TABLES VIII
GLOSSARY IX
1. INTRODUCTION 1
1.1 General 1
1.2 Information Sources 2
1.3 Deliverables 4
2. INFORMATION THEORY 5
2.1 Information 5
2.2 Redundancy 11
3. COMPRESSION 13
3.1 General Overview 13
3.2 Run Length Encoding 18
3.3 Shannon-Fano Coding 19
3.4 Huffman Coding 23
3.5 Arithmetic Coding 28
3.6 Lempel-Ziv-Welch 32
3.6.1 Basic Algorithms 34
3.6.2 An Example 36
3.6.3 Hashing 40
3.7 Considerations 43
3.8 The Need for Hardware Compression 44
4. ASSOCIATIVE PROCESSING 48
4.1 General - CAM 48
4.2 Integration Issues 54
4.2.1 Constituents of a Cell 54
4.2.2 Scalability 56
4.3 Coherent Processor 58
4.4 Alternatives 60
4.5 Architecture for Compression using Associative Processing 62
4.6 A Variable Length 12 Bit Code Implementation of LZW Compression using Associative Processing 63
4.6.1 Data Structure 64
4.6.2 CP Macros 66
4.6.3 Accomplishments 68
4.7 A Variable Length 16 Bit Code Implementation of LZW Compression using Associative Processing 71
4.7.1 Data Structure 71
4.7.2 CP Macros 72
4.8 Results 73
5. COMMERCIAL HARDWARE COMPRESSORS 79
6. CONCLUSION 80
6.1 Summary 80
6.2 Significance 83
7. REFERENCES 84
8. APPENDIX A: MAIN LZW 12 BIT CODE 87
9. APPENDIX B: CP LZW 12 BIT CODE 94
10. APPENDIX C: MAIN LZW 16 BIT CODE 96
11. APPENDIX D: CP LZW 16 BIT CODE 103

List of Figures

Figure 1. Generic Statistical Compression 15
Figure 2. Generic Adaptive Compression 17
Figure 3. Generic Adaptive Decompression 18
Figure 4. Binary Tree Representation of Shannon-Fano Code 22
Figure 5. Huffman Tree after second iteration 25
Figure 6. Binary Tree representation of Huffman code 26
Figure 7. Incoming Stream to LZW compression 36
Figure 8. Compressed Stream after LZW compression 37
Figure 9. Mask Functionality in an ASP 50
Figure 10. Logic for a single bit of CAM 51
Figure 11. Multiple Response Resolver 52
Figure 12. Static CAM Cell Circuit 54
Figure 13. Dynamic CAM Cell Circuit 55
Figure 14. Proposed ASP Architecture 63
Figure 15. CAM Data Structure for 12 bit LZW 64
Figure 16. Representation of Table 13 in CAM using 12 bit codes 65
Figure 17. User Data Structure for CAM interface 66
Figure 18. CAM Data Structure for 16 bit LZW 71
Figure 19. Representation of Table 13 in CAM using 16 bit codes 72
Figure 20. Compression Comparison using Calgary Corpus 75
Figure 21. Relative performance improvement using ASP 79

List of Tables

Table 1. Entropies of English relative to order 9
Table 2. Probability of a sample alphabet 20
Table 3. First iteration of Shannon-Fano 21
Table 4. Shannon-Fano Code for the sample alphabet of Table 2 21
Table 5. Example of Shannon-Fano Coding 22
Table 6. Huffman Code for Table 2 26
Table 7. Huffman coding example 27
Table 8. Range assignment for text string "ROCHESTER" 30
Table 9. Arithmetic Coding of text string "ROCHESTER" 30
Table 10. Arithmetic decoding of text string "ROCHESTER" 31
Table 11. String table after initialization 36
Table 12. Building of String Table during Compression 36
Table 13. String Table at completion 37
Table 14. Decompression of Compressed Stream up to the seventh code 37
Table 15. Hash table using Open Chaining - Linear Probe 41
Table 16. Calgary Corpus [33] 69
Table 17. Compression results for 12 bit implementation 70
Table 18. Compression results for 16 bit implementation 74
Table 19. Run time approximations for 16 bit version 77
Table 20. Run time approximations for 12 bit version 78

Glossary

Active-X A software component interface architecture proposed by Microsoft.
ASIC Application Specific Integrated Circuit.
ASP Associative Processor(s) or Processing.
CAM Content Addressable Memory.
CP Coherent Processor, an associative processor designed by Coherent Research.
CU Control Unit.
Java-Beans A software component interface architecture proposed by Sun Microsystems.
GPLB General Purpose Logic Block.
LAN Local Area Network.
LFU Least Frequently Used.
LRU Least Recently Used.
LZW Lempel-Ziv-Welch.
MRR Multiple Response Resolver.
MRReg Multiple Response Register.
MRROut Output of the MRR.
OLAP Online Analytical Processing.
PPM Prediction by Partial Matching.
RLE Run Length Encoding.
SIMD Single Instruction Multiple Data.
SR Shift Register.
UDS User Data Structure.
VLSI Very Large Scale Integration.
WAN Wide Area Network.
WCS Writable Control Store.

1. Introduction

1.1 General

Today's world has become an information society. Bob Lucky has stated, "The industrial age has come and gone; the information age is here." [17, p. 2] This is in light of the fact that more than 65 percent of the population of the United States is involved in information related work. Much like the industrial age and other turning points in the history of man, this age too has many implications, the greatest of which is the management of digital information. There are many aspects to this challenge, from creating, securing, sharing, storing, and distributing information to analyzing it. In this paper, solutions to two of these challenges, namely storing and distributing information, are examined.

It is obvious that as information grows, so does the need to store and distribute it. Here, when we speak of storing information, we are referring to a means of physically writing data into a permanent storage device such as magnetic or optical media. Although the density of such devices is ever increasing without an appreciable increase in cost, there are issues that this technological progress does not address. One, those that have already invested in the purchase of storage media or networks will not readily be willing to spend additional money revamping existing systems. Unfortunately, because information is growing at a faster rate than the technology, a "do nothing" philosophy is not the solution. Therefore, the purchase of new hardware is unavoidable, but it must happen in tandem with the utilization of data compression to meet the needs of the user and society. Two, there is the issue of physical space constraints. Left unchallenged, it is not difficult to imagine the trend in computer technology leading toward a future which, like the days of the ENIAC, will have large computer rooms. Unlike those days, when such rooms were used for processing hardware, the space will now be dedicated to data storage. Again, data compression can provide a mechanism to help control this growth in information. Moreover, it is quite common to see the interconnection of high speed LANs via low speed WANs.