Efficient Sequential Algorithms, Comp309

Efficient Sequential Algorithms, Comp309

Efficient Sequential Algorithms, Comp309 University of Liverpool 2010–2011 Module Organiser, Igor Potapov Supplementary Material: String Algorithms. 1 / 16 Lempel-Ziv Welch (LZW) Compression We will now look at the Lempel-Ziv Welch compression algorithm, which is a lossless compression algorithm that does particularly well on data with repetitions. A useful feature of LZW compression is that the dictionary is built adaptively during encoding. The dictionary does not need to be passed with the compressed text — the decoding algorithm produces the same dictionary from the compressed text. 2 / 16 Lempel-Ziv Welch (LZW) Compression We will now look at the Lempel-Ziv Welch compression algorithm, which is a lossless compression algorithm that does particularly well on data with repetitions. A useful feature of LZW compression is that the dictionary is built adaptively during encoding. The dictionary does not need to be passed with the compressed text — the decoding algorithm produces the same dictionary from the compressed text. 2 / 16 The original paper that describes the LZW algorithm is: Terry A. Welch. A Technique for High Performance Data Compression. IEEE Computer, Vol. 17, No. 6, 1984, pp. 8-19. This paper describes an improvement to a compression method introduced by Ziv and Lempel in 1977 and 1978. LZW and variants have been used in popular software such as Unix compress and GIF compression. 3 / 16 The original paper that describes the LZW algorithm is: Terry A. Welch. A Technique for High Performance Data Compression. IEEE Computer, Vol. 17, No. 6, 1984, pp. 8-19. This paper describes an improvement to a compression method introduced by Ziv and Lempel in 1977 and 1978. LZW and variants have been used in popular software such as Unix compress and GIF compression. 3 / 16 Sources There is a lot of information about LZW on the web. See, for example, Wikipedia, or the nice animation at http://www.data-compression.com/lempelziv.shtml Also, see Dave Marshall’s notes. http://www.cs.cf.ac.uk/Dave/Multimedia 4 / 16 Compression The dictionary is initialised so that there is a codeword for every extended ASCII character. character code word null 0 ::: ::: A 65 B 66 C 67 D 68 ::: ::: 255 5 / 16 The compression algorithm Initialise dictionary w NIL while there is a character to read k next character in text If wk is in the dictionary w wk Else Add wk to the dictionary Output the code for w w k Output the code for w 6 / 16 Example T = ABACABA dictionary ::: ::: Initialise dictionary A 65 w NIL B 66 while there is a char to read rest of C 67 k next char in text w k text ::: ::: If wk is in the dictionary NIL A BACABA output w wk A B ACABA AB 256 65 Else B A CABA BA 257 66 Add wk to dictionary A C ABA AC 258 65 Output code for w C A BA CA 259 67 w k A B A Output the code for w AB A ABA 260 256 A 65 7 / 16 The Decompression Algorithm The basic decompression algorithm is as follows. Initialise dictionary c first codeword output the translation of c w c While there is a codeword to read c next codeword output the translation of c s translation of w k first character of translation of c Add sk to dictionary w c 8 / 16 Example Code = 65,66,65,67,256,65 dictionary ::: ::: Initialise dictionary A 65 c first codeword B 66 output the translation of c C 67 w c c w s k ::: ::: output While there is a codeword 65 65 A to read 66 A B B c next codeword 66 AB 256 output the translation of c 65 B A A s translation of w 65 BA 257 k first character of 67 A C C translation of c 67 AC 258 Add sk to dictionary 256 C A AB w c 256 CA 259 65 AB A A 65 ABA 260 9 / 16 Refinement The decoding algorithm as stated does not always work as it fails if c is not in the dictionary. 10 / 16 Example dictionary ::: ::: Initialise dictionary A 65 w NIL B 66 while there is a char to read C 67 k next char in text w k ::: ::: If wk is in the dictionary NIL A output w wk A B AB 256 65 Else B C BC 257 66 Add wk to dictionary C A CA 258 67 Output code for w A B w k AB A ABA 259 256 Output the code for w A B AB A ABA 259 11 / 16 Example Code = 65,66,67,256,259 dictionary Initialise dictionary ::: ::: c first codeword A 65 output the translation of c B 66 w c C 67 While there is a codeword c w s k ::: ::: output to read 65 65 A c next codeword 66 A B B output the translation of c 66 AB 256 s translation of w 67 B C C k first character of 67 BC 257 translation of c 256 C A AB Add sk to dictionary 256 CA 258 w c 259 ? 12 / 16 The problem arises when the dictionary has cs in the dictionary for a character c and a string s and then the input contains cscsc. The decompression algorithm can be modified to deal with this case. 13 / 16 The problem arises when the dictionary has cs in the dictionary for a character c and a string s and then the input contains cscsc. The decompression algorithm can be modified to deal with this case. 13 / 16 Initialise dictionary c first codeword output the translation of c w c While there is a codeword to read c next codeword s translation of w If c is in dictionary k first character of translation of c output the translation of c Else (* c is the code for what we add here*) k first character of s output sk Add sk to dictionary w c 14 / 16 Example Code = 65,66,67,256,259 Initialise dictionary dictionary c first codeword ::: ::: output the translation of c A 65 w c B 66 While there is a codeword C 67 c next codeword c w s k ::: ::: output s translation of w 65 65 A If c is in dictionary 66 A B B k 1st char of trans c 66 AB 256 output trans c 67 B C C Else (* c is next added*) 67 BC 257 k first character of s 256 C A AB output sk 256 CA 258 Add sk to dictionary 259 AB A ABA w c 259 ABA 259 15 / 16 There are lots of interesting implementation issues. For example, what if the dictionary runs out of space? Also, if we start re-using dictionary space, what data structure do we use to make dictionary access efficient? GIF compression solves the problem of dictionary overflow by having variable-length codes. We will not cover the details. 16 / 16.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    20 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us