Huffman Coding Why Compress Files?

Data Compression: Huffman Coding
10.1 in Weiss (p. 389)

Why compress files?
• For long-term storage (disk space is limited)
• For transferring files over the internet (bigger files take longer)
• A smaller file is more likely to fit in memory/cache

What is a file?
• C++ program code
• Executable program
• Email (text)
• HTML document
• Pictures (lossy): JPEG
• Video (lossy): MPEG
• Audio (lossy): MP3

Data Compression
X (original) → Encoder → Y (compressed) → Decoder → X’ (decompressed)
• Lossless compression: X = X’
• Lossy compression: X != X’
• Compression ratio: |X|/|Y|
  – where |X| is the number of bits in X

Lossy Compression
• Some data is lost, but not too much.
Standards:
• JPEG (Joint Photographic Experts Group)
  – stills
• MPEG (Moving Picture Experts Group)
  – audio and video
• MP3 (MPEG-1, Layer 3)

Lossless Compression
• No data is lost.
Standards:
• Gzip, Unix compress, zip, GIF, Morse code
• Examples:
  – Run-length Encoding (RLE)
  – Huffman Coding

RLE
• Idea: compactly represent long ‘runs’ of the same character
• “aaarrrrr!” as ‘a’x3, ‘r’x5, then ‘!’
• Say…
  – Replace each ‘run’ of the same character by 2 characters: 1) the character and 2) the length
  – ‘bee’ becomes ‘b’,1,‘e’,2
  – When is this good?
  – When is this really bad?
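A minimal Python sketch of the RLE idea, assuming each run is stored as a (character, length) pair rather than packed into two bytes; `rle_encode` and `rle_decode` are hypothetical names, not from the slides. Decoding restores the input exactly, so the scheme is lossless (X = X’).

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Replace each run of identical characters by a (character, run length) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand each (character, run length) pair back into its run."""
    return "".join(ch * count for ch, count in pairs)


if __name__ == "__main__":
    for s in ["aaarrrrr!", "bee", "abcdef"]:
        pairs = rle_encode(s)
        # Each pair costs 2 items (character + length) versus 1 item per original character.
        print(f"{s!r}: {pairs} -> {2 * len(pairs)} items vs {len(s)} originally")
        assert rle_decode(pairs) == s  # lossless round trip
```

On “aaarrrrr!” this produces three pairs (6 items instead of 9 characters), while a string with no repeats such as “abcdef” produces six pairs (12 items instead of 6), which is one way to think about the “good” and “really bad” cases.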
Another idea: Use fewer bits per character
• ASCII = fixed 8 bits per character
• Example: “hello there”
  – 11 characters * 8 bits = 88 bits
• Can we encode this message using fewer bits?
• We could look JUST at the message: there are only 6 distinct letters (h, e, l, o, t, r) plus one space = 7 symbols, so a fixed-length code needs only 3 bits per character (11 * 3 = 33 bits)

Huffman Coding
• Uses frequencies of symbols in a string to build a prefix code.
• Prefix code: no code in our encoding is a prefix of another code.

  Letter  Code
  a       0
  b       100
  c       101
  d       11

• Encode: aabddcaa
  – A fixed-length code could do it in 16 bits (2 bits per character)
  – Huffman can do it in 14 bits

Decoding a Prefix Code
Loop
  start at root of tree
  loop
    if bit read = 1 then go right
    else go left
  until node is a leaf
  report character found!
until end of the message

Decode: 11100010100110 (using the code table above)

Huffman Trees
Cost of a Huffman tree containing n symbols:
  $C(T) = p_1 r_1 + p_2 r_2 + p_3 r_3 + \cdots + p_n r_n = \sum_{i=1}^{n} p_i r_i$
where:
  $p_i$ = the probability that symbol i occurs
  $r_i$ = the length of the path from the root to the node for symbol i

Example Cost

  Letter  Frequency  Code
  a       .50        0
  b       .125       100
  c       .125       101
  d       .25        11

Cost: .50*1 + .125*3 + .125*3 + .25*2 = 1.75

Constructing a tree
• Determine the frequency of each letter/symbol
• Place each as an unconnected leaf node
• Repeatedly merge the two nodes with the lowest frequencies into one node whose frequency is the sum of theirs
• Huffman coding is optimal*

Constructing a tree example
• Encode “a java jar”
• 4 a’s, 2 spaces, 2 j’s, 1 v, 1 r; 10 total
• Leaves: a: .4, space: .2, j: .2, v: .1, r: .1
• Merges (two lowest each time; the first node named becomes the left child):
  – v (.1) + r (.1) → .2
  – j (.2) + [v, r] (.2) → .4
  – space (.2) + [j, v, r] (.4) → .6
  – a (.4) + [space, j, v, r] (.6) → 1.0 (root)
• Label every left branch 0 and every right branch 1; reading the labels from the root gives:
  a: 0   space: 10   j: 110   v: 1110   r: 1111
Cost = .4*1 + .2*2 + .2*3 + .1*4 + .1*4 = 2.2
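The fixed-length bit counts from the “hello there” and “aabddcaa” examples can be checked with a short Python sketch. It assumes a fixed-length code in which each of k distinct symbols gets ⌈log2 k⌉ bits, and it computes the |X|/|Y| compression ratio for 8-bit ASCII versus that code; `fixed_length_bits` and `compression_ratio` are hypothetical helper names.

```python
def fixed_length_bits(message: str) -> int:
    """Total bits if every distinct symbol in `message` gets a code of the same width."""
    k = len(set(message))
    if k == 0:
        return 0
    # (k - 1).bit_length() equals ceil(log2(k)) for k >= 2; a single symbol still needs 1 bit.
    width = max(1, (k - 1).bit_length())
    return width * len(message)


def compression_ratio(original_bits: int, compressed_bits: int) -> float:
    """|X| / |Y|: bits before compression divided by bits after."""
    return original_bits / compressed_bits


if __name__ == "__main__":
    msg = "hello there"
    ascii_bits = 8 * len(msg)            # 11 characters * 8 bits
    fixed_bits = fixed_length_bits(msg)  # 7 distinct symbols -> 3 bits each
    print(ascii_bits, fixed_bits, round(compression_ratio(ascii_bits, fixed_bits), 2))
    print(fixed_length_bits("aabddcaa"))  # 4 distinct symbols -> 2 bits each
```

This gives 88 bits for ASCII, 33 bits for the 3-bit code (a ratio of about 2.67), and 16 bits for “aabddcaa”, matching the figures on the slides.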
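The prefix-code decoding loop can be made runnable in Python. This sketch (the names `Node`, `build_tree`, `encode`, and `decode` are hypothetical) first turns the letter/code table into a binary tree, with 0 meaning “go left” and 1 meaning “go right” as in the loop, then walks it bit by bit, reporting a character and restarting at the root whenever it reaches a leaf.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    symbol: Optional[str] = None    # only leaves carry a symbol
    left: Optional["Node"] = None   # followed on bit 0
    right: Optional["Node"] = None  # followed on bit 1

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build_tree(code: dict[str, str]) -> Node:
    """Turn a prefix-code table into a binary tree (0 = left, 1 = right)."""
    root = Node()
    for symbol, bits in code.items():
        node = root
        for bit in bits:
            side = "right" if bit == "1" else "left"
            if getattr(node, side) is None:
                setattr(node, side, Node())
            node = getattr(node, side)
        node.symbol = symbol
    return root


def encode(message: str, code: dict[str, str]) -> str:
    """Concatenate the code of each character."""
    return "".join(code[ch] for ch in message)


def decode(bits: str, root: Node) -> str:
    """Walk from the root until a leaf, report its character, and repeat."""
    out = []
    node = root
    for bit in bits:
        node = node.right if bit == "1" else node.left  # 1 = right, else left
        if node is None:
            raise ValueError("bit string does not match the code")
        if node.is_leaf():
            out.append(node.symbol)  # report character found
            node = root              # start again at the root
    if node is not root:
        raise ValueError("message ends in the middle of a code")
    return "".join(out)


CODE = {"a": "0", "b": "100", "c": "101", "d": "11"}
TREE = build_tree(CODE)

if __name__ == "__main__":
    bits = encode("aabddcaa", CODE)
    print(bits, len(bits))
    print(decode("11100010100110", TREE))
```

It encodes “aabddcaa” in 14 bits and decodes 11100010100110 as “dbacaada”. Because no code is a prefix of another, reaching a leaf always marks the end of exactly one codeword, so the walk never has to back up.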
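The tree-construction steps can be sketched in Python with `heapq` serving as the priority queue. This is an illustration, not the course's code: `huffman_code` and `cost` are hypothetical names, and ties between equal frequencies are broken by order of first appearance in the message. With that tie-breaking, “a java jar” gets a differently shaped tree than the one on the slides (space = 00, j = 01, v = 100, r = 101, a = 11), but its cost is still 2.2, because Huffman's algorithm yields an optimal code however ties are broken.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(message: str) -> dict[str, str]:
    """Build a Huffman code for `message` using a binary heap as the priority queue."""
    freq = Counter(message)               # determine frequency of each symbol
    if not freq:
        return {}
    order = count()                       # tie-breaker so equal weights never compare nodes
    # Heap entry: (weight, tie-breaker, node); a node is a symbol (leaf) or a (left, right) pair.
    heap = [(w, next(order), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                  # merge the two lowest-weight nodes
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(order), (left, right)))

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, str):         # leaf: record its path from the root
            codes[node] = prefix or "0"   # a one-symbol message still needs one bit
        else:
            walk(node[0], prefix + "0")   # left branch = 0
            walk(node[1], prefix + "1")   # right branch = 1

    walk(heap[0][2], "")
    return codes


def cost(probabilities: dict[str, float], codes: dict[str, str]) -> float:
    """C(T) = sum of p_i * r_i, where r_i is the length of symbol i's code (its leaf depth)."""
    return sum(p * len(codes[s]) for s, p in probabilities.items())


if __name__ == "__main__":
    msg = "a java jar"
    codes = huffman_code(msg)
    probs = {s: n / len(msg) for s, n in Counter(msg).items()}
    print(codes, round(cost(probs, codes), 4))
    # The slide's tree has a different shape but the same cost.
    slide = {"a": "0", " ": "10", "j": "110", "v": "1110", "r": "1111"}
    print(round(cost(probs, slide), 4))
```

The same `cost` function reproduces the 1.75 of the Example Cost table when given that table's frequencies and codes.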
Run-time? • To decode an encoded message length n: O(n) • To decode an encoded message length n: O(n) • To encode message length n, with c possible • To encode message length n, with c possible characters characters • Count frequencies: • Count frequencies: O(n) • Build tree: • Build tree: O(clogc) (with priority queue) • Encode: • Encode: O(n) 33 34.
