
Data Compression: Huffman Coding
10.1 in Weiss (p.389)

Why compress files?
• For long-term storage (disc space is limited)
• For transferring files over the internet (bigger files take longer)
• A smaller file is more likely to fit in memory/cache

What is a file?
• C++ program code
• Executable program
• Email (text)
• HTML document
• Pictures (lossy): JPEG
• Video (lossy): MPEG
• Audio (lossy): MP3

Data Compression
[Diagram: original X → Encoder → compressed Y → Decoder → decompressed X']
• Lossless compression: X = X'
• Lossy compression: X != X'
• Compression ratio: |X|/|Y|
  – where |X| is the number of bits in X

Lossy Compression
• Some data is lost, but not too much.
Standards:
• JPEG (Joint Photographic Experts Group): stills
• MPEG (Motion Picture Experts Group): audio and video
• MP3 (MPEG-1, Layer 3)

Lossless Compression
• No data is lost.
Standards:
• Gzip, Unix compress, zip, GIF, Morse code
• Examples:
  – Run-length Encoding (RLE)
  – Huffman Coding

RLE
• Idea: compactly represent long 'runs' of the same character
• "aaarrrrr!" as 'a'x3, 'r'x5, then '!'
• Say…
  – Replace every run of the same character by 2 characters: 1) the character and 2) the length
  – 'bee' becomes 'b',1,'e',2
  – When is this good?
  – When is this really bad?
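The RLE rule above (each run becomes the character followed by its length) can be sketched in a few lines of Python. This minimal version assumes the output is a list of (character, length) pairs rather than a packed byte format; rle_encode and rle_decode are hypothetical helper names.

```python
from itertools import groupby

def rle_encode(text: str) -> list[tuple[str, int]]:
    """Replace each run of identical characters by a (character, run length) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand each (character, run length) pair back into its run."""
    return "".join(ch * length for ch, length in pairs)

print(rle_encode("aaarrrrr!"))  # [('a', 3), ('r', 5), ('!', 1)]
print(rle_encode("bee"))        # [('b', 1), ('e', 2)]
```

Counting each pair as 2 symbols, "aaarrrrr!" shrinks from 9 characters to 6, while "bee" grows from 3 to 4.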
Another idea: Use fewer bits per character
ASCII = fixed 8 bits per character
Example: "hello there"
  – 11 characters * 8 bits = 88 bits
Can we encode this message using fewer bits?
• We could look JUST at the message
• There are only 6 distinct letters + one space = 7 symbols, so 3 bits per symbol is enough (2^3 = 8 ≥ 7): 11 * 3 = 33 bits

Huffman Coding
• Uses the frequencies of symbols in a string to build a prefix code.
• Prefix code: no code in our encoding is a prefix of another code.

Letter  Code
a       0
b       100
c       101
d       11

• Encode "aabddcaa": with a fixed 2 bits per character this takes 16 bits (8 characters * 2 bits)
• Huffman can do it in 14 bits

Decoding a Prefix Code
Loop
    start at root of tree
    loop
        if bit read = 1 then go right
        else, go left
    until node is a leaf
    Report character found!
Until end of the message
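The table's prefix code can be exercised with a small Python sketch: encoding is one dictionary lookup per character, and decoding follows the tree-walk pseudocode (go right on 1, left on 0, report a character at each leaf, then restart at the root). Storing the tree as nested dicts, and the names is_prefix_free, build_decode_tree, encode and decode, are assumptions made for illustration.

```python
CODE = {"a": "0", "b": "100", "c": "101", "d": "11"}  # code table from the slide

def is_prefix_free(code: dict[str, str]) -> bool:
    """True if no codeword is a prefix of another symbol's codeword."""
    words = list(code.values())
    return not any(i != j and words[j].startswith(words[i])
                   for i in range(len(words)) for j in range(len(words)))

def build_decode_tree(code: dict[str, str]) -> dict:
    """Store the code tree as nested dicts: key '0' = left child, '1' = right child.
    A leaf is the symbol string itself. Assumes the code is prefix-free."""
    root: dict = {}
    for symbol, bits in code.items():
        node = root
        for bit in bits[:-1]:
            node = node.setdefault(bit, {})
        node[bits[-1]] = symbol
    return root

def encode(message: str, code: dict[str, str]) -> str:
    """Concatenate each character's codeword."""
    return "".join(code[ch] for ch in message)

def decode(bits: str, root: dict) -> str:
    """Tree walk from the pseudocode: restart at the root after every leaf."""
    out = []
    node = root
    for bit in bits:
        node = node[bit]           # '1' -> go right, '0' -> go left
        if isinstance(node, str):  # reached a leaf: report the character found
            out.append(node)
            node = root
    return "".join(out)

tree = build_decode_tree(CODE)
bits = encode("aabddcaa", CODE)
print(bits, len(bits))     # 00100111110100 14
print(decode(bits, tree))  # aabddcaa
```

Because no codeword is a prefix of another, the walk never has to look ahead: the first leaf reached is always the correct symbol.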
Decode: 11100010100110
Letter  Code
a       0
b       100
c       101
d       11

Huffman Trees
Cost of a Huffman tree containing n symbols:
$C(T) = p_1 r_1 + p_2 r_2 + p_3 r_3 + \cdots + p_n r_n$
where:
• $p_i$ = the probability that symbol i occurs
• $r_i$ = the length of the path from the root to the leaf for symbol i (the length of its code)

Example Cost
Letter  Frequency  Code
a       .50        0
b       .125       100
c       .125       101
d       .25        11
Cost: .50*1 + .125*3 + .125*3 + .25*2 = 1.75

Constructing a tree
• Determine the frequency of each letter/symbol
• Place each one as an unconnected leaf node
• Repeatedly merge the two nodes with the lowest frequencies into one node whose frequency is their sum
• Huffman coding is optimal*

Constructing a tree example
• Encode "a java jar"
• 4 a's, 2 spaces, 2 j's, 1 v, 1 r; 10 total
Leaves: a: .4   space: .2   j: .2   v: .1   r: .1
• Merge v (.1) and r (.1) → .2
• Merge j (.2) and {v, r} (.2) → .4
• Merge space (.2) and {j, v, r} (.4) → .6
• Merge a (.4) and {space, j, v, r} (.6) → 1.0
Labeling each left edge 0 and each right edge 1 gives:
a: 0   space: 10   j: 110   v: 1110   r: 1111
Ties may be broken either way (at the third merge, a and {j, v, r} both have .4); the tree changes but the cost does not.
Cost = .4*1 + .2*2 + .2*3 + .1*4 + .1*4 = 2.2 bits per symbol, i.e. 22 bits for the 10-character message (vs. 80 bits in 8-bit ASCII)
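The construction steps and the cost formula C(T) can be sketched in Python, using the standard heapq module as the priority queue. This is an illustrative sketch, not the course's implementation: huffman_codes and tree_cost are hypothetical names, and ties are broken by insertion order through a counter (an assumption of this sketch).

```python
import heapq
from collections import Counter
from fractions import Fraction
from itertools import count

def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Merge the two lowest-frequency nodes until one tree remains,
    then label left edges 0 and right edges 1."""
    order = count()  # unique tie-breakers so heapq never compares two subtrees
    # Heap entries are (frequency, tie-breaker, node); a node is either a
    # symbol (leaf) or a (left, right) tuple (internal node).
    heap = [(f, next(order), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:  # a lone symbol still needs a 1-bit code
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency
        f2, _, right = heapq.heappop(heap)  # second-lowest frequency
        heapq.heappush(heap, (f1 + f2, next(order), (left, right)))

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):  # internal node: recurse into both children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                        # leaf: the path so far is the code
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes

def tree_cost(freqs: dict[str, int], codes: dict[str, str]) -> Fraction:
    """C(T) = p_1*r_1 + ... + p_n*r_n, with p_i = count_i / total and r_i = len(code_i)."""
    total = sum(freqs.values())
    return sum((Fraction(freqs[s], total) * len(codes[s]) for s in freqs), Fraction(0))

freqs = Counter("a java jar")  # a:4, ' ':2, j:2, v:1, r:1
codes = huffman_codes(freqs)
print(float(tree_cost(freqs, codes)))  # 2.2

abcd = Counter("aabddcaa")  # a:4, b:1, c:1, d:2
print(float(tree_cost(abcd, huffman_codes(abcd))))  # 1.75
```

With this tie-breaking, "a java jar" yields code lengths 2, 2, 2, 3, 3 (a, space and j at depth 2; v and r at depth 3) rather than the slides' 1, 2, 3, 4, 4, yet the cost is the same 2.2. Fractions are used so the costs compare exactly instead of accumulating floating-point error.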
Run-time?
• To decode an encoded message of length n: O(n)
• To encode a message of length n, with c possible characters:
  – Count frequencies: O(n)
  – Build tree: O(c log c) (with a priority queue)
  – Encode: O(n)
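The bounds can be added up as follows, assuming a binary heap as the priority queue: heapify the c leaves in O(c), then perform c − 1 merges, each doing two pops and one push on a heap of at most c entries. The encode step additionally assumes each codeword lookup and append takes constant time, which holds when c is a fixed alphabet size such as 256 byte values; for decoding, n is the number of encoded bits and each bit moves one edge down the tree.

```latex
\begin{aligned}
T_{\text{decode}}(n) &= n \cdot O(1) = O(n) \\
T_{\text{build}}(c) &= \underbrace{O(c)}_{\text{heapify } c \text{ leaves}} + (c-1)\cdot\underbrace{O(\log c)}_{\text{2 pops + 1 push}} = O(c \log c) \\
T_{\text{encode}}(n, c) &= \underbrace{O(n)}_{\text{count}} + O(c \log c) + \underbrace{O(n)}_{\text{emit codewords}} = O(n + c \log c)
\end{aligned}
```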