The Smallest Number Requiring Nine Words to Express. ・What Is It

The Smallest Number Requiring Nine Words to Express. ・What Is It

The smallest number requiring nine words to express. ・What is it? – Nine hundred ninety nine million nine thousand ninety nine? – One trillion to the power of one trillion factorial? 1 Algorithms ROBERT SEDGEWICK | KEVIN WAYNE 5.5 DATA COMPRESSION ‣ introduction ‣ run-length coding Algorithms ‣ Huffman compression FOURTH EDITION ‣ LZW compression ‣ Kolmogorov complexity ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu 5.5 DATA COMPRESSION ‣ introduction ‣ run-length coding Algorithms ‣ Huffman compression ‣ LZW compression ‣ Kolmogorov Complexity ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu Data compression Compression reduces the size of a file: ・To save space when storing it. ・To save time when transmitting it. ・Most files have lots of redundancy. Who needs compression? ・Moore's law: # transistors on a chip doubles every 18–24 months. ・Parkinson's law: data expands to fill space available. ・Text, images, sound, video, … “ Everyday, we create 2.5 quintillion bytes of data—so much that 90% of the data in the world today has been created in the last two years alone. ” — IBM report on big data (2011) Basic concepts ancient (1950s), best technology recently developed. 4 Applications Generic file compression. ・Files: GZIP, BZIP, 7z. ・Archivers: PKZIP. ・File systems: NTFS, HFS+, ZFS. Multimedia. ・Images: GIF, JPEG. ・Sound: MP3. ・Video: MPEG, DivX™, HDTV. Communication. ・ITU-T T4 Group 3 Fax. ・V.42bis modem. ・Skype. Databases. Google, Facebook, .... 5 Way back in the day ・Polybius, 200 BC http://people.seas.harvard.edu/~jones/cscie129/images/history/encoding.html 6 Semaphore telegraph Every sensible person will agree that a single man in a single day could, without interference, cut all the electrical wires terminating in Paris; it is obvious that a single man could sever, in ten places, in the course of a day, the electrical wires of a particular line of communication, without being stopped or even recognized. [Clearly, no serious competition could be expected from] “a few wretched wires.’’ ” — Dr. Jules Guoyot (July 5th, 1841) vs The Claude Chappe system (1792) Then: 1 symbol per minute 7 These days Now Video ・Uncompressed 1080p, 100 megabytes per second ・1080p on Netflix, ~0.5 megabytes per second 8 Lossless compression and expansion Message. Binary data B we want to compress. Compress. Generates a "compressed" representation C (B). Expand. Reconstructs original bitstream B. uses fewer bits (you hope) Compress Expand bitstream B compressed version C(B) original bitstream B 0110110101... 1101011111... 0110110101... Basic model for data compression Compression ratio. Bits in C (B) / bits in B. Ex. 50–75% or better compression ratio for natural language. 9 Data representation: genomic code Genome. String over the alphabet { A, C, T, G }. Goal. Encode an N-character genome: ATAGATGCATAG... Standard ASCII encoding. Two-bit encoding. ・8 bits per char. ・2 bits per char. ・8 N bits. ・2 N bits. char hex binary char binary A 41 01000001 A 00 C 43 01000011 C 01 T 54 01010100 T 10 G 47 01000111 G 11 AG AG AG 0100000101000111 0011 0100000101000111 10 Data representation: genomic code Genome. String over the alphabet { A, C, T, G }. Goal. Encode an N-character genome: ATAGATGCATAG... Standard ASCII encoding. Two-bit encoding. ・8 bits per char. ・2 bits per char. ・8 N bits. ・2 N bits. char hex binary char binary A 41 01000001 A 00 C 43 01000011 C 01 T 54 01010100 T 10 G 47 01000111 G 11 Fixed-length code. k-bit code supports alphabet of size 2k. Amazing but true. Initial genomic databases in 1990s used ASCII. 11 664 CHAPTER 6 Strings Binary input and output. Most systems nowadays, including Java, base their I/O on 664 CHAPTER8-bit bytestreams, 6 Strings so we might decide to read and write bytestreams to match I/O for- mats with the internal representations of primitive types, encoding an 8-bit char with 1 byte, a 16-bit short with 2 bytes, a 32-bit int with 4 bytes, and so forth. Since bit- streams are the primary abstraction for data compression, we go a bit further to allow Binary input and output. Most systems nowadays, including Java, base their I/O on clients to read and write individual bits, intermixed with data of various types (primi- 8-bit bytestreams, so we might decide to read and write bytestreams to match I/O for- Reading andtive writing types and Stringbinary). The datagoal is to minimize the necessity for type conversion in mats with the internal representations of primitive types, encoding an 8-bit char with client programs and also to take care of operating-system conventions for representing 1 byte, a 16-bit short with 2 bytes, a 32-bit int with 4 bytes, and so forth. Since bit- data.We use the following API for reading a bitstream from standard input: Binary standardstreams input are the andprimary standard abstraction for output. data compression, Libraries we go a bitto further read to andallow write bits clientspublic to readclass and BinaryStdIn write individual bits, intermixed with data of various types (primi- tive types and String). The goal is to minimize the necessity for type conversion in from standardboolean input readBoolean()and to standard output. boolean client programs and also to take careread of 1operating-system bit of data and return conventions as a for valuerepresenting data.Wechar use thereadChar() following API for readingread 8 bits a bitstream of data and from return standard as a char input: value publicchar classreadChar(int BinaryStdIn r) read r bits of data and return as a char value boolean[similar methodsreadBoolean() for byte (8 bits); shortread 1 (16 bit bits);of data int and (32 return bits); longas a boolean and double value (64 bits)] booleanchar readChar()isEmpty() readis the 8 bitstreambits of data empty? and return as a char value void close() char readChar(int r) readclose r the bits bitstream of data and return as a char value API for static methods that read from a bitstream on standard input [similar methods for byte (8 bits); short (16 bits); int (32 bits); long and double (64 bits)] Aboolean key featureisEmpty() of the abstraction is isthat, the bitstreamin marked empty? constrast to StdIn, the data on stan- dard voidinput isclose() not necessarily alignedclose on thebyte bitstream boundaries. If the input stream is a single byte, a client could read it 1 bit at a time with 8 calls to readBoolean(). The close() API for static methods that read from a bitstream on standard input method is not essential, but, for clean termination, clients should call close() to in- dicate that no more bits are to be read. As with StdIn/StdOut, we use the following A key feature of the abstraction is that, in marked constrast to StdIn, the data on stan- complementary API for writing bitstreams to standard output: dard input is not necessarily aligned on byte boundaries. If the input stream is a single byte,public a client class could BinaryStdOut read it 1 bit at a time with 8 calls to readBoolean(). The close() method is not essential, but, for clean termination, clients should call close() to in- void write(boolean b) write the specifed bit dicate that no more bits are to be read. As with StdIn/StdOut, we use the following complementaryvoid write(char API for writing c) bitstreamswrite to the standard specifed output:8-bit char publicvoid classwrite(char BinaryStdOut c, int r) write the r least signifcant bits of the specifed char [similar methods for byte (8 bits); short (16 bits); int (32 bits); long and double (64 bits)] void write(boolean b) write the specifed bit void close() close the bitstream void write(char c) write the specifed 8-bit char API for static methods that write to a bitstream on standard output void write(char c, int r) write the r least signifcant bits of the specifed char 12 [similar methods for byte (8 bits); short (16 bits); int (32 bits); long and double (64 bits)] void close() close the bitstream API for static methods that write to a bitstream on standard output Writing binary data Date representation. Three different ways to represent 12/31/1999. A character stream (StdOut) A character stream (StdOut) A character stream (StdOut) StdOut.print(month + "/" + day + "/" + year); StdOut.print(month + "/" + day + "/" +StdOut.print(month year); + "/" + day + "/" + year); 00110001001100100010111100110111001100010010111100110001001110010011100100111001 00110001001100100010111100110111001100010010111100110001001110010011100100111001 001100010011001000101111001101110011000100101111001100010011100100111001001110011 2 / 3 1 / 1 9 9 9 1 2 / 3 1 / 1 9 9 9 80 bits 1 2 / 3 1 / 1 9 9 9 80 bits Three ints (BinaryStdOut) 80 bits Three ints (BinaryStdOut) Three ints (BinaryStdOut) BinaryStdOut.write(month); BinaryStdOut.write(month); BinaryStdOut.write(day);BinaryStdOut.write(month); BinaryStdOut.write(day); BinaryStdOut.write(year);BinaryStdOut.write(day); BinaryStdOut.write(year); BinaryStdOut.write(year); 000000000000000000000000000011000000000000000000000000000001111100000000000000000000011111001111 000000000000000000000000000011000000000000000000000000000001111100000000000000000000011111001111 00000000000000000000000000001100000000000000000000000000000111110000000000000000000001111100111112 31 1999 96 bits 12 31 1999 12 31 1999 96 bits Two chars and a short (BinaryStdOut) 96A bits 4-bit feld, a 5-bit feld, and a 12-bit feld (BinaryStdOut) Two chars and a short (BinaryStdOut) A 4-bit feld, a 5-bit feld, and a 12-bit feld (BinaryStdOut) Two chars and a short (BinaryStdOut) ABinaryStdOut.write((char) 4-bit feld, a 5-bit feld, and a 12-bit feld (BinaryStdOut)month);

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    100 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us