Information and Compression
Paul Cockshott, Faraday Partnership

Introduction
■ We will be looking at what information is
■ Methods of measuring it
■ The idea of compression
■ Two distinct ways of compressing images

Summary of Course
■ At the end of this lecture you should have an idea of what information is, its relation to order, and why data compression is possible

Agenda
■ Encoding systems
■ BCD versus decimal
■ Information efficiency
■ Shannon's measure
■ Chaitin, Kolmogorov complexity theory
■ Relation to Goedel's theorem

Overview
■ Data communications channels are bounded
■ Images take a great deal of information in raw form
■ Efficient transmission requires denser encoding
■ This requires that we understand what information actually is

Connections
■ A simple encoding example shows that different densities are possible
■ Shannon's information theory deals with transmission
■ Chaitin relates this to Turing machines and computability
■ Goedel's theorem limits what we can know about compression

Vocabulary
■ information
■ entropy
■ compression
■ redundancy
■ lossless encoding
■ lossy encoding

What is information?
■ Information is measured in bits
■ Bit is equivalent to binary digit
■ Why use the binary system and not decimal?
■ Decimal would seem more natural
■ Alternatively, why not use e, the base of natural logarithms, instead of 2 or 10?

Voltage encoding
■ Digital information is encoded as ranges of continuous variables
■ Each digital value takes a range
■ Isolation bands lie between the ranges
[Figure: binary voltage levels on a 0 V to 5 V scale; logic 1 occupies 2.4 V to 5 V, logic 0 occupies 0 V to 0.8 V, with an isolation band between.]

Noise Immunity
■ Binary: 2 values (0, 1); 1 isolation band of 1.6 volts; good noise immunity, so 0.5 volts of noise on the power supply is no problem
■ Decimal: 10 values (0..9); 9 isolation bands of 0.32 volts each; 0.5 volts of noise on the power supply would destroy readings

Decimal Computing
■ Great advantages of decimal for commerce, banking
■ Burroughs and ICL went on making decimal computers until the late 1970s
■ Burroughs B8 series, ICL System 10
■ These used Binary Coded Decimal (BCD)

BCD encoding of 1998
■ Reliable since encoded in binary
■ 1998 represented as 0001 1001 1001 1000
■ In pure binary this is 0000 0111 1100 1110
■ BCD: 13 significant bits
■ Binary: 11 significant bits
BCD digit codes: 0=0000, 1=0001, 2=0010, 3=0011, 4=0100, 5=0101, 6=0110, 7=0111, 8=1000, 9=1001

BCD less dense
■ BCD requires 16 bits to represent dates up to 9999 (BCD: 1001 1001 1001 1001)
■ Binary requires 14 bits to go up to 9999 (binary: 10 0111 0000 1111)
■ Why?
■ What is the relative efficiency of the two encodings?

Nyquist Limit
■ To capture a frequency with discrete samples, the capture rate must be at least twice the frequency.
[Figure: a sampled waveform, captioned "Here capture works".]

Maxwell's Demon
■ The daemon opens the door only for fast molecules to go from A to B.
[Figure: two chambers A and B separated by a door. "BSD Daemon Copyright 1988 by Marshall Kirk McKusick. All Rights Reserved."]

End Result – Free Heat
■ Slow molecules in A, fast in B
■ Thus B is hotter than A, and the difference can be used to power a machine
■ This apparently contradicts the second law of thermodynamics
■ The entropy of the system has been reduced

Szilard's response
■ Szilard pointed out that to decide which molecules to let through, the daemon must measure their speed. Szilard showed that these measurements (bouncing photons off the molecules) would use up more energy than was gained.

Topic Two: Shannon's Theory
■ Shannon defines a bit as the amount of information required to decide between two equally probable outcomes
■ Example: a sequence of tosses of a fair coin can be encoded at 1 bit per toss, such that heads are 1 and tails are 0.
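The BCD-versus-binary comparison above is easy to check mechanically. The following is a minimal Python sketch (the helper names to_bcd and to_binary are illustrative, not from the slides) that encodes a number both ways and counts the bits:

def to_bcd(n):
    # 4 bits per decimal digit, using the BCD digit codes listed above
    return ''.join(format(int(d), '04b') for d in str(n))

def to_binary(n):
    # pure binary encoding of the same value
    return format(n, 'b')

for n in (1998, 9999):
    bcd, binary = to_bcd(n), to_binary(n)
    print(n, 'BCD:', bcd, len(bcd), 'bits;', 'binary:', binary, len(binary), 'bits')

# 1998: BCD 0001100110011000 (16 bits, 13 significant), binary 11111001110 (11 bits)
# 9999: BCD 1001100110011001 (16 bits), binary 10011100001111 (14 bits)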
Information as Surprise
According to Shannon, the information content of a message is a function of how surprised we are by it. The less probable a message, the more information it contains:

    H = -log2 p

where H is the information, also called entropy, and p is the probability of occurrence of the message. The mean information content of an ensemble of messages is obtained by weighting the messages by their probability of occurrence:

    mean H = -Σ p log2 p

This explains the inefficiency of BCD
■ Consider the frequency of occurrence of 1 and 0 in the different columns of a BCD digit:

    Value    8s  4s  2s  1s
    0        0   0   0   0
    1        0   0   0   1
    2        0   0   1   0
    3        0   0   1   1
    4        0   1   0   0
    5        0   1   0   1
    6        0   1   1   0
    7        0   1   1   1
    8        1   0   0   0
    9        1   0   0   1
    total    2   4   4   5
    Prob(1)  0.2 0.4 0.4 0.5

■ Only the least significant bit decides between equiprobable outcomes
■ For the most significant bit, 0 is four times more common than 1

Determining encoding efficiency
■ For the most significant bit of BCD we have p = 0.2 of a 1 and p = 0.8 of a 0
■ Applying Shannon's formula, a 1 carries 2.321 bits of information, but weighted by its probability of 0.2 it contributes only 0.464 bits
■ The sum of the information for 0 and 1 is 0.722 bits

Mean info of a BCD digit

    Bit position         8s     4s     2s     1s
    Prob of a 1          0.2    0.4    0.4    0.5
    Information in bits  0.722  0.971  0.971  1
    Total for BCD digit  3.664

Thus the 4 bits of a BCD digit contain at most 3.664 bits of information. Binary is a denser encoding. (A short calculation reproducing this figure is sketched at the end of this section.)

Channel capacity
■ The amount of data that can be sent along a physical channel – for example a radio frequency channel – depends upon two things:
1. The bandwidth of the channel – this follows from Nyquist
2. The signal to noise ratio of the channel

Signal to noise ratio
■ As the signal to noise ratio falls, the probability that you get an error in the data sent rises. In order to protect against errors one needs to use error-correcting codes, which have greater redundancy.

Channel capacity formula

    C = W log2((P + N) / N)

■ Where
  – C is the channel capacity in bits per second
  – P is the signal power (say in watts)
  – N is the noise power (say in watts)
  – W is the bandwidth of the channel
■ Note that the dimensions of P and N cancel
■ A numerical example is sketched at the end of this section

Topic Three: Algorithmic Information
Chaitin defines the information content of a number (or sequence of binary digits) to be the length of the shortest computer program capable of generating that sequence.
He uses Turing machines as canonical computers.
Thus the information content of a sequence is the shortest Turing machine tape that would cause the machine to halt with the sequence on its output tape.

Randomness and π
We know from Shannon that 1 million tosses of a fair coin generate 1 million bits of information.
On the other hand, from Chaitin we know that π to 1 million bits of precision contains much less than 1 million bits, since the program to compute π can be encoded in much less than 1 million bits.

Randomness
Andrei Kolmogorov defined a random number as a number for which no formula shorter than itself exists.
By Chaitin's definition of information, a random number is thus incompressible.
A random sequence therefore contains the maximum information.

Randomness: the goal of compression
■ A fully compressed data sequence is indistinguishable from a random sequence of 0s and 1s.
■ This follows directly from Kolmogorov's and Chaitin's results.
■ From Shannon we also have the result that for each bit of the stream to have maximal information it must mimic the tossing of a fair coin: be unpredictable, random.
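The 3.664-bit figure for a BCD digit quoted above can be reproduced with a few lines of Python. This is a sketch only; the column weights and the assumption that the ten digit values are equally likely come from the slides.

from math import log2

def entropy(p):
    # mean information -p*log2(p) - (1-p)*log2(1-p) of a binary column
    return -(p * log2(p) + (1 - p) * log2(1 - p))

total = 0.0
for name, weight in (('8s', 8), ('4s', 4), ('2s', 2), ('1s', 1)):
    p_one = sum(1 for d in range(10) if d & weight) / 10  # probability the column holds a 1
    h = entropy(p_one)
    total += h
    print(f'{name}: p(1) = {p_one:.1f}, information = {h:.3f} bits')
print(f'Total for a BCD digit: {total:.3f} bits')  # prints about 3.664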
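The channel capacity formula above can likewise be tried numerically. The figures used here (a 3 kHz channel with a signal-to-noise ratio of 1000, roughly a telephone line) are illustrative assumptions, not values from the slides.

from math import log2

def channel_capacity(w_hz, signal_power, noise_power):
    # C = W log2((P + N) / N), in bits per second; P and N in the same units
    return w_hz * log2((signal_power + noise_power) / noise_power)

print(channel_capacity(3000, 1000.0, 1.0))  # about 29,900 bits per second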
Information is not meaning
■ One million digits of π are more meaningful than one million random bits.
■ But they contain less information.
■ Meaningful sequences generally contain redundancy.

Information is not complexity
■ One million digits of π have less information than a million random bits, but they are more complex.
■ The complexity of a sequence is measured by the number of machine cycles a computer would have to go through to generate it (Bennett's notion of logical depth).

Topic Four: Limits to compression
■ We cannot, in principle, come up with an algorithm for performing the maximum compression on any bit sequence.
■ 3/7 is a rule of arithmetic, an algorithm that generates the sequence 0.42857142857...
■ So this sequence is presumably less random than 0.32857142877 (2 digits changed)
■ But we can never be sure.

Consequence of Goedel's theorem
■ Goedel showed that a consistent set of arithmetic axioms cannot be complete: there will be true statements that cannot be proven.
■ If there existed a general procedure to derive the minimal Turing machine program for any sequence, then we would have a procedure to derive any true proposition from a smaller set of axioms, contra Goedel.

Image sizes
• QCIF: 160 x 120 (used in video phones)
• MPEG: 352 x 288
• VGA: 640 x 480
• NTSC: 720 x 486 (US television)
• Workstation: 1280 x 1024
• HDTV: 1920 x 1080
• 35mm slide: 3072 x 2048

Storage capacities: floppy disk
■ Floppy disk capacity = 1.44 MB
■ A single 1280x1024x24 image = 3.9 MB
■ A single 640x480x24 image = 922 KB
■ A floppy disk holds only one 640x480x24 image.

Video on CD-ROM
■ CD-ROM capacity = 600 MB
■ A 160x120x16 image @ 30 fps (QCIF) = 1.15 MB/sec
■ A CD-ROM thus holds 8.7 minutes of video (still not enough), and only at about a quarter of the image size of normal TV

Telecoms links
■ Ethernet = 10 megabits/sec
■ ISDN = 128 Kbits/sec
■ ADSL = max 2 megabits/sec
■ Exercise: compute the maximum frame rates for QCIF over each of these channels (a worked sketch follows the reading list below).

Summary
■ Information = randomness = disorder = entropy.
■ Meaningful data is generally compressible.
■ Information is measured by the length of the program needed to generate it.
■ We will look at two techniques that apply these ideas to images: fractal compression and vector quantization.

Where to get more information
■ 'The User Illusion' by Tor Norretranders, Penguin, 1998.
■ 'Information, Randomness and Incompleteness' by Chaitin.
■ 'Information is Physical' by Landauer, in Physics Today, May 1991.
■ 'Logical depth and physical complexity' by Bennett, in The Universal Turing Machine: A Half-Century Survey.
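A possible worked answer to the frame-rate exercise posed under "Telecoms links", sketched in Python. It assumes uncompressed 160x120 QCIF frames at 16 bits per pixel, as in the "Video on CD-ROM" slide, and the channel rates quoted above.

bits_per_frame = 160 * 120 * 16  # one uncompressed QCIF frame = 307,200 bits

channels = {                     # channel rates in bits per second, as quoted above
    'Ethernet': 10_000_000,
    'ISDN': 128_000,
    'ADSL (max)': 2_000_000,
}

for name, rate in channels.items():
    print(f'{name}: about {rate / bits_per_frame:.1f} frames per second')

# Ethernet ~32.6 fps, ADSL ~6.5 fps, ISDN ~0.4 fps -- hence the need for compression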