6.02 Lecture 1: Overview: Information and Entropy
6.02 Fall 2012 Lecture #1
• Digital vs. analog communication
• The birth of modern digital communication
• Information and entropy
• Codes, Huffman coding
6.02 Fall 2012 Lecture 1, Slide #1

Digital vs. Analog Communication
• Communicating a continuous-time waveform (e.g., voltage from a microphone), via amplitude modulation (AM) or frequency modulation (FM) or …
– Analog electronics
– Fidelity to the waveform
• Communicating a message comprising a discrete-time sequence of symbols from some source alphabet
– Often coded onto some other sequence of symbols that's adapted to the communication channel, e.g., binary digits, and …
– Often involving analog communication across the physical channel
– Fidelity to the message
– Well suited to riding the staggering growth in computational power, storage, big data, …
6.02 Fall 2012 Lecture 1, Slide #3

6.02 Syllabus
Point-to-point communication channels (transmitter → receiver):
• Encoding information (BITS)
• Models of communication channels (SIGNALS)
• Noise, bit errors, error correction
• Sharing a channel
Multi-hop networks:
• Packet switching, efficient routing (PACKETS)
• Reliable delivery on top of a best-efforts network
6.02 Fall 2012 Lecture 1, Slide #4

Samuel F. B. Morse
• Invented (… onwards, patent …) the most practical form of electrical telegraphy, including keys, wire arrangements, electromagnets, marking devices, relays, …, and Morse code!
• Worked tirelessly to establish the technology
• After initial struggles, telegraphy was quickly adopted and widely deployed
– Trans-Atlantic cable attempts (… hours to send … words from Queen Victoria to President Buchanan!), …, finally success in … (… words/minute)
– Trans-continental US line in … (effectively ended the Pony Express)
– Trans-Pacific …
• Telegraphy transformed communication (trans-Atlantic time from days by ship to minutes by telegraph) and commerce, and also spurred major developments in EM theory & practice (Henry, Kelvin, Heaviside, Pupin, …)
6.02 Fall 2012 Lecture 1, Slide #7

Fast-forward … years
• Via:
– Telephone (“Improvement in Telegraphy”, patent …, Bell)
– Wireless telegraphy (Marconi)
– AM radio (Fessenden)
– FM radio (Armstrong)
– Television broadcasting by the BBC (…)
– …
• Bell Labs galaxy of researchers
– Nyquist, Bode, Hartley, …
6.02 Fall 2012 Lecture 1, Slide #13

Claude E. Shannon, 1916-2001
Master's thesis, EE Dept, MIT: introduced the application of Boolean algebra to logic circuits, and vice versa. Very influential in digital circuit design. “Most important master's thesis of the century.”
PhD, Math Dept, MIT: to analyze the dynamics of Mendelian populations.
Joined Bell Labs in ….
MIT faculty 1956-1978.
“A mathematical theory of cryptography” …
[Photograph © source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/fairuse.]
6.02 Fall 2012 Lecture 1, Slide #14

A probabilistic theory requires at least a one-slide checklist on Probabilistic Models!
• Universe of elementary outcomes s1, s2, …. One and only one outcome occurs in each experiment or run of the model.
• Events A, B, C, … are subsets of outcomes. We say event A has occurred if the outcome of the experiment lies in A.
• Events form an “algebra” of sets, i.e., A∪B (union) is an event, A∩B (intersection, also written AB) is an event, and the complement of A is an event. So the universe and the null set are also events.
• Probabilities are defined on events, such that P(A) ≥ 0, the whole universe has probability 1, and P(A∪B) = P(A) + P(B) if A and B are mutually exclusive, i.e., if A∩B is empty. More generally, P(A∪B) = P(A) + P(B) − P(A∩B).
• Events A, B, C, D, E, etc., are said to be (mutually) independent if the joint probability of every combination of these events factors into the product of the individual probabilities, e.g., P(A∩B∩E) = P(A)P(B)P(E), P(A∩B) = P(A)P(B), etc.
• Conditional probability of A, given that B has occurred: P(A|B) = P(A∩B)/P(B).
• Expected value of a random variable is the probability-weighted average.
6.02 Fall 2012 Lecture 1, Slide #15
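The checklist above is terse, so here is a minimal Python sketch (not part of the original slides; the fair-die universe and the event names A and B are illustrative assumptions) that spells out the union rule, the conditional-probability definition, and the expected value by direct enumeration.

```python
from fractions import Fraction

# Universe of equally likely elementary outcomes: one roll of a fair die.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event (a subset of omega) under the uniform model."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # event "roll is even"
B = {4, 5, 6}   # event "roll is at least 4"

# General union rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)        # both sides equal 2/3

# Conditional probability: P(A | B) = P(A ∩ B) / P(B)
assert P(A & B) / P(B) == Fraction(2, 3)

# Expected value of the roll = probability-weighted average of the outcomes
assert sum(s * P({s}) for s in omega) == Fraction(7, 2)
```

Here `|` and `&` are Python's set union and intersection operators, standing in for ∪ and ∩ on events.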
Measuring Information
Shannon's (and Hartley's) definition of the information obtained on being told the outcome si of a probabilistic experiment S:
I(S = si) = log2( 1 / pS(si) )
where pS(si) is the probability of the event S = si. The unit of measurement (when the log is base-2) is the bit (binary information unit --- not the same as binary digit!). 1 bit of information corresponds to pS(si) = 0.5. So, for example, when the outcome of a fair coin toss is revealed to us, we have received 1 bit of information.
“Information is the resolution of uncertainty” (Shannon)
6.02 Fall 2012 Lecture 1, Slide #16

Examples
We're drawing cards at random from a standard N=52-card deck. Elementary outcome: the card that's drawn, probability 1/52, information log2(52/1) = 5.7 bits. For an event comprising M such (mutually exclusive) outcomes, the probability is M/52.
Q. If I tell you the card is a spade, how many bits of information have you received?
A. Out of N=52 equally probable cards, M=13 are spades, so the probability of drawing a spade is 13/52, and the amount of information received is log2(52/13) = 2 bits. This makes sense: we can encode one of the 4 (equally probable) suits using 2 binary digits, e.g., 00=♠, 01=♥, 10=♦, 11=♣.
Q. If instead I tell you the card is a seven, how much info?
A. N=52, M=4, so info = log2(52/4) = log2(13) = 3.7 bits
6.02 Fall 2012 Lecture 1, Slide #17

Properties of Information Definition
• A lower-probability outcome yields higher information
• A highly informative outcome does not necessarily mean a more valuable outcome, only a more surprising outcome, i.e., there's no intrinsic value being assessed (can think of information as degree of surprise)
• Often used fact: the information in independent events is additive. (Caution: though independence is sufficient for additivity, it is not necessary, because we can have I(A∩B∩C) = I(A) + I(B) + I(C) even when A, B, C are not independent --- independence requires the pairwise joint probabilities to also factor.)
6.02 Fall 2012 Lecture 1, Slide #18

Expected Information as Uncertainty or Entropy
Consider a discrete random variable S, which may represent the set of possible symbols to be transmitted at a particular time, taking possible values s1, s2, ..., sN, with respective probabilities pS(s1), pS(s2), ..., pS(sN). The entropy H(S) of S is the expected (or mean or average) value of the information obtained by learning the outcome of S:
H(S) = Σi pS(si) I(S = si) = Σi pS(si) log2( 1 / pS(si) ), where the sum runs over i = 1, ..., N.
When all the pS(si) are equal (with value 1/N), then H(S) = log2 N, or N = 2^H(S).
This is the maximum attainable value!
6.02 Fall 2012 Lecture 1, Slide #19
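Both formulas above are one-liners in code. The sketch below (not from the original slides; the function names information_bits and entropy_bits are illustrative) reproduces the card-deck numbers and the uniform-case maximum H(S) = log2 N.

```python
from math import log2

def information_bits(p):
    """Self-information I = log2(1/p), in bits, of an event with probability p."""
    return log2(1 / p)

def entropy_bits(probs):
    """Entropy H = sum of p * log2(1/p) over a discrete distribution, in bits."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Card-deck examples from the slides above (a 52-card deck):
print(round(information_bits(1 / 52), 1))    # a specific card: log2(52)    -> 5.7 bits
print(round(information_bits(13 / 52), 1))   # "it's a spade":  log2(52/13) -> 2.0 bits
print(round(information_bits(4 / 52), 1))    # "it's a seven":  log2(52/4)  -> 3.7 bits

# Entropy of N equally likely outcomes is log2(N) -- the maximum attainable:
print(round(entropy_bits([1 / 52] * 52), 1))   # 5.7 bits
print(round(entropy_bits([0.5, 0.5]), 1))      # fair coin: 1.0 bit

# Binary entropy for a heavily biased coin, as on the next two slides:
print(round(entropy_bits([1 / 1024, 1023 / 1024]), 4))   # about 0.0112 bits/trial
```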
e.g., Binary entropy function h(p)
Heads (or C=1) with probability p; Tails (or C=0) with probability 1 − p.
H(C) = −p log2 p − (1 − p) log2(1 − p) = h(p)
[Plot: h(p) versus p, rising from 0 at p = 0 to a maximum of 1.0 at p = 0.5 and falling back to 0 at p = 1.0.]
6.02 Fall 2012 Lecture 1, Slide #20

Connection to (Binary) Coding
• Suppose p = 1/1024, i.e., very small probability of getting a Head, typically one Head in 1024 trials. Then
h(p) = (1/1024) log2(1024/1) + (1023/1024) log2(1024/1023) = .0112
bits of uncertainty or information per trial on average.
• So using 1024 binary digits (C=0 or 1) to code the results of 1024 tosses of this particular coin seems inordinately wasteful, i.e., 1 binary digit per trial. Can we get closer to an average of .0112 binary digits/trial?
• Yes! Confusingly, a binary digit is also referred to as a bit!
• Binary coding: mapping source symbols to binary digits
6.02 Fall 2012 Lecture 1, Slide #21

Significance of Entropy
Entropy (in bits) tells us the average amount of information (in bits) that must be delivered in order to resolve the uncertainty about the outcome of a trial. This is a lower bound on the number of binary digits that must, on the average, be used to encode our messages.
If we send fewer binary digits on average, the receiver will have some uncertainty about the outcome described by the message.
If we send more binary digits on average, we're wasting the capacity of the communications channel by sending binary digits we don't have to.
Achieving the entropy lower bound is the “gold standard” for an encoding (at least from the viewpoint of information compression).
6.02 Fall 2012 Lecture 1, Slide #22

Fixed-length Encodings
An obvious choice for encoding equally probable outcomes is to choose a fixed-length code that has enough sequences to encode the necessary information:
• 96 printing characters → 7-“bit” ASCII
• Unicode characters → UTF-16
• 10 decimal digits → 4-“bit” BCD (binary coded decimal)
Fixed-length codes have some advantages:
• They are “random access” in the sense that to decode the nth message symbol one can decode the nth fixed-length sequence without decoding sequences 1 through n−1.
• Table lookup suffices for encoding and decoding
6.02 Fall 2012 Lecture 1, Slide #23

Now consider:
choice_i   p_i    log2(1/p_i)
A          1/3    1.58 bits
B          1/2    1 bit
C          1/12   3.58 bits
D          1/12   3.58 bits
The expected information content in a choice is given by the entropy:
(.333)(1.58) + (.5)(1) + (2)(.083)(3.58) = 1.626 bits
Can we find an encoding where transmitting 1000 choices requires 1626 binary digits on the average? The natural fixed-length encoding uses two binary digits for each choice, so transmitting the results of 1000 choices requires 2000 binary digits.
6.02 Fall 2012 Lecture 1, Slide #24

Variable-length encodings
(David Huffman, in a term paper for an MIT graduate class, 1951)
Use shorter bit sequences for high-probability choices, longer sequences for less probable choices:
choice_i   p_i    encoding
A          1/3    10
B          1/2    0
C          1/12   110
D          1/12   111
Example: the message B C A B A D encodes to 011010010111.
Expected length = (.333)(2) + (.5)(1) + (2)(.083)(3) = 1.666 bits, so transmitting 1000 choices takes an average of 1666 bits… better but not optimal.
Huffman decoding tree: from the root, 0 → B; 1 0 → A; 1 1 0 → C; 1 1 1 → D.
Note: the symbols are at the leaves of the tree; this is necessary and sufficient for instantaneous decodability.
6.02 Fall 2012 Lecture 1, Slide #25

Huffman's Coding Algorithm
• Begin with the set S of symbols to be encoded as binary strings, together with the probability p(s) for each symbol s in S.
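The section breaks off after the first step of the algorithm. As a rough sketch of the standard greedy construction (my own illustration, not necessarily the exact formulation the rest of the deck uses; Python's heapq stands in for the repeated "pick the two least probable" step), the following reproduces the A/B/C/D example above.

```python
import heapq
from fractions import Fraction

def huffman_code(probs):
    """Greedy Huffman construction: repeatedly merge the two least probable
    subtrees, prefixing '0' to the codewords on one side and '1' on the other."""
    # Heap entries: (probability, tie-breaker, {symbol: codeword-so-far})
    heap = [(Fraction(p), i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p0 + p1, next_id, merged))
        next_id += 1
    return heap[0][2]

probs = {"A": Fraction(1, 3), "B": Fraction(1, 2),
         "C": Fraction(1, 12), "D": Fraction(1, 12)}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)

# Codeword lengths come out 2, 1, 3, 3 for A, B, C, D, matching the slide's
# code (the exact bit patterns can differ by swapping 0/1 labels at any node).
print(code)
print(float(avg_len))   # 1.666..., vs. the entropy lower bound of about 1.626
```

For this source the greedy construction gives an average of about 1.67 binary digits per choice, close to (but, as the slides note, still above) the 1.626-bit entropy bound.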