Digital Communication Systems ECS 452
Asst. Prof. Dr. Prapun Suksompong (ผศ.ดร.ประพันธ์ สุขสมปอง) [email protected]
1. Intro to Digital Communication Systems
Office Hours: BKD, 6th floor of Sirindhralai building
Monday 10:00-10:40
Tuesday 12:00-12:40
Thursday 14:20-15:30
“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.”
Shannon, Claude. A Mathematical Theory Of Communication. (1948)
Shannon: Father of the Info. Age
Documentary
Co-produced by the Jacobs School, UCSD-TV, and the California Institute for Telecommunications and Information Technology
Won a Gold award in the Biography category in the 2002 Aurora Awards.
[http://www.uctv.tv/shows/Claude-Shannon-Father-of-the-Information-Age-6090]
[http://www.youtube.com/watch?v=z2Whj_nL-x8]
C. E. Shannon (1916-2001)
1938 MIT master's thesis: A Symbolic Analysis of Relay and Switching Circuits
Insight: The binary nature of Boolean logic was analogous to the ones and zeros used by digital circuits.
The thesis became the foundation of practical digital circuit design.
The first known use of the term bit to refer to a “binary digit.”
Possibly the most important, and also the most famous, master’s thesis of the century.
It was simple, elegant, and important.
C. E. Shannon: Master Thesis
Boole/Shannon Celebration
Events in 2015 and 2016 centered around the work of George Boole, who was born 200 years ago, and Claude E. Shannon, born 100 years ago.
Events were scheduled both at the University College Cork (UCC), Ireland and the Massachusetts Institute of Technology (MIT).
http://www.rle.mit.edu/booleshannon/
An Interesting Book
The Logician and the Engineer: How George Boole and Claude Shannon Created the Information Age
by Paul J. Nahin
ISBN: 9780691151007
http://press.princeton.edu/titles/9819.html
C. E. Shannon (Con’t)
1948: A Mathematical Theory of Communication
Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, July and October 1948.
September 1949: Book published. Includes a new section by Warren Weaver that applied Shannon's theory to human communication.
Invented information theory: simultaneously founded the subject, introduced all of the major concepts, and stated and proved all the fundamental theorems.
Created the architecture and concepts governing digital communication.
A Mathematical Theory of Communication
Link posted in the “references” section of the website.
[An offprint from the Bell System Technical Journal]
C. E. Shannon
…with some remarks by Toby Berger.
Claude E. Shannon Award
Claude E. Shannon (1972); David S. Slepian (1974); Robert M. Fano (1976); Peter Elias (1977); Mark S. Pinsker (1978); Jacob Wolfowitz (1979); W. Wesley Peterson (1981); Irving S. Reed (1982); Robert G. Gallager (1983); Solomon W. Golomb (1985); William L. Root (1986); James L. Massey (1988); Thomas M. Cover (1990); Andrew J. Viterbi (1991)
Elwyn R. Berlekamp (1993); Aaron D. Wyner (1994); G. David Forney, Jr. (1995); Imre Csiszár (1996); Jacob Ziv (1997); Neil J. A. Sloane (1998); Tadao Kasami (1999); Thomas Kailath (2000); Jack Keil Wolf (2001); Toby Berger (2002); Lloyd R. Welch (2003); Robert J. McEliece (2004); Richard Blahut (2005); Rudolf Ahlswede (2006)
Sergio Verdú (2007); Robert M. Gray (2008); Jorma Rissanen (2009); Te Sun Han (2010); Shlomo Shamai (Shitz) (2011); Abbas El Gamal (2012); Katalin Marton (2013); János Körner (2014); Arthur Robert Calderbank (2015); Alexander S. Holevo (2016); David Tse (2017)
[ http://www.itsoc.org/honors/claude-e-shannon-award ]
IEEE Richard W. Hamming Medal
[http://www.ieee.org/documents/hamming_rl.pdf]
1988 - Richard W. Hamming
1989 - Irving S. Reed
1990 - Dennis M. Ritchie and Kenneth L. Thompson
1991 - Elwyn R. Berlekamp
1992 - Lotfi A. Zadeh
1993 - Jorma J. Rissanen
1994 - Gottfried Ungerboeck
1995 - Jacob Ziv
1996 - Mark S. Pinsker
1997 - Thomas M. Cover
1998 - David D. Clark
1999 - David A. Huffman
2000 - Solomon W. Golomb
2001 - A. G. Fraser
2002 - Peter Elias
2003 - Claude Berrou and Alain Glavieux
2004 - Jack K. Wolf
2005 - Neil J.A. Sloane
2006 - Vladimir I. Levenshtein
2007 - Abraham Lempel
2008 - Sergio Verdú
2009 - Peter Franaszek
2010 - Whitfield Diffie, Martin Hellman, and Ralph Merkle
2011 - Toby Berger
2012 - Michael Luby, Amin Shokrollahi
2013 - Arthur Robert Calderbank
2014 - Thomas Richardson and Rüdiger L. Urbanke
2015 - Imre Csiszar
2016 - Abbas El Gamal
2017 - Shlomo Shamai
2018 - Erdal Arikan
“For contributions to Information Theory, including source coding and its applications.” [http://www.cvaieee.org/html/toby_berger.html]
Information Theory
The science of information theory tackles the following questions [Berger]
1. What is information, i.e., how do we measure it quantitatively?
2. What factors limit the reliability with which information generated at one point can be reproduced at another, and what are the resulting limits?
3. How should communication systems be designed in order to achieve or at least to approach these limits?
Elements of communication sys. (ECS 332)
[Figure: block diagram, after Shannon, 1948. Information Source → Transmitter → Channel → Receiver → Destination. Signals along the chain: Message, Transmitted Signal, Received Signal, Message. Block annotations: Modulation, Coding, Amplification, Demodulation, Decoding, Filtering. Channel impairments: Noise, Interference, Distortion, plus Transmission loss (attenuation). The message may be Analog (continuous) or Digital (discrete).]
The Switch to Digital TV
Japan: Starting July 24, 2011, the analog broadcast has ceased and only digital broadcast is available.
US: Since June 12, 2009, full-power television stations nationwide have been broadcasting exclusively in a digital format.
Thailand: Uses DVB-T2. Launched in 2014.
[https://upload.wikimedia.org/wikipedia/commons/thumb/b/bd/Digital_broadcast_standards.svg/800px-Digital_broadcast_standards.svg.png]
News: The Switch to Digital Radio
Norway (the mountainous nation of 5 million) is the first country to shut down its national FM radio network in favor of digital radio.
Started on January 11, 2017.
  At that point, 99.5% of the population had access to DAB reception, with almost three million receivers sold.
  70% of Norwegian households regularly tuned in digitally.
Took place over a 12-month period, with changes made region by region.
December 13, 2017: All national networks are DAB-only.
Local broadcasters have five years to phase out their FM stations.
New format: Digital Audio Broadcasting (DAB)
http://gizmodo.com/norway-is-killing-fm-radio-tomorrow-1791019824
http://www.worlddab.org/country-information/norway
http://www.smithsonianmag.com/smart-news/norway-killed-radio-star-180961761/
http://www.latimes.com/world/la-fg-norway-radio-20170114-story.html
https://www.newscientist.com/article/2117569-norway-is-first-country-to-turn-off-fm-radio-and-go-digital-only/
http://fortune.com/2017/12/18/norway-fm-radio-digital-audio-broadcasting/
Digital Audio Broadcasting
Initiated as a European research project in the 1980s.
The Norwegian Broadcasting Corporation (NRK) launched the first DAB channel in the world on 1 June 1995 (NRK Klassisk).
The BBC and Swedish Radio (SR) launched their first DAB digital radio broadcasts in September 1995.
Audio quality varies depending on the bitrate used.
The Switch to DAB in Norway
Has co-existed with FM since 1995.
Provides a clearer and more reliable network that can better cut through the country's sparsely populated rocky terrain.
  FM has always been problematic in Norway since the nation’s mountains and fjords make getting clear FM signals difficult.
Offers more channels at a fraction of the cost.
  Allows 8 times as many radio stations.
  Norway currently has five national FM radio stations. With DAB, it will be able to have around 40.
The FM radio infrastructure was coming to the end of its life.
  Need to either replace it or fully commit to DAB anyway.
Can run at lower power levels, so the infrastructure's electricity bills are lower.
The Switch to Digital Radio
Switzerland and Denmark are also interested in phasing out FM.
Great Britain says it will look at making the switch once 50 percent of listeners use digital formats (currently at 35 percent; unlikely to happen before 2020) and when the DAB signal reaches 90 percent of the population.
Germany had set a 2015 date for dumping FM many years ago, but lawmakers reversed that decision in 2011.
In North America, FM radio, which has been active since the 1940s, shows no sign of being replaced any time soon, either in the United States or Canada.
  There are around 4,000 stations using HD radio technology in the United States, and HD radio receivers are now common fixtures in new cars.
In Thailand, NBTC planned to start a digital radio trial within 2018.
http://thaidigitalradio.com/ความคืบหน้าล่าสุด-วิทยุ/
HD Radio
[ https://en.wikipedia.org/wiki/HD_Radio ]
Selected by the U.S. FCC in 2002 as a digital audio broadcasting method for the United States.
Embeds a digital signal “on-frequency” immediately above and below a station’s standard analog signal.
Provides the means to listen to the same program in either HD (digital radio with less noise) or as a standard broadcast (analog radio with standard sound quality).
[Figure: spectrum of an FM broadcast station, without HD Radio and with HD Radio]
Countries using DAB/DMB
https://en.wikipedia.org/wiki/Digital_audio_broadcasting
2017 hurricane season in the US
http://edition.cnn.com/2017/10/10/weather/hurricane-nate-maria-irma-harvey-impact-look-back-trnd/index.html
https://www.vox.com/energy-and-environment/2017/9/28/16362522/hurricane-maria-2017-irma-harvey-rain-flooding-climate-change
Radio broadcasts are critical during a disaster
In areas hit hardest by things like hurricanes, earthquakes, fires, or even shootings:
Vulnerabilities of mobile phone infrastructure
  Cell phone infrastructure is often knocked out.
  Overwhelmed by everyone trying to access information.
  Three weeks after Hurricane Maria pummeled Puerto Rico, more than 76 percent of cell sites still weren't functioning.
Radio broadcast signals, which use low frequencies and can travel much further distances and penetrate through obstacles, usually remain up.
FM capability in modern cellphone
FM capability is baked into the Qualcomm LTE modem inside nearly every cellphone.
You can easily turn your phone into an FM radio if it has an embedded chipset and the proper circuitry to connect that chip to an FM antenna.
  Need an app like NextRadio and something to act as an antenna, such as headphones or nonwireless speakers.
Until a few years ago, device manufacturers disabled the function.
  Wireless carriers wanted customers to stream music and podcasts, and consume more data.
Broadcasters and public safety officials have long urged handset manufacturers and wireless carriers to universally activate the FM chip.
ITU (International Telecommunication Union) issued an opinion in March 2017 urging all mobile phone makers to include and turn on FM radios on their devices.
Major US carriers now allow FM chips to be turned on.
Manufacturers like Samsung, LG, HTC and Motorola have activated FM radio on their phones.
September 28, 2017: FCC blasted Apple for not activating FM receivers built into iPhones.
Apple responded that iPhone 7, iPhone 8, and iPhone X don’t use a chipset with an embedded FM radio.
https://www.cnet.com/news/everything-you-need-to-know-about-fm-radio-on-your-phone/
https://spectrum.ieee.org/tech-talk/consumer-electronics/gadgets/fcc-wants-apple-to-turn-on-iphone-fm-receivers-that-may-not-exist
https://www.wired.com/2016/07/phones-fm-chips-radio-smartphone/
Pokémon Communications
Pikachu's language
Some of Pikachu's speech is consistent enough that it seems that some phrases actually mean something.
Pikachu always uses "Pikapi" when referring to Ash (notice that it sounds somewhat similar to "Satoshi").
Pi-Kachu: He says this during the sponsor spots in the original Japanese, Pochama (Piplup)
Pikachu-Pi: Kasumi (Misty)
Pika-Chu: Takeshi (Brock), Kibago (Axew)
Pikaka: Hikari (Dawn)
PiPiPi: Togepy (Togepi)
PikakaPika: Fushigidane (Bulbasaur)
PikaPika: Zenigame (Squirtle), Mukuhawk (Staraptor), Goukazaru (Infernape) or Gamagaru (Palpitoad)
PiPi-kachu: Rocket-dan (Team Rocket)
Pi-Pikachu: Get da ze! (He says this after Ash wins a Badge, catches a new Pokémon or anything similar.)
  Pikachu may not be the only one to use this phrase, as other Pokémon do this as well. For example, when Iris caught Emolga, Axew said Ax-Axew (Ki-Kibago in the Japanese).
Pika-Pikachu: He says this when referring to himself.
Four-symbol variable-length code?
[https://www.youtube.com/watch?v=XumQrRkGXck]
Rate-Distortion Theory
The theory of lossy source coding
2. Source Coding
Elements of digital commu. sys.
[Figure: block diagram of a digital communication system.
Transmitter: Information Source → (Message) → Source Encoder (remove redundancy) → Channel Encoder (add systematic redundancy) → Digital Modulator → Transmitted Signal.
Channel: Noise & Interference.
Receiver: Received Signal → Digital Demodulator → Channel Decoder → Source Decoder → (Recovered Message) → Destination.]
System Under Consideration
[Figure: the same block diagram, repeated.]
Main Reference
Elements of Information Theory, by Thomas M. Cover and Joy A. Thomas
2006, 2nd Edition
Chapters 2, 4 and 5
‘the jewel in Stanford's crown’
One of the greatest information theorists since Claude Shannon (and the one most like Shannon in approach, clarity, and taste).
English Alphabet (Non-Technical Use)
[Figure: two panels labeled US and UK]
The ASCII Coded Character Set (American Standard Code for Information Interchange)
[Table: ASCII character chart, with columns starting at codes 0, 16, 32, 48, 64, 80, 96, 112] [The ARRL Handbook for Radio Communications 2013]
Example: ASCII Encoder
Character x    Codeword c(x)
⋮
E              1000101
⋮
L              1001100
⋮
O              1001111
⋮
V              1010110
⋮

MATLAB:
>> M = 'LOVE';
>> X = dec2bin(M,7);
>> X = reshape(X',1,numel(X))
X =
1001100100111110101101000101

Remark: numel(A) = prod(size(A)) (the number of elements in matrix A)
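Because every ASCII codeword above has the same length (7 bits), the encoded string can be cut back into codewords without any separators. The following is a minimal sketch of the inverse mapping, using only base MATLAB functions; the variable names `k`, `B`, and `M_hat` are chosen here for illustration and are not from the slides.

```matlab
% Inverse of the 7-bit ASCII encoder: bit string -> characters.
X = '1001100100111110101101000101';   % encoded string from the example above
k = 7;                                % bits per codeword (fixed-length code)
assert(mod(numel(X), k) == 0);        % the length must be a multiple of k
B = reshape(X, k, []).';              % one 7-bit codeword per row
M_hat = char(bin2dec(B)).'            % recovered message
```

Running it should display `LOVE`, and re-encoding `M_hat` with `dec2bin(M_hat,7)` reproduces `X`.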
Information Source → “LOVE” → Source Encoder → “1001100100111110101101000101”
English Redundancy: Ex. 1
J-st tr- t- r--d th-s s-nt-nc-.
English Redundancy: Ex. 2
yxx cxn xndxrstxnd whxt x xm wrxtxng xvxn xf x rxplxcx xll thx vxwxls wxth xn 'x' (t gts lttl hrdr f y dn't vn kn whr th vwls r).
English Redundancy: Ex. 3
To be, or xxx xx xx, xxxx xx xxx xxxxxxxx
Entropy Rate of Thai Text
Introduction to Data Compression
[ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/compressioncodes ]
ASCII: Source Alphabet of Size = 128 (American Standard Code for Information Interchange)
[Table: ASCII character chart, with columns starting at codes 0, 16, 32, 48, 64, 80, 96, 112] [The ARRL Handbook for Radio Communications 2013]
Ex. Source alphabet of size = 4
Ex. DMS (1)
$S_X = \{a, b, c, d, e\}$
$p_X(x) = \begin{cases} 1/5, & x \in \{a, b, c, d, e\}, \\ 0, & \text{otherwise} \end{cases}$
Information Source → a c a c e c d b c e d a e e d a b b b d b b a a b e b e d c c e d b c e c a a c a a e a c c a a d c d e e a a c a a a b b c a e b b e d b c d e b c a e e d d c d a b c a b c d d e d c e a b a a c a d
Approximately 20% are letter ‘a’s
[GenRV_Discrete_datasample_Ex1.m]
Ex. DMS (1)
clear all; close all;
S_X = 'abcde'; p_X = [1/5 1/5 1/5 1/5 1/5];
n = 100;
MessageSequence = datasample(S_X,n,'Weights',p_X)
MessageSequence = reshape(MessageSequence,10,10)
>> GenRV_Discrete_datasample_Ex1
MessageSequence =
eebbedddeceacdbcbedeecacaecedcaedabecccabbcccebdbbbeccbadeaaaecceccdaccedadabceddaceadacdaededcdcade
MessageSequence =
eeeabbacde
eacebeeead
bcadcccdce
bdcacccaed
ebabcbedac
dceeeacadd
dbccbdcbac
deecdedcca
eddcbaaedd
cecabacdae
[GenRV_Discrete_datasample_Ex1.m]
Ex. DMS (2)
$S_X = \{1, 2, 3, 4\}$
$p_X(x) = \begin{cases} 1/2, & x = 1, \\ 1/4, & x = 2, \\ 1/8, & x \in \{3, 4\}, \\ 0, & \text{otherwise} \end{cases}$
Information Source → 2 1 1 2 1 4 1 1 1 1 1 1 4 1 1 2 4 2 2 1 3 1 1 2 3 2 4 1 2 4 2 1 1 2 1 1 3 3 1 1 1 3 4 1 4 1 1 2 4 1 4 1 4 1 2 2 1 4 2 1 4 1 1 1 1 2 1 4 2 4 2 1 1 1 2 1 2 1 3 2 2 1 1 1 1 1 1 2 3 2 2 1 1 2 1 4 2 1 2 1
Approximately 50% are number ‘1’s
[GenRV_Discrete_datasample_Ex2.m]
Ex. DMS (2)
clear all; close all;
S_X = [1 2 3 4]; p_X = [1/2 1/4 1/8 1/8];
n = 20;
MessageSequence = randsrc(1,n,[S_X;p_X]);
%MessageSequence = datasample(S_X,n,'Weights',p_X);
rf = hist(MessageSequence,S_X)/n;         % Rel. Freq. calc.
stem(S_X,rf,'rx','LineWidth',2)           % Plot Rel. Freq.
hold on
stem(S_X,p_X,'bo','LineWidth',2)          % Plot pmf
xlim([min(S_X)-1,max(S_X)+1])
legend('Rel. freq. from sim.','pmf p_X(x)')
xlabel('x')
grid on
[Figure: stem plot of "Rel. freq. from sim." and "pmf p_X(x)" versus x]
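`randsrc` and `datasample` belong to add-on toolboxes (Communications Toolbox and Statistics and Machine Learning Toolbox, respectively). When neither is available, a DMS can be simulated with base MATLAB alone by comparing uniform random numbers against the cdf. The sketch below is one such alternative, not the method used in the course scripts; the sample size `n = 1e5` and the names `F`, `U`, and `idx` are assumptions made here for illustration.

```matlab
% Toolbox-free DMS simulation by inverting the cdf (illustrative sketch).
S_X = [1 2 3 4];                 % source alphabet (from the slide)
p_X = [1/2 1/4 1/8 1/8];         % pmf (from the slide)
n   = 1e5;                       % number of source symbols (assumed value)

F = cumsum(p_X);                 % cdf evaluated at each alphabet element
U = rand(1, n);                  % i.i.d. Uniform(0,1) samples
idx = ones(1, n);                % index into S_X for each sample
for k = 1:numel(F)-1
    idx = idx + (U > F(k));      % move up one symbol per threshold exceeded
end
MessageSequence = S_X(idx);

rf = zeros(size(S_X));           % relative frequencies from the simulation
for k = 1:numel(S_X)
    rf(k) = mean(MessageSequence == S_X(k));
end
disp([p_X; rf])                  % row 1: pmf, row 2: relative frequencies
```

With `n` this large, the second row should sit close to the first, which is the same comparison the stem plot makes graphically.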
[GenRV_Discrete_datasample_Ex2.m]
DMS in MATLAB
clear all; close all;
S_X = [1 2 3 4]; p_X = [1/2 1/4 1/8 1/8]; n = 1e6;
SourceString = randsrc(1,n,[S_X;p_X]);
Alternatively, we can also use
SourceString = datasample(S_X,n,'Weights',p_X);
rf = hist(SourceString,S_X)/n;            % Rel. Freq. calc.
stem(S_X,rf,'rx','LineWidth',2)           % Plot Rel. Freq.
hold on
stem(S_X,p_X,'bo','LineWidth',2)          % Plot pmf
xlim([min(S_X)-1,max(S_X)+1])
legend('Rel. freq. from sim.','pmf p_X(x)')
xlabel('x')
grid on
[GenRV_Discrete_datasample_Ex.m]
A more realistic example of pmf:
Relative freq. of letters in the English language
[http://en.wikipedia.org/wiki/Letter_frequency]
A more realistic example of pmf:
Relative freq. of letters in the English language ordered by frequency
[http://en.wikipedia.org/wiki/Letter_frequency]
Example: ASCII Encoder
The list of characters x and their codewords c(x) is the codebook. The encoded string is the concatenation c(“L”) c(“O”) c(“V”) c(“E”):
Information Source → “LOVE” → Source Encoder → “1001100100111110101101000101”
The ASCII Coded Character Set
[The ARRL Handbook for Radio Communications 2013]
A Byte (8 bits) vs. 7 bits
>> dec2bin('I Love ECS452',7)    >> dec2bin('I Love ECS452',8)
ans =                            ans =
1001001                          01001001
0100000                          00100000
1001100                          01001100
1101111                          01101111
1110110                          01110110
1100101                          01100101
0100000                          00100000
1000101                          01000101
1000011                          01000011
1010011                          01010011
0110100                          00110100
0110101                          00110101
0110010                          00110010
Geeky ways to express your love
>> dec2bin('I Love You',8)       >> dec2bin('i love you',8)
ans =                            ans =
01001001                         01101001
00100000                         00100000
01001100                         01101100
01101111                         01101111
01110110                         01110110
01100101                         01100101
00100000                         00100000
01011001                         01111001
01101111                         01101111
01110101                         01110101
https://www.etsy.com/listing/91473057/binary-i-love-you-printable-for-your?ref=sr_gallery_9&ga_search_query=binary&ga_filters=holidays+-supplies+valentine&ga_search_type=all&ga_view_type=gallery
http://mentalfloss.com/article/29979/14-geeky-valentines-day-cards
https://www.etsy.com/listing/174002615/binary-love-geeky-romantic-pdf-cross?ref=sr_gallery_26&ga_search_query=binary&ga_filters=holidays+-supplies+valentine&ga_search_type=all&ga_view_type=gallery
https://www.etsy.com/listing/185919057/i-love-you-binary-925-silver-dog-tag-can?ref=sc_3&plkey=cdf3741cf5c63291bbc127f1fa7fb03e641daafd%3A185919057&ga_search_query=binary&ga_filters=holidays+-supplies+valentine&ga_search_type=all&ga_view_type=gallery
http://www.cafepress.com/+binary-code+long_sleeve_tees
Summary: Source Encoder (w/o extension)
source string → encoded string: c(“L”) c(“O”) c(“V”) c(“E”)
Information Source → “LOVE” → Source Encoder → “1001100100111110101101000101”
Discrete Memoryless Source (DMS)
• The source alphabet is the collection of all possible source symbols.
• Each symbol that the source generates is assumed to be randomly selected from the source alphabet.
Encoder
• An encoder c(·) is a function that maps each symbol in the source alphabet into a corresponding (binary) codeword.
• The list for such mapping is called the codebook.
Source Symbol x    Codeword c(x)
⋮
E                  1000101
⋮
L                  1001100
⋮
O                  1001111
⋮
V                  1010110
⋮
Codewords
• The codeword corresponding to a source symbol x is denoted by c(x).
• ℓ(x) denotes the length of c(x).
• Each codeword is constructed from a code alphabet.
• For binary codewords, the code alphabet is {0,1}.
Morse code
(wired and wireless) Telegraph network
Samuel Morse, 1838
A sequence of on-off tones (or lights, or clicks)
Example
[http://www.wolframalpha.com/input/?i=%22I+love+you.%22+in+Morse+code]
Morse code: Key Idea
Frequently-used characters (e,t) are mapped to short codewords.
[Figure: relative frequencies of letters in the English language]
Basic form of compression.
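To make the key idea concrete, the sketch below lists the International Morse codewords for two very common letters (E, T), one moderately common letter (A), and three rare ones (Q, J, Z), then encodes a short word. Only six letters are included, so this is a partial table for illustration; counting dots and dashes also ignores the fact that a dash lasts longer than a dot on the air.

```matlab
% Partial International Morse table: frequent letters get short codewords.
letters = 'ETAQJZ';
morse   = {'.', '-', '.-', '--.-', '.---', '--..'};
for k = 1:numel(letters)
    fprintf('%c  %-5s  length %d\n', letters(k), morse{k}, numel(morse{k}));
end

% Encode a short word with this partial table, separating letters by spaces.
word = 'TEA';
[found, pos] = ismember(word, letters);
assert(all(found));                    % every letter must be in the table
encoded = strjoin(morse(pos), ' ')
```

The word `TEA` uses only frequent letters and needs four dots and dashes in total, the same number as the single rare letter `Q`.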
Thai Morse code (รหัสมอร์สภาษาไทย)
Another Example of non-UD code
Suppose we want to convey the sequence of outcomes from rolling a die.
x    c(x)
1    1
2    10
3    11
4    100
5    101
6    110
A sequence of throws such as 53214 is encoded as 10111101100.
The encoded string 11 could be interpreted as
  11: 1 1
  3: 11
The encoded string 110 could be interpreted as
  12: 1 10
  6: 110
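The ambiguity can be checked directly by encoding different throw sequences and comparing the results. This is a small sketch using the codebook above; the helper name `encode` is introduced here for illustration.

```matlab
% Encoder for the die code: concatenate the codewords of a sequence.
codebook = {'1', '10', '11', '100', '101', '110'};   % c(1),...,c(6)
encode = @(x) [codebook{x}];

encoded = encode([5 3 2 1 4])            % the throws 53214
same11  = isequal(encode([1 1]), encode(3))   % both give '11'
same110 = isequal(encode([1 2]), encode(6))   % both give '110'
```

Two different source strings that map to the same encoded string are exactly what rules out unique decodability.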
Another Example of non-UD code
x    c(x)
A    1
B    011
C    01110
D    1110
E    10011
Consider the encoded string 011101110011.
It can be interpreted as
  CDB: 01110 1110 011
  BABE: 011 1 011 10011
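For a short encoded string, all valid interpretations can be listed by brute force: try every codeword at the current position and keep going until the string is consumed. The sketch below does this with a depth-first search in base MATLAB. It is an exhaustive check for one string, not a general test of unique decodability; the names `stack` and `parsings` are chosen here for illustration.

```matlab
% Enumerate every way to split an encoded string into codewords.
symbols   = 'ABCDE';
codewords = {'1', '011', '01110', '1110', '10011'};   % codebook from above
s = '011101110011';                                   % encoded string

% Each stack entry holds {symbols decoded so far, number of bits consumed}.
stack = {{'', 0}};
parsings = {};
while ~isempty(stack)
    item = stack{end};  stack(end) = [];
    decoded = item{1};  pos = item{2};
    if pos == numel(s)
        parsings{end+1} = decoded;       % every bit consumed: valid parsing
        continue
    end
    for k = 1:numel(codewords)
        c = codewords{k};
        last = pos + numel(c);
        if last <= numel(s) && strcmp(s(pos+1:last), c)
            stack{end+1} = {[decoded symbols(k)], last};
        end
    end
end
parsings = sort(parsings)
```

Besides CDB and BABE, the search also turns up BACB and CAAE, so this one string has four interpretations under the codebook.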
[ https://en.wikipedia.org/wiki/Sardinas%E2%80%93Patterson_algorithm ]
Game: 20 Questions
20 Questions is a classic game that has been played since the 19th century.
One person thinks of something (an object, a person, an animal, etc.)
The others playing can ask 20 questions in an effort to guess what it is.
20 Questions: Example
Shannon–Fano coding
Prof. Robert Fano (1917-2016), Shannon Award (1976)
Proposed in Shannon’s “A Mathematical Theory of Communication” in 1948.
The method was attributed to Fano, who later published it as a technical report.
  Fano, R.M. (1949). “The transmission of information”. Technical Report No. 65. Cambridge (Mass.), USA: Research Laboratory of Electronics at MIT.
Should not be confused with
  Shannon coding, the coding method used to prove Shannon's noiseless coding theorem, or with
  Shannon–Fano–Elias coding (also known as Elias coding), the precursor to arithmetic coding.
Huffman Code
David Huffman (1925–1999), Hamming Medal (1999)
MIT, 1951
Information theory class taught by Professor Fano.
Huffman and his classmates were given the choice of a term paper on the problem of finding the most efficient binary code, or a final exam.
Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree and quickly proved this method the most efficient.
Huffman avoided the major flaw of the suboptimal Shannon-Fano coding by building the tree from the bottom up instead of from the top down.
Huffman’s paper (1952)
[D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," in Proceedings of the IRE, vol. 40, no. 9, pp. 1098-1101, Sept. 1952.] [ http://ieeexplore.ieee.org/document/4051119/ ]
Summary:
[Figure: nested classes of codes: All codes ⊃ Nonsingular codes ⊃ UD codes ⊃ Prefix-free codes ⊃ Huffman codes]
[2.16-17] A good code must be uniquely decodable (UD).
[Defn 2.18] Difficult to check.
[2.24] Consider a special family of codes: prefix(-free) code.
  No codeword is a prefix of any other codeword.
  Always UD.
  Same as being instantaneous.
    Each source symbol can be decoded as soon as we come to the end of the codeword corresponding to it.
[Defn 2.30] Huffman’s recipe
  Repeatedly combine the two least-likely (combined) symbols.
  Automatically gives a prefix-free code.
[Defn 2.36] [2.37] For a given source’s pmf, Huffman codes are optimal among all UD codes for that source.
Huffman coding
[ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/compressioncodes ]
Ex. Huffman Coding in MATLAB
[Ex. 2.31]
pX = [0.5 0.25 0.125 0.125];        % pmf of X
SX = [1:length(pX)];                % Source Alphabet
[dict,EL] = huffmandict(SX,pX);     % Create codebook
Observe that MATLAB automatically gives the expected length of the codewords (EL).

%% Pretty print the codebook.
codebook = dict;
for i = 1:length(codebook)
    codebook{i,2} = num2str(codebook{i,2});
end
codebook

%% Try to encode some random source string
n = 10;                                           % Number of source symbols to be generated
sourceString = randsrc(1,n,[SX; pX])              % Create data using pX
encodedString = huffmanenco(sourceString,dict)    % Encode the data
[Huffman_Demo_Ex1]
Ex. Huffman Coding in MATLAB
codebook =
[1] '0' [2] '1 0' [3] '1 1 1' [4] '1 1 0'
sourceString =
1 4 4 1 3 1 1 4 3 4
encodedString =
0 1 1 0 1 1 0 0 1 1 1 0 0 1 1 0 1 1 1 1 1 0
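`huffmandict` hides the recipe itself. The sketch below follows the recipe by hand in base MATLAB: repeatedly merge the two least-likely nodes, and add one bit to the codeword length of every symbol under the merged node. It computes codeword lengths and the expected length only, not the codewords, and it is an illustrative sketch rather than the toolbox's implementation; the names `len` and `groups` are chosen here.

```matlab
% Huffman codeword lengths by repeated merging (illustrative sketch).
pX = [0.5 0.25 0.125 0.125];     % pmf from Ex. 2.31
n  = numel(pX);
len    = zeros(1, n);            % codeword length of each source symbol
groups = num2cell(1:n);          % symbols sitting under each active node
p      = pX;                     % probabilities of the active nodes

while numel(p) > 1
    [p, order] = sort(p);                    % ascending: least likely first
    groups = groups(order);
    merged = [groups{1}, groups{2}];         % combine the two least likely
    len(merged) = len(merged) + 1;           % one more bit for each of them
    p      = [p(1) + p(2), p(3:end)];
    groups = [{merged}, groups(3:end)];
end

len                              % codeword lengths
EL = sum(pX .* len)              % expected codeword length
```

For this pmf the lengths come out as 1, 2, 3, 3, matching the codebook printed above, and `EL` is 1.75. Ties between equal probabilities may be broken differently from `huffmandict`, which can change individual codewords but not the expected length.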
[Huffman_Demo_Ex1]
Ex. Huffman Coding in MATLAB
[Ex. 2.32]
pX = [0.4 0.3 0.1 0.1 0.06 0.04];   % pmf of X
SX = [1:length(pX)];                % Source Alphabet
[dict,EL] = huffmandict(SX,pX);     % Create codebook

%% Pretty print the codebook.
codebook = dict;
for i = 1:length(codebook)
    codebook{i,2} = num2str(codebook{i,2});
end
codebook
EL

>> Huffman_Demo_Ex2
codebook =
[1] '1'
[2] '0 1'
[3] '0 0 0 0'
[4] '0 0 1'
[5] '0 0 0 1 0'
[6] '0 0 0 1 1'
EL =
2.2000
The codewords can be different from our answers found earlier. The expected length is the same.
[Huffman_Demo_Ex2]
Ex. Huffman Coding in MATLAB
[Exercise]
pX = [1/8, 5/24, 7/24, 3/8];        % pmf of X
SX = [1:length(pX)];                % Source Alphabet
[dict,EL] = huffmandict(SX,pX);     % Create codebook

%% Pretty print the codebook.
codebook = dict;
for i = 1:length(codebook)
    codebook{i,2} = num2str(codebook{i,2});
end
codebook
EL
codebook =
[1] '0 0 1'
[2] '0 0 0'
[3] '0 1'
[4] '1'
EL =
1.9583
>> -pX*(log2(pX)).'
ans =
1.8956
[Figure: curve plotted against x from 0 to 1; vertical axis from -0.4 to 0]
Entropy and Description of RV
[ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/information-entropy ]
Summary: Optimality of Huffman Codes
[Figure: nested classes of codes: All codes ⊃ Nonsingular codes ⊃ UD codes ⊃ Prefix-free codes ⊃ Huffman codes]
Consider a given DMS with known pmf …
[Defn 2.36] A code is optimal if it is UD and its corresponding expected length is the shortest among all possible UD codes for that source.
[2.37] Huffman codes are optimal.
[2.49-2.54] Bounds on expected lengths: $H(X) \le \mathbb{E}[\ell(X)] < H(X) + 1$, where $\mathbb{E}[\ell(X)]$ is the expected length (per source symbol) of an optimal code, which is also the expected length (per source symbol) of a Huffman code.
Summary: Entropy
Entropy measures the amount of uncertainty (randomness) in a RV.
Three formulas for calculating entropy:
[Defn 2.41] Given a pmf $p_X(x)$ of a RV $X$, $H(X) \equiv -\sum_x p_X(x) \log_2 p_X(x)$. Set $0 \log_2 0 = 0$.
[2.44] Given a probability vector $\underline{p}$, $H(\underline{p}) \equiv -\sum_i p_i \log_2 p_i$.
[Defn 2.47] Given a number $p \in [0, 1]$, binary entropy function: $h(p) \equiv -p \log_2 p - (1 - p) \log_2 (1 - p)$.
[2.56] Operational meaning: Entropy of a random variable is the average length of its shortest description.
Examples
Example 2.31: $H(X) = 1.75$, Huffman $\mathbb{E}[\ell(X)] = 1.75$, Efficiency = 100%
Example 2.32: $H(X) \approx 2.14$, Huffman $\mathbb{E}[\ell(X)] = 2.2$, Efficiency ≈ 97%
Examples
Example 2.33: $H(X) \approx 2.29$, Huffman $\mathbb{E}[\ell(X)] = 2.3$, Efficiency ≈ 99%
Example 2.34 (source symbols A, B, C, D): $H(X) \approx 1.86$, Huffman $\mathbb{E}[\ell(X)] = 2$, Efficiency ≈ 93%
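The efficiency figures can be reproduced numerically. The sketch below does so for Example 2.32, assuming that efficiency means the ratio of the entropy to the expected codeword length, which is consistent with the numbers quoted on the slides. The pmf and the value 2.2 are taken from the Ex. 2.32 MATLAB demo; the variable names `HX` and `efficiency` are chosen here.

```matlab
% Entropy and efficiency for Example 2.32 (efficiency assumed to be H(X)/E[l(X)]).
pX = [0.4 0.3 0.1 0.1 0.06 0.04];    % pmf from Example 2.32
EL = 2.2;                            % expected Huffman codeword length from the demo
HX = -sum(pX .* log2(pX))            % entropy in bits per source symbol
efficiency = HX / EL                 % fraction of the expected length that is "needed"
```

The entropy evaluates to about 2.14 bits and the ratio to about 0.97, in line with the 97% quoted for this example.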
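Summary: Entropy
Important Bounds: $0 \le H(X) \le \log_2 |S_X|$. The lower bound is attained when X is deterministic; the upper bound is attained when X is uniform.
The entropy of a uniform (discrete) random variable: $H(X) = \log_2 n$ when X is uniform on $n$ values.
The entropy of a Bernoulli random variable: the binary entropy function $h(p)$.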
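The two extreme cases, and the binary entropy function, are easy to check numerically. The sketch below defines entropy for a probability vector with the convention that zero-probability entries contribute nothing; the alphabet size `n = 8` and the probe value 0.11 are arbitrary choices made here for illustration.

```matlab
% Entropy of a probability vector, skipping zero entries (0*log2(0) = 0).
H = @(p) -sum(p(p > 0) .* log2(p(p > 0)));

n = 8;                                   % alphabet size (assumed value)
H_det  = H([1 zeros(1, n-1)])            % deterministic: 0 bits
H_unif = H(ones(1, n) / n)               % uniform: log2(n) = 3 bits
H_mid  = H([1/2 1/4 1/8 1/8])            % in between: 1.75 bits

% Binary entropy function for a Bernoulli(p) random variable.
h = @(p) H([p, 1 - p]);
h_half = h(0.5)                          % 1 bit, the maximum
h_011  = h(0.11)                         % roughly half a bit
```

Any other pmf on eight symbols gives a value between the deterministic and uniform cases.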
[Ex.2.40] Huffman Coding: Source Extension