
Introduction to

Part 4

A General Communication System

CHANNEL

• Information Source
• Transmitter
• Channel
• Receiver
• Destination


Information Channel

Diagram: an information channel with input X and output Y.


Perfect Communication (Discrete Noiseless Channel)

Diagram: noiseless binary channel; the transmitted symbol X is received unchanged as Y (0 → 0, 1 → 1).


NOISE


Motivating Noise…

Diagram: transmission of binary symbols, transmitted symbol X and received symbol Y, now in the presence of noise.


Motivating Noise…

f = 0.1, n ≈ 10,000

Diagram: binary symmetric channel; 0 → 0 and 1 → 1 with probability 1 − f, while 0 → 1 and 1 → 0 with probability f.
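This is easy to simulate. The Python sketch below (not from the slides; the function name bsc_transmit and the fixed random seed are illustrative choices) flips each of n = 10,000 bits independently with probability f = 0.1 and counts how many arrive corrupted, typically around 1,000.

```python
import random

def bsc_transmit(bits, f, rng):
    """Flip each bit independently with probability f (binary symmetric channel)."""
    return [bit ^ 1 if rng.random() < f else bit for bit in bits]

rng = random.Random(0)          # fixed seed so the run is repeatable
n, f = 10_000, 0.1
sent = [rng.randint(0, 1) for _ in range(n)]
received = bsc_transmit(sent, f, rng)

errors = sum(s != r for s, r in zip(sent, received))
print(f"{errors} of {n} bits flipped (~{errors / n:.1%})")  # roughly 10%
```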


Motivating Noise…

Message: $5213.75 Received: $5293.75

1. Detect that an error has occurred.

2. Correct the error.

3. Watch out for the overhead.


Error Detection by Repetition

In the presence of 20% noise…

Message       : $ 5 2 1 3 . 7 5
Transmission 1: $ 5 2 9 3 . 7 5
Transmission 2: $ 5 2 1 3 . 7 5
Transmission 3: $ 5 2 1 3 . 1 1
Transmission 4: $ 5 4 4 3 . 7 5
Transmission 5: $ 7 2 1 8 . 7 5

There is no way of knowing where the errors are.


Error Detection by Repetition

In the presence of 20% noise…

Message       : $ 5 2 1 3 . 7 5
Transmission 1: $ 5 2 9 3 . 7 5
Transmission 2: $ 5 2 1 3 . 7 5
Transmission 3: $ 5 2 1 3 . 1 1
Transmission 4: $ 5 4 4 3 . 7 5
Transmission 5: $ 7 2 1 8 . 7 5
Most common   : $ 5 2 1 3 . 7 5

1. Guesswork is involved.
2. There is overhead.
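Majority-vote decoding can be written out directly. A minimal Python sketch (illustrative, not part of the slides) that takes the five received transmissions above and, at each character position, keeps the most common symbol:

```python
from collections import Counter

transmissions = [
    "$5293.75",
    "$5213.75",
    "$5213.11",
    "$5443.75",
    "$7218.75",
]

# For each character position, take the most common symbol across the transmissions.
decoded = "".join(
    Counter(chars).most_common(1)[0][0]
    for chars in zip(*transmissions)
)
print(decoded)  # $5213.75
```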


Error Detection by Repetition

In the presence of 50% noise…
Message : $ 5 2 1 3 . 7 5
… Repeat 1000 times!

1. Guesswork is involved. But it will almost never be wrong!
2. There is overhead. A LOT of it!


Binary Symmetric Channel (BSC) (Discrete Memoryless Channel)

Diagram: binary symmetric channel with transmitted symbol X and received symbol Y; 0 → 0 and 1 → 1 with probability 1 − p, while 0 → 1 and 1 → 0 with probability p.


Binary Symmetric Channel (BSC) (Discrete Memoryless Channel)

Diagram: the same binary symmetric channel, with crossover probability p.

Defined by a set of conditional probabilities (also known as transition probabilities):

p(y|x) for all x ∈ X and y ∈ Y: the probability of y occurring at the output when x is the input to the channel.


A General Discrete Channel

Diagram: a general discrete channel with s input symbols x_1, …, x_s and r output symbols y_1, …, y_r, connected by the s × r transition probabilities p(y_j | x_i).
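One way to make this concrete is to store the channel as an s × r matrix of transition probabilities. The Python sketch below uses made-up numbers (s = 2 inputs, r = 3 outputs) purely to show how an output distribution follows from an input distribution and the transition probabilities.

```python
# A general discrete channel as an s x r matrix of transition probabilities:
# P[i][j] = p(y_j | x_i); each row sums to 1. The numbers here are illustrative only.
P = [
    [0.8, 0.1, 0.1],   # p(y | x_1)
    [0.2, 0.6, 0.2],   # p(y | x_2)
]
p_x = [0.5, 0.5]       # input distribution p(x_i), also illustrative

# Output distribution: p(y_j) = sum_i p(x_i) * p(y_j | x_i)
p_y = [sum(p_x[i] * P[i][j] for i in range(len(p_x))) for j in range(len(P[0]))]
print([round(p, 4) for p in p_y])  # [0.5, 0.35, 0.15]
```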


Channel With Internal Structure

Diagram: two binary symmetric channels in series, with crossover probabilities p and q. The composite channel is again binary symmetric: a symbol is received correctly with probability (1 − p)(1 − q) + pq and flipped with probability p(1 − q) + q(1 − p).
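The composite probabilities come from enumerating the two ways a bit can be flipped across the cascade (flipped by exactly one of the two stages). A small Python sketch with illustrative values of p and q:

```python
def cascade_crossover(p, q):
    """Effective crossover probability of two binary symmetric channels in series:
    the overall bit is flipped when exactly one of the two stages flips it."""
    return p * (1 - q) + q * (1 - p)

p, q = 0.1, 0.2                      # illustrative stage crossover probabilities
flip = cascade_crossover(p, q)
print(round(flip, 4), round(1 - flip, 4))  # 0.26 and 0.74 = (1-p)(1-q) + p*q
```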


Channel

• A channel can be modeled probabilistically, relating the source (what was sent) to what was received.

Diagram: channel with input X and output Y.


Conditional & Joint Entropy

• X, Y are random variables with entropies H(X) and H(Y)

• Conditional Entropy: average entropy in Y, given knowledge of X.

H(Y|X) = Σ_{x_i ∈ X} Σ_{y_j ∈ Y} p(x_i, y_j) log [ 1 / p(y_j | x_i) ]

where p(x_i, y_j) = p(y_j | x_i) p(x_i)

• Joint Entropy: H(X, Y) = H(Y|X) + H(X), the entropy of the pair (X, Y)
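As a concrete check of these formulas, the sketch below builds p(x, y) for a binary symmetric channel with crossover probability 0.1 and a uniform input (an illustrative choice, not taken from the slides) and evaluates H(Y|X) and H(X, Y).

```python
from math import log2

# Joint distribution p(x, y) built from p(x) and the transition probabilities p(y|x)
# of a binary symmetric channel with crossover 0.1 and uniform input (illustrative).
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
p_xy = {(x, y): p_x[x] * p_y_given_x[x][y] for x in p_x for y in (0, 1)}

# H(Y|X) = sum over x, y of p(x, y) * log2(1 / p(y|x))
H_Y_given_X = sum(p * log2(1 / p_y_given_x[x][y]) for (x, y), p in p_xy.items())

# H(X), and the joint entropy H(X, Y) = H(Y|X) + H(X)
H_X = sum(p * log2(1 / p) for p in p_x.values())
H_XY = H_Y_given_X + H_X
print(round(H_Y_given_X, 4), round(H_XY, 4))  # about 0.469 and 1.469 bits
```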


Mutual Information

• The mutual information of a random variable X given the random variable Y is

I(X; Y) = H(X) − H(X|Y). It is the information about X transmitted by Y.


Mutual Information: Properties

• I(X; Y) = H(X, Y) − H(X|Y) − H(Y|X)
• I(X; Y) = H(X) − H(X|Y)
• I(X; Y) = H(Y) − H(Y|X)
• I(X; Y) is symmetric in X and Y
• I(X; Y) = Σ_x Σ_y p(x, y) log [ p(x, y) / (p(x) p(y)) ]
• I(X; Y) ≥ 0
• I(X; X) = H(X)
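These identities are easy to verify numerically. The sketch below reuses the same illustrative joint distribution (uniform input through a BSC with crossover 0.1), computes I(X; Y) from the p(x, y) log [p(x, y)/(p(x) p(y))] form, and cross-checks it against H(X) − H(X|Y).

```python
from math import log2

# Illustrative joint distribution: uniform input through a BSC with crossover 0.1.
p_xy = {(x, y): 0.5 * (0.9 if x == y else 0.1) for x in (0, 1) for y in (0, 1)}
p_x = {x: sum(p for (xx, y), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (x, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

# I(X;Y) = sum p(x,y) log2( p(x,y) / (p(x) p(y)) )
I = sum(p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

# Cross-check against I(X;Y) = H(X) - H(X|Y)
H_X = sum(p * log2(1 / p) for p in p_x.values())
H_X_given_Y = sum(p * log2(p_y[y] / p) for (x, y), p in p_xy.items())
print(round(I, 4), round(H_X - H_X_given_Y, 4))  # both ~0.531 bits
```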




Channel Capacity

• What is the capacity of the channel?
• What is the reliability of communication across the channel?
• Channel capacity is defined as the rate at which reliable communication is possible.
• Capacity is defined in terms of the mutual information I(X; Y).
• Mutual information is defined in terms of the source entropy H(X) and the joint entropy H(X, Y).
• Joint entropy is defined in terms of the source entropy H(X) and the conditional entropy H(Y|X).


Channel Capacity

• The capacity of a channel is the maximum possible mutual information that can be achieved between input and output by varying the probabilities of the input symbols.

If X is the channel input and Y is the output, the capacity C is

C = max I(X; Y), where the maximum is taken over the input probabilities



Channel Capacity

C = max I(X; Y), where the maximum is taken over the input probabilities

The mutual information between X and Y is the information transmitted by the channel, and it depends on the probability structure:

– Input probabilities: can be adjusted by suitable coding
– Transition probabilities: fixed by the properties of the channel
– Output probabilities: determined by the input and transition probabilities

That is, input probabilities determine mutual information and can be varied by coding. The maximum mutual information with respect to these input probabilities is the channel capacity.
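For a binary symmetric channel this maximization can be carried out numerically. The sketch below is illustrative only (a simple grid search, not a method from the slides): it varies the input probability while the transition probabilities stay fixed, and recovers the standard result C = 1 − H(f).

```python
from math import log2

def mutual_information(p0, f):
    """I(X;Y) for a binary symmetric channel with crossover probability f
    when the input is 0 with probability p0 and 1 with probability 1 - p0."""
    p_x = [p0, 1 - p0]
    trans = [[1 - f, f], [f, 1 - f]]                      # fixed by the channel
    p_y = [sum(p_x[x] * trans[x][y] for x in (0, 1)) for y in (0, 1)]
    return sum(
        p_x[x] * trans[x][y] * log2(trans[x][y] / p_y[y])
        for x in (0, 1) for y in (0, 1) if trans[x][y] > 0
    )

f = 0.1
# Vary the input probabilities (the only thing the coder controls) and keep the maximum.
capacity = max(mutual_information(p0 / 1000, f) for p0 in range(1, 1000))
h2 = f * log2(1 / f) + (1 - f) * log2(1 / (1 - f))
print(round(capacity, 4), round(1 - h2, 4))   # both ~0.531: C = 1 - H(f)
```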


Shannon’s Second Theorem

• Suppose a discrete channel has capacity C and the source has entropy H

If H < C there is a coding scheme such that the source can be transmitted over the channel with an arbitrarily small frequency of error.

If H > C, it is not possible to achieve arbitrarily small error frequency.


Detailed Communication Model

Diagram: transmitter chain (information source → data reduction → source coding → encipherment → channel coding → modulation) feeding a discrete channel subject to distortion (noise); receiver chain (demodulation → channel decoding → decipherment → source decoding → data reconstruction → destination).


Detailed Communication Model

(Block diagram as above.)

Standard compression is used to reduce the inherent redundancy of the source; the resulting bitstream is nearly random.


Detailed Communication Model

(Block diagram as above.)

Next, the resulting bitstream is transformed by an error-correcting code, which puts redundancy back in, but in a systematic way.


Error Correcting Codes

• Hamming Codes (1950)

• Linear Codes

• Low-Density Parity-Check Codes (1960)

• Convolutional Codes

• Turbo Codes (1993)



Applications of Error Correcting Codes

• Satellite communications
• Spacecraft/space probe communications: Mariner 4 (1965), Voyager (1977-), Cassini (1990s), Mars Rovers (1997-)
• CD and DVD players, etc.
• Internet & Web communications (Ethernet, IP, IPv6, TCP, UDP, etc.)
• Data storage
• ECC memory (DRAM)
• Etc., etc., etc.


Error Correcting Codes: Checksum

• ISBN: 0-691-12418-3

• 1×0 + 2×6 + 3×9 + 4×1 + 5×1 + 6×2 + 7×4 + 8×1 + 9×8 = 168; 168 mod 11 = 3

• This is a staircase checksum
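A short Python sketch of this staircase checksum (the function name is ours, not from the slides):

```python
def isbn10_check_digit(first9):
    """Staircase checksum: multiply the i-th digit (1-based) by i,
    sum the products, and reduce mod 11. A result of 10 is written as 'X'."""
    total = sum(i * int(d) for i, d in enumerate(first9, start=1))
    r = total % 11
    return "X" if r == 10 else str(r)

print(isbn10_check_digit("069112418"))  # 3, matching ISBN 0-691-12418-3
```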


ISBN Checksum Exercise

• For the 13-digit ISBN (e.g. 978-0-252-72546-3, for Shannon & Weaver’s The Mathematical Theory of Communication)

Each digit, from left to right, is alternately multiplied by 1 or 3, and the products are summed modulo 10 to give a remainder R between 0 and 9. The check digit is (10 − R) mod 10 (see the sketch after the exercise below).

Thus, 978-0-19-955137-? (Floridi’s Information: A Very Short Introduction)
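A Python sketch of the ISBN-13 rule, which can be used to check the exercise above (again, the function name is illustrative):

```python
def isbn13_check_digit(first12):
    """Alternately weight the first 12 digits by 1 and 3, sum mod 10,
    and take (10 - remainder) mod 10 as the check digit."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10

print(isbn13_check_digit("978025272546"))  # 3, matching 978-0-252-72546-3
```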


Information Channel is a General Model

Diagram: channel with input X = cholesterol levels and output Y = condition of arteries.


Information Channel is a General Model

Diagram: channel with input X = symptoms or test results and output Y = diagnosis.


Information Channel is a General Model

Diagram: channel with input X = geological structure and output Y = presence of oil deposits.


Information Channel is a General Model

Diagram: channel with input X = opinion poll and output Y = next president.


MTC: Summary

• Information
• Entropy
• Source Coding Theorem
• Redundancy
• Compression
• Huffman Encoding
• Lempel-Ziv Coding
• Channel
• Conditional Entropy
• Joint Entropy
• Mutual Information
• Channel Capacity
• Shannon's Second Theorem
• Error Correction Codes


References

• Eugene Chiu, Jocelyn Lin, Brok Mcferron, Noshirwan Petigara, Satwiksai Seshasai: Mathematical Theory of Claude Shannon: A study of the style and context of his work up to the genesis of information theory. MIT 6.933J / STS.420J, The Structure of Engineering Revolutions.
• Luciano Floridi, 2010: Information: A Very Short Introduction, Oxford University Press.
• Luciano Floridi, 2011: The Philosophy of Information, Oxford University Press.
• James Gleick, 2011: The Information: A History, a Theory, a Flood, Pantheon Books.
• Zhandong Liu, Santosh S. Venkatesh and Carlo C. Maley, 2008: Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples, BMC Genomics 9:509.
• David Luenberger, 2006: Information Science, Princeton University Press.
• David J.C. MacKay, 2003: Information Theory, Inference, and Learning Algorithms, Cambridge University Press.
• Claude Shannon & Warren Weaver, 1949: The Mathematical Theory of Communication, University of Illinois Press.
• W. N. Francis and H. Kucera, 1967: Brown University Standard Corpus of Present-Day American English, Brown University.
• Edward L. Glaeser: A Tale of Many Cities, New York Times, April 10, 2010. Available at: http://economix.blogs.nytimes.com/2010/04/20/a-tale-of-many-cities/
• Alan Rimm-Kaufman: The Long Tail of Search, Search Engine Land, September 18, 2007. Available at: http://searchengineland.com/the-long-tail-of-search-12198
