Shannon’s Formula W log(1 + SNR): A Historical Perspective, on the occasion of Shannon’s Centenary

Oct. 26th, 2016

Olivier Rioul

Outline

Who is Claude Shannon?
Shannon’s Seminal Paper
Shannon’s Main Contributions
Shannon’s Capacity Formula
    Hartley’s rule C0 = log2(1 + A/∆) is not Hartley’s
    Many authors independently derived C = (1/2) log2(1 + P/N) in 1948
    Hartley’s rule is exact: C0 = C (a coincidence?)
    C0 is the capacity of the “uniform” channel
Shannon’s Conclusion

Claude Shannon (1916–2001): 100th birthday in 2016

April 30, 1916: Claude Elwood Shannon was born in Petoskey, Michigan, USA.
April 30, 2016: centennial day celebrated by Google, with a doodle in which Shannon juggles the bits (1, 0, 0) in his communication scheme.

“father of the information age”

Do You Know Claude Shannon?

“the most important man... you’ve never heard of”

Well-Known Scientific Heroes

Alan Turing (1912–1954)
John Nash (1928–2015)

The Quiet and Modest Life of Shannon

Shannon with juggling props; Shannon’s toy room.
Shannon was known for riding a unicycle through the halls of Bell Labs while simultaneously juggling four balls.

Crazy Machines

Theseus (maze-solving mouse)
A calculator working in Roman numerals
Hex-playing machine (1951)
Rubik’s cube solver
3-ball juggling machine
Wearable computer to predict roulette in casinos (with Edward Thorp)
The “ultimate useless machine”

“Serious” Work

At the same time, Shannon made decisive theoretical advances in:
logic and circuits, cryptography, artificial intelligence, stock investment, wearable computing...
...and information theory!

Outline: Shannon’s Seminal Paper

The Mathematical Theory of Communication (BSTJ, 1948)

One article (written 1940–48): a revolution!

Shannon’s Theorems (yes, it’s maths!)

1. Source Coding Theorem (Compression of Information)

2. Channel Coding Theorem (Transmission of Information)

Outline: Shannon’s Main Contributions

Shannon’s Paradigm

A tremendous impact! Shannon’s paradigm has been adopted in communication, linguistics, biology, psychology, social sciences, and human-computer interaction.

Shannon’s “Bandwagon” Editorial

Shannon’s Viewpoint

“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; [...] These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages [...] unknown at the time of design. ”

X: a message symbol, modeled as a random variable
p(x): the probability that X = x

Kolmogorov’s Modern Probability Theory

Andreï Kolmogorov (1903–1987) founded modern probability theory in 1933 and was a strong early supporter of information theory: “Information theory must precede probability theory and not be based on it. [...] The concepts of information theory as applied to infinite sequences [...] can acquire a certain value in the investigation of the algorithmic side of mathematics as a whole.”

A Logarithmic Measure

1 digit represents 10 numbers: 0, 1, 2, ..., 9;
2 digits represent 100 numbers: 00, 01, ..., 99;
3 digits represent 1000 numbers: 000, ..., 999;
...
log10 M digits represent M possible outcomes.

Ralph Hartley (1888–1970): “[...] take as our practical measure of information the logarithm of the number of possible symbol sequences” (“Transmission of Information,” BSTJ, 1928)

The Bit

log10 M digits represent M possible outcomes, or...
log2 M bits represent M possible outcomes.

John Tukey (1915–2000) coined the term “bit” (a contraction of “binary digit”), which was first used by Shannon in his 1948 paper.

Any information can be represented by a sequence of 0’s and 1’s: the Digital Revolution!

The Unit of Information

bit (binary digit, a unit of storage) ≠ bit (binary unit of information)
Less-likely messages are more informative than more-likely ones.
1 bit is the information content of one equiprobable bit (1/2, 1/2); otherwise the information content is < 1 bit.

The official name (international standard ISO/IEC 80000-13) for the information unit is... the shannon (symbol Sh).
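A minimal Python sketch of this point (illustrative values only): the average information content of a binary symbol equals 1 Sh only when the two outcomes are equiprobable.

    import math

    def info_content(p):
        """Information content -log2(p), in shannons (Sh), of an outcome of probability p."""
        return -math.log2(p)

    def binary_entropy(p):
        """Average information content of a binary source with P(1) = p."""
        if p in (0.0, 1.0):
            return 0.0
        return p * info_content(p) + (1 - p) * info_content(1 - p)

    print(binary_entropy(0.5))   # 1.0 Sh: one equiprobable bit carries exactly 1 bit of information
    print(binary_entropy(0.11))  # about 0.5 Sh: a biased bit carries less than 1 bit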

Fundamental Limit of Performance

Shannon does not really give practical solutions; he solves a theoretical problem: no matter what you do (for a given amount of resources), you cannot exceed a certain bit-rate limit and still achieve reliable communication.

Before Shannon, communication technologies had no such landmark. The limit can be calculated, so we know how far we are from it, and (in theory) you can get arbitrarily close to it. The challenge becomes: how can we build practical solutions that approach the limit?

Asymptotic Results

To find the limits of performance, Shannon’s results are necessarily asymptotic: a source is modeled as a sequence of random variables

X1, X2, ..., Xn

where the dimension n → +∞. This makes it possible to exploit dependences and obtain a geometric “gain”, using the law of large numbers, where limits are expressed as expectations E{·}.

Asymptotic Results: Example

Consider the source X1, X2, ..., Xn where each Xi can take a finite number of possible values, independently of the other symbols.

The probability of a message x = (x1, x2, ..., xn) is the product of the individual probabilities:

p(x) = p(x1) p(x2) ··· p(xn).

Re-arranging according to the value x taken by each argument:

p(x) = ∏x p(x)^n(x), where n(x) is the number of symbols equal to x.

By the law of large numbers, the empirical probability (frequency) n(x)/n → p(x) as n → +∞.

Therefore a “typical” message x = (x1, x2, ..., xn) satisfies

p(x) = ∏x p(x)^n(x) ≈ ∏x p(x)^(n p(x)) = 2^(−nH)

where H = Σx p(x) log2(1/p(x)) = E[log2(1/p(X))] is a positive quantity called entropy.

Shannon’s entropy H = Σx p(x) log2(1/p(x)):
analogy with statistical mechanics (Ludwig Boltzmann, 1844–1906);
the name was suggested by John von Neumann (1903–1957): “You should call it entropy [...] no one really knows what entropy really is, so in a debate you will always have the advantage.”;
studied in physics by Léon Brillouin (1889–1969).

The Source Coding Theorem

Compression problem: noiseless channel, minimize the bit rate.

(Block diagram: information source → transmitter → noiseless channel → receiver → destination; message → signal → received signal → message.)

A “typical” sequence x = (x1, x2, ..., xn) satisfies p(x) ≈ 2^(−nH). Summing over the N typical sequences gives N · 2^(−nH) ≈ 1, since the probability that x is typical is ≈ 1. So N ≈ 2^(nH). It is sufficient to encode only the N typical sequences:

(log2 N)/n ≈ H bits per symbol.

Theorem (Shannon’s First Theorem): Only H bits per symbol suffice to reliably encode an information source.

The entropy H is the bit-rate lower bound for reliable compression.

This is an asymptotic theorem (n → +∞), not a practical solution. Variable-length coding solutions were given by Shannon and Robert Fano (1917–2016); the optimal code (1952) is due to David Huffman (1925–1999); then Elias, Golomb, Lempel-Ziv, ...
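As an illustration of the variable-length codes mentioned above, here is a minimal Huffman-coding sketch in Python (a toy source with dyadic probabilities is assumed, so the average length exactly meets the entropy bound H):

    import heapq
    import math

    def huffman_code(probs):
        """Build a binary Huffman code for {symbol: probability}; returns {symbol: codeword}."""
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)          # merge the two least probable groups
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}
            merged.update({s: "1" + w for s, w in c2.items()})
            heapq.heappush(heap, (p1 + p2, counter, merged))
            counter += 1
        return heap[0][2]

    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # example source
    code = huffman_code(probs)
    H = sum(p * math.log2(1 / p) for p in probs.values())
    avg = sum(p * len(code[s]) for s, p in probs.items())
    print(code)    # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
    print(H, avg)  # entropy 1.75 bits/symbol, average length 1.75 bits/symbol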

Relative Entropy (or Divergence)

D(p, q) = Σx p(x) log2(p(x)/q(x)) ≥ 0, with D(p, q) = 0 iff p ≡ q.

Bounds of the type 2^(−n·D(p,q)) are useful in statistics:
large deviations theory, asymptotic behavior in hypothesis testing;
Chernoff information (Herman Chernoff, 1923–) to classify empirical data;
Fisher information (Ronald Fisher, 1890–1962) for parameter estimation.
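A minimal numerical sketch (arbitrary example distributions) of these properties: D(p, q) ≥ 0, D(p, p) = 0, and the exponential decay of bounds of the type 2^(−n·D(p,q)) as n grows.

    import math

    def kl_divergence(p, q):
        """D(p, q) = sum_x p(x) log2(p(x)/q(x)), in bits; p and q are lists of probabilities."""
        return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

    p = [0.5, 0.3, 0.2]
    q = [0.4, 0.4, 0.2]
    print(kl_divergence(p, q))          # > 0: the distributions differ
    print(kl_divergence(p, p))          # 0: D(p, q) = 0 iff p = q
    for n in (10, 100, 1000):           # 2^(-n D(p,q)) shrinks exponentially with n
        print(n, 2 ** (-n * kl_divergence(p, q)))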

Shannon’s Mutual Information

Shannon’s entropy of a random variable X:

H(X) = Σx p(x) log2(1/p(x)) = E[log2(1/p(X))]

Shannon’s (mutual) information between two random variables X, Y:

I(X; Y) = Σx,y p(x, y) log2(p(x, y)/(p(x)p(y))) = E[log2(p(X, Y)/(p(X)p(Y)))]

This is exactly D(p, q) where p(x, y) is the (true) joint distribution and q(x, y) = p(x)p(y) is what it would have been under independence. Therefore I(X; Y) ≥ 0, with I(X; Y) = 0 iff X and Y are independent.

Shannon writes

I(X; Y) = E[log2(p(X|Y)/p(X))] = H(X) − H(X|Y)

where H(X|Y) is the conditional entropy of X given Y. Since H(X|Y) ≤ H(X), knowledge decreases uncertainty by a quantity equal to the information gain I(X; Y): intuitive and rigorous!
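These identities can be verified on a toy joint distribution (the numbers below are arbitrary); a minimal sketch computing H(X), H(X|Y) and I(X; Y):

    import math

    # joint distribution p(x, y) of a toy pair (X, Y); rows are x, columns are y
    p_xy = [[0.30, 0.10],
            [0.05, 0.55]]

    p_x = [sum(row) for row in p_xy]              # marginal of X
    p_y = [sum(col) for col in zip(*p_xy)]        # marginal of Y

    H_x = sum(p * math.log2(1 / p) for p in p_x)
    H_x_given_y = sum(p_xy[x][y] * math.log2(p_y[y] / p_xy[x][y])
                      for x in range(2) for y in range(2))
    I = sum(p_xy[x][y] * math.log2(p_xy[x][y] / (p_x[x] * p_y[y]))
            for x in range(2) for y in range(2))

    print(round(I, 6), round(H_x - H_x_given_y, 6))  # equal: I(X;Y) = H(X) - H(X|Y) >= 0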

The Channel Coding Theorem

Transmission problem: noisy channel, maximize the bit rate for reliable communication.

(Block diagram: message → transmitter → signal → channel → received signal → receiver → message, with a noise source feeding the channel.)

It is sufficient to decode only sequences x jointly typical with the received y. But another codeword is also jointly typical with y with probability bounded by 2^(−n·I(X;Y)). Summing over the N code sequences, the total probability of decoding error is bounded by N · 2^(−n·I(X;Y)), which tends to zero only if the bit rate satisfies (log2 N)/n < I(X; Y).

Definition (Channel Capacity): C = max over input distributions p(x) of I(X; Y).
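As an illustration of this definition (a sketch, not part of the talk’s derivation), a brute-force computation for a binary symmetric channel with crossover probability ε: maximizing I(X; Y) over the input distribution recovers the well-known capacity 1 − H2(ε), attained by uniform inputs.

    import math

    def h2(p):
        """Binary entropy function, in bits."""
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def mutual_info_bsc(pi, eps):
        """I(X;Y) for a BSC(eps) with input distribution P(X=1) = pi."""
        py1 = pi * (1 - eps) + (1 - pi) * eps        # output distribution P(Y=1)
        return h2(py1) - h2(eps)                     # I = H(Y) - H(Y|X)

    eps = 0.11
    capacity = max(mutual_info_bsc(pi / 1000, eps) for pi in range(1001))
    print(round(capacity, 4), round(1 - h2(eps), 4))  # brute-force max vs closed form 1 - H2(eps)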

If the bit rate is < C, then the error probability, averaged over all possible codes, can be made as small as desired. Therefore there exists at least one code with an arbitrarily small probability of error.

Theorem (Shannon’s Second Theorem): Information can be transmitted reliably provided that the bit rate does not exceed the channel capacity C.

The capacity C is the bit-rate upper bound for reliable transmission.

Revolutionary! Transmission noise does not affect quality; it only limits the bit rate. This is the theorem that led to the digital revolution!

Shannon’s Result is Paradoxical!

Shannon’s theorems show that good codes exist, but give no clue on how to build them in practice. Yet choosing a code at random would be almost optimal! However, random coding is impractical (n is large)... Only 50 years later were turbo codes found (by Claude Berrou and Alain Glavieux), which imitate random coding to approach capacity.

Outline: Shannon’s Capacity Formula

An anagram: “claude shannon” = “a sound channel”

Shannon’s formula:

C = W log2(1 + P/N) bits/second

or...

C = (1/2) log2(1 + P/N) bits/symbol

Additive White Gaussian Noise Channel

A very common model: Y = X + Z, where Z is Gaussian N(0, σ²). Shannon finds the exact expression

C = W log2(1 + P/N) bit/s

where W is the bandwidth and P/N is the signal-to-noise ratio.

A “concrete” finding of information theory, and the most celebrated of Shannon’s formulas! To derive it, Shannon popularized the Whittaker-Nyquist sampling theorem, now often called “Shannon’s theorem”.
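A one-line numerical check of the formula; the bandwidth and signal-to-noise values below are arbitrary examples.

    import math

    def awgn_capacity(bandwidth_hz, snr_linear):
        """Shannon's formula C = W log2(1 + P/N), in bits per second."""
        return bandwidth_hz * math.log2(1 + snr_linear)

    # example: a 3 kHz channel at 30 dB signal-to-noise ratio
    snr_db = 30.0
    snr = 10 ** (snr_db / 10)                 # convert dB to a linear power ratio
    print(round(awgn_capacity(3000.0, snr)))  # roughly 29,900 bits per second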

Claude Shannon

Shannon’s formula: C = W log2((P + N)/N)

“A Mathematical Theory of Communication,” The Bell System Technical Journal, Vol. 27, pp. 623–656, October 1948.

Ralph Hartley

20 years before... in the same journal...

Hartley’s rule: C0 = log2(1 + A/∆) bits/symbol

“Transmission of Information,” The Bell System Technical Journal, Vol. 7, pp. 535–563, July 1928.

(Wozencraft-Jacobs textbook, 1965)

Amplitude “SNR” A/∆ (the factor 1/2 is missing); no coding involved (except quantization); zero error.
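Hartley’s counting argument can be simulated directly. A minimal sketch under the stated assumptions (amplitudes limited to ±A, noise bounded by ±∆, no coding beyond quantization): with M = 1 + A/∆ levels spaced 2∆ apart, nearest-level decoding never fails, so log2(1 + A/∆) bits per symbol are conveyed with zero error.

    import math
    import random

    random.seed(1)
    A, delta = 8.0, 1.0                  # amplitude limit A and noise amplitude ±delta
    M = int(1 + A / delta)               # number of distinguishable levels (Hartley)
    levels = [-A + 2 * delta * k for k in range(M)]   # levels spaced 2*delta apart in [-A, A]

    errors = 0
    n_symbols = 10_000
    for _ in range(n_symbols):
        x = random.choice(levels)
        y = x + random.uniform(-delta, delta)            # bounded noise
        decoded = min(levels, key=lambda v: abs(v - y))  # nearest-level decision
        errors += (decoded != x)

    print("levels:", M, "-> C0 =", math.log2(M), "bits/symbol")  # log2(1 + A/delta)
    print("decoding errors:", errors)                            # 0: zero error, no coding needed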

Outline (the commonly told story)

Hartley’s C0 = log2(1 + A/∆) came 20 years before Shannon.
Shannon’s C = (1/2) log2(1 + P/N) came unexpected in 1948.
Hartley’s rule is inexact: C0 ≠ C.
Besides, C0 is not the capacity of a noisy channel.

Wrong!

Outline (what actually happened)

Hartley’s rule C0 = log2(1 + A/∆) is not Hartley’s.
Shannon’s C = (1/2) log2(1 + P/N) was derived by 7 other authors in 1948!
Hartley’s rule is exact: C0 = C (a coincidence?).
C0 is the capacity of a noisy “uniform” channel.

Outline: Hartley’s rule C0 = log2(1 + A/∆) is not Hartley’s

“I was a great fan of Edgar Allan Poe’s ‘The Gold Bug’ and stories like that. And I used to solve cryptograms when I was a boy.”

Hartley or not Hartley

Quote from Shannon, 1984 (interview in IEEE Communications Magazine, May 1984, Vol. 22, No. 5): “I started with Hartley’s paper and worked at least two or three years on the problems of information and communications”; “... they are very similar things, in one case trying to conceal information, and in the other case trying to transmit it.”

In Hartley’s paper there is no mention of signal vs. noise, nor of A vs. ∆. Why, then, was C0 = log2(1 + A/∆) mistakenly attributed to Hartley?

The first tutorial of information theory!

Outline: Many authors independently derived C = (1/2) log2(1 + P/N) in 1948

And then there were eight

Quote from Shannon, 1948:

1. Norbert Wiener, Cybernetics, ? 1948
2. William G. Tuller, PhD thesis, June 1948
3. Herbert Sullivan (unpublished)
4. Jacques Laplume, April 1948
5. Charles W. Earp, June 1948
6. André G. Clavier, December 1948
7. Stanford Goldman, May 1948
8. Claude E. Shannon, October 1948

63 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective Norbert Wiener

. 64 / 85 26/10/2016 Olivier Rioul .Shannon’s Formula W log(1 + SNR): A Historical Perspective .f :;-, *: j&y+; .,_ ; c.2 i ,g Ir 48 IRE TRANSACTIONS ON INFORMATION THEORY June

.f :;-, *: j&y+; .,_ ; c.2 i ,g Ir 48 IRE TRANSACTIONS ON INFORMATION THEORY June

Later. . . in 1956: What is Information Theory? What is Information Theory?

NORBERT WIENER NORBERT WIENER NFORMATION THEORYNFORMATION has been identifiedTHEORY communication has beentheory identified and in my opinioncommunication in all theory and in my opinion in all in the public mind to denote the theory of infor- branches of science whatever. It is generally recog- mation by bits, asin developedthe public by Claudemind E.to denotenized that the the theoryquantum theoryof infor- which nowbranches dominates of science whatever. It is generally recog- Shannon and myself. Thismation notion is bycertainly bits, impor- as developedthe whole of physicsby Claudeis at root E.a statisticalnized theory; that the quantum theory which now dominates tant and has provedShannon profitable andas amyself. standpoint Thisat notionalthough itis iscertainly perhaps not impor-yet as generally the recognized whole of physics is at root a statistical theory; least, although as Dr. Shannon suggests in his edi- as it should be, the quantum theory is strictly a torial, “The Bandwagon,tant ” andthe concepthas provedas taken fromprofitable branch ofas the atheory standpoint of time series. at Professoralthough Armand it is perhaps not yet as generally recognized this point of viewleast, is beginning although to suffer as fromDr. theShannon Siegel andsuggests I are among in those his nowedi- working asin thisit field.should be, the quantum theory is strictly a indiscriminate way in which it has been taken as a What I am here entreating is that communication solution of all informationaltorial, “ Theproblems, Bandwagon, a sort of” thetheory concept be studied as as taken one item from in an entirebranch context ofof the theory of time series. Professor Armand magic key. I am pleadingthis pointin this editorialof view that isInfor- beginning related theoriesto suffer of a statisticalfrom thenature, Siegeland that and it I are among those now working in this field. mation Theory go back of its slogans and return to the should not lose its integrity by becoming a special indiscriminate way in which it has been taken as a What I am ,-here entreating is that communication point of view from which it originated: that of the vested interest attached to a certain set of slogans general statistical solutionconcept of communication.of all informational A mes- and cliches.problems, I hope thata sortthese TRANSACTIONSof theory maybe studied as one item in an entire context of sage is to be conceivedmagic as akey. sequence I am of occurrencespleading inencourage this editorial this integrated that Infor-view of communicationrelated theories of a statistical , and that it distributed in time to be considered not exclusively theory by extending its hospitality to papers which, by itself, but as onemation of an Theoryensemble goof similarback se-of itswhile slogans they bearand on return communication to the theory,should cross itsnot lose its integrity by becoming a special quences. As such pointit comes ofunder view the theoryfrom ofwhich time itboundaries, originated: and havethat a scopeof thecovering vestedthe related interest attached to a certain set of slogans ,- series which is angeneral important statisticalbranch of statisticalconcept ofstatistical communication. theories. In myA opinion mes- we areand in acliches. dan- I hope that these TRANSACTIONS may theory with a rapidly developing technique and set gerous age of overspecialization. To me the danger of of concepts of its sageown. 
Thisis totheory be conceivedis closely allied as athis sequence period is notof occurrencesprimarily that we areencourage studying this integrated view of communication to the ideas of Willarddistributed Gibbs in statisticalin time mechanics. to be consideredvery special problemsnot exclusively that the development theory of science by extending its hospitality to papers which, What I am urging is a return to the concepts of this has forced us to go into, but rather that we are in theory in its entiretyby ratheritself, thanbut the asexaltation one ofof angreat ensemble danger of offinding similar our outlookse- so whilelimited thatthey bear on , cross its one particular conceptquences. of this Asgroup, such the conceptit comes of weunder may failthe totheory see the bearingof time of importantboundaries, ideas and have a scope covering the related the measure of informationseries whichinto the singleis an dominant important because branch they haveof beenstatistical formulated instatistical what our theories. In my opinion we are in a dan- idea of all. organization of science has decreed to be alien terri- I am pleading fortheory this morewith particularly a rapidly because developing tory. I hopetechnique that these TRANSACTIONSand set maygerous steadily age of overspecialization. To me the danger of the Gibbsian point ofof viewconcepts is showing of an its applicability own. Thisset theorytheir face is againstclosely this allied comminution this ofperiod the is not primarily that we are studying and fertility in manyto thebranches ideas of scienceof Willard other thanGibbs intellect. in statistical mechanics. very special problems that the development of science What I am urging is a return to the concepts of this has forced us to go into, but rather that we are in theory in its entirety rather than the exaltation of great danger of finding our outlook so limited that one particular concept of this group, the concept of we may fail to see the bearing of important ideas the measure of information into the single dominant because they have been formulated in what our idea of all. organization of science has decreed to be alien terri- I am pleading for this more particularly because tory. I hope thatF these TRANSACTIONS may steadily the Gibbsian point of view is showing an applicability set their face against this comminution of the and fertility in many branches of science other than intellect.

F

Norbert Wiener

65 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective .f :;-, *: j&y+; .,_ ; c.2 i ,g Ir 48 IRE TRANSACTIONS ON INFORMATION THEORY June Norbert Wiener

.f :;-, *: j&y+; .,_ ; c.2 i ,g Ir 48 IRE TRANSACTIONS ON INFORMATION THEORY June

Later. . . in 1956: What is Information Theory? What is Information Theory?

NORBERT WIENER NORBERT WIENER NFORMATION THEORYNFORMATION has been identifiedTHEORY communication has beentheory identified and in my opinioncommunication in all theory and in my opinion in all in the public mind to denote the theory of infor- branches of science whatever. It is generally recog- mation by bits, asin developedthe public by Claudemind E.to denotenized that the the theoryquantum theoryof infor- which nowbranches dominates of science whatever. It is generally recog- Shannon and myself. Thismation notion is bycertainly bits, impor- as developedthe whole of physicsby Claudeis at root E.a statisticalnized theory; that the quantum theory which now dominates tant and has provedShannon profitable andas amyself. standpoint Thisat notionalthough itis iscertainly perhaps not impor-yet as generally the recognized whole of physics is at root a statistical theory; least, although as Dr. Shannon suggests in his edi- as it should be, the quantum theory is strictly a torial, “The Bandwagon,tant ” andthe concepthas provedas taken fromprofitable branch ofas the atheory standpoint of time series. at Professoralthough Armand it is perhaps not yet as generally recognized 65 / 85 26/10/2016this point of viewleast, is beginningOlivier although Rioul to suffer as fromDr. theShannon Shannon’sSiegel andsuggests Formula I are Wamonglog (1in +those SNRhis ) now: Aedi- Historical working Perspectiveasin thisit field.should be, the quantum theory is strictly a indiscriminate way in which it has been taken as a What I am here entreating is that communication solution of all informationaltorial, “ Theproblems, Bandwagon, a sort of” thetheory concept be studied as as taken one item from in an entirebranch context ofof the theory of time series. Professor Armand magic key. I am pleadingthis pointin this editorialof view that isInfor- beginning related theoriesto suffer of a statisticalfrom thenature, Siegeland that and it I are among those now working in this field. mation Theory go back of its slogans and return to the should not lose its integrity by becoming a special indiscriminate way in which it has been taken as a What I am ,-here entreating is that communication point of view from which it originated: that of the vested interest attached to a certain set of slogans general statistical solutionconcept of communication.of all informational A mes- and cliches.problems, I hope thata sortthese TRANSACTIONSof theory maybe studied as one item in an entire context of sage is to be conceivedmagic as akey. sequence I am of occurrencespleading inencourage this editorial this integrated that Infor-view of communicationrelated theories of a statistical nature, and that it distributed in time to be considered not exclusively theory by extending its hospitality to papers which, by itself, but as onemation of an Theoryensemble goof similarback se-of itswhile slogans they bearand on return communication to the theory,should cross itsnot lose its integrity by becoming a special quences. As such pointit comes ofunder view the theoryfrom ofwhich time itboundaries, originated: and havethat a scopeof thecovering vestedthe related interest attached to a certain set of slogans ,- series which is angeneral important statisticalbranch of statisticalconcept ofstatistical communication. theories. In myA opinion mes- we areand in acliches. dan- I hope that these TRANSACTIONS may theory with a rapidly developing technique and set gerous age of overspecialization. To me the danger of of concepts of its sageown. 
Thisis totheory be conceivedis closely allied as athis sequence period is notof occurrencesprimarily that we areencourage studying this integrated view of communication to the ideas of Willarddistributed Gibbs in statisticalin time mechanics. to be consideredvery special problemsnot exclusively that the development theory of science by extending its hospitality to papers which, What I am urging is a return to the concepts of this has forced us to go into, but rather that we are in theory in its entiretyby ratheritself, thanbut the asexaltation one ofof angreat ensemble danger of offinding similar our outlookse- so whilelimited thatthey bear on communication theory, cross its one particular conceptquences. of this Asgroup, such the conceptit comes of weunder may failthe totheory see the bearingof time of importantboundaries, ideas and have a scope covering the related the measure of informationseries whichinto the singleis an dominant important because branch they haveof beenstatistical formulated instatistical what our theories. In my opinion we are in a dan- idea of all. organization of science has decreed to be alien terri- I am pleading fortheory this morewith particularly a rapidly because developing tory. I hopetechnique that these TRANSACTIONSand set maygerous steadily age of overspecialization. To me the danger of the Gibbsian point ofof viewconcepts is showing of an its applicability own. Thisset theorytheir face is againstclosely this allied comminution this ofperiod the is not primarily that we are studying and fertility in manyto thebranches ideas of scienceof Willard other thanGibbs intellect. in statistical mechanics. very special problems that the development of science What I am urging is a return to the concepts of this has forced us to go into, but rather that we are in theory in its entirety rather than the exaltation of great danger of finding our outlook so limited that one particular concept of this group, the concept of we may fail to see the bearing of important ideas the measure of information into the single dominant because they have been formulated in what our idea of all. organization of science has decreed to be alien terri- I am pleading for this more particularly because tory. I hope thatF these TRANSACTIONS may steadily the Gibbsian point of view is showing an applicability set their face against this comminution of the and fertility in many branches of science other than intellect.

F Charles W. Earp

66 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective Charles W. Earp

66 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective

Stanford Goldman

[Slide shows a scan of the first page of S. Goldman, "Some Fundamental Considerations Concerning Noise Reduction and Range in Radar and Communication," Proc. IRE, vol. 36, p. 584 ff., May 1948. Recoverable text:]

Summary: A general analysis based upon information theory and the mathematical theory of probability is used to investigate the fundamental principles involved in the transmission of signals through a background of random noise. Three general theorems governing the probability relations between signal and noise are proved, and one is applied to investigate the effect of pulse length and repetition rate on radar range. [...] It is also pointed out why more powerful noise-improvement systems should be possible than have so far been made.

I. INFORMATION THEORY

The signals which are of interest in radio engineering may be represented graphically as functions of time. One such signal is shown in Fig. 1. In a transmission system having L different significant amplitude levels, any particular signal such as that shown, having a duration of n significant time intervals, represents one out of L^n different possible signals of this duration which could have been transmitted in the system.¹ With the foregoing meaning for the various symbols, we have

    number of different possible messages = L^n.   (1)²

The number of significant amplitude levels is usually determined by the noise in the system. If the system is of a linear nature, and the maximum signal amplitude is S, while the noise amplitude is N, then the number of significant amplitude levels is essentially

    L = (S/N) + 1,   (2)

where the "1" is due to the fact that the zero signal level can be used.

The duration of a significant time interval of the signal is determined by the inherent limited bandwidth of the signal. It is well known that, if a signal has passed through a transmission system having more or less uniform transmission over a frequency bandwidth B, the smallest time intervals into which we can separate the portions of the signal, such that the amplitudes of the individual intervals shall be separately significant, will have a duration of approximately³

    t0 = 1/(2B).   (3)

Equation (3) may, in any particular case, be in error by several per cent. However, it will not be wrong by an order of magnitude. If the total duration of the signal is T, then the number of its significant time intervals is

    n = T/t0 = 2TB.   (4)

Consequently, a given message of duration T represents a particular choice of one out of

    L^n = (S/N + 1)^(2TB)   (5)⁴

different possible messages of the same duration which could have been sent through the system.

Fig. 1: Diagram of a signal, showing its significant time intervals and amplitude levels. This signal is in a system in which there are both positive and negative levels. With noise also having both positive and negative levels, the spacing between signal levels must be the peak-to-peak value of noise, namely 2N, so that the number of different significant amplitude levels is still L = (S/N) + 1. (The ideal signal is shown by the broken line. The solid line shows the same signal after passing through a transmission system of bandwidth B.)

¹ R. V. L. Hartley, "Transmission of information," Bell Sys. Tech. Jour., vol. 7, pp. 535-563; July, 1928.
² For example, if there are three amplitude levels, designated as a, b, and c, and if there are two time intervals, then the 3² = 9 possible signals are aa, ab, ac, ba, bb, bc, ca, cb, and cc.
³ Stanford Goldman, "Frequency Analysis, Modulation and Noise," McGraw-Hill Book Co., New York, N. Y., 1947; chap. IV, especially Fig. 7c.
⁴ Equation (5) has been derived independently by many people, among them W. G. Tuller, from whom the writer first learned about it.

67 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective
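To make the counting in (1)-(5) concrete, here is a minimal Python sketch; it is an editorial illustration rather than anything from Goldman's paper, and the function names and the numerical values of S/N, B, and T are arbitrary assumptions.

import math

def goldman_message_count(S_over_N, B, T):
    # Eq. (2): L = S/N + 1 distinguishable amplitude levels.
    L = S_over_N + 1
    # Eqs. (3)-(4): n = T/t0 = 2TB significant time intervals.
    n = int(round(2 * T * B))
    # Eq. (5): L^n possible messages of duration T.
    return L ** n

def goldman_rate_bits_per_second(S_over_N, B):
    # log2 of eq. (5), divided by T: 2B log2(1 + S/N) bits per second.
    return 2 * B * math.log2(1 + S_over_N)

# Arbitrary illustrative values: S/N = 7, B = 3000 Hz, T = 0.01 s.
print(goldman_message_count(7, 3000, 0.01))        # 8**60 distinct messages in 10 ms
print(goldman_rate_bits_per_second(7, 3000))       # 18000.0 bits per second

Note that S and N enter as amplitudes here, in contrast with the power ratio P/N in Shannon's formula discussed later in the talk.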
William G. Tuller

[Slide shows scans of pages 468 and 472 of W. G. Tuller, "Theoretical Limitations on the Rate of Transmission of Information," Proc. IRE, vol. 37, p. 468 ff., May 1949. Recoverable text:]

Summary: A review of early work on the theory of the transmission of information is followed by a critical survey of this work and a refutation of the point that, in the absence of noise, there is a finite limit to the rate at which information may be transmitted over a finite frequency band. A simple theory is then developed which includes, in a first-order way, the effects of noise. This theory [...] shows that information may be transmitted over a given circuit according to the relation H = 2BT log(1 + C/N), where H is the quantity of information, B the transmission-link bandwidth, T the time of transmission, and C/N the carrier-to-noise ratio. Certain special cases are considered, and it is shown that there are two distinctly different types of modulation systems, one trading bandwidth linearly for signal-to-noise ratio, the other trading bandwidth logarithmically for signal-to-noise ratio. [...]

I. INTRODUCTION

The history of this investigation goes back at least to 1922, when Carson,¹ analyzing narrow-deviation frequency modulation as a bandwidth-reduction scheme, wrote that "all such schemes are believed to involve a fundamental fallacy." In 1924, Nyquist² and Küpfmüller,³ working independently, showed that the number of telegraph signals that may be transmitted over a line is directly proportional to its bandwidth. Hartley,⁴ writing in 1928, generalized this theory to apply to speech and general information, concluding that "the total amount of information which may be transmitted ... is proportional to the product of the frequency range which is transmitted and the time which is available for the transmission." It is Hartley's work that is the most direct ancestor of the present paper. In his paper he introduced the concept of the information function, the measure of quantity of information, and the general technique used in this paper. He neglected, however, the possibility of the use of the knowledge of the transient-response characteristics of the circuits involved. He further neglected noise. In 1946, Gabor⁵ presented an analysis which broke away from the Hartley theory; however, Gabor also failed to include noise in his reasoning. [...] Wiener⁶ has himself done work in a field theoretically parallel to that presented in this paper, but this work is as yet unpublished, and its existence was learned of only after the completion of substantially all the research reported on here. A group at the Bell Telephone Laboratories, including C. E. Shannon, has also done similar work.⁹ ¹⁰ ¹¹

[From page 472:] As a result of the considerations given above, we are led to the conclusion that the only limits to the rate of transmission of information on a noise-free circuit are economic and practical, not theoretical.

VI. TRANSMISSION OF INFORMATION IN THE PRESENCE OF NOISE

So far, Hartley's definition of information has been investigated and shown adequate for this analysis. The early theories of transmission of information have been refuted. In the portion of the work that follows, a modified version of the Hartley law applicable to a system in which noise is present is derived. This is done for the general case and for two special types of wide-band modulation systems, uncoded and coded systems. As a result of these analyses the fundamental relation between rate of transmission of information and transmission facilities is derived. [...]

Since we have shown that intersymbol interference is unimportant in limiting the rate of transmission of information, let us assume it absent. Let S be the rms amplitude of the maximum signal that may be delivered by the communication system. Let us assume, a fact very close to the truth, that a signal amplitude change less than noise amplitude cannot be recognized, but a signal amplitude change equal to noise is instantly recognizable.¹⁴ Then, if N is the rms amplitude of the noise mixed with the signal, there are 1 + S/N significant values of signal that may be determined. This sets the s in the derivation of (1). Since it is known that the specification of an arbitrary wave of duration T and maximum component f_c requires 2f_cT measurements, we have from (1) the quantity of information available at the output of the system:

    H = kn log s = 2k f_c T log(1 + S/N).   (2)

* Original manuscript received by the Institute, September 7, 1948; revised manuscript received, February 3, 1949. This paper is based on a thesis submitted in partial fulfillment of the requirements of the degree of Doctor of Science at the Massachusetts Institute of Technology.
¹ J. R. Carson, "Notes on the theory of modulation," Proc. I.R.E., vol. 10, p. 57; February, 1922.
² H. Nyquist, "Certain factors affecting telegraph speed," Bell Sys. Tech. Jour., vol. 3, p. 324; April, 1924.
³ K. Küpfmüller, "Transient phenomena in wave filters," Elek. Nach. Tech., vol. 1, p. 141; 1924.
⁴ R. V. L. Hartley, "Transmission of information," Bell Sys. Tech. Jour., vol. 7, pp. 535-564; July, 1928.
⁵ D. Gabor, "Theory of communication," Jour. I.E.E. (London), vol. 93, part III, p. 439; November, 1946.
⁶ N. Wiener, "The extrapolation, interpolation and smoothing of stationary time series," National Defense Research Council, Section D2 Report, February, 1942.
⁷ N. Levinson, "The Wiener (RMS) error criterion in filter design and prediction," Jour. Math. Phys., vol. 25, no. 4, p. 261; 1947.
⁸ H. M. James, "Ideal frequency response of receiver for square pulses," Report No. 125 (V-12s), Radiation Laboratory, MIT, November 1, 1941.
⁹ C. E. Shannon, "A mathematical theory of communication," Bell Sys. Tech. Jour., vol. 27, pp. 379-424 and 623-657; July and October, 1948.
¹⁰ C. E. Shannon, "Communication in the presence of noise," Proc. I.R.E., vol. 37, pp. 10-22; January, 1949.
¹¹ The existence of this work was learned by the author in the spring of 1946, when the basic work underlying this paper had just been completed. Details were not known by the author until the summer of 1948, at which time the work reported here had been complete for about eight months.
¹⁴ This assumption ignores the random nature of noise to a certain extent, resulting in a theoretical limit about 3 to 8 db above that actually obtainable. The assumption is believed worth while in view of the enormous simplification of theory obtained. For a more precise formulation of the theory, see footnote references 9 and 10.

68 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective
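As a quick numerical illustration of Tuller's relation H = 2BT log(1 + S/N), here is a short Python sketch; it is not from the paper, the logarithm is taken to base 2 (i.e., the constant k is chosen so that H comes out in bits), and the function name and the numbers are arbitrary assumptions.

import math

def tuller_information_bits(B, T, S_over_N):
    # Tuller's relation H = 2BT log(1 + S/N), with log2 so H is in bits.
    return 2 * B * T * math.log2(1 + S_over_N)

# Arbitrary illustrative values: a 3000-Hz link used for 1 s at S/N = 1000.
H = tuller_information_bits(B=3000, T=1.0, S_over_N=1000)
print(H)   # about 59,804 bits

Dividing H by T gives the rate 2B log2(1 + S/N) bits per second, the same counting rate implied by Goldman's eq. (5) above.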

Claude E. Shannon

[Slide shows scans of the first page and of the capacity discussion of C. E. Shannon, "Communication in the Presence of Noise," Proc. IRE, vol. 37, p. 10 ff., January 1949. Recoverable text:]

Communication in the Presence of Noise*
CLAUDE E. SHANNON, MEMBER, IRE

Summary: A method is developed for representing any communication system geometrically. Messages and the corresponding signals are points in two "function spaces," and the modulation process is a mapping of one space into the other. Using this representation, a number of results in communication theory are deduced concerning expansion and compression of bandwidth and the threshold effect. Formulas are found for the maximum rate of transmission of binary digits over a system when the signal is perturbed by various types of noise. Some of the properties of "ideal" systems which transmit at this maximum rate are discussed. The equivalent number of binary digits per second for certain information sources is calculated.

* Decimal classification: 621.38. Original manuscript received by the Institute, July 23, 1940. Presented, 1948 IRE National Convention, New York, N. Y., March 24, 1948; and IRE New York Section, New York, N. Y., November 12, 1947.
† Bell Telephone Laboratories, Murray Hill, N. J.

I. INTRODUCTION

A general communication system is shown schematically in Fig. 1. It consists essentially of five elements: an information source, a transmitter, [...]

[From the capacity discussion:] This discussion is relevant to the well-known "Hartley Law," which states that "an upper limit to the total amount of information which may be transmitted is set by the sum for the various available lines of the product of the line-frequency range of each by the time during which it is available for use." There is a sense in which this statement is true, and another sense in which it is false. It is not possible to map the message space into the signal space in a one-to-one, continuous manner (this is known mathematically as a topological mapping) unless the two spaces have the same dimensionality; i.e., unless D = 2TW. Hence, if we limit the transmitter and receiver to continuous one-to-one operations, there is a lower bound to the product TW in the channel. This lower bound is determined, not by the product W1T1 of message bandwidth and time, but by the number of essential dimensions D, as indicated in Section IV. There is, however, no good reason for limiting the transmitter and receiver to topological mappings. [...] It is evident that any system, either to compress TW, or to expand it and make full use of the additional volume, must be highly nonlinear in character and fairly complex because of the peculiar nature of the mappings involved.

VII. THE CAPACITY OF A CHANNEL IN THE PRESENCE OF WHITE THERMAL NOISE

It is not difficult to set up certain quantitative relations that must hold when we change the product TW. Let us assume, for the present, that the noise in the system is a white thermal noise band-limited to the band W, and that it is added to the transmitted signal to produce the received signal. A white thermal noise has the property that each sample is perturbed independently of all the others, and the distribution of each amplitude is Gaussian with standard deviation σ = √N, where N is the average noise power. How many different signals can be distinguished at the receiving point in spite of the perturbations due to noise? A crude estimate can be obtained as follows. If the signal has a power P, then the perturbed signal will have a power P + N. The number of amplitudes that can be reasonably well distinguished is

    K √((P + N)/N),   (16)

where K is a small constant in the neighborhood of unity depending on how the phrase "reasonably well" is interpreted. If we require very good separation, K will be small, while toleration of occasional errors allows K to be larger. Since in time T there are 2TW independent amplitudes, the total number of reasonably distinct signals is

    M = (K √((P + N)/N))^(2TW).   (17)

The number of bits that can be sent in this time is log2 M, and the rate of transmission is

    (log2 M)/T = W log2 (K² (P + N)/N)  (bits per second).   (18)

The difficulty with this argument, apart from its general approximate character, lies in the tacit assumption that for two signals to be distinguishable they must differ at some sampling point by more than the expected noise. The argument presupposes that PCM, or something very similar to PCM, is the best method of encoding binary digits into signals. Actually, two signals can be reliably distinguished if they differ by only a small amount, provided this difference is sustained over a long period of time. Each sample of the received signal then gives a small amount of statistical information concerning the transmitted signal; in combination, these statistical indications result in near certainty. This possibility allows an improvement of about 8 db in power over (18) with a reasonable definition of reliable resolution of signals, as will appear later. We will now make use of the geometrical representation to determine the exact capacity of a noisy channel.

THEOREM 2: Let P be the average transmitter power, and suppose the noise is white thermal noise of power N in the band W. By sufficiently complicated encoding systems it is possible to transmit binary digits at the rate

    C = W log2 ((P + N)/N)   (19)

with as small a frequency of errors as desired. It is not possible by any encoding method to send at a higher rate and have an arbitrarily low frequency of errors.

This shows that the rate W log((P + N)/N) measures in a sharply defined way the capacity of the channel for transmitting information. It is a rather surprising result, since one would expect that reducing the frequency of errors would require reducing the rate of transmission, and that the rate must approach zero as the error frequency does. Actually, we can send at the rate C but reduce errors by using more involved encoding and longer delays at the transmitter and receiver. The transmitter will take long sequences of binary digits and represent this entire sequence by a particular signal function of long duration. The delay is required because the transmitter must wait for the full sequence before the signal is determined. Similarly, the receiver must wait for the full signal function before decoding into binary digits.

We now prove Theorem 2. In the geometrical representation each signal point is surrounded by a small region of uncertainty due to noise. With white thermal noise, the perturbations of the different samples (or coordinates) [...]

69 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective

- 3000 me. the space-charge conditions become unfavorable; that is, 20fC the inhomogeneity factor becomes too large. lo -_ The electronic bandwidth was measured by measur- ing the gain of the tube over a frequency range from 2000 to 3000 Mc and retuning the input and output cir- cuits for each frequency. It was observed that the gain of the tube was essentially constant over this frequency range, thus confirming the theoretical prediction of -20 electronic bandwidth of over 30 per cent at the gain of 80 db.

CATHODE POTENTIAL DIFFERENCE (V -V2)V I~ L The electron-wave tube, because of its remarkable 10 o 20 30 40 so 60 70 80 90 100 110 §20 property of achieving energy amplification without the Fig. 17-Gain versus cathode-potential-difference characteristics use of any resonant or waveguiding structures in the of the two-velocity-type electron-wave tube. amplifying region of the tube, promises to offer a satis- factory solution to the problem of generation and At a frequency of 3000 Mc and a total current of 15 ma, amplification of energy at millimeter wavelengths, and a net gain of 46 db was obtained, even though no at- thus will aid in expediting the exploitation of that por- tempt was made to match either the input or output tion of the electromagnetic spectrum. circuits. The lack of appropriate match is responsible for the fact that the gain curve assumes negative values ACKNOWLEDGMENT when the electronic gain is not sufficient to overcome the The author wishes to express his appreciation of the losses due to mismatch. At the peak of the curve, it is enthusiastic support of all his co-workers at the Naval estimated that the electronic gain is of the order of 80 Research Laboratory who helped to carry out this proj- db. Claude E. Shannonect from the stage of conception to the production and The curves of output voltage versus the potential of tests of experimental electron-wave tubes. The untiring the drift tube were shown in Figs. 8 and 9. Fig. 9 shows efforts of two of the author's assistants, C. B. Smith this characteristic for the electron-wave tube of the and R. S. Ware, are particularly appreciated.

Communication in the Presence of Noise* CLAUDE E. SHANNONt, MEMBER, IRE Summary-A method is developed for representing any com- I. INTRODUCTION munication system geometrically. Messages and the corresponding signals are points in two "function spaces," and the modulation A GENERAL COMMUNICATIONS system is process is a mapping of one space into the other. Using this repre- shown schematically in Fig. 1. It consists essen- sentation, a number of results in communication theory are deduced of elements. concerning expansion and compression of bandwidth and the tially five threshold effect. Formulas are found for the maxmum rate of trans- 1. An information source. The source selects one mes- mission of binary digits over a system when the signal is perturbed sage from a set of possible messages to be transmitted to by various types of noise. Some of the properties of "ideal" systems the receiving terminal. The message may be of various which transmit at this maxmum rate are discussed. The equivalent types; for example, a sequence of letters or numbers, as number of binary digits per second for certain information sources is calculated. in telegraphy or teletype, or a continuous function of timef(t), as in radio or telephony. * Decimal classification: 621.38. Original manuscript received by the Institute, July 23, 1940. Presented, 1948 IRE National Conven- 2. The transmitter. This operates on the message in tion, New York, N. Y., March 24, 1948; and IRE New York Section, some way and produces a signal suitable for transmis- New York, N.69Y., / 85November26/10/2016 12, 1947. Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective t Bell Telephone Laboratories, Murray Hill, N. J. sion to the receiving point over the channel. In teleph- 10 PROCEEDINGS OF THE I.R.E. January through the attenuator to the receiver. In this manner, space-charge type illustrated in Fig. 5. The shape of this the gain versus the cathode-potential-difference curve of curve corresponds rather closely with the shape of the Fig. 17 was obtained. This figure corresponds rather theoretical curve given in Fig. 7. Fig. 8 shows the output closely with the theoretical curve of propagation con- voltage versus drift-potential characteristic for the two- stant versus the inhomogeneity factor, shown in Fig. 1. velocity-type electron-wave tube. When the drift-tube voltage is high, the tube behaves like the two-cavity klystron amplifier. As the drift voltage is lowered the gain gradually increases, due to the space-charge inter- action effect, and achieves a maximum which is ap- 40 - - - I, I,- 15.ma. proximately 60 db higher than the output achieved with klystron operation. With further reduction of the drift- 30c1 -2X2 238.volts tube potential the output drops rather rapidly, because

- 3000 me. the space-charge conditions become unfavorable; that is, 20fC the inhomogeneity factor becomes too large. lo -_ The electronic bandwidth was measured by measur- ing the gain of the tube over a frequency range from 2000 to 3000 Mc and retuning the input and output cir- cuits for each frequency. It was observed that the gain of the tube was essentially constant over this frequency range, thus confirming the theoretical prediction of -20 electronic bandwidth of over 30 per cent at the gain of 80 db.

CATHODE POTENTIAL DIFFERENCE (V -V2)V I~ L The electron-wave tube, because of its remarkable 10 o 20 30 40 so 60 70 80 90 100 110 §20 property of achieving energy amplification without the Fig. 17-Gain versus cathode-potential-difference characteristics use of any resonant or waveguiding structures in the of the two-velocity-type electron-wave tube. amplifying region of the tube, promises to offer a satis- factory solution to the problem of generation and At a frequency of 3000 Mc and a total current of 15 ma, amplification of energy at millimeter wavelengths, and a net gain of 46 db was obtained, even though no at- thus will aid in expediting the exploitation of that por- tempt was made to match either the input or output tion of the electromagnetic spectrum. circuits. The lack of appropriate match is responsible for the fact that the gain curve assumes negative values ACKNOWLEDGMENT when the electronic gain is not sufficient to overcome the The author wishes to express his appreciation of the losses due to mismatch. At the peak of the curve, it is enthusiastic support of all his co-workers at the Naval estimated that the electronic gain is of the order of 80 Research Laboratory who helped to carry out this proj- db. ect from the stage of conception to the production and The curves of output voltage versus the potential of tests of experimental electron-wave tubes. The untiring the drift tube were shownClaudein Figs. 8 and E.9. ShannonFig. 9 shows efforts of two of the author's assistants, C. B. Smith this characteristic for the electron-wave tube of the and R. S. Ware, are particularly appreciated.

Communication in the Presence of Noise* CLAUDE E. SHANNONt, MEMBER, IRE Summary-A method is developed for representing any com- I. INTRODUCTION munication system geometrically. Messages and the corresponding signals are points in two "function spaces," and the modulation A GENERAL COMMUNICATIONS system is process is a mapping of one space into the other. Using this repre- shown schematically in Fig. 1. It consists essen- sentation, a number of results in communication theory are deduced of elements. concerning expansion and compression of bandwidth and the tially five threshold effect. Formulas are found for the maxmum rate of trans- 1. An information source. The source selects one mes- mission of binary digits over a system when the signal is perturbed sage from a set of possible messages to be transmitted to by various types of noise. Some of the properties of "ideal" systems the receiving terminal. The message may be of various which transmit at this maxmum rate are discussed. The equivalent types; for example, a sequence of letters or numbers, as number of binary digits per second for certain information sources is calculated. in telegraphy or teletype, or a continuous function of timef(t), as in radio or telephony. * Decimal classification: 621.38. Original manuscript received by the Institute, July 23, 1940. Presented, 1948 IRE National Conven- 2. The transmitter. This operates on the message in tion, New York, N. Y., March 24, 1948; and IRE New York Section, some way and produces a suitable for transmis- New York, N. Y., November 12, 1947. signal t Bell Telephone Laboratories, Murray Hill, N. J. sion to the receiving point over the channel. In teleph-

69 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective Clavier & Laplume
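To make the quoted result concrete, here is a small Python sketch added for illustration only; the 3 kHz bandwidth and 20 dB SNR are assumed example values, not numbers taken from the slide or from Shannon's paper. It evaluates Shannon's rate C = W log2(1 + P/N) and the crude counting estimate (16)-(18) with K = 1, which already gives the same value.

import math

def shannon_capacity(W, P, N):
    """Capacity in bit/s of a band-limited Gaussian channel: C = W log2(1 + P/N)."""
    return W * math.log2(1.0 + P / N)

def crude_rate(W, T, P, N, K=1.0):
    """Shannon's crude estimate (16)-(18): M = (K*sqrt((P+N)/N))**(2*T*W)
    distinguishable signals of duration T, i.e. a rate of
    (log2 M)/T = W*log2(K**2*(P+N)/N) bits per second."""
    log2_M = 2.0 * T * W * math.log2(K * math.sqrt((P + N) / N))
    return log2_M / T

# Assumed illustrative numbers: a 3 kHz channel at 20 dB signal-to-noise ratio.
W = 3000.0
P_over_N = 10.0 ** (20.0 / 10.0)          # 20 dB -> P/N = 100
C = shannon_capacity(W, P_over_N, 1.0)
R = crude_rate(W, T=1.0, P=P_over_N, N=1.0, K=1.0)
print(f"C = {C:.0f} bit/s, crude estimate with K = 1: {R:.0f} bit/s")
# Both print about 19975 bit/s: for K = 1 the counting argument already yields W log2(1 + P/N).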

What about the French?

Two French engineers published the same “Shannon formula” in 1948: Clavier & Laplume

70 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective André G. Clavier

71 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective André G. Clavier

71 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective Jacques Laplume

Meanwhile (1948), far away...

72 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective More on Jacques Laplume...

History of Science / Evolution of Disciplines and History of Discoveries, October 2016

Laplume, sous le masque (“Laplume, behind the mask”)

by Patrick Flandrin (CNRS research director at the École normale supérieure de Lyon, member of the Académie des sciences) and Olivier Rioul (professor at Télécom-ParisTech and adjunct professor at the École Polytechnique)

This note aims to rescue from oblivion an original 1948 piece of work by the French engineer Jacques Laplume on the computation of the capacity of a noisy channel of given bandwidth. The publication of his Note in the Comptes Rendus de l’Académie des sciences slightly preceded that of the article by the American mathematician Claude E. Shannon, the founder of information theory, as well as those of several researchers in the U.S.A. who proposed similar capacity formulas in the same year, 1948. What makes Jacques Laplume singular is that he was working independently and in isolation in France, and that his approach is (unlike Shannon’s) more physical than mathematical, although it explicitly exploits (like Shannon’s) the random character of the noise. The Note apparently fell quickly into oblivion and was only unearthed in 1998, on the occasion of the fiftieth anniversary of information theory. Even at the beginning of this year 2016, which marks the hundredth anniversary of Shannon’s birth, we knew very little about the life and work of Jacques Laplume. It took genuine detective work to trace his family and to discover that this Note was only one among the extremely diverse works of a strong, multi-talented personality, part of a true family saga.

1948, annus mirabilis

In 2016 we celebrate the centenary of the birth of Claude E. Shannon, the American mathematician regarded as the “father of the Information Age.” It is an opportunity to look back at his information theory and at the emblematic formula for the capacity of a noisy channel that crowns his work: C = W log(1 + SNR), where C is the capacity (the maximum rate of reliable transmission) expressed in bits per second, W is the bandwidth of the channel, and SNR (signal-to-noise ratio) is the ratio of signal to noise present during the transmission.

Information theory was born in 1948, in a single article by Shannon published in two parts, in July and October: “A Mathematical Theory of Communication” (Shannon, 1948). This article gathers so many fundamental advances and strokes of genius that one can say without exaggeration that Shannon is the mathematician hero whose theorems made possible the digital world we know today. His capacity formula is rigorously proved there, in an additive Gaussian noise model, on the basis of probability theory, [...]

73 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective

Whose formula?

The “Shannon-Hartley” formula

C = (1/2) log2(1 + P/N)

would actually be the Shannon-Laplume-Tuller-Wiener-Clavier-Earp-Goldman-Sullivan formula

74 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective Outline

Who is Claude Shannon?
Shannon’s Seminal Paper
Shannon’s Main Contributions
Shannon’s Capacity Formula
Hartley’s rule C0 = log2(1 + A/∆) is not Hartley’s
Many authors independently derived C = (1/2) log2(1 + P/N) in 1948.
Hartley’s rule is exact: C0 = C (a coincidence?)
C0 is the capacity of the “uniform” channel
Shannon’s Conclusion

75 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective

“Hartley”’s argument

The channel input X takes M = 1 + A/∆ equiprobable values in the set {−A, −A + 2∆, ..., A − 2∆, A}:

P = E(X²) = (∆²/M) ∑_{k=0}^{M−1} (M − 1 − 2k)² = ∆² (M² − 1)/3.

The input is mixed with additive noise Z of accuracy ±∆, i.e., uniformly distributed in [−∆, ∆]:

N = E(Z²) = (1/(2∆)) ∫_{−∆}^{∆} z² dz = ∆²/3.

Hence

log2(1 + A/∆) = (1/2) log2(1 + (M² − 1)) = (1/2) log2(1 + 3P/∆²) = (1/2) log2(1 + P/N),

i.e., C0 = C. A mathematical coincidence?
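The computation above is easy to check numerically. The following sketch is my own illustration (M and ∆ are arbitrary assumed values): it builds the equally spaced constellation, evaluates P and N, and confirms that log2(1 + A/∆) and (1/2) log2(1 + P/N) agree to machine precision.

import math

M, Delta = 8, 0.5                                       # assumed illustrative values
A = (M - 1) * Delta                                     # peak amplitude of the constellation
levels = [-(M - 1 - 2 * k) * Delta for k in range(M)]   # -A, -A+2Δ, ..., A-2Δ, A

P = sum(x * x for x in levels) / M                      # signal power; equals Δ²(M²−1)/3
N = Delta ** 2 / 3                                      # E[Z²] for Z uniform on [−Δ, Δ]

C0 = math.log2(1 + A / Delta)                           # "Hartley's rule"
C = 0.5 * math.log2(1 + P / N)                          # Shannon's formula, per sample

print(P, Delta ** 2 * (M ** 2 - 1) / 3)                 # 5.25 5.25
print(C0, C)                                            # both equal log2(M) = 3.0
assert abs(C0 - C) < 1e-12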

76 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective Outline

Who is Claude Shannon?
Shannon’s Seminal Paper
Shannon’s Main Contributions
Shannon’s Capacity Formula
Hartley’s rule C0 = log2(1 + A/∆) is not Hartley’s
Many authors independently derived C = (1/2) log2(1 + P/N) in 1948.
Hartley’s rule is exact: C0 = C (a coincidence?)
C0 is the capacity of the “uniform” channel
Shannon’s Conclusion

77 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective The uniform channel

The capacity of Y = X + Z with additive uniform noise Z is

max_{X : |X| ≤ A} I(X; Y) = max h(Y) − h(Y|X)
                          = max h(Y) − h(Z)
                          = max_{X : |Y| ≤ A+∆} h(Y) − log2(2∆).

Choose X* to be discrete uniform in {−A, −A + 2∆, ..., A}; then Y = X* + Z has uniform density over [−A − ∆, A + ∆], which maximizes the differential entropy:

                          = log2(2(A + ∆)) − log2(2∆)
                          = log2(1 + A/∆).

78 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective
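One intuitive reading of the result above (my own illustrative sketch, with arbitrary assumed values of M and ∆, not material from the deck): when the input levels are 2∆ apart and the noise never exceeds ∆ in magnitude, a nearest-level decoder recovers X without error, so log2 M = log2(1 + A/∆) bits get through per channel use, which is exactly the capacity value just derived for the uniform channel.

import math
import random

random.seed(0)
M, Delta = 8, 1.0                                   # assumed illustrative values
A = (M - 1) * Delta
levels = [-A + 2 * Delta * k for k in range(M)]     # levels spaced 2Δ apart

errors = 0
trials = 100_000
for _ in range(trials):
    x = random.choice(levels)                       # equiprobable discrete input
    z = random.uniform(-Delta, Delta)               # amplitude-limited noise, |Z| ≤ Δ
    y = x + z
    x_hat = min(levels, key=lambda s: abs(s - y))   # nearest-level decoder
    errors += (x_hat != x)

print("symbol errors:", errors)                     # expected: 0 (noise never leaves the cell)
print("bits per use :", math.log2(M), "= log2(1 + A/Δ) =", math.log2(1 + A / Delta))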

What is the worst noise?

Thus C0 = log2(1 + A/∆) is correct as a capacity! But: the noise is not Gaussian, but uniform; the signal limitation is not on the power, but on the amplitude.

Further analogy: Shannon used the entropy power inequality to show that under limited power, Gaussian is the worst possible noise in the channel:

(1/2) log2(1 + αP/N) ≤ C ≤ (1/2) log2(1 + P/N) + (1/2) log2 α,   where α = N/Ñ ≥ 1.

We can show: under limited amplitude, uniform noise is the worst possible noise one can inflict in the channel:

log2(1 + A/∆) ≤ C0 ≤ log2(1 + A/∆) + log2 α,   where α = ∆/∆̃ ≥ 1.

79 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective
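The amplitude-limited bounds can be illustrated numerically. In the sketch below, which is my own addition, I take a triangular noise of amplitude ∆ as an example of a non-uniform amplitude-limited noise and define the “entropy amplitude” ∆̃ through h(Z) = log2(2∆̃), by analogy with the entropy power; this definition, the triangular example and the numbers are assumptions on my part, not taken from the deck. The script evaluates α = ∆/∆̃ and prints the resulting bracket around C0.

import numpy as np

Delta, A = 1.0, 7.0                       # assumed noise amplitude and signal amplitude

# Triangular noise on [-Δ, Δ]: a simple non-uniform, amplitude-limited example.
z = np.linspace(-Delta, Delta, 200_001)
pdf = (1.0 - np.abs(z) / Delta) / Delta   # integrates to 1 over [-Δ, Δ]

# Differential entropy h(Z) in bits by numerical integration (0·log 0 treated as 0).
mask = pdf > 0
integrand = np.zeros_like(pdf)
integrand[mask] = -pdf[mask] * np.log2(pdf[mask])
h_bits = integrand.sum() * (z[1] - z[0])

Delta_tilde = 2.0 ** (h_bits - 1.0)       # "entropy amplitude": h(Z) = log2(2 Δ~)
alpha = Delta / Delta_tilde               # ≥ 1, with equality only for uniform noise

lower = np.log2(1 + A / Delta)
upper = lower + np.log2(alpha)
print(f"alpha = {alpha:.3f}:  {lower:.3f} <= C0 <= {upper:.3f} bits per channel use")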

uniform channel Gaussian channel −−−−−−−−−−−−−→

Conclusion

Why is Shannon’s formula ubiquitous?

80 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective the uniform (Tuller) and Gaussian (Shannon) channels are not the only examples. using B-splines, we can construct a sequence of such additive noise channels s.t.

uniform channel Gaussian channel −−−−−−−−−−−−−→

Conclusion

Why is Shannon’s formula ubiquitous? we can explain the coincidence by deriving necessary and 1 P sufficient conditions s.t. C = 2 log2 1 + N .  

80 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective using B-splines, we can construct a sequence of such additive noise channels s.t.

uniform channel Gaussian channel −−−−−−−−−−−−−→

Conclusion

Why is Shannon’s formula ubiquitous? we can explain the coincidence by deriving necessary and 1 P sufficient conditions s.t. C = 2 log2 1 + N . the uniform (Tuller) and Gaussian (Shannon)  channels are not the only examples.

80 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective Conclusion

Why is Shannon’s formula ubiquitous? we can explain the coincidence by deriving necessary and 1 P sufficient conditions s.t. C = 2 log2 1 + N . the uniform (Tuller) and Gaussian (Shannon)  channels are not the only examples. using B-splines, we can construct a sequence of such additive noise channels s.t.

uniform channel Gaussian channel −−−−−−−−−−−−−→

80 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective Conclusion

Why is Shannon’s formula ubiquitous? we can explain the coincidence by deriving necessary and 1 P sufficient conditions s.t. C = 2 log2 1 + N . the uniform (Tuller) and Gaussian (Shannon)  channels are not the only examples. using B-splines, we can construct a sequence of such additive noise channels s.t. uniform channel Gaussian channel −−−−−−−−−−−−−→

Rioul & Magossi, “On Shannon’s formula and Hartley’s rule: Beyond the mathematical coincidence,” Entropy, vol. 16, no. 9, pp. 4892-4910, Sept. 2014. http://www.mdpi.com/1099-4300/16/9/4892/

80 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective B-splines channels

[Scanned excerpt from Rioul & Magossi, Entropy 2014, 16, p. 4906:] Proof. The density of Z_d is the (d+1)-fold convolution power of a rectangle function, so its characteristic function is a (d+1)-th power of a cardinal sine. For an integer M > 0, the ratio Φ_{Z_d}(Mω)/Φ_{Z_d}(ω) = (sin(Mω)/(M sin ω))^{d+1} = ((1/M)(e^{i(M−1)ω} + e^{i(M−3)ω} + ... + e^{−i(M−1)ω}))^{d+1} is the characteristic function of the random variable X_d = X_{M,0} + ... + X_{M,d}, where the X_{M,i} are i.i.d. and take M equiprobable values in the set {−(M−1), −(M−3), ..., (M−3), (M−1)}. Hence, Theorem 7 applies with α = M and cost function (7). Again, for d = 0 one recovers the case of the uniform channel with input X_0 = X_{M,0}; in general, the probability distribution of X_d is the (d+1)-th discrete convolution power of the uniform distribution. For d = 1, the pdf of the noise has a triangular shape and the distribution of X_d is also triangular (Figure 1b). As d increases, it becomes closer to a Gaussian shape (Figure 1c,d).

Figure 1. Discrete plots of input probability distributions (of X_d) that attain capacity for M = 15 and different values of d. (a) d = 0 (rectangular); (b) d = 1 (triangular); (c) d = 2; (d) d = 3.

6.3. Convergence as d → +∞. To determine the limit behavior as d → +∞, we need to apply some normalization on the probability distributions. Since the pdf of Z_d is obtained by successive convolutions of rectangles of length 2, its [...]
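The construction shown in Figure 1 is easy to reproduce. The sketch below is my own illustration, not code from the paper: it forms the distribution of X_d as the (d+1)-fold discrete self-convolution of the M-ary uniform distribution, which is rectangular for d = 0, triangular for d = 1, and increasingly Gaussian-looking as d grows.

import numpy as np

M = 15                                   # number of equiprobable levels, as in Figure 1
uniform = np.full(M, 1.0 / M)            # law of one X_{M,i} on {-(M-1), -(M-3), ..., M-1}

def input_distribution(d):
    """Law of X_d = X_{M,0} + ... + X_{M,d}: the (d+1)-fold discrete convolution."""
    p = uniform.copy()
    for _ in range(d):
        p = np.convolve(p, uniform)
    return p                             # (d+1)(M-1) + 1 mass points, spacing 2

for d in (0, 1, 2, 3):
    p = input_distribution(d)
    print(f"d = {d}: {p.size:2d} mass points, peak/edge probability ratio = {p.max() / p.min():.0f}")
# Ratios 1, 15, 169, ... : the law is flat at d = 0 (rectangular), triangular at d = 1,
# and becomes more and more peaked, Gaussian-shaped, as d increases.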

81 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective Outline

Who is Claude Shannon?
Shannon’s Seminal Paper
Shannon’s Main Contributions
Shannon’s Capacity Formula
Hartley’s rule C0 = log2(1 + A/∆) is not Hartley’s
Many authors independently derived C = (1/2) log2(1 + P/N) in 1948.
Hartley’s rule is exact: C0 = C (a coincidence?)
C0 is the capacity of the “uniform” channel
Shannon’s Conclusion

82 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective Shannon on Information Theory

“I didn’t think at the first stages that it was going to have a great deal of impact. I enjoyed working on this kind of a problem, as I have enjoyed working on many other problems, without any notion of either financial or gain in the sense of being famous; and I think indeed that most scientists are oriented that way, that they are working because they like the game.”

83 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective Thank you!

84 / 85 26/10/2016 Olivier Rioul Shannon’s Formula W log(1 + SNR): A Historical Perspective Shannon’s Formula W log(1 + SNR): A Historical Perspective, on the occasion of Shannon’s Centenary

Oct. 26th, 2016

Olivier Rioul