ECE 515

Channel Capacity and Coding

1 Information Theory Problems

• How to transmit or store information as efficiently as possible.
• What is the maximum amount of information that can be transmitted or stored reliably?
• How can information be kept secure?

2 Digital Communications System

Figure: block diagram of a digital communications system. A data source produces the message W, which is encoded into the channel input X with distribution PX; the channel PY|X produces the output Y with distribution PY, which is decoded into the estimate Ŵ delivered to the data sink.

Ŵ is an estimate of W

3 Communication Channel

• Cover and Thomas Chapter 7

4 Discrete Memoryless Channel

Figure: channel inputs x_1, x_2, ..., x_K mapped to outputs y_1, y_2, ..., y_J, with J ≥ K ≥ 2.

5 Discrete Memoryless Channel

Channel Transition Matrix

py(|)11 x py (|)1 xkK py (|)1 x  

P=  py(|)j x py (|)jk x py (|)jK x 1   py(|)J x1  py (|)Jk x py (|)JK x

6 Binary Symmetric Channel

Figure: inputs x1, x2 mapped to outputs y1, y2, with crossover probability p.

channel matrix

P = \begin{bmatrix} \bar{p} & p \\ p & \bar{p} \end{bmatrix}, \qquad \bar{p} = 1 - p

7 Binary Errors with Erasure Channel

Figure: inputs x1 = 0 and x2 = 1; outputs y1 = 0, y2 = e (erasure), y3 = 1. Each input is received correctly with probability 1 - p - q, erased with probability q, and flipped with probability p.

8 BPSK Modulation K=2

x_1(t) = +\sqrt{\frac{2E_b}{T_b}} \cos(2\pi f_c t) \quad \text{(0 bit)}

x_2(t) = -\sqrt{\frac{2E_b}{T_b}} \cos(2\pi f_c t) \quad \text{(1 bit)}

E_b is the energy per bit, T_b is the bit duration, and f_c is the carrier frequency.

9 BPSK Demodulation in AWGN

Figure: the two signal points x_2 = -\sqrt{E_b} and x_1 = +\sqrt{E_b} on the real line, with the conditional densities p(y|x_2) and p(y|x_1) centered on them and overlapping around 0.

p_{Y|X}(y \,|\, x = x_k) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y - x_k)^2}{2\sigma^2}\right)

10 BPSK Demodulation K=2

Figure: quantization of the demodulator output between -\sqrt{E_b} and +\sqrt{E_b} into J decision regions: J = 3 (y3, y2, y1), J = 4 (y4, ..., y1), and J = 8 (y8, ..., y1).
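As a small illustration of the decision rule these densities imply, here is a sketch (the function name, E_b, and σ are my own choices) that evaluates the two likelihoods and makes a maximum-likelihood hard decision:

```python
import numpy as np

def gaussian_likelihood(y, xk, sigma):
    """p_{Y|X}(y | x_k) for an AWGN channel with noise standard deviation sigma."""
    return np.exp(-(y - xk)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

Eb, sigma = 1.0, 0.5
x1, x2 = +np.sqrt(Eb), -np.sqrt(Eb)      # signal points for the 0 bit and 1 bit

y = 0.2                                   # received sample
decision = 0 if gaussian_likelihood(y, x1, sigma) >= gaussian_likelihood(y, x2, sigma) else 1
print(decision)                           # 0: y is closer to +sqrt(Eb)
```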

11 for a BSC

crossover probability p, \bar{p} = 1 - p

channel matrix

P_{Y|X} = \begin{bmatrix} \bar{p} & p \\ p & \bar{p} \end{bmatrix}

input probabilities

p(x = 0) = w, \qquad p(x = 1) = 1 - w = \bar{w}

13 Convex Functions

Figure: example plots of a concave function and a convex function.

14 Concave Function

15 Convex Function
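For reference, a brief restatement of the standard definitions these two slides illustrate (notation is mine): a function f is concave on an interval if, for all x_1, x_2 in the interval and all 0 ≤ λ ≤ 1,

f(\lambda x_1 + (1-\lambda) x_2) \;\ge\; \lambda f(x_1) + (1-\lambda) f(x_2),

and convex if the inequality is reversed. As later slides note, I(X;Y) is a concave function of the input distribution p(x) for a fixed channel, which is what makes the capacity maximization well behaved.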

16 Mutual Information

18 input probabilities

21 channel transition probabilities

22 BSC I(X;Y)

Figure: I(X;Y) plotted versus the input probability w for a fixed crossover probability p.
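A minimal sketch of the quantity being plotted (helper names are mine): for the BSC with p(x = 0) = w, I(X;Y) = H(Y) - H(Y|X) = h(w\bar{p} + \bar{w}p) - h(p).

```python
import numpy as np

def h(q):
    """Binary entropy in bits."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def bsc_mutual_information(w, p):
    """I(X;Y) for a BSC with crossover p and input distribution (w, 1-w)."""
    return h(w * (1 - p) + (1 - w) * p) - h(p)

p = 0.1
ws = np.linspace(0, 1, 11)
print([round(float(bsc_mutual_information(w, p)), 3) for w in ws])
# The maximum occurs at w = 1/2, where I(X;Y) = 1 - h(p) = C.
```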

23 Channel Capacity

Figure: X → channel → Y.
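The definition the following slides rely on, restated for reference: the capacity is the maximum of the mutual information over the channel input distribution,

C = \max_{p(x)} I(X;Y)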

24 Binary Symmetric Channel

Figure: BSC with inputs x1, x2 and outputs y1, y2.

C = 1 - h(p), \quad \text{achieved for } w = \bar{w} = 1/2

25 Properties of the Channel Capacity

• C ≥ 0 since I(X;Y) ≥ 0
• C ≤ log|X| = log(K) since C = max I(X;Y) ≤ max H(X) = log(K)
• C ≤ log|Y| = log(J) for the same reason
• I(X;Y) is a concave function of p(X), so a local maximum is a global maximum

26 Symmetric Channels

A discrete memoryless channel is said to be symmetric if the set of output symbols

{yj}, j = 1, 2, ..., J, can be partitioned into subsets such that for each subset of the matrix of transition probabilities, each column is a permutation of the other columns, and each row is also a permutation of the other rows.

27 Binary Channels

Symmetric channel matrix Non-symmetric channel matrix

1 − pp 1 − pp12 P =  P= pp12≠ pp1 − pp121 −

28 Binary Errors with Erasure Channel

Figure: inputs x1 = 0 and x2 = 1; outputs y1 = 0, y2 = e (erasure), y3 = 1. Each input is received correctly with probability 1 - p - q, erased with probability q, and flipped with probability p.

29 Symmetric Channels

• No partition required → strongly symmetric
• Partition required → weakly symmetric
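A small sketch of this classification as a check on a candidate partition of the outputs (function names and the index convention are mine; the example matrix is the binary errors-with-erasure channel used later in the deck, with p = 0.05 and q = 0.15):

```python
import numpy as np

def rows_are_permutations(M):
    """True if every row of M contains the same multiset of entries."""
    sorted_rows = np.sort(M, axis=1)
    return np.allclose(sorted_rows, sorted_rows[0])

def is_symmetric_partition(P, groups):
    """P[j, k] = p(y_j | x_k); groups is a candidate partition of the output
    indices.  True if, in every submatrix, each row is a permutation of the
    other rows and each column is a permutation of the other columns."""
    for g in groups:
        sub = P[list(g), :]
        if not rows_are_permutations(sub) or not rows_are_permutations(sub.T):
            return False
    return True

P = np.array([[0.80, 0.05],
              [0.15, 0.15],
              [0.05, 0.80]])
print(is_symmetric_partition(P, [(0, 2), (1,)]))   # True: outputs {0, 1} and {e}
print(is_symmetric_partition(P, [(0, 1, 2)]))      # False without the partition
```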

30 Capacity of a Strongly Symmetric Channel

Theorem

I(X;Y) = \sum_{k=1}^{K} \sum_{j=1}^{J} p(x_k)\, p(y_j|x_k) \log p(y_j|x_k) + H(Y)

\qquad\;\, = \sum_{j=1}^{J} p(y_j|x_k) \log p(y_j|x_k) + H(Y)

since for a symmetric channel the inner sum is the same for every input x_k. The capacity is achieved by the uniform input distribution, which makes the output uniform, so H(Y) = log J and

C = \log J + \sum_{j=1}^{J} p(y_j|x_k) \log p(y_j|x_k) \quad \text{(any k)}

31 Example J = K = 3

Figure: three inputs x1, x2, x3 and three outputs y1, y2, y3; each input goes to the matching output with probability .7 and to the other two outputs with probabilities .2 and .1, as in the transition matrix on the next slide.

32 Example .7 .2 .1  P = Y|X .1 .7 .2 .2 .1 .7

\sum_{k=1}^{K} \sum_{j=1}^{J} p(x_k)\, p(y_j|x_k) \log p(y_j|x_k)

= p(x_1)\,[\, .7\log .7 + .1\log .1 + .2\log .2 \,]

+ p(x_2)\,[\, .2\log .2 + .7\log .7 + .1\log .1 \,]

+ p(x_3)\,[\, .1\log .1 + .2\log .2 + .7\log .7 \,]

= .7\log .7 + .2\log .2 + .1\log .1

= \sum_{j=1}^{J} p(y_j|x_k) \log p(y_j|x_k)

33 Example

H(Y) = -\sum_{j=1}^{J} p(y_j) \log p(y_j)

p(y_1) = \sum_{k=1}^{K} p(y_1|x_k)\, p(x_k)
p(y_2) = \sum_{k=1}^{K} p(y_2|x_k)\, p(x_k)
\vdots
p(y_J) = \sum_{k=1}^{K} p(y_J|x_k)\, p(x_k)

p(y_1) = .7\,p(x_1) + .2\,p(x_2) + .1\,p(x_3)
p(y_2) = .1\,p(x_1) + .7\,p(x_2) + .2\,p(x_3)
p(y_3) = .2\,p(x_1) + .1\,p(x_2) + .7\,p(x_3)

34 r-ary Symmetric Channel

P = \begin{bmatrix}
1-p & \frac{p}{r-1} & \cdots & \frac{p}{r-1} \\
\frac{p}{r-1} & 1-p & \cdots & \frac{p}{r-1} \\
\vdots & & \ddots & \vdots \\
\frac{p}{r-1} & \frac{p}{r-1} & \cdots & 1-p
\end{bmatrix}

35 r-ary Symmetric Channel

C = (1-p)\log(1-p) + (r-1)\,\frac{p}{r-1}\log\!\left(\frac{p}{r-1}\right) + \log r

\;\;= \log r + (1-p)\log(1-p) + p\log\!\left(\frac{p}{r-1}\right)

\;\;= \log r + (1-p)\log(1-p) + p\log p - p\log(r-1)

\;\;= \log r - h(p) - p\log(r-1)

• r = 2:  C = 1 - h(p)
• r = 3:  C = log_2 3 - h(p) - p
• r = 4:  C = 2 - h(p) - p log_2 3
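A quick numerical check (a sketch; the sample values of r and p are mine) of the closed form against the strongly symmetric expression C = log J + Σ_j p(y_j|x_k) log p(y_j|x_k):

```python
import numpy as np

def h(p):
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def rary_capacity_closed_form(r, p):
    return np.log2(r) - h(p) - p * np.log2(r - 1)

def strongly_symmetric_capacity(P):
    """C = log J + sum_j p(y_j|x_k) log p(y_j|x_k), any fixed column k."""
    J = P.shape[0]
    col = P[:, 0]
    return np.log2(J) + np.sum(col * np.log2(col))

r, p = 4, 0.1
P = np.full((r, r), p / (r - 1))
np.fill_diagonal(P, 1 - p)
print(strongly_symmetric_capacity(P))    # matches ...
print(rary_capacity_closed_form(r, p))   # ... the closed form

# The 3x3 example from the preceding slides:
P3 = np.array([[.7, .2, .1], [.1, .7, .2], [.2, .1, .7]])
print(strongly_symmetric_capacity(P3))
```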

36 Binary Errors with Erasure Channel

Figure: inputs 0 and 1; outputs 0, e, 1; correct reception with probability 1 - p - q, erasure with probability q, error with probability p.

37 Binary Errors with Erasure Channel

.8 .05 =  PY|X  .15 .15 .05 .8

.425 .5 = = ×= PX  PY P Y|X P X  .15 .5 .425

38 Capacity of a Weakly Symmetric Channel

C = \sum_{i=1}^{L} q_i C_i

• qi – probability of channel i

• Ci – capacity of channel i

39 Binary Errors with Erasure Channel

Figure: conditioned on no erasure (probability 1 - q), the channel is a BSC with transition probabilities (1-p-q)/(1-q) and p/(1-q); conditioned on an erasure (probability q), both inputs map to e.

P_1 = \begin{bmatrix} .9412 & .0588 \\ .0588 & .9412 \end{bmatrix}
\qquad
P_2 = \begin{bmatrix} 1.0 & 1.0 \end{bmatrix}

40 Binary Errors with Erasure Channel

C = \sum_{i=1}^{L} q_i C_i

C_1 = .9412\log .9412 + .0588\log .0588 + \log 2 = .6773

C_2 = 1\log 1 + \log 1 = 0.0

C = .85(.6773) + .15(0.0) = .5757
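A numerical cross-check, as a sketch with my own helper names: the capacity obtained from the subchannel decomposition should match I(X;Y) of the full channel evaluated at the uniform input used on the previous slide.

```python
import numpy as np

def mutual_information(P, px):
    """I(X;Y) in bits for channel P[j,k] = p(y_j|x_k) and input distribution px
    (all entries strictly positive in this example)."""
    py = P @ px
    return float(np.sum(P * px * np.log2(P / py[:, None])))

p, q = 0.05, 0.15
P = np.array([[1 - p - q, p],
              [q,         q],
              [p,         1 - p - q]])

# Decomposition: a BSC used with probability 1-q, an erasure with probability q.
eps = p / (1 - q)
C1 = 1 + eps * np.log2(eps) + (1 - eps) * np.log2(1 - eps)
C = (1 - q) * C1 + q * 0.0
print(C)                                              # 0.5757...
print(mutual_information(P, np.array([0.5, 0.5])))    # same value
```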

41 Binary Erasure Channel

Figure: binary erasure channel; each input is received correctly with probability 1 - p and erased (output e) with probability p.

C = 1 - p

42 Z Channel (Optical)

Figure: input 0 (light on) is received as 0 with probability 1; input 1 (light off) is received as 0 with probability p and as 1 with probability 1 - p.

43 Z Channel (Optical)

I(X;Y) = \sum_{k=1}^{2} \sum_{j=1}^{2} p(x_k)\, p(y_j|x_k) \log\frac{p(y_j|x_k)}{p(y_j)}

I(x_1;Y) = \log\frac{1}{w + p\bar{w}}

I(x_2;Y) = p\log\frac{p}{w + p\bar{w}} + (1-p)\log\frac{1}{\bar{w}}

I(X;Y) = w\, I(x_1;Y) + \bar{w}\, I(x_2;Y)

44 Z Channel (Optical)

I(X;Y) = w\, I(x_1;Y) + \bar{w}\, I(x_2;Y)

w^* = 1 - \frac{1}{(1-p)\left(1 + 2^{h(p)/(1-p)}\right)}

C = \log_2\!\left(1 + (1-p)\, p^{\,p/(1-p)}\right)
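A sketch (helper names and the grid resolution are mine) that checks this closed form against a direct numerical maximization of I(X;Y) over w:

```python
import numpy as np

def h(p):
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def z_mutual_information(w, p):
    """I(X;Y) for the Z channel: P(y=0|x=0) = 1, P(y=0|x=1) = p."""
    wb = 1 - w
    py0 = w + p * wb
    Ix1 = np.log2(1 / py0)
    Ix2 = p * np.log2(p / py0) + (1 - p) * np.log2(1 / wb)
    return w * Ix1 + wb * Ix2

p = 0.15
ws = np.linspace(1e-6, 1 - 1e-6, 100001)
C_numeric = max(z_mutual_information(w, p) for w in ws)
C_closed = np.log2(1 + (1 - p) * p ** (p / (1 - p)))
w_star = 1 - 1 / ((1 - p) * (1 + 2 ** (h(p) / (1 - p))))
print(C_numeric, C_closed)                       # the two values agree
print(w_star, z_mutual_information(w_star, p))   # I(X;Y) at w* equals C
```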

45 Mutual Information for the Z Channel

• p = 0.15

Figure: I(X;Y) plotted versus the input probability w for the Z channel with p = 0.15.

46 Channel Capacity for the Z, BSC and BEC

Figure: capacity plotted versus the channel parameter p for the Z channel, the BSC, and the BEC.

47 Blahut-Arimoto Algorithm

I(X;Y) = \sum_{k=1}^{K} \sum_{j=1}^{J} p(x_k)\, p(y_j|x_k) \log \frac{p(y_j|x_k)}{\sum_{l=1}^{K} p(x_l)\, p(y_j|x_l)}

• An analytic solution for the capacity can be very difficult to obtain
• The alternative is a numerical solution
  – Arimoto Jan. 1972
  – Blahut Jul. 1972
• Exploits the fact that I(X;Y) is a concave function of p(x_k)

48 Blahut-Arimoto Algorithm

49 Blahut-Arimoto Algorithm

• Update the probabilities
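A sketch of the standard Blahut-Arimoto iteration (variable names and the stopping tolerance are my own choices): alternate between the backward distribution q(x|y) induced by the current input distribution and a re-weighted input distribution, until the input distribution stops changing.

```python
import numpy as np

def blahut_arimoto(P, tol=1e-9, max_iter=10000):
    """Capacity (bits per channel use) of a DMC with P[j, k] = p(y_j | x_k)."""
    J, K = P.shape
    px = np.full(K, 1.0 / K)                     # start from the uniform input
    for _ in range(max_iter):
        py = P @ px                              # current output distribution
        q = (P * px) / py[:, None]               # q[j, k] = p(x_k | y_j)
        # exponent of the re-weighting factor for each input
        log_r = np.sum(P * np.log(q, where=P > 0, out=np.zeros_like(q)), axis=0)
        new_px = np.exp(log_r)
        new_px /= new_px.sum()
        if np.max(np.abs(new_px - px)) < tol:
            px = new_px
            break
        px = new_px
    py = P @ px
    I = np.sum(P * px * np.log2(P / py[:, None], where=P > 0, out=np.zeros_like(P)))
    return float(I), px

# Binary errors with erasure channel from the earlier example (C = .5757)
P = np.array([[0.80, 0.05],
              [0.15, 0.15],
              [0.05, 0.80]])
C, px_opt = blahut_arimoto(P)
print(round(C, 4), px_opt)    # ~0.5757, achieved by the uniform input
```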

51 Symmetric Channel Example

53 Non-Symmetric Channel Example


58 Binary Symmetric Channel

Figure: BSC with inputs x1, x2 and outputs y1, y2.

channel matrix

P = \begin{bmatrix} \bar{p} & p \\ p & \bar{p} \end{bmatrix}, \qquad \bar{p} = 1 - p

59 Binary Symmetric Channel

• Consider a block of N = 1000 bits
  – if p = 0, all 1000 bits are received correctly
  – if p = 0.01, on average 990 bits are received correctly
  – if p = 0.5, on average only 500 bits are received correctly
• When p > 0, we do not know which bits are in error
  – if p = 0.01, C = .919 bit
  – if p = 0.5, C = 0 bit

60 Triple Repetition

• N = 3

message w    codeword c
0            000
1            111

61 Binary Symmetric Channel Errors

• If N bits are transmitted, the probability of an m-bit error pattern is

p^m (1-p)^{N-m}

• The probability of exactly m errors is

\binom{N}{m} p^m (1-p)^{N-m}

• The probability of m or more errors is

\sum_{i=m}^{N} \binom{N}{i} p^i (1-p)^{N-i}

62 Triple Repetition Code

• N = 3
• The probability of 0 errors is (1-p)^3
• The probability of 1 error is 3p(1-p)^2
• The probability of 2 errors is 3p^2(1-p)
• The probability of 3 errors is p^3

63 Triple Repetition Code

• For p = 0.01
  – The probability of 0 errors is .970
  – The probability of 1 error is 2.94×10^-2
  – The probability of 2 errors is 2.97×10^-4
  – The probability of 3 errors is 10^-6

• If p ≪ 1/2, then p(0 errors) ≫ p(1 error) ≫ p(2 errors) ≫ p(3 errors)

64 Triple Repetition Code – Decoding

Received Word    Codeword    Error Pattern
000              000         000
001              000         001
010              000         010
100              000         100
111              111         000
110              111         001
101              111         010
011              111         100

65 Triple Repetition Code

• Majority vote or nearest neighbor decoding will correct all single errors
  000, 001, 010, 100 → 000
  111, 110, 101, 011 → 111
• The probability of a decoding error is then

P_e = 3p^2(1-p) + p^3 = 3p^2 - 2p^3 < p

• If p = 0.01, then P_e = 0.000298 and only one word in 3356 will be in error after decoding.
• This is a reduction in error probability by a factor of about 33 compared with uncoded transmission.
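The arithmetic behind these numbers, as a short sketch:

```python
p = 0.01

# Decoding fails when 2 or 3 of the 3 bits are flipped.
Pe = 3 * p**2 * (1 - p) + p**3
print(Pe)         # 0.000298
print(1 / Pe)     # ~3356 decoded words per word error
print(p / Pe)     # ~33.6: improvement over sending a single uncoded bit
```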

66 Code Rate

• After compression, the data is (almost) memoryless and uniformly distributed (equiprobable)
• Thus the entropy of the messages (codewords) is

H(W) = \log_b M

• The block length of a codeword is N

67 Code Rate

• The code rate is given by

R = \frac{\log_b M}{N} \quad \text{bits per channel use}

• M is the number of codewords
• N is the block length
• For the triple repetition code

R = \frac{\log_2 2}{3} = \frac{1}{3}

68 Repetition

Figure: plot versus the code rate R.

69

Figure: channel input block X^N and output block Y^N.

70 Shannon’s Noisy Coding Theorem

For any ε > 0 and for any rate R less than the channel capacity C, there is an encoding and decoding scheme that can be used to ensure that the probability of decoding error is less than ε for a sufficiently large block length N.

71

Figure: plot versus the code rate R.

72 Error Correction Coding N = 3

• R = 1/3, M = 2
  0 → 000    1 → 111
• R = 1, M = 8
  000 → 000    001 → 001    010 → 010    011 → 011
  111 → 111    110 → 110    101 → 101    100 → 100
• Another choice: R = 2/3, M = 4
  00 → 000    01 → 011    10 → 101    11 → 110

73 Error Correction Coding N = 3

• BSC p = 0.01 • M is the number of codewords

Code Rate R    Pe           M = 2^{NR}
1              0.0297       8
2/3            0.0199       4
1/3            2.98×10^-4   2

• Tradeoff between code rate and error rate

74 for N=3

Figure: codewords shown as vertices of the cube {0,1}^3. For R = 1 all eight vertices are codewords; for R = 2/3 the codewords are {000, 011, 101, 110}; for R = 1/3 they are {000, 111}.

75 Error Correction Coding N = 5

• BSC p = 0.01

Code Rate R    Pe           M = 2^{NR}
1              0.0490       32
4/5            0.0394       16
3/5            0.0297       8
2/5            9.80×10^-4   4
1/5            9.85×10^-6   2

• Tradeoff between code rate and error rate

76 Error Correction Coding N = 7

• BSC p = 0.01 N = 7

Code Rate R    Pe           M = 2^{NR}
1              0.0679       128
6/7            0.0585       64
5/7            0.0490       32
4/7            2.03×10^-3   16
3/7            1.46×10^-3   8
2/7            9.80×10^-4   4
1/7            3.40×10^-7   2

• Tradeoff between code rate and error rate

77 Communication over Noisy Channels


78 Binary Symmetric Channel Errors

• If N bits are transmitted, the probability of an m-bit error pattern is

p^m (1-p)^{N-m}

• The probability of exactly m errors is

\binom{N}{m} p^m (1-p)^{N-m}

• The probability of m or more errors is

\sum_{i=m}^{N} \binom{N}{i} p^i (1-p)^{N-i}

79 BSC Error Correction

• For a code with codewords of length N bits that can correct up to t errors, the probability of error after decoding is

P_e = \sum_{i=t+1}^{N} \binom{N}{i} p^i (1-p)^{N-i}

• This is the probability of t+1 or more errors
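A small helper implementing this sum (names are mine); with t = (N-1)/2 it reproduces the repetition-code entries of the earlier tables, e.g. 2.98×10^-4 for N = 3 and 9.85×10^-6 for N = 5:

```python
from math import comb

def prob_decoding_error(N, t, p):
    """Probability of t+1 or more errors in N uses of a BSC with crossover p."""
    return sum(comb(N, i) * p**i * (1 - p)**(N - i) for i in range(t + 1, N + 1))

p = 0.01
for N in (3, 5, 7):
    t = (N - 1) // 2        # errors correctable by an N-bit repetition code
    print(N, prob_decoding_error(N, t, p))
```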

80 Best Codes Comparison

• BSC p = 0.01, R = 2/3, M = 2^{NR}

N     Pe           log2 M
3     1.99×10^-2   2
12    6.17×10^-3   8
30    3.32×10^-3   20
51    1.72×10^-3   34
81    1.36×10^-3   54

• For fixed R, Pe can be decreased by increasing N

81

Figure: channel input block X^N and output block Y^N.

82 Code Matrix

83 Binary Codes

• For given values of M and N, there are 2^{MN} possible binary codes.
• Of these, some will be bad, some will be best (optimal), and some will be good, in terms of Pe.
• An average code will be good.

85 Channel Capacity

• To prove that information can be transmitted reliably over a noisy channel at rates up to the capacity, Shannon used a number of new concepts
  – Allowing an arbitrarily small but nonzero probability of error
  – Using long codewords
  – Calculating the average probability of error over a random choice of codes to show that at least one good code exists

86 Channel Coding Theorem

• Random coding used in the proof
• Joint typicality used as the decoding rule
• Shows that good codes exist which provide an arbitrarily small probability of error
• Does not provide an explicit way of constructing good codes
• If a long code (large N) is generated randomly, the code is likely to be good but is difficult to decode
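To illustrate the random-coding idea numerically (purely a sketch; the block length, rate, decoder, and random construction below are my own choices, not the construction used in the proof): draw a random codebook, transmit over a BSC, decode to the nearest codeword in Hamming distance, and estimate the word error rate.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_random_code(N=15, R=1/3, p=0.01, trials=2000):
    """Estimate the word error rate of one randomly drawn binary code on a BSC."""
    M = 2 ** int(N * R)                             # number of codewords
    codebook = rng.integers(0, 2, size=(M, N))      # random code
    errors = 0
    for _ in range(trials):
        w = rng.integers(0, M)                      # message index
        noise = rng.random(N) < p                   # BSC error pattern
        y = codebook[w] ^ noise
        # minimum Hamming distance (nearest neighbor) decoding
        w_hat = np.argmin(np.sum(codebook != y, axis=1))
        errors += (w_hat != w)
    return errors / trials

print(simulate_random_code())   # estimated word error rate for one random code
```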

87 The Data Processing Inequality Cascaded Channels

Figure: cascade W → X → Y, with I(W;X) between W and X, I(X;Y) between X and Y, and I(W;Y) end to end.

The mutual information I(W;Y) for the cascade cannot be larger than I(W;X) or I(X;Y), so that

I(W;Y) ≤ I(W;X)
I(W;Y) ≤ I(X;Y)

88 Communication over Noisy Channels


90 Channel Capacity: Weak Converse

For R > C, the decoding error probability is bounded away from 0

Figure: lower bound on Pe plotted versus R; the bound is zero for R ≤ C and grows for R > C.

91 Channel Capacity: Weak Converse

• C = 0.3

Figure: the lower bound 1 - C/R plotted versus R.

92 Fano’s Inequality

Figure: the bound h(Pe) + Pe log_b(r-1) plotted versus Pe; it equals 0 at Pe = 0, reaches its maximum log_b r at Pe = (r-1)/r, and equals log_b(r-1) at Pe = 1.

93 Fano’s Inequality

H(X|Y) ≤ h(Pe)+Pelogb(r-1)

H(X|Y) ≤ 1+NRPe(N,R)
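These bounds combine into the weak converse plotted on the preceding slides; a standard chain, writing W for the equiprobable NR-bit message (so H(W) = NR) and Y^N for the received block, and using the data processing inequality together with Fano's inequality:

NR = H(W) = I(W;Y^N) + H(W|Y^N) \le NC + 1 + NR\,P_e(N,R)

\Rightarrow\quad P_e(N,R) \ge 1 - \frac{C}{R} - \frac{1}{NR} \;\longrightarrow\; 1 - \frac{C}{R} \quad (N \to \infty)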

94 Channel Capacity: Strong Converse

• For rates above capacity (R > C)

P_e(N,R) \ge 1 - 2^{-N E_A(R)}

• where EA(R) is Arimoto’s error exponent

95 Arimoto’s Error Exponent EA(R)

96 EA(R) for a BSC with p=0.1

97

• The capacity is a very clear dividing point
• At rates below capacity, Pe → 0 exponentially as N → ∞
• At rates above capacity, Pe → 1 exponentially as N → ∞

Figure: Pe versus R for large N: Pe → 0 for R < C and Pe → 1 for R > C.
