ECE 515 Information Theory
Channel Capacity and Coding
1 Information Theory Problems
• How to transmit or store information as efficiently as possible.
• What is the maximum amount of information that can be transmitted or stored reliably?
• How can information be kept secure?
2 Digital Communications System
[Block diagram: the data source emits message W, transmitted as channel input X with distribution PX; the channel PY|X produces output Y with distribution PY, from which the data sink forms Ŵ.]

Ŵ is an estimate of W
3 Communication Channel
• Cover and Thomas Chapter 7
4 Discrete Memoryless Channel
[Figure: K inputs x1, x2, ..., xK connected to J outputs y1, y2, ..., yJ, with J ≥ K ≥ 2.]
5 Discrete Memoryless Channel
Channel Transition Matrix
P = \begin{bmatrix} p(y_1|x_1) & \cdots & p(y_1|x_k) & \cdots & p(y_1|x_K) \\ \vdots & & \vdots & & \vdots \\ p(y_j|x_1) & \cdots & p(y_j|x_k) & \cdots & p(y_j|x_K) \\ \vdots & & \vdots & & \vdots \\ p(y_J|x_1) & \cdots & p(y_J|x_k) & \cdots & p(y_J|x_K) \end{bmatrix}
6 Binary Symmetric Channel
[Figure: x1 → y1 and x2 → y2 with probability 1−p; crossovers x1 → y2 and x2 → y1 with probability p.]

channel matrix
P = \begin{bmatrix} \bar{p} & p \\ p & \bar{p} \end{bmatrix}, \qquad \bar{p} = 1 - p

7 Binary Errors with Erasure Channel
[Figure: inputs x1 = 0 and x2 = 1; outputs y1 = 0, y2 = e, y3 = 1. Each input is received correctly with probability 1−p−q, erased with probability q, and flipped with probability p.]
8 BPSK Modulation K=2
x(t) = +\sqrt{\frac{2E_b}{T_b}} \cos(2\pi f_c t) \quad \text{0 bit}
x(t) = -\sqrt{\frac{2E_b}{T_b}} \cos(2\pi f_c t) \quad \text{1 bit}

E_b is the energy per bit, T_b is the bit duration, and f_c is the carrier frequency
9 BPSK Demodulation in AWGN
[Figure: conditional densities p(y|x_2) and p(y|x_1) centered at -\sqrt{E_b} and +\sqrt{E_b}, with the decision threshold at 0.]

p_{Y|X}(y|x_k) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y - x_k)^2}{2\sigma^2} \right)

10 BPSK Demodulation K=2
[Figures: the demodulator output quantized to J = 3, J = 4, and J = 8 levels y_1, ..., y_J between -\sqrt{E_b} and +\sqrt{E_b}.]
11 Mutual Information for a BSC
crossover probability p, \bar{p} = 1 - p

[Figure: X → BSC → Y]

channel matrix
P_{Y|X} = \begin{bmatrix} \bar{p} & p \\ p & \bar{p} \end{bmatrix}

input probabilities p(x = 0) = w, \quad p(x = 1) = 1 - w = \bar{w}
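The BSC mutual information on the next slides can be evaluated directly from I(X;Y) = H(Y) − H(Y|X); a small Python sketch (the function names are mine, not from the slides):

```python
import math

def h2(x):
    """Binary entropy function h(x) in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def bsc_mutual_info(w, p):
    """I(X;Y) for a BSC with crossover probability p and input P(X=0) = w.

    P(Y=0) = w(1-p) + (1-w)p, and H(Y|X) = h(p) for every input, so
    I(X;Y) = H(Y) - H(Y|X) = h2(w(1-p) + (1-w)p) - h2(p).
    """
    return h2(w * (1 - p) + (1 - w) * p) - h2(p)
```

I(X;Y) is maximized at w = 1/2, where it equals the BSC capacity 1 − h(p) derived later in the slides.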
12 13 Convex Functions
[Figure: a concave function (cap-shaped, ∩) and a convex function (cup-shaped, ∪).]
14 Concave Function
15 Convex Function
16 Mutual Information
17 18 input probabilities
19 20 21 channel transition probabilities
22 BSC I(X;Y)
[Plot: I(X;Y) as a function of the input probability w and the crossover probability p.]
23 Channel Capacity
X → channel → Y

C = \max_{p(x)} I(X;Y)
24 Binary Symmetric Channel
[Figure: BSC with crossover probability p.]

C = 1 - h(p), \quad \text{achieved for } w = \bar{w} = 1/2
25 Properties of the Channel Capacity
• C ≥ 0 since I(X;Y) ≥ 0
• C ≤ log|X| = log(K) since C = max I(X;Y) ≤ max H(X) = log(K)
• C ≤ log|Y| = log(J) for the same reason
• I(X;Y) is a concave function of p(X), so a local maximum is a global maximum
26 Symmetric Channels
A discrete memoryless channel is said to be symmetric if the set of output symbols
{yj}, j = 1, 2, ..., J, can be partitioned into subsets such that, within each corresponding submatrix of the matrix of transition probabilities, every column is a permutation of every other column, and every row is a permutation of every other row.
27 Binary Channels
Symmetric channel matrix:
P = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}

Non-symmetric channel matrix:
P = \begin{bmatrix} 1-p_1 & p_1 \\ p_2 & 1-p_2 \end{bmatrix}, \qquad p_1 \neq p_2
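The permutation conditions can be checked mechanically. A sketch (my code, not from the slides) that groups output rows by their multiset of probabilities and then tests the column condition within each group; this suffices for the examples in these slides, though the general definition allows any partition:

```python
from collections import defaultdict

def is_weakly_symmetric(P):
    """Test the slides' symmetry condition for P[j][k] = p(y_j | x_k), rows = outputs.

    Outputs are grouped so that rows in a group are permutations of one another;
    the channel passes if, within every group, each column of the submatrix is
    a permutation of every other column.
    """
    groups = defaultdict(list)
    for row in P:
        groups[tuple(sorted(row))].append(row)   # same multiset -> same group
    for rows in groups.values():
        cols = list(zip(*rows))
        if len({tuple(sorted(c)) for c in cols}) != 1:
            return False
    return True
```

The BSC and the binary errors-with-erasure channel pass this test; the non-symmetric binary channel with p1 ≠ p2 fails it.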
28 Binary Errors with Erasure Channel
[Figure: inputs x1 = 0 and x2 = 1; outputs y1 = 0, y2 = e, y3 = 1. Each input is received correctly with probability 1−p−q, erased with probability q, and flipped with probability p.]
29 Symmetric Channels
• No partition required → strongly symmetric • Partition required → weakly symmetric
30 Capacity of a Strongly Symmetric Channel
Theorem
I(X;Y) = \sum_{k=1}^{K} \sum_{j=1}^{J} p(x_k)\, p(y_j|x_k) \log p(y_j|x_k) + H(Y)
       = \sum_{j=1}^{J} p(y_j|x_k) \log p(y_j|x_k) + H(Y)

(the inner sum over j is the same for every input x_k, because each column of the transition matrix is a permutation of every other column)
31 Example J = K = 3
[Figure: channel diagram for the 3-input, 3-output transition matrix on the next slide, with transition probabilities .7, .2, and .1.]
32 Example

P_{Y|X} = \begin{bmatrix} .7 & .2 & .1 \\ .1 & .7 & .2 \\ .2 & .1 & .7 \end{bmatrix}

\sum_{k=1}^{K} \sum_{j=1}^{J} p(x_k)\, p(y_j|x_k) \log p(y_j|x_k)
= p(x_1)[\,.7\log .7 + .1\log .1 + .2\log .2\,]
+ p(x_2)[\,.2\log .2 + .7\log .7 + .1\log .1\,]
+ p(x_3)[\,.1\log .1 + .2\log .2 + .7\log .7\,]
= .7\log .7 + .2\log .2 + .1\log .1
= \sum_{j=1}^{J} p(y_j|x_k) \log p(y_j|x_k)
33 Example
H(Y) = -\sum_{j=1}^{J} p(y_j) \log p(y_j)

p(y_j) = \sum_{k=1}^{K} p(y_j|x_k)\, p(x_k), \quad j = 1, \ldots, J

p(y_1) = .7\,p(x_1) + .2\,p(x_2) + .1\,p(x_3)
p(y_2) = .1\,p(x_1) + .7\,p(x_2) + .2\,p(x_3)
p(y_3) = .2\,p(x_1) + .1\,p(x_2) + .7\,p(x_3)

34 r-ary Symmetric Channel
P = \begin{bmatrix} 1-p & \frac{p}{r-1} & \cdots & \frac{p}{r-1} \\ \frac{p}{r-1} & 1-p & \cdots & \frac{p}{r-1} \\ \vdots & & \ddots & \vdots \\ \frac{p}{r-1} & \frac{p}{r-1} & \cdots & 1-p \end{bmatrix}
35 r-ary Symmetric Channel
C = (1-p)\log(1-p) + (r-1)\,\frac{p}{r-1}\log\frac{p}{r-1} + \log r
  = \log r + (1-p)\log(1-p) + p\log\frac{p}{r-1}
  = \log r + (1-p)\log(1-p) + p\log p - p\log(r-1)
  = \log r - h(p) - p\log(r-1)
• r = 2: C = 1 − h(p)
• r = 3: C = log₂3 − h(p) − p
• r = 4: C = 2 − h(p) − p log₂3
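The closed form above is easy to check numerically (a sketch; the function names are mine):

```python
import math

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def r_ary_capacity(r, p):
    """C = log2(r) - h(p) - p*log2(r-1) bits per channel use."""
    return math.log2(r) - h2(p) - p * math.log2(r - 1)
```

For r = 2 the last term vanishes and the BSC capacity 1 − h(p) is recovered.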
36 Binary Errors with Erasure Channel
[Figure: inputs 0 and 1; outputs 0, e, 1. Each input is received correctly with probability 1−p−q, erased with probability q, and flipped with probability p.]
37 Binary Errors with Erasure Channel
P_{Y|X} = \begin{bmatrix} .8 & .05 \\ .15 & .15 \\ .05 & .8 \end{bmatrix}

P_X = \begin{bmatrix} .5 \\ .5 \end{bmatrix} \qquad P_Y = P_{Y|X}\, P_X = \begin{bmatrix} .425 \\ .15 \\ .425 \end{bmatrix}
38 Capacity of a Weakly Symmetric Channel
C = \sum_{i=1}^{L} q_i C_i
• qi – probability of channel i
• Ci – capacity of channel i
39 Binary Errors with Erasure Channel
[Figure: the channel splits into two sub-channels. With probability 1−q there is no erasure, giving a BSC; with probability q the output is the erasure e.]

P_1 = \begin{bmatrix} \frac{1-p-q}{1-q} & \frac{p}{1-q} \\ \frac{p}{1-q} & \frac{1-p-q}{1-q} \end{bmatrix} = \begin{bmatrix} .9412 & .0588 \\ .0588 & .9412 \end{bmatrix} \qquad P_2 = \begin{bmatrix} 1.0 & 1.0 \end{bmatrix}
40 Binary Errors with Erasure Channel
C = \sum_{i=1}^{L} q_i C_i

C_1 = .9412\log .9412 + .0588\log .0588 + \log 2 = .6773
C_2 = 1\log 1 + \log 1 = 0.0
C = .85(.6773) + .15(0.0) = .5757
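These numbers follow from p = 0.05 and q = 0.15, the values implied by the matrix on slide 37; a sketch of the computation:

```python
import math

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

p, q = 0.05, 0.15   # error and erasure probabilities (assumed from slide 37's matrix)

# Sub-channel 1: condition on "no erasure" (probability 1-q); this is a BSC
# with crossover p/(1-q) = .0588, so C1 = 1 - h(.0588).
C1 = 1 - h2(p / (1 - q))

# Sub-channel 2: the erasure output alone carries no information.
C2 = 0.0

# Weakly symmetric combination C = sum_i q_i C_i with q_1 = 1-q, q_2 = q.
C = (1 - q) * C1 + q * C2
```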
41 Binary Erasure Channel
[Figure: binary erasure channel; each input is received correctly with probability 1−p and erased (output e) with probability p.]
C = 1 - p
42 Z Channel (Optical)
[Figure: input 0 (light on) is received as 0 with probability 1; input 1 (light off) is received as 1 with probability 1−p and as 0 with probability p.]
43 Z Channel (Optical)
I(X;Y) = \sum_{k=1}^{2} \sum_{j=1}^{2} p(x_k)\, p(y_j|x_k) \log \frac{p(y_j|x_k)}{p(y_j)}

With w = p(x_1), \bar{w} = p(x_2), and p(y_1) = w + p\bar{w}:

I(x_1;Y) = \log \frac{1}{w + p\bar{w}}

I(x_2;Y) = p \log \frac{p}{w + p\bar{w}} + (1-p) \log \frac{1}{\bar{w}}

I(X;Y) = w\, I(x_1;Y) + \bar{w}\, I(x_2;Y)
44 Z Channel (Optical)
I(X;Y) = w\, I(x_1;Y) + \bar{w}\, I(x_2;Y)

w^* = 1 - \frac{1}{(1-p)\left(1 + 2^{\,h(p)/(1-p)}\right)}

C = \log_2\left(1 + (1-p)\, p^{\,p/(1-p)}\right)
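The optimal input w* and the closed-form capacity can be cross-checked against a direct evaluation of I(X;Y) (a sketch; the function names are mine):

```python
import math

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def z_mutual_info(w, p):
    """I(X;Y) for the Z channel, with w = P(X = x1) and 0 < w < 1."""
    wb = 1 - w
    q1 = w + p * wb                                            # P(Y = y1)
    i1 = math.log2(1 / q1)                                     # I(x1;Y)
    i2 = p * math.log2(p / q1) + (1 - p) * math.log2(1 / wb)   # I(x2;Y)
    return w * i1 + wb * i2

def z_wstar(p):
    """Capacity-achieving input w* = 1 - 1/((1-p)(1 + 2^{h(p)/(1-p)}))."""
    return 1 - 1 / ((1 - p) * (1 + 2 ** (h2(p) / (1 - p))))

def z_capacity(p):
    """Closed form C = log2(1 + (1-p) p^{p/(1-p)})."""
    return math.log2(1 + (1 - p) * p ** (p / (1 - p)))
```

For p = 0.15 this gives w* ≈ 0.555 and C ≈ 0.685 bits, and I(X;Y) evaluated at w* matches the closed form.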
45 Mutual Information for the Z Channel
[Plot: I(X;Y) versus the input probability w for p = 0.15.]
46 Channel Capacity for the Z, BSC and BEC
[Plot: capacity versus the channel parameter p for the Z channel, BSC, and BEC.]
47 Blahut-Arimoto Algorithm

I(X;Y) = \sum_{k=1}^{K} \sum_{j=1}^{J} p(x_k)\, p(y_j|x_k) \log \frac{p(y_j|x_k)}{\sum_{l=1}^{K} p(x_l)\, p(y_j|x_l)}

• An analytic solution for the capacity can be very difficult to obtain
• The alternative is a numerical solution
– Arimoto Jan. 1972
– Blahut Jul. 1972
• Exploits the fact that I(X;Y) is a concave function of p(xk)

48 Blahut-Arimoto Algorithm
49 Blahut-Arimoto Algorithm
• Update the probabilities
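Slide 49's update step survives only as a caption, so here is a minimal pure-Python sketch of the standard Blahut-Arimoto iteration (capacity bounds plus the multiplicative input update; not necessarily the exact notation of the original slide, and note that rows are inputs here, the transpose of the slides' matrix convention):

```python
import math

def blahut_arimoto(P, tol=1e-12, max_iter=100000):
    """Capacity (bits/use) of a DMC with P[k][j] = p(y_j | x_k), rows = inputs.

    Each pass computes the output distribution q induced by the current input
    distribution r, then c_k = exp(D(p(.|x_k) || q)), which gives the bounds
    log(sum_k r_k c_k) <= C <= log(max_k c_k); r is rescaled by c (Arimoto's
    update) until the bounds meet.
    """
    K, J = len(P), len(P[0])
    r = [1.0 / K] * K                       # start from the uniform input pmf
    lower = 0.0
    for _ in range(max_iter):
        q = [sum(r[k] * P[k][j] for k in range(K)) for j in range(J)]
        c = [math.exp(sum(P[k][j] * math.log(P[k][j] / q[j])
                          for j in range(J) if P[k][j] > 0))
             for k in range(K)]
        s = sum(r[k] * c[k] for k in range(K))
        lower, upper = math.log(s), math.log(max(c))
        r = [r[k] * c[k] / s for k in range(K)]   # Arimoto update of the input pmf
        if upper - lower < tol:
            break
    return lower / math.log(2), r
```

On the symmetric 3×3 example of slide 32 it returns log₂3 + .7log₂.7 + .2log₂.2 + .1log₂.1 ≈ 0.428 bits with the uniform input, and on the Z channel it agrees with the closed form on slide 44.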
50 51 Symmetric Channel Example
52 53 Non-Symmetric Channel Example
54 55 56 57 58 Binary Symmetric Channel
[Figure: BSC with inputs x1, x2 and outputs y1, y2.]

channel matrix
P = \begin{bmatrix} \bar{p} & p \\ p & \bar{p} \end{bmatrix}, \qquad \bar{p} = 1 - p

59 Binary Symmetric Channel
• Consider a block of N = 1000 bits
– if p = 0, 1000 bits are received correctly
– if p = 0.01, 990 bits are received correctly on average
– if p = 0.5, 500 bits are received correctly on average
• When p > 0, we do not know which bits are in error
– if p = 0.01, C = .919 bit
– if p = 0.5, C = 0 bit
60 Triple Repetition Code
• N = 3

message w | codeword c
0 | 000
1 | 111
61 Binary Symmetric Channel Errors
• If N bits are transmitted, the probability of a particular m-bit error pattern is
p^m (1-p)^{N-m}
• The probability of exactly m errors is
\binom{N}{m} p^m (1-p)^{N-m}
• The probability of m or more errors is
\sum_{i=m}^{N} \binom{N}{i} p^i (1-p)^{N-i}
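These three expressions translate directly into Python (a sketch; the names are mine):

```python
import math

def prob_exactly(m, N, p):
    """Probability of exactly m errors in N bits: C(N,m) p^m (1-p)^(N-m)."""
    return math.comb(N, m) * p**m * (1 - p)**(N - m)

def prob_at_least(m, N, p):
    """Probability of m or more errors: sum of prob_exactly(i) for i = m..N."""
    return sum(prob_exactly(i, N, p) for i in range(m, N + 1))
```

For N = 3 and p = 0.01 these reproduce the triple-repetition numbers on the next slides.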
62 Triple Repetition Code
• N = 3
• The probability of 0 errors is (1-p)^3
• The probability of 1 error is 3p(1-p)^2
• The probability of 2 errors is 3p^2(1-p)
• The probability of 3 errors is p^3
63 Triple Repetition Code
• For p = 0.01
– The probability of 0 errors is .970
– The probability of 1 error is 2.94×10⁻²
– The probability of 2 errors is 2.97×10⁻⁴
– The probability of 3 errors is 10⁻⁶
• If p < 1/2, then
p(0 errors) > p(1 error) > p(2 errors) > p(3 errors)
64 Triple Repetition Code – Decoding
Received Word | Codeword | Error Pattern
000 | 000 | 000
001 | 000 | 001
010 | 000 | 010
100 | 000 | 100
111 | 111 | 000
110 | 111 | 001
101 | 111 | 010
011 | 111 | 100
65 Triple Repetition Code
• Majority vote or nearest neighbor decoding will correct all single errors
000, 001, 010, 100 → 000
111, 110, 101, 011 → 111
• The probability of a decoding error is then
P_e = 3p^2(1-p) + p^3 = 3p^2 - 2p^3 < p
• If p = 0.01, then Pe = 0.000298, so only about one word in 3356 will be in error after decoding.
• This is a reduction of the error rate by a factor of about 33.
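The factor-of-33 claim checks out numerically (a sketch):

```python
p = 0.01
# Majority decoding of the triple repetition code fails iff 2 or 3 bits flip.
Pe = 3 * p**2 * (1 - p) + p**3        # = 3p^2 - 2p^3 = 0.000298
improvement = p / Pe                  # raw vs decoded error rate, about 33.6
```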
66 Code Rate
• After compression, the data is (almost) memoryless and uniformly distributed (equiprobable)
• Thus the entropy of the messages (codewords) is
H(W) = \log_b M
• The block length of a codeword is N
67 Code Rate
• The code rate is given by
R = \frac{\log_b M}{N} \text{ bits per channel use}
• M is the number of codewords
• N is the block length
• For the triple repetition code
R = \frac{\log_2 2}{3} = \frac{1}{3}
68 Repetition
[Plot: performance versus code rate R.]
69
[Figure: the channel used N times, with input block X^N and output block Y^N.]
70 Shannon’s Noisy Coding Theorem
For any ε > 0 and for any rate R less than the channel capacity C, there is an encoding and decoding scheme that can be used to ensure that the probability of decoding error is less than ε for a sufficiently large block length N.
71
[Plot: code rate R.]
72 Error Correction Coding N = 3
• R = 1/3, M = 2
0 → 000   1 → 111
• R = 1, M = 8
000 → 000   001 → 001   010 → 010   011 → 011
111 → 111   110 → 110   101 → 101   100 → 100
• Another choice: R = 2/3, M = 4
00 → 000   01 → 011   10 → 101   11 → 110
73 Error Correction Coding N = 3
• BSC p = 0.01 • M is the number of codewords
Code Rate R | Pe | M = 2^{NR}
1 | 0.0297 | 8
2/3 | 0.0199 | 4
1/3 | 2.98×10⁻⁴ | 2
• Tradeoff between code rate and error rate
74 Codes for N=3
[Figure: the three codes shown as vertices of the cube {0,1}³. R = 1: all 8 vertices are codewords; R = 2/3: codewords 000, 011, 101, 110; R = 1/3: codewords 000 and 111.]
75 Error Correction Coding N = 5
• BSC p = 0.01
Code Rate R | Pe | M = 2^{NR}
1 | 0.0490 | 32
4/5 | 0.0394 | 16
3/5 | 0.0297 | 8
2/5 | 9.80×10⁻⁴ | 4
1/5 | 9.85×10⁻⁶ | 2
• Tradeoff between code rate and error rate
76 Error Correction Coding N = 7
• BSC p = 0.01 N = 7
Code Rate R | Pe | M = 2^{NR}
1 | 0.0679 | 128
6/7 | 0.0585 | 64
5/7 | 0.0490 | 32
4/7 | 2.03×10⁻³ | 16
3/7 | 1.46×10⁻³ | 8
2/7 | 9.80×10⁻⁴ | 4
1/7 | 3.40×10⁻⁷ | 2

• Tradeoff between code rate and error rate
77 Communication over Noisy Channels
78 Binary Symmetric Channel Errors
• If N bits are transmitted, the probability of a particular m-bit error pattern is
p^m (1-p)^{N-m}
• The probability of exactly m errors is
\binom{N}{m} p^m (1-p)^{N-m}
• The probability of m or more errors is
\sum_{i=m}^{N} \binom{N}{i} p^i (1-p)^{N-i}
79 BSC Error Correction
• For a code with codewords of length N bits that can correct up to t errors, the probability of error after decoding is
P_e = \sum_{i=t+1}^{N} \binom{N}{i} p^i (1-p)^{N-i}

• This is the probability of t+1 or more errors
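A sketch of this formula, checked against the tables earlier in the slides (the N = 3, t = 1 repetition code and two N = 7 rows; the function name is mine):

```python
import math

def pe_after_decoding(N, t, p):
    """Probability of t+1 or more errors in N bits, i.e. P_e for a t-error-correcting code."""
    return sum(math.comb(N, i) * p**i * (1 - p)**(N - i)
               for i in range(t + 1, N + 1))
```

With p = 0.01, a single-error-correcting code of length 7 (such as the (7,4) Hamming code) gives Pe ≈ 2.03×10⁻³, matching the R = 4/7 row of the N = 7 table, and the length-7 repetition code (t = 3) gives the R = 1/7 row.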
80 Best Codes Comparison
• BSC p = 0.01 R = 2/3 M = 2NR
N | Pe | log₂M
3 | 1.99×10⁻² | 2
12 | 6.17×10⁻³ | 8
30 | 3.32×10⁻³ | 20
51 | 1.72×10⁻³ | 34
81 | 1.36×10⁻³ | 54
• For fixed R, Pe can be decreased by increasing N
81
[Figure: input block X^N and output block Y^N.]
82 Code Matrix
83 Binary Codes
• For given values of M and N, there are 2^{MN} possible binary codes.
• Of these, some will be bad, some will be best (optimal), and some will be good, in terms of Pe.
• An average code will be good.
84 85 Channel Capacity
• To prove that information can be transmitted reliably over a noisy channel at rates up to the capacity, Shannon used a number of new concepts
– Allowing an arbitrarily small but nonzero probability of error
– Using long codewords
– Calculating the average probability of error over a random choice of codes to show that at least one good code exists
86 Channel Coding Theorem
• Random coding used in the proof
• Joint typicality used as the decoding rule
• Shows that good codes exist which provide an arbitrarily small probability of error
• Does not provide an explicit way of constructing good codes
• If a long code (large N) is generated randomly, the code is likely to be good but is difficult to decode
87 The Data Processing Inequality Cascaded Channels
[Figure: cascade W → X → Y, with mutual informations I(W;X) between W and X, I(X;Y) between X and Y, and I(W;Y) end to end.]
The mutual information I(W;Y) for the cascade cannot be larger than I(W;X) or I(X;Y), so that
I(W;Y) ≤ I(W;X)
I(W;Y) ≤ I(X;Y)
88 Communication over Noisy Channels
89 90 Channel Capacity: Weak Converse
For R > C, the decoding error probability is bounded away from 0
[Plot: lower bound on Pe versus rate R; the bound is 0 for R ≤ C and strictly positive for R > C.]
91 Channel Capacity: Weak Converse
• C = 0.3
[Plot: the lower bound 1 − C/R on Pe versus R, for C = 0.3.]
92 Fano’s Inequality h(Pe)+Pelogb(r-1)
logbr
logb(r-1)
Pe 0 (r-1)/r 1
93 Fano’s Inequality
H(X|Y) ≤ h(P_e) + P_e log_b(r−1)

H(X|Y) ≤ 1 + NR·P_e(N,R)
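Combining Fano's inequality with the data processing inequality gives the weak converse bound sketched on slides 90-91; a standard derivation, stated here for completeness, writing W for the message and Ŵ for the decoder output:

```latex
% For 2^{NR} equiprobable messages, H(W) = NR.  Fano's inequality gives
% H(W \mid \hat{W}) \le 1 + NR\,P_e(N,R), and the data processing inequality
% over N memoryless channel uses gives I(W;\hat{W}) \le NC.
\begin{aligned}
NR = H(W) &= H(W \mid \hat{W}) + I(W;\hat{W}) \\
          &\le 1 + NR\,P_e(N,R) + NC \\
\Rightarrow\quad P_e(N,R) &\ge 1 - \frac{C}{R} - \frac{1}{NR}
\;\longrightarrow\; 1 - \frac{C}{R} \quad (N \to \infty)
\end{aligned}
```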
94 Channel Capacity: Strong Converse
• For rates above capacity (R > C)
P_e(N,R) \geq 1 - 2^{-N E_A(R)}
• where EA(R) is Arimoto’s error exponent
95 Arimoto’s Error Exponent EA(R)
96 EA(R) for a BSC with p=0.1
97
• The capacity is a very clear dividing point
• At rates below capacity, Pe → 0 exponentially as N → ∞
• At rates above capacity, Pe → 1 exponentially as N → ∞
[Plot: Pe versus R approaches a step at R = C as N grows: near 0 for R < C and near 1 for R > C.]
98