
Channel Capacity

Máster Universitario en Ingeniería de Telecomunicación

I. Santamaría, Universidad de Cantabria

Contents

Introduction

Channel capacity for discrete channels

Channel coding theorem

Channel capacity for Gaussian channels


General model for communication systems

[Block diagram: message $W$ → Encoder → codeword $X^N$ → Channel $p(y|x)$ → $Y^N$ → Decoder → estimate $\hat{W}$ → Sink]

[Block diagram: Source coder (removes redundancy, to compress the source) → Channel coder (adds redundancy, to protect the source against channel errors) → Channel → Channel decoder (corrects errors) → Source decoder (restores the message)]


Basic ideas

- Choose a set of input codewords $x^N = (x_1, \dots, x_N)$ ($N$ denotes the number of channel uses) so that they are distinguishable at the output of the channel

- The number of such codewords that we can choose will determine the channel's capacity

- This number will depend on the distribution $p(y|x)$ which characterizes the channel

- First, we will consider discrete channels


Discrete Memoryless Channel (DMC)

Definition: A DMC is the link connecting the discrete input $X \in \mathcal{X}$ and the output $Y \in \mathcal{Y}$, described by the conditional probability $p(y|x)$

- A DMC is described by a time-invariant transition probability matrix $\mathbf{P}$ with elements $P_{i,j}$

$$P_{i,j} = p(Y = y_j \mid X = x_i)$$

  $\mathbf{P}$ is a right stochastic (or row stochastic) matrix: $\sum_j P_{i,j} = 1$

- Memoryless means that the probability of the output at time $n$ depends only on the input at that time, and is conditionally independent of previous channel inputs or outputs, that is

$$p(y_n \mid x_1, \dots, x_n, y_1, \dots, y_{n-1}) = p(y_n \mid x_n)$$

Channel coding theorem

Definition: The channel capacity is defined as

$$C = \max_{p(x)} I(X;Y)$$

where the maximum is taken over all possible input distributions $p(x)$

Definition: The operational channel capacity is defined as the highest rate (bits/channel use) at which information can be sent with arbitrarily low probability of error

The Channel coding theorem proves that Information Capacity = Operational Capacity


Channel capacity

$$C = \max_{p(x)} I(X;Y)$$

Remember that

$$I(X;Y) = H(Y) - H(Y|X) = D\big(p(x,y)\,\|\,p(x)p(y)\big) = \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}$$

In general, the hard part of the problem is to find the capacity achieving distribution
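As an illustration (my own addition, not part of the original slides), evaluating $I(X;Y)$ for a given input distribution and transition matrix is straightforward; a minimal Python sketch, with `mutual_information` being an illustrative name:

```python
import numpy as np

def mutual_information(px, P):
    """I(X;Y) in bits for a DMC with input distribution px and
    transition matrix P, where P[i, j] = p(Y = y_j | X = x_i)."""
    px = np.asarray(px, dtype=float)
    P = np.asarray(P, dtype=float)
    pxy = px[:, None] * P                      # joint distribution p(x, y)
    py = pxy.sum(axis=0)                       # output marginal p(y)
    prod = px[:, None] * py[None, :]           # product of marginals p(x)p(y)
    mask = pxy > 0                             # skip zero-probability pairs
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / prod[mask])))
```

The capacity is then the maximum of this quantity over $p(x)$, which in general must be found numerically (see the Arimoto-Blahut sketch later in these notes).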

Example: Noiseless binary channel

[Channel diagram: $X \in \{0,1\}$, $Y \in \{0,1\}$; $0 \to 0$ and $1 \to 1$ with probability 1]

$$I(X;Y) = H(Y) - H(Y|X) = H(X) - H(X|Y) = H(X) - H(X|X) = H(X)$$

Therefore,

$$C = \max_{p(x)} I(X;Y) = \max_{p(x)} H(X)$$

What is the capacity-achieving distribution?

$$p(x) = \left(\tfrac{1}{2}, \tfrac{1}{2}\right), \qquad C = H(X) = 1 \text{ bit/channel use}$$


Example: Noisy channel with non-overlapping outputs

[Channel diagram: $X \in \{0,1\}$, $Y \in \{0,1,2,3\}$; input 0 goes to outputs 0 or 1 with probability 1/2 each, input 1 goes to outputs 2 or 3 with probability 1/2 each]

The channel appears to be noisy, but really is not. We have again

$$C = \max_{p(x)} I(X;Y) = \max_{p(x)} H(X) = 1 \text{ bit/channel use}$$

And the capacity-achieving distribution is

$$p(x) = \left(\tfrac{1}{2}, \tfrac{1}{2}\right)$$


Example: Binary symmetric channel

[Channel diagram: $X \in \{0,1\}$, $Y \in \{0,1\}$; each bit is received correctly with probability $1-p$ and flipped with probability $p$]

$$I(X;Y) = H(Y) - H(Y|X) = H(Y) - \sum_x p(x) H(Y|X=x) = H(Y) - \sum_x p(x) H(p) \le 1 - H(p)$$

where

$$H(p) = -p \log p - (1-p) \log(1-p)$$

is the entropy of a Bernoulli random variable of parameter $p$, and the inequality follows from the fact that $Y$ is a binary r.v. and hence $H(Y) \le 1$

- Equality $I(X;Y) = 1 - H(p)$ is achieved when $Y$ is uniform, i.e., $p(y) = \left(\tfrac{1}{2}, \tfrac{1}{2}\right)$

- To have a uniform output we should send a uniform input; therefore $p(x) = \left(\tfrac{1}{2}, \tfrac{1}{2}\right)$ is the capacity-achieving distribution

- Finally, $C = 1 - H(p)$ is the capacity of the BSC
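A quick numerical illustration (mine, not from the slides) of how the BSC capacity $1 - H(p)$ behaves: it is 1 bit for a clean channel and drops to 0 at $p = 1/2$, where the output becomes independent of the input.

```python
import numpy as np

def H2(p):
    """Binary entropy in bits."""
    return -p*np.log2(p) - (1 - p)*np.log2(1 - p)

for p in (0.01, 0.1, 0.25, 0.5):
    print(p, 1 - H2(p))    # BSC capacity 1 - H(p); p = 0.5 gives C = 0
```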

Example: Binary erasure channel

[Channel diagram: $X \in \{0,1\}$ with $\Pr\{X=0\} = 1-p$ and $\Pr\{X=1\} = p$; $Y \in \{0, e, 1\}$; each input is erased (mapped to $e$) with probability $f$ and received correctly with probability $1-f$]

$$\begin{aligned}
I(X;Y) &= H(X) - H(X|Y) = H(X) - \sum_y p(y) H(X|Y=y) \\
&= H(X) - \big[(1-p)(1-f) H(X|Y=0) + p(1-f) H(X|Y=1) + (pf + (1-p)f) H(X|Y=e)\big] \\
&= H(X) - \big[(1-p)(1-f) \cdot 0 + p(1-f) \cdot 0 + (pf + (1-p)f) H(X)\big] \\
&= H(X)(1-f)
\end{aligned}$$

The capacity is

$$C = \max_{p(x)} I(X;Y) = 1 - f$$
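As a small check (my own addition), sweeping the input distribution $\Pr\{X=1\} = p$ confirms that $I(X;Y) = H(X)(1-f)$ is maximized by the uniform input, giving $C = 1-f$; the value of $f$ below is purely illustrative.

```python
import numpy as np

def H2(p):
    """Binary entropy in bits."""
    return -p*np.log2(p) - (1 - p)*np.log2(1 - p)

f = 0.3                                  # erasure probability (illustrative)
ps = np.linspace(0.01, 0.99, 99)         # candidate values of Pr{X = 1}
I = H2(ps) * (1 - f)                     # I(X;Y) from the derivation above
print(ps[np.argmax(I)], I.max())         # maximum at p = 0.5, giving C = 1 - f = 0.7
```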


Weakly symmetric channels

- Remember that a DMC is characterized by a transition probability matrix $\mathbf{P}$ with elements

$$P_{i,j} = p(Y = y_j \mid X = x_i)$$

- A channel is said to be weakly symmetric if
  1. Every row of the transition matrix is a permutation of every other row¹
  2. All column sums are equal:
     $$\sum_i P_{i,j} = \sum_x p(y|x) = c$$

  Note that this property implies that if $X$ is uniform, then $Y$ is uniform:
  $$p(y) = \sum_x p(x)\, p(y|x) = \frac{1}{|\mathcal{X}|} \sum_x p(y|x) = \frac{c}{|\mathcal{X}|} = \frac{1}{|\mathcal{Y}|}$$

¹ If every column is also a permutation of every other column, the channel is symmetric

Erasure

[Channel diagram: the binary erasure channel, $X \in \{0,1\}$ with $\Pr\{X=0\} = 1-p$, $\Pr\{X=1\} = p$, and $Y \in \{0, e, 1\}$]

$$\mathbf{P} = \begin{pmatrix} 1-f & f & 0 \\ 0 & f & 1-f \end{pmatrix}$$

(rows indexed by the inputs $\{0,1\}$, columns by the outputs $\{0, e, 1\}$)

BSC

[Channel diagram: the binary symmetric channel with crossover probability $p$]

$$\mathbf{P} = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$$


Capacity of weakly symmetric channels

Theorem: For a weakly symmetric channel, the capacity is

$$C = \log |\mathcal{Y}| - H(\mathbf{r}),$$

where $\mathbf{r} = \left(P_{i,1}, P_{i,2}, \dots, P_{i,|\mathcal{Y}|}\right)$ is a row of the transition probability matrix, and the capacity is achieved by a uniform distribution on the input alphabet

Example: Suppose a symmetric channel with transition probability matrix

$$\mathbf{P} = \begin{pmatrix} 0.3 & 0.2 & 0.5 \\ 0.5 & 0.3 & 0.2 \\ 0.2 & 0.5 & 0.3 \end{pmatrix}$$

The capacity is

$$C = \log 3 - H(0.3, 0.2, 0.5) = \log 3 - 1.4855 = 0.0995 \text{ bits/channel use},$$

and is achieved by a uniform distribution on the input, $\left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)$
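These numbers are easy to reproduce (a check I added, not part of the slides): the closed form $\log|\mathcal{Y}| - H(\mathbf{r})$ and a direct mutual-information computation at a uniform input give the same value.

```python
import numpy as np

P = np.array([[0.3, 0.2, 0.5],
              [0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3]])
r = P[0]                                              # any row (rows are permutations of each other)
C = np.log2(P.shape[1]) + np.sum(r * np.log2(r))      # log|Y| - H(r)
print(C)                                              # ≈ 0.0995 bits/channel use

# Cross-check: mutual information at a uniform input
px = np.full(3, 1/3)
pxy = px[:, None] * P                                 # joint p(x, y)
py = pxy.sum(axis=0)                                  # output marginal (also uniform here)
I = np.sum(pxy * np.log2(pxy / (px[:, None] * py[None, :])))
print(I)                                              # same value
```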

Properties of channel capacity

1. $C \ge 0$
2. $C \le \log |\mathcal{X}|$, since
   $$C = \max_{p(x)} I(X;Y) \le \max_{p(x)} H(X) = \log |\mathcal{X}|$$

3. $C \le \log |\mathcal{Y}|$

Final remark: In general, for a DMC there is no closed-form solution for the capacity, but we can always resort to nonlinear optimization techniques (the most well-known being the Arimoto-Blahut algorithm)
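A minimal sketch of the Arimoto-Blahut iteration (my own illustration; the slides only name the algorithm). It alternates between computing the output distribution induced by the current input distribution and re-weighting each input symbol by the divergence between its channel law and that output distribution; `arimoto_blahut` is an illustrative name.

```python
import numpy as np

def arimoto_blahut(P, tol=1e-10, max_iter=10_000):
    """Capacity (bits/channel use) of a DMC with row-stochastic transition matrix P,
    P[i, j] = p(Y = y_j | X = x_i), and the optimizing input distribution."""
    m = P.shape[0]
    r = np.full(m, 1.0 / m)                             # start from a uniform input
    for _ in range(max_iter):
        q = r @ P                                       # output distribution induced by r
        with np.errstate(divide="ignore", invalid="ignore"):
            # D[i] = D( p(y|x_i) || q(y) ) in bits
            D = np.where(P > 0, P * np.log2(P / q), 0.0).sum(axis=1)
        r_new = r * 2.0 ** D
        r_new /= r_new.sum()
        if np.max(np.abs(r_new - r)) < tol:
            r = r_new
            break
        r = r_new
    q = r @ P
    with np.errstate(divide="ignore", invalid="ignore"):
        D = np.where(P > 0, P * np.log2(P / q), 0.0).sum(axis=1)
    return float(r @ D), r                              # I(X;Y) at the final r, and r itself

# Example: BSC with p = 0.1 -> C ≈ 0.531 bits, optimal input ≈ (1/2, 1/2)
print(arimoto_blahut(np.array([[0.9, 0.1], [0.1, 0.9]])))
```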


Channel coding theorem

- The channel coding theorem proves that the operational capacity (the highest rate, in bits/channel use, at which we can transmit over a channel with an arbitrarily small probability of error) is equal to the information capacity (the maximum of $I(X;Y)$ over all input distributions)

- We wish to ensure that no two input sequences produce the same output sequence; otherwise, we will not be able to decide which sequence was sent

- We will use the channel $N$ times and send sequences (codewords) of length $N$: $x^N = (x_1, \dots, x_N)$
- The key insight is, again, typicality


Jointly typical sequences

Definition: The set $A_\epsilon^{(N)}$ of jointly typical sequences $(x^N, y^N)$ with respect to the distribution $p(x,y)$ is the set of sequences whose sample entropies are $\epsilon$-close to the true entropies

$$A_\epsilon^{(N)} = \Big\{ (x^N, y^N) \in \mathcal{X}^N \times \mathcal{Y}^N : \; \Big| -\tfrac{1}{N} \log p(x^N) - H(X) \Big| < \epsilon, \; \Big| -\tfrac{1}{N} \log p(y^N) - H(Y) \Big| < \epsilon, \; \Big| -\tfrac{1}{N} \log p(x^N, y^N) - H(X,Y) \Big| < \epsilon \Big\}$$

Example: Suppose that the input of the channel is Bernoulli with $p = 0.1$ and the channel is a BSC with flipping probability $f = 0.2$. For $N = 50$ the following pair of (input, output) sequences is jointly typical

x^N = (000100000000010000000001000000 00010000000001000000)

y^N = (000010100100010100100101000000 00010010010001010000)

- $x^N$ has 5 ones, so it is a typical sequence of $p(x)$ at any $\epsilon$
- $y^N$ has 13 ones, so it is a typical sequence of $p(y)$ at any $\epsilon$ (note that $\Pr(Y=1) = 0.26$)
- $x^N$ and $y^N$ differ in 10 bits, which is the typical number of flips for this BSC
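A numerical sanity check of this example (my own addition; the counts 5, 13 and 10 are read off the two sequences above): the sample entropies are very close to the true entropies, which is exactly the joint-typicality condition.

```python
import numpy as np

def H2(p):
    """Binary entropy in bits."""
    return -p*np.log2(p) - (1 - p)*np.log2(1 - p)

N, p, f = 50, 0.1, 0.2
ones_x, ones_y, flips = 5, 13, 10      # counts read off the sequences above

# Sample entropies -(1/N) log2 p(.) of the example pair
Lx = -(ones_x*np.log2(p) + (N - ones_x)*np.log2(1 - p)) / N
py1 = p*(1 - f) + (1 - p)*f            # Pr(Y = 1) = 0.26
Ly = -(ones_y*np.log2(py1) + (N - ones_y)*np.log2(1 - py1)) / N
Lxy = Lx - (flips*np.log2(f) + (N - flips)*np.log2(1 - f)) / N

print(Lx,  H2(p))                      # ≈ 0.469 vs H(X)   = H(0.1)
print(Ly,  H2(py1))                    # ≈ 0.827 vs H(Y)   = H(0.26)
print(Lxy, H2(p) + H2(f))              # ≈ 1.191 vs H(X,Y) = H(0.1) + H(0.2)
```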

Joint AEP

Informal statement: Let $(X^N, Y^N)$ be sequences of length $N$ drawn i.i.d. according to

$$p(x^N, y^N) = \prod_{i=1}^{N} p(x_i, y_i)$$

Then:
1. The probability that $(X^N, Y^N)$ is jointly typical tends to 1 as $N \to \infty$
2. The number of jointly typical sequences is close to $2^{N H(X,Y)}$
3. If $\tilde{X}^N$ and $\tilde{Y}^N$ are independent samples with the same marginals as $p(x^N, y^N)$, then the probability that $(\tilde{X}^N, \tilde{Y}^N)$ belongs to the jointly typical set is close to $2^{-N I(X;Y)}$


- For each typical input sequence $x^N$ there are approximately $2^{N H(Y|X)}$ possible output sequences, all of them equally probable
- The total number of typical output sequences $y^N$ is approximately $2^{N H(Y)}$
- This number can be divided into sets of size $2^{N H(Y|X)}$, each one corresponding to a different input sequence

- The total number of disjoint sets is

$$2^{N(H(Y) - H(Y|X))} = 2^{N I(X;Y)}$$

hence we can send at most $2^{N I(X;Y)}$ distinguishable codewords of length $N$

A graphical representation of the jointly typical set (from MacKay's textbook)

[Figure: the $\mathcal{X}^N \times \mathcal{Y}^N$ plane containing about $2^{NH(X,Y)}$ jointly typical dots; there are roughly $2^{NH(X)}$ typical input sequences, $2^{NH(Y)}$ typical output sequences, $2^{NH(Y|X)}$ typical outputs per input, and $2^{NH(X|Y)}$ typical inputs per output]


Definitions

Definition: An $(M, N)$ code for the channel $p(y|x)$ consists of the following:
1. An index set $\{1, \dots, M\}$
2. An encoding function

$$\{1, \dots, M\} \to \mathcal{X}^N$$

that yields $M$ codewords, $x^N(1), \dots, x^N(M)$, each of length $N$. The set of codewords is called the codebook
3. A decoding function

$$g : \mathcal{Y}^N \to \{1, \dots, M\}$$

which is a deterministic function of the received vector and decides which codeword has been transmitted


Definition: If we send the index $i$, the conditional probability of error is

$$P_{e,i} = \Pr\big(g(Y^N) \ne i \mid X^N = x^N(i)\big)$$

Definition: The maximal probability of error for an (M, N) code is

$$P_{e,\max} = \max_{i \in \{1, 2, \dots, M\}} P_{e,i}$$

Definition: The average probability of error for an (M, N) code is

$$P_e = \frac{1}{M} \sum_{i=1}^{M} P_{e,i}$$


Definition: The rate of an (M, N) code is

$$R = \frac{\log M}{N} \text{ bits per transmission}$$

Definition: A rate $R$ is achievable if there exists a sequence of $(2^{NR}, N)$ codes² such that $P_{e,\max} \to 0$ as $N \to \infty$

² Strictly, we should write $(\lceil 2^{NR} \rceil, N)$

Channel Coding

Channel Coding Theorem [C. E. Shannon, 1948]: For a DMC, all rates below capacity, $C$, are achievable.
1. Specifically, for every rate $R < C$, there exists a sequence of $(2^{NR}, N)$ codes with maximum probability of error $P_{e,\max} \to 0$
2. Conversely, any sequence of $(2^{NR}, N)$ codes with $P_{e,\max} \to 0$ must have $R \le C$

- Despite channel errors, it is possible to get arbitrarily low bit error rates provided that $R < C$

- We can turn any noisy channel into an essentially noiseless channel with rate up to $C$ bits per channel use


Key ideas behind the proof (achievability)

1. Random coding: calculate the average probability of error over a random choice of codebooks, which can be used to show the existence of at least one good code
   - Generate $M = \lceil 2^{NR} \rceil$ codewords at random according to $p(x^N) = \prod_{n=1}^{N} p(x_n)$
   - The code is known to both the transmitter and the receiver
   - A message $i$ is chosen from $\{1, \dots, M\}$ and $x^N(i)$ is transmitted; the received signal has the following distribution

$$p(y^N \mid x^N(i)) = \prod_{n=1}^{N} p(y_n \mid x_n(i))$$

2. Typical set decoding: decode $y^N$ as $\hat{i}$ if $(x^N(\hat{i}), y^N)$ are jointly typical
3. An error occurs if $\hat{i} \ne i$


Key ideas behind the proof (converse)

- The converse of the theorem³ proves that if $R > C$, a code with $P_{e,\max} \to 0$ does not exist
- If $R > C$, then $P_e$ is always bounded away from 0
- The key ingredient to prove the converse is Fano's inequality

Fano's inequality

- Consider a DMC and a codebook with $2^{NR}$ codewords: $x^N(1), \dots, x^N(M)$
- We transmit a codeword $x^N(i)$ chosen at random (uniformly from the codebook) and make an estimate $x^N(\hat{i})$ at the receiver
- Fano's inequality relates the conditional entropy $H(X^N(i) \mid X^N(\hat{i}))$ with the probability of error:

$$P_e \ge \frac{H(X^N(i) \mid X^N(\hat{i})) - 1}{\log 2^{NR}}$$

³ Typically, the converse is the difficult part to prove in most capacity results

AWGN Channel

[Channel diagram: $Y = hX + N$, where the input $X$ is scaled by the channel gain $h$ and additive noise $N \sim \mathcal{N}(0, \sigma_n^2)$ is added]

- The noise is zero-mean Gaussian
- The channel is constant (and perfectly known at the receiver)

Without loss of generality we will consider the following simplified model that assumes $h = 1$:

$$Y = X + N$$

We need some constraint on the input power; otherwise the capacity would be infinite

- Average power constraint: $E[X^2] \le P$, so that
$$E[Y^2] = E[(X+N)^2] \le P + \sigma_n^2$$

A first idea

- Suppose we send a BPSK signal $X \in \{-\sqrt{\mathrm{SNR}}, +\sqrt{\mathrm{SNR}}\}$ and apply the optimal decoder at the receiver
- The probability of error would be $P_e = Q\left(\sqrt{\mathrm{SNR}}\right)$

- AWGN $\to$ BSC with flipping probability $p = P_e$

[Diagram: BPSK over the AWGN channel with $N \sim \mathcal{N}(0,1)$ followed by a hard-decision device is equivalent to a BSC with crossover probability $P_e$]

- We know that $1 - H(P_e)$ bits/channel use is the capacity of this BSC and thus is an achievable rate, but this is obviously suboptimal (from an information-theoretic point of view)

- Notice, however, that this is what we do in practice
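To quantify the loss (a numerical illustration I added, not from the slides), one can compare the hard-decision rate $1 - H(P_e)$ with $P_e = Q(\sqrt{\mathrm{SNR}})$ against the AWGN capacity $\frac{1}{2}\log_2(1+\mathrm{SNR})$ derived on the next slides.

```python
from math import erfc, sqrt, log2

def Q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def H2(p):
    """Binary entropy in bits."""
    return -p*log2(p) - (1 - p)*log2(1 - p)

for snr_db in (0, 5, 10):
    snr = 10.0 ** (snr_db / 10.0)
    pe = Q(sqrt(snr))                    # BPSK error probability
    c_hard = 1.0 - H2(pe)                # rate of the BSC induced by hard decisions
    c_soft = 0.5 * log2(1.0 + snr)       # Shannon capacity of the real AWGN channel
    print(snr_db, round(c_hard, 3), round(c_soft, 3))
```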


The capacity of the AWGN

The information capacity of the AWGN is

$$C = \max_{f(x)\,:\,E[X^2] \le P} I(X;Y)$$

We can calculate $I(X;Y)$ as follows

$$I(X;Y) = h(Y) - h(Y|X) = h(Y) - h((X+N)\mid X) = h(Y) - h(N\mid X) = h(Y) - h(N)$$

where we have used that $N$ is independent of $X$. Since $N$ is Gaussian, we have that

$$I(X;Y) = h(Y) - \frac{1}{2}\log 2\pi e \sigma_n^2$$


- Since $E[X^2] \le P$, the maximum value for the power (variance) of $Y$ is $E[Y^2] = P + \sigma_n^2$

- Given that $E[Y^2] = P + \sigma_n^2$, the entropy of $Y$ is maximized when $Y$ is Gaussian: $h(Y) = \frac{1}{2}\log 2\pi e (P + \sigma_n^2)$. Then,

$$I(X;Y) = \frac{1}{2}\log 2\pi e (P + \sigma_n^2) - \frac{1}{2}\log 2\pi e \sigma_n^2 = \frac{1}{2}\log\left(1 + \frac{P}{\sigma_n^2}\right)$$

- $Y$ is Gaussian only when $X \sim \mathcal{N}(0, P)$ is also Gaussian, so

$$C = \max_{f(x)\,:\,E[X^2] \le P} I(X;Y) = \frac{1}{2}\log\left(1 + \frac{P}{\sigma_n^2}\right)$$

and the capacity is achieved by transmitting Gaussian codewords


Discrete Constellations

- In practice, we never use Gaussian codewords (why?)

- Instead, we use discrete constellations (e.g., BPSK, QPSK, QAM, ...)

- What is the achievable rate with discrete constellations?

- Let us consider BPSK signaling: we send $X \in \{+1, -1\}$ through an AWGN channel

$$Y = \sqrt{\mathrm{SNR}}\, X + N$$

with $N \sim \mathcal{N}(0, 1)$. The mutual information is

$$I(X;Y) = h(Y) - h(Y|X)$$


$$I(X;Y) = \mathrm{SNR} - \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2} \log\cosh\left(\mathrm{SNR} - \sqrt{\mathrm{SNR}}\, y\right) dy$$
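The integral has no closed form, but it is easy to evaluate numerically. A sketch (my own addition; it assumes the logarithm in the expression above is natural, so the result is divided by $\ln 2$ to obtain bits, and it uses $\log(2\cosh z) = \mathrm{logaddexp}(z, -z)$ to avoid overflow):

```python
import numpy as np

def bpsk_mi_bits(snr):
    """Mutual information of equiprobable BPSK over the real AWGN channel, in bits."""
    y = np.linspace(-12.0, 12.0, 20001)              # grid for the N(0,1) weighting
    dy = y[1] - y[0]
    phi = np.exp(-y**2 / 2) / np.sqrt(2*np.pi)       # standard normal pdf
    z = snr - np.sqrt(snr) * y
    log_cosh = np.logaddexp(z, -z) - np.log(2.0)     # log cosh(z), overflow-safe
    I_nats = snr - np.sum(phi * log_cosh) * dy       # the integral above, in nats
    return I_nats / np.log(2.0)

for snr_db in (-5, 0, 5, 10):
    snr = 10.0 ** (snr_db / 10.0)
    print(snr_db, round(bpsk_mi_bits(snr), 3), round(0.5*np.log2(1 + snr), 3))
# BPSK saturates at 1 bit/channel use; the Gaussian-input curve keeps growing
```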

[Plot: $I(X;Y)$ versus SNR (dB), from -10 to 20 dB, comparing Gaussian inputs (Shannon capacity) with BPSK]


[Plot: capacity (bps/Hz) versus SNR (dB), from 5 to 30 dB: Gaussian inputs ($\log(1+\mathrm{SNR})$) compared with uncoded 4-QAM (QPSK), 16-QAM and 64-QAM at a symbol error probability of $10^{-6}$; the 1.53 dB shaping gain is indicated]


Complex AWGN

In the complex case, the noise is $N \sim \mathcal{CN}(0, \sigma_n^2)$ and the capacity-achieving distribution is also a complex Gaussian

$$X \sim \mathcal{CN}(0, P)$$

The capacity is

$$C = \max_{f(x)\,:\,E[|X|^2] \le P} I(X;Y) = \log\left(1 + \frac{P}{\sigma_n^2}\right)$$

Since now we are transmitting over both the real and imaginary parts, the capacity of the complex AWGN channel is twice the capacity of the real AWGN channel


Bandlimited channels

- Consider now the case of bandlimited AWGN channels

$$Y(t) = (X(t) + N(t)) * h(t)$$

where $h(t)$ is the impulse response of an ideal lowpass filter, which filters out all frequencies greater than $W$ ($H(\omega) = 0$ for $|\omega| > W$)

- We know from the Nyquist sampling theorem that a bandlimited signal is completely determined by samples spaced $T_s = \frac{1}{2W}$ seconds apart

- For a signal of duration $T$ we should take $2WT$ samples: the continuous-time signal is thus represented by a $2WT$-dimensional vector


- White Gaussian noise with double-sided PSD $N_0/2$

  Variance per noise sample: $\sigma_n^2 = N_0/2$

- Signal power $P$ $\Rightarrow$ signal energy $PT$ $\Rightarrow$ energy per sample: $\frac{P}{2W}$

- Capacity ($2W$ real AWGN channel uses per second):

$$C = 2W \cdot \frac{1}{2}\log\left(1 + \frac{P}{(N_0/2)\,2W}\right) = W \log\left(1 + \frac{P}{N_0 W}\right) \text{ bits/s}$$
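A quick worked example (my own, with purely illustrative numbers): a telephone-grade channel with $W = 3$ kHz and $P/(N_0 W) = 30$ dB supports roughly 30 kbit/s.

```python
from math import log2

W = 3_000.0                      # bandwidth in Hz (illustrative)
snr_db = 30.0                    # P / (N0 * W) in dB (illustrative)
snr = 10.0 ** (snr_db / 10.0)
C = W * log2(1.0 + snr)          # capacity in bits per second
print(round(C))                  # ≈ 29_902 bit/s
```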


Parallel Gaussian channels

[Diagram: $k$ parallel channels $Y_j = X_j + N_j$ with $N_j \sim \mathcal{N}(0, \sigma_j^2)$, $j = 1, \dots, k$, subject to the total power constraint $E\left[\sum_{j=1}^{k} X_j^2\right] \le P$]

- There is a total power constraint: we wish to distribute the total power among the channels so as to maximize capacity

- A model for multicarrier (OFDM) communications


Since the noises are independent, we have:

$$\begin{aligned}
I(X_1, \dots, X_k; Y_1, \dots, Y_k) &= h(Y_1, \dots, Y_k) - h(Y_1, \dots, Y_k \mid X_1, \dots, X_k) \\
&= h(Y_1, \dots, Y_k) - h(N_1, \dots, N_k) \\
&= h(Y_1, \dots, Y_k) - \sum_{i=1}^{k} h(N_i) \\
&\le \sum_{i=1}^{k} \big( h(Y_i) - h(N_i) \big) \\
&\le \sum_{i=1}^{k} \frac{1}{2}\log\left(1 + \frac{P_i}{\sigma_i^2}\right)
\end{aligned}$$

where equality is achieved when the $X_i$ are independent Gaussians, $X_i \sim \mathcal{N}(0, P_i)$, such that $\sum_i P_i = P$

So the problem reduces to finding the optimal power allocation among the channels. This is a standard optimization problem:

$$J(P_1, \dots, P_k) = \sum_{i=1}^{k} \frac{1}{2}\log\left(1 + \frac{P_i}{\sigma_i^2}\right) + \lambda\left(P - \sum_i P_i\right)$$

The solution is $P_i = (\nu - \sigma_i^2)^+$, where $(x)^+$ denotes $x$ when $x$ is positive and 0 otherwise, and $\nu$ is chosen to fulfill the constraint

$$\sum_{i=1}^{k} (\nu - \sigma_i^2)^+ = P$$

This is typically referred to as water-filling
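A minimal water-filling sketch (my own implementation of the solution above, with `waterfill` as an illustrative name): the water level $\nu$ is found by bisection so that the allocated powers $(\nu - \sigma_i^2)^+$ sum to $P$.

```python
import numpy as np

def waterfill(noise_vars, P, tol=1e-12):
    """Power allocation P_i = (nu - sigma_i^2)^+ with sum_i P_i = P."""
    noise_vars = np.asarray(noise_vars, dtype=float)
    lo, hi = noise_vars.min(), noise_vars.max() + P      # the water level nu lies in [lo, hi]
    while hi - lo > tol:
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - noise_vars, 0.0).sum() > P:
            hi = nu                                      # too much power poured: lower the level
        else:
            lo = nu
    powers = np.maximum(lo - noise_vars, 0.0)
    C = 0.5 * np.log2(1.0 + powers / noise_vars).sum()   # bits/channel use over the k channels
    return powers, C

powers, C = waterfill([0.1, 0.5, 1.0, 2.0], P=2.0)
print(powers.round(3), round(C, 3))   # the noisiest channel receives no power in this example
```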

[Figure: water-filling interpretation; the noise levels $\sigma_i^2$ form the bottom of a vessel, power $P_i$ is poured on top of each up to the common water level $\nu$, and channels with $\sigma_i^2 > \nu$ receive no power]
