4 Mutual Information and Channel Capacity

In Section 2, we have seen the use of a quantity called entropy to measure the amount of randomness in a random variable. In this section, we introduce several more information-theoretic quantities. These quantities are important in the study of Shannon's results.

4.1 Information-Theoretic Quantities

Definition 4.1. Recall that the entropy of a discrete random variable X is defined in Definition 2.41 to be

H(X) = −∑_{x∈S_X} p_X(x) log2 p_X(x) = −E[log2 p_X(X)].   (16)

Similarly, the entropy of a discrete random variable Y is given by

H(Y) = −∑_{y∈S_Y} p_Y(y) log2 p_Y(y) = −E[log2 p_Y(Y)].   (17)

In our context, X and Y are the input and the output of a discrete memoryless channel, respectively. In such a situation, we have introduced some new notation in Section 3.1:

S_X ≡ X      p_X(x) ≡ p(x)      p      p_{Y|X}(y|x) ≡ Q(y|x)      Q matrix
S_Y ≡ Y      p_Y(y) ≡ q(y)      q      p_{X,Y}(x,y) ≡ p(x,y)      P matrix

Under this notation, (16) and (17) become

H(X) = −∑_{x∈X} p(x) log2 p(x) = −E[log2 p(X)]   (18)

and

H(Y) = −∑_{y∈Y} q(y) log2 q(y) = −E[log2 q(Y)].   (19)

Definition 4.2. The joint entropy for two random variables X and Y is given by

H(X,Y) = −∑_{x∈X} ∑_{y∈Y} p(x,y) log2 p(x,y) = −E[log2 p(X,Y)].
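The entropies above are easy to check numerically. Below is a minimal Python sketch (the helper name entropy and the use of NumPy are our own choices, not part of the notes); it evaluates −∑ p log2 p over the nonzero entries of a pmf, which covers (18), (19), and also Definition 4.2 when the whole joint pmf matrix is passed in.

    import numpy as np

    def entropy(pmf):
        """Entropy in bits: -sum p*log2(p) over the nonzero entries of a pmf."""
        p = np.asarray(pmf, dtype=float).ravel()
        p = p[p > 0]                           # convention: 0*log2(0) = 0
        return float(-np.sum(p * np.log2(p)))

    print(entropy([0.5, 0.5]))                 # a fair bit: 1.0 bit
    print(entropy([[0.25, 0.25],
                   [0.25, 0.25]]))             # uniform joint pmf on 4 pairs: 2.0 bits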

Example 4.3. Random variables X and Y have the following joint pmf matrix P:

    P = [ 1/8    1/16   1/16   1/4
          1/16   1/8    1/16   0
          1/32   1/32   1/16   0
          1/32   1/32   1/16   0  ]

Find H(X), H(Y), and H(X,Y).
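As a sanity check on Example 4.3, assuming the hypothetical entropy helper (and its NumPy import) from the sketch after Definition 4.2 is in scope, the marginals are simply the row and column sums of P:

    P = np.array([[1/8,  1/16, 1/16, 1/4],
                  [1/16, 1/8,  1/16, 0],
                  [1/32, 1/32, 1/16, 0],
                  [1/32, 1/32, 1/16, 0]])

    p = P.sum(axis=1)       # p(x) = [1/2, 1/4, 1/8, 1/8]
    q = P.sum(axis=0)       # q(y) = [1/4, 1/4, 1/4, 1/4]

    print(entropy(p))       # H(X)   = 1.75  bits
    print(entropy(q))       # H(Y)   = 2.0   bits
    print(entropy(P))       # H(X,Y) = 3.375 bits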

Definition 4.4. The (conditional) entropy of Y when we know X = x is denoted by H(Y|X = x) or simply H(Y|x). It can be calculated from

H(Y|x) = −∑_{y∈Y} Q(y|x) log2 Q(y|x).

• Note that the above formula is what we should expect it to be. When we want to find the entropy of Y, we use (19):

H(Y) = −∑_{y∈Y} q(y) log2 q(y).

When we have an extra piece of information that X = x, we should update the probability about Y from the unconditional probability q(y) to the conditional probability Q(y|x).

• Note that when we consider Q(y|x) with the value of x fixed and the value of y varied, we simply get the whole x-row from the Q matrix. So, to find H(Y|x), we simply find the "usual" entropy from the probability values in the row corresponding to x in the Q matrix.

Example 4.5. Consider the following DMC (actually a BAC):

[Channel diagram: X ∈ {0,1}, Y ∈ {0,1}, with transition probabilities Q(0|0) = 1/2, Q(1|0) = 1/2, and Q(1|1) = 1; that is, Q = [ 1/2 1/2 ; 0 1 ].]

Originally,

    P[Y = y] = q(y) = { 1/4,  y = 0,
                        3/4,  y = 1,
                        0,    otherwise.

(a) Suppose we know that X = 0. The "x = 0" row in the Q matrix gives

    Q(y|0) = { 1/2,  y = 0, 1,
               0,    otherwise;

that is, given x = 0, the RV Y will be uniform.

(b) Suppose we know that X = 1. The "x = 1" row in the Q matrix gives

    Q(y|1) = { 1,  y = 1,
               0,  otherwise;

that is, given x = 1, the RV Y is degenerated (deterministic).

Definition 4.6. The (average) conditional entropy of Y when we know X is denoted by H(Y|X). It can be calculated from

H(Y|X) = ∑_{x∈X} p(x) H(Y|x).

Example 4.7. In Example 4.5,
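Definition 4.6 translates directly into code. A minimal sketch (assuming the entropy helper and NumPy import from above; the name cond_entropy is our own) averages the per-row entropies of the Q matrix with weights p(x). For the channel of Example 4.5, the given output pmf q = (1/4, 3/4) is consistent with a uniform input, so p = (1/2, 1/2) is assumed here.

    def cond_entropy(p, Q):
        """H(Y|X) = sum_x p(x) * H(Y|x), where row x of Q is the pmf of Y given X = x."""
        p = np.asarray(p, dtype=float)
        Q = np.asarray(Q, dtype=float)
        return float(sum(p[i] * entropy(Q[i]) for i in range(len(p))))

    # Example 4.5 / 4.7: the binary asymmetric channel above with a uniform input.
    p = [1/2, 1/2]               # input pmf (inferred from q = (1/4, 3/4))
    Q = [[1/2, 1/2],             # row x = 0: Y uniform,   H(Y|0) = 1 bit
         [0,   1]]               # row x = 1: Y constant,  H(Y|1) = 0 bits
    print(cond_entropy(p, Q))    # H(Y|X) = 0.5 bits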

4.8. An alternative way to calculate H(Y|X) can be derived by first rewriting it as

H(Y|X) = ∑_{x∈X} p(x) H(Y|x) = −∑_{x∈X} p(x) ∑_{y∈Y} Q(y|x) log2 Q(y|x)
       = −∑_{x∈X} ∑_{y∈Y} p(x,y) log2 Q(y|x) = −E[log2 Q(Y|X)].

Note that Q(y|x) = p(x,y)/p(x). Therefore,

H(Y|X) = −E[log2 Q(Y|X)] = −E[log2 (p(X,Y)/p(X))]
       = (−E[log2 p(X,Y)]) − (−E[log2 p(X)]) = H(X,Y) − H(X).

Example 4.9. In Example 4.5,

Example 4.10. Continue from Example 4.3. Random variables X and Y have the following joint pmf matrix P:

    P = [ 1/8    1/16   1/16   1/4
          1/16   1/8    1/16   0
          1/32   1/32   1/16   0
          1/32   1/32   1/16   0  ]

Find H(Y|X) and H(X|Y).
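Using the identity from 4.8 (and its mirror image H(X|Y) = H(X,Y) − H(Y)), Example 4.10 reduces to entropies already computed for Example 4.3. A sketch, assuming the entropy helper and the matrix P from the Example 4.3 sketch are still in scope:

    H_XY = entropy(P)                 # H(X,Y) = 3.375 bits
    H_X  = entropy(P.sum(axis=1))     # H(X)   = 1.75  bits
    H_Y  = entropy(P.sum(axis=0))     # H(Y)   = 2.0   bits

    print(H_XY - H_X)                 # H(Y|X) = 1.625 bits
    print(H_XY - H_Y)                 # H(X|Y) = 1.375 bits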

ECS 452: In-Class Exercise

1. Consider two random variables X and Y whose joint pmf matrix is given by

    P = [ 1/18   5/18
          4/9    2/9  ],

with rows indexed by x ∈ {1, 2} and columns by y ∈ {3, 5}.

Calculate the following quantities. Except for part (e), all of your answers should be of the form X.XXXX.

a. H(X, Y)

b. H(X)

c. H(Y)

d. H(Y|X)

e. The Q matrix
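For this exercise, parts (a)-(e) can all be obtained from the joint pmf matrix alone. A sketch of one way to do it, assuming the entropy helper (and NumPy) from above is in scope; the function name exercise_quantities is our own, and the matrix entries used below are as reconstructed above, so they should be double-checked against the original exercise sheet.

    def exercise_quantities(P):
        """Return H(X,Y), H(X), H(Y), H(Y|X), and the Q matrix for a joint pmf matrix P.

        Assumes every row of P has a positive sum, so that Q(y|x) = p(x,y)/p(x) is defined.
        """
        P = np.asarray(P, dtype=float)
        p = P.sum(axis=1)              # p(x): row sums
        q = P.sum(axis=0)              # q(y): column sums
        H_XY = entropy(P)              # (a)
        H_X  = entropy(p)              # (b)
        H_Y  = entropy(q)              # (c)
        H_Y_given_X = H_XY - H_X       # (d), using the identity in 4.8
        Q = P / p[:, None]             # (e): divide each row of P by its row sum
        return H_XY, H_X, H_Y, H_Y_given_X, Q

    print(exercise_quantities([[1/18, 5/18],
                               [4/9,  2/9 ]]))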

Definition 4.11. The mutual information¹³ I(X;Y) between two random variables X and Y is defined as

I(X;Y) = H(X) − H(X|Y)                                         (20)
       = H(Y) − H(Y|X)                                         (21)
       = H(X) + H(Y) − H(X,Y)                                  (22)
       = E[log2 (p(X,Y) / (p(X) q(Y)))]
       = ∑_{x∈X} ∑_{y∈Y} p(x,y) log2 (p(x,y) / (p(x) q(y)))    (23)
       = E[log2 (P_{X|Y}(X|Y) / p(X))] = E[log2 (Q(Y|X) / q(Y))].   (24)

• Mutual information quantifies the reduction in the uncertainty of one random variable due to the knowledge of another.

• Mutual information is a measure of the amount of information one random variable contains about another [5, p 13]. • It is also natural to think of I(X; Y ) as a measure of how far X and Y are from being independent. ◦ Technically, it is the (Kullback-Leibler) divergence between the joint and product-of-marginal distributions.
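All of the equivalent forms (20)-(24) give the same number, which is easy to confirm numerically. A minimal sketch, assuming the entropy helper (and NumPy) from above; the name mutual_info is our own:

    def mutual_info(P):
        """I(X;Y) = H(X) + H(Y) - H(X,Y), computed from a joint pmf matrix P via form (22)."""
        P = np.asarray(P, dtype=float)
        return entropy(P.sum(axis=1)) + entropy(P.sum(axis=0)) - entropy(P)

    # When X and Y are independent, the joint pmf factors and I(X;Y) = 0.
    print(mutual_info(np.outer([0.5, 0.5], [0.25, 0.75])))   # ≈ 0.0 (up to floating-point error)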

4.12. Some important properties

(a) H(X,Y) = H(Y,X) and I(X;Y) = I(Y;X). However, in general, H(X|Y) ≠ H(Y|X).

(b) I and H are always ≥ 0.

(c) There is a one-to-one correspondence between Shannon's information measures and set theory. We may use an information diagram, which is a variation of a Venn diagram, to represent the relationships between Shannon's information measures. This is similar to the use of the Venn diagram to represent the relationships between probability measures. These diagrams are shown in Figure 7.

¹³The name mutual information and the notation I(X;Y) were introduced by [Fano, 1961, Ch 2].

[Figure 7 shows three diagrams side by side: a Venn diagram for sets A and B (regions A\B, A∩B, B\A, and the union A∪B), the corresponding information diagram (regions H(X|Y), I(X;Y), H(Y|X), with circles H(X) and H(Y) and total area H(X,Y)), and the corresponding probability diagram (P(A\B), P(A∩B), P(B\A), with P(A), P(B), and P(A∪B)).]

Figure 7: Venn diagram and its use to represent the relationship between information measures and the relationship between probabilities.

• Chain rule for information measures:

H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y).

(d) I(X; Y ) ≥ 0 with equality if and only if X and Y are independent.

• When this property is applied to the information diagram (or definitions (20), (21), and (22) for I(X;Y)), we have

  (i) H(X|Y) ≤ H(X),
  (ii) H(Y|X) ≤ H(Y),
  (iii) H(X,Y) ≤ H(X) + H(Y).

  Moreover, each of the inequalities above becomes an equality if and only if X ⫫ Y (X and Y are independent).

(e) We have seen in Section 2.4 that

0 ≤ H(X) ≤ log2 |X|,   (25)

where the lower bound is achieved when X is deterministic (degenerated) and the upper bound is achieved when X is uniform.

Similarly,

0 ≤ H(Y) ≤ log2 |Y|,   (26)

where the lower bound is achieved when Y is deterministic (degenerated) and the upper bound is achieved when Y is uniform.

For conditional entropy, we have

0 ≤ H(Y|X) ≤ H(Y),   (27)

where the lower bound is achieved if and only if Y = g(X) for some function g, and the upper bound is achieved if and only if X ⫫ Y. Similarly,

0 ≤ H(X|Y) ≤ H(X),   (28)

where the lower bound is achieved if and only if X = g(Y) for some function g, and the upper bound is achieved if and only if X ⫫ Y.

For mutual information, we have

0 ≤ I(X;Y) ≤ H(X),   (29)

where the lower bound is achieved if and only if X ⫫ Y, and the upper bound is achieved if and only if X = g(Y) for some function g, and

0 ≤ I(X;Y) ≤ H(Y),   (30)

where the lower bound is achieved if and only if X ⫫ Y, and the upper bound is achieved if and only if Y = g(X) for some function g.

Combining (25), (26), (29), and (30), we have

0 ≤ I(X;Y) ≤ min{H(X), H(Y)} ≤ min{log2 |X|, log2 |Y|}.   (31)

(f) H(X|X) = 0 and I(X;X) = H(X).

Example 4.13. Find the mutual information I(X;Y) between the two random variables X and Y whose joint pmf matrix is given by

    P = [ 1/2   1/4
          1/4   0   ].
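Example 4.13 can be checked with the mutual_info helper assumed above; the same numbers also illustrate the bounds in (31).

    P = np.array([[1/2, 1/4],
                  [1/4, 0]])

    I   = mutual_info(P)             # ≈ 0.1226 bits
    H_X = entropy(P.sum(axis=1))     # H(3/4, 1/4) ≈ 0.8113 bits
    H_Y = entropy(P.sum(axis=0))     # H(3/4, 1/4) ≈ 0.8113 bits

    # Spot-check of (31): 0 <= I(X;Y) <= min{H(X), H(Y)} <= min{log2|X|, log2|Y|} = 1
    assert 0 <= I <= min(H_X, H_Y) <= min(np.log2(2), np.log2(2))
    print(I)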

Example 4.14. Find the mutual information I(X;Y) between the two random variables X and Y whose p = [1/4, 3/4] and

    Q = [ 1/4   3/4
          3/4   1/4 ].
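Example 4.14 specifies the input pmf p and the channel matrix Q instead of the joint pmf. One way to handle this numerically (assuming the entropy and cond_entropy helpers from above, and with the numbers as reconstructed here) is to form q = pQ and use form (21), I(X;Y) = H(Y) − H(Y|X):

    p = np.array([1/4, 3/4])
    Q = np.array([[1/4, 3/4],
                  [3/4, 1/4]])

    q = p @ Q                              # q(y) = sum_x p(x) Q(y|x)  ->  [5/8, 3/8]
    I = entropy(q) - cond_entropy(p, Q)    # (21): I(X;Y) = H(Y) - H(Y|X)
    print(I)                               # ≈ 0.1432 bits (for the pmf as reconstructed)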
