
10.3 Joint Differential Entropy, Conditional (Differential) Entropy, and Mutual Information

Definition 10.17 The joint differential entropy h(X) of a random vector X with joint pdf f(x) is defined as

h(X) = -\int_{\mathcal{S}} f(x) \log f(x)\, dx = -E \log f(X).

Corollary If X_1, X_2, \ldots, X_n are mutually independent, then

h(X) = \sum_{i=1}^{n} h(X_i).

Theorem 10.18 (Translation) h(X + c) = h(X).

Theorem 10.19 (Scaling) h(AX) = h(X) + \log |\det(A)|.
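A minimal numerical sketch of the two theorems above, assuming X is Gaussian so that the pdfs of X, X + c, and AX are available in closed form (the covariance K, matrix A, and vector c below are arbitrary illustrative choices); each entropy is estimated as -E log f by Monte Carlo with NumPy/SciPy:

# A minimal numerical sketch of Theorems 10.18 and 10.19, assuming X Gaussian
# so that the pdfs of X, X + c, and AX are available in closed form; K, A and
# c are arbitrary illustrative choices, and h is estimated as -E[log f].
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, N = 3, 200_000
K = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 1.0]])
c = np.array([5.0, -2.0, 1.0])

X = rng.multivariate_normal(np.zeros(n), K, size=N)

def mc_entropy(samples, cov, mean=None):
    # Monte Carlo estimate of -E[log f] for a Gaussian with the given mean/cov.
    mean = np.zeros(cov.shape[0]) if mean is None else mean
    return -multivariate_normal(mean, cov).logpdf(samples).mean()

h_X  = mc_entropy(X, K)
h_Xc = mc_entropy(X + c, K, mean=c)           # Theorem 10.18: translation
h_AX = mc_entropy(X @ A.T, A @ K @ A.T)       # Theorem 10.19: scaling

print(h_Xc - h_X)                                   # ~ 0
print(h_AX - h_X, np.log(abs(np.linalg.det(A))))    # both ~ log|det(A)|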


Theorem 10.20 (Multivariate Gaussian Distribution) Let X \sim \mathcal{N}(\mu, K). Then

h(X) = \frac{1}{2} \log\left[(2\pi e)^n |K|\right].

Proof

1. Let K be diagonalizable as Q \Lambda Q^{\top}.

2. Write X = QY, where the random variables in Y are uncorrelated with \mathrm{var}\, Y_i = \lambda_i, the ith diagonal element of \Lambda (cf. Corollary 10.7).

3. Since X is Gaussian, so is Y.

4. Then the random variables in Y are mutually independent because they are uncorrelated.

5. Now consider

h(X) = h(QY)
     = h(Y) + \log |\det(Q)|          (Theorem 10.19)
     = h(Y) + 0                       (\det(Q) = \pm 1)
     = \sum_{i=1}^{n} h(Y_i)
     = \sum_{i=1}^{n} \frac{1}{2} \log(2\pi e \lambda_i)
     = \frac{1}{2} \log\left[(2\pi e)^n \prod_{i=1}^{n} \lambda_i\right]
     = \frac{1}{2} \log\left[(2\pi e)^n |\Lambda|\right]
     = \frac{1}{2} \log\left[(2\pi e)^n |K|\right],

where the last step uses |K| = |Q \Lambda Q^{\top}| = |Q|\,|\Lambda|\,|Q^{\top}| = |Q|^2 |\Lambda| = |\Lambda|.
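A quick numerical check of Theorem 10.20, assuming natural logarithms (entropy in nats) and an arbitrary illustrative covariance K; the closed form is compared against SciPy's built-in differential entropy of a multivariate normal:

# A quick check of Theorem 10.20, assuming natural logarithms (entropy in
# nats) and an arbitrary illustrative covariance K: the closed form
# (1/2) log[(2*pi*e)^n |K|] is compared against scipy's built-in entropy.
import numpy as np
from scipy.stats import multivariate_normal

K = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
n = K.shape[0]

h_closed_form = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))
h_scipy = multivariate_normal(np.zeros(n), K).entropy()

print(h_closed_form, h_scipy)   # the two values agree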

The Model of a “Channel” with Discrete Output

Definition 10.21 The random variable Y is related to the random variable X through a conditional distribution p(y|x) defined for all x means

X (general) → p(y|x) → Y (discrete)
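As an illustrative instance (not from the slides), a hard-decision front end applied to a continuous input is a channel of this kind: with Y = 1\{X \ge 0\},

p(1|x) = 1_{\{x \ge 0\}} \quad\text{and}\quad p(0|x) = 1_{\{x < 0\}},

which is defined for every x.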

The Model of a “Channel” with Continuous Output

Definition 10.22 The random variable Y is related to the random variable X through a conditional pdf f(y|x) defined for all x means

X (general) → f(y|x) → Y (continuous)

Conditional Differential Entropy

Definition 10.23 Let X and Y be jointly distributed random variables where Y is continuous and is related to X through a conditional pdf f(y|x) defined for all x. The conditional differential entropy of Y given \{X = x\} is defined as

h(Y|X = x) = -\int_{\mathcal{S}_Y(x)} f(y|x) \log f(y|x)\, dy

and the conditional differential entropy of Y given X is defined as

h(Y|X) = \int_{\mathcal{S}_X} h(Y|X = x)\, dF(x) = -E \log f(Y|X),

where dF(x) becomes f(x)\, dx when X is continuous.
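As a quick illustration (a standard additive-noise example, not taken from the slides): if Y = X + Z with Z \sim \mathcal{N}(0, \sigma^2) independent of X, then f(y|x) is the \mathcal{N}(x, \sigma^2) density for every x, so

h(Y|X = x) = \frac{1}{2}\log(2\pi e \sigma^2) \quad\text{for all } x, \qquad h(Y|X) = \int_{\mathcal{S}_X} h(Y|X = x)\, dF(x) = \frac{1}{2}\log(2\pi e \sigma^2) = h(Z).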

Proposition 10.24 If

X (general) → f(y|x) → Y (continuous),

then f(y) exists and is given by

f(y) = \int f(y|x)\, dF(x).

Remark Proposition 10.24 says that the pdf of Y exists regardless of the distribution of X. The next proposition is its vector generalization.
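A small simulation sketch of Proposition 10.24, assuming an example not taken from the slides: X is Bernoulli(p) (discrete, hence "general") and Y given X = x is N(x, 1), so the proposition gives f(y) = \sum_x p(x) f(y|x), a two-component Gaussian mixture, which is compared against a histogram of samples of Y:

# A small simulation sketch of Proposition 10.24, assuming an example not
# taken from the slides: X ~ Bernoulli(p) and Y | X = x ~ N(x, 1), so that
# f(y) = sum_x p(x) f(y|x) is a two-component Gaussian mixture.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
p, N = 0.3, 100_000

X = rng.binomial(1, p, size=N)
Y = rng.normal(loc=X.astype(float), scale=1.0)

y_grid = np.linspace(-4.0, 5.0, 200)
f_y = (1 - p) * norm.pdf(y_grid, loc=0.0) + p * norm.pdf(y_grid, loc=1.0)

hist, edges = np.histogram(Y, bins=80, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
# The empirical density should track the mixture f(y) predicted above.
print(np.max(np.abs(np.interp(centers, y_grid, f_y) - hist)))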

Proposition 10.25 Let X and Y be jointly distributed random vectors where Y is continuous and is related to X through a conditional pdf f(y|x) defined for all x. Then f(y) exists and is given by

f(y) = \int f(y|x)\, dF(x).


Proof (of Proposition 10.24)

1. F_Y(y) = F_{XY}(\infty, y) = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{y} f_{Y|X}(v|x)\, dv \right] dF_X(x).

2. Since

\int_{-\infty}^{\infty} \int_{-\infty}^{y} f_{Y|X}(v|x)\, dv\, dF_X(x) = F_{XY}(\infty, y) = F_Y(y) \le 1,

f_{Y|X}(v|x) is absolutely integrable.

3. By Fubini's theorem, the order of integration in F_Y(y) can be exchanged, and so

F_Y(y) = \int_{-\infty}^{y} \underbrace{\left[ \int_{-\infty}^{\infty} f_{Y|X}(v|x)\, dF_X(x) \right]}_{g(v)}\, dv.

4. Then, by the fundamental theorem of calculus,

f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} \int_{-\infty}^{y} g(v)\, dv = g(y) = \int_{-\infty}^{\infty} f_{Y|X}(y|x)\, dF_X(x).


Mutual Information

Definition 10.26 Let

X (general) → f(y|x) → Y (continuous).

1. The mutual information between X and Y is defined as

I(X; Y) = E \log \frac{f(Y|X)}{f(Y)} = \int_{\mathcal{S}_X} \int_{\mathcal{S}_Y(x)} f(y|x) \log \frac{f(y|x)}{f(y)}\, dy\, dF(x).

2. When both X and Y are continuous and f(x, y) exists,

I(X; Y) = E \log \frac{f(Y|X)}{f(Y)} = E \log \frac{f(X, Y)}{f(X) f(Y)}.
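A standard closed-form instance of item 2 (not derived in these slides): if (X, Y) are jointly Gaussian with correlation coefficient \rho, substituting the bivariate Gaussian density into E \log [f(X, Y)/(f(X) f(Y))] gives

I(X; Y) = -\frac{1}{2} \log(1 - \rho^2),

which tends to 0 as \rho \to 0 and grows without bound as |\rho| \to 1.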

Remarks

• With Definition 10.26, the mutual information is defined when one r.v. is general and the other is continuous.
• In Ch. 2, the mutual information is defined when both r.v.'s are discrete.
• Thus the mutual information is defined when each of the r.v.'s is either discrete or continuous.

Conditional Mutual Information

Definition 10.27 Let

X (general), T (general) → f(y|x, t) → Y (continuous).

The mutual information between X and Y given T is defined as

I(X; Y|T) = \int_{\mathcal{S}_T} I(X; Y|T = t)\, dF(t) = E \log \frac{f(Y|X, T)}{f(Y|T)},

where

I(X; Y|T = t) = \int_{\mathcal{S}_X(t)} \int_{\mathcal{S}_Y(x,t)} f(y|x, t) \log \frac{f(y|x, t)}{f(y|t)}\, dy\, dF(x|t).

Interpretation of I(X;Y)

1. Assume f(x, y) exists and is continuous.

2. For a fixed \Delta > 0 and all integers i and j, define the intervals

A_x^i = [\,i\Delta, (i+1)\Delta\,) \quad\text{and}\quad A_y^j = [\,j\Delta, (j+1)\Delta\,),

and the rectangle A_{xy}^{i,j} = A_x^i \times A_y^j.

3. Define discrete r.v.'s \hat{X}_\Delta = i if X \in A_x^i, and \hat{Y}_\Delta = j if Y \in A_y^j.

4. \hat{X}_\Delta and \hat{Y}_\Delta are quantizations of X and Y, respectively.

5. For all i and j, let (x_i, y_j) \in A_x^i \times A_y^j.

6. Then

I(\hat{X}_\Delta; \hat{Y}_\Delta)
= \sum_i \sum_j \Pr\{(\hat{X}_\Delta, \hat{Y}_\Delta) = (i, j)\} \log \frac{\Pr\{(\hat{X}_\Delta, \hat{Y}_\Delta) = (i, j)\}}{\Pr\{\hat{X}_\Delta = i\} \Pr\{\hat{Y}_\Delta = j\}}
\approx \sum_i \sum_j f(x_i, y_j) \Delta^2 \log \frac{f(x_i, y_j) \Delta^2}{(f(x_i)\Delta)(f(y_j)\Delta)}
= \sum_i \sum_j f(x_i, y_j) \Delta^2 \log \frac{f(x_i, y_j)}{f(x_i) f(y_j)}
\approx \iint f(x, y) \log \frac{f(x, y)}{f(x) f(y)}\, dx\, dy
= I(X; Y).

7. Therefore, I(X; Y) can be interpreted as the limit of I(\hat{X}_\Delta; \hat{Y}_\Delta) as \Delta \to 0.

8. This interpretation continues to be valid for general distributions of X and Y.
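A rough numerical illustration of this quantization interpretation, assuming (X, Y) bivariate Gaussian with correlation \rho so that I(X; Y) = -\frac{1}{2}\log(1-\rho^2) is available for comparison; \rho, N and \Delta below are illustrative choices, and I(\hat{X}_\Delta; \hat{Y}_\Delta) is computed from a 2-D histogram with cell size \Delta:

# A rough numerical illustration of the quantization interpretation above,
# assuming (X, Y) bivariate Gaussian with correlation rho, for which
# I(X; Y) = -(1/2) log(1 - rho^2) is known in closed form.
import numpy as np

rng = np.random.default_rng(2)
rho, N, Delta = 0.8, 1_000_000, 0.1

cov = [[1.0, rho], [rho, 1.0]]
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=N).T

# Quantize: X_hat = i iff X lies in [i*Delta, (i+1)*Delta), likewise for Y_hat.
Xq = np.floor(X / Delta).astype(int)
Yq = np.floor(Y / Delta).astype(int)

# Empirical joint and marginal distributions of the quantized pair.
pairs, counts = np.unique(np.stack([Xq, Yq], axis=1), axis=0, return_counts=True)
p_xy = counts / N
xs, cx = np.unique(Xq, return_counts=True)
ys, cy = np.unique(Yq, return_counts=True)
px = dict(zip(xs, cx / N))
py = dict(zip(ys, cy / N))

I_hat = sum(pxy * np.log(pxy / (px[i] * py[j]))
            for (i, j), pxy in zip(map(tuple, pairs), p_xy))
print(I_hat, -0.5 * np.log(1 - rho**2))   # close for small Delta and large N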

Definition 10.28 Let

X (discrete) → f(y|x) → Y (continuous).

The conditional entropy of X given Y is defined as

H(X|Y) = H(X) - I(X; Y).

Proposition 10.29 For two random variables X and Y,

1. h(Y) = h(Y|X) + I(X; Y) if Y is continuous;
2. H(Y) = H(Y|X) + I(X; Y) if Y is discrete.

Proposition 10.30 (Chain Rule)

h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} h(X_i \mid X_1, \ldots, X_{i-1}).
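A sketch of Definition 10.28 and Proposition 10.29(1) for one concrete channel, assuming (an example not from the slides) that X is equiprobable on \{-1, +1\} and Y = X + Z with Z \sim N(0, \sigma^2); I(X; Y) = h(Y) - h(Y|X) is obtained by numerical integration, with natural logarithms, so the results are in nats:

# A sketch of Definition 10.28 and Proposition 10.29(1) for one concrete
# channel, assuming X equiprobable on {-1, +1} and Y = X + Z, Z ~ N(0, sigma^2).
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

sigma = 1.0

def f_y(y):
    # f(y) = sum_x p(x) f(y|x), as in Proposition 10.24.
    return 0.5 * (norm.pdf(y, loc=-1.0, scale=sigma) + norm.pdf(y, loc=+1.0, scale=sigma))

h_Y, _ = quad(lambda y: -f_y(y) * np.log(f_y(y)), -np.inf, np.inf)
h_Y_given_X = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # Y|X=x Gaussian (Thm 10.20, n = 1)

I_XY = h_Y - h_Y_given_X          # Proposition 10.29(1)
H_X = np.log(2.0)                 # entropy of the equiprobable binary input
H_X_given_Y = H_X - I_XY          # Definition 10.28
print(I_XY, H_X_given_Y)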

Theorem 10.31 I(X; Y) \ge 0, with equality if and only if X is independent of Y.

Corollary 10.32 I(X; Y|T) \ge 0, with equality if and only if X is independent of Y conditioning on T.

Corollary 10.33 (Conditioning Does Not Increase Differential Entropy)

h(X|Y) \le h(X),

with equality if and only if X and Y are independent.

Remarks For continuous r.v.'s,

1. h(X), h(X|Y) \ge 0 DO NOT generally hold;
2. I(X; Y), I(X; Y|Z) \ge 0 always hold.
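A concrete instance of the first remark (a standard example, not from the slides): for X uniform on [0, a],

h(X) = -\int_0^a \frac{1}{a} \log \frac{1}{a}\, dx = \log a,

which is negative whenever a < 1.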

Corollary 10.34 (Independence Bound for Differential Entropy)

h(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} h(X_i),

with equality if and only if X_i, i = 1, 2, \ldots, n, are mutually independent.
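A worked instance of Corollary 10.34 (not from the slides): for (X_1, X_2) jointly Gaussian with unit variances and correlation \rho, Theorem 10.20 with n = 2 gives

h(X_1, X_2) = \frac{1}{2}\log\left[(2\pi e)^2 (1 - \rho^2)\right] = h(X_1) + h(X_2) + \frac{1}{2}\log(1 - \rho^2) \le h(X_1) + h(X_2),

with equality if and only if \rho = 0, i.e., if and only if X_1 and X_2 are independent.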