10.3 Joint Differential Entropy, Conditional (Differential) Entropy, and Mutual Information

10.3 Joint Differential Entropy, Conditional (Differential) Entropy, and Mutual Information Definition 10.17 The joint di↵erential entropy h(X) of a random vector X with joint pdf f(x) is defined as h(X)= f(x) log f(x)dx = E log f(X). − − ZS Corollary If X ,X , ,X are mutually independent, then 1 2 ··· n n h(X)= h(Xi). i=1 X Theorem 10.18 (Translation) h(X + c)=h(X). Theorem 10.19 (Scaling) h(AX)=h(X) + log det(A) . | | Definition 10.17 The joint di↵erential entropy h(X) of a random vector X with joint pdf f(x) is defined as h(X)= f(x) log f(x)dx = E log f(X). − − ZS Corollary If X ,X , ,X are mutually independent, then 1 2 ··· n n h(X)= h(Xi). i=1 X Theorem 10.18 (Translation) h(X + c)=h(X). Theorem 10.19 (Scaling) h(AX)=h(X) + log det(A) . | | Definition 10.17 The joint di↵erential entropy h(X) of a random vector X with joint pdf f(x) is defined as h(X)= f(x) log f(x)dx = E log f(X). − − ZS Corollary If X ,X , ,X are mutually independent, then 1 2 ··· n n h(X)= h(Xi). i=1 X Theorem 10.18 (Translation) h(X + c)=h(X). Theorem 10.19 (Scaling) h(AX)=h(X) + log det(A) . | | Definition 10.17 The joint di↵erential entropy h(X) of a random vector X with joint pdf f(x) is defined as h(X)= f(x) log f(x)dx = E log f(X). − − ZS Corollary If X ,X , ,X are mutually independent, then 1 2 ··· n n h(X)= h(Xi). i=1 X Theorem 10.18 (Translation) h(X + c)=h(X). Theorem 10.19 (Scaling) h(AX)=h(X) + log det(A) . | | Theorem 10.20 (Multivariate Gaussian Distribution) Let X (µ,K). Then ∼ N 1 h(X)= log [(2πe)n K ] . 2 | | Theorem 10.20 (Multivariate Gaussian Distribu- 5. Now consider tion) Let X (µ,K). Then ⇠N h(X)=h(QY) 1 n = h(Y)+log det(Q) Theorem 10.19 h(X)= log (2⇡e) K . | | 2 | | h i = h(Y)+0 det(Q)= 1 ± n Proof = h(Yi) 1. Let K be diagonalizable as Q⇤Q>. iX=1 2. Write X = QY, where the random variables in Y n 1 are uncorrelated with var Yi = λi, the ith diagonal = log(2⇡eλi) 2 element of ⇤(cf. Corollary 10.7). iX=1 3. Since X is Gaussian, so is Y. n 1 n = log (2⇡e) λi 4. Then the random variables in Y are mutually inde- 2 2 3 pendent because they are uncorrelated. iY=1 4 5 1 n = log[(2⇡e) ⇤ ] 2 | | 1 n = log[(2⇡e) K ]. 2 | | K = Q⇤Q> K = Q ⇤ Q> | | | || || | 2 = Q ⇤ | | | | = ⇤ | | Theorem 10.20 (Multivariate Gaussian Distribu- 5. Now consider tion) Let X (µ,K). Then ⇠N h(X)=h(QY) 1 n = h(Y)+log det(Q) Theorem 10.19 h(X)= log (2⇡e) K . | | 2 | | h i = h(Y)+0 det(Q)= 1 ± n Proof = h(Yi) 1. Let K be diagonalizable as Q⇤Q>. iX=1 2. Write X = QY, where the random variables in Y n 1 are uncorrelated with var Yi = λi, the ith diagonal = log(2⇡eλi) 2 element of ⇤(cf. Corollary 10.7). iX=1 3. Since X is Gaussian, so is Y. n 1 n = log (2⇡e) λi 4. Then the random variables in Y are mutually inde- 2 2 3 pendent because they are uncorrelated. iY=1 4 5 1 n = log[(2⇡e) ⇤ ] 2 | | 1 n = log[(2⇡e) K ]. 2 | | K = Q⇤Q> K = Q ⇤ Q> | | | || || | 2 = Q ⇤ | | | | = ⇤ | | Theorem 10.20 (Multivariate Gaussian Distribu- 5. Now consider tion) Let X (µ,K). Then ⇠N h(X)=h(QY) 1 n = h(Y)+log det(Q) Theorem 10.19 h(X)= log (2⇡e) K . | | 2 | | h i = h(Y)+0 det(Q)= 1 ± n Proof = h(Yi) 1. Let K be diagonalizable as Q⇤Q>. iX=1 2. Write X = QY, where the random variables in Y n 1 are uncorrelated with var Yi = λi, the ith diagonal = log(2⇡eλi) 2 element of ⇤(cf. Corollary 10.7). iX=1 3. Since X is Gaussian, so is Y. n 1 n = log (2⇡e) λi 4. Then the random variables in Y are mutually inde- 2 2 3 pendent because they are uncorrelated. iY=1 4 5 1 n = log[(2⇡e) ⇤ ] 2 | | 1 n = log[(2⇡e) K ]. 2 | | K = Q⇤Q> K = Q ⇤ Q> | | | || || | 2 = Q ⇤ | | | | = ⇤ | | Theorem 10.20 (Multivariate Gaussian Distribu- 5. Now consider tion) Let X (µ,K). Then ⇠N h(X)=h(QY) 1 n = h(Y)+log det(Q) Theorem 10.19 h(X)= log (2⇡e) K . | | 2 | | h i = h(Y)+0 det(Q)= 1 ± n Proof = h(Yi) 1. Let K be diagonalizable as Q⇤Q>. iX=1 2. Write X = QY, where the random variables in Y n 1 are uncorrelated with var Yi = λi, the ith diagonal = log(2⇡eλi) 2 element of ⇤(cf. Corollary 10.7). iX=1 3. Since X is Gaussian, so is Y. n 1 n = log (2⇡e) λi 4. Then the random variables in Y are mutually inde- 2 2 3 pendent because they are uncorrelated. iY=1 4 5 1 n = log[(2⇡e) ⇤ ] 2 | | 1 n = log[(2⇡e) K ]. 2 | | K = Q⇤Q> K = Q ⇤ Q> | | | || || | 2 = Q ⇤ | | | | = ⇤ | | Theorem 10.20 (Multivariate Gaussian Distribu- 5. Now consider tion) Let X (µ,K). Then ⇠N h(X)=h(QY) 1 n = h(Y)+log det(Q) Theorem 10.19 h(X)= log (2⇡e) K . | | 2 | | h i = h(Y)+0 det(Q)= 1 ± n Proof = h(Yi) 1. Let K be diagonalizable as Q⇤Q>. iX=1 2. Write X = QY, where the random variables in Y n 1 are uncorrelated with var Yi = λi, the ith diagonal = log(2⇡eλi) 2 element of ⇤(cf. Corollary 10.7). iX=1 3. Since X is Gaussian, so is Y. n 1 n = log (2⇡e) λi 4. Then the random variables in Y are mutually inde- 2 2 3 pendent because they are uncorrelated. iY=1 4 5 1 n = log[(2⇡e) ⇤ ] 2 | | 1 n = log[(2⇡e) K ]. 2 | | K = Q⇤Q> K = Q ⇤ Q> | | | || || | 2 = Q ⇤ | | | | = ⇤ | | Theorem 10.20 (Multivariate Gaussian Distribu- 5. Now consider tion) Let X (µ,K). Then ⇠N h(X)=h(QY) 1 n = h(Y)+log det(Q) h(X)= log (2⇡e) K . | | 2 | | h i = h(Y)+0 n Proof = h(Yi) 1. Let K be diagonalizable as Q⇤Q>. iX=1 2. Write X = QY, where the random variables in Y n 1 are uncorrelated with var Yi = λi, the ith diagonal = log(2⇡eλi) 2 element of ⇤(cf. Corollary 10.7). iX=1 3. Since X is Gaussian, so is Y. n 1 n = log (2⇡e) λi 4. Then the random variables in Y are mutually inde- 2 2 3 pendent because they are uncorrelated. iY=1 4 5 1 n = log[(2⇡e) ⇤ ] 2 | | 1 n = log[(2⇡e) K ]. 2 | | Theorem 10.20 (Multivariate Gaussian Distribu- 5. Now consider tion) Let X (µ,K). Then ⇠N h(X)=h(QY) 1 n = h(Y)+log det(Q) h(X)= log (2⇡e) K . | | 2 | | h i = h(Y)+0 n Proof = h(Yi) 1. Let K be diagonalizable as Q⇤Q>. iX=1 2. Write X = QY, where the random variables in Y n 1 are uncorrelated with var Yi = λi, the ith diagonal = log(2⇡eλi) 2 element of ⇤(cf. Corollary 10.7). iX=1 3. Since X is Gaussian, so is Y. n 1 n = log (2⇡e) λi 4. Then the random variables in Y are mutually inde- 2 2 3 pendent because they are uncorrelated. iY=1 4 5 1 n = log[(2⇡e) ⇤ ] 2 | | 1 n = log[(2⇡e) K ]. 2 | | Theorem 10.20 (Multivariate Gaussian Distribu- 5. Now consider tion) Let X (µ,K). Then ⇠N h(X)=h(QY) 1 n = h(Y)+log det(Q) h(X)= log (2⇡e) K . | | 2 | | h i = h(Y)+0 n Proof = h(Yi) 1. Let K be diagonalizable as Q⇤Q>. iX=1 2. Write X = QY, where the random variables in Y n 1 are uncorrelated with var Yi = λi, the ith diagonal = log(2⇡eλi) 2 element of ⇤(cf. Corollary 10.7). iX=1 3. Since X is Gaussian, so is Y. n 1 n = log (2⇡e) λi 4. Then the random variables in Y are mutually inde- 2 2 3 pendent because they are uncorrelated. iY=1 4 5 1 n = log[(2⇡e) ⇤ ] 2 | | 1 n = log[(2⇡e) K ]. 2 | | Theorem 10.20 (Multivariate Gaussian Distribu- 5. Now consider tion) Let X (µ,K). Then ⇠N h(X)=h(QY) 1 n = h(Y)+log det(Q) Theorem 10.19 h(X)= log (2⇡e) K . | | 2 | | h i = h(Y)+0 n Proof = h(Yi) 1. Let K be diagonalizable as Q⇤Q>. iX=1 2. Write X = QY, where the random variables in Y n 1 are uncorrelated with var Yi = λi, the ith diagonal = log(2⇡eλi) 2 element of ⇤(cf. Corollary 10.7). iX=1 3. Since X is Gaussian, so is Y. n 1 n = log (2⇡e) λi 4. Then the random variables in Y are mutually inde- 2 2 3 pendent because they are uncorrelated.

10.3 Joint Differential Entropy, Conditional (Differential) Entropy, and Mutual Information

Distribution of Mutual Information

Lecture 3: Entropy, Relative Entropy, and Mutual Information 1 Notation 2

Package 'Infotheo'

Arxiv:1907.00325V5 [Cs.LG] 25 Aug 2020

GAIT: a Geometric Approach to Information Theory

On a General Definition of Conditional Rényi Entropies

Noisy Channel Coding

A Statistical Framework for Neuroimaging Data Analysis Based on Mutual Information Estimated Via a Gaussian Copula

Lecture 11: Channel Coding Theorem: Converse Part 1 Recap

Information Theory 1 Entropy 2 Mutual Information

Information Theory and Maximum Entropy 8.1 Fundamentals of Information Theory

Fundamentals of Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Independent Vector Analysis (IVA)