
2.2 Shannon's Information Measures

• Entropy
• Conditional entropy
• Mutual information
• Conditional mutual information
Definition 2.13 The entropy $H(X)$ of a random variable $X$ is defined as

    H(X) = -\sum_x p(x) \log p(x).

• Convention: the summation is taken over $S_X$.
• When the base of the logarithm is $\alpha$, write $H(X)$ as $H_\alpha(X)$.
• Entropy measures the uncertainty of a discrete random variable.
• The unit for entropy is
    - bit if $\alpha = 2$
    - nat if $\alpha = e$
    - D-it if $\alpha = D$
• A bit in information theory is different from a bit in computer science.
Remark $H(X)$ depends only on the distribution of $X$ but not on the actual values taken by $X$; hence we also write $H(p_X)$.

Example Let $X$ and $Y$ be random variables with $S_X = S_Y = \{0, 1\}$, and let

    p_X(0) = 0.3, \quad p_X(1) = 0.7 \qquad \text{and} \qquad p_Y(0) = 0.7, \quad p_Y(1) = 0.3.

Although $p_X \neq p_Y$, $H(X) = H(Y)$.
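As a minimal numerical sketch of Definition 2.13 and the example above (in Python; the helper name `entropy` and the probabilities-as-a-list input are illustrative choices):

```python
import math

def entropy(p, base=2):
    """H(X) = -sum_x p(x) log p(x), skipping zero-probability terms (0 log 0 = 0)."""
    return -sum(px * math.log(px, base) for px in p if px > 0)

# The example above: p_X and p_Y are different distributions on {0, 1},
# yet they have the same entropy, since H depends only on the probabilities used.
print(entropy([0.3, 0.7]))  # about 0.881 bits
print(entropy([0.7, 0.3]))  # same value, so H(X) = H(Y)
```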

Entropy as Expectation

• Convention:

    E g(X) = \sum_x p(x) g(x),

  where the summation is over $S_X$.

• Linearity:

    E[f(X) + g(X)] = E f(X) + E g(X).

  See Problem 5 for details.

• We can write

    H(X) = -E \log p(X) = -\sum_x p(x) \log p(x).

• In probability theory, when $E g(X)$ is considered, usually $g(x)$ depends only on the value of $x$ but not on $p(x)$.
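The expectation view can also be illustrated by estimating $H(X) = -E \log p(X)$ as a sample average of $-\log_2 p(X)$; the sketch below is illustrative only, and the distribution and sample size are arbitrary choices.

```python
import math
import random

pmf = {0: 0.3, 1: 0.7}  # the running example distribution

# Exact value from the definition.
h_exact = -sum(p * math.log2(p) for p in pmf.values())

# Expectation view: H(X) = E[-log2 p(X)], estimated by a sample average.
random.seed(0)
samples = random.choices(list(pmf), weights=list(pmf.values()), k=100_000)
h_estimate = sum(-math.log2(pmf[x]) for x in samples) / len(samples)

print(h_exact, h_estimate)  # the estimate is close to the exact ~0.881 bits
```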

Binary Entropy Function

• For $0 \le \gamma \le 1$, define the binary entropy function

    h_b(\gamma) = -\gamma \log \gamma - (1 - \gamma) \log(1 - \gamma),

  with the convention $0 \log 0 = 0$, since by L'Hopital's rule,

    \lim_{a \to 0} a \log a = 0.

• For $X \sim \{\gamma, 1 - \gamma\}$,

    H(X) = h_b(\gamma).

• $h_b(\gamma)$ achieves the maximum value 1 when $\gamma = \frac{1}{2}$ (see the numerical check below).

[Figure: plot of the binary entropy function $h_b(\gamma)$ versus $\gamma$.]
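A quick numerical check of the properties stated above (a sketch in base 2; the grid resolution is an arbitrary choice):

```python
import math

def h_b(g):
    """Binary entropy function h_b(gamma) in bits, with the convention 0 log 0 = 0."""
    if g in (0.0, 1.0):
        return 0.0
    return -g * math.log2(g) - (1 - g) * math.log2(1 - g)

print(h_b(0.0), h_b(1.0))   # 0.0 and 0.0: a deterministic outcome carries no uncertainty
print(h_b(0.5))             # 1.0: the maximum, attained at gamma = 1/2
print(h_b(0.1), h_b(0.9))   # equal, and strictly between 0 and 1
print(max(h_b(k / 1000) for k in range(1001)))  # 1.0, again at gamma = 1/2
```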

Interpretation

Consider tossing a coin with

    p(H) = \gamma \quad \text{and} \quad p(T) = 1 - \gamma.

Then $h_b(\gamma)$ measures the amount of uncertainty in the outcome of the toss.

• When $\gamma = 0$ or 1, the coin is deterministic and $h_b(\gamma) = 0$. This is consistent with our intuition because for such cases we need 0 bits to convey the outcome.

• When $\gamma = 0.5$, the coin is fair and $h_b(\gamma) = 1$. This is consistent with our intuition because we need 1 bit to convey the outcome.

• When $\gamma \notin \{0, 0.5, 1\}$, $0 < h_b(\gamma) < 1$.

Definition 2.14 The joint entropy $H(X, Y)$ of a pair of random variables $X$ and $Y$ is defined as

    H(X, Y) = -\sum_{x, y} p(x, y) \log p(x, y) = -E \log p(X, Y).

Definition 2.15 For random variables $X$ and $Y$, the conditional entropy of $Y$ given $X$ is defined as

    H(Y|X) = -\sum_{x, y} p(x, y) \log p(y|x) = -E \log p(Y|X).

• Write

    H(Y|X) = -\sum_{x, y} p(x, y) \log p(y|x)
           = -\sum_x \sum_y p(x) p(y|x) \log p(y|x)
           = \sum_x p(x) \left[ -\sum_y p(y|x) \log p(y|x) \right].

• The inner sum is the entropy of $Y$ conditioning on a fixed $x \in S_X$.

• Denoting the inner sum by $H(Y|X = x)$, we have

    H(Y|X) = \sum_x p(x) H(Y|X = x).

• Similarly,

    H(Y|X, Z) = \sum_z p(z) H(Y|X, Z = z),

  where

    H(Y|X, Z = z) = -\sum_{x, y} p(x, y|z) \log p(y|x, z).

  (See the numerical sketch below.)
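The sketch below illustrates the two expressions for $H(Y|X)$, the defining sum and the weighted sum $\sum_x p(x) H(Y|X = x)$; the joint pmf values are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# A hypothetical joint pmf p(x, y) on {0, 1} x {0, 1}, chosen only for illustration.
p_xy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

# Marginal p(x).
p_x = defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p

# Definition 2.15: H(Y|X) = -sum_{x,y} p(x, y) log p(y|x), with p(y|x) = p(x, y) / p(x).
h_direct = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items())

# Weighted form: H(Y|X) = sum_x p(x) H(Y|X = x).
def h_given(x0):
    cond = [p / p_x[x0] for (x, y), p in p_xy.items() if x == x0]
    return -sum(q * math.log2(q) for q in cond if q > 0)

h_weighted = sum(p_x[x] * h_given(x) for x in p_x)

print(h_direct, h_weighted)  # the two expressions agree
```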

Proposition 2.16

    H(X, Y) = H(X) + H(Y|X)

and

    H(X, Y) = H(Y) + H(X|Y).

Proof Consider

    H(X, Y) = -E \log p(X, Y)
            = -E \log [p(X) p(Y|X)]
            = -E \log p(X) - E \log p(Y|X)
            = H(X) + H(Y|X).
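As an illustration of Proposition 2.16, the chain rule can be checked numerically on the same hypothetical joint pmf (a sketch, not a derivation):

```python
import math
from collections import defaultdict

def H(dist):
    """Entropy in bits of a pmf given as a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# The same hypothetical joint pmf as in the previous sketch.
p_xy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p

# Conditional entropies computed directly from Definition 2.15.
h_y_given_x = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items())
h_x_given_y = -sum(p * math.log2(p / p_y[y]) for (x, y), p in p_xy.items())

# Proposition 2.16: both decompositions give H(X, Y).
print(H(p_xy))                 # joint entropy
print(H(p_x) + h_y_given_x)    # H(X) + H(Y|X), same value
print(H(p_y) + h_x_given_y)    # H(Y) + H(X|Y), same value
```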

Definition 2.17 For random variables $X$ and $Y$, the mutual information between $X$ and $Y$ is defined as

    I(X; Y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)} = E \log \frac{p(X, Y)}{p(X) p(Y)}.

Remark $I(X; Y)$ is symmetrical in $X$ and $Y$.

Remark Alternatively, we can write

    I(X; Y) = \sum_{x, y} p(x, y) \log \frac{p(x|y)}{p(x)} = E \log \frac{p(X|Y)}{p(X)}.

However, it is not apparent from this form that $I(X; Y)$ is symmetrical in $X$ and $Y$.
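A minimal sketch of Definition 2.17 on the same hypothetical joint pmf, also confirming numerically that the alternative form in the remark above gives the same value:

```python
import math
from collections import defaultdict

p_xy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}  # hypothetical joint pmf

p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p

# Definition 2.17: I(X;Y) = sum_{x,y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]
i_def = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

# Alternative form from the remark: I(X;Y) = sum_{x,y} p(x,y) log [ p(x|y) / p(x) ]
i_alt = sum(p * math.log2((p / p_y[y]) / p_x[x]) for (x, y), p in p_xy.items())

print(i_def, i_alt)  # the two forms agree
```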

Proposition 2.18 The mutual information between a random variable $X$ and itself is equal to the entropy of $X$, i.e., $I(X; X) = H(X)$.

Proof

    I(X; Y) = E \log \frac{p(X, Y)}{p(X) p(Y)}

    I(X; X) = E \log \frac{p(X, X)}{p(X) p(X)}
            = E \log \frac{p(X)}{p(X) p(X)}
            = -E \log p(X)
            = H(X).

Remark The entropy of $X$ is sometimes called the self-information of $X$.

Proposition 2.19

    I(X; Y) = H(X) - H(X|Y),
    I(X; Y) = H(Y) - H(Y|X),

and

    I(X; Y) = H(X) + H(Y) - H(X, Y),

provided that all the entropies and conditional entropies are finite. (Exercise)

Remark $I(X; Y) = H(X) + H(Y) - H(X, Y)$ is analogous to

    \mu(A \cap B) = \mu(A) + \mu(B) - \mu(A \cup B),

where $\mu$ is a set-additive function and $A$ and $B$ are sets.
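Since Proposition 2.19 is left as an exercise, a quick numerical check may be helpful; the sketch below reuses the hypothetical joint pmf from the earlier sketches and obtains conditional entropies via Proposition 2.16.

```python
import math
from collections import defaultdict

def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

p_xy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}  # hypothetical joint pmf
p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p

# I(X;Y) from Definition 2.17.
i_def = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

# Conditional entropies via Proposition 2.16: H(X|Y) = H(X,Y) - H(Y), H(Y|X) = H(X,Y) - H(X).
h_x, h_y, h_xy = H(p_x), H(p_y), H(p_xy)

print(i_def)                 # I(X;Y) from the definition
print(h_x - (h_xy - h_y))    # H(X) - H(X|Y)
print(h_y - (h_xy - h_x))    # H(Y) - H(Y|X)
print(h_x + h_y - h_xy)      # H(X) + H(Y) - H(X,Y); all four values coincide
```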

Information Diagram

[Figure: information diagram for $X$ and $Y$, with regions labelled $H(X)$, $H(Y)$, $H(X|Y)$, $H(Y|X)$, and $I(X; Y)$; the whole diagram represents $H(X, Y)$.]

Definition 2.20 For random variables $X$, $Y$ and $Z$, the mutual information between $X$ and $Y$ conditioning on $Z$ is defined as

    I(X; Y|Z) = \sum_{x, y, z} p(x, y, z) \log \frac{p(x, y|z)}{p(x|z) p(y|z)} = E \log \frac{p(X, Y|Z)}{p(X|Z) p(Y|Z)}.

Remark $I(X; Y|Z)$ is symmetrical in $X$ and $Y$.

Similar to entropy, we have

    I(X; Y|Z) = \sum_z p(z) I(X; Y|Z = z),

where

    I(X; Y|Z = z) = \sum_{x, y} p(x, y|z) \log \frac{p(x, y|z)}{p(x|z) p(y|z)}.
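A sketch of Definition 2.20 and the decomposition above, using a hypothetical joint pmf $p(x, y, z)$ whose values are arbitrary and chosen only for illustration:

```python
import math
from collections import defaultdict

# Hypothetical joint pmf p(x, y, z) on {0, 1}^3, for illustration only.
p_xyz = {
    (0, 0, 0): 0.10, (0, 1, 0): 0.15, (1, 0, 0): 0.05, (1, 1, 0): 0.20,
    (0, 0, 1): 0.20, (0, 1, 1): 0.05, (1, 0, 1): 0.15, (1, 1, 1): 0.10,
}

p_z, p_xz, p_yz = defaultdict(float), defaultdict(float), defaultdict(float)
for (x, y, z), p in p_xyz.items():
    p_z[z] += p
    p_xz[(x, z)] += p
    p_yz[(y, z)] += p

def term(x, y, z, p):
    """p(x,y,z) log [ p(x,y|z) / (p(x|z) p(y|z)) ], conditionals obtained by dividing by p(z)."""
    pxy_z = p / p_z[z]
    px_z = p_xz[(x, z)] / p_z[z]
    py_z = p_yz[(y, z)] / p_z[z]
    return p * math.log2(pxy_z / (px_z * py_z))

# Definition 2.20.
i_def = sum(term(x, y, z, p) for (x, y, z), p in p_xyz.items())

# Decomposition: I(X;Y|Z) = sum_z p(z) I(X;Y|Z = z).
def i_given(z0):
    return sum(term(x, y, z, p) / p_z[z0] for (x, y, z), p in p_xyz.items() if z == z0)

i_sum = sum(p_z[z] * i_given(z) for z in p_z)
print(i_def, i_sum)  # the two agree
```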

Proposition 2.21 The mutual information between a random variable $X$ and itself conditioning on a random variable $Z$ is equal to the conditional entropy of $X$ given $Z$, i.e., $I(X; X|Z) = H(X|Z)$. (Proposition 2.18)

Proposition 2.22

    I(X; Y|Z) = H(X|Z) - H(X|Y, Z),
    I(X; Y|Z) = H(Y|Z) - H(Y|X, Z),

and

    I(X; Y|Z) = H(X|Z) + H(Y|Z) - H(X, Y|Z),

provided that all the conditional entropies are finite. (Proposition 2.19)
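A numerical check of the last identity in Proposition 2.22, reusing the hypothetical pmf from the previous sketch; conditional entropies are obtained from Proposition 2.16 in the form $H(A|Z) = H(A, Z) - H(Z)$.

```python
import math
from collections import defaultdict

def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Same hypothetical joint pmf p(x, y, z) as before.
p_xyz = {
    (0, 0, 0): 0.10, (0, 1, 0): 0.15, (1, 0, 0): 0.05, (1, 1, 0): 0.20,
    (0, 0, 1): 0.20, (0, 1, 1): 0.05, (1, 0, 1): 0.15, (1, 1, 1): 0.10,
}

p_z, p_xz, p_yz = defaultdict(float), defaultdict(float), defaultdict(float)
for (x, y, z), p in p_xyz.items():
    p_z[z] += p
    p_xz[(x, z)] += p
    p_yz[(y, z)] += p

# Conditional entropies via H(A|Z) = H(A, Z) - H(Z).
h_x_given_z = H(p_xz) - H(p_z)
h_y_given_z = H(p_yz) - H(p_z)
h_xy_given_z = H(p_xyz) - H(p_z)

# I(X;Y|Z) from Definition 2.20.
i_def = sum(
    p * math.log2((p / p_z[z]) / ((p_xz[(x, z)] / p_z[z]) * (p_yz[(y, z)] / p_z[z])))
    for (x, y, z), p in p_xyz.items()
)

print(i_def)                                      # I(X;Y|Z)
print(h_x_given_z + h_y_given_z - h_xy_given_z)   # H(X|Z) + H(Y|Z) - H(X,Y|Z), same value
```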

Remark All Shannon's information measures are special cases of conditional mutual information. Let $\Phi$ denote a random variable that takes a constant value. Then

    H(X) = I(X; X|\Phi),
    H(X|Z) = I(X; X|Z),
    I(X; Y) = I(X; Y|\Phi).