
Entropy and Mutual Information (Discrete Random Variables)

Máster Universitario en Ingeniería de Telecomunicación

I. Santamaría, Universidad de Cantabria

Contents

Introduction

Entropy

Joint Entropy and Conditional Entropy

Relative Entropy

Mutual Information

Jensen’s inequality


Entropy and mutual information are key concepts in information theory

- Entropy
  - The entropy H(X) of a random variable X gives us the fundamental limit for data compression
  - A source producing i.i.d. realizations of X can be compressed up to H(X) bits/realization
  - The entropy is the average length of the shortest description of X

- Mutual information
  - The mutual information gives us the fundamental limit for reliable transmission
  - The capacity of a channel is given by

      C = max_{p(x)} I(X; Y)


Definitions

Let X be a discrete random variable (r.v.) that takes values x ∈ X. The probability mass function (pmf) of X will be denoted by

p(x) = Pr{X = x}

Example 1: Bernoulli r.v. X = {0, 1}

Pr{X = 1} = p, Pr{X = 0} = 1 − p

Example 2: Binomial r.v. X ∼ B(n, p) (X = {0, 1, ..., n})

    Pr{X = k} = (n choose k) p^k (1 − p)^(n−k)

- Note that X ∼ B(1, p) is a Bernoulli r.v.

- If X1, ..., Xn are independent B(1, p), then Y = Σ_{k=1}^{n} Xk ∼ B(n, p)
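As a quick numerical illustration (a sketch added here, not part of the original slides; the function name binomial_pmf and the values n = 5, p = 0.3 are arbitrary choices), the following Python snippet checks empirically that a sum of independent Bernoulli(p) draws follows B(n, p):

```python
import math
import random

def binomial_pmf(n, p, k):
    """Exact binomial probability Pr{X = k} for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Empirical check: the sum of n independent Bernoulli(p) draws behaves like B(n, p)
n, p, trials = 5, 0.3, 200_000
random.seed(0)
counts = [0] * (n + 1)
for _ in range(trials):
    y = sum(1 if random.random() < p else 0 for _ in range(n))  # Y = X1 + ... + Xn
    counts[y] += 1

for k in range(n + 1):
    print(f"k={k}: empirical {counts[k] / trials:.4f}   exact {binomial_pmf(n, p, k):.4f}")
```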


Information

For a discrete random variable X, the information of an outcome X = x is i(x) = −log(p(x)), and it is measured in bits¹

- Why does the log of a probability measure information? Not a simple question, but there are strong reasons (see the numerical sketch after this list):
  - Less probable outcomes are the most informative ones
  - For independent random variables: i(x, y) = −log(p(x, y)) = −log(p(x)p(y)) = i(x) + i(y)
  - The outcome of a random experiment is most informative if the pmf is uniform → experiment design, guessing games, ...
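The numerical sketch below (ours, not from the slides) evaluates i(x) = −log2 p(x) for a few probabilities and verifies the additivity property for independent outcomes:

```python
import math

def info(p):
    """Information content, in bits, of an outcome with probability p."""
    return -math.log2(p)

# Less probable outcomes are more informative
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p = {p:<4}  ->  i = {info(p):.2f} bits")

# Additivity for independent outcomes: i(x, y) = i(x) + i(y)
px, py = 0.25, 0.1
print(info(px * py), "==", info(px) + info(py))   # both are about 5.32 bits
```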

¹ Unless otherwise indicated, in this course log denotes the logarithm in base 2.

If X is Bernoulli with Pr{X = 1} = p, the information of the outcome X = 1 is i = − log(p)

[Figure: i = −log(p) plotted as a function of p ∈ (0, 1]]

The information is always a non-negative quantity


Entropy

Definition: The entropy of a discrete random variable X is defined by²

    H(X) = −E[log p(X)] = − Σ_{x∈X} p(x) log(p(x))

- For discrete random variables, H(X) ≥ 0

- It is the average information of the random variable X:

      H(X) = E[i(X)],

  note that this can also be interpreted as the mean value of a new (transformed) r.v. Y = −log(p(X))
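A minimal helper (our own sketch; the name entropy is not from the slides) that computes H(X) directly from a pmf given as a list of probabilities:

```python
import math

def entropy(pmf):
    """H(X) = -sum_x p(x) log2 p(x), using the convention 0 log(0) = 0."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit (fair coin)
print(entropy([0.9, 0.1]))    # about 0.469 bits (biased coin, less uncertainty)
print(entropy([0.25] * 4))    # 2.0 bits (uniform over 4 values)
```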

² We assume by convention that 0 log(0) = 0.

Interpretation

- Entropy is the measure of average uncertainty in X

- Entropy is the average number of bits needed to describe X

- Entropy is a lower bound on the average length of the shortest description of X


Example 1: Entropy of a Bernoulli r.v. with parameter p

    H(X) = −p log(p) − (1 − p) log(1 − p) ≜ H(p)

[Figure: the binary entropy function H(p) for p ∈ [0, 1]]

It is a concave function of p
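As a numerical sketch of this plot (ours, with arbitrarily chosen sample points), the binary entropy can be tabulated to see that it vanishes at the endpoints and peaks at 1 bit for p = 0.5:

```python
import math

def binary_entropy(p):
    """H(p) = -p log2(p) - (1 - p) log2(1 - p), with the convention 0 log(0) = 0."""
    return sum(-q * math.log2(q) for q in (p, 1 - p) if 0 < q < 1)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:<4}  H(p) = {binary_entropy(p):.3f} bits")
```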


Example 2: Entropy of a uniform r.v. taking on K values, e.g., X = {1, ..., K}

    H(X) = − Σ_{x∈X} p(x) log(p(x)) = Σ_{i=1}^{K} (1/K) log(K) = log(K)

- It does not depend on the values that X takes, only on their probabilities (X and X + a have the same entropy!)

- Property: For an arbitrary discrete r.v., H(X) ≤ log(|X|), where |X| denotes the cardinality of the alphabet, and H(X) = log(|X|) iff X has a uniform distribution over X. That is to say, H(X) is a lower bound on the number of binary questions (bits) that are always guaranteed to identify an outcome from the ensemble X (see the sketch below)
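The sketch below (ours; K = 8 and the randomly generated pmfs are arbitrary assumptions) illustrates the bound H(X) ≤ log(|X|) numerically:

```python
import math
import random

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

K = 8
random.seed(1)
# Randomly generated pmfs over K values never exceed the uniform entropy log2(K) = 3 bits
for _ in range(5):
    w = [random.random() for _ in range(K)]
    pmf = [x / sum(w) for x in w]
    print(f"H(X) = {entropy(pmf):.3f} bits  <=  log2(K) = {math.log2(K):.3f} bits")
```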


Example 3: The entropy of English

- Probabilities estimated from The Frequently Asked Questions Manual for Linux (table reprinted from MacKay's textbook)

- 27 symbols: the 26 letters (a-z) and a space character '-'

      i   ai   pi          i   ai   pi          i   ai   pi
      1   a    0.0575      10  j    0.0006      19  s    0.0567
      2   b    0.0128      11  k    0.0084      20  t    0.0706
      3   c    0.0263      12  l    0.0335      21  u    0.0334
      4   d    0.0285      13  m    0.0235      22  v    0.0069
      5   e    0.0913      14  n    0.0596      23  w    0.0119
      6   f    0.0173      15  o    0.0689      24  x    0.0073
      7   g    0.0133      16  p    0.0192      25  y    0.0164
      8   h    0.0313      17  q    0.0008      26  z    0.0007
      9   i    0.0599      18  r    0.0508      27  -    0.1928

- The entropy is

      H = − Σ_i pi log(pi) = 4.11 bits/letter

- The most informative letter is z: i(z) = −log(0.0007) ≈ 10.5 bits (frequency of appearance 0.07%)

- The least informative letter is e: i(e) = −log(0.0913) ≈ 3.5 bits (frequency of appearance 9.1%)

- Is it possible to write a book in English without using the letter e?
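The figures on this slide can be reproduced with a short Python sketch (ours) using the frequencies from the table above, with '-' standing for the space character:

```python
import math

# Letter frequencies from the table above (reprinted from MacKay); '-' is the space character
freq = {
    'a': 0.0575, 'b': 0.0128, 'c': 0.0263, 'd': 0.0285, 'e': 0.0913,
    'f': 0.0173, 'g': 0.0133, 'h': 0.0313, 'i': 0.0599, 'j': 0.0006,
    'k': 0.0084, 'l': 0.0335, 'm': 0.0235, 'n': 0.0596, 'o': 0.0689,
    'p': 0.0192, 'q': 0.0008, 'r': 0.0508, 's': 0.0567, 't': 0.0706,
    'u': 0.0334, 'v': 0.0069, 'w': 0.0119, 'x': 0.0073, 'y': 0.0164,
    'z': 0.0007, '-': 0.1928,
}

H = -sum(p * math.log2(p) for p in freq.values())
print(f"H = {H:.2f} bits/letter")                   # about 4.11 bits/letter

# Per-letter information i(x) = -log2 p(x): the rare letters are the most informative
for c in ('e', 't', 'z', 'j'):
    print(f"i({c}) = {-math.log2(freq[c]):.2f} bits")
```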


Yes: the following is the opening of Gadsby (Ernest Vincent Wright, 1939), a novel written entirely without the letter e.

[To July 1906]

If youth, throughout all history, had had a champion to stand up for it; to show a doubting world that a child can think; and, possibly, do it practically; you wouldn't constantly run across folks today who claim that "a child don't know anything." A child's brain starts functioning at birth; and has, amongst its many infant convolutions, thousands of dormant atoms, into which God has put a mystic possibility for noticing an adult's act, and figuring out its purport. Up to about its primary school days a child thinks, naturally, only of play. But many a form of play contains disciplinary factors. "You can't do this," or "that puts you out," shows a child that it must think, practically or fail. Now, if, throughout childhood, a brain has no opposition, it is plain that it will attain a position of "status quo," as with our ordinary animals. Man knows not why a cow, dog or lion was not born with a brain on a par with ours; why such animals cannot add, subtract, or obtain from books and schooling, that paramount position which Man holds today. But a human brain is not in that class. Constantly throbbing and pulsating, it rapidly forms opinions; attaining an ability of its own; a fact which is startlingly shown by an occasional child "prodigy" in music or school work. And as, with our dumb animals, a child's inability convincingly to impart its thoughts to us, should not class it as ignorant. Upon this basis I am going to show you how a bunch of bright young folks did find a champion; a man with boys and girls of his own; a man of so dominating and happy individuality that Youth is drawn to him as is a fly to a sugar bowl. It is a story about a small town. It is not a gossipy yarn; nor is it a dry, monotonous account, full of such customary "fill-ins" as "romantic moonlight casting murky shadows down a long, winding country road." Nor will it say anything about tinklings lulling distant folds; robins carolling at twilight, nor any "warm glow of lamplight" from a cabin window. No. It is an account of up-and-doing activity; a vivid portrayal of Youth as it is today; and a practical discarding of that worn-out notion that "a child don't know anything." Now, any author, from history's dawn, always had that most important aid to writing: an ability to call upon any word in his dictionary in building up his story. That is, our strict laws as to word construction did not block his path. But in my story that mighty obstruction will constantly stand in my path; for many an important, common word I cannot adopt, owing to its orthography. I shall act as a sort of historian for this small town; associating with its inhabitants, and striving to acquaint you with its youths, in such a way that you can look, knowingly, upon any child, rich or poor; forward or "backward;" your own, or John Smith's, in your community. You will find many young minds aspiring to know how, and why such a thing is so. And, if a child shows curiosity in that way, how ridiculous it is for you to snap out:— "Oh! Don't ask about things too old for you!" Such a jolt to a young child's mind, craving instruction, is apt so to dull its avidity, as to hold it back in its school work. Try to look upon a child as a small, soft young body and a rapidly growing, constantly inquiring brain. It must grow to maturity slowly. Forcing a child through school by constant night study during hours in which it should run and [...]

Definitions

The joint pmf of two random variables X and Y taking values on alphabets X and Y, respectively, is

    p(x, y) = Pr{X = x, Y = y},   (x, y) ∈ X × Y

If p(x) = Pr{X = x} > 0, the conditional probability of Y = y given that X = x is defined as

    p(y|x) = Pr{Y = y | X = x} = p(x, y) / p(x)

Independence

- The events X = x and Y = y are independent if p(x, y) = p(x)p(y)

- The random variables X and Y are independent if p(x, y) = p(x)p(y), ∀(x, y) ∈ X × Y


Example

Joint pmf p(x, y), with the marginals in the last row and column:

    p(x, y)      x = 0     x = 1     x = 2     Pr{Y = y}
    y = 1        0.081     0.018     0.001     0.1
    y = 0        0.7290    0.1620    0.009     0.9
    Pr{X = x}    0.81      0.18      0.01

- You can easily check that X and Y are independent

- Actually, X ∼ B(2, 0.1) and Y ∼ B(1, 0.1)

How is Z = X + Y distributed?
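A minimal sketch (ours, not part of the slides) that checks the independence claim numerically and answers the question: collecting the joint probabilities by the value of X + Y gives the pmf of Z, which coincides with B(3, 0.1), as expected for the sum of independent B(2, 0.1) and B(1, 0.1) variables.

```python
from itertools import product

# Joint pmf of the example: X in {0, 1, 2}, Y in {0, 1}
pxy = {
    (0, 0): 0.7290, (1, 0): 0.1620, (2, 0): 0.009,
    (0, 1): 0.081,  (1, 1): 0.018,  (2, 1): 0.001,
}

# Marginals
px = {x: sum(pxy[x, y] for y in (0, 1)) for x in (0, 1, 2)}
py = {y: sum(pxy[x, y] for x in (0, 1, 2)) for y in (0, 1)}

# Independence: p(x, y) = p(x) p(y) for every pair
print(all(abs(pxy[x, y] - px[x] * py[y]) < 1e-12 for x, y in product(px, py)))  # True

# Distribution of Z = X + Y; it matches B(3, 0.1)
pz = {}
for (x, y), p in pxy.items():
    pz[x + y] = pz.get(x + y, 0.0) + p
print(pz)   # approximately {0: 0.729, 1: 0.243, 2: 0.027, 3: 0.001}
```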


Joint entropy and conditional entropy

Definition: The joint entropy H(X, Y) of two random variables (X, Y) with pmf p(x, y) is defined as

    H(X, Y) = −E[log p(X, Y)] = − Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(x, y)

Definition: The conditional entropy of Y given X is defined as

    H(Y|X) = −E[log p(Y|X)] = − Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(y|x)

Note that

    H(Y|X) = − Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(y|x) = − Σ_{x∈X} p(x) Σ_{y∈Y} p(y|x) log p(y|x)
           = Σ_{x∈X} p(x) H(Y|X = x)


Chain rule

We know that p(x, y) = p(x)p(y|x); therefore, taking logarithms and expectations on both sides we arrive at

E[log p(X , Y )] = E[log p(X )] + E[log p(Y |X )]

and the so-called chain rule for conditional entropy follows

H(X , Y ) = H(X ) + H(Y |X )

Similarly, we have

H(X , Y ) = H(Y ) + H(X |Y )

Note that H(Y|X) ≠ H(X|Y), but

H(X ) − H(X |Y ) = H(Y ) − H(Y |X )
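As a numerical check (a sketch of ours, reusing the earlier example with X ∼ B(2, 0.1) and Y ∼ B(1, 0.1) independent), the chain rule H(X, Y) = H(X) + H(Y|X) holds exactly, with H(Y|X) computed directly from its definition:

```python
import math

H = lambda probs: -sum(p * math.log2(p) for p in probs if p > 0)

# Joint pmf of the earlier example: X ~ B(2, 0.1) and Y ~ B(1, 0.1), independent
px = {0: 0.81, 1: 0.18, 2: 0.01}
py = {0: 0.9, 1: 0.1}
pxy = {(x, y): px[x] * py[y] for x in px for y in py}

# H(Y|X) from its definition: sum_x p(x) H(Y | X = x)
HY_given_X = sum(px[x] * H(pxy[x, y] / px[x] for y in py) for x in px)

print(H(pxy.values()))                 # H(X, Y)
print(H(px.values()) + HY_given_X)     # H(X) + H(Y|X): the same value (chain rule)
```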


[Diagram: the joint entropy H(X, Y) split as H(X) + H(Y|X) and as H(Y) + H(X|Y)]

As a corollary of the chain rule, it is easy to prove the following

H(X , Y |Z) = H(X |Z) + H(Y |X , Z)


Generalization of the chain rule for entropy: Let X1, X2, ..., Xn be a collection of random variables with joint pmf p(x1, x2, ..., xn). Then

    H(X1, X2, ..., Xn) = Σ_{i=1}^{n} H(Xi | Xi−1, ..., X1)

The entropy is a sum of conditional entropies; for instance,

    H(X1, X2, X3) = H(X1) + H(X2|X1) + H(X3|X2, X1)


Example (from Cover’s textbook): Let (X,Y) have the following joint pmf

    p(x, y)    x = 1    x = 2    x = 3    x = 4
    y = 1      1/8      1/16     1/32     1/32
    y = 2      1/16     1/8      1/32     1/32
    y = 3      1/16     1/16     1/16     1/16
    y = 4      1/4      0        0        0

- H(X) = 7/4

- H(Y) = 2

- H(X|Y) = 11/8

- H(Y|X) = 13/8

- H(X,Y) = 27/8 (see the numerical check below)
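These values can be verified numerically; below is a minimal Python sketch (ours), using exact fractions for the table entries:

```python
import math
from fractions import Fraction as F

# Joint pmf from the table: rows are y = 1..4, columns are x = 1..4
P = [
    [F(1, 8),  F(1, 16), F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 8),  F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 16), F(1, 16), F(1, 16)],
    [F(1, 4),  F(0),     F(0),     F(0)],
]

def H(probs):
    """Entropy in bits of a list of probabilities (0 log 0 = 0)."""
    return -sum(float(p) * math.log2(p) for p in probs if p > 0)

px = [sum(row[j] for row in P) for j in range(4)]   # marginal pmf of X
py = [sum(row) for row in P]                        # marginal pmf of Y
HXY = H([p for row in P for p in row])              # joint entropy

print("H(X)   =", H(px))         # 1.75  = 7/4
print("H(Y)   =", H(py))         # 2.0
print("H(X,Y) =", HXY)           # 3.375 = 27/8
print("H(X|Y) =", HXY - H(py))   # 1.375 = 11/8
print("H(Y|X) =", HXY - H(px))   # 1.625 = 13/8
```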


Relative entropy, K-L divergence

Definition: The relative entropy or Kullback-Leibler divergence between two distributions p(x) and q(x), taking values on the same alphabet, is defined as

    D(p||q) = Σ_{x∈X} p(x) log( p(x) / q(x) ) = E_p[ log( p(X) / q(X) ) ]

We use the conventions 0 log(0/0) = 0, 0 log(0/q) = 0, and p log(p/0) = ∞

- It is a measure of the "distance" between the two distributions

- D(p||q) ≥ 0

- D(p||q) = 0 iff p = q

- However, it is not a true distance:

      D(p||q) ≠ D(q||p)


Example: Consider two different Bernoulli distributions: p(x) = (1 − r)δ(x) + rδ(x − 1) and q(x) = (1 − s)δ(x) + sδ(x − 1)

    D(p||q) = (1 − r) log( (1 − r)/(1 − s) ) + r log( r/s )

    D(q||p) = (1 − s) log( (1 − s)/(1 − r) ) + s log( s/r )

If we fix s = 0.5 and vary r from 0 to 1, D(p||q) looks like this:

[Figure: D(p||q) as a function of r for fixed s = 0.5; it equals 0 at r = 0.5 and reaches 1 bit at r = 0 and r = 1]
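A small sketch (ours; the function name and the values of r and s are arbitrary) that computes both divergences and shows the asymmetry, as well as D(p||q) = 0 when p = q:

```python
import math

def kl_bernoulli(r, s):
    """D(p||q) in bits for p = Bernoulli(r), q = Bernoulli(s); assumes 0 < s < 1."""
    return sum(pr * math.log2(pr / qs)
               for pr, qs in ((1 - r, 1 - s), (r, s)) if pr > 0)

r, s = 0.1, 0.5
print(kl_bernoulli(r, s))   # D(p||q), about 0.531 bits
print(kl_bernoulli(s, r))   # D(q||p), about 0.737 bits -> not symmetric
print(kl_bernoulli(s, s))   # 0.0, since D(p||q) = 0 iff p = q
```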


Mutual information

Definition 1: The mutual information I (X ; Y ) between the random variables X and Y is given by

I (X ; Y ) = H(X ) − H(X |Y )

- It is the reduction in the uncertainty of X due to the knowledge of Y

- I(X; Y) ≥ 0, which implies

      H(X) ≥ H(X|Y),

  i.e., "information cannot hurt"

- I(X; Y) = 0 iff X and Y are independent


Definition 2: The mutual information I (X ; Y ) can be defined as the relative entropy between the joint distribution p(x, y) and the product of marginals p(x)p(y)

    I(X; Y) = D( p(x, y) || p(x)p(y) ) = Σ_{x∈X} Σ_{y∈Y} p(x, y) log( p(x, y) / (p(x)p(y)) )

            = Σ_{x∈X} Σ_{y∈Y} p(x, y) log( p(x|y) / p(x) )

            = − Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(x) + Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(x|y)

            = − Σ_{x∈X} p(x) log p(x) − ( − Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(x|y) )

            = H(X) − H(X|Y)


Properties

- It is symmetric (X says as much about Y as Y says about X):

      I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = I(Y; X)

- The mutual information of a random variable with itself is the entropy (self-information):

      I(X; X) = H(X) − H(X|X) = H(X)

- It also satisfies

      I(X; Y) = H(X) + H(Y) − H(X, Y) = H(X, Y) − H(X|Y) − H(Y|X)


[Venn diagram: H(X, Y) decomposed into H(X|Y), I(X; Y), and H(Y|X); H(X) and H(Y) overlap exactly in I(X; Y)]


Example: Let (X,Y) have the following joint pmf

    p(x, y)    x = 1    x = 2    x = 3    x = 4
    y = 1      1/8      1/16     1/32     1/32
    y = 2      1/16     1/8      1/32     1/32
    y = 3      1/16     1/16     1/16     1/16
    y = 4      1/4      0        0        0

- I(X; Y) = 3/8
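Continuing the numerical sketch used earlier for the entropies of this table (our own code, not from the slides), I(X; Y) can be obtained either from the relative-entropy definition or from the entropies, with the same result:

```python
import math
from fractions import Fraction as F

# Joint pmf from the table: rows are y = 1..4, columns are x = 1..4
P = [
    [F(1, 8),  F(1, 16), F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 8),  F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 16), F(1, 16), F(1, 16)],
    [F(1, 4),  F(0),     F(0),     F(0)],
]
px = [sum(row[j] for row in P) for j in range(4)]   # marginal of X
py = [sum(row) for row in P]                        # marginal of Y

# I(X;Y) = D( p(x,y) || p(x)p(y) )
I = sum(float(P[i][j]) * math.log2(P[i][j] / (px[j] * py[i]))
        for i in range(4) for j in range(4) if P[i][j] > 0)
print(I)   # 0.375 = 3/8

# Same value from I(X;Y) = H(X) + H(Y) - H(X,Y)
H = lambda probs: -sum(float(p) * math.log2(p) for p in probs if p > 0)
print(H(px) + H(py) - H([p for row in P for p in row]))   # 0.375
```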


Chain rule for mutual information

I (X1, X2, X3; Y ) = I (X1; Y ) + I (X2; Y |X1) + I (X3; Y |X2, X1)

The proof follows from the definition of mutual information and the application of the chain rule for entropy

More generally, we can write

    I(X1, X2, ..., Xn; Y) = Σ_{i=1}^{n} I(Xi; Y | Xi−1, ..., X1)


Jensen's inequality

Jensen's inequality generalizes the idea that, for convex functions, the secant line lies above the graph

Many fundamental inequalities in information theory are consequences of Jensen's inequality, the most important one being

D(p||q) ≥ 0


- Convex: x, x², |x|, e^x, x log(x) (for x ≥ 0), ...

- Concave: x, log(x), √x (for x ≥ 0), ...


Definition: A function f(x) is said to be convex on an interval (a, b) if, for all x1, x2 ∈ (a, b) and 0 < λ < 1,

    f(λ x1 + (1 − λ) x2) ≤ λ f(x1) + (1 − λ) f(x2)

The function is strictly convex if equality holds only when λ = 0 or λ = 1

- A function f(x) is concave if −f(x) is convex

- If f(x) has a second derivative that is non-negative (positive) over an interval, then f(x) is convex (strictly convex) over that interval


Jensen's inequality

Theorem: If f(x) is a convex function and X is a random variable,

    E[f(X)] ≥ f(E[X])

Corollary: If f(x) is strictly convex,

    E[f(X)] = f(E[X]) =⇒ X = E[X] =⇒ X is a constant

Jensen's inequality follows directly from the fact that, for a set of numbers x1, x2, ..., xn and a set of positive weights a1, a2, ..., an, a convex function satisfies

    f( Σ_i ai xi / Σ_j aj ) ≤ Σ_i ai f(xi) / Σ_j aj,

which can be easily proved by induction. If pi = ai / Σ_j aj is viewed as the pmf of a random variable X, the probabilistic form of Jensen's inequality follows.
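A quick Monte Carlo sketch (ours; the distributions and the convex/concave functions chosen are arbitrary) illustrating E[f(X)] ≥ f(E[X]) for a convex f and the reversed inequality for a concave one:

```python
import math
import random

random.seed(0)

# Convex f: E[f(X)] >= f(E[X])
f = lambda x: x * x
xs = [random.uniform(-1, 3) for _ in range(100_000)]
Ef = sum(f(x) for x in xs) / len(xs)    # E[f(X)], about 2.33 here
fE = f(sum(xs) / len(xs))               # f(E[X]), about 1.0 here
print(Ef, ">=", fE)

# Concave f (log): the inequality flips, E[log2 X] <= log2 E[X]
ys = [random.uniform(0.1, 10) for _ in range(100_000)]
print(sum(math.log2(y) for y in ys) / len(ys), "<=", math.log2(sum(ys) / len(ys)))
```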

An application

D(p||q) ≥ 0

    −D(p||q) = − Σ_{x∈X} p(x) log( p(x) / q(x) ) = Σ_{x∈X} p(x) log( q(x) / p(x) )

             ≤ log( Σ_{x∈X} p(x) · q(x) / p(x) )

             = log( Σ_{x∈X} q(x) ) = log(1) = 0,

where we have applied Jensen's inequality in the second line, since log(x) is a strictly concave function
