
Machine Learning CS 4641
Information Theory
Nakul Gopalan, Georgia Tech
These slides are based on slides from Le Song, Roni Rosenfeld, Chao Zhang, Mahdi Roozbahani and Maneesh Sahani.

Outline
• Motivation
• Entropy
• Conditional Entropy and Mutual Information
• Cross-Entropy and KL-Divergence

Uncertainty and Information
Information is processed data, whereas knowledge is information that is modeled to be useful. You need information to be able to get knowledge.

Uncertainty and Information
Which day is more uncertain? How do we quantify uncertainty? The higher the entropy, the more uncertain the outcome, and the more information we gain when the outcome is revealed.

Information
Let X be a random variable with distribution p(x). The information (in bits) carried by an outcome x is
I(X) = \log_2 \frac{1}{p(x)}

Have you heard that a picture is worth 1000 words?
Information obtained from a random word drawn from a 100,000-word vocabulary:
I(word) = \log_2 \frac{1}{p(x)} = \log_2 \frac{1}{1/100000} = 16.61 bits
A 1000-word document from the same source:
I(document) = 1000 \times I(word) = 16,610 bits
A 640 \times 480 pixel, 16-greyscale picture (each pixel carries \log_2 16 = 4 bits):
I(picture) = \log_2 \frac{1}{(1/16)^{640 \times 480}} = 1,228,800 bits
A picture is worth (a lot more than) 1000 words!

Which one has a higher probability: T or H? Which one should carry more information: T or H?

Example: Morse code (image from Wikipedia Commons), which assigns shorter codes to more frequent letters.

Information Theory
• Information theory is a mathematical framework which addresses questions like:
‣ How much information does a random variable carry?
‣ How efficient is a hypothetical code, given the statistics of the random variable?
‣ How much better or worse would another code do?
‣ Is the information carried by different random variables complementary or redundant?

Outline
• Motivation
• Entropy
• Conditional Entropy and Mutual Information
• Cross-Entropy and KL-Divergence

Entropy
The entropy of X is its expected information content, i.e., the average number of bits needed to encode an outcome of X under an optimal code:
H(X) = \sum_{x} p(x) \log_2 \frac{1}{p(x)}

Entropy Computation: An Example

Properties of Entropy
• H(X) ≥ 0, with H(X) = 0 when a single outcome has probability 1.
• H(X) is maximized by the uniform distribution, where H(X) = \log_2 |X|.

Outline
• Motivation
• Entropy
• Conditional Entropy and Mutual Information
• Cross-Entropy and KL-Divergence

Joint Entropy
Example: joint distribution of Temperature and Humidity.
H(X, Y) = \sum_{x, y} p(x, y) \log_2 \frac{1}{p(x, y)}

Conditional Entropy (Average Conditional Entropy)
H(Y|X) = \sum_{x \in X} p(x) H(Y|X = x) = \sum_{x \in X, y \in Y} p(x, y) \log_2 \frac{p(x)}{p(x, y)}
Proof: since p(y|x) = p(x, y) / p(x),
H(Y|X) = \sum_{x} p(x) \sum_{y} p(y|x) \log_2 \frac{1}{p(y|x)} = \sum_{x, y} p(x, y) \log_2 \frac{p(x)}{p(x, y)}
For continuous random variables, the sums become integrals:
H(Y|X) = \int \int p(x, y) \log_2 \frac{p(x)}{p(x, y)} \, dx \, dy

Poll
Relationship between H(Y) and H(Y|X):
• H(Y) >= H(Y|X)
• H(Y) <= H(Y|X)

Mutual Information
I(Y; X) = H(Y) - H(Y|X) = H(X) - H(X|Y) = \sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x) p(y)}

Properties of Mutual Information
• Symmetry: I(X; Y) = I(Y; X)
• Non-negativity: I(X; Y) ≥ 0
• I(X; Y) = 0 if and only if X and Y are independent

CE and MI: Visual Illustration
Image Credit: Christopher Olah.
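To make the entropy, conditional entropy, and mutual information formulas concrete, here is a minimal Python sketch (not part of the original slides). The joint distribution over Temperature and Humidity, and the helper names marginal, entropy, and conditional_entropy, are illustrative assumptions; the probability values are made up and are not the numbers from the lecture's table.

```python
import math

# Hypothetical joint distribution p(temperature, humidity).
# The numbers are illustrative only, not the values from the slides.
p_xy = {
    ("hot", "humid"): 0.4,
    ("hot", "dry"): 0.1,
    ("cold", "humid"): 0.2,
    ("cold", "dry"): 0.3,
}

def marginal(p_xy, axis):
    """Marginalize the joint distribution over the other variable."""
    m = {}
    for (x, y), p in p_xy.items():
        key = x if axis == 0 else y
        m[key] = m.get(key, 0.0) + p
    return m

def entropy(p):
    """H = sum_z p(z) * log2(1 / p(z))."""
    return sum(pz * math.log2(1.0 / pz) for pz in p.values() if pz > 0)

def conditional_entropy(p_xy, p_x):
    """H(Y|X) = sum_{x,y} p(x,y) * log2(p(x) / p(x,y))."""
    return sum(pxy * math.log2(p_x[x] / pxy)
               for (x, y), pxy in p_xy.items() if pxy > 0)

p_x = marginal(p_xy, 0)            # p(temperature)
p_y = marginal(p_xy, 1)            # p(humidity)

H_X = entropy(p_x)
H_Y = entropy(p_y)
H_XY = entropy(p_xy)               # joint entropy H(X, Y)
H_Y_given_X = conditional_entropy(p_xy, p_x)
I_XY = H_Y - H_Y_given_X           # mutual information I(X; Y)

print(f"H(X)   = {H_X:.4f} bits")
print(f"H(Y)   = {H_Y:.4f} bits")
print(f"H(X,Y) = {H_XY:.4f} bits")
print(f"H(Y|X) = {H_Y_given_X:.4f} bits")
print(f"I(X;Y) = {I_XY:.4f} bits")

# Conditioning never increases entropy, so H(Y) >= H(Y|X) and I(X;Y) >= 0.
assert H_Y >= H_Y_given_X - 1e-12
```

With any such joint distribution, H(Y) ≥ H(Y|X), so I(X; Y) = H(Y) - H(Y|X) is never negative; that is the answer to the poll above.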
Outline
• Motivation
• Entropy
• Conditional Entropy and Mutual Information
• Cross-Entropy and KL-Divergence

Cross Entropy
The expected number of bits when a wrong distribution Q is assumed while the data actually follows a distribution P:
H(P, Q) = \sum_{x} P(x) \log_2 \frac{1}{Q(x)} = H(P) + KL(P \| Q)

Kullback-Leibler Divergence
KL(P \| Q) = \sum_{x} P(x) \log_2 \frac{P(x)}{Q(x)} = H(P, Q) - H(P)
KL divergence measures how far Q is from P, but it is not a true distance: it is not symmetric in P and Q.
Is the log function concave or convex? It is concave, so by Jensen's inequality E[g(x)] ≤ g(E[x]) with g(x) = \log(x):
-KL(P \| Q) = \sum_{x} P(x) \log_2 \frac{Q(x)}{P(x)} ≤ \log_2 \sum_{x} Q(x) = 0
Hence KL(P \| Q) ≥ 0, and when P = Q, KL(P \| Q) = 0.

Take-Home Messages
• Entropy
‣ A measure for uncertainty
‣ Why it is defined in this way (optimal coding)
‣ Its properties
• Joint Entropy, Conditional Entropy, Mutual Information
‣ The physical intuitions behind their definitions
‣ The relationships between them
• Cross Entropy, KL Divergence
‣ The physical intuitions behind them
‣ The relationships between entropy, cross-entropy, and KL divergence
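To close, a minimal Python sketch (not part of the original slides) that numerically checks the cross-entropy and KL relationships summarized above. The distributions P and Q and the helper names cross_entropy and kl_divergence are illustrative assumptions.

```python
import math

def entropy(p):
    """H(P) = sum_x P(x) * log2(1 / P(x))."""
    return sum(px * math.log2(1.0 / px) for px in p if px > 0)

def cross_entropy(p, q):
    """H(P, Q) = sum_x P(x) * log2(1 / Q(x)): expected bits when data from P
    is coded with a code optimized for Q."""
    return sum(px * math.log2(1.0 / qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p, q):
    """KL(P || Q) = sum_x P(x) * log2(P(x) / Q(x))."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

# Two made-up distributions over the same four outcomes (illustrative values).
P = [0.5, 0.25, 0.125, 0.125]
Q = [0.25, 0.25, 0.25, 0.25]

H_P = entropy(P)
H_PQ = cross_entropy(P, Q)
KL_PQ = kl_divergence(P, Q)

print(f"H(P)     = {H_P:.4f} bits")
print(f"H(P, Q)  = {H_PQ:.4f} bits")
print(f"KL(P||Q) = {KL_PQ:.4f} bits")

# Cross-entropy decomposes as H(P, Q) = H(P) + KL(P || Q).
assert abs(H_PQ - (H_P + KL_PQ)) < 1e-12
# KL is non-negative (by Jensen's inequality) and zero when the distributions
# match, but it is not symmetric: KL(P||Q) != KL(Q||P) in general.
assert KL_PQ >= 0 and abs(kl_divergence(P, P)) < 1e-12
```

Because H(P, Q) = H(P) + KL(P \| Q), minimizing cross-entropy with respect to Q while P is held fixed is equivalent to minimizing the KL divergence from P to Q.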