Belief Propagation in Bayesian Networks
Céline Comte
Reading Group “Network Theory”, November 5, 2018
© 2018 Nokia

Introduction
- (First-order) logic: represent causal relations between variables by a directed acyclic graph
- Probabilities: weight these causal relations by probabilities that implicitly account for non-represented variables

“Belief networks are directed acyclic graphs in which the nodes represent propositions (or variables), the arcs signify direct dependencies between the linked propositions, and the strengths of these dependencies are quantified by conditional probabilities” (Pearl, 1986)

Bayesian networks are also…
- a memory-efficient way of storing a PMF
- based on simple probability rules (more details in a few slides)
- inspired by human causal reasoning (Pearl, 1986, 1988)
- used for decision making if a utility function is provided
- applied in many fields: medical diagnosis, turbo codes, (programming) language detection, …
- related to other models: Markov random fields, Markov chains, hidden Markov models, …

References: Pearl’s articles and book
- J. Pearl (1982). “Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach”. In: AAAI’82. → Belief propagation in causal trees
- J. Pearl (1986). “Fusion, propagation, and structuring in belief networks”. In: Artificial Intelligence. → Belief propagation in causal trees and polytrees
- J. Pearl (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann. → A complete reference (thanks Achille for providing me with this book)

References: Textbooks
- T. D. Nielsen and F. V. Jensen (2007). Bayesian Networks and Decision Graphs. Springer-Verlag. → A lot of examples in Chapters 2 and 3
- D. Koller, N. Friedman, and F. Bach (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press
- M. Jordan (last modified in 2015).
An Introduction to Probabilistic Graphical Models. → Definition and belief propagation (thanks Nathan for pointing out this reference)

Outline
- Reminders on probability theory
- Bayesian networks
- Belief propagation in trees
- Belief propagation in polytrees

Reminders on probability theory

Independence and conditional independence
Remark: we work exclusively with discrete random variables.
- A and B are marginally independent (written A ⊥⊥ B) if one of these three equivalent conditions is satisfied:
  - P(A, B) = P(A) P(B)
  - P(A | B) = P(A)
  - P(B | A) = P(B)
- A and B are conditionally independent given C (written A ⊥⊥ B | C) if one of these three equivalent conditions is satisfied:
  - P(A, B | C) = P(A | C) P(B | C)
  - P(A | B, C) = P(A | C)
  - P(B | A, C) = P(B | C)
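The two definitions above can be checked numerically on a small joint PMF. The sketch below builds a hypothetical joint distribution over three binary variables in which A and B are conditionally independent given C (by construction) but marginally dependent; all numbers are made up for illustration:

```python
import itertools

# Hypothetical joint PMF over binary (A, B, C): given C, A and B are
# drawn independently, so A ⊥⊥ B | C holds by construction.
p_c = {0: 0.5, 1: 0.5}
p_a_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_a_given_c[c][a]
p_b_given_c = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}  # p_b_given_c[c][b]

joint = {
    (a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
    for a, b, c in itertools.product((0, 1), repeat=3)
}

def marginal(assignment):
    """Sum the joint over all variables not fixed in `assignment` (dict: index -> value)."""
    return sum(p for key, p in joint.items()
               if all(key[i] == v for i, v in assignment.items()))

# Conditional independence: P(A,B | C) = P(A | C) P(B | C) for every (a, b, c).
for a, b, c in itertools.product((0, 1), repeat=3):
    pc = marginal({2: c})
    lhs = marginal({0: a, 1: b, 2: c}) / pc
    rhs = (marginal({0: a, 2: c}) / pc) * (marginal({1: b, 2: c}) / pc)
    assert abs(lhs - rhs) < 1e-12

# Marginal independence fails here: P(A,B) differs from P(A) P(B).
pab = marginal({0: 0, 1: 0})
print(abs(pab - marginal({0: 0}) * marginal({1: 0})) > 1e-6)  # True: A and B are dependent
```

This also illustrates the general point that conditional independence neither implies nor is implied by marginal independence.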
Useful rules
- The chain rule of probabilities: if A1, …, An are random variables, then
  P(A1, …, An) = P(A1) × P(A2 | A1) × P(A3 | A1, A2) × ⋯ × P(An | A1, …, An−1)
- Law of total probability: if A and B are two random variables, then
  P(B) = Σ_A P(B | A) P(A)
- Bayes’ rule: if A and B are two random variables, then
  P(B | A) = P(A | B) P(B) / P(A)
  We can see P(A) as a normalizing constant: we can first compute P(B | A) ∝ P(A | B) P(B) for each value of B and then normalize to obtain P(B | A) without computing P(A).

Glossary
- Belief in a random variable (“conviction” in French): the marginal distribution of this random variable (given the values of some observed variables)
- Observe a random variable
- Evidence (pieces of evidence): the set of random variables that have been observed
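The normalization trick mentioned under Bayes’ rule can be sketched in a few lines of Python; the prior and likelihood numbers below are hypothetical, for illustration only:

```python
# Posterior P(B | A=a) ∝ P(A=a | B) P(B), computed without ever forming P(A=a).
# Prior and likelihood values are made up for illustration.
prior_b = {"b0": 0.6, "b1": 0.4}      # P(B)
likelihood = {"b0": 0.1, "b1": 0.5}   # P(A=a | B=b) for the observed value a

unnormalized = {b: likelihood[b] * prior_b[b] for b in prior_b}
z = sum(unnormalized.values())        # this sum equals P(A=a), by total probability
posterior = {b: p / z for b, p in unnormalized.items()}

print(posterior)  # b0 ≈ 0.231, b1 ≈ 0.769
```

Note that the normalizer z recovers P(A=a) for free, which is exactly why belief propagation can work with unnormalized messages and normalize at the end.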
Bayesian networks

The Student example
Variables (one node each):
- D: course difficulty
- I: student intelligence
- G: grade
- B: baccalauréat
- L: reference letter
Borrowed from (Koller, Friedman, and Bach, 2009)

Two equivalent definitions, on the Student example:
- Local Markov property: each node is conditionally independent of its non-descendants given its parents:
  D ⊥⊥ {I, B},  I ⊥⊥ D,  G ⊥⊥ B | {D, I},  B ⊥⊥ {D, G, L} | I,  L ⊥⊥ {D, I, B} | G
- Chain rule of Bayesian networks: by the chain rule of probabilities,
  P(D, I, G, B, L) = P(D) P(I | D) P(G | D, I) P(B | D, I, G) P(L | D, I, G, B)
                   = P(D) P(I) P(G | D, I) P(B | I) P(L | G)
These two definitions are equivalent.
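The factorization P(D, I, G, B, L) = P(D) P(I) P(G | D, I) P(B | I) P(L | G) can be evaluated directly once the CPTs are given. The sketch below uses made-up CPT values for binary variables (the actual example in Koller et al. uses different cardinalities and numbers):

```python
import itertools

# Hypothetical CPTs for binary D, I, G, B, L; all numbers are illustrative only.
p_d = {0: 0.6, 1: 0.4}                                   # P(D)
p_i = {0: 0.7, 1: 0.3}                                   # P(I)
p_g = {(d, i): {0: 0.9 - 0.3 * i + 0.1 * d, 1: 0.1 + 0.3 * i - 0.1 * d}
       for d, i in itertools.product((0, 1), repeat=2)}  # P(G | D, I)
p_b = {i: {0: 0.8 - 0.4 * i, 1: 0.2 + 0.4 * i} for i in (0, 1)}   # P(B | I)
p_l = {g: {0: 0.9 - 0.8 * g, 1: 0.1 + 0.8 * g} for g in (0, 1)}   # P(L | G)

def joint(d, i, g, b, l):
    """Chain rule of the Student network: product of the local CPTs."""
    return p_d[d] * p_i[i] * p_g[(d, i)][g] * p_b[i][b] * p_l[g][l]

# The 2^5 joint entries sum to 1, as they must for a valid factorization.
total = sum(joint(*x) for x in itertools.product((0, 1), repeat=5))
print(round(total, 10))  # 1.0
```

Storing five small CPTs instead of the full 2^5-entry table is exactly the “memory-efficient way of storing a PMF” claimed earlier.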
The Student example, annotated with its CPTs: P(D), P(I), P(G | D, I), P(B | I), P(L | G).

Bayesian networks in general
Described by:
- A directed acyclic graph
  - Nodes ~ (discrete) random variables X1, …, Xn
  - Arrows ~ conditional (in)dependencies
- Local conditional probability tables (CPTs): P(Xi | parents(Xi)) for each node Xi

Two equivalent definitions:
- Local Markov property: each node is conditionally independent of its non-descendants given its parents
- Chain rule of Bayesian networks:
  P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | parents(Xi))
Proof of the equivalence: Corollary 4, p. 20 of (Pearl, 1988).

Flow of information
Base case: the serial connection X → Y → Z
- X ⊥⊥ Z | Y, and P(X, Y, Z) = P(X) P(Y | X) P(Z | Y)
- Interpretation: a chain of causality, where X “causes” Y, which in turn “causes” Z
- Information can flow between X and Z through Y (that is, observing X changes our belief in Z and vice versa), unless Y is observed
- Example: Markov chains
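The serial-connection claim (X ⊥⊥ Z | Y, yet X and Z dependent when Y is unobserved) can be verified numerically on a toy chain; the transition probabilities below are hypothetical:

```python
import itertools

# Toy chain X -> Y -> Z over binary variables; all probabilities are made up.
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}  # P(Y | X)
p_z_given_y = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}  # P(Z | Y)

joint = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
         for x, y, z in itertools.product((0, 1), repeat=3)}

def prob(**fixed):
    """Marginal probability that the variables named in `fixed` take those values."""
    idx = {"x": 0, "y": 1, "z": 2}
    return sum(p for key, p in joint.items()
               if all(key[idx[v]] == val for v, val in fixed.items()))

# Once Y is observed, X and Z decouple: P(X, Z | Y) = P(X | Y) P(Z | Y).
for x, y, z in itertools.product((0, 1), repeat=3):
    lhs = prob(x=x, y=y, z=z) / prob(y=y)
    rhs = (prob(x=x, y=y) / prob(y=y)) * (prob(y=y, z=z) / prob(y=y))
    assert abs(lhs - rhs) < 1e-12

# Without observing Y, information flows through it: P(Z=1 | X=0) != P(Z=1 | X=1).
print(prob(x=0, z=1) / prob(x=0), prob(x=1, z=1) / prob(x=1))  # two different values
```

Observing X here shifts the belief in Z from 0.24 to 0.48, which is precisely the “flow of information” through the unobserved middle node.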