Inference in Graphical Models: The Junction Tree Algorithm
George Mason University
Department of Systems Engineering and Operations Research
Graphical Models for Inference and Decision Making
Unit 4: Inference in Graphical Models
The Junction Tree Algorithm
Instructor: Kathryn Blackmond Laskey
© Kathryn Blackmond Laskey, Spring 2019

Learning Objectives
1. State the canonical belief propagation problem for inference in probabilistic graphical models.
2. Describe why inference is hard for general graphical models.
3. Explain how the junction tree inference algorithm works.
   a) Given a Bayesian network, construct a junction tree.
   b) Given a junction tree, follow the steps for propagating beliefs between two neighboring clusters.
   c) Given a junction tree, describe the flow of control as belief propagation occurs.
4. Describe how the junction tree algorithm is modified for:
   a) Virtual evidence
   b) Most probable explanation (MPE) or maximum a posteriori (MAP) inference

Unit 4 Outline
• The Inference Problem
• Pearl's Belief Propagation Algorithm
• Junction Tree Algorithm
  – Propagation in Junction Trees
  – Constructing the Junction Tree
  – Additional topics:
    » Virtual (likelihood) evidence & soft evidence
    » MAP / MPE

Inference in Graphical Probability Models
• In a graphical probability model, the joint probability distribution over all random variables in the network is encoded by specifying
  – the graph
  – the local probability distributions
• Inference (evidence accumulation, belief propagation) means using new information about some random variables to update beliefs about other random variables
• Incorporate new information by computing conditional distributions
  – Given:
    » Target random variables Xt; evidence random variables Xe; other random variables Xo
    » A graphical probability model on (Xt, Xe, Xo)
    » Evidence about Xe (either we learn Xe = xe or we receive virtual evidence)
  – Goal:
    » Answer queries about the conditional distribution P(Xt | Xe = xe) or P(Xt | virtual evidence)
    » Assume that we want to know the distribution of one of the target random variables (we will discuss other kinds of queries later)
• No need to specify in advance which random variables are target / evidence
  – A graphical model encodes a full joint distribution over all the random variables
  – The same graphical model can be used to infer fever from flu and flu from fever ("bidirectional inference")

MPE and MAP
• BNs are also used to solve the maximum a posteriori (MAP) or most probable explanation (MPE) problem:
  – MPE: Given evidence Xe = xe, find the most probable joint instantiation of the non-evidence random variables (Xt, Xo)
  – MAP: Given evidence Xe = xe, find the most probable instantiation of a target random variable Xt
  – The answer to the second problem is not the same as the answer to the first!!! Why? (See the sketch below.)
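The difference is that MPE maximizes over the joint instantiation of (Xt, Xo), while MAP maximizes the marginal of Xt after Xo has been summed out; the two maximizers can disagree. A minimal numeric sketch (the variable names and probabilities below are invented for illustration, not taken from the slides):

```python
# Hypothetical posterior P(Xt, Xo | e) over a binary target Xt and a binary
# "other" variable Xo, after the evidence e has already been absorbed.
# The numbers are invented purely to illustrate the point.
joint = {
    ("t0", "o0"): 0.25, ("t0", "o1"): 0.30,
    ("t1", "o0"): 0.35, ("t1", "o1"): 0.10,
}

# MPE: the most probable *joint* instantiation of all non-evidence variables.
mpe_config = max(joint, key=joint.get)        # ('t1', 'o0'), probability 0.35

# MAP for Xt alone: maximize the marginal P(Xt | e), summing Xo out.
marginal_t = {}
for (t, _o), p in joint.items():
    marginal_t[t] = marginal_t.get(t, 0.0) + p
map_t = max(marginal_t, key=marginal_t.get)   # 't0', probability 0.55

print(mpe_config)  # ('t1', 'o0') -> the MPE assigns Xt = t1
print(map_t)       # 't0'         -> but the MAP value of Xt alone is t0
```

Here the most probable joint configuration puts Xt in one state, while the most probable value of Xt by itself is the other state.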
• We will focus on the computation of posterior marginal distributions for target random variables
  – Computational issues for MAP and MPE are similar to, but not identical to, the issues for posterior marginal distributions
  – The published literature on MAP and MPE should be understandable to someone who thoroughly understands algorithms for computing posterior marginal distributions
  – We will briefly describe how to modify the algorithm to find the MPE

Pearl on Belief Updating
• "The primary appeal of probability theory is its ability to express qualitative relationships among beliefs and to process these relationships in a way that yields intuitively plausible conclusions."
• "The fortunate match between human intuition and the laws of proportions is not a coincidence. It came about because beliefs are not formed in a vacuum but rather as a distillation of sensory experiences. For reasons of storage economy and generality we forget the actual experiences and retain their mental impressions in the forms of averages, weights, or (more vividly) abstract qualitative relationships that help us determine future actions."
• "We [view] a belief network not merely as a passive code for storing factual knowledge but also as a computational architecture for reasoning about that knowledge. This means that the links in the network should be treated as the only mechanisms that direct and propel the flow of data through the process of querying and updating beliefs."
  – exploits independencies in the network
  – leads to a conceptually meaningful belief revision process
  – separates control from knowledge representation
  – natural object-oriented implementation
• Probability updating seems to resist local constraint propagation approaches because of the need to keep track of the sources of belief change.
• Purely local and distributed updating is possible in singly connected networks.

The Inference (aka "Belief Propagation") Problem
• Functions the inference algorithm performs:
  – Query a node for its current probabilities
  – Declare evidence about nodes in the network
  – Update the current probabilities for nodes given the evidence
[Figure: an example Bayesian network with nodes A through J; the query is "Belief in D?" and evidence is declared about H]
• Updating should conform to Bayes Rule. That is, if we learn that Xe = xe, the updated belief in Xt must satisfy
  P(Xt = xt | Xe = xe) = P(Xt = xt, Xe = xe) / P(Xe = xe)
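As a concrete illustration of this formula, here is a minimal sketch that conditions a small made-up joint table on observed evidence by selecting the consistent entries and renormalizing (the states and numbers are hypothetical, not from the slides):

```python
# A made-up joint distribution P(Xt, Xe) over a binary target Xt and a binary
# evidence variable Xe.  The numbers are illustrative only.
joint = {
    ("t0", "e0"): 0.30, ("t0", "e1"): 0.20,
    ("t1", "e0"): 0.10, ("t1", "e1"): 0.40,
}

evidence = "e1"                              # we learn that Xe = e1

# P(Xe = e1): the normalizing constant in Bayes Rule
p_evidence = sum(p for (t, e), p in joint.items() if e == evidence)

# P(Xt = xt | Xe = e1) = P(Xt = xt, Xe = e1) / P(Xe = e1)
posterior = {t: p / p_evidence
             for (t, e), p in joint.items() if e == evidence}

print(posterior)   # {'t0': 0.333..., 't1': 0.666...}
```

The inference algorithms in this unit compute exactly these conditional probabilities, but without ever materializing the full joint table.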
Some General Points on Inference in Graphical Models
• Inference is NP-hard (Cooper, 1988)
  – NP is a class of decision problems for which instances with "yes" answers have proofs that can be verified in polynomial time (though such proofs cannot necessarily be found in polynomial time)
  – NP-hard is a class of problems at least as hard as the hardest problems in NP
    » Formally, H is NP-hard if any problem in NP can be reduced in polynomial time to H
  – If any NP-hard problem could be solved in polynomial time, then any problem in NP could be as well
  – No such algorithm has yet been found, and most people believe there is none
• Although general inference is intractable, tractable algorithms are available for certain types of networks
  – Singly connected networks
  – Some sparsely connected networks
  – Networks amenable to various approximation algorithms
• Goal of the expert system builder: find a representation which is a good approximation to the expert's probability distribution and has good computational properties
• Goal of the machine learning algorithm designer: search over the space of tractable models for a model which is a good approximation to the training sample
• Goal of the inference algorithm designer: find an algorithm that is tractable and provides an accurate approximation to the query of interest

Pearl's Algorithm for Singly Connected BNs
• Goal: compute the probability distribution of a random variable B given evidence (assume B itself is not known)
• Key idea: the impact on belief in B from evidence "above" B and evidence "below" B can be processed separately
• Justification: B d-separates the random variables above it from the random variables below it
[Figure: a singly connected network with random variables A1–A6 above B and D1–D7 below B; shaded nodes mark evidence random variables; π summarizes evidence arriving from above B and λ summarizes evidence arriving from below B]
• This picture depicts the updating process for one node.
• The algorithm simultaneously updates beliefs for all the nodes.
• Can you find the above and below subsets for the other nodes in this diagram?
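To make the above/below partition concrete, here is a small sketch on a hypothetical polytree (the node names and structure are invented, not the slide's actual figure). A node is below B exactly when its unique chain to B leaves B through one of B's children, as formalized in the next slide:

```python
# A tiny hypothetical polytree (singly connected BN), given as parent lists.
# A-nodes sit above the query node B and D-nodes sit below it.
parents = {
    "A1": [], "A3": [], "A2": ["A1"], "B": ["A2", "A3"],
    "D1": ["B"], "D2": ["B"], "D3": ["D1"],
}

# Derive child lists from the parent lists.
children = {n: [] for n in parents}
for node, ps in parents.items():
    for p in ps:
        children[p].append(node)

def below_nodes(b):
    """Nodes whose unique chain to b passes through one of b's children."""
    below = set()
    for child in children[b]:
        stack, seen = [child], {b, child}
        while stack:                       # walk the tree away from b
            n = stack.pop()
            below.add(n)
            for m in parents[n] + children[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
    return below

print(below_nodes("B"))   # {'D1', 'D2', 'D3'}; every other node is "above" B
```

Because the network is singly connected, each node is reached by exactly one chain from B, so the partition is unambiguous; in a multiply connected network the same node could be reached both through a child and through a parent of B.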
Pearl's Algorithm (continued)
• Partition the evidence random variables into:
  – D = evidence variables "below" B
  – A = evidence variables "above" B
• X is below B if there is a chain between it and B that goes through a child of B:
  – descendants of B
  – other ancestors of descendants of B
• X is above B if it is not below B:
  – ancestors of B
  – other descendants of ancestors of B
• Note: the partitioning into above and below cannot be done if the network is not singly connected.
• General formula for the joint probability:
  – P(A, B, D) = P(A) P(B | A) P(D | B, A)
• In a singly connected network:
  – A is independent of D given B
  – Therefore P(A, B, D) = P(A) P(B | A) P(D | B)
  – From Bayes Rule we know P(B | A=a, D=d) is proportional to P(A=a, B, D=d)
  – P(B | A=a, D=d) ∝ P(B | A=a) P(D=d | B)
• The algorithm maintains, for each node:
  – π(B) = P(B | A=a)  (a is the instantiated value of A)
  – λ(B) = P(D=d | B)  (d is the instantiated value of D)
  – P'(B) = P(B | A=a, D=d)
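As a small illustration of how these quantities combine for a single node, here is a sketch for a two-state node B (the π and λ values are invented for illustration): P'(B) is obtained by multiplying π(B) and λ(B) elementwise and renormalizing.

```python
# Hypothetical two-state node B with states (b0, b1); the numbers below are
# invented for illustration only.
pi_B  = [0.7, 0.3]   # pi(B)     = P(B | evidence above B)
lam_B = [0.2, 0.9]   # lambda(B) = P(evidence below B | B)

# Fuse the two messages and renormalize to obtain the posterior P'(B).
unnormalized = [p * l for p, l in zip(pi_B, lam_B)]
z = sum(unnormalized)                       # normalizing constant
posterior = [u / z for u in unnormalized]

print(posterior)   # [0.341..., 0.658...]
```

The full algorithm computes π and λ for every node by passing messages along the links of the singly connected network, so each node can perform exactly this local fusion.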