
Lecture 3 – Bayesian Graphical Models


Riccardo Sven Risuleo
Division of Systems and Control, Department of Information Technology, Uppsala University

[email protected] www.it.uu.se/katalog/ricri923

Summary of Lecture 2 (I)

Bayesian linear regression model

y_n = wᵀ x_n + ε_n,   ε_n ∼ N(0, σ²),   n = 1, ..., N
w ∼ p(w).

Present assumptions:

1. y_n – observed random variable.
2. w – unknown random variable (in SML it was treated as an unknown deterministic variable; this is the difference).
3. x_n – known deterministic variable.
4. ε_n – unknown random variable.
5. σ – known deterministic variable.

Summary of Lecture 2 (II)

Remember Bayes’ theorem

p(w | y) = p(w, y) / p(y) = p(y | w) p(w) / p(y)

• Prior distribution: p(w) describes the knowledge we have about w before observing any data.
• Likelihood: p(y | w) describes how “likely” the observed data is for a particular parameter value.
• Posterior distribution: p(w | y) summarizes all our knowledge about w from the observed data and the model.

In Bayesian linear regression we use a Gaussian distribution as prior: p(w) = N(w; m_0, Σ_0)

Summary of Lecture 2 (III)

[Diagram: Gaussian manipulations from Lecture 2. Theorems 1–3 connect the joint p(x_a, x_b) with the marginal p(x_a) and the conditional p(x_b | x_a); Corollary 1 (= Thm 3 + Thm 2) and Corollary 2 (= Thm 3 + Thm 1) give the marginal p(x_b) and the conditional p(x_a | x_b).]

Summary of Lecture 2 (IV)

Plot of the situation after one measurement has arrived.

Prior: p(w) = N(w | m_0, S_0)
Likelihood: p(y_1 | w) = N(y_1 | w_0 + w_1 x_1, β⁻¹)
Posterior/prior: p(w | y_1) = N(w | m_1, S_1), with
    m_1 = β S_1 X_1ᵀ y_1,
    S_1 = (α I_2 + β X_1ᵀ X_1)⁻¹.

[Plots: prior, likelihood, and posterior over (w_0, w_1); in the (x, y) plane, a few realizations from the posterior together with the first measurement (black circle).]
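To make the update concrete, here is a minimal numerical sketch of the one-measurement posterior; the values of α, β, x_1, y_1 below are made up for illustration.

```python
import numpy as np

# Posterior update in Bayesian linear regression after one measurement.
# alpha: prior precision, beta: noise precision; the numbers are illustrative only.
alpha, beta = 2.0, 25.0
x1, y1 = 0.9, 0.1                       # single observed input/output pair (made up)

X1 = np.array([[1.0, x1]])              # design row [1, x1] for the model y = w0 + w1*x
S1 = np.linalg.inv(alpha * np.eye(2) + beta * X1.T @ X1)   # posterior covariance S_1
m1 = beta * S1 @ X1.T @ np.array([y1])                     # posterior mean m_1

# A few realizations of (w0, w1) from the posterior N(w | m_1, S_1)
w_samples = np.random.default_rng(0).multivariate_normal(m1, S1, size=5)
print("m1 =", m1)
print("S1 =", S1)
print(w_samples)
```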

Contents

Bayesian Graphical Models
• Why graphical models?
• Types of Graphical Models
Bayesian Networks
• Factorization of the Joint Distribution
• How to build BNs
• Examples
Generative Models
Independence in BNs
• Basic structures
• D-separation
• Markov blanket
Exact Inference

Bayesian Graphical Models

Why graphical models?

“Graphical models bring together graph theory and probability theory in a powerful formalism for multivariate statistical modeling.1”

Augment algebraic manipulations with graph tools for
• aiding visualization
• inferring model structure
• structuring computations (e.g. message passing)

Just a different representation! The model is not changed!

1. Wainwright and Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.

Types of Graphical Models

Three types of graphs
1. Bayesian Networks: represent dependencies between variables using a directed acyclic graph (DAG).
2. Markov Random Fields: represent Markovian dependencies between variables using an undirected graph.
3. Factor Graphs: represent both variables and the relationships between variables (can represent both BNs and MRFs).

Bayesian Networks

Bayesian Networks: Notation (I)

Two components2
• Random variable nodes: represent random variables in the model.
• Dependency edges: arrows from conditioning variables toward conditional variables.

A BN describes the dependency structure, not the distributions of the variables

2. Bipartite graph.

The BN is a factorization of the joint distribution

From BN to joint distribution

[DAG with edges a → b, a → c, d → b, b → e, c → e, c → f, e → f]

p(a, b, c, d, e, f) = p(f | c, e) p(e | b, c) p(b | a, d) p(c | a) p(a) p(d)

The factorization of the joint distribution into conditionals is given by the structure of the BN.
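As a toy illustration of this point, the joint over six binary variables can be evaluated as a product of local conditionals; the probability tables below are invented, only the factorization mirrors the BN above.

```python
import itertools

# Toy evaluation of the factorized joint for binary a, b, c, d, e, f.
# All probability tables are invented; only the factorization mirrors the BN above.
def bern(p, value):
    """P(X = value) for a Bernoulli(p) variable."""
    return p if value == 1 else 1 - p

def joint(a, b, c, d, e, f):
    return (bern(0.3, a) * bern(0.6, d)                 # p(a) p(d)
            * bern(0.2 + 0.5 * a, c)                    # p(c | a)
            * bern(0.1 + 0.4 * a + 0.3 * d, b)          # p(b | a, d)
            * bern(0.2 + 0.3 * b + 0.4 * c, e)          # p(e | b, c)
            * bern(0.1 + 0.5 * c + 0.3 * e, f))         # p(f | c, e)

# Sanity check: the joint sums to one over all 2^6 configurations.
print(sum(joint(*cfg) for cfg in itertools.product([0, 1], repeat=6)))
```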

Factorization of the Joint Distribution

From joint distribution to BN

p(a, b, c, d) = p(a | b, c, d) p(b | c, d) p(c | d) p(d)

[DAG with edges d → c, d → b, d → a, c → b, c → a, b → a]

Any joint distribution has a representation as a BN. The representation is not unique!

How to build a BN

Pearl’s network construction algorithm
1. Choose a set of variables to describe the domain
2. Choose an ordering of the variables
3. While there are variables left:
• Add the next variable to the graph
• Add edges to the new variable from a minimal set of nodes already in the graph, such that the added variable is conditionally independent of the rest of the graph

Variable ordering matters! The BN is not unique!

Bayesian Networks: Notation (II)

Additional useful notation
• Observed variable nodes: represent conditioning random variables (with a known value).
• Plates: represent repeated parts of the graph (e.g. x_i for i = 1, ..., N).
• Labels: represent quantities that are not random (mostly used for …).

Examples (I)

Predicting blood disease from gene expression profiles3

3. Agrahari et al., “Applications of Bayesian network models in predicting types of hematological malignancies,” Scientific Reports, vol. 8, no. 6951, 2018.

Examples (II)

Inferring relationships between stock prices in S&P 5004

4. Conrady and Jouffe, “Knowledge Discovery in the Stock Market: Supervised and … with BayesiaLab,” Technical report, Bayesia, June 2013.

Dependency ≠ Causality!

A = {it rains}, B = {I take the umbrella}
• Factorized density: p(A, B) = p(A | B) p(B)
• Bayesian network: B → A

Do not confuse conditional dependency with causality!

https://xkcd.com/552/

Generative Models


How do we generate samples from this distribution?

p(x) = (1/√(2π)) e^{−(x+1)²/2}

How do we generate samples from this distribution?

p(x) = (1/(3√(2π))) e^{−(x+1)²/2} + (2/(3√(2π))) e^{−(x−1)²/2}

See it as the sum of two normal distributions:
1. Choose one component
2. Draw from that component

[Plot of the mixture density p(x)]

Ancestral sampling: sample in order in a BN
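A minimal sketch of this two-step procedure for the mixture above (weights 1/3 and 2/3, means −1 and +1, unit variances):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([1 / 3, 2 / 3])   # mixture weights from the density above
means = np.array([-1.0, 1.0])        # component means; both components have unit variance

def sample_mixture(n):
    # Step 1: choose a component for each sample; Step 2: draw from that component.
    components = rng.choice(2, size=n, p=weights)
    return rng.normal(loc=means[components], scale=1.0)

print(sample_mixture(5))
```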

BNs as Generative Models

[BN with nodes x_1, ..., x_6]

Ancestral sampling
• Start from the non-conditioned nodes
• Sample once all conditioning nodes are given
• Collect the samples

WARNING: cannot be directly used when we have observed nodes! In that case, we use other sampling methods (Lecture 4)

[Figure: a BN over nodes X_1, ..., X_7]
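A sketch of ancestral sampling on a small binary BN; the graph and the conditional probabilities below are invented for illustration, and each node is sampled only after its parents.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented binary BN: x1 and x2 are roots, x3 depends on (x1, x2), x4 depends on x3.
# Each entry maps a node to (parents, function from parent values to P(node = 1)).
bn = {
    "x1": ([], lambda: 0.4),
    "x2": ([], lambda: 0.7),
    "x3": (["x1", "x2"], lambda x1, x2: 0.1 + 0.4 * x1 + 0.3 * x2),
    "x4": (["x3"], lambda x3: 0.2 + 0.6 * x3),
}

def ancestral_sample():
    sample = {}
    for node, (parents, prob) in bn.items():   # dict order is a topological order here
        p = prob(*(sample[par] for par in parents))
        sample[node] = int(rng.random() < p)
    return sample

print(ancestral_sample())
```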

Independence in BNs

Example I: Bayesian Linear Regression

• Observations y_{1:N}
• Linear model: y_n = x_nᵀ w + ν_n
• ν_n ∼ N(0, R): i.i.d. noise
• w ∼ N(0, Σ): prior
• The joint density is given by
p(y_{1:N}, w) = p(y_{1:N} | w) p(w) = p(w) ∏_{n=1}^{N} p(y_n | w)

[BN: w with arrows to y_1, y_2, ..., y_N]

Why is it that p(y_{1:N} | w) = ∏_{n=1}^{N} p(y_n | w)?

Example II: “Explaining away”

Fuel system of a car:
• Battery is charged (B = 1) or flat (B = 0)
• Fuel tank is full (F = 1) or empty (F = 0)
• Fuel gauge indicates full (G = 1) or empty (G = 0)

p(B = 1) = 0.9    p(G = 1 | B = 1, F = 1) = 0.8
p(F = 1) = 0.9    p(G = 1 | B = 1, F = 0) = 0.2
                  p(G = 1 | B = 0, F = 1) = 0.2
                  p(G = 1 | B = 0, F = 0) = 0.1

We have
• p(F = 0) = 0.1
• p(F = 0 | G = 0) ≈ 0.257
• p(F = 0 | G = 0, B = 0) ≈ 0.111

[BN: B → G ← F]

Why is p(F = 0 | G = 0) ≠ p(F = 0 | G = 0, B = 0)?

Exercise: compute the probabilities!
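To check your answer, here is a brute-force sketch that reproduces the two numbers quoted above by marginalizing the tables on this slide.

```python
# Brute-force check of the "explaining away" probabilities from the tables above.
pB = {1: 0.9, 0: 0.1}
pF = {1: 0.9, 0: 0.1}
pG1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}   # p(G = 1 | B, F)

def joint(b, f, g):
    pg = pG1[(b, f)] if g == 1 else 1 - pG1[(b, f)]
    return pB[b] * pF[f] * pg

# p(F = 0 | G = 0) = p(F = 0, G = 0) / p(G = 0)
num = sum(joint(b, 0, 0) for b in (0, 1))
den = sum(joint(b, f, 0) for b in (0, 1) for f in (0, 1))
print(num / den)                                              # approximately 0.257

# p(F = 0 | G = 0, B = 0) = p(B = 0, F = 0, G = 0) / p(B = 0, G = 0)
print(joint(0, 0, 0) / sum(joint(0, f, 0) for f in (0, 1)))   # approximately 0.111
```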

Independence in BNs (I)

p(a, b | c) = p(a, b, c) / p(c) = p(a | c) p(b | c) p(c) / p(c) = p(a | c) p(b | c)

[BN: a ← c → b]

⇒ a ⊥⊥ b | c

Tail-to-tail: the two end nodes are independent if the node between them is observed.

Independence in BNs (II)

p(a, b) = ∫ p(a, b, c) dc

= ∫ p(c | a, b) p(a) p(b) dc = p(a) p(b)

[BN: a → c ← b]

⇒ a ⊥⊥ b | ∅

Head-to-head: the two end nodes are independent if the node between them is not observed.

Independence in BNs (III)

p(a, b | c) = p(a, b, c) / p(c) = p(b | c) p(c | a) p(a) / p(c) = p(b | c) p(a | c)

[BN: a → c → b]

⇒ a ⊥⊥ b | c

Head-to-tail: the two end nodes are independent if the node between them is observed.

D-separation (I)

Goal: Deduce dependencies/independencies between variables directly from the graph!

The previous examples give the blocked paths:
• observed tail-to-tail
• not observed head-to-head with not observed descendants
• observed head-to-tail

Are a and b independent?

[Figure: two versions of a BN over nodes a, f, e, b, c with different observed nodes; in the left case a and b are not independent (No), in the right case they are (Yes).]

Another example

[BN with edges a → c, b → c, c → d, b → d]

Is a ⊥⊥ d | c?
• a → c → d is blocked
• a → c ← b → d is not blocked!

What if we also observe b? Then a ⊥⊥ d | {c, b}.
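Such statements can also be checked programmatically; here is a sketch of the example above, assuming a networkx version that provides d-separation queries (nx.d_separated, available from networkx 2.8).

```python
import networkx as nx

# The graph from the example above: a -> c, b -> c, c -> d, b -> d.
G = nx.DiGraph([("a", "c"), ("b", "c"), ("c", "d"), ("b", "d")])

# Is a independent of d given c?  No: the path a -> c <- b -> d is not blocked.
print(nx.d_separated(G, {"a"}, {"d"}, {"c"}))         # False

# Also observing b blocks that path, so a is independent of d given {c, b}.
print(nx.d_separated(G, {"a"}, {"d"}, {"c", "b"}))    # True
```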

Predictive distribution in BLR

Bayesian Linear Regression (from Lecture 2). The predictive distribution is
p(y_* | x_*, y, X) = ∫ p(y_* | x_*, w) p(w | y, X) dw

But why?

[BN: w with arrows to y_1, ..., y_N and to y_*; each y_n has parent x_n, and y_* has parent x_*]

p(y_*, w | x_*, y, X) = p(y_* | x_*, w, y, X) p(w | x_*, y, X) = p(y_* | x_*, w) p(w | y, X)
(y and X drop out of the first factor, and x_* out of the second, by the conditional independencies in the graph.)
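With a Gaussian posterior N(w | m_N, S_N) and known noise precision β, the integral gives another Gaussian; a short sketch with made-up values for m_N, S_N, and β:

```python
import numpy as np

# Predictive distribution in Bayesian linear regression, reusing the posterior
# mean m_N and covariance S_N; all numbers here are illustrative only.
beta = 25.0                                    # known noise precision (assumed)
m_N = np.array([0.2, 0.6])                     # posterior mean over (w0, w1), made up
S_N = np.array([[0.05, 0.01], [0.01, 0.08]])   # posterior covariance, made up

x_star = 0.5
phi = np.array([1.0, x_star])                  # features [1, x*] for y = w0 + w1*x

pred_mean = phi @ m_N                          # predictive mean
pred_var = 1.0 / beta + phi @ S_N @ phi        # predictive variance
print(pred_mean, pred_var)
```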

D-separation (II)

D-separation
If A, B, and C are non-intersecting sets of nodes, we have that
A ⊥⊥ B | C
if every possible path from any node in A to any node in B is blocked, i.e., contains at least one node that is either
1. tail-to-tail or head-to-tail and in C, or
2. head-to-head with neither itself nor any of its descendants in C.

• We need to check all paths in the graph!
• Linear-time algorithms exist5

5. Shachter, “Bayes-ball: The rational pastime (for determining irrelevance and requisite information in belief networks and influence diagrams),” arXiv preprint arXiv:1301.7412, 2013.

Markov blanket

The Markov blanket is the set of nodes that shield a node from the rest of the network.

[Figure: a node X and its Markov blanket in a BN]

Markov blanket (in BNs): the Markov blanket M of a node X is the set of its parents, children, and co-parents. For any node Y in the network,

p(X | M, Y) = p(X | M)
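A small sketch that collects the Markov blanket (parents, children, and co-parents) of a node in a DAG; the example graph is invented.

```python
import networkx as nx

def markov_blanket(G, node):
    """Parents, children, and co-parents (other parents of the node's children)."""
    parents = set(G.predecessors(node))
    children = set(G.successors(node))
    coparents = {p for child in children for p in G.predecessors(child)} - {node}
    return parents | children | coparents

# Invented example DAG.
G = nx.DiGraph([("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"), ("x", "z")])
print(markov_blanket(G, "x"))   # {'a', 'b', 'c', 'y', 'z'}
```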

Markov blanket for diagnosis of blood cancer

[Figure: BN learned from data, and Markov blanket of Acute Myeloid Leukemia vs. Myelodysplastic Syndrome. See Agrahari et al. (2018).]

Exact Inference

Bayes’ Theorem: the BN picture

Ingredients
• latent variable x
• observation y (dependent on x)
Objective: infer p(x | y)

p(x | y) = p(y | x) p(x) / p(y)

[Three copies of the BN x → y: the model, the data (y observed), and the result (inference on x)]

How does this reasoning generalize to other graphs?

Exact inference on chains

[Chain BN: x_1 → x_2 → x_3 → x_4]

p(x_1, x_2, x_3, x_4) = p(x_1) p(x_2 | x_1) p(x_3 | x_2) p(x_4 | x_3)

How can we compute p(x_3 | x_4)?

p(x_3 | x_4) = p(x_3, x_4) / p(x_4)

Computing the marginal

[Chain BN: x_1 → x_2 → x_3 → x_4]

p(x_4) = Σ_{x_1:3} p(x_1) p(x_2 | x_1) p(x_3 | x_2) p(x_4 | x_3)
       = Σ_{x_2:3} [ Σ_{x_1} p(x_1) p(x_2 | x_1) ] p(x_3 | x_2) p(x_4 | x_3)     (the bracket defines μ_2(x_2))
       = Σ_{x_3} [ Σ_{x_2} μ_2(x_2) p(x_3 | x_2) ] p(x_4 | x_3)                  (the bracket defines μ_3(x_3))
       = Σ_{x_3} μ_3(x_3) p(x_4 | x_3) = μ_4(x_4)
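A numerical sketch of this message-passing computation on a chain of discrete variables; the transition tables are randomly invented, only the sum-product structure matters.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                        # number of states per variable (illustrative)

# Invented distributions: p(x1) and row-stochastic conditionals p(x_{k+1} | x_k).
p_x1 = np.full(K, 1 / K)
def random_conditional():
    A = rng.random((K, K))
    return A / A.sum(axis=1, keepdims=True)  # A[i, j] = p(next = j | previous = i)
p21, p32, p43 = (random_conditional() for _ in range(3))

# Forward messages: mu2(x2) = sum_x1 p(x1) p(x2 | x1), and so on down the chain.
mu2 = p_x1 @ p21
mu3 = mu2 @ p32
mu4 = mu3 @ p43                              # this is the marginal p(x4)

# Brute-force check against summing the full joint p(x1, x2, x3, x4).
joint = np.einsum("a,ab,bc,cd->abcd", p_x1, p21, p32, p43)
print(np.allclose(mu4, joint.sum(axis=(0, 1, 2))))   # True
```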

Computing the posterior

We can reuse messages!
p(x_3, x_4) = Σ_{x_1:2} p(x_1) p(x_2 | x_1) p(x_3 | x_2) p(x_4 | x_3)
            = [ Σ_{x_2} [ Σ_{x_1} p(x_1) p(x_2 | x_1) ] p(x_3 | x_2) ] p(x_4 | x_3)     (inner bracket: μ_2(x_2); outer bracket: μ_3(x_3))
            = μ_3(x_3) p(x_4 | x_3)

p(x_3 | x_4) = μ_3(x_3) p(x_4 | x_3) / μ_4(x_4)

Exercise: try to compute p(x_2 | x_4)

Conclusions from the day

Graphical models allow us to reason about probabilistic models using tools of graph theory

Different graphical models reflect different properties

Bayesian networks show conditional dependency structures

D-separation can be used to find conditionally independent sets of variables

Message-passing can be used to do inference in chain graphs

What if the inference is not so simple?
• Monte Carlo methods (Lecture 4)
• Message-passing in trees (Lecture 5)
• Variational inference (Lecture 6)
