
Lecture 3 – Bayesian Graphical Models


Riccardo Sven Risuleo
Division of Systems and Control, Department of Information Technology, Uppsala University

[email protected] www.it.uu.se/katalog/ricri923

Summary of Lecture 2 (I)

Bayesian linear regression model

y_n = wᵀ x_n + ε_n,   ε_n ∼ N(0, σ²),   n = 1, ..., N
w ∼ p(w).

Present assumptions:

1. y_n – observed random variable.
2. w – unknown random variable (in SML it was treated as an unknown deterministic variable; this is the difference).
3. x_n – known deterministic variable.
4. ε_n – unknown random variable.
5. σ – known deterministic variable.

Summary of Lecture 2 (II)

Remember Bayes’ theorem

p(w | y) = p(w, y) / p(y) = p(y | w) p(w) / p(y)

• Prior distribution: p(w) describes the knowledge we have about w before observing any data.
• Likelihood: p(y | w) describes how “likely” the observed data is for a particular parameter value.
• Posterior distribution: p(w | y) summarizes all our knowledge about w from the observed data and the model.

In Bayesian linear regression we use a Gaussian distribution as prior: p(w) = N(w; m_0, Σ_0)

Summary of Lecture 2 (III)

[Diagram: Gaussian manipulations from Lecture 2. Theorems 1–3 connect the joint p(x_a, x_b) with the marginal p(x_a) and the conditional p(x_b | x_a); Corollary 1 (= Thm 3 + Thm 2) and Corollary 2 (= Thm 3 + Thm 1) give the marginal p(x_b) and the conditional p(x_a | x_b).]

Summary of Lecture 2 (IV)

Plot of the situation after one measurement has arrived.

Prior: p(w) = N(w | m_0, S_0)
Likelihood: p(y_1 | w) = N(y_1 | w_0 + w_1 x_1, β⁻¹)
Posterior/prior: p(w | y_1) = N(w | m_1, S_1), with
    m_1 = β S_1 X_1ᵀ y_1,
    S_1 = (α I_2 + β X_1ᵀ X_1)⁻¹.

[Plots: prior, likelihood, and posterior over (w_0, w_1); in the (x, y) plane, a few realizations from the posterior together with the first measurement (black circle).]
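To make the update concrete, here is a minimal numerical sketch of the one-measurement posterior; the values of α, β, x_1, y_1 below are made up for illustration.

```python
import numpy as np

# Posterior update in Bayesian linear regression after one measurement.
# alpha: prior precision, beta: noise precision; the numbers are illustrative only.
alpha, beta = 2.0, 25.0
x1, y1 = 0.9, 0.1                       # single observed input/output pair (made up)

X1 = np.array([[1.0, x1]])              # design row [1, x1] for the model y = w0 + w1*x
S1 = np.linalg.inv(alpha * np.eye(2) + beta * X1.T @ X1)   # posterior covariance S_1
m1 = beta * S1 @ X1.T @ np.array([y1])                     # posterior mean m_1

# A few realizations of (w0, w1) from the posterior N(w | m_1, S_1)
w_samples = np.random.default_rng(0).multivariate_normal(m1, S1, size=5)
print("m1 =", m1)
print("S1 =", S1)
print(w_samples)
```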

Contents

Bayesian Graphical Models
• Why graphical models?
• Types of Graphical Models
Bayesian Networks
• Factorization of the Joint Distribution
• How to build BNs
• Examples
Generative Models
Independence in BNs
• Basic structures
• D-separation
• Markov blanket
Exact Inference

Bayesian Graphical Models

Why graphical models?

“Graphical models bring together graph theory and probability theory in a powerful formalism for multivariate statistical modeling.1”

Augment algebraic manipulations with graph tools for
• aiding visualization
• inferring model structure
• structuring computations (e.g. message passing)

Just a different representation! The model is not changed!

1. Wainwright and Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.

Types of Graphical Models

Three types of graphs
1. Bayesian Networks: represent dependencies between variables using a directed acyclic graph (DAG).
2. Markov Random Fields: represent Markovian dependencies between variables using an undirected graph.
3. Factor Graphs: represent both variables and the relationships between variables (can represent both BNs and MRFs).

Bayesian Networks

Bayesian Networks: Notation (I)

Two components2
• Random variable nodes: represent random variables in the model.
• Dependency edges: arrows from conditioning variables toward conditional variables.

A BN describes the dependency structure, not the distributions of the variables

2. Bipartite graph.

The BN is a factorization of the joint distribution

From BN to joint distribution

[DAG with edges a → b, a → c, d → b, b → e, c → e, c → f, e → f]

p(a, b, c, d, e, f) = p(f | c, e) p(e | b, c) p(b | a, d) p(c | a) p(a) p(d)

The factorization of the joint distribution into conditionals is given by the structure of the BN.
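As a toy illustration of this point, the joint over six binary variables can be evaluated as a product of local conditionals; the probability tables below are invented, only the factorization mirrors the BN above.

```python
import itertools

# Toy evaluation of the factorized joint for binary a, b, c, d, e, f.
# All probability tables are invented; only the factorization mirrors the BN above.
def bern(p, value):
    """P(X = value) for a Bernoulli(p) variable."""
    return p if value == 1 else 1 - p

def joint(a, b, c, d, e, f):
    return (bern(0.3, a) * bern(0.6, d)                 # p(a) p(d)
            * bern(0.2 + 0.5 * a, c)                    # p(c | a)
            * bern(0.1 + 0.4 * a + 0.3 * d, b)          # p(b | a, d)
            * bern(0.2 + 0.3 * b + 0.4 * c, e)          # p(e | b, c)
            * bern(0.1 + 0.5 * c + 0.3 * e, f))         # p(f | c, e)

# Sanity check: the joint sums to one over all 2^6 configurations.
print(sum(joint(*cfg) for cfg in itertools.product([0, 1], repeat=6)))
```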

Factorization of the Joint Distribution

From joint distribution to BN

p(a, b, c, d) = p(a | b, c, d) p(b | c, d) p(c | d) p(d)

[DAG with edges d → c, d → b, d → a, c → b, c → a, b → a]

Any joint distribution has a representation as a BN. The representation is not unique!

How to build a BN

Pearl’s network construction algorithm
1. Choose a set of variables to describe the domain
2. Choose an ordering of the variables
3. While there are variables left:
• Add the next variable to the graph
• Add edges to the new variable from a minimal set of nodes already in the graph, such that the added variable is conditionally independent of the rest of the graph

Variable ordering matters! The BN is not unique!

Bayesian Networks: Notation (II)

Additional useful notation
• Observed variable nodes: represent conditioning random variables (with a known value).
• Plates: represent repeated parts of the graph (e.g. x_i for i = 1, ..., N).
• Labels: represent quantities that are not random (mostly used for …).

Examples (I)

Predicting blood disease from gene expression profiles3

3. Agrahari et al., “Applications of Bayesian network models in predicting types of hematological malignancies,” Scientific Reports, vol. 8, no. 6951, 2018.

Examples (II)

Inferring relationships between stock prices in S&P 5004

4. Conrady and Jouffe, “Knowledge Discovery in the Stock Market: Supervised and … with BayesiaLab,” Technical report, Bayesia, June 2013.

Dependency ≠ Causality!

A = {it rains}, B = {I take the umbrella}
• Factorized density: p(A, B) = p(A | B) p(B)
• Bayesian network: B → A

Do not confuse conditional dependency with causality!

https://xkcd.com/552/

Generative Models


How do we generate samples from this distribution?

p(x) = (1/√(2π)) e^{−(x+1)²/2}

How do we generate samples from this distribution?

p(x) = (1/(3√(2π))) e^{−(x+1)²/2} + (2/(3√(2π))) e^{−(x−1)²/2}

See it as the sum of two normal distributions:
1. Choose one component
2. Draw from that component

[Plot of the mixture density p(x)]

Ancestral sampling: sample in order in a BN
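A minimal sketch of this two-step procedure for the mixture above (weights 1/3 and 2/3, means −1 and +1, unit variances):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([1 / 3, 2 / 3])   # mixture weights from the density above
means = np.array([-1.0, 1.0])        # component means; both components have unit variance

def sample_mixture(n):
    # Step 1: choose a component for each sample; Step 2: draw from that component.
    components = rng.choice(2, size=n, p=weights)
    return rng.normal(loc=means[components], scale=1.0)

print(sample_mixture(5))
```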

BNs as Generative Models

[BN with nodes x_1, ..., x_6]

Ancestral sampling
• Start from the non-conditioned nodes
• Sample once all conditioning nodes are given
• Collect the samples

WARNING: cannot be directly used when we have observed nodes! In that case, we use other sampling methods (Lecture 4)

[Figure: a BN over nodes X_1, ..., X_7]
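A sketch of ancestral sampling on a small binary BN; the graph and the conditional probabilities below are invented for illustration, and each node is sampled only after its parents.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented binary BN: x1 and x2 are roots, x3 depends on (x1, x2), x4 depends on x3.
# Each entry maps a node to (parents, function from parent values to P(node = 1)).
bn = {
    "x1": ([], lambda: 0.4),
    "x2": ([], lambda: 0.7),
    "x3": (["x1", "x2"], lambda x1, x2: 0.1 + 0.4 * x1 + 0.3 * x2),
    "x4": (["x3"], lambda x3: 0.2 + 0.6 * x3),
}

def ancestral_sample():
    sample = {}
    for node, (parents, prob) in bn.items():   # dict order is a topological order here
        p = prob(*(sample[par] for par in parents))
        sample[node] = int(rng.random() < p)
    return sample

print(ancestral_sample())
```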

Independence in BNs

Example I: Bayesian Linear Regression

• Observations y_{1:N}
• Linear model: y_n = x_nᵀ w + ν_n
• ν_n ∼ N(0, R): i.i.d. noise
• w ∼ N(0, Σ): prior
• The joint density is given by
p(y_{1:N}, w) = p(y_{1:N} | w) p(w) = p(w) ∏_{n=1}^{N} p(y_n | w)

[BN: w with arrows to y_1, y_2, ..., y_N]

Why is it that p(y_{1:N} | w) = ∏_{n=1}^{N} p(y_n | w)?

Example II: “Explaining away”

Fuel system of a car:
• Battery is charged (B = 1) or flat (B = 0)
• Fuel tank is full (F = 1) or empty (F = 0)
• Fuel gauge indicates full (G = 1) or empty (G = 0)

p(B = 1) = 0.9    p(G = 1 | B = 1, F = 1) = 0.8
p(F = 1) = 0.9    p(G = 1 | B = 1, F = 0) = 0.2
                  p(G = 1 | B = 0, F = 1) = 0.2
                  p(G = 1 | B = 0, F = 0) = 0.1

We have
• p(F = 0) = 0.1
• p(F = 0 | G = 0) ≈ 0.257
• p(F = 0 | G = 0, B = 0) ≈ 0.111

[BN: B → G ← F]

Why is p(F = 0 | G = 0) ≠ p(F = 0 | G = 0, B = 0)?

Exercise: compute the probabilities!
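To check your answer, here is a brute-force sketch that reproduces the two numbers quoted above by marginalizing the tables on this slide.

```python
# Brute-force check of the "explaining away" probabilities from the tables above.
pB = {1: 0.9, 0: 0.1}
pF = {1: 0.9, 0: 0.1}
pG1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}   # p(G = 1 | B, F)

def joint(b, f, g):
    pg = pG1[(b, f)] if g == 1 else 1 - pG1[(b, f)]
    return pB[b] * pF[f] * pg

# p(F = 0 | G = 0) = p(F = 0, G = 0) / p(G = 0)
num = sum(joint(b, 0, 0) for b in (0, 1))
den = sum(joint(b, f, 0) for b in (0, 1) for f in (0, 1))
print(num / den)                                              # approximately 0.257

# p(F = 0 | G = 0, B = 0) = p(B = 0, F = 0, G = 0) / p(B = 0, G = 0)
print(joint(0, 0, 0) / sum(joint(0, f, 0) for f in (0, 1)))   # approximately 0.111
```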

Independence in BNs (I)

p(a, b | c) = p(a, b, c) / p(c) = p(a | c) p(b | c) p(c) / p(c) = p(a | c) p(b | c)

[BN: a ← c → b]

⇒ a ⊥⊥ b | c

Tail-to-tail: the two end nodes are independent if the node between them is observed.

Independence in BNs (II)

p(a, b) = ∫ p(a, b, c) dc

= ∫ p(c | a, b) p(a) p(b) dc = p(a) p(b)

[BN: a → c ← b]

⇒ a ⊥⊥ b | ∅

Head-to-head: the two end nodes are independent if the node between them is not observed.

Independence in BNs (III)

p(a, b | c) = p(a, b, c) / p(c) = p(b | c) p(c | a) p(a) / p(c) = p(b | c) p(a | c)

[BN: a → c → b]

⇒ a ⊥⊥ b | c

Head-to-tail: the two end nodes are independent if the node between them is observed.

D-separation (I)

Goal: Deduce dependencies/independencies between variables directly from the graph!

The previous examples give the blocked paths:
• observed tail-to-tail
• not observed head-to-head with not observed descendants
• observed head-to-tail

Are a and b independent?

[Figure: two versions of a BN over nodes a, f, e, b, c with different observed nodes; in the left case a and b are not independent (No), in the right case they are (Yes).]

Another example

[BN with edges a → c, b → c, c → d, b → d]

Is a ⊥⊥ d | c?
• a → c → d is blocked
• a → c ← b → d is not blocked!

What if we also observe b? Then a ⊥⊥ d | {c, b}.
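Such statements can also be checked programmatically; here is a sketch of the example above, assuming a networkx version that provides d-separation queries (nx.d_separated, available from networkx 2.8).

```python
import networkx as nx

# The graph from the example above: a -> c, b -> c, c -> d, b -> d.
G = nx.DiGraph([("a", "c"), ("b", "c"), ("c", "d"), ("b", "d")])

# Is a independent of d given c?  No: the path a -> c <- b -> d is not blocked.
print(nx.d_separated(G, {"a"}, {"d"}, {"c"}))         # False

# Also observing b blocks that path, so a is independent of d given {c, b}.
print(nx.d_separated(G, {"a"}, {"d"}, {"c", "b"}))    # True
```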

Predictive distribution in BLR

Bayesian Linear Regression (from Lecture 2). The predictive distribution is
p(y_* | x_*, y, X) = ∫ p(y_* | x_*, w) p(w | y, X) dw

But why?

[BN: w with arrows to y_1, ..., y_N and to y_*; each y_n has parent x_n, and y_* has parent x_*]

p(y_*, w | x_*, y, X) = p(y_* | x_*, w, y, X) p(w | x_*, y, X) = p(y_* | x_*, w) p(w | y, X)
(y and X drop out of the first factor, and x_* out of the second, by the conditional independencies in the graph.)
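With a Gaussian posterior N(w | m_N, S_N) and known noise precision β, the integral gives another Gaussian; a short sketch with made-up values for m_N, S_N, and β:

```python
import numpy as np

# Predictive distribution in Bayesian linear regression, reusing the posterior
# mean m_N and covariance S_N; all numbers here are illustrative only.
beta = 25.0                                    # known noise precision (assumed)
m_N = np.array([0.2, 0.6])                     # posterior mean over (w0, w1), made up
S_N = np.array([[0.05, 0.01], [0.01, 0.08]])   # posterior covariance, made up

x_star = 0.5
phi = np.array([1.0, x_star])                  # features [1, x*] for y = w0 + w1*x

pred_mean = phi @ m_N                          # predictive mean
pred_var = 1.0 / beta + phi @ S_N @ phi        # predictive variance
print(pred_mean, pred_var)
```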

D-separation (II)

D-separation
If A, B, and C are non-intersecting sets of nodes, we have that
A ⊥⊥ B | C
if every possible path from any node in A to any node in B is blocked, i.e., contains at least one node that is either
1. tail-to-tail or head-to-tail and in C, or
2. head-to-head with neither itself nor any of its descendants in C.

• We need to check all paths in the graph!
• Linear-time algorithms exist5

5. Shachter, “Bayes-ball: The rational pastime (for determining irrelevance and requisite information in belief networks and influence diagrams),” arXiv preprint arXiv:1301.7412, 2013.

Markov blanket

The Markov blanket is the set of nodes that shield a node from the rest of the network.

[Figure: a node X and its Markov blanket in a BN]

Markov blanket (in BNs): the Markov blanket M of a node X is the set of its parents, children, and co-parents. For any node Y in the network,

p(X | M, Y) = p(X | M)
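A small sketch that collects the Markov blanket (parents, children, and co-parents) of a node in a DAG; the example graph is invented.

```python
import networkx as nx

def markov_blanket(G, node):
    """Parents, children, and co-parents (other parents of the node's children)."""
    parents = set(G.predecessors(node))
    children = set(G.successors(node))
    coparents = {p for child in children for p in G.predecessors(child)} - {node}
    return parents | children | coparents

# Invented example DAG.
G = nx.DiGraph([("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"), ("x", "z")])
print(markov_blanket(G, "x"))   # {'a', 'b', 'c', 'y', 'z'}
```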

Markov blanket for diagnosis of blood cancer

[Figure: BN learned from data, and Markov blanket of Acute Myeloid Leukemia vs. Myelodysplastic Syndrome. See Agrahari et al. (2018).]

Exact Inference

Bayes’ Theorem: the BN picture

Ingredients
• latent variable x
• observation y (dependent on x)
Objective: infer p(x | y)

p(x | y) = p(y | x) p(x) / p(y)

[Three copies of the BN x → y: the model, the data (y observed), and the result (inference on x)]

How does this reasoning generalize to other graphs?

Exact inference on chains

[Chain BN: x_1 → x_2 → x_3 → x_4]

p(x_1, x_2, x_3, x_4) = p(x_1) p(x_2 | x_1) p(x_3 | x_2) p(x_4 | x_3)

How can we compute p(x_3 | x_4)?

p(x_3 | x_4) = p(x_3, x_4) / p(x_4)

Computing the marginal

[Chain BN: x_1 → x_2 → x_3 → x_4]

p(x_4) = Σ_{x_1:3} p(x_1) p(x_2 | x_1) p(x_3 | x_2) p(x_4 | x_3)
       = Σ_{x_2:3} [ Σ_{x_1} p(x_1) p(x_2 | x_1) ] p(x_3 | x_2) p(x_4 | x_3)     (the bracket defines μ_2(x_2))
       = Σ_{x_3} [ Σ_{x_2} μ_2(x_2) p(x_3 | x_2) ] p(x_4 | x_3)                  (the bracket defines μ_3(x_3))
       = Σ_{x_3} μ_3(x_3) p(x_4 | x_3) = μ_4(x_4)
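A numerical sketch of this message-passing computation on a chain of discrete variables; the transition tables are randomly invented, only the sum-product structure matters.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                        # number of states per variable (illustrative)

# Invented distributions: p(x1) and row-stochastic conditionals p(x_{k+1} | x_k).
p_x1 = np.full(K, 1 / K)
def random_conditional():
    A = rng.random((K, K))
    return A / A.sum(axis=1, keepdims=True)  # A[i, j] = p(next = j | previous = i)
p21, p32, p43 = (random_conditional() for _ in range(3))

# Forward messages: mu2(x2) = sum_x1 p(x1) p(x2 | x1), and so on down the chain.
mu2 = p_x1 @ p21
mu3 = mu2 @ p32
mu4 = mu3 @ p43                              # this is the marginal p(x4)

# Brute-force check against summing the full joint p(x1, x2, x3, x4).
joint = np.einsum("a,ab,bc,cd->abcd", p_x1, p21, p32, p43)
print(np.allclose(mu4, joint.sum(axis=(0, 1, 2))))   # True
```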

Computing the posterior

We can reuse messages!
p(x_3, x_4) = Σ_{x_1:2} p(x_1) p(x_2 | x_1) p(x_3 | x_2) p(x_4 | x_3)
            = [ Σ_{x_2} [ Σ_{x_1} p(x_1) p(x_2 | x_1) ] p(x_3 | x_2) ] p(x_4 | x_3)     (inner bracket: μ_2(x_2); outer bracket: μ_3(x_3))
            = μ_3(x_3) p(x_4 | x_3)

p(x_3 | x_4) = μ_3(x_3) p(x_4 | x_3) / μ_4(x_4)

Exercise: try to compute p(x_2 | x_4)

Conclusions from the day

Graphical models allow us to reason about probabilistic models using tools of graph theory

Different graphical models reflect different properties

Bayesian networks show conditional dependency structures

D-separation can be used to find conditionally independent sets of variables

Message-passing can be used to do inference in chain graphs

What if the inference is not so simple?
• Monte Carlo methods (Lecture 4)
• Message-passing in trees (Lecture 5)
• Variational inference (Lecture 6)
