<<

School of Computer Science

The Belief Propagation (Sum-Product)

Probabilistic Graphical Models (10-708)

Lecture 5, Sep 31, 2007

Receptor A Receptor B X1 X2 Eric Xing X Kinase C 3 Kinase D X4 Kinase E X5

TF F X6 Reading: J-Chap 4 Gene G X 7 Gene H X8 1

From Elimination to Belief Propagation

z Recall that Induced dependency during marginalization is captured in elimination cliques

z Summation <-> elimination

z Intermediate term <-> elimination clique

B A B A A

C A A

C D A E F C D

E z Can this lead to an generic E E F inference algorithm? G H Eric Xing 2

1 GMs

Undirected tree: a Directed tree: all Poly tree: can have unique path between nodes except the root multiple parents any pair of nodes have exactly one parent We will come back to this later

Eric Xing 3

Equivalence of directed and undirected trees

z Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it

z A directed tree and the corresponding undirected tree make the same conditional independence assertions

z Parameterizations are essentially the same.

z Undirected tree:

z Directed tree:

z Equivalence:

z Evidence:?

Eric Xing 4

2 From elimination to message passing

z Recall ELIMINATION algorithm:

z Choose an ordering Z in which query node f is the final node

z Place all potentials on an active list

z Eliminate node i by removing all potentials containing i, take sum/product over xi. z Place the resultant factor back on the list

z For a TREE graph:

z Choose query node f as the root of the tree

z View tree as a directed tree with edges pointing towards from f

z Elimination ordering based on depth-first traversal

z Elimination of each node can be considered as message-passing (or Belief Propagation) directly along tree branches, rather than on some transformed graphs Æ thus, we can use the tree itself as a data-structure to do general inference!!

Eric Xing 5

The elimination algorithm

Procedure Initialize (G, Z) Procedure Normalization (φ∗) ∗ ∗ 1. Let Z1, . . . ,Zk be an ordering of Z 1. P(X|E)=φ (X)/∑xφ (X) such that Zi ≺ Zj iff i < j 2. Initialize F with the full the set of factors

Procedure Evidence (E)

1. for each i∈ΙE , Procedure Sum-Product-Eliminate-Var ( F =F ∪δ(Ei, ei) F, // Set of factors Procedure Sum-Product-Variable- Z // Variable to be eliminated ) Elimination (F, Z, ≺) 1. F ′←{φ ∈ F : Z ∈ Scope[φ]} 1. for i = 1, . . . , k 2. F ′′ ← F − F ′ F ← Sum-Product-Eliminate-Var(F, Zi) ∗ 3. ψ ←∏φ∈F ′ φ 2. φ ← ∏φ∈F φ 4. τ ← ∑ ψ 3. return φ∗ Z 5. return F ′′ ∪ {τ} 4. Normalization (φ∗)

Eric Xing 6

3 Message passing for trees

Let mij(xi) denote the factor resulting from f eliminating variables from bellow up to i, which is a function of xi:

This is reminiscent of a message sent i from j to i.

j

k l

mij(xi) represents a "belief" of xi from xj!

Eric Xing 7

z Elimination on trees is equivalent to message passing along tree branches! f

i

j

k l

Eric Xing 8

4 The message passing protocol:

z A node can send a message to its neighbors when (and only when) it has received messages from all its other neighbors.

z Computing node marginals:

z Naïve approach: consider each node as the root and execute the message passing algorithm

X m21(x1) 1 Computing P(X1)

m32(x2) m42(x2) X2

X3 X4

Eric Xing 9

The message passing protocol:

z A node can send a message to its neighbors when (and only when) it has received messages from all its other neighbors.

z Computing node marginals:

z Naïve approach: consider each node as the root and execute the message passing algorithm

X m12(x2) 1 Computing P(X2)

m32(x2)m42(x2) X2

X3 X4

Eric Xing 10

5 The message passing protocol:

z A node can send a message to its neighbors when (and only when) it has received messages from all its other neighbors.

z Computing node marginals:

z Naïve approach: consider each node as the root and execute the message passing algorithm

X m12(x2) 1 Computing P(X3)

m23(x3) m42(x2) X2

X3 X4

Eric Xing 11

Computing node marginals

z Naïve approach:

z Complexity: NC

z N is the number of nodes

z C is the complexity of a complete message passing

z Alternative dynamic programming approach

z 2-Pass algorithm (next slide Î)

z Complexity: 2C!

Eric Xing 12

6 The message passing protocol:

z A two-pass algorithm:

X1

m21(X 1) m12(X 2)

m32(X 2) X2 m42(X 2)

X4 X3 m24(X 4) m23(X 3)

Eric Xing 13

Belief Propagation (SP-algorithm): Sequential implementation

Eric Xing 14

7 Belief Propagation (SP-algorithm): Parallel synchronous implementation

z For a node of degree d, whenever messages have arrived on any subset of d-1 node, compute the message for the remaining edge and send! z A pair of messages have been computed for each edge, one for each direction z All incoming messages are eventually computed for each node Eric Xing 15

Correctness of BP on tree

z Collollary: the synchronous implementation is "non-blocking"

z Thm: The Message Passage Guarantees obtaining all marginals in the tree

z What about non-tree?

Eric Xing 16

8 Another view of SP:

z Example 1

X X X f X 1 5 fa 1 d 5

X X 3 fc 3

f fe X2 X4 b X2 X4

P(X1) P(X2) P(X3|X1,X2) P(X5|X1,X3) P(X4|X2,X3)

fa(X1) fb(X2) fc(X3,X1,X2) fd(X5,X1,X3) fe(X4,X2,X3)

Eric Xing 17

Factor Graphs

z Example 2 X1 X1

fa fc

X2 X3 X2 X3 f ψ(x1,x2,x3) = fa(x1,x2)fb(x2,x3)fc(x3,x1) b

z Example 3 X1 X1

fa

X2 X3 X2 X3

ψ(x1,x2,x3) = fa(x1,x2,x3)

Eric Xing 18

9 Factor Tree

z A Factor graph is a Factor Tree if the undirected graph obtained by ignoring the distinction between variable nodes and factor nodes is an undirected tree

X1 X1

fa

X2 X3 X2 X3

ψ(x1,x2,x3) = fa(x1,x2,x3)

Eric Xing 19

Message Passing on a Factor Tree

z Two kinds of messages

1. ν: from variables to factors 2. µ: from factors to variables

x f1 j

x f xi fs i s

f3 xk

Eric Xing 20

10 Message Passing on a Factor Tree, con'd

z Message passing protocol:

z A node can send a message to a neighboring node only when it has received messages from all its other neighbors

z Marginal probability of nodes:

x f1 j

x f xi fs i s

f3 xk

P(xi) ∝∏s 2 N(i) µsi(xi)

∝νis(xi)µsi(xi)

Eric Xing 21

BP on a Factor Tree

X X3 1 X2

ν µ 1d µd2 e2 ν3e f f X X1 d X2 e 3 µ ν ν µ d1 2d 2e e3 µ µa1 ν3c c3 ν1a µb2 ν2b f fa fb c

Eric Xing 22

11 Why factor graph?

z Tree-like graphs to Factor trees

X1

X1

X 2 X2

X X3 4 X 3 X4

X5 X6

X5 X6

Eric Xing 23

Poly-trees to Factor trees

X X1 2 X 1 X2

X3 X4 X3 X4

X5

X5

Eric Xing 24

12 Why factor graph?

z Because FG turns tree-like X1

X1 graphs to factor trees,

z and trees are a data-structure X 2 X2 that guarantees correctness of

X X3 4 BP !

X3 X4

X5 X6

X5 X6

X1 X2 X 1 X2

X3 X4 X3 X4

X5

X5

Eric Xing 25

Max-product algorithm: computing MAP probabilities

f

i

j

k l

Eric Xing 26

13 Max-product algorithm: computing MAP configurations using a final bookkeeping backward pass

f

i

j

k l

Eric Xing 27

Summary

z Sum-Product algorithm computes singleton marginal probabilities on:

z Trees

z Tree-like graphs

z Poly-trees

z Maximum a posteriori configurations can be computed by replacing sum with max in the sum-product algorithm

z Extra bookkeeping required

Eric Xing 28

14 Inference on general GM

z Now, what if the GM is not a tree-like graph?

z Can we still directly run message message-passing protocol along its edges?

z For non-trees, we do not have the guarantee that message-passing will be consistent!

z Then what?

z Construct a graph data-structure from P that has a tree structure, and run message-passing on it!

Æ

Eric Xing 29

Elimination Clique

z Recall that Induced dependency during marginalization is captured in elimination cliques

z Summation <-> elimination

z Intermediate term <-> elimination clique

B A B A A

C A A

C D A E F C D

E z Can this lead to an generic E E F inference algorithm? G H Eric Xing 30

15 A Clique Tree

B A B A A m m C c b

A A md C D mf A E F me C D

E mh E E F mg G H

me (a,c,d)

= ∑ p(e | c,d)mg (e)m f (a,e) e Eric Xing 31

From Elimination to Message Passing

z Elimination ≡ message passing on a clique tree

B A B A B A B A B A B A B A B A A

C D C D C D C D C D C D C

E F E F E F E F E

G H G H G ≡

B A B A A m m C c b A A md C D mf A m (a,c,d) E F e me C D = p(e | c,d)mg (e)m f (a,e) ∑ E mh e E E F mg G H

z Messages can be reused

Eric Xing 32

16 From Elimination to Message Passing

z Elimination ≡ message passing on a clique tree

z Another query ...

B A B A A m m C c b A A md C D mf A E F me C D

E mh E E F mg G H

z Messages mf and mh are reused, others need to be recomputed

Eric Xing 33

17