
Probabilistic Graphical Models
Brown University CSCI 2950-P, Spring 2013
Prof. Erik Sudderth
Lecture 5: Belief Propagation & Factor Graphs
Some figures courtesy Michael Jordan's draft textbook, An Introduction to Probabilistic Graphical Models

Inference in Graphical Models
• $x_E$: observed evidence variables (a subset of the nodes)
• $x_F$: unobserved query nodes we'd like to infer
• $x_R$: remaining variables, extraneous to this query but part of the given graphical representation
$$p(x_E, x_F) = \sum_{x_R} p(x_E, x_F, x_R), \qquad R = \mathcal{V} \setminus \{E, F\}$$

Posterior Marginal Densities
$$p(x_F \mid x_E) = \frac{p(x_E, x_F)}{p(x_E)}, \qquad p(x_E) = \sum_{x_F} p(x_E, x_F)$$
• Provides Bayesian estimators, confidence measures, and sufficient statistics for iterative parameter estimation
• The elimination algorithm assumed a single query node. What if we want the marginals for all unobserved nodes?

Inference in Undirected Trees
• For a tree, the maximal cliques are always pairs of nodes:
$$p(x) = \frac{1}{Z} \prod_{(s,t) \in \mathcal{E}} \psi_{st}(x_s, x_t) \prod_{s \in \mathcal{V}} \psi_s(x_s)$$

Inference via the Distributive Law
[Figure, panels (a)-(b): a four-node tree with edges X1-X2, X2-X3, X2-X4. Messages $m_{32}(x_2)$ and $m_{42}(x_2)$ flow from the leaves X3 and X4 into X2, which then passes a single message on to X1.]

Computing Multiple Marginals
[Figure, panels (a)-(d): the same four-node tree, with messages passed in both directions along every edge.]
• Can compute all marginals, at all nodes, by combining incoming messages from adjacent edges
• Each message must only be computed once, via some message update schedule

Belief Propagation (Sum-Product)
• BELIEFS (posterior marginals): combine all messages arriving at node $t$ from its neighborhood $\Gamma(t)$ (adjacent nodes),
$$q_t(x_t) \propto \psi_t(x_t) \prod_{u \in \Gamma(t)} m_{ut}(x_t)$$
• MESSAGES (sufficient statistics), computed in two stages:
I) Message product: combine $\psi_t(x_t)$ with the messages from all neighbors of $t$ except $s$
II) Message propagation: $m_{ts}(x_s) \propto \sum_{x_t} \psi_{st}(x_s, x_t)\, \psi_t(x_t) \prod_{u \in \Gamma(t) \setminus s} m_{ut}(x_t)$

Message Update Schedules
• Message Passing Protocol: a node can send a message to a neighboring node when, and only when, it has received incoming messages from all of its other neighbors
• Synchronous Parallel Schedule: at each iteration, every node computes all outputs for which it has the needed inputs
• Global Sequential Schedule: choose some node as the root of the tree. Pass messages from the leaves to the root, and then from the root back to the leaves.
• Asynchronous Parallel Schedule: initialize messages arbitrarily. At each iteration, all nodes compute all outputs from all current inputs. Iterate until convergence.

Belief Propagation for Trees
• Dynamic programming algorithm which exactly computes all marginals
• On Markov chains, BP is equivalent to the alpha-beta or forward-backward algorithms for HMMs
• Sequential message schedules require each message to be updated only once
• Computational cost, for $N$ nodes with $K$ discrete states each: belief propagation costs $O(NK^2)$, versus $O(K^N)$ for brute-force summation of the joint distribution
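The updates above can be exercised end to end in a few dozen lines. Below is a minimal sketch, not from the lecture, of sum-product BP on the four-node tree from the figures, using the global sequential schedule; the state count K, the random potentials, and all function names are illustrative assumptions. The computed beliefs are checked against brute-force enumeration.

# A minimal sketch of sum-product BP on a discrete tree-structured MRF,
# following the slides' notation (pairwise potentials psi_st, node potentials
# psi_s). The tree, K, and random potentials are illustrative assumptions.
import itertools
import numpy as np

rng = np.random.default_rng(0)
K = 3                                   # discrete states per node
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (1, 3)]        # the four-node tree from the slides
neighbors = {s: [] for s in nodes}
for s, t in edges:
    neighbors[s].append(t)
    neighbors[t].append(s)

psi_node = {s: rng.random(K) for s in nodes}
psi_edge = {(s, t): rng.random((K, K)) for s, t in edges}

def edge_pot(s, t):
    """psi_st(x_s, x_t), oriented so axis 0 indexes x_s."""
    return psi_edge[(s, t)] if (s, t) in psi_edge else psi_edge[(t, s)].T

def send(t, s, messages):
    # m_{t->s}(x_s) = sum_{x_t} psi_st(x_s,x_t) psi_t(x_t) prod_{u in N(t)\s} m_{u->t}(x_t)
    prod = psi_node[t].copy()
    for u in neighbors[t]:
        if u != s:
            prod *= messages[(u, t)]
    m = edge_pot(s, t) @ prod
    return m / m.sum()                  # normalize for numerical stability

# Global sequential schedule: leaves-to-root, then root-to-leaves.
messages = {}
def collect(t, s):                      # pass messages up toward the root
    for u in neighbors[t]:
        if u != s:
            collect(u, t)
    if s is not None:
        messages[(t, s)] = send(t, s, messages)

def distribute(t, s):                   # pass messages back down to the leaves
    for u in neighbors[t]:
        if u != s:
            messages[(t, u)] = send(t, u, messages)
            distribute(u, t)

root = 0
collect(root, None)
distribute(root, None)

def belief(t):
    # q_t(x_t) proportional to psi_t(x_t) prod_{u in N(t)} m_{u->t}(x_t)
    b = psi_node[t].copy()
    for u in neighbors[t]:
        b *= messages[(u, t)]
    return b / b.sum()

# Brute-force check: enumerate all K^N joint configurations.
joint = np.zeros((K,) * len(nodes))
for x in itertools.product(range(K), repeat=len(nodes)):
    p = np.prod([psi_node[s][x[s]] for s in nodes])
    for s, t in edges:
        p *= psi_edge[(s, t)][x[s], x[t]]
    joint[x] = p
joint /= joint.sum()

for t in nodes:
    axes = tuple(u for u in nodes if u != t)
    assert np.allclose(belief(t), joint.sum(axis=axes))
print("BP beliefs match brute-force marginals on the tree.")

Because each directed message is computed exactly once under this schedule, the total cost is $O(NK^2)$, matching the complexity claim on the slide.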
BP for Continuous Variables

BP is Exact for Trees
• Proof on board

[From Sec. 2.3, "Variational Methods and Message Passing Algorithms" (Chapter 2, Nonparametric and Graphical Models):]

These computations can be efficiently decomposed into a set of simpler, local computations. In particular, for tree-structured graphical models a generalization of dynamic programming known as belief propagation (BP) [178, 231, 255] recursively computes exact posterior marginals in linear time. In the following sections, we provide a brief derivation of BP, and discuss issues arising in its implementation. We then present a variational interpretation of BP which justifies extensions to graphs with cycles.

From the Hammersley-Clifford Theorem, Markov properties are expressed through the algebraic structure of the pairwise MRF's factorization into clique potentials. As illustrated in Fig. 2.15, tree-structured graphs allow multi-dimensional integrals (or summations) to be decomposed into a series of simpler, one-dimensional integrals. As in dynamic programming [24, 90, 303], the overall integral can then be computed via a recursion involving messages sent between neighboring nodes. This decomposition is an instance of the same distributive law underlying a variety of other algorithms [4, 50, 255], including the fast Fourier transform. Critically, because messages are shared among similar decompositions associated with different nodes, BP efficiently and simultaneously computes the desired marginals for all nodes in the graph.

[Figure 2.15: Example derivation of the BP message passing recursion through repeated application of the distributive law. Because the joint distribution $p(x)$ factorizes as a product of pairwise clique potentials, the joint integral can be decomposed via messages $m_{ji}(x_i)$ sent between neighboring nodes:]
$$p(x_1) \propto \iiint \psi_1(x_1)\,\psi_{12}(x_1,x_2)\,\psi_2(x_2)\,\psi_{23}(x_2,x_3)\,\psi_3(x_3)\,\psi_{24}(x_2,x_4)\,\psi_4(x_4)\; dx_4\, dx_3\, dx_2$$
$$\propto \psi_1(x_1) \int \psi_{12}(x_1,x_2)\,\psi_2(x_2) \left[ \int \psi_{23}(x_2,x_3)\,\psi_3(x_3)\, dx_3 \right] \left[ \int \psi_{24}(x_2,x_4)\,\psi_4(x_4)\, dx_4 \right] dx_2$$
$$\propto \psi_1(x_1) \int \psi_{12}(x_1,x_2)\,\psi_2(x_2)\, m_{32}(x_2)\, m_{42}(x_2)\, dx_2 \propto \psi_1(x_1)\, m_{21}(x_1)$$
where $m_{21}(x_1) \propto \int \psi_{12}(x_1,x_2)\,\psi_2(x_2)\,m_{32}(x_2)\,m_{42}(x_2)\, dx_2$.
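For discrete variables (sums in place of integrals), the Fig. 2.15 decomposition is easy to verify numerically. Below is a brief sketch under assumed random potentials; K, the seed, and the array names are illustrative, not from the text.

# A small numeric check that the distributive-law decomposition of Fig. 2.15
# reproduces the exact marginal p(x1) on the discrete four-node tree.
import numpy as np

rng = np.random.default_rng(1)
K = 4
psi1, psi2, psi3, psi4 = (rng.random(K) for _ in range(4))
psi12, psi23, psi24 = (rng.random((K, K)) for _ in range(3))  # psi_st[x_s, x_t]

# Brute force: sum the full joint over x2, x3, x4 (indices a,b,c,d = x1..x4).
p1 = np.einsum("a,ab,b,bc,c,bd,d->a", psi1, psi12, psi2, psi23, psi3, psi24, psi4)

# Distributive law: push each sum inside as far as it will go.
m32 = psi23 @ psi3                 # m_32(x2) = sum_x3 psi23(x2,x3) psi3(x3)
m42 = psi24 @ psi4                 # m_42(x2) = sum_x4 psi24(x2,x4) psi4(x4)
m21 = psi12 @ (psi2 * m32 * m42)   # m_21(x1)
q1 = psi1 * m21

assert np.allclose(p1 / p1.sum(), q1 / q1.sum())
print("Decomposed computation matches the brute-force marginal p(x1).")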
[Figure 2.14: For a tree-structured graph, each node $i$ partitions the graph into $|\Gamma(i)|$ disjoint subtrees. Conditioned on $x_i$, the variables $x_{j\backslash i}$ in these subtrees are independent.]

Message Passing in Trees
Consider a pairwise MRF, parameterized as in Sec. 2.3.1, whose underlying graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is tree-structured. As shown in Fig. 2.14, any node $i \in \mathcal{V}$ divides such a tree into $|\Gamma(i)|$ disjoint subsets:
$$j\backslash i \triangleq \{j\} \cup \{k \in \mathcal{V} \mid \text{no path from } k \text{ to } j \text{ intersects } i\} \tag{2.110}$$
By the Markov properties of $\mathcal{G}$, the variables $x_{j\backslash i}$ in these sub-trees are conditionally independent given $x_i$. The BP algorithm exploits this structure to recursively decompose the computation of $p(x_i \mid y)$ into a series of simpler, local calculations.

To derive the BP algorithm, we begin by considering the clique potentials corresponding to particular subsets $\mathcal{A} \subset \mathcal{V}$ of the full graph:
$$\Psi_{\mathcal{A}}(x_{\mathcal{A}}) \triangleq \prod_{(i,j) \in \mathcal{E}(\mathcal{A})} \psi_{ij}(x_i, x_j) \prod_{i \in \mathcal{A}} \psi_i(x_i, y) \tag{2.111}$$
Here, $\mathcal{E}(\mathcal{A}) \triangleq \{(i,j) \in \mathcal{E} \mid i, j \in \mathcal{A}\}$ are the edges contained in the node-induced subgraph [50] corresponding to the partitions. Using the partitions illustrated in Fig. 2.14, we can then write the marginal distribution of any node as follows:
$$p(x_i \mid y) \propto \int \psi_i(x_i, y) \prod_{j \in \Gamma(i)} \psi_{ij}(x_i, x_j)\, \Psi_{j\backslash i}(x_{j\backslash i})\; dx_{\mathcal{V}\backslash i} \tag{2.112}$$
$$\propto \psi_i(x_i, y) \prod_{j \in \Gamma(i)} \int \psi_{ij}(x_i, x_j)\, \Psi_{j\backslash i}(x_{j\backslash i})\; dx_{j\backslash i} \tag{2.113}$$

BP Algorithm
[Three slides stating the message and belief updates of the sum-product algorithm; see the equations in the Sum-Product slide above.]

Inference for Graphs with Cycles
• For graphs with cycles, the dynamic programming BP derivation breaks down

Junction Tree Algorithm
• Cluster nodes to break cycles
• Run BP on the tree of clusters
• Exact, but often intractable

Loopy Belief Propagation
• Iterate local BP message updates on the graph with cycles
• Hope beliefs converge
• Empirically, often very effective… (see the sketch at the end of this section)

A Brief History of Loopy BP
• 1993: Turbo codes (and later LDPC codes, rediscovered from Gallager's 1963 thesis) revolutionize error-correcting codes (Berrou et al., 1993)
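To make the loopy BP slide concrete, here is a minimal sketch, an illustration rather than anything from the lecture, of the synchronous parallel schedule run on a small single-cycle MRF; the graph, state count, damping-free updates, and random potentials are all assumptions. On a loopy graph the converged beliefs only approximate the exact marginals.

# A minimal sketch of loopy BP (synchronous parallel schedule) on a cyclic MRF.
import itertools
import numpy as np

rng = np.random.default_rng(2)
K = 2
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # one cycle, so BP is only approximate
neighbors = {s: [] for s in nodes}
for s, t in edges:
    neighbors[s].append(t)
    neighbors[t].append(s)

psi_node = {s: rng.random(K) for s in nodes}
psi_edge = {(s, t): rng.random((K, K)) for s, t in edges}   # psi_st[x_s, x_t]

def edge_pot(s, t):
    return psi_edge[(s, t)] if (s, t) in psi_edge else psi_edge[(t, s)].T

# Initialize all directed messages to uniform, then iterate synchronous updates.
msgs = {(t, s): np.ones(K) / K for s in nodes for t in neighbors[s]}
for it in range(100):
    new = {}
    for (t, s) in msgs:
        prod = psi_node[t].copy()
        for u in neighbors[t]:
            if u != s:
                prod *= msgs[(u, t)]
        m = edge_pot(s, t) @ prod
        new[(t, s)] = m / m.sum()
    delta = max(np.abs(new[k] - msgs[k]).max() for k in msgs)
    msgs = new
    if delta < 1e-10:        # hope the beliefs converge (no guarantee on loopy graphs)
        break

def belief(t):
    b = psi_node[t].copy()
    for u in neighbors[t]:
        b *= msgs[(u, t)]
    return b / b.sum()

# Compare against exact marginals from brute-force enumeration.
joint = np.zeros((K,) * len(nodes))
for x in itertools.product(range(K), repeat=len(nodes)):
    p = np.prod([psi_node[s][x[s]] for s in nodes])
    for s, t in edges:
        p *= psi_edge[(s, t)][x[s], x[t]]
    joint[x] = p
joint /= joint.sum()
for t in nodes:
    exact = joint.sum(axis=tuple(u for u in nodes if u != t))
    print(f"node {t}: loopy BP {belief(t).round(4)}  exact {exact.round(4)}")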