Sum-Product Inference for Factor Graphs, Learning Directed Graphical Models

Probabilistic Graphical Models, Brown University CSCI 2950-P, Spring 2013, Prof. Erik Sudderth
Lecture 6: Sum-Product Inference for Factor Graphs, Learning Directed Graphical Models
Some figures courtesy Michael Jordan's draft textbook, An Introduction to Probabilistic Graphical Models

Pairwise Markov Random Fields
p(x) = (1/Z) ∏_{(s,t)∈E} ψ_st(x_s, x_t) ∏_{s∈V} ψ_s(x_s)
• V: set of N nodes or vertices, {1, 2, ..., N}
• E: set of undirected edges (s,t) linking pairs of nodes
• Z: normalization constant (partition function)
• Simple parameterization, but still expressive and widely used in practice
• Guaranteed Markov with respect to the graph
Figure 2.4. Three graphical representations of a distribution over five random variables (see [175]). (a) Directed graph G depicting a causal, generative process. (b) Factor graph expressing the factorization underlying G. (c) A "moralized" undirected graph capturing the Markov structure of G.

For example, in the factor graph of Fig. 2.5(c), there are 5 variable nodes, and the joint distribution has one potential for each of the 3 hyperedges:
p(x) ∝ ψ_123(x_1, x_2, x_3) ψ_234(x_2, x_3, x_4) ψ_35(x_3, x_5)
Often, these potentials can be interpreted as local dependencies or constraints. Note, however, that ψ_f(x_f) does not typically correspond to the marginal distribution p_f(x_f), due to interactions with the graph's other potentials. In many applications, factor graphs are used to impose structure on an exponential family of densities. In particular, suppose that each potential function is described by the following unnormalized exponential form:
ψ_f(x_f | θ_f) = ν_f(x_f) exp{ Σ_{a∈A_f} θ_fa φ_fa(x_f) }    (2.67)
Here, θ_f ≜ {θ_fa | a ∈ A_f} are the canonical parameters of the local exponential family for hyperedge f. From eq. (2.66), the joint distribution can then be written as
p(x | θ) = [ ∏_{f∈F} ν_f(x_f) ] exp{ Σ_{f∈F} Σ_{a∈A_f} θ_fa φ_fa(x_f) − Φ(θ) }    (2.68)
Comparing to eq. (2.1), we see that factor graphs define regular exponential families [104, 311], with parameters θ = {θ_f | f ∈ F}, whenever local potentials are chosen from such families. The results of Sec. 2.1 then show that local statistics, computed over the support of each hyperedge, are sufficient for learning from training data.
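The hyperedge factorization above can be made concrete in a few lines of code. The sketch below is not from the lecture: the potential tables are arbitrary illustrative values. It evaluates the unnormalized product ψ_123 ψ_234 ψ_35 over five binary variables, computes the partition function Z by brute-force enumeration, and extracts one marginal.

```python
# A minimal sketch (illustrative potentials, not lecture code): brute-force
# evaluation of p(x) ∝ psi_123(x1,x2,x3) psi_234(x2,x3,x4) psi_35(x3,x5)
# for five binary variables.
import itertools
import numpy as np

rng = np.random.default_rng(0)
psi_123 = rng.random((2, 2, 2)) + 0.1   # non-negative potential over (x1, x2, x3)
psi_234 = rng.random((2, 2, 2)) + 0.1   # potential over (x2, x3, x4)
psi_35  = rng.random((2, 2)) + 0.1      # potential over (x3, x5)

def unnormalized(x):
    """Product of the hyperedge potentials for one joint configuration x = (x1,...,x5)."""
    x1, x2, x3, x4, x5 = x
    return psi_123[x1, x2, x3] * psi_234[x2, x3, x4] * psi_35[x3, x5]

# Partition function Z: sum the unnormalized product over all 2^5 configurations.
configs = list(itertools.product([0, 1], repeat=5))
Z = sum(unnormalized(x) for x in configs)

# Marginal p(x3): sum the normalized joint over every configuration consistent with x3.
p_x3 = np.zeros(2)
for x in configs:
    p_x3[x[2]] += unnormalized(x) / Z

print("Z =", Z, " p(x3) =", p_x3)
```

Enumeration is only feasible for toy graphs like this one; the sum-product algorithm below exploits the same factorization to avoid it on tree-structured graphs.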
Belief Propagation (Sum-Product)
• BELIEFS (posterior marginals): p̂_t(x_t) ∝ ψ_t(x_t) ∏_{u∈Γ(t)} m_ut(x_t), where Γ(t) denotes the neighborhood of node t (its adjacent nodes)
• MESSAGES (sufficient statistics): m_ts(x_s) ∝ Σ_{x_t} ψ_st(x_s, x_t) ψ_t(x_t) ∏_{u∈Γ(t)\s} m_ut(x_t)
• Two steps: I) message product, II) message propagation

Belief Propagation for Trees
• Dynamic programming algorithm which exactly computes all marginals
• On Markov chains, BP is equivalent to the alpha-beta or forward-backward algorithm for HMMs
• Sequential message schedules require each message to be updated only once
• Computational cost: with N nodes and K discrete states per node, belief propagation requires on the order of N·K² operations, versus K^N for brute-force enumeration

Factor Graphs
p(x) = (1/Z) ∏_{f∈F} ψ_f(x_f)
• F: set of hyperedges linking subsets of nodes f ⊆ V
• V: set of N nodes or vertices, {1, 2, ..., N}
• Z: normalization constant (partition function)
• In a hypergraph, the hyperedges link arbitrary subsets of nodes (not just pairs)
• Visualize by a bipartite graph, with square (usually black) nodes for hyperedges
• A factor graph associates a non-negative potential function with each hyperedge
• Motivation: factorization is the key to computation

Factor Graphs & Factorization
• For a given undirected graph, there exist distributions which have equivalent Markov properties, but different factorizations and different inference/learning complexities: the same undirected graphical model may be written with pairwise (edge) potentials, with potentials on maximal cliques, or with some alternative factorization, each corresponding to a different factor graph

Directed Graphs as Factor Graphs
• Directed graphical model: p(x) = ∏_{i=1}^N p(x_i | x_Γ(i))
• Corresponding factor graph: p(x) = ∏_{i=1}^N ψ_i(x_i, x_Γ(i))
• Associate one factor with each node, linking it to its parents and defined to equal the corresponding conditional distribution
• Information lost: the directionality of the conditional distributions, and the fact that the global partition function is Z = 1

Sum-Product Algorithm: Belief Propagation for Factor Graphs
(Figure: variable-to-factor messages ν_is(x_i) and factor-to-variable messages µ_si(x_i) passed between variable nodes X_i and factor nodes f_s)
• From each variable node, the incoming and outgoing messages are functions only of that particular variable
• Factor message updates must sum over all combinations of the adjacent variable nodes (exponential in the factor's degree)

Comparing Sum-Product Variants
• For pairwise potentials, there is one "incoming" message for each outgoing factor message, so the factor-graph algorithm simplifies to the earlier pairwise update: m_ts(x_s) ∝ Σ_{x_t} ψ_st(x_s, x_t) ψ_t(x_t) ∏_{u∈Γ(t)\s} m_ut(x_t)

Factor Graph Message Schedules
• All of the previously discussed message schedules are valid
(Figure: panels (a)–(h) of an example synchronous parallel schedule on a chain factor graph with variables X_1, X_2, X_3 and factors f_a, f_b, f_c, f_d, f_e, alternating variable-to-factor messages ν and factor-to-variable messages µ until every message has been sent)
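To illustrate the message updates and the cost comparison on the "Belief Propagation for Trees" slide, here is a minimal sketch (assumed setup with random potentials, not lecture code) that runs sum-product on a 4-node chain with K = 3 states and checks the resulting marginals against brute-force enumeration of all K^N configurations.

```python
# A minimal sketch: sum-product on a chain MRF, verified against brute force.
# Messages follow m_ts(x_s) ∝ sum_{x_t} psi_st(x_s,x_t) psi_t(x_t) prod_u m_ut(x_t).
import itertools
import numpy as np

rng = np.random.default_rng(1)
N, K = 4, 3
node_pot = rng.random((N, K)) + 0.1           # psi_s(x_s)
edge_pot = rng.random((N - 1, K, K)) + 0.1    # psi_{s,s+1}(x_s, x_{s+1})

# Forward and backward message passes (each message computed exactly once).
fwd = np.ones((N, K))   # fwd[s] = message arriving at node s from the left
bwd = np.ones((N, K))   # bwd[s] = message arriving at node s from the right
for s in range(1, N):
    fwd[s] = edge_pot[s - 1].T @ (node_pot[s - 1] * fwd[s - 1])
for s in range(N - 2, -1, -1):
    bwd[s] = edge_pot[s] @ (node_pot[s + 1] * bwd[s + 1])

# Beliefs: product of local potential and all incoming messages, then normalize.
bp_marg = node_pot * fwd * bwd
bp_marg /= bp_marg.sum(axis=1, keepdims=True)

# Brute force: enumerate all K^N joint configurations.
brute = np.zeros((N, K))
for x in itertools.product(range(K), repeat=N):
    w = np.prod([node_pot[s, x[s]] for s in range(N)])
    w *= np.prod([edge_pot[s, x[s], x[s + 1]] for s in range(N - 1)])
    for s in range(N):
        brute[s, x[s]] += w
brute /= brute.sum(axis=1, keepdims=True)

print("max marginal difference:", np.abs(bp_marg - brute).max())  # ≈ 0 up to float error
```

Each message is one K×K matrix-vector product, so the chain costs on the order of N·K² operations, while the brute-force check scales as K^N.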
Sum-Product for "Nearly" Trees
(Figure: an undirected pairwise graphical model, the corresponding factor graph, and an equivalent graphical model obtained via an auxiliary variable Z)
• The sum-product algorithm computes exact marginal distributions for any factor graph which is tree-structured (no cycles)
• This includes some undirected graphs with cycles

Sum-Product for Polytrees
(Figure: an undirected tree, a directed tree, a directed polytree, and the corresponding factor graph)
• Early work on belief propagation (Pearl, 1980s) focused on directed graphical models, and was complicated by the directionality of edges and by multiple parents (polytrees)
• The factor graph framework makes this a simple special case

Learning Directed Graphical Models
p(x) = ∏_{i∈V} p(x_i | x_Γ(i), θ_i)
• Intuition: must learn a good predictive model of each node, given its parent nodes
• The directed factorization causes the likelihood to locally decompose
• Often paired with a correspondingly factorized prior, e.g. p(θ) = p(θ_1) p(θ_2) p(θ_3) p(θ_4), so that log p(θ) = log p(θ_1) + log p(θ_2) + log p(θ_3) + log p(θ_4)

Complete Observations
(Figure: a directed graphical model for a single example, N independent identically distributed training examples, and the equivalent plate notation)
• A directed graphical model encodes assumed statistical dependencies among the different parts of a single training example:
p(D | θ) = ∏_{n=1}^N ∏_{i∈V} p(x_{i,n} | x_{Γ(i),n}, θ_i),   D = {x_{V,1}, ..., x_{V,N}}
• Given N independent, identically distributed, completely observed samples:
log p(D | θ) = Σ_{n=1}^N Σ_{i∈V} log p(x_{i,n} | x_{Γ(i),n}, θ_i) = Σ_{i∈V} [ Σ_{n=1}^N log p(x_{i,n} | x_{Γ(i),n}, θ_i) ]

Priors and Tied Parameters
• A "meta-independent" factorized prior: log p(θ) = Σ_{i∈V} log p(θ_i)
• The factorized posterior allows independent learning for each node:
log p(θ | D) = C + Σ_{i∈V} [ log p(θ_i) + Σ_{n=1}^N log p(x_{i,n} | x_{Γ(i),n}, θ_i) ]
• Learning remains tractable when subsets of nodes are "tied" to use identical, shared parameter values:
log p(D | θ) = Σ_{i∈V} Σ_{n=1}^N log p(x_{i,n} | x_{Γ(i),n}, θ_{b_i})
log p(θ_b | D) = C + log p(θ_b) + Σ_{i: b_i=b} Σ_{n=1}^N log p(x_{i,n} | x_{Γ(i),n}, θ_b)

Example: Temporal Models
(Figure: three training sequences of different lengths, each with hidden states z_{0:T_n} and observations x_{1:T_n})
p(x, z | θ) = ∏_{n=1}^N ∏_{t=1}^{T_n} p(z_{t,n} | z_{t-1,n}, θ_time) p(x_{t,n} | z_{t,n}, θ_obs)

Learning Binary Probabilities
• Bernoulli distribution, a single toss of a (possibly biased) coin: Ber(x | θ) = θ^{I(x=1)} (1 − θ)^{I(x=0)},  0 ≤ θ ≤ 1
• Suppose we observe N samples from a Bernoulli distribution with unknown mean: X_i ∼ Ber(θ), i = 1, ..., N, so that p(x_1, ..., x_N | θ) = θ^{N_1} (1 − θ)^{N_0}
• The maximum likelihood parameter estimate is θ̂ = argmax_θ log p(x | θ) = N_1 / N
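Because the complete-data log-likelihood decomposes over nodes, each conditional distribution can be fit separately. The sketch below is an illustrative example rather than lecture code: the graph x1 → x2, x1 → x3, (x2, x3) → x4 and its conditional probability values are invented for the demonstration. It estimates each node's conditional probability table by simple counting.

```python
# A minimal sketch: per-node maximum likelihood learning of a small binary
# directed model from complete observations, by normalized counts.
import numpy as np

rng = np.random.default_rng(2)
N = 5000
# Simulate complete data from some "true" binary CPTs (values chosen arbitrarily).
x1 = rng.random(N) < 0.3
x2 = rng.random(N) < np.where(x1, 0.8, 0.2)
x3 = rng.random(N) < np.where(x1, 0.1, 0.6)
x4 = rng.random(N) < np.where(x2 & x3, 0.9, np.where(x2 | x3, 0.5, 0.1))
data = np.stack([x1, x2, x3, x4], axis=1).astype(int)

def cpt(child, parents):
    """ML estimate of p(child = 1 | parent configuration) by counting."""
    if not parents:
        return data[:, child].mean()
    table = {}
    for cfg in np.ndindex(*(2,) * len(parents)):
        mask = np.all(data[:, parents] == cfg, axis=1)
        # Fall back to 0.5 if a parent configuration never occurs in the data.
        table[cfg] = data[mask, child].mean() if mask.any() else 0.5
    return table

theta1 = cpt(0, [])        # p(x1 = 1)
theta2 = cpt(1, [0])       # p(x2 = 1 | x1)
theta3 = cpt(2, [0])       # p(x3 = 1 | x1)
theta4 = cpt(3, [1, 2])    # p(x4 = 1 | x2, x3)
print(theta1, theta2, theta3, theta4)
```

Each table entry is just the Bernoulli ML estimate N_1/N from the slide above, restricted to the training cases with that parent configuration.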
Beta Distributions
• Probability density on x ∈ [0, 1]: Beta(x | a, b) = [Γ(a+b) / (Γ(a) Γ(b))] x^{a−1} (1 − x)^{b−1}, using the gamma function identities Γ(x + 1) = x Γ(x) and Γ(k) = (k − 1)!
• Moments and mode:
E[x] = a / (a + b),   V[x] = ab / [(a + b)^2 (a + b + 1)],   Mode[x] = argmax_{x∈[0,1]} Beta(x | a, b) = (a − 1) / [(a − 1) + (b − 1)]

Bayesian Learning of Probabilities
• Bernoulli likelihood, a single toss of a (possibly biased) coin:
Ber(x | θ) = θ^{I(x=1)} (1 − θ)^{I(x=0)},  0 ≤ θ ≤ 1,   p(x_1, ..., x_N | θ) = θ^{N_1} (1 − θ)^{N_0}
• Beta prior distribution: p(θ) = Beta(θ | a, b) ∝ θ^{a−1} (1 − θ)^{b−1}
• Posterior distribution: p(θ | x) ∝ θ^{N_1 + a − 1} (1 − θ)^{N_0 + b − 1} ∝ Beta(θ | N_1 + a, N_0 + b)
• This is a conjugate prior, because the posterior is in the same family as the prior
• Estimate by the posterior mode (MAP) or the posterior mean (preferred)
• Here, the posterior predictive distribution is equivalent to the mean estimate
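As a small numerical companion to the conjugacy result above, this sketch (illustrative data and hyperparameters, not lecture code) draws Bernoulli samples and compares the ML estimate N_1/N with the MAP and posterior-mean estimates under a Beta(a, b) prior.

```python
# A minimal sketch of the Beta-Bernoulli conjugate update:
# posterior Beta(theta | N1 + a, N0 + b), with ML, MAP, and posterior-mean estimates.
import numpy as np

rng = np.random.default_rng(3)
theta_true, N = 0.7, 20                     # illustrative "true" coin bias and sample size
x = (rng.random(N) < theta_true).astype(int)
N1, N0 = x.sum(), N - x.sum()

a, b = 2.0, 2.0                             # Beta(a, b) prior hyperparameters
ml   = N1 / N                               # maximum likelihood: N1 / N
mode = (N1 + a - 1) / (N + a + b - 2)       # MAP estimate = posterior mode
mean = (N1 + a) / (N + a + b)               # posterior mean = predictive probability of x = 1
print(f"ML {ml:.3f}  MAP {mode:.3f}  posterior mean {mean:.3f}")
```

With small N the prior pulls the MAP and mean estimates toward a/(a+b); as N grows all three estimates converge.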