Markov Logic Networks

Markov Logic Networks Matthew Richardson ([email protected]) and Pedro Domingos ([email protected]) Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195-250, U.S.A. Abstract. We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, addi- tional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach. Keywords: Statistical relational learning, Markov networks, Markov random fields, log-linear models, graphical models, first-order logic, satisfiability, inductive logic programming, knowledge-based model construction, Markov chain Monte Carlo, pseudo-likelihood, link prediction 1. Introduction Combining probability and first-order logic in a single representation has long been a goal of AI. Probabilistic graphical models enable us to efficiently handle uncertainty. First-order logic enables us to compactly represent a wide variety of knowledge. Many (if not most) applications require both. Interest in this problem has grown in recent years due to its relevance to statistical relational learning (Getoor & Jensen, 2000; Getoor & Jensen, 2003; Diet- terich et al., 2003), also known as multi-relational data mining (Dzeroskiˇ & De Raedt, 2003; Dzeroskiˇ et al., 2002; Dzeroskiˇ et al., 2003; Dzeroskiˇ & Blockeel, 2004). Current proposals typically focus on combining probability with restricted subsets of first-order logic, like Horn clauses (e.g., Wellman et al. (1992); Poole (1993); Muggleton (1996); Ngo and Haddawy (1997); Sato and Kameya (1997); Cussens (1999); Kersting and De Raedt (2001); Santos Costa et al. (2003)), frame-based systems (e.g., Friedman et al. (1999); Pasula and Russell (2001); Cumby and Roth (2003)), or database query lan- guages (e.g., Taskar et al. (2002); Popescul and Ungar (2003)). They are often quite complex. In this paper, we introduce Markov logic networks (MLNs), a representation that is quite simple, yet combines probability and first-order logic with no restrictions other than finiteness of the domain. We develop mln.tex; 26/01/2006; 19:24; p.1 2 Richardson and Domingos efficient algorithms for inference and learning in MLNs, and evaluate them in a real-world domain. A Markov logic network is a first-order knowledge base with a weight attached to each formula, and can be viewed as a template for constructing Markov networks. From the point of view of probability, MLNs provide a compact language to specify very large Markov networks, and the ability to flexibly and modularly incorporate a wide range of domain knowledge into them. From the point of view of first-order logic, MLNs add the ability to soundly handle uncertainty, tolerate imperfect and contradictory knowledge, and reduce brittleness. Many important tasks in statistical relational learning, like collective classification, link prediction, link-based clustering, social network modeling, and object identification, are naturally formulated as instances of MLN learning and inference. Experiments with a real-world database and knowledge base illustrate the benefits of using MLNs over purely logical and purely probabilistic approaches. We begin the paper by briefly reviewing the fundamentals of Markov networks (Section 2) and first-order logic (Section 3). The core of the paper introduces Markov logic networks and algorithms for inference and learning in them (Sections 4–6). We then report our experimental results (Section 7). Finally, we show how a variety of statistical relational learning tasks can be cast as MLNs (Section 8), discuss how MLNs relate to previous approaches (Section 9) and list directions for future work (Section 10). 2. Markov Networks A Markov network (also known as Markov random field) is a model for the joint distribution of a set of variables X = (X1; X2; : : : ; Xn) 2 X (Pearl, 1988). It is composed of an undirected graph G and a set of potential functions φk. The graph has a node for each variable, and the model has a potential function for each clique in the graph. A potential function is a non- negative real-valued function of the state of the corresponding clique. The joint distribution represented by a Markov network is given by 1 P (X =x) = φ (x ) (1) Z k fkg Yk where xfkg is the state of the kth clique (i.e., the state of the variables that appear in that clique). Z, known as the partition function, is given by Z = x2X k φk(xfkg). Markov networks are often conveniently represented as log-linear models, with each clique potential replaced by an exponentiated P Q weighted sum of features of the state, leading to mln.tex; 26/01/2006; 19:24; p.2 Markov Logic Networks 3 1 P (X =x) = exp w f (x) (2) 0 j j 1 Z j X @ A A feature may be any real-valued function of the state. This paper will focus on binary features, fj(x) 2 f0; 1g. In the most direct translation from the potential-function form (Equation 1), there is one feature corresponding to each possible state xfkg of each clique, with its weight being log φk(xfkg). This representation is exponential in the size of the cliques. However, we are free to specify a much smaller number of features (e.g., logical functions of the state of the clique), allowing for a more compact representation than the potential-function form, particularly when large cliques are present. MLNs will take advantage of this. Inference in Markov networks is #P-complete (Roth, 1996). The most widely used method for approximate inference in Markov networks is Markov chain Monte Carlo (MCMC) (Gilks et al., 1996), and in particular Gibbs sampling, which proceeds by sampling each variable in turn given its Markov blanket. (The Markov blanket of a node is the minimal set of nodes that renders it independent of the remaining network; in a Markov network, this is simply the node's neighbors in the graph.) Marginal probabilities are computed by counting over these samples; conditional probabilities are computed by running the Gibbs sampler with the conditioning variables clamped to their given values. Another popular method for inference in Markov networks is belief propagation (Yedidia et al., 2001). Maximum-likelihood or MAP estimates of Markov network weights can- not be computed in closed form, but, because the log-likelihood is a concave function of the weights, they can be found efficiently using standard gradient- based or quasi-Newton optimization methods (Nocedal & Wright, 1999). Another alternative is iterative scaling (Della Pietra et al., 1997). Features can also be learned from data, for example by greedily constructing conjunctions of atomic features (Della Pietra et al., 1997). 3. First-Order Logic A first-order knowledge base (KB) is a set of sentences or formulas in first- order logic (Genesereth & Nilsson, 1987). Formulas are constructed using four types of symbols: constants, variables, functions, and predicates. Con- stant symbols represent objects in the domain of interest (e.g., people: Anna, Bob, Chris, etc.). Variable symbols range over the objects in the domain. Function symbols (e.g., MotherOf) represent mappings from tuples of objects to objects. Predicate symbols represent relations among objects in the domain (e.g., Friends) or attributes of objects (e.g., Smokes). An inter- mln.tex; 26/01/2006; 19:24; p.3 4 Richardson and Domingos pretation specifies which objects, functions and relations in the domain are represented by which symbols. Variables and constants may be typed, in which case variables range only over objects of the corresponding type, and constants can only represent objects of the corresponding type. For example, the variable x might range over people (e.g., Anna, Bob, etc.), and the constant C might represent a city (e.g., Seattle). A term is any expression representing an object in the domain. It can be a constant, a variable, or a function applied to a tuple of terms. For example, Anna, x, and GreatestCommonDivisor(x; y) are terms. An atomic formula or atom is a predicate symbol applied to a tuple of terms (e.g., Friends(x; MotherOf(Anna))). Formulas are recursively constructed from atomic formulas using logical connectives and quantifiers. If F1 and F2 are formulas, the following are also formulas: :F1 (negation), which is true iff F1 is false; F1 ^ F2 (conjunction), which is true iff both F1 and F2 are true; F1 _ F2 (disjunction), which is true iff F1 or F2 is true; F1 ) F2 (implication), which is true iff F1 is false or F2 is true; F1 , F2 (equivalence), which is true iff F1 and F2 have the same truth value; 8x F1 (universal quantification), which is true iff F1 is true for every object x in the domain; and 9x F1 (existential quantification), which is true iff F1 is true for at least one object x in the domain. Parentheses may be used to enforce precedence. A positive literal is an atomic formula; a negative literal is a negated atomic formula. The formulas in a KB are implicitly conjoined, and thus a KB can be viewed as a single large formula. A ground term is a term containing no variables.

Markov Logic Networks

Probabilistic Logic Programming

Probabilistic Boolean Logic, Arithmetic and Architectures

A Unifying Field in Logics: Neutrosophic Logic. Neutrosophy, Neutrosophic Set, Neutrosophic Probability and Statistics

Probabilistic Logic Programming and Bayesian Networks

Generalising the Interaction Rules in Probabilistic Logic

Probabilistic Logic Learning

New Challenges to Philosophy of Science NEW CHALLENGES to PHILOSOPHY of SCIENCE

Estimating Accuracy from Unlabeled Data: a Probabilistic Logic Approach

Probabilistic Logic Neural Networks for Reasoning

Unifying Logic and Probability

Learning Probabilistic Logic Models from Probabilistic Examples

Probabilistic Logic Programming Under the Distribution Semantics