Unifying Logical and Statistical AI with Markov Logic
DOI: 10.1145/3241978

Markov logic can be used as a general framework for joining logical and statistical AI.

BY PEDRO DOMINGOS AND DANIEL LOWD

Key insights
- Intelligent systems must be able to handle the complexity and uncertainty of the real world. Markov logic enables this by unifying first-order logic and probabilistic graphical models into a single representation. Many deep architectures are instances of Markov logic.
- An extensive suite of learning and inference algorithms for Markov logic has been developed, along with open source implementations like Alchemy.
- Markov logic has been applied to natural language understanding, information extraction and integration, robotics, social network analysis, computational biology, and many other areas.

FOR MANY YEARS, the two dominant paradigms in artificial intelligence (AI) have been logical AI and statistical AI. Logical AI uses first-order logic and related representations to capture complex relationships and knowledge about the world. However, logic-based approaches are often too brittle to handle the uncertainty and noise present in many applications. Statistical AI uses probabilistic representations such as probabilistic graphical models to capture uncertainty. However, graphical models only represent distributions over propositional universes and must be customized to handle relational domains. As a result, expressing complex concepts and relationships in graphical models is often difficult and labor-intensive.

To handle the complexity and uncertainty present in most real-world problems, we need AI that is both logical and statistical, integrating first-order logic and graphical models. One or the other by itself cannot provide the minimum functionality needed to support the full range of AI applications. Further, the two need to be fully integrated, not simply provided alongside each other. Most applications require simultaneously the expressiveness of first-order logic and the robustness of probability, not just one or the other. Unfortunately, the split between logical and statistical AI runs very deep. It dates to the earliest days of the field, and continues to be highly visible today. It takes a different form in each subfield of AI, but it is omnipresent. Table 1 shows examples of this. In each case, both the logical and the statistical approaches contribute something important. This justifies the abundant research on each of them, but also implies that ultimately a combination of the two is required.

Table 1. Examples of logical and statistical AI.

Field                        | Logical approach             | Statistical approach
Knowledge representation     | First-order logic            | Graphical models
Automated reasoning          | Satisfiability testing       | Markov chain Monte Carlo
Machine learning             | Inductive logic programming  | Neural networks
Planning                     | Classical planning           | Markov decision processes
Natural language processing  | Definite clause grammars     | Probabilistic context-free grammars

Markov logic [7] is a simple yet powerful generalization of first-order logic and probabilistic graphical models, which allows it to build on and integrate the best approaches from both logical and statistical AI. A Markov logic network (MLN) is a set of weighted first-order formulas, viewed as templates for constructing Markov networks. This yields a well-defined probability distribution in which worlds are more likely when they satisfy a higher-weight set of ground formulas. Intuitively, the magnitude of the weight corresponds to the relative strength of its formula; in the infinite-weight limit, Markov logic reduces to first-order logic.
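In symbols, an MLN with formulas F_i and weights w_i defines a log-linear distribution over possible worlds x (this is the standard Markov logic definition, stated here for concreteness):

\[
P(X = x) \;=\; \frac{1}{Z}\,\exp\Big(\sum_i w_i\, n_i(x)\Big),
\qquad
Z \;=\; \sum_{x'} \exp\Big(\sum_i w_i\, n_i(x')\Big),
\]

where n_i(x) is the number of groundings of F_i that are true in world x and Z is the partition function that normalizes the distribution. Each additional satisfied grounding of F_i multiplies a world's probability by e^{w_i}, which is why, as the weights grow without bound, worlds that violate formulas become vanishingly unlikely and Markov logic approaches pure first-order logic.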
Weights can be set by hand or learned automatically from data. Algorithms for learning or revising formulas from data have also been developed. Inference algorithms for Markov logic combine ideas from probabilistic and logical inference, such as Markov chain Monte Carlo, belief propagation, satisfiability, and resolution.

Markov logic has already been used to efficiently develop state-of-the-art models for many AI problems, such as collective classification, link prediction, ontology mapping, knowledge base refinement, and semantic parsing, in application areas such as the Web, social networks, molecular biology, information extraction, and others. Markov logic makes solving new problems easier by offering a simple framework for representing well-defined probability distributions over uncertain, relational data. Many existing approaches can be described by a few weighted formulas, and multiple approaches can be combined by including all of the relevant formulas. Many algorithms, as well as sample datasets and applications, are available in the open source Alchemy system [17] (alchemy.cs.washington.edu).

In this article, we describe Markov logic and its algorithms, and show how they can be used as a general framework for combining logical and statistical AI. Before presenting background and details on Markov logic, we first discuss how it relates to other methods in AI.

Markov logic is the most widely used approach to unifying logical and statistical AI, but this is an active research area, and there are many others (see Kimmig et al. [14] for a recent survey with many examples). Most approaches can be roughly categorized as either extending logic programming languages (for example, Prolog) to handle uncertainty, or extending probabilistic graphical models to handle relational structure. Many of these model classes can also be represented efficiently as MLNs (see Richardson and Domingos [32] for a discussion of early approaches to statistical relational AI and how they relate to Markov logic). In recent years, most work on statistical relational AI has assumed a parametric factor (parfactor) [29] representation, which is similar to the weighted formulas in an MLN. Probabilistic soft logic (PSL) [1] uses weighted formulas like Markov logic, but with a continuous relaxation of the variables in order to reason efficiently. In some cases, PSL can be viewed as Markov logic with a particular choice of approximate inference algorithm. One limitation of PSL's degree-of-satisfaction semantics is that more evidence does not always make an event more likely; many weak sources of evidence do not combine to produce strong evidence, even when the sources are independent.

Probabilistic programming [28] is another paradigm for defining and reasoning with rich, probabilistic models. Probabilistic programming is a good fit for problems where the data is generated by a random process, and the process can be described as the execution of a procedural program with random choices. Not every domain is well suited to this approach; for example, we may wish to describe or predict the behavior of people in a social network without modeling the complete evolution of that network. Inference methods for probabilistic programming languages work backwards from the data to reason about the processes that generate the data and compute conditional probabilities of other events. Probabilistic graphical models perform similar reasoning, but typically have more structure to exploit for reasoning at scale.
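The generate-then-condition idea can be illustrated with a minimal sketch in plain Python (not any particular probabilistic programming language; the toy model of a coin with an unknown bias and all names here are invented for illustration): a procedural program makes random choices, and inference works backwards from observed data, here by simple rejection sampling.

```python
import random

def generative_program():
    """A procedural program with random choices: pick a coin bias,
    then flip the coin three times."""
    bias = random.choice([0.3, 0.5, 0.8])                # latent random choice
    flips = [random.random() < bias for _ in range(3)]   # observable random choices
    return bias, flips

def infer_bias(observed_flips, num_samples=100_000):
    """Work backwards from the data: estimate P(bias | flips) by keeping
    only the program runs whose simulated flips match the observation."""
    counts = {}
    for _ in range(num_samples):
        bias, flips = generative_program()
        if flips == observed_flips:
            counts[bias] = counts.get(bias, 0) + 1
    total = sum(counts.values())
    return {b: round(c / total, 3) for b, c in sorted(counts.items())}

if __name__ == "__main__":
    # Observing three heads shifts belief toward the 0.8 bias.
    print(infer_bias([True, True, True]))
```

Real probabilistic programming systems replace the brute-force rejection step with far more efficient inference, but the division of labor is the same: the program describes how data is generated, and inference conditions on what was actually observed.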
Deep learning methods [9] have led to competitive or dominant approaches in a growing number of problems. Deep learning fits complex, nonlinear functions directly from data. For domains where we know something about the problem structure, MLNs and other graphical models make it easier to capture background knowledge about the domain, sometimes specifying competitive models without having any training data at all. Due to their complementary strengths, combining deep learning with graphical models is an ongoing area of research.

First-Order Logic

A first-order knowledge base (KB) is a set of sentences or formulas in first-order logic. Formulas are constructed using four types of symbols: constants, variables, functions, and predicates. Constant symbols represent objects in the domain of interest (for example, people: Anna, Bob, and Chris). Variable symbols range over the objects in the domain. Function symbols (MotherOf) represent mappings from tuples of objects to objects. Predicate symbols represent relations among objects in the domain (Friends) or attributes of objects (Smokes). An interpretation specifies which objects, functions, and relations in the domain are represented by which symbols.

A term is any expression representing an object in the domain. It can be a constant, a variable, or a function applied to a tuple of terms. For example, Anna, x, and GreatestCommonDivisor(x, y) are terms. An atomic formula or atom is a predicate symbol applied to a tuple of terms (for example, Friends(x, MotherOf(Anna))). Formulas are recursively constructed from atomic formulas using logical connectives and quantifiers. If F1 and F2 are formulas, the following are also formulas: ¬F1 (negation), which is true iff F1 is false; F1 ∧ F2 (conjunction), which is true iff both F1 and F2 are true; F1 ∨ F2 (disjunction), which is true iff F1 or F2 is true; F1 ⇒ F2 (implication), which is true iff F1 is false or F2 is true; F1 ⇔ F2 (equivalence), which is true iff F1 and F2 have the same truth value; ∀x F1 (universal quantification), which is true iff F1 is true for every object x in the domain; and ∃x F1 (existential quantification), which is true iff F1 is true for at least one object x in the domain.

Table 2. Example of a first-order knowledge base and MLN.

English                                                                 | First-order logic                        | Weight
"Friends of friends are friends."                                      | ∀x∀y∀z Fr(x, y) ∧ Fr(y, z) ⇒ Fr(x, z)    | 0.7
"Friendless people smoke."                                             | ∀x (¬(∃y Fr(x, y)) ⇒ Sm(x))              | 2.3
"Smoking causes cancer."                                               | ∀x Sm(x) ⇒ Ca(x)                         | 1.5
"If two people are friends, then either both smoke or neither does."   | ∀x∀y Fr(x, y) ⇒ (Sm(x) ⇔ Sm(y))          | 1.1

Fr() is short for Friends(), Sm() for Smokes(), and Ca() for Cancer().
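As a worked illustration (a sketch written for this example, not code from the Alchemy system), the following Python program grounds the four formulas of Table 2 over the two-person domain {Anna, Bob} and computes an exact conditional probability by enumerating all possible worlds under the log-linear distribution given earlier. For simplicity it counts one grounding per assignment of the universally quantified variables, rather than converting formulas to clausal form as MLN implementations typically do, and the formula names are ours.

```python
from itertools import product
from math import exp

DOMAIN = ["Anna", "Bob"]
WEIGHTS = {"friends_of_friends": 0.7, "friendless_smoke": 2.3,
           "smoking_causes_cancer": 1.5, "friends_smoke_alike": 1.1}

# Ground atoms: every instantiation of Friends, Smokes, and Cancer over the domain.
ATOMS = ([("Friends", a, b) for a in DOMAIN for b in DOMAIN]
         + [("Smokes", a) for a in DOMAIN]
         + [("Cancer", a) for a in DOMAIN])

def implies(p, q):
    return (not p) or q

def true_grounding_counts(world):
    """n_i(x): number of true groundings of each Table 2 formula in a world,
    where `world` maps each ground atom to True/False."""
    fr = lambda a, b: world[("Friends", a, b)]
    sm = lambda a: world[("Smokes", a)]
    ca = lambda a: world[("Cancer", a)]
    n = dict.fromkeys(WEIGHTS, 0)
    for x, y, z in product(DOMAIN, repeat=3):      # Fr(x,y) ∧ Fr(y,z) ⇒ Fr(x,z)
        n["friends_of_friends"] += implies(fr(x, y) and fr(y, z), fr(x, z))
    for x in DOMAIN:                               # ¬(∃y Fr(x,y)) ⇒ Sm(x)
        n["friendless_smoke"] += implies(not any(fr(x, y) for y in DOMAIN), sm(x))
    for x in DOMAIN:                               # Sm(x) ⇒ Ca(x)
        n["smoking_causes_cancer"] += implies(sm(x), ca(x))
    for x, y in product(DOMAIN, repeat=2):         # Fr(x,y) ⇒ (Sm(x) ⇔ Sm(y))
        n["friends_smoke_alike"] += implies(fr(x, y), sm(x) == sm(y))
    return n

def weight(world):
    """Unnormalized weight exp(sum_i w_i * n_i(x)) of a possible world."""
    n = true_grounding_counts(world)
    return exp(sum(WEIGHTS[f] * n[f] for f in WEIGHTS))

# Enumerate all 2^8 = 256 truth assignments to the ground atoms.
worlds = [dict(zip(ATOMS, values))
          for values in product([False, True], repeat=len(ATOMS))]
Z = sum(weight(w) for w in worlds)

# Query: P(Cancer(Anna) | Smokes(Anna)).
p_smokes = sum(weight(w) for w in worlds if w[("Smokes", "Anna")]) / Z
p_both = sum(weight(w) for w in worlds
             if w[("Smokes", "Anna")] and w[("Cancer", "Anna")]) / Z
print("P(Cancer(Anna) | Smokes(Anna)) =", p_both / p_smokes)
```

Because Cancer(Anna) appears only in the smoking-causes-cancer formula, the printed value is exactly e^1.5 / (1 + e^1.5) ≈ 0.82. A real MLN system would use the learning and inference algorithms described in the rest of the article rather than enumeration, which grows exponentially in the number of ground atoms.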