A Tractable First-Order Probabilistic Logic
Pedro Domingos and W. Austin Webb
Department of Computer Science and Engineering
University of Washington
Seattle, WA 98195-2350, U.S.A.
{pedrod, webb}@cs.washington.edu

Abstract

Tractable subsets of first-order logic are a central topic in AI research. Several of these formalisms have been used as the basis for first-order probabilistic languages. However, these are intractable, losing the original motivation. Here we propose the first non-trivially tractable first-order probabilistic language. It is a subset of Markov logic, and uses probabilistic class and part hierarchies to control complexity. We call it TML (Tractable Markov Logic). We show that TML knowledge bases allow for efficient inference even when the corresponding graphical models have very high treewidth. We also show how probabilistic inheritance, default reasoning, and other inference patterns can be carried out in TML. TML opens up the prospect of efficient large-scale first-order probabilistic inference.

Introduction

The tradeoff between expressiveness and tractability is a central problem in AI. First-order logic can express most knowledge compactly, but is intractable. As a result, much research has focused on developing tractable subsets of it. First-order logic and its subsets ignore that most knowledge is uncertain, severely limiting their applicability. Graphical models like Bayesian and Markov networks can represent many probability distributions compactly, but their expressiveness is only at the level of propositional logic, and inference in them is also intractable.

Many languages have been developed to handle uncertain first-order knowledge. Most of these use a tractable subset of first-order logic, like function-free Horn clauses (e.g., (Wellman, Breese, and Goldman 1992; Poole 1993; Muggleton 1996; De Raedt, Kimmig, and Toivonen 2007)) or description logics (e.g., (Jaeger 1994; Koller, Levy, and Pfeffer 1997; d'Amato, Fanizzi, and Lukasiewicz 2008; Niepert, Noessner, and Stuckenschmidt 2011)). Unfortunately, the probabilistic extensions of these languages are intractable, losing the main advantage of restricting the representation. Other languages, like Markov logic (Domingos and Lowd 2009), allow a full range of first-order constructs. Despite much progress in the last decade, the applicability of these representations continues to be significantly limited by the difficulty of inference.

The ideal language for most AI applications would be a tractable first-order probabilistic logic, but the prospects for such a language that is not too restricted to be useful seem dim. Cooper (1990) showed that inference in propositional graphical models is already NP-hard, and Dagum and Luby (1993) showed that this is true even for approximate inference. Probabilistic inference can be reduced to model counting, and Roth (1996) showed that even approximate counting is intractable, even for very restricted propositional languages, like CNFs with no negation and at most two literals per clause, or Horn CNFs with at most two literals per clause and three occurrences of any variable.

We can trivially ensure that a first-order probabilistic knowledge base is tractable by requiring that, when grounded, it reduce to a low-treewidth graphical model. However, this is a syntactic condition that is not very meaningful or perspicuous in terms of the domain, and experience has shown that few real domains can be well modeled by low-treewidth graphs. This is particularly true of first-order domains, where even very simple formulas can lead to unbounded treewidth.

Although probabilistic inference is commonly assumed to be exponential in treewidth, a line of research that includes arithmetic circuits (Darwiche 2003), AND-OR graphs (Dechter and Mateescu 2007), and sum-product networks (Poon and Domingos 2011) shows that this need not be the case. However, the consequences of this for first-order probabilistic languages seem to have so far gone unnoticed. In addition, recent advances in lifted probabilistic inference (e.g., (Gogate and Domingos 2011)) make tractable inference possible even in cases where the most efficient propositional structures do not. In this paper, we take advantage of these to show that it is possible to define a first-order probabilistic logic that is tractable, and yet expressive enough to encompass many cases of interest, including junction trees, non-recursive PCFGs, networks with cluster structure, and inheritance hierarchies.

We call this language Tractable Markov Logic, or TML for short. In TML, the domain is decomposed into parts, each part is drawn probabilistically from a class hierarchy, the part is further decomposed into subparts according to its class, and so forth. Relations of any arity are allowed, provided that they are between subparts of some part (e.g., workers in a company, or companies in a market), and that they depend on each other only through the part's class. Inference in TML is reducible to computing partition functions, as in all probabilistic languages. It is tractable because there is a correspondence between classes in the hierarchy and sums in the computation of the partition function, and between parts and products.

We begin with some necessary background. We then define TML, prove its tractability, and illustrate its expressiveness. The paper concludes with extensions and discussion.

Background

TML is a subset of Markov logic (Domingos and Lowd 2009). A Markov logic network (MLN) is a set of weighted first-order clauses. It defines a Markov network over all the ground atoms in its Herbrand base, with a feature corresponding to each ground clause in the Herbrand base. (We assume Herbrand interpretations throughout this paper.) The weight of each feature is the weight of the corresponding first-order clause. If $x$ is a possible world (an assignment of truth values to all ground atoms), then $P(x) = \frac{1}{Z} \exp\left(\sum_i w_i n_i(x)\right)$, where $w_i$ is the weight of the $i$th clause, $n_i(x)$ is its number of true groundings in $x$, and $Z = \sum_x \exp\left(\sum_i w_i n_i(x)\right)$ is the partition function. For example, if an MLN consists of the formula $\forall x \forall y\; \mathrm{Smokes}(x) \land \mathrm{Friends}(x,y) \Rightarrow \mathrm{Smokes}(y)$ with a positive weight and a set of facts about smoking and friendship (e.g., $\mathrm{Friends}(\mathrm{Anna}, \mathrm{Bob})$), the probability of a world decreases exponentially with the number of pairs of friends that do not have similar smoking habits.
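To make this semantics concrete, the following is a minimal brute-force sketch of the MLN distribution above, not from the paper itself: the two-person domain and the weight value 1.5 are our own illustrative choices.

```python
from itertools import product
from math import exp

PEOPLE = ["Anna", "Bob"]
W = 1.5  # clause weight; an illustrative value, not one from the paper

def n_true(smokes, friends):
    # Number of true groundings of
    # forall x,y: Smokes(x) ^ Friends(x,y) => Smokes(y).
    return sum(
        not (smokes[x] and friends[(x, y)] and not smokes[y])
        for x, y in product(PEOPLE, repeat=2)
    )

def worlds():
    # Enumerate every truth assignment to the ground atoms.
    pairs = list(product(PEOPLE, repeat=2))
    for s_bits in product([False, True], repeat=len(PEOPLE)):
        smokes = dict(zip(PEOPLE, s_bits))
        for f_bits in product([False, True], repeat=len(pairs)):
            yield smokes, dict(zip(pairs, f_bits))

# Z sums exp(w * n(x)) over all 2^6 worlds of this tiny domain.
Z = sum(exp(W * n_true(s, f)) for s, f in worlds())

# P(x) for one world: everyone smokes and everyone is friends.
smokes = {p: True for p in PEOPLE}
friends = {pair: True for pair in product(PEOPLE, repeat=2)}
print(exp(W * n_true(smokes, friends)) / Z)
```

Each world in which a pair of friends disagrees about smoking loses a factor of $e^{W}$ per violated grounding, which is the exponential decrease described above. The brute-force enumeration, of course, scales exponentially in the number of ground atoms, which is exactly the problem the rest of the paper addresses.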
Inference in TML is carried out using probabilistic theorem proving (PTP) (Gogate and Domingos 2011). PTP is an inference procedure that unifies theorem proving and probabilistic inference. PTP inputs a probabilistic knowledge base (PKB) $K$ and a query formula $Q$, and outputs $P(Q|K)$. A PKB is a set of first-order formulas $F_i$ and associated potential values $\phi_i \geq 0$. It defines a factored distribution over possible worlds, with one factor or potential per grounding of each formula, the value of the potential being 1 if the ground formula is true and $\phi_i$ if it is false. Thus $P(x) = \frac{1}{Z} \prod_i \phi_i^{n_i(x)}$, where $n_i(x)$ is the number of false groundings of $F_i$ in $x$. Hard formulas have $\phi_i = 0$ and soft formulas have $\phi_i > 0$.

PTP first converts the PKB into a CNF $L$ and a set of literal weights $W$. In this process, each soft formula $F_i$ is replaced by a hard formula $F_i \Leftrightarrow A_i$, where $A_i$ is a new atom with the same arguments as $F_i$ and the weight of $\neg A_i$ is $\phi_i$, all other weights being 1. Algorithm 1 shows pseudo-code for LWMC, the core routine in PTP.

Algorithm 1 LWMC(CNF $L$, substs. $S$, weights $W$)
  // Base case
  if all clauses in $L$ are satisfied then
    return $\prod_{A \in M(L)} (W_A + W_{\neg A})^{n_A(S)}$
  if $L$ has an empty unsatisfied clause then return 0
  // Decomposition step
  if there exists a lifted decomposition $\{L_{1,1}, \ldots, L_{1,m_1}, \ldots, L_{k,1}, \ldots, L_{k,m_k}\}$ of $L$ under $S$ then
    return $\prod_{i=1}^{k} [\mathrm{LWMC}(L_{i,1}, S, W)]^{m_i}$
  // Splitting step
  Choose an atom $A$
  Let $\{\Sigma_{A,S}^{(1)}, \ldots, \Sigma_{A,S}^{(l)}\}$ be a lifted split of $A$ for $L$ under $S$
  return $\sum_{i=1}^{l} n_i W_A^{t_i} W_{\neg A}^{f_i}\, \mathrm{LWMC}(L|\sigma_i, S_i, W)$

$M(L)$ is the set of atoms appearing in $L$, and $n_A(S)$ is the number of groundings of $A$ consistent with the substitution constraints in $S$. A lifted decomposition is a partition of a first-order CNF into a set of CNFs with no ground atoms in common. A lifted split of an atom $A$ for CNF $L$ is a partition of the possible truth assignments to the groundings of $A$ such that, in each part, (1) all truth assignments have the same number of true atoms, and (2) the CNFs obtained by applying these truth assignments to $L$ are identical. For the $i$th part, $n_i$ is its size, $t_i$/$f_i$ is the number of true/false atoms in it, $\sigma_i$ is some truth assignment in it, $L|\sigma_i$ is the result of simplifying $L$ with $\sigma_i$, and $S_i$ is $S$ augmented with the substitution constraints added in the process. In a nutshell, LWMC works by finding lifted splits that lead to lifted decompositions, progressively simplifying the CNF until the base case is reached.
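Algorithm 1's power comes from lifting, but its recursive skeleton is easier to see in a purely propositional weighted model counter. The sketch below mirrors the three steps: base cases, decomposition into clause sets that share no atoms, and splitting on an atom (here a plain Shannon expansion rather than a lifted split). The substitutions $S$ and symmetric grounding counts are omitted, and all data structures and names are our own, not the paper's.

```python
def condition(clauses, atom, value):
    """Simplify the CNF given atom=value: drop satisfied clauses and
    remove falsified literals from the rest."""
    out = []
    for c in clauses:
        if (atom, value) in c:
            continue  # clause satisfied, drop it
        out.append(c - {(atom, not value)})
    return out

def connected_components(clauses):
    """Group clauses into components that share no atoms."""
    comps = []  # list of (clause_list, atom_set) pairs
    for c in clauses:
        c_atoms = {a for (a, _) in c}
        merged, rest = [c], []
        for comp, comp_atoms in comps:
            if comp_atoms & c_atoms:
                merged += comp
                c_atoms |= comp_atoms
            else:
                rest.append((comp, comp_atoms))
        comps = rest + [(merged, c_atoms)]
    return [comp for comp, _ in comps]

def wmc(clauses, atoms, weights):
    """clauses: list of frozensets of (atom, polarity) literals.
    atoms: set of unassigned atoms. weights: (atom, polarity) -> weight."""
    # Base case: an empty unsatisfied clause means weight 0.
    if any(len(c) == 0 for c in clauses):
        return 0.0
    # Base case: no clauses left; every free atom contributes W_A + W_notA.
    if not clauses:
        result = 1.0
        for a in atoms:
            result *= weights[(a, True)] + weights[(a, False)]
        return result
    # Decomposition step: independent components multiply.
    components = connected_components(clauses)
    if len(components) > 1:
        used = {a for c in clauses for (a, _) in c}
        result = 1.0
        for comp in components:
            comp_atoms = {a for c in comp for (a, _) in c}
            result *= wmc(comp, comp_atoms, weights)
        for a in atoms - used:  # atoms in no clause stay free
            result *= weights[(a, True)] + weights[(a, False)]
        return result
    # Splitting step: Shannon expansion on one atom (the propositional
    # analogue of a lifted split).
    a = next(iter(atoms & {at for c in clauses for (at, _) in c}))
    return sum(
        weights[(a, v)] * wmc(condition(clauses, a, v), atoms - {a}, weights)
        for v in (True, False)
    )

# Example: the CNF {A v B} with all literal weights 1 has weighted count
# 3.0, its number of satisfying assignments.
clauses = [frozenset({("A", True), ("B", True)})]
weights = {(a, v): 1.0 for a in ("A", "B") for v in (True, False)}
print(wmc(clauses, {"A", "B"}, weights))  # -> 3.0
```

What lifting adds on top of this skeleton is the ability to process all symmetric groundings of an atom at once, via the counts $n_A(S)$, $n_i$, $t_i$, and $f_i$ in Algorithm 1, instead of branching on each ground atom individually.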
Tractable Markov Logic

A TML knowledge base (KB) is a set of rules. These can take one of three forms, summarized in Table 1. $F : w$ is the syntax for formula $F$ with weight $w$. Rules without weights are hard.
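This excerpt ends before Table 1, but the correspondence stated in the introduction — classes in the hierarchy map to sums in the partition function, parts to products — can be illustrated with a toy recursion. The hierarchy, the weights, and the representation below are invented for illustration and are not TML syntax.

```python
# Toy sketch of the class/part correspondence: a class either offers a
# weighted choice among subclasses (a sum) or decomposes into independent
# subparts given its class (a product). All values here are assumptions.
SUBCLASSES = {
    "Vehicle": [("Car", 2.0), ("Truck", 1.0)],
    "Car": [], "Truck": [], "Wheel": [], "Engine": [],
}
SUBPARTS = {
    "Vehicle": [],
    "Car": [("wheel", "Wheel")] * 4 + [("engine", "Engine")],
    "Truck": [("wheel", "Wheel")] * 6 + [("engine", "Engine")],
    "Wheel": [], "Engine": [],
}

def z(cls):
    """Partition function of an object of class cls: product over its
    subparts, times a sum over its weighted subclass alternatives."""
    total = 1.0
    for _, part_cls in SUBPARTS[cls]:
        total *= z(part_cls)
    if SUBCLASSES[cls]:
        total *= sum(w * z(sub) for sub, w in SUBCLASSES[cls])
    return total

# One pass over a tree-structured hierarchy suffices, so the cost is
# linear in its size, even though the grounded graphical model over all
# part and class atoms can have very high treewidth.
print(z("Vehicle"))
```

This is only the intuition behind TML's tractability claim; the paper's actual definition of the partition function follows from the rule forms in Table 1.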