A Best-first Backward-chaining Search Strategy based on Learned Predicate Representations

Alexander Sakharov
Synstretch, Framingham, MA, U.S.A.

Keywords: Knowledge Base, First-order Logic, Resolution, Backward Chaining, Neural-symbolic Computing, Tensorization.

Abstract: Inference methods for first-order logic are widely used in knowledge base engines. These methods are powerful but slow in general. Neural networks make it possible to rapidly approximate the truth values of ground atoms. A hybrid neural-symbolic inference method is proposed in this paper. It is a best-first search strategy for backward chaining. The strategy is based on neural approximations of the truth values of literals. This method is precise and the results are explainable. It speeds up inference by reducing backtracking.

Sakharov, A. A Best-first Backward-chaining Search Strategy based on Learned Predicate Representations. DOI: 10.5220/0010299209820989. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence (ICAART 2021) - Volume 2, pages 982-989. ISBN: 978-989-758-484-8. Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

1 INTRODUCTION

The facts and rules of knowledge bases (KB) are usually expressible in first-order logic (FOL) (Russell and Norvig, 2009). Typically, KB facts are literals. Quantifier-free implications A ← A1 ∧ … ∧ Ak, where A, A1, …, Ak are literals, are arguably the most common form of rules in KBs. All these literals are positive in Prolog rules. In general logic programs, rule heads are positive. These rules are equivalent to disjunctions of literals in classical FOL. These disjunctions are known as non-Horn clauses, and the disjunctions corresponding to Prolog rules are called Horn clauses. Any FOL formula can be expressed by a set of non-Horn clauses whose conjunction is equisatisfiable with this formula (Nie, 1997).

Resolution methods work on sets of non-Horn clauses. These methods have become a de facto standard for inference in KBs and logic programming (Russell and Norvig, 2009). Their success is a major reason for the popularity of non-Horn clauses as a knowledge representation format. Complete inference methods for FOL, including resolution, are inherently slow. Multiple strategies and heuristics speeding up resolution procedures have been developed. SLD resolution, which is also known as backward chaining, is a complete strategy for Horn clauses. Faster but incomplete inference procedures are also acceptable for KBs. Prolog also utilizes an incomplete inference procedure (Stickel, 1992).

The inefficiency of inference in FOL and even in its Horn fragment prompted attempts to replace inference with neural networks (NN) (Rocktäschel, 2017; Serafini and d'Avila Garcez, 2016; Dong et al., 2019; Marra et al., 2019; Van Krieken et al., 2019; Sakharov, 2019). Most commonly, this is done via predicate tensorization. Objects are embedded as real-valued vectors of a fixed length for use in NNs. Predicates are represented by one or more tensors of various ranks, which are learned. The truth values of ground atoms of any predicate P are approximated by applying a symbolically differentiable function s to an algebraic expression. The range of s is the interval [0,1]. One corresponds to true, and zero corresponds to false. The expression is composed of the tensors representing P, embeddings of the constants that are P's arguments, tensor contraction operations, and symbolically differentiable functions.

One key advantage of this machine learning approach over inference is that the approximation of truth values of ground atoms is fast. Assuming that s and the other functions in the aforementioned expression are efficiently implemented, the approximation takes linear time in the size of the tensors representing a predicate. Unfortunately, there are serious cons to this approach.

This approach is limited to ground atoms. If the result of an approximation is around 0.5, it is not possible to draw any conclusion about the truth value of an atom. Approximation results may not be reliable. Their accuracy is not known in advance, and previous results do not provide any assurance for future results. The truth values yielded by NNs are not explainable because machine learning does not provide any justification for the results. In many AI tasks such as automatic code generation, robotic planning, etc., the aim is actually the derivation itself, not the mere knowledge of the truth value. The approximation of truth values based on NNs does not contribute to these tasks.

Hybrid approaches that combine symbolic reasoning and machine learning models are considered the most promising (Marcus, 2020). To the best of the author's knowledge, there are no known hybrid methods that retain the accuracy of inference, produce derivations, and take advantage of learned predicate representations. This work introduces such a method. It is a search strategy for backward chaining. This search strategy utilizes learned predicate representations in order to make better choices at every inference step, and thus to reduce backtracking, which usually consumes the vast portion of time during inference.

2 RESOLUTION

Resolution is perhaps the most practical inference method. The resolution calculus works on Skolemized FOL formulas in conjunctive normal form. The conjunctions are viewed as sets of disjunctions of literals, i.e. non-Horn clauses. The resolution calculus has two rules: resolution and factoring. The resolution rule produces the disjunction

A1θ ∨ … ∨ Ai−1θ ∨ Ai+1θ ∨ … ∨ Akθ ∨ B1θ ∨ … ∨ Bj−1θ ∨ Bj+1θ ∨ … ∨ Bmθ

from two disjunctions A1 ∨ … ∨ Ak and B1 ∨ … ∨ Bm, where the substitution θ is the most general unifier of Ai and ¬Bj. The factoring rule produces the disjunction

A1θ ∨ … ∨ Ai−1θ ∨ Ai+1θ ∨ … ∨ Akθ

from the disjunction A1 ∨ … ∨ Ak, where the substitution θ is the most general unifier of Ai and Aj. Factoring can be combined with the resolution rule. We assume the reader's familiarity with resolution; please refer to (Chang and Lee, 1973) for details. In this paper, we consider KB rules that are equivalent to non-Horn clauses, and KB facts that are literals.

Unconstrained resolution may be very inefficient. What makes resolution practical is the availability of strategies that constrain branching at every resolution step by prohibiting certain applications of the resolution rule. These strategies include set-of-support resolution, unit resolution, linear resolution, etc. Some of them are complete for FOL, some are not.

Backward chaining can be applied to non-Horn clauses as well (Sakharov, 2020). In this strategy, one of any two resolved literals is a rule head or a fact. It is an incomplete strategy for non-Horn clauses, but it is relatively efficient. Complete inference procedures for FOL and its extensions are more suitable for theorem provers. Inference in KBs is supposed to be faster, even at the expense of completeness. Unlike in theorem provers, the number of facts and rules involved in KB inference may be huge, which slows down the inference. The use of incomplete strategies is also justified by the fact that KBs are almost always incomplete.

Backward chaining is often explained in terms of goal lists (sets). The set of negations of the literals of one disjunction is considered the original list of goals. Every resolvent is also viewed as a goal list that is comprised of the negations of its literals. We follow this tradition. Due to this explanation, backward chaining is interpreted as inference based on generalized Modus Ponens (Russell and Norvig, 2009). Goals have to be derived, not refuted. Not only does this interpretation make backward chaining more explainable, it is also more pertinent to our best-first strategy. This strategy aims to pick the facts or rules that lead to goal lists whose elements are more likely derivable.

Various search strategies can be used in implementations of resolution. Search strategies determine the order in which literals or disjunctions are resolved. These strategies include depth-first, breadth-first, iterative deepening, and others. Prolog relies on the depth-first strategy. It is incomplete but efficient. Unit preference (Russell and Norvig, 2009) is one well-known best-first search strategy for resolution. Facts are resolved before rules under unit preference. OTTER (McCune, 2003) conducts best-first search on the basis of rule weight. Lighter rules are preferred. Longer rules tend to have a higher weight.

OTTER's search strategy is perhaps the closest to the strategy presented in this paper. In the examples given later, we compare the two. For certainty, we assume that the rule weight equals the number of symbols in the rule, including variables, constants, functions, predicates, and negations. Resolution strategies should include some form of loop detection (Shen et al., 2001). There also exist optimization techniques that make resolution implementations more efficient. One notable example of these techniques is tabling (Swift, 2009).

Non-Horn clauses may contain Skolem functions or constants. We refer to both of them as Skolem functions for short. They are introduced in the process of eliminating existential quantifiers from FOL formulas (Chang and Lee, 1973). Skolem functions are not evaluable because they are unknown. It is fair to assume that all other functions in KB rules are evaluable. Any term with a Skolem function at the top can be unified with a variable only. Usually, the majority of rules do not contain Skolem functions.
When predicates are approximated by NNs and on the depth of derivations.
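The predicate tensorization described in the introduction can be illustrated with a minimal sketch: a ground atom's truth value is approximated by contracting a learned predicate tensor with the embeddings of its constant arguments and squashing the result through a differentiable function with range (0, 1). The embedding dimension, the random initialization, the constant names, and the bilinear form below are illustrative assumptions, not the representation of any particular system.

```python
import math
import random

random.seed(0)
DIM = 8  # illustrative embedding length

# Hypothetical setup: each constant is embedded as a fixed-length real
# vector, and a binary predicate P is represented by a DIM x DIM tensor
# (both would be learned in a real system).
embed = {c: [random.gauss(0, 1) for _ in range(DIM)]
         for c in ("alice", "bob", "carol")}
P_tensor = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]

def s(x):
    # Symbolically differentiable squashing function with range (0, 1);
    # values near 1 stand for true, values near 0 for false.
    return 1.0 / (1.0 + math.exp(-x))

def approx_truth(tensor, c1, c2):
    """Approximate the truth value of the ground atom P(c1, c2) by
    contracting the predicate tensor with the argument embeddings."""
    e1, e2 = embed[c1], embed[c2]
    raw = sum(e1[i] * tensor[i][j] * e2[j]
              for i in range(DIM) for j in range(DIM))
    return s(raw)

v = approx_truth(P_tensor, "alice", "bob")
assert 0.0 < v < 1.0
```

The double loop makes the claimed cost visible: the approximation is linear in the size of the tensor representing the predicate.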
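One step of the resolution rule from Section 2 can be sketched concretely. In this toy encoding, terms are nested tuples such as ('p', 'X', 'a'), variables are capitalized strings, and a literal is negated by wrapping it in ('not', ...); the unifier omits the occurs check for brevity. This is a minimal illustration of a single resolution step, not an inference engine.

```python
def is_var(t):
    # Convention for this sketch: variables are capitalized strings.
    return isinstance(t, str) and t[0].isupper()

def walk(t, s):
    # Follow variable bindings in substitution s to a representative term.
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(t1, t2, s=None):
    """Most general unifier of two terms; returns a substitution dict
    (mapping variables to terms) or None if unification fails."""
    if s is None:
        s = {}
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        s2 = dict(s); s2[t1] = t2; return s2
    if is_var(t2):
        return unify(t2, t1, s)
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

def subst(t, s):
    # Apply substitution s throughout term t.
    t = walk(t, s)
    if isinstance(t, tuple):
        return tuple(subst(a, s) for a in t)
    return t

def negate(lit):
    return lit[1] if lit[0] == 'not' else ('not', lit)

def resolve(c1, c2, i, j):
    """Resolution rule: unify literal i of clause c1 with the negation of
    literal j of clause c2, and return the resolvent (or None)."""
    s = unify(c1[i], negate(c2[j]))
    if s is None:
        return None
    rest = c1[:i] + c1[i+1:] + c2[:j] + c2[j+1:]
    return [subst(l, s) for l in rest]

# Resolving p(X) v q(X) against ~p(a) yields the resolvent q(a).
assert resolve([('p', 'X'), ('q', 'X')], [('not', ('p', 'a'))], 0, 0) == [('q', 'a')]
```

The factoring rule would reuse the same machinery, unifying two literals of one clause and dropping one of them.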
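The goal-list reading of backward chaining can be sketched in a few lines. To keep the sketch self-contained it is propositional (ground atoms as plain strings), the facts and rules are invented for the example, and a fixed depth bound stands in crudely for real loop detection; a first-order version would thread unifiers through the recursion.

```python
# Facts are ground atoms; each rule maps a head to a body of subgoals.
facts = {"p(a)", "q(b)"}
rules = [("r(a)", ["p(a)", "q(b)"]),   # r(a) <- p(a) ^ q(b)
         ("s(a)", ["r(a)"])]           # s(a) <- r(a)

def prove(goals, depth=10):
    """Backward chaining over a goal list with chronological backtracking."""
    if not goals:
        return True                    # every goal has been derived
    if depth == 0:
        return False                   # crude stand-in for loop detection
    goal, rest = goals[0], goals[1:]
    if goal in facts:                  # resolve the goal against a fact
        return prove(rest, depth - 1)
    for head, body in rules:           # resolve it against a rule head
        if head == goal and prove(body + rest, depth - 1):
            return True
    return False                       # no applicable fact or rule: backtrack

assert prove(["s(a)"])
assert not prove(["t(a)"])
```

Each rule application replaces the selected goal by the rule's subgoals, which is exactly the goal-list view of a resolvent; the `for` loop is where a search strategy decides which fact or rule to try first.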
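The symbol-count weight assumed in the comparison with OTTER is easy to compute on a tuple encoding of literals (the nested-tuple encoding here is a hypothetical convention for the sketch): every variable, constant, function symbol, predicate symbol, and negation contributes one symbol, and a rule's weight is the sum over its literals.

```python
def weight(term):
    """Number of symbols in a literal encoded as nested tuples,
    e.g. ('not', ('p', 'X', 'a')); counts variables, constants,
    function/predicate symbols, and negations, one symbol each."""
    if isinstance(term, tuple):
        return 1 + sum(weight(a) for a in term[1:])
    return 1

def rule_weight(literals):
    return sum(weight(l) for l in literals)

# not p(X, a): negation + predicate + variable + constant = 4 symbols
assert weight(('not', ('p', 'X', 'a'))) == 4
# Under OTTER-style best-first search, the lighter clause is tried first.
assert weight(('p', 'a')) < weight(('not', ('p', ('f', 'X'), 'a')))
```

A best-first loop would then keep candidate clauses in a priority queue keyed by this weight.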
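The paper's strategy itself is developed in the sections that follow; purely as a rough illustration of the idea stated in the introduction, namely preferring the fact or rule whose resulting goal list looks most likely derivable, candidate steps might be ranked by a learned score. The hard-coded scorer below is a stand-in for a neural truth-value approximation, and combining goal scores by product is an illustrative choice, not the paper's actual criterion.

```python
# Stand-in for a neural approximation of ground-literal truth values.
def score(literal):
    return {"p(a)": 0.9, "q(b)": 0.8, "r(c)": 0.1}.get(literal, 0.5)

def rank_candidates(candidates):
    """Order alternative backward-chaining steps, given as
    (step_id, resulting_goal_list) pairs, most promising first."""
    def likelihood(goals):
        v = 1.0
        for g in goals:     # illustrative: treat goals as independent
            v *= score(g)
        return v
    return sorted(candidates, key=lambda c: likelihood(c[1]), reverse=True)

best = rank_candidates([("rule1", ["p(a)", "q(b)"]),
                        ("rule2", ["r(c)"])])[0][0]
assert best == "rule1"  # 0.9 * 0.8 beats 0.1
```

Trying the highest-ranked step first is what reduces backtracking when the scores are informative; when they are not, the search simply degrades toward its underlying exhaustive order.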