
Property Invariant Embedding for Automated Reasoning

Miroslav Olšák¹ and Cezary Kaliszyk² and Josef Urban³

¹ University of Innsbruck, Austria
² University of Innsbruck, Austria
³ Czech Technical Univ. in Prague, Czechia

Abstract. Automated reasoning and theorem proving have recently become major challenges for machine learning. In other domains, representations that are able to abstract over unimportant transformations, such as abstraction over translations and rotations in vision, are becoming more common. Standard methods of embedding mathematical formulas for learning theorem proving are however yet unable to handle many important transformations. In particular, embedding previously unseen labels, which often arise in definitional encodings and in Skolemization, has been very weak so far. Similar problems appear when transferring knowledge between known symbols. We propose a novel encoding of formulas that extends existing graph neural network models. This encoding represents symbols only by nodes in the graph, without giving the network any knowledge of the original labels. We provide additional links between such nodes that allow the network to recover the meaning and therefore correctly embed such nodes irrespective of the given labels. We test the proposed encoding in an automated theorem prover based on the tableaux connection calculus, and show that it improves on the best characterizations used so far. The encoding is further evaluated on the premise selection task and a newly introduced symbol guessing task, and shown to correctly predict 65% of the symbol names.

1 Introduction

Automated Theorem Provers (ATPs) [38] can in principle be used to attempt the proof of any provable mathematical conjecture. The standard ATP approaches have so far relied primarily on fast implementations of manually designed search procedures and heuristics. However, using machine learning for guidance in the vast action spaces of the ATP calculi is a natural choice that has recently been shown to significantly improve over the unguided systems [26, 20].

The common procedure of a first-order ATP system – saturation-style or tableaux – is the following. The ATP starts with a set of first-order axioms and a conjecture. The conjecture is negated and the formulas are Skolemized and clausified. The objective is then to derive a contradiction from the set of clauses, typically using some form of resolution and related inference rules. The Skolemization as well as the introduction of new definitions during the clausification results in the introduction of many new function and predicate symbols.

When guiding the proving process by statistical machine learning, the state of the prover and the formulas, literals, and clauses are typically encoded to vectors of real numbers. This has been so far mostly done with hand-crafted features resulting in large sparse vectors [27, 5, 1, 48, 23, 19], possibly reducing their dimension afterwards [6]. Several experiments with neural networks have been made recently, in particular based on 1D convolutions, RNNs [16], TreeRNNs [6], and GraphNNs [9]. Most of the approaches, however, cannot capture well the idea of a variable occurring multiple times in the formula and to abstract from the names of the variables. These issues were first addressed in FormulaNet [49], but even that architecture relies on knowing the names of function and predicate symbols. This makes it unsuitable for handling the large number of problem-specific function and predicate symbols introduced during the clausification.[4] The same holds for large datasets of ATP problems where symbol names are not used consistently, such as the TPTP library [43].

[4] The ratio of such symbols in real-world clausal datasets is around 40%, see Section 5.2.

In this paper, we make further steps towards the abstraction of mathematical clauses, formulas and proof states. We present a network that is invariant not only under renaming of variables, but also under renaming of arbitrary function and predicate symbols. It is also invariant under replacement of the symbols by their negated versions. This is achieved by a novel conversion of the input formulas into a hypergraph, followed by a particularly designed graph neural network (GNN) capable of maintaining the invariance under negation. We experimentally demonstrate in three case studies that the network works well on data coming from automated theorem proving tasks.

The paper is structured as follows. We first formally describe our network architecture in Section 2, and discuss its invariance properties in Section 3. We describe an experiment using the network for guiding leanCoP in Section 4, and two experiments done on a fixed dataset in Section 5. Section 6 contains the results of these three experiments.

2 Network Architecture for Invariant Embedding

This section describes the design and details of the proposed neural architecture for invariant embeddings. The architecture gets as its input a set of clauses C. It outputs an embedding for each of the clauses in C, each literal and subterm, and each function and predicate symbol present in C. The process consists of initially constructing a hypergraph out of the given set of clauses, followed by several message-passing layers on the hypergraph. In Section 2.1 we first explain the construction of a hypergraph from the input clauses. The details of the message passing are explained in Section 2.2.

2.1 Hypergraph Construction

When converting the clauses to the graph, we aim to capture as much relevant structure as possible. We roughly convert the tree structure of the terms to a circuit by sharing variables, constants and also bigger terms. The graph will also be interconnected through special nodes representing function symbols.

Let n_c denote the number of clauses, and let the clauses be C = {C_1, ..., C_{n_c}}. Similarly, let S = {S_1, ..., S_{n_s}} denote all the function and predicate symbols occurring at least once in the given set of clauses, and T = {T_1, ..., T_{n_t}} denote all the subterms and literals occurring at least once in the given set of clauses. Two subterms are considered to be identical (and therefore represented by a single node) if they are constructed the same way using the same functions and variables. If T_i is a negative literal, the unnegated form of T_i is not automatically added to T, but all its subterms are.

The sets C, S, T represent the nodes of our hypergraph. The hypergraph will also contain two sets of edges: binary edges E_ct ⊂ C × T between clauses and literals, and 4-ary oriented labeled edges E_st ⊂ S × T × (T ∪ {T_0})² × {1, −1}. Here T_0 is a specially created and added term node disjoint from all actual terms and serving in the arity-related encodings described below. The label is present at the last position of the 5-tuple. The set E_ct contains all the pairs (C_i, T_j) where T_j is a literal contained in C_i. Note that this encoding makes the order of the literals in the clauses irrelevant, which corresponds to the desired semantic behavior.

The set E_st is constructed by the following procedure applied to every literal or subterm T_i that is not a variable. If T_i is a negative literal, we set σ = 1 and interpret T_i as T_i = ¬S_j(t_1, ..., t_n); otherwise we set σ = −1 and interpret T_i as T_i = S_j(t_1, ..., t_n), where S_j ∈ S, n is the arity of S_j, and t_1, ..., t_n ∈ T. If n = 0, we add (S_j, T_i, T_0, T_0, σ) to E_st. If n = 1, we add (S_j, T_i, t_1, T_0, σ) to E_st. And finally, if n ≥ 2, we extend E_st by all the tuples (S_j, T_i, t_k, t_{k+1}, σ) for k = 1, ..., n − 1.

This encoding is used instead of just (S_j, T_i, t_k, σ) to (reasonably) maintain the order of function and predicate arguments. For example, for two non-isomorphic (i.e., differently encoded) terms t_1 and t_2, t_1 < t_2 will be encoded differently than t_2 < t_1. Note that even this encoding does not capture the complete information about the argument order. For example, the term f(t_1, t_2, t_1) would be encoded the same way as f(t_2, t_1, t_2). We consider such information loss acceptable. Further note that the sets E_ct, E_st, and the derived sets labeled F (explained below) are in fact multisets in our implementation. We present them using the set notations here for readability.
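To make the construction concrete, the following Python sketch builds the node sets and the two edge multisets from a clause list. It is an illustration under assumed data representations (terms as nested tuples, variables as strings, negation as a '~' wrapper), not the authors' implementation.

def arg_pairs(arg_ids, t0):
    """Map the argument list to the 3rd/4th positions of the 5-tuples in E_st."""
    if len(arg_ids) == 0:
        return [(t0, t0)]
    if len(arg_ids) == 1:
        return [(arg_ids[0], t0)]
    return list(zip(arg_ids, arg_ids[1:]))

def build_hypergraph(clauses):
    T0 = '<T0>'                     # the special dummy term node
    term_id = {T0: 0}               # structurally equal subterms share one node
    sym_id = {}                     # one node per function/predicate symbol
    E_ct, E_st = [], []             # edge multisets

    def sym(name):
        return sym_id.setdefault(name, len(sym_id))

    def term(t):
        """Node of a subterm; variables are leaves, applications add E_st edges."""
        if t not in term_id:
            term_id[t] = len(term_id)
            if isinstance(t, tuple):                       # application f(t_1, ..., t_n)
                args = [term(x) for x in t[1:]]
                for u, v in arg_pairs(args, term_id[T0]):
                    E_st.append((sym(t[0]), term_id[t], u, v, -1))
        return term_id[t]

    def literal(lit):
        """Node of a literal; the unnegated atom of a negative literal is not added."""
        sigma, atom = (1, lit[1]) if lit[0] == '~' else (-1, lit)
        if lit not in term_id:
            term_id[lit] = len(term_id)
            args = [term(x) for x in atom[1:]]
            for u, v in arg_pairs(args, term_id[T0]):
                E_st.append((sym(atom[0]), term_id[lit], u, v, sigma))
        return term_id[lit]

    for ci, clause in enumerate(clauses):
        for lit in clause:
            E_ct.append((ci, literal(lit)))
    return term_id, sym_id, E_ct, E_st

# Example: the clause p(f(X)) v ~q(X, c) yields shared nodes for X and the constant c.
nodes_t, nodes_s, E_ct, E_st = build_hypergraph(
    [[('p', ('f', 'X')), ('~', ('q', 'X', ('c',)))]])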
2.2 Message Passing

Based on the hyperparameters L (the number of layers) and d_c^i, d_s^i, d_t^i for i = 0, ..., L (the dimensions of the vectors), we construct vectors c_{i,j} ∈ R^{d_c^i}, s_{i,j} ∈ R^{d_s^i}, and t_{i,j} ∈ R^{d_t^i}. First we initialize c_{0,j}, s_{0,j} and t_{0,j} by learned constant vectors for every type of clause, symbol, or term. By a "type" we mean an attribute based on the underlying task, see Section 4 for an example. To preserve invariance under negation (see Section 3), we initialize all predicate symbols to the zero vector.

After the initialization, we propagate through L message-passing layers. The i-th layer will output vectors c_{i,j}, s_{i,j} and t_{i,j}. The values in the last layer, that is c_{L,j}, s_{L,j} and t_{L,j}, are considered to be the output of the network. The basic idea of the message-passing layer is to propagate information from a node to all its neighbors related by E_ct and E_st while recognizing the "direction" in which the information came. After this, we reduce the incoming data to a finite dimension using a reduction function (defined below) and propagate through standard neural layers.[5] The symbol nodes s_{i,j} need particular care, because they can represent two predicate symbols at once: if s_{i,j} represents a predicate symbol P, then −s_{i,j} represents the predicate symbol ¬P. To preserve the polarity invariance, the symbol nodes are treated slightly differently.

[5] Mostly implemented using the ReLU activation function.

In the following we first provide the formulas describing the computation. The symbols used in them are explained afterwards.

  c_{i+1,j} = ReLU( B_c^i + M_c^i · c_{i,j} + M_{ct}^i · red_{a ∈ F_{ct}^j}( t_{i,a} ) )

  x_i^{a,b,c} = B_{ts}^i + M_{ts,1}^i · t_{i,a} + M_{ts,2}^i · t_{i,b} + M_{ts,3}^i · t_{i,c}

  s_{i+1,j} = tanh( M_s^i · s_{i,j} + M_{ts}^i · red′_{(a,b,c,g) ∈ F_{st}^j}( g · x_i^{a,b,c} ) )

  y_{i,d}^{a,b,c,g} = B_{st}^i + M_{st,i}^{1,d} · t_{i,a} + M_{st,i}^{2,d} · t_{i,b} + M_{st,i}^{3,d} · s_{i,c} · g

  z_{i,j,d} = M_{st,d}^i · red_{(a,b,c,g) ∈ F_{ts,d}^j}( ReLU( y_{i,d}^{a,b,c,g} ) )

  v_{i,j} = M_{tc}^i · red_{a ∈ F_{tc}^j}( c_{i,a} )

  t_{i+1,j} = ReLU( B_t^i + M_t^i · t_{i,j} + v_{i,j} + Σ_{d ∈ {1,2,3}} z_{i,j,d} )

Here, all the B symbols represent learnable vectors (biases), and all the M symbols represent learnable matrices. Their sizes are listed in Fig. 1.

Figure 1. Sizes of the learnable biases and matrices for i = 0, ..., L − 1 and j, k ∈ {1, 2, 3}:

  B_c^i : d_c^{i+1}            B_{ts}^i : d_s^{i+1}           B_{st}^i : d_t^{i+1}          B_t^i : d_t^{i+1}
  M_c^i : d_c^{i+1} × d_c^i    M_t^i : d_t^{i+1} × d_t^i      M_{ts,j}^i : d_s^{i+1} × d_t^i
  M_{ct}^i : d_c^{i+1} × 2d_t^i    M_{tc}^i : d_t^{i+1} × 2d_c^i    M_{st,j}^i : d_t^{i+1} × 2d_t^{i+1}
  M_s^i : d_s^{i+1} × d_s^i    M_{ts}^i : d_s^{i+1} × 2d_s^{i+1}    M_{st,i}^{k,j} : d_t^{i+1} × d_t^i

By a reduction operation red_{i∈I}(u_i), where all u_i are vectors of the same dimension n, we mean the vector of dimension 2n obtained by concatenation of max_{i∈I}(u_i) and avg_{i∈I}(u_i). The maximum and average operations are performed point-wise. We also use another reduction operation red′ defined in the same way except taking max_{i∈I}(u_i) + min_{i∈I}(u_i) instead of just the maximum. This makes red′ commute with multiplication by −1. If a reduction operation obtains an empty input (the indexing set is an empty set), the result is the zero vector of the expected size.

We construct sets F_ct^{j_c} and F_tc^{j_t} based on E_ct, and F_st^{j_s} and F_{ts,d}^{j_t} based on E_st, where j_c = 1, ..., n_c, j_s = 1, ..., n_s, j_t = 1, ..., n_t, and d = 1, 2, 3. Informally, the set F_xy^j contains the indices related to type y for message passing, given the j-th receiving node of type x. Formally:

  F_ct^j = { a : (C_j, T_a) ∈ E_ct }
  F_tc^j = { a : (C_a, T_j) ∈ E_ct }
  F_st^j = { (a, b, c, g) : (S_j, T_a, T_b, T_c, g) ∈ E_st }
  F_{ts,1}^j = { (a, b, c, g) : (S_c, T_j, T_a, T_b, g) ∈ E_st }
  F_{ts,2}^j = { (a, b, c, g) : (S_c, T_a, T_j, T_b, g) ∈ E_st }
  F_{ts,3}^j = { (a, b, c, g) : (S_c, T_a, T_b, T_j, g) ∈ E_st }

Since E_st can contain the dummy node T_0 on the third and fourth positions, following F_st^j or F_{ts,d}^j in the message-passing layer may lead us to a non-existing vector t_{i,0}. In that case, we just take the zero vector of dimension d_t^i.

After L message-passing layers, we obtain the embeddings c_{L,j}, s_{L,j}, t_{L,j} of the clauses C_j, symbols S_j, and terms and literals T_j respectively.
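As an illustration of the reductions and of two of the update equations, the following numpy sketch can be used. It is an assumed, minimal re-implementation for exposition (toy dimensions, dense loops over neighbor lists), not the authors' code.

import numpy as np

def red(vectors, dim):
    """red: concatenation of the point-wise max and average (dimension 2*dim);
    the zero vector is returned for an empty index set."""
    if not vectors:
        return np.zeros(2 * dim)
    v = np.stack(vectors)
    return np.concatenate([v.max(0), v.mean(0)])

def red_prime(vectors, dim):
    """red': as red, but with max+min instead of max, so red'(-u) = -red'(u)."""
    if not vectors:
        return np.zeros(2 * dim)
    v = np.stack(vectors)
    return np.concatenate([v.max(0) + v.min(0), v.mean(0)])

def relu(x):
    return np.maximum(x, 0.0)

def clause_update(c_j, t, F_ct_j, B_c, M_c, M_ct):
    """c_{i+1,j} = ReLU(B_c + M_c c_{i,j} + M_ct · red_{a in F_ct^j}(t_{i,a}))."""
    pooled = red([t[a] for a in F_ct_j], dim=t.shape[1])
    return relu(B_c + M_c @ c_j + M_ct @ pooled)

def symbol_update(s_j, signed_messages, M_s, M_ts):
    """s_{i+1,j} = tanh(M_s s_{i,j} + M_ts · red'(g * x_i^{a,b,c}));
    signed_messages is the list of already signed vectors g * x_i^{a,b,c}."""
    pooled = red_prime(signed_messages, dim=M_ts.shape[1] // 2)
    return np.tanh(M_s @ s_j + M_ts @ pooled)

# Shape check with assumed toy dimensions d_c = 4, d_t = 3:
rng = np.random.default_rng(0)
t = rng.normal(size=(5, 3))                      # five term vectors t_{i,a}
c_next = clause_update(rng.normal(size=4), t, [0, 2, 4],
                       B_c=np.zeros(4), M_c=np.eye(4),
                       M_ct=rng.normal(size=(4, 6)))
assert c_next.shape == (4,)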
3 Invariance Properties

By the design of the network, it is apparent that the output is invariant under the names of the symbols. Indeed, the names are used only for determining which symbol nodes and term nodes should be the same and which should be different.

It is also worth noticing that the network is invariant under reordering of literals in clauses, and under reordering of clauses. More precisely, if we reorder the clauses C_1, ..., C_{n_c}, then the values c_{i,1}, ..., c_{i,n_c} are reordered accordingly, and the values s_{i,j}, t_{i,j} do not change if they still correspond to the same symbols and terms (they could be also rearranged in general). This property is clear from the fact that there is no ordered processing of the data, and the only way how literals are attributed to clauses is through graph edges, which are also unordered.

Finally, the network is also designed to preserve the symmetry under negation. More precisely, consider replacing every occurrence of a predicate symbol S_x by the predicate symbol ¬S_x in every clause C_i and every literal T_i. Then the vectors c_{i,j}, t_{i,j} do not change, the vectors s_{i,j} do not change either for all j ≠ x, and the vector s_{i,x} is multiplied by −1.

We show this by induction on the layer i. For layer 0, this is apparent since S_x is a predicate symbol, so s_{0,x} = 0 = −s_{0,x}. Now, let us assume that the claim is true for a layer i. We follow the computation of the next layer. The symbol vectors s_i are not used at all in the computation of c_{i+1,j}, so c_{i+1,j} remains the same. For s_{i+1,j} where j ≠ x, we don't use s_{i,x} in the formula, and the signs have not changed in F_st^j. Therefore s_{i+1,j} remains the same. When computing t_{i+1,j}, we multiply every s_{i,c} with the appropriate sign (denoted g in the formula). Since we have replaced every occurrence of S_x by ¬S_x and kept the other symbols, the sign g is multiplied by −1 if and only if c = x, and therefore the product does not change. Finally, when computing s_{i+1,x}, we follow the formula below:

  s_{i+1,x} = tanh( M_s^i · s_{i,x} + M_{ts}^i · red′_{(a,b,c,g) ∈ F_{st}^x}( g · x_i^{a,b,c} ) )

where x_i^{a,b,c} depends only on the values t_{i,j}, and therefore was not changed. We can rewrite the formula as

  s_{i+1,x} = −tanh( M_s^i · (−s_{i,x}) + M_{ts}^i · red′_{(a,b,c,g) ∈ F_{st}^x}( (−g) · x_i^{a,b,c} ) )

This is because tanh, addition, matrix multiplication, and the reduction function red′ are compatible with multiplication by −1. In fact, except for tanh they are all linear, thus compatible with multiplication by any constant, and tanh is an odd function. The second formula for s_{i+1,x} can also be seen as a formula for minus the value of the original s_{i+1,x}, since −s_{i,x} is the original value of s_{i,x}, and (−g) is the original value of g. Therefore s_{i+1,x} was multiplied by −1.
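The two facts the argument relies on — that red′ commutes with multiplication by −1 and that tanh is odd — are easy to sanity-check numerically. The following small check is an illustration added here, not part of the original paper:

import numpy as np

def red_prime(vectors):
    """red': concatenation of (max + min) and the average, as defined in Section 2.2."""
    v = np.stack(vectors)
    return np.concatenate([v.max(0) + v.min(0), v.mean(0)])

rng = np.random.default_rng(1)
us = [rng.normal(size=6) for _ in range(4)]

# red' commutes with multiplication by -1 ...
assert np.allclose(red_prime([-u for u in us]), -red_prime(us))
# ... and tanh is odd, so the whole symbol update flips sign with its inputs.
x = rng.normal(size=6)
assert np.allclose(np.tanh(-x), -np.tanh(x))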

4 Guiding a Connection Tableaux Prover

One of the most important uses of machine learning in theorem proving is guiding the inferences done by the automated theorem provers. The first application of our proposed model is to guide the inferences performed by the leanCoP prover [36]. This line of work follows our previous experiments with this prover using the XGBoost system for guidance [26]. In this section, we first give a brief description of the leanCoP prover, then we explain how we fit our network to the leanCoP prover, and finally discuss the interaction between the network and the Monte-Carlo Tree Search that we use.

The leanCoP prover attempts to prove a given first-order logic problem by first transforming its negation to a clausal form and then finding a set of instances of the input clauses that is unsatisfiable. leanCoP proves the unsatisfiability by building a connection tableau,[6] i.e. a tree, where every node contains a literal, with the following properties.

• The root of the tree is an instance of a given initial clause.
• The children of every non-leaf node are an instance of an input clause (we call such clauses axioms). Moreover, one of the child literals must be complementary to the node.
• Every leaf node is complementary to an element of its path.

[6] In some implementations this is a rooted forest, as there can be multiple literals in the start clause.

Figure 2. Example of a closed connection tableau, adapted from [31].

The tree is built during the proof process, which involves automatic computation of substitutions using unification. Therefore the only decisions that have to be made are "which axiom should be used for which node?". In particular, leanCoP starts with the initial clause and in every step it selects the left-most unclosed (open) leaf. If the leaf can be unified with an element of the path, the unification is applied. Otherwise, the leaf has to be unified with a literal in an axiom, and a decision, which literal in which axiom to use, has to be made. The instance of the axiom is then added to the tree and the process continues until the entire tree is closed (i.e., the prover wins, see Fig. 2), or there is no remaining available move (i.e., the prover loses). As most branches are infinite, additional limits are introduced and the prover also loses if such a limit is reached. In our experiments, we use a version of the prover with two additional optimizations: lemmata and regularization, originally proposed by Otten [35].

Figure 3. A state in the leanCoP solving process.

To guide the proof search (Fig. 3), i.e. to select the next action, we use Monte Carlo Tree Search with policy and value, similar to AlphaZero [42]. This means that the trainable model should take a leanCoP state as its input, and return an estimated value of the state, i.e., the probability that the prover will win, and the action logits, i.e., real numbers assigned to every available action. The action probabilities are then computed from the action logits using the softmax function.

To process the leanCoP state with our network, we first need to convert it to a list of clauses. If there are A axioms, and a path of length P, we give the network the A + P + 1 clauses: every axiom is a clause and every element in the path is a clause consisting of one literal. The last clause given to the network consists of all the unfinished goals, both under the current path and in earlier branches. This roughly corresponds to the set of clauses from which we aim to obtain the contradiction. The initial labels of the clauses can therefore be of 3 types: a clause originating from a goal, a member of a path, or an axiom. Each of these types represents a learnable initial vector of dimension 4.

The symbols can be of two types: predicates and functions; their initial value is represented by a single real number: zero for predicates, and a learnable number for functions. For term nodes, variables in different axioms are always considered to be different, and they are also considered to be different from the variables in the tableau (note that unification performs variable renaming). Variables in the tableau are shared among the path and the goals. Every term node can be of four types: a variable in an axiom, a variable in the tableau, a literal, or another term. The term nodes have initial dimension 4.

Afterwards, we propagate through five message-passing layers, with dimensions d_c^i = d_t^i = 32, d_s^i = 64, obtaining c_{5,j}, s_{5,j} and t_{5,j}. Then we consider all the c_{5,j} vectors, apply a hidden layer of size 64 with ReLU activation to them, apply the red reduction, and use one more hidden layer of size 64 with ReLU activation. The final value is then computed by a linear layer with sigmoid activation.

Given the general setup, we now describe how we compute the logit for an action corresponding to the use of axiom C_i, and complementing its literal T_j with the current goal. Let C_k represent the clause of all the remaining goals. We concatenate c_{5,i}, t_{5,j} and c_{5,k}, process it with a hidden layer of size 64 with ReLU activation, and then use a linear output layer (without the activation function).
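The value and policy readouts just described can be sketched as follows. This is an assumed, illustrative formulation on top of the final embeddings c_5 and t_5; parameter names, shapes and the code organization are not taken from the paper.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def red(v):
    """The red reduction from Section 2.2: point-wise max and average, concatenated."""
    return np.concatenate([v.max(0), v.mean(0)])

def state_value(c5, params):
    """Value head: hidden(64)+ReLU per clause, red, hidden(64)+ReLU, linear + sigmoid.
    c5 has shape (n_clauses, 32); W1: (32, 64), W2: (64, 128), w_out: (64,)."""
    h = relu(c5 @ params["W1"] + params["b1"])
    pooled = red(h)
    h2 = relu(params["W2"] @ pooled + params["b2"])
    return sigmoid(params["w_out"] @ h2 + params["b_out"])

def action_logit(c5_axiom, t5_literal, c5_goals, params):
    """Policy head: concatenate (c_{5,i}, t_{5,j}, c_{5,k}), hidden(64)+ReLU, linear."""
    x = np.concatenate([c5_axiom, t5_literal, c5_goals])
    h = relu(params["Wp"] @ x + params["bp"])
    return params["wp_out"] @ h + params["bp_out"]

def action_probabilities(logits):
    """Softmax over the logits of the currently available actions."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()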
With the leanCoP prover, we perform four solving and training iterations. In every solving iteration, we attempt to solve every problem in the dataset, generating training data in the meantime. The training data are then used for training the network, minimizing the cross-entropy with the target action probabilities and the MSE of the target value of every trained state. Every solving iteration therefore produces the targets for the action policy and for the value estimation that are used for the following training.

The solving iteration number 0 (we also call it "bare prover") is performed differently from the following ones. We use the prover without guidance, performing random steps with the step limit 200 repeatedly within a time limit. For every proof we find, we run the random solver again from every partial point in the proof, estimating the probabilities that the particular actions will lead to a solution. This is our training data for action probabilities. In order to get training data for value, we take all the states which we estimated during the computation of action probabilities. If the probability of finding a proof is non-zero in that state, we give it value 1, otherwise, we give it value 0.

Every other solving iteration is based on the network guidance in an MCTS setting, analogously to AlphaZero [42] and to the rlCoP system [26]. In order to decide on the action, we first build a game tree of size 200 according to the PUCT formula

  U(s, a) = log( (1 + N(s) + c_base) / c_base ) · √N(s) / (1 + N(s, a)),

where the prior probabilities and values are given by the network, and then we select the most visited node (performing a bigstep). This contrasts with the previous experiment with a simpler classifier [26], where every decision node is given 2000 new expansions (in addition to the expansions already performed on the node). Additionally, a limit of 200 game steps has been added. The target probabilities of any state in every bigstep are proportional to the visit counts of the appropriate actions in the tree search. The target value in these states is a boolean depending on whether the proof was ultimately found or not.
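A minimal sketch of the selection score follows. The exploration term reproduces the formula above; how it is combined with the network prior and the running value estimate is not spelled out in the text, so the combination and the constant below are assumptions for illustration only.

import math

def puct_u(N_s, N_sa, c_base):
    """U(s,a) = log((1 + N(s) + c_base) / c_base) * sqrt(N(s)) / (1 + N(s,a))."""
    return math.log((1 + N_s + c_base) / c_base) * math.sqrt(N_s) / (1 + N_sa)

def select_action(stats, prior, c_base=19652.0):
    """stats: {action: (visit_count, mean_value)}; prior: {action: probability}.
    The default c_base is the AlphaZero value, assumed here, not taken from the paper."""
    N_s = sum(n for n, _ in stats.values())

    def score(a):
        n, q = stats[a]
        # AlphaZero-style combination: value estimate plus prior-weighted exploration.
        return q + prior[a] * puct_u(N_s, n, c_base)

    return max(stats, key=score)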
5 DeepMath Experiments

DeepMath is a dataset developed for the first deep learning experiments with premise selection [2] on the Mizar40 problems [25]. Unlike other datasets such as HOLStep [21], DeepMath contains first-order formulas, which makes it more suitable for our network. We used the dataset for two experiments – premise selection (Section 5.1) and recovering symbol names from the structure, i.e. symbol guessing (Section 5.2).

5.1 Premise Selection

DeepMath contains 32524 conjectures, and a balanced list of positive and negative premises for each conjecture. There are on average 8 positive and 8 negative premises for each conjecture. The task we consider first is to tell apart the positive and negative premises.

For our purposes, we randomly divided the conjectures into 3252 testing conjectures and 29272 training conjectures. For every conjecture, we clausified the negated conjecture together with all its (negative and positive) premises, and gave them all as input to the network (as a set of clauses). We kept the hyperparameters from the leanCoP experiment. There are two differences. First, there are just two types of clause nodes: negated conjectures and premises. Second, we consider just one type of variable nodes.

To obtain the output, we reduce (using the red function introduced in Section 2) the clause nodes belonging to the conjecture and we do the same also for each premise. The two results are concatenated and pushed through a hidden layer of size 128 with ReLU activation. Finally, an output layer (with sigmoid activation) is applied to obtain the estimated probability of the premise being positive (i.e., relevant for the conjecture).
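The premise-selection readout can be sketched as below. This is an assumed minimal version of the described head (reduce the conjecture clauses and the premise clauses separately, concatenate, hidden layer of 128 with ReLU, sigmoid output); parameter names are illustrative.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def red(v):
    return np.concatenate([v.max(0), v.mean(0)])

def premise_probability(c5_conjecture, c5_premise, params):
    """c5_conjecture: clause embeddings of the negated conjecture, shape (n1, d_c);
    c5_premise: clause embeddings of one premise, shape (n2, d_c);
    params["W"]: (128, 4*d_c), params["w_out"]: (128,)."""
    x = np.concatenate([red(c5_conjecture), red(c5_premise)])
    h = relu(params["W"] @ x + params["b"])
    return sigmoid(params["w_out"] @ h + params["b_out"])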

5.2 Recovering Symbol Names from the Structure

In addition to the standard premise selection task, our setting is also suitable for defining and experimenting with a novel interesting task: guessing the names of the symbols from the structure of the formula. In particular, since the network has no information about the names of the symbols, it is interesting to see how much the trained system can correctly guess the exact names of the function and predicate symbols based just on the problem structure.

One of the interesting uses is for conjecturing by analogy [15], i.e., creating new conjectures by detecting and following alignments of various mathematical theories and concepts. Typical examples include the alignment between the theories of additive and multiplicative groups, complex and real vector spaces, dual operations such as join and meet in lattices, etc. The first systems used for alignment detection have so far been manually engineered [14], whereas in our setting such alignment is just a byproduct of the structural learning.

There are two ways how a new unique symbol can arise during the clausification process: either as a Skolem function, or as a new definition (predicate) that represents parts of the original formulas. We performed two experiments based on how such new symbols are handled. We either ignore them, and train the neural network on the original (labeled) symbols only, or we give to all the new symbols the common labels skolem and def. Table 1 shows the frequencies of the five most common symbols in the DeepMath dataset after the clausification. Note that the newly introduced skolems and definitions account for almost 40% of the data.

Table 1. The most common symbols in the clausified DeepMath.

  TPTP name:   def     skolem   =      m1_subset_1   k1_zfmisc_1
  Mizar name:  N/A     N/A      =      Element       bool
  Frequency:   21.5%   17.3%    2.0%   1.7%          1.2%
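Since the symbol-guessing task predicts a name for each symbol node from its final embedding s_{L,j}, a natural readout is a softmax classifier over the name vocabulary trained with cross-entropy. The exact classifier used in the paper is not detailed here, so the following is an assumed minimal version:

import numpy as np

def name_logits(s_Lj, W_names, b_names):
    """W_names: (n_names, d_s) matrix of name vectors; one logit per candidate name."""
    return W_names @ s_Lj + b_names

def cross_entropy(logits, target_index):
    """Training loss for one symbol node against its true name index."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target_index]

def top_k_names(logits, vocabulary, k=5):
    """Ranked list of candidate names, as used for the analysis in Section 6.3."""
    order = np.argsort(-logits)[:k]
    return [vocabulary[i] for i in order]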

6 Experimental Results

6.1 Guiding leanCoP

We evaluate our neural guided leanCoP against rlCoP [26]. Note however that for both systems we use 200 playouts per MCTS decision, so the rlCoP results presented here are different from [26]. We start with a set of leanCoP states with their values and action probabilities coming from the 4595 training problems solved with the bare random prover.

After training on this set, the MCTS guided by our network manages to solve 11978 training (160.7% more) and 1322 (159.2% more) testing problems, in total 13300 problems (160.5% more – Fig. 4). This is in total 49.1% more than rlCoP guided by XGBoost, which in the same setup and with the same limits solves 8012 training problems, 908 testing problems, and 8920 problems in total. The improvement in the first iteration over XGBoost on the training and testing set is 49.5% and 45.6% respectively.

Subsequent iterations are also much better than for rlCoP, with slower progress already in the third iteration (note that rlCoP also loses problems, starting with the 6th iteration). The evaluation ran 100 provers in parallel on multiple CPUs communicating with the network running on a GPU. Receiving the queries from the prover takes on average 0.1 s, while the message-passing layers alone take around 0.12 s per batch. The current main speed issue turned out to be the communication overhead between the provers and the network. The average inference step in 100 agents and one network inference took on average 0.57 sec.

Figure 4. Comparison of the number of problems solved by leanCoP guided by the invariant-preserving GNN and by XGBoost.

  invariant net guided   bare    iter. 1   iter. 2   iter. 3
  overall                5105    13300     14042     14002
  training               4595    11978     12648     12642
  testing                510     1322      1394      1360

  rlCoP                  bare    iter. 1   iter. 2   iter. 3
  overall                5105    8920      10030     10959
  training               4595    8012      9042      9874
  testing                510     908       988       1085

6.2 Premise Selection

In the first DeepMath experiment with Boolean classification, we obtained testing accuracy of around 80%. We trained the network in 100 epochs on minibatches of size 50. A stability issue can be spotted around epoch 60, from which the network quickly recovered. We cannot compare the results to the standard methods since the dataset is by design hostile to them – the negative samples are based on the KNN, so KNN has accuracy even less than 50%. Simpler neural networks were previously tested on the same dataset [29], reaching accuracy 76.45%.

Figure 5. Testing and training accuracy on the premise selection task on the DeepMath dataset.

6.3 Recovering Symbol Names from the Structure

For guessing of symbol names, we used minibatches consisting only of 10 queries, and trained the network for 50 epochs. When training and evaluating on the labeled symbols only, the testing accuracy reached 65.27% in the last epoch. Note that this accuracy is measured on the whole graph, i.e., we count both the symbols of the conjecture and of the premises. When training and evaluating also on the def and skolem symbols, the testing accuracy reached 78.4% in the last epoch – see Fig. 7.

Figure 6. Testing and training accuracy on the label guessing task on the DeepMath dataset.

Figure 7. Testing and training accuracy on the label guessing task including labels def and skolem on the DeepMath dataset.

We evaluate the symbol guessing (without considering def and skolem) in more detail on the 3252 test problems and their conjectures. In particular, for each of these problems and each conjecture symbol, the evaluation with the trained network gives a list of candidate symbol names ranked by their probability. We first compute the number of cases where the most probable symbol name as suggested by the trained network is the correct one. This happens in 22409 cases out of 32196, i.e., in 70% of the cases.[7] A perfect naming of all symbols is achieved for 544 conjectures, i.e., in 16.7% of the test cases. Some of the most common analogies, measured as the common symbol-naming mistakes done on the test conjectures, are shown in Table 2.

[7] This differs from the testing accuracy of 65.27% mentioned above, because we only consider the conjecture symbols here.

Table 2. Some of the common analogies (count, original Mizar symbol, predicted Mizar symbol).

  129  Relation-like          Function-like
  69   void                   empty
  53   Abelian                add-associative
  47   total                  -defined
  45   0                      1
  40   +                      *
  39   reflexive              transitive
  38   Function-like          FinSequence-like
  33   -                      +
  31   trivial                empty
  28   >=                     =
  27   associative            transitive
  26   infinite               Function-like
  25   empty                  degenerated
  24   real                   natural
  23   sigma-multiplicative   compl-closed
  20   REAL                   COMPLEX
  18   transitive             reflexive
  18   RelStr                 TopStruct
  18   Category-like          transitive
  17   =                      c=
  16   initial                infinite
  16   [Graph-like]           Function-like
  16   associative            Group-like
  16   0                      {}
  16   /                      *
  15   add-associative        associative
  15   -                      *
  13   width                  len
  13   integer                natural
  13   in                     c=
  12   ∩                      ∪
  11   c=                     >=
  10   with_infima            with_suprema
  10   ordinal                natural
  9    closed                 open
  8    sup                    inf
  8    Submodule              Subspace
  7    Int                    Cl
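The two statistics reported above (top-1 accuracy over conjecture symbols and the fraction of conjectures with every symbol named correctly) can be computed as in the following sketch; the data layout is an assumption for illustration.

def symbol_guessing_stats(predictions, ground_truth):
    """predictions[conj] maps each conjecture symbol to its ranked candidate names;
    ground_truth[conj] maps each symbol to its true name."""
    correct = total = perfect_conjectures = 0
    for conj, ranked in predictions.items():
        all_correct = True
        for symbol, candidates in ranked.items():
            total += 1
            if candidates[0] == ground_truth[conj][symbol]:   # top-1 hit
                correct += 1
            else:
                all_correct = False
        perfect_conjectures += all_correct
    return correct / total, perfect_conjectures / len(predictions)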
We briefly analyze some of the analogies produced by the network predictions. In theorem XBOOLE_1:25[8] below, the trained network's best guess correctly labels the symbols as binary intersection and union (both with probability ca 0.75). Its second best guess is however also quite probable (p = 0.2), swapping the union and intersection. This is quite common, probably because dual theorems about these two symbols are frequent in the training data. Interestingly, the second best guess results also in a provable conjecture, since it easily follows from XBOOLE_1:25 just by symmetry of equality.

theorem :: XBOOLE_1:25
for X, Y, Z being set holds
((X ∧ Y) ∨ (Y ∧ Z)) ∨ (Z ∧ X) = ((X ∨ Y) ∧ (Y ∨ Z)) ∧ (Z ∨ X)

second guess:
for X, Y, Z being set holds
((X ∨ Y) ∧ (Y ∨ Z)) ∧ (Z ∨ X) = ((X ∧ Y) ∨ (Y ∧ Z)) ∨ (Z ∧ X)

In theorem CLVECT_1:72[9] the trained network has consistently decided to replace the symbols defined for complex vector spaces with their analogs defined for real vector spaces (i.e., those symbols are ranked higher). This is most likely because of the large theory of real vector spaces in the training data, even though the exact theorem RLSUB_1:53[10] was not among the training data. This again means that the trained network has produced RLSUB_1:53 as a new (provable) conjecture.

theorem :: CLVECT_1:72
for V being ComplexLinearSpace for u, v being VECTOR of V
for W being Subspace of V holds
( u in W iff v + W = (v - u) + W )

theorem :: RLSUB_1:53
for V being RealLinearSpace for u, v being VECTOR of V
for W being Subspace of V holds
( u in W iff v + W = (v - u) + W )

Finally, we show below two examples. The first one illustrates on theorems LATTICE4:15[11] and LATTICE4:23[12] the network finding well-known dualities of concepts in lattices (join vs. meet, upper-bounded vs. lower-bounded and related concepts). The second one is an example of a discovered analogy between division and subtraction operations on complex numbers, i.e., conjecturing MEMBER_1:130[13] from MEMBER_1:77[14].

theorem :: LATTICE4:15
for 0L being lower-bounded Lattice
for B1, B2 being Finite_Subset of the carrier of 0L holds
(FinJoin B1) "∨" (FinJoin B2) = FinJoin (B1 ∨ B2)

similar to:

theorem Th23: :: LATTICE4:23
for 1L being upper-bounded Lattice
for B1, B2 being Finite_Subset of the carrier of 1L holds
(FinMeet B1) "∧" (FinMeet B2) = FinMeet (B1 ∨ B2)

theorem :: MEMBER_1:77
for a, b, s being complex number holds
{a,b} -- {s} = {(a - s),(b - s)}

similar to:

theorem :: MEMBER_1:130
for a, b, s being complex number holds
{a,b} /// {s} = {(a / s),(b / s)}

[8] http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/xboole_1#T25
[9] http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/clvect_1#T72
[10] http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/rlsub_1#T53
[11] http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/lattice4#T15
[12] http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/lattice4#T23
[13] http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/member_1#T130
[14] http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/member_1#T77

7 Related Work

Early work on combining machine learning with automated theorem proving includes, e.g., [10, 8, 40]. Machine learning over large formal corpora created from ITP libraries [45, 34, 22] has been used for premise selection [44, 47, 30, 2], resulting in strong hammer systems for selecting relevant facts for proving new conjectures over large formal libraries [1, 4, 12]. More recently, machine learning has also started to be used to guide the internal search of the ATP systems. In saturation-style provers this has been done by feedback loops for strategy invention [46, 18, 39] and by using supervised learning [19, 33] to select the next given clause [37]. In the simpler connection tableau systems such as leanCoP [36] used here, supervised learning has been used to choose the next tableau extension step [48, 23], using Monte-Carlo guided proof search [11] and reinforcement learning [26] with fast non-deep learners. Our main evaluation is done in this setting.

Deep neural networks for classification of mathematical formulae were first introduced in the DeepMath experiments [2] with 1D convolutional networks and LSTM networks. For higher-order logic, the HolStep [21] dataset was extracted from the interactive theorem prover HOL Light. 1D convolutional neural networks, LSTMs, and their combination were proposed as baselines for the dataset. On this dataset a graph-based neural network was for the first time applied to theorem proving in the FormulaNet [49] work. FormulaNet, like our work, also represents identical variables by single nodes in a graph, being therefore invariant under variable renaming. Unlike our network, FormulaNet glues variables only and not more complex terms. FormulaNet is not designed specifically for first-order logic, therefore it lacks invariance under negation and possibly reordering of clauses and literals. The greatest difference is however that our network abstracts over the symbol names while FormulaNet learns them individually.

A different invariance property was proposed in a network for propositional calculus by Selsam et al. [41]. This network is invariant under negation, order of clauses, and order of literals in clauses; however, this is restricted to propositional logic, where no quantifiers and variables are present. In the first-order setting, Kucik and Korovin [29] performed experiments with basic neural networks with one hidden layer on the DeepMath dataset. Neural networks reappeared in state-of-the-art saturation-based proving (E prover) in the work of Loos et al. [32]. The considered models included CNNs, LSTMs, dilated convolutions, and tree models. The first practical comparison of neural networks, XGBoost and Liblinear in guiding E prover was done by Chvalovský et al. [6].

An alternative to connecting an identifier with all the formulas about it is to perform definitional embeddings. This has for the first time been done in the context of theorem proving in DeepMath [2], however in a non-recursive way. A fully recursive, but non-deep name-independent encoding has been used and evaluated in HOLyHammer experiments [24]. Similarity between concepts has been discovered using alignments, see e.g. [13]. Embeddings of particular individual logical concepts have been considered as well, for example polynomials [3] or equations [28].

8 Conclusion

We presented a neural network for processing mathematical formulae invariant under symbol names, negation and ordering of clauses and their literals, and we demonstrated its learning capabilities in three automated reasoning tasks. In particular, the network improves over the previous version of rlCoP guided by XGBoost by 45.6% on the test set in the first iteration of learning-guided proving. It also outperforms earlier methods on the premise-selection data, and establishes a strong baseline for symbol guessing. One of its novel uses proposed here and allowed by this neural architecture is creating new conjectures by detecting and following alignments of various mathematical theories and concepts. This task turns out to be a straightforward application of the structural learning performed by the network.

Possible future work includes for example integration with state-of-the-art saturation-style provers. An interesting next step is also evaluation on a heterogeneous dataset such as TPTP where symbols are not used consistently, and learning on multiple libraries – e.g. jointly on HOL4 and HOL Light as done previously by [13] using a hand-crafted alignment system.

9 Acknowledgements

Olšák and Kaliszyk were supported by the ERC Project SMART Starting Grant no. 714034. Urban was supported by the AI4REASON ERC Consolidator grant number 649043, and by the Czech project AI&Reasoning CZ.02.1.01/0.0/0.0/15_003/0000466 and the European Regional Development Fund.

REFERENCES

[1] Jesse Alama, Tom Heskes, Daniel Kühlwein, Evgeni Tsivtsivadze, and Josef Urban, 'Premise selection for mathematics by corpus analysis and kernel methods', J. Autom. Reasoning, 52(2), 191–213, (2014).
[2] Alexander A. Alemi, François Chollet, Niklas Eén, Geoffrey Irving, Christian Szegedy, and Josef Urban, 'DeepMath - deep sequence models for premise selection', in Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, eds., Daniel D. Lee, Masashi Sugiyama, Ulrike V. Luxburg, Isabelle Guyon, and Roman Garnett, pp. 2235–2243, (2016).
[3] Miltiadis Allamanis, Pankajan Chanthirasegaran, Pushmeet Kohli, and Charles A. Sutton, 'Learning continuous semantic representations of symbolic expressions', in Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, eds., Doina Precup and Yee Whye Teh, volume 70 of Proceedings of Machine Learning Research, pp. 80–88. PMLR, (2017).
[4] Jasmin Christian Blanchette, David Greenaway, Cezary Kaliszyk, Daniel Kühlwein, and Josef Urban, 'A learning-based fact selector for Isabelle/HOL', J. Autom. Reasoning, 57(3), 219–244, (2016).
[5] Jasmin Christian Blanchette, Cezary Kaliszyk, Lawrence C. Paulson, and Josef Urban, 'Hammering towards QED', J. Formalized Reasoning, 9(1), 101–148, (2016).
[6] Karel Chvalovský, Jan Jakubův, Martin Suda, and Josef Urban, 'ENIGMA-NG: efficient neural and gradient-boosted inference guidance for E', in Automated Deduction - CADE 27 - 27th International Conference on Automated Deduction, Natal, Brazil, August 27-30, 2019, Proceedings, ed., Pascal Fontaine, volume 11716 of Lecture Notes in Computer Science, pp. 197–215. Springer, (2019).
[7] Martin Davis, Ansgar Fehnker, Annabelle McIver, and Andrei Voronkov, eds., Logic for Programming, Artificial Intelligence, and Reasoning - 20th International Conference, LPAR-20 2015, Suva, Fiji, November 24-28, 2015, Proceedings, volume 9450 of Lecture Notes in Computer Science. Springer, 2015.
[8] Jörg Denzinger, Matthias Fuchs, Christoph Goller, and Stephan Schulz, 'Learning from Previous Proof Experience', Technical Report AR99-4, Institut für Informatik, Technische Universität München, (1999).
[9] David Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P. Adams, 'Convolutional networks on graphs for learning molecular fingerprints', in Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, eds., Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, pp. 2224–2232, (2015).
[10] Wolfgang Ertel, Johann Schumann, and Christian B. Suttner, 'Learning heuristics for a theorem prover using back propagation', in 5. Österreichische Artificial Intelligence-Tagung, Igls, Tirol, 28. bis 30. September 1989, Proceedings, eds., Johannes Retti and Karl Leidlmair, volume 208 of Informatik-Fachberichte, pp. 87–95. Springer, (1989).
[11] Michael Färber, Cezary Kaliszyk, and Josef Urban, 'Monte Carlo tableau proof search', in Automated Deduction - CADE 26 - 26th International Conference on Automated Deduction, Gothenburg, Sweden, August 6-11, 2017, Proceedings, ed., Leonardo de Moura, volume 10395 of Lecture Notes in Computer Science, pp. 563–579. Springer, (2017).
[12] Thibault Gauthier and Cezary Kaliszyk, 'Premise selection and external provers for HOL4', in Certified Programs and Proofs (CPP'15), LNCS. Springer, (2015). http://dx.doi.org/10.1145/2676724.2693173
[13] Thibault Gauthier and Cezary Kaliszyk, 'Sharing HOL4 and HOL Light proof knowledge', in Davis et al. [7], pp. 372–386.
[14] Thibault Gauthier and Cezary Kaliszyk, 'Aligning concepts across proof assistant libraries', J. Symb. Comput., 90, 89–123, (2019).
[15] Thibault Gauthier, Cezary Kaliszyk, and Josef Urban, 'Initial experiments with statistical conjecturing over large formal corpora', in Joint Proceedings of the FM4M, MathUI, and ThEdu Workshops, Doctoral Program, and Work in Progress at the Conference on Intelligent Computer Mathematics 2016 co-located with the 9th Conference on Intelligent Computer Mathematics (CICM 2016), Bialystok, Poland, July 25-29, 2016, eds., Andrea Kohlhase, Paul Libbrecht, Bruce R. Miller, Adam Naumowicz, Walther Neuper, Pedro Quaresma, Frank Wm. Tompa, and Martin Suda, volume 1785 of CEUR Workshop Proceedings, pp. 219–228. CEUR-WS.org, (2016).
[16] Christoph Goller and Andreas Küchler, 'Learning task-dependent distributed representations by backpropagation through structure', in Proceedings of International Conference on Neural Networks (ICNN'96), Washington, DC, USA, June 3-6, 1996, pp. 347–352. IEEE, (1996).
[17] Georg Gottlob, Geoff Sutcliffe, and Andrei Voronkov, eds., Global Conference on Artificial Intelligence, GCAI 2015, Tbilisi, Georgia, October 16-19, 2015, volume 36 of EPiC Series in Computing. EasyChair, 2015.
[18] Jan Jakubův and Josef Urban, 'Hierarchical invention of theorem proving strategies', AI Commun., 31(3), 237–250, (2018).
[19] Jan Jakubův and Josef Urban, 'ENIGMA: efficient learning-based inference guiding machine', in Intelligent Computer Mathematics - 10th International Conference, CICM 2017, Edinburgh, UK, July 17-21, 2017, Proceedings, eds., Herman Geuvers, Matthew England, Osman Hasan, Florian Rabe, and Olaf Teschke, volume 10383 of Lecture Notes in Computer Science, pp. 292–302. Springer, (2017).
[20] Jan Jakubův and Josef Urban, 'Hammering Mizar by learning clause guidance', in 10th International Conference on Interactive Theorem Proving, ITP 2019, September 9-12, 2019, Portland, OR, USA, eds., John Harrison, John O'Leary, and Andrew Tolmach, volume 141 of LIPIcs, pp. 34:1–34:8. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, (2019).
[21] Cezary Kaliszyk, François Chollet, and Christian Szegedy, 'HolStep: A machine learning dataset for higher-order logic theorem proving', in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, (2017).
[22] Cezary Kaliszyk and Josef Urban, 'Learning-assisted automated reasoning with Flyspeck', J. Autom. Reasoning, 53(2), 173–213, (2014).
[23] Cezary Kaliszyk and Josef Urban, 'FEMaLeCoP: Fairly efficient machine learning connection prover', in Davis et al. [7], pp. 88–96.
[24] Cezary Kaliszyk and Josef Urban, 'HOL(y)Hammer: Online ATP service for HOL Light', Mathematics in Computer Science, 9(1), 5–22, (2015).
[25] Cezary Kaliszyk and Josef Urban, 'MizAR 40 for Mizar 40', J. Autom. Reasoning, 55(3), 245–256, (2015).
[26] Cezary Kaliszyk, Josef Urban, Henryk Michalewski, and Miroslav Olšák, 'Reinforcement learning of theorem proving', in Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pp. 8836–8847, (2018).
[27] Cezary Kaliszyk, Josef Urban, and Jiří Vyskočil, 'Efficient semantic features for automated reasoning over large theories', in IJCAI, pp. 3084–3090. AAAI Press, (2015).
[28] Kriste Krstovski and David M. Blei, 'Equation embeddings', CoRR, abs/1803.09123, (2018).
[29] Andrzej Stanislaw Kucik and Konstantin Korovin, 'Premise selection with neural networks and distributed representation of features', CoRR, abs/1807.10268, (2018).
[30] Daniel Kühlwein, Twan van Laarhoven, Evgeni Tsivtsivadze, Josef Urban, and Tom Heskes, 'Overview and evaluation of premise selection techniques for large theory mathematics', in IJCAR, eds., Bernhard Gramlich, Dale Miller, and Uli Sattler, volume 7364 of LNCS, pp. 378–392. Springer, (2012).
[31] Reinhold Letz, Klaus Mayr, and Christoph Goller, 'Controlled integration of the cut rule into connection tableau calculi', Journal of Automated Reasoning, 13, 297–337, (1994).
[32] Sarah Loos, Geoffrey Irving, Christian Szegedy, and Cezary Kaliszyk, 'Deep network guided proof search', in LPAR-21, 21st International Conference on Logic for Programming, Artificial Intelligence and Reasoning, eds., Thomas Eiter and David Sands, volume 46 of EPiC Series in Computing, pp. 85–105. EasyChair, (2017).
[33] Sarah M. Loos, Geoffrey Irving, Christian Szegedy, and Cezary Kaliszyk, 'Deep network guided proof search', in LPAR-21, 21st International Conference on Logic for Programming, Artificial Intelligence and Reasoning, Maun, Botswana, May 7-12, 2017, eds., Thomas Eiter and David Sands, volume 46 of EPiC Series in Computing, pp. 85–105. EasyChair, (2017).
[34] Jia Meng and Lawrence C. Paulson, 'Translating higher-order clauses to first-order clauses', J. Autom. Reasoning, 40(1), 35–60, (2008).
[35] Jens Otten, 'Restricting backtracking in connection calculi', AI Commun., 23(2-3), 159–182, (2010).
[36] Jens Otten and Wolfgang Bibel, 'leanCoP: lean connection-based theorem proving', J. Symb. Comput., 36(1-2), 139–161, (2003).
[37] Ross A. Overbeek, 'A new class of automated theorem-proving algorithms', J. ACM, 21(2), 191–200, (April 1974).
[38] Handbook of Automated Reasoning (in 2 volumes), eds., John Alan Robinson and Andrei Voronkov, Elsevier and MIT Press, 2001.
[39] Simon Schäfer and Stephan Schulz, 'Breeding theorem proving heuristics with genetic algorithms', in Gottlob et al. [17], pp. 263–274.
[40] Stephan Schulz, Learning search control knowledge for equational deduction, volume 230 of DISKI, Infix Akademische Verlagsgesellschaft, 2000.
[41] Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, and David L. Dill, 'Learning a SAT solver from single-bit supervision', in 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, (2019).
[42] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al., 'Mastering the game of go without human knowledge', Nature, 550(7676), 354, (2017).
[43] Geoff Sutcliffe, 'The TPTP world - infrastructure for automated reasoning', in LPAR (Dakar), eds., Edmund M. Clarke and Andrei Voronkov, volume 6355 of LNCS, pp. 1–12. Springer, (2010).
[44] Josef Urban, 'MPTP - Motivation, Implementation, First Experiments', J. Autom. Reasoning, 33(3-4), 319–339, (2004).
[45] Josef Urban, 'MPTP 0.2: Design, implementation, and initial experiments', J. Autom. Reasoning, 37(1-2), 21–43, (2006).
[46] Josef Urban, 'BliStr: The Blind Strategymaker', in Gottlob et al. [17], pp. 312–319.
[47] Josef Urban, Geoff Sutcliffe, Petr Pudlák, and Jiří Vyskočil, 'MaLARea SG1 - Machine Learner for Automated Reasoning with Semantic Guidance', in IJCAR, eds., Alessandro Armando, Peter Baumgartner, and Gilles Dowek, volume 5195 of LNCS, pp. 441–456. Springer, (2008).
[48] Josef Urban, Jiří Vyskočil, and Petr Štěpánek, 'MaLeCoP: Machine learning connection prover', in TABLEAUX, eds., Kai Brünnler and George Metcalfe, volume 6793 of LNCS, pp. 263–277. Springer, (2011).
[49] Mingzhe Wang, Yihe Tang, Jian Wang, and Jia Deng, 'Premise selection for theorem proving by deep graph embedding', in Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, eds., Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, pp. 2786–2796, (2017).