Neural-Symbolic Integration
Dissertation for the attainment of the academic degree Doktor rerum naturalium (Dr. rer. nat.), presented to the Technische Universität Dresden, Fakultät Informatik

Submitted by Dipl.-Inf. Sebastian Bader, born 22 May 1977 in Rostock

Dresden, October 2009

Reviewers:
Prof. Dr. rer. nat. habil. Steffen Hölldobler (Technische Universität Dresden)
Prof. Dr. rer. nat. habil. Barbara Hammer (Technische Universität Clausthal)

Defence: 5 October 2009

To my family.

Contents

1 Introduction and Motivation .... 1
   1.1 Motivation for the Study of Neural-Symbolic Integration .... 2
   1.2 Related Work .... 4
   1.3 A Classification Scheme for Neural-Symbolic Systems .... 12
   1.4 Challenge Problems .... 18
   1.5 Structure of this Thesis .... 22
2 Preliminaries .... 25
   2.1 General Notions and Notations .... 26
   2.2 Metric Spaces, Contractive Functions and Iterated Function Systems .... 30
   2.3 Logic Programs .... 34
   2.4 Binary Decision Diagrams .... 41
   2.5 Connectionist Systems .... 45
3 Embedding Propositional Rules into Connectionist Systems .... 53
   3.1 Embedding Semantic Operators into Threshold Networks .... 54
   3.2 Embedding Semantic Operators into Sigmoidal Networks .... 58
   3.3 Iterating the Computation of the Embedded Operators .... 65
   3.4 Summary .... 67
4 An Application to Part of Speech Tagging .... 69
   4.1 Part of Speech Tagging .... 70
   4.2 System Architecture .... 70
   4.3 Experimental Evaluation .... 74
   4.4 Summary .... 76
5 Connectionist Learning and Propositional Background Knowledge .... 77
   5.1 Integrating Background Knowledge into the Training Process .... 78
   5.2 A Simple Classification Task as Case Study .... 79
   5.3 Evaluation .... 81
   5.4 Summary .... 81
6 Extracting Propositional Rules from Connectionist Systems .... 85
   6.1 The Rule Extraction Problem for Feed-Forward Networks .... 86
   6.2 CoOp - A New Decompositional Approach .... 88
   6.3 Decomposition of Feed-Forward Networks .... 89
   6.4 Computing Minimal Coalitions and Oppositions .... 91
   6.5 Composition of Intermediate Results .... 104
   6.6 Incorporation of Integrity Constraints .... 112
   6.7 Extraction of Propositional Logic Programs .... 117
   6.8 Evaluation .... 119
   6.9 Summary .... 121
7 Embedding First-Order Rules into Connectionist Systems .... 123
   7.1 Feasibility of the Core Method .... 124
   7.2 Embedding Interpretations into the Real Numbers .... 126
   7.3 Embedding the Consequence Operator into the Real Numbers .... 133
   7.4 Approximating the Embedded Operator using Connectionist Systems .... 134
   7.5 Iterating the Approximation .... 153
   7.6 Vector-Based Learning on Embedded Interpretations .... 157
   7.7 Summary .... 159
8 Conclusions .... 161
   8.1 Summary .... 162
   8.2 Challenge Problems Revised .... 164
   8.3 Further Work .... 167
   8.4 Final Remarks .... 172

List of Figures

1.1 The Neural-Symbolic Cycle .... 4
1.2 The idea behind the Core Method .... 6
1.3 The three main dimensions of our classification scheme .... 12
1.4 The details of the interrelation dimension .... 13
1.5 The details of the language dimension .... 13
1.6 The details of the usage dimension .... 13
2.1 Operators defined to obtain more compact source code and some "syntactic sugar" yielding more readable source code .... 27
2.2 The definition of the "evaluate-to" operator .... 28
2.3 The definition of the "evaluate-to" operator, continued .... 29
2.4 Implementation to construct an empty BDD and to access the ingredients of a BDD .... 43
2.5 Implementation to construct a BDD from two BDDs .... 44
2.6 Base cases for the construction of a BDD from two BDDs using disjunction and conjunction .... 44
2.7 The construction of a BDD from a given variable and two BDDs using the if-then-else construct .... 44
2.8 Implementation to create nodes while constructing a reduced BDD .... 45
3.1 Activation functions tanh and sigm together with their threshold counterparts .... 54
3.2 Implementation to construct a behaviour-equivalent artificial neural network for a given propositional logic program .... 57
3.3 Implementation for the construction of a behaviour-equivalent network with units computing the hyperbolic tangent .... 64
4.1 General architecture of a neural-symbolic part-of-speech tagger .... 71
4.2 Network architecture for part-of-speech tagging .... 73
4.3 Comparison of MSE after training an initialised and a purely randomised network .... 75
4.4 The evolution of the error on training and validation set .... 75
5.1 The rule-insertion cycle for the integration of symbolic knowledge into the connectionist training process .... 78
5.2 A 3-layer fully connected feed-forward network to classify Tic-Tac-Toe boards .... 79
5.3 The embedding of a Tic-Tac-Toe rule .... 80
5.4 The development of the mean squared error over time for different embedding factors ω .... 82
6.1 Implementation to compute the set of perceptrons for a given feed-forward network .... 89
6.2 Implementation to construct a search tree to guide the extraction process .... 93
6.3 Implementation to construct the pruned search tree .... 95
6.4 Implementation to construct the reduced ordered binary decision diagram representing the set of minimal coalitions for a given perceptron P .... 101
6.5 Implementation to construct the pruned search trees for coalitions and oppositions .... 103
6.6 Implementation to construct the reduced ordered BDD for coalitions and the reduced ordered BDD for oppositions for a given perceptron P .... 105
6.7 Implementation to construct the positive form of a given perceptron .... 106
6.8 Implementation to construct the negative form of a given perceptron .... 107
6.9 The expansion of some node O into a BDD .... 113
6.10 The extraction of a reduced ordered BDD representing the coalitions of all output nodes wrt. the input nodes for a given network N .... 114
6.11 The full extraction into a single reduced ordered BDD including the incorporation of integrity constraints for all output nodes .... 117
6.12 Resulting BDD sizes of the extraction for different maxn-integrity constraints .... 121
7.1 Implementation of the embedding function ι for the 1-dimensional case .... 127
7.2 Implementation of the embedding function ι for multi-dimensional embeddings .... 127
7.3 Two sets of embedded interpretations .... 130
7.4 Implementation to compute a set of constant pieces which approximate a given T_P-operator up to level N .... 136
7.5 Implementation to compute a set of approximating step functions for a given T_P-operator up to level N .... 138
7.6 Three sigmoidal approximations of the step function Θ^1_{0,0} .... 139
7.7 Implementation to compute a set of approximating sigmoidal functions for a given T_P-operator up to level N .... 141
7.8 Implementation to construct an approximating sigmoidal network for a given T_P-operator up to level N .... 143
7.9 Implementation to compute the set of approximating raised cosine functions for a given T_P-operator up to level N .... 145
7.10 Implementation to construct an approximating raised cosine network for a given T_P-operator up to level N .... 146
7.11 Implementation to compute the set of approximating reference vectors for a given T_P-operator up to level N .... 149
7.12 Implementation to construct an approximating vector-based network for a given T_P-operator up to level N .... 152
7.13 Approximations based on the constant pieces are not necessarily contractive, even if the underlying f_P is contractive .... 155
7.14 The adaptation of the input weights for a given input i .... 158
7.15 Adding a new unit .... 159
7.16 Removing an inutile unit .... 160

Preface & Acknowledgement

My interest in the area of neural-symbolic integration started when I was an undergraduate student at TU Dresden. In all classes on artificial neural networks that I took, some questions remained open; among them were the following two, which are key problems tackled in this thesis: "What did the network really learn?" and "How can I help the network using background knowledge?" While being a student in the