<<

24th International Joint Conference on Artificial Intelligence (IJCAI-15)

Proceedings of the 10th International Workshop on Neural-Symbolic Learning and Reasoning NeSy’15

Tarek R. Besold, Luis C. Lamb, Thomas Icard, and Risto Miikkulainen (eds.)

Buenos Aires, Argentina, 27th of July 2015

Preface

NeSy’15 is the tenth installment of the long-running series of international workshops on Neural-Symbolic Learning and Reasoning (NeSy), which started in 2005 at IJCAI-05 in Edinburgh, Scotland, and also gave rise to two Dagstuhl seminars on the topic, held in 2008 and 2014.

Both the NeSy workshops and the Dagstuhl seminars offer researchers in the areas of artificial intelligence (AI), neural computation, and neighboring disciplines opportunities to get together and share experiences and practices concerning topics at the intersection between learning and knowledge representation and reasoning, approached through the integration of neural computation and symbolic AI.

Topics of interest for the 2015 edition of NeSy correspondingly include:
- The representation of symbolic knowledge by connectionist systems.
- Neural learning theory.
- Integration of logic and probabilities, e.g., in neural networks, but also more generally.
- Structured learning and relational learning in neural networks.
- Logical reasoning carried out by neural networks.
- Integrated neural-symbolic learning approaches.
- Extraction of symbolic knowledge from trained neural networks.
- Integrated neural-symbolic reasoning.
- Neural-symbolic cognitive models.
- Biologically-inspired neural-symbolic integration.
- Applications in robotics, simulation, fraud prevention, natural language processing, semantic web, software engineering, fault diagnosis, bioinformatics, visual intelligence, etc.

NeSy’15 offers a diverse mixture of topics and formats, with two keynote lectures by Dan Roth (University of Illinois at Urbana-Champaign) and Gary F. Marcus (New York University), two invited papers from the IJCAI-15 main conference, and four papers contributed to the workshop itself, ranging from technical to fairly philosophical considerations concerning different aspects relevant to neural-symbolic integration.

We, as workshop organizers, want to thank the following members of the NeSy’15 program committee for their time and efforts in reviewing the submissions to the workshop and providing valuable feedback to accepted and rejected papers alike:
- Artur D'Avila Garcez, City University London, UK
- Ross Gayler, Melbourne, Australia
- Ramanathan V. Guha, Google Inc., U.S.A.
- Pascal Hitzler, Wright State University, U.S.A.
- Steffen Hoelldobler, Technical University of Dresden, Germany
- Frank Jaekel, University of Osnabrueck, Germany
- Kai-Uwe Kuehnberger, University of Osnabrueck, Germany
- Christopher Potts, Stanford University, U.S.A.
- Ron Sun, Rensselaer Polytechnic Institute, U.S.A.
- Jakub Szymanik, University of Amsterdam, The Netherlands
- Gerson Zaverucha, Federal University of Rio de Janeiro, Brazil

These workshop proceedings are available online from the workshop webpage at http://www.neural-symbolic.org/NeSy15/.

Osnabrück, 8th of July 2015

Tarek R. Besold, Luis C. Lamb, Thomas Icard, and Risto Miikkulainen¹

¹ Tarek R. Besold is a postdoctoral researcher in the AI Group at the Institute of Cognitive Science of the University of Osnabrück, Germany; Luis C. Lamb is Professor of Computer Science at UFRGS, Porto Alegre, Brazil; Thomas Icard is Assistant Professor of Philosophy and Symbolic Systems at Stanford University, USA; Risto Miikkulainen is Professor of Computer Science and Neuroscience at the University of Texas at Austin, USA.

Contents: Contributed Papers

Same same, but different? A research program exploring differences in complexity between logic and neural networks (Tarek R. Besold)

Probabilistic Inference Modulo Theories (Rodrigo de Salvo Braz, Ciaran O’Reilly, Vibhav Gogate, and Rina Dechter)

A Connectionist Network for Skeptical Abduction (Emmanuelle-Anna Dietz, Steffen Hölldobler, and Luis Palacios)

Towards an Artificially Intelligent System: Possibilities of General Evaluation of Hybrid Paradigm (Ondřej Vadinský)

Same same, but different? A research program exploring differences in complexity between logic and neural networks

Tarek R. Besold
Institute of Cognitive Science, University of Osnabrück
Osnabrück, Germany
[email protected]

Abstract reasons also beyond the analogy to the described functioning principles of the brain: After an overview of the status quo in neural- symbolic integration, a research program targeting In general, network-based approaches possess a higher • foundational differences and relationships on the degree of biological motivation than symbol-based ap- level of computational complexity between sym- proaches, also outmatching the latter in terms of learning bolic and sub-symbolic computation and represen- capacities, robust fault-tolerant processing, and general- tation is outlined and proposed to the community. ization to similar input. Also, in AI applications they often enable flexible tools (e.g., for discovering and pro- cessing the internal structure of possibly large data sets) 1 Integrating symbolic and sub-symbolic and efficient signal-processing models (which are bio- computation and representation logically plausible and optimally suited for a wide range A seamless coupling between learning and reasoning is com- of applications). monly taken as basis for intelligence in humans and, in close Symbolic representations are generally superior in terms analogy, also for the biologically-inspired (re-)creation of • of their interpretability, the possibilities of direct control human-level intelligence with computational means (see, e.g., and coding, and the extraction of knowledge when com- [Valiant, 2013], p. 163). Still, one of the unsolved method- pared to their (in many ways still black box-like) con- ological core issues in human-level AI, cognitive systems nectionist counterparts. modelling, and cognitive and computational neuroscience is From a cognitive modelling point of view, sub-symbolic the question of the integration between connectionist sub- • symbolic (i.e., “neural-level”) and logic-based symbolic (i.e., representations for tasks requiring symbolic high-level “cognitive-level”) approaches to representation, computa- reasoning might help solving, among many others, the tion, (mostly sub-symbolic) learning, and (mostly symbolic) problem with “too large” logical (epistemic) models higher-level reasoning. (see, e.g., [Gierasimczuk and Szymanik, 2011]) which AI researchers working on the modelling or (re-)creation seem to lead to implausible computations from the rea- of human cognition and intelligence, and cognitive neurosci- soning agent’s perspective as documented, among oth- entists trying to understand the neural basis for human cog- ers, by [Degremont et al., 2014]. nition, have for years been interested in the nature of brain- Concerning our current understanding of the relationship computation in general (see, e.g., [Adolphs, 2015]) and the and differences between symbolic and sub-symbolic com- relation between sub-symbolic/neural and symbolic/cognitive putation and representation, the cognitive-level “symbolic modes of representation and computation in particular (see, paradigm” is commonly taken to correspond to a Von Neu- e.g., [Dinsmore, 1992]). 
The brain has a neural structure mann architecture (with predominantly discrete and serial which operates on the basis of low-level processing of percep- computation and localized representations) and the neural- tual signals, but cognition also exhibits the capability to effi- level “sub-symbolic paradigm” mainly is conceptualized as ciently perform abstract reasoning and symbol processing; in a dynamical systems-type approach (with distributed repre- fact, processes of the latter type seem to form the conceptual sentations and predominantly parallel and continuous com- cornerstones for thinking, decision-making, and other (also putations). directly behavior-relevant) mental activities (see, e.g., [Fodor This divergence notwithstanding, both symbolic/cognitive and Pylyshyn, 1988]). and sub-symbolic/neural models are considered from a com- Building on these observations – and taking into account putability perspective equivalent in practice [Siegelmann, that hybrid systems loosely combining symbolic and sub- 1999], and from a tractability perspective in practice [van symbolic modules into one architecture turned out to be in- Rooij, 2008] as well as likely in theory [van Emde Boas, sufficient – agreement on the need for fully integrated neural- 1990]. Also, [Leitgeb, 2005] showed that in principle there cognitive processing has emerged (see, e.g., [Bader and Hit- is no substantial difference in representational or problem- zler, 2005; d’Avila Garcez et al., 2015]). This has several solving power between both paradigms (see Sect. 2 for fur- ther discussion of the cited results). straction above and beyond mere low-level process- Still, in general experiences from application studies con- ing. The resulting networks partially perform tasks sistently and reliably show different degrees of suitability and classically involving complex symbolic reasoning such performance of the paradigms in different types of applica- as, for instance, the labeling of picture elements or tion scenarios, with sub-symbolic/neural approaches offering scene description (see, e.g., [Karpathy and Li, 2014; themselves, e.g., for effective and efficient solutions to tasks Vinyals et al., 2014]). involving learning and generalization, while high-level rea- Recently proposed classes of sub-symbolic models such soning and concept composition are commonly addressed in • as “Neural Turing Machines” [Graves et al., 2014] or symbolic/cognitive frameworks. Unfortunately, general ex- “Memory Networks” [Weston et al., 2015] seem to planations (and solutions) for this foundational dichotomy also architecturally narrow the gap between the (sub- this far have been elusive when using standard methods of symbolic) dynamical systems characterization and the investigation. (symbolic) Von Neumann architecture understanding. In summary, symbolic/cognitive interpretations of artificial neural network (ANN) architectures and accurate and feasi- Nonetheless, all these developments (including deep neu- ble sub-symbolic/neural models of symbolic/cognitive pro- ral networks as layered recurrent ANNs) stay within the pos- cessing seem highly desirable: as important step towards the sibilities and limitations of the respective classical paradigms computational (re-)creation of mental capacities, as possi- without significantly changing the basic formal characteris- ble sources of an additional (bridging) level of explanation tics of the latter. 
of cognitive phenomena of the (assuming that Formal analysis of symbolic and sub-symbolic suitably chosen ANN models correspond in a meaningful computation and representation way to their biological counterparts), and also as important According to our current knowledge, from a formal per- part of future technological developments (also see Sect. 6). spective – especially when focusing on actually physically- But while there is theoretical evidence indicating that both realizable and implementable systems (i.e., physical fi- paradigms indeed share deep connections, how to explicitly nite state machines) instead of strictly abstract models establish and exploit these correspondences currently remains of computation, together with the resulting physical and a mostly unsolved question. conceptual limitations – both symbolic/cognitive and sub- symbolic/neural models seem basically equivalent. 2 Status quo in neural-symbolic integration As already mentioned in Sect. 1, notwithstanding par- tially differing theoretical findings and discussions (as, e.g., Neural-symbolic integration in AI, cognitive modelling, given by [Tabor, 2009]), both paradigms are considered and machine learning computability-equivalent in practice [Siegelmann, 1999]. Research on integrated neural-symbolic systems (especially Also from a tractability perspective, for instance in [van in AI and to a certain extent also in cognitive modelling) has Rooij, 2008], equivalence in practice with respect to clas- made significant progress over the last two decades (see, e.g., sical dimensions of analysis (i.e., interchangeability except [Bader and Hitzler, 2005; d’Avila Garcez et al., 2015]); par- for a polynomial overhead) has been established, comple- tially, but not exclusively, in the wake of the development menting and supporting the theoretical suggestion of equiv- of approaches to machine learning (see, e.g., alence by [van Emde Boas, 1990] in his Invariance Thesis. [Schmidhuber, 2015]. Generally, what seem to be several im- Finally, [Leitgeb, 2005] provided an in principle existence portant steps towards the development of integrated neural- result, showing that there is no substantial difference in rep- symbolic models have been made: resentational or problem-solving power between dynamical From the symbolic perspective on the capacities of sub- systems with distributed representations or symbolic systems • symbolic computation and representation, the “Proposi- with non-monotonic reasoning capabilities. tional Fixation” (i.e., the limitation of neural models on Still, these results are only partially satisfactory: Although implementing propositional logic at best) has been over- introducing basic connections and mutual dependencies be- come, among others, in models implementing modal or tween both paradigms, the respective levels of analysis are temporal logics with ANNs (see, e.g., [d’Avila Garcez quite coarse and the found results are only existential in char- et al., 2008]). acter. While establishing the in principle equivalence de- scribed above, [Leitgeb, 2005] does not provide constructive From the sub-symbolic perspective, neural computation • methods for how to actually obtain the corresponding sym- has been equipped with features previously (almost) ex- bolic counterpart to a sub-symbolic model and vice versa. 
clusively limited to symbolic models by adding top- Concerning the complexity and computability equiva- down governing mechanisms to modular, neural learn- lences, while the latter is supported by the results of [Leit- ing architectures, for example, through the use of “Con- geb, 2005], the former stays mostly untouched: While com- [ ] ceptors” Jaeger, 2014 as computational principle. ing to the same conclusion, i.e., the absence of substantial Deep learning approaches to machine learning – by differences between paradigms (i.e., differences at the level • the high number of parameterized transformations per- of tractability classes), no further clarification or characteri- formed in the corresponding hierarchically structured zation of the precise nature and properties of the polynomial models – seem to, at first sight, also conceptually pro- overhead between symbolic and sub-symbolic approaches is vide what can be interpreted as different levels of ab- provided. Summary symbolic/neural forms of computation and the associ- Although remarkable successes have been achieved within ated dynamical systems conception, the adequacy and the respective paradigms, the divide between the paradigms exhaustiveness of the classical approaches to complex- persists, interconnecting results still either only address spe- ity analysis using only TIME and SPACE as resources cific and non-generalizable cases or are in principle and for a fully informative characterization must be ques- non-constructive, benchmark scenarios for principled com- tioned. Are there more adequate resources which should parisons (e.g., in terms of knowledge representation power be taken into account for analysis? or descriptive complexity) between sub-symbolic and sym- Question 2: Especially when considering the symbolic bolic models have still not been established, and questions • level, are there more adequate approaches/methods of concerning the precise nature of the relationship and foun- analysis available than classical complexity theory, al- dational differences between symbolic/cognitive and sub- lowing to take into account formalism- or calculus- symbolic/neural approaches to computation and representa- specific characterizations of computations or to perform tion still remain unanswered (see, e.g., [Isaac et al., 2014]): in analyses at a more fine-grained level than tractability? some cases due to a lack of knowledge for deciding the prob- Finally, in an integrative concluding step taking into ac- lem, in others due to a lack of tools and methods for properly count the methods and findings resulting from the previous specifying and addressing the relevant questions. two, a third question shall be investigated: Question 3: Can the in principle results from [Leit- 3 Proposed research program • geb, 2005] be extended to more specific and/or construc- Focusing on the just described lack of tools and methods, tive correspondences between individual notions and/or together with the insufficient theoretical knowledge about characterizations within the respective paradigms? many aspects of the respective form(s) of computation and Answers to these questions (and the resulting refined tools representation, in the envisioned research program, the clas- and methods) promise to contribute to resolving some of the sical findings concerning the relation and integration be- basic theoretical and practical tensions described in Sect. 1 tween the symbolic/cognitive and the sub-symbolic/neural and Sect. 
2: Although both paradigms are theoretically undis- paradigm described in Sect. 1 shall be revisited in light of tinguishable (i.e., equivalent up to a polynomial overhead) in new developments in the modelling and analysis of connec- their general computational-complexity behavior using clas- tionist systems in general (and ANNs in particular), and of sical methods of analysis and characterization results, empir- new formal methods for investigating the properties of gen- ical studies and application cases using state of the art ap- eral forms of representation and computation on a symbolic proaches still show clear distinctions in suitability and feasi- level. bility of the respective paradigms for different types of tasks To this end, taking into account the apparent empiri- and domains without us having an explanation for this behav- cal differences between the paradigms and (especially when ior. Parts of this divergence might be explained by previously dealing with physically-realizable systems) assuming basic unconsidered and unaccessible complexity-related properties equivalence on the level of computability, emphasis shall be of the respective approaches and their connections to each put on identifying and/or developing adequate formal tools other. and investigating previously unconsidered aspects of exist- ing equivalence results. Focus shall be put on the pre- 4 Proposed approach/methodology cise nature of the polynomial overhead as computational- In order to address the three leading questions stated in complexity difference between paradigms: Most complex- Sect. 3, the program relies on continuous exchange and ity results for symbolic/cognitive and sub-symbolic/neural close interaction with researchers from cognitive science computations have been established using exclusively TIME and (computational and cognitive) neuroscience on the one [ and SPACE as classical resources (see, e.g., Thomas and hand, and with logicians and theoretical computer scientists ] Vollmer, 2010; S´ıma and Orponen, 2003 ), and the tractabil- on the other hand. The envisioned level of work is situated [ ity equivalence between paradigms (see, e.g., van Rooij, between the (purely theoretical) development of methods in ] 2008 ) mostly leaves out more precise investigations of the complexity theory, network analysis, etc. and the (purely remaining polynomial overhead. Against this background, applied) study of properties of computational and repre- the working hypotheses for the program are that TIME and sentational paradigms by applying existing tools: Previous SPACE are not always adequate and sufficient as resources work from the different fields and lines of research shall of analysis for elucidating all relevant properties of the re- be assessed and combined – in doing so, where necessary, spective paradigms, and that there are significant characteris- adapting or expanding the respective methods and tools – tics and explanations to be found on a more fine-grained level into new means of analysis, which then shall subsequently than accessible by classical methods of analysis (settling on be applied to suitably selected candidate models representing the general tractability level). paradigmatic examples of symbolic or sub-symbolic repre- The main line of research can be summarized in two con- sentations/computations with respect to features relevant for secutive questions (corresponding to the stated working hy- the respective question(s) at hand. 
potheses), one starting out from a more sub-symbolic, the other from a more symbolic perspective: The proposed research program is divided into three stages, Question 1: Especially when considering sub- corresponding to the three questions from Sect. 3: • Adequate resources for analysis of the program emphasis shall be shifted towards the sym- TIME and SPACE are the standard resources considered bolic/cognitive side. While staying closer to the classical con- in classical complexity analyses of computational frame- ception of complexity in terms of TIME and SPACE, recent works. Correspondingly, most results concerning complex- developments in different fields of theoretical computer sci- ity comparisons between symbolic and sub-symbolic mod- ence shall be combined into tools for more model-specific and els of computation also focus on these two dimensions (as fine-grained analyses of computational properties. do, e.g., the aforementioned results by [van Rooij, 2008; Parameterized-complexity theory (see, e.g., [Downey and van Emde Boas, 1990]). Fellows, 1999]) makes the investigation of problem-specific Still, the reading of TIME and SPACE as mostly relevant complexity characteristics possible, while tools such as, resources for complexity analysis is closely connected to a e.g., developed in the theory of proof-complexity (see, e.g., Turing-style conception of computation and a Von Neumann- [Kraj´ıcek, 2005]) allow for for more varied formalism- or inspired architecture as machine model, working, e.g., with calculus-specific characterizations of the respective computa- limited memory. Especially when considering other compu- tions than currently done. Additionally, tools from descrip- tational paradigms with different characteristics, as, e.g., the tive complexity theory (see, e.g., [Immerman, 1999]) and dynamical systems model commonly associated to the sub- work from model-theoretic (see, e.g., [Rabin, 1965]) symbolic/neural paradigm, the exhaustiveness and adequate- seem likely to offer chances for shedding light on complex- ness of TIME and SPACE for a full analysis of all relevant ity distinctions below the tractability threshold (i.e., for ex- computational properties has to be questioned. Instead, it ploring the precise nature of the polynomial overhead) and to seems likely that additional resources specific to the respec- allow for more fine-grained and discriminative comparisons tive model of computation and architecture have to be taken between paradigms and models. into account in order to provide a complete characterization. Thus, results from the just mentioned fields/techniques Thus, in a first stage of the program, popular network types could be examined for their applicability to better charac- on the sub-symbolic/neural side shall be investigated for rele- terizing symbolic computation and to potentially establishing vant dimensions of analysis other than TIME and SPACE. Be- conceptual connections to characterizations of sub-symbolic sides the classical standard and recurrent approaches, models computation from the previous stage: from the the following (non-exhaustive) list could be consid- Parameterized-complexity theory: Taking into account ered: recurrent spiking neural networks (see, e.g. 
[Gerstner • problem- and application-specific properties of (families et al., 2014]), Long Short-Term Memory networks and ex- of) problems and connecting these to results describing tensions thereof (see, e.g., [Monner and Reggia, 2012]), or specific properties of sub-symbolic or symbolic compu- recurrent stochastic neural networks in form of Boltzmann tation and representation, trying to explain the different machines [Ackley et al., 1985] and restricted Boltzmann ma- suitability of one or the other paradigm for certain types chines [Hinton, 2002]. of tasks. Taking recurrent networks of spiking as examples, Descriptive complexity theory and model-theoretic syn- concerning candidates for relevant dimensions also measures • tax: Attempting to explore complexity distinctions be- such as spike complexity (a bound for the total number of tween different forms of symbolic and between sym- spikes during computation; [Uchizawa et al., 2006]), conver- bolic and sub-symbolic computation also in more fine- gence speed (from some initial network state to the stationary grained ways than by mere tractability considerations distribution; [Habenschuss et al., 2013]), sample complex- (e.g., also taking into account the polynomial-time hi- ity (the number of samples from the stationary distribution erarchy and the logarithmic-time hierarchy). needed for a satisfactory computational output; [Vul et al., 2014]), or network size and connectivity seem relevant. Proof-complexity theory: Taking into account • These and similar proposals for the other network mod- formalism- and calculus-specific properties of symbolic els shall be critically assessed (both theoretically and in com- computations, and trying to map these onto properties putational experiments, testing hypotheses and validating the of specific sub-symbolic models. relevance of theoretical results) and, where possible, put into At the end of this stage, proposals for refined methods a correspondence relation with each other, allowing to mean- of analysis for forms of symbolic/cognitive computation and ingfully generalize between different sub-symbolic/neural application examples in terms of proof of concept analyses, models and provide general characterizations of the respec- together with suggestions for correspondences to models of tive computations. sub-symbolic/neural computation, will be available. At the end of this stage, new proposals for adequate resources usable in refined complexity analyses for sub- Correspondences between paradigms symbolic/neural computation, together with application ex- In a third and final stage of the program, by combining the amples in terms of proof of concept analyses of popular results of the previous stages, additional dimensions will be paradigms, will be available. added to previous analyses and established equivalence re- sults, and the precise nature of the polynomial overhead as Adequate methods of analysis computational difference between paradigms will be better In parallel to and/or following the search for more ad- explained. Also, the outcomes of previous stages shall be in- equate resources for complexity analyses of mostly sub- tegrated where meaningfully possible, ideally providing the symbolic/neural models of computation, in a second stage foundations for a general set of refined means of analysis for future comparative investigations of symbolic/cognitive and Principled correspondences between specific no- sub-symbolic/neural computation. 
tions/characterizations of paradigms: New perspec- Depending on previous outcomes, some of the following tives on the relation between symbolic/cognitive and (interrelated) questions probably can be addressed: sub-symbolic/neural representation and computation will be explored and a better understanding of the respective Given the in principle equivalence between (sym- approach(es) and their interaction (with a strong orientation • bolic/cognitive) non-monotonic logical systems and towards a future integration of conceptual paradigms, of (sub-symbolic/neural) dynamical systems, is it possible levels of explanation, and of involved scientific disciplines) to establish complexity-based systematic conceptual re- shall be established. Emphasis will be put on understanding lationships between particular logical calculi and differ- the interaction between model-specific changes in one ent types of sub-symbolic networks? paradigm and corresponding adaptations of the respective Can adaptations in network structure and/or (the ar- conceptual or formal counterpart within the other paradigm. • tificial equivalent of) synaptic dynamics in a neural representation in a systematic way be related to re- 6 Potential impact of the proposed program representation in a logic-based representation, or (alter- Integrating symbolic/cognitive and sub-symbolic/neural natively) is there a systematic correspondence on the paradigms of computation and representation not only helps level of change of calculus? to solve foundational questions within and to strengthen Can changes in network type in a neural representa- the interface between AI/computer science and cognitive • tion in a systematic way be related to changes of non- and computational neuroscience, but will also have lasting monotonic logic in a symbolic representation? impact in present and future technological applications and significant possibilities of industrial valorization: Can the correspondences and differences between novel Following the advent of the internet/WWW, ubiquitous • network models approximating classical symbolic ca- sensors and ambient intelligence systems performing high- pacities (as, e.g., top-down control) or architectures (as, level and complex reasoning based on low-level data and sig- e.g., a Von Neumann machines) and the original sym- nals will be key to the future development of advanced intelli- bolic concepts be characterized in a systematic way? gent applications and smart environments. Whilst “Big Data” At the end of this stage, partial answers to some of the and statistical reasoning can provide for current applications, stated questions, together with proposals for future lines of many real-world scenarios in the near future will require re- investigation continuing the work started in the program, will liable reasoning also based on smaller samples of data, either be available. Also, suggestions for new tools and methods due to the need for immediate (re)action without the time de- for the comparative analysis of symbolic/cognitive and sub- lay or effort required for obtaining additional usable data, or symbolic/neural computation will be available. due to the need of dealing with rare events offering too few similar data entries as to allow for the application of standard 5 Expected results/outcomes statistic-driven approaches. 
The corresponding systems will, thus, have to make use of complex abstract reasoning mecha- More adequate and refined tools and methods for relat- nisms (which then will have to be used to inform subsequent ing and comparing paradigms: New methodological ap- low-level sensing and processing steps in an action-oriented proaches and updated and refined formal tools for better and continuous sensing–processing–reasoning cycle). more adequately analyzing and characterizing the nature and This still poses enormous challenges in terms of techno- mechanisms of representation and computation in the corre- logical realizability due to the remaining significant divide sponding paradigm(s) will be developed. between symbolic and sub-symbolic paradigms of computa- Alternative resources complementing TIME and SPACE tion and representation. Here, a better understanding of the for the characterization of properties of (especially sub- relationship between the paradigms and their precise differ- symbolic/neural) computation will be provided. Emphasis ences in computational and representational power will open will be put on making model-specific properties of the re- up the way to a better integration between both. spective mechanisms accessible. Also, alternative methods complementing the classical References complexity-theoretical approach to the characterization of [Ackley et al., 1985] D. H. Ackley, G. E. Hinton, and T. J. properties of (especially symbolic/cognitive) computation Sejnowski. Learning and Relearning in Boltzmann Ma- will be provided. Emphasis will be put on making formalism- chines. Cognitive Science, 9(1):147–169, 1985. or calculus-specific properties of the respective computing mechanisms accessible, and on offering more fine-grained in- [Adolphs, 2015] R. Adolphs. The unsolved problems of neu- sights than available in the classical framework. roscience. Trends in Cognitive Science, 19(4):173–175, Whilst by itself being useful in more theoretical work, 2015. the results shall be maximally informative and usable in the [Bader and Hitzler, 2005] S. Bader and P. Hitzler. Dimen- context of neural-cognitive integration in AI, cognitive and sions of Neural-symbolic Integration - A Structured Sur- computational neuroscience, and (computational) cognitive vey. In We Will Show Them! Essays in Honour of Dov science. Gabbay, Volume One, pages 167–194. College Publica- tions, 2005. [d’Avila Garcez et al., 2008] A. S. d’Avila Garcez, L. C. [Kraj´ıcek, 2005] J. Kraj´ıcek. Proof Complexity. In 4ECM Lamb, and D. M. Gabbay. Neural-Symbolic Cognitive Stockholm 2004. European Mathematical Society, 2005. Reasoning. Cognitive Technologies. Springer, 2008. [Leitgeb, 2005] H. Leitgeb. Interpreted dynamical systems [d’Avila Garcez et al., 2015] A. S. d’Avila Garcez, T. R. and qualitative laws: From neural networks to evolution- Besold, L. de Raedt, P. Foldiak,¨ P. Hitzler, T. Icard, K.- ary systems. Synthese, 146:189–202, 2005. U. Kuhnberger,¨ L. Lamb, R. Miikkulainen, and D. Sil- [Monner and Reggia, 2012] D. Monner and J. Reggia. A ver. Neural-Symbolic Learning and Reasoning: Contri- generalized LSTM-like training algorithm for second- butions and Challenges. In AAAI Spring 2015 Symposium order recurrent neural networks. Neural Networks, 25:70– on Knowledge Representation and Reasoning: Integrat- 83, 2012. ing Symbolic and Neural Approaches, volume SS-15-03 of AAAI Technical Reports. AAAI Press, 2015. [Rabin, 1965] M. Rabin. A simple method for undecidabil- ity proofs and some applications. In Y. 
Bar-Hillel, editor, [Degremont et al., 2014] C. Degremont, L. Kurzen, and Logic Methodology and Philosophy of Science II, pages J. Szymanik. Exploring the tractability border in epistemic 58–68. North-Holland, 1965. tasks. Synthese, 191(3):371–408, 2014. [ ] [Dinsmore, 1992] J. Dinsmore, editor. The Symbolic and Schmidhuber, 2015 J. Schmidhuber. Deep learning in neu- Neural Networks Connectionist Paradigms: Closing the Gap. Cognitive ral networks: An overview. , 61:85–117, Science Series. Press, 1992. 2015. [ ] [Downey and Fellows, 1999] R. G. Downey and M. R. Fel- Siegelmann, 1999 H. Siegelmann. Neural Networks lows. Fundamentals of Parameterized Complexity. Texts and Analog Computation: Beyond the Turing Limit. in Computer Science. Springer, 1999. Birkhauser,¨ 1999. [Fodor and Pylyshyn, 1988] J. A. Fodor and Z. W. Pylyshyn. [S´ıma and Orponen, 2003] J. S´ıma and P. Orponen. General- Connectionism and : A critical anal- Purpose Computation with Neural Networks: A Survey ysis. Cognition, 28(1–2):3 – 71, 1988. of Complexity Theoretic Results. Neural Computation, 15:2727–2778, 2003. [Gerstner et al., 2014] W. Gerstner, W. Kistler, R. Naud, and L. Paninski. Neuronal Dynamics: From Single Neurons to [Tabor, 2009] W. Tabor. A dynamical systems perspective on to Networks and Models of Cognition. Cambridge Univer- the relationship between symbolic and non-symbolic com- sity Press, 2014. putation. Cognitive Neurodynamics, 3(4):415–427, 2009. [Gierasimczuk and Szymanik, 2011] N. Gierasimczuk and [Thomas and Vollmer, 2010] M. Thomas and H. Vollmer. J. Szymanik. A Note on a Generalization of the Muddy Complexity of Non-Monotonic Logics. Bulletin of the Children Puzzle. In Proc. of the 13th Conference on Theo- EATCS, 102:53–82, October 2010. retical Aspects of Rationality and Knowledge, TARK XIII, [Uchizawa et al., 2006] K. Uchizawa, R. Douglas, and pages 257–264. ACM, 2011. W. Maass. On the computational power of thresh- [Graves et al., 2014] A. Graves, G. Wayne, and I. Danihelka. old circuits with sparse activity. Neural Computation, Neural Turing Machines. arXiv. 2014. 1410.5401v1 18(12):2994–3008, 2006. [cs.NE], 20 Oct 2014. [Valiant, 2013] L. Valiant. Probably Approximately Correct: [Habenschuss et al., 2013] S. Habenschuss, Z. Jonke, and Nature’s Algorithms for Learning and Prospering in a W. Maass. Stochastic computations in cortical microcir- Complex World. Basic Books, 2013. cuit models. PLOS Computational Biology, 9(11), 2013. [van Emde Boas, 1990] P. van Emde Boas. Machine Models [Hinton, 2002] G. E. Hinton. Training Products of Experts and Simulations. In Handbook of Theoretical Computer by Minimizing Contrastive Divergence. Neural Computa- Science A. Elsevier, 1990. tion, 14(8):1771–1800, 2002. [van Rooij, 2008] I. van Rooij. The Tractable Cognition [Immerman, 1999] N. Immerman, editor. Descriptive Com- Thesis. Cognitive Science, 32:939–984, 2008. plexity. Texts in Computer Science. Springer, 1999. [Vinyals et al., 2014] O. Vinyals, A. Toshev, S. Bengio, and [Isaac et al., 2014] A. Isaac, J. Szymanik, and R. Verbrugge. D. Erhan. Show and Tell: A Neural Image Caption Gener- Logic and Complexity in Cognitive Science. In Johan van ator. arXiv. 2014. 1411.4555v1 [cs.CV], 17 Nov 2014. Benthem on Logic and Information Dynamics, volume 5 [Vul et al., 2014] E. Vul, N. Goodman, , T. Griffiths, and of Outstanding Contributions to Logic, pages 787–824. J. Tenenbaum. One and done? Optimal decisions from Springer, 2014. very few samples. Cognitive Science, 38(4):599–637, [Jaeger, 2014] H. Jaeger. 
Controlling recurrent neural net- 2014. works by conceptors. arXiv. 2014. 1403.3369v1 [cs.CV], [Weston et al., 2015] J. Weston, S. Chopra, and A. Bordes. 13 Mar 2014. Memory Networks. arXiv. 2015. 1410.3916v6 [cs.AI], 7 [Karpathy and Li, 2014] A. Karpathy and F.-F. Li. Deep Feb 2015. Visual-Semantic Alignments for Generating Image De- scriptions. arXiv. 2014. 1412.2306v1 [cs.CV], 7 Dec 2014. Probabilistic Inference Modulo Theories

Rodrigo de Salvo Braz (SRI International), Ciaran O’Reilly (SRI International), Vibhav Gogate (U. of Texas at Dallas), Rina Dechter (U. of California, Irvine)

Abstract The Statistical Relational Learning [Getoor and Taskar, 2007] literature offered more expressive languages but relied We present SGDPLL(T ), an algorithm that solves on conversion to conventional representations to perform in- (among many other problems) probabilistic infer- ference, which can be very inefficient. To counter this, lifted ence modulo theories, that is, inference problems inference [Poole, 2003; de Salvo Braz, 2007] offered solu- over probabilistic models defined via a logic the- tions for efficiently processing logically specified models, but ory provided as a parameter (currently, equalities with languages of limited expressivity (such as function-free and inequalities on discrete sorts). While many so- ones) and algorithms that are hard to extend. Probabilistic lutions to probabilistic inference over logic repre- programming [Goodman et al., 2012] has offered inference sentations have been proposed, SGDPLL(T ) is the on full programming languages, but relies on approximate only one that is simultaneously (1) lifted, (2) exact methods on the propositional level. and (3) modulo theories, that is, parameterized by a background logic theory. This offers a founda- We present SGDPLL(T ), an algorithm that solves (among tion for extending it to rich logic languages such many other problems) probabilistic inference on models de- as data structures and relational data. By lifted, fined over logic representations. Importantly, the algorithm we mean that our proposed algorithm can leverage is agnostic with respect to which particular logic theory is first-order representations to solve some inference used, which is provided to it as a parameter. In this paper, problems in constant or polynomial time in the do- we use the theory consisting of equalities over finite discrete main size (the number of values that variables can types, and inequalities over bounded integers, as an example. take), as opposed to exponential time offered by However, SGDPLL(T ) offers a foundation for extending it to propositional algorithms. richer theories involving relations, arithmetic, lists and trees. While many algorithms for probabilistic inference over logic representations have been proposed, SGDPLL(T ) is distin- 1 Introduction guished by being the only existing solution that is simultane- Uncertainty representation, inference and learning are im- ously (1) lifted, (2) exact and (3) modulo theories. portant goals in Artificial Intelligence. In the past few SGDPLL(T ) is a generalization of the Davis-Putnam- decades, neural networks and graphical models have made Logemann-Loveland (DPLL) algorithm for solving the sat- much progress towards those goals, but even today their main isfiability problem. SGDPLL(T ) generalizes DPLL in three methods can only support very simple types of representa- ways: (1) while DPLL only works on propositional logic, tions (such as tables and weight matrices) that exclude log- SGDPLL(T ) takes (as mentioned) a logic theory as a param- ical constructs such as relations, functions, arithmetic, lists eter; (2) it solves many more problems than satisfiability on and trees. 
Moreover, such representations require models in- boolean formulas, including summations over real-typed ex- volving discrete variables to be specified at the level of their pressions, and (3) it is symbolic, accepting input with free individual values, making generic algorithms expensive for variables (which can be seen as constants with unknown val- finite domains and impossible for infinite ones. ues) in terms of which the output is expressed. For example, consider the following conditional probabil- ity distributions, which would need to be either automatically Generalization (1) is similar to the generalization of DPLL expanded into large tables (a process called propositionaliza- made by Satisfiability Modulo Theories (SMT) [Barrett et al., tion), or manipulated in a manual, ad hoc manner, in order 2009; de Moura et al., 2007; Ganzinger et al., 2004], but SMT to be processed by mainstream neural networks or graphical algorithms require only satisfiability solvers of their theory model algorithms: parameter to be provided, whereas SGDPLL(T ) may require solvers for harder tasks (including model counting) that de- P (x>10 y = 98 z 15) = 0.1, pend on the theory, including symbolic model counters, i.e., • for x, y, z | ￿ 1,...,∨1000≤ ∈{ } Figures 1 and 2 illustrate how both DPLL and SGDPLL(T ) P (x = Bob friends(x, Ann)) = 0.3 work and highlight their similarities and differences. • ￿ | 2 Intuition: DPLL, SMT and SGDPLL(T ) problem. SAT consists of determining whether a propo- sitional formula F , expressed in conjunctive normal form (CNF) has a solution or not. A CNF is a conjunction ( ) of xyz (x › ™y) š (™ x › y › z) clauses where a clause is a disjunction ( ) of literals, where∧ a literal is a proposition (e.g., x) or its∨ negation (e.g., x). x = false x = true ¬ › A solution to a CNF is an assignment of values from the set 0, 1 or TRUE, FALSE to all Boolean variables (or propo- yz ™y yz y › z sitions){ } in{F such that at} least one literal in each clause in F is assigned to TRUE. y = false › y = true y = false › y = true Algorithm 1 shows a simplified and non-optimized version of DPLL which operates on CNF formulas. It works by re- z true z false z z z true   cursively trying assignments for each proposition, one at a F z = false › z = true time, simplifying the CNF, and terminating if is a constant (TRUE or FALSE). Figure 1 shows an example of the execu- false true tion of DPLL. Although simple, DPLL is the basis for modern SAT solvers which improve it by adding sophisticated tech- niques and optimizations such as unit propagation, watch lit- Figure 1: Example of DPLL’s search tree for the existence of sat- erals, and clause learning [Een´ and Sorensson,¨ 2003]. isfying assignments. We show the full tree even though the search typically stops when the first satisfying assignment is found.
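For concreteness, here is a minimal Python sketch of the DPLL recursion spelled out in Algorithm 1 below. The CNF encoding (clauses as sets of signed literal strings) and all helper names are illustrative assumptions, not the authors' implementation.

# A CNF is a list of clauses; a clause is a set of signed literals, e.g. {'x', '-y'}.
def simplify(cnf, literal):
    # Condition the CNF on `literal` being TRUE: drop satisfied clauses and
    # remove the literal's negation from the remaining ones.
    negation = literal[1:] if literal.startswith('-') else '-' + literal
    return [clause - {negation} for clause in cnf if literal not in clause]

def dpll(cnf):
    if not cnf:                                   # all clauses satisfied
        return True
    if any(not clause for clause in cnf):         # an empty clause: conflict
        return False
    v = next(iter(next(iter(cnf)))).lstrip('-')   # pick a variable occurring in F
    return dpll(simplify(cnf, v)) or dpll(simplify(cnf, '-' + v))

print(dpll([{'x', '-y'}, {'-x', 'y', 'z'}]))      # the formula of Figure 1: True

As noted above, production SAT solvers add unit propagation, watched literals, and clause learning on top of this simple skeleton.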

Algorithm 1 A version of the DPLL algorithm.

DPLL(F)
F: a formula in CNF.
simplify: simplifies boolean formulas given a condition (e.g., simplify(x ∧ y | ¬y) = FALSE)

  1 if F is a boolean constant       2 return F 3 else v pick a variable in F      Sol← DPLL(simplify(F v))  4 1 5 Sol ← DPLL(simplify(F | v)) 2 ← |¬    6 return Sol 1 Sol 2   ∨                          Satisfiability Modulo Theories (SMT) algorithms [Bar-     rett et al., 2009; de Moura et al., 2007; Ganzinger et al.,        2004] generalize DPLL and can determine the satisfiability of       a Boolean formula expressed in first-order logic, where some            function and predicate symbols have additional interpreta-          tions. Examples of predicates include equalities, inequalities,             and uninterpreted functions, which can then be evaluated us- ing rules of real arithmetic. SMT algorithms condition on the literals of a background theory T , looking for a truth assign- Figure 2: SGDPLL(T ) for summation with a background theory of ment to these literals that satisfies the formula. While a SAT inequalities on bounded integers. It splits the problem according to solver is free to condition on a proposition, assigning it to ei- literals in the background theory, simplifying it until the sum is over a literal-free expression. Some of the splits are on a free variable (y) ther TRUE or FALSE regardless of previous choices (truth val- and create if-then-else expressions which are symbolic conditional ues of propositions are independent from each other), an SMT solutions. Other splits are on quantified variables (x, z), and split the solver needs to also check whether a choice for one literal is corresponding quantifier. When the base case with a literal-free ex- consistent with the previous choices for others, according to pression is obtained, the specific theory solver computes its solution T . This is done by a theory-specific model checker provided (white boxes). This figure does not show how the conditional sub- to the SMT algorithm as a parameter. solutions are summed together; see end of Section 4.1 for examples and details. SGDPLL(T ) is, like SMT algorithms, modulo the- ories but further generalizes DPLL by being symbolic The Davis-Putnam-Logemann-Loveland (DPLL) algo- and quantifier-parametric (thus “Symbolic Generalized rithm [Davis et al., 1962] solves the satisfiability (or SAT) DPLL(T )”). These three features can be observed in the prob- lem being solved by SGDPLL(T ) in Figure 2: 3 T -Problems and T -Solutions SGDPLL(T ) receives a T -problem (or, for short, simply a (if x>y y =5 then 0.1 else 0.9) ∧ ￿ problem) of the form x,z 1,...,100 ∈{￿ } E, (if z4 then y else 10 + z, Before formally describing SGDPLL(T ), we will briefly y x:3 x x y ￿ ≤￿∧ ≤ comment on its three key generalizations. for x, y bounded integer variables in, say, 1,...,20 . A T -solution (or, for short, simply a solution{ ) to} a prob- lem is a quantifier-free expression in T equivalent to the Quantifier-parametric Satisfiability can be seen as com- problem. 
It can be an unconditionalL solution, contain- puting the value of an existentially quantified formula; the ing no literals in T , or a conditional solution of the form existential quantifier can be seen as an indexed form of dis- C if L then S1 else S2, where L is a literal in T and S1,S2 junction, so we say it is based on disjunction. SGDPLL(T ) are two solutions (either conditional or unconditional).C Note generalizes SMT algorithms with respect to the problem be- that, in order for the solution to be equivalent to the problem, ing solved by computing expressions quantified by any quan- only variables that were free (not quantified) can appear in tifier based on an associative commutative operation the literal L. In other words, a solution can be seen as a deci- . Examples of ( , , ) pairs are ( , ), ( , ), ( ,+), ⊕ ￿ ⊕ ∀ ∧ ∃ ∨ sion tree on literals, with literal-free expressions in the leaves, and ( , ). Therefore SGDPLL(T ) can solve not only sat- such that each leaf is equivalent to the original problem, pro- isfiability× (since disjunction￿ is associative commutative),￿ but ￿ vided that the literals on the path to it are true. For example, also validity (using the quantifier), sums, products, model the problem counting, weighted model∀ counting, maximization, and many more. if y>2 w>y then y else 4 ∧ x:1 x x 10 ≤￿∧ ≤ Modulo Theories SMT generalizes the propositions in has an equivalent conditional solution SAT to literals in a given theory T , but the theory con- if y>2 then if w>y then 10y else 40 else 40. necting these literals remains that of boolean connectives. SGDPLL(T ) takes a theory T =(T ,T ), composed of C L T a constraint theory T and an input theory T . DPLL 4 SGDPLL( ) propositions are generalizedC to literals in T in SGDPLL(L T ), In this section we provide the details of SGDPLL(T ), de- whereas the boolean connectives are generalizedC to functions scribed in Algorithm 2 and exemplified in Figure 2. We first in T . In the example above, T is the theory of inequali- give an informal explanation guided by examples, and then ties onL bounded integers, whereasC T is the theory of +, , provide a formal description of the algorithm. boolean connectives and if then elseL . T is used simply× for the simplifications after conditioning, whichL takes time at 4.1 Informal Description of SGDPLL(T ) most linear in the input expression size. Base Case Problems A problem is in base case 0 if and only if m =1, E contains no literals in T and C is in a base form specific to the theory C Symbolic Both SAT and SMT can be seen as computing the T , which its solver must be able to recognize. value of an existentially quantified formula in which all vari- In our running example, we a slight variant of difference ables are quantified, and which is always equivalent to either arithmetic [de Moura et al., 2007], with atoms of the form TRUE or FALSE. SGDPLL(T ) further generalizes SAT and x4 then y else 10 + z. u l. 1 If l u is false, there are no values of x satisfying x:3 x 10 z − ≤ ￿≤ ≤ ￿ the constraint, and the problem results in 0. Therefore, the In this case, we cannot simply move the literal outside the solution is the conditional if l u then E￿ else 0. scope of the sum in its latest index x. Instead, we add the A problem is such≤ that m>1 and E base case 1 x:Cm literal and its negation to the constraint on x, in two new sub- satisfies base case 0, yielding solution S. In this case, we problems: reduce the problem to the simpler ￿ = y + 10 + z S. 
x:x>4 3 x 10 z x:x 4 3 x 10 z ··· ￿ ￿∧ ≤ ≤ ￿ ￿ ￿ ≤ ￿∧ ≤ ≤ ￿ ￿ x1:C1 xm 1:Cm 1 ￿ −￿ − = y + 10 + z . Non-Base case Problems ￿x:5 x<10 z ￿ ￿x:3 x 4 z ￿ For non-base cases, SGDPLL(T ) mirrors DPLL, by selecting ￿≤ ￿ ￿≤ ≤ ￿ In this example, the two sub-solutions are unconditional poly- a splitter literal to split the problem on, generating two sim- pler problems. This eventually leads to base case problems. nomials, and their sum results in another unconditional poly- The splitter literal L can come from either the expression nomial, which is a valid solution. However, if at least one of the sub-solutions is conditional, their direct sum is not a E, to bring it closer to being literal-free, or from Cm, to bring it closer to satisfying the base form prerequisites. We will see valid solution. In this case, we need to combine them with a distributive transformation of over if then else : examples shortly. ⊕ Once the splitter literal L is chosen, it splits the problem (if x<4 then y2 else z)+(if y>z then 3 else x) in two possible ways: if L does not contain any of the in- if x<4 then y2 +(if y>z then 3 else x) dices xi, it causes an if-splitting in which L is the condition ≡ of an if then else expression and the two simpler sub- else z +(if y>z then 3 else x) problems are its then and else clauses; if L contains at least if x<4 then if y>z then y2 +3 else y2 + x one index, it causes a based on the latest ≡ quantifier-splitting else if y>z then z +3 else z + x. of the indices it contains. For an example of an if-splitting on a literal coming from 4.2 Formal Description of SGDPLL(T ) C , consider the problem m We now present a formal version of the algorithm. We start y2. by specifying the basic tasks the given T -solver is required to solve, and then show can we can use it to solve any T - z x:y x 3 x x 10 ￿ ≤ ∧￿≤ ∧ ≤ problems. This is not a base case because the constraint includes two Requirements on T solver lower bounds for x (y and 3), which are not redundant be- To be a valid input for SGDPLL(T ), a T -solver S for theory cause we do not know which one is the smallest. We can T T =(T ,T ) must solve two tasks: however reduce the problem to base case ones by splitting L C the problem according to y<3: Given a problem x:C E, ST must be able to recognize • whether C is in base form and, if so, provide a solution 2 2 ￿ if y<3 then y else y . baseT ( x:C E) for the problem. z x:3 x 10 z x:y x 10 Given a conjunction C not in base form, S must pro- ￿ ￿≤ ≤ ￿ ￿≤ ≤ • ￿ T vide a tuple splitT (C)=(L, CL,C L) such that L For another example of an if-splitting, but on a literal com- ¬ ∈ T , and conjunctions CL and C L are smaller than C ing from E this time, consider C ¬ and satisfy L (C CL) L (C C L). ⇒ ⇔ ∧¬ ⇒ ⇔ ¬ if y>4 then y else 10. The algorithm is presented in Figure 2. Note that it does z x:3 x 10 not depend on difference arithmetic theory, but can use a ￿ ￿≤ ≤ solver for any theory satisfying the requirements above. This is not a base case because E is not literal-free. However, Note also that conditional solution may contain redundant splitting on y>4 reduces to literals, and that optimizations such as unit propagation and watch literals can be employed. These issues can be ad- if y>4 then y else 10, dressed but we omit the details for lack of space. z x:3 x 10 z x:3 x 10 ￿ ￿≤ ≤ ￿ ￿≤ ≤ If the T -solver implements the operations above in polyno- containing two base cases. 
mial time in the number of variables and constant time in the domain size (the size of their types), then SGDPLL(T ), like 1This takes a number of steps, which we omit for lack of space, DPLL, will have time complexity exponential in the num- but it is not hard to do. ber of literals and, therefore, in the number of variables, and be independent of the domain size. This is the case for the solver for difference arithmetic and will be typically the case for many other solvers. Algorithm 2 Symbolic Generalized DPLL (SGDPLL(T )), 5 Probabilistic Inference Modulo Theories omitting pruning, heuristics and optimizations.

Let P (X1 = x1,...,Xn = xn) be the joint probability dis- SGDPLL(T )( E) tribution on random variables X1,...,Xn . For any tuple x1:C1 ··· xm:Cm of indices t, we define X to be{ the tuple of variables} indexed T T t Returns a ￿-solution for￿ the given -problem. by the indices in t, and abbreviate the assignments (X = x) (X = x ) x x t¯ split( E) and t t by simply and t, respectively. Let be the 1 if xm:Cm indicates “base case” tuple of indices in 1,...,n but not in t. 2 S baseT ( E) { } ← xm:Cm The marginal probability distribution of a subset of vari- 3 if m￿=1// decide if base case 0 or 1 ables Xq is one of the most basic tasks in probabilistic infer- 4 return S￿ P (x )= P (x) ence, defined as q xq¯ which is a summation 5 else P S on a subset of variables occurring in an input expression, and 6 x1:C1 xm 1:Cm 1 ￿ ← ··· − − therefore solvable by SGDPLL(T ). 7 else ￿ ￿ If P (X = x) is expressed in the language of input and 8 // split returned (L, E￿, E￿￿) xm:Cm￿ xm:Cm￿￿ constraint theories appropriate for SGDPLL(T ) (such as the 9 if L does not contain any indices one shown in Figure 2), then it can be solved by SGDPLL(T ), 10 splittingType ￿“if” ￿ ← without first converting its representation to a much larger one 11 Sub1 E￿ based on tables. The output will be a summation-free expres- ← x1:C1 ··· xm:Cm￿ 12 Sub2 x :C x :C E￿￿ sion in the assignment variables x representing the marginal ← ￿ 1 1 ···￿ m m￿￿ q L x probability distribution of X . 13 else // contains a latest index j: q 14 splittingType￿ “quantifier”￿ Belief updating consists of computing the posterior prob- ← 15 Sub1 x :C x :C L x :C E￿ ability of Xq given evidence on Xe, which is defined as ← 1 1 ··· j j ∧ ··· m m￿ 16 Sub2 E￿￿ P (x x )=P (x ,x )/P (x ) which can be computed x1:C1 xj :Cj L xm:C￿￿ q| e q e e ← ￿ ···￿ ∧¬ ···￿ m with two applications of SGDPLL(T ), one for the marginal 17 S1 SGDPLL(T )(Sub1) ← ￿ ￿ ￿ P (x ,x ) and another for P (x ). 18 S2 SGDPLL(T )(Sub2) q e e ← Applying SGDPLL(T ) in the manner above does not take 19 if splittingType == “if” advantage of factorized representations of joint probability 20 return the expression if L then S1 else S2 distributions, a crucial aspect of efficient probabilistic infer- 21 else return combine(S1,S2) ence. However, it can be used as a basis for an algorithm, Symbolic Generalized Variable Elimination modulo theories SPLIT( x:C E) E L (SGVE(T )), analogous to Variable Elimination (VE) [Zhang 1 if ￿contains a literal and Poole, 1994] for graphical models, that exploits factoriza- 2 E￿ E with L replaced by TRUE ← tion. Suppose P (x) is represented as a product of real-valued 3 E￿￿ E with L replaced by FALSE ← functions (called factors) fi: P (x)=f1(xt1) fm(xt1) 4 return (L, C E￿, C E￿￿) ×···×f (x ) 5 elseif C is not recognized as base form by the T -solver and we want to compute a summation over it: xq¯ 1 t1 × 6 (L, C￿,C￿￿) ￿split (￿C) fm(xtm) where q and ti are tuples. We now choose a T ···× ￿ 7 return (L, ← E, E) variable xi for i q to eliminate first. Let g be the product of C￿ C￿￿ ￿∈ x h 8 else return “base case” all factors in which i does not appear, be the product of all ￿ ￿ factors in which xi does appear, and b be the tuple of indices COMBINE(S1,S2) of variables other than xi appearing in h. 
SGDPLL(T) contributes to the symbolic connectionist literature by providing a way to incorporate logic representations into systems dealing with uncertainty. If the variables and data used by SGDPLL(T) in a given system are of a sub-symbolic, distributed nature, then this system will exhibit advantages of connectionism, observed on logic representations instead of simpler propositional representations.

6 Evaluation

We did a very preliminary comparison of SGDPLL(T)-based SGVE(T), using an implementation of an equality theory (=, ≠ literals only) symbolic model counter, to the state-of-the-art probabilistic inference solver variable elimination and conditioning (VEC) [Gogate and Dechter, 2011], on randomly generated probabilistic graphical models based on equality formulas, on an Intel Core i7 notebook. We ran both SGVE(T) and VEC on a random graphical model with 10 random variables, 5 factors, and formulas of depth and breadth (number of arguments per sub-formula) 2 for random connectives ∨ and ∧. SGVE(T) took 1.5 seconds to compute marginals for all variables (unsurprisingly, irrespective of domain size). We grounded this model for domain size 16 to provide the table-based input required by VEC, which then took 30 seconds to compute all marginals. The largest grounded table given to VEC as input had 6 random variables and therefore around 16 million entries.

7 Related work
SGDPLL(T) is a lifted inference algorithm [Poole, 2003; de Salvo Braz, 2007; Gogate and Domingos, 2011], but the lifted algorithms proposed so far have concerned themselves only with relational formulas with equality. We have not yet developed a relational model counter but presented one for inequalities, the first in the lifted inference literature. SGDPLL(T) generalizes several algorithms that operate on mixed networks [Mateescu and Dechter, 2008] – a framework that combines Bayesian networks with constraint networks – but with a much richer representation.

8 Conclusion and Future Work

We have presented SGDPLL(T) and its derivation SGVE(T), the first algorithms formally able to solve a variety of model counting problems (including probabilistic inference) modulo theories, that is, capable of being extended with theories for richer representations than propositional logic, in a lifted and exact manner. Future work includes additional theories of interest, mainly among them those for uninterpreted relations (particularly multi-arity functions) and arithmetic; modern SAT solver optimization techniques such as watched literals and unit propagation; and anytime approximation schemes that offer guaranteed bounds on approximations that converge to the exact solution.

Acknowledgments

We gratefully acknowledge the support of the Defense Advanced Research Projects Agency (DARPA) Probabilistic Programming for Advanced Machine Learning Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-14-C-0005. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the view of DARPA, AFRL, or the US government.

References

[Barrett et al., 2009] C. W. Barrett, R. Sebastiani, S. A. Seshia, and C. Tinelli. Satisfiability Modulo Theories. In Armin Biere, Marijn Heule, Hans van Maaren, and Toby Walsh, editors, Handbook of Satisfiability, volume 185 of Frontiers in Artificial Intelligence and Applications, pages 825–885. IOS Press, 2009.
[Davis et al., 1962] M. Davis, G. Logemann, and D. Loveland. A machine program for theorem proving. Communications of the ACM, 5:394–397, 1962.
[de Moura et al., 2007] Leonardo de Moura, Bruno Dutertre, and Natarajan Shankar. A tutorial on satisfiability modulo theories. In Computer Aided Verification, 19th International Conference, CAV 2007, Berlin, Germany, July 3-7, 2007, Proceedings, volume 4590 of Lecture Notes in Computer Science, pages 20–36. Springer, 2007.
[de Salvo Braz, 2007] R. de Salvo Braz. Lifted First-Order Probabilistic Inference. PhD thesis, University of Illinois, Urbana-Champaign, IL, 2007.
[Eén and Sörensson, 2003] N. Eén and N. Sörensson. An Extensible SAT-solver. In SAT Competition 2003, volume 2919 of Lecture Notes in Computer Science, pages 502–518. Springer, 2003.
[Ganzinger et al., 2004] Harald Ganzinger, George Hagen, Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. DPLL(T): Fast Decision Procedures. 2004.
[Getoor and Taskar, 2007] L. Getoor and B. Taskar, editors. Introduction to Statistical Relational Learning. MIT Press, 2007.
[Gogate and Dechter, 2011] V. Gogate and R. Dechter. SampleSearch: Importance sampling in presence of determinism. Artificial Intelligence, 175(2):694–729, 2011.
[Gogate and Domingos, 2011] V. Gogate and P. Domingos. Probabilistic Theorem Proving. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 256–265. AUAI Press, 2011.
[Goodman et al., 2012] Noah D. Goodman, Vikash K. Mansinghka, Daniel M. Roy, Keith Bonawitz, and Daniel Tarlow. Church: a language for generative models. CoRR, abs/1206.3255, 2012.
[Knuth, 1993] Donald E. Knuth. Johann Faulhaber and Sums of Powers. Mathematics of Computation, 61(203):277–294, 1993.
[Mateescu and Dechter, 2008] R. Mateescu and R. Dechter. Mixed deterministic and probabilistic networks. Annals of Mathematics and Artificial Intelligence, 54(1-3):3–51, 2008.
[Poole, 2003] D. Poole. First-Order Probabilistic Inference. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 985–991, Acapulco, Mexico, 2003. Morgan Kaufmann.
[Zhang and Poole, 1994] N. Zhang and D. Poole. A simple approach to Bayesian network computations. In Proceedings of the Tenth Biennial Canadian Artificial Intelligence Conference, 1994.

A Connectionist Network for Skeptical Abduction

Emmanuelle-Anna Dietz and Steffen Hölldobler and Luis Palacios∗ International Center for Computational Logic, TU Dresden, 01062 Dresden, Germany [email protected] [email protected] [email protected]

Abstract

We present a new connectionist network to compute skeptical abduction. Combined with the CORE method to compute least fixed points of semantic operators for logic programs, the network is a prerequisite to solve human reasoning tasks like the suppression task in a connectionist setting.

1 Introduction

Various human reasoning tasks like the suppression and the selection tasks can be adequately modeled in a computational logic approach based on the weak completion semantics [11; 6; 4; 5]. Therein, knowledge is encoded as a logic program P and interpreted under the Łukasiewicz (Ł) logic [18]. As shown in [11] the weak completion of each program admits a least Ł-model and reasoning is performed with respect to these least Ł-models. The least Ł-models can be computed as least fixed points of a particular semantic operator which was introduced in [22]. In addition, some tasks require abduction and, in particular, it has been shown in [12; 6] that skeptical abduction is needed as otherwise, if credulous abduction is applied, the approach is no longer adequate.

As shown in [10], the above mentioned semantic operator can be computed by a three-layer feed-forward network within the CORE method [8; 2], where the input as well as the output layer represent interpretations. By connecting the units in the output layer to the units in the input layer, the corresponding recurrent network converges to a stable state encoding the least fixed point of the semantic operator.

In order to develop a fully connectionist network for the suppression and the selection tasks we need to add abduction to the recurrent networks mentioned in the previous paragraph. Unfortunately, to the best of our knowledge, the only connectionist models for abduction compute credulous conclusions [3]. Hence, there is the need to develop a connectionist model for skeptical abduction, and we do this in this paper for two-valued logic.

2 Preliminaries

We introduce the general notation and terminology that will be used throughout the paper, based on [17; 13] and [14].

A (propositional logic) program P is a finite set of (program) clauses of the form A ← L1 ∧ ... ∧ Ln, n ≥ 0, where A is an atom and the Li, 1 ≤ i ≤ n, are literals. A is called head and L1 ∧ ... ∧ Ln is called body. We also refer to the body as the set {L1, ..., Ln}. If the body is empty we write A ← ⊤. atoms(P) denotes the set of atoms occurring in P. def(P) = {A | A ← body ∈ P} is the set of defined atoms and undef(P) = atoms(P) \ def(P) is the set of undefined atoms in P.

An interpretation I is a mapping from the set of ground atoms to {⊤, ⊥}. M is a model of P if it is an interpretation that maps each clause occurring in P to ⊤. M is a minimal model of P iff there is no other model M' s.t. M' ⊂ M. If P has only one minimal model, then this is its least model.

The knowledge represented by a logic program P can be captured by the T_P operator [1]. Given an interpretation I and a program P, T_P is defined as T_P(I) = {A | A ← body ∈ P and I(body) = ⊤}. For acceptable programs [7], T_P admits a least fixed point denoted by lfp T_P. Moreover, this fixed point is the least model of P [7]. We write P |=lfp F iff formula F holds in the least fixed point of T_P.

We consider the abductive framework AF = ⟨P, A_P, IC, |=lfp⟩,¹ where P is an acceptable program, A_P = {A ← ⊤ | A ∈ undef(P)} is the set of abducibles and IC is a finite set of integrity constraints, i.e., expressions of the form ⊥ ← body. Let O be a finite set of literals called observations. O is explained by E ⊆ A_P iff P ⊭lfp O, P ∪ E |=lfp O and P ∪ E satisfies IC. We assume that explanations E are minimal, i.e., there is no other explanation E' ⊂ E for O.

F follows skeptically from P, IC and O iff O can be explained given P and IC, and for all explanations E for O we find P ∪ E |=lfp F. F follows credulously from P, IC and O iff there exists an explanation E for O and P ∪ E |=lfp F. We focus on skeptical abduction; we are not interested in computing skeptical abductive explanations (i.e. explanations which are present in every model of P ∪ O) but instead we want to know the logical consequences of all explanations for O. This is a keypoint and the main contribution of this paper.

The reason why we need to restrict ourselves to minimal explanations can be clarified by the following example: Given P = {p ← q, s ← t}, where A_P = {q ← ⊤, t ← ⊤} and O = {p}, the only explanation is E = {q ← ⊤}. Additionally to p and q, we conclude that ¬s and ¬t follow skeptically. Without the restriction to minimal explanations, E = {q ← ⊤, t ← ⊤} would yet be another explanation for O, and consequently only p and q follow skeptically.

* The authors are listed alphabetically. Luis Palacios was supported by the European Master's Program in Computational Logic.
¹ This restriction can be lifted by allowing P to be any logic program using the ideas in [16; 20], where AF is transformed into an equivalent AF* wrt generalized stable model semantics. The resulting AF* is definite and can be handled by our approach.
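To help trace the definitions and the example above, here is a toy Python sketch (illustrative only; the function names and data encoding are ours, not the authors') that iterates the T_P operator to its least fixed point, for programs whose bodies contain only atoms, and checks that the explanation E = {q ← ⊤} indeed entails the observation.

    # A toy illustration of the T_P operator on the running example; not the
    # connectionist realization described later in the paper.
    def tp(program, interpretation):
        """One application of T_P: heads whose bodies hold in the interpretation."""
        return {head for head, body in program if body <= interpretation}

    def tp_lfp(program):
        """Least fixed point of T_P, reached by iteration from the empty set."""
        i = set()
        while True:
            nxt = tp(program, i)
            if nxt == i:
                return i
            i = nxt

    # P = {p <- q, s <- t}; abducibles A_P = {q <- T, t <- T}; observation O = {p}.
    P = [('p', {'q'}), ('s', {'t'})]
    E = [('q', set())]              # the minimal explanation {q <- T}
    print(tp_lfp(P))                # set(): P alone does not explain O
    print(tp_lfp(P + E))            # {'q', 'p'}: P u E entails the observation p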

3 Computing Skeptical Abduction

In this section we explain how skeptical abduction can be computed. For this purpose, we first need to determine the set of minimal explanations for P, IC and O. After that, we can identify which literals follow skeptically from P and O, i.e., which literals follow from all minimal explanations for O.

3.1 Algorithm 1

We define an order on the candidate explanations C, i.e., all subsets of A_P which are possibly minimal explanations for O. Let Pow(A_P) be the power set of A_P and ≺ be any total ordering over Pow(A_P) s.t. for every C, C' ∈ Pow(A_P) we have that |C| < |C'| implies C ≺ C'. Ord(A_P) = {C1, C2, ..., Cn} is an ordered set with exactly the elements of Pow(A_P) s.t. C1 ≺ C2 ≺ ··· ≺ Cn. All minimal explanations are determined by Algorithm 1.

    Algorithm 1: The computation of all minimal explanations.
    Data: ⟨P, A_P, IC, |=lfp⟩ and O
    Result: Set ME of minimal explanations for O
    ME = ∅
    for Ci ∈ Ord(A_P), i = 1 ... n do
        1) if forall E ∈ ME: E ⊄ Ci then
            2) if P ∪ Ci |=lfp O and P ∪ Ci satisfies IC then
                add Ci to ME
            else
                discard Ci
            end
        else
            discard Ci
        end
    end

    Algorithm 2: The computation of skeptical consequences.
    Data: ⟨P, A_P, IC, |=lfp⟩, O and ME ²
    Result: Sets S+, S−
    j = 0
    S+ = S− = ∅
    for each E ∈ ME do
        increment j by 1
        Fj+ = {A | P ∪ E |=lfp A}
        Fj− = {¬A | P ∪ E |=lfp ¬A}
    end
    if j ≥ 1 then
        S+ = ∩_{k=1..j} Fk+
        S− = ∩_{k=1..j} Fk−
    end

² This set is computed by Algorithm 1.
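Before turning to the connectionist realization, a small, purely sequential sketch of Algorithms 1 and 2 may help. It is a simplification under stated assumptions (definite programs with atom-only bodies, positive observations, no integrity constraints, and no S− of negative consequences), all function names are ours, and it is not the network construction of Section 4.

    # A toy sketch of Algorithms 1 and 2: candidates in order of size, supersets
    # of known explanations skipped (condition 1), candidates kept when they
    # entail the observation (condition 2), and skeptical consequences obtained
    # by intersecting the least models of P united with each minimal explanation.
    from itertools import combinations

    def tp_lfp(program):
        i, nxt = set(), {h for h, b in program if not b}
        while nxt != i:
            i = nxt
            nxt = {h for h, b in program if b <= i}
        return i

    def minimal_explanations(program, abducibles, observation):
        """Algorithm 1 (simplified): enumerate candidates C by |C|."""
        me = []
        for k in range(len(abducibles) + 1):
            for cand in map(set, combinations(abducibles, k)):
                if any(e < cand for e in me):        # condition 1): non-minimal
                    continue
                facts = [(a, set()) for a in cand]
                if observation <= tp_lfp(program + facts):   # condition 2)
                    me.append(cand)
        return me

    def skeptical_consequences(program, me):
        """Algorithm 2 (simplified): intersect consequences of all explanations."""
        models = [tp_lfp(program + [(a, set()) for a in e]) for e in me]
        return set.intersection(*models) if models else set()

    P = [('p', {'q'}), ('s', {'t'})]
    ME = minimal_explanations(P, ['q', 't'], {'p'})
    print(ME)                               # [{'q'}]
    print(skeptical_consequences(P, ME))    # {'q', 'p'}

On the running example this returns {q ← ⊤} as the only minimal explanation and {p, q} as the positive skeptical consequences, matching the example in Section 2.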

Lemma 1. Consider ⟨P, A_P, IC, |=lfp⟩ and observation O. The following holds for Ci ∈ Ord(A_P), 1 ≤ i ≤ n: Whenever Ci is tested as part of condition 1) of Algorithm 1, ME contains all minimal explanations C' for O with |C'| < |Ci|.

Proof. If there exists a minimal explanation E ⊂ Ci then E was tested before Ci, because all C have been tested in order of minimality. As E is a minimal explanation, it must have been added to ME; therefore there cannot exist a minimal explanation smaller than Ci that is not in ME.

Proposition 1. Consider ⟨P, A_P, IC, |=lfp⟩ and observation O. Let ME be the set computed by Algorithm 1.
1. ME contains only minimal explanations for O.
2. All minimal explanations for O are contained in ME.

Proof. 1. Every E ∈ ME satisfies conditions 1) and 2) of Algorithm 1. By Lemma 1 and condition 1) there is no C ∈ Ord(A_P) with C ⊂ E s.t. C is a minimal explanation for O. By condition 2), as E explains O, all E ∈ ME are minimal explanations for O.
2. (by contradiction) Assume that all possible explanations have been tested and E ∉ ME is a minimal explanation. Then there exists no explanation E' ⊂ E, i.e., E satisfies condition 1), and because E explains O, it must have been added to ME by condition 2) of Algorithm 1. Therefore, E ∈ ME.

3.2 Algorithm 2

Algorithm 2 determines which literals follow skeptically by verifying if they are entailed by all minimal explanations.

Proposition 2. Consider ⟨P, A_P, IC, |=lfp⟩ and observation O. Let S+, S− be computed by Algorithm 2.
1. S+ contains all skeptically entailed positive literals.
2. S− contains all skeptically entailed negative literals.

Proof. 1. If j = 0 then O can not be explained, and nothing follows skeptically. Accordingly, the set S+ will be empty. If at least one minimal explanation exists, we show that all elements in S+ follow skeptically. Let A ∈ S+. Then, because S+ = ∩_{k=1..j} Fk+, it follows that A ∈ Fk+ with 1 ≤ k ≤ j. As Algorithm 2 creates a set Fk+ for each Ek ∈ ME, we have that A ∈ Fk+ implies P ∪ Ek |=lfp A, for all Ek ∈ ME. By Proposition 1, the set ME contains exactly all minimal explanations for O. Consequently, A follows skeptically.
2. can be shown similarly to 1.

4 A Connectionist Realization

We encode Algorithms 1 and 2 into programs and translate the programs into neural networks with the help of the CORE method. The resulting networks compute ME, S+ and S−.

4.1 The CORE Method

The CORE method [8; 2] translates programs into neural networks. It is based on the idea that feed-forward networks can approximate almost any function arbitrarily well, and that the semantics of a logic program P is captured by the T_P operator. We represent propositional variables by natural numbers; let n ∈ N be the largest number occurring in P. The network associated with P is constructed as follows:

sol non_min

Figure 2: The network for AF, where {A1, ..., An} = atoms(P ∪ O' ∪ IC').

Figure 1: Overview of the network: Components and the con- = L ,...,L , ￿ = obs L L and O { 1 n} O { ← 1 ∧···∧ n} nections between them. ￿ = ic body body . Lemma 4 follows immediatelyIC { ← from the| definitions.⊥← ∈IC} 1. The input and output layer is a vector of binary threshold Lemma 4. Given , , , =lfp and let be a finite set .5 n i P units (with threshold ) of length , where the th unit in of literals. If ￿P A explainsIC | ￿ given O and then each layer represents the variable i, 1 i n. Addi- lfp E⊆AP lfp O P IC ≤ ≤ ￿ = obs and ￿ = ic. tionally, each unit in the output layer is connected to the P ∪E | P ∪E | ¬ corresponding unit in the input layer with weight 1. We can construct the associated network NAF using the CORE method as shown in Figure 2. From this construction 2. For every clause of the form A L1 Lk with and Lemma 3 follows: k 0 occurring in do: ← ∧···∧ ≥ P Lemma 5. Given NAF and let be a minimal ex- (a) Add a binary threshold unit c to the hidden layer. E∈ME planation for given and . If NAF reaches a stable c A O P IC (b) Connect to the unit representing in the output state, then for each i in the output layer of NAF we lfp layer with weight 1. find: If i is active, then ￿ = i. If i is inactive, then lfp P ∪E | (c) For each L occurring in L1 Lk connect the ￿ = i. unit representing L in the input∧··· layer∧ to c. If L is an P ∪E | ¬ atom, then set the weight to 1, otherwise to 1. The Counter C − outputs all possible combinations of abducibles in the order (d) Set threshold θc of c to l 0.5, where l is the number C − of minimality. It is divided into the two components and of positive literals occurring in L1 Lk. C1 ∧···∧ C . C computes all the combinations of k elements out of [ ] 2 1 Lemma 2 ( 8 ). For each program there exists a 3-layer n and C increases k. This way we first compute all the sets recurrent network that computes T .P 2 P Pow( ) with =1. Thereafter, we compute all sets Lemma 3 ([9]). For each acceptable program there ex- withC∈ =2AP, =3|C|,..., = n, where n = . ists a 3-layer recurrent network such that each computationP |C| |C| |C| |AP | C1 starts with a number k of active bits, and is based on starting with an arbitrary initial input converges and yields two simple rules over a binary vector of length n. Each time the unique fixed point of T . P the signal δ is activated: 4.2 The Network 1. The left-most bit b is shifted to the left. [3] provides a connectionist model to compute credulous 2. When a bit b can not be shifted anymore the last active bit abduction. We modify their approach such that we only aaare reset consider minimal explanations, which allows us to compute relative to a, and the process restarts from 1. skeptical solutions. Figure 1 gives an overview of the net- Eventually all k active bits will be shifted to the left, and C1 work. Each component is expressed as a logic program and asks C2 to update the initial configuration of k +1active bits, translated into a neural network using the CORE method. and starts again. The process ends when C1 runs with the ini- AF is the abductive framework in consideration where we test tial configuration of size k = n. all minimal explanations. A counter C generates all candi- In condition 2. we reset the position of the bits. We need to dates in order of minimality. M tests only the minimal candi- know which of them were initially active. In order to keep dates and records which of them are the (minimal) explana- track of the initial k bits, each one is represented by the sym- tions. 
Solution S keeps track of all literals that follow from bol aij where i indicates its original and j its current position. + each minimal explanation and generates the sets and −. S S (1) a a ,s ,δ Ultimately, these are the sets we are interested in. Whenever i(j+1) ← ij i (2) mi 1 ain,si,δ a valid solution has been found, control logic L informs S − ←  (3) mi 1 aij,a(i+1)(j+1),mi  and M. Additionally, L requests C to output the next set of − n  (4) r ← a , a ,m  abducibles. Finally the clock K snychronizes the system. It  i ← ij ¬ (i+1)(j+1) i  C =  (5) a aij, a ,mi  instructs L when to evaluate the current solution. 1  i(j+1) ← ¬ (i+1)(j+1)  i,j=1  (6) ak(j+k i) ri,aij,bk (for k>i)  ￿  − ←  The Abductive Framework AF (7) aij aij, ai(j+1), rk, v In order to detect which explains and satisfies  ← (for k>i¬ ) ¬ ¬  E⊆AP O   , we construct ￿ = ￿ ￿ where, given that  (8) v m0  IC P P∪O ∪IC  ←      (9) b 1 ← Lemma 6. Neuron sol is active only when an explanation (10) bi bi for given and has been found. It is active for onlyE  (11) b ← b ,v  O P IC  i+1 ← i  two time steps.  (12) done bn,v   ←  Proof. From Lemma 4, if explains , then neuron obs will  (13) ψ ψ, next  n  ←¬  be active, and neuron ic inactive,E thusOnogood can not be ac-  (14) δ ψ, next  C2 =  ←  tivated. Under this conditions when sync is activated by K to  (15) aii ψ, bi, next  i,j=1  ←¬  ￿  (16) ψ ψ, v  check if the current solution is valid, sol will be activated. ← ¬ (17) si+1 si,v  ←  The Minimal Candidates M  (18) si si, si+1   (19) s ← b ¬, ψ, next  M has the tasks to filter C’s output s.t. only minimal candi-  1 2   (20) a ←a¬ ¬  dates are tested and to record each minimal explanation that  j ← ij  has been found. Given a candidate , a minimal explanation   C Due to space restrictions, we only detail the most relevant and the set , we construct the set = q1,...,qn   E AP QEC { } features of the counter. For each component C1 and C2 a in order to determine , where q = iff a and E⊂C i ⊥ i ∈E correspondent signal δ and ψ is used, δ is used by C1 to shift a , and q = otherwise. i ￿∈ C i ￿ the bits to the left, and ψ is used by C2 to update the initial Intuitively, in case all elements of are also in , all qi configuration. They both depend on the next signal. The will be indicating that is aE superset of Cand thus ∈ QEC ￿ C E C output of the counter is represented by each neuron aj. can not be minimal (non_min). The process is encoded by adding the following clauses to M, where we label a as Proposition 3. Consider and the counter C. The output i ∈E AP aE in order to distinguish the elements of from those of . of C represented by all active atoms aj generates Ord( ). i E C AP For each add to M: E∈ME The Clock K (1) q aE (3) non_min q ,...,q K synchronizes the system. Each cycle of the clock activates i ←¬ i ← 1 n (2) qi ai,aE ∪ (4) next non_min the sync signal, which checks if the current solution is valid ￿ ← i ￿ ￿ ← ￿ (sol) and activates the next signal. This causes C to output Lemma 7. Given , , , =lfp and let be a minimal the next combination of abducibles. The clock has D +1 ￿P AP IC | ￿ E 1 explanation for given and . For any candidate atoms ki. The value of D = (P + N + 5) is chosen to O P IC C 2 where , it holds that qi = for all qi . give the rest of the network enough time to stabilize, where E⊂C ￿ ∈QEC P is the longest directed path without repeated atoms in the Proof. Assume but qi = for some i. Then clauses E⊂C ⊥ dependency graph of , and N = . 
(1) and (2) in M are not satisfied, only if for the corresponding P |AP | k ai we find both: ai and ai . But, since ai and In K each atom i depends on its predecessor and the ad- ∈E ￿∈ C ∈E non min it follows that ai , thus qi must be true. ditional _ condition is used to reset the clock if a E⊂C ∈C non-minimal¬ solution is discarded by component M. The combination k , k is reached once per cycle and is used to Lemma 8. Let be the current candidate represented by 0 1 a C activate the neuron¬ sync, which remains active for two con- each neuron i from the output of the counter C. Given M, the non_min neuron will be active iff for some . secutive time steps. C⊃E E∈ME Proof. Follows from Lemma 7 and Proposition 1. D k0 done, kn K = ki ki 1, non_min ←¬ ¬ Since non_min detects if for some , clause { ← − ¬ }∪ sync k0, k1 i=1 ← ¬ E⊂C E∈M ￿ ￿ ￿ (4) activates the next neuron discarding and instructing C to generate the next candidate, thus onlyC minimal explana- This process continues until neuron done is activated (set by tions will be tested. (12) on C ) and prevents k to change its value. 2 0 To construct the set we need to record each minimal M The Control L explanation found. We provide an upper bound for the num- L verifies whether ic is violated (1) or obs is not met (2). ber of minimal explanations that can exist. The proof of the following lemma is quite extensive and can be found in [19]. (1) nogood ic ← Lemma 9. Given a set of abducibles , let n = . (2) nogood obs AP |AP | ←¬ Then the maximal number of possible minimal explanations L =  (3) sol sync, nogood  (n!)  ← ¬  is given by max = n n , which is the maximum number of  (4) done done  2 ! 2 !   n (5) next ← sync subsets of the same size ( ) out of n elements. ← 2   The following rules are added to M to keep track of how If this is the case, the current solution is not valid (nogood ).   many explanations have been found. Both conditions are checked when sync is activated by K. If E the conditions are met, an explanation has been found, and 1 sol, 1 sol is activated by (3). (4) serves as a memory for neu- E ← sol, ¬E , E2 ← ¬E2 E1 ron done which indicates thal all explanations have been ex-  3 sol, 3, 2  i i plored. (5) induces C to output the next possible explanation  E ← ¬E E  ∪{ E ←E }  ...  by activating next, which depends on sync. max sol, max, max 1 E ← ¬E E −     When the ith solution has been found, the correspondent + i Ai− Ai∗ Ai neuron will be active. Because sol is active for only two timeE steps, it ensures that only one of the rules above will be fired by each activation of sol, and only one minimal explanation will be recorded at each time. As soon as an i has been found, its elements are recorded. + A− A∗ sol A Ai E j i i i For this purpose we introduce the symbol aiE which states that atom ai belongs to j. Then, for each ai and each with 1 j max, weE add the following rules∈A to M : Figure 3: Network for S Ej ≤ ≤

j (1) aiE ai, j, block_ j Consider (1): By Lemma 6 and 10, sol is active only when j ← jE ¬ E (3) block_ i i ￿(2) aiE aiE ￿ ∪{ E ←E } a minimal explanation for is found and for each active ← E O lfp neuron Ai in the output layer of AF, = Ai. (1) en- + P∪E| (1) is fired as soon as j is activated, and records its ele- lfp E sures that Ai is active iff = Ai, and in particular Ej + lfp P∪E | ments ai . The additional condition block_ j ensures that Ai = iff 1 = Ai. Additionally, by (2), we remem- ¬ E ￿ + P∪E+ | we record the values only when the j solution was found. A j E ber that i j for some , thus we introduce a recurrent j ∈F+ (2) serves as a memory for each aE j. (3) blocks the cur- connection for A . i ∈E i rent explanation; thus, only values present at the moment j Besides remembering when an atom was present in a solu- E was detected are recorded. tion we need to detect if this atom does not occur in all other E Lemma 10. Consider , , , =lfp , observation and solutions. Atoms which follow credulously but not skepti- the components AF, C￿P, KA, PLICand| M.￿ At the momentO sol cally are labelled by A∗. Let Ai be the atom corresponding to + + + neuron is active, a minimal explanation for is found. an active neuron Ai . Ai if Ai j for some j. This E O ￿∈ S ￿∈ F Then for each active neuron A in the output layer of AF it is detected by (3) and the atom Ai∗ is activated: lfp holds: = A and, for each inactive neuron A in the + P∪E| lfp (3) A∗ A , A ,sol output layer of AF it holds = A. i ← i ¬ i P∪E| ¬ (4) Ai∗ Ai∗ Proof. From Lemma 3 we know that N computes T lfp. ￿ ← ￿ AF lfp From Lemmas 6 and 10 we conclude that each explanationP If for the current explanation j it holds j = Ai lfp E P∪E+ | ¬ detected by the signal sol is minimal. and, k = Ai for any k1 + = + S+ Proof. For is trivial. In case , j j j 1 The resulting network + j 1 + S+ F+ ∩ + − and since j 1 = 1− i it follows that j = j j 1 Using the CORE method we translate into a neural network + S − +∩ F S F ∩F − ∩ P j 2 1 , thus when the final solution k is found, NAF and we add recurrent connections from its output to its F − ∩···∩F lfp + + input layer. The resulting network computes T , given that k = . The proof is similar for −. S S S is a acceptable. P + + P In order to compute , we first need to construct 1 . The networks NC, NL, NK, NM, NS corresponding to the S + F components C, L, K, M, S, resp., are obtained by translat- This is achieved by introducing a copy Ai for each Ai + lfp ∈R ing the theory representing each component using the CORE such that Ai = iff 1 = Ai, with 1 . For ￿ P∪E | E ∈ME method. Each resulting sub-network has the property that each Ai we add the following clauses to S: ∈R it computes the associated T operator, thus the semantics + P (1) Ai Ai,sol detailed above are preserved. In the following, let N = + ← + (2) A A NAF, NC, NL, NK, NM, NS . ￿ i ← i ￿ ￿ ￿ + + Claim 2. Let = Ai Ai = , − = Ai Ai− = 34th Annual Conference of the Cognitive Science Soci- bad R { | ￿} R { | , and = A A∗ = . If N reaches a stable state, ety, 1500–1505. Cognitive Science Society, 2012. ￿} R { i | i ￿} then for all minimal explanations and for each atom [7] M. Fitting. Metric methods – three examples and a the- + bad E∈lfpME Ai ( ) we find = Ai; likewise, for each orem. J. of Logic Programming, 21(3):113–127, 1994. ∈ R \R bad P∪E| lfp atom A ( − ) we find = A . i ∈ R \R P∪E| ¬ i [8] S. Hölldobler and Y. Kalinke. Towards a new massively Claim 3. 
If N reaches a stable state, then + = + bad parallel computational model for logic programming. In bad S R \R and − = A A − . Proc. ECAI94 Workshop on Combining Symbolic and S {¬ | ∈R \R } Connectionist Processing, 68–77. ECCAI, 1994. Remarks Positive cycles in the framework AF can be problematic as [9] S. Hölldobler, Y. Kalinke, and H.-P. Störr. Approximat- they may introduce a memory in the network. To handle them ing the semantics of logic programs by recurrent neural properly the AF component has to be relaxed at each next networks. Applied Intelligence, 11:45–59, 1999. signal. This can be done by connecting the neuron next to [10] S. Hölldobler and C. D. P. Kencana Ramli. Contraction each neuron representing a rule in in the input layer of AF, properties of a semantic operator for human reasoning. and erase possible memories introducedP by such cycles. In L. Li and K. K. Yen, eds., Proc. Fifth International Conference on Information, 228–231. International In- 5 Conclusion formation Institute, Tokyo, 2009. We have developed a connectionist network for skeptical [11] S. Hölldobler and C. D. P. Kencana Ramli. Logic pro- reasoning based on the CORE method and the ideas pub- grams under three-valued Łukasiewicz’s semantics. In lished in [3] for credulous reasoning. The CORE method P. M. Hill and D. S. Warren, eds., Logic Programming, has already been extended to many-valued logics in [15; vol. 5649 of LNCS, 464–478. Springer, 2009. 21], but the skeptical reasoning part has not been considered [12] S. Hölldobler, T. Philipp, and C. Wernhard. An abduc- yet. We intend to extend our skeptical reasoning approach to tive model for human reasoning. In Proc. 10th Interna- three-valued Łukasiewicz logic to realize our ultimate goal as tional Symposium on Logical Formalizations of Com- outlined in the introduction, viz. to develop a connectionist monsense Reasoning, 2011. model for human reasoning tasks. [13] S. Hölldober. Logik und Logikprogrammierung: Grund- Although the AF always reaches a stable state, other compo- lagen. Synchron Publishers GmbH, Heidelberg, 2009. nents are represented by programs which are not acceptable. [ ] We need to provide the missing proofs for Claims 1, 2 and 3 14 A. C. Kakas, R. A. Kowalski, and F. Toni. Abduc- which can be built showing that the network N terminates, tive Logic Programming. J. Logic and Computation, i.e., that it reaches a least fixed point. 2(6):719–770, 1993. [15] Y. Kalinke. Ein massiv paralleles Berechnungsmodell References für normale logische Programme. Master’s thesis, TU Dresden, Fakultät Informatik, 1994. (in German). [1] K. Apt and M. van Emden. Contributions to the theory of logic programming. J. ACM, 29:841–862, 1982. [16] A.C. Kakas and P. Mancarella. Generalized Stable Mod- els: a Semantics for Abduction. In Proc. 9th European [2] S. Bader and S. Hölldobler. The Core method: Connec- Conference on Artificial Intelligence, 385–391. Pitman, tionist model generation. In S. Kollias, A. Stafylopatis, 1990. W. Duch, and E. Ojaet, eds., Proc. 16th International [ ] Foundations of Logic Programming Conference on Artificial Neural Networks (ICANN), vol. 17 J. W. Lloyd. . 4132 of LNCS, 1–13. Springer, 2006. Springer, 1984. [18] J.Łukasiewicz. O logice trójwartosciowej.´ Ruch [3] A. d’Avila Garcez, D. Gabbay, O. Ray, and J. Woods. Filozoficzny, 5:169–171, 1920. English: On Three- Abductive reasoning in neural-symbolic learning sys- Valued Logic. In: Jan Łukasiewicz Selected Works. tems. TOPOI, 26:37–49, 2007. (L. 
Borkowski, ed.), North Holland, 87-88, 1990. [ ] 4 E.-A. Dietz, S. Hölldobler, and M. Ragni. A computa- [19] L. Palacios. A connectionist model for skeptical abduc- tional logic approach to the abstract and the social case tion. Project thesis, TU Dresden, Fakultät Informatik. of the selection task. In Proc. 11th International Sympo- sium on Logical Formalizations of Commonsense Rea- [20] O. Ray and A. d’Avila Garcez. Towards the integra- soning, 2013. tion of abduction and induction in artificial neural net- works. In Proc. ECAI’06 Workshop on Neural- Sym- [5] E.-A. Dietz, S. Hölldobler, and L. M. Pereira. On in- bolic Learning and Reasoning, 41–46, 2006. dicative conditionals. In S. Hölldobler and Y. Liang, [ ] eds., Proc. First International Workshop on Semantic 21 A. Seda and M. Lane. Some aspects of the integration Technologies, vol. 1339 of CEUR Workshop Proc., 19– of connectionist and logic-based systems. In Proc. Third 30, 2015. http://ceur-ws.org/Vol-1339/. International Conference on Information, 297–300, In- ternational Information Institute, Tokyo, 2004. [6] E.-A. Dietz, S. Hölldobler, and M. Ragni. A com- [22] K. Stenning and M. van Lambalgen. Human Reasoning putational logic approach to the suppression task. In and Cognitive Science. MIT Press, 2008. N. Miyake, D. Peebles, and R. P. Cooper, eds., Proc. Towards an Artificially Intelligent System: Possibilities of General Evaluation of Hybrid Paradigm

Ondřej Vadinský Department of Information and Knowledge Engineering University of Economics, Prague, Czech Republic [email protected]

Abstract Taking into account various ways in which intelligence was delimited, especially Legg and Hutter’s [2007] Univer- The paper summarizes presumptions of intelligence sal Intelligence definition, an evaluation of AI and its meth- from the point of view of the artificial general in- ods seems feasible. In Section 4, several possibilities of this telligence and with focus on the hybrid paradigm. evaluation will be outlined. These ideas will focus on AGI- Particularly, the paper concerns with possible ways point-of-view analysis of a hybrid system in particular (4.1), in which Universal Intelligence definition and the the hybrid approach (4.2), and the paradigms of AI in general derived Algorithmic Intelligence Quotient test can (4.3). be used to evaluate a hybrid system, the hybrid paradigm and the paradigms of artificial intelli- gence. A preliminary search suggests that Algorith- 2 Artificial and Natural Intelligence mic Intelligence Quotient test can be used for prac- To grasp intelligence, philosophical reflection will be called tical evaluation of systems on condition that their to help (2.1). An outline of the knowledge of cognitive sci- interfacing is technically solved, while the Univer- ence as well as some examples of ongoing research of cog- sal Intelligence definition can be used for formal nitive architectures will be given to illustrate the need for analysis in case a suitable formal description of the general-oriented multi-disciplinary approach (2.2). Further, system or paradigm is devised. The long-standing the strong – weak and general – specific AI distinctions will issue of evaluating general artificial intelligence al- be summarized (2.3). Finally, attempts on defining AI will be beit crucial has been neglected. However, it can described (2.4). help focusing research efforts, as well as answering some questions regarding the intelligence itself. 2.1 Philosophical Presumptions of Intelligence So far the only true intelligence known to us is human intel- 1 Introduction ligence. Therefore, any effort to build AI should somehow Recently, the original question of artificial intelligence (AI): relate to human intelligence, although this does not neces- ”What is intelligence?” has again been brought into focus, sarily call for its duplication. Of course, research on animal cf. [Turing, 1950; Legg and Hutter, 2007]. It seems, that behavior and of AI itself, can also contribute. This way pre- the question is, indeed, crucial for reaching the original goal sumptions of intelligence, that is properties essentially tied of AI by some dubbed as strong AI or in more current with intelligence, can be identified to be later used as a guid- terms artificial general intelligence (AGI), cf. [Searle, 1980; ance to build an AI. Goertzel, 2014]. As pointed out by Vadinsky´ [2013] and de- Philosophical reflections of what is intelligence and tailed later in [2014], it is a multidisciplinary endeavor reach- thought can be traced back at least to Descartes [1637], who ing to various areas of philosophy and other cognitive sci- notices universality of thought and its connection to lan- ences, not only a limited AI problem. This is also supported guage and rational speech. Similarity to the well-known by ongoing research of cognitive architectures, see e.g. [Sun, Turing test [1950], which was at the beginning of AI as a 2007]. An outline of these issues will be given in Section 2. field of study, can be easily seen. 
Expanded tests of intelli- Throughout the history of AI, several paradigms emerged: gence were later proposed e.g. by Harnad [1991], who fo- symbolism, connectionism, hybrid approach and situated cuses on full repertoire of human intelligent behavior, and cognition, suppling the field with various approaches to reach Schweizer [2012], who accentuates the ability of a species the goal, see e.g. [Sun, 2001; Walter, 2010]. Of these to evolve its intelligent behavior. Searle [1980] in his Chi- paradigms the paper concerns mainly the hybrid approach nese Room Argument notices the connection of intelligence since it synergically combines both symbolism and connec- with meaning, understanding and intentionality. However, tionism. Certain aspects of situated cognition are also dis- presumptions of intelligence can be understood even more cussed. A brief summary of the paradigms of AI will be given widely, e.g. to the level of consciousness, as discussed in Section 3. in [Dennett, 1993]. 2.2 Cognitive Presumptions of Intelligence The field of psychology called psychometrics deals with a As de Mey [1992] points out, requires systematical measurement of psychological properties (espe- a system to have a world representation in some kind of cially intelligence) in humans using various tests. According [ ] model. However, de Mey also stresses the ways in which to Bringsjord and Schimanski, 2003 it gives an answer to the model interacts with the world through perception and the question, what is intelligence, therefore AI should be un- action. A stratified model of perception shows, how the ex- derstood as PAI and should focus on ”building information- pectations of a system based on its world model contribute to processing entities capable of at least solid performance on all its perception of some object and how this propagates back established, validated tests of intelligence and mental ability.” to the model. The way in which the model is updated fol- This results in an iterative approach of integrating the ability lows according to de Mey two steps: 1) implicit knowledge is of solving yet another test into the entity in question – which formed, 2) further interaction with the object makes the im- is useful from the engineering perspective. plicit knowledge explicit. However as a testing approach, PAI is somewhat imprac- To explain cognition, Sun [2007] proposes to use an inte- tical, since it deals with all established and validated tests grated hierarchical approach. Various sciences encounter the which is an open set. PAI also does not explicitly concern cognition on different levels of abstraction: social, psycho- the issue of defining intelligence and leaves it to psychology. logical, componential and physiological. This creates sets of That may be considered a benefit by some, however ques- constrains both from the upper levels, as well as from the tions can be raised, whether psychological definitions of hu- lower levels of abstraction. Using this approach, cognitive man intelligence implicitly present in the tests can be directly architectures – that is domain-generic computational models applied to an artificial system. A more general approach sup- capturing essential structures and processes of the – can ported by multidisciplinary interaction may be needed. The [ et al. ] be build. Those are useful both for understanding the cogni- limitations of PAI are also considered by Besold , 2015 tion as well as for building AI systems. 
Examples of cogni- stating several arguments questioning the necessity and suf- tive architectures include: modular neural-symbolic architec- ficiency conditions of directly using human intelligence tests ture CLARION [Sun, 2007], modular declarative-procedural to measure machine intelligence. This also calls for general- architecture ACT-R [Anderson et al., 2004], or recursive al- ization and improvement of tests used by PAI. [ ] gorithmic interpolating model Ouroboros [Thomsen, 2013]. As argued by Legg and Hutter, 2007 , to achieve the ulti- mate goal of AI, intelligence has to be precisely and formally 2.3 Strong, Weak, Specific and General AI defined. Ideally, the definition should allow for some sort of comparison or measurement to serve as a guide for reach- The ultimate goal of AI can be understood differently. To il- ing the goal. To create such a definition, Legg and Hutter lustrate it, a strong – weak distinction made by Searle [1980] studied a broad variety of definitions, theories, and tests of can be adopted. Originally, Searle based his distinction on human, animal, and artificial intelligence. Noting their com- whether AI is the explanation of mind (Strong AI), or only mon essential features, Legg and Hutter derived a following a tool for modeling and simulating mind (Weak AI). He informal definition: ”Intelligence measures an agent’s ability also identifies the strong AI with traditional symbolic ap- to achieve goals in a wide range of environments.” The for- proach. However, the distinction can be interpreted more malization of the definition dubbed as Universal Intelligence freely: Strong AI would be such an AI system, that it would is shown by Figure 1. be comparably powerful to human intelligence, bearing in mind its universality. On the other hand, Weak AI would not Υ(π):= 2 K(µ)V π be universal, it would only provide humans with useful tools − µ µ E to solve certain problems. ￿∈ This interpretation of Searle’s strong – weak AI distinction is similar to more recent general – specific AI distinction, as Figure 1: Universal Intelligence Υ of agent π is given by [ ] agent’s ability to achieve goals as described by a value func- given e.g. in Goertzel, 2014 . Specific AI offers programs π capable of solving a specific task, or perhaps a limited set of tion Vµ as an expected sum of all future rewards over a set of tasks, no matter how the task itself is sophisticated. General environments µ weighted by Kolmogorov complexity func- AI strives for programs capable of general problem solving, tion K. For more detailed explanation see [Legg and Hutter, or general intelligent action. Such an AI would be able to 2007]. solve a broad set of tasks, ranging from simple to complex. Looking at the formal definition by [Legg and Hutter, 2.4 Defining Artificial and Natural Intelligence 2007] at Figure 1, following important building blocks can be noted: When dealing with AI, a question: ”What is intelligence?” The definition considers environment µ, agent π and naturally comes to mind. The question was already noted by • Turing [1950], however, he sidestepped it. Only recently, the their iterated interaction through actions ai of the agent, question came back into research focus. The issue can be ap- its perceptions (observations) oi and rewards ri originat- proached from different angles: 1) focusing more on the test- ing in the environment. 
The environment is described as ing aspect is so called psychometric AI (PAI) of [Bringsjord a computable probability measure µ of perceptions and and Schimanski, 2003], 2) stressing the need for a definition rewards given current interaction history. is the work of [Legg and Hutter, 2007]. With all computable environments considered and many • agent’s hypotheses of the current environment in the cur- With only N environment programs considered, simple rent iteration existing, Kolmogorov complexity K(µ) is • average is taken. However, the notion of Occam’s ra- K(µ) used as a measure in place for Occam’s razor 2− . If zor is kept in the way environment programs are sam- a short program can be used to describe the probability pled since Solomonoff’s Universal Distribution is used: l(p) measure of an environment (ie. as a hypothesis about M (x):= 2− . Therefore, a shorter pro- U p: (p)=x the environment), than the environment has a low com- gram has a higherU probability∗ of being used, but all pro- plexity, since Kolmogorov complexity is based on the grams are considered,￿ not only the shortest as is the case length of the shortest program describing a sequence of with Kolmogorov complexity. bits. Therefore, complex environments (hypotheses) are Empirical value function Vˆ π is used since only a limited less influential on the agent’s overall performance, than • pi the simple ones. However, even complex environments number of iterations is tried. Also, the rewards given (hypotheses) are considered if consistent with history of by the environment program are no longer bound by 1 interactions. as is the case with the definition, nor are in any way discounted to specify the temporal preference on this Agent’s ability to achieve goals is described by a value • π level. That is, total reward returned from a single trial function: V := E ( ∞ r ) 1 basically as maximiz- µ i=1 i ≤ of agent – environment interaction is used. ing the expected future rewards ri given past interaction with the environment.￿ Temporal preference is present in The AIQ test proposed by [Legg and Veness, 2013] enables the way, how rewards are distributed by the environment. testing of agents supplied internally by the prototype imple- That is, the task or environment itself decides whether a mentation or externally via a custom wrapper as is the case slow-learning but more accurate or fast-learning but in- with the AIXI approximation. The test is configurable namely accurate solution is better. by setting size of the environment programs sample, number of iterations for agent – environment interaction, sizes of ob- The definition proposed by Legg and Hutter [2007] enables servation and action space, or computation limit per iteration. ordering of performance of various agents, ranging from ran- The test uses a simple reference (a modified domly behaving to theoretically optimally intelligent agent BF reference machine). On the choice of reference machine, AIXI. As environments are weighted by their complexity however, the results are dependent. and all Turing-computable environments are considered, to achieve a high level of Universal Intelligence, a true gener- ality of the agent is required. As the definition draws from 3 Paradigms of Artificial Intelligence fundamental concepts of computation, information, and com- plexity, it is not culturally biased or anthropocentric. The As the AI field evolved, several paradigms appeared. 
Sym- definition is by itself not a test, however it has a potential bolism (3.1) draws its inspiration from logic and abstract to found a basis for some approximative test of intelligence. thought, while connectionism (3.2) is inspired by biologi- The potential was examined by [Legg and Veness, 2013] cal neural networks. Realizing their complementarity, hybrid proposing an approximation of Universal Intelligence mea- paradigm (3.3) tries to reach a synergic combination. Situ- sure called Algorithmic Intelligence Quotient (AIQ) as shown ated cognition (3.4) is inspired by phenomenology and biol- in Figure 2. Legg and Veness took many steps to transform ogy and accents interaction of an intelligent system with its the original definition into a practically feasible test resulting environment. in an open source prototype implementation. 3.1 Symbolic Paradigm N Drawing its inspiration from the very beginnings of com- ˆ 1 ˆ π Υ(π):= Vp puter science, symbolic paradigm is tightly connected to the N i i=1 ￿ idea of universal computation of Turing machine and ensuing computational theory of mind and functionalism, cf. [Tur- Figure 2: AIQ estimate of Universal Intelligence Υˆ of agent ing, 1936; Dennett, 1993]. Coined as Physical Symbol π is given by agent’s ability to achieve goals as described by System by [Newell and Simon, 1976], the paradigm deals an empirical value function Vˆ π as total reward returned from with an explicit representation of knowledge in the form of pi symbols structured into physical patterns by an external ob- a single trial of an environment program pi averaged over N randomly sampled environment programs. For more detailed server. However, the way in which symbols gain their mean- [ explanation see [Legg and Veness, 2013]. ing, known as the Grounding Problem, is crucial Harnad, 1990]. Several solutions have been proposed, which either deal with more symbols of different kinds, as e.g. Rapa- [ Looking at the formula of AIQ test by Legg and Veness, port’s [1995] Syntactic Semantics, and recent semantic ap- ] 2013 at Figure 2, following differences from the Universal proaches such as semantic web [Gray, 2014], or robotic in- Intelligence definition can be noted: teraction with the world, as Harand’s original solution, or The test considers a finite sample of N environment pro- even result in adopting another paradigm. Nevertheless, sym- • grams pi, agent π and their interaction. Same environ- bolic paradigm brings good results when algorithm is known, ment can be described by several programs and same knowledge are explicitly represented and the data processing program can be included in the sample many times. is mostly sequential. 3.2 Connectionist Paradigm the AIQ test. A system could be formally analyzed using the Inspired by massive parallelism of human brain, connection- Universal Intelligence definition. Let us now consider what ism is based on relatively simple computational units con- such possibilities require and what results can be achieved. nected to a complex network in which higher ability emerges 1. A choice of a suitable hybrid system: [ in accordance with emergentist theory of mind, cf Church- A broad search of existing hybrid systems should ] land and Churchland, 1990; Sun, 2001 . Starting with Mc- • be conveyed and some criteria defined. 
Culloch and Pitts’[1943] artificial neuron complemented by Obviously, to test a system using the AIQ test, the Hebb’s [1949] unsupervised learning, the paradigm evolved • more complex models such as Rosenblatt’s [1958] system should be implemented. For a formal analysis using the Universal Intelli- with supervised back-propagation learning of [McClelland et • al., 1986]. Connectionist paradigm is suitable for tasks where gence definition it should suffice that the system is knowledge is implicit and learning is needed providing paral- formally defined. lel and robust solution. 2. AIQ test of the chosen hybrid system: Technical issues can be expected when connecting 3.3 Hybrid Paradigm • the system with the test. Here, an open source sys- Realizing the complementarity of both symbolic as well as tem would be beneficial, as modifications necessary connectionist paradigm, it seems only natural to try a hybrid for the test could be realized. As for closed source approach. Inspired by dual-process approaches to theory of systems, it may be feasible to develop some kind of mind, hybrid paradigm seeks synergy in dual representation a wrapper, if an interface is well defined. The test of knowledge, e.g. [Sun et al., 2005]. The architecture of itself is open source, therefore some modifications hybrid systems is an important consideration: usually it is needed by a system can be feasible. some version of a heterogeneous modular system, however Obviously, the results will enable a comparison of several variants of tight or loose coupling of connectionist and • the evaluated system with other tested systems. So [ ] symbolic modules exist Sun, 2001 . There are two ways in far however, only simple agents and AIXI approxi- which learning of a hybrid system is realized: a bottom – up mations were tested to the best of my knowledge. approach is basically of explicit knowledge from The results will allow for incremental testing of im- implicit layer, while a top – down approach concerns a grad- • ual descend of explicit knowledge in an assimilative way into proved versions of the system, however it is un- the implicit layer [Sun, 2007]. clear, if they can shed some light on what to im- prove specifically. 3.4 Situated Cognition 3. Formal analysis of the chosen hybrid system: Rooted in French phenomenology of Merleau-Ponty [1942; As the definition focuses on the interaction of the 1945] and inspired by Maturana and Varela’s [1979] autopoi- • agent with environments, a formalization of the etic approach to biology, the paradigm of situated cognition system focused on the interaction is needed. It is stresses the role of physical body and interaction through per- an open question, if there are hybrid systems for- ception and action with the environment creating a species- mally described in such a way. specific Umwelt. Manifested mainly in Brooks’ [1991] re- If there are systems formalized structurally, a feasi- active approach to robotics, the situated cognition suppresses • bility of a transformation to the needed form is the the role of representation to a varying degree. An explana- issue. As the definition is constructed so that it is tion of different schools in situated cognition, i.e. embodied, applicable on the widest possible range of agents, embedded, extended, and enacted cognition is given e.g. by it tries not to constrain agent’s structure, however, Walter [2010]. 
some structures may be better than others in the terms of resulting behavior, so perhaps this is also 4 AGI Evaluation and Hybrid Paradigm worth of consideration. Having outlined presumptions of intelligence, a formal defi- The analysis can be also done less rigorously. Con- • nition, and a related practical test of AI, several research pos- sidering what sorts of tasks (environments) can the sibilities concerning evaluation of AI paradigms and systems system succeed in and how complex the tasks are, arise. The paper will mainly discuss ways of evaluating a can yield some insight of its Universal Intelligence. chosen hybrid system (4.1), and a hybrid paradigm in gen- This can be further specified if some estimates can eral (4.2). Alternatives of evaluating AI paradigms will be be made regarding the agent’s ability of keeping also mentioned (4.3), since a hypothesis is held, that hybrid history of its interactions with the environment and approach is more suitable for reaching AGI. making predictions for the future. The results will, obviously, allow for ordering the 4.1 Possibilities of Evaluating Presumptions of • systems according to their Universal Intelligence Intelligence of a Hybrid System and should help comparing different design im- First, a suitable hybrid system should be chosen. Then, there provements. However, this will strongly depend on seem to be two ways of evaluating presumptions of intelli- the precision and suitability of formalization of the gence of a hybrid system: A system could be tested using analyzed system, as discussed above. 4.2 Possibilities of Evaluating Presumptions of Similar considerations apply for the results as for • Intelligence of Hybrid Paradigm the previous case. Also, a comparison with the re- The evaluation of presumptions of intelligence can be brought sults of the induction can tell us how do the cur- to a more general level – the hybrid paradigm itself. Here, an rent hybrid systems fulfill the potential of the hy- inductive approach generalizing results of hybrid systems in brid paradigm. This also applies for a comparison the AIQ test and in the formal analysis according to the Uni- with an individual system. versal Intelligence definition could be practicable. Also, it may be possible to use a deductive approach based on the 4.3 Possibilities of Evaluating Presumptions of Universal Intelligence definition and a general formal de- Intelligence of the Paradigms of AI scription of hybrid systems. Let us now consider require- Taking the idea of evaluation one step further, it should be ments and possible results of such approaches possible to evaluate existing paradigms of AI. Again, an in- ductive approach based on AIQ tests and formal analyses of 1. Induction from AIQ test results and formal analyses: systems developed according the paradigms could be taken. An obvious condition is the feasibility of a test and Also, a deductive approach based on formal descriptions of • an analysis of a hybrid system as discussed in pre- AI paradigms and Universal Intelligence definition could be vious section. entertained. Let us now briefly consider requirements and For an inductive generalization tests of several hy- possible results of such an endeavor. • brid systems should be performed. Here, an issue 1. Induction from AIQ test results and formal analyses: of induction arises: Is it necessary to test all exist- As the approach is basically the same as in the pre- ing hybrid systems for the induction to be relevant? 
4.3 Possibilities of Evaluating Presumptions of Intelligence of the Paradigms of AI
Taking the idea of evaluation one step further, it should be possible to evaluate existing paradigms of AI. Again, an inductive approach based on AIQ tests and formal analyses of systems developed according to the paradigms could be taken. Also, a deductive approach based on formal descriptions of AI paradigms and the Universal Intelligence definition could be entertained. Let us now briefly consider the requirements and possible results of such an endeavor.

1. Induction from AIQ test results and formal analyses:
• As the approach is basically the same as in the previous section, only applied to a broader set of systems and paradigms, the requirements and considerations mentioned there also apply.
• However, the issue of comparability of results is even more significant. Due to the way the Universal Intelligence definition and the derived AIQ test were constructed, the comparability issue should be taken care of. That is, as long as it is possible to test and analyze the systems from the point of view of their interaction with different environments, their different structure will not matter in any other way than in how it contributes to their performance.
• The results can be used to compare existing paradigms of AI to each other (a schematic sketch of such a comparison is given after this list). A hypothesis is held that the hybrid paradigm is more suitable for reaching AGI. This is based on the fact that it synergically combines the symbolic and connectionist paradigms, as well as on some philosophical considerations regarding the nature of the mind. That itself, obviously, cannot validate the hypothesis. An evaluation on this level, however, should make the validation possible.
• The considered level of evaluation is high enough to contribute to assessing the correctness of the very notion of functionalism. As, according to functionalism, the realization of the function does not matter, then, if functionalism is correct, the results should be generally the same for all the paradigms, with some caveats.
• There is the recurrent issue of determining whether the tested system represents its paradigm well, or whether it is just poorly implemented, and whether this could skew the results unfavorably for a certain paradigm. Also, if the results were not reasonably comparable, it would not disprove functionalism, as it may be that just some paradigm is poor.
• The results could also contribute to the question of the attainability of AGI, on condition that the concept is more precisely specified with respect to AIQ or the Universal Intelligence. A possible way may be to use fuzzy intervals over AIQ values.

2. Deduction from formal description of AI paradigms:
• The considerations for deduction mentioned in the previous section also apply.
• Also, the comparability of results, ensured mainly by the Universal Intelligence definition, is more significant at this level.
• The considered deductive approach can contribute as well to the validation of the hypothesis of the suitability of the hybrid paradigm for AGI.
• The deduction can as well be instrumental in supporting functionalism. Compared to the inductive approach, there would be a benefit in not needing to deal with whether a system is implemented well or not.
• Similarly, the attainability of AGI could be somewhat enlightened.
• Having both the results of the inductive approach as well as the deductive approach, their comparison can further clarify and support the findings in the above mentioned areas.
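To illustrate the inductive route at this level, the following is a schematic, hypothetical sketch of the kind of Monte Carlo estimate the AIQ test of Legg and Veness [2013] performs, applied to several systems at once: environment programs are sampled from a reference machine, each agent is run in them, and the scaled rewards are averaged. The function names (sample_environment, run_episode) and the simple averaging are illustrative assumptions, not the actual AIQ implementation.

    def estimate_aiq(agent, sample_environment, run_episode, n_samples=1000):
        """Schematic Monte Carlo estimate of an AIQ-style score.

        sample_environment() is assumed to draw a random environment
        program from the reference machine's program distribution, which
        is assumed to already reflect the complexity weighting of the
        definition; run_episode(agent, env) is assumed to return the
        agent's total (suitably scaled) reward in that environment.
        """
        total = 0.0
        for _ in range(n_samples):
            total += run_episode(agent, sample_environment())
        return total / n_samples

    def compare_paradigms(systems_by_paradigm, sample_environment, run_episode):
        # Average the estimates over the sampled systems of each paradigm;
        # this is the inductive generalization discussed above, with all of
        # its caveats about how representative the sample of systems is.
        return {
            paradigm: sum(estimate_aiq(s, sample_environment, run_episode)
                          for s in systems) / len(systems)
            for paradigm, systems in systems_by_paradigm.items()
        }

Confidence intervals over the sampled environments (and over the sampled systems) would be needed before reading anything into the ordering of the resulting scores.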
5 Conclusion and Future Work
The paper summarized philosophical and cognitive presumptions of intelligence in order to better grasp what artificial intelligence should strive for. The strong – weak and general – specific AI distinctions were described to illustrate the opposing interpretations of the main goal of AI. Special attention was given to recent achievements in defining intelligence. The Universal Intelligence definition of Legg and Hutter [2007] was presented, as well as its approximation, the AIQ test by Legg and Veness [2013].

Further, the paper described existing paradigms of AI. Beginning with traditional symbolism and biologically inspired connectionism, the paper especially noted the hybrid approach, since it focuses on finding a synergic combination of the two paradigms. Approaches of situated cognition, which stress the role of the body and of the perception – action interaction of an agent with its environment, were also noted.

Finally, the paper discussed possibilities of evaluating AI from the AGI point of view on different levels of abstraction. Basically, the Universal Intelligence definition can be used for a formal analysis of presumptions of intelligence, while the AIQ test can be used for practical testing. A specific system can be evaluated, or the focus can be more general, on a specific paradigm or even on all paradigms. Using the AIQ test calls for an inductive approach with induction-specific limitations, while using the Universal Intelligence definition requires a deductive formal analysis based on a formal specification of a system or a paradigm. While the results of the AIQ test can be used to compare systems and paradigms, and also to compare incremental versions of them, the results cannot easily identify ways in which an extension of a system should be attempted. If an evaluation of all paradigms is undertaken, it can provide some clues for the validity of functionalism; however, it cannot disprove it. Also, this level of evaluation can validate the hypothesis that the hybrid paradigm is more suitable for AGI than other approaches. All in all, it seems that focusing on the evaluation of AI systems and paradigms can be worth further research.

That said, there is future work to be done in several areas, ranging from practical to theoretical. Practical issues include identifying ways of choosing AI systems suitable for AIQ testing and specifying a set of criteria for the choice. Also, technical means of interfacing the system with the test should be found. On the theoretical side, the research should focus on devising interaction-based formal descriptions of systems so that they could be analyzed using the Universal Intelligence definition. Attention should also be given to finding ways of transforming existing structural formalisms into interaction-based ones. Generalizations of such formalisms should be looked for to facilitate the analysis on the level of AI paradigms.
References
[Anderson et al., 2004] John R. Anderson, Daniel Bothell, Michael D. Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. An integrated theory of the mind. Psychological Review, 111(4):1036–1060, Oct 2004.
[Besold et al., 2015] Tarek Besold, José Hernández-Orallo, and Ute Schmid. Can machine intelligence be measured in the same way as human intelligence? KI - Künstliche Intelligenz, pages 1–7, 2015.
[Bringsjord and Schimanski, 2003] Selmer Bringsjord and Bettina Schimanski. What is artificial intelligence? Psychometric AI as an answer. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI'03), pages 887–893. Citeseer, 2003.
[Brooks, 1991] Rodney A. Brooks. Intelligence without representation. Artificial Intelligence, 47(1):139–159, 1991.
[Churchland and Churchland, 1990] Paul M. Churchland and Patricia Smith Churchland. Could a machine think? Classical AI is unlikely to yield conscious machines; systems that mimic the brain might. Scientific American, 262(1):32–37, 1990.
[de Mey, 1992] Mark de Mey. The Cognitive Paradigm. University of Chicago Press, Chicago and London, 1992.
[Dennett, 1993] Daniel Clement Dennett. Consciousness Explained. Penguin Books, 1st edition, 1993.
[Descartes, 1637] René Descartes. A Discourse on Method. Oxford University Press, Oxford, 1992 (1637).
[Goertzel, 2014] Ben Goertzel. Artificial general intelligence: Concept, state of the art, and future prospects. Journal of Artificial General Intelligence, 5(1):1–48, 2014.
[Gray, 2014] Norman Gray. RDF, the Semantic Web, Jordan, Jordan and Jordan. 2014.
[Harnad, 1990] Stevan Harnad. The symbol grounding problem. Physica D, pages 335–346, June 1990.
[Harnad, 1991] Stevan Harnad. Other bodies, other minds: A machine incarnation of an old philosophical problem. Minds and Machines, 1(1):43–54, 1991.
[Hebb, 1949] Donald Olding Hebb. The Organization of Behavior. Wiley, New York, 1949.
[Legg and Hutter, 2007] Shane Legg and Marcus Hutter. Universal intelligence: A definition of machine intelligence. Minds and Machines, 17(4):391–444, Dec 2007.
[Legg and Veness, 2013] Shane Legg and Joel Veness. An approximation of the universal intelligence measure. In Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence, pages 236–249. Springer, 2013.
[Maturana and Varela, 1979] Humberto R. Maturana and Francisco J. Varela. Autopoiesis and Cognition. Boston Studies in the Philosophy of Science, vol. 42. D. Reidel Pub. Co., 1979.
[McClelland et al., 1986] James L. McClelland, David E. Rumelhart, PDP Research Group, et al. Parallel Distributed Processing. Explorations in the Microstructure of Cognition, volume 2, 1986.
[McCulloch and Pitts, 1943] Warren Sturgis McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5:115, 1943.
[Merleau-Ponty, 1942] Maurice Merleau-Ponty. The Structure of Behaviour. Beacon Press, Boston, 1963 (1942).
[Merleau-Ponty, 1945] Maurice Merleau-Ponty. Phenomenology of Perception. Humanities Press, New York, 1962 (1945).
[Newell and Simon, 1976] Allen Newell and Herbert A. Simon. Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19(3):113–126, 1976.
[Rapaport, 1995] William J. Rapaport. Understanding understanding: Syntactic semantics and computational cognition. Philosophical Perspectives, 9:49–88, 1995.
[Rosenblatt, 1958] Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.
[Schweizer, 2012] Paul Schweizer. The externalist foundations of a truly total Turing test. Minds and Machines, 22(3):191–212, Aug 2012.
[Searle, 1980] John Rogers Searle. Minds, brains, and programs. Behavioral and Brain Sciences, 3(3):417–457, 1980.
[Sun et al., 2005] Ron Sun, Paul Slusarz, and Chris Terry. The interaction of the explicit and the implicit in skill learning: A dual-process approach. Psychological Review, 112(1):159–192, Jan 2005.
[Sun, 2001] Ron Sun. Artificial intelligence: Connectionist and symbolic approaches, pages 783–789. Pergamon/Elsevier, Oxford, 2001.
[Sun, 2007] Ron Sun. The importance of cognitive architectures: An analysis based on CLARION. Journal of Experimental & Theoretical Artificial Intelligence, 19(2):159–193, 2007.
[Thomsen, 2013] Knud Thomsen. The cerebellum in the Ouroboros model, the "interpolator hypothesis". In Shunji Shimizu and Terry Bossomaier, editors, COGNITIVE 2013, The Fifth International Conference on Advanced Cognitive Technologies and Applications, pages 37–41, Valencia, Spain, 2013. IARIA.
[Turing, 1936] Alan Mathison Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(42):230–265, 1936.
[Turing, 1950] Alan Mathison Turing. Computing machinery and intelligence. Mind, 59(236):433–460, 1950.
[Vadinský, 2013] Ondřej Vadinský. Towards an artificially intelligent system: Philosophical and cognitive presumptions of hybrid systems. In Shunji Shimizu and Terry Bossomaier, editors, COGNITIVE 2013, The Fifth International Conference on Advanced Cognitive Technologies and Applications, pages 97–101, Valencia, Spain, 2013. IARIA.
[Vadinský, 2014] Ondřej Vadinský. Na cestě k uměle inteligentním systémům: Intencionalita a percepčně-akční spirála v hybridních systémech, pages 217–222. Slezská univerzita, Opava, 1st edition, 2014.
[Walter, 2010] Sven Walter. Locked-in syndrome, BCI, and a confusion about embodied, embedded, extended, and enacted cognition. Neuroethics, (3):61–72, 2010.