Modeling Changing Perspectives — Reconceptualizing Sensorimotor Experiences: Papers from the 2014 AAAI Fall Symposium

Integration of Inference and Machine Learning as a Tool for Creative Reasoning

Bartłomiej Śnieżyński
AGH University of Science and Technology
al. Mickiewicza 30, 30-059 Krakow, Poland
e-mail: [email protected]

Abstract

In this paper a method to integrate inference and machine learning is proposed. Execution of a learning algorithm is defined as a complex inference rule, which generates intrinsically new knowledge. Such a solution makes the reasoning process more creative and allows the agent's experiences to be re-conceptualized depending on the context. The knowledge representation used in the model is based on the Logic of Plausible Reasoning (LPR). Three groups of knowledge transmutations are defined: search transmutations that look for information in data, inference transmutations that are formalized as LPR proof rules, and complex ones that can use machine learning algorithms or knowledge representation change operators. All groups can be used by the inference engine in a similar manner. In the paper an appropriate system model and inference algorithm are proposed. Additionally, preliminary experimental results are presented.

Introduction

Traditional reasoning techniques applied in AI offer a convergent interpretation of the stored knowledge, which does not provide new knowledge. Machine learning techniques may be creative and provide diversity, but they are not integrated with the inference process. In this paper a method to integrate these two approaches is proposed. Execution of a learning algorithm is defined as a complex inference rule, which produces new knowledge. Such a solution allows the agent's experiences to be re-conceptualized depending on the context. It is possible to change the perspective in which the stored data is analyzed, and intrinsically new knowledge is generated.

The solution proposed is formulated as a Multistrategy Inference and Learning System (MILS). The idea is based on the Inferential Theory of Learning (Michalski 1994). In this approach, learning and inference can be presented as a goal-guided exploration of the knowledge space using operators called knowledge transmutations. As a base for knowledge representation the Logic of Plausible Reasoning (LPR) (Collins and Michalski 1989) is used. However, it is possible to apply this approach using other knowledge representation techniques that are based on logic.

MILS combines many knowledge manipulation techniques during reasoning. It is able to use background knowledge, simple proof rules (such as generalization or specialization) or complex patterns (machine learning algorithms) to produce information that was not stored explicitly in the knowledge base.

In the following sections, related research is discussed and the MILS model and inference algorithm are presented. Next, preliminary experimental results are described: the knowledge base and three use cases.

Related research

LPR was proposed by Alan Collins and Richard Michalski, who in 1989 published their article entitled "The Logic of Plausible Reasoning: A Core Theory" (Collins and Michalski 1989). The aim of this study was to identify patterns of reasoning used by humans and to create a formal system able to represent these patterns. The basic operations performed on the knowledge defined in LPR include:

abduction and deduction – used to explain and predict the characteristics of objects based on domain knowledge;

generalisation and specialisation – allow for generalising and refining information by changing the set of objects to which this information relates to a larger or smaller set;

abstraction and concretisation – change the level of detail in the description of objects;

similarity and contrast – allow inference by analogy or from the lack of similarity between objects.

The experimental results confirming that the methods of reasoning used by humans can be represented in LPR are presented in subsequent papers (Boehm-Davis, Dontas, and Michalski 1990a; 1990b). The objective set by its creators has made LPR significantly different from other known knowledge representation methods, such as classical logic, fuzzy logic, multi-valued logic, Dempster-Shafer theory, probabilistic logic, Bayesian networks, semantic networks, ontologies, rough sets, or default logic. Firstly, there are many inference rules in LPR which are not present in the formalisms mentioned above. Secondly, many parameters are specified for representing the uncertainty of knowledge.

On the basis of LPR, the DIH (Dynamically Interlaced Hierarchies) formalism was developed (Hieb and Michalski 1993b; 1993a). Knowledge consists of a static part represented by hierarchies and a dynamic part consisting of traces, which play a role similar to statements in LPR. DIH distinguishes three types of hierarchies: types, components and priorities. The latter type can be divided into subclasses: hierarchies of measures (used to represent physical quantities), hierarchies of quantification (allowing quantifiers such as one, most, or all to be included in traces) and hierarchies of schemes (used as a means for defining multi-argument relationships and needed to interpret the traces).

ITL was formulated just after the development of DIH (Michalski 1994). Michalski et al. also developed an ITL implementation – the INTERLACE system (Alkharouf and Michalski 1996). This system is based on DIH and can generate sequences of knowledge operations that enable the derivation of a target trace from the input hierarchies and traces. Yet, not all kinds of hierarchies, probabilities and factors describing the uncertainty of the information were included there. Also, rule induction was not taken into account.

Outline of the logic of plausible reasoning

MILS is based on LPR. It is formalized as a labeled deductive system (LDS) (Gabbay 1991). If needed, another knowledge representation that can be formulated using LDS may be used instead of LPR.

The language consists of a finite set of constant symbols C, five relational symbols and the logical connectives →, ∧. The relational symbols are V, H, B, S, E. They are used to represent statements (V), hierarchy (H, B), similarity (S) and dependency (E).

Statements are represented as object-attribute-value triples: V(o, a, v), where o, a, v ∈ C. It is a representation of the fact that object o has an attribute a equal to v. If object o has several values of a, there should be several appropriate statements in the knowledge base. To represent vagueness of knowledge it is possible to extend this definition and allow a composite value [v1, v2, ..., vn], a list of elements of C. It can be interpreted as: object o has an attribute a equal to v1 or v2, ..., or vn.

The relation H(o1, o, c), where o1, o, c ∈ C, means that o1 is a kind of o in context c. Context is used to specify the range of inheritance: o1 and o have the same value for all attributes which depend on attribute c of object o.

To show that one object is below another in any hierarchy, the relation B(o1, o), where o1, o ∈ C, should be used.

The relation S(o1, o2, c) represents the fact that o1 is similar to o2 (o1, o2, c ∈ C). Context, as above, specifies the range of similarity: only those attributes of o1 and o2 which depend on c have the same values.

The dependency relation E(o1, a1, o2, a2), where o1, a1, o2, a2 ∈ C, means that values of attribute a1 of object o1 depend on attribute a2 of the second object (o2).

In object-attribute-value triples, a value should be placed below its attribute in a hierarchy: if V(o, a, [v1, v2, ..., vn]) is in a knowledge base, there should also be H(vi, a, c) for every 1 ≤ i ≤ n, c ∈ C.

Using the relational symbols, formulas of LPR can be defined. If o, o′, o1, ..., on, a, a1, ..., an, v, c ∈ C and v1, ..., vn are lists of elements of C, then V(o, a, v), H(o1, o, c), B(o1, o), S(o1, o2, c), E(o1, a1, o2, a2) and V(o1, a1, v1) ∧ ... ∧ V(on, an, vn) → V(o, a, v) are LPR formulas.

The LPR language can be extended by adding a countable set of variables, which may be used instead of constant symbols in formulas.

To manage uncertainty the following label algebra is used:

    A = (A, {f_ri}).    (1)

A is a set of labels which estimate the uncertainty of formulas. A labeled formula is a pair f : l, where f is a formula and l ∈ A is a label. A set of labeled formulas can be considered a knowledge base.

LPR inference patterns are defined as proof rules. Every proof rule ri has a sequence of premises (of length p_ri) and a conclusion. {f_ri} is a set of functions which are used in proof rules to generate the label of a conclusion: for every proof rule ri an appropriate function f_ri : A^{p_ri} → A should be defined. For a rule ri with premises p1 : l1, ..., pn : ln, the plausible label of its conclusion is calculated using f_ri(l1, ..., ln). Examples of definitions of plausible algebras can be found in (Śnieżyński 2001; 2002).

There are five main types of proof rules: GEN, SPEC, SIM, TRAN and MP. They correspond to the following inference patterns: generalization, specialization, similarity transformation, transitivity of relations and modus ponens. Some transformations can be applied to different types of formulas, therefore indexes are used to distinguish different versions of the rules. Formal definitions of these rules can be found in (Collins and Michalski 1989; Śnieżyński 2003).

MILS Model

MILS may be used to find an answer to a given hypothesis. The inference algorithm builds a proof, using knowledge transmutations to infer the answer. It may also find substitutions for variables appearing in the hypothesis. Three types of knowledge transmutations are defined in MILS:

• simple (LPR proof rules),
• complex (using complex computations, e.g. rule induction algorithms or clustering methods),
• search (database or web searching procedures).

A knowledge transmutation can be represented as a triple (p, c, a), where p are (possibly empty) premises or preconditions, c is a consequence (a pattern of the formula(s) that can be generated) and a is an action (empty for simple transmutations) that should be executed to generate the consequence if the premises are true according to the knowledge base.

Every transmutation has an assigned cost. The cost should represent its computational complexity and/or other important resources that are consumed (e.g. database access or search engine fees). Usually, simple transmutations have a low cost, search transmutations a moderate cost, and complex ones a high cost.
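To make these definitions concrete, the following is a minimal Python sketch of labeled formulas and of a transmutation triple (p, c, a) with its cost. All class, field and function names are illustrative assumptions of this sketch; the paper does not publish its implementation.

    from dataclasses import dataclass
    from typing import Callable, List, Optional

    @dataclass(frozen=True)
    class Statement:
        """V(o, a, v): object o has attribute a equal to v (v may be a list)."""
        obj: str
        attr: str
        value: object

    @dataclass(frozen=True)
    class Hierarchy:
        """H(o1, o, c): o1 is a kind of o in context c."""
        child: str
        parent: str
        context: str

    @dataclass
    class Transmutation:
        """A knowledge transmutation (p, c, a) with an assigned cost."""
        premises: List[object]             # p: patterns checked against the KB
        consequence: object                # c: pattern of the formula(s) generated
        action: Optional[Callable] = None  # a: empty for simple transmutations
        cost: float = 0.1

    def induce_rules(kb):
        """Placeholder for a complex action, e.g. AQ-style rule induction."""
        return []  # would return induced implication formulas with labels

    # A complex transmutation wraps a learning algorithm as its action and
    # carries a high cost; a simple one (an LPR proof rule) has no action.
    aq = Transmutation(premises=[], consequence="V(place, gov, ?)",
                       action=induce_rules, cost=10.0)

Under this reading, the inference engine treats a learned model and a proof rule uniformly: both are transmutations that emit formulas matching a consequence pattern, differing only in action and cost.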

The MILS inference algorithm (see Algorithm 1) is an adaptation of the LPR proof algorithm (Śnieżyński 2003), where proof rules are replaced by more general knowledge transmutations. It is based on the AUTOLOGIC system developed by Morgan (Morgan 1985). To limit the number of nodes and to generate optimal inference chains, the A* algorithm (Hart, Nilsson, and Raphael 1968) is used.

    Input: ϕ – formula, KB – finite set of labeled formulas
    Output: If ∃l ∈ A such that ϕ : l can be inferred from KB: success,
            P – inference chain of ϕ : l from KB; else: failure
    T := tree with one node (root) s = [ϕ];
    OPEN := [s];
    while OPEN is not empty do
        n := the first element from OPEN;
        Remove n from OPEN;
        if n = [] then
            Generate proof P using path from s to n;
            Exit with success;
        end
        if the first formula of n represents an action then
            Execute action;
            if action was successful then
                Add action's results to KB;
                E := nodes generated by removing from n the action formula;
            end
        else
            K := knowledge transmutations whose consequence can be unified
                 with the first formula of n;
            E := nodes generated by replacing the first formula of n by the
                 premises and action of transmutations from K and applying
                 substitutions from the unifier generated in the previous step;
            if the first formula of n can be unified with an element of KB then
                Add to E the node obtained from n by removing the first
                formula and applying substitutions from the unifier;
            end
        end
        Remove from E nodes generating loops;
        Append E to T, connecting nodes to n;
        Insert nodes from E into OPEN;
    end
    Exit with failure;

    Algorithm 1: MILS inference algorithm

The input data is a set of labeled formulas KB – a knowledge base – and a hypothesis (question) represented by the formula ϕ, which should be proved from KB. If there exists a label l ∈ A such that ϕ : l can be inferred from KB, the appropriate inference chain is returned; otherwise the procedure exits with failure. The agent's experience and the context description should also be stored in KB as LPR formulas.

The algorithm generates a tree T whose nodes (N) are labeled by sequences of formulas. Every edge of T is labeled by a knowledge transmutation whose consequence can be unified with the first formula of the parent node, or by the term kb(l) if the first formula of the parent node can be unified with ψ : l ∈ KB. The root of T is s, labeled by [ϕ]. The goal is to generate a node labeled by an empty sequence of formulas.

As mentioned, the A* algorithm is used to limit the number of nodes expanded. Therefore nodes in the OPEN sequence are ordered according to the values of an evaluation function f : N → R, defined as follows:

    f(n) = g(n) + h(n),    (2)

where g : N → R represents the actual cost of the inference chain, using the knowledge transmutation costs and the label of ϕ that can be generated, and h : N → R is a heuristic function which estimates the cost of the path from n to the goal node (e.g. the minimal knowledge transmutation cost multiplied by the length of n can be used).

All formulas in the proof path can be forgotten when a new task is executed. But it is also possible to keep these formulas in a cache knowledge base, together with a counter indicating the number of proofs in which a given formula is used. Using a formula in some proof should increase its counter; not using it should decrease it. If the counter equals 0, the formula should be removed from the temporary knowledge base.

Preliminary Experimental Results

In this section the experimental implementation of MILS is described and some examples of inference chains are presented.

In the current version of the software, only one complex and several simple knowledge transmutations are implemented. They are: GEN_o, SPEC_o, SIM_o, GEN_v, SPEC_v, SIM_v, SPEC_E, SIM_E, SPEC_{o→}, H_B, TRAN_B and AQ, where the last one is a rule induction transmutation based on Michalski's AQ algorithm (Michalski 1973). Other rule induction algorithms, like C4.5 (Quinlan 1993), may also be used.

The label algebra is very simple. Every formula is labeled by a single value from the range [0, 1] representing its certainty or strength. Only hierarchy formulas are labeled by a pair of such values, one used in generalizations, the second in specializations. To calculate the label of a consequence, the labels of the premises are multiplied. The cost of the MP transmutation is 0.2, the cost of SPEC_{o→} is 0.3, and the costs of H_B, TRAN_B and TRAN_P are equal to 0.05. The remaining simple transmutations have cost 0.1. The complex transmutation has the highest cost: 10.

The domain on which the system was tested is similar to that used to test LPR (Boehm-Davis, Dontas, and Michalski 1990b). It represents a part of an agent's geographical knowledge; hence some facts are uncertain or missing, and some others are not true. The hierarchy of objects used is presented in Figure 1; the statements are presented in Table 1.
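To see how the label algebra and the evaluation function (2) interact, here is a small Python sketch. The cost values mirror those given above; the node representation, function names and rule-name strings are assumptions of this sketch, not the paper's implementation.

    import heapq
    from functools import reduce

    # Transmutation costs taken from the text; key names are this sketch's own.
    COST = {"MP": 0.2, "SPEC_o->": 0.3, "H_B": 0.05, "TRAN_B": 0.05,
            "TRAN_P": 0.05, "AQ": 10.0}
    DEFAULT_COST = 0.1  # all remaining simple transmutations
    MIN_COST = 0.05     # minimal transmutation cost, used by the heuristic

    def conclusion_label(premise_labels):
        """Labels of the premises are multiplied to label the conclusion."""
        return reduce(lambda a, b: a * b, premise_labels, 1.0)

    def evaluate(node):
        """f(n) = g(n) + h(n): cost of the chain so far plus an estimate,
        here the minimal transmutation cost times the node's length."""
        g = sum(COST.get(rule, DEFAULT_COST) for rule in node["rules_used"])
        h = MIN_COST * len(node["formulas"])
        return g + h

    # OPEN is kept ordered by f; heapq pops the cheapest node first.
    OPEN, tie = [], 0
    root = {"rules_used": [], "formulas": ["v(x, gov, com)"]}
    heapq.heappush(OPEN, (evaluate(root), tie, root))
    f_value, _, node = heapq.heappop(OPEN)
    assert f_value == 0.05  # one open formula, no transmutations applied yet

With these numbers, a branch that invokes AQ starts 10 units more expensive than any chain of simple rules, which is why the complex transmutation is selected only when cheaper proofs fail.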

Figure 1: Hierarchy of objects

Table 1: Statements

Place        Literacy  GNP change     GNP per capita  Government type
Europe       high      -              -               -
Germany      high      slow decrease  high            democracy
Poland       high      slow growth    medium          democracy
Albania      medium    stable         medium          democracy
China        low       fast growth    low             communism
North Korea  medium    slow decrease  low             communism
x            medium    slow growth    low             -
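Under the representation outlined earlier, the rows of Table 1 become labeled V statements, accompanied by H and B formulas placing values in the hierarchy. A hedged illustration in Python follows; the tuple encoding is this sketch's own, while the labels mirror those appearing in the proof trees below.

    # Two Table 1 rows plus the hierarchy formulas later used by GEN_v,
    # written as (formula, label) pairs.
    kb = [
        (("v", "poland", "lit_rate", "high"), 0.9),
        (("v", "poland", "gnp_chg", "slow_grow"), 0.9),
        (("v", "x", "lit_rate", ["low", "medium"]), 0.8),  # composite value
        (("v", "x", "gnp_chg", "slow_grow"), 0.9),
        # slow_grow lies below gnp_grow (pair label: 1.0 for generalization,
        # 0.3 for specialization), and gnp_grow lies below attribute gnp_chg.
        (("h", "slow_grow", "gnp_grow", "all"), (1.0, 0.3)),
        (("b", "gnp_grow", "gnp_chg"), 1.0),
    ]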

The following three questions were asked to show how the system is able to infer plausible knowledge:

1. Is GNP growing in Germany? – v(germany, gnp_chg, gnp_grow),
2. What is the literacy rate in Slovakia? – v(slovakia, lit_rate, X),
3. Is the government in some unknown country x communistic? – v(x, gov, com).

The answers returned by the system are presented in Table 2. Each answer has two parts: an inferred answer to the question, and the inference chain used to derive the answer, in the form of a proof tree. A proof tree has the following form:

    p(ϕ, l, r, P),    (3)

where ϕ : l is a formula proved using rule r and P is a list of proof trees representing the inference of the premises of r. If ϕ : l was taken from the knowledge base, then P is empty and r = kb(l).

To answer the first question MILS used the simple knowledge transmutation GEN_v, which corresponds to abstraction. In the knowledge base the statement V(germany, gnp_chg, slow_grow) is stored. Using GEN_v, slow_grow is replaced by the more general value gnp_grow.

The second question is answered using the SPEC_o inference rule, which corresponds to specialization. Knowing that the literacy rate in Europe is high (statement V(europe, lit_rate, high)), that Slovakia is a typical European country in the context of culture (H(slovakia, europe, culture)), and that the literacy rate depends on the culture context, we can specialize the object and replace europe with slovakia.

It is not possible to answer the third question using the knowledge base and simple inference rules. Therefore the complex AQ rule is chosen by the inference algorithm. As a result it produced implication formulas with a consequence matching the statement V(place, gov, com). The training data was prepared using the statements describing all the countries; every country was a separate example, and the class attribute was gov. One such formula was generated:

    V(place, lit_rate, [low, medium]) ∧
    V(place, gnp_chg, [fast_grow, slow_decr, slow_grow]) →
    V(place, gov, com)

Next, this implication was specialized using SPEC_{o→}, which replaced place with x. The result was derived by modus ponens using information about x from the knowledge base.

The last example clearly demonstrates the difference between MILS and traditional inference engines: without the learning inference rule, the answer would not be found.

36 1. v(germany, gnp_chg, gnp_grow) p(v(germany, gnp_chg, gnp_grow), vPL(0.9), genv, [ p(h(slow_grow, gnp_grow, all), hPL(1.0, 0.3), kb, []), p(b(gnp_grow, gnp_chg), bPL(1), kb, []), p(e(gnp_chg, europe, gnp_chg, all), ePL(1.0), kb, []), p(h(germany, europe, all), hPL(1.0, 0.1), kb, []), p(v(germany, gnp_chg, slow_grow), vPL(0.9), kb, []) ])

2. v(slovakia, lit_rate, X) p(v(slovakia, lit_rate, high), vPL(0.9), speco, [ p(h(slovakia, europe, culture), hPL(1.0, 0.01), kb, []), p(e(europe, lit_rate, europe, culture), ePL(1), spece, [ p(h(europe, place, all), hPL(1.0, 1.0), kb, []), p(e(place, lit_rate, place, all), ePL(1.0), kb, []), p(e(place, lit_rate, place, all), ePL(1.0), kb, []) ]), p(v(europe, lit_rate, high), vPL(0.9), kb, []) ])

3. v(x, gov, com) p(v(x, gov, com), vPL(0.8), aq, [ p(aq(x, gov, com), vPL(_), kb, []), p(v(x, gov, com), vPL(0.8), modusponens, [ p(impl(v(x, gov, com), [v(x, lit_rate, [low, medium]), v(x, gnp_chg, [fast_grow, slow_decr, slow_grow])]), iPL(1), speci, [ p(h(x, place, all), hPL(1.0, 0.01), kb, []), p(impl(v(place, gov, com), [v(place, lit_rate, [low, medium]), v(place, gnp_chg, [fast_grow, slow_decr, slow_grow])]), iPL(1), aq, []) ]), p(v(x, lit_rate, [low, medium]), vPL(0.8), kb, []), p(v(x, gnp_chg, [fast_grow, slow_decr, slow_grow]), vPL(0.9), kb, []) ]) ])

Table 2: Questions and answers returned by MILS

Conclusions and Further Works

The Multistrategy Inference and Learning System (MILS) can be considered an inference system that manages knowledge in a multistrategy fashion, in a way similar to how human beings do. It combines search, inference and machine learning capabilities that are performed in a uniform way. As a result, the reasoning process is more creative than in classical AI models. Depending on the agent's experience and the context, different knowledge may be discovered from the stored statements.

Further work will concern enriching the software's capabilities by extending the range of implemented knowledge transmutations. For example, adding a clustering algorithm is planned; it will be used to derive similarity formulas.

Testing the system in other (more realistic) domains is also considered. Application in a system for data analysis that is being developed for the Polish Government Protection Bureau is considered; MILS will be applied in its decision support component. Application of MILS in robotics also seems to be a good research direction.

Acknowledgments

The research reported in the paper was supported by the grant "Information management and decision support system for the Government Protection Bureau" (No. DOBR-BIO4/060/13423/2013) from the Polish National Center for Research and Development.

References

Alkharouf, N. W., and Michalski, R. S. 1996. Multistrategy task-adaptive learning using dynamically interlaced hierarchies. In Michalski, R. S., and Wnek, J., eds., Proceedings of the Third International Workshop on Multistrategy Learning.

Boehm-Davis, D.; Dontas, K.; and Michalski, R. S. 1990a. Plausible reasoning: An outline of theory and the validation of its structural properties. In Intelligent Systems: State of the Art and Future Directions. North Holland.

Boehm-Davis, D.; Dontas, K.; and Michalski, R. S. 1990b. A validation and exploration of the Collins-Michalski theory of plausible reasoning. Technical report, George Mason University.

Collins, A., and Michalski, R. S. 1989. The logic of plausible reasoning: A core theory. Cognitive Science 13:1–49.

Gabbay, D. M. 1991. LDS – Labelled Deductive Systems. Oxford University Press.

Hart, P.; Nilsson, N. J.; and Raphael, B. 1968. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics 4(2):100–107.

Hieb, M. R., and Michalski, R. S. 1993a. A knowledge representation system based on dynamically interlaced hierarchies: Basic ideas and examples. Technical report, George Mason University.

Hieb, M. R., and Michalski, R. S. 1993b. Multitype inference in multistrategy task-adaptive learning: Dynamic interlaced hierarchies. Technical report, George Mason University.

Michalski, R. S. 1973. AQVAL/1 – computer implementation of a variable valued logic VL1 and examples of its application to pattern recognition. In Proc. of the First International Joint Conference on Pattern Recognition.

Michalski, R. S. 1994. Inferential theory of learning: Developing foundations for multistrategy learning. In Michalski, R. S., ed., Machine Learning: A Multistrategy Approach, Volume IV. Morgan Kaufmann Publishers.

Morgan, C. G. 1985. Autologic. Logique et Analyse 28(110-111):257–282.

Quinlan, J. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann.

Śnieżyński, B. 2001. Verification of the logic of plausible reasoning. In Kłopotek, M., et al., eds., Intelligent Information Systems 2001, Advances in Soft Computing. Physica-Verlag, Springer.

Śnieżyński, B. 2002. Probabilistic label algebra for the logic of plausible reasoning. In Kłopotek, M., et al., eds., Intelligent Information Systems 2002, Advances in Soft Computing. Physica-Verlag, Springer.

Śnieżyński, B. 2003. Proof searching algorithm for the logic of plausible reasoning. In Kłopotek, M., et al., eds., Intelligent Information Processing and Web Mining, Advances in Soft Computing, 393–398. Springer.
