Logical Formalizations of Commonsense Reasoning: Papers from the 2015 AAAI Spring Symposium

Fast and Loose Semantics for Computational Cognition

Loizos Michael Open University of Cyprus [email protected]

Abstract

Psychological evidence supporting the profound effortlessness (and often substantial carelessness) with which human cognition copes with typical daily life situations abounds. In line with this evidence, we propose a formal semantics for computational cognition that places emphasis on the existence of naturalistic and unpretentious algorithms for representing, acquiring, and manipulating knowledge. At the heart of the semantics lies the realization that the partial nature of perception is what ultimately necessitates — and hinders — cognition. Inexorably, this realization leads to the adoption of a unified treatment for all considered cognitive processes, and to the representation of knowledge via prioritized implication rules. Through discussion and the implementation of an early prototype cognitive system, we argue that such fast and loose semantics may offer a good basis for the development of machines with cognitive abilities.

Introduction

The founding statement of Artificial Intelligence (McCarthy et al. 1955) proceeds on the basis that "every [...] feature of intelligence can in principle be so precisely described that a machine can be made to simulate it" and proposes to "find how to make machines [...] solve kinds of problems now reserved for humans". Philosophical considerations relating to the debate on strong vs. weak AI aside, it is hard not to acknowledge that modern-day Artificial Intelligence research has pushed forward on the latter front more than on the former. As readily seen from the exceptionally powerful learning algorithms and the ingeniously crafted reasoning algorithms, more emphasis has been placed on the performance of the algorithms, and less emphasis on their design to replicate certain features of human intelligence.

Making progress on the former front presumably necessitates the abandonment of rigid semantics and convoluted algorithms, and the investigation of naturalistic and unpretentious solutions, guided by psychological evidence on human cognition. At the same time, and in contrast to the specialization of much modern-day Artificial Intelligence research, a holistic view of cognition should be taken, whereby the processes of perception, reasoning, and learning are considered in a unified framework that facilitates their close interaction. In this spirit, we present a simple unified semantics and an early prototype cognitive system built on top of this semantics. The questions that arise, the alternatives that one could have adopted, and the prior works that are relevant to such an endeavor, are much too numerous to be reasonably covered in the constraints of one paper. We view this work as a starting point for a fuller investigation of this line of research.

Perception Semantics

We assume that the environment determines at each moment in time a state that fully specifies what holds. An agent never fully perceives these states. Instead, the agent uses some pre-specified language to assign finite names to atoms, which are used to represent concepts related to the environment. The set of all atoms is not explicitly provided upfront. Atoms are encountered through the agent's interaction with its environment, or introduced through the agent's cognitive processing mechanism. At the neural level, each atom might be thought of as a set of neurons that represent a concept (Valiant 2006).

A scene s is a mapping from atoms to {0, 1, ∗}. We write s[α] to mean the value associated with atom α, and call atom α specified in scene s if s[α] ∈ {0, 1}. Scenes s1, s2 agree on atom α if s1[α] = s2[α]. Scene s1 is an expansion of scene s2 if s1, s2 agree on every atom specified in s2. Scene s1 is a reduction of scene s2 if s2 is an expansion of s1. A scene s is the greatest common reduct of a set S of scenes if s is the only scene among its expansions that is a reduction of each scene in S. A set S of scenes is coherent if there exists a scene that is an expansion of each scene in S.

In simple psychological terms, a scene can be thought of as corresponding to the contents of an agent's working memory, where the agent's perception of the environment state, and any inferences relevant to this perception, are recorded for further processing. In line with psychological evidence, scenes actually used by an agent can be assumed to have a severely-limited number of specified atoms (Miller 1956).

A formula ψ is true (resp., false) and specified in s if ψ (resp., ¬ψ) is classically entailed by the conjunction of each atom α such that s[α] = 1 and the negation of each atom α such that s[α] = 0; otherwise, ψ is unspecified in s. When convenient, we represent a scene by the set of all literals (atoms or their negations) that are true in it, which suffices to fully capture the information available in a scene.
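To make the scene machinery concrete, here is a minimal sketch under an illustrative encoding: a scene records only its specified atoms, and a formula is given by the set of atoms it mentions together with a boolean function over complete assignments to them. None of these names are part of the formal development. Note how the entailment-based notion of truth makes a tautology true even on a scene that specifies none of its atoms.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, Optional

# A scene maps atoms to 1 (true) or 0 (false); atoms left out are unspecified ('*').
Scene = Dict[str, int]

def specified(scene: Scene, atom: str) -> bool:
    return scene.get(atom) in (0, 1)

def is_expansion(s1: Scene, s2: Scene) -> bool:
    # s1 expands s2 if they agree on every atom specified in s2
    return all(s1.get(a) == v for a, v in s2.items())

def coherent(scenes: Iterable[Scene]) -> bool:
    # a set of scenes is coherent if some scene expands each of them,
    # i.e., no two scenes disagree on a specified atom
    merged: Scene = {}
    for s in scenes:
        for a, v in s.items():
            if a in merged and merged[a] != v:
                return False
            merged[a] = v
    return True

# A formula: a boolean function over complete assignments to its atoms.
Formula = Callable[[Dict[str, bool]], bool]

def truth_value(psi: Formula, atoms: FrozenSet[str], s: Scene) -> Optional[bool]:
    """True / False if psi / its negation is classically entailed by the
    literals of s; None if psi is unspecified in s."""
    free = [a for a in atoms if not specified(s, a)]
    values = []
    for combo in product([False, True], repeat=len(free)):
        assignment = {a: bool(s[a]) for a in atoms if specified(s, a)}
        assignment.update(dict(zip(free, combo)))
        values.append(psi(assignment))
    if all(values):
        return True
    if not any(values):
        return False
    return None

if __name__ == "__main__":
    s = {"Bird": 1}                                   # Penguin is unspecified
    tautology = lambda m: m["Penguin"] or not m["Penguin"]
    print(truth_value(tautology, frozenset({"Penguin"}), s))                # True
    print(truth_value(lambda m: m["Penguin"], frozenset({"Penguin"}), s))   # None
    print(truth_value(lambda m: m["Bird"], frozenset({"Bird"}), s))         # True
```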

To formalize the agent's interaction with its environment, let E denote the set of environments of interest, and let a particular environment ⟨dist, perc⟩ ∈ E determine: a probability distribution dist over states, capturing the possibly complex and unknown dynamics with which states are produced; a stochastic perception process perc determining for each state a probability distribution over a coherent subset of scenes. The agent has only oracle access to ⟨dist, perc⟩ such that: in unit time the agent senses its environment and obtains a percept s, resulting from an unknown state t being drawn from dist, and scene s then being drawn from perc(t). A percept s represents, thus, what the agent senses from state t.

An agent's ultimate goal is to act in a manner appropriate for the current state of the environment. We shall not attempt to specify here how this appropriateness is defined or quantified. Suffice it to say that this decision making problem is sufficiently arduous, and that the agent can be aided if more information on the current state of the environment becomes available. Reasoning serves precisely this role: to complete information that is not explicitly available in a percept, and to facilitate, in this manner, the decision making process. To do so, it utilizes knowledge that the agent has been given, or has acquired, on certain regularities in the environment.

Reasoning Semantics

Since reasoning aims to complete information, it should be computationally easy to determine what inferences follow from the agent's knowledge, and inferences should follow often. We present our chosen representation for this knowledge, and defer the discussion of its properties for later.

A rule is an expression of the form ϕ ⇝ λ, where formula ϕ is the body of the rule, and literal λ is the head of the rule, with ϕ and λ not sharing any atoms, and with ϕ being read-once (no atom appears more than once). The intuitive reading of a rule is that when the rule's body holds in a scene, an agent has evidence that the rule's head should also hold.

A collection of rules could happen to simultaneously provide evidence for conflicting conclusions. To resolve such conflicts, we let rules be qualified based on their priorities. A knowledge base κ = ⟨ϱ, ≻⟩ over a set R of rules comprises a finite collection ϱ ⊆ R of rules, and an irreflexive antisymmetric priority relation ≻ that is a subset of ϱ × ϱ.

A knowledge base κ is applied on a percept s through a step operator that maps a scene si to a scene si+1, written si ⇝κ,s si+1. Roughly, a rule r in κ contributes its head to si+1 if it (i) has a body that is true in si, (ii) is not exogenously qualified on si by s, and (iii) is not endogenously qualified by a rule in κ on si; such a rule r is called dominant in the step.

Intuitively, the truth-values of atoms specified in percept s remain as perceived, since they are not under dispute. The truth-values of other atoms in si are updated to incorporate in si+1 the inferences drawn by dominant rules, and also updated to drop any inferences that are no longer supported. The inferences of a knowledge base on a percept are determined by the set of scenes that one reaches, and from which one cannot escape, by repeatedly applying the step operator.

Definition 3 (Inference Trace and Frontier). The inference trace of a knowledge base κ on a percept s is the infinite sequence trace(κ,s) = s0, s1, s2, ... of scenes, with s0 = s and si ⇝κ,s si+1 for each integer i ≥ 0. The inference frontier of a knowledge base κ on a percept s is the minimal set front(κ,s) of scenes found in a suffix of trace(κ,s).

Lemma 1 (Properties of Inference Frontier). front(κ,s) is unique and finite, for any knowledge base κ and percept s.

In general, the inference frontier includes multiple scenes, and one can define many natural entailment notions.

Definition 4 (Entailment Notions). A knowledge base κ applied on a percept s entails a formula ψ if ψ is:
(E1) true in a scene in front(κ,s);
(E2) true in a scene in front(κ,s) and not false in others;
(E3) true in every scene in front(κ,s);
(E4) true in the greatest common reduct of front(κ,s).

Going from the first to the last notion, entailment becomes more skeptical. Only the first notion of entailment captures what one would typically call credulous entailment, in that ψ is possible, but ¬ψ might also be possible. The following result clarifies the relationships between these notions.

Proposition 2 (Relationships Between Entailment Notions). A knowledge base κ applied on a percept s entails ψ under Ei if it entails ψ under Ej, for every pair of entailment notions Ei, Ej with i < j.

The last of these notions, in particular, excludes reasoning by case analysis, where an inference follows if it does in each of a set of collectively exhaustive cases. For a knowledge base κ and a percept s such that front(κ,s) = {{α}, {β}}, for instance, the formula α ∨ β is true in every scene in the inference frontier by case analysis, and is entailed under E3, but is not entailed under E4.
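Operationally, the four entailment notions are simple checks over an inference frontier. The sketch below assumes the set-of-literals representation of scenes, takes the frontier as given, and restricts the queried formula ψ to a literal or a disjunction of two literals, which suffices for the case-analysis example above; the encoding and function names are illustrative.

```python
from typing import FrozenSet, Set

Literal = str                     # "alpha" or "~alpha" (illustrative encoding)
Scene = FrozenSet[Literal]        # a scene as the set of literals true in it

def neg(l: Literal) -> Literal:
    return l[1:] if l.startswith("~") else "~" + l

def gcr(front: Set[Scene]) -> Scene:
    """Greatest common reduct: the literals shared by every scene in the frontier."""
    scenes = list(front)
    common = set(scenes[0])
    for s in scenes[1:]:
        common &= s
    return frozenset(common)

def entails_E1(front: Set[Scene], l: Literal) -> bool:      # true in some scene
    return any(l in s for s in front)

def entails_E2(front: Set[Scene], l: Literal) -> bool:      # true somewhere, false nowhere
    return entails_E1(front, l) and all(neg(l) not in s for s in front)

def entails_E3(front: Set[Scene], l: Literal) -> bool:      # true in every scene
    return all(l in s for s in front)

def entails_E4(front: Set[Scene], l: Literal) -> bool:      # true in the common reduct
    return l in gcr(front)

def entails_E3_disjunction(front: Set[Scene], l1: Literal, l2: Literal) -> bool:
    # a disjunction of literals is true in a scene iff one of its disjuncts is
    return all(l1 in s or l2 in s for s in front)

def entails_E4_disjunction(front: Set[Scene], l1: Literal, l2: Literal) -> bool:
    common = gcr(front)
    return l1 in common or l2 in common

if __name__ == "__main__":
    # the case-analysis example from the text: front(κ,s) = {{α}, {β}}
    front = {frozenset({"alpha"}), frozenset({"beta"})}
    print(entails_E1(front, "alpha"))                       # True  (credulous)
    print(entails_E3(front, "alpha"))                       # False
    print(entails_E3_disjunction(front, "alpha", "beta"))   # True  (by case analysis)
    print(entails_E4_disjunction(front, "alpha", "beta"))   # False (E4 excludes it)
```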

In the special case where the inference frontier includes a single scene, all proposed entailment notions coincide. We restrict our attention in the sequel, and define our entailment notion to be applicable only under this special case, remaining oblivious as to what entailment means in general.

Definition 5 (Resolute Entailment). A knowledge base κ is resolute on a percept s if front(κ,s) is a singleton set; then, the unique scene in front(κ,s) is the resolute conclusion of κ on s. A knowledge base κ applied on a percept s on which κ is resolute entails a formula ψ, denoted (κ,s) ||= ψ, if ψ is true in the resolute conclusion of κ on s.

Definition 6 (Knowledge Base Equivalence). Knowledge bases κ1, κ2 are equivalent if for every percept s on which both κ1 and κ2 are resolute, front(κ1,s) = front(κ2,s).

Below we write κ1 ⊆ κ2 for two knowledge bases κ1 = ⟨ϱ1, ≻1⟩, κ2 = ⟨ϱ2, ≻2⟩ to mean ϱ1 ⊆ ϱ2 and ≻1 ⊆ ≻2.

Theorem 3 (Additive Elaboration Tolerance). Consider two knowledge bases κ0, κ1. Then, there exists a knowledge base κ2 such that κ1 ⊆ κ2 and κ0, κ2 are equivalent.

Proof. Initially, set κ2 := κ1. For every rule r : ϕ ⇝ λ in κ1, introduce in κ2 the rule f1(r) : ϕ ⇝ ¬λ with a fresh name f1(r). For every rule r : ϕ ⇝ λ in κ0, introduce in κ2 the rule f0(r) : ϕ ⇝ λ with a fresh name f0(r). Give priority to rule f0(r) over every other rule that appears in κ2 because of κ1. For every priority ri ≻0 rj in κ0, introduce in κ2 the priority f0(ri) ≻2 f0(rj). The claim follows.

The result above shows, in particular, that even if an agent is initially programmed with certain rules, the acquisition of new rules — through learning, or otherwise — could nullify their effect, if this happens to be desirable, without requiring a "surgery" to the existing knowledge (McCarthy 1998).

Illustration of Reasoning

For illustration, consider a knowledge base κ with the rules

r1 : Penguin ⇝ ¬Flying
r2 : Bird ⇝ Flying
r3 : Penguin ⇝ Bird
r4 : Wings ⇝ Bird
r5 : Antarctica ∧ Bird ∧ Funny ⇝ Penguin
r6 : Flying ⇝ Feathers

and the priority r1 ≻ r2. Consider applying κ on a percept s = {Antarctica, Funny, Wings}. Then, trace(κ,s) =

{Antarctica, Funny, Wings},
{Antarctica, Funny, Wings, Bird},
{Antarctica, Funny, Wings, Bird, Flying, Penguin},
{Antarctica, Funny, Wings, Bird, ¬Flying, Penguin, Feathers},
{Antarctica, Funny, Wings, Bird, ¬Flying, Penguin},
{Antarctica, Funny, Wings, Bird, ¬Flying, Penguin}, ...

(κ,s) ||= ¬Flying, κ is resolute on s, and front(κ,s) = {{Antarctica, Funny, Wings, Bird, ¬Flying, Penguin}}.

One may note some back and forth when computing the inference trace above. Initially the inference Bird is drawn, which then gives rise to Flying, and later to Feathers. When Penguin is subsequently inferred, it leads rule r1 to oppose the inference Flying coming from rule r2, and in fact override and negate it. As a result of this overriding of Flying, Feathers is no longer supported through rule r6, and is also dropped, even though no other rule directly opposes it.

In a sense, then, although the entailment semantics itself is skeptical in nature, the process through which entailment is determined is credulous in nature. It jumps to inferences as long as there is sufficient evidence to do so, and if there is no immediate / local reason to qualify that inference. When and if reasons emerge later that oppose an inference drawn earlier, then those are considered as they are made available. This fast and loose aspect of the reasoning semantics is in line with well-investigated psychological theories for human cognition (Collins and Loftus 1975), which can guide its further development (e.g., by introducing a decreasing gradient in activations that limits the length of the inference trace).
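The illustration above can be replayed mechanically. The sketch below gives one possible rendering of the step operator, inference trace, and inference frontier under simplifying assumptions: rule bodies are conjunctions of literals, an applicable rule counts as exogenously qualified when the percept specifies the negation of its head, and as endogenously qualified when a conflicting applicable rule of strictly higher priority exists; conflicting applicable rules with no priority between them are not handled. All names are illustrative. Run as is, it reproduces the trace above and reports the knowledge base as resolute, with ¬Flying (here written ~Flying) in the resolute conclusion.

```python
from typing import FrozenSet, List, Set, Tuple

Literal = str                         # "Bird" or "~Bird"
Scene = FrozenSet[Literal]

def neg(l: Literal) -> Literal:
    return l[1:] if l.startswith("~") else "~" + l

class Rule:
    def __init__(self, name: str, body: Tuple[Literal, ...], head: Literal):
        self.name, self.body, self.head = name, body, head

class KnowledgeBase:
    def __init__(self, rules: List[Rule], priorities: Set[Tuple[str, str]]):
        self.rules = rules
        self.priorities = priorities    # (stronger, weaker) pairs of rule names

    def stronger(self, r1: Rule, r2: Rule) -> bool:
        return (r1.name, r2.name) in self.priorities

def step(kb: KnowledgeBase, percept: Scene, si: Scene) -> Scene:
    """One application of the step operator, as reconstructed here."""
    applicable = [r for r in kb.rules if all(l in si for l in r.body)]
    dominant = []
    for r in applicable:
        if neg(r.head) in percept:
            continue                    # exogenously qualified by the percept
        if any(r2.head == neg(r.head) and kb.stronger(r2, r) for r2 in applicable):
            continue                    # endogenously qualified by a stronger rule
        dominant.append(r)
    # percept literals persist; unsupported earlier inferences are dropped
    return frozenset(percept) | frozenset(r.head for r in dominant)

def trace_and_frontier(kb: KnowledgeBase, percept: Scene,
                       max_steps: int = 100) -> Tuple[List[Scene], Set[Scene]]:
    seen: List[Scene] = [frozenset(percept)]
    for _ in range(max_steps):
        nxt = step(kb, percept, seen[-1])
        if nxt in seen:                 # the trace has entered its repeating suffix
            return seen, set(seen[seen.index(nxt):])
        seen.append(nxt)
    raise RuntimeError("no frontier found within max_steps")

if __name__ == "__main__":
    kb = KnowledgeBase(
        rules=[
            Rule("r1", ("Penguin",), "~Flying"),
            Rule("r2", ("Bird",), "Flying"),
            Rule("r3", ("Penguin",), "Bird"),
            Rule("r4", ("Wings",), "Bird"),
            Rule("r5", ("Antarctica", "Bird", "Funny"), "Penguin"),
            Rule("r6", ("Flying",), "Feathers"),
        ],
        priorities={("r1", "r2")},
    )
    percept = frozenset({"Antarctica", "Funny", "Wings"})
    trace, frontier = trace_and_frontier(kb, percept)
    for s in trace:
        print(sorted(s))
    print("resolute:", len(frontier) == 1)
```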

Why Not Equivalences?

The proposed semantics uses implication rules for its representation, and priorities among those rules that are conflicting. One would not be at fault to wonder whether our chosen representation is nothing but syntactic sugar to hide the fact that one is simply expressing a single equivalence / definition for each atom. We investigate this possibility below.

Fix an arbitrary knowledge base κ. Let body(r0) and head(r0) mean, respectively, the body and head of rule r0. Let str(r0) mean all rules ri in κ such that r0, ri are conflicting, and r0 ≺ ri; i.e., the rules that are stronger than r0. Let exc(r0) := ∨r∈str(r0) body(r); i.e., the exceptions to r0. Let cond(λ) := ∨head(r0)=λ (body(r0) ∧ ¬exc(r0)); i.e., the conditions under which literal λ is inferred. For each atom α, let def(α) := (U ∨ cond(α)) ∧ ¬cond(¬α), where U is an atom that does not appear in κ and is unspecified in every percept of interest. Let T[κ] be the theory comprising an equivalence def(α) ≡ α for each atom α appearing in κ.

We show, next, a precise sense in which this set of equivalences captures the reasoning via the prioritized rules in κ.

Theorem 4 (Prioritized Rules as Equivalences). Consider a knowledge base κ, a percept s = ∅, and a scene si in which every atom in κ is specified. Then: si ⇝κ,s si+1 if and only if si+1 = {α | def(α) ≡ α ∈ T[κ], def(α) is true in si} ∪ {¬α | def(α) ≡ α ∈ T[κ], def(α) is false in si}.

Proof. It can be shown that each dominant rule in si ⇝κ,s si+1 will lead the associated equivalence to infer the rule's head, and that when no dominant rule exists for an atom, then the associated equivalence will leave the atom unspecified.

In Theorem 4 we have used the step operator with the percept s = ∅ simply as a convenient way to exclude the process of exogenous qualification, and show that endogenous qualification among prioritized rules is properly captured by the translation to equivalences. It follows that if one were to define a step operator for equivalences and apply the exogenous qualification coming from an arbitrary percept s on top of the drawn inferences, one would have an equivalent step operator to the one using prioritized rules with the percept s.

What is critical, however, and is not used simply for convenience in Theorem 4, is the insistence on having a scene si in which every atom in κ is specified. Indeed, the translation works as long as full information is available, which is, of course, contrary to the perception semantics we have argued for. In the case of general scenes, the translation is problematic, as illustrated by the following two simple examples.

As a first example, consider a knowledge base κ with the rules r1 : Bird ⇝ Flying, r2 : Penguin ⇝ ¬Flying, and the priority r2 ≻ r1. Then, the following equalities hold:

str(r1) = {r2}, and exc(r1) = body(r2) = Penguin, and cond(Flying) = body(r1) ∧ ¬exc(r1) = Bird ∧ ¬Penguin.

str(r2) = ∅, and exc(r2) = ⊥, and cond(¬Flying) = body(r2) ∧ ¬exc(r2) = Penguin ∧ ¬⊥ = Penguin.

def(Flying) = (U ∨ cond(Flying)) ∧ ¬cond(¬Flying) = (U ∨ (Bird ∧ ¬Penguin)) ∧ ¬Penguin = (U ∨ Bird) ∧ ¬Penguin.

Thus, the resulting equivalence is (U ∨ Bird) ∧ ¬Penguin ≡ Flying. As it follows from Theorem 4, when si is the scene {Bird, Penguin}, {Bird, ¬Penguin}, {¬Bird, Penguin}, or {¬Bird, ¬Penguin}, both the considered knowledge base κ and the resulting equivalence give rise, respectively, to the same inference ¬Flying, Flying, ¬Flying, or 'unspecified'. However, when si does not specify both Bird and Penguin, the inferences of the formalisms may no longer coincide. When si = {Bird}, the considered knowledge base gives rise to the inference Flying, whereas the resulting equivalence gives rise to the inference 'unspecified' for Flying.

Note that since the formalisms agree on all fully-specified scenes, they may disagree on general scenes only because one of them infers 'unspecified' when the other does not; the formalisms will never give rise to contradictory inferences when used in a single step. However, because of the multiple steps in the reasoning process, contradictory inferences may arise at the end. We omit the presentation of a formal result.

Based on the preceding example, one may hasten to conjecture that the knowledge base always gives more inferences, and that it is the equivalences that infer 'unspecified' when the formalisms disagree. This is not always the case. As a second example, consider a knowledge base κ with the rules r1 : β ⇝ α, r2 : ¬β ⇝ α. It can be shown, then, that def(α) = U ∨ β ∨ ¬β = ⊤, with a resulting equivalence ⊤ ≡ α. On scenes si that specify β, the formalisms coincide, but on the scene si = ∅, the considered knowledge base gives rise to the inference 'unspecified' for α, whereas the resulting equivalence gives rise to the inference α.

In either of the two examples, one's intuition may suggest that the formalism with an inference that specifies atoms is more appropriate than the other one. However, this intuition comes from viewing the two formalisms as computational tools aiming to draw inferences according to the underlying mathematical-logic view of the rules. We have argued, however, that case analysis is not necessarily a naturalistic way of reasoning, and that excluding it might be psychologically warranted. In this frame of mind, it is now the knowledge base that draws the more appropriate inferences in both examples, jumping to the conclusion that birds fly when no information is available on their penguin-hood, and avoiding to draw a conclusion that would follow by a case analysis, when the scene on which the rules are applied does not provide concrete information for an inference to be drawn.

Beyond the conceptual reasons to choose prioritized rules over equivalences, there are also certain formal reasons. First, reasoning with equivalences is an NP-hard problem: evaluating a 3-CNF formula (as the body of an equivalence) on a scene that does not specify any formula atoms amounts precisely to deciding the formula's satisfiability (Michael 2010; 2011b). Second, the knowledge representable in an equivalence is subject to certain inherent limitations, which are overcome when multiple rules are used instead (Michael 2014). We shall not elaborate any further, other than to direct the reader to the cited works for more information.

Of course, one could counter-argue that the case analysis, and the intractability of reasoning, that we claim are avoided by using prioritized rules can easily creep in if, for instance, a knowledge base includes the rule ϕ ⇝ λ for ϕ = β ∨ ¬β, or ϕ equal to a 3-CNF formula. Our insistence on using read-once formulas for the body of rules avoids such concerns.

Our analysis above reveals that the choice of representation follows inexorably from the partial nature of perception. Prioritized rules are easy to check, while allowing expressivity through their collectiveness, and easy to draw inferences with, while avoiding non-naturalistic reasoning patterns.

Why Not Argumentation?

Abstract argumentation (Dung 1995) has revealed itself as a powerful formalism, within which defeasible reasoning can be understood. We examine the relation of our proposed semantics to abstract argumentation, by considering a natural way to instantiate the arguments and their attacks.

Definition 7 (Arguments). An argument A for the literal λ given a knowledge base κ and a percept s is a minimal set of explanation-conclusion pairs of the form ⟨e, c⟩ that can be ordered so that: c equals λ for a pair in A; if e equals s, then c is a literal that is true in s; if e equals a rule r in κ, then c is the head of the rule, and the rule's body is classically entailed by the set of conclusions in the preceding pairs in A.

Definition 8 (Attacks Between Arguments). Argument A1 for λ1 attacks argument A2 for λ2 given a knowledge base κ and a percept s if there exist ⟨e1, c1⟩ ∈ A1 and ⟨e2, c2⟩ ∈ A2 such that c1, c2 are the negations of each other, c1 = λ1, and either e1 = s or both e1, e2 are rules in κ such that e2 ⊁ e1.

Definition 9 (Argumentation Framework). The argumentation framework ⟨A, R⟩ associated with a knowledge base κ and a percept s comprises the set A of all arguments for any literal given κ and s, and the attacking relation R ⊆ A × A such that ⟨A1, A2⟩ ∈ R if A1 attacks A2 given κ and s.

Definition 10 (Grounded Extension). A set Δ ⊆ A of arguments is the unique grounded extension of an argumentation framework ⟨A, R⟩ if it can be constructed iteratively as follows: initially, set Δ := ∅ and Ω := A; move an argument A1 from Ω to Δ if there is no argument A2 ∈ Ω such that ⟨A2, A1⟩ ∈ R; remove an argument A1 from Ω if there is an argument A2 ∈ Δ such that ⟨A2, A1⟩ ∈ R; repeat the last two steps until neither leads to any changes in Δ and Ω.

Definition 11 (Argumentation Framework Entailment). A set Δ of arguments entails a formula ψ if ψ is classically entailed by the set {λ | A ∈ Δ is an argument for λ} of literals. An argumentation framework ⟨A, R⟩ entails a formula ψ if ψ is entailed by the grounded extension of ⟨A, R⟩.
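The iterative construction of Definition 10 (and the read-off of Definition 11) is easy to state operationally. The sketch below works over an explicitly given set of argument identifiers and attack pairs; constructing the arguments themselves from κ and s, per Definition 7, is not shown, and the example framework at the end is a made-up toy.

```python
from typing import Dict, Set, Tuple

Argument = str                       # opaque argument identifiers
Attack = Tuple[Argument, Argument]   # (attacker, attacked)

def grounded_extension(args: Set[Argument], attacks: Set[Attack]) -> Set[Argument]:
    """Iterative construction of Definition 10: Δ collects accepted arguments,
    Ω holds the arguments still under consideration."""
    delta: Set[Argument] = set()
    omega: Set[Argument] = set(args)
    changed = True
    while changed:
        changed = False
        # move into Δ any argument not attacked by anything left in Ω
        for a in list(omega):
            if not any((b, a) in attacks for b in omega):
                omega.remove(a)
                delta.add(a)
                changed = True
        # drop from Ω any argument attacked by an accepted argument
        for a in list(omega):
            if any((b, a) in attacks for b in delta):
                omega.remove(a)
                changed = True
    return delta

def entailed_literals(delta: Set[Argument], conclusion: Dict[Argument, str]) -> Set[str]:
    """Definition 11, restricted to reading off the literals concluded by Δ."""
    return {conclusion[a] for a in delta}

if __name__ == "__main__":
    # a toy framework: A2 attacks A1, A3 attacks A2, nothing attacks A3
    args = {"A1", "A2", "A3"}
    attacks = {("A2", "A1"), ("A3", "A2")}
    delta = grounded_extension(args, attacks)
    print(delta)                                                          # {'A1', 'A3'}
    print(entailed_literals(delta, {"A1": "p", "A2": "~p", "A3": "q"}))   # {'p', 'q'}
```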

Perhaps counter to expectation, this natural translation to argumentation does not preserve the set of entailed formulas.

Theorem 5 (Incomparability with Argumentation Semantics). There exists a knowledge base κ, a percept s, and a formula ψ such that: (i) κ is resolute on s, and (κ,s) ||= ¬ψ, (ii) the argumentation framework ⟨A, R⟩ associated with κ and s is well-founded, and ⟨A, R⟩ entails ψ.

Proof. Consider a knowledge base κ with the rules

r1 : ⊤ ⇝ a    r2 : ⊤ ⇝ b    r3 : ⊤ ⇝ c    r4 : c ⇝ ¬b
r5 : b ⇝ ¬a    r6 : b ⇝ d    r7 : d ⇝ ¬a    r8 : ¬a ⇝ d

and the priorities r4 ≻ r2, r5 ≻ r1, r7 ≻ r1. Consider the percept s = ∅, on which κ is resolute. Indeed, trace(κ,s) equals ∅, {a, b, c}, {¬a, ¬b, c, d}, {¬a, ¬b, c, d}, ..., and front(κ,s) = {{¬a, ¬b, c, d}}. Clearly, (κ,s) ||= ¬a.

Consider, now, the set Δ = {A1, A2} with the arguments A1 = {⟨r3, c⟩, ⟨r4, ¬b⟩}, A2 = {⟨r1, a⟩}. Observe that no argument A3 is such that ⟨A3, A1⟩ ∈ R. Furthermore, any argument A4 such that ⟨A4, A2⟩ ∈ R includes either ⟨r5, ¬a⟩ or ⟨r7, ¬a⟩, and necessarily ⟨r2, b⟩. But, then ⟨A1, A4⟩ ∈ R. Hence, Δ is a subset of the grounded extension of the argumentation framework ⟨A, R⟩. Clearly, ⟨A, R⟩ entails a. Note, also, that ⟨A, R⟩ is well-founded. The claim follows immediately by letting ψ = a.

At a conceptual level, the incomparability — even assuming simultaneously a resolute knowledge base and a well-founded argumentation framework — can be traced back to the fact that the argumentation semantics does not directly lend itself to a naturalistic reasoning algorithm. Unlike the original semantics, the process of considering arguments and attacks to compute the grounded extension proceeds in a very skeptical and rigid manner. It meticulously chooses which argument to include in the grounded extension, and thus which new inferences to draw, after ensuring that the choice is globally appropriate and will not be later retracted.

We conjecture that under additional natural assumptions, the two considered semantics coincide. But more important are the cases where any two given semantics diverge. Even if we were to accept that the grounded extension semantics is the "right" way to draw inferences, the goal here is not to define semantics that capture some ideal notion of reasoning, but semantics that are naturalistic and close to human cognition. Ultimately, only psychological experiments can reveal which semantics is more appropriate in this regard.

Learning Semantics

The knowledge base used for reasoning encodes regularities in the environment, which can be naturally acquired through a process of learning. An agent perceives the environment, and through its partial percepts attempts to identify the structure in the underlying states of the environment. How can the success of the learning process be measured and evaluated?

Given a set P of atoms, the P-projection of a scene s is the scene sP := {λ | λ ∈ s and (λ ∈ P or ¬λ ∈ P)}; the P-projection of a set S of scenes is the set SP := {sP | s ∈ S}.

Definition 12 (Projected Resolute). Given a knowledge base κ, a percept s, and a set P of atoms, κ is P-resolute on s if the P-projection of front(κ,s) is a singleton set; then, the unique scene in the P-projection of front(κ,s) is the P-resolute conclusion of κ on s.

Definition 13 (Projected Complete). Given a knowledge base κ, a percept s, and a set P of atoms such that κ is P-resolute on s, and si is the P-resolute conclusion of κ on s, κ is P-complete on s if si specifies every atom in P.

Definition 14 (Projected Accurate). Given a knowledge base κ, a coherent subset S of scenes, a percept s, and a set P of atoms, such that κ is P-resolute on s, and si is the P-resolute conclusion of κ on s, κ is P-accurate on s against S if every atom that is true (resp., false) in si is either true (resp., false) or unspecified in each scene in the subset S.

The notions above can then be used to evaluate the performance of a given knowledge base on a given environment.

Definition 15 (Knowledge Base Evaluation Metrics). Given an environment ⟨dist, perc⟩ ∈ E, and a set P of atoms, a knowledge base κ is (1 − ε)-resolute, (1 − ε)-complete, or (1 − ε)-accurate on ⟨dist, perc⟩ with focus P if with probability 1 − ε an oracle call to ⟨dist, perc⟩ gives rise to a state t being drawn from dist and a scene s being drawn from perc(t) such that, respectively, κ is P-resolute on s, κ is P-complete on s, or κ is P-accurate on s against S, where S is the coherent subset of scenes determined by perc(t).

It would seem unrealistic that one universally-appropriate tradeoff between these evaluation metrics should exist, and that a learner should strive for a particular type of knowledge base independently of context. Nonetheless, some guidance is available. Formal results in previous work (Michael 2008; 2014) show that one cannot be expected to provide explicit completeness guarantees when learning from partial percepts, and that one should focus on accuracy, letting the reasoning process, with the multiple rules it employs, improve completeness to the extent allowed by the perception process perc that happens to be available. This view is formalized in the cited works, as an extension of the Probably Approximately Correct semantics for learning (Valiant 1984).

The seemingly ill-defined requirement to achieve accuracy against the unknown subset S — effectively, the underlying state t of a percept s — can be achieved optimally, in a defined sense, by (and only by) ensuring that the inferences that one draws are consistent with the percept itself (Michael 2010).
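These projection-based notions translate directly into checks over literal-set scenes. The sketch below takes the inference frontier and the reference subset S as given, and restricts attention to the set-of-literals scene representation; the function names are illustrative, and the probabilistic metrics of Definition 15 would simply estimate how often these checks succeed over repeated oracle calls.

```python
from typing import FrozenSet, Set

Literal = str
Scene = FrozenSet[Literal]

def atom_of(l: Literal) -> str:
    return l[1:] if l.startswith("~") else l

def neg(l: Literal) -> Literal:
    return l[1:] if l.startswith("~") else "~" + l

def project(s: Scene, P: Set[str]) -> Scene:
    # the P-projection of a scene: keep only literals over atoms in P
    return frozenset(l for l in s if atom_of(l) in P)

def p_resolute(front: Set[Scene], P: Set[str]) -> bool:
    return len({project(s, P) for s in front}) == 1

def p_resolute_conclusion(front: Set[Scene], P: Set[str]) -> Scene:
    assert p_resolute(front, P)
    return project(next(iter(front)), P)

def p_complete(front: Set[Scene], P: Set[str]) -> bool:
    si = p_resolute_conclusion(front, P)
    return all(a in si or "~" + a in si for a in P)

def p_accurate(front: Set[Scene], P: Set[str], S: Set[Scene]) -> bool:
    # no literal inferred over P may be contradicted by any scene in S
    si = p_resolute_conclusion(front, P)
    return all(neg(l) not in t for l in si for t in S)

if __name__ == "__main__":
    front = {frozenset({"Bird", "~Flying", "Penguin"})}
    P = {"Flying", "Feathers"}
    S = {frozenset({"Bird", "Penguin", "~Flying", "~Feathers"})}
    print(p_resolute(front, P), p_complete(front, P), p_accurate(front, P, S))
    # True False True: Flying is resolved accurately; Feathers is left unspecified
```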

In the language of this work, as long as the rules used are not exogenously qualified during the reasoning process, one can safely assume that the inferences drawn on atoms not specified in a percept are optimally accurate against S. Although the reasoning process can cope with exogenous qualification, this feature should be used only as a last resort and in response to unexpected / exceptional circumstances. It is the role of the learning process to minimize the occurrences of exogenous qualifications, and to turn them into endogenous qualifications, through which the agent internally can explain why a certain rule failed to draw an inference.

Examining learnability turns out to provide arguments for and against the proposed perception and reasoning semantics and knowledge representation. On the positive side, it is known that learning in settings where the relevant atoms are not determined upfront remains possible for many learning problems, and enjoys naturalistic learning algorithms (Blum 1992). Priorities between implications can be identified by learning default concepts (Schuurmans and Greiner 1994) or exceptions (Dimopoulos and Kakas 1995). On the negative side, partial percepts hinder learnability, with even decision lists (hierarchical exceptions, bundled into single equivalences) being unlearnable under typical worst-case complexity assumptions (Michael 2010; 2011b). Noisy percepts can also critically hinder learnability (Kearns and Li 1993). Back on the positive side, environments without adversarially chosen partial and noisy percepts undermine the non-learnability results. The demonstrable difference of a collection of prioritized implications from a single equivalence further suggests that the non-learnability of the latter need not carry over to the former. Back on the negative side again, learning from partial percepts cannot be decoupled from reasoning, and one must simultaneously learn and predict to get highly-complete inferences (Michael 2014). Efficiency concerns, then, impose restrictions on the length of the inference trace, which, fortuitously, can be viewed in a rather positive light as being in line with psychological evidence on the restricted depth of human reasoning (Balota and Lorch 1986).

Overall, the proposed semantics and representation would seem to lie at the very edge between what is and what is not (known to be) learnable. This realization can be viewed as favorable evidence for our proposal, since, one could argue, evolutionary pressure would have pushed for such an optimal choice for the cognitive processing in humans as well.

A Cognitive System

In line with our analysis so far, we give here some desiderata for the design and development of cognitive systems.

A cognitive system should have (D1) perpetual and sustainable operation, without suppositions on the existence of bounded collections of atoms or rules with which the system deals, nor on the existence of a specified time-horizon after which the system's behavior changes. In particular, a cognitive system should undergo (D2) continual improvement and evaluation, without designated training and testing phases, but rather with the ability to improve — if not monotonically, then roughly so — its performance (cf. Definition 15). Acquisition of knowledge in a cognitive system should (D3) employ autodidactic learning, avoiding to the extent possible the dependence on external supervision (Michael 2008; 2010). The constituent processes for perception, reasoning, and learning should be (D4) integrated in a holistic architecture, naturally interacting with, and feeding from, each other (Michael 2014). Lastly, a cognitive system's workings should (D5) avoid rigidness and be robust to moderate perturbations of its current state, allowing the system to gracefully recover from externally / internally-induced errors.

Embracing these desiderata, we have implemented a prototype cognitive system that perpetually obtains a new percept, reasons to draw inferences, evaluates its performance, and learns. Learning introduces new rules when new atoms are encountered, gradually corrects existing rules that are exogenously qualified to maintain accuracy, and produces mutations of rules to improve completeness. Rules that are introduced / produced are always assigned lower priority than existing rules to aid in completeness without affecting accuracy as a side-effect, and are maintained in a sandbox part of the memory until sufficient confidence is gained for their appropriateness. Rules that remain inapplicable for sufficiently long are removed through a process of garbage collection.

We are currently evaluating and comparing variants of the prototype system empirically, as a means to explore further the semantics proposed in this work, and to guide the formulation of results that one may prove in relation to the system's operation. Initial empirical evidence indicates that the learned knowledge bases are almost 1-resolute, and achieve surprisingly high completeness and accuracy, given the minimal structure that is available in the simulated environment that we have been using. The learned knowledge bases seem to comprise significantly more rules than one might have hoped for, an indication that a more aggressive garbage collection and / or a less aggressive introduction / production of new rules might be warranted; or, perhaps, an indication that when playing it fast and loose, the plurality of concepts and associations is not something to be dismissed, but embraced.
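The prototype's operating cycle, as described above, can be summarized in a loop skeleton. The following sketch is only a schematic rendering of that description: the environment interface (sense), the reasoning routine (reason), the learning heuristics, and all thresholds are invented placeholders, and rule priorities and rule mutation are omitted entirely.

```python
import random
from typing import Callable, FrozenSet, List, Tuple

Literal = str
Scene = FrozenSet[Literal]

def neg(l: Literal) -> Literal:
    return l[1:] if l.startswith("~") else "~" + l

class CandidateRule:
    """A rule plus the bookkeeping the described learning loop needs."""
    def __init__(self, body: Tuple[Literal, ...], head: Literal):
        self.body, self.head = body, head
        self.sandboxed = True     # promoted once enough confidence is gathered
        self.hits = 0             # times applied without being contradicted
        self.idle = 0             # steps since the rule was last applicable

def cognitive_loop(sense: Callable[[], Scene],
                   reason: Callable[[List[CandidateRule], Scene], Scene],
                   steps: int = 1000) -> List[CandidateRule]:
    rules: List[CandidateRule] = []
    seen_atoms: set = set()
    for _ in range(steps):                      # (D1): perpetual operation, truncated here
        percept = sense()                       # perceive
        conclusion = reason(rules, percept)     # reason (e.g., step operator to a frontier)
        for r in rules:                         # evaluate each rule against the percept
            if all(l in conclusion for l in r.body):
                r.idle = 0
                if neg(r.head) in percept:      # exogenously qualified: penalize the rule
                    r.hits -= 2
                else:
                    r.hits += 1
                    r.sandboxed = r.hits < 10   # promote once sufficiently confident
            else:
                r.idle += 1
        for l in percept:                       # learn: new atoms spawn candidate rules
            atom = l.lstrip("~")
            if atom not in seen_atoms:
                seen_atoms.add(atom)
                context = [x for x in percept if x != l]
                if context:
                    rules.append(CandidateRule((random.choice(context),), l))
        rules = [r for r in rules if r.idle < 100]   # garbage-collect inapplicable rules
    return rules
```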
Further Thoughts

For everyday cognitive tasks (as opposed to problem solving tasks), humans resort to fast thinking (Kahneman 2011), for a form of which we have offered a first formalization herein. Closely related is a neuroidal architecture that uses relational implications and learned priorities (Valiant 2000a), but does not examine the intricacies of reasoning with learned rules on partial percepts. Extending our framework to relational rules can proceed via known reductions (Valiant 2000b). Employing prioritized rules also, recent work on story understanding focuses on temporal aspects of perception and reasoning, which seem to call for more elaborate semantics (Michael 2013b; Diakidoy et al. 2014). Priorities are present in conditional preference networks, and work on learning the latter could be brought to bear (Dimopoulos, Michael, and Athienitou 2009; Michael and Papageorgiou 2013).

Systems that extract facts or answers from the Web are available (Carlson et al. 2010; Ferrucci et al. 2010), but work on endowing machines with websense (Michael 2013a) is considerably closer to the goal of acquiring human-readable rules (Michael and Valiant 2008; Michael 2008; 2009; 2010; 2014). Extending the framework to learn causal knowledge can build on top of existing work (Michael 2011a).

The desideratum for continual improvement and evaluation relates to the view of evolution as constrained learning (Valiant 2009), which may prove useful for designing monotonically improving (Michael 2012) learning processes.

In mechanizing human cognition, it might be that getting the behavior right offers too little feedback (Levesque 2014), and that looking into the human psyche is the way to go.

References

Balota, D. A., and Lorch, R. F. 1986. Depth of Automatic Spreading Activation: Mediated Priming Effects in Pronunciation but Not in Lexical Decision. Journal of Experimental Psychology: Learning, Memory, and Cognition 12(3):336–345.
Blum, A. 1992. Learning Boolean Functions in an Infinite Attribute Space. Machine Learning 9(4):373–386.
Carlson, A.; Betteridge, J.; Kisiel, B.; Settles, B.; Hruschka Jr., E. R.; and Mitchell, T. M. 2010. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI 2010). Atlanta, GA, U.S.A.: AAAI Press.
Collins, A. M., and Loftus, E. F. 1975. A Spreading-Activation Theory of Semantic Processing. Psychological Review 82(6):407–428.
Diakidoy, I.-A.; Kakas, A. C.; Michael, L.; and Miller, R. 2014. Story Comprehension through Argumentation. In Proceedings of the 5th International Conference on Computational Models of Argument (COMMA 2014), volume 266 of Frontiers in Artificial Intelligence and Applications, 31–42. Scottish Highlands, U.K.: IOS Press.
Dimopoulos, Y., and Kakas, A. 1995. Learning Non-Monotonic Logic Programs: Learning Exceptions. In Proceedings of the 8th European Conference on Machine Learning (ECML 1995), volume 912 of LNAI, 122–137. Berlin: Springer.
Dimopoulos, Y.; Michael, L.; and Athienitou, F. 2009. Ceteris Paribus Preference Elicitation with Predictive Guarantees. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), 1890–1895.
Dung, P. M. 1995. On the Acceptability of Arguments and its Fundamental Role in Nonmonotonic Reasoning, Logic Programming, and n-Person Games. Artificial Intelligence 77(2):321–357.
Ferrucci, D.; Brown, E.; Chu-Carroll, J.; Fan, J.; Gondek, D.; Kalyanpur, A. A.; Lally, A.; Murdock, J. W.; Nyberg, E.; Prager, J.; Schlaefer, N.; and Welty, C. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine 31(3):59–79.
Kahneman, D. 2011. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.
Kearns, M. J., and Li, M. 1993. Learning in the Presence of Malicious Errors. SIAM Journal on Computing 22(4):807–837.
Levesque, H. J. 2014. On Our Best Behaviour. Artificial Intelligence 212:27–35.
McCarthy, J.; Minsky, M. L.; Rochester, N.; and Shannon, C. E. 1955. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence. Report, Massachusetts Institute of Technology, A.I. Lab, Cambridge, MA, U.S.A.
McCarthy, J. 1998. Elaboration Tolerance. In Working notes of the 4th International Symposium on Logical Formalizations of Commonsense Reasoning (Commonsense 1998), 198–216.
Michael, L., and Papageorgiou, E. 2013. An Empirical Investigation of Ceteris Paribus Learnability. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013). Beijing, China: IJCAI/AAAI.
Michael, L., and Valiant, L. G. 2008. A First Experimental Demonstration of Massive Knowledge Infusion. In Proceedings of the 11th International Conference on Principles of Knowledge Representation and Reasoning (KR 2008), 378–389. Sydney, Australia: AAAI Press.
Michael, L. 2008. Autodidactic Learning and Reasoning. Ph.D. Dissertation, School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, U.S.A.
Michael, L. 2009. Reading Between the Lines. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), 1525–1530.
Michael, L. 2010. Partial Observability and Learnability. Artificial Intelligence 174(11):639–669.
Michael, L. 2011a. Causal Learnability. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011), 1014–1020. Barcelona, Catalonia, Spain: IJCAI/AAAI.
Michael, L. 2011b. Missing Information Impediments to Learnability. In Proceedings of the 24th Annual Conference on Learning Theory (COLT 2011), volume 19 of JMLR Proceedings, 825–828. Budapest, Hungary: JMLR.org.
Michael, L. 2012. Evolvability via the Fourier Transform. Theoretical Computer Science 462(1):88–98.
Michael, L. 2013a. Machines with WebSense. In Working notes of the 11th International Symposium on Logical Formalizations of Commonsense Reasoning (Commonsense 2013).
Michael, L. 2013b. Story Understanding... Calculemus! In Working notes of the 11th International Symposium on Logical Formalizations of Commonsense Reasoning (Commonsense 2013).
Michael, L. 2014. Simultaneous Learning and Prediction. In Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR 2014). Vienna, Austria: AAAI Press.
Miller, G. A. 1956. The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychological Review 63(2):81–97.
Schuurmans, D., and Greiner, R. 1994. Learning Default Concepts. In Proceedings of the 10th Canadian Conference on Artificial Intelligence (AI 1994), 99–106.
Valiant, L. G. 1984. A Theory of the Learnable. Communications of the ACM 27(11):1134–1142.

Valiant, L. G. 2000a. A Neuroidal Architecture for Cognitive Computation. Journal of the ACM 47(5):854–882.
Valiant, L. G. 2000b. Robust Logics. Artificial Intelligence 117:231–253.
Valiant, L. G. 2006. A Quantitative Theory of Neural Computation. Biological Cybernetics 95(3):205–211.
Valiant, L. G. 2009. Evolvability. Journal of the ACM 56(1):3:1–3:21.
