Knowledge Representation: How Far Have We Come?
Daniel Khashabi, University of Illinois, Urbana-Champaign

TechReport, Spring 2015

“Concepts are the glue that holds our mental world together” – Gregory Murphy

Abstract

The theory of Knowledge Representation has been one of the central focuses of Artificial Intelligence research over the past 60 years, and has seen numerous proposals and debates, ranging from the very philosophical and theoretical level to practical issues. Although a lot of work has been done in this area, we believe it currently does not receive enough attention from the research community, except from very limited perspectives. The goal of this work is to give a unified overview of a set of select works on the foundations of knowledge representation and exemplar practical advancements.

1 Introduction

Knowledge Representation (KR) is the subfield of Artificial Intelligence (AI) devoted to formalizing information about the world in a form that computer systems can use in order to solve complex tasks, such as a robot moving objects in an environment, or answering a question posed in natural language. The work in knowledge representation goes beyond creating formalisms for information, and includes issues like how to acquire, encode and access it, which is sometimes called knowledge engineering. Knowledge acquisition is the process of identifying and capturing the relevant knowledge, according to the representation. The discussion of knowledge representation has a long history in AI and ranges from very fundamental problems, like whether knowledge representation is necessary for an AI system at all, to relatively practical questions, like whether classical logical representation is complete. Many of the ideas in representation stemmed from observations in psychology about how humans solve problems. The multiple trends in interpreting the form of understanding, comprehension and reasoning in the human mind have created a diverse set of options for knowledge representation.

In this work we give a relatively comprehensive review of the long-standing works on knowledge representation, with emphasis on natural language systems. The review starts from relatively fundamental issues of representation, and continues to exemplary practical advancements. Due to the close ties between representation and reasoning, various reasoning frameworks have appeared; we briefly explore them in conjunction with representation techniques.

Key terms: Before starting our main conversation we define the terminology we will be using in this summary. We start with the proposition.

Proposition: Propositions are judgments or opinions which can be true or false. A proposition is not necessarily a sentence, although a sentence can express a proposition. For example, the sentence “Cats cannot fly” contains the proposition that cats are not able to fly. Similarly, the sentence “Cats are unable to fly” and the declarative phrase in logic, can-fly(cats)=false, convey the same proposition. A claim or a sentence can contain multiple atomic propositions; for example, “Many think that cats can fly”.

Concept: Propositions can be about any physical object (like a tree, a bicycle, etc.) and any abstract idea (like happiness, thought, betrayal, etc.), all of which are (usually) called concepts.

Belief: Belief is an expression of faith and/or trust in a proposition, although a belief might not necessarily be true. For example, “Homer believed that the earth is flat” is a proposition which contains the belief of the “earth” being “flat”, which is not true. “The weather forecast predicts a tornado for tomorrow.” is another proposition which expresses another belief, one which might turn out to be true or false.

Cognition: The discussion of knowledge representation is closely related to the mental picture of the objects and concepts around us. Cognition is the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses. Although cognition involves abstract mental representations of concepts, it can have an outer physical source (like observation, hearing, touching, etc.), an inner source (like thinking), or a combination of inner and outer sources.

Knowledge: Knowledge is information, facts, understanding, and skills acquired through experience or education; the theoretical or practical understanding of a subject. The philosophical nature of knowledge has long been studied in Epistemology (Steup, 2014), where it has been classified into different categories based on its nature, and its inherent connection to belief, truth, justification, etc. is analyzed.

Representation: To solve many problems there is a need to use some form of information which might be implicit in the problem definition, or even not mentioned; therefore this knowledge must be provided to the solver (which can be a computer or any other machine) in order to be used when needed. In other words, a representation is a surrogate for a concept, belief, knowledge or proposition. For example, the number 5 could be represented with the string “5”, with the bits 101, with the Roman numeral “V”, etc.; a small sketch at the end of this section makes this concrete. As part of designing a program to solve a problem, we must define how to represent the required knowledge. The theory of Knowledge Representation is concerned with the formalization of how to represent propositions, beliefs, etc. A representation scheme defines the form of the knowledge saved in a knowledge base. Any representation is an abstraction of the world and its objects. The abstraction level of a representation might be too detailed, or too coarse with only high-level information.

Reasoning: Reasoning is the heart of AI: the decision-making system for solving a problem, or the action of thinking about something in a logical and sensible way in order to form a conclusion.

Figure 1: A conventional structure of an AI system. The “intermediate” representation is the middleman between the reasoning engine and the actual input information. In addition to the information in the input, for many problems the reasoning engine demands knowledge beyond the problem definition itself, which is provided by a knowledge base. Therefore the discussion of representation can be about either the input information, or the internal knowledge of the reasoning system.

Why Knowledge Representation?: It is not a trivial question to answer whether having knowledge representation is necessary or not. In a lot of problems, since dealing with the raw input/output complicates the reasoning stage, researchers have historically preferred to devise an intermediate input/output to be a middleman between the natural information and the reasoning engine (Figure 1). Therefore the need for an intermediate level seems to be essential. In addition, in many problems there is a significant amount of knowledge which either is not mentioned or is implicit. Somehow this extra information needs to be provided to the reasoning system, which is usually modeled as a knowledge base. The issue of representation applies to both the input-level information and the internal knowledge of the reasoning system.

Such separation of the knowledge from the reasoning system is inherited from the very early AI projects, and nowadays many accept it as the right way of modeling problems. Whether this is definitely the right way or not is debated; at least it is the popular way of doing it. We refer to some of the relevant debates in the forthcoming sections.
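
To make the surrogate idea concrete, here is a minimal Python sketch (our own illustration with made-up names, not taken from any system discussed in this report): one and the same piece of knowledge stored under several interchangeable representations.

    # One proposition, several surrogate representations of it.
    # The choice of surrogate changes what is easy to compute, not what is true.
    number_as_string = "5"      # string surrogate
    number_as_bits = 0b101      # binary surrogate (the integer 5)
    number_as_roman = "V"       # Roman-numeral surrogate

    # A proposition as a logical atom, mirroring can-fly(cats)=false:
    knowledge_base = {("can-fly", "cats"): False}

    def holds(predicate, entity):
        """Look up the truth value of an atomic proposition, if known."""
        return knowledge_base.get((predicate, entity))

    assert number_as_bits == 5
    assert holds("can-fly", "cats") is False
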

2 A brief history of Knowledge Representation

In the early history of AI, there was a big series of oppositions to the idea of “abstract representation of thought” among psychology researchers. Perhaps one of the earliest works that created a separate representation of the knowledge from the solver engine was the General Problem Solver (GPS) system of Newell and Simon (1961). This system was intended to be a universal solver for any problem which could be formalized in symbolic form, like chess playing or theorem proving. The idea of symbolic reasoning had been developed centuries earlier by Descartes, Pascal, Leibniz and some other pioneers in the philosophy of mind. The use of symbols as the fundamental elements of the representation, known as the physical symbol system hypothesis (Newell and Simon, 1961), is a debatable assumption which has continued its way until now.

After the early scalability issues of GPS, it soon became clear that a system designed for solving any formalized problem is impractical, due to the combinatorial explosion of intermediate states for numerous problems. Also, it became apparent early on that one of the main problems was how to represent the knowledge needed to solve a problem. Since then, the trend moved towards limiting the scope and focusing on specific problems and their properties. As a result, the family of symbolic and logical representations, such as propositional and 1st-order logic, gained popularity, and interesting applications came out (McCarthy and Hayes, 1968). Similarly, the increased focus on specific applications gave rise to the popularity of Expert Systems (i.e. systems which are good on a specific set of problems, rather than everything) (Hayes-Roth et al., 1984). For example, the STUDENT program of Bobrow (1964), written in LISP, could read and solve high school algebra problems which were expressed in natural language; the SHRDLU system of Winograd (1971), with a restricted natural language world model, could discuss and perform tasks in a simulated Blocks World, a famous planning domain with a set of objects (e.g. cubes, etc.) on a table, where the goal is to build stacks of blocks, given natural language inputs. DENDRAL (Buchanan and Feigenbaum, 1978) was one of the first working expert systems for hypothesis formation and discovery of unknown organic molecules, using its knowledge of chemistry.

There was a parallel trend inspired by neurons in the human brain (Rosenblatt, 1958). This movement started by modeling mental or behavioral phenomena as emergent from interconnected networks of simple units. It lost many of its fans after Minsky and Papert (1969) showed fundamental limitations of shallow networks in approximating some functions. However, a series of subsequent events gave new energy to neurally inspired models: Rumelhart et al. (1988a) found a formalized way to train networks with more than one layer; Funahashi (1989) showed the universal approximation property for feedforward networks, i.e. that any continuous function on the real numbers can be uniformly approximated by neural networks; and Rumelhart et al. (1988b) emphasized the parallel and distributed nature of processing, which gave rise to the name “connectionism”.

Much progress has been made, but many problems are still unsolved to a large extent. Some issues are far more foundational in AI, like the claims of authors that human-level reasoning is not achievable via purely computational means; e.g. (Dreyfus, 1992; Searle, 1980; Boden, 1996). Similarly, on the representation level there are many issues that are still subject to debate. One example is the frame problem (McCarthy and Hayes, 1968) in logical representation, which is the need for a compact way of expressing the states which do not change in an environment as a result of some actions. For example, coloring a table definitely does not change its position, but when using classical logical reasoning, the independence between the ‘color’ property and the ‘location’ property needs to be explicitly encoded. Rather than encoding exceptions in the knowledge base, there have been proposals in the form of non-monotonic reasoning for the frame problem. In nonmonotonic reasoning it is possible to jump to a conclusion and retract some of the conclusions previously made as further information becomes available (which is the reason it is called non-monotonic; see the sketch below). Despite the initial promise of nonmonotonic reasoning methods, many such reasoning frameworks suffered from the issue of consistency, i.e. ensuring that conclusions drawn are consistent with one another and with the assumptions. Some other works group properties into independent categories to encode the causal independence of actions and properties; for example, grouping properties with temporal and spatial boundaries (Hayes, 1995).
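
The nonmonotonic pattern can be sketched in a few lines of Python (a toy illustration of default reasoning in general, not of any particular nonmonotonic formalism): a default rule licenses a conclusion which is retracted once contradicting information arrives.

    # Default rule: birds fly, unless an exception is known.
    exceptions = set()

    def flies(bird):
        # Jump to the default conclusion in the absence of contrary evidence.
        return bird not in exceptions

    print(flies("tweety"))      # True: concluded by default
    exceptions.add("tweety")    # new information: tweety is a penguin
    print(flies("tweety"))      # False: the earlier conclusion is retracted
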

Minsky and Fillmore were among the first to propose frame-based representation (Minsky, 1974; Fillmore, 1977), which resulted in resources like FrameNet (Baker et al., 1998) and systems like KRL (Bobrow and Winograd, 1977). Frames can be seen as a restricted form of 1st-order knowledge (Fikes and Kehler, 1985). A frame consists of a group of slots and fillers that define a stereotypical object or activity. A slot can contain values such as rules, facts, images, video, procedures, or even another frame (Fikes and Kehler, 1985). Frames can be organized hierarchically, where default values are inherited directly from parent frames, as in the sketch below.

Frames later evolved into other representational forms, many of which are strict subsets of 1st-order logic, and some of which are decidable and closely related to other formalisms such as modal logics (Schild, 1991; Schmidt-Schauß and Smolka, 1991). One such extension is Description Logics (also called concept languages or attributive description languages (Schmidt-Schauß and Smolka, 1991)), which model the declarative part of frames using a logic-based semantics. One of the reasons for this extension was adding more mathematical rigor to the hierarchical definition of knowledge. Description Logics (Borgida et al., 1989) guarantee polynomial-time decision algorithms by using operators on concept descriptions (in contrast to the use of quantifiers in 1st-order logic).
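
The slot-and-filler structure with inherited defaults can be sketched in a few lines of Python; this is a toy illustration of the general idea, not the data model of KRL or any other specific frame system.

    class Frame:
        """A frame: named slots with fillers, plus default inheritance."""
        def __init__(self, name, parent=None, **slots):
            self.name, self.parent, self.slots = name, parent, slots

        def get(self, slot):
            # Use the local filler if present; otherwise inherit the
            # default value from the parent frame, transitively.
            if slot in self.slots:
                return self.slots[slot]
            return self.parent.get(slot) if self.parent else None

    bird = Frame("bird", covering="feathers", can_fly=True)
    chicken = Frame("chicken", parent=bird, can_fly=False)  # exception overrides default

    print(chicken.get("covering"))  # "feathers", inherited from bird
    print(chicken.get("can_fly"))   # False, local filler shadows the default
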
What does a good knowledge representation look like? Many researchers have given somewhat different answers. In the following sections we discuss some of the fundamental issues and research questions regarding representation and related problems. Each section covers a different issue, although there might be overlaps between the issues.

3 Is knowledge representation necessary for AI systems?

The use of a representation layer is a common trend among AI researchers. However, (Agre and Chapman, 1987; Brooks, 1990; Brooks, 1991) are among the few works which oppose the common trend. Rather than modeling intelligence via symbolic representation, this approach aims to use real-time interaction with the environment to generate viable responses. The big motivation for the design of such situated behavior in robots is that most human activity is concept-free, consisting simply of reactions to changes in the environment (Suchman, 1987; Maes, 1990); for example running, avoiding collisions, etc. Brooks argues that rather than defining knowledge for the robot, it should be able to obtain the behavior via interaction with its environment, since “the world is its own best model”. Brooks also argues that human-like intelligence should be evolved via interaction with the environment, rather than defined directly with representation. In other words, unlike the common top-down approach, “intelligence is determined by the dynamics of interaction with the world”. Brooks’ subsumption architecture became successful in applications that needed real-time interaction with a dynamic environment.

Brooks (1991) realizes his ideas with the Mobots, which are constructed by linking small Finite State Machines (FSMs). There is an underlying assumption that the Mobot structure can scale up to gain robustness in performance by overlaying more and more specialized mechanisms. Clearly, with such a design, intermediate representations (as the states of the FSMs) are inevitable; however, Brooks avoids using any direct declarative representation. Although this idea found success in some applications, it did not gain much attention in the majority of AI tasks.

Kirsh (1991) and Etzioni (1993) are among the works opposing Brooks’ idea that most human activity is concept-free and that AI tasks could be resolved independently of having a direct declarative representation. First, from a practical perspective, each FSM needs to be tuned to the right stimuli so as to allow the world senses to work properly. This, combined with the implicit representation (the state memory of the FSMs), can be considered as playing the role of declarative knowledge.

In addition, the design of FSMs for tasks that demand intricate reasoning about the future is very hard and limited; one obstacle is therefore the lack of a large memory for representation, which seems to be essential for tasks like natural language understanding.

In addition, there are many tasks which cannot be performed based on immediate sensory information alone, but instead require a combination of perception, reasoning, recalling, etc. The decision that a chicken is not a dangerous animal, while an eagle might be, rests in part on the knowledge that chickens are domesticated while eagles are wild. This is despite the fact that there are many similarities between the two birds in terms of sensory input: wings, feathers, beaks, claws. Other tasks may require reasoning or making predictions about the actions of other agents, such as when attempting to dribble past a defender in soccer.

4 Representing Degrees of Beliefs

Is uncertainty with respect to propositions important? Or can we create a complete AI system without taking uncertainty into account? It seems that in the real world, almost any information is subject to uncertainty. Therefore all reasoning problems are pervaded with uncertainty and doubt, which incessantly change along with interactions with the environment. There seem to be many reasons for using probabilistic models for reasoning, although this approach still has its opponents. Uncertainty may be a result of the uncertain nature of events, disagreement between different facts, or inaccurate or incomplete information. Linguistic imprecision is a very common reason for the occurrence of uncertainty in problems, and unfortunately there is not much direct literature on it, unlike other sources of uncertainty.

How should beliefs with respect to propositions be represented? Is uncertainty deterministically quantifiable, or does it need to be modeled with another level of uncertainty? The very nature of uncertainty has been the subject of epistemological studies for centuries. There are various forms of quantitative uncertainty, i.e. assigning numerical values that express the degree to which we are uncertain about pieces of knowledge. During the eighties there was a big movement to bring in models which support uncertainty for reasoning; for example (Pearl, 1986; Cheeseman, 1985; Zadeh, 1989; Nilsson, 1986; Darwiche and Goldszmidt, 1994; Spohn, 1988). For each of these methods, there are debates over the assumptions, appropriateness and efficiency when used in the reasoning procedure.

Probability Theory is undeniably the most well-known way of representing uncertainty. There is a plethora of works that use probability functions, whether inside logical models (Nilsson, 1986), Graphical Models (Pearl, 1986; Lauritzen, 1996), etc. Probability can be viewed as a generalization of classical propositional logic from a binary domain to a bounded continuous domain. There are rigorous studies on the agreement between logical representation and probability theory (Cox, 1961; Jaynes, 2003). Despite the popularity of Probability Theory, there are significant limitations and difficulties in applying probabilities, which has resulted in the development of other variations and heuristic techniques for approximating it. There is extensive experimental research in behavioral decision science which verifies that mental judgments often align with probabilistic rules, or show an approximation of them. However, there are also results which show systematic bias in decision making and strict deviations from the norms of probability theory (Tversky and Kahneman, 1974), which is another obstacle to using probability-theoretic models. There is much disagreement on the interpretation of probability functions as a “degree of belief” (subjective), versus the “long run frequency” (frequentist or objective) interpretation, etc. (Hájek, 2012).
Bayesianism, which currently plays a big role in probability theory, uses the subjective interpretation: a prior expresses the amount of belief in a proposition, and combining it with new evidence results in a posterior probability.

Dempster-Shafer theory (DS) (Shafer, 1976) can be seen as a generalization of Bayesian theory. It represents uncertainty with belief functions, which are a way of representing epistemic plausibilities and are not necessarily the same as probability functions. DS also provides a way of combining beliefs. From a computational point of view, higher-order representations such as Dempster-Shafer are harder to work with than probabilistic models.
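
The contrast can be made concrete with a small sketch (toy numbers, our own illustration): a Bayesian update turns a prior degree of belief into a posterior, while Dempster's rule combines two belief (mass) functions over the frame {fly, not-fly}, allowing mass on an "unknown" set, something a single probability function cannot express.

    # Bayesian update: posterior is proportional to prior times likelihood.
    prior = 0.01                      # P(H): prior belief in hypothesis H
    p_e_given_h, p_e_given_not_h = 0.9, 0.2
    evidence = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    posterior = prior * p_e_given_h / evidence
    print(round(posterior, 3))        # belief in H after seeing evidence E

    # Dempster's rule on the frame {fly, not-fly}; mass may sit on the
    # whole frame ("unknown"), which ordinary probabilities cannot do.
    m1 = {"fly": 0.6, "not-fly": 0.1, "unknown": 0.3}
    m2 = {"fly": 0.4, "not-fly": 0.4, "unknown": 0.2}

    def combine(m1, m2):
        joint, conflict = {}, 0.0
        for a, p in m1.items():
            for b, q in m2.items():
                if a == "unknown":   meet = b
                elif b == "unknown": meet = a
                elif a == b:         meet = a
                else:                meet = None   # contradictory pair
                if meet is None:
                    conflict += p * q              # mass lost to conflict
                else:
                    joint[meet] = joint.get(meet, 0.0) + p * q
        # Renormalize by the non-conflicting mass.
        return {k: v / (1 - conflict) for k, v in joint.items()}

    print(combine(m1, m2))
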

Fuzzy Logic (Zadeh, 1973) can be thought of as a generalization of traditional logic in which a truth value ranges in degree between 0 and 1. Fuzzy logic deals with cases where something could be partially true; we can think of it as dealing with “shades of grey” rather than the “black or white” of classical logic.
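
Under the usual min/max semantics (one common choice among several), the fuzzy connectives can be sketched as below; the membership numbers are made up for illustration.

    # Degrees of truth in [0, 1] instead of {True, False}.
    tall = {"ann": 0.9, "bob": 0.4}
    fast = {"ann": 0.3, "bob": 0.8}

    def f_and(a, b): return min(a, b)   # conjunction as minimum
    def f_or(a, b):  return max(a, b)   # disjunction as maximum
    def f_not(a):    return 1.0 - a     # negation as complement

    # "Ann is tall and fast" is 0.3 true; "Bob is tall or fast" is 0.8 true.
    print(f_and(tall["ann"], fast["ann"]), f_or(tall["bob"], fast["bob"]))
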
5 Dealing with commonsense

Commonsense is usually defined as “knowledge about the everyday world that is possessed by all people” (Liu and Singh, 2004). It is usually the set of information which is possessed by most people in a society; e.g. objects fall to the ground; chickens do not fly; friends support each other; Barack Obama is the US president.¹

Commonsense knowledge is gained mostly during human experience and interaction, through our sensory signals, about the spatial, physical, social, temporal, and psychological aspects of everyday life. Since the acquisition of such knowledge happens automatically during our daily life, it is assumed that every person possesses commonsense, and therefore it is typically never specified in any sort of communication or text.

In many problems in natural language understanding, there is a need for a surprising amount of commonsense (for example, question answering on elementary-school stories (Richardson et al., 2013), or recognizing textual entailment (Dagan et al., 2010)). Commonsense can play an important role even in mature NLP problems; for example, take the task of parsing the phrase “animals other than dogs such as cats”, which is relatively easy for humans. For a computer parser it can be ambiguous whether “cats are animals” or “cats are dogs” is more plausible, although it is clear to humans that cats cannot be dogs, based on our commonsense (Wu et al., 2012).

Early AI, during the sixties and onward, experienced a lot of interest in modeling commonsense knowledge. McCarthy, one of the founders of AI, believed in formal logic as a solution to commonsense reasoning (McCarthy and Lifschitz, 1990). Marvin Minsky, in his famous book, estimated that “... commonsense is knowing maybe 30 or 60 million things about the world and having them represented so that when something happens, you can make analogies with others” (Minsky, 1988). There are works focusing on elementary-school level story comprehension, which is full of commonsense facts (for example, Charniak (1972) and DeJong (1979)). Unfortunately many of these efforts were tested on very limited problems, like the Blocks World problem. There have been famous decades-long efforts to create knowledge bases which contain commonsense, such as Cyc (Lenat, 1995) and ConceptNet (Liu and Singh, 2004) (explained in the last section), but none of these has been shown to be a good solution to NLP problems, at least to a reasonable extent.

Starting in the nineties, most of the excitement about commonsense disappeared, and the focus shifted towards side problems which seemed to be easier and served as approximations to the bigger challenges. Unfortunately there are not many success stories in the literature on this issue.

Recently there have been multiple initiatives to attack problems which need deeper semantic reasoning (such as commonsense reasoning), e.g. the MCTest QA challenge (Richardson et al., 2013), the Winograd Schema Challenge (Levesque et al., 2011), Facebook’s QA dataset (Weston et al., 2015), and the high school science test challenge (Clark, 2015); although no reasonable solution yet exists for any of the aforementioned challenges. One common property of these challenges is that all of them need extensive knowledge which is not directly mentioned in the problem, or is at most implicitly mentioned, which makes it essential to have a knowledge layer along with reasoning, and stresses the importance of the representation problem.

6 Explicit representation vs. distributed representation

In this section we provide a couple of major arguments for and against connectionism and classical formalism (such as logical modeling). Before jumping into the arguments, we find it important to mention that making a seamless comparison between connectionism and classicism is almost impossible, since they are based on different assumptions about the availability of resources (such as data and computation) or the task being solved. Thus one needs to be very careful in accepting arguments for or against these modeling frameworks.

¹This is commonsense for American people, but not necessarily in another country.

How is knowledge encoded in the human mind? Even after decades of study of the human mind, there is still debate on to what extent knowledge is localized in the mind. Some argue that all knowledge is spread across the entire cortex, with patterns of neural firing responsible for representing an external event. On the other hand, some believe that certain kinds of information are stored in tightly circumscribed regions. Since the structure of the brain has been a source of inspiration for AI researchers, this issue has found mathematical elaboration, giving rise to models and philosophical issues. Connectionism is based on the assumption that cognition takes place via interactions between large numbers of simple processing units, linked via weighted connections. Each concept is represented by many neurons, the weights of the connections, and the topology of the network (Touretzky and Hinton, 1985); similarly, each neuron participates in the representation of many concepts. This is in contrast to ‘localist’ representation, which uses one symbol per concept. Localist designs are easy to comprehend and design, but inefficient when the data has a big componential structure; the sketch below makes the contrast concrete.

The first few decades of AI were dominated by symbolic representation. Symbolic logic provides well-understood declarative knowledge representation and reasoning paradigms. One big problem with these models is the need for a huge number of knowledge definitions; otherwise, there will be a sparsity issue and a lack of robustness as a result of missing information in the knowledge. Distributed representations tend to be more robust², since these models are based on vector representations of the input information (say, words).
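
The localist/distributed contrast can be seen in miniature below (illustrative vectors only): a localist scheme devotes one dimension (one "unit") per concept, while a distributed scheme lets every concept activate many units and every unit serve many concepts.

    concepts = ["chicken", "eagle", "happiness"]

    # Localist: one symbol/unit per concept (a one-hot vector).
    localist = {c: [1 if i == j else 0 for j in range(len(concepts))]
                for i, c in enumerate(concepts)}

    # Distributed: each concept spreads over all units; units are shared.
    distributed = {
        "chicken":   [0.8, 0.1, 0.7],   # units might loosely be read as
        "eagle":     [0.9, 0.9, 0.6],   # [has-wings, is-wild, is-animate]
        "happiness": [0.0, 0.2, 0.9],
    }

    def similarity(u, v):  # dot product: overlap of activation patterns
        return sum(x * y for x, y in zip(u, v))

    # Localist codes make all distinct concepts equally dissimilar (0),
    # while distributed codes expose graded similarity (chicken ~ eagle).
    print(similarity(localist["chicken"], localist["eagle"]))
    print(similarity(distributed["chicken"], distributed["eagle"]))
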
However, unlike logical representations, distributed models lack reasoning such as induction, deduction, etc.; instead they just learn a mapping from the input to the output space. Learning such a mapping usually needs a large number of training examples in order to exhibit enough generalization. Conversely, classical models are capable of learning immediately upon observing the input in formal form, without extra repetitions.³ Another issue is the degradation of previously learned information in connectionist models upon learning new facts (also known as catastrophic interference), while in logical models, when adding more axioms to the system, with the use of a sensible reasoning system there is no loss of previous information.

In many AI designs, a considerable amount of initial knowledge is given to a learning agent. There has been a lot of debate about whether humans are born with knowledge or not. Pinker and Bloom (1990) and Chomsky (1988) argue that children are born with innate domain-specific knowledge of the principles of grammar, which is perhaps a support for classical AI systems, in which the system is given expert knowledge while modeling. However, Elman (1996) argues against the innateness hypothesis, which is a support for models learning from scratch, similar to learning statistical models with random initialization (including many connectionist models).

One of the arguments against connectionist models was proposed by Fodor and Pylyshyn (1988), who questioned the ability of connectionist networks to embody systematicity and compositionality. Fodor and Pylyshyn argue that any reasonable model of representation needs to have these properties. “Compositionality” is the property of natural language whereby one can create novel meaningful elements from smaller constituents. “Systematicity” refers to the latent connections created in language to link certain roles and constituents of sentences to create meaningful sentences. For example, for any English speaker the sentence “Jack loves Kate” makes sense, although it might not make much sense if we randomly scramble the words; and anyone who understands “Jack loves Kate” can also understand “Kate loves Jack”. Such observations are the basis for the hypothesis that the brain must contain symbolic representations similar to the constituents of language (although independent of the language being spoken, hence called the “language of thought”). How can such linguistic phenomena, which seem to have a discrete symbolic nature, be analyzed with a network of functions?

²This phrase is a little inaccurate. A connectionist model needs to see many learning instances in order to generalize enough, just like the need for many rules in the classical frameworks.

³For the human mind, this is the subject of some debate. Some believe that ‘rehearsal’ is a necessity before saving memories into long-term memory (cf. Goldstein (2014)).

The criticism of Fodor and Pylyshyn (1988) was followed by a series of responses arguing that systematicity and compositionality can be simulated by means of complex networks (e.g. the work by Smolensky (1988)). Elman (1991) proposed a solution by introducing recurrent neural networks, which repeatedly use their previous output and a part of the context to model the recursive behavior of language.

There have been many studies on the structural behaviors inside learned networks, following empirical behavioral observations in humans. For example, “U-shaped” development of cognition has been observed in humans when learning various tasks. Usually when we are learning an action (say, the past form of verbs), the performance degrades as a result of over-generalization of exceptions. This is followed by an increase in performance by learning the right level of generalization of the rules. Rumelhart and McClelland (1985) observed such behavior in their network when learning the past tense of English verbs.

Another issue is the “binding” of variables, which has a close connection to the “systematicity” issue raised by Fodor and Pylyshyn. Binding is the act of representing conjunctions of properties. A property can be anything being sensed, for example color, shape, orientation, etc. To visually detect a red circle among a blue circle and a blue rectangle, one must visually bind each object’s color to its shape (Treisman and Gelade, 1980). Similarly, one can imagine thematic roles for groups of words in sentences which bind them together (Fodor, 1983). For example, in order to understand the statement “Jack feels Kate is angry”, one must bind “Jack” to the agent role of “feels”.
Binding is the backbone of symbolic representation. The real world is full of concepts with complex combinations of properties, which require efficient dynamic binding mechanisms in order to generate representations of perceptual objects and movements. However, it is not obvious how to do dynamic binding in a mathematical framework. One of the popular trends, following neuroscientists’ observations of the human brain (Gray and Singer, 1989), is binding based on the temporal synchrony of the connected units: if two units are bound, then they fire synchronously; otherwise they fire asynchronously.

One big practical limitation of many proposed ideas for dynamic binding is that they are inefficient when it comes to modeling facts and long-term memory. Smolensky (1990) proposed the tensor product for simulating the process of variable binding, where symbolic information is stored at, and retrieved from, known locations; a toy sketch of this idea is given below. One of the frameworks for natural language processing created based upon temporal synchrony was SHRUTI (Shastri and Ajjanagadde, 1993), in which dynamic bindings are modeled by the synchronous firing of appropriate role and entity cells. First-order logic type knowledge is modeled as directed connections, and inference in SHRUTI corresponds to a transient propagation of rhythmic signals over the network. In practice, creating such a network might require a gigantic space of nodes and links, able to process the information at each node in parallel. The authors at the time were motivated by empirical results suggesting the possibility of synthesizing such structures.
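
The following is a minimal sketch of tensor-product binding in the spirit of Smolensky (1990), with toy vectors of our own choosing: bind a role vector to a filler vector with an outer product, superimpose several bindings by addition, and unbind (exactly, when role vectors are orthonormal) by multiplying with the role vector again.

    # Tensor-product variable binding, toy version.
    def outer(u, v):
        return [[a * b for b in v] for a in u]

    def add(m1, m2):
        return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

    def unbind(memory, role):
        # Retrieve the filler bound to `role`: multiply role into the matrix.
        return [sum(role[i] * memory[i][j] for i in range(len(role)))
                for j in range(len(memory[0]))]

    # Orthonormal role vectors, so unbinding is exact.
    agent, patient = [1, 0], [0, 1]
    jack, kate = [1, 0, 0], [0, 1, 0]

    # "Jack feels-about Kate": superpose agent(x)jack and patient(x)kate.
    memory = add(outer(agent, jack), outer(patient, kate))

    print(unbind(memory, agent))    # [1, 0, 0] -> jack fills the agent role
    print(unbind(memory, patient))  # [0, 1, 0] -> kate fills the patient role
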

The debate about systematicity instigated a series of works on neural-symbolic models, which aim at the integration of neural networks and symbolic knowledge. There might seem to be a rivalry between connectionism and symbolic representation; however, some believe that the way the mind works is a combination of both. Although the mind implements a neural net, it is also a symbolic processor at a higher level. When we perform many daily routine skills, such as opening a door or riding a bicycle, it seems like we do not think about the details of the actions; we never reason about how to move our weight so that we keep our balance. Instead we do the actions unconsciously and without much reasoning; over-thinking the details of our actions might actually confuse us. This perhaps might be evidence for implicit knowledge in the mind. On the other hand, the first time we want to drive a manual car, we might need to get instructions from someone, follow the instructions of a book, or observe someone who knows how to do it (each of which is a different approximation to what actually needs to be done). This part of our learning is more similar to a classical form of learning (e.g. learning from declarative/procedural definitions). When learning, while repeating the actions, we need to carefully think about them and the order in which they must be done. By repetition of the actions, they become part of our skill, i.e. we do not need much thinking/reasoning while doing them. This can perhaps be seen as a transfer of knowledge/skills from an explicit form to an implicit representation.

Following similar intuitions, a series of works appeared under the names “Hybrid Networks” or “Neural-Symbolic Networks”. For example, Towell and Shavlik (1993) proposed KBANN (Knowledge-Based Artificial Neural Network), a system for inserting rules into a neural network, refining them with backpropagation, and extracting rules from the resulting network. A pure connectionist model can be viewed as learning a set of propositional rules; minimizing the energy function of a symmetric network corresponds to finding a model compatible with a set of propositional rules (Pinkas, 1991). But there is hope of gaining capabilities beyond propositional logic in connectionist models. Ballard and Hayes (1986) formulated resolution on a restricted family of 1st-order rules as an energy minimization problem on a network. Hölldobler et al. (1991) developed a method for encoding propositional logic inside a multi-layer feed-forward network. It has been claimed that one can create a loop between the continuous network and the symbolic logic by extracting the logical rules from the network and feeding them back into the network.

7 The issue of abstraction level

One of the issues which arises as a result of the explicit representation of facts and rules is the “abstraction” issue, which unfortunately does not have much direct literature. Making rules coarser sometimes makes processing and representation easier.
For example, suppose that in a database of people the final task involves deciding whether people are engineers or not. Consider two scenarios: a database of people with (1) a more abstracted form of job title, i.e. “engineer” or “non-engineer”, or (2) full job names, such as “psychologist”, “mathematician”, “chemical engineer”, etc. In case (1) the decision is relatively easier than in case (2), in which the decision needs a further collapsing of the job names into “engineer” and “non-engineer”. Therefore, usually the more detailed the information is, the more computationally difficult it is to reason with. In general there is a trade-off between the expressive level of the representation and the deductive complexity.

What if, in the previous example, the goal is a decision dependent on the exact job name? Then case (2) would be the right abstraction level, and case (1) a bad approximation to the information needed. This shows that the right level of abstraction depends on what is needed. The abstraction level should be such that the representation is “expressive enough” to find the answer, without further additional complications; a small sketch of this trade-off follows below.

Since many levels of abstraction might be necessary for reasoning, it might be a good idea to model an environment at multiple levels of abstraction (Rasmussen, 1985; Bisantz and Vicente, 1994). Such hierarchical abstraction can be modeled in various ways (1st-order logic, frames, semantic graphs, etc.), with properties inherited from parent nodes to more refined nodes (for example, the work with frames (Bobrow and Winograd, 1977)). One issue with the default passing of properties to children is how to model exceptions. For example, if any “bird” can “fly”, any entity of type “bird” will inherit the “fly” property. What is the right way to handle the exception that a “chicken” does not fly?

Another issue arises when accessing the knowledge. Suppose a problem is given to the system, and internally it needs to choose which level of abstraction to use. What is the right way of determining the right level of abstraction? For example, two different inputs to a question answering system might be (1) “Is there any person with a degree in psychology?”, or (2) “Is there any person with a degree in a non-engineering major?”. The attention structure in the human mind (Johnson and Proctor, 2004; Janzen and Vicente, 1997) is very strong at finding the right abstraction for answering each question, but a computer might need further pre-processing of the question to decide where to look for the information. The issue of accessing knowledge goes beyond abstraction levels; in general it can be about accessing multiple knowledge bases, possibly with contradictions, etc. Unfortunately there is not much formalism on these issues.
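
The job-title example above can be phrased as a tiny program (hypothetical data of our own): the coarse representation answers the engineer/non-engineer question directly, while the fine-grained one needs an extra collapsing step, and only the fine-grained one can answer questions about exact job names.

    # Fine-grained representation: full job titles.
    people = {"ann": "psychologist", "bob": "chemical engineer",
              "eve": "mathematician"}

    # Collapsing map from the detailed level to the coarse level.
    def coarsen(job):
        return "engineer" if "engineer" in job else "non-engineer"

    # Coarse representation: enough for the engineer/non-engineer task...
    coarse = {name: coarsen(job) for name, job in people.items()}
    print(coarse["bob"])                              # "engineer"

    # ...but too abstract for a question about the exact job name,
    # which only the detailed level can answer.
    print(any(job == "psychologist" for job in people.values()))
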

8 Exemplar Practical Systems

Here we give a summary of the main efforts in creating knowledge bases. Many of the following KBs claim to capture general-purpose world-semantic knowledge; however, the small differences in their knowledge representations make them suitable for very different purposes. For each of the following we will give partial answers to these questions:

• What are the representations? What is the level of abstraction?
• How is knowledge acquired?
• How are contradictions and updates to the existing information handled?
• Is there any notion of time/date/number? If so, how are temporal and quantitative issues handled?
• How is the information accessed?
• What are the prominent success stories?

The works are presented in chronological order.

8.1 Cyc (1984–present)

The Cyc project (Lenat, 1995) started with the effort to formalize knowledge (including commonsense) into a logical framework. As of 2003, Cyc contained around 118,000 concepts, used in around 2,000,000 assertions (Liu and Singh, 2004). The assertions in this KB are mostly crafted by knowledge engineers.

To properly use the Cyc KB it is essential to first map the raw text into CycL, the logical representation used by Cyc. However, the mapping is quite complex due to the complexity and ambiguity of natural language. Since the representations of concepts and assertions are formalized based on logic, its deductive reasoning works great when all details are precise and unambiguous; however, natural language is usually full of ambiguity and brevity. The limited access to the knowledge base, and to the tools for using it, was another limitation which made it hard for researchers to improve upon this KB.

8.2 WordNet (1985–present)

WordNet (Miller, 1995; Fellbaum, 1998) is arguably the most impactful resource used in the NLP community. The resource is optimized for lexical categorization. It contains (mostly) nouns, verbs, adjectives and adverbs⁴, labeled with ‘sense’ classes. Meaning in WordNet is defined through synsets, which are sets of synonyms. In addition, the elements are linked with a small set of semantic relations, e.g. ‘is-a’, synonym, hyponym, etc. For example, the hierarchy of hypernyms of happiness is shown below:

    abstraction
      => attribute
        => state
          => feeling
            => emotion
              => spirit
                => emotional_state
                  => happiness

In other words, WordNet can be seen as a combination of a dictionary and a thesaurus. It also contains the morphological information needed to get the lemma or stem of a word. As of 2012, it contained 155,287 words with a total of 206,941 word-sense pairs, organized in 117,659 synsets.

The knowledge in WordNet is mostly handcrafted by knowledge engineers. WordNet has been used extensively for word sense disambiguation (WSD), where the aim is to identify the right meaning of a word based on context information. Given the nice relational hierarchy of words in WordNet, one of its popular usages is as a similarity metric, for example by taking into account the distance in the hypernymy tree, or the similarity between synsets; a usage sketch follows below.

A criticism (or maybe an advantage?) is that the information in WordNet is general, and there is no domain-specific classification of knowledge (e.g. food, engineering, etc.). It has also been argued that the sense categorization inside WordNet is too fine-grained: in many cases a word’s meaning corresponds to multiple senses in WordNet. To solve the sense granularity issue, there have been many proposals, including clustering techniques, etc.

⁴It ignores determiners, prepositions, pronouns, conjunctions, and particles.
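
Accessing the hypernym hierarchy programmatically looks roughly like this with NLTK's WordNet interface (assuming NLTK and its wordnet corpus are installed; the exact synsets returned depend on the WordNet version):

    # pip install nltk; then: import nltk; nltk.download("wordnet")
    from nltk.corpus import wordnet as wn

    synset = wn.synsets("happiness")[0]    # first sense of "happiness"
    print(synset.definition())             # gloss of that sense

    # Walk up the hypernym chain, as in the hierarchy shown above.
    node = synset
    while node.hypernyms():
        node = node.hypernyms()[0]
        print("=>", node.name())

    # Hypernym-tree distance as a crude similarity metric.
    other = wn.synsets("joy")[0]
    print(synset.path_similarity(other))
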

8.3 ThoughtTreasure (1994–2000)

Begun in December 1993, ThoughtTreasure (Mueller, 1998) was aimed at containing commonsense knowledge. Adopting the view of Minsky (1988) that there is no single “right” representation for everything, ThoughtTreasure uses various forms of representation for its knowledge. The major part of the knowledge is a set of concepts which are linked as assertions. Below are examples of the assertions:

    [isa soda drink]
    (Soda is a drink.)

    [is the-sky blue]
    (The sky is blue.)

The final version contained a total of 27,000 concepts and 51,000 assertions. Unlike many other KBs, it has several domain-specific lower ontologies, such as for clothing, food, and music. It contains assertions involving quantities, such as the following:

    [duration attend-play NUMBER:second:10800]
    (The duration of a play is 10,800 seconds.)

    @19770120:19810120|[President-of country-USA Jimmy-Carter]
    (Jimmy Carter was the President of the USA
    from January 20, 1977 to January 20, 1981.)

The ontology supports English and French: 35,000 lexical entries in English and 21,000 lexical entries in French. Each entry has at most 118 features attached; examples of features are ZEROART (zero article taker), SING (singular), FML (formal), and CAN (Canadian).

Grids are 2D maps of floorplans which represent how, in a typical life, objects (flower, window, table, chair, etc.) are arranged near each other. A grid can belong to a kitchen, a bedroom, a restaurant, etc. There are rules, in the form of procedural commands, for how the grids (for example bedroom and kitchen) might be related to each other. There are 29 grids in total included in the KB.

ThoughtTreasure also contains procedural knowledge, in the form of about 100 scripts, which are representations of typical activities (e.g. eating at a restaurant). For a fixed script, it contains details on how events follow each other, and how people and physical objects might interact with each other. For example, the procedure to buy a ticket is represented as an automaton:

    purchase-ticket(A, P) :-
      dress(A, purchase-ticket),
      RETRIEVE building-of(P, BLDG);
      near-reachable(A, BLDG),
      near-reachable(A, FINDO(office)),
    2: interjection-of-greeting(A, B =
        FIND(human NEAR counter)),
      WAIT FOR may-I-help-you(B, A)
        OR WAIT 10 seconds AND GOTO 2,
      ...

The KB contains such scenarios for typical activities like inter-personal relations, sleeping, attending events, sending a message with a phone, etc. The finite automata give a relatively good way to simulate the behavior of actors in different scenarios.

As mentioned, different types of information use different representations: (1) logical assertions for encyclopedic facts, (2) grids for stereotypical physical settings, and (3) finite automata for rules of thumb, device behavior, and mental processes.

Considering its lexical hierarchies with a specific set of relations, ThoughtTreasure is similar to WordNet. It is also similar to Cyc, in the sense that it has commonsense knowledge and most of its information is in the form of assertions between concepts. However, as mentioned earlier, ThoughtTreasure uses multiple representations in addition to logic (finite automata, grids, scripts).

8.4 ConceptNet (2000–present)

ConceptNet (Liu and Singh, 2004) was motivated by the importance of commonsense knowledge in textual reasoning. As of 2014, ConceptNet5 (Speer and Havasi, 2012) contains over 10 million assertions, about 7 million of which are in English. There are 50 relations and 2,798,486 English concepts. As a graph, just the English part has about 3 million nodes and 7 million edges connecting them.
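
Assertions of this kind are just labeled edges; a minimal, purely illustrative triple store over ConceptNet-style assertions (the triples below are made up in the style of Figure 2) might look like:

    from collections import defaultdict

    # Store (concept, relation, concept) assertions as a labeled graph.
    edges = defaultdict(list)
    for head, rel, tail in [
        ("book", "UsedFor", "learn"),
        ("book", "AtLocation", "library"),
        ("knowledge", "SymbolOf", "power"),
    ]:
        edges[head].append((rel, tail))

    def query(concept, relation):
        """All tails asserted for (concept, relation)."""
        return [t for r, t in edges[concept] if r == relation]

    print(query("book", "UsedFor"))      # ['learn']
    print(query("book", "AtLocation"))   # ['library']
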

Compared to WordNet, ConceptNet contains more sophisticated relations like LocationOf, SubeventOf, UsedFor, etc. An example subgraph of ConceptNet is shown in Figure 2.

The data for ConceptNet was extracted from the Open-Mind Commonsense (OMCS) project (Singh et al., 2002), during which ordinary people (rather than knowledge experts) were asked to fill in possible relations between concepts, or how one concept relates to another. The OMCS data was already in a semi-structured form, as it was collected by prompting users with fill-in-the-blank templates. The extraction of ConceptNet5 relations is done with regular expressions designed over the semi-structured OMCS data. In the final stage of the construction, the concepts are stripped of determiners and tenses, and reduced to their lemma forms.

ConceptNet’s knowledge representation is semi-structured English. Since many of the concepts are represented with natural language, there are many redundancies in the representation. Unlike in WordNet, the concepts can be relatively more complex phrases, e.g. ‘cut food’, ‘kick a ball’, ‘drive to work’. There is also relatively more variety in the relations between concepts, e.g. ‘IsA’, ‘RelatedTo’, ‘PartOf’, ‘UsedFor’, ‘CapableOf’, ‘AtLocation’, ‘Causes’, ‘Synonym’, ‘Antonym’, etc. Also, ConceptNet does not have sense labels (for either relations or concepts). For example, the concept ‘Power’ (in Figure 2) can be used in the sense of ‘having authority’, or in the sense of ‘rate of doing physical work’.

8.5 OpenIE (2003–present)

Open Information Extraction (Etzioni et al., 2004; Mausam et al., 2012) consists of a series of projects dedicated to the extraction of relational tuples from text, without resorting to any pre-specified vocabulary or ontology. The systems mostly start with the identification of relation phrases and associated arguments in a given arbitrary sentence. The type of the relation can be very general: mediated by a verb, nouns, adjectives, etc. In the following box, a sample output of OLLIE (Mausam et al., 2012) on an input text is shown:

    Q: Jimbo was afraid of Bobbert because she
       gets scared around new people.
    (she; gets scared around; new people)
    (Jimbo; was afraid of; Bobbert)
    (Jimbo; was; afraid of Bobbert)

Although OpenIE, at first look, is an open relation extractor, the result of running it on web-scale data can be used as a knowledge base of relations between arguments. In Balasubramanian et al. (2013), counts of individual and co-occurring relations have been published.

Unlike other knowledge bases, OpenIE is an open relation-tuple extraction system whose output can be used as a resource; therefore, not much human labor is used for the extraction of the rules. One issue is that, since the tuples are extracted from text, the system will have problems extracting rules that are implicit or not usually mentioned (like facts relevant to commonsense). It has also been argued that the surface relations of OpenIE are brittle. In other words, a fixed relation might be expressed in many different forms, and these will be treated differently. Similarly, a surface string might have two different meanings, depending on its context, therefore expressing two different senses of the relation.

8.6 Freebase (2007–2015)

Freebase (Bollacker et al., 2008) is a collaborative knowledge base acquired by Google in 2010. Containing 46 million concepts, Freebase is a very popular source of information. The information was acquired from many sources, such as Wikipedia pages and human entries. The data is represented in the form of entities/concepts which are connected by relations. The KB can be searched, queried like a database, and used to provide information to applications. In 2015, Google announced that it is shutting down Freebase and that it will be replaced with Wikidata (see Section 8.9). There are many successful usages of Freebase, although many of these works are limited in the sense that they mostly rely on direct usage of the facts, without much reasoning over chains of facts.

Figure 2: A small visualization of the relations inside ConceptNet5. From conceptnet5.media.mit.edu. (The graph image, linking concepts such as Book, Library, Knowledge and Power through relations like isA, AtLocation, UsedFor and SymbolOf, is not reproduced here.)

8.7 NELL (2010–present)

The Never-Ending Language Learner (NELL) (Carlson et al., 2010; Mitchell et al., 2015) has been developed to learn from the web 24 hours a day since its start in January 2010. So far it has acquired over 80 million beliefs (relations between two fixed entities/concepts). NELL updates its parameters based on each new belief, which results in an extension of its capability for generating new beliefs.

NELL is one of the most unique efforts to generate knowledge in the loop of reasoning and usage. There has been much engineering on this system to create a stable loop of reasoning and knowledge extraction. For example, deviations in the knowledge base facts might be the result of some initial erroneous conclusions, which create systematic noise in the results as the loop of reasoning and extraction continues based on the previous facts. There is a vast number of innovative works suggesting ways to add new facts given the existing knowledge; e.g. (Gardner et al., 2013; Lao et al., 2011). Despite the huge volume of facts in NELL, the usage of this KB has been limited in problems which need reasoning on top of the facts.

8.8 Probase (2012–present)

Probase (Wu et al., 2012) is a probabilistic taxonomy which contains 2.7 million concepts extracted automatically from 1.68 billion web pages. The probabilities are used to model inconsistent, ambiguous and uncertain information.

As mentioned, Probase is a taxonomy, which can be thought of as an ontology with only isA relations. The extraction of concepts is done automatically, which has resulted in its big size. The extraction is done by bootstrapping with syntactic patterns and the iterative use of the semantic information in the extracted patterns. Compared to Cyc, which has been using knowledge experts for many years, and Freebase, which was using communal effort, this can be considered an advantage; from this perspective it is similar to OpenIE, NELL, etc. Unfortunately Probase has so far been used only for Microsoft’s internal use and is not publicly available.

8.9 Wikidata (2012–present)

Wikidata (Vrandečić and Krötzsch, 2014) is a collaboratively edited knowledge base operated by the Wikimedia Foundation. The data is distributed across documents, each of which represents a topic and is identified by a unique number (for example, the item for the topic Politics is Q7163). Each topic contains a series of statements which are in the form of key-value pairs. Each single user can edit the values or add new keys. The current user interface of Wikidata seems to be friendlier than that of Freebase, another collaborative KB. The creators believe that such a community-driven design will be the way to create an ever-evolving and robust knowledge base.

8.10 Other efforts

Two early frame-based systems, FRL (Roberts and Goldstein, 1977) and KRL (Bobrow and Winograd, 1977), both defined based on frame models with default assignments, are early exemplar usages of frame-based knowledge bases, although tailored for limited problems. MindPixel (2000–2005) was a web-based collaborative project aimed at creating a KB through validating true/false statements or probabilistic propositions by online users.

13 (Suchanek et al., 2007) and DBPedia (2007– generated based on it, probability might not be the present) (Auer et al., 2007) are two other knowl- right solution for simulating uncertainty in AI, al- edge bases which are similar in representation to though there is no doubt that it is a useful solution. Freebase. Their data is mostly collected based on Although we did not focus on reasoning, it is Wikipedia data. BabelNet(2012–present) (Navigli directly determined by the way information about and Ponzetto, 2010) is both a multilingual knowl- the problem are formalized. Defining what is the edge base, with a which con- reasoning should be is hard; deduction, induction, nects concepts and named entities in a very large family of non-monotonic reasoning, MAP, etc? network of semantic relations. It contains about Each of these have their own properties which oth- 14 million entries, in about 271 languages. ers might not have, but is there any efficient rea- It is important to mention that, there has been soning which subsumes others? Certainly if there a surge of interest in creating distributed rep- exist any, it needs to handle the diverse set of rep- resentations for natural language; for example resentations too. (Mikolov et al., 2013; Pennington et al., 2014) and many other ongoing works. Acknowledgment I would like to appreciate Dan Roth for his in- 9 Concluding remarks valuable advice and tremendous discussions when Here we reviewed a substantial range of works re- writing this report. lated to Knowledge Representation. However due to the richness of this subarea, much is missing References here. Our goal was to give an overall view of a se- ries of outstanding past works to the readers who Philip E Agre and David Chapman. 1987. Pengi: An implementation of a theory of activity. In AAAI, vol- might not have had the chance to go over this vast ume 87, pages 286–272. literature, and remind ourselves the challenges the modern trends in AI and Machine Learning are Soren¨ Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. facing. 2007. Dbpedia: A nucleus for a web of open data. Since the advent of AI systems, many works Springer. have been based upon the physical symbol sys- Collin F Baker, Charles J Fillmore, and John B Lowe. tem (Newell and Simon, 1976) assumption. This 1998. The berkeley project. In Proceed- seems to be a strong assumption, but just like ings of the 17th international conference on Compu- many scientific hypotheses, it is made by empir- tational linguistics-Volume 1, pages 86–90. Associ- ical observation and it might not be necessary (or ation for Computational Linguistics. might even be wrong). Niranjan Balasubramanian, Stephen Soderland, Another trend followed traditionally is the sep- Oren Etzioni Mausam, and Oren Etzioni. 2013. aration of knowledge bases from reasoning sys- Generating coherent event schemas at scale. In EMNLP tems, which we have inherited from the first sys- , pages 1721–1731. tems designed based on logical axioms of world, Dana H Ballard and Patrick J Hayes. 1986. Parallel along with an independent logical reasoning (Mc- logical inference and energy minimization. Depart- Carthy, 1963). ment of Computer Science, University of Rochester. There is not much work directly addressing the Ann M Bisantz and Kim J Vicente. 1994. Making the issue of knowledge access. In real problems when abstraction hierarchy concrete. 
9 Concluding remarks

Here we reviewed a substantial range of works related to Knowledge Representation. However, due to the richness of this subarea, much is missing here. Our goal was to give an overall view of a series of outstanding past works to readers who might not have had the chance to go over this vast literature, and to remind ourselves of the challenges that modern trends in AI and Machine Learning are facing.

Since the advent of AI systems, many works have been based upon the physical symbol system assumption (Newell and Simon, 1976). This seems to be a strong assumption, but just like many scientific hypotheses, it is made by empirical observation and it might not be necessary (or might even be wrong).

Another trend followed traditionally is the separation of knowledge bases from reasoning systems, which we have inherited from the first systems designed based on logical axioms of the world along with an independent logical reasoner (McCarthy, 1963).

There is not much work directly addressing the issue of knowledge access. In real problems, when there are multiple levels of abstraction in knowledge bases, or multiple resources, the question becomes what the right place to look for knowledge is, and how to use it. This seems to be an open issue to a large extent.

It seems to be a prevalent idea that uncertainty is a necessity for AI. Despite decades of work on probability theory and the whole set of areas generated based on it, probability might not be the right solution for simulating uncertainty in AI, although there is no doubt that it is a useful one.

Although we did not focus on reasoning, it is directly determined by the way information about the problem is formalized. Defining what reasoning should be is hard: deduction, induction, the family of non-monotonic reasoning, MAP, etc.? Each of these has its own properties which the others might not have, but is there any efficient form of reasoning which subsumes the others? Certainly, if any exists, it needs to handle the diverse set of representations too.

Acknowledgment

I would like to thank Dan Roth for his invaluable advice and many helpful discussions during the writing of this report.

References

Philip E Agre and David Chapman. 1987. Pengi: An implementation of a theory of activity. In AAAI, volume 87, pages 268–272.
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. Springer.
Collin F Baker, Charles J Fillmore, and John B Lowe. 1998. The Berkeley FrameNet project. In Proceedings of the 17th International Conference on Computational Linguistics - Volume 1, pages 86–90. Association for Computational Linguistics.
Niranjan Balasubramanian, Stephen Soderland, Mausam, and Oren Etzioni. 2013. Generating coherent event schemas at scale. In EMNLP, pages 1721–1731.
Dana H Ballard and Patrick J Hayes. 1986. Parallel logical inference and energy minimization. Department of Computer Science, University of Rochester.
Ann M Bisantz and Kim J Vicente. 1994. Making the abstraction hierarchy concrete. International Journal of Human-Computer Studies, 40(1):83–117.
Daniel G Bobrow and Terry Winograd. 1977. An overview of KRL, a knowledge representation language. Cognitive Science, 1(1):3–46.
Daniel G Bobrow. 1964. Natural language input for a computer problem solving system.
Margaret A Boden. 1996. The philosophy of artificial life.

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250. ACM.
Alexander Borgida, Ronald J Brachman, Deborah L McGuinness, and Lori Alperin Resnick. 1989. CLASSIC: A structural data model for objects. In ACM SIGMOD Record, volume 18, pages 58–67. ACM.
Rodney A Brooks. 1990. Elephants don't play chess. Robotics and Autonomous Systems, 6(1):3–15.
Rodney A Brooks. 1991. Intelligence without representation. Artificial Intelligence, 47(1):139–159.
Bruce G Buchanan and Edward A Feigenbaum. 1978. Dendral and Meta-Dendral: Their applications dimension. Artificial Intelligence, 11(1):5–24.
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R Hruschka Jr, and Tom M Mitchell. 2010. Toward an architecture for never-ending language learning. In AAAI, volume 5, page 3.
Eugene Charniak. 1972. Toward a model of children's story comprehension.
Peter Cheeseman. 1985. In defense of probability. In IJCAI, pages 1002–1009.
Noam Chomsky. 1988. Language and problems of knowledge: The Managua lectures, volume 16. MIT Press.
Peter Clark. 2015. Elementary school science and math tests as a driver for AI: Take the Aristo challenge! In Proceedings of IAAI.
R.T. Cox. 1961. Algebra of Probable Inference. Johns Hopkins University Press.
Ido Dagan, Bill Dolan, Bernardo Magnini, and Dan Roth. 2010. Recognizing textual entailment: Rational, evaluation and approaches – erratum. Natural Language Engineering, 16(01):105–105.
Adnan Darwiche and Moisés Goldszmidt. 1994. On the relation between kappa calculus and probabilistic reasoning. In Proceedings of the Tenth International Conference on Uncertainty in Artificial Intelligence, pages 145–153. Morgan Kaufmann Publishers Inc.
Gerald Francis DeJong, II. 1979. Skimming Stories in Real Time: An Experiment in Integrated Understanding. Ph.D. thesis, New Haven, CT, USA. AAI7925621.
Hubert L Dreyfus. 1992. What computers still can't do: a critique of artificial reason. MIT Press.
Jeffrey L Elman. 1991. Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2-3):195–225.
Jeffrey L Elman. 1996. Rethinking innateness: A connectionist perspective on development, volume 10. MIT Press.
Oren Etzioni, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S Weld, and Alexander Yates. 2004. Web-scale information extraction in KnowItAll (preliminary results). In Proceedings of the 13th International Conference on World Wide Web, pages 100–110. ACM.
Oren Etzioni. 1993. Intelligence without robots: A reply to Brooks. AI Magazine, 14(4):7.
Christiane Fellbaum. 1998. WordNet. Wiley Online Library.
Richard Fikes and Tom Kehler. 1985. The role of frame-based representation in reasoning. Communications of the ACM, 28(9):904–920.
Charles J Fillmore. 1977. Scenes-and-frames semantics. Linguistic Structures Processing, 59:55–88.
Jerry A Fodor and Zenon W Pylyshyn. 1988. Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1):3–71.
Jerry A Fodor. 1983. The modularity of mind: An essay on faculty psychology. MIT Press.
Ken-Ichi Funahashi. 1989. On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3):183–192.
Matt Gardner, Partha Pratim Talukdar, Bryan Kisiel, and Tom M Mitchell. 2013. Improving learning and inference in a large knowledge-base using latent syntactic cues. In EMNLP, pages 833–838.
E Goldstein. 2014. Cognitive psychology: Connecting mind, research and everyday experience. Cengage Learning.
Charles M Gray and Wolf Singer. 1989. Stimulus-specific neuronal oscillations in orientation columns of cat visual cortex. Proceedings of the National Academy of Sciences, 86(5):1698–1702.
Frederick Hayes-Roth, Donald Waterman, and Douglas Lenat. 1984. Building expert systems.
Patrick J. Hayes. 1995. The second naive physics manifesto. In Computation & Intelligence, pages 567–585. American Association for Artificial Intelligence, Menlo Park, CA, USA.

Steffen Hölldobler, Yvonne Kalinke, FG Wissensverarbeitung KI, et al. 1991. Towards a new massively parallel computational model for logic programming. In ECAI-94 Workshop on Combining Symbolic and Connectionist Processing. Citeseer.
Alan Hájek. 2012. Interpretations of probability. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Winter 2012 edition.
Michael E Janzen and Kim J Vicente. 1997. Attention allocation within the abstraction hierarchy. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 41, pages 274–278. SAGE Publications.
Edwin T Jaynes. 2003. Probability theory: the logic of science. Cambridge University Press.
Addie Johnson and Robert W Proctor. 2004. Attention: Theory and practice. Sage Publications.
David Kirsh. 1991. Today the earwig, tomorrow man? Artificial Intelligence, 47(1):161–184.
Ni Lao, Tom Mitchell, and William W Cohen. 2011. Random walk inference and learning in a large scale knowledge base. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 529–539. Association for Computational Linguistics.
Steffen L Lauritzen. 1996. Graphical models. Oxford University Press.
Douglas B Lenat. 1995. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33–38.
Hector J Levesque, Ernest Davis, and Leora Morgenstern. 2011. The Winograd schema challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.
Hugo Liu and Push Singh. 2004. ConceptNet: a practical commonsense reasoning tool-kit. BT Technology Journal, 22(4):211–226.
Pattie Maes. 1990. Situated agents can have goals. Robotics and Autonomous Systems, 6(1):49–70.
Mausam, Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni. 2012. Open language learning for information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).
John McCarthy and Patrick Hayes. 1968. Some philosophical problems from the standpoint of artificial intelligence. Stanford University, USA.
John McCarthy and Vladimir Lifschitz. 1990. Formalizing common sense: papers, volume 5. Intellect Books.
John McCarthy. 1963. Programs with common sense. Defense Technical Information Center.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
George A Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.
Marvin Minsky and Seymour Papert. 1969. Perceptrons: an introduction to computational geometry. The MIT Press, Cambridge, expanded edition, 1988.
Marvin Minsky. 1974. A framework for representing knowledge.
Marvin Minsky. 1988. Society of mind. Simon and Schuster.
T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohammad, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling. 2015. Never-ending learning. In AAAI.
Erik T Mueller. 1998. Natural language processing with ThoughtTreasure. Citeseer.
Roberto Navigli and Simone Paolo Ponzetto. 2010. BabelNet: Building a very large multilingual semantic network. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 216–225. Association for Computational Linguistics.
Allen Newell and Herbert Alexander Simon. 1961. GPS, a program that simulates human thought. Technical report, DTIC Document.
Allen Newell and Herbert A Simon. 1976. Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19(3):113–126.
Nils J Nilsson. 1986. Probabilistic logic. Artificial Intelligence, 28(1):71–87.
Judea Pearl. 1986. Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29(3):241–288.
Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014), 12.

Gadi Pinkas. 1991. Symmetric neural networks and propositional logic satisfiability. Neural Computation, 3(2):282–291.
Steven Pinker and Paul Bloom. 1990. Natural language and natural selection. Behavioral and Brain Sciences, 13(04):707–727.
Jens Rasmussen. 1985. The role of hierarchical knowledge representation in decisionmaking and system management. IEEE Transactions on Systems, Man and Cybernetics, (2):234–243.
Matthew Richardson, Christopher JC Burges, and Erin Renshaw. 2013. MCTest: A challenge dataset for the open-domain machine comprehension of text. In EMNLP, pages 193–203.
R Bruce Roberts and Ira P Goldstein. 1977. The FRL manual. Technical report, DTIC Document.
Frank Rosenblatt. 1958. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386.
David E Rumelhart and James L McClelland. 1985. On learning the past tenses of English verbs.
David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1988a. Learning representations by back-propagating errors. Cognitive Modeling, 5.
David E Rumelhart, James L McClelland, PDP Research Group, et al. 1988b. Parallel distributed processing, volume 1. IEEE.
Klaus Schild. 1991. A correspondence theory for terminological logics: Preliminary report.
Manfred Schmidt-Schauß and Gert Smolka. 1991. Attributive concept descriptions with complements. Artificial Intelligence, 48(1):1–26.
John R Searle. 1980. Minds, brains, and programs. Behavioral and Brain Sciences, 3(03):417–424.
Glenn Shafer. 1976. A Mathematical Theory of Evidence. Princeton University Press, Princeton.
Lokendra Shastri and Venkat Ajjanagadde. 1993. From simple associations to systematic reasoning: A connectionist representation of rules, variables and dynamic bindings using temporal synchrony. Behavioral and Brain Sciences, 16(03):417–451.
Push Singh, Thomas Lin, Erik T Mueller, Grace Lim, Travell Perkins, and Wan Li Zhu. 2002. Open Mind Common Sense: Knowledge acquisition from the general public. In On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE, pages 1223–1237. Springer.
Paul Smolensky. 1988. The constituent structure of connectionist mental states: A reply to Fodor and Pylyshyn. The Southern Journal of Philosophy, 26(S1):137–161.
Paul Smolensky. 1990. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1):159–216.
Robert Speer and Catherine Havasi. 2012. Representing general relational knowledge in ConceptNet 5. In LREC, pages 3679–3686.
Wolfgang Spohn. 1988. Ordinal conditional functions: A dynamic theory of epistemic states. Springer.
Matthias Steup. 2014. Epistemology. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Spring 2014 edition.
Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, pages 697–706. ACM.
Lucy A Suchman. 1987. Plans and situated actions: the problem of human-machine communication. Cambridge University Press.
David S Touretzky and Geoffrey E Hinton. 1985. Symbols among the neurons: Details of a connectionist inference architecture. In IJCAI, volume 85, pages 238–243.
Geoffrey G Towell and Jude W Shavlik. 1993. Extracting refined rules from knowledge-based neural networks. Machine Learning, 13(1):71–101.
Anne M Treisman and Garry Gelade. 1980. A feature-integration theory of attention. Cognitive Psychology, 12(1):97–136.
Amos Tversky and Daniel Kahneman. 1974. Judgment under uncertainty: Heuristics and biases. Science, 185(4157):1124–1131.
Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: a free collaborative knowledgebase. Communications of the ACM, 57(10):78–85.
Jason Weston, Antoine Bordes, Sumit Chopra, and Tomas Mikolov. 2015. Towards AI-complete question answering: A set of prerequisite toy tasks. arXiv preprint arXiv:1502.05698.
Terry Winograd. 1971. Procedures as a representation for data in a computer program for understanding natural language. Technical report, DTIC Document.

Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Q Zhu. 2012. Probase: A probabilistic taxonomy for text understanding. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 481–492. ACM.
Lotfi A Zadeh. 1973. Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man and Cybernetics, (1):28–44.
Lotfi A Zadeh. 1989. Knowledge representation in fuzzy logic. IEEE Transactions on Knowledge and Data Engineering, 1(1):89–100.
