KNOWLEDGE REPRESENTATION FOR COMMONSENSE REASONING WITH TEXT

Kathleen Dahlgren and Joyce McDowell, IBM Los Angeles Scientific Center

Edward P. Stabler, Jr., University of Western Ontario

1 INTRODUCTION

1.1 NAIVE SEMANTICS

The reader of a text actively constructs a rich picture of the objects, events, and situation described. The text is a vague, insufficient, and ambiguous indicator of the world that the writer intends to depict. The reader draws upon world knowledge to disambiguate and clarify the text, selecting the most plausible interpretation from among the (infinitely) many possible ones. In principle, any world knowledge whatsoever in the reader's mind can affect the choice of an interpretation. Is there a level of knowledge that is general and common to many speakers of a natural language? Can this level be the basis of an explanation of text interpretation? Can it be identified in a principled, projectable way? Can this level be represented for use in computational text understanding? We claim that there is such a level, called naive semantics (NS), which is commonsense knowledge associated with words. Naive semantics identifies words with concepts, which vary in type. Nominal concepts are categorizations of objects based upon naive theories concerning the nature and typical description of conceptualized objects. Verbal concepts are naive theories of the implications of conceptualized events and states.2 Concepts are considered naive because they are not always objectively true, and bear only a distant relation to scientific theories. An informal example of a naive nominal concept is the following description of the typical lawyer.

1. If someone is a lawyer, typically they are male or female, well-dressed, use paper, books, and briefcases in their job, have a high income and high status. They are well-educated, clever, articulate, and knowledgeable, as well as contentious, aggressive, and ambitious. Inherently, lawyers are adults, have gone to law school, and have passed the bar. They practice law, argue cases, advise clients, and represent them in court. Conversely, if someone has these features, he/she probably is a lawyer.

In the classical approach to word meaning, the aim is to find a set of primitives that is much smaller than the set of words in a language and whose elements can be conjoined in representations that are truth-conditionally adequate. In such theories "bachelor" is represented as a conjunction of primitive predicates.

2. bachelor(X) <=> adult(X) & human(X) & male(X) & unmarried(X)

In such theories, a sentence such as (3) can be given truth conditions based upon the meaning representation of "bachelor," plus rules of compositional semantics that map the sentence into a logical formula asserting that the individual denoted by "John" is in the set of objects denoted by "bachelor."

3. John is a bachelor.

The sentence is true just in case all of the properties in the meaning representation of "bachelor" (2) are true of "John." This is essentially the approach in many computational knowledge representation schemes such as KRYPTON (Brachman et al. 1985), approaches following Schank (Schank and Abelson 1977), and linguistic semantic theories such as Katz (1972) and Jackendoff (1985).

Smith and Medin (1981), Dahlgren (1988a), Johnson-Laird (1983), and Lakoff (1987) argue in detail that all of these approaches are essentially similar in this way and all suffer from the same defects, which we summarize

Copyright 1989 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the CL reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission. 0362-613X/89/010149-170$03.00

Computational Linguistics, Volume 15, Number 3, September 1989

briefly here. Word meanings are not scientific theories and do not provide criteria for membership in the categories they name (Putnam 1975). Concepts are vague, and the categories they name are sometimes vaguely defined (Labov 1973; Rosch et al. 1976). Membership of objects in categories is gradient, while the classical approach would predict that all members share full and equal status (Rosch and Mervis 1975). Not all categories can be decomposed into primitives (e.g., color terms). Exceptions to features in word meanings are common (most birds fly, but not all) (Fahlman 1979). Some terms are not intended to be used truth-conditionally (Dahlgren 1988a). Word meanings shift in unpredictable ways based upon changes in the social and physical environment (Dahlgren 1985b). The classical theory also predicts that fundamentally new concepts are impossible.

NS sees lexical meanings as naive theories and denies that meaning representations provide truth conditions for sentences in which they are used. NS accounts for the success of natural language communication, given the vagueness and inaccuracy of word meanings, by the fact that natural language is anchored in the real world. There are some real, stable classes of objects that nouns are used to refer to, and mental representations of their characteristics are close enough to true, enough of the time, to make reference using nouns possible (Boyd 1986). Similarly, there are real classes of events which verbs report, and mental representations of their implications are approximately true. The vagueness and inaccuracy of mental representations requires non-monotonic reasoning in drawing inferences based upon them. Anchoring is the main explanation of referential success, and the use of words for imaginary objects is derivative and secondary.

NS differs from approaches that employ exhaustive decompositions into primitive concepts which are supposed to be true of all and only the members of the set denoted by "lawyer." NS descriptions are seen as heuristics. Features associated with a concept can be overridden or corrected by new information in specific cases (Reiter 1980). NS accounts for the fact that while English speakers believe that an inherent function of a lawyer is to practice law, they are also willing to be told that some lawyer does not practice law. A non-practicing lawyer is still a lawyer. The goal in NS is not to find the minimum set of primitives required to distinguish concepts from each other, but rather, to represent a portion of the naive theory that constitutes the cognitive concept associated with a word. NS descriptions include features found in alternative approaches, but more as well. The content of features is seen as essentially limitless and is drawn from psycholinguistic studies of concepts. Thus, in NS, featural descriptions associated with words have as values not primitives, but other words, as in Schubert et al. (1979).

In NS, the architecture of cognition that is assumed is one in which syntax, compositional semantics, and naive semantics are separate components with unique representational forms and processing mechanisms. Figure 1 illustrates the components. The autonomous syntactic component draws upon naive semantic information for problems such as prepositional phrase attachment and word sense disambiguation. Another autonomous component interprets the compositional semantics and builds discourse representation structures (DRSs) as in Kamp (1981) and Asher (1987). Another component models naive semantics and completes the discourse representation that includes the implications of the text. All of these components operate in parallel and have access to each other's representations whenever necessary.

Figure 1. Components of Grammar in NS.

1.2 THE KT SYSTEM

Naive semantics is the theoretical motivation for the KT system under development at the IBM Los Angeles Scientific Center by Dahlgren, McDowell, and others.3 The heart of the system is a commonsense knowledge base with two components, a commonsense ontology and databases of generic knowledge associated with lexical items. The first phase of the project, which is nearly complete, is a text understanding and query system. In this phase, text is read into the system and parsed by the MODL parser (McCord 1987), which has very wide coverage. The parse is submitted to a module (DISAMBIG) that outputs a logical structure which reflects the scope properties of operators and quantifiers and the correct attachment of post-verbal adjuncts, and selects word senses. This is passed to a semantic translator whose output (a DRS) is then converted to first-order logic (FOL). We then have the text in two different semantic forms (DRS and FOL), each of which has its advantages and each of which is utilized in the system in different ways. Queries are handled in the same way as text. Answers to the queries are obtained either by matching to the FOL textual database or to the commonsense databases. However, the commonsense knowledge is accessed at many other stages in the processing of text and queries, namely in parse disambiguation, in lexical retrieval, in anaphora resolution, and in the construction of the discourse structure of the entire text.
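The contrast between classical definitions and defeasible NS features can be sketched in a few lines of Python. The data structures and lookup scheme are our illustrative inventions, not the KT implementation; the feature content follows examples (1) and (2).

```python
# Classical view: "bachelor" is a strict conjunction of primitives,
# so an entity either satisfies all of them or is not in the category.
def bachelor(props):
    return {"adult", "human", "male", "unmarried"} <= props

# NS view: typical features are defaults that specific textual
# information may override (Reiter 1980).  A non-practicing lawyer
# is still a lawyer.
LAWYER_TYPICAL = {"income": "high", "practices_law": True}

def lawyer_feature(feature, asserted=None):
    asserted = asserted or {}
    if feature in asserted:              # text-specific information wins
        return asserted[feature]
    return LAWYER_TYPICAL.get(feature)   # otherwise assume the default

print(bachelor({"adult", "human", "male", "unmarried"}))          # True
print(bachelor({"adult", "human", "male"}))                       # False
print(lawyer_feature("practices_law"))                            # True (default)
print(lawyer_feature("practices_law", {"practices_law": False}))  # False (overridden)
```

The classical predicate fails outright on any missing primitive, while the NS lookup degrades gracefully, which is the behavior the anchoring argument requires.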


The second phase of the project, which is at present in the research stage, will be to use the commonsense knowledge representations and the textual database to guide text selection. We anticipate a system, NewSelector, which, given a set of user profiles, will distribute textual material in accordance with user interests, thus, in effect, acting as an automatic clipping service. Our target text is the Wall Street Journal. The inferencing capabilities provided by the commonsense knowledge will allow us to go well beyond simple keyword search.

The theoretical underpinnings and practical work on the KT system have been reported extensively elsewhere, in conference papers (Dahlgren and McDowell 1986a, 1986b; McDowell and Dahlgren 1987) and in a book (Dahlgren 1988a). Since the publication of those works, a number of significant additions and modifications have been made to the system. The intended focus of this paper is this new work. However, in order to make this accessible to readers unfamiliar with our previous reports, we present in Section 2 an overview of the components of the present system. (Readers familiar with our system can skip Section 2.) In the remaining sections on new work we have emphasized implementation, because this paper is addressed to the computational linguistics community: in Section 3, the details of disambiguation procedures that use the NS representations, and in Section 4, the details of the query system. Finally, Section 5 discusses work in progress regarding discourse and naive semantics.

2 OVERVIEW OF THE KT SYSTEM

2.1 KNOWLEDGE REPRESENTATION

2.1.1 THE ONTOLOGY

Naive theories associated with words include beliefs concerning the structure of the actual world and the significant "joints" in that structure. People have the environment classified, and the classification scheme of a culture is reflected in its language. Since naive semantics is intended as a cognitive model, we constructed the naive semantic ontology empirically, rather than intuitively. We studied the behavior of hundreds of verbs and determined selectional restrictions, which are constraints reflecting the naive ontology embodied in English. We also took into account psychological studies of classification (Keil 1979) and philosophical studies of epistemology (Strawson 1953).

    ENTITY         -> (ABSTRACT v REAL) & (INDIVIDUAL v COLLECTIVE)
    ABSTRACT       -> IDEAL v PROPOSITIONAL v NUMERICAL v IRREAL
    NUMERICAL      -> NUMBER v MEASURE
    REAL           -> (PHYSICAL v TEMPORAL v SENTIENT) & (NATURAL v SOCIAL)
    PHYSICAL       -> (STATIONARY v NONSTATIONARY) & (LIVING v NONLIVING)
    NONSTATIONARY  -> SELFMOVING v NONSELFMOVING
    COLLECTIVE     -> MASS v SET v STRUCTURE
    TEMPORAL       -> RELATIONAL v NONRELATIONAL
    RELATIONAL     -> (EVENT v STATIVE) & (MENTAL v EMOTIONAL v NONMENTAL)
    EVENT          -> (GOAL v NONGOAL) & (ACTIVITY v ACCOMPLISHMENT v ACHIEVEMENT)

Table 1. The Ontological Schema

2.1.1.1 MATHEMATICAL PROPERTIES OF THE ONTOLOGY

The ontology has several properties which distinguish it from classical taxonomies. It is a directed acyclic graph, rather than a binary tree, because many concepts have more than two subordinate concepts (Rosch et al. 1976). FISH, BIRD, MAMMAL, and so on, are subordinates of VERTEBRATE. It is a directed graph rather than a tree, because it handles cross-classification. Cross-classification is justified by contrasts between individual and collective nouns such as "cow" and "herd." This implies that cognitively there are essentially parallel ontological schemas for individuals and collectives. Thus we have the parallel ontology fragments in Figure 2. Table 2 illustrates cross-classification at the root of the ontology, where ENTITY cross-classifies as either REAL or ABSTRACT, and as either INDIVIDUAL or COLLECTIVE. Cross-classification is handled as in McCord (1985, 1987).

Figure 2. Parallel Portions of the Ontology. (The INDIVIDUAL fragment runs ENTITY, REAL, PHYSICAL/NATURAL/SELFMOVING, ANIMAL, COW; the COLLECTIVE fragment runs ENTITY, REAL, PHYSICAL/NATURAL/SELFMOVING, FAUNA, HERD.)

                INDIVIDUAL    COLLECTIVE
    REAL        cow           herd
    ABSTRACT    idea          book

Table 2. Entity Node Cross Classification

Multiple attachments of instantiations to leaves are possible. For example, an entity, John, is both a HUMAN with the physical properties of a MAMMAL, and is also a PERSON, a SENTIENT. As a SENTIENT, John can be the subject of mental verbs such as "think" and "say." Institutions are also SENTIENTs, so that the SENTIENT node reflects English usage in pairs like (4).

4. John sued Levine. The government sued Levine.
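The graph-theoretic properties just described can be sketched as follows. This is an illustrative Python rendering of a small fragment of the ontology, not the KT data structures; node names follow Tables 1 and 2, and the graph-walking code is our invention.

```python
# The ontology as a directed acyclic graph: a node may have several
# parents (cross-classification), and a word inherits every category
# on every path up to the root ENTITY.
PARENTS = {
    "REAL": ["ENTITY"], "ABSTRACT": ["ENTITY"],
    "INDIVIDUAL": ["ENTITY"], "COLLECTIVE": ["ENTITY"],
    "PHYSICAL": ["REAL"], "NATURAL": ["REAL"],
    "ANIMAL": ["PHYSICAL", "NATURAL", "INDIVIDUAL"],
    "FAUNA": ["PHYSICAL", "NATURAL", "COLLECTIVE"],
    "cow": ["ANIMAL"], "herd": ["FAUNA"],
}

def categories(node):
    """All categories inherited along every parent path."""
    result, stack = set(), [node]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

# "cow" and "herd" share all categories except the
# INDIVIDUAL/COLLECTIVE cut, as the text observes.
diff = categories("cow") ^ categories("herd")
print(sorted(diff))   # ['ANIMAL', 'COLLECTIVE', 'FAUNA', 'INDIVIDUAL']
```

A simple iterative walk suffices because the graph is acyclic; the `result` set doubles as the visited set, so shared ancestors such as REAL are counted once.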

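The SENTIENT-based selectional behavior illustrated in (4) can be sketched as a table of subject restrictions checked against ontological attachments. The entities and data structures shown are our illustrative inventions, not the KT implementation.

```python
# Selectional restrictions: verbs demand SENTIENT or PHYSICAL
# subjects, and an entity attached under several nodes (multiple
# attachment) satisfies any of them.
ATTACHMENTS = {
    "John": {"PHYSICAL", "SENTIENT"},   # multiple attachment to leaves
    "the cow": {"PHYSICAL"},
    "the grand jury": {"SENTIENT"},     # institutions are SENTIENTs
}

SUBJECT_RESTRICTION = {
    "indict": "SENTIENT",
    "sue": "SENTIENT",
    "fall": "PHYSICAL",
}

def acceptable(subject, verb):
    """True if the subject's attachments satisfy the verb's restriction."""
    return SUBJECT_RESTRICTION[verb] in ATTACHMENTS[subject]

print(acceptable("the grand jury", "indict"))   # True
print(acceptable("the cow", "indict"))          # False: *The cow indicted Levine
print(acceptable("John", "fall"), acceptable("John", "sue"))  # True True
```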

On the other hand, John, as a HUMAN, is like animals, and has physical properties. Both John and a cow can figure as subjects of verbs like "eat" and "weigh." Multiple attachment was justified by an examination of the texts. It was found that references to human beings in text, for example, deal with them either as persons (SENTIENTs) or as ANIMALs (physiological beings), but rarely as both at the same time.

2.1.1.2 ONTOLOGICAL CATEGORIES

The INDIVIDUAL/COLLECTIVE cut was made at the level of ENTITY (the highest level) because all types of entities are conceived individually or in collections. COLLECTIVE breaks into sets of identical members (herd, mob, crowd, fleet), masses that are conceived as stuff (sand, water), and structures where the members have specified relations, such as in institutions (school, company, village). Leaf node names, such as ANIMAL and FAUNA, are shorthand for collections of categories inherited from dominating nodes and by cross-classification. Thus "cow" and "herd" share all categories except that "cow" is an INDIVIDUAL term and "herd" is a COLLECTIVE term.

The REAL node breaks into the categories PHYSICAL, TEMPORAL, and SENTIENT, and also NATURAL and SOCIAL. Entities (or events) that come into being (or take place) naturally must be distinguished from those that arise through some sort of social intervention. Table 3 illustrates the assignment of example words under the REAL cross-classification.

                INDIVIDUAL              COLLECTIVE
                NATURAL     SOCIAL      NATURAL     SOCIAL
    PHYSICAL    rock        knife       sand        fleet
    SENTIENT    man         programmer  mob         clinic
    TEMPORAL    earthquake  party       winter      epoch

Table 3. Attachment of Nouns under REAL

The SENTIENT/PHYSICAL distinction is placed high because in commonsense reasoning, the properties of people and things are very different. Verbs select for SENTIENT arguments or PHYSICAL arguments, as illustrated in (5). Notice also that, as a physical object, an individual entity like John can be the subject both of verbs that require physical subjects and those that require SENTIENT subjects, as in (6).

5. The lawyer/the grand jury indicted Levine. *The cow indicted Levine.

6. John fell. John sued Levine. The cow fell.

We make the SENTIENT/NON-SENTIENT distinction high up in the hierarchy for several reasons. Philosophically, the most fundamental distinction in epistemology (human knowledge) is arguably that between thinking and non-thinking beings (Strawson 1953). Psychology has shown that infants are able to distinguish humans from all other objects, and they develop a deeper and more complex understanding of humans than of other objects (Gelman and Spelke 1981). In the realm of linguistics, a class of verbs selects for persons, roles, and institutions as subjects or objects. Thus the SENTIENT distinction captures the similarity between persons and institutions or roles. There is a widespread lexical ambiguity between a locational and an institutional reading of nouns, which can be accounted for by the SENTIENT distinction, as in (7).

7. The court is in the center of town. The court issued an injunction.

The NATURAL/SOCIAL distinction also was placed high in the hierarchy. Entities (including events) that are products of society, and thereby have a social function, are viewed as fundamentally different from natural entities in the commonsense conceptual scheme. The distinction is a basic one psychologically (Miller 1978; Gelman and Spelke 1981). SOCIAL entities are those that come into being only in a social or institutional setting, with "institution" being understood in the broadest sense, for instance family, government, education, warfare, organized religion, etc.

2.1.1.3 CONSTRUCTION OF THE ONTOLOGY

The ontological schema was constructed to handle the selectional restrictions of verbs in 4,000 words of geography text and 6,000 words of newspaper text. These were arranged in a hierarchical schema. The hierarchy was examined and modified to reflect cognitive, philosophical, and linguistic facts, as described above. It was pruned to make it as compact as possible. We minimized empty terminal nodes. A node could not be part of the ontology unless it systematically pervaded some subhierarchy. Distinctions found in various places were relegated to feature status. The full ontology with examples may be found in Dahlgren (1988).

2.1.1.4 VERBS

In KT, verbs are attached to the main ontology at the node TEMPORAL because information concerning the temporality of situations described in a sentence is encoded on the verb as tense, and because the relations indicated by verbs must be interpreted with respect to their location in time in order to properly understand the discourse structure of a text. Thus the ontology implies that events are real entities, and that linguistic, not conceptual, structure distinguishes verbalized from nominalized versions of events and states.

We view the cognitive structure of the concepts associated with nouns and verbs as essentially different. Non-derived nouns in utterances refer to objects. Lexical nouns name classes of entities that share certain features. Verbs name classes of events and states, but

these do not share featural descriptions. Psychologically, verbs are organized around goal orientation and argument types (Huttenlocher and Lui 1979; Graesser and Hopkinson 1987).

The primary category cut at the node TEMPORAL is between nouns, which name classes of entities, in this case temporal entities like "party," "hurricane," and "winter," and verbs, which indicate relations between members of these nominal classes, like "hit," "love," "remember." We attach temporal nouns to the TEMPORAL/NONRELATIONAL node and verbs to the TEMPORAL/RELATIONAL node. Many nouns are, of course, relational, in the sense that "father" and "indictment" are relational. Our node RELATIONAL does not carry this intuitive sense of relational, but instead simply indicates that words attached here require arguments for complete interpretation. So, while "father" is relational, it is possible to use "father" in a text without mentioning the related entity. But verbs require arguments (usually overtly, but sometimes understood, as in the case of commands) for full interpretation. Nominalizations like "indictment" are a special case. In our system, all deverbal nominalizations are so marked with a pointer to the verb from which they are derived. Subsequent processing is then directed to the verb, which, of course, is attached under TEMPORAL, RELATIONAL.

2.1.1.5 THE VENDLER CLASSIFICATION

One basis of the relational ontology is the Vendler (1967) classification scheme, which categorizes verbs into aspectual classes (see Dowty 1979). According to this classification, RELATIONAL divides into EVENT or STATIVE, and EVENT divides into ACTIVITY, ACHIEVEMENT, or ACCOMPLISHMENT. Vendler and others (particularly Dowty 1979) have found the following properties, which distinguish these classes.

8. STATIVE and ACHIEVEMENT verbs may not appear in the progressive, but may appear in the simple present. ACTIVITY and ACCOMPLISHMENT verbs may appear in the progressive, and if they appear in the simple present, they are interpreted as describing habitual or characteristic states. ACCOMPLISHMENTs and ACHIEVEMENTs entail a change of state associated with a terminus (a clear endpoint). STATIVEs ("know") and ACTIVITYs ("run") have no well-defined terminus. ACHIEVEMENTs are punctual (John killed Mary) while ACCOMPLISHMENTs are gradual (John built a house). STATEs and ACTIVITYs have the subinterval property (cf. Bennett and Partee 1978).

Table 4 summarizes these distinctions. There are several standard tests for the Vendler (1967) system, which can be found in Dowty (1979) and others and which we apply in classification.

                         Activity    Accomplishment   Achievement      State
    Examples             run, think  build a house,   recognize, find  have, want
                                     read a novel
    Progressives         +           +                -                -
    Terminus             -           +                +                -
    Change of State      -           Gradual          Punctual         -
    Subinterval Property +           -                -                +

Table 4. The Vendler Verb Classification Scheme

The Vendler classification scheme is actually more accurately a classification of verb phrases than of verbs. KT handles this problem in two ways. First, in sentence processing we take into account the arguments in the verb phrase as well as the verb classification to determine clause aspect, which can be any of the Vendler classes. Second, we classify each sense of a verb separately.

The other nodes in the relational ontology are motivated by the psycholinguistic studies noted above. The MENTAL/NONMENTAL/EMOTIONAL distinction is made at the highest level for the same reasons that led us to place SENTIENT at a high level in the main ontology. All EVENTs are also cross-classified as GOAL oriented or not. This is supported by virtually every experimental study on the way people view situations, i.e., GOAL orientation is the most salient property of events and actions. For example, Trabasso and Van den Broek (1985) find that the events best recalled are those which feature the goals of individuals and the consequences of goals, and Trabasso and Sperry (1985) find that the salient features of events are goals, antecedents, consequences, implications, enablement, causality, motivation, and temporal succession and coexistence. This view is further supported by Abbott, Black, and Smith (1985) and Graesser and Clark (1985). NONGOAL, ACCOMPLISHMENT is a null category because ACCOMPLISHMENTs are associated with a terminus and thus inherently GOAL oriented. On the other hand, NONGOAL, ACHIEVEMENT is not a null category because the activity leading up to an achievement is always totally distinct from the achievement itself. SOCIAL, NONGOAL, ACTIVITY is a sparse category.

Cross-classifications inherited from the TEMPORAL node are SOCIAL/NATURAL and INDIVIDUAL/COLLECTIVE. The INDIVIDUAL/COLLECTIVE distinction is problematical for verbs, because all events can be viewed as a collection of an infinitude of subevents.

2.1.2 GENERIC KNOWLEDGE

Generic descriptions of the nouns and verbs were drawn from psycholinguistic data to the extent possible. In a typical experiment, subjects are asked to freelist features "characteristic" of and common to objects in

categories such as DOG, LEMON, and SECRETARY (Rosch et al. 1976; Ashcraft 1976; Dahlgren 1985a). The number of subjects in such an experiment ranges from 20 to 75. Any feature that is produced in a freelisting experiment by several subjects is likely to be shared in the relevant subpopulation (Rosch 1975; Dahlgren 1985a). Features that were freelisted by at least one-fifth of the subjects are chosen for a second experiment, in which different subjects are asked to rate the features for typicality. Those features rated as highly typical by the second group can be considered a good first approximation to the content of the cognitive structures associated with the terms under consideration. The number of features shared in this way for a term averaged 15.

The generic knowledge in the KT system is contained in two generic databases, one for nouns and one for verbs. There is a separate entry for each sense of each lexical item. The content of the entries is a pair of lists of features drawn from the psycholinguistic data (as described above) or constructed using these data as a model. A feature is, informally, any bit of knowledge that is associated with a term. In informal terms, these can be any items like "wears glasses" (programmer), "is red" (brick), or "can't be trusted" (used-car salesman). For each entry, the features are divided into two lists, one for typical features and one for inherent features. For example, a brick is typically red, but blood is inherently red. Together the lists comprise the entry description of the term that heads the entry.

The source of descriptions of social roles were data collected by Dahlgren (1985a). For physical objects we used generic descriptions from Ashcraft (1976), including raw data generously supplied by the author. An informal conceptualization for "lawyer" was shown in (1). The corresponding generic description is shown in (9). Features of the same feature type within either the inherent or typical list are AND'ed or OR'ed as required. Some features contain first-order formulas like the conditions in discourse representations. For example, one function feature has (advise(E,noun,Y) & client(Y) & regarding(E,Z) & law(Z)). The first argument of the predicate advise is an event reference marker. This event is modified in the regarding predicate. The second argument of advise is instantiated in the processing as the same entity that is predicated as being in the extension of the noun generically described in the representation, in this case, "lawyer."

9. lawyer(
     {behavior(contentious), appearance(well-dressed),
      status(high), income(high),
      sex(male), sex(female),
      tools(paper), tools(books), tools(briefcase),
      function(negotiate(*,noun,Y) & settlement(Y)),
      internal_trait(ambitious), internal_trait(articulate),
      internal_trait(aggressive), internal_trait(clever),
      internal_trait(knowledgeable)},
     {age(adult), education(law-school),
      legal_req(pass(*,noun,X) & bar(X)),
      function(practice(*,noun,Y) & law(Y)),
      function(advise(E,noun,Y) & client(Y) & regarding(E,Z) & law(Z)),
      function(represent(E,noun,Y) & client(Y) & in(E,Z) & court(Z)),
      function(argue(*,noun,Y) & case(Y))}).

The entire set of features for nouns collected in this way so far sort into 38 feature types. These are age, agent, appearance, association, behavior, color, construction, content, direction, duration, education, exemplar, experienced-as, in-extension-of, frequency, function, goal, habitat, haspart, hasrole, hierarchy, internal-trait, legal-requirement, length, level, location, manner, material, name, object, odor, operation, owner, partof, physiology, place, processing, propagation, prototype, relation, requirement, rolein, roles, sex, shape, size, source, speed, state, status, strength, structure, taste, texture, time. A given feature type can be used in either typical or inherent feature lists. Since these are not primitives, we expect the list of feature types to expand as we enlarge the semantic domain of the system.

There is a much smaller set of feature types for verbs. We were guided by recent findings in the psycholinguistic literature which show that the types of information that subjects associate with verbs are substantially different from what they associate with nouns. Huttenlocher and Lui (1979) and Abbott, Black, and Smith (1985) in particular have convincingly argued that subjects conceive of verbs in terms of whether or not the activities they describe are goal oriented, the causal and temporal properties of the events described, and the types of entities that can participate as arguments to the verb. For the actual feature types, we adapted the findings of Graesser and Clark (1985), whose research focused on the salient implications of events in narratives. A small number of feature types is sufficient to represent the most salient features of events. These can be thought of as answers to questions about the typical event described by the verb that heads the entry. In addition, selectional restrictions on the verb are also encoded as a feature type. Feature types for verbs are cause, goal, what_enabled, what_happened_next, consequence_of_event, where, when, implies, how, selectional_restriction. An example of a generic entry for verbs follows.

10. buy(
      {what_enabled(can(afford(subj,obj))),     % if someone buys something,
                                                % typically he can afford it,
       how(with(X) & money(X)),                 % he uses money,
       where(in(Y) & store(Y)),                 % he buys it in a store,
       cause(need(subj,obj)),                   % he needs it,
       what_happened_next(use(subj,obj))},      % and later he uses it.
      {goal(own(subj,obj)),                     % Inherently, his goal is to own it,
       consequence_of_event(own(subj,obj)),     % and after buying it, he does own it.
       selectional_restriction(sentient(subj)), % The buyer is sentient
       implies(merchandise(obj))}).4            % and what is bought is merchandise.

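The Kind Type mechanism described below, in which the feature types available to a term are constrained by its ontological attachment and inherited down the hierarchy, can be sketched as follows. The node and feature-type inventories are a small illustrative subset of Tables 5 and 6, and the checking code is our invention.

```python
# Each ontology node licenses certain feature types, and lower nodes
# inherit the licenses of the nodes above them.
LICENSED = {
    "ENTITY":   {"haspart", "partof"},
    "PHYSICAL": {"color", "size", "texture"},
    "SOCIAL":   {"function", "requirement"},
    "ROLE":     {"income", "tools"},
}
PARENT = {"PHYSICAL": "ENTITY", "SOCIAL": "ENTITY", "ROLE": "SOCIAL"}

def licensed_types(node):
    """Feature types licensed at node, including inherited ones."""
    types = set()
    while node is not None:
        types |= LICENSED.get(node, set())
        node = PARENT.get(node)
    return types

# A ROLE like "lawyer" may carry SOCIAL feature types such as
# "function", but a PHYSICAL noun like "truck" may not.
print("function" in licensed_types("ROLE"))      # True
print("function" in licensed_types("PHYSICAL"))  # False
```

This is the constraint a lexical augmentation facility can exploit: when a new term is attached to a node, only the licensed feature types need be offered to the lexicographer.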

Naive Semantic representations of generic knowledge contain fifteen or more pieces of information per word, relatively more than required by other theories. The magnitude of the lexicon is counterbalanced by constraints that naturally obtain in the generic knowledge. Study of the protocols of subjects in prototype experiments reveals that people conceive of objects in constrained patterns of feature types. For example, animals are conceived in terms of physical and behavioral properties, while social roles are conceived in terms of functional and relational properties. Thus not all feature types occur in the representations of all nouns (or verbs). The pattern of features relevant to each node in the ontology is called a Kind Type. Each feature is classified by type as a COLOR, SIZE, FUNCTION, INTERNAL TRAIT, or other. At each node, only certain feature types are applicable. Features at lower nodes inherit feature type constraints from the nodes above them in the ontology. For instance, any node under SOCIAL may have certain feature types, and any node under ROLE may have those feature types inherited from SOCIAL, as well as further feature types. Examples contrasting "elephant" and "lawyer" are shown in Tables 5 and 6.

Node in    Feature types associated   Feature values for
Ontology   with the node              elephant
ENTITY     haspart                    trunk
           haspart                    4 legs
           partof                     herd
PHYSICAL   color                      grey
           size                       vehiclesized
           texture                    rough
LIVING     propagation                live births
           habitat                    jungle
ANIMAL     sex                        male or female
           behavior                   lumbers
           behavior                   eats grass

Table 5. Animal Kind Type

Node in    Feature types associated   Feature values for
Ontology   with the node              lawyer
SOCIAL     function                   practice law
           function                   argue cases
           function                   advise clients
           function                   represent clients in court
           requirement                pass bar
           appearance                 well-dressed
SENTIENT   internaltrait              friendly
           education                  law school
           internaltrait              clever
           internaltrait              articulate
           internaltrait              contentious
           sex                        male
           sex                        female
ROLE       relation                   high status
           income                     high
           tools                      books
           tools                      briefcases

Table 6. Social Role Kind Type

A lexical augmentation facility is used to create generic entries. This facility exploits the fact that the possible feature types for any term are constrained by the ontological attachment of the term, by the Kind Type to which it belongs (Dahlgren and McDowell 1986a; Dahlgren 1988a). For example, it is appropriate to encode the feature type "behavior" for "dog" but not for "truck." Similarly, it is appropriate to encode the feature type "goal" for "dig" but not for "fall." The lexical augmentation facility presents the user with appropriate choices for each term and then converts the entries to a form suitable for processing in the system.

2.2 TEXT INTERPRETATION ARCHITECTURE

2.2.1 PARSER

The overall goal of the KT project is text selection based on the extraction of discourse relations guided by naive semantic representations. This goal motivated the choice of DRT as the compositional semantic formalism for the project. The particular implementation of DRT which we use assumes a simple, purely syntactic parse as input to the DRS construction procedures (Wada and Asher 1986). Purely syntactic parsing and formal semantics are unequal to the task of selecting one of the many possible parses and interpretations of a given text, but human readers easily choose just one interpretation. NS representations are very useful in guiding this choice. They can be used to disambiguate parse trees, word senses, and quantifier scope. We use an existing parser (MODL) to get one of the syntactic structures for a sentence and modify it based upon NS information. This allows us to isolate the power of NS representations with respect to the parsing problem. NS representations are necessary not only for a robust parsing capability, but also in anaphora resolution and discourse reasoning. Furthermore, the parse tree must be available to the discourse coherence rules. Thus our research has shown that not only must the NS representations be accessible at all levels of text processing, but the purely syntactic, semantic, and pragmatic information that has been accumulated must also be available to later stages of processing. As a result, the architecture of the system involves separate modules for syntax, semantics, discourse, and naive semantics, but each of the modules has access to the output of all the others, as shown in Figure 1.

The parser chosen is the Modular Logic Grammar (MODL) (McCord 1987). Both MODL and KT are written in VM/Prolog (IBM 1985). The input to MODL is a sentence or group of sentences (a text). In KT we intercept the output of MODL at the stage of a labeled bracketing marked with grammatical features, before any disambiguation or semantic processing is done. In effect, we bypass the semantics of MODL in

order to test our NS representations. In the labeled bracketing each lexical item is associated with an argument structure that can be exploited in semantic interpretation. The labeled bracketing output by MODL is slightly processed before being passed to our module DISAMBIG. Here the commonsense knowledge is accessed to apply rules for prepositional phrase attachment (Dahlgren and McDowell 1986b) and word sense disambiguation (Dahlgren 1988a), as well as to assign the correct scope properties to operators and quantifiers. The output is a modified parse. All of these modules are in place and functional. The word sense disambiguation rules are in the process of being converted. An example of the input to DISAMBIG and the resulting output is as follows:

11. Input S:

John put the money in the bank.

Input to DISAMBIG:

s(np(n(n(john(V1)))) &
  vp(v(fin(pers3,sg,past,ind),put(V1,V2)) &
     np(detp(the(V3,V4)) &
        n(n(money(V2))) &
        pp(p(in(V2,V5)) &
           np(detp(the(V6,V7)) &
              n(n(bank(V5))))))))

Output of DISAMBIG:

s(np(n(n(john(V1)))) &
  vp(v(fin(pers3,sg,past,ind),put1(V1,V2)) &
     np(detp(the(V3,V4)) &
        n(n(money(V2)))) &
     pp(p(in(V2,V5)) &
        np(detp(the(V6,V7)) &
           n(n(bank2(V5)))))))

The differences are that the PP "in the bank" is VP-attached in the output of DISAMBIG rather than NP-attached as in the output of MODL, and that the words "put" and "bank" are assigned index numbers and changed to put1 and bank2, selecting the senses indicated by the word sense disambiguation algorithm.

2.2.2 SENTENCE-LEVEL SEMANTICS

The modified parse is then submitted to a semantics module, which outputs a structure motivated in part by current versions of discourse representation theory (DRT) (Kamp 1981; Wada and Asher 1986; Asher and Wada 1988). The actual form of the discourse representation structure (DRS) and its conditions list in the KT system differ from standard formats in DRT in that tense arguments have been added to every predicate, and tense predicates link the tense arguments to the tense of the verb of the containing clause. The analysis of questions and negation was carried out entirely with respect to the KT system and to serve its needs. The DRT semantics is in place and functional for most structures covered by MODL. The commonsense knowledge representations are accessed in the DRT module for the semantic interpretation of modals, for the determination of sentence-internal pronoun anaphora (where simple C-command and agreement tests fail), and to determine some cases of quantifier scoping.

2.2.3 DISCOURSE-LEVEL SEMANTICS

As each sentence of a text is processed it is added to the DRS built for the previous sentence or sentences. Thus an augmented DRS is built for the entire text. In the augmentation module the commonsense knowledge representations are accessed to determine definite noun phrase anaphora, sentence-external pronoun anaphora, temporal relations between the tense predicates generated during sentence-level DRS construction, discourse relations (such as causal relations) between clauses, and the rhetorical structure of the discourse as a whole. The discourse work is being carried out mainly by Dahlgren and is in various stages of completion.

2.2.4 FOL

Since standard proof techniques are available for use with logical forms, the DRS formulated by the sentence-level and discourse-level semantic components is converted to standard logic. A number of difficulties present themselves here. In the first place, given any of the proposed semantics for DRSs (e.g., Kamp 1981; Asher 1987), DRSs are not truth functional. That is, the truth value of a DRS structure (in the actual world) is not generally a function of the truth values of its constituents. For example, this happens when verbs produce opaque contexts (Asher 1987). Since general proof methods for modal logics are computationally difficult, we have adopted the policy of mapping DRSs to naive, first-order translations in two steps, providing a special and incomplete treatment of non-truth-functional contexts. The first step produces representations that differ minimally from standard sentences of first-order logic. The availability of this level of representation enhances the modularity and the extensibility of the system. Since full first-order reasoning is not feasible in an application like this, a second step converts the logical forms to clausal forms appropriate for the problem solver or the textual knowledge base. We describe each of these steps in turn.

2.2.4.1 THE TRANSLATION TO STANDARD LOGICAL FORMS

The sentence-level and discourse-level semantic components disambiguate names and definite NPs, so that each discourse reference marker is the unique canonical name for an individual or object mentioned in the discourse. The scoping of quantifiers and negations has also been determined by the semantic processing. This allows the transformation of a DRS to FOL to be rather simple. The basic ideas can be briefly introduced here; a more thorough discussion of the special treatment of queries is provided below. The conditions of a DRS are conjoined. Those conditions may include some that are already related by logical operators (if-then, or, not), in which case the logical form includes the same operators. Discourse referents introduced in the consequent of an if-then construction may introduce existential quantifiers: these are given narrow scope relative to quantifiers in the consequent (cf. Kamp's notion of a "subordinate"


DRS in Kamp 1981; Asher and Wada 1988). Any quantifiers needing wide scope will already have been moved out of the consequent by earlier semantic processing.

The DRSs of questions contain special operators that, like the logical operators, take DRS arguments that represent the scope of the material being questioned. A similar indication is needed in the logical form. In the case of a yes-no question, we introduce a special vacuous quantifier just to mark the scope of the questioned material for special treatment by the problem solver (see below). In the case of wh-questions, a special wh-quantifier is introduced, again indicating the scope of the questioned material and triggering a special treatment by the problem solver. Verbs of propositional attitude and other structures with opaque contexts are treated as, in effect, introducing new isolated subtheories or "worlds." For example, "John believes all men are mortal" is represented as a relation of belief holding between John and a logical form (see below for more detail), though it is recognized that this approach will need to be supplemented to handle anaphora and propositional nominals (cf., e.g., Asher 1988).

2.2.4.2 THE CONVERSION TO SPECIALIZED CLAUSAL FORMS

Considerations of efficiency motivate a further transformation of our logical forms. After the first step of processing, we have standard first-order logical forms, except that they may include special quantifiers indicating questioned material. Consider first those logical forms that are not inside the scope of any question quantifier. These are taken as representations of potential new knowledge for the textual data base. Since the inference system must solve problems without user guidance, it would be infeasible to reason directly from the first-order formulations. Clausal forms provide an enormous computational advantage. For these reasons, we transform each sentence of first-order logic into a clausal form with a standard technique (cf., e.g., Chang and Lee 1973), introducing appropriate Skolem functions to replace existentially quantified variables. The textual database can then be accessed by a clausal theorem prover. In the current system, we use efficient Horn clause resolution techniques (see below), so the knowledge representation is further restricted to Horn clauses, since completeness is less important than feasible resource use in the present application. Definite clauses are represented in a standard Prolog format, while negative clauses are transformed into definite clauses by the addition of a special positive literal "false(n)," where n is an integer that occurs in no other literal with this predicate. This allows a specialized incomplete treatment of negation-as-inconsistency (cf. Gabbay and Sergot 1986). The definite clause translations of the text can then be inserted into a textual knowledge base for use by the reasoning component.

The presence of question quantifiers triggers a special treatment in the conversion to clausal form. Our problem solver uses standard resolution techniques: to prove a proposition, we show that its negation is incompatible with the theory. Accordingly, the material inside the scope of a question operator is treated as if it were negated, and this implies an appropriately different treatment of any quantifiers inside the scope of the operators.

2.2.5 REASONER

In the architecture of the system, the reasoning module is broken into two parts: the specialized query processing system and a general purpose problem solver. The special processing of queries is described in detail below. The problem solver is based on a straightforward depth-bounded Horn clause proof system, implemented by a Prolog metainterpreter (e.g., Sterling and Shapiro 1986). The depth bound can be kept fixed when it is known that no proofs should exceed a certain small depth. When a small depth bound is not known, the depth can be allowed to increase iteratively (cf. Stickel 1986), yielding a complete SLD resolution system. This proof system is augmented with negation-as-failure for predicates known to be complete (see the discussion of open and closed world assumptions below), and with a specialized incomplete negation-as-inconsistency that allows some negative answers to queries in cases where negation-as-failure cannot be used.

2.2.6 RELEVANCE

The RELEVANCE module will have the responsibility of determining the relevance of a particular text to a particular user. The text and user profiles will be processed through the system in the usual way, resulting in two textual data bases, one for the target text and one for the profile. Target and profile will then be compared for relevance, and a decision made whether or not to dispatch the target to the profiled user. The relevance rules are a current research topic. The commonsense knowledge representations will form the primary basis for determining relevance.

3 NAIVE SEMANTICS IN THE KT SYSTEM

From the foregoing brief overview of the KT system, it should be clear that naive semantics is used throughout the system for a number of different processing tasks. In this section we show why each of these tasks is a problem area and how NS can be used to solve it.

3.1 PREPOSITIONAL PHRASE ATTACHMENT

The proper attachment of post-verbal adjuncts is a notoriously difficult task. The problem for prepositional phrases can be illustrated by comparing the following sentences.

12. [S The government [VP had uncovered [NP an entire file [PP about the scheme]]]].
13. [S Levine's lawyer [VP announced [NP the plea bargain] [PP in a press conference]]]


14. [S [S The judge adjourned the hearing] [PP in the afternoon]]

Each of these sentences takes the canonical form Subject-Verb-Object-PP. The task for the processing system is to determine whether the PP modifies the object (i.e., the PP is a constituent of the NP, as in (12)), the verb (i.e., the PP is a constituent of the VP, as in (13)), or the sentence (i.e., the PP is an adjunct to S, as in (14)). Some deny the need for a distinction between VP- and S-modification. The difference is that with S-modification, the predication expressed by the PP has scope over the subject, while in VP-attachment it does not. For example, in (15), "in the park" applies to Levine, while in (16), "with 1,000 dollars" does not apply to Levine.

15. Levine made the announcement in the park.
16. Levine bought the stock with 1,000 dollars.

A number of solutions for the problem presented by post-verbal prepositional phrases have been offered. The most common techniques depend on structural (Frazier and Fodor 1978), semantic (Ford, Bresnan, and Kaplan 1982), or pragmatic (Crain and Steedman 1985) tests. MODL (McCord 1987) employs a combination of syntactic and semantic information for PP attachment. Independently, we formulated a preference strategy for PP attachment which uses ontological, generic, and syntactic information to cover 99% of the cases in an initial test corpus, and which is 93% reliable across a number of types of text. This is the preference strategy we employ in KT. The PP attachment rules make use of information about the verb, the object of the verb, and the object of the preposition. A set of global rules is applied first, and if these fail to find the correct attachment for the PP, a set of rules specific to the preposition is tried. Each of these latter rule sets has a default. The global rules are stated informally in (17).

17a. time(POBJ) → s_attach(PP)
If the object of the preposition is an expression of time, then S-attach the PP.
The judge adjourned the hearing in the afternoon.

b. lexical(V+Prep) → vp_attach(PP)
If the verb and preposition form a lexicalized complex verb, then VP-attach the PP.
The defense depended on expert witnesses.

c. Prep=of → np_attach(PP)
If the preposition is "of," then NP-attach the PP.
The ambulance carried the victim of the shooting.

d. intransitive(V) & motion(V) & place(POBJ) → vp_attach(PP)
If the verb is an intransitive verb of motion and the object of the preposition is a place, then VP-attach the PP.
The press scurried about the courtroom.

e. intransitive(V) & (place(POBJ) OR temporal(POBJ) OR abstract(POBJ)) → s_attach(PP)
If the verb is intransitive and the object of the preposition is a place, temporal, or abstract, then S-attach the PP.
Levine worked in a brokerage house.

f. epistemic(POBJ) → s_attach(PP)
If the prepositional phrase expresses a propositional attitude, then attach the PP to the S.
Levine was guilty in my opinion.

g. xp(...Adj-PP...) → xp_attach(PP)
If the PP follows an adjective, then attach the PP to the phrase which dominates and contains the adjective phrase.
Levine is young for a millionaire.

h. measure(DO) → np_attach(PP)
If the direct object is an expression of measure, then NP-attach the PP.
The defendant had consumed several ounces of whiskey.

i. comparative → np_attach(PP)
If there is a comparative construction, then NP-attach the PP.
The judge meted out a shorter sentence than usual.

j. mental(V) & medium(POBJ) → vp_attach(PP)
If the verb is a verb of saying, and the object of the preposition is a medium of expression, then VP-attach the PP.
Levine's lawyer announced the plea bargain on television.

Example 14 is handled by global rule (17a). Example 13 is handled by global rule (17j). The global rules are inapplicable to example 12, so the rules specific to "about" are called. These are shown in (18).

18a. intransitive(V) & mental(V) → vp_attach(PP)
If the verb is an intransitive mental verb, then VP-attach the PP.
Levine spoke about his feelings.

b. Elsewhere → np_attach(PP)
Otherwise, NP-attach the PP.
The government had uncovered an entire file about the scheme.

As a further example, the specific rules for "by" use both generic (19a,b) and ontological (19c) knowledge.

19a. nom(DO) → np_attach(PP)
If the direct object is marked as a nominalization, then NP-attach the PP.
The soldiers withstood the attack by the enemy.

b. location(DO,POBJ) → np_attach(PP)
If the relation between the direct object and the object of the preposition is one of location, then NP-attach the PP.
The clerk adjusted the microphone by the witness stand.


c. propositional(DO) & sentient(POBJ) → np_attach(PP)
If the direct object is propositional and the object of the preposition is sentient, then NP-attach the PP.
The judge read out the statement by Levine.

d. Elsewhere → s_attach(PP)
Otherwise, S-attach the PP.
The lawyers discussed the case by the parking lot.

These PP-attachment preference rules are remarkably successful when applied to real examples from actual text. However, they are not foolproof, and it is possible to construct counterexamples. Take the global rule illustrated in (17a). We can construct a counterexample as in (20).

20. John described the meeting on Jan. 20th.

Sentence (20) is ambiguous: Jan. 20th can be the time of the describing or the time of the meeting. Perhaps there is a slight intuitive bias toward the latter interpretation, but the rules will assign the former interpretation. This is a counterexample only because "meeting" is a TEMPORAL noun and can plausibly have a time feature. Compare (21), which is identical except for the ontological attachment of the direct object and which is handled correctly by the global rule.

21. John described the proposal on Jan. 20th.

The problem of the interpretation of event nominals is a research topic we are working on.

The PP attachment rules are applied in the module DISAMBIG, which produces a disambiguated parse from the output of the MODL parser. The first step is to identify the sentence elements that form the inputs for each clause, using find_args. The output of find_args is a list of the direct object, the object of the preposition, the preposition, and the main verb of the clause, together with the index of the clause (main, subordinate, and so on). The PP-attachment rules are in place and functional for one post-verbal prepositional phrase. Where more than one post-verbal prepositional phrase occurs, the current default is to attach the second PP to the NP of the first PP. However, this will not get the correct attachment in cases like the following.

22. The judge passed sentence on the defendant in a terse announcement.

A planned extension of the PP-attachment functionality will attack this problem by also keeping a stack of prepositions. The top of the stack will be the head of the rightmost PP. The attachment rules will be applied to PPs and the other constituents in a pairwise fashion until all are attached.

3.2 WORD SENSE DISAMBIGUATION

The word sense disambiguation method used in the system is a combined local ambiguity reduction method (Dahlgren 1988b). The method is local because word senses are disambiguated cyclically, from the lowest S-node up to the matrix node. Only when intrasentential sources of information fail are other sentences in the text considered by the disambiguation method. The algorithm is combined because it employs three sources of information. First it tries fixed and frequent phrases, then word-specific syntactic tests, and finally naive semantic relationships in the clause. If the fixed and frequent phrases fail, the syntactic and naive semantic rules progressively reduce the number of senses relevant in the clausal context. The algorithm was developed by considering concordances of seven nouns with a total of 2,193 tokens of the nouns, and concordances of four verbs with a total of 1,789 tokens of the verbs. The algorithm is 96% accurate for the nouns in these concordances, and 99% accurate for the verbs.

Fixed phrases are lists of phrases that decisively disambiguate the word senses in them. For example, the noun "hand" has 16 senses. Phrases such as "by hand," "on hand," and "on the one hand" have only one sense.

Syntactic tests either reduce the number of relevant senses or fully disambiguate. For nouns, syntactic tests look for the presence or absence of the determiner, the type of determiner, certain prepositional phrase modifiers, quantifiers and number, and noun complements. For example, only five of the 16 senses of "hand" are possible in bare plural noun phrases. For verbs, syntactic tests include the presence of a reflexive object, elements of the specifier (such as particular adverbs), the presence of a complement of the verb, and particular prepositions. For example, the verb "charge" has only its reading meaning "indict" when there is a VP-attached PP whose preposition is "with" (as determined by the prepositional phrase attachment rules). Syntactic tests are encoded for each sense of each word. The remainder of this section will illustrate disambiguation using naive semantic information and give examples of the naive semantic rules. (The complete algorithm may be found in Dahlgren 1988a.)

3.2.1 NOUN DISAMBIGUATION

Naive semantic information was required for at least a portion of the disambiguation in 49% of the cases of nouns in the concordance test. Naive semantic inference involves either ontological similarity or generic relationships. Ontological similarity means that two nouns are relatively close to each other in the ontology, both upwards and across the ontology. If there is no ontological similarity, generic information is inspected. Generic information for the ambiguous noun, for other nouns in the clause, and for the main verb often disambiguates an ambiguous noun.

Ontological similarity is tested for in several syntactic constructions: conjunction, nominal compounds, possessives, and prepositional phrase modifiers. Many

of the 16 senses of "hand" are ontologically distinct, as shown in Table 7.

1. HUMAN       human body part
2. DIRECTION   right or left
3. INSTRUMENT  by hand
4. SOCIAL      power, authority
5. TEMPORAL    applause
6. ROLE        laborer
7. ARTIFACT    part of a clock

Table 7. Some senses of hand

In (23), only the HUMAN and ROLE senses (1 and 6) are possible, by ontological similarity. Generic knowledge of the verb "clear" is inspected for the final disambiguation to sense 6.

23. The farmer and his hand cleared the field.

In contrast, in (24), the relevant senses of "hand" are the HUMAN and ARTIFACT senses (1 and 7).

24. His arm and hand were broken.

At the point in the algorithm where naive semantic tests are invoked, syntactic tests have already eliminated the ARTIFACT sense, which does not occur with a personal pronoun. Thus the HUMAN sense is selected. In (25), only the ARTIFACT sense (7) is possible, by ontological similarity of "clock" and "hand."

25. The clock hand was black.

In (26), again the HUMAN and ROLE senses (1 and 6) are the only relevant ones by ontological similarity. Selection restrictions on "shake" complete the disambiguation.

26. John shook the man's hand.

In (27), sense 4 is selected because "affair" and sense 4 are both SOCIAL.

27. John saw his hand in the affair.

In (28), sense 1 is selected because both sense 1 and a sense of "paper" are attached to PHYSICAL.

28. The judge had the paper in his hand.

The word sense disambiguation algorithm tests for generic relationships between the ambiguous noun and prepositional phrase modifiers, adjective modifiers, and the main verb of the sentence. Two of the nine senses of "court" are shown in Table 8.

court1  PLACE        Typically, it has a bench, jury box, and
                     witness stand. Inherently its function is
                     for a judge to conduct trials in. It is
                     part of a courthouse.

court2  INSTITUTION  Typically, its function is justice.
                     Examples are the Supreme Court and the
                     superior court. Its location is a
                     courtroom. Inherently it is headed by a
                     judge, and has bailiffs, attorneys, and
                     court reporters as officers. Participants
                     are defendants, witnesses, and jurors.
                     The function of a court is to hear
                     testimony, examine evidence, and reach a
                     verdict. It is part of the justice system.

Table 8. Generic Information for Two Senses of court

In "the court listened to testimony," generic information for the second sense of "court" can be used to select sense 2. The generic information includes the knowledge that one of the functions of courts has to do with testimony. In (29), sense 1 of "court" is selected because the generic representation of "court" contains the information that witness stands are typical parts of courtrooms.

29. The witness stand in Jones's court is made of oak.

In (30), the adjective "wise" narrows the relevant senses from nine to the two INSTITUTION senses of "court."

30. The wise court found him guilty.

Generic knowledge of one sense of the verb "find" is then used to select between the court-of-law sense (2) and the royal-court sense (4).

charge1  Typically, if someone is charged, next they are
         indicted in court, convicted, or acquitted. They are
         charged because they have committed a crime or the
         person who charges them suspects they have.
         Inherently, the charger and chargee are sentient,
         and the thing charged with is a crime.

charge2  Inherently, if someone charges that something is
         true, that someone is sentient, his goal is that it
         be known, and the something is bad.

charge3  Typically, if someone charges someone else an amount
         for something, the chargee has to pay the amount to
         the charger, and the chargee is providing goods or
         services. Inherently, the charger and chargee are
         sentient, the amount is a quantity of money, and the
         goal of the charger is to have the chargee pay the
         amount.

Table 9. Generic Information for Three Senses of charge

Verb selection restrictions are powerful disambiguators of nouns, as many computational linguists have observed. In (31), the verb


"last" requires a TEMPORAL subject, thus disambiguating "hand."

31. They gave him a hand which lasted 10 minutes.

3.2.2 VERB DISAMBIGUATION

Just as selectional restrictions of verbs disambiguate nouns, the arguments of verbs are powerful disambiguators of the verbs themselves. Subject, object, and oblique arguments are all taken into account by the algorithm. (32) and (33) illustrate the way that objects can disambiguate the verb "charge," generic entries for which are shown in Table 9. In (32) the SENTIENT object selects sense 1. In (33), the MONEY object selects sense 3.

32. The state charged the man with a felony.
33. The state charged ten dollars for the fine.

The verb "present" can be disambiguated by its subject. It has at least three senses:

34. present1: "give"
    present2: "introduce"
    present3: "arrive in the mind of"

Senses 1 and 2 require SENTIENT subjects, so the third sense is selected in (35).

35. The decision presented a problem to banks.

(36) illustrates subject and object disambiguation. The SENTIENT subject narrows the possibilities to senses 1 and 2, and the "give" sense (1) is selected because it requires a PHYSICAL object argument.

36. John presented a bouquet to Mary.

3.2.3 DISAMBIGUATION RULES

The disambiguation method first tries fixed and frequent phrases, then syntactic tests, and finally naive semantic information. Each set of rules reduces the number of relevant senses of a word in the sentential (and extrasentential) context. There is a fixed set of commonsense rules for nouns and another one for verbs. They are tried in an order that inspects the main verb of the clause last, because the main verb often chooses between the last two relevant senses. An example of a noun rule is ppmod, which considers an ambiguous noun in relation to the head of a prepositional phrase modifier attached to the same higher NP as the ambiguous noun. There are two versions of the rule, one which looks for ontological similarity between senses of the ambiguous noun and the head of the PP, and one which looks for generic relationships between them. The output of find_args (in DISAMBIG, see Section 3.1) is used as a simplified syntactic structure inspected by these rules. This provides information as to whether or not a head noun is modified by a prepositional phrase. In the first version of the rule, the ontological attachment of the head of the PP is looked up, and then senses of the ambiguous word with that same ontological attachment are selected. S1 is the list of senses of the ambiguous noun still relevant when the rule is invoked. S2 is the list of senses reduced by the rule if it succeeds. If the first version fails, the second is invoked. It looks for a generic relationship between senses of the ambiguous word and the head of the PP.

37. ppmod(Ambig_Word,{...Ambig_Word,Prep,Noun...},S1,S2) ←
        ontattach(Noun,Node) &
        ontselect(Ambig_Word,Node,S1,S2).

    ppmod(Ambig_Word,{...Ambig_Word,Prep,Noun...},S1,S2) ←
        generic_relation(Noun,Ambig_Word).

3.3 QUANTIFIER SCOPING

The semantic module of the KT system is capable of generating alternative readings in cases of quantifier scope ambiguities, as in the classic case illustrated in (38).

38. Every man loves a woman.

In this example either the universally quantified NP ("every man") or the existentially quantified NP ("a woman") can take widest scope. In such sentences, it is generally assumed that the natural scope reading (leftmost quantified expression taking widest scope) is to be preferred, and the alternative reading chosen only under explicit instructions of some sort (such as input from a user, for example, or upon failure to find a proper antecedent for an anaphoric expression). Under this assumption, in a sentence like (39), the indefinite NP would preferentially take widest scope.

39. A woman loves every man.

But there are a number of cases similar to (39) where an expression quantified by "every" appears to the right of an indefinite NP and still seems to take widest scope. Ioup (1975) has discussed this phenomenon, suggesting that expressions such as "every" take widest scope inherently. Another computational approach to this problem is to assign precedence numbers to quantifiers, as described in McCord (1987). However, our investigation has shown that commonsense knowledge plays at least as large a role as any inherent scope properties of universal quantifiers.

Consider (40). In the natural scope reading, "every lawyer" takes scope over "a letter" and we have several letters, one from each of the lawyers, i.e., several tokens of a one-to-one relationship. In the alternative reading, "a letter" takes scope over "every lawyer" and we have only one letter, i.e., one token of a many-to-one relationship. Both scope readings are plausible for (40).

40. The judge read a letter from every lawyer.

In (41) only the alternative reading (several tokens of one-to-one) is plausible.

41. The politician promised a chicken in every pot.

In (42), however, only the natural reading (one token of many-to-one) is plausible.

42. The prince sought a wife with every charm.

Even in (40), however, speakers prefer the one-to-one relationship, the alternative reading. That is, speakers prefer the reading that denotes several tokens of a one-to-one relationship (several letters) over one which denotes one token of a many-to-one relationship unless

Computational Linguistics, Volume 15, Number 3, September 1989 161
Kathleen Dahlgren, Joyce McDowell, and Edward P. Stabler, Jr. Knowledge Representation for Commonsense Reasoning with Text

there is strong commonsense knowledge to override this preference. We know that in our culture princes can have only one wife, so in the case of (42) speakers prefer one token (one wife) of many-to-one to several tokens of one-to-one. Similar arguments apply to the following examples (43)-(45), which correspond to (40)-(42) respectively.

43. A judge decided on every petition.
44. A lawyer arrived from every firm.
45. A company agent negotiated with every union.

Thus, if there is an inherent tendency for universal quantifiers to take widest scope where scope ambiguity is possible, it derives from the human preference for viewing situations involving universally quantified entities as many tokens of one-to-one relationships.8 In KT, first we prefer wide scope for the universal quantifier. In cases where this is not the natural scope interpretation (i.e., the universal quantifier is to the right of a containing existentially quantified NP), we can use facts encoded in the generic data base to override the preference. For example, when processing (42) we would discover that a man may have only one wife. The generic entry for "wife" tells us that "wife" is a role in a "marriage." The generic entry for "marriage" tells us that "marriage" is a relation between exactly one husband and exactly one wife. This knowledge forces the many-to-one interpretation. The cases where this is necessary turn out to be rare. Curiously, the preposition "with" correlates very highly with many-to-one relationships. Thus our strategy for the present has been to consider overriding the preference only when the universal quantifier is in an NP which is the object of "with." In these cases we access the generic knowledge, as described above.

3.4 OPAQUE CONTEXTS

In KT, clauses in opaque contexts (embedded under propositional attitude verbs such as "hope," "believe," "deny") are handled by asserting the predicates generated from the clause into partitioned databases, which correspond to the delineated DRSs of Asher (1987). Each partition is associated with the speaker or the individual responsible for the embedded clause. Reasoning can then proceed taking into account the reliability and bias of the originator of the partitioned statements, as in Section 2.2.4.1. An example follows.

46. Text: Meese believes that Levine is guilty.
    Textual Database: believe(s1,meese,p1)
    Partition: p1 :: guilty(s2,levine)

3.5 MODALS9

The English modals can, may, must, will, and should are high-frequency items in all kinds of texts. They can be easily parsed by a single rule similar to the rules that handle auxiliary "have" and "be" because all the modals occupy the same surface syntactic position (i.e., the first element in the auxiliary sequence). However, the modals present some considerable problems for semantic interpretation because they introduce ambiguities and induce intensional contexts in which possibility, necessity, belief, and value systems play a role. In the KT system, we are concerned with what is known by the system as a result of reading in a modal sentence. In particular we are interested in what status the system assigns to the propositional content of such a sentence.

To illustrate the problem, if the system reads "Levine engaged in insider trading," then an assertion can justifiably be added to the knowledge base reflecting the fact that Levine engaged in insider trading. The same is true if the system reads "The Justice Department knows that Levine engaged in insider trading." But this is not the case if the system reads "The Justice Department believes that Levine engaged in insider trading." In this case the statement that Levine engaged in insider trading must be assigned some status other than fact. Specifically, since "believe" introduces an opaque context, the propositional content of the embedded clause would be assigned to a partitioned data base linked to the speaker, the Justice Department, as described in the previous section. A similar problem exists in modal sentences such as "Levine may have engaged in insider trading."

There are two types of modal sentences. In Type I modal sentences the truth value of the propositional content is evaluated with respect to the actual world or a set of possible other states of the actual world directly inferable from the actual world. Examples are

47. Levine must have engaged in insider trading.
48. The Justice Department will prosecute Levine.
49. Levine can plead innocent.

In Type II modal sentences, we say that a second speech act is "semantically embedded" in the modal sentence. The modal sentence is successful as an assertion just in case the secondary speech act is in effect in the actual world. In these cases the truth value of the propositional content is evaluated with respect to some set of deontic or normative worlds. The modal is viewed as a quantifier cum selection function. Thus, for a sentence of the form μ = NP Modal VP, μ is true in the actual world just in case NP VP is true in at least one/every world (depending on Modal) in the set of deontic or normative worlds selected by Modal. Examples are

50. Levine must confess his guilt.
51. Levine may make one phone call.
52. Levine should get a good attorney.

In (50) a command is semantically embedded in the assertion; in (51) a permission is semantically embedded in the assertion; and in (52) the issuance of a norm is semantically embedded in the assertion.

Type I modal sentences are of assertive type according to the speech act classification scheme in Searle and Vanderveken (1985). These include the standard assertions, reports, and predictions, and a proposed new

type, quasi-assertion. They must be distinguished in a text-understanding system from Type II modal sentences that embed other types of speech acts, because only Type I modal sentences make a contribution to the textual knowledge base. In addition, some modal sentences are ambiguous between Type I and Type II, for example (47), (51), and (50).

For the KT system, disambiguating the ambiguous modals "may" and "must" results in changing the syntactic input to the semantic module. The surface syntactic parse that is output by the parser is converted into the equivalent logical form where the epistemic (Type I) uses of ambiguous modals are treated as sentential operators and the nonepistemic uses of ambiguous modals (Type II) are treated as modifiers of the verb. Sentences containing ambiguous modals can be assigned the correct status by a simple disambiguation algorithm that makes appeal to the ontological classification of the main verb, specifically whether or not the verb is STATIVE, following Steedman (1977). Disambiguation takes place in DISAMBIG, the same module that converts the labelled bracketing to a modified parse for input to the DRT semantic translator. At this point, a determination is made whether the modal is a sentential operator or a modifier of the verb.

The propositional content of quasi-assertions and predictions can be added directly to the dynamically constructed textual data base if they are appropriately marked with probability ratings. On this view, "will" and one sense of "may" are taken as denoting strong and weak prediction and are not viewed as tenses. That is, when using these modals, the speaker is indicating his confidence that there is a high/moderate probability of the propositional content being true in the future. In the present state of the system, every predicate contains a tense argument and there is a tense predicate relating every tense argument to a tense value (such as "pres.," "future," etc.).10 In a planned extension to the system these tense predicates will also contain probability ratings. For example, given the continuum of speaker commitment to the truth of the statement "Levine engaged in insider trading" illustrated in (53), we would have the corresponding predicates in the DRS shown in (54).

53. Full Assertion: Levine engaged in insider trading
    Strong Quasi-Assertion: Levine must have engaged in insider trading
    Weak Quasi-Assertion: Levine may have engaged in insider trading

54. Full Assertion: engage(e1,levine), tense(e1,past,1)
    Strong Quasi-Assertion: engage(e1,levine), tense(e1,past,0.9)
    Weak Quasi-Assertion: engage(e1,levine), tense(e1,past,0.5)

This hierarchy reflects the "epistemic paradox" of Karttunen (1971), in which he points out that in standard modal logic must(P), or necessarily P, is stronger than plain assertion, whereas epistemically-must(P) is weaker than plain assertion. This results from the fact that the standard logic necessity operator quantifies over every logically possible world, plain assertion of P is evaluated with respect to the actual world, but the epistemic modal operator quantifies only over the epistemically accessible worlds, a set which could possibly be null.

Assertions of possibility (49) trigger the inferring of enabling conditions. For any event there is a set of enabling conditions that must be met before the event is possible. For John to play the piano, the following conditions must be met:

55. 1. John knows how to play the piano.
    2. John has the requisite permissions (if any).
    3. A piano is physically available to John.
    4. John is well enough to play.
    ...

These can be ordered according to saliency as above. This, we claim, is why the sentence "John can play the piano" most often receives the interpretation (1), less often (2), and practically never (3) or (4) unless explicitly stated, as in "John can play the piano now that his mother has bought a new Steinway." The enabling conditions are encoded in KT as part of the generic representation for verbs. When a modal sentence is interpreted as a full assertion of possibility (poss(p)), this triggers the inference that the most salient of the enabling conditions is in fact true. The difference between poss(p) and p being processed for KT is that if p is output, then p is added to the textual database and the most salient enabling condition is also inferred. But if poss(p) is output, then only the most salient enabling condition is inferred, but p is not added to the textual database. Notice that this simply reflects the fact that if I say "John can play the piano," I am not saying that John is playing the piano at that very moment.

Type II modal sentences present a more complex problem for interpretation. The commands, permissions, and norms reported in Type II modal sentences are asserted into partitioned databases in the same way as clauses in opaque contexts. The only difference is that in most cases the issuer of the command, permission, or norm reported in a modal sentence is not known. Semantic translation in DRT proceeds via categorial combination. By the time the modal sentence reaches the DRT module, the semantic type of the modal is unambiguous and the appropriate lexical entry can be retrieved. The creation of appropriate predicates to express the variety of modal statements is the task of the DRT module.

4 THE QUERY SYSTEM

4.1 OPEN AND CLOSED WORLD ASSUMPTIONS

It is well known that negation-as-failure is a sound extension of SLD resolution only when the database is

complete, i.e., when it represents a closed world (Lloyd 1984). Since our databases will always include some predicates about which we have only incomplete information, we cannot assume a completely closed world. The open world assumption, though, makes it unsound to use the very useful negation-as-failure rule. Fortunately, it is well known that we can restrict the use of this rule to just those predicates known to be complete, keeping an open world assumption for other predicates. We accordingly specify that some of our general knowledge comprises a complete, closed world in the appropriate sense, but we do not make this assumption about textual knowledge.

4.2 FUNCTIONING OF THE QUERY SYSTEM

Queries are handled just like text up through conversion to FOL. The FOL form of the query is then passed to REASONER, which decides which database is the most likely source of the answer. REASONER can access the textual database, the verb and noun generic databases, and the ontology. The reasoning is complex and dependent on whether the query form is ontological, factual, or generic. A search sequence is then initiated depending on these factors. The form of the answer depends on the search sequence and the place where the answer is found.

4.2.1 ANSWERS

The form of the answer depends on the reliability of the knowledge encoded in the database where the answer is found. The text is considered authoritative. If an answer is found in the text, search is terminated. The ontology is considered a closed world (see discussion above). This means that yes/no ontological questions are answered either "yes" or "no." The textual and generic databases are considered an open world. If an answer is not found, and no further search is possible, the system concludes, "I don't know." Answers found in the generic databases are prefaced by "Typically," for information found in the first list (of typical features) or "Inherently," for answers found in the second list (of inherent features).

4.2.2 QUESTION TYPES

An ontological question is in a copular sentence with a non-terminal ontological node in the predicate.

56. Ontological Questions:
    Is a man human?
    Is the man a plant?

A factual question is one couched in the past tense, present progressive, and/or where the subject is specific (a name or a definite NP). Specific NPs are assumed to have already been introduced into the system. Our simplified definition of generic question is one which contains an inherently stative verb or a non-stative verb in the simple present combined with a non-specific subject (indefinite NP).11

57. Factual Questions:
    Did John buy a book?
    Is the man happy?
    Who bought the book?
    Generic Questions:
    Does a man buy a book?
    Does the man love horses?

Ontological questions are answered by looking in the ontological database only. If an answer is not found, the response will be "no" if the query contained an ontological predicate (such as PLANT or ANIMAL) because the ontology is a closed world.

58. Text: John is a man who bought a book.
    Is a plant living?--Yes.
    Is the man an animal?--Yes.
    Is the man a plant?--No.

Factual queries (non-generic questions) go to the textual database first. If an answer is not found, then the generic knowledge is consulted. If an answer is found there, the appropriate response (Typically..., Inherently...) is returned. Otherwise the response is, "I don't know."

59. Text: John is a man who bought a book.
    Did John buy a book?--Yes.
    Who bought a book?--John.
    Is John the President?--I don't know.
    Does John wear pants?--Typically so.
    Where did John buy the book?--Typically, in a store.

Generic queries go only to the generic database.

60. Does a man wear pants?--Typically so.
    What is the function of a lawyer?--Inherently, represents clients.
    Is an apple red?--Typically so.
    Does a man love flowers?--I don't know.
    Where does a man buy a book?--Typically, in a store.
    Who buys a book?--Inherently, a sentient.

In addition to these general rules, there is special handling for certain types of queries. Questions of the form "Who is..." and "What is..." are answered by finding every predicate in any data base that is true of the questioned entity. For example, in any of the examples above, the question "Who is John?" would trigger a response that includes a list of all the nodes in the ontology which dominate the position where "John" is attached plus the information that John is a man, is tall, and bought a book. The question "What is a vehicle?" would trigger only the list of ontological nodes because there is no specific vehicle in the domain of this example. Questions such as "Who buys a book?" and "What is driven?" are answered by stating selectional restrictions on the verb--Inherently, a sentient buys a book, and Inherently, a vehicle is driven. Finally, it is possible to override the generic information in specific cases while still retaining the capability of accessing the generic information later, as the following example shows.
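The database search order just described can be summarized in a brief Python sketch. This is an illustrative model under assumed data structures, not the KT system itself (which is implemented in a Prolog-style notation): the textual database is authoritative, the two generic lists supply hedged "Typically"/"Inherently" answers, and open-world failure yields "I don't know" rather than "No."

```python
# Hypothetical miniature databases; predicate strings are invented for illustration.
TEXTUAL = {"man(john)", "bought(john,book)"}          # authoritative text facts
TYPICAL = {"wears_pants(man)": "Typically so."}       # first generic list
INHERENT = {"buys_books(sentient)": "Inherently so."} # second generic list

def answer(query):
    """Search text first, then typical, then inherent generic knowledge."""
    if query in TEXTUAL:
        return "Yes."                  # the text is authoritative; stop here
    if query in TYPICAL:
        return TYPICAL[query]
    if query in INHERENT:
        return INHERENT[query]
    return "I don't know."             # open world: failure is not falsity

print(answer("bought(john,book)"))     # Yes.
print(answer("wears_pants(man)"))      # Typically so.
print(answer("president(john)"))       # I don't know.
```

A closed-world ontology would differ only in the last line, returning "No." instead of "I don't know." for ontological predicates.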


61. What color is an airplane?--Typically, white.
    John bought an airplane.--OK.
    What color is the airplane?--The text does not say, but typically white.
    The airplane is red.--OK.
    What color is the airplane?--The text says red.
    What color is an airplane?--Typically, white.

Thus the system uses default logic (Reiter 1980). REASONER, therefore, is sensitive to a number of factors which make the system seem to understand the queries in a natural way. The responses generated also reflect the continuum of reliability of information which is available to a human reasoner. A flow chart of the search strategies in REASONER is shown in Figure 3.

[Flow chart: an "Ontological question?" test routes the query through the ontology, textual, typical, and inherent databases to the answers "Yes," "No," "Typically...," "Inherently...," and "I don't know."]

Figure 3. The Query System

5 NAIVE SEMANTICS AND DISCOURSE PHENOMENA12

Most computational treatments of discourse phenomena acknowledge the role of world knowledge in anaphora resolution, temporal reasoning, and causal reasoning (Reichman 1985; Grosz and Sidner 1986; Wada and Asher 1986). However, in the past the only method for encoding and incorporating world knowledge involved writing a detailed script for every real-life situation, directly encoding the probable sequence of events, participants, and so forth (Schank and Abelson 1977). This section will demonstrate that word level naive semantics offers a principled, transportable alternative to scripts. NS is a powerful source of information in discourse reasoning. Along with syntactic, compositional semantic, and discourse cue information, NS can be used to reason heuristically about discourse and drive many of the inferences drawn by people when they read a discourse. The role of syntax and compositional semantics will be underplayed in what follows, only because these contributions have been thoroughly treated by others (Reinhart 1982; Asher and Wada 1988; Kamp 1981; Grosz and Sidner 1986; Reichman 1985; Webber 1985).

5.1 ANAPHORA

In anaphora resolution, syntactic constraints, accessibility (in the sense of Kamp 1981), and discourse segmentation work in concert to limit the number of antecedents available to an anaphoric pronoun or definite NP. However, it is clear that the resultant saliency stack can end up with more than one member (Asher and Wada 1988). It is at this point in the reasoning that naive inference is required. Consider the following.

62. Levine's friend is a lawyer. He won his case for him.

Syntactic rules exclude a reading in which "he" and "him" co-refer. Since both Levine and the lawyer are possible antecedents of all of the pronouns and therefore are both in the saliency stack, the following readings remain:

63. i. Levine won Levine's case for the lawyer.
    ii. Levine won the lawyer's case for the lawyer.
    iii. The lawyer won Levine's case for Levine.
    iv. The lawyer won the lawyer's case for Levine.

Reading (iii) is the one people select. Generic knowledge of "lawyer" suffices to make this selection. The generic representation of "lawyer" in (9) includes information that lawyers argue cases. An inspection of generic knowledge for "argue" reveals that one goal of arguing is winning. Using these two facts, the system can infer that the lawyer won, so the lawyer is the subject of the sentence. Thus the benefactee, by disjointness, must be Levine. Now the feature of "lawyer" that says that lawyers represent their clients could be used to figure that the case is Levine's.

Turning to definite anaphora, a definite description often has an implicit antecedent in the discourse, rather than an explicit or deictic one. Unless world knowledge is used, it is impossible to recover these. Consider the following.

64. Levine's trial was short. The testimony took only one day.

A generic representation of "trial" is shown below. Using it, the antecedent of "testimony" in (65) can be identified as a part of the trial.

65. trial--
    Typically, in a trial first there are oaths, state-


ments and testimony, then the judge instructs the jury, the jury deliberates and announces a verdict, and the judge pronounces the sentence. There are roles of witnesses and spectators. A trial lasts about three days.
    Inherently, a trial has roles of judge, clerk, bailiff, attorneys, plaintiff, and defendant. The goal of a trial is to settle a dispute or determine the guilt or innocence of a defendant.

An interesting subset of definite anaphora is definite event anaphora (Asher 1988). In (66), we know that "decision" refers to an event, because it is a deverbal nominal. Generic knowledge for deverbal nominals is the same as for verbs. So we know that there is some definite SOCIAL MENTAL event. Both commenting and sentencing are SOCIAL MENTAL verbs. Generic knowledge of the verb "sentence" includes knowledge that a sentencing is enabled by a decision. This can be used to infer that the antecedent of "the decision" is the sentencing event reported in the first sentence, rather than the commenting event. Thus e3 = e1.

66. (e1) The judge sentenced Levine to a short term.
    (e2) He commented that Levine had been cooperative.
    (e3) The decision (e4) surprised an attorney.

If the generic knowledge has an implication which is similar to the event nominal, the correct antecedent can be inferred. The resulting DRS is shown in (67).

67. x1, x2, a1, e1, x3, p, e4, a2, e2, e3
    judge(x1)
    levine(x2)
    term(a1)
    short(a1)
    e1 sentence(x1,x2)
    to(e1,a1)
    he(x3)
    e2 comment(x3,p)
    p: s2
    s2 cooperative(x2)
    x3 = x1
    decision(e3)
    attorney(a2)
    e4 surprise(e3,a2)
    e3 = e1

5.2 TEMPORAL REASONING

Temporal reasoning involves the assignment of relationships among the times of the events reported in the discourse. Following the Reichenbachian approach to tense found in Partee (1984), there are three elements to temporal reasoning: event time (ei), reference time (ri), and speech time (now). In a sequence of simple past tense clauses, the reference time is updated each time. So in text (66), the temporal equations look as follows.

68. r1 < now
    e1 ⊆ r1
    r1 < r2
    e2 ⊆ r2
    r2 < r3
    e3 ⊆ r3

However, the reference time is not always updated by a simple past tense verb. In addition to the effects of clause aspect, which will be discussed below, commonsense knowledge affects temporal reasoning. Two events reported in a sequence of simple past tense clauses can overlap in time, or the second event (in textual sequence) can occur before the first. Naive semantic representations are sufficient to assign these relations correctly in many cases. The typical implications of events in verb representations can be used to infer overlap. Consider the following discourse.

69. (e1) Levine made a statement. (e2) He said he was sorry.

We know that a "statement" is a deverbal nominal of "state," and that the goal of "state" and also of "say" is communicating ideas. This knowledge is used to infer that e2 ⊆ e1.

The regular temporal presuppositions of implicational features on verbs are very powerful in inferring the relative order of events. An event ei must occur before an event ej in order to cause ej.

    cause(e1,e2)         e2 before e1
    enable(e1,e2)        e2 before e1
    goal(e1,e2)          e2 after e1
    consequence(e1,e2)   e2 after e1
    whnext(e1,e2)        e2 after e1
    when(e1,e2)          e2 overlap e1
    where(e1,e2)         e2 overlap e1
    how(e1,e2)           e2 overlap e1

    Table 10. Temporal Presuppositions of Verb Implications

Table 10 lists the temporal presuppositions of several generic features of verbs. This works well when one of the implications of a verb mentions another verb (or related verb) in the text, and tense or adverbial modifiers do not give clues as to the temporal relations. In (70), the fact that e2 is a typical cause of e1 can be used to infer that e2 occurred before e1.

70. (e1) Levine was found guilty. (e2) He broke the law.

Similarly, in (71), the fact that buying typically takes place in stores can be used to infer that e2 overlapped e1 in time. (The stativity of the verb in the second sentence is also a weak, but insufficient, indicator of the temporal relationship between the sentences).
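The application of Table 10 can be given as a short Python sketch. This is an illustrative rendering under an assumed dictionary encoding, not the KT system's own representation: the implicational feature linking the implied event e2 to the reported event e1 determines their temporal relation directly.

```python
# Table 10: implicational feature -> temporal relation of e2 (implied event)
# to e1 (reported event). The encoding as a flat dict is an assumption.
TEMPORAL_PRESUPPOSITION = {
    "cause": "before", "enable": "before",
    "goal": "after", "consequence": "after", "whnext": "after",
    "when": "overlap", "where": "overlap", "how": "overlap",
}

def relate(implication):
    """Infer the temporal relation of e2 to e1 from the implication type."""
    return f"e2 {TEMPORAL_PRESUPPOSITION[implication]} e1"

# (70): breaking the law is a typical cause of being found guilty.
print(relate("cause"))    # e2 before e1
# (71): buying typically takes place (where) in a store.
print(relate("where"))    # e2 overlap e1
```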


71. (e1) Levine bought a book. (e2) He was in a store on 5th Ave.

Temporal properties of nouns also require naive semantics. Role terms have a temporal element (Enc 1987). A term such as "President of the U.S." has a DURATION feature in the naive semantic representation. This enables appropriate inferences concerning the occupants of roles. In (72), the system can infer that the individual who declared war on insider trading is probably not the same as the one whose offi

    δ1 and δ2 -- event or state reference markers
    r1 < r2 -- reference times
    1. δ1 ⊆ r1, δ2 ⊆ r2   Elaboration(δ1,δ2), Cause(δ1,δ2), Goal(δ2,δ1), Evidence(δ2,δ1), Enablement(δ1,δ2), Comment(δ2,δ1)
    2. δ1, δ2 ⊆ r1        Parallel(δ1,δ2), Contrast(δ1,δ2),