
Generating Referring Quantified Expressions

James Shaw and Kathleen McKeown Dept. of Computer Science

Columbia University
New York, NY 10027, USA
{shaw, kathy}@cs.columbia.edu

Abstract

In this paper, we describe how quantifiers can be generated in a text generation system. By taking advantage of discourse and ontological information, quantified expressions can replace entities in a text, making the text more fluent and concise. In addition to avoiding ambiguities between distributive and collective readings in universal quantification generation, we will also show how different scope orderings between universal and existential quantifiers will result in different quantified expressions in our algorithm.

1 Introduction

To convey information concisely and fluently, text generation systems often perform opportunistic text planning (Robin, 1995; Mellish et al., 1998) and employ advanced linguistic constructions such as ellipsis (Shaw, 1998). But a system can also take advantage of quantification and ontological information to generate concise references to entities at the discourse level. For example, a sentence such as "The patient has an infusion line in each arm." is a more concise version of "The patient has an infusion line in his left arm. The patient has an infusion line in his right arm." Quantification is an active research topic in logic, language, and philosophy (Carpenter, 1997; de Swart, 1998). Since natural language understanding systems need to obtain as few interpretations as possible from text, researchers have studied quantifier scope ambiguity extensively (Woods, 1978; Grosz et al., 1987; Hobbs and Shieber, 1987; Pereira, 1990; Moran and Pereira, 1992; Park, 1995). Research in quantification first transforms a sentence into predicate logic, raises the quantifiers to the sentential level, and permutes these quantifiers to obtain as many readings as possible related to quantifier scoping. Then, invalid readings are eliminated using various constraints.

Ambiguity in quantified expressions is caused by two main culprits. The first type of ambiguity involves the distributive reading versus the collective reading. In universal quantification, a referring expression refers to multiple entities. There is a potential ambiguity between whether the aggregated entities acted individually (distributive) or acted together as one (collective). Under the distributive reading, the sentence "All the nurses inspected the patient." implies that each nurse individually inspected the patient. Under the collective reading, the nurses inspected the patient together as a group. The other ambiguity in quantification involves multiple quantifiers in the same sentence. The sentence "A nurse inspected each patient." has two possible scope orderings. In ∀patient∃nurse, the universal quantifier ∀ has wide scope, outscoping the existential quantifier ∃. This ordering means that each patient is inspected by a nurse, who might not be the same in each case. In the other scope order, ∃nurse∀patient, a single, particular nurse inspected every patient. In both types of ambiguities, a generation system should make the desired reading clear.

Fortunately, the difficulties of quantifier scope disambiguation faced by the understanding community do not apply to text generation. For generation, the problem is the reverse: given an unambiguous representation of a set of facts as input, how can a system generate a quantified sentence that unambiguously conveys the intended meaning? In this paper, we propose an algorithm which selects an appropriate quantified expression to refer to a set of entities using discourse and ontological knowledge. The algorithm first identifies the entities for quantification in the input propositions. Then an appropriate concept in the ontology is selected to refer to these entities. Using discourse and ontological information, the system determines if quantification is appropriate and, if it is, which particular quantifier to use to minimize the ambiguity between distributive and collective readings. More importantly, when there are multiple quantifiers in the same sentence, the algorithm generates different expressions for different scope orderings.

In this work, we focus on generating referring quantified expressions for entities which have been mentioned before in the discourse or can be inferred from an ontology. There are quantified expressions that do not refer to particular entities in a domain or discourse, such as generics (e.g., "All whales are mammals.") or negatives (e.g., "The patient has no allergies."). The synthesis of such quantifiers is currently performed in earlier stages of the generation process.

In the next section, we will compare our approach with previous work in the generation of quantified expressions. In Section 3, we will describe the application where the need for concise output motivated our research in quantification. The algorithm for generating universal quantifiers is detailed in Section 4, including how the system handles ambiguity between distributive and collective readings. Section 5 describes how our algorithm generates sentences with multiple quantifiers.

2 Related Work

Because a quantified expression refers to multiple entities in a domain, our work can be categorized as referring expression generation (Dale, 1992; Reiter and Dale, 1992; Horacek, 1997). Previous work in this area did not address the generation of quantified expressions directly. In this paper, we are interested in how to systematically derive quantifiers from input propositions, discourse history, and ontological information. Recent work on the generation of quantifiers (Gailly, 1988; Creaney, 1996; Creaney, 1999) follows the analysis viewpoint, discussing scope ambiguities extensively. Though our algorithm generates different sentences for different scope orderings, we do not achieve this through scoping operations as they did. Creaney also discussed various imprecise quantifiers, such as "some", "at least", and "at most". With regard to generating generic quantified expressions, Knott et al. (1997) proposed an algorithm for generating defeasible, but informative descriptions for objects in museums.

Other researchers (van Eijck and Alshawi, 1992; Copestake et al., 1999) proposed representations in a machine translation setting which allow underspecification in regard to quantifier scope. Our work is different in that we perform quantification directly on the instance-based representation obtained from database tuples. Our input does not have the information about which entities are quantified, as is the case in machine translation, where the quantifiers are already specified in the input from a source language.

3 The Application Domain

We implemented our quantification algorithm as part of MAGIC (Dalal et al., 1996; McKeown et al., 1997). MAGIC automatically generates multimedia briefings to describe the post-operative status of a patient after undergoing Coronary Artery Bypass Graft surgery. The system embodies a standard text generation system architecture with three modules (Rambow and Korelsky, 1992): a content planner, a sentence planner, and a linguistic realizer. Once the bypass surgery is finished, information that is automatically collected during surgery, such as blood pressure, heart rate, and medications given, is sent to a domain-specific medical inference module. Based on the medical inferences and schemas (McKeown, 1985), the content planner determines the information to convey and the order to convey it.

The sentence planner takes a set of propositions (or predicate-argument structures) with rhetorical relations from the content planner and uses linguistic information to make decisions about how to convey the propositions fluently. Each proposition is represented as a feature structure (Kaplan and Bresnan, 1982; Kay, 1979) similar to the one shown in Figure 1. The sentence planner's responsibilities include referring expression generation, clause aggregation, and lexical choice (Wanner and Hovy, 1996). Then the aggregated predicate-argument structure is sent to FUF/SURGE (Elhadad and Robin, 1992), a linguistic realizer which transforms the lexicalized semantic specification into a string. The quantification algorithm is implemented in the sentence planner.

    ((TYPE EVENT)
     (PRED ((PRED receive) (ID id1)))
     (ARG1 ((PRED patient) (ID pt1)))
     (ARG2 ((PRED aprotinin) (ID ap1)))
     (MODS ((PRED after) (ID id2)
            (TYPE TIME)
            (ARG2 ((PRED critical-point)
                   (NAME intubation) (ID c1)))
     )))

Figure 1: The predicate-argument structure of "After intubation, a patient received aprotinin."

4 Quantification Algorithm

In this work, we prefer generating expressions with universal quantifiers over conjunction because, assuming that the users and the system have the same domain model, the universally quantified expressions are more concise and they represent the same amount of information as the expression with conjoined entities. In contrast, when given a conjunction of entities and an expression with a cardinal quantifier, the system, by default, would use the conjunction if the conjoined entities can be distinguished at the surface level. This is because once the system generates a cardinal quantifier when the universal quantification does not hold, such as "three patients", it is impossible for the hearer to recover the identities of these patients based on the context. The default heuristics to prefer universal quantifier over conjunction over cardinal quantifier can be superseded by directives from the content planner which are application specific.

The input to our quantification algorithm is a set of predicate-argument structures after the referring expression module selected the properties to identify the entities (Dale, 1992; Dale and Reiter, 1995), but without carrying out the assignment of quantifiers. Our quantification algorithm first identifies the set of distinct entities which can be quantified in the input propositions. A generalization of the entities in the ontology is selected to potentially replace the references to these entities. If universal quantification is possible, then the replacement is made and the system must select which particular quantifier to use. In our system, we have six realizations for universal quantifiers: "each", "every", "all" [1], "both", "the", and "any", and two for existential quantifiers: the indefinite article "a/an" and cardinal n.

[1] "all" is realized as "all the".

4.1 Identify Thematic Roles with Distinct Entities

Our algorithm identifies the roles containing distinct entities among the input propositions as candidates for universal and existential quantification. Suppose the system is given two propositions similar to the one in Figure 1, "After intubation, Alice received aprotinin" and "After start of bypass, Alice received aprotinin", each with four roles: PRED, ARG1, ARG2, and MODS-TIME. By computing similarity among entities in the same role, the system determines that the entities in ARG1, PRED, and ARG2 are identical in each role, and only the entities in MODS-TIME are different. Based on this result, the distinct entities in MODS-TIME, "after intubation" and "after start of bypass", are candidates for quantification.

4.2 Generalization and Quantification

We used the axioms in Figure 2 to determine if the distinct entities can be universally or existentially quantified. Though the axioms are similar to those used in Generalized Quantifier (Barwise and Cooper, 1981; Zwarts, 1983; de Swart, 1998), the semantics of set X and set D are different. In the previous step, the entities in set X have been identified. To compute set D in Figure 2, we introduce a concept, Class-X. Class-X is a generalization of the distinct entities in set X. Quantification can replace the distinct entities in the propositions with a reference to their type restricted by a quantifier, accessing discourse and ontological information to provide a context. Our ontology is implemented in CLASSIC (Borgida et al., 1989) and is a subset of WordNet (Miller et al., 1990) and an online medical dictionary (Cimino et al., 1994) designed to support multiple applications across the medical institution. Given the entities in set X, queries in CLASSIC determine the class of each instance and its ancestors in the ontology. Based on this information, the generalization algorithm identifies Class-X by computing the most specific class which covers all the entities. Earlier work (Passonneau et al., 1996) provided a framework for balancing specificity and verbosity in selecting appropriate concepts for generalization. However, given the precision needed in medical reports, our generalization procedure selects the most specific class.

• both: |D - X| = 0 and |X| = 2, can have collective reading
• every, all, the: |D - X| = 0 and |X| > 2, can have collective reading
• each: |D - X| = 0 and |X| ≥ 2, only distributive reading
• any: |D - X| = 0, when under the scope of negation
• a/an: |D ∩ X| > 0 and |X| = 1
• n (cardinal): |D ∩ X| > 0 and |X| = n

Figure 2: Axioms of the quantifiers discussed in this paper.

Set D represents the set of instances of Class-X in a context. Our system currently computes set D for three different contexts:

• discourse: Previous references can provide an appropriate context for universal quantification. For example, if "Alice" and "Bob" were mentioned in the previous sentence, the system can refer to them as "both patients" in the current sentence.
• domain ontology: The domain ontology provides a closed world from which we can obtain the set D by matching all the instances of a concept in the knowledge base, such as "every patient". In addition, certain concepts in the ontology have limited types. For example, knowing that cell savers, platelets and packed red blood cells are the only possible types of blood products in the ontology, the quantified expression "every blood product" can be used instead of referring to each entity.
• domain knowledge: The possessor of the distinct entities in a role might contain a maximum number of instances allowed for Class-X. For example, because a person has only two arms, the entities "the patient's left arm" and "the patient's right arm" can be referred to as "each arm".

The computation of set D can also involve interactions with a referring expression module (Dale and Reiter, 1995). For example, instead of the expressions "Alice and Bob" and "both patients" covered by the current algorithm, by interacting with a referring expression module, the system might determine that "both CABG patients operated on this morning by Dr. Rose" is a clearer expression to refer to the entities. Though this is desirable, we did not incorporate this capability into our system.

Although "the" is often used to indicate a generic reference (e.g., "The lion is the king of jungle."), in English, "the" can also be used as an unmarked universal quantifier when its head noun is plural, such as "the patients." Like the quantifier "all", "the" can be both distributive and collective. However, "the" cannot always replace "all" as a universal quantifier. "the" cannot be used when universal quantification is based on the domain ontology. For example, it is not obvious that the quantified expression in "John received the blood products." refers to "each blood product" in the ontology. Although unmarked universal quantifiers can be used to refer to body parts, as in "The lines include an IV in the arms.", the expression is ambiguous between the distributive and collective readings. Of the three contexts discussed above, the system occasionally generates "the" instead of "every" and "both" in a discourse context, yielding more natural output.

When the computed set D matches set X exactly (|D - X| = 0), a quantified expression with one of "each", "all", "every", "both", "the", or "any" replaces the entities in set X.

4.3 Selecting a Particular Quantifier

In general, the universal quantification of a particular type of entity, such as "every patient", refers to all such entities in a context. As a result, readers can recover what a universally quantified expression refers to. In contrast, readers cannot pinpoint which entity has been referred to in an existentially quantified expression, such as "a patient" or "two patients". Because a universally quantified expression preserves original semantics and is more concise than listing each entity, it is the focus of our quantification algorithm. The universal quantifiers implemented in our system include the six possible realizations of ∀ in English: "every", "all", "each", "both", "the", and "any". The only existential quantifiers implemented in our system are the singular indefinite quantifier, "a/an", and cardinal quantifiers, n. They are used in sentences with multiple quantifiers and when the entities being referred to do not have distinguishable expressions at surface level. A more developed pragmatic module is needed before quantifiers such as "some", "most", "at least", and "few" can be systematically generated. Indiscriminate application of imprecise quantification can result in vague or inappropriate text in our domain, such as "The patient received some blood products." In our application, knowing exactly what blood products are used is very important. To avoid generating such inappropriate sentences, the system only performs generalization on the entities which can be universally quantified. If the distinct entities cannot be universally quantified, the system will realize these entities using coordinated conjunction.

Once the system decides that a universally quantified expression can be used to replace the entities in set X, it must select which universal quantifier. Because our sentence planner opportunistically combines distinct entities from separate database entries for conciseness, it is not the case that these aggregated entities acted together (the collective reading). Given such input, the referring expression for aggregated entities should have only the distributive reading [2]. The universal quantifier "each" always imposes a distributive reading when applied. In general, "each" requires a "matching" between the domain of the quantifier and the objects referred to (McCawley, 1981, p. 37). In our algorithm, this matching process is exactly what happens, so "each" is the default universal quantifier in our algorithm.

[2] For our system to generate noun phrases with collective readings, the quantification process must be performed at the content planner level, not in the clause aggregation module.

Of course, indiscriminate use of "each" can result in awkward sounding text. For example, the sentence "Every patient is awake" sounds more natural than "Each patient is awake." However, since quantified expressions with the universal quantifiers "all" and "every" [3] can have collective readings (Vendler, 1967; McCawley, 1981), our system generates "every" and "all" under two conditions when the collective reading is unlikely. First, if the proposition is a state, as opposed to an event, we assume only the distributive reading is possible [4]. The quantifier "every" is used in "Every patient had tachycardia." because the proposition is a state proposition and contains the predicate has-attribute, an attributive relation. Second, when the concept being universally quantified is marked as having a distributive reading in the lexicon, such as the concept episode, the quantifiers "every" and "all" will be used instead of "each". These quantifiers make the quantified sentences more natural because they do not pick out the redundant distributive meaning.

[3] "every" is also distributive, but it stresses completeness or rather, exhaustiveness (Vendler, 1967). The sentence "John took a picture of everyone in the room." is ambiguous while "John took a picture of each person in the room." is not.

[4] There are cases where state propositions do have distributed readings (e.g., "Mountains surround the village."). Sentences with collective readings are handled earlier in the content planner and thus, this type of problem does not occur at this point in our system. Though this observation seems to be true in our medical application, when implementing quantifiers in a new domain, we can limit this assumption to only the subset of state relations for which it holds.

The use of prepositions can also affect which quantifier to use. For example, "After all the episodes, the patient received dobutamine" is ambiguous in regard to whether the dobutamine is given once during the surgery, or given after each episode. In contrast, the sentence "In all the episodes, the patient received dobutamine." does not have this problem. The current system looks at the particular preposition (i.e., "before", "after", or "in") before selecting the appropriate quantifier.

4.4 Examples of a Single Quantifier

Given the four propositions, "After intubation, Mrs. Doe had tachycardia", "After skin incision, Mrs. Doe had tachycardia", "After start of bypass, Mrs. Doe had tachycardia", and "After coming off bypass, Mrs. Doe had tachycardia.", the algorithm first identifies roles with similar entities, ARG1, PRED, ARG2, and removes them from further quantification processing, while the distinct entities in the role MODS-TIME, "after intubation", "after skin incision", "after start of bypass", and "after coming off bypass", are further processed for universal quantification. The role MODS-TIME is further separated into two smaller roles, one role with the prepositions and the other role with different critical points. Since the prepositions are all the same, universal quantification is only applied to the distinct entities in set X, in this case, the four critical points. Queries to the CLASSIC ontology indicate that the entities in set X, "intubation", "skin-incision", "start-of-bypass", and "coming-off-bypass", match all the possible types of the concept critical-point, satisfying the domain ontology context in Section 4.2. Since set D and set X match exactly, generalization and universal quantification can be used to replace the references to these entities: "After each critical point, Mrs. Doe had tachycardia." The system currently does not perform generalization on entities which failed the universal quantification test. In such cases, a sentence with conjunction will be generated, e.g., "After intubation and skin incision, Mrs. Doe had tachycardia."

In addition to "every", the system generates "both" when the number of entities in set X is two. In our application, "both" is used as a universal quantifier under discourse context: "Alice had episodes of [...]

[...]ment of negation, "each", "all", "every" and "both" are inappropriate, and "any" should be used instead. Given that the patient went on bypass without complications, the system should generate "The patient went on bypass without any problem." In contrast, "The patient went on bypass without every problem." has a different meaning. Our system currently uses "any" as a universal quantifier when the universal quantification is under the government of negation, such as "The patient denied any drug allergy." or "Her hypertension was controlled without any medication." Currently, the generation of negation sentences about surgery problems and allergies is handled in the content planner. They are not synthesized from multiple negation sentences: "The patient is not allergic to aspirin. The patient is not allergic to penicillin..."

5 Generation of Multiple Quantifiers

When there are two distinct roles across the propositions, the algorithm tries to use a universal quantifier for one role and an existential quantifier for another. To generate sentences with ∃∃, both entities being referred to must have no proper names; this triggers the use of existential quantifiers. We intentionally ignore the cases where two universal quantifiers are generated in the same sentence. The likelihood for input specifying sentences with ∀∀ to a text generation system is slim.

When generating multiple quantifiers in the same sentence, we differentiate between cases where there is or isn't a dependency between the two distinct roles. Two roles are independent of each other when one is not a modifier of the other. For example, the roles ARG1 and ARG2 in a proposition are independent. In "Each patient is given a high severity rating", performing universal quantification on the patients (ARG3) is a separate decision from the existential quantification of the severity ratings (ARG2). Similarly, in "An abnormal lab result was seen in each patient with hypertension after bypass", the quantification operations on the abnormal lab results and the patients can be performed independently.

When there is a dependency between the roles being quantified, the quantification process of each role might interact because modifiers restrict the range of the entities being modified. We found that when universal quantification occurs in the MODS role, the quantification of PRED and MODS can be performed independently, just as in the cases without dependency. Given the input propositions "Alice has