
To appear in Proceedings of the 20th International Conference on Computational Linguistics (COLING-04), August 23-27, 2004, Geneva.

Inferring parts of speech for lexical mappings via the Cyc KB

Tom O'Hara‡, Stefano Bertolo, Michael Witbrock, Bjørn Aldag, Jon Curtis,
with Kathy Panton, Dave Schneider, and Nancy Salay

Cycorp, Inc., Austin, TX 78731
{bertolo,witbrock,aldag}@cyc.com, {jonc,panton,daves,nancy}@cyc.com

‡Computer Science Department, New Mexico State University, Las Cruces, NM 88001
[email protected]

Abstract

We present an automatic approach to learning criteria for classifying the parts of speech used in lexical mappings. This will further automate our knowledge acquisition system for non-technical users. The criteria for the speech parts are based on the types of the denoted terms along with morphological and corpus-based clues. Associations among these and the parts of speech are learned using the lexical mappings contained in the Cyc knowledge base as training data. With over 30 speech parts to choose from, the classifier achieves good results (77.8% correct). Accurate results (93.0%) are achieved in the special case of the mass-count distinction for nouns. Comparable results are also obtained using OpenCyc (73.1% general and 88.4% mass-count).

1 Introduction

In semantic lexicons, the term lexical mapping describes the relation between a concept and a phrase used to refer to it (Onyshkevych and Nirenburg, 1995; Burns and Davis, 1999). Lexical mappings include associated syntactic information, in particular, part of speech information for phrase headwords. The term lexicalize will refer to the process of producing these mappings, which are referred to as lexicalizations. Selecting the part of speech for the lexical mapping is required so that proper inflectional variations can be recognized and generated for the term. Although producing lexicalizations is often a straightforward task, there are many cases that can pose problems, especially when fine-grained speech part categories are used. For example, the headword 'painting' is a verb in the phrase "painting for money" but a noun in the phrase "painting for sale." In cases like this, semantic or pragmatic criteria, as opposed to syntactic criteria only, are necessary for determining the proper part of speech. The headword part of speech is important for correctly identifying phrasal variations. For instance, the first term can also occur in the same sense in "paint for money." However, this does not hold for the second case, since "paint for sale" has an entirely different sense (i.e., a substance rather than an artifact).

When lexical mappings are produced by naive users, such as in DARPA's Rapid Knowledge Formation (RKF) project, it is desirable that technical details such as the headword part of speech be inferred for the user. Otherwise, often complex and time-consuming clarification dialogs might be necessary in order to rule out various possibilities. For example, Cycorp's Assistant was developed for RKF in order to allow non-technical users to specify lexical mappings from terms into the Cyc knowledge base (KB). Currently, when a new type of activity is described, the user is asked a series of questions about the ways of referring to the activity. If the user enters the phrase "painting for money," the system asks whether the phrases "paint for money" and "painted for money" are suitable variations in order to determine whether 'painting' should be treated as a verb. Users find such clarification dialogs distracting, since they are more interested in entering domain rather than linguistic knowledge. Regardless, it is often very difficult to produce prompts that make the distinction intelligible to a linguistically naive user.

A special case of the lexicalization speech part classification is the handling of the mass-count distinction. Having the ability to determine if a concept takes a mass or count noun is useful not only for parsing, but also for generation of grammatical English. For example, automatically generated web pages (e.g., based on search terms) occasionally produce ungrammatical term variations because this distinction is not addressed properly. Although learner dictionaries provide information on the mass-count distinction, they are not suitable for this task because different senses of a word are often conflated in the definitions for the sake of simplicity. In cases like this, the word or sense might be annotated as being both count and mass, perhaps with examples illustrating the different usages. This is the case for 'chicken' from the Cambridge International Dictionary of English (Procter, 1995), defined as follows:

    a type of bird kept on a farm for its eggs or its meat, or the meat of
    this bird which is cooked and eaten

This work describes an approach for automatically inferring the parts of speech for lexical mappings, using the existing lexical assertions in the Cyc KB. We are specifically concerned with selecting parts of speech for entries in a semantic lexicon, not with determining parts of speech in context. After an overview of the Cyc KB in the next section, Section 3 discusses the approach taken to inferring the part of speech for lexicalizations. Section 4 then covers the classification results. This is followed by a comparison to related work in Section 5.

2 Cyc knowledge base

In development since 1984, the Cyc knowledge base (Lenat, 1995) is the world's largest formalized representation of commonsense knowledge, containing over 120,000 concepts and more than a million axioms.[1] Cyc's upper ontology describes the most general and fundamental of distinctions (e.g., tangibility versus intangibility). The lower ontology contains facts useful for particular applications, such as web searching, but not necessarily required for commonsense reasoning (e.g., that "Dubya" refers to President George W. Bush). The KB also includes a broad-coverage English lexicon mapping words and phrases to terms throughout the KB. A subset of the Cyc KB including parts of the English lexicon has been made freely available as part of OpenCyc (www.opencyc.org).

[1] These figures and the results discussed later are based on Cyc KB version 576 and OpenCyc KB version 567.

2.1 Ontology

Central to the Cyc ontology is the concept collection, which corresponds to the familiar notion of a set, but with membership intensionally defined (so distinct collections can have identical members, which is impossible for sets). Every object in the Cyc ontology is a member (or instance, in Cyc parlance) of one or more collections. Collection membership is expressed using the predicate (i.e., relation-type) isa, whereas collection subsumption is expressed using the transitive predicate genls (i.e., generalization). These predicates correspond to the set-theoretic notions element of and subset of respectively and thus are used to form a partially ordered hierarchy of concepts. For the purposes of this discussion, the isa and genls assertions on a Cyc term constitute its type definition.

    Collection: PhysicalDevice
    Microtheory: ArtifactGVocabularyMt
      isa: ExistingObjectType
      genls: Artifact ComplexPhysicalObject SolidTangibleProduct
    Microtheory: ProductGMt
      isa: ProductType

    Figure 1: Type definition for PhysicalDevice.

Figure 1 shows the type definition for PhysicalDevice, a prototypical denotatum term for count nouns. The type definition of PhysicalDevice indicates that it is a collection that is a specialization of Artifact, etc. As is typical for terms referred to by count nouns, it is an instance of the collection ExistingObjectType.

    Collection: Water
    Microtheory: UniversalVocabularyMt
      isa: ChemicalCompoundTypeByChemicalSpecies
    Microtheory: UniversalVocabularyMt
      genls: Individual
    Microtheory: NaivePhysicsVocabularyMt
      genls: Oxide

    Figure 2: Type definition for Water.

Figure 2 shows the type definition for Water, a prototypical denotatum term for mass nouns. Although the asserted type information for Water does not convey any properties that would suggest a mass noun lexicalization, the genls hierarchy of collections does. In particular, the collection ChemicalCompoundTypeByChemicalSpecies is known to be a specialization of the collection ExistingStuffType, via the transitive properties of genls. Thus, by virtue of being an instance of ChemicalCompoundTypeByChemicalSpecies, Water is known to be an instance of ExistingStuffType. This illustrates that the decision procedure for the lexical mapping speech parts needs to consider not only asserted, but also inherited collection membership.

2.2 English lexicon

Natural language lexicons are integrated directly into the Cyc KB (Burns and Davis, 1999). Though several lexicons are included in the KB, the English lexicon is the only one with general coverage. The mapping from nouns to concepts is done using one of two general strategies, depending on whether the mapping is from a name or a common noun phrase. Several different binary predicates indicate name-to-term mappings, with the name represented as a string. For example,

    (nameString HEBCompany "HEB")

A denotational assertion maps a phrase into a concept, usually a collection. The phrase is specified via a lexical word unit (i.e., a concept) with optional string modifiers. The part of speech is specified via one of Cyc's SpeechPart constants. Syntactic information, such as the wordform variants and their speech parts, is stored with the Cyc constant for the word unit.

    Predicate            OpenCyc     Cyc
    multiWordString         1123   24606
    denotation              2080   16725
    compoundString           318    2226
    headMedialString         200     942
    total                   3721   44499

    Table 1: Denotational predicate usage in the Cyc English lexicon.
    This excludes slang and jargon.

    SpeechPart           OpenCyc     Cyc
    CountNoun               2041   21820
    MassNoun                 566    9993
    Adjective                262    6460
    Verb                     659    2860
    AgentiveNoun              81    1389
    ProperCountNoun           16     906
    Adverb                    50     310
    ProperMassNoun             1     286
    GerundiveNoun              7     275
    other                     39     185
    total                   3721   44499

    Table 2: Most common speech parts in denotational assertions.
    The other entry covers 20 infrequently used cases.
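The need to consider inherited as well as asserted collection membership (Section 2.1) can be made concrete with a small sketch. The toy KB fragment and helper names below (ISA, GENLS, ancestor_terms, and the PartiallyTangible link) are illustrative assumptions, not Cyc's actual API or inference engine.

```python
# Sketch: a term's ancestor terms are the collections it is an instance
# or specialization of, asserted directly or inherited through the
# transitivity of genls.

ISA = {   # direct collection membership (isa)
    "Water": {"ChemicalCompoundTypeByChemicalSpecies"},
    "PhysicalDevice": {"ExistingObjectType", "ProductType"},
}
GENLS = {  # direct collection subsumption (genls)
    "ChemicalCompoundTypeByChemicalSpecies": {"ExistingStuffType"},
    "PhysicalDevice": {"Artifact", "ComplexPhysicalObject"},
    "Artifact": {"PartiallyTangible"},  # hypothetical link for illustration
}

def ancestor_terms(term):
    """Collect asserted and inherited ancestor collections of a term."""
    frontier = list(ISA.get(term, set()) | GENLS.get(term, set()))
    ancestors = set()
    while frontier:
        coll = frontier.pop()
        if coll not in ancestors:
            ancestors.add(coll)
            frontier.extend(GENLS.get(coll, set()))  # follow transitive genls
    return ancestors

# Water is only transitively an ExistingStuffType (suggesting a mass noun):
print("ExistingStuffType" in ancestor_terms("Water"))  # True
```

In this sketch, Water's mass-noun clue (ExistingStuffType) is reachable only through the transitive closure, mirroring the Figure 2 discussion.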
For example, Device-TheWord, the Cyc constant for the word 'device,' has a single syntactic mapping since the plural form is inferable:

    Constant: Device-TheWord
    Microtheory: GeneralEnglishMt
      isa: EnglishWord
      posForms: CountNoun
      singular: "device"

The simplest type of denotational mapping associates a particular sense of a word with a concept via the denotation predicate. For example,

    (denotation Device-TheWord CountNoun 0 PhysicalDevice)

This indicates that sense 0 of the count noun 'device' refers to PhysicalDevice via the associated wordforms "device" and "devices."

To account for phrasal mappings, three additional predicates are used, depending on the location of the headword in the phrase. These are compoundString, headMedialString, and multiWordString for phrases with the headword at the beginning, the middle, and the end, respectively. For example,

    (compoundString Buy-TheWord ("down") Verb BuyDown)

This states that "buy down" refers to BuyDown, as do "buys down," "buying down," and "bought down," based on the inflections of the verb 'buy.'

Table 1 shows the frequency of the various predicates used in the denotational assertions, excluding lexicalizations that involve technical, informal, or slang terms. Table 2 shows the most frequent speech parts from these assertions. This shows that nearly 50% of the cases use CountNoun for the headword speech part and that about 25% use MassNoun. This subset of the denotational assertions forms the basis of the training data used in the mass versus count noun classifier, as discussed later. Twenty other speech parts used in the lexicon are not shown. Several of these are quite specialized (e.g., QuantifyingIndexical) and not very common, mainly occurring in fixed phrases. The full speech part classifier handles all categories.

3 Inference of default part of speech

Our method of inferring the part of speech for lexicalizations is to apply machine learning techniques over the lexical mappings from English words or phrases to Cyc terms.
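The phrasal-variation behavior motivating this inference (only the headword inflects, so the chosen part of speech determines which variants of a phrase are licensed) can be sketched as follows. The inflection table and function names are illustrative assumptions, not Cyc code.

```python
# Sketch: generating recognizable variants of a phrasal lexicalization
# by inflecting only the headword, per its assigned speech part.

INFLECTIONS = {  # toy inflection table (assumed for illustration)
    ("buy", "Verb"): ["buy", "buys", "buying", "bought"],
    ("painting", "CountNoun"): ["painting", "paintings"],
}

def phrase_variants(headword, speech_part, rest):
    """Inflect the headword while keeping the rest of the phrase fixed."""
    return [f"{form} {rest}" for form in INFLECTIONS[(headword, speech_part)]]

print(phrase_variants("buy", "Verb", "down"))
# ['buy down', 'buys down', 'buying down', 'bought down']

# Treating 'painting' as a count noun (as in "painting for sale") licenses
# only nominal variants, not "paint for sale" or "painted for sale":
print(phrase_variants("painting", "CountNoun", "for sale"))
# ['painting for sale', 'paintings for sale']
```

This is why choosing the wrong speech part causes either spurious variants (e.g., "painted for sale") or missed ones, as discussed in the introduction.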
For each target denotatum term, the corresponding types and generalizations are extracted from the ontology. This includes terms for which the denotatum term is an instance or specialization, either explicitly asserted or inferable via transitivity. For simplicity, these are referred to as ancestor terms. The association between the lexicalization parts of speech and the common ancestor terms forms the basis for the main criteria used in the lexicalization speech part classifier and the special case for the mass-count classifier. In addition, this is augmented with features indicating whether known suffixes occur in the headword as well as with corpus statistics.

3.1 Cyc ancestor term features

There are several possibilities in mapping the Cyc ancestor terms into a feature vector for use in machine learning algorithms. The most direct method is to have a binary feature for each possible ancestor term, but this would require about ten thousand features. To prune the list of potential features, frequency considerations can be applied, such as taking the most frequent terms that occur in type definition assertions. Alternatively, the training data can be analyzed to see which reference terms are most correlated with the classifications.

For simplicity, the frequency approach is used here. The most frequent 1024 atomic terms are selected, excluding terms used for bookkeeping purposes (e.g., PublicConstant, which marks terms for public releases of the KB); half of these terms are taken from the isa assertions, and the other half from the genls assertions. These are referred to as the reference terms. For instance, ObjectType is a type for 21,108 of the denotation terms (out of 44,449 cases), compared to 20,283 for StuffType. These occur at ranks 13 and 14, so they are both included. In contrast, SeparationEvent occurs only 185 times as a generalization term, at rank 522, so it is pruned. See (O'Hara et al., 2003) for more details on extracting the reference term features.

3.2 Morphology and corpus-based features

In English, the suffix of a word can provide a good clue as to its speech part. For example, agentive nouns commonly end in '-or' or '-er.' Features to account for this are derived by seeing whether the headword ends in one of a predefined set of suffixes and adding the suffix as a value to an enumerated feature variable corresponding to suffixes of the given length. Currently, the suffixes used are the most common two to four letter sequences found in the headwords.

Often the choice of speech parts for lexicalizations reflects idiosyncratic usages rather than just underlying semantics. To account for this, a set of features is included that is based on the relative frequency with which the denotational headword occurs in contexts that are indicative of each of the main speech parts: singular, plural, count, mass, verbal, adjectival, and adverbial. See Figure 3. These patterns were determined by analyzing part-of-speech tagged text and seeing which function words co-occur predominantly in the immediate context for words of the given grammatical category. Note that high frequency function words such as 'to' were not considered because they are usually not indexed for information retrieval.

    Feature     Search Pattern
    singular    singular
    plural      plural
    count       "many plural" or "several plural"
    mass        "much singular" or "several singular"
    verb        "must head" or "could head"
    adverb      "did head" or "do head" or "does head" or "so head" or
                "has head been" or "have head been"
    adjective   "more head" or "most head" or "very head"

    Figure 3: Corpus pattern templates for part-of-speech clues. The
    placeholders refer to wordforms derived from the headword: plural and
    singular are derived via morphology; head uses the headword as is.

These features are derived as follows. Given a lexical assertion (e.g., (denotation Hound-TheWord CountNoun 0 Dog)), the headword is extracted and then the plural or singular variant wordform is derived for use in the pattern templates. Corpus checks are done for each, producing a vector of frequency counts (e.g., 29, 17, 0, 0, 0, 0, 0). These counts are then normalized and used as numeric features for the machine learning algorithm. Table 3 shows the results for the hound example along with a few other cases.

    Head    Sing  Plural  Count  Mass  Verb  Adv   Adj
    hound   .630  .370    0      0     0     0     0
    book    .613  .371    .011   .001  0     .002  .001
    wood    .577  .418    0      .004  0     .001  .001
    leave   .753  .215    0      0     .024  .008  0
    fast    .924  .003    0      .003  .001  .043  .027
    stormy  .981  0       0      0     0     0     .019

    Table 3: Sample relative frequency values from corpus checks.

3.3 Sample criteria

We use decision trees for this classification. Part of the motivation is that the result is readily interpretable and can be incorporated directly by knowledge-based applications. Decision trees are induced in a process that recursively splits the training examples based on the feature that partitions the current set of examples to maximize the information gain (Witten and Frank, 1999). This is commonly done by selecting the feature that minimizes the entropy of the distribution (i.e., yields the least uniform distribution). A fragment of the decision tree is shown to give an idea of the criteria being considered in the speech part classification. See Figure 4. In this example, the semantic types mostly provide exceptions to associations inferred from the suffixes, with corpus clues used occasionally for differentiation.

    if (genls Event) and
       (genls not ∈ {ConsumingFoodOrDrink, SeasonOfYear,
                     QualitativeTimeOfDay, SocialGathering,
                     PrecipitationProcess, SimpleRepairing,
                     ConflictEvent, SomethingAppearingSomewhere}) and
       (isa not PhysiologicalConditionType) and (f-Plural ≤ 0.245) then
        if (Suffix ∈ {ine, een}) then Verb
        if (Suffix ∈ {ile, ent}) then CountNoun
        if (Suffix = ing) then MassNoun
        if (Suffix = ion) then
            if (f-Mass > 0.026) then MassNoun else Verb
        if (Suffix = ite) then CountNoun
        if (Suffix ∈ {ide, ure, ous}) then Verb
        if (Suffix = ive) and (genls Perceiving) then MassNoun
            else CountNoun
        if (Suffix = ate) then
            if (not genls InformationStore) and (f-Count ≤ 0.048) and
               (f-Adverb ≤ 0.05) then
                if (genls Translocation) then MassNoun
                else CountNoun

    Figure 4: Sample rule from the general speech part classifier.

4 Evaluation and results

To test the performance of the speech part classification, 10-fold cross validation is applied to each configuration that was considered. Except as noted below, all the results are produced using Weka's J4.8 classifier (Witten and Frank, 1999), which is an implementation of Quinlan's C4.5 (Quinlan, 1993) decision tree learner. Other classifiers were considered as well (e.g., Naive Bayes and nearest neighbor), but J4.8 generally gave the best overall results.

4.1 Results for mass-count distinction

Table 4 shows the results for the special case mass-count classification. This shows that the system achieves an accuracy of 93.0%, an improvement of 24.4 percentage points over the standard baseline of always selecting the most frequent case (i.e., count noun). Other baselines are included for comparison purposes. For example, using the headword as the sole feature (just-headwords) performs fairly well compared to the system based on Cyc; but this classifier would lack generalizability, relying simply upon table lookup. (In this case, the decision tree induction process ran into memory constraints, so a Naive Bayes classifier was used instead.) In addition, a system only based on the suffixes (just-suffixes) performs marginally better than always selecting the most common case. Thus, morphology alone would not be adequate for this task. The OpenCyc version of the classifier also performs well. This illustrates that sufficient data is already available in OpenCyc to allow for good approximations for such classifications. Note that for the mass-count experiments and for the experiments discussed later, the combined system over full Cyc leads to statistically significant improvements compared to the other cases.

4.2 Results for general speech part classification

Running the same classifier setup over all speech parts produces the results shown in Table 5. The overall result is not as high, but there is a similar improvement over the baselines. Relying solely on suffixes or on corpus checks performs slightly better than the baseline. Using headwords performs well, but again that amounts to table lookup. In terms of absolute accuracy it might seem that the system based on OpenCyc is doing nearly as well as the system based on full Cyc. This is somewhat misleading, since the distribution of parts of speech is simpler in OpenCyc, as shown by the lower entropy value (Jurafsky and Martin, 2000).

5 Related work

There has not been much work in the automatic determination of the preferred lexicalization part of speech, outside of work related to part-of-speech tagging (Brill, 1995), which concentrates on the sequences of speech tags rather than the default tags.
    Dataset Characteristics
                     OpenCyc     Cyc
    Instances           2607   30676
    Classes                2       2
    Entropy             0.76    0.90

    Accuracy Figures
                     OpenCyc     Cyc
    Baseline            78.3    68.6
    Just-headwords      87.5    89.3
    Just-suffixes       78.3    71.9
    Just-corpus         78.2    68.6
    Just-terms          87.4    90.5
    Combination         88.4    93.0

    Table 4: Mass-count classification over Cyc lexical mappings.
    Instances is the size of the training data. Classes is the number of
    choices. Entropy characterizes distribution uniformity. Baseline uses
    the more frequent case. The just-X entries incorporate a single
    feature type: headwords from the lexical mapping, suffixes of the
    headword, corpus co-occurrence of part-of-speech indicators, and Cyc
    reference terms. Combination uses all features except for the
    headwords. For Cyc, it yields a statistically significant improvement
    over the others at p < .01 using a paired t-test.

    Dataset Characteristics
                     OpenCyc     Cyc
    Instances           3721   44499
    Classes               16      34
    Entropy             1.95    2.11

    Accuracy Figures
                     OpenCyc     Cyc
    Baseline            54.9    48.6
    Just-headwords      61.6    73.8
    Just-suffixes       55.6    53.0
    Just-corpus         63.1    49.0
    Just-terms          68.2    71.3
    Combination         73.1    77.8

    Table 5: Full speech part classification over Cyc lexical mappings.
    All speech parts in Cyc are used. See Table 4 for legend.

Brill uses an error-driven transformation-based learning approach that learns lists for transforming the initial tags assigned to the sentence. Unknown words are handled basically via rules that change the default assignment to another based on the suffixes of the unknown word. Pedersen and Chen (1995) discuss an approach to inferring the grammatical categories of unknown words using constraint solving over the properties of the known words. Toole (2000) applies decision trees to a similar problem, distinguishing common nouns, pronouns, and various types of names, using a framework analogous to that commonly applied in named-entity recognition.

In work closer to ours, Woods (2000) describes an approach to this problem using manually constructed rules incorporating syntactic, morphological, and semantic tests (via an ontology). For example, patterns targeting specific stems are applied provided that the root meets certain semantic constraints. There has been clustering-based work in part-of-speech induction, but these tend to target idiosyncratic classes, such as capitalized words and words ending in '-ed' (Clark, 2003).

The special case of classifying the mass-count distinction has received some attention. Bond and Vatikiotis-Bateson (2002) infer five types of countability distinctions using NT&T's Japanese to English transfer dictionary, including the categories strongly countable, weakly countable, and plural only. The countability assigned to a particular semantic category is based on the most common case associated with the English words mapping into the category. Our earlier work (O'Hara et al., 2003) also used just semantic features, but accounted for inheritance of types, achieving 89.5% with a baseline of 68.2%. Schwartz (2002) uses the five NT&T countability distinctions when tagging word occurrences in a corpus (i.e., word tokens), based primarily on clues provided by determiners. Results are given in terms of agreement rather than accuracy; compared to NT&T's dictionary there is about 90% agreement for the fully or strongly countable types and about 40% agreement for the weakly countable or uncountable types, with half of the tokens left untagged for countability. Baldwin and Bond (2003) apply sophisticated preprocessing to derive a variety of countability clues, such as grammatical number of modifiers, co-occurrence of specific types of determiners and pronouns, and specific types of prepositions. They achieve 94.6% accuracy using four categories of countability, including two categories for types of plural-only nouns. Since multiple assignments are allowed, negative agreement is considered as well as positive. When restricted to just count versus mass nouns, the accuracy is 89.9% (personal communication). Note that, as with Schwartz, the task is different from ours and that of Bond and Vatikiotis-Bateson: we assign countability to word/concept pairs instead of just to words.

6 Conclusion and future work

This paper shows that an accurate decision procedure (93.0%) accounting for the mass-count distinction can be induced from the lexical mappings in the Cyc KB. The full speech part classifier produces promising results (77.8%), considering that it is a much harder task, with over 30 categories to choose from. The features incorporate semantic information, in particular Cyc's ontological types, in addition to syntactic information (e.g., headword morphology).

Future work will investigate how the classifiers can be generalized for classifying word usages in context, rather than isolated words. This could complement existing part-of-speech taggers by allowing for more detailed tag types, such as for count and agentive nouns.

A separate area for future work will be to apply the techniques to other languages. For example, minimal changes to the classifier setup would be required to handle Romance languages, such as Italian. The version of the classifier that just uses Cyc reference terms could be applied as is, given lexical mappings for the language. For the combined-feature classifier, we would just need to change the list of suffixes and the part-of-speech pattern templates (from Figure 3).

Acknowledgements

The lexicon work at Cycorp has been supported in part by grants from NIST, DARPA (e.g., RKF), and ARDA (e.g., AQUAINT). At NMSU, the work was facilitated by a GAANN fellowship from the Department of Education and utilized computing resources made possible through MII Grants EIA-9810732 and EIA-0220590.

References

Timothy Baldwin and Francis Bond. 2003. Learning the countability of English nouns from corpus data. In Proc. ACL-03.

Francis Bond and Caitlin Vatikiotis-Bateson. 2002. Using an ontology to determine English countability. In Proc. COLING-2002, pages 99–105, Taipei.

Eric Brill. 1995. Transformation-based error-driven learning and natural language processing: A case study in part of speech tagging. Computational Linguistics, 21(4):543–565.

Kathy J. Burns and Anthony B. Davis. 1999. Building and maintaining a semantically adequate lexicon using Cyc. In Evelyn Viegas, editor, Breadth and Depth of Semantic Lexicons, pages 121–143. Kluwer, Dordrecht.

Alexander Clark. 2003. Combining distributional and morphological information for part of speech induction. In Proceedings of EACL 2003.

Daniel Jurafsky and James H. Martin. 2000. Speech and Language Processing. Prentice Hall, Upper Saddle River, New Jersey.

D. B. Lenat. 1995. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11).

Tom O'Hara, Nancy Salay, Michael Witbrock, Dave Schneider, Bjoern Aldag, Stefano Bertolo, Kathy Panton, Fritz Lehmann, Matt Smith, David Baxter, Jon Curtis, and Peter Wagner. 2003. Inducing criteria for mass noun lexical mappings using the Cyc KB, and its extension to WordNet. In Proc. Fifth International Workshop on Computational Semantics (IWCS-5).

B. Onyshkevych and S. Nirenburg. 1995. A lexicon for knowledge-based MT. Machine Translation, 10(2):5–57.

Ted Pedersen and Weidong Chen. 1995. Lexical acquisition via constraint solving. In Proc. AAAI 1995 Spring Symposium Series.

Paul Procter, editor. 1995. Cambridge International Dictionary of English. Cambridge University Press, Cambridge.

J. Ross Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, California.

Lane O.B. Schwartz. 2002. Corpus-based acquisition of head noun countability features. Master's thesis, Cambridge University, Cambridge, UK.

Janine Toole. 2000. Categorizing unknown words: Using decision trees to identify names and misspellings. In Proc. ANLP-2000.

Ian H. Witten and Eibe Frank. 1999. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA.

W. Woods. 2000. Aggressive morphology for robust lexical coverage. In Proc. ANLP-00.