
Learning the Countability of English from Corpus Data

Timothy Baldwin
CSLI, Stanford University
Stanford, CA, 94305
[email protected]

Francis Bond
NTT Communication Science Laboratories
Nippon Telegraph and Telephone Corporation
Kyoto, Japan
[email protected]

Abstract

This paper describes a method for learning the countability preferences of English nouns from raw text corpora. The method maps the corpus-attested lexico-syntactic properties of each noun onto a feature vector, and uses a suite of memory-based classifiers to predict membership in 4 countability classes. We were able to assign countability to English nouns with a precision of 94.6%.

1 Introduction

This paper is concerned with the task of knowledge-rich lexical acquisition from unannotated corpora, focusing on the case of countability in English. Knowledge-rich lexical acquisition takes unstructured text and extracts out linguistically-precise categorisations of word and expression types. By combining this with a grammar, we can build broad-coverage deep-processing tools with a minimum of human effort. This research is close in spirit to the work of Light (1996) on classifying the semantics of derivational affixes, and Siegel and McKeown (2000) on learning verb aspect.

In English, nouns heading noun phrases are typically either countable or uncountable (also called count and mass). Countable nouns can be modified by denumerators, prototypically numbers, and have a morphologically marked plural form: one dog, two dogs. Uncountable nouns cannot be modified by denumerators, but can be modified by unspecific quantifiers such as much, and do not show any number distinction (prototypically being singular): *one equipment, some equipment, *two equipments. Many nouns can be used in countable or uncountable environments, with differences in interpretation. We call the lexical property that determines which uses a noun can have the noun's countability preference.

Knowledge of countability preferences is important both for the analysis and generation of English. In analysis, it helps to constrain the interpretations of parses. In generation, the countability preference determines whether a noun can become plural, and the range of possible determiners. Knowledge of countability is particularly important in machine translation, because the closest translation equivalent may have different countability from the source noun. Many languages, such as Chinese and Japanese, do not mark countability, which means that the choice of countability will be largely the responsibility of the generation component (Bond, 2001). In addition, knowledge of countability obtained from examples of use is an important resource for dictionary construction.

In this paper, we learn the countability preferences of English nouns from unannotated corpora. We first annotate them automatically, and then train classifiers using a set of gold standard data, taken from COMLEX (Grishman et al., 1998) and the transfer dictionaries used by the machine translation system ALT-J/E (Ikehara et al., 1991). The classifiers and their training are described in more detail in Baldwin and Bond (2003). These are then run over the corpus to extract nouns as members of four classes: countable (dog), uncountable (furniture), bipartite ([pair of] scissors) and plural only (clothes).

We first discuss countability in more detail (§2). Then we present the lexical resources used in our experiment (§3). Next, we describe the learning process (§4). We then present our results and evaluation (§5). Finally, we discuss the theoretical and practical implications (§6).

2 Background

Grammatical countability is motivated by the semantic distinction between object and substance reference (also known as bounded/non-bounded or individuated/non-individuated).
It is a matter of contention among linguists as to how far grammatical countability is semantically motivated and how much it is arbitrary (Wierzbicka, 1988).

The prevailing position in the natural language processing community is effectively to treat countability as though it were arbitrary and encode it as a lexical property of nouns. The study of countability is complicated by the fact that most nouns can have their countability changed: either converted by a lexical rule or embedded in another noun phrase. An example of conversion is the so-called universal packager, a rule which takes an uncountable noun with an interpretation as a substance, and returns a countable noun interpreted as a portion of the substance: I would like two beers. An example of embedding is the use of a classifier, e.g. uncountable nouns can be embedded in countable noun phrases as complements of classifiers: one piece of equipment.

Bond et al. (1994) suggested a division of countability into five major types, based on Allan (1980)'s noun countability preferences (NCPs). Nouns which rarely undergo conversion are marked as either fully countable, uncountable or plural only. Fully countable nouns have both singular and plural forms, and cannot be used with determiners such as much, little, a little, less and overmuch. Uncountable nouns, such as furniture, have no plural form, and can be used with much. Plural only nouns never head a singular noun phrase: goods, scissors.

Nouns that are readily converted are marked as either strongly countable (for countable nouns that can be converted to uncountable, such as cake) or weakly countable (for uncountable nouns that are readily convertible to countable, such as beer).

NLP systems must list countability for at least some nouns, because full knowledge of the referent of a noun phrase is not enough to predict countability. There is also a language-specific knowledge requirement. This can be shown most simply by comparing languages: different languages encode the countability of the same referent in different ways. There is nothing about the concept denoted by lightning, e.g., that rules out *a lightning being interpreted as a flash of lightning. Indeed, the German and French translation equivalents are fully countable (ein Blitz and un éclair respectively). Even within the same language, the same referent can be encoded countably or uncountably: clothes/clothing, things/stuff, jobs/work.

Therefore, we must learn countability classes from usage examples in corpora. There are several impediments to this approach. The first is that words are frequently converted to different countabilities, sometimes in such a way that other native speakers will dispute the validity of the new usage. We do not necessarily wish to learn such rare examples, and may not need to learn more common conversions either, as these can be handled by regular lexical rules (Copestake and Briscoe, 1995). The second problem is that some constructions affect the apparent countability of their head: for example, nouns denoting a role, which are typically countable, can appear without an article in some constructions (e.g. We elected him treasurer). The third is that different senses of a word may have different countabilities: interest "a sense of concern with and curiosity" is normally countable, whereas interest "fixed charge for borrowing money" is uncountable.

There have been several earlier approaches to the automatic determination of countability. Bond and Vatikiotis-Bateson (2002) determine a noun's countability preferences from its semantic class, and show that semantics predicts (5-way) countability 78% of the time with their ontology. O'Hara et al. (2003) get better results (89.5%) using the much larger Cyc ontology, although they only distinguish between countable and uncountable. Schwartz (2002) created an automatic countability tagger (ACT) to learn noun countabilities from the British National Corpus. ACT looks at determiner co-occurrence in singular noun chunks, and classifies the noun if and only if it occurs with a determiner which can modify only countable or uncountable nouns. The method has a coverage of around 50%, and agrees with COMLEX for 68% of the nouns marked countable and with the ALT-J/E lexicon for 88%. Agreement was worse for uncountable nouns (6% and 44% respectively).
3 Resources

Information about noun countability was obtained from two sources. One was COMLEX 3.0 (Grishman et al., 1998), which has around 22,000 noun entries. Of these, 12,922 are marked as being countable (COUNTABLE) and 4,976 as being uncountable (NCOLLECTIVE or :PLURAL *NONE*). The remainder are unmarked for countability.

The other was the common noun part of ALT-J/E's Japanese-to-English semantic transfer dictionary (Bond, 2001). It contains 71,833 linked Japanese-English pairs, each of which has a value for the noun countability preference of the English noun. Considering only unique English entries with different countability and ignoring all other information gave 56,245 entries. Nouns in the ALT-J/E dictionary are marked with one of the five major countability preference classes described in Section 2. In addition to countability, default values for number and classifier (e.g. blade for grass: blade of grass) are also part of the lexicon.

We classify words into four possible classes, with some words belonging to multiple classes. The first class is countable: COMLEX's COUNTABLE and ALT-J/E's fully, strongly and weakly countable. The second class is uncountable: COMLEX's NCOLLECTIVE or :PLURAL *NONE* and ALT-J/E's strongly and weakly countable and uncountable.

The third class is bipartite nouns. These can only be plural when they head a noun phrase (trousers), but singular when used as a modifier (trouser leg). When they are denumerated they use pair: a pair of scissors. COMLEX does not have a feature to mark bipartite nouns; trouser, for example, is listed as countable. Nouns in ALT-J/E marked plural only with a default classifier of pair are classified as bipartite.

The last class is plural only nouns: those that only have a plural form, such as goods. They can neither be denumerated nor modified by much. Many of these nouns, such as clothes, use the plural form even as modifiers (a clothes horse). The word clothes cannot be denumerated at all. Nouns marked :SINGULAR *NONE* in COMLEX and nouns in ALT-J/E marked plural only without the default classifier pair are classified as plural only. There was some noise in the ALT-J/E data, so this class was hand-checked, giving a total of 104 entries; 84 of these were attested in the training data.

Our classification of countability is a subset of ALT-J/E's, in that we use only the three basic ALT-J/E classes of countable, uncountable and plural only (although we treat bipartite as a separate class, not a subclass). As we derive our countability classifications from corpus evidence, it is possible to reconstruct countability preferences (i.e. fully, strongly, or weakly countable) from the relative token occurrence of the different countabilities for that noun.
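To make the class mapping concrete, the following is a minimal sketch, not the authors' code, of how simplified COMLEX and ALT-J/E annotations could be mapped onto the four classes; the record format and function name are hypothetical, and it glosses over the intersection-based construction of training exemplars described in Section 4.3.

```python
# Illustrative sketch (hypothetical record format): mapping simplified
# COMLEX and ALT-J/E annotations onto the paper's four countability classes.

def countability_classes(comlex_marks, altje_ncp, altje_classifier=None):
    """comlex_marks: set of COMLEX features, e.g. {"COUNTABLE"},
    {"NCOLLECTIVE"}, {":PLURAL *NONE*"} or {":SINGULAR *NONE*"}.
    altje_ncp: ALT-J/E noun countability preference, one of "fully countable",
    "strongly countable", "weakly countable", "uncountable", "plural only".
    altje_classifier: ALT-J/E default classifier, e.g. "pair"."""
    classes = set()
    # Countable: COMLEX COUNTABLE, or ALT-J/E fully/strongly/weakly countable
    if "COUNTABLE" in comlex_marks or altje_ncp in (
            "fully countable", "strongly countable", "weakly countable"):
        classes.add("countable")
    # Uncountable: COMLEX NCOLLECTIVE or :PLURAL *NONE*, or ALT-J/E
    # strongly/weakly countable or uncountable
    if comlex_marks & {"NCOLLECTIVE", ":PLURAL *NONE*"} or altje_ncp in (
            "strongly countable", "weakly countable", "uncountable"):
        classes.add("uncountable")
    # Bipartite: ALT-J/E plural only with the default classifier "pair"
    if altje_ncp == "plural only" and altje_classifier == "pair":
        classes.add("bipartite")
    # Plural only: ALT-J/E plural only without "pair", or COMLEX :SINGULAR *NONE*
    elif altje_ncp == "plural only" or ":SINGULAR *NONE*" in comlex_marks:
        classes.add("plural only")
    return classes

print(countability_classes({"COUNTABLE"}, "weakly countable"))
# {'countable', 'uncountable'}  -- e.g. beer
```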
In order to get an idea of the intrinsic difficulty of the countability learning task, we tested the agreement between the two resources in the form of classification accuracy. That is, we calculate the average proportion of (both positive and negative) countability classifications over which the two methods agree. E.g., COMLEX lists tomato as being only countable where ALT-J/E lists it as being both countable and uncountable. Agreement for this one noun, therefore, is 3/4, as there is agreement for the classes of countable, plural only and bipartite (with implicit agreement as to negative membership for the latter two classes), but not for uncountable. Averaging over the total set of nouns countability-classified in both lexicons, the mean was 93.8%. Almost half of the disagreements came from words with two countabilities in ALT-J/E but only one in COMLEX.

4 Learning Countability

The basic methodology employed in this research is to identify lexical and/or constructional features associated with the countability classes, and determine the relative corpus occurrence of those features for each noun. We then feed the noun feature vectors into a classifier and make a judgement on the membership of the given noun in each countability class.

In order to extract the feature values from corpus data, we need the basic phrase structure, and particularly noun phrase structure, of the source text. We use three different sources for this phrase structure: part-of-speech tagged data, chunked data and fully-parsed data, as detailed below.

The corpus of choice throughout this paper is the written component of the British National Corpus (BNC version 2, Burnard (2000)), totalling around 90m w-units (POS-tagged items). We chose this because of its good coverage of different usages of English, and thus of different countabilities. The only component of the original annotation we make use of is the sentence tokenisation.

Below, we outline the features used in this research and methods of describing feature interaction, along with the pre-processing tools and extraction techniques, and the classifier architecture. The full range of different classifier architectures tested as part of this research, and the experiments to choose between them, are described in Baldwin and Bond (2003).

4.1 Feature space

For each target noun, we compute a fixed-length feature vector based on a variety of features intended to capture linguistic constraints and/or preferences associated with particular countability classes. The feature space is partitioned up into feature clusters, each of which is conditioned on the occurrence of the target noun in a given construction. Feature clusters take the form of one- or two-dimensional feature matrices, with each dimension describing a lexical or syntactic property of the construction in question.
Feature cluster (base feature no.) | Countable           | Uncountable    | Bipartite       | Plural only
Head number (2)                    | S,P                 | S              | P               | P
Modifier number (2)                | S,P                 | S              | S               | P
Subj-V agreement (2 x 2)           | [S,S],[P,P]         | [S,S]          | [P,P]           | [P,P]
Coordinate number (2 x 2)          | [S,S],[P,S],[P,P]   | [S,S],[S,P]    | [P,S],[P,P]     | [P,S],[P,P]
N of N (11 x 2)                    | [100s,P], ...       | [lack,S], ...  | [pair,P], ...   | [rate,P], ...
PPs (52 x 2)                       | [per,-DET], ...     | [in,-DET], ... | —               | —
Pronoun (12 x 2)                   | [it,S],[they,P], ...| [it,S], ...    | [they,P], ...   | [they,P], ...
Singular determiners (10)          | a, each, ...        | much, ...      | —               | —
Plural determiners (12)            | many, few, ...      | —              | —               | many, ...
Non-bounded determiners (11 x 2)   | [less,P], ...       | [BARE,S], ...  | [enough,P], ... | [all,P], ...

Table 1: Predicted feature-correlations for each feature cluster (S=singular, P=plural)

In the case of a one-dimensional feature cluster (e.g. the noun occurring in singular or plural form), each component feature feat_s in the cluster is translated into the 3-tuple:

\[
\left\langle freq(feat_s|word),\ \frac{freq(feat_s|word)}{freq(word)},\ \frac{freq(feat_s|word)}{\sum_i freq(feat_i|word)} \right\rangle
\]

In the case of a two-dimensional feature cluster (e.g. subject-position noun number vs. verb number agreement), each component feature feat_{s,t} is translated into the 5-tuple:

\[
\left\langle freq(feat_{s,t}|word),\ \frac{freq(feat_{s,t}|word)}{freq(word)},\ \frac{freq(feat_{s,t}|word)}{\sum_{i,j} freq(feat_{i,j}|word)},\ \frac{freq(feat_{s,t}|word)}{\sum_i freq(feat_{i,t}|word)},\ \frac{freq(feat_{s,t}|word)}{\sum_j freq(feat_{s,j}|word)} \right\rangle
\]

See Baldwin and Bond (2003) for further details.

The following is a brief description of each feature cluster and its dimensionality (1D or 2D). A summary of the number of base features and the predicted positive feature correlations with countability classes is presented in Table 1.

Head noun number (1D): the number of the target noun when it heads an NP (e.g. a shaggy dog = SINGULAR).

Modifier noun number (1D): the number of the target noun when a modifier in an NP (e.g. dog food = SINGULAR).

Subject-verb agreement (2D): the number of the target noun in subject position vs. number agreement on the governing verb (e.g. the dog barks = <SINGULAR,SINGULAR>).

Coordinate noun number (2D): the number of the target noun vs. the number of the head nouns of conjuncts (e.g. dogs and mud = <PLURAL,SINGULAR>).

N of N constructions (2D): the number of the target noun vs. the type of the N it occurs with in an N of N construction (e.g. the type of dog = <TYPE,SINGULAR>). We have identified a total of 11 N types for use in this feature cluster (e.g. COLLECTIVE, LACK, TEMPORAL).

Occurrence in PPs (2D): the presence or absence of a determiner (DET) when the target noun occurs in singular form in a PP (e.g. per dog = <per,-DET>). This feature cluster exploits the fact that countable nouns occur determinerless in singular form with only very particular prepositions (e.g. by bus, *on bus, *with bus), whereas with uncountable nouns there are fewer restrictions on what prepositions a target noun can occur with (e.g. on furniture, with furniture, ?by furniture).

Pronoun co-occurrence (2D): what personal and possessive pronouns occur in the same sentence as singular and plural instances of the target noun (e.g. The dog ate its dinner = <its,SINGULAR>). This is a proxy for pronoun binding effects, and is determined over a total of 12 third-person pronoun forms (normalised for case, e.g. their, itself).

Singular determiners (1D): what singular-selecting determiners occur in NPs headed by the target noun in singular form (e.g. a dog = a). All singular-selecting determiners considered are compatible with only countable (e.g. another, each) or only uncountable nouns (e.g. much, little). Determiners compatible with either are excluded from the feature cluster (cf. this dog, this information). Note that the term "determiner" is used loosely here and below to denote an amalgam of simplex determiners (e.g. a), the null determiner, complex determiners (e.g. all the), numeric expressions (e.g. one), and adjectives (e.g. numerous), as relevant to the particular feature cluster.

Plural determiners (1D): what plural-selecting determiners occur in NPs headed by the target noun in plural form (e.g. few dogs = few). As with singular determiners, we focus on those plural-selecting determiners which are compatible with a proper subset of countable, plural only and bipartite nouns.
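To make the encoding concrete, here is a minimal sketch, not the authors' code, of how the 3-tuple and 5-tuple values above could be computed from raw corpus counts; the count-dictionary layout is an assumption chosen only to make the arithmetic explicit.

```python
# Illustrative sketch of the feature-value encoding described above.
# `counts` maps feature keys to corpus frequencies for a single noun.

def encode_1d(counts, word_freq):
    """counts: {s: freq(feat_s|word)} for a one-dimensional cluster."""
    total = sum(counts.values())
    vec = []
    for s, f in sorted(counts.items()):
        vec += [f,                            # raw frequency
                f / word_freq,                # relative to all tokens of the noun
                f / total if total else 0.0]  # relative within the cluster
    return vec

def encode_2d(counts, word_freq):
    """counts: {(s, t): freq(feat_{s,t}|word)} for a two-dimensional cluster."""
    total = sum(counts.values())
    row, col = {}, {}
    for (s, t), f in counts.items():
        row[s] = row.get(s, 0) + f   # sum_j freq(feat_{s,j}|word)
        col[t] = col.get(t, 0) + f   # sum_i freq(feat_{i,t}|word)
    vec = []
    for (s, t), f in sorted(counts.items()):
        vec += [f,
                f / word_freq,
                f / total if total else 0.0,
                f / col[t],   # normalised within column t
                f / row[s]]   # normalised within row s
    return vec

# e.g. head-number cluster for "dog": 800 singular heads, 400 plural heads,
# out of 1,500 occurrences of the noun overall
print(encode_1d({"S": 800, "P": 400}, 1500))
```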
Non-bounded determiners (2D): what non-bounded determiners occur in NPs headed by the target noun, and what is the number of the target noun for each (e.g. more dogs = <more,PLURAL>). Here again, we restrict our focus to non-bounded determiners that select for singular-form uncountable nouns (e.g. sufficient furniture) and plural-form countable, plural only and bipartite nouns (e.g. sufficient dogs).

The above feature clusters produce a combined total of 1,284 individual feature values.

4.2 Feature extraction

In order to extract the features described above, we need some mechanism for detecting NP and PP boundaries, determining subject-verb agreement and deconstructing NPs in order to recover conjuncts and noun-modifier data. We adopt three approaches. First, we use part-of-speech (POS) tagged data and POS-based templates to extract out the necessary information. Second, we use chunk data to determine NP and PP boundaries, and medium-recall chunk adjacency templates to recover inter-phrasal dependency. Third, we fully parse the data and simply read off all necessary data from the dependency output.

With the POS extraction method, we first Penn-tagged the BNC using an fnTBL-based tagger (Ngai and Florian, 2001), training over the Brown and WSJ corpora with some spelling, number and hyphenation normalisation. We then lemmatised this data using a version of morph (Minnen et al., 2001) customised to the Penn POS tagset. Finally, we implemented a range of high-precision, low-recall POS-based templates to extract out the features from the processed data. For example, NPs are in many cases recoverable with the following Perl-style regular expression over Penn POS tags: (PDT)* DT (RB|JJ[RS]?|NNS?)* NNS? [^N].
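As an illustration of the template approach, the following sketch (ours, not the authors') applies the NP template above to a POS tag sequence; matching over a space-joined tag string, and the lookahead adaptation of the trailing [^N], are our assumptions about how such templates would run.

```python
import re

# Illustrative sketch: applying the paper's Perl-style NP template to a
# space-separated Penn POS tag sequence. The original trailing [^N] is
# rendered here as a negative lookahead so the matched NP span stays clean.
NP_TEMPLATE = re.compile(r"((?:PDT )*DT (?:(?:RB|JJ[RS]?|NNS?) )*NNS?) (?!N)")

def find_np_tag_spans(tags):
    """Return the tag subsequences matching the NP template."""
    tag_string = " ".join(tags) + " ."   # sentinel so a sentence-final NP matches
    return [m.group(1) for m in NP_TEMPLATE.finditer(tag_string)]

# "a shaggy dog barks" -> DT JJ NN VBZ
print(find_np_tag_spans(["DT", "JJ", "NN", "VBZ"]))  # ['DT JJ NN']
```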
For the chunker, we ran fnTBL over the lemmatised tagged data, training over CoNLL 2000-style (Tjong Kim Sang and Buchholz, 2000) chunk-converted versions of the full Brown and WSJ corpora. For the NP-internal features (e.g. determiners, head number), we used the noun chunks directly, or applied POS-based templates locally within noun chunks. For inter-chunk features (e.g. subject-verb agreement), we looked at only adjacent chunk pairs so as to maintain a high level of precision.

As the full parser, we used RASP (Briscoe and Carroll, 2002), a robust tag sequence grammar-based parser. RASP's grammatical relation output provides the phrase structure in the form of lemmatised dependency tuples, from which it is possible to read off the feature information. RASP has the advantage that recall is high, although precision is potentially lower than chunking or tagging, as the parser is forced into resolving phrase attachment ambiguities and committing to a single phrase structure analysis.

Although all three systems map onto an identical feature space, the feature vectors generated for a given target noun diverge in content due to the different feature extraction methodologies. In addition, we only consider nouns that occur at least 10 times as head of an NP, causing slight disparities in the target noun type space for the three systems. There were sufficient instances found by all three systems for 20,530 common nouns (out of 33,050 for which at least one system found sufficient instances).

4.3 Classifier architecture

The classifier design employed in this research is four parallel supervised classifiers, one for each countability class. This allows us to classify a single noun into multiple countability classes, e.g. demand is both countable and uncountable. Thus, rather than classifying a given target noun according to the unique most plausible countability class, we attempt to capture its full range of countabilities. Note that the proposed classifier design is that which was found by Baldwin and Bond (2003) to be optimal for the task, out of a wide range of classifier architectures.

In order to discourage the classifiers from over-training on negative evidence, we constructed the gold-standard training data from unambiguously negative exemplars and potentially ambiguous positive exemplars. That is, we would like classifiers to judge a target noun as not belonging to a given countability class only in the absence of positive evidence for that class. This was achieved in the case of countable nouns, for instance, by extracting all countable nouns from each of the ALT-J/E and COMLEX lexicons. As positive training exemplars, we then took the intersection of those nouns listed as countable in both lexicons (irrespective of membership in alternate countability classes); negative training exemplars, on the other hand, were those contained in both lexicons but not classified as countable in either.[1] The uncountable gold-standard data was constructed in a similar fashion. We used the ALT-J/E lexicon as our source of plural only and bipartite nouns, using all the instances listed as our positive exemplars. The set of negative exemplars was constructed in each case by taking the intersection of nouns not contained in the given countability class in ALT-J/E, with all annotated nouns with non-identical singular and plural forms in COMLEX.

Having extracted the positive and negative exemplar noun lists for each countability class, we filtered out all noun lemmata not occurring in the BNC. The final make-up of the gold-standard data for each of the countability classes is listed in Table 2, along with a baseline classification accuracy for each class ("Baseline"), based on the relative frequency of the majority class (positive or negative). That is, for bipartite nouns, we achieve a 99.4% classification accuracy by arbitrarily classifying every training instance as negative.

Class       | Positive data | Negative data | Baseline
Countable   | 4,342         | 1,476         | .746
Uncountable | 1,519         | 5,471         | .783
Bipartite   | 35            | 5,639         | .994
Plural only | 84            | 5,639         | .985

Table 2: Details of the gold-standard data

The supervised classifiers were built using TiMBL version 4.2 (Daelemans et al., 2002), a memory-based classification system based on the k-nearest neighbour algorithm. As a result of extensive parameter optimisation, we settled on the default configuration for TiMBL with k set to 9.[2]

[1] Any nouns not annotated for countability in COMLEX were ignored in this process so as to assure genuinely negative exemplars.
[2] We additionally experimented with the kernel-based TinySVM system, but found TiMBL to be superior in all cases.
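As a concrete illustration of this architecture, here is a minimal sketch, not the authors' TiMBL setup, of four parallel binary k-NN classifiers; it uses scikit-learn's KNeighborsClassifier with k=9 as a stand-in for TiMBL, and the variable names and data layout are assumptions.

```python
# Illustrative sketch: four parallel binary k-NN classifiers, one per
# countability class, standing in for the TiMBL (k=9) setup in the paper.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

CLASSES = ["countable", "uncountable", "bipartite", "plural only"]

def train_classifiers(gold):
    """gold: {class_name: (X, y)} with X an (n, d) feature matrix and
    y binary labels (1 = positive exemplar). Layout is hypothetical."""
    models = {}
    for name in CLASSES:
        X, y = gold[name]
        models[name] = KNeighborsClassifier(n_neighbors=9).fit(X, y)
    return models

def classify_noun(models, vec):
    """Return every countability class whose classifier fires positively,
    so a noun like "demand" can be both countable and uncountable."""
    vec = np.asarray(vec).reshape(1, -1)
    return {name for name, m in models.items() if m.predict(vec)[0] == 1}
```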
5 Results and Evaluation

Evaluation is broken down into two components. First, we determine the optimal classifier configuration for each countability class by way of stratified cross-validation over the gold-standard data. We then run each classifier in optimised configuration over the remaining target nouns for which we have feature vectors.

5.1 Cross-validated results

First, we ran the classifiers over the full feature set for the three feature extraction methods. In each case, we quantify the classifier performance by way of 10-fold stratified cross-validation over the gold-standard data for each countability class. The final classification accuracy and F-score[3] are averaged over the 10 iterations.

The cross-validated results for each classifier are presented in Table 3, broken down into the different feature extraction methods. For each, in addition to the F-score and classification accuracy, we present the relative error reduction (e.r.) in classification accuracy over the majority-class baseline for that gold-standard set (see Table 2). For each countability class, we additionally ran the classifier over the concatenated feature vectors for the three basic feature extraction methods, producing a 3,852-value feature space ("Combined").

Class       | System   | Accuracy (e.r.) | F-score
Countable   | Tagger*  | .928 (.715)     | .953
            | Chunker  | .933 (.734)     | .956
            | RASP*    | .923 (.698)     | .950
            | Combined | .939 (.759)     | .960
Uncountable | Tagger*  | .945 (.746)     | .876
            | Chunker  | .945 (.747)     | .876
            | RASP*    | .944 (.743)     | .872
            | Combined | .952 (.779)     | .892
Bipartite   | Tagger   | .997 (.489)     | .752
            | Chunker  | .997 (.460)     | .704
            | RASP     | .997 (.488)     | .700
            | Combined | .996 (.403)     | .722
Plural only | Tagger   | .989 (.275)     | .558
            | Chunker  | .990 (.299)     | .568
            | RASP*    | .989 (.227)     | .415
            | Combined | .990 (.323)     | .582

Table 3: Cross-validation results

Given the high baseline classification accuracies for each gold-standard dataset, the most revealing statistics in Table 3 are the error reduction and F-score values. In all cases other than bipartite, the combined system outperformed the individual systems. The difference in F-score is statistically significant (based on the two-tailed t-test, p < .05) for the asterisked systems in Table 3. For the bipartite class, the difference in F-score is not statistically significant between any system pairing.

There is surprisingly little separating the tagger-, chunker- and RASP-based feature extraction methods. This is largely due to the precision/recall trade-off noted above for the different systems.

[3] Calculated according to: \( \text{F-score} = \frac{2 \cdot precision \cdot recall}{precision + recall} \)
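The following sketch (our illustration, assuming scikit-learn and the data layout from the earlier snippets) shows the evaluation loop: 10-fold stratified cross-validation with per-fold accuracy and F-score averaged over the iterations, plus the relative error reduction over the majority-class baseline.

```python
# Illustrative sketch: 10-fold stratified cross-validation for one
# countability class, reporting mean accuracy, relative error reduction
# over the majority-class baseline, and mean F-score.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score

def cross_validate(X, y, n_folds=10, k=9):
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    accs, fscores = [], []
    for train_idx, test_idx in skf.split(X, y):
        model = KNeighborsClassifier(n_neighbors=k)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        accs.append(accuracy_score(y[test_idx], pred))
        fscores.append(f1_score(y[test_idx], pred))
    # Majority-class baseline: always predict the more frequent label
    baseline = max(np.mean(y), 1 - np.mean(y))
    acc = float(np.mean(accs))
    error_reduction = (acc - baseline) / (1 - baseline)
    return acc, error_reduction, float(np.mean(fscores))
```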

5.2 Open data results

We next turn to the task of classifying all unseen common nouns using the gold-standard data and the best-performing classifier configurations for each countability class (indicated in bold in Table 3).[4] Here, the baseline method is to classify every noun as being uniquely countable.

There were 11,499 feature-mapped common nouns not contained in the union of the gold-standard datasets. Of these, the classifiers were able to classify 10,355 (90.0%): 7,974 (77.0%) as countable (e.g. alchemist), 2,588 (25.0%) as uncountable (e.g. ingenuity), 9 (0.1%) as bipartite (e.g. headphones), and 80 (0.8%) as plural only (e.g. damages). Only 139 nouns were assigned to multiple countability classes.

We evaluated the classifier outputs in two ways. In the first, we compared the classifier output to the combined COMLEX and ALT-J/E lexicons: a lexicon with countability information for 63,581 nouns. The classifiers found a match for 4,982 of the nouns. The predicted countability was judged correct 94.6% of the time. This is marginally above the level of match between ALT-J/E and COMLEX (93.8%) and substantially above the baseline of all-countable at 89.7% (error reduction = 47.6%).

To gain a better understanding of the classifier performance, we analysed the correlation between corpus frequency of a given target noun and its precision/recall for the countable class.[5] To do this, we listed the 11,499 unannotated nouns in increasing order of corpus occurrence, and worked through the ranking calculating the mean precision and recall over each partition of 500 nouns. This resulted in the precision-recall graph given in Figure 1, from which it is evident that mean recall is proportional and precision inversely proportional to corpus frequency.

[4] In each case, the classifier is run over the best-500 features as selected by the method described in Baldwin and Bond (2003) rather than the full feature set, purely in the interests of reducing processing time. Based on cross-validated results over the training data, the resultant difference in performance is not statistically significant.
[5] We similarly analysed the uncountable class and found the same basic trend.
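A minimal sketch (ours) of this binned analysis follows, assuming per-noun reference labels and predictions for the countable class are available; the dictionary-based data layout is an assumption.

```python
# Illustrative sketch: mean precision and recall over successive bins of
# 500 nouns, ordered by increasing corpus frequency (cf. Figure 1).
def binned_precision_recall(nouns, freq, gold, pred, bin_size=500):
    """nouns: list of nouns; freq/gold/pred: dicts keyed by noun, where
    gold[n] and pred[n] are True if the noun is (predicted) countable."""
    ranked = sorted(nouns, key=lambda n: freq[n])
    points = []
    for start in range(0, len(ranked), bin_size):
        bin_ = ranked[start:start + bin_size]
        tp = sum(1 for n in bin_ if pred[n] and gold[n])
        fp = sum(1 for n in bin_ if pred[n] and not gold[n])
        fn = sum(1 for n in bin_ if gold[n] and not pred[n])
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        mean_freq = sum(freq[n] for n in bin_) / len(bin_)
        points.append((mean_freq, precision, recall))
    return points
```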

[Figure 1: Precision-recall curve for countable nouns. Mean precision and mean recall are plotted against mean corpus frequency on a log scale (roughly 10 to 10,000).]

That is, for lower-frequency nouns, the classifier tends to rampantly classify nouns as countable, while for higher-frequency nouns, the classifier tends to be extremely conservative in positively classifying nouns. One possible explanation for this is that, based on the training data, the frequency of a noun is proportional to the number of countability classes it belongs to. Thus, for the more frequent nouns, evidence for alternate countability classes can cloud the judgement of a given classifier.

In a secondary evaluation, the authors used BNC corpus evidence to blind-annotate 100 randomly-selected nouns from the test data, and tested the correlation with the system output. This is intended to test the ability of the system to capture corpus-attested usages of nouns, rather than the independent lexicographic intuitions described in the COMLEX and ALT-J/E lexicons. Of the 100, 28 were classified by the annotators into two or more groups (mainly countable and uncountable). On this set, the baseline of all-countable was 87.8%, and the classifiers gave an agreement of 92.4% (37.7% e.r.); agreement with the dictionaries was also 92.4%. Again, the main source of errors was the classifier only returning a single countability for each noun. To put this figure in proper perspective, we also hand-annotated 100 randomly-selected nouns from the training data (that is, words in our combined lexicon) according to BNC corpus evidence. Here, we tested the correlation between the manual judgements and the combined ALT-J/E and COMLEX dictionaries. For this dataset, the baseline of all-countable was 80.5%, and agreement with the dictionaries was a modest 86.8% (32.3% e.r.). Based on this limited evaluation, therefore, our automated method is able to capture corpus-attested countabilities with greater precision than a manually-generated static repository of countability data.

6 Discussion

The above results demonstrate the utility of the proposed method in learning noun countability from corpus data.
In the final system configuration, the system accuracy was 94.6%, comparing favourably with the 78% accuracy reported by Bond and Vatikiotis-Bateson (2002), the 89.5% of O'Hara et al. (2003), and also the noun token-based results of Schwartz (2002).

At the moment we are merely classifying nouns into the four classes. The next step is to store the distribution of countability for each target noun and build a representation of each noun's countability preferences. We have made initial steps in this direction, by isolating token instances strongly supporting a given countability class analysis for that target noun. We plan to estimate the overall frequency of the different countabilities based on this evidence. This would represent a continuous equivalent of the discrete 5-way scale employed in ALT-J/E, tunable to different corpora/domains.

For future work we intend to: investigate further the relation between meaning and countability, and the possibility of using countability information to prune the search space in word sense disambiguation; describe and extract countability-idiosyncratic constructions, such as determinerless PPs and role-nouns; investigate the use of a grammar that distinguishes between countable and uncountable uses of nouns; and, in combination with such a grammar, investigate the effect of lexical rules on countability.

7 Conclusion

We have proposed a knowledge-rich lexical acquisition technique for multi-classifying a given noun according to four countability classes. The technique operates over a range of feature clusters drawing on pre-processed corpus data, which are then fed into independent classifiers for each of the countability classes. The classifiers were able to selectively classify the countability preference of English nouns with a precision of 94.6%.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. BCS-0094638 and also the Research Collaboration between NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation and CSLI, Stanford University. We would like to thank Leonoor van der Beek, Ann Copestake, Ivan Sag and the three anonymous reviewers for their valuable input on this research.

References
Keith Allan. 1980. Nouns and countability. Language, 56(3):541–67.

Timothy Baldwin and Francis Bond. 2003. A plethora of methods for learning English countability. In Proc. of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP 2003), Sapporo, Japan. (to appear).

Francis Bond. 2001. Determiners and Number in English, contrasted with Japanese, as exemplified in Machine Translation. Ph.D. thesis, University of Queensland, Brisbane, Australia.

Francis Bond, Kentaro Ogura, and Satoru Ikehara. 1994. Countability and number in Japanese-to-English machine translation. In Proc. of the 15th International Conference on Computational Linguistics (COLING '94), pages 32–8, Kyoto, Japan.

Francis Bond and Caitlin Vatikiotis-Bateson. 2002. Using an ontology to determine English countability. In Proc. of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan.

Ted Briscoe and John Carroll. 2002. Robust accurate statistical annotation of general text. In Proc. of the 3rd International Conference on Language Resources and Evaluation (LREC 2002), pages 1499–1504, Las Palmas, Canary Islands.

Lou Burnard. 2000. User Reference Guide for the British National Corpus. Technical report, Oxford University Computing Services.

Ann Copestake and Ted Briscoe. 1995. Semi-productive polysemy and sense extension. Journal of Semantics, pages 15–67.

Walter Daelemans, Jakub Zavrel, Ko van der Sloot, and Antal van den Bosch. 2002. TiMBL: Tilburg memory based learner, version 4.2, reference guide. ILK technical report 02-01.

Ralph Grishman, Catherine Macleod, and Adam Myers. 1998. COMLEX Reference Manual. Proteus Project, NYU. (http://nlp.cs.nyu.edu/comlex/refman.ps).

Satoru Ikehara, Satoshi Shirai, Akio Yokoo, and Hiromi Nakaiwa. 1991. Toward an MT system without pre-editing: effects of new methods in ALT-J/E. In Proc. of the Third Machine Translation Summit (MT Summit III), pages 101–106, Washington DC.

Marc Light. 1996. Morphological cues for lexical semantics. In Proc. of the 34th Annual Meeting of the ACL, pages 25–31, Santa Cruz, USA.

Guido Minnen, John Carroll, and Darren Pearce. 2001. Applied morphological processing of English. Natural Language Engineering, 7(3):207–23.

Grace Ngai and Radu Florian. 2001. Transformation-based learning in the fast lane. In Proc. of the 2nd Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2001), pages 40–7, Pittsburgh, USA.

Tom O'Hara, Nancy Salay, Michael Witbrock, Dave Schneider, Bjoern Aldag, Stefano Bertolo, Kathy Panton, Fritz Lehmann, Matt Smith, David Baxter, Jon Curtis, and Peter Wagner. 2003. Inducing criteria for mass noun lexical mappings using the Cyc KB and its extension to WordNet. In Proc. of the Fifth International Workshop on Computational Semantics (IWCS-5), Tilburg, the Netherlands.

Lane O.B. Schwartz. 2002. Corpus-based acquisition of head noun countability features. Master's thesis, Cambridge University, Cambridge, UK.

Eric V. Siegel and Kathleen McKeown. 2000. Learning methods to combine linguistic indicators: Improving aspectual classification and revealing linguistic insights. Computational Linguistics, 26(4):595–627.

Erik F. Tjong Kim Sang and Sabine Buchholz. 2000. Introduction to the CoNLL-2000 shared task: Chunking. In Proc. of the 4th Conference on Computational Natural Language Learning (CoNLL-2000), Lisbon, Portugal.

Anna Wierzbicka. 1988. The Semantics of Grammar. John Benjamins.