Implementing the Syntax of Japanese Numeral Classifiers

Emily M. Bender
Department of Linguistics, University of Washington
Box 354340, Seattle WA 98195-4340
[email protected]

Melanie Siegel
Saarland University, Computational Linguistics
PF 15 11 50, D-66041 Saarbrücken
[email protected]

Abstract

While the sortal constraints associated with Japanese numeral classifiers are well-studied, less attention has been paid to the details of their syntax. We describe an analysis implemented within a broad-coverage HPSG that handles an intricate set of numeral classifier construction types and compositionally relates each to an appropriate semantic representation, using Minimal Recursion Semantics.

1 Introduction

Much attention has been paid to the semantic aspects of Japanese numeral classifiers, in particular, the semantic constraints governing which classifiers co-occur with which nouns (Matsumoto, 1993; Bond and Paik, 2000). Here, we focus on a more neglected aspect of this linguistic phenomenon, namely the syntax of numeral classifiers: how they combine with number names to create numeral classifier phrases, how they modify head nouns, and how they can occur as stand-alone NPs. We find that there is both broad similarity and differences in detail across different types of numeral classifiers in their syntactic and semantic behavior. We present semantic representations for two types of numeral classifiers, and describe how they can be constructed compositionally in an implemented broad-coverage HPSG (Pollard and Sag, 1994) for Japanese.

The grammar of Japanese in question is JACY,[1] originally developed as part of the Verbmobil project (Siegel, 2000) to handle spoken Japanese, and then extended to handle informal written Japanese (email text; Siegel and Bender, 2002) and newspaper text. Recently, it has been adapted to be consistent with the LinGO Grammar Matrix (Bender et al., 2002).

[1] http://www.dfki.uni-sb.de/~siegel/grammar-download/JACY-grammar.html

2 Types of numeral classifiers

Paik and Bond (2002) divide Japanese numeral classifiers into five major classes: sortal, event, mensural, group and taxonomic, and several subclasses. The classes and subclasses can be differentiated according to the semantic relationship between the classifiers and the nouns they modify, on two levels: first, what properties of the modified noun motivate the choice of the classifier, and second, what properties the classifiers predicate of the nouns. As we are concerned here with the syntax and compositional semantics of numeral classifiers, we will focus only on the latter.

Sortal classifiers (kind, shape, and complement classifiers) serve to individuate the nouns they modify. Event classifiers quantify events, characteristically modifying verbs rather than nouns. Mensural classifiers measure some property of the entity denoted by the noun they modify (e.g., its length). NPs containing group classifiers denote a group or set of individuals belonging to the type denoted by the noun. Finally, taxonomic classifiers force a kind or species reading on an NP.

In this paper, we will treat the syntax and compositional semantics of sortal and mensural classifiers. However, we believe that our general analysis can be extended to treat the full range of classifiers in Japanese and similar languages.

3 Data: Constructions

Internally, Japanese numeral classifier expressions consist of a number name followed by a numeral classifier (1a,b,c). In this, they resemble date expressions (1d).[2]

(1) a. juu mai
       10  NumCl
    b. juu en
       10  yen
    c. juu kagetsu
       10  month
       ‘10 months’
    d. juu gatsu
       10  month
       ‘October’

[2] Note that many of the time units are ambiguous with date expressions, although some, like the one for months shown in (1), are distinguished.

In fact, both numeral classifiers and date expressions are tagged as numeral classifiers by the morphological analyzer ChaSen (Asahara and Matsumoto, 2000). However, date expressions do not have the same combinatoric potential (syntactic or semantic) as numeral classifiers. We thus give date expressions a distinct analysis, which we will not describe here.

Externally, numeral classifier phrases (NumClPs) appear in at least four different contexts: alone, as anaphoric NPs (2a); preceding a head noun, linked by the particle no (2b); immediately following a head noun (2c); and ‘floated’, right after the associated noun’s case particle or right before the verb (2d). These constructions are distinguished pragmatically (Downing, 1996).[3]

(2) a. ni hiki  wo  kau
       2  NumCl ACC raise
       ‘(I) am raising two (small animals).’
    b. ni hiki  no  neko wo  kau
       2  NumCl GEN cat  ACC raise
       ‘(I) am raising two cats.’
    c. neko ni hiki  wo  kau
       cat  2  NumCl ACC raise
       ‘(I) am raising two cats.’
    d. neko wo  (ni hiki)  ie    de  (ni hiki)  kau
       cat  ACC (2  NumCl) house LOC (2  NumCl) raise
       ‘(I) am raising two cats in my house.’

[3] Downing also notes NumClPs following the head noun with an intervening no. As this rare construction did not appear in our data, we have not incorporated it into our account.

NumClPs can be modified by elements such as yaku ‘approximately’ (before the number name) or mo ‘even’ (after the floated numeral classifiers).

The above examples illustrate the contexts with a sortal numeral classifier, but mensural numeral classifiers can also appear both as modifiers (3a) and as NPs in their own right (3b):

(3) a. ni kiro       no  ringo wo  katta
       2  NumCl (kg) GEN apple ACC bought
       ‘(I) bought two kilograms of apples.’
    b. ni kiro       wo  katta
       2  NumCl (kg) ACC bought
       ‘(I) bought two kilograms.’

NumClPs serving as NPs can also appear as modifiers of other nouns:

(4) a. san nin   no  deai    wa  80 nen  haru
       3   NumCl GEN meeting TOP 80 year spring
       ‘The three’s meeting was in the spring of ’80.’
    b. ichi kiro no  nedan ha  hyaku en  desu
       1    kg   GEN price TOP 100   yen COPULA
       ‘The price of/for 1 kg is 100 yen.’

As a result, tokens following the syntactic pattern of (2b) and (3a) are systematically ambiguous, although the non-anaphoric reading tends to be preferred.

Certain mensural classifiers can be followed by the word han ‘half’:

(5) ni  kiro han
    two kg   half
    ‘two and a half kilograms’

In order to build their semantic representations compositionally, we make the numeral classifier (here, kiro) the head of the whole expression, and ni and han its dependents. Kiro can then orchestrate the semantic composition of the two dependents as well as the composition of the whole expression with the noun it modifies (see Section 6 below).

Although they aren’t tagged as numeral classifiers by ChaSen, we extended our analysis of mensural classifiers to certain elements that appear before numbers, namely currency symbols (such as $), and prefixes like No. ‘number’ in (6).

(6) kouza   No.    1234 gou
    account number 1234 number
    ‘account number 1234’

Finally, we found that number names can sometimes occur without numeral classifiers, either as modifiers of nouns or as anaphora:

(7) (kouza)   1234 wo  tojitai
    (account) 1234 ACC close.volitional
    ‘(I) want to close (account) 1234.’

Due to space considerations, we won’t describe our analysis of such bare number names here.

4 Data: Distribution

We used ChaSen to segment and tag 10,000 paragraphs of the Mainichi Shinbun 2002 corpus. Of the resulting 490,202 words, 11,515 (2.35%) were tagged as numeral classifiers. 4,543 of those were potentially time/date expressions, leaving 6,972 numeral classifiers, or 1.42% of the words. 203 orthographically distinct numeral classifiers occur in the corpus. The most frequent is nin (the numeral classifier for people), which occurs 1,675 times.

We sampled 100 sentences tagged as containing numeral classifiers to examine the distribution of the constructions outlined in Section 3. These sentences contained a total of 159 numeral classifier phrases and the vast majority (128) were stand-alone NPs. This contrasts with Downing’s (1996) study of 500 examples from modern works of fiction and spoken texts, where most of the occurrences are not anaphoric. Furthermore, while our sample contains no examples of the floated variety, Downing’s contains 96. The discrepancy probably arises because Downing only included sortal numeral classifiers, and not any [...]

[...] al., 2001). Abstracting away from handle constraints,[4] illocutionary force, tense/aspect, and the unexpressed subject, the representation we build for (2b,c) is as in (8).[5]

(8) cat_n_rel(x), udef_rel(x), card_rel(x,“2”), raise_v_rel(z,x)

[4] The potentially underspecified MRS representation of scope.

This can be read as follows: A relation of raising holds between z (the unexpressed subject) and x. x denotes a cat entity, and is bound by an underspecified quantifier (as there is no explicit determiner). x is also an argument of a card_rel (short for ‘cardinal relation’), whose other argument is the constant value 2, meaning that there are in fact two cats being referred to.[6]

For anaphoric numeral classifiers, the representation contains an underspecified noun relation, to be resolved in further processing to a specific relation:

(9) noun_relation(x), udef_rel(x), card_rel(x,“2”), raise_v_rel(z,x)

Mensural classifiers have somewhat more elaborated semantic representations, which we treat as similar to English measure NPs (Flickinger and Bond, 2003). On this analysis, the NumClP denotes the extent of some dimension or property of the modified N. This dimension or property is represented with an underspecified relation (unspec_adj_rel), and a degree_rel relates the measured amount to the underspecified adjective relation.[7] The underspecified adjective relation modifies the N in the usual way. This is illustrated in (10), which is the semantic representation assigned to (3a).[8]
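The flat representations in (8) and (9) differ only in the noun relation, which suggests a simple way to picture their assembly. The following is a minimal sketch in Python under stated assumptions: the names EP, numclp_object, clause and render are hypothetical and are not part of JACY or of any MRS toolkit, and handle constraints, scope, tense/aspect and illocutionary force are left out, as in the simplified examples themselves.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EP:
    """One elementary predication: a relation name plus its arguments (hypothetical type)."""
    pred: str
    args: Tuple[str, ...]

    def __str__(self) -> str:
        return f"{self.pred}({','.join(self.args)})"


def numclp_object(count: str, noun_pred: Optional[str], var: str = "x") -> List[EP]:
    """Predications for an NP counted by a sortal NumClP.

    Passing noun_pred=None models the anaphoric use, where the noun
    relation stays underspecified as in (9).
    """
    noun = noun_pred if noun_pred is not None else "noun_relation"
    return [
        EP(noun, (var,)),                     # the (possibly underspecified) noun
        EP("udef_rel", (var,)),               # underspecified quantifier binding var
        EP("card_rel", (var, f'"{count}"')),  # cardinality contributed by the number name
    ]


def clause(verb_pred: str, subj: str, obj_eps: List[EP], obj_var: str = "x") -> List[EP]:
    """Append the verb relation linking the unexpressed subject to the object."""
    return obj_eps + [EP(verb_pred, (subj, obj_var))]


def render(eps: List[EP]) -> str:
    return ", ".join(str(ep) for ep in eps)


ex8 = render(clause("raise_v_rel", "z", numclp_object("2", "cat_n_rel")))
ex9 = render(clause("raise_v_rel", "z", numclp_object("2", None)))
print(ex8)
print(ex9)
```

The two printed lines correspond to (8) and (9) respectively, with plain ASCII quotes around the cardinal value. The sketch treats the representation as an unordered bag of relations sharing variables, which is the property the prose reading of (8) relies on.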

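The corpus proportions reported for the Mainichi Shinbun sample can be re-derived from the raw counts given in the text. The short Python check below uses only those counts; the variable names are illustrative, and the stand-alone share of the 159 sampled phrases is a derived figure that the paper itself describes only as "the vast majority".

```python
# Counts as reported in the distribution section.
total_words = 490_202
tagged_as_numcl = 11_515
time_date_candidates = 4_543

# Numeral classifiers remaining once potential time/date expressions are set aside.
remaining_numcl = tagged_as_numcl - time_date_candidates

pct_tagged = round(100 * tagged_as_numcl / total_words, 2)
pct_remaining = round(100 * remaining_numcl / total_words, 2)

# Sample of 100 sentences: 159 NumClPs, 128 of them stand-alone NPs.
sampled_numclps = 159
stand_alone = 128
pct_stand_alone = round(100 * stand_alone / sampled_numclps, 1)  # derived, not stated in the text

print(remaining_numcl, pct_tagged, pct_remaining, pct_stand_alone)
```

The first three values agree with the figures in the text (6,972 classifiers, 2.35% and 1.42% of the words).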