On the Idiosyncrasies of the Mandarin Chinese Classifier System

Shijia Liu@ and Hongyuan Mei@ and Adina WilliamsP and Ryan Cotterell@,H
@Department of Computer Science, Johns Hopkins University
PFacebook Artificial Intelligence Research
HDepartment of Computer Science and Technology, University of Cambridge
{sliu126,hmei}@jhu.edu, [email protected], [email protected]

Abstract

While idiosyncrasies of the Chinese classifier system have been a richly studied topic among linguists (Adams and Conklin, 1973; Erbaugh, 1986; Lakoff, 1986), not much work has been done to quantify them with statistical methods. In this paper, we introduce an information-theoretic approach to measuring idiosyncrasy; we examine how much the uncertainty in Mandarin Chinese classifiers can be reduced by knowing semantic information about the nouns that the classifiers modify. Using the empirical distribution of classifiers from the parsed Chinese Gigaword corpus (Graff et al., 2005), we compute the mutual information (in bits) between the distribution over classifiers and distributions over other linguistic quantities. We investigate whether semantic classes of nouns and adjectives differ in how much they reduce uncertainty in classifier choice, and find that it is not fully idiosyncratic; while there are no obvious trends for the majority of semantic classes, shape nouns reduce uncertainty in classifier choice the most.

Classifier   Pīnyīn   Semantic Class
个           gè       objects, general-purpose
件           jiàn     matters
头           tóu      domesticated animals
只           zhī      general animals
张           zhāng    flat objects
条           tiáo     long, narrow objects
项           xiàng    items, projects
道           dào      orders, roads, projections
匹           pǐ       horses, cloth
顿           dùn      meals

Table 1: Examples of Mandarin classifiers. Classifiers' simplified Mandarin characters (1st column), pīnyīn pronunciations (2nd column), and commonly modified noun types (3rd column) are listed.

1 Introduction

Many of the world's languages make use of numeral classifiers (Aikhenvald, 2000). While theoretical debate still rages on the function of classifiers (Krifka, 1995; Ahrens and Huang, 1996; Cheng et al., 1998; Chierchia, 1998; Li, 2000; Nisbett, 2004; Bale and Coon, 2014), it is generally accepted that they need to be present for nouns to be modified by numerals, quantifiers, demonstratives, or other qualifiers (Li and Thompson, 1981, 104). In Mandarin Chinese, for instance, the phrase one person translates as 一个人 (yī gè rén); the classifier 个 (gè) has no clear translation in English, yet, nevertheless, it is necessary to place it between the numeral 一 (yī) and the word for person 人 (rén).

There are hundreds of numeral classifiers in the Mandarin lexicon (Po-Ching and Rimmington, 2015; Table 1 gives some canonical examples), and classifier choices are often argued to be based on inherent, possibly universal, semantic properties associated with the noun, such as shape (Kuo and Sera, 2009; Zhang and Jiang, 2016). Indeed, in a summary article, Tai (1994) writes: "Chinese classifier systems are cognitively based, rather than arbitrary systems of classification." If classifier choice were solely based on conceptual features of a given noun, then we might expect it to be nearly determinate—like gender-marking on nouns in German, Slavic, or Romance languages (Erbaugh, 1986, 400)—and perhaps even fixed for all of a given noun's synonyms.

However, selecting which classifiers go with nouns in practice is an idiosyncratic process, often with several grammatical options (see Table 2 for two examples). Moreover, knowing what a noun means doesn't always mean you can guess the classifier. For example, most nouns referring to animals, such as 驴 (lǘ, donkey) or 羊 (yáng, goat), use the classifier 只 (zhī).
However, horses cannot use 只 (zhī), despite being semantically similar to goats and donkeys, and instead must appear with 匹 (pǐ). Conversely, knowing which particular subset of noun meaning is reflected in the classifier also doesn't mean that you can use that classifier with any noun that seems to have the right properties. For example, the classifier 条 (tiáo) can be used with nouns referring to long and narrow objects, like rivers, snakes, fish, pants, and certain breeds of dogs—but never cats, regardless of how long and narrow they might be! In general, classifiers carve up the semantic space in a very idiosyncratic manner that is neither fully arbitrary, nor fully predictable from semantic features.

Given this, we can ask: precisely how idiosyncratic is the Mandarin Chinese classifier system? For a given noun, how predictable is the set of classifiers that can be grammatically employed? For instance, had we not known that the Mandarin word for horse 马 (mǎ) predominantly takes the classifier 匹 (pǐ), how likely would we have been to guess it over the much more common animal classifier 只 (zhī)? Is it more important to know that a noun is 马 (mǎ) or simply that the noun is an animal noun? We address these questions by computing how much the uncertainty in the distribution over classifiers can be reduced by knowing information about nouns and noun semantics. We quantify this notion of classifier idiosyncrasy by calculating the mutual information between classifiers and nouns, and also between classifiers and several sets that are relevant to noun meaning (i.e., categories of noun senses, sets of noun synonyms, adjectives, and categories of adjective senses). Our results yield concrete, quantitative measures of idiosyncrasy in bits, which can supplement existing hand-annotated, intuition-based approaches that organize Mandarin classifiers into an ontology.

Why investigate the idiosyncrasy of the Mandarin Chinese classifier system? How idiosyncratic or predictable natural language is has captivated researchers since Shannon (1951) originally proposed the question in the context of printed English text. Indeed, predictability directly relates to the complexity of language—a fundamental question in linguistics (Newmeyer and Preston, 2014; Dingemanse et al., 2015)—which has also been claimed to have consequences for learnability and processing. For example, how hard it is for a learner to master irregularity, say, in the English past tense (Rumelhart and McClelland, 1987; Pinker and Prince, 1994; Pinker and Ullman, 2002; Kirov and Cotterell, 2018) might be affected by predictability, and highly predictable noun-adjacent words, such as gender affixes in German and prenominal adjectives in English, have also been shown to confer online processing advantages (Dye et al., 2016, 2017, 2018). Within the Chinese classifier system itself, the very common, general-purpose classifier 个 (gè) is acquired by children earlier than rarer, more semantically rich ones (Hu, 1993). General classifiers are also found to occur more often in corpora with nouns that are less predictable in context (i.e., nouns with high surprisal; Hale 2001) (Zhan and Levy, 2018), providing initial evidence that predictability likely plays a role in classifier-noun pairing more generally. Furthermore, providing classifiers improves participants' recall of nouns in laboratory experiments (Zhang and Schmitt, 1998; Gao and Malt, 2009) (but see Huang and Chen 2014)—but it isn't known whether classifiers do so by modulating noun predictability.

Classifier        p(C | N = 人士)   p(C | N = 工程)
位 (wèi)          0.4838            0.0058
名 (míng)         0.3586            0.0088
个 (gè)           0.0205            0.0486
批 (pī)           0.0128            0.0060
项 (xiàng)        0.0063            0.4077
期 (qī)           0.0002            0.2570
Everything else   0.1178            0.2661

Table 2: Empirical distribution of selected classifiers over two nouns: 人士 (rénshì, people) and 工程 (gōngchéng, project). Most of the probability mass is allocated to only a few classifiers for both nouns.
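As a concrete illustration of how a row of Table 2 arises, here is a minimal sketch (ours, not from the paper) of computing an empirical conditional distribution p(C | N) from classifier-noun co-occurrence counts. The counts below are invented; the paper's numbers come from the parsed Gigaword corpus described in Section 3.

```python
from collections import Counter

# Hypothetical (classifier, noun) pair counts; real counts would be
# extracted from the parsed Chinese Gigaword corpus (Section 3).
pair_counts = Counter({
    ("位", "人士"): 4838, ("名", "人士"): 3586, ("个", "人士"): 205,
    ("项", "工程"): 4077, ("期", "工程"): 2570, ("个", "工程"): 486,
})

def conditional_distribution(noun):
    """Return the empirical distribution p(C | N = noun)."""
    counts = {clf: n for (clf, nn), n in pair_counts.items() if nn == noun}
    total = sum(counts.values())
    return {clf: n / total for clf, n in counts.items()}

# Mass concentrates on 位 and 名 for 人士, mirroring the shape of Table 2.
print(conditional_distribution("人士"))
```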
2 Quantifying Classifier Idiosyncrasy

We take an information-theoretic approach to statistically quantify the idiosyncrasy of the Mandarin Chinese classifier system, and measure the uncertainty (entropy) reduction—or mutual information (MI) (Cover and Thomas, 2012)—between classifiers and other linguistic quantities, like nouns or adjectives. Intuitively, MI lets us directly measure classifier "idiosyncrasy", because it tells us how much information (in bits) about a classifier we can get by observing another linguistic quantity. If classifiers were completely independent from other quantities, knowing those quantities would give us no information about classifier choice.

Notation. Let C be a classifier-valued random variable with range C, the set of Mandarin Chinese classifiers. Let X be a second random variable, which models a second linguistic quantity, with range X. Mutual information (MI) is defined as

    I(C; X) ≡ H(C) − H(C | X)                                       (1a)
            = Σ_{c ∈ C, x ∈ X} p(c, x) log [p(c, x) / (p(c) p(x))]   (1b)

Let N and A denote the sets of nouns and adjectives, respectively, with N and A being noun- and adjective-valued random variables. Let N_i and A_i denote the sets of nouns and adjectives in the ith SemCor supersense category for nouns (Tsvetkov et al., 2015) and adjectives (Tsvetkov et al., 2014), respectively, with their random variables being N_i and A_i. Let S be the set of all English WordNet (Miller, 1998) senses of nouns, with S being the WordNet sense-valued random variable. Given the formula above and any choice for X ∈ {N, A, N_i, A_i, S}, we can calculate the mutual information between classifiers and other relevant linguistic quantities.
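To make Eq. (1) operational, here is a minimal sketch (ours, with invented toy counts) that computes plug-in estimates of H(C), H(C | X), and I(C; X) from a table of (classifier, x) co-occurrence counts:

```python
import math
from collections import Counter

def mi_bits(pair_counts):
    """Plug-in estimates of H(C), H(C | X), and I(C; X) in bits from raw
    (classifier, x) co-occurrence counts, following Eq. (1)."""
    total = sum(pair_counts.values())
    p_cx = {cx: n / total for cx, n in pair_counts.items()}
    p_c, p_x = Counter(), Counter()
    for (c, x), p in p_cx.items():
        p_c[c] += p
        p_x[x] += p
    h_c = -sum(p * math.log2(p) for p in p_c.values())   # H(C)
    # Eq. (1b): I(C; X) = sum over (c, x) of p(c,x) log [p(c,x) / (p(c) p(x))]
    i_cx = sum(p * math.log2(p / (p_c[c] * p_x[x]))
               for (c, x), p in p_cx.items())
    return h_c, h_c - i_cx, i_cx                         # Eq. (1a): H(C|X) = H(C) - I(C;X)

# Invented toy counts: 马 (horse) prefers 匹, 羊 (goat) prefers 只.
toy = Counter({("匹", "马"): 90, ("只", "马"): 10,
               ("只", "羊"): 95, ("头", "羊"): 5})
print(mi_bits(toy))
```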
2.1 MI between Classifiers and Nouns

Mutual information between classifiers and nouns (I(C; N)) shows how much uncertainty (i.e., entropy) in classifiers can be reduced once we know the noun, and vice versa. Because only a few classifiers are suitable to modify a given noun (again, see Table 2) and the entropy of classifiers for a given noun is predicted to be close to zero, MI between classifiers and nouns is expected to be high.

2.2 MI between Classifiers and Adjectives

The distribution over adjectives that modify a noun in a large corpus gives us a language-internal peek into a word's lexical semantics. Moreover, adjectives have been found to increase the predictability of nouns (Dye et al., 2018), so we ask whether they might affect classifier predictability too. We compute MI between classifiers and adjectives (I(C; A)) that modify the same nouns to investigate their relationship. If both adjectives and classifiers track comparable portions of the noun's semantics, we expect I(C; A) to be significantly greater than zero, which implies mutual dependence between classifier C and adjective A.

2.3 MI between Classifiers and Noun Supersenses

To uncover which nouns are able to reduce the uncertainty in classifiers more, we divide them into 26 SemCor supersense categories (Tsvetkov et al., 2015), and then compute I(C; N_i) (i ∈ {1, 2, ..., 26}) for each supersense category. The supersense categories (e.g., animal, plant, person, artifact, etc.) provide a semantic classification system for English nouns. Since there are no known supersense categories for Mandarin, we need to translate Chinese nouns into English to perform our analysis. We use SemCor supersense categories instead of WordNet hypernyms because different "basic levels" for each noun make it difficult to determine the "correct" category for each noun.

2.4 MI between Classifiers and Adjective Supersenses

We translated and divided the adjectives into 12 supersense categories (Tsvetkov et al., 2014), and compute mutual information I(C; A_i) (i ∈ {1, 2, ..., 12}) for each category separately to determine which categories have more mutual dependence on classifiers. Adjective supersenses are defined as categories describing certain properties of nouns. For example, adjectives in the MIND category describe intelligence and awareness, while those in the PERCEPTION category focus on, e.g., color, brightness, and taste. Examining the distribution over adjectives is a language-specific measure of noun meaning, albeit an imperfect one, because only certain adjectives modify any given noun.

2.5 MI between Classifiers and Noun Synsets

We also compute the mutual information I(C; S) between classifiers and nouns' WordNet (Miller, 1998) synonym sets (synsets), assuming that each synset is independent. For nouns with multiple synsets, we assume that all synsets are equally probable, for simplicity. If classifiers are fully semantically determined, then knowing a noun's synsets should enable one to know the appropriate classifier(s), resulting in high MI. If classifiers are largely idiosyncratic, then noun synsets should have lower MI with classifiers. We do not use WordNet to attempt to capture word polysemy here.

3 Data and Experiments

Data Provenance. We apply an existing neural Mandarin word segmenter (Cai et al., 2017) to the Chinese Gigaword corpus (Graff et al., 2005), and then feed the segmented corpus to a neural dependency parser, using Google's pretrained Parsey Universal model on Mandarin.¹ The model is trained on Universal Dependencies datasets v1.3.² We extract classifier-noun pairs and adjective-classifier-noun triples from sentences, where the adjective and the classifier modify the same noun—this is easily determined from the parse. We also record the tuple counts, and use them to compute empirical distributions over the classifiers that modify nouns and noun-adjective pairs, respectively.

¹ https://github.com/tensorflow/models/blob/master/research/syntaxnet/g3doc/universal.md
² https://universaldependencies.org/
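The extraction step can be sketched as follows. This is our illustrative reconstruction, not the authors' code: it reads CoNLL-U formatted parser output, and the dependency label used for classifiers here (clf) is a placeholder that would need to match whatever relation the parser actually emits.

```python
def extract_pairs(conllu_text, clf_relation="clf"):
    """Yield (classifier, head noun) pairs from CoNLL-U formatted parses.

    Assumes classifiers attach to their noun via `clf_relation`; this
    label is hypothetical and should be adapted to the parser's treebank
    conventions.
    """
    for sentence in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in sentence.split("\n")
                if line and not line.startswith("#")]
        forms = {row[0]: row[1] for row in rows}  # token ID -> surface form
        for row in rows:
            # CoNLL-U columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, ...
            if row[7] == clf_relation:
                yield (row[1], forms.get(row[6], ""))

# Toy parse of 一匹马 (one horse); annotations are illustrative only.
example = """1\t一\t_\tNUM\t_\t_\t3\tnummod\t_\t_
2\t匹\t_\tNOUN\t_\t_\t3\tclf\t_\t_
3\t马\t_\tNOUN\t_\t_\t0\troot\t_\t_"""
print(list(extract_pairs(example)))  # [('匹', '马')]
```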

Data Preprocessing. Since no annotated supersense list exists for Mandarin, we first use CC-CEDICT³ as a Mandarin Chinese-to-English dictionary to translate nouns and adjectives into English. Acknowledging that translating might introduce noise, we subsequently categorize our words into different senses using the SemCor supersense data for nouns (Miller et al., 1993; Tsvetkov et al., 2015) and adjectives (Tsvetkov et al., 2014). After that, we calculate the mutual information under each noun and adjective supersense.

³ www.mdbg.net/chinese/dictionary

Modeling Assumptions. As this contribution is the first to investigate classifier predictability, we make several simplifying assumptions. Extracting distributions over classifiers from a large corpus, as we do, ignores sentential context, which means we ignore the fact that some nouns (i.e., relational nouns, like māma, Mom) are more likely to be found in verb frames or other constructions where classifiers are not needed. We also ignore singular-plural marking, which might affect classifier choice, and the mass-count distinction (Cheng et al., 1998; Bale and Coon, 2014), to the extent that it is not encoded in noun supersense categories (e.g., substance includes mass nouns).

We also assume that every classifier-noun or classifier-adjective pairing we extract is equally acceptable to native speakers. However, it's possible that native speakers differ in either their knowledge of classifier-noun distributions or their confidence in particular combinations. Whether and how such human knowledge interacts with our calculations would be an interesting future avenue.

4 Results and Analysis

4.1 MI between Classifiers and Nouns, Noun Synsets, and Adjectives

H(C)   H(C | N)   I(C; N)   H(C | S)   I(C; S)   H(C | A)   I(C; A)
5.61   0.66       4.95      4.14       1.47      3.53       2.08

Table 3: Mutual information between classifiers and nouns I(C; N), noun senses I(C; S), and adjectives I(C; A), compared to the corresponding entropies.

Table 3 shows MI between classifiers and other linguistic quantities. As we can see, I(C; N) > I(C; A) > I(C; S). As expected, knowing the noun greatly reduces classifier uncertainty; the noun and classifier have high MI (4.95 bits). Classifier MI with noun synsets (1.47 bits) is not comparable to MI with nouns (4.95 bits), suggesting that knowing a synset does not greatly reduce classifier uncertainty, leaving more than 3/5 of the entropy unaccounted for. We also see that adjectives (2.08 bits) reduce the uncertainty in classifiers more than noun synsets (1.47 bits), but less than nouns (4.95 bits).
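As a quick arithmetic check, the Table 3 entries satisfy the decomposition in Eq. (1a):

```latex
\begin{align*}
I(C;N) &= H(C) - H(C \mid N) = 5.61 - 0.66 = 4.95~\text{bits}\\
I(C;S) &= H(C) - H(C \mid S) = 5.61 - 4.14 = 1.47~\text{bits}\\
I(C;A) &= H(C) - H(C \mid A) = 5.61 - 3.53 = 2.08~\text{bits}
\end{align*}
```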

4.2 MI between Classifiers and Noun Supersenses

Noun supersense results are in Figure 1. Natural categories are helpful, but are far from completely predictive of the classifier distribution: knowing that a noun is a plant helps, but cannot account for about 1/3 of the original entropy of the distribution over classifiers, and knowing that a noun is a location leaves more than 1/2 unexplained. The three supersenses with the highest I(C; N_i) are body, artifact, and shape. Of particular interest is the shape category. Knowing that a noun refers to a shape (e.g., 角度; jiǎodù, angle) makes the choice of classifier relatively predictable. This result sheds new light on psychological findings that Mandarin speakers are more likely than English speakers to classify words as similar based on shape (Kuo and Sera, 2009), by uncovering a possible role for information structure in shape-based choice. It also accords with Chien et al. (2003) and Li et al. (2010, 216), who show that children as young as three know classifiers often delineate categories of objects with similar shapes.

[Figure 1 omitted: Mutual information between classifiers and nouns (dark blue), and classifier entropy (light & dark blue), plotted with H(C | N) = H(C) − I(C; N) (light blue) decreasing from left. Error bars denote bootstrapped 95% confidence interval of I(C; N).]
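The error bars in Figure 1 (and in Figure 2 below) are bootstrapped 95% confidence intervals for the per-category MI estimates. The paper does not spell out the resampling scheme, so the sketch below is one plausible reading of it: resample classifier-noun observations with replacement and take percentile bounds of the recomputed statistic.

```python
import random

def bootstrap_ci(pairs, statistic, n_boot=1000, alpha=0.05):
    """Percentile-bootstrap confidence interval for `statistic`, a function
    of a list of (classifier, x) observations. The resampling scheme is our
    assumption, not a detail reported in the paper."""
    stats = sorted(
        statistic([random.choice(pairs) for _ in pairs])
        for _ in range(n_boot)
    )
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Toy usage: CI for the share of 马 tokens taking the classifier 匹.
data = [("匹", "马")] * 90 + [("只", "马")] * 10
share = lambda ps: sum(1 for c, _ in ps if c == "匹") / len(ps)
print(bootstrap_ci(data, share))
```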

[Figure 2 omitted: Mutual information between classifiers and adjectives (dark blue), and classifier entropy (light & dark blue), plotted with H(C | A) = H(C) − I(C; A) (light blue) decreasing from left. Error bars denote bootstrapped 95% confidence interval of I(C; A).]

4.3 MI between Classifiers and Adjective Supersenses

Adjective supersense results are in Figure 2. Interestingly, the top three senses that have the highest mutual information between their adjectives and classifiers—MIND, BODY (constitution, appearance), and PERCEPTION—are all involved with people's subjective views. With respect to Kuo and Sera (2009), adjectives from the SPATIAL sense pick out shape nouns in our results. Although it does not make it into the top three, MI for the SPATIAL sense is still significant.

5 Conclusion

While classifier choice is known to be idiosyncratic, no extant study has precisely quantified this. To do so, we measure the mutual information between classifiers and other linguistic quantities, and find that classifiers are highly mutually dependent on nouns, but are less mutually dependent on adjectives and noun synsets. Furthermore, knowing which noun or adjective supersense a word comes from helps, often significantly, but still leaves much of the original entropy in the classifier distribution unexplained, providing quantitative support for the notion that classifier choice is largely idiosyncratic. Although the amount of mutual dependence is highly variable across the semantic classes we investigate, we find that knowing a noun refers to a shape reduces uncertainty in classifier choice more than knowing it falls into any other semantic class, arguing for a role for information structure in Mandarin speakers' reliance on shape (Kuo and Sera, 2009). This result might have implications for second language pedagogy, adducing additional, tentative evidence in favor of collocational approaches to teaching classifiers (Zhang and Lu, 2013) that encourage memorizing classifiers and nouns together. Investigating classifiers might also provide cognitive scientific insights into conceptual categorization (Lakoff, 1986)—often considered crucial for language use (Ungerer, 1996; Taylor, 2002; Croft and Cruse, 2004). Studies like this one open up avenues for comparisons with other phenomena long argued to be idiosyncratic, such as grammatical gender or declension class.

Acknowledgments

This research benefited from generous financial support from a Bloomberg Data Science Fellowship to Hongyuan Mei and a Facebook Fellowship to Ryan Cotterell. We thank Meilin Zhan, Roger Levy, Géraldine Walther, Arya McCarthy, Ekaterina Vylomova, Sebastian Mielke, Katharina Kann, Jason Eisner, Jacob Eisenstein, and anonymous reviewers (NAACL 2019) for their comments.

References

Karen L. Adams and Nancy F. Conklin. 1973. Toward a theory of natural classification. In Papers from the Ninth Regional Meeting of the Chicago Linguistic Society, pages 1–10.

Kathleen Ahrens and Chu-Ren Huang. 1996. Classifiers and semantic type coercion: Motivating a new classification of classifiers. In Proceedings of the 11th Pacific Asia Conference on Language, Information and Computation, pages 1–10.

Alexandra Y. Aikhenvald. 2000. Classifiers: A Typology of Noun Categorization Devices. Oxford University Press, Oxford.
Alan Bale and Jessica Coon. 2014. Classifiers are for numerals, not for nouns: Consequences for the mass/count distinction. Linguistic Inquiry, 45(4):695–707.

Deng Cai, Hai Zhao, Zhisong Zhang, Yuan Xin, Yongjian Wu, and Feiyue Huang. 2017. Fast and accurate neural word segmentation for Chinese. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Short Papers), pages 608–615, Vancouver, Canada.

Lisa L. S. Cheng, Rint Sybesma, et al. 1998. Yi-wan tang, yi-ge tang: Classifiers and massifiers. Tsing Hua Journal of Chinese Studies, 28(3):385–412.

Yu-Chin Chien, Barbara Lust, and Chi-Pang Chiang. 2003. Chinese children's comprehension of count-classifiers and mass-classifiers. Journal of East Asian Linguistics, 12(2):91–120.

Gennaro Chierchia. 1998. Reference to kinds across languages. Natural Language Semantics, 6(4):339–405.

Thomas M. Cover and Joy A. Thomas. 2012. Elements of Information Theory. John Wiley & Sons.

William Croft and D. Alan Cruse. 2004. Cognitive Linguistics. Cambridge University Press.

Mark Dingemanse, Damián E. Blasi, Gary Lupyan, Morten H. Christiansen, and Padraic Monaghan. 2015. Arbitrariness, iconicity, and systematicity in language. Trends in Cognitive Sciences, 19(10):603–615.

Melody Dye, Petar Milin, Richard Futrell, and Michael Ramscar. 2016. A functional theory of gender paradigms. In Morphological Paradigms and Functions. Leiden: Brill.

Melody Dye, Petar Milin, Richard Futrell, and Michael Ramscar. 2017. Cute little puppies and nice cold beers: An information theoretic analysis of prenominal adjectives. In 39th Annual Meeting of the Cognitive Science Society, London, UK.

Melody Dye, Petar Milin, Richard Futrell, and Michael Ramscar. 2018. Alternative solutions to a language design problem: The role of adjectives and gender marking in efficient communication. Topics in Cognitive Science, 10(1):209–224.

Mary S. Erbaugh. 1986. Taking stock: The development of Chinese noun classifiers historically and in young children. Noun Classes and Categorization, pages 399–436.

Ming Y. Gao and Barbara C. Malt. 2009. Mental representation and cognitive consequences of Chinese individual classifiers. Language and Cognitive Processes, 24(7-8):1124–1179.

David Graff, Ke Chen, Junbo Kong, and Kazuaki Maeda. 2005. Chinese Gigaword Second Edition LDC2005T14. Web Download. Philadelphia: Linguistic Data Consortium.

John Hale. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, pages 1–8. Association for Computational Linguistics.

Qian Hu. 1993. The Acquisition of Chinese Classifiers by Young Mandarin-Speaking Children. Ph.D. thesis, Boston University.

Shuping Huang and Jenn-Yeu Chen. 2014. The effects of numeral classifiers and taxonomic categories on Chinese and English speakers' recall of nouns. Journal of East Asian Linguistics, 23(1):27–42.

Christo Kirov and Ryan Cotterell. 2018. Recurrent neural networks in linguistic theory: Revisiting Pinker and Prince (1988) and the past tense debate. arXiv preprint arXiv:1807.04783.

Manfred Krifka. 1995. Common nouns: A contrastive analysis of English and Chinese, pages 398–411. Chicago: University of Chicago Press.

Jenny Y. Kuo and Maria D. Sera. 2009. Classifier effects on human categorization: The role of shape classifiers in Mandarin Chinese. Journal of East Asian Linguistics, 18:1–19.

George Lakoff. 1986. Classifiers as a reflection of mind. Noun Classes and Categorization, 7:13–51.

Charles N. Li and Sandra A. Thompson. 1981. A grammar of spoken Chinese: A functional reference grammar. Berkeley: University of California Press.

Peggy Li, Becky Huang, and Yaling Hsiao. 2010. Learning that classifiers count: Mandarin-speaking children's acquisition of sortal and mensural classifiers. Journal of East Asian Linguistics, 19(3):207–230.

Wendan Li. 2000. The pragmatic function of numeral-classifiers in Mandarin Chinese. Journal of Pragmatics, 32(8):1113–1133.
George Miller. 1998. WordNet: An Electronic Lexical Database. MIT Press.

George A. Miller, Claudia Leacock, Randee Tengi, and Ross T. Bunker. 1993. A semantic concordance. In Proceedings of the Workshop on Human Language Technology, pages 303–308. Association for Computational Linguistics.

Frederick J. Newmeyer and Laurel B. Preston. 2014. Measuring Grammatical Complexity. Oxford University Press.

Richard Nisbett. 2004. The Geography of Thought: How Asians and Westerners Think Differently . . . and Why. Simon and Schuster.

Steven Pinker and Alan Prince. 1994. Regular and irregular morphology and the psychological status of rules of grammar. The Reality of Linguistic Rules, 321:51.

Steven Pinker and Michael T. Ullman. 2002. The past and future of the past tense. Trends in Cognitive Sciences, 6(11):456–463.

Yip Po-Ching and Don Rimmington. 2015. Chinese: A Comprehensive Grammar. Routledge.

David E. Rumelhart and James L. McClelland. 1987. Learning the past tenses of English verbs: Implicit rules or parallel distributed processing. Mechanisms of Language Acquisition, pages 195–248.

Claude E. Shannon. 1951. Prediction and entropy of printed English. Bell System Technical Journal, 30(1):50–64.

James H. Y. Tai. 1994. Chinese classifier systems and human categorization. In Honor of William S.-Y. Wang: Interdisciplinary Studies on Language and Language Change, pages 479–494.

John R. Taylor. 2002. Cognitive Grammar. Oxford University Press.

Yulia Tsvetkov, Manaal Faruqui, Wang Ling, Guillaume Lample, and Chris Dyer. 2015. Evaluation of word vector representations by subspace alignment. In Proceedings of EMNLP.

Yulia Tsvetkov, Nathan Schneider, Dirk Hovy, Archna Bhatia, Manaal Faruqui, and Chris Dyer. 2014. Augmenting English adjective senses with supersenses. In Proceedings of LREC.

Friedrich Ungerer and Hans-Jörg Schmid. 1996. An Introduction to Cognitive Linguistics. Essex: Longman.

Meilin Zhan and Roger Levy. 2018. Comparing theories of speaker choice using a model of classifier production in Mandarin Chinese. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1997–2005.

Jie Zhang and Xiaofei Lu. 2013. Variability in Chinese as a foreign language learners' development of the Chinese numeral classifier system. The Modern Language Journal, 97(S1):46–60.

Liulin Zhang and Song Jiang. 2016. A cognitive linguistics approach to Chinese classifier teaching: An experimental study. Journal of Language Teaching and Research, 7(3):467–475.

Shi Zhang and Bernd Schmitt. 1998. Language-dependent classification: The mental representation of classifiers in cognition, memory, and ad evaluations. Journal of Experimental Psychology: Applied, 4(4):375.