
A Statistical Approach to the Processing of Metonymy

Masao Utiyama, Masaki Murata, and Hitoshi Isahara
Communications Research Laboratory, MPT
588-2, Iwaoka, Nishi-ku, Kobe, Hyogo 651-2492
{mutiyama,murata,isahara}@crl.go.jp

Abstract

This paper describes a statistical approach to the interpretation of metonymy. A metonymy is received as an input, then its possible interpretations are ranked by applying a statistical measure. The method has been tested experimentally. It correctly interpreted 53 out of 75 metonymies in Japanese.

1 Introduction

Metonymy is a figure of speech in which the name of one thing is substituted for that of something to which it is related. The explicit term is 'the name of one thing' and the implicit term is 'the name of something to which it is related'. A typical example of metonymy is

    He read Shakespeare.    (1)

'Shakespeare' is substituted for 'the works of Shakespeare'. 'Shakespeare' is the explicit term and 'works' is the implicit term.

Metonymy is pervasive in natural language. The correct treatment of metonymy is vital for natural language processing applications, especially for machine translation (Kamei and Wakao, 1992; Fass, 1997). A metonymy may be acceptable in a source language but unacceptable in a target language. For example, a direct translation of 'he read Mao', which is acceptable in English and Japanese, is completely unacceptable in Chinese (Kamei and Wakao, 1992). In such cases, the machine translation system has to interpret metonymies to generate acceptable translations.

Previous approaches to processing metonymy have used hand-constructed ontologies or semantic networks (Fass, 1988; Iverson and Helmreich, 1992; Bouaud et al., 1996; Fass, 1997).¹ Such approaches are restricted by the knowledge bases they use, and may only be applicable to domain-specific tasks because the construction of large knowledge bases could be very difficult.

The method outlined in this paper, on the other hand, uses corpus statistics to interpret metonymy, so that a variety of metonymies can be handled without using hand-constructed knowledge bases. The method is quite promising, as shown by the experimental results given in section 5.

¹ As for metaphor processing, Ferrari (1996) used textual clues obtained through corpus analysis for detecting metaphors.

2 Recognition and Interpretation

Two main steps, recognition and interpretation, are involved in the processing of metonymy (Fass, 1997). In the recognition step, metonymic expressions are labeled. In the interpretation step, the meanings of those expressions are interpreted.

Sentence (1), for example, is first recognized as a metonymy and 'Shakespeare' is identified as the explicit term. The interpretation 'works' is selected as an implicit term and 'Shakespeare' is replaced by 'the works of Shakespeare'.

A comprehensive survey by Fass (1997) shows that the most common method of recognizing metonymies is by selection-restriction violations. Whether or not statistical approaches can recognize metonymy as well as the selection-restriction violation method is an interesting question. Our concern here, however, is the interpretation of metonymy, so we leave that question for future work.

In interpretation, an implicit term (or terms) that is (are) related to the explicit term is (are) selected. The method described in this paper uses corpus statistics for interpretation.
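To make the division of labor concrete, the following toy sketch (not from the paper) shows how the two steps might fit together, assuming a selection-restriction-based recognizer of the kind the survey describes. The hand-listed restriction table and all function names are invented for illustration; the paper itself addresses only the interpretation step.

    # Illustrative toy pipeline: recognition by a selection-restriction check,
    # then interpretation by ranking candidate implicit terms.
    READABLE = {"book", "novel", "sakuhin"}   # toy restriction: objects that 'yomu' (read) accepts

    def is_metonymic(noun_a, predicate_v):
        # 'Shakespeare wo yomu' violates the restriction that the object of
        # 'yomu' should be readable, so the phrase is flagged as metonymic.
        return predicate_v == "yomu" and noun_a not in READABLE

    def process(noun_a, case_marker_r, predicate_v, interpret):
        if is_metonymic(noun_a, predicate_v):
            # Interpretation step: return implicit terms B ranked by the
            # statistical measure developed in sections 3 and 4 (here a callback).
            return interpret(noun_a, case_marker_r, predicate_v)
        return [noun_a]   # literal reading; nothing to substitute

    print(process("Shakespeare", "wo", "yomu", lambda a, r, v: ["sakuhin (works)"]))
    # -> ['sakuhin (works)']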
This method, as applied to Japanese metonymies, receives a metonymy in a phrase of the form 'Noun A Case-Marker R Predicate V' and returns a list of nouns ranked in order of the system's estimate of their suitability as interpretations of the metonymy, assuming that noun A is the explicit term. For example, given Ford wo (accusative-case) kau (buy) (buy a Ford), zyôyôsya (car), best seller, kuruma (vehicle), etc. are returned, in that order.²

The method follows the procedure outlined below to interpret a metonymy.

1. Given a metonymy in the form 'Noun A Case-Marker R Predicate V', nouns that can be syntactically related to the explicit term A are extracted from a corpus.

2. The extracted nouns are ranked according to their appropriateness as interpretations of the metonymy by applying a statistical measure.

The first step is discussed in section 3 and the second in section 4.

² 'Ford' is spelled 'hôdo' in Japanese. We have used English when we spell Japanese loan-words from English for the sake of readability.

3 Information Source

We use a large corpus to extract nouns which can be syntactically related to the explicit term of a metonymy. A large corpus is valuable as a source of such nouns (Church and Hanks, 1990; Brown et al., 1992).

We used Japanese noun phrases of the form A no B to extract nouns that were syntactically related to A. Nouns in such a syntactic relation are usually close semantic relatives of each other (Murata et al., 1999), and occur relatively infrequently. We thus also used an A near B relation, i.e. identifying the other nouns within the target sentence, to extract nouns that may be more loosely related to A, but occur more frequently. These two types of syntactic relation are treated differently by the statistical measure which we will discuss in section 4.

The Japanese noun phrase A no B roughly corresponds to the English noun phrase B of A, but it has a much broader range of usage (Kurohashi and Sakai, 1999). In fact, A no B can express most of the possible types of semantic relation between two nouns, including metonymic concepts such as that the name of a container can represent its contents and the name of an artist can imply an artform (container for contents and artist for artform below).³ Examples of these and similar types of metonymic concepts (Lakoff and Johnson, 1980; Fass, 1997) are given below.

Container for contents
• glass no mizu (water)
• nabe (pot) no ryôri (food)

Artist for artform
• Beethoven no kyoku (music)
• Picasso no e (painting)

Object for user
• ham sandwich no kyaku (customer)
• sax no sôsya (performer)

Whole for part
• kuruma (car) no tire
• door no knob

These examples suggest that we can extract semantically related nouns by using the A no B relation.

³ Yamamoto et al. (1998) also used the A no B relation to interpret metonymy.
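The extraction step can be pictured with a short sketch. The code below is an illustrative assumption about the bookkeeping, not the paper's extraction code: it takes morphologically analysed sentences as lists of (surface form, part-of-speech) pairs and counts the two relations for a given explicit term A; the simple three-token 'A no B' pattern and the function name are invented for the example.

    from collections import Counter

    def extract_relations(sentences, a):
        """Count nouns B related to the explicit term A by the A no B and A near B relations."""
        f_a_no_b = Counter()    # f(A, no, B)
        f_a_near_b = Counter()  # f(A, near, B)
        for sent in sentences:  # sent = [(word, pos), ...], one analysed sentence
            nouns = [w for w, pos in sent if pos == "noun"]
            if a not in nouns:
                continue
            # A near B: every other noun occurring in a sentence that contains A.
            for b in nouns:
                if b != a:
                    f_a_near_b[b] += 1
            # A no B: the contiguous pattern noun A, particle 'no', noun B.
            for (w1, p1), (w2, p2), (w3, p3) in zip(sent, sent[1:], sent[2:]):
                if w1 == a and p1 == "noun" and w2 == "no" and p2 == "particle" and p3 == "noun":
                    f_a_no_b[w3] += 1
        return f_a_no_b, f_a_near_b

    # Tiny usage example with two hand-made sentences for A = 'bottle'.
    corpus = [
        [("bottle", "noun"), ("no", "particle"), ("cap", "noun")],
        [("bottle", "noun"), ("wo", "particle"), ("akeru", "verb"), ("reizoko", "noun")],
    ]
    print(extract_relations(corpus, "bottle"))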
4 Statistical Measure

A metonymy 'Noun A Case-Marker R Predicate V' can be regarded as a contraction of 'Noun A Syntactic-Relation Q Noun B Case-Marker R Predicate V', where A has relation Q to B (Yamamoto et al., 1998). For example, Shakespeare wo yomu (read) (read Shakespeare) is regarded as a contraction of Shakespeare no sakuhin (works) wo yomu (read the works of Shakespeare), where A = Shakespeare, Q = no, B = sakuhin, R = wo, and V = yomu.

Given a metonymy in the form A R V, the appropriateness of noun B as an interpretation of the metonymy under the syntactic relation Q is defined by

    L_Q(B|A, R, V) = Pr(B|A, Q, R, V),    (2)

where Pr(·) represents probability and Q is either an A no B relation or an A near B relation. Next, the appropriateness of noun B is defined by

    M(B|A, R, V) = max_Q L_Q(B|A, R, V).    (3)

We rank nouns by applying the measure M. Equation (2) can be decomposed as follows:

    L_Q(B|A, R, V) = Pr(B|A, Q, R, V)
                   = Pr(A, Q, B, R, V) / Pr(A, Q, R, V)
                   = [Pr(A, Q, B) / Pr(A, Q)] × [Pr(R, V|A, Q, B) / Pr(R, V|A, Q)]
                   ≈ Pr(B|A, Q) Pr(R, V|B) / Pr(R, V),    (4)

where ⟨A, Q⟩ and ⟨R, V⟩ are assumed to be independent of each other.

Let f(event) be the frequency of an event and Classes(B) be the set of semantic classes to which B belongs. The expressions in Equation (4) are then defined by⁴

    Pr(B|A, Q) = f(A, Q, B) / f(A, Q) = f(A, Q, B) / Σ_B f(A, Q, B),    (5)

    Pr(R, V|B) = f(B, R, V) / f(B)                                  if f(B, R, V) > 0,
                 [Σ_{C ∈ Classes(B)} Pr(B|C) f(C, R, V)] / f(B)     otherwise,    (6)

    Pr(B|C) = [f(B) / |Classes(B)|] / f(C).    (7)

We omitted Pr(R, V) from Equation (4) when we calculated Equation (3) in the experiment described in section 5 for the sake of simplicity. This treatment does not alter the order of the nouns ranked by the system because Pr(R, V) is a constant for a given metonymy of the form A R V.

Equations (5) and (6) differ in their treatment of zero-frequency nouns. In Equation (5), a noun B such that f(A, Q, B) = 0 will be ignored (assigned a zero probability) because it is unlikely that such a noun will have a close relationship with noun A. In Equation (6), on the other hand, a noun B such that f(B, R, V) = 0 is assigned a non-zero probability. These treatments reflect the asymmetrical property of metonymy, i.e. in a metonymy of the form A R V, an implicit term B will have a much tighter relationship with the explicit term A than with the predicate V. Consequently, a noun B such that f(A, Q, B) ≫ 0 ∧ f(B, R, V) = 0 may be appropriate as an interpretation of the metonymy. Therefore, a non-zero probability should be assigned to Pr(R, V|B) even if f(B, R, V) = 0.⁵

Equation (7) is the probability that noun B occurs as a member of class C. This is reduced to f(B)/f(C) if B is not ambiguous, i.e. |Classes(B)| = 1. If it is ambiguous, then f(B) is distributed equally to all classes in Classes(B). The frequency of class C is obtained similarly:

    f(C) = Σ_{B ∈ C} f(B) / |Classes(B)|,    (8)

where B is a noun which belongs to the class C. Finally we derive

    f(C, R, V) = Σ_{B ∈ C} f(B, R, V) / |Classes(B)|.    (9)

In summary, we use the measure M as defined in Equation (3), and calculated by applying Equations (4) to (9), to rank nouns according to their appropriateness as possible interpretations of a metonymy.

⁴ Strictly speaking, Equation (6) does not satisfy Σ_{R,V} Pr(R, V|B) = 1. We have adopted this definition for the sake of simplicity. This simplification has little effect on the final results because Σ_{C ∈ Classes(B)} Pr(B|C) f(C, R, V) ≪ 1 will usually hold. More sophisticated methods (Manning and Schütze, 1999) of smoothing probability distributions may be beneficial. However, applying such methods and comparing their effects on the interpretation of metonymy is beyond the scope of this paper.

⁵ The use of Equation (6) takes into account a noun B such that f(B, R, V) = 0. But such a noun is usually ignored if there is another noun B′ such that f(B′, R, V) > 0, because Σ_{C ∈ Classes(B)} Pr(B|C) f(C, R, V) ≤ f(B′, R, V) will usually hold. This means that the co-occurrence probabilities between implicit terms and verbs are also important in eliminating inappropriate nouns.
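As a concrete reading of Equations (3)-(9), the sketch below ranks candidate nouns from pre-collected counts. It is a minimal illustration under the assumptions stated in the comments; the class and method names and the data layout are ours, not the paper's, and Pr(R, V) is dropped exactly as described above.

    from collections import defaultdict

    class MetonymyRanker:
        """Rank candidate implicit terms B for a metonymy 'A R V' (Equations (3)-(9))."""

        def __init__(self, f_aqb, f_brv, f_b, classes_of):
            self.f_aqb = f_aqb    # f_aqb[(A, Q, B)]: counts of the A no B / A near B relations
            self.f_brv = f_brv    # f_brv[(B, R, V)]: counts of 'B R V' co-occurrences
            self.f_b = f_b        # f_b[B]: frequency of noun B
            # Every noun gets its thesaurus classes, or a class of its own if unlisted.
            self.classes_of = {b: classes_of.get(b, {b}) for b in f_b}

            self.f_aq = defaultdict(float)        # f(A, Q) = sum over B of f(A, Q, B)
            for (a, q, _), n in f_aqb.items():
                self.f_aq[(a, q)] += n
            self.f_c = defaultdict(float)         # Equation (8)
            for b, cs in self.classes_of.items():
                for c in cs:
                    self.f_c[c] += self.f_b[b] / len(cs)
            self.f_crv = defaultdict(float)       # Equation (9)
            for (b, r, v), n in f_brv.items():
                cs = self.classes_of.get(b, {b})
                for c in cs:
                    self.f_crv[(c, r, v)] += n / len(cs)

        def pr_b_given_aq(self, b, a, q):
            # Equation (5): nouns never seen in relation Q with A get probability 0.
            return self.f_aqb.get((a, q, b), 0) / self.f_aq[(a, q)] if self.f_aq[(a, q)] else 0.0

        def pr_rv_given_b(self, b, r, v):
            # Equation (6): back off to the semantic classes of B when f(B, R, V) = 0.
            fb = self.f_b.get(b, 0)
            if fb == 0:
                return 0.0
            if self.f_brv.get((b, r, v), 0) > 0:
                return self.f_brv[(b, r, v)] / fb
            cs = self.classes_of[b]
            smoothed = sum((fb / len(cs)) / self.f_c[c] * self.f_crv[(c, r, v)]   # Pr(B|C) f(C, R, V)
                           for c in cs if self.f_c[c] > 0)
            return smoothed / fb

        def rank(self, a, r, v, candidates):
            # M(B|A, R, V) = max_Q L_Q(B|A, R, V), with Pr(R, V) omitted (constant per input).
            def m(b):
                return max(self.pr_b_given_aq(b, a, q) * self.pr_rv_given_b(b, r, v)
                           for q in ("no", "near"))
            return sorted(candidates, key=m, reverse=True)

Note that rank() expects the full relation counts f(A, Q, B), so that the marginal f(A, Q) in Equation (5) is computed over all co-occurring nouns rather than over the candidate set only.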

Example  Given the statistics below, bottle wo akeru (open) (open a bottle) will be interpreted as described in the following paragraphs, assuming that cap and reizôko (refrigerator) are the candidate implicit terms.

Statistics:

    f(bottle, no, cap) = 1,
    f(bottle, no, reizôko) = 0,
    f(bottle, no) = 2,
    f(bottle, near, cap) = 1,
    f(bottle, near, reizôko) = 2,
    f(bottle, near) = 503,
    f(cap) = 478,
    f(reizôko) = 1521,
    f(cap, wo, akeru) = 8, and
    f(reizôko, wo, akeru) = 23.

f(bottle, no, reizôko) = 0 indicates that bottle and reizôko are not close semantic relatives of each other. This shows the effectiveness of using the A no B relation to filter out loosely related words.

Measure:

    L_no(cap) = f(bottle, no, cap)/f(bottle, no) × f(cap, wo, akeru)/f(cap)
              = 1/2 × 8/478 = 8.37 × 10⁻³,
    L_near(cap) = f(bottle, near, cap)/f(bottle, near) × f(cap, wo, akeru)/f(cap)
                = 1/503 × 8/478 = 3.33 × 10⁻⁵,
    L_no(reizôko) = f(bottle, no, reizôko)/f(bottle, no) × f(reizôko, wo, akeru)/f(reizôko)
                  = 0/2 × 23/1521 = 0,
    L_near(reizôko) = f(bottle, near, reizôko)/f(bottle, near) × f(reizôko, wo, akeru)/f(reizôko)
                    = 2/503 × 23/1521 = 6.01 × 10⁻⁵,
    M(cap) = max{L_no(cap), L_near(cap)} = 8.37 × 10⁻³, and
    M(reizôko) = max{L_no(reizôko), L_near(reizôko)} = 6.01 × 10⁻⁵,

where L_no(cap) = L_no(cap|bottle, wo, akeru), M(cap) = M(cap|bottle, wo, akeru), and so on. Since M(cap) > M(reizôko), we can conclude that cap is a more appropriate implicit term than reizôko. This conclusion agrees with our intuition.
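The arithmetic above can be checked mechanically. The snippet below is a verification sketch, not code from the paper: it recomputes L_no, L_near, and M for the two candidates from the listed counts, with Pr(R, V) omitted as in the example.

    # Verification of the worked example from the counts listed above.
    f_aqb = {("bottle", "no", "cap"): 1, ("bottle", "no", "reizoko"): 0,
             ("bottle", "near", "cap"): 1, ("bottle", "near", "reizoko"): 2}
    f_aq = {("bottle", "no"): 2, ("bottle", "near"): 503}
    f_b = {"cap": 478, "reizoko": 1521}
    f_brv = {("cap", "wo", "akeru"): 8, ("reizoko", "wo", "akeru"): 23}

    def L(q, b):
        # L_Q(B | bottle, wo, akeru) = Pr(B | bottle, Q) * Pr(wo, akeru | B)
        return (f_aqb[("bottle", q, b)] / f_aq[("bottle", q)]) * (f_brv[(b, "wo", "akeru")] / f_b[b])

    for b in ("cap", "reizoko"):
        print(b, f"L_no={L('no', b):.2e}", f"L_near={L('near', b):.2e}",
              f"M={max(L('no', b), L('near', b)):.2e}")
    # cap      L_no=8.37e-03  L_near=3.33e-05  M=8.37e-03
    # reizoko  L_no=0.00e+00  L_near=6.01e-05  M=6.01e-05  -> cap is ranked first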
5 Experiment

5.1 Material

Metonymies  Seventy-five metonymies were used in an experiment to test the proposed method. Sixty-two of them were collected from literature on cognitive linguistics (Yamanashi, 1988; Yamanashi, 1995) and psycholinguistics (Kusumi, 1995) in Japanese, paying attention to ensure that the types of metonymy were sufficiently diverse. The remaining 13 metonymies were direct translations of the English metonymies listed in (Kamei and Wakao, 1992). These 13 metonymies are shown in Table 2, along with the results of the experiment.

Corpus  A corpus which consists of seven years of issues of the Mainichi newspaper (from 1991 to 1997) was used in the experiment. The sentences in the corpus were morphologically analyzed by ChaSen version 2.0b6 (Matsumoto et al., 1999). The corpus consists of about 153 million words.

Semantic Class  A Japanese thesaurus, Bunrui Goi-Hyô (The National Language Research Institute, 1996), was used in the experiment. It has a six-layered hierarchy of abstractions and contains more than 55,000 nouns. A class was defined as a set of nouns which are classified in the same abstractions in the top three layers. The total number of classes thus obtained was 43. If a noun was not listed in the thesaurus, it was regarded as being in a class of its own.

5.2 Method

The method we have described was applied to the metonymies described in section 5.1. The procedure described below was followed in interpreting a metonymy.

1. Given a metonymy of the form 'Noun A Case-Marker R Predicate V', nouns related to A by the A no B relation and/or the A near B relation were extracted from the corpus described in section 5.1.

2. The extracted nouns (candidates) were ranked according to the measure M defined in Equation (3).

5.3 Results

The result of applying the proposed method to our set of metonymies is summarized in Table 1. A reasonably good result can be seen for 'both relations', i.e. the result obtained by using both the A no B and A near B relations when extracting nouns from the corpus. The accuracy of 'both relations', the ratio of the number of correctly interpreted top-ranked candidates⁶ to the total number of metonymies in our set, was 0.71 (= 53/(53+22)), and the 95% confidence interval estimate was between 0.61 and 0.81. We regard this result as quite promising. Since the metonymies we used were general, domain-independent ones, the degree of accuracy achieved in this experiment is likely to be repeated when our method is applied to other general sets of metonymies.

    Table 1: Experimental results.

    Relations used    Correct  Wrong
    Both relations      53       22
    Only A no B         50       25
    Only A near B       43       32

Table 1 also shows that 'both relations' is more accurate than either the result obtained by solely using the A no B relation or the A near B relation. The use of multiple relations in metonymy interpretation is thus seen to be beneficial.

⁶ The correctness was judged by the authors. A candidate was judged correct when it made sense in Japanese. For example, we regarded beer, cola, and mizu (water) as all correct interpretations for glass wo nomu (drink) (drink a glass) because they made sense in some context.

Table 2 shows the results of applying the method to the thirteen directly translated metonymies described in section 5.1. Asterisks (*) in the first column indicate that direct translations of the sentences result in unacceptable Japanese. The C's and W's in the second column respectively indicate that the top-ranked candidates were correct and wrong. The sentences in the third column are the original English metonymies adopted from (Kamei and Wakao, 1992). The Japanese metonymies in the form 'noun case-marker predicate⁷', in the fourth column, are the inputs to the method. In this column, wo and ga mainly represent the accusative-case and nominative-case, respectively. The nouns listed in the last column are the top three candidates, in order, according to the measure M that was defined in Equation (3).

⁷ Predicates are lemmatized.

These results demonstrate the effectiveness of the method. Ten out of the 13 metonymies were interpreted correctly. Moreover, if we restrict our attention to the ten metonymies that are acceptable in Japanese, all but one were interpreted correctly. The accuracy was 0.9 (= 9/10), higher than that for 'both relations' in Table 1. The reason for the higher degree of accuracy is that the metonymies in Table 2 are somewhat typical and relatively easy to interpret, while the metonymies collected from Japanese sources included a diversity of types and were more difficult to interpret.

Finally, the effectiveness of using semantic classes is discussed. The top candidates of six out of the 75 metonymies were assigned their appropriateness by using their semantic classes, i.e. the values of the measure M were calculated with f(B, R, V) = 0 in Equation (6). Of these, three were correct. On the other hand, if semantic classes are not used, then three of the six are still correct. Here there was no improvement. However, when we surveyed the results of the whole experiment, we found that nouns for which f(B, R, V) = 0 often had close relationships with explicit terms in metonymies and were appropriate as interpretations of the metonymies. We need more research before we can judge the effectiveness of utilizing semantic classes.

    Table 2: Results of applying the proposed method to direct translations of the metonymies in (Kamei and Wakao, 1992).

      |   | Sentence                                              | Noun Case-Marker Pred.  | Candidates
      | C | Dave drank the glasses.                               | glass wo nomu           | beer, cola, mizu (water)
      | C | The kettle is boiling.                                | yakan ga waku           | yu (hot water), oyu (hot water), nettô (boiling water)
      | C | He bought a Ford.                                     | Ford wo kau             | zyôyôsya (car), best seller, kuruma (vehicle)
      | C | He has got a Picasso in his room.                     | Picasso wo motu         | e (painting), image, aizin (lover)
      | C | Anne read Steinbeck.                                  | Steinbeck wo yomu       | gensaku (original work), meisaku (famous story), daihyôsaku (important work)
      | C | Ted played Bach.                                      | Bach wo hiku            | menuetto (minuet), kyoku (music), piano
      | C | He read Mao.                                          | Mao wo yomu             | si (poem), tyosyo (writings), tyosaku (writings)
    * | W | We need a couple of strong bodies for our team.       | karada ga hituyô        | care, kyûsoku (rest), kaigo (nursing)
    * | C | There are a lot of good heads in the university.      | atama ga iru            | hito (person), tomodati (friend), byônin (sick person)
      | W | Exxon has raised its price again.                     | Exxon ga ageru          | Nihon (Japan), ziko (accident), kigyô (company)
      | C | Washington is insensitive to the needs of the people. | Washington ga musinkei  | zikanho (assistant vice-minister), seikai (political world), gikai (Congress)
      | C | The T.V. said it was very crowded at the festival.    | T.V. ga iu              | commentator, announcer, caster
    * | W | The sign said fishing was prohibited here.            | hyôsiki ga iu           | mawari (surrounding), zugara (design), seibi (maintenance)
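For completeness, the reported accuracy figures can be reproduced as follows. The normal approximation for a binomial proportion is our assumption about how the 95% confidence interval was obtained; it lands within rounding of the figures quoted above.

    import math

    correct, wrong = 53, 22                    # 'both relations' row of Table 1
    n = correct + wrong
    p = correct / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)   # normal-approximation 95% CI
    print(f"accuracy = {p:.2f}, 95% CI = [{p - half_width:.2f}, {p + half_width:.2f}]")
    # -> accuracy = 0.71, 95% CI = [0.60, 0.81]  (the paper reports 0.61-0.81)

    print(f"accuracy on the acceptable translated metonymies = {9 / 10:.1f}")   # 0.9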

6 Discussion

Semantic Relation  The method proposed in this paper identifies implicit terms for the explicit term in a metonymy. However, it is not concerned with the semantic relation between an explicit term and implicit term, because such semantic relations are not directly expressed in corpora, i.e. noun phrases of the form A no B can be found in corpora but their semantic relations are not. If we need such semantic relations, we must semantically analyze the noun phrases (Kurohashi and Sakai, 1999).

Applicability to other languages  Japanese noun phrases of the form A no B are specific to Japanese. The proposed method, however, could easily be extended to other languages. For example, in English, noun phrases B of A could be used to extract semantically related nouns. Nouns related by is-a relations or part-of relations could also be extracted from corpora (Hearst, 1992; Berland and Charniak, 1999). If such semantically related nouns are extracted, then they can be ranked according to the measure M defined in Equation (3).

Lexically based approaches  Generative Lexicon theory (Pustejovsky, 1995) proposed the qualia structure, which encodes semantic relations among words explicitly. It is useful to infer an implicit term of the explicit term in a metonymy. The proposed approach, on the other hand, uses corpora to infer implicit terms and thus sidesteps the construction of qualia structure.⁸

⁸ Briscoe et al. (1990) discusses the use of machine-readable dictionaries and corpora for acquiring lexical semantic information.

7 Conclusion

This paper discussed a statistical approach to the interpretation of metonymy. The method follows the procedure described below to interpret a metonymy in Japanese:

1. Given a metonymy of the form 'Noun A Case-Marker R Predicate V', nouns that are syntactically related to the explicit term A are extracted from a corpus.

2. The extracted nouns are ranked according to their degree of appropriateness as interpretations of the metonymy by applying a statistical measure.

The method has been tested experimentally. Fifty-three out of seventy-five metonymies were correctly interpreted. This is quite a promising first step toward the statistical processing of metonymy.

References

Matthew Berland and Eugene Charniak. 1999. Finding parts in very large corpora. In ACL-99, pages 57-64.

Jacques Bouaud, Bruno Bachimont, and Pierre Zweigenbaum. 1996. Processing metonymy: a domain-model heuristic graph traversal approach. In COLING-96, pages 137-142.

Ted Briscoe, Ann Copestake, and Bran Boguraev. 1990. Enjoy the paper: Lexical semantics via lexicology. In COLING-90, pages 42-47.

Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jenifer C. Lai, and Robert L. Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467-479.

Kenneth Ward Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22-29.

Dan Fass. 1988. Metonymy and metaphor: What's the difference? In COLING-88, pages 177-181.

Dan Fass. 1997. Processing Metonymy and Metaphor, volume 1 of Contemporary Studies in Cognitive Science and Technology. Ablex Publishing Corporation.

Stephane Ferrari. 1996. Using textual clues to improve metaphor processing. In ACL-96, pages 351-354.

Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In COLING-92, pages 539-545.

Eric Iverson and Stephen Helmreich. 1992. Metallel: An integrated approach to non-literal phrase interpretation. Computational Intelligence, 8(3):477-493.

Shin-ichiro Kamei and Takahiro Wakao. 1992. Metonymy: Reassessment, survey of acceptability, and its treatment in a machine translation system. In ACL-92, pages 309-311.

Sadao Kurohashi and Yasuyuki Sakai. 1999. Semantic analysis of Japanese noun phrases: A new approach to dictionary-based understanding. In ACL-99, pages 481-488.

Takashi Kusumi. 1995. Hiyu-no Syori-Katei-to Imi-Kôzô (Processing and Semantic Structure of Tropes). Kazama Publisher. (in Japanese).

George Lakoff and Mark Johnson. 1980. Metaphors We Live By. Chicago University Press.

Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing, chapter 6. The MIT Press.

Yuji Matsumoto, Akira Kitauchi, Tatsuo Yamashita, and Yoshitaka Hirano. 1999. Japanese morphological analysis system ChaSen manual. Nara Institute of Science and Technology.

Masaki Murata, Hitoshi Isahara, and Makoto Nagao. 1999. Resolution of indirect anaphora in Japanese sentences using examples "X no Y (X of Y)". In ACL'99 Workshop on Coreference and Its Applications, pages 31-38.

James Pustejovsky. 1995. The Generative Lexicon. The MIT Press.

The National Language Research Institute. 1996. Bunrui Goi-hyô Zôho-ban (Taxonomy of Japanese, enlarged edition). (in Japanese).

Atsumu Yamamoto, Masaki Murata, and Makoto Nagao. 1998. Example-based metonymy interpretation. In Proc. of the 4th Annual Meeting of the Association for Natural Language Processing, pages 606-609. (in Japanese).

Masa-aki Yamanashi. 1988. Hiyu-to Rikai (Tropes and Understanding). University Publisher. (in Japanese).

Masa-aki Yamanashi. 1995. Ninti Bunpô-ron (Cognitive Linguistics). Hitsuji Publisher. (in Japanese).