Second ACL Workshop on Multiword Expressions: Integrating Processing, July 2004, pp. 9-16

Paraphrasing of Japanese Light-verb Constructions Based on Lexical Conceptual Structure

Atsushi Fujita†, Kentaro Furihata†, Kentaro Inui†, Yuji Matsumoto†, Koichi Takeuchi‡
†Graduate School of Information Science, Nara Institute of Science and Technology
{atsush-f,kenta-f,inui,matsu}@is.naist.jp
‡Department of Information Technology, Okayama University
[email protected]

Abstract

Some particular classes of lexical paraphrases, such as verb alteration and compound decomposition, can be handled by a handful of general rules and lexical semantic knowledge. In this paper, we attempt to capture the regularity underlying these classes of paraphrases, focusing on the paraphrasing of Japanese light-verb constructions (LVCs). We propose a paraphrasing model for LVCs that is based on transforming the Lexical Conceptual Structures (LCSs) of verbal elements. We also propose a refinement of an existing LCS dictionary. Experimental results show that our LCS-based paraphrasing model characterizes some of the semantic features required for generating these paraphrases, such as the direction of an action and the relationship between arguments and surface cases.

1 Introduction

Automatic paraphrase generation technology offers the potential to bridge gaps between the authors and readers of documents. For example, a system that is capable of simplifying a given text, or of showing the user several alternative expressions conveying the same content, would be useful for assisting a reader (Carroll et al., 1999; Inui et al., 2003).

In Japanese, as in other languages, there are several classes of paraphrasing that exhibit a degree of regularity that allows them to be explained by a handful of sophisticated general rules and lexical semantic knowledge. For example, paraphrases associated with voice alteration, verb/case alteration, compounds, and lexical derivations all fall into such classes. In this paper, we focus our discussion on another useful class of paraphrases, namely, the paraphrasing of light-verb constructions (LVCs), and propose a computational model for generating paraphrases of this class.

Sentence (1s) is an example of an LVC (for each example, s denotes an input and t denotes its paraphrase). An LVC is a verb phrase ("kandou-o ataeta (made an impression)" in (1s)) that consists of a light-verb ("ataeta (give-PAST)") that grammatically governs a nominalized verb ("kandou (an impression)") (also see Figure 1 in Section 2.2). A paraphrase of (1s) is sentence (1t), in which the nominalized verb functions as the main verb with its verbal form ("kandou-s-ase-ta (be impressed-CAUSATIVE, PAST)").

(1) s. Eiga-ga kare-ni kandou-o ataeta.
       film-NOM him-DAT impression-ACC give-PAST
       The film made an impression on him.
    t. Eiga-ga kare-o kandou-s-ase-ta.
       film-NOM him-ACC be impressed-CAUSATIVE, PAST
       The film impressed him.

To generate this type of paraphrase, we need a computational model that is capable of the following two classes of choice (also see Section 2.2):

Selection of the voice: The model needs to be able to choose the voice of the target sentence from active, passive, causative, etc. In example (1), the causative voice is chosen, which is indicated by the suffix "ase" (causative).

Reassignment of the cases: The model needs to be able to reassign a case marker to each argument of the main verb. In (1), the grammatical case of "kare (him)," which was originally assigned the dative case, is changed to accusative.

The task is not as simple as it may seem, because both decisions depend not only on the syntactic and semantic attributes of the light-verb, but also on those of the nominalized verb (Muraki, 1991).

In this paper, we propose a novel lexical semantics-based account of LVC paraphrasing, which uses the theory of Lexical Conceptual Structure (LCS) of Japanese verbs (Kageyama, 1996; Takeuchi et al., 2001). The theory of LCS offers an advantage as the basis of lexical resources for paraphrasing, because it has been developed to explain a variety of linguistic phenomena including lexical derivations, the construction of compounds, and verb alteration (Levin, 1993; Dorr et al., 1995; Kageyama, 1996; Takeuchi et al., 2001), all of which are associated with the systematic paraphrasing mentioned above.
The paraphrasing associated with LVCs is not idiosyncratic to Japanese but also appears commonly in other languages such as English (Mel'čuk and Polguère, 1987; Iordanskaja et al., 1991; Dras, 1999, etc.), as indicated by the following examples.

(2) s. Steven made an attempt to stop playing.
    t. Steven attempted to stop playing.

(3) s. It had a noticeable effect on the trade.
    t. It noticeably affected the trade.

Our approach raises the interesting issue of whether the paraphrasing of LVCs can be modeled in an analogous way across languages.

Our aims in this paper are: (i) exploring the regularity of LVC paraphrasing on the basis of a lexical semantics-based account, and (ii) assessing the immature Japanese semantic typology through a practical task.

The following sections describe our motivation, target, and related work on LVC paraphrasing (Section 2), the basics of LCS and the refinements we made (Section 3), our paraphrasing model (Section 4), and our experiments (Section 5). Finally, we conclude this paper with a brief description of work to be done in the future (Section 6).

2 Motivation, target, and related work

2.1 Motivation

One of the critical issues that we face in paraphrase generation is how to develop and maintain knowledge resources that cover a sufficiently wide range of paraphrasing patterns, such as those indicating that "to make an attempt" can be paraphrased into "to attempt," and that "potential" can be paraphrased into "possibility." Several attempts have been made to develop such resources manually (Sato, 1999; Dras, 1999; Inui and Nogami, 2001); those works have, however, tended to restrict their scope to specific classes of paraphrases, and cannot be used to construct a sufficiently comprehensive resource for practical applications.

There is another trend in the research in this field, namely, the automatic acquisition of paraphrase patterns from parallel or comparable corpora (Barzilay and McKeown, 2001; Lin and Pantel, 2001; Pang et al., 2003; Shinyama and Sekine, 2003, etc.). This type of approach may be able to reduce the cost of resource development. There are problems that must be overcome, however, before such methods can work practically. First, automatically acquired patterns tend to be complex. For example, from the paraphrase of (4s) into (4t), we can naively obtain the pattern: "X is purchased by Y ⇒ Y buys X."

(4) s. This car was purchased by him.
    t. He bought this car.

This could also, however, be regarded as a combination of a simpler pattern of lexical paraphrasing ("purchase ⇒ buy") and a voice activization ("X be VERB-PP by Y ⇒ Y VERB X"). If we were to use an acquisition scheme that is not capable of decomposing such complex paraphrases correctly, we would have to collect a combinatorial number of paraphrases to gain the required coverage. Second, the results of automatic acquisition would likely include many inappropriate patterns, which would require manual correction. Manual correction, however, would be impractical if we were collecting a combinatorial number of patterns.

Our approach to this dilemma is as follows: first, we manually develop the resources needed to cover those paraphrases that appear regularly, and then decompose and automatically refine the acquired paraphrasing patterns using those resources. The work reported in this paper is aimed at this resource development.
[Figure 1: Dependency structure showing the range that the LVC paraphrasing affects. Labeled bunsetsu: (a) …, (b) Noun + Case particle, (c) Noun + Case particle "no" (GEN), (d) Embedded clause, (e) Nominalized verb + Case particle; the LVC (the target of this paper) comprises the nominalized verb plus case particle and the light-verb (+ suffixes).]

2.2 Target structure and required operations

Figure 1 shows the range that the LVC paraphrasing affects, where the solid boxes denote Japanese base chunks, so-called "bunsetsu." (The modifiee of the LVC is not affected because the part-of-speech of the light-verb and that of the resulting main verb are the same.) Because they are involved in the paraphrasing, the modifiers of the LVC require the following operations:

Change of the dependence: The dependences of the elements (a) and (b) need to be changed because their original modifiee, the light-verb, is eliminated by the paraphrasing.

Re-conjugation: The conjugation forms of the elements (d), (e), and occasionally (c) need to be changed according to the category change of their modifiee, the nominalized verb.

Reassignment of the cases: As described in the previous section, the case markers of the elements (b) and often (c) need to be reassigned.

Selection of the voice: The voice of the nominalized verb needs to be chosen according to the combination of the nominalized verb, the light-verb, and the original voice.

The first two operations are trivial in the field of text generation. Moreover, they can be done independently of the LVC paraphrasing. The most delicate operation concerns the element (c), because it acts either as an adverb or as a case depending on the context. In the former case it needs the second operation; in the latter case it needs the third operation, as does the element (b). In this paper, we take into account only the element (b), namely, the sibling cases of the nominalized verb.

2.3 Related work

Based on the Meaning-Text Theory (Mel'čuk and Polguère, 1987), Iordanskaja et al. (1991) propose a set of paraphrasing rules including one for LVC paraphrasing. Their rule relies heavily on what are called lexical functions, by which they virtually specify all the choices relevant to LVC paraphrasing for every combination of nominalized verb and light-verb individually. Our approach is instead to employ lexical semantics to provide a general account of those classes of choices.

On the other hand, Kaji and Kurohashi (2004) propose a paraphrasing model based on an ordinary dictionary. Given an input LVC, their model paraphrases it using the glosses of both the nominalized verb and the light-verb, together with the semantic feature of the light-verb. Their model looks robust because of the availability of ordinary dictionaries. However, it fails to explain the difference in voice selection between examples (5) and (6), since it selects the voice based only on the light-verb: in their approach, the light-verb "ukeru (to receive)" always maps to the passive voice irrespective of the nominalized verb.

(5) s. Enkai-eno shoutai-o uketa.
       party-GEN invitation-ACC receive-PAST
       I received an invitation to the party.
    t. Enkai-ni shoutai-s-are-ta.
       party-DAT invite-PAS, PAST
       I was invited to the party.

(6) s. Kare-no hanashi-ni kandou-o uketa.
       his-GEN talk-DAT impression-ACC receive-PAST
       I was given a good impression by his talk.
    t. Kare-no hanashi-ni kandou-shi-ta.
       his-GEN talk-DAT be impressed-ACT, PAST
       I was impressed by his talk.

In (Kaji and Kurohashi, 2004), the target expression is restricted to the LVC itself (also see Figure 1). Hence, their model is unable to reassign the cases as we saw in example (1).

3 Lexical Conceptual Structure

3.1 Basic framework of LCS

The theory of Lexical Conceptual Structure (LCS) associates a verb with a semantic structure, as exemplified by Table 1. An LCS consists of semantic predicates ("CONTROL," "BE AT," etc.) and their argument slots (x, y, z). Argument slots x, y, and z correspond to the semantic roles "Agent," "Theme," and "Goal," respectively. Taking the LCS of the verb "transmit" as an example, [y MOVE TO z] denotes the state of affairs in which the state of the "Theme" changes to the "Goal," and [x CONTROL ...] denotes that the "Agent" causes the state change.

Table 1: Examples of LCS
  move      [y MOVE TO z]               My sister (Theme) moves to a neighboring town (Goal).
  transmit  [x CONTROL [y MOVE TO z]]   The enzyme (Agent) transmits messages (Theme) to the muscles (Goal).
  locate    [y BE AT z]                 The school (Theme) locates near the river (Goal).
  maintain  [x CONTROL [y BE AT z]]     He (Agent) maintains a machine (Theme) in good condition (Goal).
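The bracketed notation in Table 1 is essentially a small recursive data structure: a predicate with argument slots, where a slot may itself be an embedded LCS. The sketch below is ours, not the authors' implementation; the class names, the encoding of the four Table 1 entries, and the rendering convention are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union

# A minimal sketch of the notation in Table 1: an LCS is a semantic predicate
# plus argument slots, where a slot is either a role variable
# (x = Agent, y = Theme, z = Goal) or an embedded LCS.

Arg = Union[str, "LCS"]  # "x", "y", "z", or a nested LCS

@dataclass
class LCS:
    pred: str                                   # e.g. "CONTROL", "MOVE TO", "BE AT"
    args: List[Arg] = field(default_factory=list)

    def show(self) -> str:
        """Render roughly in the paper's bracket notation."""
        parts = [a if isinstance(a, str) else a.show() for a in self.args]
        if len(parts) == 1:                     # unary, prefix predicates such as BECOME
            return f"[{self.pred} {parts[0]}]"
        # binary predicates are written infix in the paper: [y MOVE TO z]
        return f"[{parts[0]} {self.pred} {' '.join(parts[1:])}]"

# The four example verbs of Table 1, encoded under the assumptions above.
ENTRIES = {
    "move":     LCS("MOVE TO", ["y", "z"]),
    "transmit": LCS("CONTROL", ["x", LCS("MOVE TO", ["y", "z"])]),
    "locate":   LCS("BE AT",   ["y", "z"]),
    "maintain": LCS("CONTROL", ["x", LCS("BE AT", ["y", "z"])]),
}

if __name__ == "__main__":
    for verb, lcs in ENTRIES.items():
        print(f"{verb:9s}{lcs.show()}")         # e.g. transmit [x CONTROL [y MOVE TO z]]
```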
3.2 Refinements

We make use of the TLCS dictionary, a Japanese verb LCS dictionary developed by Takeuchi et al. (2001), because it offers the following advantages:

• It is based on solid linguistic work, as in (Kageyama, 1996).
• Its scale is considerably larger than that of any other existing collection of verb LCS entries.
• It provides a set of concrete rules for LCS assignment, which ensures the reliability of the dictionary.

In spite of these advantages, our preliminary examination of the dictionary revealed that further refinements were needed. To refine the typology of TLCS, we collected the following sets of words:

Nominalized verbs: We regard sahen-nouns (a sahen-noun is a verbal noun in Japanese, which acts as a verb in the form "sahen-noun + suru") and nominal forms of verbs as nominalized verbs. We retrieved 1,210 nominalized verbs from the TLCS dictionary.

Light-verbs: Since a verb takes different meanings when it is part of LVCs with different case particles, we collected pairs ⟨c, v⟩ of case particle c and verb v in the following way:

Step 1. We collected 876,101 types of triplets ⟨n, c, v⟩ of nominalized verb n, case particle c, and base form of verb v from dependency-parsed sentences of newspaper articles. (We used the statistical Japanese dependency parser CaboCha (Kudo and Matsumoto, 2002), http://chasen.naist.jp/~taku/software/cabocha/, for parsing. The corpus consisted of excerpts from 9 years of the Mainichi Shinbun and 10 years of the Nihon Keizai Shinbun, giving a total of 25,061,504 sentences.)

Step 2. For each of the 50 most frequent ⟨c, v⟩ tuples, we extracted the 10 most frequent ⟨n, c, v⟩.

Step 3. Each ⟨n, c, v⟩ was manually evaluated to determine whether it was an LVC. If any of the 10 triplets was determined to be an LVC, the ⟨c, v⟩ was merged into the list of light-verbs. As a result, we collected 40 types of ⟨c, v⟩ for light-verbs.
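The counting behind Steps 1-3 can be sketched as follows. We assume the parsed corpus has already been reduced to ⟨n, c, v⟩ triplets (the extraction of those triplets from CaboCha output is outside this sketch); the function name, the toy data, and everything else not stated above are illustrative assumptions, not the authors' code.

```python
from collections import Counter
from typing import Iterable, Tuple

def collect_light_verb_candidates(
    triplets: Iterable[Tuple[str, str, str]],
    top_cv: int = 50,
    top_n_per_cv: int = 10,
):
    """For each of the `top_cv` most frequent (c, v) pairs, return the
    `top_n_per_cv` most frequent (n, c, v) triplets, ready for the manual
    judgement of Step 3.  Illustrative only."""
    triplet_freq = Counter(triplets)
    # frequency of each (c, v) pair, counted over triplet tokens
    cv_freq = Counter((c, v) for (_, c, v) in triplet_freq.elements())
    candidates = {}
    for (c, v), _ in cv_freq.most_common(top_cv):
        ranked = sorted(
            ((t, f) for t, f in triplet_freq.items() if (t[1], t[2]) == (c, v)),
            key=lambda item: item[1], reverse=True)
        candidates[(c, v)] = [t for t, _ in ranked[:top_n_per_cv]]
    return candidates

# Toy input; in the paper, 876,101 triplet types were collected from the parsed
# newspaper corpus and 40 (c, v) pairs survived the manual check as light-verbs.
toy = [("kandou", "o", "ataeru"), ("kandou", "o", "ataeru"),
       ("shigeki", "o", "ukeru"), ("shoutai", "o", "ukeru")]
print(collect_light_verb_candidates(toy, top_cv=2, top_n_per_cv=10))
```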
Through investigating the above 1,210 nominalized verbs and 40 light-verbs, we extended the typology of TLCS as shown below (also see Table 2).

Ext. 1. Treatment of "Partner": The dative case of "hankou-suru (resist)" and "eikyo-suru (affect)" does not indicate the "Goal" of the action but the "Partner."

Ext. 2. Verbs of obtaining (Levin, 1993): In contrast with "ataeru (give)," the nominative case of "ukeru (receive)" and "eru (acquire)" is the "Goal" of the "Theme," while the ablative case indicates the "Source."

Ext. 3. Require verbs: "motomeru (ask)" and "yokyu-suru (require)" denote the existence of an external "Agent" who controls the action of the other "Agent" or the "Theme."

Ext. 4. Verbs of psychological state (Levin, 1993): "kandou-suru (be impressed)" and "osoreru (fear)" indicate a change in the psychological state of the "Agent." The ascriptive part of the change has to be described.

Consequently, we defined a new LCS typology consisting of 16 types. Note that more than one LCS can be assigned to a verb if it is polysemous. For convenience, we refer to the extended dictionary as the LCSdic (the latest version is available from http://cl.it.okayama-u.ac.jp/rsc/lcs/).

Table 2: Extensions of LCS
Ext. 1  hankou-suru (resist): [[Ken]y BE AGAINST [parents]z]
        Ken-ga oya-ni hankou-suru. (Ken-NOM parents-DAT resist-PRES) Ken resists his parents.
Ext. 2  ukeru (receive): [BECOME [[salesclerk]z BE WITH [[complaint]y MOVE FROM [customer]x TO [salesclerk]z]]]
        Ten'in-ga kyaku-kara kujo-o ukeru. (salesclerk-NOM customer-ABL complaint-ACC receive-PRES) The salesclerk receives a complaint from a customer.
Ext. 3  motomeru (ask): [[Ken]x CONTROL [[apology]y MOVE FROM [George]z TO [FILLED]]]
        Ken-ga George-ni shazai-o motomeru. (Ken-NOM George-DAT apology-ACC ask-PRES) Ken asks George for an apology.
Ext. 4  kandou-suru (be impressed): [BECOME [[Ken]z BE WITH [[FILLED]y MOVE FROM [music]x TO [Ken]z]]]
        Ken-ga ongaku-ni kandou-suru. (Ken-NOM music-DAT be impressed-PRES) Ken is impressed by the music.

Here "FILLED" represents an implicit argument of the verb; a verb assigned such an LCS cannot take this argument. Taking the LCS of the verb "sign" as an example, "FILLED" in [x CONTROL [BECOME [[FILLED]y BE AT z]]] denotes the name of the "Agent."
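To give an idea of the shape of such a resource, the following hypothetical, string-based encoding restates the Table 2 templates with the lexical fillers replaced by their role variables. The verb-to-template-list format and the two templates listed for the ergative verb "kaifuku-suru" are placeholders of our own, not the actual LCSdic format or content.

```python
# Hypothetical encoding of a few LCSdic-style entries derived from Table 2.
# "FILLED" marks an implicit, unexpressible argument (see the note above).
LCSDIC = {
    "hankou-suru": ["[y BE AGAINST z]"],                                # Ext. 1
    "ukeru":       ["[BECOME [z BE WITH [y MOVE FROM x TO z]]]"],       # Ext. 2
    "motomeru":    ["[x CONTROL [y MOVE FROM z TO FILLED]]"],           # Ext. 3
    "kandou-suru": ["[BECOME [z BE WITH [FILLED MOVE FROM x TO z]]]"],  # Ext. 4
    # a polysemous or ergative verb would simply list more than one template
    # (the two templates below are invented placeholders, not dictionary entries)
    "kaifuku-suru": ["[BECOME [y BE AT z]]",
                     "[x CONTROL [BECOME [y BE AT z]]]"],
}

for verb, templates in LCSDIC.items():
    print(verb, "->", "; ".join(templates))
```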

4 Paraphrasing model

In this section, we describe how we generate paraphrases of LVCs. Figure 2 illustrates how our model paraphrases the LVC of example (7).

(7) s. Ken-ga eiga-ni shigeki-o uketa.
       Ken-NOM film-DAT inspiration-ACC receive-PAST
       Ken received inspiration from the film.
    t. Ken-ga eiga-ni shigeki-s-are-ta.
       Ken-NOM film-DAT inspire-PAS, PAST
       Ken was inspired by the film.

[Figure 2: The LCS-based paraphrasing model. The input sentence "Ken-ga (Ken-NOM) eiga-ni (film-DAT) shigeki-o (inspiration-ACC) uketa (receive-PAST)." is analyzed, using the LCS dictionary entry of "ukeru (receive)" LCS_V0 = [BECOME [z BE WITH [y MOVE FROM x TO z]]], into LCS_V1 = [BECOME [[Ken]z BE WITH [[inspiration]y MOVE FROM [film]x TO [Ken]z]]] (1: semantic analysis); LCS_V1 is transformed, against LCS_N0 = [x' ACT ON y'] of "shigeki-suru (inspire)", into LCS_N1 = [BECOME [[Ken]z BE WITH]] + [[film]x' ACT ON [Ken]y'] (2: LCS transformation); LCS_N1 is lexicalized into the paraphrased sentence "Ken-ga (Ken-NOM) eiga-ni (film-DAT) shigeki-s-are-ta (inspire-PAS, PAST)." (3: surface generation).]

The idea is to exploit the LCS representation as a semantic representation and to model the LVC paraphrasing as a transformation of the LCS representation. The process consists of the following three steps:

Step 1. Semantic analysis: The model first analyzes a given input sentence including an LVC to obtain its semantic structure in terms of the LCS representation. In Figure 2, this step produces LCS_V1.

Step 2. Semantic transformation (LCS transformation): The model then transfers the obtained semantic structure to another semantic structure so that the target structure consists of the LCS of the nominalized verb of the input. In our example, this step generates LCS_N1 together with the supplement [BECOME [...]].

Step 3. Surface generation: Having obtained the target LCS representation, the model finally lexicalizes it to generate the output sentence.

The key issue is thus how to control the second step, namely, the transformation of the LCS representation. The rest of this section elaborates on each step, using different symbols to denote arguments: x, y, and z for LCS_V, and x', y', and z' for LCS_N.
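The following worked trace restates the example of Figure 2 as data: the intermediate representations that the three steps produce for example (7), written here as plain strings purely for illustration (a walkthrough of the published figure, not the system's internal format).

```python
# A data-only walkthrough of example (7) following Figure 2.
input_sentence = "Ken-ga eiga-ni shigeki-o uketa."            # (7s)

# Step 1: semantic analysis - instantiate the light-verb's template LCS_V0.
lcs_v0 = "[BECOME [z BE WITH [y MOVE FROM x TO z]]]"          # ukeru (receive)
lcs_v1 = "[BECOME [[Ken]z BE WITH [[inspiration]y MOVE FROM [film]x TO [Ken]z]]]"

# Step 2: LCS transformation - re-express LCS_V1 over the nominalized verb's LCS.
lcs_n0 = "[x' ACT ON y']"                                     # shigeki-suru (inspire)
lcs_n1 = "[BECOME [[Ken]z BE WITH]] + [[film]x' ACT ON [Ken]y']"

# Step 3: surface generation - lexicalize LCS_N1; the leftover [BECOME ... BE WITH]
# part is realized by the passive auxiliary (see Section 4.3).
output_sentence = "Ken-ga eiga-ni shigeki-s-are-ta."          # (7t)

for label, value in [("input", input_sentence), ("LCS_V0", lcs_v0),
                     ("LCS_V1", lcs_v1), ("LCS_N0", lcs_n0),
                     ("LCS_N1", lcs_n1), ("output", output_sentence)]:
    print(f"{label:>7}: {value}")
```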
4.1 Semantic analysis

Given an input sentence, which we assume to be a simple clause with an LVC, we first look up the LCS template LCS_V0 for the given light-verb and then apply the case assignment rule below (Takeuchi et al., 2001) to obtain its LCS representation LCS_V1.

Case assignment rule:
• If LCS_V0 has the argument x, fill the leftmost argument of LCS_V0 with the nominative case of the input, the second leftmost with the accusative, and the rest with the dative.
• Otherwise, fill arguments y and z of LCS_V0 with the nominative and the dative, respectively.

In the example shown in Figure 2, the nominative "Ken" fills the leftmost argument z. Accordingly, the accusative "shigeki (inspiration)" and the dative "eiga (film)" fill y and x, respectively.

(8) s. Ken-ga eiga-ni shigeki-o uketa.
       Ken-NOM film-DAT inspiration-ACC receive-PAST
       Ken received inspiration from the film.
    LCS_V0: [BECOME [z BE WITH [y MOVE FROM x TO z]]]
    LCS_V1: [BECOME [[Ken]z BE WITH [[inspiration]y MOVE FROM [film]x TO [Ken]z]]]
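As a rough illustration, the case assignment rule can be sketched as below. The function, its name, and the input encoding (the template's argument slots listed left to right, and a mapping from surface case particles to content words) are our own assumptions, not the authors' implementation.

```python
from typing import Dict, List

def assign_cases(template_slots: List[str], case_args: Dict[str, str]) -> Dict[str, str]:
    """Sketch of the case assignment rule above.
    `template_slots` lists the argument slots of LCS_V0 from left to right, e.g.
    ["z", "y", "x", "z"] for [BECOME [z BE WITH [y MOVE FROM x TO z]]];
    `case_args` maps surface case particles to content words."""
    order = list(dict.fromkeys(template_slots))      # distinct slots, left to right
    filling: Dict[str, str] = {}
    if "x" in order:
        # leftmost <- nominative (ga), second leftmost <- accusative (o), rest <- dative (ni)
        cases = ["ga", "o"] + ["ni"] * max(0, len(order) - 2)
        for slot, case in zip(order, cases):
            if case in case_args:
                filling[slot] = case_args[case]
    else:
        # no Agent slot: y <- nominative, z <- dative
        if "y" in order and "ga" in case_args:
            filling["y"] = case_args["ga"]
        if "z" in order and "ni" in case_args:
            filling["z"] = case_args["ni"]
    return filling

# Example (8): "Ken-ga eiga-ni shigeki-o uketa." with ukeru's LCS_V0.
slots = ["z", "y", "x", "z"]
args = {"ga": "Ken", "ni": "eiga (film)", "o": "shigeki (inspiration)"}
print(assign_cases(slots, args))
# -> {'z': 'Ken', 'y': 'shigeki (inspiration)', 'x': 'eiga (film)'}
```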

4.2 LCS transformation

The second step of our paraphrasing model matches the resultant LCS representation (LCS_V1 in Figure 2) with the LCS of the nominalized verb (LCS_N0) to generate the target LCS representation (LCS_N1). Figure 3 shows a more detailed view of this process for the example in Figure 2.

[Figure 3: LCS transformation. LCS_V1 = [BECOME [[Ken]z BE WITH [[inspiration]y MOVE FROM [film]x TO [Ken]z]]] is matched against LCS_N0 = [x' ACT ON y'] by (a) predicate matching and (b) argument matching, and (c) the remaining structure is attached, yielding LCS_N1 = [BECOME [[Ken]z BE WITH]] + [[film]x' ACT ON [Ken]y'].]

4.2.1 Predicate matching

The first step is to determine the predicate in LCS_V1 that should be matched with the predicate in LCS_N0. Assuming that only agentivity is relevant to the selection of the voice in LVC paraphrasing, which is our primary concern, we classify the semantic predicates as follows:

Agentive predicates: "CONTROL," "ACT ON," "ACT," "BE AGAINST," and "MOVE FROM TO."
State-of-affairs predicates: "MOVE TO," "BE AT," and "BE WITH."
Aspectual predicates: "BECOME."

We also assume that any pair of predicates of the same class is allowed to match, and that the aspectual predicates are ignored. In our example, "MOVE FROM TO" matches "ACT ON," as shown in Figure 3.

LCS representations have right-branching (or right-embedding) structures. Since the inner-embedded predicates denote the state of affairs, they take priority in the matching. In other words, the matching proceeds from the rightmost inner predicates to the outer predicates.

Having matched the predicates, we then fill each argument slot in LCS_N0 with its corresponding argument in LCS_V1. In Figure 3, argument z is matched with y', and x with x'. As a result, "Ken" comes to the y' slot and "eiga (film)" comes to the x' slot. (When an argument is filled with another LCS, the arguments within the inner LCS are also matched. Likewise, on the assumption that the input sentences are periphrastic, we introduced some exceptional rules: arguments filled with the implicit filler "FILLED" or with the target nominalized verb N are never matched, and "Goal" in LCS_V1 can be matched to "Theme" in LCS_N0.)

This process is repeated until the leftmost predicate in LCS_N0 or that in LCS_V1 is matched.

4.2.2 Treatment of non-transferred predicates

If LCS_V1 has any non-transferred predicates when the predicate matching has been completed, they represent semantic content that is not covered by LCS_N1 and that needs to be lexicalized by auxiliary linguistic devices such as voice auxiliaries. In the case of Figure 3, [BECOME [[Ken]z BE WITH]] in LCS_V1 remains non-transferred. In such a case, we attach the non-transferred predicates to LCS_N0; they are then lexicalized by auxiliaries in the next step, the surface generation.
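A strongly simplified sketch of the right-first predicate matching is given below. We assume each LCS has been flattened to its predicate names in outer-to-inner order; argument matching and the exceptional rules mentioned in the parenthetical note above are omitted, so this only illustrates which predicates get matched and which remain to be attached. It is not the authors' implementation.

```python
# Predicate classes from Section 4.2.1.
AGENTIVE = {"CONTROL", "ACT ON", "ACT", "BE AGAINST", "MOVE FROM TO"}
STATE = {"MOVE TO", "BE AT", "BE WITH"}
ASPECTUAL = {"BECOME"}

def pred_class(pred: str) -> str:
    if pred in AGENTIVE:
        return "agentive"
    if pred in STATE:
        return "state"
    return "aspectual"

def match_predicates(lcs_v1, lcs_n0):
    """Right-first matching over flattened predicate lists (outermost first).
    Returns (matched pairs, LCS_V1 predicates left to be attached)."""
    n = [p for p in lcs_n0 if pred_class(p) != "aspectual"]   # aspectual predicates ignored
    matches, leftover = [], []
    ni = len(n) - 1
    for pv in reversed(lcs_v1):                 # innermost (rightmost) predicates first
        if (ni >= 0 and pred_class(pv) != "aspectual"
                and pred_class(pv) == pred_class(n[ni])):
            matches.append((pv, n[ni]))         # same class -> allowed to match
            ni -= 1
        else:
            leftover.append(pv)                 # not transferred; lexicalized later (4.2.2)
    return matches, leftover[::-1]              # leftover in outer-to-inner order

# Figure 3: LCS_V1 of "ukeru (receive)" vs. LCS_N0 of "shigeki-suru (inspire)".
lcs_v1 = ["BECOME", "BE WITH", "MOVE FROM TO"]
lcs_n0 = ["ACT ON"]
print(match_predicates(lcs_v1, lcs_n0))
# -> ([('MOVE FROM TO', 'ACT ON')], ['BECOME', 'BE WITH'])
```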
4.3 Surface generation

We again apply the aforementioned case assignment rule to generate a sentence from the resultant LCS representation. In this process, the model makes the final decisions on the selection of the voice and the reassignment of the cases, according to the following decision list:

1. If the attached predicate is filled with the same argument as the leftmost argument in LCS_N1, the "active" voice is selected and the case structure is left as is.
2. If the argument of the attached predicate has the same value as either z' or y' in LCS_N1, lexicalization is performed to make that argument a subject. Therefore, the "passive" voice is selected and case alternation (passivization) is applied.
3. If the attached predicate is "BE WITH" and its argument has the same value as x' in LCS_N1, the "causative" voice is selected and case alternation (causativization) is applied.
4. If the attached predicate is an agentive predicate and its argument is filled with a value different from those of the other arguments, the "causative" voice is selected and case alternation (causativization) is applied.
5. Otherwise, no modification is applied.

Since the example in Figure 2 satisfies the second condition, the model chooses "s-are-ru (passive)" and passivizes the sentence so that "Ken" fills the nominative case.

(9) LCS_N1: [BECOME [[Ken]z BE WITH]] + [[film]x' ACT ON [Ken]y']
    t. Ken-ga eiga-ni shigeki-s-are-ta.
       Ken-NOM film-DAT inspire-PAS, PAST
       Ken was inspired by the film.
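The decision list can be read as a small function over the attached predicate and the fillers of LCS_N1. The sketch below is our own hedged rendering; the argument encoding and the reading of rule 4 are assumptions, not the authors' code.

```python
AGENTIVE = {"CONTROL", "ACT ON", "ACT", "BE AGAINST", "MOVE FROM TO"}

def choose_voice(attached_pred: str, attached_arg: str, n1_slots: dict) -> str:
    """Sketch of the Section 4.3 decision list.
    `attached_pred`/`attached_arg` describe the non-transferred predicate attached
    to LCS_N0; `n1_slots` maps the slots of LCS_N1, in left-to-right order, to
    their fillers (e.g. {"x'": "film", "y'": "Ken"})."""
    fillers = list(n1_slots.values())
    leftmost = fillers[0] if fillers else None
    if attached_arg == leftmost:                                          # rule 1
        return "active (cases left as is)"
    if attached_arg in (n1_slots.get("z'"), n1_slots.get("y'")):          # rule 2
        return "passive (passivization)"
    if attached_pred == "BE WITH" and attached_arg == n1_slots.get("x'"): # rule 3
        return "causative (causativization)"
    if attached_pred in AGENTIVE and attached_arg not in fillers:         # rule 4
        return "causative (causativization)"
    return "no modification"                                              # rule 5

# Example (9): the attached [BECOME [[Ken]z BE WITH]] against
# LCS_N1 = [[film]x' ACT ON [Ken]y'] -> the second condition applies.
print(choose_voice("BE WITH", "Ken", {"x'": "film", "y'": "Ken"}))
# -> passive (passivization)
```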
5 Experiment

5.1 Paraphrase generation and evaluation

To empirically evaluate our paraphrasing model and the LCSdic, and to clarify the remaining problems, we analyzed a set of automatically generated paraphrase candidates. The sentences used in the experiment were collected in the following way:

Step 1. From the 876,101 types of triplet ⟨n, c, v⟩ collected in Section 3.2, 23,608 types of ⟨n, c, v⟩ were extracted whose components, n and ⟨c, v⟩, are listed in the LCSdic.

Step 2. For each of the 245 most frequent ⟨n, c, v⟩, the 3 most frequent simple clauses including the ⟨n, c, v⟩ were extracted from the corpus from which the ⟨n, c, v⟩s had been extracted in Section 3.2. As a result, we collected 735 sentences.

Step 3. We input these 735 sentences into our paraphrasing model and automatically generated paraphrase candidates. When more than one LCS is assigned to a verb in the LCSdic, due to polysemy or ergativity as with "kaifuku-suru (recover)," our model generates all the possible paraphrase candidates. As a result, 825 paraphrase candidates, that is, at least one for each input, were generated.

We manually classified the resultant 825 paraphrase candidates into 621 correct and 198 erroneous candidates. The remaining 6 candidates were not classified. The precision of the paraphrase generation was thus 75.8% (621 / 819).

5.2 Error analysis

To clarify the causes of the erroneous paraphrases, we manually classified the 198 erroneous paraphrase candidates. Table 3 lists the error sources.

Table 3: Error sources
Correct candidates                        621 (75.8%)
Erroneous candidates                      198 (24.2%)
  Definition of LCS                        30
    LCS for light-verb                     24
    LCS for nominalized verb                6
  Paraphrasing model                       61
    LCS transformation algorithm           59
    Treatment of "suru (to do)"             2
  Ambiguity                               107
    Ambiguous thematic role of dative      78
    Recognition of LVC                     24
    Selection of transitive/intransitive    5

5.2.1 LCS transformation algorithm

The experiment came close to confirming that the right-first matching algorithm in our paraphrasing model operates correctly. Unfortunately, the matching rules still produced some erroneous paraphrases in the LCS transformation.

Errors in predicate matching: To paraphrase (10s) below, "CONTROL" in LCS_V1 must be matched with "CONTROL" in LCS_N0, and x with x'. However, our model first matched "CONTROL" in LCS_V1 with "MOVE FROM TO" in LCS_N0. Thus, x was incorrectly matched with z', and x' remained empty. The desired form of LCS_N1 is shown in (11).

(10) s. Kacho-ga buka-ni shiji-o dasu.
        section chief-NOM subordinate-DAT order-ACC issue-PRES
        The section chief issues orders to his subordinates.
        (N = "order", V = "issue")
     LCS_V1: [[chief]x CONTROL [BECOME [[order]y BE AT [subordinate]z]]]
     LCS_N0: [x' CONTROL [y' MOVE FROM z' TO [FILLED]]]
     *LCS_N1: [x' CONTROL [[subordinate]y' MOVE FROM [chief]z' TO [FILLED]]]

(11) LCS_N1: [[chief]x' CONTROL [y' MOVE FROM [subordinate] TO [FILLED]]]

This error was caused by the mis-matching of "CONTROL" with "MOVE FROM TO." Although we regard some predicates as belonging to the same classes, as described in Section 4.2.1, these classes need to be considered more carefully. In particular, "MOVE FROM TO" needs further investigation because it causes many errors whenever it has the "FILLED" argument.

Errors in argument matching: Even if all the predicates are matched properly, there is still a chance of errors caused by incorrect argument matching. With the present algorithm, z can be matched with y' if and only if z' contains "FILLED." In the case of (12), however, z has to be matched with y' even though z' is empty. The desired form of LCS_N1 is shown in (13).

(12) s. Jikan-ni seigen-ga aru.
        time-DAT limitation-NOM exist-PRES
        There is a time limitation.
        (N = "limitation", V = "exist")
     LCS_V1: [BECOME [[limitation]y BE AT [time]z]]
     LCS_N0: [x' CONTROL [BECOME [y' BE AT z']]]
     *LCS_N1: [x' CONTROL [BECOME [y' BE AT [time]z']]]

(13) LCS_N1: [x' CONTROL [BECOME [[time]y' BE AT z']]]

5.2.2 Ambiguous thematic role of dative

In contrast to dative cases in English, the dative case in Japanese is ambiguous: it can be either a complement to the verb or an adjunct. (Muraki (1991) classifies dative cases into 11 thematic roles that can be regarded as complements; in contrast, there is no typology of dative cases that act as adjuncts.) Since LCS is not capable of determining whether the case is a complement or an adjunct, z is occasionally, and incorrectly, filled with an adjunct. For example, "medo-ni" in (14s) should not fill z, because it acts as an adverb even though it consists of a noun, "medo (prospect)," and the case particle for the dative. We found that 78 erroneous candidates constitute this most dominant type of error.

(14) s. Kin'you-o medo-ni sagyo-o susumeru.
        Friday-NOM by-DAT work-ACC carry on-PRES
        I plan to finish the work by Friday.
        (N = "work", V = "carry")
     LCS_V0: [x CONTROL [BECOME [y BE AT z]]]
     *LCS_V1: [x CONTROL [BECOME [[work]y BE AT [by]z]]]

The ambiguity of dative cases in Japanese has been discussed in the linguistics literature and in some natural language processing tasks (Muraki, 1991). To date, however, a practical complement/adjunct classifier has not been established. We plan to address this topic in our future research. Preliminary investigation revealed that only certain groups of nouns can constitute both complements and adjuncts according to the governing verb. Therefore, whether a word acts as a complement is generally determined without combining it with the verb.
5.2.3 Recognition of LVC

In our model, we assume that a triplet ⟨n, c, v⟩ consisting of a nominalized verb n and a light-verb tuple ⟨c, v⟩ from our vocabulary lists (see Section 3.2) always acts as an LVC. However, not only the triplet itself but also its context sometimes affects whether the given triplet can be paraphrased. For example, we regard "imi-ga aru" as an LVC because the nominalized verb "imi" and the tuple ⟨"ga", "aru"⟩ appear in the vocabulary lists. However, the ⟨n, c, v⟩ in (15s) does not act as an LVC, while the same triplet in (16s) does.

(15) s. Sanka-suru-koto-ni imi-ga aru.
        to participate-DAT meaning-NOM exist-PRES
        There is meaning in participating.
     t. Sanka-suru-koto-o imi-suru.
        to participate-ACC mean-ACT, PRES
        It means to participate in it.

(16) s. "kennel"-niwa inugoya-toiu imi-ga aru.
        "kennel"-TOP doghouse-OF meaning-NOM exist-PRES
        "kennel" has the meaning of doghouse.
     t. "kennel"-wa inugoya-o imi-suru.
        "kennel"-TOP doghouse-ACC mean-ACT, PRES
        "kennel" means doghouse.

The difference is caused by the polysemy of the nominalized verb "imi," which denotes "worth" in the context of (15s) but "meaning" in (16s). Although incorporating word sense disambiguation using contextual clues would complicate our model, in fact only a limited number of nominalized verbs are polysemous. We therefore expect that we can list them and use this as a trigger for deciding whether we need to take the context into account. Namely, given an ⟨n, c, v⟩, we would be able to classify it into (a) a main verb phrase, (b) a delicate case whose status depends on its context, and (c) an LVC.

We can also adopt a different approach to avoiding incorrect paraphrase generation. As described in Section 5.1, our model generates all the possible paraphrase candidates when more than one LCS is assigned to a verb. In the same spirit, our approach can be extended to (i) over-generate paraphrase candidates by considering not only the polysemy of the assigned LCS types but also that of the nominalized verbs (see (15s) and (16s)) and whether the given ⟨n, c, v⟩ is an LVC, and (ii) revise or reject the incorrect candidates by using handcrafted rules or statistical language models.

6 Conclusion and future work

In this paper, we presented an LCS-based paraphrasing model for LVCs and an extension of an existing LCS dictionary. Our model achieved an accuracy of 75.8% in selecting the voice and reassigning the cases.

To make our paraphrasing model more accurate, further analysis is needed, especially for the LCS transformation stage described in Section 4.2. Similarly, several levels of disambiguation also have to be addressed. The Japanese LCS typology has to be refined from a theoretical point of view as well. For example, since the extensions are no more than human intuition, we must discuss how we can assign LCSs to given verbs based on explicit language tests, as described in (Takeuchi et al., 2001).
In future research, we will also extend our LCS-based approach to other classes of paraphrases that exhibit some regularity, such as the verb alteration and compound noun decomposition shown in (17) and (18) below. LCS has been discussed as a means of explaining the difference between transitive and intransitive verbs, and the construction of compounds. Therefore, our next goal is to show the applicability of LCS through practical tasks, namely, paraphrasing.

(17) s. Jishin-ga building-o kowashita.
        earthquake-NOM building-ACC destroy-PAST
        The earthquake destroyed the building.
     t. Jishin-de building-ga kowareta.
        earthquake-LOC building-NOM be destroyed-PAST
        The building was destroyed in the earthquake.

(18) s. Kare-wa kikai-sousa-ga jouzu-da.
        he-TOP machine-operation-NOM good-COP
        He is good at operating the machine.
     t. Kare-wa kikai-o jouzu-ni sousa-suru.
        he-TOP machine-ACC well-ADV operate-PRES
        He operates machines well.

References

R. Barzilay and K. R. McKeown. 2001. Extracting paraphrases from a parallel corpus. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL), pages 50–57.

J. Carroll, G. Minnen, D. Pearce, Y. Canning, S. Devlin, and J. Tait. 1999. Simplifying text for language-impaired readers. In Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 269–270.

B. J. Dorr, J. Garman, and A. Weinberg. 1995. From syntactic encodings to thematic roles: building lexical entries for interlingual MT. Machine Translation, 9(3):71–100.

M. Dras. 1999. Tree adjoining grammar and the reluctant paraphrasing of text. Ph.D. thesis, Department of Computing, Macquarie University.

K. Inui and M. Nogami. 2001. A paraphrase-based exploration of cohesiveness criteria. In Proceedings of the 8th European Workshop on Natural Language Generation (EWNLG), pages 101–110.

K. Inui, A. Fujita, T. Takahashi, R. Iida, and T. Iwakura. 2003. Text simplification for reading assistance: a project note. In Proceedings of the 2nd International Workshop on Paraphrasing: Paraphrase Acquisition and Applications (IWP), pages 9–16.

L. Iordanskaja, R. Kittredge, and A. Polguère. 1991. Lexical selection and paraphrase in a meaning-text generation model. In Paris et al. (Eds.), Natural Language Generation in Artificial Intelligence and Computational Linguistics, pages 293–312. Kluwer Academic Publishers.

T. Kageyama, editor. 1996. Verb semantics. Kuroshio Publishers. (in Japanese).

N. Kaji and S. Kurohashi. 2004. Recognition and paraphrasing of periphrastic and overlapping verb phrases. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC) Workshop on Methodologies and Evaluation of Multiword Units in Real-world Application.

T. Kudo and Y. Matsumoto. 2002. Japanese dependency analysis using cascaded chunking. In Proceedings of the 6th Conference on Natural Language Learning (CoNLL), pages 63–69.

B. Levin. 1993. English verb classes and alternations: a preliminary investigation. Chicago Press.

D. Lin and P. Pantel. 2001. Discovery of inference rules for question answering. Natural Language Engineering, 7(4):343–360.

I. Mel'čuk and A. Polguère. 1987. A formal lexicon in meaning-text theory (or how to do lexica with words). Computational Linguistics, 13(3-4):261–275.

S. Muraki. 1991. Various aspects of Japanese verbs. Hitsuji Syobo. (in Japanese).

B. Pang, K. Knight, and D. Marcu. 2003. Syntax-based alignment of multiple translations: extracting paraphrases and generating new sentences. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pages 102–109.

S. Sato. 1999. Automatic paraphrase of technical papers' titles. Journal of Information Processing Society of Japan, 40(7):2937–2945. (in Japanese).
Y. Shinyama and S. Sekine. 2003. Paraphrase acquisition for information extraction. In Proceedings of the 2nd International Workshop on Paraphrasing: Paraphrase Acquisition and Applications (IWP), pages 65–71.

K. Takeuchi, K. Uchiyama, S. Yoshioka, K. Kageura, and T. Koyama. 2001. Categorising deverbal nouns based on lexical conceptual structure for analysing Japanese compounds. In Proceedings of IEEE System, Man, and Cybernetics Conference, pages 904–909.