Japanese Zero Resolution Considering Exophora and Author/Reader Mentions

Masatsugu Hangyo    Daisuke Kawahara    Sadao Kurohashi
Graduate School of Informatics, Kyoto University
Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
{hangyo, dk, kuro}@nlp.ist.i.kyoto-u.ac.jp

Abstract

In Japanese, zero references often occur, and many of them are categorized into zero exophora, in which a referent is not mentioned in the document. However, previous studies have focused only on zero endophora, in which a referent explicitly appears. We present a zero reference resolution model considering zero exophora and the author/reader of a document. To deal with zero exophora, our model adds pseudo entities corresponding to zero exophora to the candidate referents of zero pronouns. In addition, we automatically detect mentions that refer to the author and reader of a document by using lexico-syntactic patterns. We represent their particular behavior in a discourse as a feature vector of a machine learning model. The experimental results demonstrate the effectiveness of our model for not only zero exophora but also zero endophora.

1 Introduction

Zero reference resolution is the task of detecting and identifying omitted arguments of a predicate. Since arguments are often omitted in Japanese, zero reference resolution is essential in a wide range of Japanese natural language processing (NLP) applications such as information retrieval and machine translation.

(1) パスタが 好きで 毎日 (φガ) (φヲ) 食べます。
    pasta-NOM like everyday (φ-NOM) (φ-ACC) eat
    (Liking pasta, (φ) eats (φ) every day)

For example, in example (1), the accusative argument of the predicate "食べます" (eat) is omitted. (In this paper, we use the following abbreviations: NOM (nominative), ABL (ablative), ACC (accusative), DAT (dative), ALL (allative), GEN (genitive), CMI (comitative), CNJ (conjunction), INS (instrumental) and TOP (topic marker).) The omitted argument is called a zero pronoun. In this example, the zero pronoun refers to "パスタ" (pasta).

Zero reference resolution is divided into two subtasks: zero pronoun detection and referent identification. Zero pronoun detection is the task of detecting omitted zero pronouns in a document. In example (1), this task detects that there are zero pronouns in the accusative and nominative cases of "食べます" (eat) and that there is no zero pronoun in the dative case of "食べます". Referent identification is the task of identifying the referent of a zero pronoun. In example (1), this task identifies that the referent of the zero pronoun in the accusative case of "食べます" is "パスタ" (pasta). These two subtasks are often resolved simultaneously, and our proposed model is such a unified model.

Many previous studies (Imamura et al., 2009; Sasano et al., 2008; Sasano and Kurohashi, 2011) have treated only zero endophora, the phenomenon in which a referent is mentioned in the document, such as "パスタ" (pasta) in example (1). However, zero exophora, the phenomenon in which a referent does not appear in the document, often occurs in Japanese when a referent is the author or reader of a document or an indefinite pronoun. For example, in example (1), the referent of the zero pronoun of the nominative case of "食べます" (eat) is the author of the document, but the author is not mentioned explicitly.


(2) 最近は パソコンで 動画を ([unspecified:person]ガ) 見れる。
    recently PC-INS movie-ACC ([unspecified:person]-NOM) can see
    (Recently, (people) can see movies on a PC.)

Similarly, in example (2), the referent of the zero pronoun of the nominative case of "見れる" (can see) is an unspecified person. (In the following examples, omitted arguments are put in parentheses and exophoric referents are put in square brackets.)

Most previous studies have neglected zero exophora, as though a zero pronoun did not exist in the sentence. However, such a rough approximation has impeded zero reference resolution research.

                    Referent    Zero pronoun   Example in the document
Zero endophora      Exist       Exist          僕はカフェが好きで毎日 (カフェニ) 通っている。
                                               (I like cafes and go (to a cafe) everyday.)
Zero exophora       Not exist   Exist          私がメリットを ([reader]ニ) 説明させていただきます。
                                               (I would like to explain the advantage (to [reader]).)
No zero reference   Not exist   Not exist      あなたはリラックスタイムが (×ニ) 過ごせる。*
                                               (You can have a relaxing time.)
*There is no dative case.

Table 1: Examples of zero endophora, zero exophora and no zero reference.

In Table 1, in "zero exophora," the dative case of the predicate has a zero pronoun, but in "no zero reference," the dative case of the predicate does not have a zero pronoun. Treating them without distinction causes a decrease in the accuracy of machine learning-based zero pronoun detection, due to a gap between the valency of a predicate and the observed arguments of the predicate. In this work, to deal with zero exophora explicitly, we provide pseudo entities such as [author], [reader] and [unspecified:person] as candidate referents of zero pronouns.

In referent identification, selectional preferences of a predicate (Sasano et al., 2008; Sasano and Kurohashi, 2011) and contextual information (Iida et al., 2006) have been widely used. The author and reader (A/R) of a document have not been used as contextual clues, because the A/R rarely appear in the discourse in corpora based on newspaper articles, which are the main targets of the previous studies. However, in documents of other domains such as blog articles and shopping sites, the A/R often appear in the discourse. The A/R tend to be omitted, and there are many clues for referent identification of the A/R, such as honorific expressions and modality expressions. Therefore, it is important to deal with the A/R of a document explicitly for referent identification.

The A/R appear not only as exophora but also as endophora.

(3) 僕_author は 京都に (僕ガ) 行こうと 思っています。
    I-TOP Kyoto-DAT (I-NOM) will go have thought
    (I have thought (I) will go to Kyoto.)

    皆さん_reader は どこに 行きたいか (皆さんガ) (僕ニ) 教えてください。
    you all-TOP where-DAT want to go (you all-NOM) (I-DAT) let me know
    (Please let (me) know where you want to go.)

In example (3), "僕" (I), which is explicitly mentioned in the document, is the author of the document, and "皆さん" (you all) is the reader. In this paper, we call these expressions, which refer to the author and reader, the author mention and the reader mention. We treat them explicitly to improve the performance of zero reference resolution. Since the A/R are mentioned by various expressions besides personal pronouns in Japanese, it is difficult to detect the A/R mentions based merely on lexical information. In this work, we automatically detect the A/R mentions by using a learning-to-rank algorithm (Herbrich et al., 1998; Joachims, 2002) that uses lexico-syntactic patterns as features.

Once the A/R mentions have been detected, their information is useful for referent identification.

The A/R mentions have both the property of a discourse element mentioned in the document and the property of the zero exophoric A/R. In the first sentence of example (3), it can be estimated that the referent of the zero pronoun of the nominative case of "行こう" (will go) is "僕" (I), from the contextual clue that "僕" (I) is the topic of the sentence and the syntactic clue that "僕" (I) depends on "思っています" (have thought) over the predicate "行こう" (will go). (Since "僕" (I) depends on "思っています" (have thought), the relation between "僕" (I) and "行こう" (will go) is a zero reference.) Such contextual clues are available only for the discourse entities that are mentioned explicitly. On the other hand, in the second sentence, since "教えてください" (let me know) is a request form, it can be assumed that the referent of the zero pronoun of the nominative case is "皆さん" (you all), which is the reader, and that of the dative case is "僕" (I), which is the author. Clues such as request forms, honorific expressions and modality expressions are available for the author and reader. In this work, to represent this dual aspect of the A/R mentions, both endophora and exophora features are given to them.

In this paper, we propose a zero reference resolution model considering zero exophora and the author/reader mentions, which resolves zero references as a part of predicate-argument structure analysis.

2 Related Work

Several approaches to Japanese zero reference resolution have been proposed.

Iida et al. (2006) proposed a zero reference resolution model that uses the syntactic relations between a zero pronoun and a candidate referent as features. They deal with zero exophora by judging that a zero pronoun does not have anaphoricity. However, the information about zero pronoun existence is given in their setting, and thus they did not address zero pronoun detection.

Zero reference resolution has also been tackled as a part of predicate-argument structure analysis. Imamura et al. (2009) proposed a predicate-argument structure analysis model based on a log-linear model that simultaneously conducts zero endophora resolution. They assumed a particular candidate referent, NULL; when the analyzer selects this referent, it outputs "zero exophora or no zero pronoun," in which the two are treated without distinction. Sasano et al. (2008) proposed a probabilistic predicate-argument structure analysis model including zero endophora resolution by using wide-coverage case frames constructed from a web corpus. Sasano and Kurohashi (2011) extended Sasano et al. (2008)'s model by focusing on zero endophora. Their model is based on a log-linear model that uses case frame information and the location of a candidate referent as features. In their work, zero exophora is not treated, and they assumed that a zero pronoun is absent when there is no referent in the document.

For languages other than Japanese, zero pronoun resolution methods have been proposed for Chinese, Portuguese, Spanish and other languages. In Chinese, Kong and Zhou (2010) proposed tree-kernel based models for three subtasks: zero pronoun detection, anaphoricity decision and referent selection. In Portuguese and Spanish, only a subject word can be omitted, and zero pronoun resolution has been tackled as a part of coreference resolution. Poesio et al. (2010) and Rello et al. (2012) detected omitted subjects and decided whether an omitted subject is anaphoric or not, as preprocessing for coreference resolution systems.

3 Baseline Model

In this section, we describe a baseline zero reference resolution system. In our model, zero reference resolution is conducted as a part of predicate-argument structure (PAS) analysis. A PAS consists of a case frame and an alignment between case slots and referents. The case frames are constructed for each meaning of a predicate.
Each case frame describes the surface cases that the predicate takes (case slots) and the words that can fill each case slot (examples). In this study, the case frames are constructed from 6.9 billion web sentences by using Kawahara and Kurohashi (2006a)'s method.

Like the previous studies, the baseline model does not treat zero exophora. The baseline model analyzes a document by the following procedure, in the same way as the previous study (Sasano and Kurohashi, 2011). (For learning, the previous study used a log-linear model, but we use a learning-to-rank model; in a preliminary experiment with the baseline model, there was little difference between the results of these two methods.)

京都駅に ある カレー屋が 好きで、 その店に よく 行きます。
Kyoto station-DAT stand curry shop-NOM like the shop-DAT often go
(I like a curry shop in Kyoto station and often go to the shop.)

今日は 皆さんに (カレー屋ヲ) 紹介します。
today-TOP you all-DAT (curry shop-ACC) will introduce
(Today, I will introduce (the shop) to you.)

Discourse entities:
{京都駅 (Kyoto station)}, {カレー屋 (curry shop), その店 (the shop)}, {今日 (today)}, {皆さん (you all)}

Candidate predicate-argument structures of "紹介します" in the baseline model:
[1-1] case frame: [紹介する(1)], {NOM: Null,   ACC: Null,     DAT: 皆さん, TIME: 今日}
[1-2] case frame: [紹介する(1)], {NOM: Null,   ACC: カレー屋, DAT: 皆さん, TIME: 今日}
[1-3] case frame: [紹介する(1)], {NOM: 京都駅, ACC: カレー屋, DAT: 皆さん, TIME: 今日}
...
[2-1] case frame: [紹介する(2)], {NOM: Null,   ACC: Null,     DAT: 皆さん, TIME: 今日}
[2-2] case frame: [紹介する(2)], {NOM: Null,   ACC: カレー屋, DAT: 皆さん, TIME: 今日}
...

Figure 1: Examples of discourse entities and predicate-argument structures

1. Parse the input document and recognize named entities.
2. Resolve coreferential relations and set discourse entities.
3. Analyze the predicate-argument structure of each predicate by the following steps:
   (a) Generate candidate predicate-argument structures.
   (b) Calculate the score of each predicate-argument structure and select the one with the highest score.

We illustrate the details of the above procedure. First, we describe how to set the discourse entities in step 2. In our model, we treat the referents of a zero pronoun using a unit called a discourse entity, into which the mentions in a coreference chain are bound. In Figure 1, we treat "カレー屋" (curry shop) and "その店" (the shop), which are in a coreference chain, as one discourse entity, and the discourse entity {カレー屋, その店} is selected as the referent of the accusative case of the predicate "紹介します" (will introduce).

Next, we illustrate the PAS analysis in step 3. In step 3a, the possible combinations of a case frame (cf) and an alignment (a) between case slots and discourse entities are listed. First, one case frame is selected from the case frames for the predicate. Next, the overt arguments, which have dependency relations with the predicate, are aligned to case slots of the case frame. Finally, the zero pronoun of each remaining case slot is assigned to a discourse entity or is not assigned to any discourse entity. A case slot whose zero pronoun is not assigned to any discourse entity corresponds to a case that does not have a zero pronoun. Figure 1 shows examples of candidate PASs. In these examples, [紹介する(1)] and [紹介する(2)] are case frames corresponding to the two meanings of "紹介する". The referents of each case slot are actually selected from discourse entities but are shown as a representative word for illustration. "Null" indicates that a case slot is not assigned to any discourse entity. Since the alignments between case slots and discourse entities of PASs [1-2] and [2-2] are the same but their case frames are different, we deal with them as distinct PASs; in this case, however, the results of zero reference resolution are the same.

We represent each PAS as a feature vector, which is described in section 3.1, and calculate the score of each PAS with the learned weights. Finally, the system outputs the PAS with the highest score.
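To make step 3a concrete, the following is a minimal sketch of candidate PAS enumeration in Python. It is our own illustration, not the authors' implementation: the case frame inventory, the entity list and the function name enumerate_pas are hypothetical, and a real system would restrict candidates through case analysis rather than taking a full cross product over entities.

```python
from itertools import product

# Toy case frames for the two senses of "紹介する" (introduce).  In the
# real system, the slots come from automatically compiled case frames
# and the overt alignment comes from dependency analysis.
CASE_FRAMES = {
    "紹介する(1)": ["NOM", "ACC", "DAT", "TIME"],
    "紹介する(2)": ["NOM", "ACC", "DAT", "TIME"],
}

def enumerate_pas(case_frames, overt_args, entities):
    """List candidate PASs (cf, alignment) as in step 3a.

    Each case slot not filled by an overt argument hosts either a zero
    pronoun resolved to some discourse entity, or no zero pronoun at
    all ("Null").
    """
    candidates = []
    for cf, slots in case_frames.items():
        open_slots = [c for c in slots if c not in overt_args]
        for choice in product(entities + ["Null"], repeat=len(open_slots)):
            alignment = dict(overt_args)
            alignment.update(zip(open_slots, choice))
            candidates.append((cf, alignment))
    return candidates

# Figure 1: DAT and TIME are overt; NOM and ACC are open slots.
entities = ["京都駅", "カレー屋", "今日", "皆さん"]
overt = {"DAT": "皆さん", "TIME": "今日"}
print(len(enumerate_pas(CASE_FRAMES, overt, entities)))  # 2 * 5 * 5 = 50
```

With two case frames, four entities and two open slots, each open slot independently takes one of five values, giving 50 candidates here; the scoring in step 3b then selects one.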

Type        Value    Description
Case frame  Log      Probabilities that the {word, category, named entity type} of e is assigned to c of cf
            Log      Generative probabilities of the {word, category, named entity type} of e
            Log      PMIs between the {word, category, named entity type} of e and c of cf
            Log      Max of the PMIs between the {word, category, named entity type} of e and c of cf
            Log      Probability that c of cf is assigned to any word
            Log      Ratio of the examples of c to those of cf
            Binary   c of cf is an {adjacent, obligate} case
Predicate   Binary   Modality types of p
            Binary   Honorific expressions of p
            Binary   Tenses of p
            Binary   p is potential form
            Binary   Modifier of p (predicate, noun and end of sentence)
            Binary   p is a {dynamic, stative} verb
Context     Binary   Named entity types of e
            Integer  Number of mentions of e in t
            Integer  Number of mentions of e {before, after} p in t
            Binary   e is mentioned with the postposition "は" in the target sentence
            Binary   Sentence distance between e and p
            Binary   Location categories of e (Sasano and Kurohashi, 2011)
            Binary   e is mentioned at the head of the target sentence
            Binary   e is mentioned with the postposition {"は", "が"} at the head of the target sentence
            Binary   e is mentioned at the head of the first sentence
            Binary   e is mentioned with the postposition "は" at the head of the first sentence
            Binary   e is mentioned at the end of the first sentence
            Binary   e is mentioned with a copula at the end of the first sentence
            Binary   e is mentioned with a noun phrase stop at the end of the first sentence
            Binary   Salience score of e is larger than 1 (Sasano and Kurohashi, 2011)
Other       Binary   c is assigned

Table 2: The features of φ_A(cf, c/e, p, t)

3.1 Feature Representation of Predicate-Argument Structures

When a text t and a target predicate p are given and a PAS (cf, a) is chosen, we represent the feature vector of the PAS as φ(cf, a, p, t). φ(cf, a, p, t) consists of a feature vector φ_overt-PAS(cf, a, p, t) and feature vectors φ(cf, c/e, p, t), where φ_overt-PAS(cf, a, p, t) corresponds to the alignment between case slots and overt (not omitted) arguments, and φ(cf, c/e, p, t) represents that a case slot c is assigned to a discourse entity e. If a case slot is assigned to an overt entity, φ(cf, c/e, p, t) is set to a zero vector.

Each feature vector φ(cf, c/e, p, t) consists of φ_A(cf, c/e, p, t) and φ_NA(cf, c/Null, p, t). φ_A(cf, c/e, p, t) becomes active when the case slot c is assigned to the discourse entity e, and φ_NA(cf, c/Null, p, t) becomes active when the case slot c is not assigned to any discourse entity. For example, the PAS [1-2] in Figure 1 is represented as:

(φ_overt-PAS(紹介する(1), {NOM:Null, ACC:Null, DAT:皆さん, TIME:今日}),
 0_φA,
 φ_NA(紹介する(1), NOM/Null),
 φ_A(紹介する(1), ACC/カレー屋),
 0_φNA,
 0_φA,
 0_φNA).

(In the following examples, p and t are sometimes omitted, and 0_φ is a zero vector that has the same dimension as φ.) In this feature representation, the second and third terms correspond to the nominative case, the fourth and fifth to the accusative case, and the sixth and seventh to the dative case.

We now present the details of φ_overt-PAS(cf, a, p, t), φ_A(cf, c/e, p, t) and φ_NA(cf, c/Null, p, t). We use the score of the probabilistic PAS analysis of Kawahara and Kurohashi (2006b) as φ_overt-PAS(cf, a, p, t). We list the features of φ_A(cf, c/e, p, t) in Table 2 and the features of φ_NA(cf, c/Null, p, t) in Table 3.
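The block structure of φ(cf, a, p, t) can be shown schematically. This sketch is ours, with stand-in feature extractors and toy dimensions; it only illustrates how, for each of the three cases that can host a zero pronoun, exactly one of the φ_A/φ_NA blocks is active while the other blocks are zero vectors.

```python
import numpy as np

DIM_A, DIM_NA = 4, 2           # toy dimensions for phi_A and phi_NA
CASES = ["NOM", "ACC", "DAT"]  # cases that can host a zero pronoun

def phi_A(cf, case, entity):
    """Stand-in for the Table 2 features (case frame/predicate/context)."""
    return np.ones(DIM_A)

def phi_NA(cf, case):
    """Stand-in for the Table 3 features (case slot left unfilled)."""
    return np.ones(DIM_NA)

def pas_vector(cf, zero_alignment, overt_args, overt_score):
    blocks = [np.array([overt_score])]        # phi_overt-PAS part
    for case in CASES:
        if case in overt_args:                # overt argument: both blocks zero
            blocks += [np.zeros(DIM_A), np.zeros(DIM_NA)]
        elif zero_alignment.get(case, "Null") != "Null":
            blocks += [phi_A(cf, case, zero_alignment[case]), np.zeros(DIM_NA)]
        else:                                 # no zero pronoun in this case
            blocks += [np.zeros(DIM_A), phi_NA(cf, case)]
    return np.concatenate(blocks)

# PAS [1-2] of Figure 1: NOM has no referent, ACC is zero-anaphoric to
# カレー屋, DAT is overt.  Dimension: 1 + 3 * (4 + 2) = 19.
v = pas_vector("紹介する(1)", {"ACC": "カレー屋"}, {"DAT": "皆さん"}, 0.7)
print(v.shape)  # (19,)
```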

Type        Value   Description
Case frame  Log     Probability that c of cf is not assigned
            Log     Ratio of the number of examples of c to those of cf
            Binary  c of cf is an {adjacent, obligate} case

Table 3: The features of φ_NA(cf, c/Null, p, t)

3.2 Weight Learning

In the previous section, we defined the feature vector φ(cf, a, p, t), which represents a PAS. In this section, we illustrate the method for learning the weight vector corresponding to the feature vector. The weight vector is learned by using a learning-to-rank algorithm.

In a corpus, the gold-standard alignments a* are manually annotated, but case frames are not annotated. Since the case frames are constructed for each meaning, some of them are unsuitable for the usage of a predicate in a given context. If the training data includes PASs (cf, a*) whose cf is such a case frame as correct instances, they are harmful for training. Hence, we treat a case frame cf* selected by a heuristic method as the correct case frame and remove the (cf, a*) that have any other cf.

In particular, we make the ranking data for learning for each target predicate p by the following steps (a sketch of this construction is given below):
1. List the possible PASs (cf, a) for predicate p.
2. Calculate a probabilistic zero reference resolution score for each (cf, a*) and define the one with the highest score as (cf*, a*).
3. Remove every (cf, a*) except (cf*, a*) from the learning instances.
4. Make ranking data in which (cf*, a*) has a higher rank than the other (cf, a).

In the above steps, we make ranking data for each predicate and use the ranking data collected from all target predicates as the training data.
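A compact sketch of this ranking-data construction, under our own naming (make_ranking_data, zero_score); the candidate lists and gold alignments are assumed to be given, and zero_score stands in for the probabilistic zero reference resolution score used in step 2.

```python
def make_ranking_data(predicates, zero_score):
    """Build learning-to-rank data, one query (qid) per target predicate.

    Each element of `predicates` carries its candidate PASs (cf, a)
    from step 1 and the gold-standard alignment a*.
    """
    data = []
    for qid, pred in enumerate(predicates):
        gold = [(cf, a) for (cf, a) in pred["candidates"]
                if a == pred["gold_alignment"]]             # every (cf, a*)
        cf_a_star = max(gold, key=lambda p: zero_score(*p))  # step 2
        for cf, a in pred["candidates"]:
            if a == pred["gold_alignment"] and (cf, a) != cf_a_star:
                continue                                     # step 3
            rank = 2 if (cf, a) == cf_a_star else 1          # step 4
            data.append((qid, rank, (cf, a)))
    return data
```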

4 Corpus

In this work, we use the Diverse Document Leads Corpus (DDLC) (Hangyo et al., 2012) for the experiments. In DDLC, documents collected from the web are annotated with morphemes, syntax, named entities, coreference, PASs and A/R mentions. Morphemes, syntax, named entities, coreference and PASs are annotated on the basis of the Kyoto University Text Corpus (Kawahara et al., 2002). The PAS annotation includes zero reference information, and the exophoric referents are defined as five elements: [author], [reader], [US(unspecified):person], [US:matter] and [US:situation]. When an A/R mention consists of a compound noun, the A/R mention tag is annotated on the head phrase of the compound noun. If the A/R is mentioned by multiple expressions, only one of them is annotated with the A/R mention tag, and all of these mentions are linked by a coreference chain; in other words, the A/R mentions are annotated on discourse entities. On the web site of an organization such as a company, the site administrator often writes documents on behalf of the organization. In such a case, the organization is annotated as the author.

5 Author/Reader Mention Detection

The A/R mentions, which refer to the A/R of a document, have different properties from other discourse entities. The A/R are mentioned by a wide variety of expressions, such as personal pronouns, proper expressions and role expressions.

(4) こんにちは、 企画チームの 梅辻_author です。
    hello project team-GEN Umetsuji am
    (Hello, I'm Umetsuji on the project team.)

(5) 問題が あれば 管理人_author まで お知らせください。
    problem-NOM exist moderator to let me know
    (Please let me know if there are any problems.)

In example (4), the author is mentioned as "梅辻" (Umetsuji), the name of the author, and in example (5), the author is mentioned as "管理人" (moderator), which expresses the status of the author. Likewise, the reader is sometimes mentioned as "お客様" (customer) and so on. However, since such expressions often refer to someone other than the A/R, whether an expression indicates the A/R of a document depends on the context of the document.

In English and other languages, the A/R mentions can be detected from coreference information, because it can be assumed that an expression that has a coreference relation with a first or second person pronoun is an A/R mention. However, since the A/R tend to be omitted and personal pronouns are rarely used in Japanese, it is difficult to detect the A/R mentions from coreference information. For these reasons, it is difficult to detect which discourse entity is an A/R mention from the lexical information of the entities alone. In this study, the A/R mentions are detected from lexico-syntactic (LS) patterns in the document. We use a learning-to-rank algorithm to detect the A/R mentions, using the LS patterns as features.

5.1 Author/Reader Detection Model

We use a learning-to-rank method for detecting the A/R mentions. This method learns a ranking in which the entities of the A/R mentions have a higher rank than the other discourse entities. An important point here is that some documents have no A/R mentions. The documents in which the A/R mentions do not appear are classified into two types. The first type is a document in which the A/R do not appear in the discourse at all, such as newspaper articles and novels. The second type is a document in which the A/R appear in the discourse but all of their mentions are omitted. For example, in Figure 1, the author appears in the discourse (e.g. as the nominative argument of "like") but is not mentioned explicitly. We introduce two pseudo entities corresponding to these types. The first pseudo entity, "no A/R mention (discourse)", represents documents in which the A/R do not appear in the discourse. Documents in which the A/R do not appear are considered to have characteristics of writing style, e.g., honorific expressions and request expressions are rarely used. This pseudo entity is therefore represented as a document vector that consists of the LS pattern features of the whole document, which reflect the writing style of the document. The second pseudo entity, "no A/R mention (omitted)", represents documents in which all mentions of the A/R are omitted; this pseudo entity is represented as a zero vector. Since the zero vector always receives a decision score of 0, a discourse entity whose score is lower than the score of this pseudo entity can be treated as a negative example, as in binary classification.

When there are A/R mentions in a document, we make ranking data in which the discourse entity of the A/R mention has a higher rank than the other discourse entities and the "no A/R mention" pseudo entities. When the A/R do not appear in the discourse, we make ranking data in which "no A/R mention (discourse)" has a higher rank than all discourse entities and "no A/R mention (omitted)". When the A/R appear in the discourse but all mentions are omitted, we make ranking data in which "no A/R mention (omitted)" has a higher rank than all discourse entities and "no A/R mention (discourse)". We judge that the A/R appear in the discourse if the A/R appear as a referent of a zero reference in the gold-standard PASs; this judgment is used only in the training phase. After making the ranking data for each document, all of the ranking data are merged, and the merged data is fed into the learning-to-rank model.

For A/R mention detection, we calculate the scores of all discourse entities and the pseudo entities and select the discourse entity with the highest score as the A/R mention. If either "no A/R mention" pseudo entity has the highest score, we decide that there are no A/R mentions in the document.
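At detection time, the procedure reduces to an argmax over the discourse entities and the two pseudo entities. Below is a minimal sketch for the author case, assuming a learned linear weight vector w and feature extractors of our own naming; note how the zero vector of "no A/R mention (omitted)" yields a score of exactly 0.

```python
import numpy as np

def detect_author(entities, entity_feats, doc_feats, w):
    """Pick the author mention of a document, or decide that none exists.

    entity_feats(e): LS-pattern feature vector of discourse entity e.
    doc_feats():     LS-pattern features of the whole document, used for
                     the "no A/R mention (discourse)" pseudo entity.
    """
    scored = [(float(w @ entity_feats(e)), e) for e in entities]
    scored.append((float(w @ doc_feats()), "no A/R mention (discourse)"))
    scored.append((0.0, "no A/R mention (omitted)"))  # zero vector -> score 0
    _, best = max(scored)
    return None if best.startswith("no A/R mention") else best
```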
5.2 Lexico-Syntactic Patterns

For each discourse entity, the phrases of the discourse entity, its parent phrase and their dependency relations are used to make the LS patterns that represent the discourse entity. When a discourse entity is mentioned multiple times, the phrases of all mentions are used to make the LS patterns. The LS patterns of phrases are made by generalizing these phrases at various levels (types), and the LS patterns of dependencies are made by combining the LS patterns of phrases.

Table 4 lists the generalization types. At the word level, we make a phrase LS pattern by generalizing each content word and joining the results. For example, the LS pattern of the phrase "ぼくは" generalized to its representative form is "僕は". The word+ level is the same as the word level, except that all content words are generalized to labels such as the semantic category, named entity type or pronoun type. For example, the LS pattern of the dependency relation "太郎は → 走った" generalized at this level is "NE:PERSON+は → verb:past". We also use the LS patterns of generalized individual morphemes. At the phrase level, each phrase is generalized according to the information assigned to the phrase, and the content words are generalized to their representative forms if no such information is assigned to the phrase.

For the "no A/R mention (discourse)" instance, the above features of all mentions, including verbs and adjectives, and their dependencies in the document are gathered and used as the features representing the instance.
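The pattern generation can be sketched as follows. In the real system the representative forms, categories and NE tags would come from JUMAN/KNP; here they are supplied by hand, the generalization levels are a simplified subset of Table 4, and all names are ours.

```python
def phrase_patterns(morphemes):
    """Generate LS patterns of one phrase at several generalization levels.

    Each morpheme is a dict with its surface form, representative form
    and optional generalized tags (category, NE type, pronoun type).
    """
    levels = ["repname", "category", "ne", "pronoun"]
    patterns = set()
    for level in levels:
        out = []
        for m in morphemes:
            if not m["content"]:               # function words kept as-is
                out.append(m["surface"])
            else:                              # fall back to representative form
                out.append(m.get(level) or m["repname"])
        patterns.add("".join(out))
    return patterns

# "ぼくは" -> {"僕は", "Category:PERSON+は", "FirstPersonPronoun+は"}
boku_wa = [
    {"surface": "ぼく", "content": True, "repname": "僕",
     "category": "Category:PERSON+", "pronoun": "FirstPersonPronoun+"},
    {"surface": "は", "content": False, "repname": "は"},
]
print(phrase_patterns(boku_wa))

def dependency_patterns(dep_pats, head_pats):
    """LS patterns of a dependency, combining phrase patterns pairwise."""
    return {f"{d} -> {h}" for d in dep_pats for h in head_pats}
```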

Unit    Type                       Example (original phrase)
word                               僕は (僕は), 走った (走る), 僕は (ぼくは), verb:past (走った)
word+   <category>                 Category:PERSON+は (僕は)
        <named entity>             NE:PERSON+は (太郎は)
        <first person pronoun>     FirstPersonPronoun+は (僕は)
        <second person pronoun>    SecondPersonPronoun+に (あなたに)
phrase                             modality:request (お問い合わせください), honorific:modest (お送りします), ください (お問い合わせください)

Table 4: Generalization types of the LS patterns

6 Zero Reference Resolution Considering Exophora and Author/Reader Mentions

In this section, we describe the zero reference resolution system that considers zero exophora and the A/R mentions. The proposed model resolves zero references as a part of the PAS analysis, based on the baseline model.

The proposed model analyzes the PASs by the following steps:
1. Parse the input document and recognize named entities.
2. Resolve coreferential relations and set discourse entities.
3. Detect the A/R mentions of the document.
4. Set pseudo entities from the estimated A/R mentions.
5. Analyze the PAS of each predicate by the same procedure as the baseline model.

The differences from the baseline model are the estimation of the A/R mentions in step 3 and the setting of the pseudo entities in step 4.

6.1 Pseudo Entities and Author/Reader Mentions for Zero Exophora

In the baseline model, the referents of zero pronouns are selected from the discourse entities. The proposed model adds pseudo entities ([author], [reader], [US:person] (unspecified:person) and [US:others] (unspecified:others)) to deal with zero exophora. (We merge [US:matter] and [US:situation] into [US:others] because of the small amount of [US:situation] in the corpus.)

When the A/R mentions appear in a document, the A/R pseudo entities raise an issue: zero endophora is given priority over zero exophora. In other words, the A/R mentions are selected as referents in preference to the pseudo entities when there are A/R mentions. Therefore, when the system estimates that the A/R mentions appear, the corresponding A/R pseudo entities are not created.

In the PAS analysis, referents are selected from the discourse entities and the pseudo entities. A zero reference is a zero exophora when its case slot is assigned to a pseudo entity. The candidate PASs of "紹介します" in Figure 1 are shown in Figure 2.

[1-1] case frame: [紹介する(1)], {NOM: [author],    ACC: Null,        DAT: 皆さん_reader, TIME: 今日}
[1-2] case frame: [紹介する(1)], {NOM: [US:person], ACC: Null,        DAT: 皆さん_reader, TIME: 今日}
[1-3] case frame: [紹介する(1)], {NOM: [author],    ACC: カレー屋,    DAT: 皆さん_reader, TIME: 今日}
[1-4] case frame: [紹介する(1)], {NOM: 京都駅,      ACC: カレー屋,    DAT: 皆さん_reader, TIME: 今日}
[1-5] case frame: [紹介する(1)], {NOM: [author],    ACC: [US:others], DAT: 皆さん_reader, TIME: 今日}
...
[2-1] case frame: [紹介する(2)], {NOM: [author],    ACC: Null,        DAT: 皆さん_reader, TIME: 今日}
[2-2] case frame: [紹介する(2)], {NOM: [US:person], ACC: Null,        DAT: 皆さん_reader, TIME: 今日}
...

Figure 2: Candidate predicate-argument structures of "紹介します" in the proposed model

            Expressions                                              Categories
author      私 (I), 我々 (we), 俺 (I), 僕 (I), 当社 (our company),   PERSON, ORGANIZATION
            弊社 (our company), 当店 (our shop)
reader      あなた (you), 客 (customer), 君 (you), 皆様 (you all),    PERSON
            皆さん (you all), 方 (person), 方々 (people)
US:person   人 (person), 人々 (people)                               PERSON
US:others   もの (thing), 状況 (situation)                            all categories except PERSON and ORGANIZATION

Table 5: Expressions and categories for the pseudo entities

6.2 Feature Representation of Predicate-Argument Structures

In the same way as the baseline model, the proposed model represents a PAS as a feature vector that consists of the feature vector φ_overt-PAS(cf, a, p, t) and the feature vectors φ(cf, c/e, p, t). The difference from the baseline model is the composition of φ_A(cf, c/e, p, t). In the proposed model, each φ_A(cf, c/e) is composed of six vectors: φ_discourse(cf, c/e), φ_[author](cf, c/e), φ_[reader](cf, c/e), φ_[US:person](cf, c/e), φ_[US:others](cf, c/e) and φ_max(cf, c/e). Their contents and dimensions are identical to one another and similar to the φ_A(cf, c/e) of the baseline model, except for the addition of a few features described in section 6.3.

φ_discourse corresponds to the discourse entities, which are mentioned explicitly, and becomes active when e is a discourse entity, including the A/R mentions. φ_discourse is the same as the φ_A of the baseline model, apart from the difference explained in section 6.3. φ_[author] and φ_[reader] become active when e is [author]/[reader] or the discourse entity corresponding to the A/R mention. In particular, when e is the discourse entity corresponding to an A/R mention, both φ_discourse and φ_[author]/φ_[reader] become active. This representation gives the A/R mentions the properties of both a discourse entity and the A/R. φ_[US:person] and φ_[US:others] become active when e is [US:person] and [US:others], respectively.

Because φ_[author], φ_[reader], φ_[US:person] and φ_[US:others] correspond to the pseudo entities, which are not mentioned explicitly, we cannot use word information such as their expressions and categories. We assume that the pseudo entities have the expressions and categories shown in Table 5 and use these to calculate the case frame features. Finally, φ_max consists of the highest value of each corresponding feature of the above feature vectors.

6.3 Author/Reader Mention Score

We add A/R mention score features to the feature vector φ_A(cf, c/e, p, t) described in Table 2. The A/R mention scores are the discriminant function scores of the A/R mention detection. When e is an A/R mention, we set the A/R mention score as the feature value.
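The routing of a candidate referent e into the sub-vectors of φ_A can be sketched directly (our own schematic, with placeholder dimensions and a stand-in for the Table 2 features): an A/R mention activates both the discourse block and the corresponding pseudo-entity block, and φ_max takes the feature-wise maximum over the blocks.

```python
import numpy as np

BLOCKS = ["discourse", "[author]", "[reader]", "[US:person]", "[US:others]"]
DIM = 8                                # toy per-block dimension

def base_feats(cf, case, entity):
    """Stand-in for the (extended) Table 2 features of one block."""
    return np.ones(DIM)

def phi_A_proposed(cf, case, entity, ar_role=None):
    """Compose phi_A from per-block vectors plus a feature-wise max block.

    entity:  a discourse entity or a pseudo entity such as "[author]".
    ar_role: "[author]"/"[reader]" if entity is a detected A/R mention.
    """
    parts = {b: np.zeros(DIM) for b in BLOCKS}
    if entity in ("[author]", "[reader]", "[US:person]", "[US:others]"):
        parts[entity] = base_feats(cf, case, entity)
    else:
        parts["discourse"] = base_feats(cf, case, entity)
        if ar_role:                    # A/R mention: both blocks active
            parts[ar_role] = base_feats(cf, case, entity)
    stacked = np.stack([parts[b] for b in BLOCKS])
    phi_max = stacked.max(axis=0)
    return np.concatenate([*stacked, phi_max])

print(phi_A_proposed("紹介する(1)", "DAT", "皆さん", ar_role="[reader]").shape)
```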

7 Experiments

7.1 Experimental Settings

We used 1,000 documents from DDLC and performed 5-fold cross-validation. 1,440 zero endophora and 1,935 zero exophora are annotated in these documents; 258 documents are annotated with author mentions and 105 documents with reader mentions. We used the gold-standard (manually annotated) morphemes, named entities, dependency structures and coreference relations in both the A/R detection and the zero reference resolution. We used SVMrank (http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html) as the learning-to-rank method for the A/R detection and the PAS analysis. The categories of words are given by the morphological analyzer JUMAN (http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?JUMAN). Named entities and predicate features (e.g., honorific expressions, modality) are given by the syntactic parser KNP (http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?KNP).
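As a practical note, SVMrank consumes SVM-light-style input in which candidates sharing a query id form one ranking group; a small writer sketch under that assumption (the helper name and the triple format are ours):

```python
def write_svmrank_file(path, rows):
    """rows: iterable of (qid, rank, feature_vector) triples.

    SVM-light / SVM-rank format: "<rank> qid:<q> <idx>:<val> ...",
    with 1-based feature indices and lines grouped by qid.
    """
    with open(path, "w") as f:
        for qid, rank, vec in sorted(rows, key=lambda r: r[0]):
            feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(vec, 1) if v != 0)
            f.write(f"{rank} qid:{qid} {feats}\n")
```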

7.2 Results of Author/Reader Mention Detection

We show the results of the author and reader mention detection in Table 6 and Table 7. In these tables, "exist" indicates the number of documents in which the A/R mentions are manually annotated or in which our system estimated that some discourse entity is an A/R mention.

                              System output
                      Exist (correct)   Exist (wrong)   None
Gold-standard  Exist  140               6               112
               None   -                 38              704

Table 6: Result of the author mention detection

                              System output
                      Exist (correct)   Exist (wrong)   None
Gold-standard  Exist  56                2               47
               None   -                 23              872

Table 7: Result of the reader mention detection

From these results, the A/R mentions, including "none", can be predicted with accuracies of approximately 80%. On the other hand, the recalls are not particularly high: the recall for authors is 140/258 and the recall for readers is 56/105. This is because the documents in which the A/R do not appear outnumber those in which the A/R appear, so the system learns to prefer outputting "no author/reader mention".

7.3 Results of Zero Reference Resolution

We show the results of zero reference resolution in Table 8 and Table 9. The difference between the baseline and the proposed model is statistically significant (p < 0.05) by the McNemar's test. In Table 8, we evaluate only the zero endophora, for comparison with the baseline model, which deals only with zero endophora. "Proposed model (estimate)" shows the result of the proposed model that estimated the A/R mentions, and "Proposed model (gold-standard)" shows the result of the proposed model that was given the gold-standard A/R mentions from the corpus.

                                  Recall   Precision   F1
Baseline                          0.269    0.377       0.314
Proposed model (estimate)         0.282    0.448       0.346
Proposed model (gold-standard)    0.388    0.522       0.445

Table 8: Results of zero endophora resolution

                                  Recall   Precision   F1
Baseline                          0.115    0.377       0.176
Proposed model (estimate)         0.317    0.411       0.358
Proposed model (gold-standard)    0.377    0.485       0.424

Table 9: Results of zero reference resolution

From Table 8, considering the zero exophora and the A/R mentions improves the accuracy of zero endophora resolution as well as that of zero reference resolution including zero exophora. From Table 8 and Table 9, the proposed model given the gold-standard A/R mentions achieves remarkably high accuracies. This result indicates that improving the A/R mention detection would further improve the accuracy of zero reference resolution in the proposed model.

8 Conclusion

This paper presented a zero reference resolution model considering exophora and author/reader mentions. In the experiments, our proposed model achieved higher accuracy than the baseline model. As future work, we plan to improve the author/reader detection model in order to further improve the zero reference resolution.

References

Masatsugu Hangyo, Daisuke Kawahara, and Sadao Kurohashi. 2012. Building a diverse document leads corpus annotated with semantic relations. In Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation, pages 535-544, Bali, Indonesia, November.

Ralf Herbrich, Thore Graepel, Peter Bollmann-Sdorra, and Klaus Obermayer. 1998. Learning preference relations for information retrieval. In ICML-98 Workshop: Text Categorization and Machine Learning, pages 80-84.

Ryu Iida, Kentaro Inui, and Yuji Matsumoto. 2006. Exploiting syntactic patterns as clues in zero-anaphora resolution. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 625-632, Sydney, Australia, July.

933 resolution. In Proceedings of the 21st International Ryohei Sasano and Sadao Kurohashi. 2011. A dis- Conference on Computational Linguistics and 44th criminative approach to japanese zero anaphora res- Annual Meeting of the Association for Computational olution with large-scale lexicalized case frames. In Linguistics, pages 625–632, Sydney, Australia, July. Proceedings of 5th International Joint Conference on Association for Computational Linguistics. Natural Language Processing, pages 758–766, Chiang Kenji Imamura, Kuniko Saito, and Tomoko Izumi. 2009. Mai, Thailand, November. Asian Federation of Natu- Discriminative approach to predicate-argument struc- ral Language Processing. ture analysis with zero-anaphora resolution. In Pro- Ryohei Sasano, Daisuke Kawahara, and Sadao Kuro- ceedings of the ACL-IJCNLP 2009 Conference Short hashi. 2008. A fully-lexicalized probabilistic model Papers, pages 85–88, Suntec, Singapore, August. As- for japanese zero anaphora resolution. In Proceed- sociation for Computational Linguistics. ings of the 22nd International Conference on Com- Thorsten Joachims. 2002. Optimizing search engines putational Linguistics (Coling 2008), pages 769–776, using clickthrough data. In Proceedings of the eighth Manchester, UK, August. Coling 2008 Organizing ACM SIGKDD international conference on Knowl- Committee. edge discovery and data mining, pages 133–142. ACM. Daisuke Kawahara and Sadao Kurohashi. 2006a. Case frame compilation from the web using high- performance computing. In Proceedings of the 5th International Conference on Language Resources and Evaluation, pages 1344–1347. Daisuke Kawahara and Sadao Kurohashi. 2006b. A fully-lexicalized probabilistic model for japanese syn- tactic and case structure analysis. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pages 176–183, New York City, USA, June. Association for Computational Lin- guistics. Daisuke Kawahara, Sadao Kurohashi, and Koiti Hasida. 2002. Construction of a japanese relevance-tagged corpus. In Proc. of The Third International Confer- ence on Language Resources Evaluation, May. Fang Kong and Guodong Zhou. 2010. A tree kernel- based unified framework for chinese zero anaphora resolution. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 882–891, Cambridge, MA, October. Associa- tion for Computational Linguistics. Massimo Poesio, Olga Uryupina, and Yannick Versley. 2010. Creating a coreference resolution system for italian. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10), Valletta, Malta, may. European Language Resources Association (ELRA). Luz Rello, Ricardo Baeza-Yates, and Ruslan Mitkov. 2012. Elliphant: Improved automatic detection of zero subjects and impersonal constructions in spanish. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguis- tics, pages 706–715. Association for Computational Linguistics.
