Identifying Pronominal Verbs: Towards Automatic Disambiguation of the Clitic 'Se' in Portuguese
Total Page:16
File Type:pdf, Size:1020Kb
Identifying Pronominal Verbs: Towards Automatic Disambiguation of the Clitic se in Portuguese Magali Sanches Duran♥, Carolina Evaristo Scarton♥, Sandra Maria Alu´ısio♥, Carlos Ramisch♠ ♥ University of Sao˜ Paulo (Brazil) ♠ Joseph Fourier University (France) [email protected], [email protected] [email protected], [email protected] Abstract noun, however, se has six uses: A challenging topic in Portuguese language 1. marker of SUBJECT INDETERMINATION: processing is the multifunctional and ambigu- Ja´ se falou muito nesse assunto. ous use of the clitic pronoun se, which impacts *Has-SE already spoken a lot about this matter. NLP tasks such as syntactic parsing, semantic role labeling and machine translation. Aiming One has already spoken a lot about this matter. to give a step forward towards the automatic 2. marker of pronominal PASSIVE voice (syn- disambiguation of se, our study focuses on the thetic passive voice): identification of pronominal verbs, which cor- Sugeriram-se muitas alternativas. respond to one of the six uses of se as a clitic *Have-SE suggested many alternatives. pronoun, when se is considereda CONSTITU- Many alternatives have been suggested. TIVE PARTICLE of the verb lemma to which it is bound, as a multiword unit. Our strategy 3. REFLEXIVE pronoun (-self pronouns): to identify such verbs is to analyze the results Voceˆ deveria se olhar no espelho. of a corpus search and to rule out all the other *You should look-SE on the mirror. possible uses of se. This process evidenced You should look at yourself on the mirror. the features needed in a computational lexicon to automatically perform the disambiguation 4. RECIPROCAL pronoun (each other): task. The availability of the resulting lexicon Eles se cumprimentaram com um aperto de mao.˜ of pronominal verbs on the web enables their *They greeted-SE with a handshake. inclusion in broader lexical resources, such as They greeted each other with a handshake. the Portuguese versions of Wordnet, Propbank 5. marker of causative-INCHOATIVE alternation2: and VerbNet. Moreover, it will allow the revi- Esse esporte popularizou-se no Brasil. sion of parsers and dictionaries already in use. *This sport popularED-SE in Brazil. This sport became popular in Brazil. 1 Introduction 6. CONSTITUTIVE PARTICLE of the verb lexical In Portuguese, the word se is multifunctional. POS item (pronominal verb): taggers have succeeded in distinguishing between se Eles se queixaram de dor no joelho. as a conjunction (meaning if or whether) and se as *They complained-SE about knee pain. a pronoun (see Martins et al. (1999) for more details They complained about knee pain. on the complexity of such task). As a clitic1 pro- 2Causative-inchoative alternation: a same verb can be used 1A clitic is a bound form, phonologically unstressed, at- two different ways, one transitive, in which the subject position tached to a word from an open class (noun, verb, adjective, ad- is occupied by the argument which causes the action or process verbial). It belongs to closed classes, that is, classes that have described by the verb (causative use), and one intransitive, in grammatical rather than lexical meaning (pronouns, auxiliary which the subject position is occupied by the argument affected verbs, determiners, conjunctions, prepositions, numerals). by the action or process (inchoative use). 93 Proceedings of the 9th Workshop on Multiword Expressions (MWE 2013), pages 93–100, Atlanta, Georgia, 13-14 June 2013. c 2013 Association for Computational Linguistics Clitic se uses Syntactic Semantic function function NO YES3 SUBJECT INDE- TERMINATION YES YES3 PASSIVE YES YES REFLEXIVE YES YES RECIPROCAL YES NO INCHOATIVE NO NO CONSTITUTIVE Figure 1: Sentence The broadcasters refused to apologize PARTICLE includes pronominal verbs negar-se (refuse) and retratar- se (apologize) that evoke frames in SRL. Table 1: Uses of the clitic se from the point of view of syntax and semantics. tive definition: if se does not match the restrictions The identification of these uses is very important imposed by the other five uses, so it is a CONSTI- for Portuguese language processing, notably for syn- TUTIVE PARTICLE of the verb, that is, it composes a tactic parsing, semantic role labeling (SRL) and ma- multiword. Therefore, the identification of pronom- chine translation. Table 1 shows which of these six inal verbs requires linguistic knowledge to distin- uses support syntactic and/or semantic functions. guish se as a CONSTITUTIVE PARTICLE from the se SUBJECT INDE Since superficial syntactic features seem not suffi- other uses of the the pronoun ( - cient to disambiguate the uses of the pronoun se, we TERMINATION, PASSIVE, REFLEXIVE, RECIPRO- propose the use of a computational lexicon to con- CAL and INCHOATIVE.) tribute to this task. To give a step forward to solve There are several theoretical linguistic studies this problem, we decided to survey the verbs un- about the clitic pronoun se in Portuguese. Some of dergoing se as an integral part of their lexical form these studies present an overview of the se pronoun (item 6), called herein pronominal verbs, but also uses, but none of them prioritized the identification known as inherent reflexive verbs (Rosario´ Ribeiro, of pronominal verbs. The study we report in this pa- 2011). Grammars usually mention this kind of verbs per is intended to fill this gap. and give two classical examples: queixar-se (to com- plain) and arrepender-se (to repent). For the best of 2 Related Work our knowledge, a comprehensive list of these multi- From a linguistic perspective, the clitic pronoun word verbs is not available in electronic format for se has been the subject of studies focusing on: NLP uses, and not even in a paper-based format, SUBJECT INDETERMINATION and PASSIVE uses such as a printed dictionary. (Morais Nunes, 1990; Cyrino, 2007; Pereira-Santos, An example of the relevance of pronominal verbs 2010); REFLEXIVE use (Godoy, 2012), and IN- is that, in spite of not being argumental, that is, not CHOATIVE use (Fonseca, 2010; Nunes-Ribeiro, being eligible for a semantic role label, the use of se 2010; Rosario´ Ribeiro, 2011). Despite none of these as a CONSTITUTIVE PARTICLE should integrate the works concerning specifically pronominal verbs, verb that evokes the argumental structure, as may be they provided us an important theoretical basis for seen in Figure 1. the analysis undertaken herein. The identification of pronominal verbs is not a The problem of the multifunctional use of clitic trivial task because a pronominal verb has a nega- pronouns is not restricted to Portuguese. Romance 3In these cases, the clitic may support the semantic role label languages, Hebrew, Russian, Bulgarian and oth- of the suppressed external argument (agent). ers also have similar constructions. There are 94 crosslinguistic studies regarding this matter reported terns covered all the verbs in third person singular in Siloni (2001) and Slavcheva (2006), showing (POS=V*, morphology=3S) followed/preceded by that there are partial coincidence of verbs taking the clitic pronoun se (surface form=se, POS=PERS). clitic pronouns to produce alternations and reflexive The patterns returned a set of se occurrences, that voice. is, for each verb, a set of sentences in the corpus in From an NLP perspective, the problem of the which this verb is followed/preceded by the clitic se. ambiguity of the clitic pronoun se was studied by In our analysis, we looked at all the verbs tak- Martins et al. (1999) to solve a problem of catego- ing an enclitic se, that is, where the clitic se is at- rization, that is, to decide which part-of-speech tag tached after the verb. We could as well have in- should be assigned to se. However, we have not cluded the occurrences of verbs with a proclitic se found studies regarding pronominal verbs aiming at (clitic attached before the verb). However, we sus- Portuguese automatic language processing. pected that this would increase the number of occur- Even though in Portuguese all the uses of the clitic rences (sentences) to analyze without a proportional pronoun se share the same realization at the surface increase in verb lemmas. Indeed, our search for pro- form level, the use as a CONSTITUTIVE PARTICLE of clitic se occurrences returned 40% more verb lem- pronominal verbs is the only one in which the verb mas and 264% more sentences than for the enclitic and the clitic form a multiword lexical unit on its se (59,874 sentences), thus confirming our hypothe- own. In the other uses, the clitic keeps a separate sis. Moreover, as we could see at a first glance, pro- syntactic and/or semantic function, as presented in clitic se results included se conjunctions erroneously Table 1. tagged as pronouns (when the parser fails the cate- The particle se is an integral part of pronominal gorial disambiguation). This error does not occur verbs in the same way as the particles of English when the pronoun is enclitic because Portuguese or- phrasal verbs. As future work, we would like to in- thographic rules require a hyphen between the verb vestigate possible semantic contributions of the se and the clitic when se is enclitic, but never when it particle to the meaning of pronominal verbs, as done is proclitic. by Cook and Stevenson (2006), for example, who try We decided to look at sentences as opposed to to automatically classify the uses of the particle up in looking only at candidate verb lemmas, because we verb-particle constructions. Like in the present pa- did not trust that our intuition as native speakers per, they estimate a set of linguistic features which would be sufficient to identify all the uses of the are in turn used to train a Support Vector Machine clitic se for a given verb, specially as some verbs (SVM) classifier citecook:2006:mwe.