Estonian Copular and Existential Constructions As an UD Annotation Problem
Kadri Muischnek, University of Tartu, Estonia
Kaili Müürisep, University of Tartu, Estonia

Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017), pages 79–85, Gothenburg, Sweden, 22 May 2017.

Abstract

This article is about annotating clauses with nonverbal predication in version 2 of the Estonian UD treebank. Three possible annotation schemas are discussed, among which separating existential clauses from copular clauses would be theoretically most sound but would need too much manual labor and could possibly yield inconsistent annotation. Therefore, a solution has been adopted which separates existential clauses consisting only of a subject and the (copular) verb olema 'be' from all other olema-clauses.

1 Introduction

This paper discusses the annotation problems and research questions that came up during the annotation of Estonian copular sentences while developing the Estonian Universal Dependencies treebank, especially while converting it from version 1 of the UD annotation guidelines to version 2.

Copular clauses are a sentence type in which the contentful predicate is not a verb, but falls into some other category. In some languages there is no verbal element at all in these clauses; in other languages there is a verbal copula joining the subject and the non-verbal element (Mikkelsen, 2011, p. 1805).

In Estonian, there is mainly one verb, namely olema 'be', that functions as copula in copular clauses. The Estonian descriptive grammar (Erelt et al., 1993) uses the term copular verb (Est 'köide') only for describing sentences with subject complements, stating that in such sentences the verb olema has only the grammatical features of a predicate (tense, mood, person). Also, the copula olema is semantically empty if used alone and it cannot have any dependents other than the non-verbal predicate. At the same time, olema is the most frequent verb of the Estonian language; it can also function as an auxiliary verb in compound tenses, be part of phrasal verbs, and occur in existential, possessor or cognizer sentences.

As the descriptive grammar of Estonian (Erelt et al., 1993) lacks a more detailed treatment of copular sentences, the label "copula" was not introduced into the original Estonian Dependency Treebank (EDT) (Muischnek et al., 2014). In copular clauses olema is annotated as the root of the clause and the other components of the sentence depend on it; that is also the case if the sentence contains a subject complement. As subject complements have a special label PRD (predicative) in EDT, such sentences can be easily searched.

The Estonian treebank for UD v1.3 was generated automatically from EDT using transfer rules. The guidelines for UD v1 implied that subject complements serve as roots in copular clauses. This analysis of copula constructions, according to the UD v1 guidelines, extended to adpositional phrases and oblique nominals as long as they have a predicative function. By contrast, temporal and locative modifiers were treated as dependents of the existential verb 'be'.

Therefore, while converting EDT to UD v1, sentences with subject complements were relatively easily transferred to sentences with copular tree structure (Muischnek et al., 2016). However, oblique nominals and adpositional phrases were not annotated as instances of nonverbal predication.

Since UD v2.0 assumes a more general annotation scheme for copular sentences, we faced several conversion problems and also linguistic questions. This paper provides insights into these research questions, gives an overview of how copular clauses are annotated in some UD v2 treebanks for other languages (Finnish, German, English) and describes the options for annotating Estonian sentences.

In the remainder of this paper, in Section 2 we give a short account of the UD v2 guidelines for annotating copular clauses and show how these constructions are annotated in some UD v2 treebanks for Finnish, German and English. Section 3 is dedicated to copular constructions in the Estonian language and in Estonian UD versions 1 and 2. Some conclusions are drawn in Section 4.

2 UD annotation guidelines for nonverbal predication

According to the UD annotation scheme version 1, copular constructions are to be annotated differently from other clause types: the predicative element is analysed as the root and, if there is an overt linking verb present, it is attached to this non-verbal predicate as a copula. The copula relation is restricted to function words whose sole function is to link a non-verbal predicate to its subject and which do not add any meaning other than grammaticalised TAME categories. Such an analysis is motivated by the fact that many languages often or always lack an overt copula, so the annotation would be cross-linguistically consistent¹.

Version 2 of the UD annotation guidelines extends the set of constructions that should be annotated as instances of nonverbal predication, defining six categories of nonverbal predication, namely those of equation, attribution, location, possession, benefaction and existence².

In order to get a better overview of the practical annotation of copular constructions cross-linguistically, we studied the v2 versions of the UD treebanks of Finnish, which is the language most closely related to Estonian present in UD, and also of German and English. As the language-independent annotation guidelines for UD version 2 were published at the very end of last year, there are no language-specific guidelines published yet. So we had to rely on treebank queries in order to gain information about the annotation of copular and related constructions in the aforementioned languages. We queried the UD v2 treebanks using the SETS treebank search maintained by the University of Turku³.

¹ http://universaldependencies.org/v2/copula.html
² http://universaldependencies.org/u/overview/simple-syntax.html#nonverbal-clauses
³ http://bionlp-www.utu.fi/dep_search/

There are two UD treebanks for Finnish: the Finnish UD treebank, based on the Turku Dependency Treebank, and Finnish-FTB (FinnTreeBank). In the Finnish UD v2 treebank, clauses with olla 'be' are mostly regarded as instances of nonverbal predication, with olla annotated as copula (1), among them also possessive clauses (2). However, if the clause contains only a subject besides some form of olla, olla is annotated as root (3).

(1) Hyllyllä oli H&M Home-tuotteita
    shelf-ADE was H&M Home-product-PL.PRT
    ROOT cop nmod nsubj:cop
    'There were some H&M products on the shelf'

(2) Kuvia minulla ei ole
    picture-PL.PRT I-ADE not be
    nsubj:cop ROOT aux cop:own
    'I have no pictures'

(3) Kun rahaa ei ole ...
    If money-PRT not is
    mark nsubj aux ROOT
    'If there is no money'

In the Finnish FTB v2 treebank more clause types are annotated with olla as root, e.g. the possessive clause (4) and the clause containing a predicative adverbial (5). It seems that the annotation of copular constructions in Finnish FTB resembles that in version 1 of Estonian UD: only subject complements are annotated as roots in copular constructions.

(4) Meillä ei ole rahaa tuhlata.
    we-ADE not is money-PRT waste-INF
    nmod:own aux ROOT nsubj acl
    'We have no money to waste'

(5) Talonmies on juovuksissa.
    Caretaker is drunkenness-INE
    nsubj ROOT advmod
    'The caretaker is drunk'

In the UD v2 treebank of German, sentences with a subject complement are annotated as instances of nonverbal predication (6), and other instances of sein 'be' and werden 'become' seem to be annotated as main verbs, not copulas (7).

(6) Das Personal ist freundlich.
    DET staff is friendly
    det nsubj:cop cop ROOT
    'The staff is friendly.'

(7) Ich war in dem Dezember bei Küchen Walther.
    I was in DET December at Küchen Walther
    nsubj ROOT case obl case det obl flat
    'I was in December at Küchen Walther.'

There are four English UD treebanks, but we queried only the largest of them, the English Web Treebank. It seems that predicative (e.g. Fig. 1) and locative (Fig. 2) constructions are analysed as instances of non-verbal predication, whereas existential clauses (Fig. 3) are not.

Figure 1: Predicative construction in the English UD treebank.

Figure 2: Locative construction in the English UD treebank.

Figure 3: Existential clause in the English UD treebank.

As the above discussion illustrates, the annotation of copular constructions varies across different languages and also across different treebanks. Having better documentation for v2 treebanks [...]

[...] some exceptions. First, the copula is often omitted in headlines like (8).

(8) Valitsus otsustusvõimetu
    Government indecisive
    'Government is indecisive.'

And due to time pressure, the copula, but also other verbs, can be omitted in online communication (9).

(9) Ma nii kurb
    I so sad
    'I am so sad.'

As a sidenote, although ellipsis of the verb olema is rare in Estonian, it is still more frequent than in Finnish (Kehayov, 2008).

The annotation guidelines for version 2 of Universal Dependencies define six categories of nonverbal predication that can be found cross-linguistically (with or without a copula), namely equation (aka identification), attribution, location, possession, benefaction and existence.

As for Estonian, constructions expressing equation (10a), attribution (10b), location (10c), possession (10d), benefaction (10e) or existence (10f) are all coded using the verb olema. In addition to the aforementioned clause types, there are also cognizer clauses (10g) that can be viewed as a subtype or metaphorical extension of the possessive clause type. Perhaps the quantification clause (10h) should also be mentioned as a separate type.

(10) a. Mari on õpetaja.
        Mari is teacher
        'Mari is a teacher.'
     b. Laps on väike.
        Child is small
        'Child is small.'
     c. Laps on koolis.
        Child is school-INE
        'Child is at school.'
     d. Lapsel on raamat.