Expletives in Universal Dependency Treebanks

Expletives in Universal Dependency Treebanks Gosse Bouma∗◦ Jan Hajic†◦ Dag Haug‡◦ Joakim Nivre•◦ Per Erik Solberg‡◦ Lilja Øvrelid?◦ ∗University of Groningen, Centre for Language and Cognition yCharles University in Prague, Faculty of Mathematics and Physics, UFAL zUniversity of Oslo, Department of Philosophy, Classics, History of Arts and Ideas •Uppsala University, Department of Linguistics and Philology ?University of Oslo, Department of Informatics ◦Center for Advanced Study at the Norwegian Academy of Science and Letters Abstract language that has expletives into a language that does not use expletives (Hardmeier et al., 2015; Although treebanks annotated according to the Werlen and Popescu-Belis, 2017). The ParCor guidelines of Universal Dependencies (UD) now exist for many languages, the goal of co-reference corpus (Guillou et al., 2014) distin- annotating the same phenomena in a cross- guishes between anaphoric, event referential, and linguistically consistent fashion is not always pleonastic use of the English pronoun it. Loaiciga´ met. In this paper, we investigate one phe- et al.(2017) train a classifier to predict the dif- nomenon where we believe such consistency ferent uses of it in English using among others is lacking, namely expletive elements. Such syntactic information obtained from an automatic elements occupy a position that is structurally parse of the corpus. Being able to distinguish ref- associated with a core argument (or sometimes an oblique dependent), yet are non-referential erential from non-referential noun phrases is po- and semantically void. Many UD treebanks tentially important also for tasks like question an- identify at least some elements as expletive, swering and information extraction. but the range of phenomena differs between Applications like these motivate consistent and treebanks, even for closely related languages, explicit annotation of expletive elements in tree- and sometimes even for different treebanks for banks and the UD annotation scheme introduces a the same language. In this paper, we present dedicated dependency relation (expl) to account criteria for identifying expletives that are ap- plicable across languages and compatible with for these. However, the current UD guidelines are the goals of UD, give an overview of exple- not specific enough to allow expletive elements to tives as found in current UD treebanks, and be identified systematically in different languages, present recommendations for the annotation of and the use of the expl relation varies consid- expletives so that more consistent annotation erably both across languages and between differ- can be achieved in future releases. ent treebanks for the same language. For instance, 1 Introduction the manually annotated English treebank uses the expl relation for a wide range of constructions, Universal Dependencies (UD) is a framework for including clausal extraposition, weather verbs, ex- morphosyntactic annotation that aims to provide istential there, and some idiomatic expressions. useful information for downstream NLP applica- By contrast, Dutch, a language in which all these tions in a cross-linguistically consistent fashion phenomena occur as well, uses expl only for ex- (Nivre, 2015; Nivre et al., 2016). Many such ap- traposed clausal arguments. In this paper, we pro- plications require an analysis of referring expres- vide a more precise characterization of the notion sions. In co-reference resolution, for example, it of expletives for the purpose of UD treebank anno- is important to be able to separate anaphoric uses tation, survey the annotation of expletives in exist- of pronouns such as it from non-referential uses ing UD treebanks, and make recommendations to (Boyd et al., 2005; Evans, 2001; Uryupina et al., improve consistency in future releases. 2016). Accurate translation of pronouns is another challenging problem, sometimes relying on co- 2 What is an Expletive? reference resolution, and where one of the choices is to not translate a pronoun at all. The latter sit- The UD initiative aims to provide a syntactic uation occurs for instance when translating from a annotation scheme that can be applied cross- linguistically, and that can be used to drive se- pronoun, and there, identical to the non- mantic interpretation. At the clause level, it dis- proximate locative pro-adverb), (ii) nonref- tinguishes between core arguments and oblique erential (neither anaphoric/cataphoric nor dependents of the verb, with core arguments be- exophoric), and (iii) devoid of any but a vac- ing limited to subjects (nominal and clausal), ob- uous semantic role. As a tentative definition jects (direct and indirect), and clausal comple- of expletives, we can characterize them as ments (open and closed). Expletives are of inter- pro-forms (typically third person pronouns est here, as a consistent distinction between exple- or locative pro-adverbs) that occur in core tives and regular core arguments is important for argument positions but are non-referential semantic interpretation but non-trivial to achieve (and therefore not assigned a semantic role). across languages and constructions. Like the UD definition, Postal and Pullum(1988) The UD documentation currently states that emphasize the vacuous semantics of expletives, expl is to be used for expletive or pleonastic but understand this not just as the lack of semantic nominals, that appear in an argument position of a role (iii) but also more generally as the absence of predicate but which do not themselves satisfy any reference (ii). Arguably, (ii) entails (iii) and could of the semantic roles of the predicate. As exam- seem to make it superfluous, but we will see that it ples, it mentions English it and there as used in can often be easier to test for (iii). The common, clausal extrapostion and existential constructions, pre-theoretic understanding of expletives does not cases of true clitic doubling in Greek and Bul- include idiom parts such as the bucket in kick the garian, and inherent reflexives. Silveira(2016) bucket, so it is necessary to restrict the concept characterizes expl as a wildcard for any element further. Postal and Pullum(1988) do this by (i), that has the morphosyntactic properties associ- which restricts expletives to be pro-forms. This is ated with a particular grammatical function but a relatively weak constraint on the form of exple- does not receive a semantic role. tives. We will see later that it may be desirable It is problematic that the UD definition relies to strengthen this criterion and require expletives on the concept of argument, since UD otherwise to be pro-forms that are selected by the predicate abandons the argument/adjunct distinction in favor with which it occurs. Such purely formal selec- of the core/oblique distinction. Silveira’s account tion is needed in many cases, since expletives are avoids this problem by instead referring to gram- not interchangeable across constructions – for ex- matical functions, thus also catering for cases like: ample, there rains is not an acceptable sentence (1) He will see to it that you have a reservation. of English. Criteria (ii) and (iii) from the definition of Postal and Pullum(1988) may be hard to However, both definitions appear to be too wide, apply directly in a UD setting, as UD is a syntac- in that they do not impose any restrictions on the tic, not a semantic, annotation framework. On the form of the expletive, or require it to be non- other hand, many decisions in UD are driven by referential. It could therefore be argued that the the need to provide annotations that can serve as subject of a raising verb, like Sue in Sue appears input for semantic analysis, and distinguishing be- to be nice, satisfies the conditions of the definition, tween elements that do and do not refer and fill a since it is a nominal in subject position that does thematic role therefore seems desirable. not satisfy a semantic role of the predicate appear. In addition to the definition, Postal and Pul- It seems useful, then, to look for a better defi- lum(1988) provide tests for expletives. Some of nition of expletive. Much of the literature in the- these (tough-movement and nominalization) are oretical linguistics is either restricted to specific not easy to apply cross-linguistically, but two of languages or language families (Platzack, 1987; them are, namely absence of coordination and in- Bennis, 2010; Cardinaletti, 1997) or to specific ability to license an emphatic reflexive. constructions (Vikner, 1995; Hazout, 2004). A theory-neutral and general definition can be found (2) *It and John rained and carried an umbrella in Postal and Pullum(1988): respectively. (3) *It itself rained. [T]hey are (i) morphologically identical to pro-forms (in English, two relevant forms The inability to license an emphatic reflexive is are it, identical to the third person neuter probably due to the lack of referentiality. It is less immediately obvious what the absence of coordi- (5) It seems that she came (en) nation diagnoses. One likely interpretation is that (6) Hij betreurt het dat jullie verliezen (nl) sentences like (2) are ungrammatical because the He regrets it that you lose verb selects for a particular syntactic string as its ‘He regrets that you lose’ subject. If that is so, form-selection can be consid- (7) It is going to be hard to sell the Dodge (en) ered a defining feature of expletives. Finally, following Postal and Pullum(1988), we It is fairly straightforward to argue that this con- can draw a distinction between expletives that oc- struction involves an expletive. Theoretically, it cur in chains and those that do not, where we un- could be cataphoric to the following clause and so derstand a chain as a relation between an expletive be referential, but in that case we would expect it and some other element of the sentence which has to be able to license an emphatic reflexive. How- the thematic role that would normally be associ- ever, this is not what we find, as shown in (8-a), ated with the position of the expletive, for exam- which contrasts with (8-b) where the raised sub- ple, the subordinate clause in (4).

Expletives in Universal Dependency Treebanks

Arxiv:1908.07448V1

Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format

Universal Dependencies According to BERT: Both More Specific and More General

Universal Dependencies for Japanese

Universal Dependencies for Persian

Language Technology Meets Documentary Linguistics: What We Have to Tell Each Other

Foreword to the Special Issue on Uralic Languages

The Universal Dependencies Treebank of Spoken Slovenian

Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas

Conll-2017 Shared Task

Conferenceabstracts

A Multilingual Collection of Conll-U-Compatible Morphological Lexicons