Applying Universal Dependency to the Arapaho Language
Irina Wagner¹, Andrew Cowell¹, Jena D. Hwang²
¹University of Colorado Boulder, Department of Linguistics; ²IHMC
{irina.wagner, james.cowell}@colorado.edu, [email protected]

Proceedings of LAW X – The 10th Linguistic Annotation Workshop, pages 171–179, Berlin, Germany, August 11, 2016. © 2016 Association for Computational Linguistics

Abstract

This paper discusses the use of Universal Dependency for annotations of Arapaho (Algonquian), a Native North American language. While some relations of the universal dependency correspond perfectly with those in Arapaho, language-specific annotations of verbal arguments elucidate problems of assuming certain syntactic categories across languages. By critiquing the influence of grammatical structures of major European and Asian languages in establishing the UD framework, this paper develops guidelines for annotating a poly-synthetic agglutinating language and sets a path to developing a more comprehensive cross-linguistic approach to syntactic annotations of language data.

1 Introduction

The recent initiatives to create a cross-linguistic scheme of annotation rely on Universal Dependency (UD) as a system of describing the syntactic connection between words (Nivre, 2015; de Marneffe et al., 2014). While research shows this annotation type is effective not only for monolingual parsers but also cross-linguistically across multiple platforms, the universality of this approach is based on the assumptions of similar syntactic structures of major, often European, languages (McDonald et al., 2013). Without doubt, those are also the languages that receive predominant attention in the computational sphere, the languages whose technological presence requires a thorough analysis and annotation. However, if the goal of natural language processing is truly to develop a universal cross-linguistic strategy for annotating and analyzing linguistic data, it is important to attend to lesser-described languages that may present strikingly different syntactic structures and dependencies.

When the UD rules were applied to the annotation of data from the Arapaho (Algonquian) language, several specific features were observed to fall outside of the charted labels. Since the language does not have a fixed word order and allows discontinuous constituency, dependencies on the previous word were avoided and re-analyzed. The most problematic dependency distinction in this language is the variation in relations between a verb and its arguments. This paper examines the correlation of the dependency relations in the UD scheme and their practical application for the Arapaho data. Using the UD framework, we create guidelines for annotating this data. For reasons of space, this paper primarily focuses on the argument structures defined by the UD and their correspondences to the Arapaho syntactic patterns. An additional discussion of non-verbal roots and topicality problematizes some of the common assumptions in discounting pragmatic features while analyzing syntactic dependencies.

In the following pages, we first provide a short note on the Arapaho language and the procedures of annotations (2); discuss issues of mapping the labels for subject, objects, and noun modifiers of the UD onto the Arapaho dependencies (3); define the mechanism of analysis of non-verbal roots (4); and suggest further ways of developing these annotation guidelines (5).

2 Arapaho data and annotations

Arapaho is an Algonquian poly-synthetic agglutinating language spoken by fewer than 200 people on the Wind River Indian Reservation in Wyoming. Because the language is in critical condition, there have been attempts at documenting and preserving it. A large transcribed and annotated spoken corpus has been created and parts of it are now available in the Endangered Languages Archive¹. A total of around eighty thousand lines transcribed, translated, and grammatically analyzed is available for further processing. The current attempts at establishing the dependency scheme for this language initiate a new type of analysis of this data to allow machine processing.

¹ http://elar.soas.ac.uk/deposit/0194

2.1 Some features of the Arapaho language

The current paper largely relies on the previous description of the Arapaho grammar by Cowell and Moss (2008). There are several intriguing features of the grammar, but the ones most relevant to this study are its complex verbal morphology, split semantic and syntactic transitivity, and the system of obviation.

2.1.1 Verbal complexity

As is observed in many other poly-synthetic languages, Arapaho verbs are highly complex and mark multiple grammatical and semantic features. So, in example (1), a single verb demonstrates incorporation of not only the usual tense, aspect, mode, person, and number features, but also the manner of action and an incorporated object.

(1) he’ih’ii-xoo-xook-bix-ohoe-koohuut-oo-no’
    “Their hands would go right through them and appear on the other side.”

A single verb can be a full clause conveying a full thought. Verbal prefixes code grammatical as well as many semantic features, inhibiting the dependency analysis since this framework only considers the relations between individual words.

2.1.2 Transitivity

The category of verbal transitivity is both syntactic and semantic (Cowell and Moss, 2008). To understand how many arguments are allowed in a verb’s frame, one must examine both the morphological and the semantic structure of a verb. So, while semantically a verb to’oo3ei “to hit things” may appear transitive, grammatically it is intransitive, requiring only one argument, the subject, as in too’oo3einoo “I am hitting (unspecified) things.” The transitivity of a verb is expressed in its inflection, which must agree in person and number with its arguments. Truly transitive verbs carry inflections agreeing with both of their arguments:

(2) Nih-to’ow-oo-t    nuhu’    hinen-ino
    PST-hit-3/4-3S    this     man-OBV.PL
    “He hit these men”

Even though only one of the two arguments appears in the sentence, the verb nihto’owoot “s/he hit him/her” is marked to agree both with the semantic agent and undergoer of the verb. This semantic distinction in the arguments is not observed in intransitive and semi-transitive verbs. Because such verbs demonstrate morphological agreement only with one nominal², other nominals are considered outside of the argument structure of a verb even if they specify the semantic patient or theme.

² We use the phrases “nominal” and “nominal expression” to refer to nouns, noun phrases and nominalized verbs that function as noun phrases.

(3) nih’ii-koo-ko’uyei-3i’            biino
    PST.IMPF-REDUP-pick things-3PL    chokecherries
    “They were picking chokecherries.”

So, in example (3), the noun biino “chokecherries” is not reflected in verbal morphology, but corresponds with its semantics by specifying the object of picking. Being outside of the argument structure of this verb, syntactically the noun is better understood as a verbal adjunct specifying the manner of action, while semantically it is still the patient. So the designation of the relationship between such arguments and verbs as dobj of the universal dependencies is wrong because it does not consider verbal morphology, whereas the label of nmod would not account for its semantic role.

2.1.3 Obviation

Unlike many languages, Arapaho does not rely on word order or case markers to disambiguate between overt nominals; rather it uses a system of obviation that incorporates a distinction based on animacy along with the combination of verbal morphosyntax and pragmatics to mark particular grammatical roles. This system clearly distinguishes between two third person referents by marking one of them (a less salient one in the discourse) as obviative and leaving the other referent unmarked (proximate). In Algonquian languages, the obviation is argued to be a pragmatic feature structuring discourse outside of a single clause (Goddard, 1984). Verbal morphology also shows agreement with these categories: the transitive verb inflection clearly marks which argument is acting on the other. So, instead of the usual three persons, Arapaho has four, with the fourth person being the obviative argument. In the example below, the obviative argument is the noun hiinoon “his mother”, which corresponds with the verbal subjunctive inflection -eihok “4th person acting on 3rd singular.”

(4) Hohou,       hee3eihok                  hiinoon
    thank you    say to s.o.-4/3S.SUBJ      his/her mother
    3eeyokooxuu.
    Tipi-pole Child
    “Thank you,” his mother said to Under-the-Tipi-Pole Child.

As observed in this example, obviation does not correspond with the semantic or the syntactic role of an argument. Nor does it depend on the transitivity of a verb. Rather, obviative status lines up with the semantic role of an obviative coded in verbal morphology. Based on this feature of transitivity and obviation,

the annotations has been used thus far, and all of the annotations are stored in a spreadsheet format.

Because the language is critically endangered, the resources available for this type of work are extremely limited. Importantly, it is not just that there are fewer recorded texts and conversations, but there are also fewer trained individuals able to perform any type of language annotation. So, during this particular project, most of the annotations were done by the first two authors of the paper with Andrew Cowell being the language expert due to his experience and acquired profi-