Modeling the Noun Morphology of Plains Cree
Conor Snoek¹, Dorothy Thunder¹, Kaidi Lõo¹, Antti Arppe¹, Jordan Lachler¹, Sjur Moshagen², Trond Trosterud²
¹ University of Alberta, Canada
² University of Tromsø, Norway

Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 34–42, Baltimore, Maryland, USA, 26 June 2014. © 2014 Association for Computational Linguistics

Abstract

This paper presents aspects of a computational model of the morphology of Plains Cree based on the technology of finite state transducers (FST). The paper focuses in particular on the modeling of nominal morphology. Plains Cree is a polysynthetic language whose nominal morphology relies on prefixes, suffixes and circumfixes. The model of Plains Cree morphology is capable of handling these complex affixation patterns and the morphophonological alternations that they engender. Plains Cree is an endangered Algonquian language spoken in numerous communities across Canada. The language has no agreed-upon standard orthography, and exhibits widespread variation. We describe problems encountered and solutions found, while contextualizing the endeavor in the description, documentation and revitalization of First Nations Languages in Canada.

1 Introduction

The Department of Linguistics at the University of Alberta has a long tradition of working with First Nations communities in Alberta and beyond. Recently a collaboration has begun with Giellatekno, a research institute at the University of Tromsø, which has specialized in creating language technologies, particularly for the indigenous Saami languages of Scandinavia, but also for other languages that have received less attention from the computational linguistic mainstream. This collaboration is currently focusing on developing computational tools for promoting and supporting literacy, language learning and language teaching.

Plains Cree is a morphologically complex language, especially with regard to nouns and verbs. While we are working to develop a complete finite-state model of Plains Cree morphology, we focus on nominal morphology in this paper.

In the first section we briefly describe Plains Cree nominal morphology and give some background on the language. This is followed by details on the model and its implementation. Finally, we discuss the particular situation of developing tools for a language that lacks a formal, agreed-upon standard and the challenges that this presents. We conclude with some comments on the benefits of this technology to language revitalization efforts.

2 Background

2.1 Plains Cree

Plains Cree or nêhiyawêwin is an Algonquian language spoken across the Prairie Provinces in what today is Canada. It forms part of the Cree-Montagnais-Naskapi dialect continuum that stretches from Labrador to British Columbia. Estimates of the number of speakers of Plains Cree vary widely, and the exact number is not known: they range from a high of just over 83,000 (Statistics Canada 2011, for Cree without differentiating among Cree dialects) to as low as 160 (Ethnologue 2013). Wolfart (1973) estimated there to be about 20,000 native speakers, but some recent figures are more conservative.

Regardless of the exact number of speakers, there is general agreement that the language is under threat of extinction. In many, if not most, communities where Cree is spoken, children are learning English as a first language, and encounter Cree only in the language classroom. However, vigorous revitalization efforts are underway and Cree is regarded as one of the Canadian First Nations languages with the best chances to prosper (Cook and Flynn, 2008).

As a polysynthetic language (Wolvengrey, 2011, 35), Plains Cree exhibits substantial morphological complexity. Nouns come in two gender classes: animate and inanimate. Each of these classes is associated with distinct morphological patterns. Both animate and inanimate nouns carry inflectional morphology expressing the grammatical categories of number and locativity. The number suffixes for animate and inanimate nouns are different, the plural being marked by -ak in animates and -a in inanimates. Locativity is marked by a suffix taking the form -ihk (with a number of allomorphs). The locative suffix cannot co-occur with suffixes marking number or obviation, but does occur in conjunction with possessive affixes.

Obviation is a grammatical category marked on animate nouns that indicates relative position on the animacy hierarchy, when there are two third person participants in the same clause. Obviation is expressed through the suffix -a, which forms a mutually exclusive paradigmatic structure with the locative and number suffixes.

The possessor of a noun in Plains Cree is expressed through affixes attached to the noun stem. These affixes mark person and number of the possessor by means of a paradigmatic inflectional pattern that includes both prefixes and suffixes. Since matching prefixes and suffixes need to co-occur with the noun when it is possessed, it is possible to treat such prefix-suffix pairings as circumfixes expressing a single person-number meaning. The noun maskisin in (1) below¹ is marked for third person plural possessors as well as being plural itself. The inanimate gender class is recognizable in the plural suffix -a, which would be -ak in the case of an animate noun.

(1)
omaskisiniwâwa
o-maskisin-iwâw-a
3PL.POSS-shoe-3PL.POSS-PL.IN
‘their shoes’

Nouns also occur with derivational morphology in the form of diminutive and augmentative suffixes. The diminutive suffix is productive and forms taking the diminutive suffix can occur with all the inflectional morphology described above.

(2)
omaskisinisiwâwa
o-maskisin-is-iwâw-a
3PL.POSS-shoe-DIM-3PL.POSS-PL.IN
‘their little shoes’

The particular form of the diminutive, however, varies considerably. For example, the most common form of the suffix is -is. The suffix triggers morphophonemic changes in the stem. For example, the ‘t’ in oskâtâskw- ‘carrot’ changes to ‘c’ (the alveolar affricate [ts]) when the diminutive suffix is present, resulting in the form oskâcâskos. Since oskâtâskw- is a w-final form, a further phonological change occurs, namely the initial vowel in the suffix changes from i to o.

To sum up, Plains Cree nominal morphology allows the following productive pattern types:

(3)
stem+NUM
stem+OBV
stem+LOC
stem+DIM+NUM
stem+DIM+OBV
stem+DIM+LOC
POSS+stem+POSS+NUM
POSS+stem+DIM+POSS+NUM
POSS+stem+DIM+POSS+OBV
POSS+stem+POSS+LOC
POSS+stem+DIM+POSS+LOC

Plains Cree can be written both with the Roman alphabet and with a Syllabary. Theoretically there is a one-to-one match between the two. However, a number of factors complicate this relationship. Differing punctuation conventions, such as capitalization, and the treatment of loanwords make conversion from one writing system to another anything but a trivial matter.

Orthography presents a general problem for the development of computer-based tools, because unlike nationally standardized languages, orthographic conventions can vary considerably from community to community, even from one user to another.
Certain authors have argued for the adoption of orthographic standards for Plains Cree (Okimâsis and Wolvengrey, 2008), but there simply is no centralized institution to enforce orthographic or other standardization. This means that the wealth of varying forms and dialectal diversity of the language are apparent in each individual community. This situation poses specific challenges to the project of developing language tools, challenges that are less often encountered when making spell-checkers and language learning tools for more standardized languages.

¹ The following abbreviations are used: POSS = possessive prefix/suffix; LOC = locative suffix; OBV = obviative suffix; DIM = diminutive suffix; NUM = number marking suffix; IN = inanimate; PL = plural.

Similar situations have been encountered in work on the Saami languages of Scandinavia (Johnson, 2013). Following their work, we include dialectal variants in the model, but mark them with specific tags. This permits a tool such as a spell-checker to be configured to accept and output a subset of the total possible forms in the morphological model. An example here is the distribution of the locative suffix, described in more detail in section 4. There is a disparity between communities regarding the acceptability of the occurrence of the suffix with certain nouns. The suffix can be marked with a tag in the FST model. This tag can then be used to block the acceptance or generation of this particular form. The key notion here is that language learning and teaching tools are built on the basis of the general FST model. For Plains Cree there is one inclusive model, encompassing as much dialectal variation as possible.

Algonquian languages. However, rather than creating a proliferation of dialectal tags, it is easier to reproduce the architecture of the model and use it to create a new model for the related language. This allows the preservation of formal structures that follow essentially the same pattern, such as possessive inflection, while replacing the actual surface forms with those of the target language.

2.2 Previous computational modeling of Algonquian languages

Previous work on Algonquian languages that has taken a computational approach is not extensive. Hewson (1993) compiled a dictionary of Proto-Algonquian terms generated through an algorithm. His data were drawn from fieldwork carried out by Leonard Bloomfield. Kondrak (2002) applied algorithms for cognate identification to Algonquian data with considerable success. Wolfart and Pardo (1973) worked on a sizable corpus of Cree data and developed tools for data management and analysis in PL/I. Junker and Stewart (2008) have written on the difficulties of creating search engine tools for East Cree and describe challenges similar to the ones we have encountered with regard to dialectal variation and the absence of agreed on