
Modeling the Noun Morphology of Plains Cree

Conor Snoek¹, Dorothy Thunder¹, Kaidi Lõo¹, Antti Arppe¹, Jordan Lachler¹, Sjur Moshagen², Trond Trosterud²
¹ University of Alberta, Canada, ² University of Tromsø, Norway

Abstract

This paper presents aspects of a computational model of the morphology of Plains Cree based on the technology of finite state transducers (FST). The paper focuses in particular on the modeling of nominal morphology. Plains Cree is a polysynthetic language whose nominal morphology relies on prefixes, suffixes and circumfixes. The model of Plains Cree morphology is capable of handling these complex affixation patterns and the morphophonological alternations that they engender. Plains Cree is an endangered Algonquian language spoken in numerous communities across Canada. The language has no agreed-upon standard orthography, and exhibits widespread variation. We describe problems encountered and solutions found, while contextualizing the endeavor in the description, documentation and revitalization of First Nations languages in Canada.

1 Introduction

The Department of Linguistics at the University of Alberta has a long tradition of working with First Nations communities in Alberta and beyond. Recently a collaboration has begun with Giellatekno, a research institute at the University of Tromsø, which has specialized in creating language technologies, particularly for the indigenous Saami languages of Scandinavia, but also for other languages that have received less attention from the computational linguistic mainstream. This collaboration is currently focusing on developing computational tools for promoting and supporting literacy, language learning and language teaching.

Plains Cree is a morphologically complex language, especially with regard to nouns and verbs. While we are working to develop a complete finite-state model of Plains Cree morphology, we focus on nominal morphology in this paper.

In the first section we briefly describe Plains Cree nominal morphology and give some background on the language. This is followed by details on the model and its implementation. Finally, we discuss the particular situation of developing tools for a language that lacks a formal, agreed-upon standard and the challenges that this presents. We conclude with some comments on the benefits of this technology to language revitalization efforts.

2 Background

2.1 Plains Cree

Plains Cree or nêhiyawêwin is an Algonquian language spoken across the Prairie Provinces in what today is Canada. It forms part of the Cree-Montagnais-Naskapi continuum that stretches from Labrador to . Estimates of the number of speakers of Plains Cree vary widely, and the exact number is not known: they range from a high of just over 83,000 (Statistics Canada 2011, for Cree without differentiating among Cree dialects) to as low as 160 ( 2013). Wolfart (1973) estimated there to be about 20,000 native speakers, but some recent figures are more conservative.

Regardless of the exact number of speakers, there is general agreement that the language is under threat of extinction. In many, if not most, communities where Cree is spoken, children are learning English as a first language, and encounter Cree only in the language classroom. However, vigorous revitalization efforts are underway and Cree is regarded as one of the Canadian First Nations languages with the best chances to prosper (Cook and Flynn, 2008).

As a polysynthetic language (Wolvengrey,

2011, 35), Plains Cree exhibits substantial morphological complexity. Nouns come in two gender classes: animate and inanimate. Each of these classes is associated with distinct morphological patterns. Both animate and inanimate nouns carry inflectional morphology expressing the grammatical categories of number and locativity. The number suffixes for animate and inanimate nouns are different, the plural being marked by -ak in animates and -a in inanimates. Locativity is marked by a suffix taking the form -ihk (with a number of allomorphs). The locative suffix cannot co-occur with suffixes marking number or obviation, but does occur in conjunction with possessive affixes.

Obviation is a grammatical category marked on animate nouns that indicates relative position on the animacy hierarchy when there are two third person participants in the same clause. Obviation is expressed through the suffix -a, which forms a mutually exclusive paradigmatic structure with the locative and number suffixes.

The possessor of a noun in Plains Cree is expressed through affixes attached to the noun stem. These affixes mark person and number of the possessor by means of a paradigmatic inflectional pattern that includes both prefixes and suffixes. Since matching prefixes and suffixes need to co-occur with the noun when it is possessed, it is possible to treat such prefix-suffix pairings as circumfixes expressing a single person-number meaning. The noun maskisin in (1) below¹ is marked for third person plural possessors as well as being plural itself. The inanimate gender class is recognizable in the plural suffix -a, which would be -ak in the case of an animate noun.

(1)
    omaskisiniwâwa
    o-maskisin-iwâw-a
    3PL.POSS-shoe-3PL.POSS-PL.IN
    ‘their shoes’

¹ The following abbreviations are used: POSS = possessive prefix/suffix; LOC = locative suffix; OBV = obviative suffix; DIM = diminutive suffix; NUM = number marking suffix; IN = inanimate; PL = plural.

Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 34–42, Baltimore, Maryland, USA, 26 June 2014. © 2014 Association for Computational Linguistics

Nouns also occur with derivational morphology in the form of diminutive and augmentative suffixes. The diminutive suffix is productive, and forms taking the diminutive suffix can occur with all the inflectional morphology described above.

(2)
    omaskisinisiwâwa
    o-maskisin-is-iwâw-a
    3PL.POSS-shoe-DIM-3PL.POSS-PL.IN
    ‘their little shoes’

The particular form of the diminutive suffix, however, varies considerably. For example, the most common form of the suffix is -is. The suffix triggers morphophonemic changes in the stem. For example, the ‘t’ in oskâtâskw- ‘carrot’ changes to ‘c’ (the alveolar affricate [ts]) when the diminutive suffix is present, resulting in the form oskâcâskos. Since oskâtâskw- is a w-final form, a further phonological change occurs, namely that the initial vowel of the suffix changes from i to o.

To sum up, Plains Cree nominal morphology allows the following productive pattern types:

(3)
    stem+NUM
    stem+OBV
    stem+LOC
    stem+DIM+NUM
    stem+DIM+OBV
    stem+DIM+LOC
    POSS+stem+POSS+NUM
    POSS+stem+DIM+POSS+NUM
    POSS+stem+DIM+POSS+OBV
    POSS+stem+POSS+LOC
    POSS+stem+DIM+POSS+LOC

Plains Cree can be written both with the Roman alphabet and with a Syllabary. Theoretically there is a one-to-one match between the two. However, a number of factors complicate this relationship. Differing punctuation conventions, such as capitalization, and the treatment of loanwords make conversion from one writing system to another anything but a trivial matter.

Orthography presents a general problem for the development of computer-based tools because, unlike in nationally standardized languages, orthographic conventions can vary considerably from community to community, even from one user to another. Certain authors have argued for the adoption of orthographic standards for Plains Cree (Okimâsis and Wolvengrey, 2008), but there simply is no centralized institution to enforce

orthographic or other standardization. This means that the wealth of varying forms and dialectal diversity of the language are apparent in each individual community. This situation poses specific challenges to the project of developing language tools that are more seldom encountered when making spell-checkers and language learning tools for more standardized languages.

Similar situations have been encountered in work on the Saami languages of Scandinavia (Johnson et al., 2013). Following their work, we include dialectal variants in the model, but mark them with specific tags. This permits a tool such as a spell-checker to be configured to accept and output a subset of the total possible forms in the morphological model. An example here is the distribution of the locative suffix described in more detail in section 4. There is a disparity between communities regarding the acceptability of the occurrence of the suffix with certain nouns. The suffix can be marked with a tag in the FST-model. This tag can then be used to block the acceptance or generation of this particular form. The key notion here is that language learning and teaching tools are built on the basis of the general FST model. For Plains Cree there is one inclusive model, encompassing as much dialectal variation as possible. From this, individual tools are created, e.g. spell-checkers, that select an appropriate subset of the dialectally marked forms. A community can therefore have its own spell-checker, specific to its own preferences. It is also possible to allow for “spelling relaxations” (Johnson et al., 2013, 67) at the level of user input, meaning that variant forms will be recognized, while constraining the output to a selection of forms deemed appropriate for a given community. Hence, the spell-checker used in one particular community could accept certain noun-locative combinations. At the same time, other tools, such as paradigm learning applications, could block this particular noun-locative combination from being generated: certain forms are understood, but not taught by the model. In general, the variation is not difficult to deal with in terms of the model itself; rather, it represents a difficulty in the availability of accurate descriptions, since the specifics of the variation must be known and understood to be successfully included in the model.

This method could, in principle, be used to extend the Plains Cree FST-model to closely related languages. However, rather than creating a proliferation of dialectal tags, it is easier to reproduce the architecture of the model and use it to create a new model for the related language. This allows the preservation of formal structures that follow essentially the same pattern, such as possessive inflection, while replacing the actual surface forms with those of the target language.

2.2 Previous computational modeling of Algonquian languages

Previous work on Algonquian languages that has taken a computational approach is not extensive. Hewson (1993) compiled a dictionary of Proto-Algonquian terms generated through an algorithm. His data were drawn from fieldwork carried out by Leonard Bloomfield. Kondrak (2002) applied algorithms for cognate identification to Algonquian data with considerable success. Wolfart and Pardo (1973) worked on a sizable corpus of Cree data and developed tools for data management and analysis in PL/I. Junker and Stewart (2008) have written on the difficulties of creating search engine tools for Algonquian languages and describe challenges similar to the ones we have encountered with regard to dialectal variation and the absence of agreed-on standard orthographies and other widespread conventions.

In general, computational approaches to Algonquian and other Indigenous North American languages have been hampered by the fact that in many cases large bodies of data on which to develop and test methods are simply not available. Even for Plains Cree, which is relatively widely spoken and relatively well documented, the available descriptions are still lacking in many places. As a result, fieldwork must be undertaken in order to establish patterns that can be modeled in the formalism necessary for the finite state transducer (FST) to work, a point that will be expanded on below.

3 Modeling Plains Cree morphology

The finite state transducer technology that forms the backbone of our morphological model, and consequently of all the language applications we are currently developing, is based historically on work on computational modeling of natural languages known as two-level morphology (TWOL) by Koskenniemi (1983). His ideas were further developed by Beesley and Karttunen (2003).

Their framework offers two basic formalisms with which to encode linguistic data, lexc and twolc. The Lexicon Compiler, or lexc, is “a high-level declarative language and associated compiler” (Beesley and Karttunen, 2003, 203) used for encoding stem forms and basic concatenative morphology. The source files are structured in terms of a sequence of continuation lexica. Beginning with an inventory of stems, the continuation lexica form states along a path, adding surface morphological forms and underlying analytic structure at each stage. A colon (:) separates underlying and surface forms. Example (4) demonstrates paths through just three continuation lexica for the animate nouns apiscacihkos ‘antelope’ and apisimôsos ‘deer’. By convention, the names of continuation lexica are given in upper case. Stems and affixes represent actual forms, and are thus given in lower case. The ‘+’ sign indicates a morphological tag.

(4)
    ! inventory of animate noun stems
    LEXICON ANSTEMLIST
    apiscacihkos ANDECL ;
    apisimôsos ANDECL ;
    ! tags for noun and animate; end here (singular) or continue to obviative
    LEXICON ANDECL
    < +N:0 +AN:0 +Sg:0 @U.noun.abs@ # > ;
    < +N:0 +AN:0 @U.noun.abs@ OBVIATIVE > ;
    ! obviative suffix -a
    LEXICON OBVIATIVE
    < +Obv:a # > ;

Both forms are directed to the continuation lexicon here named ANDECL, which provides some morphological tagging in the form of +N to mark the word as a noun and +AN to denote the gender class ‘animate’. Each of the two nouns has the possibility of passing through the continuation lexicon ANDECL as an ‘absolutive’ noun, as indicated by the tag @U.noun.abs@ (a flag diacritic, as will be explained below). The colons in the code indicate a distinction between upper and lower levels of the transducer. The upper form to the left of the colon is a string containing the lemma as well as a number of tags that contain information about grammatical properties. For the word form apiscacihkos, the analysis once it has passed through the ANDECL continuation lexicon is apiscacihkos+N+AN+Sg.

The surface forms apiscacihkos and apisimôsos are well-formed strings of Plains Cree, following the Standard Roman Orthography. Hence, the path can terminate here as indicated by the hash mark. The other path, also open to both forms since they pass through the same continuation lexicon, leads to a further continuation lexicon named OBVIATIVE. This rather small lexicon adds a final -a suffix and the tag +Obv, indicating that the form is inflected for the grammatical category of obviation. Since no number suffixes can occur in this form, the path does not add a +Sg or +Pl tag to the underlying form.

(5)
    apiscacihkos+N+AN+Obv
    apiscacihkosa
    ‘antelope’

The possessive circumfixes described in section 2.1 were modeled using Flag Diacritics, which are an “extension of the finite state implementation, providing feature-setting and feature-unification operations” (Beesley and Karttunen, 2003, 339). Flag diacritics make it possible for the transducer to remember earlier states. The transducer may travel all paths through the prefixes via thousands of stems to all the suffixes, but the flag diacritics ensure that only strings with prefixes and suffixes belonging to the same person-number value are generated. In our solution for nouns, the continuation lexica allow all combinations of possession suffixes and prefixes, but the flag diacritics serve to filter out all undesired combinations. For example, in the noun omaskisiniwâwa from (1) above, the third person prefix o- and the suffix marking both person and number -iwâw are annotated in the lexc file with identical flag diacritics, so that they will always occur together.

Plains Cree has some very regular and predictable morphophonological alternations that can be modeled successfully in the finite state transducer framework. The formalism used here is not lexc, as in the listing of stems and the concatenative morphology, but an additional formalism called the two-level compiler or twolc that is well suited to this task. The twolc formalism was developed by Lauri Karttunen, Todd Yampol, Kenneth R. Beesley and Ronald M. Kaplan based on ideas set forth in Koskenniemi (1983).
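The filtering effect of flag diacritics on possessive circumfixes, as described above, can be illustrated with a small Python sketch. It is not lexc and does not reproduce the actual model: the morphemes o-, ni-, maskisin, -iwâw and -a are taken from the examples in this paper, while the flag names and values are hypothetical labels chosen only for illustration. The sketch freely combines prefixes, stems and suffixes, then discards any path on which two flags for the same feature disagree.

```python
from itertools import product

# Each morpheme is (surface string, flag); a flag is (feature, value) or None.
# The morphemes come from the paper's examples; the flag names and values
# are hypothetical labels, not the flags used in the actual lexc files.
PREFIXES = [("o", ("poss", "3pl")), ("ni", ("poss", "1sg"))]
STEMS = [("maskisin", None)]
POSS_SUFFIXES = [("iwâw", ("poss", "3pl"))]
NUMBER_SUFFIXES = [("a", None)]


def unify(path):
    """Return the surface string for a path, or None if two flags clash.

    The first flag seen for a feature is remembered, and every later flag
    for that feature must carry the same value (a unification-style check).
    """
    remembered = {}
    for _, flag in path:
        if flag is None:
            continue
        feature, value = flag
        if remembered.setdefault(feature, value) != value:
            return None
    return "".join(surface for surface, _ in path)


def possessed_forms():
    """Try every prefix/stem/suffix combination and keep the consistent ones."""
    candidates = product(PREFIXES, STEMS, POSS_SUFFIXES, NUMBER_SUFFIXES)
    return [form for form in map(unify, candidates) if form is not None]
```

In this toy inventory the only surviving combination is the form from example (1); the path that pairs the first person prefix ni- with the third person plural suffix -iwâw is rejected because its two flags carry different values.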

(6)
    acâwêwikamikosis
    atâwêwikamikw-isis
    store-DIM
    ‘little store’

In (6) above, atâwêwikamikw- ‘store’ is modified by the derivational suffix -isis marking the diminutive form. This derivation is highly productive in Plains Cree. The underlying form of the suffix is -isis, but in conjunction with a stem-final -w, the initial vowel of the suffix changes to -o. This morphophonemic alternation can be written in twolc much like a phonological rule:

(7)
    i:o <=> w: %>:0 s: +Dim ;

The sign %> is used to mark a suffix boundary, which, along with the +Dim tag, ensures that it is the first vowel of the suffix that undergoes substitution. Thus the context is given by the occurrence of a -w before the suffix boundary, i.e. stem finally. An additional complication here is that the presence of the diminutive suffix in a form again triggers a phonological change in the stem by which all t’s change to c’s (phonetically [ts]). In twolc the rules dictating morphophonological alternations apply in parallel, avoiding possible problems caused by sequential rule interactions. The noun completes the path through the continuation lexica and is passed to twolc as atâwêwikamikwisis. There it undergoes two morphophonological changes, giving the correct surface form acâwêwikamikosis.

Twolc is a powerful mechanism for dealing with regular alternations. Reliance on twolc can reduce the number of continuation lexica and hence the complexity of the morphology modeling carried out in lexc. The downside of using large numbers of twolc rules is the increasing complexity of rule interactions. We have found that decisions about which strategy to pursue in the modeling of a particular morphological pattern must frequently be made on a case by case basis. For example, in modeling the interesting case of the form atimw- ‘dog’ several strategies needed to be employed. The form triggers a vowel change i > o in conjunction with the diminutive suffix -isis resulting in -osis, a change falling under the rule described in (7) above. A further change here is that the t in atimw- ‘dog’ changes to c when the diminutive suffix is present, resulting in the surface form acimosis. Both these changes can be handled by twolc rules such as the one exemplified in (7) above. However, atimw- also undergoes changes in the stem vowel when the noun is marked for a possessor so that a > i and i > ê. In the first person, the possessive prefix takes the form ni-, leading to a sequence of two vowels arising from the prefix final -i- and stem initial -i-, which is not permitted in Plains Cree. This situation is handled by a general rule deleting the first vowel in preference for the latter. However, a set of twolc rules would be required to change the stem vowels, a set that would be specific to this particular word only. The full set of two-level rules is accessible online². Since the addition of further rules poses the risk of rule conflicts in an increasingly complex twolc code, the stem vowel changes are handled in lexc instead. There are currently over 40 continuation lexica in the model of nominal morphology alone.

² https://victorio.uit.no/langtech/trunk/langs/crk/src/phonology/crk-phon.twolc

(8)
    LEXICON IRREGULARANIMATESTEMS
    atim IRREGULARINFLECTION-1 ;
    atim:têm IRREGULARINFLECTION-2 ;

The continuation lexicon contains two versions of the form atim with two different paths leading to further inflectional suffixes. In the second instance of atim, writing the base form to the left of the colon and the suppletive stem to the right ensures both that the form -têm surfaces correctly and that the base form atim can still be recovered in the analysis. The forms are sent to differing continuation lexica, since only the suppletive form occurs within the paradigm of possessive prefixes. The word meaning ‘my little dog’ is given as an example in (9) below.

(9)
    nicêmisis
    ni-atimw-isis
    1SG.POSS-dog-DIM
    ‘my little dog’

The suppletive form also does not carry an underlying -w and hence no longer triggers the vowel change in the diminutive suffix. With this

solution we can handle the regular and more straightforward morphophonological alternations in twolc, while avoiding undue complexity by modeling the suppletive forms in lexc.

Finally, we have adopted a system of using special tags to denote dialectal variants that are not equally acceptable in different communities. The variation found in Plains Cree can be attributed to several factors described in more detail in the next section. The variation is dealt with in the morphological model with a tagging strategy that marks dialectal forms. This tagging allows systems based on the morphological model to behave in accordance with the wishes of the user or community of users. In the setting of a particular teaching institution, for instance, only a certain subset of the variants encoded in the morphological model might be deemed acceptable. Our model permits this community to adjust the applications they are employing, e.g. a spell-checker, so that their community-specific forms are accepted as correct.

The stems are accessible online³, and may be analysed and generated at the webpage for Plains Cree grammar tools⁴.

³ https://victorio.uit.no/langtech/trunk/langs/crk/src/morphology/stems/
⁴ http://giellatekno.uit.no/cgi/index.crk.eng.html

4 The necessity for fieldwork in modeling Plains Cree

We began working on the morphological model of Plains Cree by examining published sources, such as Plains Cree: A grammatical study (Wolfart, 1973) and Cree: Language of the Plains (Okimâsis, 2004). Okimâsis’ work is clearly structured and contains a wealth of information. Nevertheless, the level of explicitness required to capture the nature of a language in enough detail for applications such as spell-checkers is beyond the scope of her work. That is to say, in formalizing Okimâsis’ description we needed to generalize grammatical patterns that were not always explicitly spelled out in every detail in her work. It should be apparent that a number of factors come into play here that make working on Plains Cree quite a different undertaking from working on a European language with a long history of research in the Western academic tradition. While official national European languages such as German, Finnish or Estonian can look back on scholarly work dating back some centuries, and are supported by work from a community of specialists numbering hundreds of people, work on Plains Cree (and other languages in similar situations) is being carried out by what is at best a handful of people. While Cree language specialists form a professional body of researchers with a proud tradition, they are faced with the enormous task of documenting a language spoken in many small communities spread over a huge geographical area. In addition, many of those specialists are also involved with language revitalization and language teaching, with the result that less time can be devoted to language description, scholarship and the pursuit of larger projects such as the development of corpora. While such projects are under development in many areas, the demands placed on individual researchers and activists have resulted in an overall scarcity of resources. While Plains Cree is relatively well documented compared to other Indigenous languages spoken in Canada, many of the resources that would be desirable assets for the development of a finite state model are not available. As a result, we have carried out fieldwork to make the full inflectional paradigm of nouns in Plains Cree more fully explicit.

There is considerable variation among speakers and specialists regarding the acceptability of certain inflectional possibilities. For example, in the case of one animate noun, atim ‘dog’, it seems formally reasonable to allow its combination with the locative suffix -ohk, rendering atimohk. This combination of stem and affix was considered impossible or at least implausible by some of our native speaker consultants. However, the form itself does occur, albeit in the guise of a place name for a lake island in northern  named atim ‘dog’. Therefore the form atimohk ‘on the dog’ with locative suffix attached can occur in this very specific and geographically bounded context⁵. The way of coping with this is to lexicalize atimohk as the locative of the island Atim, and to keep the noun atim outside the set of nouns getting regular locatives.

⁵ Thanks to Jan Van Eijk for pointing this out.

Further inquiry into this matter revealed that some speakers see the locative suffix as potentially occurring quite widely, while others are more restrictive (Arok Wolvengrey, p.c.). Here again there is a problem of scale: individual speakers of

any language have only a partial experience of the possible extent of the language. In the modeling of the morphology for the purposes of such technologies as spell-checkers, for example, the experience of any potential speaker must be taken into account. While the information that this particular form is rare or semantically not well-formed is valuable, retaining the form is important if the model is to cover the range of potential usage patterns of all Plains Cree speakers. Ideally, if the written use of the language is supported by the tools that can be developed based on our morphological model, that would lead to a gradually increasing electronic corpus of texts, providing frequency information on both the stems and morphological forms.

We have developed a workflow in which we construct the maximal paradigms that are theoretically possible and then submit them to intense native speaker scrutiny. Only once native speakers and specialists have approved the forms do they become part of the actual model. The paradigms are chosen so as to provide coverage of the entire span of morphologically possible forms as well as all morphophonemic alternations. As such they present a maximal testbed for the patterns encoded in the formalism. Each paradigm consists of about sixty inflected forms.

Overall, a careful balance must be struck between directly explicit speaker/specialist input and theoretically possible forms. We aim to achieve this balance by taking a threefold approach: first, by careful consultation with speakers and specialists; second, by building a corpus⁶ which can serve as a testing ground for the morphological analyzer and as a source of data; and third, by working closely with communities willing to test the model and provide feedback.

⁶ As noted above, a tool like a spell-checker promotes literacy and hence contributes naturally to the increase in textual materials. Until that begins to happen, however, we are collecting texts through recording and transcription.

5 Applications in language teaching and revitalization

The development of an explicit model of the morphology of Plains Cree as outlined above is of benefit not just to researchers but also to those involved in teaching and revitalizing the language within their home communities. Using the general technological infrastructure developed by the researchers at Giellatekno, we are able to take the FST model of Plains Cree morphology and use it to create in one go a variety of language tools including a spellchecker, a morphological analyzer and a paradigm generator, which can be integrated as modules within general software applications such as a word-processor, an electronic dictionary or an intelligent computer-aided language learning (ICALL) application. Each of these tools can assist fluent speakers, as well as new learners, in their use of Plains Cree as a written language.

The spellchecking functionality within a word-processor will be a valuable tool for the small-but-growing number of Plains Cree language professionals who are engaged in the development of teaching and literary resources for the language. It will allow for greater accuracy and consistency in spelling, as well as faster production of materials. Because dialectal variation is being encoded directly into the FST model, the spellchecker can be configured so that writers from all communities can use this tool, without worry that the technology is covertly imposing particular orthographic standards which the communities have not all agreed upon.

The morphological analysis functionality built from the FST model and integrated within e.g. a web-based electronic dictionary will allow readers to highlight Plains Cree text in a document or webpage to perform a lookup of words in any inflected form, and not only with the citation (base) form. This will enable readers to more easily read Plains Cree documents with unfamiliar words without needing to stop to repeatedly consult paper dictionaries and grammars. While this does not obviate the need for printed resources in learning and teaching of the language, such added functionality can greatly increase the pace at which texts are read through by language learners. This is not inconsequential, as it can considerably slow down the onset of weariness brought on by needing to interrupt the reading process to consult reference materials, and hence maintain the motivation for language learning.

The paradigm generation functionality within e.g. an electronic dictionary allows users to select a word and receive the full, or alternatively a smaller but representative, inflected paradigm of that word. This will be of direct benefit to instructors developing materials to teach the complex morphology of Plains Cree, as well as their students.
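As a rough illustration of paradigm generation combined with community-specific blocking of forms, the following Python sketch builds a small non-possessed paradigm by plain concatenation. It is a toy under stated assumptions, not the FST-based generator: the suffix shapes (-ak, -a, -ihk) and the tag layout follow the examples in this paper, the +Pl and +Loc tag names and the `blocked` setting are hypothetical, and allomorphy and morphophonology are ignored, so the output would be wrong for many real stems.

```python
# Suffix shapes quoted in the paper. Allomorphy (e.g. of -ihk) and all
# morphophonology are ignored, so many real stems would come out wrong.
SUFFIXES = {
    "AN": {"Sg": "", "Pl": "ak", "Obv": "a", "Loc": "ihk"},
    "IN": {"Sg": "", "Pl": "a", "Loc": "ihk"},
}


def paradigm(stem, gender, blocked=frozenset()):
    """Map analysis strings such as 'stem+N+AN+Obv' to naive surface forms.

    `blocked` is a hypothetical per-community or per-tool setting: a set of
    (stem, category) pairs that should not be generated.
    """
    forms = {}
    for category, suffix in SUFFIXES[gender].items():
        if (stem, category) in blocked:
            continue
        forms[f"{stem}+N+{gender}+{category}"] = stem + suffix
    return forms


# Full toy paradigm for an animate noun.
antelope = paradigm("apiscacihkos", "AN")

# The same noun with its locative suppressed; which combinations a community
# would actually block is an empirical question, so this choice is invented.
restricted = paradigm("apiscacihkos", "AN", blocked={("apiscacihkos", "Loc")})
```

The point of the design is that blocking is applied when a tool is configured rather than by removing forms from the inclusive model, which mirrors the idea that certain forms can be understood by one tool but not taught by another.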

We are working in collaboration with Plains Cree communities in the development and piloting of these tools, to ensure their accuracy and their usefulness for teachers, developers, learners and other community members. The full range of uses that these tools will be put to will only become apparent over time, but we expect that they will have a positive impact on community language maintenance by supporting the continued development of Plains Cree literacy.

6 Conclusion

We have found the technology of Finite State Transducers particularly useful in developing language applications for Plains Cree because it permits us to integrate native speaker competence and specialized linguistic understanding of grammatical structures into the model directly.

At present the analyzer contains 72 nominal lexemes, carefully chosen to cover all morphological and morphophonological aspects of the Plains Cree nominal system. Once the morphological modeling of this core set of nouns has been finalized, scaling up the lexicon will be a trivial task, as all lexicographic resources classify their stems in the same way as is done in the morphological transducer.

We have described our method of working with native speaker specialists and how their insights are reflected in the design of the model. This way of working also leaves ample room for interaction with language teachers, learners and activists, so that we make our work truly useful to the effort of preserving and revitalizing the precious cultural heritage that is Plains Cree. We are aware of the limits of tools that relate primarily to the written forms for languages that have rich oral histories and cultures, but feel that writing and reading Plains Cree will play an ever growing role in the future of this language.

This work makes practical contributions to linguistic research on Plains Cree. On the one hand, creating the model required the formalization of many aspects of Plains Cree morphology which had not previously been spelled out in full detail, i.e. it makes explicit what is known, or not known, about Plains Cree morphology, and thus allows us to extend the description of Plains Cree morphology accordingly. On the other, the morphological analyses can aid in future linguistic discovery, especially when used in conjunction with corpora.

In the future, we will continue to expand the morphological model both in its grammatical coverage and in the size of the lexical resources which go into it. In regard to the latter, we are working with Cree-speaking communities in Alberta to expand on existing dictionaries and develop collections of recordings. The development of this morphological model has led us to carry out fieldwork on Plains Cree and to actively engage with Cree-speaking communities. We have worked hard to bridge the unfortunate gap that sometimes forms between the linguistic work being carried out within academia and the needs of communities that are active in language documentation and revitalization. We look forward to further fruitful cooperation between activists, educators and researchers.

Acknowledgments

Building a computational model of Plains Cree morphology is a task that relies on the knowledge, time and goodwill of many people. We thank the University of Alberta’s Killam Research Fund Cornerstones Grant for supporting this project. We would like to acknowledge in particular the crucial advice, attention and effort of Jean Okimâsis and Arok Wolvengrey, and thank them for the resources they have contributed. We wish also to thank Jeff Muehlbauer for his time and materials, as well as the attendees of the first Prairies Workshop on Language and Linguistics for their insights and expertise. Further, it is important to acknowledge the helpfulness of Earle Waugh, who at the very start of our project made his dictionary available to us, and who has been very supportive. Arden Ogg has worked tirelessly to build connections among researchers working on Cree, which has greatly promoted and facilitated our work. Ahmad Jawad and Intellimedia, Inc., who have for some time provided the technological platform to make available a number of Plains Cree dictionaries through a web-based interface, have given us invaluable assistance in terms of resources and introductions. We would also especially like to thank the staff at Miyo Wahkohtowin Education for their wonderful enthusiasm, and for welcoming us into their community. Last but by no means least, we are indebted to innumerable Elders and native speakers of Plains Cree whose contributions have made possible all the dictionaries and text collections we are fortunate to have today.
