Advancing the study of endangered languages with computational tools
The case of Asama verb paradigms

Dimitri Lévêque (Inalco-CNRS-EHESS, CRLAO)
Thomas Pellard (CNRS-EHESS-Inalco, CRLAO)

Benefits of using computational tools

1. Test the accuracy and coverage of the analysis of the morphology.
2. Partially automate the glossing of texts.
3. Enable quantitative measures of morphological complexity.
⇒ Advance the study of endangered languages.

Asama

• Endangered Ryukyuan language (Japonic family).
• Reference grammar project by D. Lévêque.
• 100 full + 400 near-full inflectional verb paradigms collected.
• Non-canonical phenomena (e.g. alternation of tones and vowel length) in verb inflection.

Why computational tools?

• “The insufficiency of paper-and-pencil linguistics” (Karttunen 2006).
• Finite-state transducers (Beesley & Karttunen 2003).
• Bidirectional:
  production: lexeme/root → inflected form;
  recognition: inflected form → lexeme/root.
• Syntax similar to that of phonological rewrite rules familiar to many linguists.
    regex V -> 0 || V V _ ;
    regex i -> ɨ || r _ ;
    regex 0 -> j || s _ a ;
• FSTs have sufficient power to handle (almost) any morphological phenomenon.
• Provide a maximally precise and explicit description.
• Free software implementations (Foma).

Implementation

• Content paradigm cell ⟨L, σ⟩ → realised cell w
    ⟨amjui LH, seq⟩ → aadi̵ HL
• Unlabeled form w → all possible analyses ⟨L, σ⟩
    aadi̵ HL → ⟨amjui LH, seq⟩, ⟨aamjui H, seq⟩, ⟨abjui LH, seq⟩, ⟨aabjui H, seq⟩
• L → full realised paradigm, i.e. a set of cells {⟨w, σ⟩, ⟨w′, σ′⟩, …}.
• Any realised form ⟨w, σ⟩ → list of all possible realised forms of any other paradigm cell.

1 Linguistic documentation

Goals: To speed up and ease the glossing of texts.
Issues: Small amount of data, limited time, usual tools (Toolbox) not well suited to non-concatenative phenomena.
Solution: The transducer produces complete paradigms for all existing lexemes.

    amjui LH → ⟨aman LH, neg⟩
               ⟨amoo H, dimp⟩
               ⟨aami̵ LH, iimp⟩
               ⟨aadi̵ H, seq⟩

    tourner-IPFV-PART-PROH   máwaj-u-N-na
    tourner-IPFV-PST-PART    máwaj-ut-a-N

Results: A simple Python script is used to format the output, which can then be directly imported into Toolbox and used in the process of interlinearisation.

    \lx máwajunna
    \mo máwaj-u-n-na
    \ge tourner-IPFV-PART-PROH

    \lx máwajutaN
    \mo máwaj-ut-a-N
    \ge tourner-IPFV-PST-PART

2 Linguistic analysis

Goals: Check the accuracy of the description and test hypotheses on new lexemes.
Issues: Non-canonical verb morphology, multiple classes.
Description: Two orthogonal sets of rules associated with two sets of classes.

          ‘to sell’        ‘to knit’        ‘to go out’
    npst  ujui H           amjui H          izji̵jui LH
    cvb2  urugɘɘsi̵ H       amjugɘɘsi̵ H      izji̵rugɘɘsi̵ LH
    neg   uran H           aman H           izji̵ran LH
    dimp  uroo H           amoo H           izji̵roo LH
    iimp  uri̵i̵ H           aami̵ HL          izji̵i̵ri̵ LH
    cvb   ui H             amii H           izji̵i̵ LH
    des   uicjaahai H      amicjaahai H     izji̵cjaahai LH
    pst   utan H           adan H           izji̵tan LH
    seq   uti̵i̵ H           aadi̵ HL          izji̵i̵ti̵ LH
    prog  utui LHL         aadui HL         izji̵i̵tui LHL

    T1    T2    T3    T4    T5
    Xbj   -     Xb    Xb    Xd
    Xmj   -     Xm    Xm    Xd
    Xkj   -     Xk    Xk    Xcj
    Xcj   -     Xc    Xt    Xccj
    Xj    X     Xr    Xr    Xcj
    Xj    X     Xr    Xr    Xccj
    Xj    X     X     Xr    Xtt

    Class  npst         imp.dir       seq npst
    I1     RT1ui (H)    RːT4i̵ (H)     RT5ui (H)
    I2     RT1ui (H)    RːT4i̵ (H)     RT5ui (H)
    II1    RT1ui (H)    RT4i̵i̵ (H)     RT5ui (LHL)
    II2    RT1ui (H)    RːT4i̵ (H)     RT5ui (LHL)
    III    RT1ui (LH)   RːT4i̵ (HL)    RːT5ui (HL)
    IV     RT1ui (LH)   RːT4i̵ (LH)    RːT5ui (LHL)
    …
    V      RT1ui (HL)   RT4i̵ (HL)     RT5ui (HL)

Solutions: FSTs make it possible to:
• Check the formal description on lexemes and highlight the problems (comparison between data and output);
• Test the results with the speaker (computation of the whole paradigm for lexemes with incomplete data and check with the speaker).
Results: Robustness of the formal analysis is confirmed by the computational study, and gaps in the data underlying the description are highlighted.

3 Theoretical morphology

Goals: Explore the implicative structure with complexity measures and Shannon entropy, and solve the Paradigm Cell Filling Problem (“What licenses reliable inferences about the inflected (and derived) surface forms of a lexical item?”, Ackerman et al. 2009: 54).
Issues: Too few full paradigms, no pre-existing interactive database.
Solution: Composition of two FSTs in order to obtain, for any given realised cell ⟨w, σ⟩, the list of all possible realised forms of any other paradigm cell.

    omoori̵ LH seq → omotui LHL prog.npst
                    omocjui LHL prog.npst
                    omootui LHL prog.npst
                    omoocjui LHL prog.npst
                    …

Results: Entropy measures help identify the sources of uncertainty (unpredictable segmental alternations, neutralisation of vowel length and neutralisation of tones) and the principal parts of the system (Finkel & Stump 2007).

    H(C|L)      npst    conv    iimp    seq     prog.npst
    npst        -       0.000   0.000   0.222   0.244
    conv        0.923   -       0.082   0.296   0.550
    iimp        0.950   0.211   -       0.222   0.525
    seq         1.362   1.012   0.425   -       0.317
    prog.npst   1.204   0.910   0.423   0.000   -

References

Ackerman, Farrell & Blevins, James P. & Malouf, Robert. 2009. Parts and wholes: Implicative patterns in inflectional paradigms. In Blevins, James P. & Blevins, Juliette (eds.), Analogy in grammar: Form and acquisition, 54–81. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199547548.003.0003.
Ackerman, Farrell & Malouf, Robert. 2013. Morphological organization: The low conditional entropy conjecture. Language 89(3). 429–464. https://doi.org/10.1353/lan.2013.0054.
Beesley, Kenneth & Karttunen, Lauri. 2003. Finite state morphology. Stanford: Center for the Study of Language and Information Publications.
Blevins, James P. 2016. Word and paradigm morphology. Oxford: Oxford University Press.
Bonami, Olivier & Luís, Ana R. 2015. Sur la morphologie implicative dans la conjugaison du portugais: Une étude quantitative. In Léonard, Jean Léo (ed.), Morphologie flexionnelle et dialectologie romane, 111–151. Leuven: Peeters.
Finkel, Raphael & Stump, Gregory. 2007. Principal parts and morphological typology. Morphology 17. 39–75. https://doi.org/10.1007/s11525-007-9115-9.
Hulden, Mans. 2009. Foma: A finite-state compiler and library. In Proceedings of the Demonstrations Session at EACL 2009, 29–32. https://www.aclweb.org/anthology/E09-2008.
Jacques, Guillaume & Lahaussois, Aimée & Michailovsky, Boyd & Rai, Dhan Bahadur. 2012. An overview of Khaling verbal morphology. Language and Linguistics 13(6). 1095–1170. http://www.ling.sinica.edu.tw/Files/LL/Docments/Journals/13.6/j2012_6_03_7314.pdf.
Karttunen, Lauri. 2003. Computing with realizational morphology. In Gelbukh, Alexander (ed.), Computational linguistics and intelligent text processing, 203–214. Berlin: Springer. https://doi.org/10.1007/3-540-36456-0_20.
Karttunen, Lauri. 2006. The insufficiency of paper-and-pencil linguistics: The case of Finnish prosody. In Butt, Miriam & Dalrymple, Mary & Holloway King, Tracy (eds.), Intelligent linguistic architectures: Variations on themes by Ronald M. Kaplan, 287–300. Stanford: Center for the Study of Language and Information Publications. http://roa.rutgers.edu/article/view/828.
Matthews, Peter H. 1972. Morphology. 1st edn. Cambridge: Cambridge University Press.
Pellard, Thomas & Yamada, Masahiro. 2017. Verb morphology and conjugation classes in Dunan (Yonaguni). In Kiefer, Ferenc & Blevins, James P. & Bartos, Huba (eds.), Perspectives on morphological organization, 31–49. Leiden: Brill. https://doi.org/10.1163/9789004342934_004. https://hal.archives-ouvertes.fr/hal-01493096.
Snoek, Conor & Thunder, Dorothy & Lõo, Kaidi & Arppe, Antti & Lachler, Jordan & Moshagen, Sjur & Trosterud, Trond. 2014. Modeling the morphology of Plains Cree. In Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages, 34–42. https://doi.org/10.3115/v1/W14-2205.
Stump, Gregory T. 2001. Inflectional morphology: A theory of paradigm structure. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511486333.
Stump, Gregory & Finkel, Raphael A. 2013. Morphological typology: From word to paradigm. Cambridge: Cambridge University Press.
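
Note on the Toolbox formatting step. The linguistic documentation section states that a simple Python script formats the transducer output for import into Toolbox. That script is not shown on the poster; the following is only a minimal sketch of what such a step might look like. The input layout (one analysis per line, gloss and segmented form separated by a tab), the function name `to_toolbox`, and the choice to derive the `\lx` field by stripping hyphens are all assumptions made for illustration.

```python
def to_toolbox(lines):
    """Turn 'gloss<TAB>segmented-form' lines into Toolbox records.

    Hypothetical helper: the tab-separated input layout is an assumption,
    not the documented output format of the authors' transducer.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between analyses
        gloss, segmented = line.split("\t")
        # Assumption: the lexical entry is the segmented form without hyphens.
        lexeme = segmented.replace("-", "")
        records.append(f"\\lx {lexeme}\n\\mo {segmented}\n\\ge {gloss}")
    # Toolbox records are separated by a blank line.
    return "\n\n".join(records) + "\n"


sample_output = ["tourner-IPFV-PST-PART\tmáwaj-ut-a-N"]
toolbox_text = to_toolbox(sample_output)
```

With the sample line above, `toolbox_text` holds one record with the `\lx`, `\mo` and `\ge` fields in the same order as the records shown on the poster.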
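
Note on the entropy measures. The theoretical morphology section reports conditional entropies H(C|L) between paradigm cells. The poster does not say how these values were computed, so the sketch below only shows the textbook definition of conditional entropy, H(Y|X) = −Σ p(x, y) log₂ p(y|x), applied to paired observations. The function name `conditional_entropy` and the toy pattern labels are hypothetical and are not Asama data.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """Return H(Y|X) in bits for a sequence of (x, y) observations."""
    pairs = list(pairs)
    total = len(pairs)
    joint = Counter(pairs)                    # counts of (x, y)
    marginal = Counter(x for x, _ in pairs)   # counts of x
    # H(Y|X) = -sum over (x, y) of p(x, y) * log2 p(y | x)
    return -sum(
        (count / total) * log2(count / marginal[x])
        for (x, _), count in joint.items()
    )


# Hypothetical toy data: for each lexeme, the pattern it shows in a known
# cell (first item) and in the cell to be predicted (second item).
toy_pairs = [("A", "x"), ("A", "x"), ("A", "y"), ("B", "z")]

uncertainty_forward = conditional_entropy(toy_pairs)
uncertainty_backward = conditional_entropy([(y, x) for x, y in toy_pairs])
```

In this toy set, pattern A is compatible with two outcomes, so predicting the second cell from the first leaves some uncertainty, whereas each second-cell pattern identifies the first-cell pattern uniquely and the reverse entropy is zero. Asymmetries of this kind are what the H(C|L) table on the poster displays.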