SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection

Ekaterina Vylomova [1], Jennifer White [2], Elizabeth Salesky [3], Sabrina J. Mielke [3], Shijie Wu [3], Edoardo Ponti [2], Rowan Hall Maudslay [2], Ran Zmigrod [2], Josef Valvoda [2], Svetlana Toldova [4], Francis Tyers [11,4], Elena Klyachko [4], Ilya Yegorov [5], Natalia Krizhanovsky [6], Paula Czarnowska [2], Irene Nikkarinen [2], Andrew Krizhanovsky [6], Tiago Pimentel [2], Lucas Torroba Hennigen [2], Christo Kirov [7], Garrett Nicolai [8], Adina Williams [9], Antonios Anastasopoulos [10], Hilaria Cruz [12], Eleanor Chodroff [13], Ryan Cotterell [2,14], Miikka Silfverberg [8], Mans Hulden [15]

[1] University of Melbourne, [2] University of Cambridge, [3] Johns Hopkins University, [4] Higher School of Economics, [5] Moscow State University, [6] Karelian Research Centre, [7] Google AI, [8] University of British Columbia, [9] Facebook AI Research, [10] Carnegie Mellon University, [11] Indiana University, [12] University of Louisville, [13] University of York, [14] ETH Zürich, [15] University of Colorado Boulder

[email protected] [email protected]

Proceedings of the Seventeenth SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 1–39, Online, July 10, 2020. © 2020 Association for Computational Linguistics. https://doi.org/10.18653/v1/P17

Abstract

A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems’ ability to generalize across typologically distinct languages, many of which are low resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrate utility of data hallucination and augmentation, ensembles, and multilingual training for low-resource languages. Non-neural learners and manually designed grammars showed competitive and even superior performance on some languages (such as Ingrian, Tajik, Tagalog, Zarma, Lingala), especially with very limited data. Some language families (Afro-Asiatic, Niger-Congo, Turkic) were relatively easy for most systems and achieved over 90% mean accuracy while others were more challenging.

1 Introduction

Human language is marked by considerable diversity around the world. Though the world’s languages share many basic attributes (e.g., Swadesh, 1950 and more recently, List et al., 2016), grammatical features, and even abstract implications (proposed in Greenberg, 1963), each language nevertheless has a unique evolutionary trajectory that is affected by geographic, social, cultural, and other factors. As a result, the surface form of languages varies substantially. The morphology of languages can differ in many ways: Some exhibit rich grammatical case systems (e.g., 12 in Erzya and 24 in Veps) and mark possessiveness, others might have complex verbal morphology (e.g., Oto-Manguean languages; Palancar and Léonard, 2016) or even “decline” nouns for tense (e.g., Tupi–Guarani languages). Linguistic typology is the discipline that studies these variations by means of a systematic comparison of languages (Croft, 2002; Comrie, 1989). Typologists have defined several dimensions of morphological variation to classify and quantify the degree of cross-linguistic variation. This comparison can be challenging as the categories are based on studies of known languages and are progressively refined with documentation of new languages (Haspelmath, 2007). Nevertheless, to understand the potential range of morphological variation, we take a closer look at three dimensions here: fusion, inflectional synthesis, and position of case affixes (Dryer and Haspelmath, 2013).

Fusion, our first dimension of variation, refers to the degree to which morphemes bind to one another in a phonological word (Bickel and Nichols, 2013b). Languages range from strictly isolating (i.e., each morpheme is its own phonological word) to concatenative (i.e., morphemes bind together within a phonological word); non-linearities such as ablaut or tonal morphology can also be present. From a geographic perspective, isolating languages are found in the Sahel Belt in West Africa, Southeast Asia and the Pacific. Ablaut–concatenative morphology and tonal morphology can be found in African languages. Tonal–concatenative morphology can be found in Mesoamerican languages (e.g., Oto-Manguean). Concatenative morphology is the most common system and can be found around the world. Inflectional synthesis, the second dimension considered, refers to whether grammatical categories like tense, voice or agreement are expressed as affixes (synthetic) or individual words (analytic) (Bickel and Nichols, 2013c). Analytic expressions are common in Eurasia (except the Pacific Rim, and the Himalaya and Caucasus mountain ranges), whereas synthetic expressions are used to a high degree in the Americas. Finally, affixes can variably surface as prefixes, suffixes, infixes, or circumfixes (Dryer, 2013). Most Eurasian and Australian languages strongly favor suffixation, and the same holds true, but to a lesser extent, for South American and New Guinean languages (Dryer, 2013). In Mesoamerican languages and African languages spoken below the Sahara, prefixation is dominant instead.

These are just three dimensions of variation in morphology, and the cross-linguistic variation is already considerable. Such cross-lingual variation makes the development of natural language processing (NLP) applications challenging. As Bender (2009, 2016) notes, many current architectures and training and tuning algorithms still present language-specific biases. The most commonly used language for developing NLP applications is English. Along the above dimensions, English is productively concatenative, a mixture of analytic and synthetic, and largely suffixing in its inflectional morphology. With respect to languages that exhibit inflectional morphology, English is relatively impoverished.¹ Importantly, English is just one morphological system among many. A larger goal of natural language processing is that the system work for any presented language. If an NLP system is trained on just one language, it could be missing important flexibility in its ability to account for cross-linguistic morphological variation.

¹ Note that many languages exhibit no inflectional morphology e.g., Mandarin Chinese, Yoruba, etc.: Bickel and Nichols (2013a).

In this year’s iteration of the SIGMORPHON shared task on morphological reinflection, we specifically focus on typological diversity and aim to investigate systems’ ability to generalize across typologically distinct languages many of which are low-resource. For example, if a neural network architecture works well for a sample of Indo-European languages, should the same architecture also work well for Tupi–Guarani languages (where nouns are “declined” for tense) or Austronesian languages (where verbal morphology is frequently prefixing)?

2 Task Description

The 2020 iteration of our task is similar to CoNLL-SIGMORPHON 2017 (Cotterell et al., 2017) and 2018 (Cotterell et al., 2018) in that participants are required to design a model that learns to generate inflected forms from a lemma and a set of morphosyntactic features that derive the desired target form. For each language we provide a separate training, development, and test set. More historically, all of these tasks resemble the classic “wug”-test that Berko (1958) developed to test child and human knowledge of English nominal morphology.

Unlike the task from earlier years, this year’s task proceeds in three phases: a Development Phase, a Generalization Phase, and an Evaluation Phase, in which each phase introduces previously unseen data. The task starts with the Development Phase, which was an elongated period of time (about two months), during which participants develop a model of morphological inflection. In this phase, we provide training and development splits for 45 languages representing the Austronesian, Niger-Congo, Oto-Manguean, Uralic and Indo-European language families. Table 1 provides details on the languages. The Generalization Phase is a short period of time (it started about a week before the Evaluation Phase) during which participants fine-tune their models on new data. At the start of the phase, we provide training and development splits for 45 new languages where approximately half are genetically related (belong to the same family) and half are genetically unrelated (are isolates or belong to a different family) to the languages presented in the Development Phase. More specifically, we introduce (surprise) languages from Afro-Asiatic, Algic, Dravidian, Indo-European, Niger-Congo, Sino-Tibetan, Siouan, Songhay, Southern Daly, Tungusic, Turkic, Uralic, and Uto-Aztecan families. See Table 2 for more details.

Finally, test splits for all 90 languages are released in the Evaluation

[...] considered an important characteristic of the family. In addition, some of the families in the phylum use tone to encode tense, modality and number among others. However, all branches use objective and
