Development of Natural Language Processing Tools for Cook Islands Māori

Rolando Coto-Solano1, Sally Akevai Nicholas2, and Samantha Wray3

1School of Linguistics and Applied Language Studies, Victoria University of Wellington, Te Whare Wānanga o te Ūpoko o te Ika a Māui
[email protected]
2School of Language and Culture, Auckland University of Technology
[email protected]
3Neuroscience of Language Lab, New York University Abu Dhabi, United Arab Emirates
[email protected]

Abstract

This paper presents three ongoing projects for NLP in Cook Islands Māori: Untrained Forced Alignment (approx. 9% error when detecting the center of words), automatic speech recognition (37% WER in the best trained models) and automatic part-of-speech tagging (92% accuracy for the best performing model). These new resources fill existing gaps in NLP for the language, including gold standard POS-tagged written corpora, transcribed speech corpora, and time-aligned corpora down to the phoneme level.

1.1 Minority Languages and NLP

Lack of resources makes it difficult to train data-driven NLP tools for smaller languages. This is compounded by the difficulty in generating input for Indigenous and endangered languages, where dwindling numbers of speakers, non-standardized writing systems and lack of resources to train specialist transcribers and analysts create a vicious cycle that makes it even more difficult to take advantage of NLP solutions. Amongst the hundreds of languages of the Americas, for example, very few have large spoken and written corpora (e.g. Zapotec from Mexico, Guaraní from Paraguay and Quechua from Bolivia and Perú), some have spo-