LARA in the Service of Revivalistics and Documentary Linguistics: Community Engagement and Endangered Languages∗

Ghil‘ad Zuckermann, The University of Adelaide, Australia
Sigurður Vigfússon, The Communication Centre for the Deaf and Hard of Hearing, Iceland
Manny Rayner, FTI/TIM, University of Geneva, Switzerland
Neasa Ní Chiaráin, Trinity College, Dublin, Ireland
Nedelina Ivanova, The Communication Centre for the Deaf and Hard of Hearing, Iceland
Hanieh Habibi, FTI/TIM, University of Geneva, Switzerland
Branislav Bédi, The Árni Magnússon Institute for Icelandic Studies, Iceland

Abstract

We argue that LARA, a new web platform that supports easy conversion of text into an online multimedia form designed to support non-native readers, is a good match to the task of creating high-quality resources useful for languages in the revivalistics spectrum. We illustrate with initial case studies in three widely different endangered/revival languages: Irish (Gaelic); Icelandic Sign Language (ÍTM); and Barngarla, a reclaimed Australian Aboriginal language. The exposition is presented from a language community perspective. Links are given to examples of LARA resources constructed for each language.

1 Introduction

When talking about languages with small, shrinking or non-existent speaker bases, we can adopt a positive or a negative attitude. If we say “endangered” or “dead” languages, the terms predispose us towards a negative, deficit point of view: perhaps the most important thing is to try to document the language as well as we can while information about it is still available. In this paper, we will in contrast accentuate the positive. All languages, except those where 100% of the children in the relevant group grow up speaking the language, are under threat. Given a particular language we care about, what can we do to improve its prospects?

To underline our commitment to this positive perspective, we will situate the discussion within the emerging trans-disciplinary field of Revivalistics (Zuckermann, 2020), a neologism which combines the notions of “revival” and “linguistics”. This in no way should be read as playing down the importance of language documentation, which for obvious reasons is an essential component of the enterprise. Rather, we want to view documentary linguistics through a revivalistic lens, documenting the language so that the material we produce can directly help community members who are trying to strengthen their linguistic abilities but may not be linguistically sophisticated. In this paper, we argue that LARA (Learning And Reading Assistant; Akhlaghi et al., 2019), an open source web platform that supports easy conversion of text into multimodal form, offers functionality that fits surprisingly well with the goals of revivalistics, and makes available a plethora of possibilities for rapid creation of useful online teaching materials.

In the rest of the paper, we start in §2 by giving a brief introduction to revivalistics, as here conceptualised, and to LARA. In §§3–5, we present case studies showing how we have used LARA to construct resources for three widely different endangered/Sleeping Beauty (“dead”) languages: Irish (Gaelic); Icelandic Sign Language (ÍTM); and Barngarla, a reclaimed Australian Aboriginal language. The final section concludes and outlines ongoing work.

∗ Authors in reverse alphabetical order.

Proceedings of the 4th Workshop on the Use of Computational Methods in the Study of Endangered Languages: Vol. 1 Papers, pages 13–23, Online, March 2–3, 2021.

2 Background: Revivalistics and LARA

2.1 Revivalistics

Revivalistics is defined by Zuckermann (2020) as “a new global, trans-disciplinary field of enquiry surrounding language reclamation, revitalization and reinvigoration”. Zuckermann considers these terms as different points on a “revival cline or spectrum”. Here, Reclamation is the revival of a no-longer spoken language (“Sleeping/Dreaming Beauty”), the best-known case being Hebrew; Revitalization is the revival of a severely endangered language, for example Adnyamathanha of the Flinders Ranges in Australia; and Reinvigoration is the revival of an endangered language that still has children speaking it, for example the Celtic languages Irish and Welsh. Zuckermann argues at length that it is helpful to adopt a broad perspective, both when considering the above as instances of a single set of issues, and when considering revivalistics as an inherently trans-disciplinary field of inquiry, which by its nature involves not only linguistics and language technology but also anthropology, sociology, politics, law, mental health and other disciplines. To help a language that is under threat, it is necessary to consider why it is under threat, what the consequences are for the (current and potential) speakers of the language, what their motivation is, if any, for wanting to strengthen the language, and what in practice can be done.

In this paper, we will be most immediately concerned with language and language technology, but the other aspects are also implicitly present.

2.2 LARA

LARA¹ (Learning and Reading Assistant) is a collaborative open source² project, active since mid-2018, whose goal is to develop tools that support conversion of plain texts into an interactive multimedia form designed to support development of L2 language skills by reading online. The basic approach is in line with Krashen’s influential Theory of Input (Krashen, 1982), suggesting that language learning proceeds most successfully when learners are presented with interesting and comprehensible L2 material in a low-anxiety situation. LARA implements this abstract programme by providing concrete assistance to L2 learners, making texts more comprehensible to help them develop their reading, vocabulary and listening skills. In particular, LARA texts include translations and human-recorded audio (video, in the case of sign languages) attached to words and sentences, and a personalised concordance constructed from the learner’s reading history. An important point is that the concordance is organised by lemma, rather than by surface form; this requires the LARA text to be marked up so that each word is annotated with its associated lemma, a process which for many languages can be performed semi-automatically with an integrated third-party tagger/lemmatizer doing most of the work (Akhlaghi et al., 2020).

From the user perspective, the consequence of the above is that the learner, just by clicking or hovering on a word, is always in a position to answer three questions: what does it mean, what does it sound like (look like, in the case of sign languages), and where have I seen some form of the word before. Figure 1 shows an example for an Irish text. Students can test their knowledge of a text using several kinds of automatically generated flashcards, with a new set of flashcards created on each run (Bédi et al., 2020).

Related platforms, from which the project has adapted some ideas, include Learning With Text³ and Clilstore⁴. The LARA tools are accessed through a free portal, divided into two layers. The core LARA engine consists of a suite of Python modules, which can also be run stand-alone from the command line; on top, there is a web layer implemented in PHP. Comprehensive online documentation is available (Rayner et al., 2020).

In the following sections, we describe how LARA is being used to create resources for the three languages which are our main focus in this paper.
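As a rough illustration of the lemma-organised concordance described in §2.2, the following minimal Python sketch groups word occurrences by lemma rather than by surface form. It is not LARA's actual code: the function name `build_concordance`, the data layout, and the three short Irish example sentences are assumptions made purely for illustration, using inflected forms of the verb bí ‘to be’.

```python
from collections import defaultdict


def build_concordance(sentences):
    """Map each lemma to a list of (sentence_index, surface_form) occurrences.

    `sentences` is a list of sentences, each a list of (surface, lemma) pairs.
    This layout is a hypothetical stand-in for lemma-annotated text.
    """
    concordance = defaultdict(list)
    for index, sentence in enumerate(sentences):
        for surface, lemma in sentence:
            concordance[lemma].append((index, surface))
    return dict(concordance)


# Illustrative lemma-tagged sentences (not taken from any LARA resource).
sentences = [
    [("Tá", "bí"), ("sé", "sé"), ("anseo", "anseo")],     # 'He is here'
    [("Bhí", "bí"), ("sé", "sé"), ("anseo", "anseo")],    # 'He was here'
    [("Beidh", "bí"), ("sé", "sé"), ("anseo", "anseo")],  # 'He will be here'
]

concordance = build_concordance(sentences)
# All surface forms recorded under the single lemma "bí".
forms_of_bi = [surface for _, surface in concordance["bí"]]
```

Looking up the lemma bí in this sketch returns occurrences from all three sentences, although each sentence contains a different surface form; a concordance keyed on surface forms would keep them apart.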

¹ https://www.unige.ch/callector/lara/
² https://sourceforge.net/projects/callector-lara/
³ https://sourceforge.net/projects/lwt/
⁴ http://multidict.net/clilstore/

Figure 1: Example of Irish LARA content, Fairceallach Fhinn Mhic Cumhaill (‘Fionn’s burly friend’). A ‘play all’ audio button is included at the top of the page to enable the listener to hear the entire story in one go (1). The text and images are in the pane on the left hand side. Clicking on a word displays information about it in the right hand pane. Here, the user has clicked on bhí = “to be (past tense)” (2), showing an automatically generated concordance; the lemma bí; and every variation of bí that is in this text (3). Hovering the mouse over a word plays audio and shows a popup at word-level. Clicking on a loudspeaker plays audio for the entire sentence as well as showing a popup translation (4). The back-arrows (5) link each line in the concordance to its context of occurrence. A link to the document can be found on the LARA content page.

3 Irish (Gaelic)

3.1 The context

Irish belongs to the Goidelic branch of the Celtic languages. It is a community language spoken in relatively small regions (Gaeltacht regions), primarily in the West of Ireland, with daily speaker numbers of about 20,586, or 0.43% of the Irish population (CSO, 2016). Note, however, that there are no monolingual communities, and even in the Gaeltacht, English is increasingly dominant. Outside these rather small and scattered communities, Irish is spoken as a first language in individual households, mostly in urban areas.

As a minority and endangered language, Irish is perhaps unique in being the recognised first official language of the State, and since its foundation, the revival of Irish has been a national policy. It has always been implicitly understood that the teaching of Irish on a national scale is the central plank of language maintenance. Irish is an obligatory subject in schools, and there is thus a large cohort of learners (c. 700,000 learners in the education system in the Republic of Ireland (Ní Chiaráin and Ní Chasaide, 2020)). Irish language teaching in this context poses numerous challenges. Some are discussed here, as a prelude to discussion of how LARA can help address them.

3.2 Challenges in the teaching of Irish

A major challenge is that the teachers are typically second language learners themselves, and their own command of the language (or their confidence in it) can be problematic. Teachers often feel overburdened with the major responsibility placed on them in the revitalisation and maintenance initiative, but report inadequate resources and training to fulfil it (Dunne, 2019). One area of particular difficulty is pronunciation (see more below) – as learners typically lack native speaker models, they tend to acquire spoken forms very far from those of Gaeltacht speakers.

Other problems arise with language-teaching resources and teaching approaches: materials are often outdated for the modern child, particularly as Irish lives in the shadow of English, for which endless exciting resources are at people’s fingertips. Inevitably, there are issues of student attitude and motivation, related to the modality and
the content of language teaching materials. There is a need for materials adapted to local context, materials that are exciting and varied for the young digitally-savvy child, and materials that encourage ‘reading for fun’.

Social media has gone a long way to connect the Irish-language community generally in recent years, and this in turn has benefitted the morale of teachers. Nonetheless, they often feel isolated and left to their own strategies and feel the lack of a shared pool of high-quality resources: there is a great need for a community of users and contributors that might support them and the learner.

3.3 Linguistic Challenges

Quite apart from the sociolinguistic context of Irish learning, there are many linguistic features in the language that are very different from English, and therefore challenging to the L1 English-speaking learner. Firstly, although there is an agreed standardised written form, there is no single spoken standard, but rather three major dialects.

As mentioned, the pronunciation of Irish is very different from English. A striking feature of the sound system is the contrast of palatalised and velarised consonants (with a very large inventory, relative to English). This feature is partially obscured (and complicated for learners) by the rather opaque writing system, and the link of the sounds to written forms is often poorly understood. There are numerous other features. The basic word order is VSO. The initial sounds of lexical items ‘mutate’ in certain grammatical contexts, so that, e.g., in a word like bord ‘table’ the initial sound may be [b], [w], [v] or [m]. And the language is highly inflected (e.g., verbs can have as many as 42 inflected forms).

3.4 LARA and Irish

A major feature of LARA for Irish is the integration of synthetic voices for spoken output. This is made possible by the recent development of synthetic voices for the main Irish dialects by the ABAIR project (they are available at www.abair.ie). The Irish co-author of this paper is directly involved with this initiative (ABAIR, 2020).

The integration of native-quality synthetic speech, rather than relying on pre-recorded speech, is particularly interesting for the Irish context for many reasons. Firstly, it allows for rapid development of content by teachers, with native-speaker speech output, that essentially brings the native speaker model into the classroom – available to learners at all times. It thus provides not just one exemplar, but a potential environment where speech is integral to numerous reading activities, with the consequence that children are constantly exposed to the native spoken language and developing native speaker pronunciation as a norm. For native-speaker children in the Gaeltacht, the fact that the choice of dialects is available means that the local teacher can add the appropriate speech output to assist with reading acquisition. For learners both within and outside of the Gaeltacht, the prevalent use of speech output in reading materials means that the linkage of pronunciation and written forms is being reinforced while reading, and this should greatly assist learners to gain an intuitive grasp of the phonic regularities of the written language – crucial to literacy acquisition.

The flexibility afforded by the inclusion of the synthetic voices offers the possibility for creative teachers to expand the diversity of materials, appropriate to the cohort they are teaching, and varied to maintain their interest in an ongoing way. The content of teaching materials can be adapted to the local context and to the topics that are currently of interest to the learners, and content can even be personalised to suit the wide diversity of individual learners’ interests and levels. It is hoped that this widening of the palette of offerings for the learner may increase the motivation of learners of different ages, levels and learning styles – key to a positive attitude and better learning outcomes.

The other features of LARA are also valuable in making more complex content accessible because of the level of scaffolding provided by the dictionaries, etc., so that learners can browse materials of interest which might be beyond their language level. The linkage provided to the lemma forms and the access to all forms of a given lemma provide constant reinforcement, helping to relate words that may look and sound like very different items – helping to consolidate the learner’s implicit knowledge of grammatical forms. While the explicit teaching of grammar and vocabulary has tended to be out of favour in recent teaching approaches, to the detriment of students’ literacy accuracy, LARA is a tool that can be used to help direct learners’ attention to grammatical features,
in a way that may not be felt as onerous by the learner.

The emphasis in LARA on identifying the more frequent words is another helpful feature, as it allows vocabulary acquisition to be maximally useful for the learner, directing their attention to the core vocabulary of the language, and giving a sense of mastery.

From the teachers’ perspective, the Irish version of LARA, with its synthetic speech facility, also enables the creation of a community of learners (classrooms / people with shared interests). LARA, along with other educational speech-based resources, is a move in the right direction to empowering the community of Irish language teachers by connecting them to powerful new resources with which to engage their students. In the future it also promises a mechanism for sharing high quality resources, to the benefit of the broader community of Irish language teachers. It also has the potential to engage the young digital generation in a way that traditional materials now do less effectively.

4 ÍTM/Icelandic Sign Language

4.1 Overview, history and social context

Icelandic Sign Language (íslenskt táknmál; ÍTM) is a natural language and the first language of the Deaf and their children in Iceland. ÍTM is an indigenous minority language (Stefánsdóttir et al., 2019). The Deaf community in Iceland defines itself as a linguistic minority, i.e. not on the basis of biological or medical deafness but from the language they consider their mother tongue (Stefánsdóttir, 2005; Sverrisdóttir, 2007). ÍTM was acknowledged as the first language of the Deaf, hard of hearing and deaf-blind people with the establishment of Act No. 61/2011 on the Status of the Icelandic Language and Icelandic Sign Language. According to the 2015 annual report from the Icelandic Sign Language Council (Stefánsdóttir et al., 2015) and to Stefánsdóttir, Kristinsson and Hreinsdóttir (2019), ÍTM is an endangered language and the language community is still experiencing discrimination. In their 2019 article, Stefánsdóttir et al. stress that ÍTM is still associated with disability and impairment.

ÍTM is the first language of 250–300 Icelanders (Report of the committee on the judicial status of Icelandic and Icelandic Sign Language, 2010:86; Brynjólfsdóttir et al., 2012; Þorvaldsdóttir and Stefánsdóttir, 2015; Stefánsdóttir et al., 2019). According to Þorvaldsdóttir and Stefánsdóttir (2015), The Communication Centre for the Deaf and Hard of Hearing (the Center) in Iceland provided interpreter service to 178 Deaf signers according to the latest data from 2013. They point out that not all Deaf signers use the interpreter service from the Center, so the number of native signers may be higher. In addition, there are about 50 Deaf immigrants in Iceland who use ÍTM (Stefánsdóttir et al., 2019). The authors estimate that there are around 1000–1500 hearing L2 users of ÍTM.

Atypically, hereditary deafness barely exists in Iceland. It is to be found only in three families, two going only one generation back. On the other hand, all Deaf immigrant families who have Deaf children report hereditary deafness several generations back (the Center, Ivanova). This has given rise to a unique situation where 75% of the children under 9 who use ÍTM are second generation Deaf of Deaf immigrants, and all of them are born in Iceland (Koulidobrova and Ivanova, 2020). They grow up unimodal bilingual with the heritage sign language and ÍTM; they learn written Icelandic when they start school, and it is their L3 or L4 depending on whether they also learn the written language of the heritage sign language. Koulidobrova and Ivanova (2020) report that 25% of the signing children under the age of 9 have access to sound by hearing aids and hearing parents. They grow up bimodal bilingual, with spoken Icelandic and ÍTM. Taking all facts into account, an unavoidable impact is that ÍTM changes because the youngest generation of native signers is second generation Deaf with Deaf immigrant parents, and there is only one Deaf family with Deaf children with the other parent being of Icelandic origin. According to the typology of §2.1, the fact that hardly any child signers of ÍTM have parents who are native ÍTM signers places it firmly in the “revitalization” category.

The first school for the Deaf in Iceland was established in 1867 by the Rev. Páll Pálsson (Þorvaldsdóttir and Stefánsdóttir, 2015). Before that time 24 deaf children were sent to Denmark, of whom 16 came back (Kristjánsson). Until 1922, when the Act on education of deaf and mute children⁵ came into effect, parents of deaf children could choose whether their Deaf child would be educated in Iceland or in Denmark. After 1922 all Deaf children were required to be educated in Iceland from the age of eight and their education was compulsory. Even though some children were educated in Denmark before 1922, there is insufficient scientific evidence for the claim that ÍTM is derived from Danish Sign Language (DTS). Research (Aldersson and McEntee-Atalianis, 2007; Sverrisdóttir and Þorvaldsdóttir, 2016) shows similarities in the lexicon between ÍTM and DTS. These are due to borrowing rather than to a genetic relation (Sverrisdóttir and Þorvaldsdóttir, 2016). As Þorvaldsdóttir and Stefánsdóttir (2015) point out, ÍTM has been in contact with other spoken and sign languages which have influenced its development. Today, there are 13 foreign sign languages used in Iceland.

⁵ Lög um kennslu heyrnar- og málleysingja nr. 24/1922

ÍTM does not have dialects or genderlects, but there is significant age-graded variation (Þorvaldsdóttir and Stefánsdóttir, 2015). The lack of dialects may be explained by the fact that both the preschool and the compulsory school for the Deaf have always been in Reykjavík, as are the Association for the Deaf and the interpreter services provided by the Center. More research is needed on ethnolects in ÍTM today resulting from the Deaf immigrants. ÍTM is taught as an L2 in different types of courses at the Center and in the Sign Language Studies BA program at the University of Iceland. There is a lack of teaching materials and tools for ÍTM as an L1.

4.2 Research

The grammar of ÍTM has been researched less than that of many other sign languages. Even though research on ÍTM has been ongoing for 30 years, only certain parts of the language have been studied (Sverrisdóttir, 2000; Aldersson and McEntee-Atalianis, 2007; Thorvaldsdóttir, 2007, 2008; Þorvaldsdóttir, 2011; Brynjólfsdóttir, 2012; Beck, 2013; Brynjólfsdóttir et al., 2012). Research at the Center has been mainly of a practical nature; much of the information that has been gathered has not been published in journals. The Center for Sign Language Research was established in 2011 with the aim of enhancing theoretical grammar research and cooperation between the Center and the Institute of Linguistics at the University of Iceland.

There is currently no proper corpus for ÍTM, but such a resource would greatly help the survival of the language. Most existing video recordings are in practice difficult to use: since they are not annotated, it is hard to search for examples in them. Some first steps towards building up an annotated corpus for ÍTM at the Center were taken with two short-term projects in 2014 and 2017 funded by the Student Innovation Fund (the Center, p.c. Thorvaldsdottir). All data and results are kept at the Center, and in the light of the information above it is very important that they are used for ongoing work in order to preserve and document the language, follow its historical development, and secure the survival of ÍTM.

4.3 LARA and ÍTM

There are several different scenarios involving language learning and ÍTM that can potentially make use of LARA: (1) Deaf people learning written Icelandic; (2) Hearing people learning ÍTM as L2; (3) Deaf immigrants learning ÍTM as L2; (4) Deaf children learning ÍTM as L1; (5) Hearing children learning ÍTM as L1 (Coda). The first case is the most straightforward one, and formed a logical starting point. As noted above, written Icelandic is in most cases an L3 or even an L4 for Deaf children in Iceland, so tools that can help them make progress in reading are potentially very useful.

It turned out to be quite easy to extend LARA so that it can support this kind of scenario: basically, all that was necessary was to arrange things so that recorded signed video can systematically be used as an alternative to recorded audio. Thus a LARA text of this type is written in Icelandic, but words and sentences are associated with ÍTM signed videos. The signed video for a word is accessed by clicking on the word; the signed video for a sentence is accessed by clicking on a camera icon inserted at the end of the sentence. (In ‘video mode’, the camera icon replaces the usual loudspeaker icon.)

Videos are recorded using the same third-party recording tool as is used for recording audio content; the tool had already been adapted for this purpose in a previous project (Ahmed et al., 2016). The workflow for recording is modality-independent. The LARA portal creates the recording script from the text and uploads it to the recording tool; the voice talent/signer records the audio/video from the script; at the end, the portal downloads the recorded multimedia and inserts it into the LARA document.

A link to an initial example of a LARA document of this kind, a children’s story about 2.7K words long, is posted on the LARA examples page. The student who created the signed content turned to two members of staff at the Center for feedback. One is a native ÍTM signer and the other has worked as a sign language interpreter for over two decades. There were many things to consider, as ÍTM is not a standardised language, even to the extent that the basic word order is unclear: research (Brynjólfsdóttir, 2012) shows that subjects accept both SOV and SVO word orders. The central issue was the question of whether the signed translation of the text should be true to the Icelandic original or re-expressed in ÍTM. One argument is that, as a tool to learn written Icelandic, the translation should be faithful to the source so that the ÍTM signs correspond to the Icelandic words appearing there. The argument in the opposite direction is that a free re-interpretation is better suited to helping Deaf children understand the signed content. In the end an interpreting strategy was preferred for three reasons. Comprehension of the signed text is crucial for Deaf children; the interpreting strategy seemed to be a better fit to the content of a children’s book; and in LARA learners can click on a word in the Icelandic text to see the ÍTM sign, if the corresponding sign did not appear in the freely translated segments.

5 Barngarla

5.1 The context

Barngarla is a dreaming, sleeping beauty tongue belonging to the Thura-Yura language group, which also includes Adnyamathanha, Kuyani, Nukunu, Ngadjuri, Wirangu, Nawoo, Narangga, and Kaurna. The name Thura-Yura derives from the fact that the word for ‘man, person’ in these languages is either thura or yura – consider Barngarla yoora. The Thura-Yura language group is part of the Pama-Nyungan language family, which includes 306 out of 400 Aboriginal languages in Australia, and whose name is a merism derived from the two end-points of the range: the Pama languages of northeast Australia (where the word for ‘man’ is pama) and the Nyungan languages of southwest Australia (where the word for ‘man’ is nyunga). According to Bouckaert et al. (2018), the Pama-Nyungan language family arose just under 6,000 years ago around Burketown, Queensland.

Typically for a Pama-Nyungan language, Barngarla has a phonemic inventory featuring three vowels ([a], [i], [u]) and retroflex consonants, an ergative grammar with many cases, and a complex pronominal system. Unusual features include a number system with singular, dual, plural and superplural (warraidya, ‘emu’; warraidyalbili, ‘two emus’; warraidyarri, ‘emus’; warraidyailyarranha, ‘a lot of emus’) and a matrilineal and patrilineal distinction in the dual. For example, the matrilineal ergative first person dual pronoun ngadlaga (‘we two’) would be used by a mother and her child, or by a man and his sister’s child, while the patrilineal form ngarrinyi would be used by a father and his child, or by a woman with her brother’s child.

During the twentieth century, Barngarla was intentionally eradicated under Australian ‘stolen generation’ policies, the last original native speaker dying in 1960. Language reclamation efforts were launched on September 14 2011 in a meeting between Zuckermann and representatives of the Barngarla people (Zuckermann, 2020). Since then, a series of language reclamation workshops have been held in which about 120 Barngarla people have participated. The primary resource used has been a dictionary, including a brief grammar, written by the German Lutheran missionary Clamor Wilhelm Schürmann (Schürmann, 1844).

5.2 Using LARA for Barngarla

Published resources for Barngarla, non-existent ten years ago, are now emerging. The most visible example to date is Barngarlidhi Manoo (Zuckermann and the Barngarla, 2019), a Barngarla alphabet book/primer compiled by Ghil‘ad Zuckermann in collaboration with the nascent Barngarla revivalistic community. A first step in evaluating the possible relevance of LARA to Barngarla was to convert this book into LARA form (Butterweck et al., 2019). The LARA functionality is primarily used to attach audio recordings to Barngarla language: words and phrases marked in red can be played by hovering the mouse over them. Links to the freely available LARA text are posted on the LARA examples page⁶ in two versions, one recorded by Zuckermann and one by ethnic Barngarla language custodians.

⁶ https://www.unige.ch/callector/lara-content/

A second resource was produced as part of the
“Fifty Words Project”⁷, which collects together fifty basic words such as “fire”, “water”, “sun” and “moon” for several dozen Aboriginal languages. The Barngarla version, recorded by ethnic custodian Jenna Richards from Galinyala (= Port Lincoln), is available both on the Fifty Words and the LARA examples pages. Two more LARA resources for Barngarla are, as of early 2021, nearly complete. We describe them in the next section.

⁷ https://50words.online/

6 Summary and further directions

We have described work where the LARA platform has been used to construct annotated multimodal resources for three endangered languages, outlining the relevant background and our reasons for believing that LARA has strong potential for assisting in various kinds of language documentation and “revivalistics” efforts, and describing some early examples of LARA resources for these languages. LARA documents can be produced quickly and easily (Akhlaghi et al., 2020),⁸ and more ambitious efforts are in an advanced stage of preparation. We briefly outline ongoing work here.

⁸ Special considerations extraneous to LARA may considerably increase the workload. For the ÍTM text described here, by far the most laborious task was creating the sign language videos. For the Barngarla texts, well over half the effort was spent adding the complex HTML formatting needed to reproduce the layout used in the paper version.

In the case of Irish, the immediate intention is to seek to engage the community of learners, and then, increasingly, the community of teachers, involving them in the development of LARA resources for their own teaching contexts. The current provision is for eight short stories, spanning the three major dialects. These are based on episodes from the mythology sagas of the band of warriors Na Fianna, adapted from historic collections (see https://www.abair.tcd.ie/scealai/#/resources). These stories, prepared in Summer 2020, are currently being used with a third level group of various backgrounds and levels approximating B2 level, who are pursuing an advanced level module in ‘Irish Language and Culture’. Although the texts are fairly difficult, the LARA features allow individuals to take what they need from them. This facilitates teaching to a mixed-level group, and supports non-linear progression of language learning. The platform will be evaluated by this learner group, and learner feedback sought concerning how engaging and user-friendly they found the multimodal text presentation; how useful they found the specific LARA features; and whether they found it encouraged them to be more active in their learning.

As a second step, the teacher community will be targeted. The aim is not only to get a wider user group and to elicit feedback but also to involve groups of teachers in the creation of materials that are targeted at their own learner cohorts and contexts. As with other learning platforms being developed, the long term goal is for a community of teachers contributing to a growing bank of shared resources, catering for learners at every level – a need already identified by teachers (cf. §3).

With regard to Irish teachers adapting their own materials, it should be noted that the integration of text-to-speech (TTS) has, to date, been done in an ad hoc way, using the command-line interface. We hope to streamline the process so that in future only a very small investment of time would be required for a teacher to adapt their resources into a LARA format. The part-of-speech tagging and synthesis steps are unsupervised; therefore, once the content is fixed, a short story could take as little as 10 minutes to convert to LARA format. Teachers may wish to proof-listen to the TTS output and, if necessary, correct the transcriptions, which could take some time. If they wished, teachers could also add their own sentence-level translations, another task where the time commitment depends on the length of the text. A more principled solution to integrate TTS capability into LARA is currently being investigated and we hope will be ready to roll out in 2021.

Looking to the future, the ABAIR initiative is aiming to document, and produce virtual speakers, i.e., synthetic voices, for even the most endangered dialects. (Although in §3 we refer to the three major dialects of Irish, there are also sub-dialects which are severely endangered as well as dialects no longer spoken.) It is even envisaged that it may be possible to ‘resurrect’ recently deceased dialects such as the Irish of Co. Clare, no longer spoken, but for which recordings do exist. In the shorter term, LARA provides a tool that can be purposed to permit access to Clare texts and recordings and should help bring this dialect alive – in a way that could engage young people from that region and beyond. In these ways, the case
use here is not that distant from the Barngarla use.

For ÍTM, the next goal is to develop methods for creating LARA documents for true ÍTM texts, as opposed to Icelandic texts with ÍTM annotations (cf. §4). The immediate problem is that sign languages lack a written form. In practice, a signed utterance is represented as a sequence of manual (hand) sign representations, typically accompanied by a parallel line showing non-manual (non-hand) signs, most often facial expressions. Lexical signs are conventionally written in uppercase as “ID-glosses” (Johnston, 2008). Signs which are not lexicalized, usually “classifier constructions”⁹, are written in lowercase as short general descriptions of what the sign stands for.

⁹ “Classifiers” can most easily be thought of as short semi-lexicalized playlets performed with the hands. They are common in all sign languages.

The approach used to represent signed documents in LARA is to make the “text” the sequence of ID-glosses and classifier descriptions, and to attach information to these elements in the form of popups. An ÍTM document of this kind is currently under construction. Figure 2 illustrates.

TÍNA BAKDYR KOMA-INN MAMMA HVAR
Tina | back-door | come in, classifier | Mom + start Q non-manual | where, Q non-manual
(“Tina comes in through the back door. ‘Where are you, Mom?’ she asks”)

Figure 2: ÍTM sentence from ‘pure ÍTM’ version of Tína story, lines as follows: (1) sequence of ID-glosses, (2) annotations, (3) translation. Only the first line is shown. Hovering over an ID-gloss shows a popup with the corresponding annotation. Clicking on an ID-gloss plays a video for the sign and shows the concordance. Clicking on a camera icon plays signed video for the whole sentence.

As of early 2021, two new LARA texts for Barngarla are nearly complete. The first of these is a LARA version of a language documentation and revivalistic project (Zuckermann, 2021), which presents the full set of 708 Barngarla sentences extracted from Schürmann (1844). Each sentence is shown using the interlinear form illustrated in Figure 3. An advantage of LARA is that the text can easily be produced in two forms: one designed for the professional linguist, the other tailored for ethnic Barngarla readers.

(54a) Ngarrinyelbudninge ninna Parnkalliti
Ngarrinyarlboodninga nhina Barngarlidhi.
ngarrinyarlboo- dninga nhina Barngarla- dha
us (all)- with you Barngarla- PRES
‘Through us or with us you are Parnkalla [you have learned the language from us].’

Figure 3: Example of Barngarla sentence from LARA version of (Schürmann, 1844), lines as follows: (1) sentence in original Schürmann orthography, (2) sentence in modern Barngarla orthography, (3) sentence with words decomposed into morphemes, (4) glosses for morphemes, (5) translation. All five lines are shown on the screen. Hovering over a word or morpheme plays recorded audio and shows a popup with a list of all possible associated glosses in the text; clicking on a word or morpheme shows a concordance; clicking on a loudspeaker icon plays audio for the whole sentence.

A second Barngarla text, Mangiri (Zuckermann and the Barngarla, 2021), has been designed as a teaching resource. In contrast to Barngarlidhi Manoo (cf. §5), which is almost exclusively focused on vocabulary, Mangiri introduces some nontrivial grammar using annotated example sentences in the same interlinear form as (Zuckermann, 2021). Many of these examples are taken from translations of English and Hebrew songs based on Biblical verses.

In a broader perspective, this thread of work can be seen as an interesting process of turning historical injustice against itself. A book, written in 1844 in order to help a German Lutheran missionary bring the “Christian light” to Aboriginal people at the expense of their own spirituality, is used to do the opposite thing and reunite the Barngarla people with their own heritage; Christian texts are being translated to provide attractive linguistic examples in reclaimed Barngarla; and technology used during the colonial and Stolen Generation periods to oppress indigenous people is used in the form of LARA to help them recover their own autonomy, spirituality and well-being. These thoughts encourage us to proceed further down the path we have begun to explore.
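As an illustrative aside, the representation of signed documents described in this section (a “text” consisting of a sequence of ID-glosses and classifier descriptions, each carrying information shown in a popup, plus a concordance reached by clicking) can be sketched as a small data structure. The Python sketch below is not LARA's implementation: the names Sign, SignedSentence and build_concordance are hypothetical, and the pairing of glosses with annotations follows one reading of the sentence in Figure 2.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    """One unit of the signed 'text' (hypothetical structure, not LARA's own)."""
    gloss: str       # ID-gloss or classifier description shown in the text line
    annotation: str  # information that would appear in the popup


@dataclass(frozen=True)
class SignedSentence:
    signs: tuple[Sign, ...]
    translation: str

    def text_line(self) -> str:
        """The visible line: the glosses in signing order."""
        return " ".join(sign.gloss for sign in self.signs)


def build_concordance(sentences: list[SignedSentence]) -> dict[str, list[int]]:
    """Map each gloss to the indices of the sentences in which it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for number, sentence in enumerate(sentences):
        for sign in sentence.signs:
            if number not in index[sign.gloss]:
                index[sign.gloss].append(number)
    return dict(index)


# Example data: the sentence from Figure 2, with annotations paired to glosses
# under the assumption that they align one-to-one in order.
tina = SignedSentence(
    signs=(
        Sign("TÍNA", "Tina"),
        Sign("BAKDYR", "back-door"),
        Sign("KOMA-INN", "come in, classifier"),
        Sign("MAMMA", "Mom + start Q non-manual"),
        Sign("HVAR", "where, Q non-manual"),
    ),
    translation="Tina comes in through the back door. 'Where are you, Mom?' she asks",
)

concordance = build_concordance([tina])
```

In such a structure, the hover action would read the annotation field of a Sign, and the click action would look the gloss up in the concordance; videos and audio are omitted here for brevity.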

References

ABAIR: An Sintéiseoir Gaeilge – The Irish Language Synthesiser ABAIR. 2020. http://www.abair.ie. As of 8 September 2020.

Farhia Ahmed, Pierrette Bouillon, Chelle Destefano, Johanna Gerlach, Angela Hooper, Manny Rayner, Irene Strasly, Nikos Tsourakis, and Catherine Weiss. 2016. Rapid construction of a web-enabled medical speech to sign language translator using recorded video. In Proceedings of FETLT 2016, Seville, Spain.

Elham Akhlaghi, Branislav Bédi, Fatih Bektaş, Harald Berthelsen, Matthias Butterweck, Cathy Chua, Catia Cucchiarini, Gülşen Eryiğit, Johanna Gerlach, Hanieh Habibi, Neasa Ní Chiaráin, Manny Rayner, Steinþór Steingrímsson, and Helmer Strik. 2020. Constructing multimodal language learner texts using LARA: Experiences with nine languages. In Proceedings of The 12th Language Resources and Evaluation Conference, pages 323–331.

Elham Akhlaghi, Branislav Bédi, Matthias Butterweck, Cathy Chua, Johanna Gerlach, Hanieh Habibi, Junta Ikeda, Manny Rayner, Sabina Sestigiani, and Ghil‘ad Zuckermann. 2019. Overview of LARA: A learning and reading assistant. In Proc. SLaTE 2019: 8th ISCA Workshop on Speech and Language Technology in Education, pages 99–103.

Russell R. Aldersson and Lisa McEntee-Atalianis. 2007. A lexical comparison of Icelandic sign language and Danish sign language. Birkbeck Studies in Applied Linguistics, 2:123–158.

Þórhalla Guðmundsdóttir Beck. 2013. Í landi myndanna. Um merkingu og uppruna lýsandi orða í táknmáli. Ph.D. thesis.

Branislav Bédi, Matt Butterweck, Cathy Chua, Johanna Gerlach, Birgitta Björg Guðmarsdóttir, Hanieh Habibi, Bjartur Örn Jónsson, Manny Rayner, and Sigurður Vigfússon. 2020. LARA: An extensible open source platform for learning languages by reading. In Proc. EUROCALL 2020.

Remco R. Bouckaert, Claire Bowern, and Quentin D. Atkinson. 2018. The origin and expansion of Pama–Nyungan languages across Australia. Nature Ecology & Evolution, 2(4):741–749.

Elísa Guðrún Brynjólfsdóttir. 2012. Hvað gerðir þú við peningana sem frúin í Hamborg gaf þér? Myndun hv-spurninga í íslenska táknmálinu. Ph.D. thesis.

Elísa Guðrún Brynjólfsdóttir, Jóhannes Gísli Jónsson, Kristín Lena Thorvaldsdóttir, and Rannveig Sverrisdóttir. 2012. Málfræði íslenska táknmálsins [The grammar of Icelandic Sign Language]. Íslenskt mál og almenn málfræði, 34:9–52.

Matt Butterweck, Cathy Chua, Hanieh Habibi, Manny Rayner, and Ghil‘ad Zuckermann. 2019. Easy construction of multimedia online language textbooks and linguistics papers with LARA. In Proc. 12th annual International Conference of Education, Research and Innovation, Seville, Spain.

CSO. 2016. Census 2016 Profile 10 – Education, Skills and the Irish Language. https://www.cso.ie/en/csolatestnews/presspages/2017. As of 11 October 2020.

Claire M. Dunne. 2019. Primary teachers’ experiences in preparing to teach Irish: Views on promoting the language and language proficiency. Studies in Self-Access Learning, 10(1):21–43.

Trevor Johnston. 2008. Corpus linguistics and signed languages: no lemmata, no corpus. In 3rd Workshop on the Representation and Processing of Sign Languages.

Elena Koulidobrova and Nedelina Ivanova. 2020. Acquisition of phonology in child Icelandic Sign Language: Unique findings. Proceedings of the Linguistic Society of America, 5(1):164–179.

Stephen Krashen. 1982. Principles and practice in second language acquisition.

Ó.P. Kristjánsson. Málleysingjakennsla hér á landi [The teaching of the mute here in the country]. Menntamál, 18(1):7–17.

Neasa Ní Chiaráin and Ailbhe Ní Chasaide. 2020. The potential of text-to-speech synthesis in computer-assisted language learning: A minority language perspective. In Alberto Andujar, editor, Recent Tools for Computer- and Mobile-Assisted Foreign Language Learning, chapter 7, pages 149–169. IGI Global, Hershey, PA.

Manny Rayner, Hanieh Habibi, Cathy Chua, and Matt Butterweck. 2020. Constructing LARA content. https://www.issco.unige.ch/en/research/projects/callector/LARADoc/build/html/index.html. Online documentation.

Clamor Wilhelm Schürmann. 1844. A Vocabulary of the Parnkalla Language. Spoken by the natives inhabiting the western shore of Spencer’s Gulf. To which is prefixed a collection of grammatical rules, hitherto ascertained.

V. Stefánsdóttir, A.P. Kristinsson, H.D. Eiríksdóttir, H.A. Haraldsdóttir, and R. Sverrisdóttir. 2015. Annual report of the Icelandic Sign Language council. Technical report. In Icelandic. Available at https://www.islenskan.is/images/Skyrsla-MIT-2015.pdf.

Valgerður Stefánsdóttir. 2005. Málsamfélag heyrnarlausra: Um samskipti á milli táknmálstalandi og íslenskutalandi fólks. Ph.D. thesis.

Valgerður Stefánsdóttir, Ari Páll Kristinsson, and Júlía G. Hreinsdóttir. 2019. Legal recognition of Icelandic Sign Language: Meeting Deaf people’s expectations? The Legal Recognition of Sign Languages: Advocacy and Outcomes Around the World, page 238.

Rannveig Sverrisdóttir. 2000. Signing simultaneous events: The expression of simultaneity in children’s and adults’ narratives in Icelandic Sign Language. Master’s thesis, University of Copenhagen.

Rannveig Sverrisdóttir. 2007. Hann var bæði mál- og heyrnarlaus: um viðhorf til táknmála [He was both mute and deaf: On attitudes towards sign languages]. Ritið, 7(1):83–105.

Rannveig Sverrisdóttir and Kristín Lena Þorvaldsdóttir. 2016. Why is the sky blue? On colour signs in Icelandic Sign Language. Semantic Fields in Sign Languages: Colour, Kinship and Quantification. Berlin, Boston: De Gruyter, pages 209–250.

Guðný Björk Thorvaldsdóttir. 2007. The use of space in Icelandic Sign Language. Master’s thesis, The University of Dublin.

Guðný Björk Thorvaldsdóttir. 2008. Tillaga um nýja íslenska táknmálsorðabók á málvísindalegum forsendum [Proposal for a new Icelandic Sign Language dictionary on linguistic terms]. Master’s thesis, University of Iceland. URL https://skemman.is/handle/1946/3452.

Ghil‘ad Zuckermann. 2020. Revivalistics: From the Genesis of Israeli to Language Reclamation in Australia and Beyond. New York: Oxford University Press.

Ghil‘ad Zuckermann. 2021. Barngarla sentences. Barngarla Language Advisory Committee (BLAC). In preparation.

Ghil‘ad Zuckermann and the Barngarla. 2019. Barngarlidhi Manoo: "Speaking Barngarla Together". Barngarla Language Advisory Committee (BLAC).

Ghil‘ad Zuckermann and the Barngarla. 2021. Mangiri: "Barngarla Wellbeing". Barngarla Language Advisory Committee (BLAC). In preparation.

Kristín Lena Þorvaldsdóttir. 2011. Sagnir í íslenska táknmálinu. Formleg einkenni og málfræðilegar formdeildir. Ph.D. thesis.

Kristín Lena Þorvaldsdóttir and Valgerður Stefánsdóttir. 2015. Icelandic Sign Language. Sign languages of the world: a comparative handbook, page 409.
