arXiv:2104.08726v1 [cs.CL] 18 Apr 2021


AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages

Abteen Ebrahimi (University of Colorado Boulder), Manuel Mager (University of Stuttgart), Arturo Oncevay (University of Edinburgh), Vishrav Chaudhary (Facebook AI Research), Luis Chiruzzo (Universidad de la República, Uruguay), Angela Fan (Facebook AI Research), John Ortega (New York University), Ricardo Ramos (Universidad Tecnológica de Tlaxcala), Annette Rios (University of Zurich), Ivan Vladimir (Universidad Nacional Autónoma de México), Gustavo A. Giménez-Lugo (Universidade Tecnológica Federal do Paraná), Elisabeth Mager (Universidad Nacional Autónoma de México), Graham Neubig (Carnegie Mellon University), Alexis Palmer (University of Colorado Boulder), Rolando A. Coto Solano (Dartmouth College), Ngoc Thang Vu (University of Stuttgart), Katharina Kann (University of Colorado Boulder)

Abstract

Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting, even for languages unseen during pretraining. However, prior work evaluating performance on unseen languages has largely been limited to low-level, syntactic tasks, and it remains unclear if zero-shot learning of high-level, semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, an extension of XNLI (Conneau et al., 2018) to 10 indigenous languages of the Americas. We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches. Additionally, we explore model adaptation via continued pretraining and provide an analysis of the dataset by considering hypothesis-only models. We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%. Continued pretraining offers improvements, with an average accuracy of 44.05%. Surprisingly, training on poorly translated data by far outperforms all other methods with an accuracy of 48.72%.

Language         ISO   Family          Dev   Test
Aymara           aym   Aymaran         743   750
Asháninka        cni   Arawak          658   750
Bribri           bzd   Chibchan        743   750
Guarani          gn    Tupi-Guarani    743   750
Náhuatl          nah   Uto-Aztecan     376   738
Otomí            oto   Oto-Manguean    222   748
Quechua          quy   Quechuan        743   750
Rarámuri         tar   Uto-Aztecan     743   750
Shipibo-Konibo   shp   Panoan          743   750
Wixarika         hch   Uto-Aztecan     743   750

Table 1: The languages in AmericasNLI, along with their ISO codes, language families, and dataset sizes.

1 Introduction

Pretrained multilingual models such as XLM (Lample and Conneau, 2019), multilingual BERT (mBERT; Devlin et al., 2019), and XLM-R (Conneau et al., 2020) achieve strong cross-lingual transfer results for many languages and natural language processing (NLP) tasks. However, there exists a discrepancy in terms of zero-shot performance between languages present in the pretraining data and those that are not: performance is generally highest for well-represented languages, and decreases with less representation. Yet, even for unseen languages, performance is generally above chance, and model adaptation approaches have been shown to yield further improvements (Muller et al., 2020; Pfeiffer et al., 2020a,b; Wang et al., 2020).

Practically all work evaluating the zero-shot performance of models on languages not seen during pretraining is currently limited to low-level, syntactic tasks such as part-of-speech tagging, dependency parsing, and named-entity recognition (Muller et al., 2020; Wang et al., 2020). This is due to the fact that most multilingual datasets for high-level, semantic tasks only cover languages which are well-resourced enough to already be contained in the pretraining data.¹ This limits our ability to draw more general conclusions with regard to the zero-shot learning abilities of pretrained multilingual models for unseen languages.

¹ One notable exception is XCoPA (Ponti et al., 2020), which covers two languages not present in mBERT's pretraining data: Haitian Creole and Quechua.

In order to make such an evaluation possible, we introduce AmericasNLI, an extension of XNLI (Conneau et al., 2018), a natural language inference (NLI; cf. §2.3) dataset covering 15 high-resource languages, to 10 indigenous languages spoken in the Americas: Asháninka, Aymara, Bribri, Guarani, Náhuatl, Otomí, Quechua, Rarámuri, Shipibo-Konibo, and Wixarika. All of them are truly low-resource languages: they have small or no Wikipedia corpora, and are not present in the training data of current state-of-the-art pretrained multilingual models. This dataset enables us to address the following research questions (RQs): (1) How do existing multilingual models perform on unseen languages as compared to their performance on XNLI? (2) Do methods aimed at adapting models to unseen languages, previously exclusively evaluated on low-level, syntactic tasks, also increase performance on NLI?

We experiment with XLM-R, both with and without model adaptation via continued pretraining on monolingual corpora in the target language. Our results show that the performance of XLM-R out-of-the-box is moderately above chance, and model adaptation leads to improvements of up to 5.88 percentage points. Training on machine-translated training data, however, results in an even larger performance gain of 10.1 percentage points over the corresponding XLM-R model without adaptation. We further perform an analysis via experiments with hypothesis-only models, to examine potential artifacts which may have been inherited from XNLI and find that performance is above chance for most models, but still below that for using the full example.

AmericasNLI is publicly available at nala-cub.github.io/resources. We hope that it will serve as a benchmark for measuring the zero-shot natural language understanding abilities of multilingual models for unseen languages. Additionally, we hope that our dataset will motivate the development of novel pretraining and model adaptation techniques which are suitable for truly low-resource languages.

2 Background and Related Work

2.1 Pretrained Multilingual Models

Prior to the widespread use of pretrained transformer models, cross-lingual transfer was mainly achieved through word embeddings (Mikolov et al., 2013; Pennington et al., 2014; Bojanowski et al., 2017), either by aligning monolingual embeddings into the same embedding space (Conneau et al., 2017; Lample et al., 2017; Grave et al., 2018) or by training multilingual embeddings (Ammar et al., 2016; Artetxe and Schwenk, 2019). Pretrained multilingual models represent the extension of multilingual embeddings to pretrained transformer models.

These models follow the standard pretraining-finetuning paradigm: they are first trained on unlabeled monolingual corpora from various languages (the pretraining languages) and later finetuned on target-task data in a (usually high-resource) source language. Having been exposed to a variety of languages through this training setup, cross-lingual transfer results for these models are competitive with the state of the art for many languages and tasks. Commonly used models are mBERT (Devlin et al., 2019), which is pretrained on the Wikipedias of 104 languages with masked language modeling (MLM) and next sentence prediction (NSP), and XLM, which is trained on 15 languages and introduces the translation language modeling objective, which is based on MLM, but uses pairs of parallel sentences. XLM-R has improved performance over XLM, and trains on data from 100 different languages with only the MLM objective. Common to all models is a large shared subword vocabulary created using either WordPiece (Sennrich et al., 2016) or SentencePiece (Kudo and Richardson, 2018) tokenization.

2.2 Evaluating Pretrained Multilingual Models

Just as in the monolingual setting, where benchmarks such as GLUE (Wang et al., 2018) and SuperGLUE (Wang et al., 2019) provide a look into the performance of models across various tasks, multilingual benchmarks (Hu et al., 2020; Liang et al., 2020) cover a wide variety of tasks involving sentence structure, classification, retrieval, and question answering. Evaluation is based on zero-shot transfer from English, providing a strong benchmark for cross-lingual transfer.

Additional work has been done examining what mechanisms allow multilingual models to transfer across languages (Pires et al., 2019; Wu and Dredze, 2019). Wu and Dredze (2020) examine transfer performance dependent on a language's representation in the pretraining data. For languages with low representation, multiple methods have been proposed to improve performance, including extending the vocabulary, transliterating the target text, and continuing pretraining before finetuning (Lauscher et al., 2020; Chau et al., 2020; Muller et al., 2020; Pfeiffer et al., 2020a,c; Wang et al., 2020). In this work, we focus on continued pretraining to analyze the performance of model adaptation for a high-level, semantic task.

Lang.   Source(s)                                            Sent.
aym     Tiedemann (2012)                                     6,531
bzd     Feldman and Coto-Solano (2020); Margery (2005);      7,508
        Jara Murillo (2018a); Constenla et al. (2004);
        Jara Murillo and García Segura (2013);
        Jara Murillo (2018b); Flores Solórzano (2017)
cni     Cushimariano Romano and Sebastián Q. (2008)          3,883
gn      Chiruzzo et al. (2020)                               26,032
hch     Mager et al. (2017)                                  8,966
nah     Gutierrez-Vasques et al. (2016)                      16,145
oto     https://tsunkua.elotl.mx                             4,889
quy     Agić and Vulić (2019)                                125,008
shp     Galarreta et al. (2017); Loriot et al. (1993);       14,592
        Gómez Montoya et al. (2019)
tar     Brambila (1976); github.com/pywirrarika/tar_par      14,720

2.3 Natural Language Inference

Given two sentences, the premise and the hypothesis, the task of NLI consists of determining whether the hypothesis logically entails, contradicts, or is neutral to the premise. The most widely used datasets for NLI in English are SNLI (Bowman et al., 2015) and MNLI (Williams et al., 2018). XNLI (Conneau et al., 2018) is the multilingual expansion of MNLI to 15 languages, providing manually translated evaluation sets and machine-translated training sets. While datasets for NLI or the similar task of recognizing textual entailment exist for other languages (Bos et al., 2009; Alabbas, 2013; Eichler et al., 2014; Amirkhani et al., 2020), their lack of similarity prevents a generalized study of cross-lingual zero-shot performance.
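To make the three-way NLI task of Section 2.3 and the accuracy figures quoted in the abstract more concrete, the following is a minimal sketch of how predictions could be scored against a chance baseline. It is not the authors' evaluation code: the `NLIExample` type, the `accuracy` helper, the English example sentences, and the stand-in predictions are all hypothetical, and the 33.3% chance level assumes roughly balanced classes. Only the three average accuracies are taken from the text.

```python
from dataclasses import dataclass

# The three NLI classes described in Section 2.3.
LABELS = ("entailment", "neutral", "contradiction")


@dataclass(frozen=True)
class NLIExample:
    """One premise/hypothesis pair with its gold label (hypothetical type)."""
    premise: str
    hypothesis: str
    label: str  # expected to be one of LABELS


def accuracy(gold, predicted):
    """Return the percentage of predictions that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


# Assumption: with roughly balanced classes, uniform guessing scores about 33.3%.
CHANCE = 100.0 / len(LABELS)

# Invented English examples, for illustration only (not taken from the dataset).
examples = [
    NLIExample("A man is playing a guitar on stage.",
               "A person is performing music.", "entailment"),
    NLIExample("A man is playing a guitar on stage.",
               "The man is a professional musician.", "neutral"),
    NLIExample("A man is playing a guitar on stage.",
               "Nobody is on the stage.", "contradiction"),
]

gold = [ex.label for ex in examples]
predicted = ["entailment", "neutral", "neutral"]  # stand-in for model output
score = accuracy(gold, predicted)
print(f"toy accuracy: {score:.2f}% (chance: {CHANCE:.2f}%)")

# Average accuracies quoted in the abstract, in percent.
reported = {
    "zero-shot": 38.62,
    "continued pretraining": 44.05,
    "training on translated data": 48.72,
}
for name, acc in reported.items():
    print(f"{name}: {acc - CHANCE:+.2f} points vs. chance, "
          f"{acc - reported['zero-shot']:+.2f} points vs. zero-shot")
```

Comparing against chance rather than reading raw accuracy alone shows how little headroom a 38.62% zero-shot average leaves on a three-class task. The 48.72% and 38.62% averages differ by 10.1 points, which matches the gain reported in the Introduction.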