Named Entity Recognition for African Languages

MasakhaNER: Named Entity Recognition for African Languages

David Ifeoluwa Adelani1∗, Jade Abbott2∗, Graham Neubig3, Daniel D’souza4∗, Julia Kreutzer5∗, Constantine Lignos6∗, Chester Palen-Michel6∗, Happy Buzaaba7∗, Shruti Rijhwani3, Sebastian Ruder8, Stephen Mayhew9, Israel Abebe Azime10∗, Shamsuddeen H. Muhammad11,12∗, Chris Chinenye Emezue13∗, Joyce Nakatumba-Nabende14∗, Perez Ogayo15∗, Anuoluwapo Aremu16∗, Catherine Gitau∗, Derguene Mbaye∗, Jesujoba Alabi17∗, Seid Muhie Yimam18, Tajuddeen Gwadabe19∗, Ignatius Ezeani20∗, Rubungo Andre Niyongabo21∗, Jonathan Mukiibi14, Verrah Otiende22∗, Iroro Orife23∗, Davis David∗, Samba Ngom∗, Tosin Adewumi24∗, Paul Rayson20, Mofetoluwa Adeyemi∗, Gerald Muriuki14, Emmanuel Anebi∗, Chiamaka Chukwuneke20, Nkiruka Odu25, Eric Peter Wairagala14, Samuel Oyerinde∗, Clemencia Siro∗, Tobius Saul Bateesa14, Temilola Oloyede∗, Yvonne Wambui∗, Victor Akinode∗, Deborah Nabagereka14, Maurice Katusiime14, Ayodele Awokoya26∗, Mouhamadane MBOUP∗, Dibora Gebreyohannes∗, Henok Tilaye∗, Kelechi Nwaike∗, Degaga Wolde∗, Abdoulaye Faye∗, Blessing Sibanda27∗, Orevaoghene Ahia28∗, Bonaventure F. P. Dossou29∗, Kelechi Ogueji30∗, Thierno Ibrahima DIOP∗, Abdoulaye Diallo∗, Adewale Akinfaderin∗, Tendai Marengereke∗, and Salomey Osei10∗

∗ Masakhane
1 Spoken Language Systems Group (LSV), Saarland University, Germany
2 Retro Rabbit
3 Language Technologies Institute, Carnegie Mellon University
4 ProQuest
5 Google Research
6 Brandeis University
7 Graduate School of Systems and Information Engineering, University of Tsukuba, Japan
8 DeepMind
9 Duolingo
10 African Institute for Mathematical Sciences (AIMS-AMMI)
11 University of Porto
12 Bayero University, Kano
13 Technical University of Munich, Germany
14 Makerere University, Kampala, Uganda
15 African Leadership University, Rwanda
16 University of Lagos, Nigeria
17 Max Planck Institute for Informatics, Germany
18 LT Group, Universität Hamburg
19 University of Chinese Academy of Science, China
20 Lancaster University
21 University of Electronic Science and Technology of China, China
22 United States International University - Africa (USIU-A), Kenya
23 Niger-Volta LTI
24 Luleå University of Technology
25 African University of Science and Technology, Abuja
26 University of Ibadan, Nigeria
27 Namibia University of Science and Technology
28 Instadeep
29 Jacobs University Bremen, Germany
30 University of Waterloo

Abstract

We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. We release the data, code, and models in order to inspire future research on African NLP.1

1 https://github.com/masakhane-io/masakhane-ner

1 Introduction

Africa has over 2,000 languages (Eberhard et al., 2020); however, these languages are scarcely represented in existing natural language processing (NLP) datasets, research, and tools (Martinus and Abbott, 2019). ∀ et al. (2020) investigate the reasons for these disparities by examining how NLP for low-resource languages is constrained by several societal factors. One of these factors is the geographical and language diversity of NLP researchers. For example, only 5 out of the 2695 affiliations of authors whose work was published at the five major NLP conferences in 2019 are from African institutions (Caines, 2019). Many NLP tasks such as machine translation, text classification, part-of-speech tagging, and named entity recognition would benefit from the knowledge of native speakers who are involved in the development of datasets and models.

In this paper, we focus on named entity recognition (NER), one of the most impactful tasks in NLP (Sang and De Meulder, 2003; Lample et al., 2016). NER is a core NLP task in information extraction and an essential component of numerous products including spell-checkers, localization of voice and dialogue systems, and conversational agents. It also enables identifying African names, places and organizations for information retrieval. African languages are under-represented in this crucial task due to a lack of datasets, reproducible results, and researchers who understand the challenges that such languages pose for NER.

In this paper, we take an initial step towards improving representation for African languages in the NER task. More specifically, this paper makes the following contributions:

(i) We bring together language speakers, dataset curators, NLP practitioners, and evaluation experts to address the constraints facing African NER. Based on the availability of online news corpora and language annotators, we develop NER datasets, models, and evaluation covering ten widely spoken African languages.

(ii) We curate NER datasets from local sources to ensure relevance of future research for native speakers of the respective languages.

(iii) We train and evaluate multiple NER models for all ten languages. Our experiments provide insights into the transfer across languages, and highlight open challenges.

(iv) We release the data, code, and model outputs in order to facilitate future research on the specific challenges of African NER.

2 Related Work

African NER datasets. NER is a well-studied sequence labeling task (Yadav and Bethard, 2018) and has been the subject of many shared tasks in different languages (Tjong Kim Sang, 2002; Tjong Kim Sang and De Meulder, 2003; ijc, 2008; Shaalan, 2014; Benikova et al., 2014). Most of the available datasets are from high-resource languages. Although there have been efforts to create NER datasets for lower-resourced languages, such as the WikiAnn corpus (Pan et al., 2017) covering 282 languages, such datasets consist of “silver-standard” labels created by transferring annotations from English to other languages through cross-lingual links in knowledge bases. As the data comes from Wikipedia, it includes some African languages, most with fewer than 10k tokens.

Other NER datasets for African languages are SADiLaR (Eiselen, 2016) for ten South African languages based on government data, and small corpora of fewer than 2K sentences for Yorùbá (Alabi et al., 2020) and Hausa (Hedderich et al., 2020). Additionally, LORELEI language packs (Strassel and Tracey, 2016) were created for some African languages including Yorùbá, Hausa, Amharic, Somali, Twi, Swahili, Wolof, Kinyarwanda, and Zulu, but are not publicly available.

NER models. Popular sequence labeling models for NER are CRF (Lafferty et al., 2001), CNN-BiLSTM (Chiu and Nichols, 2016), BiLSTM-CRF (Huang et al., 2015), and CNN-BiLSTM-CRF (Ma and Hovy, 2016). The traditional CRF makes use of hand-crafted features like part-of-speech tags, context words and word capitalization. Neural network NER models are initialized with word embeddings like Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014) and FastText (Bojanowski et al., 2016). More recently, pre-trained language models such as BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), and LUKE (Yamada et al., 2020) produce state-of-the-art results for the task. Multilingual variants of these models like mBERT and XLM-RoBERTa (Conneau et al., 2020) make it possible to train NER models for several languages using transfer learning. Language-specific parameters and adaptation to unlabelled data of the target language have yielded further gains (Pfeiffer et al., 2020a,b).

3 Focus Languages

Table 1 gives an overview of the languages considered in this work, in terms of their language family, number of speakers and the regions in Africa where they are spoken. We chose to focus on these languages due to the availability of online news corpora, annotators, and most importantly because they are widely spoken African languages. Both region and language family might indicate a notion of proximity for NER, either because of linguistic features shared within that family, or because data sources cover a common set of locally rele-

Language          Family                        Speakers   Region
Amharic           Afro-Asiatic-Ethio-Semitic    26M        East
Hausa             Afro-Asiatic-Chadic           63M        West
Igbo              Niger-Congo-Volta-Niger       27M        West
Kinyarwanda       Niger-Congo-Bantu             12M        East
Luganda           Niger-Congo-Bantu             7M         East
Luo               Nilo Saharan                  4M         East
Nigerian-Pidgin   English Creole                75M        West
Swahili           Niger-Congo-Bantu             98M        Central & East
Wolof             Niger-Congo-Senegambia        5M         West & NW
Yorùbá            Niger-Congo-Volta-Niger       42M        West

Table 1: Language, family, number of speakers (Eberhard et al., 2020), and regions in Africa.

(except c), Igbo contains the following digraphs: (ch, gb, gh, gw, kp, kw, nw, ny, sh).

Kinyarwanda (kin) makes use of 24 Latin characters with 5 vowels similar to English and 19 consonants excluding q and x. Moreover, Kinyarwanda has 74 additional complex consonants (such as mb, mpw, and njyw).2 It is a tonal language with three tones: low (no diacritic), high (signaled by “/”) and falling (signaled by “^”). The default word order is Subject-Verb-Object.

Luganda (lug) is a tonal language with subject-verb-object word order. The Luganda alphabet is composed of 24 letters that include 17 consonants (p, v, f, m, d, t, l, r, n, z, s, j, c, g), 5 vowel sounds represented in the five alphabetical symbols