Automatic Speech Recognition for Low-Resource and Morphologically Complex Languages

Total Page:16

File Type:pdf, Size:1020Kb

Automatic Speech Recognition for Low-Resource and Morphologically Complex Languages Rochester Institute of Technology RIT Scholar Works Theses 4-2021 Automatic Speech Recognition for Low-Resource and Morphologically Complex Languages Ethan Morris [email protected] Follow this and additional works at: https://scholarworks.rit.edu/theses Recommended Citation Morris, Ethan, "Automatic Speech Recognition for Low-Resource and Morphologically Complex Languages" (2021). Thesis. Rochester Institute of Technology. Accessed from This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected]. Automatic Speech Recognition for Low-Resource and Morphologically Complex Languages Ethan Morris Automatic Speech Recognition for Low-Resource and Morphologically Complex Languages Ethan Morris April 2021 A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Engineering Department of Computer Engineering Automatic Speech Recognition for Low-Resource and Morphologically Complex Languages Ethan Morris Committee Approval: Emily Prud'hommeaux Advisor Date Department of Computer Science, Boston College Alexander Loui Date Department of Computer Engineering Andreas Savakis Date Department of Computer Engineering i Acknowledgments I would like to thank Dr. Emily Prud'hommeaux for her continued support and guidance, with invaluable suggestions. I am also thankful for Dr. Alexander Loui and Dr. Andreas Savakis for being on the thesis committee. I also wish to thank Dr. Raymond Ptucha, Dr. Christopher Homan, and Robert Jimerson for their ideas, advice, and general expertise. ii Abstract The application of deep neural networks to the task of acoustic modeling for au- tomatic speech recognition (ASR) has resulted in dramatic decreases of word error rates, allowing for the use of this technology in smart phones and personal home assistants in high-resource languages. Developing ASR models of this caliber, how- ever, requires hundreds or thousands of hours of transcribed speech recordings, which presents challenges for most of the world's languages. In this work, we investigate the applicability of three distinct architectures that have previously been used for ASR in languages with limited training resources. We tested these architectures using publicly available ASR datasets for several typologically and orthographically diverse languages, whose data was produced under a variety of conditions using different speech collection strategies, practices, and equipment. Additionally, we performed data augmentation on this audio, such that the amount of data could increase nearly tenfold, synthetically creating higher resource training. The architectures and their individual components were modified, and parameters explored such that we might find a best-fit combination of features and modeling schemas to fit a specific language morphology. Our results point to the importance of considering language-specific and corpus-specific factors and experimenting with multiple approaches when developing ASR systems for resource-constrained languages. iii Contents Signature Sheet i Acknowledgments ii Abstract iii Table of Contents iv List of Figures vii List of Tables viii 1 Introduction and Motivation 1 1.1 Introduction and Motivation . 1 1.2 Document Structure . 2 2 Background 4 2.1 Introduction . 4 2.2 Automatic Speech Recognition . 4 2.3 Feature Extraction . 5 2.3.1 Spectral Features . 6 2.3.2 Mel Filterbanks . 7 2.3.3 Mel-frequency Cepstral Coefficients (MFCCs) . 7 2.3.4 Feature Space Maximum Likelihood Linear Regression (fMLLR) 8 2.3.5 I/X Vector . 8 2.4 Acoustic Models . 9 2.4.1 Hidden Markov Model . 9 2.4.2 Gaussian Mixture Model . 9 2.4.3 Deep Neural Network . 10 2.4.4 Recurrent Neural Networks . 11 2.5 Language Models . 12 2.5.1 N-Gram . 12 2.5.2 Neural . 12 2.5.3 Trans-dimensional Random Field (TRF) . 13 2.6 Data Augmentation . 13 iv CONTENTS 2.6.1 Pitch/Speed/Noise . 13 2.6.2 SpecAugment . 14 2.6.3 Speech Synthesis . 15 2.7 Evaluation Metrics . 16 2.7.1 Character/Word Error Rate . 16 2.7.2 Perplexity . 17 2.8 Frameworks . 17 2.8.1 Kaldi . 17 2.8.2 DeepSpeech . 18 2.8.3 Wav2Letter/Wav2Vec . 18 2.8.4 WireNet . 19 3 Datasets 21 3.1 Amharic . 21 3.1.1 Language Description . 21 3.1.2 Prior ASR Work . 22 3.2 Bemba . 23 3.2.1 Language Description . 23 3.2.2 Prior ASR Work . 23 3.3 Iban . 24 3.3.1 Language Description . 24 3.3.2 Prior ASR Work . 25 3.4 Seneca . 25 3.4.1 Language Description . 25 3.4.2 Prior ASR Work . 26 3.5 Swahili . 26 3.5.1 Language Description . 26 3.5.2 Prior ASR Work . 27 3.6 Vietnamese . 27 3.6.1 Language Description . 27 3.6.2 Prior ASR Work . 28 3.7 Wolof . 28 3.7.1 Language Description . 28 3.7.2 Prior ASR Work . 29 v CONTENTS 4 Methodology 30 4.1 Language Comparison . 30 4.1.1 Average Utterance Length . 30 4.1.2 Quality Measures . 30 4.2 Kaldi . 32 4.3 WireNet . 33 4.3.1 Transfer Learning . 33 4.3.2 Data Augmentation . 33 4.3.3 Architecture Exploration . 34 4.3.4 Feature Comparison . 35 4.4 Speaker Re-partitioning . 36 4.5 Language Model Exploration . 36 5 Results 37 5.1 Kaldi Results . 37 5.1.1 Overall . 37 5.1.2 Data Augmentation . 38 5.2 WireNet . 39 5.2.1 Overall Results . 39 5.2.2 Architecture Exploration . 39 5.2.3 Transfer Learning . 43 5.2.4 Feature Comparison . 44 5.3 Language Comparison . 46 5.4 Language Model Exploration . 48 5.5 Speaker Overlap . 48 6 Conclusions 56 6.1 Conclusions . 56 6.2 Future Work . 57 Bibliography 58 vi List of Figures 2.1 Generic pipeline of an ASR system. 4 2.2 An overview of the most basic feature extraction process, including the different stopping points for various features [1]. 6 2.3 Augmentations applied to the base input, given at the top. From top to bottom, the figures depict the log mel spectrogram of the base input with no augmentation, time warp, frequency masking and time masking applied [2] . 15 2.4 Left: The overall WireNet architecture. Right: A bottleneck block consisting of 9 paths, each with bottleneck filters centered by filters of different width to capture different temporal dependencies. Each layer shows (# input channels, filter width, # output channels). 20 4.1 Comparison of the WADA SNR algorithm and the average estimated NIST SNR against an artificially corrupted database. [3] . 32 5.1 CER vs. training epochs of the Swahili language at specific bottleneck depths. 41 5.2 U-Net architecture where the convolutional size increases to a maxi- mum of 1024, before reducing back to the input size. [4] . 42 5.3 CER vs. training epochs through the stages of training the Swahili language with transfer learning. 43 5.4 CER vs. training epochs through the stages of training the Swahili language without transfer learning. 44 5.5 Best produced WER vs. NIST SNR across all languages, training set depicted. 47 5.6 Best produced WER vs. WADA SNR across all languages, training set depicted. 47 5.7 WER vs. % of Test Set for the Bemba Language. 51 5.8 WER vs. % of Test Set for the Seneca Language. 53 5.9 WER vs. % of Test Set for the Wolof Language. 55 vii List of Tables 3.1 Amharic Dataset Information . 22 3.2 Bemba Dataset Information . 23 3.3 Iban Dataset Information . 24 3.4 Seneca Dataset Information . 26 3.5 Swahili Dataset Information . 27 3.6 Vietnamese Dataset Information . 28 3.7 Wolof Dataset Information . 29 5.1 Overall Kaldi results for the two best architectures across all languages. 37 5.2 Kaldi results for the two best architectures across the Wolof language with data augmentation. 38 5.3 Overall WireNet results across all languages. 39 5.4 WireNet architecture exploration for Swahili, varying the width and depth of the system. ..
Recommended publications
  • Trilingual Codeswitching in Kenya – Evidence from Ekegusii, Kiswahili, English and Sheng
    Trilingual Codeswitching in Kenya – Evidence from Ekegusii, Kiswahili, English and Sheng Dissertation zur Erlangung der Würde des Doktors der Philosophie der Universität Hamburg vorgelegt von Nathan Oyori Ogechi aus Kenia Hamburg 2002 ii 1. Gutachterin: Prof. Dr. Mechthild Reh 2. Gutachter: Prof. Dr. Ludwig Gerhardt Datum der Disputation: 15. November 2002 iii Acknowledgement I am indebted to many people for their support and encouragement. It is not possible to mention all by name. However, it would be remiss of me not to name some of them because their support was too conspicuous. I am bereft of words with which to thank my supervisor Prof. Dr. Mechthild Reh for accepting to supervise my research and her selflessness that enabled me secure further funding at the expiry of my one-year scholarship. Her thoroughness and meticulous supervision kept me on toes. I am also indebted to Prof. Dr. Ludwig Gerhardt for reading my error-ridden draft. I appreciate the support I received from everybody at the Afrika-Abteilung, Universität Hamburg, namely Dr. Roland Kießling, Theda Schumann, Dr. Jutta Becher, Christiane Simon, Christine Pawlitzky and the institute librarian, Frau Carmen Geisenheyner. Professors Myers-Scotton, Kamwangamalu, Clyne and Auer generously sent me reading materials whenever I needed them. Thank you Dr. Irmi Hanak at Afrikanistik, Vienna, Ndugu Abdulatif Abdalla of Leipzig and Bi. Sauda Samson of Hamburg. I thank the DAAD for initially funding my stay in Deutschland. Professors Miehe and Khamis of Bayreuth must be thanked for their selfless support. I appreciate the kind support I received from the Akademisches Auslandsamt, University of Hamburg.
    [Show full text]
  • Cover Page the Handle Holds Various Files of This Leiden University Dissertation. Author: Lima
    Cover Page The handle http://hdl.handle.net/1887/85723 holds various files of this Leiden University dissertation. Author: Lima Santiago J. de Title: Zoonímia Histórico-comparativa: Denominações dos antílopes em bantu Issue Date: 2020-02-26 729 ANEXO 1: TABELA RECAPITULATIVA DAS PROTOFORMAS Nas protoformas provenientes do BLR (2003) e nas reconstruções de outros autores (majoritariamente, Mouguiama & Hombert, 2006), as classes nominais em negrito e sublinhadas, são sugestões da autora da tese. Significados Reconstruções Propostas Propostas do BLR e de de correções (De Lima outros autores Santiago) *-bʊ́dʊ́kʊ́ °-bʊ́dʊ́gʊ́ (cl. 9/10, 12/13) °-cénda (cl. 12/13) Philantomba °-cótɩ́ monticola (cl. 12/13) *-kùengà > °-kùèngà (cl. 11/5, 7/8) °°-cécɩ/ °°-cétɩ (cl. 9/10, 12/13) *-pàmbı ́ °-pàmbɩ́ (cl. 9/10) °-dòbò Cephalophus (cl. 3+9/4, nigrifrons 5/6) *-pùmbɩ̀dɩ̀ °-pùmbèèdɩ̀ (cl. 9/10, 9/6) 730 Significados Reconstruções Propostas Propostas do BLR e de de correções (De Lima outros autores Santiago) *-jʊ́mbɩ̀ (cl. 9/10, 3/4) °°-cʊ́mbɩ (cl. 9/10, 5/6, 7/8, 11/10) *-jìbʊ̀ °-tʊ́ndʊ́ Cephalophus (cl. 9/10) (cl. 9/10) silvicultor °°-bɩ́mbà °-bɩ̀mbà (cl. 9/10) °-kʊtɩ (cl. 9, 3) *-kʊ́dʊ̀pà/ °-bɩ́ndɩ́ *-kúdùpà (cl. 9/10, 7/8, (cl. 9/10) 3, 12/13) Cephalophus dorsalis °°-cíbʊ̀ °-pòmbɩ̀ (cl. 7/8) (cl. 9/10) °°-cʊmɩ >°-cʊmɩ́ °-gindà (cl. 9) Cephalophus (cl. 3/4) callipygus °°-cábè >°-cábà (cl. 9/10, 7/8) °°-bɩ̀jɩ̀ (cl. 9) 731 Significados Reconstruções Propostas Propostas do BLR e de de correções (De Lima outros autores Santiago) *-bengeda >°-bèngédè °-cégé (cl.9/10) (cl. 9/10) °°-àngàdà >°-jàngàdà Cephalophus (cl.
    [Show full text]
  • Swahili and the Dilemma of Ugandan Language Policy*
    ASIAN AND AFRICAN STUDIES, 5, 1996, 2, 158170 SWAHILI AND THE DILEMMA OF UGANDAN LANGUAGE POLICY* Viera PAWLIKOVÁ-VILHANOVÁ Institute of Oriental and African Studies, Slovak Academy of Sciences, Klemensova 19, 813 64 Bratislava, Slovakia The language situation in East Africa is characterized by the widespread use of Swahili. While both in Tanzania and Kenya Swahili has been systematically promoted in all spheres of everyday life, Uganda has lacked a coherent government policy on language development and the position of Swahili in Uganda has always been very ambiguous. The continued vacil- lation over language policy and the Luganda/Swahili opposition threatening to carve the country into two major camps of language choice which is characteristic of the Ugandan lan- guage situation suggests that the language issue is not likely to be solved in the near future. Africa probably has the most complex and varied language situation in the world. It is a well known fact that the national boundaries of African countries drawn arbitrarily by the colonial powers at conferences in Europe during the time of the imperial partition of the African continent, pay little regard to the historical, cultural and linguistic affinity of the Africans. Few African states are known for their linguistic homogeneity.1 Given a very complex language situa- tion in Africa south of the Sahara with African languages co-existing and com- peting with European languages as well as with various lingua francas, Pidgins and Creoles, multilinguism and hence multilingualism is a feature of most Afri- can countries and it seems that it will remain the norm for a long time to come.
    [Show full text]
  • Local Referenda on Fisheries Futures for Lake Tanganyika
    RESEARCH FOR THE MANAGEMENT OF THE FISHERIES ON LAKE TANGANYIKA GCP/RAF/271/FIN-TD/91 (En) GCP/RAF/271/FIN-TD/91(En) February 1999 BUILDING MANAGEMENT PARTNERSHIPS: LOCAL REFERENDA ON FISHERIES FUTURES FOR LAKE TANGANYIKA Edited By: J.E. Reynolds FINNISH INTERNATIONAL DEVELOPMENT AGENCY FOOD AND AGRICULTURE ORGANIZATION OF THE UNITED NATIONS Bujumbura, February 1999 The conclusions and recommendations given in this and other reports in the Research for the Management of the Fisheries on the Lake Tanganyika Project series are those considered appropriate at the time of preparation. They may be modified in the light of further knowledge gained at subsequent stages of the Project. The designations employed and the presentation of material in this publication do not imply the expression of any opinion on the part of FAO or FINNIDA concerning the legal status of any country, territory, city or area, or concerning the determination of its frontiers or boundaries. PREFACE The Research for the Management of the Fisheries on Lake Tanganyika project (LTR) became fully operational in January 1992. It is executed by the Food and Agriculture Organization of the United Nations (FAO) and funded by the Finnish International Development Agency (FINNIDA) and the Arab Gulf Program for the United Nations Development Organization (AGFUND). LTR's objective is the determination of the biological basis for fish production on Lake Tanganyika, in order to permit the formulation of a coherent lake-wide fisheries management policy for the four riparian States (Burundi, Democratic Republic of Congo, Tanzania, and Zambia). Particular attention is given to the reinforcement of the skills and physical facilities of the fisheries research units in all four beneficiary countries as well as to the build-up of effective coordination mechanisms to ensure full collaboration between the Governments concerned.
    [Show full text]
  • Learning Swahili Morphology, with Fidele Mpiranya. 2018
    Learning Swahili morphology John Goldsmith and Fidèle Mpiranya July 23, 2018 1 Introduction In this paper we would like to explain some of the things that we have learned from a project on the learning of morphology. “Learning of morphology” in this context means using an algorithm which takes a large amount of text from a language, and draws conclusions about what are the roots, affixes, and principles of word construction (from roots and affixes) in this particular language. The crucial fact to bear in mind is that the algorithm is to have no prior knowledge of the language that we give to it. If the language is English or is Swahili, the learning algorithm starts from the same point; any differences that it draws between the two derive entirely from the data, and not from anything that we have given to the algorithm. That sounds like a tall order, and in some ways it is. But we can offer the following as motivation for this work. When we teach an introductory course on linguistics, we always reserve the second class on morphology for an experiment. We begin by putting a word on the board: ninasema, but we do not tell them this is from Swahili. We ask if anyone knows what it means or what language it comes from; if someone does know, we tell them to be quiet for the rest of the class. Then we ask everyone else to divide it into morphemes. There is silence, of course, because the students think they have no idea what the right answer is.
    [Show full text]
  • The International Status of Kiswahili: the Parameters of Braj Kachru's
    The International Status of Kiswahili: The Parameters of Braj Kachru’s Model of World Englishes by Patrick Lugwiri Okombo, MA [email protected] Nairobi, Kenya & Edwin Muna, MA [email protected] Nairobi, Kenya Abstract This paper critically examines the status of Kiswahili as an international language within the parameters of Braj Kachru’s sociolinguistic construct – “Model of World Englishes”, that works to analyze how sociolinguistic histories, multicultural backgrounds and contexts of function influence the use of English in different regions of the world. In our view, such a model yields fairly objective results. Thus, the results of this paper are that Kiswahili perfectly meets the standards of an international language within Braj Kachru’s model, albeit with few disparities. The paper concludes that Kiswahili’s growth into an international language is clearly taking the trends taken by English as examined by Kachru. Hence, the findings reinforce Kiswahili’s geo- political significance as it recommends the creation of conditions that would encourage its spread and influence in the world. Key words: Kiswahili, international language, world Englishes, native speaker 55 Africology: The Journal of Pan African Studies, vol.10, no.7, September 2017 Introduction Kiswahili is an indigenous African language whose origin, according to many researchers, is the coast of Eastern Africa. Traditionally, it was regarded as the language of the coastal communities of Kenya and Tanzania. It remained the language of the people of East African coast for a long time. It is argued that the early visitors and traders, such as the Arabs and Persians who came to the East African coast, used to speak with the natives in Kiswahili.
    [Show full text]
  • [.35 **Natural Language Processing Class Here Computational Linguistics See Manual at 006.35 Vs
    006 006 006 DeweyiDecimaliClassification006 006 [.35 **Natural language processing Class here computational linguistics See Manual at 006.35 vs. 410.285 *Use notation 019 from Table 1 as modified at 004.019 400 DeweyiDecimaliClassification 400 400 DeweyiDecimali400Classification Language 400 [400 [400 *‡Language Class here interdisciplinary works on language and literature For literature, see 800; for rhetoric, see 808. For the language of a specific discipline or subject, see the discipline or subject, plus notation 014 from Table 1, e.g., language of science 501.4 (Option A: To give local emphasis or a shorter number to a specific language, class in 410, where full instructions appear (Option B: To give local emphasis or a shorter number to a specific language, place before 420 through use of a letter or other symbol. Full instructions appear under 420–490) 400 DeweyiDecimali400Classification Language 400 SUMMARY [401–409 Standard subdivisions and bilingualism [410 Linguistics [420 English and Old English (Anglo-Saxon) [430 German and related languages [440 French and related Romance languages [450 Italian, Dalmatian, Romanian, Rhaetian, Sardinian, Corsican [460 Spanish, Portuguese, Galician [470 Latin and related Italic languages [480 Classical Greek and related Hellenic languages [490 Other languages 401 DeweyiDecimali401Classification Language 401 [401 *‡Philosophy and theory See Manual at 401 vs. 121.68, 149.94, 410.1 401 DeweyiDecimali401Classification Language 401 [.3 *‡International languages Class here universal languages; general
    [Show full text]
  • The Languages of Tanzania There Are About 112 Indi,Enous African Languages in Tanzania (Grimes 1992)
    LANGUAGE SHIFI' AND NATIONAL IDENTITY IN TANZANIA I Deo Ngonyani Introduction Tanzania is a country of many languages. While Swahili is used by everyone in different situations, vernacular languages are restricted to the different ethnic groups and English is mainly used in education. Vernacular languages, Swahili and English are competitors whose fonuncs are tied to the socio-political changes in the country. In the past flfty years the use of the Swahili language has increased tremendously in Tanzania. Vinually all Tanzanians speak Swahili today and Swahili has become an identity marker for Tanzanians. The use of Swahili has expanded so much that it is now replacing vernacular languages as the language of everyday interaction and is also replacing English as the languaJe of education and government. In this paper, I illusttate that there 1s a process of language shift in Tanzan1a. I also show that due to different factors, Swahili bas become the language of Tanzanian identity. The Language Situtation In order to understand the dynamics propelling Swahili to such a position of prominence, one needs to look at the language situation. In this section I briefly discuss what languages are spoken and for what functions as well as how the different languages relate to each other. The Languages of Tanzania There are about 112 indi,enous African languages in Tanzania (Grimes 1992). The majority o them (101 languages) belong to the Bantu language group. The other African language super-families are also represented. There are 4 Nilotic languages which include Maasai and Tatoga, 5 Cushitic languages such as Iraqw, and 2 Khoisan languages, Hatsa and Sandawe.
    [Show full text]
  • LCSH Section K
    K., Rupert (Fictitious character) K-TEA (Achievement test) Kʻa-la-kʻun-lun kung lu (China and Pakistan) USE Rupert (Fictitious character : Laporte) USE Kaufman Test of Educational Achievement USE Karakoram Highway (China and Pakistan) K-4 PRR 1361 (Steam locomotive) K-theory Ka Lae o Kilauea (Hawaii) USE 1361 K4 (Steam locomotive) [QA612.33] USE Kilauea Point (Hawaii) K-9 (Fictitious character) (Not Subd Geog) BT Algebraic topology Ka Lang (Vietnamese people) UF K-Nine (Fictitious character) Homology theory USE Giẻ Triêng (Vietnamese people) K9 (Fictitious character) NT Whitehead groups Ka nanʻʺ (Burmese people) (May Subd Geog) K 37 (Military aircraft) K. Tzetnik Award in Holocaust Literature [DS528.2.K2] USE Junkers K 37 (Military aircraft) UF Ka-Tzetnik Award UF Ka tūʺ (Burmese people) K 98 k (Rifle) Peras Ḳ. Tseṭniḳ BT Ethnology—Burma USE Mauser K98k rifle Peras Ḳatseṭniḳ ʾKa nao dialect (May Subd Geog) K.A.L. Flight 007 Incident, 1983 BT Literary prizes—Israel BT China—Languages USE Korean Air Lines Incident, 1983 K2 (Pakistan : Mountain) Hmong language K.A. Lind Honorary Award UF Dapsang (Pakistan) Ka nō (Burmese people) USE Moderna museets vänners skulpturpris Godwin Austen, Mount (Pakistan) USE Tha noʹ (Burmese people) K.A. Linds hederspris Gogir Feng (Pakistan) Ka Rang (Southeast Asian people) USE Moderna museets vänners skulpturpris Mount Godwin Austen (Pakistan) USE Sedang (Southeast Asian people) K-ABC (Intelligence test) BT Mountains—Pakistan Kā Roimata o Hine Hukatere (N.Z.) USE Kaufman Assessment Battery for Children Karakoram Range USE Franz Josef Glacier/Kā Roimata o Hine K-B Bridge (Palau) K2 (Drug) Hukatere (N.Z.) USE Koro-Babeldaod Bridge (Palau) USE Synthetic marijuana Ka-taw K-BIT (Intelligence test) K3 (Pakistan and China : Mountain) USE Takraw USE Kaufman Brief Intelligence Test USE Broad Peak (Pakistan and China) Ka Tawng Luang (Southeast Asian people) K.
    [Show full text]
  • Swahili-English Dictionary, the First New Lexical Work for English Speakers
    S W AHILI-E N GLISH DICTIONARY Charles W. Rechenbach Assisted by Angelica Wanjinu Gesuga Leslie R. Leinone Harold M. Onyango Josiah Florence G. Kuipers Bureau of Special Research in Modern Languages The Catholic University of Americ a Prei Washington. B. C. 20017 1967 INTRODUCTION The compilers of this Swahili-English dictionary, the first new lexical work for English speakers in many years, hope that they are offering to students and translators a more reliable and certainly a more up-to-date working tool than any previously available. They trust that it will prove to be of value to libraries, researchers, scholars, and governmental and commercial agencies alike, whose in- terests and concerns will benefit from a better understanding and closer communication with peoples of Africa. The Swahili language (Kisuiahili) is a Bantu language spoken by perhaps as many as forty mil- lion people throughout a large part of East and Central Africa. It is, however, a native or 'first' lan- guage only in a nnitp restricted area consisting of the islands of Zanzibar and Pemba and the oppo- site coast, roughly from Dar es Salaam to Mombasa, Outside this relatively small territory, elsewhere in Kenya, in Tanzania (formerly Tanganyika), Copyright © 1968 and to a lesser degree in Uganda, in the Republic of the Congo, and in other fringe regions hard to delimit, Swahili is a lingua franca of long standing, a 'second' (or 'third' or 'fourth') language enjoy- ing a reasonably well accepted status as a supra-tribal or supra-regional medium of communication. THE CATHOLIC UNIVERSITY OF AMERICA PRESS, INC.
    [Show full text]
  • Some Linguistic Variations of Bemba
    SOME LINGUISTIC VARIATIONS OF BEMBA: A DIALECTOLOGICAL STUDY OF STANDARD BEMBA, LUUNDA AND ŊUMBO BY CHIBWE RONALD LUMWANGA A DISSERTATION SUBMITTED TO THE UNIVERSITY OF ZAMBIA IN PARTIAL FULFILMENT OF THE REQUIREMENTS OF THE DEGREE OF MASTER OF ARTS IN LINGUISTIC SCIENCE THE UNIVERSITY OF ZAMBIA 2015 DECLARATION I, Chibwe Ronald Lumwanga, do hereby declare that this dissertation is my own work, and that it has not been submitted for a degree at this university or any other, and that it does not include any published work or material from another dissertation or a thesis without acknowledgement. Signed............................................................................. Date................................................................................. © Chibwe Ronald Lumwanga, 2015. All rights reserved. i ii APPROVAL This dissertation of Chibwe Ronald Lumwanga is approved as fulfilling part of the requirements for the award of the degree of Master of Arts in Linguistic Science of the University of Zambia. Signed: ............................................................. Date: ................................................................. Signed: ............................................................. Date: ................................................................. Signed: ............................................................. Date: ................................................................. iii ABSTRACT This study investigated some linguistic variations among three Bemba dialects, namely: Standard
    [Show full text]
  • Storytelling in Northern Zambia: Theory, Method, Practice and Other Necessary Fictions
    To access digital resources including: blog posts videos online appendices and to purchase copies of this book in: hardback paperback ebook editions Go to: https://www.openbookpublishers.com/product/137 Open Book Publishers is a non-profit independent initiative. We rely on sales and donations to continue publishing high-quality academic works. Man playing the banjo, Kaputa (northern Zambia), 1976. Photo by Robert Cancel World Oral Literature Series: Volume 3 Storytelling in Northern Zambia: Theory, Method, Practice and Other Necessary Fictions Robert Cancel http://www.openbookpublishers.com © 2013 Robert Cancel. Foreword © 2013 Mark Turin. This book is licensed under a Creative Commons Attribution 3.0 Unported license (CC-BY 3.0). This license allows you to share, copy, distribute and transmit the work; to adapt the work and to make commercial use of the work providing attribution is made the respective authors (but not in any way that suggests that they endorse you or your use of the work). Further details available at http:// creativecommons.org/licenses/by/3.0/ Attribution should include the following information: Cancel, Robert. Storytelling in Northern Zambia: Theory, Method, Practice and Other Necessary Fictions. Cambridge, UK: Open Book Publishers, 2013. This is the third volume in the World Oral Literature Series, published in association with the World Oral Literature Project. World Oral Literature Series: ISSN: 2050-7933 Digital material and resources associated with this volume are hosted by the World Oral Literature Project (http://www.oralliterature.org/collections/rcancel001.html) and Open Book Publishers (http://www.openbookpublishers.com/isbn/9781909254596). ISBN Hardback: 978-1-909254-60-2 ISBN Paperback: 978-1-909254-59-6 ISBN Digital (PDF): 978-1-909254-61-9 ISBN Digital ebook (epub): 978-1-909254-62-6 ISBN Digital ebook (mobi): 978-1-909254-63-3 DOI: 10.11647/OBP.0033 Cover image: Mr.
    [Show full text]