Speech Recognition for Under-Resourced Languages: Data
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
THE DEVELOPMENT of ACCENTED ENGLISH SYNTHETIC VOICES By
THE DEVELOPMENT OF ACCENTED ENGLISH SYNTHETIC VOICES by PROMISE TSHEPISO MALATJI DISSERTATION Submitted in fulfilment of the requirements for the degree of MASTER OF SCIENCE in COMPUTER SCIENCE in the FACULTY OF SCIENCE AND AGRICULTURE (School of Mathematical and Computer Sciences) at the UNIVERSITY OF LIMPOPO SUPERVISOR: Mr MJD Manamela CO-SUPERVISOR: Dr TI Modipa 2019 DEDICATION In memory of my grandparents, Cecilia Khumalo and Alfred Mashele, who always believed in me! ii DECLARATION I declare that THE DEVELOPMENT OF ACCENTED ENGLISH SYNTHETIC VOICES is my own work and that all the sources that I have used or quoted have been indicated and acknowledged by means of complete references and that this work has not been submitted before for any other degree at any other institution. ______________________ ___________ Signature Date iii ACKNOWLEDGEMENTS I want to recognise the following people for their individual contributions to this dissertation: • My brother, Mr B.I. Khumalo and the whole family for the unconditional love, support and understanding. • A distinct thank you to both my supervisors, Mr M.J.D. Manamela and Dr T.I. Modipa, for their guidance, motivation, and support. • The Telkom Centre of Excellence for Speech Technology for providing the resources and support to make this study a success. • My colleagues in Department of Computer Science, Messrs V.R. Baloyi and L.M. Kola, for always motivating me. • A special thank you to Mr T.J. Sefara for taking his time to participate in the study. • The six Computer Science undergraduate students who sacrificed their precious time to participate in data collection. -
Prioritizing African Languages: Challenges to Macro-Level Planning for Resourcing and Capacity Building
Prioritizing African Languages: Challenges to macro-level planning for resourcing and capacity building Tristan M. Purvis Christopher R. Green Gregory K. Iverson University of Maryland Center for Advanced Study of Language Abstract This paper addresses key considerations and challenges involved in the process of prioritizing languages in an area of high linguistic di- versity like Africa alongside other world regions. The paper identifies general considerations that must be taken into account in this process and reviews the placement of African languages on priority lists over the years and across different agencies and organizations. An outline of factors is presented that is used when organizing resources and planning research on African languages that categorizes major or crit- ical languages within a framework that allows for broad coverage of the full linguistic diversity of the continent. Keywords: language prioritization, African languages, capacity building, language diversity, language documentation When building language capacity on an individual or localized level, the question of which languages matter most is relatively less complicated than it is for those planning and providing for language capabilities at the macro level. An American anthropology student working with Sierra Leonean refugees in Forecariah, Guinea, for ex- ample, will likely know how to address and balance needs for lan- guage skills in French, Susu, Krio, and a set of other languages such as Temne and Mandinka. An education official or activist in Mwanza, Tanzania, will be concerned primarily with English, Swahili, and Su- kuma. An administrator of a grant program for Less Commonly Taught Languages, or LCTLs, or a newly appointed language authori- ty for the United States Department of Education, Department of Commerce, or U.S. -
Corpus Collection for Under-Resourced Languages with More Than One Million Speakers
Corpus collection for under-resourced languages with more than one million speakers Dirk Goldhahn1, Maciej Sumalvico1, Uwe Quasthoff1,2 Natural Language Processing Group, University of Leipzig, Germany Department of African Languages, University of South Africa, South Africa Email: { dgoldhahn, janicki, quasthoff, }@informatik.uni-leipzig.de Abstract For only 40 of about 350 languages with more than one million speakers, the situation concerning text resources is comfortable. For the remaining languages, the number of speakers indicates a need for both corpora and tools. This paper describes a corpus collection initiative for these languages. While random Web crawling has serious limitations, native speakers with knowledge of web pages in their language are of invaluable help. The aim is to give interested scholars, language enthusiasts the possibility to contribute to corpus creation or extension by simply entering a URL into the Web Interface. Using a Web portal URLs of interest are collected with the help of the respective communities. A standardized corpus processing chain for daily newspaper corpora creation is adapted to append newly added web pages to an increasing corpus. As a result we will be able to collect larger corpora for under-resourced languages by a community effort. These corpora will be made publicly available. Keywords: corpora, under-resourced languages, Web portal, community 1. Introduction links require the execution of JavaScript code by the There are about 350 languages with more than one million crawler which often produces errors. So, if a website speakers1. For about 40 of them, the situation concerning heavily uses JavaScript links and there are no other links text resources is comfortable: there are corpora of pointing to special pages (coming from another website, reasonable size and also tools like POS taggers adapted to for instance), then all but the main page might be these languages. -
Cape Flats English'i
The copyright of this thesis vests in the author. No quotation from it or information derived from it is to be published without full acknowledgementTown of the source. The thesis is to be used for private study or non- commercial research purposes only. Cape Published by the University ofof Cape Town (UCT) in terms of the non-exclusive license granted to UCT by the author. University , II , Focusing and Diffusion inI 'Cape Flats English'I A sociophonetic study of three vowels Justin Brown (BRWJUS002) A minor dissertation submitted in partial fullfilment of the requirements for the award of the degree of Master of Arts in Linguistics Faculty of the Humanities UniversityUniversity of of Cape Cape Town Town January 20122 COMPULSORY DECLARATION This work has not been previously submittedIII"\I"II'HIT.c.1'II in whole, or in part, for the award of any degree. It is my own work. Each significant contribution to, and quotation in, this dissertation from the work, or works, of other people has been attributed, and has been cited and referenced. Signature:, ________________Date: _____ Abstract This research contributes to the wider fields of sociophonetics and the social dialectology of English in South Africa. The study looks at three vowel sets; GOOSE, BATH and KIT taken from Wells (1982). The study was designed to identify and attempt to explain potential differences in pronunciation amongst speakers in an English-speaking community living in Cape Town and classified as 'Coloured' during apartheid. The community in question has used English as their first language for several generations and has enjoyed some of the economic advantages attached to this while at the same time being the victims (historically) of discrimination and marginalization. -
English in South Africa: Effective Communication and the Policy Debate
ENGLISH IN SOUTH AFRICA: EFFECTIVE COMMUNICATION AND THE POLICY DEBATE INAUGURAL LECTURE DELIVERED AT RHODES UNIVERSITY on 19 May 1993 by L.S. WRIGHT BA (Hons) (Rhodes), MA (Warwick), DPhil (Oxon) Director Institute for the Study of English in Africa GRAHAMSTOWN RHODES UNIVERSITY 1993 ENGLISH IN SOUTH AFRICA: EFFECTIVE COMMUNICATION AND THE POLICY DEBATE INAUGURAL LECTURE DELIVERED AT RHODES UNIVERSITY on 19 May 1993 by L.S. WRIGHT BA (Hons) (Rhodes), MA (Warwick), DPhil (Oxon) Director Institute for the Study of English in Africa GRAHAMSTOWN RHODES UNIVERSITY 1993 First published in 1993 by Rhodes University Grahamstown South Africa ©PROF LS WRIGHT -1993 Laurence Wright English in South Africa: Effective Communication and the Policy Debate ISBN: 0-620-03155-7 No part of this book may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photo-copying, recording or otherwise, without the prior permission of the publishers. Mr Vice Chancellor, my former teachers, colleagues, ladies and gentlemen: It is a special privilege to be asked to give an inaugural lecture before the University in which my undergraduate days were spent and which holds, as a result, a special place in my affections. At his own "Inaugural Address at Edinburgh" in 1866, Thomas Carlyle observed that "the true University of our days is a Collection of Books".1 This definition - beloved of university library committees worldwide - retains a certain validity even in these days of microfiche and e-mail, but it has never been remotely adequate. John Henry Newman supplied the counterpoise: . no book can convey the special spirit and delicate peculiarities of its subject with that rapidity and certainty which attend on the sympathy of mind with mind, through the eyes, the look, the accent and the manner. -
Old Frisian, an Introduction To
An Introduction to Old Frisian An Introduction to Old Frisian History, Grammar, Reader, Glossary Rolf H. Bremmer, Jr. University of Leiden John Benjamins Publishing Company Amsterdam / Philadelphia TM The paper used in this publication meets the minimum requirements of 8 American National Standard for Information Sciences — Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984. Library of Congress Cataloging-in-Publication Data Bremmer, Rolf H. (Rolf Hendrik), 1950- An introduction to Old Frisian : history, grammar, reader, glossary / Rolf H. Bremmer, Jr. p. cm. Includes bibliographical references and index. 1. Frisian language--To 1500--Grammar. 2. Frisian language--To 1500--History. 3. Frisian language--To 1550--Texts. I. Title. PF1421.B74 2009 439’.2--dc22 2008045390 isbn 978 90 272 3255 7 (Hb; alk. paper) isbn 978 90 272 3256 4 (Pb; alk. paper) © 2009 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Co. · P.O. Box 36224 · 1020 me Amsterdam · The Netherlands John Benjamins North America · P.O. Box 27519 · Philadelphia pa 19118-0519 · usa Table of contents Preface ix chapter i History: The when, where and what of Old Frisian 1 The Frisians. A short history (§§1–8); Texts and manuscripts (§§9–14); Language (§§15–18); The scope of Old Frisian studies (§§19–21) chapter ii Phonology: The sounds of Old Frisian 21 A. Introductory remarks (§§22–27): Spelling and pronunciation (§§22–23); Axioms and method (§§24–25); West Germanic vowel inventory (§26); A common West Germanic sound-change: gemination (§27) B. -
Binary Tree — up to 3 Related Nodes (List Is Special-Case)
trees 1 are lists enough? for correctness — sure want to efficiently access items better than linear time to find something want to represent relationships more naturally 2 inter-item relationships in lists 1 2 3 4 5 List: nodes related to predecessor/successor 3 trees trees: allow representing more relationships (but not arbitrary relationships — see graphs later in semester) restriction: single path from root to every node implies single path from every node to every other node (possibly through root) 4 natural trees: phylogenetic tree image: Ivicia Letunic and Mariana Ruiz Villarreal, via the tool iTOL (Interative Tree of Life), via Wikipedia 5 natural trees: phylogenetic tree (zoom) image: Ivicia Letunic and Mariana Ruiz Villarreal, via the tool iTOL (Interative Tree of Life), via Wikipedia 6 natural trees: Indo-European languages INDO-EUROPEAN ANATOLIAN Luwian Hittite Carian Lydian Lycian Palaic Pisidian HELLENIC INDO-IRANIAN DORIAN Mycenaean AEOLIC INDO-ARYAN Doric Attic ACHAEAN Aegean Northwest Greek Ionic Beotian Vedic Sanskrit Classical Greek Arcado Thessalian Tsakonian Koine Greek Epic Greek Cypriot Sanskrit Prakrit Greek Maharashtri Gandhari Shauraseni Magadhi Niya ITALIC INSULAR INDIC Konkani Paisaci Oriya Assamese BIHARI CELTIC Pali Bengali LATINO-FALISCAN SABELLIC Dhivehi Marathi Halbi Chittagonian Bhojpuri CONTINENTAL Sinhalese CENTRAL INDIC Magahi Faliscan Oscan Vedda Maithili Latin Umbrian Celtiberian WESTERN INDIC HINDUSTANI PAHARI INSULAR Galatian Classical Latin Aequian Gaulish NORTH Bhil DARDIC Hindi Urdu CENTRAL EASTERN -
Afrikaans FAMILY HISTORY LIBRARY" SALTLAKECITY, UTAH TMECHURCHOF JESUS CHRISTOF Latl'er-Qt.Y SAINTS
GENEALOGICAL WORD LIST ~ Afrikaans FAMILY HISTORY LIBRARY" SALTLAKECITY, UTAH TMECHURCHOF JESUS CHRISTOF LATl'ER-Qt.y SAINTS This list contains Afrikaans words with their English these compound words are included in this list. You translations. The words included here are those that will need to look up each part of the word separately. you are likely to find in genealogical sources. If the For example, Geboortedag is a combination of two word you are looking for is not on this list, please words, Geboorte (birth) and Dag (day). consult a Afrikaans-English dictionary. (See the "Additional Resources" section below.) Alphabetical Order Afrikaans is a Germanic language derived from Written Afrikaans uses a basic English alphabet several European languages, primarily Dutch. Many order. Most Afrikaans dictionaries and indexes as of the words resemble Dutch, Flemish, and German well as the Family History Library Catalog..... use the words. Consequently, the German Genealogical following alphabetical order: Word List (34067) and Dutch Genealogical Word List (31030) may also be useful to you. Some a b c* d e f g h i j k l m n Afrikaans records contain Latin words. See the opqrstuvwxyz Latin Genealogical Word List (34077). *The letter c was used in place-names and personal Afrikaans is spoken in South Africa and Namibia and names but not in general Afrikaans words until 1985. by many families who live in other countries in eastern and southern Africa, especially in Zimbabwe. Most The letters e, e, and 0 are also used in some early South African records are written in Dutch, while Afrikaans words. -
The Pronunciation of English in South Africa by L.W
The Pronunciation of English in South Africa by L.W. Lanham, Professor Emeritus, Rhodes University, 1996 Introduction There is no one, typical South African English accent as there is one overall Australian English accent. The variety of accents within the society is in part a consequence of the varied regional origins of groups of native English speakers who came to Africa at different times, and in part a consequence of the variety of mother tongues of the different ethnic groups who today use English so extensively that they must be included in the English-using community. The first truly African, native English accent in South Africa evolved in the speech of the children of the 1820 Settlers who came to the Eastern Cape with parents who spoke many English dialects. The pronunciation features which survive are mainly those from south-east England with distinct Cockney associations. The variables (distinctive features of pronunciation) listed under A below may be attributed to this origin. Under B are listed variables of probable Dutch origin reflecting close association and intermarriage with Dutch inhabitants of the Cape. There was much contact with Xhosa people in that area, but the effect of this was almost entirely confined to the vocabulary. (The English which evolved in the Eastern and Central Cape we refer to as Cape English.) The next large settlement from Britain took place in Natal between 1848 and 1862 giving rise to pronunciation variables pointing more to the Midlands and north of England (List C). The Natal settlers had a strong desire to remain English in every aspect of identity, social life, and behaviour. -
Reproductions Supplied by EDRS Are the Best That Can Be Made from the Original Document
DOCUMENT RESUME ED 447 692 FL 026 310 AUTHOR Breathnech, Diarmaid, Ed. TITLE Contact Bulletin, 1990-1999. INSTITUTION European Bureau for Lesser Used Languages, Dublin (Ireland). SPONS AGENCY Commission of the European Communities, Brussels (Belgium). PUB DATE 1999-00-00 NOTE 398p.; Published triannually. Volume 13, Number 2 and Volume 14, Number 2 are available from ERIC only in French. PUB TYPE Collected Works Serials (022) LANGUAGE English, French JOURNAL CIT Contact Bulletin; v7-15 Spr 1990-May 1999 EDRS PRICE MF01/PC16 Plus Postage. DESCRIPTORS Ethnic Groups; Irish; *Language Attitudes; *Language Maintenance; *Language Minorities; Second Language Instruction; Second Language Learning; Serbocroatian; *Uncommonly Taught Languages; Welsh IDENTIFIERS Austria; Belgium; Catalan; Czech Republic;-Denmark; *European Union; France; Germany; Greece; Hungary; Iceland; Ireland; Italy; *Language Policy; Luxembourg; Malta; Netherlands; Norway; Portugal; Romania; Slovakia; Spain; Sweden; Ukraine; United Kingdom ABSTRACT This document contains 26 issues (the entire output for the 1990s) of this publication deaicated to the study and preservation of Europe's less spoken languages. Some issues are only in French, and a number are in both French and English. Each issue has articles dealing with minority languages and groups in Europe, with a focus on those in Western, Central, and Southern Europe. (KFT) Reproductions supplied by EDRS are the best that can be made from the original document. N The European Bureau for Lesser Used Languages CONTACT BULLETIN This publication is funded by the Commission of the European Communities Volumes 7-15 1990-1999 REPRODUCE AND PERMISSION TO U.S. DEPARTMENT OF EDUCATION MATERIAL HAS Office of Educational Research DISSEMINATE THIS and Improvement BEEN GRANTEDBY EDUCATIONAL RESOURCESINFORMATION CENTER (ERIC) This document has beenreproduced as received from the personor organization Xoriginating it. -
Corpus-Based Study on African English Varieties
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Academy Publication Online ISSN 1798-4769 Journal of Language Teaching and Research, Vol. 8, No. 3, pp. 615-623, May 2017 DOI: http://dx.doi.org/10.17507/jltr.0803.22 Corpus-based Study on African English Varieties Xiaohui Xu Qingdao University of Science and Technology, Qingdao, China Abstract—Corpus-based research is more and more used in linguistics. English varieties are used a lot in daily communications throughout the world. African English varieties are discussed in this paper, including West African English, East African English and South African English. Kenya and Tanzania corpus is the main target corpus while Jamaica corpus is used as a comparative one. The tool used is AntConc 3.2.4. Index Terms—corpus, English varieties, African English, pidgins, creoles I. INTRODUCTION A. Standard English There is an agreeable division among scholars that the whole world is divided into three circles: the Inner Circle, the Outer Circle, and the Expanding Circle. In the Inner Circle, English is spoken as mother tongue; in the Outer Circle, English is usually spoken as a second language; in the Expanding Circle, English is usually spoken as a foreign language. Standard English is used in books, newspapers, magazines and nearly everything else that appears in print in the English-speaking world. This type of English is called “standard” because it has undergone standardization, which means that it has been subjected to a process through which it has been selected and stabilized, in a way that other varieties have not. -
The Indigenization of English Vowels by Zimbabwean Native Shona Speakers
African Englishes: The Indigenization of English Vowels by Zimbabwean Native Shona Speakers by Maxwell Kadenge, D.Phil. [email protected] Department of Linguistics, University of the Witwatersrand Maxwell Kadenge (B.A. Hons, D.Phil.) is a Lecturer in the Department of Linguistics at the University of Zimbabwe. He is currently engaged as a Postdoctoral Research Fellow in the School of Literature and Language Studies at the University of the Witwatersrand where he is carrying out research on various aspects of the phonological and morphosyntactic structures of Southern Bantu languages and Zimbabwean English. Abstract This research is largely inspired by the increasing literature chronicling the worldwide emergence of “new Englishes” (Deyuan and David, 2009:70), particularly their subtype known as “African Englishes” (Mutonya, 2008:434). Although the variety of English that is spoken in Zimbabwe is clearly a distinct variation of African English, however it has not received significant attention from both theoretical and applied linguists. In this context, this study seeks to critically examine the vocalic characteristics of the variety of English that is predominantly spoken as a second language (L2) in Zimbabwe. In this regard, this exploratory research adopts a highly observational data collection method and qualitative data analysis approach in order to insightfully investigate the influence of native Shona phonology on the pronunciation of English vowels by Shona-English bilinguals. The main focus of this research is to analyze how native English simple monophthongs and complex vowels such as long monophthongs, diphthongs and triphthongs are pronounced by Shona-English bilinguals. This study shows that first language (L1) Shona speakers employ simplifying strategies such as monophthongization of diphthongs and glide epenthesis in order to reduce English diphthongs and triphthongs to five simple monophthongs corresponding to [i, e, a, o, u].