Classification of South African Languages Using Text and Acoustic Based Methods: a Case of Six Selected Languages

Total Page:16

File Type:pdf, Size:1020Kb

Classification of South African Languages Using Text and Acoustic Based Methods: a Case of Six Selected Languages Classification of South African languages using text and acoustic based methods: A case of six selected languages Peleira Nicholas Zulu University of KwaZulu-Natal Electrical, Electronic and Computer Engineering Durban, 4041, South Africa [email protected] Heeringa, 2004; Van-Bezooijen & Heeringa, 2006; Abstract Van-Hout & Münstermann, 1981), and those sub- jective decisions are, for example, the basis for Language variations are generally known to classifying separate languages, and certain groups have a severe impact on the performance of of language variants as dialects of one another. It is Human Language Technology Systems. In or- well known that languages are complex; they differ der to predict or improve system performance, in vocabulary, grammar, writing format, syntax a thorough investigation into these variations, and many other characteristics. This presents levels similarities and dissimilarities, is required. Distance measures have been used in several of difficulty in the construction of objective com- applications of speech processing to analyze parative measures between languages. Even if one different varying speech attributes. However, intuitively knows, for example, that English is not much work has been done on language dis- closer to French than it is to Chinese, what are the tance measures, and even less work has been objective factors that allow one to assess the levels done involving South African languages. This of distance? study explores two methods for measuring the This bears substantial similarities to the analo- linguistic distance of six South African lan- gous questions that have been asked about the rela- guages. It concerns a text based method, (the tionships between different species in the science Levenshtein Distance), and an acoustic ap- of cladistics. As in cladistics, the most satisfactory proach using extracted mean pitch values. The Levenshtein distance uses parallel word tran- answer would be a direct measure of the amount of scriptions from all six languages with as little time that has elapsed since the languages’ first split as 144 words, whereas the pitch method is from their most recent common ancestor. Also, as text-independent and compares mean language in cladistics, it is hard to measure this from the pitch differences. Cluster analysis resulting available evidence, and various approximate from the distance matrices from both methods measures have to be employed instead. In the bio- correlates closely with human perceptual dis- logical case, recent decades have seen tremendous tances and existing literature about the six lan- improvements in the accuracy of biological meas- guages. urements as it has become possible to measure dif- ferences between DNA sequences. In linguistics, 1 Introduction the analogue of DNA measurements is historical information on the evolution of languages, and the The development of objective metrics to assess the more easily measured—though indirect measure- distances between different languages is of great ments (akin to the biological phenotype)—are ei- theoretical and practical importance. Currently, ther the textual or acoustic representations of the subjective measures have generally been employed languages in question. to assess the degree of similarity or dissimilarity between different languages (Gooskens & 280 Proceedings of NAACL-HLT 2013, pages 280–287, Atlanta, Georgia, 9–14 June 2013. c 2013 Association for Computational Linguistics In the current article, we focus on language dis- after, the paper will present an evaluation on the tance measures derived from both text and acoustic six languages of South Africa, highlighting lan- formats; we apply two different techniques, name- guage groupings and proximity patterns. In conclu- ly Levenshtein distance between orthographic sion, the paper discusses the results. word transcriptions, and distances between lan- guage pitch means in order to obtain measures of 2 Theoretical Background dissimilarity amongst a set of languages. These methods are used to obtain language groupings Orthographic transcriptions are one of the most which are represented graphically using multidi- basic types of annotation used for speech transcrip- mensional scaling and dendrograms—two standard tion, and are particularly important in most fields statistical techniques. This allows us to visualize of research concerned with spoken language. The and assess the methods relative to known linguistic orthography of a language refers to the set symbols facts in order to judge their relative used to write a language and includes its writing reliability(Zulu, Botha, & Barnard, 2008). system. English, for example, has an alphabet of Our evaluation is based on six of the eleven 26 letters which includes both consonants and official languages of South Africa. 1 The eleven vowels. However, each English letter may repre- official languages fall into two distinct groups, sent more than one phoneme, and each phoneme namely the Germanic group (represented by Eng- may be represented by more than one letter. In the lish and Afrikaans) and the South African Bantu current research, we investigate the use of Le- languages, which belong to the South Eastern Ban- venshtein distance on orthographic transcriptions tu group. The South African Bantu languages can for the assessment of language similarities. further be classified in terms of different sub- On the other hand, speech has been and still groupings: Nguni (consisting of Zulu, Xhosa, Nde- very much is the most natural form of communica- bele and Swati), Sotho (consisting of Southern So- tion. Prosodic characteristics such as rhythm, stress tho, Northern Sotho and Tswana), and a pair that and intonation in speech convey important infor- falls outside these sub-families (Tsonga and Ven- mation regarding the identity of a spoken language. da). The six languages chosen for our evaluation Results of perception studies on spoken language are English, Afrikaans, Zulu, Xhosa, Northern So- identification confirm that prosodic information, tho (also known as Sepedi) and Tswana, which specifically pitch and intensity—which represent equally represent the three groups; Germanic, intonation and stress respectively—are useful for Nguni and Sotho. language identification (Kometsu, Mori, Arai, & We believe that an understanding of these lan- Murahara, 2001; Mori et al., 1999). This paper pre- guage distances is not only of inherent interest, but sents a preliminary investigation of pitch and its also of great practical importance. For purposes role in determining acoustic based language dis- such as language learning, the selection of target tances. languages for various resources and the develop- ment of Human Language Technologies, reliable 2.1 Levenshtein Distance knowledge of language distances would be of great value. Consider, for example, the common situa- There are several ways in which phoneticians have tion of an organization that wishes to publish in- tried to measure the distance between two linguis- formation relevant to all languages in a particular tic entities, most of which are based on the descrip- multi-lingual community, but has insufficient fund- tion of sounds via various representations. This ing to do so. Such an organization can be guided section introduces the Levenshtein Distance Meas- by knowledge of language distances and mutual ure, one of the more popular sequence-based dis- intelligibility between languages to make an ap- tance measures. In 1995 Kessler introduced the use propriate choice of publication languages. of the Levenshtein Distance as a tool for measuring The following sections describe the Levenshtein linguistic distances between dialects (Kessler, distance and pitch characteristics in detail. There- 1995). The basic idea behind the Levenshtein Dis- tance is to imagine that one is rewriting or trans- forming one string into another. Kessler 1 Data for all eleven languages is available on the Lwazi web- successfully applied the Levenshtein Distance site: ( http://www.meraka.org.za/lwazi/index.php ). 281 measure to the comparison of Irish dialects. In his spoken language. This controlled modulation of work, the strings were transcriptions of word pro- pitch is referred to as intonation . The sound units nunciations. In general, rewriting is effected by are shortened or lengthened in accordance to some basic operations, each of which is associated with a underlying pattern giving rhythmic properties to cost, as illustrated in Table 1 in the transformation speech. The information attained from these of the string “ mošemane ” to the string “ umfana ”, rhythmic patterns increases the intelligibility of which are both orthographic translations of the spoken languages, enabling the listener to segment word boy in Northern Sotho and Zulu respectively. continuous speech into phrases and words with ease (Shriberg, Stolcke, Hakkani-Tur, & Tur, Operation Cost 2000). The characteristics that make us perceive mošemane delete m 1 this and other information such as stress, accent ošemane delete š 1 and emotion are collectively referred to as prosody. oemane delete e 1 omane insert f 1 Comparisons have shown that languages differ omfane substitute o/u 2 greatly in their prosodic features (Hirst & Cristo, umfane substitute e/a 2 1998), therefore providing a basis for objective umfana comparison between languages. Further, pitch is a Total cost 8 perceptual attribute of sound, the physical correlate of which is fundamental frequency ( F ), which rep- Table 1. Levenshtein
Recommended publications
  • A Linguistic and Anthropological Approach to Isingqumo, South Africa’S Gay Black Language
    “WHERE THERE’S GAYS, THERE’S ISINGQUMO”: A LINGUISTIC AND ANTHROPOLOGICAL APPROACH TO ISINGQUMO, SOUTH AFRICA’S GAY BLACK LANGUAGE Word count: 25 081 Jan Raeymaekers Student number: 01607927 Supervisor(s): Prof. Dr. Maud Devos, Prof. Dr. Hugo DeBlock A dissertation submitted to Ghent University in partial fulfilment of the requirements for the degree of Master of Arts in African Studies Academic year: 2019 - 2020 Table of Contents Acknowledgements ......................................................................................................................... 3 1. Introduction ............................................................................................................................. 4 2. Theoretical Framework ....................................................................................................... 8 2.1. Lavender Languages...................................................................................................... 8 2.1.1. What are Lavender Languages? ....................................................................... 8 2.1.2. How Are Languages Categorized? ................................................................ 12 2.1.3. Documenting Undocumented Languages ................................................. 17 2.2. Case Study: IsiNgqumo .............................................................................................. 18 2.2.1. Homosexuality in the African Community ................................................ 18 2.2.2. Homosexuality in the IsiNgqumo Community
    [Show full text]
  • Prioritizing African Languages: Challenges to Macro-Level Planning for Resourcing and Capacity Building
    Prioritizing African Languages: Challenges to macro-level planning for resourcing and capacity building Tristan M. Purvis Christopher R. Green Gregory K. Iverson University of Maryland Center for Advanced Study of Language Abstract This paper addresses key considerations and challenges involved in the process of prioritizing languages in an area of high linguistic di- versity like Africa alongside other world regions. The paper identifies general considerations that must be taken into account in this process and reviews the placement of African languages on priority lists over the years and across different agencies and organizations. An outline of factors is presented that is used when organizing resources and planning research on African languages that categorizes major or crit- ical languages within a framework that allows for broad coverage of the full linguistic diversity of the continent. Keywords: language prioritization, African languages, capacity building, language diversity, language documentation When building language capacity on an individual or localized level, the question of which languages matter most is relatively less complicated than it is for those planning and providing for language capabilities at the macro level. An American anthropology student working with Sierra Leonean refugees in Forecariah, Guinea, for ex- ample, will likely know how to address and balance needs for lan- guage skills in French, Susu, Krio, and a set of other languages such as Temne and Mandinka. An education official or activist in Mwanza, Tanzania, will be concerned primarily with English, Swahili, and Su- kuma. An administrator of a grant program for Less Commonly Taught Languages, or LCTLs, or a newly appointed language authori- ty for the United States Department of Education, Department of Commerce, or U.S.
    [Show full text]
  • The Use of the Augment in Nguni Languages with Special Reference to the Referentiality of the Noun Eva-Marie Bloom Ström & Matti Miestamo
    [Draft, August 2020; to appear in Lutz Marten, Rozenn Guérois, Hannah Gibson & Eva-Marie Bloom-Ström (eds), Morphosyntactic Variation in Bantu. Oxford: Oxford University Press.] The use of the augment in Nguni languages with special reference to the referentiality of the noun Eva-Marie Bloom Ström & Matti Miestamo Abstract This chapter examines the use of the augment, a prefix preceding the noun class prefix, in a number of language varieties in the Nguni subgroup of Bantu languages. The study of these closely related varieties, which show striking similarities as well as differences in the use of the augment, gives new insights into developmental tendencies of the augment. All contexts in which the augment can be omitted are non-fact contexts. Contrary to what has previously been argued for some varieties, however, we find that the presence vs. absence of the augment does not mark a referentiality distinction. It is argued that referentiality constitutes a semantic and pragmatic explanation to the absence and presence of the augment in different contexts in a diachronic perspective, but that this function is eroded in present-day Nguni. What remains is a limited referentiality distinction for some speakers in some varieties. The loss of function explains why the augment is included in the noun in nearly all contexts in some varieties, and omitted everywhere in others. Due to its loss of function, the augment has become free to participate in sociolinguistic and stylistic variation in some Bantu languages. Key-words: negation, non-fact, referentiality, augment, morphosyntax, Nguni 1. Introduction The aim of this paper is to explore the connection between the use of a nominal prefix in Bantu referred to as the augment1 and (non-)referentiality, such as has been claimed to exist in Swati: 2 (1) a.
    [Show full text]
  • (Bantugent – Ugent Centre for Bantu Studies) Digital Colloquium on African Languages and Linguistics Humboldt University, Berlin – 19 May 2020 OVERVIEW
    DEPARTMENT OF LANGUAGES AND CULTURES AFRICAN LANGUAGES AND CULTURES THE HISTORY OF CLICKS IN NGUNI LANGUAGES Hilde Gunnink – Ghent University (BantUGent – UGent centre for Bantu Studies) Digital colloquium on African languages and linguistics Humboldt University, Berlin – 19 May 2020 OVERVIEW 1. Bantu/Khoisan language contact 2. Clicks in Bantu languages 3. The Nguni languages 1. Click inventories 2. Subclassification 3. Reconstruction of Proto-Nguni clicks When did clicks enter the Nguni languages and what does this tell us about the contact history between Nguni and Khoisan speakers? 3 PRE-BANTU SOUTHERN AFRICA “Khoisan”: languages with phonemic clicks that do not belong to another language family (e.g. Bantu or Cushitic) Southern Africa: ̶ Kx’a (Northern Khoisan) ̶ Khoe-Kwadi (Central Khoisan) ̶ Tuu (Southern Khoisan) Most Khoisan languages are endangered/extinct Güldemann, T. 2014. 'Khoisan' linguistic classification today. In Güldemann, T & A.-M. Fehn (eds.), Beyond 'Khoisan': historical relations in the Kalahari 4 basin, 1-40. Amsterdam/Philadelphia: John Benjamins Publishing Company. BANTU/KHOISAN LANGUAGE CONTACT ̶ Lexicon: ̶ loanwords ̶ lexical semantics ̶ Phonology ̶ clicks ̶ other rare consonants ̶ Morphology ̶ borrowed affixes ̶ contact-induced grammaticalization 5 CLICKS Clicks are unique to: ̶ “Khoisan” languages: Khoe-Kwadi, Kx’a, Tuu families + Sandawe, Hadza ̶ Bantu languages in southern Africa ̶ The Cushitic language Dahalo in east Africa ̶ Damin, ritual register of Australian language Lardil Very unique so clear hallmark of Khoisan contact! 6 CLICKS South East Bantu click languages - Nguni: Xhosa, Phuthi, Zulu, Swati, Southern Ndebele, Zimbabwean Ndebele - Sotho: Southern Sotho South West Bantu click languages - Kavango: Kwangali, Manyo, Mbukushu - Bantu Botatwe: Fwe - Yeyi Adapted from: Pakendorf, B., et al.
    [Show full text]
  • Utilise Oar Des Voyageurs, Des Commnervants, Des Soldats Ou Des
    CONTACT LANGUAGES IN AFRICA'1 Bonnie B. Keller Graduate Student, Anthropology University of California, Berkeley Culture contact has been a toric of snecial interest to anthropologists. A contact situation lends itself to numerous kinds of study: personality con- flict, culture change, diffusion, >reakdown and retention of cultural patterns. Another phenomenon occurring in contact situations, and one which would seem to appear in all the above ½henomena to some extent, is linguistic change; unfor- tunately, this aspect of culture contact has not been emphasized, and there is not a great amount of research material available on it. When individuals of different cultures, and often Possessing different languages, come into contact, there are four possibilities for intercourse be- tween them, according to John Reinecke, one of the authorities in the field of linguistic contact and change (1938:107';. They may attempt no speech at all, and carry on a purely gesturing or signalling tyne of communication, such as dumb barter. They may use a lingua franca, that is, a third language which both know. (The term is derived from the original contact language, Lingua Franca, a language used by the French crusaders in dealing with the Levantines.) A lingua franca is essentially "any language used as a means of communication among peo- ple of different linguistic backgrounds" (Hall, 1955:25). One group may learn the language of the other, a phenomenon known as bilingualism. Finally, if neither group is in a position to become bilingual or to learn a lingua franca, both may resort to the use of a reduced fonr"of one of the two languages.
    [Show full text]
  • Receptive Multilingualism Across the Lifespan: Cognitive and Linguistic Factors in Cognate Guessing
    Languages and Literatures \ Multilingualism research Receptive multilingualism across the lifespan Cognitive and linguistic factors in cognate guessing Jan Vanhove Dissertation zur Erlangung der Doktorwürde an der Philosophischen Fakultät der Universität Freiburg in der Schweiz Genehmigt von der philosophischen Fakultät auf Antrag der Professoren Raphael Berthele (1. Gutachter) und Charlotte Gooskens (2. Gutach- terin). Freiburg, den 17. Januar 2014. Prof. Marc-Henry Soulet, Dekan. 2014 Receptive multilingualism across the lifespan Cognitive and linguistic factors in cognate guessing Jan Vanhove Cite as: Vanhove, Jan (2014). Receptive multilingualism across the lifespan. Cognitive and linguistic factors in cognate guessing. PhD thesis. University of Fribourg (Switzerland). Data and computer code available from: http://dx.doi.org/10.6084/m9.figshare.795286. Contents Tables xi Figures xiii Preface xv I Introduction 1 1 Context and aims 3 1.1 Cross-linguistic similarities in language learning . 4 1.2 Receptive multilingualism . 5 1.3 Multilingualism and the age factor . 8 1.4 The present project . 9 1.4.1 The overarching project ‘Multilingualism through the lifespan’ . 9 1.4.2 Aim, scope and terminology . 10 1.5 Overview . 13 II The lifespan development of cognate guess- ing skills 15 2 Inter-individual differences in cognate guessing skills 17 2.1 Linguistic repertoire . 18 2.1.1 Typological relation between the Lx and the L1 . 18 2.1.2 The impact of multilingualism . 19 2.2 Previous exposure . 26 vi Contents 2.3 Attitudes . 28 2.4 Age . 29 3 The lifespan development of cognition 33 3.1 Intelligence . 34 3.1.1 Fluid and crystallised intelligence . 34 3.1.2 Lifespan trajectories .
    [Show full text]
  • Isingqumo: Exploring Origins, Growth and Sociolinguistics of an Nguni Urban-Township Homosexual Subculture
    UNIVERSITY OF KWAZULU-NATAL TITLE IsiNgqumo: Exploring origins, growth and sociolinguistics of an Nguni urban-township homosexual subculture By Praisegod Mduduzi Ntuli 204505680 A dissertation submitted in partial fulfillment of the requirements for the degree of Masters in Social Science FACULTY OF HUMANITIES, DEVELOPMENT AND SOCIAL SCIENCES Gender Studies Supervisor: Prof. Thenjiwe Meyiwa 2009 i DECLARATION Submitted in fulfilment / partial fulfilment of the requirements for the degree of $\^%&i:ldl?,)>3£r. , in the Graduate Programme in ...Cr.Stl4^.!C..-%l(^niversity of KwaZulu-Natal, South Africa. I declare that this dissertation is my own unaided work. All citations, references and borrowed ideas have been duly acknowledged. I confirm that an external editor was / was not used (delete whichever is applicable) and that my Supervisor was informed of the identity and details of my editor. It is being submitted for the degree of $!^!%Jffih&\5.&SASM-.&.... in the Faculty of Humanities, Development and Social Science, University of KwaZulu-Natal, South Africa. None of the present work has been submitted previously for any degree or examination in any other University. Student name 3 QIO Data «/YJ tM <rL- Editor ACKOWLEDGEMENTS I wish to express my sincere appreciation and gratitude to the following individuals, without whose assistance, this study would not have been possible: • Prof. Thenjiwe Meyiwa for the superb supervision and mentorship. • Dr. Stephanie Rudwick for IsiNgqumo - Introducing a gay Black South African linguistic variety. • Prof Gqaleni for the funding for the research • My sincere thanks to the 36 Nguni homosexual men and baba DIamini without whom this research would not have been successful.
    [Show full text]
  • Orthographic Measures of Language Distances Between the Official South African Languages
    Orthographic measures of language distances between the official South African languages P.N. Zulu, G. Botha & E. Barnard Human Language Technologies Research Group, CSIR & Department of Electrical and Computer Engineering University of Pretoria PRETORIA E-mail: [email protected] [email protected] [email protected] Abstract Orthographic measures of language distances between the official South African languages Two methods for objectively measuring similarities and dis- similarities between the eleven official languages of South Africa are described. The first concerns the use of n-grams. The confusions between different languages in a text-based lan- guage identification system can be used to derive information on the relationships between the languages. Our classifier calculates n-gram statistics from text documents and then uses these statistics as features in classification. We show that the classification results of a validation test can be used as a similarity measure of the relationship between languages. Using the similarity measures, we were able to represent the relation- ships graphically. We also apply the Levenshtein distance measure to the ortho- graphic word transcriptions from the eleven South African lan- guages under investigation. Hierarchical clustering of the dis- tances between the different languages shows the relationships between the languages in terms of regional groupings and closeness. Both multidimensional scaling and dendrogram ana- lysis reveal results similar to well-known language groupings, and also suggest a finer level of detail on these relationships. Literator 29(1) April 2008:185-204 ISSN 0258-2279 185 Orthographic measures of language distances ... official South African languages Opsomming Ortografiese maatstawwe van taalafstande tussen die amptelike Suid-Afrikaanse tale Twee metodes vir die bepaling van verwantskappe tussen die elf amptelike tale van Suid-Afrika word beskryf.
    [Show full text]
  • Linguistic Distance and the Language Fluency of Immigrants
    RUHR ECONOMIC PAPERS Ingo E. Isphording Sebastian Otten Linguistic Distance and the Language Fluency of Immigrants #274 Imprint Ruhr Economic Papers Published by Ruhr-Universität Bochum (RUB), Department of Economics Universitätsstr. 150, 44801 Bochum, Germany Technische Universität Dortmund, Department of Economic and Social Sciences Vogelpothsweg 87, 44227 Dortmund, Germany Universität Duisburg-Essen, Department of Economics Universitätsstr. 12, 45117 Essen, Germany Rheinisch-Westfälisches Institut für Wirtschaftsforschung (RWI) Hohenzollernstr. 1-3, 45128 Essen, Germany Editors Prof. Dr. Thomas K. Bauer RUB, Department of Economics, Empirical Economics Phone: +49 (0) 234/3 22 83 41, e-mail: [email protected] Prof. Dr. Wolfgang Leininger Technische Universität Dortmund, Department of Economic and Social Sciences Economics – Microeconomics Phone: +49 (0) 231/7 55-3297, email: [email protected] Prof. Dr. Volker Clausen University of Duisburg-Essen, Department of Economics International Economics Phone: +49 (0) 201/1 83-3655, e-mail: [email protected] Prof. Dr. Christoph M. Schmidt RWI, Phone: +49 (0) 201/81 49-227, e-mail: [email protected] Editorial Offi ce Joachim Schmidt RWI, Phone: +49 (0) 201/81 49-292, e-mail: [email protected] Ruhr Economic Papers #274 Responsible Editor: Thomas K. Bauer All rights reserved. Bochum, Dortmund, Duisburg, Essen, Germany, 2011 ISSN 1864-4872 (online) – ISBN 978-3-86788-319-1 The working papers published in the Series constitute work in progress circulated to stimulate discussion and critical comments. Views expressed represent exclusively the authors’ own opinions and do not necessarily refl ect those of the editors. Ruhr Economic Papers #274 Isphording, Ingo E.
    [Show full text]
  • A Quantitative Measure of the Distance Between English and Other Languages
    IZA DP No. 1246 Linguistic Distance: A Quantitative Measure of the Distance Between English and Other Languages Barry R. Chiswick Paul W. Miller DISCUSSION PAPER SERIES DISCUSSION PAPER August 2004 Forschungsinstitut zur Zukunft der Arbeit Institute for the Study of Labor Linguistic Distance: A Quantitative Measure of the Distance Between English and Other Languages Barry R. Chiswick University of Illinois at Chicago and IZA Bonn Paul W. Miller University of Western Australia Discussion Paper No. 1246 August 2004 IZA P.O. Box 7240 53072 Bonn Germany Phone: +49-228-3894-0 Fax: +49-228-3894-180 Email: [email protected] Any opinions expressed here are those of the author(s) and not those of the institute. Research disseminated by IZA may include views on policy, but the institute itself takes no institutional policy positions. The Institute for the Study of Labor (IZA) in Bonn is a local and virtual international research center and a place of communication between science, politics and business. IZA is an independent nonprofit company supported by Deutsche Post World Net. The center is associated with the University of Bonn and offers a stimulating research environment through its research networks, research support, and visitors and doctoral programs. IZA engages in (i) original and internationally competitive research in all fields of labor economics, (ii) development of policy concepts, and (iii) dissemination of research results and concepts to the interested public. IZA Discussion Papers often represent preliminary work and are circulated to encourage discussion. Citation of such a paper should account for its provisional character. A revised version may be available directly from the author.
    [Show full text]
  • The Mosaic Evolution of Left Dislocation in Xhosa
    Stellenbosch Papers in Linguistics Plus, Vol. 50, 2016, 139-158 doi: 10.5842/50-0-720 The mosaic evolution of Left Dislocation in Xhosa Alexander Andrason & Marianna W. Visser Department of African Languages, Stellenbosch University, South Africa E-mail: [email protected]; [email protected] Abstract This paper demonstrates that the status of Clitic LD in Xhosa is a result of the mosaic evolution of Xhosa grammar. It emerges as an accumulation and combination of two more individual, distinct and, at least, initially separated developments and characteristics – LD sensu stricto and Object Agreement. This view enables the authors to propose a possible solution to the problem of whether the Clitic LD structure in Xhosa (and Nguni) is an instantiation of LD (and its prototypical function as posited by Westbury (2014)) or fronting (and its topical and focal functions). The Clitic LD structure is employed to accomplish the two tasks. The former is derived from the original LD construction, while the latter arose due to the Object Agreement cycle. The mosaic character of LD in Xhosa, in turn, demonstrates the situatedness of LD. LD is determined not only by its own evolutionary baggage (the source from which it has developed) but also by the dynamics of the environment in which it has been embedded (the properties of other constructions). Keywords: Xhosa; left dislocation; object agreement; language dynamics 1. Introduction The concept of mosaic evolution is currently one of the tenants of the theory of evolution. The mosaic character of evolutionary mechanisms implies that different parts of a system evolve quasi-separately and at different rates (Futyma 2005: 550).
    [Show full text]
  • Intonation Modelling for the Nguni Languages
    I NTONATION MODELLING FOR THE NGUNI LANGUAGES NATASHA GOVENDER I NTONATION MODELLING FOR THE NGUNI LANGUAGES By Natasha Govender Submitted in partial fulfilment of the requirements for the degree Master of Computer Science in the Faculty of Engineering, the Built Environment and Information Technology at the UNIVERSITY OF PRETORIA Advisor: Professor Etienne Barnard April 2006 I NTONATION MODELLING FOR THE NGUNI LANGUAGES BY NATASHA GOVENDER PROFESSOR ETIENNE BARNARD DEPARTMENT OF COMPUTER SCIENCE,MASTER OF COMPUTER SCIENCE Although the complexity of prosody is widely recognised, there is a lack of widely-accepted descriptive standards for prosodic phenomena. This situation has become particularly noticeable with the development of increasingly capable text-to-speech (TTS) systems. Such systems require detailed prosodic models to sound natural. For the languages of Southern Africa, the deficiencies in our modelling capabilities are acute. Little work of a quantitative nature has been published for the languages of the Nguni family (such as isiZulu and isiXhosa), and there are significant contradictions and imprecisions in the literature on this topic. We have therefore embarked on a programme aimed at understanding the relationship between linguistic and physical variables of a prosodic nature in this family of languages. We then use the information/knowledge gathered to build intonation models for isiZulu and isiXhosa as representatives of the Nguni languages. Firstly, we need to extract physical measurements from the voice recordings of the Nguni family of languages. A number of pitch tracking algorithms have been developed; however, to our knowledge, these algorithms have not been evaluated formally on a Nguni language. In order to decide on an appropriate algorithm for further analysis, evaluations have been performed on two state- of-the-art algorithms namely the Praat pitch tracker and Yin (developed by Alain de Cheveingn´e).
    [Show full text]