Finite State Tokenisation of an Orthographical Disjunctive Agglutinative Language: the Verbal Segment of Northern Sotho
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
A Corpus-Driven Account of the Noun Classes and Genders in Northern Sotho
Southern African Linguistics and Applied Language Studies 2016, 34(2): 169–185 Copyright © NISC (Pty) Ltd SOUTHERN AFRICAN LINGUISTICS AND APPLIED LANGUAGE STUDIES ISSN 1607-3614 EISSN 1727-9461 http://dx.doi.org/10.2989/16073614.2016.1206478 A corpus-driven account of the noun classes and genders in Northern Sotho Elsabé Taljard1* and Gilles-Maurice de Schryver1,2 1Department of African Languages, University of Pretoria, Pretoria, South Africa 2 BantUGent – UGent Centre for Bantu Studies, Department of Languages and Cultures, Ghent University, Belgium *Corresponding author email: [email protected] Abstract: This article offers a distributional corpus analysis of the Northern Sotho noun and gender system. The aim is twofold: first, to assess whether the existing descriptions of the noun class system in Northern Sotho are corroborated by information provided by the analysis of a large electronic corpus for this language, with specific reference to singular-plural pairings, and second, to present a number of novel visualisation aids to characterise a noun class system (in a radar diagram) and a noun gender system (using a two-directional weighted representation) for Northern Sotho in particular, and for any Bantu language in general. The findings include the discovery of two new genders in Northern Sotho (i.e. class pairs 1/6 and 3/10), and also indicate that the Northern Sotho noun class system, and by extension any one for Bantu, should be seen as dynamic. Introduction The present undertaking adds to a relatively small, but growing number of corpus-based grammatical descriptions of Bantu language features, several of which regard South African languages. -
The Standardisation of African Languages Michel Lafon, Vic Webb
The Standardisation of African Languages Michel Lafon, Vic Webb To cite this version: Michel Lafon, Vic Webb. The Standardisation of African Languages. Michel Lafon; Vic Webb. IFAS, pp.141, 2008, Nouveaux Cahiers de l’Ifas, Aurelia Wa Kabwe Segatti. halshs-00449090 HAL Id: halshs-00449090 https://halshs.archives-ouvertes.fr/halshs-00449090 Submitted on 20 Jan 2010 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. The Standardisation of African Languages Language political realities CentRePoL and IFAS Proceedings of a CentRePoL workshop held at University of Pretoria on March 29, 2007, supported by the French Institute for Southern Africa Michel Lafon (LLACAN-CNRS) & Vic Webb (CentRePoL) Compilers/ Editors CentRePoL wishes to express its appreciation to the following: Dr. Aurelia Wa Kabwe-Segatti, Research Director, IFAS, Johannesburg, for her professional and material support; PanSALB, for their support over the past two years for CentRePoL’s standardisation project; The University of Pretoria, for the use of their facilities. Les Nouveaux Cahiers de l’IFAS/ IFAS Working Paper Series is a series of occasional working papers, dedicated to disseminating research in the social and human sciences on Southern Africa. -
Computational Challenges for Polysynthetic Languages
Computational Modeling of Polysynthetic Languages Judith L. Klavans, Ph.D. US Army Research Laboratory 2800 Powder Mill Road Adelphi, Maryland 20783 [email protected] [email protected] Abstract Given advances in computational linguistic analysis of complex languages using Machine Learning as well as standard Finite State Transducers, coupled with recent efforts in language revitalization, the time was right to organize a first workshop to bring together experts in language technology and linguists on the one hand with language practitioners and revitalization experts on the other. This one-day meeting provides a promising forum to discuss new research on polysynthetic languages in combination with the needs of linguistic communities where such languages are written and spoken. Finally, this overview article summarizes the papers to be presented, along with goals and purpose. Motivation Polysynthetic languages are characterized by words that are composed of multiple morphemes, often to the extent that one long word can express the meaning contained in a multi-word sentence in lan- guage like English. To illustrate, consider the following example from Inuktitut, one of the official languages of the Territory of Nunavut in Canada. The morpheme -tusaa- (shown in boldface below) is the root, and all the other morphemes are synthetically combined with it in one unit.1 (1) tusaa-tsia-runna-nngit-tu-alu-u-junga hear-well-be.able-NEG-DOER-very-BE-PART.1.S ‘I can't hear very well.’ Kabardian (Circassian), from the Northwest Caucasus, also shows this phenomenon, with the root -še- shown in boldface below: (2) wə-q’ə-d-ej-z-γe-še-ž’e-f-a-te-q’əm 2SG.OBJ-DIR-LOC-3SG.OBJ-1SG.SUBJ-CAUS-lead-COMPL-POTENTIAL-PAST-PRF-NEG ‘I would not let you bring him right back here.’ Polysynthetic languages are spoken all over the globe and are richly represented among Native North and South American families. -
“Today's Morphology Is Yeatyerday's Syntax”
MOTHER TONGUE Journal of the Association for the Study of Language in Prehistory • Issue XXI • 2016 From North to North West: How North-West Caucasian Evolved from North Caucasian Viacheslav A. Chirikba Abkhaz State University, Sukhum, Republic of Abkhazia The comparison of (North-)West Caucasian with (North-)East Caucasian languages suggests that early Proto-West Caucasian underwent a fundamental reshaping of its phonological, morphological and syntactic structures, as a result of which it became analytical, with elementary inflection and main grammatical roles being expressed by lexical means, word order and probably also by tones. The subsequent development of compounding and incorporation resulted in a prefixing polypersonal polysynthetic agglutinative language type typical for modern West Caucasian languages. The main evolutionary line from a North Caucasian dialect close to East Caucasian to modern West Caucasian languages was thus from agglutinative to the analytical language-type, due to a near complete loss of inflection, and then to the agglutinative polysynthetic type. Although these changes blurred the genetic relationship between West Caucasian and East Caucasian languages, however, this can be proven by applying standard procedures of comparative-historical linguistics. 1. The West Caucasian languages.1 The West Caucasian (WC), or Abkhazo-Adyghean languages constitute a branch of the North Caucasian (NC) linguistic family, which consists of five languages: Abkhaz and Abaza (the Abkhaz sub-group), Adyghe and Kabardian (the Circassian sub-group), and Ubykh. The traditional habitat of these languages is the Western Caucasus, where they are still spoken, with the exception of the extinct Ubykh. Typologically, the WC languages represent a rather idiosyncratic linguistic type not occurring elsewhere in Eurasia. -
Zaspil Nr. 53 – November 2010 Papers
ZASPiL Nr. 53 – November 2010 Papers from the Workshop on Bantu Relative Clauses Laura Downing, Annie Rialland, Jean- Marc Beltzung, Sophie Manus, Cédric Patin, Kristina Riedel (Eds.) Table of Contents Laura J. Downing, Annie Rialland, Cédric Patin, Kristina Riedel Introduction ......................................................................................................... 1 Jean-Marc Beltzung, Annie Rialland & Martial Embanga Aborobongui Les relatives possessives en mbochi (C25)…………………………………….. 7 Lisa L.-S. Cheng, Laura J. Downing Locative Relatives in Durban Zulu……………………………………………. 33 Laura J. Downing, Al Mtenje The Prosody of Relative Clauses in Chewa………..…………………………. 53 Larry M. Hyman, Francis X. Katamba Tone, Syntax, and Prosodic Domains in Luganda............................................. 69 Shigeki Kaji A Comparative Study of Tone of West Ugandan Bantu Languages, with Particular Focus on the Tone Loss in Tooro …………………………………. 99 Charles W. Kisseberth Phrasing and Relative Clauses in Chimwiini .................................................. 109 Emmanuel-Moselly Makasso Processus de relativisation en bàsàa: de la syntaxe à la prosodie …………… 145 Sophie Manus The Prosody of Símákonde Relative Clauses ………………………….……. 159 Cédric Patin The Prosody of Shingazidja Relatives ……………………………………… 187 Kristina Riedel Relative Clauses in Haya ................................................................................. 211 Sabine Zerbian The Relative Clause and its Tones in Tswana ................……….............….... 227 BantuPsyn -
[.35 **Natural Language Processing Class Here Computational Linguistics See Manual at 006.35 Vs
006 006 006 DeweyiDecimaliClassification006 006 [.35 **Natural language processing Class here computational linguistics See Manual at 006.35 vs. 410.285 *Use notation 019 from Table 1 as modified at 004.019 400 DeweyiDecimaliClassification 400 400 DeweyiDecimali400Classification Language 400 [400 [400 *‡Language Class here interdisciplinary works on language and literature For literature, see 800; for rhetoric, see 808. For the language of a specific discipline or subject, see the discipline or subject, plus notation 014 from Table 1, e.g., language of science 501.4 (Option A: To give local emphasis or a shorter number to a specific language, class in 410, where full instructions appear (Option B: To give local emphasis or a shorter number to a specific language, place before 420 through use of a letter or other symbol. Full instructions appear under 420–490) 400 DeweyiDecimali400Classification Language 400 SUMMARY [401–409 Standard subdivisions and bilingualism [410 Linguistics [420 English and Old English (Anglo-Saxon) [430 German and related languages [440 French and related Romance languages [450 Italian, Dalmatian, Romanian, Rhaetian, Sardinian, Corsican [460 Spanish, Portuguese, Galician [470 Latin and related Italic languages [480 Classical Greek and related Hellenic languages [490 Other languages 401 DeweyiDecimali401Classification Language 401 [401 *‡Philosophy and theory See Manual at 401 vs. 121.68, 149.94, 410.1 401 DeweyiDecimali401Classification Language 401 [.3 *‡International languages Class here universal languages; general -
٧٠٩ Morphological Typology
جامعة واسط العـــــــــــــــدد السابع والثﻻثون مجلــــــــة كليــــــــة التربيــــــة الجزء اﻷول / تشرين الثاني / 2019 Morphological Typology: A Comparative Study of Some Selected Languages Bushra Farhood Khudheyier AlA'amiri Prof. Dr. Abdulkareem Fadhil Jameel, Ph.D. Department of English, College of Education/ Ibn Rushud for Humanities Baghdad University, Baghdad, Iraq Abstract Morphology is a main part of English linguistics which deals with forms of words. Morphological typology organizes languages on the basis of these word forms. This organization of languages depends on structural features to mould morphological, patterns, typologising languages, assigning them to analytic, or synthetic types on the base of words segmentability and invariance, or measuring the number of morphemes per word. Morphological typology studies the universals in languages, the differences and similarities between languages in the structural patterns found in different languages, which occur within a restricted range. This paper aims at distinguishing the various types of several universal languages and comparing them with English. The comparison of languages are set according to the number of morphemes, the degree of being analytic ,or synthetic languages by given examples of each type. Accordingly, it is hypothesed that languages are either to be analytic, or synthetic according to the syntactic and morphological form of morphemes and their meaning relation. The analytical procedures consist of expressing the morphological types with some selected examples, then making the comparison between each type and English. The conclusions reached at to the point of the existence of similarity between these morphological types . English is Analytic , but it has some synythetic aspects, so it validated the first hypothesis and not entirely refuted the second one. -
A Balanced and Representative Corpus: the Effects of Strict Corpus-Based Dictionary Compilation in Sesotho Sa Leboa* V.M
http://lexikos.journals.ac.za A Balanced and Representative Corpus: The Effects of Strict Corpus-based Dictionary Compilation in Sesotho sa Leboa* V.M. Mojela, Sesotho sa Leboa National Lexicography Unit, University of Limpopo, Turfloop Campus, Polokwane, South Africa ([email protected]) Abstract: Theoretically the Northern Sotho language is made up of almost 30 dialects while practically it is not so, because the standard language was formed from very few of its dialects. As a result, even today the language has no corpus which is balanced or representative owing to the fact that almost all of the available corpora are compiled from the written standard language and the written dialects. The majority of the Northern Sotho dialects do not have written orthographies, and the few dialects which had written orthographies prior to standardization came to monopolize the standard language and the Northern Sotho corpora. Therefore, the compilation of a corpus- based dictionary in Northern Sotho is tantamount to a continuation of producing unbalanced and unrepresentative dictionaries, which continue to sideline and to marginalize the majority of the communities and the linguistic varieties which could potentially enrich both the Northern Sotho standard language and the Northern Sotho corpora. The main objective with this research is to analyze, to expose and to suggest ways of correcting these irregularities so that the marginalized Northern Sotho dialects can be accommodated in the standard language. This will obviously increase the size of the Northern Sotho standard language and the corpus by more than 50%. Keywords: CORPUS, BALANCED CORPUS, REPRESENTATIVE CORPUS, STANDARDIZA- TION, DIALECT, ORTHOGRAPHY, MARGINALIZED DIALECTS, PRESTIGE DIALECTS, MIS- SIONARY ACTIVITIES Opsomming: 'n Gebalanseerde en verteenwoordigende korpus: Die gevolge van streng korpusgebaseerde woordeboeksamestelling in Sesotho sa Leboa. -
LNGT 0250 Morphology and Syntax
3/22/2016 LNGT 0250 Announcements Morphology and Syntax • I’m extending the deadline for Assignment 3 until 8pm tomorrow Wednesday March 23. This should allow you to take advantage of my office hours tomorrow if you have questions. • Reactions to lecture on Menominee? • Answer the questions on the short handout, and give it back to me on Thursday. I’ll use that information in forming your Linguistic Lecture #11 Diversity Project groups. March 22nd, 2016 2 Neatist Vegetarian vs. Humanitarian 3 4 Today’s agenda Restrictions on productivity • Unfinished business: Constraints on the • ‐ee productivity of derivational morphemes. – draw drawee • Morphological typology: How do languages – pay payee differ? – free *freeee • Nominal and verbal categories of inflectional – accompany *accompanyee morphology. 5 6 1 3/22/2016 Spanish diminutive morpheme: ‐illo Restrictions on productivity • mesa mesillo ‘little table’ • private privatize • grupo grupillo ‘little group’ • capital capitalize • gallo *gallillo ‘little rooster’ • corrupt *corruptize • camello *camellillo ‘little camel’ • secure *securize 7 8 Restrictions on productivity Restrictions on productivity • combat combatant fight *fightant • escalate deescalate • brutal brutality brittle *brittality • assassinate *deassassinate • monster monstrous spinster *spinstrous • parent parental mother *motheral • But notice: murderous, thunderous 9 10 Index of synthesis: How many morphemes Morphological typology does your language have per word? • One aspect of morphological variation -
Towards an Alternative Description of Incomplete Sentences in Agglutinative Languages S. Ido a Thesis Submitted in Fulfilment O
Towards an Alternative Description of Incomplete Sentences in Agglutinative Languages S. Ido A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy 2001 University of Sydney I declare that this thesis is all my own work. I have acknowledged in formal citation the sources of any reference I have made to the work of others. ____________________________ Shinji Ido ____________________________ Date Title: Towards an Alternative Description of Incomplete Sentences in Agglutinative Languages Abstract: This thesis analyses ‘incomplete sentences’ in languages which utilise distinctively agglutinative components in their morphology. In the grammars of the languages dealt with in this thesis, there are certain types of sentences which are variously referred to as ‘elliptical sentences’ (Turkish eksiltili cümleler), ‘incomplete sentences’ (Uzbek to‘liqsiz gaplar), ‘cut-off sentences’ (Turkish kesik cümleler), etc., for which the grammarians provide elaborated semantic and syntactic analyses. The current work attempts to present an alternative approach for the analysis of such sentences. The distribution of morphemes in incomplete sentences is examined closely, based on which a system of analysis that can handle a variety of incomplete sentences in an integrated manner is proposed from a morphological point of view. It aims to aid grammarians as well as researchers in area studies by providing a simple description of incomplete sentences in agglutinative languages. The linguistic data are taken from Turkish, Uzbek, -
Northern Sotho Translation of the Peabody Picture Vocabulary Test - Revised
3 Culturally Valid Assessment Tools: Northern Sotho Translation of the Peabody Picture Vocabulary Test - Revised Corrieta Pakendorf and Erna Alant Department of Communication Pathology University of Pretoria ABSTRACT There is currently a great demand for service provision for the African language speakers in South Africa. The difficulties associated with assessing speakers in the absence of assessment tools in the indigenous languages is, therefore, also a perti- nent concern. Within the current socio-economic climate in South Africa where test translation and adaptation is often cited as a more viable option than that of developing new tests, very few guidelines exist for the development or adaptation of valid assessment tools for culturally and linguistically diverse population groups. This article is aimed at describing the process which took place when existing English test material, in this instance, The Peabody Picture Vocabulary Test-Revised (PPVT- R)(Dunn & Dunn, 1981) was translated and culturally adapted for the Northern Sotho population in Pretoria and surround- ing areas. The findings of the research include practical examples of methodological considerations which should be taken into account while translating and undertaking cultural adaptation of test material. The newly adapted test material was also applied to a sample of 152 North-Sotho speaking pupils in the study area and the test results are discussed. OPSOMMING Hierdie studie het voortgespruit vanuit die huidige behoefte aan dienslewering binne die Afrikatale-konteks en die gebrek aan evalueringshulpmiddels in die inheemse tale. Alhoewel die Suid-Afrikaanse sosio-ekonomiese klimaat eerder ) toetsvertaling en -aanpassing aanmoedig, as die ontwikkeling van nuwe toetsmateriaal, bestaan daar tans min riglyne vir 2 1 die beplanning en uitvoering van so 'n prosedure. -
The Case of the Sepedi and Sesotho Sa Leboa (Northern Sotho) Language Names
INVESTIGATING THE ONOMASTIC PRINCIPLES OF NAMING AN OFFICIAL LANGUAGE: THE CASE OF THE SEPEDI AND SESOTHO SA LEBOA (NORTHERN SOTHO) LANGUAGE NAMES by RAKGOGO TEBOGO JACOB Student Number: 1798227 A thesis submitted in fulfilment of the requirements for the degree of DOCTOR OF PHILOSOPHY: AFRICAN LANGUAGES AND LINGUISTICS in the School of Literature, Language and Media FACULTY OF HUMANITIES UNIVERSITY OF THE WITWATERSRAND, JOHANNESBURG Supervisor: Dr E.B. Zungu July 2019 DECLARATION BY THE RESEARCHER I, Tebogo Jacob Rakgogo, declare that this thesis is my own, unaided work. It is being submitted for the Degree of Doctor of Philosophy in African Languages and Linguistics at the University of the Witwatersrand, Johannesburg. It has not been formerly submitted before for any degree of examination at any other University. I further declare that all the sources cited and quoted are indicated and acknowledged by means of a comprehensive list of references. _____________________________ Signature of Candidate ________ Day of ___________________20__________________in______________ T.J. Rakgogo Copyright© University of the Witwatersrand, Johannesburg i DEDICATION I dedicate this thesis to: My wife, Mankale Norah Phaladi and our son, Makhwana Lesego Rakgogo. I also dedicate this thesis to the entire Rakgogo family. ii ACKNOWLEDGEMENTS I would like to direct my acknowledgements and sincere gratitude and appreciation to: The God that I serve, for protecting me and shaping my trajectory so I could achieve this dream! I give You all the praise and honour. Thank you for giving me strength and wisdom to undertake this study and bring it to completion within a reasonable time. With You, all things are possible, I thank You most of all.