Corpus-Based Lexicography for Sesotho

Corpus-based Lexicography for Sesotho by Mmasibidi Setaka Student Number 10657542 A dissertation submitted in fulfilment of the requirements for the degree Master of Arts in the Department of African Languages at the Faculty of Humanities UNIVERSITY OF PRETORIA, SUPERVISOR: Prof. D.J. Prinsloo August 2018 Acknowledgements Thanks God for allowing this opportunity and forever gracing me with your mercy and blessings. To my queen mama Ellen Setaka and my sister Madibe Setaka and brother Ntsheke Setaka, you have been more than a blessing in my life and I am most grateful for your unwavering support, advices and encouragement. You make my world brighter. To my son Relebohile Setaka, you motivate me to do more. To my late grandparents, I know you are watching over me always. Thank you and Rest in Peace. Prof. Prinsloo, thank you for believing in me. Your contribution in my studies gave me life lessons about patience, hope and success. I cannot thank you enough for all that you have done for me. SADiLaR, all could have been really hard to achieve if it was not for you. Thank you for the opportunity. Pieter Labuschagne, thank you for always helping me. To my prayer warrior, Matlhogonolo Tsae, thank you for your prayers and constantly enquiring about my studies. That is true friendship and I am most grateful to have you in my life. To all my friends and family who stood by me, I salute you. Modimo o le hlohonolofatse. i 1. Table of Contents Acknowledgements ........................................................................................................... i Abstract ........................................................................................................................ viii Chapter 1: Introduction .................................................................................................... 1 1.1 Introduction ...................................................................................................................... 1 1.2 Rationale and background to the study ........................................................................... 4 Figure 1.1: The treatment of medicine............................................................................... 9 Figure 1.2: Bilingual Sesotho dictionaries .......................................................................... 9 1.3 Aims of the study ........................................................................................................... 10 1.4 Building organic corpora for Sesotho ............................................................................. 11 1.5 Objectives of the research ............................................................................................. 12 1.5.1 Description of principles and practice for the compilation of modern Sesotho dictionaries and evaluating selected English dictionaries ................................................... 13 1.5.2 Critical evaluation of African-language dictionaries with specific focus on Sesotho dictionaries. .......................................................................................................................... 13 1.5.3 Designing a comprehensive lexicographic approach for the compilation of Sesotho dictionaries ........................................................................................................................... 14 1.6 Research methodology .................................................................................................. 15 1.7 Preview and layout of chapters ..................................................................................... 15 Chapter 2: The Sesotho language .................................................................................... 18 2.1 Introduction .................................................................................................................... 18 2.2 Classification of African languages ................................................................................. 19 Figure 2.1: Map of Africa .................................................................................................. 20 2.3 South African Bantu languages ...................................................................................... 23 2.4 The Constitution of South Africa .................................................................................... 24 2.5 History and classification of the Sesotho language ....................................................... 26 Figure 2.5.1: Structure of South-eastern Bantu zone ....................................................... 26 Table 2.5.2: Sesotho Noun Classes ................................................................................... 28 2.6 Sesotho dialects, alphabets and vowel variation ........................................................... 29 Figure 2.6.1: Sesotho phonetic chart ............................................................................... 30 Figure 2.6.2: Sesotho clicks (1) ......................................................................................... 30 ii Figure 2.6.3: Sesotho clicks (2) ......................................................................................... 31 Figure 2.6.4: Southern Sotho alphabetic representation and pronunciation .................. 31 2.7 The Sesotho speech community .................................................................................... 32 2.8 Sesotho orthography ...................................................................................................... 32 2.9 The beginning of Sesotho literature............................................................................... 32 2.10 Orthography: Lesotho Sesotho (LSe) vs South African Sesotho (SASe) ....................... 33 2.11 News and Media ........................................................................................................... 34 2.12 Conclusion .................................................................................................................... 36 Chapter 3: Compilation and utilization of Sesotho corpora .............................................. 38 3.1 Introduction .................................................................................................................... 38 3.2 History of electronic corpora ......................................................................................... 40 Table 3.1: English corpora ................................................................................................ 41 3.3 History of electronic corpora: an illustration for African languages .............................. 41 3.4 Corpus size...................................................................................................................... 42 3.5 Corpus designs ............................................................................................................... 44 Table 3.5.1: Basic composition of Brown and LOB corpora ............................................. 45 Figure 3.5.2: Longman/Lancaster English language corpus – current structure.............. 46 Figure 3.5.3: The British National Corpus ......................................................................... 47 Figure 3.5.4: The International Corpus of English ............................................................ 48 3.6 Balanced versus representative corpora ....................................................................... 48 3.7 The organic corpus ......................................................................................................... 50 3.8 The design of the Sesotho corpus .................................................................................. 52 3.9 Conclusion ...................................................................................................................... 53 Chapter 4: Corpus-enabled Macrostructural compilation ................................................ 55 4.1 Introduction .................................................................................................................... 55 4.2 Types of Macrostructures .............................................................................................. 57 Figure 4.2.1: The structure of a dictionary ....................................................................... 58 Figure 4.2.2: Different macrostructures ........................................................................... 60 4.3 The macrostructural components .................................................................................. 61 Figure 4.3.1: Presentation of the front matter................................................................. 62 iii 4.4 Corpora and the Macrostructure ................................................................................... 63 4.5 Frequency tests .............................................................................................................. 64 4.6 Putting frequencies in the dictionary ............................................................................. 65 Figure 4.6: Diamond representation of frequencies ........................................................ 66 4.7 Lemmatisation ................................................................................................................ 66 4.8 Lemmatisation approaches, strategies and lexicographic traditions ............................ 67 Figure 4.8.1: Example of rule-orientated approach ......................................................... 69 Figure 4.8.2: Southern Sotho pronunciation

Corpus-Based Lexicography for Sesotho

The Distribution and Interpretation of the Qualificative

Sesotho Personal Names As Epithets: a Systemic Functional Linguistics Approach

TANKISO LUCIA MOTJOPE-MOKHALI Submitted in Accordance with the Requirements for the Degree of Doctor of Literature and Philosoph

Anghelescu, Andrei, “Distinctive Transitional Probabilities Across

The Wili Benchmark Dataset for Written Natural Language Identification

Modeling Word and Morpheme Order in Natural Language As an Efﬁcient Tradeoff of Memory and Surprisal

Race/Ethnicity And

Felix Banda Orthography Design

Sesotho Book

GOO-80-02119 392P

Boston University Graduate School of Arts and Sciences, 2017

Palatalization and Other Non-Local Effects in Southern Bantu Languages