Developing a Model Corpus for Endangered Languages

University of Calgary
PRISM: University of Calgary's Digital Repository
Graduate Studies
The Vault: Electronic Theses and Dissertations
2014-07-11

Developing a Model Corpus for Endangered Languages
Hashi, Awil

Hashi, A. (2014). Developing a Model Corpus for Endangered Languages (Unpublished doctoral thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/25614
http://hdl.handle.net/11023/1632
doctoral thesis

University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
Downloaded from PRISM: https://prism.ucalgary.ca

UNIVERSITY OF CALGARY
Developing a Model Corpus for Endangered Languages
by
Awil Hashi
A DISSERTATION SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
GRADUATE PROGRAMS IN EDUCATION
CALGARY, ALBERTA
JULY, 2014
© Awil Hashi 2014

Abstract

World languages are becoming endangered at an unprecedented rate. Linguists estimate that 50-90% of the world’s 6700 languages are endangered. Krauss (1992) predicted that as few as 10% of the world’s languages may survive by 2100, which means we may be left with only 670 languages. Given the relationship between linguistic diversity, cultural diversity, intellectual diversity and biological diversity, this study is timely because it contributes to maintaining the biodiversity of languages. This research expands upon the previous success of corpus linguistics for certain world languages to develop and document the Somali language. This educational technology study examines the phenomenon of endangered languages by examining technology and languages in Corpus Linguistics, the study of language in real world texts.
Through analysis and a review of the literature and historical precedents, the study demonstrates how the Somali language, spoken by millions, can be labelled an endangered language. Using a pragmatic approach and expanding upon the theories and experience of previous corpus constructions, this study developed an iterative design framework in five separate but complementary phases for corpus design in challenging contexts and drafted a sampling frame for text collection. The research designed and built a Somali language corpus as a working prototype for endangered languages that illustrates how it could be used for language development and documentation purposes. Phase one used the Somali language and collected 22 texts from different genres to produce a general corpus of 865,214 words. Aided by quantitative results generated by the AntConc corpus tools, the study reveals the 20 most frequently used Somali words. The identification of such words has practical implications for teachers, students, curriculum developers, lexicographers, and language policy makers because their high frequency has pedagogical significance. The study illustrates how corpus tools identify the least commonly used words, with implications for those who are involved in language documentation and development initiatives. Using type/token ratio analysis, the study indicates that certain genres such as poetry have an inherently richer vocabulary, which has implications for lexicographers and those who are involved in vocabulary and terminology development in languages.

Dedication

This work is dedicated to my late father Calidhuux Hashi and my uncle, Dr. Xasan Xaashi Fiqi, who are unfortunately not here with us to rejoice in the achievement of this landmark work, which is undoubtedly the culmination of the fruits they sowed by supporting, nurturing and inspiring me not merely to excel but also to contribute to humanity.
Hibeyn

Waxa aan miraha hawshaan u hibeeyey oo aan kusoo xasuusanyaa Allaha u naxariistee, aabbahay Calidhuux Xaashi iyo adeerkay Dr. Xasan Xaashi Fiqi, kuwaasoo nasiib-darro aan halkaan nala joogin si ay noola qaybsadaan libinta hawshaan oo ay dhab u bogaadin lahaayeen. Tanoo aan shaki ku jirin inay tahay mirihii ay ku beereen taageeradoodii, ababintoodii iyo dhiirrigelintoodii ahayd inaanan kaligey tallaabsan ee aan adduunyada wax uun kusoo kordhiyo.

Acknowledgements

This academic work has been one of the most challenging projects I have ever had to undertake. The support, patience, and inspiration of many people have made the initiation of this endeavor, as well as its completion, attainable. I owe them my utmost appreciation.

• Dr. Michele Jacobsen, my supervisor, who despite her many other academic, family and professional commitments, consistently provided me with timely and constructive feedback on each chapter. Her tactful demeanor and commitment to excellence have inspired me to stay focused. I can’t thank you enough!
• My supervisory committee, Dr. Susan Crichton, Dr. Ian Winchester and Dr. Darin Flynn, have all given me unwavering support and insight from day one. Thank you all!
• My mother has always been a source of knowledge and inspiration. Her encouragement to excel and to also live my life to the fullest has been inspirational. Thank you for being such a wonderful mother.
• My soulmate, Faisa Xassan Xaashi, and my children have all been there for me. Your endless support and patience have taught me so much about empathy, perseverance and love. Special thanks go to Anas and Anisa Hashi, who have worked with me as project co-supervisors during the initial stages of this thesis and as a result learned about the overall structure and the subtitles of the first three chapters. I am so thankful to you all beyond words.
My heartfelt thanks also go to my sister, Ubax Calidhuux Xaashi, and my awesome nieces Rumaysa, Ruweyda and Ramla Maxamed Weli, who have been so supportive throughout the entire process.
• A grade-school friend and a cousin, Abdirishid Hashi of the Heritage Institute for Policy Studies, has worked with me tirelessly during the data collection to ensure that I gather texts as diverse as possible towards constructing a corpus of one million words. I owe you one, Rashid!
• Abdurahman Hashi of Fiqi Publishers has not only donated much-needed texts but has also been a dependable friend and cousin. Ali Mohamed Saeed has always been a staunch supporter during tough times. Thank you, Ali Hashi.
• The following individuals deserve special gratitude for their encouragement and inspiration: Cabdirisaaq Cabdullaahi Xaashi, Cabdinuur Khaliif, Guuleed Xasan Xaashi, Fuad Hassan Hashi, Dahir K. Hashi, Fiqi Bashir Khalif, Abdulkadir Abdinur Hashi, Ali Bosir, Dr. Afyare Elmi, Siciid Suugaan, Dr. Cabdullaahi Barise and Dr. Abdullaahi Hussein (Muqadin).
• The following friends and colleagues have generously donated various texts: Ali M. Abdigiir (Cali-ganey), Abdulqadir Khaliif Xaashi, Abwaan Xasan Cabdullaahi Xaashi (Timacadde), and Dr. Qaasim Hersi Faarah.
• Sylvia Parks, our graduate program administrator, has always been supportive, considerate and effective in her efforts. I thank you very much!
• I am also very grateful to the University of Calgary’s Graduate Division for Educational Research for funding this research through in-house graduate funding and also through the Queen Elizabeth II Scholarship. I am eternally grateful.
• Muthhir Mohamed and Dr. Hussein Warsame of the Haskayne School of Business and all my friends in Calgary have always encouraged me to stay on course. Thank you all!

There are many friends and colleagues who have helped me in one way or another over the course of this journey. I will always cherish your efforts and support in my heart.
Table of Contents

ABSTRACT
DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER ONE: OVERVIEW
  1.1 RESEARCH BACKGROUND
  1.2 SIGNIFICANCE OF THE STUDY
  1.3 PURPOSE OF THE STUDY
  1.4 RESEARCH QUESTION
  1.5 METHODOLOGY AND CONCEPTUAL FRAMEWORK
  1.6 THE SCOPE OF THE STUDY
  1.7 ORGANIZATION OF CHAPTERS
