Development of a Rule Base Grammar Checker for Swahili Language
The University of Dodoma
University of Dodoma Institutional Repository
http://repository.udom.ac.tz

Information and Communication Technology
Master Dissertations
2015

Development of a rule base grammar checker for Swahili language
Bamsi, Haji Idd
The University of Dodoma

Bamsi, H. I. (2015). Development of a rule base grammar checker for Swahili language. Dodoma: The University of Dodoma.
http://hdl.handle.net/20.500.12661/760

Downloaded from UDOM Institutional Repository at The University of Dodoma, an open access institutional repository.

DEVELOPMENT OF A RULE BASED GRAMMAR CHECKER FOR SWAHILI LANGUAGE

By
Haji Idd Bamsi

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science of the University of Dodoma

The University of Dodoma
October, 2015

CERTIFICATION

The undersigned certifies that he has read and hereby recommends for acceptance by the University of Dodoma a dissertation entitled "Development of a rule base grammar checker for Swahili language" in fulfilment of the requirements for the degree of Master of Science in Computer Science of the University of Dodoma.

……………………………………
Dr. Salehe I. Mrutu
(Supervisor)

Date ……………………………………

DECLARATION AND COPYRIGHT

I, Haji Idd Bamsi, declare that this dissertation is my own original work and that it has not been presented and will not be presented to any other University for a similar or any other degree award.

Signature ……………………………………

No part of this dissertation may be reproduced, stored in any retrieval system, or transmitted in any form or by any means without prior written permission of the author or the University of Dodoma.

ACKNOWLEDGEMENT

I have put considerable effort into this study. However, it would not have been possible without the blessings of Almighty God, who kept me safe and healthy so that I could undertake it. I am highly indebted to my supervisor, Dr. Mrutu, for his guidance and constant support up to the completion of this study.
His keen interest in this study and his firm opinions were a powerful impetus towards its completion. I would like to express my sincere gratitude to my colleagues in the MSc class of 2014/2015; their support, encouragement and collaboration helped me greatly. This work would not have been possible without the support of my employer, the University of Dodoma (UDOM), which granted me study leave to attend the program. I would like to acknowledge all staff of the College of Informatics and Virtual Education, Department of Computer Science, for their support and encouragement during my studies. Last, but not least, I would like to thank my wife and my family for their constant encouragement and patience throughout my studies. I cannot finish without mentioning my friend and colleague, Ms Rose, and thanking her for her assistance and encouragement during the course of my study.

DEDICATION

I would like to dedicate this work to my late mother, Fatma Mohammed. I will always remember her guidance and the love she showed me. May Allah rest her in peace.

ABSTRACT

A grammar checker is a writing assistance tool that automatically checks text against the rules of a natural language. Every natural language has a set of rules that guide the users of that language. Swahili is one of the most widely spoken languages in East Africa, particularly in Tanzania. Efforts have been made towards the development of such a tool. However, to the best of our knowledge, no assistance tool has been developed and reported that detects grammatical errors in Swahili sentences automatically. In this study, a grammar checker prototype for the Swahili language was developed and tested using a rule based approach. To develop the grammar checker, Swahili texts were collected and analyzed. Grammar rules were then developed and tested using the Transformation Based Learning (TBL) algorithm.
The grammar checker prototype was designed as two modules: the first module detects spelling errors using Bayesian theory, which finds the most likely spelling correction from the set of possible corrections, and the second module detects grammar errors by matching the input text against the pre-defined grammar rules. The performance of the developed prototype was evaluated using the standard performance measures of precision and recall. Recall was used to measure the ability of the prototype to detect grammar errors, while precision was used to measure its ability to report only relevant grammar errors. The system prototype achieved 71% recall and 76% precision. Therefore, the accuracy of the grammar checker prototype obtained was 73%.

TABLE OF CONTENTS

CERTIFICATION
DECLARATION AND COPYRIGHT
ACKNOWLEDGEMENT
DEDICATION
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER ONE: OVERVIEW OF THE STUDY
1.1 Background Information
1.2 Problem Statement
1.3 Research Objectives
1.3.1 Main Objective
1.3.2 Specific Objectives
1.4 Research Questions
1.5 Significance of the Study
1.6 Limitation of the Study
1.7 Dissertation Structure

CHAPTER TWO: LITERATURE REVIEW
2.1 About Swahili Language
2.2 Definitions of Terms
2.2.1 Natural Language Processing
2.3 Rule Based Approach
2.3.1 Part of Speech Tagging
2.3.1.1 Swahili Word Categories
2.3.2 Phrase Chunking
2.3.3 Phrase Structure
2.4 Transformation Based Learning
2.5 Parsing
2.6 Spelling Corrector
2.7 Related Works

CHAPTER THREE: METHODOLOGY
3.1 Data Gathering, Review and Analysis
3.2 Development Model for Grammar Checker
3.2.1 Lexical Analysis
3.2.1.1 Part of Speech Tagging
3.2.1.2 Parse Tree