Kadazan Part of Speech Tagging Using Transformation-Based Approach
Available online at www.sciencedirect.com ScienceDirect Procedia Technology 11 ( 2013 ) 621 – 627 The 4th International Conference on Electrical Engineering and Informatics (ICEEI 2013) Kadazan Part of Speech Tagging using Transformation-Based Approach Marylyn Alex*, Lailatul Qadri Zakaria CAIT Research Group, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Malaysia Abstract This paper presents the Part of Speech (POS) Tagger for Kadazan language using Transformation-based approach. The objectives of this study is to develop a POS tagger for Kadazan which has never been develop systematically before by any of the tagging approaches and also to solve the disambiguation problem in that language and at the same time to use it as a learning language tool. We use and implement this approach because it can achieve higher accuracy equivalent to other tagging approaches such as statistical and the original rule-based techniques and as well as having a significant advantages over the other tagging approaches. The tagging system has been trained using two Kadazan corpuses which contain 741 words and 1328 words. Based on the evaluation results, the tagging system can achieve around 92 % to 93 % of accuracy. ©© 20132013 TheThe Authors. Authors. Published Published by by Elsevier Elsevier B.V. Ltd. SelectionSelection and peer-reviewpeer-review under responsibilityresponsibility ofof thethe Faculty Faculty of of Information Information Science Science & & Technology, Technology, Universiti Universiti Kebangsaan Kebangsaan MalaMalaysia.ysia. Keywords: Kadazan Language, Transformation-Based, POS tagger, Brill’s Tagger, Statistical, Rule-Based 1. Introduction POS tagging is a system that can read text in some languages and to assign POS such as noun, verb, adjective, adverb, pronoun, etc.
[Show full text]