Thesis Submitted to the Indian Institute of Technology Kharagpur for Award of the Degree

WORD PREDICTION SYSTEM WITH VIRTUAL KEYBOARD FOR TEXT ENTRY IN HINDI

Thesis submitted to the Indian Institute of Technology Kharagpur for award of the degree of Master of Science (by Research)

by Manoj Kumar Sharma

Under the guidance of Dr. Debasis Samanta

School of Information Technology
Indian Institute of Technology Kharagpur
Kharagpur - 721 302, India
June 2012

© 2012 Manoj Kumar Sharma. All rights reserved.

CERTIFICATE OF APPROVAL

19/06/2012

Certified that the thesis entitled Word Prediction System with Virtual Keyboard for Text Entry in Hindi, submitted by Manoj Kumar Sharma to the Indian Institute of Technology, Kharagpur, for the award of the degree Master of Science, has been accepted by the external examiners and that the student has successfully defended the thesis in the viva-voce examination held today.

(Member of DAC) (Member of DAC) (Member of DAC)
(Member of DAC) (Member of DAC) (Member of DAC)
(Supervisor) (Internal Examiner) (Chairman)

CERTIFICATE

This is to certify that the thesis entitled Word Prediction System with Virtual Keyboard for Text Entry in Hindi, submitted by Manoj Kumar Sharma to Indian Institute of Technology Kharagpur, is a record of bona fide research work under my supervision and I consider it worthy of consideration for the award of the degree of Master of Science (by Research) of the Institute.

Date: 19/06/2012

Dr. Debasis Samanta
Associate Professor
School of Information Technology
Indian Institute of Technology Kharagpur
Kharagpur - 721 302, India

DECLARATION

I certify that

a. The work contained in the thesis is original and has been done by myself under the general supervision of my supervisor.
b. The work has not been submitted to any other Institute for any degree or diploma.
c. I have followed the guidelines provided by the Institute in writing the thesis.
d. I have conformed to the norms and guidelines given in the Ethical Code of Conduct of the Institute.
e. Whenever I have used materials (data, theoretical analysis, and text) from other sources, I have given due credit to them by citing them in the text of the thesis and giving their details in the references.
f. Whenever I have quoted written materials from other sources, I have put them under quotation marks and given due credit to the sources by citing them and giving required details in the references.

Manoj Kumar Sharma

Dedicated to my parents and other family members

ACKNOWLEDGMENT

First and foremost, I wish to convey my deep sense of gratitude to my mentor, Prof. Debasis Samanta. It has been my blessed opportunity to be his student. I appreciate all his contributions of time, ideas, and vision, which made my research experience productive and cherishable. From him I learnt humanity, patience, and hard work.

I would like to thank Prof. Jayanta Mukhopadhyay, Head of SIT, for extending to me all possible facilities to carry out the research work. I also wish to thank all of my departmental academic committee members, Prof. A. Gupta, Prof. C. R. Mandal, Prof. S. Sural, Prof. S. K. Ghosh, Prof. K. S. Rao, Prof. S. Misra, and Prof. R. R. Sahay, for their valuable suggestions during my research. I sincerely remember the support of the office staff: Mithun Da, Soma Di, Malay Da, Vinod Da, and others. I am also grateful to all members of the School of Information Technology.

I owe my deepest gratitude to Somnath Da, Barik Da, and Ranjan Da for strengthening my research with constant moral support and for providing guidance when required. I learnt a lot from them.
I wish to convey my heartfelt thanks to Sayan Sarcar, Pradipta Kumar Saha, Soumalya Ghosh, Santa Maiti, Debasish Kundu, Arindam Dasgupta, Jayeeta Mukherjee, Sankar Narayan Das, Indira Mukherjee, Soumyajit Dey, Chandan Karfa, Kanchan Manna, Sudhamay Maity, Shashidhar Koolagudi, Col. Ranjit Singh, Puspak, Krishnendu, Soumya, Ankit, Anmol, and many more.

I am greatly indebted to many of my friends for their constant inspiration. I used to receive frequent encouraging calls from K. Satish, Dilip Kumar Singh, Chandra Mohan Prasad, Jay Singh, Richard Xalxo, Kundan Shrivastava, Shakti Singh, Amit Sharma, Mithun Mittra, Vijay Verma, Chittapriya Mahato, Prosenjit Banerjee, Raj, Anil, Bipin, and many more.

Nothing would have been possible without the moral support of my parents, brothers, sisters, and their families. I am deeply indebted to them. Thanks to my wife Jyoti for being with me. I would like to thank Dr. N. C. Pal, Prof. S. Mitra, Prof. A. Chakraborty, and Partha Sarathi Santra for lightening my research burden by creating and maintaining a cordial family relationship with us.

Manoj Kumar Sharma

Abstract

With the growth of Information and Communication Technology (ICT), text composition in users' own languages has drawn the attention of HCI researchers. The conventional text entry mechanism with a QWERTY keyboard is not efficient, particularly for Indian languages. This work aims to develop an efficient text entry system in Hindi.

To develop a text entry system, we address two main issues: designing the text entry interface and enhancing the text entry rate. The virtual keyboard has been advocated as the best alternative to the QWERTY hardware keyboard because it is user-friendly and easy to customize. We develop a layout of a virtual keyboard for text composition in Hindi. Our design considers issues specific to Indian languages, such as the large character set and complex and diacritic characters. We also consider the size and placement of the prediction window in the layout to achieve a better text entry rate.

Next, we propose to augment the virtual keyboard with a text entry rate enhancement strategy, namely word prediction. Users commit many errors when entering text in Hindi. Our approach is to predict the correct words even when there are errors in the initial input. We also consider another rate enhancement approach, called visual clue, which reduces visual search time and hence improves the text entry rate significantly.

Experiments with users and simulation show that the text entry rate with our system is 13.12 wpm, compared to approximately 5.17 wpm with existing state-of-the-art virtual keyboards in Hindi.

Keywords: Human computer interaction, Text entry rate enhancement, Word prediction, Hindi text entry interface, Indian languages, Virtual keyboard, Visual clue.

Contents

Approval
Certificate
Declaration
Dedication
Acknowledgment
Abstract
Contents
List of Figures
List of Tables
List of Symbols and Abbreviations

1 Introduction
  1.1 Different modes of text composition
  1.2 Advantages and issues with virtual keyboard
  1.3 Word prediction
  1.4 Issues in designing predictive virtual keyboard in Indian languages
  1.5 Scope and objectives
  1.6 Thesis outline

2 Related Work
  2.1 Text entry interface
    2.1.1 Virtual keyboard in English
    2.1.2 Virtual keyboards in other non-Indian languages
    2.1.3 Virtual keyboards in Indian languages
  2.2 Text entry rate enhancement strategies
    2.2.1 Abbreviation expansion
    2.2.2 Semantic coding
    2.2.3 Sentence compansion
    2.2.4 Text prediction
  2.3 Word prediction
    2.3.1 Statistical prediction
    2.3.2 Syntactical prediction
    2.3.3 Semantical prediction
    2.3.4 Other prediction methods
    2.3.5 Practical outcomes of word prediction
    2.3.6 Available commercial tools
    2.3.7 Indian scenario of word prediction
  2.4 Prediction with virtual keyboard
    2.4.1 Position of prediction window
    2.4.2 Size of prediction window
    2.4.3 Visual clue
  2.5 Summary

3 Text Composition Interface
  3.1 Designing a virtual keyboard for text composition in Hindi
  3.2 Finding a place for positioning the prediction window
  3.3 Deciding the size of the prediction window
  3.4 Experiments and experimental results
    3.4.1 Metrics for performance measure
    3.4.2 Experimental setup
    3.4.3 Evaluation procedures
    3.4.4 Experimental results
  3.5 Summary

4 Word Prediction in Hindi
  4.1 Word-level prediction with error correction support
    4.1.1 Development of language model
    4.1.2 Score metrics
    4.1.3 Prediction methodology
  4.2 Experimental results
  4.3 Comparison with some relevant word prediction systems
  4.4 Summary

5 Predicting Next Character Highlighter
  5.1 Framework of the PNCH
  5.2 Proposed methodology
    5.2.1 Identification of candidate characters
    5.2.2 Filtration of candidate characters
  5.3 Algorithm walkthrough
  5.4 Experimental results
    5.4.1 Metrics for performance measure
    5.4.2 PNCH performance measure
  5.5 Summary

6 Summary and Conclusion
  6.1 Discussion
  6.2 Conclusion
  6.3 Future scope of work

References
Publications

List of Figures

1.1 Different modes of text composition
  (a) Keyboard-based text composition [81]
  (b) Gesture-based text composition [1]
  (c) Icon-based text composition [6]
  (d) Speech-based text composition [112]
  (e) Eyegaze-based text composition [82]
2.1 Popular virtual keyboard layouts in English
  (a) QWERTY keyboard [97]
  (b) FITALY keyboard [33]
  (c) Dvorak keyboard [28]
  (d) Lewis keyboard [66]
  (e) OPTI keyboard [74]
  (f) Cirrin keyboard [77]
2.2 Virtual keyboard layouts in non-Indian languages [117]
