Design and Evaluation of Soft Keyboards for Brahmic Scripts

LAUREN HINKLE, University of Michigan
ALBERT BROUILLETTE, University of Colorado
SUJAY JAYAKAR, Cornell University
LEIGH GATHINGS, MITRE Corporation
MIGUEL LEZCANO and JUGAL KALITA, University of Colorado

Despite being spoken by a large percentage of the world, Indic languages in general lack user-friendly and efficient methods for text input. These languages have poor or no support for typing. Soft keyboards, because of their ease of installation and lack of reliance on specific hardware, are a promising solution as an input device for many languages. Developing an acceptable soft keyboard requires the frequency analysis of characters in order to design a layout that minimizes text-input time. This article proposes the use of various development techniques, layout variations, and evaluation methods for the creation of soft keyboards for Brahmic scripts. We propose that using optimization techniques such as genetic algorithms and multi-objective Pareto optimization to develop multi-layer keyboards will increase the speed at which text can be entered.

Categories and Subject Descriptors: H.5.2 [Information Systems–Information Interfaces and Presentation]: User Interfaces

General Terms: Design, Experimentation, Human Factors

Additional Key Words and Phrases: Genetic algorithms, soft keyboards, Indic languages, Assamese, Pareto optimization, mobile devices

ACM Reference Format:
Hinkle, L., Brouillette, A., Jayakar, S., Gathings, L., Lezcano, M., and Kalita, J. 2013. Design and evaluation of soft keyboards for Brahmic scripts. ACM Trans. Asian Lang. Inform. Process. 12, 2, Article 6 (June 2013), 37 pages.
DOI: http://dx.doi.org/10.1145/2461316.2461318

This work is supported by the National Science Foundation, under grants CNS-0958576, CNS-0851783 and DUE-0422524.
Author's address: J. Kalita, Computer Science Department, College of Engineering and Applied Science, 1420 Austin Bluffs Parkway, Colorado Springs, CO 80933-7150; email: [email protected].
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax 1 (212) 869-0481, [email protected].
© 2013 ACM 1530-0226/2013/06-ART6 $15.00
ACM Transactions on Asian Language Information Processing, Vol. 12, No. 2, Article 6, Publication date: June 2013.

1. INTRODUCTION

In an increasingly fast-paced and digitally connected world, being able to input text and input it quickly is important to all users. Many efficient physical keyboards have been designed and developed for standard Roman-alphabet based languages although in practice, they have been discarded in favor of the familiar QWERTY keyboard. Thus, the standard physical keyboard that comes with computers around the world is almost always the QWERTY keyboard, which has its origins in the 1800s. Noyes [1983] and Yamada [1980] provide detailed developmental histories of English physical keyboards and of the QWERTY keyboard in particular.

The QWERTY keyboard is unsuitable for Brahmic scripts, which are the modern descendants of the Brahmi script [Coulmas 1991], used widely in countries of the
Indian subcontinent and other parts of East Asia. This is evident in India, where there are several different extant Brahmic scripts¹ [Salomon 1998] used by all the major languages belonging to Indo-European, Dravidian, and other families [Wagner et al. 1999, 24]. The scripts include Devanagari, Eastern Nagari, Gujarati, Gurmukhi, Telugu, Kannada, Malayalam, and Tamil. Scripts for languages such as Tibetan (Tibeto-Burman, Tibet), Burmese (Tibeto-Burman, Myanmar), Sinhala (Indo-European, Sri Lanka), Balinese and Javanese (Austronesian, Indonesia), Thai (Austro-Thai, Thailand), Khmer and Lao (Laos, both Mon-Khmer) belong to the same script class and have similar, and sometimes additional, issues. There is no easy and widely acceptable text entry method for most of these languages, especially those in the Indian subcontinent. This is true even in the case of Hindi, which is used natively by between 182 and 366 million people, and Bengali, which is used by between 181 and 207 million people². These two languages are the fourth and sixth most commonly spoken languages in the world, surpassing languages such as Russian (between 144 and 167 million), German (between 90 and 100 million), and French (between 68 and 78 million).

Most widely spoken Indic languages have had typewriters for some time, but the typewriters have been used almost exclusively for press and bureaucratic needs. However, records of how these typewriter keyboards were created are very difficult to find. In addition, such typewriters, and hence the associated keyboards, never achieved any significant level of usage outside printing presses and government offices. While researchers have designed physical keyboards for Brahmic scripts for use with computers [Joshi et al. 2004], the efficiency of input has usually not been a major concern in these designs, and such keyboards are typically unavailable outside of a few research labs.
In a country like India, where only 4% [Kachru 1986] to 10%³ of the population can perform adequately in English, almost everyone else relies on their native language for everyday activities. However, the small percentage of people who can perform well in English control most professions, and these are the only people who have frequent and reliable access to digital technologies that require reading and inputting text. Thus, one of the problems that has made the use of computers and other digital devices in their native language very difficult is the lack of adequate text entry methods. This perpetuates a severe but unacknowledged form of digital divide (e.g., [Joshi et al. 2004; Ko and Yoshiki 2005; Hosken and Lyons 2003]) for a large segment of the world's population. Typically, computer or typewriter text entry in Indian languages using physical keyboards is done by professional typists who have had months of training or by dedicated individuals. Typing is almost impossible for most common folk, which is in stark contrast with America and Western Europe, where the vast majority of people are able to type. Mobile phones, recently introduced, have spread across the socio-economic layers of India's population. While this development has greatly increased the access that many people have to information, it has also increased the need for soft keyboards in the native languages.

Soft keyboards allow a user to input text without the use of a physical keyboard. Kolsch and Turk [2002] define a soft keyboard as a typing device with no physical manifestation. Soft keyboards are versatile because they allow data to be input through mouse clicks or by touching an onscreen keyboard. With the recent surge in popularity of touchscreen technology, well-designed soft keyboards are becoming even more important everywhere. The market for English language soft keyboards is already dominated by the familiar and ubiquitous QWERTY layout, even though it is not the most efficient keyboard ever made, and many more efficient soft keyboards have been designed. However, for languages that have no entrenched keyboard designs (and in fact no soft keyboards for Brahmic languages have been made beyond the basic alphabetic ones and QWERTY-based designs, which have not been optimized for text input performance), designing and developing soft keyboards for the emerging and rapidly expanding touchscreen market may be very helpful in reducing the almost complete inaccessibility of digital devices, including computers, for those who natively speak Brahmic languages. Some preliminary work has been done in this field, but there is still much room for improvement. Ghosh et al. [2011] have done work on Bengali input systems for mobile devices, and Jalihal [2010] has proposed a Hindi input system that maps characters to the standard nine buttons of a cell phone keypad.

Brahmic scripts have more characters and ligatures than can usably fit on a standard keyboard. A soft keyboard allows any language to have custom layouts based on the frequency of character and ligature use within that language. The development and spread of soft keyboards would allow the use of an easier to learn and more efficient keyboard. Web service providers such as Google⁴ and Wikipedia⁵ have developed soft keyboards for Brahmic scripts, but these are simply an alphabetical listing of letters, and thus not optimized.

¹ See http://en.wikipedia.org/wiki/Brahmic_script, accessed February 20, 2012.
² See http://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers, accessed February 20, 2012.
³ See http://en.wikipedia.org/wiki/List_of_countries_by_English-speaking_population, accessed February 20, 2012.
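
The frequency analysis that motivates custom layouts can be illustrated with a minimal sketch. It is not the procedure used in this article: the function names (`char_frequencies`, `digraph_counts`, `rank_characters`) and the sample string are hypothetical, and the sketch counts individual Unicode code points, so dependent vowel signs and other combining marks are tallied separately rather than as ligatures.

```python
from collections import Counter


def char_frequencies(text):
    """Relative frequency of each non-whitespace code point in text."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def digraph_counts(text):
    """Counts of adjacent code-point pairs, taken within each word."""
    pairs = Counter()
    for word in text.split():
        pairs.update(zip(word, word[1:]))
    return pairs


def rank_characters(text):
    """Characters ordered from most to least frequent (ties by code point)."""
    freqs = char_frequencies(text)
    return sorted(freqs, key=lambda ch: (-freqs[ch], ch))


# Hypothetical sample; a real analysis would use a large corpus.
sample = "অসমীয়া ভাষা"
for ch in rank_characters(sample):
    print(repr(ch), round(char_frequencies(sample)[ch], 3))
```

Single-character frequencies suggest which symbols deserve the most accessible keys, while pair counts indicate which keys are often pressed in succession and so may benefit from being placed close together. How such statistics are turned into a layout is a separate optimization problem.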
