Designing Soft Keyboards for Brahmic Scripts

Lauren Hinkle, Colorado College, Colorado Springs, CO, USA, lauren.hinkle@coloradocollege.edu
Miguel Lezcano, University of Colorado, Colorado Springs, CO, USA, [email protected]
Jugal Kalita, University of Colorado, Colorado Springs, CO, USA, [email protected]

Proceedings of ICON-2010: 8th International Conference on Natural Language Processing, Macmillan Publishers, India. Also accessible from http://ltrc.iiit.ac.in/proceedings/ICON-2010

Abstract

Despite being spoken by a large percentage of the world, many Indic languages have escaped research and development in the realm of text input. There exist languages with poor or no support for typing. Soft keyboards, because of their ease of installation and lack of reliance on specific hardware, are a promising solution as an input device for many languages. Developing an acceptable soft keyboard requires frequency analysis of characters in order to design a layout that minimizes text-input time. This paper proposes using various development techniques, layout variations, and evaluation methods for soft keyboard creation for Brahmic scripts. We propose that using optimization techniques such as genetic algorithms to develop multi-layer and/or gesture keyboards will increase the speed at which text can be entered.

1 Introduction

Keyboard optimization has been a field of quickly advancing research for several decades. In an increasingly fast-paced world, being able to input text, and to input it quickly, is important to all users. Standard Roman-alphabet based languages have been studied in detail, and it is now important to develop efficient keyboards for languages that have not been highly researched. Many Indic languages have only rudimentary keyboards that were developed so that users could input text, but not with the goal of optimization. As a result, many of these keyboards have not had a significant impact because they are difficult to learn and use.

Soft keyboards allow a user to input text with or without the use of a physical keyboard. They are versatile because they allow data to be input through mouse clicks on an on-screen keyboard, through a touch screen on a computer, cell phone, or PDA, or by mapping a virtual keyboard to a standard physical keyboard. With the recent surge in popularity of touch-screen media such as cell phones and computers, well-designed soft keyboards are becoming more important everywhere.

For languages that do not conform well to the standard QWERTY-based keyboards that accompany most computers, soft keyboards are especially important. Brahmic scripts (Coulmas 1990) have more characters and ligatures than fit usably on a standard keyboard. A soft keyboard allows any language to have custom layouts based on the frequency of character and ligature use within that language.

While physical keyboards have been designed for Brahmic scripts, such as that of Joshi et al. (2004), they can be cumbersome and difficult to learn. The development and spread of soft keyboards would allow the use of a keyboard that is easier to learn and more efficient. Additionally, soft keyboards are becoming more and more important in the realm of communication due to the popularity of touch-screen cell phones.

Web service providers such as Google and Wikipedia have developed soft keyboards for Brahmic scripts, but these are not optimized. Computer users are often forced to use English keyboards to tediously type their script. This limits people's ability to use computers and thus to connect themselves to the many resources computers can provide. The development of soft keyboards for these languages has the dual benefit of allowing more people to use a computer in their native language and of preparing users for touch-screen technologies.

2 Problem Definition

Developing a soft keyboard for a Brahmic script is important, but for it to be usable it must be designed so that it is easily learned as well as efficient.

A user must be able to select the desired characters as quickly as possible. Brahmic scripts pose an important problem that must be overcome: the number of characters that should appear on the keyboard to maximize efficiency. Brahmic scripts have more characters than the standard Roman-based languages most people are familiar with. For example, the Eastern Nagari script has 37 consonants, 11 vowels, and 4 special characters that can be used individually as well as combined to create 240 different ligatures.

Character layout must be designed with efficient text entry in mind. More frequently used characters should be placed centrally, and characters that are often used together should be located close to one another. This would minimize the travel distance between characters being selected and therefore increase the input speed.

3 Related Work

While efficient soft keyboards have not been widely developed for Brahmic scripts, there has been much research on techniques for developing soft keyboards in English.

Layouts have developed from simple and familiar to seemingly arbitrary, yet designed by computers to be as efficient as possible. Early techniques for English soft keyboards used either alphabetic ordering or the traditional QWERTY layout. While these are still the most commonly used keyboards, great enhancements have been made in optimization. MacKenzie et al. were among the first to design a layout based on character frequency. They used Fitts' Law and trial-and-error hand placement of frequently occurring bigraphs to develop the OPTI keyboard (MacKenzie et al. 1999). Keyboards were further improved upon by using computers to iterate over many solutions to computation-heavy algorithms from physics such as Hooke's Law and a Metropolis random walk algorithm. Machines developed and tested increasingly efficient soft keyboards that were able to reach theoretical upper bounds of text input of approximately 42 wpm (Zhai et al. 2000). The next major step in efficiency was undertaken by employing genetic algorithms. The keyboards generated using this technique were more efficient for every layout tested than any previous soft keyboard (Raynal and Vigouroux 2005). The impressive performance of genetic algorithms leads to the idea of using other optimization techniques for keyboard layout. While such techniques have not been applied to keyboards aimed at the general public, ant colony optimization strategies have been applied to virtual keyboards designed to make text input easier for people with disabilities (Colas et al. 2008).

If these steps worked to create efficient soft keyboards for English, it is likely they will work as well for other languages.

Brahmic soft keyboards are currently in a developmental stage analogous to where English soft keyboards were a decade ago. They are currently organized in traditional alphabetic or consonant-vowel groups. However, several interesting input schemes have been employed. Although there are some designs for single-layer keyboards, reminiscent of English soft keyboards (Sowmya and Varma 2009), this is not the trend in Brahmic keyboards.

As a result of the great number of characters in the alphabets, including diacritics, many Brahmic keyboard designers have opted for layered keyboards. A layered keyboard has multiple characters residing in the same location that are accessible either by hitting a button that toggles between layers (Rathod and Josh 2002) or by rolling over base characters to reveal others (Shanbhag et al. 2002). Shanbhag's keyboard has a combination of these effects, with multiple layers that can be toggled between as well as a layer that has consonants grouped together and accessible by rolling over the group icon (Shanbhag et al. 2002).

Another technique to decrease the necessary number of keys is the use of gestures on the keyboard. In a gesture keyboard, the user draws any necessary diacritics for each character as they select it (Krishna et al. 2005). Gesture-based keyboards have taken some of their ideas from English gesture-based keyboards such as T-Cube, in which a user selects different characters by flicking their mouse or stylus in different directions (Venolia and Neiberg 1994).

These techniques have the potential to increase the input speed of the user by decreasing the search time for characters and decreasing the necessary distance to travel between characters. However, no theoretical input speeds have been calculated for them. In addition, both techniques stand to benefit from reordering the keyboard by frequency of use of unigraphs and bigraphs, rather than alphabetically.

4 Brute-Force Soft Keyboard Design

As a first attempt at designing a soft keyboard, we took a brute-force approach. We decided to implement a three-level pop-up menu based soft keyboard for Assamese. The first level provided an individual key for each consonant (see Figure 1); it also had a single key for each of the following sets of characters: vowels, vowel diacritics, digits and special punctuation marks.

Figure 1: The top-level menu items for character entry. The 11 vowels are represented by the first menu item in the top row.

... vowel diacritics and every ligature in which the consonant participates shows up. The second-level menus are rectangular. An example of the menu that pops up for the consonant na is shown in Figure 2.

Figure 2: Second-level menu (light blue in green) for the consonant na.

There exists a third level of menus as well. Each ligature in the second layer has a sub-menu of all the vowel diacritics that can be added to it. The level 3 menu for a ligature is shown in Figure 3.

Figure 3: Third-level menu (in red) for a ligature.

Thus, in our three-level menu system, we provide direct access to every vowel, vowel diacritic, digit, special punctuation mark, consonant, and consonant-vowel diacritic combination, as well as every ligature-vowel combination. However, this made for an unwieldy keyboard with access to over 3000 character combinations in which it is possible to type anything that can appear in print
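The placement criterion from Section 2 (frequent characters central, frequently paired characters close together) can be made concrete with the Fitts' Law model mentioned in Section 3. The following is a minimal sketch, not the authors' implementation: it scores a layout by the bigraph-frequency-weighted mean movement time between successive keys. The function names, the toy layouts, the toy bigraph counts, and the default Fitts coefficients (intercept 0 and slope 1/4.9 seconds per bit) are all illustrative assumptions; real coefficients and frequencies would have to come from user studies and a corpus of the target language.

```python
import math


def fitts_time(distance, width=1.0, a=0.0, b=1 / 4.9):
    """Fitts' law movement time in seconds: a + b * log2(D / W + 1).

    a and b are assumed placeholder coefficients, not measured values.
    """
    return a + b * math.log2(distance / width + 1.0)


def mean_time_per_char(layout, bigram_freq, width=1.0):
    """Frequency-weighted mean movement time between successive keys.

    layout: dict mapping character -> (x, y) key centre, in key widths.
    bigram_freq: dict mapping (first, second) -> count or probability.
    """
    total = sum(bigram_freq.values())
    if total == 0:
        raise ValueError("bigram_freq needs at least one nonzero count")
    cost = 0.0
    for (first, second), freq in bigram_freq.items():
        x1, y1 = layout[first]
        x2, y2 = layout[second]
        distance = math.hypot(x2 - x1, y2 - y1)
        cost += (freq / total) * fitts_time(distance, width)
    return cost


# Toy data (made up for illustration, not taken from a corpus).
toy_bigrams = {("ক", "র"): 8, ("র", "ন"): 2}

# Layout A keeps the frequent pair adjacent; layout B separates it.
layout_a = {"ক": (0, 0), "র": (1, 0), "ন": (3, 0)}
layout_b = {"ক": (0, 0), "র": (3, 0), "ন": (1, 0)}

if __name__ == "__main__":
    print("layout A:", mean_time_per_char(layout_a, toy_bigrams))
    print("layout B:", mean_time_per_char(layout_b, toy_bigrams))
```

Under these assumptions, the layout that keeps the frequent pair adjacent receives the lower cost. A score of this kind could serve as the fitness function for the genetic algorithms and other optimization techniques discussed in Section 3, although the cited works may define their cost functions differently.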
