Tracing a Loose Wordhood for Chinese Input Method Engine

Xihu Zhang, Chu Wei and Hai Zhao*
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
*Corresponding author.
arXiv:1712.04158v1 [cs.CL] 12 Dec 2017

Abstract

Chinese input methods convert pinyin sequences, or sequences in other Latin encoding systems, into Chinese character sentences. For more effective pinyin-to-character conversion, typical Input Method Engines (IMEs) rely on a predefined vocabulary that demands scheduled manual maintenance. To remove this inconvenient vocabulary setting, this work focuses on automatic wordhood acquisition, fully considering that Chinese inputting is a free human-computer interaction procedure. Instead of strictly defining words, we introduce a loose word likelihood that measures how likely a character sequence is to be a user-recognized word with respect to using an IME. An online algorithm is then proposed to adjust the word likelihood or generate new words by comparing the user's true choice for inputting with the algorithm's prediction. The experimental results show that the proposed solution can agilely adapt to diverse typing and demonstrates performance approaching that of a highly optimized IME with a fixed vocabulary.

1 Introduction

Inputting Chinese on a computer differs from inputting English and other alphabetic languages, because Chinese has more than 20,000 characters, which cannot simply be mapped onto the 26 letter keys of a Latin keyboard. Chinese Input Method Engines (IMEs) have been developed to aid Chinese inputting. Among all IMEs, the most prominent type is the pinyin-based IME, which converts pinyin, the phonetic representation of Chinese, into a Chinese character sequence. For example, suppose a user wants to type the word "北京" (Beijing). The user inputs the corresponding pinyin beijing, and the pinyin IME provides a list of Chinese character candidates whose pinyin is beijing, such as "北京" (pinyin: beijing, the city of Beijing) and "背景" (pinyin: beijing, background), as presented in Figure 1. The user finally selects the word "北京" as the result.

[Figure 1: IME interface on one page. The candidate list for the input beijing begins with 1. 北京 and 2. 背景.]

Chinese IMEs often have to use word-based language models, since character-based language models do not produce satisfactory results (Yang et al., 1998). Chinese words are composed of multiple consecutive Chinese characters, but unlike in alphabetic languages, the words in Chinese sentences are not demarcated by spaces, which makes Chinese word identification a nontrivial task. Chinese word dictionaries are therefore usually predefined to support word-based language models for IMEs.

However, there is a huge difference between standard Chinese word segmentation and IME word identification. Because Chinese text has no blanks between characters, any run of consecutive characters can, to some extent, be a candidate Chinese word. The Chinese word segmentation task learns from a segmented corpus or from predefined segmentation conventions, so not all consecutive character sequences can be regarded as words for the task to learn [1]. IME word identification is quite different, because a user seldom types a complete sentence and may stop at any position, leaving a pinyin syllable segmentation and requiring the IME to give a proper pinyin-to-character conversion. Table 1 shows multiple possible segmentation choices for a given input. The segmentation points are indicated by vertical bars; only the first segmentation is linguistically valid, namely "你 | 在 | 做 | 什么" (pinyin: ni zai zuo shenme, what are you doing).

[1] For example, a Chinese corpus may have 1 million bigrams but still yield fewer than 50 thousand meaningful word types.

    1. 你 (you) | 在 (be) | 做 (do) | 什么 (what)
    2. 你 (you) | 在做 (be doing) | 什么 (what)
    3. 你在 (you are) | 做什么 (do what)
    4. 你 (you) | 在做什么 (be doing what)
    5. 你在做什么 (what are you doing)
    ...

Table 1: IME user's input segmentations

The consecutive characters converted as a 'word' during inputting do not follow a unified standard, as Table 1 shows: every segment in every item may be a 'word', since any user-determined segment for IME inputting should be regarded as a word. The word as defined by the user during Chinese inputting is therefore quite a loose concept. Meanwhile, the word in the linguistic sense remains useful heuristic information for IME construction, even though a word here is defined much more freely than in the Chinese word segmentation task. For example, an IME that is aware that the pinyin beijing can be converted only to "北京" (Beijing) or "背景" (background) can decode pinyin to characters more accurately and efficiently, since the syllables bei and jing each map to dozens of different single Chinese characters (see Figure 2). Since capturing words narrows down the scope of possible conversions, tracing this kind of wordhood becomes important for improving the IME user experience.

[Figure 2: All possible character counterparts of bei, jing and beijing. Note that although modern Chinese has five tones, in all pinyin-based IMEs tones are not inputted with the letters, for simplicity; a pinyin IME therefore has to handle many more pinyin-to-character ambiguities in practice.]

The huge difference in wordhood definition between Chinese word segmentation and IMEs means that most techniques developed for the former are unavailable to the latter, even though Chinese word segmentation has been studied extensively with both supervised and unsupervised learning, and impressive results have been reported (Chang and Su, 1997; Ge et al., 1999; Huang and Powers, 2003; Peng et al., 2002; Zhao et al., 2006; Pei et al., 2014; Chen et al., 2015; Ma and Hinrichs, 2015; Cai and Zhao, 2016).

An IME is human-computer interaction software that manages user inputting. We consider the following two issues for effective processing.

• Challenge: As a user behavior, Chinese input may vary from user to user and from time to time, which makes it impossible to find a stable and general model for the core components of an IME, namely vocabulary and user preference. For example, a user may input a word frequently for a few days and then never touch it again.

• Convenience: Each time, the user makes a choice from the list of character or character-sequence candidates given by the IME, which makes inputting an online human-computer interaction procedure. In other words, each time the user indicates whether the prediction given by the IME is correct [2]. However, to the best of our knowledge, this interactive property of IMEs has not been well studied or exploited. In fact, most, if not all, existing IMEs perform pinyin-to-character decoding with an offline model.

[2] IMEs are always capable of letting users input whatever they want, so the standard of goodness for an IME is actually its speed: the faster an IME lets the user input, the better it is. A correct prediction therefore means that the IME ranks the user's intended Chinese input at the first position of the candidate list. Ranking the expected input at the top means that the user can confirm it with the default keystroke, which makes inputting most efficient.
To solve the word dilemma in Chinese IMEs, this paper presents a novel solution for automatic wordhood acquisition that assumes no vocabulary initialization and, for the first time, fully considers that Chinese inputting is a human-computer interaction procedure. In detail, the proposed method uses an algorithm to trace word likelihood (rather than words themselves), which measures how much a consecutive character sequence looks like a 'word' according to the user's inputting pattern. The algorithm initially has only a pinyin-to-character conversion table and an empty vocabulary. Each time, the algorithm receives a pinyin sequence and predicts the corresponding Chinese sequence. By comparing the user's true choice with the algorithm's prediction, the vocabulary is updated and the newly introduced word likelihood is adjusted to follow changes in the user's inputting.

The paper is organized as follows. Section 2 reviews related work. Section 3 introduces the framework of our model, discusses the pinyin-to-character conversion method and proposes an algorithm for adaptive word likelihood adjustment. Section 4 presents the experimental settings and results.

[...] problem, but they still require an initial vocabulary or a segmented training corpus.

In addition to the above studies on vocabulary organization and language model adaptation for IMEs, most current commercial IMEs actually use two engineering solutions to follow changes in user vocabulary. The first is simply to save and repeat what the user has frequently inputted and confirmed in recent times. However, recent high-frequency inputs are not permanently added to the IME vocabulary. The second is to push new words or a new language model on a schedule. The new words are collected and mined from the inputs of all online IME users. However, these updates are not real-time, and they rely on a distributed user group rather than adapting to each individual user.

This work differs from all existing scientific and engineering solutions in the following aspects. First, we focus on adaptive word acquisition for IMEs rather than on any standard machine learning task, such as the recent work on Chinese word segmentation, because the IME word definition is quite different from the definitions used in all these tasks (Chang [...]
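The online procedure outlined in the introduction (a conversion table, an initially empty vocabulary, and an adjustment after each user choice) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's algorithm: the excerpt does not give the definition of word likelihood or its update rule, so the additive reward and penalty, the class name LooseVocabulary, and the two-entry P2C_TABLE are all hypothetical.

```python
from collections import defaultdict

# Hypothetical pinyin-to-character table; a real table maps each syllable
# to many more characters.
P2C_TABLE = {"bei": ["北", "背"], "jing": ["京", "景"]}


class LooseVocabulary:
    """Toy online vocabulary that starts empty and follows user choices."""

    def __init__(self, reward=1.0, penalty=0.5):
        # pinyin string -> {word: score}; the score is an assumed stand-in
        # for the paper's word likelihood.
        self.likelihood = defaultdict(dict)
        self.reward = reward
        self.penalty = penalty

    def predict(self, syllables):
        known = self.likelihood.get(" ".join(syllables))
        if known:
            return max(known, key=known.get)
        # No word learned yet: fall back to the first character per syllable.
        return "".join(P2C_TABLE[s][0] for s in syllables)

    def update(self, syllables, chosen):
        """Compare prediction with the user's choice, then adjust scores."""
        predicted = self.predict(syllables)
        scores = self.likelihood[" ".join(syllables)]
        scores[chosen] = scores.get(chosen, 0.0) + self.reward
        if predicted != chosen and predicted in scores:
            scores[predicted] = max(0.0, scores[predicted] - self.penalty)
        return predicted == chosen


vocab = LooseVocabulary()
history = ["背景", "背景", "北京", "北京", "北京"]  # the user's true choices
hits = [vocab.update(["bei", "jing"], word) for word in history]
print(hits)  # [False, True, False, False, True]
```

In this toy run the prediction is wrong on the first input, right on the second, and then takes two further inputs to follow the user's switch from 背景 to 北京. How quickly a real system should follow such a switch depends on the likelihood definition, which this sketch does not attempt to reproduce.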