Open Vocabulary Learning for Neural Chinese Pinyin IME

Zhuosheng Zhang1,2, Yafang Huang1,2, Hai Zhao1,2,∗
1Department of Computer Science and Engineering, Shanghai Jiao Tong University
2Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China
fzhangzs, [email protected], [email protected]

∗ Corresponding author. This paper was partially supported by National Key Research and Development Program of China (No. 2017YFB0304100) and Key Projects of National Natural Science Foundation of China (U1836222 and 61733011).

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1584–1594, Florence, Italy, July 28 - August 2, 2019. © 2019 Association for Computational Linguistics

Abstract

Pinyin-to-character (P2C) conversion is the core component of a pinyin-based Chinese input method engine (IME). However, the conversion is seriously compromised by the ambiguity of the Chinese characters corresponding to each pinyin, as well as by predefined fixed vocabularies. To alleviate these problems, we propose a neural P2C conversion model augmented by an online updated vocabulary with a sampling mechanism to support open vocabulary learning while the IME is in use. Our experiments show that the proposed method outperforms commercial IMEs and state-of-the-art traditional models on a standard corpus and on true inputting history datasets in terms of multiple metrics, and thus the online updated vocabulary indeed helps our IME effectively follow user inputting behavior.

1 Introduction

Chinese uses up to 20,000 different characters, so it is non-trivial to type Chinese characters directly from a Latin-style keyboard, which has only 26 keys (Zhang et al., 2018a). Pinyin, the official romanization of Chinese, provides a solution: it maps each Chinese character to a string of Latin letters, so that each character has a written form in letters and users can type pinyin to input Chinese characters into a computer. Therefore, converting pinyin to Chinese characters is the most basic module of all pinyin-based IMEs.

As each Chinese character may be mapped to a pinyin syllable, it is natural to regard Pinyin-to-Character (P2C) conversion as a machine translation between two different languages, pinyin sequences and Chinese character sequences (namely, Chinese sentences). In fact, translation in the P2C procedure is even more straightforward, because the target Chinese character sequence keeps the same order as the source pinyin sequence, which means that we can decode the target sentence from left to right without any reordering.

Meanwhile, there is a well-known challenge in the P2C procedure: too much ambiguity in mapping pinyin syllables to characters. There are only about 500 pinyin syllables corresponding to tens of thousands of Chinese characters, even though the number of the most common characters is more than 6,000 (Jia and Zhao, 2014). As is well known, homophones and polyphones are quite common in the Chinese language. Thus one pinyin may correspond to ten or more Chinese characters on average.

However, a pinyin IME may benefit from decoding longer pinyin sequences for more efficient inputting. As a given pinyin sequence becomes longer, the list of corresponding legal character sequences shrinks significantly. For example, if the IME is aware that the pinyin sequence bei jing can only be converted to either 背景 (background) or 北京 (Beijing), this greatly helps it make the right, and more efficient, P2C decoding, as pinyin bei and jing are each mapped to dozens of different single Chinese characters. Table 1 illustrates how the size of the list of Chinese character sequences converted from the pinyin sequence bei jing huan ying ni (北京欢迎你, Welcome to Beijing) changes with the length of the source pinyin sequences.

Table 1: The shorter the pinyin sequence is, the more character sequences will be mapped.
[Table body garbled in extraction. It lists candidate conversions of bei jing huan ying ni when each pinyin sequence consists of 1, 2, or 5 syllables. Recoverable entries: 北京 and 背景 for bei jing; 北京欢迎你 for the full five-syllable sequence.]

To reduce P2C ambiguity by decoding longer input pinyin sequences, Chinese IMEs often utilize word-based language models, since character-based language models always suffer from the mapping ambiguity. However, the effect of this approach in P2C is undermined by quite restricted vocabularies. The efficiency of IME conversion depends on the sufficiency of the vocabulary, and previous work on machine translation has shown that a large enough vocabulary is necessary to achieve good accuracy (Jean et al., 2015). In addition, some sampling techniques for vocabulary selection have been proposed to balance the computational cost of conversion (Zhou et al., 2016; Wu et al., 2018). While an IME is in use, a user's inputting style may change from time to time, and different users may input quite diverse content, so a predefined fixed vocabulary can never be sufficient. As a convenient solution, most commercial IMEs have to manually update their vocabulary on schedule. Moreover, training a word-based language model is especially difficult for rare words, which appear sparsely in the corpus but generally take up a large share of the dictionary.

To handle the open vocabulary learning problem in IMEs, in this work we introduce an online sequence-to-sequence (seq2seq) model for P2C and design a sampling mechanism utilizing our online updated vocabulary to enhance the conversion accuracy of IMEs as well as speed up the decoding procedure. In detail, first, a character-enhanced word embedding (CWE) mechanism is proposed to represent words, so that the proposed model can let the IME generally work at the word level and pick a very small target vocabulary for each sentence. Second, every time the user makes a selection that contradicts the prediction given by the P2C conversion module, the module updates the vocabulary accordingly. Our evaluation is performed on three diverse corpora, including two from real user inputting history, to verify the effectiveness of the proposed method in different scenarios.

The rest of the paper is organized as follows: Section 2 discusses relevant works. Sections 3 and 4 introduce the proposed model. Experimental results and the model analysis are in Sections 5 and 6, respectively. Section 7 concludes this paper.

2 Related Work

To effectively utilize words for IMEs, many natural language processing (NLP) techniques have been applied. Chen (2003) introduced a joint maximum n-gram model with syllabification for grapheme-to-phoneme conversion. Chen and Lee (2000) used a trigram language model and incorporated word segmentation to convert a pinyin sequence to a Chinese word sequence. Xiao et al. (2008) proposed an iterative algorithm to discover unseen words in a corpus for building a Chinese language model. Mori et al. (2006) described a method for enlarging the vocabulary that can capture context information.

For either pinyin-to-character conversion for Chinese IMEs or kana-to-kanji conversion for Japanese IMEs, a few language model training methods have been developed. Mori et al. (1998) proposed a probabilistic language model for IME. Jiampojamarn et al. (2008) presented online discriminative training. Lin and Zhang (2008) proposed a statistical model using the frequent nearby set of the target word. Chen et al. (2012) used collocations and k-means clustering to improve the n-pos model for Japanese IME. Jiang et al. (2007) put forward a PTC framework based on support vector machines. Hatori and Suzuki (2011) and Yang et al. (2012) applied statistical machine translation (SMT) to Japanese pronunciation prediction and Chinese P2C tasks, respectively. Chen et al. (2015) and Huang et al. (2018) regarded P2C as a translation between two languages and solved it in a neural machine translation framework.

All the above-mentioned work, however, still relies on a predefined fixed vocabulary, and IME users have no chance to refine their own dictionary in a user-friendly way. Zhang et al. (2017) is the work most closely related to ours; it also offers an online mechanism to adaptively update the user vocabulary. The key difference between their work and ours lies in that this work presents the first neural solution with online vocabulary adaptation, while Zhang et al. (2017) stick to a traditional model for IME.

Recently, neural networks have been adopted for a wide range of tasks (Li et al., 2019; Xiao et al., 2019; Zhou and Zhao, 2019; Li et al., 2018a,b). The effectiveness of neural models depends on the size of the vocabulary on the target side, and previous work has shown that vocabularies of well over 50K word types are necessary to achieve good accuracy (Jean et al., 2015; Zhou et al., 2016). Neural machine translation (NMT) systems compute the probability of the next target word given both the previously generated target words and the source sentence.

[...] the meantime, high-frequency word embeddings are attached to character embedding via average pooling, while low-frequency words are computed from character embedding. Our embeddings also contain different granularity levels of embedding, but the word vocabulary is capable of being updated in accordance with users' inputting choices while the IME is in use. In contrast, Cai et al. (2017) build embeddings based on the word frequency from a fixed corpus.

3 Our Models

For convenient reference, hereafter a character in pinyin language also refers to an independent pinyin syllable in the case without causing confusion [...]
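
As a rough illustration of two ideas from the introduction (longer pinyin units admit fewer candidate conversions, and the vocabulary grows when a user's selection contradicts the prediction), consider the following minimal Python sketch. The toy lexicon, the function names, and the append-on-mismatch update rule are all assumptions made for illustration; they are not the paper's neural model or its actual update procedure.

```python
from itertools import product

# Hypothetical toy lexicon: pinyin syllable -> a few candidate characters.
CHAR_LEXICON = {
    "bei": ["北", "背", "被", "杯"],
    "jing": ["京", "景", "井", "静"],
}

# Hypothetical toy word vocabulary: syllable tuple -> candidate words.
word_vocab = {
    ("bei", "jing"): ["北京", "背景"],
}


def char_level_candidates(syllables):
    """Enumerate every character sequence when each syllable is decoded alone."""
    pools = [CHAR_LEXICON[s] for s in syllables]
    return ["".join(chars) for chars in product(*pools)]


def word_level_candidates(syllables, vocab):
    """Look up the whole syllable sequence as one word in the vocabulary."""
    return list(vocab.get(tuple(syllables), []))


def update_vocab(vocab, syllables, predicted, selected):
    """Add the user's selection if it contradicts the prediction (assumed rule).

    Returns True when a new entry was added to the vocabulary.
    """
    if selected == predicted:
        return False
    entries = vocab.setdefault(tuple(syllables), [])
    if selected in entries:
        return False
    entries.append(selected)
    return True


# Decoding syllable by syllable yields 4 * 4 combinations in this toy lexicon,
# while treating "bei jing" as one word leaves only two candidates.
print(len(char_level_candidates(["bei", "jing"])))          # 16
print(len(word_level_candidates(["bei", "jing"], word_vocab)))  # 2
```

Calling `update_vocab(word_vocab, ["bei", "jing"], "北京", "北境")` would then add the unseen word to the vocabulary, so that it becomes a word-level candidate the next time the same pinyin is typed. A plain dictionary is used here only to keep the sketch short; a real IME would also need frequency or recency information to rank the entries.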