Deep Internationalization for Gboard, the Google Keyboard
Total Page:16
File Type:pdf, Size:1020Kb

Recommended publications
Cross-Language Framework for Word Recognition and Spotting of Indic Scripts
Cross-language Framework for Word Recognition and Spotting of Indic Scripts. Ayan Kumar Bhunia (a), Partha Pratim Roy (b,*), Akash Mohta (a), Umapada Pal (c). (a) Dept. of ECE, Institute of Engineering & Management, Kolkata, India; (b) Dept. of CSE, Indian Institute of Technology Roorkee, India; (c) CVPR Unit, Indian Statistical Institute, Kolkata, India. (b) email: [email protected], TEL: +91-1332-284816. Abstract: Handwritten word recognition and spotting for low-resource scripts are difficult because sufficient training data is not available and collecting data for such scripts is often expensive. This paper presents a novel cross-language platform for handwritten word recognition and spotting for such low-resource scripts, where training is performed with a sufficiently large dataset of an available script (the source script) and testing is done on other scripts (the target scripts). Training on one source script and testing on another with reasonable results is not easy in the handwriting domain because of the complex variability of handwriting among scripts. It is also difficult to map between source and target characters when they appear in cursive word images. The proposed Indic cross-language framework exploits a large dataset for training and uses it to recognize and spot text in other target scripts for which sufficient training data is not available. Since Indic scripts are mostly written in three zones, namely upper, middle and lower, we employ zone-wise character (or component) mapping for efficient learning. The performance of our cross-language framework depends on the extent of similarity between the source and target scripts.
A Universal Amazigh Keyboard for Latin Script and Tifinagh
LES RESSOURCES LANGAGIERES : CONSTRUCTION ET EXPLOITATION. A universal Amazigh keyboard for Latin script and Tifinagh. Paul Anderson [email protected]. 1. Introduction: Systems of Amazigh text encoding and corresponding keyboard layouts have tended to be narrowly aimed at specific user communities, because of differences in phonology and orthography across Amazigh language variants (footnote 1). Keyboard layouts for language variants have therefore lacked orthographic features found in other regions. This restricted focus impedes users' experimentation with the writing of other Amazigh regional variants and converged literary forms where they differ in orthographic features or in script. So far there has been no way to type more than a handful of Amazigh variants intuitively on any one layout even within one script. This fragmented development has meant that keyboard driver implementations have often lagged behind advances in technology, and have usually failed to take into account general keyboard layout design, ergonomics and typing speed, and solutions from other Amazigh regions or non-Amazigh languages. Some users even preferred to improvise key definitions based on their own understanding, which often resulted in mistaken use of lookalike letters and diacritics. Keyboard layouts have also failed to provide for Amazigh minority populations around the world, and have considered the multilingual context of Amazigh language use only locally. Several scripts are commonly used to write Amazigh variants, and even within a script there are different orthographies in use. Some orthographies are formal standards. In others, some features are obsolete but still in use, some features are still disputed, and some features are regional usages or personal initiatives, or are required only for writing more phonetically. Footnote 1: I use the term 'language variant' since distinguishing 'dialect' and 'language' is not necessary here.
Getting Started with Libreoffice 3.4 Copyright
Getting Started with LibreOffice 3.4. Copyright: This document is Copyright © 2010–2012 by its contributors as listed below. You may distribute it and/or modify it under the terms of either the GNU General Public License (http://www.gnu.org/licenses/gpl.html), version 3 or later, or the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), version 3.0 or later. Contributors: Jean Hollis Weber, Jeremy Cartwright, Ron Faile Jr., Martin Fox, Dan Lewis, David Michel, Andrew Pitonyak, Hazel Russman, Peter Schofield, John A Smith, Laurent Balland-Poirier. Cover art: Drew Jensen, Christoph Noack, Klaus-Jürgen Weghorn, Jean Hollis Weber. Acknowledgements: This book is adapted and updated from Getting Started with OpenOffice.org 3.3. The contributors to that book are listed on page 13. Feedback: Please direct any comments or suggestions about this document to: [email protected]. Publication date and software version: Published 10 September 2012. Based on LibreOffice 3.5.6. Documentation for LibreOffice is available at http://www.libreoffice.org/get-help/documentation. Contents: Copyright; Note for Mac users; Preface; Who is this book for?
Background Information History, Licensing, and File Formats Copyright This Document Is Copyright © 2008 by Its Contributors As Listed in the Section Titled Authors
Getting Started Guide, Appendix B: Background Information. History, licensing, and file formats. Copyright: This document is Copyright © 2008 by its contributors as listed in the section titled Authors. You may distribute it and/or modify it under the terms of either the GNU General Public License, version 3 or later, or the Creative Commons Attribution License, version 3.0 or later. All trademarks within this guide belong to their legitimate owners. Authors: Jean Hollis Weber. Feedback: Please direct any comments or suggestions about this document to: [email protected]. Acknowledgments: This Appendix includes material written by Richard Barnes and others for Chapter 1 of Getting Started with OpenOffice.org 2.x. Publication date and software version: Published 13 October 2008. Based on OpenOffice.org 3.0. You can download an editable version of this document from http://oooauthors.org/en/authors/userguide3/published/. Contents: Introduction; A short history of OpenOffice.org; The OpenOffice.org community; How is OpenOffice.org licensed?; What is “open source”?; What is OpenDocument?; File formats OOo can open.
SC22/WG20 N896 L2/01-476 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale De Normalisation
SC22/WG20 N896, L2/01-476. Universal multiple-octet coded character set. International Organization for Standardization / Organisation internationale de normalisation. Title: Ordering rules for Khmer. Source: Kent Karlsson. Date: 2001-12-19. Status: Expert contribution. Document type: Working group document. Action: For consideration by the UTC and JTC 1/SC 22/WG 20. 1 Introduction: The Khmer script in Unicode/10646 uses conjoining characters, just like the Indic scripts and Hangul. An alternative that is suitable (and much more elegant) for Khmer, but not for Indic scripts, would have been to have combining-below (and sometimes a bit to the side) consonants and combining-below independent vowels, similar to the combining-above Latin letters recently encoded, and to how Tibetan is handled. However, that is not the chosen solution, which instead uses a combining character (COENG) that makes characters conjoin (glue), as is done for Indic (Brahmic) scripts and for Hangul Jamo, though the latter does not have a separate gluer character. In the Khmer script, using the COENG-based approach, words are formed from orthographic syllables, where an orthographic syllable has the following structure [add ligation control?]: Khmer-syllable ::= (K H)* K M*, where K is a Khmer consonant (most with an inherent vowel that is pronounced only if there is no consonant, independent vowel, or dependent vowel following it in the orthographic syllable) or a Khmer independent vowel, H is the invisible Khmer conjoint former COENG, and M is a combining character (including COENG, though that would be a misspelling), in particular a combining Khmer vowel (noted A below) or modifier sign.
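The syllable structure (K H)* K M* quoted above maps naturally onto a regular expression. The following is a minimal sketch in Python, not code from the working group document. The code point ranges chosen for K and M are assumptions made for illustration (they are simplified from the Unicode Khmer block), the helper names `split_syllables` and `is_well_formed` are hypothetical, and M deliberately excludes COENG, since the excerpt notes that a COENG in that position would be a misspelling.

```python
import re

# Assumed, simplified character classes (not taken from the source document):
COENG = "\u17D2"                    # H: the invisible conjoint former
K = "\u1780-\u17B3"                 # K: consonants and independent vowels
M = "\u17B6-\u17D1\u17D3\u17DD"     # M: dependent vowels and signs, COENG excluded

# Khmer-syllable ::= (K H)* K M*
SYLLABLE = re.compile(f"(?:[{K}]{COENG})*[{K}][{M}]*")


def split_syllables(word: str) -> list[str]:
    """Return the orthographic syllables found in `word`, left to right."""
    return SYLLABLE.findall(word)


def is_well_formed(word: str) -> bool:
    """True if `word` is a non-empty sequence of complete syllables."""
    return re.fullmatch(f"(?:{SYLLABLE.pattern})+", word) is not None


# KHA + COENG + MO + vowel AE, followed by RO (the word "Khmer" in Khmer script).
khmer = "\u1781\u17D2\u1798\u17C2\u179A"
print(split_syllables(khmer))
print(is_well_formed(khmer))
print(is_well_formed("\u17D2\u1780"))  # a word cannot start with COENG
```

Under these assumptions the sample word splits into two orthographic syllables, and a string that begins or ends with COENG is rejected because COENG is only accepted between two K characters.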
ISO Basic Latin Alphabet
ISO basic Latin alphabet. The ISO basic Latin alphabet is a Latin-script alphabet and consists of two sets of 26 letters, codified in[1] various national and international standards and used widely in international communication. The two sets contain the following 26 letters each:[1][2] Uppercase Latin alphabet: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z. Lowercase Latin alphabet: a b c d e f g h i j k l m n o p q r s t u v w x y z. History: By the 1960s it became apparent to the computer and telecommunications industries in the First World that a non-proprietary method of encoding characters was needed. The International Organization for Standardization (ISO) encapsulated the Latin script in their (ISO/IEC 646) 7-bit character-encoding standard. To achieve widespread acceptance, this encapsulation was based on popular usage. The standard was based on the already published American Standard Code for Information Interchange, better known as ASCII, which included in the character set the 26 × 2 letters of the English alphabet. Later standards issued by the ISO, for example ISO/IEC 8859 (8-bit character encoding) and ISO/IEC 10646 (Unicode Latin), have continued to define the 26 × 2 letters of the English alphabet as the basic Latin script with extensions to handle other letters in other languages.[1] Terminology: Name for Unicode block that contains all letters: The Unicode block that contains the alphabet is called "C0 Controls and Basic Latin".
Bépo presentation brochure, dual-licensed under CC-BY-SA and GFDL, ©2014 Association Ergodis, with the kind collaboration of Ploum
Bépo: the ergonomic, French-language, free (libre) keyboard layout, by the association. More words, fewer aches ("Plus de mots, moins de maux"). Type easily with ten fingers in your own language. http://bepo.fr/ Our community is ready to answer all your questions. Installation: Bépo can be installed on most systems (Windows, OSX, BSD, Android) and is already included in GNU/Linux, Haiku and FirefoxOS. You can also download the "nomad" archive, which lets you use bépo wherever you go without first having to install any software. Nothing is permanent: you can always switch back to your previous layout in one click. Learning: Bépo is designed for ten-finger touch typing, which is easier than you might think and more comfortable. Choose a typing tutor and practise the exercises for 10 to 15 minutes a day. Learning bépo is made easier by the fact that, from the very first lessons, you type real words rather than meaningless strings of letters. In addition, the characters on the AltGr layer are placed mnemonically. Even without practice, you will not lose what you learned on your previous layout: it is like riding a bike, a short period of readjustment and you are off again! Keyboards: A keyboard with special markings is not necessary and is even inadvisable while learning. However, there are stickers to put on your keys that make it possible to adapt an existing keyboard and even
The Yubikey Manual
The YubiKey Manual: Usage, configuration and introduction of basic concepts. Version: 3.4. Date: 27 March, 2015. Disclaimer: The contents of this document are subject to revision without notice due to continued progress in methodology, design, and manufacturing. Yubico shall have no liability for any error or damages of any kind resulting from the use of this document. The Yubico Software referenced in this document is licensed to you under the terms and conditions accompanying the software or as otherwise agreed between you or the company that you are representing. Trademarks: Yubico and YubiKey are trademarks of Yubico AB. Contact Information: Yubico AB, Kungsgatan 37, 8 floor, 111 56 Stockholm, Sweden, [email protected]. © Yubico, 2015. Contents: 1 Document Information: 1.1 Purpose; 1.2 Audience; 1.3 Related documentation; 1.4 Document History; 1.5 Definitions. 2 Introduction and basic concepts: 2.1 Basic concepts and terms; 2.2 Functional blocks; 2.3 Security rationale; 2.4 OATH-HOTP mode; 2.5 Challenge-response mode; 2.6 YubiKey NEO; 2.7 YubiKey versions and parametric data; 2.8 YubiKey Nano. 3 Installing the YubiKey: 3.1 Inserting the YubiKey for the first time (Windows XP); 3.2 Verifying the installation (Windows XP); 3.3 Installing the key under Mac OS X; 3.4 Installing the YubiKey on other platforms; 3.5 Understanding the LED indicator; 3.6 Testing the installation; 3.7 Installation troubleshooting. 4 Using the YubiKey: 4.1 Using multiple configurations (from version 2.0); 4.2 Updating a
How to Enter Foreign Language Characters on Computers
How to Enter Foreign Language Characters on Computers. Introduction: Current word processors and operating systems provide a large number of methods for writing special characters such as accented letters used in foreign languages. Unfortunately, it is not always obvious just how to enter such characters. Moreover, even when one knows a method of typing an accented letter, there may be a much simpler method for doing the same thing. This note may help you find the most convenient method for typing such characters. The choice of method will largely depend on how frequently you have to type in foreign languages. 1 The “ALT Key” Method: This is the most common method of entering special characters. It always works, regardless of what program you are using. On both PCs and Macs, you can write foreign characters in any application by combining the ALT key (the key next to the space bar) with some alphabetic characters (on the Mac) or numbers (on PCs), provided you type numbers on the numeric keypad, rather than using the numbers at the top of the keyboard. To do that, of course, also requires your NumLock key to be turned on, which it normally will be. For example, on the Mac, ALT + n generates “ñ”. On the PC, ALT + (number pad) 164 or ALT + (number pad) 0241 generates “ñ”. A list of three- and four-digit PC codes for some common foreign languages appears at the end of this note. 2 The “Insert Symbol” Method: Most menus in word processors and other applications offer access to a window displaying all the printable characters in a particular character set.
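The two PC codes quoted above for “ñ” (164 and 0241) differ because, on a typical US or Western European Windows setup, a code typed without a leading zero is looked up in the legacy OEM code page, while a code with a leading zero is looked up in the Windows ANSI code page. The sketch below illustrates that lookup in Python. It assumes code pages 437 and 1252, which is only one possible configuration, and the function name `alt_code_char` is hypothetical.

```python
def alt_code_char(code: str) -> str:
    """Return the character a Windows ALT + number-pad code would produce.

    Assumption: OEM code page 437 for codes without a leading zero and
    Windows-1252 for codes with one (a common US/Western setup).
    """
    value = int(code)
    if not 0 <= value <= 255:
        raise ValueError("this sketch only handles single-byte codes")
    codec = "cp1252" if code.startswith("0") else "cp437"
    return bytes([value]).decode(codec)


print(alt_code_char("164"))   # looked up in code page 437
print(alt_code_char("0241"))  # looked up in Windows-1252
```

Both calls yield the same letter, which matches the example in the note. On a machine configured with other code pages the three-digit code may produce a different character.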
Gmail Smart Compose: Real-Time Assisted Writing
Gmail Smart Compose: Real-Time Assisted Writing. Mia Xu Chen∗, Benjamin N Lee∗, Gagan Bansal∗, Yuan Cao, Shuyuan Zhang, Justin Lu, Jackie Tsay, Yinan Wang, Andrew M. Dai, Zhifeng Chen, Timothy Sohn, Yonghui Wu (all Google). arXiv:1906.00080v1 [cs.CL] 17 May 2019. Figure 1: Smart Compose Screenshot. ABSTRACT: In this paper, we present Smart Compose, a novel system for generating interactive, real-time suggestions in Gmail that assists users in writing mails by reducing repetitive typing. In the design and deployment of such a large-scale and complicated system, we faced several challenges including model selection, performance evaluation, serving and other practical issues. At the core of Smart Compose is a large-scale neural language model. We leveraged state-of-the-art machine learning techniques for language model training which enabled high-quality suggestion prediction, and constructed novel serving infrastructure for high-throughput and real-time inference. [...] our proposed system design and deployment approach. This system is currently being served in Gmail. KEYWORDS: Smart Compose, language model, assisted writing, large-scale serving. ACM Reference Format: Mia Xu Chen, Benjamin N Lee, Gagan Bansal, Yuan Cao, Shuyuan Zhang, Justin Lu, Jackie Tsay, Yinan Wang, Andrew M. Dai, Zhifeng Chen, Timothy Sohn, and Yonghui Wu. 2019. Gmail Smart Compose: Real-Time Assisted Writing. In The 25th ACM SIGKDD Conference on Knowledge Discovery and
Finite-State Script Normalization and Processing Utilities: the Nisaba Brahmic Library
Finite-state script normalization and processing utilities: The Nisaba Brahmic library. Cibu Johny†, Lawrence Wolf-Sonkin‡, Alexander Gutkin†, Brian Roark‡. Google Research, †United Kingdom and ‡United States. {cibu,wolfsonkin,agutkin,roark}@google.com. Abstract: This paper presents an open-source library for efficient low-level processing of ten major South Asian Brahmic scripts. The library provides a flexible and extensible framework for supporting crucial operations on Brahmic scripts, such as NFC, visual normalization, reversible transliteration, and validity checks, implemented in Python within a finite-state transducer formalism. We survey some common Brahmic script issues that may adversely affect the performance of downstream NLP tasks, and provide the rationale for finite-state design and system implementation details.
1 Introduction: The Unicode Standard separates the representation of text from its specific graphical rendering: text is encoded as a sequence of characters, which, at presentation time, are then collectively rendered into the appropriate sequence of glyphs for display. [...] In addition to such normalization issues, some scripts also have well-formedness constraints, i.e., not all strings of Unicode characters from a single script correspond to a valid (i.e., legible) grapheme sequence in the script. Such constraints do not apply in the basic Latin alphabet, where any permutation of letters can be rendered as a valid string (e.g., for use as an acronym). The Brahmic family of scripts, however, including the Devanagari script used to write Hindi, Marathi and many other South Asian languages, does have such constraints. These scripts are alphasyllabaries, meaning that they are structured around orthographic syllables (akṣara) as the basic unit. One or more Unicode characters combine when rendering one of thousands of legible akṣara, but many combinations do not correspond to any akṣara. Given a token in these scripts, one may want to (a) normalize it to a canonical form; and (b) check whether it is a well-formed
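To make the normalization goal concrete, here is a minimal sketch of NFC normalization for a Devanagari letter that has two possible encodings. It uses only Python's standard `unicodedata` module and does not use or represent the Nisaba library's API; the helper name `to_nfc` is hypothetical.

```python
import unicodedata

# Two encodings that render as the same Devanagari letter QA:
precomposed = "\u0958"        # DEVANAGARI LETTER QA (single code point)
decomposed = "\u0915\u093C"   # DEVANAGARI LETTER KA + DEVANAGARI SIGN NUKTA


def to_nfc(text: str) -> str:
    """Return `text` in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# The raw strings differ, so a naive comparison or dictionary lookup fails.
print(precomposed == decomposed)

# After NFC both collapse to the same code point sequence. U+0958 is a
# composition exclusion, so NFC keeps the two-code-point KA + NUKTA form.
print(to_nfc(precomposed) == to_nfc(decomposed))
print([f"U+{ord(ch):04X}" for ch in to_nfc(precomposed)])
```

NFC covers only canonical equivalence. The visual normalization and well-formedness checks mentioned in the excerpt go beyond what `unicodedata` provides.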
Smart Reply Feature, 01-Feb-2018
SMART REPLY FEATURE, 01-FEB-2018. Google announced that it is now rolling out the Smart Reply feature to the messaging app Android Messages. The AI-based Smart Reply feature was launched with Google Allo back in September 2016. For now it will be available only to Project Fi users, with no timeline for a wider rollout. Google will require access to your SMS history to help it generate intelligent responses. The announcement was made through a tweet on Project Fi's official Twitter account. Smart Reply, launched with Google Allo, automatically suggests responses to messages that you have received. It provides contextual replies by analysing the most recent message in the thread. It can be turned off in the Settings of Android Messages, under Smart Reply. The feature currently works with Google Allo, Gmail, Google Assistant, and, now, Android Messages, although the last, as mentioned, is only for Project Fi users. Notably, this addition of Smart Reply to Android Messages comes a week after a teardown of Google's Gboard beta APK revealed that Smart Reply's intelligent suggestions are coming to the Gboard app on Android. The keyboard is expected to offer phrase-length suggestions in the topmost row. Thanks to the upcoming integration, the feature will work in a wide variety of apps, negating the need for third-party app developers to add Smart Reply support or similar features to their offerings. Apart from first-party apps like Allo, Android Messages, and Hangouts, the feature was also spotted working on Facebook, Messenger Lite, WhatsApp, Facebook Messenger, and Tencent's platforms.