Title Studies on Thai Language Processing with Emphasis
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
IJCNLP 2011 Proceedings of the Workshop on Advances in Text Input Methods (WTIM 2011)
IJCNLP 2011 Proceedings of the Workshop on Advances in Text Input Methods (WTIM 2011) November 13, 2011 Shangri-La Hotel Chiang Mai, Thailand IJCNLP 2011 Proceedings of the Workshop on Advances in Text Input Methods (WTIM 2011) November 13, 2011 Chiang Mai, Thailand We wish to thank our sponsors Gold Sponsors www.google.com www.baidu.com The Office of Naval Research (ONR) Department of Systems Engineering and The Asian Office of Aerospace Research and Devel- Engineering Managment, The Chinese Uni- opment (AOARD) versity of Hong Kong Silver Sponsors Microsoft Corporation Bronze Sponsors Chinese and Oriental Languages Information Processing Society (COLIPS) Supporter Thailand Convention and Exhibition Bureau (TCEB) We wish to thank our sponsors Organizers Asian Federation of Natural Language National Electronics and Computer Technolo- Processing (AFNLP) gy Center (NECTEC), Thailand Sirindhorn International Institute of Technology Rajamangala University of Technology Lanna (SIIT), Thailand (RMUTL), Thailand Chiang Mai University (CMU), Thailand Maejo University, Thailand c 2011 Asian Federation of Natural Language Proceesing vii Preface Welcome to the IJCNLP Workshop on Advances in Text Input Methods (WTIM 2011)! Methods of text input have entered a new era. The number of people who have access to computers and mobile devices is skyrocketing in regions where people do not have a convenient method of inputting their native language. It has also become commonplace to input text not through a keyboard but through different modes such as voice and handwriting recognition. Even when people input text using a keyboard, it is done differently from only a few years ago – adaptive software keyboards, word auto- completion and prediction, and spell correction are just a few examples of such recent changes in text input experience. -
AIX Globalization
AIX Version 7.1 AIX globalization IBM Note Before using this information and the product it supports, read the information in “Notices” on page 233 . This edition applies to AIX Version 7.1 and to all subsequent releases and modifications until otherwise indicated in new editions. © Copyright International Business Machines Corporation 2010, 2018. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents About this document............................................................................................vii Highlighting.................................................................................................................................................vii Case-sensitivity in AIX................................................................................................................................vii ISO 9000.....................................................................................................................................................vii AIX globalization...................................................................................................1 What's new...................................................................................................................................................1 Separation of messages from programs..................................................................................................... 1 Conversion between code sets............................................................................................................. -
Computers and the Thai Language
[3B2-6] man2009010046.3d 12/2/09 13:47 Page 46 Computers and the Thai Language Hugh Thaweesak Koanantakool National Science and Technology Development Agency Theppitak Karoonboonyanan Thai Linux Working Group Chai Wutiwiwatchai National Electronics and Computer Technology Center This article explains the history of Thai language development for computers, examining such factors as the language, script, and writing system, among others. The article also analyzes characteristics of Thai characters and I/O methods, and addresses key issues involved in Thai text processing. Finally, the article reports on language processing research and provides detailed information on Thai language resources. Thai is the official language of Thailand. Certain vowels, all tone marks, and diacritics The Thai script system has been used are written above and below the main for Thai, Pali, and Sanskrit languages in Bud- character. dhist texts all over the country. Standard Pronunciation of Thai words does not Thai is used in all schools in Thailand, and change with their usage, as each word has a most dialects of Thai use the same script. fixed tone. Changing the tone of a syllable Thai is the language of 65 million people, may lead to a totally different meaning. and has a number of regional dialects, such Thai verbs do not change their forms as as Northeastern Thai (or Isan; 15 million with tense, gender, and singular or plural people), Northern Thai (or Kam Meuang or form,asisthecaseinEuropeanlanguages.In- Lanna; 6 million people), Southern Thai stead, there are other additional words to help (5 million people), Khorat Thai (400,000 with the meaning for tense, gender, and sin- people), and many more variations (http:// gular or plural. -
GNU Texmacs User Manual Joris Van Der Hoeven
GNU TeXmacs User Manual Joris van der Hoeven To cite this version: Joris van der Hoeven. GNU TeXmacs User Manual. 2013. hal-00785535 HAL Id: hal-00785535 https://hal.archives-ouvertes.fr/hal-00785535 Preprint submitted on 6 Feb 2013 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. GNU TEXMACS user manual Joris van der Hoeven & others Table of contents 1. Getting started ...................................... 11 1.1. Conventionsforthismanual . .......... 11 Menuentries ..................................... 11 Keyboardmodifiers ................................. 11 Keyboardshortcuts ................................ 11 Specialkeys ..................................... 11 1.2. Configuring TEXMACS ..................................... 12 1.3. Creating, saving and loading documents . ............ 12 1.4. Printingdocuments .............................. ........ 13 2. Writing simple documents ............................. 15 2.1. Generalities for typing text . ........... 15 2.2. Typingstructuredtext ........................... ......... 15 2.3. Content-basedtags -
Proceedings of the Second Workshop on Advances in Text Input Methods (WTIM 2)
COLING 2012 24th International Conference on Computational Linguistics Proceedings of the Second Workshop on Advances in Text Input Methods (WTIM 2) Workshop chairs: Kalika Bali, Monojit Choudhury and Yoh Okuno 15 December 2012 Mumbai, India Diamond sponsors Tata Consultancy Services Linguistic Data Consortium for Indian Languages (LDC-IL) Gold Sponsors Microsoft Research Beijing Baidu Netcon Science Technology Co. Ltd. Silver sponsors IBM, India Private Limited Crimson Interactive Pvt. Ltd. Yahoo Easy Transcription & Software Pvt. Ltd. Proceedings of the Second Workshop on Advances in Text Input Methods (WTIM 2) Kalika Bali, Monojit Choudhury and Yoh Okuno (eds.) Revised preprint edition, 2012 Published by The COLING 2012 Organizing Committee Indian Institute of Technology Bombay, Powai, Mumbai-400076 India Phone: 91-22-25764729 Fax: 91-22-2572 0022 Email: [email protected] This volume c 2012 The COLING 2012 Organizing Committee. Licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Nonported license. http://creativecommons.org/licenses/by-nc-sa/3.0/ Some rights reserved. Contributed content copyright the contributing authors. Used with permission. Also available online in the ACL Anthology at http://aclweb.org ii Preface It is our great pleasure to present the proceedings of the Second Workshop on Advances in Text Input Methods (WTIM-2) held in conjunction with Coling 2012, on 15th December 2012, in Mumbai, India. This workshop is a sequel to the first WTIM which was held in conjunction with IJCNLP 2011 in November 2011, Chiang Mai, Thailand. The aim of the current workshop remains the same as the previous one that is to bring together the researchers and developers of text input technologies around the world, and share their innovations, research findings and issues across different applications, devices, modes and languages. -
Japanese Input Method Independent of Applications
JAPANESE INPUT METHOD INDEPENDENT OF APPLICATIONS By Takahide Honma, Hiroyoshi Baba, and Kuniaki Takizawa ABSTRACT The Japanese input method is a complex procedure involving preediting operations. An application that accepts Japanese from an input device must have three systems for the input method: a keybinding system, a manipulator for preediting, and a kana-to-kanji conversion system. Various keybinding systems and manipulators accelerate input operations. Our implementation separates an application from the Japanese input method in three layers. An application can use a front-end input processor to perform all operations including I/O. An application can use the henkan (conversion) module and implement I/O operation itself. An application can execute all operations except keybinding, which is handled by an input method library. INTRODUCTION In this paper, we first present an overview of the technical environment of the Japanese input method implementation. Based on this overview, we briefly describe the critical engineering issues for conversion of Digital's products for the Japanese user. Our most critical engineering issue was the reduction of similar (but slightly different) work to localize products. Another issue was to satisfy customers' requests for the ability to use the many input styles familiar to them. We describe our approach to the development of a Japanese input method that overcomes these issues by separating the input method from an application in three layers. OVERVIEW OF THE JAPANESE INPUT METHOD In this section, we describe Japanese input and string manipulation from the perspective of both the user and the application. Based on these descriptions, we present a brief overview of reengineering a product for Japanese users and a summary of the industry's complex techniques developed for Japanese input methods. -
On the History of Thai Scripts Hans Penth
Dr. Hans Penth giving his address to the audience or the Oriental Dr. Navavan Bandhumedha talking about the different Tai Hotel on 10 July 1986. Languages at the Oriental Hotel on 10 July 1986. On the History of Thai scripts Hans Penth (From lecture delivered during a meeting of groups were living scattered over a vast area, that the Siam Society, held at the Oriental Hotel, on borrowing of local Man letters by local Thai groups Thursday, 10 July 1986) had the effect to produce, right from the outset, a number of slightly different local proto-Thai scripts Dr. Pe nth pointed out in his introductory which were spread out over a large geographical remarks that the history of the alphabets used by area. the Thais throughout Southeast Asia was still insufficiently known but there were many theories. (3) In the decades around 1300 A.D., efforts He hoped he would be excused if he too come up to reform the old proto-Thai script were made in with certain ideas concerning the origin and both Sukhothai and Chiang Mai. These efforts mark development of the scripts used by tlie Thais, from the turning point from proto-Thai script to modern Assam in the west to Tongking in the east, and from Thai script. Yunnan in the north to Thailand in the south, and (4) Also during the decades before or around referred listeners for further information to his article 1300 A.D., the Thais of Sukhothai and the Thais included in H.R.H. Galyani Vadhana's book on of Chiang Mai did something else that was identical: Yunnan. -
A Flick-Based Japanese Tablet Keyboard Using Direct Kanji Input
A Flick-based Japanese Tablet Keyboard using Direct Kanji Input Yuya Nakamura1 and Hiroshi Hosobe2 1Graduate School of Computer and Information Sciences, Hosei University, Tokyo, Japan 2Faculty of Computer and Information Sciences, Hosei University, Tokyo, Japan Keywords: Text Entry Touch Screen, Touch Typing, Software Keyboard. Abstract: Tablets, as well as smartphones and personal computers, are popular as Internet clients. A split keyboard is a software keyboard suitable for tablets with large screens. However, unlike other methods, the split keyboard has space in the center of the screen, which makes the part of the screen for displaying suggestions small and inconvenient. This paper proposes a Japanese input software keyboard that enables direct kanji input on a split flick keyboard. Once the user has mastered this keyboard, it allows the user to efficiently input Japanese text while holding a tablet with both hands. The paper presents an implementation of the keyboard on Android and reports the result of an experiment on its performance compared with existing methods. In addition, since direct kanji input generally takes time for users to learn, one of the authors by himself has conducted a long- term experiment to confirm the possibility of its mastery. For 12 months, both the input speed and the error rate have gradually improved. 1 INTRODUCTION makes the part of the screen for displaying conversion suggestions small and inconvenient. Japanese text is composed of Chinese-originated In Japan, there are about as many users of “flick” “kanji” characters and Japanese “kana” characters keyboards as QWERTY keyboard users. Flick is a (Tamaoka, 2014). While a kanji character typi- gesture operation especially used for character input cally has a meaning, a kana character does not; in- on touch-screen devices such as tablets and smart- stead, a kana character is associated with a speech phones. -
(RSEP) Request October 16, 2017 Registry Operator INFIBEAM INCORPORATION LIMITED 9Th Floor
Registry Services Evaluation Policy (RSEP) Request October 16, 2017 Registry Operator INFIBEAM INCORPORATION LIMITED 9th Floor, A-Wing Gopal Palace, NehruNagar Ahmedabad, Gujarat 380015 Request Details Case Number: 00874461 This service request should be used to submit a Registry Services Evaluation Policy (RSEP) request. An RSEP is required to add, modify or remove Registry Services for a TLD. More information about the process is available at https://www.icann.org/resources/pages/rsep-2014- 02-19-en Complete the information requested below. All answers marked with a red asterisk are required. Click the Save button to save your work and click the Submit button to submit to ICANN. PROPOSED SERVICE 1. Name of Proposed Service Removal of IDN Languages for .OOO 2. Technical description of Proposed Service. If additional information needs to be considered, attach one PDF file Infibeam Incorporation Limited (“infibeam”) the Registry Operator for the .OOO TLD, intends to change its Registry Service Provider for the .OOO TLD to CentralNic Limited. Accordingly, Infibeam seeks to remove the following IDN languages from Exhibit A of the .OOO New gTLD Registry Agreement: - Armenian script - Avestan script - Azerbaijani language - Balinese script - Bamum script - Batak script - Belarusian language - Bengali script - Bopomofo script - Brahmi script - Buginese script - Buhid script - Bulgarian language - Canadian Aboriginal script - Carian script - Cham script - Cherokee script - Coptic script - Croatian language - Cuneiform script - Devanagari script -
Niboshi for Slate Devices: a Japanese Input Method Using Multi-Touch for Slate Devices
Niboshi for Slate Devices: A Japanese Input Method Using Multi-touch for Slate Devices Gimpei Kimioka, Buntarou Shizuki, and Jiro Tanaka Department of Computer Science, University of Tsukuba, SB1024 University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, Ibaraki, Japan {gimpei,shizuki,jiro}@iplab.cs.tsukuba.ac.jp Abstract. We present Niboshi for slate devices, an input system that utilizes a multi-touch interface. Users hold the device with both hands and use both thumbs to input a character in this system. Niboshi for slate devices has four features that improve the performance of inputting text to slate devices: it has a multi-touch input, enables the device to be firmly held with both hands while text is input, can be used without visual confirmation of the input buttons, and has a large text display area with a small interface. The Niboshi system will en- able users to type faster and requires less user attention to typing than existing methods. 1 Introduction Slate devices, such as the iPad or the Galaxy Tab, are mobile devices that have an ap- proximately 7 by 10 inch touchscreen as the main interface. They have more possible applications and bigger screens than smartphones or PDAs, so they are becoming in- creasingly widely used. Users tend to input text more on these devices than they do on smartphones or PDAs. Onscreen QWERTY keyboard (QWERTY) is an existing method commonly used to input Japanese on slate devices. Users input Japanese text by using the alphabet to spell out Hiragana characters phonetically, just like with a hardware-based QWERTY keyboard. -
3. Pronunciation & Phonetic Letters Guide
หน้า ๒๙ : Nâa 29 PRONUNCIATION GUIDE It is important to read this section thoroughly if your aim is to learn to speak Thai using phonetic letters or transliteration. Skip this only if you can read Thai script. What are Phonetic Letters? There are many different practices for learning to speak a language without learning the scripts of the language. Romanisation is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanisation include transliteration, for representing written text, and transcription, for representing the spoken word, and combinations of both. Phonetic transcription (also known as Phonetic Script or Phonetic Notation) is the visual representation of speech sounds (or phones). The most common type of phonetic transcription use a phonetic alphabet such as the International Phonetic Alphabet (IPA). Transliteration is the practice of converting a text from one script into another by writing or printing (a letter or word) using the closest corresponding letters of a different alphabet or language. Royal Thai Phonetic Alphabet (RPA) is a system used as a standard for transcribing Thai words, names of a person, place, thing, into English. Romanisation In Thai is called ทับศัพท์ : Túb~Sùb or ทับเสียง : Túb~Sĕarng and colloquially called ภาษาคาราโอเกะ : Paa-săa Kaa-raa-o-ge’ = Karaoke language. (ทับ : Túb = place on top of, ศัพท์ : Sùb = vocabulary, เสียง : Sĕang = sound, คาราโอเกะ : Kaa- raa-o-ge’ = karaoke) At Thai Style, we have developed our own system using the English alphabet as phonetic letters. This is because we would like to compare the sounds in English and Thai by using the English alphabet, like if you are a native English speaker learning French which uses the same alphabet but some letters are pronounced with completely different sounds. -
Ms Word Keyboard Shortcuts Pdf
Ms word keyboard shortcuts pdf Continue Ctrl+0 → Toggles 6pts spacing before a paragraph. Ctrl+A → Select all page content. Ctrl+B → Selection highlighted in bold. Ctrl+C → Copy Selected Text. Ctrl+D → Open font preferences window. Ctrl+E → Aligns the selected line or text to the center of the screen. Ctrl+F → Open Search box. Ctrl+I → selection highlighted in italics. Ctrl+J → Aligns the selected text or line to justify the screen. Ctrl+K → Insert a hyperlink. Ctrl+L → Aligns the selected line or text to the left of the screen. Ctrl+M → Paragraph Indent. Ctrl+N → Opens a new, blank document window. Ctrl+O → Opens the dialog box or page to select a file to open. Ctrl+P → Open the print window. Ctrl+R → Aligns the selected line or text to the right of the screen. Ctrl+S → Save Open Document. Like Shift+F12. Alt, F, A → Save the document with a different file name. Ctrl+T - Create a hanging indentation. Ctrl+U → Underline selected text. Ctrl+V → Paste. Ctrl+W → Close currently open document. Ctrl+X → Cut selected text. Ctrl+Y → Redo last action performed. Ctrl+Z → Undo Last Action. Ctrl+Shift+L → quickly create a bullet point. Ctrl+Shift+F → Change Font. Ctrl+Shift+> → Increase selected font +1pts up to 12pt and then increase the font +2pts. Ctrl+] → Increase selected font +1pts. Ctrl+Shift+< → Decrease selected font -1pts if 12pt or lower; if it is above 12, decrease the source by +2pt. Ctrl+[ → Decrease the selected font -1pts. Ctrl+/+c → Insert a penny sign (or).