Preliminary Proposal to Encode Old Uyghur in Unicode
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Section 14.4, Phags-Pa
The Unicode® Standard Version 13.0 – Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trade- mark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries. The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. © 2020 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at http://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see http://www.unicode.org/copyright.html. The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. — Version 13.0. Includes index. ISBN 978-1-936213-26-9 (http://www.unicode.org/versions/Unicode13.0.0/) 1. -
Technical Reference Manual for the Standardization of Geographical Names United Nations Group of Experts on Geographical Names
ST/ESA/STAT/SER.M/87 Department of Economic and Social Affairs Statistics Division Technical reference manual for the standardization of geographical names United Nations Group of Experts on Geographical Names United Nations New York, 2007 The Department of Economic and Social Affairs of the United Nations Secretariat is a vital interface between global policies in the economic, social and environmental spheres and national action. The Department works in three main interlinked areas: (i) it compiles, generates and analyses a wide range of economic, social and environmental data and information on which Member States of the United Nations draw to review common problems and to take stock of policy options; (ii) it facilitates the negotiations of Member States in many intergovernmental bodies on joint courses of action to address ongoing or emerging global challenges; and (iii) it advises interested Governments on the ways and means of translating policy frameworks developed in United Nations conferences and summits into programmes at the country level and, through technical assistance, helps build national capacities. NOTE The designations employed and the presentation of material in the present publication do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The term “country” as used in the text of this publication also refers, as appropriate, to territories or areas. Symbols of United Nations documents are composed of capital letters combined with figures. ST/ESA/STAT/SER.M/87 UNITED NATIONS PUBLICATION Sales No. -
The Vocabulary of Inanimate Nature As a Part of Turkic-Mongolian Language Commonness
ISSN 2039-2117 (online) Mediterranean Journal of Social Sciences Vol 6 No 6 S2 ISSN 2039-9340 (print) MCSER Publishing, Rome-Italy November 2015 The Vocabulary of Inanimate Nature as a Part of Turkic-Mongolian Language Commonness Valentin Ivanovich Rassadin Doctor of Philological Sciences, Professor, Department of the Kalmyk language and Mongolian studies Director of the Mongolian and Altaistic research Scientific centre, Kalmyk State University 358000, Republic of Kalmykia, Elista, Pushkin street, 11 Doi:10.5901/mjss.2015.v6n6s2p126 Abstract The article deals with the problem of commonness of Turkic and Mongolian languages in the area of vocabulary; a layer of vocabulary, reflecting the inanimate nature, is subject to thorough analysis. This thematic group studies the rubrics, devoted to landscape vocabulary, different soil types, water bodies, atmospheric phenomena, celestial sphere. The material, mainly from Khalkha-Mongolian and Old Written Mongolian languages is subject to the analysis; the data from Buryat and Kalmyk languages were also included, as they were presented in these languages. The Buryat material was mainly closer to the Khalkha-Mongolian one. For comparison, the material, mainly from the Old Turkic language, showing the presence of similar words, was included; it testified about the so-called Turkic-Mongolian lexical commonness. The analysis of inner forms of these revealed common lexemes in the majority of cases allowed determining their Turkic origin, proved by wide occurrence of these lexemes in Turkic languages and Turkologists' acknowledgement of their Turkic origin. The presence of great quantity of common vocabulary, which origin is determined as Turkic, testifies about repeated ancient contacts of Mongolian and Turkic languages, taking place in historical retrospective, resulting in hybridization of Mongolian vocabulary. -
ISO Basic Latin Alphabet
ISO basic Latin alphabet The ISO basic Latin alphabet is a Latin-script alphabet and consists of two sets of 26 letters, codified in[1] various national and international standards and used widely in international communication. The two sets contain the following 26 letters each:[1][2] ISO basic Latin alphabet Uppercase Latin A B C D E F G H I J K L M N O P Q R S T U V W X Y Z alphabet Lowercase Latin a b c d e f g h i j k l m n o p q r s t u v w x y z alphabet Contents History Terminology Name for Unicode block that contains all letters Names for the two subsets Names for the letters Timeline for encoding standards Timeline for widely used computer codes supporting the alphabet Representation Usage Alphabets containing the same set of letters Column numbering See also References History By the 1960s it became apparent to thecomputer and telecommunications industries in the First World that a non-proprietary method of encoding characters was needed. The International Organization for Standardization (ISO) encapsulated the Latin script in their (ISO/IEC 646) 7-bit character-encoding standard. To achieve widespread acceptance, this encapsulation was based on popular usage. The standard was based on the already published American Standard Code for Information Interchange, better known as ASCII, which included in the character set the 26 × 2 letters of the English alphabet. Later standards issued by the ISO, for example ISO/IEC 8859 (8-bit character encoding) and ISO/IEC 10646 (Unicode Latin), have continued to define the 26 × 2 letters of the English alphabet as the basic Latin script with extensions to handle other letters in other languages.[1] Terminology Name for Unicode block that contains all letters The Unicode block that contains the alphabet is called "C0 Controls and Basic Latin". -
Arabic Alphabet - Wikipedia, the Free Encyclopedia Arabic Alphabet from Wikipedia, the Free Encyclopedia
2/14/13 Arabic alphabet - Wikipedia, the free encyclopedia Arabic alphabet From Wikipedia, the free encyclopedia َأﺑْ َﺠ ِﺪﯾﱠﺔ َﻋ َﺮﺑِﯿﱠﺔ :The Arabic alphabet (Arabic ’abjadiyyah ‘arabiyyah) or Arabic abjad is Arabic abjad the Arabic script as it is codified for writing the Arabic language. It is written from right to left, in a cursive style, and includes 28 letters. Because letters usually[1] stand for consonants, it is classified as an abjad. Type Abjad Languages Arabic Time 400 to the present period Parent Proto-Sinaitic systems Phoenician Aramaic Syriac Nabataean Arabic abjad Child N'Ko alphabet systems ISO 15924 Arab, 160 Direction Right-to-left Unicode Arabic alias Unicode U+0600 to U+06FF range (http://www.unicode.org/charts/PDF/U0600.pdf) U+0750 to U+077F (http://www.unicode.org/charts/PDF/U0750.pdf) U+08A0 to U+08FF (http://www.unicode.org/charts/PDF/U08A0.pdf) U+FB50 to U+FDFF (http://www.unicode.org/charts/PDF/UFB50.pdf) U+FE70 to U+FEFF (http://www.unicode.org/charts/PDF/UFE70.pdf) U+1EE00 to U+1EEFF (http://www.unicode.org/charts/PDF/U1EE00.pdf) Note: This page may contain IPA phonetic symbols. Arabic alphabet ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع en.wikipedia.org/wiki/Arabic_alphabet 1/20 2/14/13 Arabic alphabet - Wikipedia, the free encyclopedia غ ف ق ك ل م ن ه و ي History · Transliteration ء Diacritics · Hamza Numerals · Numeration V · T · E (//en.wikipedia.org/w/index.php?title=Template:Arabic_alphabet&action=edit) Contents 1 Consonants 1.1 Alphabetical order 1.2 Letter forms 1.2.1 Table of basic letters 1.2.2 Further notes -
Shahmukhi to Gurmukhi Transliteration System: a Corpus Based Approach
Shahmukhi to Gurmukhi Transliteration System: A Corpus based Approach Tejinder Singh Saini1 and Gurpreet Singh Lehal2 1 Advanced Centre for Technical Development of Punjabi Language, Literature & Culture, Punjabi University, Patiala 147 002, Punjab, India [email protected] http://www.advancedcentrepunjabi.org 2 Department of Computer Science, Punjabi University, Patiala 147 002, Punjab, India [email protected] Abstract. This research paper describes a corpus based transliteration system for Punjabi language. The existence of two scripts for Punjabi language has created a script barrier between the Punjabi literature written in India and in Pakistan. This research project has developed a new system for the first time of its kind for Shahmukhi script of Punjabi language. The proposed system for Shahmukhi to Gurmukhi transliteration has been implemented with various research techniques based on language corpus. The corpus analysis program has been run on both Shahmukhi and Gurmukhi corpora for generating statistical data for different types like character, word and n-gram frequencies. This statistical analysis is used in different phases of transliteration. Potentially, all members of the substantial Punjabi community will benefit vastly from this transliteration system. 1 Introduction One of the great challenges before Information Technology is to overcome language barriers dividing the mankind so that everyone can communicate with everyone else on the planet in real time. South Asia is one of those unique parts of the world where a single language is written in different scripts. This is the case, for example, with Punjabi language spoken by tens of millions of people but written in Indian East Punjab (20 million) in Gurmukhi script (a left to right script based on Devanagari) and in Pakistani West Punjab (80 million), written in Shahmukhi script (a right to left script based on Arabic), and by a growing number of Punjabis (2 million) in the EU and the US in the Roman script. -
Turkic Languages 161
Turkic Languages 161 seriously endangered by the UNESCO red book on See also: Arabic; Armenian; Azerbaijanian; Caucasian endangered languages: Gagauz (Moldovan), Crim- Languages; Endangered Languages; Greek, Modern; ean Tatar, Noghay (Nogai), and West-Siberian Tatar Kurdish; Sign Language: Interpreting; Turkic Languages; . Caucasian: Laz (a few hundred thousand speakers), Turkish. Georgian (30 000 speakers), Abkhaz (10 000 speakers), Chechen-Ingush, Avar, Lak, Lezghian (it is unclear whether this is still spoken) Bibliography . Indo-European: Bulgarian, Domari, Albanian, French (a few thousand speakers each), Ossetian Andrews P A & Benninghaus R (1989). Ethnic groups in the Republic of Turkey. Wiesbaden: Dr. Ludwig Reichert (a few hundred speakers), German (a few dozen Verlag. speakers), Polish (a few dozen speakers), Ukranian Aydın Z (2002). ‘Lozan Antlas¸masında azınlık statu¨ su¨; (it is unclear whether this is still spoken), and Farklı ko¨kenlilere tanınan haklar.’ In Kabog˘lu I˙ O¨ (ed.) these languages designated as seriously endangered Azınlık hakları (Minority rights). (Minority status in the by the UNESCO red book on endangered lan- Treaty of Lausanne; Rights granted to people of different guages: Romani (20 000–30 000 speakers) and Yid- origin). I˙stanbul: Publication of the Human Rights Com- dish (a few dozen speakers) mission of the I˙stanbul Bar. 209–217. Neo-Aramaic (Afroasiatic): Tu¯ ro¯ yo and Su¯ rit (a C¸ag˘aptay S (2002). ‘Otuzlarda Tu¨ rk milliyetc¸ilig˘inde ırk, dil few thousand speakers each) ve etnisite’ (Race, language and ethnicity in the Turkish . Languages spoken by recent immigrants, refugees, nationalism of the thirties). In Bora T (ed.) Milliyetc¸ilik ˙ ˙ and asylum seekers: Afroasiatic languages: (Nationalism). -
Writing Systems: Their Properties and Implications for Reading
Writing Systems: Their Properties and Implications for Reading Brett Kessler and Rebecca Treiman doi:10.1093/oxfordhb/9780199324576.013.1 Draft of a chapter to appear in: The Oxford Handbook of Reading, ed. by Alexander Pollatsek and Rebecca Treiman. ISBN 9780199324576. Abstract An understanding of the nature of writing is an important foundation for studies of how people read and how they learn to read. This chapter discusses the characteristics of modern writing systems with a view toward providing that foundation. We consider both the appearance of writing systems and how they function. All writing represents the words of a language according to a set of rules. However, important properties of a language often go unrepresented in writing. Change and variation in the spoken language result in complex links to speech. Redundancies in language and writing mean that readers can often get by without taking in all of the visual information. These redundancies also mean that readers must often supplement the visual information that they do take in with knowledge about the language and about the world. Keywords: writing systems, script, alphabet, syllabary, logography, semasiography, glottography, underrepresentation, conservatism, graphotactics The goal of this chapter is to examine the characteristics of writing systems that are in use today and to consider the implications of these characteristics for how people read. As we will see, a broad understanding of writing systems and how they work can place some important constraints on our conceptualization of the nature of the reading process. It can also constrain our theories about how children learn to read and about how they should be taught to do so. -
Abstracts English
International Symposium: Interaction of Turkic Languages and Cultures Abstracts Saule Tazhibayeva & Nevskaya Irina Turkish Diaspora of Kazakhstan: Language Peculiarities Kazakhstan is a multiethnic and multi-religious state, where live more than 126 representatives of different ethnic groups (Sulejmenova E., Shajmerdenova N., Akanova D. 2007). One-third of the population is Turkic ethnic groups speaking 25 Turkic languages and presenting a unique model of the Turkic world (www.stat.gov.kz, Nevsakya, Tazhibayeva, 2014). One of the most numerous groups are Turks deported from Georgia to Kazakhstan in 1944. The analysis of the language, culture and history of the modern Turkic peoples, including sub-ethnic groups of the Turkish diaspora up to the present time has been carried out inconsistently. Kazakh researchers studied history (Toqtabay, 2006), ethno-political processes (Galiyeva, 2010), ethnic and cultural development of Turkish diaspora in Kazakhstan (Ibrashaeva, 2010). Foreign researchers devoted their studies to ethnic peculiarities of Kazakhstan (see Bhavna Dave, 2007). Peculiar features of Akhiska Turks living in the US are presented in the article of Omer Avci (www.nova.edu./ssss/QR/QR17/avci/PDF). Features of the language and culture of the Turkish Diaspora in Kazakhstan were not subjected to special investigation. There have been no studies of the features of the Turkish language, with its sub- ethnic dialects, documentation of a corpus of endangered variants of Turkish language. The data of the pre-sociological surveys show that the Kazakh Turks self-identify themselves as Turks Akhiska, Turks Hemshilli, Turks Laz, Turks Terekeme. Unable to return to their home country to Georgia Akhiska, Hemshilli, Laz Turks, Terekeme were scattered in many countries. -
On the Acoustic Properties of Mongolian Language
AN INTRODUCTION TO MONGOLIAN PHONETIC-LINGUISTIC MODEL: ON THE ACOUSTIC PROPERTIES OF MONGOLIAN LANGUAGE ¡ Idomuso Dawa , Shigeki Okawa and Katsuhiko Shirai ¢ Dept. of Information and Computer Science, Waseda University, Japan ¢£¢ Dept. of Network Science, Chiba Institute of Technology, Japan ABSTRACT Pandita in 1648. (c) Cyrillic scripts, created from Russian orthographic characters in 1940’s, have been widely used in This paper proposes a multi-lingual phonetic-linguistic mod- Mongolia, Kalmyk A. R. in Russia. el for Mongolian language based on the study of the In the script (a), allophones sometimes correspond to classification and the postulates of such models. This the same characters as well as the contrary case. In many model is commonly applicable for various fields as linguis- cases, a word form is quite different from the pronunciations tic/phonetic/dialectal/ language educational studies as well in these days. On the contrary, (b) Todo scripts were created as speech recognition. Its application to major dialects of based on a rule that one character corresponds to only one Mongolian has marked 91 % on word level recognition. pronunciation. Additionally, the letters have special marks In this paper, therefore, we first introduce the structure of for the long vowels etc. Todo scripts avoid the vagueness in language and classification of Mongolian dialectal speech. writing and speaking in initial Mongolian, as “Todo” means Next, we analyze and classify fundamental characteristics the clearness by Mongolian. in order to define phoneme labels. Lastly, we apply the common models to a simple word recognition experiment. ¤ (a) word ¤ (b) word written spoken written spoken 1 INTRODUCTION 1 2 -- n n Mongolian language, one of the most spoken Asian lan- n n n o u u u guages, is mainly used by Mongolian people resided in g g o- o G G the Central Asia and spoken by about 7.5 million people. -
Unicode Alphabets for L ATEX
Unicode Alphabets for LATEX Specimen Mikkel Eide Eriksen March 11, 2020 2 Contents MUFI 5 SIL 21 TITUS 29 UNZ 117 3 4 CONTENTS MUFI Using the font PalemonasMUFI(0) from http://mufi.info/. Code MUFI Point Glyph Entity Name Unicode Name E262 � OEligogon LATIN CAPITAL LIGATURE OE WITH OGONEK E268 � Pdblac LATIN CAPITAL LETTER P WITH DOUBLE ACUTE E34E � Vvertline LATIN CAPITAL LETTER V WITH VERTICAL LINE ABOVE E662 � oeligogon LATIN SMALL LIGATURE OE WITH OGONEK E668 � pdblac LATIN SMALL LETTER P WITH DOUBLE ACUTE E74F � vvertline LATIN SMALL LETTER V WITH VERTICAL LINE ABOVE E8A1 � idblstrok LATIN SMALL LETTER I WITH TWO STROKES E8A2 � jdblstrok LATIN SMALL LETTER J WITH TWO STROKES E8A3 � autem LATIN ABBREVIATION SIGN AUTEM E8BB � vslashura LATIN SMALL LETTER V WITH SHORT SLASH ABOVE RIGHT E8BC � vslashuradbl LATIN SMALL LETTER V WITH TWO SHORT SLASHES ABOVE RIGHT E8C1 � thornrarmlig LATIN SMALL LETTER THORN LIGATED WITH ARM OF LATIN SMALL LETTER R E8C2 � Hrarmlig LATIN CAPITAL LETTER H LIGATED WITH ARM OF LATIN SMALL LETTER R E8C3 � hrarmlig LATIN SMALL LETTER H LIGATED WITH ARM OF LATIN SMALL LETTER R E8C5 � krarmlig LATIN SMALL LETTER K LIGATED WITH ARM OF LATIN SMALL LETTER R E8C6 UU UUlig LATIN CAPITAL LIGATURE UU E8C7 uu uulig LATIN SMALL LIGATURE UU E8C8 UE UElig LATIN CAPITAL LIGATURE UE E8C9 ue uelig LATIN SMALL LIGATURE UE E8CE � xslashlradbl LATIN SMALL LETTER X WITH TWO SHORT SLASHES BELOW RIGHT E8D1 æ̊ aeligring LATIN SMALL LETTER AE WITH RING ABOVE E8D3 ǽ̨ aeligogonacute LATIN SMALL LETTER AE WITH OGONEK AND ACUTE 5 6 CONTENTS -
Proposal for Generation Panel for Latin Script Label Generation Ruleset for the Root Zone
Generation Panel for Latin Script Label Generation Ruleset for the Root Zone Proposal for Generation Panel for Latin Script Label Generation Ruleset for the Root Zone Table of Contents 1. General Information 2 1.1 Use of Latin Script characters in domain names 3 1.2 Target Script for the Proposed Generation Panel 4 1.2.1 Diacritics 5 1.3 Countries with significant user communities using Latin script 6 2. Proposed Initial Composition of the Panel and Relationship with Past Work or Working Groups 7 3. Work Plan 13 3.1 Suggested Timeline with Significant Milestones 13 3.2 Sources for funding travel and logistics 16 3.3 Need for ICANN provided advisors 17 4. References 17 1 Generation Panel for Latin Script Label Generation Ruleset for the Root Zone 1. General Information The Latin script1 or Roman script is a major writing system of the world today, and the most widely used in terms of number of languages and number of speakers, with circa 70% of the world’s readers and writers making use of this script2 (Wikipedia). Historically, it is derived from the Greek alphabet, as is the Cyrillic script. The Greek alphabet is in turn derived from the Phoenician alphabet which dates to the mid-11th century BC and is itself based on older scripts. This explains why Latin, Cyrillic and Greek share some letters, which may become relevant to the ruleset in the form of cross-script variants. The Latin alphabet itself originated in Italy in the 7th Century BC. The original alphabet contained 21 upper case only letters: A, B, C, D, E, F, Z, H, I, K, L, M, N, O, P, Q, R, S, T, V and X.