Proposal to Encode Old Uyghur in Unicode
Total Page:16
File Type:pdf, Size:1020Kb

Load more
Recommended publications
-
ISO Basic Latin Alphabet
ISO basic Latin alphabet The ISO basic Latin alphabet is a Latin-script alphabet and consists of two sets of 26 letters, codified in[1] various national and international standards and used widely in international communication. The two sets contain the following 26 letters each:[1][2] ISO basic Latin alphabet Uppercase Latin A B C D E F G H I J K L M N O P Q R S T U V W X Y Z alphabet Lowercase Latin a b c d e f g h i j k l m n o p q r s t u v w x y z alphabet Contents History Terminology Name for Unicode block that contains all letters Names for the two subsets Names for the letters Timeline for encoding standards Timeline for widely used computer codes supporting the alphabet Representation Usage Alphabets containing the same set of letters Column numbering See also References History By the 1960s it became apparent to thecomputer and telecommunications industries in the First World that a non-proprietary method of encoding characters was needed. The International Organization for Standardization (ISO) encapsulated the Latin script in their (ISO/IEC 646) 7-bit character-encoding standard. To achieve widespread acceptance, this encapsulation was based on popular usage. The standard was based on the already published American Standard Code for Information Interchange, better known as ASCII, which included in the character set the 26 × 2 letters of the English alphabet. Later standards issued by the ISO, for example ISO/IEC 8859 (8-bit character encoding) and ISO/IEC 10646 (Unicode Latin), have continued to define the 26 × 2 letters of the English alphabet as the basic Latin script with extensions to handle other letters in other languages.[1] Terminology Name for Unicode block that contains all letters The Unicode block that contains the alphabet is called "C0 Controls and Basic Latin". -
Turkic Languages 161
Turkic Languages 161 seriously endangered by the UNESCO red book on See also: Arabic; Armenian; Azerbaijanian; Caucasian endangered languages: Gagauz (Moldovan), Crim- Languages; Endangered Languages; Greek, Modern; ean Tatar, Noghay (Nogai), and West-Siberian Tatar Kurdish; Sign Language: Interpreting; Turkic Languages; . Caucasian: Laz (a few hundred thousand speakers), Turkish. Georgian (30 000 speakers), Abkhaz (10 000 speakers), Chechen-Ingush, Avar, Lak, Lezghian (it is unclear whether this is still spoken) Bibliography . Indo-European: Bulgarian, Domari, Albanian, French (a few thousand speakers each), Ossetian Andrews P A & Benninghaus R (1989). Ethnic groups in the Republic of Turkey. Wiesbaden: Dr. Ludwig Reichert (a few hundred speakers), German (a few dozen Verlag. speakers), Polish (a few dozen speakers), Ukranian Aydın Z (2002). ‘Lozan Antlas¸masında azınlık statu¨ su¨; (it is unclear whether this is still spoken), and Farklı ko¨kenlilere tanınan haklar.’ In Kabog˘lu I˙ O¨ (ed.) these languages designated as seriously endangered Azınlık hakları (Minority rights). (Minority status in the by the UNESCO red book on endangered lan- Treaty of Lausanne; Rights granted to people of different guages: Romani (20 000–30 000 speakers) and Yid- origin). I˙stanbul: Publication of the Human Rights Com- dish (a few dozen speakers) mission of the I˙stanbul Bar. 209–217. Neo-Aramaic (Afroasiatic): Tu¯ ro¯ yo and Su¯ rit (a C¸ag˘aptay S (2002). ‘Otuzlarda Tu¨ rk milliyetc¸ilig˘inde ırk, dil few thousand speakers each) ve etnisite’ (Race, language and ethnicity in the Turkish . Languages spoken by recent immigrants, refugees, nationalism of the thirties). In Bora T (ed.) Milliyetc¸ilik ˙ ˙ and asylum seekers: Afroasiatic languages: (Nationalism). -
Sir Gerard Clauson and His Skeleton Tangut Dictionary Imre Galambos
Sir Gerard Clauson and his Skeleton Tangut Dictionary Imre Galambos Sir Gerard Leslie Makins Clauson (1891–1974) worked most of his life as a civil servant and conducted academic research in his spare time.1 Only after retiring in 1951 at the age of 60 was he able to devote his full attention to scholarly endeavours, which were primarily focussed on Turkish languages. Thus as a scholar, today he is primarily remembered for his contribution to Turkish studies, and his Etymological Dictionary of Pre-Thirteenth-Century Turkish is still an essential reference tool in the field.2 Yet in addition to his study of Turkish and Mongolian linguistics, he also worked on a number of other Asian languages, including Tangut. Even though his extensive list of publications includes a small number of items related to Tangut studies,3 he devoted an incredible amount of time and effort to studying the language and to compiling a dictionary. He never finished the dictionary but deposited a draft version along with his notes in seven large volumes at the Library of the School of Oriental and African Studies (SOAS), so that they would be available to anyone who wished to study Tangut and perhaps continue his research. Eric Grinstead, who used the dictionary when working on the Tangut manuscripts at the British Museum, called it “a paragon of excellence” in comparison with high level of errors in dictionaries available at the time.4 Indeed, the erudition of Clauson’s dictionary is obvious even upon a cursory look at the manuscript version and had it ever been published, it would have undoubtedly made a major impact on scholarship. -
Abstracts English
International Symposium: Interaction of Turkic Languages and Cultures Abstracts Saule Tazhibayeva & Nevskaya Irina Turkish Diaspora of Kazakhstan: Language Peculiarities Kazakhstan is a multiethnic and multi-religious state, where live more than 126 representatives of different ethnic groups (Sulejmenova E., Shajmerdenova N., Akanova D. 2007). One-third of the population is Turkic ethnic groups speaking 25 Turkic languages and presenting a unique model of the Turkic world (www.stat.gov.kz, Nevsakya, Tazhibayeva, 2014). One of the most numerous groups are Turks deported from Georgia to Kazakhstan in 1944. The analysis of the language, culture and history of the modern Turkic peoples, including sub-ethnic groups of the Turkish diaspora up to the present time has been carried out inconsistently. Kazakh researchers studied history (Toqtabay, 2006), ethno-political processes (Galiyeva, 2010), ethnic and cultural development of Turkish diaspora in Kazakhstan (Ibrashaeva, 2010). Foreign researchers devoted their studies to ethnic peculiarities of Kazakhstan (see Bhavna Dave, 2007). Peculiar features of Akhiska Turks living in the US are presented in the article of Omer Avci (www.nova.edu./ssss/QR/QR17/avci/PDF). Features of the language and culture of the Turkish Diaspora in Kazakhstan were not subjected to special investigation. There have been no studies of the features of the Turkish language, with its sub- ethnic dialects, documentation of a corpus of endangered variants of Turkish language. The data of the pre-sociological surveys show that the Kazakh Turks self-identify themselves as Turks Akhiska, Turks Hemshilli, Turks Laz, Turks Terekeme. Unable to return to their home country to Georgia Akhiska, Hemshilli, Laz Turks, Terekeme were scattered in many countries. -
Unicode Alphabets for L ATEX
Unicode Alphabets for LATEX Specimen Mikkel Eide Eriksen March 11, 2020 2 Contents MUFI 5 SIL 21 TITUS 29 UNZ 117 3 4 CONTENTS MUFI Using the font PalemonasMUFI(0) from http://mufi.info/. Code MUFI Point Glyph Entity Name Unicode Name E262 � OEligogon LATIN CAPITAL LIGATURE OE WITH OGONEK E268 � Pdblac LATIN CAPITAL LETTER P WITH DOUBLE ACUTE E34E � Vvertline LATIN CAPITAL LETTER V WITH VERTICAL LINE ABOVE E662 � oeligogon LATIN SMALL LIGATURE OE WITH OGONEK E668 � pdblac LATIN SMALL LETTER P WITH DOUBLE ACUTE E74F � vvertline LATIN SMALL LETTER V WITH VERTICAL LINE ABOVE E8A1 � idblstrok LATIN SMALL LETTER I WITH TWO STROKES E8A2 � jdblstrok LATIN SMALL LETTER J WITH TWO STROKES E8A3 � autem LATIN ABBREVIATION SIGN AUTEM E8BB � vslashura LATIN SMALL LETTER V WITH SHORT SLASH ABOVE RIGHT E8BC � vslashuradbl LATIN SMALL LETTER V WITH TWO SHORT SLASHES ABOVE RIGHT E8C1 � thornrarmlig LATIN SMALL LETTER THORN LIGATED WITH ARM OF LATIN SMALL LETTER R E8C2 � Hrarmlig LATIN CAPITAL LETTER H LIGATED WITH ARM OF LATIN SMALL LETTER R E8C3 � hrarmlig LATIN SMALL LETTER H LIGATED WITH ARM OF LATIN SMALL LETTER R E8C5 � krarmlig LATIN SMALL LETTER K LIGATED WITH ARM OF LATIN SMALL LETTER R E8C6 UU UUlig LATIN CAPITAL LIGATURE UU E8C7 uu uulig LATIN SMALL LIGATURE UU E8C8 UE UElig LATIN CAPITAL LIGATURE UE E8C9 ue uelig LATIN SMALL LIGATURE UE E8CE � xslashlradbl LATIN SMALL LETTER X WITH TWO SHORT SLASHES BELOW RIGHT E8D1 æ̊ aeligring LATIN SMALL LETTER AE WITH RING ABOVE E8D3 ǽ̨ aeligogonacute LATIN SMALL LETTER AE WITH OGONEK AND ACUTE 5 6 CONTENTS -
Proposal for Generation Panel for Latin Script Label Generation Ruleset for the Root Zone
Generation Panel for Latin Script Label Generation Ruleset for the Root Zone Proposal for Generation Panel for Latin Script Label Generation Ruleset for the Root Zone Table of Contents 1. General Information 2 1.1 Use of Latin Script characters in domain names 3 1.2 Target Script for the Proposed Generation Panel 4 1.2.1 Diacritics 5 1.3 Countries with significant user communities using Latin script 6 2. Proposed Initial Composition of the Panel and Relationship with Past Work or Working Groups 7 3. Work Plan 13 3.1 Suggested Timeline with Significant Milestones 13 3.2 Sources for funding travel and logistics 16 3.3 Need for ICANN provided advisors 17 4. References 17 1 Generation Panel for Latin Script Label Generation Ruleset for the Root Zone 1. General Information The Latin script1 or Roman script is a major writing system of the world today, and the most widely used in terms of number of languages and number of speakers, with circa 70% of the world’s readers and writers making use of this script2 (Wikipedia). Historically, it is derived from the Greek alphabet, as is the Cyrillic script. The Greek alphabet is in turn derived from the Phoenician alphabet which dates to the mid-11th century BC and is itself based on older scripts. This explains why Latin, Cyrillic and Greek share some letters, which may become relevant to the ruleset in the form of cross-script variants. The Latin alphabet itself originated in Italy in the 7th Century BC. The original alphabet contained 21 upper case only letters: A, B, C, D, E, F, Z, H, I, K, L, M, N, O, P, Q, R, S, T, V and X. -
Royal Asiatic Society
ROYAL ASIATIC SOCIETY OF GREAT BRITAIN & IRELAND (FOUNDED MARCH, 1823) LIST OF MEMBERS 1959 PUBLISHED BY THE SOCIETY 56 QUEEN ANNE STREET LONDON W. 1 Downloaded from https://www.cambridge.org/core. IP address: 170.106.202.126, on 30 Sep 2021 at 11:37:45, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0035869X00117630 PRESIDENTS OF THE SOCIETY SINCE ITS FOUNDATION 1823 RT. HON. CHARLES WATKIN WILLIAMS WYNN, M.P. 1841 RT. HON. THE EARL OF MUNSTER. 1842 RT. HON. THE LORD FITZGERALD AND VESEY. 1843 RT. HON. THE EARL OF AUCKLAND. 1849 RT. HON. THE EARL OF ELLESMERE. 1852 RT. HON. THE EARL ASHBURTON. 1855 PROFESSOR HORACE HAYMAN WILSON. 1859 COLONEL WILLIAM HENRY SYKES, M.P. 1861 RT. HON. THE VISCOUNT STRANGFORD. 1864 SIR THOMAS EDWARD COLEBROOKE, BART., M.P. 1867 RT. HON. THE VISCOUNT STRANGFORD. 1869 SIR T. E. COLEBROOKE, BART. 1869 MAJOR-GENERAL SIR HENRY CRESWICKE RAWLINSON, BART. 1871 SIR T. E. COLEBROOKE, BART. 1872 SIR HENRY BARTLE EDWARD FRERE. 1875 SIR T. E. COLEBROOKE, BART. 1878 MAJOR-GENERAL SIR H. C. RAWLINSON, BART. 1881 SIR T. E. COLEBROOKE, BART. 1882 SIR H. B. E. FRERE. 1884 SIR WILLIAM MUIR. 1885 COLONEL SIR HENRY YULE. 1887 SIR THOMAS FRANCIS WADE. 1890 RT. HON. THE EARL OF NORTHBROOK. 1893 RT. HON. THE LORD REAY. 1921 LIEUT.-COL. SIR RICHARD CARNAC TEMPLE. BART. 1922 RT. HON. THE LORD CHALMERS. 1925 SIR EDWARD D. MACLAGAN. 1928 THE MOST HON. THE MARQUESS OF ZETLAND. 1931 SIR E. -
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset Brian Roarky, Lawrence Wolf-Sonkiny, Christo Kirovy, Sabrina J. Mielkez, Cibu Johnyy, Işın Demirşahiny and Keith Hally yGoogle Research zJohns Hopkins University {roark,wolfsonkin,ckirov,cibu,isin,kbhall}@google.com [email protected] Abstract This paper describes the Dakshina dataset, a new resource consisting of text in both the Latin and native scripts for 12 South Asian languages. The dataset includes, for each language: 1) native script Wikipedia text; 2) a romanization lexicon; and 3) full sentence parallel data in both a native script of the language and the basic Latin alphabet. We document the methods used for preparation and selection of the Wikipedia text in each language; collection of attested romanizations for sampled lexicons; and manual romanization of held-out sentences from the native script collections. We additionally provide baseline results on several tasks made possible by the dataset, including single word transliteration, full sentence transliteration, and language modeling of native script and romanized text. Keywords: romanization, transliteration, South Asian languages 1. Introduction lingua franca, the prevalence of loanwords and code switch- Languages in South Asia – a region covering India, Pak- ing (largely but not exclusively with English) is high. istan, Bangladesh and neighboring countries – are generally Unlike translations, human generated parallel versions of written with Brahmic or Perso-Arabic scripts, but are also text in the native scripts of these languages and romanized often written in the Latin script, most notably for informal versions do not spontaneously occur in any meaningful vol- communication such as within SMS messages, WhatsApp, ume. -
Characters for Classical Latin
Characters for Classical Latin David J. Perry version 13, 2 July 2020 Introduction The purpose of this document is to identify all characters of interest to those who work with Classical Latin, no matter how rare. Epigraphers will want many of these, but I want to collect any character that is needed in any context. Those that are already available in Unicode will be so identified; those that may be available can be debated; and those that are clearly absent and should be proposed can be proposed; and those that are so rare as to be unencodable will be known. If you have any suggestions for additional characters or reactions to the suggestions made here, please email me at [email protected] . No matter how rare, let’s get all possible characters on this list. Version 6 of this document has been updated to reflect the many characters of interest to Latinists encoded as of Unicode version 13.0. Characters are indicated by their Unicode value, a hexadecimal number, and their name printed IN SMALL CAPITALS. Unicode values may be preceded by U+ to set them off from surrounding text. Combining diacritics are printed over a dotted cir- cle ◌ to show that they are intended to be used over a base character. For more basic information about Unicode, see the website of The Unicode Consortium, http://www.unicode.org/ or my book cited below. Please note that abbreviations constructed with lines above or through existing let- ters are not considered separate characters except in unusual circumstances, nor are the space-saving ligatures found in Latin inscriptions unless they have a unique grammatical or phonemic function (which they normally don’t). -
Revised Proposal to Encode Old Uyghur in Unicode
L2/19-016 2019-01-07 Revised proposal to encode Old Uyghur in Unicode Anshuman Pandey [email protected] January 7, 2019 Document History This proposal is a revision of the following: • L2/18-126: “Preliminary proposal to encode Old Uyghur in Unicode” • L2/18-333: “Proposal to encode Old Uyghur in Unicode” It incorporates comments made by the UTC Script Ad Hoc Committee and other experts in: • L2/18-168: “Recommendations to UTC #155 April-May 2018 on Script Proposals” • L2/18-335: “Comments on the preliminary proposal to encode Old Uyghur in Unicode (L2/18-126)” The major changes to L2/18-333 are as follows: • Correction to glyphs for initial and medial beth, previously shown erroneously using forms for yodh • Revision of glyphs for aleph and nun to reflect distinctive forms from the 9th century • Revision of representative glyph for zayin for stylistic uniformity • Unification of gimel and heth into a single letter, and addition of a letter for final heth • Addition of an alternate form letter for both aleph and nun • Expansion of description of the orientation of terminals for specific letters A previous version of this proposal was reviewed by the following expert: • Dai Matsui (Graduate School of Letters, Osaka University) 1 Revised proposal to encode Old Uyghur in Unicode Anshuman Pandey 1 Introduction The ‘Old Uyghur’ script was used between the 8th and 17th centuries primarily in the Tarim Basin of Central Asia, located in present-day Xinjiang, China. It is a cursive-joining alphabet with features of an abjad, and is characterized by its vertical orientation. -
List of Publications
LIST OF PUBLICATIONS BOOKS 2016 Altuigurische Aparimitāyus-Literatur und kleinere tantrische Texte. (Berliner Turfantexte XXXVI). Turnhout: Brepols, 234 Pp. + 10 Plates. Gudai Weiwueryu zanshi he miaoxiexing shige de yuwenxue yanjiu [Old Uyghur hymns and praises]. Shanghai: Shanghai Classics, XIV + 482 Pp. + 50 Plates. 2015 Gudai Weiwueryu zanshi he miaoxiexing shige de yuwenxue yanjiu [Old Uyghur hymns and praises]. Shanghai: Shanghai Classics, XIV + 482 Pp. + 50 Plates. [With Masahiro Shōgaito, Setsu Fujishiro, Noriko Ohsaki and Mutsumi Sugahara] The Berlin Chinese text U 5335 written in Uighur script. A reconstruction of the Inherited Uighur Pronunciation of Chinese. (Berliner Turfantexte XXXIV.) Turnhout: Brepols, 208 Pp. + 7 Plates. 2010 Prajñāpāramitā Literature in Old Uyghur. (Berliner Turfantexte XXVIII.) Turnhout: Brepols, 319 Pp. + 23 Plates. 2009 Alttürkische Handschriften Teil 15: Die uigurischen Blockdrucke der Berliner Turfansammlung. Teil 3: Stabreimdichtungen, Kalendarisches, Bilder, unbestimmte Fragmente und Nachträge. (Verzeichnis der Orientalischen Handschriften in Deutschland, Bd. XIII 23.) Stuttgart: Franz Steiner Verlag, 309 Pp. 2008 Alttürkische Handschriften Teil 12: Die uigurischen Blockdrucke der Berliner Turfansammlung. Teil 2: Apokryphen, Mahāyāna-Sūtren, Erzählungen, Magische Texte, Kommentare und Kolophone. (Verzeichnis der Orientalischen Handschriften in Deutschland, Bd. XIII 20.) Stuttgart: Franz Steiner Verlag, 264 Pp. 2007 Alttürkische Handschriften. Teil 11: Die uigurischen Blockdrucke der Berliner Turfansammlung. Teil 1: Tantrische Texte (VOHD XIII,19.). Beschrieben von Abdurishid Yakup und M. Knüppel. Stuttgart: Franz Steiner Verlag, 258 Pp. [Catalogue, technical remarks, blibliography and indexes (Pp. 27-258) are by A. Yakup] 2006 Dišastvustik: Eine altuigurische Bearbeitung einer Legende aus dem Catuṣpariṣat-sūtra. (Veröffentlichungen der Societas Uralo-Altaica 71.) Wiesbaden: Harrassowitz. VIII + 176 Pp. 2005 The Turfan dialect of Uyghur. -
Orthographic Transparency and the Ottoman Abjad Maithili Jais
Orthographic Transparency and the Ottoman Abjad Maithili Jais University of Florida Spring 2018 I. Introduction In 2014, the debate over whether Ottoman Turkish was to be taught in schools or not was once again brought to the forefront of Turkish society and the Turkish conscience, as Erdogan began to push for Ottoman Turkish to be taught in all high schools across the country (Yeginsu, 2014). This became an obsession of a news topic for media in the West as well as in Turkey. Turkey’s tumultuous history with politics inevitably led this proposal of teaching Ottoman Turkish in all high schools to become a hotbed of controversy and debate. For all those who are perfectly contented to let bygones be bygones, there are many who assert that the Ottoman Turkish alphabet is still relevant and important. In fact, though this may be a personal anecdote, there are still certainly people who believe that the Ottoman script is, or was, superior to the Latin alphabet with which modern Turkish is written. This thesis does not aim to undertake a task so grand as sussing out which of the two was more appropriate for Turkish. No, such a task would be a behemoth for this paper. Instead, it aims to answer the question, “How?” Rather, “How was the Arabic script moulded to fit Turkish and to what consequence?” Often the claim that one script it superior to another suggests inherent judgement of value, but of the few claims seen circulating Facebook on the efficacy of the Ottoman script, it seems some believe that it represented Turkish more accurately and efficiently.