Scripts and Politics in the Ussr
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Section 14.4, Phags-Pa
The Unicode® Standard Version 13.0 – Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trade- mark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries. The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. © 2020 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at http://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see http://www.unicode.org/copyright.html. The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. — Version 13.0. Includes index. ISBN 978-1-936213-26-9 (http://www.unicode.org/versions/Unicode13.0.0/) 1. -
Technical Reference Manual for the Standardization of Geographical Names United Nations Group of Experts on Geographical Names
ST/ESA/STAT/SER.M/87 Department of Economic and Social Affairs Statistics Division Technical reference manual for the standardization of geographical names United Nations Group of Experts on Geographical Names United Nations New York, 2007 The Department of Economic and Social Affairs of the United Nations Secretariat is a vital interface between global policies in the economic, social and environmental spheres and national action. The Department works in three main interlinked areas: (i) it compiles, generates and analyses a wide range of economic, social and environmental data and information on which Member States of the United Nations draw to review common problems and to take stock of policy options; (ii) it facilitates the negotiations of Member States in many intergovernmental bodies on joint courses of action to address ongoing or emerging global challenges; and (iii) it advises interested Governments on the ways and means of translating policy frameworks developed in United Nations conferences and summits into programmes at the country level and, through technical assistance, helps build national capacities. NOTE The designations employed and the presentation of material in the present publication do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The term “country” as used in the text of this publication also refers, as appropriate, to territories or areas. Symbols of United Nations documents are composed of capital letters combined with figures. ST/ESA/STAT/SER.M/87 UNITED NATIONS PUBLICATION Sales No. -
Sinitic Language and Script in East Asia: Past and Present
SINO-PLATONIC PAPERS Number 264 December, 2016 Sinitic Language and Script in East Asia: Past and Present edited by Victor H. Mair Victor H. Mair, Editor Sino-Platonic Papers Department of East Asian Languages and Civilizations University of Pennsylvania Philadelphia, PA 19104-6305 USA [email protected] www.sino-platonic.org SINO-PLATONIC PAPERS FOUNDED 1986 Editor-in-Chief VICTOR H. MAIR Associate Editors PAULA ROBERTS MARK SWOFFORD ISSN 2157-9679 (print) 2157-9687 (online) SINO-PLATONIC PAPERS is an occasional series dedicated to making available to specialists and the interested public the results of research that, because of its unconventional or controversial nature, might otherwise go unpublished. The editor-in-chief actively encourages younger, not yet well established, scholars and independent authors to submit manuscripts for consideration. Contributions in any of the major scholarly languages of the world, including romanized modern standard Mandarin (MSM) and Japanese, are acceptable. In special circumstances, papers written in one of the Sinitic topolects (fangyan) may be considered for publication. Although the chief focus of Sino-Platonic Papers is on the intercultural relations of China with other peoples, challenging and creative studies on a wide variety of philological subjects will be entertained. This series is not the place for safe, sober, and stodgy presentations. Sino- Platonic Papers prefers lively work that, while taking reasonable risks to advance the field, capitalizes on brilliant new insights into the development of civilization. Submissions are regularly sent out to be refereed, and extensive editorial suggestions for revision may be offered. Sino-Platonic Papers emphasizes substance over form. -
Old Cyrillic in Unicode*
Old Cyrillic in Unicode* Ivan A Derzhanski Institute for Mathematics and Computer Science, Bulgarian Academy of Sciences [email protected] The current version of the Unicode Standard acknowledges the existence of a pre- modern version of the Cyrillic script, but its support thereof is limited to assigning code points to several obsolete letters. Meanwhile mediæval Cyrillic manuscripts and some early printed books feature a plethora of letter shapes, ligatures, diacritic and punctuation marks that want proper representation. (In addition, contemporary editions of mediæval texts employ a variety of annotation signs.) As generally with scripts that predate printing, an obvious problem is the abundance of functional, chronological, regional and decorative variant shapes, the precise details of whose distribution are often unknown. The present contents of the block will need to be interpreted with Old Cyrillic in mind, and decisions to be made as to which remaining characters should be implemented via Unicode’s mechanism of variation selection, as ligatures in the typeface, or as code points in the Private space or the standard Cyrillic block. I discuss the initial stage of this work. The Unicode Standard (Unicode 4.0.1) makes a controversial statement: The historical form of the Cyrillic alphabet is treated as a font style variation of modern Cyrillic because the historical forms are relatively close to the modern appearance, and because some of them are still in modern use in languages other than Russian (for example, U+0406 “I” CYRILLIC CAPITAL LETTER I is used in modern Ukrainian and Byelorussian). Some of the letters in this range were used in modern typefaces in Russian and Bulgarian. -
Learning Cyrillic
LEARNING CYRILLIC Question: If there is no equivalent letter in the Cyrillic alphabet for the Roman "J" or "H" how do you transcribe good German names like Johannes, Heinrich, Wilhelm, etc. I heard one suggestion that Johann was written as Ivan and that the "h" was replaced with a "g". Can you give me a little insight into what you have found? In researching would I be looking for the name Ivan rather than Johann? One must always think phonetic, that is, think how a name is pronounced in German, and how does the Russian Cyrillic script produce that sound? JOHANNES. The Cyrillic spelling begins with the letter “I – eye”, but pronounced “eee”, so we have phonetically “eee-o-hann” which sounds like “Yo-hann”. You can see it better in typeface – Иоганн , which letter for letter reads as “I-o-h-a-n-n”. The modern Typeface script is radically different than the old hand-written Cyrillic script. Use the guide which I sent to you. Ivan is the Russian equivalent of Johann, and it pops up occasionally in Church records. JOSEPH / JOSEF. Listen to the way the name is pronounced in German – “yo-sef”, also “yo-sif”. That “yo” sound is produced by the Cyrillic script letters “I” and “o”. Again you can see it in the typeface. Иосеф and also Иосиф. And sometimes Joseph appears as , transliterated as O-s-i-p. Similar to all languages and scripts, Cyrillic spellings are not consistent. The “a” ending indicates a male name. JAKOB. There is no “Jay” sound in the German language. -
Turkmen Language: Alphabet, Structure and Writing
International Journal of Multidisciplinary Research and Growth Evaluation www.allmultidisciplinaryjournal.com International Journal of Multidisciplinary Research and Growth Evaluation ISSN: 2582-7138 Received: 03-07-2021; Accepted: 19-07-2021 www.allmultidisciplinaryjournal.com Volume 2; Issue 4; July-August 2021; Page No. 663-664 Turkmen Language: Alphabet, structure and writing Homayoun Mahmoodi Lecture, Department of Turkmani, Education Faculty, Jawzjan University, Sheberghan, Afghanistan Corresponding Author: Homayoun Mahmoodi Abstract The Turkmen Language is classified as one of the Turkic people worldwide, with approximately 1 million more languages which are all in the Ural-Altaic language family. speaking it as a second language. The majority of Turkmen Like all Turkic languages, the Turkmen Language is an speakers are located in Eurasia and Central Asia, with the agglutinative language that has productive inflectional and highest concentrations located in Turkmenistan, Iran, and derivational suffixes which are affixed to a word like “beads Afghanistan. on a string”. Turkmen is spoken natively by about 7 million Keywords: Turkmen, Afghanistan, Alphabet, structure 1. Introduction Turkmen are part of the ethno-linguistic Turkic continuum that stretches from western China to western Turkey. They share a claim to western-Turkic (Oguz) heritage along with Azerbaijanis and Turks in Turkey (Golden 1972) [4]. Despite a distinct twenty-first century identity, Turkmen culture and traditions historically overlapped with that of other Turks. The term “Turkic” refers to this cultural and linguistic heritage that all Turkic-language speaking groups share. Nevertheless, a separate Turkmen sense of identity was born out of their particular historical experience. A nomadic heritage, distinctively Oguz dialects, and tribal traditions contribute greatly to the identity that sets Turkmen apart from other Turkic groups and shape a specifically Turkmen sense of ethnic identity. -
Script Recognition by Statistical Analysis of the Image Texture
X International Symposium on Industrial Electronics INDEL 2014, Banja Luka, November 0608, 2014 Script Recognition by Statistical Analysis of the Image Texture Invited Paper Darko Brodić Technical Faculty in Bor University of Belgrade Bor, Serbia [email protected] Abstract—The paper proposed an algorithm for the script The proposed approach incorporates the statistical analysis identification using the statistical analysis of the texture obtained of the texture. Texture is suitable for extracting similarities and by script mapping. First, the algorithm models the script sign by dissimilarities between images. However, the novelty of the the equivalent script type. The script type is determined by the proposed approach [8] is given by specific text modeling and position of the letter in the baseline area. Furthermore, the 1-D texture analysis. During text modeling the number of extraction of the features is performed. This step of the algorithm variables is substantially reduced. Furthermore, the image is is based on the script type occurrence and co-occurrence pattern replaced with text. Hence, the image which represents a 2-D analysis. Then, the resultant features are compared. Their image is replaced with the text given by 1-D "image". All differences simplify the script feature classification. The aforementioned contributes to the algorithm's speed. algorithm is tested on the German and Slavic printed documents incorporating different scripts. The experiment gives the results The paper is organized as follows. Section 2 describes the that are promising. proposed algorithm. Section 3 explains the experiment. Section 4 presents the results and gives the discussion. Section 5 makes Keywords - Coding, Cultural heritage, Historical documents, conclusions. -
Writing Systems: Their Properties and Implications for Reading
Writing Systems: Their Properties and Implications for Reading Brett Kessler and Rebecca Treiman doi:10.1093/oxfordhb/9780199324576.013.1 Draft of a chapter to appear in: The Oxford Handbook of Reading, ed. by Alexander Pollatsek and Rebecca Treiman. ISBN 9780199324576. Abstract An understanding of the nature of writing is an important foundation for studies of how people read and how they learn to read. This chapter discusses the characteristics of modern writing systems with a view toward providing that foundation. We consider both the appearance of writing systems and how they function. All writing represents the words of a language according to a set of rules. However, important properties of a language often go unrepresented in writing. Change and variation in the spoken language result in complex links to speech. Redundancies in language and writing mean that readers can often get by without taking in all of the visual information. These redundancies also mean that readers must often supplement the visual information that they do take in with knowledge about the language and about the world. Keywords: writing systems, script, alphabet, syllabary, logography, semasiography, glottography, underrepresentation, conservatism, graphotactics The goal of this chapter is to examine the characteristics of writing systems that are in use today and to consider the implications of these characteristics for how people read. As we will see, a broad understanding of writing systems and how they work can place some important constraints on our conceptualization of the nature of the reading process. It can also constrain our theories about how children learn to read and about how they should be taught to do so. -
A New Database for Online Handwritten Mongolian Word Recognition
2016 23rd International Conference on Pattern Recognition (ICPR) Cancún Center, Cancún, México, December 4-8, 2016 A New Database for Online Handwritten Mongolian Word Recognition Long-Long Ma, Ji Liu, Jian Wu National Engineering Research Center of Fundamental Software Institute of Software, Chinese Academy of Sciences Beijing, P. R. China {longlong, wujian}@iscas.ac.cn Abstract—A new online handwritten Mongolian word studied lately field. Researchers from only several institutes are database, MRG-OHMW, is introduced in this paper. This devoted to related work. Gao et al. [1] presented a statistical database contains 946 Mongolian words produced by 300 persons and structural recognition method for printed Mongolian from Mongolian ethnic minority. These Mongolian words are character recognition. Batsaikhan et al. [2] used multilayer composed of one to fourteen Mongolian characters, and selected perceptron classifier to recognize noisy Mongolian characters from large-scale Mongolian text corpus according to the with single font. Two-stage classification method based on frequencies of usage. The current version of this database is glyph segmentation was used to enhance Mongolian character collected using Anoto pen on paper. The database is further recognition performance [3]. A segmentation-based approach annotated using Mongolian word-level string alignment strategy. segments classical Mongolian words into several GUs (Glyph We partition the samples into training and test sets, and evaluate Unites) and then recognizes the GUs [4]. These GUs are the database using the CNN-based recognizer as a baseline. Experimental results reveal a big challenge to higher recognition combined to form word recognition results. Multi-font printed performance. To our knowledge, MRG-OHMW is the first Mongolian documents are recognized by integrating character publicly available database for online handwritten Mongolian segmentation and character recognition with support mixed research. -
Proposal for Generation Panel for Latin Script Label Generation Ruleset for the Root Zone
Generation Panel for Latin Script Label Generation Ruleset for the Root Zone Proposal for Generation Panel for Latin Script Label Generation Ruleset for the Root Zone Table of Contents 1. General Information 2 1.1 Use of Latin Script characters in domain names 3 1.2 Target Script for the Proposed Generation Panel 4 1.2.1 Diacritics 5 1.3 Countries with significant user communities using Latin script 6 2. Proposed Initial Composition of the Panel and Relationship with Past Work or Working Groups 7 3. Work Plan 13 3.1 Suggested Timeline with Significant Milestones 13 3.2 Sources for funding travel and logistics 16 3.3 Need for ICANN provided advisors 17 4. References 17 1 Generation Panel for Latin Script Label Generation Ruleset for the Root Zone 1. General Information The Latin script1 or Roman script is a major writing system of the world today, and the most widely used in terms of number of languages and number of speakers, with circa 70% of the world’s readers and writers making use of this script2 (Wikipedia). Historically, it is derived from the Greek alphabet, as is the Cyrillic script. The Greek alphabet is in turn derived from the Phoenician alphabet which dates to the mid-11th century BC and is itself based on older scripts. This explains why Latin, Cyrillic and Greek share some letters, which may become relevant to the ruleset in the form of cross-script variants. The Latin alphabet itself originated in Italy in the 7th Century BC. The original alphabet contained 21 upper case only letters: A, B, C, D, E, F, Z, H, I, K, L, M, N, O, P, Q, R, S, T, V and X. -
On Digitization of Romanian Cyrillic Printings of the 17Th–18Th Centuries
Computer Science Journal of Moldova, vol.25, no.2(74), 2017 On Digitization of Romanian Cyrillic Printings of the 17th–18th Centuries Svetlana Cojocaru Alexandru Colesnicov Ludmila Malahov Tudor Bumbu Ștefan Ungur Abstract The paper describes in details recognition of Romanian texts of the 17th–18th centuries printed in the Cyrillic script, and their conversion to the modern Latin script. The challenges are dis- cussed, and solutions of problems are proposed. The elaborated technology and a tool pack include historical alphabets, sets of recognition patterns, and spelling dictionaries in the corresponding orthographies for ABBYY Finereader. In addition, virtual keyboards, fonts, a transliteration utility, and the user manual were developed. This permits successful recognition of old Romanian texts in the Cyrillic script. Transliteration to the Latin script grants no- barrier access to historical documents. 1 Introduction OCR of old books is a sophisticated task. The problems arose from peculiarities of historical typography, non-standardized spelling, and physical degradation of the documents due to their aging and usage. This paper describes digitization of the Romanian texts of the 17th– 18th centuries that were printed in the old Romanian Cyrillic script (RC). We need to OCR them, to present the recognized text in the fonts of the corresponding period, to transliterate then to the modern Romanian Latin script (MRL). Sometimes the reverse transliteration is also useful. We use the existing programs and develop our own pack of additional programs and data. ©2017 by S.Cojocaru, A.Colesnicov, L.Malahov, T.Bumbu, Ș.Ungur 217 S.Cojocaru et al. In the literature, we met the opinion that commercial software like ABBYY Finereader (AFR) is not fully suitable for OCR of old print- ings. -
1 Introduction 6
Report No 170, August 1999 UNU/IIST, P.O. Box 3058, Macau 2 Report No 170, August 1999 UNU/IIST, P.O. Box 3058, Macau 3 Report No 170, August 1999 UNU/IIST, P.O. Box 3058, Macau 4 Report No 170, August 1999 UNU/IIST, P.O. Box 3058, Macau 5 Contents 1 Introduction 6 2 The Basic Character set 8 2.1 Other basic Mongolian characters 15 3 The Variant Forms 17 3.1 Overriding the Defaults 21 3.2 The Mongolian Reference Table 23 4 The Ligatures 25 5 Implementing Software for Mongolian 28 A The Mongolian Reference Table B Mongolian Ligatures Report No 170, August 1999 UNU/IIST, P.O. Box 3058, Macau 6 1 Introduction Although most countries in the world have had national standard encoding schemes for the characters of their own language or languages for some time, these could differ wildly even between countries sharing the same written language. As a result, an electronic document written using a piece of software based on a particular encoding scheme could only be read by someone possessing either software based on the same encoding scheme or software for translating between the two different encoding schemes. As the volume of international communications increased, especially the international exchange of electronic data, not least via the internet, it became clear that this situation was completely impractical and that some internationally accepted universal encoding scheme, which could form the basis for multi-lingual software, was needed. A joint technical committee (ISO/IEC JTC1) was therefore set up by the International Organization for Standardisation (ISO) and the International Electrotechnical Committee (IEC) to work on this, and, initially independently though later in collaboration with ISO/IEC, the Unicode Consortium embarked on a similar project.