UAX #24: Unicode Script Property
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Sinitic Language and Script in East Asia: Past and Present
SINO-PLATONIC PAPERS Number 264 December, 2016 Sinitic Language and Script in East Asia: Past and Present edited by Victor H. Mair Victor H. Mair, Editor Sino-Platonic Papers Department of East Asian Languages and Civilizations University of Pennsylvania Philadelphia, PA 19104-6305 USA [email protected] www.sino-platonic.org SINO-PLATONIC PAPERS FOUNDED 1986 Editor-in-Chief VICTOR H. MAIR Associate Editors PAULA ROBERTS MARK SWOFFORD ISSN 2157-9679 (print) 2157-9687 (online) SINO-PLATONIC PAPERS is an occasional series dedicated to making available to specialists and the interested public the results of research that, because of its unconventional or controversial nature, might otherwise go unpublished. The editor-in-chief actively encourages younger, not yet well established, scholars and independent authors to submit manuscripts for consideration. Contributions in any of the major scholarly languages of the world, including romanized modern standard Mandarin (MSM) and Japanese, are acceptable. In special circumstances, papers written in one of the Sinitic topolects (fangyan) may be considered for publication. Although the chief focus of Sino-Platonic Papers is on the intercultural relations of China with other peoples, challenging and creative studies on a wide variety of philological subjects will be entertained. This series is not the place for safe, sober, and stodgy presentations. Sino- Platonic Papers prefers lively work that, while taking reasonable risks to advance the field, capitalizes on brilliant new insights into the development of civilization. Submissions are regularly sent out to be refereed, and extensive editorial suggestions for revision may be offered. Sino-Platonic Papers emphasizes substance over form. -
Old Cyrillic in Unicode*
Old Cyrillic in Unicode* Ivan A Derzhanski Institute for Mathematics and Computer Science, Bulgarian Academy of Sciences [email protected] The current version of the Unicode Standard acknowledges the existence of a pre- modern version of the Cyrillic script, but its support thereof is limited to assigning code points to several obsolete letters. Meanwhile mediæval Cyrillic manuscripts and some early printed books feature a plethora of letter shapes, ligatures, diacritic and punctuation marks that want proper representation. (In addition, contemporary editions of mediæval texts employ a variety of annotation signs.) As generally with scripts that predate printing, an obvious problem is the abundance of functional, chronological, regional and decorative variant shapes, the precise details of whose distribution are often unknown. The present contents of the block will need to be interpreted with Old Cyrillic in mind, and decisions to be made as to which remaining characters should be implemented via Unicode’s mechanism of variation selection, as ligatures in the typeface, or as code points in the Private space or the standard Cyrillic block. I discuss the initial stage of this work. The Unicode Standard (Unicode 4.0.1) makes a controversial statement: The historical form of the Cyrillic alphabet is treated as a font style variation of modern Cyrillic because the historical forms are relatively close to the modern appearance, and because some of them are still in modern use in languages other than Russian (for example, U+0406 “I” CYRILLIC CAPITAL LETTER I is used in modern Ukrainian and Byelorussian). Some of the letters in this range were used in modern typefaces in Russian and Bulgarian. -
Learning Cyrillic
LEARNING CYRILLIC Question: If there is no equivalent letter in the Cyrillic alphabet for the Roman "J" or "H" how do you transcribe good German names like Johannes, Heinrich, Wilhelm, etc. I heard one suggestion that Johann was written as Ivan and that the "h" was replaced with a "g". Can you give me a little insight into what you have found? In researching would I be looking for the name Ivan rather than Johann? One must always think phonetic, that is, think how a name is pronounced in German, and how does the Russian Cyrillic script produce that sound? JOHANNES. The Cyrillic spelling begins with the letter “I – eye”, but pronounced “eee”, so we have phonetically “eee-o-hann” which sounds like “Yo-hann”. You can see it better in typeface – Иоганн , which letter for letter reads as “I-o-h-a-n-n”. The modern Typeface script is radically different than the old hand-written Cyrillic script. Use the guide which I sent to you. Ivan is the Russian equivalent of Johann, and it pops up occasionally in Church records. JOSEPH / JOSEF. Listen to the way the name is pronounced in German – “yo-sef”, also “yo-sif”. That “yo” sound is produced by the Cyrillic script letters “I” and “o”. Again you can see it in the typeface. Иосеф and also Иосиф. And sometimes Joseph appears as , transliterated as O-s-i-p. Similar to all languages and scripts, Cyrillic spellings are not consistent. The “a” ending indicates a male name. JAKOB. There is no “Jay” sound in the German language. -
Turkmen Language: Alphabet, Structure and Writing
International Journal of Multidisciplinary Research and Growth Evaluation www.allmultidisciplinaryjournal.com International Journal of Multidisciplinary Research and Growth Evaluation ISSN: 2582-7138 Received: 03-07-2021; Accepted: 19-07-2021 www.allmultidisciplinaryjournal.com Volume 2; Issue 4; July-August 2021; Page No. 663-664 Turkmen Language: Alphabet, structure and writing Homayoun Mahmoodi Lecture, Department of Turkmani, Education Faculty, Jawzjan University, Sheberghan, Afghanistan Corresponding Author: Homayoun Mahmoodi Abstract The Turkmen Language is classified as one of the Turkic people worldwide, with approximately 1 million more languages which are all in the Ural-Altaic language family. speaking it as a second language. The majority of Turkmen Like all Turkic languages, the Turkmen Language is an speakers are located in Eurasia and Central Asia, with the agglutinative language that has productive inflectional and highest concentrations located in Turkmenistan, Iran, and derivational suffixes which are affixed to a word like “beads Afghanistan. on a string”. Turkmen is spoken natively by about 7 million Keywords: Turkmen, Afghanistan, Alphabet, structure 1. Introduction Turkmen are part of the ethno-linguistic Turkic continuum that stretches from western China to western Turkey. They share a claim to western-Turkic (Oguz) heritage along with Azerbaijanis and Turks in Turkey (Golden 1972) [4]. Despite a distinct twenty-first century identity, Turkmen culture and traditions historically overlapped with that of other Turks. The term “Turkic” refers to this cultural and linguistic heritage that all Turkic-language speaking groups share. Nevertheless, a separate Turkmen sense of identity was born out of their particular historical experience. A nomadic heritage, distinctively Oguz dialects, and tribal traditions contribute greatly to the identity that sets Turkmen apart from other Turkic groups and shape a specifically Turkmen sense of ethnic identity. -
Basis Technology Unicode対応ライブラリ スペックシート 文字コード その他の名称 Adobe-Standard-Encoding A
Basis Technology Unicode対応ライブラリ スペックシート 文字コード その他の名称 Adobe-Standard-Encoding Adobe-Symbol-Encoding csHPPSMath Adobe-Zapf-Dingbats-Encoding csZapfDingbats Arabic ISO-8859-6, csISOLatinArabic, iso-ir-127, ECMA-114, ASMO-708 ASCII US-ASCII, ANSI_X3.4-1968, iso-ir-6, ANSI_X3.4-1986, ISO646-US, us, IBM367, csASCI big-endian ISO-10646-UCS-2, BigEndian, 68k, PowerPC, Mac, Macintosh Big5 csBig5, cn-big5, x-x-big5 Big5Plus Big5+, csBig5Plus BMP ISO-10646-UCS-2, BMPstring CCSID-1027 csCCSID1027, IBM1027 CCSID-1047 csCCSID1047, IBM1047 CCSID-290 csCCSID290, CCSID290, IBM290 CCSID-300 csCCSID300, CCSID300, IBM300 CCSID-930 csCCSID930, CCSID930, IBM930 CCSID-935 csCCSID935, CCSID935, IBM935 CCSID-937 csCCSID937, CCSID937, IBM937 CCSID-939 csCCSID939, CCSID939, IBM939 CCSID-942 csCCSID942, CCSID942, IBM942 ChineseAutoDetect csChineseAutoDetect: Candidate encodings: GB2312, Big5, GB18030, UTF32:UTF8, UCS2, UTF32 EUC-H, csCNS11643EUC, EUC-TW, TW-EUC, H-EUC, CNS-11643-1992, EUC-H-1992, csCNS11643-1992-EUC, EUC-TW-1992, CNS-11643 TW-EUC-1992, H-EUC-1992 CNS-11643-1986 EUC-H-1986, csCNS11643_1986_EUC, EUC-TW-1986, TW-EUC-1986, H-EUC-1986 CP10000 csCP10000, windows-10000 CP10001 csCP10001, windows-10001 CP10002 csCP10002, windows-10002 CP10003 csCP10003, windows-10003 CP10004 csCP10004, windows-10004 CP10005 csCP10005, windows-10005 CP10006 csCP10006, windows-10006 CP10007 csCP10007, windows-10007 CP10008 csCP10008, windows-10008 CP10010 csCP10010, windows-10010 CP10017 csCP10017, windows-10017 CP10029 csCP10029, windows-10029 CP10079 csCP10079, windows-10079 -
Script Recognition by Statistical Analysis of the Image Texture
X International Symposium on Industrial Electronics INDEL 2014, Banja Luka, November 0608, 2014 Script Recognition by Statistical Analysis of the Image Texture Invited Paper Darko Brodić Technical Faculty in Bor University of Belgrade Bor, Serbia [email protected] Abstract—The paper proposed an algorithm for the script The proposed approach incorporates the statistical analysis identification using the statistical analysis of the texture obtained of the texture. Texture is suitable for extracting similarities and by script mapping. First, the algorithm models the script sign by dissimilarities between images. However, the novelty of the the equivalent script type. The script type is determined by the proposed approach [8] is given by specific text modeling and position of the letter in the baseline area. Furthermore, the 1-D texture analysis. During text modeling the number of extraction of the features is performed. This step of the algorithm variables is substantially reduced. Furthermore, the image is is based on the script type occurrence and co-occurrence pattern replaced with text. Hence, the image which represents a 2-D analysis. Then, the resultant features are compared. Their image is replaced with the text given by 1-D "image". All differences simplify the script feature classification. The aforementioned contributes to the algorithm's speed. algorithm is tested on the German and Slavic printed documents incorporating different scripts. The experiment gives the results The paper is organized as follows. Section 2 describes the that are promising. proposed algorithm. Section 3 explains the experiment. Section 4 presents the results and gives the discussion. Section 5 makes Keywords - Coding, Cultural heritage, Historical documents, conclusions. -
The Fontspec Package Font Selection for XƎLATEX and Lualatex
The fontspec package Font selection for XƎLATEX and LuaLATEX Will Robertson and Khaled Hosny [email protected] 2013/05/12 v2.3b Contents 7.5 Different features for dif- ferent font sizes . 14 1 History 3 8 Font independent options 15 2 Introduction 3 8.1 Colour . 15 2.1 About this manual . 3 8.2 Scale . 16 2.2 Acknowledgements . 3 8.3 Interword space . 17 8.4 Post-punctuation space . 17 3 Package loading and options 4 8.5 The hyphenation character 18 3.1 Maths fonts adjustments . 4 8.6 Optical font sizes . 18 3.2 Configuration . 5 3.3 Warnings .......... 5 II OpenType 19 I General font selection 5 9 Introduction 19 9.1 How to select font features 19 4 Font selection 5 4.1 By font name . 5 10 Complete listing of OpenType 4.2 By file name . 6 font features 20 10.1 Ligatures . 20 5 Default font families 7 10.2 Letters . 20 6 New commands to select font 10.3 Numbers . 21 families 7 10.4 Contextuals . 22 6.1 More control over font 10.5 Vertical Position . 22 shape selection . 8 10.6 Fractions . 24 6.2 Math(s) fonts . 10 10.7 Stylistic Set variations . 25 6.3 Miscellaneous font select- 10.8 Character Variants . 25 ing details . 11 10.9 Alternates . 25 10.10 Style . 27 7 Selecting font features 11 10.11 Diacritics . 29 7.1 Default settings . 11 10.12 Kerning . 29 7.2 Changing the currently se- 10.13 Font transformations . 30 lected features . -
Proposal for Generation Panel for Latin Script Label Generation Ruleset for the Root Zone
Generation Panel for Latin Script Label Generation Ruleset for the Root Zone Proposal for Generation Panel for Latin Script Label Generation Ruleset for the Root Zone Table of Contents 1. General Information 2 1.1 Use of Latin Script characters in domain names 3 1.2 Target Script for the Proposed Generation Panel 4 1.2.1 Diacritics 5 1.3 Countries with significant user communities using Latin script 6 2. Proposed Initial Composition of the Panel and Relationship with Past Work or Working Groups 7 3. Work Plan 13 3.1 Suggested Timeline with Significant Milestones 13 3.2 Sources for funding travel and logistics 16 3.3 Need for ICANN provided advisors 17 4. References 17 1 Generation Panel for Latin Script Label Generation Ruleset for the Root Zone 1. General Information The Latin script1 or Roman script is a major writing system of the world today, and the most widely used in terms of number of languages and number of speakers, with circa 70% of the world’s readers and writers making use of this script2 (Wikipedia). Historically, it is derived from the Greek alphabet, as is the Cyrillic script. The Greek alphabet is in turn derived from the Phoenician alphabet which dates to the mid-11th century BC and is itself based on older scripts. This explains why Latin, Cyrillic and Greek share some letters, which may become relevant to the ruleset in the form of cross-script variants. The Latin alphabet itself originated in Italy in the 7th Century BC. The original alphabet contained 21 upper case only letters: A, B, C, D, E, F, Z, H, I, K, L, M, N, O, P, Q, R, S, T, V and X. -
On Digitization of Romanian Cyrillic Printings of the 17Th–18Th Centuries
Computer Science Journal of Moldova, vol.25, no.2(74), 2017 On Digitization of Romanian Cyrillic Printings of the 17th–18th Centuries Svetlana Cojocaru Alexandru Colesnicov Ludmila Malahov Tudor Bumbu Ștefan Ungur Abstract The paper describes in details recognition of Romanian texts of the 17th–18th centuries printed in the Cyrillic script, and their conversion to the modern Latin script. The challenges are dis- cussed, and solutions of problems are proposed. The elaborated technology and a tool pack include historical alphabets, sets of recognition patterns, and spelling dictionaries in the corresponding orthographies for ABBYY Finereader. In addition, virtual keyboards, fonts, a transliteration utility, and the user manual were developed. This permits successful recognition of old Romanian texts in the Cyrillic script. Transliteration to the Latin script grants no- barrier access to historical documents. 1 Introduction OCR of old books is a sophisticated task. The problems arose from peculiarities of historical typography, non-standardized spelling, and physical degradation of the documents due to their aging and usage. This paper describes digitization of the Romanian texts of the 17th– 18th centuries that were printed in the old Romanian Cyrillic script (RC). We need to OCR them, to present the recognized text in the fonts of the corresponding period, to transliterate then to the modern Romanian Latin script (MRL). Sometimes the reverse transliteration is also useful. We use the existing programs and develop our own pack of additional programs and data. ©2017 by S.Cojocaru, A.Colesnicov, L.Malahov, T.Bumbu, Ș.Ungur 217 S.Cojocaru et al. In the literature, we met the opinion that commercial software like ABBYY Finereader (AFR) is not fully suitable for OCR of old print- ings. -
Glagolitic Script in the UCS Source: Michael Everson, EGT (IE) Status: Expert Contribution Action: for Consideration by JTC1/SC2/WG2 and UTC Date: 1998-11-24
ISO/IEC JTC1/SC2/WG2 N1931 1998-11-24 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation Œåæäóíàðîäíàß îðãàíèçàöèß ïî ñòàíäàðòèçàöèè Doc Type: Working Group Document Title: Revised proposal for encoding the Glagolitic script in the UCS Source: Michael Everson, EGT (IE) Status: Expert Contribution Action: For consideration by JTC1/SC2/WG2 and UTC Date: 1998-11-24 This document is based on the proposal written by Joe Becker and published in UTR#3, and the proposal written by me in N1659. It is a revision of N1659, and contains the proposal summary. A. Administrative 1. Title Revised proposal for encoding the Glagolitic script in the UCS. 2. Requester’s name Michael Everson, EGT (WG2 member for Ireland). 3. Requester type Expert contribution. 4. Submission date 1998-11-24. 5. Requester’s reference 6a. Completion This is a complete proposal. 6b. More information to be provided? No. B. Technical – General 1a. New script? Name? Yes. Glagolitic. 1b. Addition of characters to existing block? Name? No. 2. Number of characters 92. 3. Proposed category Category B.1. 4. Proposed level of implementation and rationale As an alphabetic script, Glagolitic requires Level 1. 5a. Character names included in proposal? Yes. 5b. Character names in accordance with guidelines? Yes. 5c. Character shapes reviewable? Yes (see below). 1 6a. Who will provide computerized font? Michael Everson. 6b. Font currently available? Yes. 6c. Font format? TrueType. 7a. Are references (to other character sets, dictionaries, descriptive texts, etc.) provided? Yes. ISO 6861:1996 is a coded character set for Glagolitic, as well as the bibliography below. -
Greek Type Design Introduction
A primer on Greek type design by Gerry Leonidas T the 1997 ATypI Conference at Reading I gave a talk Awith the title ‘Typography & the Greek language: designing typefaces in a cultural context.’ The inspiration for that talk was a discussion with Christopher Burke on designing typefaces for a script one is not linguistically familiar with. My position was that knowledge and use of a language is not a prerequisite for understanding the script to a very high, if not conclusive, degree. In other words, although a ‘typographically attuned’ native user should test a design in real circumstances, any designer could, with the right preparation and monitoring, produce competent typefaces. This position was based on my understanding of the decisions a designer must make in designing a Greek typeface. I should add that this argument had two weak points: one, it was based on a small amount of personal experience in type design and a lot of intuition, rather than research; and, two, it was quite possible that, as a Greek, I was making the ‘right’ choices by default. Since 1997, my own work proved me right on the first point, and that of other designers – both Greeks and non- Greeks – on the second. The last few years saw multilingual typography literally explode. An obvious arena was the broader European region: the Amsterdam Treaty of 1997 which, at the same time as bringing the European Union closer to integration on a number of fields, marked a heightening of awareness in cultural characteristics, down to an explicit statement of support for dialects and local script variations. -
Latin Alphabet for Kazakhstan: Turkification, Westernisation Or Useless Initiative?
Latin Alphabet for Kazakhstan: Turkification, Westernisation or Useless Initiative? By 2025, five out of six Turkic-speaking countries will be using Latin alphabet. It’s the year when Kazakhstan will switch from Cyrillic to Latin alphabet. Follow us on LinkedIn The reasons for this switch and the need for it are diverse. Some experts think this decision highlights the cooling in relations between Kazakhstan and Russia, a desire to distance away from the Soviet past and to focus on the West. Others say this issue should not be politicised and explain the switch to Latin alphabet by the demand of the modern world and the need for cultural self-identity. Back in the Soviet period, the Kazakh alphabet changed twice. In 1929, it was changed to Latin alphabet and in 1940 to Cyrillic alphabet. The initiator of the language reform in Kazakhstan in 2017 was president Nursultan Nazarbayev, who claimed that introduction of the Latin alphabet was a step toward integration with the common developing information world and the global system of science and culture. He denied categorically a version of changes in the geopolitical preferences of the country: Someone sees it, without cause, as a kind of a “sign” of changes in the -geopolitical preferences of Kazakhstan. This is not the case. I can tell it clearly that a switch to Latin alphabet is a domestic need for development and modernisation of the Kazakh language. You should not look for a black cat in a coal cellar because it was never there.” (cited from Tengrinews) The implementation of this project will cost much to the national budget.