The Non-Latin Scripts & Typography

Total Page:16

File Type:pdf, Size:1020Kb

The Non-Latin Scripts & Typography TUGboat, Volume 41 (2020), No. 3 275 The Non-Latin scripts & typography group includes Tamil, Telugu, Kannada, Malayalam, and Sinhala. In Southeast Asia, we find more dis- Kamal Mansour tant descendants of Brahmi in Thai, Lao, Khmer, 1 Introduction and Myanmar scripts. As close relatives of Latin, the Greek and Cyril- It is quite common in typographic terminology to lic scripts had more time to develop typographically divide the world’s scripts into Latin and non-Latins. than other non-Latins. In its earliest typographic At first glance that might seem to be a reasonable forms, Greek initially imitated cursive scribal form categorization until one looks a bit closer to find before eventually settling on printed forms based on that non-Latin consists of a huge number of diverse its classic origins. Classical and Biblical studies fur- scripts. Imagine if we were to call Latin script blue, ther established the need for Greek typefaces in the while calling all others non-blue. We would be gath- late nineteenth and early twentieth centuries. ering the remaining colors of the spectrum into one Cyrillic, originally derived from Greek, was ex- group without differentiation. Calling a color non- tensively revised and “europeanized” under the hand blue doesn’t tell us anything about what it might of Peter the Great. Then early in the twentieth cen- be; is it orange, green, magenta, indigo? tury, Cyrillic was further simplified by purging it of However strange this terminology may be, there unnecessary vestiges from Greek orthography and re- is good reason behind it. Gutenberg invented move- dundant characters. In an effort to further unite the able type for Latin script; in particular, he focused huge territory of the Soviet Union, the government on a certain Gothic style used at the time by Ger- extended Cyrillic script to support the many non- man scribes. We know things didn’t stop there. The Slavic languages of the various republics. Outside printed forms of Latin script moved away from the the Soviet Union, a certain number of Cyrillic type- handwritten forms over time, evolving into a dis- faces were always available through the mainstream tinct craft and discipline. In the process, the letter type foundries to satisfy the needs of academics in forms underwent considerable simplification. By the Slavic Studies and of emigrant communities. time typographic letters began to develop for other Because of their common shapes and behavior, scripts, Latin type had reached a maturity possible Greek and Cyrillic have always been easily accommo- only through time. Over the centuries, the machin- dated on the same typesetting equipment as Latin ery and techniques had been refined for the needs of script. Of course, the compositor had to be familiar Latin type and any newcomers to the game needed with the script he was setting, with Polytonic Greek to adapt to the current state of things. being the most demanding because of the variety of In the realm of non-Latin scripts, what are the diacritics. main groups and families? Moving eastward from When it comes to typography, every script has the origins of Latin script, we find Greek script, and its intricate details, and consequently, its own story further east, its close relative, Cyrillic. Hovering of transition from handwriting to type. It would over the Caucasus, Armenian and Georgian raise be interesting to tell every story, but that would be their heads. Reaching from North Africa eastward, enough to fill many books. Without going into every Arabic script occupies a large swath of the land- detail, we can gain some understanding by survey- scape that reaches as far as India. Near the juncture ing the types of problems that arose when adapting of Western Asia and North Africa, Hebrew has its typesetting technology developed for Latin script to home. In East Africa, Ethiopic script — in ancient non-Latin scripts. Because scripts have developed times know as Ge’ez — has a sizable presence. Over as families, they share many attributes with other the large territory of China, as well as in Taiwan, members of the family. We can take advantage Chinese is the dominant script. Hangul, a syllabic of the similarities to identify recurring obstacles in script with some Chinese visual features, dominates the typographic development of non-Latin scripts. in Korea. Meanwhile, further south in Japan, rules a By citing judiciously chosen examples from a few unique hybrid writing system consisting of two syl- scripts, we can readily present the gamut of signifi- labaries (katakana and hiragana), Chinese charac- cant difficulties. ters, in addition to Latin. The Indian subcontinent Before delving into the range of technical chal- is home for a large family of scripts descended from lenges, we must first recognize the most common ancient Brahmi writing. Over the centuries, the de- hurdle shared by all who endeavored to bring a non- scendants split into two groups, northern and south- Latin script into the typographic age: the fear of the ern scripts. In the northern group, Devanagari, Ben- exotic. Typography developed in Europe and it was gali, and Gujarati scripts stand out. The southern The Non-Latin scripts & typography 276 TUGboat, Volume 41 (2020), No. 3 Europeans who first strove to take this craft to dis- The orthographic norms of Khmer script represent tant lands. Before the advent of sociology, anthro- the range of simple to complex syllables by allowing pology, and linguistics, non-European cultures were its components to stack both horizontally and ver- seen as foreign, exotic, and difficult to fathom. Be- tically. Like Devanagari, the simplest syllable con- fore understanding the written form of a language, sists of a consonant with an implied or explicit vowel, one must first learn the spoken language and, to but because of the tonal nature of Khmer, marks in- some extent, understand its culture. dicating tone are sometimes also required. Khmer To explore the gamut of typographic challenges, has the same needs as Devanagari for upper and we will visit four scripts: Devanagari, Khmer, Ara- lower marks, which result in more height and depth bic, and Chinese. In terms of visual impression and for syllable clusters. While Devanagari allows multi- structure, each of these scripts differs greatly from consonant syllables, it also allows them to stack hor- the others. izontally, if need be. On the other hand, Khmer requires the components of a multi-consonant sylla- 2 Devanagari ble cluster to stack vertically. We begin with Devanagari — the most commonly Following is an example of a relatively simple used script in contemporary India — that is used syllable cluster consisting of a single consonant no primarily for the Hindi and Marathi languages. De- followed by a vowel mark i above (u1793 + u17B7): vanagari has a long history of development that be- gan with the Sanskrit language centuries ago. The ន + ◌ −! ន structure of Devanagari writing is based on the sylla- ិ ិ ble which, in its simplest form, can be composed of As an example where additional depth is re- a consonant with either an inherent or explicit vowel quired, the following syllable cluster consists of a mark. As an example, let us consider the letter ka, single consonantal letter followed by a subscript con- sonant that sits to its right while partially overlap- which when written as क, includes the implied vowel a. If we want to write kī, we need to explicitly add ping it below [pha + subscript sa ]. The use of a subscript form indicates that the preceding conso- the ī vowel ◌ी to ka क, resulting in क. Presented at a larger size, we can clearly show how the two nant is not followed by a vowel (u1795 + u17D2 + components of the kī syllable overlap on the top: u179F): −! ផ ◌្ ស −! ផ䮟 क + ◌ी क Now, if we want to write ku instead of kī, the u The character u17D2 indicates that the follow- vowel is placed below ka as follows: ing consonantal symbol should appear in its sub- script form. There are also deeper cases where a vowel sym- Marks other than vowels can also appear above bol ie surrounds and subtends a consonant ko fol- and below letters; for instance, if we want to indicate lowed by a subscript consonant lo (u1783 + u17D2 that the vowel in ka sounds nasal, we would add a + u179B + u17C0). Note that the right part of ie dot above (natively, anusvara) as in: is stretched to embrace the stacked consonants (ko+ subscript lo + ie): कं Since Latin type was usually composed on a ឃ ល េ◌ៀ −! េឃៀ䮛 single horizontal line with gaps between letters, De- vanagari presented quite a challenge because of its The above sequence could also carry an addi- need for overlap between adjacent characters, in ad- tional mark above it. Unlike Latin letters, Khmer dition to upper and lower marks. characters have a predominantly vertical aspect ra- tio when arranged into syllable clusters. One can 3 Khmer imagine the numerous challenges posed by Khmer Next, we consider Khmer, or Cambodian, writing. script for those who wanted to implement it in metal As a remote descendant of the Brahmi script, the type. How does one stack so many vertical elements structure of Khmer is also broadly based on the syl- along the baseline? If Latin and Khmer letters are lable, including the presence of an implied or explicit put side by side, what sort of size ratio should be vowel; however, the phonetic nature of the Khmer maintained between the two? Does one resort to us- language has also resulted in many additional fea- ing the largest leading needed throughout one docu- tures that go beyond the ancestral Brahmi model.
Recommended publications
  • A Method to Accommodate Backward Compatibility on the Learning Application-Based Transliteration to the Balinese Script
    (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 6, 2021 A Method to Accommodate Backward Compatibility on the Learning Application-based Transliteration to the Balinese Script 1 3 4 Gede Indrawan , I Gede Nurhayata , Sariyasa I Ketut Paramarta2 Department of Computer Science Department of Balinese Language Education Universitas Pendidikan Ganesha (Undiksha) Universitas Pendidikan Ganesha (Undiksha) Singaraja, Indonesia Singaraja, Indonesia Abstract—This research proposed a method to accommodate transliteration rules (for short, the older rules) from The backward compatibility on the learning application-based Balinese Alphabet document 1 . It exposes the backward transliteration to the Balinese Script. The objective is to compatibility method to accommodate the standard accommodate the standard transliteration rules from the transliteration rules (for short, the standard rules) from the Balinese Language, Script, and Literature Advisory Agency. It is Balinese Language, Script, and Literature Advisory Agency considered as the main contribution since there has not been a [7]. This Bali Province government agency [4] carries out workaround in this research area. This multi-discipline guidance and formulates programs for the maintenance, study, collaboration work is one of the efforts to preserve digitally the development, and preservation of the Balinese Language, endangered Balinese local language knowledge in Indonesia. The Script, and Literature. proposed method covered two aspects, i.e. (1) Its backward compatibility allows for interoperability at a certain level with This study was conducted on the developed web-based the older transliteration rules; and (2) Breaking backward transliteration learning application, BaliScript, for further compatibility at a certain level is unavoidable since, for the same ubiquitous Balinese Language learning since the proposed aspect, there is a contradictory treatment between the standard method reusable for the mobile application [8], [9].
    [Show full text]
  • 75 Characters Maximum
    Kannada Script LGR Proposal Introduction, Current Analysis and Next Steps Dr. U.B. Pavanaja NBGP F2F Meeting, Colombo 14 December 2017 | 1 Agenda 1 2 3 Introduction to Repertoire Analysis Within Script Kannada Script Variants 4 5 6 Cross-Script WLE Rules Current Status and Variants Next Steps for Completion | 2 Introduction to Kannada Script Population – there are about 60 million speakers of Kannada language which uses Kannada script. Geographical area - Kannada is spoken predominantly by the people of Karnataka State of India. It is also spoken by significant linguistic minorities in the states of Andhra Pradesh, Telangana, Tamil Nadu, Maharashtra, Kerala, Goa and abroad Languages written in Kannada script – Kannada, Tulu, Kodava (Coorgi), Konkani, Havyaka, Sanketi, Beary (byaari), Arebaase, Koraga | 3 Classification of Characters Swaras (vowels) Letter ಅ ಆ ಇ ಈ ಉ ಊ ಋ ಎ ಏ ಐ ಒ ಓ ಔ Vowel sign/ N/Aಾ ಾ ಾ ಾ ಾ ಾ ಾ ಾ ಾ ಾ ಾ ಾ matra Yogavahas In Kannada, all consonants Anusvara ಅಂ (vyanjanas) when written as ಕ (ka), ಖ (kha), ಗ (ga), etc. actually have a built-in vowel sign (matra) Visarga ಅಃ of vowel ಅ (a) in them. | 4 Classification of Characters Vargeeya vyanjana (structured consonants) voiceless voiceless aspirate voiced voiced aspirate nasal Velars ಕ ಖ ಗ ಘ ಙ Palatals ಚ ಛ ಜ ಝ ಞ Retroflex ಟ ಠ ಡ ಢ ಣ Dentals ತ ಥ ದ ಧ ನ Labials ಪ ಫ ಬ ಭ ಮ Avargeeya vyanjana (unstructured consonants) ಯ ರ ಱ (obsolete) ಲ ವ ಶ ಷ ಸ ಹ ಳ ೞ (obsolete) | 5 Repertoire Included-1 Sr. Unicode Glyph Character Name Unicode Indic Ref Widespread No. Code General Syllabic use ? Point Category Category [Yes/No] 1 0C82 ಂ KANNADA SIGN ANUSVARA Mc Anusvara Yes 2 0C83 ಂ KANNADA SIGN VISARGA Mc Visarga Yes 3 0C85 ಅ KANNADA LETTER A Lo Vowel Yes 4 0C86 ಆ KANNADA LETTER AA Lo Vowel Yes 5 0C87 ಇ KANNADA LETTER I Lo Vowel Yes 6 0C88 ಈ KANNADA LETTER II Lo Vowel Yes 7 0C89 ಉ KANNADA LETTER U Lo Vowel Yes 8 0C8A ಊ KANNADA LETTER UU Lo Vowel Yes KANNADA LETTER VOCALIC 9 0C8B ಋ R Lo Vowel Yes 10 0C8E ಎ KANNADA LETTER E Lo Vowel Yes | 6 Repertoire Included-2 Sr.
    [Show full text]
  • Grammatica Et Verba Glamor and Verve
    “GippertOVprint” — 2014/4/24 — 22:14 — page 81 —#1 Grammatica et verba Glamor and verve Studies in South Asian, historical, and Indo-European linguistics in honor of Hans Henrich Hock on the occasion of his seventy-fifth birthday edited by Shu-Fen Chen and Benjamin Slade Beech Stave Press Ann Arbor New York • “GippertOVprint” — 2014/4/24 — 22:14 — page 82 —#2 © Beech Stave Press, Inc. All rights reserved. No part of this publication may be reproduced, translated, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission from the publisher. Typeset with LATEX using the Galliard typeface designed by Matthew Carter and Greek Old Face by Ralph Hancock. The typeface on the cover is Post Hock by Steve Peter. Library of Congress Cataloging-in-Publication Data Grammatica et verba : glamor and verve : studies in South Asian, historical, and Indo- European linguistics in honor of Hans Henrich Hock on the occasion of his seventy- fifth birthday / edited by Shu-Fen Chen and Benjamin Slade. pages cm Includes bibliographical references. ISBN ---- (alk. paper) . Indo-European languages. Lexicography. Historical linguistics. I. Hock, Hans Henrich, - honoree. II. Chen, Shu-Fen, editor of compilation. III. Slade, Ben- jamin, editor of compilation. –dc Printed in the United States of America “GippertOVprint” — 2014/4/24 — 22:14 — page 83 —#3 TableHHHHHH ofHHHHHHHHHHHHHHH ContentsHHHHHHHHHHHH Preface . vii Bibliography of Hans Henrich Hock . ix List of Contributors . xxi , Traces of Archaic Human Language Structure Anvita Abbi in the Great Andamanese Language. , A Study of Punctuation Errors in the Chinese Diamond Sutra Shu-Fen Chen Based on Sanskrit Texts .
    [Show full text]
  • Control of Equilibrium Phases (M,T,S) in the Modified Aluminum Alloy
    Materials Transactions, Vol. 44, No. 1 (2003) pp. 181 to 187 #2003 The Japan Institute of Metals EXPRESS REGULAR ARTICLE Control of Equilibrium Phases (M,T,S) in the Modified Aluminum Alloy 7175 for Thick Forging Applications Seong Taek Lim1;*, Il Sang Eun2 and Soo Woo Nam1 1Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology, 373-1Guseong-dong, Yuseong-gu, Daejeon, 305-701,Korea 2Agency for Defence Development, P.O. Box 35-5, Yuseong-gu, Daejeon, 305-600, Korea Microstructural evolutions, especially for the coarse equilibrium phases, M-, T- and S-phase, are investigated in the modified aluminum alloy 7175 during the primary processing of large ingot for thick forging applications. These phases are evolved depending on the constitutional effect, primarily the change of Zn:Mg ratio, and cooling rate following solutionizing. The formation of the S-phase (Al2CuMg) is effectively inhibited by higher Zn:Mg ratio rather than higher solutionizing temperature. The formation of M-phase (MgZn2) and T-phase (Al2Mg3Zn3)is closely related with both constitution of alloying elements and cooling rate. Slow cooling after homogenization promotes the coarse precipitation of the M- and T-phases, but becomes less effective as the Zn:Mg ratio increases. In any case, the alloy with higher Zn:Mg ratio is basically free of both T and S-phases. The stability of these phases is discussed in terms of ternary and quaternary phase diagrams. In addition, the modified alloy, Al–6Zn–2Mg–1.3%Cu, has greatly reduced quench sensitivity through homogeneous precipitation, which is uniquely applicable in 7175 thick forgings.
    [Show full text]
  • Technical Reference Manual for the Standardization of Geographical Names United Nations Group of Experts on Geographical Names
    ST/ESA/STAT/SER.M/87 Department of Economic and Social Affairs Statistics Division Technical reference manual for the standardization of geographical names United Nations Group of Experts on Geographical Names United Nations New York, 2007 The Department of Economic and Social Affairs of the United Nations Secretariat is a vital interface between global policies in the economic, social and environmental spheres and national action. The Department works in three main interlinked areas: (i) it compiles, generates and analyses a wide range of economic, social and environmental data and information on which Member States of the United Nations draw to review common problems and to take stock of policy options; (ii) it facilitates the negotiations of Member States in many intergovernmental bodies on joint courses of action to address ongoing or emerging global challenges; and (iii) it advises interested Governments on the ways and means of translating policy frameworks developed in United Nations conferences and summits into programmes at the country level and, through technical assistance, helps build national capacities. NOTE The designations employed and the presentation of material in the present publication do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The term “country” as used in the text of this publication also refers, as appropriate, to territories or areas. Symbols of United Nations documents are composed of capital letters combined with figures. ST/ESA/STAT/SER.M/87 UNITED NATIONS PUBLICATION Sales No.
    [Show full text]
  • 214 CHAPTER 5 [T, D]-DELETION in ENGLISH in English, a Coronal Stop
    CHAPTER 5 [t, d]-DELETION IN ENGLISH In English, a coronal stop that appears as last member of a word-final consonant cluster is subject to variable deletion – i.e. a word such as west can be pronounced as either [wEst] or [wEs]. Over the past thirty five years, this phenomenon has been studied in more detail than probably any other variable phonological phenomenon. Final [t, d]-deletion has been studied in dialects as diverse as the following: African American English (AAE) in New York City (Labov et al., 1968), in Detroit (Wolfram, 1969), and in Washington (Fasold, 1972), Standard American English in New York and Philadelphia (Guy, 1980), Chicano English in Los Angeles (Santa Ana, 1991), Tejano English in San Antonio (Bayley, 1995), Jamaican English in Kingston (Patrick, 1991) and Trinidadian English (Kang, 1 1994), etc.TP PT Two aspects that stand out from all these studies are (i) that this process is strongly grammatically conditioned, and (ii) that the grammatical factors that condition this process are the same from dialect to dialect. Because of these two facts [t, d]-deletion is particularly suited to a grammatical analysis. In this chapter I provide an analysis for this phenomenon within the rank-ordering model of EVAL. The factors that influence the likelihood of application of [t, d]-deletion can be classified into three broad categories: the following context (is the [t, d] followed by a consonant, vowel or pause), the preceding context (the phonological features of the consonant preceding the [t, d]), the grammatical status of the [t, d] (is it part of the root or 1 TP PT This phenomenon has also been studied in Dutch – see Schouten (1982, 1984) and Hinskens (1992, 1996).
    [Show full text]
  • ISO Basic Latin Alphabet
    ISO basic Latin alphabet The ISO basic Latin alphabet is a Latin-script alphabet and consists of two sets of 26 letters, codified in[1] various national and international standards and used widely in international communication. The two sets contain the following 26 letters each:[1][2] ISO basic Latin alphabet Uppercase Latin A B C D E F G H I J K L M N O P Q R S T U V W X Y Z alphabet Lowercase Latin a b c d e f g h i j k l m n o p q r s t u v w x y z alphabet Contents History Terminology Name for Unicode block that contains all letters Names for the two subsets Names for the letters Timeline for encoding standards Timeline for widely used computer codes supporting the alphabet Representation Usage Alphabets containing the same set of letters Column numbering See also References History By the 1960s it became apparent to thecomputer and telecommunications industries in the First World that a non-proprietary method of encoding characters was needed. The International Organization for Standardization (ISO) encapsulated the Latin script in their (ISO/IEC 646) 7-bit character-encoding standard. To achieve widespread acceptance, this encapsulation was based on popular usage. The standard was based on the already published American Standard Code for Information Interchange, better known as ASCII, which included in the character set the 26 × 2 letters of the English alphabet. Later standards issued by the ISO, for example ISO/IEC 8859 (8-bit character encoding) and ISO/IEC 10646 (Unicode Latin), have continued to define the 26 × 2 letters of the English alphabet as the basic Latin script with extensions to handle other letters in other languages.[1] Terminology Name for Unicode block that contains all letters The Unicode block that contains the alphabet is called "C0 Controls and Basic Latin".
    [Show full text]
  • Sinitic Language and Script in East Asia: Past and Present
    SINO-PLATONIC PAPERS Number 264 December, 2016 Sinitic Language and Script in East Asia: Past and Present edited by Victor H. Mair Victor H. Mair, Editor Sino-Platonic Papers Department of East Asian Languages and Civilizations University of Pennsylvania Philadelphia, PA 19104-6305 USA [email protected] www.sino-platonic.org SINO-PLATONIC PAPERS FOUNDED 1986 Editor-in-Chief VICTOR H. MAIR Associate Editors PAULA ROBERTS MARK SWOFFORD ISSN 2157-9679 (print) 2157-9687 (online) SINO-PLATONIC PAPERS is an occasional series dedicated to making available to specialists and the interested public the results of research that, because of its unconventional or controversial nature, might otherwise go unpublished. The editor-in-chief actively encourages younger, not yet well established, scholars and independent authors to submit manuscripts for consideration. Contributions in any of the major scholarly languages of the world, including romanized modern standard Mandarin (MSM) and Japanese, are acceptable. In special circumstances, papers written in one of the Sinitic topolects (fangyan) may be considered for publication. Although the chief focus of Sino-Platonic Papers is on the intercultural relations of China with other peoples, challenging and creative studies on a wide variety of philological subjects will be entertained. This series is not the place for safe, sober, and stodgy presentations. Sino- Platonic Papers prefers lively work that, while taking reasonable risks to advance the field, capitalizes on brilliant new insights into the development of civilization. Submissions are regularly sent out to be refereed, and extensive editorial suggestions for revision may be offered. Sino-Platonic Papers emphasizes substance over form.
    [Show full text]
  • Arabic Alphabet - Wikipedia, the Free Encyclopedia Arabic Alphabet from Wikipedia, the Free Encyclopedia
    2/14/13 Arabic alphabet - Wikipedia, the free encyclopedia Arabic alphabet From Wikipedia, the free encyclopedia َأﺑْ َﺠ ِﺪﯾﱠﺔ َﻋ َﺮﺑِﯿﱠﺔ :The Arabic alphabet (Arabic ’abjadiyyah ‘arabiyyah) or Arabic abjad is Arabic abjad the Arabic script as it is codified for writing the Arabic language. It is written from right to left, in a cursive style, and includes 28 letters. Because letters usually[1] stand for consonants, it is classified as an abjad. Type Abjad Languages Arabic Time 400 to the present period Parent Proto-Sinaitic systems Phoenician Aramaic Syriac Nabataean Arabic abjad Child N'Ko alphabet systems ISO 15924 Arab, 160 Direction Right-to-left Unicode Arabic alias Unicode U+0600 to U+06FF range (http://www.unicode.org/charts/PDF/U0600.pdf) U+0750 to U+077F (http://www.unicode.org/charts/PDF/U0750.pdf) U+08A0 to U+08FF (http://www.unicode.org/charts/PDF/U08A0.pdf) U+FB50 to U+FDFF (http://www.unicode.org/charts/PDF/UFB50.pdf) U+FE70 to U+FEFF (http://www.unicode.org/charts/PDF/UFE70.pdf) U+1EE00 to U+1EEFF (http://www.unicode.org/charts/PDF/U1EE00.pdf) Note: This page may contain IPA phonetic symbols. Arabic alphabet ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع en.wikipedia.org/wiki/Arabic_alphabet 1/20 2/14/13 Arabic alphabet - Wikipedia, the free encyclopedia غ ف ق ك ل م ن ه و ي History · Transliteration ء Diacritics · Hamza Numerals · Numeration V · T · E (//en.wikipedia.org/w/index.php?title=Template:Arabic_alphabet&action=edit) Contents 1 Consonants 1.1 Alphabetical order 1.2 Letter forms 1.2.1 Table of basic letters 1.2.2 Further notes
    [Show full text]
  • Old Cyrillic in Unicode*
    Old Cyrillic in Unicode* Ivan A Derzhanski Institute for Mathematics and Computer Science, Bulgarian Academy of Sciences [email protected] The current version of the Unicode Standard acknowledges the existence of a pre- modern version of the Cyrillic script, but its support thereof is limited to assigning code points to several obsolete letters. Meanwhile mediæval Cyrillic manuscripts and some early printed books feature a plethora of letter shapes, ligatures, diacritic and punctuation marks that want proper representation. (In addition, contemporary editions of mediæval texts employ a variety of annotation signs.) As generally with scripts that predate printing, an obvious problem is the abundance of functional, chronological, regional and decorative variant shapes, the precise details of whose distribution are often unknown. The present contents of the block will need to be interpreted with Old Cyrillic in mind, and decisions to be made as to which remaining characters should be implemented via Unicode’s mechanism of variation selection, as ligatures in the typeface, or as code points in the Private space or the standard Cyrillic block. I discuss the initial stage of this work. The Unicode Standard (Unicode 4.0.1) makes a controversial statement: The historical form of the Cyrillic alphabet is treated as a font style variation of modern Cyrillic because the historical forms are relatively close to the modern appearance, and because some of them are still in modern use in languages other than Russian (for example, U+0406 “I” CYRILLIC CAPITAL LETTER I is used in modern Ukrainian and Byelorussian). Some of the letters in this range were used in modern typefaces in Russian and Bulgarian.
    [Show full text]
  • Shahmukhi to Gurmukhi Transliteration System: a Corpus Based Approach
    Shahmukhi to Gurmukhi Transliteration System: A Corpus based Approach Tejinder Singh Saini1 and Gurpreet Singh Lehal2 1 Advanced Centre for Technical Development of Punjabi Language, Literature & Culture, Punjabi University, Patiala 147 002, Punjab, India [email protected] http://www.advancedcentrepunjabi.org 2 Department of Computer Science, Punjabi University, Patiala 147 002, Punjab, India [email protected] Abstract. This research paper describes a corpus based transliteration system for Punjabi language. The existence of two scripts for Punjabi language has created a script barrier between the Punjabi literature written in India and in Pakistan. This research project has developed a new system for the first time of its kind for Shahmukhi script of Punjabi language. The proposed system for Shahmukhi to Gurmukhi transliteration has been implemented with various research techniques based on language corpus. The corpus analysis program has been run on both Shahmukhi and Gurmukhi corpora for generating statistical data for different types like character, word and n-gram frequencies. This statistical analysis is used in different phases of transliteration. Potentially, all members of the substantial Punjabi community will benefit vastly from this transliteration system. 1 Introduction One of the great challenges before Information Technology is to overcome language barriers dividing the mankind so that everyone can communicate with everyone else on the planet in real time. South Asia is one of those unique parts of the world where a single language is written in different scripts. This is the case, for example, with Punjabi language spoken by tens of millions of people but written in Indian East Punjab (20 million) in Gurmukhi script (a left to right script based on Devanagari) and in Pakistani West Punjab (80 million), written in Shahmukhi script (a right to left script based on Arabic), and by a growing number of Punjabis (2 million) in the EU and the US in the Roman script.
    [Show full text]
  • Learning Cyrillic
    LEARNING CYRILLIC Question: If there is no equivalent letter in the Cyrillic alphabet for the Roman "J" or "H" how do you transcribe good German names like Johannes, Heinrich, Wilhelm, etc. I heard one suggestion that Johann was written as Ivan and that the "h" was replaced with a "g". Can you give me a little insight into what you have found? In researching would I be looking for the name Ivan rather than Johann? One must always think phonetic, that is, think how a name is pronounced in German, and how does the Russian Cyrillic script produce that sound? JOHANNES. The Cyrillic spelling begins with the letter “I – eye”, but pronounced “eee”, so we have phonetically “eee-o-hann” which sounds like “Yo-hann”. You can see it better in typeface – Иоганн , which letter for letter reads as “I-o-h-a-n-n”. The modern Typeface script is radically different than the old hand-written Cyrillic script. Use the guide which I sent to you. Ivan is the Russian equivalent of Johann, and it pops up occasionally in Church records. JOSEPH / JOSEF. Listen to the way the name is pronounced in German – “yo-sef”, also “yo-sif”. That “yo” sound is produced by the Cyrillic script letters “I” and “o”. Again you can see it in the typeface. Иосеф and also Иосиф. And sometimes Joseph appears as , transliterated as O-s-i-p. Similar to all languages and scripts, Cyrillic spellings are not consistent. The “a” ending indicates a male name. JAKOB. There is no “Jay” sound in the German language.
    [Show full text]