Full Cyrillic: How Many Languages?
Olga G. Lapko
Mir Publishers
2 Pervy Rizhsky per., Moscow, 129820, Russia
[email protected]

TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting

A Brief History of Cyrillic

Slavonic writing was invented by St. Cyril and St. Methodius. Two Slavonic scripts are well known today: Glagolitic and Cyrillic.

Historians are not sure whether St. Cyril was the author of both, or whether the Cyrillic script was created by St. Methodius while St. Cyril invented the Glagolitic alphabet. In any case, this paper deals with the alphabet nowadays called "Cyrillic."

The birthday of Cyrillic is considered to be the end of May 863. May 24 was declared by UNESCO as the day of Cyrillic. By coincidence, the first conference of the Cyrillic TeX Users Group, CyrTUG, was held May 24–25, 1991.

The Cyrillic alphabet is based on the Greek alphabet. There were 43 letters in this alphabet. Up until the beginning of the 20th century, four additional letters existed; these are absent in the modern Russian alphabet: ‘ѵ’, ‘ѣ’, ‘ѳ’, and ‘і’.

Nowadays Cyrillic script is used not only by Slavonic people, but also by other nations of the former USSR. Historically, many of these nations used other scripts. Some Soviet republics, such as the Central Asian republics, Azerbaijan, and the Russian autonomous republics, used the Arabic script. In Siberia, the old Mongolian vertical script was used for the Buryat language, and the Dzayapandin vertical script was used for the Kalmyk language. Soon after the October Revolution many languages started to use the Latin script with additional letters. In Abkhazia, the Georgian alphabet was used for a few years. At the end of the 1920s and in the 1930s, almost all languages of the USSR changed from the Latin script to Cyrillic. In many languages new letters were created (see fig. 1).

Outside Russia, Cyrillic is used in Bulgaria, Serbia, Macedonia and Mongolia. The Bulgarian language now uses only letters of the modern Russian alphabet, but earlier ‘ѣ’ and ‘ѫ’ were also used. In Macedonian and Serbian the following additional letters are used: ‘ј’, ‘љ’, ‘њ’, with ‘ѓ’, ‘ќ’, ‘ѕ’ in Macedonian only, and ‘џ’, ‘ђ’, ‘ћ’ in Serbian. The Mongolian language uses Russian letters with two additions: ‘ө’, ‘ү’.

One may also find Cyrillic letters used in scripts based on the Latin alphabet. Examples are the Chinese languages Y, Lahu, Lisu, Myao and Juang, as well as several African languages.

History of the full Cyrillic font project

The LHFONTS package¹ was created as a part of the CyrTUG-EmTeX package, which is distributed among Russian and non-Russian users who use Cyrillic. LHFONTS offers the LH Cyrillic font family; these fonts are based on the WNCYR fonts of the CYRILLIC package, which is part of AMS-TeX.

¹ This package was originally named MAKEFONT, but was renamed to avoid confusion with the utility of the same name on the 4AllTeX CD-ROM, produced by the Nederlandstalige TeX Gebruikersgroep.

The main task of the LHFONTS package was to create a Cyrillic font family that extends the standard Computer Modern text fonts and also corresponds to Russian typesetting traditions.

At first this package offered two more or less popular encoding schemes:² Alternative, an 8-bit Latin-Russian font encoding analogous to MS-DOS's Code Page 866, mainly used by Russian MS-DOS users; and Washington, or WNCYR, a 7-bit encoding for typesetting with transliteration, mainly used by non-Russian users.

² The package also offered virtual encoding: a rather conditional 7-bit encoding which combines Cyrillic and Latin fonts.

The Cyrillic character encodings are described in special files: lbcoding.mf (Alternate encoding) and wncoding.mf (WNCYR encoding). One can choose between these files by changing the value of one of the following variables: altcoding, vfcoding or wncoding. These variables also determine the font layout:

altcoding   Standard Computer Modern in the lower part of the table plus Russian letters and additional punctuation marks in the upper part. Alternate encoding; encoding file lbcoding.mf.

vfcoding    Russian letters and added punctuation marks in the upper part of the table, for subsequent combination with Computer Modern in a virtual font. Alternate encoding; encoding file lbcoding.mf.

wncoding    Cyrillic letters for Slavonic languages with the necessary input ligatures, and standard and additional punctuation marks, in the lower part of the table. WNCYR encoding; encoding file wncoding.mf.

The default values of these variables are set in the driver files (ld??font.mf). The header of these files (for example, ldrmfont.mf) contains the following lines:

    if unknown wncoding: wncoding:=0; fi
    if unknown vfcoding: vfcoding:=0; fi
    ...
    altcoding:=1-wncoding-vfcoding;
    if wncoding<>0: input wncoding;
    else: input lbcoding; fi
    ...

The variables altcoding, vfcoding and wncoding may be set by hand in the file header or at the start of the METAFONT run.

The LHFONTS package contains font headers named lh*.mf and ll*.mf.

The files lh*.mf (56 files) are virtually identical to cm*.mf except for the last line:

    generate <driver-file>

that is, the standard Computer Modern driver file was replaced by the analogous file for the LH fonts. These file headers generate a full 8-bit Latin-Cyrillic font.

The files ll*.mf (also 56 files) contain only the following line:

    vfcoding=1; input <header-file lh*>;

thus the command vfcoding:=1; selects generation of Russian letters and punctuation marks only.

Since WNCYR was an optional encoding in this package, the files wn*.mf were not created, but the documentation explains how to create fonts with the WNCYR encoding using a similar one-line header file:

    wncoding=1; input <header-file lh*>;

The LH Cyrillic font family, in the Virtual and Alternate encodings, supports typesetting in Russian and in other languages that use only the Russian part of the Cyrillic alphabet. The WNCYR encoding also supports modern Slavonic texts written in Cyrillic and 19th century Russian text. But there are many languages which use the Cyrillic alphabet with added letters. The following sections discuss this problem.

The Global Cyrillic font as the material for Ω project

Some time ago the multilingual project Ω was started. One of the authors of Ω, Yannis Haralambous, began to create a full Cyrillic font. He offered this font to CyrTUG for further work.

The data for this font was taken from the Cyrillic part of the Unicode table.³ But this table still does not cover all Cyrillic letters; some old Cyrillic letters and national letters are missing. A full assortment of accented vowels, which Unicode does not include, is probably also necessary.

³ Unicode: International Standard ISO/IEC 10646-1, first edition, 1993.05.01. Information Technology, Universal Multiple Octet Coded Character Set (UCS), Part 1: Architecture and Basic Multilingual Plane (Table 11, Row 04, Cyrillic).

The font created during this work had more than 256 letters and marks. This font assortment should be further extended and improved. While testing this font, I created shortened variants, or split it into a few fonts. The two Cyrillic Unicode fonts were extended with glyphs for characters not included in Unicode and were created by the methods described in this paper (see the Appendix).

Creating partial fonts can save computer memory. One may create a big 256-letter (Unicode) font and then use a virtual font to achieve the necessary encoding. Alternatively, one may create the font immediately in its required encoding.

How TeX helps METAFONT. Creation of coding and ligature-kerning tables

To create a font, METAFONT needs program descriptions for letters (a lot of them), information about letter codes, and kerning and ligature data.

There are now a few well-known encodings of Cyrillic. They differ in which characters they hold and in what order (see the Appendix). So, for every encoding, a separate file is needed.

Since TeX cannot use a font containing ligature or kerning information relating to external characters, we cannot use the same table, with all Cyrillic letters, for every font: we must create a separate table for each encoding. There are five tables for the different font shapes of the text fonts of the Computer Modern family; they are included in the driver files. We must create the same number of tables for every encoding.

Furthermore, for each font and each encoding we must create a header file (the Computer Modern family has 56 text fonts).

As you can see, the number of files will be very large, and they will often duplicate each other.

The best solution is to create three files which contain all the information that could potentially be duplicated. The first one contains a table with all the Cyrillic glyphs and signs and all supported encodings. The second one contains data on ligatures and kerning for the complete font repertoire. The third file contains the table of font names and sizes and the necessary command lines for every font [...] \fonttwoletters; these two letters are set by the user according to the necessary encoding at the start of the TeX run. A fragment of such a file for font headers is shown below:

    % by default full Cyrillic Font (Unicode)
    % is generated
    \ifx\mainfontspecific\undefined
    \def\mainfontspecific{vfcoding:=1;}\fi
    \ifx\fonttwoletters\undefined
    \edef\fonttwoletters{uc}\fi
    \long\def\FontsToBeGenerated{
    \tablevalues %
    ( ..
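As a side note on the LHFONTS driver-file header quoted earlier (the lines beginning "if unknown wncoding"), the defaulting logic can be tried in isolation. The following is only a minimal sketch: the file name flagdemo.mf and the message texts are hypothetical and not part of LHFONTS; only the first three statements are taken from the header shown in this paper.

```metafont
% flagdemo.mf: hypothetical file name, NOT part of the LHFONTS package.
% The next three statements repeat the driver-file header quoted in the paper:
% a flag left unset by the user defaults to 0, and altcoding is derived.
if unknown wncoding: wncoding:=0; fi
if unknown vfcoding: vfcoding:=0; fi
altcoding:=1-wncoding-vfcoding;

% Illustration only: report the resulting flags on the terminal.
message "altcoding=" & decimal altcoding
  & ", vfcoding=" & decimal vfcoding
  & ", wncoding=" & decimal wncoding;

% Same test as in the header, but printing the file name instead of reading it.
if wncoding<>0: message "encoding file: wncoding.mf";
else: message "encoding file: lbcoding.mf"; fi
end
```

Assuming a standard METAFONT installation, a run such as mf '\vfcoding:=1; input flagdemo' sets one flag at the start of the run, which is the mechanism the ll*.mf headers rely on; with no flag set, altcoding defaults to 1 and the Alternate encoding file is selected.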