
Full Cyrillic: How Many Languages?

Olga G. Lapko, 129820, Moscow, Pervy Rizhsky per., 2, Mir Publishers [email protected]

A Brief History of Cyrillic

Slavonic writing was invented by St. Cyrill and St. Method. Today there are two well-known Slavonic alphabets: Glagolitic and Cyrillic. Historians are not sure whether St. Cyrill was the author of both, or whether Cyrillic was created by St. Method while St. Cyrill invented the Glagolitic alphabet. In any case, this paper deals with the alphabet nowadays called “Cyrillic.”

The birthday of Cyrillic is considered to be the end of May 863. May 24 was declared by UNESCO as the day of Cyrillic. By coincidence, the first conference of the Cyrillic TeX Users Group, CyrTUG, was held May 24–25, 1991.

The Cyrillic alphabet is based on the Greek alphabet. There were 43 letters in this alphabet. Up until the beginning of the 20th century, four additional letters existed; these are absent in the modern Russian alphabet: ‘’, ‘ ’, ‘ ’, and ‘’.

Nowadays Cyrillic is used not only by Slavonic people, but also by other nations of the former USSR. Historically, many of these nations used other scripts. Some Soviet republics such as the Central Asian republics, , and the Russian autonomous republics, used the Arabic script. In , the old Mongolian vertical script was used in the , and the Dzayapandin vertical script was used in the Kalmyk language. Soon after the October Revolution many languages started to use the Latin alphabet with additional letters. In Abkhasia, the Georgian alphabet was used for a few years. At the end of the 1920s and in the 1930s, almost all languages of the USSR changed from Latin script to Cyrillic. In many languages new letters were created (see fig. 1).

Outside Russia, Cyrillic is used in Bulgaria, Serbia, and Macedonia. The Bulgarian language now uses only letters of the modern Russian alphabet, but earlier ‘ ’ and ‘ ’ were also used. In Macedonian and Serbian the following additional letters are used: ‘’, ‘Y’, ‘ ’, with ‘ ’, ‘ ’, ‘s’ in Macedonian only, and ‘_’, ‘ ’, ‘ ’ in Serbian. Mongolian uses Russian letters with two additions: ‘é’, ‘ ’.

One may also find Cyrillic letters used in scripts based on the Latin alphabet. Examples are the Chinese languages: , Lahu, Lisu, Myao, Juang, as well as several African languages.

History of the full Cyrillic project

The LHFONTS package¹ was created as a part of the CyrTUG-EmTeX package, which is distributed among Russian and non-Russian users who use Cyrillic. LHFONTS offers the LH Cyrillic font family; these fonts are based on the WNCYR fonts of the CYRILLIC package, part of AMS-TeX.

The main task of the LHFONTS package was to create a Cyrillic font family that extends the standard text fonts of Computer Modern and also corresponds to Russian traditions.

At first this package offered two more or less popular encoding schemes:² Alternative, an 8-bit Latin-Russian font encoding analogous to MS-DOS’s code page 866, mainly used by Russian MS-DOS users; and Washington, or WNCYR, a 7-bit encoding for typesetting with , which is mainly used by non-Russian users.

¹ This package was originally named MAKEFONT, but was renamed to avoid confusion with the utility of the same name on the 4AllTeX CD-ROM, produced by the Nederlandstalige TeX Gebruikersgroep.

² The package also offered virtual encoding: a rather conditional 7-bit encoding which combines Cyrillic and Latin fonts.

The Cyrillic encodings are described in special files: lbcoding.mf (Alternate encoding) and wncoding.mf (WNCYR encoding). One can choose between these files by changing the value of one of the following variables: altcoding, vfcoding or wncoding. These variables also determine the font layout:

altcoding
    Standard Computer Modern in the lower part of the table plus Russian letters and additional punctuation marks in the upper one; Alternate encoding: encoding file lbcoding.mf

174 TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting Full Cyrillic: How Many Languages?

vfcoding
    Russian letters and added punctuation marks in the upper part of the table, for subsequent combining with Computer Modern in a virtual font; Alternate encoding: encoding file lbcoding.mf

wncoding
    Cyrillic letters for Slavonic languages with the necessary input ligatures, standard and additional punctuation marks in the lower part of the table; WNCYR encoding: encoding file wncoding.mf

The values of these variables are determined in the driver files (ld??font.mf) by default. The header of these files (for example, ldrmfont.mf) contains the following lines:

    if unknown wncoding: wncoding:=0; fi
    if unknown vfcoding: vfcoding:=0; fi
    ...
    altcoding:=1-wncoding-vfcoding;
    if wncoding<>0: input wncoding;
    else: input lbcoding; fi
    ...

The variables altcoding, vfcoding and wncoding may be set by hand in the file header or at the start of the METAFONT run.

The LHFONTS package contains font headers named lh*.mf and ll*.mf.

The files lh*.mf (56 files) are virtually identical to cm*.mf except for the last line:

    generate

that is, the standard Computer Modern driver file was changed to the analogous file for the LH fonts. These file headers generate a full 8-bit Latin-Cyrillic font.

The files ll*.mf (also 56 files) contain only the following line:

    vfcoding=1; input ;

thus the command vfcoding:=1; sets generation of Russian letters and punctuation marks only.

Since the WNCYR encoding was an optional encoding in this package, the files wn*.mf were not created, but the documentation explains how to create fonts with the WNCYR encoding using a similar one-line header file as follows:

    wncoding=1; input ;

The LH Cyrillic font family offers typesetting in Russian and other languages using the Russian part of the Cyrillic alphabet, that is, the Virtual and Alternate encodings. The WNCYR encoding also offers typesetting of modern Slavonic texts using Cyrillic and of 19th century Russian text. But there are a lot of languages which use the Cyrillic alphabet with added letters. The following sections discuss this problem.

The Global Cyrillic font as the material for Ω project

Some time ago the multilingual project Ω was started. One of the authors of Ω, Yannis Haralambous, began to create a full Cyrillic font. He offered this font to CyrTUG for further work.

The data for this font was taken from the Cyrillic part of the Unicode table.³ But this table still does not cover all Cyrillic letters; some old Cyrillic letters and national letters are missing. Probably the full assortment of accented letters, which are not included in Unicode, is also necessary.

The font created during this work had more than 256 letters and marks. This font assortment should be further extended and improved. During the testing of this font, I created shortened variants, or split it into a few fonts. The two Cyrillic Unicode fonts were extended with for characters not included in Unicode and created by the methods described in this paper (see the Appendix).

The methods of partial font creation may usefully economize the computer’s memory. One may create a big 256-letter (Unicode) font and then use a virtual font to achieve the necessary encoding. Alternatively, one may create the font immediately in its required encoding.

³ Unicode: International Standard ISO/IEC 10646–1, first edition, 1993.05.01. Information Technology, Universal Multiple-Octet Coded Character Set (UCS), Part 1: Architecture and Basic Multilingual Plane (Table 11, Row 04, Cyrillic).

How TeX helps METAFONT. Creation of encoding and ligature-kerning tables

To create a font, METAFONT needs program descriptions for letters (a lot of them), information about letter codes, and kerning and ligature data.

There are now a few well-known encodings of Cyrillic. They differ in which characters they hold and in what order (see the Appendix). So, for every encoding, a separate file is needed.

Since TeX cannot use a font containing ligatures or kerning information relating to external characters, we cannot use the same table, with all Cyrillic letters, for every font: we must create a separate table for each encoding. There are five tables for the different font shapes of the text fonts of the Computer Modern family: they are included in the driver files. We must create the same number of tables for every encoding.
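The need for one ligature/kerning table per encoding can be illustrated with a small model. The following Python sketch is not part of the LHFONTS package; the kern amounts, the two encoding names and all character codes are hypothetical values chosen only for illustration. It keeps a single master kerning table over symbolic glyph names and derives a per-encoding table that drops every pair involving a glyph absent from that encoding.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical master kerning table over symbolic glyph names:
# (left glyph, right glyph, kern amount in arbitrary units).
MASTER_KERNS: List[Tuple[str, str, float]] = [
    ("CYR_A", "CYR_O", -0.5),
    ("CYR_A", "CYR_LJE", -0.25),
    ("CYR_BE", "CYR_O", -0.25),
]

# Illustrative encodings: symbolic name -> character code,
# or None when the glyph has no slot in that encoding.
# The codes are assumptions, not the real LH tables.
ENCODINGS: Dict[str, Dict[str, Optional[int]]] = {
    "full": {"CYR_A": 0x10, "CYR_BE": 0x11, "CYR_O": 0x1E, "CYR_LJE": 0x09},
    "russian_only": {"CYR_A": 0x80, "CYR_BE": 0x81, "CYR_O": 0x8E, "CYR_LJE": None},
}


def kerns_for(encoding: str) -> List[Tuple[int, int, float]]:
    """Return the kerning table for one encoding, in character codes.

    Pairs that mention a glyph missing from the encoding are dropped,
    so the resulting table never refers to an external character.
    """
    codes = ENCODINGS[encoding]
    table: List[Tuple[int, int, float]] = []
    for left, right, amount in MASTER_KERNS:
        left_code = codes.get(left)
        right_code = codes.get(right)
        if left_code is None or right_code is None:
            continue  # glyph not in this encoding: omit the pair
        table.append((left_code, right_code, amount))
    return table
```

Calling kerns_for("russian_only") omits the pair that involves CYR_LJE, while kerns_for("full") keeps all three pairs; this is the kind of per-encoding filtering that the generated tables have to perform.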


Furthermore, for each font and each encoding we must create a header file (the Computer Modern family has 56 text fonts).

As you can see, the number of files will be very large, and they will often duplicate each other.

The best solution is to create three files which contain all the information that could potentially be duplicated. The first one contains a table with all the Cyrillic glyphs and signs and all supported encodings. The second one contains data on ligatures and kerning for the complete font repertoire. The third file contains the table of font names and sizes and the necessary command lines for every font header file. The data for every font in every encoding is taken from these files, and the necessary header files are created. All these files are created by TeX. TeX can also create the file containing all uccodes, lccodes and mathcodes for a given font.

Preparing font headers

As mentioned above, we use the parameters of the Computer Modern text fonts for creating Cyrillic fonts. At first, the header files for the LHFONTS package were copied from the header files of the Computer Modern family, changing only one line (see the section entitled “History of the full Cyrillic font project”). To avoid unnecessary duplication, we create header files which load the necessary Computer Modern header file, substituting the driver of the LH Cyrillic fonts for the standard driver file. This task was solved, for example, in the Polish fonts package, which contains a special tricky file fik_mik.mf that substitutes Polish drivers for the standard ones. One of the authors of this file, Boguslav Yackovski, allowed us to use this file in the LHFONTS package.

Now it is necessary to create header files for font creation which include one or a few lines only. The EmTeX package supports a command operator in its MFJob program, which enables one to write the necessary short commands for the METAFONT run. By using this, we can avoid creating a lot of files containing only the line:

    input fik_mik_; use_driver;

For font generation on other platforms we must create these files in any case. For quick generation of the files I used the file dcstdedt.tex from DCFONTS. The original file includes the table with all font names and font sizes. We modified the file, making it possible to add a line (the \mainfontspecific macro) which can switch on the variables vfcoding:=1; or wncoding:=1; when necessary, and a parameter for a few fonts which switches on the necessary shape. To specify the use of different encodings we must change the font names, so the first two letters are replaced by the macro \fonttwoletters; these two letters are set by the user according to the necessary encoding at the start of TeX’s run. A fragment of such a file for font headers is shown below:

    % by default full Cyrillic Font (Unicode)
    % is generated
    \ifx\mainfontspecific\undefined
    \def\mainfontspecific{vfcoding:=1;}\fi
    \ifx\fonttwoletters\undefined
    \edef\fonttwoletters{uc}\fi
    \long\def\FontsToBeGenerated{
    \tablevalues %
    ( ... 8 9 10 ... 17.28[17] ) %
    \makefont\fonttwoletters r %
    ( ... 8 9 10 ... 17.28[17] )()
    ...
    \makefont\fonttwoletters tt %
    ( ... 8 9 10 ... )(specific:=0;)
    }

By default, this file and the file of encoding data create a set of file headers and the encoding for the global Cyrillic font in Unicode (see fig. 1).

Preparing the encoding file and files of ligature/kerning tables

The encoding files are created from the file which contains the table of all Cyrillic glyphs and signs and all well-known (or at least necessary) encodings.

    % by default full Cyrillic Font (Unicode)
    % is generated
    \ifx\fonttwoletters\undefined
    \def\fonttwoletters{uc}\fi
    \def\nolettercode{*}
    \long\def\CodesToBeGenerated{
    \tablevalues ( uc lh wn ... )
    \makecod CYR_A CYRA ( 10 80 41[A] ... )
    \makecod CYR_BE CYRB ( 11 81 41[] ... )
    ...
    \makecod CYR_LJE CYRLJE ( 09 * 01[] ... )
    }

We can see that this table is analogous to the previous one. Macros analogous to those for creating font headers were used for creating the encoding files.

Now we must create tables of ligatures and kerns. In the Computer Modern fonts these tables are in the font driver files. In the LH fonts the tables are in separate files. As we said above, we need to create five tables of ligatures and kernings for text fonts: 1) for roman and sans shape; 2) for italic shape; 3) for caps and small caps shape, for which two tables are actually necessary, separately for the


uppercase and small caps letters; and 4) for large fonts like cminch.

The Cyrillic font is very large, but one may see that almost all Cyrillic letters can be sorted into a few shape groups. We determined 14 groups for uppercase letters, 14 groups for lowercase letters and 17 groups for italic letters in the Cyrillic font. The letters in these groups sometimes repeat the shape (or contour) of Latin ones, so one may use the Computer Modern table as a base.

The file of kerning and ligature data retains the shape grouping of letters, so every new letter is added to an appropriate group. At the beginning of each group we place a “typical” Latin letter as a comment. When a new letter appears it may be added to the necessary group:

    \writeLig{if wn:}
    \writeLig{ ligtable CYR_ZE: "1"=:CYR_ZHE,
    ""=:CYR_ZHE, "h"=:CYR_ZHE;} % ""
    \writeLig{fi}
    \Ligtab %A
    \Letter{CYR_A} \Letter{CYR_A_acute}
    \Letter{CYR_LIT_YUS}
    ...
    %b
    \Letter{CYR_HARD_SIGN}\Letter{CYR_YATZ}
    ...
    %R
    \WriteLig{if serifs:}
    \Letter{CYR_BIG_YUS}
    ...
    \WriteLig{fi}
    %
    %
    \Kern{CYR_O}{k#}\Kern{CYR_O_lcomma}{k#}
    \Kern{CYR_O_acute}{k#}
    ...
    \Kern{CYR_ABKH_O}{k#}
    %
    ...
    \EndLigtab

We may create a font for kern testing by taking the characteristic examples from these letter groups (see Appendix).

To create the necessary ligature and kerning pair tables, we use data from the encoding file which was created just before them. At present the ligatures are used in the WNCYR encoding only, without any changes from the original Washington State University fonts.

In addition to the tables of ligature/kerning, TeX creates a uccode/lccode/mathcode file and a file ???cod.tex, which is used by russianb.ldf.⁴

⁴ The Russian-specific file for the Babel package.

How METAFONT generates only necessary letters

TeX has thus created the files necessary for encoding and for the ligature and kerning pair tables. Now METAFONT must generate only the glyphs required for the given encoding.

From the very beginning the LH font family supported different encodings. Since Cyrillic letters occupy different places in different coding schemes, explicit character codes in the character descriptions (beginchar command) have been replaced with their symbolic names. For example, the description of the lowercase Cyrillic letter ‘a’ starts with:

    cmchar "Lowercase Russian letter a";
    beginchar(CYR_a,9.25u#,x_height#,0);
    ...

In the description of uppercase letters we added the line if lower_case: ... fi to redefine the code in the “Small Caps” font:

    cmchar "Uppercase Russian letter A";
    beginchar(CYR_A,13u#,cap_height#,0);
    if lower_case: charcode:=CYR_a; fi
    ...

In plain METAFONT the definition of the beginchar command has the following lines:

    def beginchar(expr c,w_sharp,h_sharp,d_sharp) =
     begingroup
     charcode:=if known c: byte c else: 0 fi;
     ...
    enddef;

which means that a letter with an unrecognized code number is set to position ‘0’.

For the LH fonts, a letter or a sign whose code is not recognized must be skipped, so the beginchar command is redefined in the following way:

    let plain_beginchar=beginchar;
    def beginchar(expr c,w_sharp,h_sharp,d_sharp) =
     iff known c: %
     plain_beginchar(c,w_sharp,h_sharp,d_sharp);
    enddef;

What needs to be done when it is necessary to create the Cyrillic font only, given that the letters and signs of standard Computer Modern have explicit code numbers in beginchar and so are always defined? The three variables mentioned in the section on the History of the Full Cyrillic Font Project determined three different encodings. Now they may switch on the following:


altcoding
    Standard Computer Modern in the lower part plus Cyrillic letters and added punctuation marks in the necessary encoding in the upper one

vfcoding
    Cyrillic letters and added punctuation marks in the upper part, the lower part, or both parts of the table, for subsequent combining with the Latin part in a virtual font or for full Cyrillic font creation (for example Unicode)

wncoding
    this set was not changed: Cyrillic letters for Slavonic languages with the necessary ligatures, standard and additional punctuation marks in the lower part of the table; WNCYR encoding

Now the encoding files, lbcoding.mf or wncoding.mf, switched on by these variables, set the necessary selection of Cyrillic letters and their encoding. In fact, the file wncoding.mf for the WNCYR encoding was not changed. The file lbcoding.mf switches on the necessary encoding. When we have the necessary files, we can create the font.
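The selection logic described in this paper can be modelled outside METAFONT. The Python sketch below is only an illustration under stated assumptions: the function names select_coding and generate are hypothetical, and a dictionary stands in for the font. The first function mirrors the driver-header lines (unknown flags default to 0, altcoding is computed as 1-wncoding-vfcoding, and the encoding file is chosen by wncoding). The second mirrors the redefined beginchar: a glyph whose symbolic name has no known code is skipped rather than placed at position 0.

```python
from typing import Dict, List, Optional


def select_coding(wncoding: Optional[int] = None,
                  vfcoding: Optional[int] = None) -> Dict[str, object]:
    """Model of the driver header: unknown flags default to 0."""
    wn = 0 if wncoding is None else wncoding
    vf = 0 if vfcoding is None else vfcoding
    alt = 1 - wn - vf  # altcoding:=1-wncoding-vfcoding;
    # if wncoding<>0: input wncoding; else: input lbcoding; fi
    encoding_file = "wncoding" if wn != 0 else "lbcoding"
    return {
        "altcoding": alt,
        "vfcoding": vf,
        "wncoding": wn,
        "encoding_file": encoding_file,
    }


def generate(glyph_names: List[str], codes: Dict[str, int]) -> Dict[int, str]:
    """Model of the redefined beginchar: keep only glyphs with a known code.

    `codes` plays the role of the encoding file: it maps symbolic names
    to character codes; a missing name means the code is unknown.
    """
    font: Dict[int, str] = {}
    for name in glyph_names:
        code = codes.get(name)
        if code is None:
            continue  # unknown code: skip the glyph entirely
        font[code] = name
    return font
```

For example, select_coding(vfcoding=1) yields altcoding 0 and still reads the lbcoding file, while generate(["CYR_A", "CYR_LJE"], {"CYR_A": 0x80}) produces a font containing only CYR_A. The code 0x80 is an assumed value used for the example, not a claim about the real encoding files.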

Appendix

[Figure 1: Unicode encoding; Cyrillic part]

Since the Latin part is unchanged and uses the TeX encoding scheme, the next examples show only the Cyrillic part of the font.

[Figure 2: Cyrillic letters which are not included in Unicode]

[Figure 3: MS DOS cp866 encoding]

[Figure 4: Washington encoding]

[Figure 5: KOI-8 ( platform) encoding]

[Figure 6: ISO 8859-5 encoding]

[Figure 7: Apple Macintosh encoding]

[Figure 8: Windows 1251 encoding]

[Figure 9: Example of national encoding: Tatar encoding]
