Tugboat, Volume 11 (1990), No
Total Page:16
File Type:pdf, Size:1020Kb
TUGboat, Volume 11 (1990), No. 2 G.A. Kubba. The Impact of Computers on Ara- to Computer Modern fonts-I strongly support the bic Writing, Character Processing, and Teach- principal idea, and I pursue it in the present paper. ing. Information Processing, 80:961-965, 1980. To organize the discussion in a systematic way, I Pierre Mackay. Typesetting Problem Scripts. will use the notions - borrowed from [2]-of text Byte, 11(2):201-218, February 1986. encoding, typing and rendering. J. Marshall Unger. The Fiflh Generation 2 Text encoding Fallacy- Why Japan is Betting its Future on Artificial Intelligence. Oxford University Press, In the context of w,encoding means the character 1987. sets of the fonts in question and their layouts. In the present section I will focus my attention on the X/Open Company, Ltd. X/Open Portability character sets, as the layouts should be influenced, Guide, Supplementary Definitions, volume 3. among others, by typing considerations. Prentice-Hall. 1989. In an attempt to obtain a general idea about the use of the latin alphabet worldwide, I looked up the o Nelson H.F. Beebe only relevant reference work I am aware of, namely Center for Scientific Computing and Department of Languages Identificatzon Guzde [7] (hereafter LIG). Mathematics Apart from the latin scripts used in the Soviet Union South Physics Building and later replaced by Cyrillic ones, it lists 82 lan- University of Utah guages using the latin alphabet with additional let- Salt Lake City, UT 84112 ters (I preserve the original spelling): USA Albanian, Aymara, Basque. Breton, Bui, Tel: (801) 581-5254 Catalan, Choctaw, Chuana, Cree, Czech, Internet: BeebeQscience .utah.edu Danish, Delaware, Dutch, Eskimo, Espe- ranto, Estonian, Ewe, Faroese (also spelled Faroeish), Fiji, Finnish, French, Frisian, Fulbe, German, Guarani, Hausa, Hun- garian, Icelandic, Irish, Italian, Javanese, Juang, Kasubian, Kurdish, Lahu, Lahuli, - Latin, Lettish, Lingala, Lithuanian, Lisu, On Standards Luba, Madura. Miao, Malagash, Malay, for Computer Modern Font Extensions Mandingo, Minankabaw, Mohawk, Mossi, Janusz S. Bien Navaho, Norwegian, Occidental, Ojibway (also spelled Ojibwe), Polish, Portuguese, Abstract Quechua, Rhaeto-Romanic (Ladin, Ro- Haralambous' proposal to standardize the unused mansh), Rumanian, Samoan, Seneca, Serbo- part of Computer Modern fonts is discussed, and Croatian. Sioux, Slovak, Slovene, Spanish, some modifications and extensions suggested. The Suto, Sundanese, Swahili. Swedish, Tagalog, idea is pursued by designing the extended CM font Turkish, Uolio, Vietnamese, Volapiik, Welsh, layout, and an example is given for one of its possible Wolof, Y, Yoruba, Zulu. uses. This list includes some languages and dialects 1 Introduction with no script at all, for which the information sup- plied concerns more or less standard transcription. In my note [4] I advocated an old (115, p. 461, 16, For most of them this fact is noted explicitly, but p. 451) but rarely used idea to place national letters the exception of Kasubian (usually recognized as a (actually, the Polish ones, but the generalization is dialect of Polish) suggests that this is not always obvious) in the unused part of Computer Modern the case. I noticed some inconsistencies in the nu- fonts, i.e. as the characters with the codes higher merous indexes to the book, but only one omission than 127; this approach allows the handling of na- (described later) in the proper text. Of course, it is tional languages in a way upward compatible with difficult for me to judge the reliability of the work the standard (American) English TEX. A similar as a whole. proposal was made independently by Yannis Hara- The number of additional letters in the latin lambous [8], who states also that the use of non- alphabets listed in LIG - including some variants English letters of latin alphabets should be coordi- of shape but excluding upper case letters - is 176. nated, resulting in a single widely used extension 176 TUGboat, Volume 11 (1990), No. 2 Hence the total number of lower and upper case let- unique identifications (explained in Annex A to the ters is definitely over 300. The possible errors and standard [13]) and names; I will use these names in omissions cannot change this estimate significantly, the sequel when appropriate. so in general we have to cope with the number of ad- The IS0 6937 character set includes 87 addi- ditional letters substantially exceeding the number tional letters which exist in both lower and upper of fkee slots in the Computer Modern fonts. case form, 6 letters which have only lower case form My solution to this problem is to postulate two and 2 letters which have only upper case form. Ad- levels of standards: ditionally, 3 lower case letters and 1 upper case let- Extended Computer Modern fonts, with a ter have shape variants (I refer here to the shapes small number of slots unassigned. of the letters, not to their function in specific lan- Full Extended Computer Modern fonts, i.e. guages; although e.g. in Lapp and Latvian G is the national or regional fonts compatible with Ex- upper case equivalent of g, I count them as having no case counterparts). This gives us the total of 186 tended CM fonts, but having some additional additional letters. Although 10 of them are already characters assigned. included in the original CM fonts (namely EE IE, 1, Of course, both of them will include all the charac- 1 L, ce a,0 0, fi), again the number of additional ters of the original CM fonts in their proper places; characters exceeds the number of free slots. More- although teletypewriter layout fonts are much less over, we should not forget the problem of the missing used, our standards should take them into account, punctuation marks. The most demanded ones seem too. to be the angle quotation marks (<,>>)used also e.g. It should be noted now that there are numerous in French, German and Polish, the "continental" left national and international standards for text encod- quotation mark (,,) used e.g. in German and Polish, ing. The most relevant for us is the IS0 6937 inter- and perhaps the German right quotation mark ("); national standard ([12], [13], [l4]), described thor- cf. [6], [21], [18]. oughly in [25] and discussed in [24]. Annex D to Let us have now a closer look at the character the standard 1131 is entitled Use of Latin alphabetic set proposed by Haralambous. To understand fully characters; formally it is not part of the standard, its implications, let us discuss first the language list but its goal is to provide contained in [8]. The IS0 standard and LIG confirm justification for the composition of the alpha- consistently only 8 items: betic part of the graphic character repertoire. Croat (spelled Croatian in [8] and [7]): C C, E C, It does not attempt to define which charac- d D, s S, i Z. ters should, and which ones should not. be Hungarian: B A, 6 E, ii, 6 0,60, i5 0, 6 U, ii~, used in any language. ii U. The annex contains a table (quoted in [25]) listing Polish (in addition, I vouch for its correctness per- the following languages (I again preserve the original sonally): a, 4, C C, ? Q, 1 L, ri N, 6 0; 6 S, i Z, spelling): i Z. Albanian, Basque, Breton, Catalan, Croat, Romanian (spelled Rumanian in 171): B A, & A, Czech, Danish, Dutch, English, Estonian, i 1, 9 8, I! T. Faroese, Finnish, French, Frisian, Galician, Slovene (spelled Slovenian in [7] and [8]): E C, S S, German, Greenlandic, Hungarian, Icelandic, i Z. Irish, Italian, Lapp, Latvian, Lithuanian, Maltese, Norwegian, Occitan, Polish, Por- Spanish: B A, 6 E, if, ii fi, 6 0,6 U, u U. tuguese, Rhaeto-Romanic, Romanian, Scots Turkish: &A,~~,~G,~I,~I,~~,~O,~~,CU, Gaelic, Slovak, Slovene, Sorbian, Spanish, ii u. Swedish, Turkish. Welsh, Afrikaans, Es- In the case of 7 languages my sources consis- peranto. tently disagree with Haralambous' list: With the exception of the last 2 languages, the Albanian. There is c with cedilla (C C) instead of list contains 39 living European languages. How- c with caron (E c). ever, despite the quoted reservation, it seems rather Catalan. There is the additional letter i with di- strange that, according to the table, English uses aeresis (i'); according to IS0 6937, there is an 28 additional letters (namely B A, B A, EE E, q C, additional letter 1 with middle dot, while LIG 6~,k~,G~,G~,ii,i1,fi~,60,60,cea).states The standard associates with all the characters their TUGboat, Volume 11 (1990), No. 2 Two successive letters 1which do not de- by Haralambous in a simplified form. I intend to note one sound are separated by a point consult an expert on this matter (I suspect differ- 1.1 (or 1.1). ent usage in North and South Vietnam), but his Czech. The letter d' is treated as variant of d; both answer is not relevant for our further discussion- of them are called in IS0 6937 small d with anyway, the Vietnamese letters and accents should caron; the same holds respectively for t'. LIG be included in a specific Full Extended CM font, not distinguishes also a variant of d differing in the in the Extended CM font. placement of the caron. For upper case letters In my opinion, the Extended CM fonts should both sources list only D aild T (neither D' nor contain the following additional letters: T'). small and capital a with acute (A A), grave Faroese (in [7] often spelled Faroeish). Instead (B A) and circumflex (5 A) accent, with diaere- of small d with stroke and small thorn there sis (ii A), tilde (5 A), ring (B A) and ogonek should be small eth (a)' and capital D with (a, 41, stroke (i.e.