
TUGboat, Volume 41 (2020), No. 3 275

The Non-Latin scripts & typography

Kamal Mansour

1 Introduction

It is quite common in typographic terminology to divide the world's scripts into Latin and non-Latin. At first glance that might seem to be a reasonable categorization, until one looks a bit closer and finds that non-Latin consists of a huge number of diverse scripts. Imagine if we were to call Latin blue, while calling all others non-blue. We would be gathering the remaining colors of the spectrum into one group without differentiation. Calling a color non-blue doesn't tell us anything about what it might be; is it orange, green, magenta, indigo?

However strange this terminology may be, there is good reason behind it. Gutenberg invented moveable type for Latin script; in particular, he focused on a certain Gothic style used at the time by German scribes. We know things didn't stop there. The printed forms of Latin script moved away from the handwritten forms over time, evolving into a distinct craft and discipline. In the process, the forms underwent considerable simplification. By the time typographic letters began to develop for other scripts, Latin type had reached a maturity possible only through time. Over the centuries, the machinery and techniques had been refined for the needs of Latin type, and any newcomers to the game needed to adapt to the current state of things.

In the realm of non-Latin scripts, what are the main groups and families? Moving eastward from the origins of Latin script, we find Greek script, and further east, its close relative, Cyrillic. Hovering over the Caucasus, Armenian and Georgian raise their heads. Reaching from North Africa eastward, Arabic script occupies a large swath of the landscape that reaches as far as […]. Near the juncture of Western Asia and Africa, Hebrew has its home. In East Africa, Ethiopic script (in ancient times known as Ge'ez) has a sizable presence. Over the large territory of China, as well as in Taiwan, Chinese is the dominant script. Hangul, a syllabic script with some Chinese visual features, dominates in Korea. Meanwhile, further south in Japan, a unique hybrid system rules, consisting of two syllabaries (hiragana and katakana) and Chinese characters, in addition to Latin. The Indian subcontinent is home to a large family of scripts descended from ancient Brahmi writing. Over the centuries, the descendants split into two groups, northern and southern scripts. In the northern group, Devanagari, Bengali, and Gujarati scripts stand out. The southern group includes Tamil, Telugu, Kannada, Malayalam, and Sinhala. In Southeast Asia, we find more distant descendants of Brahmi in Thai, Lao, Khmer, and […] scripts.

As close relatives of Latin, Greek and Cyrillic scripts had more time to develop typographically than other non-Latins. In its earliest typographic forms, Greek initially imitated scribal forms before eventually settling on printed forms based on its classic origins. Classical and Biblical studies further established the need for Greek type in the late nineteenth and early twentieth centuries.

Cyrillic, originally derived from Greek, was extensively revised and "europeanized" under the hand of Peter the Great. Then early in the twentieth century, Cyrillic was further simplified by purging it of unnecessary vestiges from Greek and redundant characters. In an effort to further unite the huge territory of the Soviet Union, the government extended Cyrillic to support the many non-Slavic languages of the various republics. Outside the Soviet Union, a certain number of Cyrillic typefaces were always available through the mainstream type foundries to satisfy the needs of academics in Slavic Studies and of emigrant communities.

Because of their common shapes and behavior, Greek and Cyrillic have always been easily accommodated on the same typesetting equipment as Latin script. Of course, the compositor had to be familiar with the script he was setting, with Polytonic Greek being the most demanding because of the variety of diacritics.

When it comes to typography, every script has its intricate details, and consequently, its own story of transition from handwriting to type. It would be interesting to tell every story, but that would be enough to fill many books. Without going into every detail, we can gain some understanding by surveying the types of problems that arose when adapting typesetting technology developed for Latin script to non-Latin scripts. Because scripts have developed as families, they share many attributes with other members of the family. We can take advantage of the similarities to identify recurring obstacles in the typographic development of non-Latin scripts. By citing judiciously chosen examples from a few scripts, we can readily present the gamut of significant difficulties.

Before delving into the range of technical challenges, we must first recognize the most common hurdle shared by all who endeavored to bring a non-Latin script into the typographic age: the fear of the exotic. Typography developed in Europe and it was Europeans who first strove to take this craft to distant lands. Before the advent of sociology, anthropology, and linguistics, non-European cultures were seen as foreign, exotic, and difficult to fathom. Before understanding the written form of a language, one must first learn the spoken language and, to some extent, understand its culture.

To explore the gamut of typographic challenges, we will visit four scripts: Devanagari, Khmer, Arabic, and Chinese. In terms of visual impression and structure, each of these scripts differs greatly from the others.

2 Devanagari

We begin with Devanagari, the most commonly used script in contemporary India, which is used primarily for the Hindi and Marathi languages. Devanagari has a long history of development that began with the Sanskrit language centuries ago. The structure of Devanagari writing is based on the syllable which, in its simplest form, can be composed of a consonant with either an inherent or explicit vowel mark. As an example, let us consider the letter ka, which when written as क, includes the implied vowel a. If we want to write kī, we need to explicitly add the ī vowel ◌ी to ka क, resulting in की. Presented at a larger size, we can clearly show how the two components of the kī syllable overlap on the top:

क + ◌ी → की

Now, if we want to write ku instead of kī, the vowel is placed below ka as follows:

कु

Marks other than vowels can also appear above and below letters; for instance, if we want to indicate that the vowel in ka sounds nasal, we would add a dot above (natively, anusvara) as in:

कं

Since Latin type was usually composed on a single horizontal line with gaps between letters, Devanagari presented quite a challenge because of its need for overlap between adjacent characters, in addition to upper and lower marks.

3 Khmer

Next, we consider Khmer, or Cambodian, writing. As a remote descendant of the Brahmi script, the structure of Khmer is also broadly based on the syllable, including the presence of an implied or explicit vowel; however, the phonetic nature of the Khmer language has also resulted in many additional features that go beyond the ancestral Brahmi model.

The orthographic norms of Khmer represent the range of simple to complex syllables by allowing its components to stack both horizontally and vertically. As in Devanagari, the simplest syllable consists of a consonant with an implied or explicit vowel, but because of the tonal nature of Khmer, marks indicating tone are sometimes also required. Khmer has the same needs as Devanagari for upper and lower marks, which result in more height and depth for syllable clusters. While Devanagari allows multi-consonant syllables, it also allows them to stack horizontally, if need be. On the other hand, Khmer requires the components of a multi-consonant syllable cluster to stack vertically.

Following is an example of a relatively simple syllable cluster consisting of a single consonant no followed by a vowel mark above (u1793 + u17B7):

ន + ◌ិ → និ

As an example where additional depth is required, the following syllable cluster consists of a single consonantal letter followed by a subscript consonant that sits to its right while partially overlapping it below [pha + subscript sa]. The use of a subscript form indicates that the preceding consonant is not followed by a vowel (u1795 + u17D2 + u179F):

ផ + ◌្ + ស → ផ្ស

The character u17D2 indicates that the following consonantal letter should appear in its subscript form.

There are also deeper cases where a vowel symbol ie surrounds and subtends a consonant ko followed by a subscript consonant lo (u1783 + u17D2 + u179B + u17C0). Note that the right part of ie is stretched to embrace the stacked consonants (ko + subscript lo + ie):

ឃ + ◌្ល + ◌ៀ → ឃ្លៀ

The above sequence could also carry an additional mark above it. Unlike Latin letters, Khmer characters have a predominantly vertical aspect ratio when arranged into syllable clusters. One can imagine the numerous challenges posed by Khmer script for those who wanted to implement it in metal type. How does one stack so many vertical elements along the baseline? If Latin and Khmer letters are put side by side, what sort of size ratio should be maintained between the two? Does one resort to using the largest leading needed throughout one document, or does one increase it only when required?
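The Khmer clusters in this section are each stored as a short sequence of code points, however tall the rendered stack becomes. As a rough illustration only, the following Python sketch looks up the standard Unicode names behind the three sequences quoted above (u1793 + u17B7, u1795 + u17D2 + u179F, and u1783 + u17D2 + u179B + u17C0). The dictionary labels and the helper name `describe` are made up for this sketch and are not part of the article; the character names come from Python's built-in `unicodedata` module.

```python
import unicodedata

# Code point sequences quoted in the text for the three Khmer clusters.
# The dictionary keys are informal labels chosen for this sketch.
CLUSTERS = {
    "consonant + vowel above": [0x1793, 0x17B7],
    "consonant + subscript consonant": [0x1795, 0x17D2, 0x179F],
    "consonant + subscript consonant + vowel": [0x1783, 0x17D2, 0x179B, 0x17C0],
}


def describe(code_points):
    """Return (text, [(U+XXXX, Unicode name), ...]) for a code point sequence."""
    text = "".join(chr(cp) for cp in code_points)
    names = [(f"U+{cp:04X}", unicodedata.name(chr(cp))) for cp in code_points]
    return text, names


if __name__ == "__main__":
    for label, cps in CLUSTERS.items():
        text, names = describe(cps)
        # len(text) counts stored characters, not the single visual cluster.
        print(f"{label}: {text} ({len(text)} stored characters)")
        for code, name in names:
            print(f"  {code}  {name}")
```

Whether each string displays as one stacked cluster depends entirely on the font and shaping engine in use; the stored text is the same either way.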


In digital fonts equipped with OpenType technology (or some other shaping software), the built-in shaping logic allows the user to enter text at the character level only, while watching as syllable clusters compose magically into their correct form. By moving from metal to digital type, we also leave the obstacles of metal behind while gaining new flexibility and fluidity of form.

4 Arabic

The cursive form of Arabic script is the one trait that differentiates it from most others. Arabic writing never developed typographic "block" letters, but has remained cursive from its inception to this day.

Over many centuries and across a broad geographic terrain, many different styles of Arabic script have developed. Under Ottoman rule, when it came time to adapt one of the Arabic styles to type, the Naskh style prevailed. In due time, many other styles were implemented, but Naskh has remained the reference style for running text. Because of its rich variety of contextually variant forms, Naskh style could only be typeset by a compositor well versed in the intricacies of the style. Showing a variety of styles available for manual setting, the following image is taken from an 1873 catalog of the Amiriyah Press in Cairo:

Each of the above styles required that the compositor thoroughly master its shaping rules.

Most Arabic text is set without vowel marks, so setting fully vocalized text (with optional marks above and below the letters) as in the above sample was (and still is) laborious, and therefore avoided because of the extra expense.

With the advent of mechanized typesetting in the early to mid-twentieth century, styles had to be simplified to fit the new size constraints of magazines and keyboards. This trend to simplify persisted, reaching its peak with the arrival of phototypesetters, where the limited number of character slots was dictated by Latin fonts.

In the present day, one can say with good confidence that Arabic script stands to benefit greatly from digital fonts equipped with OpenType technology in several ways. First of all, the oversimplification of the past century can be discarded because OpenType technology is capable of emulating the various calligraphic styles of Arabic script. Since the global adoption of the Unicode Standard in the 1990s, Arabic text has been stored as a sequence of alphabetic characters that are independent of style. The embedded shaping logic of each font reflects its style, while text input remains invariant. By setting a string of text in a particular font, one can expect it to take the shape mandated by the rules of that style. For instance, in the following sample, each line consists of the same text set in three distinct fonts. Note that the second and third lines show words that are set on an incline with respect to the baseline, a calligraphic feature ably rendered in OpenType.
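One way to see the separation between stored characters and displayed shapes is through Unicode's legacy compatibility code points for Arabic positional forms. The following Python sketch is an illustration added here, not something taken from the article: it takes the four positional forms of the letter beh from the Arabic Presentation Forms-B block and folds each back to the single style-independent letter using NFKC normalization. The choice of beh and the helper name `abstract_letter` are assumptions made for the example.

```python
import unicodedata

# The four positional forms of the Arabic letter beh, as encoded in the
# legacy "Arabic Presentation Forms-B" compatibility block.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def abstract_letter(form):
    """Fold a presentation form back to its style-independent character (NFKC)."""
    return unicodedata.normalize("NFKC", form)


if __name__ == "__main__":
    for position, form in BEH_FORMS.items():
        base = abstract_letter(form)
        print(
            f"{position:9} U+{ord(form):04X} -> "
            f"U+{ord(base):04X} {unicodedata.name(base)}"
        )
```

In ordinary Unicode text only the abstract letter (U+0628 here) is stored; choosing among positional or richer contextual shapes is left to the font's shaping logic.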

5 Chinese

We now take a look at the fourth script, Chinese, which is nowadays shared mainly by the Chinese and Japanese languages, albeit with some stylistic differences. Unlike the three previously examined scripts, Chinese characters require no additional marks nor any contextual shaping. Once designed, a character can simply be typeset as is. The primary obstacle for Chinese script lies in the thousands of characters needed to set everyday text. Chinese writing has a long history of exacting calligraphic styles that require great effort to master. The following sample shows Song, one of the most common styles.

Despite the graphic complexity of its characters, Chinese script is typically classified among the "simple" scripts because the typesetting of Chinese text has no complexities in terms of its character shapes.

To write Japanese, Chinese characters (kanji) are intermixed with two phonetic syllabaries called hiragana and katakana to show grammatical affixes, as well as foreign words and names. In typical Japanese text, about half the characters consist of kana (hiragana or katakana). The following text is taken from the first line of the Universal Declaration of Human Rights. The more fluid and curvier characters are hiragana.

In the twentieth century, ingenious methods and typesetting equipment were devised to handle the large character sets needed for Chinese and Japanese in their home lands. However, the keyboards and magazines of the mechanical typesetters of Linotype and Monotype were designed for limited character sets. In addition, the cost of developing an adequate set of Chinese characters was high enough to stop any Western companies from entering the field until the early era of digital typesetters in the late 1980s, when Monotype entered the marketplace.

Once digital methods prevailed in the Chinese and Japanese markets, more local enterprises were able to enter the typographic field and thrive. Digital text entry now depends on sophisticated software that exploits the systematic structure of Chinese characters in terms of their radicals, strokes, or phonetics to simplify the job for the user.

6 Degrees of complexity

Now that we have taken a quick glimpse at several scripts and their traits, we need to examine what inferences we can draw for the world of scripts.

Scripts vary greatly in their level of complexity. A typical Latin typeface is quite simple because of its relatively small character set and the near absence of shaping rules, while a calligraphic font such as Zapfino depends on a set of complex rules; see fig. 1.

Figure 1: Typesetting in a standard Latin font vs. Zapfino. Sample text in both settings: "…Inherent dignity and…inalienable rights of…the human family is the foundation of freedom".

This example clarifies how complexity can often be based in a particular style, and is not necessarily fundamental to the structure of the script itself. Some scripts are relatively complex because of their intrinsic shaping rules, while their input requirements are simple due to a small character set. The line composition of Chinese text is relatively simple because all characters fit within a square and require no additional marks. On the other hand, the labor needed to create all the characters is disproportionately high, and input methods are complex by necessity because of the large character set. An Arabic font whose style requires at most four shapes for each letter is simple in comparison to one whose style requires a rich set of contextual variants. Complexity in scripts is a matter of gradation. There is no need to simplistically put all scripts into two heaps, simple or complex.

In this brief overview, we have seen some salient features of each of the following scripts: Devanagari, Khmer, Arabic, and Chinese. Although there are no visual similarities among these scripts, they do share a common history: each of them came into the twentieth century with a typographic tradition of varying duration and strength. While this is also true of other scripts, there are many others that came into the twenty-first century with either a typographic false start or none at all. There are many language communities who had their own traditional script but whose populations were too small to initiate and sustain a typographic practice. Other communities had established a traditional practice of metal typesetting during the nineteenth or twentieth centuries, but could not transition to newer technology such as phototypesetting as metal technology became obsolete.

Let us now roll back the calendar by about four decades to recount how we got to the current state of typography. As digital technology was maturing in the 1970s and 1980s, a silent software revolution was starting to overtake typography. Donald Knuth's METAFONT and TeX started making digital typography directly accessible to academics. This transition enabled a wave of academic publications that had earlier been throttled by ever-rising costs of professional typesetters. In the mid-1980s, the wave of WYSIWYG word-processing began at Xerox, but was later made available to the masses through the Apple Macintosh. The concurrent publishing of the PostScript language, and its use as a standard font format, wrested digital typography from the hands of proprietary typesetting equipment and brought it into the public arena. High-resolution printing (well, 300 dpi at the time) suddenly appeared in big and small offices alike, dragging along with it the word font into everyday language.

The first stage of the typographic revolution enabled the definition of character shapes through software instead of hardware. As affordable character-drawing software became available in the early 1990s, more typographic aficionados were drawn to the field as new practitioners. The new software made it possible to produce consistent character sets, properly formatted as fonts destined to create high-resolution imprints on paper. Because of the notable difference in resolution, what appeared on the computer screen was only an approximation of the final image produced by a laser printer on paper. In the end, the paper image was the more important one. After all, until that time, typography's aim was to leave an imprint on the relatively permanent medium of paper.

By making letter forms in software instead of metal, the wall of separation between Latin and non-Latin characters collapsed. After all, outlines defined in the form of polynomials have no physical barriers like molds and grooves, and are even free to overlap and intertwine. The domain of digital character shapes proved itself to be adaptable to any script.

As impressive as the first stage of digital typography was, it was only a first step forward. Even though it allowed us to produce typographic images without resorting to metal, it still had significant deficiencies which were not immediately apparent to all. Once a text document was stored on a computer, one could use it to produce numerous impressions over time. In a sense, we can consider the document a digital equivalent of the typographic galley for metal type, but unlike the galley, the early digital documents were somewhat unreliable. In the early 1990s, typical text documents used inherently ambiguous character codes; for instance, the code 192 (decimal) represented À in a Latin-1 code set, while it meant ω¯ in a Greek set. With the right software and fonts in place, such a document could produce a faithful image of the intended content, but it was not a dependable, unambiguous archive of the original text.

Around that time, a group of technologists involved with multilingual computing were discussing how to achieve the until-then elusive goal of an unambiguous, global character set. They wanted a digital environment where code 192 would represent only one character, and every character of every language would be assigned a unique code. Gone would be the world of ambiguous character codes, and all scripts, Latin and non-Latin, would be on equal footing. This group of visionaries laid down the foundation for the Unicode Standard, which eventually came to be recognized as the one global character set by all the major makers of systems and software.

Once Unicode was accepted by Apple and Microsoft, the slow transition away from ambiguous codes began. Before and during this transition, the transfer of text documents between different systems was laborious. Any characters outside the 7-bit ASCII range could be garbled in transit unless they had been deliberately filtered. For instance, code 0xE8 in Mac Roman represented Ë, while in Windows 1252 it stood for è. As the use of personal computers continued to penetrate every aspect of life, it became clear that such a contradictory situation was untenable in the long run. Change was on the horizon, but it was slow in coming.

In the mid-1990s, as the presence of the Unicode Standard made itself felt across the globe, it met resistance in many different regions because it was perceived as foreign-born. To be accepted everywhere, advocates and practitioners of the Standard had to demonstrate its efficacy and suitability before many different national and linguistic communities. As this defense of the Standard was mounted, it soon became apparent that an aspect of Unicode had turned out to be an unintended obstacle for some non-Latin scripts.

"Characters, not glyphs", one of Unicode's primary design principles, was aimed at keeping plain Unicode-encoded text free of the entanglements of style. Characters represent abstract units such as letters of the alphabet, while glyphs depict the particular style and size of the character, as dictated by a specific font. Since Unicode represents only the characters of a string, the final shape and appearance of the display glyphs are relegated to the "text rendering" process.

In the mid-1990s, competing systems had different "rendering" processes that typically supported only a few scripts. The rendering component of systems turned out to be a bottleneck for "complex" scripts such as Devanagari and Arabic; though their texts could be readily encoded in Unicode, it was difficult to achieve good typographic results. In that period, Apple introduced a sophisticated typographic system called "Apple GX" which was capable of rendering many scripts and styles, but it ran only on Apple hardware. Soon thereafter, Microsoft countered with a competing system called "TrueType Open".

Every practitioner in the publishing community was faced with a difficult choice to make. It wasn't until 1996 that Adobe and Microsoft first agreed on OpenType, a font specification that integrated both PostScript Type 1 and TrueType font formats, in addition to a unified system supporting advanced typographic features. Finally, there was hope of producing cross-platform fonts in the near term, even for non-Latin fonts. As could be expected, this potential was achieved gradually over the following years.

Today, we can rightly claim that a large number of scripts can be correctly rendered and that the level of typographic sophistication is continually rising. Certainly, scripts such as Devanagari, Arabic, and Khmer can be correctly rendered on multiple digital platforms without any worry about portability. The use of Unicode for numerous scripts, and the languages they support, has been universally accepted. Each of the characters À, ω¯, Ë, and è is represented by the same unique code in any document, on any platform.

At this auspicious juncture, one must ask if there is more to do. Have we reached a stage where all the scripts in the world are on an even footing? In the Internet era, what does it take for a script to become "digitally enabled"? To prepare text documents in a particular script, all its characters must first be listed in the Unicode Standard, each of them assigned a unique code. Then, to make such documents humanly readable, there must be at least one functional font that renders the text in its commonly recognizable form. Once these two requirements are met, documents in a given script, and in all the languages it supports, can be created, read, distributed, and archived.

Even though Version 13.0 of Unicode includes 154 scripts, more than 100 scripts are known to be under consideration for future inclusion. It might come as a surprise to some that so many scripts remain unencoded. Some scripts might no longer be in use and require considerable research to produce a definitive character set, while others might belong to a minority population that does not have the resources to prepare the documents needed to propose their inclusion in the Standard. Since it was established in 2002, the Script Encoding Initiative (SEI) has been preparing and presenting such proposals for scores of less privileged scripts (see linguistics.berkeley.edu/sei/scripts-encoded.html). Over the years, SEI has been instrumental in bringing at least 70 scripts into Unicode. Without such sponsorship, many scripts have little chance to enter the digital era.

About 10 years ago, Google launched the Noto Project (google.com/get/noto) to build OpenType fonts for the scripts in Unicode, and to make the fonts freely available to all under an open source license. Fonts were required not only to support the character set for each script, but also to make use of advanced typographic features that render the script in its expected native appearance. So far, at least 120 scripts are supported by Noto fonts. For scripts that were already widely covered, the Noto fonts simply broadened the variety of styles available, while for other scripts, they opened the door to the digital domain.

Not all scripts are on even footing yet; a great many have just begun their "digital lives". As always, improvements will need to be made, and additional scripts will have to be encoded. All the same, I can say with confidence that this is a fruitful and auspicious moment for the whole spectrum of global scripts, from red to violet.

⋄ Kamal Mansour
Kamal.Mansour (at) {monotype,me} dot com

Production notes

Karl Berry

This article was typeset with XeLaTeX, using the polyglossia package and its commands \texthindi and \textkhmer for the Devanagari and Khmer examples. The input text was UTF-8.

We used the Noto Serif Devanagari and Noto Serif Khmer fonts (regular weight). It was not feasible or desirable to install them as system fonts, so we specified them to polyglossia as filenames:

    \setotherlanguages{hindi,khmer}
    \newfontfamily\devanagarifont[Script=Devanagari]
      {NotoSerifDevanagari-Regular.ttf}
    \newfontfamily\khmerfont[Script=Khmer]
      {NotoSerifKhmer-Regular.ttf}

With the addition of ,Renderer=HarfBuzz in the \newfontfamily calls, the results obtained with LuaLaTeX were identical. ⋄
