
TUGboat, Volume 14 (1993), No. 1

Philology

Typesetting Chinese Pinyin Using Virtual Fonts

Wai Wong

Pinyin is a phonetic system for transcribing the pronunciation (in Mandarin) of Chinese characters. It is in widespread use in China [1]. Pinyin is taught to millions of children in schools there. It is also used to teach foreigners and speakers of other Chinese dialects to learn Mandarin. This paper describes a package for typesetting pinyin in LaTeX [2]. It uses the features of virtual fonts and ligatures to provide an easy way of generating fonts for pinyin and a simple input method. This package requires a .dvi driver which is capable of handling virtual fonts.

1 What is pinyin

Pinyin uses the Latin alphabet to transcribe the sound of Chinese characters. All Chinese characters are single syllables. Each syllable can be divided into two parts: the initial and the final. The initials are consonants (or shēng mǔ in Chinese). Some of the finals (or yùn mǔ in Chinese) are pure vowels, the others are combinations of vowels and consonants. Each syllable can be pronounced in one of four possible tones (shēng diào).

When writing down pinyin, the tone of the characters is indicated by placing a tone mark on top of the main vowel in the finals. For example, the vowel a can have the following four tones: ā á ǎ à.

There are two styles of writing pinyin: the first is to separate syllables by spaces, i.e., the pinyin for each character is separated; the second puts the pinyin of multi-character words together, like pīnyīn. When writing in the second form, syllables beginning with a, e or o may follow a vowel-ending syllable. This will cause confusion. If such a case arises, an apostrophe mark (') is used to separate them. For example, pí'ǎo (meaning leather jacket).

2 How to typeset pinyin

Since pinyin uses the Latin alphabet, it would not be too difficult to typeset it in LaTeX, except one has to place the tone marks properly. The normal LaTeX accent mark macros may be used for this purpose, but they are not very convenient since one needs to put a tone mark on each syllable. (Try typing in a paragraph of properly tone-marked pinyin using those macros yourself.)

The pinyin package provides a new style option pinyin to simplify the input and typesetting of pinyin. This option provides an environment, a simple input method and several fonts specially built for pinyin. To use it, put the word pinyin in the optional argument of the \documentstyle command. Then, within your document, the pinyin environment can be used anywhere.

Within the pinyin environment, tone marks are input by typing the appropriate character after the main vowel. The characters to mark the tones are: - ' v `. For example, ma- ma' mav ma` will produce mā má mǎ mà. Another example shown below is a poem by the famous poet Lǐ Bái in the Tang dynasty. The input

    \begin{quote}%
    \begin{pinyin}%
    chua'ng qia'n mi'ng yue` gua-ng\\
    yi' shi` di` sha`ng shua-ng\\
    juv to'u wa`ng mi'ng yue`\\
    di- to'u si- gu` xia-ng
    \end{pinyin}%
    \end{quote}

produces

    chuáng qián míng yuè guāng
    yí shì dì shàng shuāng
    jǔ tóu wàng míng yuè
    dī tóu sī gù xiāng

The input formats of all possible combinations of the tone marks and vowels are listed in Table 1.

    ā  a-      á  a'      ǎ  av      à  a`
    ē  e-      é  e'      ě  ev      è  e`
    ī  i-      í  i'      ǐ  iv      ì  i`
    ō  o-      ó  o'      ǒ  ov      ò  o`
    ū  u-      ú  u'      ǔ  uv      ù  u`
    ǖ  u:-     ǘ  u:'     ǚ  u:v     ǜ  u:`
    ü  u:      ê  e^

    Ā  A-      Á  A'      Ǎ  Av      À  A`
    Ē  E-      É  E'      Ě  Ev      È  E`
    Ī  I-      Í  I'      Ǐ  Iv      Ì  I`
    Ō  O-      Ó  O'      Ǒ  Ov      Ò  O`
    Ū  U-      Ú  U'      Ǔ  Uv      Ù  U`
    Ǖ  U:-     Ǘ  U:'     Ǚ  U:v     Ǜ  U:`
    Ü  U:      Ê  E^

Table 1: Input formats of pinyin characters
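The input convention described above (a tone character typed after the main vowel, and u: for ü) can be illustrated outside TeX. The following is only a sketch in Python, not part of the pinyin package: the function name pinyin_to_unicode and the use of Unicode combining marks are assumptions made for illustration, whereas the package itself performs the substitution through font ligatures.

```python
import re
import unicodedata

# Tone characters used by the input method, mapped to Unicode combining marks.
# (The package does this with font ligatures; this mapping is an illustration.)
TONE_MARKS = {
    "-": "\u0304",  # first tone: macron
    "'": "\u0301",  # second tone: acute
    "v": "\u030c",  # third tone: caron
    "`": "\u0300",  # fourth tone: grave
}

# A vowel (including the already substituted u-umlaut) followed by a tone character.
_VOWEL_TONE = re.compile("([aeiou\u00fcAEIOU\u00dc])([-'v`])")


def pinyin_to_unicode(text: str) -> str:
    """Convert input such as "ma-" or "nu:v" to tone-marked Unicode text."""
    # "u:" and "e^" name the letters u-umlaut and e-circumflex.
    text = text.replace("u:", "\u00fc").replace("U:", "\u00dc")
    text = text.replace("e^", "\u00ea").replace("E^", "\u00ca")

    def compose(match: re.Match) -> str:
        vowel, tone = match.group(1), match.group(2)
        # NFC turns "vowel + combining mark" into one precomposed character.
        return unicodedata.normalize("NFC", vowel + TONE_MARKS[tone])

    return _VOWEL_TONE.sub(compose, text)


if __name__ == "__main__":
    print(pinyin_to_unicode("ma- ma' mav ma`"))
    print(pinyin_to_unicode("di- to'u si- gu` xia-ng"))
```

Like the ligature mechanism, this sketch treats every v after a vowel as a tone mark, which is safe only because pinyin does not use the letter v.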

[Table body: the pinyin characters laid out by character code, one row per high hexadecimal digit (up to "Bx) and one column per low hexadecimal digit.]

Table 2: pinyin character positions

The pinyin environment can take an optional argument which specifies the font to use. The currently available fonts are sf and rm. Conventionally, pinyin is printed in a Sans Serif font, so sf is the default as you can see in the examples above. If you say

    \begin{pinyin}[rm] ... \end{pinyin}

the pinyin text will be typeset in a Roman font.

Very often, a small piece of pinyin is embedded in running text. A shorthand command \py can be used instead of the pinyin environment. For example, \py{pi-n yi-n} produces pīn yīn. Like the environment, \py can also take an optional argument to specify the font.

There is a command to specify the size of the pinyin characters: \pysize. It takes a single argument which is the required size. It can be any reasonable size for text, such as 10pt, 11pt and so on. The default is 10pt.

3 The implementation of the pinyin package

The pinyin environment is implemented using virtual fonts and ligatures. Two families of virtual fonts, namely pyrn and pyssn, were created for typesetting pinyin. These pinyin fonts are needed because:

- accented characters, e.g., ā, ī, which are not included in ordinary TeX fonts and not even in the extended (DC/EC) fonts, are required in pinyin;
- a special ligature table is required to support the input method in the pinyin environment.

These pinyin fonts are based on the Computer Modern fonts, i.e., pyr10 is modified from cmr10 and pyss10 from cmss10. Ligature tables were defined for each of the vowels to map the combinations of the vowel and tone mark character to new character positions. All new characters and their codes are listed in Table 2 (see note 1). The compound characters are defined in terms of the base vowel characters and the appropriate accent characters. These mappings and character definitions were done in the property list files. For example, the pyss10 font is created by the following procedures:

1. convert cmss10.tfm to cmss10.pl using tftopl;
2. copy cmss10.pl into a file named pyss10.vpl;
3. edit pyss10.vpl to add the ligature tables and new character definitions;
4. generate pyss10.tfm and pyss10.vf using vptovf.

The ligature tables for all pinyin fonts should be identical. Appendix A lists the complete ligature table. When editing the .vpl file, care must be taken to merge this into the existing table in the file. The new character definitions have a general form that looks like the example below:

    (CHARACTER O 241
       (COMMENT a1 a-)
       (MAP (PUSH) (SETCHAR C a) (POP)
            (SETCHAR O 26))
       (CHARWD R 0.480557)
       (CHARHT R 0.608887)
       )

Note 1: The positions of the lowercase pinyin characters are arranged in a way that their codes are equal to the least significant byte of GB codes (the Chinese national standard). For example, ā at position "A1 has GB code A8A1 in hexadecimal.
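The correspondence between font positions and GB codes stated in the note can be checked with a few lines of Python. This is a sketch under two assumptions: that the lowercase pinyin letters occupy the GB 2312 row whose first byte is hexadecimal A8 (as the A8A1 example suggests), and that Python's standard gb2312 codec is an acceptable stand-in for the standard; the helper name pinyin_char_at is hypothetical.

```python
def pinyin_char_at(position: int) -> str:
    """Return the character whose GB code is A8 followed by the font position.

    Assumption: the lowercase pinyin letters share the first byte 0xA8,
    as in the example "A1 <-> GB code A8A1 given in the note.
    """
    return bytes([0xA8, position]).decode("gb2312")


if __name__ == "__main__":
    # Positions '241 to '244 are the four tones of "a" in the font.
    for position in range(0o241, 0o245):
        print(f"'{position:o} = \"{position:X} ->", pinyin_char_at(position))
```

With this arrangement, text stored in GB encoding could in principle address the lowercase pinyin glyphs by its second byte alone.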

for single accent in (¯, ´, ˇ, `, ¨, ^)
    if the base character basechar is in (a, e, o, u)
        wd is the width of basechar
        ht is the height of accent
        no movement is needed for setting the character
        the program is:
            (PUSH) (SETCHAR basechar) (POP) (SETCHAR accent)
    if the base character basechar is i
        basechar becomes ı (the dotless i)
        wd is the width of basechar
        ht is the height of accent
        the program is:
            (PUSH) (MOVELEFT (wd_accent - wd_basechar)/2) (SETCHAR accent) (POP)
            (SETCHAR dotless i)
for double accent
    wd is the width of the base character (u)
    ht ← ht_ddot + ht_accent - ht_u
    the program is:
        (PUSH) (SETCHAR C u) (POP)
        (PUSH) (SETCHAR O 177) (POP)
        (MOVEUP ht_ddot - ht_u) (SETCHAR accent)

Figure 1: Algorithm for pinyin character definitions.
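As an illustration of how the algorithm in Figure 1 could be mechanized, the sketch below emits .vpl CHARACTER entries in Python. It is not the author's program: the names Metric, single_accent and double_accent are hypothetical, the metrics must be supplied by the caller (for example, read from the .pl file produced by tftopl), and the code positions O 20 (dotless i) and O 177 (dieresis) assume the standard Computer Modern text layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Width and height of one character, in design-size units."""
    wd: float
    ht: float


DOTLESS_I = "O 20"   # assumed position of the dotless i in the base font
DIERESIS = "O 177"   # position of the dieresis, as used in Figure 1


def real(value: float) -> str:
    """Format a number as a property-list real."""
    return f"R {value:.6f}"


def entry(code: int, wd: float, ht: float, program: list[str]) -> str:
    """Assemble one CHARACTER entry from its metrics and MAP program."""
    return (f"(CHARACTER O {code:o}\n"
            f"   (MAP {' '.join(program)})\n"
            f"   (CHARWD {real(wd)})\n"
            f"   (CHARHT {real(ht)})\n"
            f"   )")


def single_accent(code: int, base: str, base_m: Metric,
                  accent: str, accent_m: Metric) -> str:
    """Single-accent case; for base "i", base_m must describe the dotless i."""
    if base == "i":
        shift = (accent_m.wd - base_m.wd) / 2
        program = ["(PUSH)", f"(MOVELEFT {real(shift)})",
                   f"(SETCHAR {accent})", "(POP)", f"(SETCHAR {DOTLESS_I})"]
    else:
        program = ["(PUSH)", f"(SETCHAR C {base})", "(POP)",
                   f"(SETCHAR {accent})"]
    return entry(code, base_m.wd, accent_m.ht, program)


def double_accent(code: int, u_m: Metric, ddot_m: Metric,
                  accent: str, accent_m: Metric) -> str:
    """Double-accent case: u with dieresis plus a tone mark raised above it."""
    raise_by = ddot_m.ht - u_m.ht
    program = ["(PUSH)", "(SETCHAR C u)", "(POP)",
               "(PUSH)", f"(SETCHAR {DIERESIS})", "(POP)",
               f"(MOVEUP {real(raise_by)})", f"(SETCHAR {accent})"]
    return entry(code, u_m.wd, ddot_m.ht + accent_m.ht - u_m.ht, program)


if __name__ == "__main__":
    # Width of "a" and height of the macron are the values quoted in the
    # paper's example; the fields set to 0.0 are unused placeholders.
    print(single_accent(0o241, "a", Metric(0.480557, 0.0),
                        "O 26", Metric(0.0, 0.608887)))
```

Any manual adjustment of accent positions, which the paper notes is needed for a perfect result, would still have to be applied to the generated entries afterwards.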

This is the definition for character ā whose character code is '241. The MAP property specifies how this character is generated. It first saves the current position, typesets the character a in the current base font and recovers the current position. It then overprints the accent character ¯ whose code is '26. This is the simplest case since the accents in CM fonts were designed so that they do not need to be raised or lowered for most lowercase letters. For more complicated characters like ǖ, the second accent mark (the one on top) has to be moved up. The amount can be calculated using the heights of the base character and the accents given in the original .pl file. The CHARWD and CHARHT properties specify the width and height of the new character. They are calculated from the widths and heights of the base and accent characters.

The algorithm in Figure 1 was used in building the pinyin fonts where wd and ht are respectively the width and height of the new characters. This gives a precise specification of the new characters. The resulting appearance of the compound characters is very good though not perfect. The advantage of using an algorithm is that it could be possible to mechanize the generation of new font files by a program. To achieve the perfect result, manual editing, i.e., moving the accents or base character around, is necessary.

4 A sample installation

This package has been developed and tested in UNIX systems. It will not be difficult to port it to other systems; in fact, the style file and the font files can be moved to other systems without any change provided that the new environment has TeX and a .dvi driver supporting virtual fonts. The remainder of this section describes an installation procedure in UNIX systems as an example.

This package is distributed as a uuencoded compressed tar archive file for UNIX users. Use the commands below to decode and extract the files:

    uudecode pinyin.tar.Z.uue
    zcat pinyin.tar.Z | tar xvf -

Afterward there will be a directory containing the style file pinyin.sty and the .vf and .tfm files. To install it, follow the procedures below:

1. copy pinyin.sty to the directory where LaTeX looks for macro files. In UNIX systems it is usually /usr/local/lib/tex/inputs.
2. copy all the .tfm files to the directory where TeX looks for fonts. In UNIX systems it is usually /usr/local/lib/tex/fonts/tfm.
3. copy all the .vf files to the directory where your .dvi driver looks for virtual fonts. For dvips in UNIX systems, it is usually /usr/local/lib/tex/fonts/vf.

5 Conclusions and future work

The pinyin package provides a simple and flexible means of typesetting pinyin text. Its usefulness is obvious to anyone who wants to perform this task. It is very easy to use. The output quality is high, thanks to the TeX system.

The pinyin package currently has a small number of fonts, and they are hard-wired in the pinyin environment. This is sufficient for most applications: since pinyin is only a phonetic transcription, it is usually mixed with ordinary text in a document and seldom appears as an entire pinyin document on its own. Actually, having special fonts will provide a visual distinction between ordinary text and pinyin.

A natural extension to this work will be to automate the generation of new fonts by implementing the algorithm in Figure 1 and incorporating the ligature table. This will provide us with an easy means of creating new fonts when the need arises.

The way of implementing this package, namely by using virtual fonts and ligatures, can be generalized for other languages, such as Vietnamese, which have many accented and double accented characters. For such applications, the automatic font generation program is necessary, and style files which support more flexible font change and the new font selection scheme (NFSS) are also required.

References

[1] Scheme for the Chinese phonetic alphabet. Issued by the State Council of the People's Republic of China.
[2] Leslie Lamport. LaTeX: A Document Preparation System. Addison-Wesley, 1985.
[3] Tomas Rokicki. DVIPS: a TeX driver.

Wai Wong
Computer Laboratory
University of Cambridge
New Museum Site
Pembroke Street
Cambridge CB2 3QG
U.K.
ww@cl.cam.ac.uk

A Ligature table for pinyin characters

This appendix lists the ligature table for the pinyin characters. It should be merged into the existing table in the base font.

    (LABEL C A)
    (LIG O 55 O 201)
    (LIG O 47 O 202)
    (LIG C v O 203)
    (LIG O 140 O 204)
    (STOP)
    (LABEL C E)
    (LIG O 55 O 205)
    (LIG O 47 O 206)
    (LIG C v O 207)
    (LIG O 140 O 210)
    (LIG O 136 O 232)
    (STOP)
    (LABEL C I)
    (LIG O 55 O 211)
    (LIG O 47 O 212)
    (LIG C v O 213)
    (LIG O 140 O 214)
    (STOP)
    (LABEL C O)
    (LIG O 55 O 215)
    (LIG O 47 O 216)
    (LIG C v O 217)
    (LIG O 140 O 220)
    (STOP)
    (LABEL C U)
    (LIG O 55 O 221)
    (LIG O 47 O 222)
    (LIG C v O 223)
    (LIG O 140 O 224)
    (LIG O 072 O 231)
    (STOP)
    (LABEL O 231)
    (LIG O 55 O 225)
    (LIG O 47 O 226)
    (LIG C v O 227)
    (LIG O 140 O 230)
    (STOP)
    (LABEL C a)
    (LIG O 55 O 241)
    (LIG O 47 O 242)
    (LIG C v O 243)
    (LIG O 140 O 244)
    (STOP)
    (LABEL C e)
    (LIG O 55 O 245)
    (LIG O 47 O 246)
    (LIG C v O 247)
    (LIG O 140 O 250)
    (STOP)
    (LABEL C i)
    (LIG O 55 O 251)
    (LIG O 47 O 252)
    (LIG C v O 253)
    (LIG O 140 O 254)
    (STOP)
    (LABEL C o)
    (LIG O 55 O 255)
    (LIG O 47 O 256)
    (LIG C v O 257)
    (LIG O 140 O 260)
    (STOP)
    (LABEL C u)
    (LIG O 55 O 261)
    (LIG O 47 O 262)
    (LIG C v O 263)
    (LIG O 140 O 264)
    (LIG O 072 O 271)
    (STOP)
    (LABEL O 271)
    (LIG O 55 O 265)
    (LIG O 47 O 266)
    (LIG C v O 267)
    (LIG O 140 O 270)
    (STOP)
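The table in this appendix is regular: each vowel owns four consecutive codes for the tone characters - ' v ` (octal 55, 47, the letter v, and 140), and a colon (octal 072) after u or U leads to a second label for ü or Ü. The Python sketch below regenerates the listing from that pattern. It is an illustration only, with hypothetical function names, and it mirrors the listing literally, including the fact that the circumflex ligature (octal 136) appears only under E.

```python
# Tone characters in input order: "-", "'", "v", "`".
TONE_TRIGGERS = ["O 55", "O 47", "C v", "O 140"]


def lig_block(label: str, first_code: int, extras=()) -> list[str]:
    """One LABEL ... STOP group: four tone ligatures plus optional extras."""
    lines = [f"(LABEL {label})"]
    for offset, trigger in enumerate(TONE_TRIGGERS):
        lines.append(f"(LIG {trigger} O {first_code + offset:o})")
    for trigger, target in extras:
        lines.append(f"(LIG {trigger} O {target:o})")
    lines.append("(STOP)")
    return lines


def ligature_table() -> list[str]:
    """Rebuild the appendix listing: uppercase from '201, lowercase from '241."""
    table: list[str] = []
    for case_base, vowels in ((0o201, "AEIOU"), (0o241, "aeiou")):
        for index, vowel in enumerate(vowels):
            first = case_base + 4 * index
            extras = []
            if vowel == "E":
                extras.append(("O 136", 0o232))      # E^ as listed
            if vowel in "uU":
                umlaut = case_base + 0o30            # '231 or '271
                extras.append(("O 072", umlaut))     # colon selects the umlaut
            table += lig_block(f"C {vowel}", first, extras)
            if vowel in "uU":
                # Tone characters typed after "u:" act on the umlaut character.
                table += lig_block(f"O {umlaut:o}", first + 4)
    return table


if __name__ == "__main__":
    print("\n".join(ligature_table()))
```

Generating the table this way would keep the ligature tables of all pinyin fonts identical, which the paper requires, but the output still has to be merged by hand into the LIGTABLE of each base font.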