Typesetting Vietnamese with VnTEX (and with the TEX Gyre fonts too)

Hàn Thế Thành River Valley Technologies http://river-valley.com Abstract This paper attempts to give an overview of the topics related to typesetting Viet- namese with TEX. There are no detailed instructions, but rather a comprehensive list of relevant issues and resources, so the reader can get an overall feeling of what is possible and where to look for further information for a quick start. A section is dedicated to a frequently asked question, i. . how to write a few Vietnamese words in a non-Vietnamese document. I also take a close look at the Latin Modern and TEX Gyre fonts from the perspective of a Vietnamese end user, hoping to provide some useful feedback to the authors of those fonts.

1 A brief introduction to the AĂÂBCDĐEÊGHIKLMN OÔƠPQRSTUƯVXY For convenience I repeat here a short introduction aăâbcdđeêghiklmn to the Vietnamese language from [1] (with small oôơpqrstuưvxy modifications): In Vietnamese all words consist of single sylla- Vietnamese is written with Latin letters and bles, so they are often very short; hyphenation is not additional accents in a system called Quốc allowed at all. Ngữ (which can be translated to English as The large number of Vietnamese accented let- “national language”) developed by the French ters has caused big problems in the past. Since there missionary Alexander de Rhodes. What sepa- are 134 accented letters, we have 6 letters that don’ rates Vietnamese from other languages type- fit within the upper 128 positions of an 8-bit encod- set with Latin characters is that in Vietnamese ing. So people invented various encodings to deal some letters can have two accents. The to- with this problem, using techniques like dropping tal number of accented letters in Vietnamese uppercase letters, or placing some letters in slots (including uppercase and lowercase letters) 0–31, or removing some ASCII letters. None was is 134. Table 1 lists all accented lowercase perfect, until UTF-8 became widely used. Thanks to Vietnamese letters. UTF-8 support in the recent LATEX kernel, this is no longer a problem. Vietnamese accents can be divided into three The way that punctuation marks are typeset is kinds of marks: , and con- not consistent; sometimes people put a space before sonant. Table 2 shows them all with examples. them, sometimes not. However, the case without a space before punctuation is dominant. The accents in Vietnamese play a very impor- More information on the Vietnamese tant role, since they are used to indicate different can be found in [9]. meanings and pronunciations of words. The same words with different accents can have a totally differ- 2 Introduction to VnTEX ent meaning (though the pronunciations can sound VnTEX is a package to typeset Vietnamese with LATEX very similar to a non-Vietnamese). and plain TEX. VnTEX consists of the following: The vowel marks are more important than tone • Vietnamese fonts; marks. Letters with different vowel marks are con- • LAT support for Vietnamese: input encoding, sidered as distinguished letters, but a letter with E font encoding, support for babel, a style file to different tone marks is considered as a single letter activate Vietnamese, etc.; with different tones. have more impact on pronunciation than tones. • plain TEX support; The official as taught in • comprehensive font samples and test files; school has 29 letters in this collating order: • some brief documentation.

TUGboat, Volume 29, No. 1 — XVII European TEX Conference, 2007 95 Hàn Thế Thành

a á ạ à ả ã o ọ ò ỏ õ Vowel breve lăn ă ắ ặ ằ ẳ ẵ ô ố ộ ồ ổ ỗ tôi â ấ ậ ầ ẩ ẫ ơ ớ ợ ờ ở ỡ lươn Tone e é ẹ è ẻ ẽ u ú ụ ù ủ ũ acute lá ế ệ ề ể ễ ư ứ ự ừ ử ữ grave là lả i í ị ì ỉ ĩ ý ỵ ỳ ỷ ỹ lã đ below lạ Table 1: List of all Vietnamese lowercase letters taking accents stroke đi

Table 2: List of all Vietnamese diacritic marks

VnTEX tries to make typesetting Vietnamese as in your final PDF or PostScript file. accessible and easy as typesetting English, and also 1. As the very first requirement, you must have tries to follow common LAT X conventions, in order to E some minimal LATEX support for Vietnamese: minimize the chance of conflicts with other packages. • Check whether you have VnT X installed. The official LATEX font encoding for Vietnamese is T5, E made by Werner Lemberg and Vladimir Volovich. VnTEX is included in teTEX, MiKTEX and T X Live. VnTEX supports various input encodings for Viet- E namese. The recommended encoding to use with • If you don’t have VnTEX installed and you VnTEX is UTF-8. If for some reason UTF-8 cannot are using Latin Modern or TEX Gyre fonts, be used (for example your TEX editor doesn’t support then you can just download a single file it), VISCII is the recommended 8-bit encoding. http://vntex.sf.net/download/ The first version was released in 2000. The vntex-support/t5enc.def and put it in current version is 3.1.5 and was released in January the directory where your LATEX source is. 2007. VnT X is being maintained by Hàn Thế Thành, E • Or, if you feel that you need VnTEX, or Werner Lemberg and Reinhard Kotucha. The official want to give it a try, or you are not using website of VnT X is [2]. E Latin Modern or TEX Gyre fonts but a VnT X is already part of T X Live and MiKT X, E E E font available in VnTEX only, you can try which frees most users from the need to install it to download and install VnTEX by following manually. the instructions (in English) at [3]. ConTEXt also has support for Vietnamese (partly 2. If all the above fails, try to get help from some- imported from VnTEX) and a number of Vietnamese users. one else to solve at least one of those issues. 3. Make sure you have the package fontenc loaded 3 How to typeset just a few with T5 encoding. For example, if your docu- Vietnamese words ment contains European language() only, then This section tries to answer the question that has you should have a line saying often been asked: How can I typeset just a few Viet- \usepackage[T5,T1]{fontenc} namese words in my LATEX document, which is writ- ten in English (German/Polish/French/...)? in your preamble. The answer depends very much on the particular 4. Here’s an example of how to input Vietnamese scenario; however, I assume that you are in a hurry, words: you don’t want to bother with issues like how to {\fontencoding{T5}\selectfont display and write Vietnamese in your TEX editor. Ti\’\ecircumflex{}ng Vi\\ecircumflex{}t} You have only a few Vietnamese words in your LATEX file and you would like to see them properly displayed which gives the output Tiếng Việt.

96 TUGboat, Volume 29, No. 1 — XVII European TEX Conference, 2007 Typesetting Vietnamese with VnTEX (and with the TEX Gyre fonts too)

5. If the font family you are using doesn’t support • use the XUniKey or Xvnkb keyboard driver T5 encoding, you might want to use another to input Vietnamese letters (this is not family by saying something like needed for Emacs which has its own input \fontencoding{T5}\fontfamily{cmr} methods). \selectfont Supposing that you have VnTEX installed and can read/write Vietnamese with your editor, we can Instead of cmr ( fonts) you continue now. can use lmr (Latin Modern fonts), or any font family that supports T5 encoding. For a com- A minimal example looks as follows: plete list, see [4]. \documentclass{report} 6. Table 3 contains all Vietnamese letters for your \usepackage[utf8]{} reference. \begin{document} Tiếng Việt 7. If you have quite a lot of Vietnamese words it \end{document} can be somewhat tedious to translate them to the above form (often called the LATEX Internal If the document contains multiple languages, Character Representation — LICR). On Win- you can use babel: dows you can use the package http://vntex. \documentclass{report} sourceforge.net/download/vntex-support/ \usepackage[english,vietnam]{babel} tovntex.zip to translate text in the clipboard \usepackage[utf8]{inputenc} from VIQR or UTF-8 to LICR by one key press. \begin{document} The same (or close) convenience could be \selectlanguage{english} made for Unix/ users, but at somewhat English text higher cost due to deficiencies of Unix-like sys- \selectlanguage{vietnam} tems. So if you don’t use Windows then you are Tiếng Việt out of luck, sorry. However, if you use Vim, you \end{document} can still download the package mentioned above, Once you get started with those examples, you and use the vim script inside the zip archive might be interested in some practical issues like: to do the conversion. If you want to make this • how to use VnTEX with Scientific WorkPlace, easier for Unix users then let me know. • how to create PDF that can be searched, cut or In the future, a page might be set up to pasted, provide online conversion from UTF-8 text to • how to create PDF with bookmarks, LICR. • how to use MakeIndex with VnTEX, 4 A quick start for VnTEX users • how to convert LATEX to HTML using TEX4ht. This section describes the very first steps for those All those issues are solved and described at the who want to use VnTEX. In contrast to the previ- VnTEX website [2]. ous section, the requirement here is that you have 5 A survey of fonts that can be used VnT X installed, your editor can properly display E with VnT X Vietnamese text and you can input Vietnamese text. E If your setup doesn’t meet these requirements, here VnTEX comes with quite a large set of fonts. Apart are some quick tips: from that, many existing fonts also support Viet- 1. for Windows users: namese. Most of them are made by the Polish font experts who are working on the Latin Modern and • install MiKT X; E TEX Gyre font projects. • use a text editor that supports UTF-8: TeX- Maker with a Unicode font like Courier 5.1 Fonts coming with VnTEX New; Many freely available TEX fonts have their Viet- • use the Unikey keyboard driver to input namese counterpart coming with VnTEX: Vietnamese letters. • VNR: based on Computer Modern, the default 2. for Unix users: TEX fonts; • install TEX Live 2007; • URWVN: based on URW fonts, the free version • use a text editor that supports UTF-8: TeX- of 35 standard PostScript fonts; Maker, Emacs or Kile, with a Unicode font • Vn CM Bright: based on CM Bright, a sans serif like Courier New; font derived from Computer Modern fonts;

TUGboat, Volume 29, No. 1 — XVII European TEX Conference, 2007 97 Hàn Thế Thành

À \‘A Í \’I Ý \’Y ị \d{i} Ả \{A} Ị \d{I} Ỵ \d{Y} ò \‘o à \~A Ò \‘O à \‘a ỏ \h{o} õ \~o Á \’A Ỏ \h{O} ả \h{a} ã \~a ó \’o Ạ \d{A} Õ \~O á \’a ọ \d{o} Ă \Abreve Ó \’O ạ \d{a} ô \ocircumflex Ằ \‘\Abreve Ọ \d{O} ă \abreve ồ \‘\ocircumflex Ẳ \h\Abreve Ô \Ocircumflex ằ \‘\abreve ổ \h\ocircumflex Ẵ \~\Abreve Ồ \‘\Ocircumflex ẳ \h\abreve ỗ \~\ocircumflex Ắ \’\Abreve Ổ \h\Ocircumflex ẵ \~\abreve ố \’\ocircumflex Ặ \d\Abreve Ỗ \~\Ocircumflex ắ \’\abreve ộ \d\ocircumflex  \Acircumflex Ố \’\Ocircumflex ặ \d\abreve ơ \ohorn Ầ \‘\Acircumflex Ộ \d\Ocircumflex â \acircumflex ờ \‘\ohorn ở \h\ohorn Ẩ \h\Acircumflex Ơ \Ohorn ầ \‘\acircumflex ẩ \h\acircumflex ỡ \~\ohorn Ẫ \~\Acircumflex Ờ \‘\Ohorn ẫ \~\acircumflex ớ \’\ohorn Ấ \’\Acircumflex Ở \h\Ohorn ấ \’\acircumflex ợ \d\ohorn Ậ \d\Acircumflex Ỡ \~\Ohorn ậ \d\acircumflex ù \‘u Đ \DJ Ớ \’\Ohorn đ \dj ủ \h{u} È \‘E Ợ \d\Ohorn è \‘e ũ \~u \h{E} Ù \‘U Ẻ ẻ \h{e} ú \’u Ẽ \~E Ủ \h{U} ẽ \~e ụ \d{u} É \’E Ũ \~U é \’e ư \uhorn Ẹ \d{E} Ú \’U ẹ \d{e} ừ \‘\uhorn Ê \Ecircumflex Ụ \d{U} ê \ecircumflex ử \h\uhorn ữ \~\uhorn Ề \‘\Ecircumflex Ư \Uhorn ề \‘\ecircumflex ứ \’\uhorn Ể \h\Ecircumflex Ừ \‘\Uhorn ể \h\ecircumflex ễ \~\ecircumflex ự \d\uhorn Ễ \~\Ecircumflex Ử \h\Uhorn ế \’\ecircumflex ỳ \‘y Ế \’\Ecircumflex Ữ \~\Uhorn ệ \d\ecircumflex ỷ \h{y} Ệ \d\Ecircumflex Ứ \’\Uhorn Ự \d\Uhorn ì \‘i ỹ \~y Ì \‘I ỉ \h{i} ý \’y Ỳ \‘Y Ỉ \h{I} ĩ \~i ỵ \d{y} Ỷ \h{Y} Ĩ \~I í \’i Ỹ \~Y Table 3: Vietnamese alphabet

• Vn Concrete: based on the CM Concrete font; • Vn Classico: based on URW Classico (also known • TXTTVN: based on TXTT, a very nice type- as Optima). writer font from the txfonts package; • Vntopia: based on Adobe Utopia; In addition to the above, VnTEX also contains TEX support for some TrueType fonts (MS Core fonts, • Vn Charter: based on Bitstream Charter; ArevSan, ComicSans). • Vn URW Grotesk: based on URW Grotesk font, a heavy font suitable for displayed text; 5.2 Fonts from other sources • Vn Garamond: based on URW Garamond; The fonts made by Bogusl aw Jackowski and Janusz

98 TUGboat, Volume 29, No. 1 — XVII European TEX Conference, 2007 Typesetting Vietnamese with VnTEX (and with the TEX Gyre fonts too)

Nowacki have excellent support for many languages are not very interested in trying, since the bene- including Vietnamese. The fonts come ready-to-use fits of switching to those new fonts are not that and cover all Vietnamese letters. Since the fonts are clear for an end user. Perhaps a short document so well known, I will only list them here without explaining the benefits to switch to those new description: fonts would be helpful. • Latin Modern fonts, Quality: the Vietnamese letters look very good in general: • TEX Gyre fonts (Bonum, Pagella, Schola, Ter- mes, Heros), • accents are very consistent regarding shape and placement; • Antykwa Toru´nska, • in most cases the accents look similar to • Iwona, or better than those in VNR and URWVN • Kurier. fonts; Antykwa Pól tawskiego is also a very nice font • kerning data are added for Vietnamese let- made by our Polish font experts but doesn’t support ters as well; Vietnamese yet. Therefore I take this opportunity • the widths of letters like Ư, Ơ, ư, ơ are to repeat my plea to add Vietnamese support to this adjusted properly; font :). • the quality of hints and outlines are far Starting with version 7.0, Adobe Reader comes better than in URWVN fonts. with two nice fonts: Minion and Myriad. These fonts Issues: still, there are some issues regarding accent have full support for Vietnamese. However, the use shapes and placement that need to be discussed of these fonts is restricted to Adobe Reader only. and perhaps re-considered if applicable: It’s not even allowed to embed a subset of the fonts • sometimes the accents don’t seem to have in a PDF or PostScript file. Using a recent version the same (or optically compatible) dark- of pdfT X, it is possible to use the fonts without E ness; especially the is often embedding them (but doing so implies that the PDF darker than the others in italic fonts; output must be viewed or printed with Adobe Reader • the placement of acute and grave over cir- 7.0 or newer). cumflex does not seem optimal to Viet- 5.3 Typesetting Vietnamese and math namese users, especially the combination of grave over circumflex. It is preferable to As mentioned above, VnT X tries to make typesetting E have the grave rather on the right side of Vietnamese as easy as typesetting English. When the circumflex, instead of the center. it comes to math typesetting, this holds true as well. Most of the fonts listed in the Free Math Font • sometimes the accents are placed too close Survey [7] either have a Vietnamese version avail- to the base letter or to each other, making some letters look too “crowded”. This is able with VnTEX, or support Vietnamese themselves. Therefore, the samples listed in the survey also apply more visible at small sizes, or low resolu- to Vietnamese. There is even a Vietnamese transla- tion. tion of the survey [8]. 7 Acknowledgments 6 Comments on Latin Modern and I would like to thank all the people who have con- TEX Gyre from a user perspective tributed to VnTEX in various ways. I try to list here the most significant contributors that I can recall, in The Latin Modern and TEX Gyre fonts try to make no particular order: life easier for TEX maintainers as well as TEX users. Therefore I think it is important for the authors to • Vladimir Volovich for work on the T5 encoding get feedback not only from font experts and TEX and supporting T5 encoding in his cmap package, maintainers, but also from end users. In this sec- • Werner Lemberg for major LATEX support in tion I would like to comment on the fonts from the VnTEX and also for the vncmr package, from perspective of a Vietnamese TEX user. which VnTEX was derived, Ease of use: excellent, since the fonts are already • Reinhard Kotucha for maintaining the VnTEX included in major TEX distributions, with every- package, thing needed for use. In other words, it cannot • Bogusl aw Jackowski and Janusz . Nowacki for be easier for someone who wants to use them or their excellent support of Vietnamese in their give them a try. However, it seems that people fonts,

TUGboat, Volume 29, No. 1 — XVII European TEX Conference, 2007 99 Hàn Thế Thành

• Nguyễn Đại Quý (also known as vnpenguin) for [5] The VnOSS forum has a box dedicated to TEX supporting the vntex.org domain and many and Vietnamese: various issues related to VnTEX, including dedi- http://forum.vnoss.org/viewforum.php? cating a box on his forum [5] to VnTEX, id=10 • Thái Phú Khánh Hòa for the Vietnamese trans- [6] Another web site with a forum useful for Viet- lation of the Free Math Font Survey and the namese TEX users: VnTEX logo at the VnTEX website [2]. http://viettug.org [7] The Free Math Font Survey is available at: References http://ctan.tug.org/tex-archive/info/ [1] Hàn Thế Thành, Making Type 1 fonts for Viet- Free_Math_Font_Survey/survey.html namese, TUGboat 24:1, Proc. of the 24th An- [8] The Vietnamese translation of the Free Math nual Meeting and Conference of the TEX Users Font Survey is available at: Group, pp. 69–84. http://ctan.org/tex-archive/info/Free_ [2] The official VnTEX website: vntex.sf.net. Math_Font_Survey/vn/survey-vn.pdf [3] The latest version of VnTEX and installation [9] The definition of the Vietnamese alphabet at instructions (in English) are available at: Wikipedia: http://en.wikipedia.org/wiki/ http://vntex.sf.net/download/vntex Quoc_ngu [4] The complete font samples that can be used with VnTEX are available at http://vntex.sf.net/ fonts/samples. The NFSS and TFM names of all fonts are included, making it easy to use a particular font.

100 TUGboat, Volume 29, No. 1 — XVII European TEX Conference, 2007