Typesetting Vietnamese with Vntex (And with the TEX Gyre Fonts Too)
Total Page:16
File Type:pdf, Size:1020Kb
Typesetting Vietnamese with VnTEX (and with the TEX Gyre fonts too) Hàn Th¸ Thành River Valley Technologies http://river-valley.com Abstract This paper attempts to give an overview of the topics related to typesetting Viet- namese with TEX. There are no detailed instructions, but rather a comprehensive list of relevant issues and resources, so the reader can get an overall feeling of what is possible and where to look for further information for a quick start. A section is dedicated to a frequently asked question, i. e. how to write a few Vietnamese words in a non-Vietnamese document. I also take a close look at the Latin Modern and TEX Gyre fonts from the perspective of a Vietnamese end user, hoping to provide some useful feedback to the authors of those fonts. 1 A brief introduction to the AĂÂBCDĐEÊGHIKLMN Vietnamese language OÆƠPQRSTUƯVXY For convenience I repeat here a short introduction a«¥bcdđe¶ghiklmn to the Vietnamese language from [1] (with small oôơpqrstuưvxy modifications): In Vietnamese all words consist of single sylla- Vietnamese is written with Latin letters and bles, so they are often very short; hyphenation is not additional accents in a system called Quèc allowed at all. Ngú (which can be translated to English as The large number of Vietnamese accented let- “national language”) developed by the French ters has caused big problems in the past. Since there missionary Alexander de Rhodes. What sepa- are 134 accented letters, we have 6 letters that don't rates Vietnamese from other languages type- fit within the upper 128 positions of an 8-bit encod- set with Latin characters is that in Vietnamese ing. So people invented various encodings to deal some letters can have two accents. The to- with this problem, using techniques like dropping tal number of accented letters in Vietnamese uppercase letters, or placing some letters in slots (including uppercase and lowercase letters) 0–31, or removing some ASCII letters. None was is 134. Table 1 lists all accented lowercase perfect, until UTF-8 became widely used. Thanks to Vietnamese letters. UTF-8 support in the recent LATEX kernel, this is no longer a problem. Vietnamese accents can be divided into three The way that punctuation marks are typeset is kinds of diacritic marks: tone, vowel and con- not consistent; sometimes people put a space before sonant. Table 2 shows them all with examples. them, sometimes not. However, the case without a space before punctuation is dominant. The accents in Vietnamese play a very impor- More information on the Vietnamese alphabet tant role, since they are used to indicate different can be found in [9]. meanings and pronunciations of words. The same words with different accents can have a totally differ- 2 Introduction to VnTEX ent meaning (though the pronunciations can sound VnTEX is a package to typeset Vietnamese with LATEX very similar to a non-Vietnamese). and plain TEX. VnTEX consists of the following: The vowel marks are more important than tone • Vietnamese fonts; marks. Letters with different vowel marks are con- • LAT X support for Vietnamese: input encoding, sidered as distinguished letters, but a letter with E font encoding, support for babel, a style file to different tone marks is considered as a single letter activate Vietnamese, etc.; with different tones. Vowels have more impact on pronunciation than tones. • plain TEX support; The official Vietnamese alphabet as taught in • comprehensive font samples and test files; school has 29 letters in this collating order: • some brief documentation. TUGboat, Volume 29, No. 1 — XVII European TEX Conference, 2007 95 Hàn Th¸ Thành a ¡ ¤ à £ ¢ o ó ọ á ỏ ã Vowel breve l«n « ặ ¬ ¯ ® ô è ë ồ ê é circumflex tôi ¥ § ª ¦ © ¨ ơ ớ ñ ờ ở ỡ horn lươn Tone e ² ẹ ± ´ ³ u ú ụ ù õ ũ acute l¡ ¶ ¸ » · º ¹ ư ù ự ø û ú grave là hook above l£ i ½ ị ¼ ¿ ĩ y ý ỵ ỳ ỷ ỹ tilde l¢ đ dot below l¤ Consonant Table 1: List of all Vietnamese lowercase letters taking accents stroke đi Table 2: List of all Vietnamese diacritic marks VnTEX tries to make typesetting Vietnamese as in your final PDF or PostScript file. accessible and easy as typesetting English, and also 1. As the very first requirement, you must have tries to follow common LAT X conventions, in order to E some minimal LATEX support for Vietnamese: minimize the chance of conflicts with other packages. • Check whether you have VnT X installed. The official LATEX font encoding for Vietnamese is T5, E made by Werner Lemberg and Vladimir Volovich. VnTEX is included in teTEX, MiKTEX and T X Live. VnTEX supports various input encodings for Viet- E namese. The recommended encoding to use with • If you don't have VnTEX installed and you VnTEX is UTF-8. If for some reason UTF-8 cannot are using Latin Modern or TEX Gyre fonts, be used (for example your TEX editor doesn't support then you can just download a single file it), VISCII is the recommended 8-bit encoding. http://vntex.sf.net/download/ The first version was released in 2000. The vntex-support/t5enc.def and put it in current version is 3.1.5 and was released in January the directory where your LATEX source is. 2007. VnT X is being maintained by Hàn Th¸ Thành, E • Or, if you feel that you need VnTEX, or Werner Lemberg and Reinhard Kotucha. The official want to give it a try, or you are not using website of VnT X is [2]. E Latin Modern or TEX Gyre fonts but a VnT X is already part of T X Live and MiKT X, E E E font available in VnTEX only, you can try which frees most users from the need to install it to download and install VnTEX by following manually. the instructions (in English) at [3]. ConTEXt also has support for Vietnamese (partly 2. If all the above fails, try to get help from some- imported from VnTEX) and a number of Vietnamese users. one else to solve at least one of those issues. 3. Make sure you have the package fontenc loaded 3 How to typeset just a few with T5 encoding. For example, if your docu- Vietnamese words ment contains European language(s) only, then This section tries to answer the question that has you should have a line saying often been asked: How can I typeset just a few Viet- \usepackage[T5,T1]{fontenc} namese words in my LATEX document, which is writ- ten in English (German/Polish/French/...)? in your preamble. The answer depends very much on the particular 4. Here's an example of how to input Vietnamese scenario; however, I assume that you are in a hurry, words: you don't want to bother with issues like how to {\fontencoding{T5}\selectfont display and write Vietnamese in your TEX editor. Ti\'\ecircumflex{}ng Vi\d\ecircumflex{}t} You have only a few Vietnamese words in your LATEX file and you would like to see them properly displayed which gives the output Ti¸ng Vi»t. 96 TUGboat, Volume 29, No. 1 — XVII European TEX Conference, 2007 Typesetting Vietnamese with VnTEX (and with the TEX Gyre fonts too) 5. If the font family you are using doesn't support • use the XUniKey or Xvnkb keyboard driver T5 encoding, you might want to use another to input Vietnamese letters (this is not family by saying something like needed for Emacs which has its own input \fontencoding{T5}\fontfamily{cmr} methods). \selectfont Supposing that you have VnTEX installed and can read/write Vietnamese with your editor, we can Instead of cmr (Computer Modern fonts) you continue now. can use lmr (Latin Modern fonts), or any font family that supports T5 encoding. For a com- A minimal example looks as follows: plete list, see [4]. \documentclass{report} 6. Table 3 contains all Vietnamese letters for your \usepackage[utf8]{vietnam} reference. \begin{document} Ti¸ng Vi»t 7. If you have quite a lot of Vietnamese words it \end{document} can be somewhat tedious to translate them to the above form (often called the LATEX Internal If the document contains multiple languages, Character Representation — LICR). On Win- you can use babel: dows you can use the package http://vntex. \documentclass{report} sourceforge.net/download/vntex-support/ \usepackage[english,vietnam]{babel} tovntex.zip to translate text in the clipboard \usepackage[utf8]{inputenc} from VIQR or UTF-8 to LICR by one key press. \begin{document} The same (or close) convenience could be \selectlanguage{english} made for Unix/Linux users, but at somewhat English text higher cost due to deficiencies of Unix-like sys- \selectlanguage{vietnam} tems. So if you don't use Windows then you are Ti¸ng Vi»t out of luck, sorry. However, if you use Vim, you \end{document} can still download the package mentioned above, Once you get started with those examples, you and use the vim script inside the zip archive might be interested in some practical issues like: to do the conversion. If you want to make this • how to use VnTEX with Scientific WorkPlace, easier for Unix users then let me know. • how to create PDF that can be searched, cut or In the future, a web page might be set up to pasted, provide online conversion from UTF-8 text to • how to create PDF with Unicode bookmarks, LICR. • how to use MakeIndex with VnTEX, 4 A quick start for VnTEX users • how to convert LATEX to HTML using TEX4ht. This section describes the very first steps for those All those issues are solved and described at the who want to use VnTEX.