The Arabi system: TeX writes in Arabic and Farsi

Youssef Jabri
École Nationale des Sciences Appliquées, Oujda, Morocco
yjabri (at) ensa dot univ-oujda dot ac dot ma

Abstract

In this paper, we present a package newly arrived on CTAN that provides Arabic script support for TeX without the need for an external pre-processor. The Arabi package adds one of the last major multilingual typesetting capabilities to Babel by adding support for the Arabic (عربي) and Farsi (فارسي) languages. Other languages using the Arabic script should also be more or less easy to implement. Arabi comes with many good-quality free fonts, Arabic and Farsi, and may also use commercial fonts. It supports several input encodings (the 8-bit CP-1256 and ISO-8859-6, and the multibyte UTF-8) and can typeset classical Arabic poetry. The package is distributed under the LaTeX Project Public License (LPPL), and has the LPPL maintenance status "author-maintained". It can be used freely (including commercially) to produce beautiful texts that mix Arabic, Farsi and Latin (or other) characters.

[Abstract in Arabic]

TUGboat, Volume 27 (2006), No. 2, Proceedings of the 2006 Annual Meeting, pages 147–153

1 Introduction

The development of Arabi¹ was a response to the absence of a package that handles the Arabic script and fulfills the following requirements:

1. LaTeX 2ε and Babel compliant, this format/package combination being, in our opinion, the most widely used when mixing different languages.
2. Able to use 8-bit input text, including already existing Arabic texts, on different systems.
3. Able to use existing beautiful Arabic fonts, both commercial and free.
4. Free (as in freedom), meaning a license like the GNU GPL or LPPL.

Arabi comes with an extensive user manual; this article gives a general overview of the system.

¹ The name of the package should not be misunderstood. It is not designed to support only the Arabic language, but all languages that use the Arabic script. Technically speaking, for Babel, they will all be considered as dialects of Arabic.

2 Typesetting Arabic with TeX: the existing possibilities

TeX and the Arabic script have a long history.

One might imagine that enabling TeX to write in both directions, Right-to-Left (R2L) and Left-to-Right (L2R), with an Arabic font suffices to typeset Arabic with TeX.

Unfortunately, although such an extended TeX may perhaps be used to typeset an R2L language like Hebrew, this is far from sufficient for a complex script like Arabic, where the shapes of the glyphs depend on the context, and may take many forms (at least four forms for the majority of Arabic characters, even in the simplest cases²).

Many early attempts have been made; they all relied on a preprocessor that does the contextual analysis (also known as the shaping algorithm).

One attempt, not widely known, due to Terry Regier from the University of California, Berkeley, dating from December 1990, relied on the famous macros of D. Knuth and P. MacKay:

    %The lines below are from Knuth and MacKay
    % TUGboat vol.8, #1, page 14.
    \font\revrm=xbmc10 \hyphenchar\revrm=-1
    \catcode`\|=\active
    \def|#1|{{\revrm \reflect#1\empty\tcelfer}}
    \def\reflect#1#2\tcelfer{\ifx#1\empty\else%
        \reflect#2\tcelfer#1\fi}

to do the reflection, after a preprocessor has done a rough contextual analysis.

The pioneering work by Knuth and MacKay [11] made things much better! They implemented the TeX bidirectional algorithm (which is unrelated to the Unicode Bidirectional Algorithm; the latter implicitly chooses the directions of the text) and added to TeX the four primitives \beginL, \endL, \beginR and \endR.

Some early attempts were also carried out by Y. Haralambous, who used the new extended engine TeX--XeT. These include the non-free Al-Amal³ (1992, [6]), and the free ArabiTeX⁴ (April 1995).

The most widely used system at present is probably K. Lagally's ArabTeX [13]. It is a package for writing several languages that use the Arabic script. It consists of a TeX macro package and one Arabic Naskhi-like font. ArabTeX runs with Plain TeX and LaTeX, and works with any TeX engine, because it uses its own bidirectional algorithm. So, no preprocessor is needed! This makes it a little slow, but with today's computer power this is not really a problem.

Its real drawback lies in the fact that the macros apparently depend heavily on the glyphs of the font it uses, making it nearly impossible to use any other fonts that may be available to the user.

For courageous users, there also exist two more powerful systems:

• Ω by Y. Haralambous and J. Plaice, and
• XeTeX by J. Kew, if you have the right system and the right fonts.

² Through typographical simplifications. Some aspects of traditional Arabic typography are described in [5].
³ We did not review it, as it was not available to the public as far as we know.
⁴ The source and a DOS executable of the preprocessor were available through the French TUG.

3 Arabic script specifics

The Arabic script is one of the most widely used scripts on earth. It dominates in Arabic countries, of course, but has a special place for all Muslims because it is the script used to write the Koran, the holy book of Muslims.

The Arabic script, like other Semitic scripts such as Hebrew, is written from right to left.

Another important aspect of the Arabic script is that no hyphenation is needed, or allowed at all. So, no hyphenation patterns are needed for any language that uses the Arabic script. In very old Arabic documents, words could be split after a non-connecting character, while characters that connect were never split. In modern Arabic, hyphenation is forbidden completely. This makes it more difficult to get justification when long words occur at the end of a line, but Arabic is also cursive and has (in modern fonts mimicking the handwritten forms) a special character called kashida or tatweel (keshideh is a Farsi word that means stretch) that may be used between adjoining characters to make the word longer. An example is the word مال, which may be written to occupy longer (مـال), still longer (مــال) and much longer (مــــال) space.

3.1 The Arabic alphabet

The Arabic alphabet is caseless, but most letters have either two or four forms. The different forms are used according to the letter's position in the word (initial, medial, final and isolated). In its basic form, the alphabet consists of:

• 28 consonants (29 if we count the hamza). But these 28 characters can easily grow to more than 1000 glyphs per font if all ligatures are present!
• Seven diacritical marks specifying the vowels. They are not used in typical Arabic texts but appear in poetry, textbooks for people learning the Arabic language, and some religious texts. They can be typed and then, when the document is compiled, either included or omitted according to the author's wish! The three basic ones are called fatha, damma and kasra: ـَـ ـُـ ـِـ ; the sukun ـْـ is used for the absence of vowels; and there are three tanwin forms written by doubling the three basic ones: ـًـ ـٌـ ـٍـ .

    Isolated   Initial   Medial   Final
    ب          بـ        ـبـ      ـب
    ع          عـ        ـعـ      ـع
    ه          هـ        ـهـ      ـه

    Table 1: Some characters' contextual forms

The vowel marks are written somewhat like accents in the Latin script. In the samples above, the drawn line represents the baseline: the vowels that appear above the line are typeset above letters, while those below the line are typeset below letters.

3.2 Arabic typography

This aspect of Arabic merits much investigation and much can be said about it. In order not to be too lengthy, we will just cite three points.

In classical Arabic literature, there are no typographical styles like bold, italic, etc. Different classical typefaces are used instead (req'a, naskhi, thuluth, etc.) to distinguish between different logical parts of the text. Modern literature depends heavily on computers made by people who are either unaware of the rules of Arabic typography or do not have enough time or money to develop such possibilities, so we use more and more boldface and italics (slanted to the wrong side many times, unfortunately).

Concerning spacing and punctuation, there is a lot of change between books published early in this century by mechanical means and some more recent ones typeset using computer programs. It seems that different editors adopted different rules. Some use English or French rules, while others insert space before and after each sign, which was the rule in the older texts!

In general, enumerated lists in Arabic texts use the abjad system, which uses letters in a particular order instead of numbers, but numbered lists are used also.

4 The Arabi system

The two main problems faced when typesetting Arabic with TeX are managed by Arabi as follows.

1. The bidirectional capability supposes that the user has a TeX engine providing the four primitives \beginR, \endR, \beginL and \endL. This is the case with the TeX--XeT and ε-TeX engines.
2. The contextual analysis does not need or use any pre-processor; it is done completely in the fonts, using their (quite limited) ligature possibilities.

This second point is the whole secret of Arabi's compatibility with most available packages. We tried to keep the TeX code that deals with the specifics of the Arabic script as short as we could, to avoid possible conflicts and clashes with other code.

The system is also compatible with all other formats, such as plain or ConTeXt. This too is because the whole contextual analysis is done in the fonts!

4.1 Input and font encodings

Typesetting Arabic and Farsi texts with TeX implies the use of special input and output encodings, so we need to use the standard packages inputenc and fontenc.

We use two special font encodings. For Arabic, we use LAE, for Local Arabic Encoding, while for Farsi we use LFE, which stands for Local Farsi Encoding. These two encodings are not final. Some character positions may change, and some empty slots will be filled with new characters.

Concerning the input encoding, the user simply creates an ordinary LaTeX file, which can contain 8-bit Arabic characters typed visually on some system that supports the Arabic script.

For now, the Arabi system supports the following input code pages:

1. Arabic Windows CP-1256 for Arabic and Farsi.
2. ISO-8859-6 for Arabic; it is not suitable for Farsi because many Farsi characters are missing.
3. The multibyte Unicode UTF-8 (ISO-10646) for Arabic and Farsi.

4.2 What has been done so far?

Currently, with Arabi you can typeset the following correctly, according to the context, while mixing the Arabic and Latin scripts:

• Footnotes, appearing on the right side of the page.
• Lists, both itemized and enumerated. The standard enumerate environment uses the abjad system mentioned earlier.
• Floats, which are typeset with the right caption form; the appropriate entry is added to the table of contents.

Moreover, Arabi takes care of the bidirectional formatting of sectioning (chapters, (sub-)sections, etc.) according to the context. The tables (of contents, figures and tables) are all typeset according to one global text direction, which is the main text direction as specified by the user. This is meant to be the one that dominates your text, either an Arabic (script) document with small amounts of Latin text included, or a Latin one that contains Arabic.

Arabi also has a limited, but fairly good, capability of vocalizing. Some more work needs to be done in that direction. Things would certainly have been better if METAFONT had more powerful ligature possibilities! But if you use XeTeX and have the right fonts, then things are certainly better.

The package also comes with extensive and, we hope, clear documentation.

4.3 Current status

At the time of this writing, Arabi is at version 1.0, and is already included in some distributions like MiKTeX and BaKoMa TeX.

The latest version is always available from the CTAN archives. You should find it at

    CTAN:-archive/language/arabic/arabi

As mentioned earlier, the package is distributed under the LPPL, and has the status "author-maintained". It can be used freely (including commercially) to produce beautiful texts that mix Arabic with characters from other scripts.

Figure 1 shows a sample Arabi input document, and figure 2 the corresponding output.

    \documentclass{article}
    \usepackage[cp1256]{inputenc}
    \usepackage[arabic,english]{babel}
    \begin{document}
    \selectlanguage{arabic}
    % [Arabic text: opening line] \\
    % [Arabic text: chapter title]
    % [Arabic text: body paragraphs]
    [\textmash{
    % [Arabic text: quoted passage]
    }]
    % [Arabic text]
    \L{This is a simple example of Arabic text you may want to type}
    % [Arabic text: closing line]
    \end{document}

    Figure 1: Sample Arabi input

4.4 Babel compliance

Arabi is fully LaTeX 2ε and Babel compliant. It provides almost all the language-dependent strings for the Arabic and Farsi languages and can automatically generate the official Jalali calendar. The Farsi captions and the code for the Farsi date are from the FarsiTeX system.⁵ Moreover, all Babel language switching commands apply.

⁵ The FarsiTeX system unfortunately still seems not to be available with LaTeX 2ε. We hope that the Farsi support offered by Arabi and the Farsi fonts from the Farsi project that come with Arabi will be useful to all Farsi users.
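To make the remark about Babel switching commands concrete, here is a minimal sketch of the three standard Babel forms applied to the preamble of Figure 1. It is not taken from the Arabi manual: it assumes Arabi is installed and that the engine provides the \beginR/\endR primitives. The Arabic passages are left as placeholders, and how Arabi sets the text direction for each form has not been checked here.

```latex
% Minimal sketch, not taken from the Arabi manual. Assumes Arabi is
% installed and the engine provides \beginR/\endR (e.g. an e-TeX based latex).
\documentclass{article}
\usepackage[cp1256]{inputenc}       % input code page, as in Figure 1
\usepackage[arabic,english]{babel}  % the last option is the main language
\begin{document}

This paragraph is set in English, the main language.

% Declaration form: stays in force until the next switch.
\selectlanguage{arabic}
% ... Arabic text typed in CP-1256 ...
\selectlanguage{english}

% Environment form: the switch is local to the environment.
\begin{otherlanguage}{arabic}
% ... Arabic text typed in CP-1256 ...
\end{otherlanguage}

% Command form: a short Arabic phrase inside an English sentence.
% Replace the empty braces with the Arabic phrase.
An English sentence with an embedded phrase \foreignlanguage{arabic}{}.

\end{document}
```

The declaration form suits documents that are mostly Arabic, as in Figure 1, while the command form keeps a short insertion from changing the language of the surrounding paragraph.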

[Typeset output of the input in Figure 1: right-to-left Arabic text, with the embedded English sentence "This is a simple example of Arabic text you may want to type".]

Figure 2: Sample Arabi output

4.5 Compatibility

The Arabi package has been tested successfully with packages such as parshape, poster, (and many of its derivatives), and, to our great surprise and pleasure, ArabTeX. It has also been tested on a Mac OS X system with the teTeX distribution and TeXShop (see figure 3).

4.6 Arabic fonts

One of the good features of Arabi is its ability to use any existing font that the underlying TeX engine can access. Arabi comes with a collection of Arabic and Farsi GNU fonts from, respectively, the Arabeyes and FarsiWeb projects. The TFM files of some widely available commercial fonts are also included in the distribution, but users still have to tell their engine where to find the corresponding fonts.

One remark to make here is that when preparing the encoding vector files for the different fonts, we learned that there is no standard. Even some corporations that have produced and distributed applications and fonts supporting the Arabic script for many years use so many names for the same glyphs that we arrived at the conclusion that one can never know what will be found when the font is opened!

5 Some bells and whistles

Arabi also comes with an experimental module that produces a transliteration of Arabic texts. No counterpart has been done for Farsi yet.

Since texts are in general not fully vowelized, the transliteration cannot be expected to be correct. Moreover, when writing in an 8-bit input encoding there is absolutely no way to distinguish between long vowels and the letters alif (ا), yaa (ي) and waw (و). Nor is it possible to write the hamza correctly when it sits on alif, waw or ya.

To use it, just load the package translit as with any other package, and type Arabic text in 8 bits in a Latin context, that is, without issuing a command that switches to the Arabic language.

    No.   Transliteration          Arabic
    1     ʾabw al-ʿlāʾ al-mʿry     أبو العلاء المعري
    2     matnuN mubārakuN         مَتنٌ مُبارَكٌ
    3     ḥǧ mbrwr                 حج مبرور

    Table 2: A little example of transliteration

Classical poetry, in both Arabic and Farsi, is formatted in two "parallel" verses that begin and end at the same positions. When verses are too short, they are written closer to the (vertical) center of the page, as in the example of figure 4. Arabi relies on the same idea of spreading the keshida glyph used by ArabTeX.⁶

⁶ A contribution by the author to ArabTeX a long time ago. ArabTeX uses a variable-width horizontal "line" while we stack the kashida glyph the necessary number of times to get the right width!
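The difference between a variable-width line and a stacked glyph can be illustrated with TeX's leaders. The sketch below is not the code used by Arabi or ArabTeX: \kashidaglyph is a hypothetical stand-in (a short rule, so that the example compiles without any Arabic font), and \linefill and \glyphfill are names invented for this illustration.

```latex
% Illustrative sketch only: not the code used by Arabi or ArabTeX.
\documentclass{article}
% Hypothetical stand-in for the font's kashida glyph: a short rule,
% so the example compiles without any Arabic font installed.
\newcommand*{\kashidaglyph}{\rule{2pt}{0.4pt}}
% Variable-width line: a single rule stretched to the requested width.
\newcommand*{\linefill}[1]{%
  \leavevmode\leaders\hrule height 0.4pt\hskip #1\relax}
% Stacked glyph: whole copies of the glyph box are repeated to fill the
% requested width; \cleaders packs them tightly and centers the group.
\newcommand*{\glyphfill}[1]{%
  \leavevmode\cleaders\hbox{\kashidaglyph}\hskip #1\relax}
\begin{document}
A\linefill{6pt}B \quad A\linefill{12pt}B \quad A\linefill{24pt}B

A\glyphfill{6pt}B \quad A\glyphfill{12pt}B \quad A\glyphfill{24pt}B
\end{document}
```

With a real kashida glyph in place of the rule, the stacked version can only fill widths in whole multiples of the glyph width, which is the trade-off against the continuously stretchable line.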

Figure 3: Arabi running on Mac OS

[Typeset example: verses introduced by the line قال الطغرائي في لامية العجم (al-Tughra'i, Lamiyyat al-'Ajam), set as pairs of parallel half-lines stretched with the kashida.]

Figure 4: Some Arabic poetry

6 The limits and the problems

The main limit seems to be the capacity of the TFM format:

• First, in its 256-glyph limit, which is certainly not a sufficient number for a modern font, let alone an Arabic one!
• And second, in the very limiting way it handles ligatures. In a script like Arabic, three-character ligatures are the rule, and there are even four-letter ligatures, e.g., محمد. If we also want to manage diacritics, which, we recall, play in Arabic the role of vowels in Latin languages, things become even worse.

There is also an important ε-TeX issue: the R2L direction is not supported in mathematics. So we have to rely on some script à la Knuth and MacKay to reverse the characters and the words.

7 The future for Arabi

Concerning the future development of Arabi: in these early times, we focus on keeping it alive and bringing it to maturity by correcting any bug that appears and completing the already existing functions, as no one is perfect. Let us cite the poet al-Mutanabbi:

    ولم أرَ في عيوب الناس عيبًا
    كنقص القادرين على التمام

("I have seen no fault among people's faults like the shortfall of those who are able to reach perfection.")

Please do not hesitate to forward suggestions, questions, or comments on Arabi. Thanks for your interest.

References

[1] B. Esfahbod and R. Pournader. FarsiTeX and the Iranian TeX community. TUGboat 23(1), 41–45, 2002.
[2] J. Braams. Babel, a multilingual style-option system for use with LaTeX's standard document styles. TUGboat 12(2), 291–301, 1992.
[3] J. Braams. An update on the Babel system. TUGboat 14(1), 60–61, 1993.
[4] Michel Goossens and Frank Mittelbach, with Johannes Braams, David Carlisle, and Chris Rowley. The LaTeX Companion. Addison-Wesley, 2nd edition, 2004.
[5] Y. Haralambous. Towards the revival of traditional Arabic typography . . . through TeX. Proceedings of the EuroTeX'92 conference, Prague, 1992.
[6] Y. Haralambous. Typesetting the Holy Qur'ān with TeX. Proceedings of the 2nd International Conference on Multilingual Computing (Latin and Arabic script), Durham, 1992.
[7] A. Hoenig. TeX Unbound: Strategies for Fonts, Graphics, and More. Oxford University Press, 1998.
[8] D.E. Knuth. The TeXbook. Addison-Wesley, Reading, MA, USA, 1986.
[9] D.E. Knuth. The METAFONTbook. Addison-Wesley, Reading, MA, USA, 1986.
[10] D.E. Knuth. Virtual Fonts: More fun for grand wizards. TUGboat 11(1), 13–23, April 1990.
[11] D.E. Knuth and P. MacKay. Mixing right-to-left texts with left-to-right texts. TUGboat 8(1), 14–25, April 1987.
[12] A. Lakhdar-Ghazal. Caractères arabes diacritiques selon l'ASV-CODAR (pour imprimer les langues arabes). Institut d'Études et de Recherches pour l'Arabisation, Rabat, 1993.
[13] K. Lagally. ArabTeX: Typesetting Arabic with vowels and ligatures. Proceedings of the EuroTeX92 conference, Prague, 1992.
[14] K. Lagally. ArabTeX Arabic and Hebrew, (Draft) User Manual Version 4.00. March 11, 2004.
[15] L. Lamport. LaTeX: A Document Preparation System: User's Guide and Reference Manual. Addison-Wesley, Reading, MA, USA, second edition, 1994.
[16] P. MacKay. Typesetting problem scripts. BYTE 11(2), 201–218, 1986.
[17] The FarsiTeX Project. http://www.farsitex.org/
[18] The FarsiWeb Project. http://www.farsiweb.info/
[19] Institute of Standards and Industrial Research of Iran. http://www.isiri.com
[20] Microsoft. Free download of the Arabic font pack, arafonts.exe. http://office.microsoft.com/arabicregion/Downloads/2000/arafonts.aspx
[21] XeTeX web site and mailing list. http://scripts.sil.org/xetex
[22] The Unicode standard. http://www.unicode.org/
