(9999), No. 0 Typesetting Bangla in Unicode-Aware TEX Engines
Total Page:16
File Type:pdf, Size:1020Kb
TUGboat, Volume 0 (9999), No. 0 draft: July 20, 2019 1:30 ? 1 Typesetting Bangla in Unicode-aware the present-day situation of different attempts to TEX engines | from user experiences to typeset Bangla in TEX, this article tries to cover a development insights few of them and provides some insights for future development. Qutub Sajib 2 Scope of this article Abstract Before the Unicode standard was born for the pur- Typesetting Bangla (also known as Bengali) script in pose of writing most scripts of the world in computer, T X was first introduced more than 15 years ago from E several attempts were taken to typeset Bangla in now through a transliteration system. The system T X using mostly ASCII based transliteration sys- would require the user to input Bangla texts with E tems. Brief discussion of those systems are presented fonts from Roman script in a specific ASCII translit- in Section 3. Following this, typesetting Bangla eration scheme and then process the input to display script in Unicode-aware T X engines | X T X and the texts using its own METAFONT-generated Bangla E E E LuaT X | are discussed with examples in subsequent fonts. The transliteration-based system falls short in E sections. Some development ideas for this particular terms of, among others, its harder-to-read source file script are discussed in Section 11. and its requirement of one particular Bangla typeface In this paper, when we say using X T X we family. With the introduction of Unicode-aware T X E E E mean compiling the .tex file with the xelatex pro- engines, X T X for example, and the emergence of E E gram; similarly when using LuaT X we mean com- Unicode-compliant free Bangla fonts, new possibili- E piling the same file with lualatex. The T X-specific ties have evolved. Today both X T X and LuaT X E E E E examples presented in this paper were produced us- support Bangla typesetting allowing the user to in- ing the T X Live 2019 distribution on a computer put texts directly with Unicode Bangla fonts in the E running GNU/Linux operating system, the Slack- editor; thus making the source file easy-to-read and ware 14.2 to be specific. eliminating the need for transliteration schemes. Al- It is predictable that most Bangla documents though several years have passed since the X T X E E would contain at least English, math and other system was first seen to support Bangla typesetting, scripts; but in this article we will consider type- it is still in a state where the finest typographic qual- setting only Bangla using two engines in T X that ity is nearly unachievable for this particular script. E supports the Unicode standard. This article does not While working with different Unicode Bangla fonts cover the discussion about font selection techniques with both of these engines, several rendering issues for different scripts except for Bangla. In order to were observed. In order to ensure fine typographic learn selecting specific fonts for Roman (English) and quality in Bangla script, these issues, among others, maths along with Bangla, one may like to consult need attention from users and developers. the fontspec package documentation. 1 Introduction 3 An ASCII based transliteration system The language Bangla, also known as Bengali, is the made the first attempt sixth most-spoken language in the world as reported As Anshuman Pandey mentioned [2], it was Avinash by Ethnologue it its latest edition. Native speakers Chopde who first introduced Bengali (Bangla) in of this language are primarily from Bangladesh, a TEX in his itrans package. The subsequent devel- rather small country in south Asia. Another good opments in Bangla typesetting in TEX include the number of native speakers are from West Bengal, a arosgn by Muhammad Ali Masroor (later dropped), province of India. The Bangla script is one of the \ItxBengali" by Shrikrishna Patil, \Bengali Writer" thirteen major Indic scripts and has made its way in and bwti by Abhijit Das, and the bengali along with a the Unicode standard. Publishing in this language preprocessor beng by Pandey. Although the translit- has a good history of many centuries. Like other eration based system works, it has the limitation of Indic scripts, typesetting Bangla in TEX has seen its harder-to-read source file (Figure 1). Also, the several attempts in last few years but the typographic font used to typeset the Bangla script is limited to quality is still to find its prime. one particular font family. Apart from the beautiful rendering of mathemat- The bengali-omega package by Lakshmi K. Raut ical contents in TEX, another goal of this typesetting was probably the first to introduce typesetting Bangla system was the finest typographic quality [1]. The using either the transliteration system or Unicode same philosophy can be expected while typesetting based direct input [4]. Although this package has other scripts including Bangla in TEX. Considering the capability of taking Unicode Bangla as input, ? 2 draft: July 20, 2019 1:30 TUGboat, Volume 0 (9999), No. 0 ¿ ¿ i Т ÑÖ Æ × Ö {\bn ke la{}ibe mor kaarya, kahe sandhyaa rabi ক লইেব মার কায, কেহ সারিব। 1 L'ZG ,6. 1SJûG 1ZP 2 Ç6.7*s\\ ÅTDcS 83_ .=3 TDÃP. 7TGs\\ Ö ÒÖ º ÒÝ "suni.yaa jagaT rahe niruttar chabi | িনয়া জগৎ রেহ িনর ছিব। ISú. Ý%8E T7LG ¿2 1TPLG 6,8G\\ Ô¯ÁÔ Ð¸ × Ð¸ Ñ ÑÌÖ maa.tir pradiip chila, se kahila, sbaami মার দীপ িছল, স কিহল, ামী, ,6. ¿J;<19 OSCÇ 1TKG @S 7,s ÌÙÙ ×Ü Ö Ø Ñ º ÑÖ aamaar ye.tuku saadhya kariba taa aami | আমার যটুকু সাধ কিরব তা আিম। Ö¯ÒÞ Ù Ö ß -- rabindranaath .thaakur} \Q\ – রবীনাথ ঠাকু র HH .*8qDSA <S19. Figure 1: Typesetting Bangla in TEX with a transliteration based system as found in [2]. Figure 2: Typesetting Bangla in X TE EX with Unicode fonts. itÑ uses the fonts as used in the package bangtex by script, along with many other scripts using Unicode Anshuman Pandey [3] to typeset the document. fonts and Unicode-aware TEX engines. Although the package comes with language definitions files for 4 Then Unicode-aware T X engines came E different scripts including Bangla, it has many limi- into play tations. Even worse, the language definition file for Т Ð With the introduction of X TE EX and LuaTEX engines, Bangla script (gloss-bengali.ldf) that comes with and the fontspec package for selecting TrueType and this package has several words incorrectly spelled. It OpenType fonts, typesetting Bangla using Unicode is well-known to the speakers and writers of Bangla fonts came into reality. Today, a good number of language that the spelling of a few words has un- Unicode compliant Bangla fonts are freely available dergone changes and few are still in debate to agree that seem to work with these engines. on the \correct" spelling. But those misspelling To start, one needs a keyboard layout that would in the file gloss-bengali.ldf are beyond any de- allow to input Unicode Bangla characters in the edi- bate, as found in most dictionaries. For example, as tor. In most Linux systems, a keyboard layout called of TEX Live 2019, the gloss-bengali.ldf file con- Probhat is available to input Unicode Bangla charac- tains \সারিণ", \খ", and few other words incorrectly ters. Another popular alternative is the Avro Key- spelled. (The author of this paper is not convinced board (https://www.omicronlab.com/index.html), to write incorrect spelling; so in order to see the available for installation in Mac OS X, Linux, and incorrectly spelled words, one has to look into the Windows operating systems. In emacs, three layouts above-mentioned file for the corresponding words). are available, namely bengali-inscript, bengali-itrans, The polyglossia package does come with few good and bengali-probhat. None of the Unicode Bangla features. It has the option to change the numbering keyboard layouts available today were designed with style of LATEX counters; i.e., one can typeset LATEX TEX users in mind; hence one may need to switch the counters with Bangla numerals. As the author of layout frequently in order to type TEX-special char- latexbangla stated, the polyglossia package's current acters (e.g., &, % etc.). Using any of these keyboard support does not offer any easy way to select fonts layouts, one can input Unicode compliant Bangla with Bangla script. This package tries to solve the characters directly in the editor. An appropriate font problem by introducing control sequences to select containing the Bangla script has to be set via the several fonts for Bangla script but it has its shortcom- fontspec package (details in Section 6). Then, upon ings, too. As of our knowledge, there are no Unicode processing the .tex file either with xelatex or lualatex Bangla fonts designed to be used especially with TEX, one gets the typeset document. the package sticks with the limited fonts available Typesetting with the Unicode based system in freely and tries to make bolded and monospaced texts TEX has the advantage over the older systems in using different fonts from different designers. This terms of its easy-to-read source file. As shown in results into using the \AutoFakeBold" and \Auto- Figure 2, the source file can be read easily com- FakeSlant" features of existing \Regular" style fonts. pared to the source file in Figure 1 for the same As a result, the typeset document becomes some- text.