∗ Typesetting the Qur’an and its specific challenges to the TEX family

Hossam A. H. Fahmy Electronics and Communications Department, Faculty of Engineering, Cairo University, Egypt [email protected]

1 Peculiarity of typography isolated. The simplest example of this rule is the

¡ ¥ ¤ ¥

¥ ¥ ¨ equivalent of ‘b’ in Arabic: § . A reader with The has been adopted for use by a sensitive eye would notice that the four shapes of many languages in Africa and Asia including: Ara- the same letter differ in their width, height above bic, Dari, Farsi, Jawi, Kashmiri, Pashto, Punjabi,

the line, and depth below it. The same structural

Sindhi, Urdu, and Uyghur. The is

¡ ¤ ¨ shape is used for the equivalent of ‘t’ ( § ),

also used for a number of other languages either to

¡ ¤

¨

© ©

§ © and ‘th’ ( © ). The equivalent of ‘n’ and present how the language used to be written histor- ‘y’ share the same shapes as ‘b’ in the initial and

ically (as for Turkish) or how some write it in an

¤ £ ¤ £

¨ ¨ ¦ medial forms ( ¦ and ) but not in the final or

unofficial manner (as for Hausa in western Africa).

¦

the isolated forms ( ¦ and ). In tradi- Similar to the latin alphabet, with its adoption tional Arabic writing styles (but with the exception to several languages, the arabic alphabet acquired of , riq¯a‘, and tawq¯ı‘ styles [13]), the letters new symbols to represent the sounds that do not  and their siblings with dots or marks do exist in Arabic. In contrast to the latin alphabet not connect to the following letter but only to the that has dots only on the ‘i’ and ‘j’, the arabic al- preceding one. phabet uses dots extensively both above and below The cursive nature of arabic script adds another the letter shapes to distinguish the different charac- characteristic; many letters combine together to pro-

ters. This explicit distinction between the different

 

¥ ¥   duce new shapes as in  becoming . In the letters in written text using dots was in itself an ad- latin script this phenomena occurs infrequently and dition to the original script which had no dots. In when it happens, a is used to improve the

the original arabic script, the distinction between

appearance of the problematic letter combinations

¢ £ ¤ ¥ ¡ ¢ ¦ ¤ ¥ ¡ (house) and (girl) when represented as

such as ‘ff’ and ‘ffl’. In arabic typography, on the

¢ ¤ ¡ was understood from the context. In general, other hand, the presence of combined letters is abun- the developed symbols for the other languages follow dant but optional in many cases. Haralambous [2] the same idea as Arabic and use more dots (up to gives a long list (yet not exhaustive) of possible ‘lig- four) and special marks on the original shapes of the atures’ in arabic. While speaking about the history arabic letters. of arabic typography, Milo [9] explains that “each Arabic being a semitic language, only the con- letter can have a different appearance in any com- sonants are usually written in a word. The equiva- bination, something that can only be crudely imi- lent of short vowel sounds are written as additional tated with ligatures”. According to Milo [9], most marks on top of the letters. Obviously, the differ- modern books present the connected letter groups ent languages have different vowels and need dif- “as ‘ligatures’ and ‘artistic expressions’ without so ferent symbols to mark them. In addition to that, much as a hint at traditional morphographic rules”. since the geographical area covered by the arabic Mackay [8] discusses the range of context evaluation script historically is quite vast, different regions of in Arabic and concludes that clusters of four, five, the world developed different symbols. The result six, and sometimes more letters may combine into that we have today is a plethora of additional marks a unique shape. Mackay then proposes the use of developed historically. virtual fonts as an adequate solution. A beginner learns that Arabic is written from Another feature in the Arabic script is its re- right to left and must practice writing each letter liance on subtle changes to the letter shape to aid and its connection rules to other letters. Because the reader in identifying the beginning and end of of the cursive nature, any letter may connect to the

each letter within a combination. The letter  has previous and following letters. Hence, a beginner three “teeth” (vertical pen strokes) similar to the learns the general four basic forms of a letter: at the start of a word, at the middle, at the end, and ∗ A graduation project under the author’s supervision by: Ibrahim R. Mohamed, Ahmed Z. Ameen, Ahmed M. Amin, Akram A. Mojahed, Kareem O. Sharawi, and Hisham Shihab TUGboat, Volume 0 (2060), No. 0 — Proceedings of the 2060 Annual Meeting 1001

Hossam A. H. Fahmy

¤ ¥ ¤ ¤ ¥

teeth in and . When  is connected to , the tail 5. There are additional marks that are put on top

of the  may be elongated to alert the reader to the or below the character.

correct grouping of the teeth. Furthermore, the two 6. The horizontal and vertical location of the dots

¢

¥

¨

 ¡

words and present different heights for the and marks on the characters is not at the same

¥ ¤ teeth of the ¤ and to help the reader as well. This position always but depends on the character difference in width and height is a type of encoding and its context. to prevent a misreading of the word and to aid the 7. There are too many letters that combine (“lig- trained eye in quickly catching the letter combina- atures”). tion. That encoding helps in other cases as well. If 8. Ligatures and variable width forms of the let-

due to any reason the dots fade away, a reader faced ters are used to justify the lines.

¢ ¡ with can guess its correct origin. This encoding 9. Several typefaces are needed for special materi-

to emphasize the different letters by raising some als in a work of good quality.

¢

¢ £ ¥ ¡ ¢ ¥ ¤ ¡ ¢ ¥

¦

¨ ¨ teeth is essential in words such as and ¡ . With all of these issues, to find a suitable posi- However, the raising of the teeth is only possible by tion for the dots and marks on the letter combina-

investigating the group of letters. So the height of tions is sometimes a real challenge even for a human

¡ ¢ £ ¢ ¥ ¤ ¡ ¢ ¥ ¤

¦

¨ ¨ ¨ ¦ the tooth for ¦ in and is not a feature of let alone a machine. the individual letter but of the whole combination.

The same goes for the height of the dot on top of 2 Automated typesetting of Arabic

£

¥ ¢

¦ ¦

 ¨

¦ ¦

  that same letter as in  or . Arabic script does not enjoy the same luxury that In traditional (manual) writing, the “skeleton” latin script has when it comes to automated type- of the letter combination is written first then the setting on computers. Milo correctly asserts [9] that dots and the other marks are provided. So, a writer the use of individual letters as the building block

probably progresses from the skeleton to the dotted is not suitable for Arabic. Both Mackay [8] and

£ § £ £

¦¥

¦ ¦

¦ ¦

 ¤  to the vocalized form as  → → . Milo [10] argue that a layered approach is a bet- In the arabic script, a good calligrapher justifies ter solution. In such a layered approach, some basic the lines not just by stretching the spaces between elements are provided in the font to represent the

the words but mainly by using optional ligatures or skeleton of some letter combinations, an individual ©

wider forms of some letters as in changing ¨ to letter’s skeleton, or even a part of a letter. These

¡ ¡

¥ ¨ ¥ ¨ ¨ and writing © instead of . The use of elements are combined first to give the correct skele- optional ligatures and wider forms is the preferred ton of the word with all the needed shaping for the method in high quality works. Another method is to teeth or other style requirements. On top of that

add an elongation to the tail of some letters by us- skeleton, a second layer for the dots is added. Then,

¡ ¥

¥ ¨ 

ing the tat.w¯ıl or kash¯ıdah symbol ‘ ’ such as the vowel marks and any other marks come in sub-

¡ ¢

¥ ¥ instead of  . This second method has been sequent layers. Milo [11] developed a system with a widely abused in newspapers and low quality mate- layered approach for his company, DecoType. It is a rials using mechanical typewriters. proprietary system used by a number of commercial The arabic script has a large number of writing software tools. Due to its proprietary nature, the styles that were developed traditionally to accom- full details and the extent of the capabilities of this modate the different languages and different pur- system are not widely known. poses. Latin scripts use bold, italic, or larger fonts Our goal is to provide a freely available system for section headings and for emphasis. Traditional capable of typesetting the Qur’an, other traditional arabic writings vary the typeface instead. The print- texts, and any publications in the languages using ings of the Qur’an as well as of most books almost the arabic script. The Qur’an is one of the most always use the typeface for the main body. demanding arabic texts from a typographical point The headings, the introductory materials, and the of view. However, there is a long historical record of back materials frequently use other typefaces such excellent quality materials (manuscripts and recent as the thuluth, ta‘l¯ıq, and ruq‘ah. printings) to guide the work on a system to typeset In summary, the points just mentioned are: it. Such a system, once complete, can easily typeset 1. Arabic is written from right to left. any work using the arabic script including those with 2. Characters in general have four different forms. mixed languages. 3. These forms are of different width, height, and Knuth and MacKay [5] were the first to present depth. a working solution for including right to left text (for 4. The shape (the height of the teeth for example) Arabic and Hebrew) in the TEX family. Their pro- of a specific form depends on its context. posed TEX-XET system is an extension of TEX that

1002 TUGboat, Volume 0 (2060), No. 0 — Proceedings of the 2060 Annual Meeting Typesetting the Qur’an and its specific challenges to the TEX family

produces a different DVI file. The enhanced mode of 3 Implementation ε-T X allows bidirectional text processing and pro- E Our goal of typesetting the Qur’an and traditional duces regular DVI files but ε-T X does not provide E texts means a few more challenging requirements in any arabic fonts or any specific functionalities that addition to those of the general arabic typesetting. ease the typesetting of arabic books. Within the To assist the reader in recitation, several indicators T X extensions, both Ω [4] and ArabT X [6, 7] have E E for vowels, joints, text structure, and pausing loca- been used for Arabic and have met some of the basic tions have been added historically to the text of the requirements to varying degrees. Qur’an. We present here a few symbols. With the historical trend to extend T X, Ω

E

¢¤£

¡



¦



¦¥ §©¨ evolved as an implementation allowing multilingual Signs of pause § : text processing. As an offshoot of the work on Ω, Al- • The  sign : the reader may continue but Amal system [2] is an early attempt to typeset the it is better to pause.

Qur’an specifically. Unfortunately, it is not freely •  available and its output (as shown in the exam- The sign : possible to pause but it is ple published in the paper describing it) falls really better to continue.

short of the desires of a native reader. • The waqf j¯a’iz sign  : equal possibility to Due to its various attractive features, Ω was the pause or continue. first choice to achieve our goal. However, the result Additional diacritics of our early experiments with the available arabic :

•     font provided with Ω were not satisfactory. None Ra’s kh¯a’  corresponds to sukun.¯

of the people whom we asked for their opinion liked 

   • The madda  appears in the Qur’an

the font. That font is suitable for a simple publica-







 £

¨

 ¨

tion but definitely not for a high quality work using on many letters such as in  .

¥%

§

¥ ¢

'&

¤ ¥ £

¦

"!¤#

•   ( © $

the arabic script. Ω in its current state does not The small ‘ ’ in ¥ and ‘ ’

,

¥

§ §

§ §

-

¥

+

¥

*

)

  &

¥ ¨ /. ¦

easily lend itself to the layered approach described

¡ (   ¥

in ¦ . # earlier. The modification of Ω is not an easy task ¥ since it is a very large system and such a modifi- Furthermore, there are different “narrations” of the cation means the creation of a new system that is Qur’an that differ in the pronunciation in some lo- not compatible with the existing base of TEX. The cations and hence lead to a plethora of additional newer developments to Ω [3] —once they are sta- marks needed. The vast majority of the printed ble, widely available, and documented— may help copies of the Qur’an are in the narration known as in implementing the layered approach necessary for Hafs. Only three other narrations (with their special typesetting high quality texts in arabic. marks for the special pronunciations) are printed in Lagally in ArabTEX [6] preferred to stay within the whole muslim world. The remaining narrations the stable TEX standard and perform all the nec- (sixteen remaining for a total of twenty) are still in essary processing with TEX macros. That decision manuscript form. allowed ArabTEX to be portable to any TEX imple- Fig. 1 shows an example of the four narrations mentation. However, ArabTEX had to compromise that exist in print. To make the comparison easier, on the issue of line breaking. For right to left text, we present the same two lines from the four narra- ArabTEX is forced to handle the line breaking by it- tions written by the same calligrapher and printed self in a slow and complicated algorithm bypassing by the same press: King Fahd’s complex for print- one of the best parts of TEX available for the latin ing the Qur’an in Madinah. A simple look at the script. first word (top right in each example) reveals some Although not a simple program, ArabTEX is of the different symbols needed. Those additional

confined to a number of style files each perform- symbols fit well in a layered approach but would be ¥ ing a specific task. ArabTEX implements a layered quite difficult to accommodate otherwise. The 0 approach where each character is represented by a symbol appearing on the first word of the top most skeleton and additional modifiers (dots and vowels). narration belongs to the pausing signs. In AlQalam, According to the collected opinions, the quality of we introduced a layer for the pausing signs beyond its font is much better than that of Ω. The font the layer of the dots and the layer of short vowels. still needed improvements but it was an acceptable We use the existing symbols in the original font start. ArabTEX uses the LATEX license and hence we of ArabTEX where they are sufficient. We have im-

changed the name of our work to AlQalam (the pen proved the shape of the madda ‘ ’ and dagger alif ‘1 ’.  in Arabic). The symbols that we have added so far are:

TUGboat, Volume 0 (2060), No. 0 — Proceedings of the 2060 Annual Meeting 1003

Hossam A. H. Fahmy

0



-

0

 

     









# $ % % %10 $ % $ % % %



  .   



& &  &

7

   

 

 



3



2

! " ' ( ' ' 6 '

  54 4

9

" " (

+ , + ,

)

   * 

/ /

  

-

        8

"







 





 -









 G





D



  



K

% %  %CB % I %

 $ $ $ $



   







-

 

A

 3



J

?

2

E

/

' ' ;

(

< < <

: 9 >

= "

+ + +

, ) ,1F

" *

/

 





    H



@

"









%



ROS

'



)

,



 )

L MON

¦

P P5Q Hafs H

Hafs @



0



- -





     









% % % % % %

#VU 0 X



   .   



& &  &

7

   

 

U

 

 3

 2

! ' ' ' 6 '

" (

 T W4 4

 9

( " "

+ +

)

, ,

   * 

/ / Y

  

-

  

    

"















 -







G

   





D

   



K

% %CB % %

 X  I

U U U



   







U

  A

3





J

?

2

E

/

' ; '

(

< < <

9 > :

=

"

+ + +

) F

, ,

*

/

 





H

   



@

T

"







G





%



I

 

¡

?

' '

 

>

) , )

©

 P

Warsh Warsh H

@



0





- -





      









% % % % % %

#VU X



    .  



& & & 

7

 

 

U  

 



3



2

' ' ! ' ' 6

" (

T

  Z4 4

9

( " "

+ +

) , ,

   * 

/ /

  

-

  

    

"













T





-











   

G





D

  

 

 K

 % %CB % I %

 U U X U

    





-

U

 

A



3



J

?

2

E

/

' ( ; '

< < <

: 9 >

"

=

+ , + , +

) F

* "

/

 



    

H





@

T

"









 

%

£ ¢







¦

? ?

E

' '

< 

>

¥

) ,

Q¯alun¯ 

P H

Q¯alun¯ H @

0



-

0





     









% % %10 % % % %

# $ $ $



  .   



& & & 

7

 

 

 

 



3



2

! ' ' ' 6 '

" (

  [4 4

9

( " "

+ +

) , ,

   * 

Y

/ /

  

-

  

    

"















-









 G





D



  



K

% %  %CB % I %

 $ $ $ $



   







-

 

A

 3



J

?

2

Y

E

/

' ; '

(

< < <

9 > :

= "

+ + +

, ) ,\F

* "

/

 





    H



@

"







+

-

  



% % %





3 2

Al-dur¯ ¯ı '

  

) )

,



/^]

P`_Oa (

Al-dur¯ ¯ı P @ Figure 1: The first two lines of surat al-ra‘d. An Figure 2: Start of surat al-ra‘d by AlQalam.

example from the four printed narrations.

&

¢ £ ¤ ¥ ¦

 similar color encoding schemes to stress new read-

#

§ ¨ ©

ing concepts and to train their little eyes in picking

¤ up the distinctive features of the script. A complete

      ( system for dealing with the arabic text should be The use of transliteration for a purely arabic able to color a piece of a letter combination or some document that is a few hundreds of pages long is ob- specific marks. viously not practical nor desired. Hence, the default To color the text, we use our modified version assumption for AlQalam is an input file with the of the acolor package which Karol Mokry had orig- characters coded in Unicode-UTF8. Bi-directional inally written for ArabTEX. With a few modifi- editors such as emacs and gedit (we used both) are cations to the aboxes and awrite components of good options. Editors have their own limitations ArabTEX, AlQalam allows the user to color the dia- though. If a symbol has a unicode point associated critics and the pausing signs of the Qur’an through with it but there is no key combination mapped to commands such as \coldia{blue}. that symbol or no glyph in the editor’s font to rep- resent it, another facility must be used. In a case 4 Results and future work

such as  the user might supply the unicode as ^^db^^96. AlQalam interprets this code as belong- Fig. 2 presents the current results of AlQalam for

ing to the “signs of pause” category then raises it the same narrations shown earlier in Fig. 1.

¤ §

, The initial phase of our work produced a usable

 #



  . to its correct position such as in ¥ . In addition system with known problems in the case of some

to that, for some frequently occurring symbols we symbols which we are currently solving. In addition

  provide a macro such as \ for . . The improvement of to the four narrations of Fig. 2, we completed the the input method is one of the major steps in future work for another ten narrations and wrote a few lines developments. as an example for each one. These results are the Another feature of typesetting the Qur’an is first step in a long study of the typesetting require- the use of colors. In some printings, certain letters, ments for the ten main “readings” of the Qur’an and marks, or sometimes complete words take a different their expansion into the twenty narrations. color usually to remind the reader of a pronunciation Some of the differences between the narrations rule. Educational texts for young children often use constitute simple rules that may be programmed.

1004 TUGboat, Volume 0 (2060), No. 0 — Proceedings of the 2060 Annual Meeting Typesetting the Qur’an and its specific challenges to the TEX family

As an example, every plural pronoun ending with POSTSCRIPT fonts. Software—Practice and

§



. ( # and not followed by becomes in the reading Experience, 29(15):1417–1457, 1999. of ’ab¯ı ja‘far. In the future, we hope to write macros [2] Yannis Haralambous. Typesetting the holy for all the programmable rules. Qur’an with TEX. In Multi-lingual computing: Font development is a must. The current re- Arabic and Roman Script: 3rd International sults of Fig. 2 are still quite far from the level of conference — Durham, UK, December 1992. Fig. 1. With the vast repertoire of ligatures needed [3] Yannis Haralambous and G´abor Bella. Omega to represent the letter combinations in each typeface becomes a sign processor. In EuroTEX 2005: (naskh, thuluth, ta‘l¯iq, . . . ), this task is quite com- Proceedings of the 15th Annual Meeting of the plicated. The macros to detect the combinations European TEX Users, Pont-`a-Mousson, France, and produce the appropriate ligatures for the skele- pages 8–19, March 2005. tons (which might be changed for line justification) [4] Yannis Haralambous and John Plaice. Multi- followed by the accumulation of the additional lay- lingual typesetting with Ω, a case study: Ara- ers of dots and marks is a hard design problem as bic. In Proceedings of the International Sympo- well. sium on Multilingual Information Processing, For line justification, AlQalam inherits the abil- Tsukuba, pages 63–80, March 1997. ity to stretch letters using a kash¯ıdah from ArabTEX. [5] Donald E. Knuth and Pierre A. MacKay. Mix- However we do not have yet a facility to use the bet- ing right-to-left texts with left-to-right texts. ter method of alternative ligatures and wider forms TUGBoat, 8(1):14–25, 1987. for some letters. In his system, Milo [10] attempts to [6] Klaus Lagally. ArabTEX — typesetting Ara- provide for the use of wider forms. Berry [1] presents bic with vowels and ligatures. In Jiˇr´ı Zlatuˇska, another solution (but without high quality fonts and editor, EuroTEX 92: Proceedings of the 7th Eu- ligatures) to stretch the letters for Arabic, Hebrew, ropean TEX Conference, pages 153–172, Brno, and Persian using troff. Within the TEX family, Czechoslovakia, September 1992. Masarykova T´anh [12] presents what he calls “selective use of Universita. multiple glyph” for the latin script although wider [7] Klaus Lagally. ArabTEX: a system for typeset- forms are not necessarily needed there. Our initial ting arabic. In Multi-lingual computing: Ara- assessment indicates that line breaking for Arabic is bic and Roman Script: 3rd International con- an area that needs much more research. ference — Durham, UK, page 9.4.1, December Besides these research problems, AlQalam must 1992. extend its capabilities inherited from ArabTEX such [8] Pierre A. MacKay. The internationalization as footnotes, construction of a table of contents, of TEX with special reference to Arabic. Pro- marginal notes, more LATEX commands and envi- ceedings of the IEEE International Conference ronments,... on Systems, Man and Cybernetics, pages 481– In conclusion, we provided a summary of the 484, November 1990. IEEE catalog number arabic typesetting requirements as well as the first 90CH2930-6. steps of a system to fulfill them. This study is useful [9] Thomas Milo. Arabic script and typography: for those working on any of the languages written in A brief historical overview. In John D. Berry, the arabic script and specifically those supporting editor, Language Culture Type: International the inclusion of Qur’anic quotations. A layered ap- Type Design in the Age of Unicode, pages 112– proach is a must for high quality typography. The 127. Graphis, November 2002. first version of AlQalam is ready for use but with [10] Thomas Milo. Authentic arabic: A case study. very limited documentation. Improvements are in right-to-left font structure, font design, and ty- process for subsequent versions. AlQalam is free. pography. Manuscripta Orientalia, 8(1):49–61, We would like to share it with anyone interested March 2002. in testing and providing us with feedback. To the [11] Thomas Milo. Ali-baba and the 4.0 unicode best of our knowledge, there is no other software characters. TUGBoat, 24(3):502–511, 2003. system that serves the requirements of the different [12] H`an Thˆe Th´anh. Micro-typographic exten- Qur’anic narrations. sions to the TEX typesetting system. TUGBoat,

21(4):317–434, 2000.

%

¤

¦

¦¥ 

References 

 ¤

  

¦

¡ ¥ §

£¢ ¥ [13] Mohamed Zakariya. £ .

[1] Daniel Berry. Stretching letter and slanted- Al-Computer, Communications and Electronics

£ ¨ ¢ ¨

£

     

¤ ¤

£ £ ¥

¨  © © © ¨ ¨ 

¦

¦¥ § ¥ §

baseline formatting for Arabic, Hebrew, and Magazine § , % Persian with ditroff/ffortid and dynamic 22(8):48–53, October 2005.

TUGboat, Volume 0 (2060), No. 0 — Proceedings of the 2060 Annual Meeting 1005