Outlines Arabic writing Arabic mathematical expressions Arabic scientific document composition DadTEX motivations DadTEX system Open problems and prospects

DadTEX, a full Arabic interface

M. Eddahibi, A. Lazrek, K. Sami

Department of Computer Science Cadi Ayyad University

Marrakesh November 9 - 11, 2006

TUG2006

http://www.ucam.ac.ma/fssm/rydarab DadTEX, a full Arabic interface Outlines

1 Arabic writing

2 Arabic mathematical expressions

3 Arabic scientific document composition

4 DadTEX motivations

5 DadTEX system

6 Open problems and prospects Arabic Writing

Arabic letters represent only consonants or long vowels Optional diacritical marks for short vowels can be used to annotate text or spell it out in full when desired Arabic script is written from right to left Arabic script is cursive Arabic Writing

Arabic letters represent only consonants or long vowels Optional diacritical marks for short vowels can be used to annotate text or spell it out in full when desired Arabic script is written from right to left Arabic script is cursive Arabic Writing

Arabic letters represent only consonants or long vowels Optional diacritical marks for short vowels can be used to annotate text or spell it out in full when desired Arabic script is written from right to left

Arabic script is cursive Arabic Writing

Arabic letters represent only consonants or long vowels Optional diacritical marks for short vowels can be used to annotate text or spell it out in full when desired Arabic script is written from right to left Arabic script is cursive Arabic mathematical expressions

There are two mathematical notations according to the regions: ? Latin notation with Arabic text ? genuine Arabic notation

Arabic notation Spreads from right to left Uses Arabic alphabet based symbols Uses Arabic abbreviations for usual functions Uses some Latin mirrored symbols Uses curved kashida for variable sized symbols Arabic mathematical expressions

There are two mathematical notations according to the regions: ? Latin notation with Arabic text ? genuine Arabic notation

Arabic notation Spreads from right to left Uses Arabic alphabet based symbols

Uses Arabic abbreviations for usual functions Uses some Latin mirrored symbols Uses curved kashida for variable sized symbols Arabic mathematical expressions

There are two mathematical notations according to the regions: ? Latin notation with Arabic text ? genuine Arabic notation

Arabic notation Spreads from right to left Uses Arabic alphabet based symbols Uses Arabic abbreviations for usual functions

Uses some Latin mirrored symbols Uses curved kashida for variable sized symbols Arabic mathematical expressions

There are two mathematical notations according to the regions: ? Latin notation with Arabic text ? genuine Arabic notation

Arabic notation Spreads from right to left Uses Arabic alphabet based symbols Uses Arabic abbreviations for usual functions Uses some Latin mirrored symbols

Uses curved kashida for variable sized symbols Arabic mathematical expressions

There are two mathematical notations according to the regions: ? Latin notation with Arabic text ? genuine Arabic notation

Arabic notation Spreads from right to left Uses Arabic alphabet based symbols Uses Arabic abbreviations for usual functions Uses some Latin mirrored symbols Uses curved kashida for variable sized symbols Arabic mathematical expressions

Arabic alphabet symbols are dotted or dotless

Uses several styles to extend the amount of symbols Arabic scientific document composition

Image based method Pure painting Painting and equation editor Painting and text layout Handwritten equation digitalization

TEX based method RyDArab and CurExt Arabic scientific document composition

Image based method Pure painting Painting and equation editor Painting and text layout Handwritten equation digitalization

TEX based method RyDArab and CurExt Pure painting

Uses drawing tool to paint both text and non textual symbols The result is cut and pasted to document Painting and text layout

Use of letters for alphabet based symbols and drawings for other symbols Painting and equation editor

Uses an equation editor for Arabic-Latin common notation and drawing for mirrored symbols Handwritten equation digitalization

Use of scanner to transform both text and mathematical expressions to image Image based Methods

Advantages are WYSIWYG do not present any linguistic difficulty when used in Arabic versions Drawbacks Bad typographical quality Difficult (need some drawing skills and Image handling) Not uniform (expressions have not the same metrics) No semantic content RyDArab and CurExt

Advantages TEX based systems High quality technical documents Documents can be edited using simple text editor Several choices for notations and symbols styles Variable sized symbols dimensions are automatically and transparently set Expressions can be converted into several formats (image, MathML,...) Expression can be edited easily Drawbacks No WYSIWYG interface for TEX in Arabic till now Need some English and TEX learning DadTEX Motivations

Information may be reached, used and communicated in the languages of the transmitter and that of the receiver without considerations of the technical support Several trends help to go from monolingualism to multilingualism from ASCII to Unicode TEX Arabic support: Omega, ArabTEX, Aleph, Rydarab ... Multilingual fonts: STIX project XML I18n ... Several studies show that one learns better in his mother tongue DadTEX system

DadTEX goals Interface for composition of LATEX documents using only Arabic Build Arabic version of TEX commands lexicon Make it easy to learn and understand TEX vocabulary for Arabic users Avoid bidirectionality problems due to the mixture of English commands and Arabic text DadTEX System

Bidirectionality problems Text selection

Character position Semantic ambiguity Due to the graphical swapping DadTEX System

Bidirectionality problems Text selection Character position

Semantic ambiguity Due to the graphical swapping DadTEX System

Bidirectionality problems Text selection Character position Semantic ambiguity Due to the graphical swapping DadTEX System

A conceivable solution is to use an application to convert Arabic document into its transliterated equivalent. This mechanism is similar to the one used in FarsiTEX ( ftx2tex translates Persian text to transliterated text)

This solution have several deficiencies: it is not direct; it generates a supplementary file to be processed instead of the original source file the compilation time is increased additional memory and free space are required DadTEX System

DadTEX is an interface that allows the creation of LATEX documents in Arabic. The whole of the document can be composed using only Arabic text with some control characters like backslash, dollar, ... It is based on the primitive \def for the translation of every commands. In some cases commands translation is not sufficient : RydArab uses transliteration for individual letter-based alphabetical symbols it is necessary to redefine the correspondences using Arabic characters. DadTEX System

DadTEX is an interface that allows the creation of LATEX documents in Arabic. The whole of the document can be composed using only Arabic text with some control characters like backslash, dollar, ... It is based on the primitive \def for the translation of every commands. In some cases commands translation is not sufficient : RydArab uses transliteration for individual letter-based alphabetical symbols it is necessary to redefine the correspondences using Arabic characters. DadTEX System

DadTEX is an interface that allows the creation of LATEX documents in Arabic. The whole of the document can be composed using only Arabic text with some control characters like backslash, dollar, ... It is based on the primitive \def for the translation of every commands. In some cases commands translation is not sufficient : RydArab uses transliteration for individual letter-based alphabetical symbols it is necessary to redefine the correspondences using Arabic characters. DadTEX System

In TEX, besides commands for mathematical objects, there are many commands for document segmentation and content nature specification. The translation of such commands, will make it very easy for Arabic users to understand why commands like \chapter should not be used in a standard article. DadTEX structure DadTEX structure

dadpreamb.: a set of commands used in document preambles. The encoding system in use is defined here. dadalpha.tex: here, ISO-8859-6 Arabic characters are declared as letters using the primitive \catcode dadbody.tex: a set of commands used in documents bodies dadadapt.tex: DadTEX users can add their own commands. It is like a dictionary of commands. It allows for flexible and customizable translation. Encoding

DadTEX can be used with any encoding system able of representing the Arabic alphabet, as long as the encoding is supported by the packages in use. In this version, we used ISO-8859-6 instead of UTF-8 because the ArabTEX system still has some problems when it is used with UTF-8 In ISO-8859-6, Arabic characters are encoded in one byte. It is thus easy to set their category code into 11. Encoding

UTF-8 Using \catcode with UTF-8 will lead to errors, because it is intended to be used with one-byte characters.

Arabic characters should be divided into there two visible bytes using an ASCII text editor.

and . In the case of the Omega system, the hexadecimal code can be used directly: . Sample DadTEX advantages

DadTEX is based completely on TEX crossplatform supported compatible with several other TEX extensions allows composition of documents using only Arabic text and commands can be adapted easily to regional needs and users choices can be generalized to several other non Latin languages Open problems and prospects

In TEX, numbers are used to control document component sizes, and action’s frequency, etc. The support of Arabic Indic numbers or Persian numbers in control sequences is needed The use of file names in Arabic is still a problem In a future version of DadTEX, it will be very interesting to add support for other packages (e.g. Arabi, ...) Further steps are to be done in the I18n field: Exploiting the linguistic and regional properties from the system settings and getting notational preferences (such as units, currency,...) Take maximum advantage of Unicode’s bidirectionality algorithm like it is done in browsers, where no specific declaration of the language is required Open problems and prospects

In TEX, numbers are used to control document component sizes, and action’s frequency, etc. The support of Arabic Indic numbers or Persian numbers in control sequences is needed The use of file names in Arabic is still a problem In a future version of DadTEX, it will be very interesting to add support for other packages (e.g. Arabi, ...) Further steps are to be done in the I18n field: Exploiting the linguistic and regional properties from the system settings and getting notational preferences (such as units, currency,...) Take maximum advantage of Unicode’s bidirectionality algorithm like it is done in browsers, where no specific declaration of the language is required Open problems and prospects

In TEX, numbers are used to control document component sizes, and action’s frequency, etc. The support of Arabic Indic numbers or Persian numbers in control sequences is needed The use of file names in Arabic is still a problem In a future version of DadTEX, it will be very interesting to add support for other packages (e.g. Arabi, ...) Further steps are to be done in the I18n field: Exploiting the linguistic and regional properties from the system settings and getting notational preferences (such as units, currency,...) Take maximum advantage of Unicode’s bidirectionality algorithm like it is done in browsers, where no specific declaration of the language is required Open problems and prospects

In TEX, numbers are used to control document component sizes, and action’s frequency, etc. The support of Arabic Indic numbers or Persian numbers in control sequences is needed The use of file names in Arabic is still a problem In a future version of DadTEX, it will be very interesting to add support for other packages (e.g. Arabi, ...) Further steps are to be done in the I18n field: Exploiting the linguistic and regional properties from the system settings and getting notational preferences (such as units, currency,...) Take maximum advantage of Unicode’s bidirectionality algorithm like it is done in browsers, where no specific declaration of the language is required Open problems and prospects

In TEX, numbers are used to control document component sizes, and action’s frequency, etc. The support of Arabic Indic numbers or Persian numbers in control sequences is needed The use of file names in Arabic is still a problem In a future version of DadTEX, it will be very interesting to add support for other packages (e.g. Arabi, ...) Further steps are to be done in the I18n field: Exploiting the linguistic and regional properties from the system settings and getting notational preferences (such as units, currency,...) Take maximum advantage of Unicode’s bidirectionality algorithm like it is done in browsers, where no specific declaration of the language is required The End Thank you!