Outlines Arabic writing Arabic mathematical expressions Arabic scientific document composition DadTEX motivations DadTEX system Open problems and prospects
DadTEX, a full Arabic interface
M. Eddahibi, A. Lazrek, K. Sami
Department of Computer Science Cadi Ayyad University
Marrakesh November 9 - 11, 2006
TUG2006
http://www.ucam.ac.ma/fssm/rydarab DadTEX, a full Arabic interface Outlines
1 Arabic writing
2 Arabic mathematical expressions
3 Arabic scientific document composition
4 DadTEX motivations
5 DadTEX system
6 Open problems and prospects Arabic Writing
Arabic letters represent only consonants or long vowels Optional diacritical marks for short vowels can be used to annotate text or spell it out in full when desired Arabic script is written from right to left Arabic script is cursive Arabic Writing
Arabic letters represent only consonants or long vowels Optional diacritical marks for short vowels can be used to annotate text or spell it out in full when desired Arabic script is written from right to left Arabic script is cursive Arabic Writing
Arabic letters represent only consonants or long vowels Optional diacritical marks for short vowels can be used to annotate text or spell it out in full when desired Arabic script is written from right to left
Arabic script is cursive Arabic Writing
Arabic letters represent only consonants or long vowels Optional diacritical marks for short vowels can be used to annotate text or spell it out in full when desired Arabic script is written from right to left Arabic script is cursive Arabic mathematical expressions
There are two mathematical notations according to the regions: ? Latin notation with Arabic text ? genuine Arabic notation
Arabic notation Spreads from right to left Uses Arabic alphabet based symbols Uses Arabic abbreviations for usual functions Uses some Latin mirrored symbols Uses curved kashida for variable sized symbols Arabic mathematical expressions
There are two mathematical notations according to the regions: ? Latin notation with Arabic text ? genuine Arabic notation
Arabic notation Spreads from right to left Uses Arabic alphabet based symbols
Uses Arabic abbreviations for usual functions Uses some Latin mirrored symbols Uses curved kashida for variable sized symbols Arabic mathematical expressions
There are two mathematical notations according to the regions: ? Latin notation with Arabic text ? genuine Arabic notation
Arabic notation Spreads from right to left Uses Arabic alphabet based symbols Uses Arabic abbreviations for usual functions
Uses some Latin mirrored symbols Uses curved kashida for variable sized symbols Arabic mathematical expressions
There are two mathematical notations according to the regions: ? Latin notation with Arabic text ? genuine Arabic notation
Arabic notation Spreads from right to left Uses Arabic alphabet based symbols Uses Arabic abbreviations for usual functions Uses some Latin mirrored symbols
Uses curved kashida for variable sized symbols Arabic mathematical expressions
There are two mathematical notations according to the regions: ? Latin notation with Arabic text ? genuine Arabic notation
Arabic notation Spreads from right to left Uses Arabic alphabet based symbols Uses Arabic abbreviations for usual functions Uses some Latin mirrored symbols Uses curved kashida for variable sized symbols Arabic mathematical expressions
Arabic alphabet symbols are dotted or dotless
Uses several styles to extend the amount of symbols Arabic scientific document composition
Image based method Pure painting Painting and equation editor Painting and text layout Handwritten equation digitalization
TEX based method RyDArab and CurExt Arabic scientific document composition
Image based method Pure painting Painting and equation editor Painting and text layout Handwritten equation digitalization
TEX based method RyDArab and CurExt Pure painting
Uses drawing tool to paint both text and non textual symbols The result is cut and pasted to document Painting and text layout
Use of letters for alphabet based symbols and drawings for other symbols Painting and equation editor
Uses an equation editor for Arabic-Latin common notation and drawing for mirrored symbols Handwritten equation digitalization
Use of scanner to transform both text and mathematical expressions to image Image based Methods
Advantages are WYSIWYG do not present any linguistic difficulty when used in Arabic versions Drawbacks Bad typographical quality Difficult (need some drawing skills and Image handling) Not uniform (expressions have not the same metrics) No semantic content RyDArab and CurExt
Advantages TEX based systems High quality technical documents Documents can be edited using simple text editor Several choices for notations and symbols styles Variable sized symbols dimensions are automatically and transparently set Expressions can be converted into several formats (image, MathML,...) Expression can be edited easily Drawbacks No WYSIWYG interface for TEX in Arabic till now Need some English and TEX learning DadTEX Motivations
Information may be reached, used and communicated in the languages of the transmitter and that of the receiver without considerations of the technical support Several trends help to go from monolingualism to multilingualism from ASCII to Unicode TEX Arabic support: Omega, ArabTEX, Aleph, Rydarab ... Multilingual fonts: STIX project XML I18n ... Several studies show that one learns better in his mother tongue DadTEX system
DadTEX goals Interface for composition of LATEX documents using only Arabic Build Arabic version of TEX commands lexicon Make it easy to learn and understand TEX vocabulary for Arabic users Avoid bidirectionality problems due to the mixture of English commands and Arabic text DadTEX System
Bidirectionality problems Text selection
Character position Semantic ambiguity Due to the graphical swapping DadTEX System
Bidirectionality problems Text selection Character position
Semantic ambiguity Due to the graphical swapping DadTEX System
Bidirectionality problems Text selection Character position Semantic ambiguity Due to the graphical swapping DadTEX System
A conceivable solution is to use an application to convert Arabic document into its transliterated equivalent. This mechanism is similar to the one used in FarsiTEX ( ftx2tex translates Persian text to transliterated text)
This solution have several deficiencies: it is not direct; it generates a supplementary file to be processed instead of the original source file the compilation time is increased additional memory and free space are required DadTEX System
DadTEX is an interface that allows the creation of LATEX documents in Arabic. The whole of the document can be composed using only Arabic text with some control characters like backslash, dollar, ... It is based on the primitive \def for the translation of every commands. In some cases commands translation is not sufficient : RydArab uses transliteration for individual letter-based alphabetical symbols it is necessary to redefine the correspondences using Arabic characters. DadTEX System
DadTEX is an interface that allows the creation of LATEX documents in Arabic. The whole of the document can be composed using only Arabic text with some control characters like backslash, dollar, ... It is based on the primitive \def for the translation of every commands. In some cases commands translation is not sufficient : RydArab uses transliteration for individual letter-based alphabetical symbols it is necessary to redefine the correspondences using Arabic characters. DadTEX System
DadTEX is an interface that allows the creation of LATEX documents in Arabic. The whole of the document can be composed using only Arabic text with some control characters like backslash, dollar, ... It is based on the primitive \def for the translation of every commands. In some cases commands translation is not sufficient : RydArab uses transliteration for individual letter-based alphabetical symbols it is necessary to redefine the correspondences using Arabic characters. DadTEX System
In TEX, besides commands for mathematical objects, there are many commands for document segmentation and content nature specification. The translation of such commands, will make it very easy for Arabic users to understand why commands like \chapter should not be used in a standard article. DadTEX structure DadTEX structure
dadpreamb.tex: a set of commands used in document preambles. The encoding system in use is defined here. dadalpha.tex: here, ISO-8859-6 Arabic characters are declared as letters using the primitive \catcode dadbody.tex: a set of commands used in documents bodies dadadapt.tex: DadTEX users can add their own commands. It is like a dictionary of commands. It allows for flexible and customizable translation. Encoding
DadTEX can be used with any encoding system able of representing the Arabic alphabet, as long as the encoding is supported by the packages in use. In this version, we used ISO-8859-6 instead of UTF-8 because the ArabTEX system still has some problems when it is used with UTF-8 In ISO-8859-6, Arabic characters are encoded in one byte. It is thus easy to set their category code into 11. Encoding
UTF-8 Using \catcode with UTF-8 will lead to errors, because it is intended to be used with one-byte characters.
Arabic characters should be divided into there two visible bytes using an ASCII text editor.
and . In the case of the Omega system, the hexadecimal code can be used directly: . Sample DadTEX advantages
DadTEX is based completely on TEX crossplatform supported compatible with several other TEX extensions allows composition of documents using only Arabic text and commands can be adapted easily to regional needs and users choices can be generalized to several other non Latin languages Open problems and prospects
In TEX, numbers are used to control document component sizes, and action’s frequency, etc. The support of Arabic Indic numbers or Persian numbers in control sequences is needed The use of file names in Arabic is still a problem In a future version of DadTEX, it will be very interesting to add support for other packages (e.g. Arabi, ...) Further steps are to be done in the I18n field: Exploiting the linguistic and regional properties from the system settings and getting notational preferences (such as units, currency,...) Take maximum advantage of Unicode’s bidirectionality algorithm like it is done in browsers, where no specific declaration of the language is required Open problems and prospects
In TEX, numbers are used to control document component sizes, and action’s frequency, etc. The support of Arabic Indic numbers or Persian numbers in control sequences is needed The use of file names in Arabic is still a problem In a future version of DadTEX, it will be very interesting to add support for other packages (e.g. Arabi, ...) Further steps are to be done in the I18n field: Exploiting the linguistic and regional properties from the system settings and getting notational preferences (such as units, currency,...) Take maximum advantage of Unicode’s bidirectionality algorithm like it is done in browsers, where no specific declaration of the language is required Open problems and prospects
In TEX, numbers are used to control document component sizes, and action’s frequency, etc. The support of Arabic Indic numbers or Persian numbers in control sequences is needed The use of file names in Arabic is still a problem In a future version of DadTEX, it will be very interesting to add support for other packages (e.g. Arabi, ...) Further steps are to be done in the I18n field: Exploiting the linguistic and regional properties from the system settings and getting notational preferences (such as units, currency,...) Take maximum advantage of Unicode’s bidirectionality algorithm like it is done in browsers, where no specific declaration of the language is required Open problems and prospects
In TEX, numbers are used to control document component sizes, and action’s frequency, etc. The support of Arabic Indic numbers or Persian numbers in control sequences is needed The use of file names in Arabic is still a problem In a future version of DadTEX, it will be very interesting to add support for other packages (e.g. Arabi, ...) Further steps are to be done in the I18n field: Exploiting the linguistic and regional properties from the system settings and getting notational preferences (such as units, currency,...) Take maximum advantage of Unicode’s bidirectionality algorithm like it is done in browsers, where no specific declaration of the language is required Open problems and prospects
In TEX, numbers are used to control document component sizes, and action’s frequency, etc. The support of Arabic Indic numbers or Persian numbers in control sequences is needed The use of file names in Arabic is still a problem In a future version of DadTEX, it will be very interesting to add support for other packages (e.g. Arabi, ...) Further steps are to be done in the I18n field: Exploiting the linguistic and regional properties from the system settings and getting notational preferences (such as units, currency,...) Take maximum advantage of Unicode’s bidirectionality algorithm like it is done in browsers, where no specific declaration of the language is required The End Thank you!