DadTeX, a Full Arabic Interface
M. Eddahibi, A. Lazrek, K. Sami
Department of Computer Science, Cadi Ayyad University
TUG 2006, Marrakesh, November 9-11, 2006
http://www.ucam.ac.ma/fssm/rydarab

Outline
1 Arabic writing
2 Arabic mathematical expressions
3 Arabic scientific document composition
4 DadTeX motivations
5 DadTeX system
6 Open problems and prospects

Arabic writing
- Arabic letters represent only consonants or long vowels
- Optional diacritical marks for short vowels can be used to annotate text or spell it out in full when desired
- Arabic script is written from right to left
- Arabic script is cursive

Arabic mathematical expressions
There are two mathematical notations, depending on the region:
- Latin notation with Arabic text
- genuine Arabic notation

Arabic notation
- Spreads from right to left
- Uses symbols based on the Arabic alphabet
- Uses Arabic abbreviations for usual functions
- Uses some mirrored Latin symbols
- Uses curved kashida for variable-sized symbols
- Arabic alphabet symbols are dotted or dotless
- Uses several styles to extend the number of symbols

Arabic scientific document composition
Image-based methods
- Pure painting
- Painting and equation editor
- Painting and text layout
- Handwritten equation digitization
TeX-based method
- RyDArab and CurExt

Pure painting
Uses a drawing tool to paint both text and non-textual symbols. The result is cut and pasted into the document.

Painting and text layout
Uses letters for alphabet-based symbols and drawings for other symbols.

Painting and equation editor
Uses an equation editor for the notation common to Arabic and Latin, and drawing for mirrored symbols.

Handwritten equation digitization
Uses a scanner to turn both text and mathematical expressions into an image.

Image-based methods
Advantages
- They are WYSIWYG
- They present no linguistic difficulty when used in Arabic versions
Drawbacks
- Poor typographical quality
- Difficult (they require some drawing and image-handling skills)
- Not uniform (expressions do not share the same metrics)
- No semantic content

RyDArab and CurExt
Advantages
- TeX-based systems
- High-quality technical documents
- Documents can be edited with a simple text editor
- Several choices of notation and symbol styles
- Dimensions of variable-sized symbols are set automatically and transparently
- Expressions can be converted into several formats (image, MathML, ...)
- Expressions can be edited easily
Drawbacks
- No WYSIWYG interface for TeX in Arabic so far
- Requires some knowledge of English and of TeX

DadTeX motivations
- Information may be reached, used and communicated in the languages of the transmitter and of the receiver, regardless of the technical medium
- Several trends help the move from monolingualism to multilingualism:
  - from ASCII to Unicode
  - TeX Arabic support: Omega, ArabTeX, Aleph, RyDArab, ...
  - Multilingual fonts: STIX project
  - XML
  - I18n
  - ...
- Several studies show that people learn better in their mother tongue

DadTeX system
DadTeX goals
- Provide an interface for composing LaTeX documents using only Arabic
- Build an Arabic version of the TeX command lexicon
- Make TeX vocabulary easy for Arabic users to learn and understand
- Avoid the bidirectionality problems caused by mixing English commands with Arabic text

Bidirectionality problems
- Text selection
- Character position
- Semantic ambiguity
All of these are due to graphical swapping.

A conceivable solution is to use an application that converts the Arabic document into its transliterated equivalent. This mechanism is similar to the one used in FarsiTeX (ftx2tex translates Persian text to transliterated text).
This solution has several deficiencies:
- it is not direct; it generates a supplementary file to be processed instead of the original source file
- the compilation time is increased
- additional memory and free disk space are required

DadTeX is an interface that allows the creation of LaTeX documents in Arabic. The whole document can be composed using only Arabic text, with some control characters such as the backslash, the dollar sign, ...
It is based on the primitive \def for the translation of every command.
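The \def-based translation can be sketched as follows. This is only an illustration under stated assumptions: the Arabic command names below are hypothetical and are not taken from the DadTeX lexicon, and the sketch assumes that the Arabic characters already have category code 11 (letter), so that they can form control words.

```latex
% Hypothetical Arabic aliases for LaTeX commands (illustrative names only,
% not the actual DadTeX definitions).
% Assumption: the Arabic characters used here already have category
% code 11, so TeX reads each name as a single control word.
\def\فصل{\chapter}        % "chapter"
\def\قسم{\section}        % "section"
\def\عنوان#1{\title{#1}}  % "title", passing its argument explicitly

% In a document body, \قسم{...} then expands to \section{...}.
```

Because \def only substitutes one token list for another, the aliased command still reads its own arguments after expansion, and no intermediate transliterated file is generated.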
In some cases command translation is not sufficient: RyDArab uses transliteration for individual letter-based alphabetical symbols, so it is necessary to redefine the correspondences using Arabic characters.

In TeX, besides commands for mathematical objects, there are many commands for document segmentation and for specifying the nature of the content. Translating such commands will make it much easier for Arabic users to understand why commands like \chapter should not be used in a standard article.

DadTeX structure
- dadpreamb.tex: a set of commands used in document preambles. The encoding system in use is defined here.
- dadalpha.tex: here, ISO-8859-6 Arabic characters are declared as letters using the primitive \catcode.
- dadbody.tex: a set of commands used in document bodies.
- dadadapt.tex: DadTeX users can add their own commands here. It works like a dictionary of commands and allows flexible, customizable translation.
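Declaring one-byte Arabic characters as letters with \catcode might look like the sketch below. It is not the content of dadalpha.tex: the names \dadchar and \dadletters are hypothetical, and the sketch assumes an 8-bit TeX engine reading ISO-8859-6 input, in which the Arabic letters occupy the byte ranges "C1 to "DA and "E1 to "EA.

```latex
% A minimal sketch (assumption: 8-bit engine, ISO-8859-6 input).
% \dadchar and \dadletters are hypothetical names.
\newcount\dadchar
\def\dadletters#1#2{% give every byte from #1 to #2 category code 11
  \dadchar=#1\relax
  \loop
    \catcode\dadchar=11
  \ifnum\dadchar<#2\relax
    \advance\dadchar by 1
  \repeat}

\dadletters{"C1}{"DA}  % hamza through ghain
\dadletters{"E1}{"EA}  % feh through yeh
```

Once these bytes are letters, a sequence of them following a backslash is read as a single control word, which is what allows command names written entirely in Arabic.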
Encoding
DadTeX can be used with any encoding system capable of representing the Arabic alphabet, as long as the encoding is supported by the packages in use.
In this version, we used ISO-8859-6 instead of UTF-8 because the ArabTeX system still has some problems when it is used with UTF-8.
In ISO-8859-6, Arabic characters are encoded in one byte. It is thus easy to set their category code to 11.

Encoding: UTF-8
Using \catcode with UTF-8 will lead to errors, because it is intended to be used with one-byte characters. Each Arabic character should be divided into its two visible bytes, as shown by an ASCII text editor. In the case of the Omega system, the hexadecimal code can be used directly.

Sample
[figure]

DadTeX advantages
- DadTeX is based completely on TeX
- cross-platform
- supported
- compatible with several other