Mem. a Multilingual Package for LATEX with Aleph
Total Page:16
File Type:pdf, Size:1020Kb
MOT01 Proceedings EuroTEX2005 – Pont-à-Mousson, France Mem. A multilingual package for LATEX with Aleph Javier Bezos Typesetter and consultant http://perso-wanadoo.es/jbezos/ http://mem-latex.sourceforge.net/ [email protected] Abstract Mem provides an experimental environment for multilingual and multiscript type- setting with LATEX in the Aleph typesetting system. Aleph is Unicode-savvy and combines features of Omega and eTEX. With Mem you should be able to typeset Unicode documents mixing several languages and several scripts taking advantage of its built-in OCP mechanism and with a high level interface. Currently still under study and development, Mem is designed to be capable of following the development of Omega and LATEX3, and I’m publishing it to encourage other people to think about the ideas behind it and to discuss the advantages and disadvantages of several approachs to the involved problems. The project is now hosted in the public respository SourceForge.net to open its development to other people. Introduction Multilingual Information Processing (Tokyo, 2001), but for several reasons its development was paused.1 Until now, the only way to adapt LAT X for it to E Its goal is twofold: in the short-term, to provide a become a multilingual system is babel; although an- real working package for Aleph to become useable other systems like mlp (by Bernard Gaulle) or poly- with LAT X, taking advantage of features like the glot (by me) have appeared now and then, in prac- E OCP mechanism; in the mid-term, to use the expe- tice only babel is used. It exploits T X in order to E rience gained with a real life system in order to de- accomplish some tasks which T X was not intended E velop better multilingual environments with LAT X3 for, like right to left writing and transliterations, E and Omega. but it’s clear that the next step requires features The rest of this paper of devoted to highlight not available in T X. Further, while one can write E some of the issues and therefore it does not intend documents in several languages, babel is esentially to be exhaustive. To get a full picture of the pack- a way to change the main language in monolingual age please refer to the manual [3], which is being documents. written at the same time as the package, because I Long ago, Omega and ε-T X developement E think the documentation is an integral part in the started independently and recently a new project development process. I’ve divided the topics in two named Aleph, combining features from both sys- parts, those related directly to T X, and those re- tems, has been launched. There are several pack- E lated to the Aleph/Omega extensions, particularly ages for specific languages taking advantage of the to the OCP mechanism. features in Omega (devnag, makor, CJK, etc.) and omega the package provided a few macros to ease its The TEX part use, now expanded with the name of Antomega by Alexej Kryukov [5], but they don’t provide a generic Organizing and selecting features Language high level interface to add a language and to syn- commands are grouped in components, with a few chronize it with other languages in a consistent and predefined ones—namely, names, date, tools and A babel flexible framework. On the other hand, LTEX3 con- text. At first sight this resembles , but in fact tinues evolving and one of its aims is to have built-in this similitude is only superficial, because you are multilingual capabilities. free to organize and to select components. The limit It is in this context that Mem was born. Actu- 1 There is no paper, but you can find the slides on http:// ally, it was born several years ago with the name of perso.wanadoo.es/jbezos/mlaleph.html. In fact, Mem was Lambda and presented in the Fifth Symposium on born even before, in 1996, with the name of polyglot as I shall explain shortly. 92 Mem. A Multilingual Environment for LATEX with Aleph Javier Bezos Proceedings EuroTEX2005 – Pont-à-Mousson, France MOT01 would be a component per macro but this does not been implemented in Omega, but to me it’s clear it seem sensible; for example, left and right guillemets should be taken as a guide for Mem, and for that could be a single group. On the other hand, too matter for any multilingual environment. At the many components would be unconvenient for the time of this writing I was studying how to tackle user. I think a sort or component/subcomponent this task and the resulting model will be left for a model should be devised (eg, text.guillemets), future paper. and at the time of this writing I’m working on a system to allow even decisions at macro level like text.guillemets.\lguillemet. Never again default values! In a well-known ar- This poses the problem to determine which ticle published in the TUGboat ten years ago, Hara- components are active at a certain point of the doc- lambous, Plaice and Braams proclaimed “Never ument. There are, of course, systems like those again active characters!” [4]. Now I proclaim in CSS and other formatting languages based on the end of another source of problems in the babel description rules for transfomations based on con- package—namely, default values. Actually, default tent (for example, with the keywords inherit and values are mainly associated with active characters, ignore). However, TEX allows programmable rules but they are also present in macros. Having default for transformation based on format and such a values for a certain language is not a bad thing, but model seems very limited (and the term “inherit” when those values are restored every time the lan- can be inappropiate in the context of an object-like guage is selected and they cannot be redefined with 2 environment). Unlike CSS, with its closed set of the standard LATEX procedures then problems arise. properties, TEX allows creating new properties and In Mem, a default value in a language is only a therefore new ways to organize the document layout. proposal, while the final decision is left to the user, There is a proposal from Frank Mittelbach and which can change it by means of \renewcommand, Chris Rowley [7] based on nesting levels, with com- \setlenght and similars. No special syntax is re- ments about the main issues to be addressed, but quired, like for example \addto\extrasspanish. since this paper is somewhat abstract regarding the The behaviour of language commands is exactly possible solutions it’s difficult to determine if that that of normal commands, except that their values model will be enough for many purposes. In par- change when the language changes. ticular, it presumes the structure of the document A macro is made specific for a certain language is a tree, and therefore, as its authors point out, with \DeclareLanguageCommand, which provides a the model has to be extended to provide the neces- default definition to be used if the users likes it; sary support of “special regions” that receive con- if you don’t like it, you can redefine it, since the tent from other parts of the document. default value is not remembered any more. Outside A basic idea in that paper is that there is a that language, there could be macros with similar base language for large portions of text as well as names, but they are not language specific (except if embedded languages segments, which are nestable. defined for another language, of course). Although in a limited way, these concepts shown Furthermore, if a language defines an undefined at TUG 1997 related to a clear separation between macro, this is only defined in the context of that base and embedded languages were present at that language and you not are required to provide a de- time in my own polyglot package (first released early fault for another language, because I firmly believe 1997) whose code I used as the base to develop loading a language should not change at all the be- Lambda and now Mem. haviour of another language. In other words, with On the other hand, Plaice and Haralambous in Mem languages are much like black boxes. [9] and I (in Lambda) proposed independently to fol- A good example could be the Basque language, low a model based in context information; the ver- which places the figure number before the figure sioning system for Omega described in the former name. For that to be accomplished we must make has been worked out and much extended from a the- Basque dependent several internal macros. Consid- oretical point of view in [11] by Plaice, Haralambous ering the number of languages and the fact we can- and Rowley, with the introduction of the concept of not know a priory which changes will be necessary, a typographical space. Unfortunately, such a model the fact languages can (or even must) decide which cannot be carried out in full with TEX and it has not macros have a default value could lead to an unman- ageable situation which could even prevent a proper 2 See [2]. An English summary is availaible on http:// writing of packages, because we don’t know if we mem-latex.sourceforge.net. need to use \(re)newcommand or something else. Mem. A Multilingual Environment for LATEX with Aleph 93 Javier Bezos MOT01 Proceedings EuroTEX2005 – Pont-à-Mousson, France The Aleph/Omega part It’s important to remember where OCPs are not applied: when writing to a file (e.g., the aux file), OTP files The OCP mechanism provides a pow- in \edef’s, in arguments of primitives like \accent, erful tool to make a wide range of text transfor- and in math mode.