Generating Multiple Outputs from Ω John Plaice, Yannis Haralambous

To cite this version: John Plaice, Yannis Haralambous. Generating Multiple Outputs from Ω. TUGboat, TeX Users Group, 2003, Proceedings of EuroTeX 2003, 24 (3), pp. 512-518. hal-02112933

HAL Id: hal-02112933
https://hal.archives-ouvertes.fr/hal-02112933
Submitted on 27 Apr 2019

John Plaice
School of Computer Science and Engineering
The University of New South Wales
UNSW SYDNEY NSW 2052, Australia
[email protected]
http://www.cse.unsw.edu.au/~plaice

Yannis Haralambous
Département Informatique
École Nationale Supérieure des Télécommunications de Bretagne
CS 83 818, 29238 Brest Cédex, France
[email protected]
http://omega.enstb.org/yannis

Abstract
In this paper, we describe how to generate multiple outputs (DVI, PostScript, PDF, XML, ...) from the same Ω document. The Ω engine is augmented with a library for manipulating multidimensional contexts. Each macro can be defined in multiple versions, and macros can thereby adapt to differing contexts. Macros can be specialized for several different output formats, without changing the overall structure. As a result, the same document can be used to easily produce different output formats, with appropriate specializations for each of them, without having to make any changes to the document itself.
Résumé
In this article we describe the process of generating multiple outputs (DVI, PostScript, PDF, XML, ...) from the same Ω document. The Ω engine has been equipped with a library of subroutines dedicated to the manipulation of multidimensional contexts. TeX macros can be specialized according to the output format, without changing their overall structure. Thus the same document can, without the slightest modification, easily produce different output formats with the appropriate specializations.

Introduction
We present in this paper a new approach to generating typeset and structural material from Ω in a number of different output formats. This approach generalizes the existing approaches of DVI postprocessors capable of interpreting DVI \special's, specialized modifications to the typesetting engine, judicious use of alternate versions of macros, and external interpreters of subsets of LaTeX.

Key to this new approach is the introduction in Ω of versioned macros and versioned ΩTPs that can adapt their behavior to a dynamically running tree-structured context that permeates the entire typesetting process. As a result, when a text is to be typeset for a new output format, new versions of macros can be written at any level, without changing the existing macros, thereby minimizing the amount of additional work to be undertaken.

Versioned macros and ΩTPs have ramifications well beyond the structural issues involved in generating material for different output formats: versioning the typesetting process also provides a high-level interface for multilingual typesetting, an issue that has hindered the development of the Ω system since its inception. See the paper presented at TUG 2003, with Chris Rowley [3], for a detailed discussion.

However, it is not sufficient simply to be able to generate different versions of macros and ΩTPs. The TeX document model is very simple, and the one-pass approach to document manipulation built into the software (analogous to the Pascal language in which it was written) acts like a straitjacket when one wishes to pass as input, or to generate as output, significantly different document structures.

Therefore, at least three additional components need to be added to Ω in order for it to be fully adaptable to different formats. First is the ability to directly apply ΩTPs and other filters to the input stream, even before, and possibly bypassing, the macro processing stage. Second is the ability to directly apply ΩTPs to the output stream, possibly without even generating DVI output. Third is to supply general hooks that allow the user to manipulate internal document structure, and not simply horizontal and vertical boxes.

In this article, we present the work that we have undertaken towards these goals. We begin with a brief background, describing what we consider to be the main contributions of existing extensions of the general TeX framework (not just to the TeX engine), and show how these different approaches all contribute to a better understanding of the general problem of generating different outputs from the same files.

The model for contexts that we have adopted was developed by Paul Swoboda in his PhD thesis [7]. It is the most highly developed presentation of intensional versioning, an approach to the development of software variants first proposed by the first author and William W. Wadge [6]. We give a discussion of intensional programming and versioning, then give a detailed presentation of contexts and context operators.

We then show how contexts have been integrated into Ω. To do this, new Ω primitives are introduced for creating and using different versions of macros, and for changing and manipulating the runtime context. In addition, means for having versions of internal and external ΩTPs are defined.

These technical sections are followed by a discussion of how the internals of the Ω engine should be reorganized to facilitate the generation of multiple outputs.

TeX and its Extensions
The TeX document model supposes that a stream of text, interspersed with control sequences, is to be transformed into a series of pages, each of which is a vertical box that contains other boxes, either vertical or horizontal. Each page is generated into DVI output, in the process losing some of the information contained in the page.

The boxes are generated on the fly. Although certain items can be stored for later use in registers, accessed much as one would in assembly programming, TeX's document model essentially consists of the following processes:

• transforming streams of characters into streams of typeset glyphs and boxes (main loop);
• building math lists from TeX math, then transforming the lists into streams of typeset glyphs (math mode);
• transforming streams of typeset glyphs, with inserted hyphenation points, into streams of horizontal boxes, corresponding to lines (paragrapher);
• building boxes, corresponding to tables, from alignment specifications;
• building pages from streams of boxes and glue (page builder).

TeX's operation is undertaken in one pass, and it is very difficult, if not impossible, to manipulate intermediate data structures as they are being built.

The different extensions to TeX and the different DVI postprocessors have all taken different approaches, which is quite normal given their divergent aims.

First are the DVI postprocessors, dvips (generating PostScript) and dvipdfm (generating PDF). Each of these programs transforms DVI output into the relevant output format. The DVI is augmented with DVI \special's that are specifically designed for use with that program and generated by TeX through its macro mechanism.

The main advantage of this approach is that it encourages modularity, in the sense that the typesetter is separate from the pretty-printer. However, one can only put into \special's information that is made available to the user. Information about intermediate data structures is not directly available, so it can only be approximated.

Second is the LaTeX2HTML approach. This tool does not do typesetting; rather, it reorganizes the structure of the text into HTML. It does not use the TeX engine, but itself parses a large (reasonable) subset of LaTeX. For parts that cannot be directly translated into HTML, such as mathematics, it generates small LaTeX files, calls LaTeX, then dvips, then transforms the results into PNG files. Although LaTeX2HTML is a useful tool, in its current form it will never have access to TeX's internal data structures, since it never runs TeX on the document itself.

Third, also for generating HTML, is TeX4ht, which produces HTML files that resemble DVI pages generated by TeX. TeX4ht is also standalone, but it does use TeX for parsing and typesetting the input. It makes extensive use of DVI \special's.

Fourth are the extensions to the TeX engine, namely e-TeX, pdfTeX, and Ω. The e-TeX extensions focus mainly on improving the macro expansion facilities. They do not change the typesetting, but do provide the very useful ability to reparse an input sequence.

The pdfTeX extensions are two-fold. First is some experimental work simulating parts of Peter Karow's hz program. Second, and more commonly used, are the extensions to generate PDF directly rather than DVI. In addition to its new pretty-printer for TeX pages, pdfTeX provides built-in mechanisms, using whatsit nodes, for generating such things as PDF forms and margin items.

Although pdfTeX is practical, in the sense that one can quickly generate PDF files from a TeX file, the fact that all of the functionality is hard-coded limits the possibility for extending the same system. For example, the current pdfTeX does not allow EPS files to be included in the PDF files that it generates.

The Ω extensions are of a more general nature.
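To make the one-pass document model described earlier in this section more concrete, the following Python sketch chains three of the listed stages (main loop, paragrapher, page builder) as generators. It is an illustration only and not TeX's or Ω's actual code: the function names, the use of whole words as "glyphs", and the greedy line filling are all assumptions made for brevity (TeX's real paragrapher does not break lines greedily). The point it tries to show is that each stage consumes the previous stage's stream as it is produced, so no stage can go back and manipulate an earlier intermediate structure.

```python
from typing import Iterable, Iterator, List


def main_loop(text: str) -> Iterator[str]:
    """Turn a stream of characters into a stream of 'glyphs' (here: words)."""
    for word in text.split():
        yield word


def paragrapher(glyphs: Iterable[str], line_width: int) -> Iterator[str]:
    """Greedily pack glyphs into lines of at most line_width characters.

    A single glyph longer than line_width still gets a line of its own.
    This greedy strategy is a simplification, not TeX's algorithm.
    """
    line: List[str] = []
    length = 0
    for glyph in glyphs:
        # +1 accounts for the space that would precede this glyph.
        needed = len(glyph) + (1 if line else 0)
        if line and length + needed > line_width:
            yield " ".join(line)
            line, length = [], 0
            needed = len(glyph)
        line.append(glyph)
        length += needed
    if line:
        yield " ".join(line)


def page_builder(lines: Iterable[str], lines_per_page: int) -> Iterator[List[str]]:
    """Group lines into pages; a page is shipped out as soon as it is full."""
    page: List[str] = []
    for line in lines:
        page.append(line)
        if len(page) == lines_per_page:
            yield page
            page = []
    if page:
        yield page


def typeset(text: str, line_width: int = 20, lines_per_page: int = 2) -> List[List[str]]:
    """Chain the stages; each one only sees the stream handed to it."""
    glyphs = main_loop(text)
    lines = paragrapher(glyphs, line_width)
    return list(page_builder(lines, lines_per_page))


pages = typeset("a stream of text is transformed into a series of pages")
for number, page in enumerate(pages, start=1):
    print(number, page)
```

In this sketch, by the time `page_builder` yields a page, the glyph stream that produced its lines has already been consumed. An output driver placed after `typeset` sees only finished pages, which mirrors the limitation noted above for DVI postprocessors.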
