Design Decisions for a Structured Front End to LATEX Documents

Design decisions for a structured front end to LATEX documents Barry MacKichan MacKichan Software, Inc. barry dot mackichan at mackichan dot com 1 Logical design Procedural Scientific WorkPlace and Scientific Word are word processors that have been designed from the start to TeX handle mathematics gracefully. Their design philos- PostScript ophy is descended from Brian Reid’s Scribe,1 which emphasized the separation of content from form and 2 was also an inspiration for LATEX. This logical design philosophy holds that the author of a document should concern him- or herself with the content of the document, and with identifying the role that each bit of text plays, such as a header, a footnote, Structured or a quote. The details of formatting should be ig- Unstructured nored by the author, and handled instead by a pre- defined (or custom) style specification. LaTeX There are several very compelling reasons for the separation of content from form. • The expertise of the author is in the content; PDF the expertise of the publisher is in the presentation. Declarative • Worrying and fussing about the presentation is wasted effort when done by the author, since Thus, PostScript is a powerful programming the publisher will impose its own formatting on language, but it was later supplemented by PDF, the paper. which is not a programming language, but instead contains declarations of where individual characters • Applying formatting algorithmically is the eas- are placed. PDF is not structured, but Adobe has iest way to assure consistency of presentation. been adding a structural overlay. LATEX is quite • When a document is re-purposed it can be re- structured, but it still contains visible signs of the formatted automatically for its new purpose. underlying programmability of TEX, so I haven’t This can happen when a document is put on quite placed it at the bottom of the plot. The pat- the Web in addition to being published, or even tern is that power and flexibility generally get sup- when the author sends the document to a new plemented or replaced in some circumstances with publisher. structured and declarative alternatives. The most powerful typesetting programs tend The original design philosophy for Scientific to be programming languages themselves. The two WorkPlace and Scientific Word was to make visual most prominent examples are PostScript and TEX. word processors that live at the bottom right of Although these are extremely powerful, they are not the diagram, and produce their output by generat- always simple, and they do not separate content ing LATEX using one of over a hundred typesetting from form. Consequently, there is a migration on styles. This is the optimal solution for publishing, the following plot from the top to the bottom, and at least when we support a publisher’s style, or from the left to the right. when a publisher’s style uses the same tags as one of the standard LATEX document types. 1 Brian K. Reid, “Scribe: A Document Specification Language and its Compiler,” Ph.D. Dissertation, 2 Enter the customer Carnegie-Mellon University, Pittsburgh, PA, Oct. 1980. 2 Although this philosophy works very well for pub- Leslie Lamport, LATEX: A Document Preparation System, Addison-Wesley Longman Publishing Co., Inc., lishing, many of our customers want to have greater Boston, MA, Second Edition, 1994. control over the appearance of their documents. The TUGboat, Volume 28 (2007), No. 3 — Proceedings of the 2007 Annual Meeting 335 Barry MacKichan truth is that not all mathematical documents are Part of the motivation for not needing access to written for publication in a journal. The author our source code is that extending these operations might want to post a document on the Web or to will be easier for us if it is not necessary to change send out preprints, or to prepare reports that will and re-build C++ code in order to support a new not be published, or to prepare handouts for stu- tag or to change the behavior of a standard tag. dents. The cold hard truth is that programs like The other part of the motivation is that if the tools Microsoft Word — despite its intellectual roots be- are standard and well-documented, then advanced ing also in Scribe — have over the years encouraged users can make their own changes. users to fiddle and futz with formatting. The ex- perts may all agree that the result is ugly, but the 3.1 Internal form of a document customer is the one who pays our salaries. Scientific Word has an internal form that is not In the past, Scientific Word users had a hard LATEX but looks superficially like LATEX, and we have time if they wanted to change or add to a style. an adequate rendering engine for it. However, it is The advice of our tech support staff has been: not extensible — that is, to extend it means rewrit- ++ • You don’t want to do that ing C code and extending the rendering engine. • You shouldn’t do that To avoid this problem and get the extensibility we • You can use package X to do that need, we choose an internal form that is rich enough • You can rewrite the style file and extensible (and it must also be declarative and structured). The obvious candidate (at least in this We no longer give the first two responses, and century) is XML. We are basing future versions of our users are not going to be able to use the fourth our software on the Mozilla Gecko rendering engine bit of advice. Due to the large number of useful for HTML and XML. Tags can be introduced at will, packages, we now encourage users to start with a and CSS (Cascading Style Sheets) are used to deter- standard LATEX document type and to use packages. mine how these tags appear on the screen. This works, but it is not the most elegant way to Some of the features of Gecko that are very use- solve the problem, since you shouldn’t have to write ful to us are: options for the geometry package in order to change • The rendering engine is open-source under a li- a margin. cense that allows us to extend it if necessary. We also allow the user to enter snippets of raw • The rendering engine is rich and powerful (the A TEX or LTEX code in what we call a “TEX but- program user-interface is in fact a Gecko docu- A ton” (which is how we enter “TEX” and “LTEX”) ment). but this runs counter to the design philosophy, and • XML is a standard that is easily converted to can’t address problems when a user wants to change, and from LATEX. for example, how list items are generated (since the • A powerful scripting language is integrated into code to be added would be in the middle of code we Gecko. have generated). • A technology (XBL–XML Binding Language) 3 A statement of the problem allows attaching behavior to (new) XML tags. • A system of broadcasters and observers simpli- This discussion now allows a statement of the prob- fies coordinating the behaviors of objects. lem we are solving. • Support for infinite undo and redo is built into 1. We want an internal form for our documents the document-modifying functions. that is both rich and extensible, and a rendering A A 3.2 Conversion from LTEX engine that is rich enough to render a LTEX document and which is extensible. Scientific Word does not process TEX or LATEX files with T X. It simply determines the structure of 2. We want to convert a LAT X document to our E E the file by recognizing tags such as \section and internal form in a way that is extensible and \subsection. In the past, it has caused problems preferably uses standard, well-documented tools, when users defined their own macros: we did not and in particular does not require access to our recognize them and loaded the macro invocation as source code. a TEX button. Beginning with version 5.5 (two years 3. We want to convert our internal form to LATEX ago) we now run a version of the TEX macro pro- in a way that is extensible and uses standard cessor, and we evaluate macros defined by the user, tools, and does not require access to our source but we do not evaluate macros defined in LATEX or code. any of the standard packages. The result should be 336 TUGboat, Volume 28 (2007), No. 3 — Proceedings of the 2007 Annual Meeting Design decisions for a structured front end to LATEX documents a document that contains only the standard macros, The next section addresses the question of how and which can be read by Scientific Word. you can tailor the on-screen presentation of a tag. We continue with this same approach in our new architecture, except that the definitions of the 4 Some examples standard LATEX macros converts them to XML. The 4.1 Displaying ‘LATEX’ on screen resulting files are complicated, but most of the com- This is a brief discussion of how you can display plication is in some utility macros that make the fi- a new tag, such as <latex/>, on the screen. This nal macros quite easy to understand. Some sample is done by using XBL. We’ll skip lightly over the code from one of these files is: details. \def\out@begin@abstract{% In a CSS file there is a line that tells Gecko that \msitag{^^0a}% special rules apply to this tag: \msiopentag{abstract}{<abstract>} latex { } -moz-binding: url( \def\out@end@abstract{% "resource://app/res/xbl/latex.xml#latex"); \msitag{^^0a}% } \msiclosetag{abstract}{</abstract>} In the file latex.xml, there is a section that } says how to display the tag: This is all that is required to convert the abstract <xbl:content> environment to XML.

Design Decisions for a Structured Front End to LATEX Documents

Bibliography of Erik Wilde

Simplifying the Scientific Writing and Review Process with Sciflow

Security Analysis of Firefox Webextensions

Cross Site Scripting Attacks Xss Exploits and Defense.Pdf

Taft to the Negroes

Open Architecture for Multilingual Parallel Texts M.T

Firefox Hacks Is Ideal for Power Users Who Want to Maximize The

Classicthesis.Pdf

Here.Is.Only.Xul

Library Publishing Toolkit, Ed

Research Techniques in Network and Information Technologies, February

1. Overview Scribe Is a Free Cross-Platform Note-Taking Program