TUGboat, Volume 20 (1999), No. 3 155
ber of submissions was still rather ... euh ... low. Editorial Overview So Sebastian stood up and exhorted us, challenged us (while we were between speakers) to do something about it. Vancouver in August It took about half an hour but then there was a steady stream of quiet little walks to Sebastian’s end This year’s annual meeting (my first since 1995 in of the hall, a quick smile, a piece of paper changed Florida) had the papers, the attendees, the locale, hands ... and then the speaker’s eyes could return and the weather all vying for my attention. Every to the matter at hand. And thus a Poetry Reading day. It was a fantastic site: that West Coast mourn- in the Rose Garden was organised for that evening, fulness that comes of rain on huge fir trees for the and the results were unparalleled! You may have opening few days and then brilliant sunshine for the heard tales of “inspired” readings—it’s all true! Un- rest of the week, making the color contrasts outside fortunately, all we can reproduce here are the words. more like a paintbox than a conference site! The website has all the poems, all the pictures, and And then there were the papers! Oh yes. Alpha- all the prizes. \acro bet-soup time, folks. We have more sthan There was further mirth and enjoyment at do- ever, it seems: XML,MathML, EPS, DVIPDF, PDF, ing non-TEX things on the evening of the Banquet, HTML, CD-ROM ... The editor’s supposed to find when door prizes were awarded via an apparently the first instance of use in a paper and insist that a random picking of name tags out of a hat. But— definition be provided for those items which are new how ‘random’ is it to see most of these going to the or very jargon-ish. I tried . . . Fortunately, most au- same table ... where many of the LATEX3 team were thors already took care of that step; and where they sitting, in fact? I ask you ... didn’t, well, if the paper wasn’t the first one to men- Then it was time to select the Best Paper. The tion it, then I left it. Because I’m going to assume job was pushed onto me—so I promptly turned that not only will you be reading all these papers around to ask six other people for their suggestions. but you’ll also be reading them in order! Of course! Results? A tie! It was with great pleasure that I Well ... ok, then ... how about reading all could annouce that Don Story and Jean-luc Doumont the abstracts at least? After all, that’s why we put were the joint winners; judging from the applause, ’em into all these proceedings articles — sometimes this was the overwhelming view of the audience as it’s the only way for those of you who couldn’t be well. Congratulations, gentlemen. there to get a handle on just where things stand vis-`a-vis new developments and interesting projects 2 Recognition that often seem to wait till the meeting to be un- Recognition in a volunteer community can some- veiled. Granted, that’s not where you’ll find all the times take a long time to come around. It’s not that acronyms defined but it’s a start. the awareness or the gratitude aren’t there — they But before you plunge into the abstracts—or are, in spades—but formal recognition is something the full-blown articles — stay a moment and find out that requires an occasion, a gathering, and, of course, what else went on in Vancouver. Oh my ... we did someone to instigate it. How this latter came about get up to some fun and games. And I have a sus- for the Vancouver meeting is someone else’s story to picion that next year’s meeting, given some of its tell; what I can tell you is that Barbara Beeton’s organisers, will yield even more surprises and en- 20 years on TUG’s board of directors (a steering joyment for all in attendance. So, if you can, save committee in the early days) and Don Arseneau’s up your money, your travel points, your vacation many years of support to the entire TEX comunity days, and plan on being in Oxford next year in mid- via packages and answers on comp.text.tex are in- August: the 13th to the 16th. It’ll be great! For deed in the category of long-standing contributions tug2000. more details, see p.154 or visit the website: that were formally recognised this past August. tug.org ;-)) . . . not “tug00” Tuesday afternoon saw Michael Downes of the 1 Fun and games AMS speak of Don Arseneau’s place in our directo- ries and our code, and then he presented the man As many of you know from visiting the pre-confer- from TRIUMF with a specially commissioned draw- ence website, submissions for a Poetry Contest were ing by Duane Bibby. The website has more: Michael’s being solicited already long before the start of the tribute, a list of Don’s packages, and Duane’s draw- conference. However, by the time Wednesday morn- ing. Visit it—and then count how many packages ing of August the 18th had rolled around, the num- you’ve used over the years. 156 TUGboat, Volume 20 (1999), No. 3
On Wednesday evening, at the end of the Po- per; 21 reviewers delivered comments on 23 papers. etry Readings, Mimi Jett, TUG’s President, had the The authors’ work you will find further on in these pleasure of bringing our attention to the fact that proceedings, the reviewers’ names you will find here: Barbara Beeton, in yet another of her many roles, Don Arseneau, Thierry Bouche, David Carlisle, had now been a board member and board meeting Don DeLand, Michael Doob, Victor Eijkhout, attendee for 20 years. A bottle of vodka (courtesy Robin Fairbairns, Jeremy Gibbons, Denis of Irina Makhovaya of MIR Publishers) was proba- Girou, Michel Goossens, Eitan Gurari, Kathy bly the best possible choice to celebrate this fact— Hargreaves, Neal Holtz, Dag Langmyhr, strong drink for strong work! Pictures and details Pierre MacKay, Bernd Raichle, Peter Schmitt, at the website pages. Michael Sofka, Phil Taylor, Ron Whitney, and Ohhhh, Barbara ... You may want to look at Mark Wicks. p. 313 ... vodka on paper just won’t work, so I I would like to thank each of you for having been so have something else for you—and our readers—to generous in offering your time, thoughts, and ener- remember this 20th year of board-dom. gies to yet another volunteer activity for TUGboat and for the T X community. I am very grateful for 3 The editorial thing E all the added assistance in treating these papers. It’s been quite a while since I’ve been editor. This year’s crop of authors and papers has been a true 3.2 New trademarks pleasure to harvest. Not that they’ve sat back and At meetings such as this, there are always new items taken everything I’ve dished out ... but that they’ve being presented, some of which are trademarked. been very good to work with, in order to end up with Here are two new trademarks to note: text both they and I can live with. I’m far more W3C (World Wide Web Consortium) intrusive an editor than most (certainly more than MathType Barbara — but that’s also a function of the time that she simply does not have available to her) and yet For regular issues, the list of trademarks can be the authors have been very kind in both accepting— found on the verso of the title page. Proceedings and politely declining —editorial changes. issues have other stuff on that page, so I reproduce It’s only by the fifth or sixth article that I really here what we normally cover: hit my stride—which means I have to go back and Many trademarked names appear in the pages of redo the earlier papers. That’s normal. Then come TUGboat. If there is any question about whether the last few papers and it’s almost too much to finish a name is or is not a trademark, prudence dic- ... and I didn’t. Six papers had to wait till after the tates that it should be treated as if it is. The conference for my editorial sledge and in the interim, following list of trademarks which appear in this issue may not be complete: yet more second thoughts and doubts about edito- rial stances which had changed — and then needed MS/DOS is a trademark of Microsoft Corp. to be carried back into previously edited files. So it’s METAFONT is a trademark of Addison-Wesley been a year, more or less, since Anita’s invitation to Inc. be proceedings editor; I don’t think that’s normal— PCTEX is a registered trademark of Personal too long — but it’s a lot of pages, I tell myself. Per- TEX, Inc. haps a team approach for the actual editing wouldn’t PostScript is a trademark of Adobe Systems, be a bad idea on projects of this size. Inc. techexplorer is a trademark of IBM Research. 3.1 Reviewers TEXandAMS-TEX are trademarks of the Just as regular submissions to TUGboat are refer- American Mathematical Society. eed, a review process for conference papers also ex- Textures is a trademark of Blue Sky Research. ists. This is crucial when the editor is only aware of UNIX is a registered trademark of X/Open a small fraction of what’s going on in the TEXcom- Co. Ltd. munity — and knows even less of the details: I need this review process even more than the authors! The 4 The organising thing reviewers were thus able to bring additional infor- Conferences always have organisers — yet more vol- mation and other possibilities to the authors, who unteers! This year co-chairs Stephanie Hogue and in turn made very good use of the suggestions. I Anita Hoover, along with their committee members, was also fortunate in that only one or two authors were responsible for the program, local arrangements had to be asked to review another presenter’s pa- (for which we also thank the University of British TUGboat, Volume 20 (1999), No. 3 157
Columbia Conference Centre’s Sarah Johnston), pre- full-fledged article; we’ll have to put it into the next print printing, courses, vendor participation, and issue of TUGboat!1 publicity: However, I would be remiss if I didn’t talk a bit about the work that Ross Moore and Wendy McKay Don DeLand, Sue DeMeritt, Wendy McKay, have done both pre- and post-conference—it’s gone Patricia Monohon, Cheryl Ponchin, and Heidi far beyond anything I’d have thought possible. Sestrich. The website became, in a sense, a demonstra- 4.1 Corporate participation tion of what many conference papers would be talk- Vendors were active in several ways: display tables ing about: the migration and integration of TEX source files onto the Web, retaining as much as possi- outside the meeting room, demonstrations during the meeting, donations of products as door prizes; ble of the original typography and reducing recoding some also gave papers that yielded detailed intro- as much as possible. The entire operation was ren- dered all the more difficult as none of us had planned ductions and descriptions of some of the new soft- ware that’s out there. We therefore thank the fol- for it—the source files had therefore to be treated individually, carefully examined for author-specific lowing vendors for their participation at TUG’99: macros, conversions to LATEX made (it was all being Birkh¨auser-Boston, Blue Sky Research done using latex2html), and then after the confer- (Warren Leach), 4TEXCD(EricFrambach), ence, Ross went back into the files to insert links to IBM (Mimi Jett), Kinch Computing (Richard Web references which had been cited. Kinch), MIR Publishing (Irina Makhovaya), As of this writing (December 28, 1999), the Personal TEX Inc. (courtesy of Anita Hoover), TUG’99 website has had 241 visitors — now surely Springer-Verlag, Y&Y, and Zephyr software we can do better than that in showing some interest (courtesy of Ross Moore). in Topsy! Visit the site and then, for those of you I hope this list is complete! who are editors, blanch at the prospect of having to deal with a whole new medium of presentation! Me, 5 TUG99 Website: .../tug99/bulletin/... I’ve recovered from blanching . . . but let next year’s It’s been just over a year since Anita Hoover, co- editor(s) be warned ;-)) chair of the conference committee and technical ed- As mentioned, Ross has written up a blow-by- itor of these proceedings, asked if I’d like to be pro- blow account of the webification process, which we ceedings editor for TUG’99—she’d make sure all will carry in a future issue of TUGboat. His trials files were present, process them and confirm that and tribulations are those of any first-time effort: they did indeed run; I would take care of the con- things can only go better next time around! While tents. It took several months before abstracts start- it is perhaps obvious to those of you who have al- ed coming my way for editing before being posted ready been involved in paper/web publishing, pre- to the Web. A few months after that and the pa- planning is essential to keep everyone’s sanity within pers began arriving but it wasn’t too hard to keep the bounds of ‘normal’. I feel almost like I was be- up. I’d done this before, I knew the routine: Set ing left behind, in some ways, as not only the pa- up my charts, cajole reviewers into lending us some pers, but Ross and Wendy’s work, was moving well of their time and knowledge, edit hardcopies, in- beyond my sphere of activity and experience. It’s put the edits, exchange the files with authors a few exciting — and I’ve had some novel uses added to times . . . and occasionally a few more times . . . the my repertoire these past weeks as Mimi Burbank usual stuff. (our long-suffering production manager) and I have Then something new: Ross offered to convert worked to check all the little bits and pieces of text his paper into PDF format as a demonstration of and items that seem to come to mind only at the what TEX on the Web was all about. Sounded fine. very end—and it’s a bit intimidating. I’ll have to Except, somehow that one paper became all the spend some time thinking about just how far into papers; and that job became one that “grew like the “web-thing” I want to venture ... but it’s been Topsy”, as they say. And the site has continued to a heck of an experience, I can tell you! grow after the conference, as you all know from the e-mailed notice about the Bulletin being available; initially password-protected in order to allow TUG 1 members first access, the site is now open to all. Note that all papers at the website are the preprint ver- sions; the electronic versions of the papers found in these And like Topsy, the initial request for a few lines TUGboat pages will be slightly different, and of course sev- on how the “webification” work had gone became a eral papers will have a “Post-Conference Update”. 158 TUGboat, Volume 20 (1999), No. 3
5.1 Map of TUG meetings and members The full story on the Grimm Exhibition is also Something else that Ross and Wendy worked on is available from the website, along with some of the a massive map showing where all the TUG’99 at- posters (again, in glorious color). If you have the tendees had come from! The map is best viewed bandwidth and the space, try to look at the .pdf on the web — it just doesn’t come across as well in versions of these posters — it’s a wonderful show just print. It also shows where all past meetings have watching them ‘develop’, layering the colors and the taken place (see also p. 314). So if you want to see a shapes . . . demonstration of some of the techniques described 6 Production notes in their paper, go to /preprintmap/node2.html in the Bulletin webpages. The first image is a .gif There are actually two “productions” involved in file but if you have the capacity, download the .pdf creating this year’s proceedings: the usual one for version (833K). print copy, and the first-time (for us) of e-versions of the preprints. 5.2 The ‘TEX Friendly Zone’ Abstracts were solicited last fall, and, after I’d One more thing (their energy is boundless!) Wendy edited them, the texts were posted to the website— the “webbing” began early on. has also put up a “TEX Friendly Zone” logo, a nifty new Bibby drawing that’s available for download To help with their articles, macros and user and printing. Here’s what it’s all about: guides were made available to authors (again, from the TUG website). As the work progressed, I made The TEX Friendly Zone (TFZ) logo is another 2 notes on both my hardcopies of the guides on points one of Duane Bibby’s beautiful drawings of where we might improve or add things, particularly the TEX lion. to the guide for plain T X. Permission is granted to print the TFZ E image and post it in your office or any place We had 23 papers submitted, 7 in plain TEX, 16 in LAT X; two papers were not submitted by the where TEX Friendly people work! E You are also encouraged to get new members deadline, and a third was withdrawn. Both flavours to join TUG and enjoy the camaraderie of of TEX presented me with plenty of new coding styles, TUGees who willingly share their knowledge package options, and just really good examples of and expertise to make the typesetting world a how to do things. As well, a good number of papers better place! used BibTEX, something which I learned to use— and then control . . . a little! In short, editing our 5.3 Photos and Images proceedings may start with the words but it often And then there are the photos. We have to thank progresses deep into TEX itself—lots to learn in very Warren Leach of Blue Sky for such fantastic images, short order. both still and in QuickTime video, and also Kirin Authors dropped off their files at a designated Bahm and Jean-luc Doumont. Digital cameras are ftp site, and then Anita Hoover, the technical editor, what yielded the wonderful results on-screen. On tested them at her site to ensure all files had been 3 the other hand, for print, scanned pictures from a supplied before moving them to SCRI. good still camera are still a better route (it’s all a I approached potential reviewers to read and question of image resolution). comment on each paper, and as mentioned earlier, My efforts, on the other hand, with an old de- these people did a terrific job. I appreciated their fective camera, were less than stellar ... I’d highly willingness to help, and I believe the authors appre- recommend that next year’s organisers enlist the aid ciated finding a willing and knowledgeable audience of digital-camera owners to capture images if there to check their work. are any plans for web publication of pictures. Again, Once authors had addressed reviewer issues, these images are all at the website. There are lots their revised files were again processed to ensure more photos (and in color, of course). Information they still ran, and Anita then generated .ps files 4 on how the photos were taken is also available. for me to pick up, print here, and begin editing. I
2 3 “This year the familiar and much loved TEXLionwas The Supercomputer Computations Research Institute brought back to enhance and illustrate our activities in a at Florida State University, where all TUGboat production unique way. Duane Bibby produced several wonderful TEX takes place. 4 Lion art works for the TEXLive4 CD, TUG’99, TUG2000, the As SCRI has to be complete in its collection of fonts, Special Recognition Award, and the TFZ logo. We are happy packages, and utilities, there’s no point to my duplicating to have found him again, and we thank him very much for that environment at my site. Eventually, though, as the edit- doing these very special drawings — as only he can — at quite ing cycle progressed, I did process the bulk of the files here short notice.” [footnote in original – Ed.] in order to deal with final line-, column-, and page-breaks. TUGboat, Volume 20 (1999), No. 3 159 prefer to edit first on hardcopy, mainly so I can see inclusion of poems throughout the issue, and then all the pages at once, and also backtrack now and has stood by the printer, in sickness and in health then ;-)) before changing the files, which were pro- (the printer’s, I mean!), to ensure that the pages are cessed as necessary during this time to ensure that all there and that the whole package is shipped off edits weren’t generating errors. Then began the ex- to the printer (the company, I mean!). change of files with the authors: the editorial dance, As I said near the beginning, recognition re- as it were. Again, all went very smoothly, and so quires an occasion and someone to instigate it. That here we have a largish issue of TUGboat which rep- is as true for the launching of a conference effort as resents the written record of most of what went on it is for safe delivery of its printed proceedings. And at this year’s annual meeting in Vancouver, supple- so I must close by thanking these two women for mented this time by a good selection of images and their work in seeing these pages through their fi- additional texts on the conference website. That is, nal days and hours of production, after the editorial what we didn’t have room for, or what is not suit- work had been completed. Thank you. As always. able for print (i.e., images, especially color ones), TUG you’ll find up on the Web. 7 Where to find things on the ’99 I hope you enjoy reading—whether on paper or website on screen — about the work that’s currently engag- Start off at tug.org/tug99/bulletin,theTUG’99 ing the TEX community as it does indeed work its Post-Conference Bulletin, and read the warnings way onto the Web, tangled and tortured though that about this being a photo-intensive and colorful site. route may seem at times. And throughout, I think Scroll down to Click here . . . and take a coffee you will find a renewed and invigorated sense of be- along for the trip. Color monitor recommended. ing on the forefront of developments that affect not Also, please note that the password restriction (in just the math community but all of us. Recycling place to allow TUG members first dibs) has now been source files after our hard-won battles with the code, lifted. and seeing them emerge on the other side, perhaps even in colour (if we’ve done that pre-planning!), introduction /frontpage means our work is going to enjoy a very long life no papers (index) /preprintmap/node1.html matter what the medium. poetry /poems awards /award 6.1 Some final thank-yous /otherawards.html All of the above efforts are for naught unless they The Apocalypse /grimm make it to paper. And all the paper copies are for photos (index) /photoalbum naught unless they’re properly accompanied by the world map /preprintmap/node2.html easily forgotten elements to an issue: the calendar, TFZ logo /tfz the titlepage, the covers, the table of contents, and TUG2000 tug2000.tug.org the final sequencing and assignment of page numbers to all the elements. And one final place to visit: Ensuring that we don’t forget any files, we have /photoalbum/node15.html Barbara, who checks and calmly corrects what has already been checked, and who has seen to it that A real cutie! this editorial has also been checked and edited. Ensuring that we don’t forget any printed ⋄ Christina Thiele pages, we have Mimi Burbank, who has looked at all 15 Wiltshire Circle my nicely formatted final files and then made sug- Nepean K2J 4K9, Ontario Canada [email protected] gestions for improvements to many, has seen to the
Nevertheless, the grunt work is done at SCRI, and for that we are always grateful to Mimi and the people at that site. TUG ’99 Program
TUG ’99 Program TEX Online — Untangling the Web and TEX
Monday, August 16, 1999 TEX and Math on the Web Stephen A. Fulling “TEX and the Web in the Higher Education of the Future: Dreams and Difficulties” Patrick D.F. Ion “MathML: A Key to Math on the Web” Doug Lovell “TEXML: Typesetting XML with TEX” — Break — Paul R. Topping “Using MathType to Create TEX and MathML Equations” Workshop: “LATEXtoXML/MathML”, Eitan Gurari and Sebastian Rahtz — Lunch — Chris Rowley “Models and Languages for Formatted Documents” D.P. Story “AcroTEX: Acrobat and TEXTeamUp” — Break — Panel Discussion: “TEX and Math on the Web”, Stephen A. Fulling (moderator) Workshop: “Using LATEX to Create Quality Interactive PDF Documents for the WWW”, D.P. Story
Tuesday, August 17, 1999 Customizing Document Layout Jean-luc Doumont “Doing it my way: A Lone TEXer in the Real World” Peter Flynn “The vulcan Package: A Repair Patch for LATEX” Vendor Presentations — Break — David Carlisle, Frank Mittelbach and Chris Rowley “New Interfaces for LATEX Class Design: Parts I and II” — Lunch —
Workshop: “Writing Class Files: First Steps”, Michael Doob — Break —
Workshop: “Converting a LATEX 2.09 Style to a LATEX2ε Class”, AnitaZ.Hoover
TUG Business Meeting
160 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting TUG ’99 Program
Wednesday, August 18, 1999 TEX in Publishing Kaveh Bazargan “Multi-Use Documents: The Role of the Publisher” Frederick H. Bartlett “Very like a nail: Typesetting SGML with TEX” Harry Payne “Making a Book from Contributed Papers: Print and Web Versions” — Break — Robert L. Kruse “Managing Large Projects with PreTEX: A Preprocessor for TEX” Arthur Ogawa “Database Publishing with JAVA and TEX” Paul A. Mailhot “Implementing Dynamic Cross-Referencing and PDF with PreTEX” — Lunch — Hu Wang “A Web-Based Submission System for Meeting Abstracts” Petr Sojka “Hyphenation on Demand” Jonathan Fine “Active TEXandtheDOT Input Syntax” — Break — Panel Discussion: “TEX in Publishing”, Kaveh Bazargan (moderator)
Thursday, August 19, 1999 Fonts, Graphics, and New Developments Jean-luc Doumont “Drawing Effective (and Beautiful) Graphs with TEX” Wendy McKay and Ross Moore “Convenient Labelling of Graphics, the WARMreader Way” — Break — Sergey Lesenko and Laurent Siebenmann “Viewing DVI Files with Acrobat Reader — DVIPDF Gives Birth to AcroDVI” Alan Hoenig “MathKit: Alternatives to Computer Modern Mathematics” Vendor Presentations — Lunch — F. Popineau “fpTEX: A teTEX-Based Distribution for Windows” Jeffrey McArthur “Managing TEX Software Development Projects” Timothy Murphy “JAVA and TEX” — Break — Panel Discussion: “The Future of LATEX”, Arthur Ogawa (moderator)
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 161 Stephen A. Fulling
KEYNOTE: TEX and the Web in the Higher Education of the Future: Dreams and Difficulties
Stephen A. Fulling Mathematics Department, Texas A&M University, College Station, Texas, 77843-3368 USA [email protected]
Abstract New technology provides an opportunity to move mathematical and technical education out of the lecture-homework-test mold into modes that do more to develop students’ communication skills, teamwork, attention to quality, and overall responsible, mature behavior. In practice, however, severe and presumably unnecessary obstacles are encountered, mostly connected with the difficulty of transmitting mathematical notation electronically. As I document from personal experience, these obstacles are bad enough for an instructor providing information to students over the Web, and worse when students themselves are expected to communicate mathematically. I describe a partially successful scheme for carrying out peer review of homework papers over e-mail and the Web, resulting in a student-written solutions manual, and I offer a wish list of technical improvements.
In memory of Norman Naugle and James Boone
There are two kinds of people, classified by their encountered because our two most marvelous tools, reactions to new technology. People of the first type TEX and the World Wide Web, do not communicate say, “Now we can do new things — things that were well with each other. impossible before.” The other people say, “Now, I am not a computer professional. I’m a finally, for the first time, we can do the old things university teacher of mathematics, trying to make right!” Tim Berners-Lee, who made this significant contributions to that art while striving conference possible and necessary by creating the to keep a research career alive. This gives me World Wide Web, surely belongs to the first camp. little time to write my own software, and not even Donald Knuth, with his injunction to go forth enough time or facilities to find and evaluate all the and create beautiful books, is perhaps the most software that already exists. It is a great honor successful exemplar of the second mode of thinking. and responsibility to be scheduled as the leadoff The advance of computer technology is having speaker of this Annual Meeting. The professionals profound effects—actual or projected—on under- should regard me as something like Dilbert’s Boss graduate education. Most of the discussion takes (Adams, 1996, for example): I will present you with place in the first mode: laboratories with computer a list of technical demands; some of them may be algebra systems, pedagogical Java applets, distance ridiculously impractical; the rest are those you must education to reach new audiences, greater attention implement to keep academic customers happy in to numerical methods in the content of courses, etc. the next decade. Your job is to distinguish between Personally, I feel more capable of contributing in the two categories and to take action on the good the second mode, using the computer, the Internet, half. I am making an effort not to dwell on any and the Web, mainly as marvelous communication existing partial solutions that I may be aware of, tools that address the discontents of modern mass because I know that my information is incomplete; higher education. Here I shall describe two efforts it is up to the providers of those solutions to speak in this direction, with emphasis on the frustrations I for themselves.
162 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Keynote: TEX and the Web in the Higher Education of the Future
What I want for myself yesterday on an HTML page, including the Greek letters, (or at least tomorrow) integral signs, square roots, built-up fractions, etc., with confidence that they will be displayed Let’s start with the obstacles facing an instructor properly by every standard browser. (“Stan- who wishes to make text material with significant dard” may need to be redefined but the new mathematical content available to students over standard must be implemented!) the Web. 2. The screen appearance of the mathematical ex- For two academic years (from 1996 to 1998) I pressions must be of “typeset quality”. Note: was one of two calculus teachers in an experimental this does not necessarily mandate Computer freshman engineering program integrating calculus, Modern fonts. Indeed, one of the obstacles physics, English, chemistry, and two introductory to widespread acceptance of TEXontheWeb engineering courses into a unified curriculum. (This is that the CM fonts were designed for high- is a product of the Foundation Coalition, a con- resolution devices, and the more slanted ones, sortium of universities and colleges supported by especially, are barely readable on screen except the National Science Foundation.) To meet the at very large magnifications. Some compro- demands of a complete physics sequence in the mise in the design of the math fonts may be freshman year, it was necessary to revise the tradi- necessary and acceptable. tional calculus syllabus by moving large quantities I think that anyone who has tried to type of multivariable material from the third semester mathematics in HTML, even when all the necessary into the first two. No textbook on the market does characters and features (such as superscripts and that! Therefore, I wrote some textbook fragments subscripts) are available—or who has looked at the covering vectors, multiple integrals, Taylor series, syntax of MathML — will agree with the following and a few other topics, at the level and from the requirement: point of view needed. (See the URLs listed at the end of this article.) 3. It must be possible for the author to create As a long-time user of TEX, my first inclination mathematical expressions by typing raw math- was to write everything in TEXandpostthe mode TEXcodeintotheHTML (or XML) file resulting Masterpieces of the Publishing Art onto (inside some SGML-type wrapper, of course). the Web. But that smelled too much of using You’ll note that what I’m calling for here is the computer monitor as a bulky imitation of a “encapsulated TEX” (Murphy, 1991). Indeed, for printed page. In particular, hypertext features many purposes I am perfectly happy with the text would not be available. Also, the screens would not of an HTML document as browsers display it on the have the look and feel to which my Generation X screen and PostScript printers print it out. It is the audience is accustomed—possibly a serious tactical mathematical expressions that are unacceptable at error in winning their confidence and engaging their present. Moreover, the new MathML language in enthusiasm. I went to the opposite extreme: I its raw form is not writable or readable by human wrote the documents in HTML,inthehypertext beings—it would be like writing a text document and bulleted-list style. Whenever possible, the directly in PostScript. A human-oriented language mathematical expressions were written in HTML. is needed for input to MathML. But we have such Whenever I just couldn’t stomach the results, or a a language — it is TEX! The problem is to get the passage was particularly heavy in formulas, I wrote rest of the world to recognize it as the preexisting a patch of TEX and linked the .dvi file to the standard that it is. higher-level HTML page. (Since not every browser This leads back to a question that I’m sure the students were likely to use was configured to many of you wanted to ask earlier: Why don’t I handle .dvi files, PDF versions had to be supplied as use a program like latex2html? In part, that is a well.) The results still have a somewhat amateurish personal prejudice; my natural language is plain, appearance because the TEX (and the graphics) with an accumulation of personal macros, and I are not properly integrated into the HTML pages. would need to translate my files into LATEXtouse (With considerably more work and self-education I such programs. But I refuse to make that effort could have done better, but I did not have the time because, in any event, I can’t accept the results as or the expertise.) a permanent solution: But the world shouldn’t be like this! 4. The mathematical expressions must resize 1. Non-negotiable demand: It must be possi- properly when the user changes the font size in ble to place all standard mathematical symbols the browser.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 163 Stephen A. Fulling
5. The mathematics must reside in the HTML file, – sentence structure and punctuation not in dozens of auxiliary files. – brevity and style 6. The system must not rely on an engine located – grammar and spelling on a remote third-party machine. (This seems – neatness to rule out MINSE, for example.) After several years of working purely with pa- So, that is what I want for myself. And I want per, this system moved to e-mail in 1996. I expected it yesterday—because the curriculum pilot project that keeping the reviewers’ reports electronic would is now over. As usually happens, something much cause a huge increase in efficiency, and for me it did; less ambitious was mainstreamed by my institution; however, I soon faced a revolt from the students, the course materials produced for the experimental who said they were spending too much time on key- classes remain on the hard disk, as a monument to board busywork instead of learning math, and from ... something. the grader, who found the chaotic e-mail files harder to work with than paper. Then a miracle happened: What I want for my students today a student in the class, Justin M. Sadowski, showed (or at least next year) up in my office with a C++ program to automate the mailing of reports. This program operates on I frequently teach courses for engineering and sci- the departmental UNIX systems where our students ence majors in their last two undergraduate years, have accounts. A menu guides the student reviewer covering topics in mathematics such as linear al- to type in the account names of the student authors, gebra and Fourier series. I have concluded that, and the program spawns sessions of the pico editor even at this advanced level, our standard teaching to write the reviews and a summary report to the practices do very little to develop habits of mature, instructor into template files. At the end it mails independent, professional behavior. This is not the the messages to the authors, grader, and me. place for a harangue on pedagogical philosophy. Students are strongly urged to make their (See references under “Homework review” below.) homework solutions visible on the Web as well Suffice it to say that we stand at a crossroads: as on paper. The long-range goal is an on-line Educators can use computers to restore literacy, or solutions manual, recreated each semester by a new to drive the last nails into its coffin. Freedom from class. But since these are courses in mathematics, paper is not the same thing as the death of the not computer usage, the system must run on the alphabet. students’ preexisting computer skills, which vary One of my attempts to address this situation widely. At present I estimate that about: is a drastic change in the handling of homework – 50% of the papers are still hand-written assignments. Instead of requiring every student to – 20% come from a word processor but are not turn in every exercise on the list, I assign each stu- Web-displayed dent only one or two problems per week. But these – 20% appear on-line, either in ASCII text or in problems must be taken seriously: The instruction HTML, the latter usually converted either from is to produce a carefully written, complete, formal Microsoft Word, or from Maple (which all our solution, comparable to a worked-out example in a students learn as freshmen) textbook . The class, collectively, solves most of the – 10% are written in TEX problems, thereby producing a “solutions manual”. These divisions are moving up (in the higher-tech This is a class project, their legacy to the students direction) every year. of future semesters. All the papers turned in on Why don’t more students use T X? Well, one one exercise are given to a team of two or three E of my students wrote: students for peer review. Subject to quality control by the paid grader and myself, they write a brief I spend enough time trying to get my programs critique of each paper and choose the best one for to compile. I don’t want to have to debug my “publication”. The criteria for evaluating papers [homework] papers!! include presentation as well as correct content; in They are not necessarily encouraged by their senior descending order of importance, they are: mentors, either. Although TEX has become the – mathematical correctness indispensable standard in academic mathematics – completeness and physics, the situation is different in engineering, – organization where most of my students come from. Two years – pedagogical effectiveness ago in a discussion at a conference on engineering
164 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Keynote: TEX and the Web in the Higher Education of the Future
education I suggested that we encourage the use of heroic measures are necessary to override TEX’s TEX, and the reaction of an engineering professor default behavior of reformatting paragraphs and was, “Surely you’re not still using that old thing!” discarding all the redundant white space. But all He couldn’t believe that mathematicians hadn’t all that is really necessary for most purposes is the switched to Microsoft Word. The fact that most math mode of TEX. Web users still can’t even read TEX output is 12. We must create an environment that makes another indication that we have a problem: it worthwhile for every student to learn basic 7. A plug-in or helper application to display TEX math mode syntax. (Learning to create .dvi files should be standard with every Web complete TEX documents, however, may be left browser. as the student’s option.) Instructional material A look at the mathematical documents created for TEX beginners should be reorganized to by my students and, even more, a look at the reflect this priority. reports they have written on their peers’ work, TEX math syntax is a very efficient and logical way shows that serious problems remain. The students of typing in mathematical expressions, far superior face the same obstacles listed earlier in putting to raw SGML-type languages. Hunt-and-peck menu mathematical expressions on the Web or into their systems are arguably easier to learn, but in the long printed papers, compounded by their inexperience. run excruciatingly slow to use. The TEX community So, let us extend the list of demands: simply hasn’t succeeded in getting this message out 8. It must be possible for students to produce to the rest of the world. decent mathematical documents, both in print In those fields where TEX has become the and on the Web. standard method of producing serious documents, its basic math syntax (“pidgin-T X”) has also Universal participation requires that: E become a rough-and-ready way of incorporating 9. The system must be platform-independent. math into e-mail and other informal pure-ASCII (At the very least, it must exist in both UNIX communications. This is another major reason and Windows versions, which communicate why all students should learn this part of TEX. flawlessly with each other.) My student reviewers are supposed to provide 10. The cost to students (in required software constructive criticism of their peers’ papers and to purchases) must be minimal. describe in their top-level reports the most serious 11. The time spent in learning software syntax and systematic mathematical errors they observed. must be minimized. Very often, what they write is something like “You These last two problems will be alleviated if the made a mistake in part (c),” leaving the author and students use the software throughout their under- me wondering what the mistake is. A big part of the graduate careers, not just in one course taught problem is the difficulty of discussing mathematics by an eccentric professor. Unfortunately, “it is with any specificity in an ASCII text message. More a characteristic of those in the [college teaching] generally, professions to resist change unless it is change that 1 13. All students (from the freshman year on) must they tried to start.” be enabled to discuss mathematics easily with How can we minimize software learning time? their teachers and with fellow students, in e- Does it mean giving up TEX? On the contrary, I mail, list servers, chat rooms, etc. Pidgin-TEX return to the notion of encapsulated TEX. I believe should be promoted as a medium for this. that most of T X’s reputation for being hard to E Of course, a more sophisticated system that made it learn and to use is related to high-level document possible for the students to annotate one another’s formatting. Most people first try out T Xonasmall E documents with electronic sticky-notes would be document, such as a job r´esum´e or a letter, that even better. would be trivial to type as a pure ASCII file where the typist has direct control over the placement of every character: the beginner soon discovers that Conclusions From my standpoint as a university teacher, the 1 The original quotation (Childs, et al., 1995, overwhelming issue right now is how to present p. 303) refers to the programming professions, but decently typeset mathematics on the Web. Presum- one of the authors has assured me that the theorem ably, in the near future this means: (1) tools for has broader validity. turning TEX input into MathML output, and (2)
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 165 Stephen A. Fulling
assuring universal availability (on college campuses, http://calclab.math.tamu.edu/ at least) of browsers that can handle MathML. ~fulling/m311/s99/handout.dvi Students should be able to create decent math- for class procedures, and ematical documents of their own and also to discuss http://calclab.math.tamu.edu/ mathematics in e-mail, etc. TEX math mode pro- ~fulling/m311/s99/instruct.txt vides a suitable language for the latter if we promote for operating Sadowski’s homework review genera- it adequately. tor. Samples of the resulting student reports and The academic scientific community laid the solutions can be seen at golden eggs for the exploding computer and soft- ware industries. In that light, the lingering difficulty http://calclab.math.tamu.edu/ of including mathematical notation in Web presen- ~fulling/hwk/311s99/sol2.html tations and in electronic mail amounts to a scandal. and similar URLs (all accessible from the course It is up to the TEX Users Group to: home pages). The rationale for the peer review • solicit its membership’s contributions, at all system is described in technical levels, toward solving this problem http://calclab.math.tamu.edu/ • exert pressure on commercial developers (and ~fulling/iawpaper.ps on agencies such as the W3 Consortium) to (written at an early stage when all the communica- steer developments in directions that meet tion was still on paper). These ideas were heavily these needs2 influenced by the collections in Connolly and Vilardi • continually keep its membership and the rest (1989) and Sterrett (1990). of the academic community informed of what General issues. Primarily to introduce students to options are or will be available T X and to the problems of mathematical electronic • E keep TEX in the mainstream of electronic communication, I have set up three sets of links: communications, so that the rest of the world http://calclab.math.tamu.edu/ does not come to regard it as a relic of the ’80s ~fulling/xxx.html where xxx = webmath, webtex,andviewers. Some links in lieu of references webmath lists on-line references on the problem of putting mathematical notation into Web pages Mathematics in the Foundation Coalition and e-mail, with links to proposed solutions, includ- (FC). A detailed report on the FC freshman ing MINSE, TTH, Techexplorer,andlatex2html. calculus program at Texas A&M University is at webtex provides information about TEX, in- http://www.math.tamu.edu/~fulling/fc/ cluding its derivatives with (allegedly) simpler input A summary of its goals and practices appears as interfaces, such as StarTex and Scientific Word.It has links to CTAN and TUG’s home page. http://www.math.tamu.edu/ viewers deals with viewers for .dvi files, which ~barrow/aseepaper.html people who do not aspire to write TEXthemselves The Web pages for students to read in place of nevertheless need to read TEX documents placed textbook sections can be reached directly as on-line in the most compact and convenient format. http://www.math.tamu.edu/ ~fulling/coalweb/xxx.htm References where xxx = vectors, cenmon, diffs,ortaylor, Adams, S. The Dilbert Principle. Harper–Collins, for instance; all are accessible from the course New York, 1996. syllabus pages within the report. Childs, B., D. Dunn, and W. Lively. Teaching CS/1 Homework review. The home pages for my courses in a literate manner. TUGboat, 16(3), sections of Mathematics 311 are at 300–309 (1995). Connolly, P. and T. Vilardi, eds. Writing to Learn http://calclab.math.tamu.edu/ Mathematics and Science. Teachers College Press, ~fulling/m311/ New York, 1989. Instructions for the students are Murphy, T. PostScript, QuickDraw,TEX. TUG- boat, 12(1), 64–65 (1991). 2 I reiterate that I have avoided positive remarks Sterrett, A., ed. Using Writing to Teach Mathemat- on various products because they would inevitably ics (MAA Notes No. 16). Mathematical Associa- have been incomplete and unfair. tion of America, Washington, 1990.
166 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting MathML: A Key to Math on the Web
Patrick D.F. Ion Mathematical Reviews P. O. Box 8604 Ann Arbor, MI 48107, USA [email protected]
Abstract As the Web gains in importance and as the needs of mathematical formalism on the Web are beginning to be met by MathML, it is an opportune time to reflect on the design decisions made by the W3C Math Working Group that resulted in the verbose markup language for transport of math on the Web that MathML turns out to be. The TEX community need not be frightened by the advent of MathML but may learn to work in the Web environment it provides. The presentation will describe some of the key aspects of MathML and ex- plain how it has begun to be used (by August 1999 the support for MathML may be expected to be much greater in a practical sense than it is now). Expected future developments and roles will be commented upon.
Introduction The presenter, Ernst Schr¨oder, an algebraist and logician from Karlsruhe5 began his talk by say- The Math Working Group1 of the World Wide Web ing that if there were any topic that really belonged Consortium2 certainly believes that MathML,Math at an International Conference of Mathematicians, Markup Language (Ion and Miner, eds., 1997), is a then it was pasigraphy. He was sure that pasigra- key to math on the Web. It is a protocol for trans- phy would take its rightful place on the agenda of all ferring mathematical knowledge, and for building succeeding such conferences. Perhaps it is needless tools to manipulate it, which shows great promise. to say it did not. There are already several implementations and tools Schr¨oder then went on to disagree with the dis- basedonMathML. Maybe its greatest strength is tinguished chair of the session, G. Peano,6 by saying that MathML is a specification which is open to view that he did not think that Leibniz’s problem of pro- and has been adopted as a Recommendation by the viding an algebra universalis, a symbolic calculus for World Wide Web Consortium (W3C). mathematics, had been solved. Peano (1858–1932) This contribution will discuss some of the con- had just begun to publish, in 1894, his four-volume text in which MathML has been developed; some of treatise intended to provide just that. It is here, the design decisions that went into it will be illus- in fact, that Peano’s axioms for the natural num- trated in the live presentation. bers are to be found, along with axiomatizations and History highly symbolic representations for much of arith- metic, algebra, geometry and calculus. In 1897, at the First International Congress of Math- Schr¨oder offered some of his own considerations ematicians in Z¨urich (Rudio, 1898) there were not on the topic of universal symbolics for math as part 3 ¨ many talks. Striking among their titles is “Uber of logic. He favored a system, near that of C.S. Pasigraphie, ihren gegenw¨artigen Zustand und die 4 Peirce (1867), using eighteen special symbols. pasigraphische Bewegung in Italien”. It is clear that the progress of science in general, and mathematics in particular, depends on there being a representation of its findings external to 1 http://www.w3c.org/Math, co-chairs: Angel L. Diaz (1998–2000), Patrick D. F. Ion (1997–2000), Robert L. Miner (1997–1998). ternational language using characters (as mathematical sym- 2 http://www.w3c.org. bols) instead of words to express ideas. 3 I thank R. Keith Dennis for showing me his copy of the 5 Perhaps best known today from the so-called Schr¨oder- proceedings in 1997. Bernstein theorem. 4 “Pasigraphy, its present state and the pasigraphic move- 6 The Italian mathematician who can be considered leader ment in Italy” (Schr¨oder, 1898). Pasigraphy is an artificial in- of the pasigraphic movement in Italy.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 167 Patrick D.F. Ion
individuals. In this way the evolving knowledge can The W3C forms working groups (WG) to study be shared. The standard reference work on the his- and produce recommendations on subjects concern- tory of mathematical notation is by Florian Cajori ing the Web. These range from HTML, HyperText (1928/29). Markup Language (Raggett and Jacobs, eds., 1998), The ontology of mathematics has changed over the basic markup language for the Web, to PICS,the the centuries, so what the objects of mathematics Protocol for Internet Content Specification (World are is by no means a given. Very roughly speaking, Wide Web Consortium, 1997), which allows people and seen from the present day, for the early Greeks to control access to pages based on content ratings. math was geometry. Then came along a revolution A list of all the concerns of the W3C is available at started by Descartes who put forward a successful their main Web site mentioned above. algebraic form of geometry through the introduction Particularly relevant to the present subject are of coordinates. Throughout this time what num- the working groups centered around XML, an acro- bers were was really not up for discussion although nym from eXtensible Markup Language (Bray etal., there were, for instance, differences drawn between 1998) and XSL/CSS (eXtensible Style Language and rational and irrational numbers, and later between Cascading Style Sheets (Deach, 1999; Bos and Lie, algebraic and transcendental numbers. At the end eds., 1999; Bos et al., 1998). of the last century and the beginning of this, a new Members of the W3C may request representa- view held that all math had to be founded upon set tion on any WG of interest to them (one represen- theory and logic. And now that is seen as not all tative and possibly one alternate at most on a WG, that satisfactory, so that categories, toposes, or the together entitled to one vote in deliberations) and new developments of mathematical logic are viewed there may be invited experts from outside the W3C as fundamental.7 in a WG, as deemed appropriate. The W3C tries to These remarks have been made because there work by consensus as much as possible. The lim- is an obvious sense in which the W3C Math Work- itations on voting are there so that problems will ing Group is just trying to create a modern form of be sorted out on technical merit rather than being universal symbolic language. It is not intrinsically subject to gross commercial pressures. There are an easy task. guidelines and procedures, including voting if nec- essary, developed by the WG that deals with that The W3C — World Wide Web Consortium aspect of the W3C. The advantage of MathML mentioned above, that SGML, HTML and XML it is a public specification and a recommendation adopted by the W3C, stems from the sort of organ- The specification for which the W3C is best known isation the W3C is. is HTML, presently at version 4.0. This was intro- The W3C is a consortium of some 340 or so duced by Berners-Lee along with the linking and organizational members,8 mostly commercial enti- transfer protocols of HTTP, Hypertext Transfer pro- ties. They include big computer and software com- tocol (Fielding et al., 1998). In the beginning this panies—such as IBM, Hewlett-Packard, Sun, Mi- was a language developed for a specific purpose, crosoft and Netscape, as well as smaller ones—and and not necessarily consonant with any other stan- others from, for instance, the big aircraft or electric- dards. However, it was realised that a few changes ity industries — Boeing and Electricit´e de France, to to HTML would make it a markup language obey- name two. In general, the W3C is an organization ing the principles of SGML, Standard Generalized joined by those who wish to understand and influ- Markup Language (International Standards Orga- ence the development of the protocols and mecha- nization, 1986; van Herwijnen, 1994). SGML is an nisms that control the functioning of the Web. The ISO international standard that describes a way of W3C Director is Tim Berners-Lee, generally recog- writing down document markup. It provides a very nized as the inventor of the Web. The W3C is fully general framework, both verbose and complicated to international in scope, with main offices in Cam- use. There has been a lobby for some years trying bridge, Massachusetts, USA (MITLCS), in Grenoble, to push SGML standardization as a highly desirable France (INRIA) and in Tokyo, Japan (Keio U); it re- means for publishing production, which facilitates ceives additional support from both DARPA and the document re-use amongst other things. European Commission. However, the difficulties in using SGML stan- 7 dards in practice meant that it was unthinkable that See the potted history in, say, a book by Godement (1998). those standards be taken straight over to the Web. 8 http://www.w3.org/Consortium/Member/List True, SGML adoption would have provided a great
168 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting MathML:AKeytoMathontheWeb wealth of possibilities for Web publishing. But a to. Many simplifications have come out of years of Web browser would essentially incorporate a full experience with SGML. SGML parser and, worse still, it would have to deal Again of great importance, perhaps more than with the DSSSL, Document Style Semantics and its technical quality, is that XML is a Recommenda- Specification Language (10179:1996(E), 1996) for ac- tion of the W3C. The corporate members of W3C tual display. will support its place on the Web. Browsers are To use SGML strictly, every document should being made with with XML support (e.g., by both be parsed against an explicit DTD (Document Type Microsoft and Netscape). Tools, such as editors, are Definition) to be sure that it is a correct instance of being produced to enable XML document creation. some general class of documents and markup. This Associated specifications such as XSL and CSS for would require a level of rigor in SGML Web page formatting (for an overview, see W3C Style, 1999), programming in clear conflict with the use, and use- a DOM, Document Object Model (Apparao et al., fulness, of the Web. An SGML Web, it was feared, 1998) for the Web, the RDF, Resource Description would rapidly age. Pages might not be produced Framework (Lassila and Swick, eds., 1999) model because it would be too difficult to provide the level for providing metadata,9 as well as specifications for of correctness required for complex SGML browsers name spaces, linking, database queries and many to function; also, standards would be difficult to ob- other ancillary developments are underway at the serve in writing pages and in writing software, and W3C, using XML. The fundamental HTML is be- the increase of entropy toward a chaotic breakdown ing redone as XHTML (W3C HTMLWorking Group, of the Web would be rapid. Thus a cut-down, or 1999). simplified, version of SGML began to be developed for use with the World Wide Web, under the aegis The W3C Math Working Group of the W3C. It has been rather paradoxical that the mathemati- XML is a restricted form of SGML as a spec- cal formulae of science have been so difficult to rep- ification for markup languages. The most obvious resent on the Web. Tim Berners-Lee, after all, was thing unchanged is the tagging structure. As in a scientist at CERN, a massive international center SGML, elements of the document are marked up of physics in Geneva, when he came up with what with tags, to form phrases like
. confusion over math support following HTML 3.2 The next important matter for the Web is that (Raggett, 1997). Books appeared with sections ex- the language design is intended to allow the process- plaining simple extensions to HTML for math that ing, at least to some reasonable level, of pages with- were little more than suggestions how something out a declared DTD.AnXML document can be well might be done. The extensions were not in any formed without being an instance of some DTD,al- Recommendation accepted by the W3C. Recommen- though being well formed essentially means it would dations are issued only after a long careful review be possible to construct a DTD which has the doc- process, including a period of six weeks during which ument as a valid instance. So a browser which can parse XML can be allowed to process a free-form 9 Also of importance to math on the Web, especially in page, if XML syntax is respected. education, and for which there is an interest group. 10 There are many other technical differences from IBM, Hewlett-Packard, Adobe, Elsevier Science, Wol- fram Research, Maplesoft, SoftQuad, . . . SGML which allow XML parsers to be much easier 11 American Mathematical Society, Geometry Center, Stilo Technologies, (and later Design Science), . . .
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 169 Patrick D.F. Ion
W3C members may vote for or against acceptance. was realized that general acceptance of a new math They normally follow several working drafts, which specification would happen only if it embedded eas- are made available for public comment. It is a very ily into the technology of the internet, which was open process. If there is enough support from the coming to be dominated by XML and its relatives. W3C members then a draft is put forward as a Rec- MathML 1.0 is written as an XML application, ommendation. The math suggestions were dropped one of the first at more than a toy level. The Math because the matter needed more careful considera- WG wanted to come up with something that re- tion. But this shows the level of interest which was ally met the goal of facilitating the use of math on latent in the community. the Web. The Math WG’s original far-reaching objectives Many on the WG had considerable experience were listed as follows: with TEX and could consider it as a natural para- 1. is suitable for teaching and scientific communi- digm for a math language for the Web. IBM too, cation; drawing on its extensive experience with Scratch- 2. is easy to learn and to edit by hand for ba- pad and then Axiom, could have itself proposed a sic math notation, such as arithmetic, polyno- language for math. Similarly, Wolfram Research mials and rational functions, trigonometric ex- had Mathematica, a very rich language for express- pressions, univariate calculus, sequences and se- ing math in ASCII characters suitable for easy Web ries, and simple matrices; transmission. But there are disadvantages to such foundations 3. is well suited to template and other math edit- for math on the Web. A specification to be gener- ing techniques; ally used for math communication should be pub- 4. insofar as possible, allows conversion to and licly developed and not proprietary. Math is almost from other math formats, both presentational entirely treated as public property: one cannot, in and semantic, such as TEX and computer al- principle, patent mathematical facts. Agreement is gebra systems. Output formats may include needed from a broad spectrum of interested parties graphical displays, speech synthesizers, comp- that the language provided is expressive enough for uter algebra systems input, other math layout their purposes, for instance, from several symbolic languages such as T X, plain text displays (e.g., E algebra systems. Finally, a specification is more VT100 emulators), and print media, including likely to be deployed if those who will use it have braille. It is recognized that conversion to and been involved in its development. It is also more from other notational systems or media may likely to be realistic if people who will implement it lose information in the process; have contributed to its development. 5. allows the passing of information intended for So the Math WG decided to fall in line with specific renderers; the evolving standards of the Web. This was a deci- 6. supportes efficient browsing for lengthy expres- sion with many implications for our later work and sions; not that easy to take. The primary goal was to cre- 7. provides for extensibility, for example, through ate something that would be powerful, usable and contexts, macros, new rendering schemas or new adopted. And this aim has continued to drive the symbols; some extensions may necessitate the WG throughout. use of new renderers. Input considerations The above goals were endorsed by an earlier group meeting in October 1996 at the Boston W3C. Plainly Next the WG set about making an XML application. they have not all been accomplished yet but much This had the unfortunate corollary that the markup progress has been made. Point 2, ease of learning developed would not directly meet the need for an and hand editing, at least, cannot be achieved di- easy input syntax for math on Web pages, not even rectly with a satisfactorily general and expressive for simple math. markup language. After heavy discussion, it was eventually real- Many ideas were considered early on by the ized that the question of input styles was not one WG.TEX is clearly important for communication of that could be solved initially. There are too many mathematics nowadays. Why not just extend HTML different communities of users of mathematical with a TEX-like syntax? That turned out not to be formulas to satisfy them all. Making a lower-level simple at all. Finally, after much discussion, the WG language, which input mechanisms could write, would decided that it would develop a markup language encourage development of tools which could be of which accorded with XML. The reason is that it real help to the many not served by something as
170 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting MathML:AKeytoMathontheWeb complicated as TEXorLATEX sources. The keyboard MathML presentation coding for equation (1) input could be accepted by applications tailored to First, note that the punctuational period is not en- Conversion of legacy documents coded here; this is intentional. The period, and its The WG chose a layered approach to facilitating space away from the equation, should be part of the math on the Web. Once an expressive transport pro- overall document, not part of the math. MathML tocol is agreed upon then developers can set about is for the math and not for the rest of the docu- making their own compatible tools. In particular, ment; TEX is a system that can handle both but one can make converters of legacy documents into then tends to mix contexts, as here. TEX is also mis- editions using MathML, especially converters of ma- used when math mode is employed for special non- terial encoded in TEX. Conversion will never be mathematical layouts, just as HTML table mecha- completely automatic but there are several efforts nisms are employed to do math. People should be underway to provide converters. For instance, two inventive but there is more to the semantics of the 13 talks here will discuss the problems (Gurarie and situation than just the displayed form. Rahtz, 1999; Lovell, 1999), and the American Math- A mathematical expression is enclosed within ematical Society, Society for Industrial and Applied top-level tagging with
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 171 Patrick D.F. Ion
are thinking of digit strings in some ordinary script. square root sign. MathML provides many schemas
172 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting MathML:AKeytoMathontheWeb applied are
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 173 Patrick D.F. Ion
details, as usual. Examples of formulas can be dis- Specification”. Technical report, World Wide cussed much faster when speaking in front of over- Web Consortium, http://www.w3c.org/TR/ heads than on the printed page. REC-DOM-Level-1-19981001, 1998. The latest The WG is issuing a corrected version, Math- version of DOM1 is available at http://www. ML 1.01, in July 1999 and will provide a major revi- w3c.org/TR/REC-DOM-Level-1. sion and extension, MathML 2.0, by February 2000. Bos, B., H. W. Lie, C. Lilley, and I. Jacobs, Better integration with all the new standards from eds. “CSS, level 2 Recommendation”. Techni- W3C will be part of version 2.0 and extensibility cal report, World Wide Web Consortium, http: issues will be further addressed. //www.w3c.org/TR/REC-CSS2-19990512, 1998. In the meantime, the promising new develop- The latest version of CSS2 is available at http: ment is that the largest browser companies are be- //www.w3c.org/TR/REC-CSS2. ginning to implement MathML rendering in an es- Bos,B.andH.W.Lie,eds.“CSS, level 1 sentially native way. Up to the present the best dis- Recommendation”. Technical report, World play of MathML with a browser has been with the Wide Web Consortium, http://www.w3c.org/ Java plug-in WebEQ (which also has an associated TR/REC-CSS1-19990111, 1999. The latest ver- equation editor), or with the W3C’s testbed browser sion of CSS1 is available at http://www.w3c. Amaya. The Math WG is also working on providing org/TR/REC-CSS1. a test suite to verify compliance with the specifica- Bray, T., J. Paoli, and C.M. Sperberg-McQueen, tion issued and to help those who build tools using eds. “Extensible Markup Language (XML) 1.0.”. MathML. There is a great deal of activity of Math- 1998. The latest version of XML is available at ML development in process. http://www.w3c.org/TR/REC-xml. Cajori, Florian. A History of Mathematical Nota- Acknowledgements tion. Open Court Publishing, La Salle, Illinois, The activity of a W3C Working Group is very much 1928/29. 2 vols. Reprinted (Cajori, 1993). a collaborative one. Aside from the numerous peo- Cajori, Florian. A History of Mathematical Nota- ple who have expressed their views on how math tion. Dover, New York, 1993. 2 vols. printed to- should be on the Web, I would like very much to gether. thank all the members of the W3C Math WG — Deach, Stephen, ed. “Extensible Stylesheet Lan- Stephen Buswell, David Carlisle, St´ephane Dalmas, guage (XSL) Specification”. Technical report, Stan Devitt, Ben Hinkle, Angel D´ıaz, Sam Dooley, World Wide Web Consortium, http://www. Stephen Hunt, Roger Hunter, Doug Lovell, Robert w3c.org/TR/1999/WD-xsl-19990421/, 1999. Miner, Barry MacKichan, Ivor Philips, Nico Pop- The latest version of the XSL Working Draft is pelier, T.V. Raman, Dave Raggett, Murray Sargent available at http://www.w3c.org/TR/WD-xsl. III, Neil Soiffer, Paul Topping, Stephen Watt, the DSSSL. Document Style Semantics and Specification current members, and the past members, Stephen Language (DSSSL). ISO/IEC 10179:1996(E), Glim, Brenda Hunt, Arnaud Le Hors, Bruce Smith, 1996. Available at ftp://ftp.ornl.gov/pub/ Robert Sutor, Ron Whitney, Lauren Wood, Ka-Ping sgml/WG8/DSSSL/. Yee, Ralph Youngen, for the pleasure and stimu- Fielding, R., J. Gettys, J. Mogul, H. F. Nielsen, lation of working with them. Together they have T. Berners-Lee, et al.. “HTTP Version 1.1”. Tech- achieved something very worthwhile. nical report, Internet Engineering Task Force In addition, we are all very grateful to Barbara (IETF), 1998. Beeton for her stalwart efforts in trying to get the Godement, Roger. Analyse Math´ematique I. characters of math into Unicode, and thus onto the Springer, Berlin, Heidelberg and New York etc., Web. And I am grateful to the American Mathema- 1998. tical Society for supporting my efforts here, and to Gurarie, Eitan and S. Rahtz. “LATEXtoXML/Math- the IHES for superlative conditions during a long ML (Workshop)”. TUGboat 20(3), 1999. (See visit there. elsewhere in these Proceedings.). Ion, Patrick D.F. and R. L. Miner, eds. “Math- References ematical Markup Language (MathML)1.0 Apparao, V., S. Byrne, M. Champion, S. Isaacs, Specification”. Technical report, World Wide I. Jacobs, A. L. Hors, G. Nicol, J. Ro- Web Consortium, http://www.w3c.org/TR/ bie, R. Sutor, C. Wilson, and L. Wood, REC-MathML-19970430, 1997. The latest ver- eds. “Document Object Model (DOM) Level 1 sion of MathML is available at http://www.w3c. org/TR/REC-MathML.
174 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting MathML:AKeytoMathontheWeb
ISO 8879:1986. Information Processing — Text and Rudio, Ferdinand, 1856-1929., editor. Verhandlun- Office Systems — Standard Generalized Markup gen des ersten internationalen Mathematiker- Language (SGML). International Standards Kongresses in Z¨urich vom 9. bis 11. August 1897 Organization, ftp://ftp.ornl.gov/pub/sgml/ (Congr`es International des Math´ematiciens; WG8/DSSSL/, 1986. Consult http://www.iso. International Congress of Mathematicians), ch/cate/d16387.html for information about Leipzig. B.G. Teubner, 1898. the standard. Schr¨oder, (Friedrich Wilhelm Karl) Ernst, 1841- Lassila, O. and R. Swick, eds. “Resource De- 1902. “Uber¨ Pasigraphie, ihren gegenw¨artigen scription Framework (RDF)ModelandSyn- Zustand und die pasigraphische Bewegung in tax Specification”. Technical report, World Italien”. In Verhandlungen des ersten interna- Wide Web Consortium, http://www.w3c.org/ tionalen Mathematiker-Kongresses, pages 147– TR/REC-rdf-syntax-19990222, 1999. The lat- 162. 1898. est version of RDF is available at http://www. Sutor, Robert S. and S. S. Dooley. “TEXand w3c.org/TR/REC-rdf-syntax. LATEX on the Web via IBM techexplorer”. TUG- Lovell, Doug. “TEXML brings TEX to Web Future”. boat 19(2), 157–161, 1998. TUGboat 20(3), 1999. (See elsewhere in these Topping, Paul R. “Using MathType to Create TEX Proceedings.). and MathML Equations”. TUGboat 20(3), 1999. Mollin, R.A. “Prime-producing quadratics”. Amer. (See elsewhere in these Proceedings.). Math. Monthly 104(6), 529–544, 1997. MR Unicode. The Unicode Standard: Version 2.0. 98h:11113. The Unicode Consortium, 1996. The specifi- Peirce, C.S. “Description of a Notation for the Logic cation also takes into consideration the corri- of Relatives”. Memoirs Acad. of Arts and Sci- genda at http://www.unicode.org/unicode/ ences (Cambridge and Boston; N.S.) IX, 317– uni2errata/bidi.htm. For more information, 378, 1867. consult the Unicode Consortium’s home page at PICS. “Platform for Internet Content Selection http://www.unicode.org/. (PICS)”. http://www.w3c.org, 1997. For more van Herwijnen, Eric. Practical SGML.KluwerAca- information see http://www.w3.org/PICS/. demic Publishers Group, Norwell and Dordrecht, Raggett, D., A. Le Hors and I. Jacobs, eds. 1994. “HyperText Markup Language (HTML)4.0 W3C HTML Working Group. “XHTML 1.0: The Ex- Specification”. Technical report, World Wide tensible HyperText Markup Language; A Refor- Web Consortium, http://www.w3c.org/TR/ mulation of HTML 4.0 in XML 1.0”. Technical REC-html40-19980424, 1998. The latest version report, World Wide Web Consortium, http:// of HTML 4.0 is available at http://www.w3c. www.w3.org/TR/1999/xhtml1-19990505, 1999. org/TR/REC-html40. This is a Working Draft; the latest version of Raggett, D., ed. “HTML 3.2 Reference Specifica- XHTML is available at http://www.w3c.org/ tion”. Technical report, World Wide Web Con- TR/xhtml1. sortium, http://www.w3c.org/TR/REC-html32, W3C Style. “Home page of the W3C Style Ac- 1997. The latest version, HTML 4.0, is available tivity”. http://www.w3.org/Style/Activity , at http://www.w3c.org/TR/REC-html40. 1999. This includes XSL and CSS work.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 175 TEXML: Typesetting XML with TEX
Douglas Lovell IBM Research P.O. Box 704 Yorktown Heights, NY 10598 [email protected]
Abstract XML, eXtensible Markup Language, is a simplified subset of SGML, which is fast becoming a standard for content management on the internet. TEXML is an XML vocabulary for TEX. A processor written in the Java programming language translates TEXML-conforming XML into TEX. The pro- cessor provides a document formatting solution for XML that leverages the rich knowledge and capability built over many years in TEX. This describes the TEXML document format and the processor, TEXMLatt`e, that produces TEX source from TEXML markup.
XML : The future of the Web content markup transformation The World Wide Web is moving toward a future in vocabulary standard for presentation ✥ ❵ which XML,notHTML, is the primary medium for ✥✥✥ SGML ❵❵❵ storing and delivering documents and data. HTML DSSSL ❄ HTML is presentation markup for browsers. To- ❄ ❄ ✥✥✥ XML ❤❤❤ gether with Cascading Style Sheets (CSS)itspecifies XHTML✥ ❤ XSL the type sizes and fonts, and layout of a document. XML is a standard for defining and sharing data on the Web, including Web content. Users of XML Figure 1: XML heritage may define vocabularies — sets of tags or element names, such as “title,” “section,” “citation” and setting. XSL is influenced by the DSSSL standard attributes such as “id”—to identify elements and (ISO/IEC, 1996). structures in a document. With XML, organiza- DSSSL is an ISO standard for typesetting SGML. tions can develop markup which captures the struc- The XSL effort began as a means for transforming tural and semantic properties of their documents XML markup into standard presentation markup, < > and data. Instead of writing, H1 Typesetting just as DSSSL did for SGML. A goal for XSL was to XML we can write,
176 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting TEXML: Typesetting XML with TEX
Figure 2: XML markup describing an artist
Figure 2 shows a sample of some XML markup The missing piece is a small implementation of describing an artist, taken from an application for TEX in the Java programming language which can an art museum. For more about XML see IBM/XML be plugged into a browser or executed on a server; (1999) and W3C (1999). however, much of LATEX can be displayed by IBM techexplorer (Sutor and D´ıaz, 1998). What is T XML? E The enabling technologies now available are TEX- TEXML is an XML vocabulary for representing TEX ML and XSL, with platform-specific implementations source. It is a medium for transforming any XML of TEX or with the IBM techexplorer plug-in for an data into a document which can be typeset with HTML browser. the TEX program. TEXML represents the TEXcom- mands, control symbols, and \specialsasXML el- XSL Tree Construction. The transformation part ements. Figure 3 shows the markup of Figure 2 rep- of XSL has been moving rapidly toward completion resented in TEXML. as a W3C recommendation. There are many imple- The potential for TEXandXML mentations of the XSL transformation rules. In the process, people have found uses for XSL transforms XML makes TEX a potential universal typesetting far beyond producing typeset documents. back-end for markup in a way it never achieved for There are numerous web servers busily trans- SGML. There is an opportunity for T X to marry E forming XML into HTML using XSL style sheets. Mi- itself to XML, the future of the WWW, and become crosoft is supplying the transform as a dynamic link a predominant technology for typesetting. library in their operating system. The Microsoft In- T X has many advantages. It has an estab- E ternet Explorer Web browser (v.5) can accept XML lished, stable implementation and user base. It has data, apply an associated XSL transform, and dis- a rich body of knowledge and experience captured play the result. Organizations use XSL style sheets in its macro packages and styles. It can typeset just to transform electronic data from another organiza- about anything. tion into a form suitable for internal use.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 177 Douglas Lovell
Figure 3:TEXML markup derived from the XML of Figure 2
178 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting TEXML: Typesetting XML with TEX
Figure 4: XSL markup to transform XML foranartistintoTEXML
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 179 Douglas Lovell
Figure 5:TheDTD for TEXML
XSL has thus become an important tool for XML lator program TEXMLatt`e,writtenintheJavapro- transformation. It is on the verge of becoming a de gramming language. TEXMLatt`e reads a valid TEX- facto standard for XML transformations. ML file and writes a proper TEX file. The TEXML Figure 4 shows an XSL stylesheet that will trans- DTD appears as Figure 5. form the artist XML example in Figure 2 into the The basic template for a TEXML document is: TEXML example in Figure 3. XSL Formatting Objects. The formatting ob-
180 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting TEXML: Typesetting XML with TEX
TEXML TEX description ch attribute output % \%{} escape character esc \ & \&{} begin group bg { { \{ end group eg } } \} math shift mshift $ | $|$ alignment tab align & \ $\backslash$ parameter parm # $ \${} superscript sup ˆ # \#{} subscript sub \_{} tilde tilde ˜ ˆ \char‘\^{} comment comment % ˜ \char‘\~{} < $<$ Figure 7:
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 181 Douglas Lovell
TEXMLatt`e supplies the default values, “begin” An XSL implementation. We have used the Lo- for the begin attribute, and “end” for the end at- tus implementation, which you may download tribute of an
182 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting TEXML: Typesetting XML with TEX
\documentclass{article} \title{Edward Weston} \begin{document} \maketitle \section*{Some biographical information} Edward Weston was born in 1886. Weston lived until 1958. \par Weston was a photographer who pioneered the modern use of photography as an art form in the United States. \section*{Some short quotes from Weston} \begin{enumerate} \item ‘‘My own eyes are no more than scouts on a preliminary search, for the camera’s eye may entirely change my idea.’’ \item ‘‘Ultimately success or failure in photographing people depends on the photographer’s ability to understand his fellow man.’’ \end{enumerate} \section*{A longer quote} \begin{quote} ‘‘One does not think during creative work, any more than one thinks when driving a car. But one has a background of years - learning, unlearning, success, failure, dreaming, thinking, experience, all this - then the moment of creation, the focusing of all into the moment. So I can make ’without thought,’ fifteen carefully considered negatives, one every fifteen minutes, given material with as many possibilities. But there is all the eyes have seen in this life to influence me.’’ \end{quote} \end{document}
Figure 8:TheTEX produced by TEXMLatt`e using the XML of Figure 3
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 183 Using MathType to Create TEX and MathML Equations
Paul Topping Design Science, Inc. 4028 Broadway Long Beach, CA 90803 USA pault@mathtype.com www.mathtype.com
Abstract MathType 4.0 is the latest release of Design Science’s interactive mathematical equation editing software package, the full-featured version of the Equation Editor applet that comes with Microsoft Word. Its completely re-architected translation system should be of particular interest to the TEX and MathML communities. MathType can be used as an aid to learning TEX, as a simpler interface for en- tering equations into a TEX authoring system, or as part of a document conversion scheme for journal and book publishers. After the introduction, a simple translation example is given to show how its Translator Definition Language (TDL) is used to convert a MathType equa- tion to TEX. This is followed by an introduction to MathML and MathType’s MathML translators are discussed. Finally, some of the possibilities for creating translators for special purposes is mentioned, along with discussion on how Math- Type’s translation facilities can be used as component of a more comprehensive document conversion process.
Introduction MathType is an interactive tool for authoring math- ematical material. It runs on Microsoft Windows and Apple Macintosh systems (a Linux implemen- tation is under consideration). Readers may also be familiar with MathType’s junior version, Equa- tion Editor, as it is supplied as part of many personal computer software products, such as Microsoft Word and Corel WordPerfect. Unlike TEX, MathType does not process entire documents. Rather, it is used in conjunction with other products, such as word processors, page layout programs, presentation programs, web/HTML edi- tors, spreadsheets, graphing software, and virtually any other kind of application that allows insertion of a graphical object into its documents. MathType Figure 1:TheMathType Window equations can even be inserted into database fields with most modern database systems! MathType has a simple but powerful direct- the numerator and denominator. The contents of manipulation interface for creating standard math- each slot are filled in by the user by more typing ematical notation. Instead of entering a computer and inserting of templates. The displayed equation language, such as TEX, the MathType user combines is reformatted as the user types and spacing is added simple typing with the insertion of “templates”. For automatically (although spacing may be explicitly example, inserting a fraction template results in a overridden). Some find this interface to be simpler fraction bar with empty slots above and below for than direct TEX input as there are no keywords to
184 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Using MathType to Create TEX and MathML Equations remember and, most importantly, no possibility of controlled by a translator definition file, a text file syntax errors. containing a simple translation rule language that One common complaint heard from TEXusers allows a fragment of the target language to be asso- upon first seeing MathType’s user interface is that ciated with each of MathType’s many symbols and one must use the mouse for everything. Whereas templates. Although the chief motivation for its de- mouse support is an important part of MathType’s velopment was TEX translation, it can also be used user interface (as with virtually all Windows and to convert MathType equations to other languages, Mac applications), MathType 4.0 has keyboard meth- such as MathML and those specified by the math ods for performing all of its commands. Also, users parts of various SGML-based document languages. may assign keystrokes to commands in any kind of MathType is supplied with translator definition mnemonic scheme they care to invent. The ability to files for plain TEX, -TEX, LATEX, -LATEX, A S A S assign a keystroke to an arbitrary math expression and four MathML variations.M These canM be cus- is analogous to TEX’s macro facility. tomized for specific applications or translators for Although a thorough examination of MathType other mathematical languages can be written by start- is beyond the scope of this paper, here are some of ing from scratch. Also, commands are available in its most important features: Microsoft Word that will allow a Word document Any national language characters that the host containing MathType (or Equation Editor) equations • operating system allows may be inserted into to be converted to TEX or any other language sup- math, including Asian characters. ported by a translator. MathType’s translation fa- cilities can be used as an aid to learning T X, as a MathType’s internal representation of charac- E 1 simpler interface for entering equations into a T X • ters is Unicode, extended via its Private Use E authoring system, or as part of a document conver- area to cover more of the characters that ap- sion scheme for journal and book publishers. pear in mathematical notation. We call this MTCode. A user-extendable database of math The MTCode character encoding font-to-MTCode mappings is used to relate char- acters entered to knowledge used in line format- Although the designers of Unicode have attempted ting, as well as translation. to include many mathematical characters, their at- tempt falls somewhat short. In fairness to them, A basic set of mathematical fonts (Roman, incorporating all the characters of the many natu- • Greek, italics, Fraktur, blackboard bold, and ral languages in use in the world must have been an many mathematical symbols) is included. Math- overwhelming task. Type can also make use of any PostScript Type 1 There is an attempt by some in the mathemat- or TrueType font available via the operating ical community to get the Unicode Consortium to system. add the missing mathematical characters to a future MathType also includes a translation system for • version of Unicode. If and when they are successful, converting mathematics entered in its editing we will probably adopt it to replace MTCode. window to virtually any text-based language. MathType uses each character’s MTCode value This translation system is the chief subject of as a key into a database of character information this paper. In particular, it includes translators that, for each character, includes a human-readable for several flavors of TEX and MathML. description, an indicator of its role in mathematical MathType’s translation facilities notation (e.g. variable, binary operator), and infor- mation used in the process of choosing an appropri- MathType has had a TEX translator for many years ate font to render it on screen and printer. Most that allows the user to copy all or part of an ex- importantly, a character’s MTCode value is an in- pression onto the clipboard in the TEX language, dex into tables of translation strings in MathType’s ready to be pasted into a document. However, un- translation system. til version 4.0, it had two important limitations: it We will use the terms MTCode and Unicode could only generate plain TEX and the user had no interchangeably in the remainder of this paper. control over the TEX fragments generated for par- ticular symbols and templates. MathType 4.0 fea- tures a complete re-design of the translator mecha- nism. The translation of a MathType expression is 1 Unicode is a standard for encoding characters. See www. unicode.org for information.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 185 Paul Topping
The translation system from a user’s to the user in the Translators dialog (see Fig. 2) perspective and a longer description which appears in the dialog once the translator is chosen from the The basic scenario for using MathType to aid in the list. The description might include the author’s creation of a T X document is to run it simultane- E name and affiliation as well as the version num- ously with your favorite T X editor. The process is E ber of the translator. this: a set of matching rules of the form MathType • whenever an equation is needed, the thing to match = translation string ; • window is brought to the front; h i h i MathType equations (just like TEX ones) are the equation is created in the MathType win- • represented internally as a tree. Let’s take the fol- dow; lowing equation as an example: the equation is selected and copied to the clip- a + b • y = board, a process which invokes the previously c selected translator; MathType sees this equation as: the TEX editor is brought to the front; • eqn (root) the TEX code for the equation is pasted into the slot (main) • document. character (y) Editing to correct mistakes is performed by re- character (=) template (fraction) versing this process, pasting the TEX code (along slot (numerator) with a comment containing a compressed form of character (a) MathType ’s internal representation) back into a character (+) MathType window, and then repeating the above character (b) process once the corrections have been made. slot (denominator) At the beginning of such a document creation character (c) and editing session, the user must select one of Math- Type’s translators. This is done via the Translators The translation process begins by applying the dialog, which presents a list of all the translators display equation rule,2 to the root of the MathType present on the user’s system in the MathType trans- expression: lators directory (Fig. 2). eqn = "\[@n#@n\]@n"; // display equation
The characters between quotation marks are pro- cessed from left to right. Most of the characters in the eqn rule are simply placed in the output trans- lation stream. The @n sequence outputs a newline. The @ character, called the escape character, is used to insert special characters into the output stream. The default escape character is $, but is redefined in the TEX translators to @ for convenience. The # in the eqn rule causes the translator to look for a rule that will be used to translate the equation’s main slot. After applying this transla- tion (we’ll get to that next) and inserting its output into the translation stream, the rest of the eqn trans- lation string is output and the translation process is Figure 2: The Translators dialog complete. Let’s go back to see how the # in the eqn rule Anatomy of a translator is processed. This is done with the rule: slot/t = "#"; Each translator is defined by a text file written in a simple language called TDL (Translator Definition This rule works just like the eqn rule but is even Language). A translator has a simple structure: simpler. The /t option is used to signal that this The first line defines the short name for the 2 • The translation rules used in our example are simplified translator which appears in the list presented somewhat for the purpose of this paper.
186 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Using MathType to Create TEX and MathML Equations rule is to be used for the top-most slot in the equa- can convert its equations into MathML’s presenta- tion only. Other slot rules enclose the translation tion markup. For example, translating the following of their contents in ,theTEX notation for group- expression into MathML: ing. Now, each item{} in the main slot is processed. The rules for y and = are also very simple: b √b2 4ac − ± − char/0x0079 = "y"; // Latin small letter y 2a char/0x003D = " = "; // Equals sign results in:
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 187 Paul Topping
Math Working Group has made browser integration with the development of custom translators, it one of its highest priorities. • can be used to generate virtually any text-based In order to work with the various MathML math language browser plug-ins, we have had to create several ver- It is our hope that TEX users will want to add it to sions of our MathML translator, one for each. They their arsenal of useful tools. differ only in the “wrapper” code they place around the generated MathML in order to invoke the plug-in and pass the MathML code to it. When true browser integration is possible, we will only need a single MathML translator. Experimenting with MathType’s translators Because of their simple and open structure, Math- Type’s translators can be easily modified or new ones written from scratch. Some possibilities include: creating a new translator for the mathematics • portion of an SGML document standard (DTD) TEXmusings changing an existing TEX translator to make Musings from the Bard • use of some macro package or the author’s own preferred macros Oh, what a tangled web is TEX, using the Unicode capability of MathType’s or so it seems at the outset; • for highest quality, the best to look, translators to take advantage of the various TEX adaptations for non-Ennglish languages Oh why did I choose to typeset my own book! ... Document conversion For Macintosh users the skies are quite blue, with Barry to help you and Ben Salzburg too. MathType’s translation facilities can also be incor- With Art, Ross, and Uwe ready to assist, porated into the process of converting entire docu- just send a short email to Gary Gray’s (Textures) ments to T X or any other language supported by E list. one of its translators. MathType has a programmatic ... interface beside its more familiar graphical user in- If shareware type software is more to your taste, terface. This interface is via functions in a Windows your super-fast Power-Mac need not go to waste. DLL (Dynamically Linked Library). As MathType is There’s CMac- and Direct-T X to lessen the sorrow, often used with Microsoft Word, we have provided E and that great program OzT X, by Andrew Trevor- functionality that can be accessed using commands E row. on a “MathType” menu within the Word application ... itself. One of these is a Convert Equations command For Unix-like platforms the software’s all free, that can be used to convert all the MathType and with a teT X installation from the T X-Live CD, Equation Editor equations in a document to any one E E which collects all the pieces and orders all parts, of the languages for which a MathType translator is Thanks to Thomas Esser, Kaja, and Sebastian Rahtz. available. Although this does not result in a fully ... translated Word document, the most difficult part, Leslie Lamport created LAT X nearly fifteen years converting equations, has been achieved. E ago. MathType’s Word support is written in that It evolved into 2-epsilon by a process rather slow. product’s Visual Basic language. The source code is Improvements are numerous; results you can see. accessible and may be used as the basis for your own Thanks to Carlisle, Mittelbach and Rowley, conversion scheme. It’ll be even better with release LATEX3. Conclusions ... For PostScript Type1 fonts exquisitely drawn, MathType 4.0, with its new translation features, can The expert is Berthold, whose surname is Horn. be used in a variety of ways: Where the yin meets the yang in the great cosmic as an interactive front-end to TEX authoring goo, • as an aid to learning TEX Don’t get this name mixed-up with Louis Vosloo. • to experiment with the new MathML standard —Ross Moore •
188 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Models and Languages for Formatted Documents
Chris Rowley The Open University 527 Finchley Road London NW3 7BG United Kingdom C.A.Rowley@open.ac.uk
Abstract The largest change that has come to the world of document formatting since TEX’s DVI language was designed is the need to support documents destined for multiple uses, e.g., for interactive reading on screen and for paper output. It is time to investigate what is needed, both now and in the immediate future, from a device-independent description language for formatted documents. This paper does not provide a complete such investigation. Instead, it out- lines what is required from a language that can describe the “device independent” properties of the “formatted form” of the currently imaginable range of docu- ments. The important and innovative concept identified here is the complete integration of three central formatting aspects of on-screen documents: text, graphics and the mechanics of interaction.
Introduction There probably also exist other necessary ex- tensions to the currently used models for format- Motivation. There is currently a need for a stan- ted documents that are not supported well by any dardised but flexible and comprehensive language current such language. Thus it is time to investi- that will enable the description of all aspects of gate what is needed, both now and in the imme- formatted documents that are the output of high- diate future, from a device-independent description quality document formatters such as the growing language for formatted documents. In order to illus- number based on T X (and its derivatives); this E trate and crystallise these ideas, it is useful to con- would also then be available for the output of all the sider how these needs are met by the current version superior XML/XSL-based formatting software that, of PDF or by other languages such as the Scalable we are assured by those who control the world, will Vector Graphics language. This would lead to a far soon decorate our desk-tops. longer paper detailing the achievements and failings Many aspects of formatted documents, such as of the current version of PDF and the relevance to graphics and colour, were deliberately excluded by this subject of the current thinking on SVG but here T X’s designers but the largest change that has come E I have confined myself to occasional comments on to the world of document formatting since T X’s DVI E pertinent aspects of these languages. language was designed is the need to support doc- uments that are designed (to high standards) for Background. About a year ago I was bold enough multiple uses, e.g., for interactive reading on screen to state: and for paper output. Many people now have ex- By August 1999 I hope, with a bit of help perience of using T X as the typesetting engine for E from my friends, to have further analysed the such documents, producing as the multi-use output models and concepts that need to be sup- form a ‘PDF document’. This could be either by use ported by a language for describing multi-use of pdfT X or by producing PostScript and then con- E documents, and how well PDF provides such verting this to PDF (the acronym PDF here refers to support. Adobe’s Portable Document Format). The power of combining the programmability of TEX with a thor- Well, I got a lot of help, mostly from the PDF discus- ough knowledge of the capabilities of PDF viewers sion list [8] and particularly from Sebastian Rahtz and Java programs have been brilliantly illustrated and Hans Hagen who, being privy to Adobe’s fu- by Hans Hagen. ture plans and hence in their thrall, could often only answer with the phrase “but I could not possibly
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 189 Chris Rowley
comment”, even about things that are already de- consequences of these for the structure of the lan- scribed in the literature—such is the way of com- guage. merce. Subsequent work will consider in more detail The result is this short paper outlining what the specification of the language together with the is required from a language that can describe the design and implementation of related applications. “device independent” properties of the “formatted form” of the currently imaginable range of docu- DVI and PDF. One motivation for this paper was ments. The discussion here tries to be general but my being asked at the TUG’98 conference: which is it is heavily influenced by the currently popular re- better, DVI or PDF? My reaction then was: since sources in this area: DVI [4], PDF [2,3](andhence they are so similar, neither! But then I was think- PostScript [1]), together with some, such as the Scal- ing only of the original version of PDF; now, having able Vector Graphics (SVG) Specification [5], that discovered the joys of v1.2 and more recently the are currently under development. 7.4MB of the latest (v1.3) manual, I publicly recant I am very much aware that the current version from that position. of this paper lacks a lot of explanation and exam- Although PDF is technically not a “device inde- ples; and that it contains little about practical ways pendent” language, it contains a large core of stuff to take these ideas forward and relate them to other that is, at least potentially, as “device independent” activity in this area. However, it does contain at as TEX’s eponymous DVI language. Both must, of least one significant new idea: the complete integra- course, be parsed by an application that understands tion of three central formatting aspects of on-screen the language and its underlying document model, documents: text, graphics and the mechanics of in- formatting model and page model; and, although teraction; this will lead to a more comprehensible they look very different at the detailed level, the and systematic treatment of all aspects of multi-use page models of these two languages (and their ab- formatted documents. stract semantics) also have a lot in common. This is one reason why the part of pdfTEX [6] that han- Preamble. I shall assume that the reader has some dles classical TEX files is only very locally and mini- familiarity with the DVI language (at least the com- mally different from classic TEX. However, PDF has monly used parts) and with the PDF language (v1.2 a somewhat richer document model and it integrates or later, but only the formatting-related parts). text and graphics in its formatting model. This is Please note that there are many things that are one reason why pdfTEX has extra primitives. not covered in this paper because, although very im- On the other hand, PDF also has a large, and portant for modern document science, they are not growing, part that is dependent on the very specific, directly relevant to the current subject. For exam- and limited, features of Adobe’s own viewers and ple, since we are considering a description language font technologies. As with most languages that are for formatted, multi-use documents, we completely being actively developed whilst being widely used, ignore the current uses, aimed at expressing docu- PDF is now a mixture of good and bad ideas: it is ments as logical tree-structures, of languages such still based on some simple but general models but as XML and HTML (although these are often used these are not always used to provide extensions nor to provide an inspired mixture of semi-specified for- have they been developed to provide more inclusive matting and logical markup). It is also possible but equally clean new models. Instead, it has grown to combine such languages with a language such as a collection of ad hoc add-ons that lack simplicity PDF to describe “partially formatted” documents. and coherence. Much of the additional functionality The major consequences of the chosen language of pdfTEX is there only to support such very partic- for the design of the applications, e.g., TEXorits ular features of PDF, as its meta-data objects and successors, that produce examples of it will be men- stream compression possibilities. tioned, but only in passing and hence incompletely. Although this is a lot, PDF does only what it However, these are probably of greater practical im- does; it has become a more difficult language for portance than the details of the language itself. a (human) document formatter or programmer to The paper begins by setting up the context, de- work with. It is therefore currently not at all clear scribing briefly the relationship between DVI, PDF how to make straightforward adaptations or exten- and the various models of document formatting into sions of the PDF language and we seem able to get which they fit. It then describes various models that from it only what They want us to have in Our doc- must be supported by a fully functional language uments. Note that some caution is needed when for multi-use formatted documents and analyses the evaluating the PDF language itself since much of its
190 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Models and Languages for Formatted Documents expressive ability is obscured by the lack of function- • The contents of the document that are needed ality and bad behaviour of the applications currently for rendering the FOs in the document on any available for viewing or translating it. supported output medium. Given the close symbiosis between its develop- • The contents that are needed to determine the ment and that of the Adobe’s Acrobat Reader ap- interaction state on any supported interactive plication, it is very likely that, wisely used, it is a output medium. very good language for rapid and accurate screen- • Information about structural relationships ing of downloaded documents. However, this is by amongst the formatted objects in the document. no means a unanimous verdict on the utility of PDF • Pointers to other resources required for the ren- and any such advantages have clearly been at the ex- dering process (e.g., rasterisation, font and col- pense of efficient and accurate production of those our information). documents since the current version is far from pro- viding the uniform, clean, comprehensible interfaces The following is information that is not essen- needed by writers of applications. tial but is useful; it is also very closely related to This suggests that there is a need for Yet An- the formatted document. Other (non-formatted) other Language: one with a clean and general model objects contain such information. and a flexible, uniform syntax. If this is incompat- • Information about the logical structure of the ible with fast incremental processing then a compi- document and its relationship to the structure lation process should be inserted to transform this of the formatted document. into something at least as good as PDF. So here are • Information needed for non-rendering activities some thoughts on such a language. for which support is needed; in general, this is Background and models too open-ended but some of these are the logical information that is needed for activities that are Here is a description of the use of a FDL (or Fixed traditionally associated with on-line document Formatted Document Language) and the models of readers, such as indexing and searching. document processing that it both supports and pro- There are, of course, many other things that vides. are essential to the complete description of a docu- The global model. The assumed usage of this ment and it may well be appropriate to add to the FDL derives from the following high-level model of language objects to be used for their specification. the document formatting process in which it is used. The following are some examples (from many) of in- formation that is important to the document but is • A generating application (GA) produces a fully formatted document and outputs it in this FDL not, per se, closely related to the formatted form of language. the document. • • A processing application (PA) uses the infor- Database information about the document it- mation about a fully formatted document de- self rather than its contents. scribed in this FDL in order to do only the fol- • How any visual material produced by other co- lowing (these are informal descriptions): operating applications (which may not them- – faithfully render (in accordance with the selves process the FDL material in this docu- medium) parts of the visual content of the ment) should be placed relative to the rendering document on one or more media; of the document. – when appropriate, supply information This paper will thus analyse in detail only the about attributes of such a rendering needed information in the first group (of four items). It to determine the interaction state of the PA; will also discuss some ideas concerning the informa- – pass on, but not process, information tion in the second group but will argue that the FDL streams to other applications (in partic- needs to be able to express only how the provision of ular, non-visual material). such information relates to information in the first group, leaving the specification of most of this in- At its top level, a formatted document consists of formation to other, more suitable, languages; these a collection of objects, the most pertinent of which languages have been, or will be, developed elsewhere are formatted objects (FOs). and can be used in a wider context. A model for the language. What information The FDL is not intended to provide a revisable must therefore definitely be represented in an FDL document format. Thus it will contain no provision description of a document? Here is an answer. below the level of the FOs for the specification of
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 191 Chris Rowley
user-level graphical objects so that they can be di- defined attribute functions such as “colour”. Thus rectly manipulated, as in a drawing application or a other technical issues not dealt with here are the document editor. This should not rule out support precision of numerical values and the closely related for extensions that, like PDF, provide some very lim- provision of rasterisation information. These are ited but useful form of structured revisions of FDL very important in practice but it is best to keep documents; however, this is not a primary property them clearly separate from the raster-independent, of the FDL. arbitrary precision part of the model. In addition Further restrictions. In order to appease the ed- (or rather subtraction) many of the complexities of itor of these proceedings and to put a reasonable colour and tone rendering are not present in the limit on the time I spend writing, I shall here re- model since these are intimately connected to the strict the analysis and discussion in the following rasterisation process. ways. Having so peremptorily dismissed rasterisation from this formal model, I must quickly explain that • The top-level formatted object (FO) described everything in the model is predicated on a model of by the FDL will be a two-dimensional, unro- device-dependent rendering that involves a rasteri- tatable rectangle (this is a convenient but not sation of this idealised plane. essential restriction). • The graphical model will have no concept of Analogies. One can liken a simple implementa- transparency, i.e., no graphical layers: this re- tion of this global model to a translator (the GA) flects only the current limit on my resources for and its agent (the PA), where: the translator com- investigating the issues involved and should be piles application-oriented document formats into a relaxed as soon as possible. It is also one of the well-defined, machine-oriented representation of the areas where PDF’s support falls short of current visual form of the document; the agent processes requirements. this lower-level code. In this simpler paradigm, the • There is no concept of time nor of an external “model for the language” is analogous to the opera- environment beyond the idealised two-dimen- tional semantics of that machine-oriented represen- sional output medium; hence the FDL itself does tation and the “model for the medium” would be not describe the non-typographic content of the abstract architecture of the machine. sound/video; and documents cannot be defined A heuristically better, but less precise, analogue to look different on Wednesdays, on Macs or is with database models that include pre-compiled on Vancouver Island (although some such re- views and data indexes. quirements could, of course, be implemented by Analysis the PA). • There is no concept of service levels to be ne- At the lowest level such an FDL needs to be able to gotiated between a client, knowing its local PA express, within the above limitations, full details of resources, and a document server. the following, and nothing more: Note that the first restriction does not limit the • everything that could be visually displayed by scope for specifying what is displayed since, within any PA using any supported visual medium; these top-level FOs, complex clipping paths can be • everything that is needed for the detection of specified. interaction events by any PA that supports in- Note also that the PA can use information ex- teractivity with such a visual mediuml pressed in the FDL to do complex things such as affording different views of the document and con- This information can be usefully divided into a num- trolling time-dependent actions to produce son-et- ber of related topics but they are all, ultimately, lumi`ere shows, etc., but these do not need to be graphical abstractions. described directly within the FDL. Graphical specifications. Here we separate the Moreover, the information needed to control concept of text (i.e., glyphs from fonts) from other associated multi-media actions should be encoded visual items; however, we do not separate the inter- in languages designed explicitly for describing such action-related graphical information from the visual objects and these languages should not be part of parts. the FDL. The underlying model for all this information A model for the medium. The abstract model is the specification of regions in the visual medium of the visual medium is therefore a rectangular sub- (idealised as a mathematical plane). These are the set of a mathematical Euclidean plane on which are only fundamental graphical objects that are used.
192 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Models and Languages for Formatted Documents
Although there are many other possibilities, colour information, also need methods for device- there is no clear reason to depart from, or extend, independent specification. The most general colour the commonly used collection of methods for speci- information is the specification of a colour gradient fying regions in terms of cubic paths; see, for exam- function, to specify how a region should be painted; ple, the paper version of the original PDF specifica- this is a mapping from the abstract visual medium tion [2]. This almost universal method is also used to a colour space. There is a need for further in- by PostScript and SVG. vestigation into what types of mappings are needed here; SVG will support a small range of mappings, Paths. These are used solely as a way of defining re- including linear, radial and periodic (for patterns). gions (stroking, etc., define a narrow area along the This could be extended to support the far more path). They are typically piece-wise cubics, other general concept of getting such resources from an ex- common forms such as conics being provided by the ternal paint server (not to be confused with the Mix- language only as syntax for their cubic approxima- Yer-Own machine outside the local Do-It-Yourself tions. store); this is analogous to the commonly used in- Note that these paths are mathematical ideal- direct ways of specifying glyphs and other font re- isations that are then used to define the somewhat sources. more concrete regions by means of various opera- tors such as clipping, stroking and filling. Note that Text. Although glyphs are also graphical objects, the use of these words here does not imply that any the methods by which they are specified are typi- painting of the defined region is yet specified. cally so completely different that treating the two similarly becomes fatuous. In particular, the choice Regions. Having thus defined a region, it can be and positioning of glyphs typically requires external (abstractly) painted in some way or it can be given resources and, hence, other languages. In the case of a label for use in defining interaction events; these PDF and PostScript, this is the only supported un- are not exclusive possibilities. Note that the speci- derlying model for text: both positioning and ren- fication of the region is identical for both visual in- dering information for typical fonts can only be spec- formation and for these interaction labels. This is ified via a fixed external font resource that must be all that is needed from the FDL in order to support accessed via a fixed-size encoding table. Such exter- all currently used types of interaction. A region can nal font resources are used by a specialised glyph- both have a label and be painted, and these two rendering part of the PA and also, often, by the GA: properties are completely independent. It may be it is clearly essential, but often difficult to achieve, sensible for all regions to be labelled objects so that that these two applications use identical informa- all their properties, including painting related oper- tion. ations, would be accessed via the label. Whilst there are good reasons to support most Events. The first stage is to define events; although of these existing models and formats for glyph pro- there is a good case for a generic language to be duction, the FDL must support a far wider range used here, the present level of development of the allowing, if feasible, for future new glyph resources technology suggests that ad hoc languages closely and font technologies as they come into use. Thus linked to particular devices may be needed for some it should support the specification of all of the fol- time. That used by SVG is a good example of such lowing: limited expressibility. • font-resource independent specification of a Thus the FDL needs event definition objects glyph within a font; where the following information can be put, using • explicit positioning of glyphs; a suitable language: • relative positioning of sequences of glyphs (us- ing font resources to calculate exact position- • definitions of the interaction events, ing): at least for all standard typesetting modes, • the actions associated with interaction events. both horizontal and vertical, possibly also for An important and common case of an action is to typesetting along more general graphics paths. display a particular view of the current, or some Although perhaps not strictly part of the FDL itself, other, FDL document; the features needed to sup- a clear requirement arising from these is the ability port this are described below. to attach arbitrary external resources to a FDL file. Painting. Much of what comes under the detailed Higher-level structure. The basic formatted ob- specification of a painting method is relevant only jects (FOs) can be related in various ways, including to the details of the rasterisation but some, such as these three of immediate importance:
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 193 Chris Rowley
• Logical arrangements: these can be very general not have to be able to turn these into basic cubic relationships but include traditional page se- paths but the PA must be able to process them. quences; these do not prescribe anything about Of course, increasing the amount of informa- the formatting of the individual objects. tion (e.g., font resources) that does not need to be • Formatted arrangements: use and reuse of ob- storedintheFDL file also decreases 3, but it also jects within others. requires the PA to be able to access these resources • Global information that allows viewers/printers effectively. to define different views of a document in terms This section does not analyse the possibilities of these objects: e.g., print sequences, relative for the use of alternative formats since these af- positioning of windows on a screen, suppressing fect equally any language. Some relevant techniques the rendering of the content of whole objects are data compression, which is comprehensively sup- (another area where PDF is currently deficient). ported by the PDF standard, and binary formats that can be read quickly, as typically used by DVI I see no reason to put any restrictions within the but not currently available in PDF. language on the nature of these relationships, thus at least a general labelled-graph language providing Summary arbitrary linking information is needed here. Such relationships and their specification need Outline. A formatted document, as described by further investigation and development. Specifica- an FDL specification, is a collection of reusable FOs tion of the formatting relationships will immediately with labelled relationships. These FOscontainposi- require an extended model of the medium that sup- tioned graphical objects including, recursively, fur- ports layers and transparency. ther FOs; but they have no further internal struc- There is currently some small-scale research ac- ture. No distinction is made between the graphical tivity concerned with logical information extensions objects used for painting and those used to define to PDF, in particular the work on structured-PDF at interaction events. Nottingham University. It is unclear whether Adobe Interaction events and associated actions are have any long-term interest in moving PDF in that not described in the FDL itself but it provides ob- direction (or, indeed, whether they have any inter- jects specifically to contain these descriptions. It est at all in the language itself as anything beyond also provides objects for describing external resources a cryptic internal language for the Acrobat black- and the possibility to attach such resources to an boxes). FDL file. Although glyphs are graphical objects, they are Trade-offs. Many of the choices that need to be most often accessed via external resources so they made in developing the detailed syntax of the lan- must be treated very differently within the FDL. guage lead to decisions that, whilst not affecting the All the organisational structure of the format- semantics or power of the language, do affect the ted document is defined in the FDL by general named following measures of its utility. The first two items relationships between the FOs; other logical infor- in this list are independent of any particular docu- mation is not described in the FDL itself. ment, whereas the others will vary according to the General principles. In developing the details of type of document and its uses: such a language the following principles should be 1. the expected functionality of the GA, adhered to as much as possible. 2. the required functionality of the PA, • Indirection: always A Good Thing. 3. the relative size of the FDL file, • Modularity: but do not try to separate too rash- 4. the relative speed of the generation of the FDL ly things that should be intimately connected. file, • Flexibility: do not impose unnecessary restric- 5. the relative speed of accessing information in tions on the GAsorPAs. the FDL file, • Extensibility: of course! But only within the 6. the relative speed of processing information from limits of the above outline. the FDL file. • Clarity: and ease-of-use as the cream on the In general, decreasing 1, and hence, typically, 4, will cake! increase 2 and, often, also 3, 5 and 6. For example, if The way forward the FDL supports a large range of higher-level graph- ical objects, such as transformations, arcs of conics The next step is to refine and formalise the ideas or smooth piecewise-cubic paths, then the GA does described here and to investigate extensions of these
194 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Models and Languages for Formatted Documents models that will support, in particular, a powerful text, possibly intimately connected? The possibili- concept of layers in the output medium. ties for each element are as follows (at least): resize, Some possible (and mutually supportive) ways clip, reflow. in which the TEX community should be able to assist Resizing may make sense, within reasonable lim- in this, and beyond, are as follows: its, for some graphics but maybe not for others; it is • Further extend DVI along the lines suggested rarely the best thing for text. Contrariwise, reflow- by the NTG TEX Future Working Group [7]. ing is not usually feasible for graphics but may be This already supplies a syntax for those graph- sensible for text, again within some limits. ics objects supported by PDF; semantics satis- So who decides what is allowed? The author fying the FDL model can be simply specified for should at least be able to define the reasonable limits these and further necessary objects. but maybe the user should have some control over • Work with the W3C group to influence the de- what he is looking at. Thus, more work, more ideas and, sadly, more velopment of the SVG specification so that it papers, are needed. can be used as (part of) an FDL. • Design future typesetting systems (such as NTS) References to output such a powerful, clean FDL and write [1] PostScript Language Reference, 3rd ed., Adobe drivers that use it directly or translate it to a more processor-friendly and currently sup- Systems, 1999. http://www.adobe.com/prodindex/ ported language, such as PDF or the API (A.. postscript/. P.. I..) of a printer/viewer sub-system. [2] Portable Document Format Reference Manual, Or maybe I should move on to even more radical v 1.2. Adobe Systems, 1996. ideas whilst PDF and SVG/XML slog it out in the [3] Portable Document Format Reference Manual, market place and TEX/DVI continues to dominate the quality niche? (Note: The last word must be v 1.3. Adobe Systems, 1999. treated `alafran¸caise to make the pun work.) http://www.pdfzone.com/resources/. [4] The DVIType Program. Stanford University, Postamble 1982. Preparing the conference talk and the subsequent http://www.CTAN.org/tex-archive/systems/ discussions have shown that there are other impor- knuth/texware/. tant questions not even touched on above; thus, this [5] Scalable Vector Graphics (SVG)Specification. paper should be treated as “preliminary thoughts”. W3C, 1999. In particular, it became very clear when I was http://www.w3.org/TR/WD-SVG/. designing and producing the examples for the talk [6] The pdfTEX Manual. 1999. that, even by using all currently available applica- http://www.tug.org/applications/pdftex/. tions (including some pre-release versions), I could [7] NTG TEX future working group. TEX in 2003: not implement all of those features of a formatted Part II: Proposal for \special standard. TUG- document that are currently agreed to be desirable. boat 19(3) pages 330–337, 1998. Moreover, it has taught me that, even at the high [8] PDF e-mail discussion list. level of my models and languages, there is a lot http://tug.org/mail-archives/pdftex/. more to the interactions amongst graphics, text, and [9] Hagen, Hans. Examples of the use of pdfT X screen formatting than I had considered so far. E to produce interactive documents. 1999. For example, what should happen when a user http://www.pragma-ade.nl/. resizes a window that contains both graphics and
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 195 AcroTEX: Acrobat and TEX Team Up
D.P. Story Department of Mathematics and Computer Science The University of Akron Akron, OH 44325 dpstory@uakron.edu http://www.math.uakron.edu/~dpstory/
Abstract
Adobe’s Acrobat (PDF) and Donald Knuth’s TEX system make a powerful team for putting mathematics on the internet. For the educator, this team, called “AcroTEX,” is the poor man’s multimedia software company. Though TEX was implemented before the rise in popularity of the World Wide Web, PostScript code written to the .dvi file, using TEX’s \specials, can be used to enhance an electronic document created from a TEX source by introducing such elements as color, hypertext links, form features, sounds, and even video clips. These special features are achieved by inserting ‘pdfmarks’ into the output file. The Adobe Distiller, in turn, interprets these pdfmarks and translates them into the appropriate element as it writes the PDF document. TEX is, therefore, well suited for creating PDF files, especially technical material. With the aid of its very powerful macro facility and the ability to position material very precisely on a page, these electronic enhancements can be created and placed in an exact and automated way. This paper explores the capabilities of AcroTEX and the contents of the AcroTEXwebsite(www.math.uakron.edu/~dpstory/acrotex.html); examples include tutorials, an electronic grading system, mathematical games, and techni- cal articles.
Introduction tion. This makes a PDF file suitable for distribution over the Web. In this paper, I will describe how T X, the out- E AcroT X also refers to a web site: standing mathematical typesetting engine, and the E Portable Document Format (PDF) of Adobe Sys- www.math.uakron.edu/~dpstory/acrotex.html tems, first introduced in the Acrobat suite of soft- The documents referenced in this paper can be ac- 1 ware, form a team called “team AcroTEX”, and cessed at the AcroTEX web site, a site primarily ded- how this team has opened up a world of possibilities icated to mathematical education. All files at this to people who are interested in electronic education. site, save only some start-up HTML pages, are in From the perspective of an educator, this pa- PDF format.2 per is an exposition of the natural implications of combining TEXandPDF; the exposition covers gen- Genesis esis (first thoughts), creation (of tutorials), problems The original concept was to create a series of elec- and solutions, educational games, technical articles, tronic tutorials covering some of the topics of the and new macro packages for readers who may want first semester of a standard course of calculus as of- to develop on-line educational materials themselves. fered in many colleges and universities in the U.S. Here, AcroTEX refers to a process of producing The design goals of the tutorials included typeset PDF PDF documents from a TEX source. A docu- quality on-screen mathematics, cross-referencing us- ment is a compact, self-contained file format which ing hypertext links, and on-screen color. preserves the page layout of the authoring applica- Typeset quality mathematics and a presence of limited finances implies TEX. Of the freeware and
1 2 A proposed alternative is TEXrobat, but this sounds too Readers are available for virtually every platform; see mechanical. www.adobe.com/prodindex/acrobat/readstep.html.
196 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting AcroTEX: Acrobat and TEXTeamUp commercial TEX systems available at the time, late the entire calculus/algebra set of tutorials)—are all 1994, only the Y&YTEX system offered coloring of provided. fonts and a hypertext feature within its .dvi pre- The tutorials include in-line examples and ex- viewer, DVIWindo. A small site license from Y&Y ercises `alaKnuth: that is, in the source file, the was purchased using a start-up grant from the Ed- questions and solutions are kept together; however, ucational Research and Development Office at the solutions are linked to the questions by hypertext. University of Akron. The syntax is as follows: The calculus tutorial, called “e-Calculus”, was written and put on the department intranet as a col- \exercise < Some exercise question. > lection of .dvi files. The students would come into \answer < The answer or solution. > ...... the computer lab to review the materials using Y&Y’s ...... DVIWindo. The system worked well; however, the \endexercise students were reluctant to spend the long periods of time that would be necessary to read the tutorial in The macro \exercise sets the hypertext link, the computer lab. \answer sets the target, a named destination, for This reluctance prompted me to consider the the link and starts a verbatim write macro. All ma- internet. For many reasons, the .dvi file format is terial between \answer and \endexercise is writ- not a suitable format for the tutorials on the Web. ten to the file \jobname.ans, which is later included Adobe’s PDF seemed to be a natural choice: the at the end of the main file. There is an \example typeset quality of the material, page layout, and macro that behaves in the same way to handle ex- color all would be preserved. amples. Fortunately, Y&Y was well positioned to con- This method of handling exercises and exam- vert its .dvi files to .pdf files. Their dvi-to-ps ples allows the posing of the question within the driver, dvipsone, automatically converted the hyper- body of the text, but the solution does not appear text links that DVIWindo understood to hypertext (and take up screen space) unless the reader wants links that the Acrobat Reader understood. As a re- to see it. This allows for a clean, clear, more orderly sult, “e-Calculus” was ported to the internet with presentation and discussion of topics. very little trouble.3 This was the beginning of the By the way, proofs of theorems are handled the AcroTEXwebsite. same way. The theorem is stated and a hypertext The site has since grown in size to include other link is given to the proof. The main text then contin- tutorials, mathematical games, a demonstration of ues with a discussion of the meaning of the theorem, forms processing using the PDF forms format FDF, followed by examples and exercises. several technical articles on TEX, LATEXandPDF, Another feature of the tutorials is user inter- and a couple of macro packages (web/exerquiz). action. User interaction with the document comes in the form of multiple-choice questions or quizzes. The tutorials Click on the chosen response to obtain instant feed- The writing of the “e-Calculus” tutorial was simple back in the form of humorous congratulations — or enough and will not be discussed here; another tu- an error message. torial, entitled “An Algebra Review in Ten Lessons” T XandPDF (mpt_home.html), has since been written in response E to the needs of the incoming students at the Univer- Macro packages. When I first started this project sity of Akron. in 1994, I decided to use AMS-TEX 2.1 as the basic Beyond the technical discussions of calculus or macro package. At the time, I had a rather slow algebra that any paper document would provide— computer with very little RAM. I found that LATEX though the discussions are more verbose than would was very slow in loading and took a long time to normally be seen on paper — these tutorials try to process a file; however, AMS-TEX 2.1 on the same take advantage of the electronic medium. system loaded quickly and ran acceptably fast. The For finding information within the tutorials, hy- consequences of that decision meant: (1) I must pertext versions of tables of contents, bookmarks write all my own macros for page formatting, cross- (a feature of PDF), cross-references, and indexes — referencing, tables of contents, indexes, color inclu- both local (for the file being viewed) and global (for sion, hypertext linking, etc., and (2) I must read The TEXbook not once, not twice, but many times. 3 File: e-calculus.html. All files mentioned in this paper Other books studied and found to be very useful are are located at www.math.uakron.edu/~dpstory. the ones by Salomon (1995) and Eijkhout (1992).
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 197 D.P. Story
If I were starting over today, I would proba- Quality Type 1 CM fonts have been made avail- bly use LATEX and Sebastian Rahtz’ hyperref pack- able by a consortium of Bluesky Research, Y&Y, AMS, age. LATEX comes with many of the features that I SIAM, IBM, and Elsevier. The freeware, shareware, had to struggle with to include in my own system; and commercial TEX systems now come with Type 1 hyperref provides many of the hypertext features fonts. For an author wanting to publish on the Web that I regularly use in my tutorials. using PDF, every effort must be made to reconfigure However, I have no regrets. At the time I be- their TEX system to use these quality fonts. gan this project in late 1994, hyperref was still in its infancy anyway. One thing is certain, since I Problems with multi-file systems wrote the macros myself, if something goes wrong, I In this section, problems and issues associated with know immediately where the problem is, and how to maintaining a large multi-file system are discussed. fix it. Problem turnaround time is much shorter: I Hopefully, there will be enough detail to help readers don’t have to complain (report bugs) to the package better manage their own systems. author, then wait for the fix to arrive. Make Creating PDF. There are essentially three meth- The use of a utility. The tutorials consist of ods for creating an interactive PDF document from a large number of files that are undergoing constant revision. A make utility7 is used to help maintain aTEX source: this system of files. .dvi 1. Convert the output file to a PostScript file A script file listing file dependencies was devel- dvi dvip- using a -to-PostScript driver such as oped for each of the two tutorials, “e-Calculus” and sone dvips or , then use the Acrobat Distiller to “Algebra Review”. (Actually, two sets of scripts are PDF 4 convert the PostScript file to a file. maintained: one for the system of .dvi files and the 2. Convert the .dvi file to PDF, bypassing the other for the system of .pdf files.) The make utility PostScript step, by using either dvipdf (Lesenko, reads the script and initiates the programed action, dvipdfm5 1996) or by Mark A. Wicks. perhaps calling the TEX compiler or the dvi-to-ps 3. Convert the .tex source file directly to a PDF converter. 6 file by using PDFTEX, a program that is the To create PDF files, for example, the make util- ongoing labor of H`an Thˆe´ Th`anh (Sojka, Thanh, ity, as signaled by the controlling script file, calls and Zlatuˇska, 1996). the dvi-to-ps converter (dvipsone in my case) for Figure 1 depicts the various paths from a .tex source each .dvi that needs to be updated, which in turn file to a .pdf file. dumps the PostScript output into “Watched Fold- TEX ers”. The Acrobat Distiller is activated and distills tex dvi the files in these watched folders automatically and places them in an out folder. A batch file is then dvipsone PDFT X dvipdfm invoked to move the new .pdf files into their proper E or dvips dvipdf location. pdf ps The behavior of the make can be controlled by Distiller way of command-line switches of the make utility. As a result, only the files that have changed can be Figure 1: From TEXtoPDF updated or the whole system of files can be re-TEXed and (optionally) re-distilled. Use Type 1 fonts. One last point must be made The make utility has been very helpful with the before leaving this topic. To create a quality PDF problem of trying to keep all files up-to-date. Updat- document from a TEX source it is necessary to use ing the whole system of files is a matter of starting Type 1 fonts. The traditional font used by many the make utility twice, once to TEX all files, then freeware TEX systems is the bitmap or pk font; these again to create the corresponding PostScript files fonts look choppy and jagged when incorporated dropped into the watched folders. The distiller does into a PDF document and viewed on screen (though the rest. they do print decently). 4 This method is the process referred to as ‘AcroTEX’; it Macro option switches. Each file belonging to is the only one that uses the Acrobat Distiller. one of the tutorials contains a table of contents, an 5 Information and download are available at http://odo. kettering.edu/dvipdfm/. 6 See www.tug.org/applications/pdftex/ for more infor- 7 The make utility that came with Microsoft Macro As- mation and download links. sembler 5.0 is used.
198 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting AcroTEX: Acrobat and TEXTeamUp index, and cross-document hypertext jumps. To fur- ures, and so on. For this purpose I wrote a macro ther complicate matters, a collection of files cover- called \WriteDocumentStateTo{
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 199 D.P. Story
page numbers is a little sneaky and should, perhaps, fit to students on the internet. To that end, I’ve be left for another occasion. written several technical articles that describe how to insert hypertext links and form annotations into Mathematical games a PDF document, and how to use LATEX to create a Games stimulate interest in mathematics, promote quality interactive document in PDF. creativity, and provide a resource of class projects. All of the games listed below suggest that T Xand E About Pdfmarks. A pdfmark operator is an ex- PDF can be used to create simple games that are tension to the PostScript language that is read and inexpensive to build and challenging, educational, understood by the Acrobat Distiller. The pdfmark and informative to play. is used to insert PDF-related features such as hy- My interest in games was stimulated by Gary pertext and form annotations. Primary documenta- Cosimini of Adobe. He created a game board in tion on pdfmarks is from Bienz and Staas (1996); an PDF that fascinated me; I wondered how he did it excellent on-line reference to pdfmarks is the “pdf- and set out to duplicate and expand on his game. mark Primer”, one of the chapters from Thomas To construct my own version of the game from a Merz’ fine book, Web Publishing with Acrobat/PDF TEX source, I determined that I needed a greater (1998). understanding of the pdfmark operator. My read- For the TEX programmer wanting to create hy- ing and experimentation resulted in a Jeopardy-like pertext links or to insert form elements into a docu- game called “Algeboard” covering some topics in el- ment, the appropriate code can be inserted into the ementary algebra. .dvi file by using raw PostScript \specials. This Studying the Pdfmark Reference Manual (Bi- code is then passed on to the .ps file by dvi-to-ps enz and Staas, 1996) and the PDF Reference Manual converters such as dvipsone or dvips. (Bienz et al., 1996) gave me greater understanding of For LATEX users, Sebastian Rahtz’ hyperref how to construct the game. With TEX’s macro fea- package performs all these tasks wonderfully. Still, ture and its ability to place objects very precisely, I there are certain special effects that hyperref does was able to stack form fields one on top of the other, not include; in this case, knowledge of pdfmark pro- and control their hidden attributes. When the user gramming is essential. clicks on one of the game squares, the visible form The electronic article, “Pdfmarks: Links and field becomes hidden and another becomes visible. Forms” (Story, 1998a), was written not long after This is the way some of the effects are accomplished I had constructed the games just described. I had in the game. made an in-depth study of the pdfmark operator and “The Giants of Calculus”, a two-column game thought it might be a useful to write about what I of matching, was another challenge in coordinating had learned as a way of organizing the information layered form fields to get the desired effects. in my own mind. Later, when Adobe Acrobat 3.0 came out with The article describes the basic, the advanced, JavaScript, the Algeboard game was revised and and the more subtle techniques of creating hyper- JavaScript was used to keep track of the score. Thus text links and form annotations for PDF. Written was born “Algeboard/JS”. This gave me some early from the perspective of a TEX user, the article is experience in using JavaScript. interactive and highly informative. The many ex- The experience with JavaScript helped when tensive and detailed examples contained in the ar- I attempted to create the more complicated “PDF ticle use TEX primitives and macros; TEXuserscan Flash Cards for Kids”, an electronic variation on the copy and paste code swatches into their own source flash cards children use to practice their arithmetic document. skills of addition, subtraction, multiplication, and division. All of the above-mentioned games can be down- Authoring PDF documents using LATEX. In loaded from the AcroTEXwebsite. the past few years, I’ve received several inquiries from educators about how to construct good on- Technical documentation line tutorials using LATEX. Not a regular user of A One of the goals of the AcroTEXsiteistotrytoen- LTEX myself, I really didn’t know. Ultimately, I A courage other educators to use TEX(orLATEX) and got interested in learning LTEX, and in understand- the Portable Document Format to put mathematics ing how to use Sebastian Rahtz’ hyperref package. (or any technical material) that might be of bene- As a result, I wrote the article, “Using LATEXtoCre- ate Quality PDF Documents for the WWW” (Story,
200 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting AcroTEX: Acrobat and TEXTeamUp
1998b). In this article, I tried to describe all the ele- duce an outstanding typeset-quality article on pa- ments that go into the creation of a visually attrac- per. TEX, however, is capable of more than just tive, interactive PDF document using LATEX, with black and white. When teamed with Adobe Ac- special emphasis on how to use the hyperref pack- robat and the Portable Document Format, TEXcan age. produce colorful documents with the richness of user interaction. The Web and Exerquiz packages TEXandPDF are a natural for putting educa- The goals of these two packages are: (1) to create tional material, especially technical material, on the an attractive, easy-on-the-eye page layout suitable internet. It is hoped that the use of “team AcroTEX” for the WWW,and(2)tomakeitveryeasy(for will continue to grow in the educational and TEX educators) to create interactive exercises and quizzes communities. in the PDF format. Both packages are available in References webeq.html. The web package (for LATEX) is a set of macros Bienz, Tim, R. Cohn, and J. Meehan. “Portable that establishes a page layout for a (PDF)document Document Format Reference Manual”. Version that is meant to be read on-screen and not meant 1.2, Adobe Systems, Inc., Mountain View, CA, to be printed. The package also redefines the table 1996. of contents to a web style and defines optional navi- Bienz, Tim and G. Staas. “pdfmark Reference Man- gational aids. The package has options for use with ual”. Technical Note 5150, Adobe Systems, Inc., dvipsone, dvips,andPDFTEX. Mountain View, CA, 1996. The exerquiz package defines three new envi- Eijkhout, Victor. TEX by Topic: A TEXnician’s Ref- ronments: erence. Addison-Wesley, Reading, MA, 1992. Goossens, Michel, F. Mittelbach, and A. Samarin. • the exercise environment creates on-line exer- The LAT X Companion. Addison-Wesley, Read- cises; solutions are hyperlinked to the question; E ing, MA, second edition, 1994. Goossens, Michel, S. Rahtz, and F. Mittelbach. The • the shortquiz environment is used to construct LAT X Graphics Companion: Illustrating Docu- multiple-choice quizzes (with or without solu- E ments with T X and PostScript. Tools and Tech- tions) with immediate feedback in the form of E niques for Computer Typesetting. Addison-Wes- “Right” and “Wrong” message boxes appearing ley, Reading, MA, 1997. on-screen; Lesenko, Sergey. “The DVIPDF Program”. TUG- boat 17(3), 252–254, 1996. • the quiz environment is used to write longer Merz, Thomas. Web Publishing with Acrobat/PDF. multiple-choice quizzes that are graded by Java- Springer-Verlag Berlin, 1998. Chapter 6, “pdf- Script. mark Primer”, is available on-line from: www. The exerquiz package is somewhat independent of ifconnection.de/~tm/pdfmark/. the web package; it can be used with any of the stan- Salomon, David. The Advanced TEXbook. Springer- dard LATEX classes and works correctly, for example, Verlag, Berlin, 1995. with pdfscreen,8 a screen design package written Sojka, Petr, H. T. Thanh, and J. Zlatuˇska. “The joy for PDFTEX by C.V. Radhakrishnan. of TEX2PDF —Acrobatics with an alternative to It should be remarked that both packages have DVI format”. TUGboat 17(3), 244–251, 1996. been extensively tested using the commercial TEX Story, D.P. “Pdfmarks: Links and Forms”. On-line system by Y&Y, which uses dvipsone, and the MikTEX documentation: lnk_forms.html, 1998a. system for Win95/NT, which includes dvips and PDF- Story, D.P. “Using LATEX to Create Quality PDF TEX. Documents for the WWW”. On-line documenta- tion: latx2pdf.html, 1998b. Final remarks
It is surprising what can be done with TEXandPDF. In the past, when I thought of TEX, I thought of a compiler and a collection of macros that could pro-
8 CTAN: macros/latex/contrib/supported/pdfscreen.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 201 Jean-luc Doumont
Doing It My Way: A Lone TEXer in the Real World
Jean-luc Doumont JL Consulting Achterenberg 2/10 B–3070 Kortenberg, Belgium JL@JLConsulting.be
Abstract While a world-renowned standard in many academic fields, Don Knuth’s much acclaimed typesetting system is almost unknown in most parts of the real world, where many a document designer has achieved professional success without ever hearing (let alone pronouncing) the word “TEX”. Outside academia, the lone TEXer faces not only compatibility headaches, but also outright incomprehension from his customers, colleagues, or competitors: why would anyone want to use TEX to produce memos, two-color newsletters, full-color brochures, overhead transparencies, and other items—in short, anything but books that contain a lot of mathematics? As a consultant in professional communication, I have been using TEXfor all documents I have produced for my clients and for myself during the last ten years or so. Though it has turned out to be most successful, this approach is seen by most as a mere idiosyncrasy. And yet, the systematic use of my own TEX and PostScript programming gives me three unequalled advantages over using off-the-shelf software: I travel light, I can go anywhere I please, and I guarantee I’ll get there. Introduction Approach and opportunities Being someone who (among other activities) pro- The very varied documents my company produces duces documents for a living, I often have to discuss can be grouped into two categories. Some are software issues with clients, colleagues (or competi- in-house documents, such as letters, reports, and tors, as the case may be), and service bureaus or teaching materials (handouts, lecture notes, over- print shops. When I say I use “TEX,” these people head transparencies). The others are documents usually first ask me to repeat, then express their we create on behalf of clients, including newslet- surprise. Some immediately dismiss it as “never ters, brochures, corporate reports, and overhead heard of”; others first ask what exactly it is, be- transparencies. Documents of both categories may fore wondering why I would want to use something be black-and-white, two-color, or full-color ones, “that nobody else uses” instead of a WYSIWYG, printed in traditional (offset) or modern (digital) “professional” application (the one they use, no ways, in small or large runs. doubt). To be able to guarantee the quality of the My using TEX (and PostScript) for virtually documents we produce for our clients, we have all documents I produce, successful as it may opted to give them non-editable deliverables only— turn out, has puzzled many a TEX user, too. Is in practice, paper, film, or ready-to-print electronic TEX really the best tool for documents other than files (in PostScript, for example). We thus strive long, structured, or heavily mathematical ones? to preserve the quality of both the writing and Is a direct-manipulation, integrated application not the typesetting against two sources of undesirable better suited to producing short newsletters, graphs, alterations: unwise last-minute modifications by the or overhead transparencies? The most qualified clients themselves, often ruining a consistency they answer is likely to be, “Well, it depends.” may not readily perceive, and accidental changes This paper relates the experience of a long-time caused by an all-too-theoretical “seamless conver- lone TEXer in the real world. It explains choices, sion” between platforms or versions of a given points out advantages and limitations, and draws software application. lessons from the experience.
202 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Doing It My Way: A Lone TEXer in the Real World
Our insistence on not giving away editable files projects, even those in research organizations, typi- is a tough one to maintain (admittedly, it has cally from the Corporate Communication or Public suffered exceptions): clients logically consider as Relations department. These clients, used to main- theirs the text they have paid us to write or the stream direct-manipulation packages, have difficul- format they have paid us to design and produce. ties understanding the nature of TEX, the difference Yet our name in the list of credits constitutes our between TEX itself and its implementation on a best—if not our only useful—marketing tool: we specific platform, and the fact that TEXcannotdo cannot afford to take chances. We will, of course, much (relatively speaking) until one programs it. ensure that what we create addresses the client’s Irrelevant as it may seem, given the lack of needs while attaining the quality level we strive for. compatibility issues in our case, the choice of TEX Though our approach is dictated by quality in an essentially non-TEX world proved a liability to standards, not ease of production, it does simplify our credibility. I had expected that clients who did our work — to some extent, that is. Because we not know TEX would be content to judge the tool need not exchange editable files back and forth by the result; I was wrong, for at least two reasons. with external parties, we are free to choose not First, most of our clients are little able to judge the only the platform but also the software tools with quality of a typeset page; their eyes somehow seem which we work. More accurately, we must be blind to stretched out lines, uneven line spacing, able to process incoming ASCII files and occasional and other basic points. Yet the eventual readers images in GIF or JPEG format, and to produce of the documents may well notice the difference PostScript files for laser printers, imagesetters, or (if unconsciously), so producing high-quality pages others still—all in all, minor constraints. Our still matters. Second, our clients judge merit by software tools are therefore essentially limited to an popularity and hence distrust or even disdain what ASCII editor, an image processor, and, of course, a they do not know; they figure that, if they have TEX environment, which we use to produce virtually not heard of TEX, then it cannot possibly be any all our paper documents. good. Consequently, and independently from what they see, they are suspicious of the pages I typeset Surprise and frustration with TEX. Ironically, they seem infinitely more reassured when I simply tell them I use “a system The choice of TEX as all-purpose tool for producing I programmed myself” rather than “the system very varied documents surprises those who know developed by Professor Donald E. Knuth”. TEX and puzzles those who do not. Clearly, it Distrust of the unknown goes beyond the does stem from my previous experience with TEX clients. Many of the print shops with which clients in an academic setting (Stanford University, which sometimes ask me to work dislike my giving them happens to be TEX’s cradle), and from my tech- PostScript files instead of editable ones. PostScript nical education (applied physics) and consequent being akin to Greek to them, they somehow feel preference for analytical thinking. Admittedly, it deprived of their power and will not admit I have may also partly originate in my pronounced taste done half of the work for them. When anything goes for computer programming and, more generally, wrong at any stage of the printing process, they are in my peculiar habit of wanting to (re)do things quick to pin the blame on me, insisting to the client my own way. Still, the perspective provided by that I use a software application nobody else uses, some ten years of successful professional practice “so it’s bound to create problems”. (As an extreme vindicates my choice: over the years, I have satis- case, one renowned print shop even resorted to this factorily moved to TEX for more and more types of excuse after mistakenly using a Pantone color other documents. than the one I had specified in writing.) Until I entered the “real world,” I failed to People who do understand what TEXis,after realize how poorly known TEX is, at least over some explanations, wonder why on earth I prefer in Europe. It is known by the more technically using such an intricate system rather than a much minded participants at the training programs I more user-friendly (and better known) package. At teach in academia and national research centers, first, the advantages of TEX were so intuitively though even these people are often unclear about obvious to me I was unable to put them into words. what TEX really is and typically confuse TEX Extolling the line-breaking algorithms, positioning A and L TEX. In contrast, it is an unknown entity accuracy, or logical markup was pointless: these to clients who entrust us with document-creation people either saw no benefit in the features or insisted they could do all of that with their own
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 203 Jean-luc Doumont
text processor. To a point, they were right, of Besides the above observations, the (occasion- course, even if we all feel that “it’s not the same”. ally emotional) discussion on comp.text.tex as a result of my post pointed to two interesting differ- Alittlehelpfrommyfriends ences. Trivial as these may seem in retrospect, they helped me understand much of the religious wars At a loss for words in the defense of TEX, but con- around software: they are the differences between a vinced from experience that it suited my work so software tool’s well, I set out to ask other users why they preferred • claimed and actual capabilities, and T X. I contacted the makers of my own T Xim- E E • its potential and actual use. plementation, posted questions on comp.text.tex, and asked around. The answers were varied but Reliability and portability are typical issues can be grouped into five categories: output qual- that differentiate between claimed and actual ca- ity, flexibility, text-based formats, reliability, and pabilities. There is no intrinsic reason why an in- portability. tegrated application should be unreliable, although Not surprisingly, many who answered my post increased complexity and fast-changing versions on comp.text.tex summed it all up by saying that certainly increase the odds. Similarly, there is no TEX’s output “simply looks better,” especially as intrinsic reason why the claimed interchangeability regards mathematics. Among others, they pointed between platforms and between versions should turn to such features as transparent use of optical sizes, out untrue. Yet a pragmatic point of view suggests better hyphenation algorithms, and line-breaking otherwise. algorithms that act on whole paragraphs, not line Furthermore, the fact an application offers a by line. given feature does not imply that its users actually Besides the quality of its output, users love use it—often a question of usability. As an extreme TEX for its flexibility, allowing one to extend its example, any graphical application that allows one capabilities almost ad infinitum. Some mentioned to position characters precisely anywhere on the virtual fonts, custom hyphenation patterns, and page can produce output that matches TEX’s — non-European languages. Others lauded TEX’s but at what cost? TEX, or so its users say, not superior capabilities for floating inserts and cross- only makes some operations even possible at all, references. Following the theme of the present TEX it also makes many operations easier than direct- Users Group conference, some also underlined the manipulation software. parallel writing of printed and HTML documenta- Actual use is also largely affected by stability. tion. In never-ending debates on software, protagonists Less expectedly perhaps, users praised TEX’s who claim their own tool to be more efficient than ASCII roots. A plain-text format, they said, that of others may well all be right, for the tool allows easy manipulation with any text editor, each masters best is the most efficient for him or easy generation of TEX files by other applications her. Yet mastery requires time and users feel little (including preprocessors), and small files (hardly motivated to learn to master a tool that will soon larger than the actual text) that compress well. change: fast-evolving software, with pressure or Some also mentioned the small size of their TEX even obligation to upgrade regularly, may offer ever- implementation, compared to that of popular text increasing capabilities, but discourages in-depth processors. learning. comp.text.tex TEX’s text files, batch operation, and stability As a final point, the discussion over time were seen as the basis of its reliability also clarified WYSIWYG (What You See Is What and portability. In contrast to those of integrated You Get), a concept often confused with that of applications, the TEX (source) files can be damaged direct manipulation. If WYSIWYG denotes faithful only by the text editing, not by the formatting, correspondence between screen view (what you the displaying, or any other downstream operation. see) and paper output (what you get), then some .dvi Moreover, source files typeset with TEX ten years viewers are the closest to WYSIWYG one can ago can be typeset again today — on any platform — get. Many people, however, actually mean that to produce the exact same output, in the exact same what they do affects the output in a way they can see device-independent format. TEX files, being plain instantly , a characteristic of direct-manipulation ASCII, can also safely be exchanged by electronic software. For accurate positioning (for example, to mail across the world. align two words exactly), direct manipulation may not be the most practical approach.
204 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Doing It My Way: A Lone TEXer in the Real World
Agreed, but... Conclusion Answers from other TEX users expressed some of my Faced with people who equate “most popular” with intuitions in words, clarified some of my own mental “best”, I stopped saying I work with TEXincasual shortcuts, and triggered new thoughts. I agreed conversation. Today, if people ask, I usually say on all advantages presented, but felt something I use “my own system” and refrain from explain- was still missing. Most users enthusiastically put ing further. If people insist, I give them a short forward that TEX is unequaled for some documents; document that explains my company’s approach, very few suggested they were, like me, using TEX including its choice of software tools (Figure 1). Be- for all documents they produce. cause they may care little about output quality and Users see TEX as particularly suited to the because our approach bypasses portability issues, production of large, structured documents. Its the two pages on TEX and PostScript emphasize batch process, ability to input files in other files, the three other categories mentioned above: we say and virtually unlimited capabilities (such as num- it allows our work to be fast, flexible,andreliable. ber of paragraph styles) make large projects easier While I have been a TEX enthusiast for over to handle. The separation of contents and visual ten years, I have never been a TEX evangelist. appearance that it encourages (via macro program- More importantly, while I am convinced that using ming or use) allows one to focus more easily on TEX my own way is one of my best professional content, structure, and style. moves, I have never advised anyone to develop his To my surprise, many a TEX user I talked to or her own package from scratch, let alone use admitted to resorting to direct-manipulation appli- mine. Some clients, who do notice the superior cations for short, less-structured documents (letters output TEX allows me to produce, have asked what being the typical example) because “It’s so much software tool I was using in order “to buy it for easier.” I wonder what, specifically, they find easier. themselves”. I have to disappoint them, telling Letters—business letters, that is—may be seen, them my “tool” could not really be bought. Other if not as one large, structured document, at least clients, who had heard about TEX, asked me for as a uniform collection of documents. Consistency, the documents’ source files, so they could convert therefore, is as important to the 250 business letters them automatically to HTML. How could their I write every year as a one-off long report. Such HTML converter know, for example, that my TEX a level of consistency, it seems to me, is more command \Cs[137] produces 137Cs? easily and more reliably achieved with TEXthan Using TEX in the real world, where time and with direct-manipulation software. As an example, money matter much, may require a dedicated TEX the instruction \JL99001 TME/MDe typesets a letter wizard. A well-oiled macro package may save con- header with my company’s logo, my name, the date siderable time and money by yielding consistently (in the proper language), the letter’s reference num- beautiful documents fast. It may, however, not ber (here, 99001), and the complete address block account for the admittedly limited but nonetheless of my addressee (here, person MDe from company important special cases. In those cases, real-world TME, as specified in a data file). users may want to call upon a TEX wizard, some- The major perceived obstacle to using TEX, times on short notice. How severe a limitation to which some comp.text.tex members rapidly this is depends on management strategy. Learning pointed, is its steep learning curve. TEX, or rather TEX, in my case, certainly paid off, but it was TEX packages, are often quite immediate to use. quite an investment, one that was eased by personal Despite a marked aversion for computer program- motivation but that may not make sense from a ming, my business partner used my TEX macros pure business point of view. Still, it has given me very readily, even without the user manual I never an edge in many demanding professional cases, in got around to writing. By contrast, TEX, or rather which I can—actually—do what others can not. the fine programming of TEX, is not that immedi- I may remain a lone traveler, but my mind ate to learn, as confirmed by Knuth’s “dangerous is made up: I will go on traveling light, going bend” signs in The TEXbook (1984). The same anywhere I please, and resting assured I’ll get there. business partner was often at a loss when some- thing unexpected occurred: the mental paradigm Reference she had built through practice was insufficient to The T Xbook understand those cases. Knuth, Donald E., E . Addison-Wesley, Reading, Massachusetts, 1984.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 205 Jean-luc Doumont
Figure 1 (next two pages) The two pages on our using TEX and PostScript, out of a short document explaining our approach to our clients.
ecause we strive for the highest quality, we invested in the best typesetting tools. All the paper documents we produce therefore B rely on two highly acclaimed standard codes: TEX and PostScript. TEX is a typesetting system developed by Donald E. Knuth at Stanford University for “the creation of beautiful books—and especially for books that contain a lot of mathematics.” PostScript is a standard page-description language developed by Adobe Systems Inc. “for imaging high-quality text and graphics.”
Software tools Taking advantage of their complementarity, we use TEX and PostScript in combination. We use TEX programming and encoding for all aspects of typesetting (arranging type on the page). We use PostScript programming and encoding for elements other than type, such as drawings and graphs. For simple illustrations, we write PostScript code directly. For complex ones, we use PostScript-oriented Adobe products: Illustrator™ for vectorial descriptions such as drawings, Photoshop™ for pixel descriptions such as sampled images.
Software Since we go from scratch to final pages, compatibility issues are few, if any. As compatibility input from clients, we favor the simplest form of all: unformatted ASCII-encoded text or data (often referred to as “plain text”), readily obtainable with most software today. And as output, we convert the whole document to PostScript code, for printing or imaging on a PostScript device. Our clients, in other words, do not have to know anything about TEX to work with us.
Advantages Programming TEX and PostScript—a route on which few professionals venture— of own tools gives us three unequaled advantages over using off-the-shelf software: our work is fast, flexible, and reliable. First, we know our tools in and out, and are not slowed down by countless unnecessary options: we travel light. Second, whenever a feature seems to be missing, we program it: we can go anywhere. Third, whenever a feature does not work as we intended, we can and do fix it: we’ll get there. We can therefore guarantee a well-done job by a certain date, without fearing that the software irremediably let us down at the last minute.
Freedom has its price: programming TEX and PostScript requires skill and dedication. Skill we have acquired through a high-level technical education, then refined through ten years of practice. Dedication is fueled by our (and our clients’) satisfaction with the results. Though we seldom recom- mend this approach to others, we will continue to develop our own tools, while guaranteeing full compatibility of the output pages with printing devices.
206 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Doing It My Way: A Lone TEXer in the Real World
The beauty of TEX TEX (pronounced “tech” or “teck”) is not so much a word processor as a programming language, complete with typed variables, block structure, executable statements, and a powerful macro facility. The TEX compiler turns a one-dimensional source program into a two-dimensional typeset page. Below is an example of source code (right) and resulting output (left).
50 \draw[\white]{8\pc}{8\pc}
40 r 2 =0.93 \xrange03{10} \xaxis01{10}34 \yrange0{50}1 \yaxis0{10}151 30 \markerdata
20 0.034 6.1667 0.051 13.833 ... \enddata \linedata 10 0.0 4.45 0.3 52.28 \enddata \an{0.2}{35}{$rˆ2=0.93$}rb 0 0.00.10.20.3 \enddraw
TEX’s advantages are numerous. Dedicated to high-quality typesetting, it pro- duces the best output available today, especially for such tasks as kerning lines, hyphenating paragraphs, and displaying mathematics. It allows accurate posi- tioning (better than 1 µm) and can tackle virtually any language, in any alphabet. As a clearly defined language, it is platform-independent and stable over time, allows a high level of automation and extension, nicely separates contents from format (thus encouraging logical design, rather than visual one), and works with plain-text source files (smaller, easier to manipulate, edit, transfer, and compress, and much harder to mangle by accident than word-processor files).
The example graph above exemplifies some of TEX’s advantages. The plain-text code is compact and compiles fast to produce a consistent graph. Typeface, tick lengths, and line widths are defined at the top of the document, and apply by default to all graphs. The graph is exactly eight interlines high, and is 1 shifted down by /4 of an interline; bottom labels are aligned to a line of text, contributing to the overall harmony of the page.
Because typesetting is a complex process, so, unavoidably, is TEX. While merely using existing macros is simple—much simpler, in fact, than using a word processor—writing one’s own set of macros is not. Though we would personally settle for nothing less, we consider TEX a tool for professionals and a priori do not recommend its widespread business use around the office.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 207 The vulcan Package: A Repair Patch for LATEX
Peter Flynn University College Cork Computer Centre pflynn@imbolc.ucc.ie http://imbolc.ucc.ie/~pflynn/
Abstract
Over many years, TEXandLATEX systems have proved successful in providing high quality, affordable typesetting from the desktop. This success has tended to disguise some of the less felicitous design decisions made in good faith in earlier days. These are now making it harder to keep LATEX-based systems in line with user expectations, as the defaults remain rooted in a document model and a rendering which reflect the period of LATEX’s genesis. Of the thousands of LATEX packages, many have already tackled the allied problems of providing better formatting and configuration management and of adding new formatting features. This paper presents an attempt to tackle three of the more deep-seated problems: providing an improved document model, an updated rendering, and some fixes for what many users perceive (often wrongly) as bugs. The document model is compared with current practice elsewhere in the text field, and the fixes answer some of the most common user requests from the various FAQs and questions asked on ‘comp.text.tex‘. The changes are presented as a standard LATEX2ε package which will be submitted to the CTAN when testing is complete.
Background Equally obvious even to the casual user is the lack of some of the most basic features expected by When Lamport introduced the LAT X structured E publishers, such as provision for author affiliations documentation system in 1983, the T X world en- E and subtitles, and document controls like submission tered a new era. There was now a standard set and approval dates. of commands (macros) for processing some common The concept or model of ‘a document’ which document types, which could be learned very sim- LAT X represents is a curious hybrid. Early markup ply, ‘even by academics and business types’, as an E systems tended to regard the document as sequences esteemed colleague once phrased it. People who of text blocks (paragraphs, lists, sections, chapters) wanted acceptable quality typesetting without hav- separated by headings which applied to all text that ing to learn the low-level formatting hitherto used by followed, until the next heading. The end of a sec- dedicated typesetters could now obtain it, and LAT X E tion was thus implicit in the occurrence of a new rapidly became a de facto standard for laboratory heading. More recent models, notably SGML,tend and other research documents, both in academia and to regard the document as composed of textual units business. which are hierarchically ‘containerized’; that is, they However, as Lamport himself acknowledged dur- have explicitly marked start- and end-points1 and ing his informal presentation at the Santa Barbara subsections lie physically within the bounds of their TUG meeting in 1994, the default typographic styling parent section. LAT X by default uses parts of both of the article, report, letter,andbook document classes E models, reflected for example in its uncontained sec- leaves something to be desired. The best that can be tioning and in its use of bounded environments. said about them is perhaps that they formed a stan- dard at a time when no other was available. Regu- lar readers of the ‘comp.text.tex‘ Usenet newsgroup will be familiar with the frequently asked questions 1 The quite separate issue of markup minimization for about how to change the default styles. convenience is not relevant: the reference is to fully normal- ized markup.
208 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting The vulcan Package: A Repair Patch for LATEX
Development. For ten years the concept of all but Changing this basic provision of markup is not the most trivial changes to the default styles was possible as it is frozen, and too many documents rely beyond most LATEX(andevenTEX) users, who had a on the continued existence of the LATEX defaults. job to do and were not prepared to commit precious One solution to this is the development of transpar- time to making changes in largely undocumented ent ‘drop-in’ styles which provide solutions to the baseline code. The customization features of LATEX problems without affecting anything else. This al- 2.09 were not extensive, and yet it is a tribute to the lows existing files to be processed without changes robustness of Lamport’s work and the dedication of to the markup, and allows new files to be created hundreds of volunteers that many macro packages with markup which is 100% backward-compatible. were written to extend its functionality. The Vulcanize project,3 and the vulcan pack- 1993–94 saw the arrival of LATEX2ε, intended as age which it is developing, grew out of the author’s an interim solution to some of the problems while work in supporting LATEX in a research environment work on LATEX3 continued. The second edition of and also using it in a commercial production envi- Lamport’s book appeared (1994), updated for ronment. The simple practicalities of this made it LATEX2ε, and also the compendious and wonderful clear that while perhaps 85% of style writing could LATEX Companion, which provided for the first time be handled by existing packages and options, there a substantial corpus of in-depth detail on the inter- was a repeated necessity for some common features nals of the system, as well as copious information on which were not well addressed by the standard pack- packages and macros. ages. The present macros are therefore aimed at fixing these aspects by supplementing the standard Structure. Nothing was done, however, about the LAT X defaults with three additional features: level of markup detail or the default appearance of E the document styles. This is understandable when 1. extra commands to support many of the re- quirements for ‘logical’ markup (Lamport, 1988), LATEX is viewed from the point of view of a standard, but it is regrettable, as it is the visual appearance not a few of which are derived from the experi- of the default styles which is the very aspect of the ences of SGML and XML system which draws the most criticism, and is per- 2. new formats for the default document classes haps second only to broken markup as the reason which become operational without any change which has contributed most to the poor image and to existing markup, drawing on practices and take-up rate of LATEX outside the scientific and aca- developments in typographic design over the demic fields. last two decades or more A second factor in the use of structured doc- 3. document design support for authors and others umentation systems also came to the fore in the wanting to make changes, based more closely on decade before LATEX2ε.TheSGML standard2 grew the way the typesetting and publishing industry in use (and slowly, too, perhaps for reasons not un- works A like those which faced LTEX). The potential for The initial stage of work planned for the project is to a system which could separate content almost com- cover the first item above, and part of the second. A pletely from appearance was attractive to many later stage will tackle the more difficult problems of LATEX users, and those who worked on LATEX2ε were ‘customizable customization’ implied by the third. very much aware of the existence and growth of It has been noted that while the additional pro- SGML. Users, too, were not unaffected by the grow- visions of vulcan are in themselves non-standard, ing popularity of one rather small and restricted ap- there is an opportunity at this stage to try and har- plication of SGML called HTML (HyperText Markup monize some of the work being done by publishers Language). and others, especially for the front matter or meta- The Vulcanize project. In concluding that there data, and that the LATEX3 project may be a suitable are areas of LATEX’s implementation which need at- forum for discussion. tention, there is no suggestion that LATEXitself— Other work. There have been many packages which ‘ latex.ltx‘, for example—is in any way ‘bust’, so enhance the default TEXandLATEX styles, and some there is no implication that the core concepts of of them also provide alternative methods of specify- LATEX itself need to be ‘fixed’. It is contended, how- ing changes to the styles. Among the best known ever, that LATEX’s default styles and provision of are probably blu, lollipop,andkoma. The last in markup are, at the very least, sub-optimal. 3 As was noted in the author’s recent column in TUG- 2 Standard Generalized Markup Language: ISO boat(Flynn, 1998), the reference is to the vulcanization of 8879:1985 (Goldfarb, 1990). rubber being used to seal leaks.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 209 Peter Flynn
particular is of interest, as it implemented, for the sugar’6 for every variation of formatting, the current first time in a TEX system so far as the author is contention is that the default styles are seriously de- aware, the page layout parameters favored by Jan ficient in their provision of markup and that this is Tschischold in later life (Tschichold, 1935). These a significant stumbling block to the advancement of have been borne in mind, although not directly used, LATEX as a serious publishing solution. in the layout changes implemented in vulcan. One of the greatest problems of acceptance for It is no part of this project to detract from these LATEX is that after all the persuasive sales pitch has packages, all of which provide much-needed addi- been made about the importance of proper markup tional features. On the contrary, it is an explicit and identification, and how portable it makes your aim to supplement them by addressing a related but LATEX, the new user is expected to perpetrate code distinct set of problems. like that in Figure 1 because the default document classes do not provide for anything else. Commands in vulcan This section deals with the primary stage of the project, which is to develop additional commands to Figure 1: Traditional methods of article A support a greater use of logical markup. The great- attribution in LTEX est part of this deals with the metadata found in doc- \title{Read Once, Write Many\\ ument headers and title blocks. Although the cur- {\Large Disambiguating multi-author rent proposals do not relate directly to those of the attributions in Restoration comedies}\\} \author{Mae West$^a$\\ Dublin Core4 or ISO 11179,5 the author is monitor- Mae East$^b$\\ ing developments in this area, and suggestions from Wile E Coyote$^a$\\ catalogers as well as publishers have been sought \\ in relation to the need for markup to hold ISBNs, {\small $^a$ University of Short ISSNs, CIP blocks, etc. Island, NY}\\ {\small $^b$ Chicken College, RI}} Titling. LATEX’s \maketitle command creates a title block (or title page, if the titlepage option to Read Once, Write Many the current class is in effect), using the values sup- Disambiguating multi-author attributions in plied in the immediately preceding \title, \author, Restoration comedies and \date commands. If the beginner omits one or Mae Westa b more settings, there are defaults or error messages Mae East Wile E Coyotea provided by LAT X. E a A University of Short Island, NY These commands in standard LTEX accommo- b Chicken College, RI date only the most basic of document identification, and require sometimes extensive additions to handle the requirements of publishers, institutions, or cor- The problem has long been recognized, and many porate standards. Code such as that in Figure 1 is publishers provide style files for their own journals all too common, and is largely due to the fact that (TUGboat is no exception), but these are clearly the only way in the standard classes to effect a sub- very specific solutions, as they address the immedi- title is to bind it typographically to the title, and ate disambiguation requirement for formatting. It is the only way to effect a multi-author representation clear that as publishers slowly move towards more is to bind it in a similar manner to the \author com- productive use of SGML, the format dependencies mand by formatting. While it can be argued, cor- are also slowly changing, but in the common case rectly, that the default styles can be used to achieve of the author writing a document when a suitable footnoted affiliations, for example, the problem is publisher has not yet been identified, many current that they are not called that: they have to be called formats are simply not useful. \thanks, which is misleading, especially for the be- For the three basic commands implemented by ginner. While it is not possible to provide ‘syntactic the standard classes, the vulcan package provides a replacement default title (‘Untitled Document’) and 4 A proposed methodology for adding metadata to HTML author (‘Anonymous’), and modifies LATEX’s date files to aid retrieval on the Web. 5 default to use the day–full-month–year format. In A standard proposing Registries to disambiguate the names used for data elements. This would allow context- addition, it provides for an optional subtitle, and sensitive searches to take place without the user needing to know what name an information provider has used for a par- 6 An alternative way of expressing something; strictly un- ticular class of information. necessary, but designed to keep people sweet.
210 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting The vulcan Package: A Repair Patch for LATEX author affiliation details (job title, department, or- Figure 2: Sample title page in vulcan. ganization, email address, website, and author bio) \title{Encoding Boole’s Algebra} for multiple authors in such a way as to make the \subtitle{(Markup and Semantics) or \\ data more congruent with that required by many (Markup and Presentation)} publishers. This avoids the need for users to add \author{Peter Flynn manual formatting to achieve a reasonably accept- \affil{University College Cork} able generic article format. As more style files for \office{Computer Centre} known publishers’ journals or series are developed, \email{pflynn@ucc.ie}} it is hoped that a common core of similar markup \author{Angela Gilham will emerge: one of the known barriers to acceptance \affil{University of Oxford} \office{Computer Laboratory} of LATEX by publishers is the need to make a signif- icant investment in programming in order to realize \email{angela.gilham@comlab.ox.ac.uk}} \conf{ALLC/ACH 2002} the required markup. \journal{Markup Languages}\vol5\num4 Author details, if they appear in the title block \submitted{31 December 2002} at all, usually occur in one of two places: a) im- \maketitle mediately after each author; or b) grouped together 5 4 after all the authors, and referenced with a raised Markup Languages, : footnote letter. To achieve this in vulcan with mini- Encoding Boole’s Algebra mum effort on the part of the author, the positioning (Markup and Semantics) or of the author details within the markup is used to (Markup and Presentation) determine where to place them. Peter Flynna Angela Gilhamb • author details which are included within the scope of the \author command argument (i.e. aUniversity College Cork Computer Centre before the closing curly brace of the author’s (pflynn@ucc.ie) bUniversity of Oxford Computer Laboratory name), will be printed ‘footnoted’ in a block (angela.gilham@comlab.ox.ac.uk) after all author names are complete Submitted: 31 December 2002 • author details which follow the closing curly brace of the \author command (i.e. they lie after each \author command), will be printed as they stand, separately after each author formats for this, some attaching the affiliation to the An example of the first layout is clearly seen in Fig- first author’s name, some attaching it to the last, ure 2. The information which can be provided about and some using footnote-style raised figures in mul- each author is defined below (some syntactic sugar tiple occurrences. There needs to be more discussion has been added to make it more context-sensitive): about how to make it easiest for the writer or editor \jobtitle Synonyms are to assign affiliations to author names. \position and \rank. Markup is also provided for the identification of \affiliation A synonym is publication-oriented metadata. This often occurs as \institution,and a running header, at least on the title page, so it is the abbreviations implemented in vulcan in that position as left-hand \affil and \inst also and right-hand heads: work. \copyrightholder, A copyright notice \office A synonym is \copyrightdate (holder and year); left \department,andthe head. abbreviation \dept \journal, \vol, \num Thenameofthejour- also works. nal or other publica- \address Postal address. tion, if relevant (with \email and \webpage Email address and volume number and is- Web site, if any. sue number); left head. \blurb This is the author’s \conf, \sponsor Thenameofthecon- short-form biography. ference or event (and The arrangements for handling multiple authors the funder or sponsor- sharing one affiliation is undecided at the time of ing organization), if rel- writing. Different publishers employ very different evant; right head.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 211 Peter Flynn
Omission of date information works as for standard and small caps for author names, and smaller type LATEX (today’s date). Extra commands are provid- for author details, dates, and acknowledgements. ed to identify progress through the editorial process: However, the header layout is just an instance \submitted and \accepted are more commonly of the \maketitle macro, so it can easily be altered used by authors, and \received and \reviewed are to reflect a new design, without the need to bor- more commonly used by editors. row heavily from the class file. The remaining work An \ack command is supplied to identify fund- on parameterization (see the section “Conclusions”) ing sponsors or acknowledge other assistance. will address this. In all cases the order of the commands is not Indentation. LAT X’s automatic \noindent after important, except that author-specific data must ac- E section heads can be extended to new paragraphs company its owner directly, in one of the two places following lists, quotations, and so on by using the explained above. trapindent option. This removes an unpleasant vi- Sectioning and structural markup. The major- sual imbalance where a new paragraph indent would ity of LATEX’s default sectioning structure and ap- follow the end of the hanging indentation of a fore- pearance has been retained, but some aspects which going list. The flushquote removes the intrusive are the cause of numerous FAQs have been changed. initial indentation in the \quotation environment. The sanshead option sets all section heads, ta- Extended cross-references. Extended cross-ref- ble and figure captions, headings on front- and back- erences have been implemented for parts, chapters, matter sections, and description list topics in the sections, tables, figures, examples, (and lists, where current default sans-serif font. All display section possible), so that it is no longer necessary to name heads are set raggedright at all times. the class of object you refer to. If a reference is The concept of a chapter has been removed to a figure, then the word ‘Figure’ will be inserted from code which is applied in the case of the re- automatically. This is similar to the references seen port class, as reports very rarely have chapters.7 On in T Xinfo,8 which are fully explicit, giving section those occasions when a very large report in chapters E name and number as well as the page reference, and is needed, the chaprep option restores the original is already provided for in some class files, as prefixes use of chapters. This not only conforms with ob- for counters. The objective is to free authors from served practice, but solves an important annoyance the need to carry such details in the head, and to for the new user. allow text containing labels to be moved from one A new float type, example, has been added, environment to another without the need to conduct using the float package of Anselm Lingnau. a manual search and check for referential integrity. Example 1. Example of an example The \ref and \label commands have been mod- ified to handle the relevant additional details, so vul- The semantics of example markup require some care, can files will continue to work with standard LATEX as it has been noted that in North America, this in this regard, except that the auto-naming will not term appears to be taken to mean exclusively ‘ex- occur. Work is ongoing to eliminate some minor ample of verbatim programming code’, whereas in conflicts with other referencing packages. Europe it can be any kind of example of anything, depending on the context of the document, from the In-line lists. One class of lists which is inexplica- 9 brushstrokes in a painting by C´ezanne to the right bly missing from LATEX is the in-line list. These are way to boil a floury potato. extremely common in all classes of document, where they a) provide informality, b) consume less space, Formatting and c) enable a list to form a part of a complete sen- tence. They have been implemented in vulcan in the This work forms part of that identified as the second same way as other lists (ie as an environment), thus item in the section “The Vulcanize project”. \begin{inline}, \end{inline},and\item can be used inside a paragraph. Title block. The layout of the title block or page There is an optional argument to affect the num- has been radically altered, as described in the pre- bering style, using the same values as for section vious section: see Figure 2. Instead of the classic numbering (alph, Alph, roman, Roman,andarabic), centred design, vulcan uses a flushleft default lay- 8 out, with sans-serif document title and subtitle, caps A plain TEX documentation package common on Unix systems 7 It is unclear how chapters ever got into the report doc- 9 As this goes to press I notice the new paralist package ument class in the first place. which also implements these lists.
212 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting The vulcan Package: A Repair Patch for LATEX with a default of italicized lowercase letters (other Lamport, Leslie. “Document Production: Visual or styles are upright). This argument can also be given Logical?” TUGboat 9(1), 8–10 (1988). in the \usepackage{inlist} command if invoked Tschichold, Jan. Designing Books. Benno Schwabe, separately, to preset the default. Basel, 1935. Quoted in John Lewis, Typography: Basic Principles, Studio Books, London, 1963, Hanging indentation. Hanging indentation has p. 39. been introduced on footnotes by default, using the macro suggested in The LATEX Companion (p. 73). Conclusions Some parts of the vulcan package have been in use in the author’s work area for many years. Parts of the current version of the vulcan package have there- Sonnet from the ill-at-ease. fore already been used to typeset papers and books, Oh what a tangled web we get and work is continuing on improving robustness and when first we practice to typeset. functionality, and on adding compatibility with the And so we read TEX master Knuth darker recesses of the book and report classes. Hoping to Obtain some truth. The objective of codifying these macros was to Or else we read through the book of Spivak A provide a reliable escape route for LTEX users who but end up simply crying Alack! needed a better default presentation format than is And then we go and join the TUG provided by the basic installation, and at the same trying to make our process chug. time to explore the possibilities for creating a new Beginners know why the rhyme for TEX suite of document class style implementations to is often the frustrated Blecchh. A provide new LTEX users with some visual justifi- I can’t even get a .dvi file, cation for their placement of faith. It can be an and you think TEX to HTML will make me smile? unpleasant experience for someone whose only ex- perience of text processing has been with Word or And yet I know if I abandon the Lion, WordPerfect (or worse) to discover that so much With any other package I’ll surely be cryin’. A manual labor is needed with LTEX when its pro- —Joseph Haubrich ponents have been expounding its superiority.10 Some experiments have been made with full pa- rameterization for all the layout details in the new header, by way of preparation for the next phase The Hidden Passions of Mathematicians of the project. This would allow very simple cus- Step into the garden of conjectures and see tomization, but the very large number of LAT X E my Julia sets are uniformly perfect. ‘lengths’ (T X’s \dimenn registers) needed for all E Forget your nilpotence and steenrod algebras, controls would place an unreasonable burden on the my theta divisor is very ample. use of subsequent packages. The project is open to In this land of lemmata, suggestions about how to achieve a lighter weight you’ll glide with the smoothness of Kelley solution. while I, I’ll gather the perverse sheaves References and quivers, and we’ll dance ’til our zeta functions converge. TUGboat Flynn, Peter. “Typographer’s Inn.” Sipping modular moonshine, we’ll reach 19(1), 353–355, (1998). the highest eigenvalue without effort. Goldfarb, Charles. The SGML Handbook. Oxford, In this holomorphic vector field Clarendon Press, 1990. with totally degenerate zeroes, Goossens, Michel, F. Mittelbach, and A. Samarin. we may even discover the essence of chaos. A The LTEX Companion. Addison-Wesley, Read- —Debra Kaufman ing, MA, 1994.
10 The words ‘mote’ and ‘beam’ seem appropriate here.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 213 New Interfaces for LATEX Class Design: Parts I and II
Frank Mittelbach LATEX3 Project frank.mittelbach@latex-project.org
David Carlisle LATEX3 Project david.carlisle@latex-project.org
Chris Rowley LATEX3 Project and Open University, UK chris.rowley@latex-project.org
Introduction to be implemented in a declarative way, by simply choosing suitable templates and supplying values for Traditional LAT X class files typically implement one E their named parameters. fixed design via ad hoc, and often low-level, (LA)TEX code. This style of implementation makes it much Spin-off technology harder than is either desirable or necessary to pro- duce classes that implement a specific visual design. Whilst applying the idea of templates to document A Moreover, the construction of such classes typically design in LTEX we have had the opportunity to involves a lot of work that is essentially program- substantially rethink many of the basic concepts of A ming and thus does not live easily with the declara- LTEX’s formatting machinery. This has led to the tive kind of design specification for a document (or development of major enhancements in the following A range of documents) that would be produced by a parts of LTEX. professional typographic designer.1 Paragraphs: There will be a completely new model This work introduces some extensions to LATEX and design interface for all aspects of paragraph- that will help to provide a new, more declarative in- making, including: the parameters that con- terface that can be used in class files. It is based on trol TEX’s hyphenation and justification sys- the idea of a template, which describes how to carry tem; special typographical treatment of the be- out some action but which provides some flexibil- ginning and end of paragraphs, e.g., initial let- ity since its code uses the values of a set of named ters/words (lettrines), nested run-on headings, (keyword) parameters. The specific design for this etc. action, as required for a particular class, is then se- Galleys: The paragraph model will be linked to a lected by choosing values for the template’s named new model for the construction of galleys from parameters. paragraphs and other material; this model will incorporate current standard LAT X concepts Plans E such as logical labels, marks and colour-change The plan is to provide standard templates for a wide nodes, together with more experimental objects range of typographic objects but, of course, new such as hyper-information nodes. templates for new ideas can be created, possibly by Floats: These elements have undergone a major re- adapting an existing one or by a little LATEX pro- development: gramming. It is our firm belief that there will soon Captions: The formatting and positioning of be a large range of templates available and that it the caption can be decided individually will thus be possible for the majority of class files for each float and can depend on exactly 1 where on the spread it appears. [This is an up-to-the-minute report from the LATEX3 team; a written version will follow in a future issue of TUG- boat.–Ed.]
214 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting New Interfaces for LATEX Class Design: Parts I and II
Position: The position specification allows Post-Conference Update changes if the float does not fit on the cur- The slides from the TUG’99 presentation of the talk rent page. we gave on a new interface for LATEX class designers Pages: A More flexibility and better specification are available from the LTEX Project website; look of both individual float pages and sequen- for the file tug99.pdf at: ces of float pages (e.g., at chapter ends), http://www.latex-project.org/talks/ including more possibilities to choose whether a text page or float page should The slides are also available from the TUG99 website be used. http://www.tug.org/TUG99-web/pdf Margins: Better integration of floats and marginal in: material, allowing floats to appear in margins carlisle.pdf, and for text to be changed according to which mittelbach.pdf,or margin is used. rowley2.pdf. Alignment: Better support for alignment between (All three have the same contents.) ‘minipages’, specified using logical handles such Please note that the accompanying notes were as ‘top centre’, ‘centre left’ or ‘first baseline only intended to be informal “speaker’s notes” for right’: the relative positioning of two boxes our own use. We decided to make them available (with such handles) is done by choosing a han- (the speaker’s notes as well as the slides that were dle on each box and the (2-D) offset between presented) because several people requested copies these handles. after the talk. However, they are not in a polished Page layout: More types of pages and spreads with- copy-edited form and are not intended for publica- in a document; more layout choices and param- tion. eterisations. Prototype implementations of parts of this in- terface are now available from: Document commands: New tools for providing http://www.latex-project.org/code/ document-level syntax. experimental/ Professional support: For editorial and page We are continuing to add new material at this lo- make-up processes currently used in the pub- cation so as to stimulate further discussion of the lishing industry: e.g., flexible manual control underlying concepts. As of December 20, 1999 the over positioning of floats, etc. in the final form following parts can be downloaded.2 document; automated handling of complex bib- liographic information in the front-matter of xparse This module contains the prototype imple- journal articles. mentation of the interface for declaring docu- ment command syntax. It supports the defini- These are all, of course, still severely limited tion of user commands with a LAT X2ε inter- by what is practical within current T X; for exam- E E face including star forms, optional arguments, ple, precise control over page-breaking within para- and picture mode arguments. See the .dtx files graphs is simply unobtainable using T X’s standard E for documentation. It is possible to use this in- mechanisms. Nevertheless, we hope that what we terface independently from all other modules have been able to do will inspire others to use the described below. tools we provide in the creation and use of high- quality typographic designs. template This module contains the prototype im- plementation of the template interface. Togeth- Presentations er with xparse it forms the basis of all further modules, i.e., to make use of any of the other The two presentations explain these concepts and modules you need both. show examples of their use, covering both the cur- A The file template.dtx in that directory has rent standard LTEX designs and some more exciting new possibilities. a large section of documentation at the front We demonstrate working examples of the ap- describing the commands in the interface and plication of these ideas in most of the major areas 2 Please remember that this material is intended only of document design, including page layout, section for experimentation and comments; thus any aspect of it, headings, lists and captions. e.g., the user interface or the functionality, may change and, in fact, is very likely to change. For this reason it is explicitly We also report on the current status of this forbidden to place this material on CD-ROM distributions or work and on our plans to complete and publish it. public servers.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 215 Frank Mittelbach, David Carlisle and Chris Rowley
giving a ‘worked example’ to build up some xlists This module documents and implements tem- templates for caption formatting. plates for providing various kinds of list struc- xcontents This module contains the interface de- tures. As part of the included examples it will scription for table of contents data, describing contain a full set of LATEX2ε lists compatible both an extended data model to replace the in design to the article class implemented as data deposited by heading commands into a instances of the templates provided. The pro- .toc file and a number of template type de- totype code for this module is finished but re- scriptions for manipulating such data. There is quires services from galley2. currently no code provided to produce specially xinitials This module implements a template for formatted table of contents; however, usable ex- paragraph initials. An example of its use can amples for the templates have been thoroughly be admired in the slides of our talk. Since it discussed on the latex-l list, which can be re- too relies quite heavily on galley2 it is not yet trieved as explained below. released. xfootnote This module contains documented work- All examples are organised in subdirectories and ad- ing examples for generating footnotes, etc. It ditionally available as gzip tar files. implements some of the functionality of the These concepts, as well as their implementa- package footmisc by Robin Fairbains and pro- tion, are under discussion on the list LATEX-L.You vides, although still incomplete, a good intro- can join this list, which is intended solely for dis- duction to the usefulness of templates in class cussing ideas and concepts for future versions of design. Due to the fact that support modules LATEX, by sending mail to for paragraph manipulation, etc. are still un- listserv@URZ.UNI-HEIDELBERG.DE der development, most of the implementation containing the line should be considered as a first draft only. Modules currently under development include the SUBSCRIBE LATEX-L Your Name following. Some if not all might be publicly available This list is archived and, after subscription, you can TUGboat by the time you read this issue of . retrieve older posts to it by sending mail to the above galley2 This module will document a new data address, containing a command such as: structure for galley-related information, i.e., a GET LATEX-L LOGyymm data structure to better support handling of where yy=Year and mm=Month, e.g., inter-paragraph material. Beside implementing lower-level programmer mutator functions for GET LATEX-L LOG9910 this data structure it will provide higher-level for all messages sent in October 1999.3 templates for declaring paragraph shapes and H&J (hyphenation & justification) specs. Most other modules will depend on its services as paragraph handling is needed for most aspects of layout. The code of the second prototype implementation for the galley data structure is 90% finished and it is targeted for a release date in 1999.
3 No, we don’t know whether or not the listserv software is Y2K-compliant.
216 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Multi-Use Documents: The Role of the Publisher
Kaveh Bazargan Focal Image Ltd. 2 St John’s Place St John’s Square London EC1M 4NP kaveh@focal.demon.co.uk
Abstract Publishers now expect a variety of electronic files from the typesetter, both for publishing, and for archiving. And yet, the publishing industry, by and large, still follows the traditional “manuscript/edit/typeset/proofread/print” approach. The process is still essentially paper-based. The typesetting is also geared pri- marily towards paper output. I suggest that we are in need of radical changes to these procedures so that files can be used to produce output on any medium, including paper, and I believe that these changes will benefit all concerned—authors, publishers, and typesetters. And clearly TEX is an ideal medium to hold the definitive text in modern typesetting. Not only can it be directly edited by the author, but it can produce every type of output required directly. These include paper, PDF, SGML, HTML, XML,etc.
Recent changes in typesetting author There are two reasons why files submitted by manuscripts authors should be used in typesetting a manuscript: Some ten years ago, the process of typesetting ma- • By using the author’s original code, typographic terial for publishers did not, in general, involve any errors can be minimized. TEX documents are form of electronic files, whether for typesetting or often complex mathematical or technical ones, archiving. The author submitted a paper manu- and proofreading them is a difficult task. It script which was copy edited and sent to the typeset- therefore makes sense to use the author input ter for keyboarding, conventional proofreading, and where possible. setting to bromide. The procedure had evolved over • If the author has used LATEX in a structured decades, and if each party performed their roles and manner, then there may be a significant labour knew their responsibilities, the process worked well. and therefore cost saving for the publisher. During recent years, the well-defined roles of the three parties—the author, the publisher, and Electronic files required from the typesetter. the typesetter—have been blurred, due to electronic When publishers first requested electronic files from files, including those submitted to the publisher from typesetters, it was limited to PostScript files. These the author and those requested by the publisher from were used firstly in order that the printer could pro- the typesetter. When we look at current procedures duce high-resolution CRC to print from and, sec- employed by publishers, we see that in the main, the ondly, for archival purposes. Soon afterwards, PDF traditional manuscript-based approach is still taken, files were requested, having the advantage of smaller with the electronic files being treated as an after- file size and screen viewability. After discovering the thought, once the paper camera-ready copy (CRC) joys of hypertext links in PDF files, some publishers has been completed. requested that these be included for citations, fig- I would like to examine the use of electronic ures, tables, and sometimes to external URLs. Next files, and propose some changes in the traditional came HTML and SGML, either for the full text or procedure which should benefit all three parties. for abstracts and/or references. No doubt next in line will be XML,MathML, and who knows what Why use electronic files from the author? For will come after that. Gradually, these requests have the purposes of this discussion, I shall confine myself increased the workload of the typesetter, often with- to T XandLAT X files submitted by the author, E E out any increase in prices charged. although the principle applies to other file types.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 217 Kaveh Bazargan
Using LATEX to produce electronic files. For- They could also insist that all outputs, whether tunately, we find that we can use LATEX itself, in con- electronic or paper, should emanate from the same juction with many programs which are in the public source code. This will guarantee uniformity in con- domain, to generate all the electronic files required. tent of the different outputs and minimize delays for ItistheopencodeofTEX and that of the auxiliary last-minute changes. programs written by third parties that allow us to Distribute an author kit By being provided with use LAT X in this way. The electronic files that can E such tools, the author is encouraged to submit a be produced in this way include full SGML files. An well-structured LAT X document. Here is what the important advantage is that there is only one source E author kit could contain: code and the chances of differences between differ- • A ent electronic versions are minimized. Most of the the LTEX manual [1]: this is a small investment tools needed for producing the electronic files are, which I believe will encourage the user to follow in fact, in the public domain, with the source codes the standard)—it would only apply to book available. This means that they can be customized authors to produce specific electronic output, for example • an authors’ guide, based on the same user friendly SGML output for a particular DTD. approach as Lamport’s manual • a generic class file for authors only (see below) The publisher’s role • a fully working example file with accompanying Here are what I think the publishers should be doing hard copy to help deal with electronic files better. Produce an ‘Editors’ Guide to Electronic Sub- Take a more active role in encouraging good missions’ Book manuscripts are normally sent to LATEX submission. In order that files prepared by freelance copy editors before typesetting. The copy the author can be used easily for multiple outputs, it editors are generally unaware that there are files ac- is essential that these files are coded with document companying the text, let alone that the files may structure in mind and the coding is completely log- be TEXorLATEX. It would be useful to produce a ical, with little or no visual encoding. LATEX, used short non-technical guide for copy editors, in order correctly, is a very good authoring environment for to make the process of copy editing smoother and to this type of coding. reduce the number of marks made on paper. Points It is my feeling that the acceptance of TEX files that should be addressed in this guide would be the by publishers has been under the pressure of au- following: thors, rather than being initiated by the publisher. • Automatic cross referencing:Itisusefulfora With a few notable exceptions, publishers have sim- copy editor to know if cross references for cita- A ply passed on any TEXorLTEX files received from tions, equations, etc. are generated automat- authors onto typesetters, in case they can be used. ically. If an equation is deleted, for example, By supplying the author with an “author kit” (see they can simply ask for the rest to be renum- below) authors will gain confidence in the publisher bered appropriately, rather than marking each A and will be encouraged to submit LTEX manuscripts occurance, which is the usual practice. They according to the publisher’s requirements. should also be warned of the reasons for mys- Recognize the extra care needed when multi- terious double question marks appearing in a ple outputs are required. More and more publi- manuscript. cations are available electronically, usually through • Table of contents and Index:InLATEXdocu- the Internet. These include HTML,andPDF files. ments these are usually automatically gener- I believe that publishers should consider all forms ated. Therefore copy editing them is invari- of electronic delivery at the outset, and treat the ably a waste of time both for the editor and for printed CRC as just one of those possible outputs. the computer operator. In particular, it is very They should take into consideration the extra care common for page numbers in index entries to that should be taken in the preparation of manu- be edited. For the typesetter, this is extremely scripts and their associated files. This means that laborious work to carry out and to check. they should be in contact with the author from the • Running heads: These are also automatically early stages, making sure that clean, structured files generated and should not generally be marked are submitted. If such files are not available, then by the copy editor. By understanding the over- the publisher should be prepared to compensate the all mechanism of running heads, a few simple typesetter for the extra work incurred. instructions should suffice for any changes.
218 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Multi-Use Documents: The Role of the Publisher
• Making global changes: The copy editor should be avoided and be reserved for the typesetting be encouraged to make as many global marks as stage. possible, rather than marking every occurance • Forgiving class file: So that the author can con- of an error. centrate on the contents of the work, the stan- The class/style files dard TEX parameters for such items as line and page breaking penalties should be relaxed so The current situation. Let us take the case of that overfull boxes are kept to a minimum. book production. At present, a typical scenario is that the publisher asks a TEX consultant to write Requirements of the typesetter. Let us now a class file (let us assume it is only LATEX2ε we look at the requirements of a class file used by the are dealing with). The consultant will be supplied typesetter. In general, the typesetter should have a with an example book and/or type specifications, more in-depth knowledge of LATEX than the author. from which to produce a LATEX class file. Normally He/she can therefore deal with much more complex instructions and an example file are also included. class files, with the following possible attributes:1 This collection of materials is then distributed to • Unusual fonts:TheTEX typesetter will have ac- prospective authors to apply to their manuscripts. cess to numerous commercial fonts. The class When the manuscript is received by the publisher, file can therefore use any standard font for the it is sent out for copy editing, and usually goes to a body text and a choice of several fonts for math- TEX typesetter for production of final CRC and any ematical text. subsequent electronic files. • The important thing here is that there is only Graphical embellishments: I believe that type- one class file. It is used by the author to produce setters should strive to get away from the classic the manuscript and by the typesetter to correct and “TEX look”. There are numerous ways in which A paginate the book. I would like to argue that this TEXandLTEX documents can be embellished approach should be re-examined and that separating with graphical devices, from simple rules to full- the author’s and typesetter’s class files might be a colour tints. A TEX typesetter should have all better route to take. Let us look at the requirements these tools available to improve a book’s ap- of each party in turn. pearance (in association with a book designer, of course). Requirements of the author. The author’s task is primarily to concentrate on the contents of the Looking at the above requirements for the class book or article being written, without much regard files for the author and the typesetter, it is clear that for the final look, which is normally decided by the there is a conflict. In the case of the author, the file publisher and is the responsibility of the typesetter. should avoid non-standard fonts, difficult construc- Here are what I see as the main attributes of the tions, graphic elements, etc. On the other hand, the class file distributed to the author: class file used by the typesetter can be as complex as necessary to get the desired effects in the final • Standard input syntax A book :TheLTEX class typesetting. At the moment, any class file writer is well known, and easy to use. It is a great has to strike a balance between these two conflicting advantage to present an author with a class file requirements. Graphic elements, if used, are kept book.cls which has an input syntax as close to simple, so that authors do not have trouble with as possible. In fact, the class file and any in- them. Of course, this limits the complexity of the struction material should encourage the author final typeset book. to enter standard LATEXcode. • Easy installation:TEX is available on most com- Using two separate class files. Looking at the puter systems, including some old machines with above requirements, it is my conclusion that the au- limited memory and power. It is a good idea thor should be supplied with a generic class file, de- to have a class file that does not need a lot of signed with the author requirements in mind. A memory, computer power, or unusual fonts. In publisher need not distribute more than one or two particular, it is safest to limit the font require- of these generic files, making support, debugging, ments to the standard Computer Modern fonts and maintenance easier. The class file used in the which are available on every T X installation. E 1 • Avoid unusual elements: In order to make it Of course this may not apply to all cases. Some authors are very adept at handling class files, and some books have easy for the class file to be used with any sys- such an unusual layout that the final class file must be used tem, unusual elements such as graphics should at all times.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 219 Kaveh Bazargan
final setting of the book need not concern the pub- 12. Author uses generic style file to carry on work- lisher at all but should be the responsibility of the ing on other related material, such as the next typesetter. edition of the book. 13. Go to step 5 to repeat cycle. Re-use of typeset files by the author A References A unique advantage of using LTEXtosetbooksis that once the typesetting has finished, the files can [1] Lamport, Leslie. LATEX: A document preparation be passed back to the author to work on another system, 2nd ed. Addison Wesley Longman, 1994. edition. This cannot be done when other systems are used for typesetting, as no other typesetting system can generate clean, structured LATEX. Sequence of events in setting a book with LATEX. Here is what I see as a possible sequence A of events, regarding the LTEX files used in setting a The Journey of the TEXxies book. “A cold coding we had of it, 1. Author signs contract with publisher. Just the worst package from TEXLive A 2. Publisher sends author the LTEX author kit. Foradocument,andsuchalongdocument: The author is encouraged to contact publisher The macros long and the braces many, for assistance at this or any subsequent point; The very pits of TEX.” typesetter could be called on to support the au- And the tables hard, badly-aligned, refractory, thor in the interests of extracting clean files from Misplaced omit in every \multicolumn. the author. There were times we regretted 3. Author sends sample chapter to publisher, type- The dependence on longtable, the use of pdfmark, set using generic class file. And the hyperref macros breaking our \cite s. 4. Typesetter evaluates and reports on sample, with Then the authors cursing and grumbling suggestions to author. And using Macintoshes, and wanting their 5. Author submits manuscript, accompanied by Framemaker and Word, And the editor crashing, and the graphics corrupted, full set of files. And the setup.exe hostile and CTAN unfriendly Note: A possible extra stage here is that the And the usergroups dirty and charging high prices: typesetter reformats the files in the final style A hard time we had of it. before handing over for editing, rather than the At the end we preferred to work in XML, generic style file being used, exactly as submit- Validating as we went, ted by the author; there are pros and cons to With the voices singing in our ears, saying this stage. That this was all folly. 6. Copy editor edits manuscript, having read the Then at we came to the end publisher’s ‘Editors’ Guide to Electronic Sub- of the process, missions’. valid, with all elements ended, all IDREFs satisfied, 7. Author reviews editing (optional, for books only). With an XSL stylesheet, and even a version for IE5 8. Manuscript and files go to typesetter. And three HTML versions already on the Web. 9. Typesetter implements editor’s corrections, con- And an old white horse galloped away in the meadow. verts to final style, and sends copies to pub- —stolen by Sebastian Rahtz lisher. 10. Typesetter receives corrected proofs for CRC, (untitled) produces CRC and any specific electronic files, and sends these to publisher and printer. I thought I would never see 11. Typesetter strips out pagination codes used in poetry, written by Chimpanzee the author files to make the final pages, and But given TEX and infinite time returns these files to author (via publisher). I’ll steal it and call it mine —Donald Arseneau
220 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting ∗ Very Like a Nail: Typesetting SGML with TEX
Frederick H. Bartlett Springer-Verlag New York, Inc. fredb@springer-ny.com
Abstract be converted back to (La)TEX, but this is not a high priority at the moment. At Springer-Verlag, we have been frustrated for some years now with the difficulty of putting mathematics This talk describes something that is very much into a web-friendly format. We have not yet found a work in progress, so a discussion period will be a magic bullet, but ... most welcome. The XML application MathML may be the first real tool for putting mathematics on the Web in a useful form. Suppliers of mathematical tools such as Mathematica and Maple are gearing up to use MathML as an input/output format; thus, we can To T XornottoTX look forward to a day when mathematics on the Web E E will be truly interactive. To TEX, or not to TEX: that is the question: It is likely that—even if MathML fulfills every Whether ’tis nobler on the page to suffer bit of its promise—TEX will continue to be used The slings and arrows of outrageous software, for the preparation of mathematics for display and Or to write code against a sea of troubles, printing. And by opposing end them? Use Word? Use Quark? This presentation is an account of our efforts No more! For such as they could never end to translate author-generated LATEXintoXML.The The heartache and the thousand unnatural shocks project can be divided into four stages: That type is heir to. A 1. Normalizing (La)TEX. That is, transforming LTEXe*? authors’ idiosyncratic usages (and even more Devoutly to be wish’d! Or Quark to TEX? idiosyncratic macro definitions) into consistent, To TEX? Now there’s a dream. And here’s the rub: and consistently structured, files. The vast ma- From that disguis`ed TEX the dream may come jority of author-generated LATEX files can be That TEX should shuffle off this mortal coil, converted easily with a minimal understand- So should we pause? There’s no respect ing of TeX’s digestive tract; those which can’t For TEX in all of its long life; (especially plain TEX files) will require some For who would bear the whips and scorns of Frame, human intervention — or increasingly sophisti- WordPerfect’s wrong, Microsoft’s contumely, cated (read ‘bloated’) software. The pangs of despis’d TEX, Incontext’s delay, The insolence of Active TEX and the spurns 2. Converting to XML. This is the easy part: chang- wysiwyg A That patient Wizards of th’ ers take, ing structural LTEX tags into XML tags. When he himself might his quietus make 3. Converting to MathML. And this is the hard In a plain TEX style? Who would authors bear, part: It would be ideal to be able to convert To grunt and sweat under a weary life, LATEX math into both presentation and content But that the dread of something after TEX, MathML coding. Unfortunately, this is, even The undiscover’d standard from iso in principle, extremely difficult. So at first we And W3C, puzzles the will concentrate on the LATEX-to-presentation mark- And makes us rather love the type we have up path. Eventually, it will be possible to pro- Than fly to others that we know not of? duce an interactive LATEX-to-content mark-up TEX’s enterprise of great pith and moment converter for authors. With xml its currents join anon, And gain the funds of moguls. 4. Going backwards. It will eventually be helpful to authors and publishers if MathML/XML can
∗ [No paper submitted. – Ed.] —stolen by Fred Bartlett
0 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Making a Book from Contributed Papers: Print and Web Versions
Harry Payne Space Telescope Science Institute 3700 San Martin Drive Baltimore, MD 21218, USA payne@stsci.edu http://www.stsci.edu/~payne/
Abstract This paper describes a set of tools used for several conference proceedings projects. Processing of a run-file for the individual components is managed by the UNIX make utility, and performed with perl and sed scripts. One advantage to this scheme is that references to other papers in the same volume may be supplied with a page number in a straightforward way. Another is that the entire book may then be processed with latex2html to produce a Web version from the same set of input files. The tools will be made freely available via the Internet.
Introduction The ASP kindly allowed the editors of the 1993 volume to translate the proceedings into HTML for In my field (astronomy), it is common practice for access over the Web. The Web was new in 1993, conference proceedings to be generated from a set and although this first Web volume was attractive, of contributed papers, each of which is a stand- I wanted to do things differently. The first version alone LAT X document. Editors are expected to E of Nikos Drakos’ LAT X2HTML hadappearedinthe not only edit the individual contributions but also E meantime. I decided that I wanted to do the entire combine them into a book, with a table of contents book as a single LAT X document and then feed the and proper page numbers. But publishers generally E whole thing to LAT X2HTML. provide little more than a style file and instructions E I settled on a scheme to edit each contribution for the individual authors. as a stand-alone LAT X document, but to add in- In this paper I will discuss my experience with E formation for use in the book, such as index and turning LAT X manuscripts into a book and into an E author index entries, and cross references between on-line volume on the Web. I will describe some contributions (fairly common since the authors have tools I have written for this purpose, after giving already heard one another’s talks). The contribu- some general comments that may save you some tions are preprocessed into chapters, and a skeleton work if you are faced with a similar task. But first, document pulls in each chapter to make a book. I will describe how I got started on this, and the Front and back matter are created by LAT Xina choices and constraints we faced. E straightforward way. The UNIX make utility keeps My job is in astronomical software, and in track of all the pieces, updating the book if any of 1994 my institute hosted the fourth annual meeting the papers is changed. in a series devoted to Astronomical Data Analysis The on-line version of the book is produced Software and Systems (ADASS). I volunteered to from the same files used to produce the printed be a proceedings editor, at least in part thinking volume, processed by LAT X2HTML. The output from that I could enjoy the meeting since my work would E LAT X2HTML is modified with the help of some perl all come later (unlike TUG, we go to the meeting E scripts to obtain the final product. Reprints of the first, and then write our papers). The proceedings papers and a searchable index are provided. of the first three meetings were already published, I have used this scheme on about a half dozen so we knew what our book had to look like. The books, and other editors have used it on a few more. Astronomical Society of the Pacific (ASP) provided The editors of the most recent volume in the ADASS instructions to the authors. Previous editors used 1 series have made some improvements that I will perl scripts to kept track of the number of pages mention. in each contribution, so that each paper could be printed with the right page numbers, and a table of contents and index could be generated. But I chose 1 D. Mehringer, R. Plante, and D. Roberts, University of to depend on LATEX itself as much as possible. Illinois.
222 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Making a Book from Contributed Papers: Print and Web Versions
Editing the Book index is an iterative process that depends on seeing all of the entries. We provide a \keyword macro for As I was editing my first book with this scheme, I authors to provide a half dozen keywords or so, just made notes to myself every time I went through all as a starting point for the editors. Our astronomical of the contributions to fix something. Here are some software series now has enough volumes that we can examples: ask authors to use previous volumes as a guide, but • “Took out multiply named labels.” authors must be free to create. The instructions to authors said to label your Once the papers and their associated figures first figure with \label{fig-1}. When I put have been collected, you can finish editing them only all of the contributions together into one docu- after deciding the trade-off between uniformity and ment, fig-1 was used many times, of course. the author’s own voice. Making all of the tables • “Added author index information.” look the same was a worthwhile effort. On the other I cloned the \index macros to provide a way to hand, we foolishly once replaced all British spellings compile an author index. with their American equivalents. In the case of • “i.e. and e.g. are followed by a comma, and are authors who are not native speakers of English, you not italicized.” “Em dashes don’t have spaces will have to decide when your voice is preferable to around them.” “No bold, italics, or Greek the author’s—when unidiomatic phrasing is more symbols in titles or section headings.” confusing than colorful. And many other substitutions as well. Finally, if PostScript figures accompany the papers, plan on spending time fiddling with the • “Work on tables.” figures to make sure that the individual papers all The editors decided that each table would have print on your system. two \hlines, the headers, one \hline,the body, and two final \hlines. Making the Book • “Work on references.” Once you have finished editing the papers, you can TUGboat Describing the new macros, Robin create the structure for making the book. You need Fairbairns writes “Bibliographic citations give to convert each paper from a stand-alone document much grief to the editorial team” (1996, p. 286). into a chapter in the book and create a skeleton This is a mild statement of the situation faced document for pulling all of the pieces together. by editors in technical fields. North American Depending on the tools you have available, and European astronomers use different cita- you may not have to wait until you have final Bib tion formats, and TEX is rarely used. versions of the papers. I work with the UNIX The common thread connecting these items is that if operating system, and the scheme I use depends on we had made these decisions and properly instructed the make utility program. Programmers use make our authors before they submitted their papers, we to manage software projects. You tell it which could have saved ourselves a lot of work. Time spent components depend on which others and specify providing your authors with the information they rules for building the dependent pieces from the can use will repay itself many times over. independent ones. A program might depend on a In this spirit, the editors of the most recent number of files containing source code. After editing volume in the ADASS series came up with a way to any of the source code files, the programmer runs add cross references to other papers in the same vol- make to recompile only the changed files and then ume. Authors were instructed to label their papers build an updated version of the program. with the identifier given in the meeting program, Each paper’s .tex file is preprocessed to be- and to use this identifier to refer to other papers. come a chapter in a file with the same name but In the absence of such instructions, inserting these using the .ltx extension. In the make configuration cross references requires searching for all variations file (Makefile), I list all of the .ltx files I will need of “this meeting,” “this conference,” “these pro- ceedings,” “this volume,” etc. These editors also PAPERS = \ required authors to make their own author index accomazzia.ltx agafonovm.ltx \ entries. Instructions for all recent ADASS volumes alexova.ltx antunesa.ltx \ describe how to refer to Web resources in ways that ballesterp.ltx ... turn into live links in the on-line version. You cannot give authors enough information to and provide a rule that accomplishes the preprocess- make their own entries in a real index—making an ing. Below is an example using sed,theUNIX stream
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 223 Harry Payne
editor, just because all of the pieces are visible; you With these pieces in place, the make book.dvi could use perl or a compiled program instead. command will create all of the chapter files, run LATEXtomakethe.idx file, run makeindex to .tex.ltx: A sed "s/\\\\documentstyle/{% &/" $< |\ generate the .ind file, and then run LTEX again sed "s/\\\\begin{document}/% &/" |\ to make the .dvi file. If you go back and edit the sed "s/\\\\nofiles/% &/" |\ papers, you can update the book just by running sed "s/\\\\title/\\\\chapter/" |\ this command again — only the papers you edited sed "s/\\\\begin{abs/\\\\label{$*}&/"|\ will be processed into chapters again. sed "s/\\\\end{document}/}% &/" > $@ A style file for the book defines the \paper macro, and redefines the \chapter and table of To translate, this is a rule for converting a .tex file contents macros to match the format required by the into a .ltx file. Each line performs a substitution, publisher. A number of details are also addressed: commenting out \documentstyle, \nofiles,and page format, macros for references to earlier volumes \begin and \end{document} commands, replacing in the series, the author index, and proper number- \title with \chapter, inserting a label containing ing of figures, equations, appendices, etc. I wrote the name of the file ahead of the abstract, and macros that add the publisher’s copyright notice to putting braces around the entire paper to limit the first page of every paper, which I use to produce the damage when authors define their own macros. a version of the book that is sliced up into reprints “$<,” “$@,” and “$*” are make macros for the source for the on-line version. file, target file, and filename root, respectively. There are Makefile dependencies to control the For the skeleton document, I found it conve- production of the final printed copy to be sent to the nient to define a \paper macro which, in addition publisher. Our publisher reduces the camera-ready to an \input, gives the list of author names as they A pages we send, so I let LTEX work with pages and should appear in the table of contents, as well as the fonts that are the same size as the book, and then title and list of authors as they should appear in the ask the dvips program to enlarge the pages to the headers: size requested by the publisher. \paper{accomazzia}{A. Accomazzi, G. In theory, once you have finished editing and Eichhorn, M. J. Kurtz, C. S. Grant produced a final printout, you can send it to the and S. S. Murray}{Accomazzi, Eichhorn, publisher and turn your attention to the on-line Kurtz, Grant and Murray}{Mirroring the version—after all, it is desirable that printed and ADS Bibliographic Databases} on-line versions be identical. In practice, I strongly suggest that you wait. At this point, you will be I like each \paper command to be on a single long tired of looking at the text, but the first look at line, to make it easier to rearrange their order. the on-line version will make it all fresh again. I The ADASS papers alternate author names and guarantee that you will find things to change in the affiliations, making it tricky for a program to sort printed version. them out. I confess to generally using cut and paste to build these commands, but a perl script can give Making the On-line Version you a head start. Finish up the book skeleton with A a table of contents, and front and back matter of Running LTEX2HTML on a large book does impose your choice, such as a preface, list of participants, some requirements on the hardware you use. Be- conference photograph, and colophon. cause the program holds the entire book in memory, The Makefile says that the book’s .dvi file you will need a large amount of virtual memory — A depends on the chapter files and the index file for a 500-page book, I have seen LTEX2HTML require 750MB of memory. Aside from this, there is nothing book.dvi: $(PAPERS) book.ind too unusual about the way I use LATEX2HTML.But As well, the Makefile has rules for obtaining .dvi I do manipulate its output to produce the final and index files: product. .tex.dvi: Neither the papers nor the book skeleton file $(LATEX) $* needs to be modified to produce the on-line version. But the Makefile does contain a rule for creating a .tex.idx: new skeleton file from the original. The LAT X2HTML $(LATEX) $* E program runs a separate task to expand \input .idx.ind: instructions; instead of modifying this task I create a makeindex $<
224 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Making a Book from Contributed Papers: Print and Web Versions new skeleton file, where all of the \paper commands and the labels.pl file from LATEX2HTML to create are demoted to ordinary \input commands. a file containing three columns: the new file name, LATEX2HTML allows for customization using perl the original file name, and the page number in the style files (LATEX2HTML itself is written in perl). My printed version: customizations are either routines to translate new macros into HTML or routines which change the default behavior of LATEX2HTML. For example, my accomazzia.html node99.html 395 simple macro for referring to the fifth volume of the agafonovm.html node14.html 58 ADASS series albrechtm.html node91.html 363 \def\adassv{in Astronomical Data Analysis albrechtr.html node62.html 248 Software and Systems V, ASP Conf.~Ser., Vol.~101, eds. G.~H. Jacoby \& J. Barnes I then run a script which renames the files and (San Francisco: ASP)} updates all of the links. The page numbers are used for making the table of contents and index (so you is implemented with a routine that simply inserts can determine a citation to the book from the on- the same text, translated into HTML,intotheout- line version). The table of contents produced by put stream: LATEX2HTML will not contain the author names, so I sub do_cmd_adassv { replace it by running a script on the .toc file from local($_) = @_; A LTEX. While working on the table of contents, I join(’’,"in Astronomical Data Analysis put all of the front and back matter together under Software and Systems V, ASP a “Preface” heading. I generally make a version of Conf. Ser., Vol. 101, eds. the book cover by hand and use either this cover or G. H. Jacoby & J. Barnes the table of contents as the entry page for the on-line (San Francisco, ASP)$_");} book. The changes to the default behavior are primarily in One of the advantages of the on-line version the navigation panels, where I wanted a particular of the book is the ability to do full-text searching. format, links to an index and reprint files, and a There are many options for indexing software, and copyright notice. Other changes suppress the index, many are free, but your choice will depend on your which I make in a different way, and make sure that Web server platform. I have used WAIS,asomewhat chapters are not numbered. dated search technology, since I already had a WAIS The Makefile has a rule for running LATEX2HTML server available. Every page in the on-line version on my new skeleton file to produce HTML and image has a link to an index page from which the user files. Most ADASS papers are short, so I let LATEX2- can perform a search. Although this mechanism HTML put each paper into its own HTML file. I provides a powerful way to find what you are looking also select options which keep footnotes with each for, I also translate the author and subject indices paper (rather than breaking them out into separate from the book into HTML, using perl scripts and files) and which add section numbers, an HTML information from the .ind file produced by LATEX. title for the book, and an address at the bottom of We published the on-line version of the book every page. Using a Sun SPARCstation 5, running with the permission of the publisher. As a courtesy, LATEX2HTML on a 500-page book with 120 PostScript we put the publisher’s copyright notice on each figures and a lot of equations takes the better part HTML page (making the notice into a link taking of an afternoon. If you have to do it again, it will users to the publisher’s home page) and each reprint. go much faster if you can re-use the image files. I also added instructions for obtaining a printed After LATEX2HTML has done its job, you need to copy of the book by way of the publisher’s website. examine the output with a Web browser, looking for Except for the searchable index, all of the links in problems that might force you to do the conversion the on-line version are relative. This makes it easy again: broken image files or images that are out of to create mirror sites. sequence. You can spend as much time as you like on When you are satisfied with the output from projects like JavaScript commands to identify peo- LATEX2HTML, you are ready to make the final prod- ple in the conference photograph just by moving uct. I like to rename the output HTML files for the mouse. However, you can realistically expect to the papers, restoring the original name. To do this, produce an on-line volume with a few days of effort, I combine information in the .aux file from LATEX once the printed volume is ready.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 225 Harry Payne
Trying it Yourself Ode to a special You can find the files you need to try out this scheme Oh what a tangled web we get at ftp://ftp.stsci.edu in /pub/software/tex/ When first we find \def and \futurelet. bookstuff/. We turn to eTEX, we turn to Omega, The most recent on-line volumes produced this They both look good, we start to get eager. way can be found at http://www.stsci.edu in But no, whats this, a misplaced \omit? /stsci/meetings/lisa3/ and /stsci/meetings/ Did we miss a brace, does \vbox still not fit? adassVII/. Help me fix my macros, bracket heroes all! But Pierre’s font is calling, Lamport’s left the hall References Barbara says use \downcase, Downes just strokes [1] Fairbairns, Robin. “The New (LATEX2ε) TUG- his beard. boat Macros.” TUGboat 17,3 (1996), pp. 282– Erik sells me 4TEX, pretends he never heard. 288. David’s hacking tables, Frank is fixing floats, Ogawa’s talking slowly, Kacvinsky wants his oats. Will Kaveh help me out? no, his dhoti’s dirty, Nelson’s feeding awks, Mimi’s feeling flirty. Flynn says use a rubber, Wendy needs a hug, Kath has got the answer, no it’s just another mug. TUG ’99 musings Young Ross is such a hoot, he says use xypic Why not turn to Sojka, he’s sure to know the trick. Mr. Fine, with the eye of the sleuth Anita, a Hoover, what use to me’s a dam? has discovered why TEX is uncouth, Send me out with Kiren, we’ll both go on the lam. that its catcodes are many Irina claims ‘for us in Mir is no problem’, when there shouldn’t be any TRIUMF uses \mathcode but \hspaces just one em. Has anyone told Donald Knuth? TrueTEX does it both ways, and you can trust the Blue Sky Mr. Flynn is now running on La- but hyperref the backend, oh why oh why oh why? TEX, and finds, when the tread starts to fray, he can patch up the rules ActiveT X, PassiveT X, what a great to do, with his vulcanize tools E E Can there be yet some way through? pump them up, and back on his way MathML has pointies, XSL can claim its templates ExerQuiz is so cute—so phooey on you Billy Gates. Though some of you authors may fret —Sebastian Rahtz about how every page should be set and think what you see andwhatitmustbe Mr. Bazargan says what you get —Pierre MacKay
226 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Managing Large Projects with PreTEX: A Preprocessor for TEX
Managing Large Projects with PreTEX: A Preprocessor for TEX
Robert L. Kruse PreTEX, Inc. 2891 Oxford Street Halifax, Nova Scotia, B3L 2V9 Canada bob@pretex.com
Abstract PreTEX is a preprocessor for TEX that supplies an author with many tools to simplify the writing and management of larger (book-length) projects. This paper concentrates on PreTEX’s use of secondary input files and conditional typesetting in managing large projects. The user may insert location tags within a file which can then be used by PreTEX to include parts of one file within another, in any order determined by the user. Parts of a file may also be selectively typeset according to the status of various conditions.
Introduction introductory guides for the user, reference volumes for more sophisticated users, precise specifications Let us consider two projects large enough to chal- for each part of the program, in-house documenta- lenge most T Xsystems. E tion and other comments from programmers, and The first is a large science or math textbook bug/modification reports and history. at the high school or elementary college level. Such PreT X is a preprocessor for T X (consisting of a book is often over 1000 pages long, requires four- E E about 15,000 lines of C code), designed explicitly to color printing, and contains many hundreds of full- facilitate the work of authors and editors in writing color photographs or other complicated images. It and production of large projects such as these. has a large index, bibliography, and cross references PreT X’s features, moreover, remain equally useful throughout. Answers to some of the exercises E for more ordinary book-length projects or even for appear at the back of the book. There may be small typesetting projects such as research papers. many supplements, such as a separate solutions By exploiting context dependency, PreT X sup- manual, a study guide, a lab manual, a bank of E plies much of the routine markup required for high- test questions, or transparency masters. All of quality typesetting in T X, simplifies the notation these contain many references that need to be keyed E for mathematics, supplies user-friendly error diag- to the textbook itself. There may be a separate nostics, uses its own tables of information to resolve book of instructor’s notes keyed to the text, or the many ambiguities in typesetting, and, by recog- book may be simultaneously published in a special nizing some of the syntax of various programming instructor’s edition with the notes in the margins or languages, provides powerful tools for typesetting on extra, unnumbered pages. Some of these many computer-program listings. supplements may be published on paper, some on Some of these features were discussed for a the Internet, some in PDF or other format on preliminary version of the PreT X software in Kruse CD-ROM with interactive links. E (1988). Since that article was published, the PreT X Our second example project is the documenta- E software has been substantially revised, extended, tion library for a large software system, one that and used intensively in a commercial environment. may contain hundreds of thousands or millions of It is now ready for initial public release. lines of computer code, written in several differ- This paper discusses only a fraction of the tools ent programming languages and distributed over provided by PreT X. Here, we concentrate on Pre- many files. To avoid errors and inconsistencies, E T X’s file-management facilities that are especially it is important to keep all the documentation in E valuable to authors of large projects. For a discus- the same files with the code. Such a system sion of PreT X’s implementation of hypertext links will have several kinds of documentation: informal E in PDF, see Mailhot (1999).
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 227 Robert L. Kruse
Independent chapter processing Secondary input files High-resolution scans of color photographs or other One of PreTEX’s most powerful features is its complicated graphics require considerable space in ability, while processing one file, to include portions computer storage. If a book contains hundreds of other files, arranged in any desired order, with of such images, its output files can become pro- parts skipped under the control of Boolean (that hibitively large, often several gigabytes in Post- is, true-false) variables. This feature is like TEX’s Script. It therefore becomes imperative to divide \input command, except that \input includes the such a project into smaller files that can be pro- entire file, whereas PreTEX may select only portions. cessed independently. First, we need some notation. In fact, PreT X allows a project to be di- E PreT X Command Syntax. As a preprocessor, vided into conveniently small, chapter-length files E PreT X has its own extensive command language, which are processed independently at every stage, E as well as responding to certain T X commands. including the final production of PostScript or PDF E It recognizes several different environments and files. At the same time, PreT X integrates the E processes its input differently according to the cross references, the index and contents entries, and environment. In the mathematics, verbatim, and the bibliographic citations from all the input files computer-program environments, the characters ‘<’ comprising the entire project. Hence, while the and ‘>’ are processed as usual, but in text (where author works on the files for one chapter, cross ref- they would normally not appear) ‘<’and‘>’ are used erences to other chapters will still resolve properly. to delineate commands to the PreT X preprocessor. Page numbers, chapter numbers, and other elements E Such commands take forms such as that number consecutively throughout the book are automatically updated for each chapter file. <:command_name option1 option2> To accomplish these goals, PreTEX maintains where the number of options and their syntax vary a whole directory of auxiliary files in place of the with the command. single auxiliary file used by LATEX. Indeed, for each To access a secondary input file, we need only chapter there are several auxiliary files that are write an instruction such as ib accessed by PreTEX, by TEX, by B TEX, and by <:read file filename from tag1 to tag2> PreT X’s enhanced equivalent of MakeIndex. Under E at the place where we wish to include part of the normal circumstances, the user never needs to look secondary file filename. The location tags, denoted at any of these auxiliary files directly. Since there
228 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Managing Large Projects with PreTEX: A Preprocessor for TEX
control the typesetting of material between a pair of normally be included (in separate groups) in the control sequences, which we might call \solution master file list for PreTEX. In this way, cross refer- and \endsolution. With this instruction, PreTEX ences may be made from any of the supplements to will recognize two new commands: any other or to the main text. Similarly, in PDF, <:solutions on> and <:solutions off> hypertext links allow the user to jump directly from locations in the text to appropriate locations in any Depending on the setting, PreT X will include or E supplement, or the reverse. delete material between \solution and \endsolution Program code and StripTEX For the textbook itself, we specify Now let us turn to our second motivating exam- <:solutions off> ple, a documentation library for a large software (likely in a style file, so it need be done only once system, where the code and documentation are for the entire book), in which case all solutions will distributed over many different files, generally with be deleted. When processing the solutions manual, differentauthors. we specify By treating each file of program code and doc- <:solutions on> umentation as a secondary file, PreTEXcanbeused so that all solutions will appear immediately after to combine any desired extracts of the documen- their exercises. tation or the program code in any desired order. With these tools, typesetting a separate solu- The identical source files can in this way be used to tions manual is trivial, and it will be guaranteed construct, for example, informal user guides, user to be kept current with any revisions to exercises reference manuals, programmer’s reference manuals, in the text. All we need to do is to place location or other documentation, either for in-house use or tags before and after each group of exercises in external distribution. the text’s chapter files. Then the file for the solu- In PreTEX’s different environments, the same tions manual will contain almost nothing except for symbols will be processed in different ways. PreTEX the command <:solutions on> followed by a long provides special facilities for typesetting computer series of <:read file ...> commands to include programs, understanding enough of the program each group of exercises and their solutions from the syntax to adjust spacing and choose special symbols. ... various chapter files. For example, curly braces { } (delineating a Some authors, on the other hand, prefer to group in TEX) become printed symbols in C or C++, place all exercises and their solutions into separate enclosing code segments, whereas in Pascal, they files, one or more exercise files per chapter. This delineate comments that will be typeset as text, not organization will work just as well. Now, with loca- as computer code. PreTEX provides environments tion tags before and after each group of exercises, for many common programming languages (Java, <:read file ...> commands in the main chapter C++, C, Pascal, Ada, Fortran, Basic, Cobol, among files will include the exercises where desired (along others). By treating program lines as tab-controlled with their solutions, which will be deleted, provided lines (as illustrated in The TEXbook, page 234), tab solutions is off). stops can be used to achieve proper alignment, or To place answers in the back of the book is other TEX markup can be inserted into programs. just as easy. We now introduce a second Boolean For these program files, PreTEX provides a variable, say answers, include the commands <:so- further utility, called StripTEX, that removes all lutions off> and <:answers on>, and use the the text surrounding program code, as well as same <:read file ...> commands to typeset, at any TEX markup in the program code, yielding the book’s end, only the selected answers. output that can be submitted directly to a com- Production of other supplements for a large piler. StripTEX recognizes all the same commands book follows much the same plan. The material as PreTEX, including reading from secondary files. for these supplements can either be placed in the Hence, the order of subprograms within files and main files with typesetting controlled by Boolean their accompanying documentation need not have variables, or placed in separate files used to produce any connection with the order expected by the the supplements, with <:read file ...> com- compiler; StripTEX can be used to extract sub- mands as appropriate to include material from the programs from arbitrary files and arrange them in main file or other supplements. In any case, the whatever order is needed by the compiler. The names of the files for all the supplements should same file(s), containing both documentation and
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 229 Robert L. Kruse
code, can be processed through PreTEXtoobtain chapters are written by different authors who may a well-formatted, documented program listing, or not be in touch with one another. Each chapter through StripTEX to obtain the program in exactly will then be written and processed independently; the form needed by the compiler. if desired, each chapter may have its own cross In this way, PreTEX and StripTEX provide all references, index, or contents. The editor may then the functionality and capabilities of Knuth’s WEB sys- later merge all these resources for the entire volume. tem. PreTEX replaces WEAVE, and StripTEX replaces The editor may supply a bibliographic database for TANGLE. The PreTEX–StripTEX system, moreover, use by all authors, accessed as needed by BibTEX brings several advantages over WEB: from within PreTEX. In addition to these global • The user has unlimited flexibility in the or- citations, PreTEX allows bibliographic citations local dering of subprograms and documentation and to each chapter, so each author may use both their placement in various files. the global database and, independently, use local • The same software manages programs in any citations from other bibliographic databases. To number of computer languages, including sev- enable local citations, PreTEX includes a second set eral for which WEB is not available. For software of citation commands with ‘l’ prefixed to the name systems using several languages, the identical ofeach,suchas<:lcite>, <:lbibliography>,and software generates all the files needed for the the like. various compilers. • The user has no need to learn a new command Preview language: StripT X shares the identical syntax E This paper touches on only a few of the tools of PreTEX, which is an easy extension of TEX. PreTEX provides to authors to simplify the writing, Bibliographic entries management, and typesetting of large projects. Together with its preprocessor, the full PreTEX For large projects with many references, maintain- software package contains the StripTEX processor, ing the bibliography can require considerable work. an auxiliary program for the construction of the BibTEX provides excellent facilities for maintaining index, another for the table of contents, and a large a bibliographic database; PreTEXusesBibTEX with- TEX macro package of about the same size and A out change. Any L TEXuserofBibTEX will immedi- capability as LATEX, but with a somewhat different ately be familiar with the PreTEX commands <:cite design philosophy. ...>, <:nocite ...>, <:bibliography ...>,and The PreTEX software has been under devel- <:bib_style ...>. opment and revision for several years, and at the As a preprocessor, PreTEX can automate and same time it has been used intensively by PreTEX, streamline the use of BibTEX. LATEX, for example, Inc., for the commercial typesetting of many text- requires a four-pass process: books in mathematics, engineering, and science. 1. Run LATEX to collect the citation keys. The software is now ready for initial external test- 2. Run BibTEX to assemble the corresponding ing and application, with public release to follow references from the database(s). in due course. Before its general public release, 3. Run LATEX to associate the citation keys with however, some work remains to be completed, espe- their references. cially the writing of user guides, reference manuals, 4. Run LATEX to resolve the citations. and other documentation necessary for the effective application of the PreT Xtools. PreTEX accomplishes the same results with only two E passes and with no special attention from the user. On its first pass, PreTEX assembles the citation keys Bibliography and automatically invokes BibT X, after which Pre- E Kruse, Robert L. “PreTEX: Tools for Typeset- T X immediately associates the citations with their E ting Technical Books.” TEXniques No. 7: Confer- references. On its second pass, PreTEXresolves ence Proceedings of the Ninth Annual Meeting, the citations. Whenever the document is processed Montreal, August 1988. Ed. Christina Thiele, again, PreTEX automatically detects if the citations pp. 219–226. T X Users Group. 1988. ib E have been changed, and PreTEXinvokesB TEX Mailhot, Paul A. “Implementing Dynamic Cross- only when necessary to keep the bibliography up Referencing and PDF with PreTEX.” 1999. (See to date. elsewhere in these Proceedings.) As our final example, consider a symposium or conference proceedings in which the individual
230 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting ∗ Database Publishing with JAVA and TEX
Arthur Ogawa TEX Consultants, California ogawa@teleport.com
Abstract Sun Computer intends its JAVA programming lan- guage to be the lingua franca of the World Wide Web. Sun may get their wish: dozens of books about JAVA are published every year, and there is even a conference about JAVA, JavaOne, being held in San Francisco at the same time as this TUG meeting. If you wanted to publish a book documenting and cross-referencing all the JAVA class libraries (running to 1,000 pages and covering over 1,500 classes, organized into about 70 “packages”) what Ode to a special would you do? If you are Patrick Chan and Rosanna Oh what a tangled web we get Lee, you would use JAVA itself to manage the data When first we find \def and \futurelet. (about 40Mb of it) and you would use T X to format E We turn to eTEX, we turn to Omega, your pages. They both look good, we start to get eager. In my talk, I will describe the criteria for se- But no, whats this, a misplaced \omit? lecting the formatter (i.e., TEX plus macros) for a Did we miss a brace, does \vbox still not fit? database typesetting project, the best way of in- Help me fix my macros, bracket heroes all! terfacing between TEX and a database engine, and But Pierre’s font is calling, Lamport’s left the hall some interesting (perhaps even challenging) features Barbara says use \downcase, Downes just strokes of the formatting work. I will also show how the his beard. success of the project enabled the author to make Erik sells me 4TEX, pretends he never heard. last-minute revisions to the book (changes neces- David’s hacking tables, Frank is fixing floats, sitated by late developments in the JAVA class li- Ogawa’s talking slowly, Kacvinsky wants his oats. braries themselves) even though this involved the re- Will Kaveh help me out? no, his dhoti’s dirty, processing of all 40Mb of data, in less than 24 hours. Nelson’s feeding awks, Mimi’s feeling flirty. Flynn says use a rubber, Wendy needs a hug, Kath has got the answer, no it’s just another mug. Young Ross is such a hoot, he says use xypic Why not turn to Sojka, he’s sure to know the trick. Anita, a Hoover, what use to me’s a dam? Send me out with Kiren, we’ll both go on the lam. Irina claims ‘for us in Mir is no problem’, TRIUMF uses \mathcode but \hspaces just one em. TrueTEX does it both ways, and you can trust the Blue Sky but hyperref the backend, oh why oh why oh why?
ActiveTEX, PassiveTEX, what a great to do, Can there be yet some way through? MathML has pointies, XSL can claim its templates ExerQuiz is so cute—so screw you Billy Gates.
∗ [No paper submitted. – Ed.] —Sebastian Rahtz
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 1075 Paul A. Mailhot
Implementing Dynamic Cross-Referencing and PDF with PreTEX
Paul A. Mailhot PreTEX, Inc. 2891 Oxford Street Halifax, Nova Scotia, B3L 2V9 Canada paul@pretex.com
Abstract This paper discusses how we can create dynamic and interactive on-line documents in Portable Document Format (PDF), using TEX. PDF documents are generally device and platform independent, therefore, ideally suited for on-line publishing and information exhange. We will need to work in three programming languages: macros in TEX, and definitions in PostScriptand PDF. We begin by describing the steps necessary to process PDF through TEXand PostScript, followed by several examples of defining such TEX macros. The examples are quite easy to follow; however, some knowledge of PostScript and PDF programming is useful. Paperless publishing Portable Document Format Publishing of books and documents has dramati- Portable Document Format, more commonly called cally changed in the past few decades. Philip Taylor PDF, was developed by Adobe Systems and is a (1996) accurately described the emergence of com- descriptive programming language, devoid of soft- puter typesetting with emphasis on Web document ware and hardware dependence, used to render rendering applications, using portable multiplat- documents. PDF is sometimes grossly misunder- form hypertext exchange standards, particularly stood as being a subset of the PostScript format focusing on the merits of PDF and HTML (including language. It is true that PDF and PostScript share XML, SGML, et al.). For our purpose we can define many common features but each format language on-line as any means by which a document can be suits a particular task, and there are features found electronically rendered; this is perhaps more aptly in each but not the other. Thomas Merz (1997) called paperless printing. describes these similarities and differences in suit- The popularity of on-line books and documents able detail. For our purpose we will concentrate has spawned a diverse variety of on-line readers. on PDF documents, in particular addressing some These readers, for our purpose, are programs, ap- hypertext features including bookmarks, links, and plets, and interpreters which can read electronic annotations. data and output the image to a screen, in much A PDF document in general cannot be directly the same way that one would see the output from a viewed, but rather, must be processed through a printer. Readers include proprietary single-platform reader such as Adobe Acrobat Reader or Ghost- programs, stand-alone multi-platform readers (in- script. There also exist a number of application cluding Frame Reader, Ghostscript, and Acro- plug-ins for internet, word-processing, and desktop- bat Reader), Web browsers (including HTML and publishing software which allow viewing of PDF JAVA), and Web browser plug-ins (including Acro- documents. PDF documents are usually created bat). It is not at all wrong to suggest that TEX, by printing a document through a printer driver along with a DVI previewer, is one of the first such as Acrobat PDF Writer or printing to a platform independent readers. Many people have PostScript file which is in turn passed through generously contributed packages to the TEXcause, Acrobat Distiller. The second method is more by which users can incorporate recent advancements commonly used by the TEX community; a brief in reader design such as HTML and PDF. description is given by Amy Hendrickson (1998). Contributions such as hyperref (Th`anh and Rahtz, 1997) and pdftex (Th`anh, 1998) provide direct-to- PDF document processing.
232 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Implementing Dynamic Cross-Referencing and PDF with PreTEX
PDF through PostScript [ /Rect [ 0 0 60 60 ] /Page 124 For our purpose we will concentrate on the method /LNK pdfmark used by Hendrickson. The description of PDF and PostScript operators for hypertext links is found in contains all of the essential elements required for a Merz, but for our purpose we will briefly review it. link to pass through PostScript and be executed in In order to create PDF links, we must first sup- PDF. The PostScript mark [ and definition pdfmark ply a set of raw PostScript instructions/definitions delimit the PDF code. It cannot be assumed that which will allow passage of PDF link code through the code between the delimiters is PDF because we Acrobat Distiller to create PDF files with links. may need to pass values from TEXintothePDF This PDF code will be ignored by non-PDF de- code. The fully functioning example vices such as PostScript printers. The following \def\pdfbookmark#1#2{{ PostScript source code accommodates this by an %#1=section/Chapter/etc.. if–else statement: %#2=usually the title /pdfmark where \special{verbatim= {pop} " [ /Title (#1 #2) {dictionary /pdfmark /OUT pdfmark"} /cleartomark }} load put} shows how we can define a TEX macro, using a ifelse PostScript special, to insert entries into an outline In this PostScript code, the where command of a PDF document. An example of an outline in searches for a PostScript dict containing a defi- Adobe Acrobat is shown in the left side of Figure 1. nition of pdfmark. If the definition is found, then We can jump to each entry of the document by the if portion {pop} is executed and word pdfmark clicking the icon to the left of the outline entry. is popped off the execution stack by the PostScript interpreter; otherwise, pdfmark is defined to remove all the tokens between mark and the word pdfmark by using the PostScript operator /cleartomark.A mark in PostScript is the character [.Wenow have an avenue by which PDF code can be passed through both to PDF devices and non-PDF devices. Most new DVI-to-PostScript interpreters have already included the above PostScript code in their document header. Those interpreters that do not include it can do so with the following TEX definition: Figure 1: An example of a PDF document con- \def\initializePDF{ taining an outline. We can use the outline to jump \special{header=pdfparse.psc}} to its location in the body of the text. This TEX macro tells TEX and any generic DVI- to-PS driver to insert the file pdfparse.psc into A more useful but trickier example is: the PostScript output file header. The file pdf- \def\pdfnameddestination#1#2{{ parse.psc would then contain the above PostScript %#1=xrf-tag name /pdfmark %#2=pagenumber code with the definition. \special{verbatim= " [ /Dest /#1 The PDF code /page #2 /Border [0 0 0] Now that we have taken care of passing PDF link /DEST pdfmark"} code, we must create PDF code for creating PDF }} links. The PDF code works in the same manner as \def\pdfgotonameddestination#1{{ PostScript in that we must create definitions using %#1=xref-tag name predefined PDF operators. There are essentially two \special{verbatim= parts to implementing PDF links. Merz (1997:288– " [ /Rect [ currentpoint 298) gives several examples of fully functional links 2 neg add exch which can be easily followed. The example 10 neg add exch
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 233 Paul A. Mailhot
currentpoint the absolute page numbers are 0 for i, 15 for xvi, 8 add exch 16 for 1, up through 38 for page 23. With a little 0 add exch] work, one could easily redefine T X cross-reference /Border [0 0 0] E /Dest /#1 and index macros to include file names and absolute /LNK pdfmark "} page numbers, thereby implementing dynamic and }} automatic PDF linking. One method for keeping This example supports dynamic cross-referencing in track of absolute page numbering in TEX is to define the PDF document in the same manner as LATEX. a new counter and increment it in the TEX output The first macro creates a mark where the source routine. reference is located and the second macro creates a We can combine the methods of the two previ- link to that reference. The second macro does this ousexamplesbyhavingtheTEX macro PDFnamed- by creating a PDF link box 10 points square in the destination include another TEX macro which PDF document. If, in Acrobat Reader, you click on writes each instance, named destination, and file- this box, you will jump to the page where the link name, to a global output file containing calls to is defined. each document file: One major restriction of this method is that % Global file containing the entire project needs to be in one PDF document. % named destinations \input document1.nd Multiple-file PDF documents \input document2.nd . Many books and other large projects need to be . processed as multiple files, broken up so that each \input documentN.nd section is of more manageable size. But our earlier In this manner, all files can be accessed somewhat method of creating links as named references does independently and named references for each docu- not work outside a single file. ment can be easily updated. By referencing this file, A better method, which works over multiple one can determine in which file a named destination files, is to create a set of macros that reference occurs. both the filename and the page number With this method, we can set up a link to any PDF document, as long as we know the file name and the absolute page reference. This method is implemented in the following TEX definition: \def\pdfxrffileopen#1#2{{ %#1= file name %#2= absolute page number - % not the printed page number \special{verbatim= " [ /Rect [ currentpoint 2 neg add exch 10 neg add exch currentpoint 8 add exch 0 add exch] /Border [0 0 0] Figure 2: An example of a PDF document con- /Action << /Type /Action /Subtype /GoToR taining links from Table of Contents entries to their /File (#1.pdf) locations in the main body of the text. /Dest [ #2 1 neg add /Fit ] >> /Subtype /Link Duplicate named destinations should not occur. /ANN pdfmark "}}} Figures 2 and 3 show several examples of how this This definition contains a few more operators but combination can be implemented to give PDF links. it has an overall structure similar to the earlier The small shaded boxes represent PDF link boxes; example. It is important to note that PDF docu- clicking on these boxes will jump the reader to the ments are referenced by absolute page number. If, destination page. for example, the document contains front matter numbered i to xvi and text pages 1 through 23, then
234 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Implementing Dynamic Cross-Referencing and PDF with PreTEX
executable should be thoroughly tested before being distributed.
Figure 4: An example of launching an application Figure 3: An example of index entries containing in a PDF file by clicking the shaded boxes containing links to their locations in the main body of the text. the “D”.
Launching applications from PDF Customized PDFlink radio buttons links We can now create documents which contain dy- One final useful example of a PDF link is to allow a namically linked cross-references in the form of PDF user to launch an application program while reading links. We should now enhance our PDF documents the PDF document. The following code allows an by creating menus or user-defined buttons which executable to be opened directly and related files to will allow warping to different locations of the book. be opened using associations: In Figures 1 to 4 we see a menu with such headings \def\pdflaunchapp#1{{% as Title, TOC, Prev, Next, Index,andHelp; % #1 = filename (pathname is optional) \special{verbatim= all these represent links to various locations in the " [ /Rect [ currentpoint book. These are inserted onto each page by includ- 2 neg add exch ingaTEX command in the output routine. The 10 neg add exch TEX output routine currentpoint \def\output{\shipout\vbox{\pdftoolbar 8 add exch \makeheadline 0 add exch] \pagebody /Border [0 0 0] \makefootline}% /Action << /Type /Action \advancepageno /Subtype /Launch \global\advance\absolutepageno by 1 /File (#1) >> \ifnum\outputpenalty>-\@MM /Subtype /Link \else\dosupereject\fi} /ANN pdfmark "}}} is taken from the set of macros used to create the The T X example E pages in Figures 1 to 4. Note that there is also a \pdflaunchapp{filename.c} reference to absolute page numbering in the sixth can be used to compile, or to open with an Inte- line, as we suggested earlier. The TEX macro for grated Development Environment (IDE), files hav- the PDF link toolbar ing extensions associated with C compilers. Files \def\pdftoolbar{{ such as MS-DOS executables can work, although \vbox to 0pt{\hbox to 0pt{\hskip -15pc these files can only be run in a DOS/Windows \vbox{\hsize=8pc% \pdftitlepage environment. The shaded boxes containing the \pdftoc uppercase “D” in Figure 4 indicates a PDF link \pdfprev to launch the program compside.exe.Ingen- \pdfnext eral, this type of PDF link can be platform- and \pdfindex operating-system-specific; however, under certain \pdfhelp }\hss}\vss}}} conditions, it can be quite useful. Any link to an
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 235 Paul A. Mailhot
is quite simple. Depending on its implementation, Kruse, Robert. “Managing Large Projects with however, must not affect the remainder of the page PreTEX: A Preprocessor for TEX.” 1999. (See body. The TEX macro \pdftoc, for example, is elsewhere in these Proceedings.) simply a line containing a PDF link using the Merz, Thomaz. PostScript and Acrobat/PDF \pdfxrffileopen macros. Springer-Verlag, Berlin, Heidelberg, 1997. Taylor, Philip. “Computer Typesetting or Electronic Acrobat 4 Publishing? New trends in scientific publishing.” TUGboat 17(4), 367–381 (1996). In this paper we have attempted to define a set Thanh, H`an Thˆe´. “Improving TEX’s Typeset Lay- of TEX macros which will create on-line PDF doc- out.” TUGboat 19(3), 284–288 (1998). uments which are platform independent. Since we Th`anh, H`an Thˆe´, and Sebastian Rahtz. “The cannot foresee advancements in software develop- pdfTEX user manual.” TUGboat 18(4), 249–254 ment, with the release of Acrobat 4 and future (1997). versions, there remains the potential for better and more visually appealing on-line documents. The examples used in this paper are based on exist- ing PDF features found in the on-line document Portable Document Format Reference Manual, Ver- sion 1.2, November 1997, by Adobe Systems. This edition is rather old, but one can get the most recent PDF version at the Adobe website www.adobe.com. ATEX Haiku \expandafter\def PreTEX and PDF \csname def\endcsname PreTEX is a preprocessor and macro package for {\message{farewell}}\bye TEX, developed by PreTEX, Inc., which supplies an —Sebastian Rahtz author with many tools to simplify the writing and management of larger (book length) projects. (For a discussion of PreTEX, see the article by R. Kruse ATEXnician’s Haiku elsewhere in this issue.) One of PreTEX’s important expansions of the resources available to an author This haiku for TEXnicians consists entirely of TEX is the inclusion of PDF links by using TEX macro control sequences; furthermore, it forms a valid definitions similar to the examples above. In this TEX assignment statement—provided that a certain way, a book can be published both on paper and in control sequence that is normally undefined is an electronic form that incorporates automatically defined in an obvious way. Can you identify the generated PDF links for cross-references, index control sequence in question? entries, launching application programs, and the like. The current set of PDF macros are not stable \catcode\csname enough; they will be made freely available in the \romannumeral\parshape spring of next year (1999). \endcsname\month Cheers. —Michael Downes References Adobe Systems Incorporated. PostScript Language ATEX Cheer Reference Manual. Addison-Wesley, Reading, Mas- sachusetts, 1993. Flush to the left Adobe Systems Incorporated. Portable Document Flush to the right Format Reference Manual. Addison-Wesley, Read- \vskip, \hskip, type type type ing, Massachusetts, 1993. Michael Sofka Hendrickson, Amy. “Real Life LATEX: Adventures of a TEX Consultant.” TUGboat 19(2), 162–167 (1998).
236 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting A Web-Based Submission System for Meeting Abstracts
Hu Wang American Institute of Physics 1NO1 2 Huntington Quadrangle Melville, NY 11747 hwang@aip.org
Abstract As one of the services provided to AIP’s Member Societies, we publish program books and abstract books for meetings sponsored by the societies. In this paper, we describe a Web-based client-server abstract submission system. It uses HTML forms to gather and validate user input of abstracts and related information, pro- vide on-line proof before officially submitting, and store the input in LATEX files and a database. From the publishing production’s point of view, the major bene- fit of such a system is the greatly improved quality of well-structured submissions, which makes it possible to streamline the whole production cycle. Some possibil- ities with its integration into a database publishing system are briefly discussed afterwards.
Introduction read their abstracts, and figures could not be easily submitted with the abstracts. The American Institute of Physics (AIP) publishes We needed to develop a system that could take program books and abstract books for some of its advantage of the potential strength of the email- member societies’ meetings. The last few years have based method while avoiding its shortcomings, name- seen many changes in the ways some societies’ meet- ly, the non-interactiveness and lack of control on the ing abstracts are submitted and published. quality of submissions. At the beginning, hardcopy abstracts prepared in various text formatting software were mailed in A Web-based system and published with the cut-and-paste method. It was non-digital, inefficient, and the resulting quality As the World Wide Web spreads around the world of finished products was understandably poor. (for some society meetings, over one third of the A contributed papers come from abroad), it lends itself Then, with the popularity of LTEX in scien- tific communities and the widespread availability of naturally to a solution of our problems. email, came email-based electronic submissions. Au- We have developed an automated, Web-based, A client-server meeting abstract submission system thors would download an appropriate LTEXtem- A with the following features: plate file that included predefined tags (LTEXcom- mands and environments), fill it out, and email it to • interactive user interfaces via HTML forms a designated address. A program would collect the • on-line LATEX help info A submissions and save them as individual LTEX files. • choice of LATEXornon-LATEXentrymethods The email approach had the potential advan- • uploading figures with abstracts tages that all files were standardized, well tagged, • on-line proof viewing and syntax validation and the publishing process could be highly auto- mated. Unfortunately, since there are still many • editable abstracts authors who are not familiar with or do not have • easy integration with publishing phases LATEX, many submissions had to be manually cor- Regardless of the input method chosen by the rected for syntax errors by the production staff, which authors, each submission results in a well-tagged was time consuming and sometimes impractical. This LATEX file that is free of syntax errors, a valid HTML often led to syntactically invalid submissions, which file into which the input data is injected, and up- in turn prohibited auto-processing. Another draw- dated database records. back was that the process was not interactive. As LATEX is used as the ultimate file format for for authors, those without LATEX could not proof- the abstract collecting phase for two reasons: it fits
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 237 Hu Wang
well with our existing production systems and it is The template has hyperlinks to sample inputs a widely accepted standard for publishing scientific for both entry methods, and to LATEXhelponhowto and technical documents. In fact, the on-line proof input symbols and mathematical expressions. When is produced by LATEX, along with dvips,andsome clicked, these links open separate windows for easy PostScript-to-gif conversion program. reference. The following form elements are used in the System requirements template: Client: Internet connection and a conventional Web • radio buttons for choosing the entry method browser. • presenting author’s name Server: Web server, LATEX, dvips, Image Alchemy • corresponding author’s name, address, country, or ps2gif, GhostScript, and CGI scripts. phone, email address, fax In the following section, we describe the imple- mentation of this system. • title and short title • author’s name and address Implementation •
238 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting A Web-Based Submission System for Meeting Abstracts
file, display it, and ask the user to go back to Title: Straight Text Mode #$% Title the template screen and fix the LATEX error. which will be converted into the following in the We choose the gif format for proofing because A it is accessible without requiring browser plug- resulting LTEX file: ins or help applications. Of course, the PDF \title{Straight Text Mode \#\$\% Title} format may be used for proofing, which requires the Adobe Acrobat Reader for viewing. Notice that the literal input of the special char- In order to show the user the dynamically acters #$% has been correctly converted with generated gif, we must prevent browser caching the addition of backslashes. from displaying the old gif. This can be done by The LATEX method. By selecting this method, proper HTTP headers such as ‘Cache-Control: authors can directly input LATEX code. In this no-cache’ or ‘Cache-Control: no-store’. The gif sample, the author is facing a similar screen, file for a typical abstract is about 20Kb in size. with “Presenting Author” as the prompt to the Building LATEX files via the Straight Text meth- left of the boxed area for the name: od warrants a little note here, as whatever the First name: P ’al Last name: Kn "oll user has entered will be treated as literal. The \ \ CGI script must handle the following T Xspe- A E will result in this statement in the LTEX file: cial characters carefully to properly escape them: #$%&_{}~^\<>| \pauthor{P\’al}{Kn\"oll} For example, author’s input < should be con- 3. Build a temporary HTML file into which the A verted to $<$ in the LTEX file; otherwise, un- user-input data is inserted. This file will be A expected output will result even though LTEX needed if the user wants to edit a previously does not complain. submitted abstract later on. A It’s rather straightforward to build LTEX files Here, care is needed too. First, every " char- A with the LTEX entry method — just insert the acter entered by the user must be replaced in author input data into the arguments of appro- the resulting HTML file by the entitized ver- A priate LTEX control sequences. Not surpris- sion, i.e. by " so that it does not inter- A ingly, the LTEX file’s tags correspond nicely fere with the HTML form’s element value de- to the template’s elements. For instance, these limiter ". Second, any scrolling list element in tags are used: the template must be placed in the HTML file • \pauthor and a SELECTED attribute inserted inside the • \cauthor
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 239 Hu Wang
and then go through the proof and final sub- comprehensive database publishing system for hand- mission cycle again. ing the following conferences-related tasks: 5. Final submit. For a new submission, assign a • During the period of collecting abstract submis- new sequential abstract number; for a re-sub- sions, set up a cron job for the conference coor- mission, extract its abstract number from the dinator to print out abstracts and accompany- corresponding hidden field. In either case, re- ing figures received the previous day, to gener- name the source file, the HTML file, and any ate a sort-by-category and sort-by-presenting- figure files to the abstract number with proper author list of abstracts received so far. extensions, email the abstract number and a • For program committee use: abstract database randomly generated PIN to the corresponding search for various fields. email address, and insert or update the database • Insert into the database the program commit- records accordingly. tee’s decisions of acceptance and rejection, as Conclusions well as the meeting sessions information (such as the session name, schedule, location, chair- This system offers many benefits to both users and person, invited talks, contributed talks, posters, the production staff. time slot for each presentation, etc.). Users’ benefits. Self-evident HTML form templates, • Generate letters of acceptance and rejection, as choice of entry methods, easy figure handling, LATEX well as mailing labels. syntax validation (this could be a mixed bag for • Generate program books and abstract books. A LTEX newbies because the error messages may be • Searchable on-line program listing and viewing. too cryptic for them), on-line proof viewing, and editing previously submitted abstracts. Acknowledgements Production benefits. Since the rigorous valida- Several people at the AIP were involved in this project: tion processes shift more responsibilities to the users, Don Lang, Chris Hamlin, and Kevin McGrath. My the abstract submission quality is greatly improved. thanks to all of them—especially Chris Hamlin, from The production staff now start the game with stan- whom I have learned many TEXniques. dardized, well-structured data, which makes it pos- I would also like to thank the TUG99 review- sible for the subsequent events to be highly auto- ers and the Proceedings editor, Christina Thiele, for mated. In fact, this system can be expanded into a their constructive suggestions and comments.
240 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Hyphenation on Demand
Petr Sojka Faculty of Informatics Masaryk University Brno Botanick´a68a, 60200 Brno Czech Republic sojka@informatics.muni.cz
Abstract The need to fully automate the batch typesetting process increases with the use of TEX as the engine for high-volume and on-the-fly typeset documents which, in turn, leads to the need for programmable hyphenation and line-breaking of the highest quality. An overview of approaches for building custom hyphenation patterns is pro- vided, along with examples. A methodology of the process is given, combining different approaches: one based on morphology and hand-made patterns, and one based on word lists and the program PATGEN. The method aims at modular, easily maintainable, efficient, and portable hyphenation. The bag of tricks used in the process to develop custom hyphenation is described.
Motivation We have already dealt with several issues re- ˇ In principle, whether to hyphenate or not is a style lated to hyphenation in TEX (Sojka and Seveˇcek, question and CSS [Cascading Style Sheets] should 1995; Sojka, 1995). On the basis of our being in- develop properties to control hyphenation. In volved in typesetting tens of thousands of TEX pages practice, however, for most languages there of multilingual documents (mostly dictionaries), we is no algorithm or dictionary that gives want to point out several methods suitable for the all (and only) correct word breaks, so development of hyphenation patterns. some help from the author may occasionally be needed. Pattern generation — (Bos, 1999) There is no place in the world that is linguistically homogeneous, despite the claims of the nationalists around the world. Separation of content and presentation in today’s — (Plaice, 1998) open information managment style in the sense of SGML/XML (Goldfarb, 1990; Megginson, 1998) is Liang (1983), in his thesis written under Knuth’s su- a challenge for TEX as a batch typesetting tool. pervision, developed a general method to solve the The attempts to bring TEX’s engine to untangle pre- hyphenation problem that was adopted in TEX82 sentation problems in the WWW arena are numer- (Knuth, 1986a, App. H). He wrote the PATGEN pro- ous (Sutor and D´ıaz, 1998; Skoup´y, 1998). gram (Liang and Breitenlohner, 1999), which takes One bottleneck in the high-volume quality pub- • a list of already hyphenated words (if any), lishing is the proofreading stage—line-breaking and • hyphenation handling that need to be fine tuned to a set of patterns (if any) that describes “rules”, the layout of particular publication. Tight dead- • a list of parameters for the pattern generation lines in paper-based document production and high- process, volume electronic publishing put additional demands • a character code translation file (added in PAT- for better automation of the typesetting process. GEN 2.1; for details see Haralambous (1994), The need for multiple presentations of the same data and generates (e.g., for paper and screen) adds another dimension • to the problem. Problems with hyphenation are of- an enriched set of patterns that “covers” all hy- phenation points in the given input word list, ten one of the most difficult. As most TEX users are perfectionists, fixing and tuning hyphenation for ev- • a word list hyphenated with the enriched set of ery presentation is a tedious, time-consuming task. patterns (optional).
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 241 Petr Sojka
The patterns are loaded into TEX’s memory When developing new patterns, it is good to start and stored in a data structure (cf. Knuth, 1986b, with the following bootstrapping technique with it- parts 40 – 43), which is also efficient for retrieval — eration, which should avoid the tedious task of man- a variant of trie memory (cf. Knuth, 1998, pp. 492 – ually marking hyphenation points in huge lists of 512). This data structure allows hyphenation pat- words: tern searching in linear time with respect to the pat- 1. Write down the most obvious initial patterns, tern length. The algorithm using a trie that “out- if any, and/or collect “the closest” ones (e.g., puts” possible hyphenation positions may be viewed consonant-vowel rules). as finite automaton with output (Mealy automaton 2. Extract a small word list for the given language. or transducer). 3. Hyphenate current word list with current set Pattern development patterns. ...problems[withhyphenation]havemoreorless 4. Check all hyphenated words and correct them; disappeared, and I’ve learnt that this is only because, in the case of errors return to step 3. nowadays, every hyphenation in the newspaper 5. Collect a bigger word list. is manually checked by human proof-readers. — (Jarnefors, 1995) 6. Use the previously generated set of patterns to hyphenate the words in this bigger list. Studying patterns that are available for various lan- 7. Check hyphenated words, and if there are no guages shows that PATGEN has only been used for errors, move to step 9. about half of the hyphenation pattern files on CTAN 8. Correct word list and return to step 6. (cf. Table 1 in Sojka and Seveˇˇ cek, 1995). There are two approaches to hyphenation pat- 9. Generate final patterns with PATGEN with pa- tern development, depending on user preferences. rameters fitted for the particular purpose (tuned for space or efficiency). Single authors using TEX as an authoring tool want to minimize system changes and want TEXtobe- 10. Merge/combine new patterns with other mod- have as a fixed point so that re-typesetting of old ules of patterns to fit the particular publishing articles is easily done, thanks to backwards compat- project. ibility. For such users, one set of patterns that is To find an initial set of patterns, some basic fixed once and for all might be sufficient. rules of hyphenation in the specific language should On the other hand, for publishers and corpo- be known. Language can be grouped into one of two rate users with high-volume output, it is more ef- categories: those that derive hyphenation points ac- ficient to make a long-term investment into devel- cording to etymology and those that derive hyphen- opment of hyphenation patterns for particular pur- ation according to pronunciation— “syllable-based” poses. I remember one TEX user saying that my sug- hyphenation. For the first group of languages, one gestion to enhance standard hyphenation patterns should start with patterns for most frequent end- with custom-made ones to allow better hyphenation ings and suffixes and prefixes. For syllable-based of chemical formulæ would save his employer thou- hyphenation, patterns based on sequences of conso- sands of pounds per year. Of course, with this ap- nants and vowels might be used (cf. Chicago Manual proach, one has to archive full sources for every pub- of Style, 1993, Section 6.44, and Haralambous, 1999) lication, together with hyphenation patterns and ex- as first approximation of hyphenation patterns. ceptions. As using TEX itself for hyphenation of word lists One of the possible reasons PATGEN has not been and development of patterns may be preferred to used more extensively may be the high investment other possibilities, we will start with this portable needed to create hyphenated lists of words, or bet- solution, using hyphenation of phonetic transcrip- ter, a morphological database of a given language. tions as an example of a syllable-based “language”. Pattern bootstrapping and iterative Let’s start with some plain TEX code to define development consonant-vowel (CV) patterns: % ... loading plain.tex The road to wisdom? % without hyphen.tex patterns ... Well it’s plain and simple to express: Err and err and err again \patterns{cv1cv cv2c1c ccv1c cccv1c but less and less and less. ccccv1c cccccv1c v2v1 v2v2v1 v2v2v2v1 — (Hein, 1966) ... }
242 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting Hyphenation on Demand
There is a way to typeset words together with their that was input. PERL addicts may want to use the hyphenation points in TEX; the code from Olˇs´ak PERL hyphenation module (Pazdziora, 1997) for the (1997, with minor modifications) looks like this: task. \def\showhyphenspar{\begingroup Once the job of proofreading the word list is \overfullrule=0pt \parindent0pt finished, we can generate new patterns and collect \hbadness=10000 \tt other words in the language. Using new patterns \def\par{\setparams\endgraf\composelines}% on the new collection will show the efficiency of the \setbox0=\vbox\bgroup process. \noindent\hskip0pt\relax} Fine tuning of patterns may be iterated, once PATGEN parameters are set, so that nearly 100 % cov- \def\setparams{\leftskip=0pt erage of hyphenation points is achieved in every it- \rightskip=0pt plus 1fil eration. The setting of such PATGEN parameters may \linepenalty1000 \pretolerance=-1 be difficult to find on the first attempt. Setting of ˇ \hyphenpenalty=-10000} these parameters is discussed in Sojka and Seveˇcek (1995). \def\composelines{% Modularity of patterns \global\setbox1=\hbox{}% \loop It is tractable for some languages to create patterns \setbox0=\lastbox \unskip \unpenalty by hand, simply by writing patterns according to the \ifhbox0 % rules for a given language. This approach is, how- \global\setbox1=\hbox{% ever, doomed to failure for complex languages with \unhbox0\unskip\hskip0pt\unhbox1}% several levels of exceptions. Nevertheless, there are \repeat special cases in which we may build pattern modules \egroup % close \setbox0=\vbox and concatenate patterns to achieve special purpose \exhyphenpenalty=10000% behaviour. This applies especially when additional \emergencystretch=4em% characters (not handled when patterns have been \unhbox1\endgraf built originally) may occur in words that we still \endgroup} want to hyphenate. Now, we will typeset our word list in the typewriter Patterns generated by Raichle (1997) may serve as an example that can be used with any fonts in the font without ligatures. To use the CV patterns de- standard LAT X eight-bit T1 font encoding, to allow fined above we need to map word characters prop- E erly: hyphenation after an explicit hyphen. Similar pat- tern modules can be written for words or chemical % vowels mapping formulæ that contain braces and parentheses. These \lccode‘\a=‘v \lccode‘\e=‘v can be combined with “standard” patterns in the \lccode‘\i=‘v \lccode‘\o=‘v needed encodings. Some problems might be caused ... by the fact that TEX does not allow metrics to be % consonants defined for \lefthyphenmin and \righthyphenmin \lccode‘\b=‘c \lccode‘\c=‘c properly—we might want to say that ligatures, for \lccode‘\d=‘c \lccode‘\f=‘c instance, count as a single letter only or that some ... characters should not affect hyphenation at all (e.g. \raggedbottom \nopagenumbers parentheses in words like colo(u)r). We must wait \showhyphenspar until some naming mechanisms for output glyphs The need to fully automate the (characters) is adopted by the TEX community for batch typesetting process increases handling these issues. with the use of word in wordlist Adding a new primitive for the hyphenmin code ... —let’s call it \hccode,a calque on \lccode — would \par\bye cause similar problems: changing it in mid-para- Finally, extracting hyphenated words from dvi the graph would have unpredictable results.1 file via the dvitype program, we get our word list It is advisable to create modules or libraries of hyphenated by our simple CV patterns. special-purpose hyphenation patterns, such as the Another way to get the initial word list hyphen- 1 ated is to use PATGEN with initial patterns and no ε-TEXv2hasanewfeaturetofixthe\lccode values new level, letting PATGEN hyphenate the word list during the pattern read phase.
TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting 243 Petr Sojka
ones mentioned above, to ease the task of pattern akkompagnement sb [æk^mpænj{- development. These patterns might be written in 'ma4] -et, -er hudebnı´ doprovod m such as as to be easily adaptable for use with core alimentationsbidrag sb [ælimEntæ- patterns of a different language. 'sˇo:’ns,bi,dra:’w] -et, - alimenty pl, prıˇ´spevekˇ m na vy´zˇivne´dı´teteˇ Common patterns for more languages befolkningseksplosion sb [be'f^l’g- Having large hyphenated word lists of several lan- ne4s-] -en, -er populacnıˇ ´explozef guages the possibility then exists to make multilin- -tilvækst -en, -er pˇrı´rustek ˚ m gual or special-purpose patterns from collections of obyvatelstva -tæthed -en, -er words by using PATGEN. Joining word lists and gen- hustota f obyvatelstva erating patterns on demand for particular publica- bemærkelsesværdig adj [be'mæR- tions is especially useful when the word databases g{ls{s,væR’di] -t, -e pozoruhodny´ are structured and split into sublists of personal beslutningsdygtig adj [be'slud- names, geographic names, abbreviations, etc. These ne4s,døgdi] -t, -e schopny´ patterns are requested when typesetting material in rozhodovat; den lovgivende for- which language switching is not properly done (e.g. samling var za´konoda´rne´ shro- ma´zˇdenıˇ ´ bylo˜ schopne´seusna´sˇet on the WWW). Czech and Slovak are very closely related lan- Figure 1: Example of phonetic hyphenation guages. Although they do not share exactly the in Kirsteinov´a and Borg (1999). same alphabet, rules for hyphenation are similar. That has led us to the idea of making one set of hy- phenation patterns to work for both languages, sav- 3. Hyphenate this word list with the initial set of ing on space in a format file that supports both. In patterns. the Czech/Slovak standard T X distribution there is E 4. Check and correct all hyphenated words. support for different font encodings. For every en- coding, hyphenation patterns have to be loaded as 5. Generate final quality patterns. there is no character remapping on the level of trie In bigger publishing projects efforts like this pay off possible. Such Czechoslovak patterns would save very quickly. patterns for each encoding in use. It should be mentioned that this approach can- Hyphenation for an etymological dictionary not be taken for any set of languages as there may In some publications (Rejzek, in prep., for exam- be, in general, identical words that hyphenate dif- ple), a different problem can arise: the possibility of ferently in different languages; thus, simply merging having more than 256 characters used within a sin- word lists to feed PATGEN is not sufficient without de- gle paragraph. This problem cannot, in general, be 3 grading the performance of patterns by forbidding easily solved withintheframeofTEX82. We thus hyphenation in these conflicting words (e.g. re-cord tried Ω, the typesetting system by Plaice and Hara- vs. rec-ord). lambous, for this purpose. One has to create special virtual fonts (e.g., by using the fontinst package) on Phonetic hyphenation top of the Ω ones, in order to typeset it—see Fig. 2. As an example of custom-made hyphenation pat- terns, the patterns required to hyphenate a pho- More hyphenation classes netic (IPA) transcription are described in this sec- But at least I can point out a minor weakness 2 tion. Dictionaries use this extensively — see Fig. 1, of TEX’s algorithm: all possible hyphenations taken from Kirsteinov´a. have the same penalty. This might be ok The steps used to develop the hyphenation pat- for english, but for languages like German terns for this dictionary were similar to those de- that have a lot of composite words there scribed in the previous section on bootstrapping: should be the ability to assign lower penalties between parts of a composite i.e. Um-brechen 1. Write down the most obvious (syllable) pat- should be favored against Umbre-chen. terns. — (Hars, 1999) 2. Extract all phonetic words from available texts. 3 One could try to re-encode all fonts used in parallel in some paragraph such that they share the same \lccode 2 The IPA font used is TechPhonetic, downloadable from mappings, but this exercise would have to be made for each http://www.sil.org/ftp/PUB/SOFTWARE/WIN/FONTS/. multilingual-intensive publication, again and again.
244 TUGboat, Volume 20 (1999), No. 3 — Proceedings of the 1999 Annual Meeting
Hyphenation on Demand