
TUGBOAT

Volume 41, Number 3 / 2020

General Delivery
  259  From the president / Boris Veytsman
  260  Editorial comments / Barbara Beeton
       Passings: Janusz Nowacki, Ed Benguiat, Ron Graham; new Unicode-specific area in CTAN; fonts, fonts, fonts; Erratum: “The road to Noto”, Steven Matteson; Learning LATEX; Graphical history in action; Followup to “old news”; Making TUGboat more accessible
  263  How TEX changed my life / Michael Barr
  265  Typographers’ Inn / Peter Flynn
Fonts
  269  Eye charts in focus: The magic of optotypes / Lorrie Frear
Multilingual Document Processing
  275  The Non-Latin scripts & typography / Kamal Mansour
Tutorials
  281  Using DocStrip for multiple document variants / Matthew Leingang
LATEX
  286  LATEX news, issue 32, October 2020 / LATEX Project Team
  292  LATEX Tagged PDF — A blueprint for a large project / Frank Mittelbach, Chris Rowley
  299  Functions and expl3 / Enrico Gregorio
  308  bib2gls: selection, cross-references and locations / Nicola Talbot
  318  Making Markdown into a microwave meal / Vít Novotný
Graphics
  320  User-defined Type 3 fonts in LuaTEX / Hans Hagen
  324  Data display, plots and graphs / Peter Wilson
Software & Tools
  327  Short report on the state of LuaTEX, 2020 / Luigi Scarso
  329  Distinguishing 8-bit characters and Japanese characters in (u)pTEX / Hironori Kitagawa
  335  Keyword scanning / Hans Hagen
  337  Representation of macro parameters / Hans Hagen
  341  TEXdoc online — a web interface for serving TEX documentation / Island of TEX
  343  MMTEX: Creating a minimal and modern TEX distribution for GNU/Linux / Michal Vlasák
  346  UTF-8 installations of TEX / Igor Liferenko
Macros
  348  OpTEX — A new generation of Plain TEX / Petr Olšák
Reviews
  355  Book reviews: Robert Granjon, letter-cutter, and Granjon’s Flowers, by Hendrik D.L. Vervliet / Charles Bigelow
  358  Book review: Glisterings, by Peter Wilson / Boris Veytsman
  360  Historical review of TEX3 / Peter Flynn
Hints & Tricks
  368  The treasure chest / Karl Berry
TUG Business
  258  TUGboat editorial information
  258  TUG institutional members
  369  TUG 2021 election
Advertisements
  370  TEX consulting and production services
News
  372  Calendar

TUGboat (ISSN 0896-3207) is published by the TEX Users Group. Web: tug.org/TUGboat.

Individual memberships
2020 dues for individual members are as follows:
  Trial rate for new members: $30.
  Regular members: $105.
  Special rate: $75.
The special rate is available to students, seniors, and citizens of countries with modest economies, as detailed on our web site. Members may also choose to receive TUGboat and other benefits electronically, at a discount. All membership options are described at tug.org/join.html.

Membership in the TEX Users Group is for the calendar year, and includes all issues of TUGboat for the year in which membership begins or is renewed, as well as software distributions and other benefits. Individual membership carries with it such rights and responsibilities as voting in TUG elections. All the details are on the TUG web site.

TEX Users Group Board of Directors
  Donald Knuth, Ur Wizard of TEX-arcana†
  Boris Veytsman, President*
  Arthur Rosendahl*, Vice President
  Karl Berry*, Treasurer
  Klaus Höppner*, Secretary
  Barbara Beeton
  Johannes Braams
  Kaja Christiansen
  Jim Hefferon
  Taco Hoekwater
  Frank Mittelbach
  Ross Moore
  Norbert Preining
  Will Robertson
  Herbert Voß
  Raymond Goucher, Founding Executive Director†
  Hermann Zapf (1918–2015), Wizard of Fonts†
*member of executive committee
†honorary
See tug.org/board.html for a roster of all past and present board members, and other official positions.

Journal subscriptions
TUGboat subscriptions (non-voting) are available to libraries and other organizations or individuals for whom memberships are either not appropriate or desired. Subscriptions are delivered on a calendar year basis. The subscription rate for 2020 is $115.

Institutional memberships
Institutional membership is primarily a means of showing continuing interest in and support for TEX and TUG. It also provides a discounted membership rate, site-wide electronic access, and other benefits. For further information, see tug.org/instmem.html or contact the TUG office.

Addresses
TEX Users Group
P.O. Box 2311
Portland, OR 97208-2311
U.S.A.

Telephone: +1 503 223-9994
Fax: +1 815 301-3568
Web: tug.org; tug.org/TUGboat

Electronic Mail
General correspondence, membership, subscriptions: [email protected]
Submissions to TUGboat, letters to the Editor: [email protected]
Technical support for TEX users: [email protected]
Contact the Board of Directors: [email protected]

Trademarks
Many trademarked names appear in the pages of TUGboat. If there is any question about whether a name is or is not a trademark, prudence dictates that it should be treated as if it is.

Copyright 2020 TEX Users Group.
Copyright to individual articles within this publication remains with their authors, so the articles may not be reproduced, distributed or translated without the authors’ permission.
For the editorial and other material not ascribed to a particular author, permission is granted to make and distribute verbatim copies without royalty, in any medium, provided the copyright notice and this permission notice are preserved.
Permission is also granted to make, copy and distribute translations of such editorial material into another language, except that the TEX Users Group must approve translations of this permission notice itself. Lacking such approval, the original English permission notice must be included.

[printing date: November 2020]
Printed in U.S.A.

“The most beautiful thing in the world is a blank piece of paper.”
— Ed Benguiat (1979), quoted in New York Times obituary, 16 October 2020

COMMUNICATIONS OF THE TEX USERS GROUP
EDITOR: BARBARA BEETON

VOLUME 41, NUMBER 3, 2020
PORTLAND, OREGON, U.S.A.

TUGboat editorial information

This regular issue (Vol. 41, No. 3) is the last issue of the 2020 volume year. The deadline for the first issue in Vol. 42 is March 31, 2021. Contributions are requested.

TUGboat is distributed as a benefit of membership to all current TUG members. It is also available to non-members in printed form through the TUG store (tug.org/store), and online at the TUGboat web site (tug.org/TUGboat). Online publication to non-members is delayed for one issue, to give members the benefit of early access.

Submissions to TUGboat are reviewed by volunteers and checked by the Editor before publication. However, the authors are assumed to be the experts. Questions regarding content or accuracy should therefore be directed to the authors, with an information copy to the Editor.

Submitting items for publication
Proposals and requests for TUGboat articles are gratefully received. Please submit contributions by electronic mail to [email protected].

The TUGboat style files, for use with plain TEX and LATEX, are available from CTAN and the TUGboat web site, and are included in common TEX distributions. We also accept submissions using ConTEXt. Deadlines, templates, tips for authors, and more are available at tug.org/TUGboat.

Effective with the 2005 volume year, submission of a new manuscript implies permission to publish the article, if accepted, on the TUGboat web site, as well as in print. Thus, the physical address you provide in the manuscript will also be available online. If you have any reservations about posting online, please notify the editors at the time of submission and we will be happy to make suitable arrangements.

TUGboat editorial board
Barbara Beeton, Editor-in-Chief
Karl Berry, Production Manager
Robin Laakso, Office Manager
Boris Veytsman, Associate Editor, Book Reviews

TUGboat advertising
For advertising rates and information, including consultant listings, contact the TUG office, or see:
tug.org/TUGboat/advertising.html
tug.org/consultants.html

Other TUG publications
TUG is interested in considering additional manuscripts for publication, such as manuals, instructional materials, documentation, or works on any other topic that might be useful to the TEX community in general.
If you have such items or know of any that you would like considered for publication, please contact the Publications Committee at [email protected].

TUG Institutional Members

TUG institutional members receive a discount on multiple memberships, site-wide electronic access, and other benefits: tug.org/instmem.html. Thanks to all for their support!

Adobe Inc., San Jose, California
American Mathematical Society, Providence, Rhode Island
Association for Computing Machinery, New York, New York
Aware Software, Newark, Delaware
Center for Computing Sciences, Bowie, Maryland
CSTUG, Praha, Czech Republic
Harris Space and Intelligence Systems, Melbourne, Florida
Hindawi Foundation, London, UK
Institute for Defense Analyses, Center for Communications Research, Princeton, New Jersey
Maluhy & Co., São Paulo, Brazil
Marquette University, Milwaukee, Wisconsin
Masaryk University, Faculty of Informatics, Brno, Czech Republic
Nagwa Limited, Windsor, UK
New York University, Academic Computing Facility, New York, New York
Overleaf, London, UK
StackExchange, New York City, New York
Stockholm University, Department of Mathematics, Stockholm, Sweden
TEXFolio, Trivandrum, India
Université Laval, Ste-Foy, Québec, Canada
University of Ontario Institute of Technology, Oshawa, Ontario, Canada
University of Oslo, Institute of Informatics, Blindern, Oslo, Norway
VTEX UAB, Vilnius, Lithuania

From the president

Boris Veytsman

There is an interesting paradox in the history of technology. Early adopters start with the first variants of the innovation. They continue to use them while the world around them deploys newer and slicker versions, so the former look rather old and quaint in comparison — precisely because they have been pioneers in the acceptance of the new ideas. This can be seen in many examples; the quirks of the NTSC broadcasting standard adopted in the United States are one example. For another, remember my surprise when I first saw the inside of Mission Control at NASA Goddard Space Flight Center: the communication devices with their cloth-covered speakers and mother-of-pearl buttons were reminiscent of 1960s turntables rather than futuristic visions of the latest Star Trek. Of course these devices were installed in that era and have never needed an upgrade.

The influence of this paradox can be seen in the place of TEX in the free software community — or rather in the wider free information community. TEX users already exchanged tapes of early implementations when the proprietary nature of software was taken for granted by many actors in the field. The subsequent appearance of the Comprehensive TEX Archive Network, CTAN, became the model for such archives as CPAN, CRAN, and others. Our flagship publication, TUGboat, started publishing papers about free TEX software decades before the Journal of Open Source Software was conceived. However, for many users and activists of the free software community our approach may seem definitely quaint and strange. The fact that TEX has a dual role as a program to typeset the texts and a language to encode them does not help here. The need to preserve the integrity of the language and the ability to faithfully typeset old manuscripts led to the rather unusual requirements of the LATEX Project Public License. I remember heated discussions with some purists insisting that the LPPL, and the license of TEX itself, were not free. Fortunately, since the LPPL and TEX license requirements were accepted years ago by the GNU Project and the Debian Free Software Guidelines as free, we can put these discussions to rest.

There is, however, another side to the early adopter paradox. The first versions of an innovation often contain more ideas than the later ones. By streamlining the design, the subsequent generations of engineers strip the “unnecessary” ideas and thoughts. Thus, an innovator seeking inspiration is well-advised to study the early works. This is well known in the arts, where studying and copying old classics is considered an obligatory part of an education. Science and technology students are less keen to study classics — albeit my advisor, Prof. Nikolay Malomuzh, urged us to read papers by Einstein or Bohr rather than their summaries in textbooks. “When you read a textbook,” he said, “you learn only what its author understood in the original paper.”

The TEX community approach to free software is based on ideas from Don Knuth. As a prolific author and mathematician, Knuth followed the old traditions of mathematics when thinking about intellectual property. The way a theorem belongs to its author is quite different from the way Mickey Mouse belongs to Disney Studios. These ideas might be even more relevant now, since the free software approach has become popular outside of the world of software itself. Scientific papers are increasingly available on preprint servers and in open access journals. Many publishers and granting agencies require the authors to make both their data and code publicly available. The need to significantly accelerate science due to the COVID-19 pandemic has only accelerated these trends. The free software community is becoming a part of a more general free information community. There is an understanding that the old licenses and notions based on the experience of free software, while very important, may not be sufficient for the many new kinds of information. The appearance of innovative ideas such as the Creative Commons licenses attests to this understanding. I wonder whether a re-examination of TEX community practices might be useful in the search for approaches to tackle the new reality of the open information epoch.

Besides providing food for thought about approaches to intellectual property, our community is also in the business of providing technical means for the free information movement. Many scientific and technical papers — as well as works of fiction, technical documentation, etc. — are typeset with TEX. As always, there is more to do. I think we should do more to aid free tools in supporting advanced features of (the ubiquitous) PDF documents. We need free tools for the creation of accessible PDFs — a technology now being increasingly addressed by TEX developers. A less TEXnical but perhaps equally important problem is the improvement of free PDF reading software, especially in handling PDF forms. I hope our development fund (tug.org/tc/devfund) can help with incentivizing developers to address these problems.

⋄ Boris Veytsman
president (at) tug dot org


Editorial comments

Barbara Beeton

Passings: Janusz Nowacki, Ed Benguiat, Ron Graham

This year has not been kind to font designers.

Janusz Marian Nowacki (9 July 1951–7 June 2020) was an active participant in the e-foundry team that created the Latin Modern family and the TEX Gyre fonts, and an honorary member of GUST, the Polish TEX group. He was responsible for the revival of several traditional Polish fonts, including Antykwa Toruńska and Antykwa Półtawskiego, which he implemented using MetaType1. His attraction to TEX was in service to his principal occupation as a rubber stamp maker and his hobby as a fine art photographer. A more personal remembrance appears on the GUST web site.[1]

The prominent U.S. font designer Ephraim Edward “Ed” Benguiat (27 October 1927–15 October 2020) is possibly best known for the font bearing his name — ITC Benguiat,[2] a decorative typeface loosely based on styles of the Art Nouveau period. This typeface was released in 1977 by the International Typeface Corporation (ITC), an independent licensing company for type designers which he helped establish, and for which he became vice president. Among his other accomplishments was work on the redesign of The New York Times logo; this was more a “cleanup” than a full redesign, and the Times obituary quotes Benguiat thus: “My thought was, ‘OK, we’ll change it — but if we change it, nobody will recognize it. So all I did was take it and fix it.’” The obituary contains some good advice for aspiring typographers and is well worth reading.[3]

Not a font designer, but a co-author with Don Knuth and Oren Patashnik of Concrete Mathematics, one of the first books to use the AMS Euler font, Ronald Graham (31 October 1935–6 July 2020) was a President of the American Mathematical Society (1993–1994) and creator of the Erdős number, a measure of the distance from Paul Erdős in the collaboration network of mathematical publication.

R.I.P. Messrs. Nowacki, Benguiat, and Graham.

A new Unicode-specific area in CTAN

The new directory tree CTAN:/macros/unicodetex is meant for macro packages that work with either XeTEX or LuaTEX, but not with ‘traditional’ TEX engines like TEX and pdfTEX.

Macro packages that require LuaTEX (and work with none of the other engines!) are stored in CTAN:/macros/luatex.

Macro packages that require XeTEX (and work with none of the other engines!) are stored in CTAN:/macros/xetex.

Macro packages that work with any TEX engine, or only with the ‘traditional’ engines, are stored, as before, in CTAN:/macros outside the directories mentioned above.

So far, the following packages have been relocated to CTAN:/macros/unicodetex/: fontsetup, fontspec, lilyglyphs, polyglossia, quran, realscripts, texnegar, unicode-math, xltxtra.

Feedback would be most welcome about more packages that should be moved to another location, according to this classification. Please email [email protected] if you are the author, or a knowledgeable user, of a package that you feel should go to the new CTAN:/macros/unicodetex area, but is still located elsewhere on the archive.

[1] www.gust.org.pl/news/jmn-obit-en
[2] www.fonts.com/font/itc/itc-benguiat
[3] www.nytimes.com/2020/10/16/business/media/ed-benguiat-dead.html
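To make the classification concrete, here is a minimal sketch of a document that belongs in the unicodetex world: it loads fontspec, one of the relocated packages, and refuses to run on a traditional engine. This example is mine, not part of the CTAN announcement; the iftex package supplies the engine tests.

    % Compile with xelatex or lualatex; pdflatex will stop with an error.
    \documentclass{article}
    \usepackage{iftex}            % engine tests: \ifXeTeX, \ifLuaTeX, ...
    \ifXeTeX\else\ifLuaTeX\else
      \errmessage{This document requires XeTeX or LuaTeX}%
    \fi\fi
    \usepackage{fontspec}         % one of the relocated packages
    \setmainfont{Latin Modern Roman}
    \begin{document}
    Unicode input works directly: déjà vu, Σ, ∞.
    \end{document}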
Fonts, fonts, fonts

Font Wars

For about half a millennium before TEX was created, type was metal, but in the mid-20th century, metal type began to be replaced by phototype — negative images of letters and other symbols on film, through which a light was flashed to record an image on a photosensitive surface. Don Knuth had access to an early laser printer, and realized that images could be represented by a pattern of zeros and ones — a bitmap — and from this idea, METAFONT was born, as a necessary adjunct to TEX. But this was still a specialized operation, carried out on a large shared computer. When, in the 1980s, personal computers became available, one of the first killer applications was word processing. Soon after, personal-sized laser printers appeared, and the race was on to provide fonts that would allow any user of a PC to create any kind of document their occupation required.

Competition among the manufacturers of PCs and associated software was fierce, and the part of it that dealt with the printing of documents became characterized as the “Font Wars”. This period has been described in a number of places, one of which we celebrate here. Chuck Bigelow, as part of a 2017 symposium on the history of desktop publishing sponsored jointly by the IEEE Computer Society and the Computer History Museum in Mountain View, California, tells the story of this period in a two-part article: “The Font Wars”, IEEE Annals of the History of Computing, 42:1, January–March 2020, pp. 7–40.

Additional notes are provided as a web supplement: history.computer.org/annals/dtp/fw/. TUG plans to publish the entire enterprise in book form next year, with additional material. Since Chuck was a key participant in this saga, this is a true first person account, well researched and lucidly presented. Interspersed with the text are numerous illustrations depicting various methods, old and new, used for defining the shapes of letters.

This same subject has been covered from a different point of view (that of someone active in the printing industry) by Frank Romano in his book History of Desktop Publishing, which was reviewed in an earlier issue of TUGboat (tug.org/TUGboat/tb41-1/tb127reviews-romano.pdf). This book characterizes the different personalities and points of view of the principals involved in the hardware end of the font wars, and is notable for the presence of some photographs that illuminate the intensity of competition that marked the period.

Computer Modern / Latin Modern

Changing gears, visible differences between Computer Modern and Latin Modern were addressed by a question on the tex.stackexchange Q&A site (tex.stackexchange.com/q/48369). CM has been criticized for appearing too thin, especially on screen, but a comparison of copies of Knuth’s The Art of Computer Programming (TAOCP) shows it to appear more substantial than other documents produced with (LA)TEX. An answer to this question notes that a great deal of LATEX material is now by default set with Latin Modern, not the original CM. It attributes the difference in weight to the model used for LM (and also for the Type 1 implementations of CM), which has a lesser value for the METAFONT variable blacker, leading to noticeably thinner stems.

Other fonts, and some related software

A productive field for font development is directed toward assisting readers with visual disabilities. Several new fonts and supporting software in this area have been announced on the web.

• Luciole (French for “firefly”)[4] is a sans serif font intended to “advance research”, supporting almost all European languages and including many Greek and math symbols for scientific notation.

• Lexend[5] is a “variable” font (or series of variations on an underlying sans serif font style) that is intended to change depending on feedback from the reader’s ability to comprehend a text (measured in “words correct per minute”). Its goal is to improve reading proficiency.

• The Accessible RMarkdown Writer[6] is not a font, but “a tool that creates documents in various formats based on RMarkdown text”. Designed to help create “scientifically rigorous” complex documents, it is based on the existing tool Markdown, with the ability to add inline R-code, and uses pull-down menus to access symbols. BibTEX is supported for inclusion of references.

Erratum: “The Road to Noto”, Steven Matteson (TUGboat 41:2, 145–154)

A question was raised regarding the relative antiquity of Anatolian hieroglyphics compared to Egyptian, as stated on page 152: “Anatolian hieroglyphs are at least 4,000 years old, thus predating Egyptian hieroglyphs.” When Matteson was asked about this, he responded, “Whoops — yes I see looking at the timeline I quickly sketched out for the talk I transposed the words ‘Akkadian’ and ‘Anatolian’. Big mistake on my part and I’m very happy it was pointed out.”

But there’s more to this story.

To begin, “Akkadian” applies to cuneiform, not hieroglyphs. Cuneiform has traditionally been considered slightly older than hieroglyphs, but archaeologists are still digging. And the text here mentions only hieroglyphs, not cuneiform. So, where did “Anatolian” come from?

One of the goals of Noto is to support all language scripts in Unicode; the relevant block (U+14400–U+1467F) is named “Anatolian hieroglyphs”. The request for adding this script to Unicode (found in the Unicode archives) was submitted by the UC Berkeley Script Encoding Initiative. This document indicates that the script was used by multiple languages, so a regional rather than a linguistic name was applied. Sadly, no temporal information is included.

Elsewhere, in a Wikipedia reference for Egyptian hieroglyphics (https://en.wikipedia.org/wiki/Egyptian_hieroglyphs), it is stated that

    Since the 1990s, [. . . ] discoveries of glyphs at Abydos [Egypt], dated to between 3400 and 3200 BCE, have shed doubt on the classical notion that the Mesopotamian symbol system predates the Egyptian one. However, Egyptian writing appeared suddenly at that time, while Mesopotamia had a long evolutionary history of sign usage in tokens dating back to circa 8000 BCE.

[4] luciole-vision.com/luciole-en.html
[5] lexend.com
[6] www.arowtool.com

It is tempting to infer that the underlying idea was “in the air”, resulting in relatively contemporaneous development.

Learning LATEX

An introductory manual, Learning LATEX, by David Griffiths and Desmond Higham, first published in 1997 with a second edition in 2016, has been highlighted by a commentary in SIAM News.[7] In it, the authors predict “future LaTeX breakthroughs, leading up to the release of LaTeX3 in 2051.” I think that some of these have already come true.

Returning to the present, the new LATEX “instructional” site, learnlatex.org, has in just a few months progressed from an idea to a full-scale operational reality. Introduced by Joseph Wright in a talk at TUG 2020,[8] the collection of elementary lessons has already been translated into French, Portuguese, Vietnamese and Spanish, and more translations are underway. Each lesson contains one or more typical examples, which can be run directly from the connected page or modified for a different view; experimentation is encouraged. Access to, and presumably use of, the site has grown to several hundred connections per day.

One of the principal goals of learnlatex is to keep it current, unlike many existing web sites that have been constructed but left to lie fallow (sometimes for years), or books that represent a particular point in time.

Graphical history in action

Not fonts, not TEX, but an art form that does utilize fonts as an integral part of its message and considerable charm: posters for the national parks of the U.S. have been compelling advertisements for the locations they picture. Among the most attractive of the lot are the posters created as part of the Works Progress Administration (WPA), an organization created during the Great Depression as a means of supporting artists whose talents did not otherwise yield a stable existence.

An article[9] in the New York Times tells the story of a retired dentist, Doug Leen, who has tracked down original posters and reproduced them, and also designed new posters in the distinctive style. Illustrations accompanying the article show both original and newly created items; it’s very hard to distinguish which is which. (Will there be anyone, or any reason, to memorialize this year’s disaster? The need is certainly there among artists.)

Followup to “old news”

With the cooperation of the Computer History Museum, ACM has carried out interviews with many recipients of the Turing Award. This includes both Don Knuth and Leslie Lamport. The interviews (and transcripts) are online. Start with the main announcement page for Knuth, at amturing.acm.org/award_winners/knuth_1013846.cfm. On that page are links for all forms of the interview, and an alphabetical listing of award recipients. (Leslie is listed next after Don.) The interviews are long — more than seven hours with Don, and about five and a half with Leslie. (I learned from Don’s interview that his long-time secretary, Phyllis Winkler, was able to read his handwriting, was a top-notch technical typist, and thus a superlative candidate for the first non-DEK TEX tester.) Put this site on your list of things to turn to when you have a few hours of quiet time available.

TUG maintains an extensive list of Knuth videos at tug.org/interviews/#knuthav as well as links for other people in the TEX world elsewhere on tug.org/interviews. If you know of anything we’ve missed, please let us know.

Making TUGboat more accessible

No, this isn’t a claim that TUGboat will be easier for someone with a visual disability to read, but starting with the next issue, each article will carry a DOI — a Digital Object Identifier. This unique identifier will broaden digital access to TUGboat content, allowing it to be included in major collections of bibliographic data, starting with that of Crossref, the DOI registrar for material like that published by TUG.

The assigned DOI prefix for TUG is 10.47397. To this will be added the journal, volume, issue, and article identification. For new issues, the DOI will appear below the bottom of the first column on the first page of each item to which a DOI is assigned. Assignment of DOIs to earlier issues will proceed as time permits.

The DOIs will be added to Nelson Beebe’s BibTEX database of TUGboat contents.

[7] sinews.siam.org/Details-Page/writing-learning-latex
[8] tug.org/TUGboat/41-2/tb128carlisle-learnlatex.pdf; video: youtu.be/0qTBtKr-5c0
[9] www.nytimes.com/2020/08/25/style/ranger-doug-leen-wpa-national-park-posters.html

⋄ Barbara Beeton
https://tug.org/TUGboat
tugboat (at) tug dot org

How TEX changed my life

Michael Barr

Abstract

I describe the complicated process of writing math papers, getting them typed, and finally getting them published in the days before TEX, and how much things changed after.

1 Introduction

As I look back on a career writing papers that started with my 1962 PhD thesis, I am amazed how much things changed from before Don Knuth’s amazing program TEX to after. The process of getting math into type was long, messy, and extremely error-prone. It started with writing a fair copy in longhand that a typist could hope to read, followed by correction, submission, revision, rinse and repeat. At the end of the process, the whole exercise had to be repeated with the proofs — all of which changed with the coming of TEX and LATEX.

2 My thesis

Even in retrospect, it was a mess, a long very detailed computation full of Greek letters and loads of subscripts. I was living at home, we rented a typewriter, and my mother, who could type, did most of it (with me dictating for the most part) and I did a little. With most of the Greek letters, we left space and wrote them in with a pen. In a few cases, I was able to improvise something. For example, O, backspace, I would give a reasonable approximation of Φ. Or +, backspace, 0 gave something like ⊕. There were no italics; in point of fact until TEX came along, I was not consciously aware that theorems and the like were always set in italics and that math symbols were also italic. I don’t believe journals used different fonts for text and math italics, incidentally.

3 Writing papers

I never made any attempt to publish my thesis, although in fact I published the results in much greater generality and without any computation seven years later. But I did start writing papers. To get a paper into print was a long messy process. First write a fair copy longhand. Give it to a department typist and hope that she (it was always a woman) could understand it. A typescript would emerge and I would have to correct it and give it back to the typist. After getting it corrected, I would have to add the Greek letters, symbols, script letters, etc. by hand.

We did have photocopiers, but copies cost ten cents a copy, probably more like a dollar in today’s money. Still you made a copy. In an earlier era, the typist might have typed a mimeograph master. I cannot imagine getting that corrected. Even earlier, the hand-written draft would be the one submitted for publication. I once looked up instructions for authors and discovered that the American Math. Society started requiring typewritten submissions only in the early 1920s.

At any rate you submitted your masterpiece to a journal, which sent it out for refereeing. The referee invariably wanted revisions. Many of them involved shortening the paper, removing details, making it harder to read. This wasn’t perverse on their part: mathematics was very expensive to print. The paper would have to be revised, which meant starting the whole process over. Assuming the paper was finally accepted, it then got copy-edited. For mathematics, this meant marking up the theorem-like environments for italic, likewise with all the variables, Greek letters, script letters, Frakturs, and special symbols. Then it was sent to a typesetter, who tried to match the marked-up typescript as well as he could. He had about 250 characters that he could put into his machine and he would look at the paper to decide which ones to load. Any additional ones would have to be inserted by hand at extra cost. While 250 sounds like a lot, it has to include the standard alphanumeric characters, italic, bold, large sizes for headers, and whatever special characters the author might have chosen. Once he had done his best, proofs would be returned to the author, who had to read it again and find and mark errors. Minor author corrections were permitted but discouraged. And nothing that would change the size of a paragraph was allowed. Finally, publication!

Starting in the mid-1960s there were small improvements in the process. The first innovation was things called typits. A typit was a character (symbol or otherwise) on what looked like a typewriter key. You would hold it against the typewriter ribbon and hit any other key. That would hit the typit, which would impress the character on the page. I remember one typist in my department who had a box of typits on her desk and got very fast about finding the required character. Still the process was slow and error prone.

Next came the IBM Selectric typewriter with its typing ball replacing the typewriter keys. I will not try to describe the Selectric (you can google it), but only mention that you could change the ball to a symbol ball or another — there was likely an italic ball — and use that. Still it was a slow process. Replace the ball, find which character to type and do so, then put back the standard ball. But the whole process of writing a fair copy, getting it typed, correcting it, really hadn’t changed.


You didn’t have as much hand work, and the typesetter’s job must have been somewhat easier, but it was still slow and expensive.

I think it was in the summer of 1979 that I got so fed up with the whole process that I and my two older children found a book, vintage 1945, “Teach Yourself to Type”, and sat down and learned to type. I then bought a second-hand Selectric typewriter along with a symbol ball, determined to learn how to do my own papers. But I never actually did that. Still, learning to type was what I would eventually have to do.

4 The coming of TEX

In 1979 I started working on a book [TTT] jointly with the late Charles Wells of Case Western University. At first, we exchanged typescripts by mail. Unfortunately, mail between Cleveland and Montreal took a minimum of two weeks. In 1980, we discovered [T&M] and decided on the spot that we would try to use TEX to do our book. My brother, a computer professional, told me we were nuts until we had access to an implementation we could use. We did it anyway. Most readers of this article will not realize the limitations of TEX 1. Most importantly, there were no add, multiply, or divide instructions. You could increment or decrement a counter by 1, but that was all. Anything like LATEX would have been impossible. Nevertheless we persisted. A big problem was the commutative diagrams. We basically left TEX mode and drew them as best we could using horizontal arrows fabricated with - signs, vertical arrows drawn with | and diagonals with / and \. Arrowheads were done with <, >, v and ^. I guess I assumed that a publisher might do the diagrams in the old-fashioned way and insert them in the right places.
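A hypothetical reconstruction (my sketch, not one of the book’s actual figures) of a commutative square drawn with exactly those keyboard characters:

                f
          A --------> B
          |           |
        g |           | h
          v           v
          C --------> D
                k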
At some point we discovered computer networks and eventually, with much help from our computer centres, we learned to transmit our files electronically. Charles had bought an Apple II in 1979 and by 1982 I had an IBM PC. We once counted that we had used 9 distinct editing programs.

I actually wrote a program that was originally intended to just remove the TEX code and print out a more-or-less readable manuscript. But in fact I discovered that, using the wonderful (for the time) abilities of the Epson FX-80 printer, I could give a reasonable interpretation of the TEX code. I also wrote a font generator to generate special fonts for the nine-pin dot matrix printer. The results weren’t pretty, but they were readable. It was all monospaced and unjustified.

But we actually had a great stroke of good fortune. We sent it to Springer-Verlag, who handed it off to the editor Roberto Mineo, who was at that time also a graduate student in computer science at Carnegie-Mellon University. He had discovered a beta version of LATEX and used it to do the required formatting, and also used the LATEX picture mode to code all the diagrams. He printed it out at CMU and Springer printed it directly from his camera-ready text. Unfortunately, the printer he used was not the best quality and the original is not up to Springer’s standards. A better version is on my website www.math.mcgill.ca/barr.

After that, PC versions of TEX appeared and I never used a typist again. Life became so much easier. I could typeset a paper and make the necessary revisions in much less time and with much less effort than the old process of getting it ready for publication had been. I never actually used the Selectric and eventually gave it to one of my students, who used it for a few years and then got his own computer and learned TEX. Now I doubt there is a math journal in the world that will accept a paper not composed in TEX or, most likely, LATEX.

Charles and I did one more book together a little later [CTCS]. We did the final version when we were both on sabbatical at Penn in 1990–91. It is interesting to compare timings. In 1990, on a 6 MHz IBM AT computer, the compilation took about an hour. Conversion from .dvi to .hp took about an hour and a half, and printing on an HP LaserJet took over an hour. I had occasion to recompile a 20% longer version of the book a few weeks ago, and producing a .pdf file took all of 8 seconds.

References

[TTT] M. Barr and C. Wells, Toposes, Triples, and Theories. Springer-Verlag, 1985.
[CTCS] M. Barr and C. Wells, Category Theory for Computing Science. Prentice-Hall International, 1990.
[T&M] D.E. Knuth, TEX and Metafont. Addison-Wesley, 1979.

⋄ Michael Barr
404-865 Ave.
Mont-Royal, QC H4P 1B2
Canada
michael.barr (at) mcgill dot ca
https://math.mcgill.ca/barr


Typographers’ Inn

Peter Flynn

To print or not to print

For over 500 years we have been surrounded by the idea that the final act of creating text is to print it. Then you can bind it, sell it, lend it, circulate it, or whatever you want, because you have ‘it’ in your hands: a book or pamphlet or leaflet, something tangible.

That idea led to the consolidation of conventions in European publishing and elsewhere, some of which was drawn from the manuscript era, about how documents work.

• The document is made up of rectangular pages, held together to form the book.
• The text starts at the beginning, in the appropriate corner, and progresses, symbol by symbol, until the end.
• Along the way it can be broken into divisions according to some conceptual or logical plan defined by the author, which can be used to guide or inform readers.
• There can be other waypoints or milestones to show readers where they are in the document, and to enable them to tell others how to find some item of interest.
• Once we moved from scrolls to pages, a human desire for order in chaos seems to have engendered some conceptions of how things conventionally look:
  – all the pages should be the same size;
  – they should all look roughly the same, or follow a limited set of patterns;
  – they should normally have the same number of lines per page; even when intruded upon by other material (mathematics, music, figures, tables) the positioning of the remaining lines should be consistent.

This is not just to make them easier to bind, but to make them easier to read, and because the people who printed and published the books eventually wanted their editions to be uniform between themselves, but still distinct from everyone else’s.

Take away the idea of printing, and you are left with the PDF or web page on your screen. It may even look like the printed page, but of course it’s just a bunch of colored dots. Yet we keep most of the features listed above because they’re useful to the readers [5] — or we hope they are.

There is a substantial body of opinion, some backed by research and some not, that you should not use PDF format for non-print use (e.g. web ‘pages’) because of the potential for severe usability problems compared with conventional HTML:

    PDFs are meant for distributing documents that users will print. They’re optimized for paper sizes, not browser windows or modern device viewports. We often see users get lost in PDFs because the print-oriented view provides only a small glimpse of the content. Users can’t scan and scroll around in a PDF like on a web page. Content is split up across sheets of paper, which is fine for printed documents, but causes severe usability problems online. [7]

Normal practice is to publish in multiple formats anyway, with a growing recommendation for HTML5 with CSS3 Paged Media features [9]. However, the use of PDF is in many cases unavoidable for technical or small-p political reasons, in particular the accuracy obtainable with LATEX which is often unavailable in browsers even with HTML5/CSS3, so we need to consider how we can overcome the legacy problems of print. In particular, whichever format you choose (or are required to use), it is essential to make the document accessible according to the prevailing guidelines in your field.

Page numbers. When the idea of the web and other forms of networked electronic publishing caught on, many academic journals and citation format authorities, accustomed to page number references, had serious concerns, because a web page isn’t a page at all — it’s essentially like an endless scroll, able to hold an entire book or even a collection of books, with never a page number to be seen. EPUB books change page numbers every time you zoom in or out for a better fit or font. Citation formats that made page numbers compulsory even came under attack for being old-fashioned by some of those who were by now publishing electronically only. Some formats dug their heels in and insisted on page numbers even for pageless documents. That particular panic is largely past, and many journals now retrofit page numbers from the PDF back into the web version (relatively trivial with TEX).


Margins. Printed books and journals are bound at the left or right edge, according to the direction of the script, which means the inside margin needs to be larger than the outside one, to allow for the curvature of the pages close to the spine when the book is open. Historically the margins were a subject of great care and attention in book design, both in manuscript and print, exemplified by Tschichold’s famous diagram (Figure 1). Most printed documents were traditionally set justified, much easier even in hand-set type than in manuscript, so the idea of the text occupying a rectangle of fixed dimensions on each page was an easy convention to continue.

Figure 1: Sketch of page proportions, with margins in the proportion 2:3:4:6 (after Tschichold [8], quoted in Lewis [6]).

In a format designed for on-screen reading, the uneven but symmetrical margins are probably an unnecessary distraction unless you can expect readers to use facing-page software. Some publishers create separate print-ready and display-ready PDFs so that online readers don’t see the odd and even margins intended for print.

Lines. While the number of lines per page can be controlled in a PDF, it is pointless and meaningless on the web, and makes an EPUB virtually unusable, as both those formats are designed to be resized by the reader. In any event, line alignment across a double-page spread is not meaningful in a browser or reader unless facing-page viewing is available. The problems of ‘show-through’, where the aligned or ‘backed-up’ lines of print on the next or previous page are visible through thin paper, are quite clearly a print-only concept.
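For LATEX users, the print-ready and display-ready variants can differ by nothing more than the margin geometry. A minimal sketch, with purely illustrative class and dimensions (not any particular publisher’s actual settings):

    % Print-ready: mirrored (inner/outer) margins for binding.
    \documentclass[twoside]{book}
    \usepackage[inner=30mm,outer=20mm]{geometry}

    % Display-ready: symmetrical margins, no facing pages.
    %\documentclass[oneside]{book}
    %\usepackage[margin=25mm]{geometry}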

Questions. So what should we be looking out for when formatting for non-print reading only? Perhaps the following can act as a starting-point:

• ‘Page’ shape (window or viewport shape may be a better term): portrait like an office document or landscape like a modern screen?
• Margins: if they no longer need to be asymmetrical, how big should they be?
• Line length: there’s more space in landscape, but let’s not use it at the expense of readability;
• Font size and leading: how can you use it to compensate for longer lines?
• ‘Page’ numbering: is it needed at all?
• Number of lines per page or screen or window: is it important?
• Consistency and similarity: do they need to be preserved if more than one document is being published in series?
• Document structure: some form of sectional division will probably continue to be needed; they will require a numbering scheme of some kind if there are no pages to number.

Paper isn’t going away any time soon, but as we start to change our reading habits, it’s worth starting to think about how that will affect our document classes.

Centering (again)

This has become a recurrent theme as people send me more examples of poor line-breaking in centred headings [2, 3]. In one article [4] I showed an early example (1549) which I reproduce again in Figure 2, where the word ‘Contents’ was broken ‘CON’ (set large and red, between fleurons) and ‘tents’ (at body text size).

Figure 2: Unusual line-breaking in a heading (Book of Common Prayer, 1549, fragment, courtesy of The Society of Archbishop Justus); see [4] for the original.

Recently I came across another early example, this time from 1573. It was posted on Twitter as a very low-resolution image on a bright violet background, and I am indebted to Paul W. Nash, Editor of the Journal of the Printing Historical Society, for identifying it for me, and for providing much additional information (Figure 3). Richard Tottle (also Tottel and other spellings, as here) was a publisher in sixteenth century London, known for his Miscellany, the first collection of poetry in English.


Figure 3: Colophon from Sir William Staunford’s An exposition of the kinges prerogative, collected out of the great Abridgement of Justice Fitzherbert and other olde writers of the lawes of England (1573), printed by Richard Tottle. Facsimile available at https://books.google.co.uk/books?vid=OIQ8AAAAcAAJ.

In this colophon, however, it’s not a word broken over a line but a phrase: the name of the location (Hand and Star). It is subject to a change of font, again from antiqua to blackletter, which to modern eyes appears strange. But it was a style at the time to alternate lines of different fonts, and Dr Nash is of the opinion that this was the prevailing factor in an arrangement like a colophon, where it contributed to successive lines being shorter and shorter to obtain a triangular effect: the relation of the type to the meaning of the text was only considered very loosely if at all. The example is also curious for the extra ‘t’ in the printer’s name, and the accidental duplication of the syllable ‘men’ in the impressum at the bottom.

Figure 4: Typographically reconstructed colophon (draft, incomplete)

For a separate project I am using this as an example for typographic reconstruction using easily available modern fonts (Figure 4). In this case the antiqua is Ballard, from Proportional Lime, who specialise in modern cuttings of historical typefaces (available from MyFonts.com). It is modeled on type used by Henrie Ballard, who ran a press just down the street from Tottle, the other side of Temple Bar, a few decades later. The blackletter is Missaali, a textura based on a much earlier typeface from the German printer Bartholomew Ghotan in the 1480s, and available from CTAN.

Despite the need to retain typographic unity within the line, it is interesting that neither compositors nor printers nor publishers (often the same in those days) felt it necessary for a name or a word to remain in the same font across a line-break. I did at one stage think that perhaps there was a feeling that the publisher’s name should be in a specific font, and that there could have been a technical reason behind this — font bodies were not of exact or even sizes between foundries, so a font of a given size from one foundry might not be the same depth as the same font of the same nominal size from another, and would require additional spacing material. But Dr Nash has identified other mixed lines elsewhere in the document which indicate that the smaller sizes of black letter and italic were indeed cast on the same size of body.

While we’re on the subject of mixing fonts, I was sent the sign in Figure 5. At first glance I thought it might be a UK placename like Ottery St Mary or Forncett St Peter, but apparently it only refers to Osborne Street. Ultimately, if you simply don’t have access to the font any more, or it no longer exists in a usable form, your options for changing font in mid-line may be forced.

Figure 5: Garage sign in Colchester, UK

New device driver for old format

I was talking with Barbara Beeton a while ago about a project we are both involved in, and the topic of the durability of text came up. She was making the point that computer files have nowhere near the permanence of clay tablets, which, after all, only become more indestructible when subjected to fire [1].

Given that we can replicate a facsimile of a clay tablet using a 3D printer, and that numerous cuneiform fonts can be used with LATEX, using the polyglossia package, it should surely be possible to create a dvi2tablet output driver (or an equivalent for a PDF file) so that students worried about the persistence of their dissertation would merely have to translate it into one of the supported languages (Akkadian, Eblaite, Elamite, Hattic, Hittite, Hurrian, Luwian, Sumerian, Urartian, or Old Persian) and output it to a 3D printer, bake the tablets, and store them in a convenient cave.
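The typesetting half of that pipeline is already real. A minimal sketch follows; the font name is a placeholder for whichever cuneiform OpenType font is actually installed, so treat it as an assumption rather than a recipe:

    % Compile with xelatex or lualatex.
    \documentclass{article}
    \usepackage{fontspec}
    % 'Cuneiform Sample Font' is hypothetical; substitute an installed face.
    \newfontfamily\cuneiform{Cuneiform Sample Font}
    \begin{document}
    {\cuneiform 𒀀𒀭𒆠}  % sample codepoints from the U+12000 cuneiform block
    \end{document}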


Afterthought: What’s in a name

Anyone who has read documentation about TEX or LATEX will probably have come across the description of how to pronounce the TEX bit as ‘tecchh’, because Knuth based it on the Greek τέχνη, meaning ‘craft’ or ‘art’ (as in Knuth’s own Art of Computer Programming).

Olivia Fitzpatrick, formerly of UCC’s Boole Library, has shown me a copy of the commemorative catalog for the Greek bookbinder Demetrios Krommydas (Figure 6), which shows the word in a normal Greek context which serendipitously is a craft related to typesetting.

Figure 6: Catalog for Artistic Bookbinding (2003) [commemorative for Demetrios Krommydas of Chios (1942–2001)].

References

[1] B. Beeton. The future of technical text. In F. Hegland, ed., The Future of Text, pp. 58–59. Future Text Publishing, London, Nov 2020.
[2] P. Flynn. Typographers’ Inn — Titling and centering. TUGboat 33(1), May 2012. tug.org/TUGboat/tb33-1/tb103inn.pdf
[3] P. Flynn. Typographers’ Inn — Afterthought. TUGboat 37(3), Sep 2016. tug.org/TUGboat/tb37-3/tb117inn.pdf
[4] P. Flynn. Typographers’ Inn — Afterthought. TUGboat 38(1), May 2017. tug.org/TUGboat/tb38-1/tb118inn.pdf
[5] P. Flynn. Digital Typography. In K. Norman, J. Kirakowski, eds., Handbook of Human-Computer Interaction, pp. 89–108. Wiley, Hoboken, NJ, Jan 2018. https://doi.org/10.1002/9781118976005
[6] J. Lewis. Typography: basic principles: influences and trends since the 19th century. Studio Books, London, Jan 1963.
[7] J. Nielsen, A. Kaley. Avoid PDF for On-Screen Reading. NN/g Web Usability, Jun 2020. nngroup.com/articles/avoid-pdf-for-on-screen-reading/
[8] J. Tschichold. Designing Books: Planning a book; a typographer’s composition rules; fifty-eight examples by the author. Wittenborn, Schultz, New York, NY, Jul 1951.
[9] W3C, Boston, MA. CSS Paged Media Module Level 3: Working Draft, Jan 2018. w3.org/TR/css-page-3/

⋄ Peter Flynn
Textual Therapy Division, Silmaril Consultants
Cork, Ireland
Phone: +353 86 824 5333
peter (at) silmaril dot ie
blogs.silmaril.ie/peter


Eye charts in focus: The magic of optotypes

Lorrie Frear

Abstract

An investigation into the optotypes used on eye charts, with comparisons to standard typefaces.

(Originally published July 12, 2015 at ilovetypography.com/2015/07/12/what-are-optotypes-eye-charts-fonts.)

1 Introduction

My graphic design students love to design posters using the classic eye chart composition, and they frequently ask “What typeface should I use for this?” Not having a definitive answer has always been frustrating, so I decided to investigate to find out what typeface is used on eye charts.

I started my quest by asking my ophthalmologist, who enthusiastically provided a dizzying amount of technical information about the variety of eye charts and tests designed for different audiences and eye conditions. Suddenly, a simple question became a series of discoveries. Not only is there not one letterform design or font used for eye charts; the letterform designs are more appropriately called optotypes, of which there are several versions. There is a science to the design of optotypes and their use at specific distances. Since I am a graphic designer and not an eye or vision expert, I will forgo the technical explanations and focus on optotypes used on several significant charts to provide a better understanding of this complex and fascinating subject.

Eye charts are designed to test visual acuity, or clarity of vision. Each chart design has limitations and advantages, depending on the clinical setting, patient profile, and diagnostic objective. To understand the differences between the charts, it is helpful to know a little historical background of standardized visual acuity testing.

2 The first standardized tests

Heinrich Küchler, a German ophthalmologist, designed a chart in 1836 using figures cut from calendars, books, and newspapers glued in rows of decreasing sizes onto paper. These figures included cannons, guns, birds, farm equipment, camels, and frogs. This system was limited because the figures were not consistent in visual weight or style. Dr. Küchler continued to refine his chart, and in 1843, published a new version using 12 rows of Blackletter letters decreasing in size (fig. 1). This chart was not widely adopted and was published only once in 1843. [6]

Figure 1: Kuchler Chart. Heinrich Küchler is one of the first individuals credited with creating an eye chart to test visual acuity.

The next significant development in visual acuity chart design was the Snellen Eye Chart, which is recognizable to most Americans from visits to the DMV (fig. 2). The Snellen Eye Chart was designed by Dutch ophthalmologist Herman Snellen in 1862 as a means of improving the subjective nature of vision testing, which was usually accomplished by having patients read a passage of text held in their hands, or held at a distance by the doctor. This test had obvious limitations: the results were dependent upon the reading ability of the patient, the legibility of the typeface used, and the fact that the patient could guess the next word by reading ahead. According to Dr. August Colenbrander, a scientist at the Smith-Kettlewell Eye Research Institute and an expert on eye chart design, Snellen began experimenting with symbols such as squares and circles for his eye chart, but found that it was difficult for test subjects to describe the symbols accurately. [5]

So, he moved on to using letters. The characters on the first Snellen Charts were: A, C, E, G, L, N, P, R, T, 5, V, Z, B, D, 4, F, H, K, O, S, 3, U, Y, A, C, E, G and L. The letters used were Egyptian Paragons, or slab serifs of contrasting line thickness with ornamental cross strokes on terminals. Snellen then theorized that test subjects would be able to identify non-ornamented, monoline/equally weighted letters of consistent visual size more easily, and so he created optotypes.

At first glance, it may appear that the Snellen optotypes are Lubalin Graph or Rockwell. But upon detailed examination, it is evident that these characters are rather atypical (fig. 3). Unlike typical typefaces in which letter proportions are determined by ‘family’ groupings (such as n, r, m, h and u),


Figure 3: Snellen vs. Lubalin Graph.

Figure 4: Snellen letter E

Snellen optotypes are designed on a 5 x 5 grid (fig. 4). Furthermore, they comprise a very limited character set of just 9–10 letters. Optotypes are designed using a simple geometry in which the weight of the lines is equal to the negative space between lines. The height and width of an optotype is five times the thickness of the line weight. [13] These design considerations create inconsistently and oddly proportioned letters. For example, in a typical typeface, C and D would appear wider than Z, but in the optotype scheme, the opposite is true.

Dr. Snellen created optotypes using minutes of arc instead of a typographic measuring system. This made it possible for his charts to be reproduced easily. The first large order for Snellen Charts was from the British Army in 1863. From there, the Snellen Eye Chart became the standard for vision testing for almost a century. In addition, Snellen’s 5 x 5 grid optotype design is the foundation upon which all other eye chart systems are based. The Snellen Eye Chart is still the most recognized design, which can, to some extent, negate its effectiveness, if, for example, the test subject has memorized the chart. [5]

Figure 2: Snellen Chart. (Grayscaled for print; the bar between lines 8 and 9 is red.)

Most Snellen Charts contain eleven lines of block letters. The first line consists of a single large letter, most often an E. Subsequent rows have increasing numbers of letters that are progressively smaller in size. The test subject, from a distance of 20 ft, covers one eye, and, beginning at the top, reads aloud the letters in each row. The smallest row that can be read accurately indicates the visual acuity in that particular eye. [9]

Current Snellen Charts use nine letters, C, D, E, F, L, O, P, T, Z. Note that with the exception of E and O, the letters are all consonants. The diverse shapes of the optotypes allow test subjects to identify verticals, horizontals, and diagonals. These letter shapes are also highly effective in identifying astigmatism.

Although today’s Snellen Eye Charts may vary in the number of rows, size gradation, and serif or sans serif design [6], their commonalities include


Although today’s Snellen Eye Charts may vary in the number of rows, size gradation, and serif or sans serif design [6], their commonalities include the rectangular shape. This dictates the varying numbers of optotypes appearing on each line as space permits. [8] As a result of continual refinements, most of today’s Snellen Charts follow a logarithmic progression, have improved letter designs, and show a uniform 25% progression from line to line. [8]
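To make the 5 x 5 construction described above concrete, here is a rough TikZ sketch of our own (not a clinically calibrated optotype): a Snellen-style E whose stroke weight equals the gaps between strokes, with height and width five times the stroke:

\documentclass[tikz,border=2pt]{standalone}
\begin{document}
\begin{tikzpicture}[scale=0.5]
  \draw[help lines] (0,0) grid (5,5); % the 5 x 5 construction grid
  \fill (0,0) rectangle (1,5);        % vertical stem, one unit wide
  \fill (0,4) rectangle (5,5);        % top arm
  \fill (0,2) rectangle (5,3);        % middle arm
  \fill (0,0) rectangle (5,1);        % bottom arm
\end{tikzpicture}
\end{document}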

3 Refinements and variations

In 1868, Dr. John Green of the St. Louis College of Physicians and Surgeons in Missouri decided to make some changes to the Snellen Eye Chart. He designed a more structured grid featuring a consistent logarithmic geometric progression of 25% for successive lines, and with proportional spacing. He also changed the style of the optotypes from the blocky slab serifs to sans serif. His concept became known as the “Preferred Numbers Series”, but his system did not become widely recognized until the next century when sans serif typography gained popularity. Ironically, in response to criticism that his letters looked “unfinished”, Dr. Green abandoned them in 1872, and returned to the serif optotypes. [8]

In 1959, Dr. Louise Sloan of Johns Hopkins University created ten new optotypes using sans serif letters preferred by Dr. Green. These optotypes included the letters: C, D, H, K, N, O, R, S, V, and Z. Like Snellen letters, Sloan Letters are formed within a square, with the stroke width equal to one-fifth of the letter height and with equal visual weight. The Sloan Chart (fig. 5) has consistent spacing between letters and rows that are proportional to letter size. Spacing between letters is equal to letter width, and spacing between rows is equal to the height of the letters in the subsequent, smaller row. [4]

Figure 5: Sloan Chart.

Notice that, as in the Snellen Chart, all of the characters are consonants with the exception of O. Also note that the letter selection used on the Snellen Chart is not the same as that in the Sloan Chart. In both cases, the diverse shapes of the optotypes allow test subjects to identify verticals, horizontals and diagonals — an aid to identifying or differentiating individual letters. The ten Sloan Letters are considered to be the most effective letter selection for equal legibility. What’s more, they are particularly effective at identifying astigmatism.

The Sloan Letters may at first glance resemble Microgramma or Eurostile (www.myfonts.com/fonts/linotype/eurostile) fonts, but upon closer examination (fig. 6), it is evident again that the grid format imposed upon these optotypes produces some odd and inconsistently proportioned letters.

Figure 6: Sloan letters vs. Microgramma.

4 New charts and methods

In 1976, Ian Bailey and Jan E. Lovie-Kitchin of the National Vision Institute of Australia proposed a new chart layout (fig. 7), describing their concept as follows:

“We have designed a series of near vision charts in which the typeface, size progression, size range, number of words per row and spacings were chosen in an endeavor to achieve a standardization of the test task.” [11]


This layout replaces the Snellen rectangular chart format, with a variable number of letters per line, with a triangular one with five proportionally spaced letters on each line. The ten Sloan Optotypes appear on the Bailey–Lovie Chart using the same letter ratio of the letter-height equal to five stroke widths, excluding serifs.

Figure 7: Bailey–Lovie Chart.

The Bailey–Lovie Chart is an example of a LogMAR test, a term describing the geometric notation used to express visual acuity: “Logarithm of the Minimum Angle of Resolution”. Such tests were selected in 1984 as the standard for visual acuity testing by the International Council of Ophthalmology. [3]

In 1982, when the National Eye Institute needed standardized charts for its “Early Treatment of Diabetic Retinopathy Study” (ETDRS), Dr. Rick Ferris combined the Green and Bailey–Lovie Charts’ logarithmic progression and format with the Sloan Letters (fig. 8). ETDRS charts use equal spacing between letters and lines, making the acuity chart more balanced. [8] This chart format has been accepted by the National Eye Institute and the FDA, and is mandated for many clinical trials performed worldwide.

Figure 8: Snellen vs. Sloan letters; Sloan and ETDRS letters are the same.

The ETDRS test is more accurate than either the Snellen or Sloan versions because the rows contain the same number of letters, the rows and letters are equally spaced on a log scale, and individual rows are balanced for letter difficulty. There are also three different versions of the test available to deter memorization. [1]

One limitation of the original ETDRS chart is its use of the Latin alphabet, making it difficult to use throughout all of Europe. To address this limitation, the Tumbling E and Landolt C charts (discussed below) are used for populations who are unfamiliar with letters of the Latin alphabet. Recently, a modified ETDRS chart was developed using Latin, Greek, and Cyrillic letters. For this chart, the letters C, D, N, R, S, V and Z have been replaced by the letters E, P, X, B, T, M, and A. These letters are created using the same 5 x 5 grid and the Sloan Letter design. [7]

In more recent years there has been a move to create electronic charts, including the British-designed Test Chart 2000, which was the world’s first Windows-based computerized test chart. It overcomes several difficult issues such as screen contrast, and provides the opportunity to change the letter sequence, so that it cannot be memorized. [2]

These fonts, for Mac and Windows OSs, are available for research purposes. The fonts are based on Louise Sloan’s designs, which have been designated the US standard for acuity testing by the National Academy of Sciences, National Research Council, Committee on Vision. [12]

5 Charts for non-readers

For testing patients who cannot read or for those unfamiliar with the Latin alphabet, the Tumbling E Eye Chart and the Landolt C or Broken Ring Chart are used. [8]

The Tumbling E Chart was designed by Professor Hugh Taylor of the Centre for Eye Research Australia (CERA) in 1978 to test the vision of Australian Aborigine individuals in an attempt to identify those with the eye disorder trachoma.

Professor Taylor, using the Snellen proportions, designed a shape resembling an uppercase E, which he arranged in four directions (up, down, right, and left) in progressively smaller sizes (figs. 9 and 10). The patient then describes the direction in which the Tumbling E is facing.


Figure 11: Landolt C (Broken Ring) Chart, designed by Edmund Landolt.

Figure 9: Tumbling E Chart, designed by Hugh Taylor.

Figure 10: Snellen E and Tumbling E.

The Landolt C or Broken Ring Eye Chart (see figs. 11 and 12) is also used for illiterate individuals or those persons unfamiliar with the Latin alphabet. Created by Swiss ophthalmologist Edmund Landolt, this test is now considered the European standard. The Broken Ring (which has the same proportions as the C from the Snellen and Sloan Charts) is rotated in increments of 90°. The minimum perceivable angle of the C-gap is the measurement of visual acuity. [10]

Figure 12: Snellen C and Broken Ring Eye Chart C.

In addition to the Tumbling E and Landolt C tests, there are charts for children in which progressively smaller, simple drawings of objects are used.


The challenge in designing these charts is creating recognizable pictograms of equal visual weight, consistent style, and design.

6 In conclusion

This article is not an exhaustive research study into the subject of eye charts or their efficacy. There are many more examples of eye charts. My objective was to explore the archetypes of optotype design in the evolution of the eye chart as a diagnostic tool. Now I can tell my students that there is, technically, not a single typeface to recommend for their designs; and I can refer them to this article for more information! Examining optotypes has been an eye-opening experience.

References

[1] Bailey, I., Lovie-Kitchin, J. Visual acuity testing. From the laboratory to the clinic. Vision Research 20 Sep. 2013; 90:2–9. sciencedirect.com/science/article/pii/S0042698913001259

[2] The College of Optometrists. Test charts. college-optometrists.org/the-college/museum/online-exhibitions/virtual-ophthalmic-instrument-gallery/test-charts.html

[3] International Council of Ophthalmology, Visual Functions Committee. Visual Acuity Measurement Standard. icoph.org/dynamic/attachments/resources/icovisualacuity1984.pdf

[4] Kaiser, P. Prospective Evaluation of Visual Acuity Assessment: A Comparison of Snellen Versus ETDRS Charts in Clinical Practice. Trans. Am. Ophthalmol. Soc. Dec. 2009; 107:311–324. www.ncbi.nlm.nih.gov/pmc/articles/PMC2814576

[5] Kennedy, P. Who Made That Eye Chart? New York Times, 24 May 2013. nytimes.com/2013/05/26/magazine/who-made-that-eye-chart.html

[6] Miranker, E. Just My Optotype. nyamcenterforhistory.org/2017/08/17/just-my-optotype/

[7] Plainis, S., Moschandreas, J., et al. Validation of a modified ETDRS chart for European-wide use in populations that use the Cyrillic, Latin or Greek alphabet. Journal of Optometry Jan.–Mar. 2013; 6(1):18–24. journalofoptometry.org/en-validation-modified-etdrs-chart-for-articulo-S1888429612000817

[8] Precision Vision. Snellen Eye Chart - a Description and Explanation. precision-vision.com/snellen-eye-chart-a-description-and-explanation/. Also of interest on that site: Measuring Snellen Visual Acuity, precision-vision.com/measuring-snellen-visual-acuity/; Snellen Eye Test Charts Interpretation, precision-vision.com/snellen-eye-test-charts-interpretation/.

[9] Segre, L. What’s an eye test? Eye charts and visual acuity explained. allaboutvision.com/eye-test/

[10] Wikipedia. Landolt C. en.wikipedia.org/wiki/Landolt_C

[11] Wikipedia. LogMAR chart. en.wikipedia.org/wiki/LogMAR_chart

[12] Wikipedia. Sloan letters. en.wikipedia.org/wiki/Sloan_letters

[13] Wikipedia. Snellen chart. en.wikipedia.org/wiki/Snellen_chart

⋄ Lorrie Frear
Rochester Institute of Technology
lorrie dot frear (at) rit dot edu
lorriefrear.info


The Non-Latin scripts & typography

Kamal Mansour

1 Introduction

It is quite common in typographic terminology to divide the world’s scripts into Latin and non-Latin. At first glance that might seem to be a reasonable categorization until one looks a bit closer to find that non-Latin consists of a huge number of diverse scripts. Imagine if we were to call one color blue, while calling all others non-blue. We would be gathering the remaining colors of the spectrum into one group without differentiation. Calling a color non-blue doesn’t tell us anything about what it might be; is it orange, green, magenta, indigo?

However strange this terminology may be, there is good reason behind it. Gutenberg invented moveable type for Latin script; in particular, he focused on a certain Gothic style used at the time by German scribes. We know things didn’t stop there. The printed forms of Latin script moved away from the handwritten forms over time, evolving into a distinct craft and discipline. In the process, the letter forms underwent considerable simplification. By the time typographic letters began to develop for other scripts, Latin type had reached a maturity possible only through time. Over the centuries, the machinery and techniques had been refined for the needs of Latin type and any newcomers to the game needed to adapt to the current state of things.

In the realm of non-Latin scripts, what are the main groups and families? Moving eastward from the origins of Latin script, we find Greek script, and further east, its close relative, Cyrillic. Hovering over the Caucasus, Armenian and Georgian raise their heads. Reaching from North Africa eastward, Arabic script occupies a large swath of the landscape that reaches as far as India. Near the juncture of Western Asia and North Africa, Hebrew has its home. In East Africa, Ethiopic script — in ancient times known as Ge’ez — has a sizable presence. Over the large territory of China, as well as in Taiwan, Chinese is the dominant script. Hangul, a syllabic script with some Chinese visual features, dominates in Korea. Meanwhile, further south in Japan rules a unique hybrid writing system consisting of two syllabaries (hiragana and katakana), Chinese characters, in addition to Latin. The Indian subcontinent is home for a large family of scripts descended from ancient Brahmi writing. Over the centuries, the descendants split into two groups, northern and southern scripts. In the northern group, Devanagari, Bengali, and Gujarati scripts stand out. The southern group includes Tamil, Telugu, Kannada, Malayalam, and Sinhala. In Southeast Asia, we find more distant descendants of Brahmi in Thai, Lao, Khmer, and Myanmar scripts.

As close relatives of Latin, the Greek and Cyrillic scripts had more time to develop typographically than other non-Latins. In its earliest typographic forms, Greek initially imitated scribal forms before eventually settling on printed forms based on its classic origins. Classical and Biblical studies further established the need for Greek typefaces in the late nineteenth and early twentieth centuries.

Cyrillic, originally derived from Greek, was extensively revised and “europeanized” under the hand of Peter the Great. Then early in the twentieth century, Cyrillic was further simplified by purging it of unnecessary vestiges from Greek and redundant characters. In an effort to further unite the huge territory of the Soviet Union, the government extended Cyrillic to support the many non-Slavic languages of the various republics. Outside the Soviet Union, a certain number of Cyrillic typefaces were always available through the mainstream type foundries to satisfy the needs of academics in Slavic Studies and of emigrant communities.

Because of their common shapes and behavior, Greek and Cyrillic have always been easily accommodated on the same typesetting equipment as Latin script. Of course, the compositor had to be familiar with the script he was setting, with Polytonic Greek being the most demanding because of the variety of diacritics.

When it comes to typography, every script has its intricate details, and consequently, its own story of transition from handwriting to type. It would be interesting to tell every story, but that would be enough to fill many books. Without going into every detail, we can gain some understanding by surveying the types of problems that arose when adapting typesetting technology developed for Latin script to non-Latin scripts. Because scripts have developed as families, they share many attributes with other members of the family. We can take advantage of the similarities to identify recurring obstacles in the typographic development of non-Latin scripts. By citing judiciously chosen examples from a few scripts, we can readily present the gamut of significant difficulties.

Before delving into the range of technical challenges, we must first recognize the most common hurdle shared by all who endeavored to bring a non-Latin script into the typographic age: the fear of the exotic.


Typography developed in Europe and it was Europeans who first strove to take this craft to distant lands. Before the advent of sociology, anthropology, and linguistics, non-European cultures were seen as foreign, exotic, and difficult to fathom. Before understanding the written form of a language, one must first learn the spoken language and, to some extent, understand its culture.

To explore the gamut of typographic challenges, we will visit four scripts: Devanagari, Khmer, Arabic, and Chinese. In terms of visual impression and structure, each of these scripts differs greatly from the others.

2 Devanagari

We begin with Devanagari — the most commonly used script in contemporary India — that is used primarily for the Hindi and Marathi languages. Devanagari has a long history of development that began with the Sanskrit language centuries ago. The structure of Devanagari writing is based on the syllable which, in its simplest form, can be composed of a consonant with either an inherent or explicit vowel mark. As an example, let us consider the letter ka, which when written as क, includes the implied vowel a. If we want to write kī, we need to explicitly add the ī vowel ◌ी to ka क, resulting in की. Presented at a larger size, we can clearly show how the two components of the kī syllable overlap on the top:

क + ◌ी −→ की

Now, if we want to write ku instead of kī, the u vowel ◌ु is placed below ka as follows:

क + ◌ु −→ कु

Marks other than vowels can also appear above and below letters; for instance, if we want to indicate that the vowel in ka sounds nasal, we would add a dot above (natively, anusvara) as in: कं

Since Latin type was usually composed on a single horizontal line with gaps between letters, Devanagari presented quite a challenge because of its need for overlap between adjacent characters, in addition to upper and lower marks.

3 Khmer

Next, we consider Khmer, or Cambodian, writing. As a remote descendant of the Brahmi script, the structure of Khmer is also broadly based on the syllable, including the presence of an implied or explicit vowel; however, the phonetic nature of the Khmer language has also resulted in many additional features that go beyond the ancestral Brahmi model.

The orthographic norms of Khmer represent the range of simple to complex syllables by allowing its components to stack both horizontally and vertically. Like Devanagari, the simplest syllable consists of a consonant with an implied or explicit vowel, but because of the tonal nature of Khmer, marks indicating tone are sometimes also required. Khmer has the same needs as Devanagari for upper and lower marks, which result in more height and depth for syllable clusters. While Devanagari allows multi-consonant syllables, it also allows them to stack horizontally, if need be. On the other hand, Khmer requires the components of a multi-consonant syllable cluster to stack vertically.

Following is an example of a relatively simple syllable cluster consisting of a single consonant no followed by a vowel mark i above (u1793 + u17B7):

ន + ◌ិ −→ និ

As an example where additional depth is required, the following syllable cluster consists of a single consonantal letter followed by a subscript consonant that sits to its right while partially overlapping it below [pha + subscript sa]. The use of a subscript form indicates that the preceding consonant is not followed by a vowel (u1795 + u17D2 + u179F):

ផ + ◌្ + ស −→ ផ្ស

The character u17D2 indicates that the following consonantal symbol should appear in its subscript form.

There are also deeper cases where a vowel symbol ie surrounds and subtends a consonant ko followed by a subscript consonant lo (u1783 + u17D2 + u179B + u17C0). Note that the right part of ie is stretched to embrace the stacked consonants (ko + subscript lo + ie):

ឃ + ◌្ + ល + ◌ៀ −→ ឃ្លៀ

The above sequence could also carry an additional mark above it. Unlike Latin letters, Khmer characters have a predominantly vertical aspect ratio when arranged into syllable clusters. One can imagine the numerous challenges posed by Khmer script for those who wanted to implement it in metal type. How does one stack so many vertical elements along the baseline? If Latin and Khmer letters are put side by side, what kind of size ratio should be maintained between the two? Does one resort to using the largest leading needed throughout one document, or does one increase it only when required?
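As a preview of the digital-era answer discussed next, here is a minimal sketch of our own (not the author’s) of typing such a cluster directly in a Unicode TEX engine; the font file name is an assumption, borrowed from the production notes at the end of this article:

\documentclass{article}
\usepackage{fontspec}% requires XeLaTeX or LuaLaTeX
\newfontfamily\khmerfont[Script=Khmer]{NotoSerifKhmer-Regular.ttf}
\begin{document}
% The source holds only the two characters U+1793 and U+17B7;
% the font's shaping logic composes the stacked cluster.
{\khmerfont និ}
\end{document}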


In digital fonts equipped with OpenType technology — or some other shaping software — the built-in shaping logic allows the user to enter text at the character level only, while watching as syllable clusters compose magically into their correct form. By moving from metal to digital type, we also leave the obstacles of metal behind while gaining new flexibility and fluidity of form.

4 Arabic

The cursive form of Arabic script is the one trait that differentiates it from most others. Arabic writing never developed typographic “block” letters, but has remained cursive from its inception to this day.

Spread over many centuries and a broad geographic terrain, many different styles of Arabic script have developed. Under Ottoman rule, when it came time to adapt one of the Arabic styles to type, the Naskh style prevailed. In due time, many other styles were implemented, but Naskh has remained the reference style for running text. Because of its rich variety of contextually variant forms, Naskh style could only be typeset by a compositor well versed in the intricacies of the style. Showing a variety of styles available for manual setting, the following image is taken from an 1873 catalog of the Amiriyah Press in Cairo:

Each of the above styles required that the compositor thoroughly master its shaping rules.

Most Arabic text is set without vowel marks, so setting fully vocalized text (with optional marks above and below the letters) as in the above sample was (and still is) laborious, and therefore avoided because of the extra expense.

With the advent of mechanized typesetting in the early to mid-twentieth century, styles had to be simplified to fit the new size constraints of magazines and keyboards. This trend to simplify persisted, reaching its peak with the arrival of phototypesetters where the limited number of character slots was dictated by Latin fonts.

In the present day, one can say with good confidence that Arabic script stands to benefit greatly from digital fonts equipped with OpenType technology in several ways. First of all, the oversimplification of the past century can be discarded because OpenType technology is capable of emulating the various calligraphic styles of Arabic script. Since the global adoption of the Unicode Standard in the 1990s, Arabic text is stored as a sequence of alphabetic characters that are independent of style. The embedded shaping logic of each font reflects its style, while text input remains invariant. By setting a string of text in a particular font, one can expect it to take the shape mandated by the rules of that style. For instance, in the following sample, each line consists of the same text set in three distinct fonts. Note that the second and third lines show words that are set on an incline with respect to the baseline — a calligraphic feature ably rendered in OpenType.
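A minimal sketch of this style-through-font principle (ours, not from the article; the two open-source font names are assumptions chosen for illustration):

\documentclass{article}
\usepackage{fontspec}% XeLaTeX
\usepackage{bidi}% loaded last; provides \RL for right-to-left runs
\newfontfamily\naskhfont[Script=Arabic]{Amiri-Regular.ttf}
\newfontfamily\ruqaafont[Script=Arabic]{ArefRuqaa-Regular.ttf}
\begin{document}
% identical input characters, shaped by each font's own rules
\RL{\naskhfont السلام عليكم}\par
\RL{\ruqaafont السلام عليكم}
\end{document}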

5 Chinese

We now take a look at the fourth script, Chinese, which is nowadays shared mainly by the Chinese and Japanese languages, albeit with some stylistic differences. Unlike the three previously examined scripts, Chinese characters require no additional marks nor any contextual shaping. Once designed, a character can simply be typeset as is. The primary obstacle for Chinese script lies in the thousands of characters needed to set everyday text.



Figure 1: Typesetting in a standard Latin font vs. Zapfino.

Chinese writing has a long history of exacting calligraphic styles that require great effort to master. The following sample shows Song, one of the most common styles.

Despite the graphic complexity of its characters, Chinese script is typically classified among the “simple” scripts because the typesetting of Chinese text has no complexities in terms of its character shapes.

To write Japanese, Chinese characters (kanji) are intermixed with two phonetic syllabaries called hiragana and katakana to show grammatical affixes, as well as foreign words and names. In typical Japanese text, about half the characters consist of kana (hiragana or katakana). The following text is taken from the first line of the Universal Declaration of Human Rights. The more fluid and curvier characters are hiragana.

In the twentieth century, ingenious methods and typesetting equipment were devised to handle the large character sets needed for Chinese and Japanese in their home lands. However, the keyboards and magazines of the mechanical typesetters of Linotype and Monotype were designed for limited character sets. In addition, the cost of developing an adequate set of Chinese characters was high enough to stop any Western companies from entering the field until the early era of digital typesetters in the late 1980s, when Monotype entered the marketplace.

Once digital methods prevailed in the Chinese and Japanese markets, more local enterprises were able to enter the typographic field and thrive. Digital text entry now depends on sophisticated software that exploits the systematic structure of each Chinese character in terms of its radicals, strokes, or phonetics to simplify the job for the user.

6 Degrees of complexity

Now that we have taken a quick glimpse at several scripts and their traits, we need to examine what inferences we can draw for the world of scripts. Scripts vary greatly in their level of complexity. A typical Latin typeface is quite simple because of its relatively small character set and the near absence of shaping rules, while a calligraphic font such as Zapfino depends on a set of complex rules; see fig. 1.

This example clarifies how complexity can often be based in a particular style, and is not necessarily fundamental to the structure of the script itself. Some scripts are relatively complex because of their intrinsic shaping rules, while their input requirements are simple due to a small character set. The line composition of Chinese text is relatively simple because all characters fit within a square and require no additional marks. On the other hand, the labor needed to create all the characters is disproportionately high, and input methods are complex by necessity because of the large character set. An Arabic font whose style requires at most four shapes for each letter is simple in comparison to one whose style requires a rich set of contextual variants. Complexity in scripts is a matter of gradation. There is no need to simplistically put all scripts into two heaps, simple or complex.

In this brief overview, we have seen some salient features of each of the following scripts: Devanagari, Khmer, Arabic, and Chinese. Although there are no visual similarities among these scripts, they do share a common history: each of them came into the twentieth century with a typographic tradition of varying durations and strengths. While this is also true of other scripts, there are many others that came into the twenty-first century with either a typographic false start or none at all. There are many language communities who had their own traditional script but whose populations were too small to initiate and sustain a typographic practice. Other communities had established a traditional practice of metal typesetting during the nineteenth or twentieth centuries, but could not transition to newer technology such as phototypesetting as metal technology became obsolete.

Let us now roll back the calendar by about four decades to recount how we got to the current state of typography. As digital technology was maturing in the 1970s and 1980s, a silent software revolution was starting to overtake typography.


Donald Knuth’s METAFONT and TEX started making digital typography directly accessible to academics. This transition enabled a wave of academic publications that had earlier been throttled by ever-rising costs of professional typesetters. In the mid-1980s, the wave of WYSIWYG word-processing began at Xerox, but was later made available to the masses through the Apple Macintosh. The concurrent publishing of the PostScript language, and its use as a standard font format, wrested digital typography from the hands of proprietary typesetting equipment and brought it into the public arena. High-resolution printing — well, 300 dpi at the time — suddenly appeared in big and small offices alike, dragging along with it the word font into everyday language.

The first stage of the typographic revolution enabled the definition of character shapes through software instead of hardware. As affordable character-drawing software became available in the early 1990s, more typographic aficionados were drawn to the field as new practitioners. The new software made it possible to produce consistent character sets, properly formatted as fonts destined to create high-resolution imprints on paper. Because of the notable difference in resolution, what appeared on the computer screen was only an approximation of the final image produced by a laser printer on paper. In the end, the paper image was the more important one. After all, until that time, typography’s aim was to leave an imprint on the relatively permanent medium of paper.

By making letter forms in software instead of metal, the wall of separation between Latin and non-Latin characters collapsed. After all, outlines defined in the form of polynomials have no physical barriers like molds and grooves, and are even free to overlap and intertwine. The domain of digital character shapes proved itself to be adaptable to any script.

As impressive as the first stage of digital typography was, it was only a first step forward. Even though it allowed us to produce typographic images without resorting to metal, it still had significant deficiencies which were not immediately apparent to all. Once stored on a computer, one could use a text document to produce numerous impressions of it over time. In a sense, we can consider the document a digital equivalent of the typographic galley for metal type, but unlike the galley, the early digital documents were somewhat unreliable. In the early 1990s, typical text documents used inherently ambiguous character codes; for instance, the code 192 (decimal) represented À in a Latin-1 code set, while it meant ω̄ in a Greek set. With the right software and fonts in place, such a document could produce a faithful image of the intended content, but it was not a dependable, unambiguous archive of the original text.

Around that time, a group of technologists involved with multilingual computing were discussing how to achieve the until-then elusive goal of an unambiguous, global character set. They wanted a digital environment where code 192 would represent only one character, and every character of every language would be assigned a unique code. Gone would be the world of ambiguous character codes, and all scripts, Latin and non-Latin, would be on equal footing. This group of visionaries laid down the foundation for the Unicode Standard, which eventually came to be recognized as the one global character set by all the major makers of systems and software.

Once Unicode was accepted by Apple and Microsoft, the slow transition away from ambiguous codes began. Before and during this transition, the transfer of text documents between different systems was laborious. Any characters outside the 7-bit ASCII range could be garbled in transit unless they had been deliberately filtered. For instance, code 0xE8 in Mac Roman represented Ë, while in Windows Code Page 1252 it stood for è. As the use of personal computers continued to penetrate every aspect of life, it became clear that such a contradictory situation was untenable in the long run. Change was on the horizon, but it was slow in coming.

In the mid-1990s, as the presence of the Unicode Standard made itself felt across the globe, it met resistance in many different regions because it was perceived as foreign-born. To be accepted everywhere, advocates and practitioners of the Standard had to demonstrate its efficacy and suitability before many different national and linguistic communities. As this defense of the Standard was mounted, it soon became apparent that an aspect of Unicode had turned out to be an unintended obstacle for some non-Latin scripts.

“Characters, not glyphs” — one of Unicode’s primary design principles — was aimed at keeping plain Unicode-encoded text free of the entanglements of style. Characters represent abstract units such as letters of the alphabet, while glyphs depict the particular style and size of the character, as dictated by a specific font. Since Unicode represents only the characters of a string, the final shape and appearance of the display glyphs are relegated to the “text rendering” process.

In the mid-1990s, competing systems had different “rendering” processes that typically supported only a few scripts.


The rendering component of systems turned out to be a bottleneck for “complex” scripts such as Devanagari and Arabic; though their texts could be readily encoded in Unicode, it was difficult to achieve good typographic results. In that period, Apple introduced a sophisticated typographic system called “Apple GX” which was capable of rendering many scripts and styles, but it ran only on Apple hardware. Soon thereafter, Microsoft countered with a competing system called “TrueType Open”.

Every practitioner in the publishing community was faced with a difficult choice to make. It wasn’t until 1996 that Adobe and Microsoft first agreed on OpenType, a font specification that integrated both PostScript Type 1 and TrueType font formats, in addition to a unified system supporting advanced typographic features. Finally, there was hope of producing cross-platform fonts in the near term, even for non-Latin fonts. As could be expected, this potential was achieved gradually over the following years.

Today, we can rightly claim that a large number of scripts can be correctly rendered and that the level of typographic sophistication is continually rising. Certainly, scripts such as Devanagari, Arabic, and Khmer can be correctly rendered on multiple digital platforms without any worry about portability. The use of Unicode for numerous scripts, and the languages they support, has been universally accepted. Each of the characters À, ω̄, Ë, and è is represented by the same unique code in any document, on any platform.

At this auspicious juncture, one must ask if there is more to do. Have we reached a stage where all the scripts in the world are on an even footing? In the digital era, what does it take for a script to become “digitally enabled”? To prepare text documents in a particular script, all its characters must first be listed in the Unicode Standard, each of them assigned a unique code. Then, to make such documents humanly readable, there must be at least one functional font that renders the text in its commonly recognizable form. Once these two requirements are met, documents in a given script — and all the languages it supports — can be created, read, distributed, and archived.

Even though Version 13.0 of Unicode includes 154 scripts, more than 100 scripts are known to be under consideration for future inclusion. It might come as a surprise to some that so many scripts remain unencoded. Some scripts might no longer be in use and require considerable research to produce a definitive character set, while others might belong to a minority population that does not have the resources to prepare the documents needed to propose their inclusion in the Standard. Since it was established in 2002, the Script Encoding Initiative (SEI) has been preparing and presenting such proposals for scores of less privileged scripts (see linguistics.berkeley.edu/sei/scripts-encoded.html). Over the years, SEI has been instrumental in bringing at least 70 scripts into Unicode. Without such sponsorship, many scripts have little chance to enter the digital era.

About 10 years ago, Google launched the Noto Project (google.com/get/noto) to build OpenType fonts for the scripts in Unicode, and to make the fonts freely available to all under an open source license. The fonts were required not only to support the character set for each script, but also to make use of advanced typographic features that render the script in its expected native appearance. So far, at least 120 scripts are supported by Noto fonts. For scripts that were already widely covered, the Noto fonts simply broadened the variety of styles available, while for other scripts, they opened the door to the digital domain.

Not all scripts are on even footing yet — a great many have just begun their “digital journey”. As always, improvements will need to be made, and additional scripts will have to be encoded. All the same, I can say with confidence that this is a fruitful and auspicious moment for the whole spectrum of global scripts, from red to violet.

⋄ Kamal Mansour
Kamal.Mansour (at) {monotype,me} dot com

Production notes

Karl Berry

This article was typeset with XeLaTeX, using the polyglossia package and its commands \texthindi and \textkhmer for the Devanagari and Khmer examples. The input text was UTF-8.

We used the Noto Serif Devanagari and Noto Serif Khmer fonts (regular weight). It was not feasible or desirable to install them as system fonts, so we specified them to polyglossia as follows:

\setotherlanguages{hindi,khmer}
\newfontfamily\devanagarifont[Script=Devanagari]
  {NotoSerifDevanagari-Regular.ttf}
\newfontfamily\khmerfont[Script=Khmer]
  {NotoSerifKhmer-Regular.ttf}

With the addition of ,Renderer=HarfBuzz in the \newfontfamily calls, the results obtained with LuaLaTeX were identical.

⋄


Using DocStrip for multiple document variants

Matthew Leingang∗

∗ The author wishes to thank the editors and reviewers for their thoughtful and productive feedback.

Abstract

I describe a method of keeping multiple variants of the same document within a single file, using DocStrip.

1 Introduction

As a college professor, there are several times when I need to keep teaching materials in several different forms. For example:

• A single class day’s lesson might consist of lecture slides in the beamer class, a handout of the same slides printed 2–3 on a page for student notes, my own lecture notes as a manuscript, a worksheet for in-class activity, and solutions to that worksheet.

• A single week’s homework assignment might consist of problem statements, with hints and reading notes, a LATEX template for students to fill in with their own answers, and solutions with comments to be published after the assignment has been graded.

Over the years I have developed a workflow for maintaining these “bundles” of documents in the same file, using the LATEX DocStrip utility. This workflow allows me to programmatically vary the content and formatting, and avoids external scripts or filesystem hacks. In this article I will introduce the reader to DocStrip and explain how I use it.

2 The DocStrip utility

DocStrip [5] was originally designed as a literate programming method. LATEX package and class authors use it to write documentation for their code in the same file as the implementation, within commented lines. DocStrip would strip out the comment lines and use them to produce documentation. The slimmer package and class files would be installed to save compile time. In subsequent versions DocStrip developed the ability to write lines to several different files in one batch, through the use of options.

DocStrip batches are programmed in a TEX file as in Listing 1. The \input line loads the DocStrip code. The \generate line instructs TEX, effectively, to “read foo.dtx and write foo.sty, setting the package option”.

Listing 1: A minimal DocStrip batch file

\input docstrip.tex
\generate{\file{foo.sty}
    {\from{foo.dtx}{package}}}
\endbatchfile

When generating files, DocStrip ignores all lines beginning only with a single %. Non-commented lines are written to all destination files. Lines beginning with a guard will be written to the destination file depending on the options set. Each guard begins with a % and contains a boolean expression enclosed in angle brackets. For example, a line beginning with %<bar> will only be passed to generated files when the bar option is set. DocStrip can generate more than TEX files, too; for example, BibTEX files, data files, and shell scripts can be embedded in the master file and extracted.

Now, putting guards at the start of every line would be cumbersome to type. So guard modifiers are used to delimit blocks of code with the same guard. Any expression preceded by ‘*’ will apply the indicated guard to every line that follows, until the identical expression is encountered with the ‘/’ modifier. This gives an almost HTML-like layer to the DocStrip source file, where blocks of code between lines starting with %<*bar> and %</bar> will be written to any file generated with the bar option.

A DocStrip batch declaration such as in Listing 1 often resides in a separate file. If this code were in foo.ins, running TEX on foo.ins would extract foo.sty from foo.dtx and quit. But DocStrip files can also be configured to “self-extract” by putting the batch declaration at the beginning of the file. In this configuration, running TEX on foo.dtx will instruct TEX to parse foo.dtx a second time, this time writing foo.sty. Thus, the document content and extraction instructions can reside in a single file.
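To illustrate the guard syntax, here is a small fragment of our own (not from the article’s files); the option name bar is arbitrary:

%<*bar>
These lines are written only to files generated with the bar option.
%</bar>
%<!bar>This single line appears only when bar is NOT set.
An unguarded line like this one goes to every generated file.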
3 Example: A problem set with answer template and solutions

As a small but not quite minimal example, let’s consider a DocStrip file called hw.dtx. The entire file can be viewed online as part of the github repository for this article: github.com/leingang/tugboat-docstrip.

3.1 Batch header

Listing 2 shows the beginning of hw.dtx, which loads docstrip.tex and declares the \generate batch. It’s surrounded with driver guards. Since no generated file sets the driver option, this block is not written to any file.


Listing 2: The header block of hw.dtx, declaring \generate batch

1 %<*driver>
2 \input docstrip.tex
3 \askforoverwritefalse
4 \generate{
5 \file{\jobname.qns.tex}{\from{\jobname.dtx}{questions}}
6 \file{\jobname.ans.tex}{\from{\jobname.dtx}{questions,answers}}
7 \file{\jobname.sol.tex}{\from{\jobname.dtx}{questions,solutions}}
8 \file{\jobname.bib}{\from{\jobname.dtx}{bib}}
9 }
10 \endbatchfile
11 %</driver>

We generate four files:

• hw.qns.tex, which sets the option questions. This document will be the prepared questions sheet for the instructor to distribute to the students.

• hw.ans.tex, which sets the options questions and answers. This file will be distributed to students as LATEX source, so that they can fill in their answers without having to create their own file from scratch.

• hw.sol.tex, which sets the options questions and solutions. This can be published once the assignment is collected and graded.

• hw.bib, which only sets the bib option. This is a BibTEX file that can be included in any of the LATEX files. If the assignment needs to be copied with only a few changes, such as the year and the due date, only one file must be copied from the old directory to the new.

Lines 12–89 of hw.dtx are delimited with %<*questions> and %</questions> guards and enclose an entire LATEX document from \documentclass{article} through to \end{document}. Lines 90 and onward (not shown in this article) are delimited with %<*bib> and %</bib> and comprise the complete hw.bib file.

3.2 Using guards to conditionally include text

Listing 3 (following page) shows an excerpt of hw.dtx that declares a question. Notice that the hint environment is surrounded by a guard with a compound boolean expression, !answers&!solutions. The effect is that the hint is shown when the questions option is selected but answers and solutions are not selected; that is, only in the hw.qns.tex file. The resulting block that is written to the questions file is shown in Listing 4.

Comment lines that begin with a single % are stripped from the input and do not get printed to any output file. But comment lines beginning with two % characters remain. So the comment on line 71 of hw.dtx is retained in the hw.ans.tex file (Listing 5). It is a note to the student where to write their answer. Finally, the solution environment and subsequent commentary are written to the hw.sol.tex file (Listing 6).

3.3 Using guards to conditionally define environments

The implementations of the environments question, answer, hint, and solution have to be set up in the preambles of the generated LATEX files (or in packages used by them). But guards can be used in the preambles too. In this way, we can conditionally style the document.

For instance, I prefer that the question text be upright in the questions file and italicized in the answers/solutions file. This is accomplished in Listing 7. Line 34 is written to the answers and solutions files, and overrides line 33. The preamble of the questions file defines question under the definition theorem style, with bold header and upright body font. But in the answers and solutions file, the plain theorem style is in force, so question sets its body in italic.

Listing 7: An excerpt of hw.dtx (lines 35–39) showing conditional styling of the question environment

\usepackage{amsthm}
\usepackage{amssymb}
\theoremstyle{definition}
%<answers|solutions>\theoremstyle{plain}
\newtheorem{question}{Question}

3.4 Advanced tricks

If you look in the full hw.dtx file online, you’ll see a few more automatic variations with DocStrip:

• The document title is specified in hw.qns.tex. In hw.ans.tex, the text ‘Answers to ’ is prepended to the title (using the \preto command from the etoolbox package). In hw.sol.tex, the phrase ‘Solutions to ’ is prepended. In this way, the original title only needs to be put in one place.


Listing 3: An excerpt of hw.dtx declaring a question (the question is from [7])

61 \begin{question}
62 \cite[Exercise 6.6]{Scheinerman}.
63 Disprove: if $p$ is prime, then $2^p-1$ is also prime.
64 \end{question}
65 %<*!answers&!solutions>
66 \begin{hint}
67 All you need is one counterexample. Guess and check, and be persistent.
68 \end{hint}
69 %</!answers&!solutions>
70 %<*answers>
71 %% Student: put your answer between the next two lines.
72 \begin{answer}
73 \end{answer}
74 %</answers>
75 %<*solutions>
76 \begin{solution}
77 Let $p=11$. Then $p$ is prime. But $2^p-1 = 2^{11}-1 = 2047 = 23 \times 89$.
78 So the statement is false.
79 \end{solution}
80 A prime number that is equal to $2^n-1$ for some $n$ is called a \emph{Mersenne
81 Prime}. Examples of Mersenne primes are $3 = 2^2 - 1$ and $127 = 2^7-1$. It is
82 unknown whether the number of Mersenne primes is finite!
83 %</solutions>

Listing 4: The generated block in hw.qns.tex

61 \begin{question}
62 \cite[Exercise 6.6]{Scheinerman}.
63 Disprove: if $p$ is prime, then $2^p-1$ is also prime.
64 \end{question}
65 \begin{hint}
66 All you need is one counterexample. Guess and check, and be persistent.
67 \end{hint}

• The document author is set differently in the different document variants. In hw.qns.tex and hw.sol.tex, the author is the professor. In hw.ans.tex, the author is set to a generic student name, and \LaTeXWarning is used to remind the student to change the generic name to their own name.

4 Comparing alternatives

The problem of maintaining the sources for different, but closely related, documents in the same file, and specifying which documents are to be typeset at the time of compilation, has been encountered by users before, for instance on the Stack Exchange network [1, 4, 6]. Let’s consider some of the alternatives in the context of the example use case from Section 3.

4.1 Separate files

In this strawman workflow, separate, nearly identical files are kept side-by-side. Any correction (to a problem, for instance) requires three files to be updated. Further, copying a question to another assignment requires copying from three different files to three different files. This setup is ripe for inconsistency and headache.

4.2 Option setting in pure (LA)TEX

In this workflow, certain (LA)TEX booleans are defined and set, and code is conditionally executed depending on those booleans. The booleans can be set by adding TEX code on the command line as optional arguments to the executable (e.g., pdflatex). Or, a shell file can be created which sets options, then inputs a common master file.
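A minimal sketch of this alternative (ours, not the article’s hw.dtx), where a flag macro is injected on the command line, e.g. pdflatex "\def\solutionsflag{}\input{hw}":

% in the preamble of the common master file hw.tex
\ifdefined\solutionsflag
  \newcommand{\maybesolution}[1]{\par\textbf{Solution.} #1\par}
\else
  \newcommand{\maybesolution}[1]{}% solutions are simply dropped
\fi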


Listing 5: The generated block in hw.ans.tex

50 \begin{question}
51 \cite[Exercise 6.6]{Scheinerman}.
52 Disprove: if $p$ is prime, then $2^p-1$ is also prime.
53 \end{question}
54 %% Student: put your answer between the next two lines.
55 \begin{answer}
56 \end{answer}

Listing 6: The generated block in hw.sol.tex

57 \begin{question}
58 \cite[Exercise 6.6]{Scheinerman}.
59 Disprove: if $p$ is prime, then $2^p-1$ is also prime.
60 \end{question}
61 \begin{solution}
62 Let $p=11$. Then $p$ is prime. But $2^p-1 = 2^{11}-1 = 2047 = 23 \times 89$.
63 So the statement is false.
64 \end{solution}
65 A prime number that is equal to $2^n-1$ for some $n$ is called a \emph{Mersenne
66 Prime}. Examples of Mersenne primes are $3 = 2^2 - 1$ and $127 = 2^7-1$. It is
67 unknown whether the number of Mersenne primes is finite!

When options are set within the document, the \jobname is the same independent of the options set. In our homework example, this would mean the problems file and solutions file would both end up named hw.pdf. Not only does this mean that the problem set and solutions PDFs cannot inhabit the same directory at the same time, one can be mistaken for another. Imagine the poor professor distributing what he thought was the questions-only PDF, only to realize that he had instead shared all the solutions!

With shell files, the \jobname is different for each document variant, avoiding the possibility of such a mistake. The \jobname can also be set on the command line when invoking (LA)TEX. But either way, extra files are needed to support this workflow.

Also, in the context of a homework assignment, none of these methods allow the distribution of an answer template as a LATEX source file. If the solutions are in the master TEX file, and conditionally typeset depending on options, they still remain in the source.

4.3 Symlinks

In this workflow, a single TEX file is created, and each variant document is a symbolic link (or symlink) to the original file with a different file name. The operating system treats a symlink to a file as a different name for that file. Editing one of these “files” affects the contents referenced by the file and all of its symlinks. But the \jobname is determined by the file name, so it can be tested in order to conditionally execute certain code. This avoids the colliding output file issue of command-line arguments, and is more lightweight than shell files. A new variant just requires a new symlink. It has the same disadvantages of the master-plus-shell-files workflow, though. Code cannot be stripped out of the master file for a template, only gobbled and discarded. Not every operating system and file system has this capability, either.
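A sketch of such a \jobname test (ours, assuming a symlink like ln -s hw.tex hw.sol.tex). Since \jobname expands to category-12 characters, the comparison string is \detokenize’d first:

\edef\currentjob{\jobname}
\edef\solutionsjob{\detokenize{hw.sol}}
\ifx\currentjob\solutionsjob
  \def\showsolutions{true}% we were invoked through the solutions symlink
\fi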


4.4 Other preprocessors

DocStrip functions as a preprocessor — it converts one source file to another (or several others). There are other tools for this job, among them GPP and m4. Using another program requires installing, learning, and maintaining another program, whereas DocStrip is available wherever TEX is.

The DocStrip method described here is operating system independent. It is secure in that each document variant gets a distinct \jobname and output file name. It is lightweight in that only one DocStrip file needs to be created for every bundle of documents needed, and no programming other than TEX is necessary.

4.5 Disadvantages of the DocStrip method

One drawback of this method is that it requires an extra TEX run. First, the DocStrip file is compiled, extracting the various TEX files. Then each of them must be compiled. A bit of programming can automate this process (as well as decide when certain files don’t need to be recompiled), as we’ll describe in the next section.

Not every TEX editor can be used for DocStrip files. In particular, WYSIWYG or WYSIWYM editors such as LYX and TEXmacs expect the source file to be a regular (LA)TEX file. But any editor designed to operate on text files can be configured for DocStrip. Some of the most popular TEX editors (for example, TEXShop, TEXworks, TEXstudio, AUCTEX) support DocStrip out of the box, as does Visual Studio Code.

Another, more uncomfortable drawback is that TEX errors are only discovered when the generated TEX files are compiled. The line numbers reported by TEX at the time of the error are different from the line numbers in the DocStrip file. So diagnosing errors needs to be done without navigating to specific line numbers. Rather, the token list before the error can be used for a search string.

Finally, errors in the first run of TEX (when DocStrip is extracting) can arise from unbalanced guard modifiers, e.g., a %<*solutions> line with no closing %</solutions>. These are hard to isolate since DocStrip does not log its processing with much context, and the error isn’t discovered until the end of the file is reached. I have been able to find these through a combination of retracing my steps, and selectively commenting out blocks.

5 Automation

To recap, this workflow requires editing a DocStrip file marked up as in Section 3, compiling that DocStrip file to generate separate TEX files, then compiling the desired TEX file. The first run can be done in any TEX engine, because the batch declaration header (and docstrip.tex) is in core TEX. The second set of compilations requires whatever engine your destination documents require.

The latexmk program [2] works like make for TEX projects. It examines a TEX file to find dependencies, watches for warnings about re-running LATEX, and runs the necessary commands to get the entire document stable. latexmk is written in Perl, distributed with TEX Live and MiKTEX, and actively maintained.

A one-line command that does both of these steps is:

tex foo.dtx && latexmk

This processes foo.dtx and, upon success of that command, runs latexmk. Without file arguments, latexmk looks for any TEX files in the current working directory and makes them. Any command-line options to latexmk (notably, -pdf to make sure the pdftex engine is chosen, -pdfxe to ensure xelatex, or -pdflua to ensure lualatex) will be passed when making each TEX file.

The process can be further automated and integrated into various TEX editors. I have written a .latexmkrc (configuration file for latexmk) that looks for DocStrip files in the argument list, and when found, parses their log files for names of generated files to make automatically. With this configuration, latexmk foo.dtx will take care of all generated files in one fell swoop.

Any editor that can run a program can be configured to run latexmk. For instance, I have written a TEXShop “engine” script that wraps around latexmk so configured. I edit the DocStrip file and press Command-T once. I open one of the generated files in “preview” mode, which gives the PDF window but not the TEX window (we won’t be editing the generated file directly, so we don’t need it). The preview window updates each time the underlying file is changed.

I have also gotten this workflow to succeed in Visual Studio Code with the LATEX Workshop extension [3]. These script files and corresponding documentation are in the github repository.
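For readers who want a starting point, a minimal engine-style wrapper might look like the following sketch (our own, combining the one-line command above; the author’s actual scripts are in the github repository and may differ):

#!/bin/sh
# build.sh: extract a DocStrip bundle, then build everything here.
# Usage: ./build.sh hw.dtx
tex "$1" && latexmk -pdf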
6 Conclusion

I will continue to maintain and update the github repository referenced above. If you would like to try this method, and find that additional documentation would be useful, I will be happy to include it.

References

[1] Caramdir. Passing parameters to a document, 2010. tex.stackexchange.com/q/1492

[2] J. Collins, E. McLean, D. J. Musliner. latexmk — fully automated LATEX document generation, 2019. ctan.org/pkg/latexmk

[3] J. Lelong, T. Tamura, et al. Visual Studio Code LATEX Workshop Extension, 2020. github.com/James-Yu/LaTeX-Workshop

[4] meduz. What Makefile to produce slides and handouts [in] a common file?, 2014. tex.stackexchange.com/q/170542

[5] F. Mittelbach, D. Duchier, et al. The DocStrip program, 2006. ctan.org/pkg/docstrip

[6] reprogrammer. Passing command-line arguments to LATEX document, 2009. stackoverflow.com/q/1465665

[7] E. R. Scheinerman. Mathematics: A Discrete Introduction. Thomson Brooks/Cole, Belmont, MA, 2nd edition, 2005.

⋄ Matthew Leingang
New York University
leingang (at) nyu dot edu
www.cims.nyu.edu/~leingang/


LATEX News Issue 32, October 2020

Contents

Introduction 1
Providing xparse in the format 1
A hook management system for LATEX 2
Other changes to the LATEX kernel 2
  \symbol in math mode for large Unicode values 2
  Correct Unicode value of \=y (ȳ) 2
  Add support for Unicode soft hyphens 2
  Fix capital accents in Unicode engines 2
  Support calc in various kernel commands 2
  Support ε-TEX length expressions in picture coordinates 2
  Spaces in filenames of included files 3
  Avoid extra line in \centering, \raggedleft or \raggedright 3
  Set a non-zero \baselineskip in text scripts 3
  Spacing issues when using \linethickness 3
  Better support for the legacy series default interface 3
  Support for uncommon font series defaults 3
  Checking the current font series context 3
  Avoid spurious package option warning 3
  Adjusting fleqn 3
  Provide \clap 3
  Fix to legacy math alphabet interface 4
  Added tests for format, package and class dates 4
  Avoid problematic spaces after \verb 4
  Provide a way to copy robust commands... 4
  ...and a way to \show them 4
  Merge l3docstrip into docstrip 4
  Support vertical typesetting with doc 4
  Record the counter name stepped by \refstepcounter 5
  Native LuaTEX behavior for \- 5
  Allow \par commands inside \typeout 5
  Spacing commands moved from amsmath to the kernel 5
  Access raw glyphs in LuaTEX without reloading fonts 5
  Added a fourth empty argument to \contentsline 5
  LuaTEX callback new_graf made exclusive 5
Changes to packages in the graphics category 5
  Generate a warning if existing color definition is changed 5
  Specifying viewport in the graphics package 5
  Normalizing \endlinechar 5
  Files with multiple parts 5
Changes to packages in the tools category 5
  array: Support stretchable glue in w-columns 5
  array: Use math mode for w and W-cells in array 6
  array: Fix for \firsthline and \lasthline 6
  varioref: Support Japanese as a language option 6
  xr: Support for spaces in filenames 6
Changes to packages in the amsmath category 6
  Placement corrections for two accent commands 6
  Fixes to aligned and gathered 6
  Detect Unicode engines when setting \std@minus and \std@equal 6
  lualatex-math: Use LuaTEX primitives 6
Changes to the babel package 6

Introduction

The 2020-10-01 release of LATEX shows that work on improving LATEX has again intensified. The two most important new features are the kernel support for xparse and the introduction of the new hook management system for LATEX, but as you can see there are many smaller enhancements and bug fixes added to the kernel and various packages.

Providing xparse in the format

The official interface in the LATEX 2ε kernel for creating document-level commands has always been \newcommand. This was a big step forward from LATEX 2.09. However, it was still very limited in the types of command it can create: those taking at most one optional argument in square brackets, then zero or more mandatory arguments. Richer syntax required use of the TEX \def primitive along with appropriate low-level macro programming.

The LATEX team started work on a comprehensive document-command parser, xparse, in the late 1990s. In the past decade, the experimental ideas it provides have been carefully worked through and moved to a stable footing. As such, xparse is now used to define a very large number of document and package commands. It does this by providing a rich and self-consistent syntax to describe a wide range of interfaces seen in LATEX packages.
The ideas developed in xparse are now sufficiently well tested that the majority can be transferred into the LATEX kernel. Thus the following commands have been added:

• \NewDocumentCommand, \RenewDocumentCommand, \ProvideDocumentCommand, \DeclareDocumentCommand
• \NewExpandableDocumentCommand, \RenewExpandableDocumentCommand, \ProvideExpandableDocumentCommand, \DeclareExpandableDocumentCommand
• \NewDocumentEnvironment, \RenewDocumentEnvironment, \ProvideDocumentEnvironment, \DeclareDocumentEnvironment
• \BooleanTrue, \BooleanFalse
• \IfBooleanTF, \IfBooleanT, \IfBooleanF
• \IfNoValueTF, \IfNoValueT, \IfNoValueF
• \IfValueTF, \IfValueT, \IfValueF
• \SplitArgument, \SplitList, \TrimSpaces, \ProcessList, \ReverseBoolean
• \GetDocumentCommandArgSpec, \GetDocumentEnvironmentArgSpec

Most, but not all, of the argument types defined by xparse are now supported at the kernel level. In particular, the types g/G, l and u are not provided by the kernel code; these are deprecated but still available by explicitly loading xparse. All other argument types are now available directly within the LATEX 2ε kernel.
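As a small illustration (our own sketch, not from the newsletter; the command name \mynote and the use of xcolor are assumptions): the new interface can declare a star form and a defaulted optional argument in one declaration, something plain \newcommand cannot express.

  % assumes \usepackage{xcolor} for \textcolor
  \NewDocumentCommand{\mynote}{s O{black} m}
    {% #1 = star (Boolean), #2 = optional color, #3 = note text
      \IfBooleanTF{#1}
        {\textcolor{#2}{\bfseries #3}}  % starred form: bold note
        {\textcolor{#2}{#3}}            % unstarred form
    }
  % usage: \mynote{note}, \mynote[red]{note}, \mynote*[blue]{loud note}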


A hook management system for LATEX

With the fall 2020 release of LATEX we provide a general hook management system for the kernel and for packages. This will allow packages to safely add code to various kernel and package hooks and if necessary define rules to reorder the code in the hooks to resolve typical package loading order issues. This hook system is written in the L3 programming layer and thus forms the first larger application within the kernel that makes use of the LATEX3 functionality now available (if we discount xparse which has already been available for a long time as a separate package).

The file lthooks.dtx holds the core management code for hooks and defines basic hooks for environments (as previously offered by etoolbox), ltshipout.dtx provides kernel hooks into the shipout process (making packages like atbegshi, etc., unnecessary) and the file ltfilehook.dtx holds redefinitions for commands like \input or \usepackage so that they offer hooks in a similar fashion to what is provided by the filehook package.

At the moment the integration is lightweight, overwriting definitions made earlier during format generation (though this will change after more thorough testing). For that reason the documentation isn't in its final form either and you have to read through three different documents:

lthooks-doc.pdf    Core management interface and basic hooks for environments provided by the kernel.
ltshipout-doc.pdf  Hooks accessible while a page is being shipped out.
ltfilehook-doc.pdf Hooks used when reading a file.

For those who wish to also study the code, replace -doc with -code, e.g., lthooks-code.pdf. All documents should be accessible via texdoc, e.g.,

  texdoc lthooks-doc

should open the core documentation for you.
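For illustration only (our own sketch; the full interface is described in lthooks-doc.pdf, and the package labels here are made up): code can be added to a hook under a label, and ordering rules can be declared between labels.

  % add labeled code to the hook executed at \begin{document}
  \AddToHook{begindocument}[mypkg]{\typeout{mypkg: initialized}}
  % request that mypkg's code runs before that of a package "otherpkg"
  \DeclareHookRule{begindocument}{mypkg}{before}{otherpkg}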
Other changes to the LATEX kernel

\symbol in math mode for large Unicode values

The LATEX 2ε kernel defines the command \symbol, which allows characters to be typeset by entering their 'slot number'. With the LuaTEX and XeTEX engines, these slot numbers can extend to very large values to accommodate Unicode characters in the upper Unicode planes (e.g., bold mathematical capital A is slot number "1D400 in hex or 119808 in decimal). The XeTEX engine did not allow \symbol in math mode for values above 2^16; this limitation has now been lifted. (github issue 124)

Correct Unicode value of \=y (ȳ)

The Unicode slot for ȳ was incorrectly pointing to the slot for Ȳ. This has been corrected. (github issue 326)

Add support for Unicode soft hyphens

For a long time, the UTF-8 option for inputenc made the Unicode soft hyphen character (U+00AD) an alias for the LATEX soft hyphen command \-. The Unicode engines XeTEX and LuaTEX behaved differently though: They either ignored U+00AD or interpreted it as an unconditional hyphen. This inconsistency is fixed now and LATEX always treats U+00AD as \-. (github issue 323)

Fix capital accents in Unicode engines

In Unicode engines the capital accents such as \capitalcedilla, etc., have been implemented as trivial shorthands for the normal accents (because other than Computer Modern virtually no fonts support them), but that failed when hyperref got loaded. This has been corrected. (github issue 332)

Support calc in various kernel commands

The \hspace, \vspace, \addvspace, \\ and other commands simply passed their argument to a TEX primitive to produce the necessary space. As a result it was impossible to specify anything other than a simple dimension value in such arguments. This has been changed, so that now calc syntax is also supported with these commands. (github issue 152)
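A brief sketch of what this enables (our own example, assuming the calc package is loaded to supply the expression syntax):

  \usepackage{calc}   % in the preamble
  % expressions are now accepted directly by the kernel spacing commands:
  \vspace{2\baselineskip + 3pt}
  \hspace{(\textwidth - 3cm) / 2}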


Spaces in filenames of included files Checking the current font series context File names containing spaces lead to unexpected Sometimes it is necessary to define commands that results when used in the commands \include and act differently when used in bold context (e.g., inside \includeonly. This has now been fixed and the \textbf). Now that it is possible in LATEX to specify dif- argument to \include can contain a file name containing ferent “bf” defaults based for each of the three meta fam- spaces. Leading or trailing spaces will be stripped off ilies (rm, sf and tt) via \DeclareFontSeriesDefault, but spaces within the file name are kept. The argument it is no longer easy to answer the question “am I type- to \includeonly, which is a -separated list setting in a bold context?”. To help with this problem a of files to process, can also contain spaces with any new command was provided: leading and trailing spaces stripped from the individual filenames while spaces in the file names will remain \IfFontSeriesContextTF{ context } h i intact. (github issues 217 and 218) { true code }{ false code } h i h i Avoid extra line in \centering, \raggedleft or The context can be either bf (bold) or md (medium) h i \raggedright and depending on whether or not the current font is If we aren’t justifying paragraphs then a very long word recognized as being selected through \bfseries or (longer than a line) could result in an unnecessary extra \mdseries the true code or false code is executed. h i h i line in order to prevent a hyphen in the second-last As an example line of the paragraph. This is now avoided by setting \finalhyphendemerits to zero in unjustified settings. \usepackage{bm} % (bold math) (github issue 274) \newcommand\vbeta{\IfFontSeriesContextTF{bf}% {\ensuremath{\bm{\beta}}}% Set a non-zero \baselineskip in text scripts {\ensuremath{\beta}}} As \textsuperscript and \textsubscript usually contain only a few characters on a single line the This way you can write \vbeta-isotopes and if \baselineskip was set to zero. However, hyperref uses used in a heading it comes out in a bolder version. that value to determine the height of a link box which (github issue 336) consequently came out far too small. This has been adjusted. (github issue 249) Avoid spurious package option warning When a package is loaded with a number of options, Spacing issues when using \linethickness say X, Y and Z, and then later another loading attempt In some circumstances the use of \linethickness was made with a subset of the options or no options, it introduced a spurious space that shifted objects in was possible to get an error message that option X is not a picture environment to the right. This has been known to the package. This obviously incorrect error corrected. (github issue 274) was due to a timing issue where the list of available Better support for the legacy series default interface options got lost prematurely. This has now been fixed. In the initial implementation of LATEX’s font selection (github issue 22) scheme (NFSS) changes to any default were carried out by redefining some commands, e.g., \seriesdefault. In Adjusting fleqn 2019 we introduced various extensions and with it new In amsmath the \mathindent parameter used with the methods of customizing certain parts of NFSS, e.g., the fleqn design is a rubber length parameter allowing for recommended way for changing the series default(s) is setting it to a value such as 1em minus 1em, i.e., so that now through [1]. 
Support for uncommon font series defaults

If a document class was set up with fairly unusual font series defaults, e.g.,

  \renewcommand\ttdefault{lmvtt}
  \DeclareFontSeriesDefault[tt]{md}{lm}
  \DeclareFontSeriesDefault[tt]{bf}{bm}

then a switch between the main document families, e.g., \ttfamily...\rmfamily, did not always correctly continue typesetting in medium or bold series if that involved adjusting the values used by \mdseries or \bfseries. This has now been corrected. (github issue 291)

Checking the current font series context

Sometimes it is necessary to define commands that act differently when used in bold context (e.g., inside \textbf). Now that it is possible in LATEX to specify different "bf" defaults for each of the three meta families (rm, sf and tt) via \DeclareFontSeriesDefault, it is no longer easy to answer the question "am I typesetting in a bold context?". To help with this problem a new command was provided:

  \IfFontSeriesContextTF{⟨context⟩}{⟨true code⟩}{⟨false code⟩}

The ⟨context⟩ can be either bf (bold) or md (medium) and depending on whether or not the current font is recognized as being selected through \bfseries or \mdseries the ⟨true code⟩ or ⟨false code⟩ is executed. As an example

  \usepackage{bm} % (bold math)
  \newcommand\vbeta{\IfFontSeriesContextTF{bf}%
     {\ensuremath{\bm{\beta}}}%
     {\ensuremath{\beta}}}

This way you can write \vbeta-isotopes and if used in a heading it comes out in a bolder version. (github issue 336)

Avoid spurious package option warning

When a package is loaded with a number of options, say X, Y and Z, and then later another loading attempt was made with a subset of the options or no options, it was possible to get an error message that option X is not known to the package. This obviously incorrect error was due to a timing issue where the list of available options got lost prematurely. This has now been fixed. (github issue 22)

Adjusting fleqn

In amsmath the \mathindent parameter used with the fleqn design is a rubber length parameter allowing for setting it to a value such as 1em minus 1em, i.e., so that the normal indentation can be reduced in case of very wide math displays. This is now also supported by the LATEX standard classes. (github issues 306, 315)

In addition a compressible space between formula and equation number in the equation environment got added when the fleqn option is used so that a very wide formula doesn't bump into the equation number. (github issue 252)

Provide \clap

LATEX has inherited \llap and \rlap from plain TEX (zero-sized boxes whose content sticks out to the left or right, respectively) but there isn't a corresponding \clap command that centers the material. This missing command was added by several packages, e.g., mathtools, and has now been added to the kernel.
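A tiny usage sketch of our own: \clap centers material at the current point in a zero-width box, so it can place a marker without disturbing the surrounding spacing.

  % mark the midpoint of a dotted line without adding any width
  0\dotfill\clap{$\downarrow$}\dotfill 1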


Fix to legacy math alphabet interface

When using the LATEX 2.09 legacy math alphabet interface, e.g., $\sf -1$ instead of $\mathsf{-1}$, an extra math Ord atom was added to the formula in case the math alphabet was used for the first time. In some cases this math atom would change the spacing, e.g., change the unary minus sign into a binary minus in the above example. This has finally been fixed. (gnats issue latex/3357)

Added tests for format, package and class dates

To implement compatibility code or to ensure that certain features are available it is helpful and often necessary to check the date of the format or that of a package or class and execute different code based on the result. For that, LATEX previously had only internal commands (\@ifpackagelater and \@ifclasslater) for testing package or class names, but nothing reasonable for testing the format date. For the latter one had to resort to some obscure command \@ifl@t@r that, given its cryptic name, was clearly never intended for use even in package or class code. Furthermore, even the existing interface commands were defective as they are testing for "equal or later" and not for "later" as their names indicate.

We have therefore introduced three new CamelCase commands as the official interface for such tests:

  \IfFormatAtLeastTF{⟨date⟩}{⟨true code⟩}{⟨false code⟩}

and for package and class tests

  \IfClassAtLeastTF{⟨class name⟩}{⟨date⟩}{⟨true code⟩}{⟨false code⟩}
  \IfPackageAtLeastTF{⟨package name⟩}{⟨date⟩}{⟨true code⟩}{⟨false code⟩}

For compatibility reasons the legacy commands remain available, but we suggest to replace them over time and use the new interfaces in new code. (github issue 186)
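For instance (our own sketch), a package might guard code that relies on the fall 2020 kernel as follows:

  \IfFormatAtLeastTF{2020-10-01}
    {\typeout{New kernel: using the hook management system}}
    {\typeout{Older kernel: falling back to legacy code}}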
Avoid problematic spaces after \verb

If a user typed \verb␣!~!␣foo instead of \verb!~!␣foo by mistake, then surprisingly the result was "!~!foo" without any warning or error. What happened was that the ␣ became the argument delimiter due to the rather complex processing done by \verb to render verbatim. This has been fixed and spaces directly following the command \verb or \verb* are now ignored as elsewhere. (github issue 327)

Provide a way to copy robust commands...

With the previous LATEX 2ε release, several user-level commands were made robust, so the need for a way to create copies of these commands (often to redefine them) increased, and the LATEX 2ε kernel didn't have a way to do so. Previously this functionality was provided in part by Heiko Oberdiek's letltxmacro package, which allows a robust command \foo to be copied to \bar with \LetLtxMacro\bar\foo.

From this release onwards, the LATEX 2ε kernel provides \NewCommandCopy (and \Renew... and \Declare... variants) which functions almost like \LetLtxMacro. To the end user, both should work the same way, and one shouldn't need to worry about the definition of the command: \NewCommandCopy should do the hard work.

\NewCommandCopy knows about the different types of definitions from the LATEX 2ε kernel, and also from other packages, such as xparse's command declarations like \NewDocumentCommand, and etoolbox's \newrobustcmd, and it can be extended to cover further packages. (github issue 239)

...and a way to \show them

It is sometimes necessary to look up the definition of a command, and often one not only doesn't know where that command is defined, but doesn't know if it gets redefined by some package, so often enough looking at the source doesn't help. The typical way around this problem is to use TEX's \show primitive to look at the definition of a command, which works fine until the command being \shown is robust. With \show\frac one sees

  > \frac=macro:
  ->\protect \frac␣.

which is not very helpful. To show the actual command the user needed to notice that the real definition of \frac is in the \frac␣ macro and do \expandafter\show\csname frac\space\endcsname. But with the machinery for copying robust commands in place it is already possible to examine a command and detect (as far as a macro expansion language allows) how it was defined. \ShowCommand knows that and with \ShowCommand\frac the terminal will show

  > \frac=robust macro:
  ->\protect \frac␣.
  > \frac␣=\long macro:
  #1#2->{\begingroup #1\endgroup \over #2}.

(github issue 373)

Merge l3docstrip into docstrip

The file l3docstrip.tex offered a small extension over the original docstrip.tex file supporting the %<@@=⟨module⟩> syntax. This has been merged into docstrip so that it can now be used for both traditional .dtx files and those containing code written in the L3 programming layer language. (github issue 337)

Support vertical typesetting with doc

The macrocode environment uses a trivlist internally and as part of this sets up the \@labels box to contain some horizontal skips, but that box is never used. As a result this generates an issue in some circumstances if the typesetting direction is vertical. This has now been corrected to support such use cases as well. (github issue 344)
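Returning to the command-copying machinery described above, a minimal sketch of the classic save-and-extend pattern, now possible without letltxmacro (\origemph is our own name):

  \NewCommandCopy\origemph\emph            % save the robust \emph
  \renewcommand{\emph}[1]{\origemph{#1}*}  % redefine \emph via the copy
  % \ShowCommand\emph now displays the new top-level definition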


Record the counter name stepped by \refstepcounter

\refstepcounter now stores the name of the counter in \@currentcounter. This allows packages like zref and hyperref to store the name without having to patch \refstepcounter. (github issue 300)

Native LuaTEX behavior for \-

LATEX changes \- to add a discretionary hyphen even if \hyphenchar is set to −1. This change is not necessary under LuaTEX because there \- is not affected by \hyphenchar in the first place. Therefore this behavior has been changed to ensure that LuaTEX's (language specific) hyphenation characters are respected by \-.

Allow \par commands inside \typeout

\typeout used to choke when seeing an empty line or a \par command in its argument. However, sometimes it is used to display arbitrary user input or code (wrapped, for example, in \unexpanded) which may contain explicit \par commands. This is now allowed. (github issue 335)

Spacing commands moved from amsmath to the kernel

Originally LATEX only provided a small set of spacing commands for use in text and math; some of the commands like \; were only supported in math mode. amsmath normalized and provided all of them in text and math. This code has now been moved to the kernel so that it is generally available. (github issue 303)

  command name(s)         math   text
  \,   \thinspace          xx     xx
  \!   \negthinspace       xx     xx
  \: \> \medspace          xx     xx
       \negmedspace        xx     xx
  \;   \thickspace         xx     xx
       \negthickspace      xx     xx

(each sample cell shows two letters "x" separated by the space in question)
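A quick sketch of our own showing the effect: with a current kernel and no amsmath loaded, the same spacing commands work in both modes.

  A\;B                 % \; (thick space) now also works in text
  $a \negmedspace b$   % and the negative spaces work in math (and text)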
Access raw glyphs in LuaTEX without reloading fonts

LATEX's definitions for \textquotesingle, \textasciigrave, and \textquotedbl for the TU encoding in LuaTEX need special handling to stop the shaper from replacing these characters with curly quotes. This used to be done by reloading the current font without the tlig feature, but that came with multiple disadvantages: It behaves differently than the corresponding XeTEX code and it is not very efficient. This code has now been replaced with an implementation which injects a protected node which is not affected by font shaping. (github issue 165)

Added a fourth empty argument to \contentsline

LATEX's \addcontentsline writes a \contentsline command with three arguments to the .toc and similar files. hyperref redefines \addcontentsline to write a fourth argument. The change unifies the number of arguments by writing an additional empty brace group. (github issue 370)

LuaTEX callback new_graf made exclusive

Corrected an incorrect callback type which caused return values from the new_graf callback to be ignored and paragraph indentation to be suppressed. In the new version, only one new_graf callback handler can be active at a time, which allows this handler to take full control of paragraph indentation. (github issue 188)

Changes to packages in the graphics category

Generate a warning if existing color definition is changed

If a color is defined twice using \DefineNamedColor, no info text "Redefining color ... in named color model ..." was written to the log file, because of a typo in the check. This has been corrected. (gnats issue graphics/3635)

Specifying viewport in the graphics package

Specifying a BoundingBox does not really have meaning when including non-EPS graphics in pdfTEX and LuaTEX. For some years the graphicx package bb key has been interpreted (with a warning) as a viewport key. This feature has been added to the two-argument form of \includegraphics, which is mostly used in the graphics package. \includegraphics[1,2][3,4]{file} will now be interpreted in pdfTEX and LuaTEX in the same way as graphicx's

  \includegraphics[viewport=1 2 3 4]{file}.

Normalizing \endlinechar

If \endlinechar is set to −1 so that ends of lines are ignored in special contexts, then a low level TEX error would be generated by code parsing BoundingBox comments. The package now locally sets \endlinechar to its standard value while reading files. (github issue 286)

Files with multiple parts

Sometimes one has a graphics file, say, file.svg, and converts it to another format to include it in LATEX and ends up with a file named file.svg.png. In previous releases, if the user did \includegraphics{file.svg}, an error would be raised and the graphics inclusion would fail due to the unknown .svg extension. The graphics package now checks whether the given extension is known; if it isn't, it tries appending the known extensions until it finds a graphics file with a valid extension, otherwise it falls back to the file as requested. (github issue 355)

Changes to packages in the tools category

array: Support stretchable glue in w-columns

If stretchable glue, e.g., \dotfill, is used in tabular columns made with the array package, it stretches as it would in normal paragraph text. The one exception was w-columns (but not W-columns) where it got forced to its nominal width (which in case of \hfill or \dotfill is 0pt). This has been corrected and now w-columns behave like all other column types in this respect. (github issue 270)

array: Use math mode for w and W-cells in array

The w and W-columns are LR-columns very similar to l, c and r. It is therefore natural to expect their cell content to be typeset in math mode instead of text mode if they are used in an array environment. This has now been adjusted. Note that this is a breaking change in version v2.5! If you have used w or W-columns in older documents either add >{$}...<{$} for such columns or remove the $ signs in the cells. Alternatively, you can roll back to the old version by loading array with

  \usepackage{array}[=v2.4]

in such documents. (github issue 297)

array: Fix for \firsthline and \lasthline

Replacing \hline with \firsthline or \lasthline could lead in some cases to an increase of the tabular width. This has now been corrected. (github issue 322)

varioref: Support Japanese as a language option

The package now recognizes japanese as a language option. The extra complication is that for grammatical reasons \vref, \Vref, \vrefrange and \fullref need a structure different from all other languages currently supported. To accommodate this, \vrefformat, \Vrefformat, \vrefrangeformat, and \fullrefformat have been added to all languages. (github issue 352)

xr: Support for spaces in filenames

The command \externaldocument, provided by xr, now also supports filenames with spaces, just like \include and \includeonly. (github issue 223)

Changes to packages in the amsmath category

Placement corrections for two accent commands

The accent commands \dddot and \ddddot (producing triple and quadruple dot accents) moved the base character vertically in certain situations if it was a single glyph, e.g., $Q \dddot{Q}$ were not at the same baseline. This has been corrected. (github issue 126)

Fixes to aligned and gathered

The environments aligned and gathered have a trailing optional argument to specify the vertical position of the environment with respect to the rest of the line. Allowed values are t, b and c but the code only tested for b and t and assumed anything else must be c. As a result, a formula starting with a bracket group would get mangled without warning — the group being dropped and interpreted as a request for centering. After more than 25 years this has now been corrected. If such a group is found a warning is given and the data is processed as part of the formula. (github issue 5)

Detect Unicode engines when setting \std@minus and \std@equal

amsmath now detects the Unicode engines and uses their extended commands to define \std@minus and \std@equal. This avoids a package like unicode-math having to patch the code in the begin document hook to change the commands.

lualatex-math: Use LuaTEX primitives

For a number of years lualatex-math patched \frac, \genfrac and the subarray environment to make use of new LuaTEX primitives. This code has now been integrated into amsmath.

Changes to the babel package

Multilingual typesetting has evolved greatly in recent years, and babel, like LATEX itself, has followed the footsteps of Unicode and the W3C consortia to produce proper output in many languages.

Furthermore, the traditional model to define and select languages (which can be called "vertical"), based on closed files, while still the preferred one in monolingual documents, is being extended with a new model (which can be called "horizontal") based on services provided by babel, which allows defining and redefining locales with the help of simple ini files based on key/value pairs. The babel package provides about 250 of these files, which have been generated with the help of the Unicode Common Language Data Repository.

Thanks to the recent advances in lualatex and luaotfload, babel currently provides services for bidi typesetting, line breaking for Southeast Asian and CJK scripts, nonstandard hyphenation (like ff to ff-f), alphabetic and additive counters, automatic selection of fonts and languages based on the script, etc. This means babel can be used to typeset a wide variety of languages, such as Russian, Arabic, Hindi, Thai, Japanese, Bangla, Amharic, Greek, and many others.

In addition, since these ini files are easily parsable, they can serve as a source for other packages.

For further details take a look at the babel package documentation [4].
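As a hedged illustration of the ini-file-based "horizontal" model (the authoritative interface is in the babel documentation [4]): under LuaLaTeX a locale can be defined from its ini file with \babelprovide.

  \usepackage{babel}
  % "import" reads the locale's ini file; "main" makes it the main language
  \babelprovide[import, main]{thai}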
References
[1] LATEX Project Team. LATEX 2ε news 31. https://latex-project.org/news/latex2e-news/ltnews31.pdf
[2] LATEX documentation on the LATEX Project Website. https://latex-project.org/help/documentation/
[3] LATEX issue tracker. https://github.com/latex3/latex2e/issues/
[4] Javier Bezos and Johannes Braams. Babel — Localization and internationalization. https://www.ctan.org/pkg/babel



LATEX Tagged PDF — A blueprint for a large project

Frank Mittelbach, Chris Rowley

Abstract

In Frank's talk at the TUG 2020 online conference we announced the start of a multi-year project to enhance LATEX to fully and naturally support the creation of structured document formats, in particular the "tagged PDF" format as required by accessibility standards such as PDF/UA.

In this short article we outline the background to this project and some of its history so far. We then describe the major features of the project and the tasks involved, of which more details can be found in the Feasibility Study [8] that was prepared as the first part of our co-operation with Adobe.

This leads on to a description of how we plan to use the study as the basis for our work on the project and some details of our planned working methodologies, illustrated by what we have achieved so far and leading to a discussion of some of the obstacles we foresee.

Finally there is also a summary of recent, current and upcoming activities on and around the project.

Contents
1 A short history of the project 292
2 What the project covers 293
3 Explaining the blueprint 294
  3.1 The major tasks 295
  3.2 The project plan 295
  3.3 Timeline and other considerations 296
4 First results 297
5 Outlook 297

1 A short history of the project

When TEX was developed by Don Knuth in the early 1980s, it was designed as a high-quality automated typesetting system that concerned itself solely with the production of a "printed result" from text input, with paper as the final intended output medium. Any other kind of typeset output was either not supported or not directly supported.

Partially in parallel to Knuth's work, Leslie Lamport developed LATEX as a complex and highly integrated collection of TEX macros that "runs on top of TEX". Some time later, during the 1990s, some support for graphics and color was added, but still following the same paradigm and aimed at printing on physical media, now including overhead projector slides.

The move beyond paper came soon after this, with hyperlinks and other web publishing support getting layered on top of basic LATEX; but these (together with many other subsequent extensions) were never incorporated as integral parts of its design. This piecemeal development has resulted in the extensions being very fragile so that they often break in more complex documents, especially when used in combination.

Even today it is the case that, in many areas, core LATEX provides little in the way of APIs that are robust enough to be safely used or built on by developers. Therefore most such extensions are forced to contain a lot of code that overwrites many internal LATEX commands: such unfortunate and unsustainable implementation practices being currently necessary in order to implement most extra functionality. Such methodology naturally leads to many issues, of which a common and particularly frustrating one is incompatibilities between extensions.

Another severe limitation of current LATEX is that, whenever it produces an output page, it very carefully throws away the wealth of structural information it has used and accumulated while producing that page.¹ As a result, a PDF or DVI file produced by basic LATEX is nothing but a stream of positioned glyphs containing very little structural information.

As long as your intention is only to print a document on a physical medium, then this is all that is required. However, for quite a while now other uses of documents have been increasing in importance so that nowadays many documents are never printed, or printed only as a secondary consideration.

Coming into the 21st century, for many reasons great interest has arisen in the production of PDF documents that are "accessible", in the sense that they contain information to assist screen reading software, etc., and, more formally, that they adhere to the PDF/UA (Universal Accessibility) standard [17], explained further in [3]. Ross Moore has recently written a comprehensive introduction to this subject [9].

¹ This jettisoning of much useful information was absolutely essential during the early decades of LATEX development because the system would otherwise never have run in the severely limited memory available on the then current computer systems. So all information no longer needed for producing a "printed page" was dropped immediately so as to keep the memory footprint within reasonable limits. But even that was not always enough: on his first PC one author was greeted with "out of memory" while loading the article class.


One important requirement for such PDF documents is that they must contain a significant amount of embedded structural information and other data. At the moment, all methods of producing such "accessible PDFs", including the use of LATEX, require extensive manual labor in preparing the source or in post-processing the PDF (maybe even at both stages); and these labors often have to be repeated after making even minimal changes to the (LATEX or other) source. Such methods and their inherent problems have been discussed and demonstrated recently by developers such as Ross Moore, much of whose work in this area can be accessed through the web site at [10].

The LATEX Project Team have for some years been well aware that these new usages are not adequately supported by the current system architecture of LATEX, and that major work in this area is therefore urgently needed to ensure that LATEX remains an important and relevant document source format. However, the amount of work required to make such major changes to the LATEX system architecture will be enormous, and definitely way beyond the regular resources of our small team of volunteers working in their spare time! Or maybe just about possible, but only given a very long — and most likely too long — period of time.

At the TUG conference 2019 in Palo Alto our previously pessimistic outlook on this subject became cautiously optimistic, because we were approached by two senior engineers from Adobe to discuss the possibility of producing structured PDF from LATEX source without the need for the normal requirement of considerable manual post-processing. Moreover, they had come to ask us what it would take to make this happen. They indicated that, if this turned out to be a feasible project for which we could provide them with a convincing project plan, then they would happily advise Adobe, at a suitably influential level of management, to support such work, both financially and in other ways — no strings attached.²

As a result of these discussions, towards the end of 2019 the authors, together with Ulrike Fischer, produced an extended feasibility study for the project, aimed primarily at Adobe engineers and decision makers. This study [8] describes in some detail the various tasks that will constitute the project and their inter-dependencies. It also contains a plan covering how, and in what order, these tasks should be tackled in order both to achieve the final goal and, at the same time, to provide intermediate concrete results that are relevant to user communities (both LATEX and PDF); these latter will help in obtaining feedback that is essential to the successful completion of later tasks.

The project plan gained the approval of the Adobe management and we agreed in early 2020 to start the project with Adobe contributing a substantial portion of the expected project costs. But this was 2020, so then "Corona arrived" with its huge spanner to throw into the works! Thus all activity came to a halt while Adobe was forced to reassess such external financial commitments. However, we were quite quickly told that this was a temporary setback, so we restarted work on the project, albeit at a slightly slower pace, and so were able to present our first results at the TUG 2020 online conference: i.e., the new hook management system for LATEX.³

2 What the project covers

This project has the title "LATEX Tagged PDF" so before discussing further its coverage we should describe the use of the word "Tagged" here. A good starting point for this is the following quote from the PDF standard [2, 15] which defines tagging in the PDF context thus:

  Tagged PDF ([introduced in] PDF 1.4) is a stylized use of PDF that builds on the logical structure framework described in Section [...], "Logical Structure." It defines a set of standard structure types and attributes that allow page content (text, graphics, and images) to be extracted and reused for other purposes. It is intended for use by tools that perform the following types of operations:
  • Simple extraction of text and graphics for pasting into other applications
  • Automatic reflow of text and associated graphics to fit a page of a different size than was assumed for the original layout
  • Processing text for such purposes as searching, indexing, and spell-checking

² Adobe's rationale for this support is that LATEX is currently a very important document format in the STEM disciplines and is one of the formats best suited to produce well-structured PDF "out of the box", given that its source already contains such a large amount of information about the document's structure. Thus, with LATEX capable of supporting the straightforward production of structured PDF documents, a large number of such documents would become available on the Internet. In addition, many older documents could be reasonably easily reprocessed to add this structural information to their PDF form. For Adobe all this is of great interest as the existence of large numbers of such documents forms an important and strategic input to their Document Cloud [1] offerings and related future services.

³ The project task for this is described in section 2.2.5 of the Feasibility Study [8].


  • Conversion to other common file formats (such as HTML, XML, and RTF) with document structure and basic styling information preserved
  • Making content accessible to users with visual impairments (see Section [...], "Accessibility Support")

Based on this definition we can now give a high-level description of this project by analyzing the quoted text from the perspective of how LATEX currently works and how we will need to change it. The quoted text tells us that to produce a Tagged PDF document from a LATEX source we must proceed as follows:

First, we must exploit LATEX's knowledge of how the components of a document are presented in the LATEX source file and used to describe its structure, consisting of "structure types and attributes" such as the paragraph text, headings, lists, items, graphics, tables, table cells, mathematics, etc.

Then, we must use this knowledge about the source document to add the necessary components to the PDF we are constructing. There are two basic ingredients of this: a tree structure whose nodes represent the "logical components" of the document; and a mechanism for identifying those portions of the "page content" that correspond to each (leaf) node in this tree.

These "tagged" portions of the content, together with information about the corresponding structure node, can then be used by other (consumer) applications to carry out operations like those listed at the end of the quote above.

None of the above may sound too complicated given that LATEX knows what kind of structures will be in the source file that it is processing. But as we explained earlier, for performance reasons (necessary when LATEX was designed and implemented), currently this structural information is disposed of as soon as possible — therefore, by the time of writing out the content of a page to the PDF file it is no longer available.

In addition, there are some problematic details within these processes. For example, the main content of a PDF document is a collection of "page objects" and the "tagged portions" of the text have to be nested within each page object; but TEX's asynchronous process does not play well with this constraint. Dealing with this problem is therefore part of one of the lower-level tasks in the project, one that may best be solved by enhancements at the engine level.

For this reason, a large portion of the necessary project work consists of adding to the core of LATEX all the functionality, data structures and interfaces that are necessary for preserving and processing the many details of this structural information. This will result in a much enhanced LATEX system that can manage and transport structural information from a source to an output format.

With all of this firmly embedded into LATEX, the project will be able to go further to provide, for example, appropriate document-level interfaces for some types of data that currently are not, at least by default, made explicit in the source file. In this area there already exist some solutions for aspects such as metadata, PDF bookmarks, etc., produced by Ross Moore, Scott Pakin, Heiko Oberdiek and others [11, 12, 14]. Wherever feasible, we shall unify and incorporate such useful existing work. Support for some, but not all, of the wider (beyond tagging) requirements of "Accessible PDF" will also be included in the project work.

We also want to include here some important (to us, at least) observations concerning the processing of structural information using LATEX. From the document-level (frontend) processing to the internal data structures and interfaces, everything we wrote in the high-level description above is essentially independent of the target output format (Tagged PDF in this project). Identical processing will be needed for any output format that needs such structural information, including HTML, XML and other "tagged/structured formats".

Thus, while the project will focus primarily on PDF output (either generated directly by the TEX engine or through a DVI-based workflow) it will, as a bonus, make it easy to add other such output formats to the workflow by simply replacing an output (backend) module. Instead of PDF output, HTML5 or some other format will then get written. As of now, such alternative backends are not part of the project coverage, but once LATEX is able to pass structure information with well-defined interfaces to a backend we expect that support for other structured output formats will follow. Such work may be undertaken by us or by other teams, possibly in parallel to the later phases of the project discussed in this article.

3 Explaining the blueprint

The document entitled "LATEX Tagged PDF Feasibility Evaluation" [8], available from the LATEX Project website [5], is a forty-page study that explains in detail both the project goals and the tasks that need to be undertaken, concluding with a plan for how we think the project should be undertaken. Thus it is


divided into three parts, beginning with an "Introduction" which contains an overview of the benefits of the project and goes on to explain why LATEX documents make a good starting point for the production of tagged PDF.

It is advisable, if you are consulting this study, to bear in mind that it was addressed primarily to an audience within Adobe;⁴ this consisted of engineers and managers with a wide knowledge of digital typography and electronic publishing but not necessarily much background within the specialized world of TEX, LATEX and friends.

⁴ The public, cited, version of the study was updated in September 2020 with some minor redactions, corrections and clarifications.

3.1 The major tasks

Following the overview, the next part of the study ("Project Overview") documents all the major tasks that we consider necessary to reach the project goals. These tasks are categorized under five headings, the two most important being "General LATEX Extension Tasks" and "Structured PDF Tasks".

For each task, the study describes the rationale for its inclusion and the work that has to be undertaken. For the larger tasks, suitable subtasks are also identified. Furthermore, to support the project timeline planning, all the dependencies of each task on other tasks are documented. Each such task section ends with a list of deliverables, which will help us to measure, and hence monitor, our progress through the project.

Note that in total there are twenty engineering tasks for which we believe that the necessary work is well understood and therefore it "only" has to be done — but done very well! In addition there are a few project tasks that will require more extensive research. Their descriptions in the study are much less complete; this is simply because the engineering details are not as yet known, or not well enough understood. These research tasks are, naturally, likely to identify further engineering tasks.

Note that, since the tasks in this part are arranged by category, they are not listed in their chronological order of execution within the overall plan: i.e., a task's number is no indication of how far along in the project that task will be tackled.

3.2 The project plan

In the final major part of the study ("Project Timeline") we develop a complete project plan consisting of six phases. The tasks have been sorted into these phases in such a way that the dependencies between them are respected; but also, and more importantly, we have ensured that the results of each phase will offer immediate benefits to the LATEX community.

This latter requirement is important since it will lead to timely feedback and early adoption. As you can see below, we expect, for example, that on completion of phase II it will already be possible to automatically generate tagged PDFs for a restricted set of documents. In later phases this level of automation will be extended to a wider range of documents.

Phase I — Prepare the ground

This phase covers some tasks that are important precursors of later work; it is already well under way. One important deliverable here is the new general hook management system for LATEX that was presented at the 2020 TUG conference and which is now part of core LATEX, starting from the October 2020 release.

Another task here is the initial work on the low-level requirements for the creation of tagged PDF; this is currently available in the highly experimental package tagpdf, which uses prototype code from pdfresources-related files for the creation and management of PDF objects. This package thus supports such tasks as the creation in the PDF of the structure tree and adding the necessary connections between the nodes in this tree and the "content stream" of each page.

It currently enables the creation (with some user intervention) of tagged PDF documents; but, please note that this work is truly experimental and so both the code and interfaces are likely to change, or even vanish, at any time.
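To give a flavor of the kind of interface involved, here is a heavily hedged sketch in the spirit of tagpdf as it existed around 2020; since, as just noted, the code and interfaces may change or vanish, the command names should be treated as illustrative only.

  % compile with lualatex; activate tagging and tag one paragraph by hand
  \documentclass{article}
  \usepackage{tagpdf}
  \tagpdfsetup{activate-all}
  \begin{document}
  \tagstructbegin{tag=P}  % open a structure node in the tree
  \tagmcbegin{tag=P}      % begin marked content linked to that node
  Hello, tagged world.
  \tagmcend               % end the marked content
  \tagstructend           % close the structure node
  \end{document}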
This phase is the only one that was not designed to offer significant benefits to users/authors. Nevertheless, the standardization and interfaces provided by the general hook management will arguably benefit these groups in addition to its direct target, the package developers.

Phase II — Provide tagging of simple documents

The main goal of phase II is to provide basic support for the automation of the tagging process. In this phase the automation will cover only relatively simple documents: those that do not contain any of the more complicated structures such as mathematics and tables.

At this stage, the automated tagging will be provided by loading an add-on package rather than its being in the kernel, and it will be only a prototype implementation.


For this phase several "workarounds" will be required in order to provide necessary, but missing, features. For example, instead of setting up (as part of this phase) a full extended cross-reference system (similar to the zref package by Heiko Oberdiek, but built into core LATEX) we will either use that package or add temporary code written to "work around" any issues resulting from this functionality not yet being a component of the kernel. Any such "workaround code" will need to be removed or adjusted in a later phase.

This ordering of the work was chosen so as to offer tangible results reasonably early in the life of the project, rather than taking up too much time at this stage on making purely internal improvements that have no immediately visible application and therefore appear to many users to be devoid of value.

Phase III — Remove the workarounds needed for tagging

The main goal of phase III is to extend the coverage of automatic tagging to a wider variety of documents by making further basic document elements "tagging aware". These extensions will also enable us to remove the "workarounds" introduced (to provide a working prototype) in the previous phase. This phase will also include a metadata mechanism that is integrated into the LATEX kernel.

Phase IV — Make basic tagging and hyperlinking available

The main goal of phase IV is to incorporate into the kernel all the code from the prototype packages. This needs to be done very carefully and cautiously as there should be no negative impact on the processing of legacy documents. Thus we expect to need at least one full LATEX release cycle⁵ to complete this work.

⁵ See Section 3.3 for further information concerning phases and this release cycle.

Phase V — Extend the tagging capabilities

With basic tagging available, the focus of phase V is to provide extended support for tagging by adding tables and formulas (mathematics) to the supported document structures.

At this stage, interfaces will also be added wherever needed to support other aspects of Tagged PDF such as the specification of alternate text (for formulas, illustrations, etc.).

Phase VI — Handle standards

Finally, phase VI focuses on the provision of support for the relevant PDF standards (such as PDF/A [16], explained further in [13]; and PDF/UA [17], see also [3]) at least so far as this is possible using LATEX directly, i.e., without any post-processing of the PDF file. It also covers kernel support for some further features such as the production of outlines for a document and the incorporation into a PDF of "associated files".

Research work

In addition to these six phases, which contain only tasks that are largely understood from a technical perspective, there are a number of other tasks that will require preparatory research in order to understand what engineering work is needed. This research will be carried out in parallel with the other work and, as indicated earlier, it is very likely to lead to the specification of additional engineering tasks. Thus the research outcomes will probably lead to some alterations and extensions in phases IV to VI.

Alterations, adjustments and reporting

On the whole we believe that this plan offers a consistent and sensible approach to attaining the intended goals of the project. However, there will undoubtedly be some changes since, for example, the research tasks will probably lead to some augmentation of the planned tasks and changes to some of them. Moreover, we have already found that we want to move some deliverables (from the larger tasks) forward to earlier phases. Also, in some cases we are thinking about not completely finishing a task within the planned phase because it appears more important to first focus on other work (of course, such changes will work only in cases where those deliverables are not required for other tasks). Furthermore, working through a task may identify some additional work that is needed but not yet accounted for, leading to new subtasks with extra deliverables.

We are therefore planning to provide an appropriate level of continuing (public) reports on the project's status and results, including any additions or changes of direction. The Feasibility Study, with its fairly detailed documentation and deliverables, will provide a good starting point for this important activity. The exact form and shape of this reporting is not yet decided, but we will in due time announce it on the LATEX Team website [5] and/or in TUGboat as appropriate.

3.3 Timeline and other considerations

The time estimates we supplied (in the Feasibility Study) for the individual tasks are based on the assumption that there will be at least (the equivalent of) one software developer working exclusively and full time on a task throughout its allocated time period, together with additional support from further


developers as necessary. We are also very well aware that activities such as documentation and testing require substantial specialized professional input that will also need to be resourced.

Whether or not this level of effort will be possible at all times will depend on various factors, an important one being the availability of appropriate funding that can free the responsible developers and other key workers from the need to undertake other work to earn a living. While the financial sponsorship from Adobe will go a long way towards meeting the needs of the project, it may well not be sufficient by itself to fully sustain the work. For all the above reasons, the estimates in the study should be considered as approximations and not as definite values.

A realistic scenario would be that each phase takes between one and two release cycles of LATEX, of which there are two per year. This implies that the project will stretch across four years as a minimum, but it most probably will be somewhat longer. Additional funding will help to ensure timely delivery of each phase and may additionally allow us to broaden the scope in some areas. Nevertheless, given the complexity of the topic, any expectation of earlier delivery dates is not realistic.

It is also important to note that all the necessary updates to important external packages (those not supported by the LATEX Project Team) are expected to be done using external resources, i.e., by the maintainers of those packages. This assumption is probably not tenable in all cases (see, for example, the discussion in [8, Task 2.4.3]). In such cases, additional work will have to be undertaken as part of the project, which will also alter the timeline or require hiring additional developers to take on such work.

Last but not least, the success of the project will also depend very much on productive collaborations with many people in the LATEX community: testers, package writers, and possibly also those who tend to the underlying TEX engines and the various utility programs used to produce PDF output. Also, from the wider world, we shall need input from a variety of experts in the production of high-quality PDF and from those with knowledge of how consumer applications use its features in the real world.

4 First results

As mentioned earlier, we have started the project despite COVID-19 getting in the way. So we can now already report on some success stories:

• The new LATEX hook management system (Task 2.2.5 of [8]) presented at TUG 2020 [7] and in a TUGboat article [6];
• PDF string support (Task 2.2.1 of [8]) which is largely an internal enabler for later tasks;
• Initial experiments with tagpdf prototype code (part of Task 2.2.6 of [8]). See [4] for a discussion.

5 Outlook

Right now we are in the middle of phase I and expecting it to be finished with the Spring release of LATEX in 2021. Given that we intend to shift the LATEX release dates next year to better align with the yearly TEX Live distributions, we expect to start work on some tasks of phase II already before the next LATEX release.

In addition to the ongoing work on engineering tasks, some effort will go into managerial tasks, e.g.,

• Setting up connections and collaboration with external experts to gain expertise in areas where the present team is lacking, and to ensure that our work will continue to be focused on pertinent goals;
• Looking for additional financial support to bring in extra expertise and hence, we hope, speed up the later phases;
• Setting up a professional system for the production of documentation, high in both quality and quantity.

Of course, LATEX development and maintenance will not be solely restricted to this project over the coming years. There are many other activities that the LATEX Project Team plans to carry out in parallel. Their results will, as in the past, appear via the usual communication channels, e.g., our website, ltnews newsletters, TUGboat articles and/or Internet-based discussions on StackExchange chat, LATEX-L and other places.

References
[1] Adobe Document Cloud. https://en.wikipedia.org/wiki/Adobe_Document_Cloud.
[2] Adobe Systems Inc. PDF Reference 1.7, Nov. 2006. https://www.adobe.com/devnet/pdf/pdf_reference.html. Freely available version of [15].
[3] O. Drümmer, B. Chang. PDF/UA in a Nutshell — Accessible documents with PDF. PDF Association, Aug. 2013. https://pdfa.org/resource/pdfua-in-a-nutshell/.
[4] U. Fischer. Creating accessible pdfs with LATEX. TUGboat 41(1):26–28, 2020. https://tug.org/TUGboat/tb41-1/tb127fischer-accessible.pdf

LATEX Tagged PDF — A blueprint for a large project 298 TUGboat, Volume 41 (2020), No. 3

[5] LATEX Project Team. Website of the LATEX [14] S. Pakin. The hyperxmp package. Project. https://latex-project.org/. https://ctan.org/pkg/hyperxmp, Oct. 2020. [6] F. Mittelbach. Quo vadis LATEX(3) Team — A [15] Technical Committee ISO/TC 171/SC 2. look back and at the upcoming years. TUGboat ISO 32000-1:2008 Document management — 41(2):201–207, 2020. https://latex-project. Portable document format (PDF 1.7), July org/publications/indexbyyear/2020/. 2008. https://iso.org/standard/51502. html. Freely available as [2]. [7] F. Mittelbach. Video: Quo vadis LATEX(3) Team — A look back and at the upcoming [16] Technical Committee ISO/TC 171/SC 2. years. 2020. Recording of the talk ISO 19005-2:2011 Document management — given at the online TUG 2020 conference, Document file format for long-term https://youtu.be/zNci4lcb8Vo. preservation—Part 2: Use of ISO 32000-1 (PDF/A-2), July 2011. [8] F. Mittelbach, U. Fischer, C. Rowley. LATEX https://iso.org/standard/50655.html. Tagged PDF Feasibility Evaluation.LATEX Project, Sept. 2020. https://latex-project. [17] Technical Committee ISO/TC 171/SC 2. org/publications/indexbyyear/2020/. ISO 14289-1:2014 Document management [9] R. Moore. Implementing PDF standards applications — Electronic document file format for mathematical publishing. TUGboat enhancement for accessibility — 1: Use of ISO 39(2):131–135, 2018. https://tug.org/ 32000-1 (PDF/UA-1), Dec. 2014. TUGboat/tb39-2/tb122moore-pdf.pdf https://iso.org/standard/64599.html. [10] R. Moore. Website: Tagged PDF examples and resources, 2020. http://maths.mq. Frank Mittelbach ⋄ edu.au/~ross/TaggedPDF and in particular Mainz, /TUG2019-movies. frank.mittelbach (at) [11] R. Moore, C. V. Radhakrishnan, et al. latex-project dot org Generation of PDF/X- and PDF/A-compliant https://www.latex-project.org PDFs with pdfTEX— pdfx.sty. Chris Rowley ⋄ https://ctan.org/pkg/pdfx, Mar. 2019. Sattahip, Chonburi, [12] H. Oberdiek. The bookmark package. chris.rowley (at) latex-project dot org https://ctan.org/pkg/bookmark, Dec. 2019. https://www.latex-project.org [13] A. Oettler. PDF/A in a Nutshell — PDF for long-term archiving. PDF Association, May 2013. https://pdfa.org/resource/ pdfa-in-a-nutshell/.

Functions and expl3

Enrico Gregorio

Abstract

In this tutorial we discuss expl3 functions, their role, definition and variants, also touching on variables. (Simultaneously published in Italian for the GuIT 2020 conference, guitex.org.)

1 Introduction

The term function is not used in standard LaTeX, but is a very important concept in the expl3 language. The language itself had no precise name until a couple of years ago, when it was eventually decided that its name would be expl3, just like the package that provided it (now merged in the LaTeX kernel).

In the language, care is taken to distinguish between variables and functions. Variables store some value that can change during the LaTeX run, whereas functions perform some action.

A special kind of variable is the constant, whose value is supposed not to change during the run. Well, the name seems to conflict with the nature, but mathematicians are used to this kind of stretched terminology: you who are not a mathematician, don't worry and carry on, just smile at mathematicians' bizarre way of thinking.

We'll be mostly interested in functions, but variables can be the staple food of functions, so we'll also need to know a bit about them.

Let me give an example using legacy concepts. The standard document classes, and most nonstandard ones as well, have the command \title. How does this work? This very paper has

  \title{Functions and \expliii}

at its start. When LaTeX processes this instruction, it will do something like

  \gdef\@title{Functions and \expliii}

so it will be able to use \@title when doing its typesetting job related to \maketitle.

There is a big conceptual difference between \title and \@title. The former performs an action, the latter is simply a container. In expl3 terms, the former will be a function, the latter a variable: this function's action is to store a value in the specified variable.

At the user level the distinction is blurred; with

  \newcommand{\CC}{\mathbb{C}}

is \CC a function or a variable? Fortunately, it's not important to decide, because this is essentially a user's choice and at this level the distinction is almost irrelevant.

The issue comes up when programming. In 'correct' LaTeX programming we have a user level command which calls a function that performs an action:

  \NewDocumentCommand{\title}{m}
    {
      \example_title:n { #1 }
    }
  %
  \tl_new:N \g_example_title_tl
  %
  \cs_new_protected:Nn \example_title:n
    {
      \tl_gset:Nn \g_example_title_tl { #1 }
    }

Then the user level command \maketitle will use the value stored in the variable, via other functions.

This (imaginary) example shows many of the concepts we'll be discussing. We define a user-level command in terms of a function; this function has one argument (the given title) and its job is to set the value of a particular variable to the specified value. The variable has been declared in advance:
• \title is a user-level command; these are not the subject of this paper;
• \g_example_title_tl is a variable, defined using \tl_new (tl = token list);
• \example_title:n is a function, defined using \cs_new_protected (cs = control sequence).
We'll be discussing all of this in detail.

2 Naming conventions

A common problem with TeX is that it has no concept of namespace, which only became common in computer science circles much later. Name conflicts were frequent in the olden days of LaTeX, and such conflicts still appear now and then. It might be appealing for a package to use \@x and \@y for coordinates, but package authors should be aware that if a simple name appeals to them, other authors have probably thought the same.

In expl3, variables should have a name of the form

  \l_⟨prefix⟩_⟨proper name⟩_⟨type⟩
  \g_⟨prefix⟩_⟨proper name⟩_⟨type⟩
  \c_⟨prefix⟩_⟨proper name⟩_⟨type⟩

where the distinct parts are important and necessary:
• l, g and c declare that the variable is local, global or constant, respectively;
• ⟨prefix⟩ should be a unique string of letters for the package we're writing or the code in the document;
• ⟨proper name⟩ is an arbitrary string of letters possibly split into parts separated by an underscore;
• ⟨type⟩ is the type of variable.

The most common types of variables are:
• tl, for token list;
• seq, for sequence;
• clist, for comma list;
• prop, for property list;
• int, for integer;
• dim, for dimension;
• box, for box;
• fp, for floating point.
There are several others, but as the purpose of this tutorial is to talk about functions, I'll skip the more esoteric ones for now, only touching them when need comes.
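To make the conventions concrete, here is a small sketch of declarations (mypkg is a hypothetical ⟨prefix⟩; these lines are not from the article):

  \tl_new:N \l_mypkg_author_tl % local token list
  \int_new:N \g_mypkg_total_int % global integer
  \tl_const:Nn \c_mypkg_version_tl { 1.0 } % constant token list

The name alone now tells a reader the scope, the owner and the data type of each variable.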

Function names are similar:

  \⟨prefix⟩_⟨proper name⟩:⟨signature⟩

The ⟨prefix⟩ and ⟨proper name⟩ are the same as before, but the ⟨signature⟩ must be explained. It can be an arbitrary string of characters among

  cefnNovVwxTF

Each character given, except w, denotes an argument to the function. We'll be going into details soon.

Mathematical functions can depend on one or more arguments (well, also zero, but then they're constant functions) and the same is true for expl3 functions. The purpose of the signature is to precisely specify how many arguments the function depends on and their type. For instance, the commonly used function

  \seq_set_split:Nnn

takes three arguments, one of type N and two of type n in that order. An argument of type N should be a single token, the nature of which depends on the function; in the above case, it should be a sequence variable. An argument of type n should be a braced list of tokens. In our example, the title of the paper is an n-type argument to \title. This is a bit stretched, but should explain the concept.

The w type is an exception, because it specifies nothing except that the arguments to the function are weird, and one must refer to the package/code documentation in order to know how many there are and what syntax they have. Generally speaking, w-type arguments should only appear in low-level functions.

A call to the previously mentioned function might be something like

  \seq_set_split:Nnn
    \l_example_test_seq
    { || }
    { a || b || c }

(more likely on one line in a source file; split here because of the paper's formatting). It doesn't matter now to know what this code does; seq_set_split is usually called as part of other processing and receives the arguments from other calls. The important thing is to see that the arguments follow the naming convention: the first one is a single token, the other two are braced lists of tokens.

A function can have no arguments, but the colon is still required. Although TeX will not balk if you define a function with a nonconforming name, adhering to the convention will help to avoid conflicts and to have more easily parsable code. A typical function with no arguments is \scan_stop:, which is nothing other than our old friend \relax. Maybe the expl3 name is less poetic, but it expresses what the function's main purpose is.

3 Defining functions

There are many kernel functions which define functions. All of them share the cs prefix. The main ones are

  \cs_new:Nn
  \cs_new_protected:Nn

We'll discuss the others later. According to the naming conventions given, we can see that both have two arguments: the first argument is a single token, the name of the function to be defined, the second being the replacement text, that is, the code that will be substituted for the function's call.

While expl3 tries hard to emulate a functional language, it is still implemented in TeX, which only knows primitives, macros and registers. This fact needs to be kept in mind when programming. On the other hand, the new language makes for simpler constructions, avoiding the clumsy (or fun, depending on the programmer's attitude) chains of \expandafter or \noexpand often seen in traditional TeX, understanding which is often quite hard.

A simple example is the internal function for managing the document's title, which we saw earlier:

  \cs_new_protected:Nn \example_title:n
    {
      \tl_gset:Nn \g_example_title_tl { #1 }
    }

Since the function's name has signature :n (strictly speaking, the colon is not part of the signature, but it's convenient to use it as a marker), expl3 knows that the replacement text can use #1 to refer to the argument supplied at call time.
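A call then simply supplies the braced argument; continuing the imaginary example (recall that, under the expl3 conventions, ~ stands for a space; this snippet is not from the article):

  \example_title:n { Functions~and~expl3 }
  % \g_example_title_tl now holds the given title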

Why are we using the protected instruction and not the simpler cs_new? Because our function will set the value of a variable. This is something that for years has frustrated a horde of LaTeX programmers and has required the distinction between robust and fragile commands. Nowadays the issue is less relevant because almost all fragile commands have been 'robustified', but it can still bite.

What is the problem? If we define, in legacy LaTeX, something like

  \newcommand{\foo}[1]{%
    \renewcommand{\baz}{#1}%
  }

which is the common way to store a value into a macro, and then somehow \foo ends up in \write or \edef, even under their wrappers \protected@write or \protected@edef, a long list of error messages appears. The traditional way of avoiding this is to use \DeclareRobustCommand instead of \newcommand.

Any function that works by setting variables or calling other protected functions should generally be protected itself. In case of doubt, protect.

Beware! The signature of the function to be defined can only consist of the characters N or n. Well, T and F are also allowed, but these are a special topic that we'll touch later on. How do the other characters listed above get into signatures? This is a good question!

3.1 Generating variants

Suppose we're doing a general purpose function for setting tl (token list) variables to contain some material we need to use at later points. The user interface would utilize \setvar for storing the value and \usevar for delivering the value.

We face a problem: how can the user specify the name of a variable inside the document, where the expl3 names are not allowed? Indeed, in normal text, the underscore cannot be used in a command name, so something like

  \setvar{\l_example_var_a_tl}{something}

would bomb out. We'd like instead that the user types

  \setvar{a}{something}
  \usevar{a}

(at different points of the document, of course). Let's proceed at a slow pace. We'll define the user interface afterwards. First we define a function that allocates a variable and stores a value in it; then a function that delivers the contents of a variable:

  \cs_new_protected:Nn \example_setvar:Nn
    {
      \tl_clear_new:N #1
      \tl_set:Nn #1 { #2 }
    }
  \cs_new:Nn \example_usevar:N
    {
      \tl_use:N #1
    }

The tl_clear_new instruction clears a possible preceding value or allocates a new variable. Then the tl_set function does the setting job; it's not protected because it does no dangerous processing. However, this does not solve the problem we face. Here's where the concept of variants comes into the scene. We do

  \cs_generate_variant:Nn \example_setvar:Nn { cn }
  \cs_generate_variant:Nn \example_usevar:N { c }

which effectively defines two new functions named

  \example_setvar:cn
  \example_usevar:c

What does c mean? It means that the new functions expect a braced argument, from which they will build a command name (in this case the name of a variable, in other cases it could be the name of a function). So now we can define the user interface:

  \NewDocumentCommand{\setvar}{mm}
    {
      \example_setvar:cn
        { l_example_var_#1_tl }
        { #2 }
    }
  \NewExpandableDocumentCommand{\usevar}{m}
    {
      \example_usevar:c { l_example_var_#1_tl }
    }

Sites on LaTeX are plagued with questions about code doing nasty things such as \def\c{something}, asking why this breaks. With the approach just outlined we set up a namespace for our variables, which can just be called by their 'outer' name, leaving to the implementation the details about how to avoid conflicts.

[We could certainly define the user level commands in legacy LaTeX. A typical implementation would be

  \newcommand{\setvar}[2]{%
    \expandafter
    \def\csname example@var@#1\endcsname{#2}%
  }
  \newcommand{\usevar}[1]{%
    \csname example@var@#1\endcsname
  }

I won't quarrel with people maintaining this is simpler. But I'll remain with my opinion that it isn't.]

Let's add something to the game: now we want to allow the user to copy the value of a variable into another, say by doing \copyvar{b}{a}, where b is the new one and a the existing one. We only need a new variant, namely

  \cs_generate_variant:Nn \example_setvar:Nn { cv }

and then define the user interface with

  \NewDocumentCommand{\copyvar}{mm}
    {
      \example_setvar:cv
        { l_example_var_#1_tl }
        { l_example_var_#2_tl }
    }

We now have at our disposal another function, namely \example_setvar:cv, which takes two braced arguments. The second one is scanned like c, producing a symbolic token which should be a variable of some kind, and then will deliver the contents of the variable as a braced argument to the main function. [I leave as an easy exercise on \expandafter how to do this in legacy LaTeX programming (hint: use \let and two \expandafters, cleverly positioned).]

It's not necessary to write two distinct calls for defining the variants; we can create them both at once with:

  \cs_generate_variant:Nn
    \example_setvar:Nn
    { cn, cv }

The v variant is a special case of the V variant, which is almost the same, but the capital letter reminds us that a single token (a variable's name) should be used without braces. Suppose we have a function that does something with its argument:

  \cs_new:Nn \example_foo:n { -- #1 -- }

(just some nonsense to illustrate the concept). However, in some cases we need to pass the function something that has been stored in a tl variable: nothing simpler, because

  \cs_generate_variant:Nn \example_foo:n { V }

allows us to do

  \example_foo:V \l_tmpa_tl

If, say, \l_tmpa_tl has been set to contain abc, then calling \example_foo:V \l_tmpa_tl is exactly the same as doing \example_foo:n {abc}.

Bear in mind that variants do not come into existence without first generating them. Kernel functions come predefined with several variants that have proven to be useful; they're listed together with the main function in interface3.pdf. What if we don't know whether a variant has already been defined? No problem at all! The generation will be silently ignored and, even if it weren't, there should be no problem either, because variants are generated in a uniform way.

We could avoid generating variants. For instance the job of \example_setvar:cv could be accomplished by

  \exp_args:Ncv \example_setvar:Nn

(which is actually how the variant is defined), but there's no point in complicating our life this way. Modern TeX and friends' implementations have lots of memory available, and the times when memory was in very short supply and tricks saving just a few tokens in order to spare memory were necessary are only remembered by old dinosaurs like Frank, Chris, David and myself. I remember well the time I saw the dreaded "This can't happen" error message because I was using PiCTeX.

Let's go back to theory. Are there variants to the function-defining functions? Yes, of course there are! There are times when we want to define a function whose name is decided at runtime. The example below is a bit silly, but should give the idea: the call

  \cs_new:cn { example_foo:n } { -- #1 -- }

is the same as the definition of \example_foo:n above, because the name, including the signature, will be formed before the underlying \cs_new:Nn function does its job. One can use anything inside the braces corresponding to the c argument type, so long as the final result, after full expansion, just consists of characters.
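This opens the door to defining whole families of functions whose names are computed; a sketch (\clist_map_inline:nn is kernel expl3, but the loop and the mypkg names are invented here; note the doubled ##1 for the parameter of the function being defined):

  \clist_map_inline:nn { alpha, beta, gamma }
    {
      % defines \mypkg_alpha:n, \mypkg_beta:n, \mypkg_gamma:n
      \cs_new:cn { mypkg_#1:n } { [ #1 :~ ##1 ] }
    }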

Oh! Expansion! What is it? Be patient. First there are other bits to discuss.

3.2 Local, global, and ⟨extra⟩s

Everybody should know about local and global. In LaTeX, if we perform some command definition inside an environment, it is local to the environment and will disappear when it ends. Other assignments of meaning are instead global: operations on counters, for instance. We don't need a full discussion on local versus global, but there are aspects of the problem that are relevant for functions.

All \cs_new⟨extra⟩:Nn declarations are always global. Even if performed inside a group, their effect will also be carried on at the outer levels. Further, they will check whether the function is already defined and issue an error message if so. The ⟨extra⟩ part will be described next. Variable allocation (not setting) is also always global: the instruction

  \tl_new:N \l_example_foo_tl

defines the variable at all levels and will balk if the variable already exists.
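The distinction is easy to see with a small sketch (not from the article; \group_begin: and \group_end: delimit a TeX group, much as an environment does):

  \tl_new:N \l_mypkg_tmp_tl % allocation: always global
  \group_begin:
  \tl_set:Nn \l_mypkg_tmp_tl { inside } % setting: local
  \group_end:
  % here \l_mypkg_tmp_tl is empty again: the local
  % assignment was undone when the group ended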

Enrico Gregorio TUGboat, Volume 41 (2020), No. 3 303

The syntax is exactly the same as with \cs_new, as The gset family is almost the same as new, are the available variants. but no check is performed about the function being Which one to prefer? The new or the set family? defined. The answer is easy: the former, unless the function [For the TEX gurus that are reading these notes, new and gset is required to change its definition according to the use \gdef, whereas set uses \def; the \long prefix is added context. Rarely, if ever, will a high level function be except with nopar.] defined with set. What are the available variants? Here’s the Thus, the very nature of the language invites complete list for all of the above functions: to code in layers. For instance, we Nn cn Nx cx could have done: What’s this mysterious x? It’s intended to bring to mind fully expanded. There will be a section later \NewDocumentCommand{\setvar}{mm} on the topic. { \tl_clear_new:c 3.3 Parameters { l_example_var_#1_tl } \tl_set:cn Some people will now be complaining that they have { l_example_var_#1_tl } seen different ways to define functions and they’re { #2 } right. There is a whole new family like the one above } but where the signature has a strange p between N and n, namely without defining \example_setvar:Nn at all. I dis- \cs_new:Npn courage this kind of programming: our code should \cs_new_protected:Npn sit on top of expl3 and provide APIs for the program- \cs_new_nopar:Npn mer to employ. As I said before, there is no point \cs_new_protected_nopar:Npn in sparing a function, even more so if we consider \cs_set:Npn that once our API is available, we can easily define \cs_set_protected:Npn variants thereof for particular jobs. \cs_set_nopar:Npn What’s the complete list of extra s available \cs_set_protected_nopar:Npn h i after \cs_new or \cs_set? Here it is: \cs_gset:Npn \cs_gset_protected:Npn \cs_new:Nn \cs_gset_nopar:Npn \cs_new_protected:Nn \cs_gset_protected_nopar:Npn \cs_new_nopar:Nn The p is a reserved argument type just for these \cs_new_protected_nopar:Nn functions and variants thereof: all of them come \cs_set:Nn along with the variants \cs_set_protected:Nn \cs_set_nopar:Nn Npn cpn Npx cpx \cs_set_protected_nopar:Nn and stand for parameter text. The two lines below \cs_gset:Nn are completely equivalent: \cs_gset_protected:Nn \cs_new:Nn \example_usevar:n {...} \cs_gset_nopar:Nn \cs_new:Npn \example_usevar:n #1 {...} \cs_gset_protected_nopar:Nn The same for the other functions. In the second In all cases, the first argument is the name of the instance, the parameter text is explicitly written function to be defined and the second argument is out. Remember that when the expl3 programming the replacement text. conventions are in force, spaces are ignored, so for You probably already have an idea of what two parameters we can have protected does: it arranges things so that the func- \cs_new:Nn \example_foo:nn {...} tion is not expanded when full expansion is enforced. \cs_new:Npn \example_foo:nn #1 #2 {...} In particular, a protected function cannot be used \cs_new:Npn \example_foo:nn #1#2 {...} in a c-type argument, because it wouldn’t be ex- \cs_new:Npn \example_foo:nn #1 #2{...} panded and it is not a character. \cs_new:Npn \example_foo:nn #1#2{...} The nopar variety disallows \par tokens in the and the last four lines are completely equivalent. function’s argument (when called). Unless we’re Personally, I prefer the first way that’s ‘more logical’; dealing with special situations where \par does not others prefer the second way. Beware! 
The second make sense in a function’s argument, there is no need way doesn’t check for consistency of the signature to use it. with the parameter text and it even allows for ‘wrong’
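Usage of the date-splitting function might then look like this (a sketch; \int_use:N, which delivers the value of an integer variable, is kernel expl3, but this snippet is not from the article):

  \example_setdate:n { 2020-10-15 }
  \int_use:N \l_example_year_int % 2020
  \int_use:N \l_example_month_int % 10
  \int_use:N \l_example_day_int % 15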

4 Expansion

There can be no full understanding of TeX without some knowledge of how expansion works. In functional programming languages, if g is a function of one variable returning an array of three data, whereas f is a function of three variables, a call

  f(g(x))

would be permissible (from a mathematical point of view, at least; maybe a particular functional language requires some tweak). This is not the same in TeX, which goes from outside to inside, rather than conversely.

If we have a one-argument \example_a:n function that returns three braced lists of tokens, and another function \example_b:nnn that takes three arguments, a call such as

  \example_b:nnn { \example_a:n {x} }

would fail miserably. That's how TeX works and no clever code can change this. The outer function will look for three arguments and the given braced list of tokens is just one.

As an example of how to handle this, suppose that \example_a:n can be fed a date in ISO format and from 2020-10-15 it returns {2020}{10}{15}, whereas \example_b:nnn takes three arguments and produces a date in a different format, say "⟨day⟩ ⟨name of the month⟩, ⟨year⟩": in this case it should output "15 October, 2020".

We need an indirect approach, in order to allow feeding an ISO date to the general function outputting the date in that format. Let's see how the general function might be defined:

  \cs_new:Nn \example_date:nnn
    { % #1 = year, #2 = month, #3 = day
      #3~
      \int_case:nn { #2 }
        {
          {1}{January}
          {2}{February}
          ...
          {12}{December}
        }
      ,~#1
    }

The \int_case:nn function examines its first argument against the list given as its second argument (the code is incomplete, but you can guess how to fill it in) consisting of pairs of braced items; the first contains an integer, the second something to output when a match is found. The ~ here is not a nonbreaking space in the expl3 programming environment, but a normal space.

We also need a function that is given a date in ISO format (2020-10-15) and returns it split into the three constituent parts, {2020}{10}{15}:

  \cs_new:Nn \__example_isodate:n
    {
      \__example_isodate:w #1 \q_stop
    }
  \cs_new:Npn
    \__example_isodate:w #1-#2-#3 \q_stop
    {
      {#1}{#2}{#3}
    }

The input to the second function is split at the hyphens and at the terminator, so we're using delimited arguments, a detail I'll skim over. Here, we just need to know that the call

  \__example_isodate:n {2020-10-15}

will eventually return {2020}{10}{15} to the input stream.

How do we combine these? There are several ways, but all of them require understanding the concept of full expansion. TeX only knows macros; when it finds one, it knows how many arguments it takes and looks for them in the input stream; upon finding them, it replaces the whole sequence of tokens so found with the replacement text of the macro.

The main problem is that most of the time we don't know how many steps of expansion it will take to get from \__example_isodate:n {2020-10-15} to {2020}{10}{15}. In this case it would be easy to count them, but this is just a simple example. If we knew, a suitable chain of \expandafter commands would suffice, but this is prone to errors and inconsistencies if the implementation changes.

What we'd like is for \__example_isodate:n to go all the way down to the final result in one swoop. Here's a way:

  \exp_last_unbraced:Ne \example_date:nnn
    { \__example_isodate:n {2020-10-15} }

What \exp_last_unbraced does is fully expand its second argument and return the result in the input stream with no braces around it.

There are other ways. One is to define a new helper function:

  \cs_new:Nn \example_date:n
    {
      \__example_date:Ne \example_date:nnn
        { \__example_isodate:n { #1 } }
    }
  \cs_new:Nn \__example_date:Nn
    {
      #1 #2
    }
  \cs_generate_variant:Nn
    \__example_date:Nn { Ne }

upon which \example_date:n {2020-10-15} would produce the intended result.

There are still more ways, but here the idea is to present how we can exploit the full expansion variants. In sum, there are three kinds of them, namely e, x and f. The last of these is the most restricted, because it performs recursive expansion of the tokens as soon as they are placed in the input stream and ends at the first unexpandable token it finds. Notwithstanding this limitation it has several uses.

The x type is nowadays less important because all TeX engines supporting LaTeX have the primitive \expanded, which is itself expandable. Only Knuthian TeX (the engine that one launches with tex on the command line) lacks it, since it is kept with no extensions, according to Knuth's desiderata. [What does \expanded do? It is essentially like \edef, with the difference that no macro is defined. The argument is subject to full recursive expansion which doesn't stop when an unexpandable token is found, but just jumps over it and continues from the next token, until exhausting the supplied token list. The result is then placed on the input stream (without braces).]

4.1 Full expansion with e

The e argument type tells LaTeX to first fully expand the given argument and then supply the result to the original function. This is very important if the argument contains a variable whose value we want to deliver at call time.

We can see an example in a post on TeX.StackExchange (https://tex.stackexchange.com/a/438715/; the code there uses f, because e wasn't available yet). The question is about adding to endnotes, via the endnote package, the page number where the endnote actually appears. We want to use \pageref through an automatically supplied label:

  \NewDocumentCommand{\MyEndNote}{m}
    {
      \polyv_myendnote:ne
        { #1 }
        { \int_eval:n { \arabic{endnote}+1 } }
    }
  \cs_new_protected:Nn \polyv_myendnote:nn
    {
      \endnote
        {
          #1~(page\nobreakspace
          \pageref{#2:endnote})
        }
      \label{\arabic{endnote}:endnote}
    }
  \cs_generate_variant:Nn
    \polyv_myendnote:nn { ne }

What's the problem to be solved? The endnote number is incremented after the endnote is processed, so a \label command gets this new number. So we can't use

  \pageref{\arabic{endnote}:endnote}

because this would refer to the previous endnote. Thus the internal function uses e expansion in order to generate the successor to the current value of the endnote counter. Without this full expansion, all the endnotes declared with \MyEndNote would contain the equivalent of

  \pageref{%
    \int_eval:n{\arabic{endnote}+1}:endnote
  }

and so the final result would be undefined cross references, because \arabic{endnote} would always expand to the final value of the counter. For instance, if the last endnote was number 10, we'd end up with \pageref{11:endnote}. Instead, with full expansion, the current value is used and passed to the main function. At the first endnote, the counter has value 0, so the end result is the same as

  \endnote{The text of the endnote
    (page\nobreakspace\pageref{1:endnote})}
  \label{1:endnote}

We could as well have used f or x for this particular application, but e is the most efficient of the lot. The difference from x is that functions using x are not expandable, so they have to be of the protected kind. Indeed the process is a two-step one: first a temporary token list is set using

  \tl_set:Nx \l__exp_internal_tl {...}

(which internally uses good old \edef).

The introduction of e-type full expansion has been a significant step forward, because it allows for things that were almost impossible before. However, as seen above, there are uses for x: for instance there is no \cs_new:Ne variant, and it would be less efficient than \cs_new:Nx (which is just \edef).
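If the fully expanded result must be stored rather than used directly, x expansion captures it in a variable; a sketch reusing \__example_isodate:n from above (the mypkg names are invented, and calling a double-underscore function like this is only acceptable inside its own module):

  \tl_new:N \l_mypkg_date_tl
  \tl_set:Nx \l_mypkg_date_tl
    { \__example_isodate:n { 2020-10-15 } }
  % \l_mypkg_date_tl now contains {2020}{10}{15}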

5 Another essential variant: V for variables

In the list of argument types above there are V and v, which have been touched upon briefly. Now it's time to discuss the former in greater detail. Type v is nearly the same as V; it just adds the ability of building the name of the variable by supplying data at runtime. The main one is V.

Again, let's suppose we have our favorite function that splits an ISO date into components and outputs the date in another format, and that it's named \example_date:n. (Its implementation is irrelevant.) Suppose now we have a date stored in a tl variable. Since this is just a macro under cover, at the beginning of expl3 the way to process this was

  \cs_generate_variant:Nn \example_date:n { o }

to be called like

  \example_date:o { \l_tmpa_tl }

but this is bad because it depends on knowledge of the implementation of tl variables. Also, it is not generalizable to other kinds of variables. The o variants do a single expansion step in the braced argument; while this works with the straightforward implementation of token list variables, it would fail spectacularly with fp variables (which contain floating point numbers).

The best method is to do

  \cs_generate_variant:Nn \example_date:n { V }

to be called like

  \example_date:V \l_tmpa_tl

This will deliver the contents of the variable, suitably braced, to the main function. So if we did

  \tl_set:Nn \l_tmpa_tl { 2020-10-15 }

then the call \example_date:V \l_tmpa_tl would be equivalent to \example_date:n {2020-10-15}.

An example with a different kind of variable, namely int. We want to feed in such a variable and the result should pad it with zeros to get four digits:

  \cs_new:Nn \example_pad:n
    {
      \prg_replicate:nn
        { 4 - \tl_count:n { #1 } }
        { 0 }
      #1
    }
  \cs_generate_variant:Nn
    \example_pad:n { V }
  \int_set:Nn \l_tmpa_int { 43 }
  \example_pad:V \l_tmpa_int

This will print 0043. This may seem of academic interest only, but with the sibling v type, we can transform this into

  \cs_generate_variant:Nn
    \example_pad:n { v }
  \example_pad:v { c@page }

knowing that \c@page is the LaTeX name of the register containing the page number; an application can be immediately thought of. This exploits the fact that standard TeX counters are exactly like expl3 int variables. Using the V or v variants we're passing the main function the actual value as a list of digits, so we can count it, which would be impossible with the 'abstract value'.

What variable types can be used this way? Several: tl, int, fp; also clist and others more esoteric. Essentially, all variables that can deliver some sensible output. This cannot be expected from seq or prop variables and, indeed, using those will crash.
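For instance, with a floating point variable the same call pattern works; a sketch reusing \example_pad:n from above (the mypkg name is invented, not from the article):

  \fp_new:N \l_mypkg_val_fp
  \fp_set:Nn \l_mypkg_val_fp { 1 + 2 } % stores 3
  \example_pad:V \l_mypkg_val_fp % prints 0003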

I've sometimes found it useful to define the \cs_set:NV variant for \cs_set:Nn in order to use a tl variable where the desired replacement text has been stored and modified via some regular expression replacement (https://tex.stackexchange.com/a/355576/). By the way, it is not possible to have a \cs_set:NpV variant, because the generator \cs_generate_variant:Nn can only accept a function with a signature consisting of N or n characters.

6 True or false?

There are two other interesting argument types: T and F, and the title of the section should suggest that they're connected with truth and falsehood. Exactly so! They are argument specifiers in the signature of conditional functions. Example:

  \int_compare:nTF

is a function that takes three standard braced arguments; the first is a numeric relation between integers to test, the second the code to execute if the relation is true, the third the code to execute if the relation is false. So

  \int_compare:nTF { 0<1 } { A } { B }
  \int_compare:nTF { 0>1 } { A } { B }

will result in printing A and B respectively. So, why isn't it more simply \int_compare:nnn? Indeed, it used to be this way in the first versions of expl3, but it was realized that having different argument specifiers is handier, because we can also have

  \int_compare:nT
  \int_compare:nF

when we have nothing to execute for the false or true branch respectively. Of course

  \int_compare:nF { ⟨relation⟩ } {B}
  \int_compare:nTF { ⟨relation⟩ } {} {B}

are completely equivalent, but the former shows more clearly that we want to do nothing if the ⟨relation⟩ turns out to be true. With the :nnn signature, empty arguments would always be required. Also, the presence of either T or F (or both) immediately alerts us that the function is a conditional.
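The article doesn't show how such conditionals are created; for the curious, a minimal sketch using the kernel's \prg_new_conditional:Npnn (the mypkg name is invented). The requested forms are generated from code that ends by calling \prg_return_true: or \prg_return_false::

  \prg_new_conditional:Npnn \mypkg_is_short:n #1 { p, T, F, TF }
    {
      \int_compare:nNnTF { \tl_count:n {#1} } < { 5 }
        { \prg_return_true: }
        { \prg_return_false: }
    }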

All kernel conditional functions are available with ending TF, T or F; some conditional-like functions even have the version with neither. For instance we can see in interface3.pdf that there is \str_case:nn, but also \str_case:nnTF. The strange-looking TF means that all three combinations TF, T and F are available.

Why such a "pseudo-conditional"? The function \str_case:nn is not a conditional (being related to the \int_case:nn function seen earlier), but we can use the extended version to output something if there is, or is not, a match. The version which is most likely to be used is \str_case:nnF, to output something in case of no match, maybe an error message or a default output.

To generate proper conditionals, variants should be defined with

  \prg_generate_conditional_variant:Nnn

rather than with \cs_generate_variant:Nn. For instance, if we plan to store some ⟨relation⟩s for \int_compare:nTF into tl variables, the correct way to generate the variant is

  \prg_generate_conditional_variant:Nnn
    \int_compare:n { V } { p, TF, T, F }

This will at once generate the variants

  \int_compare:VTF
  \int_compare:VT
  \int_compare:VF

as well as the 'predicate form'

  \int_compare_p:V

to be used in boolean expressions. But this is outside the scope of the present paper.

⋄ Enrico Gregorio
  Dipartimento di Informatica, Università di Verona
  enrico dot gregorio (at) univr dot it

bib2gls: selection, cross-references and locations

Nicola L. C. Talbot

Abstract

In my previous article [6], I described using indexing applications with LaTeX, a process required by the glossaries package to sort and collate terms, and the development of the bib2gls command line application, which was designed specifically for the glossaries-extra extension package. This article describes how bib2gls differs from the other indexing methods with respect to selection, grouping, cross-references and invisible locations.

1 \printglossary vs \printunsrtglossary

In order to better understand how items are listed with bib2gls [3], it's useful to understand the principal differences between \printglossary (which is provided by glossaries [4] and used with makeindex and xindy [1]) and \printunsrtglossary (which is provided by glossaries-extra [5] and used with bib2gls). This was briefly covered in the previous article but is described in more detail here.

Consider the following document:

  \documentclass{article}
  \usepackage[style=treegroup]{glossaries}
  \makeglossaries
  \loadglsentries{entries}
  \begin{document}
  \Gls{duck}, \gls{parrot} and \gls{quartz}.
  \printglossary
  \end{document}

The entries are all defined in the file entries.tex, which helps reduce clutter in the main document file and also makes it easier to reuse the same definitions in other documents. The contents of this file follows:

  \newglossaryentry{antigen}{name={antigen},
   description={toxin or other foreign substance
   that induces an immune response}}
  \newglossaryentry{mineral}{name={mineral},
   description={solid, inorganic,
   naturally-occurring substance}}
  \newglossaryentry{animal}{name={animal},
   description={living organism that has
   specialised sense organs and nervous system}}
  \newglossaryentry{bird}{name={bird},
   parent={animal},
   description={egg-laying animal with feathers,
   wings and a beak}}
  \newglossaryentry{parrot}{name={parrot},
   parent={bird},
   description={mainly tropical bird with bright
   plumage}}
  \newglossaryentry{duck}{name={duck},
   parent={bird},
   description={waterbird with webbed feet}}
  \newglossaryentry{quartz}{name={quartz},
   parent={mineral},
   description={hard mineral consisting of silica}}

This defines seven glossary entries. Only three have been referenced in the document, three are ancestors of the referenced entries so they must be included in the glossary as well, and one (antigen) hasn't been referenced and isn't required by any referenced entry. The document build is:

  latex myDoc
  makeglossaries myDoc
  latex myDoc

(assuming the document source is in the file myDoc.tex; latex is used here to denote pdflatex, xelatex or lualatex, so replace as appropriate). The makeglossaries helper script invokes makeindex, which creates the file myDoc.gls that contains (line breaks added for clarity throughout):

  \glossarysection[\glossarytoctitle]
  {\glossarytitle}
  \glossarypreamble
  \begin{theglossary}\glossaryheader
  \glsgroupheading{A}\relax ⟨reset⟩
  \glossentry{animal}\relax ⟨reset⟩
  \subglossentry{1}{bird}\relax ⟨reset⟩
  \subglossentry{2}{duck}{⟨location list⟩}
  \subglossentry{2}{parrot}{⟨location list⟩}
  \glsgroupskip
  \glsgroupheading{M}\relax ⟨reset⟩
  \glossentry{mineral}\relax ⟨reset⟩
  \subglossentry{1}{quartz}{⟨location list⟩}
  \end{theglossary}\glossarypostamble

(The ⟨reset⟩ code, which is omitted for clarity, deals with counteracting the effect of \glsnonextpages.) Note that the location list argument for the unreferenced ancestor entries is just \relax. The start of each letter group is identified with

  \glsgroupheading{⟨group label⟩}

The argument is a label which may have a corresponding title. If there's no title associated with it, the label is used as the title. Glossary styles that don't support group headings define this command to do nothing.

\printglossary effectively does:

  ⟨setup defaults⟩
  \bgroup
  ⟨process options⟩
  ⟨input glossary file if it exists⟩
  \egroup

The initialisation parts (⟨setup defaults⟩ and ⟨process options⟩) deal with defining the glossary section title (\glossarytitle and \glossarytoctitle), the
preamble and postamble, and implementing the required glossary style (which defines theglossary and the formatting commands used in that environment).

A few minor modifications are needed to the example document to use \printunsrtglossary instead:

  \documentclass{article}
  \usepackage[postdot,stylemods,style=treegroup]
    {glossaries-extra}
  \loadglsentries{entries}
  \begin{document}
  \Gls{duck}, \gls{parrot} and \gls{quartz}.
  \printunsrtglossary
  \end{document}

Note that \makeglossaries has been removed, as there are now no indexing files that need to be opened. The extension package has a different set of defaults to the base package, so the post-description punctuation needs to be added (postdot) if required. The stylemods option automatically loads glossaries-extra-stylemods, which modifies the predefined glossary styles to provide better integration with glossaries-extra and bib2gls and to make the styles easier to customise.

The document build is now simply:

  latex myDoc

In this case there's no file for \printunsrtglossary to input. Instead, it iterates over all defined entries for the given glossary to obtain the contents. Some glossary styles use a tabular-like environment and loops within such environments are problematic, so an internal control sequence (\@glsxtr@doglossary) is used to store the contents of the glossary, which is then expanded on completion. The glossary code is now essentially:

  ⟨setup defaults⟩
  \bgroup
  ⟨process options⟩
  \glossarysection[\glossarytoctitle]
  {\glossarytitle}
  \glossarypreamble
  ⟨construct \@glsxtr@doglossary⟩
  \printunsrtglossarypredoglossary
  \@glsxtr@doglossary
  \glossarypostamble
  \egroup

The \@glsxtr@doglossary command ends up defined as:

  \begin{theglossary}\glossaryheader ⟨reset⟩
  ⟨content⟩
  \end{theglossary}

The ⟨content⟩ part is constructed within a loop. The current group label is initialised to empty:

  \def\@gls@currentlettergroup{}

Each iteration of the loop performs the following steps:

1. Do the loop hook (which does nothing by default but may be configured to skip the current entry).

2. If the current entry doesn't have a parent, obtain its group label (empty, if unavailable), and if the group label for this entry is different from the currently stored group label then add the following code to ⟨content⟩:

     \glsgroupheading{⟨group label⟩}

   (if the current group label is empty) or

     \glsgroupskip\glsgroupheading{⟨group label⟩}

   (if the current group label isn't empty). The current group label is then set to ⟨group label⟩.

3. Add the following to ⟨content⟩:

     \⟨internal cs handler⟩{⟨entry label⟩}

The group label is obtained as follows: if the group key has been defined then the label is obtained from the entry's group field (which may be empty), otherwise the label is obtained from the uppercase character code of the first letter of the sort field (which is normally obtained from the name field if not set).

In this example, the entry on the first iteration of the loop is 'antigen'. This entry doesn't have a parent, so the group information is queried to determine if a new group heading should be inserted. The group key hasn't been defined in this document, so the group label needs to be obtained from the first character of the name field (since the sort field hasn't been provided). This character is the letter 'a', so the label is set to the decimal code of its uppercase equivalent (65). This is different from the current group label (initially empty), so the group header command is added:

  \glsgroupheading{65}

(The decimal code is used for the group label to make it easier to expand.) Note that no \glsgroupskip is added at this point because the current group label was empty. The new current group label is updated (to 65). The internal handler macro is then added:

  \@printunsrt@glossary@handler{antigen}

This handler macro is used by all entries, regardless of their hierarchical level, and it uses the command:

  \printunsrtglossaryhandler{⟨label⟩}

This is the command that should be redefined (not the internal handler macro) if you want to customize the output. The default definition is simply

  \glsxtrunsrtdo{⟨label⟩}

This fetches the entry's hierarchical level and then does either (⟨level⟩ = 0)

  \glossentry{⟨label⟩}{⟨location⟩}

or (⟨level⟩ > 0)

  \subglossentry{⟨level⟩}{⟨label⟩}{⟨location⟩}

where the location list is obtained from an internal field. In this example those fields haven't been set, so the locations are all empty.
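As an example of redefining the handler, here is a sketch of a filtering version (the 'ignored' category and the use of glossaries-extra's \glsifcategory test are assumptions here, not from the article):

  \renewcommand{\printunsrtglossaryhandler}[1]{%
    % skip entries assigned to the 'ignored' category,
    % otherwise fall back on the default action
    \glsifcategory{#1}{ignored}{}{\glsxtrunsrtdo{#1}}%
  }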

For debugging purposes, it's possible to see the glossary code content using:

  \renewcommand{\printunsrtglossarypredoglossary}{%
    \csshow{@glsxtr@doglossary}}

In the above example, the content is:

  \begin{theglossary}\glossaryheader ⟨reset⟩
  \glsgroupheading{65}
  \@printunsrt@glossary@handler{antigen}
  \glsgroupskip\glsgroupheading{77}
  \@printunsrt@glossary@handler{mineral}
  \glsgroupskip\glsgroupheading{65}
  \@printunsrt@glossary@handler{animal}
  \@printunsrt@glossary@handler{bird}
  \@printunsrt@glossary@handler{parrot}
  \@printunsrt@glossary@handler{duck}
  \@printunsrt@glossary@handler{quartz}
  \end{theglossary}

(There's only one ⟨reset⟩ here as there's no sense in using \glsnonextpages with \printunsrtglossary.)

The results of both methods are shown in Figures 1 and 2. Note that the letter groups show the decimal character code (used as the group label) because no title has been assigned. Titles may be assigned with

  \glsxtrsetgrouptitle{⟨label⟩}{⟨title⟩}

For example:

  \glsxtrsetgrouptitle{65}{A}

Obviously this is quite tedious to do for the entire alphabet.

The order of definition has created some strange results: there are two groups with the label 65 ('A') and the 'quartz' sub-entry is separated from its parent ('mineral'). The glossary style determines whether or not the hierarchy is visible (through indentation etc.). The internal loop doesn't make any attempt to gather child entries. The parent field is only queried within the loop to determine whether or not to attempt to insert the letter group headings.

The key to using \printunsrtglossary is to ensure the entries are defined in the correct order, defining child entries immediately after their parent, and defining only those entries which are required. In this case, the entries should be defined in the order: animal, bird, duck, parrot, mineral, quartz (antigen shouldn't be defined as it's not required).

The way bib2gls works is by fetching data from .bib files and creating a file (.glstex) that defines all required entries in the required order with the required internal fields (for the group label and location lists) set appropriately. Wrapper commands are provided to make it easier to customise. For example:

  \providecommand{\bibglsnewentry}[4]{%
    \longnewglossaryentry*{#1}{name={#3},#2}{#4}}

(\longnewglossaryentry* is used to allow for multi-paragraph descriptions.) If required, the group labels are obtained by the sort method and the code to define the corresponding titles is added to the .glstex file. In other words, bib2gls takes care of all the tedious code that's required with the manual method. This behaviour is possible to override; however, if bib2gls is instructed to assign group labels that don't follow the order obtained by the given sorting method then fragmented groups will occur. (If you find yourself wanting to order by group title then this is an indication that you should actually be using a hierarchical system instead [7].)

The .glstex file is input (if it exists) with:

  \GlsXtrLoadResources[⟨options⟩]

This command also writes information to the .aux file that's picked up by bib2gls (for example, the names of the .bib files that contain the data and how to order the entries).

bib2gls comes with a helper command line utility convertgls2bib which can be used to parse TeX files for instances of \newglossaryentry and other commands that are provided to define entries (such as \newabbreviation). In general, it's best to use this tool with files that only contain entry definitions (such as the example entries.tex) but it can also be used on a complete document. (In this case, the -p or --preamble-only switch may be used to limit parsing to the document preamble.) For example:

  convertgls2bib entries.tex entries.bib

This will create a file called entries.bib. The example myDoc.tex file can now be modified to use bib2gls:

  \documentclass{article}
  \usepackage[record,postdot,style=treegroup]
    {glossaries-extra}
  \GlsXtrLoadResources[src={entries}]
  \begin{document}
  \Gls{duck}, \gls{parrot} and \gls{quartz}.
  \printunsrtglossary
  \end{document}

Note the use of the record package option, which is required with bib2gls. This option defines the group key, which defaults to an empty label if not

Nicola L. C. Talbot TUGboat, Volume 41 (2020), No. 3 311

explicitly assigned, and the location key, which is This is best placed near the start of the file. Gen- used to store the formatted location list (another field eral comments (but not the encoding) may also be is available that stores each location in an internal supplied in @comment. For example: list, if required). @Comment{jabref-meta: databaseType:bib2gls;} The document build is now: (Entry type names are case-insensitive.) There are latex myDoc four basic sets of entry types: bib2gls -g myDoc latex myDoc Two primary entry types: @ and @. These have The result is shown in Figure 3. two required fields: short and long. The -g (or --group) switch is required if you symbols Two primary entry types: want distinct groups. This will make the sort meth- @symbol and . The required fields are: ods automatically assign the group label to each @number name or parent. If the name is missing, then the top-level entry (stored in the entry’s group field). If is also required. this switch isn’t used and the group labels aren’t description assigned in some other way, then step 2 in the loop index Two primary entry types: @index and iteration (page 309) will be skipped. @indexplural. There are no required fields. Note there’s a difference between using the -g general One primary entry type: @entry. The switch with a style that doesn’t show the group title required fields are: description and either and not using the -g switch. For example, if the style name or parent. is changed from treegroup to tree then when bib2gls There are other entry types, but they are beyond the is invoked with -g there will be a vertical gap between scope of this article. letter groups (unless the nogroupskip option is used) Unknown entry types and fields can be aliased, whereas there won’t be a gap if bib2gls is run with which can make a .bib file more adaptable to multi- the default --no-group setting. ple documents. For example, consider: In the first case, the group label is set, so step 2 @unit{m, in the loop iteration adds the group skip and group unitname={}, heading commands. The tree style redefines the unitsymbol={\si{\metre}}, group heading command to do nothing but the group measurement={length} skip is implemented. In the second case, the group } label isn’t set, so step 2 is omitted, so neither the This is an unknown entry type where all the fields group skip nor the group heading command will are also unknown. However, the resource options be inserted. If the nogroupskip option is set with a glossary style that doesn’t show the group head- entry-type-aliases={unit=entry}, field-aliases={ ing, then the result will typically appear the same unitname=name, as invoking bib2gls with the default --no-group unitsymbol=symbol, setting. However, since the group formations add to measurement=description the total document build time it’s more efficient to } simply use the default setting — unless --no-group will make bib2gls treat this entry as though it had you have multiple glossaries where some do require been defined as visual separation between groups. @entry{m, 2 The .bib file name={metre}, symbol={\si{\metre}}, As with BibTEX, data is defined in the .bib file in description={length} the form: } @ entry-type { label , key=value list } h i h i h i whereas If the entry-type is unrecognised, it will be ignored h i entry-type-aliases={unit=symbol}, (with a warning). 
Comments are slightly different: field-aliases={ in BibT X, anything outside of @ entry-type {...} unitname=description, E h i is considered a comment, but bib2gls is stricter and unitsymbol=name comments need to be marked up as such. Like TEX, } bib2gls recognises % as a comment character. The will make bib2gls treat this entry as though it had most important comment is the encoding line, e.g.: been defined as % Encoding: UTF-8 @symbol{m,
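In a document, such aliases are passed as options to the resource command shown earlier; for instance (assuming a hypothetical units.bib containing the @unit entry):

  \GlsXtrLoadResources[
    src={units},% data in units.bib
    entry-type-aliases={unit=entry},
    field-aliases={
      unitname=name,
      unitsymbol=symbol,
      measurement=description
    }
  ]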


[Figure 1: \printglossary (ordered by makeindex) — entries sorted under letter groups A (animal, bird, duck, parrot) and M (mineral, quartz), with locations]

[Figure 2: \printunsrtglossary and stylemods (no automated ordering) — entries appear in definition order, with numeric group labels (65, 77) in place of letter headings]

[Figure 3: \printunsrtglossary and stylemods (ordered with bib2gls --group) — same letter groups and order as Figure 1]


With the other indexing options (makeindex, xindy or \printnoidxglossary), the general recommendation is to set the sort key for any entry that contains commands within the name. For example:

\newglossaryentry{m}{name={\si{\metre}},
  sort={m},description={metre}}

With bib2gls, the recommendation is the opposite: the sort field typically shouldn't be set [8]. For this reason, by default convertgls2bib will skip the sort field when parsing commands like \newglossaryentry. By omitting this field, it becomes possible to dynamically allocate the most appropriate value on a per-document basis, which makes it much easier to share .bib files across multiple documents. This will be covered in more detail in a follow-up article.

3 Cross-referencing

When using \index with makeindex, if you want to add a cross-reference in the index then you use the see or seealso encap (format). For example:

\index{cross product|see{vector product}}
\index{dot product|seealso{vector product}}
\index{products|see{dot product and vector
  product}}

These are treated by makeindex in the same way as any other location format, where the content following the encap marker (the vertical pipe | by default) is treated as the name of a formatting command that needs to encapsulate the page number. The argument text {vector product} is considered all part of the formatting command name (from makeindex's point of view). The above commands will be converted by makeindex into:

\item cross product, \see{vector product}{1}
  ...
\item dot product, \seealso{vector product}{1}
  ...
\item products, \see{dot product and vector
  product}{1}

(assuming the \index commands were on page 1). The \see and \seealso commands are provided by indexing packages such as makeidx [2] and are defined to ignore the second argument. Naturally, you also need to index the referenced term ('vector product' in this case) to avoid confusing the reader.

By analogy, you could adopt the same method with the glossaries package (makeidx is loaded in the example below to provide \see and \seealso):

\documentclass{article}
\usepackage{makeidx}
\usepackage{glossaries}
\makeglossaries
\newglossaryentry{product}{name={products},
  description={...}}
\newglossaryentry{vector-product}{
  name={vector product},description={...}}
\newglossaryentry{cross-product}{
  name={cross product},description={...}}
\newglossaryentry{dot-product}{
  name={dot product},description={...}}
\begin{document}
\Gls{vector-product}.
\glsadd[format=see{vector product}]
  {cross-product}
\glsadd[format=seealso{vector product}]
  {dot-product}
\glsadd[format=see{dot product and vector
  product}]{product}
\printglossaries
\end{document}

In version 1.17 (2008-12-26) of the base glossaries package a new command \glssee was added to provide a cross-referenced entry similar to this, but instead of using makeidx's \see and \seealso commands it uses its own analogous commands that take a label as the first argument instead of user-supplied text. (Again the second argument containing the location is ignored.) So the above document can be changed to use \glssee:

\documentclass{article}
\usepackage{glossaries}
\makeglossaries
\newglossaryentry{product}{name={products},
  description={...}}
\newglossaryentry{vector-product}{
  name={vector product},description={...}}
\newglossaryentry{cross-product}{
  name={cross product},description={...}}
\newglossaryentry{dot-product}{
  name={dot product},description={...}}
\begin{document}
\Gls{vector-product}.
\glssee{cross-product}{vector-product}
\glssee[see also]{dot-product}{vector-product}
\glssee{product}{dot-product,vector-product}
\printglossaries
\end{document}

This has several advantages:
• the cross-references are identified by label so the text produced can be obtained from the name key, which ensures consistency;
• if the hyperref package is added then the cross-reference can be automatically hyperlinked;


• if xindy is required instead of makeindex, then \glssee can use xindy's native cross-referencing markup.

The location (which is ignored within the document but required by makeindex) is set to 'Z' regardless of where \glssee is used in the document so, with the default makeindex settings, the cross-reference will be pushed to the end of the location list.

In the case of synonyms, such as 'cross product', that don't need to be used in the document but need to be added to the glossary as a cross-reference to assist the reader, the term only needs to be defined and indexed with \glssee. For convenience, version 1.17 also introduced the see key to \newglossaryentry as a shortcut to enable the entry to be defined and indexed at the same time. For example:

\newglossaryentry{cross-product}{
  name={cross product},description={...},
  see={vector-product}}

is equivalent to:

\newglossaryentry{cross-product}{
  name={cross product},description={...}}
\glssee{cross-product}{vector-product}

This is the only function that the see key serves with the base glossaries package. Since indexing can only be performed after the associated files have been opened, an error will occur if the see key is used before \makeglossaries (otherwise the indexing will silently fail). For draft documents (where you may want to consider commenting out \makeglossaries to speed compilation), you can suppress the error or turn it into a warning with the seenoindex package option.

As with \index, it's necessary to ensure that the referenced entry is also indexed (through commands like \gls or \glsadd).

The glossaries-extra package provides a similar command \glsxtrindexseealso, which essentially does \glssee[\seealsoname] (unless xindy is required, in which case alternative markup is used). There's a corresponding key seealso that performs this command, analogous to the see key. (Note that the tag used for the 'see also' command and key is always \seealsoname.) Although these commands (and their corresponding shortcut keys) essentially do the same thing but with a different tag, they are provided both for semantic reasons and to make it easier to apply different formatting, depending on whether the cross-reference is a synonym or a pointer to related terms.

The extension package modifies the see key so that its value is also saved. The key still serves as a shortcut for \glssee, but it may be useful to later query the information. The seealso key also saves its value. The extension package also provides a related key alias which may only take a single label as its value. This behaves much like its see counterpart when indexing but it will also make commands like \gls link to the alias target in the glossary.

Now let's switch to \printunsrtglossary:

\documentclass{article}
\usepackage{hyperref}
\usepackage[seenoindex=ignore]{glossaries-extra}
\newglossaryentry{product}{name={products},
  see={dot-product,vector-product},
  description={...}}
\newglossaryentry{vector-product}{
  name={vector product},description={...}}
\newglossaryentry{cross-product}{
  name={cross product},description={...},
  alias={vector-product}}
\newglossaryentry{dot-product}{
  name={dot product},description={...},
  seealso={vector-product}}
\begin{document}
\Gls{vector-product} (also called
\gls{cross-product}) and \gls{dot-product}.
\printunsrtglossary
\end{document}

No indexing is performed so the see and seealso keys have no effect. There are no location lists for any of the entries (not even the ones used in the document). In order to show the cross-referencing information in the glossary, it's necessary to either modify the glossary style (or associated hooks) or define the location key (which the record option does) and then set this key for the required entries. For example:

\usepackage[record]{glossaries-extra}
\newglossaryentry{product}{name={products},
  see={dot-product,vector-product},
  description={...},
  location={\glsxtrusesee{product}}}
\newglossaryentry{vector-product}{
  name={vector product},description={...}}
\newglossaryentry{cross-product}{
  name={cross product},description={...},
  alias={vector-product},
  location={\glsxtrusealias{cross-product}}}
\newglossaryentry{dot-product}{
  name={dot product},description={...},
  seealso={vector-product},
  location={\glsxtruseseealso{dot-product}}}

Again this is tedious to do manually but can be performed automatically by bib2gls.

In .bib files, the see, seealso and alias fields don't perform any automated indexing but establish dependencies.


The entries that are actually selected and added to the .glstex file depend on the selection criteria. For example:

\usepackage[record]{glossaries-extra}
\GlsXtrLoadResources[src=entries,selection=all]
\begin{document}
\printunsrtglossaries
\end{document}

This will select all entries defined in entries.bib. None of them will have any page numbers (because they haven't been indexed in the document), but any entries with the see, seealso or alias fields set will have the cross-reference information added to the location field.

The default setting is selection={recorded and deps} which selects all entries with records in the .aux file (that is, they've been indexed using commands like \gls) and their dependent entries (ancestors, cross-references and any entries that have been referenced in certain fields, such as description). This is straightforward for bib2gls to do (since it has access to all data in the .bib files) but is something that makeindex and xindy can't do (as they only have limited information about entries that have been indexed and no information at all about entries that haven't been indexed).

Consider the following example (which requires makeindex):

\documentclass{article}
\usepackage[colorlinks]{hyperref}
\usepackage[style=tree]{glossaries-extra}
\makeglossaries
\loadglsentries{vegetables}
\begin{document}
\Gls{cauliflower} and \gls{marrow}.
\printglossaries
\end{document}

Where the file vegetables.tex contains:

\newglossaryentry{cauliflower}{
  name={cauliflower},description={type of
  \gls{cabbage} with edible white flower head}}
\newglossaryentry{cabbage}{
  name={cabbage},description={vegetable
  with thick green or purple leaves}}
\newglossaryentry{marrow}{
  name={marrow},description={long
  white-fleshed gourd with green skin},
  seealso={courgette}}
\newglossaryentry{courgette}{name={courgette},
  description={immature fruit of a \gls{marrow}}}
\newglossaryentry{zucchini}{name={zucchini},
  description={},see={courgette}}
\newglossaryentry{aubergine}{name={aubergine},
  description={purple egg-shaped fruit}}
\newglossaryentry{eggplant}{name={eggplant},
  description={},see={aubergine}}

Two entries have been indexed in the document (cauliflower and marrow) and three have been implicitly indexed via the see or seealso key (marrow, zucchini and eggplant). If the file is called myDoc.tex then the document build would normally be:

latex myDoc
makeglossaries myDoc
latex myDoc

This results in a glossary containing five items (cauliflower, courgette, eggplant, marrow and zucchini; see Figure 4), and there are two warnings from hyperref about non-existent references to targets glo:aubergine and glo:cabbage. This is because there are hyperlinks in the glossary to aubergine and cabbage, but the targets aren't defined as those entries haven't been indexed. In the case of cabbage, makeindex isn't aware of the reference in the description of cauliflower, but once the glossary has been created this reference can be indexed on the next LaTeX run. This means that the complete document build has to be:

latex myDoc
makeglossaries myDoc
latex myDoc
makeglossaries myDoc
latex myDoc

This ensures that the required cabbage entry appears in the glossary but there's still a broken link to the unlisted aubergine (Figure 5). The cross-reference (via see or \glssee) only indexes the source entry (eggplant). It doesn't index the target (aubergine). The target must be indexed in order to resolve the broken link, but there's no reason for either eggplant or aubergine to be listed in the glossary as neither are required in the document.

The vegetables.tex file can be converted to a .bib file:

convertgls2bib -i vegetables.tex vegetables.bib

(The -i switch converts the entries with empty descriptions to use @index instead of @entry, which is more appropriate.) The document can now be converted to use bib2gls:

\documentclass{article}
\usepackage[colorlinks]{hyperref}
\usepackage[record,stylemods,style=tree]
  {glossaries-extra}
\GlsXtrLoadResources[src={vegetables}]
\begin{document}
\Gls{cauliflower} and \gls{marrow}.
\printunsrtglossaries
\end{document}
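For reference, the resulting vegetables.bib should contain entries along these lines — a sketch assembled from the definitions above, not verbatim convertgls2bib output:

% Encoding: UTF-8
@entry{cauliflower,
  name={cauliflower},
  description={type of \gls{cabbage} with
    edible white flower head}
}
@entry{cabbage,
  name={cabbage},
  description={vegetable with thick green
    or purple leaves}
}
@entry{marrow,
  name={marrow},
  description={long white-fleshed gourd
    with green skin},
  seealso={courgette}
}
@entry{courgette,
  name={courgette},
  description={immature fruit of a \gls{marrow}}
}
@index{zucchini, see={courgette}}
@entry{aubergine,
  name={aubergine},
  description={purple egg-shaped fruit}
}
@index{eggplant, see={aubergine}}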


This is with the default selection criteria, which selects recorded entries (cauliflower and marrow) and their dependencies (cauliflower requires cabbage, since \gls{cabbage} is in its description, and marrow requires courgette, in order to resolve the cross-reference). This means that the glossary ends up with four items: cabbage, cauliflower, courgette and marrow (Figure 6). Note that cabbage doesn't have a location. The location (if required) can only be determined once the description is expanded in the glossary.

Neither zucchini nor eggplant have been selected since neither of them have records and neither are required by any of the indexed entries (or their dependents). It would, however, be useful to also select zucchini to supply the synonym for courgette (but not eggplant, since aubergine isn't required). This can be done with either selection={recorded and deps and see} or selection={recorded and deps and see not also}. This will select any entries that cross-reference a required entry via the see or alias fields. The former will also include cross-references via the seealso field. The latter doesn't. This will now include zucchini but not eggplant (Figure 7).

So with bib2gls you can use see, seealso and alias to establish dependencies without automatically forcing the entry into the glossary. With the other methods, these keys should only be used if automated indexing is intended.

4 Invisible or ignored locations

Both makeindex and xindy require an associated location (typically a page number). They are general purpose indexing applications and indexes are intended to direct the reader to relevant locations in the document. Glossaries, on the other hand, provide definitions of terms and these don't necessarily require any locations. The location list may be suppressed with the nonumberlist option, but this will also suppress any cross-references (since they are placed inside the location list).

The glossaries package provides a \@gobble-like command \glsignore which simply ignores its argument and may be used as an encap to provide an invisible location. This only works if that is the only location in the list. If there are other locations it will result in spurious commas or en-dashes. This encap is used by \glsaddallunused, which iterates over all defined entries and indexes each unused entry. The aim here is to ensure all entries appear in the glossary, while only those used in the text have locations. The problematic spurious commas and en-dashes occur when this command is combined with any indexing command that doesn't mark the entry as used or if the first-use flag has been reset or if any subsequent indexing occurs.

Since bib2gls is designed for glossaries where locations may not be required, it allows selection without adding to the location list. The bib2gls alternative to \glsaddallunused is to use selection=all, which will select all entries, but only those that have been specifically indexed will have locations. It also recognises glsignore as a special 'ignored location', which indicates that the entry should be selected but the location should be discarded (rather than simply rendered invisible). You can even set this as the default format with

\GlsXtrSetDefaultNumberFormat{glsignore}

This could, for example, be done at the start of the back matter, or it could be done for the entire document and only overridden for significant locations. Setting up the alternative modifier can make it easier to switch the format. For example:

\GlsXtrSetAltModifier{!}{format=glsnumberformat}

Now the principal mention of cauliflower could be written as:

A \gls!{cauliflower} is a type of \gls{cabbage}.

If glsignore has been set as the default format this will only add the current page to the cauliflower location list but will ensure that cabbage is also selected. This can help reduce lengthy location lists into a more compact list that only includes the most pertinent locations.
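Pulling these commands together, a preamble for the running example might look like this — a sketch only: the commands are the ones just quoted, and the resource options repeat the earlier examples:

\usepackage[record,stylemods,style=tree]
  {glossaries-extra}
\GlsXtrLoadResources[src={vegetables},
  selection={recorded and deps and see}]
% discard locations unless explicitly requested ...
\GlsXtrSetDefaultNumberFormat{glsignore}
% ... via the alternative modifier \gls!{...}
\GlsXtrSetAltModifier{!}{format=glsnumberformat}

In the document body, "A \gls!{cauliflower} is a type of \gls{cabbage}." would then record a location only for cauliflower's principal mention.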
References

[1] R. Kehr, J. Schrod. xindy: a general-purpose index processor, 2018. ctan.org/pkg/xindy
[2] LaTeX Team. The makeidx package, 2014. ctan.org/pkg/makeidx
[3] N. Talbot. bib2gls: Command line application to convert .bib files to glossaries-extra.sty resource files, 2020. ctan.org/pkg/bib2gls
[4] N. Talbot. The glossaries package, 2020. ctan.org/pkg/glossaries
[5] N. Talbot. The glossaries-extra package, 2020. ctan.org/pkg/glossaries-extra
[6] N. Talbot. Indexing, glossaries and bib2gls. TUGboat 40(1), 2019. tug.org/TUGboat/tb40-1/tb124talbot-bib2gls.pdf
[7] N. Talbot. Logical glossary divisions (type vs group vs parent), 2020. dickimaw-books.com/gallery/?label=logicaldivisions
[8] N. Talbot. Sorting, 2019. dickimaw-books.com/gallery/?label=bib2gls-sorting

⋄ Nicola L. C. Talbot
School of Computing Sciences
University of East Anglia
Norwich NR4 7TJ
United Kingdom
https://www.dickimaw-books.com


[Figure 4: makeindex can't detect dependent entries that haven't been indexed — cauliflower, courgette, eggplant, marrow and zucchini are listed, but not cabbage]

[Figure 5: A second run is required when \gls is used in the description — cabbage now also appears, but aubergine remains unlisted]

[Figure 6: bib2gls with selection=recorded and deps — cabbage, cauliflower, courgette and marrow]

[Figure 7: bib2gls with selection=recorded and deps and see — zucchini is now also included]


Making Markdown into a microwave meal

Vít Novotný

Abstract

In today's academic publishing, many venues request LaTeX source in addition to or instead of PDF documents. This is often for the purpose of editing and improving full-text search. Services such as arXiv, Editorial Manager, and EasyChair require LaTeX that can be microwaved in a single run of pdfTeX without shell access and without invoking external programs. This requires that authors include auxiliary files and limits their selection of LaTeX packages. In this article, I will show how LaTeX documents using the Markdown, Minted, and BibLaTeX packages can be precooked and frozen to be later microwaved in a single run of pdfTeX.

1 Introduction

Academic publishers often require that LaTeX documents can be microwaved in a single run of pdfTeX without shell access and without invoking external programs. Precooking and freezing the documents to this form can be a daunting task for authors, who may opt out of using powerful LaTeX packages just to save themselves the hassle.

The Markdown package [3, 4, 5] allows the authors to mix the familiar lightweight markup of Markdown with LaTeX, but requires shell access and invokes Lua. The Minted package [7] provides a simple interface for typesetting listings with syntax highlighting, but also requires shell access and invokes Python. The BibLaTeX package [2] automates many aspects of bibliography management that require careful manual work in LaTeX, such as sorting and formatting, but requires the invocation of Perl.

In this article, I will show by example how a LaTeX document using the Markdown, Minted, and BibLaTeX packages can be precooked and frozen, so that it can be later microwaved in a single run of pdfTeX. This will allow the authors to quickly prepare documents using powerful packages without spending a thought on the publisher's demands.

2 Technical details

Here I describe the technicalities of precooking and freezing the outputs of the Markdown, Minted, and BibLaTeX packages before moving on to the example. The famished may skip to the following page.

2.1 Markdown

The Markdown package extracts markdown documents from the main LaTeX document to disk and passes control to the Lunamark Lua parser. Lunamark converts the markdown documents to LaTeX, saves the converted LaTeX documents to disk, and then passes control back to the Markdown package. The Markdown package then inputs the LaTeX documents into the main LaTeX document for typesetting.

Since version 2.9.0 (2020/09/14), the Markdown package supports the finalizecache option that makes Lunamark produce a LaTeX document frozenCache.tex (the frozen cache) mapping an enumeration of markdown documents to their converted LaTeX documents, and the frozencache option that makes the Markdown package use the frozen cache instead of Lunamark. The resulting LaTeX document becomes portable, but further changes in the order and content of markdown documents are not reflected. Appearance of markdown documents can still be adjusted by redefining the LaTeX macros that render markdown elements.

2.2 Minted

The Minted package extracts code listings from the main LaTeX document to disk and passes control to the Pygments Python package. Pygments converts the code listings to LaTeX, saves the converted LaTeX documents to disk, and then passes control back to the Minted package. The Minted package then inputs the LaTeX documents into the main LaTeX document for typesetting.

Since version 2.2 (2016/06/08), the Minted package supports the finalizecache option that makes Minted produce LaTeX documents listing⟨listing number⟩.pygtex (the frozen cache) containing the converted LaTeX documents, and the frozencache option that makes the package use the frozen cache instead of Pygments. The resulting LaTeX document becomes portable, but further changes in the appearance, order, and content of listings are not reflected.

2.3 BibLaTeX

The BibLaTeX package extracts citations from the main LaTeX document ⟨document⟩.tex to disk. The author then invokes the Biber Perl package, either manually or using automation software such as GNU Make, LaTeXMk, Arara [1], etc. Biber converts the citations and an external bibliography database into a precooked bibliography file ⟨document⟩.bbl and saves it to disk. The BibLaTeX package then inputs the precooked bibliography file for typesetting.

By including the precooked bibliography file with the LaTeX source, we remove the necessity of invoking Biber. The resulting LaTeX document becomes more portable, but further changes in the order of citations and the appearance and content of references are not reflected.
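All three follow the same two-phase recipe, which the example in the next section applies; as a compact sketch of the class-option route used there (for BibLaTeX there is no cache option — including the precooked ⟨document⟩.bbl plays the analogous role):

% phase 1: build the caches (shell access still needed)
\documentclass[finalizecache]{article}
% phase 2: once the caches and the precooked .bbl
% exist, switch the option and ship the sources:
%\documentclass[frozencache]{article}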


3 Example

We have the following LaTeX document document.tex using Markdown, Minted, and BibLaTeX:

\documentclass{article}
\usepackage[citations,fencedCode]{markdown}
\markdownSetup{renderers={inputFencedCode=%
  {\inputminted{#2}{#1}\write18{#2 #1}}}}
\usepackage{minted,biblatex}
\addbibresource{bibliography.bib}
\begin{document}
\begin{markdown*}{hybrid,underscores=false}
The following code in *Python* shows how to
produce iid r.v.'s $W_1, W_2, \ldots, W_k$
s.t. $\prod_i W_i\sim N(0, 1)$: [@pinelis]

``` python
def pinelis(k, size, iterations=100):
    import numpy as np, numpy.random as npr
    arange1 = np.arange(iterations) + 1
    def eps():
        return (
            (npr.uniform(size=size) < 0.5) *
            2.0 - 1.0
        )
    def gamma(size):
        return npr.gamma(1.0 / k, size=size)
    def rv():
        return eps() * np.exp(
            np.log(2.0) / (2.0 * k) -
            gamma(size) -
            np.sum(
                gamma((size, iterations)) /
                (2.0 * arange1 + 1.0),
                axis=-1,
            ) + np.sum(
                np.log(1.0 + 1.0 / arange1) /
                (2.0 * k)
            )
        )
    return [rv() for _ in range(k)]
import matplotlib.pyplot as plt
fig, A = plt.subplots(1, 3)
W1, W2, W3 = pinelis(k=3, size=10**7)
W = (W1, W1 * W2, W1 * W2 * W3)
D = ('$W_1$', '$W_1W_2$', '$W_1W_2W_3$')
for ax, w, desc in zip(A, W, D):
    ax.hist(w, 'auto', density=True)
    ax.set_title(desc)
    ax.set(xlabel='value', ylabel='pdf')
plt.savefig('plot.pdf', dpi=300)
```
\end{markdown*}
\includegraphics[width=\textwidth]{plot}
\printbibliography
\end{document}

In addition to the LaTeX document, we also have the following bibliography database bibliography.bib:

@article{pinelis,
  author  = {Pinelis, Iosif},
  title   = {The exp-normal distribution is
             infinitely divisible},
  journal = {arXiv},
  year    = {2018},
}

Our LaTeX document may seem simple, but it requires shell access and invokes Lua, Python, and Perl. A publisher is likely to reject it.

First, we remove the directories _markdown_document and _minted-document if they exist. Second, we change line 1 of our LaTeX document to \documentclass[finalizecache]{article} and invoke Python, pdfTeX, and Biber from the shell:

$ pip install numpy scipy matplotlib
$ pdflatex -shell-escape document.tex
$ biber document.bcf

Third, we change line 1 of our LaTeX document to \documentclass[frozencache]{article} and we remove \write18{#2 #1} from line 4. Finally, we create a ZIP archive document.zip with the files document.bbl, document.tex, and plot.pdf and with the directories _markdown_document/ and _minted-document/:

$ zip -r document document.{bbl,tex} \
>   plot.pdf {_markdown_,_minted-}document/

If our document contained forward references, we would also include the file document.aux. After the publisher has unpacked the ZIP archive, they only need to microwave the document in a single run of pdflatex document.tex before savoring it and wolfing it down.

After typesetting, our LaTeX document produces the output reproduced in the printed article. The listing was trimmed for brevity (and colors have been grayscaled throughout in the printed article).


4 Conclusion

Powerful new LaTeX packages are created every day. However, authors in academia often avoid them to steer clear of the publisher's fury. In this article, I have shown how authors can have their cake and eat it too: by precooking and freezing, the authors can quickly prepare mouth-watering documents using the Markdown, Minted, and BibLaTeX packages without having to worry about the publisher's demands.

Acknowledgements

This work was funded by the South Moravian Centre for International Mobility and the Brno Ph.D. Talent project.

References

[1] P. R. M. Cereda. The bird and the lion: arara. TUGboat 36(1):37–40, 2015. tug.org/TUGboat/tb36-1/tb112cereda.pdf
[2] P. Kime, P. Lehman. BibLaTeX: Sophisticated bibliographies in LaTeX, 2020. ctan.org/pkg/biblatex
[3] V. Novotný. Using markdown inside TeX documents. TUGboat 38(2):214–217, 2017. tug.org/TUGboat/tb38-2/tb119novotny.pdf
[4] V. Novotný. Markdown 2.7.0: Towards lightweight markup in TeX. TUGboat 40(1):25–27, 2019. tug.org/TUGboat/tb40-1/tb124novotny-markdown.pdf
[5] V. Novotný. Markdown: A package for converting and rendering markdown documents inside TeX, 2020. ctan.org/pkg/markdown
[6] I. Pinelis. The exp-normal distribution is infinitely divisible. arXiv, 2018. arxiv.org/abs/1803.09838
[7] G. Poore. Minted: Highlighted source code for LaTeX, 2017. ctan.org/pkg/minted

⋄ Vít Novotný
Nad Cihelnou 602
Velešín, 382 32
Czech Republic
witiko (at) mail dot muni dot cz
github.com/witiko


User-defined Type 3 fonts in LuaTeX

Hans Hagen

1 Introduction

This article describes the generic mechanism present in LuaTeX 1.13 and following to deal with user-defined Type 3 fonts. The examples shown here might not work out well in ConTeXt because it has its own font layer, which could interfere with low level hooks, but the same principles apply. Beware: in ConTeXt LMTX we do things a bit differently.

In a TeX environment Type 3 fonts are normally used for bitmap (pk) fonts. However, they can be useful for other purposes too. In LuaTeX a relatively simple mechanism is provided to (ab)use this font format.

2 Creation via callback

Defining a whole font in advance when only a few shapes are used makes no sense. Apart from a waste of time and memory it could, as a side effect, trigger the inclusion of all kinds of resources. Therefore, handling is delayed to the moment that the subset of the font actually gets written to the PDF file.

In the frontend you can create virtual characters but their rendering gets in-lined, which is often okay, but when you need for instance graphics (using the image virtual command) that can be sub-optimal. One can refer to characters in another font and that font can be a (future) Type 3 font. It is only when the document is finalized that the exact subset of glyphs used in the font is known, so that is the moment when we deal with what needs to be included. This is done with a plug-in: a single callback that does several things in sequence.

The glyphs in a Type 3 font are streams of PDF operators, a.k.a. char procs. When these are (inline) bitmaps or graphic operators all is relatively easy, but what if they are images or references to shapes from fonts? In that case we also need to make sure that resources are dealt with. We can cook up a complex system of additional resource management, comparable to pages and reused boxes, but it doesn't pay off. Instead we provide a couple of calls to the same callback, provide_charproc_data, to deal with that. Because we can use TeX asynchronously (using the mechanism for executing tokens) the relevant renderings can be done on demand.

When a Type 3 font is specified and when its psname property is equal to the string none, a callback is triggered. Actually it is triggered three times.
Because we can use TEX asynchronously (using the mechanism for executing tokens) the relevant render- ings can be done on demand. When a Type 3 font is specified and when its psname property is equal to the string none, a call- back is triggered. Actually it is triggered three times. • The first call is a preroll. It can be used to do the preparations needed for successive calls. TUGboat, Volume 41 (2020), No. 3 321

Between the first two calls the used characters of no nested Type 3 fonts are used. We also assume fonts are identified again. This makes it possible that we handle all this at the Lua end. to use a reference to an xform in the mentioned This mechanism is pretty low level, for a good char proc stream that itself uses fonts, or we reason: we’re already wrapping up the PDF file so we can refer to other fonts directly. As so-called cannot burden the engine too much with arbitrary xforms objects are managed independently they actions that mess up the process. Now, one can use don’t interfere with the font at hand. The first TEX to typeset the stream but in practice the stream argument is 1 which indicates that a preroll is can best be constructed manually. One can always being done. The callback function also gets the use TEX to construct an xobject that gets referred to. font id and character reference passed and no The good thing is that this feature doesn’t change return value is expected. (or add) anything to the front-end. • The second call gets passed the number 2, the We could have stuck to a more automated mech- font id, and the character index but this time anism, for instance by expecting xform object ref- there have to be two return values: the width erence, a width, height and depth (indicating some (in basepoints) and an object number of the char shift) but then we also need to pass information proc stream object. When an object number about using d0 or d1 so in the end one needs to is returned, a reference will be added to the know about charprocs anyway and then we can as resource of the font. well expect stream objects. A bit of a complicated mess is compensated for by flexibility, but a mess it • The third and last call is for housekeeping. This remains. In a similar fashion using one callback with call gets the number 3 passed and the font id. numbers indicating each call’s purpose is nicer than The two expected return values are the scale fac- three different callbacks. tor in the font matrix (e.g. 0.001) and a string that has additional entries in the resource dic- 3 Examples 1 tionary. It is now time for a few examples. These are simple Mechanisms like this are normally kept hidden ones, as it makes no sense to come up with many from the user. An example follows in a moment, but pages of how to do this in for instance ConTEXt first we explain the steps. For sure one needs more (MkIV that is). We define a font with several solu- code to integrate it properly. Don’t do it this way tions mixed. It is not part of some font system. The in ConTEXt and expect it to work forever, because following example will work okay in MkIV (because we wrap and overload. Anyway, in the end there are we typeset the LuaTEX manual with it). only a few cases to cover: First we define a couple of token registers and • A stream of mere graphical operators with no fill them with some content which as you can see can dependencies on resources like fonts or objects. be anything. • A stream with a reference to an xform which has \newtoks \MoreCrapA the actual content, in which case we need to add \newtoks \MoreCrapB \newcount\MoreCrapC a reference in the xobject resource dictionary of the font’s. \MoreCrapA{\setbox0\hbox{% • A stream with a reference to a font, in which \font\foo=dejavusansmono at 10bp\foo xyz}} case we need to add a reference font resource \MoreCrapB{\setbox0\hbox{% dictionary of the font’s. 
\externalfigure[cow.pdf][height=4mm]}} • A stream of operators that do have dependencies We also define a simple handler mechanism but on whatever resources one can think of, in which hook into the ConTEXt one if we run that macro case we need to be able to add these to the fonts package (this hook is there only for the manual). resource dictionary. \startluacode if context then And, because we can have additional fonts used RegisterTypeThreeHandler (either in a created xform or in the stream) we need = fonts.handlers.typethree.register to analyze the Type 3 fonts first. We assume that else local typethree = { } 1 An earlier version had four separate calls: one for the scale, two that looped (by multiple calls) over lists of xob- callback.register("provide_charproc_data", jects and used fonts, and a final one for additional resources. function(action,f,...) But because this mechanism is not meant for general use, if typethree[f] then assembling the right entries is now delegated to the caller. return typethree[f](action,f,...)


\startluacode
if context then
    RegisterTypeThreeHandler
        = fonts.handlers.typethree.register
else
    local typethree = { }
    callback.register("provide_charproc_data",
        function(action,f,...)
            if typethree[f] then
                return typethree[f](action,f,...)
            end
        end)
    function RegisterTypeThreeHandler(id,
                                      handler)
        typethree[id] = handler
    end
end
\stopluacode

Next we hard code a font table. Later we will see what these character definitions do. Setting psname to none signals that we want to trigger the callback.

\startluacode
local d = 655360

local f = {
    -- the minimal amount of metadata:
    ["name"]      = "MyFancyTestFont",
    ["psname"]    = "none", -- trigger
    ["format"]    = "type3",
    ["tounicode"] = true,
    -- the minimal number of parameters:
    ["parameters"] = {
        ["extra_space"]   = 0,
        ["quad"]          = d,
        ["size"]          = d,
        ["slant"]         = 0,
        ["space"]         = d/2,
        ["space_shrink"]  = d/10,
        ["space_stretch"] = d/6,
        ["x_height"]      = d/2,
    },
    -- five characters:
    ["characters"] = {
        [100] = {
            ["commands"] = {
                { "down", d/3 },
                { "rule", d, d },
            },
            ["depth"]     = d/3,
            ["height"]    = 2*d/3,
            ["width"]     = d,
            ["tounicode"] = "0064",
        },
        [101] = {
            ["depth"]     = 0,
            ["height"]    = d/2,
            ["width"]     = d,
            ["tounicode"] = "0065",
        },
        [102] = {
            ["depth"]     = d/3,
            ["height"]    = 2*d/3,
            ["width"]     = d,
            ["tounicode"] = "0066",
        },
        [103] = {
            ["depth"]     = d/4,
            ["height"]    = d/2,
            ["width"]     = d,
            ["tounicode"] = "0067",
        },
        [104] = {
            ["depth"]     = 0,
            ["height"]    = d/2,
            ["width"]     = d,
            ["tounicode"] = "0068",
        },
    },
}

-- normally you do this at the TeX end and
-- integrate into a font definition mechanism

id = font.define(f)

token.set_macro(
    "MyTestFont",
    "\\setfontid " .. tostring(id) .. "\\relax "
)

tex.setcount(
    "MoreCrapC",
    id
)
\stopluacode

The font is defined and as you can see, we don't need to have any meaningful rendering yet; that is what we do next. Now, if you don't get what happens here by looking at it, this mechanism is not for you. We're talking rather low-level PDF combined with the interface to PDF objects and streams.

For this example font the preroll step will construct some boxes with content. The flushed objects are later referenced by a name (/Xnnn in our case) bound to a form object (m 0 R). Before the assembly stage kicks in, the backend will check what fonts are used again so that referenced fonts get included. The assemble routine uses low level Type 3 directives that are explained in the PDF reference manuals. You have to make sure that no tricky dependencies on other Type 3 fonts occur. The wrapup function takes care of communicating the used resources.


\startluacode
local usedobjects = { }
local usedfonts   = { }
local usedfontid  = tex.getcount("MoreCrapC")

local function preroll(f,c)
    if c == 103 then
        tex.runtoks("MoreCrapA")
        usedobjects[c]
            = tex.saveboxresource(0,nil,nil,true)
    elseif c == 104 then
        tex.runtoks("MoreCrapB")
        usedobjects[c]
            = tex.saveboxresource(0,nil,nil,true)
    end
end

local function assemble(f,c)
    if c == 101 then
        local r = pdf.immediateobj(
            "stream",
            "1000 0 d0 10 w 0 1 0 rg "
         .. "0 0 1000 500 re F"
        )
        return r, 10
    elseif c == 102 then
        local r = pdf.immediateobj(
            "stream",
            "1000 0 d0 10 w 1 0 0 rg "
         .. "0 -333 1000 1000 re F"
        )
        return r, 10
    elseif c == 103 then
        local r = pdf.immediateobj(
            "stream",
            "1000 0 d0 55 0 0 100 0 -200 cm /X103 Do"
        )
        return r, 10
    elseif c == 104 then
        local r = pdf.immediateobj(
            "stream",
            "1000 0 d0 55 0 0 50 60 -50 cm /X104 Do"
        )
        return r, 10
    else
        return 0, 0
    end
end

local function wrapup(f,c)
    local resources = ""
    if next(usedobjects) then
        local t = { }
        for k, v in pairs(usedobjects) do
            table.insert(t,"/X" .. k .. " " .. v
                .. " 0 R ")
        end
        resources = resources .. "/XObject << "
            .. table.concat(t) .. ">>"
    end
    if next(usedfonts) then
        local t = { }
        for k, v in pairs(usedfonts) do
            table.insert(t,"/F" .. k .. " " .. v
                .. " 0 R ")
        end
        resources = resources .. "/Font << "
            .. table.concat(t) .. ">>"
    end
    return 0.001, resources
end

local function usedfonthandler(action,...)
    if action == 1 then
        return preroll(...)
    elseif action == 2 then
        return assemble(...)
    elseif action == 3 then
        return wrapup(...)
    else
        -- won't happen
    end
end

RegisterTypeThreeHandler(usedfontid,
    usedfonthandler)
\stopluacode

The last thing we do is register this plug-in. An example of using the font is this:

\MyTestFont
\char100\char101
\char100\char102
\char100\char103
\char100\char104
\char100

The output, shown in the printed article, was grayscaled for print; the second character is a green bar and the fourth is a red square.

So, to summarize what we do: the implemented method lives in the backend and leaves the frontend untouched. The backend recognizes a user Type 3 font, and just injects references to charproc streams that can, but are not required to, refer to one xform per charproc. This is about as simple as it could be made with only minimal overhead but it (probably) still has enough potential.

As with the rest of those Lua driven features of LuaTeX, you don't need to be a mastermind to cook up solutions. Of course you should limit yourself to what really makes sense. That said, practice has shown that often whatever opening the program provides, it will be abused, and I expect the same for this mechanism. Just don't blame the engine when the produced PDF misbehaves.

⋄ Hans Hagen
http://pragma-ade.com


Data display, plots and graphs

Peter Wilson

1 Introduction

Some years ago tex.stackexchange.com (TeX.SE) seems to have taken over from comp.text.tex for asking about (La)TeX and friends. A perennial question on TeX.SE seems to be asking what (La)TeX is useful for apart from typesetting mathematical papers. There have been many answers to this and I would like to suggest one more: displaying data. In this note I'll mention a couple of ways that I found that LaTeX could help with tables, graphs, and plots of data. The impetus for this was when I was strongly advised by my local hospital to keep a check on my blood pressure (BP).

2 Practicalities

Following the consultant's suggestion I measure my BP three times a day (morning, afternoon, and in the evening) and average them to get a reading for the day. I do this every day and it is surprising, to me at least, how it varies. I felt that I needed to keep a record of all this so I could present it to the medical experts in case of any problems (like blackouts or falling downstairs — don't ask).

I decided that I needed at least three kinds of records: a tabulation of the BP readings; a plot of the BP; and a graph of the BP. In the following the data shown is for a hypothetical individual I have designated as Q¹ and has no relationship with any actual BP readings.

¹ I'm a fan of the original Bond books.

3 Tabulation

I just used the normal table environment with the booktabs package to produce a tabulation along the following lines, resulting in the example below for Q at a single day per week (Wk.).

% \usepackage{booktabs} % in the preamble
\begin{tabular}{lcllll} \toprule
Date & Wk. & Morn. & Aft. & Eve. & Average \\ \midrule
4/4  & 1 &         &        & 179/109 & 179/109 \\
11/4 & 2 & 156/109 & 147/89 & 149/93  & 150/97  \\
etc. \\ \bottomrule
\end{tabular}

Date   Wk.   Morn.     Aft.      Eve.      Average
4/4    1                         179/109   179/109
11/4   2     156/109   147/89    149/93    150/97
18/4   3     158/108   142/92    146/92    149/97
etc.

The higher readings are for the systolic (maximum) blood pressure and the lower ones for the diastolic (minimum) pressure during the heartbeat's cycle.

4 Plotting

According to the user manual for my BP monitor, the World Health Organization (WHO) have developed a BP classification scheme. I decided that it might be useful to plot the BP against this scheme as shown for the Q individual. WHO describe 6 regions in their classification. These are: Optimal BP, Normal BP, Normal Systolic, Mild Hypertension, Moderate Hypertension, and Severe Hypertension.

I have used the standard picture environment for producing the plot. The only special macros that I used were

% bored with typing \makebox(0,0)
\newcommand{\zbox}[1]{\makebox(0,0){#1}}
% plot symbol
\newcommand*{\mk}{\zbox{$\bullet$}}
% \plotit{location}{week}
\newcommand{\plotit}[2]{\put(#1){\mk}}

The first two to minimise typing and the last for plotting a BP reading at the \put location. With \makebox(0,0){text} the reference point for plotting 'text' is at the center, vertically and horizontally, of text.

This is an outline of the code I used for the picture.


\setlength{\unitlength}{0.8cm}
\begin{picture}(8,11)
\thicklines
% the horizontal and vertical lines
\put(0,0){\line(1,0){9}}
\put(9,0){\vector(1,0){0}}
\put(0,0){\line(0,1){10}}
\put(0,10){\vector(0,1){0}}
\multiput(0,0)(1,0){9}{\line(0,1){0.1}}
\multiput(0,0)(0,1){10}{\line(1,0){0.1}}
% the axis labels
\put(1,10.3){\zbox{SYSTOLIC}}
\put(1.4,-1.0){\zbox{DIASTOLIC}}
\put(1,-0.3){\zbox{75}}
% etc
\put(8,-0.3){\zbox{110}}
\put(-0.5,1){\zbox{110}}
% etc
\put(-0.5,9){\zbox{190}}
% the regions
\put(0,2){\line(1,0){2}}
\put(2,0){\line(0,1){2}}
\put(1,1){\zbox{Optimal}}
% etc
\put(0,8){\line(1,0){8}}
\put(8,0){\line(0,1){8}}
\put(4,7){\zbox{Moderate hypertension}}
\put(5,9){\zbox{Severe hypertension}}
% the BPs
\plotit{7.8,6.5}{1}
% etc
\plotit{0.8,3.6}{33}
\end{picture}
\vspace{10mm}
% caption
{\centering \emph{Scatter plot with
WHO classification of blood~pressure}
\vspace{\baselineskip} \par}

[Scatter plot with WHO classification of blood pressure — DIASTOLIC (75–110) against SYSTOLIC (110–190), with nested regions labelled Optimal, Normal, Normal systolic, Mild hypertension, Moderate hypertension, and Severe hypertension]

The result shows that the hypothetical Q person's BP is typically in the range of Normal Systolic to Mild Hypertension but with some outliers.²

² As a non-medical person I cannot comment on what this might mean for our imaginary person.

5 Graphing

For graphing BP I used the regular picture environment. Nothing special about drawing the axes. The thing of interest here is the use of the \polyline macro from the curve2e package. This takes a list of coordinates like (x,y) and draws straight lines between them.

[Graph of blood pressure over time — BP (60–180) against Weeks (5–30), with systolic and diastolic traces and dashed lines marking the WHO regions from Optimal up to Severe hypertension]

Here is a brief outline of the code I used for the graph showing the use of \polyline.


\begin{center}
\setlength{\unitlength}{5.5pt}
\begin{picture}(41,81)
% draw axes, etc., then the BP graphs
% scaled to the size of the axes
% first the systolic
\polyline
(6,65)(7,46)(8,46)(9,40)(10,38)(11,35)%
(12,37)(13,30)(14,34)(15,34)(16,34)(17,32)%
(18,42)(19,40)(20,43)(21,42)% etc
% then the diastolic
\polyline
(6,26)(7,23)(8,23)(9,18)(10,15)(11,13)%
(12,11)(13,10)(14,13)(15,13)(16,13)(17,9)%
(18,12)(19,15)(20,16)(21,16)% etc
\end{picture}
% caption
\emph{Graph of blood pressure over time}
\vspace{\baselineskip}
\end{center}

The dashed lines indicate the upper limits of the WHO Normal Systolic regime.

The graphs show that after an initial worrying period Q's BP settled down to a fairly regular pattern albeit with some fits and starts.

6 Histogram

Another way of displaying data is by a histogram which shows the number of data points noted within sets of ranges. The following is a histogram of Q's diastolic BP for 5 mmHg ranges.

[Histogram of Q's Diastolic BP — counts (1–9) per 5 mmHg range from 75 to 110]

Nothing special about the code. I used the \framebox macro for drawing the rectangular regions and created a macro to reduce the number of characters needed for specifying its location and size.

\newcommand{\histit}[2]{\put(#1,0.0)%
  {\framebox(1,#2){}}}

where the first argument is the x location of the framebox and the second is its height.
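As a sketch with invented counts (the real heights come from tallying Q's readings), the first few bars could then be placed like this:

\begin{picture}(8,9)
% one \histit per 5 mmHg range: x position, bar height
\histit{0}{1}% 75--80
\histit{1}{3}% 80--85
\histit{2}{8}% 85--90
% etc
\end{picture}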

I must say that I found the scatter plot more informative than the histogram, although the latter highlighted the unusual high diastolic readings.

7 Summary

I have shown four different ways of displaying data. Edward Tufte³ has shown many other ways.

There are many applications for (La)TeX and friends. Among those noted on TeX.SE, apart from mathematical and scientific publications, are:

Books — fiction and non-fiction
Correspondence
Games — Bridge, Chess, Crosswords, Noughts and Crosses (aka Tic-tac-toe), Sudoku
Greeting cards
Invoices
Literature — Critical editions, Multilingual
Mars Rover (programmed via TeX)
Music
Newsletters
Poetry
Postcards
Presentations (slides)
...

I hope that my small application might give thoughts towards suitable additions to the above list.

³ Edward R. Tufte, The Visual Display of Quantitative Information, Graphics Press, 1983.

⋄ Peter Wilson
12 Sovereign Close
Kenilworth, CV8 1SQ UK
herries dot press (at) earthlink dot net

Short report on the state of LuaTeX, 2020

Luigi Scarso

Abstract

A short report on the current status of LuaTeX and its relatives: LuaHBTeX, LuaJITTeX and LuaJITHBTeX.

1 Background

First, let's summarize that there are four programs or "flavors" of LuaTeX:

• LuaTeX, with lua;
• LuaJITTeX, with luajit (just-in-time compilation capability);
• LuaHBTeX, with lua and HarfBuzz;
• LuaJITHBTeX, with luajit and HarfBuzz.

The build system manages the task of compiling and linking the shared components and the common components, so that, for example, they all have exactly the same TeX core, and share the same version number and development id.

On 30 May 2019 the first commit of LuaHBTeX, a LuaTeX variant with the ability to use HarfBuzz for glyph shaping, landed in the LuaTeX repository, after a discussion started around mid-February about the merging of HarfTeX by Khaled Hosny and the upcoming LuaTeX in TeX Live 2019. By that time LuaTeX was already frozen for the DVD and, materially, it was not possible to reopen the development. Also in 2019, LuaTeX entered its "bug fixing" phase (more on this below), further complicating the merge.

LuaHBTeX was integrated in the experimental branch of the LuaTeX repository on 6 July 2019 and directly after, on 8 July 2019, it landed in the TeX Live repository. The first release of LuaHBTeX, tagged as version 1.11.1, was on 19 October 2019, giving a wide margin for testing for the next (i.e., now the current) TeX Live 2020.

Together with HarfBuzz, the other component under "observation" in 2019 was the pplib library, the PDF reader library by Paweł Jackowski. This was the candidate to replace poppler in TeX Live — this eventually happened with the first commit to the TeX Live repository on 21 April 2020. This means that the next TeX Live 2021 will entirely use pplib instead of poppler, for both LuaTeX (as was the case in previous years), and now also XeTeX; poppler is no longer in the TeX Live repository. (By the way, pdfTeX will continue to use its own semi-homegrown libxpdf to read PDF files until there is some clear reason to change.)

MetaPost gets related special treatment: the library in LuaTeX includes only the decimal and the floating-point modes, while the mpost program also includes mpfr for arbitrary precision support.

As a result of mixing and matching all these variations, building LuaTeX and integrating it into TeX Live is quite a complex task, but thanks to the GNU autotools, things are manageable.

2 The current status of LuaTeX

As noted above, LuaHBTeX (version 1.12.0) shipped for the first time with the TeX Live 2020 DVD, and it is already supported by LuaLaTeX: at the TeX Live 2020 meeting, the talk "HarfBuzz in LuaLaTeX" by Marcel Krüger has shown some differences between the HarfBuzz text shaping and the ConTeXt text shaping; also better memory management for large fonts with respect to LuaTeX, especially for 32-bit platforms.

On the other side, Petr Olšák in 2020 has published OpTeX, ". . . a LuaTeX format based on Plain TeX macros (by Donald Knuth) and on OPmac macros" (see petr.olsak.net/optex, and article in this issue), also included in TeX Live 2020. It's not clear if it will eventually support LuaHBTeX.

Finally, also at the TUG 2020 online meeting, Patrick Gundlach in his talk "Speedata Publisher — a different approach to typesetting using Lua" (see tug.org/TUGboat/41-2/gundlach-speedata.pdf) has shown an example of a working workflow that uses LuaTeX (and possibly LuaJITTeX) purely by means of the lua API — a sort of TeX without \TeX. The Speedata Publisher software has been actively developed for a decade.

In light of these continuing developments, it is therefore appropriate to clarify the meaning of "bug fixing" mode, because it is sometimes associated with the term "frozen".

LuaTeX is based on the lua release 5.3.5, and it will stay on the 5.3 version at least for TeX Live 2021 and TeX Live 2022, possibly switching to the final release 5.3.6 (in release candidate 2 as of 2020-07-23) at some future point. The current release of lua is 5.4.0, with approximately five years between two versions; it's good practice to have a year of transition between two different versions, so a rough estimation for the next lua transition is six years from now, i.e., around TeX Live 2026.

On the side of the TeX core, the plan is for bug fixing and marginal improvements — for example, \tracinglostchars ≥ 3 now raises an error (a new feature added across engines by David Jones) — but not new features.
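For instance, this plain TeX sketch of mine (not from the report) triggers the new behavior: cmr10 has no glyph in slot 255, so with the raised value the run stops with an error instead of only leaving a "Missing character" note in the log.

% hypothetical test file
\tracinglostchars=3
\font\f=cmr10 \f
A^^ff B % no such glyph in cmr10: now an error
\bye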


From what we have seen previously, stressing LuaTeX in different areas (e.g., only with the lua API or with the new HarfBuzz shaping library) can reveal hidden bugs, but it should be noted that bug fixing is a complex task because the fix must be well harmonized with the rest of the code: for example, some issues with DVI output that need to be checked carefully are still open.

Nevertheless, there are three areas that are still marked as "under active development". The first is the ffi (foreign function interface) library, which in LuaTeX is not yet finished and not as functional as its counterpart in LuaJITTeX. Admittedly it is not a key feature of LuaTeX and probably useful only in the context of automated workflows.

The second is the binding with the HarfBuzz library, currently given by the luaharfbuzz module. If necessary the binding can still be expanded and/or modified, preserving as much as possible the current API, because LuaHBTeX is in an early phase of adoption.

The third area is the pplib library, which surely needs more testing.

Finally, the bug fixing phase certainly also involves the MetaPost library.

3 The current status of LuaJITTeX

LuaJITTeX is (or in some way is considered) a niche engine. One issue is that while LuaTeX is based on Lua 5.3.5, LuaJITTeX is still based on 5.1 with some partial coverage of 5.2. LuaJITTeX also has some intrinsic limits, such as the fixed number of nested tables, which has a serious impact on table serialization. By design, LuaTeX makes heavy use of C functions bound via the classic Lua/C API; the just-in-time compiler (JIT) doesn't play well in this situation, but this is not a serious issue, given that it can be turned off on demand (and indeed it's off by default). Finally, LuaJIT doesn't support all of Lua's platforms, although the most important ones are available.

On the other hand, the LuaJIT virtual machine is much faster than Lua and the compilation of an article can have a significant speed-up. For this article, LuaJITHBTeX is 2.5 times faster than LuaHBTeX with exactly the same LuaLaTeX format; although for complex documents the gain is smaller, around 15%–20%.

The lack of a specific format for LuaJIT does fake the results a bit, but maintaining an additional format in this case is not an easy task: to take advantage of the JIT, where LuaJIT shines, one has to write specialized Lua code, and using the ffi module requires rather in-depth knowledge of C to achieve significant results. Currently only ConTeXt MkIV has some support for LuaJITTeX.

Probably LuaJITTeX and LuaJITHBTeX are better suited for specialized tasks (e.g. database publishing) or as software as a service in the cloud, possibly in a containerized environment, but they should also be considered a research tool in digital typesetting.

Currently LuaJIT in TeX Live is still using the 2.1-beta3 release (from 2017), but it is likely it will sync with the official repository by the end of the year. Although LuaJIT development is not proceeding at a rapid pace, there have been important updates (e.g., all LuaJIT 64-bit ports now use 64-bit GC objects by default; and there is support for more platforms). There are some mismatches with Lua (a few functions in Lua that are not available in LuaJIT, notably the utf8 module) still to be fixed.

4 Conclusion

At the TeX core, LuaTeX and LuaHBTeX are exactly the same, and the choice between one or the other depends only on whether or not one accepts HarfBuzz as a dependency. As OpTeX has shown, LuaHBTeX is not always the necessary choice. In any case, the current state is better described by "bug fixing mode with marginal improvements" rather than "frozen", with an eye on stability. The area marked as "under active development" may change more significantly, but this should have a minimal impact on stability.

LuaJITTeX and LuaHBTeX are more or less still out of the mainstream and that gives a wider range for maneuvering; given the high efficiency of the implementation of LuaJIT, it's often better to code a module directly in Lua rather than compile and link a C module. Admittedly, it's a rather specialized topic, but efficiency has its costs.

⋄ Luigi Scarso
luigi.scarso (at) gmail dot com

Luigi Scarso TUGboat, Volume 41 (2020), No. 3 329

Distinguishing 8-bit characters and Japanese professional quality [9], including Japanese line break- characters in (u)pTEX ing rules and vertical typesetting. pTEX and pLATEX were originally developed by Hironori Kitagawa the ASCII Corporation2 [1]. However, pTEX and Abstract pLATEX in TEX Live, which are our concern, are community editions. These are currently maintained pTEX (an extension of TEX for Japanese typesetting) 3 by the Japanese TEX Development Community. For uses a legacy encoding as the internal Japanese en- more detail, please see the English guide for pTEX [3]. coding, while accepting UTF-8 input. This means pTEX itself does not have -TEX features, but that pTEX does code conversion in input and output. there is -pTEX [7], which merges pTEX, -TEX and Also, pT X (and its Unicode extension upT X) dis- E E additional primitives. Anything discussed about tinguishes 8-bit character tokens and Japanese char- 휀 pTEX in this paper (besides this paragraph) also acter tokens, while this distinction disappears when 휀 휀 applies to -pTEX, so I simply write “pTEX” instead tokens are processed with \string and \meaning, of “pTEX and -pTEX”. Note that the pLATEX format or printed to a file or the terminal. in TEX Live is produced by -pTEX, because recent These facts cause several unnatural behaviors 휀 versions of LAT휀EX require -TEX features. with (u)pTEX. For example, pTEX garbles “ſ ” () to “ ” on some occasions. This paper explains these 휀 2.1 Input code conversion by ptexenc unnatural顛 behaviors, and discusses an experiment in 휀 improvement by the author. Although pTEX in TEX Live accepts UTF-8 inputs, the internal Japanese character set is limited to 1 Introduction JIS X 0208 (JIS level 1 and 2 kanjis), which is a Since TEX Live 2018, UTF-8 has been the new default legacy character set before Unicode. pTEX uses input encoding in LATEX [8]. However, with pLATEX, Shift_JIS (Windows) or EUC-JP (other) as the in- which is a modified version ofA LTEX for the pTEX ternal encoding of JIS X 0208. engine, the source In pTEX and related programs, the ptexenc library [12] converts an input line to the internal %#!platex encoding. pTEX’s input processor actually reads \documentclass{minimal} the converted result by ptexenc. A valid UTF-8 \begin{document}ſ\end{document} % longs sequence which does not represent a JIS X 0208 char- gives an inconsistent error message [4] (edited to fit acter — such as (“ſ”) or <9F> (“”) — TUGboat’s narrow columns): is converted to ^^-notation, such as ^^ab. On the other hand, an invalid UTF-8 sequence ! PackageinputencError:Unicodecharacter (U+C4CF)notsetupforusewithLaTeX. is converted into (an undefined 顛 in EUC-JP) sometimes, in TEX Live 2019 or prior. Here “ ”, “ſ” and U+C4CF are all different characters. 顛 In TEX Live 2020, the sequence is always converted The purpose of this paper is to investigate the into ^^-notation. background of this message and propose patches to resolve this issue. This paper is based on a cancelled 2.2 Japanese character tokens talk [6] in T XConf 2019.1 E pT X divides character tokens into two groups: ordi- In this paper, the following are assumed: E nary 8-bit character tokens and Japanese character • All inputs and outputs are encoded in UTF-8. tokens. The former are not different from tokens in • pTEX uses EUC-JP as the internal Japanese en- 8-bit engines, say, TEX82 and pdfTEX.A ^^-notation coding (see Section 2.1). sequence is always treated as an 8-bit character. 
• Sources are typeset in plain pTEX(ptex), unless A Japanese character token is represented by stated otherwise by %#!. its character code. In other words, although there • The notation describes a byte 0xab, or a is a \kcatcode primitive, which is the counterpart character token whose code is 0xab. of \catcode, its information is not stored in tokens. Hence, changing \kcatcode by users is not recom- 2 Overview of pTEX mended. pTEX is an engine extension of TEX82 for Japanese typesetting. It can typeset Japanese documents of 2 Currently ASCII DWANGO in DWANGO Co. Ltd. 3 texjp.org/. Several GitHub repositories: 1 TEXConf 2019 (the annual meeting of Japanese TEX users, github.com/texjporg/tex-jp-build ((u)pTEX), texconf2019.tumblr.com) was canceled due to a typhoon. github.com/texjporg/platex (pLATEX).

Distinguishing 8-bit characters and Japanese characters in (u)pTEX 330 TUGboat, Volume 41 (2020), No. 3

2.3 An example input This is because all of Now we look at an example. Our input line is \csname u^^c5^^bf\endcsname a<9F> (aß ſ§) \csname uſ\endcsname % ſ: in UTF-8 漢 \csname u \endcsname % : in EUC-JP First, ptexenc converts this line into 顛 顛 have the same name u in pTEX, hence they a^^c3^^9f^^c5^^bf are treated as the same control sequence. Applying which is fed to pTEX’s input processor. The final \string to them, we get the same token list character “§” is included in JIS X 0208. \ u From the result above, pTEX produces tokens 顛 12 This12 explains the error message in the introduc- a <9F> 漢 § tion. “ (U+C4CF)” in the message is generated 顛 where11 12and 12are Japanese12 character12 tokens. From from this example,漢 § we can see that we cannot write “§” \expandafter\string directly to output this character in a Latin font (use \csname u8:\string\string\endcsname commands or ^^c2^^a7). The inputenc package expects that applying \string 3 Stringization in pTEX to the above control sequence produces 3.1 Overview \ u 8 : Names of multiletter control sequences, which in- A clude control sequences with single Japanese charac- but12 the12 result12 12 in pL12TEX is12 ter name, such as \ , are stringized, that is to say, \ u 8 : they are stored intoあ the string pool. Similarly, some 顛 primitives, such as \string, \jobname, \meaning 3.312 12\meaning12 12 and \the (almost always the case), first stringize The result of their intermediate results into the string pool, and \font\Z=ec-lmr10 \Z % T1 encoding then retokenize these intermediate results. \def\fuga{^^c5^^bf ſ}\meaning\fuga Stringization of pTEX has two crucial points. 顛 • The origin of a byte is lost in stringization. A differs between plainE T X and plain pTEX: byte sequence, for example , in the plain TEX macro:->Å£éąŻÅ£ string pool may be the result of stringization of plain pTEX macro:-> a Japanese character “ ”, or that of two 8-bit 顛顛顛 顛 Now we look at what happened with pTEX. The characters and . definition of \fuga is represented by the token list • In retokenization, a byte sequence which repre- sents a Japanese character in the internal encod- 顛 ing is always converted to a Japanese character This12 gives the12 following12 string12 as the intermediate token. For example, is always con- result of \meaning. verted to a Japanese token . 顛 macro:-> These points cause unnatural behavior, namely bytes from 8-bit characters becoming garbled to Japanese Retokenizing this string gives the final result character tokens. We look into several examples. macro:-> 顛顛顛 3.2 Control sequence name which we have already seen. Let’s begin with the following source: 3.4 A tricky application \font\Z=ec-lmr10 \Z % T1 encoding The behavior described in Section 3.2 has a tricky \expandafter\def\csname uſ\endcsname{AA} application: generating a Japanese character token \expandafter\def\csname u \endcsname{BB} 顛 from its code number, even in an expansion-only \def\ZZ#1{#1 (\string#1) } context. This can be constructed as follows: \expandafter\ZZ\csname u^^c5^^bf\endcsname% (1) \expandafter\ZZ\csname uſ\endcsname % (2) %#!eptex \expandafter\ZZ\csname u \endcsname % (3) \font\Z=ec-lmr10 \Z % T1 encoding 顛 \input expl3-generic % for \char_generate:nn With pTEX, (1)–(3) produces the same result \ExplSyntaxOn BB (\u ) \cs_generate_variant:Nn \cs_to_str:N { c } 顛

Hironori Kitagawa TUGboat, Volume 41 (2020), No. 3 331

\cs_new:Npn \tkchar #1 { Then, pTEX prints this token list. Since \cs_to_str:c { are printable and <9F> is not, the putc2 function \char_generate:nn % upper byte receives the following string, one byte per call. { \int_div_truncate:nn { #1 } { 256 } } { 12 } ^^9f \char_generate:nn % lower byte Each is converted to “ ” by putc2, { \int_mod:nn { #1 } { 256 } } { 12 } 顛 } while the single remains unchanged. Hence the } final result is “ ^^9f”, as shown. 顛顛 \ExplSyntaxOff \edef\A{\tkchar{` }\tkchar{` }} 4.3 \message 漢 字 \meaning\A % ==> macro:-> 漢字 \message is similar to \write, but differs in that it stringizes its argument. Now consider an input line This \tkchar will be unnecessary as of TEX Live 2020, since the \Uchar and \Ucharcat primitives \message{^^fe^^f3: :} were added into -pTEX at that time. ꚲ Here (<9A> in UTF-8) is a character 4 Output to file or terminal ꚲ 휀 included in JIS X 0213, but not in JIS X 0208. 4.1 Output code conversion The argument of \message is (expanded to) the following token list. As with input, pTEX does a code conversion from the internal Japanese encoding to UTF-8 in outputting : <9A> : to a file or the terminal. This is done in two steps: Then,12 this token12 12 list is12 stringized12 to 12 12 12 • As with TEX82, pTEX uses the print proce- dure for printing a string.4 In pTEX, a byte :<9A>: is printable if and only if its value is between 32 (“␣”) and 126 (“~”), or it is used in the inter- This string is “printed” by print; since only <9A> is nal Japanese encoding ( in EUC-JP). unprintable, putc2 receives • pTEX uses the putc2 function instead of the :^^9a: standard putc C function. putc2 is a variation of putc with code conversion, and is defined in Now, putc2 converts (an undefined ptexenc. code point in EUC-JP) to the null character <00>, and to “ ”. Hence the final result is Hence pTEX may garble 8-bit characters, such as 險 , into a Japanese character in output. We <00>: ^^9a: 險 look into two examples, one is of \write and the other is of \message. 4.4 Controlling printability 4.2 \write TEX82 and pdfTEX support TCX (TEX Character Translation) files [2], which can be used to specify With pTEX, the following source which characters are printable. In fact, cp227.tcx \newwrite\OUT is activated in (pdf)LATEX and several other formats \immediate\openout\OUT=test.dat in TEX Live, to make characters 128–255 and three \immediate\write\OUT{ ſß} 顛 control characters printable. One can switch to a \immediate\closeout\OUT different TCX file at runtime. For example, only produces a file test.dat, whose contents are characters 32–126 are printable in

^^9f latex -translate-file=empty.tcx 顛顛 Let’s look at what happened. However, pTEX was not expected to use TCX First, the argument of \write is (expanded to) files (no TCX files are activated in formatsE bypT X the following token list. in default). inipTEX can make characters printable by a TCX file, and that’s all. For example, tomake <9F> 顛 characters 128–255 printable in pTEX, one has to make another format with appropriate option. There 4 In fact,12 slow_print12 is12 used for12 printing a string which might contain unprintable characters. However, slow_print is no method to make an arbitrary character, say calls print internally. , unprintable when using this format.

Distinguishing 8-bit characters and Japanese characters in (u)pTEX 332 TUGboat, Volume 41 (2020), No. 3

5 upTEX 6 Distinguishing bytes from 8-bit 5.1 Overview characters and those from Japanese characters upTEX [10, 11] is a Unicode extension of pTEX by To resolve (u)pT X’s behavior described so far, I Takuji Tanaka. upTEX is (almost fully) upward- E have been developing an experimental version5 of compatible with pTEX, so it is a very convenient solution for converting existing documents to Uni- (u)pTEX, where stringization and outputting retain code with minimal changes. the origin of a byte — an 8-bit character (token) or a Japanese one. I refer to these as “experimental”, In upTEX, a Japanese character token is a pair of the character code and \kcatcode. Furthermore, and (u)pTEX in TEX Live development repository as \kcatcode controls whether a UTF-8 sequence pro- “trunk”. duces a Japanese character token or a sequence The implementation approach is to extend the of 8-bit tokens. For example, <9B> ( , range of a “byte” to 0–511 (Table 1). A value between U+985B) in an input line is treated as three 8-bit顛 0–255 means a byte from an 8-bit character (token), characters when \kcatcode"985B is 15, and as a and 256–511 means a “byte” from a Japanese one. Japanese character otherwise. I tested a different approach, namely using as a prefix to a byte 128–255 which came from an 8-bit character. But this approach caused confusion 5.2 No code conversion with , so I gave up. Since upTEX’s internal Japanese character code is Unicode (UTF-8 in the string pool), code conversion 6.1 \write by ptexenc has no effect. Hence the inconsistent For example, consider the source from Section 4.2: error message described in the introduction will not \newwrite\OUT be issued. \immediate\openout\OUT=test.dat \immediate\write\OUT{ ſß} 顛 5.3 Retokenization and \kcatcode \immediate\closeout\OUT In upTEX, \kcatcode is involved in the retokeniza- with the experimental pTEX. When no TCX file is tion process. Specifically, a UTF-8 sequence is con- activated, putc2 receives the string verted into a Japanese character token if and only <1C5><1BF>^^c5^^bf^^c3^^9f if its \kcatcode is not 15. This means that the result of \meaning of the same macro depends on because a Japanese token sends <1C5><1BF> to 顛 \kcatcode settings, as in the following example. putc2, and <80>– are not printable. Thus the contents of the output test.dat are %#!uptex ^^c5^^bf^^c3^^9f \font\Z=ec-lmr10 \Z % T1 encoding 顛 %% default: \kcatcode"3042=17 When cp227.tcx is activated, they become \def\hoge{^^e3^^81^^82 } あ ſß \kcatcode"3042=15 顛 \meaning\hoge % ==> macro:->ãĄĆãĄĆ because <80>– are printable in this case. \kcatcode"3042=17 \meaning\hoge % ==> macro:-> 6.2 The string pool ああ Since the range of a “byte” is increased to 0–511, The definition of \hoge is represented by the token the type of the string pool is changed to let each list element store a “byte”; concretely, to a 16-bit array. For example, let’s reconsider the following source: <81> <82> あ \font\Z=ec-lmr10 \Z % T1 encoding 12 12 12 17 \def\fuga{^^c5^^bf ſ}\meaning\fuga Hence the intermediate result of \meaning\hoge is 顛 With the experimental pTEX, the intermediate result macro:-><81><82><81><82> of \meaning\fuga is macro:-><1C5><1BF> However, because the \kcatcode of “ ” is changed, two calls of \meaning\hoge give differentあ results. Hence the result of \meaning\fuga is We will see results of \string of multiletter 5 github.com/h-kitagawa/tex-jp-build/tree/ control sequences later. printkanji_16bit. GitHub issue: [5]

Hironori Kitagawa TUGboat, Volume 41 (2020), No. 3 333

Table 1: A “byte” in experimental (u)pTEX

“byte” 0–255 256–511 origin an 8-bit character (token) a Japanese character (token) 푐 printable characters 32–126 (“␣”–“~”) all

“safe” printing of print ∗ print_char (not print) putc2 without code conversion with code conversion

retokenization푐 an 8-bit(푐) character token a Japanese(푐) character∗∗ token (푐, …) ∗∗ Web2C’s default; can be extended by a TCX file. With adjacent푐 “bytes” which are between 256–511. ∗ ∗∗

macro:->Å£ Å£ \kcatcode"3042=15 \expandafter\ZZ 顛 \csname \endcsname % (2) because only <1C5><1BF> is converted to a Japanese あ character token . 顛 Results are summarized in Table 2. One may feel The change in the type for the string pool in- uneasy about both results. creases the size of format files by about the total length of strings, but the amount of increase is not trunk The results of \string for (1) and (2) differ, so large. For example, the platex-dev format is while they represent the same control sequence increased by about % (see table below). As of (as in Section 5.3). TEX Live 2020, pdfTEX and (u)pTEX use compressed experimental (1) and (2) represent different con- trol sequences. format files, so 3.5the amount of increase on diskis smaller. 6.4 Input buffer(s) platex-dev.fmt [kB] trunk experimental I also introduced an array buffer2 as a companion ar- uncompressed 10412 10774 ray to buffer, which contains an input line. buffer2 compressed 2322 2380 plays the role of the “upper byte” of buffer . Hence, when (u)pTEX considers a byte sequence buffer [푖] I wanted to keep the modification as small and as a Japanese character, buffer2 is[푖] set to 1. simple as possible; so I left unchanged the structure This is needed when scanning a control sequence[푖 . . 푗] of the string pool, except for adding a “flag bit”. name in order to distinguish a byte[푖 which. . 푗] consists a part of a Japanese character from another byte. 6.3 Control sequence names in upTEX Suppose that the category codes of and In the experimental pTEX, are both 11 (letter), an input line contains \csname uſ\endcsname \^^c5^^bf (\ ^^c5^^bf, \ ſ) (1) \csname u \endcsname 顛 顛 顛 and pTEX is about to scan this control sequence (1). are treated as different control sequences. This is Since (p)TEX converts ^^-notation in a control se- because the name of the former is u, while quence name into single characters in buffer, the that of the latter is u<1C5><1BF>. This behavior contents of buffer become seems to be natural. \ (\ ) However, the situation is more arguable between 顛顛 the experimental upT X and the trunk upT X. For E E Thus, the control sequence (1) cannot be distin- example, let’s compare the results of (1) and (2) in guished from \ so far. However, the experi- the following source by both versions of upTEX. 顛顛 mental pTEX can distinguish the control sequence (1) %#!uptex from \ , because the contents of buffer2 differ (see \font\Z=ec-lmr10 \Z % T1 encoding Table 3).顛顛 \def\ZZ#1{#1 (\string#1) } buffer2 is also useful in showing contexts in \kcatcode"3042=15 upTEX. For example, let’s look the following input: \expandafter\def\csname \endcsname{AA} あ \kcatcode"3042=17 %#!uptex \expandafter\def\csname \endcsname{BB} \def\J{\kcatcode"3042=17 } あ \kcatcode"3042=17 \expandafter\ZZ \def\L{\kcatcode"3042=15 } \csname \endcsname % (1) \J \L \undefined \J あ あ あ あ あ

Distinguishing 8-bit characters and Japanese characters in (u)pTEX 334 TUGboat, Volume 41 (2020), No. 3

Table 2: Properties of \csname \endcsname of TEX source in Section 6.3 あ trunk experimental \kcatcode of “ ” name result of \TEST name result of \TEST あ (1) 17 <81><82> BB (\ ) <1E3><181><182> BB (\ ) あ あ (2) 15 <81><82> BB (\ãĄĆ) <81><82> AA (\ãĄĆ)

Table 3: Contents of buffer and buffer2 when the experimental pTEX scans control sequences in an input line

\ ^^c5^^bf (\ ſ) \ 顛 顛 顛顛 buffer \ \ buffer2 0 1 1 0 0 0 1 1 1 1 name <1C5><1BF> <1C5><1BF><1C5><1BF>

With the experimental upTEX (and no TCX file), we [4] JulienPalard. Inconsistent error message. can know that the second “ ” is treated as three github.com/texjporg/platex/issues/84. あ 8-bit characters from the error message. I hope this [5] H. Kitagawa. Distinction between a byte will be useful in debugging. sequence and a Japanese character token ! Undefined control sequence. (in Japanese). github.com/texjporg/ l.3 \J \L ^^e3^^81^^82\undefined tex-jp-build/issues/81. あ \J あ あ [6] H. Kitagawa. Distinction of Latin The third and the final “ ” is not read by upTEX’s characters and Japanese characters in input processor at the error.あ So they are printed as if stringization of pTEX family (in Japanese). all UTF-8 characters gave Japanese character tokens. osdn.net/projects/eptex/docs/tc19ptex/ ja/1/tc19ptex.pdf. 7 Conclusion [7] H. Kitagawa. -pTEX Wiki (in Japanese). The primary factor of the complications discussed in osdn.net/projects/eptex/wiki/FrontPage. this paper is that (u)pT X are Japanese extension of E [8] LAT X Project휀 Team. LAT X news, an 8-bit engine; this causes the same byte sequence E E issue 28. www.latex-project.org/news/ can represent different things, namely a sequence latex2e-news/ltnews28.pdf, Apr. 2018. of 8-bit characters (token) or Japanese characters. Although my experiment does not get rid of this [9] H. Okumura. pTEX and Japanese Typesetting. factor (only ameliorates it), I hope that it is helpful. The Asian Journal of TEX 2(1):43–51, 2008. I thank the executive committee of TEXConf ajt.ktug.org/2008/0201okumura.pdf 2019, which gave me the opportunity for preparing [10] T. Tanaka. upTeX, upLaTeX — unicode the original talk, and the people who discussed the version of pTeX, pLaTeX. www.t-lab.opal. topics of this paper with me, especially Hironobu - ne.jp/tex/uptex_en.html. mashita, Takuji Tanaka, Takayuki Yato, and Norbert [11] T. Tanaka. upT X — Unicode version of pT X Preining. E E with cjk extensions. TUGboat 34(3):285–288, References 2013. tug.org/TUGboat/tb34-3/tb108tanaka.pdf [1] ASCII Corporation. ASCII Japanese TeX [12] N. Tutimura. UTF-8 (4) — ptetex Wiki (pTeX) (in Japanese). 対応 asciidwango.github.io/ptex/index.html. (in Japanese). tutimura.ath.cx/ptetex/ ?UTF-8%C2%D0%B1%FE%284%29. [2] K. Berry, O. . Web2c, for version 2019. tug.org/texlive/Contents/ Hironori Kitagawa live/texmf-dist/doc/web2c/web2c.pdf, Tokyo, Japan Feb. 2019. ⋄ h_kitagawa2001 (at) yahoo dot [3] Japanese TEX Development Community. Guide co dot jp to pTEX and friends. ctan.org/pkg/ptex-manual.

Hironori Kitagawa TUGboat, Volume 41 (2020), No. 3 335

Keyword scanning \hbox to 10.0cm {10.0cm} \hbox to .0cm {.0cm} Hans Hagen \hbox to .cm {.cm} \hbox to 10.cm {10.cm} Some primitives in TEX can take one or more op- tional keywords and/or keywords followed by one These are all valid and according to the speci- fication; even the single period is okay, although it or more values. In traditional TEX it concerns a looks funny. It would not be hard to intercept that handful of primitives, in pdfTEX there are plenty but I guess that when T X was written anything of backend-related primitives, LuaTEX introduced E optional keywords to some math constructs and at- that could harm performance was taken into account. One can even argue for cases like: tributes to boxes, and LuaMetaTEX adds some more too. The keyword scanner in TEX is rather special. \hbox to \first.\second cm {.cm} Keywords are used in cases like: Here \first and/or \second can be empty. \hbox spread 10cm {...} Most users won’t notice these side effects of scanning \advance\scratchcounter by 10 numbers anyway. \vrule width 3cm height 1ex Pushing back tokens Sometimes there are multiple keywords, as with rules, in which case you can imagine a case like: The reason for writing up any discussion of keywords \vrule width 3cm depth 1ex width 10cm depth 0ex is the following. Optional keyword scanning is kind height 1ex\relax of costly, not so much now, but more so decades ago (which led to some interesting optimizations, as we’ll Here we add a \relax to end the scanning. If we see). For instance, in the first line below, there is no don’t do that and the rule specification is followed by keyword. The scanner sees a 1 and it not being a arbitrary (read: unpredictable) text, the next word keyword, pushes that character back in the input. might be a valid keyword and when followed by a dimension (unlikely) it will happily be read as a \advance\scratchcounter 10 \advance\scratchcounter by 10 directive, or when not followed by a dimension an error message will show up. Sometimes the scanning In the case of: is more restricted, as with glue where the optional \scratchskip 10pt plux plus and minus are to come in that order, but when it has to push back the four scanned tokens plux. missing, again a word from the text can be picked up Now, in the engine there are lots of cases where if one doesn’t explicitly end with a \relax or some lookahead happens and when a condition is not satis- other token. fied, the just-read token is pushed back. Incidentally, \scratchskip = 10pt plus 10pt minus 10pt % okay when picking up the next token triggered some ex- \scratchskip = 10pt plus 10pt % okay pansion, it’s not the original next token that gets \scratchskip = 10pt minus 10pt % okay pushed back, but the first token seen after the ex- \scratchskip = 10pt plus whatever % error pansion. Pushing back tokens is not that inefficient, % typesets "plus 10pt": although it involves allocating a token and pushing \scratchskip = 10pt minus 10pt plus 10pt and popping input stacks (we’re talking of a mix of The scanner is case insensitive, so the following reading from file, token memory, Lua prints, etc.) specifications are all valid: but it always takes a little time and memory. 
In Lua- \hbox To 10cm {To} TEX there are more keywords for boxes, and there \hbox TO 10cm {TO} we have loops too: in a box specification one or more \hbox tO 10cm {tO} optional attributes are scanned before the optional \hbox to 10cm {to} to or spread, so again there can be push back when It happens that keywords are always simple no more attr are seen. English words so the engine uses a cheap check deep \hbox attr 1 98 attr 2 99 to 1cm{...} down, just offsetting to uppercase, but of course In LuaMetaTEX there is even more optional that will not work for arbitrary UTF-8 (as used in keyword scanning, but we leave that for now and LuaTEX) and it’s also unrelated to the upper- and just show one example: lowercase codes as T X knows them. E \hbox spread 10em {\hss The above lines scan for the keyword to and \hbox orientation 0 yoffset 1mm to 2em {up}\hss after that for a dimension. While keyword scanning is \hbox to 2em{here}\hss case tolerant, dimension scanning is period tolerant: \hbox orientation 0 xoffset-1mm to 2em{down}\hss \hbox to 10cm {10cm} }

Keyword scanning 336 TUGboat, Volume 41 (2020), No. 3

Although one cannot mess too much with these command code is a letter or other character (two low-level scanners there was room for some opti- checks) a fast check happens for the control sequence mization, so the penalty we pay for more keyword code being zero. If that is the case, the character scanning in LuaMetaTEX is not that high. (I try code is compared. In practice that works out well be- to compensate when adding features that have a cause the characters that make up a keyword are in possible performance hit with some gain elsewhere.) the range 65–90 and 97–122, and all other character It will be no surprise that there can be interest- codes are either below that (the ones that relate to ing side effects to keyword scanning. For instance, primitives where the character code is actually a sub- using the two character keyword by in an \advance command of a limited range) or much larger numbers can be more efficient because nothing needs to be that, for instance, indicate an entry in some array, pushed back. The same is true for the sometimes where the first useful index is above the mentioned optional equal: ranges. \scratchskip = 10pt The surprise is in the fact that there is no check- Similar impacts on efficiency can be found in the ing for letters or other characters, so this is why the 1 way the end of a number is seen, basically anything following code will work too: not resolving to a number (or digit). (For these, \catcode‘O= 1 \hbox tO 10cm {...}% { begingroup assume a following token will terminate the number \catcode‘O= 2 \hbox tO 10cm {...}% } endgroup if needed; we’re focusing on the spaces here.) \catcode‘O= 3 \hbox tO 10cm {...}% $ mathshift \catcode‘O= 4 \hbox tO 10cm {...}% & alignment \scratchcounter 10% space not seen, ends \cs \catcode‘O= 6 \hbox tO 10cm {...}% # parameter \scratchcounter =10% no push back of optional = \catcode‘O= 7 \hbox tO 10cm {...}% ^ superscript \scratchcounter = 10% extra optional space gobble \catcode‘O= 8 \hbox tO 10cm {...}% _ subscript \scratchcounter = 10 % efficient end of scanning \catcode‘O=11 \hbox tO 10cm {...}% letter \scratchcounter = 10\relax % maybe less efficient \catcode‘O=12 \hbox tO 10cm {...}% other In the above examples scanning the number involves: skipping over spaces, checking for an op- In the first line, if we changed the catcode of T tional equal, skipping over spaces, scanning for a (instead of O), it gives an error because TEX sees a sign, checking for an optional octal or hexadecimal begin group character (category code 1) and starts trigger (single or double quote character), scanning the group, but as a second character in a keyword (O) the number till a non-digit is seen. In the case of it’s okay because TEX will not look at the category dimensions there is fraction scanning as well as unit code. scanning too. Of course only the cases 11 and 12 make sense in In any case, the equal is optional and kind of practice. Messing with the category codes of regular a keyword. Having an equal can be more efficient letters this way will definitely give problems with then not having one, again due to push back in case processing normal text. In a case like: of no equal being seen, In the process spaces have {\catcode ‘o=3 \hbox to 10cm {oeps}} % \hb been skipped, so add to the overhead the scanning {\catcode ‘O=3 \hbox to 10cm {Oeps}} % {$eps} for optional spaces. In LuaMetaTEX all that has we have several issues: the primitive control sequence been optimized a bit. 
By the way, in dimension \hbox has an o so TEX will stop after \hb which can scanning pt is actually a keyword and as there are be undefined or a valid macro and what happens several dimensions possible quite some push back next is hard to predict. Using uppercase will work can happen there, but we scan for the most likely but then the content of the box is bad because there candidates first. the O enters math. Now consider: Catcode surprises {\catcode ‘O=3 \hbox tO 10cm {Oeps Oeps}} % {$eps $eps} All that said, we’re now ready for a surprise. The keyword scanner gets a string that it will test for, This will work because there are now two O’s in say, to in case of a box specification. It then will the box, so we have balanced inline math triggers. fetch tokens from whatever provides the input. A But how does one explain that to a user? (Who token encodes a so-called command and a charac- probably doesn’t understand where an error message ter and can be related to a control sequence. For comes from in the first place.) Anyway, this kind of instance, the character t becomes a letter command tolerance is still not pretty, so in LuaMetaTEX we with related value 116. So, we have three properties: now check for the command code and stick to letters the command code, the character code and the con- 1 No longer in LuaMetaTEX where we do a bit more robust trol sequence code. Now, instead of checking if the check.

Hans Hagen TUGboat, Volume 41 (2020), No. 3 337

and other characters. On today’s machines (and even Representation of macro parameters on my by now ancient workhorse) the performance hit can be neglected. Hans Hagen In fact, by intercepting the weird cases we also When TEX reads input it either does something di- avoid an unnecessary case check when we fall through rectly, like setting a register, loading a font, turning the zero control sequence test. Of course that also a character into a glyph node, packaging a box, or it means that the above mentioned category code trick- sort of collects tokens and stores them somehow, in a ery doesn’t work any more: only letters and other macro (definition), in a token register, or someplace characters are now valid in keyword scanning. Now, temporary to inject them into the input later. Here it can be that some macro programmer actually used we’ll be discussing macros, which have a special token those side effects but apart from some macro hacker list containing the preamble defining the arguments being hurt because no longer mastering those details and a body doing the real work. For instance when can be showed off, it is users that we care more for, you say: don’t we? \def\foo#1#2{#1 + #2 + #1 + #2} Current performance the macro \foo is stored in such a way that it knows To be sure, the abovementioned performance of key- how to pick up the two arguments and when expand- word and equal scanning is not that relevant in prac- ing the body, it will inject the collected arguments tice. But for the record, here are some timings on a each time a reference like #1 or #2 is seen. In fact, laptop with a i7-3849QM processor using MinGW bi- quite often, TEX pushes a list of tokens (like an argu- naries on a 64-bit Windows 10 system. The times are ment) in the input stream and then detours in taking the averages of five times a million such assignments tokens from that list. Because TEX does all its mem- and advancements. ory management itself the price of all that copying is not that high, although during a long and more onemilliontimes terminal LMTX LuaT X E complex run the individual tokens that make the \advance\scratchctr 1 space 0.068 0.085 forward linked list of tokens get scattered in token \advance\scratchctr 1 \relax 0.135 0.149 memory and memory access is still the bottleneck in \advance\scratchctr by 1 space 0.087 0.099 processing. \advance\scratchctr by 1 \relax 0.155 0.161 A somewhat simplified view of how a macro like \scratchctr 1 space 0.057 0.096 \scratchctr 1 \relax 0.125 0.151 this gets stored is the following: \scratchctr=1 space 0.063 0.080 hash entry "foo" with property "macro call" => \scratchctr=1 \relax 0.131 0.138 match (# property stored) We differentiate here between using a space as match (# property stored) terminal or a \relax. The latter is a bit less efficient end of match because more code is involved in resolving the mean- ing of the control sequence (which eventually boils match reference 1 down to nothing) but nevertheless, these are not other character + timings that one can lose sleep over, especially when match reference 2 other character + the rest of a decent TEX run is taken into account. match reference 1 And yes, LuaMetaT X (LMTX) is a bit faster here E other character + than LuaT X, but I would be disappointed if that E match reference 2 weren’t the case. When a macro gets expanded, the scanner first Hans Hagen collects all the passed arguments and then pushes ⋄ http://pragma-ade.com those (in this case two) token lists on the parameter stack. 
Keep in mind that due to nesting many kinds of stacks play a role. When the body gets expanded and a reference is seen, the argument that it refers to gets injected into the input, so imagine that we have this definition: \foo#1#2{\ifdim\dimen0=0pt #1\else #2\fi} and we say: \foo{yes}{no} 338 TUGboat, Volume 41 (2020), No. 3

then it’s as if we had typed: The last used argument signal character (offi- \ifdim\dimen0=0pt yes\else no\fi cially called a match character, here we have two that So, you’d better not have something in the ar- fit that category, # and ?) is used in the serialization! guments that messes up the condition parser! From Now, there is an interesting aspect here. When TEX the perspective of an expansion machine it all makes stores the preamble, as in our first example: sense. But it also means that when arguments are match (# property stored) not used, they still get parsed and stored. Imagine match (# property stored) using this one: end of match \def\foo#1{\iffalse#1\oof#1\oof#1\oof#1\oof#1\fi} the property is stored, so in the later example we get: When TEX sees that the condition is false it will enter a fast scanning mode where it only looks match (# property stored) at condition related tokens, so even if \oof is not match (# property stored) defined this will work : match (? property stored) end of match \foo{!} But in the macro body the number is stored But when we say this: instead, because we need it as reference to the - \foo{\else} rameter, so when that bit gets serialized TEX (or it will bark! This is because each #1 reference will more accurately: LuaTEX, which is what we’re using be resolved, so we effectively have (line breaks in the here) doesn’t know what specific signal was used. following are editorial) When the preamble is serialized it does keep track of \def\foo#1{\iffalse\else\oof\else\oof\else\oof the last so-called match character. This is why we \else\oof\else\fi} see this inconsistency in rendering. which is not good. On the other hand, since no A simple solution would be to store the used expansion takes place in quick parsing mode, this signal for the match argument, which probably only will work: takes a few lines of extra code (using a nine integer \def\oof{\else} array instead of a single integer), and use that instead. \foo\oof I’m willing to see that as a bug in LuaTEX but when I ran into it I was playing with something else: adding which to TEX looks like: the ability to prevent storing unused arguments. But \def\foo#1{\iffalse\oof\oof\oof\oof\oof\oof\oof the resulting confusion can make one wonder why \oof\oof\fi} we do not always serialize the match character as #. So, a reference to an argument effectively is just It was then that I noticed that the preamble a replacement. As long as you keep that in mind, stored the match tokens and not the number and and realize that while T X is skipping ‘if’ branches E that TEX in fact assumes that no mixture is used. nothing gets expanded, you’re okay. And, after prototyping that in itself trivial change I Most users will associate the # character with decided that in order to properly serialize this new macro arguments or preambles in low level align- feature it also made sense to always serialize the ments, but since most macro packages provide a match token as #. I simply prefer consistency over higher level set of table macros the latter is less well confusion and so I caught two flies in one stroke. The known. 
But, as often with characters in TEX, you new feature is indicated with a \#0 parameter: can do magic things: \bgroup \catcode‘?=\catcode‘# \catcode‘?=\catcode‘# \def\foo #1#2?3{?1?2?3} \def\foo ?1?0?3{?1?2?3} \meaning\foo\space=>\foo{1}{2}{3}\par \meaning\foo\space=>\foo{1}{2}{3}\crlf \def\foo ?1#2?3{?1?2?3} \def\foo ?1#0?3{?1?2?3} \meaning\foo\space=>\foo{1}{2}{3}\par \meaning\foo\space=>\foo{1}{2}{3}\crlf \def\foo ?1?2#3{?1?2?3} \def\foo #1#2?3{?1?2?3} \meaning\foo\space=>\foo{1}{2}{3}\par \meaning\foo\space=>\foo{1}{2}{3}\crlf Here the question mark also indicates a macro \def\foo ?1#2?3{?1?2?3} argument. However, when expanded we see this as \meaning\foo\space=>\foo{1}{2}{3}\crlf result: \def\foo ?1?2#3{?1?2?3} \meaning\foo\space=>\foo{1}{2}{3}\crlf macro:#1#2?3->?1?2?3 =>123 \egroup macro:?1#2?3->?1?2?3 =>123 macro:?1?2#3->#1#2#3 =>123 shows us:

Hans Hagen TUGboat, Volume 41 (2020), No. 3 339

macro:#1#0#3->#1#2#3 =>13 machine. I didn’t carefully measure that so I might macro:#1#0#3->#1#2#3 =>13 be wrong here. Anyway, it’s always good to avoid macro:#1#2#3->#1#2#3 =>123 moving around data when there is no need for it. macro:#1#2#3->#1#2#3 =>123 Just to temper expectations with respect to macro:#1#2#3->#1#2#3 =>123 performance, here are some examples: So, what is the rationale behind this new #0 \catcode‘!=9 % ignore this character variant? Quite often you don’t want to do something \firstoftwoarguments with an argument at all. This happens when a macro {!!!!!!!!!!!!!!!!!!!}{!!!!!!!!!!!!!!!!!!!} acts upon for instance a first argument and then \secondoftwoarguments {!!!!!!!!!!!!!!!!!!!}{!!!!!!!!!!!!!!!!!!!} expands another macro that follows up but only deals \secondoffourarguments with one of many arguments and discards the rest. {!!!!!!!!!!!!!!!!!!!}{!!!!!!!!!!!!!!!!!!!} Then it makes no sense to store unused arguments. {!!!!!!!!!!!!!!!!!!!}{!!!!!!!!!!!!!!!!!!!} Keep in mind that in order to use it more than once In ConT Xt we define these macros as follows: an argument does need to be stored, because the E parser only looks forward. In principle there could \def\firstoftwoarguments #1#2{#1} \def\secondoftwoarguments #1#2{#2} be some optimization in case the tokens come from \def\secondoffourarguments#1#2#3#4{#2} macros but we leave that for now. So, when we don’t need an argument, we can avoid storing it and just The performance of 2 million expansions is the skip over it. Consider the following: following (probably half or less on a more modern machine): \def\foo#1{\ifnum#1=1 \expandafter\fooone \else\expandafter\footwo\fi} macro total step \def\fooone#1#0{#1} \firstoftwoarguments 0.245 0.000000123 \def\footwo#0#2{#2} \secondoftwoarguments 0.251 0.000000126 \foo{1}{yes}{no} \secondoffourarguments 0.390 0.000000195 \foo{0}{yes}{no} But we could use this instead: We get: \def\firstoftwoarguments #1#0{#1} yes no \def\secondoftwoarguments #0#2{#2} Just for the record, tracing of a macro shows \def\secondoffourarguments#0#2#0#0{#2} that indeed there is no argument stored: which gives: \def\foo#1#0#3{....} macro total step \foo{11}{22}{33} \firstoftwoarguments 0.229 0.000000115 \foo #1#0#3->.... \secondoftwoarguments 0.236 0.000000118 #1<-11 \secondoffourarguments 0.323 0.000000162 #2<- So, no impressive differences, especially when #3<-33 one considers that when that many expansions hap- Now, you can argue, what is the benefit of not pen in a run, getting the document itself rendered storing tokens? As mentioned above, the TEX engines plus expanding real arguments (not something de- do their own memory management.1 This has large fined to be ignored) will take way more time com- benefits in performance especially when one keeps pared to this. I always test an extension like this on in mind that tokens get allocated and are recycled the test suite2 as well as the LuaMetaTEX manual constantly (take only lookahead and push back). (which takes about 11 seconds) and although one However, even if this means that storing a cou- can notice a little gain, it makes more sense not to ple of unused arguments doesn’t put much of a dent play music on the same machine as we run the TEX in performance, it does mean that a token sits some- job, if gaining milliseconds is that important. But, where in memory and that this bit of memory needs as said, it’s more about unnecessary memory access to get accessed. Again, this is no big deal on a com- than about CPU cycles. 
puter where a TEX job can take one core and basically This extension is downward compatible and its is the only process fighting for CPU cache usage. But overhead can be neglected. Okay, the serialization less memory access might be more relevant in a sce- 2 Currently some 1600 files that take 24 minutes plus or nario of multiple virtual machines running on the minus 30 seconds to process on a high end 2013 laptop. The same hardware or multiple TEX processes on one 260 page manual with lots of tables, verbatim and MetaPost images takes around 11 seconds. A few milliseconds more or 1 An added benefit is that dumping and undumping is less don’t really show here. I only time these runs because I relatively efficient too. want to make sure that there are no dramatic consequences.

Representation of macro parameters 340 TUGboat, Volume 41 (2020), No. 3

now always uses # but it was inconsistent before, so A few side notes I’m willing to sacrifice that (and I’m pretty sure no The fact that characters can be given a special mean- ConT Xt user cares or will even notice). Also, it’s E ing is one of the charming properties of TEX. Take only in LuaMetaTEX (for now) so that other macro these two cases: packages don’t suffer from this patch. The few cases \bgroup\catcode‘\&=5 &\egroup where ConT Xt can benefit from it are easy to isolate E \bgroup\catcode‘\!=5 !\egroup for MkIV and LMTX so we can support LuaTEX and In both lines there is now an alignment character LuaMetaTEX. used outside an alignment. And, in both cases the I mentioned LuaTEX and how it serializes, but error message is similar: for the record, let’s see how pdfTEX, which is very close to original TEX in terms of source code, does ! Misplaced alignment tab character & it. If we have this input: ! Misplaced alignment tab character ! So, indeed the right character is shown in the \catcode‘D=\catcode‘# message. But, as soon as you ask for help, there is a \catcode‘O=\catcode‘# difference: in the first case the help is specific for a \catcode‘N=\catcode‘# tab character, but in the second case a more generic \catcode‘-=\catcode‘# explanation is given. Just try it. \catcode‘K=\catcode‘# The reason is an explicit check for the amper- \catcode‘N=\catcode‘# \catcode‘U=\catcode‘# sand being used as tab character. Such is the charm \catcode‘T=\catcode‘# of TEX. I’ll probably opt for a trivial change to be \catcode‘H=\catcode‘# consistent here, although in ConTEXt the ampersand \def\dek D1O2N3-4K5N6U7T8H9{#1#2#3 #4#5#6#7#8#9} is just an ampersand so no user will notice. {\meaning\dek \tracingall \dek don{}knuth} There are a few more places where, although in principle any character can serve any purpose, there The meaning gets typeset as (again, line break are hard coded assumptions, like $ being used for is editorial): math, so a missing dollar is reported, even if math started with another character being used to enter macro:D1O2N3-4K5N6U7T8H9->H1H2H3 H4H5H6H7 math mode. This makes sense because there is no H8H9don knuth urgent need to keep track of what specific character while the tracing reports: was used for entering math mode. An even stronger argument could be that TEXies expect dollars to be \dek D1O2N3-4K5N6U7T8H9->H1H2H3 H4H5H6H7H8H9 used for that purpose. Of course this works fine: D1<-d \catcode‘€=\catcode‘$ O2<-o € \sqrt{xˆ3} € N3<-n But when we forget an € we get messages like: -4<- K5<-k ! Missing $ inserted N6<-n or more generic: U7<-u ! Extra }, or forgotten $ T8<-t which is definitely a confirmation of “America first”. H9<-h Of course we can compromise in display math be- The reason for the difference, as mentioned, is cause this is quite okay: that the tracing uses the template and therefore uses \catcode‘€=\catcode‘$ the stored match token, while the meaning uses the $€ \sqrt{xˆ3} €$ reference match tokens that carry the number and unless of course we forget the last dollar in which at that time has no access to the original match case we are told that token. Keeping track of that for the sake of tracing ! Display math should end with $$ would not make sense anyway. So, traditional TEX, so no matter what, the dollar wins. Given how ugly which is what pdfTEX is very close to, uses the last the Euro sign looks I can live with this, although used match token, the H. 
Maybe this example can I always wonder what character would have been convince you that dropping that bit of log related taken if TEX was developed in another country. compatibility is not that much of a problem. I just tell myself that I turned an unwanted side effect into Hans Hagen ⋄ a new feature. http://pragma-ade.com

Hans Hagen TUGboat, Volume 41 (2020), No. 3 341

TEXdoc online — a web interface for serving The last noteworthy feature allows users to TEX documentation browse packages by topics. The list is retrieved from the (unmaintained) texdoctk.dat file available in Island of TEX (developers) the underlying TEX Live distribution. Within each Abstract topic a selection of packages is shown with the name, description and link to the main documentation for When looking for T X-related documentation, users E each. The topics and presented packages are not have many options, including running texdoc on exhaustive and many packages on CTAN or even in their local machine, looking up the package at CTAN, TEX Live will not be presented to the user. or using a service like texdoc.net. As the latter is known for lacking regular updates, the Island of TEX 2 Providing a RESTful API for texdoc decided to take the opportunity to provide a complete A RESTful API is a stateless interface to web appli- rewrite of the provided service using a RESTful API cations responding to typical HTTP requests with and a self-updating Docker container. uniform responses. Usually, JSON is used for the response format. Following this principle, our soft- 1 Core features of texdoc.net ware responds to HTTP GET requests with JSON The most important feature of texdoc.net is the representing documentation-related objects. documentation search as the prominent first item on The endpoints you can access are described as the landing page (cf. fig. 1). Searching for a package follows. Keep in mind that these requests will return yields a table with the entries a user could retrieve either HTTP status code 200 (OK) or, in the case locally by executing texdoc -l package . Each en- of any error, HTTP status code 422 (Unprocessable h i try is linked to the corresponding document and the Entity). The only endpoint that is guaranteed not user is able to either view (if there is browser view to fail is located at /version. support for the file type) or download the documen- /version This endpoint returns the versions of the tation file. This is especially useful for users without API and data format (api), the installed version a local TEX installation like Overleaf users. of texdoc (texdoc) and the date of the last TEX Live update as an ISO date string (tlpdb). Make sure your software always expects the correct API version to avoid problems. Our API versions follow semantic versioning with all its consequences. /texdoc/ name On this path, the list of entries is returnedh thati a local call to texdoc -l would produce. For each entry, there are two fields: path, containing the path to the documenta- tion file relative to the doc subfolder of the TEXMFDIST directory; and description contain- Figure 1: Screenshot of (top portion of) texdoc.net ing the file’s description if there is one (empty otherwise). The application will always return a JSON array of such entries. A little bit more direct is the use of the HTTP /serve/ name / index This call results in the doc- endpoint https://texdoc.net/pkg/ package ; this umentationh i fileh at indexi index of the result of responds with the file that texdoc h iwould /texdoc/ name being returnedh i to the client. open if executed locally. The user does not get to h i choose what to open but in most cases texdoc is good /pkg/ name This endpoint is a shortcut for the /serve/h i name /0 endpoint, defined to preserve at determining the proper documentation if given h i the package name, so users get what they want. compatibility with the API of texdoc.net. 
The aforementioned feature of creating simple /topics/list This endpoint returns the list of top- references to the documentation is what makes the ics known to the application specified by their service well suited for writing posts on web forums or key and a caption called details. This is a direct sites like the TEX StackExchange. Linking to CTAN interface to CTAN’s API for topics. Network mirrors gives far longer urls with many unnecessary access for the server software is required. path components compared to the short syntax of /topic/ name This endpoint returns details for a texdoc.net. topich by returningi the key (what is passed as

TEXdoc online — a web interface for serving TEX documentation 342 TUGboat, Volume 41 (2020), No. 3

name ), a string with a short description called run using your local Java installation. Running the h i details and a list of package names (strings) JAR bundle will open a local web server that can called packages. This is a direct interface to be accessed using the url http://127.0.0.1:8080 CTAN’s API for topics. Network access for the from your browser. It will not work without Internet server software is required. access (because of how it fetches topics) or a local TEX installation (for TEXdoc) though. 3 The new front end As an alternative to running the JAR file we The front end of TEXdoc online is structured in also provide a dockerized image that users can pull a similar way to texdoc.net. The main feature using (line break is editorial; this is a one-line string) is still searching for packages. This is based on registry.gitlab.com/islandoftex/images/ the /texdoc/ package endpoint presented in the texdoc-online:latest previous section.h The resultsi will be the same as on This image is based on our TEX Live images, which texdoc.net. makes it quite large in terms of file size but also elim- inates the need for a local TEX installation. Using Docker is the preferred solution for hosting your own instance of TEXdoc online. The Docker image also provides daily updates. The container will update daily at midnight (con- tainer time) and thus stays up to date as long as the current version of TEX Live is supported. Our con- tinuous integration ensures that you can always pull the latest tag and receive the latest and greatest TEX documentation, which when pulled and run will update itself. To ease deployment we provide a ready-made docker-compose.yml configuration for Docker Com- pose (https://docs.docker.com/compose/), an or- chestration tool for Docker containers. It uses the Figure 2: Screenshot of (top portion of) texdoc-online Caddy web server to provide automatic HTTPS with local certificates or, on public domains, Let’s Encrypt Topics are handled differently, though. We use certificates. Alter the localhost line within that CTAN’s JSON API to fetch their topics and pack- file to suit your needs and docker-compose up will ages belonging to these topics. Any user visiting the start a production-ready instance of TEXdoc online. landing page will be shown six random categories with a few packages each. If a category holds sev- 5 Final remarks eral packages, four of them are selected at random. In cooperation with Stefan Kottwitz we are working Users have the option to show all topics and list the on providing a hosted solution as a future replace- packages for any topic they are interested in. Also, ment of texdoc.net. Stay tuned for seeing this each package entry can be automatically queried for transition happen within the next months. documentation. This covers many more packages TEXdoc online is still a work in progress and than the old texdoctk.dat file, though it does have there is room for improvement. We are working on the disadvantage that some of CTAN’s topics are new features as well as considering ways to extend sparsely populated. the current front-end for additional, hosting-based The front page also offers a “jump to documen- content. The RESTful API, through the endpoints tation” section introducing the API and options to presented in section 2, allows external applications host the software yourself. These are covered in the and services to easily query TEX package documen- next section. tation from an updated lookup system. 
Feedback can be provided through the project 4 Deploying the software repository in GitLab. We look forward to hearing The source code for TEXdoc online is hosted at Git- from you! Lab and may be found at https://gitlab.com/ islandoftex/images/texdoc-online. There are Island of TEX (developers) also download options to get an executable JAR file ⋄ https://gitlab.com/islandoftex with all dependencies included that you can simply

Island of TEX (developers) TUGboat, Volume 41 (2020), No. 3 343

MMTEX: Creating a minimal and modern TEX distribution for GNU/Linux

Michal Vlasák

MMTEX stands for "minimal modern TEX" and is a simple, small and legacy-free distribution of TEX for GNU/Linux. It has the form of an installable package, offers the full functionality of the OpTEX and plain LuaTEX formats, and allows use of system fonts and resources in external TEXMF trees.

This article explains my motivation for creating it, and describes some aspects of a distribution in general and how they are handled in MMTEX. My goal is to show that things can be done simply, and that TEX can integrate better into a Unix system and not be the odd one out.

Motivation

I find TEX Live huge and complicated. Its "full scheme" installation, which is the default, takes up about 7 GiB, though with the "minimal" (plain TEX only) scheme and no documentation you can get to about 50 MiB. I also think that it doesn't fit very neatly into a Unix system. It isolates itself more or less in a single directory tree and doesn't follow the hier(7) standard; as two examples, it doesn't store configuration files in /etc and doesn't permit a read-only mount of /usr.

My end goal was to create something that:
• is a single package that can be installed using an operating system's package manager,
• integrates well into a system (respecting the filesystem hierarchy, using system fonts),
• includes only core functionality, but can easily be pointed to external TEXMF tree(s) with additional packages/files,
• doesn't complicate things with legacy, dynamic regeneration of various files, ...

The intended users are those who want a small (34 MiB) but functional TEX system, one comparable to LATEX and its many packages. Because of its small size and the ability to install it using a package manager, it can also be useful for Docker images or CI/CD scripts that need to set up a TEX environment.

Engine

Nowadays support for PDF output and OpenType fonts is a must. That leaves the choice of engine between LuaTEX and XeTEX. LuaTEX was chosen because it is extensible with Lua, has better support for micro-typographical extensions, and has integrated METAPOST in the form of mplib.

Compiling LuaTEX. Originally LuaTEX started with pdfTEX's sources (mostly written in WEB), but was translated to C. LuaTEX is somewhat confusingly developed in a repository that is essentially a subset of TEX Live's repository, but separate. TEX Live's build infrastructure is based on GNU Autotools and is able to compile all of TEX Live and external libraries for a lot of platforms, but it is very slow and does a lot of checks for things that have been standardized in C or POSIX for years.

After preparing replacements for the build-time generated files it is, however, possible to compile LuaTEX in more or less a single call to the compiler — the only catch is mplib. It is written in CWEB, and the CWEB tools are themselves written in CWEB, so bootstrapping is needed.

External libraries. LuaTEX uses several external C libraries. The most prominent one is of course Lua, but it also uses, for example, libpng to handle PNG images. There are two choices for how to use a library — either compile its source into the binary ("statically link") or, on each run of the program, find and load the compiled library file somewhere on the file system ("dynamically link"). The usual choice for most systems is dynamic linking — this allows reuse of a single library file by more programs (making updates easier) and saves disk space. It is a bit slower, because of the searching and loading.

The libpng and zlib libraries, for example, are often already present on systems or can easily be installed using a package manager. For these the dynamic linking approach is better. Other libraries carry LuaTEX's modifications (Lua) or are specific to TEX (Kpathsea), so sharing them would not be especially useful. These are statically linked.

Formats

The obvious choice of format for a minimal TEX would be plain TEX — or rather its adaptation for LuaTEX, which for example outputs to PDF by default. While it can be called minimal, it isn't "modern". Most users today expect a format to be able to easily create documents with numbered sections, tables of contents, bibliographies, hyperlinks, source code listings with syntax highlighting, etc. A recent format, based on LuaTEX, which provides these features but still keeps plain's simplicity, is OpTEX. In a sense it is even more powerful than LATEX — where LATEX needs a package or an external binary, OpTEX has it built in.

Both formats are included in the distribution. OpTEX is the primary one, while plain is for now included more or less just to allow running luatex without getting an error about a missing format.
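For instance, a minimal document needs nothing beyond the installed package. This sketch is mine, not taken from the MMTEX documentation; it assumes MMTEX installs the usual optex command of the OpTEX format:

  % test.tex -- compile with: optex test.tex
  \fontfam[LMfonts]   % load the Latin Modern Unicode family
  Hello from a minimal, modern \TeX/ installation.
  \bye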


Finding files

LuaTEX uses the Kpathsea library for finding files. Kpathsea uses path specifications and variables similar to the Unix PATH environment variable, but differentiates between file types. For each file type it maintains one or more associated variable names, a list of possible suffixes and, most importantly, a calculated search path (directories separated by colons). For example, bib_format's variables are BIBINPUTS and TEXBIB, while the suffix list contains .bib.

In a simple case a search path is determined in one of three ways, in order of significance:
• value of the associated variable set in the environment (that is, an environment variable),
• value of the associated variable from a texmf.cnf configuration file,
• default value set at compilation time.

For finding .cnf files the same path searching mechanism is used; the variable is TEXMFCNF, but as it cannot be set from a configuration file, only the first and last way applies. All texmf.cnf files that are found are read. Order is important — earlier assignments in configuration files override later ones.

I set all the useful file types to have defaults that work without any configuration file, and respect standards like hier(7), TDS and XDG. For example, the search path for .cnf files is:
• TEXMFDOTDIR (more on this later),
• ~/.config/mmtex (or more precisely its XDG equivalent),
• /etc/mmtex,
to allow local ("project"), user and system configuration files respectively.

In Kpathsea, default (compile-time) values can be set for search paths, but not for variables. For this I created a patch that "injects" default values for a few variables, as if they were read from a configuration file.

The TEXMFDOTDIR variable was inspired by TEX Live and is normally the current directory ("."), but is useful for temporary overrides on the command line, using environment variables. Every search path contains TEXMFDOTDIR as the first entry, even the one for .cnf files (allowing for project-specific settings).

TEXMF is the most important variable for MMTEX. It should contain the roots of all TEXMF trees. It is supposed to be set by the user or system administrator at whatever level of configuration they need at the moment, and doesn't have a default value (preferences of users and system administrators vary widely).

Language support

Previous TEX engines had the limitation of being able to load hyphenation patterns only at format creation time — when running iniTEX. LuaTEX has no such limitation; by using Lua, it is possible to load hyphenation patterns at runtime.

Today virtually all hyphenation patterns and exceptions that have been used by TEX users are distributed in the hyph-utf8 package. hyph-utf8 also provides patterns and exceptions in UTF-8 encoded text files, which are preferred for LuaTEX.

TEX Live's approach is to provide hyphenation patterns and exceptions for each language in a separate package. Each package then hooks itself in using the TEX Live execute AddHyphen directive. An example for French:

  execute AddHyphen \
    name=french synonyms=patois,francais \
    lefthyphenmin=2 righthyphenmin=2 \
    file=loadhyph-fr.tex \
    file_patterns=hyph-fr.pat.txt \
    file_exceptions=

This information is also written to files used by ε-TEX's language mechanism, which is used by plain LuaTEX. This gets added to language.def:

  \addlanguage{french}{loadhyph-fr.tex}{}{2}{2}

and this is written to language.dat.lua:

  ['french'] = {
    loader = 'loadhyph-fr.tex',
    lefthyphenmin = 2,
    righthyphenmin = 2,
    synonyms = { 'patois', 'francais' },
    patterns = 'hyph-fr.pat.txt',
    hyphenation = '',
  },

etex.src reads language.def at format creation time. Listed languages are registered and their hyphenation patterns loaded into the format. This enables their use later with \uselanguage.

In LuaTEX, it is discouraged to load patterns into the format, so the mechanism is changed by hyph-utf8's own etex.src. Instead of loading each pattern or exception file on \addlanguage, the language is only registered and the files are loaded at the first \uselanguage. Both commands use Lua code in luatex-hyphen.lua, which uses the information in language.dat.lua for handling synonyms and finding the names of the pattern files.

In OpTEX the situation is simpler. It doesn't read language.def because it already has that information, but it still uses luatex-hyphen.lua.

To support all languages in hyph-utf8, MMTEX generates the files language.def and language.dat.lua from the hyph-utf8 sources.
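With the generated language.def and language.dat.lua in place, loading patterns on demand looks like this in a plain LuaTEX document (a sketch of mine; the French sentence is invented, and \uselanguage is the ε-TEX mechanism described above):

  % French was only registered at format time;
  % the pattern files are loaded here, at the first \uselanguage.
  \uselanguage{french}
  Un texte français sera coupé selon les motifs français.
  \bye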


Fonts

To fully use the potential of LuaTEX, OpenType fonts should be used. These are the same fonts that are used by other programs, and as such some of them are already preinstalled on operating systems. And probably many more are additionally installed by users or administrators. To avoid duplicating any effort with the packaging of fonts, the distribution doesn't provide any OpenType fonts. The idea is to let users use the fonts they already have or can get on their system, as well as the fonts they have in their TEXMF tree(s). For example Latin Modern, the GUST e-foundry adaptation of Computer Modern which includes OpenType, is available as fonts-lmodern on Debian-based systems and as otf-latin-modern in other distributions.

8-bit fonts. Only 8-bit fonts can be preloaded into a TEX format. Both OpTEX and plain LuaTEX do this. To support this, MMTEX includes a minimal set of Type 1 fonts and their respective metric and encoding files. A pdftex.map file is needed, as it is used to map the names of TFM metric files to font names and font files, with optional reencodings. This file contains lines like:

  cmr5 CMR5

The standard TDS directories for font files also work: $TEXMF/fonts/{...}.

MetaPost

METAPOST is integrated into LuaTEX as mplib and available via a Lua interface. luamplib adapts the code from ConTEXt for plain (and LATEX), making it possible to use METAPOST in a .tex file. To use it, \input luamplib.sty.

mplib proved to be useful even as a METAFONT replacement. The "Ralph Smith's Formal Script" font (required by OpTEX) doesn't have prebuilt TFM files on CTAN. Normally one would use METAFONT to generate the metrics, but with a few lines of Lua, mplib can, just like METAPOST, use the mfplain.mp format to function as METAFONT and do the job.

Implementation of MMTEX

MMTEX itself is a few supporting files and a script called build which contains instructions for building MMTEX to a given directory. The script in this form allows a wrapper that packs together the directory and some metadata to create an installable package.
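As a brief illustration of the luamplib interface mentioned in the MetaPost section above (the picture itself is invented):

  \input luamplib.sty
  \mplibcode
  beginfig(1);
    % drawn by mplib while the document is typeset
    draw fullcircle scaled 2cm withpen pencircle scaled 1bp;
  endfig;
  \endmplibcode
  \bye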


UTF-8 installations of TEX

Igor Liferenko

Abstract

In its design TEX has the concepts of "internal encoding" and "external encoding". This fact allows TEX to work with any encoding.

We use Unicode as TEX's external encoding. Then we change the necessary parts of TEX to use UTF-8 as the input/output encoding.

The resulting implementation passes the trip test.

1. Initialization

Note: we use the web2w [1] implementation of TEX, but the ideas described here can be applied to any implementation.

First, we change the data type of characters in text files to wchar_t to accommodate Unicode values. Background: this predefined C type allocates four bytes per character (on most systems). Character constants of this type are written as L'...' and string constants as L"...".

(For brevity, in the diffs following, the original code from web2w's ctex.w source is preceded with < characters, and the new code with >. Both are sometimes reformatted for presentation in this article, and for readability we sometimes leave a blank line between the pieces. The actual implementation uses the file utex.patch [2].)

  < @d text_char unsigned char
  > @d text_char wchar_t

We use values from the basic multilingual plane (BMP) of Unicode.

  < @d last_text_char 255
  > @d last_text_char 65535

Then we change the size of the xord array [3] to 2^16 bytes.

  < ASCII_code xord[256];
  > ASCII_code xord[65536];

Elements in the xchr array [3] are overridden using the file mapping.w.

  @i mapping.w

This file specifies the character(s) required for a particular installation of TEX, for example:

  xchr[0xf1] = L'ë';

A complete example of mapping.w is here:
https://github.com/igor-liferenko/cweb

TEX_format_default is in TEX's external encoding.

  < ASCII_code TEX_format_default
  <   [1+format_default_length+1]
  <   =" TeXformats/plain.fmt";

  > wchar_t TEX_format_default
  >   [1+format_default_length+1]
  >   =L" TeXformats/plain.fmt";

It remains to set the LC_CTYPE locale category, on which depends the behavior of the C library functions used below

  setlocale(LC_CTYPE, "C.UTF-8");

and to add the necessary headers.

  #include <wchar.h>
  #include <locale.h>

2. Input

For automatic conversion from UTF-8 to Unicode, text files (including the terminal) must be read with the C library function fgetwc [4].

In ctex.w the macro get is used for reading text files, as well as font metric and format files. Text files are read in the functions a_open_in and input_ln. In a_open_in we replace the macro reset with its expansion, and then in both functions we change get((*f)) to (*f).d=fgetwc((*f).f)

3. Output

For automatic conversion from Unicode to UTF-8, text files (including the terminal) must be written with the C library function fwprintf [4].

In ctex.w the macro write is used for writing text files in all cases but one. So, we change fprintf to fwprintf in the definition of write. The one remaining case, where write is used for writing DVI files, just uses its old expansion.

In addition to redefining the macro write, we need to add the 'L' prefix to strings which are used in the macros that call the macro write. These changes are trivial and there are quite a few of them, so we will not list them here. Instead, we show the following cases, where the conversion specifier in the printf-style directives also needs to change:

  < wterm("%c",xchr[s]);
  > wterm(L"%lc",xchr[s]);

  < wlog("%c",xchr[s]);
  > wlog(L"%lc",xchr[s]);


  < write(write_file[selector],"%c",xchr[s]);
  > write(write_file[selector],L"%lc",xchr[s]);

4. The file name buffer

The name of the file to be opened, which is stored in the name_of_file buffer, must be encoded in UTF-8. Therefore, each character passed to append_to_name, before being added to name_of_file, must be converted to UTF-8. This is done using the C library function wctomb [4].

In the append_to_name macro, the variable k is used as the index into the name_of_file buffer where the last byte was stored. Originally, k was always increased and provisions were made that characters would not be written beyond the end of the buffer (which has the index file_name_size); name_length was then set to the minimal value between k and file_name_size.

We cannot do the same in our implementation, because we may reach the end of the buffer in the midst of a multibyte character. Instead, if the next multibyte character does not fit into the buffer, we prevent k from being increased by negating its value:

  < @d append_to_name(X) { c=X;incr(k);
  <   if (k <= file_name_size)
  <     name_of_file[k]=xchr[c]; }

  > @d append_to_name(X) { c=X;
  >   if (k >= 0) { /* try to append? */
  >     char mb[MB_CUR_MAX];
  >     int len = wctomb(mb, xchr[c]);
  >     if (k+len <= file_name_size)
  >       for (int i = 0; i < len; i++)
  >         name_of_file[++k] = mb[i];
  >     else
  >       k = -k; /* freeze k */ } }

In pack_file_name and pack_buffered_name (the functions that call append_to_name), we have to "unfreeze" its value if it was "frozen".

  if (k < 0) k = -k;

In make_name_string, each (multibyte) character from name_of_file must be converted from UTF-8 to Unicode, before being converted to TEX's internal encoding. This is done using the C library function mbtowc [4].

  < append_char(xord[name_of_file[k]]);
  > { wchar_t wc;
  >   k += mbtowc(&wc, name_of_file+k,
  >               MB_CUR_MAX) - 1;
  >   append_char(xord[wc]); }

In the code checking format_default_length for consistency, we use the C library function wcstombs [4] to count the number of bytes in the UTF-8 representation of TEX_format_default.

  < if (format_default_length >
  <     file_name_size)

  > if (wcstombs(NULL,
  >     TEX_format_default+1,0) >
  >     file_name_size)

In the function pack_buffered_name, the code that drops excess characters assumes that each character is one byte:

  if (n+b-a+1+format_ext_length >
      file_name_size)
    b=a+file_name_size-n-1-format_ext_length;

But the number of bytes used to represent a character in UTF-8 may be more than one. Therefore, we use an equivalent method to drop excess characters, one which will work with multibyte characters: after appending the contents of buffer[a..b] to name_of_file, we roll back in it character by character until the format extension fits. We use the C library function mblen [4] to determine the start of the next (multibyte) character to be discarded.

  while (k+wcstombs(NULL,TEX_format_default+
         format_default_length-
         format_ext_length+1,0) >
         file_name_size) {
    k--;
    while (mblen(name_of_file+k+1,MB_CUR_MAX)
           ==-1)
      k--;
  }

References

[1] Ruckert, Martin. WEB to cweb.
    mruckert.userweb.mwn.de/hint/web2w.html
[2] Source of the present implementation.
    https://github.com/igor-liferenko/tex
[3] Knuth, Donald E. TEX: The Program, 1986.
    ISBN 0201134373.
[4] Single Unix Specification. Introduction to ISO C
    Amendment 1 (Multibyte Support Environment).
    http://unix.org/version2/whatsnew/login_mse.html

⋄ Igor Liferenko
  igor.liferenko (at) gmail dot com


OpTEX — A new generation of Plain TEX

Petr Olšák

Introduction

OpTEX [1] is a new format prepared for LuaTEX. It keeps Plain TEX's simplicity, adds the power of the OPmac macros [2] and leaves behind the old obscurities with non-Unicode fonts and with various TEX engines. It provides a powerful font selection system, colors, graphics, references, hyperlinks, syntax highlighting, preparing indexes and bibliography listings. All these features are implemented at the TEX macro level and they are ready to use without any external program. OpTEX is a new Plain TEX suitable for the present.

OpTEX was introduced in February 2020 and uploaded to CTAN. Now, it is ready to use in both TEX Live and MiKTEX with the basic command optex document. It underwent significant development in the first half of 2020, so please take the most recent version of it (from CTAN or a distribution) if you want to experiment with it.

First example

A question was given on TeX.StackExchange [3]: "How can I write the symbol ť in TEX?" Unfortunately, the accepted answer is typical for the old days of TEX: use a macro package which loads a font in an obscure 8-bit encoding and use the command \textsubwedge{t}. For me, the question makes no sense and the answer sounds like something from the last millennium. Imagine a little modification of this question: "How can I write the symbol K in TEX?" And the answer should be: use package XY, then you can use the \printthischarK command. But the normal answer is: "Use K in your text."

My answer, second on that StackExchange page, was: "Use OpTEX and then use the symbol ť normally. If a font supporting this character is loaded* then there is no more problem." For example:

  \fontfam[]
  Symbol ť
  \bye

There was an addendum to my answer, essentially this: You can define \def\t{ť} if your text editor or keyboard does not comfortably support typing the ť. Of course, TEX supports the command \def for such situations (among many others).

* More precisely: it is not a single character in this case but this is only a technical detail.

Unfortunately, there are many similar "problems"; you can see them at StackExchange or elsewhere. These "problems" shouldn't exist if we leave the old way of thinking about TEX. Today, there are plenty of good Unicode fonts. Simply, use them.

The second typical matter can be found in the accepted answer of that same StackExchange thread: "If you are using a special TEX engine then you must do \ifsomething ...\fi and you must load a package XY supporting this \ifsomething." This is absurd. OpTEX eliminates the need to deal with such issues. It supports only one modern TEX engine: LuaTEX. Simplicity is power.

The three-line source file from the previous example shows another very important characteristic of Plain TEX. You need not code your document,** you don't need to adapt yourself to a special computer language for your source files where there are many \begin{foo}...\end{foo}, for example. Simply write what you want. Of course, you have to decide what font family is used (the \fontfam command) and then just write the text. And use \bye if you want to say goodbye to TEX.

** Many questions at StackExchange begin with "My code is ... ".

Main principles of OpTEX

There are three main principles.
• The first one was mentioned in the previous paragraph. The author can write the document; he or she need not code the document in a computer language. The source text of the OpTEX document keeps the basic rules about tagging documents (chapters, sections, emphasized text, footnotes, listings of items, etc.), but there are minimal tagging marks, because we want to keep the text human-readable. There is a minimum of braces {...}, for example.
• The second principle of OpTEX says: don't hide TEX. Don't declare new parameters and new syntax constructions. If a user needs or wants to use TEX then he or she simply uses TEX. For example, we have the \hsize primitive register, and we don't declare a new one like \textwidth. If you need to set a value to the register then simply use \hsize=15cm; there is no alternative syntax like \setlength{\textwidth}{15cm}. Basic knowledge of the TEX primitive syntax is expected when using OpTEX. But it pays off.


• LATEX, ConTEXt and their many packages give users plenty of new parameters and options, and they create a new level of language (on top of TEX). When we are using or developing OpTEX, we don't need to go that same way. There is TEX for controlling the document, without new options and parameters. The macros of OpTEX are more straightforward and simple because they do not create a new level of syntax but only what is explicitly needed. If you need to make a change in the design of the document (for example) then you can copy the appropriate macro from OpTEX to your macro file and make the changes directly there. For example, there are macros \_printchap, \_printsec in OpTEX. Do you need a different design? Copy such macros to your macro file and declare your design there. This is the third principle of OpTEX, which establishes a significant difference from LATEX or ConTEXt and which keeps the macros simple.

Summary of features provided by OpTEX

The user manual of OpTEX is 21 pages. It contains hundreds of hyperlinks to a second part of the manual: the technical documentation (about 140 pages). The technical documentation is generated directly from the OpTEX sources. There are listings of all the OpTEX code with extensive technical notes about the code.

We introduce a few features of the OpTEX system here, giving just short overviews.

The font selection system. You can use the basic variant selectors \rm, \bf, \it and \bi as in Plain TEX, i.e. {\bf text}. Plain TEX does not define the \bi selector for bold italic, but OpTEX does because almost all font families used today provide this font variant.

You can choose the font family by the \fontfam command. WYSIWYG systems typically offer a menu for selecting the font family and this menu shows how text looks in the listed fonts. This is a great advantage of such systems. You can get similar information by writing \fontfam[catalog]. Then all font families registered with the OpTEX macros are printed like a catalog. Each font family is shown in all provided variants and the font modifiers given for each family are listed too. This "almost instantaneous" font catalog provides a sort of substitute for the interactive menus used in WYSIWYG systems.

Many font families provide font modifiers, for example \cond for condensed variants or \caps for capitals and small capitals. Usage of such a font modifier changes the font context, but does not select a new font directly. This is done only when a variant selector is used. The variant selector respects the font context given by previous font modifiers. For example \cond\it selects condensed italics, and if someone uses \bf in the same TEX group scope where \cond was declared then the bold condensed variant is selected.

There are many font modifiers declared among the font families. The set of available font modifiers depends on the selected font family. These modifiers can be independent of each other if the font family provides all such shapes. For example, \cond and \caps are independent, so you can set four font contexts by these two selectors and you can use four basic variant selectors: this gives 16 font shapes.

The settings of Unicode font features are implemented as font modifiers. This means that the current setting of font features is a part of the font context.

The setting of the font size is implemented as another font modifier. It means that the font size is a part of the font context too. If the family provides optical sizes, these sizes are respected by the OpTEX font selection system.

The font families are declared and registered by the font selection system in font family files. The family-dependent font modifiers are declared there. You can load more font families (by more \fontfam commands) and you can select between them by the family selectors like \Termes, \Heros, \LMfonts, etc. The font modifiers and variant selectors behave independently in each family, and they respect the selected family. For example, if you want to mix Heros with Termes, you can declare:

  \fontfam[Heros]
  \fontfam[Termes] % Termes is current
  \def\exhcorr{\setfontsize{mag.88}}
  \famvardef\ss{\Heros\exhcorr\rm}
  \famvardef\ssi{\Heros\exhcorr\it}

  Compare ex-height of Termes \ss with Heros \rm
  and Termes again.

This example shows several things:
• If multiple font families are loaded then the last one is selected as the current family.
• More variant selectors can be defined by the \famvardef command. The example shows a declaration of new \ss (sans serif) and \ssi (sans serif italic) variant selectors.
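A further sketch of mine in the same spirit (the selector name \bigit is invented; \typosize, \fontdef and \setfontsize are discussed below): a \fontdef declaration respects the current family and size context, so one declaration serves any loaded family.

  \fontfam[Termes]
  \typosize[11/13.5]                       % 11 pt text on a 13.5 pt baseline
  \fontdef\bigit{\setfontsize{mag1.5}\it}  % italic at 1.5 x the current size
  {\bigit This is italic Termes, one and a half times larger.}
  \bye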


• The font size can be set by the \setfontsize command. It provides more syntactic rules, but one of them is the keyword mag: the new font size is calculated as a factor of the font size currently selected.
• The Termes and Heros families have visually incompatible x-heights. We need to do the correction Termes = 0.88 Heros.

Macro programmers can declare font selectors directly with the \font primitive if the font name or font file name and its font features are known. Or the font selector can be declared by \fontlet\new=\ori ⟨sizespec⟩ if another font selector is known and we need only to set another font size of the same font. Finally, the font selector can be declared by the \fontdef macro if you can set the font by a variant selector and font modifiers.

The last case (by the \fontdef macro) respects the actual font family and font context when the \fontdef macro is used. If you change the font family before a set of \fontdef declarations then all declarations are re-calculated to the new font family. Another example: you can set a new default font size by \typosize[11/13.5] and all fonts declared by \fontdef will respect this new font size.

OpTEX loads only a few 8-bit Latin Modern fonts when its format is initialized. The Unicode fonts cannot be there due to a technical limitation of LuaTEX. It is supposed that these pre-loaded fonts will be used only for short experiments with OpTEX macros, not for processing real documents. A user should specify the font family with \fontfam first. This macro loads the Unicode variant of the fonts.

A lot of font families provided by OpTEX have the appropriate Unicode math font registered too. For example the Latin Modern fonts have Latin Modern Math, Termes has TEX Gyre Termes Math, etc. The \fontfam macro loads this Unicode math font too unless the user says \noloadmath.

Tagging the document. All the typical tags for documents are borrowed from OPmac. The op-demo.tex document shows the usage of such tagging. Chapters are marked by \chap ⟨title⟩, sections by \sec ⟨title⟩ and subsections by \secc ⟨title⟩. The ⟨title⟩ ends at the end of the current line (unless the line ends with the % character; then the title continues). This decision mainly respects user needs: to write the document simply. Today, long lines (more than 80 characters) are quite common. Macro writers have a little complication if they use \chap, etc., in their macros because the end of line is changed locally only at the input processor level, but this can be handled.

Labels for cross-references can be declared by \label[⟨label⟩] or by \sec[⟨label⟩] ⟨title⟩, etc. The labels can be used by the commands \ref[⟨label⟩] or \pgref[⟨label⟩].

Lists of items look like:

  \begitems
  * First idea.
  * Second idea.
    \begitems \style i
    * First subidea.
    * Second subidea.
    \enditems
  \enditems

Auto-generated listings. The table of contents can be printed by the command \maketoc and the index by the \makeindex command. The alphabetical sorting of the index is done at the TEX macro level with respect to the rules of the currently selected language. No external software is needed. Thus, you don't need to do more than specify the words to be indexed and write \makeindex. The index is reinitialized in each TEX run.

The situation is similar with bibliographies. OpTEX reads .bib database files directly, without the need of any external program, and creates these listings with respect to a big set of rules declared in the bib-style files. These rules and the customizing possibilities are described in [4]. OpTEX provides simple and iso690 bib-style files now.

Colors. Colors can be simply used, for example, {\Red something} or {\Magenta text}. They can be defined in three possible ways:

  \def \Red {\setrgbcolor {1 0 0}}
  \def \Magenta {\setcmykcolor {0 1 0 0}}
  \def \Brown {\setcmykcolor {0 0.67 0.67 0.5}}
  \def \Black {\setgreycolor {0}}

and the user can define more such colors. The respective color model (RGB or CMYK) is used in low-level PDF commands. If you need not mix color models in the PDF output then you can say \onlyrgb or \onlycmyk. Then the colors are recalculated to the desired color model as needed. This calculation is done only via simple math formulae; the visual feeling of the color may be changed. These two color models are not transformable from one to the other without loss of information.

OpTEX initializes two color stacks: one for normal text and a second for footnotes. Footnotes can span from one page to another independently of the main text, so we need two independent color stacks. You can create a long footnote in green, for example, with the main text in red at the same point.
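To make the color machinery concrete, a small sketch of mine (the name \MyBlue is invented; \setcmykcolor and \onlycmyk are the commands described above):

  \def\MyBlue{\setcmykcolor{1 0.5 0 0}}
  \onlycmyk   % recalculate all colors to CMYK in the PDF output
  {\MyBlue This sentence is printed in a custom CMYK blue.}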


The next page continues with the red main text and green footnote. All colors work without problems on the next page. (If you try to do the same in LATEX, you realize that it does not work without special care.)

The color blender macro \colordef is provided by OpTEX. It enables color mixing in the subtractive (CMYK) or additive (RGB) color model.

Graphics. The \inspic {⟨file name⟩} macro includes a graphics file in JPEG or PNG or PDF format (the last can be a vector graphic) at the current typesetting point as an \hbox. The width or height of the picture can be given by the \picwidth or \picheight parameters. Other parameters accepted by the \pdfximage primitive can be specified too. For example, you can select a given page from the included PDF file.

Inkscape (a free vector graphics editor) is able to save a vector graphic to a PDF file and its labels to a LATEX file. OpTEX is able to read both these files (the LATEX commands used by Inkscape must be emulated here). You can do this by the \inkinspic macro, which outputs the PDF graphic plus the labels. They are printed in the current fonts selected in the document.

OpTEX supports linear transformations using the commands \pdfrotate, \pdfscale and (in general) \pdfsetmatrix. All compositions of these operations are allowed too. The \transformbox macro does linear transformations and the real boundaries of the box are calculated with respect to the transformed material.

If the graphics need to interact with the text, then TikZ can be used (\input tikz). This works in Plain TEX too. But simple tasks can be done using OpTEX macros without TikZ (we are happy when TikZ is not loaded because TikZ is a very big package). For example, putting text into an oval or into an ellipse (its size depends on the amount of the text) can be done directly by the \inoval or \incircle OpTEX macros. A clipping path can be declared by \clipinoval or \clipincircle.

Hyperlinks. There are four types of internal links: cross-references, citations (bibliography), links from the table of contents or index, and hyperlinks to/from footnotes. There is one type of external link, generated by the \url or \ulink macros. The hyperlinks can be activated by the \hyperlinks or \fnotelinks commands. The user or macro programmer can declare more types of hyperlinks. Structured outlines (for PDF viewers) are automatically generated by the \outlines macro.

Verbatim text. Code listings can be placed between a \begtt and \endtt pair, or they can be included from an external file with, e.g., \verbinput (⟨fromline⟩-⟨toline⟩) ⟨file⟩.c. Inline verbatim text can be surrounded by an arbitrary character declared by the \activettchar macro. Nowadays, the most common usage is \activettchar` as a declaration. Then you can type `\relax` to print \relax. This tagging is inspired by the Markdown language and is used very commonly at StackExchange, for example.

Sometimes you need to use inline verbatim in titles or parameters of other macros. This doesn't work when the \activettchar character is used because there is a "catcode movement" in the parameter of `...`. OpTEX provides a robust alternative command for such situations: \code{⟨text⟩}. The ⟨text⟩ is printed detokenized with \escapechar set to −1. From the user point of view, all "sensitive" characters in the \code parameter ⟨text⟩ should be escaped. For example, \code{\\relax\{} prints \relax{. This can be used in titles of sections, etc., without problems.

Listings can be printed with highlighted syntax (typically colored). Such syntax highlighting is defined in hisyntax macro files and can be activated with \begtt \hisyntax{C} ... \endtt, for example. All processing is done at the TEX macro level without using any external programs. These hisyntax files are easily customizable. They support C, XML/HTML, TEX and Python syntax at this time; others may be added in the future. Users can declare more such files.

Languages. LuaTEX is the only TEX engine which enables loading hyphenation patterns for a selected language on demand inside the document. Thus, we need not preload all hyphenation patterns in the format. Hooray! OpTEX provides the language selectors \⟨isocode⟩lang (for example \enlang, \frlang, \delang, \eslang, \cslang). These commands load the hyphenation patterns of a given language when they are first used in the document, and switch to the loaded hyphenation patterns when they are used subsequently. Macro programmers can set more language-dependent macros; these macros are processed when an \⟨isocode⟩lang language selector is used.

Language-dependent phrases like "Chapter", "Figure", "Table" are automatically selected by the current value of the \language primitive register (this is used for hyphenation patterns). These phrases are declared in OpTEX via:


  \_langw en Chapter  Table   Figure    Subject
  %------
  \_langw cs Kapitola Tabulka Obrázek   Věc
  \_langw de Kapitel  Tabelle Abbildung Betreff
  \_langw es Capítulo Tabla   Figura    Sujeto

Pairs of quotation marks can be declared by \quoteschars ⟨clqq⟩⟨crqq⟩⟨clq⟩⟨crq⟩, for example \quoteschars“”‘’ for English. The first type of quotation marks can be printed by \"text" and the second type by \'text'.* Several languages have their \quoteschars predefined in OpTEX.

* When \quoteschars are declared, then the original Plain TEX macros \" and \' are redefined. This problem is discussed further in the following section about compatibility with Plain TEX.

Styles in OpTEX. The default design style of the document is inspired by Plain TEX: 10 pt/12 pt size of basic text, 20 pt \parindent, zero \parskip.

The command \report at the beginning of the document sets some typesetting parameters differently, suitable for reports. The \letter command sets a design convenient for letters.

If you write \slides then you can create presentation slides. This style is documented in the file op-slides.tex, which also serves as an example of the usage of this style.

Name spaces for control sequences. Suppose that the user writes \def\fi{Finito} into the document. What happens? When LATEX, ConTEXt or original Plain TEX is used then the document processing crashes. When OpTEX is used, then nothing critical happens. The user name space of control sequences allows names where only letters are used. If such sequences are redefined by users then this only affects their own usage and macros; it's not a problem for the internal macros of OpTEX. The internal macros of OpTEX do not use such control sequences.**

** There is only one exception: the control sequence \par is (unfortunately) hardwired into the TEX internal algorithms — it is the output of the tokenizer when an empty line occurs in the input. If the user redefines \par by mistake then processing may crash.

When OpTEX initializes, all TEX primitives and OpTEX macros have two representations, prefixed: \_hbox, and unprefixed: \hbox. OpTEX uses only the prefixed versions. This is the OpTEX name space. A user can work with the non-prefixed versions of control sequences. If he or she redefines them, nothing happens to the OpTEX internal macros.

The internal OpTEX macros not intended for direct usage by the user have only a prefixed form. And the control sequences never used in the OpTEX macros but offered to the user (\alpha and other sequences for math symbols) are defined only in unprefixed form (in the user name space).

The character _ always has category code 11 (letter) in OpTEX. You aren't forced to write \makeatletter or anything similar when you need to access the control sequences from the OpTEX name space. Simply use them. You can redefine these control sequences, but then presumably you know what you are doing. An example of when it's expected to redefine macros from the OpTEX name space was given above, where the macros \_printchap and \_printsec were mentioned.

The character _ has category code 11 in math mode too. It is defined as math-active for doing subscripts in math formulae. Cases like \int_a^b work too because they are handled in the LuaTEX input processor.

OpTEX uses the _ character only as the first character of control sequences. We suppose that macro package writers will use internal control sequences in the form \_pkg_foo. This is a package name space. Moreover, the macro writer does not need to see repeated \_pkg_foo, \_pkg_bar, \_pkg_other control sequences in the code, because there is the command \_namespace{pkg}. When it is used, then the macro writer can use \.foo, \.bar, \.other in the code, which is much more human-readable. These control sequences are converted to the internal \_pkg_foo, etc., automatically by the LuaTEX input processor.

Odds and ends. Logos are defined with an optional / character which can follow the control sequence; it is ignored if present. You can write, for example:

  \OpTeX/ is a new generation of Plain \TeX/
  with features comparable to \LaTeX.
  But \LaTeX/ needs to load about ten additional
  packages to have comparable features.

This source looks more attractive. We needn't separate such control sequences by {} or some similar construction.
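Returning to the package name space described above, a sketch of mine of how a package file might use it (the name demo and the macro are invented; I rely only on the \_namespace behavior as described):

  \_namespace{demo}
  \_def\.hello{\_message{demo says hello}}  % really defines \_demo_hello
  \.hello                                   % converted to \_demo_hello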

The command \lorem[⟨from⟩-⟨to⟩] produces the text "Lorem ipsum dolor sit". There is an interesting implementation of this macro: the 150 paragraphs of the text are not loaded into the OpTEX format. Rather, the first usage of the \lorem command loads the external file lipsum.ltd.tex from the LATEX package lipsum and prints the given paragraphs. The second and subsequent usages of the \lorem macro print the desired text from memory.

Compatibility with Plain TEX. All Plain TEX macros were re-implemented in OpTEX; nearly all features of Plain TEX are essentially preserved. But there are some differences.
• The internal control sequences like \p@ or \f@@t were renamed or completely removed. We don't support the "catcode dancing" with the @ character.
• The Latin Modern 8-bit fonts in the EC encoding are preloaded instead of the Computer Modern 7-bit fonts.
• The math fonts are preloaded in 7-bit versions comparable to Plain TEX plus AMSTEX. But if the \fontfam command is used then the preloaded 8-bit text fonts and 7-bit math fonts are not used; instead, Unicode text and Unicode math fonts are used.
• The control sequences for characters defined by \mathhexbox or in a similarly obscure way are not defined (for example \P, \L). We suppose that such characters should be used directly in Unicode. Only the \copyright macro is kept, but it is defined by \def\copyright{©}.
• The accent macros \", \', \v, \u, \=, \^, \., \H, \~, \`, \t are undefined in OpTEX. We are using Unicode, so all accented characters can be written directly and this is the only recommended way. You can use these control sequences for your own purposes. For example \" and \' are used for quotation marks when \quoteschars are declared, as mentioned earlier. If you insist on using the old accents from Plain TEX then you can use the \oldaccents command.
• The default paper size is A4 with 2.5 cm margins, not letter with 1 in margins. You can declare the default Plain TEX margins by the command \margins/1 letter (1,1,1,1)in.
• The page origin is at the top left page corner, not at the coordinates [1 in, 1 in] as in Plain TEX. This is a much more natural setting. These "1 in" values brought only unnecessary complications for macro programmers.
• The \sec macro is reserved for sections, not the math secant operator.

Tips and tricks. The web page [5] has a section Tips and Tricks when using OpTEX. This is inspired by the Tips and Tricks of OPmac [6]. OpTEX users can give a problem and I'll try to put the solution there.

My path from TEX to OpTEX

My first attempts at TEX were with Plain TEX in 1991. I realized that it was unusable with the Czech language until 8-bit support of fonts was ready. The concept of writing Ol\v s\'ak instead of directly Olšák is not viable for real Czech texts. And Czech hyphenation patterns cannot work in the former case either.

I became a member of the development team of CSTEX. The 8-bit fonts with the Czech and Slovak alphabet (CSfonts) were created using METAFONT, derived from the Computer Modern fonts. I created a macro to read the plain.tex file without the part loading the Computer Modern fonts. This part was replaced by loading CSfonts. The Czech and Slovak hyphenation patterns were added and the CSplain format was originated. I was (and still am) a maintainer of CSplain. In the mid-1990s, I added PostScript support to CSfonts and Czech and Slovak accents for the 35 base Adobe fonts.

I read The TEXbook and felt that the description of all TEX algorithms could be done more systematically and clearly. This was the reason why I wrote the TEXbook naruby ("TEXbook inside out", in Czech only) [7]. This book became a standard for TEX manuals in Czech. For example, computer science students were using it to help welcome their new colleagues.

LATEX was not the center of my interest because it seems to be more complicated. I wrote an article "Why I don't like using LATEX" (1997, in Czech) [8]. I have reread this article recently and I found that all its arguments are still valid. The main problem is that LATEX offers a "coding language", not a "human language", for writing documents. The second problem is that it tries to hide TEX by creating a new level of language. But this is impossible because TEX was not designed for such a task. LATEX users will still see TEX messages like "extra alignment tab has been changed to \cr". This language is different from the language used in LATEX manuals, so LATEX users are lost. From my point of view, it is unfair to TEX users to hide TEX.

I created encTEX in 2003 [9]. It is a pdfTEX extension which supports input of UTF-8 encoded documents directly. The Unicode characters (represented by multi-byte sequences in UTF-8) are mapped by encTEX to one 8-bit character or to a control sequence which can be defined arbitrarily. The most important advantage is that each Unicode character is represented as only one token in TEX. This is the main difference from the LATEX package \inputenc.


I have been giving lessons about TEX to our students and learning from them. They represent a new generation, and their point of view and requirements for TEX are very important to me. One of the results of these discussions is: at present, only Unicode makes sense. I decided that encTEX development was a dead end. We have a sufficient number of quality Unicode fonts today; we don't need to do special mappings and complicated macros, we can use Unicode fonts directly.

As a maintainer of CSplain and (of course) a user of Plain TEX, I created many typical macros needed for document processing: creating the table of contents, fonts in different sizes, references, hyperlinks, etc. I released these home-made macros in 2013 as an additional package in CSplain called OPmac [2]. It works with CSplain or Plain TEX with all typical TEX engines. The great disadvantage of OPmac is that its technical documentation, though extensive, is only in the Czech language, because it was created as home-made documentation of home-made macros.

I created the template for student theses at our university based on OPmac and CSplain [10]. There are hundreds of satisfied users. It shows that Plain TEX is still viable today.

I planned to make a re-implementation of OPmac with new English documentation and with new features and internals. What TEX engine would be suitable for such a plan? XeTEX does not support all of the micro-typographic extensions introduced by pdfTEX, and does not offer the extensive customization of LuaTEX. Furthermore, it seems that XeTEX is no longer significantly developed, while LuaTEX has been declared substantively stable as of version 1.10 (http://www.luatex.org/roadmap.html). LuaTEX won.

I finished my reimplementation of OPmac by May 2020 and the result is called OpTEX.* The documentation of OpTEX expects knowledge of TEX basics. This is the main reason why I wrote a short encyclopedic document "TEX in a Nutshell" [11] (in English). The big reference "TEXbook naruby" [7] and the short summary "TEX pro pragmatiky" [12] are available only in the Czech language, unfortunately.

* You can guess why I had more time to do it in the year 2020.

I hope that OpTEX will find many users and thus gain more respect. I hope that it will be a good alternative to other currently used formats. It can show that the native principles of TEX do not have to be covered by new levels of computer languages, and that they can live at present: 40 years after the birth of TEX. My dream is to eliminate the widespread notion among TEX users that TEX is equal to LATEX. Will you help me with it?

References

1. http://petr.olsak.net/optex
2. P. Olšák: OPmac: Macros for Plain TEX. TUGboat 34:1, 2013, pp. 88–96. petr.olsak.net/opmac-e.html
   tug.org/TUGboat/tb34-1/tb106olsak-opmac.pdf
3. tex.stackexchange.com/questions/541064
4. P. Olšák: OPmac-bib: Citations using *.bib files with no external program. TUGboat 37:1, 2016, pp. 71–78.
   tug.org/TUGboat/tb37-1/tb115olsak-bib.pdf
5. petr.olsak.net/optex/optex-tricks.html
6. petr.olsak.net/opmac-tricks-e.html
7. P. Olšák: TEXbook naruby, 1996, 2000. 468 pp., ISBN 80-7302-007-6. Freely available. petr.olsak.net/tbn.html
8. P. Olšák: Proč nerad používám LATEX, 1997. petr.olsak.net/ftp/olsak/bulletin/nolatex.pdf
9. petr.olsak.net/enctex.html (2003–)
10. P. Olšák: The CTUstyle template for student theses. TUGboat 36:2, 2015, pp. 130–132. petr.olsak.net/ctustyle.html
    tug.org/TUGboat/tb36-2/tb113olsak.pdf
11. ctan.org/pkg/tex-nutshell (2020–)
12. P. Olšák: TEX pro pragmatiky, 2013, 2016. 148 pp., ISBN 978-80-901950-1-1. Freely available. petr.olsak.net/tpp.html

⋄ Petr Olšák
  Czech Technical University in Prague
  http://petr.olsak.net


Book reviews: Robert Granjon, letter-cutter, and Granjon's Flowers, by Hendrik D.L. Vervliet

Charles Bigelow

Hendrik D.L. Vervliet, Robert Granjon, letter-cutter; 1513–1590: an oeuvre catalogue. Oak Knoll Press, New Castle, DE, 2018, hc, 200 pp., US$75.00, ISBN 978-1584563761, oakknoll.com/pages/books/131957.

Hendrik D.L. Vervliet, Granjon's Flowers: An Enquiry Into Granjon's, Giolito's, and De Tournes' Ornaments, 1542–1586. Oak Knoll Press, New Castle, DE, 2016, hc, 248 pp., US$65.00, ISBN 978-1584563556, oakknoll.com/pages/books/127576.

These two books tell us essentially all that we know of the illustrious sixteenth century French type designer Robert Granjon. Thanks to six decades of meticulous research by Hendrik D.L. Vervliet, Granjon can now be ranked not only among the finest letter-cutters¹ of the French Renaissance but among the greatest of all time. Granjon was both astonishingly versatile and amazingly productive, cutting around ninety fonts, including romans, italics, and cursive scripts in the Latin alphabet, as well as Greek, Cyrillic, Arabic, Armenian, Hebrew, and Syriac among non-Latin scripts. He also cut music fonts and ornamental printers' flowers. Vervliet quotes Giambattista Bodoni, writing two centuries after Granjon's death, that: "Maestro Robert Granjon was the best letter-cutter ever." Bodoni, a preeminent type artist himself, prized regularity, clarity, good taste, and grace in typefaces and evidently saw those qualities in Granjon's work.

¹ Granjon would today be called a "type designer", but that term was not used until the twentieth century. Vervliet instead uses "letter-cutter", signifying the material technology of type employed from the fifteenth through the nineteenth century, when types were created by skilled artists who used hard steel engraving tools and files to cut and shape the forms of letters at actual size in steel punches. The punches were struck into copper matrices, producing concave impressions of the letters. From the matrices, types were cast in lead alloy, with the letter forms in relief, to be inked and impressed onto paper in printing. A common term for those type artists has been "punch-cutter", which denotes the technique but omits the art. A sculptor analogously described might be called a "marble cutter" or "clay shaper", and a painter, a "color dauber" or "pigment brusher".

The sixteenth century has been called the Golden Age of Typography; the most famous letter-cutter of the era was Claude Garamond, renowned for his roman types. More than fifty digital type families today are named Garamond or Garamont, though many are not based on Garamond's actual designs.

Granjon's name, though well known in his time, is borne today by only one type family, which combines a roman based on a Garamond design with an italic based on a Granjon design. The only type family today that authentically, and brilliantly, embodies both Granjon's roman and his italic designs is not named "Granjon" but "Galliard", the English spelling of "Gaillard", the original name of a small (around 8 point) roman that Granjon cut around 1570. (This review is typeset in Galliard, both roman and italic.)

Before and during Granjon's early career, italic was often used in body text instead of roman, but later, italic was used in subordination to roman — usually a Garamond roman with a Granjon italic. By the seventeenth century, italic was commonly paired with roman in type families, a practice continuing to the present. Hence, though Granjon's italics have often been imitated in Garamond type revivals, they are typically known as "Garamond Italic".

Perhaps paradoxically, the subordination of italic to roman gave Granjon greater freedom of expression and invention in cutting italics, which could be more exuberant and stylish because they did not need to support fluent reading of long texts. Granjon cut some thirty different italics in at least three different styles. His "pendante" or "couchée" is a strongly slanted style that has most often been revived as a companion to Garamond romans. His "droite" is a narrower and more upright style, revived and modified by Jan Tschichold for Sabon, a modern revival of Garamond roman and Granjon italic, produced for Monotype and Linotype hot-metal composition machines and then adapted to photo- and digital type. A third italic style of Granjon resembled models of Italian chancery cursive like those of the writing manuals by Arrighi and Tagliente. This chancery style is the italic in the Galliard family.

In addition to plentiful details, dates, and examples of Granjon's typefaces, Vervliet provides a biography of the man. Robert Granjon was born around 1513 in Paris, son of a bookbinder and publisher. He was apprenticed to a goldsmith and afterward turned to letter cutting. The earliest types reliably attributed to him are a Roman on "Long Primer" body (≈ 10 point today) in 1542 and an Italic on "English" body (≈ 14 point today). Though working in Paris from


around 1542 to 1552, Granjon traveled to Lyons and sold types to printers there, especially Jean de Tournes, for whom he also cut some exclusive types. Around 1553, Granjon moved to Lyons and later married Antoinette Salomon, daughter of Bernard Salomon, an eminent illustrator, engraver, and book designer for De Tournes. He apparently left Lyons around 1562, worked for a time in Geneva and Strasbourg, and in 1564 moved to Antwerp, where he sold types and custom cuttings to Christophe Plantin, known as the king of printers. Granjon moved back to Paris in 1570, to Lyons from 1575 to 1578, and then to Rome, where he cut punches for some two dozen non-Latin types and printers' flowers for the Vatican, until a few months before his death in 1590.

The types of Granjon became internationally popular when French romans and italics, betokening fashionable Renaissance humanism, became popular not only in France, but also in Italy, Switzerland, Spain, and the Netherlands. In his first ten years of letter-cutting in Paris, Granjon cut around 24 fonts: 11 romans, 9 italics, 2 Greeks, and 2 musics, and 12 fleurons (typographic "flowers" or ornaments). These amount to some 2,500 punches, averaging 250 punches per year, an impressive number considering that he also cast ("founded") some of his types and was also a publisher, producing around 14 books, including works of religion, science, history, music, and literature. He spent nine years in Lyon, cutting around 2600 punches, including 10 fonts of italic, 4 Greeks, 2 Civilités (a cursive French blackletter), 4 musics, 12 sets of fleurons, and two sets of large script typefaces (the models cut in wood and cast in sand).

By the time of his death, Granjon's italics dominated European humanist (non-blackletter) typography. In Christophe Plantin's 1585 Folio Specimen, twelve of thirteen italics are by Granjon. In the 1592 Egenolff-Berner specimen produced in France, seven of the eight italics are by Granjon.

There is no known portrait nor physical description of Granjon the man, but his career seems all the more impressive and romantic when we consider his travels. As a free-lance letter-cutter, Granjon presumably rode on horseback from city to city, carrying his gravers and files in a box, along roads sometimes rough and hazardous. The gunslinger Paladin in a 1950s American television show had a calling card that read, "Have Gun Will Travel". Granjon's card could have read, "Have Graver Will Travel".

Granjon's Flowers

In addition to alphabetic and music types, Granjon cut ornamental printers' flowers or "fleurons". Ornamental flowers, presumably inspired by Islamic patterns or "arabesques" in metal work, bookbinding, and other crafts, had been used in European printing long before Granjon. In Granjon's Flowers, Vervliet provides a concise historical analysis of four stylistic phases of French ornamental typography in the sixteenth century, placing Granjon's work in the fourth phase. Beginning in the 1540s, Granjon was cutting single ornaments, and by the mid-1560s, he was creating complex sets of modular flowers that could be combined into a wide variety of patterns. As Vervliet puts it, these were "arabesque patterns based on a continuous repetition of non-figurative foliated elements and undulating, intertwined, schematized and denaturalized stems". The cutting of flowers gave Granjon a golden, or leaden, opportunity for abstract expression in typography, and he obviously reveled and excelled in it, cutting around fifty elegant, graceful, and original abstractions, which Vervliet has painstakingly tracked down, identified, and documented in an impressive display of typographic detective work.

This definitive study provides examples, dates, and notes of all the Granjon flowers that Vervliet has identified, accompanied by an extensive bibliography.

Several of Granjon's flower designs were revived by Monotype in the 1920s and have been known as "Granjon Arabesques". For decades, they have intrigued and delighted typographers who have explored possible combinations by rearranging the elements. Most of these works were printed letterpress in limited-edition books now unobtainable or astronomically priced, but Jacques André has made a brilliant digital edition and translation of Swiss typographer Max Caflisch's playful explorations of Granjon's Arabesque, "Kleines Spiel mit Ornamenten" (in French translation: "Petits jeux avec des ornements"). It includes examples of the ornaments in use in historical books along with reproductions of Caflisch's studies, not by scanning digitization but by new compositions using digital fonts of Monotype's Granjon Arabesques. It is available at jacques-andre.fr/ed/caflisch-jeux.pdf.

Both books were handsomely designed by Alastair Johnston, fine printer and typographic scholar. Robert Granjon, letter-cutter is composed in Stempel Garamond, which has a Granjon italic, and Granjon's Flowers in Linotype Granjon, with a Garamond roman and Granjon italic.

In closing, two references; and sample pages are reproduced following.

Bigelow, C. On Type: Galliard. Fine Print 5(1):27–30, 1979. Reprinted in Fine Print on Type, 1990.
Carter, M. Galliard: a modern revival of the types of Robert Granjon. Visible Language 19(1):77–97, 1985.

⋄ Charles Bigelow
  https://lucidafonts.com


All pages reproduced from Robert Granjon, letter-cutter.


Book review: Glisterings, by Peter Wilson

Boris Veytsman

Peter Wilson, Glisterings. LATEX and Other Oddments. TEX Users Group, Portland, Oregon, USA, 2020, paperback, 130pp., US$15.00, ISBN 978-0982462621.

Many many years before people were reading tex.stackexchange.com on their smartphones, the primary forums for asking TEX questions were the list [email protected] (still in lively existence) and the Usenet group comp.text.tex (also still in existence, though not so lively), or ctt as it was known to aficionados. I guess that before texhax and ctt, in the prehistoric times when people exchanged TEX tapes and freshly knapped stone axes, there were TEX-related BBSes, but I do not have reliable information about those. Anyway, many old timers remember those long Usenet discussions about TEX tricks, where Peter Wilson was prominent with his clever solutions and lucid explanations. Even more people know Peter Wilson for his memoir class — a great example of exquisite typesetting. The manual of the class is a great introduction not only to the class itself, but also to the typesetting of books in general.

Starting in 2001, Peter published a column Glisterings in TUGboat. The articles consisted of fairly short discussions of tricky LATEX problems and their solutions. Some material of this column was based on the discussions in ctt and texhax. For many years these articles were the favorite part of the journal for me — and, I guess, for numerous other readers.

Now TUG has published these columns in a book, and one can reread them collected together. After doing this myself, I recognized how much my TEX style was shaped over the years by them. It was a joy to recall the times when I discovered these little gems of TEX programming — and to find the things overlooked at the first reading.

The subjects in the articles vary from complex problems like string parsing to the textbook-like explanation of subjects like the ways of defining new macros in TEX and LATEX. The columns about paragraphs and their shapes are probably among the most useful in the book. The TEX way of dealing with paragraphing is rather complex, and Peter explains it with his characteristic lucidity and clearness.

Besides TEX and LATEX, the book discusses fonts, ornaments and other printer devices, MetaPost (the image on the cover, a spidron, is created with MetaPost; see Figure 1), and many other topics. As another proud owner of Lanston Type Company’s LTC Fleurons Granjon (LTC is now a division of P22 Type Foundry, p22.com), I was especially interested in the chapter about using the Fleurons in LATEX. It gives very useful advice on getting the glyphs aligned.

Of course TEX has changed over these years. While some problems discussed in the book now have somewhat more straightforward solutions (for example, string manipulations today would probably be based on the xstring package), it is surprising how much of the book is still very relevant today and is required reading for an aspiring TEXnician.

The style of Peter’s writing is fascinating. The author’s subtle British humor makes the reading pleasant and far from the dry stuff filling many technical books. The reader meets this self-deprecating humor on the first page of the Introduction:

    For many years Jeremy Gibbons edited a very successful column in the TEX Users Group journals TEX and TUG NEWS and TUGboat called Hey — It works! [52]. I learnt much from this but apparently not enough to decline when asked to take over the column. On the other hand I have learnt to my cost that the quickest way to get a correct answer to a question on the comp.text.tex (ctt) newsgroup is to give an incorrect answer. In order not to sully Jeremy’s reputation my first thought was to change the title to Hey — It might work but after some consideration the new title is as you see it earlier — Glisterings — implying that there might be some dross among the nuggets.

and continues to walk along with it during the journey through the book.
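(As an aside on the xstring remark above: the package’s two workhorse commands look like this in use. This is a minimal sketch with invented test strings, not an excerpt from the book.)

   \documentclass{article}
   \usepackage{xstring}
   \begin{document}
   % Substring test: typesets "yes", since "text" occurs in "comp.text.tex".
   \IfSubStr{comp.text.tex}{text}{yes}{no}

   % Substitution: typesets "Hey, it might work".
   \StrSubstitute{Hey, it works}{works}{might work}
   \end{document}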



Figure 1: A page from the column about spidrons
Figure 2: A page from the column about ornaments

Another part of the book’s charm is the epigraphs throughout. They are funny and wise, providing a surprising counterpoint to the point of the text. For example, the section about changing the layout starts with a note by Samuel Johnson,

    Change is not made without inconvenience, even from worse to better.

The section about the “superstitious” version of enumerate designed to eliminate an item no. 13 is accompanied by the apt passage from George Orwell’s 1984:

    It was a bright cold day in April, and the clocks were striking thirteen.

The book is lovingly typeset by Peter, reminding us that the author is a TEX wizard and typesetter of a very high caliber. The techniques discussed in the book are illustrated by the book itself; for example, a page of the column about ornaments has a nice frame around it (Figure 2).

When compiling the columns in the book, the author gathered the references from each into a joint bibliography, and added an index — a very useful device for a compendium of disjoint materials like this one. He also rearranged epigraphs and deleted the repeated ones. The resulting book is a welcome addition to a library of a (LA)TEX student or even TEXnician — indeed, even for those of us who have a collection of TUGboat issues for the last twenty years. I wholeheartedly recommend it.

Glisterings is the third book published by TUG, after a volume of interviews (2009) and the 2⁵ anniversary collection of papers (2010). This is a fitting continuation of the series. They are all available from tug.org/books, as well as in the general online stores. I hope that we keep publishing the books under our own imprint — perhaps next time without a 10 year hiatus.

⋄ Boris Veytsman
Systems Biology School, George Mason University
Fairfax, VA
borisv (at) lk dot net
http://borisv.lk.net
ORCID 0000-0003-4674-8113


Historical review of TEX3

Peter Flynn

Abstract

While clearing out cupboards during lockdown, I came across several oddities, including this review of TEX3 which .exe magazine asked me for in April 1991, during negotiations over being taken over. They subsequently collapsed as a print publication, and the article was therefore never published, meaning it’s still my copyright. It was interesting to see what I had picked up on for them, as they had asked me to write it from the point of view of a new TEX user with experience in other systems.

It referenced five figures which were long since lost, but which I have reconstructed from memory and the descriptions. The original was formatted in unmarked monospace, according to the publisher’s requirements. This version has been used without substantive textual change for TUGboat, but the plaintext symbolic notations like \TeX\ have been replaced with the genuine logos, and URIs have been marked, some previously unnoticed typos have been corrected, and oddities of plaintext markup have been converted to LATEX.

Most companies, addresses, electronic access, etc., mentioned no longer exist. The references are kept as a matter of nostalgia and history. The article begins below the rule.

----------------------------------------

The grand-daddy of DTP systems offers a radically different approach to typography on the desktop. Peter Flynn investigated and found it still has a lot to offer.

Introduction

Out of the current soup of DTP systems, a few leaders have emerged among the visual packages. Ventura, Pagemaker, and Quark XPress each has its own benefits and peculiarities, as any user will have discovered, but there is another form of DTP altogether, known as ‘logical’, which is where systems like TEX fit in.

Contrary to popular opinion, DTP was not invented in the mid-80s on the Apple Mac, but in 1978 on a DEC-20 minicomputer. The program was TEX, and it allowed users for the first time to produce printers’ quality typesetting from their terminals, using the new ‘laser-beam’ printers. Heady stuff indeed at the time, given that word-processing had barely yet been given a name.

The difference between visual and logical systems is straightforward enough once you understand what it is you’re trying to do (the terms ‘visual’ and ‘logical’ were coined ca. 1988 [4]).

Visual systems rely on the operator’s skill in manipulating a bitmapped screen image, usually by hand, eye and mouse. The relationship between text structure and appearance is undefined, or specified by simple tags in style sheets. The method allows freehand adjustment to appearance, such as positioning, scaling, distorting or overlaying of graphics and text, but places a high degree of reliance on manual dexterity and the visual judgment of the user. Each page is normally made up individually, so that any knock-on effect of changes must also be attended to.

Logical systems rely on the author’s or editor’s skill to bind structure tightly to appearance, using commands specified in a style file or embedded in the text to define positioning and typography. The text file is edited externally, and processed in its entirety to ensure the effect of changes is properly accounted for from page to page. Adjustments to appearance are made by changing the commands in the text, and by the use of fully programmable macros. This largely removes the need for manual or visual intervention in long or repetitive texts, or in applications where automation, regularity or dimensional accuracy is required.

Origins

TEX is, in effect, a document compiler, taking a piece of source ‘code’ (your text) and acting on the instructions you embedded in it. The original program was written by Don Knuth to typeset a new edition of his famous Art of Computer Programming series, because ordinary commercial typesetters either couldn’t handle the technical matter, or charged too much. (The lowered ‘E’ in the TEX logotype distinguishes it in print from TEX, an old Honeywell editor, for copyright reasons, and emphasizes the relationship with typography.)

He generously made TEX publicly available, so it can be had in commercial and non-commercial versions for almost every machine in existence from Apple IIGS, Amiga, and Archimedes, up through PC and Mac, to Sun and other UNIX systems, VAX/VMS, IBM and other mainframes, and even the Cray (that should shift some lead!). All implementations are compatible with each other, differing only in memory capacity and speed, depending on the operating environment. This level of availability must make it the most widely-spread DTP system in use: certainly in today’s increasingly networked environment it is a significant factor to be able to establish completely compatible DTP across almost every platform with no need for investment in special hardware.


TEX has had a mixed reception in the past. It has variously been criticised for being difficult to use, lacking graphics, only having one typeface, for not being WYSIWYG and for not being a visual-based system (about as sensible as criticising a fish for not being a chicken). Part of the reason is that, coming from the academic world, there was no one to trumpet its abilities or explain its working. Some reviewers who should know better have not distinguished between visual and logical systems, and some were making their comments based on old versions, hearsay, and incomplete information. However, as we shall see, it is in fact quite easy to use, can handle plenty of graphics, lots of typefaces, and has some of the best screen displays around.

Quality is one area that has always been commented on favourably, and it is carefully guarded: for an implementation to call itself TEX it must pass a stringent test specified by Knuth.

Availability

Perhaps one of the reasons for the ‘well-kept secret’ ethos surrounding TEX is the fact that the original source code was placed in the public domain. Anyone can have this code for the asking: it was written in WEB, a ‘literate programming language’ also devised by Knuth, which produces Pascal source. There are many commercial versions of TEX as well, written by companies or individuals who have optimised the code for specific operating systems, and ship the system as a package. Because of its popularity and heavy use in research and technical typesetting, and in the publishing trade and the academic field, there has been little need for glossy full-colour whole-page adverts for it in magazines aimed at the domestic or office market. This is perhaps a pity, as many visual DTP users who need reliable, accurate and automated formatting are still doing their work manually, unaware that TEX exists!

Those who have anxiety attacks at the mention of the phrase ‘public domain’ can take comfort in the fact that it is the source code itself which is available, so if you have misgivings about viruses, you can check it and compile it yourself. There are no known instances of an infected copy of TEX being distributed, and all the public versions available from the regular sources are checked out before being made available. If in doubt, of course, buy a commercial copy.

My review commercial copy (PC-TEX) came from UniTEX Systems (for details see end) and my public-domain ones (emTEX and SBTEX) came from the TEX server archive (TEX.AC.UK) at Aston University, Birmingham, via the email network. I also used an older commercial VAX/VMS version (Kellerman & Smith) and a PD one for the Macintosh called OzTEX. I didn’t have a UNIX machine available, so I was unable to investigate UNIX-flavoured TEX personally, but given the rigorous quality control, it is reasonable to assume it performs as claimed.

There are other public-domain and shareware versions too, from a variety of bulletin boards (e.g. CIX), user groups and file server hosts on the wide-area networks all over Europe and the United States. UK, Irish and continental European users of normal internetwork email can order from Aston or Heidelberg (for details see end) but if you’re trapped on BT Gold or EirMail, forget it and contact your national User Group. In addition there are other regular commercial versions such as Textures and TurboTEX (for the Mac and PC respectively) and a low-cost commercial version for the PC (DOSTEX).

Installation

TEX for desktop machines needs about 5MB of disk space to live in, and runs in 512KB of memory. On larger machines, where there has traditionally been more disk space, implementations tend to spread themselves around a bit more. If you want to add more fonts, design packages, CAD, specialist macros, foreign-language hyphenation and the endless other goodies, you will need more room. On the other hand, if all you want to do is publish, say, a typeset database listing, you can trim right down to just over 1MB (plus your own data/text storage, of course).

PC-TEX came in the usual style of PC software binder: the INSTALL routine worked perfectly and took around 15 minutes to unpack everything from the nine disks. emTEX arrived as several .BOO archive files (encodings into printable characters which allow 8-bit binary to traverse the 7-bit email networks) which DEBOOed and unZIPped (with the -d option) to recreate all the files and the whole directory structure, in about 20 minutes. I wish some other software I could mention unpacked with even half the care and intelligence that has been spent on organising both these versions. The Mac version I unpacked in a similar manner from BINHEXed STUFFIT archives without problems. The VAX software originally came on standard half-inch magtape and took rather longer, because of the need to establish logical names and pathways to cater for multiuser operation.

Drivers are available for most printers (i.e. those emulating IBM Graphics and ProPrinters, Epson FX and LQ, NEC Pinwriter, PostScript and HP LaserJet) and you can get or make drivers for almost anything which puts marks on paper, even a fax: if you have something esoteric, there is generic driver code available for you to roll your own.


There’s even a driver for a CalComp pen plotter if you want letters several feet high! Because TEX produces an output file which is entirely device-independent, you aren’t tied to using any particular supplier’s driver, or even any particular machine.

TEX itself knows nothing about fonts except the height and width of each character, plus a few other parameters, which it reads from font metric files. The print drivers, however, normally use bitmap font files, not outlines, so you need to order the right resolution with your printer driver (usually 180, 240, 300 or 360 dpi depending on your printer). Using bitmaps is a two-edged sword: you tend to get far better quality, but you are restricted to those sizes you have on disk. However, there is a companion type-design program for TEX called METAFONT, which can create additional optimised font files at any resolution for any dot-matrix or laser printer (and some typesetters). METAFONT is well worth having if you want extra odd font files at specific sizes, and if you’re into type design, this is probably one of the most powerful tools around. If you use PostScript, of course, you don’t need any bitmaps, just the font metrics. I shall have more to say about fonts later.

Documentation

The manual for all implementations of TEX is The TEXbook by Donald Knuth, published by Addison-Wesley [2]. This explains everything, in progressively finer detail. The early chapters are excellent, and can get even a total novice producing the goods very quickly. Learning TEX is not difficult, but like any DTP system, it is nevertheless not trivial, and the later chapters need some careful reading, as there is a lot in them. There are plenty of worked examples, though, far more than in any other manual I have ever seen (you can never have enough, in my view).

The documentation accompanying PC-TEX is a more digestible spiral-bound manual which summarises all the introductory matter of The TEXbook and looks less forbidding for a beginner [5].

The PD versions came with documentation on how to set up and run the programs (in most cases better explained than most ‘professional’ manuals) but they do assume you have The TEXbook for details of how to format text.

For the user who just wants to produce something with a relatively simple standardised layout, the LATEX macro package which comes with TEX is the easiest route, as it has several carefully-designed sample document styles. The LATEX book which describes these is also an Addison-Wesley publication but comes free with PC-TEX [3].

There are some very good introductory booklets available free as TEX input files, so you just process and print them. The best two I found were A Gentle Introduction to TEX by Michael Doob [1] and Introduction to TEX on VAX/VMS by Joe St Sauver [6]. This last one sounds a bit operating-system specific, but if you just ignore the VMS bits it is a very good guide. There are others covering the many add-ons to TEX, again usually supplied on disk as .TEX files for you to print yourself.

Operation

TEX operates in a radically different manner to visual DTP. You create your document in a plain ASCII file, embedding formatting commands in the text. When you run the program, it processes your file and makes a compact, portable device-independent typeset file, which is used by the preview and print drivers to produce the output. Because it doesn’t use a graphic display during processing, repetitive production jobs such as database publishing can run unattended once they are set up.

A test file (test.tex) which demonstrates the method of operation is supplied as standard (see Figure 1 and Figure 2 for the input and output).

   \noindent This is a very short test, to make
   sure the program is working correctly. This
   paragraph starts flush left, and shows the
   appearance of {\bf bold face} type.

   This paragraph is indented, and shows the
   appearance of {\it italics}. It contains the
   math formula $z*n=x*n+y*n$. Such a formula
   might also be displayed $$z*n=x*n+y*nS$$ to
   make it more prominent.
   \bye

Figure 1: Source text of the file TEST.TEX

   This is a very short test, to make sure the program is
   working correctly. This paragraph starts flush left,
   and shows the appearance of bold face type.
      This paragraph is indented, and shows the
   appearance of italics. It contains the math formula
   z∗n = x∗n + y∗n. Such a formula might also be
   displayed
                  z∗n = x∗n + y∗nS
   to make it more prominent.

Figure 2: Output of TEST.TEX (HP LaserJet III)
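Processing the test file is a single command, and the run looks roughly like this (an illustrative transcript: the prompt, command name, banner and byte count vary by installation):

   C:\> tex test
   This is TeX, Version 3.0
   (test.tex) [1]
   Output written on test.dvi (1 page, 668 bytes).

The resulting test.dvi is the device-independent file discussed next.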


The DVI (DeVice-Independent) file can be printed on any supported printer without the need to reprocess your source text: just run the relevant print driver. The concept is similar to PostScript, in that once you have the final version off your laser or dot-matrix printer, you don’t have to reformat or reprocess for a higher-resolution device. Unlike PS, however, you are not restricted to specific output devices or fonts.

All the formatting commands are mnemonic, so they are fairly straightforward to learn. They all begin with a backslash (e.g. \parindent=2em or \baselineskip=15pt, see Figure 1). The use of an ASCII file means you can continue to use your favourite editor (or wordprocessor in ‘non-document’ or ‘ASCII export’ mode), so there are no new menus or keystrokes to learn. No editor is supplied, as it is assumed that every machine already has one in some form or another. I’m uncertain about this: although it is easy to get many excellent editor/wordprocessor programs, it surely could have been possible to throw in a good public-domain editor, or license a shareware one, even if only for beginners. I suppose you could even use EDLIN if you were a masochist.

If you’ve got existing wordprocessor files, there are converter programs from WordPerfect and MS-Word into TEX and LATEX.

As with any DTP system, once you go beyond elementary formatting you do need to be aware of what you want to do. Typographical training is a much-neglected part of office life so far, although it is noticeably easier to get good typographic quality in your output using TEX than with some visual systems. However, as I mentioned earlier, there are so many predefined layouts available that many users don’t bother to delve into the deeper recesses of TEX’s abilities.

The macro facilities mean there is a lot of underlying power available. TEX is in fact a typographical programming language, which means you can program decision-making \if statements to modify the appearance depending on the text being processed. The effect of this is to dramatically increase the reusability of your text: using LATEX’s predefined styles, for example, you can turn a text file into a business report by marking the component parts (title, sections, subsections etc.) and adding three commands at the top and one at the bottom. If your analysis was so good you wanted to turn it into a chapter of your new book, change the document style report to book and that’s it. To publish it as an article, change book to article, and change the chapters into sections. This is the whole essence of logic-based DTP: once the structure of the text is marked, formatting and reformatting becomes relatively trivial — just change the macro definitions. (A minimal sketch of such a file appears at the end of this section.)

There are some tricks and traps of course. One small confusion for beginners arises over the use of curly braces: TEX uses them to group together text which you want treated in a particular way, so for example, to italicise text you type {\it your text} in curly braces, with the ‘\it’ command immediately inside the opening brace: this restricts the effect of italics to the text in the braces. So far so good. But curly braces are also used to delimit arguments, so for example, typing \centerline{some text} centres the text between the margins. There are very cogent reasons for this, but it means a second or so’s thought until you get used to it.

Once you’ve processed your file, you will want to see the results. The WYSIWYG screen in emTEX is one of the best I have seen. It is configurable for all the conventional PC screens from mono CGA to the 64/256 colour VGA display, and on the higher resolution devices it uses grey-scaling to give the image better definition. You can shrink the image small enough to fit two A4 pages on a single screen, or enlarge it big enough to see only a few words at a time. An on-screen ruler lets you measure dimensions, and the image can be rotated, mirrored and inverted. Despite the fact that TEX drivers use bitmap fonts, the previewer does not require its own set of fonts at screen resolution: it can use whatever you have around for your printer, reducing your disk storage needs. Other suppliers also have very good previewers, especially the Preview program from Arbortext.

The print drivers have similar facilities for positioning the output on the page, so it is possible to print portrait or landscape, mirrored or inverted, even on dot-matrix printers. You can easily print selected pages in any order, with multiple pages per sheet in various orientations, so it is possible to make booklets correctly paginated for folding straight off the printer.

The only thing that causes an initial stumbling block is the sequence of edit–process–view–print. This is common to all logical systems, in that the entire file has to be processed before it can be seen. The reason is partly historical, in that TEX grew up before bitmapped screens were commonplace; and partly inherent in a logic-based system, in that type placement on the page naturally depends on what fit onto previous pages. PC-TEX does in fact sell a version which displays as you go, and similar versions are available for the Amiga and some other fast machines, but these are very much the exception.
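Here is that minimal sketch, in the LATEX syntax of the period; the title, author and chapter name are invented for illustration, not taken from the article:

   % LaTeX 2.09 syntax, as current in 1991: \documentstyle, not \documentclass.
   \documentstyle{report}  % change to {book} or {article} and reprocess
   \begin{document}
   \title{Quarterly Analysis}
   \author{A. N. Author}
   \maketitle
   \tableofcontents
   \chapter{Findings}      % for the article style, change chapters to \section
   The marked-up text itself stays exactly the same.
   \end{document}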


Timings

TEX is fast: I clocked just over one-and-a-half seconds per page on a 16MHz 80386 clone with 640KB and a 28ms unfragmented hard disk for processing plain continuous text in emTEX. By comparison, PC-TEX’s 80386-specific executable on the same machine but using 2MB of memory gave me under half a second per page on the same file.

Printing speed is limited by the throughput of the printer itself. The HP LaserJet driver printed at the full eight pages a minute, with only a brief pause at the start to download the font glyphs.

Printing full-page graphics on a dot-matrix printer is always tedious, but I counted 34 seconds per A4 page of solid text on an Epson LQ800, which is acceptable for short drafts.

emTEX runs very happily under DesqView/386, my own preference for a working environment, with PC-Write in one window, the TEX engine in another and the WYSIWYG screen in another (plus my usual assortment of comms, spreadsheet and database). I haven’t tested it under Windows, but there’s a Windows (Microsoft Paintbrush-style) screen driver in emTEX, and the TEX program and print drivers should cause no problems as DOS tasks.

When something goes wrong

TEX’s error messages are explicit but technical. Pressing H when it pauses at an error gives some further explanation, and often points you to the relevant section of The TEXbook. It is assumed that you have read some of this, or similar explanatory documentation, because otherwise a message such as ‘Overfull \hbox at line 432’ is not going to mean much (in fact it refers to a justification or hyphenation problem, the h(orizontal) box being the page element it is trying to justify).

‘Errors’ in this sense means one of two things: either you have mistyped a command word, which is easy to correct, as it tells you what and where; or there is a fault in the logic of your instructions, which is harder to spot. Unfortunately, computers cannot guess your intentions, although TEX makes a damn good try. Missing a closing curly-brace is a common typing error, and so is forgetting to turn off a special mode, such as mathematical setting, but this is not logically determinable until your text tries to do something that TEX knows cannot be done in whatever mode you were in, like finding a paragraph-end in the middle of text supposed to be centred. All the program can do is report what went wrong and where: it’s up to you to find out why, and this can be tricky in complex work, as there may be a considerable amount of innocuous text between the cause of the problem and the actual place where TEX spotted things going astray.

Justification and hyphenation problems are rare in normal text, because TEX justifies an entire paragraph at one go, rather than line by line. This results in a wonderfully smooth and professional finish compared with the woefully ragged and uneven spacing found in some systems, but there are occasions when you have to fix an obtuse word with a discretionary hyphen.

The most common mistake I made was forgetting to turn off a typestyle such as boldface, resulting in the remainder of the document being in bold type, but that is easy to spot and easy to fix, as it is obvious from the display or printout where the error starts. Omitting or wrongly delimiting an argument causes unexpected results for the unwary: typing \centerline and then omitting any argument and carrying on with the text makes TEX centre the first character of the next word and then complain that it can’t fit the surrounding text on the line. Perfectly reasonable, but a strong case for RTFM (Read The Flaming Manual).

Fortunately the results of RTFM are well worth it: after the learning curve had flattened a little (an afternoon), doing some automated formatting, even including some mathematics, produced such a professionally-formatted page that the idea of going back to placing everything with the mouse by hand is one I can’t contemplate.

TEX has extremely few bugs, as Don Knuth has been offering hard cash to anyone who can find one, starting at 1 cent and doubling on each occasion. As the current rate is only $40.96 after 12 years, you are not likely to find that many more.

When you need more serious help, therefore, it will be with typography, not bugs, and you will need access to a design or typographic expert fluent in TEX. In the case of commercial software, this is often provided by the supplier, and either included in the price or charged as support. In the case of public-domain TEX, you can either call your user group (see details at end), or if you have email access, send a message to one of the many support conferences, where there are hundreds of experts who will gladly help. For specialist or large-scale development, there are also many TEX consultants who charge normal commercial rates.
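For flavour, this is roughly what an error exchange looks like on screen (an invented example: the line number and text are illustrative). Misspelling \centerline as \centreline stops the run like this:

   ! Undefined control sequence.
   l.6 \centreline
                  {A Short Test}
   ?

TEX shows how far through the line it had read and waits; pressing H at the ? prompt produces the longer explanation mentioned above. A justification report, by contrast, is a one-line warning in the log, of the form ‘Overfull \hbox (12.3pt too wide) in paragraph at lines 23--25’.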


Fonts

The default font is Computer Modern (see Figure 2), a redrawing (by Knuth) of Monotype Modern 8A, and resembling Century Schoolbook, but less bulky. It suits excellently for continuous text, which is what TEX was originally designed for. Because of this, and because Computer Modern has all the scientific and mathematical symbols built-in (many more than in your average DTP system), a lot of TEX users never bother to get away from it. This is a pity, as it has led to the myth that TEX only works with the CM typeface. A browse through TUGboat, the TEX Users Group journal, soon explodes the myth: TEX works fine with Bitstream fonts, PostScript, and anything from the Hewlett-Packard downloadable stable, as well as with other METAFONT designs.

Mind you, you have to pay money for some of these, but I used Bitstream Swiss and Humanist (Helvetica and Optima), the Type 1 Adobe PostScript fonts, and some HP softfonts generated by Glyphix, all without problems (although Glyphix doesn’t provide a pound [sterling] sign, being American). These fonts have to be in TEX format, which means a small conversion program from PC-TEX Inc. for Bitstream outlines. If you don’t have a PostScript printer, you can buy the Adobe fonts as bitmaps from the Kinch Computer Company, and there is a public-domain HPtoTeX program available to convert HP downloadables. The Austin Code Works also has a whole stash of fonts already in TEX format, and as I said earlier, if you restrict yourself to a PostScript printer, you can do away with bitmaps entirely, although you lose some of the mathematical and scientific symbols, and the interletter spacing is a little poxy in places.

METAFONT is a programming language for making your own fonts. Font design itself is a decidedly non-trivial activity, but the program can be used with several public-domain font definitions to create font files. There is an increasing number of new fonts being done in METAFONT, and I tested a mixture: some well-established ones like Pandora by Neenie Billawala and Hermann Zapf’s Euler (from the American Mathematical Society); Helvetica and Times (more akin to Morison’s original, not Times New Roman) from the MetaFoundry (Dublin, Ohio); and two new ones, a modern Irish typeface by Micheál Ó Searcóid of University College, Dublin; and a new Fraktur and Schwabacher by Yannis Haralambous of CITI Lille, which has some stunning decorated initials (see Figure 3 and Figure 4).

Figure 3: Fraktur with decorated initial, and an Irish text typeface (the specimens set the closing lines of Goethe’s Faust and the proverb “Níl aon tinteán mar do thinteán féin”)

Figure 4: Comparison of some text fonts used in TEX (“The quick brown fox jumped over the lazy …” set in Pandora, Concrete, Times, and a fourth face)

Interestingly, TUGboat says that the resident ‘Wizard of Fonts’ is Hermann Zapf himself, so they clearly have some good high-level advice on tap. There are many non-Latin fonts available: Cyrillic, Greek, Devanagari, Arabic, Turkish, and Japanese, even a Tengwar script for Tolkien fans! The ScholarTEX package provides Persian, Ottoman, Pashto, Urdu, Hebrew, Yiddish, Syriac, Armenian, Greek, and Latin, Fraktur and Schwabacher, Anglo-Saxon, Irish, Glagolitic, Coptic, Calligraphic Arabic and Sanskrit. There’s a CM-compatible International Phonetic Alphabet (IPA) from the University of Washington.

TEX provides the ability to scale any individual font or the whole document to any size (but you still need the bitmaps at the right dot-density in order to print), but The TEXbook makes it clear that 5pt type is best done with a real 5pt design, and 50pt type with a 50pt design, rather than by scaling another design size up or down (see Figure 5). This is of course exactly what typographers have been doing since Gutenberg, but some authors of modern systems failed initially to understand the need, and gave only outline fonts.

Figure 5: Difference between design sizes (Computer Modern Roman 5pt design at natural size; 17pt design scaled down to 5pt; 17pt design at natural size; 5pt design scaled up to 17pt)
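In plain TEX the distinction reads like this (a sketch only; printing each variant still requires bitmaps at the corresponding sizes, as noted above):

   % Plain TeX: true design sizes versus scaled designs (illustrative)
   \font\truesmall=cmr5                  % a real 5pt design: relatively wide and sturdy
   \font\fakesmall=cmr10 at 5pt          % a 10pt design shrunk: too light and cramped
   \font\bigrm=cmr10 scaled \magstep2    % a 10pt design enlarged about 1.44 times
   {\truesmall True 5pt design.} {\fakesmall Scaled 10pt design.} {\bigrm Enlarged.}
   \bye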


This was partly acknowledged by Adobe’s use of ‘hints’, but it has led to some disastrous results in terms of legibility and appearance, as anyone who has tried scaling 110pt type down to 5pt can see. Using bitmaps of fonts at different design sizes does place some restrictions on users’ disk space, but the selection of Computer Modern shipped with TEX provides a good everyday working subset as a mixture of design sizes and scaled fonts. It is clear that the reason for METAFONT is that you can generate fonts at the size required in a few minutes and then junk the bitmaps when you are finished, if you are short of space (some systems, such as the version for Amiga, do the font generation automatically when you reference a font which is not on disk).

Graphics

The TEX typesetting engine itself, of course, is not a graphics processor: no logical system is, and even most visual DTP systems make poor drawing packages. Most DTP users are accustomed to scanning artwork and then embedding it into the document, and the same applies to TEX, although you can also use the more elementary line-drawing abilities of the LATEX macro package, and the more advanced facilities of PICTEX or TEXCAD.

Scanning artwork usually means touching it up in a paintbox program such as PC Paintbrush, and you can then make it into a character in a font file, using a neat little routine from Micro Programs Inc, or if your printer can’t handle large downloadable glyphs, you can use the same routine to make a printstream output file which the print driver can handle itself.

This means you can also embed printstream output of some other application in the TEX output (e.g. Encapsulated PostScript, or HP’s PCL generated by spreadsheets, business graphics packages, CAD, drawing programs etc.). There are also several routines developed for using half-tones (photographs), in monochrome or colour, although the 300 dpi resolution of a normal laser printer scarcely does them justice.

Extras

Most implementations of TEX include the macro package LATEX, which provides the facilities for structured documents, with variations on autonumbering sections, subsections, paragraphs etc.; automated table-of-contents and indexes; simple diagram and graph-drawing; forward and backward references and a whole stack of other stuff, including BibTEX, a bibliographic database program.

There’s PICTEX, for drawing more complex diagrams such as flowcharts or organisation charts; TEXCAD, a little CAD package which produces LATEX source code as output; and SliTEX for making multicolour overhead transparency separations. This has recently been extended by David Love of the Daresbury Labs to produce output on colour PostScript printers — what he terms TEXnicolor.

METAFONT, the font-making program, is supplied with emTEX plus the complete source code for all the Computer Modern fonts, and some font manipulation tools to go with it. With most commercial implementations, though, METAFONT is a chargeable extra.

The public-domain bolt-ons are legion. In addition to the wide range of styles under LATEX, assorted generous people have written and donated macro packages to do circuit diagrams, molecular (hydrocarbon) diagrams, critical text editions, style files for journals, music typesetting, database output, newsletters, calendars; TEX has been used for most things at one time or another over the last dozen years, so there is a wealth of experience to draw on. There are also plenty of agencies and consultants who will design styles in TEX to your specifications.

Conclusion

The gripes I referred to earlier (lack of fonts, graphics and styles, visual presentation and perceived difficulty of use) seem to have been a problem of users’ perception, rather than any deficiency in the TEX system itself. TEX is, however, something of an acquired taste, but has the capability to outperform most other systems in its field in quality, flexibility and automation.

Directly comparing TEX with a visual system is not strictly valid, as the two kinds of DTP are usually aimed at different requirements, although there is a large area of overlap. Those users who feel uneasy with an asynchronous visual feedback are probably more productive with a visual system, but they may have to spend more effort to achieve the same level of quality.

Equally, there are tasks more suited to visual systems than to logical ones: attention-grabbing advertising and newsletters which need to rely on a complex blend of overlapping colour graphics and type for optical appeal are far faster to produce using a visual system than a logical one.

In the end it comes down to suiting the facilities offered to the typographic skill and knowledge of the user, as well as to the demands of the task. Despite the initial appeal of visual systems, TEX is still worth a careful look.


A  Original biography

Peter Flynn is in charge of research and academic computing at University College Cork, Ireland. He has been Technical Consultant to a large City computer bureau, deputy DP manager for one of the UK Training Boards, and a teacher of programming and systems analysis. His hobbies are early music, reading and surfing. He can be reached by email as pflynn on BIX and CIX, and through the wide-area networks as [email protected].

B  Network sources of public-domain software

Email users (e.g. CIX, JANET, HEANET, BITNET, UUCP etc.) can retrieve files from the Aston archive of TEXware: send a one-line electronic mail message to [email protected] saying

   send [tex-archive]00directory.list

(this is a large file containing a directory of everything in the archive). Individual files can then be retrieved by the same method.

A one-line mail to [email protected] saying just index will retrieve the file list from the server in Heidelberg (files are got by sending get followed by the filename). Some additional TEX material is kept at [email protected].

All network servers respond to the single-word command help by sending you a help file about what is in them and how to access them. BT Gold and EirMail are unfortunately not connected to the international email networks, but Gold 400 is.

In case of difficulty, contact the UK TEX Users Group, c/o Aston University Computer Centre, Birmingham (email [email protected]).

C  Commercial software

• PC-TEX. UniTEX Systems, 12 Dale View Road, Sheffield.
• µTEX. Arbortext Inc, 535 West William Street, Ann Arbor, Michigan 48103, USA (fax +1 313 996 3573).
• Textures (Mac), £350; CM fonts as PostScript outlines, £300. TEXpert Systems, PO Box 1897, London NW6 1DQ (fax +44 71 433-3576).
• For other operating systems, contact the UK TEX Users Group (address above) for details of suppliers.
• WordPerfect-to-TEX converter, $249. K-Talk Communications, 30 West First Avenue, Suite 100, Columbus, Ohio 43201, USA (fax +1 614 294 3704).
• TurboTEX, $150; Adobe bitmaps, $200. Kinch Computer Company, 501 South Meadow Street, Ithaca, NY 14850, USA (fax +1 607 273 0484). Kinch also provides a fax driver for TEX.
• Bitmap fonts. Austin Code Works, 11100 Leafwood Lane, Austin, Texas 78750-3464, USA (fax +1 512 258 1342, email [email protected]).
• Capture, $100; TEXPIC (graphics), $79. Micro Programs, Inc, 251 Jackson Avenue, Syosset NY 11791-4117, USA (tel +1 516 921 1351).
• ScholarTEX (out shortly): Yannis Haralambous, rue Breughel 101/11, FR-59650 Villeneuve d’Ascq, France (tel +33 20.05.28.80, email [email protected]).

D  Pull quotes

1. All implementations are compatible with each other, differing only in memory capacity and speed, depending on the operating environment.
2. Omitting or wrongly delimiting an argument causes unexpected results for the unwary: a strong case for RTFM (Read The Flaming Manual).
3. Despite the initial appeal of visual systems, TEX is still worth a careful look.

References

[1] M. Doob. A Gentle Introduction to TEX. Aston University, UK: UK TEX Archive, Jan. 1990. ctan.org/pkg/gentle.
[2] D. Knuth. The TEXbook. Addison-Wesley, Boston, MA, 18th edition, May 1990.
[3] L. Lamport. LATEX: A Document Preparation System. Addison-Wesley, Boston, MA, Jan. 1986.
[4] L. Lamport. Document Production: Visual or Logical? TUGboat 9(1):8, Jan. 1988. tug.org/TUGboat/tb09-1/tb20lamport.pdf
[5] M. Spivak. The PC-TEX Manual. Personal TEX, Inc., San Francisco, CA, Jan. 1985.
[6] J. St Sauver. Introduction to TEX on VAX/VMS. Claremont University, CA: YMIR TEX Archive, Jan. 1991.

⋄ Peter Flynn
Textual Therapy Division, Silmaril Consultants
Cork, Ireland
Phone: +353 86 824 5333
peter (at) silmaril dot ie
blogs.silmaril.ie/peter


The Treasure Chest

These are the new packages posted to CTAN (ctan.org) from August–October 2020, along with a few notable updates. Descriptions are based on the announcements and edited for extreme brevity.

Entries are listed alphabetically within CTAN directories. More information about any package can be found at ctan.org/pkg/pkgname. A few entries which the editors subjectively believe to be of especially wide interest or otherwise notable are starred (*); of course, this is not intended to slight the other contributions.

We hope this column helps people access the vast amount of material available through CTAN and the distributions. See also ctan.org/topic. Comments are welcome, as always.

⋄ Karl Berry
tugboat (at) tug dot org

dviware

dvivue in dviware
  Windows viewer for PDF or DVI.

fonts

cmathbb in fonts
  Computer Modern math blackboard bold, with complete alphabet and numerals.
doulossil in fonts
  The Doulos SIL font for IPA typesetting.
josefin in fonts
  Josefin fonts with LATEX support, a geometric sans.
plimsoll in fonts
  Access to the Plimsoll symbol used in chemistry.
spectral in fonts
  Spectral fonts with LATEX support, a serif face.

fonts/utilities

ot2woff in fonts/utilities
  Convert OpenType or TrueType to WOFF.
t1subset in fonts/utilities
  C++ library to subset Type 1 fonts.
woff2ot in fonts/utilities
  Convert a WOFF file to OpenType or TrueType.

macros/latex/contrib

centerlastline in macros/latex/contrib
  “Spanish” paragraphs with last line centered.
decision-table in macros/latex/contrib
  Decision tables in Decision Model and Notation (DMN) format.
docutils in macros/latex/contrib
  Support for Docutils reStructuredText sources (docutils.sourceforge.io).
leftindex in macros/latex/contrib
  Left indices (sub/superscripts) with improved spacing.
nnext in macros/latex/contrib
  Extensions for the gb4e linguistics package.
oup-authoring-template in macros/latex/contrib
  Template for OUP journals.
qyxf-book in macros/latex/contrib
  Book template for Qian Yuan Xue Fu club.
realtranspose in macros/latex/contrib
  90 degree character transposition.
runcode in macros/latex/contrib
  Execute any command line tool and typeset its output.
semtex in macros/latex/contrib
  Deal with stripped SemanTEX documents.
swfigure in macros/latex/contrib
  Five display modes for handling figures too large to fit on a single page.
totalcount in macros/latex/contrib
  Typeset total values of counters (from caption).
xmuthesis in macros/latex/contrib
  Thesis for Xiamen University.

macros/latex/required

latex-firstaid in macros/latex/required
  Late-breaking compatibility fixes for packages, provided by the LATEX team.

macros/luatex/latex

lua-physical in macros/luatex/latex
  Pure Lua library providing functions and objects for computing physical quantities.
stricttex in macros/luatex/latex
  Strictly balanced brackets; allow numbers in command names.
unitipa in macros/luatex/latex
  Typesetting TIPA fonts with LuaLATEX.

macros/unicodetex/latex

Barbara Beeton’s column in this issue (p. 260) describes the new CTAN area macros/unicodetex.

texnegar in macros/unicodetex/latex
  Kashida justification improved wrt xepersian.

support

light-latex-make in support
  A build tool for LATEX documents.

2021 TEX Users Group election

TUG Elections Committee

The terms of TUG President and ten other members of the Board of Directors will expire as of the 2021 Annual Meeting, expected to be held in July or August 2021. The terms of these directors will expire in 2021: Karl Berry, Johannes Braams, Kaja Christiansen, Taco Hoekwater, Klaus Höppner, Frank Mittelbach, Ross Moore, Arthur Rosendahl, Will Robertson, Herbert Voß. Continuing directors, with terms ending in 2023: Barbara Beeton, Jim Hefferon, Norbert Preining.

The election to choose the new President and Board members will be held in early Spring of 2021. Nominations for these openings are now invited. A nomination form is on this page; forms may also be obtained from the TUG office or via tug.org/election.

The Bylaws provide that “Any member may be nominated for election to the office of TUG President/to the Board by submitting a nomination petition in accordance with the TUG Election Procedures. Election ... shall be by ... ballot of the entire membership, carried out in accordance with those same Procedures.”

The name of any member may be placed in nomination for election to one of the open offices by submission of a petition, signed by two other members in good standing, to the TUG office; the petition and all signatures must be received by the deadline stated below. A candidate’s membership dues for 2021 must be paid before the nomination deadline. The term of President is two years, and the term of TUG Board member is four years.

An informal list of guidelines for TUG board members is available at tug.org/election/guidelines.html. It describes the basic functioning of the TUG board, including roles for the various offices and ethical considerations. The expectation is that all board members will abide by the spirit of these guidelines.

Requirements for submitting a nomination are listed at the top of the form. The deadline for receipt of completed nomination forms and ballot information is 07:00 a.m. PST, 1 March 2021, at the TUG office in Portland, Oregon, USA. No exceptions will be made. Forms may be submitted by fax, or scanned and submitted by email to [email protected]; receipt will be confirmed by email. In case of any questions about a candidacy, the full TUG Board will be consulted.

Information for obtaining ballot forms from the TUG website will be distributed by email to all members within 21 days after the close of nominations. It will be possible to vote electronically. Members preferring to receive a paper ballot may make arrangements by notifying the TUG office; see address on the form. Marked ballots must be received by the date noted on the ballots.

Ballots will be counted by a disinterested party not affiliated with the TUG organization. The results of the election should be available by mid-April, and will be announced in a future issue of TUGboat and through various TEX-related electronic media.

2021 TUG Election — Nomination Form

Eligibility requirements:
• TUG members whose dues for 2021 have been paid.
• Signatures of two (2) members in good standing at the time they sign the nomination form.
• Supplementary material to be included with the ballot: passport-size photograph, a short biography, and a statement of intent. The biography and statement together may not exceed 400 words.
• Names that cannot be identified from the TUG membership records will not be accepted as valid.

The undersigned TUG members propose the nomination of:

Name of Nominee: ____________________
Signature: ____________________   Date: ________

for the position of (check one):
☐ TUG President
☐ Member of the TUG Board of Directors
for a term beginning with the 2021 Annual Meeting.

1. ____________________ (please print)   ____________________ (signature)   ________ (date)
2. ____________________ (please print)   ____________________ (signature)   ________ (date)

Return this nomination form to the TUG office via postal mail, fax, or scanned and sent by email. Nomination forms and all required supplementary material (photograph, biography and personal statement for inclusion on the ballot, dues payment) must be received at the TUG office in Portland, Oregon, USA, no later than 07:00 a.m. PST, 1 March 2021. It is the responsibility of the candidate to ensure that this deadline is met. Under no circumstances will late or incomplete applications be accepted. Supplementary material may be sent separately from the form, and supporting signatures need not all appear on the same physical form.

Checklist:
☐ 2021 membership dues paid
☐ nomination form
☐ photograph
☐ biography/personal statement

TEX Users Group
Nominations for 2021 Election
P.O. Box 2311
Portland, OR 97208-2311
U.S.A.
(email: [email protected]; fax: +1 815 301-3568)

TEX Consultants

The information here comes from the consultants themselves. We do not include information we know to be false, but we cannot check out any of the information; we are transmitting it to you as it was given to us and do not promise it is correct. Also, this is not an official endorsement of the people listed here. We provide this list to enable you to contact service providers and decide for yourself whether to hire one.

TUG also provides an online list of consultants at tug.org/consultants.html. If you’d like to be listed, please see there.

Dangerous Curve
+1 213-617-8483
Email: typesetting (at) dangerouscurve.org
We are your macro specialists for TEX or LATEX fine typography specs beyond those of the average LATEX macro package. We take special care to typeset mathematics well.
Not that picky? We also handle most of your typical TEX and LATEX typesetting needs.
We have been typesetting in the commercial and academic worlds since 1979.
Our team includes Masters-level computer scientists, journeyman typographers, graphic designers, letterform/font designers, artists, and a co-author of a TEX book.

Dominici, Massimiliano
Email: info (at) typotexnica.it
Web: http://www.typotexnica.it
Our skills: layout of books, journals, articles; creation of LATEX classes and packages; graphic design; conversion between different formats of documents.
We offer our services (related to publishing in Mathematics, Physics and Humanities) for documents in Italian, English, or French. Let us know the work plan and details; we will find a customized solution. Please check our website and/or send us email for further details.

Latchman, David
2005 Eye St. Suite #6, Bakersfield, CA 93301
+1 518-951-8786
Email: david.latchman (at) texnical-designs.com
Web: http://www.texnical-designs.com
LATEX consultant specializing in the typesetting of books, manuscripts, articles, Word document conversions as well as creating the customized LATEX packages and classes to meet your needs. Contact us to discuss your project or visit the website for further details.

Monsurate, Rajiv
India
Email: tex (at) rajivmonsurate.com
Web: https://www.rajivmonsurate.com
I have over two decades of experience with LATEX in STM publishing working with full-service suppliers to the major academic publishers. I’ve built automated typesetting and conversion systems with LATEX and rendered TEX support for a major publisher.
I offer design, typesetting and conversion services for self-publishing authors. I can help with LATEX class/package development, conversion tools and training for publishers and typesetters for book and journal production. I can also help with full-stack web development.

Sofka, Michael
8 Providence St., Albany, NY 12203
+1 518 331-3457
Email: michael.sofka (at) gmail.com
Personalized, professional TEX and LATEX consulting and programming services.
I offer 30 years of experience in programming, macro writing, and typesetting books, articles, newsletters, and theses in TEX and LATEX: Automated document conversion; Programming in Perl, Python, C, C++, R and other languages; Writing and customizing macro packages in TEX or LATEX, knitr.
If you have a specialized TEX or LATEX need, or if you are looking for the solution to your typographic problems, contact me. I will be happy to discuss your project.

Veytsman, Boris
132 Warbler Ln.
Brisbane, CA 94005
+1 703 915-2406
Email: borisv (at) lk.net
Web: http://www.borisv.lk.net
TEX and LATEX consulting, training, typesetting and seminars. Integration with databases, automated document preparation, custom LATEX packages, conversions (Word, OpenOffice etc.) and much more.
I have about two decades of experience in TEX and three decades of experience in teaching & training. I have authored more than forty packages on CTAN as well as Perl packages on CPAN and R packages on CRAN, published papers in TEX-related journals, and conducted several workshops on TEX and related subjects. Among my customers have been Google, US Treasury, FAO UN, Israel Journal of Mathematics, Annals of Mathematics, Res Philosophica, Philosophers' Imprint, No Starch Press, US Army Corps of Engineers, ACM, and many others.
We recently expanded our staff and operations to provide copy-editing, cleaning and troubleshooting of TEX manuscripts, as well as typesetting of books, papers & journals, including multilingual copy with non-Latin scripts, and more.

Warde, Jake
Forest Knolls, CA 94933
+1 650-468-1393
Email: jwarde (at) wardepub.com
Web: http://myprojectnotebook.com
I have been in academic publishing for 30+ years. I was a Linguistics major at Stanford in the mid-1970s, then started a publishing career. I knew about TEX from Computer Science editors at Addison-Wesley who were using it to publish products. Beautiful, I loved the look. Not until I had immersed myself in the production side of academic publishing did I understand the contribution TEX brings to the reader experience.
Long story short, I started using TEX for exploratory projects (see the website referenced) and want to contribute to the community. Having spent a career evaluating manuscripts from many perspectives, I am here to help anyone who seeks feedback on their package documentation. It's a start while I expand my TEX skills.

TUG 2020: Reprise

Videos for all talks: tug.org/l/tug20-video
YouTube channel: youtube.com/c/TeXUsersGroup
Conference schedule and abstracts: tug.org/tug2020/program.html
Proceedings (open access): tug.org/TUGboat/tb41-2

Calendar

2020

Oct 27–31  Association Typographique Internationale, ATypI All Over. www.atypi.org

Oct 31  GuIT Meeting 2020, 17th Annual Conference, online, hosted by SISSA medialab, Trieste, Italy. www.guitex.org/home/en/meeting

Nov 5–8  AwayzGoose, an online gathering for lovers of type and letterpress, Hamilton Wood Type & Printing Museum and American Printing History Association, Two Rivers, Wisconsin. woodtype.org/pages/wayzgoose

2021

Mar 1  TUG election: nominations due, 07:00 a.m. PST. tug.org/election

Mar 10–12  DANTE 2021 Frühjahrstagung and 64th meeting, 32 Jahre DANTE e.V., Otto-von-Guericke Universität, Magdeburg, Germany. www.dante.de/veranstaltungen

Mar 31  TUGboat 42:1, submission deadline.

May 2–5  CODEX VIII, “EXTRACTION: Art on the Edge of the Abyss”, Richmond, California. www.codexfoundation.org

Jun ?–Jul ?  TypeParis21, intensive type design program, Paris, France. typeparis.com

Jun 30–Jul 2  Nineteenth International Conference on New Directions in the Humanities, “Critical Thinking, Soft Skills, and Technology”, Universidad Complutense Madrid, Spain. thehumanities.com/2021-conference

Jul ??  International Society for the History and Theory of Intellectual Property (ISHTIP), 12th Annual Workshop, “Landmarks of Intellectual Property”, Bournemouth University, UK. www.ishtip.org/?p=1027

Jul 20–21  Centre for Printing History & Culture, CPHC/Print Networks Conference, “A visitor attraction: printing for tourists”, Appleby-in-Westmorland, Cumbria, UK. www.chpc.org.uk/events

Jul 26–30  Digital Humanities 2021, Alliance of Digital Humanities Organizations, Tokyo, Japan. adho.org/conference

Jul 26–30  SHARP 2021, “Moving texts: from discovery to delivery”, Society for the History of Authorship, Reading & Publishing. Hosted virtually by the University of Muenster. www.sharpweb.org/main/news

Aug 1–5  SIGGRAPH 2021, Los Angeles, California. s2021.siggraph.org

Aug 2–6  Balisage: The Markup Conference, Rockville, Maryland. www.balisage.net

Aug 18–22  TypeCon2021, Philadelphia, Pennsylvania. typecon.com

Sep 10  The Updike Prize for Student Type Design, application deadline, 5:00 p.m. EST, Providence Public Library, Providence, Rhode Island. prov.pub/updikeprize

Oct 1–3  Oak Knoll Fest XXI, “Women in the Book Arts”, New Castle, Delaware. www.oakknoll.com/fest

Owing to the COVID-19 pandemic, schedules may change. Check the websites for details.

Status as of 15 October 2020

For additional information on TUG-sponsored events listed here, contact the TUG office (+1 503 223-9994, fax: +1 815 301-3568, email: [email protected]). For events sponsored by other organizations, please use the contact address provided.

User group meeting announcements are posted at tug.org/meetings.html. Interested users can subscribe and/or post to the related mailing list, and are encouraged to do so. Other calendars of typographic interest are linked from tug.org/calendar.html.

TUGBOAT Volume 41 (2020), No. 3

Introductory

263 Michael Barr / How TEX changed my life

• math writing, typing, and typesetting

260 Barbara Beeton / Editorial comments
• typography and TUGboat news

265 Peter Flynn / Typographers’ Inn
• To print or not to print; Centering (again); New device driver for old format; What’s in a name

360 Peter Flynn / Historical review of TEX3
• a 1991 review of TEX, written for those experienced in other typesetting systems

259 Boris Veytsman / From the president
• the paradox of early adoption; moving free software forward

Intermediate

368 Karl Berry / The treasure chest
• new CTAN packages, August–October 2020

269 Lorrie Frear / Eye charts in focus: The magic of optotypes
• history and design of the optotypes used on eye charts

286 LATEX Project Team / LATEX news, issue 32, October 2020
• xparse in the format; hook management system; changes in graphics, tools, amsmath, babel

281 Matthew Leingang / Using DocStrip for multiple document variants
• step-by-step example of homework assignments with questions, solutions, and more

275 Kamal Mansour / The Non-Latin scripts & typography
• examples of the wide range of typesetting and font requirements beyond Latin

327 Luigi Scarso / Short report on the state of LuaTEX, 2020
• development status and comparison of LuaTEX and its relatives: LuaHBTEX, LuaJITTEX and LuaJITHBTEX

324 Peter Wilson / Data display, plots and graphs
• simple example of using tables, scatter plots, line graphs, histograms

Intermediate Plus

341 Island of TEX / TEXdoc online — a web interface for serving TEX documentation
• HTTP API for texdoc and CTAN topics

292 Frank Mittelbach, Chris Rowley / LATEX Tagged PDF — A blueprint for a large project
• background and task summary of an extended feasibility study on the LATEX web site

318 Vít Novotný / Making Markdown into a microwave meal
• freezing Markdown, Minted, and BibLATEX output for single-run processing

308 Nicola Talbot / bib2gls: selection, cross-references and locations
• selecting, grouping, referencing glossary items with LATEX’s index facilities

348 Petr Olšák / OpTEX — A new generation of Plain TEX
• modern LuaTEX format with full Unicode support

343 Michal Vlasák / MMTEX: Creating a minimal and modern TEX distribution for GNU/Linux
• a small TEX distribution with LuaTEX and OpTEX that integrates easily on modern systems

Advanced

299 Enrico Gregorio / Functions and expl3
• an introduction to LATEX3 programming with functions and variables

335 Hans Hagen / Keyword scanning
• peculiarities of parsing keywords, catcodes, and performance

337 Hans Hagen / Representation of macro parameters
• considers making unused arguments more efficient and tracing output more consistent

320 Hans Hagen / User-defined Type 3 fonts in LuaTEX
• constructing a font on the fly that can reference images, other fonts, graphics

329 Hironori Kitagawa / Distinguishing 8-bit characters and Japanese characters in (u)pTEX
• analysis and solutions for differing interpretations of byte sequences

346 Igor Liferenko / UTF-8 installations of TEX
• changing TEX’s encoding to support reading/writing Unicode

Reports and notices

258 Institutional members

355 Charles Bigelow / Book reviews: Robert Granjon, letter-cutter, and Granjon’s Flowers, by Hendrik D.L. Vervliet
• detailed review of these two books from Oak Knoll Press on 17th century type designer Robert Granjon

358 Boris Veytsman / Book review: Glisterings, by Peter Wilson
• review of this volume of collected columns from TUGboat, published by TUG

369 TUG Elections committee / TUG 2021 election

370 TEX consulting and production services

372 Calendar