Computers, Typesetting, and Lexicography

JÖRGEN P ind Computers, Typesetting, and Lexicography A bstract As part of the general strategy of computerizing the lexicographic work process at the Institute of Lexicography, we have adopted Donald E. Knuths typesetting program as our typesetting engine. The main characteristics of the program wiU be briefly described, followed by a dis cussion of its advantages for lexicographic work. has already been used for the typesetting of a 1300 page etymo logical dictionary of Icelandic. A number of other projects are under way. Special notice will be paid to the problem of coding as it relates to the making of dictionaries. The advantages of a generic, or logical, coding over typographic coding will be emphasized. However, doubts will be raised about the possibility of providing a set of tags which are completely neutral with respect to typographic considerations. 1 Introduction In this paper I want to discuss one particular aspect of computational lexicogra^ phy, namely the typesetting of dictionaries. This is perhaps not an issue which is central to computational lexicography, yet it is a subject which deserves study, especially now when the arts of typesetting have been moving onto the desktop. I will show you the approach we have adopted at the Institute of Lexicography, and remark on how it fits into our overall strategy for computational lexicograr phy. Let me begin, in all modesty, by quoting myself. In 1986 I was invited to give a talk at the NordData Conference in Stockholm. At that time we were just embarking on widespread use of computers at the Institute, and I attempted to draw up a schematic diagram of a ‘Lexicographers’ workbench’ (see figure 1), commenting that a number of features had not been implemented. “This holds es pecially for the ‘manuscript writer’. Our work has not yet reached the stage where this is in great demand, but we envisage the possibility of using the database to turn out manuscripts for a typesetting program like (Find 1986:87). Well, this was written before we even had a version of running at the Institute! As a matter of fact, though we expected that typesetting would be 308 Proceedings of NODALIDA 1989, pages 308-325 Jörgen Pind: Typesetting and Lexicography 309 Figure 1: The lexicographer's workbench, 1986 vintage. something that we would deal with much later, a lot of work over the past couple of years has been devoted to the typesetting side of lexicography. There are two major reasons for this. The first is that the editor wants to be able to print proofs which are as closely related to the final form of the dictionary as possible. Thus a ‘manuscript writer’ has in fact been implemented as a feature of the ‘workbench’ we are currently working on. The relationship of this ‘manuscript writer’ to the work on the verbal dictionary has already been touched on in the paper by Björn Por Svavarsson and Jörgen Find in this volum e. The second reason is the fact that we have been engaged in producing Ice landic dictionaries from manuscripts, rather than from a database. Foremost among these is an Icelandic etymological dictionary by the late Åsgeir Blöndal Magnusson, former editor at the Institute, which will appear later this year.^ We have also embarked upon a series of reprints of older Icelandic lexicographic works. These works have been coded in the TfeX typesetting language. 2 Named Categories and Visual Formatting In recent years a revolution has been taking place in the typesetting industry where numerous ‘desktop publishing’ programs have gradually been replacing the traditional tools of the printer. How far has this revolution affected the ‘ As a matter of fact, it was published on the 2nd of November 1989 as plemned. Proceedings of NODALIDA 1989 309 310 Computational Linguistics — Reykjavik 1989 dictionary publisher and maker? I want to argue that such systems are not suited for the making of dictionaries. Traditionally, dictionaries have been produced from collections of slips which have been used to ease the task of keeping the dictionary entries in alphabetical order and to allow them to expand as needed, without unduly affecting entries which follow alphabetically. The slips have then often been used, with minimal markup, as the manuscript for the printer. In the past few years attempts have, however, been made to use database systems to ease the arduous task of heindling the collections of slips, with some success (cf. the paper by Björn I>6r Svavarsson and Jörgen Find in this volume). If we consider for a moment the nature of the database system, it is obvious that one of its major strengths is the fact that it allows the user to assign names or tags to the individual fields in the database. Thus we can easily imagine a database system for lexicographic work which knows about categories such as headword, pronunciation, grammatical code, semantic field, usage notes, and so on. One of the typographical requirements for a dictionary is that som e o f these categories should be reflected in the typesetting itself. This shows for example in the use of difierent fonts in dictionaries, typically used to distinguish some of the categories. Note that only some of the categories will be thus reflected, since a typical dictionary contains many more categories than would be distinguished by typographic means. Some distinctions will thus be lost in the printed dictionary which are kept in the database systems. Ideally, the lexicographer would like to use the database to automatically generate ‘scripts’ for typesetting, simply by instructing the database to print relevant typographic codes around some of the fields and not others. An even better approach would be to tag all the categories in the typesetting script and then instruct the typesetting system as to which ones should affect the typesetting process and which ones should not be reflected typographically. This latter approach is easy enough to accomplish if the typesetting system allows ‘generic’ or abstract coding of the input. The desktop publishing systems mentioned at the beginning of this section do not allow such abstract coding (indeed very few of them are able to deal with traditional typesetting codes), since they are almost universally based on the idea of ‘direct manipulation’ or ‘visual formatting’. The user manipulates a pointing device, such as a mouse, to mark parts of the text for, say, a font change. The notion of abstract coding plays no part at all in the formatting, and thus it is impossible in such a system to form a link between the categories of the database system and the typesetting. However, this is, of course, of the utmost importance for the lexicographer. A priori, I would have thought that this limitation of the desktop publishing systems would rule them out as being suitable for lexicographic work, and I was thus rather surprised when I came across the following description of the approach taken at the dictionary of Old English in Toronto. The typographical complexity of the dictionary entries— with a number of special characters, several languages, and many subsec Proceedings of NODALIDA 1989 310 Jörgen Pind: Typesetting and Lexicography 311 tions and cross-references which are distinguished by type— empha sizes the importance of interactive formatting. Because the working copy of the entry on the screen depicts the final appearance of a page, we hope to improve consistency... ... For example to put a keyword in bold in a citation, an editor can activate the area to be formatted by ranging over it with the mouse, and then use the mouse to select and apply the property bold from the Character Looks Menu (Healy 1985:248). The system being described is a Xerox workstation, running publishing soft ware similar to programs running on the Macintosh computer. As mentioned earlier, this approach is severely handicapped by the fact that there is no easy way in a visual formatting system to form links to the categories of the database system being used. I would therefore like to argue that the requirements which need to be made of a typesetting system for lexicographic work are twofold. • The typography should be of the highest order. • The system must be able to work with generic or logical markup. These requirements are met by a number of systems. We have chosen to work with ItjX. In the following pages I will describe the way we have used TfeX. While some of you are undoubtedly familiar with IfeX, I will presume that not everyone is, and ask those knowledgeable to bear with me while I give a short tutorial introduction to T ^ . 3 What is T^jX? A Tutorial Introduction is a typesetting system ‘intented for the creation of beautiful books’ to quote 1 ^ ’s author. Professor Donald E. Knuth of Stanford University. Those who have read his T ^ b o o k will also know that the previous quote continues with ‘and especially books that contain a lot of mathematics’. is indeed the premier system for typesetting mathematics available in the world today, so it is perhaps somewhat surprising to find it used for the making of dictionaries, indeed dictionaries which contain n o mathematics at all! I will attempt to describe why we have found to be eminently suitable for the typesetting of our dictionaries. 3.1 The Beginnings of It is perhaps rather surprising that we should be able to use 1^)X at all consid ering that it was created for one express purpose, viz. to allow Don Knuth to typeset his own magisterial treatise on the Art of Programming in what he felt would be an acceptable manner. These books started out being typeset in lead in the time-honoured manner of many generations of printers.

Computers, Typesetting, and Lexicography

Knuthweb.Pdf

Typeset MMIX Programs with TEX Udo Wermuth Abstract a TEX Macro

Miktex Manual Revision 2.0 (Miktex 2.0) December 2000

236 Tugboat, Volume 9 (1988), No. 3 Some Typesetting Conventions

Programming Technologies for the Development of Web-Based Platform for Digital Psychological Tools

The LATEX Web Companion

Development of Computer Typesetting

Typographic Terms Alphabet the Characters of a Given Language, Arranged in a Traditional Order; 26 Characters in English

Mmixware a RISC Computer for the Third Millennium Reihe: Lecture Notes in Computer Science

Book Design for Tex Users, Part 1

Implementation of a MIX Emulator: a Case Study of the Scala Programming Language Facilities

Phototypesetting and Desk-Top Publishing Systems in Archaeology