A. The CD-ROM

'All complete!' said the Toad triumphantly, pulling open a locker. 'You see ... everything you can possibly want ... you'll find that nothing whatever has been forgotten ...'

Kenneth Grahame, The Wind in the Willows

The disk that accompanies this book contains software and documentation for systems designed for use with SGML, and also for some which are more general-purpose but have some specific SGML applications. Although I have tried to make the documentation as obvious as possible, I have avoided starting on the bottom rung of trying to teach the use of the keyboard, the mouse, and the (see the assumptions alluded to earlier, which are in section 1.3).

Not all the programs described in the book are on the disk, as many of them are regular commercial products for which no free or demo version exists. By the same token, not all the programs on the disk are necessarily described in the book, as I have tried to make the disk as up to date as possible, and the lead-time for printing is longer than that for making the disk.

If you don't have access to MS-Windows to use the Synex ViewPort browser provided (see next section), you can read the same information by using your regular Web browser to open the INDEX.HTM file on the CD-ROM. The disk is written in ISO 9660 format, so that it is usable on Macs, PCs, and Unix systems.

384 Peter Flynn

A.1. Browsing the CD-ROM: using Synex/Inso ViewPort

Synex, makers of the ViewPort SGML browser engine for MS-Windows, have kindly provided a copy of their demonstration browser for you to use with this CD-ROM. It's not marketed as a separate product, because they license it to those who want to embed the browser technology in their own products, but this is the latest version and it provides a useful way of seeing what's what and using SGML to do so. If you're not running MS-Windows, you can use any regular (HTML) Web browser to view the same information in HTML files.

MS-Windows users can run the browser directly off the CD-ROM using the browse.bat file. If you wish, you can copy the browser installation (the SYNEX folder) to your hard disk (for example to c:\Program Files\Synex), but you will need to update the configuration file SY.INI in the c:\windows directory to reflect the correct hard disk letter.

A.2. Words of warning about the software

Two small warnings are needed:

1. The conditions in which you use the software, the nature of your data, and the configuration of your computer are outside my control, so please ask the manufacturer or the software author if you need further details about a product or the way it works, not me or my publisher.
2. All the programs on the disk have been obtained from reliable sources, either from the manufacturers themselves (in the case of commercial software), or from a trusted archive or the original author (in the case of public domain software). Where facilities exist the executables have been virus-checked, but as neither I nor my publisher wrote the programs, we cannot be held responsible for their contents, nor their effect on your system or data.

A.3. How does it work?

The CD-ROM is written in plain ISO 9660 format, so you can use it on Apple Macs, PCs (MS-DOS and MS-Windows), and Unix workstations which support the CD, but on older computers you may need to update your CD-ROM driver software (consult your supplier for details).

Understanding SGML and XML Tools 385

CD-ROM format and

The ISO 9660 CD-ROM format (sometimes called a 'multisession' CD-ROM) can be used on multiple platforms because it restricts two aspects of the file system:

• the file names and folder names are all UPPER CASE;
• no folders can nest more than eight deep.

For this reason, all software is kept in the following archive or compressed formats which contain the applications with preserved mixed-case filenames within them:

• ZIP files (DOS/Windows and VMS): unwrap with UNZIP;
• HQX files (Macs): unwrap with UnStuffIt;
• TAZ files (Unix): unwrap with gunzip then tar.

In most cases there is a self-installing SETUP or make program which you run to install the application. In a few cases the setup is manual, and the software author has provided instructions in a README or INSTALL file. Unzipping and other public decompression tools are in the UTILS folder in each platform directory.

I have not modified any of the software in any way, so if there are installation problems, inquiries should be directed to the software author; please, not to me.
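If the UNZIP or gunzip and tar tools are not to hand, the two commonest archive types above can also be unwrapped with a short script. What follows is only a minimal sketch in Python, not part of the disk: the function name `unpack` is hypothetical, the extensions it accepts are an assumption about how the archives are named, and HQX files are deliberately not handled.

```python
import tarfile
import zipfile
from pathlib import Path


def unpack(archive, dest):
    """Unpack a ZIP or gzip-compressed tar archive into dest.

    Returns the list of member names, with their mixed-case spelling
    preserved exactly as stored inside the archive.
    """
    archive, dest = Path(archive), Path(dest)
    suffix = archive.suffix.lower()  # ISO 9660 names are upper case
    if suffix == ".zip":
        dest.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            return zf.namelist()
    if suffix in {".taz", ".tgz", ".gz"}:
        # Equivalent to running gunzip and then tar in one step.
        dest.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, mode="r:gz") as tf:
            names = tf.getnames()
            tf.extractall(dest)
            return names
    raise ValueError(f"unsupported archive type: {archive.name}")
```

The destination should be a folder on your hard disk, since nothing can be written back to the CD-ROM itself.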

Apart from the software, some of which is specific to certain platforms, there are DTDs, stylesheets, and SGML documentation which can be used on all systems (see section A.5 below). Most of the examples and illustrations have been taken from MS-Windows versions of the software because that was the closest to hand at the time of writing, but the principles apply equally to all platforms.

The top-level (root) directory of the CD-ROM has folders for the various platforms and one for the ViewPort browser (an application for MS-Windows to let you browse the SGML documentation on the disk):

[Figure: folder listing of the CD-ROM root directory, showing the platform folders, the SGML folder, the CATALOG file, and the README files.]

The SGML folder contains DTDs and reference documentation and has a different structure to the platform-specific folders. The PC folder is subdivided into DOS and Windows (I am not aware of any SGML software exclusively for Windows NT alone yet: 32-bit software for Windows is typically written to execute on as well as NT). Within the platform folders, the structure follows (where possible) the life-cycle approach of the chapters of the book:

[Figure: folder listing showing the structure within a platform folder.]

Each of the upper-level folders has a plaintext INFO file containing information in unmarked text, so you can read it in any simple editor or word processor even if you haven't started using SGML yet. The same information is in two other files, an SGML README which you can use with the ViewPort demonstration browser supplied on the disk, and an HTML INDEX file which you can read with any Web browser such as Opera, Netscape Navigator, or .

In the SGML folder, the DTDS subfolder contains zipped copies of all the Document Type Definitions covered in section 1.5.1.5, and the ENTITIES folder contains zipped copies of the standard ISO character entity files which are referred to by DTDs to specify symbols and accented characters. These are all plaintext files, so they are usable on all systems.

There is a master CATALOG file in standard OASIS (SGML Open) format at the top level of the CD-ROM directory structure, which gives the equivalences between the file names for all the DTDs (and the character entity files) and their Formal Public Identifiers (explained in sections 145 and page 153). You can install all these files on your hard disk by using the setup program if you use a PC, or by copying them manually on other systems. The following folder names are recommended for installation, with 'dtds' and 'entities' subdirectories within them:

Platform   Disk prefix   Directory name
PC         c:            \sgml
Mac        Hard Disk     :sgml
Unix       [none]        /usr/local/lib/sgml
VMS        SYS$LOGIN:    [.sgml]
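To make the role of the CATALOG file concrete, here is a minimal sketch in Python of how a program might look up a Formal Public Identifier and turn it into a local file name. It is an illustration under stated assumptions, not the method used by any software on the disk: it reads only PUBLIC entries with double-quoted fields, treats anything between pairs of double hyphens as a comment, and the file names in the sample entries are hypothetical.

```python
import posixpath
import re

# One catalog entry: PUBLIC "formal public identifier" "system identifier"
PUBLIC_ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')
# Catalog comments are assumed to be delimited by pairs of hyphens.
COMMENT = re.compile(r"--.*?--", re.DOTALL)

# Hypothetical sample entries: the file names are for illustration only.
SAMPLE = """
-- sample entries for illustration --
PUBLIC "-//W3C//DTD HTML 4.0//EN" "dtds/html4.dtd"
PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" "entities/ISOlat1"
"""


def parse_catalog(text):
    """Return a dict mapping each FPI to its system identifier."""
    text = COMMENT.sub(" ", text)
    return dict(PUBLIC_ENTRY.findall(text))


def resolve(fpi, catalog, base="/usr/local/lib/sgml"):
    """Return the local path for an FPI, or None if it is not listed."""
    sysid = catalog.get(fpi)
    return None if sysid is None else posixpath.join(base, sysid)
```

With the sample entries, `resolve("-//W3C//DTD HTML 4.0//EN", parse_catalog(SAMPLE))` gives a path under the Unix directory recommended in the table; on another platform you would pass a different `base`.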

On shared systems such as VMS and Unix, you may not have access to the root directory or the system disk, so you may have to install the folders by hand elsewhere within your personal directory structure and make the relevant changes to the catalog file.

In the other folders, the software is supplied either as self-extracting installation programs (.SEA files for Macs; .EXE files for PCs) or as plain compressed files (.ZIP, .GZ, or .SIT archives). The Unix software is in .TAR.GZ format. The dearchiving and uncompressing programs needed to handle these are in the UTILS folders for Macs and PCs (Unix systems come with copies themselves already).

If a program has installation instructions supplied by the author, these are included in each program subdirectory without change; sometimes in a file with a name such as INSTALL. If there were no separate instructions, I have provided some in a file of that name. Most of the self-installing software must be copied to your hard disk before trying to run the self-installer, because it needs to create temporary files, and this is not possible on a CD-ROM.

I am told that there is a small amount of SGML software for the Atari and , which are still popular and useful machines, but I have been unable to track it down, even after many requests. Regrettably, there is still very little SGML-specific software available for VMS or Macs by comparison with the PC or Unix world. Despite two decades of proving its worth, the Mac is still regarded by many software developers as a niche machine for artists.

In addition to the SGML software, there is a complete base-level TeX installation for all three main platforms, because several public-domain SGML systems use TeX as the formatter. Please note that this is the base installation only: there is a large amount of additional free software for TeX for use in more advanced formatting. You can download any of it from any server of the CTAN (Comprehensive TeX Archive Network) on the Internet (see http://www.tug.org or http://www.ucc.ie/cgi-bin/ctan for details).

A.4. What am I allowed to do with it?

The software falls into three categories:

1. Some of the programs are copies of regular commercial packages, provided by the manufacturer for you to test with your own data before you decide to buy. In these cases the programs are usually demonstration copies or limited versions which let you run them for a fixed length of time or for a fixed number of occasions, after which you need to buy the package to continue using it.
2. Some other programs and packages have been written by members of the SGML or computing community themselves, and generously made available at no charge to everyone else: this software is entirely free, and you can continue using it forever. A large amount of it is protected by the 'copyleft' terms of the GNU General Public License, which means you can use it and redistribute it freely, but you may not prevent anyone else from doing the same. A copy of this license text is on the CD-ROM in the files gnugpl.sgml and gnugpl.txt.
3. Lastly, some of the software is shareware: you can try it for nothing, and you are encouraged to give away copies to others to try for themselves, but if you decide to use it, you are required to buy a license, usually at a relatively low price, which gives you access to the manuals and future updates. Some of these programs you can continue using free of charge for personal use, but you have to pay if you use them commercially or on a multi-user local network.

Please do not misuse or breach the terms of these licenses. Whatever your views on the rightness or wrongness of copyright and charging for software, the law as it stands requires that you observe it. Details of alternative approaches which are under constant discussion are referenced from the GNU General Public License mentioned above.

A.5. Documentation

Some of the key online documentation of SGML and program facilities is included on the disk in the SGML folder to save you the trouble of downloading it.

• Where SGML versions of these files exist, they have been used, and can be viewed with the copy of the ViewPort demo browser for PC/Windows in the root directory.
• If the documentation was in HTML, that is supplied unmodified. Be aware that some of these files may be non-conformant (do not match any DTD) so they are best viewed with your regular Web browser.
• Some is in PostScript form, and is intended to be printed. If you don't have a PostScript printer, you can install the GhostScript interpreter and its associated viewer, which let you display and print PostScript files on most common systems and printers.
• Some documentation is just plaintext, and can be viewed or printed with any editor or word processor.

Where new or experimental public software is developing rapidly, however, it is always best to go back to the Internet source for an updated copy.

A.6. GNU software

A large amount of the free software has been contributed to the GNU project by individuals and institutions everywhere. GNU stands for 'GNU's Not Unix', and is a project to create an independent, free computer operating system, with programs being written by contributors all over the world. The project solicits sponsorship and contributions in cash, code, or equipment, 501(c)3 tax-deductible in the USA (similar terms may apply in other places). Further details are at http://www.gnu.org/.

All GNU software is freely distributable under the 'copyleft' terms of the GNU General Public License (see the copy of the license text on the CD-ROM: it can also be downloaded from the Web at http://www.gnu.ai.mit.edu/copyleft/copyleft.html). This license applies to all GNU software on the CD-ROM accompanying this book. Put simply, it licenses you to use the software, and to give it to others: it explicitly prohibits you from preventing other people from doing the same. Copies of GNU software may not be distributed without a copy of the license, and without a similar condition, including this condition, being imposed upon the subsequent user.

B. SGML resources

B.1. On the Web

The principal resource is Robin Cover's SGML Web pages at http://www.sil.org/sgml/ (there's XML-specific information at http://www.sil.org/sgml/xml.html). These pages are updated almost daily and contain comprehensive pointers to just about everything that is going on in SGML.

Many other people, especially the leaders in the field, have made topic-specific pages about areas they are involved in. These are all linked from the SGML Web pages, and you can usually find information about the topic you want by using the front page (URL above) to select the topic first: the list of topics doubles as the index. There is a search engine attached to the pages if you can't find what you're looking for in the categorizations.

There are numerous SGML and XML FAQs, but the most up-to-date (and shortest) SGML FAQ is maintained by David Megginson and can be found at http://www.infosys.utas.edu.au/info/sgmlfaq.txt, and the XML FAQ is maintained by me at http://www.ucc.ie/xml/.

B.2. Usenet news

News is a bulletin board service which runs across many networks including the Internet. Your ISP (Internet Service Provider) or computer center can tell you how to access it. News posts look like email, but whereas email is aimed specifically at a single recipient (or a restricted list of recipients via a mailing list), news is pinned up in public for anyone and everyone to see. You can't therefore delete news posts, but news reading software automatically keeps track of the posts you have read, and won't show them to you again unless you specifically request it.

News is organized into a hierarchy, with each topic and subtopic named by separating the elements of the hierarchy with a period. The SGML news group is comp.text.sgml. There is no specific XML news group (yet) but there is an active hierarchy of HTML groups under comp.infosystems.www.

Most technical newsgroups are intended for serious discussion, and it is expected that you read the FAQ first and phrase your queries and comments accordingly: most of the inhabitants have plenty of other work to do. Product announcements and the occasional job advert are tolerated: spam and plain advertising are not. All posts to comp.text.sgml are archived at ftp://ftp.ifi.uio.no/pub/SGML/comp.text.sgml/.

B.3. Mailing lists

Mailing lists are run at various locations around the networks and provide a means for users to join in discussions by email only: direct interactive Internet access is not required. All mailing lists have two addresses: one (the 'list server' address) is for administrivia like subscribing and unsubscribing; the other (the actual 'list address') is for sending your discussion messages to. It is a major breach of protocol to send subscription or unsubscription requests to the list address instead of the server address, as all that does is broadcast your request to hundreds of people instead of having it acted upon.

You should normally only send plaintext messages to mailing lists, as not everyone will have the same formatting or graphical capabilities as you do. In particular you should turn off anything like HTML formatting, word processor style files, style attachments, and other non-text appendages. Never send attachments to mailing lists unless by agreement with the other subscribers: some people have to pay to receive email by the , and don't like having unwanted extras forced down their line. If you use a signature file, it is courteous to restrict it to 3-4 lines.

Davenport (DocBook)
To subscribe to the Davenport mailing list, send a 1-line email message to davenport-request@berkshire.net saying
    subscribe davenport
in the body of your message.

DSSSList
The DSSSList is provided by Mulberry Technologies as a service to the DSSSL user community: to subscribe, send a 1-line email message to [email protected] saying
    subscribe dssslist
in the body of your message. The DSSSList archive is at http://www.mulberrytech.com/dsssl/dssslist/archive, and the archive search page is at http://www.mulberrytech.com/dsssl/dssslist/search.html.

OMUG-L
The Omnimark User Group Mailing List is hosted by Omnimark: to subscribe, send a 1-line email message to listproc@omnimark.com saying
    subscribe omug-l forename surname
in the body of your message, substituting your own forename and surname.

SGML-L
There is an SGML mailing list at the Heidelberg LISTSERV: to subscribe, send a 1-line email message to li[email protected]i-heidelberg.de saying
    subscribe sgml-l
in the body of your message. Archived messages can be got by sending the command
    get sgml-l logyymm
to the server address, substituting the 2-digit year and month.

TEI-L
The TEI mailing list is hosted on the Chicago LISTSERV: to subscribe, send a 1-line email message to [email protected]ic.edu saying
    subscribe tei-l
in the body of your message. Archived messages can be got by sending the command
    get tei-l logyymm
to the server address, substituting the 2-digit year and month.

XML-L
The XML mailing list is hosted on the Dublin LISTSERV: to subscribe, send a 1-line email message to listserv@listserv.heanet.ie saying
    subscribe xml-l
in the body of your message. Archived messages can be got by sending the command
    get xml-l logyymm
to the server address, substituting the 2-digit year and month.

xml-dev
The XML Developers' list is hosted at Imperial College: to subscribe, send a 1-line email message to [email protected].uk saying
    subscribe xml-dev yourname@yoursite
in the body of your message, substituting your own name and site as in your email address. The list traffic is publicly archived (via hypermail) and WAIS-searchable at http://www.lists.ic.ac.uk/hypermail/xml-dev/.

XSL List
The XSL List is hosted by Mulberry Technologies as a service to the XSL user community: to subscribe, send a 1-line email message to [email protected] saying
    subscribe xsl-list
in the body of your message. The XSL List archive is at http://www.mulberrytech.com/xsl/xsl-list/archive/.

There are other lists for associated software, companies, services, and standards: see the page at http://www.sil.org/sgml/lists.html.
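The subscription requests described above all follow the same pattern: a plaintext message, sent to the server address, with the command as the only line of the body. A minimal sketch of composing such a message in Python is shown here; the function name and the sender address are hypothetical, and the message is only built, not sent (sending would need an outgoing mail server of your own, which is not assumed).

```python
from email.message import EmailMessage


def subscription_request(server_address, list_name, sender, extra=""):
    """Build a 1-line plaintext subscription message for a list server.

    'extra' carries anything the server wants after the list name,
    such as a forename and surname.
    """
    msg = EmailMessage()
    msg["From"] = sender
    # Requests go to the list server address, never to the list itself.
    msg["To"] = server_address
    command = f"subscribe {list_name} {extra}".strip()
    # set_content() with a plain string produces a text/plain body:
    # no HTML, no attachments.
    msg.set_content(command)
    return msg


# Hypothetical sender address, for illustration only.
request = subscription_request(
    "listserv@listserv.heanet.ie", "xml-l", "reader@example.org"
)
```

Keeping the body to the bare command matters because list servers process it automatically, so a signature file or formatting would only be read as further (invalid) commands.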

B.4. SGML User Groups

The International SGML Users' Group (ISUG) is made up of national, regional, and sectoral user groups world wide. The principal site for information is at http://www.isgmlug.org/. The objectives of the SGML Users' Group are to promote the use of the Standard Generalized Markup Language and to provide a forum for exchange of information about SGML. The current primary contact (editorial address) for the ISUG is: Pamela Gennusa, SGML Users' Group, PO Box 361, Swindon SN5 7BF, Wiltshire, United Kingdom. Phone: +44 1793 512515, Fax: +44 1793 512516, Email: [email protected].

ISUG was founded in 1984 by Joan M Smith and has just over 150 corporate and just under 300 individual members. It has liaison representatives to ISO/IEC JTC1/WG4, the ISO entity responsible for SGML (see section 2.7.1.1). The Annual General Meeting of the Users' Group is normally held in conjunction with the International Markup Conference sponsored by the Graphic Communications Association. For an address list of elected members, national chapters, SIGs and pending chapters see the current online listing at http://www.sil.org/sgml/sugExec1998.html.

In addition, there are many specialist SGML and XML user groups for specific industry, project or technology applications (details on the User Groups page at http://www.sil.org/sgml/groups.html).

References

1. Abensour, L.: 1927, Le problème féministe: un cas d'aspiration collective vers l'égalité. Paris: Radot.
2. Ahonen, H., H. Mannila, and E. Nikunen: 1994, 'Generating Grammars for SGML Tagged Texts Lacking DTD'. In: Workshop on Principles of Document Processing. Darmstadt.
3. Alschuler, L.: 1995, ABCD ... SGML, 1-850-32197-3. Boston MA: ITCP.
4. Anon: 1987, 'Automated Interchange of Technical Information'. Technical Report MIL-STD-1840A, US DoD, Washington DC.
5. Anon: 1988a, 'Computer-Aided Acquisition and Logistic Support (CALS) Program Implementation Guide'. Technical Report MIL-HDBK-59A, US DoD, Washington DC.
6. Anon: 1988b, 'Markup Requirements and Generic Style Specification for Electronic Printed Output and Exchange of Text'. Technical Report MIL-M-28001A, US DoD, Washington DC.
7. Anon: 1989, 'Technical Manuals: General Style and Format Requirements'. Technical Report MIL-M-38784, US DoD, Washington DC.
8. Anon: 1995, 'Understanding the SGML Declaration'. Technical report, Omnimark Corporation, Nepean ON. http://www.omnimark.com/resources/white/dec/.
9. Anon: 1997a, 'General Content, Style, Format, and User Interaction Requirements for Interactive Electronic Technical Manuals'. Technical Report MIL-PRF-87268A, US DoD, Washington DC.
10. Anon: 1997b, 'Revisable Data Base for the support of Interactive Electronic Technical Manuals'. Technical Report MIL-PRF-87269A, US DoD, Washington DC.
11. Bartholomew, D. J.: 1973, Stochastic Models for Social Processes, 0-471-05451-8. Chichester, England: John Wiley & Sons.
12. Brown, F.: 1942, 'ETAOIN SHRDLU'. In: Unknown Worlds. New York NY: Street & Smith Publications.
13. Busa, R.: 1980, Index Thomisticus Sancti Thomae Aquinatus Operum Omnium Indices et Concordantiae. Stuttgart-Bad Cannstatt: Frommann-Holzboog.
14. Cournane, M.: 1997, 'The Application of SGML/TEI to the Processing of Complex Multilingual Historical Texts'. Ph.D. thesis, University College Cork, Cork, Ireland.
15. Dern, D.: 1994, The Internet Guide for New Users, 0-07-016511-4. Boston MA: McGraw-Hill.
16. DeRose, S.: 1997, The SGML FAQ Book: Understanding the Foundation of HTML and XML, 0-7923-9943-9. Boston MA: Kluwer Academic Publishers.
17. DeRose, S. and D. Durand: 1996, Making Hypermedia Work: A User's Guide to HyTime, 0-7923-9432-1. Boston MA: Kluwer Academic Publishers.
18. Ensign, C.: 1996, SGML: The Billion Dollar Secret, 0-13-226705-5. Upper Saddle River NJ: PTR Prentice Hall.
19. Flynn, P.: 1995, The Handbook, 1-85032-205-8. London: International Thomson Computer Press.
20. Goldfarb, C. F.: 1990, The SGML Handbook, 0-19-853737-9. New York NY: Oxford University Press.
21. Goldfarb, C. F.: 1996, 'SGML: Grove and Grove Plan'. comp.text.sgml. http://www.sil.org/sgml/grove-CFG.html.
22. Goldfarb, C. F., S. Pepper, and C. Ensign: 1997, The SGML Buyers' Guide, 0-13-681511-1. Upper Saddle River NJ: Prentice Hall.
23. Ide, N. and J. Veronis: 1996, The Text Encoding Initiative: Background and Context, 0-7923-3704-2. Dordrecht, Netherlands: Kluwer.
24. Kimber, E.: 1996, 'SGML: Groves'. comp.text.sgml. http://www.sil.org/sgml/groveKimber.html.
25. Kimber, E.: 1997, Practical Hypermedia: An Introduction to HyTime, 0-13-309899-0. New York NY: Prentice-Hall.
26. Knuth, D. E.: 1984, The TeXbook, 0-201-13447-0. Reading MA: Addison-Wesley.
27. Krol, E.: 1997, The Whole Internet User's Guide and Catalog, 1-56592-025-2. Sebastopol CA: O'Reilly.
28. Lamport, L.: 1988, 'Document production: visual or logical?'. TUGboat 9(1), 8-10.
29. Lesk, M.: 1976, 'Bell Laboratories Computing Science Technical Report'. Technical Report 49, Bell Laboratories.
30. Maler, E. and J. el Andaloussi: 1996, Developing SGML DTDs: From Text to Model to Markup, 0-13-309881-8. Upper Saddle River NJ: Prentice Hall.
31. McEvilly, C.: 1997, How to make great Web pages. Los Alamos NM: Carlos McEvilly. http://www.c3.lanl.gov/~cim/webgreat/.
32. McGrath, S.: 1997, ParseMe.1st: SGML for Software Developers, 0-13-488967-3. Upper Saddle River NJ: Prentice Hall.
33. McGrath, S.: 1998, 'XML Programming in Python'. Dr Dobbs Journal.
34. Pepper, S.: 1997, 'The Whirlwind Guide to SGML Tools and Vendors'. Technical report, Steve Pepper, Oslo, Norway. http://www.infotek.no/sgmltool/.
35. Ramalho, J. C., J. G. Rocha, J. J. Almeida, and P. R. Henriques: 1997, 'SGML Documents: Where does Quality go?'. In: Proceedings of the SGML/XML'97 Conference. Alexandria VA, pp. 171-177.
36. Raman, T.: 1997, Auditory User Interfaces: Toward the Speaking Computer, 0-7923-9984-6. Boston MA: Kluwer.
37. Smith, C. M.: 1857, The Working Man's Way in the World. London: WF & G Cash. Reprinted 1967 by the Printing Historical Society, Bride Lane, London EC4.
38. Smith, N.: 1996, Practical Guide to SGML/XML Filters, 1-55622-607-1. Plano TX: Wordware.
39. Sperberg-McQueen, M. and L. Burnard: 1994, 'Guidelines for Electronic Text Encoding and Interchange'. Technical report, Oxford and Chicago.
40. Steinberg, S. H.: 1974, Five Hundred Years of Printing, 0-14-020243-5. London: Penguin/Pelican.

41. van Herwijnen, E.: 1994, Practical SGML, 0-7923-9434-8. Dordrecht, Netherlands: Kluwer.
42. Wall, L. and R. Schwartz: 1992, Programming Perl, 0-937175-64-1. Sebastopol CA: O'Reilly.

Glossary

abstract syntax
The theoretical instantiation of a grammar, designed to allow explanation of the ideas behind it without having to use any one particular syntax. In ISO 8879, the abstract syntax allows us to discuss the concepts of markup without tying us to a specific system.

ASCII
The American Standard Code for Information Interchange, equivalent to ISO 646 International Reference Version, which defines the 7-bit coded character set used on most computers, made up of the 52 letters of the Latin alphabet (26 capitals and 26 lowercase), the digits 0-9, and some . Sometimes called 'plaintext' to distinguish it from 'binary' data, which is unprintable. It is an error to refer to accented letters and other symbols as being 'ASCII'.

attribute
Additional item of information about an element, contained within a start-tag, in the form of a name and value separated by an equals sign. Minimization, if permitted in a DTD, may allow the name and equals sign to be omitted where it is unambiguous to do so, and the quotation marks around the value may be omitted if the value uses only name characters. In XML no minimization is allowed and all values must be quoted.

attribute list declaration
In a DTD, form of markup declaration that defines the attributes of an element, listing their types and values.

binary
Composed all or partly of non-printable characters ('control characters'). The opposite of a 'plaintext' or ASCII file. Binary files cannot normally be read or used directly by humans: they need a program to do something with them.

capacities
Measures of program storage requirements calculated from quantities specified in an SGML Declaration.
catalog
File listing Formal Public Identifiers and their System Identifier equivalents for the user's local file system (or network file repository, such as the Web).

CERN
Originally the Conseil Européen pour la Recherche Nucléaire, now the European Laboratory for Particle Physics. The birthplace of the World Wide Web and its development centre until responsibility passed to the W3C in 1996.

compilation
The process of creating a binary representation of a DTD in computer-readable form, so that documents can be processed without the DTD having to be re-read from scratch (and thus re-parsed and re-validated) every time.

concrete syntax
An actual real-life syntax instantiating a grammar. In the case of ISO 8879, the default concrete syntax involves angled brackets, ampersands, semicolons, slashes, and percent signs as delimiters for various purposes: this is provided in ISO 8879 for reference purposes and is called the Reference Concrete Syntax. A different concrete syntax could use entirely other characters.

connector
In DTDs, the sequence of elements in a content model is symbolized by the characters '&' (and: the elements must all occur, but may be in any order), '|' (or: only one element of those listed may occur), and ',' (seq: all the elements listed must occur in the order given).

content markup
Markup occurring within mixed content, typically in running text such as and -level elements. The opposite of structural markup.

content model
In a DTD, the specification of what mix of further [sub]elements or text data an element may contain.

crossed boundaries
An error in SGML markup when an element end-tag occurs outside the bounds of the containing element in which its start-tag occurred.

descriptive markup
A form of content markup used to describe identity or intent rather than function. Also used to mean optional markup, added for informational purposes.
document type
The principal identity of an SGML document, declared by its Document Type Declaration, defined in the DTD, and named as its root element.

document type declaration
The declaration at the top of an SGML instance which specifies the name of the root element and the identity of the DTD. See DTD.

DTD
Document Type Definition. The formal definition of the elements, entities, and notations which go to make up a specific document type in SGML.

element
A named component part of the text or structure of an SGML document entity, identified in the markup hierarchy, containing other elements, or text data, or both, or nothing. The names given to elements and their potential location in the markup hierarchy are declared in the DTD.

element content
The state pertaining within an element when the only content allowed is more element markup, not text. Typical of the outer structure of DTDs, where sectional elements can contain a variety of structures (paragraphs, tables, figures, notes, headings, etc).

element declaration
In a DTD, form of markup declaration that defines an element, giving its name, minimization parameters (omitted in XML), and content model.

end-tag
See tag.

entity
An abstract term for a variety of parts of a whole SGML document, which must all resolve to a concrete instantiation when the document is parsed. A parameter entity is a component of a DTD whose value requires interpretation and action; a character entity is a named representation of a ; a general entity is a fragment of text or data named for multiple reuse or external reference; a file entity is a file containing a part of the document or its DTD; the document entity itself is the root element and all it contains.

entity resolution
The process of identifying, retrieving, and inserting files specified by Formal Public Identifiers or System Identifiers during parsing, so as to construct a whole SGML document.
exclusion
The attachment of an element or list of elements to a content model to specify that they may not occur in that content in any instance, despite being included in another part of the content model. Often used to remove a definition of a structural element like a displayed formula or table or figure from being included within itself.

FPI
Formal Public Identifier. A label for a resource referred to in an SGML , constructed according to the syntax defined in ISO 8879, as a way of avoiding the need to hard-code an author's or user's local file or directory names into a document.

glyph
A printable symbol, possibly compounded from other symbols.

grove
The in-memory result of parsing an SGML document: a complete collection of tree-like maps of every object (markup as well as text content) and every relationship between them. Groves are defined as part of the HyTime and DSSSL standards. These do not specify how a parser should implement groves (that is left to the implementors): instead, they provide an abstract vocabulary and syntax for communicating information about the parsed document between parser and application.

HTML
HyperText Markup Language (an application of SGML). Formally defined as HTML 2.0 in RFC 1866 but existing in a multitude of previous and subsequent versions of greater and lesser utility.

HyTime
Hypertext and Time-based (SGML multimedia standard), ISO 10744.

IETF
Internet Engineering Task Force. The body responsible for the technical specification of the Internet. The IETF traditionally achieved rapid advance in technology via RFCs on the basis of 'rough consensus and running code' (ie get some agreement that it's a good idea, and a program that shows it working).

inclusion
The attachment of an element or list of elements to a content model to specify that they may occur anywhere in that content in any instance. Often used to add a definition of a globally-occurring element like a footnote.

instance An instance of a DTD: more commonly called 'an SGML file'. Usually composed of the author's text, enclosed in the start- and end-tags of the root element. It excludes the DTD, Document Type Declaration, and SGML Declaration.
INRIA Institut National de Recherche en Informatique et en Automatique. French National Computing Research Institute at Sophia-Antipolis near Cannes, and other sites in France. With MIT, one of the original sponsoring members of the W3C.
ISO International Organization for Standardization (Geneva). Despite the apparent misacronym, the letters refer to the Greek ἴσος ('equal') as in 'isobar', 'isotherm', etc, and were adopted as a synthesis of all the ISO/OIS/IOS acronyms of the organization in various languages.
LaTeX See TeX.
logical markup The use of markup systems to represent the structure, intent, or function of a document's components, leaving their appearance to be determined by a style sheet.
markup Symbols or characters used with text to indicate significance or action. SGML uses inline markup (markup inserted within the text at the points of significance); some systems use out-of-line markup, where the markup is stored separately from the text it applies to.
markup declaration In a DTD or a Document Type Declaration, the actual specification of markup being defined, rather than the markup itself being used. Usually indicated by an exclamation mark following the open-angle-bracket.
MIME Multipurpose Internet Mail Extensions. The Internet standard (actually a collection of many standards) for the description of the content format of attachments and extensions to email messages, also used in Web servers to stamp the type of document on outgoing files. Each different type is recognized by its MIME Content-Type, for example text/sgml for SGML files.
minimization Techniques in SGML for allowing markup to be represented with a smaller number of characters or keystrokes than normal.
minimization parameters In an element declaration, the pair of characters, each a hyphen or the letter O, which signals whether or not the start-tag or end-tag (or both) may be omitted.
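
To make the role of the two minimization parameters concrete, here is a minimal Python sketch that reads them from an SGML element declaration such as <!ELEMENT para - O (#PCDATA)>. The function name, the returned dictionary keys, and the simplified regular expression are assumptions for illustration only: the pattern handles just the plain form with both parameters present, and is not a substitute for a real DTD parser.

```python
import re

# Simplified pattern for an element declaration that carries both
# minimization parameters, e.g. <!ELEMENT para - O (#PCDATA)>.
# A hyphen means the tag is required; the letter O means it may be omitted.
_ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(\S+)\s+([-Oo])\s+([-Oo])\s+(.*?)\s*>", re.S
)


def read_minimization(declaration):
    """Report which tags may be omitted according to an element declaration."""
    match = _ELEMENT_DECL.match(declaration.strip())
    if match is None:
        raise ValueError("no minimization parameters found in declaration")
    name, start, end, model = match.groups()
    return {
        "name": name,
        "start_tag_omissible": start.upper() == "O",
        "end_tag_omissible": end.upper() == "O",
        "content_model": model,
    }
```

For the declaration above, the sketch reports that the start-tag of para is required and the end-tag may be omitted. An XML-style declaration such as <!ELEMENT para (#PCDATA)>, which has no minimization parameters, is rejected with ValueError.
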

MIT Massachusetts Institute of Technology, Cambridge, MA. The Laboratory for Computer Science was a founding organization of the W3C.
mixed content The state within an element where both text data and more element markup are valid. Typical of all paragraph-type elements (paragraphs themselves, list items, notes, captions, labels, titles, etc).
names In SGML, the names of elements and attributes and some attribute values, restricted in the set of characters they can use. By default these must start with a letter, and continue with letters, digits, periods, or hyphens for up to eight characters, or as specified in an SGML Declaration. In XML the underscore and colon are added to the valid characters, and the length limit removed.
normalization The technique of expanding all minimization during parsing and validation so that the output instance contains all the markup fully expressed.
occurrence In DTDs, the way in which elements may occur is symbolized in content models by the characters '?' (optional: the element may occur once or not at all), '+' (repeatable: the element must occur at least once but may occur more times), and '*' (optional-repeatable: the element may be absent or may occur once or many times).
omission In standard SGML, the practice or ability of a DTD to allow the start-tag or end-tag (or both) of an element to be omitted when it is unambiguous to do so.
parameters Values or sets of values which one part of a processing system passes to another in order to enable, control, or restrict its actions.
parsing The process of identifying and distinguishing between text and markup in a document, and between this and markup declarations in a DTD, and using this information to determine the document's structure.
plaintext Computer text (or file) using only the 96 printable characters of the ASCII character set (ISO 646 IRV); that is: A-Z, a-z, 0-9, and punctuation, plus the space, TAB, and linebreak characters. No fonts, no accents, no symbols, no hidden characters or encoding. This format is one of the lowest common denominators: it is public property so it can be used without license, and is portable to every computer system in the world.
prescriptive markup Markup which is designed so that the user or author must follow a (usually rigid) document structure, to ensure conformity with some external standard. Prescriptive markup usually has many compulsory elements.
property set The SGML property set is the complete repertoire of objects and their properties that may occur in SGML, categorized into a number of classes. For any given document, a document property set therefore defines the set of possible classes and properties that each component can be allocated as it is parsed.
prolog The SGML Declaration, DTD, and Document Type Declaration which must precede an instance, either implicitly or explicitly.
quantities Measures (in terms of length or occurrence) of the items specified in an SGML Declaration used in calculating capacities.
reference concrete syntax The syntax provided in the ISO 8879 standard as a concrete example of how the abstract syntax of SGML could be instantiated, and used as a reference for the construction of other concrete syntaxes. The Reference Concrete Syntax forms part of the default SGML Declaration.
Regular Expressions In Unix and Unix-like systems, a syntax for expressing patterns of characters, typically used in search programs like grep to allow complex matching.
RFC Request For Comments. The class of document in which technical suggestions are made relating to the conventions and standards used on the Internet. RFCs are subdivided into two tracks by the IETF, 'informational' and 'standards', depending on their content and potential effects.
root element The element named as the document type in the Document Type Declaration. The start- and end-tags bearing this name must enclose all the user's text and markup, forming an SGML Instance.
RTF Rich Text Format. A language devised by Microsoft for describing the appearance of text, so that it can be passed between different makes and models of word processor. Internally, RTF looks like a cross between TeX and PostScript, and can be generated by SGML systems such as Jade and InContext to provide printed output.
schema A formal statement describing the features of objects in a database and how they are related to each other.
script Interpreted program or list of actions grouped together in a file for use as a single command or action.
selection rule In DSSSL and XSL, a specification governing which element[s] or element-attribute combinations are to be used in applying subsequent style or other rules.
SGML Standard Generalized Markup Language (ISO 8879:1986). The international standard for defining and describing markup languages.
SGML Declaration The specification of coded character set, special characters, names, quantities, capacities, and other restrictions which a document type designer may wish to place on or lift from a DTD. Many small DTDs use the default SGML Declaration specified in the SGML standard.
start-tag See tag.
style rule In DSSSL and XSL, a specification governing which element[s] or element-attribute combinations are to be formatted in accordance with the given style.
stylesheet File containing a specification of how a document or its components are to appear when displayed, printed, spoken, or otherwise represented. Stylesheets for SGML are written in a language designed for the task, for example a public standard like DSSSL, CSS, or XSL; or a language designed to go with a specific product like DynaText, Synex, or other browser or formatter.
subset A portion (or all) of a DTD stored in a separate file (an external subset) or between square brackets at the end of a Document Type Declaration (an internal subset).
System Identifier A , directory path and filename, or a URI (URN or URL), giving the exact location of a file entity within a retrievable file system.
tag A markup object which signals the start or end of an element (in the case of empty elements a single tag signifies the existence). A tag is made up of the element name, enclosed in angled brackets (by convention: the actual characters can be changed in an SGML Declaration). Start-tags may contain additional information (attributes) between the name and the closing bracket; end-tags identify themselves with a slash (again by convention) between the opening bracket and the name.
TeX Typesetting language and program designed by Donald Knuth and made available in the public domain for almost any make or model of computer (commercial versions also exist). Used extensively in research and academic work as well as commercial , and common as a formatting tool with SGML systems because of its programmability. LaTeX is a set of macros for TeX to improve the consistency of document preparation by dissociating visual formatting from the structure of the document.
validation The process of analytically applying the result of parsing to ascertain if a document structure matches that declared in the DTD.
visual markup The use of markup systems to represent the appearance of a document on screen or on paper without distinguishing between the functions of different components of the document.
W3C World Wide Web Consortium. An association jointly hosted by MIT (USA), INRIA (France), and Keio University (Japan) which took over responsibility for Web development from CERN in 1995. Membership is open to corporations only.

Index

* (asterisk) AECMA,77 occurrence indicator in DTDs, 64, 166 .lElfred, 278 occurrence indicator in Regular AF, 196, see Architectural Forms Expressions, 288 AFDR,196 special meaning to Unix shell, 287 Air Transport Association of America, see wildcard in filenames, 286 ATA < and > , 112, 117, 121 delimiters in markup, see markup AMS Unix redirection, see Unix American Mathematical Society, III 3B2,235,363,366-369 Amsterdam SGML Parser, 254, 267, 378 4GL,304 API, 219, 267, 268, 276, 277, 316, 348, AAp, 84, 85, 87,111,112,114,220 374,378,379,381,382 abuse of markup, see tag abuse Application Builder, 224, 226, 227 accents, see entity, character Application Programming Interface, see Acrobat, 228,231,318,381 API ACS, 334, 335 ARC-SGML, 254, 267 Active Views, 227 Architectural Forms, 196 ADEPT, 119,213-217,219,361,371, in HyTime linking, 197 372 Architectural Forms Definition Command Language, 216, 219 Requirements, see AFDR Document Architect, 213, 217 , 112, 117 Editor, 213 ASCII,336 Adobe Type Manager, 364 ASP, 254 412 Peter Flynn

Association Europeenne de CDF, 130 Constructeurs de Materiel CDWeb Publisher, 353 Aerospatial, see AECMA CERN,123 Association of American Publishers, see Channel Definition Format, see CDF AAP Chemical Markup Language, see CML AsTeR,359 Claris Works, 201 Astoria, 371 ClipWin, 173 ATA, 77, 220 CML, 93,129 Atari,387 color attribute lists in editing, 212, 238 duplicate name tokens allowed, 191 in style sheets, 210 empty, 191 Common Telecom DTD, see CTD multiple, 191 Computer Aided Document Engineering, attributes, 23 see CADE Aurora, 131 Computer Aided Logistic Support, see Austin, 185 CALS Author/Editor, xxv, 31, 152,212,219-222, Computer-Based Training, see CBT 353 Conseil Europeen pour la Recherche awk, 179,264,293-296,304 Nuc1eaire, see CERN content Babble, 356, 357 mixed,63 Backus-Naur Form, see BNF model, 27, 62-64 Balise, 304, 313, 314, 316, 331 pernicious mixed, 65, 257, 259 Balise HTML Package, 316 Continuous Acquisition and Lifecyc1e Banff, see Konstruktor Support, see CALS, see CALS bash,287 cross-references, 210, 317 Beca, 162, 185, 186 crossed boundaries, 98 binary, 17 crosstabulation, 109 BNC, 357 csh,287 BNF,186 CSS, 143, 198, 211 bold italics, xiii in XML, 143 British National Corpus, see BNC CTAN,387 CTD,79 CADE,50 CALS, 39, 77, 89, 91, 106, 106, 198,220, DAPHNE,80 232 Dare, 343 tables, 91, 106 database Canonical XML, 265 IETM, 91, 335 Capacities, 30 mathematics, III Carthage, 184 ODBC, 316 Cascading StyleSheets, see CSS searching, 342 catalog, 203, 205, 255, 307 SGML, 342, 370 entry format, 155 Department of Defense, see DoD OASIS, 28, 155,255, 271, 350 DeskTopPro, 363 on CD-ROM, 386 DESSERT, 55 CatEdit, 153, 155, 159,273,382 disabled CB1; 355 browser, 359 CD-ROM editor, 359 contents, 385 SGML facilities, 359 format, 384 techniques for authoring, 360 Understanding SGML and XML Tools 413 disclaimer, 384 modifying, 163 DL Composer, 363 omitting undeclared content, 184 DocBook, xxv, xxvi, 31, 80, 82, 84, 85, 94, parsing and validation, 183 
97,107,132,163,167,168,173,176, simple generation, 178, 180 200,204,214,217,240,268,318, where to store, 207 330,363,364 writing your own, 160 document DTD Viewer, 50 class, 15 dtd2html, 302 creation, 43 dtddiff, 301 instance, 44 dtdtree, 301 type dtdview, 300 declaration, 45, 203, 349 DTP systems, 363 declaration subset, 57 Dublin Core, 131 definition, 44 duplication prescriptive or descriptive, 44 attribute values, 191 Document Encoding and Structuring DynaBase, 345 Standard for Electronic Recipe DynaTag, 321, 325-327, 345 Transfer, see DESSERT DynaText, 111, 198,211,325,343-347, Document Generator, 228 358,360,408 Document Style Semantics and DynaWeb,347 Specification Language, see DSSSL documentation EAD, 92,103 inaDTD,174 EasyDTD, 178 in DTD design, 162 ECMAscript, 144 , 336 ed,288 military, 89 EDD,364 quality, 35 edit,S SGML,39 editor, 201 technical, 2, 40, 79, 82, 345 graphical, 49 DoD,89 human, 9 dongle, 306 in document creation, 43 DOS plaintext, 5 editing and tagging, 246 SGML,24 software, 387 use of white-space, 53 dosedit, 5 editor wars, 201 DSSSL, 143, 188, 193,211,237,317, EDT,5,201 362,393 effectivities, 77 online,200 EFTI,79 dsssl-o, see DSSSL online Electronic Archival Description, see EAD DTD,202 Electronic Binding Project, see Ebind 'flattening', 181 Electronic Parts Catalog Exchange arbitrary tagsets, 185 Standard, see EPCES business use, 66 element, 23 compilation, 203 EMPTY, 61, 192 defaults when not used, 192 in XML, 137 grammar generated, 185 content, 63 industrial use, 76 declaration, 26, 27 instances without one, 185 designing your own, 160 mathematics, 111 spanning, 100 missing files, 207 element content, 63 414 Peter Flynn

Element Definition Document, see EDD inclusion and exclusion, 63 Element Structure Information Set, see exclusions, 63 ESIS expat, 267, 278, 300 , xxv, 5, 53, 139, 156, 183, 201, Explorer, 343 203,207,212,235-239,260,262, expression 271,289,304,306,307,313,359,365 language in stylesheets, 344 emacspeak, 359 Extended Pointer Notation, see EPN email, 20 TEl, 354 empty Extensible Markup Language, see XML elements, 61 Extensible Style Language, see XSL elements in XML, 137 External Subset, 47 end-tag, 23 ezDTD,235 entity catalog, see catalog FAQ, 35 character, 57 SGML, 40, 319,391 character entity file, 57, 83, 157, 158, XML, 3, 40, 167,391 206 FlexLM,306 character entity files, 206 floats general, 48, 57 in stylesheets, 210 manager, see catalog Flow Object nee, see FOT notation, 58 flow objects parameter, 48, 57 in XSL, 144 resolution, 203 fonts, 6, 10 resource file, 349 in stylesheets, 210 subdoc,58 used in browser, 349 EPeES, 77 Formal Public Identifier, see FPI EPN, 100, 140 Formal System Identifier Definition errors Requirements, see FSIDR ambiguous, 261 formatting, 13 in compiling DTDs, 205 DSSSL stylesheets, 193, 198 in the instance, 272 language, 193 parser error messages, 257 qualified by attributes, 350, 354, 356 reportable in SGML, 253 stylesheets, 208 reportable in XML, 275 XML stylesheets, 143 escape Formatting Output Specification backslash, 16, 223, 287, 291 Instance, see FOSI designating sequence in character forms sets, 149 HTML-style, 354 percent, 306 FOSI, 89, 198,211,218,352,360 quotes in Unix shell, 288 FOT, 318 ESIS, 263, 264, 266, 269, 271, 273, 287, Fourth Generation Language, see 4GL 300 FPI,46, 146, 147,152,156,205,255, Euromath 268,294,349 DTD,115 and Internet Domain Names, 190 editor, 115 algorithmic resolution, 270 project, 114 errors in resolving, 257 European Forum for Telecom Industry identifying the file, 149 Information Interchange, see EFTI in APPINFO/SEEALSO, 192 Excel, 231 ownership, 147 exceptions repository server, 150 

syntax, 147 HTML Pro, 124, 128 FQDN,190 HTML standard, 125 Frame+SGML, 77, 243, 363, 364, 366 math, 112, 113, 117 FrameMaker, 235, 321, 326, 341, 345, standard, 124 363,364,366 tables, 105 frames humanities HTML-style, 354 DTDs,93 Fred, 135, 162, 185, 186 Hypercard, 6 Frequently Asked Question, see FAQ hypertext, 6, 100 FSIOR,196 HyTime, 193, 351 FTP context links, 198 anonymous, 6 independent links, 198 Fully-Qualified Domain Name, see FQDN links, 197 revised standard (HyTIme 2), 196 GCA,41 GCSFUI,90 il8n, see internationalization Genera, 363 IBMIODOC, 91 General Architecture, 196 ICADD,359 General Content, Style, Format, and User ION, see Internet Domain Names in FPIs, Interaction, see GCSFUI 294 general entity, see entity, general IETp, 123, 128, 129, 407 generated text, 53,210 IETM, 90, 91 GhostScript, 389 browser, 355 GNU, 287 classes, 90 GNUS, 236 revisable database, 91 GNUscape Navigator, 236 IliAD, 359 Grab It!, 173 1M, 371, 374 Graphic Communications Association, IML,310 seeGCA inclusion Graphical DTD Editor, 162, 172 files in DTDs, 58 Graphical DTD Viewer, 240 inclusions, 63 grep, 285, 286, 288, 290, 291, 297, 304, InContext, 208, 228, 231-233, 407 380,407,418,420 indentation Grip, see SGML Editor in style sheets, 210 groves, 194,219,379 indexing, 342, 345 gzip,60 Information Manager, 219, 371 information retrieval, 342 Health Level 7, see HL7 INRIA, 123 hexadecimal instances, 20, 44 numeric character references, 192 InStEd,345 HL7,131 Institut Nationale de Recherche en hostid,306 Informatique et en Automatique, see HoTMetaL, 122, 157,212,219,353 INRIA HTML,122 Interactive Electronic Technical Manual, browser, 349 see IETM for books, xxv Interleaf, 345 HTML 2.0, 125 Internal Markup Language, see IML HTML+, 124, 127 Internal Subset, 47 HTML 3, 124, 127 International Committee on Accessible HTML 4, 124, 128 Document Design, see ICADD 416 Peter Flynn

International Organization for simple, 141 Standardization, see ISO two-way, 140 internationalization with URLs, 142 XML,138 with XPointers, 142 Internet, 6 XML, 140 books, 7 XML (XLL), 140 connection, 6 Linux, 81, 336 Domain Names in FPls, 190 LinuxDoc, 198,268,336,363 programs, 6 Lisp, 304 Internet Engineering Task Force, see IETF lists Internet Explorer, 119,266,361,386 inline,211 Internet Service Provider, see ISP LISTSERV, 393 intranets Local Area Network, see LAN publishing, 353 ls, 294, 296 ISBN, 294 LT NSL, 304, 379, 380 ISO, 36,294 LTXML,234, 305, 379, 380 standards, see standards, ISO LTDR,196 ISF, 391 italics, xx m4,298 manual J2008,106,232 Unix manual pages, 286, see Unix Jade, 183, 192, 198,200,211,236,237, Maple, 111 239,271,277,317,318,363,407 margins jadetex,318 in stylesheets, 209 Java, 128, 183,214,276-279,331,357 Mark-It!, 254, 266-268, 378 Java Development Kit, see JDK Write-It!, 268 JavaScript, 144 markup, 198 JDK, 277 declaration, 27 JIT, 276 definition, 12 JUMBO, 130 description, 23 Just-In-Time, see JIT hidden, 13 justification introduction, 9 in stylesheets, 210 omission indicators, 27 knit, 380 proof-correction, xx Konstruktor,309 SGML, xx ksh,287 Mathematica, 111 mathematics LAN, 345 DTDs, 111 Lark, 277 fonts for the Web, 111 Larval, 277 MathML, 112, 119, 130 IMpe, xxvi, 15,80,81, 198,250,251, metadata, 100 303,318,336,366,409 MIME IMpe, 250, 405 Content-Type, 349 Lexical Type Definition Requirements, see used to steer browsers, 352 LTDR minimization, 27, 28, 62, 261, 262, 289 limerick, 98 not allowed in XML, 52 linking parameters, 52, 59 extended, 141 minimum literals, 147, 268 HTML,140 MIT, 123 HyTime,197 mixed content, 63, see content, mixed Understanding SGML and XML Tools 417

Mosaic, 352 Organization for the Advancement of mtSgmlQL, 343 Structured Information, see OASIS multi-user systems, 205, 207 OSD,130 MultiDoc LT, 355 Oxford Text Archive, 181 MultiDoc Pro, 119, 132, 140, 154, 193, 249,348,354,355,362,418 P-Stat, 304 Translating Editor, 249 . P-Stat Programming Language, see PPL music, see SMDL pagebreaks in stylesheets, 209 Names, 30 Paint Shop Pro, xxv Natural Language Processing, see NLP Pandora, 88, 215 Navigator, 111, 112, 267, 277, 386 Panorama, 119, 132, 140, 154, 193,343, navigator, 349 344,348,352,354,362,418 Near& Far, 26, 164, 172, 244, 278 Panorama Viewer, 353 Near& Far Author for Word, 235, 328 parameter Near& Far Designer, 50, 69, 71, 162, 172, entity, see entity, parameter 177,240 parsing, 27 NGO, 76 and syntax checking, 255 NLp, 334, 335 and validation, 253 Non-Governmental Organization, see drag-and-drop, 266, 272 NGO normalization, 262 empty elements in XML, 62 NormDTD, 177,181,182,246 entities, 57 Nota Bene, 235, 247 HTML,256 NotePad,5 marked sections, 60 nrofI. 336 part of a file, 256 nsgmls, 65, 152, 159, 182, 183,207,237, SGML Declarations, 29 239,254,257,260,263,266, stand-alone, 266 268-273,302,317,336,368 PAT, 258, 342 NTEmacs, 237-239 PBM, xxv, xxvi Null End Tag, see NET PC-Write, 13, 14,247 null end-tag PDF, 131,318,381 delimiter, 192 Perl, 133, 162,264,265,267, 274, 277, start character, 192 283,298-300,302-304 perISGML, 162,300 OASIS, 40 PFE, 178,307, 322 occurrence, 27 PGML,131 OCLC, 185 PIC, 133 ODBC, 316 PICS,130 OFX, 131 PIO, 133 OMLE, 305, 306,309 plaintext, 5 Omnimark, 179,237,305-307,309,319, editors, 5 331, 381, 393 Platform for Internet Content Selection, omnimark-mode, 306 see PICS Online Computer Library Center, see Poet, 371 OCLC points, printers', 6 Open DataBase Connectivity, see ODBC Portable Document Format, see PDF Open Financial Exchange, see OFX positioning Open Software Description, see OSD in style sheets, 210 Opera, 386 PPL, 304 418 Peter Flynn

Precision Graphics Markup Language, see in Emacs, 237 PGML ingrep, 287 prefixes, see generated text in MultiDoc Pro, 356 Prescriptive markup, 20 in Panorama, 354 print in Synex, 350 conventions, 11 in editors, 213 not sole target, 32 syntax, 288 origins, 9 Request For Comments, see RFC printing, 362 resolver rough and ready, 362 entity, see catalog terminology, 6 Resource Description Framework, see Processing Instruction Close, see PIC RDF Processing Instruction Open, see PIa RFC, 123 product name, xx Rich Text Format, see RTF Production Publisher, 363 RIF, 77 prolog, 44 RMAIL,236 proofreading, xix rms, 374 Property Set Definition Requirements, see Roustabout, 331 PSDR RTF, 309, 318, 332 property sets, 194 RulesBuilder, 31,181,220,221,368 PSDR,196 RunSP, 159, 183 psgml, 53, 139, 150, 152, 183, 203, 207, 212,235-237,239,260,262,271, 54-Desktop, 159,381,382 289,313,365 Sara, 344, 357, 358 PSP, xxvi SASOUT, 106 pwWebSpeak, 359 SAJ<,276,276,279 PySGML,304 schema, 144 Python, 162,234,264,283,298,304 Scheme, 143, 304 python-mode, 304 scripts, 284 sculptor, 221 Quantities, 30 SDA,85 QuarkXPress, 331 attributes, 359 QWERTZ, 80, 336 SDATA,57 SDD,134 Railroad Industry Forum, see RIF SDQL,198 RainbowMaker, 65, 321, 324, 326, 332, search 345 language, 193 RAST, 263, 270 searching rast, 270 applications, 342 rbmaker, 321, 322, 324, 326, 332 display style, 352 RCS, 29, see Reference Concrete Syntax sed, 291-293, 300, 304 RDF, 130 Select, 328 RDS, 29 Selection rules, 143 redirection, see Unix sgcount, 380 Reference Application for SGML Testing, sggrep, 380, 381 see RAST, see RAST sggrep-mark, 380 Reference Concrete Syntax, see RCS, 31, SGML 190, 204, 205 arguments for use, 32 Reference Delimiter Set, see RDS buying a copy, 188 Regular Expressions changes to the standard, 187 Understanding SGML and XML Tools 419

Declaration, 28, 189, 190, 197, Special Interest Group, see SIG, see SIG 203-205 Specific Character Data, see SDATA Declaration (WWW), 189 spent, 271 Open, see OASIS spreadsheets, 104 review process, 188 Standalone Document Declaration, see WebSGML Adaptations, 189, 190 SDD Working Group 4, 188 Standard Document Query Language, see SGML Author for Word, 328, 329 SDQL SGML Declaration, 44 standards SGML Disabled Access, see SDA ANSI SGML Document Generator, 243 Z39.58-1988,85 SGML Editor, 224, 227, 228,243, 251 ATA SGML Tagger, 181,246-248,268 ATA-1000, 106,232 SGML-Tools, 82, 198, 268, 336,363 ISO SGMLC, 75, 111,304,310,331,368 10179,143,188,193,198,317 sgmlnorm, 262, 271 1064~30,31, 133, 18~ 190,356 sgmlregion, 31 10743,193 sgmls, ISO, 152, 153, 246, 254, 263, 10744,40,188,193,195,404 266-271,302,317,336 12083, 8~ 85, Ill, 112, 114, 15~ SGMLSpm, 302 214 sgmltrans, 381 13673,263,270 sgrep, 298, 343 2022:1994,149 sh,287 639, 149 SI, 46, 152, 268 646,149,401 binary file, 152 8859-1,57,206 local filename, 150 8859,302 URL,150 8879,29,31,35,36,38,146,152, URL in XML, 146 187,188,190,253,254 SIG, 274, 304 9070,147,223 Simple API for XML, see SAX 9660,383,384 SimpleText, 5 ISO-HTML, 125 Simula,304 TR 9573, 106, 112, 329 Smalltalk, 304 MIL SmartFonts, 366 MIL-D-87269, 91 SMDL,193 MIL-HDBK-28001, 106 SMPOO,273 MIL-M-28001,89 software MIL-M-38784B, 89, 106 commercial, 388 MIL-M-38784C, 214 documentation formats, 388 MIL-PRF-87268A,90 downloading, 7 MIL-PRF-87269A,91 formats, 387 MIL-STD-1840B,77 installation, 2 national bodies, 188 licenses, 388 RFC public domain, 388 1738, 47 shareware, 388 1808,47 Solaris, 321 1866, 123-125, 359, 404 sort, 294-296 SPEC SP, 182, 192,254,262,271,272,277, 2000, 77 316, 318, 379 Spec spam, 177, 183, 271 200, 77 420 Peter Flynn

TR Techexplorer, 111, 112, 250 9401: 1997, 153 Technical Corrigendum, see TC 9401, 153 ENR,190 9502:1995,106 HyTime, 196 start-tag, 23 WebSGML, 190 using grep to search for one, 290 TEl, 92, 93, 101, 104, 127, 132, 152, STiLO, 208, 230 163,276 stripsgml, 302 Extended Pointer Notation, 354 style rules, 144 header, 102 stylesheet, 349 modifying, 169 editing for a browser, 350 Telecommunications Industry Forum, see stylesheets, 53, 193, 198,208,363 TCIF changing styles while editing, 211 Telecommunications Interchange character-level styles, 210 Markup, see TIM hierarchical, 209 Terminate and Stay Resident, see TSR inheritance, 209 TPC, 15, 16, 84, 111, 112, 118,218,236, paragraph-level styles, 209 237,250-252,309,312,313,318, while editing, 53 336,366,387,405,407,409 suffixes, see generated text text Suite 9,241 generated, 53 SunOS, 321 scholarly, 100 Symposia Doc+, 228 structures, 6 Symposia Pro, 228 Text Encoding Initiative, see TEl Synex, 408, 418 textonly, 380 System Identifier, see SI, see SI, 145 TIM,78 System Identifiers, 206, 255 Tk,234 TPU, 5, 201 tables tr,295 CALS,106 transformation complex cell content, 110 language, 193 for standards documents, 110 Translating Editor, 249, 355 head, body, and foot, 107 translation HTML,105 human languages, 250 non-tabular data, 105 TSR,246 SASOUT model, 107 type validity, 134, 190 statistical, 107 , 6 table DTDs, 104 typography, 10,363 tabular data, 105 conventions, 10 tag abuse, 191 , 356 validity, 134, 190 uniq, 294, 296, 297 Tag Wizard, 333 UniSQUX, 374 TagPerfect, 332, 334 Universal Resource Indicator, see URI tags, 23 Universal Resource Locator, see URL TagView, 139 Unix tar, 251 manualpages,286,288 TC,188 redirection, 286 TCIF, 78 URI,146 tcsh, 287 not URL, 47 TeachText, 5 URL, 6, 46, 140, 196 Understanding SGML and XML Tools 421

for DTDs, 349 Write, 233, 333 U senet news, 6 writing, xx UTF-8,134 WYSINAWYM, 11 WYSIWYG, 13, 208 validation editing styles, 53 and parsing, 253 on import and export, 256 XED,234 process, 254 Xemacs, 239 requirements of ISO 8879, 253 XGML,305 validity xgrabsc, xxv new classes for the Web, 190 XLL,140 tag-valid documents, 190 Xmetal, 219, 353 type-valid documents, 190 XML, 122, 132 XML,135 and EPN, 100 vi, 5, 201 and the WebSGML Adaptations, 190 ViewPort, 119, 198,343,344,348-351, Canonical, 265 363,383-386,388 case-sensitivity, 138 W3C, 123, 128, 129 Data, 144 webs declarations, 133 in browsers, 351 described, 132 WebSGML Adaptations, 134 empty elements, 13 7 Web Writer, 228, 230, 243 FAQ, 353 well-formed, 135 linking, 140 What You See Is What You Get, see naming, 138 WYSIWYG omissions from SGML, 135 white-space, 289 parsing, 273 changes in handling, 192 Processing Instructions, 133 HTML handling rules, 64 SGML Declaration, 133 in element content, 53 syntax, 133 in stylesheets, 209 validity, 135 inXML,139 well-formed, 135 SGML handling rules, 64 white-space, 139 Word, 17, 35, 201, 235, 239, 244, 245, XML Linking Language, see XLL 281,309,321,326-331,333,336, xmltok,277 341,345 xmlwf, 277 WordPad, 233 XP,183,277 WordPerfect, 34, 35, 88, 201, 215, 235, XSL, 143, 211 239,309,341 rules, 143 SGML Edition, 239 xv, xxv, xxvi Suite, 31, 177, 235, 239, 240 XyWrite, 235 Suite Professional, 240 WordPerfect Suite, 240, 242, 365 YASP, 254,273, 379 Index of people and organizations

Adobe,228,231,318,381 Connolly, Dan, 304 Allen, Terry, 200 Corel, 50,162,172,177,235,239,240, American Mathematical Society, III 309 Apropos Toy and Tool Development, 331 Cover, Robin, xxiv ArborText, 106,212,213,215,361,371 Datalogics, 363 Bartlett, Geoff, 378 DeRose, Steve, 35, 196, 344 Bate, JL, 112 Durand, David, 196 Beeton, Barbara, III Berger-Levrault,313 EBT, 321, 343, 344 Berglund, Anders, 112, 200 el Andaloussi, Jeanne, 160 Bingham, Harvey, 360 Electricite de France, 378 Birnbaum, David, 318 Electronic Book Technologies, 321 Bosak, Jon, 200 Espert, Christophe, 378 Bradley, John, xxv European Physical Society, III Bray, Tim, 277 Exoterica, 305 Brown, Fredric, 80 Burnard, Lou, 93 Busa, Roberto, 341 GCA,40 German Research Network, 80 Chrystal Software, 371 GNU, xxv, 212 Citec, 355 Goldfarb, Charles, 36, 38, 188,254,267, Clark, James, 182, 192, 198,200,254, 382 265,267,271,277,300,302,317 Graham, Tony, 200 424 Peter Flynn

Graphic Communications Association, Prescod, Paul, 304 41 Productivity Works, 359 GriF, 112, 119,224,226,228,243,251 Rahtz, Sebastian, 318 Hood, Earl, 162 Raman, TY, 359 Richard, Pierre, 254, 378 141, 228, 273, 382 Rubinsky, Yuri, xxiv, 359 IB~,38, Ill, 112,250,267,378 Inso, 119,249,321,343 Sampson,Crrug, 107 Interleaf, 321, 326 Schuchardt, Bruce, xxv lon, Patrick, III Serna, 267, 268 Seybold, 40 Jaakkola, Jani, 297 Seymour, Robert, 299 JASC, Inc, xxv SG~L Open, 41, 153, 386 SG~L Systems Engineering, 139,310, Kilpeliiinen, Pekka, 297 312 Smith, Norm, 178, 282 Language Technology Group, 379, 380 SoftQuad, xxv, 31, 106, 122, 152,212, Light, Richard, 272 219,221,343,353,359 Lockheed ~artin, 188 Software Excellence By Design Inc, 173 Sperberg-~cQueen, ~ichael, xvi, 93, 184 ~aler, Eve, 94, 160 Staflin, Lennart, 235, 237 ~ason,Jarnes, 188 Stallman, Richard, 236 ~egginson, David, 239, 276, 391 STiLO, 112, 119,228,231,243 ~icrosoft, 17, 129,231,235,244,266, Synex, 119, 140, 143,198,249,343,348, 309,321,327,333,336,386 352,360,363,383,384 ~icrostar, 50, 69, 162, 173,235, 244, 278,328 Texcel, 219,371 ~iles 33, 363 Thomas Aquinus, St, 341 ~illigan, Spike, 98 Torvalds, Linus, 335 ~ulberry Technologies, 393 Tufte, Ed, 360 ~urray-Rust, Peter, 129 UniSQL, Inc, 374 Netscape, 111, 131,267,277,386 NICE Technologies, 332 van Herwijnen, Eric, 332 Nicol, Gavin, 60 W3C, 359, 360 OASIS, 28, 41, 106, 149, 153-155, 159, Wall, Larry, 300 217,223,233,237,241,255,271, Walsh, Norm, 318 294,316,350,382,386 Warmer, Jos, 254, 267 Omnimark, 31, 205, 309, 393 WebTY, 129 Open ~olecule Foundation, 129 Whitney, Ron, III Open Text, 31 Widman, Brian, 328 OpenText, 342 Woolf, Bill, III Word Perfect Corporation, 239 Penta, 363 Pepper, Steve, xxiv, 276 Xyvision,363 Poet Software, 371 Poskanzer, J ef, xxv Youngen,Ralph, III Index of markup elements and parameters

attribute names
ALIGN, 61, 310
ALT, 61
CLASS, 127, 128
doc, 141
HREF, 140, 141, 166
href, 141
HyTime, 197
ID, 128, 290
id, 52, 135
IMPORTANCE, 165
label, 317
lang, 127
linkend, 197
NEXT, 98
Normal, 165
PREV, 98
PRIORITY, 165
SRC, 61
STYLE, 128
units, 52
xml:link, 141
xml:space, 139

attribute values
clink, 197
default, 139
Important, 165
iso10744, 198
Low, 165
no, 134
preserve, 139
simple, 141
Urgent, 165