
Writing Structured and Semantics-Oriented Documents: TeX vs XML

Jean-Michel Hufflen

To cite this version: Jean-Michel Hufflen. Writing Structured and Semantics-Oriented Documents: TeX vs XML. Biuletyn , 2006, pp.104–108.

HAL Id: hal-00644460 https://hal.archives-ouvertes.fr/hal-00644460 Submitted on 24 Nov 2011

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Jean-Michel HUFFLEN
LIFC (FRE CNRS 2661)
University of Franche-Comté
16, route de Gray
25030 BESANÇON CEDEX
FRANCE
[email protected]
http://lifc.univ-fcomte.fr/~hufflen

Abstract

Using XML-like syntax gives documents a tree structure, which leads to the notion of a structured document. Defining domain-dependent tags introduces the notion of semantics-oriented writing. Together, these two points result in a new view of document production. In fact, both notions already existed within TeX, but in another form. This article aims to point out these notions and the differences between them. It ends with some proposals about the evolution of the tools belonging to TeX's world.

Keywords: Structured documents, semantics-oriented writing, TeX, LaTeX, PassiveTeX, XML, XSLT, XSL-FO.

Streszczenie

Używanie składni XML-owej do opisu dokumentów nadaje im strukturę drzewiastą i indukuje w ten sposób pojęcie dokumentu strukturalnego. Definiowanie znaczników domenowo zależnych wprowadza pojęcie pisania zorientowanego semantycznie. Oba elementy łącznie dają nowe spojrzenie na tworzenie dokumentów. W rzeczy samej istniały one już w TeX-u, ale w innym kształcie. W artykule staramy się omówić wymienione pojęcia oraz różnice między nimi. Kończymy propozycjami dotyczącymi rozwoju narzędzi należących do świata TeX-owego.

Słowa kluczowe: Dokumenty strukturalne, pisanie zorientowane semantycznie, TeX, LaTeX, PassiveTeX, XML, XSLT, XSL-FO.

Introduction

The notion of document has changed deeply since the introduction of SGML². The markup of a document depends only on what users want to express with their own tags, regarding the questions that are relevant for them. Besides, the notion of document transformation also appeared at this time with DSSSL³ [6]: from the same SGML document, we can derive a printable document sent to a laser printer as well as a hypertext document in HTML⁴ for the Web. SGML being too complex for defining specialised markup easily, a subset of this meta-language has been defined as XML⁵. This meta-language has succeeded: nowadays it is used as a central formalism for data interchange, some programs related to networking use configuration files written according to XML's syntax, and so on.

On another point, this notion of markup also existed within word processors such as Plain TeX or LaTeX. So this article aims to point out different kinds of markup, and how they are put into action in XML and TeX.

∗ Title in Polish: Konstruowanie dokumentów strukturalnych i zorientowanych semantycznie: TeX versus XML.
² Standard Generalized Markup Language. This meta-language now has only historical interest; a good introduction to it can be found in [1].
³ Document Style Semantics and Specification Language.
⁴ HyperText Markup Language. See [1, Ch. 12] about the relationship between SGML and HTML.
⁵ eXtensible Markup Language. A good introduction to it is [10].

Marking documents up

If we consider an HTML document, many tags used throughout it are related to questions of style: good examples are definitions of headings by means of tags h1, h2, etc. Even if the layout may be refined

TUGboat, Volume 0 (2060), No. 0 — Proceedings of the 2060 Annual Meeting 1001 Jean-Michel HUFFLEN

[Listing of Figure 1: the element markup was lost; the surviving internal entity declaration is refren-1 = "Czuj, czuj, czuwaj," and the surviving text content follows.]

Płonie ognisko

Płonie ognisko w lesie,
Wiatr smętną piosnkę niesie.
Przy ogniu zaś drużyna
Gawędę rozpoczyna
&refren-1;&refren-1;
Rozlega się dokoła,
&refren-1;&refren-1;
Najstarszy druh zawoła.
Przestańcie się już bawić
I czas swój marnotrawić.
Niech każdy z was się szczerze,
Do pracy swej zabierze

Figure 1: Example of a Polish song as an XML text.
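Since the element markup of Figure 1 is not available here, the following is only a hedged sketch of how such a song could be written. The entity name refren-1 and its replacement text come from the figure; the element names poem, title, stanza and verse, as well as the division into stanzas, are assumptions made for illustration, and only the first two stanzas are shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE poem [
  <!-- Internal entity: the refrain is written once and reused by reference. -->
  <!ENTITY refren-1 "Czuj, czuj, czuwaj,">
]>
<!-- Element names are assumed; only the entity is attested by the figure. -->
<poem>
  <title>Płonie ognisko</title>
  <stanza>
    <verse>Płonie ognisko w lesie,</verse>
    <verse>Wiatr smętną piosnkę niesie.</verse>
    <verse>Przy ogniu zaś drużyna</verse>
    <verse>Gawędę rozpoczyna</verse>
  </stanza>
  <stanza>
    <!-- Each reference expands to the entity's replacement text. -->
    <verse>&refren-1; &refren-1;</verse>
    <verse>Rozlega się dokoła,</verse>
    <verse>&refren-1; &refren-1;</verse>
    <verse>Najstarszy druh zawoła.</verse>
  </stanza>
</poem>
```

Any XML parser expands each reference to refren-1 while building the tree, so the refrain is stated once in the source but appears wherever it is referenced. This is the syntactic form of repetition; the structural form, sharing subtrees labelled by identifiers, would rely on attributes instead and is not shown.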

by means of CSS⁶ files [3, § 7.4], such an approach is concerned with the appearance of the document. In fact, HTML tags related to the structure of discourse, like p or div, are rarely used in practice.

⁶ Cascading Style Sheets.

A good example of a structured document is a poem, as shown in Figure 1: this document obviously has a tree structure. Besides, repetitions are easily implemented: a repeated verse or stanza can be expressed syntactically, by means of an entity, or structurally, by sharing subtrees labelled by identifiers. Such an approach yields a very strict hierarchy among tags.

If we look at Figure 2, the DocBook tags of this bibliography express semantic information. Such an approach conforms better to XML's philosophy, but questions of style may be more difficult to handle. For example, we see that title tags are used for three purposes: the bibliography's title, a bibliographical entry's title, and the title of a


[Listing of Figure 2: the DocBook element markup was lost; the surviving text content follows.]

Example of a small bibliography for the Bachotex 2006 conference

Stephen Donaldson R. 1982. 0-00-615239-2. 658. Fontana. Harper Collins Publishers, 77–85 Fulham Palace Road, Hammersmith, London W6 8JB, United Kingdom. The One Tree. 2. The Second Chronicles of Thomas Covenant.

Raymond Feist E., Janny Wurts. 1990. 0-586-07481-3. 827. Grafton Books. Daughter of the Empire.

(DocBook is a system for writing documents. It was initially defined on top of SGML [15], but recent versions have been reformulated in XML. We use the conventions of [11].)

Figure 2: Bibliography using DocBook.
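To make the three uses of the title tag concrete, here is a hedged sketch of the first entry of Figure 2. The data comes from the figure, but the choice of DocBook elements is an assumption, since the original markup is not available here: in particular, representing the series with a biblioset element whose relation attribute is "series" is one possible DocBook 4 encoding, not necessarily the one used by the author.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the data is taken from Figure 2, the element
     choices are assumptions. -->
<bibliography>
  <!-- (1) title of the bibliography itself -->
  <title>Example of a small bibliography for the Bachotex 2006 conference</title>
  <biblioentry>
    <author>
      <firstname>Stephen</firstname>
      <othername>R.</othername>
      <surname>Donaldson</surname>
    </author>
    <pubdate>1982</pubdate>
    <isbn>0-00-615239-2</isbn>
    <publisher>
      <publishername>Fontana</publishername>
    </publisher>
    <!-- (2) title of the bibliographical entry -->
    <title>The One Tree</title>
    <seriesvolnums>2</seriesvolnums>
    <!-- (3) title of the series the entry belongs to -->
    <biblioset relation="series">
      <title>The Second Chronicles of Thomas Covenant</title>
    </biblioset>
  </biblioentry>
</bibliography>
```

The three title elements carry the same name and can be told apart only by their context, for instance through the XPath patterns bibliography/title, biblioentry/title and biblioset/title. A stylesheet would have to attach a different layout to each of these contexts.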


series an entry belongs to. This is a good use of one tag name from a semantics-oriented point of view, but these three kinds of titles are not to be displayed the same way when the bibliography is typeset: according to English-language conventions [2, §§ 15 & 16], the title of a book should be set in italics, whereas the title of a series is just set in the 'normal' font. Last, the bibliography's title should be emphasised as a 'general' title.

These two different approaches coexist within the description of a mathematical expression in MathML⁷, which can be given either by its presentation or by its content [14, §§ 2.3.1 & 2.3.2]. Let us consider:

\[ f(x) = \frac{(ax + b)^2}{\sqrt{\pi}} \tag{1} \]

The specification of Figure 3 emphasises its graphical structure, whereas the version of Figure 4 directly refers to mathematical operations.

⁷ MATHematical Markup Language.

[Listing of Figure 3: MathML markup lost.]
Figure 3: The equation (1) in presentation mode.

[Listing of Figure 4: MathML markup lost.]
Figure 4: The equation (1) in content mode.

Structure and semantics within TeX

At first glance, the programs related to TeX & Co. only put the notion of structured document into action. Let us not forget that end-users are responsible for the semantics of their documents. Besides, a semantics-oriented approach is encouraged by the designers of these programs. For example, Leslie Lamport recommends defining a LaTeX command for an inner product, in order to decide its layout in only one place [7, § 1.5]. To give a second example from our own documents, we systematically use a \pgname command for the names of programming languages that are not logos. That gives us a unified layout for such names, and a quick search tells us which programming languages are cited throughout one of our texts. Such a command can be easily changed, as shown by our \logo command: our logos are displayed using small capitals, except in articles for TUGboat, where an ad hoc command is used.

    % Sans-serif layout for the names of programming languages.
    \newcommand{\pgname}[1]{\textsf{#1}}
    % Switch assumed to be declared once; \fortugboattrue selects TUGboat's layout.
    \newif\iffortugboat
    % \acro is the ad hoc TUGboat command; small capitals are used elsewhere.
    \def\logo#1{\iffortugboat%
      \acro{\uppercase{#1}}\else\textsc{#1}%
      \fi}

So these simple examples show that semantics-oriented writing is possible with TeX, even if it is not always practised. Besides, grouping such commands into packages improves interchange among users.
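For comparison with the \pgname command above, the same idea can be sketched on the XML side. The following XSLT 1.0 stylesheet is a minimal illustration under stated assumptions: the element names pgname and para are hypothetical and do not come from the article, and characters that are special to LaTeX are not escaped.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical stylesheet: the element names "pgname" and "para"
     are assumptions made for illustration. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Emit plain text, since the target is a LaTeX source file. -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- The layout decision is made in one place only: every pgname
       element becomes a call to the \pgname command. -->
  <xsl:template match="pgname">
    <xsl:text>\pgname{</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>}</xsl:text>
  </xsl:template>

  <!-- A paragraph is copied with its inline markup processed,
       then closed by a blank line. -->
  <xsl:template match="para">
    <xsl:apply-templates/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Applied to a fragment such as `<para>We use <pgname>Scheme</pgname> here.</para>`, this would produce the text `We use \pgname{Scheme} here.` followed by a blank line. The semantic tag stays in the source, and the actual layout is still decided by the LaTeX definition of \pgname.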


Directions

A well-known drawback of the programs belonging to TeX's world is that they recognise only their own formats. Consider the text given in Figure 1: it cannot be processed with LaTeX. In that case, this is not a problem, since we can write an XSLT⁸ program [12] whose output would be suitable for LaTeX. But this output will be produced in text mode, that is, there will be two separate checks from a syntactic point of view. If LaTeX accepted XML input, we could ensure that the result of such an XSLT program would be syntactically suitable for LaTeX. ConTeXt [4] can import some XML; LaTeX should be able to do so, too.

On another point, a good 'recycling' of TeX into XML's world is PassiveTeX [9, p. 180], which processes XSL-FO⁹ documents. TeX is unrivalled as a typesetting engine, so this approach allows some XML texts to take advantage of TeX's power. From our point of view, this project should be developed and expanded, in the sense that PassiveTeX should be able to include and mix fragments written in TeX syntax as well as XSL-FO documents.

⁸ eXtensible Stylesheet Language Transformations, the language of transformations used for XML texts.
⁹ eXtensible Stylesheet Language, Formatting Objects: this language aims to describe high-quality print outputs. See [9] for an introduction, the official document being [13].

TeX and the programs related to it often work within a closed world. As an example, LaTeX users often get used to putting LaTeX commands inside the values handled by BibTeX [8], the bibliography processor usually associated with it. That is often needed, but it makes BibTeX difficult to use for a target other than LaTeX. We think that a successor of BibTeX should be based on XML as an interchange format and be able to replace LaTeX commands by 'semantic' XML tags; that is what we plan in [5]. Such a choice would allow the layout corresponding to semantic tags to be deferred until the final step.

As a conclusion, we think that structured and semantics-oriented approaches are complementary. TeX and XML can be complementary, too. The programs related to TeX's world are sometimes viewed as old products, but they can get a new lease of life if they succeed in taking advantage of XML features.

Acknowledgements

Many thanks to Jerzy B. Ludwichowski, who wrote the Polish translation of the abstract in a very short time, and who waited for it for a longer time.

References

[1] Neil Bradley: The Concise SGML Companion. Addison-Wesley. 1997.
[2] The Chicago Manual of Style. The University of Chicago Press. The 14th edition of a manual of style revised and expanded. 1993.
[3] Michel Goossens, Sebastian Rahtz, Eitan M. Gurari, Ross Moore and Robert S. Sutor: The LaTeX Web Companion. Addison-Wesley Longman, Inc., Reading, Massachusetts. May 1999.
[4] Hans Hagen: ConTeXt, the Manual. November 2001. http://www.pragma-ade.com/general/manuals/cont-enp.pdf.
[5] Jean-Michel Hufflen: "MlBibTeX: beyond LaTeX". In: Apostolos Syropoulos, Karl Berry, Yannis Haralambous, Baden Hughes, Steven Peter and John Plaice, eds., International Conference on TeX, XML, and Digital Typography, Vol. 3130 of LNCS, pp. 203–215. Springer, Xanthi, Greece. August 2004.
[6] International Standard ISO/IEC 10179:1996(E): DSSSL. 1996.
[7] Leslie Lamport: LaTeX. A Document Preparation System. User's Guide and Reference Manual. Addison-Wesley Publishing Company, Reading, Massachusetts. 1994.
[8] Oren Patashnik: BibTeXing. February 1988. Part of BibTeX's distribution.
[9] Dave Pawson: XSL-FO. O'Reilly & Associates, Inc. August 2002.
[10] Erik T. Ray: Learning XML. O'Reilly & Associates, Inc. January 2001.
[11] Thomas Schraitle: DocBook-XML: Medienneutrales und plattformunabhängiges Publizieren. SuSE Press. 2004.
[12] W3C: XSL Transformations (XSLT). Version 1.0. W3C Recommendation. Edited by James Clark. November 1999. http://www.w3.org/TR/1999/REC-xslt-19991116.
[13] W3C: Extensible Stylesheet Language (XSL). Version 1.0. W3C Recommendation. Edited by James Clark. October 2001. http://www.w3.org/TR/2001/REC-xsl-20011015/.
[14] W3C: Mathematical Markup Language (MathML) Version 2.0, 2nd edition. W3C Recommendation. Edited by David Carlisle, Patrick Ion, Robert Miner, and Nico Poppelier. October 2003. http://www.w3.org/TR/2003/REC-MathML2-20031021.
[15] Norman Walsh and Leonard Muellner: DocBook: The Definitive Guide. O'Reilly & Associates, Inc. October 1999.
