XML: Looking at the Forest Instead of the Trees Guy Lapalme Professor Département D©Informatique Et De Recherche Opérationnelle Université De Montréal
Total Page:16
File Type:pdf, Size:1020Kb
XML: Looking at the Forest Instead of the Trees Guy Lapalme Professor Département d©informatique et de recherche opérationnelle Université de Montréal C.P. 6128, Succ. Centre-Ville Montréal, Québec Canada H3C 3J7 [email protected] http://www.iro.umontreal.ca/~lapalme/ForestInsteadOfTheTrees/ Publication date April 14, 2019 XML to PDF by RenderX XEP XSL-FO Formatter, visit us at http://www.renderx.com/ XML: Looking at the Forest Instead of the Trees Guy Lapalme Professor Département d©informatique et de recherche opérationnelle Université de Montréal C.P. 6128, Succ. Centre-Ville Montréal, Québec Canada H3C 3J7 [email protected] http://www.iro.umontreal.ca/~lapalme/ForestInsteadOfTheTrees/ Publication date April 14, 2019 Abstract This tutorial gives a high-level overview of the main principles underlying some XML technologies: DTD, XML Schema, RELAX NG, Schematron, XPath, XSL stylesheets, Formatting Objects, DOM, SAX and StAX models of processing. They are presented from the point of view of the computer scientist, without the hype too often associated with them. We do not give a detailed description but we focus on the relations between the main ideas of XML and other computer language technologies. A single compact pretty-print example is used throughout the text to illustrate the processing of an XML structure with XML technologies or with Java programs. We also show how to create an XML document by programming in Java, in Ruby, in Python, in PHP, in E4X (Ecmascript for XML) and in Swift. The source code of the example XML ®les and the programs are available either at the companion web site of this document or by clicking on the ®le name within brackets at the start of the caption of each example. The source ®le will then be shown in a web page or in a new tab of the browser. Should the page be interpreted by the browser, the reader can ask for the source of the page. XML to PDF by RenderX XEP XSL-FO Formatter, visit us at http://www.renderx.com/ Acknowledgements The writing of this XML tutorial started in the Fall of 2002 during my sabbatical at the Université de Grenoble and at Xerox Research Centre Europe. I wish to thank Gilles Sérasset, Christian Boitet, Pierre Isabelle and Marc Dymetman for many fruitful discussions. Since then, the document has been improved (at least it has increased in the number of pages...) after having used it in teaching undergraduate and graduate courses at the Université de Montréal: IFT3225 and IFT6281. I especially thank Fabrizio Gotti for his careful proofreading and many insightful comments. In 2007, I decided to start practicing what I preach by using XML technologies for the organization of the manuscript of this tutorial. Fabrizio Gotti converted the original LaTeX ®les to DocBook so that now PDF and HTML versions can be produced from a single set of XML source ®les. iii XML to PDF by RenderX XEP XSL-FO Formatter, visit us at http://www.renderx.com/ iv XML to PDF by RenderX XEP XSL-FO Formatter, visit us at http://www.renderx.com/ Table of Contents 1. Introduction .................................................................................................................................. 1 2. Instance Document ........................................................................................................................ 9 2.1. Namespaces ..................................................................................................................... 13 3. Document Validation ................................................................................................................... 17 3.1. Document Type De®nition (DTD) ..................................................................................... 17 3.1.1. Associating an Instance File with a DTD ................................................................. 21 3.2. Schema ............................................................................................................................ 22 3.2.1. Simple Types ......................................................................................................... 33 3.2.2. Complex Types ...................................................................................................... 34 3.2.3. Namespaces in Schemas ......................................................................................... 34 3.2.4. Overview of the XML Schemas ............................................................................. 35 3.3. RELAX NG ..................................................................................................................... 38 3.4. Schematron ...................................................................................................................... 43 3.5. Associating an Instance File with a Schema ........................................................................ 48 3.6. Additional Information on XML Schema ........................................................................... 49 4. XPath ......................................................................................................................................... 51 4.1. XPath expression components ........................................................................................... 51 4.2. XPath functions ................................................................................................................ 55 4.3. XPath examples ................................................................................................................ 57 4.4. Additional Information on XPath ....................................................................................... 58 5. Document Transformation ........................................................................................................... 59 5.1. XSL Transformations ........................................................................................................ 60 5.2. Transformation in HTML .................................................................................................. 63 5.2.1. Table ..................................................................................................................... 63 5.2.2. Computing New Information .................................................................................. 68 5.2.3. Bulleted Lists ........................................................................................................ 76 5.3. Transformation into a Compact Textual Form ..................................................................... 80 5.4. Transformation into PDF with XSL-FO ............................................................................. 84 5.4.1. XSL-FO Input to the Renderer ................................................................................ 86 5.4.2. From the Instance Document to the XSL-FO ®le ..................................................... 87 5.5. Transformation with a CSS ............................................................................................... 93 5.6. Associating an Instance File to a Stylesheet ........................................................................ 95 5.7. Additional Information on XSL ......................................................................................... 97 6. Document Query ......................................................................................................................... 99 6.1. XQuery output in HTML ................................................................................................ 101 6.1.1. Table ................................................................................................................... 101 6.1.2. Computing New Information ................................................................................ 103 6.1.3. Bulleted Lists ...................................................................................................... 107 6.2. Transformation into a Compact Textual Form with XQuery ............................................... 110 6.3. Querying an instance ®le ................................................................................................. 112 6.4. Additional Information on XQuery .................................................................................. 112 7. RDF : Resource Description Framework ..................................................................................... 113 7.1. Triples in RDF/XML ...................................................................................................... 116 7.2. RDF Schema .................................................................................................................. 123 7.3. RDF queries ................................................................................................................... 124 7.4. RDF versus XML ........................................................................................................... 125 v XML to PDF by RenderX XEP XSL-FO Formatter, visit us at http://www.renderx.com/ XML: Looking at the Forest Instead of the Trees 7.5. Additional Information on RDF ....................................................................................... 126 8. Document Processing by Programming in Java ............................................................................ 127 8.1. Document Object Model (DOM) ..................................................................................... 128 8.2. Simple API for XML (SAX) ...........................................................................................