XML London 2015 Conference Proceedings
University College London, London, United Kingdom
June 6–7, 2015

XML London 2015 – Conference Proceedings
Published by XML London
Copyright © 2015 Charles Foster
ISBN 978-0-9926471-2-4

Table of Contents

General Information
Sponsors
Preface
Improving Pattern Matching Performance in XSLT - John Lumley and Michael Kay
It's the little things that matter - Abel Braaksma
Continuous Integration for XML and RDF Data - Sandro Cirulli
Vivliostyle - Web browser based CSS typesetting engine - Shinyu Murakami and Johannes Wilm
Magic URLs in an XML Universe - George Bina
A rendering language for RDF - Fabio Labella and Henry S. Thompson
Publishing with XProc - Nic Gibson
Data-Driven Programming in XQuery - Eric van der Vlist
XML Processing with Scala and yaidom - Chris de Vreeze
Lizard - A Linked Data Publishing Platform - Andy Seaborne
Streamlining XML Authoring Workflows - Phil Fearon
Implementation of Portable EXPath Extension Functions - Adam Retter
Validating XSL-FO with Relax NG and Schematron - Tony Graham
The application of Schematron schemas to word-processing documents - Andrew Sales
XML Interfaces to the Internet of Things with XForms - Steven Pemberton
Diaries of a desperate XProc Hacker - James Fuller
General Information

Date
Saturday, June 6th, 2015
Sunday, June 7th, 2015

Location
University College London, London – Roberts Engineering Building, Torrington Place, London, WC1E 7JE

Organising Committee
Kate Harris, Socionics Limited
Dr. Stephen Foster, Socionics Limited
Charles Foster, MarkLogician (Socionics Limited)

Programme Committee
Abel Braaksma, AbraSoft
Adam Retter, Freelance
Charles Foster (chair), MarkLogician
Dr. Christian Grün, BaseX
Eric van der Vlist, Dyomedea
Geert Bormans, Freelance
Jim Fuller, MarkLogic
John Snelson, MarkLogic
Lars Windauer, BetterFORM
Mohamed Zergaoui, Innovimax
Norman Walsh, MarkLogic
Philip Fennell, MarkLogic

Produced By
XML London (http://xmllondon.com)

Sponsors

Gold Sponsor
• MarkLogic - http://www.marklogic.com
Silver Sponsor
• oXygen - http://www.oxygenxml.com
Bronze Sponsor
• Saxonica - http://www.saxonica.com

Preface

This publication contains the papers presented during the XML London 2015 conference.

This is the third international XML conference to be held in London for XML developers worldwide, Semantic Web and Linked Data enthusiasts, managers and decision makers, and markup enthusiasts.

This two-day conference covers everything XML, both academic work and the applied use of XML in industries such as finance and publishing.

The conference is taking place on the 6th and 7th June 2015 at the Faculty of Engineering Sciences (Roberts Building), which is part of University College London (UCL). The conference dinner and the XML London 2015 DemoJam are being held in the UCL Marquee located on the Front Quad of UCL, London.

The conference is held annually using the same format, with XML London 2016 taking place in June next year.
Charles Foster
Chairman, XML London

Improving Pattern Matching Performance in XSLT

John Lumley
jωL Research & Saxonica

Michael Kay
Saxonica

Abstract

This paper discusses improving the performance of XSLT programs that use very large numbers of similar patterns in their push-mode templates. The experimentation focusses on stylesheets used for processing DITA document frameworks, where much of the logical structure of the document is encoded in @class attributes. The processing stylesheets, often defined in XSLT 1.0, use string-containment tests on these attributes to describe push-template applicability. In some cases this can mean that a few hundred string tests have to be performed for every element node in the input document to determine which template to evaluate, which sometimes means that up to 30% of the entire processing time is taken up with such pattern matching. This paper examines methods, within XSLT implementations, to ameliorate this situation, including using sets of pattern preconditions and pre-tokenization of the class-describing attributes. How such optimisation may be configured for an XSLT implementation is discussed.

Keywords: XSLT, Pattern matching

1. Introduction

XSLT's push mode of processing [5], where templates are invoked by matching XPath-based patterns that describe conditions on nodes to which they are applicable, is one of the really powerful features of that language. It allows very precise declarative description of the cases for which a template is considered relevant and, along with a well-defined mechanism of priority and precedence, permits specialisation and overriding of 'libraries' to encourage significant code reuse. Whilst other features of XSLT are valuable, push-mode pattern matching is almost certainly the most important.

Consequently much effort has been expended on developing XSLT-based processing libraries for many types of XML processing, most notably in 'document engineering' vocabularies such as DocBook and DITA, which use pattern-matching templates extensively. Typically a processing step might involve the use of hundreds of templates which have to be 'checked' for applicability against XML nodes that are being processed in a push fashion. One of the challenges for the implementor of an XSLT engine is to ensure that, for most common cases, this matching process is efficient.

Various aspects of overall XSLT performance have been studied and reported [1] [3], including optimization rewriting [2]. In this paper we will examine some cases where, owing to the nature of the XML vocabularies being processed and the design of the large XSLT processing stylesheets employed, the default matching process in one XSLT implementation (Saxon) can be rather expensive, in some cases taking about a third of all the transform execution time. We'll discuss possible additions and modifications to the pattern-matching techniques to improve performance and show their effects. Some knowledge of XSLT/XPath is assumed.

The paper is organised as follows:

• We first describe “push-mode” processing in XSLT and what the process for matching template patterns is in detail.
• How this process is performed in the Saxon implementation is presented, along with some general remarks about the problems of many templates matching predicated generic, as opposed to named, elements.
• We discuss in detail measurements of the pattern matching performance when processing a large sample DITA document.
• Possible improvements using sets of common preconditions to partition applicable templates are outlined, and performance measurements using a variety of these tactics in processing the sample are discussed.
• Some other approaches, involving more detailed knowledge of stylesheet expectations, are discussed briefly.
• We speculate on methods to define and introduce such tuning features into an XSLT implementation.

doi:10.14337/XMLLondon15.Lumley01

2. XSLT push mode

XSLT's push mode of processing takes a set of items (usually one or more nodes from the input document, such as elements, attributes or text) and for each finds a stylesheet template declaration whose pattern matches the item (the “context item”) in question. Assuming one is found, the body of the template, which can contain both result tree fragments and XSLT instructions, is executed to generate a result which is then added to the current result tree. This push mode is often exploited highly recursively, descending large source input trees to accumulate result transformations, usually as modified trees. To understand the problems with large sets of such templates, which are common in industrial-scale document processing applications, we need to describe more closely what this pattern matching process is.

When processing a candidate item (“context item”) in XSLT via the xsl:apply-templates instruction, the following is the effective declarative procedure:

1. All the templates that have @match patterns and

• If it has just a single template member, then the result of the xsl:apply-templates is the (non-error) result of executing the sequence constructor of that template with the tested item as the context item.
• If the resulting template set has more than one member, then it is an implementation choice as to whether a dynamic error is thrown¹. If not, then the last in “sequence” is chosen and its body executed.

What this list doesn't prescribe is how the process is to be implemented. Clearly there are a number of possibilities for improving performance, for example by examining candidate templates in a suitable order, or by pre-classifying subsets of the possible templates. In this paper we examine some possibilities which look deeper into the template patterns themselves.

3. Template rules in Saxon

The algorithm used for matching template patterns in the Saxon processor has been unchanged for many years [1], and works well in common cases. In simplified form, it is as follows:
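As an aside on the declarative procedure of Section 2 (this is not the Saxon algorithm), the selection of a template rule can be modelled in a few lines. The following is a minimal Python sketch under stated assumptions: the names Element, Rule, class_contains and select_rule are hypothetical, the ' topic/p ' style tokens assume the usual DITA @class convention, and the narrowing by import precedence and then priority is filled in from XSLT's general conflict-resolution rules rather than from the text above. It tests every rule against the item, which is the per-node cost that grows with the number of templates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Element:
    """Toy stand-in for an element node (hypothetical, not a real XML API)."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Rule:
    """One template rule; every field name here is illustrative."""
    label: str
    matches: Callable[[Element], bool]  # stands in for the @match pattern
    precedence: int = 0                 # import precedence
    priority: float = 0.0
    position: int = 0                   # declaration order in the stylesheet


def class_contains(token: str) -> Callable[[Element], bool]:
    """Model a DITA-style test such as *[contains(@class, ' topic/p ')]."""
    return lambda el: token in el.attrs.get("class", "")


def select_rule(item: Element, rules: List[Rule]) -> Optional[Rule]:
    """Naive selection: test every rule, then narrow the matching set."""
    candidates = [r for r in rules if r.matches(item)]  # one test per rule
    if not candidates:
        return None  # a real processor would apply a built-in rule here
    top_precedence = max(r.precedence for r in candidates)
    candidates = [r for r in candidates if r.precedence == top_precedence]
    top_priority = max(r.priority for r in candidates)
    candidates = [r for r in candidates if r.priority == top_priority]
    # Several rules may remain: recover by taking the last in declaration order.
    return max(candidates, key=lambda r: r.position)


rules = [
    Rule("topic", class_contains(" topic/topic "), position=0),
    Rule("para", class_contains(" topic/p "), position=1),
    Rule("para-override", class_contains(" topic/p "), position=2),
    Rule("note", class_contains(" topic/note "), priority=1.0, position=3),
]

para = Element("p", {"class": "- topic/p "})
chosen = select_rule(para, rules)
print(chosen.label)  # prints "para-override"
```

Here two rules match the paragraph with equal precedence and priority, so the sketch recovers by choosing the later one. Every call still evaluates all four containment tests, which hints at why hundreds of such rules become expensive.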