XML Prague 2020

XML Prague 2020 Conference Proceedings
University of Economics, Prague
Prague, Czech Republic
February 13–15, 2020

XML Prague 2020 – Conference Proceedings
Copyright © 2020 Jiří Kosek
ISBN 978-80-906259-8-3 (pdf)
ISBN 978-80-906259-9-0 (ePub)

Table of Contents

General Information
Sponsors
Preface
A note on Editor performance – Stef Busking and Martin Middel
XSLWeb: XSLT- and XQuery-only pipelines for the web – Maarten Kroon and Pieter Masereeuw
Things We Lost in the Fire – Geert Bormans and Ari Nordström
Sequence alignment in XSLT 3.0 – David J. Birnbaum
Powerful patterns with XSLT 3.0 hidden improvements – Abel Braaksma
A Proposal for XSLT 4.0 – Michael Kay
(Re)presentation in XForms – Steven Pemberton and Alain Couthures
Greenfox – a schema language for validating file systems – Hans-Juergen Rennau
Use cases and examination of XML to process MS Word documents – Colin Mackenzie
XML-MutaTe – Renzo Kottmann and Fabian Büttner
Analytical XSLT – Liam Quin
XSLT Earley: First Steps to a Declarative Parser Generator – Tomos Hillman

General Information

Date
February 13th, 14th and 15th, 2020

Location
University of Economics, Prague (UEP)
nám. W. Churchilla 4, 130 67 Prague 3, Czech Republic

Organizing Committee
Petr Cimprich, XML Prague, z.s.
Vít Janota, XML Prague, z.s.
Káťa Kabrhelová, XML Prague, z.s.
Jirka Kosek, xmlguru.cz & XML Prague, z.s. & University of Economics, Prague
Martin Svárovský, Memsource & XML Prague, z.s.
Mohamed Zergaoui, ShareXML.com & Innovimax

Program Committee
Robin Berjon, The New York Times
Petr Cimprich, Wunderman
Jim Fuller, MarkLogic
Michael Kay, Saxonica
Jirka Kosek (chair), University of Economics, Prague
Ari Nordström, Creative Words
Uche Ogbuji, Zepheira LLC
Adam Retter, Evolved Binary
Andrew Sales, Bloomsbury Publishing plc
Felix Sasaki, Cornelsen GmbH
John Snelson, MarkLogic
Jeni Tennison, Open Data Institute
Eric van der Vlist, Dyomedea
Priscilla Walmsley, Datypic
Norman Tovey-Walsh, MarkLogic
Mohamed Zergaoui, Innovimax

Produced By
XML Prague, z.s. (http://xmlprague.cz/about)
Faculty of Informatics and Statistics, UEP (http://fis.vse.cz)

Sponsors

oXygen (https://www.oxygenxml.com)
Antenna House (https://www.antennahouse.com/)
le-tex publishing services (https://www.le-tex.de/en/)
Saxonica (https://www.saxonica.com/)
print-css.rock (https://print-css.rock/)
Czech Association for Digital Humanities (https://www.czadh.cz)
speedata (https://www.speedata.de/)
schematronist.org (https://schematronist.org/)
Mercator IT Solutions Ltd (http://www.mercatorit.com)

Preface

This publication contains papers presented during the XML Prague 2020 conference. In its 15th year, XML Prague is a conference on XML for developers, markup geeks, information managers, and students. XML Prague focuses on markup and semantics on the Web, publishing and digital books, XML technologies for Big Data and recent advances in XML technologies.
The conference provides an overview of successful technologies, with a focus on real-world application versus theoretical exposition.

The conference takes place 13–15 February 2020 at the campus of University of Economics in Prague. XML Prague 2020 is jointly organized by the non-profit organization XML Prague, z.s. and by the Faculty of Informatics and Statistics, University of Economics in Prague.

The full program of the conference is broadcast over the Internet (see https://xmlprague.cz), allowing XML fans from around the world to participate online.

The Thursday runs in an un-conference style, which provides space for various XML community meetings in parallel tracks. Friday and Saturday are devoted to the classical single-track format, and papers from these days are published in the proceedings.

This year we put special focus on CSS and publishing. On the un-conference day there will be an introductory tutorial about producing print output using CSS, followed by a workshop where the future of CSS Print will be discussed. The Friday opening keynote by Rachel Andrew, Refactoring (the way we talk about) CSS, will hopefully give you a new perspective on how to perceive CSS.

We hope that you enjoy XML Prague 2020!

Petr Cimprich & Jirka Kosek & Mohamed Zergaoui
XML Prague Organizing Committee

A note on Editor performance
A story on how the performance of Fonto came to be what it is, and how we will further improve it

Stef Busking
FontoXML <[email protected]>

Martin Middel
FontoXML <[email protected]>

Abstract

This paper discusses a number of key performance optimizations made during the development of Fonto, a web-based WYSIWYM XML editor. It describes how the configuration layer of Fonto works and what we did to make it faster. It also describes how the indexing layer of Fonto works and how we will improve it in the future.

1. Introduction

1.1. How does Fonto work?

Fonto is a browser-based WYSIWYM¹ editor for XML documents.
It can be configured for any schema, including many DITA specializations, JATS, the TEI, DocBook and more.

Fonto configuration consists of three parts:

1. How elements look and feel (the schema experience)
2. How they can be mutated (the operations)
3. The encompassing user interface of Fonto

The schema experience is specified as a set of rules that assign specific properties to all nodes matching a corresponding selector. These selectors are expressed in XPath. Operations also make use of XPath in order to query the documents. Effects are defined either as JavaScript code or using XQuery Update Facility 3.0.

The user interface of Fonto has several areas (e.g., the toolbar, sidebar and custom dialog boxes) in which custom UI can be composed from React components. These can observe XPath expressions to access the current state of the documents and be updated when it changes. The documents themselves are rendered recursively by querying the schema experience for each node and generating HTML appropriate for the resulting configuration.

¹ What You See Is What You Mean

1.2. What is performance?

When a single key is pressed, Fonto needs to update the XML and then update all related UI. This includes updating the HTML representation of the documents, recomputing the state of all toolbar buttons based on the applicability of their operation in the new state, and updating any other UI as necessary. Typically, such updates involve looking up the values of various configured properties for a number of nodes (by re-evaluating the associated XPath selectors against those nodes) and/or executing other types of XPath / XQuery queries. In order to keep the editor responsive, these updates need to be implemented in a way that scales well with respect to both the complexity of the configuration and the sizes of the documents being edited.
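To make the cost of such a property lookup concrete, here is a minimal sketch of a naive rule store. It is not Fonto's implementation: the names `addRule` and `lookupProperty`, the property names, and the "first matching rule wins" order are all assumptions made for illustration, and the selectors are plain predicate functions standing in for XPath expressions.

```javascript
// Hypothetical rule store (not Fonto's API): each rule assigns a property
// value to every node matched by its selector. Selectors are predicates
// standing in for XPath expressions such as 'self::p'.
const rules = [];
let selectorEvaluations = 0;

function addRule(property, selector, value) {
  rules.push({ property, selector, value });
}

// Naive lookup: test the rules for this property in registration order and
// return the first match (an assumed resolution order), else the fallback.
function lookupProperty(property, node, fallback) {
  for (const rule of rules) {
    if (rule.property !== property) continue;
    selectorEvaluations += 1; // every test is one selector evaluation
    if (rule.selector(node)) return rule.value;
  }
  return fallback;
}

addRule('layoutType', (node) => node.name === 'p', 'block');
addRule('layoutType', (node) => node.name === 'b', 'inline');
addRule('splittable', (node) => node.name === 'p', true);

const paragraph = { name: 'p' };
const bold = { name: 'b' };

const results = [
  lookupProperty('layoutType', paragraph, 'none'),
  lookupProperty('layoutType', bold, 'none'),
  lookupProperty('splittable', bold, false),
];
console.log(results, selectorEvaluations);
```

In this naive form, each lookup may evaluate every selector registered for the property, so the work per keystroke grows with both the number of rules and the number of nodes that need updating.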
In order to keep Fonto easy to configure, we should not place too many requirements on the shape of this configuration. This means Fonto has to deal with a wide range of possibilities regarding the number of selectors etc.

When we started Fonto, we considered documents of around 100 KB to be ‘pretty big’, and these could be pretty slow to work with. After heavy optimization, we now have workable editors that load documents of multiple megabytes², using (automatically updating) cross references, (automatic) numbering of sections and more. This paper details a few of the most significant optimizations we have applied in order to get to that point.

² Using just-in-time loading to only load a small subset, this even scales to working in collections totaling in the hundreds of megabytes, but that could be considered cheating.

2. Accessing schema experience configuration

As described in the introduction, Fonto uses XPath selectors to apply a set of properties to nodes. We call the combination of a selector and a value a declaration.

Example of the look and feel configuration of the ‘p’ element:

    configureAsBlock(sxModule, 'self::p');

This configuration does the following internally:

Table 1. Summary of properties set for a paragraph

Property                  Value
Automergable              false
Closed                    false
Detached                  false
Ignored for navigation    false
Removable if empty        true
Splittable                true
Select before delete      false
Default Text Container    none
Layout type               block
Inner layout type         inline
…                         ...
(a total of 23 properties, plus optionally up to 35 more that are not set automatically)

There are about 23 properties being configured for a single paragraph, each specifying whether the paragraph may be split, how it should interact
