XML LONDON 2014 CONFERENCE PROCEEDINGS

UNIVERSITY COLLEGE LONDON, LONDON, UNITED KINGDOM

JUNE 7–8, 2014

XML London 2014 – Conference Proceedings
Published by XML London
Copyright © 2014 Charles Foster

ISBN 978-0-9926471-1-7

Table of Contents

General Information
Sponsors
Preface
Benchmarking XSLT Performance - Michael Kay and Debbie Lockett
Streaming Design Patterns or: How I Learned to Stop Worrying and Love the Stream - Abel Braaksma
From monolithic XML for print/web to lean XML for data: realising linked data for dictionaries - Matt Kohl, Sandro Cirulli and Phil Gooch
XML Processing in Scala - Dino Fancellu and William Narmontas
XML Authoring On Mobile Devices - George Bina
Engineering a XML-based Content Hub for Enterprise Publishing - Elias Weingärtner and Christoph Ludwig
A Visual Comparison Approach to Automated Regression Testing - Celina Huang
Live XML Data - Steven Pemberton
Schematron - More useful than you’d thought - Philip Fennell
Linked Data in a .NET World - Kal Ahmed
Frameless for XML - The Reactive Revolution - Robbert Broersma and Yolijn van der Kolk
Product Usage Schemas - Jorge Luis Williams
An XML-based Approach for Data Preprocessing of Multi-Label Classification Problems - Eduardo Corrêa Gonçalves and Vanessa Braganholo
Using Abstract Content Model and Wikis to link Semantic Web, XML, HTML, JSON and CSV - Lech Rzedzicki
JSON and XML: a new perspective - Eric van der Vlist


General Information

Date: Saturday, June 7th, 2014 – Sunday, June 8th, 2014

Location: University College London, London – Roberts Engineering Building, Torrington Place, London, WC1E 7JE

Organising Committee:
Kate Foster, Socionics Limited
Dr. Stephen Foster, Socionics Limited
Charles Foster, MarkLogician (Socionics Limited)

Programme Committee:
Abel Braaksma, AbraSoft
Adam Retter, Freelance
Charles Foster (chair), MarkLogician
Dr. Christian Grün, BaseX
Eric van der Vlist, Dyomedea
Jim Fuller, MarkLogic
John Snelson, MarkLogic
Lars Windauer, BetterFORM
Mohamed Zergaoui, Innovimax
Norman Walsh, MarkLogic
Philip Fennell, MarkLogic

Produced By: XML London (http://xmllondon.com)

Sponsors

Gold Sponsor

• MarkLogic - http://www.marklogic.com

Silver Sponsors

• Xeditor - http://www.xeditor.com

• 3Ksoftware - http://www.3ksoftware.com

• oXygen - http://www.oxygenxml.com

Bronze Sponsors

• Antenna House - http://www.antennahouse.com

• Saxonica - http://www.saxonica.com

Preface

This publication contains the papers presented during the XML London 2014 conference.

This is the second international XML conference to be held in London, for XML developers worldwide, Semantic Web and Linked Data enthusiasts, managers and decision makers, and markup enthusiasts.

This two-day conference covers everything XML, both academic research and the applied use of XML in industries such as finance and publishing.

The conference is taking place on the 7th and 8th June 2014 at the Faculty of Engineering Sciences (Roberts Building), which is part of University College London (UCL). The conference dinner and the XML London 2014 DemoJam are being held in the Jeremy Bentham Room at UCL, London.

The conference is held annually using the same format, with XML London 2015 taking place in June next year.

— Charles Foster
Chairman, XML London

Benchmarking XSLT Performance

Michael Kay, Saxonica

Debbie Lockett, Saxonica

Abstract

This paper presents a new benchmarking framework for XSLT. The project, called XT-Speedo [1], is open source and we hope that it will attract a community of developers. The tangible deliverable consists of a set of test material, a set of test drivers for various XSLT processors, and tools for analyzing the test results. Underpinning these deliverables is a methodology and set of measurement objectives that influence the design and selection of material for the test suite, which are also described in this paper.

1. Objectives and Motivation

Performance of XSLT is, of course, important, though we need to qualify that by pointing out that performance is not the only thing that matters.

For Saxonica as a developer of one of the leading XSLT engines, performance is not actually our number one objective. Our first objective is standards conformance; the second is usability, and performance comes third. This means that we will (almost!) never sacrifice standards conformance in order to achieve improved performance, and we are prepared to take a performance hit in order to improve usability, for example by maintaining extra run-time information that is needed only for diagnostics when things go wrong.

We choose these priorities because we think this is what the market wants. We are probably rather obsessive about corner cases when it comes to standards conformance, but being obsessive about cases that most users don't care about means that we almost always get things right in the more common cases where users care a lot. Standards conformance is typically a major objective for vendors who can't take their place in the market for granted, and for Saxonica, achieving standards conformance was what gave us a place at the top table of XSLT vendors alongside the likes of Microsoft and IBM.

As for usability, we believe that far more users notice sub-standard usability than notice sub-standard performance. Most users only care that performance is good enough, not that it is the best available. If one processor runs a transformation in 0.1s and another does the same work in 0.2s, most users won't notice the difference. They are much more likely to end up using your product because they discover, from experience, that its error messages are more helpful.

But even though performance is our third priority, it's still extremely important. Saxonica has users running some seriously impressive workloads, and when we get things wrong, they notice.

In measuring performance, our main objective is to avoid regression between successive releases. Such regression is probably the most obvious source of complaints; the last thing users want if they move forward to a new release is to find that their particular workload runs more slowly. Although we have always included some performance tests in our standard quality assurance checklist before release, experience has shown that these are inadequate, and too many mistakes slip through.

Another significant objective is to be able to test the impact of product changes. When we introduce a change (perhaps a new optimization) that is designed to improve performance, we will often do some ad-hoc tests to ensure that the particular test case it is tackling shows the expected improvement. But assessing the overall impact of the change is much more difficult.

1 XT-Speedo - https://github.com/Saxonica/XT-Speedo/

doi:10.14337/XMLLondon14.Kay01

Some readers may be surprised that we do very little competitive benchmarking of our own product against products from other vendors. We did more of this in the early days, before Saxon became well established. One explanation is that the primary reason people adopt Saxon has always been that they want access to XSLT 2.0, and are switching from a product that only supports XSLT 1.0. In that situation, the only thing they want to be sure of is that the change will not cause an unacceptable performance hit. We've had lots of positive feedback from users making this transition, and very little negative feedback, so we haven't felt any pressure to improve our competitive position, although we have always been aware that some of the competitors' products have excellent performance.

As regards competitive performance, it's worth emphasizing that this is not a race in which the winner takes all. Firstly, speed is not a one-dimensional metric, so there will never be a single winner regardless what you choose to measure. Secondly, being within 5% of the leader is good enough for all practical purposes, since if two products differ only by 5% in performance, then users will choose between the products based on other factors. If a user has a performance problem, it's most unlikely that a 5% improvement will solve it.

2. Previous work

There have been previous benchmarking environments for XSLT.

An early attempt was the XSLTMark benchmark from Datapower, which later became part of IBM. Datapower offered this as a free download from their web site for a number of years, but it disappeared at around the time of the IBM acquisition. The licensing terms were unspecified, so although we (and no doubt others) have continued to use this benchmark in-house, we have no authority to make it public. XSLTMark suffered a common problem when comparing results across different products: different drivers measured different things. For some products, the cost of compiling the stylesheet was included in the execution cost, for others it was discounted. This made product comparisons fairly useless, but it was still a useful tool for comparing successive releases of the same product [1] [2].

XSLT processors have come on a long way since 2001, but it's very hard to find any more recent data that compares their performance.

Another benchmarking effort that appeared on the web and then disappeared leaving very little trace was Bumblebee. The main problem with this effort was that the drivers were not open source, and there was insufficient information provided to create your own drivers. We used it in Saxonica for a while, but when we found that we couldn't update or tweak the drivers to experiment with different settings, we abandoned it.

Sarvega published results of a benchmarking study in 2003, but the results were only available on a commercial basis, and the tools needed to reproduce the results were not made available at all.

A more recent effort is XSLTMark II [3], produced by Viktor Mašíček as an M.Sc thesis at Charles University, Prague. Mašíček is concerned to measure more than just performance, but although he recognizes the importance of other qualities such as usability, he does not attempt any scientific measurement. He also on occasions fails to distinguish correctness from usability, for example when he commends XSLT 1.0 processors that reject XSLT 2.0 constructs rather than ignoring them, which might well be the right thing to do from a usability perspective, but happens to be non-conformant with the XSLT 1.0 specification.

On performance, Mašíček's reported results suffer from a failure to distinguish the different factors that contribute to execution time. In the case of Java processors, he makes insufficient effort to discount the effects of Java VM warm-up time; for example, he reports Saxon-HE 9.4 as running three times slower than Saxon 6.5, an effect which can only be explained by the fact that Saxon-HE is larger and therefore takes longer to load. For some workloads this start-up cost matters to users, but for production web sites running thousands of transformations per hour, as well as for one-off transformations of very large documents, start-up cost is irrelevant. The same arguments apply to the cost of stylesheet compilation; Mašíček makes no attempt to distinguish compilation time from execution time, but for many workloads, compilation time can be ignored.

Another problem with Mašíček's benchmark is that he runs every processor in the same environment (PHP). For processors that are not designed to run in this environment, this creates an artificial overhead. For example, to invoke Java processors he uses what is essentially a command-line invocation. The overhead of invoking a Java processor in this way swamps the actual transformation cost, so the measurements are entirely untypical of what can be achieved in a native Java environment.

1 A description of the XSLTMark benchmark, including an acknowledgement of the problems in using it for cross-product comparisons: http://www.xml.com/pub/a/2001/03/28/xsltmark/index.html
2 A set of XSLTMark results from 2001: http://www.xml.com/pub/a/2001/03/28/xsltmark/results.html
3 XSLTMark II: http://xsltbenchmarking.masicek.net


Nevertheless Mašíček's work is valuable, in particular the collection of stylesheets that he has collected, and our work builds on this.

Other relevant work includes benchmarks for XQuery and XPath. For XQuery the XMark benchmark [1] is the best known. This includes a test data generator which can be used to create data files of different sizes, which therefore enables measurement of how query performance scales with data size. This is particularly useful to enable comparison of strategies for join optimization in different products. We have used this benchmarking framework extensively in Saxonica, and have adapted some of the 20 queries to XSLT, but they are not very typical of real-world XSLT workloads and this limits their usefulness.

A more recent paper from 2006 surveys XQuery benchmarks [2]. This paper contains much useful discussion of the characteristics of a good benchmark and the kind of information it can yield if analysed intelligently; in particular, the importance of determining the way in which performance varies (for example, with document size or query complexity) rather than collecting simple numbers.

3. The design of the XT-Speedo benchmark

The XT-Speedo benchmark has been designed to measure the performance of different products for a broad range of test transformations. In particular we have chosen to measure the time for three processes (where possible): compiling the stylesheet, file to file transform, and tree to tree transform. Since different vendors may have varying priorities, and since it is interesting in itself to see the difference in performance for these different parts of the transform process, we considered taking these three measurements to be useful.

We have produced XT-Speedo benchmark packages on the Java and .NET platforms, and in C/C++, to run different test drivers for the different products which are available in different environments. For each environment, the packages each contain a class to run the benchmark tests (called Speedo on Java, and RunSpeedo on .NET), and an abstract class called IDriver, which is subclassed for each product-under-test. The IDriver interface defines methods to compile a stylesheet, to load a schema, to build a source document, and to run tree-to-tree or file-to-file transformations.

Not every product supports all these options. Some, obviously, are not schema-aware. Some, such as Altova's RaptorXML, do not provide an explicit interface to compile the stylesheet (perhaps they rely on caching instead). Some, such as James Clark's XT, do not separate tree construction from transformation. A further complication is that we want to compare the performance of all processors when running XSLT 1.0 tests, but we are also interested in XSLT 2.0 and XSLT 3.0 performance, so we have to accommodate the fact that for different processors, we may have different subsets of the measurements, available over different subsets of the tests. We therefore use XML-based catalog files to control which tests are run, using which processors.

The design of the benchmark is illustrated by the following schematic:

1 XMark benchmark http://www.xml-benchmark.org/index.html 2 XQuery benchmarks survey http://leo.saclay.inria.fr/events/EXPDB2006/PAPERS/Afanasiev.pdf


Figure 1. Architecture diagram

The main command line input for the Speedo benchmark (i.e. input for the 'run' method of the Speedo class) is the tests catalog file and the drivers catalog file. An option is also provided to supply the location of the output directory for result files. Further options are provided to select which test cases from the catalog to use — primarily by providing a regular expression name pattern, but also for example by choosing to skip tests which are slow.

The tests catalog ("catalog.xml") identifies a collection of test cases, gleaned from various sources, plus some newly created ones. Each test case is a particular transformation for the product to process. The catalog contains a description of the test transformation, links to the source XML file and XSL stylesheet (and in some cases, an XSD schema), and an XPath assertion to be applied to the output of the transformation to check the results are plausible. (It's not a primary aim of this exercise to check the conformance of processors to the specification; but we want to weed out results that are wildly out, especially cases where the processor fails to perform the transformation at all.)
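To make this description concrete, a test-case entry might look roughly like the following. This is an illustrative sketch only: the element and attribute names, file paths and assertion are assumptions based on the description above, not the actual XT-Speedo catalog schema (which can be inspected in the GitHub repository).

<!-- Hypothetical sketch of one entry in catalog.xml -->
<test name="xmark-q8" slow="no">
  <description>XMark query 8 adapted to XSLT (join-heavy)</description>
  <source href="data/xmark1.xml"/>
  <stylesheet href="xslt/xmark-q8.xsl"/>
  <schema href="schema/xmark.xsd"/>      <!-- only used by schema-aware drivers -->
  <assert>count(//item) ge 1</assert>    <!-- plausibility check on the output -->
</test>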


The drivers catalog ("drivers.xml") contains the driver data, including: name, implementation class, implementation language, XSLT version. This is where initialization option settings and test run options can be added. In the case of Saxon, we might have several entries in the drivers catalog that run Saxon with different configuration options (optimization on or off, bytecode generation on or off, and so on), allowing us to assess the impact of these configuration switches. The drivers catalog can also indicate that particular tests should not be run with a particular driver (because they are known to crash, for example, or because they are excessively slow.)
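Again purely as an illustration, a driver entry might look something like the sketch below; the element and attribute names and the implementation class are invented for the example and are not the actual drivers.xml format.

<!-- Hypothetical sketch of one entry in drivers.xml -->
<driver name="SaxonEE-9.5-J-nobytecode"
        class="xtspeedo.SaxonEEDriver"
        language="java"
        xsltVersion="2.0">
  <!-- initialization options, e.g. switch bytecode generation off -->
  <option name="generateByteCode" value="false"/>
  <!-- tests that should not be run with this driver -->
  <skip test="docbook-full" reason="excessively slow"/>
</driver>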
Because different processors run in different environments, collecting a full set of data for all processors requires more than one program run. As a minimum, there will be three runs, one for Java processors, one for .NET, and one for C/C++. But if we want to measure different versions of Saxon, or performance on different hardware or operating systems, then additional runs will be needed. (We have experimented with a mechanism that calibrates the hardware speed and adjusts performance measurements to compensate for differences. However, this mechanism is not fully operational.)

Each execution of the benchmark runs all the selected tests from the test catalog (as selected by the specified configuration), and produces one XML results document per driver. For each test case, measurements are taken for the times (in milliseconds) to perform three processes: compiling the stylesheet, file to file transform, and tree to tree transform. Each test case is run multiple times (by default set at 20) and the average times for these processes are taken, in order to eliminate warm-up time and aberrations caused by activities such as garbage collection. (For Java in particular, hotspot compilation causes dramatic improvements in execution time the more often the code is run, with performance often stabilizing only after a minute or so.) These average process times are recorded in the result file, indexed by the test cases, where the level of success of the test run is also recorded - 'success' if the run is fine, 'wrong answer' if transforms take place but the assertion test fails (so the transformation result is not as expected), and 'failure' if the transformation fails at some point.

A selected subset of these result files can then be collated using the "report" XSLT stylesheet. Using the results XML files as input, this stylesheet produces HTML documents (including SVG graphics) to view the results. The results tables give times relative to a selected baseline driver (chosen in the drivers catalog). The main overview page contains, for each driver, a measure of the overall performance relative to the baseline for each of our three processes: file to file transform, tree to tree transform, and stylesheet compile. The formatted report for each driver contains tables with rows for each test case, giving the relative times, and actual times, for the three processes.

As our "bottom-line" metric we use the sum of the process times (over all tests) for the driver, relative to the sum for the baseline driver. This of course gives greater weighting to tests which take longer; a different choice of computation could well give a different picture. We also give the minimum and maximum values of the relative times for the individual tests, to give an idea of the spread. We fully recognize that this is highly arbitrary; there are other ways of doing the aggregation that would give different results. XSLTMark, for example, divides execution time by source document size to give a transformation speed measured in bytes per second, and averages across these speeds. In some cases, as we will see, relative performance of different processors varies substantially from one test to another, and because our test collection makes no serious attempt to be representative of any real workload, an average across all the tests can fairly be dismissed as meaningless. In defence, users of the benchmark are free to substitute a different set of tests that reflects their own choice of workload more accurately.
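Restating the bottom-line metric as a formula (this is simply the aggregation just described, with T the set of selected tests, d the driver under test, b the baseline driver, and time(d, t) the average measured time for one of the three processes on test t):

\mathrm{rel}(d) = \frac{\sum_{t \in T} \mathrm{time}(d, t)}{\sum_{t \in T} \mathrm{time}(b, t)}

Because numerator and denominator are sums of absolute times, a test that takes ten times longer than another contributes ten times as much to the ratio, which is the weighting effect noted above.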


4. Test Data

The selection of data files and stylesheets used in the benchmark should not be regarded as being fixed in concrete. XT-Speedo is intended as a framework for measuring XSLT performance, not primarily as a set of test programs which claim any kind of canonical status. The variability of results across the different test data sets often provides more information than any aggregate numbers. The data included in the benchmark is a motley collection, including some things that just happened to be available (for example, files from the original Datapower XSLTMark benchmark), some that we wrote specially because we wanted to investigate a particular area of performance, some that we have used in the past to study particular performance issues reported by Saxonica customers, and also a translation of the XMark XQuery benchmark, allowing us to see how XSLT and XQuery performance compare. The XMark data is particularly useful because it allows one to study how performance varies as a function of the size of the source data set.

Any attempts to aggregate results over all the tests are inevitably flawed. While a figure that averages performance across a range of different tasks is likely to be more reliable than a figure for one task alone, it would be quite wrong to assume that the tests in this benchmark collection are representative of any real production workload. Although they are nearly all real programs designed to perform (or at least emulate) a useful task rather than to stress obscure corners of the processor, some of them perform rather untypical tasks, such as parsing XPath expressions.

It is therefore quite legitimate, and positively encouraged, to run the XT-Speedo benchmark with different data files that better characterize the workload for which performance data is required. Unlike some classic industry benchmarks such as TPC, we have no aspiration to define a performance metric that vendors can publish on billboards to proclaim that their product is 32.87% faster than the competition. Rather, the benchmark is a resource that anyone can use to compare different workloads in different environments in any way that suits their purposes.

5. The Problem of Bias

We are acutely aware that our results are not impartial. We know that our own motivations are divided between wanting to know the truth about how our product rates against the competition, and wanting our own product to perform well.

The problem of bias arises from several sources: requirements, expertise and motivation.

• Requirements: in designing the benchmark, we are choosing what to measure, and the metrics we choose reflect our assumptions about what we think is important. For example, we consider compile-time performance much less important than run-time performance. But others might have different priorities. Similarly, all our measurements focus on latency rather than throughput (the time to execute transformations in a single thread). We quickly found that one particular processor, Altova RaptorXML, fares very badly on this metric, because it is designed to execute in an HTTP server environment and is clearly optimized for throughput rather than latency. The fact that it scores very badly on our measurements does not mean it would score equally badly if we chose different metrics.

• Expertise: we know how to get the best possible performance out of our own processor, but we have far less knowledge of the products of our competitors. We've seen third-party benchmarks that ran Saxon in hideously inefficient ways (for example, taking the input from a DOM tree, which can increase transformation time by a factor of ten), and we know that we are at risk of making equally bad choices when running other products that we are less familiar with.

• Motivation: we naturally want to get the best possible results for our own product, so if the results don't look good, we will instinctively try again with different settings. We don't have the same motivation for other products, so we are less likely to make the effort. To take an example of this effect, when we compared Saxon/C against libxml our first attempts showed Saxon/C in a very poor light. We naturally investigated, and found a gross error in the way the measurements were being computed. Can we honestly say that we would have investigated as thoroughly if the results had been the other way around?

The bias is there despite our best intentions. We want good data on our competitors' products; we don't want to deceive ourselves. Our best defence against bias is to make the benchmark open source. Our hope is that we will get contributions from others whose bias is different, in particular, who will apply the same diligence to other products as we apply to our own. Meanwhile, however, the biased results are the only ones available.


Because we know there is bias, we refrain from publishing detailed data for our competitors' products in this paper. The results are available on the web site, where they can be corrected if they turn out to be wrong. They will also be presented in the conference, but as a snapshot of current results, not as part of the permanent record.

The problem of bias arises far less when we are comparing our own product, Saxon, running in different versions and configurations. As explained earlier, this is in fact our primary motivation for producing the benchmark. So in the next section these results can be seen as more impartial than the competitive rankings.

6. Selected Results

In this section we will present some of the results we have obtained by running the benchmark, and our analysis of these results. We focus on five particular comparisons: an overall comparison of all processors on the Java platform (all of which, with the exception of Saxon, are XSLT 1.0 processors); a comparison of Saxon on the Java and .NET platforms; a comparison of Saxon with XMLPrime, this being the only other XSLT 2.0 processor we were able to study; a comparison of Saxon 9.5 with a current development snapshot of the forthcoming Saxon 9.6 release; and a comparison of the new Saxon/C processor with libxslt.

6.1. Ranking of Java Processors

A number of XSLT processors have been developed for the Java platform: as well as Saxon, there are several versions of Xalan, including the XSLTC processor which was developed separately but is now bundled with the Xalan distribution; there is James Clark's original XT processor; there is the no-longer-available jd., and there is IBM's commercial Websphere processor. Of these, Saxon and Websphere are the only two processors that support XSLT 2.0, and the only two that are still actively developed. For commercial reasons we have not been able to include Websphere in our study. A comparative study of Java processors is therefore confined to XSLT 1.0, and the interesting question for us is how Saxon stacks up against products that have been around and stable for many years. Here are the XT-Speedo results we are currently getting, using Saxon EE 9.5 as the baseline (recall that we do not get tree to tree transform times for XT).

Table 1. Results overview table for Java drivers
Times relative to SaxonEE-9.5-J driver (smaller values represent faster times)

Driver        | File to file transform        | Tree to tree transform         | Stylesheet compile
Saxon-6       | 1.108 (min 0.271, max 4.33)   | 3.915 (min 0.178, max 429.71)  | 0.204 (min 0.081, max 0.779)
SaxonHE-9.5-J | 1.076 (min 0.512, max 3.329)  | 1.279 (min 0.325, max 98.844)  | 0.212 (min 0.083, max 1.802)
Xalan         | 2.452 (min 0.467, max 13.379) | 8.911 (min 0.37, max 568.723)  | 0.283 (min 0.1, max 1.284)
XSLTC         | 0.989 (min 0.273, max 3.407)  | 3.142 (min 0.182, max 257.989) | 0.544 (min 0.203, max 3.195)
XT            | 1.398 (min 0.336, max 7.717)  | NaN (min NaN, max NaN)         | 0.23 (min 0.088, max 0.886)


The tree-to-tree transformation times shown here illustrate the difficulty of getting good measurements on the Java platform. For all processors, the "max" figure indicates the presence of outliers in the results that create a completely distorted bottom line. If these rogue results are excluded, the figures end up being much closer to the file-to-file timings. So we'll concentrate on the file-to-file numbers as they appear to show a more regular picture.

These figures show Saxon-EE performing 7% faster than Saxon-HE on average, which is not as large a margin as we would like given the investment we have made in features such as optimization and byte-code generation, but perhaps reflects that these advanced techniques make little impression on straightforward transformations which dominate the test suite. The 10% edge over the old Saxon 6 processor (which implemented XSLT 1.0 only) is also a satisfactory outcome. In fact these figures mask the fact that there are a few transform times where Saxon-EE dramatically outperforms the other processors because of the way in which it optimizes joins: for the test xmark-q8-4 Saxon-EE is almost 100 times faster than Saxon-HE (most processors have quadratic performance on this test, which Saxon-EE optimization reduces to near-linear).

James Clark's original XT processor is now of largely historic interest, but in the early years it was noted for its lightning-fast speed, so it is good to note that we are now 40% faster.

Saxon's very significant advantage over the interpretive version of Xalan should not surprise anyone who has compared the two. The fact that Xalan, being the default XSLT processor in Java, is both the most widely-used and the slowest of these products by a significant margin, tends to reinforce the message at the beginning of this paper that coming first in the performance race brings no guarantee of market leadership.

The only product ahead of Saxon-EE is the XSLTC processor (which is bundled with Xalan). This processor makes heavy use of bytecode generation and the results appear to demonstrate that there are still advances to be made in this area. (Saxon-EE also uses bytecode generation, but primarily for the XPath part of the processing. Most of our measurements of the effect of bytecode generation have been with XQuery, where we generally record a boost of around 25%, but with wide variations. It is not surprising that the speed-up we get for XSLT should be lower.)

Let's take a closer look at the comparison of Saxon-EE with XSLTC results (see charts below). Here we see in more detail that for file to file transform, in the majority of tests XSLTC is just a few percentage points better than Saxon-EE, with XSLTC generally slightly faster (with just a few exceptions). For many of these tests, especially the XMark queries which dominate the right-hand half of the chart, the actual performance for file-to-file transformation is dominated by parsing and serialization costs, and it appears to be in these areas that XSLTC has the edge.

Tests whose results are outside the 95th percentile range are shown with an arrow to indicate they are off the scale. The actual numbers are available in the detailed results listings. In this particular example, the outliers are not extreme; for all tests the ratio between XSLTC speed and Saxon-EE speed is somewhere between 0.25 and 4.

Figure 2. XSLTC file to file transform speeds relative to SaxonEE-9.5-J


For tree-to-tree transformation we see a very different picture (below). For these transformations the times for XSLTC are generally close to Saxon or a little slower on the left-hand part of the chart where source documents are mainly rather small, but on the right-hand side, where most of the source documents are 1-4Mb in size, XSLTC is significantly slower than Saxon-EE. What isn't immediately obvious from these charts is that for Saxon, the tree-to-tree time for the larger source documents is a tiny fraction of the file-to-file time (in one typical example, 0.24ms rather than 20.4ms), but in the XSLTC case the timings for the two scenarios are much closer (10.1ms compared with 16.2ms). We suspect that we are running XSLTC sub-optimally here by providing a DOM as input. Perhaps it is not using the source tree that we supply directly, but rebuilding it internally into its own format.

Figure 3. XSLTC tree to tree transform speeds relative to SaxonEE-9.5-J


Our final metric is for stylesheet compile time. Here the figures are fairly uniform across all tests, with XSLTC compiling in around half the time of Saxon-EE on average:

Figure 4. XSLTC stylesheet compile speeds relative to SaxonEE-9.5-J


Both Saxon-EE and XSLTC compile to bytecode, so it is not surprising that both are significantly slower at compile time than the products that are pure interpreters. It is noteworthy that Saxon-EE takes significantly longer to compile the stylesheet than all the other processors. We could argue that this is by design; we deliberately do as much work as possible at compile time in order to improve run-time execution speed. On the other hand, there are workloads (the DocBook rendering of this conference paper is an example) where compiling the stylesheet takes longer than the actual transformation, and there is definitely an opportunity here for Saxon to do better.

The conclusion we can draw from these results is that while Saxon is not always the fastest, it performs well overall. In particular, for anyone wanting to move forward to XSLT 2.0 for the functionality and productivity benefits it offers, or who is attracted to Saxon because the product is actively developed and supported, performance is not an obstacle. For some workloads, users have seen significant performance benefits by moving to Saxon, but as the numbers show, this cannot be expected to apply in every case. The other apparent result, subject to confirmation, is that the area where Saxon-EE has most improvement potential is in parsing and serialization, not in transformation proper. There are a great many workloads where XML parsing (of the input) and serialization (of the output) dominate the actual transformation time.

6.2. Comparing Saxon on Java with Saxon on .NET

Saxon on .NET starts with a disadvantage: the product is written in Java, and then cross-compiled using the IKVMC compiler to the IL code supported on the .NET platform. This inevitably introduces a performance penalty.

The question for some time has been how large this penalty is, and we have had conflicting reports on this over the years. Sometimes we see an overhead of around 25%, but sometimes the .NET performance is reported to be five times slower.

Here are the XT-Speedo results we are currently getting: file to file transform relative time average 3.829 (min 1.136, max 8.716), tree to tree transform relative time average 3.598 (min 0.239, max 8.938), stylesheet compile relative time average 2.386 (min 0.168, max 3.454).


Figure 5. SaxonHE-9.5-.NET file to file transform speeds relative to SaxonHE-9.5-J


Figure 6. SaxonHE-9.5-.NET tree to tree transform speeds relative to SaxonHE-9.5-J


Figure 7. SaxonHE-9.5-.NET stylesheet compile speeds relative to SaxonHE-9.5-J


It is clear that generally Saxon on .NET is indeed running transforms 3 to 4 times slower than on Java, with some variation for different tests. Perhaps surprisingly, Saxon on .NET sometimes performs tree to tree transforms faster than on Java. By looking at the table of results (not shown here) we see that .NET speeds for tree to tree transform are only ever faster for transforms which are very quick - those which take less than 2ms (but .NET is not uniformly faster for these). Generally performance worsens for longer transforms, but we may note that in general the scaled pairs of xmark tests have similar relative times.

The chart for compile times is particularly remarkable, because most of the compilations are actually faster on .NET, but the average is still slower: the explanation for this paradox is that most of the stylesheets are very small, but one test (on the far left of the chart) compiles the DocBook stylesheets, which are much larger than all the others combined. Again the performance ratio seems worse for longer runs. One possibility we need to explore is that we are measuring different things on the two platforms (what are the precise semantics of the instrumentation APIs we are using?), or that the measurements are somehow suffering from rounding errors.

We do not yet fully understand the reasons for the discrepancies in these results. We have established that there are no significant differences in the code path with Saxon, and we know that the overhead imposed by IKVMC is not more than 25% or so. So far, our investigations suggest that the problem lies somewhere in the OpenJDK library. Saxon on .NET uses the OpenJDK java library, cross-compiled to .NET using IKVMC, and the data makes us suspect that there are parts of this library that perform significantly worse than the equivalent library delivered with the Oracle/Sun JDK. We have confirmed this by building Saxon on the Java platform to run with OpenJDK rather than with the Oracle/Sun libraries. Hopefully, armed with these measurements, we can identify a specific cause within the OpenJDK and eliminate it. As always, good measurement data is the prerequisite to solving performance problems, and we now for the first time have that data.


6.3. Comparing Saxon with XMLPrime

For various reasons (which would make an interesting subject for another talk), none of the XSLT 2.0 processors currently on the market are pure open source products. Products from IBM, Intel, and MarkLogic are purely commercial, while those from Saxonica, Altova, and XMLPrime provide limited free versions in one form or another, but offer only commercial licenses for the full product capability. This of course makes product comparisons much more difficult and expensive.

The other XSLT 2.0 processors we have included in our study are Altova's RaptorXML and XMLPrime. In the case of Altova RaptorXML, the product architecture is so different that the figures we obtained were not meaningful to compare; each transformation requires an HTTP request, and our performance data was dominated by the costs of these requests. No doubt much better figures could be obtained for Altova if we did the measurements a different way, but for the present we have discarded the numbers as not useful. XMLPrime, on the other hand, has a very similar architecture to Saxon; indeed, a cursory glance at its structure shows that it was strongly influenced by Saxon's design. So measurements here should be useful.

The most interesting result here is to show relative speeds for each test case. We see that the pictures for file to file and tree to tree are closely related. In general, XmlPrime is running just a little slower than Saxon EE, but it is sometimes faster. There are just a few cases where XmlPrime is much (more than 5 times) slower than Saxon, and here we see the difference for both file to file and tree to tree transform times. These cases are all among the tests which take longest, and so are meaningful. In contrast, the cases for which XmlPrime is much faster than Saxon are all very quick tests, so we may consider these numbers to be less reliable and meaningful - the fact that these cases are different for file to file and tree to tree transforms, where we have already said that the results correlate strongly, backs this up.

Because Saxon is faster on some tests, while XMLPrime is faster on others, any "bottom line" comparison of the two products is highly sensitive to the choice of test material, and to the way in which the results for different tests are aggregated. The formula we use for aggregation shows file to file transform relative time average 4.581 (min 0.266, max 189.62), tree to tree transform relative time average 9.753 (min 0.29, max 469.095), stylesheet compile relative time average 1.18 (min 0.218, max 21.681). But from the charts, we see that these figures would change greatly if a few anomalous cases were removed. If outliers are discarded, XmlPrime is on average perhaps about 2 times slower than Saxon on transformation time.

We see more variation in the stylesheet compile times, and here more generally XmlPrime performs a little faster. There are two anomalous cases, for which XmlPrime is more than 20 times slower - these are cases that also showed far worse transform times.


Figure 8. XmlPrime file to file transform speeds relative to SaxonEE-9.5-J


Figure 9. XmlPrime tree to tree transform speeds relative to SaxonEE-9.5-J


Figure 10. XmlPrime stylesheet compile speeds relative to SaxonEE-9.5-J


The most appropriate way of using these results is not to compute a crude ranking, but to try to understand where each product is stronger and where it is weaker. However, isolating the features of the individual tests to achieve such an understanding is not an easy task.

6.4. Comparing Saxon 9.5 with Saxon 9.6

As already stated, a key aim in writing this benchmark was to enable us to avoid regression when shipping new Saxon releases. Measuring Saxon 9.6 (currently under development) against the current 9.5 release is therefore particularly relevant. Looking across all tests, this is the data we are currently seeing:

Figure 11. SaxonEE-9.6 file to file transform speeds relative to SaxonEE-9.5


Figure 12. SaxonEE-9.6 tree to tree transform speeds relative to SaxonEE-9.5



Figure 13. SaxonEE-9.6 stylesheet compile speeds relative to SaxonEE-9.5


What we would expect to find here is that for the majority of tests, the performance is much the same between the two releases, but for a minority of tests, the performance may benefit from deliberate enhancements in particular areas, or it may reveal performance bugs that we need to address before shipping the final product.

There is a lot of noise in the results. There's no reason at all why the performance ratio between the two releases should be different for tree-to-tree transforms than for file-to-file transforms. The fact that some of the ratios are significantly different can be taken as a measure of the inaccuracies that arise during measurement of Java performance, caused (we believe) by the failure to suppress unrelated activity on the system under test, for example Java garbage collection, network traffic or virus checking.

Nevertheless, the overall picture is good. Most tests are showing a performance ratio close to one, and a cluster of tests are showing a substantial improvement. This cluster of tests was specifically designed to assess the effectiveness of a redesign in 9.6 of the implementation of maps. Maps are a new feature in XSLT 3.0, providing a data structure akin to what some languages call "dictionaries" or "associative arrays": a set of key-value pairs providing efficient access to the value associated with any key. As befits a functional language, the maps in XSLT 3.0 are immutable, and herein lies the performance challenge. In Saxon 9.5, the implementation uses a layering of hash maps and deltas, with deltas being absorbed and consolidated when they reach a certain size. In Saxon 9.6, this has been replaced with a hash trie, similar to the structure used to implement immutable maps in Scala.

To ensure that the new implementation performs better than the old, we wrote a number of tests specifically focused on creating, using, and modifying maps. (We can note in passing that the existence of these tests immediately means that our aggregate performance results attach disproportionate importance to this area.) Our first results from these tests were very discouraging: they showed the new code running five times slower than the old, and we were almost ready to discard it. However, closer study revealed the reason for the discrepancy: the implementation was caching data relating to the types of the keys and values held in the map, and this cache was not being maintained correctly. Fixing this problem gave us new data which showed map construction and retrieval taking a very similar amount of time to the old release, addition of new entries being slightly faster, and removal of existing entries dramatically faster. Sufficient evidence to justify accepting the new code into the release.

Drilling down even further, we can see variation between the different map tests. For example, test wordmap8 is about 12 times faster on Saxon 9.6, whereas wordmap8a is 1.5 times slower. The two tests are very similar: both construct an index containing all the words in a source document. The difference between the two tests is that wordmap8a, after adding a new entry to the map, counts how many entries are now present in the map using the expression count(map:keys($map)). The implication is that in the new data structure, the one thing that runs slower is enumerating the keys present in the map. We can live with this.

The results also show that compile-time performance has got a little worse across all tests in 9.6. This is something we may address before a final release.
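To give a flavour of what such map tests exercise, here is an illustrative sketch in the spirit of the wordmap tests described above. It is not the actual test code; it uses the map constructor and function names as they ended up in the final XSLT 3.0 and XPath 3.1 specifications, which may differ slightly from the draft functions in use at the time, and namespace declarations are omitted.

<!-- Build an index counting each word in the source document. -->
<xsl:variable name="index" select="
  fold-left(tokenize(string(/), '\s+'), map{},
            function($m, $w) { map:put($m, $w, (map:get($m, $w), 0)[1] + 1) })"/>

<!-- wordmap8a additionally counts the entries using an expression like the
     one below; per the results above, enumerating the keys is the one
     operation that got slower with the 9.6 hash trie. -->
<xsl:value-of select="count(map:keys($index))"/>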
6.5. Comparing Saxon/C with libxslt

The newest addition to the Saxonica product stable is Saxon/C: a version of the product issued as a native DLL (or .so) library, suitable for calling from C or C++ applications, together with an interface offering a PHP extension API. This area has for many years been the preserve of the open source libxslt product, which has an excellent reputation, but which (like most of the open source XSLT 1.0 processors) has not been upgraded to XSLT 2.0. Saxonica is aiming to fulfil the demand for an XSLT 2.0 processor in this important space with a version of Saxon that is cross-compiled to native code using the Excelsior JET Java cross-compiler. Clearly the main attraction of Saxon/C to libxslt users will be the ability to take advantage of XSLT 2.0 features, but they will want assurance that performance is adequate.


Figure 14. Saxon/C tree to tree transform speeds relative to libxslt


Our first results comparing Saxon/C with libxslt are shown in Figure 14. The XT-Speedo driver for libxslt is currently failing many tests when run in file-to-file transform mode, so we present only the tree-to-tree comparison. These show Saxon/C consistently performing better for the tests with larger source documents, and a wide range of results for tests with smaller documents. In the vast majority of cases, however, the speed ratio between the products is between 0.5 and 2.0, so most users are likely to be content. Producing a single metric for the speed ratio is not something we can sensibly do, since it will depend entirely on the selection of tests to run; the only thing we can say with certainty is that Saxon/C consistently performs better for larger source documents.

7. Conclusions

XT-Speedo was written primarily as a resource for use within Saxonica, to enable us to test and compare the performance of our various products. We developed it because the existing performance tests we had been using were woefully inadequate, and because there were too many cases slipping through where, for some particular workload, new Saxon releases showed regression over previous releases despite the release as a whole passing all performance tests.

We have published XT-Speedo as an open source project for a variety of reasons. We want others to be able to reproduce our results and perhaps show us where we have got things wrong or made invalid assumptions. We want others to be able to contribute test data and drivers which we can then benefit from. We also hope that others might be able to take it into areas that we haven't yet tackled, like measuring throughput in a server environment with a concurrent workload.

In this paper we have shown results for a number of performance investigations where we have already found that XT-Speedo gives us new insights into the behaviour of our own products. In some cases the data confirms what we knew and gives us confidence that all is well; in other cases it suggests directions for more detailed investigation, or for remedial action.

We stress again that performance is just one aspect of product quality, one facet that can be used to compare competitive products. It is not the only metric to be used, and is not even our top objective. But we don't want poor performance ever to be a reason for anyone not to use our software. From that perspective, it's not a major concern if Saxon doesn't come at the top of every league table, but the measurements we have taken so far give us confidence that we are in every case within a few percentage points of the leader. More importantly, they tell us where it is possible to do even better.

Streaming Design Patterns or: How I Learned to Stop Worrying and Love the Stream

Abel Braaksma, Abrasoft

Exselt XSLT 3.0 streaming processor

Abstract

XML and streaming, and more specifically, XSLT and streaming, is often avoided by programmers because they think that streaming is hard. They worry that when they have to rewrite their stylesheets to allow streamed processing, the stylesheets become less maintainable and (much) harder to develop, and that following the Rules on Streamability, in the absence of a good tutorial or book on the subject, is excruciatingly hard and arduous when the only reference they can refer to is the latest Working Draft on XSLT, section 19.

This paper continues where a previous paper by me [BRA14] left off [1]. That previous paper explains ten rules of thumb for streaming, which will be briefly iterated over in this paper, see Section 4, "Brief overview of the Ten Rules of Thumb of streaming". This paper expands on that by showing streaming refactoring design patterns that turn typical non-streaming XSLT programming scenarios into streaming ones. They can be found in Section 5, "Streaming Design Patterns", the text being specifically geared towards programmers new to streaming.

Keywords: XML, XSLT, XPath, streaming, XSLT-30, Exselt

1. Disclaimer

This paper discusses the new rules as they are defined in the most recent public Working Draft, which is, as of this writing, currently in Last Call. In some scenarios, it will apply fixes to this working draft that have been publicly reported on the BugZilla system of the W3C and that have (publicly) received a resolution by the Working Group. The latest version of XSLT 3.0 can be found at [XSLT3] and the current version used for this paper is [XSLWD], which is currently in Last Call WD state [2]. When this paper refers to XPath, it uses the most recent version of XPath 3.0, which is currently in Proposed Recommendation state [XPPR]. The latest version of XPath 3.0 can be found at [XP3]. The related XPath Functions and Operators specification used is the Proposed Recommendation [FOPR], for which [FO3] holds the latest version.

Since none of these specifications is currently a W3C Recommendation, it is possible that details mentioned in this paper change in the future, or get removed altogether.

Since the XSLT 3.0 specification reached Last Call status, several issues with it have been reported in [BUGZ] that relate to streaming. Because some resolutions in these public bug reports [3] have a big influence on streamability in situations discussed in this paper, I often refer to these bug reports in the footnotes, so that readers can check the latest status on their resolution. Where possible, I only use bug reports that are closed, which generally means that the resolution mentioned in the bug report is final.

1 To prevent over-self-citing and cross-referencing, I have summarized relevant sections of my previous paper, so that this paper can be read on its own, without the requirement to read the previous paper. However, I do recommend reading it, as it serves as a good introduction to streaming in XSLT.

2 When a W3C Working Draft reaches Last Call it means that it is the last call for people in and outside of the working group to comment on the draft, before it will reach Proposed Recommendation status.
3 The term "bug" is not entirely fair. Both improvement requests and bug reports end up in the same BugZilla repository at W3C.

doi:10.14337/XMLLondon14.Braaksma01

2. To stream or not to stream?

Not all data manipulation scenarios require streaming. Where and when you should switch to a streaming scenario may depend on many factors. This section briefly describes what factors may come into play.

2.1. Size of the input data tree

The most obvious use-case for using a streaming approach is the size of data. As a rule of thumb, if the data that needs to be read and written does not fit in memory, you would choose a streaming approach. If your input is an XML file, and you can estimate the size of the output XML file or files, you should add all these estimates together and multiply them by three or four (depending on your processor and the chosen XDM), and add about 200MB to the total (again, depending on the processor). If this total fits into computer memory, you do not need to use streaming. If it does not fit, you should seriously consider using streaming XSLT.

2.2. Intrinsically streamed input data tree

Sometimes, input is streamed by default, that is, it is not all available at once. In such cases, you must use streaming, because only with streamed processing does the processor not need to read the complete input tree at once.

2.3. Streaming output

It is important to distinguish between input streaming and output streaming. The XSLT specification is very thorough when it comes to input streaming, and all rules that allow streamability are written in such a way that they also allow output streaming, but whether or not a processor actually does output streaming is not directly required by the specification [1]. Similarly to input streaming, if the output data is too large to fit into memory, you should use a streaming approach. While processors are required to flush the output buffer in time, you should test the abilities of your processor in this respect, because there is no standardized way of testing this requirement with processors, and streaming is relatively new. If you run into issues with running out of memory and you suspect that the cause is that output flushing is not done in time, you should contact your vendor and ask for a fix.

2.4. Streaming unparsed text

This paper discusses streaming of XML, but XPath 3.0 introduces a new function, fn:unparsed-text-lines [2], which takes an external resource as input and parses it line by line. The original intent of that function was to allow unparsed data to be streamed; the Working Group at some point decided not to formalize this requirement, but the specification leaves enough room for implementors to allow streamed processing of data read through this function. When your intent is to stream unparsed input, you should check the capabilities of your processor to find out whether it can do streaming using this function.

The analysis of streaming when using unparsed text through this function is not required, because the result of the function is a sequence of strings, which in streaming terms is a grounded and motionless result, because it does not change the read pointer on an input XML tree. This may seem strange, but neither XPath nor XSLT has expressions that allow you to go from one string to another the same way you would go from one node to another (like accessing the parent node, the preceding sibling node, attribute nodes etc). Strings are not nodes, hence once you have a string you can do pretty much everything with it without having to worry about whether your stylesheet is guaranteed streamable or not.
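A minimal sketch of this idea follows. The file name is hypothetical, and whether the resource is actually read incrementally rather than loaded whole depends on your processor.

<xsl:template name="scan-log">
  <!-- each item returned by unparsed-text-lines() is a plain string,
       so no streamability analysis applies to what we do with it -->
  <xsl:for-each select="unparsed-text-lines('big-log.txt')">
    <xsl:if test="contains(., 'ERROR')">
      <error line="{position()}"><xsl:value-of select="."/></error>
    </xsl:if>
  </xsl:for-each>
</xsl:template>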
While processors are required to flush the output buffer in time, you should test the abilities of your processor in this respect, because there is no standardized way of testing this requirement with processors, and streaming is relatively new. If you run into issues with running out of memory and you suspect that the cause is that output flushing is not done in time, you should contact your vendor and ask for a fix.
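As a rough illustration of Section 2.4, the fragment below reads a large text resource line by line; whether this actually runs in constant memory depends on the processor, and the file name, template name and output element names are hypothetical:

<!-- Sketch: line-by-line processing of unparsed text with fn:unparsed-text-lines (XPath 3.0).
     The resource 'big.log' and the ERROR convention are assumptions for illustration only. -->
<xsl:template name="scan-log">
  <errors>
    <xsl:for-each select="unparsed-text-lines('big.log')">
      <!-- strings are grounded and motionless, so any string manipulation is allowed here -->
      <xsl:if test="contains(., 'ERROR')">
        <error line="{position()}">
          <xsl:value-of select="."/>
        </error>
      </xsl:if>
    </xsl:for-each>
  </errors>
</xsl:template>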

1 The specification does, however, require that a processor, when operating in a streaming mode, must process the input data in constant memory, which implicitly requires the output stream to be flushed in time to meet that requirement. 2 See section 14.8.6 of XPath Functions and Operators.


3.1. Guaranteed streamability

Streaming is all about guaranteed streamability1. Knowing that your stylesheet rules, when they apply to streaming, are guaranteed streamable is important, because it means they will be processed in a streaming way on every streaming processor, that is, on any processor that supports the streaming feature. It is possible that individual vendors have created ways to allow a broader group of constructs or expressions to be streamable, but that is out of the scope of this paper. Guaranteed streamability is well defined in the XSLT 3.0 specification, but the rules are complex.

This paper shows the application of a simplified set of rules to existing scenarios and how to turn stylesheets that are not guaranteed streamable2 into stylesheets that are.

3.2. How to find out whether your processor supports streamability

If you are uncertain whether your processor supports streaming, or whether the license for your processor entitles you to use it, you can use the new streaming system property xsl:supports-streaming3, which returns the value yes or no depending on whether the processor supports it.

You can use this property with xsl:use-when or with XSLT branch instructions like xsl:choose to force streamed processing of your input. A processor that does not support streaming is still required to process your input the traditional way, but if you explicitly want your stylesheet to fail when it is used with a non-streaming processor, you can use for instance the instruction xsl:message with terminate="yes" to break processing in such cases.

3.3. How to initiate streaming

There are essentially three ways to initiate streaming, or five, if you count using fn:unparsed-text-lines and vendor-specific methods as well, but these are outside the scope of this paper. A brief overview of the three ways to initiate streaming:

1. Using xsl:mode: with xsl:mode4, the attribute streamable="yes" means that the mode, and all templates within that mode, will operate in a streaming way. That means that all templates in that mode must be guaranteed streamable. When you set streamable="yes" on the default unnamed mode, all templates in the default mode must be guaranteed streamable. To initiate streamed processing with a streamable mode, instruct your processor, for instance by using command line parameters, to use that mode as an initial mode, in which case the processor must process the initial input document by using streaming.

2. Using xsl:stream: the most obvious way to initiate streaming is perhaps the new XSLT 3.0 instruction xsl:stream5. This instruction takes an href attribute value template and processes the document it finds using streamed processing. While using the initial mode option allows a creative developer to use something else than a document node as the initial streamable item, the xsl:stream instruction can only return the document node in a streaming way, similar to the functions fn:doc and fn:document.

Recently, as a resolution to public XSLT 3.0 Bug 25173, the working group has decided to add a function, fn:streaming-document-available, or fn:streaming-available6. This function will take an href and try to resolve it by trying to read the prolog section of the XML document. How the precise semantics will work out is yet unclear, but it is important to realize that a streaming document is not stable, because it is not kept in memory as a whole. As a result, this function will, at the very most, give a hint as to whether a document is available or not. On a subsequent read, it may not be there any longer. This is in contrast to fn:doc-available, which is guaranteed to return true if it can read the whole document7.

1 See section 19.10 in XSL Transformations 3.0.
2 Technically, it is not a stylesheet that is guaranteed streamable or not, it is a construct that is. It depends whether that construct needs to be applied to a streaming node, which is dependent on whether the construct is used inside a template that is in a streaming mode, whether it is used in streamed merging, or whether it is used within, or called by, the instruction xsl:stream.
3 See section 20.3.4 in XSL Transformations 3.0.
4 See section 6.6.1 in XSL Transformations 3.0, and section 6.6.3 on Streamable Templates.
5 See section 18.1 in XSL Transformations 3.0.
6 At the time of this writing, there was no consensus yet on the chosen name. Follow the linked bug report or review the next public working draft to find out what the definitive name of this function will be.
7 This is only partially true. At user option, a processor may offer a non-stable version of fn:doc, which allows the processor to optimize memory usage by not having to keep the full XDM tree of the XML document in memory. As a by-product, in such scenarios it is not guaranteed that the same document node is returned upon a subsequent invocation of that function with the same argument.


3. Using xsl:merge: a less common way to initiate streaming is by using the xsl:merge1 instruction. This instruction takes several inputs that are in sorted order and merges them into a single output. For instance, if you have a document with current users and one with new users, and you want to merge them, you can use this instruction to do so. The merge instruction takes its sources through xsl:merge-source children, each of which can take a streamable document if the attribute streamable is set to yes. In-depth discussion of this instruction is out of scope for this paper2, but see Section 5.10, "Sorting" for an example and its applicability with sorting.

4. Brief overview of the Ten Rules of Thumb of streaming

The following ten sections serve as a quick refreshment on the basics of streaming. As with the previous section, these Ten Rules of Thumb have been more thoroughly discussed, and with ample examples, in my previous paper [BRA14]. If you do not have prior experience with streaming in XSLT, I recommend you read the corresponding sections in that previous paper.

4.1. Rule 1: each template rule can have a maximum of one downward expression

Each template rule that is in a streamable mode can contain a maximum of one downward expression3 in all of its immediate children. This is the most important rule to remember, but do see Section 4.2, "Rule 2: each individual construct can have a maximum of one downward expression", where we widen the concept a little.

4.1.1. What are downward expressions?

A downward expression is an expression for which the processor is required to read further into the currently open stream. This is in contrast to a motionless expression, which does not require the processor to move its reading head4. Such downward expressions are, for instance, a child-select expression, such as child::section, or a descendant-or-self expression, such as head//section. Not all downward selections are also guaranteed streamable. For instance, the expression following-sibling::customer is not guaranteed streamable when it operates on a streaming node, but it is a downward expression. The term used for such expressions is free-ranging expressions, which is a group of expressions that also includes non-downward expressions, such as ancestor::head/section, which is an expression that first moves up, and then down again. How to determine which expressions are allowed in which constructs is explained in the following sections.
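A small sketch tying these pieces together: a streamable mode declaration, which is one of the ways to initiate streaming described in Section 3.3, and a template rule that obeys Rule 1 by containing a single downward expression. The element names are hypothetical:

<!-- Sketch only: a streamable (default, unnamed) mode; every template in it must be guaranteed streamable -->
<xsl:mode streamable="yes" on-no-match="shallow-skip"/>

<xsl:template match="chapter">
  <chapter-summary>
    <!-- exactly one downward expression among the template's immediate children -->
    <xsl:value-of select="title"/>
    <!-- a second one, e.g. <xsl:value-of select="count(section)"/>, would break Rule 1 -->
  </chapter-summary>
</xsl:template>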

1 See section 15.2 in XSL Transformations 3.0.
2 The current Public Working Draft [XSLT3] has some issues in regard to streamability of xsl:merge, making it impossible to do streaming, as I reported in [BRA14]. The most relevant bugs are XSLT 3.0 Bug 24343 and 25335. Both have meanwhile been resolved.
3 The specification talks about at most one consuming expression. In the body of the text of this paper, I typically prefer the term downward expression, though sometimes it seems more applicable to use consuming. Either term covers the same base.
4 When I use the term reading head, it refers to a fictitious current position of the read pointer of the processor in the opened streamed document. Whether or not a processor actually has such a read pointer is an implementation detail. An optimizing processor might, for instance, do some look-ahead and caching to improve processing speed. Moving the reading head forward is literally the same as reading the next bytes, and thus nodes, from the input stream.


4.2. Rule 2: each individual construct can have a maximum of one downward expression

Let us extend the previous rule a little bit by saying that each construct has a maximum of one downward expression. In streaming, a construct is an instruction, an expression, a sub-expression (in a | b there are two subexpressions, a and b) and declarations, if they influence streamability. Typically, many constructs have a sequence constructor (such as xsl:for-each, xsl:sequence, xsl:with-param). This sequence constructor is at the heart of XSLT and allows us to nest instructions inside other instructions. A sequence constructor is itself a construct as seen in streaming analysis.

It is important to distinguish between focus-changing constructs, such as xsl:for-each, and non-focus-changing constructs. When a construct changes focus, the expression changing the focus (usually the select attribute) can be a downward expression and the sequence constructor inside the container can have a downward expression as well.

Another focus-changing construct is the path operator. In an expression such as a/b/c, each of a, b and c is considered a construct1, because each /-operator changes the focus. That is why it is allowed to write essentially three downward expressions in a/b/c.

Conversely, the container of xsl:if does not change focus. As a result, the whole xsl:if instruction is considered one construct, and either the test attribute or the sequence constructor can contain a downward expression, but not both.

Filter expressions, as in head/section[para], do not change focus either, and since you already have a downward select in the other part of the expression, the para child-select expression is considered the second downward expression. The whole expression head/section[para] is therefore not streamable. See Section 4.9, "Rule 9: Use motionless filters" for more on filters.

4.3. Rule 3: Use motionless expressions where possible

A motionless expression, or construct, is one that doesn't require the processor to move from the current point in the input XML stream at all. Such expressions are expressions that do not operate on nodes, or expressions that request information from a node, such as its name, and can give that information without moving the reading head. Some examples of motionless expressions are:

• fn:name(), fn:local-name and self::foo. Also functions such as fn:has-children2, fn:exists and fn:in-scope-prefixes.
• Climbing the parent, ancestor or ancestor-or-self axes (but not downward again after you climb). The reason this is allowed is that during streaming, the processor keeps a cache of all the ancestors of the current node.
• The attribute axis. Similar to the previous rule, attributes are considered part of the information of the current node. Combining the previous rule and this rule is possible; for instance, author/ancestor::*/@company-name is a guaranteed streamable expression.
• Atomic functions or functions that operate on atomics, as long as the current node is not an argument. The expression fn:string(math:pow(3.14)) is motionless, but the expression fn:string(math:pow(input)) is not: the second has a child expression as argument and is, by itself, a downward expression and not motionless.
• Literal result elements or node creation instructions (unless, of course, they contain an AVT3, a TVT4 or a sequence constructor that has a downward expression).
• Variable references are motionless5, unless they occur inside xsl:function and are bound to a streaming argument6. Note that it is possible to create copies of nodes and bind those to variables, see Section 4.6, "Rule 6: Break out of streaming abundantly".
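A brief sketch of the focus-changing distinction from Rule 2; the element names (order, item, description and so on) are hypothetical:

<!-- Allowed: xsl:for-each changes focus, so its select attribute and its body may each contain one downward expression -->
<xsl:template match="order">
  <lines>
    <xsl:for-each select="item">                 <!-- downward expression no. 1 (changes focus) -->
      <line price="{@price}">                    <!-- attribute axis: motionless -->
        <xsl:value-of select="description"/>     <!-- downward expression relative to the new focus -->
      </line>
    </xsl:for-each>
  </lines>
</xsl:template>

<!-- Not allowed: xsl:if does not change focus, so test and body may not both consume:
     <xsl:if test="exists(item)">
       <xsl:value-of select="summary"/>
     </xsl:if>
-->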

1 The specification differentiates between constructs and operands. In fact, both xsl:for-each and the path expression have several operands, each of which follows certain rules. The instruction and the path expression as a whole are considered a construct in the specification. This paper does not make that distinction, which in many cases makes analysis easier. In those cases where it is not applicable, I will explicitly mention it.
2 Intuitively, the function fn:has-children requires look-ahead. But since it only requires a very small look-ahead, it is considered streamable. Implementations just need to make sure they don't actually move the reading head.
3 The term AVT means attribute value template, a way of writing an expression as a value of an attribute using {expr} syntax.
4 A TVT, or text value template, is a new feature of XSLT 3.0 which allows you to intermix expressions with text nodes, see also section 5.7.2 in XSL Transformations 3.0.
5 It is not allowed in streaming to bind a variable to a streaming node.
6 This is a notable difference between my previous paper [BRA14] and this one. It is no longer possible to bind grouping or merging variables using bind-group; instead, using fn:current-group and friends is legal, with certain restrictions, in streaming, see XSLT 3.0 Bug 24510. Conversely, it is now possible to write stylesheet functions that take at most one streaming argument (an argument that can be a reference to a node), see XSLT 3.0 Bug 25679.


4.4. Rule 4: You can move up the tree, but never down again

The ancestor axis, together with the attributes and properties of the nodes, is the only axis that is kept in memory while performing streaming. As a result, you can consider the ancestor axis as a stack that is always available. However, while you can create a motionless expression like parent::x/@name, you are not allowed to move away again, because moving down would mean that you move the input read pointer in an arbitrary direction1. An example of such a free-ranging expression is parent::x/address, because that expression moves up, and then selects the child (moving down).

4.5. Rule 5: You cannot store a reference to a node

This rule was already touched upon in Section 4.3, "Rule 3: Use motionless expressions where possible". It is very easy to master: you cannot store a reference to a streaming node, ever2. A reference to a node can be created within a let expression, an xsl:variable or an xsl:param instruction, or implicitly, as an argument to a stylesheet function3. An example that is not guaranteed streamable, together with a guaranteed streamable counterpart that stores a copy instead4, is sketched at the end of this section.

4.6. Rule 6: Break out of streaming abundantly

In Section 4.5, "Rule 5: You cannot store a reference to a node" you have seen an example that created a copy of a node and stored that inside a variable. This comes in handy, but such a copy does not maintain information from its ancestors, and creating a copy that does maintain that information is not obvious5. For cases where an inline expression should break out of streaming and create a copy of a node, maintaining all the information from its ancestors, XSLT 3.0 introduced the new function fn:snapshot. If the ancestors are not a requirement, you can use the lighter-weight function fn:copy-of.

Use these functions abundantly, possibly in conjunction with xsl:copy-of. They create an in-memory copy of whatever node you feed them, and on that copy, you can use any free-ranging expression that you like.

One caveat: usually, you will choose a streaming scenario for a reason, for instance because the input is too large to fit in memory. Do not use these functions on nodes that are potentially too large to fit in memory. Use them on nodes that you know are small enough to be processed one at a time in memory6.

4.7. Rule 7: Understand streamable patterns

In essence: patterns must be motionless. Imagine any current node and its available information, like its ancestors, its attributes and its properties. If you consider a pattern as a function that returns true or false, and if that function can be applied without moving away from that current node, then it is a motionless pattern.
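A minimal sketch of the two variable bindings referred to in Rule 5. The element name author is hypothetical; the second form relies on the sequence constructor and xsl:copy-of producing a grounded copy rather than a reference:

<!-- Not guaranteed streamable: the select attribute binds a reference to a streaming node -->
<xsl:variable name="current-author" select="author"/>

<!-- Guaranteed streamable: the sequence constructor creates a new (grounded) document node,
     and xsl:copy-of copies the streaming node into it, consuming it in one go -->
<xsl:variable name="current-author">
  <xsl:copy-of select="author"/>
</xsl:variable>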

1 This is called free-ranging in the specification.
2 A processor is allowed to extend the streamability rules. However, the term guaranteed streamability, which this paper discusses, is guaranteed across different streaming processors. Vendor extensions are out of scope for this paper.
3 Since the resolution of XSLT 3.0 Bug 25679 on streamable stylesheet functions, it is now possible for a stylesheet function parameter to be bound to a node. As a consequence, it is possible for a variable reference, if bound to such a parameter, to be bound to a node. The resolution of this bug is not public, but the essence is important for writing library packages catered for streaming, which would otherwise not be possible.
4 The example here uses the fact that the sequence constructor, by default, creates a new document node. In addition, the xsl:copy-of instruction creates a copy of the node too, disconnecting it from the streaming node and consuming the whole node at once.
5 For an example of how complex this is to do by hand, see the 75-line formal function that defines fn:snapshot in section 18.4 of XSL Transformations 3.0.
6 A processor is not required to release memory after one of these functions is used, but it is likely that implementations will release the memory of copies that go out of scope; otherwise, these functions would have little merit in long-running streaming scenarios.


A pattern behaves differently from an expression. Take a select expression such as book/chapter. As an expression, this is considered a downward expression, because it selects the nodes below the current node, so it must move the reading head. However, patterns are evaluated on every node when they are encountered. When the processor encounters a node with the name section, it knows instantly, without looking ahead or backwards, that the pattern does not match the right-hand part chapter of the expression: the name does not match. If it finds a node with the name chapter, it can check without moving the reading head whether its parent matches book. Conclusion: book/chapter is a motionless pattern1 and therefore streamable.

All patterns that do not start with a variable reference or a function call, and that do not have a predicate, are streamable. Examples are html//div, a//b//c//d, child::num, author/@name and author/@name2.

If your pattern has a predicate, that predicate must be motionless, similar to Section 4.9, "Rule 9: Use motionless filters", but it must not be a numeric predicate. In nested predicates, however, you can use a numeric value or fn:position, but they still must be motionless. Valid examples are author[@name], author[ancestor::guild] and author[ancestor::*[3][self::publisher]], the latter having a numeric predicate, which is allowed because the predicate is nested.

4.8. Rule 8: Use atomic types to ground the result of templates

Any template rule or named template must be grounded3, which means it is not allowed to return streaming nodes. This rule is very similar to Section 4.5, "Rule 5: You cannot store a reference to a node". Since using xsl:sequence inside bodies of xsl:template is very common, and since xsl:sequence returns a reference to a node, you should be careful about using them.

But there is an easier way out. Instead of trying to keep track of whether or not your template returns nodes, it is much simpler to correctly set an atomizing type4 for your template. When a type is set for your template, the processor must atomize the result anyway, so it knows statically that the template will never return any nodes. This saves you from worrying about whether or not you are returning nodes. The principle of taking the result type into account also applies to other constructs, such as the parameter and return types of stylesheet functions and templates, or the declarations of variables.

If you want to return more complex content than an atomic type, use xsl:copy and similar constructs that create copies of nodes, as opposed to xsl:sequence, which returns references to nodes.

4.9. Rule 9: Use motionless filters

In earlier chapters we have already seen what a motionless expression is. In filter expressions, that is, in a predicate, you should always use only motionless expressions, or the result will not be streamable. Some examples:

price/text()[contains(., 'father')]
child::node()/copy-of(.)[contains(., 'father')]
helptext[@version = '3.14']
.//comment()[contains(., 'father')]
p[parent::div][ancestor::p][@xml:lang]

All the above expressions use motionless filters, but they are motionless for different reasons.

The first is motionless because text() nodes are considered leaf nodes (they cannot possibly have children). A filter expression on a guaranteed childless node may consume that whole node, here done by the context-item expression ".".

The second is motionless because we first created a copy of the whole node using copy-of; after that, anything we do on that node can be any regular XPath expression, see also Section 4.6, "Rule 6: Break out of streaming abundantly".

The third tests the value of an attribute; attributes are always available as part of the stack the processor keeps for each node: an attribute is a property of a node and hence it is motionless.
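A small sketch of Rule 8's type-determined grounding; the match pattern and the result type are assumptions (and the xs namespace is assumed to be declared):

<!-- The as attribute forces atomization, so the processor knows statically that this
     template can never return references to streaming nodes -->
<xsl:template match="price" as="xs:decimal">
  <!-- '.' consumes the price element (the single downward expression); the result is atomized -->
  <xsl:sequence select="xs:decimal(.)"/>
</xsl:template>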

1 See the section on Classifying Constructs in XSL Transformations 3.0.
2 Recall that a pattern is not the same as an expression. For instance, you cannot write a pattern using the following-sibling or parent axis. This was true in XSLT 1.0 and 2.0 and remains true in XSLT 3.0.
3 Grounded means that it cannot return streaming nodes.
4 An atomizing type is any type that is not a node type, or cannot be a node type. Examples are xs:string, xs:integer and xs:double*, but not element(), attribute() or item().


The fourth is a filter on a comment. Similarly to text nodes, comments can never have children, so the expression is allowed to consume the whole node and still be considered motionless1.

The last example combines three motionless filters and searches for illegally nested p elements in an HTML document that also have the xml:lang attribute set. The parent axis, the ancestor axis and the attribute axis are all part of the available properties that are maintained by the processor as a stack during processing, therefore this expression is motionless too.

4.10. Rule 10: Master xsl:fork

There is an alternative for scenarios where you would otherwise require multiple downward selects in one construct. Use xsl:fork2, which explicitly tells the processor that from this point on, it should start multiple threads for processing the input stream in parallel. In other words, the stream reader will get multiple read pointers that all have their own starting point, the current node. This process, called forking, allows multiple downward selections in a single instruction, but is very strictly defined.

The rules for using xsl:fork are briefly as follows:
• It can only contain xsl:sequence children.
• Each xsl:sequence can have at most one downward select expression, as explained in Section 4.2, "Rule 2: each individual construct can have a maximum of one downward expression".
• Each xsl:sequence must be grounded, as in Section 4.8, "Rule 8: Use atomic types to ground the result of templates": it cannot return nodes, but this time you cannot use the as-attribute to the rescue, so type-determined usage, as also discussed in the same Rule Eight, cannot be applied3.

Once you have these rules in place, you are allowed multiple selections in a single construct: you can fork the processing. In fact, that is exactly what you tell the processor: from here on in, I want multiple reading heads on the same stream. An example of splitting logfile entries between warnings and errors can be found in [BRA14]4; Section 5.4, "Instructions with multiple downward selects out of document order" shows the same technique applied as a design pattern.

5. Streaming Design Patterns

In the following sections, I will show programming design patterns that are common in traditional, non-streaming scenarios, but that are themselves roaming and free-ranging expressions or constructs. As a result, they cannot be used in a streaming scenario by copying them one-on-one.

The streaming design patterns apply both to converting existing stylesheets to make them streamable, and to new stylesheets that have to be streamable from the beginning.

Each pattern follows the same format:
• Intent: a brief summary of the pattern's intent.
• Level: an indication of the level of experience required to understand and implement the pattern5. Under the Level section, I will also mention the level of streamability the solution has after applying the streaming pattern, in order, from full to none:

1 In fact, this is precisely how the specification defines it: consuming a childless node is considered a motionless operation. This may seem like a contradiction in terms, but it helps a lot with keeping the analysis terminology to a minimum. The official rule is, however, not very easy to read and understand: If U is absorption and T is a childless node kind (text(), attribute(), comment(), processing-instruction(), or namespace-node()), then U' is inspection, see Section 19.8, General Streamability Rules in XSL Transformations 3.0.
2 See also section 16.1 in XSL Transformations 3.0.
3 In practice this means that you must wrap the sequence constructor of each xsl:sequence in xsl:copy or xsl:value-of or similar constructs that return copies, not references to nodes.
4 The example was taken directly from [BRA14], page 21, where you can find more information on using xsl:fork.
5 The level indicated is solely based on my experience within the XSLT 3.0 Working Group, where certain subjects appeared to be much harder for the group members to understand, while others were relatively easy to grasp.


• Full: the streaming pattern is fully streamable, without buffering or copying nodes.
• Implicitly windowed: technically the same as full, but applies to constructs such as xsl:try1 and xsl:merge where the current merge group is implicitly copied, and thus grounded2.
• Modified windowed: same as windowed below, but first a limited, modified copy of the nodes is created to prevent taking up too much memory.
• Windowed: the streaming pattern uses windowed streaming, meaning that a copy of a node or nodes is created in memory to do further processing on, without the restrictions of streaming.
• Multiple pass: the streaming pattern uses multiple passes on the input document; this ensures limited use of memory, but may (dramatically) increase processing time.
• None: no streaming solution exists.
• Motivation: explains why this pattern does not work using the traditional, non-streamable approach.
• Applicability: a summary of typical scenarios this pattern applies to.
• Consequences: an enumeration of possible drawbacks of the pattern.
• Implementation: how to apply the pattern on existing code.
• Example: applies the design pattern to the example from the Motivation section.

5.1. Filter expression depends on children

5.1.1. Intent

You have a select expression with a predicate that depends on data in the children and want to make it streamable.

5.1.2. Level

Streamability level: modified windowed.

5.1.3. Motivation

Suppose your data structure looks something like the following, where the elements largedata contain potentially large sections of data that you do not want copied, for instance base64 formatted images or other data related to the person; a sketch of such a record is shown at the end of this section.

If your requirement is to copy certain fields from person, based on whether they live in a certain state, you may have originally used a coding pattern like the first select expression in that sketch. In a streaming scenario, this will not be guaranteed streamable, because inside the select attribute the expression uses two downward selects in two filter expressions, and we know from Section 4.9, "Rule 9: Use motionless filters" that that is not allowed. The select expression would not even have worked if you had only one downward expression inside the predicate.

A quick and easy way to fix this is the second select expression in the sketch, applying Section 4.6, "Rule 6: Break out of streaming abundantly". This works, but what if the data of one person is too large (for instance, the record contains binhex image or video data that you would like to strip), or the memory constraints are too strenuous (as on mobile or embedded devices), such that in your case fn:copy-of selects too much data and blows up the processing?
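The following sketch illustrates the situation just described. The element names person, state and largedata and the value MA come from the text; name as the field to copy, and the exact expressions, are assumptions for illustration:

<!-- A hypothetical input record; largedata stands for a potentially huge payload -->
<person>
  <name>...</name>
  <state>MA</state>
  <largedata><!-- base64 image or video data --></largedata>
</person>

<!-- Original, not guaranteed streamable: two downward selects inside two filter expressions -->
<xsl:copy-of select="person[state = 'MA'][name]/name"/>

<!-- Quick fix (Rule 6): ground the person first with fn:copy-of, then filter the copy freely -->
<xsl:copy-of select="person/copy-of(.)[state = 'MA'][name]/name"/>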

1 The instruction xsl:try deserves special attention. It does not create copies of streamed input, but it buffers (creates a copy) of streamed output, because in the event of a failure, the processor is required to rollback to the point before the xsl:try instruction. 2 I would like to liberally quote Michael Kay here as I heard him explaining this concept: “in the vast majority of cases, the implicit copy will be small, only a few pathological cases may require a large copy, in which case it is up to the stylesheet author to come up with a better solution”. This change is not currently in the public spec, however see XSLT 3.0 Bug 25335.


In such cases, use this pattern to use a guaranteed streamable approach to create a partial copy of the offending element instead.

5.1.4. Applicability

Applies to

This pattern applies to almost any place where you can use an expression:
1. Select expressions as used in the likes of xsl:copy-of, xsl:copy1, xsl:apply-templates, xsl:for-each, xsl:for-each-group and xsl:message;
2. Select expressions in xsl:with-param and xsl:variable, which works because the result of the pattern below is guaranteed grounded;
3. Select expressions that are atomized or require a boolean, like in xsl:analyze-string, xsl:if, xsl:when, xsl:comment and xsl:processing-instruction;
4. Expressions used with atomizing arguments of any internal or user function, such as where the argument is xs:string or xs:integer*.

Does not apply to

This pattern does not apply to:
1. Matching patterns, see Section 5.2, "Patterns with non-motionless predicates".

5.1.5. Consequences

Applying this pattern has the following consequences:
1. Extra code for creating a partial copy of the node may impede maintainability.
2. With more complex expressions, it can get cumbersome to split them up, in which case you should look at whether you can use windowed streaming, using fn:copy-of or fn:snapshot. But that may not be possible if the selected nodes are too large to begin with.

5.1.6. Implementation

The essence of this pattern is to create a limited copy of the node before you apply the predicate, and then apply the predicate on this copy, which is allowed in streaming. The copy will only contain the leaf elements we are interested in, not the large data sections such as largedata from our example.

Refactor your original code by applying the following steps in order:
1. Introduce a streamable mode and set the attribute on-no-match="shallow-copy"2.
2. Remove the offending filter expression.
3. Create a new matching delete-template that matches the large nodes we want to skip.
4. Add a template that matches the offending element.
5. In it, add a variable to create a partial copy of the offending element.
6. Process this variable with your original predicate and templates. This works, because variable references are grounded by default and you can use free-ranging expressions on them3.

1 Since XSLT 3.0, it is possible to have a select attribute on xsl:copy, see section 11.9.1 on shallow copying in XSL Transformations 3.0.
2 The on-no-match mode shallow-copy will copy any non-matching nodes, which allows us to remove only the nodes we are not interested in, without changing the whole stylesheet. This attribute, as well as the xsl:mode declaration, are new features of XSLT 3.0.
3 Switching modes from a streaming mode to a non-streaming mode is allowed, as long as the select expression does not return nodes from the streamed document, which is why this pattern creates a copy of a subset of the nodes. Alternatively, you can stay inside the new streaming mode, but this restricts the bodies of the templates.
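A sketch of what these six steps might produce for the person example; the element names are assumptions, and a real solution may differ in the details described in the Example below:

<!-- Step 1: a streamable mode that copies anything not matched explicitly -->
<xsl:mode streamable="yes" on-no-match="shallow-copy"/>

<!-- Step 3: delete-template for the large parts we do not want in the partial copy -->
<xsl:template match="largedata"/>

<!-- A shallow-copy identity variant: attributes (climbing) and children (striding)
     are selected in two separate instructions rather than in one union -->
<xsl:template match="*">
  <xsl:copy>
    <xsl:apply-templates select="@*" mode="#current"/>
    <xsl:apply-templates select="node()" mode="#current"/>
  </xsl:copy>
</xsl:template>

<!-- Steps 4-6: match the offending element, build a partial copy, apply the original predicate to the copy -->
<xsl:template match="person">
  <xsl:variable name="partial">
    <!-- the single downward expression of this template -->
    <xsl:apply-templates mode="#current"/>
  </xsl:variable>
  <xsl:if test="$partial/state = 'MA'">
    <xsl:copy-of select="$partial/name"/>
  </xsl:if>
</xsl:template>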


Example

If we apply these six steps to our original example, we end up with code along the lines of the sketch above. In that code, you can see that we specify what parts of the streaming input data we want to delete1, so that the copy of the nodes is as small as possible, i.e. without any of the large nodes. The result, a handful of nodes with string data as content, is copied into a variable. In turn, the contents of this variable, which are now grounded, are processed with the predicate we used previously.

Note the special variant of the identity template. Instead of placing node() | @* in one select statement, it is split into two. The reason behind this is that a climbing2 expression may not be combined with a striding3 expression in a union in a single expression. However, a climbing expression that only selects leaf nodes4 is allowed on its own in a select statement, as is a striding expression.

The benefit of using this pattern is that you can leave much of your original code intact. We merely removed the offending filter expression and continued processing on a small copy of the nodes. After we have applied our changes, the total amount of data copied into memory is much smaller than in the first solution, where we copied the entire element as a convenient quick solution. There are methods you can use that may use even less memory, for instance with accumulators and maps, but in this particular scenario they may only gain a couple of bytes for you during processing.

5.2. Patterns with non-motionless predicates

5.2.2. Level

Streamability level: windowed.

1 There are many ways of writing an appropriate delete-template. I chose ancestor-or-self, but you may require a more fine-grained approach in your situation. Remember that any nodes not specifically specified in a matching pattern are copied as a result of on-no-match="shallow-copy".
2 A climbing expression is an expression that climbs up the tree, as we have seen in Section 4.4, "Rule 4: You can move up the tree, but never down again". Attributes are considered climbing.
3 A striding expression is an expression on the child axis or a combination of child axes, where overlapping nodes will not occur.
4 As seen before, a leaf node is a node without children. Attributes and text nodes are leaf nodes, but a climbing expression like ancestor::title is not, because title can have children.


5.2.3. Motivation

Suppose you want to select all para elements that contain an emphasis element, and you want to remove all other para elements. In a non-streaming scenario, your stylesheet could look something like the first sketch at the end of this section.

This pattern is not streamable, because it requires the processor to look down inside all possible children of the para element, and after that, it should go back and copy everything from beginning to end. In fact, as explained in Section 4.7, "Rule 7: Understand streamable patterns", a predicate in a pattern must be motionless. Because you are not allowed to use windowed template matching using fn:snapshot or fn:copy1, this pattern's solution takes an old-school imperative approach to solve the non-motionless predicate issue.

5.2.4. Applicability

Applies to

This pattern applies to the following scenarios:
1. Match patterns with predicates based on properties of descendant nodes.

Does not apply to

This pattern does not apply to the following scenarios:
1. Match patterns with predicates based on the following-sibling or preceding-sibling axes;
2. Match patterns with predicates based on the following or preceding axes.

5.2.5. Consequences

Applying this pattern has the following consequences:
1. The decision tree must imperatively be written by hand, instead of letting the processor define the logic for you, which is the normal operation mode with template-based matching.
2. Tricky to get right when multiple template rules exist that process the same element. In such cases, it is better to use windowed streaming, or, if that is not an option, forking.

5.2.6. Implementation

Refactor your original code by applying the following steps in order:
1. Introduce a streamable mode.
2. Remove the delete-template and the predicate in the match pattern.
3. Create a copy of the node.
4. Rewrite the decision logic of the predicate inside the template, or apply templates on the copy.

Example

If we apply these refactoring rules to our original example, we end up with code along the lines of the second sketch at the end of this section. After applying this pattern, we have inlined the decision logic, resulting in a template body with one downward select to create a copy of the current node2, which is streamable, because any subsequent processing on this copy will be grounded, and therefore allowed during streaming.

This streaming pattern applies to any non-motionless predicate that, after rewriting it as (something similar to) the above, ends up with a maximum of one downward select per template. If you end up with multiple downward selects after rewriting, then also apply the following pattern, Instructions with multiple downward selects.
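Sketches of the two versions referred to above; the names para and emphasis come from the text, everything else (the mode declaration, the exact tests) is assumed:

<!-- Original, non-streaming: a predicate that looks down into the children, plus a delete-template -->
<xsl:template match="para[.//emphasis]">
  <xsl:copy-of select="."/>
</xsl:template>
<xsl:template match="para"/>

<!-- Streamable rewrite: one downward select creates a grounded copy, the decision logic is inlined -->
<xsl:mode streamable="yes" on-no-match="shallow-copy"/>
<xsl:template match="para">
  <xsl:variable name="copy" select="copy-of(.)"/>
  <xsl:if test="$copy//emphasis">
    <xsl:copy-of select="$copy"/>
  </xsl:if>
</xsl:template>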

1 Windowed template matching does not exist (yet), but it borrows the term from windowed streaming, which creates a copy of a small subset of nodes. Windowed template matching could look like para[copy-of(.)/descendant::emphasis], but this is not legal in streaming, and outside of streaming the fn:copy-of has no effect. 2 If the copy will be too large, use the first pattern to preprocess it to do a modified windowed streaming approach.


5.3. Instructions with multiple downward selects

5.3.1. Intent

You have an instruction with multiple downward selects in document order and want to make it streamable.

5.3.2. Level

Easy: follows a common refactoring pattern from the XSLT 2.0 community. Streamability level: full.

5.3.3. Motivation

Consider the first sketch at the end of this section. This template is not streamable, because it violates Section 4.1, "Rule 1: each template rule can have a maximum of one downward expression": it has multiple downward selects, here with the select expressions street, number and state | country.

This pattern is perhaps the most common pattern with streaming: take any given instruction with too many downward selects and split it up into smaller pieces to make it streamable. Most people will be familiar with this pattern, because it does not introduce any new concepts; in fact, it is a common refactoring method in existing stylesheets.

5.3.4. Applicability

Applies to

This pattern applies to the following scenarios:
1. The body of any instruction that has multiple downward selects in document order;
2. The body of a streamable stylesheet function having multiple downward selects in document order;
3. The body of a template declaration having multiple downward selects in document order.

Does not apply to

This pattern does not apply to:
1. Function calls that contain multiple downward selects;
2. Single expressions with multiple downward selects;
3. Instructions with multiple downward selects that are not in document order, see Section 5.4, "Instructions with multiple downward selects out of document order".

5.3.5. Consequences

Applying this pattern has the following consequences:
1. After rewriting the body, its coherency may be harder to understand and thus harder to maintain.

5.3.6. Implementation

Refactor your original code by applying the following steps in order:
1. Introduce a streamable mode with on-no-match="deep-skip"1;
2. Replace the body of your instruction with a single xsl:apply-templates;
3. Add matching templates for the individual elements.

Example

Applying these three refactoring rules to our original example, we end up with code along the lines of the second sketch at the end of this section.
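Hypothetical sketches for this pattern; the select expressions street, number and state | country come from the text, while the surrounding element names and the output format are assumed:

<!-- Original: three downward selects as immediate children of one template (violates Rule 1) -->
<xsl:template match="address">
  <label>
    <xsl:value-of select="street"/>
    <xsl:value-of select="number"/>
    <xsl:value-of select="state | country"/>
  </label>
</xsl:template>

<!-- Refactored: one downward select per construct, individual templates per element -->
<xsl:mode streamable="yes" on-no-match="deep-skip"/>
<xsl:template match="address">
  <label>
    <xsl:apply-templates/>
  </label>
</xsl:template>
<xsl:template match="street | number | state | country">
  <xsl:value-of select="."/>
</xsl:template>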

1 deep-skip skips non-matching nodes without processing their children.


5.4. Instructions with multiple downward selects out of document order

5.4.1. Intent

You have an instruction with multiple downward selects that do not follow the input document order and want to make it streamable.

5.4.2. Level

Advanced: introduces a new concept, forking. Streamability level: full.

5.4.3. Motivation

Consider the first sketch at the end of this section. Similar to Section 5.3, "Instructions with multiple downward selects", this template is not streamable, because it violates Section 4.1, "Rule 1: each template rule can have a maximum of one downward expression": it has multiple downward selects, again with the select expressions street, number and state | country. Note, however, an important difference with the previous design pattern: in this case, the order of the selected elements does not match the document order of the input document.

5.4.4. Applicability

Applies to

This pattern applies to the following scenarios:
1. The body of any instruction that has multiple downward selects in any order;
2. The body of a streamable stylesheet function having multiple downward selects in any order;
3. The body of a template declaration having multiple downward selects in any order.

Does not apply to

This pattern does not apply to:
1. Function calls that contain multiple downward selects;
2. Single expressions with multiple downward selects.

5.4.5. Consequences

Applying this pattern has the following consequences:
1. The instruction xsl:fork has no effect on the result; it can be hard for programmers to understand why it is introduced, and the temptation to remove it at a later stage, because "it does nothing special", can be big.
2. Code quickly gets messy, because nesting gets two levels deeper with each fork.

5.4.6. Implementation

Refactor your original code by applying the following steps in order:
1. Introduce a streamable mode;
2. Wrap each downward select in an xsl:fork instruction.

Example

Applying these two refactoring rules to our original example, we end up with code along the lines of the second sketch at the end of this section. As explained in Section 4.10, "Rule 10: Master xsl:fork", forking is a technique that has no semantic meaning; it merely tells the processor to process the streaming input from this moment on using multiple reading heads. Introducing xsl:fork in code allows the programmer to write a sequence of xsl:sequence elements, of which the body, or select attribute, can each contain a single downward select expression.
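Sketches for this pattern, reusing the hypothetical address example from the previous section; here the output order (state or country first) deliberately differs from the document order street, number, state | country:

<!-- Original: multiple downward selects, out of document order (not streamable) -->
<xsl:template match="address">
  <xsl:value-of select="state | country"/>
  <xsl:value-of select="street"/>
  <xsl:value-of select="number"/>
</xsl:template>

<!-- Refactored with xsl:fork: each xsl:sequence holds at most one downward select
     and returns copies (grounded) via xsl:value-of -->
<xsl:mode streamable="yes"/>
<xsl:template match="address">
  <xsl:fork>
    <xsl:sequence>
      <xsl:value-of select="state | country"/>
    </xsl:sequence>
    <xsl:sequence>
      <xsl:value-of select="street"/>
    </xsl:sequence>
    <xsl:sequence>
      <xsl:value-of select="number"/>
    </xsl:sequence>
  </xsl:fork>
</xsl:template>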


This pattern opens up a lot of possibilities. It is a common scenario to change the order in which elements are output. Using xsl:fork makes this easier to achieve using streaming1.

5.5. Expressions with the preceding- or following-sibling axes

5.5.1. Intent

You have an expression that involves iterating over the preceding-sibling or following-sibling nodes2.

5.5.2. Level

Advanced, involves streamed grouping. Streamability level: full.

5.5.3. Motivation

In flat-to-hierarchical scenarios it is still quite common to use the following-sibling axis. Consider a flat input document listing books, with entries such as Don Quixote by Micuel De Cervantes, Pilgrim's Progress by John Bunyan and Robinson Crusoe by Daniel Defoe, which we would like to turn into a nested structure, where title and author are part of the book element. One way of doing that is sketched at the end of this section.

In practice, usage of the following-sibling and preceding-sibling axes has been getting less and less attention since the dawn of XSLT 2.0, which introduced grouping3. One of the reasons we don't see much of the sibling axes is that sibling recursion is quite hard to get right in XSLT 2.0 and up4, and because most coding patterns are just easier to accomplish and understand with xsl:for-each-group with group-adjacent or group-starting-with, as is explained in [MKAY08], page 116.

The output of the above transformation is the same list of titles and authors, now grouped per book element, as shown in the sketch.

1 Other ways of achieving the same result are not so trivial, unless you fall back to creating a copy of the node and processing that copy using non-streaming templates, as in the first pattern.
2 The axes following, preceding, following-sibling and preceding-sibling are part of the group of forbidden streaming paths: they cannot operate on a streaming node, and if they do, the streamability analysis will deem them free-ranging. They are allowed, however, on grounded nodes, i.e. nodes that do not belong to a streamed document.
3 It has not entirely disappeared; a recent thread on StackOverflow discusses this coding pattern, see XPath to select contiguous elements.
4 In XSLT 1.0, if an axis selected multiple nodes, only the first one was actually selected. In XSLT 2.0, by default, all nodes on an axis are selected in path expressions. As a result, sibling recursion in XSLT 1.0, which typically relied on this feature, was not compatible and had to be rewritten for XSLT 2.0.
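A hypothetical reconstruction of the books scenario. The names books, book, title and author follow the text; the exact shape of the flat input (here, an empty book element starting each entry) and both stylesheet fragments are assumptions:

<!-- Flat input (sketch) -->
<books>
  <book/>
  <title>Don Quixote</title>
  <author>Micuel De Cervantes</author>
  <book/>
  <title>Pilgrim's Progress</title>
  <author>John Bunyan</author>
  <book/>
  <title>Robinson Crusoe</title>
  <author>Daniel Defoe</author>
</books>

<!-- Sibling-recursion style (not streamable: free-ranging axes on streamed nodes) -->
<xsl:template match="books">
  <books>
    <xsl:for-each select="book">
      <book>
        <xsl:copy-of select="following-sibling::title[1], following-sibling::author[1]"/>
      </book>
    </xsl:for-each>
  </books>
</xsl:template>

<!-- Grouping-based rewrite (streamable), discussed in the Example below -->
<xsl:mode streamable="yes"/>
<xsl:template match="books">
  <xsl:copy>
    <xsl:for-each-group select="*" group-starting-with="book">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:copy-of select="current-group()[position() gt 1]"/>
      </xsl:copy>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>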


5.5.4. Applicability

Applies to

This pattern applies to the following scenarios:
1. Expressions that use the preceding-sibling or the following-sibling axes.
2. Can sometimes also be applied to patterns that use these axes.

Does not apply to

This pattern does not apply to:
1. Expressions with the following and preceding axes.

5.5.5. Consequences

Applying this pattern has the following consequences:
1. It makes your code more readable and easier to maintain1.
2. Streamed grouping can be hard to grasp and to get right, which in more complex scenarios may require a significant endeavor.

5.5.6. Implementation

Refactor your original code by applying the following steps in order:
1. Introduce a streaming mode.
2. Replace the sibling recursion pattern with xsl:for-each-group2.
3. If needed, adjust the resulting code using previous patterns, to make it streamable.

Example

Applying these three refactoring rules to our original example, we end up with the grouping-based code sketched under the Motivation above.

This streaming pattern follows a typical XSLT 3.0 scenario for grouping. By applying grouping we can get rid of the following-sibling axis steps. In this case, it sufficed to use group-starting-with, which takes a motionless pattern as seen from Section 4.7, "Rule 7: Understand streamable patterns". The select statement is obvious and allowed in streaming (it selects children).

Let us have a look at the individual parts. The nesting applied here follows Section 4.2, "Rule 2: each individual construct can have a maximum of one downward expression": each construct in it has a maximum of one downward expression.
• The xsl:copy instruction is motionless if it does not have a select attribute; however, we still need to look at its sequence constructor, as per Rule Two.
• The xsl:for-each-group instruction is consuming: it contains one downward select. However, as we have learned in Rule Two, it is also a focus-changing construct. This means that the sequence constructor inside it can have yet another downward select expression. Because we already established that the pattern in group-starting-with is motionless, this does not count towards the limit of one; we can have as many motionless expressions as we want.

1 That is a first! Usually, refactoring code to fit a streaming scenario results in less readable code, but this pattern actually makes it more readable and maintainable. Perhaps you should have applied this pattern already, even before you started streaming. 2 This is not always trivial to get right, but many sibling recursion patterns can be rewritten with group-starting-with or with group- adjacent.


• The second xsl:copy instruction is again motionless in itself, but we need to check its sequence constructor.
• The first xsl:copy-of instruction is motionless; remember that visiting the attribute axis is always allowed, see Section 4.3, "Rule 3: Use motionless expressions where possible".
• The second xsl:copy-of instruction is consuming. It consumes the fn:current-group and selects all elements except the first. In most cases, using or referencing the current group will count as a downward expression, so it is subject to the same limit of one downward expression per construct1.
• Note that, similar to the explanation in the first streaming pattern, it is not possible to combine these two xsl:copy-of instructions into one with a union expression2.
• Note that using the fn:position function is allowed and motionless in any expression. It only has special meaning in patterns, where it is not allowed in a top-level predicate.

5.6. Expressions using the preceding axis

5.6.1. Intent

You have an expression that uses the preceding axis and want to make it streamable.

5.6.2. Level

Advanced: uses accumulators3,4. Streamability level: full.

5.6.3. Motivation

Suppose you want to keep a tracking word count per paragraph that shows the number of all preceding words. You could code that with a template that computes the count from preceding::text() (though on large documents this will be highly inefficient). The code is so trivial that I will leave out the flat input XML and output XML. It simply counts all the words up until the current paragraph and adds this count to the para element as an attribute. To make this streamable, we will have to use accumulators5.

5.6.4. Applicability

Applies to

This pattern applies to the following scenarios:
1. Path expressions using the following6 or preceding axes.
2. Path expressions using the sibling axes7.
3. Any free-ranging expression that cannot be made streamable using any of the other streaming patterns.
4. Any place where you need to keep (the equivalent of) a running total8.

1 The public Last Call WD in Section 19.8.8.5 specifically disallowed fn:current-group in streaming. After several bug reports and extensive research to assess whether allowing this function in streaming proved doable, a proposal by Michael Kay was accepted through XSLT 3.0 Bug 24510. See also XSLT 3.0 Bugs 23391, 24342, 24317 (talks about a limitation in grouped streaming), 24509, 24556 and 24455. The resolution is publicly available as an attachment to Bug 24510, comment 9, which is a good starting point.
2 Which is only partially true; you could do it by creating an inline copy of the attributes.
3 If possible, it is much easier to solve this pattern using windowed streaming, by creating a copy of the parent node, which allows you to use any expression. However, that only works if the distance between the preceding node and its possible usages is small and fits in a windowed copy.
4 Sometimes it is possible to rewrite such axes in terms of child axes, tunneled parameters or other ways that would be streamable. If such a solution exists, it should take preference over using accumulators.
5 This coding problem can also be solved using tunneled parameters, and in fact, it is believed that most accumulator scenarios can be expressed with tunneled parameters; however, it can get very unwieldy to keep track of tunneled parameters across all templates in a real-world scenario. Accumulators are a cleaner way to do this.
6 It applies to the following axis only if the stylesheet can be rewritten using the reverse of that axis, because accumulators cannot look forward; they can only remember "what has been processed so far".
7 Prefer the previous streaming pattern, where a sibling recursion is rewritten using streamed grouping.
8 For instance, multi-level section numbering is a typical scenario for accumulators. An example, including an explanation of how it works with accumulators, is in Section 18.2.8, Examples of Accumulators, in the XSL Transformations 3.0 specification.


Does not apply to

This pattern does not apply to:
1. Scenarios where information ahead is required.
2. Scenarios that can be rewritten without using accumulators (this has preference).
3. Scenarios that cannot be expressed as a sequence of motionless expressions upon processed nodes1.

5.6.5. Consequences

Applying this pattern has the following consequences:
1. May dramatically increase complexity by having to express the logic as accumulator rules.
2. Need to keep track of visited nodes, which can become cumbersome2.

5.6.6. Implementation

Refactor your original code by applying the following steps in order:
1. Introduce a streamable mode;
2. Create a streamable accumulator.
3. Add the appropriate accumulator rules to add properties to visited nodes.
4. Change the offending expression to use the fn:accumulator-before or fn:accumulator-after functions.

Example

Applying these four refactoring rules to our original example, we end up with code along the lines of the sketch at the end of this section.

Before we explain the code, let us first look at what accumulators are and how they can be applied. An accumulator is best seen as a calculated property, or decoration, or function of a node: it can "attach" a value to a node that is made available through the fn:accumulator-before and fn:accumulator-after functions3. Accumulators are processed for each node that is processed in a streamable way4, regardless of their source5. Streamable accumulators must be motionless and they cannot return streamed nodes.
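A sketch of the accumulator-based solution as it is described in the walkthrough that follows; the accumulator name count-words, the match pattern text()[parent::para] and the use of accumulator-before come from the text, while the attribute name and the word-counting expression are assumptions:

<xsl:mode streamable="yes" on-no-match="shallow-copy"/>

<xsl:accumulator name="count-words" initial-value="0" streamable="yes">
  <!-- accumulator-before rule: updates the running word count when a text child of para is reached -->
  <xsl:accumulator-rule phase="start" match="text()[parent::para]"
                        new-value="$value + count(tokenize(., '\s+')[. ne ''])"/>
</xsl:accumulator>

<xsl:template match="para">
  <xsl:copy>
    <!-- motionless: reads the pre-descent value attached to the current node -->
    <xsl:attribute name="words-before" select="accumulator-before('count-words')"/>
    <!-- the single consuming expression of this template -->
    <xsl:copy-of select="text()"/>
  </xsl:copy>
</xsl:template>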

1 Accumulators can only operate on nodes with a motionless expression. 2 Using maps makes this task a tad easier though, maps are a new type introduced in XSLT 3.0. 3 For more information on accumulators and examples, see Section 18.2 in XSL Transformations 3.0. 4 Provided its attribute streamable is set to yes, otherwise, it applies to any non-streaming document. 5 The XSLT 3.0 Working Group is considering a proposal to limit an accumulator to a source document to prevent accumulators taking up too much processing time overhead. The status of this proposal can be tracked through XSLT 3.0 Bug 24547, read from comment#6.


An accumulator consists of three parts:

1. The top-level accumulator declaration xsl:accumulator, containing the name, the initial value, and whether or not it applies to streaming documents.

2. One or more accumulator rules, defined as xsl:accumulator-rule children of the accumulator declaration, each of which defines a new value for the accumulator. The calculation can be based on the previous value, obtainable through the reserved variable $value. An accumulator rule applies to a matching pattern, which follows the same rules as the match pattern of a template1. There are two types of rules:

• phase="start", also called an accumulator-before rule, applies the accumulator rule when the processor visits the opening tag2, or beginning, of the matching node. This means that, if you use the function fn:accumulator-before while processing such a node, it will return the new value just calculated in that rule. This is called the pre-descent value. You can request a pre-descent value only before a consuming instruction, or, if the body of a construct does not contain a consuming instruction, anywhere in that body.

• phase="end", also called an accumulator-after rule, applies the accumulator rule when the processor visits the closing tag, or end, of the matching node. This means that, if you use the function fn:accumulator-after after you are done visiting that node, it will contain the new value calculated by the matching accumulator-after rule. This is called the post-descent value. You can request a post-descent value only after a consuming instruction, or, if the body of a construct does not contain a consuming instruction, anywhere in that body.

3. Zero or more usages of the accumulator, through the functions fn:accumulator-before and fn:accumulator-after, which take one parameter of type xs:string whose value must match an existing accumulator name. When you can use which is described under the previous item.

Now that the essence of accumulators is clear, we can take a fresh look at the code. The streamable accumulator declared at the root, going by the name count-words and with an initial value of 0, has one accumulator rule:

• Accumulator-before rule: it defines the phase as start; while this is the default, it is clearer to specify it explicitly. Any accumulator matches twice, once at the start and once at the end of visiting a node (after processing its children). Phases that are left out default to leaving the value unchanged.

• The match pattern is text()[parent::para]. This means that whenever this pattern matches a node, regardless of existing xsl:apply-templates instructions3 or xsl:for-each loops, the accumulator value attached to that node is updated. The pattern makes sure that we only match text nodes that are children of para elements, which matches the original program listing. We cannot just match para elements here, because then attempting to count the text() children would be considered non-motionless, which is illegal in a streaming accumulator.

• The new-value attribute does what the preceding::text() expression did for us in the original code: at each visit of a text node, it adds the word count of that node to the previous running total. Allowing an expression on a text node may seem like a consuming expression and therefore not streamable. However, as we have seen in the first example in Section 4.9, "Rule 9: Use motionless filters", expressions that consume childless nodes are considered motionless.

In the matching template for the para element, the code was changed to use this accumulator instead of the following:: expression. This works because, each time this template is processed, the current node is a para element whose children have not yet been processed, which means that the running total on the first visit will be 0 (zero). Then, upon the call to text() in the xsl:copy-of instruction, the text node is processed and the running total is updated, which is reflected upon the next visit of the template. And so on, until the last para is visited.
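To make the discussion concrete, the following is a minimal sketch of a count-words accumulator along the lines described above. It is not the author's original listing: the word-counting expression, the mode name and the output shape are illustrative assumptions.

  <!-- Sketch only: a streamable accumulator counting words in para text nodes.
       use-accumulators="#all" makes the accumulator applicable in the mode. -->
  <xsl:accumulator name="count-words" initial-value="0" streamable="yes">
    <xsl:accumulator-rule phase="start"
        match="text()[parent::para]"
        new-value="$value + count(tokenize(., '\s+')[. ne ''])"/>
  </xsl:accumulator>

  <xsl:mode name="stream" streamable="yes" use-accumulators="#all"/>

  <xsl:template match="para" mode="stream">
    <!-- Pre-descent value: words counted before this paragraph's text node -->
    <para words-before="{accumulator-before('count-words')}">
      <xsl:copy-of select="text()"/>
    </para>
  </xsl:template>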

1 With two exceptions: you cannot match an attribute (you can reach out to attributes from an element with a motionless expression anyway), and you cannot match a namespace node (functions such as in-scope-namespaces are motionless, use them instead). 2 This is correct: it is the only declaration that can depend on start- and end-tags, though these terms are not officially used in the specification, because it applies more generally to any node. 3 Accumulators are global. They ignore the filter present in the on-no-match attribute, even if it is defined as deep-skip. If you want to skip nodes, you need to create a motionless pattern in the accumulator rules that does so. The only declaration that can have effect on accumulators is whitespace stripping, which happens prior to applying accumulators.

Page 42 of 162 Streaming Design Patterns or: How I Learned to Stop Worrying and Love the Stream

You might wonder how we can change this behaviour and have the running total per paragraph include the count of the current paragraph; in other words, the first count should not be zero, but the total of that paragraph. To achieve that, we need to make a few changes, but it is not as hard as it might seem at first:

• You might be tempted to change the accumulator rule phase to phase="end", but this is not required in this scenario, because it operates on a childless node. That means that when the text node is processed, both the fn:accumulator-before and fn:accumulator-after functions can be used within the matching template.

• Change the matching template to match the text node instead. Upon visiting the text node, the calculated value is available. This is different from our previous situation, where we requested the calculated value before the text node was visited. There is no before and after for a text node.

• Add parent::para to the text-matching template, otherwise we also match unwanted text nodes.

The result of these changes (only the changed parts are shown) uses select="accumulator-before('count-words')" in the text-matching template.

5.7. Stylesheets requiring look-around

5.7.1. Intent

You have a stylesheet with expressions that rely on preprocessing the input tree and you want to make it streamable.

5.7.2. Level

Advanced, uses forking and multi-pass streaming. Streamability level: multi-pass1.

5.7.3. Motivation

Consider you have been given an input containing, per day, a set of amounts such as 28.97, 42.90, 18.41, 143.93, 6.23; then 44.62, 154.34, 6.19, 219.07, 0.00; then 38.02, 108.71, 11.84. Your requirement is to calculate, for each day, the daily total and the daily percentage of the grand total.
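A minimal sketch of what such input and desired output could look like, assuming a data element per day with amount children (the element names are taken from the discussion that follows; the grouping of the amounts per day and the computed figures are illustrative):

  <!-- Illustrative input: one data element per day -->
  <days>
    <data date="2014-01-01">
      <amount>28.97</amount>
      <amount>42.90</amount>
      <amount>18.41</amount>
      <amount>143.93</amount>
      <amount>6.23</amount>
    </data>
    <!-- further data elements for the remaining days -->
  </days>

  <!-- Illustrative output: daily total plus percentage of the grand total -->
  <days>
    <data date="2014-01-01" total="240.44" percentage="29.2%"/>
    <!-- further data elements for the remaining days -->
  </days>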

1 Not all scenarios can be forked; it depends on whether it is possible to split processing so that one part does not depend on intermediate output of another part. In other words, the forking pattern only works if you can process each part individually.

Page 43 of 162 Streaming Design Patterns or: How I Learned to Stop Worrying and Love the Stream

With XSLT 2.0, one could achieve this with a conventional, non-streaming stylesheet. On each visit of the data element, we calculate the total of that day and the average with respect to the grand total. The total of that day depends on the children, and the average depends on all amount elements; as such, at the moment of visiting the data element, it requires (re)visiting all preceding and following amount elements.

Since looking around is not allowed in streaming, it is not immediately obvious how to make this example streamable. The first thing that comes to mind is using forking as in the out-of-document-order pattern above. But that does not help us much, because we cannot store the result of one fork operation and use it in another fork operation1.

The streaming pattern discussed here uses an approach called multi-pass streaming, where the input stream is processed multiple times. It is not possible to do that on any input stream, but if you know the document URI, you can open the streaming document multiple times using the xsl:stream instruction.

5.7.4. Applicability

Applies to

This pattern applies to the following scenarios:
1. Any expression requiring precalculation or preprocessing of the input stream.
2. Expressions that cannot be split with forking.
3. Expressions that cannot use windowed streaming, for instance because they rely on the whole document.

Does not apply to

1. Streaming documents that are not restartable2.

5.7.5. Consequences

Applying this pattern has the following consequences:
1. Going twice or more over a large input document using streaming will increase the processing time by a factor of two or more.
2. Reliance on the document URI may have security implications.

5.7.6. Implementation

Refactor your original code by applying the following steps in order:
1. Introduce a streaming mode.
2. Create a global variable to hold the initial input document URI.
3. Create a global variable for preprocessing the stream.
4. Apply previous patterns to make the stylesheet streamable.
5. Replace the free-ranging expressions with a variable reference.

1 This is intentional in the design of xsl:fork, forking does not go over the input multiple times, instead, it visits each streamed input node one time as with normal streaming, but processes it multiple times, for each fork instruction. 2 Sometimes an input document can only be processed on the fly, like a Twitter or news feed, a volatile memory stream or a listener to a network socket stream. Such streams cannot be restarted.


Example

Applying these five refactoring rules to our original example, we end up with code built around two global variables and a streaming mode (a sketch follows below). The example necessarily relies on previous streaming patterns, namely forking and splitting the identity select expression node() | @* into two; see the previous sections for how these refactorings work. The new streaming pattern introduced here is preprocessing the stream, in other words multi-pass streaming. There are no constructs in XSLT 3.0 that specifically facilitate multi-pass streaming, but the approach can be built from existing instructions.

The variable $docuri relies on the global dynamic context. It is not allowed with streaming to consume a streamed node in a global variable, but it is allowed to use a motionless expression, in this case base-uri(.)1.

The second variable, $total, uses the xsl:stream instruction and the URI of the initial input document from $docuri to process the whole stream. Depending on the size of your stream, this can be a lengthy operation, but because variables are stable and deterministic in XSLT, this will be done only once.

The expression inside fn:sum may require a little explanation. The easier thing to do would be to write //amount, as in the original non-streaming example. However, because of the chance of overlapping nodes, such an expression is not streamable within this function. This limitation may be lifted in the streamability analysis (there is a strong sentiment in the Working Group to allow such constructs and to allow limited buffering by processors), but at the time of this writing there was no decision yet. We have seen before that childless nodes do not suffer the same problem (there is no chance of overlap), hence we can rewrite the expression to focus on the text nodes, as text()[parent::amount], which is streamable.

After defining a global variable for preprocessing, the remaining refactoring changes the rest of the stylesheet to use this variable and introduces forking for the parts that still have multiple downward selects.

A streamed document is not held in memory. That means that, if during streaming the document's content changes, this will affect the outcome. Processing the same document twice, as in this streaming pattern, has the same potential side-effect: the document may change between processing it through the xsl:stream instruction and processing the initial input tree in the streaming mode. It is up to the document provider to make sure that the document remains stable for the duration of the processing.
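A minimal sketch of the two global variables and a day-total template, using the illustrative element names from the motivation above rather than the author's exact listing (the xs prefix is assumed to be bound to the XML Schema namespace):

  <xsl:mode name="stream" streamable="yes"/>

  <!-- Motionless: remember the URI of the initial input document -->
  <xsl:variable name="docuri" select="base-uri(.)"/>

  <!-- Multi-pass: re-open the same document to precompute the grand total -->
  <xsl:variable name="total" as="xs:double">
    <xsl:stream href="{$docuri}">
      <xsl:sequence select="sum(//text()[parent::amount])"/>
    </xsl:stream>
  </xsl:variable>

  <xsl:template match="data" mode="stream">
    <!-- One downward select per template: summing this day's amounts -->
    <xsl:variable name="day-total" select="sum(amount)"/>
    <data total="{$day-total}"
          percentage="{format-number($day-total div $total, '0.0%')}"/>
  </xsl:template>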

1 The rules on accessing the global context node in a streaming context are still under debate in the Working Group. It is possible that support for this will be dropped, in which case you should use a parameter instead and pass the document URI in through the infrastructure of your processor.


5.8. Dependencies on xsl:call-template

5.8.1. Intent

You have a stylesheet with dependencies on xsl:call-template and want to make it streamable.

5.8.2. Level

Intermediate, sometimes requires forking. Streamability level: full.

5.8.3. Motivation

Consider an example in which a named template is invoked through xsl:call-template and relies on an implicit context item. The instruction xsl:call-template is severely limited when used with streaming: you cannot pass references to nodes and you cannot use the context item. If there is an implicit context item, as in such an example, streamability analysis will fail.

5.8.4. Applicability

Applies to

This pattern applies to the following scenarios: 1. Any use of xsl:call-template involving streamed nodes. 2. Any other xsl:call-template, unless the context item is explicitly prohibited.

Does not apply to

This pattern does not apply to:
1. Instructions other than xsl:call-template.

5.8.5. Consequences

Applying this pattern has the following consequences:
1. Your programmers will have to learn not to use xsl:call-template in streaming scenarios.

5.8.6. Implementation

There are no existing situations that I know of that actually require the use of xsl:call-template. The streaming pattern proposed here is to get rid of it. However, there is one scenario where this is not necessary, which is when the called template does not depend on the context item and the parameters are motionless expressions. In such cases, simply add an xsl:context-item declaration with use="prohibited"1 to the called template.

Otherwise, refactor your original code by applying the following steps in order (a sketch follows the footnote below):
1. Introduce a streaming mode.
2. Replace the xsl:call-template instructions with either an apply-templates (possibly in another named, but streamable, mode) or a stylesheet function call.
3. Remove dependencies on streaming nodes in parameters.
4. Apply other patterns where necessary to make the result streamable.

1 The instruction xsl:context-item is new in XSLT 3.0 and determines the type of the expected context item and whether or not there should be a context item at all. Using use="prohibited" prohibits a context item, which, in the case of xsl:call-template means that the context item will not be passed on to the template, it will be absent.
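A minimal sketch of the two options described above: keeping a context-free named template by prohibiting the context item, or replacing the call with apply-templates in a streamable mode. The template and mode names are illustrative, and use="prohibited" follows the working-draft wording used in this paper.

  <!-- Option 1: the named template uses neither the context item
       nor streamed nodes in its parameters -->
  <xsl:template name="emit-header">
    <xsl:context-item use="prohibited"/>
    <xsl:param name="title" as="xs:string"/>
    <header><xsl:value-of select="$title"/></header>
  </xsl:template>

  <!-- Option 2: replace xsl:call-template with apply-templates
       in a streamable mode (one downward select) -->
  <xsl:mode name="stream" streamable="yes"/>
  <xsl:template match="chapter" mode="stream">
    <chapter>
      <xsl:apply-templates select="para" mode="stream"/>
    </chapter>
  </xsl:template>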


Example

Applying these four refactoring rules to our original example, we end up with code in which the named template has been removed. In most cases, removing dependencies on xsl:call-template will be rather trivial and only requires using known XSLT 2.0 constructs. In this example, we had two downward expressions (potentially) not in document order, which forced us to use forking as explained in an earlier streaming pattern. Other than basic refactoring to get rid of the named template, there are no new concepts revealed here.

5.9. Using streamable stylesheet functions

5.9.1. Intent

You have a stylesheet function with arguments taking nodes and you want to make it streamable.

5.9.2. Level

Advanced, requires an understanding of writing streamable functions. Streamability level: full.

5.9.3. Motivation

Consider a simple function that takes a price element and calculates the VAT from a VAT-inclusive price, here 21%.

5.9.4. Applicability

Applies to

This pattern applies to the following scenarios:
1. Stylesheet functions that have at most one parameter that can be a node.
2. Stylesheet functions returning streamed nodes.

Does not apply to

This pattern does not apply to:
1. Stylesheet functions that take more than one parameter that can be a node1.
2. Recursive stylesheet functions2.
3. Non-final stylesheet functions3.

5.9.5. Consequences

Applying this pattern has the following consequences:
1. All your stylesheet function signatures will be typed.
2. Streamable stylesheet function bodies will need to be, and remain, at most consuming (one downward select, see Section 4.1, "Rule 1: each template rule can have a maximum of one downward expression"), which has consequences for maintainability.

1 The brand new feature that allows functions to take streamable nodes allows them to have at most one parameter that is a streamable node. This is defined in the signature of the function. 2 It is possible to write functions that are recursive and guaranteed streamable, but it is a very advanced concept, very new and likely to change, and out of the scope of this paper. 3 In XSLT 3.0 it is possible to use packages (declared with xsl:package) and to override modes, named templates and functions, as long as they are marked overridable. It is currently not allowed to use streamable functions that are not final; you must either override a non-final function and make it final, or you must change the function signature to be final.


5.9.6. Implementation

Refactor your original code by applying the following steps in order:
1. Mark the stylesheet function as streamable (if you have not already done so, introduce a streamable mode as well).
2. Type the return type and the function parameters.
3. Change the function body to be streamable using other streamable patterns.

Streamable stylesheet functions are a very recent addition to the machinery available to authors of streamable stylesheets and packages. It is a very powerful one, but it is also one of the most complex to write correctly. It is the only user-defined way of writing a construct that returns nodes which is allowed (recall that named templates cannot operate on streaming nodes at all and that templates are always grounded, meaning that they can never return nodes).

For instance, it is possible to write a function that returns the third child matching a certain pattern, using a select such as $node/person[@gender = 'M'][3]. The returned node will still be a streaming node and not, as with other constructs, a copy of a node. If you call that function with a streamable expression, you can apply path expressions on the returned value and you will operate on the original node, not a copy of the node. Besides not requiring the overhead of copied nodes, this has the advantage that the identity of the nodes is maintained, allowing chained expressions such as member/f:child(.)/age.

Example

Applying these three refactoring rules to our original example yields a typed, streamable function (a sketch follows below). An important step here is to type the parameters and the return type. It is allowed and possible to create streamable functions without this, but it is much harder to write streamable functions that way. By telling the processor that the return type is an atomic type, it can analyse that the function will consume the input node and it can mark calls to the stylesheet function with usage absorption.

Inside the function's body, we have an atomizing construct: the variable reference $price, which is bound to a potentially streaming node, will be atomized before division takes place. Atomization always means that the node is going to be consumed.

Function declarations follow the same rules as templates, Section 4.1, "Rule 1: each template rule can have a maximum of one downward expression". However, within functions, the bound parameter is the streaming node for the sake of the analysis. In other words, the use of the $price parameter is now the replacement for the downward select (you could mentally replace it with a node reference, such as the self::price select expression, to see how it works).
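A minimal sketch of such a streamable VAT function and a call site, assuming the $price parameter and the 21% VAT-inclusive rate from the discussion above. The function name, the f and xs prefixes, and the calling template are illustrative; the streamable attribute follows the draft wording used in this paper.

  <xsl:function name="f:vat-of" as="xs:decimal" streamable="yes">
    <xsl:param name="price" as="element(price)"/>
    <!-- Atomizing $price consumes the streamed node; 21/121 of a
         VAT-inclusive amount is the VAT portion -->
    <xsl:sequence select="xs:decimal($price) * 0.21 div 1.21"/>
  </xsl:function>

  <!-- Called from a streamable template; price is the single downward select -->
  <xsl:template match="item" mode="stream">
    <vat><xsl:value-of select="f:vat-of(price)"/></vat>
  </xsl:template>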


5.10. Sorting

5.10.1. Intent

You have a stylesheet construct that requires sorting and you want to make it streamable.

5.10.2. Level

Advanced, requires two-phase streaming, streamable grouping and streamable merging. Streamability level: none1, and then: implicitly windowed.

5.10.3. Motivation

Consider a stylesheet that takes an unsorted log document and sorts it by date. The input could, for instance, be a log document containing many log-entry elements, each carrying a date and some log text (a sketch follows below).

5.10.4. Applicability

Applies to

This pattern applies to the following scenarios:
1. Any construct using sorting.

Does not apply to

This pattern does not apply to:
1. Sorting where the sortable items easily fit in memory; in such scenarios, use a copy of the nodes and apply the sorting to this copy2.

5.10.5. Consequences

Applying this pattern has the following consequences:
1. It changes the infrastructure for multiple processing; you should consider adding [XProc] to the toolchain.
2. The sorting itself will not be obvious in either of the stylesheets.
3. It requires temporary disk space, equal to the size of the input document, to store the intermediate results.

5.10.6. Implementation

Refactor your original code by applying the following steps in order:
1. Introduce a streamable mode in the original stylesheet.
2. Use streamable grouping to create chunks of the input.
3. Use xsl:result-document to write each group to disk.
4. Create a copy of each chunk and apply the sorting to this copy.
5. Apply other patterns where necessary to make the stylesheet streamable.
6. Create a second stylesheet for the merge phase.
7. Get a collection of the URIs of the stored documents.
8. Add a streamable xsl:merge instruction to process this collection.
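An illustrative shape for the unsorted log input; the element name log-entry is taken from the text, while the date attribute and its placement are assumptions:

  <log>
    <log-entry date="2014-03-07">Some log text</log-entry>
    <log-entry date="2014-01-15">Some log text</log-entry>
    <log-entry date="2014-02-02">Some log text</log-entry>
    <!-- many more log-entry elements, in no particular order -->
  </log>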

1 Sorting itself is not streamable at all. The solution provided for the streamable design pattern is to split the source document into manageable chunks, sort those chunks and then process those again using streamed merging. 2 As with other streamable design patterns, there are no limitations on expressions applied to copies of nodes.


Example

Applying these eight refactoring rules to our original example, we end up with two stylesheets (sketches of both follow below). The first, which creates the smaller sorted chunks of the input document, selects log/log-entry, groups the entries with group-adjacent="position() idiv N", and writes each sorted group out with href="sorted_{current-grouping-key()}.xml".

Most of the rewriting here follows streaming design patterns seen in previous sections. Some parts warrant extra attention:

Using group-adjacent is allowed in streamable scenarios as long as the expression is motionless. fn:position is motionless, and so is any calculation on it. With idiv we get nice round numbers for our splitting. The size of our chunks here is 500, but any number can be used as long as the chunks by themselves fit in memory.

A copy of each group is created using fn:copy-of and stored in a variable. Instead of storing it in a variable, you can inline this in the apply-templates instruction1. Finally, applying templates to this copy and sorting them using xsl:sort is allowed, because the nodes are not streamed anymore; they are a copy.

The streaming design pattern so far, without the xsl:result-document and the grouping, can serve as a solution if the nodes to be sorted fit in memory.

The stylesheet for the second phase uses xsl:merge. This instruction takes several input documents and merges them based on a merge key, similar to the way sorting works. It processes the input documents one item at a time, based on the select statement in the xsl:merge-source instruction. In the case of streaming, it does so in order, which is why the input documents must be pre-sorted. It will output the elements in the order defined by the merge key, inserting the elements from the other source documents where necessary, to preserve order.

The input for xsl:merge-source is given by the for-each-stream attribute, which takes a sequence of strings that must be valid URIs. In the example I use fn:uri-collection, which works like fn:collection but, instead of returning a sequence of document nodes, returns a sequence of URIs. The way a collection URI is interpreted is implementation dependent, which is why the example says "see your processor's documentation". Most implementations will support a kind of wildcard matching and the usage of some form of XML catalogs.

1 Some versions of Saxon did not allow multiple nodes as an argument to fn:copy-of; the specification, however, does allow this.
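Minimal sketches of the two phases, built from the fragments quoted above. Chunk size, mode names, the date merge key and the collection URI syntax are assumptions; for-each-stream is the attribute name used in the working draft this paper follows.

  <!-- Phase 1 (sketch): split the streamed log into chunks, sort a copy of
       each chunk, and write it to disk -->
  <xsl:mode streamable="yes"/>
  <xsl:mode name="sort" on-no-match="deep-copy"/>

  <xsl:template match="/">
    <xsl:for-each-group select="log/log-entry"
                        group-adjacent="position() idiv 500">
      <xsl:result-document href="sorted_{current-grouping-key()}.xml">
        <log>
          <!-- copy-of grounds the group, so xsl:sort is allowed here -->
          <xsl:apply-templates select="copy-of(current-group())" mode="sort">
            <xsl:sort select="@date"/>
          </xsl:apply-templates>
        </log>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Phase 2 (sketch): merge the pre-sorted chunk documents back together.
       The collection URI is processor-specific; see your processor's documentation. -->
  <xsl:template name="merge-chunks">
    <log>
      <xsl:merge>
        <xsl:merge-source streamable="yes"
            for-each-stream="uri-collection('sorted_?select=*.xml')"
            select="log/log-entry">
          <xsl:merge-key select="@date"/>
        </xsl:merge-source>
        <xsl:merge-action>
          <xsl:copy-of select="current-merge-group()"/>
        </xsl:merge-action>
      </xsl:merge>
    </log>
  </xsl:template>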


The xsl:merge-action defines the action to take on each merge operation. There will be one action for each unique merge key. If multiple elements have the same key, they are added to the current merge group, which can be requested, similarly to grouping, using fn:current-merge-group. The current merge group is implicitly grounded and maintains access to its ancestors, similar to a call to fn:snapshot. This has the advantage that you can use free-ranging expressions on the merge group.

In our scenario, no additional processing has to take place. We simply want all the log-entry elements merged back into one big document. The result is a sorted document.

This method of sorting, using splitting, sorting the small chunks and then merging back, is the simplest way to sort in a streaming way. It is also the only way to do streamable sorting that requires only two phases. A single-phase streaming sort is not possible.

6. Streamable packages

With XSLT 3.0 a new feature called packaging has been introduced. Packaging allows authors to create libraries of stylesheet functions, stylesheet templates and other stylesheet declarations that can be precompiled, distributed and used as a library with other stylesheets. It extends the way xsl:import works, it allows a certain level of overridability similar to object-oriented languages, and it supports information hiding through private and public named declarations.

Package authors who wish to reach as large an audience as possible for their library packages will want to prepare their packages to behave properly in streaming scenarios. Since non-streaming scenarios can safely ignore the streamability properties, it is suggested that package authors strive to write their packages, their public stylesheet functions and their public modes with streamability in mind.

It is not possible to declare a public mode to be both streamable and non-streamable. But it is possible to create two modes, one streamable and one not, that are processed by the same templates, each of which has its mode set to #all or at least to both modes (a sketch follows below). Stylesheet functions can safely have the streamable attribute set to yes if they take nodes as parameters. A non-streamable usage of a streamable function behaves, without side effects, exactly the same as in a streamable context.
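A minimal sketch of the dual-mode approach inside an xsl:package; the mode and element names are illustrative assumptions.

  <!-- One streamable and one non-streamable public mode,
       served by the same templates -->
  <xsl:mode name="lib-stream" streamable="yes" visibility="public"/>
  <xsl:mode name="lib"        streamable="no"  visibility="public"/>

  <xsl:template match="para" mode="lib-stream lib">
    <p><xsl:value-of select="."/></p>
  </xsl:template>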
If package authors choose to write defensively and write each function according to the guaranteed streamability rules, their packages can be used across processors and with both streamable and non-streamable input.

7. Conclusion

Many common classical XSLT programming scenarios and patterns appear to be convertible into guaranteed-streamable code with relative ease. Following ten easy-to-remember rules, with at the top of the list the rule on having a maximum of one downward expression per context or template, brings us closer to a fully streamable stylesheet. Taking original programming patterns and seeing how they change into streamable programming patterns should give the intermediate to advanced XSLT programmer, or the beginning streaming XSLT programmer, a good starting point.

Some more complex scenarios, including forking, streamable grouping, merging and even sorting, were covered, showing that the current state of the specification, including the fixes available through BugZilla, works with a wide range of potential streaming use-cases.

Streaming is not hard; you just need to set your mind to it, and once set, adjusting your code to process large, even huge, documents becomes almost a breeze.


Bibliography

[BRA14] Streaming for the masses. Abel Braaksma. XML Prague 2014. ISBN 978-80-260-5712-3. http://archive.xmlprague.cz/2014/files/xmlprague-2014-proceedings.pdf
[BUGZ] Bugzilla - Public W3C Bug / Issue tracking system. Miscellaneous authors. 2014. https://www.w3.org/Bugs/Public/
[FO3] XPath and XQuery Functions and Operators 3.0, latest version. Michael Kay. 2014. http://www.w3.org/TR/xpath-functions-30/
[FOPR] XPath and XQuery Functions and Operators 3.0, W3C Proposed Recommendation 22 October 2013. Michael Kay. 2013. http://www.w3.org/TR/2013/PR-xpath-functions-30-20131022/
[MKAY08] XSLT 2.0 and XPath 2.0 Programmer's Reference. 4th edition. Michael Kay. 2008. ISBN 978-0-470-19274-0.
[XDM] XQuery and XPath Data Model 3.0, latest version. Norman Walsh, Anders Berglund, and John Snelson. 2014. http://www.w3.org/TR/xpath-datamodel-30/
[XP3] XML Path Language (XPath) 3.0, latest version. Jonathan Robie, Don Chamberlin, Michael Dyck, and John Snelson. 2014. http://www.w3.org/TR/xpath-30/
[XPPR] XML Path Language (XPath) 3.0, W3C Proposed Recommendation 08 January 2013. Jonathan Robie, Don Chamberlin, Michael Dyck, and John Snelson. 2013. http://www.w3.org/TR/2013/PR-xpath-30-20131022/
[XProc] XProc: An XML Pipeline Language, W3C Recommendation 11 May 2010. Norman Walsh, Alex Milowski, and Henry S. Thompson. 2010. http://www.w3.org/TR/xproc/
[XSLT3] XSL Transformations (XSLT) Version 3.0, latest version. Michael Kay. 2013. http://www.w3.org/TR/xslt-30/
[XSLWD] XSL Transformations (XSLT) Version 3.0, W3C Last Call Working Draft 12 December 2013. Michael Kay. 2013. http://www.w3.org/TR/2013/WD-xslt-30-20130201/

From monolithic XML for print/web to lean XML for data: realising linked data for dictionaries

Matt Kohl Oxford University Press

Sandro Cirulli Oxford University Press

Phil Gooch Oxford University Press

Abstract

In order to reconcile the need for legacy data compatibility with changing business requirements, proprietary XML schemas inevitably become larger and looser over time. We discuss the transition at Oxford University Press from monolithic XML models designed to capture monolingual and bilingual print dictionaries derived from multiple sources, towards a single, leaner, semantic model. This new model reflects the lexical content units of a traditional dictionary, while maximising human readability and machine interpretability, thus facilitating transformation to Resource Description Framework (RDF) triples as linked data.

We describe a modular transformation process based on XProc, XSLT, XSpec and Schematron that maps complex structures and multilingual metadata in the legacy data to the structures and harmonised taxonomy of the new model, making explicit information that is often implicit in the original data. Using the new model in its prototype RDF form, we demonstrate how cross-lingual, cross-domain searches can be performed, and custom data-sets can be constructed, that would be impossible or very time-consuming to achieve with the original XML content stored at the individual dictionary level.

Keywords: XProc, RDF, dictionaries

1. Introduction

Oxford University Press (OUP) publishes a number of academic dictionaries both in print and online, most notably the Oxford English Dictionary [1], as well as current dictionaries for English and bilingual dictionaries for modern languages [2]. Recently, we have also expanded our dictionaries offering by licensing in lexical content from other publishers as part of the Oxford Global Language Solutions (OGLS) [3] initiative, enabling us to meet demands for lexical content in languages outside OUP's catalogue.

As there is no standard for digital publication of dictionaries, the OGLS data-sets come to us in a variety of formats (XML, MySQL, InDesign, MS Word and Excel, plain text, etc.). To offer this licensed-in content to our customers, we draw on in-house and freelance developers to convert to OxMonolingML and OxBilingML, our dictionary XML Document Type Definitions (DTDs). These DTDs evolved from SGML and were originally designed to capture a dictionary as it appears on a printed page. Recently, we have received feedback from customers about excessive hierarchy and complication in these models. This added to a growing concern within our team that the DTDs have become somewhat monolithic and permit too much variation, a consequence of frequently needing to loosen the structures to capture content from diverse print sources. Despite our attempts to combat permissiveness with best-practice advice and supplement with Schematron


validation [4], the XML we receive from our developers often departs from our standards.

Meanwhile, a pilot project to create linked data from a sample of dictionary data-sets was approved. The proposal involved designing a schema to model language concepts, rather than print conventions, one which results in intelligent, semantic, machine-interpretable lexical content in XML.

2. Data Modelling

While the OxMonolingML and OxBilingML DTDs capture the formatting intricacies of print dictionaries and enable these to be reproduced online, they are not ideal data models for machine processing. The rationale for developing a new dictionaries data model was two-fold. First, the need to future-proof, as far as possible, our multilingual lexical content, so that it will be reusable and interoperable with other tools and data-sets in the growing fields of language technology and linked data. Second, to address licensee feedback about the difficulty of understanding and processing a deeply hierarchical and overly complicated content model that tended to lack semantic consistency across data-sets. The following requirements were prioritised:

• Self-documenting: Data structures, element names and attribute values should be human understandable.
• Semantically rich: Data structures should be readily machine-processable and machine-interpretable.
• Well structured: There should be only one, clear way to model any given lexical item.

The third requirement was particularly important, as the OxMonolingML and OxBilingML DTDs allow optional and repeatable structures in many places while lacking clear semantics, resulting in many ways of modelling the same lexical structure. The DTDs also permit mixed content (elements that may contain other elements, text, or both) in many places. Without additional Schematron validation, this caused numerous quality assurance problems, as many elements could be validly empty (according to the models) but invalid semantically. For example, does the appearance of empty element content in a mixed content model imply a data conversion error, editorial omission, or genuine lack of data? The new model needed to be expressed in a more rigorous formalism to reduce the need for numerous Schematron content validation rules.

From these requirements, the following modelling principles were determined:

• The model should be agnostic to source or language.
• The model should be able to handle both monolingual and bilingual data.
• The model should be modular, to facilitate maintenance and customisation.
• Attribute values, as far as possible, should come from controlled vocabularies.
• Mixed content should be reduced to a minimum.
• The content should be readily transformed to other structured data, such as RDF.

As a result of being strict on mixed content, we decided that empty elements should have a specific semantic purpose: empty element content in XML instances must not be used to imply lack of content. For example, initially we decided that a word sense in a monolingual dictionary should always contain a definition. However, cases later arose where this could not always be enforced. Rather than make the definition element optional or allow it to be empty, we introduced a dedicated placeholder element; any empty or missing definition elements in an instance would therefore indicate a data conversion error.

Additional modelling principles arose from the need to simplify the complex sense hierarchy (which was in any case inconsistent across dictionaries) and our goal of making the data more modular and less monolithic:

• Subsidiary relationships should be preserved via attribute values pointing to the identifier of the parent sense, rather than traditional nesting, effectively flattening the sense XML without sacrificing an entry's complexity.
• The scope of an entry should always be a single Lemma. Derivatives, phrases, and other items traditionally nested in sub-entries are all upgraded to entry level. Again, formerly hierarchical relationships are retained through semantic linking.

DTDs remain popular at the Press as they are familiar and understandable to both editors and content developers. However, to facilitate enhanced content validation and maximise interoperability and tool support, we selected XML Schema as the primary validation formalism for the new model. Because a dictionary entry can also be represented as a collection of metadata about a word or phrase, we undertook parallel modelling in Resource Description Framework (RDF) [5], Web Ontology Language (OWL) [6] and XML Schema.


Both the document-centric and metadata models consider the core concepts of Lemma (dictionary headword), Etymology (origin), Sense (semantics), Definition (meaning), Example (usage), and their relationships. The XML model reflects the lexical content units of a traditional dictionary entry, and so explicitly models concepts of synonymy and meronymy (phrasal constructions composed of one or more lemmas) as specific element types that indicate their relationship to the lemma. The RDF model goes further and considers these as lemmas that happen to be in a synonym or meronym relationship. For example, in the XML model a sense carries an English definition of the term together with a synonym given simply as another term, whereas in the RDF model the same relationships are reified and represented as resources in their own right. The conceptual classes are shown in Figure 1, "Reified conceptual classes".

Figure 1. Reified conceptual classes

Instances of the Register, Domain and PartOfSpeech classes were modelled as hierarchical, controlled vocabularies in RDF, with translations in major languages; for example, the domain:physical-science instance carries the labels Physics, Physik and Física. This provided a mechanism to map metadata values in the legacy data to language-independent identifiers in the new model (see Section 3, "Data Conversion").
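A minimal sketch of such a controlled-vocabulary instance in RDF/XML, assuming an rdfs-style vocabulary; the parent class and the abbreviated (CURIE-style) URIs are illustrative, following the notation used elsewhere in this paper.

  <rdf:Description rdf:about="domain:physical-science">
    <rdfs:subClassOf rdf:resource="domain:science"/>
    <rdfs:label xml:lang="en">Physics</rdfs:label>
    <rdfs:label xml:lang="de">Physik</rdfs:label>
    <rdfs:label xml:lang="es">Física</rdfs:label>
  </rdf:Description>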


3. Data Conversion

Our dictionaries catalogue includes more than 40 monolingual and bilingual titles, and to move that much data into the new model we would need a scalable transformation strategy. Furthermore, conversion would require some standardisation steps that could be leveraged to enhance the OxMonolingML and OxBilingML data for our current customers, so building something modular was also desirable.

Given these requirements, we decided to build a modular XProc [7] pipeline comprising XSLT, Schematron and XML Schema validation steps, with specific modules for each language in order to hold harmonised metadata across dictionaries and languages. A simplified diagram of the XProc pipeline is shown in Figure 2, "XProc pipeline".

The input port of the XProc pipeline is a monolingual dictionary in the OxMonolingML DTD format. Three parameters may be passed to the pipeline in order to apply language-specific or licensee-specific steps as needed.

The input file goes through a sequence of XSL transformations which yield an intermediate file that we like to call OxMonolingML++. This is an enhanced version of the source data, featuring harmonised structural variations in sub-entries, synonyms and antonyms, and metadata values mapped from the source language to their English equivalents (for example, Spanish part-of-speech labels such as adjetivo and adverbio).

These steps are grouped inside a single sub-pipeline ending with the Schematron validation. The sub-pipeline has dedicated output ports for storing the OxMonolingML++ file and the Schematron error report.

A similar process is applied in the second sub-pipeline, which generates the final output referred to in the diagram as Lexical Data. This sub-pipeline also allows for licensee-specific steps triggered by the corresponding licensee parameter.

Both Schematron and XML Schema validations are performed on the final output, with dedicated output ports for storing error reports. When there are licensee-specific steps, an XSL transformation generates a corresponding version of the XML schema for validation of the custom output.

Development of the pipeline components was carried out by a team of four developers and a project manager using Agile methodologies. Behaviour-driven development was employed by writing XSpec unit tests [8] for core transformation scenarios, as illustrated in Example 1, "XSpec unit test".

Example 1. XSpec unit test

The test asserts that a source-language part-of-speech value such as "adj inv / adjetivo" or "adv" is mapped to its harmonised English equivalent (e.g. "adj inv/adv").
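A minimal sketch of an XSpec scenario along the lines of Example 1: it checks that a Spanish part-of-speech label is mapped to the harmonised English value. The stylesheet URI, element names and expected values are illustrative assumptions, not the authors' original test.

  <x:description xmlns:x="http://www.jenitennison.com/xslt/xspec"
                 stylesheet="map-metadata.xsl">
    <x:scenario label="maps Spanish part-of-speech labels to English metadata">
      <x:context>
        <ps>adjetivo adverbio</ps>
      </x:context>
      <x:expect label="adjetivo adverbio becomes adj inv/adv">
        <ps>adj inv/adv</ps>
      </x:expect>
    </x:scenario>
  </x:description>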

Figure 2. XProc pipeline

The diagram shows the OxMonolingML input (with parameters such as xml:lang="es", licensee="licensee1" and xmlschema="schema1") passing through language- and licensee-specific XSLT steps (restructure sub-entries, map to English metadata, restructure synonyms, generate IDs, group definitions), with Schematron QA and report producing the OxMonolingML++ intermediate, followed by further steps and XML Schema plus Schematron validation, each with error reports, producing the final Lexical Data.


Jenkins [9] was employed as a continuous integration server for running automated XSpec unit tests and building the final deliverables, both for licensees and for internal use. The workflow is illustrated in Figure 3, "Build workflow".

Figure 3. Build workflow

XML source data in OxMonolingML, XSL and XSpec code, Schematron and XML Schema are managed under Subversion (SVN) [10]. The Jenkins build process checks out the latest version of the code and data from the repository, and then runs the main conversion pipeline via Ant [11], which executes the XSpec unit tests, generates and executes the XProc pipeline, and runs additional tasks for packaging the deliverables. The artefacts created by the build process are reviewed by a linguist for quality assurance. If any critical errors arise from unit tests or linguistic QA, the code is fixed through another iteration. Otherwise, Jenkins commits and tags the final deliverable in SVN with a release number, and archives it on an external drive in order to be shipped to the licensee. Builds are triggered when any code or data changes are made to the repository. Access to the latest unit test results is always available from the Jenkins web interface.

For better scalability (and to avoid the need to maintain a single, monolithic pipeline), the framework relies on pipeline configuration files, which outline the required steps and metadata mappings for specific data-sets. Using these configurations and some boilerplate pipeline XSL, the Ant script builds and executes a custom XProc on the fly.

4. Results and Discussion

The following examples provide an overview of the data at each stage of our transformation. Example 2, "Spanish entry for abanicar in original source data", shows a Spanish dictionary entry in the format licensed from the original publisher, i.e. before it has been converted to our OxMonolingML DTD by one of our freelance or in-house developers. The structure is very straightforward, but all content, including element and attribute names, is in the source language, making interoperability and reuse more difficult.

Example 2. Spanish entry for abanicar in original source data

The entry comprises the headword abanicar, the part-of-speech entity &verbo_transitivo_dueae;, two senses and a conjugation note:

  Dar aire con el abanico u otro objeto: "Pedro se abanica descuidadamente
  con su sombrero de copa y muestra un gesto de aburrimiento."
  taur. Incitar al toro agitando ante él el capote de un lado a otro,
  generalmente para que cambie de lugar en la suerte de varas.
  Conjug. [1] como sacar.


Example 3, “Spanish entry for abanicar in less transparent than the original (due to the OxMonolingML” shows the same Spanish data following OxMonolingML DTD needing to capture a wide range conversion to the OxMonolingML DTD. Metadata is of content for both print and online use). We used this still in Spanish at this point, and the structure is arguably data as input for our conversion framework.

Example 3. Spanish entry for abanicar in OxMonolingML

abanicar verbo transitivo Dar aire con el abanico u otro objeto: Pedro se abanica descuidadamente con su sombrero de copa y muestra un gesto de aburrimiento. taur Incitar al toro agitando ante él el capote de un lado a otro, generalmente para que cambie de lugar en la suerte de varas. Conjug. [1] como sacar.


Example 4, “Spanish entry for abanicar in new Lexical domain="bullfighting"), and the structure is more model” shows the output following XSLT transformation transparent (NB conjugation information has been to the new Lexical model. All metadata are now from omitted, as we hold language data on constructions, controlled vocabularies (e.g. partOfSpeech="verb" and conjugations and morphology in a separate repository).

Example 4. Spanish entry for abanicar in new Lexical model

abanicar Dar aire con el abanico u otro objeto Pedro se abanica descuidadamente con su sombrero de copa y muestra un gesto de aburrimiento Incitar al toro agitando ante él el capote de un lado a otro, generalmente para que cambie de lugar en la suerte de varas


Example 5, “Spanish entry for abanicar in RDF/XML” domain:sport, so a query to extract all sporting terms shows the output following XSLT transformation of the across all languages would identify the bullfighting term Lexical model to RDF/XML. Controlled vocabulary in this example. When combined with our bilingual labels have been replaced with compact URIs using dictionaries, which provide sense-level and domain- CURIE syntax [12] pointing to instances in the domain specific translations, or external, multilingual, public ontology. This allows us to use inference when writing domain lexical resources, such an approach is potentially SPARQL queries to interrogate the RDF data. For very powerful, as discussed in Section 5, “Next Steps”. example, domain:bullfighting is a subClassOf

Example 5. Spanish entry for abanicar in RDF/XML

abanicar

Dar aire con el abanico u otro objeto:

Incitar al toro agitando ante él el capote de un lado a otro, generalmente para que cambie de lugar en la suerte de varas.

Pedro se abanica descuidadamente con su sombrero de copa y muestra un gesto de aburrimiento.


5. Next Steps

At this stage, the output XML meets our current requirements, retaining both human readability and scope for further machine processing, and we can turn our attention to creating RDF. Because we provided for this during data modelling and developed the XML Schema and ontology in parallel, the XML readily converts into our RDF model. Transformation is as straightforward as writing some simple XSLT and adding a step to the XProc pipeline.

Once the build process generates manifold-dictionary RDF, we use that to populate a triplestore, which we can query for custom, multi-source, multilingual data-sets. For example, the following SPARQL query extracts English musical terms and their synonyms, along with their Spanish translations.

Example 6. SPARQL query to extract English and Spanish musical terms

  SELECT ?txt ?tran ?syntxt ?syntran
  WHERE {
    ?lemma rdfs:label ?txt ;
           :hasLanguage lang:en ;
           :hasSense ?sense .
    OPTIONAL { ?lemma :hasEtymology ?et .
               ?et :originatesFrom ?etl ;
                   :hasDateOfOrigin ?etd . }
    ?sense :hasSynonym ?syn ;
           :hasDomain [ rdfs:subClassOf domain:music ] ;
           :hasTranslation ?t .
    ?t rdfs:label ?tran ;
       :hasLanguage lang:es .
    ?syn rdfs:label ?syntxt ;
         :hasSense ?synsense .
    ?synsense :hasTranslation ?synt ;
              :hasDomain [ rdfs:subClassOf domain:music ] .
    ?synt rdfs:label ?syntran ;
          :hasLanguage lang:es .
  }

In plain English, the query finds English lemmas that have a sense in the domain of music (or a subclass thereof); that sense has a synonym and a Spanish translation. The synonym lemma must also have a sense in the domain of music as well as a Spanish translation.

Using the SgVizler Javascript libraries for visualising SPARQL query results [13], we can create an interactive graph showing the terms coloured by date of origin:

Figure 4. Graph of musical terms in English and Spanish


Figure 5. Inference via transitive property

The figure depicts lemma_1 linked to lemma_2, and lemma_2 linked to lemma_3, by hasCrossReference, with an inferred hasCrossReference relationship from lemma_1 to lemma_3.

Having the data in RDF and OWL also facilitates the discovery of new relationships which were not explicitly stated in the original XML data. This can be achieved through inference by using a reasoner or a rule language. For example, consider the transitive property depicted in Figure 5, "Inference via transitive property". By modelling the property hasCrossReference using owl:TransitiveProperty, a reasoner can infer a new relationship between lemma_1 and lemma_3 which was not present before.

More complex inference statements can be enforced using a rule language such as SWRL [14]. For example, a rule for inferring new antonyms could be expressed in SWRL as follows (the rule has been simplified for the sake of clarity):

  Lemma(?x), Lemma(?y), hasAntonym(?x, ?y), hasSynonym(?y, ?z) -> hasAntonym(?x, ?z)

6. Conclusion

Our move towards a simpler, more consistent and harmonised XML data model for our dictionaries allows us to generate rich RDF representations of lexical data with a relatively straightforward transformation process. This is still very much an experimental approach, but it allows us to explore and visualise our lexical data across domains and languages in ways that were previously impossible when the content was held in separate silos in inconsistent XML representations.

[1] The Oxford English Dictionary. Oxford University Press. http://www.oed.com
[2] Oxford Dictionaries. Oxford University Press. http://www.oxforddictionaries.com
[3] Oxford Global Language Solutions. Oxford University Press. http://www.oup.com/ogls
[4] Schematron. ISO/IEC. http://www.schematron.com
[5] Resource Description Framework. World Wide Web Consortium. http://www.w3.org/standards/techs/rdf
[6] OWL 2. World Wide Web Consortium. http://www.w3.org/TR/owl2-overview/
[7] XProc: An XML Pipeline Language. World Wide Web Consortium. http://www.w3.org/TR/xproc
[8] XSpec - BDD Framework for XSLT. Jeni Tennison. http://code.google.com/p/xspec
[9] Jenkins. Jenkins CI. http://jenkins-ci.org
[10] Apache Subversion. Apache Software Foundation. http://subversion.apache.org
[11] The Apache Ant Project. Apache Software Foundation. http://ant.apache.org
[12] CURIE Syntax 1.0: A syntax for expressing compact URIs. World Wide Web Consortium. http://www.w3.org/TR/curie
[13] Sgvizler. World Wide Web Consortium. http://www.w3.org/2001/sw/wiki/Sgvizler
[14] SWRL: A Semantic Web Rule Language Combining OWL and RuleML. World Wide Web Consortium. http://www.w3.org/Submission/SWRL

XML Processing in Scala

Dino Fancellu Felstar Ltd

William Narmontas Apt Elements Ltd

Abstract

Scala is an established static- and strongly-typed functional and object-oriented scalable programming language for the JVM with seamless Java interoperation. Scala and its ecosystem are used at LinkedIn, Twitter and Morgan Stanley, among many companies demanding remarkable time to market, robustness, high performance and scalability.

This paper shows you Scala's strong native XML support, powerful XQuery-like constructs, hybrid processing via XQuery for Scala, and increased XML processing performance. You will learn how you can benefit from Scala's practicality in a commercial setting, ultimately increasing your productivity.

Keywords: Scala, XML, XQuery, XSLT, XQJ, Java, Processing

1. Introduction

Programming style: Scala's immutability, functional programming and first-class XML make it rather similar to XQuery. Scala's for-comprehensions were inspired by Philip Wadler from his work with XQuery. [1]

Ecosystem: Scala's seamless Java interoperation gives you access to all of Java's libraries, the JVM [2] and many outstanding Scala libraries 1 2 3.

Scalability: Scala's scalability and design negate the need for design patterns in solving a language's design flaws. It is everything that Java should have been.

XML handling: Scala's XML handling includes the standard XML types such as Element, Attribute and Node. It also includes the NodeSeq type, which extends Seq[Node] (a sequence of nodes), meaning that all of Scala's collections functionality for sequences is available for XML types. The key Scala XML documentation can be found in its author Burak Emir's Scala XML book [3], the scala.xml API 4 and the scala-xml GitHub repository 5.

2. Five minutes to understanding Scala

This paper covers a relevant selection of Scala's capabilities. There are many great resources to learn about traits, partial functions, case classes, etc. We will cover the necessary essentials for this paper. See the Scala crash course [4] and a selected presentation [5] for detailed walk-throughs.

Like with XQuery and other functional programming languages, we recommend programming Scala in an immutable fashion, although Scala allows you to program in an Object Oriented fashion or a hybrid of the two, making it especially suited to migrating from a Java code base.

Scala's types are static, strong and mostly inferred, to the extent that it can feel like a scripting language [6]. Your IDE and Scala's compiler will inform you of your program's correctness very early on, including XML well-formedness.

Scala's 'implicits' enable you to define new methods on values in a limited scope. With implicits and type inference your code becomes very compact [7] [8]. In fact, this paper displays types only for the sake of clarity.

1 ScalaTest - http://scalatest.org 2 Akka - http://akka.io/ 3 Play Framework - http://www.playframework.com/ 4 scala.xml API - http://www.scala-lang.org/files/archive/nightly/docs/xml/ 5 scala-xml GitHub repository - https://github.com/scala/scala-xml/


Scala is about expressions, not statements. The last expression in a block of expressions is the return value. The same applies to if-statements and try-catch.

Scala is best used from within IntelliJ IDEA and Eclipse with the Scala IDE plug-in. [9]

2.1. Values and functions

Scala & XQuery:
• def fun(params): type is similar to declare function local:fun(params): type
• val xyz = {expression} is similar to let $xyz := {expression}

Functions can be passed around easily. Example:

  def incrementedByOne(x: Int) = x + 1
  (1 to 5).map(incrementedByOne)

  Vector(2, 3, 4, 5, 6)

This example, however, can be slimmed down to

  (1 to 5).map(x => x + 1)

  Vector(2, 3, 4, 5, 6)

where x => x + 1 is an anonymous (lambda) function. It can be slimmed down further to

  (1 to 5).map(_+1)

  Vector(2, 3, 4, 5, 6)

Scala's collections, such as lists, sets and maps, come in mutable and immutable flavours [10]. They will be used throughout the examples.

2.2. Strings and string interpolation

The triple double-quote syntax negates escaping of double-quotes in string literals. E.g.

  val title = """An introduction to "Scala""""

Scala supports string interpolation [11] similar to that in PHP, Perl and CoffeeScript - with the 's' modifier:

  val language = "Scala"
  val interpolatedTitle =
    s"""An introduction to "$language""""

String interpolation turns $language into ${language.toString}. Scala's triple-quoted strings may be multi-line, as shown in the examples section.

2.3. Named parameters

Where further clarity for method calls is needed, you can use named parameters (the anchor markup in the original listing is reconstructed here):

  def makeLink(url: String, text: String) =
    s"""<a href="$url">$text</a>"""
  makeLink(text = "XML London 2014",
    url = "http://www.xmllondon.com/")

  <a href="http://www.xmllondon.com/">XML London 2014</a>

2.4. For-comprehensions

For-comprehensions [12] will be familiar to a programmer who has used Python, LINQ, XQuery, Ruby, Haskell, F#, Erlang or Clojure. You can rewrite the previous example

  (1 to 5).map(x => x + 1)

as a for-comprehension:

  for ( x <- (1 to 5) ) yield x + 1

  Vector(2, 3, 4, 5, 6)

These comprehensions yield results by iterating over multiple collections:

  val software = Map(
    "Browser" -> Set("Firefox", "Chrome", ""),
    "Office Suite" -> Set(
      "Google Drive", "Microsoft Office", "Libre Office")
  )
  for {
    (softwareKind, programs) <- software
    program <- programs if program endsWith "e"
  } yield s"$softwareKind: $program"

  List(Browser: Chrome, Office Suite: Google Drive,
   Office Suite: Microsoft Office, Office Suite: Libre Office)

Inside a for-comprehension, Scala and XQuery once again share similarities:
• x <- {expression} is similar to for $x in {expression}
• if {condition} is similar to where {condition}
• abc = {expression} is similar to let $abc := {expression}
• yield {expression} is similar to return {expression}


3. Scala's strong native XML When you need XML comments use the ConstructingParser [13] : support scala.xml.parsing.ConstructingParser .fromSource(scala.io.Source.fromString(pun), Unlike in Java, XML is a first class citizen in Scala and preserveWS = true).document().docElem can be used as a native data type. The scala.xml library source code is available on 1 Why do CompSci students need GitHub. glasses? To C#. XML literals can be embedded directly in code with curly braces. 3.2.1. Look ups and XPath alternatives val title = "XML London 2014" val xmlTree =

Welcome to {title}!

Scala has its own XPath-like methods for querying from
XML trees Serializing this XML structure works as expected: val listOfPeople = Fred xmlTree.toString Ron Nigel

Welcome to XML London 2014!

listOfPeople \ "person"
NodeSeq(Fred, These XML literals are checked for well formedness at Ron, Nigel) compile time or even in your IDE reducing errors. Curly braces can be escaped with double braces. e.g. Wildcard is similar val squiggles = I like {{squiggles}} listOfPeople \ "_"

I like {squiggles} NodeSeq(Fred, Ron, Nigel) 3.2. Reading Looking for descendants Scala can load XML from Java’s File, InputStream, val fact = Reader, String using the scala.xml.XML object. Here is A = A String an XML document in form: fact \\ "variable" val pun = NodeSeq(A, """ A) | Why do CompSci students need |glasses? Querying attributes is similar | To C#. |""".stripMargin : scala.xml.NodeSeq = universal Loading an XML document from a String gives us a fact \@ "type" node: : String = universal scala.xml.XML.loadString(pun)

Why do CompSci students need glasses? To C#.

1 scala-xml library GitHub - https://github.com/scala/scala-xml/


Looking up elements by namespace (see Appendix A, The The default apply method for NodeSeq is an alias for showNamespace(-s) methods for showNamespaces): filter: val tree = numbers(_.text.toInt > 6) NodeSeq(7, 8, 9, 10) numbers maxBy(_.text) Untitled 9 numbers maxBy(_.text.toInt) 10 numbers.reverse (tree \\ "_"). filter(_.namespace == "urn:test:referencing"). NodeSeq(10, 9, map(showNamespace).foreach(println) 8, 7, 6, 5, {urn:test:referencing}referenced 4, 3, {urn:test:referencing}metadata 2, 1) Looking up attributes by namespace: numbers.groupBy(_.text.toInt % 3) \ "@demo" Map( 2 -> NodeSeq(2, test 5, 8), \ 1 -> NodeSeq(1, "@{urn:meta}demo" 4, 7, 10), test 0 -> NodeSeq(3, 6, 9)) The reason that backslashes were chosen instead of the usual forward slashes is due to the use of // for Scala val jokes = // comments. i.e. the would never even be seen. Q: Why did the functions stop Scala's XML is displayed as a NodeSeq type which calling each other? extends Seq[Node]. This means we get Scala's collections A: Because they had constant for free. Here are some examples: arguments. val root = {for {i <- 1 to 10} yield Why do {i}} CompSci students need glasses? To C#. 1 numbers.head Querying descendant attributes works as expected 1 jokes \\ "@rating" numbers.last NodeSeq(fine, extreme) 10 Querying elements by path works fine numbers take 3 jokes \ "pun" \ "question" NodeSeq(1, 2, 3) NodeSeq(Q: Why did the functions stop calling each other?, Why do numbers filter(_.text.toInt > 6) CompSci students need glasses?) NodeSeq(7, 8, 9, 10)


Querying attributes by path:

jokes \ "pun" flatMap (_ \ "@rating")

  NodeSeq(fine, extreme)

(jokes \ "pun") \\ "@rating"

  NodeSeq(fine, extreme)

However node equality can surprise with XML literals¹:

{2} == 2

  false

{2} == {2}

  true

3.3. Scala XML namespace handling

Namespaces are handled well. The empty namespace is 'null' (see Appendix A, The showNamespace(-s) methods, for showNamespaces):

val tree = Untitled

showNamespaces(tree)

  {null}document
  {urn:test:embedding}embedded
  {urn:test:embedding}description
  {urn:test:referencing}referenced
  {urn:test:referencing}metadata
  {null}title

3.3.2. XQS

We use a tiny wrapper library called XQS (XQuery for Scala) [14] in various places throughout this paper. Its main aim is to allow for Scala metaphors when using XQuery. However, even outside of XQuery usage, it allows for easy interoperation between the worlds of Scala XML and Java DOM. For example, in the XPath example below, it supplies toDom to turn Scala XML into a W3C DOM, and the ability to turn a NodeSet into a Scala NodeSeq.

3.3.3. Using XPath from Scala

import com.felstar.xqs.XQS._

val widgets = Menu Status bar Panel Panel

val xpath = XPathFactory.newInstance().newXPath()
val nodes: NodeSeq = xpath.evaluate(
  "/widgets/widget[not(@id)]", toDom(widgets),
  XPathConstants.NODESET
).asInstanceOf[NodeList]

nodes

  NodeSeq(Menu, Status bar)

Natively in Scala:

(widgets \ "widget")(widget => (widget \ "@id").isEmpty)

  NodeSeq(Menu, Status bar)

3.3.1. Scala XML is unidirectional and immutable

Unlike the XPath model, Scala XML is unidirectional: a node does not know its parent, so there are no reverse axes, and no forward/sibling axes either. This design was chosen because adding parent references is expensive while maintaining immutability. For many problem spaces that may not matter; if it does matter for you, you are free to fall back to the full XPath/XQuery/XSLT model, as shown below.
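As a minimal illustrative sketch (our own example, not from the library documentation) of what the missing parent axis means in practice, the only way to recover a node's parent is to query again from a root you still hold:

import scala.xml._

val doc = XML.loadString("<a><b><c/></b></a>")
val c = (doc \\ "c").head                            // c carries no reference back to its parent <b>
val parentOfC = (doc \\ "_").filter(e => (e \ "c").nonEmpty)
// parentOfC is NodeSeq(<b><c/></b>): found by searching down from the root, not via a reverse axis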

1 https://github.com/scala/scala-xml/issues/25


3.3.4. XML Transformations

Scala provides XML transformation functionality via a RuleTransformer that takes multiple RewriteRules.

val peopleXml = Hello, John. Smith is here. Hello.

val rewrite = new RuleTransformer(new RewriteRule {
  override def transform(node: Node) = node match {
    case {_} => Hello, John.
    case {text} => {text}!!!!
    case n: Elem if n.label != "people" => n.copy(label = "renamed")
    case other => other
  }
})

rewrite.transform(peopleXml)

  Hello, John. Smith is here.!!!! Hello.

Alternatively: calling XSLT from Scala

val stylesheet = ...

import com.felstar.xqs.XQS._

val xmlResultResource = new java.io.StringWriter()
val xmlTransformer = TransformerFactory
  .newInstance().newTransformer(stylesheet)
xmlTransformer.transform(peopleXml,
  new StreamResult(xmlResultResource))
xmlResultResource.getBuffer

  Hello, John. Smith is here. Hello.

We found XSLT more effective than Scala for designing XML transformations, as XSLT has been designed explicitly for this task. Thus we can mix and match, using XSLT where it is nicer than Scala and vice versa. John Snelson's transform.xq showcases mixed transforms with querying in XQuery [15]. The same can be achieved in Scala.


3.3.5. XML Pull Parsing from Scala

// 4GB file, comes back in a second.
val downloadUrl =
  "http://dumps.wikimedia.org" +
  "/enwiki/20140402/enwiki-20140402-abstract.xml"
val src = Source.fromURL(downloadUrl)
val er = XMLInputFactory.newInstance().
  createXMLEventReader(src.reader)

implicit class XMLEventIterator(ev: XMLEventReader)
  extends scala.collection.Iterator[XMLEvent] {
  def hasNext = ev.hasNext
  def next = ev.nextEvent()
}

er.dropWhile(!_.isStartElement).take(10)
  .zipWithIndex.foreach {
    case (ev, idx) => println(s"${idx+1}:\t$ev")
  }

src.close()

  1:
  2:
  3:
  4:
  5:
  6: Wikipedia: Anarchism
  7:
  8:
  9:
  10: http://en.wikipedia.org/wiki/Anarchism

3.3.6. Calling XQuery from Scala

The standard API for XQuery on Java is XQJ [16]. XQJ drivers are available for several databases such as MarkLogic and XQuery processors such as Saxon [17] [18], meaning Scala can consume XQuery result sets.

import com.felstar.xqs.XQS._

val conn = getYourXQueryConnection()
val ret: NodeSeq = conn(
  "/widgets/widget[not(@id)]", widgets)

  NodeSeq(Menu, Status bar)

val ret2: NodeSeq = conn(
  """|for $w in /widgets/widget
     |order by $w return $w""".stripMargin, widgets)

  NodeSeq(Menu, Panel, Panel, Status bar)

A pure Scala version:

(widgets \ "widget").sortBy(_.text)

4. Extensibility

Using Scala's "implicits" you can enrich types by adding new functionality.

val oo = 123 1234 xxxxx 1235

Treating attribute values, which are strings, as doubles, implicitly when needed, and without any NumberFormatExceptions. This uses the scala.util.Try class, which wraps exceptions in a functional manner:

implicit def toSafeDouble(st: String) = scala.util.Try{st.toDouble}.getOrElse(Double.NaN)

(oo \ "x").filter( _ \@ "id" < 3)

NodeSeq(123, 1234)

(oo \\ "@id").map(_.text: Double). filterNot(_.isNaN).sum

6.0


Here are some examples of selecting multiple items according to their index:

val root = a (0) b (1) c (2) d (3) e (4) f (5) g (6) h (7) i (8)

val nodes = (root \ "node")

implicit class indexFunctionality(ns: NodeSeq) {
  def filterByIndex(p: Int => Boolean): NodeSeq =
    ns.zipWithIndex.collect {
      case (value, index) if p(index) => value
    }
  def filterByIndex(b: GenSeq[Int]*): NodeSeq =
    filterByIndex(b.flatten.toSet)
  def apply(n1: Int, n2: Int*) =
    ns(n1) ++ ns.filterByIndex(n2.toSet)
  def apply(b: GenSeq[Int]*): NodeSeq =
    filterByIndex(b.flatten.toSet)
}

nodes.filterByIndex(_ > 6)

  NodeSeq(h (7), i (8))

nodes(0, 4, 7)

  NodeSeq(a (0), e (4), h (7))

nodes(1 to 3)

  NodeSeq(b (1), c (2), d (3))

nodes(1 to 3, 5 until 7)

  NodeSeq(b (1), c (2), d (3), f (5), g (6))

Note that root can alternatively be generated using:

val root = {('a' to 'i').zipWithIndex.map{ case (letter, index) => {letter} ({index}) }}

This is how we look up elements by namespace. You can see how extensible Scala becomes (using Appendix A, The showNamespace(-s) methods):

val tree = Untitled

implicit class nsElement(nodeSeq: NodeSeq) {
  val regex = """^\{(.+)\}(.+)$""".r
  def \\#(path: String): NodeSeq = {
    val regex(namespace, el) = path
    for {
      node <- nodeSeq \\ el
      if node.namespace == namespace
    } yield node
  }
  def \#(path: String): NodeSeq = {
    val regex(namespace, el) = path
    for {
      node <- nodeSeq \ el
      if node.namespace == namespace
    } yield node
  }
}

(tree \\# "{urn:test:referencing}_").
  map(showNamespace).mkString("\n")

  {urn:test:referencing}referenced
  {urn:test:referencing}metadata


4.1. Further Extensibility: XQuery-like constructs

Here we implement the XQuery 3.0 use case Q4 Part 3 [19].

XQuery code:

{
  for $store in /root/*/store
  let $state := $store/state
  group by $state
  order by $state
  return
    {
      for $product in /root/*/product
      let $category := $product/category
      group by $category
      order by $category
      return
        {
          for $sales in /root/*/record[
            store-number = $store/store-number and
            product-name = $product/name]
          let $pname := $sales/product-name
          group by $pname
          order by $pname
          return
        }
    }
}

Scala code:

def loadXML(ref: String) = {
  val filename = s"benchmarks-xml/$ref"
  val file = new File(filename)
  scala.xml.XML.loadFile(file)
}

val allStores =
  loadXML("stores.xml") \ "store" groupByOrderBy "state"
val allProducts =
  loadXML("products.xml") \ "product" groupByOrderBy "category"
val allRecords =
  loadXML("sales-records.xml") \ "record" groupByText "product-name"

{
  for {
    (state, stateStores) <- allStores
    storeNumbers = (stateStores \ "store-number").textSet
  } yield {
    for {
      (category, products) <- allProducts
      productRecords = allRecords.filterKeys{(products \ "name").textSet}
    } yield {
      for {
        (productName, productSales) <- productRecords
        filteredSales = productSales.filter(n => storeNumbers(n \ "store-number" text))
        if filteredSales.nonEmpty
        totalQty = (filteredSales \ "qty").map(_.text.toInt).sum
      } yield
    }
  }
}

An extensibility class used is attached in Appendix B, Extensions for NodeSeq.


5. Performance vs XQuery

5.1. Assumptions

Core i7-3820 @ 3.6 GHz, 4 cores, Windows 7 Professional 64 bit, 16 GB RAM, Java 7 u51 64 bit, default JVM settings. Scala 2.11, XMLUnit, XQJ interfaces, XQS Scala bindings, and 2 XQuery implementations, A and B. Sources are located on GitHub [20].

5.2. Methodology

We use prepared statements for the XQuery. This can be switched off, but implementation B's performance drops like a stone without it, which is not really fair, so prepared statements are turned on. Scala has no concept of these, as there is nothing to prepare or parse. We also cached the conversion of XML to a DOMSource for the XQuery, so we do not measure that effort when timing the XQueries. A switch was put in to serialize results to a string, so as to ensure that any potential lazy values are materialized. We selected various queries from [21] and also an XQuery 3.0 example from [19].

Each benchmark runs both XQuery and Scala in a single run: 3 runs of 10,000 queries, with the results of the first 2 runs thrown away to get a good JVM JIT warmup. It is very easy to get misleading results from badly thought out benchmarks; warm-up is very important, as the JVM runs best when code is hotspotted. For each query we emit the XQuery time, the Scala time, and the ratio of these times, XQuery:Scala. We plot a graph of these values, showing the first 2 values as bars and the ratio as a line.

5.3. Benchmarks

Table 1. Impl A XQuery vs Scala

Query    Ratio   XQuery   Scala
Q1       38.27   2449     64
Q2       29.89   2600     87
Q3       33      2574     78
Q4       7.75    3325     429
Q5       23.91   3372     141
Q6       40.1    2927     73
Q7       17.86   2590     145
Q9       32.04   1602     50
Q10      12.48   2994     240
Q11      26.86   2847     106
Q4_3.0   6.89    4921     714

Impl A XQuery vs Scala (Prep Statements, serialized)

Table 2. Impl B XQuery vs Scala

Query    Ratio   XQuery   Scala
Q1       9.57    603      63
Q2       7.11    619      87
Q3       7.4     592      80
Q4       1.74    750      430
Q5       5.44    816      150
Q6       8.26    611      74
Q7       3.89    579      149
Q9       4.59    225      49
Q10      2.21    478      216
Q11      6.33    658      104
Q4_3.0   2.49    1715     688

Impl B XQuery vs Scala (Prep Statements, serialized)
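A minimal sketch of the warm-up-then-measure pattern described above (our own illustration; the batch sizes and millisecond conversion are assumptions, not the actual benchmark sources from [20]):

def timeMs(body: => Unit): Long = {
  val start = System.nanoTime()
  body
  (System.nanoTime() - start) / 1000000
}

// Run several batches of the same query and keep only the last batch,
// so the JIT has had a chance to hotspot the code before we measure.
def benchmark(label: String, batches: Int = 3, perBatch: Int = 10000)(query: => Unit): Long = {
  val timings = (1 to batches).map(_ => timeMs((1 to perBatch).foreach(_ => query)))
  val measured = timings.last
  println(s"$label: $measured ms")
  measured
}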


5.4. Conclusions

Scala is faster in all these use cases, and very similar to XQuery in its language construction. No doubt there are use cases where XQuery may be better, such as an XML database. This is not black or white, nor a religious issue; it is simply a matter of choice.

6. Practicality

6.1. Enterprise usage

Scala is well established in enterprises [22]. Having access to the JVM, Scala makes it easy to reuse the solid and tested libraries of the JVM ecosystem as well as an enterprise's legacy Java code [23]. Scala's terseness makes domain modelling much more precise [24]. An enterprise can migrate slowly towards all-Scala. The amount of code to maintain decreases, so the number of moving parts decreases.

6.2. ScalaTest

With ScalaTest you can test your whole domain with property-based testing, and ensure that the parties you are dealing with understand what your XML processing code does. Again, whether your XML processing code is inside Scala, XSLT, XQuery or MarkLogic makes no difference. XMLUnit works nicely with Scala.

6.3. Other integration features

Scala 2.11 makes itself available as a scripting language via JSR-223 [25]. Scala's Akka¹ and the Play framework² provide many integration features with the rest of the world, including JSON³ and WebSockets⁴. With macros you can create programs that create programs, meaning your language does not get in your way with 'design patterns' when you are focusing on the problem you are trying to solve. This includes creating bindings such as serializers and deserializers for your favourite formats (e.g. binary via Scala Pickling⁵, JSON via json4s⁶). We would like to see more research in querying with Scala, such as Fatemeh Borran-Dejnabadi's paper [26].

7. Conclusions

The possibilities of using Scala for XML processing are almost limitless. Pick and mix how you want to process your XML in Scala: powerful collections methods, for-comprehensions, XML generation, XPath, XSLT, XML databases and XQuery engines via XQS/XQJ, XML streaming via StAX. Scala makes it possible to simplify complex logic into domain specific programs and use a combination of the best tools for achieving your targets. As Java has not advanced as far in terms of the language, Scala has secured the niche of the effective programmer and the effective business. For you as a functional programmer Scala's concepts will already be familiar. You lose none of your existing Java ecosystem and gain so much more. It is another important tool in your armoury for efficient and lucid data processing.
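Returning to the JSR-223 integration mentioned in Section 6.3, a minimal usage sketch (ours, assuming the Scala script engine and the scala-xml module are on the classpath and the engine is registered under the name "scala"):

import javax.script.ScriptEngineManager

// Obtain the Scala interpreter through the standard JSR-223 API.
val engine = new ScriptEngineManager().getEngineByName("scala")

// Any Scala expression, including XML literals and queries, can be evaluated as a script.
val result = engine.eval("""(<people><person>Fred</person></people> \ "person").text""")
println(result)  // Fred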

1 Akka - http://akka.io/
2 Play framework - http://www.playframework.com/
3 http://www.playframework.com/documentation/2.2.x/ScalaJson
4 Play Framework documentation, WebSockets in Scala guide - http://www.playframework.com/documentation/2.2.x/ScalaWebSockets
5 http://lampwww.epfl.ch/~hmiller/pickling/
6 http://json4s.org/


Bibliography

[1] Martin Odersky on the Future of Scala (25:00). Sadek Drobi and Martin Odersky. InfoQ. http://www.infoq.com/interviews/martin-odersky-scala-future
[2] What is Scala? Seamless Java interop. Martin Odersky. http://www.scala-lang.org/what-is-scala.html#seamless_java_interop
[3] Scala XML Book. Burak Emir. https://sites.google.com/site/burakemir/scalaxbook.docbk.html?attredirects=0
[4] Scala Crash Course. Ravi Chugh. University of California, San Diego. February 20, 2014. http://cseweb.ucsd.edu/classes/wi14/cse130-a/lectures/scala/00-crash.html
[5] Scala - The Short Introduction. Jerzy Müller. http://scalacamp.pl/intro/#/start
[6] Scala: The Static Language that Feels Dynamic. Bruce Eckel. Artima, Inc. June 12, 2011. http://www.artima.com/weblogs/viewpost.jsp?thread=328540
[7] Implicit classes overview, Scala documentation. http://docs.scala-lang.org/overviews/core/implicit-classes.html
[8] Pimp my Library. Martin Odersky. Artima, Inc. October 9, 2006. http://www.artima.com/weblogs/viewpost.jsp?thread=179766
[9] Scala: Which is the best IDE for Scala Development?. Nadav Samet. Quora. January 13, 2014. http://www.quora.com/Scala/Which-is-the-best-IDE-for-Scala-Development/answer/Nadav-Samet-1
[10] Scala Collections overview. scala-lang.org. http://docs.scala-lang.org/overviews/collections/overview.html
[11] String Interpolation. Josh Suereth. scala-lang.org. http://docs.scala-lang.org/overviews/core/string-interpolation.html
[12] Iteration & Recursion - Scala crash course. Ravi Chugh. University of California, San Diego. February 27, 2014. http://cseweb.ucsd.edu/classes/wi14/cse130-a/lectures/scala/01-iterators.slides.html
[13] scala.xml.parsing.ConstructingParser. scala-lang.org. http://www.scala-lang.org/files/archive/nightly/docs/xml/#scala.xml.parsing.ConstructingParser
[14] XQuery for Scala. Dino Fancellu. https://github.com/fancellu/xqs
[15] Transform.xq: A transformation library for XQuery 3.0. John Snelson. XML Prague 2012. http://archive.xmlprague.cz/2012/files/xmlprague-2012-proceedings.pdf
[16] JSR 225: XQuery API for Java (XQJ). Maxim Orgiyan and Marc Van Cappellen. https://jcp.org/en/jsr/detail?id=225
[17] XQJ.NET. Charles Foster. http://xqj.net/
[18] XQuery API for Java. http://en.wikipedia.org/wiki/XQuery_API_for_Java
[19] XQuery 3.0 Use Cases - Group By Q4. W3C Working Group. http://www.w3.org/TR/xquery-30-use-cases/#groupby_q4
[20] Benchmark sources. Dino Fancellu. https://github.com/ScalaWilliam/XMLLondon2014/
[21] XML Query Use Cases. W3C Working Group. March 23, 2007. http://www.w3.org/TR/xquery-use-cases/
[22] Case Studies & Stories. Typesafe, Inc. https://typesafe.com/company/casestudies
[23] The Guardian case study. Typesafe, Inc. http://downloads.typesafe.com/website/casestudies/The-Guardian-Case-Study-v1.1.pdf
[24] Implementing a DSL for Social Modeling: an Embedded Approach Using Scala. Jesús López González. Juan Manuel. October 13, 2013. http://www.infoq.com/presentations/speech-dsl-social-process
[25] SI-874 JSR-223 compliance for the interpreter. Adriaan Moors. https://github.com/scala/scala/pull/2238
[26] Efficient Semi-structured Queries in Scala using XQuery Shipping. Fatemeh Borran-Dejnabadi. February 2006. http://infoscience.epfl.ch/record/85493/files/Scala_XQuery.pdf


A. The showNamespace(-s) methods

def showNamespace(node: Node) =
  s"{${node.namespace}}${node.label}"

def showNamespaces(ofTree: NodeSeq) =
  (ofTree \\ "_").map(showNamespace).mkString("\n")

B. Extensions for NodeSeq

val XQueryEquals = new Equiv[NodeSeq] {
  def equiv(a: NodeSeq, b: NodeSeq): Boolean = {
    (a.map(_.text).toSet &
      b.map(_.text).toSet).size > 0
  }
}

implicit class nsExtensions(nodeSeq: NodeSeq) {

def ===(b: NodeSeq): Boolean = XQueryEquals.equiv(nodeSeq, b)

def ===(b: GenTraversable[String]): Boolean = b.exists(text => XQueryEquals.equiv(nodeSeq, Text(text)))

def ===(hasValue: String): Boolean = nodeSeq.text == hasValue

def sortedByText: NodeSeq = nodeSeq.sortBy(_.text)

def sortedText: Seq[String] = nodeSeq.map(_.text).sorted

def sortedByLookup(ofPath: String): NodeSeq = nodeSeq.sortBy(node => (node \ ofPath).text)

def groupByText(ofPath: String) : Map[String, NodeSeq] = nodeSeq.groupBy(node => (node \ ofPath).text)

def groupByOrderBy(ofPath: String) : List[(String, NodeSeq)] = groupByText(ofPath).toList.sortBy(_._1)

def textSet: Set[String] = nodeSeq.map(_.text).toSet

}
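A brief usage sketch of these extensions (our own example data, not from the paper), assuming the implicit class above is in scope:

val people = <people><person>Ann</person><person>Bo</person></people>

(people \ "person").sortedText              // List(Ann, Bo)
(people \ "person").textSet                 // Set(Ann, Bo)
(people \ "person") === "Ann"               // false: compares the concatenated text "AnnBo"
(people \ "person") === Seq("Ann", "Zoe")   // true: at least one text value is shared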

XML Authoring On Mobile Devices

George Bina Syncro Soft / oXygen XML Editor

Abstract

Not too long ago XML-born content was not present in a mobile-friendly form on mobile devices. Now, many of the XML frameworks like DocBook, DITA and TEI provide output formats that are tuned to be used on mobile devices. These are either different electronic book formats (EPUB, Kindle) or different mobile-friendly web formats.

Many people find XML authoring difficult on computers, let alone mobile devices. However, the constantly increasing number of mobile devices, the same trend that made people create mobile-friendly output formats from XML documents, means there is clearly a need to also provide direct access to authoring XML content on these devices. I would like to explore the options for providing XML authoring on mobile devices and describe our current work and the technology choices we made to create an authoring solution for mobile devices.

Trying to enable people to create XML documents on mobile devices is very exciting, mainly because the user interaction is completely different on a mobile device: different screen resolutions, different interaction methods (touch, swipe, pinch), etc. See how we imagined XML authoring on an Android phone or on an iPad! How about editing XML on a smart TV? Leverage the speech recognition/dictation and handwriting recognition technologies that are available on mobile devices to enable completely new ways of interacting with XML documents!

Keywords: XML, authoring, mobile, review, user experience

1. Introduction

When an XML-based solution is implemented, the lowest impedance for communicating between different processing steps is to use XML, so people try to use XML in as many places as possible. But there are usually a few processes where using XML is not always easy. One of these processes is the review of XML content. Another is the contribution of initial content from people that are not familiar with XML.

A traditional review process will convert the XML information to an output format, usually PDF, and have reviewers annotate that PDF with comments, then align the PDF with the XML documents that generated it to identify the places in the XML source the comments refer to, and manually act on those comments to make the corresponding changes to the XML documents.

Figure 1. Traditional review process

This process has many steps and some of them are not automated, so they not only consume time but errors can appear at different stages. Many of the issues can be solved by adopting a direct XML review process, where users can annotate directly on the XML content, and not only add comments but also make changes to the document that will be considered proposed changes. Thus, responding to a comment by identifying the XML source the comment refers to and then updating the document as described in the comment can be replaced with a simple action to accept a proposed change to the document.

Figure 2. Direct XML review process


Contributing initial content is very much linked to the tools the users already know and the devices they have access to. Thus initial content is contributed in whatever format the users can use and then converted to XML to be able to enter the XML-based solution. Usually people use Word, and there is a conversion process that tries to get from Word to XML. We can cut the conversion cost if we are able to get this initial content in XML form.

Figure 3. Non XML data conversion to XML vs XML first

When you move to an XML-based solution it is important to be able to cut costs on the review process and to implement an XML-first system, where people can contribute initial data directly in XML. We tried to address these problems and we currently provide solutions for both the review process and for creating an XML-first solution. However, the current solution requires the use of a laptop or a desktop computer.

The people that perform reviews or the ones that contribute initial content are in general external to the department that deals with the XML-based solution, so it is difficult to control the resources available to these people. The increasing use of mobile devices during the last years and the projections for the next years show that mobile devices are not something we can ignore (mobile devices are expected to exceed the number of desktops this year), and the only device some of those people have may be a mobile one. So, if we want to be able to cut the costs and the complexity of processes similar to the ones described, we need to be able to provide at least direct XML review and simplified XML authoring on mobile devices.

2. Technology choices

Once we decided to start building a tool for XML authoring on mobile devices, the next step was to decide what technologies it would be based on. The first decision was whether it would be a native application or a web application.

From our experience with oXygen we found that it is great to be able to support multiple platforms with the same code - oXygen, being built in Java, works on any platform that provides a JVM. So one problem with a native solution was that we would have to build a different application for each mobile platform, while a web application allows us to reuse the same code for all devices, as long as they support the required web technology (HTML5 and JavaScript). A web application also has other advantages over a native application, like immediate update, no app-store interference and, an important one, the fact that it will also work on desktops. A disadvantage is that access to device-specific functionality will not be possible, but it should be possible to use a hybrid application if such functionality becomes critical in the future.

Another decision point was how much processing should be done on the client and how much on the server. Targeting mobile devices, we wanted as little processing as possible on the client, in order not to drain the device battery. This factor, and the fact that we already have in Java many of the components needed for XML authoring, made us decide to prefer server-side processing over client processing and keep the current oXygen on the server, with only the display part on the client side, like a remote display, thus reusing almost all of the existing components and technology stack.

We experimented with different approaches for rendering XML in the web application, including:

1. Placing XML directly inside an HTML document and rendering it through the same CSS that we use now
2. Using the Canvas to display the rendered XML document, similar to how it is done inside oXygen, using a CSS parser that will provide the rendering styles for each element
3. Rendering/converting the XML as/to HTML and converting the CSS used for XML to match the converted HTML format

In the first case we hit limitations in the browser support for CSS that made it impossible to use this approach. For example, browsers do not support the CSS attr/2 function as specified in CSS3, where along with the first parameter that specifies the attribute name you can also specify a second parameter that represents the attribute value type. This is used in oXygen to specify that an attribute value is a URI and should be a link.

In the second case we implemented all the rendering primitives that are used in oXygen (we have a Graphics interface that is used for rendering the XML documents, and the methods from this interface were also implemented on top of the HTML5 Canvas), but then we also needed the CSS parser, the layout engine, caret management, etc., which were not easy tasks.


In the 3rd approach we converted the XML document to HTML5 and then modified the CSS that matched on XML to match on the converted HTML5 structure, to obtain the same rendering as the XML+CSS that we currently use. This allows us to use the browser editing support for HTML to modify the document content.

For mobile interaction we use jQuery Mobile due to existing experience - we use this also for the mobile-friendly WebHelp transformations that we provide for DITA and DocBook. However, other frameworks may be used, as we plan to support multiple templates for the user interface.

3. Web application architecture

Figure 4, "oXygen web application architecture", shows a diagram of the current architecture, outlining how the existing oXygen support is reused on the server side. Three components can be identified:

• oXygen on the server
• The HTML+JavaScript that renders the document on the client side
• Content storage, which can be in the form of a CMS

oXygen on the server is a Java servlet that encapsulates the Java-based oXygen to provide the visual editing support. It reuses the same customizations created for the oXygen desktop that come in the form of frameworks and plugins. This allows, for example, reuse of the editing support for DITA, DocBook, etc. as well as plugins that provide access to remote repositories, like the CMS connector plugins.

The server part will generate HTML5+JavaScript for an XML file that, when rendered, will provide the view for that XML document. The generated HTML5 content keeps XML-related information in data-* attributes. The CSS that matched on XML is automatically converted to match on the generated HTML5 and its data-* attributes that encode the XML information. The conversion from XML to HTML5 uses mainly div elements, but sometimes it also takes advantage of specific HTML elements, like the table element for example.
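A minimal sketch (our own illustration, not oXygen's actual implementation) of the kind of XML-to-HTML5 mapping described above, where each element becomes a div that remembers the original element name in a data-* attribute:

import scala.xml._

// Every element becomes a <div> whose data-xml-name attribute carries the
// original element name; text and other nodes pass through unchanged.
def toHtml5(node: Node): Node = node match {
  case e: Elem => <div data-xml-name={e.label}>{e.child.map(toHtml5)}</div>
  case other   => other
}

// toHtml5(<para>Some <b>text</b></para>) yields
// <div data-xml-name="para">Some <div data-xml-name="b">text</div></div>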

Figure 4. oXygen web application architecture


4. Samples

We will demo XML reviewing functionality, using custom XML interfaces and full XML editing. Here you can see some screenshots taken on an iPad:

Figure 5. A DITA topic

Note the highlights that represent areas with associated comments, as well as the added and deleted content styled with underline and strikeout decorations.

Figure 6. A DocBook article

Here we have a DocBook article rendered through CSS, with different structure like images and lists. Note again the comment highlights and the decorations for changed content.


Figure 7. Review Panel showing all review comments and changes

You can swipe right on the editing area to make the Review Panel visible. When you swipe over a review entry the available actions are displayed, so you can easily act on a review to edit or remove a comment, or accept or reject a change. Swipe left to hide the Review Panel and return to the editor.

Figure 8. A custom XML editing interface

Note the inline action marked with "[+]" that can be used to add a new section to the document.


Figure 9. Enter a date value using the standard iPad date picker

Different form controls can be used to build custom interfaces that provide access to text and attribute values, thus making the editing simpler and removing the need to train users. Each form control will use the native support on each platform, so the user will have the same editing experience he is already used to on that device.

Figure 10. Text editing with changes recorded as tracked-changes

Here you can see the editing mode, where the keyboard is shown and the document contains a caret. The changes in this case are recorded as tracked changes.


Figure 11. Inserting markup

You can insert markup either with the dedicated action or by pressing the enter key on the virtual keyboard. That will show a popup with valid element names, where you can filter to see only the elements that match, then select one to insert into the document.

Figure 12. A DITA topic on a smart TV

The web editing platform works on any device supporting HTML5 and JavaScript; in this case we have it running on a smart TV.

5. Conclusions

XML editing on mobile devices can solve some real use-cases where people that are not XML-aware can contribute XML content using their preferred or available device at that moment. Creating a customized user interface using form controls bound to attribute values and inline actions reduces the training required, sometimes to zero. This is probably the way forward: putting more effort on the developer to customize the user interface so that users do not have to think in terms of XML concepts but can focus on the information they want to record.

There are large costs connected with integrating the reviewers' feedback on some output format back into the XML source, and sometimes that feedback is lost. A solution that records reviewers' feedback directly in the XML documents dramatically reduces the costs and effort and eliminates many steps in the workflow that can introduce errors.

There is a lot of exploration ahead to come up with the best possible user interface that takes advantage of the specific input methods and interaction patterns of mobile devices, and this is just the start.

More generally, the web editing support for XML makes it available on any device, not only on mobile devices, and it will be interesting to see whether we can build other applications based on XML, such as an XML-based blogging system or an XML-based wiki-like system.

Engineering a XML-based Content Hub for Enterprise Publishing

Elias Weingärtner Haufe Group

Christoph Ludwig Haufe Group

Abstract

Being one of the leading publishing houses in the domains of tax, human resources and law in Germany, delivering large amounts of XML-based content to our customers is a vital part of our business at Haufe Group. We currently make use of several legacy and proprietary systems for this purpose. However, recent business needs such as the requirement for flexible transformation or complex structural queries push these systems to both conceptual and technical limits. Along with new business requirements derived from our company's business strategy, we are currently designing a new service that centrally manages our entire document corpus in XML. We term this service "Content Hub". In this paper, we sketch the architecture of this system, discuss important software architectural challenges and illustrate how we are implementing this system using standard XML technology.

1. Introduction

Fifteen years ago, books, newspapers and magazines were still the most prominent media for the presentation of textual content. In 2014, e-book readers, smartphones, tablet PCs and laptops are commonly observed as the major devices used for delivering information in digital form. This radical shift in how we access, read and retrieve content has turned the requirements and processes at media houses upside down.

The delivery of content for business customers in the domains of tax and law has always been one of the core businesses of Haufe Group. Since the 1960s, our company has been distributing loose leaf editions. There is to this day a surprising demand for paper-bound editions. Nevertheless, Haufe reacted to the rise of the WWW and entered the market with web-based content products for professional customers in the late 1990s.

For the purpose of publishing digital content, the Haufe Group has developed a comprehensive and sophisticated information ecosystem. We currently operate a set of custom-built systems that cover all steps ranging from content production based on SGML over content transformation to content presentation.

Even if our systems are stable, there is a core problem with our current information infrastructure: core services such as search, document storage and retrieval as well as content-based authorization are saturated across the entire system landscape. For example, this results in the need to index content at different search engines, in a lot of duplicated content, and partially also in limitations with regard to possible sales models. For these reasons, we are currently rethinking the way we store, manage and retrieve content in general. Speaking of the content body, we now provide around 50 million hypertext documents to our customers, and this amount is steadily increasing.

For these reasons, we are currently designing a new core service that centrally manages the entire document body using a NoSQL XML store. In this paper, we first sketch the requirements of this system (Section 2). In a second step, we briefly describe the future architecture of this system (Section 3) and related technical challenges (Section 5). Possible implementation approaches are sketched in Section 4 before we conclude the paper in Section 6.

2. Requirements

The Content Hub will be a core service in Haufe's future application landscape and as such has to meet many stakeholders' requirements, both functional and non-functional.


2.1. Functional Requirements

The following list can be understood as the set of minimal core functionalities that need to be implemented by our content hub.

• XML and Blob Storage: We envision the content hub to serve as the central store for all of our content. While we currently rely on SGML as the source format for all of our 50 million documents, we already have a working infrastructure for converting these documents to XML. Due to the higher flexibility of XML with regard to transformation and general tooling, we have decided to establish XML as the baseline format for storing our documents in the content hub. Moreover, we also need to store a large amount of complementary content (mostly images, but also software artifacts and audio-visual content). For this reason, it is vital for the content hub to be able to handle not only XML files, but also binary content.

• Flexible Search Capabilities: The most prominent way our customers access content is by performing various search actions. As a matter of fact, the content hub has to support full-text searches across the entire document corpus. Naturally, this search service needs to support different languages and provide all features common to off-the-shelf search engines, like support for facets and indexed meta-data. Beyond these common search functionalities, our products also offer domain-specific query constructs. For example, if a customer enters a query that matches a common citation rule, the corresponding document is returned directly if it is available in the datastore. Hence, the search component of the content hub must be able to check if a query matches a certain pattern, and if so, a specialized query handler must be triggered in order to process the query adequately.

• Semantic Relationships and Inference: Our content is highly structured and we maintain many different types of relationships among our content objects. For example, imagine a law document displayed by a user. This law document may relate to other documents like comments or news articles, and even to seminars offered by an affiliated partner company. Documents are also often not stored as a single file; instead they consist of sub-documents, and we use relations to glue these documents logically together. Hence, it is vital that we can use XML technologies like XPointer, XLink and semantic annotations using RDF to model these semantic relationships in our content. Technology-wise, it is vital that the underlying technology will allow us to traverse these semantic networks efficiently. We are also interested in performing inference tasks in a second step.

• Staging Support: Many of our internal processes require staging of large amounts of content. One example is publishing a new content product, which may consist of many thousand XML files. We need staging support here to facilitate internal review processes before the publication becomes publicly available. The actual publication must be an atomic operation. If the product's publication fails for some reason halfway through, then it must be rolled back completely.

• Flexible Authorization Concept: The Content Hub must restrict access to its documents based on policies. An obvious policy requires that a direct Haufe customer may only read content that is part of a product subscribed to by the customer. This extends to search results as well; a user must only see search results he is allowed to retrieve. The policies must also cover end-users authenticated by key account customers or external partners. At the same time, Haufe is not always free to grant access as it sees fit, even for Haufe's own applications. Some content is used within license contracts that impose constraints on its usage. In consequence, the authorization policies must respect usage restrictions for individual documents.

2.2. Non-Functional Requirements

Being a core service of our information platform, the content hub needs to fulfill a set of non-functional requirements commonly found for network services [1]. For example, the system needs to scale horizontally, as we expect larger amounts of content to be added dynamically to the system. We also require the system to achieve a high degree of availability. Finally, we require a strong degree of data consistency, especially between the search indices and the data store, as we want to avoid both dead content and dead links in search result lists. Another important aspect is deployment flexibility. First, we require the future system to be easily deployable in a cloud environment. Second, we aim at a high degree of automation for common tasks such as horizontal scaling and content migration. Finally, we target the implementation of standardized operations interfaces in order to integrate the content hub with standard monitoring and logging tools.
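As a hedged illustration of the citation-matching behaviour described under Flexible Search Capabilities (the citation pattern and handler names below are purely hypothetical, not Haufe's actual rules):

// Route a query: a query that looks like a legal citation goes to a
// specialized handler; everything else falls through to full-text search.
val citation = """§\s*(\d+[a-z]?)\s+(\w+)""".r   // e.g. "§ 433 BGB" (illustrative pattern)

def routeQuery(query: String): String = query.trim match {
  case citation(section, code) => s"citation-lookup(code=$code, section=$section)"
  case other                   => s"fulltext-search($other)"
}

routeQuery("§ 433 BGB")        // dispatched to the specialized citation handler
routeQuery("Kündigungsfrist")  // dispatched to the regular full-text search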


Figure 1. Conceptual layered architecture of the future content hub.

(Diagram: Content Consuming Systems at the top; interface layer with Content Access Interface (CMIS), Metadata Interface (SPARQL) and Search Interface & Query Processor; Authorization, Transformation and Aggregation layers above the XML store, triple store and search engine; Transaction Management, Validation/Extraction & Transformation and Ingest Authorization layers; Single Document Ingest and Bulk Ingest; Content Sources at the bottom.)

3. Conceptual Architecture

Figure 1 shows the future architecture of the content hub. The content hub provides three interfaces to external systems, for example web applications. These interfaces facilitate retrieving, searching and navigating through the document body.

• The Content Access Interface implements a standard CMIS interface, allowing convenient access for a variety of both legacy and off-the-shelf content management systems. We also consider amending the Content Access Interface with interfaces secondary to CMIS if our upcoming proof-of-concept prototype should indicate this need.

• A SPARQL [2]-based interface will be used for all retrieval tasks related to meta-data. Recall that our content is highly structured and that relations between content objects play a major role. Using RDF for modeling these semantic relations and employing SPARQL as the query language provides us with the possibility to handle our complex document graph. Moreover, we expect SPARQL to be a door opener for future linked data applications.

• We are currently in the process of defining a search interface. Its task is to enable querying the document body using full-text search and faceted search operations; the interface will also enable one to restrict the result set based on policies. At present, we are also discussing how we can include contextual information, such as the active product, the location of the user and their language, into the information passed to the system as a query. The search interface will be enhanced by a custom query processor. It will support configurable query extension hooks based on the domain conventions of the respective products' target audiences.


Similarly, the content hub needs to implement two interfaces to ingest XML documents from a larger number of sources. Here we need to distinguish between the ingest of single documents and bulk imports that may range up to hundreds of thousands of documents at once.

3.1. Core Components

Internally, the content hub consists of six "sandwich" layers that span across the three core building blocks of the content hub, namely an XML document store, a triple store and a full-text search engine (denoted by the large icons in the diagram). Directly on top of these sits a thin Aggregation Layer that serves as a unified facade and combines the search, retrieval and triple-store related functionalities in one place. The upper two layers are of higher complexity:

• The Transformation Layer converts the XML-based content to a set of intermediate and target formats, for example XHTML, EPUB or PDF. Depending on the target format, the transformations could be implemented by means of XSLT, XSL-FO, or more specific rendering mechanisms. In this regard we also want to emphasize that the content hub only performs transformations directly related to the content it stores. Further transformations required for content delivery will be performed at the content presentation services, for instance by a web content management system connected to the content hub.

• As mentioned before, restricting access based on authorization policies is vital for our business. Hence, we envision a central Authorization Layer that is able to control content search and retrieval for every request sent to the system. Within the current phase of system design, XACML 3.0 [3] seems to be a very promising candidate for this task. Regarding the need for authentication, we are currently evaluating the integration of different authentication providers, for example OAuth 2.0.

In an analogous way, we perform both authorization operations and content transformations for incoming content, in the Ingest Authorization and Ingest Transformation Layers. This layer also performs validations based on XML schema to reject malformed content. In addition, we here also extract meta-data and potentially carry out automated RDF annotations. Finally, a Transaction Management layer ensures data integrity among the core building blocks. This is especially important for bulk updates, for which we need to perform roll-backs if adding or updating operations fail in the middle of a bulk operation.

4. Implementation Considerations

Having sketched the overall architecture of the system, we are left with three strategies for how such a system can be implemented.

1. Build everything from scratch: In the past, a lot of infrastructure-heavy systems, for instance a legacy document store, have been built in-house, mostly using Java and Python/Zope technology.

2. Integrate an XML database, a triple store and an open-source search engine such as Elasticsearch: While this solution is certainly viable, a core challenge is maintaining data consistency among these three systems. Especially implementing transactional semantics on a distributed system is challenging, as this requires distributed snapshots to be taken for the purpose of enabling conditional roll-backs.

3. Use an enterprise NoSQL datastore with XML capabilities: In order to circumvent consistency issues, an alternative approach is to rely on a NoSQL datastore that is able to handle large amounts of XML documents.

After sketching prototypes for each of the three options, we found option 3 to be the most appealing strategy. First, we can save tremendous development effort, as we do not have to implement a lot of data-management functionality. Second, we believe that off-the-shelf solutions like MarkLogic Server [4], which is our present No. 1 candidate for the implementation, are much more stable than any home-brew implementation will ever be. We favour such enterprise solutions over open-source XML stores due to the better availability of specialized consulting services. Thirdly, certain business requirements demand that the content hub execute in a cloud environment. Modern NoSQL stores such as MarkLogic are already well prepared for such deployments.

5. Further Technical Challenges

There are several technical challenges that are decisive success criteria for such a system. First, we need to integrate external systems such as authentication providers and in-house systems for license management. We are currently investigating whether we can achieve this using XQuery and REST calls or whether we need to rely on the Java API of MarkLogic to carry out these tasks.


Second, there is performance. As indicated earlier, we envision the content hub serving several hundred to a couple of thousand requests per second. Hence, we need to consider different strategies for operating the content hub in a cluster in order to ensure availability of the service even under high load or in the case of node failure. In addition, we are currently developing caching strategies to be implemented at different layers in the system.

The third crucial aspect is the data model used to store our content, as the data model directly influences the performance of any database technology. We are currently in a requirements engineering phase in order to further clarify business demands; we will use the outcome to revise the first draft of our data model that we have developed over the past months.

6. Summary and Outlook

We have sketched a preliminary architecture for a content hub that is presently designed to serve as the central repository for XML and Blob content at Haufe Group. We envision a combination of a NoSQL XML store, on-the-fly content transformation, enterprise search capabilities and a triple store forming a more flexible backbone for digital publishing that also opens up new business opportunities. Within the next months, we will develop a proof-of-concept implementation to further investigate the concept. We look forward to discussing our present and upcoming ideas and questions with the audience at the conference.

References

[1] Distributed Systems: Principles and Paradigms. 1st Edition. Prentice Hall, 2001.
[2] SPARQL 1.1 Overview. March 2013. Online Resource: http://www.w3.org/TR/sparql11-overview/ (accessed 03/2014).
[3] eXtensible Access Control Markup Language (XACML) Version 3.0. January 2013. Online Resource: http://docs.oasis-open.org/xacml/3.0/xacml-3.0-core-spec-os-en.html (accessed 03/2014).
[4] Inside MarkLogic Server. MarkLogic Corporation. 2013.

A Visual Comparison Approach to Automated Regression Testing

Celina Huang Antenna House, Inc.

Abstract

Antenna House Regression Testing System (AHRTS) is an automated solution designed to perform visual regression testing of PDF output (PDF to PDF compare) from the Antenna House Formatter software by converting a set of baseline PDFs and a set of new PDFs to bitmaps, and then comparing the bitmaps pixel by pixel. Several functions of the system make use of XML, and the final reports are generated using XML and XSL-FO. This paper addresses the importance of PDF to PDF comparison for regression testing and explains the visual comparison approach taken. We explain the issues of traditional methods such as manual regression testing and why an automated solution is needed. We also look at how AHRTS works and discuss the benefits we have seen since using it internally to test new releases of our own software. Given its visual-oriented capabilities, we then explore other possible uses beyond the original design intent.

Keywords: regression testing, PDF, XML, XSL-FO

1. Introduction

We developed the Antenna House Regression Testing System to meet our own internal need to test releases of our formatting software, to make sure that what worked before still worked in the new releases. In developing the system, we arrived at a set of design requirements. These included:

• The system had to visually compare PDFs. Comparing the internal code would not work because differences in the internal structure of a PDF could still produce the same visual output, e.g., tagged PDF and PDF 1.5.

• A reasonable throughput speed was important. Our test suite of 10,000+ pages was taking several people up to 3 days to manually regression test by looking at the PDF pages side by side. We wanted to speed up the process and reduce the number of people and effort required.

• Ease of usability, with a graphical user interface and application programming interfaces, so both the Developers and the Support Team that did regression testing could use the tool.

• Where possible, we wanted to use our own tools, XML and XSL-FO.

• Reports needed to be meaningful, easy to navigate, customizable (this is where the XML and XSL-FO fit), and not overly verbose.

Our need to regression test is the same as for any software developer. As software evolves and improves over time, the possibility of new, unwanted behavior occurring always exists. Every time software is upgraded or a system is changed in some way, regression tests need to be run to ensure that the changes do not introduce new faults [1] and that the intended results are still being produced. Having been named "an unsung hero of the software testing world" by Jean Hartmann (Microsoft's test architect) [2], regression testing is truly vital to guarantee an error-free product and should never be overlooked. Providing valued customers with significant enhancements is of no benefit if the upgrade process manages to break features that were working successfully in previous versions of the software.

There are many methods of regression testing that can be performed, but in this paper we will focus on visual regression testing, which involves comparing outputs visually to determine if the new document matches, or deviates from, the reference document. For organizations producing PDF documents from XML, this is a sure way to detect problems such as missing content or broken graphics that might result from a system upgrade or change. In fact, this is the way Antenna House performs regression testing on new releases of our formatting software.


The traditional method of regression testing formatted paged output is to visually compare two documents side-by-side and page-by-page to ensure changes in the software have not disrupted the production process in any way [3]. Test cases can vary between a subset and a large collection of documents. The test cases that we collected over the years are almost all customer files that provide a good sampling of formatting jobs and truly exercise features of our formatting engine, AH Formatter. We now have an extensive collection of files that are rerun as regression tests and have become the means for verifying that our software is behaving correctly. It is important to note the issues with manual regression testing:

• It is a tedious task that testers have to endure, and over time the level of attention to detail will decline, causing unwanted changes to go unnoticed.

• It is extremely time consuming (especially if the outputs being tested are large files) and results in delayed product releases if the tests are not completed on time.

• It can be costly due to the amount of resources and time it takes to do the regression testing.

• It is unreliable and often leads to subtle errors that are left undetected by the naked eye.

• It may not be done frequently enough due to how much time, money, and human effort it takes to successfully complete, which results in testing only on candidate release versions of the software.

Because our software required testing on a large scale, where there is a massive volume of information and repetitiveness, we needed to migrate to an automated process. It would make the process faster and more efficient, reduce the risk of human error, and increase the overall quality of the test. The challenge was finding an automated tool that would compare outputs on a visual level, as opposed to the underlying code, and also be able to handle the PDF to PDF comparison on a large scale. Due to these unique requirements, we could not find a suitable tool, and thus Antenna House set out to custom develop an automated solution to be used internally to test the output from new releases of our software.

2. Automated Visual Regression Testing

The Antenna House Regression Testing System (AHRTS) is a simple, fast, and scalable solution for automating the visual comparison of formatted documents or pages. Its main function is to identify the visual dissimilarities that exist between a pair or set of documents. It is important to note that this is not a solution for tracking changes or locating variations in the content of multiple versions of the same document. However, it is capable of finding content discrepancies within a limited set of parameters. AHRTS is unique in that it quickly converts the PDFs to bitmaps, then performs a precision pixel-by-pixel comparison of two PDF files and generates a user-friendly report that quickly identifies changes between a set of PDF files.

2.1. A Visual Method

Converting the PDFs to pixels and comparing pixel-to-pixel overcomes the issues of just looking at the underlying page codes. It also mitigates issues involved in testing multi-lingual documents and documents with vectors and bitmaps. For most organizations producing PDFs from XML, the most important thing is that the final output looks the same as it did before the software was altered. Minute differences, as the name implies, might be small, but they can have a serious impact on the accuracy of a document. On graphics, lines, labels, fills and end caps might change or disappear. Text within the document might move unexpectedly. We believe that only a pixel-to-pixel comparison can locate even the most subtle of differences between two PDF documents. A detailed flowchart of how AHRTS works can be seen in Figure 1, "PDF2PDF Compare Flowchart".
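A minimal sketch (our own illustration, not AHRTS's actual implementation) of what a pixel-by-pixel comparison of two rendered pages looks like:

import java.awt.image.BufferedImage

// Collect the coordinates where two page bitmaps differ; a non-empty
// result means the pages are visually different.
def diffPixels(a: BufferedImage, b: BufferedImage): Seq[(Int, Int)] = {
  if (a.getWidth != b.getWidth || a.getHeight != b.getHeight)
    return Seq((0, 0)) // a change of page dimensions counts as a difference
  for {
    y <- 0 until a.getHeight
    x <- 0 until a.getWidth
    if a.getRGB(x, y) != b.getRGB(x, y)
  } yield (x, y)
}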


Figure 1. PDF2PDF Compare Flowchart


The process of comparing two PDF files starts with extracting the PDF code as character strings and then doing a checksum comparison to see which files need to be further tested. If document pairs with the same checksums are found, they are the same and will not be tested any further. If the documents differ, then the system will render them to bitmaps and then checksum-compare each set of pages. Only the pages with different checksums are then compared at the pixel-to-pixel level. The differences found will be used to create a composite image and included in the final report along with the corresponding pages of the original PDF files being compared. Originally, our 10,000+ page test suite took close to 3 days of machine time to run. Once we implemented the two different checksum comparisons, the time to run the same test suite was reduced to less than 2 hours.

2.2. Handling Large Document Comparisons

Individual documents can vary from one page up to thousands of pages, and collections of documents can be extremely large. AHRTS is able to handle document testing of any size, scaling from individual PDFs to directories of PDFs. Comparing multiple PDFs is no different from comparing individual PDFs when using AHRTS. Testers can choose the folders that need to be tested and a batch process will be performed to compare all of those documents in the specified folders. The system has no limit on the number of PDF files, the size of each file, or the number of pages being tested. We have actually tested a batch run of over 1,000 documents, containing a total of over 10,000 pages, without encountering any problems. A single PDF document with over 100,000 pages has also been tested with no issues at all. Being able to handle PDF documents of any size not only makes this system versatile enough to fit into different workflows, but also greatly reduces the time and human resources devoted to regression testing.

2.3. High Speed Performance

A valuable system is one that is able to regression test large collections quickly. In order to achieve such speed, AHRTS does a checksum comparison first (as seen in Figure 1, "PDF2PDF Compare Flowchart"), to detect which files are identical and which are not. When doing the checksum comparison, we exclude some information, such as the creation date and time of the PDF, which has no bearing on the actual page output. Then only the documents with different checksums move to the next testing phase, where the discrepancies are identified. When AHRTS is used to test different versions of AH Formatter, two stages in the process can be set to multi-thread to attain even greater speed.

2.4. Exclude Margins from Testing

This is a new feature that allows us to exclude portions of the page in the trim and bleed areas that do not need to be tested. We can select an area from the top, bottom, left and/or right margin of the page to exclude from testing. A preview of the individual pages being tested will appear in a separate window, as shown in Figure 2, "Margin Selection Preview Window", to see what will be excluded as the margins are adjusted. You might want to use this, for example, if the only known differences between the documents are an automatically generated date-time stamp.
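A hedged sketch (ours, not AHRTS's actual code) of the checksum-first idea, with volatile metadata such as the creation date masked out before hashing; the masking pattern below is an illustrative assumption:

import java.security.MessageDigest

def stableChecksum(pdfBytes: Array[Byte]): String = {
  val text = new String(pdfBytes, "ISO-8859-1")
  // Mask the /CreationDate entry so it has no bearing on the checksum.
  val masked = text.replaceAll("""/CreationDate\s*\([^)]*\)""", "/CreationDate()")
  MessageDigest.getInstance("SHA-256")
    .digest(masked.getBytes("ISO-8859-1"))
    .map("%02x".format(_)).mkString
}

// Only pairs whose checksums differ go on to the bitmap comparison stage.
def needsPixelCompare(baseline: Array[Byte], candidate: Array[Byte]): Boolean =
  stableChecksum(baseline) != stableChecksum(candidate)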

Page 91 of 162 A Visual Comparison Approach to Automated Regression Testing

Figure 2. Margin Selection Preview Window This option is also useful for certain documents that have updated content in the header or footer, but the rest of the content in the main body remains the same. By being able to exclude areas of the page, those intended differences would not cause every page of the document to be identified as different in the generated report. 2.5. Generates Usable Reports

After the comparison process is completed, an XML report is generated and formatted with AH XSL Formatter1 to produce a usable PDF. The XSL stylesheet for the report produces a cover page that provides the total number of pages tested and locates which pages have differences. When testing multiple PDF files, an overview report is produced displaying which documents have differences and which documents are identical. Documents identified as having differences are hyperlinked to individual reports for each document. This individual report, as illustrated in Figure 3, “Individual Report”, shows which pages are different and then presents a page composed of three panels for each pages with differences.

Figure 3. Individual Report

1 A restricted copy of AH XSL Formatter is provided with AHRTS for only the purpose of generating reports.

Page 92 of 162 A Visual Comparison Approach to Automated Regression Testing

The left pane is the actual original PDF page extracted 3.1. The User Interface from the baseline document and the right pane is the PDF page extracted from the new document. This is This application can be launched via command-line possible with AH XSL Formatter’s ability to merge interface or graphical user interface (GUI), but we will be selected individual pages from a PDF into a single PDF focusing on the latter for the purposes of this paper and file it creates. The center pane is the bitmap used for the presentation. The GUI is written in Java so it will look pixel-to-pixel comparison with the added composite of the same whether you are using it on Windows, , the two pages highlighting where the changes are. There or a Mac operating system. A screenshot of the GUI with is an option to overlay the bitmap composite on top of the “Compare” tab open can be seen below in Figure 4, the compared PDF files in the report for a better visual “AHRTS User Interface”. comparison. Color coding is used to further help identify what sort of change occurred between the original and Figure 4. AHRTS User Interface new documents. If a portion of the page was intentionally excluded from testing, it can also be colored. It is important to note that only pages with differences are presented in each report. This way, if only 4 out of 500 pages have differences, you will only have to look at and scroll through those 4 specific page sets. The XSL stylesheet that produces the reports can be modified to tailor the reports as wanted and all the XSL- FO capabilities of Antenna House Formatter are available for the report generation. This has enabled us to quickly try a large number of report layouts in order to arrive at the final report that best displays the information we want and the way we want it. 3. Regression Testing New Releases of AH Formatter

As previously mentioned, this software was originally developed for our own internal use to test new releases of When the AHRTS GUI is first opened, it has default AH Formatter. During the course of a year we will do settings of where to locate and store files needed to close to 20 releases, which includes maintenance releases complete the regression testing process. If AHRTS is and new version releases of the versions of AH Formatter integrated in a bigger system, users can change where that are still being fully supported. What used to take they want the saved files to be stored in the “Settings” tab several people several days, and was limited to only and reuse them in other applications in their workflow. candidate releases of the software, now just takes a couple The data for the reports generated is XML making it hours of machine time and is performed regularly on easily accessible. development versions as well. AHRTS has enabled us to The following features, numbered in Figure 4, detect possible negative regressions much earlier in the “AHRTS User Interface”, are the main steps to perform development cycle. It also resulted in more regular regression testing of AH Formatter: releases of AH Formatter, less associated problems in the 1. Test Samples- We first create test cases by obtaining new releases, and an overall better product. the original source files (.fo files) that Antenna House So how does it work? When used to regression test has collected over the years to make up our test suite. releases of AH Formatter, AHRTS takes the documents When AHRTS makes the test cases, it will store the and first creates a batch file. The batch file is then used to XSL-FO file and include a XML file in a directory create the PDFs for both versions of AH Formatter that specified in the “Settings” tab, which was discussed are being compared. Next, it does the comparison of the earlier. It will also prepare new file names for the test files and then creates all the associated reports. The final cases for the Render and Compare steps. step is where a user has to review the finished reports.

Page 93 of 162 A Visual Comparison Approach to Automated Regression Testing

2. Render- Next, we select the versions of AH Formatter 5. DPI setting- The Dots Per Square Inch (DPI) of a to be added to a queue to render the test cases from a comparison is the resolution at which a PDF drop down menu. AHRTS uses engine files to locate document from a rendering engine is rasterized. This and identify the different versions of Formatter that setting allows the user to control the level of will be used with the system. There are premade comparison and how sharp the composite page engine files specified in XML format that will work appears in the final report. With that in mind, a for default installations of AH Formatter. If a version higher DPI means more pixels to compare per page of AH Formatter is located in a different folder other and will take more time and power to finish the than the default, users can manually change the comparison process. The default is set at 30 DPI (900 directory path in the engine file so AHRTS can find dots per square inch), which we have found to be it. This step is just rendering the test cases, so it is more than enough resolution to find differences in possible to run test cases through 3 or more different font types, minor alterations in spacing and line versions of AH Formatter. Since AH Formatter is width, as well as larger ones such as differences in the thread safe, there is also an option to set the rendering breaking of pages. jobs to multi-thread and enhance speed and 6. Margins setting- This calls up Figure 2, “Margin performance. The number of threads set would Selection Preview Window”, as previously discussed. depend on the number of processors or cores on the machine. 4. Other Possible Use Cases 3. Compare- Once AHRTS confirms that each of the rendering engines have finished their jobs and Beyond being a regression testing system for new releases produced PDF files, we can continue with the of AH Formatter, there are many other possible use cases Compare step. In this demonstration, we chose for AHRTS, which we now discuss: Version 5 as the baseline engine and Version 6.1 as the • Regression testing any document formatting or new engine to be tested. AHRTS will follow the same conversion software. A PDF to PDF comparison can process as illustrated in Figure 1, “PDF2PDF be performed to regression test PDFs from virtually Compare Flowchart” in this stage. After the any source, as well as a growing list of image formats. comparisons are completed, a new window will This could include other formatting engines besides automatically open with a detailed PDF report to those provided by Antenna House, ranging from review. In both the “Compare” and “PDF2PDF word processing, OCR, conversion, to business Compare” tab, users can adjust the highlight radius, software. The only requirements are two PDFs or DPI settings, and margins to be excluded from images to be compared. testing. 4. Highlight radius- The yellow halo that highlights • Pre-production system check: For documents that where the changes are in the composite image of the have extended periods between production such as an report can be increased or decreased in size. The annual directory, PDF differencing can ensure that default is a radius of 5 millimeters, but it can be set to during the preceding period, no changes were made 0 for no highlight at all or enlarged depending on a to the tool chain that adversely affect the previously user’s preference. Prior to the addition of the highlight good output. 
feature, subtle differences such as the end of a line in a • Stylesheet development: By comparing PDFs, you can graphic were sometimes hard to locate in the color quickly see if changes to the stylesheet actually coded composite bitmap. achieved the desired results. For instance, did changing the thickness of a rule or bolding a heading cause type to overflow a block? Did shifting a line a couple points cause content to be lost or shifted in the wrong direction? Any of these unforeseen consequences that occur while developing a stylesheet, can be identified with a PDF to PDF comparison tool.

Page 94 of 162 A Visual Comparison Approach to Automated Regression Testing

• Installation validation: Installing a new system where 5. Conclusion the final output is a document or publication, typically involves a lot of components that include an In the beginning of this paper, we emphasized the authoring or data source, a database or content importance of regression testing and how it plays a major management system, software to manipulate and role in any testing process in the software world. More assemble the data, and finally the tool that formats the specifically we’re focusing on groups producing PDF output in a document. Each of these components in documents from XML, like Antenna House, who need turn can have multiple, moving parts. Typically, a to test the visual output of their systems. Traditional solution is implemented first on a development methods proved to be unreliable, time consuming, costly, system and then quite often replicated on training and overall a tedious task to say the least. Now with systems, staging systems, QA systems, backup automation and using a visual comparison approach, systems, and the actual production system. All of AHRTS greatly increases the efficiency of regression these systems need to produce exactly the same testing, reducing the time and effort required to check output, using the same tool chain and from the same output discrepancies, and enabling users to find even the source. AHRTS provides a way to take output that slightest change in the visual appearance of a document. was successfully produced on the development or first To achieve the visual PDF to PDF comparison we system and compare that with output generated by convert the PDFs to bitmaps and then compare them each of the subsequently implemented systems. pixel to pixel. This produces a highly reliable check that • System(s) certification: In a multisystem environment will catch even the slightest differences between PDFs. where software is installed on more than one system, Originally designed as a regression testing tool for AH either locally or remotely, comparing PDFs helps Formatter, AHRTS has evolved into a tool that can play validate that each system produces exactly the same many roles in a development cycle for any system document output from a collection of documents generating visual outputs. used for testing. • Ensuring that different versions of PDF are producing the same visual output. It is conceivable that a document rendered to PDF/A or tagged PDF displays differently. At times, it would be important to know what those differences are. Bibliography

[1] The Art of Software Testing. Glenford Myers. Wiley. December 2011. ISBN: 978-1-118-03196-4. [2] 30 Years of Regression Testing: Past, Present and Future. Jean Hartmann. http://www.uploads.pnsqc.org/2012/papers/t-67_Hartmann_paper.pdf [3] Antenna House: Antenna House Regression Testing System. http://www.antennahouse.com/antenna-house-regression-testing-system/

Page 95 of 162 Live XML Data

Steven Pemberton CWI, Amsterdam

Abstract which is input with an incremental control, that updates the data each time a key is pressed: XML is often thought of in terms of documents, or data being transferred between machines, but there is an aspect of XML often overlooked, and that is as a source of live data, that can be displayed in different ways in real time, and used in interactive applications. Whenever the value is changed in the control (which -value-changed In this paper we talk about the use of live XML data, invokes an event), the data is and give some examples of its use. submitted to the site (in this case wikipedia): The submission that causes this specifies that the results In [1], Tim Bray, one of the developers of XML, said of the submission are returned into a different instance Little did we know that it was going to be The results are then displayed to the user, as a series of used for syndicated news feeds and purchase 'triggers', which when clicked on, set the value of the orders." search string: In other words, they did they not anticipate XML's use interactive documents, and so on. But with the increasing availability of apps, live data is becoming more and more significant. Live XML data is the use of XML in an application where the data is constantly updated, either by repeated polling of an external source, or through interaction with the user, or a combination of both. To give an example [2], it is currently good practice to give suggestions if a user is searching in a large database or similar. Using XML and XForms [3], [4], [5], [6], [7], it is easy to specify this: the search string is kept in instance data:

Page 96 of 162 doi:10.14337/XMLLondon14.Pemberton01 Live XML Data

And the result looks like this: 4. An Example

In XForms you can put the URL of an image in your data: http://tile.openstreetmap.org/10/511/340.png and output it with This would give as output: This is a general idiom that can be used in many places: http://tile.openstreetmap.org/10/511/340.png source data is changed in some way, possibly by interactions from the user, and this causes data to be But if you add a mediatype to the , the image updated from external sources. itself is output instead: 3. XForms

XForms is a language originally developed for dealing with forms on the web. However, thanks to the generality of its design it was soon realised that, with a little more generality, it could be used for more general applications as well. So since XForms version 1.1, applications can be built with XForms. In fact, a form is really just the collection of data, some calculation, and some output, as well as submission of data. But this is a actually the description of an application as well. The only real noticeable difference is the manner in which the data is collected and presented. XForms has been in use for more than a decade now, by a wide range of users including the BBC, the Dutch national weather service, NASA, and Xerox, just to name a few. Experience has shown that XForms greatly reduces the time needed to produce an application (by about an order of magnitude). This is largely due to the approach used by XForms, of declaratively specifying what is to be achieved, rather than how to achieve it. 5. URL Structure

An Open Street Map URL is made up as: http://///.png

Page 97 of 162 Live XML Data

So we can represent that in XForms data: 6. Zoom A problem with this is that while the x and y nudge http://tile.openstreetmap.org/ buttons work fine, the zoom button doesn't. This is 10 because at each level of zoom the x and y coordinates 511 340 change: at the outermost level of zoom, 0, there is one tile, x=0, y=0. At level 1, the coordinates double in both direction, [0-1], so there are 4 tiles; at level 2, the coordinates are [0-3], and there are 8 tiles, 16 at level 3, and in general 2z at level z (up to level 18). and calculate the URL from the parts: So to make zoom work properly, we must save our 256 pixels of each tile), and then calculate the tile at any But now that we have the data, we can also input the level of zoom from that: different parts: scale=226 - zoom x=floor(posx/scale) This means that we can enter different values for the tile y=floor(posy/scale) coordinates, and because XForms keep all relationships up-to-date, a new tile URL is calculated and the In XForms: corresponding tile is displayed. inconvenient, we can also add some nudge buttons, of calculate="floor(../posy div ../scale)"/> Now when you zoom in and out, the area remains the so it looks like this:

Page 98 of 162 Live XML Data

7. Location, location, location

You'll notice from the two images above that we got the tile that contains our location, but the location (in this case, central London) is at a different part of the tile. This is because if you have a tile where the location is in the middle of the tile, when you zoom in, you get one of the 4 quadrants, and so by definition, the location is no longer at the centre of the tile:

location to remain in the middle of the view, so to the tiles around underneath so that our location remains ... in the centre. This we do by calculating offsets that the

construct a snippet of CSS to move the tile array: Now we have a live map, where we can zoom in and out, and pan left and right and up and down. Here is a view also showing the parts that would normally not be visible, outside of the porthole:

Page 99 of 162 Live XML Data

and then we catch the mouse events: down up Now we have live data for the mouse! We can show the state of the mouse by changing the mouse cursor from a hand into a clenched hand:

... 9. Capturing a move

The last bit is that we want is to save the start and end point of a move, so we can calculate how far we have dragged. The instance data is extended: Unfortunately, in a paper, you can't see interactive applications like this working. You can see it in action, and look at the code, at [8]. 8. Mouse We capture the start point of the drag when the mouse button goes down: Of course, what we really want is to be able to drag the map around with the mouse, not have to click on nudge down buttons. Now we're really going to see the power of live and the state of the button, up or down. So we create instance data for that: While the mouse button is down, we save the end position:

Page 100 of 162 Live XML Data

And calculate the distance moved as just end - start: 11. Bells. Whistles Once we have this foundation, it is trivial to add things zoom in and out with the mouse wheel, or to select tiles for another version of the map. For instance: 10. Dragging the map So now we have the scaffolding we need to be able to drag the map. You may recall that the position of the map is recorded in posx and posy. That position now also depends on the mouse dragging. So we add instance data http://tile.openstreetmap.org/ to record the last position: and add a calculation to keep posx and posy updated (remember scale is the number of positions represented http://tile.opencyclemap.org/cycle/ on a tile, so we divide by the tile size to get the number of positions represented by a pixel): Transport calculate="../lastx - ../mouse/move/x * http://tile2.opencyclemap.org/transport/ (../scale div ../tilesize)"/> calculate="../lasty - ... ../mouse/move/y * (../scale div ../tilesize)"/> Thanks to the live data, any time a different value is and only one other thing, namely reset lastx and lasty selected for "site", all the tiles get updated, without any when the dragging stops: further work from us. Now it is possible to drag the map around. Although from the user's point of view it feels like you are grabbing the map and dragging it around, all that is happening underneath is that we are tracking the live data representing the mouse, and using it to alter the live data that represents the centre of the map.

Page 101 of 162 Live XML Data

12. Implementation

The implementation used in the online version of this application is XSLTForms [9], a client-side implementation of XForms that runs in all modern browsers, using a mixture of XSLT and Javascript. However, the code only uses standard XForms, and does not use any special facilities of this implementation. As long as the XForms implementation correctly catches the DOM mouse events used, the code should work in any implementation of XForms. 13. Conclusion

In a very abstract sense, a map like the one presented above can be seen as the presentation of two values, an x and y coordinate, overlaid with an input control to affect the values of x and y. The ability of XForms to abstract the data out of an application and make the data live via simple declarative invariants that keep related values up to date makes the construction of interactive applications extremely simple. The above example map application is around 150 lines of XForms code, in sharp comparison with the several thousand lines that a procedural programming language would need.

References

[1] A Conversation with Tim Bray, Searching for ways to tame the world's vast stores of information.. Jim Gray. ACM Queue. 20-25. 3. 1. February 2005. http://queue.acm.org/detail.cfm?id=1046941 [2] WIKIPEDIA OpenSearch Test Form. Alain Couthures. http://www.agencexml.com/xsltforms/wikipediasearch.xml [3] XForms 1.0. Micah Dubinko, Leigh Klotz, Roland Merrick, and T V Raman. W3C. 2003. http://www.w3.org/TR/2003/REC-xforms-20031014/ [4] XForms 1.1. John Boyer. W3C. 2009. http://www.w3.org/TR/2009/REC-xforms-20091020/ [5] XForms 2.0 (draft). John Boyer, Erik Bruchez, Leigh Klotz, Steven Pemberton, and Nick Van den Bleeken. W3C. 2014. https://www.w3.org/MarkUp/Forms/wiki/XForms_2.0 [6] XForms 1.1 Quick Reference. Steven Pemberton. W3C. 2010. http://www.w3.org/MarkUp/Forms/2010/xforms11-qr.html [7] XForms for HTML Authors. Steven Pemberton. W3C. 2010. http://www.w3.org/MarkUp/Forms/2010/xforms11-for-html-authors [8] http://www.cwi.nl/~steven/Talks/2014/xml-london [9] XSLTForms. http://www.agencexml.com/xsltforms

A. Credit

Open Street Map data is © OpenStreetMap contributors , licensed as CC BY-SA.

Page 102 of 162 Schematron - More useful than you’d thought

Philip Fennell MarkLogic

Abstract

The Schematron XML validation language has been around validation tasks outside the gamut of what XML Schema Simple Atom Feed Rules 1.0 was designed for. The reference implementation is implemented, with great care, in XSLT, and with Atom Feed Root extensibility in mind. There are a number of points in the Schematron compilation process that provide opportunities to extend its basic behavior and allow other modes of report The document root must be output to be generated. This paper looks at one example of an atom:feed element. extending Schematron to create an XML to RDF Mapping Language for flexible RDF triple construction and built-in source-data validation rules. Required elements of 1. Introduction an Atom Feed This paper looks at one example of extending Schematron [1] to create an XML to RDF Mapping atom:title is missing, Language with built-in data validation rules and flexible this is a required element. RDF triple construction. The Schematron XML validation language has been atom:id is missing, around for about as long as XML and has been used this is a required element. extensively for all those validation tasks that fell outside the gamut of what XML Schema 1.0 was designed for. Put simply, Schematron allows you to define a context atom:updated is missing, for a set of rule tests that, when applied to a source XML this is a required element. document, must fail if they’re not met or to report on the occurrences of nodes that you want to know about, and all using just XPath expressions. In the above example two contexts are defined, one for the document root, so we can test for the presence of the root element and the second is the root /atom:feed element itself, so we can test for its required child nodes. It is this principle that we will use later to help define the contexts for mapping XML nodes to RDF triples.

doi:10.14337/XMLLondon14.Fennell01 Page 103 of 162 Schematron - More useful than you’d thought

Now, it’s not that Schematron is under used and nor is it But what of XML to RDF? Certainly there is plenty of under appreciated but still there is a lot more to XML content, feeds, streams and data that exist out there Schematron than meets the eye. Firstly, the reference but the options for extracting and mapping that implementation is implemented, with great care, in XSLT information to RDF are confined to bespoke pieces of [2]. This must have seemed a natural choice given the code. What would be useful is a way to express, in an type of input documents, XML, and the community of XML and RDF oriented way, the identification of targets developers that it would be serving and also not in the source XML, the context for a set of triples, define withstanding the fact that, as XML, XSLT can be created the subject URI, the properties (elements and attributes) and transformed by XSLT - quite possibly my most you wish to map to your target RDF vocabulary and favourite aspect of XSLT. where to get the values that will become the objects of The original core of the reference implementation your triples. A rule-based language, like Schematron, was written with extensibility in mind but its output, the gives us a hint as to how this might work. validation report, was somewhat limited by being a plain text format. Therefore, secondly, with its endorsement as 3. A sketch of how we might use an ISO standard, came the Schematron Validation and Reporting Language (SVRL), which gave the report a Schematron to map XML to RDF structured XML output. Finally, the multi-step pipeline that is used to compile the Schematron schema into If we look again at what a Schematron rule actually does, XSLT allows for multiple extension points to suite your it defines a context, a node set, within an XML desired application. document, and allows you, by the use of XPath, to assert a validity test or report on the occurrence of a node 2. Where else to use a rules-based relative to that context. Putting aside validity assertions for the moment, the iso:report instruction is the key reporting language? here; although originally intended to simply report the presence of a node that matches a specific XPath pattern, With the increasing interest in the Semantic Web it can be overridden to output, instead of a simple textual technologies, a lot of work has been done to bridge the report message, any structured XML you require - in our gap between the existing RDBMS world and that of the case an RDF triple for each occurrence of the target Semantic Web so that the vast array of Relational systems node. The target node maps to the predicate of an RDF can be queried with the SPARQL query language [3] as triple. Schematron has an iso:value-of instruction though they were actually RDF graphs. This requires which, when used in conjunction with iso:report, can either a simplistic direct mapping from table rows and insert values, from the source document, into the columns to RDF triples or a more thoughtful, and in message or in this case the object value of our triple. For reality far more practical, one that uses an intermediate each predicate/object pair one must define a subject URI, mapping language to describe the construction of triples which can be derived wholly, or in part, from the context from the table schema. The W3C have of the rule. More on this last aspect later. Recommendations for both a Direct Mapping (RDB2RDF) [4] and a Relational to RDF Mapping Source XML: Language (R2RML) [5]. 
Tools currently exist that support both the Direct Example Feed Mapping and the Mapping Language and can be used either as a SPARQL query layer, on top of an RDBMS, 2003-12-13T18:30:02Z http://example.org/feeds/60a76c80 or as a standalone export tool that will dump a table, or tables, as RDF. As an aside, I think if you are to use these tools then you get the most value out of the SPARQL layer in prototyping the Mapping and then use the Mapped RDF Triples: mapping to dump the database to RDF so you can bulk ingest into @prefix rdfs: . your Triple Store. @prefix xsd: . @prefix atom-owl: .

rdfs:type atom-owl:Feed ; atom-owl:title "Example Feed"@en ; atom-owl:updated "2003-12-13T18:30:02Z"^^xsd:dateTime .

Page 104 of 162 Schematron - More useful than you’d thought

In the above example we can see that there is a mapping 4. XML Scissor-lift: An XML to that has been applied from the source Atom XML document to an RDF vocabulary called AtomOwl [6] RDF Mapping Language which is an RDF modelling of Atom Feed format [7]. The following schema fragment illustrates how a May be it should have been called X2RML (XML to conventional Schematron report can capture the RDF Mapping Language), but, XML Scissor-lift derives necessary information to build the subject, predicate and its name from the term 'schema lifting', which is used by object components of an RDF triple: the Semantic Web community to describe the process of lifting the semantics of an XML grammar to a higher level by mapping its elements and attributes to terms in a atom:title is missing, target RDF vocabulary. this is a required element. Scissor-lift uses Schematron along with aspects of the XML Pipeline Language (XProc) [8], URI Templates [9], the Web Application Description Language (WADL) The '' [10] and the Triples in XML (TriX) [11] representation feed has a title of '' of RDF to provide some pre-existing patterns that should make it quicker to pick-up. XSLT is used for pre- compilation processing, extending Schematron and post mapping conversion to alternative RDF graph The 'http://example.org/feeds/60a76c80' serializations. feed has a title of 'Example Feed' 4.1. A Basic Mapping Description In addition, if we want to introduce data validity constraints then the Schematron iso:assert instruction Looking at the previous source XML (Atom Feed) and is still available for ensuring the integrity of the source the target RDF Vocabulary (AtomOWL), we see here a XML before it is mapped to RDF. simple Scissor-lift mapping document:

Simple Atom Feed Rules

Atom Feed http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://bblfish.net/work/atom-owl/2006-06-06/#Feed http://bblfish.net/work/atom-owl/2006-06-06/#title http://bblfish.net/work/atom-owl/2006-06-06/#updated

Page 105 of 162 Schematron - More useful than you’d thought

In the above example we first define a feedURI variable The grammar for the triples is taken directly from the that takes its value from the feed's atom:id element, this TriX RDF representation; the uri, plainLiteral and variable has global scope so can be used in any mapping typedLiteral elements have been extended to accept rule. Then we define a pattern for the Feed which either a literal string as their value or a select attribute contains two rules; the first rule’s context is the which takes an XPath expression, including declared document’s root and we generate a triple on matching variables, that are evaluated at run time to provide the the atom:feed element where we map the element to the value for those elements. The context for the XPath Feed class of the AtomOwl Ontology. The match attribute expression is the context expression for the enclosing takes an XPath expression who's context is that of the rule. containing rule. The second rule’s context is the By default, Scissor-lift will generate a simple XML atom:feed element itself and we generate triples upon representation of the resulting graph using the TriX matching the atom:title and atom:updated elements. In format, which directly corresponds to the format that we both these cases we identify the predicates from the use to express the mapping: target vocabulary's title and updated terms respectively and selects the value of the objects from the atomized value of their matched elements. http://example.org/feeds/60a76c80 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://bblfish.net/work/atom-owl/2006-06-06/#Feed http://example.org/feeds/60a76c80 http://bblfish.net/work/atom-owl/2006-06-06/#title Example Feed http://example.org/feeds/60a76c80 http://bblfish.net/work/atom-owl/2006-06-06/#updated 2003-12-13T18:30:02Z

From the TriX representation it is easy to generate N- 4.2. Compiling and Executing a Mapping Triples [12], Turtle [13], JSON-LD [14] and RDF/XML [15] using a subsequent XSLT transform. As the mapping language is an extension of Schematron so its compiler implementation is an extension of the Schematron transforms. ISO Schematron uses a three step compilation process to generate the resulting XSLT transform: 1. Include 2. Expand 3. Compile

Page 106 of 162 Schematron - More useful than you’d thought

which Scissor-lift adds a 'Translate' step before Schematron's Include step. The translation step converts the lift document into a ISO Schematron schema document: http://bblfish.net/work/atom-owl/2006-06-06/#title becomes: http://bblfish.net/work/atom-owl/2006-06-06/#title

The children of the iso:reprot element are not in the The transform can also operate in an XML database ISO Schematron namespace so are not touched by where it can be set to work converting XML documents Schematron's Include and Expand transforms but are in the database to RDF that can, in turn, be inserted into processed, as we will see later, by the extensions to the an RDF Triple Store. In the case of sending triples to a Compile transform. Triple Store, one of the standard RDF representations The resulting XSLT transform can be used to convert can be generated by using an XSLT transform to convert individual XML source document instances into RDF the afore mentioned TriX output into one of the and the transform can be run in editors like oXygen [16], recommended representations: N-Triples, Turtle, JSON- where I tend to do all my development, and I use an LD or RDF/XML. XProc pipeline to orchestrate the compilation and transformation processes. 4.3. Advanced Mapping Features

The following sections look at more advanced features of Scissor-lift that help us to build URIs, re-use definitions and determine data types.

4.3.1. Constructing URIs - URI Templates

A useful feature that was copied from WADL, but is a standard in its own right, are URI templates. Atom Entry

http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://bblfish.net/work/atom-owl/2006-06-06/#Entry

Page 107 of 162 Schematron - More useful than you’d thought

Rather than using, a potentially long-winded, concat() 4.3.2. Reuse - Abstract Patterns and Rules XPath expression to build a URI within a select attribute we can make use of the URI Template syntax Another feature that has been re-used from Schematron and parameter declarations to insert values into the URI is the Abstract Patterns and Rules. Defining groups of declared in the template attribute. In the above example triple mappings that can be re-used throughout the we have the subject URI being comprised of two variable whole mapping saves a lot of time and duplication. In components, the feed's ID {feedID} and the entry's ID the following example we will see that there are a set of {entryID}. The param elements define the expressions required elements for Atom that are common to both the that retrieve their respective values from the source XML. feed and its entries: title, id and updated. The 'required' Although, not strictly speak required, the type attribute pattern is defined as abstract and its definition is used is provided to remind you what type of value you're where a real pattern identifies itself 'is-a' instance of the generating. 'required' pattern. The URI templates can be used on subject, predicate and object URIs, giving you maximum flexibility in create mappings. Abstract Required

http://bblfish.net/work/atom-owl/2006-06-06/#title http://bblfish.net/work/atom-owl/2006-06-06/#id http://bblfish.net/work/atom-owl/2006-06-06/#updated

Feed Required

Entry Required

Page 108 of 162 Schematron - More useful than you’d thought

The parameters context and thisURI, defined where the A more complex mapping used to convert an Atom link abstract pattern is used, have their values substituted into element into its corresponding triples can be defined as the resulting triple mappings. an abstract rule: Abstract Rules

http://bblfish.net/work/atom-owl/2006-06-06/#link http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://bblfish.net/work/atom-owl/2006-06-06/#Link http://bblfish.net/work/atom-owl/2006-06-06/#to http://bblfish.net/work/atom-owl/2006-06-06/#src

Atom Feed Root

http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://bblfish.net/work/atom-owl/2006-06-06/#Feed

and re-used in a pattern for the given context. Here the 4.3.3. Hierarchical Structure and Blank Nodes abstract rule is defined in the same way as the abstract pattern but is used where a rule declares that it 'extends' Quite often there will be situations where you need to the abstract rule. map some XML node to a Class in the target vocabulary. In the previous example the atom:link element has attributes we also need to capture in the mapping, and although the link element is implicitly related to the feed by being a child of that feed we need to express that explicitly in RDF. The link has no specific ID (an anonymous resource) so we use the concept of a Blank Node (BNode) to assign a unique identifier to the link that is unique within the scope of the graph.

Page 109 of 162 Schematron - More useful than you’d thought

Scissor-lift provides a select attribute on the id element You can explicitly set the type of the triple's object using to define a context for creating the unique ID that can be the type attribute on the typedLiteral element. shared by the respective subject and object URIs. However, if you leave this attribute off and, as already mentioned, you have an XML Schema in place, the 4.3.4. Type Discovery - XML Schema Reflection transform will use the XML type annotations on the source XML to determine the base, Simple Type, of the As mentioned previously, the mapping process can run in source nodes and automatically assign that to their an XML database, and in the case of MarkLogic [17] this corresponding triple objects. enables an additional feature of Scissor-lift - auto- datatype discovery. If the source XML documents have 4.3.5. Data Integrity Checking - Assertions an XML Schema associated with them, that is also loaded into MarkLogic, then during the conversion to Scissor-lift makes use of Schematron's iso:report RDF the transforms make use of MarkLogic's, little instruction to carry the triple annotations into the known, Schema Components API. compilation process. This leaves us free to use Schematron's iso:assert instruction to provide content validation tests that will help us ensure the integrity of the source XML. atom:title is a required element. http://bblfish.net/work/atom-owl/2006-06-06/#title

The above example illustrates how an assertion can be (5. continued) Resulting from the compiler, the XSLT used as part of a triple mapping to ensure that the source transform contains the templates that match the rule XML contains a required element. The mapping process contexts and the logic that triggers the triple generation will fail for the context document if an assertion fails. according to the matches declared in the mapping rules. 5. How Schematron was Extended 6. Other Mapping Options

The reference implementation of Schematron, which uses There is another, pre-existing XML to RDF mapping XSLT for the compilation and validation execution, has technique that was devised as part of the all- been written with extension in mind. This was how the encompassing W3C Web Services work. earlier version was extended to incorporate the SVRL report output. Starting from this point, it is relatively 6.1. Semantic Annotations for WSDL and simple to override the SVRL generating templates to XML Schema (SAWSDL) create the XML you wish to output from a report. Scissor-lift uses XSLT's import mechanism to The SAWSDL [18] technique uses annotation written override the original behaviours of the into an XML Schema to define the mapping from XML iso_schematron_skeleton_for_saxon.xsl transform, to a target RDF vocabulary. process-root where the main jump-off points are the " " container for the rules and the iso:report instructions ... respectively. From this point onwards it is a case of creating templates that process the Scissor-lift triple- mapping instructions that will replace the original SVRL output.

Page 110 of 162 Schematron - More useful than you’d thought

The above example demonstrates a simple mapping of XML Scissor-lift is one such extension of Schematron the feed's root element to AtomOwl's Feed class. This that illustrates how Schematron can be used to generate a approach works fine for simple cases but when building a completely different type of 'report' document. The SAWSDL processor you are very limited as to how you pipelines, transforms and related code have been used in create subject and object URIs without reverting to a number of projects to map existing XML source referencing external XSLT transforms to do the content/data to RDF to good effect. complicated pieces. Other issues with SAWSDL include the need for an 8. Further Work XML Schema in the first place and the willingness to add annotations to it. It was these restriction that provided Any application that embeds XPath into its documents the impetus to come up with an alternative approach should be able to 'validate' the XPath expressions prior to that was both flexible and could be applied to arbitrary evaluating them. Scissor-lift requires an additional and schema-less XML. pipeline step, before 'translation', to check that the XPath expressions are valid. This can be easily achieved 7. Conclusions using a user-defined XProc step to evaluate all the select and match attribute expressions against a 'dummy' source Schematron as a rule-based XML validation language is document. There is no need for the expressions to match very effective and used extensively to build bespoke anything but the simple act of attempting to evaluate validation rule sets. The way the XSLT reference them will allow the XProc engine to parse the expression implementation was built has made it relatively straight and report an syntax errors. forward to extended and override its original behaviour The Scissor-lift project can be found on GitHub [19] to allow new instructions into the Schematron grammar where more documentation and examples are waiting to to generate different outputs from a rule-set processed by be created. the compiler.

Page 111 of 162 Schematron - More useful than you’d thought

References

[1] ISO Schematron. 01 June 2006. http://www.schematron.com/ [2] XSL Transformations (XSLT). W3C Recommendation. 23 January 2007. http://www.w3.org/TR/xslt20/ [3] SPARQL 1.1 Query Language. W3C Recommendation. 21 March 2013. http://www.w3.org/TR/sparql11-query/ [4] A Direct Mapping of Relational Data to RDF. W3C Recommendation. 27 September 2012. http://www.w3.org/TR/rdb-direct-mapping/ [5] RDB to RDF Mapping Language. W3C Recommendation. 27 September 2012. http://www.w3.org/TR/r2rml/ [6] AtomOwl Vocabulary. 26 June 2006. http://bblfish.net/work/atom-owl/2006-06-06/AtomOwl.html [7] Atom Syndication Format. December 2005. http://tools.ietf.org/html/rfc4287 [8] XML Pipeline Language (XProc). W3C Recommendation. 11 May 2010. http://www.w3.org/TR/xproc/ [9] URI Templates. March 2012. http://tools.ietf.org/html/rfc6570 [10] Web Application Description Language (WADL). W3C Submission. 31 August 2009. http://www.w3.org/Submission/wadl/ [11] Triples in XML (TriX). 13 May 2004. http://www.hpl.hp.com/techreports/2004/HPL-2004-56.html [12] N-Triples. W3C Recommendation. 25 February 2014. http://www.w3.org/TR/n-triples/ [13] Terse RDF Triple Language. W3C Recommendation. 25 February 2014. http://www.w3.org/TR/turtle/ [14] A JSON-based Serialization for Linked Data. W3C Recommendation. 16 January 2014. http://www.w3.org/TR/json-ld/ [15] RDF 1.1 XML Syntax. W3C Recommendation. 25 February 2014. http://www.w3.org/TR/rdf-syntax-grammar/ [16] Oxygen XML Editor. by SyncRO Soft SRL. http://www.oxygenxml.com/ [17] MarkLogic Sever. by MarkLogic. http://www.marklogic.com/ [18] Semantic Annotations for WSDL and XML Schema. W3C Recommendation. 28 August 2007. http://www.w3.org/TR/sawsdl/ [19] scissor-lift. Philip Fennell. https://github.com/philipfennell/scissor-lift

Page 112 of 162 Linked Data in a .NET World

Kal Ahmed Networked Planet Limited

1. Introduction 1. Domain models are useful abstractions. A fixed domain model provides focus and enables you to concentrate on expressing the solution to your This paper discusses two different ways in which .NET problem(s). They provide a clear well-understood way applications can access linked data. We start with a to reason about the data that is being handled. discussion of using LINQ to query data from a SPARQL 2. Compile-time models provide compile-time checking. endpoint that will describe how and why you might use This is not to say that dynamically typed LINQ queries against a compile-time data model to programming languages aren't incredibly powerful, query a dynamic, open data set. In the second section we but there is something reassuring about catching discuss OData - Microsoft's approach to publishing data errors before your IL gets generated. on the web - and its relationship to RDF and the Linked 3. A lot of useful .NET functionality (Intellisense for Data approach, and we show how an OData endpoint LINQ in particular) is based on introspection over the can be easily constructed as a type-safe "view" over an domain model. RDF data set. 4. In reality many applications have a data model that is constrained (e.g. by their user interface). 2. LINQ to SPARQL 5. Actually you aren't giving anything up. The domain model does not constrain the underlying data it BrightstarDB1 is an open-source triple store for .NET. represents in anyway, it just provides a formalized way Written entirely in C#, BrightstarDB builds on the of dealing with one particular view of that data. Any DotNetRDF2 library to provide a native .NET number of different domain models can co-exist over persistence mechanism for RDF complete with a a single data set, each providing access to different, RESTful API. However some of the coolest features of possibly overlapping aspects of the data. BrightstarDB are above the raw RDF/SPARQL level where we interact with the world of .NET. One of these 2.2. Data Binding Annotations is the way in which we provide runtime binding of an entity model to RDF triples and extend that to support BrightstarDB uses a model-first approach to defining the converting .NET Language Integrated Query3 (LINQ) objects that are to be data-bound to RDF. The queries to SPARQL queries. programmer creates the interfaces that describes his domain model and uses attributes to decorate the 2.1. Binding a Static Model to RDF interfaces with the additional information required to map types and properties to RDF resources. A default At first this probably seems like a mad idea. You have the mapping is generated by convention, so the minimal flexibility with RDF to create and extend models; to amount of decoration required is simply a marker to tell work with data in an open-world model where anyone the code-generator that this is an interface that needs to can add any properties to anything. A set of C# classes or be processed. interfaces are the exact opposite of that – a closed world, fixed at compile-time. Why would you want to give up [Entity] public interface IPerson { on the freedom of RDF ? string Id { get; } Well there are a number of answers to that: string Name {get; set;} ICollection Knows {get; set;} }

1 http://brightstardb.com/ and source code at https://github.com/BrightstarDB/BrightstarDB 2 http://dotnetrdf.org/ 3 http://msdn.microsoft.com/en-us/library/bb397926.aspx

doi:10.14337/XMLLondon14.Ahmed01 Page 113 of 162 Linked Data in a .NET World

By default an interface is mapped to an RDF resource The example above shows how we have now mapped the with a URI identifier generated from the default object to an existing RDF schema - in this case FOAF. namespace URI with the name of the interface appended The entity type resource URI can be specified in the (with the leading 'I' removed). Similarly, properties are [Entity] attribute; and to override the default property mapped to an RDF resource with a URI identifier type resource URI for a property we add the generated from the default namespace URI plus the [PropertyType] attribute. This example also shows that name of the property with the first character forced to we can either use complete URIs or CURIEs. In the lowercase. The default namespace can be customized latter case, the CURIE prefix mapping is specified in an using an assembly-scoped attribute. assembly-scoped attribute: The Id property holds the unique key value for the [assembly:NamespaceDeclaration( RDF resource that an instantiation of the interface will "foaf","http://xmlns.com/foaf/0.1/")] be bound to. This maps to a URI by appending the key to the default namespace URI. By convention this In addition to data-binding the triples that the entity property may be called Id, ID or {Interface name resource is the subject of, it is also possible to data-bind without the leading 'I'}I[d|D]. So we could equally against the triples that the entity resource is the object of well use PersonId, ID or PersonID as our identifier using the [InverseProperty] decorator: property. Alternatively any other string property can be [Entity("http://xmlns.com/foaf/0.1/Person")] decorated with an [Identifier] attribute to indicate that it public interface IPerson { is the property to be used to reflect the entity key. This property is required to be read-only as the generated class [Identifier("http://example.com/person/")] will automatically assign IDs and provides a separate public string Id { get; } method for overriding the ID that ensures that the triple- [PropertyType("http://xmlns.com/foaf/0.1/name")] store gets correctly updated. public string Name { get; set; } The code generation writes a class that implements the interface. All of the properties included in the [PropertyType("foaf:knows")] interface are implemented in the generated class. The public ICollection Knows {get;set;} data-binder allows all of the common C# value types [PropertyType( including their nullable counterparts, Uri and "http://example.com/foaf-ext/knownBy")] [InverseProperty("Knows")] ICollection as long as the specified type is another public ICollection KnownBy { get; set; } interface that is decorated with the [Entity] attribute. } Any methods declared on the interface are left unimplemented in the generated code, however the code In the example above we add a KnownBy property which generator produces a C# partial class which leaves the data-binds to the inverse of Knows - i.e. all the triples class open for the remainder of the interface to be where the entity is the object of a foaf:knows triple. Note implemented in a separate source file. that the inverse binding is declared by specifying the To map to specific URIs, some more decoration is property that data-binds the triple in its forward required: direction. Although in this case for simplicity the property is on the same interface, it is perfectly valid to [Entity("http://xmlns.com/foaf/0.1/Person")] name a property on another interface. public interface IPerson {

[Identifier("http://example.com/person/")] public string Id { get; }

[PropertyType("http://xmlns.com/foaf/0.1/name")] public string Name { get; set; }

[PropertyType("foaf:knows")] public ICollection Knows {get;set;} }

Page 114 of 162 Linked Data in a .NET World

In the case that an RDF property is only to be data- Figure 1. Classes involved in the data-binding and bound in its reverse direction, it is possible to specify the update process property URI directly using an [InversePropertyType] attribute on the property: [Entity("http://xmlns.com/foaf/0.1/Person")] public interface IPerson {

[Identifier("http://example.com/person/")] public string Id { get; }

[PropertyType("http://xmlns.com/foaf/0.1/name")] public string Name { get; set; }

// NOTE: no "Knows" property

[InversePropertyType("foaf:knows")] public ICollection KnownBy { get; set; } }

2.3. Data-Binding and Updates New entities are either created using their constructor The data-binding process itself is relatively and then added to the appropriate entity collection on straightforward and the model should be fairly familiar the context class; or they are created and added to the to anyone who has implemented or worked with an context using a Create method provided on the entity object-relational mapping tool. In addition to C# partial collection of the context class. Existing entities are classes that implement the entity interfaces, the code- retrieved from the context object (typically via a LINQ generation stage also generates an application context query as discussed later). The data-binding can work class that serves as the main entry point for client code. either eagerly or lazily. Both the generated entity classes and the generated In the case of lazy loading, the initial query returns application context class are derived from base classes only a list of resource identifiers and the entities are that implement most of the mapping logic - the instantiated with just the URI of the underlying RDF generated code is just a thin shim on top that resource reflected as a string key value on the Identity implements the specific properties required by the property of the entity. When any other forward-direction domain model. These base classes themselves make use of property is accessed, a request is made to the triple-store the data-access layer provided by the to retrieve all of the triples in which the entity resource is BrightstarDataObjectStore, this handles the job of the subject - in this way all properties other than inverse managing a collection of triples mapped to generic data properties are populated in one round-trip. With inverse objects. The BrightstarDataObjectStore also abstracts properties only the specifically mapped inverse properties away the differences between BrightstarDB native stores are ever loaded. This pattern was chosen to enable the an and generic SPARQL endpoints. entity to be designed in such a way as to avoid retrieving large inverse collections. For example when a small number of entities are used to classify a large collection of other entities (such as genres of movies), it is often desirable to avoid loading all of the movie-genre relationships when data-binding the genre entity. In the case of eager loading, the LINQ query is converted to a SPARQL CONSTRUCT query which results in an RDF graph containing all the triples in which the entity resource is the subject. Inverse properties are always loaded lazily. The data-binding process attempts to coerce RDF data-types to .NET types. Currently it handles a limited subset of the XML Schema datatypes, through a compile-time mapping.

Page 115 of 162 Linked Data in a .NET World

When an existing resource is data-bound, the local state operation carried out, items are added to or removed of the object is tracked in three parts - the quads loaded from one or both of these collections. The operations are from the server, a collection of quads locally added but summarized in the table below, where E is the URI of the not yet sent to the server, and collection of deletion quad entity resource, PT is the URI of the property, V is the patterns held locally but not yet sent to the server. The new property value, Vo is the old property value, C is the difference between a quad and a quad pattern is that the update context, and * is the special quad pattern latter allows a wildcard value to be used in one or more wildcard value. of the four parts of the quad - the wildcard matches all The table below summarizes how these collections are values so a quad containing wildcards may match updated under different circumstances. multiple quads in the datastore. Depending on the

Property Type: Single-value forward property
  Operation: Set initial value
    Add (E, PT, V, C) to AddTriples
  Operation: Update value
    Add (E, PT, *, *) to DeletePatterns
    Add (E, PT, V, C) to AddTriples
    Remove all quads matching the pattern (E, PT, Vo, *) from AddTriples
  Operation: Set value to null
    Add (E, PT, *, *) to DeletePatterns
    Remove all quads matching the pattern (E, PT, Vo, *) from AddTriples

Property Type: Collection forward property
  Operation: Add item
    Add (E, PT, V, C) to AddTriples
  Operation: Remove item
    Add (E, PT, Vo, *) to DeletePatterns
    Remove all quads matching the pattern (E, PT, Vo, *) from AddTriples
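To make the bookkeeping concrete, here is an illustrative sequence of property assignments with the resulting collection changes noted in comments (the dinner object and its Title property are hypothetical; the comments paraphrase the table above rather than quoting the library's internal API):

    // First assignment of a single-value forward property:
    dinner.Title = "XML London";
    // -> (E, PT, "XML London", C) added to AddTriples

    // Updating the value:
    dinner.Title = "XML London 2014";
    // -> (E, PT, *, *) added to DeletePatterns
    // -> (E, PT, "XML London 2014", C) added to AddTriples
    // -> the locally added (E, PT, "XML London", *) quad removed from AddTriples

    // Setting the value to null:
    dinner.Title = null;
    // -> (E, PT, *, *) added to DeletePatterns
    // -> any locally added quad for the old value removed from AddTriples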

Inverse properties are handled slightly differently. If the property is mapped using the [InverseProperty] attribute, the update can be applied as an update to the entity that has the forward-direction property on it - this ensures that local modifications are properly reflected at both ends of the relationship. If the property is mapped to a URI identifier using the [InversePropertyType] attribute, the update is applied by reversing E and V or E and Vo in the table above.

Local changes made to entities in the context are tracked in this way until the client application calls the SaveChanges() method on the context object. At this point a transaction is prepared and sent to the server. BrightstarDB supports two different types of update transaction. The first is native to the BrightstarDB store; in this transaction the set of triples to be added and triple patterns to be deleted are encoded as N-Triples and passed in a simple JSON data transfer object format. However, the code also supports connections to generic SPARQL UPDATE endpoints, in which case the transaction is formatted as a SPARQL UPDATE request consisting of a DELETE WHERE clause to apply the delete patterns, followed by an INSERT DATA clause.
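For the SPARQL UPDATE case, the generated request has roughly the shape sketched below (held here as a C# verbatim string; the resource URI, predicate URI and literal are illustrative, not output captured from the library):

    // A delete pattern (E, PT, *, *) becomes a DELETE WHERE clause and the
    // locally added quads become an INSERT DATA clause.
    const string sparqlUpdate = @"
    DELETE WHERE {
      <http://example.org/dinners/1> <http://example.org/schema/title> ?o .
    };
    INSERT DATA {
      <http://example.org/dinners/1> <http://example.org/schema/title> 'XML London 2014' .
    }";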


2.4. Type Mapping and Type Conversion

As we have already shown, each entity is mapped to an RDF resource which defines the entity type. This type is assigned to an entity instance using a standard rdf:type triple. However our goal is to support the flexibility to "view" an RDF resource as several different kinds of entity. Hence we have support for an RDF resource to concurrently map to more than one type of entity. All the mapped entities share the same underlying set of triples that are eagerly or lazily loaded from the server. The generated entity classes provide a Become<T>() method. Become<T>() is invoked to map the entity to a new entity type; this will add the required rdf:type triple locally and return a new instance of T that is data-bound to the same set of underlying triples as the object that the Become<T>() method is invoked on. A parallel Unbecome<T>() method removes the rdf:type triple that maps the resource to the entity type T. Become<T>() and Unbecome<T>() provide a halfway-house between dynamic objects and static typing - an object can change its type dynamically at runtime but only between a small set of compile-time types.
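A sketch of how this might look in client code (the IEmployee interface and PayrollNumber property are hypothetical; the point is that both views are bound to the same underlying triples):

    // 'person' is an existing data-bound entity instance.
    // Become<T>() adds the rdf:type triple for IEmployee locally and returns a
    // new view over the same underlying triples.
    var employee = person.Become<IEmployee>();
    employee.PayrollNumber = 12345;

    // Unbecome<T>() removes the rdf:type triple that maps the resource to T.
    employee.Unbecome<IEmployee>();

    // The local type changes are sent to the server with the next SaveChanges().
    context.SaveChanges();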
2.5. Optimistic Locking

Our approach to data-binding also supports a simple form of optimistic locking when performing updates. The locking is based on applying a version number property on each resource. When the resource is loaded to access any of its properties, the version number property is retrieved and the version number is stored. If the resource currently has no version number, a new property is assigned with an initial version number of 1, but in this case the client will be unable to detect any concurrent server modifications. On update, a conditional guard is added to the transaction that ensures that the version number quad still exists in the store, and a delete pattern and a new quad are added to increment the version number.

With the BrightstarDB transaction format this guard is natively supported. The BrightstarDB transaction format allows guard statements that specify a collection of quads that must exist in the store, and a separate collection of quads that must not exist in the store prior to executing the transaction. These guards and the guarded updates are all processed in a single transaction, so either the guards pass and the update completes successfully or else no changes are made to the store. In the case that a guard fails, the failing quads are reported back to the client and these are used in the generated context class to propagate the errors in a meaningful way back to the client application. In the case of the BrightstarDB entity mapping, this means providing access to the list of entities that have been concurrently modified on the server.

Unfortunately this facility cannot be used when a store is updated using SPARQL UPDATE. The reason is that although SPARQL UPDATE supports conditional update through the use of DELETE WHERE and INSERT WHERE, the response from a SPARQL server does not distinguish between updates that completed without making modifications (because of failures to match in the WHERE clause) and updates that completed with the modifications applied. The lack of any report on (for example) the number of triples modified makes it impossible to determine if the update has been applied successfully or not.

2.6. Mapping LINQ to SPARQL

Having a statically typed model for our domain enables us to make use of Language Integrated Query (LINQ). LINQ is a powerful set of features in .NET programming languages that allow programmers to write queries against SQL databases, XML documents, ADO.NET datasets and object collections using the same query language with full type-checking and Intellisense (auto-completion) support.

Typically LINQ is used in ORM scenarios to provide a bridge between a domain object oriented query language and SQL. However, LINQ can be used with other datastores by implementing a LINQ provider. BrightstarDB includes a LINQ provider with support for a significant subset of LINQ functionality that maps LINQ queries to SPARQL using the mapping metadata provided in the entity definitions. This functionality removes the need for developers to learn SPARQL in order to access data from a BrightstarDB or SPARQL endpoint.


The implementation in BrightstarDB is written to be completely SPARQL 1.1 compliant - it does not make use of any special features of BrightstarDB, and so it can be used with any SPARQL endpoint that supports version 1.1 of the Recommendation.

2.6.1. LINQ to SPARQL by Example

As already noted, the annotations used for data-binding enable us to construct a context object which exposes a collection for each type of entity. These collections are LINQ-queryable, and implement the logic required to convert a LINQ query on the collection into a SPARQL query. These collections are always the starting point for a LINQ query against the BrightstarDB entities.

Sample Data Model

In the following examples we will use these entity definitions:

[Entity]
public interface IDinner
{
    [Identifier("http://nerddinner.com/dinners/")]
    string Id { get; }
    string Title { get; set; }
    string Description { get; set; }
    DateTime EventDate { get; set; }
    string Address { get; set; }
    string City { get; set; }
    string HostedBy { get; set; }
    [PropertyType("http://nerddinner.com/schema/attendees")]
    ICollection<IRSVP> RSVPs { get; set; }
}

[Entity]
public interface IRSVP
{
    [Identifier("http://nerddinner.com/rsvps/")]
    string Id { get; }
    string AttendeeEmail { get; set; }
    [InverseProperty("RSVPs")]
    IDinner Dinner { get; set; }
}

In the following discussion we will assume that the base URI configured for types is http://nerddinner.com/schema/. To make the SPARQL more readable, we will assume that the prefix nd is mapped to http://nerddinner.com/schema. In practice the SPARQL query generator only ever uses full URIs in its generated queries.

Simple Selection and Traversal

The simplest query would be to return the ID of each instance of that type from the store. In LINQ:

from p in Context.Dinners select p.Id

We can convert this to a simple SPARQL query for all instances of a Dinner. The type URI convention tells us the URI to use for this type:

SELECT ?p WHERE {
  ?p a nd:Dinner .
}

Traversing a property is simply a question of adding another triple pattern to the query, so to get all RSVPs for a particular dinner, the following LINQ query could be written:

from p in Context.Dinners
where p.Id.Equals("1")
select p.Rsvps;

Note that in the LINQ query the complete URI identifier is not required, it is enough to specify just the entity key (the portion that follows the fixed base identifier URI). The RSVPs property of Dinner is mapped to the predicate type http://nerddinner.com/schema/attendees, so we can simply add a triple pattern using that predicate to the query:

SELECT ?v0 WHERE {
  <http://nerddinner.com/dinners/1> a nd:Dinner .
  <http://nerddinner.com/dinners/1> nd:attendees ?v0 .
}

This approach also works for relationships that run in the opposite direction to their mapped RDF predicate. So with this LINQ query:

from x in Context.Rsvps
where x.Dinner.Id.Equals("1")
select x.Id

the property we are asked to traverse is the Dinner property of an IRSVP instance, which in our mapping is an inverse relationship, so the triple pattern we use appears "backwards":

SELECT ?x WHERE {
  ?x a nd:Rsvp .
  <http://nerddinner.com/dinners/1> nd:attendees ?x .
}


Selecting Property Values

Selecting individual properties from related entities is just a question of adding one more triple pattern:

from x in Context.Dinners
from r in x.Rsvps
select r.AttendeeEmail

Note also that in this query we are finding all dinners and their attendees, not just a specific dinner.

SELECT ?v1 WHERE {
  ?x a nd:Dinner .
  ?r a nd:Rsvp .
  ?x nd:attendees ?r .
  ?r nd:email ?v1 .
}

Filters

Queries that filter on property values are usually easy to map to SPARQL FILTER (this example uses a different data model):

from x in Context.Dinners
where x.EventDate >= DateTime.UtcNow
select x.Id

Processing the LINQ query tree will reveal the datatype of the values in the query (in this case 1.3 is a System.Decimal). We provide a set of mappings between .NET datatypes and XML Schema datatypes.

SELECT ?x WHERE {
  ?x a nd:Dinner .
  ?x ns:eventDate ?v0 .
  FILTER (?v0 >= '2014-03-05T16:13:25Z'^^<http://www.w3.org/2001/XMLSchema#dateTime>) .
}

LINQ Method Calls

LINQ queries allow developers to incorporate the use of method calls within their query. Obviously it is not possible to map all possible methods to SPARQL, but we can map some commonly used methods such as Contains() on a collection:

var cities = new string[] { "London", "Oxford", "Reading" };
var results = from x in Context.Dinners
              where cities.Contains(x.City)
              select x.Id;

This maps nicely to the use of IN in a FILTER:

SELECT ?x WHERE {
  ?x a nd:Dinner .
  ?x nd:city ?v0 .
  FILTER (?v0 IN ('London', 'Oxford', 'Reading')) .
}

Using STRSTARTS() and STRENDS() in SPARQL we can support String.StartsWith() and String.EndsWith(), and using REGEX() we can support the variants that perform case-insensitive comparisons:

q = Context.Dinners.Where(
    c => c.Title.StartsWith("Bright",
        StringComparison.OrdinalIgnoreCase)
    ).Select(c => c.Id);

SELECT ?c WHERE {
  ?c a nd:Dinner .
  ?c nd:title ?v0 .
  FILTER (regex(?v0, '^Bright', 'i')) .
}

Similarly we can use CONTAINS(), STRLEN(), SUBSTR(), UCASE() and LCASE() to support String.Contains(), String.Length, String.Substring(), String.ToUpper() and String.ToLower(). We can also use the various Date/Time functions to support the various properties of a .NET System.DateTime instance; and ROUND(), FLOOR() and CEIL() to support the equivalent functions from the .NET System.Math class.

Anonymous Objects

In addition to returning single properties and instances (more on which later), LINQ allows queries to return anonymous objects. These can be mapped quite easily to SPARQL queries that return multiple columns, at the cost of some extra client-side processing of the results set to generate the anonymous objects.

from x in Context.Dinners
select new { x.Title, x.EventDate };

In this case note the use of OPTIONAL to enable partially constructed anonymous objects:

SELECT ?v0 ?v1 WHERE {
  ?x a nd:Dinner .
  OPTIONAL { ?x nd:title ?v0 . }
  OPTIONAL { ?x nd:eventDate ?v1 . }
}


The properties returned in the anonymous object can also be the result of deeper traversal – the additional steps just get pushed into the OPTIONAL pattern.

from x in Context.Rsvps select new {
    x.AttendeeEmail,
    DinnerTitle = x.Dinner.Title
};

Results in this generated SPARQL query:

SELECT ?v0 ?v2 WHERE {
  ?x a nd:Dinner .
  OPTIONAL { ?x nd:attendeeEmail ?v0 . }
  OPTIONAL {
    ?v1 nd:attendees ?x .
    ?v1 nd:title ?v2 .
  }
}

2.7. Eager Loading Complete Entities

The LINQ to SPARQL examples shown above typically return a single column when returning entities, and multiple columns when returning anonymous objects. In the former case, the returned column contains the URI of the entities that match the query criteria. This leads to a lazy-loading model that effectively requires N+1 server roundtrips to query for and retrieve N entities. In many cases, better performance will be achieved if the client is capable of retrieving all of the triples for the entities that match the query in a single request.

Note
In the following examples, the prefix string bs should be assumed to be mapped to the BrightstarDB-specific namespace URI http://brightstardb.com/.well-known/model/.

2.7.1. A Naive Approach

At first glance it seems like SPARQL 1.1 already provides us with all the necessary tools - CONSTRUCT and subqueries. We can put the generated SPARQL into a subquery, add a triple match pattern to expand the entity URI result into all triples where the entity URI is the subject, and then use CONSTRUCT to generate the resulting graph.

CONSTRUCT {
  ?v0 ?v0_p ?v0_o .
} WHERE {
  ?v0 ?v0_p ?v0_o .
  { SELECT ?v0 WHERE {
      ...
  } }
}

On the client side we will receive an RDF graph. By grouping the triples by their distinct subject resource we can easily extract the result entities and all their triples to pass through to the data-binding layer.

This has the desired effect of returning all the triples we would normally load for the selected entities in a single query, and it is the basis of the approach we use to eager-load query results, but it does have some issues that need to be addressed.

2.7.2. Sorting

One place where the naive approach fails is when it comes to retrieving sorted results. The problem is that the triples in the RDF graph we receive are not guaranteed to be in any particular order. Most of the time a SPARQL endpoint will return a graph in which the triples are in the logically expected order, but the SPARQL specification doesn't require this and so it is not a safe assumption to make.

In addition, CONSTRUCT patterns can only contain variables projected from the WHERE clause and constants, so there is no way to insert a "position in the sort order" value into the graph built by a CONSTRUCT.


However, when we are constructing the SPARQL we do know which variables are to be used for sorting, the order (ascending vs descending) and the priority (sort by X then by Y). So we use this information to ensure that the sort values are added to the CONSTRUCTed graph using well-known predicates:

CONSTRUCT {
  ?v ?v0_p ?v0_o .
  ?v bs:sortValue0 ?sv0 .
  ?v bs:sortValue1 ?sv1 .
} WHERE {
  ?v0 ?v0_p ?v0_o .
  { SELECT ?v0 ?sv0 ?sv1 WHERE {
      ...
  } }
}

This generates a result graph in which each distinct subject resource has not only the triples for all of its properties, but also the sort values for ordering the results on the client. On the client we parse the results graph and execute a SPARQL query against that graph to select the entity URIs in their sort order:

SELECT ?x WHERE {
  ?x bs:sortValue0 ?sv0 .
  ?x bs:sortValue1 ?sv1 .
} ORDER BY ?sv0 DESC(?sv1)

2.7.3. Paging

If we don't use OFFSET and LIMIT then it is only necessary to sort on the client-side. However, when paging comes into play it is necessary to apply the sorting on the server-side to ensure that the correct slice of results gets returned. So the SPARQL sent to the server would include ORDER BY, OFFSET and LIMIT clauses:

CONSTRUCT {
  ?v ?v0_p ?v0_o .
  ?v bs:sortValue0 ?sv0 .
  ?v bs:sortValue1 ?sv1 .
} WHERE {
  ?v0 ?v0_p ?v0_o .
  { SELECT ?v0 ?sv0 ?sv1 WHERE {
      ...
  } ORDER BY ?sv0 DESC(?sv1) OFFSET 10 LIMIT 10 }
}

Note that in this case we still need to apply the sorting client-side because there is no guarantee that the triples in the CONSTRUCTed graph are in sort order, but this sorting will be applied only to the single page of results returned by the server and, as the server has already limited the results returned, we will not need to re-apply the paging.

2.7.4. Distinct Results

The Distinct() LINQ operator (or DISTINCT SPARQL keyword) adds another bit of complexity. Up to this point our result graphs are generated by expanding on the results returned by the subquery. If the query has no sorting applied, it is possible to simply apply the DISTINCT keyword to the subquery. However, once the subquery is extended to project out the sort values, it can potentially end up returning the same entity binding multiple times.

To handle this the subquery needs to be refined. We refine the subquery to return the highest possible value for sort variables that are sorted in ascending order and the lowest possible value for sort variables that are sorted in descending order. This is achieved through the use of grouping and the MIN and MAX aggregates. By grouping by the entity URI and then using MAX and MIN aggregates on the sort values we ensure that the subquery returns only a single row for each binding, yielding the values that sort highest in the final results:

CONSTRUCT {
  ?v ?v_p ?v_o .
  ?v bs:sortValue0 ?sv0 .
  ?v bs:sortValue1 ?sv1 .
} WHERE {
  ?v ?v_p ?v_o .
  { SELECT DISTINCT
      ?v (MAX(?v_sort0) AS ?sv0)
         (MIN(?v_sort1) AS ?sv1)
    WHERE {
      ...
    }
    GROUP BY ?v
    ORDER BY ASC(MAX(?v_sort0)) DESC(MIN(?v_sort1))
  }
}
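For reference, the kind of LINQ query that exercises the sorting, paging and distinct handling described in this section would look something like the following (an illustrative query against the sample Dinner collection, not an example taken from the original text):

    var page = Context.Dinners
        .Distinct()
        .OrderBy(d => d.EventDate)          // primary sort key, ascending
        .ThenByDescending(d => d.Title)     // secondary sort key, descending
        .Skip(10)                           // becomes OFFSET 10 in the subquery
        .Take(10)                           // becomes LIMIT 10 in the subquery
        .ToList();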


3. OData/SPARQL

This section is based on a position paper originally written for the W3C Open Data on the Web workshop1. OData2 is a data access protocol designed to provide standard CRUD and query operations on a data source via HTTP interactions. OData is built on the AtomPub protocol using an Atom structure as the envelope that contains the data returned from each OData request.

We have been working on exposing SPARQL endpoints as read-only OData endpoints3. The following is a description of our motivation and some of the technical details about how we are approaching the problem.

At some level there is no doubt that the SPARQL, RDF, Linked Data Platform4 stack "competes" with the OData stack. Both provide access to generic data models, both attempt to play well as a RESTful architecture, both offer up query languages for use by remote clients over HTTP. OData does a better job of exposing data as entities, and OData support for containers and entity level update is more mature than the LDP effort. RDF on the other hand has a meta-model that is more accessible; the unwavering use of URIs for addressing and identification, both at the level of instance data and the level of ontology, is a real benefit, as is the inherent ability to merge heterogenous data.

However, "competes" is in scare-quotes because it is not useful to think of public open data standards competing with one another with each trying to operate to the exclusion of the other. Rather we should be focussing on the ways in which interoperability can be achieved as running code but also at the level of the standards making process. We have been working with both OData and RDF now for many years and trying to help bridge the gap between these web data worlds at the level of running code. Based on that experience, we have started a project to provide an OData endpoint that sits on top of any SPARQL-compliant endpoint. Technically this is a cool thing to play with, but it is also a way to get a feel for issues in open data protocol interoperability that could benefit from further standards work.
3.1. The Approach

The technical approach we have taken is to implement a proxy that parses OData operations and rewrites them as equivalent SPARQL operations that can then be executed against a SPARQL endpoint. The SPARQL result set is then parsed and rewritten as an OData result set.

The main hurdle to overcome is that an OData service is driven by the underlying domain model. An OData service exposes its ontology and entity collections as a metadata document5 with a well-known URI6, and all queries are expressed in terms of this model. For a generic SPARQL endpoint, this is not necessarily the case.

As the service we are exposing is OData, we have chosen to approach this problem by making use of the annotations extension point in the OData service metadata document. By taking this approach we enable the OData service that the proxy provides to be defined either:

1. As a manually configured OData service metadata document.
2. By conversion from a known RDF schema or OWL ontology.
3. By introspection of the SPARQL endpoint using SPARQL queries that make use of RDF Schema / OWL types and properties.

At present we have not implemented either (2) or (3) but we are fairly confident that some level of useable OData service metadata could be automatically generated in this way. The other potential advantage is that the OData service metadata is simply another resource that can be published, so it would be possible for the owner of a SPARQL endpoint to make an official OData service description available even if they were unwilling/unable to host the OData proxy themselves.

3.2. OData Annotations

To get things working we created a small set of annotations that can be used to decorate the model described in an OData service metadata document. The annotations are used to help map OData entity and property types to their equivalent RDF types.

1. http://www.w3.org/2013/04/odw/
2. http://odata.org/
3. https://github.com/BrightstarDB/odatasparql
4. http://www.w3.org/2012/ldp
5. http://docs.oasis-open.org/odata/odata/v4.0/odata-v4.0-part3-csdl.html
6. http://docs.oasis-open.org/odata/odata/v4.0/os/part2-url-conventions/odata-v4.0-os-part2-url-conventions.html#_Toc372793772


We use the annotation namespace ODataSparqlLib.Annotations for all SPARQL OData annotations. This is just a way of saying that the annotation ODataSparqlLib.Annotations.Uri can be referenced in the metadata document using the short name Sparql.Uri.

3.2.1. Identity Prefix Annotation

This annotation provides a simple mapping from RDF resource URIs to OData simple identity attributes. For example, in RDF we want to be using a URI such as http://www.brightstardb.com/products/1 whereas in the OData entity we want to talk about this as a product with an Id of '1', and connect to this as /products(1). While it would be possible to use the full URI of an RDF resource as its OData identifier, it has implications for consistent URI escaping and for the length of the resulting OData URIs.

To achieve this we use the IdentifierPrefix annotation on the property identified as the Key property for the OData EntityType. The following sample shows its usage.

  Name="Id" Type="Edm.String"
  Term="Sparql.IdentifierPrefix" String="http://dbpedia.org/resource/"/>

In the above example an RDF resource with a URI like http://dbpedia.org/resource/Un_Chien_Andalou can be referenced through the OData proxy as http://example.org/odata/Films('Un_Chien_Andalou'). The identity prefix mapping is specific to an OData entity type, so each type can use a different prefix if required.

3.2.2. Entity Type Mapping Annotation

To indicate how types in OData map to types in RDF we use a Uri annotation on the OData EntityType definition. In this context the Uri annotation simply provides the full URI of the RDF resource that defines the entity type. The following sample shows its usage (the URI is truncated for readability):

  Term="Sparql.Uri" ...

We also allow a default namespace URI for entity types to be defined in the proxy configuration file; any EntityType without an explicit mapping receives a mapping based on appending the EntityType name to the namespace URI. We allow for different string case mappings to also be applied when resolving the name to a URI, e.g. force to lower-/upper-case; force to lower/upper camel-case. This can make for a very lean set of annotations on the OData model.

3.2.3. Literal Property Type Annotation

A similar approach is used to map OData properties to RDF properties. Again, we also allow a default namespace URI for property types to be defined in the proxy configuration file (separate from the default namespace for entity types); any Property without an explicit mapping receives a mapping based on appending the Property name (with case conversion applied) to the namespace URI.


3.2.4. Association Property Type Annotations

In OData, properties that reference other entities are described by a NavigationProperty that defines a traversal of a separately defined Association type. This allows OData to support bidirectional traversal of the relationships between entities. In RDF, resource to resource relationships are directed. To accommodate OData's bidirectionality we introduce an additional IsInverse annotation which can be used in conjunction with a Uri annotation to specify both the RDF property type and the direction in which the property is traversed (subject-to-object or object-to-subject).

Note that the Association definition is currently un-annotated as all the necessary information is conveyed by the annotations on the NavigationProperty. Once again, the NavigationProperty Name can be combined with the default ontology base URI specified in the proxy configuration to avoid the need to provide explicit Uri annotations.

3.2.5. A Larger Example

Putting this all together, here is a complete OData service metadata document with annotations that we can use to expose a subset of DBPedia as OData. In this example, a base type namespace URI of http://dbpedia.org/ontology/ and a base property namespace of http://dbpedia.org/property/ is used, and names are mapped to URI components by forcing them to lower camelcase. In this example, this means that many properties and entity types do not require a Uri annotation to be mapped.




3.3. Request Transforms

Typical OData requests are for either a single entity, a set of entities or a select format that retrieves a result that appears as a table. To map this into a SPARQL query we use the above annotations and then, depending on the target result, use either the SELECT or CONSTRUCT keywords.

OData select queries work in a similar fashion to SELECT in SPARQL by returning a tabular result. So we generate a SPARQL SELECT query and map the resulting SPARQL table to an OData results set.

When processing OData queries that result in an entity or collection of entities, we use SPARQL CONSTRUCT queries. This allows us to retrieve an RDF graph from the SPARQL endpoint that represents the entities, their properties and relationships. The graph is then processed by the proxy into a list of entities and, if necessary, the list is sorted by the sort criteria specified in the original OData query. The way the CONSTRUCT queries are built is very similar to the approach used to map LINQ queries to SPARQL as we have the same constraints and the same requirements around the return results.

3.4. Update

For now update is out of scope for this project, but it could potentially be implemented using the same set of annotations we use for read. The main difference is that we need to make use of SPARQL UPDATE, which is a separate specification and more importantly usually lives in a different place on the server, most likely behind the firewall.

We see the initial benefit of this work as opening up existing RDF triple stores with public SPARQL endpoints to OData applications. Very few of these, for obvious reasons, offer a writeable SPARQL update endpoint. For enterprises with RDF stores the additional update option may be of interest.

3.5. Future work

In its current state, the library is best described as a minimal proof of concept with basic support for selecting entities by their ID or with a few simple property operators such as Equals and GreaterThan. We also have some basic sorting implemented. The current implementation is based on Microsoft OData libraries that implement v3 of the OData specification.

The main work to do is to create a complete implementation of all OData path semantics, filter options, and projection operators and upgrade the project to support OData v4. We would also like to provide tools for generating an OData service metadata document from a SPARQL endpoint or its ontology and to generate / write mappings for some of the existing public SPARQL endpoints.

Frameless for XML - The Reactive Revolution

Robbert Broersma Frameless

Yolijn van der Kolk Frameless

Abstract

What would the web look like with functional reactive templates driven by functional reactive query expressions? Lots of recent innovative developments are significant steps towards faster and more manageable web development, but to really improve our lives by leaps and bounds we must take a step back and consider the requirements for unleashing all this power to front-end developers that aren't fluent in JavaScript.

What would happen if we throw Angular expressions, React's virtual DOM and Reactive Extensions (Rx) in a mix? What if we use declarative syntaxes like XSLT and XPath to compile an instruction set for this engine? What if we can reason about the instructions that make up your website and automatically build minimal and optimized modules?

It's uneconomical to obtain optimal performance for most projects you're working on; there are just too many sides to it: asynchronous tasks, web workers, parallel computations, lazily loading modules, reducing file size, splitting HTML/CSS/JS into modules, combining modules again to reduce HTTP requests, minification, atomic DOM updates, only rendering what's visible, only calculating what is being rendered, only re-calculating what has changed... But we must do better, also because performance is very much about economic inclusiveness. Smaller web pages are essential to those using internet in (remote) areas over slow 2.5G mobile networks, where wireless data charges are high and every CPU cycle counts when you're using a $25 smartphone.

When we've got a reactive template solution in place we can start thinking about using some of the kilobytes we've saved and some of the CPU cycles to add ubiquitous support for unsexy inclusive technologies such as accessibility, Unicode, localization, and security.

1. Introduction

Templates are at the core of most web pages. Historically templates are processed server side, but increasingly additional content is provided with the servers only sending the raw data and scripts from the web page implementing templates to present it.

Conditionally showing validation warnings next to form inputs. A list of autocomplete suggestions. The latest tweets. Showing the number of unread e-mails between parentheses after "Inbox", or not anymore when the last unread mail is opened.

The output of any template is essentially limited to the data that can be targeted using the query language. When an advanced query language is not a significant part of a template engine, complex selecting and filtering must occur in a preprocessing step.

Frameless is created to simplify application development and is, due to its API, great for writing readable code.

Frameless is built on our own XSLT processor running in the browser. It includes a custom built reactive XPath query engine for simple, powerful querying that works across browsers. This way we are able to include useful features like $variables and allowing custom functions in XPath. The current beta release works in all modern browsers, but also works in hostile environments such as IE6 and Firefox 1.0.

We achieve resilience in platform support by employing extensive feature detection and never correlating the presence of a certain platform feature to a certain browser version. This way Frameless releases keep working when browsers with new features are introduced, and it will fall back to more basic platform APIs when methods or properties are removed or renamed.

doi:10.14337/XMLLondon14.Broersma01

Developers can quickly get up to speed with our template engine, because we provide all template instructions directly inside HTML pages, using HTML syntax. These templates consist of custom elements and attributes that instruct Frameless to conditionally show or repeat markup snippets. In addition to this we also support value templates inside { and } brackets for both attributes and text content.

2. Why we created Frameless

Three years ago we started the development of Frameless, because we felt the development of complex web applications really lacks a solid basis. Browser differences, internationalization, scalability and security are recurring issues we think can and should be solved structurally and invisibly.

We wanted to be able to use templates directly in the HTML pages, rather than only in external files. While there is nothing wrong with external XSLT templates, we think there are some interesting advantages in being able to use them inline. When using a more basic syntax for XSLT, programming the templates will be more intuitive to front-end developers, while the resulting code is more readable and self explanatory than JavaScript would be.

XSLT and XPath from versions 2 and up provide very competitive feature sets, and are arguably more mature than the alternatives JavaScript libraries offer. That's why we chose to start development by implementing XSLT 2: designing for complex use cases can lead to a cleaner architecture than optimizing for simple use cases and later bolting on complexity.

By making the templates automatically react to changes in the data and to changing variables, implementing real-time user interfaces doesn't require writing extensive DOM-manipulation code anymore, drastically reducing development time and bugs. Our mission is to create a framework that allows us to write web applications by functionally describing their flow and behavior, rather than describing what each browser needs to hear. We want to rely on the template renderer to resolve cross site scripting vulnerabilities and optimally render changes to the DOM.

3. Example: reactive filtering using full text search

Many user interfaces include search and autocomplete functionality, showing the results as-you-type. Using Frameless no additional code at all is needed to show these results in real time.

In the following example we'll implement finding a person from a list of contacts, where the code in Figure 1 produces the component found in Figure 2. First there is a form that is bound to the $query variable. As the user is typing text into this form field, the $query becomes more specific and filters out all non-matching contacts. For all contacts that do match, we want to highlight the matching parts of that person's name.

Unicode decomposition of text is helpful in separating the diacritics from the letters, allowing comparison against the search query without taking diacritics into account. However, not all characters in Latin script can be decomposed to A-Z. That's why we've added a "confusables" option, that uses the Unicode database to also match characters that look like the original Latin letters. This way you can find names such as "Sørensen" and "Đặng", names that normally wouldn't even show up in search results.

To provide more than just regular expression based string templates, we allow custom tokenizers for the instruction. These custom templates could for example be used to implement syntax highlighting for code in documents like this very document, but in this case we're using the full-text tokenizer to highlight matches to the search query.


Figure 1. Contact Search Code


Figure 2. Contact Search Component


4. Example: reactive grouping and sorting for interactive infographics

In this code sample we will write a couple of lines of code, found in Figure 3, to create an overview of the tallest structures in the world, visually comparing their sizes to other tall structures in that country. First there is an XPath query that loads an external file containing the data. Once the file is loaded, the rest of the query is executed.

All structures are grouped by country, and the countries are sorted by the name of the country in the current locale; English in this example. The next step is to create a line up of all tall structures, tallest structures first. Note also that for formatting the height a localization function is used to present the height in meters according to the customs of the locale.

The final rendered image can be found in Figure 4.

Figure 3. Tallest Structure Code

Tallest buildings: {{localize-country(current-grouping-key())}}
{{name}} ({{year}})
{{format-length(height/meters, 'meter')}}

Figure 4. Tallest Structures Rendered


5. Using Frameless in addition to NoSQL databases

Many organizations work hard to provide indiscriminate access to their websites, regardless of location, ethnicity, physical ability or economic status. In this regard, in the coming years there is likely to be a shift towards "offline first" development, where a reliable internet connection is no longer assumed.

The offline first approach will prove to be very challenging because many kinds of complex computations that are usually conveniently performed by database servers must be carried out entirely using just JavaScript.

In this area Frameless can be especially helpful bridging the gap between client and server capabilities, by providing the same technologies that are already in use on NoSQL servers such as MarkLogic and Zorba.

6. The road ahead

Within the next couple of months Frameless will be ready for us to be used for building a product we have long been looking forward to: a modular word processor. We will also be working with partners who use Frameless for their own application development, and improving the framework where needed. For example: because not all data can be made available as XML, XPath queries in Frameless can also be used for JSON. Frameless is currently mostly optimized for XML so we feel that in the next year an improvement of the current JSON support is in order.

In addition to things that already were on the roadmap, like extending the documentation with more code samples, and Node.js support (allowing you to render the same HTML templates both in the browser and server-side), we have concrete plans to improve localization and performance of Frameless web applications.

Not only do we want to provide text translations inside templates, we also want to implement Unicode CLDR modules for formatting dates, numbers, et cetera. And of course we think users should be able to switch to another locale instantly.

Unfortunately, with every feature we add the download size of Frameless will increase. That's why we're working towards a module system, so every project can include only the functionality it needs, resulting in even smaller JavaScript downloads.

7. Summary

Reactive HTML templates excel at reducing development time, by offering a simple and intuitive syntax while offering a very powerful query language to interactively group, sort and filter data. Combined with very user-centric technologies such as full text search and localization, nothing should stand in the way of you making the spiffy next generation of Wikipedia or Google Docs. Viva la revolución!

Product Usage Schemas

Jorge Luis Williams Rackspace Hosting

Abstract

In this case study we describe the process of collecting, validating, and aggregating usage information in a large public cloud for the purpose of billing. We also describe the Product Usage Schema, a simple language used in-house to describe, version, and validate usage messages as they are emitted by various products across our public cloud.

Keywords: Usage Collection, Usage Validation, ATOM Syndication, XML Schema, WADL, XSLT, Cloud, Utility Computing

1. Background

The advent of cloud computing has created a shift in IT in which users may provision (or deprovision) computing infrastructure and software platform services on demand based on dynamic workloads. These on-demand computing resources may be hosted on premise (private cloud), off premise (public cloud), or may reside both internally and externally (hybrid cloud). Hybrid clouds allow for cloud bursting, the ability to scale work to a public cloud when a workload exceeds the capacity of a private cloud system.

Most cloud computing platforms make extensive use of concepts expressed by Service Oriented Architecture (SOA). In particular, they expose individual products as separate loosely coupled web services. Often these services are written using the REST architectural style as described by [6]. Here, IT resources such as virtual compute nodes, load balancers, databases, file storage volumes, LAMP stacks, and so forth are mapped to URIs accessible on the Internet (or via a private network). Operations on those resources are expressed via the uniform interface provided by the HTTP protocol. Thus, the provisioning, deprovisioning, and customization of resources can be achieved dynamically and with relative ease by simply performing HTTP requests on a resource URI. Likewise, the state of a resource may be monitored via HTTP's uniform interface — that is, by performing a GET on that resource's URI. This accessible interface coupled with the ability to dynamically monitor and control resources are important keys to achieving cloud bursting.

Public cloud providers offer a utility computing model [5] to their customers in which the customer pays only for the actual use of the resources he or she consumes. This utility model requires the provider to implement metering of their services in order to track customer usage and produce monthly invoices. Resources are often sold at different pricing tiers which we refer to as flavors. Some flavors are higher performing or offer additional capabilities and are therefore more expensive than others. For example, virtual machines may be sold in a one gigabyte of RAM flavor or in a 16 gigabyte of RAM flavor. The 16 gigabyte flavor is significantly more expensive. A customer is allowed to change a resource's flavor at will — this is known as resizing. The resize operation may have the effect of upgrading or downgrading a resource and as a result a resource may be charged at different rates during its lifetime.

doi:10.14337/XMLLondon14.Williams01

Cloud providers allow customers to provision resources at different geographical regions. The ability to control the relative location of a resource allows the customer to control network latency, build applications with a higher level of availability and fail over, and abide by governmental restrictions in the storage of sensitive data. It is important to note that the exact location of the data center in which a resource is provisioned is often obfuscated by a service provider because it is considered a security risk to expose it. At Rackspace, for example, we identify regions by using airport codes: DFW, LON, HKG. Also, note that a region may encompass multiple data centers. While customers may control the region in which a resource may be provisioned, the service provider controls the actual data center in which the resource lives.

The process of creating an invoice for a customer can be broken into a number of distinct parts. First, usage data must be collected from individual product services across all regions and aggregated along a number of dimensions such as:

• The owner of the resource (known as the tenant).
• The resource itself (the individual virtual compute node, load balancer, etc.).
• And the billable usage created by the resource — the usage type (CPU cycles, bandwidth, etc).

The aggregation process also involves producing daily summaries of these properties and enriching the data by adding additional attributes such as a unique billable account number. We refer to the process of collecting, aggregating, summarizing, and enriching as mediation. As a result of the mediation process raw usage is converted into mediated usage. Finally, mediated usage is consumed by a number of billing services responsible for rating the usage for a particular billing cycle, applying promotions and discounts, and producing a final bill. Together these services form what we refer to as a billing service layer. An example billing pipeline is illustrated in Figure 1, "Billing Pipeline".

Figure 1. Billing Pipeline
2. Rationale and First Steps

Our initial billing pipeline required the usage mediation system to consume raw usage data directly from underlying product implementations. Products were in complete control of the format they used to produce this data. This solution caused a number of problems:

• Common cross-product attributes differed from one product to the next and the usage mediation team had to keep track of those differences. These differences could be very subtle, for example one product may emit times in central standard time (CST), another in GMT.
• Attributes described in raw usage data are likely to change as a product evolves. Nonetheless, there was no process by which the usage mediation team could be notified of those changes.
• Because raw usage data was not required to adhere to any particular schema, it was difficult to catch data generation errors. Often subtle errors found their way to production systems — even after extensive testing.
• In cases where errors were identified in production, they were often identified in the later stages of the billing pipelines, at times after a customer has been billed. Remediation required the reprocessing of large amounts of data and in the worst case required Rackspace to either compensate a customer or absorb a loss.

To address these issues we undertook a standardization effort in order to bring consistency, strict validation, and versioning of raw usage data into our billing pipeline.

2.1. Using AtomPub

Rather than having the usage mediation system query all products for raw usage, the new pipeline required products to emit usage to a centralized system using the Atom Publishing Protocol (AtomPub) [4]. We selected AtomPub for a number of reasons:

• We required a RESTful solution and the AtomPub protocol is considered an authoritative example of a RESTful protocol.
• The protocol and the underlying Atom format [1] are backed by a large number of implementations. On this front, Rackspace had already developed an implementation of an AtomPub server, Atom Hopper [3], which we could easily leverage.
• The Atom format is extensible, so it would be trivial to extend the protocol to suit our needs.


• There is a standard AtomPub extension to support archiving [2]. This solution provides us with a good model for storing and providing access to long term archives of usage data for auditing purposes.

In the AtomPub model, usage data is packaged into discrete events in Atom Entries. These entries are collected in feeds where we reserved one feed per product. These feeds are mapped to a specific URI using a shared endpoint. For example, all usage data for our load balancer product is submitted to /lbaas/events, usage data for our database product is sent to /dbaas/events and so forth.

Note
Lbaas translates to Load Balancer As A Service and Dbaas translates to Database As A Service. We refer to some products by their internal product name, for example the compute service is known internally as Nova, so the atom feed is mapped to /nova/events.

In order to avoid network delays we deploy one instance of Atom Hopper per region. The regional product service is then responsible for submitting usage to the local instance. For example, usage for a load balancer in the London region is submitted to http://feeds.lon.rackspace.com/lbaas/events and usage for a load balancer in the Dallas region is submitted to http://feeds.dfw.rackspace.com/lbaas/events.

Each atom feed accepts two distinct types of events: USAGE events and USAGE_SNAPSHOT events. USAGE events capture the utility of a resource over a time duration. They are usually emitted at constant intervals throughout the day but they may also be emitted after a resize operation or when a resource is deleted. This is illustrated in Figure 2, "USAGE Events Emitted for a Resource". USAGE_SNAPSHOT events denote cases which do not correspond directly with the utility model such as one time charges and subscriptions. These events are emitted as needed on an ad hoc basis.

Figure 2. USAGE Events Emitted for a Resource

2.2. The Usage Event Format

Products are required to submit usage events in a standard format. A standard raw usage event is made up of two elements: an event element contains attributes which are common across all products, and a product element which contains product specific attributes. A load balancer usage event is illustrated in Example 1, "Load Balancer Usage Event".


Example 1. Load Balancer Usage Event

Both the event and the product element have a set of required attributes:

Required event attributes

id
  A unique identifier for the event.
type
  The type of usage event. May be one of USAGE or USAGE_SNAPSHOT.
version
  A version number for the event format.
tenantId
  The owner of the resource.
region and datacenter
  The specific location of the resource.
resourceId
  A unique identifier for the resource.
resourceName
  The name of the resource as given by the tenant. This will be used in the invoice line item.
startTime, endTime or eventTime
  For a USAGE event startTime and endTime are required and represent the duration for the specified usage. For a USAGE_SNAPSHOT event eventTime is required.

Required product attributes

serviceCode
  A unique identifier for the product.
resourceType
  The type of resource that is being used.
version
  A version number for the product element format.

While an event element is allowed to contain only the required attributes above, the product element may contain additional product specific attributes. It is important to note that the product element is defined in a product specific namespace. In Example 1, "Load Balancer Usage Event", the namespace is http://docs.rackspace.com/usage/lbaas. A product may define more than one type of USAGE event, each with a product element in a separate namespace. The resourceType and serviceCode attributes are defined in the product element because this enables product specific validation. In other words, by defining these attributes in the product element the product can specify a finite set of possible attribute values as part of the product-specific element definition.

Page 136 of 162 Product Usage Schemas

2.3. Validating Events

Validation of usage events is achieved by employing Repose [8]. Repose is a programmable HTTP proxy which sits in front of most of our REST services; it is capable of extending a service's capabilities by implementing common tasks such as authorization, rate limiting, transformation and validation of requests. We augmented our billing pipeline by deploying an instance of Repose in front of our AtomPub server as illustrated in Figure 3, "New Billing Pipeline".

Repose can load a description of a REST service in WADL format [7] and reject requests which do not conform to the contract. We described the process that Repose employs for WADL validation in detail here: [10]. The WADL that we use to validate our usage events looks similar to the one in Example 2, "Usage Validation WADL for Atom Hopper".

Figure 3. New Billing Pipeline

Product Invoices

Product Repose Billing Atom Usage Service Hopper Mediation Layer Product

Product

Example 2. Usage Validation WADL for Atom Hopper


Each Atom feed is represented by a resource element in the WADL. The element is associated with at least two resource types which are defined externally either in wadl/feed.wadl or in wadl/product.wadl. The resource type wadl/feed.wadl#AtomFeed contains common properties of a feed. For example, a feed accepts a GET operation and a feed can contain a list of entries; these are properties shared by all product feeds. The product WADL, on the other hand, defines product specific properties and validations. These product specific resource types prevent the cross posting of usage events. A usage event from the load balancer service, for example, is not acceptable in the DNS feed. Note that several feeds may accept the same product resource types. In the example above, both the OpenStack compute service (nova/events) and our legacy compute service (servers/events) support the resource type wadl/product.wadl#RHEL which contains rules for processing usage messages for RedHat Enterprise Linux (RHEL) licenses. Finally, there is a special resource type, wadl/feed.wadl#Unvalidated; this type loosens validation.

Note that the WADL makes reference to an XML schema document (XSD) [11], core_xsd/entry.xsd. This document contains the definition of the atom entries which contain usage events. We create a special complex type for the atom content element that specifically refers to our definition of a usage event element as illustrated in Example 3, "Atom Entry Usage Content Type".

Example 3. Atom Entry Usage Content Type

In the definition of the event element itself, we allow for an element in a foreign namespace which must be strictly validated against a schema. Thus aside from defining the common cross-product attributes in Required event attributes, the definition of event contains the following:

This allows products to define their own product elements with an XSD. These elements must contain the attributes in Required product attributes, though we provide no explicit constraint on the XSD to enforce this requirement. Additionally, the product XSDs are expected to annotate custom product attributes with instructions that help guide the mediation process. For example, the definition of the avgConcurrentConnections attribute in Example 1, "Load Balancer Usage Event" is defined as:

The amount of concurrent connections.


Here, the usage:attributes annotation denotes that values from this attribute should be aggregated using a weighted average function, that the unit of measure for these attributes is simply a count, and that attributes with the exact same value should not be grouped together. Likewise the bandwidthIn attribute is defined as:

The amount of bandwidth in, in bytes.

Here, the unit of measure is bytes and the values should be summed together during aggregation. The goal of these annotations is to enable the mediation process to dynamically adjust to product changes. Additionally, there is the potential to reduce errors by placing declarative control of the mediation process in the hands of product owners who best understand their product.

3. The Product Schema

Our initial implementation involved the manual creation of product specific XSDs given the rules defined in Section 2.3, "Validating Events". Additionally, the product specific resource types defined in the product WADL had to be maintained by hand, a tedious and error prone process. Since product teams owned their product specific XSDs, it required them to have detailed knowledge of XML Schema, something many teams were not familiar with. What's more, the process of checking that a product schema conformed to the rules we defined in terms of required attributes and annotations was mostly a manual one. To help automate this, we defined a simpler format in which to represent product specific attributes. We refer to documents in this format as Product Schemas. We developed an XProc [9] pipeline to generate product specific XSDs and the product WADL from a collection of these schemas.

3.1. The Product Schema Format

The product schema for the usage message in Example 1, "Load Balancer Usage Event" is illustrated in Example 4, "Load Balancer Usage Product Schema".

Example 4. Load Balancer Usage Product Schema

Lbaas load balancer usage fields.
The amount of concurrent connections.
The amount of concurrent SSL connections.
The amount of bandwidth in, in bytes.
The amount of bandwidth out, in bytes.


The amount of SSL bandwidth in, in bytes.
The amount of SSL bandwidth out, in bytes.
The number of polls per load balancer.
The number of vips per load balancer.
The vip type associated with the load balancer.
The mode determining SSL status on the load balancer.
Is the load balancer currently active?
If the status is SUSPENDED then bandWidthIn, bandWidthOut, bandWidthInSsl, bandWidthOutSsl, avgConcurrentConnections, and avgConcurrentConnectionsSsl should all be 0.
If SslMode is OFF then bandWidthInSsl, bandWidthOutSsl, and avgConcurrentConnectionsSsl should all be 0.
If SslMode is ON then bandWidthIn, bandWidthOut, and avgConcurrentConnections should all be 0.


Here, the Required product attributes are required attributes in the schema; an error is emitted if they are not defined. Note that the resourceTypes attribute is a white-space separated list of resource types — in this case, only one resource type is defined: LOADBALANCER. The product specific namespace is simply defined with the namespace attribute and a general description of the usage event is in the description element.

3.2. Attributes

Attributes in the product schema have the following properties:
• The embedded text in the attribute element serves as the description, and the mediation instructions are defined directly on the attribute element (aggregateFunction, unitOfMeasure). Once we introduced this feature, the mediation team stopped parsing XSD annotations and began consuming our product schemas directly instead.
• Some of the attributes correspond directly with those defined in XSD, in particular name, type and use.
• Enumerations are defined via the allowedValues attribute. Here, allowed values are specified as a white-space separated list of values. Enumerations work with all basic types — not just strings.
• Ranges are defined directly via the min and max attributes.
• The type attribute contains a finite set of types, most inherited directly from XSD. Except that:
  • We extend the list of available types to include simple types that are common in our APIs — such as UUID.
  • The string type always includes a maximum character length of 255 characters as this is required by the back-end billing service layer.
  • Date types are required to use the GMT timezone.
  • List types are supported and denoted with a star (*). For example int*, long*. Since lists are white-space separated attributes string* is not supported but name* is. Note that it is possible to mix list types with ranges and enumerations.

3.3. Assertions

As Example 4, “Load Balancer Usage Product Schema” illustrates, XPath validation assertions are supported. Assertions may have a scope attribute which denotes the current node of the XPath expression. Allowed values for scope are simply entry (the root element of an Atom event) and product (the element containing product specific attributes), with product being the default. An example of an assertion with an entry scope is illustrated in Example 5, “Example non-product Assertion”.

Example 5. Example non-product Assertion

The resourceId for a VIP should be an integer.

How an assertion is translated depends on the scope attribute. Assertions which only affect the product element are embedded directly in the generated XSD as XSD 1.1 [12] assertions. Example 6, “Generated ComplexType for Load Balancer usage message” illustrates the ComplexType in the generated XSD for Example 4, “Load Balancer Usage Product Schema”. Note that, unfortunately, we must generate both a xerces:message attribute and a saxon:message attribute to ensure that a correct error message is displayed when an assertion fails, regardless of which XSD implementation we're using — we continually test against both Saxon and Xerces implementations though we are currently utilizing Saxon in production. This sort of requirement illustrates the benefit of generating an XSD rather than developing it by hand.


Example 6. Generated ComplexType for Load Balancer usage message

Cloud Load Balancer usage fields.
The amount of concurrent connections.
The amount of concurrent SSL connections.
The sum amount of concurrent connections for regular and SSL.
The amount of bandwidth in, in bytes.


The amount of SSL bandwidth in, in bytes.
The sum amount of bandwidth in for regular and SSL, in bytes.
The amount of bandwidth out, in bytes.
The amount of SSL bandwidth out, in bytes.
The sum amount of bandwidth out for regular and SSL, in bytes.
The number of polls per load balancer.


The number of vips per load balancer.
The vip type associated with the load balancer.
The mode determining SSL status on the load balancer.
An attribute determining whether or not the Cloud Load Balancer used an SSL connection. Used for billing purposes.
Is the load balancer currently active?


Assertion: If the status is SUSPENDED then bandWidthIn, bandWidthOut, bandWidthInSsl, bandWidthOutSsl, avgConcurrentConnections, and avgConcurrentConnectionsSsl should all be 0.
Assertion: If SslMode is OFF then bandWidthInSsl, bandWidthOutSsl, and avgConcurrentConnectionsSsl should all be 0.
Assertion: If SslMode is ON then bandWidthIn, bandWidthOut, and avgConcurrentConnections should all be 0.

Assertions with a scope of entry, which involve reaching into different parts of the message, cannot be generated directly into an XSD because the XSD specification does not allow assertions to reach outside of the element being validated (even though at least one XSD implementation currently allows this). These assertions are instead translated into an XSLT transformation [13] and embedded as part of the product.wadl (since Repose supports an extension to WADL which executes an XSLT before validating the request). The generated templates simply copy the input of the transformation, but fail if the XPath assertion is not satisfied. The translated assertion for Example 5, “Example non-product Assertion” is illustrated in Example 7, “Generated XSLT fragment for Load Balancer usage message assertion with entry scope”.

Example 7. Generated XSLT fragment for Load Balancer usage message assertion with entry scope

The resourceId for a load balancer should be an integer.
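To give a feel for what the generated check does, here is a minimal XQuery sketch of the same entry-scoped assertion, written as a standalone query rather than as the generated XSLT described above. The event namespace URI and the event element and resourceId attribute names are illustrative assumptions, not the ones used by the actual system.

declare namespace atom = "http://www.w3.org/2005/Atom";
(: hypothetical namespace and element name for the usage event payload :)
declare namespace event = "http://example.org/usage/event";

(: Return one message for every entry whose load balancer resourceId is not an integer. :)
declare function local:check-entry($entry as element(atom:entry)) as xs:string* {
  for $e in $entry/atom:content/event:event
  where not($e/@resourceId castable as xs:integer)
  return "The resourceId for a load balancer should be an integer."
};

for $entry in doc("usage-feed.xml")//atom:entry
return local:check-entry($entry)

In the actual pipeline the equivalent XPath condition ends up inside the generated XSLT referenced from product.wadl, so that Repose can evaluate it before validation.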


Note that the technique of translating a series of assertions into an XSLT for execution is similar to the technique used by Schematron [14]. The reason for generating an XSL rather than utilizing Schematron directly, in this case, is that validations need to occur at a very frequent interval (thousands of times a second) and the approach in Example 7, “Generated XSLT fragment for Load Balancer usage message assertion with entry scope” is much lighter weight. We do, however, utilize Schematron to help validate WADLs when they are initially loaded by the Repose validator.
Also note that when the scope of the assertion is entry the variables $entry, $event, and $product are defined for convenience and the product namespace is always mapped to the prefix p. The prefixes atom, event and xs are also defined.

3.4. Versioning

It is possible to define multiple product schemas with the same serviceCode and namespace but with a different version number. In this case, our transformation creates a single XSD which supports multiple versions of a message by utilizing the alternative feature of XSD 1.1. The generated product element for our bigdata events illustrates this in Example 8, “Alternatives used for versioning”.

Example 8. Alternatives used for versioning

The versioning feature allows multiple versions of a usage schema to co-exist in the feed. Our versioning strategy requires that versions remain backwards compatible as much as possible. The version attribute in a usage message is used for validation purposes — it identifies which validation rules should apply. It does not imply that the usage messages are entirely incompatible from a client's perspective.

4. Future Work

• Our solution generates XSD 1.1 schemas, but some of our customers would like an XSD 1.0 version because their XML binding tools such as JAXB [15] demand it. While the schemas that we generate take advantage of the conditional inclusion properties of XSD we still run into high levels of incompatibility. We plan on extending our system to generate (non-normative) XSD 1.0 schemas which are more useful.
• The process of validating usage events introduces little run-time overhead. On a production grade system, validation of a usage event takes about 2 milliseconds on average. This overhead has remained fairly stable even as we've introduced many different types of usage events. The initialization process of loading the generated WADL and its related resources (XSDs and XSLTs) has been steadily increasing, however, and now averages about 5 minutes. These long load times mean that introducing new nodes or making frequent schema changes, at run-time, incurs a high cost and has the potential to affect our SLAs. We plan on investigating ways of decreasing these high initialization times — possibly by converting WADLs to an internal representation in an off-line preprocessing step.
• Since our product usage schema decouples us from XSD directly, we are exploring the possibility of supporting alternate formats for our usage message schemas. We also plan on developing tooling to aid products in the generation and maintenance of their product schemas.


5. Conclusion

In this case study, we examined the process of collecting, validating, and aggregating usage events in a large public cloud. We leveraged an XML based solution to transform our existing billing pipeline from one in which usage data was defined in irregular and inconsistent ways to one in which strict standards were followed and a declarative language was used to specify validation, versioning, and mediation rules. The new usage event model has allowed us to catch errors early which has significantly reduced the number and impact of errors on our customers. What's more, we've been able to achieve this strict level of validation with little to no negative impact on the overall performance of our system.
Our simple product schema format has allowed us to leverage the power of XSD and WADL without having to get lost in the idiosyncrasies of these languages. By doing so, we were able to design product specific usage data in a highly consistent manner and to communicate requirements to the usage mediation team accurately.

Bibliography

[1] The Atom Syndication Format. M. Nottingham and R. Sayre. https://tools.ietf.org/html/rfc4287
[2] Feed Paging and Archiving. M. Nottingham. https://tools.ietf.org/html/rfc5005
[3] Atom Hopper. http://atomhopper.org/
[4] The Atom Publishing Protocol. J. Gregorio and B. de Hóra. https://tools.ietf.org/html/rfc5023
[5] IBM Systems Journal. 43. 1. 2004. “Utility Computing”. J. Ritsko and B. Birman. http://www.research.ibm.com/journal/sj43-1.html
[6] R. Fielding. Architectural Styles and the Design of Network-based Software Architectures. http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
[7] M. Hadley. Web Application Description Language. http://www.w3.org/Submission/wadl/
[8] Open Repose. http://www.openrepose.org/
[9] N. Walsh, A. Milowski, and H. Thompson. XProc: An XML Pipeline Language. http://www.w3.org/TR/xproc/
[10] Proceedings of Balisage: The Markup Conference 2012. Balisage Series on Markup Technologies. 8. August 7-10, 2012. “Using XProc, XSLT 2.0, and XSD 1.1 to validate RESTful services.”. J. Williams and D. Cramer. doi:10.4242/BalisageVol8.Williams01
[11] XML Schema. http://www.w3.org/XML/Schema
[12] W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes. D. Peterson, S. Gao, A. Malhotra, C. M. Sperberg-McQueen, and H. Thompson. http://www.w3.org/TR/xmlschema11-2/
[13] The Extensible Stylesheet Language Family (XSL). http://www.w3.org/Style/XSL/
[14] Schematron. http://www.schematron.com
[15] Java Architecture for XML Binding (JAXB). https://jcp.org/en/jsr/detail?id=222

An XML-based Approach for Data Preprocessing of Multi-Label Classification Problems
doi:10.14337/XMLLondon14.Goncalves01

Eduardo Corrêa Gonçalves Universidade Federal Fluminense (UFF)

Vanessa Braganholo Universidade Federal Fluminense (UFF)

Abstract

Most of the data mining tools are only able to work with data structured either in relational tables or in text files with one record per line. However, both kinds of data representation are not well-suited to certain data mining tasks. One example of such a task is multi-label classification, where the goal is to predict the states of a multi-valued target attribute. This paper discusses the use of XML as an alternative to represent datasets for multi-label classification processes, since this language offers flexible means of structuring complex information, thus potentially facilitating the major steps involved in data preprocessing. In order to discuss from a practical point of view, we describe the steps of an experience involving the preprocessing of a real text dataset.

Keywords: Data Preprocessing, Text Categorization, Multi-Label Classification

1. Introduction

Multi-label classification (MLC) can be defined as the task of automatically assigning an object into multiple categories based on its characteristics [1]. One of the most common applications is text categorization, where the goal is to associate documents to various subjects. For example: a trained multi-label classifier could process the text summary of the movie "The King's Speech" and determine its genres as "Biography", "Drama", and "History". Besides text categorization, other important applications for MLC include functional genomics (determining the functions of genes and proteins) and the semantic categorization of images, video and music.
The MLC problem is more challenging than traditional single-label classification (SLC), in which objects can be associated with only a single target class. Typically, MLC applications have to deal with a huge number of possible label combinations. Considering a problem involving q labels, the size of the output space in MLC is 2^q whereas it is just q in SLC. Another important issue concerns the existence of correlations between labels. For instance, a movie is unlikely to be simultaneously labelled as "Romance" and "Horror", since these two genres have a strong negative correlation. Thus, algorithms capable of modelling label correlations tend to be more accurate. Actually, recent research in MLC has primarily concentrated efforts on the development of scalable techniques for modelling label dependencies [2], [3], [4].
However, an important characteristic that differentiates MLC and SLC has often been neglected in the literature. It corresponds to the fact that real-world multi-label datasets (e.g. text data, multimedia data, etc.) are usually much larger and more complex in structure than single-label datasets. In spite of this situation, current multi-label platforms [5], [6] have adopted the traditional ARFF flat file format ("one record per line") to represent target datasets. This leads to two main problems: (i) the format is unnatural for the representation of multi-label data; (ii) the flat file format makes data preprocessing activities considerably more cumbersome, since it is not suitable for querying and transformation.
In order to tackle these problems, this paper proposes an XML-based approach for data representation and preprocessing in multi-label classification. The main goal is to discuss the advantages offered by this approach to users of data mining tools, focusing on the text


categorization problem. The rest of this paper is organized as follows. Section 2 presents a comparison between the ARFF and the XML formats, highlighting the advantages associated with the use of the latter format. Section 3 describes an experiment that applied the XML approach for data preprocessing of a real-world dataset with movie information. Section 4 summarizes this work and points to future research directions.

2. Dataset Formats: ARFF versus XML

Mulan [6] and MEKA [5] are the two most used platforms for multi-label classification in research projects. Both work on top of the well-known Weka API [7] and adopt Weka's standard ARFF format for dataset representation. This format is simple and intuitive, having become popular in the data mining field. Even so, it is not suitable for representing multi-label datasets. To support this claim, consider the following example. Suppose we want to perform the multi-label classification of a movie database. The goal is to classify the genres of movies as a function of the movie summaries. In this case, both Mulan and MEKA would require an ARFF dataset structured similarly to the example below.

@relation Movies
@attribute a {0, 1}
@attribute abandon {0, 1}
@attribute about {0, 1}
...
@attribute zoology {0, 1}
@attribute genre_drama {0, 1}
@attribute genre_mystery {0, 1}
...
@attribute genre_romance {0, 1}
@data
0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,1,...
1,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,...
...

Observe that the ARFF format requires the declaration of every distinct word as a different attribute in the header. The data itself is structured in the bag of words (BOW) format. Each line contains the information of a different movie. The 1 and 0 values represent, respectively, the presence and the absence of the corresponding word in the summary. Although the Weka API offers filters for the automatic conversion of free text into BOW, this conversion represents just a small part of the data preprocessing scenario. Typically, users of data mining tools need to explore and transform data before executing the classification algorithm. The BOW format is rather inadequate for querying and transformation though.
XML is a language for specifying semi or completely structured data adopted as a standard by many industries [8]. In the below code we show the "movies dataset" now structured in XML format. (Movies XML sample: "127 Hours", with summaries "127 Hours is the true story of ..." and "On April 2003, the engineer, climber ..." and the genres Adventure, Biography, Drama and Thriller; "The Bridges of Madison County", with summary "Photographer Robert Kincaid wanders ..." and the genres Drama and Romance.) We can immediately realize that XML allows for the representation of movie information in a very natural way: each movie is simply delimited by its own tags. More interestingly, the definition of multi-valued data, i.e., movies with multiple summaries and genres, can be done in a quite straightforward fashion (differently from the ARFF format). However, the major advantage is that XML is able to potentially enhance data preprocessing procedures. Through an XML environment, users of a data mining tool can visualize, query, explore, and transform data in a simpler and faster way, by means of the standard languages XSLT [9] and XQuery [10] or using the SAX API [11]. In the next section, we discuss the advantages of the XML-based data preprocessing approach from a practical point of view, describing an experiment performed over a real-world text dataset.

3. Experiment

The IMDB dataset [12] is a real-world dataset that keeps information about movies. Data is originally supplied in several plain text files¹ which are weekly updated with new information. Each file stores different information regarding the movies. The experiment described in this section involved the files "plot.list" (movie summaries) and "genres.list" (movie genres). The goal of our evaluation was to assess the advantages of using the XML representation in the data preprocessing step (step 2 in Figure 1, “Proposed steps for the construction of the multi-label classifier for IMDb data”), supposing that a multi-label classifier to associate movies to genres would be constructed.

¹ http://www.imdb.com/interfaces


Figure 1. Proposed steps for the construction of the multi-label classifier for IMDb data
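Step 2 of the figure is where the XML representation pays off. As a flavour of the kind of query used there, the XQuery sketch below computes the genre frequency table described in the list further down; it is only an illustration, and the imdb.xml file name and the movie/genre element names are assumptions rather than the structure actually used in the experiment.

(: count how many movies carry each genre, most frequent first :)
let $movies := doc("imdb.xml")/*/movie
for $genre in distinct-values($movies/genre)
let $count := count($movies[genre = $genre])
order by $count descending
return concat($genre, ": ", $count)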

The first step of the experiment consisted in the development of a simple Java program called "imdb2XML" to generate the XML dataset from the plot and genres plain files. The structure of this dataset is similar to the one shown in the movies XML file presented in the last section. The final XML dataset was composed of 153,499 movies, which corresponds to the number of movies that are stored both in the plot file and in the genres file. The second step represents the main goal of this paper: the defence of an XML-based data preprocessing environment for multi-label classification problems. During the experiment, at this step, the XQuery language and the SAX API were used to query, explore and transform the XML IMDb dataset. Below, we list some of the several preprocessing procedures that were performed. Observe that most of these procedures cannot be directly performed over ARFF datasets.
• Construction of a frequency table of movie genres. Among the 28 possible genres, some of the most frequent are "Drama" (59,177), "Comedy" (38,377), and "Documentary" (27,590).
• Generation of a word frequency table. This table was queried in several different ways. The query results were used to collect information about the data and to guide data transformation operations over the XML dataset. Some examples:
  • About half of the words occur only once in the database (e.g. "agnosticism", "polyvision"). These words were removed from the dataset.
  • Several misspelled words and typos (e.g. "caracters", "theforce").
  • There is a large number of proper names, e.g. "Robert" (3,053), "Rosemary" (229).
• Construction of a cross-tabulation table of words and genres. This table has helped us to identify the words that are most correlated to certain genres (ex: word "Broadway" and genre "Musical").
• Calculation of the correlation coefficient between all pairs of labels (e.g.: "Drama" x "Action", "Drama" x "Horror", etc.) to identify labels that are positively and negatively correlated.

4. Conclusions

Multi-label classification is a challenging problem in the field of data mining, which requires considerable efforts in data preprocessing. In this paper we discussed the use of an XML-based approach as a feasible and adequate alternative for data preprocessing of multi-label datasets. The advantages of XML could be evidenced in two distinct forms: first, through a comparison between the XML and ARFF formats; second, through an experiment involving a real text dataset. In this experiment, the XQuery language and the SAX API were employed to perform data exploration and transformation. As future work we intend to conduct new evaluations on different multi-label datasets aiming at developing a general XML-based framework for data preprocessing.


Bibliography

[1] A tutorial on multi-label classification techniques. Andre de Carvalho and Alex Freitas. Foundations of Computational Intelligence, 5, 2009. ISBN: 978-3-642-01535-9. doi:10.1007/978-3-642-01536-6_8
[2] A genetic algorithm for optimizing the label ordering in multi-label classifier chains. Eduardo Corrêa Gonçalves, Alexandre Plastino, and Alex Freitas. ICTAI 2013. ISBN: 978-1-4799-2971-9. doi:10.1109/ICTAI.2013.76
[3] Classifier chains for multi-label classification. Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. Machine Learning 85(3). ISSN: 0885-6125. doi:10.1007/s10994-011-5256-5
[4] Incorporating label dependency into the binary relevance framework for multi-label classification. Everton Alvares-Cherman, Jean Metz, and Maria Carolina Monard. Expert Systems with Applications 39(2). ISSN: 0957-4174. doi:10.1016/j.eswa.2011.06.056
[5] MEKA - A multilabel/multitarget extension to WEKA. SourceForge.net. http://meka.sourceforge.net/
[6] Mulan: A Java library for multi-label learning. SourceForge.net. http://mulan.sourceforge.net/
[7] Weka 3: data mining software in Java. University of Waikato. http://www.cs.waikato.ac.nz/ml/weka/
[8] XML: some papers in a haystack. Mirella Moro, Vanessa Braganholo, Carina Dorneles, Denio Duarte, Renata Galante, and Ronaldo Mello. ACM SIGMOD Record 38(2). ISSN: 0163-5808. doi:10.1145/1815918.1815924
[9] XSL Transformations (XSLT) Version 1.0. W3C. http://www.w3.org/TR/1999/REC-xslt-19991116
[10] XQuery 1.0: An XML query language (second edition). W3C. http://www.w3.org/TR/xquery/
[11] SAX 2.0.1: Simple API for XML. David Megginson and David Brownell. SAX Project. http://www.saxproject.org/
[12] IMDb – The Internet movie database. IMDb.com, Inc. http://www.imdb.com

Using Abstract Content Model and Wikis to link Semantic Web, XML, HTML, JSON and CSV
Using Semantic Media Wiki as a mechanism for storing format neutral content model
doi:10.14337/XMLLondon14.Rzedzicki01

Lech Rzedzicki

1. Introduction

2013 has been hyped as the year of Big Data [1]; 2014 is still about projects dealing with a deluge of data and this trend is going to continue as organisations produce and retain exponentially growing amounts of data, outpacing their capability to utilise the data and gain insight from it.
One method of dealing with the data flood is modeling the data - applying rules to ensure it is consistent and predictable where possible, and flexible everywhere else, providing definitions, examples, alternatives and connecting related structures.
On one hand of the modeling spectrum is the traditional relational data modeling with conceptual, logical and physical models and levels of normalization. Such a strict approach is definitely working well in some environments, but not in publishing where requirements are in constant flux and are rarely well defined.
On the other hand of the spectrum is the 'NOSQL'¹ movement where data is literally dumped to storage as is and any data validation and modelling is kept in the application layer, and therefore needs software developers to maintain. At the moment NOSQL developers are a scarce minority amongst established publishers and a rare and expensive resource in general.
To balance these needs and problems, at Kode1100 Ltd² we have designed and developed a modeling system which to a large extent is resilient to changes in developer fashion and taste and can be maintained by technically savvy and otherwise intelligent folks who do not have to be full time programmers.

2. Background

To understand better the problems the Abstract Content Model is trying to solve, we need to learn a little bit more about developer preferences and the publishing industry.
I have found XML and XML validation languages like W3C (XSD) Schema, RelaxNG and Schematron to be excellent, flexible and powerful tools to express the rules about the data.
The fact that these technologies are international standards and have great community support means that it is easier to convince large organisations to invest time and money in such technology.
However, the developer world is much bigger and more fragmented than it was when standards like W3C Schema were formulated.
It is no longer reasonable to expect that everyone will go through the learning curve and adopt XML tools and techniques in their ecosystem.
Developers prefer to use software tools, libraries and techniques that they are familiar with, the ones they use anyway, every day, and they will apply the same principles to encoding data. Web developers use JSON. Excel power users prefer CSV and .NET macros. Attendees of XML London hopefully still prefer XML.
Building a system that would cater to the needs of the above, existing and future groups was one of the core goals.
I will use a concrete use case to illustrate some of the concepts and techniques, but the same principles can be potentially applied to storing and interacting with a model of any structured or semi-structured data, especially if it is representable as XML.

1 NoSQL - In Wikipedia. http://en.wikipedia.org/wiki/NoSQL 2 kode1100.com - http://www.kode1100.com


3. Technology Choices

Being an XML consultant I had a natural bias to use XML technologies to develop the model, but as I explained in the background section one of the main goals for the Abstract Content Model is the ability to serve the needs of a variety of audiences.
My years of experience have taught me that an open, widely adopted standard is the only sane choice for a long running project and fortunately the clients tend to agree.
We decided that RDF is the right format for the canonical version of the model: it is flexible enough to represent all the concepts but also an open standard that is adopted widely enough. But we needed input and output that is easier to work with than raw RDF-XML. The Semantic Web stack of technologies (RDF, SPARQL, RDFS, OWL etc.) are excellent tools for modeling and queries but have a pretty steep learning curve and face the problem of not being adopted widely enough. Some of the contributors to the content model were expected not to have a background in software programming, so we needed a layer of input and output that was easier to work with.
Therefore we also decided to use wiki software as a way to present and even edit the content model in a user friendly fashion as well as to document how the system works. Our initial user testing confirmed that editing wiki pages, especially with predefined forms, was within the capability of the user base.
We did a review of the available software and, given that the ability to export to RDF was a requirement, we had one obvious choice: Semantic MediaWiki.
MediaWiki is the open source software behind Wikipedia. It is therefore maintained by the Wikimedia Foundation and a wide community of open-source developers.
Semantic MediaWiki is a set of extensions to MediaWiki to give it specific semantic capabilities to describe linked data - forms, infoboxes, RDF export etc. It has a smaller but likewise responsive community. Semantic MediaWiki still has many unresolved bugs, very often around parsing strings, escaping characters and so on, but because it is written in PHP and open-source, a lot of developers can understand the code and contribute to improving the software. At least hypothetically.
With that setup (and a lot of customisation) we were able to export the whole wiki as RDF. It is certainly possible to just download the whole dump and parse that RDF file, but it is definitely not a scalable solution. In my previous project at OHIM, the size of the RDF dump exceeded 300MB.
We have therefore set up a triple store and a SPARQL endpoint. We first tried Jena and have found no fault with it - we haven't done any thorough testing of SPARQL endpoints side by side.
To complement the setup and keep the stack fully Open Source, we are hosting it on Linux virtual machines (currently running Ubuntu 14) and using the Apache HTTP server for complementary functions. The virtual machines are hosted at DigitalOcean and Rackspace, but because the deployment process is automated, it can be moved to another cloud provider such as Amazon, if they become cheaper or more cost effective.

4. Technical Description

The core concept of the Abstract Content Model is to abstract the canonical form of the content model, so that it is separated from any implementation. This allows the content to be neutral to the technology used in an actual production system that uses the model.
The content model is defined as an abstract graph of connected concepts and definitions which in turn have properties that can be abstract or more specific to a representation.
This maps really well to an RDF graph, but by adhering to conventions that graph can be simplified to a tree, and then the whole model can also be represented using hierarchical formats such as JSON or XML.
The implementation is also pretty simple (at least to an XML audience): the canonical form of the model is stored as RDF-XML and there are conversion pipelines to convert to the preferred representation format such as XML, JSON or CSV.
Because we have used Semantic MediaWiki as the software to interface with the content model and because XML is, at least for now, the most used input and output format, we have decided to optimise the content model for that.
Since an XSD representation of the content model was a big requirement anyway and used extensively in the initial modelling, we decided to constrain the Semantic MediaWiki content model to effectively mirror an XSD design pattern. I strongly recommend reading a bit more or refreshing your memory on XSD Design Patterns.


Initially we agreed on the Venetian Blinds pattern. It mapped really well to Object-like representations like JSON, it could potentially lead to being able to generate Object Oriented code classes straight from the content model and it allowed the types to be reused. But in the end it was an abstraction on top of the Semantic MediaWiki abstraction of a content model, which was too complicated, and it also meant that the resulting RDF representation and therefore also SPARQL queries were unnecessarily complicated.
Our next and current approach is Garden of Eden and it is really simple. Every concept in the abstract content model maps to a single page on the Semantic MediaWiki (and therefore it has its own URI, which is really useful for RDF) and is represented as a single element in XML Schema and XML instances.
The choice of Schema Design Pattern is merely a convention and was based on non-wiki related requirements; it may be more suitable for another similar project to adopt a different pattern to suit the product requirements, and such is the case with OHIM TM-XML and DS-XML which use Venetian Blinds patterns.
Semantic MediaWiki has implemented the concept of namespaces, which is really useful to separate content model pages that are processed programmatically from templates, forms, auxiliary pages and just plain wiki pages that just serve as navigation or documentation. These namespaces, at least for now, map 1:1 to RDF namespaces in XML and RDF representations.
Another layer of separation is the Semantic MediaWiki concept of categories. All pages that map to XML elements are in a Category:Elements. This makes Semantic MediaWiki Ask and SPARQL queries really easy and again makes RDF, XSD and JSON clean and easy to query or parse.
In addition to that setup we have developed a set of XProc pipelines to output to some popular and useful formats. The ones we developed initially were XML Sample, XSD, JSON, raw HTML, and HTML+RDFa.
The pipelines are written as a combination of SPARQL queries to grab a relevant subset of RDF (in RDF-XML representation) and then an XSL stylesheet to convert from RDF-XML to the target format.
If the existing formats do not match the requirements, there is also a possibility for a system or a developer to work directly with the RDF. For that we have set up automation to export RDF and load it to a triple store and expose a SPARQL endpoint. As discussed in the technology choices section, the triple store is Jena, which in addition to a raw SPARQL endpoint offers a web form for queries and, optionally, a Linked Data API layer, to automatically offer RDF conversion to JSON, CSV and XML.

5. Content Modeling and everyday use

After installing and configuring the software and developing the required customisations, we were finally ready for some actual content modeling - developing a semi-formal description of the content requirements in a form that can be used by both people and computers. A simple example of such a rule could be: "any printed book must have 1 title and optionally have 1 subtitle".

5.1. Modeling in XSD and importing

Initially we had an empty content model and a lot of rules and content types to describe. It turned out that a lot of these rules could be expressed as W3C Schema constraints, so we started the modelling by:
1. Creating an XML sample - usually from an existing one
2. Generating an XSD from that (using OxygenXML) and refining that XSD manually
3. Importing the XSD and XML into the wiki using a bespoke XSL script to convert XML/XSD into wiki markup (for example the name of the root element would be the name of the wiki page)
4. The import process triggers the regeneration of the RDF dump and that RDF is exported to the triple store.
This process was a tradeoff as it didn't allow us to model rules that can't be expressed in XSD 1.0 (for example maximum character count), but it allowed us to populate the wiki quickly with test data to develop and test features and even other systems and applications.
To automate the process, we have developed XProc to combine the above steps into a single, configurable and executable pipeline. The pipeline is launched from a PHP upload page, so that the user can specify files using a simple form rather than having to use the command line.

5.2. Modeling in the wiki

After the initial modeling, we focused on being able to do some basic modeling tasks on the wiki itself - this better reflected the typical scenarios with the content model. After the initial development, usually only minor adjustments are needed, or there is a need to develop a variant of the content model based on an existing content model. For that we developed the following ways of editing the content model:


• Creating a new element page (mapping to an XML element) with a form
• Editing an existing element/page with that same form
• A working proof of concept of a full-blown XForms based XSD editor running in a browser, inside the wiki. The editor has the capability of constructing an arbitrary XML tree using the current element/page as root. Upon saving, that is transformed back into wiki pages and subsequently into RDF using the same pipelines used for importing XSD/XML.
To sum up the modeling part, while there was a considerable amount of effort to set up the software stack, the core activity of content modeling is pretty trivial and, using the wiki, does not even expose the user to XML or RDF per se - all the modelling activity can be done using a few buttons in the wiki.
This in no way prevents the model from being factually incorrect, but it does remove the technical barriers to keeping the model up to date and reduces the amount of time to respond to changing business requirements.

5.3. Exporting to other formats

Other than modeling and documentation, one of the main goals for the project was to use it as a canonical source from which outputs and software is generated. To enable that, every element/page has the ability to export to all supported formats (XML Sample, XSD, JSON, raw HTML, and HTML+RDFa).
Some examples of where it is useful: export to HTML to populate the website (just add CSS), export to JSON for a JavaScript-based application (the JavaScript will then use the JSON files as its data model).

5.4. Generating apps directly from the model

One of the more exciting features of the content model is the ability to generate custom editing experiences based on the content type in the model.
There is a sufficient amount of information in the model to be able to generate a data-driven editor with the usability appropriate for a given content type.
For example, at book or chapter level we can generate a content planner, which is not very good for writing whole books, but gives a very good high level overview of a book and makes it easy to plan it. On the other end you have an editor for individual sections where a form based editor may be more appropriate.
To prove the idea in practice we have developed exactly that: a high level content planner tool and a low level activity editing tool.
The activity tool is using XForms, specifically betterFORM.
1. Upon launching, the editor launcher interrogates the parameter to see if an editor for that element already exists.
2. If the editor for the element does not yet exist, the launcher will redirect to the editor generator.
3. The generator queries the SPARQL endpoint for the XSD for a given element and generates an XForm from that.
4. The XForm can be tweaked manually for better UX. On subsequent runs the XForm will not be overwritten as per step 2.
5. Finally the editor launches, using the XForm to specify behavior, and opens the XML file provided as a parameter.
6. The editor saves locally to eXistDB. The eXistDB REST API is used to interface with 3rd party systems and other editors.
Being able to generate a customised editing experience by dynamically querying the content model is a very novel approach but, looking at the pace of change on the Internet, it is proving very useful.
An example of that is the need to migrate from Adobe Flash based applications to HTML5. Had the model been stored alongside the Flash application, it would have to be retired or done from scratch or migrated somehow (this requiring someone with knowledge of both Flash and HTML5). With the Abstract Content Model approach, both the Flash and HTML5 apps are generated from the neutral and abstract content model. Because both app models are generated from the same RDF source, it is much easier to specify the migration patterns. In addition, the people developing the HTML5 based apps do not need to know about Flash or even XML and RDF. The most likely route for a Web Developer to interface with the Abstract Content Model is via the HTML and JSON outputs.
Such a setup also allows us to apply the (good) practices of agile development to content modeling and make content modeling an internal part of agile software development. The content model can be dynamically updated and respond quickly to changing requirements.

6. Conclusion

To conclude, the software development world is one of perpetual change. We mustn't oppose the change; it is inevitable and we should embrace and prepare for it.


By abstracting the content model and using Semantic MediaWiki as a mechanism to store it and produce multiple, often unforeseen representations, we future proof and make more valuable the work that has gone into the development of the content model, its documentation, samples and systems around it.
If you are tasked with content modeling in any capacity, which in 2014 and beyond should be an increasingly popular, sought after and profitable activity, I strongly recommend that you abstract your content modeling effort and use Semantic MediaWiki or a similar solution to be able to produce multiple representations of the content model, both now and in the future.
Huge thanks and credits go to Dr. Alex Greig Muir, who was essential in developing this functionality on behalf of Kode1100 Ltd, and the team of Vincente Aguilar, Tomas Gradin and Albert Hervas Pi, who developed a lot of custom functionality for the OHIM project.

Dr. Alex Greig Muir
Alex was essential in developing this functionality on behalf of Kode1100 Ltd and wrote a lot of the code required to make this work.

Vincente Aguilar, Tomas Gradin and Albert Hervas Pi
These brave men, along with yours truly, in spite of bureaucracy and distractingly good weather and food, developed a lot of custom functionality for the OHIM project, which served as a great learning exercise for subsequent abstract content modeling projects.

(Semantic) MediaWiki developer community
Likewise big thanks go to the broader MediaWiki and Semantic MediaWiki community for their ongoing contributions to the open source project.

XML Community
Last but not least huge thanks go to the XML community which made all this possible. As many of you know, I have been a vigorous attendee of many XML events. I came for the technology and stayed for the community!

Bibliography

[1] Mark Barrenechea. Forbes. http://www.forbes.com/sites/ciocentral/2013/02/04/big-data-big-hype/

JSON and XML: a new perspective
doi:10.14337/XMLLondon14.Vlist01

Eric van der Vlist Dyomedea

Abstract

A lot has already been said about the tumultuous relationship between JSON and XML. A number of binding tools have been proposed. Extensions to the XPath Data Model (XDM) and functions are being considered for XSLT 3.0 and XQuery 3.1 to define maps and arrays, two item types that would facilitate the import of JSON objects. The author of this paper has already published and presented papers proposing an XML serialization for XSLT 3.0 maps and arrays, a detailed comparison between XML and JSON data models and a proposal to extend the XDM to better bridge the gap between these data models.
None of these efforts seems to be totally satisfying to eliminate the fundamental impedance mismatch between JSON and XML, suggesting that we may not have found the right angle to look at this problem.
Rather than proposing yet another conversion methodology, this paper proposes a new perspective to look at the differences between JSON and XML which might be more constructive than the ones which have been adopted so far.

1. State of the art

It is now admitted, even among the most traditionalist XML communities, that the ability to seamlessly integrate JSON and HTML5 is key to the future of the XML ecosystem.
When a business object is serialized in XML as:

<anvil reference="acme-5103">
  <weight unit="pound">9.5</weight>
  <composition>best wrought iron</composition>
  <price currency="USD">.15</price>
</anvil>

the underlying XML data model is usually not significant and that raises endless questions such as "should we use attributes or elements for the reference? the weight? the composition? the price?".

Note: It must be noted that there is no such thing as "the XML data model" and this paper will refer to the XDM 2.0. The differences between the XDM versions can be found in my paper for Balisage 2012.

JSON having been designed as the syntax to define literal object data structures in JavaScript, you don't have to answer these questions to serialize the same object in JSON:

{
  "anvil": {
    "reference": "acme-5103",
    "weight": {
      "unit": "pound",
      "value": 9.5
    },
    "composition": "best wrought iron",
    "price": {
      "currency": "USD",
      "value": .15
    }
  }
}

When we look at these two documents with the model of the object that's been serialized in mind, the translation between the two formats seems quite obvious and most of the time it really is obvious.
The simplicity of such approaches has led to a number of software solutions that perform this kind of translation. The trickiest questions they have to solve are:
• Type inference (XML > JSON)
• Distinction between singletons (single element arrays) and primitive types (XML > JSON)
• Distinction between elements and attributes (JSON > XML).
These approaches are working fine in many cases to convert data representations back and/or forward between JSON and XML; however this is not enough when the goal is to provide an automatic conversion of any JSON data structure:
• Some implicit or explicit knowledge of the business data model serialized in the document is needed.


• The "lexical space" of JSON is wider than the lexical space of XML. This is the case for any string, but we can't do much to cope with this difference, and also for JSON keys which are commonly matched to element and attribute names.
A solution to avoid these issues is to extend the XML data model to add JSON maps and arrays to the existing item types. Early Working Drafts of XSLT 3.0 and XQuery 3.1 are both following this approach. These item types are added to the XPath Data Model (XDM) and coexist with the existing item types. The new item types being designed to be a superset of JSON maps and arrays, it obviously becomes possible to consider any JSON object as an XDM first class object.
This does not address the conversion of XML per se into JSON (considered as out of its scope) and creates a clear distinction between XML nodes and the new item types, which cannot be accessed by XPath axes and require their own specific sets of functions.
To avoid this segmentation of item types, several proposals have been made which extend the notion of XML elements to make them compatible with JSON maps and arrays. However it is much harder to change existing XDM item types than add new ones, and these proposals are personal initiatives which are not likely to influence any standard.
Another option is to serialize JSON objects in XML and this is the approach proposed by χίμαιραλ (chimeral) and XSLT 3.0. Such serializations can provide JSON to XML round-tripping but are always verbose.

Note: You'll find a detailed list and classification of some of the solutions in my talk at XML Prague 2013.

2. A new perspective

I have been working on the relationship between JSON and XML since XML Prague 2012. I have read a number of papers, listened to many presentations and done presentations on the topic at Balisage 2012 and XML Prague 2013.
Even if clever practical solutions have been proposed which work well for real world applications, I still felt some itchiness for a topic both so simple and so complicated, and I think that what we need is a new way to look at both formats.
This new perspective will not solve the problem by itself but might be a ground on which new proposals can be built, and the purpose of this paper is not to propose a conversion between JSON and XML but to explore the relationship between these formats.
Let's take the comparison published in my talk at XML Prague 2013:

We have both maps of key/value pairs and ordered arrays in JSON and XML, but that's where the similarities stop!
In JSON, maps of key/value pairs are called objects. Keys can be any string and values may be either primitive or structured types.
In XML, elements have a map of key/value pairs among their properties. This map is called attributes. Keys (i.e. attribute names) are subject to lexical restrictions and values cannot be structured types.
In JSON, ordered arrays are called arrays and their members may be either primitive or structured.
In XML, elements have an ordered array of children nodes among their properties. Their members can be elements, comments, PIs or text nodes. XML text nodes are the kind of nodes which is the most similar to JSON's primitive types. However, adjacent text nodes are concatenated (which of course is not the case of adjacent primitive values in a JSON array).

We are comparing on one side a data model composed of maps, arrays and atomic types and on the other side a data model composed of more complex objects, themselves including maps and arrays.
Is it really a good idea to put on the same ground basic building blocks and components made with these building blocks?
Shouldn't we rather consider XML nodes as assemblies of JSON maps and arrays?
What happens if we try to serialize the XML data model in JSON?
The XDM defines an awful lot of properties and we will focus on the main ones. As a first approximation, an XML element can be considered as a map with:
• A name
• A map of attributes (which must be atomic types)
• An array of children (which can be either strings (text nodes) or elements).
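As a worked example of this approximation, here is a minimal XQuery 3.1 sketch, not taken from the paper, that builds exactly such a map for any element and serializes it as JSON. It relies on the maps and arrays that were still Working Draft material when this paper was written, and the anvil.xml file name is a placeholder.

declare function local:xdm-to-map($e as element()) as map(*) {
  map {
    "name"       : local-name($e),
    "attributes" : map:merge(for $a in $e/@* return map { local-name($a) : string($a) }),
    "children"   : array {
      for $n in $e/node()
      return typeswitch ($n)
             case element() return local:xdm-to-map($n)
             case text()    return string($n)
             default        return ()  (: comments and PIs are left out of this sketch :)
    }
  }
};

serialize(local:xdm-to-map(doc("anvil.xml")/*), map { "method" : "json" })

Applied to the anvil document, and leaving whitespace-only text nodes aside, this yields the structure shown next.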


Serialized with these conventions, the XML document describing the anvil can be seen as:

{
  "name": "anvil",
  "attributes": {"reference": "acme-5103"},
  "children": [
    {
      "name": "weight",
      "attributes": {"unit": "pound"},
      "children": ["9.5"]
    },
    {
      "name": "composition",
      "attributes": {},
      "children": ["best wrought iron"]
    },
    {
      "name": "price",
      "attributes": {"currency": "USD"},
      "children": [".15"]
    }
  ]
}

The difference that immediately strikes the eyes when compared to the previous JSON definition of the anvil is that it is more verbose, and that's true, but this is so by design: this is no longer a JSON description of an anvil but a JSON description of the XML document describing an anvil.
This description is very close to the XDM and can be used with mixed content:

<p>I can support <a href="http://en.wikipedia.org/wiki/PCDATA"><b>mixed</b> content</a>!</p>

becomes:

{
  "name": "p",
  "attributes": {},
  "children": [
    "I can support ",
    {
      "name": "a",
      "attributes": {
        "href": "http://en.wikipedia.org/wiki/PCDATA"
      },
      "children": [
        {
          "name": "b",
          "attributes": {},
          "children": ["mixed"]
        },
        " content"
      ]
    },
    "!"
  ]
}

3. So what?

3.1. A good theory

This exercise clearly shows the relation between JSON and XML:
• JSON is a generic format to describe data structures made of maps, arrays and a few basic simple types.
• XML is a more specialized format to describe trees composed of "nodes" which do carry their own semantics and can be serialized in JSON.
This difference is the essence of the "fat" described by Douglas Crockford in his famous "Fat-Free Alternative to XML" paper at XML 2006 and might have been widely acknowledged by the XML community if presented in a less inflamed fashion.
The questions raised when trying to convert our XML and JSON samples, such as:
• from XML to JSON: is the weight a property of the anvil? can it occur more than once and be included in an array? ...
• from JSON to XML: should the composition be an element or an attribute? ...
are a sign that we are misusing XML and using its nodes as generic data structure components which they are not meant to be.

3.2. Navigation

Of course, acknowledging this fundamental difference between JSON and XML doesn't solve the issue of converting between the two formats.


Navigating amongst children elements may seem more which can be (slightly) shortened into the almost challenging but would arguably not be more difficult equivalent: than doing so using the DOM... However it becomes js( much easier if we define a simple function such as: '.name:val("anvil") ~ .attributes > .reference', elt=function(name) { anvil return function(o) { ).toString() return o.name==name } This is more verbose than the anvil.attributes["reference"] } that we've written in plain JavaScript but that does more since it's performing and use it to filter children: a deep search for any anvil. This is also quite generic and anvil.children.filter( the same pattern can be used to search for price's elt('price') currencies: )[0].attributes.currency js( This can be still easier if we append a new method to '.name:val("price") ~ .attributes > .currency', anvil JavaScript objects: ) Object.prototype.elt = function(name) { return this.filter(elt(name)) 3.3. Other usages } The best indication that this kind of serialization may be and just use it to access children elements: useful is that... it is not new and has already be anvil.children.elt('price')[0].attributes.currency implemented for several kind of applications!

3.2.2. With a JSON query library

It would be easy to add other methods to facilitate this navigation, but we can also rely on existing libraries such as json:select(), whose inspiration is to be the jQuery of JSON objects.
Let's start by defining a shorter name for json:select()'s match() method:

js=JSONSelect.match

If we feel lucky we can access the anvil's reference like this:

js('.reference', anvil)

This would work with our simple example but, like jQuery, json:select() does deep searches and this relies on the fact that there is no reference attribute anywhere else than for the anvil root element. A safer method is to restrict our expression to match only the root:

js(':root > .attributes > .reference', anvil)

Again, this would work with our simple example because the root element is the anvil. If we want to find the references of any anvil element that might be anywhere in the JSON object we should be less restrictive and write:

js(':has(.name:val("anvil")) > .attributes > .reference', anvil)

3.3.1. html2json

Developed in October 2011, html2json is very similar to the serialization used in this paper: the only differences are in names (tag instead of name, attr instead of attributes and child instead of children). Unfortunately, its author hasn't documented its use cases and we don't really know what his JavaScript implementation has been used for.

3.3.2. JsonML

JsonML has its own website which is much more explicit about its motivation and use cases:

JsonML (JSON Markup Language) is an application of the JSON (JavaScript Object Notation) format. The purpose of JsonML is to provide a compact format for transporting XML-based markup as JSON which allows it to be losslessly converted back to its original form.
Native XML/XHTML doesn't sit well embedded in JavaScript. When XHTML is stored in script it must be properly encoded as an opaque string. JsonML allows easy manipulation of the markup in script before completely rehydrating back to the original form.
--JsonML.org


The design choices behind JsonML seem to have been driven by this use case and bend the result toward something which is concise and reasonably easy to read: XML elements are represented by an array consisting of a string for the element name, an optional map for its attributes and optional children which can be either strings or arrays. With these conventions, the serialization of the anvil is:

  [
    "anvil",
    {"reference": "acme-5103"},
    [
      "weight",
      {"unit": "pound"},
      "9.5"
    ],
    [
      "composition",
      "best wrought iron"
    ],
    [
      "price",
      {"currency": "USD"},
      ".15"
    ]
  ]

While this is more concise than the serializations that we've seen so far, I find it much less natural to convert elements into arrays than into maps.

Besides this first use case as a serialization to transport XML as JSON, JsonML.org mentions browser-side templating (JSBT) as a second use case.

3.3.3. fastFrag

Client-side templating is the main use case of fastFrag, which proposes its own serialization. This serialization is basically similar to what I propose in this paper, with some added features to make it easier to use for creating HTML fragments:

• The name is optional. Called type, it defaults to "div".
• The most common attributes (id, ...) have been "promoted" to appear directly as keys of top-level element objects instead of being placed in the attributes map.
• The "attributes" key can also be spelled "attrs" or "attr" for brevity.
• A "text" key has been added as a shortcut to create text-only elements.

If we forget these features and the fact that fastFrag is not meant to be used with arbitrary XML, we get a serialization which is very similar to what we've already seen:

  {
    "attrs": {"reference": "acme-5103"},
    "content": [
      {
        "attrs": {"unit": "pound"},
        "content": {"text": "9.5"},
        "type": "weight"
      },
      {
        "content": {"text": "best wrought iron"},
        "type": "composition"
      },
      {
        "attrs": {"currency": "USD"},
        "content": {"text": ".15"},
        "type": "price"
      }
    ],
    "type": "anvil"
  }

4. Conclusion

JavaScript has become the assembly language for programming languages on the web.

I have no doubt that JSON should be the assembly language for any data model, including XML, on the web, and we have seen how easy it is to use JSON as a serialization format for a subset of the XDM.

This presentation is too short to show how this subset could be extended to support other XDM item types and properties such as namespaces, comments and processing instructions, but I have no doubt that this could be done in a number of different ways without altering too much the simplicity of the serialization.

Beside the ability to round-trip XML to JSON conversions, defining such a serialization would be a good way to document the XML data model and could be a basis for building XML libraries on top of existing JSON libraries.

Understanding that JSON is lower level than XML also helps to understand why round-tripping XML to JSON is simple, while round-tripping JSON to XML is harder and either verbose or application-dependent.

The simplistic representation shown in this paper is merely a first step; defining a standard representation of the XDM in JSON would help to acknowledge this relationship and to implement tools that facilitate a peaceful coexistence of both formats.


XML London 2014 Conference Proceedings

Published by XML London

103 High Street, Evesham, WR11 4DN, UK

This document was created by transforming original DocBook XML sources into an XHTML document which was subsequently rendered into a PDF by Antenna House Formatter.

1st edition

London 2014

ISBN 978-0-9926471-1-7