Markup UK 2021 Proceedings
A Conference about XML and Other Markup Technologies
https://markupuk.org/

Markup UK Conferences Limited is a limited company registered in England and Wales.
Company registration number: 11623628
Registered address: 24 Trimworth Road, Folkestone, CT19 4EL, UK
VAT Registration Number: 316 5241 25

Organisation Committee
Geert Bormans
Tomos Hillman
Ari Nordström
Andrew Sales
Rebecca Shoob

Programme Committee
Syd Bauman – Northeastern University Digital Scholarship Group
Achim Berndzen – <xml-project />
Abel Braaksma – Abrasoft
Peter Flynn – University College Cork
Tony Graham – Antenna House
Michael Kay – Saxonica
Jirka Kosek – University of Economics, Prague
Deborah A. Lapeyre – Mulberry Technologies
David Maus – State and University Library Hamburg
Adam Retter – Evolved Binary
B. Tommie Usdin – Mulberry Technologies
Norman Walsh – MarkLogic
Lauren Wood – XML.com

Thank You
Evolved Binary
le-tex Publishing Services
Saxonica
OxygenXML
Exeter
Adam Retter
Tony Graham
...and our long-suffering partners

Markup UK 2021 Proceedings by B. Tommie Usdin, David Maus, Alain Couthures, Michael Kay, Erik Siegel, Debbie Lapeyre, Karin Bredenberg, Jaime Kaminski, Robin La Fontaine, Nigel Whitaker, Steven Pemberton, Tony Graham and Liam Quin

The organisers of Markup UK would like to thank Antenna House for their expert and unstinting help in preparing and formatting the conference proceedings, and their generosity in providing licences to do so.

Antenna House Formatter is based on the W3C Recommendations for XSL-FO and CSS and has long been recognized as the most powerful and proven standards based formatting software available. It is used worldwide in demanding applications where the need is to format HTML and XML into PDF and print. Today, Antenna House Formatter is used to produce millions of pages daily of technical, financial, user, and a wide variety of other documentation for thousands of customers in over 45 countries.

Encouraging Tag Set Branching without Creating a Briar Patch. B. Tommie Usdin.
What’s in a Schematron? David Maus.
XSLTForms for the ‘20s. Alain Couthures.
<transpile from="Java" to="C#" via="XML" with="XSLT"/>. Michael Kay.
Comprehensible XML. Erik Siegel.
How Much Tag Set Documentation is Needed? Debbie Lapeyre.
The Future of Distributed Markup Systems or ‘Help my package has become too large!’ Karin Bredenberg. Jaime Kaminski.
An improved diff3 format using XML: diff3x. Robin La Fontaine. Nigel Whitaker.
On the Design of a Self-Referential Tutorial. Steven Pemberton.
“FYI we’re not looking to go to print”. Tony Graham.
CSS From XSLT. Liam Quin.

Encouraging Tag Set Branching without Creating a Briar Patch
B. Tommie Usdin, Mulberry Technologies, Inc.
©2021 Mulberry Technologies, Inc.

Customizing a tag set can be an easy way to get the vocabulary you need. It can also be a journey filled with dead ends, trap doors, and problems that are revealed slowly and are difficult to identify.

Like many public tag sets, JATS (the Journal Article Tag Suite) was designed to be customized. Our original expectation was that individual users would customize it, and while a few have done that to good effect, we have found that the major customizations have been by groups of users. BITS (the Book Interchange Tag Suite), NISO STS (Standards Tag Suite), and Manuscript Exchange Common Approach (MECA) are widely adopted customizations of JATS.

When users customize a tag set they expect to be able to use the existing infrastructure associated with that tag set, modifying it only as needed to accommodate the changes they made.
They often expect to intermingle their new documents with documents tagged to the original tag set and perhaps with documents tagged to other customizations of the source tag set. They expect this to work gracefully, easily, seamlessly. Sometimes it does, but sometimes it does not!

The “JATS Compatibility Meta-Model Description” was developed to help people who customize JATS create models that will coexist peacefully with existing JATS documents and with documents tagged to other JATS customizations. It seems unlikely that the particulars of the JATS Compatibility Model will apply to other tag sets, but the principles behind the Meta-Model might be useful to other groups thinking about ways to make their families of tag sets flexible and compatible.

1. Customizing Tag Sets
◇ Not new
◇ A good idea (usually)
◇ Saves work, time, money
◇ Promotes document interchange
◇ Can go horribly wrong

2. A Little History (oversimplified)

2.1. Generic Markup
◇ Tag what it is, not what it should look like
◇ Province of system (e.g., IBM Starter Set was tag set for IBM Script)
◇ Didn’t meet all needs
◇ Attempt to make “universal” list of tags
◇ Impossible
◇ Developed “name your own tag” systems

2.2. Bespoke Vocabularies
◇ Markup projects started with “Task 1 - Create the Tag Set”
◇ Expensive and time consuming
◇ Some considered trade secrets
◇ Required custom infrastructure
  ◆ Formatter customization cost as much as (or more than) vocabulary
  ◆ Interchange required custom transform for each destination
◇ Constant expert maintenance

2.3. Public Vocabularies
◇ Developed to
  ◆ Promote interchange
  ◆ Lower barrier to entry
◇ Enabled
  ◆ Shared tools
  ◆ Lower cost services
  ◆ Shared expertise, software

2.4. Need for Customization Soon Obvious
◇ Vocabulary maintainers flooded with requests
◇ Tag sets began to grow
◇ Tag abuse grew
◇ Local customizations were clever but often hacks

3. Vocabularies Build In Customization Tools

3.1. “Symposium on Markup Vocabulary Customization”
Proceedings at: https://www.balisage.net/Proceedings/vol24/cover.html
◇ Akoma Ntoso
◇ DITA
◇ DocBook
◇ JATS
◇ TEI

3.2. Debates about Value of Customization
A few examples from JATS:
◇ “Superset Me—Not: Why the Journal Publishing Tag Set Is Sufficient if You Use Appropriate Layer Validation”, Alexander B. Schwarzman. (2010) https://www.ncbi.nlm.nih.gov/books/n/jatscon10/schwarzman/
◇ “Why Create a Subset of a Public Tag Set”, Deborah Aleyne Lapeyre. (2010) https://www.ncbi.nlm.nih.gov/books/n/jatscon10/lapeyre/
◇ “When the ‘One Size Fits Most’ tagset doesn’t fit you”, Tommie Usdin. (2013) https://www.ncbi.nlm.nih.gov/books/n/jatscon13/usdin/
◇ “JATS Subset and Schematron: Achieving the Right Balance”, Alexander B. Schwarzman. (2017) https://www.ncbi.nlm.nih.gov/books/n/jatscon17/schwarzman/

3.3. Customization Mechanisms Vary
◇ DITA protects content from uncustomized tools
◇ TEI provides a web-based customization tool (Roma)
◇ Akoma Ntoso assumes users familiar with HTML
◇ JATS assumes customization by XMLers

3.4. Customizations
◇ Useful
◇ Convenient
◇ Appropriate
◇ Can go wrong

4. Experience with JATS Customizations
◇ JATS designed to be customized from beginning
◇ Provided “Modifying This Tag Set” in documentation: https://jats.nlm.nih.gov/archiving/tag-library/1.2/chapter/implementor.html
◇ Expected individual users to customize

4.1. Many Customizations by Groups of Users
Standards committees wrote:
◇ BITS (the Book Interchange Tag Suite)
◇ NISO STS (Standards Tag Suite)
◇ Manuscript Exchange Common Approach (MECA)

4.2. User Expectations for a Customization
◇ Existing infrastructure will work
◇ Only change parts directly related to new content
◇ Only modify to accommodate new requirements
◇ Intermix old and new documents
◇ Easy, graceful, seamless integration

4.3. User Experience
◇ Sometimes it “just works” as expected
◇ Sometimes problems appear on initial testing
◇ Sometimes problems pop up later
◇ Some problems easy to identify & fix
◇ Some problems very difficult to identify
◇ Some problems found too late to fix

5. JATS Compatibility Model
“Building JATS-Compatible Vocabularies Draft 0.10” (formerly called “JATS Compatibility Meta-Model”)
◇ Developed to help people who customize JATS create models that will coexist peacefully with existing JATS documents and with documents tagged to other JATS customizations
◇ Particulars unlikely to apply to other tag sets
◇ Principles probably do apply to others
Current version at: https://www.niso.org/publications/jats-compatibility-model

5.1. JATS Compatibility Model in 2 Parts
◇ Design Principles
  ◆ about models as a whole
  ◆ philosophical more than technical
◇ Compatibility Properties
  ◆ about individual structures
  ◆ some highly technical
  ◆ some automatically check-able

5.2. Design Principles

5.2.1. Respect the Semantics
◇ The most important rule
◇ Use named structures to mean the same thing as source vocabulary
◇ Do not “reuse” structures
This applies to ALL vocabularies!

5.2.2. Prevent Semantic Mismatch
Homonyms are destructive in tag sets
For example, in a collection of city regulations:
◇ you may not need the JATS <license> element (Set of conditions under which the content may be used, accessed, and distributed.)
◇ you may need to know if a dog, bicycle, or bar needs a license to operate
◇ Create a NEW structure, perhaps <city-license> or <permit-license> or <MyCity:license>
◇ DO NOT repurpose <license>

5.2.3. Linking Direction
◇ Links may go in either direction, or both
◇ Match the link philosophy of the source vocabulary
◇ In JATS:
  ◆ links go from many to one
  ◆ citations point to references, because a reference may be cited multiple times
  ◆ cross-references