RELAX NG XML Schemas Schematron


RELAX NG

RELAX NG is a schema language for XML. The key features of RELAX NG are that it:
• is simple
• is easy to learn
• has both an XML syntax and a compact non-XML syntax
• does not change the information set of an XML document
• supports XML namespaces
• treats attributes uniformly with elements so far as possible
• has unrestricted support for unordered content
• has unrestricted support for mixed content
• has a solid theoretical basis
• can partner with a separate datatyping language (such as W3C XML Schema Datatypes)

XML Schemas

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content and semantics of XML documents. XML Schema was approved as a W3C Recommendation on 2 May 2001, and a second edition incorporating many errata was published on 28 October 2004.

Schematron

The Schematron differs in basic concept from other schema languages in that it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented that are inconvenient or difficult to express in grammar-based schema languages. If you know XPath or the XSLT expression language, you can start to use The Schematron immediately.

The Schematron allows you to develop and mix two kinds of schemas:
• Report elements allow you to diagnose which variant of a language you are dealing with.
• Assert elements allow you to confirm that the document conforms to a particular schema.

The Schematron is based on a simple action:
• First, find context nodes in the document (typically elements) based on XPath path criteria;
• Then, for each of those nodes, check whether some other XPath expressions are true.

The Schematron can be useful in conjunction with many grammar-based structure-validation languages: DTDs, XML Schemas, RELAX, TREX, etc.
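The two-step action above can be sketched as a minimal ISO Schematron schema. This is an illustration under stated assumptions, not a normative example: the book, title and author elements and the edition attribute are hypothetical, and queryBinding="xslt" assumes an XSLT 1.0-based processor. The rule's context attribute performs the first step (finding context nodes); each test attribute performs the second.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <pattern id="book-checks">
    <!-- Step 1: the context attribute selects the nodes to check,
         here every (hypothetical) book element. -->
    <rule context="book">
      <!-- Step 2: each test is an XPath expression evaluated with the
           book element as the context node. -->
      <!-- assert: the message is issued when the test is false. -->
      <assert test="title">A book must have a title.</assert>
      <assert test="author">A book must have at least one author.</assert>
      <!-- report: the message is issued when the test is true; useful for
           diagnosing which variant of a vocabulary is in use. -->
      <report test="@edition">This record uses the edition-aware variant of the vocabulary.</report>
    </rule>
  </pattern>
</schema>
```

An assert issues its message when its test evaluates to false, while a report issues its message when its test evaluates to true. That asymmetry is what makes reports suited to diagnosing a variant and asserts suited to confirming conformance.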
Indeed, Schematron is part of an ISO standard (DSDL: Document Schema Definition Languages) designed to allow multiple, well-focussed XML validation languages to work together. You can even embed a Schematron schema inside an XML Schema <appinfo> element or inside a RELAX NG schema!

Comparison of DTD, W3C XML Schemas, RELAX, TREX, RELAX NG and Schematron:
• overview:
  DTD: an XML structure definition with a list of legal elements
  W3C XML Schemas: an object-oriented XML schema language
  RELAX: a pattern-based, user-friendly XML schema language
  TREX (Tree Regular Expressions for XML): a pattern specification for the structure and content of an XML document
  RELAX NG: a schema language created by unifying RELAX Core and TREX
  Schematron: a rules-based XML schema language
• syntax:
  DTD: a grammar with its own compact, but non-XML, syntax
  W3C XML Schemas: object-like, XML syntax
  RELAX NG: both an XML syntax and a compact non-XML syntax
• datatyping: no (DTD: weak, applies only to attributes); yes; yes; yes, but datatype systems can be plugged in
• support for XML namespaces: none (DTD); yes; yes
• can partner with a separate datatyping language: yes, with others
• Post-Schema-Validation Infoset: yes; yes; no
• complexity: high
• can express non-determinism: no; yes
• rules: no; no; no; yes (Schematron, using XPath expressions)
• "structures": yes; yes; yes; no
• "integrity": yes
• flexibility: poor; intermediate (weak support for unordered content); high; top, but all structures must be defined
• notes: a schema is relatively easy to extend and good for data-oriented applications; TREX has been merged with RELAX to create RELAX NG, and all future development of TREX will take place as part of the RELAX NG effort

• How to obviate parsing problems? (i.e. how to have a well-formed XML document containing the needed data)
  => find another way to express the unparsed data
  => use external (non-XML) data files containing the unparsed data
• RELAX NG: is it possible to reference a single definition of an element from another .rng file?
• Data structures are easily represented in RELAX NG!
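A sketch of how a RELAX NG grammar can reuse definitions from another file and carry an embedded Schematron rule. All file, element and attribute names are hypothetical. <include> merges the whole external grammar, after which any of its definitions (here, address) can be referenced by name with <ref>; this is one way to reuse a single definition from another .rng file. The sch:pattern element is a foreign-namespace annotation: a plain RELAX NG validator ignores it, and whether it is actually checked depends on the tool, since support for embedded ISO Schematron varies between validators. The <interleave> shows RELAX NG's support for unordered content.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:sch="http://purl.oclc.org/dsdl/schematron"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

  <!-- Merge the named patterns of another grammar. "common.rng" is a
       hypothetical file assumed to contain a <grammar> that defines
       "address" and has no <start> of its own. -->
  <include href="common.rng"/>

  <start>
    <ref name="order"/>
  </start>

  <define name="order">
    <element name="order">
      <!-- Embedded Schematron rule: ignored by a plain RELAX NG validator;
           Schematron-aware tools can extract and run it. -->
      <sch:pattern>
        <sch:rule context="order">
          <sch:assert test="@status != 'shipped' or shipDate">
            A shipped order must have a shipDate.
          </sch:assert>
        </sch:rule>
      </sch:pattern>

      <!-- Attributes are declared inline, alongside element content. -->
      <attribute name="status">
        <choice>
          <value>open</value>
          <value>shipped</value>
        </choice>
      </attribute>

      <!-- interleave: these child elements may appear in any order. -->
      <interleave>
        <element name="customer"><text/></element>
        <ref name="address"/>
        <optional>
          <element name="shipDate"><data type="date"/></element>
        </optional>
      </interleave>
    </element>
  </define>
</grammar>
```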
• How to translate the following text into RELAX NG?
    <!-- Definition of Annotation follows -->
    <xs:complexType name="Annotation">
      <xs:annotation>
        <xs:documentation>Concise processing directives for downstream applications.</xs:documentation>
      </xs:annotation>
      <xs:sequence>
        <xs:any processContents="skip" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
• How is "xs:anyType" translated into RELAX NG, as in <xs:element name="type" type="xs:anyType"/>?
• With RELAX NG, the style of your schema (Russian doll, DTD-like, or content-oriented) has an impact on its extensibility.
  => The content-oriented option is the most extensible.
• Some interesting RELAX NG features:
  => the ability to define attributes wherever you want in your patterns,
  => the flexibility and freedom with which you can combine patterns, and the lack of restrictions associated with these combinations,
  => regular expressions can be used to specify or constrain datatypes,
  => previously defined structures can be reused very easily.
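One possible answer to the two translation questions above, written as a hedged sketch rather than a definitive mapping. The recursive anyElement pattern is the usual RELAX NG idiom for arbitrary well-formed content: any element name, any attributes, and any mix of text and child elements, which roughly corresponds to processContents="skip". For xs:anyType the same choice is applied directly to the type element, because anyType allows attributes and mixed content; anyType's lax processing has no exact RELAX NG equivalent. The orderNumber pattern is a hypothetical illustration of the regular-expression facet mentioned in the feature list.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

  <start>
    <!-- <xs:element name="type" type="xs:anyType"/>:
         any attributes, text and child elements, in any mix -->
    <element name="type">
      <zeroOrMore>
        <choice>
          <attribute><anyName/></attribute>
          <text/>
          <ref name="anyElement"/>
        </choice>
      </zeroOrMore>
    </element>
  </start>

  <!-- Counterpart of <xs:complexType name="Annotation">; reference it with
       <ref name="Annotation"/> wherever the XSD used the type. -->
  <define name="Annotation">
    <a:documentation>Concise processing directives for downstream applications.</a:documentation>
    <!-- xs:any minOccurs="0" maxOccurs="unbounded": zero or more elements -->
    <zeroOrMore>
      <ref name="anyElement"/>
    </zeroOrMore>
  </define>

  <!-- Any element with any attributes and any content (not validated further) -->
  <define name="anyElement">
    <element>
      <anyName/>
      <zeroOrMore>
        <choice>
          <attribute><anyName/></attribute>
          <text/>
          <ref name="anyElement"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>

  <!-- Hypothetical: a string datatype constrained by a regular expression -->
  <define name="orderNumber">
    <data type="string">
      <param name="pattern">ORD[0-9]{3}</param>
    </data>
  </define>
</grammar>
```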
Recommended publications
  • Schematron Overview Excerpted from Leigh Dodds’ 2001 XSLT UK Paper, “Schematron: Validating XML Using XSLT”
    Schematron: validating XML using XSLT Schematron Overview excerpted from Leigh Dodds’ 2001 XSLT UK paper, “Schematron: validating XML using XSLT” Leigh Dodds, ingenta ltd, xmlhack.com April 2001 Abstract Schematron [Schematron] is a structure-based validation language, defined by Rick Jelliffe, as an alternative to existing grammar-based approaches. Tree patterns, defined as XPath expressions, are used to make assertions and provide user-centred reports about XML documents. Expressing validation rules using patterns is often easier than defining the same rule using a content model. Tree patterns are collected together to form a Schematron schema. Schematron is a useful and accessible supplement to other schema languages. The open-source XSLT implementation is based around a core framework which is open for extension and customisation. Overview This paper provides an introduction to Schematron, an innovative XML validation language developed by Rick Jelliffe. This innovation stems from selecting a different approach to validation from existing schema languages: Schematron uses a tree-pattern-based paradigm, rather than the regular grammars used in DTDs and XML schemas. As an extensible, easy-to-use, open-source tool, Schematron is an extremely useful addition to the XML developer's toolkit. The initial section of this paper gives a brief overview of tree pattern validation and some of the advantages it has in comparison to a regular grammar approach. This is followed by an outline of Schematron and the intended uses which have guided its design. The Schematron language is then discussed, covering all major elements in the language with examples of their usage.
  • Mulberry Classes Guide to Using the Oxygen XML Editor (V19.0)
    Mulberry Classes Guide to Using the Oxygen XML Editor (v19.0) Mulberry Technologies, Inc. 17 West Jefferson Street, Suite 207 Rockville, MD 20850 Phone: 301/315-9631 Fax: 301/315-8285 http://www.mulberrytech.com Version 1.6 (April 21, 2017) ©Copyright 2015, 2017 Mulberry Technologies, Inc. Exercise 1. Guide to Using Oxygen XML Editor (v19.0) NOTE: This is a reference, not a list of instructions! Oxygen is both an XML editor and a development tool. We will be using it to run XML transforms using XSLT, to validate documents according to a DTD or schema, and to run Schematron, XQuery, XSL-FO, and other processes. Key Oxygen Icons: check well-formedness (blue checkmark); validate document (red checkmark); associate schema (red push pin); apply transformation scenario (triangle in circle); configure transformation scenario (wrench); XPath 2.0 search window. Open Oxygen XML Editor • Double-click the icon Naming Files When you create a file, it is considered best practice to name your files using the following file extensions: • XML filenames end in “.xml” • XSLT filenames end in “.xsl” • XML Schema filenames end in “.xsd” • DTD filenames end in “.dtd” • DTD modules (DTD fragments) end in “.ent” or “.mod” • Schematron filenames end in “.sch” • PDF files end in “.pdf” • HTML and XHTML files end in “.html” or “.htm” • RELAX NG files end in “.rng” Create a New XML Document 1. First Time Opening Oxygen • If a “Welcome to Oxygen” screen appears, under Create New • Choose New Document • Choose XML Document • Then finish as explained below • If there is no “Welcome to Oxygen” screen, on the top bar choose File • Choose New • Under New Document, choose XML Document • Then finish as explained below
  • Xmltool Indent Indent Options Validate Options [2] Common Options [9] [ Xml File ]*
    The xmltool command-line utility Hussein Shafie, XMLmind Software June 22, 2021 Abstract This document is the reference manual of the xmltool command-line utility. The xmltool command-line utility can be used to validate and pretty-print (i.e. indent) XML documents and also to automatically generate a reference manual in HTML format for a schema. This utility, like all the other command-line utilities, is found in XXE_install_dir/bin/. Table of Contents 1. Why use the xmltool command-line utility? 2. Synopsis 3. validate options 4. indent options 5. schematron options 6. schemadoc options 7. Common options A. Implementation limits
  • NEMSIS V3 Schematron Guide
    NEMSIS TAC Whitepaper NEMSIS V3 Schematron Guide Date November 23, 2011 (Final) January 17, 2014 (Rewritten – Candidate Release 1) March 3, 2014 (Updated) March 2, 2015 (Updated) September 7, 2017 (Updated references) Authors Joshua Legler – NEMSIS Consultant Shaoyu Su – NEMSIS Software Developer N. Clay Mann – NEMSIS P.I. Contributors Aaron Hart, Chris Morgan, Mike Darvill, Kashif Khan and Troy Whipple – ImageTrend and René Nelson – ZOLL Lindsey Narloch – State of North Dakota Adam Voss – TriTech Software Systems Mark Potter – Medusa David Saylor – Beyond Lucid Technologies Patrick Sennett – Good Samaritan Hospital Jeff Robertson – EMSPIC Paul Sharpe – Commonwealth of Virginia Jessica Lundberg – Cognitech Ryan Smith – Intermedix Juan Esparza – State of Florida Tom Walker – University of Alabama Overview Schematron is a rule-based language for XML document validation. Schematron is an international standard defined in ISO/IEC 19757-3:2006 (hereafter referred to as the “normative standard”). Anyone who creates Schematron files or software that performs Schematron-based validation should obtain a copy of the normative standard at https://www.iso.org/standard/40833.html. (Note: The normative standard was updated in 2016. Software compliant with NEMSIS version 3.4 should implement the 2006 version of the normative standard, as contained in the NEMSIS version 3.4 Schematron Development Kit.) Much of the validation in NEMSIS is accomplished via the use of W3C XML Schemas (known as XSD). XML Schemas constrain the structure of NEMSIS XML documents and the contents of elements and attributes within those documents using grammar-based validation. However, XML Schemas are not capable of context-sensitive validation, such as constraining the contents of one element based on the contents of another element.
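    As an illustration of the context-sensitive validation described above, here is a small Schematron rule that constrains one element's value using another's. It is a sketch only: the element names (incident, startTime, endTime) are hypothetical and not taken from the NEMSIS dataset, and queryBinding="xslt2" assumes an XPath 2.0-capable implementation, which the xs:dateTime comparison needs.

    ```xml
    <schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
      <!-- Bind the xs prefix so the xs:dateTime() constructor can be used -->
      <ns prefix="xs" uri="http://www.w3.org/2001/XMLSchema"/>
      <pattern id="time-order">
        <rule context="incident">
          <!-- Compare only when both times are present. Assumes one startTime
               and one endTime per incident, already checked by the XSD to be
               valid xs:dateTime values. -->
          <assert test="not(startTime and endTime)
                        or xs:dateTime(endTime) ge xs:dateTime(startTime)">
            The end time must not be earlier than the start time.
          </assert>
        </rule>
      </pattern>
    </schema>
    ```

    The split of responsibilities mirrors the text: the grammar-based XSD checks that each value is well-typed, and the rule checks the relationship between the two values.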
  • Use Cases and Examination of XML Technologies to Process MS Word Documents in a Corporate Environment
    Use cases and examination of XML technologies to process MS Word documents in a corporate environment Colin Mackenzie, XML Consultant Why develop this solution? • Learn in a hands-on way • Many XML developers not using XSLT 3 (some not properly utilising XSLT 2) • Used XProc 1.0 + CX but limited subset of features and non-complex requirements • Needed a project to increase my skills that I can then transfer to customers Why choose Word documents? • XML’s popularity now focussed on documents • MS Word used by most major corporate users of documents • MS Word uses OOXML • Processing Word often requires complex development (unpacking, no nested structure etc.) Use case – quality and consistency of styles • Content • Are all required sections present and correctly named? • Styling • Latest branding • Professional-looking result • Consistent throughout document • Numbering and referencing affect legal meaning Why do things go wrong with Styles? • Lack of training • Flexibility of Word • Misused styles and templates • Manual formatting • Manual or simple numbering CTRL+B Typical solutions (Users, Dev/IT, Knowledge Worker) • Custom templates • Macros, VB, ribbons • Training • Commercial add-ins and products • Tend to fail over time So what about a standards-based solution? • Allow knowledge workers to manage styles in the template • Leave Word UI as out of the box • Provide suggestions and feedback to users in a language they can understand • Define the rules for style and content clearly Some XML content Word XML workflow • Word tables -> HTML/CALS tables • Footnotes, links, graphic references etc. -> XML mark-up • Flat headings (style) -> nested XML structure • Flat lists (style) -> nested lists • Other paras -> semantic XML elements • Para in this style -> that semantic element [Workflow diagram labels: Word creation, review and correction; Save As; Word 2003 XML; XSLT; semantic conversion; XML]
  • Introduction to Schematron
    Introduction to Schematron Wendell Piez and Debbie Lapeyre Mulberry Technologies, Inc. 17 West Jefferson St. Suite 207 Rockville MD 20850 Phone: 301/315-9631 Fax: 301/315-8285 http://www.mulberrytech.com Version 90-1.0 (November 2008) © 2008 Mulberry Technologies, Inc. Introduction to Schematron • Administrivia • Schematron is a • Reasons to use Schematron • What Schematron is used for • Schematron is an XML vocabulary • Schematron specifies, it does not perform • Simple Schematron processing architecture • Schematron validation in action • Basic Schematron building blocks • How Schematron works • Outline of a simple Schematron
  • Using Cocoon, WML, and XMLForms
    Using Cocoon, WML, and XMLForms Presented by developerWorks, your source for great tutorials ibm.com/developerWorks Table of Contents If you're viewing this document online, you can click any of the topics below to link directly to that section. 1. Before you start 2. Getting started with Cocoon 3. Cocoon processing model, sitemap, and pipelines 4. Cocoon and HTML 5. Cocoon and WML 6. Cocoon, XMLForms, and WML 7. Summary and resources Section 1. Before you start About this tutorial This tutorial teaches you how to develop applications using Cocoon, Wireless Markup Language (WML), and XMLForms. The course is intended for developers and technical managers who want to get an overview of Cocoon and understand how to use Cocoon for application development. Prerequisites To use this tutorial, you should be familiar with basic Java programming, WML, XML, and XSLT. About the authors Vivek Malhotra is a Subject Matter Expert on Wireless Technologies who is based in the Washington D.C. area. Vivek has several years of experience developing and implementing wireless applications and has spoken on expert panels focusing on the wireless industry. Roman Vichr is senior architect at DDLabs, an e-commerce and EAI consulting company. During the past nine years, his focus has been on database management for client/server and Web applications development.
  • A Tutorial on XProc: An XML Pipeline Language
    A Tutorial on XProc: An XML Pipeline Language Rui Lopes LaSIGE/University of Lisbon Outline • Pipeline Concepts • Syntax Overview • Steps • Other Pipeline Elements • Standard Step Library • Recipes XProc: Background • XML processing is becoming mainstream • XSLT, XQuery, Schema validation, etc. • How to glue them all coherently, in order to create (complex) XML applications? XML Pipelines. http://www.w3.org/XML/XProc/docs/langspec.html XProc: Concepts [Diagram: a pipeline step containing a choose step that tests /*[@version<2.0] on the source; its true and false branches lead to validate steps using a V1.0 or a V2.0 schema, and an XSLT step with a stylesheet produces the result.] Concepts: Steps • Compound (e.g., Pipeline, Choose) • Containers for other steps (defined through subpipelines) • Atomic (e.g., Validate, XSLT) • Referenced by name Concepts: I/O • XML documents (i.e., infosets) flow inside a pipeline • A step definition states which inputs and outputs it requires (single or sequence) • Steps are connected together through pipes between inputs and outputs, defining a pipeline’s flow • No loops (directed acyclic graph) • As streamable as possible (huge datasets) Concepts: I/O • Steps have primary inputs and outputs (and may have secondary ones) • Steps may accept options (name/value pairs) • Steps may accept parameters (name/value pairs) Concepts: XPath context • May occur in several places: • Compound steps, compute option and parameter values, values passed to steps • XPath 1.0 or 2.0 (depending on implementations) Concepts: XPath context • Processor vs. step context levels • XPath extension functions
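    A minimal sketch of the step and port concepts above, assuming an XProc 1.0 processor: a pipeline that validates its input against an XML Schema and then transforms it with XSLT. The file names are hypothetical. No pipes are written out explicitly; each step's primary input is connected by default to the primary output of the step before it, and the last step's primary output becomes the pipeline's result.

    ```xml
    <p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
      <!-- Atomic step 1: its primary input is implicitly connected to the
           pipeline's primary input port, "source". -->
      <p:validate-with-xml-schema>
        <!-- Secondary input: the schema (hypothetical file name) -->
        <p:input port="schema">
          <p:document href="schema-v1.xsd"/>
        </p:input>
      </p:validate-with-xml-schema>

      <!-- Atomic step 2: receives the validated document on its primary input. -->
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="stylesheet.xsl"/>
        </p:input>
      </p:xslt>
      <!-- The XSLT step's primary output is implicitly connected to the
           pipeline's primary output port, "result". -->
    </p:pipeline>
    ```

    In XProc 1.0 the validate step's assert-valid option defaults to true, so an invalid document stops the pipeline with an error instead of reaching the XSLT step.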
  • EAD-ODD: a Solution for Project-Specific EAD Schemes Laurent Romary, Charles Riondet
    EAD-ODD: A solution for project-specific EAD schemes Laurent Romary, Charles Riondet To cite this version: Laurent Romary, Charles Riondet. EAD-ODD: A solution for project-specific EAD schemes. Archival Science, Springer Verlag, 2018, 10.1007/s10502-018-9290-y. hal-01737568v2 HAL Id: hal-01737568 https://hal.inria.fr/hal-01737568v2 Submitted on 21 Mar 2018 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Distributed under a Creative Commons Attribution 4.0 International License EAD-ODD: A solution for project-specific EAD schemes Laurent Romary, Charles Riondet Inria Paris, ALMAnaCH This article tackles the issue of integrating heterogeneous archival sources in one single data repository, namely the European Holocaust Research Infrastructure (EHRI) portal, whose aim is to support Holocaust research by providing online access to information about dispersed sources relating to the Holocaust (http://portal.ehri-project.eu). In this case, the problem at hand is to combine data coming from a network of archives in order to create an interoperable data space which can be used to search for, retrieve and disseminate content in the context of archival-based research. The scholarly purpose has specific consequences on our task.
  • User Assistance with Schematron
    User Assistance with Schematron George Bina @georgebina What is Schematron • An ISO Standard: ISO/IEC 19757 – DSDL (Document Schema Definition Languages) Part 3: Rule-based validation • A very simple schema language: fewer than 10 main elements, about 20 elements in total • A different kind of schema: defines business rules, not the document structure; the error messages are specified inside the schema • Invented by Rick Jelliffe Copyright © Syncro Soft, 2017. All rights reserved. Related technologies • XPath: used by Schematron to match and assert • XSLT: can be used to extend XSLT-based Schematron implementations • SQF: provides quick fixes for identified issues, defined as small scripts annotating the Schematron assertions User Assistance: Help users create correct documents as they write. Cost for solving a problem [Chart: cost at authoring time, review time, publishing time and production time] Try to solve the problems at authoring time! Assistance tools • Send messages to the user when some issues are detected in the edited document, using different levels (information, warning, error, fatal), and provide links for more details using the @see attribute • Provide automatic solutions to detected issues using SQF UA use cases using Schematron • Integrated intelligent style guide • Learn DITA from a Markdown perspective Integrated intelligent style guide
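    A sketch of the assistance features listed above in ISO Schematron: the role attribute carries the message level and the see attribute links to further details. Treating role values such as warning or error as severity levels is a convention that editors may interpret, not something the standard itself defines. The title rule and the URL are hypothetical, and queryBinding="xslt2" assumes an XPath 2.0-capable implementation (needed for le and ends-with).

    ```xml
    <schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
      <pattern id="style-guide">
        <!-- Hypothetical style-guide rule for title elements -->
        <rule context="title">
          <!-- role: message level; see: link to more details (hypothetical URL) -->
          <assert test="string-length(normalize-space(.)) le 80"
                  role="warning"
                  see="https://example.com/style-guide#titles">
            Titles should be 80 characters or fewer.
          </assert>
          <!-- report fires when the test is true, i.e. the title ends with "." -->
          <report test="ends-with(normalize-space(.), '.')" role="error">
            A title must not end with a period.
          </report>
        </rule>
      </pattern>
    </schema>
    ```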
  • A Complete Schema Definition Language for the Text Encoding Initiative
    A complete schema definition language for the Text Encoding Initiative Lou Burnard and Sebastian Rahtz XML London, June 16th 2013 Reminder: what is the TEI? A 25-year-old project to define Guidelines for text encoding: mainly targeted at digital editions of existing texts; covers manuscripts, dictionaries, transcribed text, spoken corpora, and facsimiles, as well as simple books; governed by an international membership consortium; defines a very rich language, with about 550 elements managed in 22 modules and an infrastructure of model and attribute classes. Specialist vocabularies such as XInclude, MathML and SVG are used where appropriate. http://www.tei-c.org/ The domain of the TEI The domain of the TEI (2) The TEI manifesto 1. The Guidelines are descriptive of many different ways and levels of encoding a digital text, not prescriptive 2. The Guidelines should be technology-agnostic. They currently use XML, but are prepared to change 3. The schema is modelled as independently as possible, though it currently uses RELAX NG to describe content models 4. A project is actively encouraged to develop an appropriate subset of the Guidelines, and apply domain-appropriate constraints The TEI is built using a literate programming system: ODD (one document does it all) A set of TEI elements which describe elements and attributes: descriptions (in multiple languages), examples, content models and datatypes, information about how it can be used, constraints, equivalences (e.g. to formal ontologies like FRBR or CIDOC CRM) Original tagdoc for <resp> element in TEI P2 (20 years ago) How we do ODD now
  • NPRG036 XML Technologies
    NPRG036 XML Technologies Lecture 7: Schematron, RELAX NG 16. 4. 2018 Author: Irena Holubová Lecturer: Martin Svoboda http://www.ksi.mff.cuni.cz/~svoboda/courses/172-NPRG036/ Lecture Outline • XML schema languages • Best practices • RELAX NG • Schematron Best Practices How to define XML schemas for various use cases? • Are we going to use the schema locally or share it with others? • Will the schema evolve? • Are we going to preserve multiple versions of schemas? There are many recommendations. Fact: The W3C specification does not recommend anything. Basic Recommendations. Use the XML features fully: • Use your own elements and attributes: <?xml version="1.0"?> <element name="order"> <attribute name="number" value="ORD001" /> <element name="employee"> ... </element> ... </element> • Maximize readability of XML documents using reasonable element and attribute names, even though the names are longer; we can use (XML-specific) compression: <?xml version="1.0"?> <o n="ORD001"> <e><nm>Martin Necasky</nm></e> <il> <i><m>5</m><c>5</c></i> </il> ... </o> • Do not use dot notation instead of XML tree hierarchy: <?xml version="1.0"?> <order number="ORD001"> <customer.name.first>Martin</customer.name.first> <customer.name.surname>Necasky</customer.name.surname> <customer.address.street>Malostranské nám.</customer.address.street> <customer.address.number>25</customer.address.number> ... </order> • Do not use references instead of tree hierarchy. They usually have less efficient processing (of course, in some cases it might make sense). XML data model != relational data model: <?xml version="1.0"?> <selling> <order number="ORD001"><employee ref="Z002"/>...</order> <order number="ORD002"><employee ref="Z001"/>...</order> <employee id="Z001">...</employee> <employee id="Z002">...</employee> </selling> Elements vs.