Advanced XML Technologies: Schema, XPath, XQuery, and XSL

The National Virtual Observatory Book
ASP Conference Series, Vol. 382, © 2008
M. J. Graham, M. J. Fitzpatrick, and T. A. McGlynn, eds.

Chapter 57: Advanced XML Technologies: Schema, XPath, XQuery, and XSL

Raymond L. Plante

Introduction

Much of what happens behind the scenes in the VO happens using XML. A major reason for choosing XML is to take advantage of “off-the-shelf” standards and technologies that can help us manage our metadata. Most VO users will not need to know anything about the various manipulations of XML going on underneath their applications. However, users who begin to delve into programming for the VO, be it scripting to gather data for a VO research project or developing a general application, will start to see some of these technologies at work under the hood.

In the four sections of this chapter, we’ll look at four of the most useful XML technologies. With each one, we’ll start by highlighting how you might find it useful. The intent is not to make you proficient in these tools. Rather, by getting a general sense of how these technologies work, you will at least have some ability to debug your application when things go wrong. In some cases, you may acquire enough familiarity to edit and use pre-existing samples.

In this chapter, you will get a chance to make some of those edits and try out some tools you can find on the CD. We will use some tools from the adqllib package. If you tried out the exercises in Chapter 36, then you may have already built these tools. If not, you can do that now. On Linux/Mac-OS:

    > cd $NVOSS_HOME/java/src/adqllib
    > ant

and on Windows:

    > cd %NVOSS_HOME%\java\src\adqllib
    > ant

1. Defining Metadata Using XML Schema

1.1. Introduction

What it is: XML Schema is a World Wide Web Consortium (W3C) standard for defining and verifying an XML grammar. It defines a set of legal XML tags and attributes and gives that set a name, or more precisely, a namespace.
It also specifies what order the tags must appear in and what values are allowed. A Schema-aware XML parser can read this definition and use it to test whether an XML document obeys the grammar rules.

Why you might care: The VO uses XML to encode metadata and service messages, and XML Schema is used to define the metadata encoding and message syntax. Being able to read XML Schema definitions may help you understand what metadata an application needs as well as how to encode it.

What you’ll get from this section: You will hopefully gain some rudimentary skills for reading an XML Schema document as a means of discerning the proper syntax for creating XML that conforms to the schema. You will also get a look at the role of namespaces in supporting multiple schemas in the same document. Finally, we’ll try out a tool for validating XML documents to determine if they comply with a given schema; this will allow us to experiment with changes to a document.

If you want to try out the exercises described in this section, you will find the example files in the CD software distribution under $NVOSS_HOME/java/src/advxml (on Windows, %NVOSS_HOME%\java\src\advxml).

1.2. Schema Basics

In this section, we will consider the case of using XML to encode metadata. We will use XML Schema to define the metadata names and the value types. The XML format is great for metadata because it can easily capture both simple and complex values. That is, some metadata will be simple: a name and a value like a string or a real number (e.g. “title” or “frequency”). Others can be complex, where several simple values are combined together and given a name (e.g. “position” might be composed of a Right Ascension value and a Declination value). Thus, our metadata will be encoded as elements and attributes in our XML document. When we define a set of related metadata that are meant to be used together, we call that a schema.
XML Schema is a particular standard for defining the elements and attributes that will be used to encode our metadata. The definition is done via an XML Schema document (also in XML format), which essentially contains a list of definitions. Typically, most of the definitions are of XML elements, attributes, and types, but they can also include definitions of groups of elements and attributes. To understand how these things are defined, we’ll step through a few simple examples, starting with the schema listed in Figure 1 (xmltech-simple.xsd in the CD distribution under $NVOSS_HOME/java/src/advxml).

The example starts with the root element, <xs:schema>; it contains some attributes related to namespaces which we will get to later. For now, just note that the xs: prefix denotes things that are defined as part of the XML Schema language.

The first interesting thing in this example is the definition of our first element using the <xs:element> tag. The name attribute indicates that the element will be called “resource”. We call this element a global element because its definition appears as a direct child of the <xs:schema> tag. Only global elements can appear as root elements of an XML document (but they can appear elsewhere, too).

The full definition of the “resource” element appears in the content of the <xs:element> tag, and the first tag inside it, <xs:complexType>, indicates that it has a complex type. Every element and attribute has an associated type that indicates what its value will look like. A complex type means that the element can contain other elements. Some of those other elements inside might also be defined to be complex, which captures the familiar hierarchical structure of XML documents. Obviously, attributes cannot be defined to be complex because they can only contain simple values, not other elements.
We note also that this kind of type is referred to as an anonymous type because it is defined directly inside the <xs:element> tag. Later, we’ll look at an example of a globally defined type that is given a name (and therefore is not anonymous!).

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- targetNamespace -->
    <xs:schema targetNamespace="http://nvoss.org/VOResource"
               xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified">

      <!-- globally defined element -->
      <xs:element name="resource">
        <!-- anonymous type definition -->
        <xs:complexType>
          <!-- content model -->
          <xs:sequence>
            <!-- locally defined elements -->
            <xs:element name="title" type="xs:string" />
            <xs:element name="referenceURL" type="xs:anyURI"
                        minOccurs="0"/>
            <!-- occurrence restrictions -->
            <xs:element name="type" minOccurs="0" maxOccurs="unbounded">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="Archive" />
                  <xs:enumeration value="Catalog" />
                  <xs:enumeration value="Organisation" />
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
          <xs:attribute name="created" type="xs:dateTime" />
        </xs:complexType>
      </xs:element>

    </xs:schema>

Figure 1. xmltech-simple.xsd: an annotated sample of a simple XML Schema document; see text for a detailed explanation. Fig. 2 contains an example of an XML document that conforms with this schema.

Complex types are said to be described by a content model. This defines what other elements the type can contain and how they must be arranged. <xs:sequence> is the most common type of content model used in VO schemas. Inside that tag, we list the elements that may appear inside the “resource” element. The order in which these elements are listed in the definition is the order they must have inside the “resource” element. In other words, the “resource” element can contain “title,” “referenceURL,” and “type” elements in that order. Other types of content models which we won’t cover (though you might surmise their meaning) include xs:any, xs:all, and xs:group.
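The ordering rule imposed by <xs:sequence> can be illustrated with a short Python sketch. It is not a schema validator and is not one of the tools on the CD; it only reads the declared order out of an abridged copy of the Figure 1 schema and compares it with the child elements of two hypothetical instance documents. The function names, the sample title, and the example.org URL are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # namespace of the XML Schema language
NS = "http://nvoss.org/VOResource"          # targetNamespace from Figure 1

# Abridged copy of the Figure 1 schema (the enumeration is left out).
SCHEMA = """<xs:schema targetNamespace="http://nvoss.org/VOResource"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="resource">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="referenceURL" type="xs:anyURI" minOccurs="0"/>
        <xs:element name="type" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="created" type="xs:dateTime"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Hypothetical instance documents: GOOD follows the declared order, BAD does not.
GOOD = f"""<resource xmlns="{NS}" created="2008-01-01T00:00:00">
  <title>Sample archive</title>
  <referenceURL>http://example.org/archive</referenceURL>
  <type>Archive</type>
</resource>"""

BAD = f"""<resource xmlns="{NS}">
  <referenceURL>http://example.org/archive</referenceURL>
  <title>Sample archive</title>
</resource>"""


def sequence_order(schema_text, element_name):
    """Return the local element names in the order the <xs:sequence> lists them."""
    root = ET.fromstring(schema_text)
    for decl in root.findall(XS + "element"):      # global elements only
        if decl.get("name") == element_name:
            seq = decl.find(XS + "complexType/" + XS + "sequence")
            return [child.get("name") for child in seq.findall(XS + "element")]
    raise KeyError(element_name)


def follows_sequence(instance_text, order):
    """True if the root's children appear in the order the sequence declares."""
    root = ET.fromstring(instance_text)
    # ElementTree stores tags as "{namespace}name"; keep only the local name.
    names = [child.tag.split("}")[-1] for child in root]
    if any(name not in order for name in names):
        return False                               # element not declared at all
    positions = [order.index(name) for name in names]
    return positions == sorted(positions)


order = sequence_order(SCHEMA, "resource")
print(order)                          # ['title', 'referenceURL', 'type']
print(follows_sequence(GOOD, order))  # True
print(follows_sequence(BAD, order))   # False
```

The check looks only at element order; how many times each element may appear, and what values it may hold, are separate parts of the schema that a real Schema-aware parser would also enforce.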
The content model is further controlled by occurrence constraints. Each <xs:element> tag inside the <xs:sequence> can have minOccurs and maxOccurs attributes that specify how many times in a row the element can appear. For example, the “type” element can appear a minimum of zero times (that is, it doesn’t have to appear at all), or it can appear an unlimited number of times. If either minOccurs or maxOccurs is not specified, it is assumed to be one. Thus, the “title” element must appear once and only once. The “referenceURL” is optional (because minOccurs="0"), but no more than one “referenceURL” is allowed.

The “title,” “referenceURL,” and “type” elements are what we call local elements (as opposed to global elements) because they are defined inside the definition of a type. The “type” element is being assigned an anonymous type, much like “resource” was (because its type definition appears immediately inside the element definition), but the “title” and “referenceURL” elements are being assigned predefined types, xs:string and xs:anyURI, via the type attribute. The xs: prefix indicates that these types are defined by the XML Schema standard. All three of these elements have what we call simple types. These represent values that do not contain other elements but, rather, can appear as simple strings, such as an integer, date, URL, or just a generic string. XML Schema defines a large set of simple types that define the format of the value; xs:decimal, xs:integer, xs:positiveInteger, xs:date, and xs:dateTime are other useful simple types available.

In the example, the “type” element has a simple, anonymous type. It’s our first example of a derived type.
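The occurrence rules described above, including the default of one when minOccurs or maxOccurs is absent, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a validator: the helper names occurrence_limits and count_problems are hypothetical, the <xs:sequence> fragment is copied from Figure 1 without the enumeration, and only the counts of child elements are checked, not their order or values.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XS = "{http://www.w3.org/2001/XMLSchema}"  # namespace of the XML Schema language

# The <xs:sequence> from Figure 1, with the enumeration left out.
SEQUENCE = """<xs:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="referenceURL" type="xs:anyURI" minOccurs="0"/>
  <xs:element name="type" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>"""


def occurrence_limits(sequence_text):
    """Map each element name to (min, max); max is None for "unbounded"."""
    limits = {}
    for decl in ET.fromstring(sequence_text).findall(XS + "element"):
        low = int(decl.get("minOccurs", "1"))    # a missing attribute means 1
        high = decl.get("maxOccurs", "1")        # a missing attribute means 1
        limits[decl.get("name")] = (low, None if high == "unbounded" else int(high))
    return limits


def count_problems(child_names, limits):
    """Return the declared names that appear too few or too many times."""
    counts = Counter(child_names)
    problems = []
    for name, (low, high) in limits.items():
        n = counts[name]
        if n < low or (high is not None and n > high):
            problems.append(name)
    return problems


limits = occurrence_limits(SEQUENCE)
print(limits)
# {'title': (1, 1), 'referenceURL': (0, 1), 'type': (0, None)}

# One title and three type elements: every count is within its limits.
print(count_problems(["title", "type", "type", "type"], limits))   # []

# No title and two referenceURL elements: both rules are broken.
print(count_problems(["referenceURL", "referenceURL"], limits))    # ['title', 'referenceURL']
```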