Chapter 2 – XML, XML Schema, XSLT, and XPath

SOFTWARE ARCHITECTURES
Ryan McAlister

Summary

XML stands for Extensible Markup Language. Like HTML, it uses tags to mark up data. Unlike HTML, though, it was designed to carry data, not to display it. XML is used to structure, store, and transport data, and several of its extensions help with this. The three we are going to look at are XML Schema, XSLT, and XPath. XML Schema gives us a way to validate whether an XML document follows a specific structure. XSLT gives us a way to convert XML documents into different formats. XPath gives us a way to extract data from XML documents using short path expressions.

Introduction

XML and its standard extensions give us a way to design documents that helps us work with data. Usually XML is used in conjunction with HTML: the data is kept in a separate XML file that an HTML document can read and display. This is helpful if you need to display data that is constantly changing. Without XML we would have to edit the HTML every time the data changed. With XML we keep the data in a separate file and have the HTML read it from there, so the page stays up to date.

XML Schema is used to describe the structure of XML documents. We do this by describing what elements and attributes can appear in the document. An element declaration gives a name and a type to the data contained in that element, much like a variable declaration in a programming language. Attributes allow us to describe an element even further. Once we build our schema, we can use it to validate XML documents, making sure they are in the format we want.

XPath gives us a way to traverse XML documents and pull certain pieces of data out of them. It works by taking an expression and returning the data that matches the location or locations described by the expression. We can then use this information in many different ways.

XSLT allows us to transform our XML documents into other XML documents, HTML, or XHTML.
This works by creating an XSL style sheet that describes how to transform the different elements in the XML document. Then we link the XSL style sheet with an XML document. The transformation produces a new document and leaves the original document unchanged. This is especially helpful if we want to transform only some of the data into one document and then transform the part we did not use into another document.

XML - Extensible Markup Language

XML is an application of SGML (Standard Generalized Markup Language), a powerful markup language specified in the mid-1980s. XML was developed in 1996 and became a W3C Recommendation in 1998. W3C is the World Wide Web Consortium, the main standards organization of the Web.

XML documents are used in a variety of ways, but their main use is to store and structure data to be used by other formats. XML draws many comparisons to HTML because both are markup languages, but they serve different purposes and are generally used in conjunction with each other. XML is built to store data in an efficient way, not to display it. HTML is used to display data, but it is not adept at storing it. Usually XML stores the data, and HTML pulls the data out of the XML file and displays it. This is helpful because the data in the XML document can change without the HTML file needing to be adjusted.

XML Basics

Figure 1.1: An Example of an XML Document

If we look at the XML document in Figure 1.1, the first thing we notice is that it looks very similar to HTML. This is because both are markup languages and both use tags to describe the data present. A few key differences set XML apart from HTML. The main one we will look at is that XML allows user-defined tags, whereas HTML only allows predefined tags.
The use of user-defined tags allows us to describe the data in any way we see fit. It also makes the data not just machine readable but also easy for humans to read.

Let us take a closer look at this example. The first line is the XML declaration. It states which XML version we are using, in this case version 1.0. The next line opens the root element of the document; here we are saying it is a note. The next 3 lines are the child elements of the root element, which we use to describe the contents of the note. The final line closes the root element, which says that our note is finished.

As you can see, this is very easy for humans to read, because the tag names describe what is contained within them. Looking at this, we can easily discern that it is a note, to a student, from a teacher, telling the student that there is a test next Tuesday. The tags also let a machine process the note in a useful way. For example, if we only needed the body of the note, a machine could look in the note root element for the <body> child element and return what is inside it.

Figure 1.2: Another example XML Document

XML is not just for small datasets; we can apply the same principles we used for the note to create much larger ones. For example, suppose we wanted to create an XML document detailing all the computers in a computer lab. Say we wanted to know each computer's name, whether it is a Mac or PC, and the date it was purchased. Our root element would be <computerlab>, and it would be populated by <computer> child elements. Figure 1.2 shows how this XML document would be formatted. In the example we list only 3 computers, but we could continue to add <computer> elements for every computer in a large computer lab.

The things to take away from this are relatively simple.
Firstly, XML is a markup language, much like HTML, used to store data as opposed to displaying it. Secondly, it allows user-defined tags that can be much more descriptive and easier to read. Lastly, it is not just for small datasets but for very large ones as well.

XML Schema

There are many schema languages out there for XML, but for the purposes of this text we will describe the first one recommended by the W3C. XML Schema is a way for us to define how to build an XML document. We do this by describing what elements should be present, where in the document they are located, and what attributes they have. From there, we can build a document to the specifications laid out in our schema. We can also test a document against our schema to determine whether it is valid.

The syntax of XML Schema

Figure 1.3: Example of XML Schema

Before we look at how to use an XML Schema, we first need to look at the different pieces of the schema. Figure 1.3 shows an XML Schema for our example from Figure 1.1. On the first line we see a definition for an element called note with no type specified. No type is specified in the element definition because the next line defines the note element as a complex type. A complex type is mainly used when an element will contain other elements. The next line is <xs:sequence>, which means that the child elements that follow this line must appear in the order given in the schema. <xs:all> or <xs:choice> could be placed here instead. <xs:all> means that all the elements must be present, but in no particular order. <xs:choice> means that either one element or another can occur. The next 3 lines are definitions for the child elements. These are simple elements that have only a name and a type.

Figure 1.4: Common Data Types for Schemas

The most common types are listed in Figure 1.4. Then we close the remaining open tags.
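The schema walked through above can be sketched as follows. This is only an assumed reconstruction from the prose description, since Figure 1.3 itself is not reproduced here: the <xs:schema> wrapper and its namespace declaration are added so that the text parses on its own, and the helper name declared_children is hypothetical. The Python code uses only the standard library to read the schema and list the child elements it declares.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema namespace, bound to the conventional "xs" prefix.
NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Assumed reconstruction of the note schema described in the text.
# The xs:schema wrapper is an addition so this parses as a full document.
NOTE_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def declared_children(xsd_text):
    """Return (name, type) pairs declared in the first element's xs:sequence."""
    schema = ET.fromstring(xsd_text)
    sequence = schema.find("xs:element/xs:complexType/xs:sequence", NS)
    return [(e.get("name"), e.get("type"))
            for e in sequence.findall("xs:element", NS)]


for name, type_name in declared_children(NOTE_XSD):
    print(name, type_name)
```

Swapping <xs:sequence> for <xs:all> or <xs:choice> in the schema text would change the ordering rule as described above; the helper here only looks for <xs:sequence>.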
There are many different data types available in XML Schema; Figure 1.4 lists only the most common ones. These types describe what should be contained in the element. "xs:string", for example, should be used when the element will hold text data, such as a name or website address. This is why we used "xs:string" as our type in the schema in Figure 1.3. If we decided to add a date to our note element, though, we would use "xs:date" for the type.

How to use XML Schema

Now we can talk about how to use a schema to help us create our XML documents. A schema describes what must be in an XML document. Looking at Figure 1.3, we can determine that we must have a note element that has 3 child elements: to, from, and body, in that specific order.
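To make that requirement concrete, here is a minimal sketch of the same rule written by hand in Python. It is not a real XML Schema validator (the Python standard library does not include one); it only checks the root element name and the order of the child elements. The function name follows_note_structure and the sample note contents are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Child elements the schema requires, in the order xs:sequence demands.
REQUIRED_CHILDREN = ["to", "from", "body"]


def follows_note_structure(xml_text):
    """Hand-rolled structural check; it does not read or apply an XSD file."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not even well-formed XML
    if root.tag != "note":
        return False
    # Compare the child element names, in document order, to the required list.
    return [child.tag for child in root] == REQUIRED_CHILDREN


# Sample documents; the text values are assumed, based on the note described earlier.
valid_note = (
    "<note><to>Student</to><from>Teacher</from>"
    "<body>There is a test next Tuesday.</body></note>"
)
misordered_note = (
    "<note><from>Teacher</from><to>Student</to>"
    "<body>There is a test next Tuesday.</body></note>"
)

print(follows_note_structure(valid_note))       # True
print(follows_note_structure(misordered_note))  # False
```

A real validator would also check the declared types, such as "xs:string" or "xs:date", which this sketch ignores.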
