
XPath, XQuery and XSLT

Timo Lilja

February 25, 2010

1 Introduction

An XML document consists of nodes that are organized in a hierarchical structure. XPath operates on this tree representation and provides a language with which a user can select parts of the tree for further processing. XQuery 1.0 includes XPath as a subset and provides SQL-like facilities for querying single or multiple documents. XPath and XQuery are functional query languages with a non-XML syntax and no support for mutation. XSLT, which stands for Extensible Stylesheet Language (XSL) Transformations, is a language with a pure XML syntax that is mostly used to transform XML documents into a more or less human-readable form, for example in web browsers. Unlike XQuery, XSLT is oriented toward transforming a single input document at a time, which makes it more suitable for transformations, whereas the multiple-document support of XQuery makes it more suitable for database-like queries. W3Schools provides good online tutorials on XPath [1], XQuery [2] and XSLT [3]. Some of the examples used here are based on theirs.

2 XPath

XPath 2.0 [8] is a language for navigating through the nodes of an XML document's tree representation. XPath syntax is non-XML, which makes it easy to use inside XML attribute values and URIs. XPath is the embedded query language of both XQuery 1.0 and XSLT 2.0. XPath 2.0 is a subset of XQuery 1.0, so, apart from minor differences, valid XPath expressions are also valid XQuery 1.0 expressions. XPath 2.0 mixes static and dynamic typing. Type information is obtained from the document's XML Schema and assigned to the nodes. If no XML Schema is in use, the nodes are given untyped type annotations (xs:untyped for elements and xs:untypedAtomic for attributes).

When an expression is evaluated, typed values are checked to prevent type errors. Untyped values are automatically converted to a suitable type before evaluation; for example, an untyped value compared with a number is converted to xs:double. The standard defines these conversion rules and built-in types for, among others, integers, strings and dates.

The figure above gives an overview of how an XPath expression is processed. Steps that lie outside the line are not in the scope of an executing XPath query. Since XPath operates on the tree representation of an XML document, a data model representing it must be built first. The evaluation of an XPath expression starts by building the operation tree, which is then normalized; normalization includes finding out the types of non-atomic expressions and computing the results of statically decidable boolean expressions. In the static evaluation phase, if static type information is available, the expression is type checked and a type error is raised if a type mismatch is found. In the dynamic evaluation phase, the value of the XPath expression is computed; this phase either produces the result of the query or raises a dynamic error. The final phase, which is implementation-dependent, is serialization, which produces the data in a form readable by the host.

Path expressions consist of one or more steps separated by the path operator (/, slash). An absolute path expression starts with a leading /, while a relative path expression does not. A step consists of

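• an axis, which specifies the relationship between the context node and the nodes to select (for example child, parent or descendant),
• a node test, which selects nodes on that axis by name or by kind (for example title or text()), and
• zero or more predicates, which filter the selected nodes further.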

The syntax for the path expressions:

    (: absolute path :)
    /step/step
    (: relative path :)
    step/step

Syntax for a step: axisname::nodetest[predicate]

Let's assume that we have the following XML document:

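    <bookstore>
      <book>
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>
      <book>
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>45.00</price>
      </book>
    </bookstore>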

To query all the titles from the above document one could say bookstore/book/title which would produce

Everyday Italian, Harry Potter
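The step-by-step evaluation of bookstore/book/title can be sketched in Python with the standard xml.etree.ElementTree module. This is a minimal illustration, not an XPath engine: it supports only child-axis name tests, assumes the bookstore document is marked up with book elements holding title, author, year and price children (author and year are assumed element names), and uses a hypothetical wrapper element named document-node to stand in for the document node, which ElementTree does not model.

```python
import xml.etree.ElementTree as ET

# Sample document; author and year are assumed element names.
BOOKS = """<bookstore>
  <book>
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>45.00</price>
  </book>
</bookstore>"""


def child_step(context, name):
    """Apply one child::name step to every node of the context sequence."""
    result = []
    for node in context:
        # With child-only steps from a single document node, all context nodes
        # sit at the same depth, so concatenating their children already gives
        # document order without duplicates.
        result.extend(child for child in node if child.tag == name)
    return result


def evaluate(document, path):
    """Evaluate a relative path of child-axis name tests, e.g. 'bookstore/book/title'."""
    context = [document]
    for step in path.split("/"):
        context = child_step(context, step)
    return context


# Hypothetical wrapper standing in for the XPath document node.
document = ET.Element("document-node")
document.append(ET.fromstring(BOOKS))

titles = evaluate(document, "bookstore/book/title")
print([t.text for t in titles])  # ['Everyday Italian', 'Harry Potter']
```

Each step's output becomes the next step's context, which is exactly how the path operator chains steps.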

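The path operator is a binary operator: its right-hand side is evaluated once for each node returned by the left-hand side, with that node as the context item, and the results are combined into one sequence (when the results are nodes, they are returned in document order without duplicates). The expressions on the left- and right-hand sides of the operator can be arbitrary expressions, provided that the left-hand side evaluates to a sequence of nodes. A leading / is interpreted as root(self::node()), that is, the root of the tree containing the context node. The double slash // abbreviates /descendant-or-self::node()/ and can be used as a sort of wildcard to omit intermediate steps: bookstore//title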
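ElementTree's limited XPath subset understands the // abbreviation, so the wildcard behaviour can be checked directly. This sketch assumes the bookstore document is marked up with book elements holding title, author, year and price children (author and year are assumed names). Paths are written relative to the bookstore element because ElementTree starts its searches from an element rather than from a document node.

```python
import xml.etree.ElementTree as ET

# Sample document; author and year are assumed element names.
BOOKS = """<bookstore>
  <book><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author>
    <year>2005</year><price>30.00</price></book>
  <book><title lang="en">Harry Potter</title><author>J K. Rowling</author>
    <year>2005</year><price>45.00</price></book>
</bookstore>"""

root = ET.fromstring(BOOKS)  # the bookstore element

# bookstore//title: title elements at any depth below bookstore
# ('.//' searches all descendants of the current element).
descendant_titles = root.findall(".//title")

# The same nodes reached with every step spelled out: bookstore/book/title.
explicit_titles = root.findall("./book/title")

print([t.text for t in descendant_titles])   # ['Everyday Italian', 'Harry Potter']
print(descendant_titles == explicit_titles)  # True: the very same element objects
```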

To select the first book element:

/bookstore/book[1]

To select the last book element:

/bookstore/book[last()]

To select all title elements whose lang attribute is "en":

/bookstore/book/title[@lang="en"]

To select the titles of all books whose price is more than 35.00:

/bookstore/book[price>35.00]/title
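The predicates above can be tried with ElementTree, which supports positional predicates ([1], [last()]) and attribute tests but not numeric comparisons. This is a sketch assuming the bookstore document is marked up with book elements holding title, author, year and price children (author and year are assumed names); the price filter is therefore written in Python, converting the price text to a number much as XPath converts an untyped value compared with a number to xs:double.

```python
import xml.etree.ElementTree as ET

# Sample document; author and year are assumed element names.
BOOKS = """<bookstore>
  <book><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author>
    <year>2005</year><price>30.00</price></book>
  <book><title lang="en">Harry Potter</title><author>J K. Rowling</author>
    <year>2005</year><price>45.00</price></book>
</bookstore>"""

root = ET.fromstring(BOOKS)  # the bookstore element

first = root.find("book[1]")                       # /bookstore/book[1] (positions start at 1)
last = root.find("book[last()]")                   # /bookstore/book[last()]
english = root.findall("book/title[@lang='en']")   # /bookstore/book/title[@lang="en"]

# /bookstore/book[price>35.00]/title: ElementTree cannot compare numbers in a
# predicate, so the filter is done in Python on the converted price.
expensive = [book.find("title") for book in root.findall("book")
             if float(book.findtext("price")) > 35.00]

print(first.findtext("title"), "|", last.findtext("title"))  # Everyday Italian | Harry Potter
print([t.text for t in english])    # ['Everyday Italian', 'Harry Potter']
print([t.text for t in expensive])  # ['Harry Potter']
```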

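To select, starting from each book node, the nodes on the ancestor-or-self axis that are book elements: bookstore/book/ancestor-or-self::book. Since no book is nested inside another book here, this returns just the two book elements; ancestor-or-self::node() would also return bookstore and the document node.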
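ElementTree has no ancestor axes, but the ancestor-or-self axis can be sketched with a child-to-parent map. This is an illustration assuming the bookstore document is marked up with book elements holding title, author, year and price children (author and year are assumed names); the document node itself is not modelled, so the walk stops at bookstore.

```python
import xml.etree.ElementTree as ET

# Sample document; author and year are assumed element names.
BOOKS = """<bookstore>
  <book><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author>
    <year>2005</year><price>30.00</price></book>
  <book><title lang="en">Harry Potter</title><author>J K. Rowling</author>
    <year>2005</year><price>45.00</price></book>
</bookstore>"""

root = ET.fromstring(BOOKS)

# ElementTree elements do not point to their parents, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestor_or_self(node):
    """Yield the node and then its ancestors, nearest first (a reverse axis)."""
    while node is not None:
        yield node
        node = parent_of.get(node)


for book in root.findall("book"):
    axis = [n.tag for n in ancestor_or_self(book)]
    # The node test ::book keeps only the book elements found on that axis.
    print(axis, [tag for tag in axis if tag == "book"])
# prints ['book', 'bookstore'] ['book'] once per book
```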

XPath provides operators for arithmetic (+, -, *, div, idiv, mod), comparisons (=, !=, <, <=, >, >=), logical operations (and, or) and combining node sequences (|, union, intersect, except). For example, the union operator | can be used to combine the nodes selected by several paths:

//book/title | //book/price or alternatively

//book/title union //book/price which would produce

Everyday Italian, 30.00, Harry Potter, 45.00
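The interleaved result comes from the fact that a union returns each node once, in document order, not one path's results after the other's. A sketch of that rule with ElementTree, assuming the bookstore document is marked up with book elements holding title, author, year and price children (author and year are assumed names):

```python
import xml.etree.ElementTree as ET

# Sample document; author and year are assumed element names.
BOOKS = """<bookstore>
  <book><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author>
    <year>2005</year><price>30.00</price></book>
  <book><title lang="en">Harry Potter</title><author>J K. Rowling</author>
    <year>2005</year><price>45.00</price></book>
</bookstore>"""

root = ET.fromstring(BOOKS)


def union(root, *sequences):
    """XPath-style union: every selected node exactly once, in document order."""
    selected = {node for seq in sequences for node in seq}
    # root.iter() walks the elements in document order.
    return [node for node in root.iter() if node in selected]


titles = root.findall(".//book/title")   # //book/title
prices = root.findall(".//book/price")   # //book/price
combined = union(root, titles, prices)    # //book/title | //book/price
print([n.text for n in combined])  # ['Everyday Italian', '30.00', 'Harry Potter', '45.00']
```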

XPath has a function library with string, regular expression, arithmetic, and date and time functions; the library is shared with XQuery 1.0. See the Functions and Operators specification [11]. For example, to convert the results to lower case:

//book/lower-case(title) and the result would be

"everyday italian", "harry potter"

3 XQuery

XQuery [4] is a functional query language for querying XML document data. It does not include any mutation or update operations, which makes it somewhat similar to SQL SELECT statements. There are currently draft standards that would extend XQuery to support updating XML document data [7]. XQuery uses XPath expressions to select parts of an XML document and extends XPath with FLWOR expressions; the name is an acronym for FOR, LET, WHERE, ORDER BY, RETURN. FLWOR expressions can, for example, filter and sort the results:

    for $x in doc("books.xml")/bookstore/book
    where $x/price > 30
    order by $x/title
    return $x/title
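The clauses of the FLWOR expression above map naturally onto a filter, a sort and a projection. The following Python sketch mirrors them one clause at a time; it parses an inline string instead of calling doc("books.xml") and assumes the bookstore document is marked up with book elements holding title, author, year and price children (author and year are assumed names).

```python
import xml.etree.ElementTree as ET

# Sample document; author and year are assumed element names.
BOOKS = """<bookstore>
  <book><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author>
    <year>2005</year><price>30.00</price></book>
  <book><title lang="en">Harry Potter</title><author>J K. Rowling</author>
    <year>2005</year><price>45.00</price></book>
</bookstore>"""

root = ET.fromstring(BOOKS)  # stands in for doc("books.xml")/bookstore

# for $x in .../book  where $x/price > 30
selected = [x for x in root.findall("book") if float(x.findtext("price")) > 30]
# order by $x/title
selected.sort(key=lambda x: x.findtext("title"))
# return $x/title
result = [x.find("title") for x in selected]

print([t.text for t in result])  # ['Harry Potter'] (30.00 is not greater than 30)
```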

XQuery supports conditional expressions:

    for $x in doc("books.xml")/bookstore/book
    return if ($x/@category="CHILDREN")
           then <child>{data($x/title)}</child>
           else <adult>{data($x/title)}</adult>

You can loop with for expressions:

    for $x in (1 to 5)
    return <test>{$x}</test>

You can bind variables with let clauses:

    let $x := (1 to 5)
    return <test>{$x}</test>

which binds the whole sequence to $x at once and therefore produces a single element:

    <test>1 2 3 4 5</test>
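The difference between the two clauses can be sketched in Python: for builds one element per item, while let builds one element for the whole sequence, whose adjacent atomic values XQuery joins with single spaces. The element name test follows the W3Schools-style example and is an assumption, as is the helper name test_element.

```python
import xml.etree.ElementTree as ET


def test_element(items):
    """Hypothetical helper mimicking <test>{...}</test>: adjacent atomic values
    become one text node, separated by single spaces."""
    element = ET.Element("test")
    element.text = " ".join(str(i) for i in items)
    return element


numbers = range(1, 6)  # (1 to 5)

# for $x in (1 to 5) return <test>{$x}</test>  -> one element per item
for_result = [test_element([x]) for x in numbers]

# let $x := (1 to 5) return <test>{$x}</test>  -> one element for the whole sequence
let_result = [test_element(numbers)]

print([ET.tostring(e, encoding="unicode") for e in for_result])
print([ET.tostring(e, encoding="unicode") for e in let_result])  # ['<test>1 2 3 4 5</test>']
```", "<test>2</test>", "<test>3</test>", "<test>4</test>", "<test>5</test>"]
assert [ET.tostring(e, encoding="unicode") for e in let_result] == ["<test>1 2 3 4 5</test>"]
</test>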

To invoke functions, you can use them inside an element constructor:

    <name>{upper-case($booktitle)}</name>

put them inside a path expression:

    doc("books.xml")/bookstore/book[substring(title,1,5)='Harry']

or use them in a let clause:

    let $name := (substring($booktitle,1,4))

You can define functions with type annotations:

    declare function local:minPrice($p as xs:decimal?, $d as xs:decimal?)
      as xs:decimal?
    {
      let $disc := ($p * $d) div 100
      return ($p - $disc)
    };

Below is an example of how to call the function above:

    <minPrice>{local:minPrice($book/price,$book/discount)}</minPrice>
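The ? in xs:decimal? means the arguments may be empty sequences, and XQuery arithmetic on an empty sequence yields an empty sequence. A Python sketch of the same function, using decimal.Decimal for xs:decimal and None as an assumed stand-in for the empty sequence (for example a book without a discount element):

```python
from decimal import Decimal
from typing import Optional


def min_price(p: Optional[Decimal], d: Optional[Decimal]) -> Optional[Decimal]:
    """Mirror of local:minPrice; None plays the role of the empty sequence."""
    if p is None or d is None:
        # ($p * $d) with an empty operand is empty, and so is the final result.
        return None
    disc = (p * d) / 100
    return p - disc


print(min_price(Decimal("30.00"), Decimal("10")))  # 27.00
print(min_price(Decimal("45.00"), None))           # None
```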

Recursion is supported. Namespaces are declared in the query prolog, as below:

    declare namespace factorial = "http://example.com/factorial";
    declare function factorial:fact($i as xs:integer) as xs:integer
    {
      if ($i <= 1) then 1 else $i * factorial:fact($i - 1)
    };

When the function is invoked, its namespace prefix must be given explicitly: factorial:fact(4), which returns 24.
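The recursion in factorial:fact can be cross-checked with a direct Python transcription. Python has no namespace prefixes for functions, so the function name fact alone is used here; this is a sketch of the same recursive definition, not of XQuery's namespace mechanism.

```python
def fact(i: int) -> int:
    """Same recursion as factorial:fact: 1 for i <= 1, otherwise i * fact(i - 1)."""
    if i <= 1:
        return 1
    return i * fact(i - 1)


print(fact(4))  # 24
```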

4 XSLT

XSLT [10] is a transformation language for XML documents that uses XML syntax and relies on XPath for path expressions. The goal of XSLT is to be a stylesheet language focused on rendering XML documents in a human-readable form. XQuery, on the other hand, is primarily focused on database-like queries whose results are consumed by an application rather than displayed to a person in a browser. XQuery supports querying multiple documents at the same time, while XSLT handles one document at a time. Most modern browsers support XSLT natively if the XML document refers to its stylesheet with an xml-stylesheet processing instruction, for example (cdcatalog.xsl is a hypothetical file name):

    <?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>

Let's assume that we have the following XML document:

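    <catalog>
      <cd>
        <title>Empire Burlesque</title>
        <artist>Bob Dylan</artist>
        <country>USA</country>
        <company>Columbia</company>
        <price>10.90</price>
        <year>1985</year>
      </cd>
      ...
    </catalog>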

We can apply the following XSL transformation to it

[XSLT stylesheet: produces an HTML page headed "My CD Collection" containing a table with the columns Title and Artist]

The stylesheet begins with the XML declaration and the xsl:stylesheet element. The xsl:template element with match="/" means that processing starts from the root of the document. HTML tags are copied verbatim to the output. The xsl:for-each element instantiates its body once for each node selected by the XPath expression in its select="catalog/cd" attribute. The xsl:value-of element outputs the string value of the node selected by its own select expression, which is evaluated relative to the current cd element. The xsl:sort element is placed inside the xsl:for-each element to produce the results in sorted order. The example above would produce the result:
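What the for-each, sort and value-of elements do can be sketched in Python with ElementTree. This is an emulation, not an XSLT processor: the HTML layout, the sort key (artist) and the second cd entry are assumptions made for illustration, and only the title and artist fields shown in the table are modelled.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
  </cd>
  <cd>
    <!-- hypothetical entry, added only so that sorting has a visible effect -->
    <title>Example Album</title>
    <artist>Another Artist</artist>
  </cd>
</catalog>"""


def transform(catalog):
    """Emulate the template for "/" applied to a catalog element."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h2").text = "My CD Collection"  # literal result elements
    table = ET.SubElement(body, "table")
    header = ET.SubElement(table, "tr")
    ET.SubElement(header, "th").text = "Title"
    ET.SubElement(header, "th").text = "Artist"
    # xsl:for-each select="catalog/cd" with an (assumed) <xsl:sort select="artist"/>
    for cd in sorted(catalog.findall("cd"), key=lambda cd: cd.findtext("artist")):
        row = ET.SubElement(table, "tr")
        # xsl:value-of select="title" and select="artist", relative to the current cd
        ET.SubElement(row, "td").text = cd.findtext("title")
        ET.SubElement(row, "td").text = cd.findtext("artist")
    return html


print(ET.tostring(transform(ET.fromstring(CATALOG)), encoding="unicode"))
```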

Empire Burlesque Bob Dylan USA Columbia 10.90 1985 . .

The example includes a conditional (xsl:if) element. If you need to choose between multiple alternatives, you can use the xsl:choose element with xsl:when and xsl:otherwise children:

In addition, you can use xsl:apply-templates to process selected nodes; each node is then handled by the template whose match attribute matches it:
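The dispatch behind xsl:apply-templates can be sketched in Python: each selected node is handed to the template registered for its element name, and nodes without one fall back to a simplified version of XSLT's built-in rule (copy the text, then process the children). The template bodies, the output strings and the "Example Album" entry are assumptions for illustration; real match patterns are XPath patterns, not just element names.

```python
import xml.etree.ElementTree as ET

CATALOG = ("<catalog>"
           "<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist></cd>"
           "<cd><title>Example Album</title><artist>Another Artist</artist></cd>"
           "</catalog>")


def apply_templates(nodes, out):
    """Like <xsl:apply-templates select="..."/>: dispatch each node by element name."""
    for node in nodes:
        TEMPLATES.get(node.tag, builtin_rule)(node, out)


def builtin_rule(node, out):
    """Simplified built-in rule: copy the text, then process the children.
    Mixed content (text after child elements) is ignored in this sketch."""
    if node.text:
        out.append(node.text)
    apply_templates(list(node), out)


def root_template(catalog, out):   # <xsl:template match="/">
    out.append("<h2>My CD Collection</h2>")
    apply_templates(catalog.findall("cd"), out)


def cd_template(cd, out):          # <xsl:template match="cd">
    out.append("<p>")
    apply_templates(cd.findall("title"), out)
    apply_templates(cd.findall("artist"), out)  # no artist template: built-in rule
    out.append("</p>")


def title_template(title, out):    # <xsl:template match="title">
    out.append("Title: " + title.text + "<br/>")


TEMPLATES = {"cd": cd_template, "title": title_template}

out = []
root_template(ET.fromstring(CATALOG), out)
print("".join(out))
```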

[XSLT stylesheet: produces an HTML page headed "My CD Collection" in which each title is output after the label "Title:"]

The standard XPath functions and operators [11] are available in XSLT, and XSLT adds a few built-in functions and elements of its own. It is also possible to write an XML to S-expression converter in XSLT [5].

5 Tools

Probably the most comprehensive implementation of XML standards is libxml [12], the XML parser and toolkit of Gnome. Together with its companion library libxslt, it supports XPath 1.0 and XSLT 1.0, but it has no support for XQuery. XQilla [6] is an open-source implementation of XPath 2.0 and XQuery 1.0. Another alternative is Galax [9], an OCaml implementation of XQuery 1.0 and XPath 2.0. I have used Galax for XPath/XQuery expressions and xsltproc from the libxml/libxslt family for XSLT, since they provide nice command-line interfaces.

References

[1] XPath Introduction. http://www.w3schools.com/xpath/xpath_intro.asp.
[2] XQuery Tutorial. http://www.w3schools.com/xquery/default.asp.
[3] XSLT Tutorial. http://www.w3schools.com/xsl/.
[4] S. Boag, D. Chamberlin, M. F. Fernández, D. Florescu, J. Robie, and J. Siméon. XQuery 1.0: An XML Query Language, 2007. http://www.w3.org/TR/xquery/.
[5] J. D. Brennan. Convert XML to S-expression. http://github.com/nex3/arc/blob/master/extras/xml2sexp.xsl.
[6] Y. Cai. XQilla. http://xqilla.sourceforge.net/HomePage.
[7] D. Chamberlin, M. Dyck, D. Florescu, J. Melton, J. Robie, and J. Siméon. XQuery Update Facility 1.0, 2009. http://www.w3.org/TR/xquery-update-10.
[8] J. Clark and S. DeRose. XML Path Language (XPath) 2.0, 2007. http://www.w3.org/TR/xpath20/.
[9] M. Fernández and J. Siméon. Galax. http://galax.sourceforge.net/.
[10] M. Kay. XSL Transformations (XSLT) Version 2.0, 2007. http://www.w3.org/TR/xslt20/.
[11] A. Malhotra, J. Melton, and N. Walsh. XQuery 1.0 and XPath 2.0 Functions and Operators, 2007. http://www.w3.org/TR/xquery-operators/.
[12] D. Veillard. libxml - The XML library for Gnome. http://xmlsoft.org/.