DOM and XPath

J. Schneeberger, University of Applied Sciences Deggendorf

1 Overview

• DOM
• XPath
  – step and location path
  – axis
  – other concepts
  – node types, abbreviations, and data types
  – functions
• XLink and XPointer

2 DOM / XPath / XQuery

• idea: uniform query language for parts of XML trees
• DOM
  – first implementation in Netscape 2
  – different in different browsers
  – standard by W3C
  – platform independent
• XPath
  – Version 1: 1999
  – Version 2 draft: 2004
  – Version 2 standard: 2007
  – Version 2 is supported by a few tools only

3 DOM: Document Object Model

4 Document Object Model (DOM)

• W3C specification
• exposes the tree structure of XML/HTML documents
• programming interface (API), used here from JavaScript, for HTML and XML documents
  – Core DOM – base model for HTML and XML documents
  – XML DOM – model for XML documents
  – HTML DOM – model for HTML documents

5 DOM nodes

• the whole document is a node
• each XML element is a node
• the text within an XML element is a node
• each attribute is an (attribute) node
• comments are (comment) nodes

6 JavaScript DOM (Netscape)

7 In JavaScript / Browser

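• Load an XML file

    // synchronous request; xhr is the request object, not the document
    var xhr = new window.XMLHttpRequest();
    xhr.open("GET", "books.xml", false);
    xhr.send("");
    var xmlDoc = xhr.responseXML;   // the parsed DOM document

• Load an XML string

    // the wrapper function name loadXMLString is illustrative
    function loadXMLString(txt) {
      var xmlDoc;
      try {                          // Internet Explorer
        xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
        xmlDoc.async = false;        // boolean, not the string "false"
        xmlDoc.loadXML(txt);
        return xmlDoc;
      } catch (e) {                  // other browsers
        var parser = new DOMParser();
        xmlDoc = parser.parseFromString(txt, "text/xml");
        return xmlDoc;
      }
    }

8 DOM: properties and methods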

• Examples for properties – if x is a node:
  – x.nodeName – the name of x
  – x.nodeValue – the value of x
  – x.parentNode – the parent node of x
  – x.childNodes – the child nodes of x
  – x.attributes – the attribute nodes of x
• Examples of methods:
  – x.getElementsByTagName(name) returns all elements with the tag name name
  – x.appendChild(node) inserts a child node below x
  – x.removeChild(node) removes a child node from x

9 Another JavaScript example

    txt = xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue

• xmlDoc – the DOM node generated by the parser
• getElementsByTagName("title")[0] – the first element in the list of all title elements
• childNodes[0] – the first node in the list of child nodes
• nodeValue – the value of a node (e.g. some text)

10 Accessing DOM nodes

• by getElementsByTagName()
• by traversing the DOM tree
• by navigating the tree using the node relations

11 Event handler

• Many HTML elements can be given event handlers
  – e.g. onAbort, onClick, onFocus, onLoad, onMouseover, onSelect, onSubmit, etc.
• Event handlers may specify JavaScript expressions which are evaluated when the specified event occurs.
• The evaluation order of event handlers may be tricky
  – e.g. if a button specifies all of the following: onClick, onFocus, onMouseover, onSubmit

12 Firefox DOM Inspector

G. Görz, FAU, Informatik 8

15 DOM specifications

• Level 1: Core
  – HTML and XML document model
  – navigation in trees, tree modifications
• Level 2: Style Sheet
  – change of format specs at tree nodes
  – event handler
  – functions for interaction
  – XML namespaces
• Level 3:
  – load and save of documents
  – DTD and Schema support
  – views and formatting
• Further Levels: windows and interaction

[http://xml.coverpages.org/dom.html]

16 XPath

17 Overview

• DOM – Document Object Model
• XPath
  – step and location path
  – axis
  – other concepts
  – node types, abbreviations, and data types
  – functions
• XLink and XPointer

18 Source

• Parts from: Anders Møller, Michael I. Schwartzbach, "An Introduction to XML and Web Technologies", Addison-Wesley, January 2006
• http://www.brics.dk/ixwt/

19 What is XPath?

• a notation to describe parts of trees
• to navigate in trees
• Is used by:
  – XSLT – programming language to transform XML
  – XML Schema (to define the uniqueness and the scope of elements)
  – XLink and XPointer

20 XPath

• combination of path expressions (like those of a command shell) and simple programming language expressions
  – *.xml – all files with the extension ".xml"
  – e.g.: /body/table[@border="1"]
• XSLT style sheets use XPath expressions in match and select attributes

    <xsl:template match="/">
    <xsl:value-of select="."/>
    <xsl:apply-templates select="/recipe/incredients/item"/>

21 Spaghetti

[XML source of the example; the markup was lost. Recoverable content: a <recipe> element with the <title> "Spaghetti Carbonara", the ingredients spaghetti, butter, egg, and garlic, the preparation text "Spaghetti carbonara is the classical ..", and the difficulty 2]

22 ... as XML tree

recipe
  title – "Spaghetti Carbonara"
  incredients
    item (weight = 250g) – Spaghetti
    item (weight = 10g) – butter
    item (amount = 3) – eggs
    item – garlic
  preparation – "Spaghetti Carbonara is.. .."
  info
    difficulty – 2
    duration (min = 20, work = prep..)
    duration (min = 20, work = total)

23 Location Path / Step

24 Location Path (1)

• two kinds of location paths
• relative path
  – one or more steps (from left to right), connected by "/"
  – each step selects a set of nodes relative to the context (i.e. the start node)
  – each node of this set becomes the context for the next step
  – multiple result sets of a step are combined (set union)
• absolute path
  – an absolute path consists of "/" followed by a relative path
  – the absolute path "/" alone selects the root node

[http://www.w3.org/TR/xpath#location-paths]

25 Location Path (2)

• a location path
  – evaluates to a sequence of nodes
  – the sequence is sorted in document order
  – the sequence will never contain duplicates
• general form: step1 / step2 / ... / stepN

26 Evaluating a Location Path

• A step maps a context node into a sequence
• This also maps sequences to sequences
  – each node is used as context node
  – and is replaced with the result of applying the step
• The path then applies each step in turn.
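The evaluation scheme above can be sketched in JavaScript on a plain object tree. This is only an illustration under assumptions: the tree and all helper names (treeNode, numberNodes, pathAxes, applyStep, evalPath) are hypothetical, only the child and descendant axes with a name test are covered, and no real XPath engine or DOM is involved.

```javascript
// Hypothetical mini tree (not a real DOM): every node has a name and children.
function treeNode(name, ...children) {
  return { name, children, order: -1 };
}

// Number all nodes in document order (preorder traversal).
function numberNodes(root) {
  let next = 0;
  const visit = (n) => {
    n.order = next++;
    n.children.forEach((c) => visit(c));
  };
  visit(root);
  return root;
}

// Two axes only: each maps one context node to a list of nodes.
const pathAxes = {
  child: (n) => n.children,
  descendant: (n) => n.children.flatMap((c) => [c, ...pathAxes.descendant(c)]),
};

// A step maps a whole sequence to a sequence: every node serves as context node,
// the results are united (no duplicates) and sorted in document order.
function applyStep(sequence, axis, name) {
  const found = new Set();
  for (const context of sequence) {
    for (const candidate of pathAxes[axis](context)) {
      if (candidate.name === name) found.add(candidate);
    }
  }
  return [...found].sort((a, b) => a.order - b.order);
}

// The path applies each step in turn, starting from the context node.
function evalPath(context, steps) {
  return steps.reduce((seq, [axis, name]) => applyStep(seq, axis, name), [context]);
}

const exampleTree = numberNodes(
  treeNode("A",
    treeNode("B",
      treeNode("C", treeNode("E"), treeNode("E", treeNode("F")))),
    treeNode("B",
      treeNode("C", treeNode("C", treeNode("E", treeNode("F")))),
      treeNode("D"))));

// descendant::C/child::E/child::F relative to the root A
const pathResult = evalPath(exampleTree, [["descendant", "C"], ["child", "E"], ["child", "F"]]);
console.log(pathResult.map((n) => `${n.name}#${n.order}`)); // the two F nodes: F#5 and F#10
```

The Set in applyStep is what keeps a node that is reachable from two context nodes from appearing twice; with descendant::C/descendant::E on this tree, the innermost E is found from both nested C nodes but is returned once.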

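27 Example

descendant::C

[Figure: example tree with root A; the context node and the result of descendant::C are marked]

28 Example

descendant::C/child::E

[Figure: the same tree; the result of descendant::C/child::E is marked]

29 Example

descendant::C/child::E/child::F

[Figure: the same tree; the context node, the results of step 1 and step 2, and the result of step 3 are marked]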

30 Context

• The context of an XPath evaluation consists of
  – a context node (a node in an XML tree)
  – a context position and size (two nonnegative integers)
  – a set of variable bindings
  – a function library
  – a set of namespace declarations
• The application determines the initial context
• If the path starts with "/" then
  – the initial context node is the root
  – the initial position and size are 1

31 XPath data model

• Starting point is the XML Information Set (Infoset), i.e. the information found in a valid and parsed XML document.
• In addition, the following holds for XPath:
  – All data types of XML Schema are supported (complex and simple types).
  – Collection elements and complex values.
  – Typed atomic values.
  – Ordered and heterogeneous sequences.
• http://www.w3.org/TR/xpath-datamodel/

32 Location Step

• The location path is a sequence of steps
• A location step consists of
  – an axis
  – a node test
  – some predicates

    axis :: nodetest [expr1] [expr2]

33 Axis

34 Axis

• An axis is a sequence of nodes
• An axis is evaluated relative to a context node
• XPath names 12 axes:
  – child, parent, self, attribute
  – ancestor, descendant, ancestor-or-self, descendant-or-self
  – preceding-sibling, following-sibling, preceding, following

35 Axis Direction

• Each axis has a direction
• forward – in document order
  – child, descendant, following-sibling, following, self, descendant-or-self
• backward – in reverse document order
  – parent, ancestor, ancestor-or-self, preceding-sibling, preceding
• without direction – depends on the implementation
  – attribute
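As a rough sketch of how the axes and their directions behave, the following JavaScript computes some of them on a small tree with parent pointers. The tree and all function names (makeNode, ancestorsOf, followingOf, ...) are hypothetical and chosen for this illustration only; forward axes are returned in document order, backward axes with the nearest node first.

```javascript
// Hypothetical mini tree with parent pointers; not a real DOM.
function makeNode(name, ...children) {
  const n = { name, children, parent: null };
  for (const c of children) c.parent = n;
  return n;
}

// forward axis: document order
const descendantsOf = (n) => n.children.flatMap((c) => [c, ...descendantsOf(c)]);

// backward axis: nearest ancestor first
function ancestorsOf(n) {
  const out = [];
  for (let p = n.parent; p !== null; p = p.parent) out.push(p);
  return out;
}

// forward axis: siblings after n
function followingSiblingsOf(n) {
  if (n.parent === null) return [];
  const sibs = n.parent.children;
  return sibs.slice(sibs.indexOf(n) + 1);
}

// backward axis: siblings before n, nearest first
function precedingSiblingsOf(n) {
  if (n.parent === null) return [];
  const sibs = n.parent.children;
  return sibs.slice(0, sibs.indexOf(n)).reverse();
}

// following: all nodes after n in document order, without its descendants
function followingOf(n) {
  return [n, ...ancestorsOf(n)].flatMap((a) =>
    followingSiblingsOf(a).flatMap((s) => [s, ...descendantsOf(s)]));
}

// preceding: all nodes before n, without its ancestors, in reverse document order
function precedingOf(n) {
  return [n, ...ancestorsOf(n)].flatMap((a) =>
    precedingSiblingsOf(a).flatMap((s) => [s, ...descendantsOf(s)].reverse()));
}

// Example tree: a( b( d, e ), c( f ) )
const nD = makeNode("d"), nE = makeNode("e"), nF = makeNode("f");
const nB = makeNode("b", nD, nE), nC = makeNode("c", nF);
const nA = makeNode("a", nB, nC);

const names = (nodes) => nodes.map((n) => n.name).join(" ");
console.log(names(ancestorsOf(nE)));  // b a
console.log(names(followingOf(nE)));  // c f
console.log(names(precedingOf(nC)));  // e d b
```

For any context node in this sketch, self plus the ancestor, descendant, preceding, and following axes cover every node of the tree exactly once.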

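36 parent axis

[Figure: example tree with root A; the nodes on the parent axis of the context node are marked]

37 child axis

[Figure: example tree; the nodes on the child axis are marked]

38 descendant axis

[Figure: example tree; the nodes on the descendant axis are marked]

39 ancestor axis

[Figure: example tree; the nodes on the ancestor axis are marked]

40 following-sibling axis

[Figure: example tree; the nodes on the following-sibling axis are marked]

41 preceding-sibling axis

[Figure: example tree; the nodes on the preceding-sibling axis are marked]

42 following axis

[Figure: example tree; the nodes on the following axis are marked]

43 preceding axis

[Figure: example tree; the nodes on the preceding axis are marked]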

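44 axis

[Figure: regions of the tree relative to the context node (self): ancestor, preceding, following, descendant]

45 axis

[Figure: the same regions with the sibling axes added: ancestor, preceding-sibling, self, following-sibling, preceding, following, descendant]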

46 Node Types, Abbreviations, and Data Types

47 Node Types

• element node – a node in the tree that corresponds to an element
• text node – text in the XML tree with no further subelements
• attribute node – represents an attribute (with name and value)

48 Node Test

node test                  selection
text()                     a text node
comment()                  a comment node
processing-instruction()   a processing instruction node
node()                     any node: elements enclosed by tags and also nodes consisting of text (between elements)
*                          elements with arbitrary names
QName                      elements with the given qualified name
*:NCName                   elements with an arbitrary namespace and the given local name
NCName:*                   elements with the given namespace and an arbitrary local name

49 Abbreviations for XPath expressions

• attributes: @
• the context node: .
• the parent of the context node: ..
• the descendant-or-self axis: //
• the child axis: (... by omission)

50 Abbreviations

[http://www.brics.dk/ixwt/]

51 More Examples

Abbreviation         Long form
*                    child::element()
text()               child::text()
@maker               attribute::maker
@*                   attribute::*
x//y                 child::x/descendant-or-self::node()/child::y
.                    self::node()
..                   parent::node()
../@maker            parent::node()/attribute::maker
car[5]               child::car[position()=5]
car[@maker="US"]     child::car[attribute::maker="US"]
car[milage]          child::car[child::milage]

52 Atomization

• A sequence may be atomized
• This results in a sequence of atomic values
• For element nodes this is the concatenation of all descendant text nodes
• For other nodes this is the obvious string
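A minimal sketch of atomization in JavaScript, assuming hypothetical node objects of the shapes { type: "element", name, children }, { type: "text", value }, and { type: "attribute", name, value }. The function names stringValue and atomize are made up for this example, and every node is simply reduced to its string value; typed values that a schema might supply are ignored.

```javascript
// String value of a node: for elements the concatenation of all descendant
// text nodes in document order, for text and attribute nodes their value.
function stringValue(n) {
  if (n.type === "text" || n.type === "attribute") return n.value;
  return n.children.map(stringValue).join("");
}

// Atomizing a sequence replaces each node by an atomic value and keeps
// items that already are atomic values (numbers, strings, booleans).
function atomize(sequence) {
  return sequence.map((item) =>
    item !== null && typeof item === "object" ? stringValue(item) : item);
}

// Hypothetical example element: <title>Spaghetti <em>Carbonara</em></title>
const titleElement = {
  type: "element",
  name: "title",
  children: [
    { type: "text", value: "Spaghetti " },
    { type: "element", name: "em", children: [{ type: "text", value: "Carbonara" }] },
  ],
};

console.log(atomize([titleElement, 42])); // [ 'Spaghetti Carbonara', 42 ]
```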

53 Data Types

XPath 1.0
• node sets
• boolean values
• numbers (floating point)
• strings

XPath 2.0 – XML Schema
• xs:string
• xs:boolean
• xs:decimal
• xs:float
• xs:double
• xs:duration
• xs:dateTime
• xs:time
• xs:date
• xs:gYearMonth
• xs:gYear
• xs:gMonthDay
• xs:gDay
• xs:gMonth
• xs:hexBinary
• xs:base64Binary
• xs:anyURI
• xs:QName
• xs:NOTATION

54 Expressions


56 XPath expressions

• Expression
  – OrExpression
    • AndExpression
      – EqualityExpression
        » RelationalExpr
          ...
• Predicate

• XPathExpression
  – LocationPath
  – FilterExpression
    • PrimaryExpression
      – VariableReference
      – Literal
      – Number
      – FunctionCall
  – RelativeLocationPath
    • Step
      – AxisSpecifier
      – ...

57 Expressions

• Each expression evaluates to
  – an atomic value or
  – a sequence of nodes
• Atomic values are:
  – numbers
  – booleans
  – Unicode strings
  – XML Schema data types

58 Predicates

A predicate
• is an XPath expression evaluated with respect to the context
• yields a result that is transformed into a boolean value:
  – if the result is a number: true if the current context position equals that number
  – if the result is a string: true if the string is not empty
  – if the result is a sequence (of nodes): true if the sequence is not empty
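The conversion rules above can be sketched in JavaScript as follows. The function names predicateHolds and applyPredicate are hypothetical, a predicate is modelled as a plain JavaScript function of (item, position, size), and sequences are modelled as arrays; this is an illustration of the rules, not an XPath implementation.

```javascript
// Turn the result of a predicate into a boolean for the item at `position` (1-based).
function predicateHolds(result, position) {
  if (typeof result === "number") return result === position; // numeric: position test
  if (typeof result === "string") return result.length > 0;   // string: non-empty
  if (Array.isArray(result)) return result.length > 0;        // sequence: non-empty
  return Boolean(result);                                     // already a boolean
}

// Keep the items of a sequence for which the predicate holds.
// Context position and context size are passed to the predicate function.
function applyPredicate(sequence, predicate) {
  const size = sequence.length;
  return sequence.filter((item, i) => predicateHolds(predicate(item, i + 1, size), i + 1));
}

const lines = ["l1", "l2", "l3", "l4", "l5"];

console.log(applyPredicate(lines, () => 5));                        // like line[5]
console.log(applyPredicate(lines, (item, pos, size) => size));      // like line[last()]
console.log(applyPredicate(lines, (item, pos) => pos % 2 === 0));   // like line[position() mod 2 = 0]
```

The first two calls both keep only "l5", the third keeps "l2" and "l4".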

59 Literal Expressions

60 Arithmetic Expressions

• +, -, *, div, idiv, mod
• Operators are generalized to sequences
  – If one argument is empty, the result is empty
  – If all arguments are singleton sequences of numbers, the operation is carried out on these numbers
  – Otherwise, an error occurs

61 Variable references

• $foo
• $bar:foo – variable names may contain :
• $foo-17 – variable names may contain -
• be careful with variables in arithmetic expressions:
    ($foo)-17
    $foo -17
    $foo+-17

62 sequence expressions

• the comma operator , concatenates sequences
• the to operator constructs numeric sequences
• Operators: union, intersect, except
• sequences are always flattened
• The following examples represent the same sequence:

63 Filter Expressions

• Extends predicates to sequences
• The expression "." is the context
• The expression

has the result

64 Numeric Value Comparison

• Operators: eq, ne, lt, le, gt, ge
• Used for atomic values
• If used on arbitrary objects:
  – Atomization
  – If an argument is empty, the result is empty
  – If an argument is a sequence with length > 1, then a type error occurs
  – If not comparable, then error
  – Otherwise compare the arguments

65 General Comparisons

• Operators: =, !=, <, <=, >, >=
• If used on arbitrary arguments
  – atomization
  – If there is at least one pair of values, one from each argument, for which the comparison holds, the result is true
  – if there is no such pair: false
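The existential reading of the general comparison operators can be sketched in JavaScript. The names generalCompare, isEqual, and isDifferent are hypothetical, and the arguments are assumed to be already atomized arrays of plain values.

```javascript
// A general comparison is true if at least one pair of values,
// one from each argument, satisfies the comparison.
function generalCompare(left, right, holds) {
  return left.some((l) => right.some((r) => holds(l, r)));
}

const isEqual = (x, y) => x === y;      // stands for =
const isDifferent = (x, y) => x !== y;  // stands for !=

console.log(generalCompare([1, 2], [2, 3], isEqual));     // true: the pair 2, 2
console.log(generalCompare([1, 2], [1, 2], isDifferent)); // true as well: the pair 1, 2
console.log(generalCompare([], [1], isEqual));            // false: no pair at all
```

One consequence visible in the second call is that = and != can both be true for the same arguments, so != is not the negation of = on sequences.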

66 Node Comparison

• Operators: is, <<, >>
• Compares nodes with respect to identity and order
• When comparing arbitrary arguments:
  – If an argument is empty, the result is empty
  – If both arguments are single nodes, the nodes are compared
  – Otherwise error

67 Boolean Expressions

• Operators: and, or
• Arguments are transformed to boolean values.
• An argument is transformed to false if
  – the value is the boolean false value,
  – the sequence is empty,
  – the string is empty,
  – the number is 0.
• Constant boolean values are specified by the functions true() and false()
• Negation is not(...)

68 Functions

69 Usage of Functions

• Expressions within predicates
• Without arguments
    line[last()]
    line[position() mod 2 = 0]    all line elements at even positions
  – the function operates on the result of the preceding XPath step
• With arguments
    count(/body/table[@border="1"])
  – the function uses the specified arguments

70 XPath 1 vs. 2

XPath 1
• Function categories
  – context functions
  – nodes
  – conversion of data types
  – string functions
  – arithmetic functions

XPath 2
• much more
• a namespace "fn:..."
• user definable functions

71 Function Calls

• Some functions can be called on sequences or arbitrary many arguments

72 Arithmetic Functions

• Examples

73 Boolean Functions

• Examples

74 String Functions

• Examples

75 Functions with Regular Expressions

• Examples

76 Functions for Cardinality

• Examples

77 Functions on Sequences

78 Aggregation Functions

• Examples

79 Node Functions

• Examples

80 Type Conversion Functions

• Examples, please note the namespace

81 For Loop

82 Conditionals

83 Quantified Expressions

84 Hands-on Section

85 www.zvon.org

86 Eclipse XPath view

90 XLink and XPointer

91 XPath, XLink, XPointer and XSLT

[Figure: relationship between XPointer, XQuery, XLink, XPath, and XSLT]

92 XPointer

• Identifies parts of an XML document

:href="index.xml#r102"

# Symbol XPointer

Variants: – Simple references • element(...) • (...)

93 XPointer: variants (1)

• Simple references using an ID attribute
    ..#r102
• element(...)
  – Refers to elements by their position in the document
    ..#element(/1/5)
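How a child sequence such as /1/5 selects an element by position can be sketched in JavaScript. The function name resolveChildSequence and the plain object tree are assumptions made for this illustration; the tree is assumed to contain element nodes only, with the document node as the starting point, so that /1 is the document element and /5 its fifth child element.

```javascript
// Resolve a child sequence like "/1/5" on a hypothetical tree of { name, children } objects.
// Each number selects the n-th child element (counting from 1) of the element found so far.
function resolveChildSequence(documentNode, childSequence) {
  const indexes = childSequence.split("/").slice(1).map(Number);
  let current = documentNode;
  for (const index of indexes) {
    current = current.children[index - 1];
    if (current === undefined) return null; // no such element
  }
  return current;
}

// Hypothetical document: one root element with five recipe children.
const recipeElements = [1, 2, 3, 4, 5].map((i) => ({ name: "recipe" + i, children: [] }));
const documentNode = { name: "#document", children: [{ name: "collection", children: recipeElements }] };

console.log(resolveChildSequence(documentNode, "/1/5").name); // recipe5
console.log(resolveChildSequence(documentNode, "/1/6"));      // null
```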

94 XPointer: variants (2)

• xpointer(...)
    ..#xpointer(//recipe[4])
    ..#xpointer(//rcp:recipe[./rcp:title ='Zuppa Inglese'])
• An xpointer(...) expression may contain:
  – An XPath expression
  – And additional (XPath) functions
    • Positions before and after elements: start-point(), end-point(), ...
    • Ranges between elements: range(), range-to(), ...
    • Selected ranges in text: string-range(), ...

95 XLink

• XLink is a generalized concept of the well-known hyperlinks in HTML
  – Provides many-to-many relations
  – Provides "third party" links
  – For arbitrary elements
• XLinks are specified by attributes only
  – which is the problem of XLink
  – and the reason why it is not being used.

96 XLink example (1)

97 XLink example (2)

[Figure: two student nodes linked to one teacher node: "Carl and Fred are students of Joe"]

98 XLink variants

• Simple Links
• HTML Links (in XLink)
• HLinks

99 Simple Links

… an abbreviation of …

100 HTML-Links in XLink

101 HLink Tag

102 Online Resources

• http://www.w3.org/TR/xpath/
• http://www.w3.org/TR/xpath20/
• http://www.w3.org/TR/xlink/
• http://www.w3.org/TR/xptr-framework/