DOM and XPath
J. Schneeberger University of Applied Sciences Deggendorf [email protected]
1 Overview
• DOM – Document Object Model • XPath – step and location path – axis – other concepts – node types, abbreviations, and data types – Functions • XLink and XPointer
2 DOM / XPath / XQuery
• idea: uniform query language for parts of XML trees • DOM – first implementation in Netscape 2 – different in different browsers – standard by W3C – platform independent • XPath – Version 1: 1999 – Version 2 draft: 2004 – Version 2 standard: 2007 – Version 2 is supported by a few tools only
3 DOM Document Object Model Document Object Model (DOM)
• W3C specification
• exposes the tree structure of XML/HTML documents
• programming interface (API), used here from JavaScript, for HTML and XML documents
  – Core DOM – base model for HTML and XML documents
  – XML DOM – model for XML documents
  – HTML DOM – model for HTML documents
DOM nodes
• the whole document is a node • each XML element is a node • the text within an XML element is a node • each attribute is an (attribute) node • comments are (comment) nodes
6 JavaScript DOM (Netscape)
7 In JavaScript / Browser
• Load an XML file
var xhr = new window.XMLHttpRequest();
xhr.open("GET", "books.xml", false);  // false: synchronous request
xhr.send("");
var xmlDoc = xhr.responseXML;         // the parsed XML document
• Load an XML string
function loadXMLString(txt) {
  var xmlDoc;
  try { // Internet Explorer
    xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
    xmlDoc.async = false;
    xmlDoc.loadXML(txt);
  } catch (e) { // other browsers
    xmlDoc = new DOMParser().parseFromString(txt, "text/xml");
  }
  return xmlDoc;
}
8 DOM: properties and methods
• Examples of properties, if x is a node:
  – x.nodeName – the name of x
  – x.nodeValue – the value of x
  – x.parentNode – the parent node of x
  – x.childNodes – the child nodes of x
  – x.attributes – the attribute nodes of x
• Examples of methods:
  – x.getElementsByTagName(name) returns all elements with the tag name name
  – x.appendChild(node) appends node as the last child of x
  – x.removeChild(node) removes the child node from x
9 Another JavaScript example
txt = xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue
• xmlDoc – the DOM node generated by the parser
• getElementsByTagName("title")[0] – the first element from the array of all title elements
• by getElementsByTagName() • Traversing the dom tree • Navigating the tree using the node relations
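Traversing the tree can be sketched as a recursive walk over childNodes. This is a minimal sketch, not browser code: plain objects stand in for DOM nodes so that it runs without a parser, and makeNode and walk are hypothetical helper names. Only the properties nodeName, nodeValue, and childNodes are assumed.

```javascript
// Hypothetical stand-in for DOM nodes: only nodeName, nodeValue, childNodes.
function makeNode(nodeName, nodeValue, childNodes) {
  return { nodeName: nodeName, nodeValue: nodeValue, childNodes: childNodes || [] };
}

// Depth-first traversal in document order; calls visit(node, depth) for every node.
function walk(node, visit, depth) {
  depth = depth || 0;
  visit(node, depth);
  for (let i = 0; i < node.childNodes.length; i++) {
    walk(node.childNodes[i], visit, depth + 1);
  }
}

// Stand-in for: <book><title>XML</title><year>2006</year></book>
const book = makeNode("book", null, [
  makeNode("title", null, [makeNode("#text", "XML")]),
  makeNode("year", null, [makeNode("#text", "2006")]),
]);

const names = [];
walk(book, function (node) { names.push(node.nodeName); });
console.log(names.join(" "));
```

Because walk reads childNodes only through length and an index, the same function should also accept a real document element in a browser.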
11 Event handler
• Many HTML elements accept event handlers – e.g. onAbort, onClick, onFocus, onLoad, onMouseover, onSelect, onSubmit, etc. • Event handlers may specify JavaScript expressions which are evaluated when the specified event occurs. • The evaluation order of event handlers may be tricky – e.g. if a button specifies all of the following: onClick, onFocus, onMouseover, onSubmit
12 Firefox DOM Inspector
G. Görz, FAU, Informatik 8
DOM specifications
• Level 1: Core – HTML and XML document model – navigation in trees, tree modifications • Level 2: Style Sheet – change of format specs at tree nodes – event handler – functions for interaction – XML namespaces • Level 3: – load and save of documents – DTD and Schema support – views and formatting • Further Levels: windows and interaction
[http://xml.coverpages.org/dom.html]
16 XPath Overview
• DOM – Document Object Model • XPath – step and location path – axis – other concepts – node types, abbreviations, and data types – Functions • XLink and XPointer
18 Source
• Parts from: Anders Møller, Michael I. Schwartzbach “An Introduction to XML and Web Technologies” Addison-Wesley, January 2006 • http://www.brics.dk/ixwt/
19 What is XPath?
• a notation to describe parts of trees • to navigate in trees
• Is used by: – XSLT – programming language to transform XML – XML Schema (to define the uniqueness and the scope of elements) – XLink and XPointer
20 XPath
• combination of path expressions (like those of a command shell) and simple programming language expressions – shell analogy: *.xml – all files with the extension ".xml" – e.g. /body/table[@border="1"]
• XSLT style sheets use XPath expressions in match and select elements
21 Spaghetti
[Figure: tree of the Spaghetti Carbonara recipe document. The root element recipe has the children title, ingredients, preparation, and info; ingredients contains four item elements (spaghetti, butter, eggs, garlic) with weights or amounts; info contains difficulty and two duration elements.]
23 Location Path / Step
24 Location Path (1)
• two kinds of location paths
• relative path
  – one or more steps (evaluated from left to right), connected by "/"
  – each step selects a set of nodes relative to the context (i.e. the start node)
  – each node of this set becomes the context node for the next step
  – the result sets obtained for the individual context nodes are combined (set union)
• absolute path
  – "/" followed by a relative path
  – the path "/" on its own selects the root node
[http://www.w3.org/TR/xpath#location-paths]
25 Location Path (2)
• a location path – evaluates to a sequence of nodes – the sequence is sorted in document order – the sequence will never contain duplicates • general form: step1 / step2 / ... / stepN
26 Evaluating a Location Path
• A step maps a context node into a sequence of nodes • This also maps sequences to sequences – each node is used as context node – and is replaced with the result of applying the step • The path then applies each step in turn.
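The evaluation scheme above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the tree is a hypothetical one built from plain objects (not the tree from the slides), only the child and descendant axes with a name test are modelled, and node, numberNodes, step, applyStep, and evalPath are illustrative names.

```javascript
// Hypothetical mini tree: a node has a name and a list of children.
function node(name, children = []) {
  return { name, children };
}

// Number the nodes in document order (pre-order) so results can be sorted.
function numberNodes(root) {
  let counter = 0;
  (function visit(n) {
    n.order = counter++;
    n.children.forEach(visit);
  })(root);
  return root;
}

// Two axes, each mapping one context node to a list of nodes.
const child = (n) => n.children;
const descendant = (n) => n.children.flatMap((c) => [c, ...descendant(c)]);

// A step = axis + name test (predicates are left out of this sketch).
const step = (axis, name) => (n) => axis(n).filter((m) => m.name === name);

// Apply one step to a whole sequence: every node serves as context node,
// the results are merged without duplicates and sorted in document order.
function applyStep(stepFn, sequence) {
  const merged = new Set();
  for (const context of sequence) {
    for (const selected of stepFn(context)) merged.add(selected);
  }
  return [...merged].sort((a, b) => a.order - b.order);
}

// A path applies its steps in turn, starting from the single context node.
function evalPath(steps, contextNode) {
  return steps.reduce((sequence, s) => applyStep(s, sequence), [contextNode]);
}

const root = numberNodes(
  node("A", [
    node("B", [node("C", [node("E"), node("E", [node("F")])])]),
    node("B", [node("C", [node("F")]), node("D", [node("E")])]),
  ])
);

// descendant::C/child::E/child::F
const result = evalPath(
  [step(descendant, "C"), step(child, "E"), step(child, "F")],
  root
);
console.log(result.map((n) => n.name + "#" + n.order));
```

The Set in applyStep is what keeps the result free of duplicates when two context nodes select the same node; the sort restores document order.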
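27 Example: descendant::C
[Figure: example tree with root A as the context node; the C nodes selected by descendant::C are highlighted]
28 Example: descendant::C/child::E
[Figure: the same tree; the E children of the selected C nodes are highlighted]
29 Example: descendant::C/child::E/child::F
[Figure: the same tree; step 1 selects the C nodes, step 2 their E children, step 3 the F children of those E nodes, which form the result]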
30 Context
• The context of an XPath evaluation consists of – a context node (a node in an XML tree) – a context position and size (two nonnegative integers) – a set of variable bindings – a function library – a set of namespace declarations
• The application determines the initial context
• If the path starts with ‘/’ then – the initial context node is the root – the initial position and size are 1
31 XPath data model
• Starting point is the XML Information Set (Infoset). I.e. the information found in a valid and parsed XML document. • In addition, the following holds for XPath: – All data types of XML Schema are supported. (complex and simple types). – Collection elements and complex values. – Typed atomic values. – Ordered and heterogeneous sequences. • http://www.w3.org/TR/xpath-datamodel/
32 Location Step
• The location path is a sequence of steps • A location step consists of – an axis – a node test – some predicates axis :: nodetest [expr1] [expr2]
33 Axis
34 Axis
• An axis is a sequence of nodes • An axis is evaluated relative to a context node • XPath defines 12 axes:
child parent self attribute ancestor descendant ancestor-or-self descendant-or-self preceding-sibling following-sibling preceding following
35 Axis Direction
• Each axis has a direction
• forward – in document order
  – child, descendant, following-sibling, following, self, descendant-or-self
• backward – in reverse document order
  – parent, ancestor, ancestor-or-self, preceding-sibling, preceding
• without direction – the order depends on the implementation
  – attribute
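The axes can be sketched as functions from a context node to a list of nodes. This is a minimal sketch assuming a hypothetical element-only tree with parent links (el and axes are illustrative names). Forward axes are returned in document order, backward axes in reverse document order, i.e. nearest node first.

```javascript
// Hypothetical element constructor that also sets parent links.
function el(name, children = []) {
  const n = { name, parent: null, children };
  children.forEach((c) => { c.parent = n; });
  return n;
}

// Index of n among its siblings (only called for nodes that have a parent).
const indexOf = (n) => n.parent.children.indexOf(n);

const axes = {
  self: (n) => [n],
  child: (n) => n.children.slice(),
  parent: (n) => (n.parent ? [n.parent] : []),
  descendant: (n) => n.children.flatMap((c) => [c, ...axes.descendant(c)]),
  ancestor: (n) => (n.parent ? [n.parent, ...axes.ancestor(n.parent)] : []),
  "following-sibling": (n) =>
    n.parent ? n.parent.children.slice(indexOf(n) + 1) : [],
  "preceding-sibling": (n) =>
    n.parent ? n.parent.children.slice(0, indexOf(n)).reverse() : [],
  // Everything after the context node in document order, minus its descendants.
  following: (n) =>
    [n, ...axes.ancestor(n)].flatMap((a) =>
      axes["following-sibling"](a).flatMap((s) => [s, ...axes.descendant(s)])),
  // Everything before the context node in document order, minus its ancestors.
  preceding: (n) =>
    [n, ...axes.ancestor(n)].flatMap((a) =>
      axes["preceding-sibling"](a).flatMap((s) => [...axes.descendant(s).reverse(), s])),
};

const names = (nodes) => nodes.map((n) => n.name).join(" ");

// Hypothetical tree: a has the children b and c; b has d, e; c has f, g.
const d = el("d"), e = el("e"), f = el("f"), g = el("g");
const b = el("b", [d, e]);
const c = el("c", [f, g]);
const a = el("a", [b, c]);

console.log(names(axes.ancestor(e)));
console.log(names(axes.following(e)));
console.log(names(axes.preceding(f)));
```

For any context node, the self, ancestor, descendant, preceding, and following axes together cover every element of the tree exactly once, which is what the partition diagrams on the later axis slides illustrate.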
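36 parent axis
[Figure: example tree with the nodes selected by the parent axis highlighted]
37 child axis
[Figure: example tree with the nodes selected by the child axis highlighted]
38 descendant axis
[Figure: example tree with the nodes selected by the descendant axis highlighted]
39 ancestor axis
[Figure: example tree with the nodes selected by the ancestor axis highlighted]
40 following-sibling axis
[Figure: example tree with the nodes selected by the following-sibling axis highlighted]
41 preceding-sibling axis
[Figure: example tree with the nodes selected by the preceding-sibling axis highlighted]
42 following axis
[Figure: example tree with the nodes selected by the following axis highlighted]
43 preceding axis
[Figure: example tree with the nodes selected by the preceding axis highlighted]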
44 axis
[Figure: the document partitioned relative to the self node into the ancestor, preceding, following, and descendant axes]
45 axis
[Figure: the same partition, with the preceding-sibling and following-sibling axes added]
46 Node Types, Abbreviations, and Data Types
47 Node Types
• Element node – a node in the tree that corresponds to an element. • text node – text in the XML tree with no further subelements • attribute node – represents an attribute (with name and value)
48 Node Test
node test                  selection
text()                     a text node
comment()                  a comment node
processing-instruction()   a processing instruction node
node()                     any node (elements, text between elements, comments, ...)
*                          elements with arbitrary names
QName                      elements with the given qualified name
*:NCName                   elements with the given local name in an arbitrary namespace
NCName:*                   elements in the given namespace with an arbitrary local name
49 Abbreviations for XPath expressions
• attributes: @ • the context node: . • the parent of the context node: .. • descendants at any depth: // (short for /descendant-or-self::node()/) • the child axis: (... by omission)
50 Abbreviations
[http://www.brics.dk/ixwt/]
51 More Examples
Abbreviation       Long form
*                  child::element()
text()             child::text()
@maker             attribute::maker
@*                 attribute::*
x//y               child::x/descendant-or-self::node()/child::y
.                  self::node()
..                 parent::node()
../@maker          parent::node()/attribute::maker
car[5]             child::car[position()=5]
car[@maker="US"]   child::car[attribute::maker="US"]
car[milage]        child::car[child::milage]
52 Atomization
• A sequence may be atomized • This results in a sequence of atomic values • For element nodes this is the concatenation of all descendant text nodes • For other nodes this is the obvious string
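Atomization can be sketched as follows. This is a simplified sketch: it only models element and text nodes as hypothetical plain objects, and it returns plain strings for nodes, whereas XPath 2.0 returns typed values. The names text, element, stringValue, and atomize, and the em element in the example, are illustrative.

```javascript
// Hypothetical node constructors for this sketch.
const text = (value) => ({ type: "text", value });
const element = (name, children = []) => ({ type: "element", name, children });

// String value: a text node yields its text, an element yields the
// concatenation of all descendant text nodes in document order.
function stringValue(node) {
  if (node.type === "text") return node.value;
  return node.children.map(stringValue).join("");
}

// Atomize a sequence: nodes are replaced by their string value,
// items that already are atomic values are kept unchanged.
function atomize(sequence) {
  return sequence.map((item) =>
    item !== null && typeof item === "object" ? stringValue(item) : item);
}

// Stand-in for: <title>Spaghetti <em>Carbonara</em></title>
const title = element("title", [
  text("Spaghetti "),
  element("em", [text("Carbonara")]),
]);

const atoms = atomize([title, 42, "x"]);
console.log(atoms);
```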
53 Data Types
XPath 1.0
• node sets
• boolean values
• numbers (floating point)
• strings
XPath 2.0 – XML Schema
• xs:string • xs:boolean • xs:decimal • xs:float • xs:double • xs:duration • xs:dateTime • xs:time • xs:date • xs:gYearMonth • xs:gYear • xs:gMonthDay • xs:gDay • xs:gMonth • xs:hexBinary • xs:base64Binary • xs:anyURI • xs:QName • xs:NOTATION
54 Expressions
55 Location Step
• The location path is a sequence of steps • A location step consists of – an axis – a node test – some predicates axis :: nodetest [expr1] [expr2]
56 XPath expressions
• Expression
  – LocationPath
  – FilterExpression
    • PrimaryExpression
      – VariableReference
      – Literal
      – Number
      – FunctionCall
    • Predicate
  – RelativeLocationPath
    • Step
      – AxisSpecifier
      – ...
• XPathExpression
  – OrExpression
    • AndExpression
      – EqualityExpression
        » RelationalExpr
          ...
57 Expressions
• Each expression evaluates to – an atomic value or – a sequence of nodes • Atomic values are: – numbers – booleans – Unicode strings – XML Schema data types
58 Predicates
A predicate
• is an XPath expression evaluated with respect to the context
• its result is converted to a boolean value:
  – a number: true if the current context position equals the number
  – a string: true if the string is not empty
  – a sequence (of nodes): true if the sequence is not empty
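The conversion rules for predicate results can be sketched as a filter over a sequence. This is a minimal sketch with assumptions: the predicate is modelled as a JavaScript callback receiving (item, position, size), node sequences are modelled as arrays, and predicateHolds, filterStep, and the car objects are hypothetical.

```javascript
// Convert a predicate result to a boolean, given the context position.
function predicateHolds(result, position) {
  if (typeof result === "boolean") return result;
  if (typeof result === "number") return result === position; // positional test
  if (typeof result === "string") return result.length > 0;
  if (Array.isArray(result)) return result.length > 0;        // node sequence
  throw new TypeError("unsupported predicate result");
}

// Keep the items for which the predicate holds; positions start at 1.
function filterStep(sequence, predicate) {
  const size = sequence.length;
  return sequence.filter((item, index) =>
    predicateHolds(predicate(item, index + 1, size), index + 1));
}

// Hypothetical data: milage holds the list of milage child elements.
const cars = [
  { maker: "US", milage: [12000] },
  { maker: "DE", milage: [] },
  { maker: "US", milage: [500] },
];

const second = filterStep(cars, () => 2);                      // like car[2]
const last = filterStep(cars, (car, position, size) => size);  // like car[last()]
const withMilage = filterStep(cars, (car) => car.milage);      // like car[milage]
const us = filterStep(cars, (car) => car.maker === "US");      // like car[@maker="US"]

console.log(second.length, last.length, withMilage.length, us.length);
```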
59 Literal Expressions
60 Arithmetic Expressions
• +, -, *, div, idiv, mod
• Operators are generalized to sequences
  – If one argument is empty, the result is empty
  – If all arguments are single numbers (sequences of length 1), the operation is computed on these numbers
  – Otherwise, an error occurs
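A sketch of these rules, assuming the XPath 2.0 reading in which each operand must be either empty or a single number. Sequences are modelled as JavaScript arrays, arith is a hypothetical helper name, and division by zero is not treated specially here.

```javascript
// Arithmetic on sequences: [] stands for the empty sequence.
function arith(op, left, right) {
  if (left.length === 0 || right.length === 0) return [];
  const isSingleNumber = (s) => s.length === 1 && typeof s[0] === "number";
  if (!isSingleNumber(left) || !isSingleNumber(right)) {
    throw new TypeError("operands must be empty or single numbers");
  }
  const a = left[0];
  const b = right[0];
  switch (op) {
    case "+": return [a + b];
    case "-": return [a - b];
    case "*": return [a * b];
    case "div": return [a / b];
    case "idiv": return [Math.trunc(a / b)]; // integer division, truncating
    case "mod": return [a % b];              // sign follows the dividend
    default: throw new Error("unknown operator: " + op);
  }
}

console.log(arith("idiv", [7], [2]));
console.log(arith("mod", [-7], [2]));
console.log(arith("+", [], [1]));
```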
61 Variable references
• $foo
• $bar:foo – variable names may contain :
• $foo-17 – variable names may contain -
• be careful with variables in arithmetic expressions; to subtract 17 from $foo, write one of: ($foo)-17   $foo -17   $foo+-17
62 sequence expressions
• comma , concatenates sequences • to operator to construct numeric sequences. • Operators: union, intersect, except • sequences are always flattened • The following examples represent the same sequence:
63 Filter Expressions
• Extends predicates to sequences • The expression “.” is the context • The expression
has the result
64 Numeric Value Comparison
• Operators: eq, ne, lt, le, gt, ge • Used for atomic values • If used on arbitrary objects: – Atomization – If an argument is empty, the result is empty – If an argument is a sequence with length > 1, then false – If not comparable, then error – Otherwise compare the arguments
65 General Comparisons
• Operators: =, !=, <, <=, >, >=
• If used on arbitrary arguments
  – atomization
  – if some value in one argument and some value in the other argument satisfy the comparison, the result is true
  – otherwise: false
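The difference between value comparison (eq) and general comparison (=) can be sketched as follows. Assumptions: operands are already atomized and modelled as arrays, and valueEq, generalEq, and generalNe are hypothetical names. For operands longer than one item, valueEq follows the rule listed on the value comparison slide (false); the W3C XPath 2.0 specification treats that case as a type error instead.

```javascript
// Value comparison (eq): result is [] (empty) or a one-item sequence.
function valueEq(left, right) {
  if (left.length === 0 || right.length === 0) return [];
  if (left.length > 1 || right.length > 1) return [false];
  if (typeof left[0] !== typeof right[0]) {
    throw new TypeError("operands are not comparable");
  }
  return [left[0] === right[0]];
}

// General comparison (=): true if SOME pair of values is equal.
const generalEq = (left, right) =>
  left.some((a) => right.some((b) => a === b));

// General comparison (!=): true if SOME pair of values differs.
const generalNe = (left, right) =>
  left.some((a) => right.some((b) => a !== b));

console.log(generalEq([1, 2], [2, 3]));
console.log(generalNe([1, 2], [2, 3]));
console.log(valueEq([1], [1]));
```

Because both general operators quantify existentially, = and != can be true for the same pair of sequences, so != is not the negation of =.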
66 Node Comparison
• Operators: is, <<, >> • Compares nodes with respect to identity and order • When comparing arbitrary arguments: – If an argument is empty, the result is empty – If both arguments are single nodes, the nodes are compared – Otherwise error
67 Boolean Expressions
• Operators: and, or • Arguments are converted to boolean values. • An argument converts to false if – it is the boolean value false, – it is an empty sequence, – it is an empty string, – it is the number 0. • Constant boolean values are specified by the functions true() and false() • Negation is not(...)
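The conversion to boolean values can be sketched as one helper used by and, or, and not. Assumptions: arrays stand for node sequences, toBoolean, and, or, and not are hypothetical names, and NaN is mapped to false in addition to 0, as XPath does.

```javascript
// Convert an XPath-like value to a boolean.
function toBoolean(value) {
  if (typeof value === "boolean") return value;
  if (typeof value === "number") return value !== 0 && !Number.isNaN(value);
  if (typeof value === "string") return value.length > 0;
  if (Array.isArray(value)) return value.length > 0; // node sequence
  throw new TypeError("unsupported value");
}

// Boolean operators convert their arguments first.
const and = (a, b) => toBoolean(a) && toBoolean(b);
const or = (a, b) => toBoolean(a) || toBoolean(b);
const not = (a) => !toBoolean(a);

console.log(and("text", 1));
console.log(or([], 0));
console.log(not(""));
```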
68 Functions
69 Usage of Functions
• Expressions within predicates
• Without arguments
  line[last()]
  line[position() mod 2 = 0] – all line elements at even positions
  – the function operates on the result of the preceding XPath step
• With arguments
  count(/body/table[@border="1"])
  – the function uses the specified arguments
70 XPath 1 vs. 2
XPath 1
• Function categories
  – context functions
  – nodes
  – conversion of data types
  – string functions
  – arithmetic functions
XPath 2
• much more
• a namespace "fn:..."
• user definable functions
71 Function Calls
• Some functions can be called on sequences or arbitrary many arguments
72 Arithmetic Functions
• Examples
73 Boolean Functions
• Examples
74 String Functions
• Examples
75 Functions with Regular Expressions
• Examples
76 Functions for Cardinality
• Examples
77 Functions on Sequences
78 Aggregation Functions
• Examples
79 Node Functions
• Examples
80 Type Conversion Functions
• Examples, please note the namespace
81 For Loop
82 Conditionals
83 Quantified Expressions
84 Hands-on Section
85 www.zvon.org
86 Eclipse XPath view
90 XLink and XPointer
XPath, XLink, XPointer and XSLT
[Figure: diagram relating XPointer, XQuery, XLink, XPath, and XSLT]
92 XPointer
• Identifies parts of an XML document
xlink:href="index.xml#r102"
The part after the # symbol is the XPointer.
Variants:
• Simple references
• element(...)
• xpointer(...)
93 XPointer: variants (1)
• Simple references using an ID attribute ..#r102 • element(...) – Refers to elements by their position in the document ..#element(/1/5)
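How a pointer such as element(/1/5) selects by position can be sketched as follows. Assumptions: the document is a hypothetical plain-object tree, only pure child sequences like /1/5 are handled (the element() scheme also allows a leading name, which this sketch rejects), and elem and resolveElementPointer are illustrative names.

```javascript
// Hypothetical element constructor for this sketch.
const elem = (name, children = []) => ({ type: "element", name, children });

// Resolve element(/n1/n2/...): each number picks the n-th child element
// (counting from 1) of the node selected so far, starting at the document.
function resolveElementPointer(documentNode, pointer) {
  const match = /^element\(((?:\/[1-9][0-9]*)+)\)$/.exec(pointer);
  if (!match) throw new SyntaxError("unsupported pointer: " + pointer);
  const steps = match[1].split("/").slice(1).map(Number);
  let current = documentNode;
  for (const k of steps) {
    const elements = current.children.filter((c) => c.type === "element");
    current = elements[k - 1];
    if (current === undefined) return null; // no such element
  }
  return current;
}

// Hypothetical document: a root element with five child elements.
const doc = {
  type: "document",
  children: [
    elem("collection", [
      elem("description"),
      elem("recipe"),
      elem("recipe"),
      elem("recipe"),
      elem("recipe", [elem("title")]),
    ]),
  ],
};

const target = resolveElementPointer(doc, "element(/1/5)");
console.log(target.name, target.children.length);
```

Here /1 selects the root element and /5 its fifth child element.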
94 XPointer: variants (2)
• xpointer(...)
  ..#xpointer(//recipe[4])
  ..#xpointer(//rcp:recipe[./rcp:title ='Zuppa Inglese'])
An xpointer(...) expression may contain:
  – an XPath expression
  – additional (XPath) functions
    • positions before and after elements: start-point(), end-point(), ...
    • ranges between elements: range(), range-to(), ...
    • selected ranges in text: string-range(), ...
95 XLink
• XLink generalizes the well-known hyperlinks of HTML – Provides many-to-many relations – Provides "third party" links – For arbitrary elements • XLinks are specified by attributes only – this is a weakness of XLink and a reason why it is rarely used.
96 XLink example (1)
[Figure: link connecting two student nodes with one teacher node: "Carl and Fred are students of Joe"]
98 XLink variants
• Simple Links • HTML Links (in XLink) • HLinks
99 Simple Links
… an abbreviation of …
101 HLink Tag
102 Online Resources
• http://www.w3.org/TR/xpath/ • http://www.w3.org/TR/xpath20/ • http://www.w3.org/TR/xlink/ • http://www.w3.org/TR/xptr-framework/