<<

3/5/19

2 Announcements (Tue. Mar. 5)

• Homework #3 (2 probs) to be released today. Due in two weeks • Project milestone #1 feedback : See private posts on Piazza XPath and XQuery • We will ask for weekly update from all group members Introduction to Databases to avoid any last min “X did not do anything” .. to be emailed soon CompSci 316 Spring 2019 • Project milestone 2 is due in 3 weeks

3 4 Query languages for XML Example DTD and XML

• XPath • Path expressions with conditions F Building block of other standards (XQuery, XSLT, XLink, XPointer, etc.) • XQuery • XPath + full-fledged SQL-like ]> • XSLT: mostly used a stylesheet language Foundations of Databases Abiteboul • XPath + transformation templates Hull VianuAddison Wesley We are not going to cover it in this course 1995

… …

5 6 XPath Try the queries in this lecture online

• There are many online Xpath testers -- e.g. http://codebeautify.org/Xpath-Tester • XPath specifies path expressions that match XML • Try with this example (or change it for different queries) data by navigating down (and occasionally up and across) the Foundations of Databases • Example Abiteboul Hull Vianu • Query: /bibliography/book/author Addison Wesley 1995

abc
Like a path, except there can be multiple 25 75 “subdirectories” with the same name • Result: all author elements reachable from root via the DBSTS path /bibliography/book/author Ramakrishnan Gehrke Addison Wesley 1999
abc
15 10

1 3/5/19

7 8 Basic XPath constructs Simple XPath examples / separator between steps in a path • All book titles /bibliography/book/title name matches any child element with this tag name • All book ISBN numbers * matches any child element /bibliography/book/@ISBN ]> @name matches the attribute with this name • All title elements, anywhere in the document @* matches any attribute //title //matches any descendent element or the current • All section titles, anywhere in the document element itself //section/title . matches the current element • Authors of bibliographical entries (suppose there .. matches the parent element are articles, reports, etc. in addition to books) /bibliography/*/author

9 10 Predicates in path expressions Predicates in path expressions – contd.

[condition] matches the “current” element if • Books with author “Abiteboul” condition evaluates to true on the current element /bibliography/book[author='Abiteboul'] • Books with price lower than $50 • Books with a publisher child element /bibliography/book[@price<50] /bibliography/book[publisher] • XPath will automatically convert the price string to a • Prices of books authored by “Abiteboul” numeric value for comparison /bibliography/book[author='Abiteboul']/@price

]> ]>

11 12 More complex predicates Predicates involving -sets /bibliography/book[author='Abiteboul'] Predicates can use and, or, and not • There may be multiple authors, so author in general • Books with price between $40 and $50 returns a node-set (in XPath terminology) /bibliography/book[40<=@price and @price<=50] • The predicate evaluates to true as long as it • Books authored by “Abiteboul” or those with price evaluates true for at least one node in the node-set, no lower than $50 i.e., at least one author is “Abiteboul” /bibliography/book[author='Abiteboul' or @price>=50] • Tricky query /bibliography/book[author='Abiteboul' or not(@price<50)] • Any difference between these two queries? /bibliography/book[author='Abiteboul' and author!='Abiteboul']

]> ]>

2 3/5/19

13 14 XPath operators and functions More XPath examples

Frequently used in conditions: • All elements whose tag names contain “section” x + y, x – y, x * y, x div y, x mod y (e.g., “subsection”) //*[contains(name(), 'section')] contains(x, y) true if string x contains string y • Title of the first section in each book count(node-set) counts the number nodes in node- /bibliography/book/section[position()=1]/title set • A shorthand: /bibliography/book/section[1]/title position() returns the “context position” (roughly, • Title of the last section in each book the position of the current node in the node-set /bibliography/book/section[position()=last()]/title containing it) last() returns the “context size” (roughly, the size of the node-set containing the current node) name() returns the tag name of the current element ]>

15 16 More XPath examples – contd. A tricky example

• Books with fewer than 10 sections • Suppose for a moment that price is a child /bibliography/book[count(section)<10] element of book, and there may be multiple • All elements whose parent’s tag name is not “book” prices per book //*[name()!='book']/* • Books with some price in range [20, 50] • Wrong answer: /bibliography/book[price >= 20 and price <= 50] • Correct answer: /bibliography/book[price[. >= 20 and . <= 50]]

]>

17 18 De-referencing IDREF’s General XPath location steps id(identifier) returns the element with identifier • Technically, each XPath query consists of a series of location steps separated by / • Suppose that books can reference other books

Introduction • Each location step consists of XML is a hot topic these days; see for more • An axis: one of self, attribute, parent, child, ancestor,† details… ancestor-or-self,† descendant, descendant-or-self, following,
following-sibling, preceding,† preceding-sibling,† and • Find all references to books written by “Abiteboul” namespace in the book with “ISBN-10” • A node-test: either a name test (e.g., book, section, *) or a type test (e.g., text(), node(), comment()), separated from /bibliography/book[@ISBN='ISBN-10'] //bookref[id(@ISBN)/author='Abiteboul'] the axis by :: • Or simply: Zero of more predicates (or conditions) enclosed in id('ISBN-10')//bookref[id(@ISBN)/author='Abiteboul'] square brackets †These reverse axes produce result node-sets in reverse document order; others (forward axes) produce node-sets in document order

3 3/5/19

19 20 Example of verbose syntax Some technical details on evaluation

Verbose (axis, node test, predicate): Given a context node, evaluate a location path as follows: 1. Start with node-set � = {context node} /child::bibliography 2.For each location step, from left to right: /child::book[attribute::ISBN='ISBN-10'] • � ← ∅ /descendant-or-self::node() • For each node � in �: /child::title • Using � as the context node, compute a node-set � from the axis and the node-test • Each predicate in turn filters �, in order • For each node � in �, evaluate predicate with the following context: Abbreviated: • Context node is � • Context size is the number of nodes in � /bibliography/book[@ISBN='ISBN-10']//title • Context position is the position of �′ within � • � ← � ∪ � • child is the default axis • � ← � • // stands for /descendant-or-self::node()/ 3.Return �

21 22 One more example XQuery

• Which of the following queries correctly find the third • XPath + full-fledged SQL-like query language author in the entire input document? • //author[position()=3] • XQuery expressions can be • Same as /descendant-or-self::node()/author[position()=3] • XPath expressions • Finds all third authors (for each publication) • FLWOR expressions • /descendant-or-self::node() [name()='author' and position()=3] • Quantified expressions • Returns the third element or text node in the document • Aggregation, sorting, and more… if it is an author • /descendant-or-self::node() • An XQuery expression in general can return a new [name()='author'] result XML document [position()=3] • • Correct! Compare with an XPath expression, which always • After the first condition is passed, the evaluation context changes: returns a sequence of nodes from the input document • Context size: # of nodes that passed the first condition or atomic values (boolean, number, string, etc.) • Context position: position of the context node within the list of nodes

23 24 A simple XQuery based on XPath FLWR expressions

Find all books with price lower than $50 • Retrieve the titles of books published before 2000, { together with their publisher doc("bib.")/bibliography/book[@price<50] { } for $b in doc("bib.xml")/bibliography/book let $p := $b/publisher • Things outside {}’s are copied to output verbatim where $b/year < 2000 return • for: loop • Things inside {}’s are evaluated and replaced by the • $b ranges over the result sequence, getting results { $b/title } one item at a time • doc("bib.xml") specifies the document to query { $p } • let: “assignment” • $p gets the entire result of $b/publisher • Can be omitted if there is a default context document } (possibly many nodes) • The XPath expression returns a sequence of book • where: filtering by condition elements • return: result structuring • Invoked in the “innermost loop,” i.e., once • These elements (including all their descendants) are for each successful binding of all query copied to output variables that satisfies where

4 3/5/19

25 26 An equivalent formulation Another formulation

• Retrieve the titles of books published before 2000, • Retrieve the titles of books published before 2000, together with their publisher together with their publisher

{ { for $b in doc("bib.xml")/bibliography/book[year<2000] for $b in doc("bib.xml")/bibliography/book, Nested loop return $p in $b/publisher where $b/year < 2000 { $b/title } return { $b/publisher } • Is this query equivalent to the previous two? { $b/title } • Yes, if there is one publisher per book } { $p } • No, in general • Two result book elements will be } created for a book with two publishers • No result book element will be created for a book with no publishers

27 28 Yet another formulation Subqueries in return

• Retrieve the titles of books published before 2000, • Extract book titles and their authors; make title an together with their publisher attribute and rename author to writer

{ { let $b := doc("bib.xml")/bibliography/book for $b in doc("bib.xml")/bibliography/book where $b/year < 2000 return return { for $a in $b/author { $b/title } • Is this query correct? { $b/publisher } • No! return {string($a)} } • It will produce only one output book What happens if we replace it with $a? } element, with all titles clumped together } and all publishers clumped together • normalize-space(string) removes leading and trailing • All books will be processed (as long as one is published before 2000) spaces from string, and replaces all internal sequences of white spaces with one white space

29 30 An explicit join Existentially quantified expressions

• Find pairs of books that have common author(s) (some $var in collection satisfies condition) • Can be used in where as a condition { for $b1 in doc("bib.xml")//book • Find titles of books in which XML is mentioned in for $b2 in doc("bib.xml")//book some section where $b1/author = $b2/author ← These are string comparisons, and $b1/title > $b2/title { not identity comparisons! return for $b in doc("bib.xml")//book where (some $section in $b//section satisfies {$b1/title} contains(string($section), "XML")) {$b2/title} return $b/title } }

5 3/5/19

31 32 Universally quantified expressions Aggregation (poor man’s version)

(every $var in collection satisfies condition) • List each publisher and the average prices of all its • Can be used in where as a condition books { • Find titles of books in which XML is mentioned in for $pub in distinct-values(doc("bib.xml")//publisher) every section let $price := avg(doc("bib.xml")//book[publisher=$pub]/@price) return { for $b in doc("bib.xml")//book {$pub} where (every $section in $b//section satisfies {$price} contains(string($section), "XML")) return $b/title } } • distinct-values(collection) removes duplicates by value • If the collection consists of elements (with no explicitly declared types), they are first converted to strings representing their “normalized contents” • avg(collection) computes the average of collection (assuming each item in collection can be converted to a numeric value)

33 34 Conditional expression Aggregation (XQuery >1.0)

• List each publisher and, only if applicable, the average • A new group by clause prices of all its books { for $book in doc("bib.xml")//book { let $pub := string($book/publisher) for $pub in distinct-values(doc("bib.xml")//publisher) group by $pub let $price := avg(doc("bib.xml")//book[publisher=$pub]/@price) return return {$pub} {$pub} { if ($price) {avg($book/@price)} then {$price} else () } Empty list ≈ nothing } } • After the group by clause, for each group, any non- grouping variable (e.g., $book) becomes a a sequence of • Use anywhere you’d expect a value, e.g.: values that this variable takes for all members of that • let $foo := if (…) then … else … group • return • Not supported by our saxonb- tool (which only supports XQuery 1.0)

35 36 Sorting (a brief history) Tricky semantics

• A path expression in XPath returns a sequence of • List titles of all books, sorted by their ISBN nodes according to original document order { (doc("bib.xml")//book sort by (@ISBN))/title • for loop will respect the ordering in the sequence } • What is wrong? • August 2002 (http://www.w3.org/TR/2002/WD-xquery-20020816/) • The last step in the path expression will return nodes in • Introduce an operator sort by (sort-by-expression-list) to document order! output results in a user-specified order • Correct versions • Example: list all books with price higher than $100, in { order by first author; for books with the same first for $b in doc("bib.xml")//book sort by (@ISBN) author, order by title return $b/title } { doc("bib.xml")//book[@price>100] { sort by (author[1], title) doc("bib.xml")//book/title sort by (../@ISBN) } }

6 3/5/19

37 38 Current version of sorting Summary

Since June 2006 • Many, many more features not covered in class • sort by has been ditched • XPath is very mature, stable, and widely used • A new order by clause is added to FLWR • Has good implementations in many systems • Which now becomes FLWOR • Is used in many other standards • Example: list all books in order by price from high to • XQuery is also fairly popular low; for books with the same price, sort by first • Has become the SQL for XML author and then title • { Has good implementations in some systems for $b in doc("bib.xml")//book[@price>100] stable order by Preserve input order number($b/price) descending, Order as number, not string $b/author[1], Override default (ascending) $b/title empty least Empty value considered smallest return $b }

39 XQuery vs. SQL

• Where did the join go? • Is navigational query going to destroy physical data independence? • Strong ordering constraint • Can be overridden by unordered { for… } • Why does that matter?

7