XML for Java Developers G22.3033-002

Session 9 - Main Theme XML Information Retrieval (Part I)

Dr. Jean-Claude Franchitti

New York University Computer Science Department Courant Institute of Mathematical Sciences

1

Agenda

n Summary of Previous Session

n Applications of XML to Database Technology

n XML Query Languages

n XPath

n XML Queries

n XQuery: A Query Language for XML

n XML Query Engines

n Assignment 5a+5b+5c (due in two weeks)

2

Summary of Previous Session

n XML/XSL and JSP/JavaBeans Rendering Technology n Internationalization Issues n Web Content Accessibility Guidelines (WCAG) n Assignment 4a+4b (due next week)

3

1 XML-Based Retrieval Development

n XML Software Development Methodology

n Language + Stepwise Process + Tools

n Rational Unified Process (RUP) v.s. “XML Unified Process”

n XML Application Development Infrastructure

n Metadata Management (e.g., XMI)

n XML Query Engine (3rd party software)

n XML Tools (e.g., XML Editors)

n XML Applications Involved in the Rendering Phase:

n Application(s) of XML

n XML-based applications/services (markup language mediators)

n MOM, POP, Persistence service

n Application Infrastructure Frameworks

4

XML Data Retrieval Patterns

n XML Data Retrieval Operations

n Access

n Query

n Manipulate n Multiple XML Data Sources Integration n XML Message Filtering n DBMS Data Views n Database System Interfacing

5

Part I

Applications of XML to Database Technology

6

2 Towards XML Application Services n Processing

n DOM Extensions

n Binding Extensions

n Component Frameworks (reusable component models)

n Model-Based Automation (MDA) n Rendering

n DOM 2.1.0, SAX 2.0, JAXP 1.1 & TraX, XSL-FO 1.0

n Component Frameworks n Querying

n XQuery 1.0, XSLT 1.1/2.0, XPath 1.0/2.0 n Security (signatures encryption/decryption, etc.) n Etc. 7

Retrieval Software Development

n Languages (XML-QL, YaTL, XQL, etc.)

n Data Model + Operations + Syntax

n Process (“XUP”)

n Frameworks

n Custom Engine

n e.g., XQEngine

n Translation to SQL

n e.g., DB2XML, Oracle’s XML I/F

n Translation to OQL

n XML Query Infrastructure

n XPath Processors: Saxon 6.1, Xalan-J2.1.0

n XQuery Processors 8 n SQLX

XML Query History

n SGML Query Facilities

n Ad-hoc Approach to Query Languages

n 02/98: XQL Proposal

n 08/98: XML-QL Submission

n 12/98: W3C QL’98 Workshop

n Candidate Requirements for XML Query

n Database Desiderata for an XML Query Language

n 11/99: XPath Recommendation

9

3 W3C XML Query WG

n 07/99: WG Proposal

n 09/99: WG Official Inception

n Today:

n 30 W3C Member Companies

n 11 Meetings, 60+ Telcons

n Heartbeat every Three Months

n Proposed Recommendation(s)

n Goals:

n XML Query Data Model for XML Dcouments

n Query Operators for XML Query Data Model

n Query Language Based on XML Query Operators 10

W3C’s Related Standards n XML Query Specifications:

n XML Query Requirements (02/16/01 - orig. 01/00)

n XML Query Use Cases (06/08/01 - orig. 08/00)

n XQuery 1.0 and XPath 2.0 Data Model (06/07/01 - orig. 05/00)

n XQuery 1.0 Formal Semantics (06/07/01 - orig. 12/00)

n XQuery 1.0: An XML Query Language (06/07/01)

n XML Syntax for XQuery 1.0 (XQueryX) (06/07/01) n XPath 2.0 Specifications

n XPath Requirements Version 2.0 (02/15/01)

n XQuery 1.0 and XPath 2.0 Data Model (06/07/01)

11

Related XML Technologies

n XPath

n XSL

n XPointer

n XML Schema

n XML Infoset

n WAI

n Internationalization

n IETF DASL

n Distributed Authoring Searching and Locating

n http://www.ics.uci.edu/pub/ietf/dasl/ 12

4 Properties of RDBMS Queries n Pattern + Filter + Construction clause n Construction clause may have ordering subclauses n Queries may perform joins across multiple input sets n Queries may generate intermediate variables or path expressions

13

Mapping XML to a RDBMS n SQL-like queries that return XML documents

n e.g., Microsoft IIS + SQL Server

n e.g., Oracle Database Server n Broad spectrum of possible mappings

n Hierarchical v.s. limited RDBMS tree structure

14

JDBC Refresher n See section 6.2 of XML and Java textbook

n Importing JDBC Package

n Loading a JDBC Driver

n Connecting to a Database

n Submitting a Query

15

5 XML Embedded in SQL (SQLX) n SQL Embedded in XML

n See Section 6.3 of XML and Java textbook

n Front-end to RDBMS that provides XML-based Input/Output

n Translates XML query into sequence of JDBC calls, and converts the result to a DOM structure which is returned

16

Part II

XML Query

17

XML Query Requirements (Part I) n General:

n Declarative Language

n Readable XML Syntax

n Protocol Independence

n Standard Error Conditions

n Support for Future Updates n Data Model

n Based on XML Infosets

n Namespace Aware

n Support for XML Schema Data Types

n Support Inter/Intra Document References 18

6 XML Query Requirements (Part II)

n Query Functionality:

n Operators on All Data Types

n Text Operators Across Element Boundaries

n Hierarchies and Sequences

n Combination of Data from Various Locations

n Aggregation and Sorting

n Combination of Operators (Queries as Operands)

n Support NULL values

n Preservation of Structure/Identity

n Operations on Names/Schemas

n Extensibility & Closure 19

XML Query Use Cases n Approach

n Description, DTD/Schema, Input, Queries, Results n Existing Use Cases

n XMP (examples)

n TREE (queries that preserve hierarchy)

n SEQ (queries based on sequence)

n R (relational data access)

n TEXT (text search)

n NS (namespace-based queries)

n PARTS (recursive parts queries) 20 n REF (queries based on references)

XML Query Data Model n Information Presented to a Query Processor n Augmented Infoset:

n XML Schema Data Types (PSVI)

n Document Collections

n References n Node-Labeled Tree Constructor Model with Node Identity n Infoset Mapping to Query Data Model is Defined as Part of the Specification

21

7 XML Query Data Model (continued) n Nodes

n Node = DocNode | ElemNode | AttrNode | ValueNode | NSNode | PINode | CommentNode | InfoItemNode n XML Schema Primitive Types

n string, boolean, ID, IDREF, decimal, etc. n Collections

n list [T], set {T}, bag {|T|}, disjoint/union (T1 | T2), tuple (T1, …, Tn) n References

n ref(T) 22

XML Query Algebra n Defines Static and Dynamic Semantics

n Static Semantics are Type Inference Rules

n Relate Algebra Expressions to Types

n Dynamic Semantics are Value Inference Rules

n Relate Algebra Expressions to Values n Issues:

n Algebra Type System Alignment with XML Schema

n Operators on Schema Simple Types not Defined

n Lexical Representation of Schema Simple Types not Defined 23

Constructors n Construct Values in XML Query Data Model attrNode : (Ref(QNameValue), Ref(ValueNode)) -> AttrNode ValueNode = QNameValue | StringValue | DecimalValue | ... qnameValue : (uriReference | null, string, Ref(Def_QName)) -> QNameValue decimalValue : (decimal, Ref(Def_decimal)) -> DecimalValue attrNode(ref(qnameValue(null, “price”, Ref(Def_QName)), ref(decimalValue(10.50, Ref(Def_decimal)))) 24

8 Assessors n Deconstruct Values in XML Query Data Model name : AttrNode -> Ref(QNameValue) value : AttrNode -> Ref(ValueNode) type : AttrNode -> Ref(ElemNode) name(A1) = ref(qnameValue(null, “price”)) value(A1) = ref(decimalValue(10.50, Ref(Def_decimal))) type(A1) =

25

Part III

XML Query Languages

26

XQuery

n Functional Language

n Query Represented as an Expression

n Expressions can be Nested without Restriction

n Input/Output of an XQuery are Instances of the XML Query Data Model

n Based on OQL, SQL, XML-QL, XPath

n Readable XML Syntax

27

9 XQuery Expressions n Path Expressions n Element Constructors n FLWR Expressions n Expressions with Operators/Functions n Conditional Expressions n Quantified Expressions n List Constructors n Expressions to Test/Modify Datatypes

28

XQuery Path Expressions

n Abbreviated XPath 1.0 Syntax

n Find figure(s) with caption “Tree Frogs” in second chapter of “zoo.

n document(“zoo.xml)/chapter[2]//figure[caption = “Tree Frogs”]

n Extensions

n Dereference Operator

n Range PredicateDocument Formats

n Find captions of figures referenced by

n document(“zoo.xml”)/chapter[title = “Frogs”]//figref/@refid->fig/caption 29

XQuery Element Constructor n Start/End Tag + Enclosed List of Expressions

n Generate an element with a computed name that contains nested elements: <$tagname> $d $p

30

10 XQuery For Let Where Return (FLWR)

n FOR and LET Clause

n Generate a List of Tuples that Preserves Doc Order

n WHERE Clause

n Applies a Predicate to Eliminate Some Tuples

n RETURN Clause

n Executed on Resulting Tuples -> Ordered Output List

n Syntax: FOR var IN expr WHERE expr RETURN expr LET var := expr

31

FLWR Sample Expressions n List titles of books published by MK in 98 FOR $b IN document (“bib.xml”)//book WHERE $b/publisher = “Morgan Kaufmann” AND $b/year = “1998” RETURN $b/title n List each publisher and its books average price FOR $p IN distinct(document(“bib.xml”)//publisher) LET $a := avg(document(“bib.xml”)

/book[publisher = $p]/price) 32

XQuery Operators and Functions

n Infix/Prefix Operators

n e.g., Infix Operators BEFORE and AFTER

n Parenthesized Expressions

n Arithmetic/Logical Operators

n Collection Operators

n e.g., UNION, INTERSECT, EXCEPT

n Functions Can Be Defined in XQuery

33

11 Sample Operators and Functions

n Find max depth of “partlist.xml” NAMESPACE xsd=“http”//www.w3.org/2001/03/XMLSchema-datatypes”

FUNCTION depth(ELEMENT $e) RETURNS xsd:integer { IF empty ($e/*) THEN 1 ELSE max (depth($e/*))+1 } depth(document(“partlist.xml”))

34

XQuery Conditional Expressions

FOR $h IN //holding RETURN $h/title IF $h/@type=“Journal” THEN $h/editor ELSE $h/author SORTBY (title)

35

XQuery Quantified Expressions n Example 1: FOR $b IN //book WHERE SOME $p IN $b//para SATISFIES contains($p, “sailing”) AND contains($p, “windsurfing”) RETURN $b/title n Example 2: FOR $b IN //book WHERE EVERY $p IN $b//para SATISFIES contains($p, “sailing”) RETURN $b/title 36

12 XQuery List Constructors

n List encloses zero or more expressions in square brackets, separated by commas

n List of member variables: [$x, $y, $z]

n Empty list: [ ]

37

XQuery Operators on Data Types

n INSTANCEOF (instance, type)

n CAST

n Convert value from one datatype to another

n TREAT

n Causes the query processor to treat an expression as if its datatype were a subtype of its static type

38

XQuery Outstanding Issues

n Integration with XPath 2.0 n Alignment of XQuery and XML Query Algebra Syntax n Internationalization

n e.g., Collation Sequences for Sorting, Strings ops n XML Query Syntax n Operators and Functions TBD

39

13 Part IV

XML Query Engines

40

Various Approaches n XQEngine

n Full-text search engine for XML

n Java APIs available

n W3C XQuery Specification Support n DB2XML

n Standalone tool (with GUI or command line)

n Servlet to dynamically generate XML-documents

n DB2XML API n Oracle XML Developer Kit (XDK ) n Microsoft SQL Serversupport for XML 41

Part V

Conclusions

42

14 Summary

n Applying XML to Database Technology allows the viewing of database data as an XML document.

n XML Query is based on a well defined Data Model and Algebra

n Various syntaxes are possible for an XML Query Language

n XML Query Engines are infrastructure components that support XML Query

43

Readings n Readings

n XML Development with Java 2: Chapter 10

n Professional Java XML: Chapters 16, and 17

n XML and Java: Chapter 6

n Handouts posted on the course web site

n Review textbook chapters on XML/Java persistence bindings n Project Frameworks Setup (ongoing)

n Howard Katz’s XQEngine

n Apache’s Web Server, TomCat/JRun, and Cocoon

n Apache’s Xerces, Xalan, Saxon

n Antenna House XML Formatter, Apache’s FOP, X-smiles

n Publishing Systems at http://www.xmlsoftware.com

n Visibroker 4.5, WebLogic 6.1 44 n POSE & KVM (See Session 3 handout)

Assignment

n Assignment #5: (#5c will be discussed next week)

n #5a: This part of the project focuses on the application data model design/development using XML information retrieval technology. The design/development process should focus initially on identifying the data to be retrieved for resulting subsets of data

n #5b: Your XML application service should demonstrate the following additional steps: (a) Defining the optimal retrieval approach for each dataset, and (b) Considering query constraints when designing an overall application data model

n More specific project related information, and extra credit

assignments will be provided during the session 45

15 Next Session: XML Information Retrieval (Part II) XML-Based Frameworks (Part I) n XML Object Persistence n Advanced XML-QL/XQL Concepts n Using the SAX and DOM APIs with a database n XML Server Pages (XSP) n Presentation Oriented Publishing (POP) Frameworks n Client-Side XML POP Frameworks n Server-Side XML POP Frameworks

46

16