XML for Java Developers G22.3033-002
Session 9 - Main Theme XML Information Retrieval (Part I)
Dr. Jean-Claude Franchitti
New York University Computer Science Department Courant Institute of Mathematical Sciences
Agenda
n Summary of Previous Session
n Applications of XML to Database Technology
n XML Query Languages
n XPath
n XML Queries
n XQuery: A Query Language for XML
n XML Query Engines
n Assignment 5a+5b+5c (due in two weeks)
Summary of Previous Session
n XML/XSL and JSP/JavaBeans Rendering Technology
n Internationalization Issues
n Web Content Accessibility Guidelines (WCAG)
n Assignment 4a+4b (due next week)
XML-Based Retrieval Development
n XML Software Development Methodology
n Language + Stepwise Process + Tools
n Rational Unified Process (RUP) vs. “XML Unified Process”
n XML Application Development Infrastructure
n Metadata Management (e.g., XMI)
n XML Query Engine (3rd party software)
n XML Tools (e.g., XML Editors)
n XML Applications Involved in the Rendering Phase:
n Application(s) of XML
n XML-based applications/services (markup language mediators)
n MOM, POP, Persistence service
n Application Infrastructure Frameworks
XML Data Retrieval Patterns
n XML Data Retrieval Operations
n Access
n Query
n Manipulate
n Multiple XML Data Sources Integration
n XML Message Filtering
n DBMS Data Views
n Database System Interfacing
Part I
Applications of XML to Database Technology
Towards XML Application Services
n Processing
n DOM Extensions
n Binding Extensions
n Component Frameworks (reusable component models)
n Model-Based Automation (MDA)
n Rendering
n DOM 2.1.0, SAX 2.0, JAXP 1.1 & TraX, XSL-FO 1.0
n Component Frameworks
n Querying
n XQuery 1.0, XSLT 1.1/2.0, XPath 1.0/2.0
n Security (signatures, encryption/decryption, etc.)
n Etc.
Retrieval Software Development
n Languages (XML-QL, YaTL, XQL, etc.)
n Data Model + Operations + Syntax
n Process (“XUP”)
n Frameworks
n Custom Engine
n e.g., XQEngine
n Translation to SQL
n e.g., DB2XML, Oracle’s XML I/F
n Translation to OQL
n XML Query Infrastructure
n XPath Processors: Saxon 6.1, Xalan-J2.1.0
n XQuery Processors
n SQLX
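The XPath processors listed above are driven from Java code. The following is a minimal sketch only: it assumes the `javax.xml.xpath` API bundled with current JDKs (which postdates Saxon 6.1 and Xalan-J 2.1.0 and is not their native API), and the sample document, the class name `XPathSketch`, and the helper `select` are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSketch {
    // Hypothetical sample document, made up for this sketch.
    public static final String ZOO_XML =
        "<zoo>"
      + "<chapter><title>Birds</title>"
      + "<figure id='f1'><caption>Owls</caption></figure></chapter>"
      + "<chapter><title>Frogs</title>"
      + "<figure id='f2'><caption>Tree Frogs</caption></figure>"
      + "<figure id='f3'><caption>Toads</caption></figure></chapter>"
      + "</zoo>";

    // Evaluates an XPath 1.0 expression against the given XML text and
    // returns the string value of every selected node, in document order.
    public static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                    (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            // Parsing and XPath errors are wrapped so callers need no checked exceptions.
            throw new IllegalArgumentException("cannot evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Ids of figures captioned "Tree Frogs" in the second chapter.
        System.out.println(
            select(ZOO_XML, "/zoo/chapter[2]//figure[caption = 'Tree Frogs']/@id"));
        // prints [f2]
    }
}
```

The same `select` helper works with any XPath 1.0 expression, so it is a convenient way to try out path expressions before moving to a full query engine.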
XML Query History
n SGML Query Facilities
n Ad-hoc Approach to Query Languages
n 02/98: XQL Proposal
n 08/98: XML-QL Submission
n 12/98: W3C QL’98 Workshop
n Candidate Requirements for XML Query
n Database Desiderata for an XML Query Language
n 11/99: XPath Recommendation
W3C XML Query WG
n 07/99: WG Proposal
n 09/99: WG Official Inception
n Today:
n 30 W3C Member Companies
n 11 Meetings, 60+ Telcons
n Heartbeat every Three Months
n Proposed Recommendation(s)
n Goals:
n XML Query Data Model for XML Documents
n Query Operators for XML Query Data Model
n Query Language Based on XML Query Operators
W3C’s Related Standards
n XML Query Specifications:
n XML Query Requirements (02/16/01 - orig. 01/00)
n XML Query Use Cases (06/08/01 - orig. 08/00)
n XQuery 1.0 and XPath 2.0 Data Model (06/07/01 - orig. 05/00)
n XQuery 1.0 Formal Semantics (06/07/01 - orig. 12/00)
n XQuery 1.0: An XML Query Language (06/07/01)
n XML Syntax for XQuery 1.0 (XQueryX) (06/07/01)
n XPath 2.0 Specifications
n XPath Requirements Version 2.0 (02/15/01)
n XQuery 1.0 and XPath 2.0 Data Model (06/07/01)
Related XML Technologies
n XPath
n XSL
n XPointer
n XML Schema
n XML Infoset
n WAI
n Internationalization
n IETF DASL
n Distributed Authoring Searching and Locating
n http://www.ics.uci.edu/pub/ietf/dasl/
Properties of RDBMS Queries
n Pattern + Filter + Construction clause
n Construction clause may have ordering subclauses
n Queries may perform joins across multiple input sets
n Queries may generate intermediate variables or path expressions
Mapping XML to an RDBMS
n SQL-like queries that return XML documents
n e.g., Microsoft IIS + SQL Server
n e.g., Oracle Database Server
n Broad spectrum of possible mappings
n Hierarchical vs. limited RDBMS tree structure
JDBC Refresher
n See Section 6.2 of XML and Java textbook
n Importing JDBC Package
n Loading a JDBC Driver
n Connecting to a Database
n Submitting a Query
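The four steps above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the textbook's code: the class name `JdbcRefresher`, the helper `fetchTitles`, and the `book` table with its `title` column are all hypothetical, and the driver class and JDBC URL are supplied on the command line because they depend on the database in use.

```java
import java.sql.Connection;      // Step 1: import the JDBC package
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class JdbcRefresher {
    // Steps 3 and 4: connect to the database, submit a query, collect one column.
    // The table "book" and column "title" are hypothetical.
    public static List<String> fetchTitles(String jdbcUrl, String user, String password)
            throws SQLException {
        List<String> titles = new ArrayList<>();
        // try-with-resources closes the result set, statement, and connection.
        try (Connection con = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT title FROM book")) {
            while (rs.next()) {
                titles.add(rs.getString("title"));
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.err.println(
                "usage: JdbcRefresher <driverClass> <jdbcUrl> <user> <password>");
            return;
        }
        // Step 2: load the JDBC driver. Drivers written for JDBC 4.0 or later
        // register themselves, so this call matters mainly for older drivers.
        Class.forName(args[0]);
        for (String title : fetchTitles(args[1], args[2], args[3])) {
            System.out.println(title);
        }
    }
}
```

If no registered driver accepts the URL, `DriverManager.getConnection` throws an `SQLException`, which is usually the first symptom of a missing driver JAR.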
XML Embedded in SQL (SQLX)
n SQL Embedded in XML
n See Section 6.3 of XML and Java textbook
n Front-end to RDBMS that provides XML-based Input/Output
n Translates XML query into sequence of JDBC calls, and converts the result to a DOM structure which is returned
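The last step of such a front end, turning a JDBC result into a DOM structure, might look like the sketch below. It is not the textbook's implementation: the class name `ResultToDom`, the element names `result` and `row`, and the choice to render SQL NULL as an empty element are assumptions made for illustration, and column labels are assumed to be valid XML names.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ResultToDom {
    // Copies every row of a JDBC result into an ordered column-label -> value map.
    public static List<Map<String, String>> readRows(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        List<Map<String, String>> rows = new ArrayList<>();
        while (rs.next()) {
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 1; i <= meta.getColumnCount(); i++) {  // JDBC columns are 1-based
                row.put(meta.getColumnLabel(i), rs.getString(i));
            }
            rows.add(row);
        }
        return rows;
    }

    // Builds <result><row><col>value</col>...</row>...</result> as a DOM tree.
    // "result" and "row" are hypothetical element names chosen for this sketch.
    public static Document toDom(List<Map<String, String>> rows) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element root = doc.createElement("result");
        doc.appendChild(root);
        for (Map<String, String> row : rows) {
            Element rowElem = doc.createElement("row");
            root.appendChild(rowElem);
            for (Map.Entry<String, String> col : row.entrySet()) {
                Element colElem = doc.createElement(col.getKey());
                // SQL NULL becomes an empty element in this sketch.
                colElem.setTextContent(col.getValue() == null ? "" : col.getValue());
                rowElem.appendChild(colElem);
            }
        }
        return doc;
    }
}
```

Splitting the work into `readRows` and `toDom` keeps the DOM construction independent of a live database connection, so the mapping can be exercised with in-memory rows.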
Part II
XML Query
XML Query Requirements (Part I)
n General:
n Declarative Language
n Readable XML Syntax
n Protocol Independence
n Standard Error Conditions
n Support for Future Updates
n Data Model
n Based on XML Infosets
n Namespace Aware
n Support for XML Schema Data Types
n Support Inter/Intra Document References
XML Query Requirements (Part II)
n Query Functionality:
n Operators on All Data Types
n Text Operators Across Element Boundaries
n Hierarchies and Sequences
n Combination of Data from Various Locations
n Aggregation and Sorting
n Combination of Operators (Queries as Operands)
n Support NULL values
n Preservation of Structure/Identity
n Operations on Names/Schemas
n Extensibility & Closure
XML Query Use Cases
n Approach
n Description, DTD/Schema, Input, Queries, Results
n Existing Use Cases
n XMP (examples)
n TREE (queries that preserve hierarchy)
n SEQ (queries based on sequence)
n R (relational data access)
n TEXT (text search)
n NS (namespace-based queries)
n PARTS (recursive parts queries)
n REF (queries based on references)
XML Query Data Model
n Information Presented to a Query Processor
n Augmented Infoset:
n XML Schema Data Types (PSVI)
n Document Collections
n References
n Node-Labeled Tree Constructor Model with Node Identity
n Infoset Mapping to Query Data Model is Defined as Part of the Specification
XML Query Data Model (continued)
n Nodes
n Node = DocNode | ElemNode | AttrNode | ValueNode | NSNode | PINode | CommentNode | InfoItemNode
n XML Schema Primitive Types
n string, boolean, ID, IDREF, decimal, etc.
n Collections
n list [T], set {T}, bag {|T|}, disjoint/union (T1 | T2), tuple (T1, …, Tn)
n References
n ref(T)
XML Query Algebra
n Defines Static and Dynamic Semantics
n Static Semantics are Type Inference Rules
n Relate Algebra Expressions to Types
n Dynamic Semantics are Value Inference Rules
n Relate Algebra Expressions to Values
n Issues:
n Algebra Type System Alignment with XML Schema
n Operators on Schema Simple Types not Defined
n Lexical Representation of Schema Simple Types not Defined
Constructors
n Construct Values in XML Query Data Model
attrNode : (Ref(QNameValue), Ref(ValueNode)) -> AttrNode
ValueNode = QNameValue | StringValue | DecimalValue | ...
qnameValue : (uriReference | null, string, Ref(Def_QName)) -> QNameValue
decimalValue : (decimal, Ref(Def_decimal)) -> DecimalValue
Accessors
n Deconstruct Values in XML Query Data Model
name : AttrNode -> Ref(QNameValue)
value : AttrNode -> Ref(ValueNode)
type : AttrNode -> Ref(ElemNode)
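One way to read the constructor and accessor signatures above is as factory methods and getters on immutable node objects. The Java sketch below is only an analogy under assumptions: it models just two ValueNode variants, lets ordinary Java object references stand in for Ref(T), and omits the schema type references (Def_QName, Def_decimal) and the type accessor. The class name `DataModelSketch` is hypothetical.

```java
import java.math.BigDecimal;

public class DataModelSketch {
    /** ValueNode = QNameValue | StringValue | DecimalValue | ... (two variants modeled here). */
    public interface ValueNode {}

    public static final class QNameValue implements ValueNode {
        public final String uri;        // namespace URI, or null when there is none
        public final String localName;
        private QNameValue(String uri, String localName) {
            this.uri = uri;
            this.localName = localName;
        }
    }

    public static final class DecimalValue implements ValueNode {
        public final BigDecimal value;
        private DecimalValue(BigDecimal value) {
            this.value = value;
        }
    }

    public static final class AttrNode {
        private final QNameValue name;
        private final ValueNode value;
        private AttrNode(QNameValue name, ValueNode value) {
            this.name = name;
            this.value = value;
        }
        // Accessors: deconstruct the node into the references it was built from.
        public QNameValue name() { return name; }
        public ValueNode value() { return value; }
    }

    // Constructors: the only way to build values in this sketch.
    public static QNameValue qnameValue(String uri, String localName) {
        return new QNameValue(uri, localName);
    }

    public static DecimalValue decimalValue(BigDecimal d) {
        return new DecimalValue(d);
    }

    public static AttrNode attrNode(QNameValue name, ValueNode value) {
        return new AttrNode(name, value);
    }

    public static void main(String[] args) {
        AttrNode price = attrNode(qnameValue(null, "price"),
                                  decimalValue(new BigDecimal("65.95")));
        DecimalValue amount = (DecimalValue) price.value();
        System.out.println(price.name().localName + " = " + amount.value);
        // prints price = 65.95
    }
}
```

Because the accessors return the same references the constructor received, node identity is preserved, which matches the data model's emphasis on references rather than copies.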
Part III
XML Query Languages
XQuery
n Functional Language
n Query Represented as an Expression
n Expressions can be Nested without Restriction
n Input/Output of an XQuery are Instances of the XML Query Data Model
n Based on OQL, SQL, XML-QL, XPath
n Readable XML Syntax
XQuery Expressions
n Path Expressions
n Element Constructors
n FLWR Expressions
n Expressions with Operators/Functions
n Conditional Expressions
n Quantified Expressions
n List Constructors
n Expressions to Test/Modify Datatypes
XQuery Path Expressions
n Abbreviated XPath 1.0 Syntax
n Find figure(s) with caption “Tree Frogs” in second chapter of “zoo.xml”
n document(“zoo.xml”)/chapter[2]//figure[caption = “Tree Frogs”]
n Extensions
n Dereference Operator
n Range Predicate
n Find captions of figures referenced by figref elements in the chapter titled “Frogs”
n document(“zoo.xml”)/chapter[title = “Frogs”]//figref/@refid->fig/caption
XQuery Element Constructor
n Start/End Tag + Enclosed List of Expressions
n Generate an element with a computed name that contains nested elements: <$tagname>
XQuery For Let Where Return (FLWR)
n FOR and LET Clause
n Generate a List of Tuples that Preserves Doc Order
n WHERE Clause
n Applies a Predicate to Eliminate Some Tuples
n RETURN Clause
n Executed on Resulting Tuples -> Ordered Output List
n Syntax: one or more FOR var IN expr or LET var := expr clauses, then an optional WHERE expr, then RETURN expr
FLWR Sample Expressions
n List titles of books published by Morgan Kaufmann in 1998
FOR $b IN document(“bib.xml”)//book
WHERE $b/publisher = “Morgan Kaufmann” AND $b/year = “1998”
RETURN $b/title
n List each publisher and the average price of its books
FOR $p IN distinct(document(“bib.xml”)//publisher)
LET $a := avg(document(“bib.xml”)/book[publisher = $p]/price)
XQuery Operators and Functions
n Infix/Prefix Operators
n e.g., Infix Operators BEFORE and AFTER
n Parenthesized Expressions
n Arithmetic/Logical Operators
n Collection Operators
n e.g., UNION, INTERSECT, EXCEPT
n Functions Can Be Defined in XQuery
Sample Operators and Functions
n Find max depth of “partlist.xml”
NAMESPACE xsd = “http://www.w3.org/2001/03/XMLSchema-datatypes”
FUNCTION depth(ELEMENT $e) RETURNS xsd:integer
{ IF empty($e/*) THEN 1 ELSE max(depth($e/*)) + 1 }
depth(document(“partlist.xml”))
XQuery Conditional Expressions
FOR $h IN //holding RETURN
XQuery Quantified Expressions
n Example 1:
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES contains($p, “sailing”) AND contains($p, “windsurfing”)
RETURN $b/title
n Example 2:
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES contains($p, “sailing”)
RETURN $b/title
XQuery List Constructors
n List encloses zero or more expressions in square brackets, separated by commas
n List of member variables: [$x, $y, $z]
n Empty list: [ ]
XQuery Operators on Data Types
n INSTANCEOF (instance, type)
n CAST
n Convert value from one datatype to another
n TREAT
n Causes the query processor to treat an expression as if its datatype were a subtype of its static type
XQuery Outstanding Issues
n Integration with XPath 2.0
n Alignment of XQuery and XML Query Algebra Syntax
n Internationalization
n e.g., Collation Sequences for Sorting, String Operations
n XML Query Syntax
n Operators and Functions TBD
Part IV
XML Query Engines
Various Approaches
n XQEngine
n Full-text search engine for XML
n Java APIs available
n W3C XQuery Specification Support
n DB2XML
n Standalone tool (with GUI or command line)
n Servlet to dynamically generate XML documents
n DB2XML API
n Oracle XML Developer Kit (XDK)
n Microsoft SQL Server support for XML
Part V
Conclusions
Summary
n Applying XML to Database Technology allows database data to be viewed as an XML document
n XML Query is based on a well-defined Data Model and Algebra
n Various syntaxes are possible for an XML Query Language
n XML Query Engines are infrastructure components that support XML Query
Readings
n Readings
n XML Development with Java 2: Chapter 10
n Professional Java XML: Chapters 16 and 17
n XML and Java: Chapter 6
n Handouts posted on the course web site
n Review textbook chapters on XML/Java persistence bindings
n Project Frameworks Setup (ongoing)
n Howard Katz’s XQEngine
n Apache’s Web Server, TomCat/JRun, and Cocoon
n Apache’s Xerces, Xalan, Saxon
n Antenna House XML Formatter, Apache’s FOP, X-smiles
n Publishing Systems at http://www.xmlsoftware.com
n Visibroker 4.5, WebLogic 6.1
n POSE & KVM (See Session 3 handout)
Assignment
n Assignment #5: (#5c will be discussed next week)
n #5a: This part of the project focuses on the application data model design/development using XML information retrieval technology. The design/development process should focus initially on identifying the data to be retrieved for resulting subsets of data
n #5b: Your XML application service should demonstrate the following additional steps: (a) Defining the optimal retrieval approach for each dataset, and (b) Considering query constraints when designing an overall application data model
n More specific project-related information and extra credit assignments will be provided during the session
Next Session: XML Information Retrieval (Part II), XML-Based Frameworks (Part I)
n XML Object Persistence
n Advanced XML-QL/XQL Concepts
n Using the SAX and DOM APIs with a database
n XML Server Pages (XSP)
n Presentation Oriented Publishing (POP) Frameworks
n Client-Side XML POP Frameworks
n Server-Side XML POP Frameworks