Session 9: XML Information Retrieval (Part I)

Session 9: XML Information Retrieval (Part I)

XML for Java Developers G22.3033-002 Session 9 - Main Theme XML Information Retrieval (Part I) Dr. Jean-Claude Franchitti New York University Computer Science Department Courant Institute of Mathematical Sciences 1 Agenda n Summary of Previous Session n Applications of XML to Database Technology n XML Query Languages n XPath n XML Queries n XQuery: A Query Language for XML n XML Query Engines n Assignment 5a+5b+5c (due in two weeks) 2 Summary of Previous Session n XML/XSL and JSP/JavaBeans Rendering Technology n Internationalization Issues n Web Content Accessibility Guidelines (WCAG) n Assignment 4a+4b (due next week) 3 1 XML-Based Retrieval Development n XML Software Development Methodology n Language + Stepwise Process + Tools n Rational Unified Process (RUP) v.s. “XML Unified Process” n XML Application Development Infrastructure n Metadata Management (e.g., XMI) n XML Query Engine (3rd party software) n XML Tools (e.g., XML Editors) n XML Applications Involved in the Rendering Phase: n Application(s) of XML n XML-based applications/services (markup language mediators) n MOM, POP, Persistence service n Application Infrastructure Frameworks 4 XML Data Retrieval Patterns n XML Data Retrieval Operations n Access n Query n Manipulate n Multiple XML Data Sources Integration n XML Message Filtering n DBMS Data Views n Database System Interfacing 5 Part I Applications of XML to Database Technology 6 2 Towards XML Application Services n Processing n DOM Extensions n Binding Extensions n Component Frameworks (reusable component models) n Model-Based Automation (MDA) n Rendering n DOM 2.1.0, SAX 2.0, JAXP 1.1 & TraX, XSL-FO 1.0 n Component Frameworks n Querying n XQuery 1.0, XSLT 1.1/2.0, XPath 1.0/2.0 n Security (signatures encryption/decryption, etc.) n Etc. 7 Retrieval Software Development n Languages (XML-QL, YaTL, XQL, etc.) n Data Model + Operations + Syntax n Process (“XUP”) n Frameworks n Custom Engine n e.g., XQEngine n Translation to SQL n e.g., DB2XML, Oracle’s XML I/F n Translation to OQL n XML Query Infrastructure n XPath Processors: Saxon 6.1, Xalan-J2.1.0 n XQuery Processors 8 n SQLX XML Query History n SGML Query Facilities n Ad-hoc Approach to Query Languages n 02/98: XQL Proposal n 08/98: XML-QL Submission n 12/98: W3C QL’98 Workshop n Candidate Requirements for XML Query n Database Desiderata for an XML Query Language n 11/99: XPath Recommendation 9 3 W3C XML Query WG n 07/99: WG Proposal n 09/99: WG Official Inception n Today: n 30 W3C Member Companies n 11 Meetings, 60+ Telcons n Heartbeat every Three Months n Proposed Recommendation(s) n Goals: n XML Query Data Model for XML Dcouments n Query Operators for XML Query Data Model n Query Language Based on XML Query Operators 10 W3C’s Related Standards n XML Query Specifications: n XML Query Requirements (02/16/01 - orig. 01/00) n XML Query Use Cases (06/08/01 - orig. 08/00) n XQuery 1.0 and XPath 2.0 Data Model (06/07/01 - orig. 05/00) n XQuery 1.0 Formal Semantics (06/07/01 - orig. 12/00) n XQuery 1.0: An XML Query Language (06/07/01) n XML Syntax for XQuery 1.0 (XQueryX) (06/07/01) n XPath 2.0 Specifications n XPath Requirements Version 2.0 (02/15/01) n XQuery 1.0 and XPath 2.0 Data Model (06/07/01) 11 Related XML Technologies n XPath n XSL n XPointer n XML Schema n XML Infoset n WAI n Internationalization n IETF DASL n Distributed Authoring Searching and Locating n http://www.ics.uci.edu/pub/ietf/dasl/ 12 4 Properties of RDBMS Queries n Pattern + Filter + Construction clause n Construction clause may have ordering subclauses n Queries may perform joins across multiple input sets n Queries may generate intermediate variables or path expressions 13 Mapping XML to a RDBMS n SQL-like queries that return XML documents n e.g., Microsoft IIS + SQL Server n e.g., Oracle Database Server n Broad spectrum of possible mappings n Hierarchical v.s. limited RDBMS tree structure 14 JDBC Refresher n See section 6.2 of XML and Java textbook n Importing JDBC Package n Loading a JDBC Driver n Connecting to a Database n Submitting a Query 15 5 XML Embedded in SQL (SQLX) n SQL Embedded in XML n See Section 6.3 of XML and Java textbook n Front-end to RDBMS that provides XML-based Input/Output n Translates XML query into sequence of JDBC calls, and converts the result to a DOM structure which is returned 16 Part II XML Query 17 XML Query Requirements (Part I) n General: n Declarative Language n Readable XML Syntax n Protocol Independence n Standard Error Conditions n Support for Future Updates n Data Model n Based on XML Infosets n Namespace Aware n Support for XML Schema Data Types n Support Inter/Intra Document References 18 6 XML Query Requirements (Part II) n Query Functionality: n Operators on All Data Types n Text Operators Across Element Boundaries n Hierarchies and Sequences n Combination of Data from Various Locations n Aggregation and Sorting n Combination of Operators (Queries as Operands) n Support NULL values n Preservation of Structure/Identity n Operations on Names/Schemas n Extensibility & Closure 19 XML Query Use Cases n Approach n Description, DTD/Schema, Input, Queries, Results n Existing Use Cases n XMP (examples) n TREE (queries that preserve hierarchy) n SEQ (queries based on sequence) n R (relational data access) n TEXT (text search) n NS (namespace-based queries) n PARTS (recursive parts queries) 20 n REF (queries based on references) XML Query Data Model n Information Presented to a Query Processor n Augmented Infoset: n XML Schema Data Types (PSVI) n Document Collections n References n Node-Labeled Tree Constructor Model with Node Identity n Infoset Mapping to Query Data Model is Defined as Part of the Specification 21 7 XML Query Data Model (continued) n Nodes n Node = DocNode | ElemNode | AttrNode | ValueNode | NSNode | PINode | CommentNode | InfoItemNode n XML Schema Primitive Types n string, boolean, ID, IDREF, decimal, etc. n Collections n list [T], set {T}, bag {|T|}, disjoint/union (T1 | T2), tuple (T1, …, Tn) n References n ref(T) 22 XML Query Algebra n Defines Static and Dynamic Semantics n Static Semantics are Type Inference Rules n Relate Algebra Expressions to Types n Dynamic Semantics are Value Inference Rules n Relate Algebra Expressions to Values n Issues: n Algebra Type System Alignment with XML Schema n Operators on Schema Simple Types not Defined n Lexical Representation of Schema Simple Types not Defined 23 Constructors n Construct Values in XML Query Data Model attrNode : (Ref(QNameValue), Ref(ValueNode)) -> AttrNode ValueNode = QNameValue | StringValue | DecimalValue | ... qnameValue : (uriReference | null, string, Ref(Def_QName)) -> QNameValue decimalValue : (decimal, Ref(Def_decimal)) -> DecimalValue <part price=10.50/> <xsd:attribute name=“price” type=xsd:decimal/> attrNode(ref(qnameValue(null, “price”, Ref(Def_QName)), ref(decimalValue(10.50, Ref(Def_decimal)))) 24 8 Assessors n Deconstruct Values in XML Query Data Model name : AttrNode -> Ref(QNameValue) value : AttrNode -> Ref(ValueNode) type : AttrNode -> Ref(ElemNode) <xsd:attribue name=“price” type=xsd:decimal/> <part price=10.50/> name(A1) = ref(qnameValue(null, “price”)) value(A1) = ref(decimalValue(10.50, Ref(Def_decimal))) type(A1) = <!-- data model representation of simple type decimal --> 25 Part III XML Query Languages 26 XQuery n Functional Language n Query Represented as an Expression n Expressions can be Nested without Restriction n Input/Output of an XQuery are Instances of the XML Query Data Model n Based on OQL, SQL, XML-QL, XPath n Readable XML Syntax 27 9 XQuery Expressions n Path Expressions n Element Constructors n FLWR Expressions n Expressions with Operators/Functions n Conditional Expressions n Quantified Expressions n List Constructors n Expressions to Test/Modify Datatypes 28 XQuery Path Expressions n Abbreviated XPath 1.0 Syntax n Find figure(s) with caption “Tree Frogs” in second chapter of “zoo.xml” n document(“zoo.xml)/chapter[2]//figure[caption = “Tree Frogs”] n Extensions n Dereference Operator n Range PredicateDocument Formats n Find captions of figures referenced by <figref” elements in “Frogs” chapter of “zoo.xml” n document(“zoo.xml”)/chapter[title = “Frogs”]//figref/@refid->fig/caption 29 XQuery Element Constructor n Start/End Tag + Enclosed List of Expressions n Generate an element with a computed name that contains nested elements: <$tagname> <description> $d </description> <price> $p </price> </$tagname> 30 10 XQuery For Let Where Return (FLWR) n FOR and LET Clause n Generate a List of Tuples that Preserves Doc Order n WHERE Clause n Applies a Predicate to Eliminate Some Tuples n RETURN Clause n Executed on Resulting Tuples -> Ordered Output List n Syntax: FOR var IN expr WHERE expr RETURN expr LET var := expr 31 FLWR Sample Expressions n List titles of books published by MK in 98 FOR $b IN document (“bib.xml”)//book WHERE $b/publisher = “Morgan Kaufmann” AND $b/year = “1998” RETURN $b/title n List each publisher and its books average price FOR $p IN distinct(document(“bib.xml”)//publisher) LET $a := avg(document(“bib.xml”) /book[publisher = $p]/price) 32 XQuery Operators and Functions n Infix/Prefix Operators n e.g., Infix Operators BEFORE and AFTER n Parenthesized Expressions n Arithmetic/Logical Operators n Collection Operators n e.g., UNION, INTERSECT, EXCEPT n Functions Can Be Defined in XQuery 33 11 Sample Operators and Functions n Find max depth of “partlist.xml” NAMESPACE xsd=“http”//www.w3.org/2001/03/XMLSchema-datatypes” FUNCTION depth(ELEMENT $e) RETURNS xsd:integer { IF empty ($e/*) THEN 1 ELSE max (depth($e/*))+1 } depth(document(“partlist.xml”))

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    16 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us