High Performance XML Data Retrieval

High Performance XML Data Retrieval Mark V. Scardina Jinyu Wang Group Product Manager & XML Evangelist Senior Product Manager Oracle Corporation Oracle Corporation Agenda y Why XPath for Data Retrieval? y Current XML Data Retrieval Strategies and Issues y High Performance XPath Requirements y Design of Extractor for XPath y Extractor Use Cases Why XPath for Data Retrieval? y W3C Standard for XML Document Navigation since 2001 y Support for XML Schema Data Types in 2.0 y Support for Functions and Operators in 2.0 y Underlies XSLT, XQuery, DOM, XForms, XPointer Current Standards-based Data Retrieval Strategies y Document Object Model (DOM) Parsing y Simple API for XML Parsing (SAX) y Java API for XML Parsing (JAXP) y Streaming API for XML Parsing (StAX) Data Retrieval Using DOM Parsing y Advantages – Dynamic random access to entire document – Supports XPath 1.0 y Disadvantages – DOM In-memory footprint up to 10x doc size – No planned support for XPath 2.0 – Redundant node traversals for multiple XPaths DOM-based XPath Data Retrieval A 1 1 2 2 1 /A/B/C 2 /A/B/C/D B F B 1 2 E C C 2 F D Data Retrieval using SAX/StAX Parsing y Advantages – Stream-based processing for managed memory – Broadcast events for multicasting (SAX) – Pull parsing model for ease of programming and control (StAX) y Disadvantages – No maintenance of hierarchical structure – No XPath Support either 1.0 or 2.0 High Performance Requirements y Retrieve XML data with managed memory resources y Support for documents of all sizes y Handle multiple XPaths with minimum node traversals y Support DTD and Schema-based XML documents Extractor for XPath y Stream-based processing utilizing SAX y Support for DTDs and XML Schemas y Implements Publish/Subscribe model for scalability y Handles multiple XPaths simultaneously y Supports XPath 1.0; extended to 2.0 Extractor’s Publish/Subscribe Processing Model XPath Expressions &Content Handlers SAX Extractor Outputs Extractor’s Function Blocks y Initialization: registration of XPaths/Handlers y XPath Compilation: compiles and builds index graphs y XPath Tracking: maintains XPath state and matches doc XPaths with the indexed XPaths y Output: sends matching XPath start/stop events along with the XML data Extractor’s Function Blocks XPath Initialization Output Expressions & Content Handlers Register XPath expressions and the XML Sequences instances of content handlers XMLSequenceBuilder SAX Files DTD/XSD Compile XPath XMLSAXSerializer expressions and build Index graph Registered...... Handler A XML Track XPath and perform ...... SAX XPath matching when SAX receiving SAX events Registered Handler Z ExtractorRuntime Initialization y Registration of absolute XPaths y Support for XML Namespaces for differentiation y Registration of execution handlers using XContentHandler() y Built-in Handlers for ease of use – XMLSequenceBuilder() – XMLSAXSerializer() XPath Compilation y XPath streamability evaluation y Streamable isAll=true/false option – True: Process only streamable XPaths – False: Buffer data as needed y Build XPath Predicate Table y Build Index Tree – XPath Dependency Tree (w/o DTD/XSD) – Data Model Tree (w/ DTD/XSD) using existing Validation engine Compilation of Each XPath Predicate Predicate XPath Evaluation Attributes Status Related to Predicate XPath -1 Value TRUE Location XPath -1 Predicate XPath -2 Predicate XPath-3 FALSE Predicate XPath-4 Value Pending Predicate Table (not-matched, matched, yet-to-be-matched) ... ... ... ... Predicate XPath Evaluation Attributes Status Related to Predicate XPath -1 Value TRUE Location XPath -n Predicate XPath -2 Predicate XPath-3 FALSE Predicate XPath-4 Value Pending Predicate Table (not-matched, matched, yet-to-be-matched) •Also Used for isAll = True condition Runtime XPath Matching y State Machine tokenizes and tracks – In-scope Namespaces – Current Element Name – Current Element Attributes – Node Position Relative to Siblings – Number of Child Elements y Implemented as a Stack XPath Index Tree (w/o DTD/XSD) / X Fake Node for /* A X X Fake Node for // B 4 1 /A/B/C P 2 /A/B/C/D 1 C X 3 /A/B//P X 4 //P 2 5 //P/*/Q 3 D P 5 Q XPath Expressions XPath Dependency Tree Dependency Tree Traversal 1 /A/B/C X Fake Node for /* 2 /A/B/C/D 3 /A/B//P X Fake Node for // 4 //P 5 //P/*/Q / 3 4 A X'' P P X'' X' D 2 DX''X' B 4 P C 1 C X'' X' 1 C X' B B X'' X''' 2 3 A A X'' D P 5 Q XPath Stack Matched Node XPath Dependency Tree Synchronous Data Model Traversal A A 1 1 B F B B F B 2 2 E E C C /A/B/C C 3 C 3 F D F D /A/B/C/D XML Document Tree XML Data Model Extractor Output y XContentHandler() – Execution of Registered Content Handlers y XMLSequenceBuilder() – Built-in Handler – Presents Result Set as XMLSequence Object – Contains a Linked List of XMLItems y XMLSAXSerializer – Built-in Handler – Serializes output to Printwriter or OutputStream Extractor Use Cases y Content Management y Web-Services y XSLT/XQuery Implementation Extractor Content Management Use Case Retrieve XPath Expressions XML system/public ID or XML Schema Location URL Pre-Process DTD/XML Schema: JDBC XPaths Table XPath Expressions: /a/b /a/b/@c Initialize ... XML Registered Handler A MetaData Tables SAX JDBC Parser ...... Extractor Registered Handler Y Connection Pool Doc Table JDBC Registered Handler Z Oracle Database Process Data using Content Handlers Extractor Web Service Use Case Service Broker Register Data Web Services Subscription using "invoke" WS XPath SOAP Register XPath/ Web Service Web Services Content SOAPClient WS Handlers response SOAP XML Initialize Client 1 ...... SAX Extractor Client 2 Parser Client 3 Clients Subscribing and Receiving the Data Extractor XSLT/XQuery Use Case XSLT Resolving XPath XMLSequnceObject XQuery XPath Bridge Register XPath XML SAX Parser Extractor XMLSequenceBuilder XML Oracle XML Resources •Oracle Technology Network • http://otn.oracle.com • Downloads, Demos, Samples, Papers • XML Support Forum •Oracle Database 10g XML & SQL Design, Build, & Manage XML Applications in Java, C, C++, & PL/SQL • Covers all of Oracle XML technology • BetaBook Forum on OTN • Available in May from Bookstores .

High Performance XML Data Retrieval

XML Document Parsing: Operational and Performance Characteristics

V a Lida T in G R D F Da

Bibliography of Erik Wilde

XML Parsers - a Comparative Study with Respect to Adaptability

XML and Related Technologies Certification Prep, Part 3: XML Processing Explore How to Parse and Validate XML Documents Plus How to Use Xquery

Licensing Information User Manual Release 21C (21.1) F37966-01 March 2021

XML Prague 2020

Web Technologies in Java EE

JAVA XML Mock Test

Metro User Guide Metro User Guide Table of Contents

An Overview on Xml Parsing

Pearls of XSLT/Xpath 3.0 Design