High Performance XML Data Retrieval
Total Page:16
File Type:pdf, Size:1020Kb
High Performance XML Data Retrieval Mark V. Scardina Jinyu Wang Group Product Manager & XML Evangelist Senior Product Manager Oracle Corporation Oracle Corporation Agenda y Why XPath for Data Retrieval? y Current XML Data Retrieval Strategies and Issues y High Performance XPath Requirements y Design of Extractor for XPath y Extractor Use Cases Why XPath for Data Retrieval? y W3C Standard for XML Document Navigation since 2001 y Support for XML Schema Data Types in 2.0 y Support for Functions and Operators in 2.0 y Underlies XSLT, XQuery, DOM, XForms, XPointer Current Standards-based Data Retrieval Strategies y Document Object Model (DOM) Parsing y Simple API for XML Parsing (SAX) y Java API for XML Parsing (JAXP) y Streaming API for XML Parsing (StAX) Data Retrieval Using DOM Parsing y Advantages – Dynamic random access to entire document – Supports XPath 1.0 y Disadvantages – DOM In-memory footprint up to 10x doc size – No planned support for XPath 2.0 – Redundant node traversals for multiple XPaths DOM-based XPath Data Retrieval A 1 1 2 2 1 /A/B/C 2 /A/B/C/D B F B 1 2 E C C 2 F D Data Retrieval using SAX/StAX Parsing y Advantages – Stream-based processing for managed memory – Broadcast events for multicasting (SAX) – Pull parsing model for ease of programming and control (StAX) y Disadvantages – No maintenance of hierarchical structure – No XPath Support either 1.0 or 2.0 High Performance Requirements y Retrieve XML data with managed memory resources y Support for documents of all sizes y Handle multiple XPaths with minimum node traversals y Support DTD and Schema-based XML documents Extractor for XPath y Stream-based processing utilizing SAX y Support for DTDs and XML Schemas y Implements Publish/Subscribe model for scalability y Handles multiple XPaths simultaneously y Supports XPath 1.0; extended to 2.0 Extractor’s Publish/Subscribe Processing Model XPath Expressions &Content Handlers SAX Extractor Outputs Extractor’s Function Blocks y Initialization: registration of XPaths/Handlers y XPath Compilation: compiles and builds index graphs y XPath Tracking: maintains XPath state and matches doc XPaths with the indexed XPaths y Output: sends matching XPath start/stop events along with the XML data Extractor’s Function Blocks XPath Initialization Output Expressions & Content Handlers Register XPath expressions and the XML Sequences instances of content handlers XMLSequenceBuilder SAX Files DTD/XSD Compile XPath XMLSAXSerializer expressions and build Index graph Registered...... Handler A XML Track XPath and perform ...... SAX XPath matching when SAX receiving SAX events Registered Handler Z ExtractorRuntime Initialization y Registration of absolute XPaths y Support for XML Namespaces for differentiation y Registration of execution handlers using XContentHandler() y Built-in Handlers for ease of use – XMLSequenceBuilder() – XMLSAXSerializer() XPath Compilation y XPath streamability evaluation y Streamable isAll=true/false option – True: Process only streamable XPaths – False: Buffer data as needed y Build XPath Predicate Table y Build Index Tree – XPath Dependency Tree (w/o DTD/XSD) – Data Model Tree (w/ DTD/XSD) using existing Validation engine Compilation of Each XPath Predicate Predicate XPath Evaluation Attributes Status Related to Predicate XPath -1 Value TRUE Location XPath -1 Predicate XPath -2 Predicate XPath-3 FALSE Predicate XPath-4 Value Pending Predicate Table (not-matched, matched, yet-to-be-matched) ... ... ... ... Predicate XPath Evaluation Attributes Status Related to Predicate XPath -1 Value TRUE Location XPath -n Predicate XPath -2 Predicate XPath-3 FALSE Predicate XPath-4 Value Pending Predicate Table (not-matched, matched, yet-to-be-matched) •Also Used for isAll = True condition Runtime XPath Matching y State Machine tokenizes and tracks – In-scope Namespaces – Current Element Name – Current Element Attributes – Node Position Relative to Siblings – Number of Child Elements y Implemented as a Stack XPath Index Tree (w/o DTD/XSD) / X Fake Node for /* A X X Fake Node for // B 4 1 /A/B/C P 2 /A/B/C/D 1 C X 3 /A/B//P X 4 //P 2 5 //P/*/Q 3 D P 5 Q XPath Expressions XPath Dependency Tree Dependency Tree Traversal 1 /A/B/C X Fake Node for /* 2 /A/B/C/D 3 /A/B//P X Fake Node for // 4 //P 5 //P/*/Q / 3 4 A X'' P P X'' X' D 2 DX''X' B 4 P C 1 C X'' X' 1 C X' B B X'' X''' 2 3 A A X'' D P 5 Q XPath Stack Matched Node XPath Dependency Tree Synchronous Data Model Traversal A A 1 1 B F B B F B 2 2 E E C C /A/B/C C 3 C 3 F D F D /A/B/C/D XML Document Tree XML Data Model Extractor Output y XContentHandler() – Execution of Registered Content Handlers y XMLSequenceBuilder() – Built-in Handler – Presents Result Set as XMLSequence Object – Contains a Linked List of XMLItems y XMLSAXSerializer – Built-in Handler – Serializes output to Printwriter or OutputStream Extractor Use Cases y Content Management y Web-Services y XSLT/XQuery Implementation Extractor Content Management Use Case Retrieve XPath Expressions XML system/public ID or XML Schema Location URL Pre-Process DTD/XML Schema: JDBC XPaths Table XPath Expressions: /a/b /a/b/@c Initialize ... XML Registered Handler A MetaData Tables SAX JDBC Parser ...... Extractor Registered Handler Y Connection Pool Doc Table JDBC Registered Handler Z Oracle Database Process Data using Content Handlers Extractor Web Service Use Case Service Broker Register Data Web Services Subscription using "invoke" WS XPath SOAP Register XPath/ Web Service Web Services Content SOAPClient WS Handlers response SOAP XML Initialize Client 1 ...... SAX Extractor Client 2 Parser Client 3 Clients Subscribing and Receiving the Data Extractor XSLT/XQuery Use Case XSLT Resolving XPath XMLSequnceObject XQuery XPath Bridge Register XPath XML SAX Parser Extractor XMLSequenceBuilder XML Oracle XML Resources •Oracle Technology Network • http://otn.oracle.com • Downloads, Demos, Samples, Papers • XML Support Forum •Oracle Database 10g XML & SQL Design, Build, & Manage XML Applications in Java, C, C++, & PL/SQL • Covers all of Oracle XML technology • BetaBook Forum on OTN • Available in May from Bookstores .