XML and Databases XML XML Databases

2 XML Similar to HTML (Berners-Lee, CERN W3C) XML and Databases use your own tags. Amount/popularity of XML data is growing steadily (faster than computing power) Lecture 1 Introduction to XML Sebastian Maneth NICTA and UNSW CSE@UNSW - Semester 1, 2009 3 4 XML XML and Databases This course will Similar to HTML (Berners-Lee, CERN W3C) use your own tags. introduce you to the world of XML and to the challenges of dealing with XML in a RDMS. Amount/popularity of XML data is growing steadily (faster than computing power) Some of these challenges are Existing (DB) technology cannot be applied to XML data. HTML pages are *tiny* (couple of Kbytes) XML documents can be huge (GBytes) Databases 5 6 XML and Databases XML and Databases This course will This course will introduce you to the world of XML and to introduce you to the world of XML and to the challenges of dealing with XML in a RDMS. the challenges of dealing with XML in a RDMS. tree structured Some of these challenges are Some of these challenges are Existing (DB) technology cannot be applied to XML data. Existing (DB) technology cannot be applied to XML data. can handle huge amounts of data stored in relations can handle huge amounts of data stored in relations storage management storage management index structures index structures join/sort algorithms join/sort algorithms … … 1 7 8 XML and Databases XML and Databases This course will This course will introduce you to the world of XML and to introduce you to the world of XML and to the challenges of dealing with XML in a RDMS. the challenges of dealing with XML in a RDMS. tree structured tree structured Some of these challenges are Some of these challenges are Existing (DB) technology cannot be applied to XML data. Existing (DB) technology cannot be applied to XML data. how do we store trees? can handle huge amounts of data stored in relations can we benefit from index structures? how to implement tree navigation? storage management table tree index structures node fchild nsib Additional challenges posed by W3C’s XQuery proposal. join/sort algorithms 1 2 - a notion of order 1 … 2 - 3 a complex type / schema system 3 - 4 possibility to construct new tree nodes on the fly. 2 3 4 How to store a tree, in a DB table/relation?? 4 5 - 5 - 6 5 6 “shredding” 6 - - 9 10 XML and Databases XML and Databases This course will This course will introduce you to the world of XML and to introduce you to the world of XML and to the challenges of dealing with XML in a RDMS. the challenges of dealing with XML in a RDMS. tree structured tree structured Some of these challenges are Some of these challenges are Existing (DB) technology cannot be applied to XML data. Existing (DB) technology cannot be applied to XML data. how do we store trees? how do we store trees? can we benefit from index structures? can we benefit from index structures? how to implement tree navigation? how to implement tree navigation? Additional challenges posed by W3C’s XQuery proposal. Additional challenges posed by W3C’s XQuery proposal. a notion of order a notion of order a complex type / schema system a complex type / schema system possibility to construct new tree nodes on the fly. possibility to construct new tree nodes on the fly. XML = Threat to Databases… ?! XML = Threat to Databases !! 11 12 XML and Databases XML and Databases This course will This course will introduce you to the world of XML and to introduce you to the world of XML and to the challenges of dealing with XML in a RDMS. the challenges of dealing with XML in a RDMS. You will learn about You will learn about You will NOT learn about Tree structured data (XML) Tree structured data (XML) Hacking CGI scripts XML parsers & efficient memory representation XML parsers & efficient memory representation HTML Java Query languages for XML (XPath, XQuery, XSLT…) Query Languages for XML (XPath, XQuery, XSLT…) … Efficient evaluation using finite-state automata Efficient evaluation using finite-state automata Mapping XML to databases Mapping XML to databases Advanced topics (query optimizations, Advanced Topics (query optimizations, access control, access control, update languages…) update languages…) 2 13 14 About XML About XML XML is the World Wide Web Consortium’s (W3C, http://www.w3.org/) We will talk about algorithms and programming techniques to Extensible Markup Language efficiently manipulate XML data: We hope to convince you that XML is not yet another hyped TLA, Regular expressions can be used to validate XML data but is useful technology. Finite state automata lie at the heart of highly efficient You will become best friends with one of the most important data XPath implementations structures in Computing Science, the tree. XML is all about tree-shaped data. Tree traversals may be used to preprocess XML trees in order to support XPath evaluation, to store XML trees in databases, etc. You will learn to apply a number of closely related XML standards: In the end, you should be able to digest the thick pile of related W3C > Representing data: XML itself, DTD, XMLSchema, XML dialects X___ standards. > Interfaces to connect PLs to XML: DOM, SAX > Languages to query/transform XML: XPath, XQuery, XSLT. (like, XQuery, XPointer, XLink, XHTML, XInclude, XML Base, XML Schema, … 15 16 Course Organization Course Organization Lecture Thursday, 15:00 – 18:00 Lecture Thursday, 15:00 – 18:00 Central Lecture Block 4 (K-E19-G05) Central Lecture Block 4 (K-E19-G05) Lecturer Sebastian Maneth Lecturer Sebastian Maneth Consult Friday, 11:00-12:00 (E508, L5) Consult Friday, 11:00-12:00 (E508, L5) All email to [email protected] All email to [email protected] Tutorial Book None! Monday, 11:00-13:00 @ Quadrangle G045 (K-E15-G045) Tuesday, 11:00-13:00 @ Quadrangle G045 (K-E15-G045) Suggested reading material: Tuesday, 16:00-18:00 @ Quadrangle G026 (K-E15-G026) Wednesday, 16:00-18:00 @ Quadrangle G040 (K-E15-G040) Course slides of Marc Scholl, Uni Konstanz http://www.inf.uni-konstanz.de/dbis/teaching/ws0506/database-xml/XMLDB.pdf Tutors Andrew Clayphan, Kim Nguyen Theory / PL oriented, book draft: All email to [email protected] http://arbre.is.s.u-tokyo.ac.jp/~hahosoya/xmlbook/ 17 18 Course Organization Outline - Lectures Lecture Thursday, 15:00 – 18:00 Central Lecture Block 4 (K-E19-G05) 1. Introduction to XML, Encodings, Parsers 2. Memory Representations for XML: Space vs Access Speed Lecturer Sebastian Maneth 3. RDBMS Representation of XML Consult Friday, 11:00-12:00 (E508, L5) 4. DTDs, Schemas, Regular Expressions, Ambiguity All email to [email protected] 5. Node Selecting Queries: XPath 6. Efficient XPath Evaluation 7. XPath Properties: backward axes, containment test 8. Streaming Evaluation: how much memory do you need? Programming Assignments 9. XPath Evaluation using RDBMS 10. XSLT 5 assigments, due every other Monday. (1st is due 25th March 11. XSLT & XQuery 2nd is due 6th April …) 12. XQuery & Updates Per assignment: 12 points (total: 60 points) Final exam: 40 points (must get 40% to pass!) 3 19 20 Outline - Assignments Outline - Assignments You can freely choose to program your assignments in C / C++, or 1. Read XML, using DOM parser. Create document statistics 12 days Java 2. SAX Parse into memory structure: Tree vs DAG 2 weeks However, your code must compile with gcc / g++, javac, 3. Map XML into RDBMS 1 week (+ break) as installed on CSE linux systems!! 4. XPath evaluation over main memory structures 3 weeks (+ streaming support) Send your source code by Monday 23:59 (every other week) 5. XPath into SQL Translation 2 weeks to [email protected] 21 22 Outline - Assignments Hashing/hash code’s (A2) 1. Read XML, using DOM parser. Create document statistics 12 days 2. SAX Parse into memory structure: Tree vs DAG 2 weeks Lecture 1 3. Map XML into RDBMS 1 week (+ break) 4. XPath evaluation over main memory structures 3 weeks (+ streaming support) XML Introduction 5. XPath into SQL Translation 2 weeks Finite automata (A4) 23 24 Outline XML Introduction 1. Three motivations for XML (1) religious (2) practical (3) theoretical / mathematical Religious motivation for XML: 2. Well-formed XML to have one language to speak about data. 3. Character Encodings 4. Parsers for XML parsing into DOM (Document Object Model) 4 25 26 XML Motivation (religious cont.) XML is a Data Exchange Format 1974 SGML (Charles Goldfarb at IBM Research) 1989 HTML (Tim Berners-Lee at CERN/Geneva) 1994 Berners-Lee founds Web Consortium (W3C) 1996 XML (W3C draft, v1.0 in 1998) 27 28 XML Motivation (religious cont.) (2) Practical XML = data XML is a Data Exchange Format Tom Henzinger EPFL [email protected] 1974 SGML (Charles Goldfarb at IBM Research) … 1989 HTML (Tim Berners-Lee at CERN/Geneva) Helmut Seidl TU Munich 1994 Berners-Lee founds Web Consortium (W3C) [email protected] 1996 XML (W3C draft, v1.0 in 1998) http://www.w3.org/TR/REC-xml/ Text file 29 30 (2) Practical XML = data + structure (2) Practical XML = data + structure <Related> <Related> Tom Henzinger <colleague> Tom Henzinger <colleague> EPFL <name>Tom Henzinger</name> EPFL <name>Tom Henzinger</name> [email protected] <affil>EPFL</affil> [email protected] <affil>EPFL</affil> “mark <email> “mark <email> it [email protected] it [email protected] … </email></colleague> … </email></colleague> up!” </colleague> up!” </colleague> … … Helmut Seidl <friend> Helmut Seidl <friend> TU Munich <name>Helmut Seidl</name> TU Munich <name>Helmut Seidl</name> [email protected] <affil>TU Munich</affil>

Load more