XML Background from HTML to XML HTML Example in XML XML As A

<Course> XML Background <Title> CS 186 </Title> <Semester> Fall 2002 </Semester> • eXtensible Markup Language <Lecture Number = “12”> • Roots are HTML and SGML <Topic> XML </Topic> – HTML mixes formatting and semantics <Topic> Databases </Topic> – SGML is cumbersome </Lecture> • XML is focused on content </Course> – Designers (or others) can create their own sets of tags. – These tag definitions can be exchanged and shared “The reason that so many among various groups (DTDs, XSchema). people are excited about XML is – XSL is a companion language to specify presentation. that so many people are excited • <Opinion> XML is ugly </Opinion> about XML.” – Intended to be generated and consumed by applications --- not people! ANON From HTML to XML HTML <h1> Bibliography </h1> <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995 <p> <i> Data on the Web </i> Abiteoul, Buneman, Suciu <br> Morgan Kaufmann, 1999 HTML describes the presentation Example in XML XML as a Wire Format • People quickly figured out that XML is a convenient <bibliography> way to exchange data among applications. <book> <title> Foundations… </title> – E.g. Ford’s purchasing app generates a purchase order <author> Abiteboul </author> in XML format, e-mails it to a billing app at Firestone. <author> Hull </author> – Firestone’s billing app ingests the email, generates a bill <author> Vianu </author> in XML format, and e-mails it to Ford’s bank. <publisher> Addison Wesley </publisher> • Emerging standards to get the “e-mail” out of the picture: SOAP, WSDL, UDDI… <year> 1995 </year> • The basis of “Web Services” --- potential impact is tremendous. </book> • Why is it catching on? … It’s just text, so… </bibliography> •Platform, Language, Vendor agnostic •Easy to understand, manipulate and extend. XML describes the content •Compare this to data trapped in an RDBMS. 1 What’s this got to do with Databases? XML – Basic Structure <?xml version="1.0" encoding="UTF-8"?> • Given that apps will communicate by exchanging XML <!DOCTYPE article SYSTEM data, then databases must at least be able to: "http://xml.cXML.org/schemas/cXML/1.1.010/cXML.dtd"> – Ingest XML formatted data – Publish their own data in XML format <article key="Codd70"> •Preamble hasXML declaration, <author>E.F.Codd</author>, root element, ref to “DTD” • Thinking a bit harder: <title>A Relational Model •Elements have start and end –XML is kind of a data model. of Data for Large Shared Data Banks. Tags – Why convert to/from relational if everyone wants XML? </title>, •Well Formed: has root, proper • More cosmically: <pages>377-387</pages>, nesting, … <year>1970</year>, – Like evolution from spoken language to written language! •Valid: Conforms to DTD <volume>13</volume>, •Note that order matters (i.e. no • The (multi-) Billion Dollar Question: <journal>CACM</journal>, <number>6</number>, sets, only lists) – Will people really want to store XML data directly? <url>db/journals/cacm/cacm13.html#Codd70</url> – Current opinion: ORACLE, IBM, INFORMIX say no, other <ee>db/journals/cacm/Codd70.html</ee> DB vendors say Yes, or at least, “Maybe” <cdrom>CACMs1/CACM13/P377.pdf</cdrom> </article> Another (partial) Example Can View XML Document as a Tree <Invoice> <Buyer> <Name> ABC Corp. </Name> Invoice as a tree <Address> 123 ABC Way </Address> </Buyer> Invoice <Seller> <Name> Goods Inc. </Name> Buyer Seller Itemlist <Address> 17 Main St. </Address> </Seller> Name Address Name Item <ItemList> Address Item Item ABC Corp. 123 ABC Way Goods Inc. 17 Main widget thingy jobber <Item> widget </Item> St. <Item> thingy </Item> <Item> jobber </Item> </ItemList> Question: What Normal Form is this in? </Invoice> Mapping to Relational New splinters from XML Very expensive to store variable • Relational systems handle highly structured document types data ≠ ≠ ≠ Difficult to search trees that are broken into tables 2 Mapping to Relational I Mapping to Relational II • Question: What is a relational schema for • Can leverage Schema (or DTD) information to storing XML data? create relational schema. • Answer – Depends on how “Structured” it is… • Sometimes called “shredding” • If unstructured – use an “Edge Map” • For semi-structured data use hybrid with edge map for overflow. ParentLabel ID 0 NULL article 0 article STORED table Overflow buckets 0 author 1 (author, year, journal, …) article 1 E.F. Codd NULL 1 2 3 4 5 … 0 pages 2 author pages year journal number … 2 377-387 NULL author pages year journal cdrom … … … E.F. 377- 1970 CACM 6 Codd 387 E.F. 377- 1970 CACM P377.pdf Codd 387 Other XML features Document Type Definitions (DTDs) • Elements can have “attributes” (not clear why). • Grammar for describing the allowed structure of XML Documents. <Price currency="USD">1.50</Price> • Specify what elements can appear and in what order, nesting, etc. • XML docs can have IDs and IDREFs, URIs • DTDs are optional (!) – reference to another document or document element • Many “standard” DTDs have been developed • Two APIs for interacting with/parsing XML Docs: for all sorts of industries, groups, etc. –Document Object Model (DOM) – e.g. NITF for news article dissemination. • A tree “object” API for traversing an XML doc • Typically for Java –SAX • Event-Driven: Fire an event for each tag encountered during parse. • May not need to parse the entire document. DTD Example (partial) Beyond DTDs - XML Schemas, etc. <?xml version="1.0" encoding="UTF-8"?> <!ENTITY % datetime.tz "CDATA"> Here’s a DTD for a Contract <!ENTITY % string "CDATA"> • XML Schema is a proposal to replace/augment <!ENTITY % nmtoken "CDATA">  <!ENTITY % xmlLangCode "%nmtoken;"> DTDs <!ELEMENT SupplierID (#PCDATA)> – Has a notion of types and typechecking <!ATTLIST SupplierID – May introduce some notions of IC’s domain %string; #REQUIRED Elements contain others: > – Quite complicated, controversial ... not really <!ELEMENT Comments (#PCDATA)> ? = 0 or 1 <!ELEMENT ItemSegment (ContractItem+)> * = 0 or more adopted yet <!ATTLIST ItemSegment + = 1 or more • XML Namespaces segmentKey %string; #IMPLIED > – Can import tag names from others <!ELEMENT Contract (SupplierID+, Comments?, ItemSegment+)> – Disambiguate by prefixing the namespace name <!ATTLIST Contract • I.e. berkeley-eecs:gpa is different from uphoenix:gpa effectiveDate %datetime.tz; #REQUIRED expirationDate %datetime.tz; #REQUIRED > 3 Querying XML XPath • Syntax for tree navigation and node selection • Xpath – Navigation is defined by “paths” – A single-document language for “path expressions” – Used by other standards: XSLT, XQuery, XPointer,XLink • / = root node or separator between steps in path • XSLT – XPath plus a language for formatting output • * matches any one element name • @ references attributes of the current node • XQuery • // references any descendant of the current node – An SQL-like proposal with XPath as a sub-language – Supports aggregates, duplicates, … • [] allows specification of a filter (predicate) at a – Data model is lists, not sets step – “reference implementations” have appeared, but language is • [n] picks the nth occurrence from a list of elements. still not widely accepted. • SQL/XML • The fun part: – the SQL standards community fights back Filters can themselves contain paths XPath Examples XQuery • Parent/Child (‘/’) and Ancestor/Descendant (‘//’): /catalog/product//msrp <result> • Wildcards (match any single element): FOR $x in /bib/book WHERE $x/year > 1995 /catalog/*/msrp RETURN <newtitle> • Element Node Filters to further refine the nodes: $x/title – Filters can contain nested path expressions </newtitle> //product[price/msrp < 300]/name </result> //product[price/msrp < /dept/@budget]/name – Note, this last one is a kind of “join” XQuery XQuery Main Construct (replaces SELECT-FROM-WHERE): • FOR $x in expr -- binds $x to each value in the • FLWR Expression: FOR-LET-WHERE-RETURN list expr FOR/LET Clauses • LET $x = expr -- binds $x to the entire list Ordered List of tuples expr – Useful for common subexpressions and for WHERE Clause aggregations Filtered list of tuples RETURN Clause XML data: Instance of Xquery data model 4 XQuery Advantages of XML vs. Relational <big_publishers> • ASCII makes things easy FOR $p IN distinct(document("bib.xml")//publisher) –Easy to parse LET $b := document("bib.xml")/book[publisher = $p] – Easy to ship (e.g. across firewall, via email, etc.) • Self-documenting WHERE count($b) > 100 – Metadata (tag names) come with the data RETURN $p • Nested </big_publishers> – Can bundle lots of related data into one message – (Note: object-relational allows this) • Can be sloppy –don’t have to define a schema in advance distinct = a function that eliminates duplicates • Standard count = a (aggregate) function that returns the number of elms – Lots of free Java tools for parsing and munging XML • Expect lots of Microsoft tools (C#) for same • Tremendous Momentum! What XML does not solve Reminder: Benefits of Relational • XML doesn’t standardize metadata – It only standardizes the metadata language • Data independence buys you: • Not that much better than agreeing on an alphabet – Evolution of storage -- vs. XML? – E.g. my <price> tag vs. your <price> tag – Evolution of schema (via views) – vs. XML? • Mine includes shipping and federal tax, and is in $US • Database design theory • Yours is manufacturer’s list price in ¥Japan – IC’s, dependency theory, lots of nice tools for ER – XML Schema is a proposal to help with some of this • Remember, databases are long-lived and reused • XML doesn’t help with

XML Background from HTML to XML HTML Example in XML XML As A

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support