XML Error Handling

Theories of Errors
Bijan Parsia
COMP60372
Feb. 19, 2010
Monday, 1 March 2010

Finishing Types
Since we are gluttons for punishment

Recall
• We now have a theory of matching
  – i.e., we know what it is for a value to match a type
  – e.g., a simple value matches xs:string iff it’s a string
  – Matching was complex for elements
• But matching isn’t our key service
  – validation is
• Two conceptions of validation
  – Grammar based recognition
    • Validate as instance of some DTD
    • Validate “against” some DTD
    • Determine if valid wrt some DTD
  – PSVI production
    • Go from an untyped value (or its string rep) to a typed one
    • validation and erasure

Validation (subset)
Compare with matching!

Erasure
A complexity! integer-of-string("01") = integer-of-string("1")
Wildcard info lost!

Validation & Erasure
• Features of “external representation”
  – Self-describing and round-tripping
• Round-tripping failure comes from cases where
  – erases is a relation (trivial)
    • “01” to 1 to “1”
  – erases obliterates type (self-description failure!)
    • {“1”, 1} to “1 1” to {1, 1} (or {“1”, “1”})

Coursework Retrospective
• Some Tricky Bits™
  – No one expects the Spanish Inquisition
    • No one!
  – minidtdx.xsd describes the syntax of (mini)DTD/XML
    • It describes an XML syntax for a fragment of DTDs
    • It does not describe the semantics!
      – Esp. not by example!
    • It is incomplete
      – It is not the tightest schema possible
      – Questionable syntax choices?
        » Repetition in choice
• XML Schema in different modes
  – Validate where?
  – Xerces Has A Bug
    • nillable="false"

Error Reporting
• What’s wrong with...

    declare function ssd:convertDeclarationOrComment($dec) {
      validate {
        typeswitch ($dec)
          …..
          case element(ref) return
            <xs:element ref="$dec/@ref"/>
          ….
          default return...
      }
    };

  Severity: fatal
  Description: Attribute @ref is not allowed on element <element>. The name is in one of the disallowed namespaces for the wildcard

    declare function ssd:convertDeclarationOrComment($dec) {
      validate {
        typeswitch ($dec)
          …..
          case element(empty) return
            <xs:complexType/>
          ….
          default return...
      }
    };

  Severity: fatal
  Description: Required attribute @name is missing
  On what?!

What is the problem?
• Validation!

    declare function ssd:convertDeclarationOrComment($dec) {
      validate {
        typeswitch ($dec)
          …..
          case element(empty) return
            <xs:complexType/>
          ….
          default return...
      }
    };

• Why doesn’t validate find the “most appropriate” type?

    case element(empty) return
      validate {<xs:element name="foo"><xs:complexType/></xs:element>}/*[1]

  (Note this requires adjustment elsewhere.)
• Why don’t constructors support type?

minidtdx2wxs.xquery
• What does this describe?
  – Ideally, a set of WXS
  – What’s the most specific output type?
  – Given the input, and the functions
    • we have a constrained output
    • we can define additional constraints along the way
      – e.g., that no @ref appears on a global element
• Compare with minidtdx.xsd
  – Both can be seen as constructive
    • WXS produces PSVI given input
    • minidtdx2wxs.xquery produces a WXS
  – Both can be seen as checking
    • (May want to modify some aspects of the query)
  – Which (XQuery or WXS) is more expressive?
    • Which is more analyzable?

What’s right?
Wherein we think about correctness

What is an XML “Document”?
• Layers
  – A series of octets
  – A series of unicode characters
  – A series of “events”
    • SAX perspective
    • E.g., Start/End tags
    • Events are tokens
  – A tree structure
    • A DOM/Infoset
  – A tree of a certain shape
    • A Validated Infoset
  – An adorned tree of a certain shape
    • A PSVI wrt a WXS
• Slide annotations
  – On the octet, character and event layers: Errors here mean no XML! (SAX ErrorHandler)
  – On the tree structure: Yay! XPath! XSLT! Etc.
  – On the PSVI: Types in play

What is an XML “Document”? (same layers, annotated)
• validate (marked at the top of the list)
• erase (marked at the bottom of the list)

What is an XML “Document”? (same layers, annotated)
• On the shaped and adorned trees: “Same” inputs can have different “meanings”! (external validation)

What is an XML “Document”? (same layers, annotated)
• Generally looks like

    <configuration xmlns="http://saxon.sf.net/ns/configuration" edition="EE">
      <serialization method="xml" />
    </configuration>

• But can look otherwise!

    element configuration {
      attribute edition {"ee"},
      element serialization {attribute method {"xml"}}}

• Same “meaning”, different spelling

What is an XML “Document”? (same layers, one more added)
  – A picture (or document, or action, or…)
    • Application meaning
• Can have many... (lower layers) ...for “the same” meaning

A Case to Study
Wherein we go all trendy

A Case to Study
• Consider weblogs
  – Chronologically reversed series of “items”
  – Each item has an author and a timestamp
  – Items are generally short, but can contain all sorts of hypermedia
  – Generally intended to be read by people
    • Closer to a magazine than to a stock ticker
• Different aspects
  – Writing
  – Reading
  – Publishing
    • As a web site
    • As a “feed” for syndication
  – Aggregating

A Weblog Workflow

Weblog Data Formats
• For writing
  – HTML (directly or via a Web App)
  – Markdown/Wikilike Languages
• For reading
  – HTML
• For publishing
  – HTML (websites)
  – RSSx/Atom (syndication)
• For aggregation
  – RSSx/Atom
  – HTML? (via scraping)
    • hAtom?

HTML as SSD
• HTML files (tend to) correspond to documents
  – Text/narrative heavy
  – Complex, irregular (but with some, and some treelike) structure
  – Lots of features (doc structure, formatting, forms, etc.)
• HTML is not XML
  – Most XHTML on the Web is served as text/html
  – Permissive parsing: No need for well formedness
  – Omit close tags, quotes around attributes
  – Misnest tags -- the browsers will cope!
• HTML is Not SGML
  – See, for example, the case of comments

A Simple HTML Weblog (1)
  Authentic Voice of a Person. Reverse Chronological Order. On the web. These are essential characteristics of a online Journal or weblog. Given the statements above, a well formed log entry would contain at a minimum an author, a creationDate, and a permaLink. And, of course, content.
  -- Sam Ruby

    <h1>My Weblog</h1>
    <h2>What I Did Today</h2>
    <h3><a id="160220081"></a> Feb. 11, 2008; Bijan Parsia</h3>
    <p>Taught class and it went <i>very</i> well.</p>

What is this notion of “well-formed”?

A Simple HTML Weblog (2)
• We can radically change the markup

    <h1>My Weblog</h1>
    <ul>
      <li>
        <b>What I Did Today</b><br/>
        <i><a id="160220081"> Feb. 11, 2008; Bijan Parsia</a></i></br>
        <p>Taught a class and it went <em>very</em> well.
      </li>
    </ul>

• Which is “right”?
• Where is the structure, semi or otherwise?
  – Is this a “well formed” weblog entry?
    • By Ruby’s criteria?

A Simple Atom Entry

    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>My Weblog</title>
      <updated>2008-02-13T18:30:02Z</updated>
      <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
      <entry>
        <author>
          <name>Bijan Parsia</name>
        </author>
        <title>What I Did Today</title>
        <id>urn:uuid:1225c695-cfb8-4ebb-aaaa</id>
        <updated>2008-02-13T18:30:02Z</updated>
        <content type="xhtml" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
          <p>Taught class and it went <em>very</em> well.</p>
        </content>
      </entry>
    </feed>

A Structured HTML (1)

    <div class=title>My Weblog</div>
    <div class="entry">
      <div class=title>What I Did Today</div>
      <div class=byline>
        <span class=date>Feb. 16, 2009</span>
        <span class=author>Bijan Parsia</span>
      </div>
      <div class="content">
        <p>Taught a class and it went <i>very</i> well.</p>
      </div>
    </div>

• What do we see?
  – Not well formed XML
  – Will not render as nicely
    • With the right style (CSS), will look like the others!
  – Not a well formed log entry!
    • Missing a permalink
    • Though perhaps it’s implicit?

Some CSS
• Which layer does this describe? (Structure or Presentation)

    <style type="text/css">
    .title {font-weight: bold}
    div.title {text-align:center; font-size: 24; }
    div.entry div.title {text-align: left; font-variant: normal}
    span.date {font-style: italic}
    span.date:after{content:" by"}
    div.content {font-style: italic}
    div.content i {font-style: normal; font-weight: bold}
    </style>

A Structured HTML (2)

    <div class="hfeed">
      <p>My Weblog</p>
      <div class="hentry" id="112993192128302715">
        <strong class="entry-title"> What I Did Today </strong>
        <div class="entry-content">
          <p>Taught a class and it went <i>very</i> well.</p>
        </div>
      </div>
      <div>
        <span class="byline">posted by
          <span class="author vcard">
            <span class="fn">Bijan Parsia</span> at
            <a rel="bookmark" href="2009-16-02-post1">
              <abbr class="published">Feb.
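
Aside on the layers of an XML “Document”: the layer list from the “What is an XML ‘Document’?” slides can be read as a ladder of checks, where an error at a lower layer means the higher layers never come into play. The sketch below is only an illustration in Python using the standard library's SAX parser. The function name first_failing_layer, the layer labels it returns, the assumption that the input is UTF-8, and the toy “shape” rule (the root element must carry an edition attribute) are all assumptions made for the example and are not from the slides. A real shape check would be validation against a DTD or WXS.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventCollector(ContentHandler):
    """Records SAX start/end events as tokens (the "series of events" layer)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Keep the element name and the names of its attributes.
        self.events.append(("start", name, sorted(attrs.getNames())))

    def endElement(self, name):
        self.events.append(("end", name))


def first_failing_layer(octets, required_attr="edition"):
    """Return the first layer at which `octets` fails, or "ok".

    The layer labels and the required-attribute rule are illustrative only.
    """
    # Octets -> Unicode characters. Assumption: the input is UTF-8.
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return "characters"

    # Characters -> events. The default SAX ErrorHandler raises on fatal
    # errors, so a well-formedness error surfaces as SAXParseException.
    collector = EventCollector()
    try:
        xml.sax.parseString(octets, collector)
    except xml.sax.SAXParseException:
        return "events"

    # Events -> "a tree of a certain shape". Toy stand-in for validation:
    # the root element must carry `required_attr`.
    _, _, root_attrs = collector.events[0]
    if required_attr not in root_attrs:
        return "shape"
    return "ok"
```

Calling first_failing_layer(b"<a><b></a>") stops at the event layer because the tags are misnested, while a well formed <configuration/> element without an edition attribute gets as far as the shape check before failing.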
