XML and Java: XML Parsing

ICW 2010, Lecture 11
Marco Cova, Internet Computing Workshop
11/8/10

Program for today
• Introduction to XML
• Parsing XML
• Writing XML

Extensible Markup Language (XML)
• We know how to store data on the server
  – Relational database, SQL, and JDBC
• We know how to present data to users
  – HTML, CSS, and JavaScript
• We know how to send data over the network
  – Sockets
  – HTTP
• We'll see how you can handle user requests and send back appropriate responses over HTTP
  – Servlets and JSP (next on!)
• What if data has to be processed by another piece of software instead of being displayed to a human user?

Extensible Markup Language (XML)
[Figure: Alice's, Bob's, and Charlie's todos are sent to the todo server]
• How can we send todo items from users to the server?
  – Plain text?
  – HTML?
• It would be easy if we had tags to describe the items we exchange (todos, in this case)
• XML allows us to do that
  – Feeds: RSS, ATOM
  – RPC: SOAP

Example: RSS

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
      <channel>
        <title> » School News</title>
        <link>http://www.cs.bham.ac.uk/sys/news/content</link>
        <description>School of Computer Science, The University of Birmingham</description>
        <pubDate>Tue, 21 Sep 2010 12:08:34 +0000</pubDate>
        <generator>http://wordpress.org/?v=2.7.1</generator>
        <language>en</language>
        <item>
          <title>Computer Science student named top 100 graduate in the UK</title>
          <link>http://www.cs.bham.ac.uk/sys/news/content/2010/09/02/computer-science-student-named-top-100-graduate-in-the-uk/</link>
          <pubDate>Thu, 02 Sep 2010 16:51:26 +0100</pubDate>
          <category><![CDATA[School News]]></category>
          <description><![CDATA[Computer Science graduate…]]></description>
        </item>
      </channel>
    </rss>

Well-formed, valid XML
• Well-formed document
  – Starts with a prolog, e.g., <?xml version="1.0" encoding="UTF-8"?>
  – Has one root element, e.g., <rss version="2.0">
  – Elements nest properly
• Valid document
  – Well-formed
  – Has an associated document type declaration
  – Complies with the constraints of the document type declaration

Reading XML documents
Parsing approaches:
• Tree-based: an object representation of the XML document is stored in memory (as a tree)
  – +: easy to manipulate the document
  – -: memory requirement
• Event-based: as parts of the XML document are read, events are generated (e.g., element started)
  – +: memory requirement
  – -: cannot easily manipulate the document

Tree-based parsing: DOM
[Figure: XML file -> DocumentBuilderFactory -> DocumentBuilder -> DOM (Java DOM API)]

    <todo>
      <title>slides</title>
      <due>2010-09-30</due>
      <done>y</done>
    </todo>

Tree-based parsing: DOM

    // instantiate the factory
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    // instantiate the document builder
    DocumentBuilder db = dbf.newDocumentBuilder();
    // parse the file
    Document doc = db.parse(new File("todo.xml"));

Note:
• By default, the parser is not validating
  – See dbf.setValidating()
• Various options to tweak the generated tree, e.g.:
  – Whitespace
  – CDATA coalescing
  – Comments

DOM use
Node types
• Attr
• Comment
• Document
• Element
• Text
• Less common: CDATASection, CharacterData, DocumentFragment, DocumentType, Entity, EntityReference, Notation, ProcessingInstruction
Tree navigation
• Node.getChildNodes
• Node.getFirstChild
• Node.getLastChild
• Document.getDocumentElement
• getElementsByTagName

Event-based parsing
• The parser produces notifications when certain events occur (e.g., element started, element closed)
• 2 approaches:
  – Push-based
    • Once parsing starts, the program must handle all events as the parser delivers them
    • Simple API for XML (SAX)
  – Pull-based
    • The program controls the parsing (start, pause, resume)
    • Streaming API for XML (StAX)

SAX
[Figure: XML file -> SAXParserFactory -> SAXParser -> events (todo el. start, title el. start, todo el. end) -> EventHandler]

    <todo>
      <title>slides</title>
      <due>2010-09-30</due>
      <done>y</done>
    </todo>

SAX
Parse setup

    // instantiate the factory
    SAXParserFactory factory = SAXParserFactory.newInstance();
    // instantiate the parser
    SAXParser sp = factory.newSAXParser();
    // parse the document
    sp.parse(new File("todo.xml"), new MyEventHandler());

Event handling

    class MyEventHandler extends DefaultHandler {
        // invoked when a start tag is found
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) { }
        // invoked when a close tag is found
        public void endElement(String uri, String localName, String qName) { }
        // invoked when character data (text) is found
        public void characters(char[] ch, int start, int length) { }
    }

StAX
[Figure: XML file -> parser; the program calls next and receives the next event (todo el. start)]

    <todo>
      <title>slides</title>
      <due>2010-09-30</due>
      <done>y</done>
    </todo>

StAX
Parse setup

    // instantiate the factory
    XMLInputFactory factory = XMLInputFactory.newFactory();
    // instantiate the reader
    XMLStreamReader reader =
        factory.createXMLStreamReader(new FileReader("todo.xml"));

Event handling

    while (reader.hasNext()) {
        // get event type
        int et = reader.getEventType();
        // handle START_ELEMENT
        if (et == XMLStreamConstants.START_ELEMENT) {
        }
        // handle other event types
        …
        // get next event
        reader.next();
    }

Modifying XML documents
• Add element
  – Document.createElement()
  – Element.appendChild
• Add attributes
  – Element.setAttribute(attr, value)
• Remove element
  – Element.removeChild
• Remove attribute
  – Element.removeAttribute

Writing XML documents
We could simply do:

    PrintStream out = System.out;
    out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
    out.println("<rss version=\"2.0\">");
    …
    for (Item e : items) {
        out.println("<item>");
        out.println(" <title>" + e.title + "</title>");
        ..
    }
    out.write("</rss>");

But:
• It's hard to read and maintain
• It's easy to make mistakes
  – Can you spot the error?

Writing XML documents (cont'd)
• A better approach: serialize a DOM object
• Two techniques
  – Visit the DOM tree and produce output depending on the current element
  – Leverage the transformation mechanism (javax.xml.transform) to apply an identity transformation that saves its result in a file (see code example)

Writing XML documents (cont'd)

    writeNode(doc.getDocumentElement(), fout);

    void writeNode(Node n, FileWriter fout) throws IOException {
        if (n instanceof Element)
            writeElement((Element) n, fout);
        else if (n instanceof Text)
            writeText((Text) n, fout);
        …
    }

    // write a text node
    void writeText(Text t, FileWriter fout) throws IOException {
        fout.write(t.getNodeValue());
    }

    void writeElement(Element e, FileWriter fout) throws IOException {
        String n = e.getNodeName();
        // open tag
        fout.write("<" + n + ">");
        // handle children
        NodeList c = e.getChildNodes();
        for (int i = 0; i < c.getLength(); i++) {
            writeNode(c.item(i), fout);
        }
        // close tag
        fout.write("</" + n + ">");
    }

Charset, character encodings
• Character set: a set of characters
• Coded character set: a character set in which every letter is mapped to a number (code point)
• Character encoding: specifies how coded characters are mapped to bytes
Courtesy of http://www.w3.org/International/articles/definitions-characters/

Namespaces
• Certain elements and attributes define a vocabulary that may be reused in multiple documents
  – E.g., the "Dublin Core Metadata Element Set" defines 15 general properties, such as creator, title, etc.
• XML documents may wish to combine several vocabularies
• Issue: collisions
  – E.g., both Dublin Core and Atom define a title element

Namespaces (cont'd)
Solution: XML namespace
• Elements are specified ("fully qualified") by namespace prefix and name
    dc:title
• A namespace prefix is associated with a namespace name
    xmlns:dc="http://purl.org/dc/elements/1.1/"
• The document declares the bindings it will use:
    <rss version="2.0"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:atom="http://www.w3.org/2005/Atom">
• When using an element, use its fully qualified name:
    <dc:title>Title</dc:title>

Escaping
• What if we want to write a todo item such as

    <todo>
      <title>add new field <category></title>
    </todo>

• <category> should be interpreted as simple text, but the parser will consider it to be a tag and will raise an error (Why?)
• Solution: CDATA section
  – Forces the parser to consider its content as text (not markup)
  – <![CDATA[ … ]]>

    <todo>
      <title>add new field <![CDATA[<category>]]></title>
    </todo>

References
• Extensible Markup Language (XML) 1.0, http://www.w3.org/TR/xml/
• Introducing Character Sets and Encodings, http://www.w3.org/International/articles/definitions-characters/
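The DOM parse setup and tree navigation calls listed in the slides can be combined into a small runnable sketch. It parses from a string rather than todo.xml so that it is self-contained; the class name DomTodoExample, the <todos> wrapper element, and the second todo item are assumptions made for illustration and do not come from the lecture.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomTodoExample {
    // Hypothetical sample data: the <todos> root and the second item are assumptions.
    static final String SAMPLE =
        "<todos>"
        + "<todo><title>slides</title><due>2010-09-30</due><done>y</done></todo>"
        + "<todo><title>exercises</title><due>2010-10-07</due><done>n</done></todo>"
        + "</todos>";

    // Builds the whole tree in memory, then collects the text of every <title> element.
    static List<String> titles(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));

            NodeList nodes = doc.getElementsByTagName("title");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles(SAMPLE)); // prints [slides, exercises]
    }
}
```

Because the full tree is available, the same Document could be navigated again or modified afterwards, which is the trade-off against memory use noted for tree-based parsing.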
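For comparison, a push-based (SAX) version of the same task might look like the sketch below. The handler overrides the three DefaultHandler callbacks named in the slides and keeps its own state, since no tree is built. The class names SaxTodoExample and TitleHandler and the sample data are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxTodoExample {
    // Hypothetical sample data, not taken from the lecture.
    static final String SAMPLE =
        "<todos>"
        + "<todo><title>slides</title><done>y</done></todo>"
        + "<todo><title>exercises</title><done>n</done></todo>"
        + "</todos>";

    // Collects the text of each <title>; the parser calls these methods as it reads.
    static final class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a <title> element

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            if ("title".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append rather than assign.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(current.toString());
                current = null;
            }
        }
    }

    static List<String> titles(String xml) {
        try {
            SAXParser sp = SAXParserFactory.newInstance().newSAXParser();
            TitleHandler handler = new TitleHandler();
            sp.parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles(SAMPLE)); // prints [slides, exercises]
    }
}
```

Note that startElement takes an Attributes argument in addition to the three names; a method without it would not override the DefaultHandler callback and would never be invoked.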
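A pull-based (StAX) loop, where the program asks for each event, could be sketched as follows. It is a variant of the loop on the StAX slide that reads the event type from the return value of next(); the class name StaxTodoExample, the method countElements, and the sample data are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxTodoExample {
    // Hypothetical sample data, not taken from the lecture.
    static final String SAMPLE =
        "<todos>"
        + "<todo><title>slides</title></todo>"
        + "<todo><title>exercises</title></todo>"
        + "</todos>";

    // Counts start tags with the given name; the program decides when to pull the next event.
    static int countElements(String xml, String name) {
        try {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                int eventType = reader.next(); // advance and get the new event type
                if (eventType == XMLStreamConstants.START_ELEMENT
                        && name.equals(reader.getLocalName())) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements(SAMPLE, "todo")); // prints 2
    }
}
```

Since the program drives the loop, it could stop early (for example after the first match) without reading the rest of the document, which a SAX handler cannot do as directly.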
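The second writing technique, an identity transformation over a DOM tree, can be sketched as below. This is only one possible shape for the code example the slides refer to, not the lecture's own: it writes to a string instead of a file, and the class name DomWriteExample, the method todoXml, and the priority attribute are assumptions. It also shows that the serializer escapes markup characters in text nodes, which relates to the escaping problem discussed in the slides.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomWriteExample {
    // Builds a one-item todo document in memory and serializes it to a string.
    static String todoXml(String title, String due) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();

            // Add elements with Document.createElement and appendChild.
            Element todo = doc.createElement("todo");
            todo.setAttribute("priority", "high"); // hypothetical attribute
            doc.appendChild(todo);

            Element titleEl = doc.createElement("title");
            titleEl.appendChild(doc.createTextNode(title));
            todo.appendChild(titleEl);

            Element dueEl = doc.createElement("due");
            dueEl.appendChild(doc.createTextNode(due));
            todo.appendChild(dueEl);

            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        // The "<" in the title is written as an entity, so the output stays well-formed.
        System.out.println(todoXml("add new field <category>", "2010-09-30"));
    }
}
```

Writing to a file instead would only require passing a StreamResult built from a File or an output stream.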
