Distributed Computing Lesson 18: XML Processing Thomas Weise · 汤卫思 [email protected] ·

Distributed Computing Lesson 18: XML Processing Thomas Weise · 汤卫思 Tweise@Hfuu.Edu.Cn ·

Distributed Computing Lesson 18: XML Processing Thomas Weise · 汤卫思 [email protected] · http://www.it-weise.de Hefei University, South Campus 2 合肥学院 南艳湖校区/南2区 Faculty of Computer Science and Technology 计算机科学与技术系 Institute of Applied Optimization 应用优化研究所 230601 Shushan District, Hefei, Anhui, China 中国 安徽省 合肥市 蜀山区 230601 Econ. & Tech. Devel. Zone, Jinxiu Dadao 99 经济技术开发区 锦绣大道99号 Outline 1 XML Processing website DistributedComputing ThomasWeise 2/24 Overview • Now that we can create XML documents and define XML Schemas regarding how they look like... • ...we want to process these documents from Java. DistributedComputing ThomasWeise 3/24 XML Processing • Parsing: Read an XML document from a stream DistributedComputing ThomasWeise 4/24 XML Processing • Parsing: Read an XML document from a stream • Serializing: Write an XML document to a stream DistributedComputing ThomasWeise 4/24 XML Processing • Parsing: Read an XML document from a stream • Serializing: Write an XML document to a stream • Transforming: Translating an XML document from one form to another one (e.g., with XSLT [1, 2]) DistributedComputing ThomasWeise 4/24 XML Parsing • Non-validating parser: only checks for well-formedness DistributedComputing ThomasWeise 5/24 XML Parsing • Non-validating parser: only checks for well-formedness • Validating parser: also checks validity DistributedComputing ThomasWeise 5/24 XML Parsing • Non-validating parser: only checks for well-formedness • Validating parser: also checks validity (needs some language definition, e.g., DTD or XSD [3–8]) • Different parsing modes DistributedComputing ThomasWeise 5/24 XML Parsing • Non-validating parser: only checks for well-formedness • Validating parser: also checks validity (needs some language definition, e.g., DTD or XSD [3–8]) • Different parsing modes: • Result = tree data structure DistributedComputing ThomasWeise 5/24 XML Parsing • Non-validating parser: only checks for well-formedness • Validating parser: also checks validity (needs some language definition, e.g., DTD or XSD [3–8]) • Different parsing modes: • Result = tree data structure • Result = Series of events DistributedComputing ThomasWeise 5/24 XML Parsing • Non-validating parser: only checks for well-formedness • Validating parser: also checks validity (needs some language definition, e.g., DTD or XSD [3–8]) • Different parsing modes: • Result = tree data structure • Result = Series of events: • Push-parsing: SAX DistributedComputing ThomasWeise 5/24 XML Parsing • Non-validating parser: only checks for well-formedness • Validating parser: also checks validity (needs some language definition, e.g., DTD or XSD [3–8]) • Different parsing modes: • Result = tree data structure • Result = Series of events: • Push-parsing: SAX • Pull-parsing: StAX DistributedComputing ThomasWeise 5/24 DOM Trees • DOM (Document Object Model): object-oriented approach – XML documents represented as tree data structures DistributedComputing ThomasWeise 6/24 DOM Trees • DOM (Document Object Model): object-oriented approach – XML documents represented as tree data structures • Hierarchy of document nodes: document, element, attribute, text, note, . DistributedComputing ThomasWeise 6/24 DOM Trees • DOM (Document Object Model): object-oriented approach – XML documents represented as tree data structures • Hierarchy of document nodes: document, element, attribute, text, note, . • DOM-API: language independent, standardized at W3C [9] DistributedComputing ThomasWeise 6/24 DOM Trees • DOM (Document Object Model): object-oriented approach – XML documents represented as tree data structures • Hierarchy of document nodes: document, element, attribute, text, note, . • DOM-API: language independent, standardized at W3C [9] • Allows for high-level queries and manipulations DistributedComputing ThomasWeise 6/24 DOM Trees • DOM (Document Object Model): object-oriented approach – XML documents represented as tree data structures • Hierarchy of document nodes: document, element, attribute, text, note, . • DOM-API: language independent, standardized at W3C [9] • Allows for high-level queries and manipulations, e.g., • traverse nodes in certain order DistributedComputing ThomasWeise 6/24 DOM Trees • DOM (Document Object Model): object-oriented approach – XML documents represented as tree data structures • Hierarchy of document nodes: document, element, attribute, text, note, . • DOM-API: language independent, standardized at W3C [9] • Allows for high-level queries and manipulations, e.g., • traverse nodes in certain order • add, remove, insert, modify nodes, elements, attributes, .. DistributedComputing ThomasWeise 6/24 DOM Trees • DOM (Document Object Model): object-oriented approach – XML documents represented as tree data structures • Hierarchy of document nodes: document, element, attribute, text, note, . • DOM-API: language independent, standardized at W3C [9] • Allows for high-level queries and manipulations, e.g., • traverse nodes in certain order • add, remove, insert, modify nodes, elements, attributes, .. • finding nodes with specific names and/or features DistributedComputing ThomasWeise 6/24 DOM Trees • DOM (Document Object Model): object-oriented approach – XML documents represented as tree data structures • Hierarchy of document nodes: document, element, attribute, text, note, . • DOM-API: language independent, standardized at W3C [9] • Allows for high-level queries and manipulations, e.g., • traverse nodes in certain order • add, remove, insert, modify nodes, elements, attributes, .. • finding nodes with specific names and/or features • ... DistributedComputing ThomasWeise 6/24 DOM Trees • DOM (Document Object Model): object-oriented approach – XML documents represented as tree data structures • Hierarchy of document nodes: document, element, attribute, text, note, . • DOM-API: language independent, standardized at W3C [9] • Allows for high-level queries and manipulations, e.g., • traverse nodes in certain order • add, remove, insert, modify nodes, elements, attributes, .. • finding nodes with specific names and/or features • ... • Complete document must be read before it can be processed DistributedComputing ThomasWeise 6/24 DOM Trees • DOM (Document Object Model): object-oriented approach – XML documents represented as tree data structures • Hierarchy of document nodes: document, element, attribute, text, note, . • DOM-API: language independent, standardized at W3C [9] • Allows for high-level queries and manipulations, e.g., • traverse nodes in certain order • add, remove, insert, modify nodes, elements, attributes, .. • finding nodes with specific names and/or features • ... • Complete document must be read before it can be processed • Everything must be in memory (high memory consumption) DistributedComputing ThomasWeise 6/24 DOM Document Creation Listing: Creating and writing a DOM document (DOMCreateAndPrintDocument.java). import org.w3c.dom.Document; import javax.xml.parsers.DocumentBuilder; import org.w3c.dom.Element; import javax.xml.transform.stream.StreamResult; import org.w3c.dom.Node; import javax.xml.transform.TransformerFactory; import org.w3c.dom.NodeList; import javax.xml.parsers.DocumentBuilderFactory; import java.io.File; import javax.xml.transform.dom.DOMSource; public class DOMCreateAndPrintDocument { public static void main ( final String argv[]) throws Throwable { Document document; Node n, m; document = DocumentBuilderFactory.newInstance() .newDocumentBuilder() .newDocument(); // create an empty document n = document.createElement("myNewElement"); // create and add a new element document.appendChild(n); m = document.createElement("myNewSubElement"); // create and add a sub-element n.appendChild(m); n = document.createAttribute("myAttribute"); // create a new attribute n.setNodeValue("myAttributeValue"); // add the attribute m.getAttributes().setNamedItem(n); TransformerFactory.newInstance() .newTransformer() .transform( new DOMSource(document), new StreamResult(System.out)); // print to stdout } } DistributedComputing ThomasWeise 7/24 Reading and Modifying DOM Documents Listing: Reading, Modifying, and writing a DOM document (DOMReadAndPrintDocument.java). import org.w3c.dom.Document; import javax.xml.parsers.DocumentBuilder; import org.w3c.dom.Element; import javax.xml.transform.stream.StreamResult; import org.w3c.dom.Node; import javax.xml.transform.TransformerFactory; import org.w3c.dom.NodeList; import javax.xml.parsers.DocumentBuilderFactory; import java.io.File; import javax.xml.transform.dom.DOMSource; public class DOMReadAndPrintDocument { public static void main ( final String argv[]) throws Throwable { Document document; Node n, m; document = DocumentBuilderFactory.newInstance() // read the course example .newDocumentBuilder() // document as DOM tree . parse ("../xml/course.xml"); n = document.createElement("myNewElement"); // create and add a new element document.getDocumentElement().appendChild(n); m = document.createElement("myNewSubElement"); // create and add a sub-element n.appendChild(m); n = document.createAttribute("myAttribute"); // create a new attribute n.setNodeValue("myAttributeValue"); // add the attribute m.getAttributes().setNamedItem(n); TransformerFactory.newInstance() .newTransformer() .transform( new DOMSource(document), new StreamResult(System.out)); // print to stdout } } DistributedComputing ThomasWeise 8/24 SAX Parsing • SAX = Simple API for XML [10] DistributedComputing ThomasWeise 9/24 SAX Parsing • SAX = Simple API for XML [10] • SAX parser receives implementation of event handler interface DistributedComputing

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    70 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us