
Distributed Computing
Lesson 18: XML Processing

Thomas Weise · 汤卫思
[email protected] · http://www.it-weise.de

Hefei University, South Campus 2 · 合肥学院 南艳湖校区/南2区
Faculty of Computer Science and Technology · 计算机科学与技术系
Institute of Applied Optimization · 应用优化研究所
230601 Shushan District, Hefei, Anhui, China · 中国 安徽省 合肥市 蜀山区 230601
Econ. & Tech. Devel. Zone, Jinxiu Dadao 99 · 经济技术开发区 锦绣大道99号

Outline
1 XML Processing

Overview
• Now that we can create XML documents and define XML Schemas that specify what they look like...
• ...we want to process these documents from Java.

XML Processing
• Parsing: read an XML document from a stream
• Serializing: write an XML document to a stream
• Transforming: translate an XML document from one form into another (e.g., with XSLT [1, 2])

XML Parsing
• Non-validating parser: only checks for well-formedness
• Validating parser: also checks validity (needs some language definition, e.g., DTD or XSD [3–8])
• Different parsing modes:
    • Result = tree data structure
    • Result = series of events:
        • Push-parsing: SAX
        • Pull-parsing: StAX

DOM Trees
• DOM (Document Object Model): object-oriented approach in which XML documents are represented as tree data structures
• Hierarchy of document nodes: document, element, attribute, text, note, ...
• DOM-API: language independent, standardized at W3C [9]
• Allows for high-level queries and manipulations, e.g.,
    • traversing nodes in a certain order
    • adding, removing, inserting, and modifying nodes, elements, attributes, ...
    • finding nodes with specific names and/or features
    • ...
• The complete document must be read before it can be processed
• Everything must be in memory (high memory consumption)

DOM Document Creation
Listing: Creating and writing a DOM document (DOMCreateAndPrintDocument.java).

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;

    import org.w3c.dom.Document;
    import org.w3c.dom.Node;

    public class DOMCreateAndPrintDocument {
      public static void main(final String[] argv) throws Exception {
        Document document;
        Node n, m;

        document = DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .newDocument();    // create an empty document

        n = document.createElement("myNewElement");         // create a new element...
        document.appendChild(n);                            // ...and add it as the root element

        m = document.createElement("myNewSubElement");      // create a sub-element...
        n.appendChild(m);                                   // ...and add it to the root element

        n = document.createAttribute("myAttribute");        // create a new attribute
        n.setNodeValue("myAttributeValue");                 // set the attribute value
        m.getAttributes().setNamedItem(n);                  // add the attribute to the sub-element

        TransformerFactory.newInstance()
                          .newTransformer()
                          .transform(new DOMSource(document),
                                     new StreamResult(System.out)); // print to stdout
      }
    }

Reading and Modifying DOM Documents
Listing: Reading, modifying, and writing a DOM document (DOMReadAndPrintDocument.java).

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;

    import org.w3c.dom.Document;
    import org.w3c.dom.Node;

    public class DOMReadAndPrintDocument {
      public static void main(final String[] argv) throws Exception {
        Document document;
        Node n, m;

        document = DocumentBuilderFactory.newInstance()     // read the course example
                                         .newDocumentBuilder() // document as DOM tree
                                         .parse("../xml/course.xml");

        n = document.createElement("myNewElement");         // create a new element...
        document.getDocumentElement().appendChild(n);       // ...and add it below the root element

        m = document.createElement("myNewSubElement");      // create a sub-element...
        n.appendChild(m);                                   // ...and add it to the new element

        n = document.createAttribute("myAttribute");        // create a new attribute
        n.setNodeValue("myAttributeValue");                 // set the attribute value
        m.getAttributes().setNamedItem(n);                  // add the attribute to the sub-element

        TransformerFactory.newInstance()
                          .newTransformer()
                          .transform(new DOMSource(document),
                                     new StreamResult(System.out)); // print to stdout
      }
    }

SAX Parsing
• SAX = Simple API for XML [10]
• The SAX parser receives an implementation of an event handler interface
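To make the push-parsing idea concrete, here is a minimal sketch of an event handler handed to a SAX parser. It is not one of the course listings: the class name SAXEventLogger, its parse helper, and the small in-memory sample document are assumptions made for illustration. Only the standard JDK classes SAXParserFactory, DefaultHandler, and InputSource are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class (not part of the course material):
// records the events that the SAX parser pushes to the handler.
class SAXEventLogger extends DefaultHandler {
  private final List<String> events = new ArrayList<>();

  @Override
  public void startElement(String uri, String localName, String qName,
                           Attributes attributes) {
    events.add("start:" + qName); // called for every opening tag
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    events.add("end:" + qName);   // called for every closing tag
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    // A parser may deliver one text node in several chunks, so a real
    // handler would usually buffer the text until endElement is called.
    String text = new String(ch, start, length).trim();
    if (!text.isEmpty()) {
      events.add("text:" + text);
    }
  }

  // Parse an XML string and return the recorded events in document order.
  static List<String> parse(String xml) throws Exception {
    SAXEventLogger handler = new SAXEventLogger();
    SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
    return handler.events;
  }

  public static void main(String[] args) throws Exception {
    // Sample document invented for this sketch.
    String xml = "<course><title>Distributed Computing</title></course>";
    for (String event : parse(xml)) {
      System.out.println(event);
    }
  }
}
```

In contrast to the DOM listings, no tree is built here: the handler sees each event once, as the parser reads the stream, and must keep whatever state it needs itself.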