2.3 JAXP: Java API for XML Processing

• How can applications use XML processors?
  – A Java-based answer: through JAXP
  – An overview of the JAXP interface
    » What does it specify?
    » What can be done with it?
    » How do the JAXP components fit together?

[Partly based on the tutorial “An Overview of the APIs” at http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html, from which some graphics are also borrowed]

JAXP 1.1

• Included in Java since JDK 1.4
• An interface for “plugging in” and using XML processors in Java applications
  – includes packages
    » org.xml.sax: SAX 2.0
    » org.w3c.dom: DOM Level 2
    » javax.xml.parsers: initialization and use of parsers
    » javax.xml.transform: initialization and use of transformers (XSLT processors)

XPT 2006 XML APIs: JAXP

Later Versions

• JAXP 1.2 (2002) adds property strings for setting the language and source of a schema used for (non-DTD-based) validation
• JAXP 1.3 included in JDK 5.0 (2005)
  – more flexible validation (decoupled from parsing)
  – support for DOM3 and XPath
• We'll restrict to basic ideas from JAXP 1.1

JAXP: XML processor plugin (1)

• Vendor-independent method for selecting the processor implementation at run time
  – principally through system properties
      javax.xml.parsers.SAXParserFactory
      javax.xml.parsers.DocumentBuilderFactory
      javax.xml.transform.TransformerFactory
  – Set on command line (for example, to use Xerces as the DOM implementation):
      java -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl
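The effect of the system properties above can be checked by asking each factory which implementation class was selected. The following is a minimal sketch; the class name JaxpFactoryLookup and its helper methods are made up for illustration, and the names printed depend on the JDK in use and on any -D settings, so no particular output is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper class, not part of JAXP itself.
class JaxpFactoryLookup {

    // Class name of the DOM factory implementation chosen by JAXP.
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Class name of the SAX factory implementation chosen by JAXP.
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Class name of the XSLT factory implementation chosen by JAXP.
    static String transformerFactoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM:  " + domFactoryClass());
        System.out.println("SAX:  " + saxFactoryClass());
        System.out.println("XSLT: " + transformerFactoryClass());
    }
}
```

Running the program once without options and once with a -Djavax.xml.parsers.DocumentBuilderFactory=... setting (and the named implementation on the classpath) should show the DOM line change while the application code stays the same.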


JAXP: XML processor plugin (2)

  – Set during execution (−> Saxon as the XSLT impl):
      System.setProperty("javax.xml.transform.TransformerFactory",
                         "com.icl.saxon.TransformerFactoryImpl");
• By default, reference implementations used
  – Apache Crimson/Xerces as the XML parser
  – Apache Xalan as the XSLT processor
• Supported by a few compliant processors:
  – Parsers: Apache Crimson and Xerces, Aelfred, Oracle XML Parser for Java
  – XSLT transformers: Apache Xalan, Saxon

JAXP: Functionality

• Parsing using SAX 2.0 or DOM Level 2
• Transformation using XSLT
  – (We’ll study XSLT in detail later)
• Adds functionality missing from SAX 2.0 and DOM Level 2:
  – controlling validation and handling of parse errors
    » error handling can be controlled in SAX, by implementing ErrorHandler methods
  – loading and saving of DOM Document objects
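Selecting the XSLT implementation during execution can be tried without a third-party processor on the classpath. The sketch below is an illustration only: instead of the Saxon class named on the slide, it sets the property to whatever implementation the JDK already uses by default, which is enough to show the mechanism. The class name RuntimeFactorySelection is hypothetical.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical demo class; it reuses the JDK's own default implementation
// so that it runs without Saxon or any other extra library.
class RuntimeFactorySelection {

    static final String KEY = "javax.xml.transform.TransformerFactory";

    // Sets the factory property at run time and returns the class name
    // of the factory that JAXP instantiates afterwards.
    static String selectAtRuntime() {
        String defaultImpl = TransformerFactory.newInstance().getClass().getName();
        String previous = System.getProperty(KEY);
        try {
            // Same call as on the slide, with a different class name.
            System.setProperty(KEY, defaultImpl);
            return TransformerFactory.newInstance().getClass().getName();
        } finally {
            // Restore the earlier state so other code is not affected.
            if (previous == null) {
                System.clearProperty(KEY);
            } else {
                System.setProperty(KEY, previous);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Selected at run time: " + selectAtRuntime());
    }
}
```

With another processor's jar on the classpath, the same System.setProperty call with that processor's factory class name would switch implementations; the property has to be set before TransformerFactory.newInstance() is called.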


JAXP Parsing API

• Included in JAXP package javax.xml.parsers
• Used for invoking and using SAX …
      SAXParserFactory spf = SAXParserFactory.newInstance();
  and DOM parser implementations:
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

JAXP: Using a SAX parser (1)

[Figure: call chain .newSAXParser() → .getXMLReader() → .parse(”f.xml”), with the XML file f.xml as input]

JAXP: Using a SAX parser (2)

• We have already seen this:
      SAXParserFactory spf = SAXParserFactory.newInstance();
      try {
          SAXParser saxParser = spf.newSAXParser();
          XMLReader xmlReader = saxParser.getXMLReader();
          ...
          xmlReader.setContentHandler(handler);
          xmlReader.parse(fileNameOrURI);
          ...
      } catch (Exception e) {
          System.err.println(e.getMessage());
          System.exit(1);
      }

JAXP: Using a DOM parser (1)

[Figure: calls .newDocumentBuilder(), .newDocument() and .parse(”f.xml”), with the XML file f.xml as input]
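The SAX fragment above leaves handler and fileNameOrURI undefined. A self-contained sketch of the same call sequence might look as follows; the class SaxElementCounter, its counting handler, and the in-memory test document are assumptions made for illustration, and the input is read from a string rather than a file so that the example runs on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical ContentHandler that counts element start tags.
class SaxElementCounter extends DefaultHandler {

    private int elements = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++; // called once per start tag
    }

    // Parses the given XML text with a JAXP-created SAX parser
    // and returns the number of elements seen.
    static int countElements(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser saxParser = spf.newSAXParser();
            XMLReader xmlReader = saxParser.getXMLReader();
            SaxElementCounter handler = new SaxElementCounter();
            xmlReader.setContentHandler(handler);
            xmlReader.parse(new InputSource(new StringReader(xml)));
            return handler.elements;
        } catch (Exception e) {
            // Wrapped to keep the sketch short; real code would report the error.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><c>text</c></a>")); // prints 3
    }
}
```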

JAXP: Using a DOM parser (2)

• Parsing a file into a DOM Document:
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      try { // to get a new DocumentBuilder:
          DocumentBuilder builder = dbf.newDocumentBuilder();
          Document domDoc = builder.parse(fileNameOrURI);
          // builder.parse() also throws SAXException and IOException;
          // catch or declare them as well
      } catch (ParserConfigurationException e) {
          e.printStackTrace();
          System.exit(1);
      }

DOM building in JAXP

[Figure: XML → XML Reader (SAX Parser) → Document Builder (Content Handler), with Error Handler, DTD Handler and Entity Resolver → DOM Document]

• DOM on top of SAX - So what?
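A runnable variant of the DOM parsing fragment is sketched below. It reads the document from a string so that no file is needed; the class name DomParseDemo and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class for parsing XML text into a DOM Document.
class DomParseDemo {

    // Builds a DOM tree from XML text using a JAXP DocumentBuilder.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = dbf.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Covers ParserConfigurationException, SAXException and IOException.
            throw new IllegalStateException(e);
        }
    }

    // Name of the document (root) element.
    static String rootName(String xml) {
        return parse(xml).getDocumentElement().getTagName();
    }

    // Number of elements with the given tag name anywhere in the document.
    static int countByTag(String xml, String tag) {
        return parse(xml).getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) {
        String xml = "<list><item/><item/></list>";
        System.out.println(rootName(xml));           // prints list
        System.out.println(countByTag(xml, "item")); // prints 2
    }
}
```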

JAXP: Controlling parsing (1)

• Errors of DOM parsing can be handled
  – by creating a SAX ErrorHandler
    » to implement error, fatalError and warning methods
    and passing it to the DocumentBuilder:
      builder.setErrorHandler(new myErrHandler());
      domDoc = builder.parse(fileName);
• Parser properties can be configured:
  – for both SAXParserFactories and DocumentBuilderFactories (before parser/builder creation):
      factory.setValidating(true/false)
      factory.setNamespaceAware(true/false)

JAXP: Controlling parsing (2)

• Further DocumentBuilderFactory configuration methods to control the form of the resulting DOM Document:
      setIgnoringComments(true/false)
      setIgnoringElementContentWhitespace(true/false)
      setCoalescing(true/false)
        · combine CDATA sections with surrounding text?
      setExpandEntityReferences(true/false)
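Both kinds of control can be combined in one small program. The sketch below is an illustration under stated assumptions: the class ControlledDomParsing, the collecting handler (standing in for myErrHandler) and the sample documents are invented here. It registers an ErrorHandler on the DocumentBuilder and shows how setIgnoringComments and setCoalescing change which node types appear in the DOM tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class for controlling DOM parsing through JAXP.
class ControlledDomParsing {

    // Records every problem the parser reports instead of printing it.
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();

        @Override public void warning(SAXParseException e) { messages.add("warning: " + e.getMessage()); }
        @Override public void error(SAXParseException e) { messages.add("error: " + e.getMessage()); }
        @Override public void fatalError(SAXParseException e) throws SAXException {
            messages.add("fatal: " + e.getMessage());
            throw e; // stop parsing on well-formedness errors
        }
    }

    // Parses xml and returns how many problems were reported to the handler.
    static int reportedProblems(String xml) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // configure before creating the builder
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(handler);
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Parse errors have already been passed to the handler.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.messages.size();
    }

    // Counts children of the root element that have the given DOM node type.
    // With tidy == true, comments are dropped and CDATA sections are merged into text.
    static int rootChildrenOfType(String xml, boolean tidy, short nodeType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(tidy);
            factory.setCoalescing(tidy);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == nodeType) count++;
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a>x<!--c--><![CDATA[y]]>z</a>";
        System.out.println("problems in malformed input: " + reportedProblems("<a><b></a>"));
        System.out.println("comments kept by default: " + rootChildrenOfType(xml, false, Node.COMMENT_NODE));
        System.out.println("comments when ignoring:   " + rootChildrenOfType(xml, true, Node.COMMENT_NODE));
    }
}
```

The factory settings are applied before newDocumentBuilder() is called, matching the note on the slide that configuration happens before parser or builder creation.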

JAXP Transformation API

• earlier known as TrAX
• Allows application to apply a Transformer to a Source document to get a Result document
• Transformer can be created
  – from XSLT transformation instructions (to be discussed later)
  – without instructions
    » gives an identity transformation, which simply copies the Source to the Result

JAXP: Using Transformers (1)

[Figure: calls .newTransformer(…) and .transform(.,.); labels: XSLT, Source]

JAXP Transformation APIs

• javax.xml.transform:
  – Classes Transformer and TransformerFactory; initialization similar to parsers and parser factories
• Transformation Source object can be
  – a DOM tree, a SAX XMLReader or an input stream
• Transformation Result object can be
  – a DOM tree, a SAX ContentHandler or an output stream

Source-Result combinations

[Figure: Source → Transformer → Result. Sources: XML Reader (SAX Parser), DOM, Input Stream. Results: Content Handler, DOM, Output Stream]
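Any Source kind can be paired with any Result kind. As one example of such a combination, the sketch below feeds a character stream into an identity Transformer and collects the output as a DOM tree. The class name StreamToDomDemo and the sample input are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical demo class: stream Source in, DOM Result out.
class StreamToDomDemo {

    // Copies XML text into a new DOM Document via an identity transformation.
    static Document toDom(String xml) {
        try {
            // newTransformer() without arguments gives the identity transformation.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult(); // the transformer creates the Document node
            identity.transform(new StreamSource(new StringReader(xml)), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = toDom("<a><b/></a>");
        System.out.println(doc.getDocumentElement().getTagName()); // prints a
    }
}
```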


JAXP Transformation Packages (2)

• Classes to create Source and Result objects from DOM, SAX and I/O streams defined in packages
  – javax.xml.transform.dom, javax.xml.transform.sax, and javax.xml.transform.stream
• Identity transformation to an output stream is a vendor-neutral way to serialize DOM documents (and the only option in JAXP)
  – “I would recommend using the JAXP interfaces until the DOM’s own load/save module becomes available”
    » Joe Kesselman, IBM & W3C DOM WG

Serializing a DOM Document as XML text

• By an identity transformation to an output stream:
      TransformerFactory tFactory = TransformerFactory.newInstance();
      // Create an identity transformer:
      Transformer transformer = tFactory.newTransformer();
      DOMSource source = new DOMSource(myDOMdoc);
      StreamResult result = new StreamResult(System.out);
      transformer.transform(source, result);
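The serialization fragment assumes an existing myDOMdoc. A self-contained sketch that first builds a small document with newDocument() and then serializes it to a string could look like this; the class DomSerializeDemo, the element name and the choice to omit the XML declaration are assumptions made for the example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: DOM Source in, character stream Result out.
class DomSerializeDemo {

    // Serializes a DOM Document as XML text by an identity transformation.
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Optional: leave out the <?xml ...?> declaration.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <greeting>hello</greeting> in memory and serializes it.
    static String buildAndSerialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                                                 .newDocumentBuilder()
                                                 .newDocument();
            Element root = doc.createElement("greeting");
            root.appendChild(doc.createTextNode("hello"));
            doc.appendChild(root);
            return serialize(doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAndSerialize());
    }
}
```

Writing to a StringWriter instead of System.out makes the result easy to inspect or store; a StreamResult accepts either kind of target.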


Controlling the form of the result?

• We could specify the requested form of the result by an XSLT script, say, in file saveSpec.xslt:

Creating an XSLT Transformer

• Then create a tailored transformer:
      StreamSource saveSpecSrc = new StreamSource(new File(”saveSpec.xslt”));
      Transformer transformer = tFactory.newTransformer(saveSpecSrc);
      // and use it to transform a Source to a Result,
      // as before
• The Source of transformation instructions could be given also as a DOMSource, URL, or character reader
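The contents of saveSpec.xslt are not shown, so the sketch below uses a stand-in stylesheet, supplied through a character reader instead of a file. The stylesheet text, the class name XsltFromReaderDemo and the sample input are assumptions for illustration only: the stylesheet copies the input unchanged and uses xsl:output to ask for XML output without an XML declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: a Transformer created from XSLT instructions.
class XsltFromReaderDemo {

    // Stand-in for saveSpec.xslt (the original script is not available).
    static final String SAVE_SPEC =
          "<xsl:stylesheet version=\"1.0\" "
        + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/\"><xsl:copy-of select=\".\"/></xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stand-in stylesheet to XML text and returns the result text.
    static String apply(String xml) {
        try {
            TransformerFactory tFactory = TransformerFactory.newInstance();
            // The transformation instructions come from a character reader.
            StreamSource saveSpecSrc = new StreamSource(new StringReader(SAVE_SPEC));
            Transformer transformer = tFactory.newTransformer(saveSpecSrc);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply("<a><b/></a>"));
    }
}
```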


DOM vs. Other Java/XML APIs

• JDOM (www.jdom.org), DOM4J (www.dom4j.org), JAXB (java.sun.com/xml/jaxb)
• The others may be more convenient to use, but …
• “The DOM offers not only the ability to move between languages with minimal relearning, but to move between multiple implementations in a single language – which a specific set of classes such as JDOM can’t support”
  » J. Kesselman, IBM & W3C DOM WG

JAXP: Summary

• An interface for using XML Processors
  – SAX/DOM parsers, XSLT transformers
• Supports pluggability of XML processors
• Defines means to control parsing, and handling of parse errors (through SAX ErrorHandlers)
• Defines means to write out DOM Documents
• Included in Java 2
