Java API for XML Processing (JAXP)

• JAXP is an API that provides an abstraction layer over XML parser implementations (specifically implementations of DOM and SAX) and over applications that process Extensible Stylesheet Language Transformations (XSLT).
• JAXP is a layer above the parser that makes it easier to perform some vendor-specific tasks in a vendor-neutral fashion. JAXP employs the Abstract Factory design pattern to provide a pluggability layer, which allows you to plug in an implementation of DOM or SAX, or an application that processes XSLT.
• The primary classes of the JAXP pluggability layer are javax.xml.parsers.DocumentBuilderFactory, javax.xml.parsers.SAXParserFactory, and javax.xml.transform.TransformerFactory.
• These classes are abstract, so you must ask the specific factory to create an instance of itself (through its static newInstance() method), and then use that instance to create a javax.xml.parsers.DocumentBuilder, javax.xml.parsers.SAXParser, or javax.xml.transform.Transformer, respectively.
• DocumentBuilder abstracts the underlying DOM parser implementation, SAXParser the SAX parser implementation, and Transformer the underlying XSLT processor. DocumentBuilder, SAXParser, and Transformer are also abstract classes, so instances of them can only be obtained through their respective factory.

JAXP Example - 1

    import java.io.*;
    import javax.xml.parsers.*;
    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;
    import org.w3c.dom.Document;
    import org.xml.sax.SAXException;
    // Handler from the course's SAX example; it must extend
    // org.xml.sax.helpers.DefaultHandler to be accepted by SAXParser.parse().
    import javawebbook.sax.ContentHandlerExample;

    public class JAXPTest {
        public static void main(String[] args) throws Exception {
            // Arguments: input XML file, XSLT stylesheet, output file
            File xmlFile = new File(args[0]);
            File xslFile = new File(args[1]);
            File xsltResultFile = new File(args[2]);

            // DOM: factory -> builder -> in-memory Document tree
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            docBuilderFactory.setNamespaceAware(true);
            docBuilderFactory.setValidating(true);
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            Document doc = docBuilder.parse(xmlFile);

JAXP Example - 2

            // SAX: factory -> parser -> events delivered to the handler
            SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
            saxParserFactory.setNamespaceAware(true);
            saxParserFactory.setValidating(true);
            SAXParser saxParser = saxParserFactory.newSAXParser();
            saxParser.parse(xmlFile, new ContentHandlerExample());

            // XSLT: factory -> transformer built from the stylesheet
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Source xslSource = new StreamSource(xslFile);
            Transformer transformer = transformerFactory.newTransformer(xslSource);
            Source xmlSource = new StreamSource(xmlFile);
            Result xsltResult = new StreamResult(xsltResultFile);
            transformer.transform(xmlSource, xsltResult);
        }
    }

JDOM and dom4j
• DOM is useful but can be awkward to use because it was designed to be independent of any programming language.
• Implementations that take advantage of the strengths of Java can be easier to use. Examples are JDOM (http://jdom.org) and dom4j (http://dom4j.org).
• Both JDOM and dom4j are open source and can be used with JAXP.
• Both APIs take advantage of built-in Java classes, provide an object model to represent an XML tree, are intuitive and easy to use, integrate well with SAX and DOM, support XPath, and are more efficient than DOM.
• JDOM is built on concrete classes and dom4j on interfaces.
• dom4j is more flexible, yet more complex.
• dom4j offers additional features over JDOM, such as event-based processing for handling very large or streamed documents.
• dom4j also aims to be a more complete solution than JDOM, whose goal is to solve only about 80% of the Java/XML problems.

Transforming XML Using XSLT
• Extensible Stylesheet Language Transformations (XSLT) is part of the Extensible Stylesheet Language (XSL).
• An XSLT stylesheet, which is simply an XML document, contains instructions on how an XML document should be transformed by an XSLT processor. XSLT is a full programming language, expressed as XML, designed specifically for reformatting XML documents. There are more than 50 XSLT elements and more than 200 attributes.
• XSL Transformations provide a way to translate the semantic descriptions of an XML document to presentational descriptions, e.g., translate XML to HTML.
• XSL Transformations allow XML data to be reordered, permit the display of attributes, and allow elements to be displayed in an order other than that in which they are given in the XML document. XSL Transformations can also add static data to the output, such as XHTML tags and CSS style specifications.

XSLT, cont.
• Writing an XSLT stylesheet simply involves writing templates for those elements that are to be a part of the output. The XSLT processor traverses the supplied XML document tree looking for elements that match these templates. Templates may include XML element and attribute contents, other markup, such as XHTML tags, and other literal and computed values. For example:
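A minimal sketch of a template-based stylesheet, run through JAXP, might look like the following. The memo, to, name, and surname element names, the sample data, and the class name XsltTemplateSketch are all assumptions chosen for illustration; only the xsl: instructions and the JAXP calls are standard.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltTemplateSketch {

    // Hypothetical input document; the element names are assumptions.
    static final String XML =
        "<memo>"
      + "<to><name>Grace</name><name>Brewster</name><surname>Hopper</surname></to>"
      + "<to><name>Ada</name><surname>Lovelace</surname></to>"
      + "</memo>";

    // Hypothetical stylesheet: one template for the root element,
    // one for each nested <to> element.
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='/memo'>",
        "    <xsl:apply-templates select='to'/>",
        "  </xsl:template>",
        "  <xsl:template match='to'>",
        "    <xsl:for-each select='name'>",
        "      <xsl:value-of select='.'/>",
        "      <xsl:text> </xsl:text>",
        "    </xsl:for-each>",
        "    <xsl:value-of select='surname'/>",
        "    <br/>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet to the document and returns the result as a string.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

With this input, the output should contain each person's given names separated by spaces, then the surname, then a br element copied through as literal markup.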


XSLT, cont.
• An XSLT processor that was supplied this XSLT stylesheet and a valid XML document would, for each matching element combination it encountered, iterate over the nested elements, outputting the value of each element followed by a space, then the value of another element followed by literal XHTML markup.
• One <xsl:template> matches an element as the root of the XML document. It specifies <xsl:apply-templates>, which means apply any templates that match a given element nested within that root element.
• Another <xsl:template> matches the nested element, within which <xsl:for-each> iterates over any child elements, using <xsl:value-of> to output the element value and <xsl:text> to output a space. Without <xsl:text>, the XSLT processor would remove the whitespace.

XSLT, cont.
• <xsl:value-of> then outputs the value of a matched element nested within its parent element. An <xsl:value-of> element can be used for computation, but is most often used to select elements or attributes of the input document for writing to output. The select attribute of <xsl:value-of> is an XPath expression that determines what value from the input document is to be written to the output.
• The XHTML tag in the template is simply XHTML markup; it is not in the xsl namespace, so it will be copied to the output just like any other text.
• The default behavior of XSLT is to copy the element values and whitespace outside of elements to the output document. A template at the outermost level can be used to specify which inner elements are to be used, and in what order.

XPath
• XPath expressions look a lot like directory path expressions for operating systems; both describe a path through a tree structure. Absolute paths begin with a / and start at the root element of the document. The XSLT processor traverses the input document in preorder fashion and keeps track of its current position. The current position is called the context, and it is referred to with a period. Relative path specifications are relative to the context node and do not begin with a slash.
• Possible XPath specifications and their meanings:
  / – The root of the document.
  . – Contents of the current context node.
  to/text() – Contents of the text node of the to element.
  /memo/to/name – name elements that are child elements of /memo/to.
  //surname – All surname elements, even if at different levels.
  /memo/to/name[1] – First name child element of /memo/to.
  /memo/to/name[last()] – Last name child element of /memo/to.
  @date – Contents of the date attribute of the current context element.
  /memo/@date – Contents of the date attribute of the memo element.
• XPath also includes functions and operators, which weren't discussed.

Simple XSLT Example
• The following example illustrates transforming an XML document into an HTML TABLE
  – Input
    • Style sheet (XSL): table.xsl
    • XML document: acronym.xml
  – Output
    • HTML document: acronym.

XSLT Stylesheet: table.xsl
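Returning briefly to the XPath expressions listed earlier: they can also be tried outside of a stylesheet with the standard javax.xml.xpath API. The following is a small sketch under stated assumptions; the memo document, its contents, and the class name XPathSketch are hypothetical and exist only to give the expressions something to select.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSketch {

    // Hypothetical memo document; names and values are assumptions.
    static final String XML =
        "<memo date='2004-01-15'>"
      + "<to><name>Grace</name><name>Ada</name></to>"
      + "<from><surname>Turing</surname></from>"
      + "<cc><person><surname>Hopper</surname></person></cc>"
      + "</memo>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // String value of the first node selected, relative to the context node.
    static String text(Node context, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, context);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    // Number of nodes selected, relative to the context node.
    static int count(Node context, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, context, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Node memo = doc.getDocumentElement();   // context node for relative paths

        System.out.println(text(doc, "/memo/to/name[1]"));       // absolute path
        System.out.println(text(doc, "/memo/to/name[last()]"));
        System.out.println(text(doc, "/memo/@date"));
        System.out.println(text(memo, "@date"));                  // relative to memo
        System.out.println(count(doc, "//surname"));              // any depth
    }
}
```

Passing the document as the context node suits absolute paths, while passing an element (here the memo element) shows how a relative path such as @date depends on the current context.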

... XSLT Stylesheet: table.xsl (continues) ...

XML Document: acronyms.xml

  DOM    Document Object Model
  JAXP   Java API for XML Parsing
  SAX    Simple API for XML
  TrAX   Transformation API for XML
  XSLT   XSL Transformation

Transformation Result

       ACRONYM   DESCRIPTION
  1    DOM       Document Object Model
  2    JAXP      Java API for XML Parsing
  3    SAX       Simple API for XML
  4    TrAX      Transformation API for XML
  5    XSLT      XSL Transformation
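A transformation of this kind can be sketched end to end in Java with JAXP. This is an illustration under assumptions rather than the table.xsl used for the slides: the acronyms, acronym, name, and description element names, the table layout, and the class name AcronymTableSketch are hypothetical. The row number is produced with the XPath position() function.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class AcronymTableSketch {

    // Hypothetical markup for the acronym data; element names are assumptions.
    static final String XML = String.join("\n",
        "<acronyms>",
        "  <acronym><name>DOM</name><description>Document Object Model</description></acronym>",
        "  <acronym><name>JAXP</name><description>Java API for XML Parsing</description></acronym>",
        "  <acronym><name>SAX</name><description>Simple API for XML</description></acronym>",
        "  <acronym><name>TrAX</name><description>Transformation API for XML</description></acronym>",
        "  <acronym><name>XSLT</name><description>XSL Transformation</description></acronym>",
        "</acronyms>");

    // Hypothetical stylesheet: one table row per <acronym> element.
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='html'/>",
        "  <xsl:template match='/acronyms'>",
        "    <table border='1'>",
        "      <tr><th/><th>ACRONYM</th><th>DESCRIPTION</th></tr>",
        "      <xsl:for-each select='acronym'>",
        "        <tr>",
        "          <td><xsl:value-of select='position()'/></td>",
        "          <td><xsl:value-of select='name'/></td>",
        "          <td><xsl:value-of select='description'/></td>",
        "        </tr>",
        "      </xsl:for-each>",
        "    </table>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet to the document and returns the HTML as a string.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Swapping StreamResult(out) for a StreamResult wrapping a File would write the HTML to disk, as the JAXPTest example does with its third argument.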
Transformation Result (continued)