Java API for XML Processing (JAXP)

• JAXP is an API that provides an abstraction layer over XML parser implementations (specifically implementations of DOM and SAX) and over applications that process Extensible Stylesheet Language Transformations (XSLT).
• JAXP is a layer above the parser APIs that makes it easier to perform some vendor-specific tasks in a vendor-neutral fashion. JAXP employs the Abstract Factory design pattern to provide a pluggability layer, which allows you to plug in an implementation of DOM or SAX, or an application that processes XSLT.
• The primary classes of the JAXP pluggability layer are javax.xml.parsers.DocumentBuilderFactory, javax.xml.parsers.SAXParserFactory, and javax.xml.transform.TransformerFactory.
• These classes are abstract, so you must ask each factory to create an instance of itself by calling its static newInstance() method, and then use that instance to create a javax.xml.parsers.DocumentBuilder, javax.xml.parsers.SAXParser, or javax.xml.transform.Transformer, respectively.
• DocumentBuilder abstracts the underlying DOM parser implementation, SAXParser the SAX parser implementation, and Transformer the underlying XSLT processor. DocumentBuilder, SAXParser, and Transformer are also abstract classes, so instances of them can only be obtained through their respective factories.
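To make the Abstract Factory idea concrete, the following minimal sketch asks each factory for an instance and reports which concrete classes were actually plugged in. The class name FactoryLookup and the helper concreteClassNames() are illustrative assumptions, not part of JAXP, and the printed names depend on the implementation found on the class path.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

// Hypothetical demo class: shows which implementations JAXP selected.
public class FactoryLookup {

    // Returns the concrete class names behind the six abstract JAXP types.
    static String[] concreteClassNames() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();

            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();

            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer(); // no stylesheet: identity transformer

            return new String[] {
                dbf.getClass().getName(), db.getClass().getName(),
                spf.getClass().getName(), sp.getClass().getName(),
                tf.getClass().getName(), t.getClass().getName()
            };
        } catch (Exception e) {
            throw new IllegalStateException("no usable JAXP implementation", e);
        }
    }

    public static void main(String[] args) {
        for (String name : concreteClassNames()) {
            System.out.println(name);
        }
    }
}
```

The application code refers only to the abstract javax.xml types; swapping in a different parser or XSLT processor changes the printed names but not the code.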
JAXP Example - 1
    import java.io.File;
    import javax.xml.parsers.*;
    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;
    import org.w3c.dom.Document;
    import javawebbook.sax.ContentHandlerExample;

    public class JAXPTest {
      public static void main(String[] args) throws Exception {
        // Command-line arguments: XML input, XSLT stylesheet, result file
        File xmlFile = new File(args[0]);
        File xslFile = new File(args[1]);
        File xsltResultFile = new File(args[2]);

        // DOM: obtain a factory, configure it, then build and parse
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        docBuilderFactory.setNamespaceAware(true);
        docBuilderFactory.setValidating(true);   // validate against the document's DTD
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(xmlFile);
JAXP Example - 2
        // SAX: ContentHandlerExample must extend org.xml.sax.helpers.DefaultHandler
        SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
        saxParserFactory.setNamespaceAware(true);
        saxParserFactory.setValidating(true);
        SAXParser saxParser = saxParserFactory.newSAXParser();
        saxParser.parse(xmlFile, new ContentHandlerExample());

        // XSLT: compile the stylesheet, then transform the XML file into the result file
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Source xslSource = new StreamSource(xslFile);
        Transformer transformer = transformerFactory.newTransformer(xslSource);
        Source xmlSource = new StreamSource(xmlFile);
        Result xsltResult = new StreamResult(xsltResultFile);
        transformer.transform(xmlSource, xsltResult);
      }
    }
JDOM and dom4j
• DOM is useful but can be awkward to use because it was designed to be independent of any programming language.
• Implementations that take advantage of the strengths of Java can be easier to use. Examples are JDOM (http://jdom.org) and dom4j (http://dom4j.org).
• Both JDOM and dom4j are open source and can be used with JAXP.
• Both APIs take advantage of built-in Java classes, provide an object model to represent an XML tree, are intuitive and easy to use, integrate well with SAX and DOM, support XPath, and are more efficient than DOM.
• JDOM is built on concrete classes and dom4j on interfaces.
• dom4j is more flexible, yet more complex.
• dom4j offers additional features over JDOM, such as event-based processing for handling very large or streamed documents.
• dom4j also aims to be a more complete solution than JDOM, whose goal is to solve only about 80% of the Java/XML problems.
Transforming XML Using XSLT
• Extensible Stylesheet Language Transformations (XSLT) are part of the Extensible Stylesheet Language (XSL).
• An XSLT stylesheet, which is simply an XML document, contains instructions on how an XML document should be transformed by an XSLT processor. XSLT is a full programming language, expressed as XML, designed specifically for reformatting XML documents. There are more than 50 XSLT elements and more than 200 attributes.
• XSL Transformations provide a way to translate the semantic descriptions of an XML document to presentational descriptions, e.g., translate XML to HTML.
• XSL Transformations allow XML data to be reordered, permit the display of attributes, and allow elements to be displayed in an order other than that in which they are given in the XML document. XSL Transformations can also add static data to the output, such as XHTML tags and CSS style specifications.
XSLT, cont.
• Writing an XSLT stylesheet simply involves writing templates for those elements that are to be a part of the output. The XSLT processor traverses the supplied XML document tree looking for elements that match these templates. Templates may include XML element and attribute contents, other markup, such as XHTML tags, and other literal and computed values. For example:
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="memo">
        <xsl:apply-templates select="to/name" />
      </xsl:template>
      <xsl:template match="name">
        <xsl:for-each select="forename">
          <xsl:value-of select="."/><xsl:text> </xsl:text>
        </xsl:for-each>
        <xsl:value-of select="surname" />
        <br />
      </xsl:template>
    </xsl:stylesheet>
XSLT, cont.
• An XSLT processor supplied with this stylesheet and a valid XML document would, for each <memo><to><name> element combination it encountered, iterate over the nested <forename> elements, outputting the value of each <forename> element followed by a space, then the value of the <surname> element followed by the markup <br />.
• <xsl:template match="memo"> matches a <memo> element, here the root element of the XML document. It specifies <xsl:apply-templates select="to/name" />, which means: apply any <xsl:template> rules that match a <name> element nested within a <to> element.
• <xsl:template match="name"> matches the <name> element, within which <xsl:for-each select="forename"> iterates over any <forename> elements, using <xsl:value-of select="."/> to output the <forename> element value and <xsl:text> to output a space. Without <xsl:text>, the XSLT processor would strip the whitespace-only text from the stylesheet and no space would be output.
XSLT, cont.
• <xsl:value-of select="surname" /> then outputs the value of a matched <surname> element nested within a <name> element. An <xsl:value-of> element can be used for computation, but is most often used to select elements or attributes of the input document for writing to the output. The select attribute of <xsl:value-of> is an XPath expression that determines what value from the input document is to be written to the output.
• The <br /> tag is simply XHTML markup; it is not in the xsl namespace, so it is copied to the output just like any other text.
• The default behavior of XSLT is to copy the element values and whitespace outside of elements to the output document. A template at the outermost level can be used to specify which inner elements are to be used, and in what order.
XPath
• XPath expressions look a lot like directory path expressions for operating systems; both describe a path through a tree structure. Absolute paths begin with a / and start at the root of the document.
• The XSLT processor traverses the input document in preorder fashion and keeps track of its current position. The current position is called the context node, and it is referred to with a period. Relative path specifications are relative to the context node and do not begin with a slash. Possible XPath specifications and their meanings:
    / – The root of the document.
    . – Contents of the current context node.
    to/text() – Contents of the text node of the <to> element.
    /memo/to/name – <name> elements that are child elements of /memo/to.
    //surname – All <surname> elements, even if at different levels.
    /memo/to/name[1] – First <name> child element of /memo/to.
    /memo/to/name[last()] – Last <name> child element of /memo/to.
    @date – Contents of the date attribute of the current context element.
    /memo/@date – Contents of the date attribute of the <memo> element.
• XPath also includes functions and operators, which weren't discussed.
Simple XSLT Example
• The following example illustrates transforming an XML document into an HTML TABLE
    – Input
        • Style sheet (XSL): table.xsl
        • XML document: acronyms.xml
    – Output
        • HTML document: acronym.html
XSLT Stylesheet: table.xsl
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html" />
      <xsl:template match="/">
        <TABLE CELLPADDING="3" BORDER="1" ALIGN="CENTER">
          <!-- Build table header, by selecting the name of each element in the first ROW. -->
          <TR><TH></TH>
            <xsl:for-each select="ROWSET/ROW[1]/*">
              <TH><xsl:value-of select="name()" /></TH>
            </xsl:for-each>
          </TR>
          <!-- Apply template to build table rows -->
          <xsl:apply-templates select="ROWSET" />
        </TABLE>
      </xsl:template>
    ...
XSLT Stylesheet: table.xsl (continues)
    ...
      <xsl:template match="ROW">
        <TR><TD><xsl:number /></TD>
          <!-- Select all elements in the ROW. Populate each TD with the corresponding text value of the element. Note: produces by Xalan -->
          <xsl:for-each select="*">
            <TD><xsl:value-of select="." /> </TD>
          </xsl:for-each>
        </TR>
      </xsl:template>
    </xsl:stylesheet>
XML Document: acronyms.xml
    <?xml version="1.0"?>
    <ROWSET>
      <ROW> <ACRONYM>DOM</ACRONYM> <DESCRIPTION>Document Object Model</DESCRIPTION> </ROW>
      <ROW> <ACRONYM>JAXP</ACRONYM> <DESCRIPTION>Java API for XML Parsing</DESCRIPTION> </ROW>
      <ROW> <ACRONYM>SAX</ACRONYM> <DESCRIPTION>Simple API for XML</DESCRIPTION> </ROW>
      <ROW> <ACRONYM>TrAX</ACRONYM> <DESCRIPTION>Transformation API for XML</DESCRIPTION> </ROW>
      <ROW> <ACRONYM>XSLT</ACRONYM> <DESCRIPTION>XSL Transformation</DESCRIPTION> </ROW>
    </ROWSET>
Transformation Result
    <TABLE ALIGN="CENTER" BORDER="1" CELLPADDING="3">
      <TR> <TH></TH><TH>ACRONYM</TH><TH>DESCRIPTION</TH> </TR>
      <TR> <TD>1</TD><TD>DOM </TD><TD>Document Object Model </TD> </TR>
      <TR> <TD>2</TD><TD>JAXP </TD><TD>Java API for XML Parsing </TD> </TR>
      <TR> <TD>3</TD><TD>SAX </TD><TD>Simple API for XML </TD> </TR>
      <TR> <TD>4</TD><TD>TrAX </TD><TD>Transformation API for XML </TD> </TR>
      <TR> <TD>5</TD><TD>XSLT </TD><TD>XSL Transformation </TD> </TR>
    </TABLE>
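The table transformation can be driven from Java through the JAXP Transformer API. The sketch below is one possible way to do it, under stated assumptions: the class name TableTransform and the transform() helper are hypothetical, the stylesheet is a condensed copy of table.xsl with whitespace between tags omitted, and the input is trimmed to two rows and held in strings so that the example needs no files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: applies a condensed table.xsl to a small ROWSET.
public class TableTransform {

    // Condensed copy of table.xsl (comments and inter-tag whitespace omitted).
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/\">"
        + "<TABLE CELLPADDING=\"3\" BORDER=\"1\" ALIGN=\"CENTER\">"
        + "<TR><TH></TH>"
        + "<xsl:for-each select=\"ROWSET/ROW[1]/*\">"
        + "<TH><xsl:value-of select=\"name()\"/></TH>"
        + "</xsl:for-each></TR>"
        + "<xsl:apply-templates select=\"ROWSET\"/>"
        + "</TABLE></xsl:template>"
        + "<xsl:template match=\"ROW\">"
        + "<TR><TD><xsl:number/></TD>"
        + "<xsl:for-each select=\"*\">"
        + "<TD><xsl:value-of select=\".\"/></TD>"
        + "</xsl:for-each></TR>"
        + "</xsl:template></xsl:stylesheet>";

    // Two rows taken from acronyms.xml; the rest are left out for brevity.
    static final String XML =
          "<ROWSET>"
        + "<ROW><ACRONYM>DOM</ACRONYM>"
        + "<DESCRIPTION>Document Object Model</DESCRIPTION></ROW>"
        + "<ROW><ACRONYM>SAX</ACRONYM>"
        + "<DESCRIPTION>Simple API for XML</DESCRIPTION></ROW>"
        + "</ROWSET>";

    // Compiles the stylesheet, applies it to the XML, and returns the output text.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

To work with the files named on the slides instead, the StringReader sources could be replaced by new StreamSource(new File("table.xsl")) and new StreamSource(new File("acronyms.xml")). Attribute order and indentation of the generated HTML may differ between XSLT processors.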

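The XPath expressions listed earlier can also be tried outside a stylesheet, using the javax.xml.xpath package that ships with JAXP. The following is a minimal sketch; the class name XPathDemo, the eval() helper, and the memo document (names and date) are invented for illustration, since the slides do not show the memo XML itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates XPath expressions against a sample memo.
public class XPathDemo {

    // Invented sample data shaped like the memo/to/name structure in the slides.
    static final String MEMO =
          "<memo date=\"2005-01-15\"><to>"
        + "<name><forename>Mary</forename><forename>Ann</forename>"
        + "<surname>Smith</surname></name>"
        + "<name><forename>Bob</forename><surname>Jones</surname></name>"
        + "</to></memo>";

    // Parses MEMO into a DOM tree and returns the string value of the expression.
    // When the expression selects several nodes, the first one in document order is used.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(MEMO)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("/memo/to/name[1]/surname"));      // Smith
        System.out.println(eval("/memo/to/name[last()]/surname")); // Jones
        System.out.println(eval("/memo/@date"));                   // 2005-01-15
    }
}
```

Here the document node is the context node, so the absolute paths behave as described in the list of XPath specifications; a relative path such as to/name would need a <memo> element as its context instead.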