TU07: XML at the ASF
ApacheCon 2004, November 2004
XML at the ASF
Ted Leung
[email protected]
Copyright © Sauria Associates, LLC 2004

Overview
xml.apache.org ws.apache.org Xerces XML-RPC Xalan Axis FOP WSIF Batik JaxMe Xindice cocoon.apache.org Forrest XML-Security Cocoon XML-Commons Lenya XMLBeans

There are three major XML-focused projects at the ASF. Originally there was one project, xml.apache.org. Earlier this year, the Cocoon and web services projects were formed.

xml.apache.org contains a number of projects that are general-purpose XML tools. Most of these tools are based on specifications from the World Wide Web Consortium. These include XML itself, XSLT, XSL Formatting Objects, Scalable Vector Graphics, XML Signature, and XML Encryption.

The web services project, ws.apache.org, contains projects that cluster around standards for dealing with Web Services, including SOAP and XML-RPC.

The Cocoon project is oriented around the Cocoon Web publishing framework, which is based on XML, XSLT, and a number of other XML-related technologies.

I’m not going to be able to give you deep technical details on all of these projects. Instead, I’m going to describe what these projects are and what standards they implement, and talk about situations where you might use them. Unless I say otherwise, I’m covering the Java projects. A few projects have C/C++ versions, and I’ll mention that where applicable.

Xerces-J

<?xml version="1.0" encoding="UTF-8"?>
<books xmlns="http://sauria.com/schemas/apache-xml-book/books"
       xmlns:tns="http://sauria.com/schemas/apache-xml-book/books"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://sauria.com/schemas/apache-xml-book/books
                           http://www.sauria.com/schemas/apache-xml-book/books.xsd"
       version="1.0">
  <book>
    <title>Effective Java</title>
    <author>Joshua Bloch</author>
    <isbn>yyy-yyyyyyyyyy</isbn>
    <month>August</month>
    <year>2001</year>
    <publisher>Addison-Wesley</publisher>
    <address>New York, New York</address>
  </book>
</books>

Xerces-J is an XML parser written in Java. There is also a C++ version, Xerces-C, and a Perl wrapper for the C++ version called Xerces-P.

The job of an XML parser is to break an XML document up into its constituent parts:
  Elements
  Attributes
  Character data

The parser also verifies that the document obeys the rules of XML:
  Well-formedness
  Validity (optional)
  Namespace rules

The parts are exposed to an application via an API of some sort. XML parsing is the fundamental building block for just about every XML-based application and technology.

Xerces-J

Provides APIs to the parts
  SAX
  DOM
  XNI
  XNI-based pull
Validation
  DTDs
  XML Schema support
  Relax NG support via Andy Clark’s Neko Tools for XNI
When would I use it?
  Everywhere

Xerces provides four APIs for accessing the parts of an XML document:
  SAX: de facto standard, event-callback based, produces “event streams”
  DOM: W3C standard, tree-oriented object model, produces a DOM tree
  XNI: Xerces Native Interface, event-callback based
  XNI-based pull: CyberNeko, allows inversion of control

There are three major standards for validating XML documents.
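
The SAX and DOM styles described above can be sketched through JAXP, the standard Java parsing API. This is a minimal sketch rather than code from the talk: the class name ParseSketch and the cut-down books document are made up for illustration, and which parser actually runs depends on the JAXP lookup (the parser bundled with the JDK, or Xerces-J if it is configured as the provider).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of Xerces or of the talk.
class ParseSketch {
    // Assumed sample input: a shortened version of the books document.
    static final String BOOKS =
        "<books xmlns='http://sauria.com/schemas/apache-xml-book/books'>"
      + "<book><title>Effective Java</title><author>Joshua Bloch</author></book>"
      + "</books>";

    // SAX: the parser pushes events to a callback handler.
    // Here the handler just counts start-element events.
    static int countElementsWithSax(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        final int[] count = {0};
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++;
                }
            });
        return count[0];
    }

    // DOM: the parser builds a tree that the application walks afterwards.
    static String firstTitleWithDom(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        // "*" matches elements named "title" in any namespace.
        return doc.getElementsByTagNameNS("*", "title").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("elements seen by SAX: " + countElementsWithSax(BOOKS));
        System.out.println("first title via DOM: " + firstTitleWithDom(BOOKS));
    }
}
```

The SAX version never builds a tree, so memory use stays flat on large documents; the DOM version pays for the tree but allows random access to any node afterwards.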
Xerces has built-in support for two of these validation standards and add-on support for the third. Support for XML DTDs and W3C XML Schema is built in. Via Andy Clark’s CyberNeko Tools for XNI, Xerces also supports OASIS’s Relax NG.

XML parsing provides the foundation layer for most XML applications. If you are using XML, the question isn’t so much “when would I use an XML parser?” as “will I know that I am using an XML parser?” Some of the projects that I’ll discuss in this talk use an XML parser as part of their implementation.

Xalan-J

What does it do?
  XSLT processor
  Converts one kind of XML into another
  Uses a stylesheet (XML document)
  Stylesheet describes tree transformation
  Declarative programming model
  Based on pattern matching

Xalan-J is an XSLT processor. As with Xerces, there is also a C++ version.

XSLT is the XSL Transformations language. It is a declarative language for describing how to transform a source XML document into another XML document. The transformations are specified via an XML document called an XSLT stylesheet.

XSLT views every XML document as a tree of nodes, similar to the DOM model.
Transformations on documents correspond to transformations of the input tree.

Xalan-J

XML input:

<?xml version="1.0"?>
<books>
  <book>
    <title>Effective Java</title>
    <author>Joshua Bloch</author>
    <isbn>yyy-yyyyyyyyyy</isbn>
    <month>August</month>
    <year>2001</year>
    <publisher>Addison-Wesley</publisher>
    <address>New York, New York</address>
  </book>
</books>

HTML output:

<html>
  <head>
    <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Book Inventory</title>
  </head>
  <body>
    <em>Effective Java</em>
    <br>
    <b>Joshua Bloch</b>
    <br>
    yyy-yyyyyyyyyy<br>
    August,
    2001<br>
    Addison-Wesley<br>
    New York, New York<br>
    <p></p>
  </body>
</html>

Here’s a simple example of the way that XSLT can be used to transform XML into another XML dialect. Conveniently, HTML can also be viewed as an XML vocabulary, so we can use XSLT to convert XML data into HTML.

In this diagram, the arrows show how various parts of the XML document get transformed into the HTML document.
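
An XML-to-HTML conversion of this kind can be driven from Java through the standard javax.xml.transform API. The following is a minimal sketch under stated assumptions, not code from the talk: the class name TransformSketch, the shortened input document, and the two-rule stylesheet are all written for this illustration, and the input uses no namespace. It runs on whichever XSLT processor JAXP selects, which may or may not be a stock Xalan build.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of Xalan or of the talk.
class TransformSketch {
    // Assumed input: a shortened, namespace-free books document.
    static final String BOOKS =
        "<books><book><title>Effective Java</title>"
      + "<author>Joshua Bloch</author></book></books>";

    // Assumed stylesheet: each template matches a node pattern and
    // contributes nodes to the result tree.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='books'>"
      + "<html><body><xsl:apply-templates/></body></html>"
      + "</xsl:template>"
      + "<xsl:template match='book'><xsl:apply-templates/></xsl:template>"
      + "<xsl:template match='title'>"
      + "<em><xsl:value-of select='.'/></em><br/>"
      + "</xsl:template>"
      + "<xsl:template match='author'>"
      + "<b><xsl:value-of select='.'/></b><br/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compile the stylesheet into a Transformer, then run it over the input.
    static String transform(String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(BOOKS, STYLESHEET));
    }
}
```

Exact whitespace and any extra header markup in the output depend on the processor, so the sketch makes no claim about the byte-for-byte result, only that the title ends up inside an em element and the author inside a b element.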

Xalan-J

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:books="http://sauria.com/schemas/apache-xml-book/books"
    exclude-result-prefixes="books">
  <xsl:output method="html" version="4.0" encoding="UTF-8"
      indent="yes" omit-xml-declaration="yes"/>
  <xsl:template match="books:books">
    <html>
      <head><title>Book Inventory</title></head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="books:book">
    <xsl:apply-templates/>
    <p />
  </xsl:template>
  <xsl:template match="books:title">
    <em><xsl:value-of select="."/></em><br />
  </xsl:template>
  …

This is a portion of the XSLT stylesheet that produced the HTML document in the previous slide.

The important thing to look at here is the <xsl:template> elements, which show how the pattern-matching model of XSLT functions. When a template matches, some nodes are inserted into the result tree. More templates can be tried if <xsl:apply-templates/> appears in the template body.

Xalan-J

TrAX API is part of JAXP
XSLTC compiler
Extensions
  EXSLT
  Xalan specific
When would I use it?
  Convert XML to HTML
  Convert XML to XML
    WML
    Vocabulary translation

JAXP provides an API for dealing with an XSLT processor. This API is called TrAX, which stands for Transformation API for XML. In JDK 1.4 and above, the reference implementation for TrAX is provided by Xalan.

Xalan provides two XSLT processing engines. The default engine interprets XSLT stylesheets as it transforms a document. The XSLTC engine compiles XSLT stylesheets into Java programs called translets.
The translets are then executed to transform the document. This approach has large performance benefits in scenarios where the documents being transformed use a fixed grammar or set of grammars.

Xalan allows two forms of extensions. It supports the EXSLT extension library found at http://www.exslt.org. Xalan also includes its own mechanism for defining XSLT extensions, which lets you define extension elements (in XSLT stylesheets) and extension functions (which can be used in XPath expressions).

I’ve already shown an example of using XSLT to convert XML to HTML. You can build entire web sites this way, and use XSLT to generate the right kind of output format: HTML, XHTML, WML, and so on. If you are doing publishing, this kind of single-input, multiple-output approach should be very appealing.

There are also lots of applications where you need to take XML data that uses one grammar and convert it into another grammar. XSLT can be very useful in these situations as well.

FOP

What does it do?
  XSL processor
  Converts XML with XSL elements into non-XML formats

FOP is an XSL Formatting Objects processor. XSL Formatting Objects is a W3C Recommendation that uses XML markup to describe how to render a document. The usual rendering is a non-XML format. The XSL specification contains lots of constructs for specifying precise layout and page control.

FOP – XSL Input

<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master margin-right="0.5in" margin-left="0.5in"
        margin-bottom="0.5in" margin-top="0.5in"
        page-width="8.5in" page-height="11in"
        master-name="book-page">
      <fo:region-body margin="1in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="book-page">