Professional XML Development with Apache Tools: Xerces, Xalan, FOP, Cocoon, Axis, Xindice

Theodore W. Leung

Wiley Publishing, Inc.

Professional XML Development with Apache Tools: Xerces, Xalan, FOP, Cocoon, Axis, Xindice

Published by Wiley Publishing, Inc., 10475 Crosspoint Boulevard, Indianapolis, IN 46256, www.wiley.com

Copyright © 2004 by Wiley Publishing, Inc., Indianapolis, Indiana

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8700. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4447, E-mail: [email protected].

Trademarks: Wiley, the Wiley Publishing logo, Wrox, the Wrox logo, the Wrox Programmer to Programmer logo and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. Wiley Publishing, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: WHILE THE PUBLISHER AND AUTHOR HAVE USED THEIR BEST EFFORTS IN PREPARING THIS BOOK, THEY MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS BOOK AND SPECIFICALLY DISCLAIM ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES REPRESENTATIVES OR WRITTEN SALES MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR YOUR SITUATION. YOU SHOULD CONSULT WITH A PROFESSIONAL WHERE APPROPRIATE. NEITHER THE PUBLISHER NOR AUTHOR SHALL BE LIABLE FOR ANY LOSS OF PROFIT OR ANY OTHER COMMERCIAL DAMAGES, INCLUDING BUT NOT LIMITED TO SPECIAL, INCIDENTAL, CONSEQUENTIAL, OR OTHER DAMAGES.

For general information on our other products and services or to obtain technical support, please contact our Customer Care Department within the U.S. at (800) 762-2974, outside the U.S. at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Library of Congress Control Number: 2003115130
ISBN: 0-7645-4355-5
Manufactured in the United States of America

Credits

Author: Theodore W. Leung
Executive Editor: Robert Elliott
Production Editor: Vincent Kunkemueller
Copy Editor: Tiffany Taylor
Compositor: Gina Rexrode
Book Producer: Ryan Publishing Group, Inc.
Vice President & Executive Group Publisher: Richard Swadley
Vice President & Executive Publisher: Robert Ipsen
Vice President and Publisher: Joseph B. Wikert
Executive Editorial Director: Mary Bednarek
Editorial Manager: Kathryn A. Malm

About the Author

Theodore W. Leung

Ted Leung is a Member of the Apache Software Foundation. He is a founding member of the Apache XML Project and served as the chairman of the XML Project Management Committee from March 2001 to June 2003.

He is also the principal of Sauria Associates, LLC, a Pacific Northwest consultancy focused on high-impact software development. He has served companies such as F5 Networks, IBM, Enkubator, Apple Computer, and Taligent in roles spanning technical lead through chief technology officer. Ted holds an S.B. in Mathematics from the Massachusetts Institute of Technology and an Sc.M. in Computer Science from Brown University.

Ted has given a number of technical presentations at industry conferences, including Software Development West and ApacheCon. A full list of his speaking engagements is available at http://www.sauria.com/presentations.html.

Acknowledgments

Writing a book is a journey, and the journey begins well before the keys on the keyboard start clicking. In my case, the journey to this book has led through a number of organizations, and I want to thank those that have helped me along the way.

Without the Apache Software Foundation, this book—and more importantly, the software described in this book—would not exist. It has been my privilege to work with the contributors, committers and members of the ASF. I’d like to give special thanks to Dirk-Willem van Gulik, Stefano Mazzochi, Pierpaolo Fumagalli, Davanum Srinivas, and James Duncan Davidson.

My involvement with the ASF would not have happened were it not for the hard work of the developers and management of the IBM Cupertino XML4J team: Mike Pogue, Andy Clark, Glenn Marcy, Ralf Pfieffer, Andy Heninger, Tom Watson, Eric Ye, Mike Weiner, Rajiv Jain, and Paul Buck. I’d also like to thank Rachel Reinitz, from IBM’s Software Services for Websphere, for starting me down the road of interacting with real people who were trying to get their jobs done using XML technologies.

The actual act of writing a book cannot happen in a vacuum. Various people have provided much-needed help or advice. My neighbor Kate deVeaux dropped everything to help me beat an eleventh-hour deadline by taking the photograph that graces the cover of this book. My other neighbors Alex Torres and David Shenk provided valuable advice about how to handle various aspects of the book-writing process.

Apache is about community software development, performed across great distances via the wonder that is the Internet. But people also need a local, physical community, so I’d like to thank the members of our spiritual community for their support, understanding, and prayers during this project, especially the Campbells, Woleslagles, Ziakins, Bests, and Larsens. Extra thanks to Larry Gonwick for physically standing in for me this summer.

To Mom and Dad: Thank you for your love and all the years of prayers and hard labor.

To Abigail, Michaela, and Elisabeth: Daddy is finally done with the book. We can go play now.

To Julie, the love of my life, thank you for standing over me and guarding my time, hearing my frustrations, and enduring my absences.

One of the many characterizations of the open source culture has been as a gift culture. Jesus Christ gave himself as a gift for the world. I want to thank him for the inspiration to be a gift giver. I hope that I will be able to follow his example.

Contents

Acknowledgments
Introduction

Chapter 1: Xerces
Prerequisites; Well-Formedness; Validity; Entities; XML Parser; SAX; DOM; Installing Xerces; Development Techniques; Xerces Configuration; Deferred DOM; Schema Handling; Grammar Caching; Entity Handling; Entity References; Serialization; XNI; Using the Samples; CyberNeko Tools for XNI; NekoHTML; ManekiNeko; NekoPull; Practical Usage; Common Problems; Applications

Chapter 2: Xalan
Prerequisites; XPath; XSLT; Installing and Configuring Xalan; Development Techniques; TrAX; Xalan Specific Features; XSLTC; Xalan Extensions; Practical Usage; Applications

Chapter 3: FOP
Prerequisites; Basic XSL; Flows; List Blocks; Generating XSL with XSLT; Tables; Installing and Configuring FOP; Hyphenation; Development Techniques; Embedding; Using the Configuration Files and Options; SAX; DOM; XSLT; Validating XSL; Command-Line Usage; Ant Task; Fonts; Output; Graphics; FOP Extensions; Practical Usage; Applications

Chapter 4: Batik
Prerequisites; Static SVG; Dynamic SVG; Installing and Configuring Batik; Development Techniques; SVGGraphics2D; JSVGCanvas; ImageTranscoding; SVG Scripting; Security; SVG Rasterizer; Command Line; SVG Browser; SVG Pretty-Printer; SVG Font Converter; Practical Usage; Applications; Rich Client User Interfaces

Chapter 5: Cocoon Concepts
Prerequisites; Concepts; Sitemap; Generators; Transformers; Serializers; Matchers; Selectors; Actions; Action Sets; Readers; Views; Resources; elements; Cocoon URIs; XSP; Sessions

Chapter 6: Cocoon Development
Installing and Configuring Cocoon; Configuring Cocoon; Development Techniques; Database Access; Simple Application; Practical Usage; Performance; Applications

Chapter 7: Xindice
Prerequisites; XML:DB; XUpdate; Installing and Configuring Xindice; Command-Line Tools; Runtime Environment; Adding to the Database; Retrieval; Deleting; Indexing; Other; Development Techniques; XML:DB API; Practical Usage; Applications; XMLServlet: Accessing Xindice; XSLTServletFilter; Deployment Descriptors; XSLT Stylesheets; A SAX-based Version; XPathResultHandler

Chapter 8: XML-RPC
Prerequisites; Concepts; XML Encoding RPCs; Using HTTP as an RPC Transport; Installing and Configuring XML-RPC; Development Techniques; A Simple Client; Mapping to Java Types; A Simple Server; Asynchronous Clients; Getting More Control Over Server Processing; Handling BASIC Authentication on the Server; XML-RPC in Existing Servers; Using SSL; Practical Usage; Applications; Simplifying XML-RPC

Chapter 9: Axis
Prerequisites; Concepts; SOAP; WSDL; JAX-RPC; Installing and Configuring Axis; Deployment Environment Setup; Development Environment Setup; Development Techniques; Axis Conceptual Model; Axis and WSDL; Accessing the ServletContext; Message Service; Handlers; .jws Web Services; Tools; Practical Usage; Applications

Chapter 10: XML Security
Prerequisites; One-Way Hashing; Symmetric Key Encryption; Public Key Encryption; Digital Signatures; Concepts; Canonicalization; Installing and Configuring XML Security; Development Techniques; Canonicalizing and Computing the Digest; Signing; Verification; More Signatures; Resolvers; Encryption; Practical Usage; Applications

Index

Introduction

XML is growing in popularity for use in all kinds of applications. Some have called it the new ASCII, believing that XML will be used as widely as the ASCII character set is today. XML’s simple rules for markup, utilization of Unicode, and endorsement by the World Wide Web Consortium (W3C) have made it a good choice for representing various kinds of data in a human-readable manner. In addition, many new technologies have been built on top of XML, from technologies that convert XML from one vocabulary to another, to those that render formatted XML as PDF or PostScript, all the way to technologies that digitally sign and encrypt XML documents or pieces of XML documents.

To use XML or one of its related technologies, you need to have a toolset, and one such toolset is the topic of this book. The Apache Software Foundation hosts a number of projects related to XML. The aim of this book is to give you an overview of the XML technology the projects implement and then show you how to use the projects in your own applications.

How to Use This Book

This book is intended for Java developers who are already familiar with XML and who want to use one of the Apache XML projects in an application setting. The focus of the book is the unique features of the Apache tools we’ll cover. This isn’t intended to be a tutorial book for any of the XML technologies standardized by the W3C. In particular, you shouldn’t use this book as a tutorial on XML, XML Namespaces, SAX, DOM, XSLT, XSL, SVG, SOAP, WSDL, XML Signature, or XML Encryption. Although each chapter contains some material to make sure that you can understand the functionality provided by the Apache projects, each one of these technologies could be the subject of an entire book (and many of them are). Our goal is to help you see how to use the Apache libraries to perform the kinds of tasks needed to build real-world applications.

This book also doesn’t cover every Apache XML-related project—a book must have a finite scope, just like a software project. We’ll examine the most useful tools provided by Apache, as well as real-world techniques for using them.

Unless noted, all the examples have been written for and tested under the Java SDK version 1.4.2. Many of the examples will run fine on earlier versions of the JDK, but a few will not, and these are noted for you.

The tools presented are mostly independent of one another, with the exception of Xerces. XML parsing is a fundamental aspect of all the Apache XML tools, so knowledge of Xerces can be helpful in many circumstances. It would be a good idea for you to at least browse the contents of Chapter 1, “Xerces,” to make sure you’re familiar with all the concepts. After that, you can jump to the chapter that discusses the particular tool you’re interested in.

All the code and sample applications in this book can be downloaded from www.wrox.com.

Organization

This book is divided into three parts. Part I contains an overview of all the tools in this book, and it explains the details of Xerces, a tool that is used in conjunction with almost all the other tools discussed in this book. Part II focuses on tools that are particularly useful for developing Web applications; these tools have other purposes as well, but the focus is on Web applications. Part III covers tools that are primarily used to build back-end applications.

Part I: Getting Started with the Apache XML Tools

Part I includes a chapter on Xerces, which is foundational for all the other tools described in the book.

Chapter 1: Xerces

Xerces-J is an XML parsing library that has been used in many large Java applications. It provides support for XML 1.0 with Namespaces and has preliminary support for XML 1.1. Xerces-J supports the W3C DOM Level 2 tree-based API, the SAX 2.0.1 event-driven API, and the Java API for XML Processing (JAXP) 1.2, which is based on the DOM and SAX APIs. In addition, Xerces-J provides full support for the W3C XML Schema 1.0 recommendation, allowing you the choice of using DTDs or XML Schema when validating XML documents. XML parsing is the foundational layer for any XML processing application, and Xerces provides a stable and flexible foundation. Xerces has been incorporated into so many applications that you may be using it today and not know it.

We’ll look at how to use JAXP with Xerces-J to obtain a parser that you can use from within your application. Then we’ll discuss how you can use either the SAX or DOM APIs to process an XML document. Xerces-J provides additional functionality that isn’t described by any of the standards, so we’ll look at how to use the Xerces configuration mechanism and the Xerces Native Interface to perform tasks that are outside of what the standards describe. Xerces includes a library for serializing XML, taking either SAX event callbacks or a DOM tree and turning them back into an XML document. One of the newest features of Xerces is the ability to cache XML Schemas to avoid reprocessing them when you’re processing a number of documents that use the same schema.
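Obtaining a parser through JAXP and handling SAX callbacks might look roughly like the sketch below. It is an illustration under stated assumptions, not code from the book: the class name JaxpSaxSketch and the helper countElements are hypothetical, the code targets a current JDK rather than the 1.4.2 SDK mentioned earlier, and JAXP hands back whichever parser implementation is configured (typically Xerces-J when its JAR files are on the classpath, otherwise the parser bundled with the JDK).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the name is not taken from the book.
final class JaxpSaxSketch {

    /** Parses the given XML text and returns how many start-element events SAX reported. */
    static int countElements(String xml) throws Exception {
        // JAXP locates the configured parser implementation for us.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++; // one callback per element start tag
            }
        };

        parser.parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><title>Sample</title><author>Someone</author></book>";
        System.out.println("Elements seen: " + countElements(xml));
    }
}
```

The handler only counts elements; a real application would override further DefaultHandler callbacks such as characters and endElement.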

Part II: Web Application Development

Part II is devoted to tools that are useful when you’re developing Web applications. That’s not to say that these tools don’t have other applications; it’s recognition that they can be particularly useful in developing Web applications.

Chapter 2: Xalan

Xalan is the ASF library for working with XSLT, the eXtensible Stylesheet Language (XSL) Transformations language. XSLT is an XML grammar that allows you to specify how an XML document using one vocabulary can be converted into an XML document that uses a different vocabulary. These transforms let you add or subtract information from the document, as well as rearrange the structure of the document. Xalan implements the XSLT 1.0, XPath 1.0, and JAXP 1.2 APIs. You can use XSLT to convert XML of various sorts into HTML, WML, or other display languages based on XML. You can also use it to convert data files from one vocabulary to another.
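As a rough illustration of converting a document from one vocabulary to another, the following sketch runs a tiny XSLT 1.0 stylesheet through the JAXP transformation API. The class name XsltSketch, the person and contact vocabularies, and the stylesheet itself are all hypothetical and chosen only for this example; JAXP uses whichever XSLT processor is configured, which may or may not be Xalan in a given environment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; the vocabularies below are invented for illustration.
final class XsltSketch {

    // Maps a <person> document onto a <contact> document.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/person'>"
      + "<contact><xsl:value-of select='name'/></contact>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies STYLESHEET to the given XML text and returns the serialized result. */
    static String transform(String xml) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        // Leave out the XML declaration so the result is just the element.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<person><name>Ted</name></person>"));
    }
}
```

When many documents pass through the same stylesheet, creating a javax.xml.transform.Templates object once with newTemplates and asking it for transformers avoids reparsing the stylesheet each time.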


The JAXP API is the primary method for interacting with Xalan, so we’ll look at how to use it to perform XSLT transformations. Xalan can operate in two modes, interpretive and compiled. In interpretive mode, Xalan interprets the XSLT stylesheet as it transforms the document. In compiled mode, Xalan uses an XSLT-to-Java compiler called XSLTC to compile the stylesheet into a Java class that can then be executed directly. If you’re passing a large number of documents through the same stylesheet, using XSLTC can improve the performance of your application substantially. We’ll also examine the Xalan mechanism for implementing extensions to XSLT using libraries of functions and via custom Java code. The chapter ends by showing you how to implement a Java servlet filter that can be used to add XSLT capabilities to many servlet applications.

Chapter 3: FOP

The XSL recommendation comes in two parts: XSLT and XSL, which is sometimes referred to as XSL-FO (XSL formatting objects). Whereas XSLT is concerned with transforming one flavor of XML into another, XSL-FO takes XML in the form of XSL formatting objects and formats it for output to non-XML formats such as PDF, PCL, and PostScript. The Apache FOP project is an implementation of an XSL processor. It’s capable of rendering XSL formatting objects to PDF, PCL, PostScript, SVG, text, and other file formats.

We’ll look at the FOP SAX- and DOM-based APIs to see how to embed FOP in a Java program. FOP provides some command-line tools that are useful for debugging and testing, as well as some Ant tasks. At the end of the chapter, we’ll discuss how to implement a Java servlet filter that uses FOP as an output stage. We’ll also see how to chain that filter together with the Xalan XSLT filter to obtain a complete system for rendering XML to non-XML formats.

Chapter 4: Batik

People are finding applications for XML in a wide and varied number of problem domains. The Scalable Vector Graphics (SVG) recommendation specifies an XML vocabulary for describing vector graphics operations. SVG allows you to describe dynamic as well as static images, so you can use it to perform animation and build user interfaces. There are two methods for describing SVG animation: a scripting-based method based on an SVG-enabled version of the DOM, and a more declarative method based on elements taken from the Synchronized Multimedia Integration Language (SMIL). SVG provides a way to integrate visual content into XML data. The Batik project is Apache’s library for working with SVG.

The Batik toolkit contains both end-user and developer components. We’ll look at the Batik tools for generating images from SVG files, an SVG browser based on Batik components, and tools that Batik supplies for generating fonts and pretty-printing SVG files. Batik provides a library of classes you can integrate into your application so you can generate SVG documents by using the Java2D drawing API. There is also a Swing-based component for displaying SVG documents; you can add it to your Swing applications to obtain SVG display capabilities. We’ll also examine how Batik integrates with scripting languages such as ECMAScript/JavaScript and JPython.

Chapter 5: Cocoon Concepts

Cocoon is a sophisticated Web publishing framework based on XML. It’s different from most of the other tools we’ll discuss because no standard defines how it works. It’s purely the invention of the Cocoon committers, and it makes heavy use of the rest of the Apache XML libraries.


Cocoon is based on the notion of XML processing pipelines. These pipelines are used to process request URIs and generate results of varying types, including HTML, XHTML, WML, PDF, and others. Cocoon defines eXtensible Server Pages, an XML-compliant equivalent to JavaServer Pages, which you can use to generate XML data for Cocoon pipelines. A new feature in Cocoon is FlowScript, which provides a compact mechanism for capturing complex Web page interactions such as multipage forms.

Cocoon is a big topic, so it’s covered in two chapters. This chapter focuses on Cocoon concepts and terminology, and we look at how to build pipelines.

Chapter 6: Cocoon Development

This chapter focuses on practical applications. We’ll discuss installing and setting up Cocoon. We’ll approach database access from a number of different angles because Cocoon provides a diversity of database access methods. And we’ll show you how to tie all these components together in a simple database-backed Web application.

Chapter 7: Xindice

Once you start marking up data as XML, you’ll discover that you quickly accumulate a lot of information. The question then becomes how to manage all this information. The Xindice project is an attempt to answer that question. Instead of trying to store your XML data in files or as columns or rows in a relational database, Xindice gives you the option of storing your data in a database that has been designed from the beginning to deal with XML. Xindice lets you create collections of documents, index them, and query them using XPath. There are no widely adopted standards in this space, although Xindice does support an API called XML:DB that has been developed by a few native XML database vendors.

We’ll look at setting up a Xindice database and how you go about creating, retrieving, updating, and deleting XML from it. To do this, we’ll discuss the Xindice command-line tools, as well as Xindice’s implementation of the XML:DB APIs. We’ll demonstrate how to integrate Xindice into an XML application by using Xindice as the source of XML data to be processed by the servlet filters developed in earlier chapters.

Part III: Back-End Application Development

Part III discusses three tools that are involved at the back-end plumbing level. You can use them in Web applications and many other back-end application situations.

Chapter 8: XML-RPC

The XML-RPC protocol uses XML to mark up the arguments of a remote procedure call. The resulting XML document is transported to the server using HTTP. This protocol is fairly easy to implement and use, which makes it a popular choice for integrating applications. The main details you need to be aware of are how the XML-RPC library maps types in the host programming language (Java, in our case) onto the set of types defined by XML-RPC. The library, called Apache XML-RPC, is part of the Web Services project at http://ws.apache.org.
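To make the idea of marking up call arguments as XML concrete, here is a hand-rolled sketch of the request body for a call that takes integer arguments. It deliberately does not use the Apache XML-RPC library; the class name XmlRpcRequestSketch, the helper encodeCall, and the method name math.add are hypothetical, and the sketch assumes the method name contains no characters that need XML escaping. A Java int maps onto the XML-RPC int type, a 32-bit signed integer.

```java
// Hypothetical example class; it builds the XML by hand purely for illustration.
final class XmlRpcRequestSketch {

    /**
     * Encodes a call to methodName with int arguments as an XML-RPC methodCall document.
     * Assumes methodName needs no XML escaping.
     */
    static String encodeCall(String methodName, int... args) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\"?>");
        xml.append("<methodCall>");
        xml.append("<methodName>").append(methodName).append("</methodName>");
        xml.append("<params>");
        for (int arg : args) {
            // Each argument becomes one <param> wrapping a typed <value>.
            xml.append("<param><value><int>").append(arg).append("</int></value></param>");
        }
        xml.append("</params>");
        xml.append("</methodCall>");
        return xml.toString();
    }

    public static void main(String[] args) {
        // The resulting document would be sent as the body of an HTTP POST.
        System.out.println(encodeCall("math.add", 2, 3));
    }
}
```

In practice a client library performs this encoding, along with the HTTP exchange and the decoding of the methodResponse document, so application code rarely builds these documents directly.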


This chapter shows how to build XML-RPC clients and servers using the Apache XML-RPC library. We’ll also talk about how to perform basic security tasks such as authentication and encryption using SSL. The chapter concludes with an example of how to make XML-RPC calls a bit more type safe.

Chapter 9: Axis

Some of the creators of XML-RPC went on to create a more sophisticated version of an XML over HTTP protocol called the Simple Object Access Protocol (SOAP). The SOAP protocol is one of the cornerstones of the Web services technology stack. Web services is an attempt to construct systems using an architecture based on loose coupling. Using XML as a marshalling format and HTTP as a transfer protocol is one way to obtain loose coupling. Another cornerstone of the Web services technology stack is the Web Services Description Language (WSDL). This XML vocabulary is used to describe a Web service. A WSDL description can then be used to generate the code that implements a Web service. Axis is a part of the Apache Web services project and is the ASF implementation of both SOAP and WSDL. Axis provides an implementation of the Java API for XML-based RPC (JAX-RPC).

We’ll look at how you write both service providers and service requestors using the Axis/JAX-RPC APIs, and we’ll discuss the Axis deployment descriptor format and how it controls the deployment of Web services when you use Axis. Axis includes powerful command-line tools for taking a WSDL file and generating Java code that implements the service being described in the WSDL file. It also includes tools for processing an existing Java class and creating a WSDL file based on it. In addition, Axis provides a mechanism for creating a message-based service where the Axis runtime does not interpret the contents of the SOAP message, allowing your application to have total control over how the XML message is processed.

Chapter 10: XML-Security

XML is increasingly being used to mark up data that has various kinds of security requirements. Either the data needs to have a guarantee of authenticity, or the data needs to be private or secret. The W3C has developed two recommendations to deal with these issues. The XML Signature Syntax and Processing recommendation details how to represent digital signature information as XML and how to use digital signatures to sign portions of XML documents. The XML Encryption Syntax and Processing recommendation contains a parallel technology for using encryption.

The Apache XML project hosts the XML-Security project, which provides a Java API for working with digital signatures and encryption. The digital signature functionality is stable and has been available for some time. The encryption functionality is in the alpha stage but is usable. This chapter explains the APIs provided by the library and walks you through signing, verifying, encrypting, and decrypting documents and portions of documents.

The Apache Software Foundation

You may wonder where all this software comes from. How was it developed? Who developed it? Who paid for it? What can you use it for and under what terms? What if you have a problem? In this section, we’ll talk about the Apache Software Foundation and, along the way, answer all these questions.

History

The Apache Software Foundation started out in 1995 as a group of eight Webmasters called the Apache Group. Their purpose was to continue the development of the public domain HTTP server daemon that was developed by Rob McCool at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign (UIUC). McCool had left NCSA in 1994, and no one was developing the NCSA code base. The problem was, the NCSA HTTP server was the most popular Web server in 1995, and many people wanted to keep using it. The group of Webmasters used e-mail to coordinate the changes each of them made to the NCSA HTTP code. The changes were distributed as patches, generated by the UNIX diff command and integrated using the UNIX patch command. All these patches led to the Apache name: “a patchy server.” Since 1996, the Apache HTTP server has been the number-one Web server on the Internet.

In 1999, the members of the Apache Group formed the Apache Software Foundation (ASF) in order to provide organizational, legal, and financial support for the Apache HTTP Server. The ASF is a membership-based, not-for-profit corporation formed for the following reasons:

❑ Provide a foundation for open, collaborative software development projects by supplying hardware, communication, and business infrastructure.
❑ Create an independent legal entity to which companies and individuals can donate resources and be assured that those resources will be used for the public benefit.
❑ Provide a means for individual volunteers to be sheltered from legal suits directed at the foundation’s projects.
❑ Protect the Apache brand as applied to its software products from being abused by other organizations.

New Projects

The ASF initiated the Apache XML project in the fall of 1999 with code donations from a variety of sources including IBM, Datachannel, and several individuals. These projects formed the core of the new project. Since that time, the Apache XML project has added several new projects as the XML area has continued to grow. Early in the spring of 2003, some of the projects in the XML project were spun off into a Web Services project, and the Cocoon project became a top-level project as well. This was done to facilitate the foundation’s ability to oversee these projects, and so that the projects could build more cohesive communities around their areas of interest.

The projects are frequently referred to by the URLs of their Web sites. The XML project is at http://xml.apache.org, the Web Services project is at http://ws.apache.org, and Cocoon is at http://cocoon.apache.org. The ASF has a number of other projects, including Ant, APR, Avalon, Commons, DB, Incubator, Jakarta, James, Maven, PHP, and TCL.

What does this mean to you? The software you download from the ASF is owned by the ASF and is licensed to you, as is all software. As you’ll see in the next section, the terms of the license aren’t restrictive. Development of the software is the responsibility of the ASF through individual contributors.

Licensing

The ASF differs from some of the other large open source efforts in a number of ways. One of them is the area of licensing. You may have heard that open source software is viral, meaning that if you use it in your application, you must distribute your application as open source. Whether this is true depends on the license you’re using. The GNU General Public License (GPL) is the primary license that has this property. The GNU Lesser General Public License (LGPL) may also have this property when applied to Java code. This issue is still being resolved at the time of this writing.

The Apache Software Foundation has its own license called the Apache Software License. For your convenience, the entire content of the license follows.

/* ======
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 2003 The Apache Software Foundation. All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above
 *    copyright notice, this list of conditions and the following
 *    disclaimer in the documentation and/or other materials provided
 *    with the distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 *    if any, must include the following acknowledgment:
 *       "This product includes software developed by the
 *        Apache Software Foundation (http://www.apache.org/)."
 *    Alternately, this acknowledgment may appear in the software
 *    itself, if and wherever such third-party acknowledgments
 *    normally appear.
 *
 * 4. The names "Apache" and "Apache Software Foundation" must
 *    not be used to endorse or promote products derived from this
 *    software without prior written permission. For written
 *    permission, please contact [email protected].
 *
 * 5. Products derived from this software may not be called "Apache",
 *    nor may "Apache" appear in their name, without prior written
 *    permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ======
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation. For more
 * information on the Apache Software Foundation, please see
 * .
 *
 */

There are six items you need to be aware of:

❑ You’re free to redistribute the source code and/or binaries of an Apache-licensed piece of software. This is true even if you make modifications to the source code. You aren’t required to make your product open source. You also aren’t required to give out the source code, but you’re allowed to do so if you choose. There are some conditions, and they basically come down to this: You must credit the fact that you used Apache-licensed code. So, if you redistribute the source code to an Apache project, you must include the license file (which appears at the top of every source-code file).

❑ If you redistribute binaries (including binaries you write that include the Apache code), then you need to distribute a copy of the license file somewhere in your distribution.

❑ If you make a distribution (including building software that uses the Apache code), then you need to include the clause “This product includes software developed by the Apache Software Foundation (www.apache.org/)” somewhere in the end-user documentation, in the software, or wherever you acknowledge the use of third-party software.

❑ You can’t use the name Apache or Apache Software Foundation to endorse or promote your product unless you get written permission from the ASF.

❑ You can’t use Apache in the name of your software product unless you get written permission from the ASF.

❑ There is no warranty for the software.

Under the terms of the Apache Software License, it’s fine for you to build a product that uses one or more pieces of Apache software and sell the product in binary form. You don’t have to distribute the source code for your product, and you don’t have to make it open source. You don’t have to pay the ASF, although they would never turn down your donation. As long as you give the ASF a little credit and don’t try to misuse the Apache name, you should have no problems. A number of commercial Java products do this on a regular basis (IBM’s WebSphere application server comes to mind as an example).

Community

Another way the ASF is different from other open source projects is in its focus on community-developed software. For an ASF project, the health and diversity of the development community is as important as the technical quality of the project code. The knowledge and expertise about the code resides in the community of developers.

A good way to keep abreast of developments in the broader ASF community is via the Apache Newsletter at www.apache.org/newsletter/index.html.

The ASF is a virtual organization; it has no central meeting place, so everything happens via the Internet. An Apache project uses several tools to create a virtual meeting place:

❑ Mailing lists: The most important tool is mailing lists. Usually a project has two or three mailing lists. One mailing list is for users to ask questions, get help, and help each other. Another list is used by the people who are developing the software for the project. This mailing list is public and open to anyone who is interested. There may also be a third mailing list that records all the changes made via the source-code control system, CVS. Sometimes the developer list and the CVS list are the same. The XML and Web Services projects each have a project-wide mailing list called general. This cross-project list facilitates interaction between the various subprojects and is the public working area for the project management committee (PMC).

❑ CVS source-code control system: CVS is an open-source version-control system that allows developers to work in parallel without locking files. If two developers change the same file, then a conflict occurs, and they have to merge their changes. The importance of CVS is obvious. It’s where the code lives, and because people are there for the code, CVS is essential.

❑ IRC channel: Sometimes a project has an IRC channel where people can gather and discuss issues interactively. This tool can be useful but can also be a barrier to community development unless the IRC session is logged and the logs are posted to the mailing lists.

❑ Wiki: The ASF also hosts a wiki (a collaboratively edited Website) at http://nagoya.apache.org/wiki/apachewiki.cgi. It’s useful for working on documents in a collaborative manner. Mailing the final document or snapshots of the document to the appropriate mailing list is encouraged.

Roles

There are different categories of involvement with an Apache project, all of them voluntary. The first and most important category includes people who are using the software. Without the users, there is no reason for the project to exist. They use the software, make requests for new features, report bugs, and let the developers know where the software is hard to use or understand. These users are also doing the marketing and sales for ASF projects. If someone has a good experience using a piece of Apache software, they are likely to let their friends and colleagues know about it. So, although most users don’t contribute code, they supply lots of information that shapes the way the code is written.

Some users do a bit more: They send changes to the documentation or to the software itself. Typically these changes are sent to one of the mailing lists as patches (generated using the diff command) to the existing documentation or software, although occasionally someone sends an entire file. One of the committers (explained next) examines the patches and decides whether the changes should be incorporated. If the changes should be incorporated (or modified and then incorporated), the committer makes the changes and commits them to the CVS repository. A contributor can contribute in other ways as well, such as by helping resolve disputes or making significant suggestions for features.

Contributors who consistently make valuable contributions may become committers. The term committer comes from the CVS commit command used to make changes to the CVS repository. The committers form the core development team for the project. Because they have been recognized as doing significant amounts of work, they decide on the direction of the project. In order to become a committer, a person must be nominated by an existing committer. A contributor can ask to be nominated, but a committer must be willing to propose that the contributor be given commit rights. All the committers for the project vote on whether the person should be granted commit rights. If the person receives three +1 votes and no -1 votes, then they are given commit privileges.

Voting

Let’s change topics for a moment and talk about voting. All major decisions for an Apache project are made by voting. Voting is typically used in three areas: major code modifications, project releases, and procedural matters. However, voting is so much a part of the way Apache projects work that it often happens spontaneously as a way of expressing agreement or disagreement on an issue. Depending on the project, a good rule of thumb is that a vote lasts for 72 hours in order to allow people to participate; this is especially important because the committers may be distributed around the world.

Votes are stated using numbers. Voting +1 means you’re in favor, voting -1 means you aren’t in favor, and voting 0 means you’re neutral. In addition, +0 and -0 votes indicate that you’re leaning, but your inclination isn’t strong enough to be in favor or against.

Code modification votes work like this. Someone makes a proposal for a code change they would like to make and requests a vote (people may also vote spontaneously to show that they believe a vote is necessary). At that point, the proposal needs three +1 votes and no -1 votes. In this setting, a -1 vote is a veto. Vetoes should be exercised reasonably, and the person exercising the veto must give a valid technical reason for it. It’s possible to get someone to retract their veto after addressing their concerns.

Code release votes are typically called by the person acting as the release manager for that release. Typically, the project develops a release plan and votes on it using procedural style voting. A twist in the voting on release plans is that +1 is assumed to mean the voter will help make the release happen, whereas +0 and -0 indicate that the voter will not help or hinder the release process. Once all the items in the release plan have been completed, the release manager calls for a vote to make a release. Release votes can’t be vetoed, but they do require a minimum of three +1 votes to be approved.

Procedural votes on issues such as developing release plans are done via a simple majority. There must be more +1s than -1s, and there are no vetoes.
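The three voting styles described above can be sketched as a small tally. This is only an illustration, not an ASF tool: the class name VoteTally and its methods are hypothetical, votes are encoded as the integers +1, 0, and -1 (leaning +0 and -0 votes are recorded as 0, which is an assumption), and "three +1 votes" is read as "at least three." The sketch assumes Java 9 or later for List.of.

```java
import java.util.List;

/** Illustrative tally of Apache-style votes; all names here are hypothetical. */
final class VoteTally {

    /** Counts the votes equal to the given value (+1 or -1). */
    private static long count(List<Integer> votes, int value) {
        return votes.stream().filter(v -> v == value).count();
    }

    /** Code modification: at least three +1 votes and no -1 (a -1 is a veto). */
    static boolean codeModificationPasses(List<Integer> votes) {
        return count(votes, 1) >= 3 && count(votes, -1) == 0;
    }

    /** Release: at least three +1 votes; a -1 vote cannot veto a release. */
    static boolean releasePasses(List<Integer> votes) {
        return count(votes, 1) >= 3;
    }

    /** Procedural: simple majority, more +1s than -1s, and no vetoes. */
    static boolean proceduralPasses(List<Integer> votes) {
        return count(votes, 1) > count(votes, -1);
    }

    public static void main(String[] args) {
        // Three in favor, one neutral or leaning, one against.
        List<Integer> votes = List.of(1, 1, 1, 0, -1);
        System.out.println("code modification: " + codeModificationPasses(votes)); // false (vetoed)
        System.out.println("release: " + releasePasses(votes));                    // true
        System.out.println("procedural: " + proceduralPasses(votes));              // true
    }
}
```

The same set of votes gives different outcomes depending on the kind of vote, which is the point of the distinction: a single -1 blocks a code change but not a release or a procedural decision.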

Another voting style can be used: lazy consensus. In such a vote, someone says something like “I’m going to modify the code to do this instead of that unless someone objects within three days.” If no one objects, then the change is made. This process can be used effectively for smaller code changes that may not be important enough to vote on. If another committer believes the change is important enough for a vote, then a vote can be taken. Lazy consensus is mostly used for code modification votes, although it can be used for minor procedural matters. Another name for lazy consensus is silence implies consent.

PMCs

All the top-level ASF projects have a project management committee (PMC). The top-level projects are those that have their own Websites. Of the projects we’ll discuss in this book, the XML project (http://xml.apache.org), the Web Services project (http://ws.apache.org) and Cocoon (http://cocoon.apache.org) are all top-level projects. Xerces, Xalan, FOP, Batik, Xindice, and XML-Security are all subprojects of the XML project, and XML-RPC and Axis are subprojects of the Web Services project. You may hear the term umbrella PMC used to describe the XML and Web Services PMCs (as well as Jakarta) because they form an umbrella over a number of (sub)projects. The subprojects operate as projects in their own right, but many of them don’t want to deal with having their own PMC.

The role of the PMCs is to ensure that the ASF guidelines on voting and culture are followed, to ensure that projects continue to be developed in a reasonable fashion, to help resolve conflicts, and to help take care of administrative or legal issues that might arise. The PMCs don’t determine the direction in which the software is developed. The members of the PMCs are drawn from the committers of the project or subprojects and are voted on by the existing members of the particular PMC.

The PMC for a particular project does most of its business in the project’s general mailing list ([email protected], [email protected], [email protected]). The PMC also has a private mailing list, pmc@, which is used only for sensitive matters.

New Projects

The last issue we’ll address is the question of how new ASF projects (or subprojects) are created. It’s fairly common for groups of developers or a corporation to approach the ASF with the desire to start a project around some piece of code that has been developed.

Remember that the most important criterion the ASF uses in evaluating the proposed project is the health of its development community. After much experience, the ASF has learned that this is the key factor for success. If the project is a closed-source project that wants to become open source, as is often the case with a code base that has come from a company, then it’s very important to have a diversity of committers to the project. If the code base has been developed by a single person, there is a similar concern over the size of the community around the project. Neither of these issues is insurmountable, but if you want your project to be accepted by the ASF, you should be aware that these are among the most important issues.

New project proposals are sent to [email protected]. The Incubator project is designed to help new projects learn to function within the ASF framework. This includes following the voting guidelines, learning to work in an open community development style, and finding necessary resources and infrastructure within the ASF. The ultimate decision of whether a project is accepted for incubation is up to the ASF board or, if the new project would fall under the domain of one of the umbrella PMCs, the PMC for the appropriate project. If the board or the PMC decides to accept the project, then an ASF member is asked to help shepherd the new project through the incubation process. The project spends some time in incubation, learning to work in the Apache style, before leaving the incubator. The decision on whether a project is ready to leave incubation is a joint one, involving the members of the project, the ASF board or relevant PMC, and the Incubator PMC.

W3C Process

All the projects described in this book are under active development. A number of them implement specifications developed by the W3C. In order to understand the development status of a project that is implementing a W3C specification, it’s useful to understand the W3C process for developing specifications.

The documents that many people refer to as standards are actually called Recommendations by the W3C. Recommendation is the highest status a specification can reach within the W3C process. In practice, this distinction doesn’t mean much, because most people treat a W3C Recommendation as a standard. The following figure illustrates the steps a specification goes through on its way to becoming a Recommendation.

[Figure: stages of a W3C specification, in order: Working Draft (WD), Last Call Working Draft, Candidate Recommendation (CR), Proposed Recommendation (PR), Recommendation (REC)]
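The order of the stages in the figure can also be written down as a small Java enum, which may be handy if you track the status of the specifications a project implements. This is a sketch only: the type name SpecStage and its methods are hypothetical, and it models nothing beyond the forward order shown in the figure.

```java
import java.util.Optional;

/** Stages from the figure, in order; the type and method names are hypothetical. */
enum SpecStage {
    WORKING_DRAFT,
    LAST_CALL_WORKING_DRAFT,
    CANDIDATE_RECOMMENDATION,
    PROPOSED_RECOMMENDATION,
    RECOMMENDATION;

    /** Returns the following stage, or empty once the spec is a Recommendation. */
    Optional<SpecStage> next() {
        int i = ordinal() + 1;
        return i < values().length ? Optional.of(values()[i]) : Optional.empty();
    }

    /** True only for Recommendation, the highest status in the process. */
    boolean isRecommendation() {
        return this == RECOMMENDATION;
    }
}
```

For example, SpecStage.CANDIDATE_RECOMMENDATION.next() yields PROPOSED_RECOMMENDATION, and calling next() on RECOMMENDATION yields an empty Optional.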