XML and Related Technologies certification prep, Part 5: XML testing and tuning
Discover tools and hints for working with XML

Skill Level: Intermediate

Louis E Mauget ([email protected])
Senior Consultant
Number Six Software, Inc.

24 Oct 2006

developerWorks® ibm.com/developerWorks
© Copyright IBM Corporation 1994, 2008. All rights reserved.

This tutorial on XML testing and tuning is the final tutorial in a series that helps you prepare for the IBM certification Test 142, XML and Related Technologies. This tutorial provides tips and hints for how to choose an appropriate XML technology and optimize transformations. It wraps up with coverage of common tools you can use in testing XML designs.

Section 1. Before you start

In this section, you'll find out what to expect from this tutorial and how to get the most out of it.

About this series

This series of five tutorials helps you prepare to take the IBM certification Test 142, XML and Related Technologies, to attain the IBM Certified Solution Developer - XML and Related Technologies certification. This certification identifies an intermediate-level developer who designs and implements applications that make use of XML and related technologies such as XML Schema, Extensible Stylesheet Language Transformation (XSLT), and XPath. This developer has a strong understanding of XML fundamentals; has knowledge of XML concepts and related technologies; understands how data relates to XML, in particular with issues associated with information modeling, XML processing, XML rendering, and Web services; has a thorough knowledge of core XML-related World Wide Web Consortium (W3C) recommendations; and is familiar with well-known best practices.

About this tutorial

This tutorial is for programmers who have a basic understanding of XML and whose skills and experience are at a beginning-to-intermediate level. You should have a general familiarity with defining, validating, and reading XML.
The standardized nature of XML has given rise to a number of cross-platform, cross-language parsers and derivative technologies. Parts 1 through 4 of this series discussed applied aspects of XML and its common related technologies (see Resources). To wrap up the series, this tutorial presents a number of rationales and hints for choosing appropriate technologies, explains how simple choices affect performance, and demonstrates simple examples of how to use common tools to test XML designs.

Objectives

After completing this tutorial, you will know how to:
• Choose an appropriate XML technology
• Optimize a transformation
• Test an application of XML

Prerequisites

This tutorial is for developers who have a background in programming or scripting and who understand basic computer-science models and data structures. You should be familiar with the following XML-related, computer-science concepts: tree traversal, recursion, and reuse of data. You should be familiar with Internet standards and concepts, such as Web browser, client-server, documenting, formatting, e-commerce, and Web applications. Experience in designing and implementing Java™-based computer applications and working with relational databases is also recommended.

System requirements

This tutorial's testing and demonstration tools -- Internet Explorer® 6.0, Mozilla Firefox 1.5, Altova XMLSpy Home Edition, and IBM® Rational® Application Developer for WebSphere Software V6.0 -- are all either free, bundled with Microsoft® Windows®, or available as time-limited free evaluation copies. Download them from the links provided in Resources.
You might also find the following tools useful:
• FireBug: A Firefox browser Document Object Model (DOM) and script extension
• XMLBuddy: An XML editor plug-in for the Eclipse integrated development environment (IDE)

Section 2. Choosing an appropriate XML technology

The following sections discuss choices and trade-offs to consider when choosing an XML technology.

W3CDOM

DOM is a representation of an XML or HTML document as a tree data structure. A modern DOM parser includes an API and conforms to a series of standard specifications from the W3C. Such a parser is known as a W3CDOM parser. It is independent of computer language and platform. Older DOM parsers were vendor-specific and therefore not as portable as today's W3CDOM parsers. When this tutorial refers to a DOM parser or a DOM, it's referring to a W3CDOM. For cross-platform language independence, it is recommended that you use a standard W3CDOM parser when you work with a DOM.

CRUD

To create a DOM tree, you can use a parser to recognize an XML stream, create nodes and attributes programmatically, or combine both approaches. A DOM parser supports full Create, Read, Update, and Delete (CRUD) functionality for DOM elements. If you use the DOM API, you don't deal with angle brackets and XML syntax when you create or update an XML document. Creating XML programmatically from string data is generally messy and error-prone. You can also use XSLT to create an XML document, but it requires a base document. I recommend that you use a DOM parser API to create or update XML documents when you are not transforming a base XML document into a new document.

Push versus pull

Simple API for XML (SAX) is an XML serial stream parser. A SAX parser application uses a data push model, meaning that the parser calls the application when a desired element or attribute appears in the XML input stream. The application registers a callback for each desired SAX event.
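The two ideas above, building a document through the DOM API and receiving pushed SAX events, can be sketched in a few lines. This is a minimal illustration rather than code from the tutorial: it assumes Python's standard xml.dom.minidom and xml.sax modules, and the orders/order/total element names and the OrderIdHandler class are hypothetical.

```python
import xml.sax
from xml.dom.minidom import getDOMImplementation


def build_order_document():
    """Create a small document through the DOM API, with no hand-written markup."""
    doc = getDOMImplementation().createDocument(None, "orders", None)
    root = doc.documentElement
    # Hypothetical sample data, used only for illustration.
    for order_id, total in (("1001", "19.99"), ("1002", "5.00")):
        order = doc.createElement("order")
        order.setAttribute("id", order_id)
        total_el = doc.createElement("total")
        total_el.appendChild(doc.createTextNode(total))
        order.appendChild(total_el)
        root.appendChild(order)
    return doc


class OrderIdHandler(xml.sax.ContentHandler):
    """SAX handler: the parser pushes events, and only <order> is of interest."""

    def __init__(self):
        super().__init__()
        self.order_ids = []

    def startElement(self, name, attrs):
        # Called by the parser for every start tag; other elements are ignored.
        if name == "order":
            self.order_ids.append(attrs.getValue("id"))


doc = build_order_document()

# DOM update: reach into the tree and change the first total in place.
doc.getElementsByTagName("total")[0].firstChild.data = "21.99"

# Serialize the DOM tree, then stream the bytes through a SAX parser.
xml_bytes = doc.toxml(encoding="utf-8")
handler = OrderIdHandler()
xml.sax.parseString(xml_bytes, handler)
print(handler.order_ids)  # prints ['1001', '1002']
```

The DOM half never concatenates angle brackets, and the SAX half keeps only the attribute values it asked for instead of holding a tree in memory.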
Compare this push model to the DOM pull model, where the application pulls desired nodes from a DOM tree. A SAX-oriented application can handle a huge XML document, limited only by what it does with the data pushed to it. The application registers callback functions only for nodes or attributes of interest. The other elements are ignored during the parse operation, so they don't cause a memory-resource drain.

Finite main memory limits a DOM-oriented application, because the DOM parser transforms the entire document into an in-memory tree. On the other hand, a DOM-oriented application can access XML DOM nodes in any order. Additionally, it can add, modify, or delete those nodes, as mentioned earlier. A SAX application cannot directly modify nodes or access them outside of their natural document ordering.

Dual APIs

DOM parsers, such as those of the Apache Xerces family, usually create a DOM tree through an internal SAX parser. These parsers can expose both the DOM API and a SAX API. You should choose a SAX parser for large read-only documents. If you need random access to DOM nodes, use a DOM parser.

Narrative versus data

XML is a standardized metalanguage used to create custom document grammars. It has its roots in Standard Generalized Markup Language (SGML), a complex document markup language. Table 1 lists a few characteristics of narrative XML documents.

Table 1. Narrative XML

Characteristics of narrative XML
• Raw source is human-readable and not necessarily pretty
• Hierarchical format
• Content hierarchy is extremely flexible
• Documents can be large
• Documents render to human-readable text
• The primary actor is a human being

We ate our own dog food in publishing this series of XML developerWorks tutorials. This tutorial document is a raw XML source document that conforms to a strict narrative XML schema.
You're reading the rendered result, transformed by XSL to HTML for the Web rendition, or to XSL Formatting Objects (XSL-FO) for the PDF version.

The standardized nature of XML attracts programmers who need to manipulate rigidly formatted hierarchical data. Today, the application of XML to describe data is ubiquitous. Table 2 lists some of the characteristics of XML used to describe data.

Table 2. Data XML

Characteristics of data XML
• Raw source is human-readable and not necessarily pretty
• Hierarchical format
• Content hierarchy is extremely rigid
• Documents are often small but can be large
• Documents might not render to human-readable text
• The primary actor is a computational process

Table 3 summarizes the differences between Table 1 and Table 2.

Table 3. Narrative versus data XML

Document type    Description
Narrative        Flexible, large, acted upon by human beings
Data             Rigid, small, acted upon by a computational process

Consider these distinctions when you design a document XML schema or Document Type Definition (DTD). In fact, your first decision is whether to constrain the document by XML schema or by DTD.

XML schema versus DTD

A schema enables fine-grained control over the document format -- something not normally associated with a narrative form, but often important to a data document. Suppose a document contains a zip code element, and the schema constrains that element to all numerals with a length of exactly five or nine digits. You might want to modify the schema to validate postal codes of other countries. For example, Canada uses a postal code that has six alphanumeric characters. By contrast, a document that contains a paragraph would not require any length or type constraint for a paragraph element. It's clear that narrative documents tend to be looser than database-like documents.
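To make the zip code constraint concrete, here is a rough sketch of the rule written as regular expressions, which is roughly what a pattern restriction in an XML schema would express. It is an illustration in Python, not schema syntax from the tutorial; the function names are hypothetical, and the patterns assume the simplest reading of the text (digits only, exactly five or nine of them; six alphanumeric characters for the loosened rule).

```python
import re

# Assumed reading of the rule in the text: numerals only, exactly 5 or 9 of them.
US_ZIP = re.compile(r"\d{5}|\d{9}")

# Hypothetical loosened rule for a six-character alphanumeric postal code.
SIX_ALNUM = re.compile(r"[A-Za-z0-9]{6}")


def is_us_zip(value):
    """Return True when the whole value satisfies the strict zip code rule."""
    return US_ZIP.fullmatch(value) is not None


def is_postal_code(value):
    """Return True for a strict zip code or a six-character alphanumeric code."""
    return is_us_zip(value) or SIX_ALNUM.fullmatch(value) is not None


for candidate in ("12345", "123456789", "1234", "K1A0B1"):
    print(candidate, is_us_zip(candidate), is_postal_code(candidate))
```

A DTD cannot express this kind of length and character-class restriction on element content, which is one reason data documents tend toward schemas.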
Narrative documents usually don't need the iron discipline possible through an XML schema. Thus, DTDs, not schemas, are often the choice for narrative document grammars. Moreover, people often exchange these documents anonymously, so a standard grammar is important. A great number of standard DTDs are available. The HTML DTDs are examples of narrative DTDs. Many DTDs are specialized toward professional disciplines.