Starting Tips for Using pureXML in DB2 on z/OS
With DB2 9 it became possible to store, retrieve, and query XML documents in DB2 for z/OS using the pureXML functionality. Most DB2 DBAs are by now familiar with the concepts of this hybrid technology, but it still takes some effort to acquire the skills needed to implement it on site. This presentation focuses on a practical case the speaker used to acquire these skills. It shares experiences and scenarios that everyone can try out and experiment with in their own environment. The objectives of this presentation are:
• Understand the benefits of storing XML data in pureXML format instead of relational format
• Learn how to create and populate an XML database with real data and realistic volumes to experiment with
• Learn how to use SQL and XPath to query this XML data, and how to handle namespaces
• Learn how to implement XML schemas
• Understand how XML indexes are used to improve query performance

XML, the Extensible Markup Language, serves as a flexible and self-describing data format for data exchange, web services, and service-oriented architectures. XML is also a hierarchical data model that is inherently different from the relational model. Relational data processing is based on rigorous, predefined schemas that allow only limited flexibility. XML, by contrast, is well suited to represent data with variable or evolving schemas. XML is also commonly used as a data format for semi-structured and unstructured data. Depending on the performance and flexibility requirements of a particular application, you will find that in some cases XML is a better choice than a relational schema. In other cases, relational data has advantages over XML. In many scenarios a hybrid approach, that is, a mix of XML and relational data, is the best solution. DB2 pureXML provides sophisticated capabilities for storing, indexing, querying, updating, and validating XML documents.
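The schema flexibility described above can be tried out offline before touching DB2. The following is a minimal Python sketch that uses the standard library's xml.etree.ElementTree as a stand-in for an XML column. The customer documents, element names, and the `phones` helper are made up for illustration. Two documents with different shapes sit side by side, and the same path expression works on both.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents standing in for rows of one XML column.
# The second customer has an optional <phone> element the first one lacks;
# a fixed relational table would need an extra (nullable) column for it.
docs = [
    "<customer id='1'><name>Ann</name></customer>",
    "<customer id='2'><name>Bob</name><phone type='work'>555-0100</phone></customer>",
]

def phones(xml_text):
    """Return the text of every <phone> child of the root (empty list if none)."""
    root = ET.fromstring(xml_text)
    return [p.text for p in root.findall("phone")]

for d in docs:
    root = ET.fromstring(d)
    # get() reads an attribute; findtext() returns the first matching child's text.
    print(root.get("id"), root.findtext("name"), phones(d))
```

Running it prints each customer's id and name, together with an empty or one-element phone list. No schema change was needed to accommodate the extra element.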
The pureXML technology and its native XML storage format provide significantly higher performance and flexibility than alternative storage options for XML data, such as LOBs or shredding. DB2 pureXML also enables seamless integration of XML and relational data. (From DB2 pureXML Cookbook, p. 13)

One reason to still store an XML document as a LOB is the ability to retrieve it afterwards byte-for-byte identical to the original document, for example for compliance or auditing reasons. If you store an XML document in the pureXML XML data type, the XML parser may remove insignificant whitespace. The retrieved document is therefore not guaranteed to match the original byte for byte.

DB2 pureXML has been designed to overcome the problems that are inherent in LOB storage and shredding. The advantages of DB2 pureXML and its native XML storage format include:
• Retaining awareness of the internal structure of the XML data: In contrast to LOB storage, DB2 pureXML stores XML in a parsed tree format that explicitly represents the structure of each XML document. As a result, applications can query and update XML data using XQuery, XPath, and SQL/XML without XML parsing at run time. This is a critical performance benefit. Additionally, indexes can be created on specific nodes.
• Keeping business objects intact: DB2 pureXML stores each XML document as a cohesive unit that belongs to one row in a table, providing a very intuitive storage and processing model. In contrast, XML shredding scatters the values of each XML document over a number of tables. Hence, shredding can result in an unwieldy relational schema that is difficult to understand and inefficient for queries and for the reconstruction of XML documents.
• Schema flexibility: Shredding requires all XML documents to adhere to a single XML Schema that is mapped to relational tables. DB2 pureXML, by contrast, can store documents for variable or evolving schemas in the same XML column. The cost of schema evolution is much lower for DB2 pureXML than for a shredding approach.
• Faster application development: Because DB2 pureXML does not require any schema mapping and uses a single XML column instead of a complex relational schema, prototyping and designing applications can be much simpler with DB2 pureXML than with shredding. (From DB2 pureXML Cookbook, p. 10)

In DB2 pureXML Cookbook, two of IBM's leading experts, Matthias Nicola and Pav Kumar-Chatterjee, provide the single most comprehensive coverage of DB2's pureXML capabilities. The book explains DB2 pureXML in more than 700 practical examples, including 250+ XQuery and SQL/XML queries. It takes the reader from simple introductions all the way to advanced scenarios. The authors have distilled their hands-on experience with many pureXML applications so that you can benefit from best practices, tips and tricks, performance guidelines, and other gems that are not documented elsewhere. The book is invaluable for database administrators and application developers, beginners and DB2 experts alike. The topics are organized by typical user tasks throughout the life cycle of XML database projects. They run from planning, designing, and implementing databases all the way to tuning, problem determination, and application development. The book includes code samples for Java, .NET, COBOL, PL/I, C, PHP, and Perl programmers. The DB2 pureXML Cookbook provides proven recipes rather than a mere reference of ingredients.

In DB2 for z/OS, the installation job DSNTEJ1 creates five tables with XML columns. These tables are in the relational schema DSN8910 and are named PRODUCT, CUSTOMER, PURCHASEORDER, CATALOG, and SUPPLIERS. Only table DSN8910.PRODUCT is populated by the installation jobs. There are several ways to populate the other tables. For example, suppose you have a DB2 for Linux, UNIX, and Windows installation, such as the free DB2 Express-C. You can create the sample database there and select or export the data from it. The data can then be imported or inserted into the z/OS tables using SPUFI or an import job.
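If you go the INSERT route, a small script can turn exported documents into statements to paste into SPUFI. The following Python sketch makes two assumptions that you should check against your own DDL:
• The target is a hypothetical two-column table, MYXMLTAB, with an integer key and an XML column.
• DB2 implicitly parses a character string literal assigned to an XML column.

The script rejects documents that are not well-formed and doubles single quotes, as SQL string literals require.

```python
import xml.etree.ElementTree as ET

def insert_statement(table, key, xml_text):
    """Build one SQL INSERT statement that carries an XML document as a literal.

    The (key, document) column layout is a hypothetical example.
    """
    # Fail early on documents that are not well-formed (raises ET.ParseError).
    ET.fromstring(xml_text)
    # Inside an SQL string literal, a single quote is written as two quotes.
    literal = xml_text.replace("'", "''")
    # The semicolon is SPUFI's default SQL statement terminator.
    return f"INSERT INTO {table} VALUES ({int(key)}, '{literal}');"

stmt = insert_statement(
    "MYXMLTAB",  # hypothetical table name
    1000,
    "<customerinfo Cid='1000'><name>O'Neil</name></customerinfo>",
)
print(stmt)
```

Every quote in the document is doubled, both in the attribute value and in the element text. DB2 then sees the original document once it parses the literal.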
The PDF document “DB2 Version 9.1 for z/OS XML Guide” (SC18-9858) provides the DDL and three INSERT statements with XML data for a table called MYCUSTOMER. You can copy and paste these statements into SPUFI to build a sample table to work with.

An XML document basically consists of elements with zero, one, or more attributes. Each element consists of a <start tag> and an </end tag>, and the tags are enclosed in angle brackets. Elements can have a value or contain other elements. Empty elements can have attributes and can be represented by a single <empty tag/>. Elements can occur multiple times. Attributes always have a value. A well-formed document has a single root element. The order of elements is significant; the order of attributes is not. XML is case sensitive. This sample XML document is very simple in nature: no encoding declaration, no XML version, no namespaces, a limited number of elements and attributes, and so on.

The IBM GSDB sample database is available to use in your own projects and for learning about IBM products such as IBM Data Studio. It contains a rich set of sample data about the fictional Sample Outdoor company and its sales and operations, and it can be downloaded from the web. To set up the sample database on DB2 for z/OS, you run the setup scripts from a workstation and install the database on a cataloged remote DB2 for z/OS subsystem. The database contains one table with an XML column, holding 212 rows of data. Beware that cust-order-details1 and cust_order_details2 are empty elements with attributes, represented by a single empty tag.

The XMLSERIALIZE function converts a value of the XML data type to XML text. The inverse function, XMLPARSE, converts XML text to the XML data type. Internally, a parsed XML document is stored in UTF-8. The difference between AS CLOB and AS BLOB is that in both cases DB2 serializes the XML data in the UTF-8 encoding scheme.
With CLOB, the result is shown as EBCDIC on the 3270 screen because it is converted to the application encoding scheme. With BLOB, no conversion takes place and the resulting string stays in UTF-8.

IBM Data Studio Developer also has an XML document viewer and editor.

In QMF you can export the result of a QMF query, or a table, in XML format by using the DATAFORMAT=XML clause on the EXPORT DATA or EXPORT TABLE command. This format must be used when the data contains XML columns. It can also be used when the data or table to be exported does not contain XML columns. When you export data or tables in XML format, the data goes to the target named in the command: an HFS (UNIX) file, a TSO data set, or a CICS data queue. QMF uses the XML 1.0 specification (fourth edition) when exporting data. QMF uses the z/OS XML parser services and the z/OS Unicode conversion services when processing XML data for export, so these services must be configured and active. Exported XML data in QMF is always in Unicode UTF-8 format. The Unicode character set includes characters from almost all of the living languages of the world. In UTF-8, ASCII and control characters are represented by their usual single-byte ASCII codes, and other characters take two to four bytes. The IBM UTF-8 implementation is defined by code page 1208. UTF-8 stands for “UCS Transformation Format 8”.

To illustrate, we can use a simple result set with two columns and two rows, as shown above.

The header records in the exported XML file contain the XML version, the encoding scheme, and a line that references the style sheet used to format the exported XML document. QMF provides a default style sheet, “qmf_dataset.xslt”, as member DSQ1STSH of the QMF samples data set SDSQSAPn.
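The encoding behaviour described above can be checked with a few lines of Python. It covers UTF-8 inside DB2, conversion for CLOB results, no conversion for BLOB results, and one to four bytes per character in UTF-8. This sketch assumes the application encoding scheme is EBCDIC code page 037 (Python codec `cp037`); your subsystem may use a different CCSID. The `utf8_lengths` helper and the sample element are illustrative only.

```python
def utf8_lengths(chars):
    """Map each character to the number of bytes it occupies in UTF-8."""
    return {c: len(c.encode("utf-8")) for c in chars}

# ASCII letter, accented Latin letter, euro sign, and an emoji (U+1F600).
sample = "A\u00e9\u20ac\U0001F600"
print(list(utf8_lengths(sample).values()))  # [1, 2, 3, 4]

doc = "<city>Köln</city>"
as_blob = doc.encode("utf-8")   # BLOB-like: the UTF-8 bytes are returned unchanged
as_clob = doc.encode("cp037")   # CLOB-like: converted to EBCDIC (CCSID 37 assumed)

print(len(doc), len(as_blob), len(as_clob))  # 17 18 17
print(as_blob[:6].hex(" "))  # 3c 63 69 74 79 3e
print(as_clob[:6].hex(" "))  # 4c 83 89 a3 a8 6e
```

The same six characters "<city>" are different bytes in UTF-8 and in EBCDIC. That is why a BLOB result looks unreadable on a 3270 screen while a CLOB result, after conversion, is readable. The single non-ASCII "ö" takes two bytes in UTF-8 but one byte in EBCDIC.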