Front cover

Database Strategies: Using Informix XPS and DB2 Universal Database

Understanding and exploiting the strengths of Informix XPS and DB2 UDB

Considerations for transitioning data and schemas to DB2 UDB

Working with very large data volumes

Chuck Ballard, Weiren Ding, Carlton Doe, Glen Mules, Rajamani Muralidharan, Santosh Sajip, Nora Sokolof, Andreas Weininger

ibm.com/redbooks

International Technical Support Organization

Database Strategies: Using Informix XPS and DB2 Universal Database

August 2005

SG24-6437-00

Note: Before using this information and the product it supports, read the information in “Notices” on page xiii.

First Edition (August 2005)

This edition applies to Version 8.2 of DB2 Universal Database (UDB) and Version 8.5 of Informix Extended Parallel Server (XPS).

© Copyright International Business Machines Corporation 2005. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Notices ...... xiii
Trademarks ...... xiv

Preface ...... xv
The team that wrote this redbook ...... xvi
Become a published author ...... xix
Comments welcome ...... xix

Chapter 1. Introduction to this redbook ...... 1
1.1 Understanding strategic directions for XPS ...... 2
1.2 Objective of this redbook ...... 3
1.3 Chapter abstracts ...... 4

Chapter 2. XPS and DB2 UDB architectures ...... 7
2.1 High-level product overviews ...... 9
2.1.1 IBM Informix Extended Parallel Server ...... 9
2.1.2 DB2 Data Warehouse Edition ...... 11
2.2 Understanding the architectures ...... 20
2.3 Defining an instance ...... 21
2.3.1 Informix XPS instance architecture ...... 23
2.3.2 DB2 Universal Database instance architecture ...... 25
2.4 Storage architecture ...... 29
2.4.1 Pages ...... 31
2.4.2 Containers and chunks ...... 31
2.4.3 Logical disks ...... 35
2.4.4 Logging ...... 37
2.4.5 Storage architecture summary ...... 41
2.5 Parallelism ...... 42
2.5.1 The process model of XPS ...... 42
2.5.2 The process model of DB2 ...... 44
2.5.3 Intra-node parallelism ...... 47
2.5.4 Inter-node parallelism ...... 47
2.6 Memory management ...... 48
2.6.1 XPS memory model ...... 48
2.6.2 DB2 memory model ...... 51
2.7 Partitioning ...... 54
2.7.1 Fragmentation in XPS ...... 54
2.7.2 Partitioning in DB2 ...... 58
2.8 Terminology ...... 60

Chapter 3. Configuration ...... 63
3.1 XPS and DB2 configuration ...... 64
3.1.1 Knobs (configuration files and tuning parameters) ...... 64
3.1.2 Commands ...... 64
3.1.3 Granularity ...... 64
3.1.4 Database manager ...... 65
3.1.5 Dynamic parameters ...... 65
3.1.6 Cataloging ...... 66
3.1.7 Client access to DB2 instances ...... 66
3.2 Configuration methods ...... 66
3.2.1 DB2 configuration methods ...... 67
3.2.2 Configuration Advisor and the AUTOCONFIGURE command ...... 69
3.3 Configuration files and objects overview ...... 72
3.3.1 Environment variables and the profile registry ...... 72
3.3.2 Setting registry and environment variables ...... 73
3.3.3 DB2 configuration files and objects ...... 74
3.4 Configuring the instance ...... 79
3.4.1 Page size(s) ...... 79
3.4.2 spaces ...... 80
3.4.3 Bufferpools ...... 81
3.4.4 Physical and Logical Logs ...... 82

Chapter 4. Instance and database operations ...... 87
4.1 Instance operation modes ...... 88
4.1.1 Online mode ...... 88
4.1.2 Offline mode ...... 90
4.1.3 Quiescent mode ...... 90
4.1.4 Creating and dropping the instance ...... 91
4.2 Modifying the configuration ...... 91
4.2.1 Working with the DAS ...... 92
4.2.2 Viewing or updating the configuration using Control Center ...... 93
4.2.3 Managing database groups ...... 96
4.2.4 Managing buffer pools ...... 99
4.3 Managing database storage ...... 100
4.3.1 Table spaces and containers ...... 100
4.3.2 Monitoring table space and container storage ...... 103
4.3.3 Transactions and logs ...... 108
4.4 Backup and recovery ...... 112
4.4.1 Recovery types ...... 112
4.4.2 Backup and restore methods ...... 114
4.4.3 Table level restore ...... 121
4.5 High availability ...... 122
4.5.1 Log mirroring ...... 122

4.5.2 Replication ...... 123
4.5.3 Online split mirror and suspended I/O support ...... 123
4.6 Security ...... 124

4.6.1 Authorization and privileges ...... 125
4.6.2 Roles and groups ...... 128
4.6.3 Security levels ...... 129
4.6.4 Client/server security ...... 139
4.6.5 Authentication methods ...... 139

Chapter 5. Data types ...... 141
5.1 Object names ...... 142
5.2 Data type mapping ...... 143
5.3 NULL values ...... 145
5.4 Disk considerations ...... 145
5.5 Character types ...... 146
5.5.1 Truncation ...... 146
5.5.2 NCHAR data type ...... 146
5.5.3 VARCHAR data type ...... 147
5.5.4 TEXT data type ...... 147
5.6 Numerical data types ...... 148
5.6.1 Numerical limits ...... 148
5.7 DECIMAL ...... 149
5.7.1 MONEY data type ...... 149
5.7.2 SERIAL and SERIAL8 ...... 150
5.8 Date and time types ...... 152
5.8.1 DATE data type ...... 152
5.8.2 DATETIME, TIME, and TIMESTAMP data types ...... 152
5.8.3 INTERVAL data type ...... 154
5.9 FLOAT ...... 155
5.10 REAL or SMALLFLOAT ...... 155
5.11 LOB data types ...... 155
5.12 Sequence objects ...... 155
5.13 Other object limits in DB2 ...... 157
5.14 DB2 manuals ...... 158

Chapter 6. Data partitioning and access methods ...... 159
6.1 Benefits of data partitioning ...... 160
6.2 Hash fragmentation ...... 161
6.3 Round robin fragmentation ...... 164
6.4 Expression and range fragmentation ...... 165
6.5 Hybrid fragmentation ...... 166
6.6 Range partitioning using MDC ...... 166
6.6.1 Benefits of MDC ...... 170

6.6.2 Design considerations for MDC tables ...... 171
6.6.3 Operations on MDC tables ...... 172
6.6.4 Space requirement for MDC ...... 172

6.7 Range-clustered tables in DB2 ...... 173
6.8 Roll-in and roll-out of data using UNION ALL views ...... 174
6.8.1 of UNION ALL views ...... 176
6.8.2 Benefits of UNION ALL views ...... 178
6.8.3 Limitations of UNION ALL views ...... 179
6.9 MDC and UNION ALL views for roll-in and roll-out ...... 180
6.10 Indexing strategies ...... 180
6.10.1 Syntax for index creation ...... 180
6.10.2 DB2 index expansions ...... 181
6.10.3 Index types and access methods ...... 183
6.10.4 Space requirements for indexes ...... 186
6.10.5 Table and Index reorganization on DB2 ...... 187
6.11 Joins ...... 188
6.11.1 Join syntax ...... 188
6.11.2 Join methods (generic) ...... 190
6.11.3 Join strategies in a partitioned database ...... 192
6.11.4 MERGE, UPDATE, and DELETE joins ...... 195
6.12 Optimizer ...... 196
6.12.1 The role of query optimizer ...... 196
6.12.2 LEO: Learning Optimizer ...... 197
6.12.3 Push-down hash ...... 198
6.12.4 Optimization strategies for intra-partition parallelism ...... 199
6.12.5 Directives ...... 200
6.12.6 Optimization classes ...... 200
6.13 Performance enhancements in DB2 UDB V8.1 ...... 204
6.13.1 Distributed catalog cache ...... 204
6.13.2 Prefetch ...... 205
6.13.3 Page cleaner I/O improvements ...... 205
6.13.4 Multi-threading of Java-based routines ...... 205
6.13.5 Join variations ...... 205
6.13.6 Increased opportunity for selection of bit-filters ...... 206
6.13.7 Informational constraints ...... 206
6.13.8 Uniform page size ...... 207

Chapter 7. SQL considerations ...... 209
7.1 SELECT issues ...... 210
7.1.1 Selectivity ...... 210
7.1.2 Statistical sampling ...... 212
7.1.3 SELECT cursors ...... 215
7.1.4 Joins ...... 216

7.2 MATCHES predicate ...... 218
7.3 Comments ...... 218
7.4 SQLCODE and SQLSTATE ...... 219

7.5 Built-in functions ...... 219
7.6 SQL access to system catalogs ...... 223
7.7 Quotations and character strings ...... 224
7.8 Concatenation behavior ...... 224
7.9 Implicit casting ...... 226
7.10 Deferred constraint checking ...... 227
7.11 Set Operators: UNION, INTERSECT, and MINUS ...... 227
7.12 Multi-database access ...... 227
7.13 Temporary tables ...... 228
7.13.1 Implicit ...... 228
7.13.2 Explicit ...... 229
7.14 Compound SQL ...... 230
7.15 INSERT cursors ...... 231
7.16 MERGE INTO ...... 231
7.17 Online analytical processing SQL ...... 234
7.18 Isolation levels ...... 236
7.19 Optimizer directives ...... 238
7.20 DDL issues ...... 238
7.20.1 Creating and altering tables ...... 238
7.20.2 Synonyms ...... 239
7.20.3 Primary key definitions ...... 240
7.20.4 Constraint naming ...... 240
7.21 Triggers ...... 240
7.21.1 SELECT triggers ...... 241
7.21.2 BEFORE-statement triggers ...... 242
7.21.3 Disabling triggers ...... 242
7.22 Multidimensional Clustering in DB2 ...... 243
7.23 DB2 Materialized Query Tables ...... 245
7.23.1 Using and Configuring MQTs ...... 246
7.24 System commands ...... 247
7.24.1 CREATE DATABASE ...... 247
7.24.2 Administrative commands ...... 247
7.25 Statistics for table and indexes ...... 248
7.26 Query Monitoring ...... 249

Chapter 8. Loading and unloading data ...... 253
8.1 Loading and inserting data in a single stream ...... 254
8.2 Parallel bulk loading ...... 255
8.2.1 Handling bad rows ...... 264
8.2.2 Performance and tuning considerations for loading with DB2 ...... 269

8.3 Parallel unloading ...... 270
8.3.1 XPS unloading ...... 270
8.3.2 DB2 unloading ...... 271

8.3.3 Parallel exports ...... 273
8.4 Specific issues ...... 274

Chapter 9. Administration tools and utilities ...... 275
9.1 Resource management ...... 276
9.2 Performance tuning ...... 276
9.3 Tools and wizards that are included with DB2 ...... 278
9.3.1 Control Center ...... 278
9.3.2 Command Editor ...... 278
9.3.3 Task Center ...... 279
9.3.4 SQL Assist ...... 279
9.3.5 Visual Explain ...... 279
9.3.6 Configuration Assistant ...... 280
9.3.7 Journal ...... 280
9.3.8 Health Center ...... 281
9.3.9 Replication Center ...... 281
9.3.10 License Center ...... 281
9.3.11 Information Catalog Center ...... 281
9.3.12 Data Warehouse Center ...... 282
9.3.13 Web administration ...... 283
9.3.14 Wizards, advisors, and launchpads ...... 283
9.4 Optional tools ...... 284
9.4.1 DB2 Performance Expert ...... 284
9.4.2 DB2 Recovery Expert ...... 284
9.4.3 DB2 High Performance Unload ...... 284
9.4.4 DB2 Test Database Generator ...... 285
9.4.5 DB2 Table Editor ...... 285
9.4.6 DB2 Web Query Tool ...... 286
9.4.7 Query Patroller ...... 286
9.5 Utilities ...... 288
9.5.1 Database reorganization ...... 288
9.5.2 Database statistics ...... 290
9.5.3 Schema extraction ...... 291
9.5.4 Maintaining database integrity ...... 292
9.5.5 Throttling utilities ...... 292
9.5.6 Validating a backup ...... 293
9.6 Other administrative operations ...... 293
9.6.1 Configuring automatic maintenance ...... 293
9.7 Monitoring tools and advisors ...... 296
9.7.1 Health check tools ...... 296

9.7.2 Memory Visualizer ...... 297
9.7.3 Storage Manager ...... 298
9.7.4 Event monitor ...... 298

9.7.5 Snapshots ...... 299
9.7.6 Activity Monitor ...... 300
9.7.7 DB2 Performance Expert ...... 300
9.7.8 The db2pd utility, an onstat equivalent ...... 300
9.7.9 Diagnostic files ...... 303
9.7.10 Error message and command help ...... 304

Chapter 10. Planning the transition ...... 307
10.1 Tasks and activities ...... 308
10.1.1 Readiness assessment and scope ...... 308
10.1.2 Tool evaluation ...... 309
10.1.3 Estimating project duration ...... 309
10.2 Data conversion ...... 310
10.2.1 Preparation overview ...... 310
10.2.2 Data conversion process ...... 312
10.2.3 Time planning ...... 313
10.2.4 The database structure ...... 314
10.2.5 Data movement approaches ...... 314
10.2.6 WebSphere Information Integrator ...... 315
10.2.7 Modifying the application ...... 316
10.2.8 Database objects and interfaces ...... 317
10.3 After the transition ...... 320

Chapter 11. Application conversion considerations ...... 321
11.1 Key considerations ...... 322
11.2 Application transitioning from XPS to DB2 ...... 322
11.3 Transactions ...... 323
11.4 ...... 324
11.5 Locks and isolation levels ...... 325
11.5.1 Lock escalation ...... 326
11.5.2 Deadlocks ...... 326
11.5.3 Isolation levels ...... 326
11.6 Packages ...... 327
11.6.1 Static versus Dynamic SQL ...... 328
11.6.2 Binding ...... 329
11.7 Cursors ...... 330
11.8 Stored procedures ...... 331
11.9 Programming languages ...... 332
11.9.1 ESQL/C ...... 332
11.9.2 JDBC ...... 334

11.9.3 ODBC/CLI ...... 335
11.9.4 C++ ...... 337
11.9.5 Large objects ...... 337

11.9.6 SQL Communications Area ...... 339
11.9.7 SQLDA ...... 344

Chapter 12. DB2 Migration ToolKit for Informix ...... 345
12.1 Features and functionality ...... 346
12.2 Recommendations for Use ...... 347
12.2.1 MTK installation and configuration ...... 347
12.2.2 MTK Configurations ...... 348
12.3 Technical overview of MTK ...... 350
12.3.1 The MTK GUI ...... 350
12.3.2 The migration process ...... 351
12.4 DB2 Data Partitioning Facility considerations ...... 361
12.5 Installing and executing the MTK ...... 362
12.5.1 Using MTK with manual deployment to DB2 UDB ...... 363

Chapter 13. Large data volumes: A case study ...... 373
13.1 Project environment ...... 374
13.2 Disk layout ...... 375
13.3 Splitting the CPU resources ...... 376
13.3.1 Configuring processor affinity on XPS ...... 376
13.3.2 Creating resource sets on AIX ...... 377
13.4 TPC-H Data generation ...... 379
13.5 XPS configuration ...... 381
13.5.1 Onconfig file ...... 382
13.5.2 Creating dbslices ...... 384
13.5.3 Creating and loading the table ...... 387
13.5.4 Index builds and update statistics ...... 389
13.5.5 Running TPC-H queries ...... 390
13.6 DB2 configuration ...... 391
13.6.1 Database and database manager configuration ...... 391
13.6.2 Creation of nodegroups and table spaces ...... 393
13.6.3 Transfer of schema ...... 394
13.6.4 Creating and loading the table ...... 398
13.6.5 Transfer of data ...... 400
13.6.6 Index builds and runstats ...... 403
13.6.7 Changes to the TPC-H queries ...... 403
13.6.8 Roll-in and roll-out of data ...... 405
13.6.9 Roll-in and roll-out using UNION ALL views ...... 407
13.7 Observations ...... 409

Appendix A. Case study schemas definitions ...... 411
XPS schema and load scripts ...... 411
DB2 schema and load scripts ...... 421

DB2 federated database system support ...... 428

Appendix B. Additional material ...... 431
Locating the Web material ...... 431
Using the Web material ...... 432
System requirements for downloading the Web material ...... 432
How to use the Web material ...... 432

Glossary ...... 433

Abbreviations and acronyms ...... 439

Related publications ...... 443
IBM Redbooks ...... 443
Other publications ...... 443
How to get IBM Redbooks ...... 445
Help from IBM ...... 445

Index ...... 447


Notices

This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.

Trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

Eserver®, AIX®, Cube Views™, Database 2™, DB2®, DB2 Connect™, DB2 Extenders™, DB2 OLAP Server™, DB2 Universal Database™, Distributed Relational Database Architecture™, DRDA®, IBM®, IMS™, Informix®, Intelligent Miner™, iSeries™, Lotus®, MVS/ESA™, OS/390®, POWER4+™, pSeries®, Red Brick™, Redbooks™, Redbooks (logo)™, Tivoli®, VisualAge®, WebSphere®, z/OS®, zSeries®

The following terms are trademarks of other companies:

EJB, Java, JavaScript, JavaSoft, JDBC, JDK, JSP, JVM, J2EE, Solaris, Sun, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

ActiveX, Excel, Microsoft, Visual C++, Windows NT, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, and service names may be trademarks or service marks of others.

Preface

If you are an IBM® client or IBM Business Partner, this IBM Redbook can help you better understand DB2® Universal Database™ (DB2 UDB) with the Database Partitioning Feature, Informix® Extended Parallel Server (XPS), and the synergies between these products. This redbook describes the key functions and features that can help you develop robust data warehousing solutions, particularly with very large data volumes. It also discusses alternative strategies to take advantage of these products, particularly as you consider implementing new systems and solutions.

As an XPS client or partner, you might not be familiar with DB2 UDB. This redbook includes information to help you better understand DB2 and how you can use it to enhance database and data warehousing implementations. To assist in this task, it provides some alternative approaches for your consideration. As examples, it discusses an approach of coexistence with XPS and DB2 — one that includes Informix Dynamic Server — and an approach that involves transitioning from XPS to DB2 as your strategic DBMS. The information presented is applicable to the multiplatform versions of DB2 UDB — such as AIX®, versions of UNIX®, and Linux® — but is not applicable to IBM zSeries® or iSeries™ environments.

To aid with the transition option, IBM provides the DB2 Migration ToolKit for Informix (MTK). The MTK provides data mapping between the environments and includes the capability to actually convert and migrate the data from XPS for use in DB2 UDB.

The team that wrote this redbook

This redbook was produced by a team of specialists from around the world.

Some team members worked locally at the International Technical Support Organization (ITSO), San Jose Center, while others worked from remote locations.

Chuck Ballard is a Project Manager at the International Technical Support Organization, in San Jose, California. He has over 35 years experience, holding positions in the areas of Product Engineering, Sales, Marketing, Technical Support, and Management. His expertise is in the areas of database, data management, data warehousing, business intelligence, and process re-engineering. He has written extensively on these subjects, taught classes, and presented at conferences and seminars worldwide. Chuck has both a Bachelors degree and a Masters degree in Industrial Engineering from Purdue University.

Weiren Ding is a Certified Advanced Technical Expert in DB2 Advanced Support, Menlo Park, CA. He worked for many years in software research and development before joining XPS Advanced Support at Informix in 1997. He currently works in DB2 Advanced Support, primarily in the areas of diagnosing and recovering down systems. He is also a leading expert on UNION ALL issues and MDC solutions. Weiren has a Bachelors degree in Computer Science from Shanghai Jiao Tong University and a Masters degree in Computer Science from McGill University.

Carlton Doe had over 10 years of Informix experience as a DBA, Engine Administrator, and 4GL Developer before joining Informix in 2000. During this time, he was actively involved in the local Informix user group community and was one of the five founders of the International Informix Users Group (www.IIUG.org). He sat on the IIUG Board of Directors for many years in addition to serving as IIUG President and Product Advocacy Director. He has written two Informix Press books on administering the IDS engine, as well as several IBM white papers and technical articles.

Glen Mules is a Principal Consultant and Instructor in DB2 Information Management Education Services. He develops courseware and conducts DB2 Enablement programs world-wide. Glen has over 30 years experience as a developer, consultant, and instructor, including two years as a member of the data warehouse consulting and implementation group with Informix. He holds a Bachelor of Science degree in Mathematics from the University of Adelaide, South Australia, a Master of Science degree in Computer Science from the University of Birmingham, UK, and is working towards the completion of a Ph.D. in Education from Walden University.

Rajamani Muralidharan is a Consulting IT Specialist in Data Management Channels Technical Sales. He joined Informix in 1993 and has over 16 years experience working with Informix products as a developer, trainer, consultant, and pre-sales engineer. He holds Masters degrees in Applied Mathematics and Computer Science from New Jersey Institute of Technology.

Santosh Sajip is a Senior Software Engineer with the IBM Informix XPS Advanced Support team, Menlo Park, California. He joined the XPS Advanced Support team in 1998 and has over 10 years of experience in the software development and technical support field. He holds a Masters degree in Computer Science from Pune University in India.

Nora Sokolof is a Certified Consulting Brand Sales IT Specialist with the IBM DB2 Migration Team (SMPO). Nora has been with IBM for almost 20 years and has held positions as a DB2 UDB, Informix, Oracle, and PeopleSoft development DBA. In her seven years with the SMPO, Nora has assisted hundreds of customers with their migrations to DB2 from Oracle, PeopleSoft, and Informix. She has also authored a white paper and co-authored an IBM Redbook.

Andreas Weininger is a Consulting IT Specialist in Data Management Technical Sales in Germany. He has 10 years of experience in database systems for cluster and MPP systems. Andreas holds a Ph.D. degree in Computer Science from Technische Universität Munich.

Special Acknowledgements

We want to express special appreciation to the following people, who provided sections of the redbook content in addition to great feedback during the technical review process:

Kyle McEligot, Senior Software Engineer, XPS and DB2 Development, Beaverton, OR. Kyle is an XPS architect and currently has a focus on the considerations and issues for transitioning from XPS to DB2.

Gustavo Castro, Advanced Support Engineer, Coral Gables, Florida.

Other Contributors

Thanks also to the following people for their contributions to this project.

From IBM Locations Worldwide
• Omkar Nimbalkar, Informix Product Management and Marketing, Menlo Park, CA
• Jim Troisi, Senior Functional Manager, XPS and DB2 Development, Beaverton, OR
• Calisto Zuzarte, Manager, Query Rewrite Development, Markham, ON, Canada
• Russ Stoker, DB2 eBusiness Performance, Markham, ON, Canada
• Nattavut Sutyanyong, Query Rewrite Development, Markham, ON, Canada

From the ITSO, San Jose Center
• Mary Comianos, Operations and Communications
• Deanna Polm, Residency Administration
• Emma Jacobs, Graphics

Become a published author

Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners, or customers.

Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs and increase your productivity and marketability.

Find out more about the residency program, browse the residency index, and apply online at: ibm.com/redbooks/residencies.html

Comments welcome

Your comments are important to us!

We want our Redbooks™ to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:
• Use the online Contact us review redbook form found at: ibm.com/redbooks
• Send your comments in an e-mail to: [email protected]
• Mail your comments to: IBM Corporation, International Technical Support Organization, Dept. QXXE, Building 80-E2, 650 Harry Road, San Jose, California 95120-6099




Chapter 1. Introduction to this redbook

Perhaps you are feeling the impact of the changes that are taking place in the business environment today. Planning horizons and measurement periods are all becoming much shorter. Business processes, systems and application implementations, new market opportunities, and technology are all changing at an ever-increasing rate.

Coping with this changing environment requires speed and flexibility. To remain competitive, you must be quick to recognize when changes are required and quick to implement the required changes. You must also remain flexible to meet customer demands, to satisfy shareholder expectations, and to enable continued business growth and leadership. Critical to meeting these requirements is maintaining a flexible and dynamic IT support infrastructure.

This redbook is all about change, specifically change to your IT support infrastructure and, in particular, your database management system (DBMS), which is the foundation of that infrastructure. It is critical that your DBMS can house, organize, manage with integrity, and deliver to you, on demand, the data that is collected and stored by your organization. In addition, it needs to change with the business requirements and needs to be enabled with a robust set of tools and applications that can implement the change and growth in your organization.

This chapter provides a basic introduction to the strategic directions for XPS and explains how the information in this book is organized.

1.1 Understanding strategic directions for XPS

IBM DB2 Universal Database (DB2 UDB) is a leading DBMS and supports many applications, solution components, and technologies. It has a broad base of customers as well as a long history of leading capabilities and high customer satisfaction. DB2 is a performance leader, a highly scalable DBMS that is capable of supporting the largest customer data volumes, and it offers a robust set of very powerful capabilities.

However, this redbook is about the Informix Extended Parallel Server (XPS). What about that product? That is a good question!

XPS is also an excellent DBMS. We assume you are probably reading this book because you already have XPS. As such, you likely have a very good understanding of it. So, this redbook presents information about both products so that you can see the similarities and differences. The objective is not to compare them but simply to relate their capabilities as an enabler for more easily understanding them both.

But, you might ask, why do you need to understand DB2 at all? Another good question!

IBM is committed to supporting the full spectrum of Informix products as long as customers continue to use them. A quick look at what has happened since the Informix acquisition supports this commitment. There have been several new versions of Informix Dynamic Server (IDS), each with significant new functionality, to which customers have responded positively. Sales of IDS have grown steadily. In addition, work has continued with the Informix 4GL development environment becoming the foundation of a new Java™-oriented deployment model called Enterprise Generation Language (EGL). Even the older, more mature, products (such as Informix Standard Engine and Informix OnLine version 5) have seen functional improvements since the acquisition.

Informix XPS has received functional enhancements since the acquisition as well as continuing support. This support for existing customers will continue for as long as is necessary. IBM understands that the choice to use XPS was a strategic one and made after long and deliberate evaluations of the available options.

Moving forward, as new application requirements appeared, IBM had to evaluate whether the functionality to support these requirements should be added to XPS, DB2, or both engines. In looking at the market penetration of XPS compared with DB2, IBM made the strategic decision to focus development on DB2. Thus, XPS version 8.51 is the last feature release, although fix packs to correct product deficiencies will be made available.

IBM is not abandoning its XPS customers nor asking them to change to a new database engine. IBM understands that any type of data or infrastructure transition is difficult and costly. IBM will continue to provide support to existing XPS customers.

So, if you are an XPS customer, what should you do? The answer is simple — do what makes the most business and technical sense for you. If XPS is meeting your needs, continue to use it. If, however, you have new requirements that XPS cannot satisfy, we encourage you to evaluate DB2 and understand clearly its enhancements. There is a migration path, as well as a convergence, between the XPS technology and that of DB2.

DB2 UDB version 8.2 is an excellent starting point for these evaluations. Many key XPS features are included in DB2, and more are planned for subsequent versions. From a performance perspective, DB2 currently holds the TPC-H world records at the 100, 300, 1000, and 10000 GB levels. In short, going forward, the best technology features for business intelligence and data warehousing will continue to be made available in the DB2 product family.

Because there might be XPS customers who are evaluating their future choices and deciding to move to DB2 UDB, this redbook is designed to provide our Informix XPS customers with some important information regarding data management, DBMS strategies, DB2 UDB, and alternative management directions in a changing business environment.

1.2 Objective of this redbook

The acquisition of Informix by IBM has provided the opportunity for Informix customers to consider new alternatives to further enrich their data management systems infrastructure. They can now more easily take advantage of products, services, and capabilities that are available as they grow and change. There are many capabilities and alternatives available from IBM that Informix customers should explore, many of which are beyond the scope of this redbook. Therefore, this redbook focuses on the capabilities and alternatives for enriching the data management environment and includes information about:
• Capabilities of DB2 UDB
• Additional IBM data management products for data access and integration
• DB2 tools and utilities
• Data partitioning strategies and capabilities
• Data access and integration
• DBMS coexistence and transitioning

The intent of this redbook is to arm Informix customers with the information that they need to make decisions on their DBMS strategy and direction. It contains

information that is suitable for readers at the management level as well as the technical level.

1.3 Chapter abstracts

This section provides a high-level summary of the contents and topics that this redbook contains. Through these abstracts, you can gain an overall understanding of the contents of this redbook without having to read all the detailed information that is contained in each chapter.

The information presented includes some high-level product overviews but is primarily oriented to detailed technical discussions of Informix XPS and DB2 UDB. Depending on your interest, level of detail, and job responsibility, you might want to be selective in the sections where you focus your attention. We have organized this redbook to enable that selectivity.

Here is a brief overview of the contents by chapter:

• Chapter 1, “Introduction to this redbook” on page 1
This is the chapter that you are reading now. It gives a brief introduction and statement of the objectives and scope of this redbook and is focused around the IBM acquisition of Informix.

• Chapter 2, “XPS and DB2 UDB architectures” on page 7
This chapter includes a comprehensive discussion of the DB2 architecture and describes how it relates to the XPS architecture. It provides a base for understanding the information in the other chapters. Topics covered include the basic structural components of memory, processes, and disk, and their management.

• Chapter 3, “Configuration” on page 63
This chapter outlines the configuration options as they apply to both XPS and DB2. It presents information about the capabilities and parameters of DB2 and discusses how they relate to an Informix XPS environment. This information can help, for example, as you consider the issues that are involved in a transition from XPS to DB2. As with XPS, configuration choices inherently influence the performance of the DB2 instance.

• Chapter 4, “Instance and database operations” on page 87
This chapter describes the management of an instance and database operations. It includes an overview of memory and database storage management. Backup, recovery, and high availability are also covered, along with an overview of working in a partitioned database environment. Topics covered include instance operation and creation, configuration changes,

managing database and log storage, backup and recovery, and high availability. It also provides some extensive examples to familiarize you with the operational aspects of DB2 with the Database Partitioning Feature (DPF). Because much of the operation of DB2 can be performed using the graphical user interface (GUI), it illustrates only a few operations that are specific to DPF.

• Chapter 5, “Data types” on page 141
This chapter includes information about data types and data movement. It compares and contrasts XPS and DB2 object types and limits. It also provides an overview of the various data migration methods that can be used in addition to the DB2 Migration ToolKit for Informix. Although XPS and DB2 support the same types of data in general terms, each has particulars to the implementation. Most of these particulars are internal and will not overtly affect your application. However, some of them certainly can if you use them. This chapter discusses these potential issues.

• Chapter 6, “Data partitioning and access methods” on page 159
This chapter focuses on the partitioning schemes that are provided by XPS and DB2 so that you can understand how each supports data partitioning. It describes the indexing schemes and join methods, which enable excellent operations in a partitioned environment, that are used by each product. This chapter describes the different types of fragmentation schemes in XPS and how they can be mapped to DB2 partitioning. It then discusses the various types of data access and join methods in XPS and how they can be mapped to the various capabilities in DB2. It particularly focuses on the multidimensional clustering capabilities of DB2.

• Chapter 7, “SQL considerations” on page 209
This chapter contains a description of SQL considerations, along with the implementation and syntax that are used by XPS and DB2. Although both XPS and DB2 support SQL standards, they also support extensions to those standards that are effectively differences. There are both overt and subtle ramifications to these SQL differences. This chapter discusses SELECT variations, variations in the use of cursors, built-in functions, differences in DDL syntax, error handling, string manipulation, and SET statements.

• Chapter 8, “Loading and unloading data” on page 253
This chapter details the process for loading and unloading data, which is an important issue when operating in an environment with huge data volumes.

• Chapter 9, “Administration tools and utilities” on page 275
This chapter describes some of the tools and capabilities for administrative operations, as well as some of the typical DBA activities. As examples, it compares the use of XPS and DB2 utilities (such as loading and unloading

data, validating data integrity, and maintaining data organization). It also provides a brief overview of the tools for working with DB2 and reviews DB2 monitoring tools and advisors as well as diagnostics.

• Chapter 10, “Planning the transition” on page 307

This chapter lists the specific steps to take in planning a transition. It includes a discussion of product similarities and differences, application considerations, and hardware considerations.

• Chapter 11, “Application conversion considerations” on page 321
This chapter discusses the application conversion considerations. It includes topics such as transactions, locking, isolation levels, cursors, stored procedures, and triggers as well as environment-specific considerations such as JDBC™, ESQL/C, and ODBC.

• Chapter 12, “DB2 Migration ToolKit for Informix” on page 345
This chapter gives a brief overview of the DB2 Migration ToolKit for Informix (MTK), a tool that automates the process of migrating your schema objects and data from IDS to DB2. The MTK is the recommended method of converting DDL and data, because it creates and runs scripts that can greatly reduce conversion time when compared to manual methods.

• Chapter 13, “Large data volumes: A case study” on page 373
This chapter presents a case study that demonstrates operations in a large data volume environment. In our lab, we use the TPC-H benchmark schema and data for this case study. This data and schema was developed by the Transaction Processing Performance Council (TPC) for use in comparing the capabilities of competing DBMS offerings. The case study demonstrates how XPS and DB2 enable the required functionality.

• Appendix A, “Case study schemas definitions” on page 411
This appendix provides the detailed information used in the TPC-H case study as presented in Chapter 13, “Large data volumes: A case study” on page 373. It includes information about data generation and configuration, and it presents some best practices for working in a large data volume environment.



Chapter 2. XPS and DB2 UDB architectures

This chapter describes the architectures of the IBM Informix Extended Parallel Server (XPS) and DB2 Universal Database (DB2 UDB) relational database management system (RDBMS) products.

Understanding the product architectures is critical if you are considering a transition from XPS to DB2. This chapter compares and contrasts these architectures at a medium to low level of detail, which includes, but is not limited to, memory sizing, process considerations, and disk (data storage) allocation. For DB2 UDB, it concentrates specifically on DB2 UDB Enterprise Server Edition (ESE) with the Database Partitioning Feature (DPF), because this version of DB2 includes — as does XPS — support for cluster and massively parallel processing (MPP) systems. When describing DB2, this chapter is not restricted to the database engine itself but includes products of the DB2 Data Warehouse Edition, because some of the functionality that is available with XPS and its bundled utilities is available for DB2 from products in that edition. For example, some of the functionality of Informix I-Spy is provided by DB2 Query Patroller, which is part of the DB2 Data Warehouse Edition.

Although configuration is mentioned briefly, this chapter does not cover the full configuration of either product. You can find additional details about configuration in Chapter 3, “Configuration” on page 63. This chapter also touches upon performance tuning considerations but does not discuss the topic at any

substantial depth. You should consider performance tuning after the transition, but such tuning is a function of your particular environment.

Both XPS and DB2 share many similar concepts, components, and architectural structures. However, these products also have significant differences. You should take these differences into consideration during the transition process.

The terminologies of XPS and DB2 vary slightly in some areas and more significantly in other areas. This chapter describes the terminology that is used as well as the component definitions to promote a common understanding.

Within this chapter, each topic uses the following format:
• Introduction to the topic area, with minor comparisons for clarity
• Topic description relative to the XPS architecture
• Topic description relative to the DB2 architecture

A dilemma that is typically encountered in a discussion of this type is finding common terms for what is described. For example, this chapter describes the architecture and components of database management systems. Specifically, it refers to actual implementations of the XPS and DB2 database management systems. Customers and developers often refer to such implementations in a variety of terms (such as servers, engines, online systems, database servers, instances, or even simply as databases). For this redbook, we use the term instance to describe the implementation of the database management system.

2.1 High-level product overviews

This section gives a high-level overview of the two products that are discussed in this redbook:

• IBM Informix Extended Parallel Server
• DB2 Universal Database Enterprise Server Edition with the Database Partitioning Feature (DB2 UDB ESE with DPF)

Because XPS is bundled with several products, such as Informix I-Spy, this section also briefly discusses those bundled products. The DB2 Data Warehouse Edition also includes bundled products, some of which correspond to products bundled with XPS. However, there are also additional products for tasks such as data mining and ETL (extract, transform, and load). This section provides a high-level discussion of these products as well. Details about the commonalities and differences in the architecture of the two products are presented in subsequent sections.

2.1.1 IBM Informix Extended Parallel Server

The IBM Informix Extended Parallel Server (XPS) version 8.50 database server is a high-end database server that provides scalable data warehousing with fast data loading and comprehensive data management. Designed for a broad range of enterprises, Informix XPS offerings include extensive performance-enhancing technologies for complex, query-intensive analytical applications. Informix XPS database servers provide:
• Reliable data manipulation
• Fast, ad hoc queries from their data warehouses
• A combined data warehouse and
• Rapid and concurrent data loading and query execution
• Easy and simple expansion of capacity as data needs grow
• Flexible control over user sessions and resource usage

Informix XPS database servers provide data warehousing features such as:
• Rapid, efficient, fully parallel query processing: The Informix XPS database server uses all available hardware resources to deliver mainframe-caliber scalability, manageability, and performance while requiring minimal operating-system overhead. The Informix XPS optimizer determines the best query plan and can combine several methods of joining tables in a single query plan for the efficient use of memory and processing power.
• Fast, easy expandability: The dynamic coserver management feature lets you add nodes to your system to expand database server capacity, either temporarily or permanently. In addition, one or more specific-purpose

coservers can distribute the processing load to enable parallel processing tasks to be accomplished while normal operations continue. To satisfy growing requirements, you can add permanent coservers to the database server to contain the additional tables or table fragments.
• Flexible schema design: The Informix XPS database server provides table fragmentation methods that are appropriate for normalized or denormalized relational database schemas. First, choose the schema design that best fits your data queries, and then determine the fragmentation method to distribute data across coservers for best performance on specific queries.
• Rapid data loading and unloading: The Informix XPS parallel data loader is fast and checks constraints as it loads. There is also a parallel unloader that enables quick downloads of data from your data warehouse to data marts or other specialized data stores. External tables enable fast and flexible data loading and unloading with easy handling of large volumes of data; a brief sketch follows this list. You can load and unload data on 64-bit platforms from files and to files of unlimited size.
• Ease of management: With the Informix Server Administrator tool, you can manage your Informix XPS database from any computer that has a Web browser.
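The external table mechanism deserves a concrete illustration. The following is a minimal sketch, with hypothetical table and file names, of loading a delimited flat file through an XPS external table; check the XPS documentation for the exact options that apply to your release.

   -- Hypothetical example: define an external table over a delimited file
   -- and load it into the sales table. Names and paths are illustrative.
   CREATE EXTERNAL TABLE ext_sales
   SAMEAS sales                                -- borrow the column layout of sales
   USING (
       FORMAT    'DELIMITED',                  -- delimited text records
       DATAFILES ('DISK:1:/data/sales.unl')    -- file read by coserver 1
   );

   -- The INSERT ... SELECT drives the load; XPS parallelizes the work
   -- and checks constraints as rows arrive.
   INSERT INTO sales SELECT * FROM ext_sales;

Unloading is the same idea in reverse: inserting into an external table from a SELECT on a base table writes the data out to the named files in parallel.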

The following products are bundled with XPS:
• Informix Client Software Development Kit (Client SDK): A single packaging of several application programming interfaces (APIs) for rapid, cost-effective development of applications for IBM Informix servers.
• Informix Connect: A runtime deployment component that includes the runtime libraries of the APIs that comprise the Informix Client SDK.
• Informix I-Spy: A smart data warehouse monitoring and optimization tool designed for IBM Informix. It improves design efficiency and ensures lower maintenance costs.
• Informix JDBC: A Java database connectivity (JDBC) driver — the JavaSoft™ specification of a standard API that allows Java programs to access database management systems.
• Informix Server Administrator (ISA): A management tool that allows developers to perform system configuration, backup and restore, and system monitoring from any machine with a Web browser. It provides an easy-to-use interface to the XPS command line.

2.1.2 DB2 Data Warehouse Edition

DB2 Universal Database Data Warehouse Edition (DWE) provides world-record

performance and robust business intelligence features for real-time insight and decision-making, delivering information on demand. DWE continues to build on the scalable, extensible data warehousing and analytics engine of DB2. Even before adding the OLAP, data mining, and ETL capabilities of DWE, the enhanced DB2 V8.2 engine supports some of the world's largest, most demanding, and most heavily used data warehousing environments.

There are three editions of DWE:
1. DB2 Data Warehouse Standard Edition: A complete data mart infrastructure product that includes DB2, integrated OLAP, advanced data mining, and ETL, and provides spreadsheet-integrated business intelligence for the desktop. DWE works with and enhances the performance of advanced desktop OLAP tools such as DB2 OLAP Server™ and others from IBM partners. Features of this edition are:
   – DB2 Alphablox
   – DB2 Universal Database Workgroup Server Unlimited Edition
   – DB2 Cube Views™
   – DB2 Intelligent Miner™ Modeling, Visualization, and Scoring
   – DB2 Office Connect Enterprise Web Edition
2. DB2 Data Warehouse Base Edition: Provides mid- to large-scale enterprises with a data warehouse and data mart infrastructure that includes DB2 and integrated OLAP capability. It can support departmental warehouses, line-of-business warehouses, or enterprise data warehouses where there is a need for scalability and performance. Features of this edition include:
   – DB2 UDB Enterprise Server Edition (ESE), V8.2
   – DB2 Cube Views, V8.2
3. DB2 Data Warehouse Enterprise Edition: A powerful business intelligence platform that includes DB2, federated data access, database partitioning, integrated Online Analytical Processing (OLAP), advanced data mining, enhanced ETL, and workload management, and provides spreadsheet-integrated business intelligence for the desktop. DWE works with and enhances the performance of advanced desktop OLAP tools such as DB2 OLAP Server and others from IBM partners. Features of this edition include:

– DB2 Alphablox (rapid assembly and broad deployment of integrated analytics)
– DB2 Universal Database Enterprise Server Edition

– DB2 Universal Database, Database Partitioning Feature (large clustered server support)

– DB2 Cube Views (OLAP acceleration)
– DB2 Intelligent Miner Modeling, Visualization, and Scoring (powerful data

mining and integration of mining into OLTP applications)
– DB2 Office Connect Enterprise Web Edition (spreadsheet integration for the desktop)
– DB2 Query Patroller (rule-based predictive query monitoring and control)
– DB2 Warehouse Manager Standard Edition (enhanced extract/transform/load services supporting multiple agents)
– WebSphere® Information Integrator Standard Edition (in conjunction with DB2 Warehouse Manager, provides native connectors for accessing data in Oracle, Teradata, Sybase, and Microsoft® SQL Server databases)

The DB2 Data Warehouse Enterprise Edition supports the Database Partitioning Feature and is the edition that best corresponds to XPS. Therefore, we focus on that edition. The individual components of the DB2 Data Warehouse Enterprise Edition are discussed in “Solution contents” on page 12. The main component of the DB2 Data Warehouse Enterprise Edition is discussed in “Database partitioning feature” on page 19.

Solution contents

This section discusses the components of DB2 DWE that not only deliver the world's leading DBMS but also enable the implementation of a complete data warehousing environment.

DB2 Alphablox

DB2 Alphablox is an industry-leading platform for the rapid assembly and broad deployment of integrated analytics that are embedded within applications. It has an open, extensible architecture that is based on Java 2 Platform, Enterprise Edition (J2EE™) standards, an industry standard for developing Web-based enterprise applications. It simplifies enterprise application development by handling many details of application behavior automatically, without the need for complex programming.

DB2 Alphablox provides various Blox components, which are modular, reusable components, as well as an application framework, a powerful programming model, and a variety of development tools for assembling analytic applications. For its runtime environment, DB2 Alphablox leverages standard J2EE application servers. DB2 Alphablox can be installed on leading commercial J2EE application servers such as IBM WebSphere and others.

When developing applications with embedded DB2 Alphablox capability, you can take advantage of many features that are offered by the underlying J2EE application servers, including enhanced performance, security, and personalization. Integration with the application server environment enables application builders to leverage DB2 Alphablox for its base capabilities that are related to building, deploying, and executing analytic applications, while relying on the application server to provide robust management and deployment services.

DB2 Alphablox provides an extensive library of Blox to meet integrated analytic application design requirements for maximum usability. These components include:  Data access blox which manage data access through connection between the user interface and the appropriate data sources. Because DB2 Alphablox directly accesses the data from your databases, applications leveraging its capabilities will abide by any security features or constraints built into your database. DB2 Alphablox exposes all of the analytic function that is supplied by the multidimensional database engines (as examples, ranking, derived calculations, ordering, sophisticated filtering, percentiles, deciles, variances, standard deviations, correlations, trending, statistical functions, and other sophisticated calculations). In addition, DB2 Alphablox allows users and application developers to create custom calculated members.  DataBlox also offers application program interfaces (APIs) to return data in an XML format. This feature opens the door to extensibility, applications leveraging DB2 Alphablox to be integrated with enterprise applications. It also enables delivery of data to XML-enabled clients, including phones, pagers, and PDAs. Application developers also can expose the data in a Web service or build custom user interfaces.  User interface blox that are provided by DB2 Alphablox are highly functional, interactive, and completely customizable to improve your application usability. These user interface elements employ DHTML technology to provide a rich user experience, including menu bars, right-click menus, and custom layouts in a thin client (no need for Java, ActiveX®, or other browser plug-ins).  Form elements blox provides several form elements Blox that are extremely useful in developing custom analytic applications. All the form elements Blox not only maintain the form elements' current state, freeing up developers from writing the extra code, but also link with other components such as Java beans, including data access Blox and user interface Blox to provide most commonly required functionality with minimal coding.

- Business logic Blox provide business logic to facilitate incorporating dynamic, complex business logic into integrated analytic applications.

- Analytic infrastructure Blox pack a tremendous amount of customization, personalization, and collaboration capability. Application developers can customize their line-of-business applications and personalize the interfaces for each individual user through Blox properties, the Blox JavaScript™ and Java APIs, and application and user custom properties.

DB2 Alphablox supports a standard J2EE application development model, offering a complete development paradigm for application delivery. It provides application developers extensive flexibility in customizing the user interface and adding their own business and application logic by exposing every component as a Java bean and allowing programmatic access to those beans through a rich set of Java APIs.

With the DB2 Alphablox tag libraries, JSP™ coders do not have to know the low-level technical details behind the components. They simply need to know the syntax and function of each respective piece. This feature enables page authors with no Java experience to incorporate analytics seamlessly on an intranet or extranet using best-of-breed authoring tools. Each Blox has a comprehensive set of properties that can easily be set to custom values using the tags in the JSP pages.

DB2 UDB ESE with the Database Partitioning Feature
The DPF is the database server included in the DB2 Data Warehouse Enterprise Edition. It is discussed in more detail in “Database partitioning feature” on page 19.

DB2 Cube Views
From a general perspective, DB2 Cube Views enable you to:
- Accelerate OLAP queries by using more efficient DB2 materialized query tables. DB2 materialized query tables can pre-aggregate the relational data and dramatically improve query performance for OLAP tools and applications.
- Integrate business intelligence applications with the data warehouse easily by sharing the metadata between the relational database and business intelligence applications. Instead of managing each application individually, you can model the data in the warehouse once and deploy that model with every application.
- Accelerate OLAP deployments by building and storing dimensional metadata in the database for use by client tools.

OLAP is a core component of business integration, and gives users the ability to interrogate data by intuitively navigating from summary to detail data. All OLAP solutions rely on an RDBMS to source and query data dynamically and to support drill-through reports.

DB2 Cube Views are the latest generation of OLAP support in DB2 UDB and include features and functions that make the relational database a first-class platform for managing and deploying multidimensional data across the enterprise. Armed with these facilities, architects can provide OLAP solutions that can be deployed faster, are easier to manage, and can improve performance across the spectrum of analytical applications, regardless of the particular OLAP tools and technologies used. With DB2 Cube Views, the database becomes multidimensionally aware by:
- Including metadata support for dimensions, hierarchies, attributes, and analytical functions.
- Analyzing the dimensional model and recommending aggregates (MQTs, also known as summary tables) that improve OLAP performance. A sketch of such a summary table follows this list.
- Adding OLAP metadata to the DB2 catalogs, providing a foundation for OLAP that will speed deployment and improve performance.
- Simplifying the exploitation of advanced DB2 technologies such as summary table management and analytical functions.
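To make the MQT concept concrete, the following is a minimal sketch of the kind of summary table that DB2 Cube Views might recommend. The table, column, and aggregation names are hypothetical, not output from the Optimization Advisor:

   -- Define a deferred-refresh summary table over a hypothetical sales table
   CREATE TABLE sales_by_month AS
     (SELECT store_id, sales_month, SUM(amount) AS total_amount
      FROM sales
      GROUP BY store_id, sales_month)
     DATA INITIALLY DEFERRED REFRESH DEFERRED;

   -- Populate (and later re-synchronize) the summary table
   REFRESH TABLE sales_by_month;

   -- Let the optimizer consider deferred-refresh MQTs when rewriting queries
   SET CURRENT REFRESH AGE ANY;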

DB2 Cube Views is a unique, integrated approach to OLAP that improves DB2 query performance, because it:
- Enables the DB2 optimizer to rewrite incoming queries to take advantage of the MQTs that DB2 Cube Views recommends.
- Loads cubes and performs drill-through queries and ad-hoc analysis directly against the relational tables in DB2.
- Enhances all queries that use aggregate data.
- Enables applications and tools to process the dimensional data in DB2 naturally, productively, and efficiently.

DB2 Cube Views exploit DB2 features such as summary tables, different index schemes, OLAP-style operators, and aggregate functions. The following capabilities and components are included:
- You can create a set of metadata objects to dimensionally model your relational data and OLAP structures. DB2 Cube Views stores each of the metadata objects that you create in the DB2 catalog.
- With the OLAP Center, you can create, manipulate, import, or export cube models, cubes, and other metadata objects to be used in OLAP tools. The OLAP Center provides easy-to-use wizards and windows to help you work with your metadata. For example, the Optimization Advisor analyzes your metadata and recommends how to build summary tables that store and index aggregated data for your OLAP-style SQL queries.

- DB2 Office Connect Analytic Edition is an easy-to-use spreadsheet add-in tool for querying OLAP data in DB2. With DB2 Office Connect Analytic Edition, you can connect to a DB2 database and a DB2 Cube Views cube, and explore the data in Microsoft Excel®.
- DB2 Cube Views provides a SQL-based and XML-based API for OLAP tools and application developers. Through CLI, ODBC, or JDBC connections, or by using embedded SQL to DB2, applications and tools can use a single API to create, modify, and retrieve metadata objects.
- A sample application and database are available to help you learn. You can exchange metadata objects between the DB2 catalog and OLAP tools. To import or export metadata objects to or from the DB2 catalog, utilities that are called metadata bridges are available for specific OLAP and database tools.
- The db2mdapiclient utility is provided as sample source code for coding an application for Multidimensional Services.

DB2 Intelligent Miner Modeling, Visualization, and Scoring
DB2 Intelligent Miner brings support for data mining to the DB2 Data Warehouse Enterprise Edition. By using DB2 Intelligent Miner Modeling, you can discover hidden relationships in your data without exporting data to a special data mining computer or resorting to small samples of data.

DB2 Intelligent Miner Modeling delivers DB2 Extenders™ for the following modeling operations:
- Associations discovery. Application examples include the discovery of product associations in a market basket analysis, site visit patterns at an eCommerce site, or combinations of financial offerings purchased.
- Demographic clustering. Application examples include market segmentation, store profiling, and buying-behavior patterns.
- Tree classification. Application examples include profiling customers based on a desired outcome, such as propensity to buy, projected spending level, and the likelihood of attrition within a period of time.

DB2 Intelligent Miner Modeling is a sophisticated SQL extension of the DB2 database and enables modeling functions to be embedded into business applications. It supports the development of data mining models in a format that conforms to Predictive Model Markup Language (PMML) V2.0, the industry standard for analytic models.

When new relationships are discovered, DB2 Intelligent Miner Scoring allows you to apply those relationships to new data in real time.

Analysis of data-mining models is available via DB2 Intelligent Miner Visualizer, a Java-based results browser. It allows even non-experts to view and evaluate the results of the data-mining modeling process.

DB2 Office Connect Enterprise Web Edition
Spreadsheets provide an intuitive and powerful front end to represent and manipulate business information. Microsoft Excel is a widely used spreadsheet product. A primary issue with Excel is its inability to transfer information seamlessly between the spreadsheet and a relational database such as DB2 or Informix. Often the users end up writing complex macros to do this transfer. This process is expensive, difficult to maintain, and frequently beyond the skill set of the typical Excel power user.

IBM Office Connect 4.0 enables Excel users to overcome this limitation by providing a simple, patented, GUI-based process that enables information in an Excel spreadsheet to be transferred seamlessly to multiple databases. Office Connect transforms a normally static Excel spreadsheet into a dynamic e-business application by providing enterprise users with secure, authenticated database reporting and update capabilities in internet/intranet and client/server environments.

DB2 Query Patroller
DB2 Query Patroller is included in DB2 Data Warehouse Enterprise Edition, but it is also available as a stand-alone product. DB2 Query Patroller is a powerful query management system that you can use to control the flow of queries proactively and dynamically against your DB2 database in the following key ways:
- Define separate query classes for queries of different sizes to better share system resources among queries and to prevent smaller queries from getting stuck behind larger ones.
- Give queries submitted by certain users a high priority so that these queries run sooner.
- Put large queries on hold automatically so that they can be canceled or scheduled to run during off-peak hours.
- Track and cancel runaway queries.

The features of Query Patroller allow you to regulate your database query workload so that small queries and high-priority queries can run promptly and your system resources are used efficiently. In addition, information about completed queries can be collected and analyzed to determine trends across queries, heavy users, and frequently used tables and indexes.

Administrators can use Query Patroller to:

- Set resource usage policies at the system level and at the user level.
- Actively monitor and manage system usage by canceling or rescheduling queries that could impact database performance.
- Generate reports that assist in identifying trends in database usage, such as which objects are being accessed and which individuals or groups of users are the biggest contributors to the workload.

Query submitters can use DB2 Query Patroller to:
- Monitor the queries they have submitted.
- Store query results for future retrieval and reuse, effectively eliminating the need for repetitive query submission.
- Set a variety of preferences to customize their query submissions, such as whether to receive e-mail notification when a query completes.

DB2 Query Patroller provides most of the functionality of I-Spy. In addition, it provides some functionality not available in I-Spy, such as charge-back reporting and object usage/non-usage reporting. In general, I-Spy delivers the following capabilities:
- Acts as a smart data warehouse monitoring and optimization tool.
- Helps data warehouse administrators and architects enhance utilization efficiency, design improvements, and lower maintenance costs.
- Sits transparently between the database and the client, helping the administrator monitor and adjust database resources and client query usage, including: viewing executed SQL; viewing the accessed data; obtaining reports on execution time; determining the quantity of data returned; and identifying inefficient or long-running queries.
- Maximizes data warehouse investment by providing information about how the warehouse is being used.
- Improves the efficiency of data warehouse design by allowing the data warehouse architects and developers to prototype the data and queries required.
- Provides information that enables the reduction of CPU requirements.

DB2 Warehouse Manager Standard Edition
DB2 Warehouse Manager Standard Edition, V8.2 provides an integrated and flexible tool for designing, populating, and managing data warehouses. The DB2 Warehouse Manager Standard Edition ETL tool allows you to define transformations with the SQL-based stored procedure language of DB2 and leverages DB2 itself as the underlying transformation engine. DB2 Warehouse Manager Standard Edition has a flexible architecture that allows remote agents to perform the transformation function on the optimal platform: source, target, or dedicated hub.

WebSphere Information Integrator Standard Edition
WebSphere Information Integrator Standard Edition provides (solely for use with Warehouse Manager ETL) native connections to Oracle, Teradata, Microsoft SQL Server, and Sybase databases. It offers the capabilities of WebSphere Information Integrator Replication Edition plus those of a federated data server, including powerful cost-based query optimization and integrated caching.

This package includes:
- DB2 Net Search Extender V8.2
- IBM Lotus® Extended Search V4.0
- WebSphere Studio Site Developer V5.1.2
- WebSphere MQ V5.3
- WebSphere Application Server V5.1 (Windows®, AIX, HP-UX, Solaris™, and Linux)
- DB2 UDB Additional Features for Linux 32- and 64-bit

Database partitioning feature
DB2 UDB Enterprise Server Edition (ESE) is designed to meet the relational database server needs of mid- to large-size businesses. It can be deployed on Linux, UNIX, or Windows servers of any size, from one CPU to hundreds of CPUs. DB2 UDB ESE is an ideal foundation for building on demand enterprise-wide solutions, such as large data warehouses of multiple-terabyte size, high-performing 24x7 high-volume transaction processing business solutions, or Web-based solutions. It is the database backend of choice for industry-leading ISVs building enterprise solutions, such as Business Intelligence, Content Management, e-Commerce, ERP, CRM, or SCM. Additionally, DB2 ESE offers connectivity, compatibility, and integration with other enterprise DB2 and Informix data sources.

The DPF allows Enterprise Server Edition customers to partition a database within a single server or across a cluster of servers. The DPF capability provides the customer with multiple benefits, including scalability to support very large databases or complex workloads and increased parallelism for administration tasks. DPF is required in order to partition your DB2 UDB ESE database, either within a single server or across multiple servers. The DPF is a license-only feature and does not require any additional products on top of DB2 UDB ESE to be installed on your database server to support database partitioning.

In the past, database partitioning was provided by DB2 UDB Enterprise Extended Edition (EEE), which had to be installed in order to partition a database. With DB2 UDB Version 8, if you already have DB2 UDB ESE installed and determine that it would be beneficial to partition the database, there is no need to remove or install any software. You only need to purchase the DPF for the server(s) where you will create the database partitions. The partition layout itself is defined in the db2nodes.cfg file, as sketched below.
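For illustration, the following is a minimal db2nodes.cfg sketch for four partitions spread across two hosts; the host names are hypothetical. Each line contains a partition number, a host name, and a logical port that distinguishes multiple partitions on the same host:

   0 serverA 0
   1 serverA 1
   2 serverB 0
   3 serverB 1

With this file in place, partitions 0 and 1 run on serverA, and partitions 2 and 3 run on serverB.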

2.2 Understanding the architectures

Both XPS and DB2 share the same basic architecture. They are both designed to support cluster systems and massively parallel processor (MPP) systems, and they both have a shared nothing architecture. Figure 2-1 shows a cluster that consists of several nodes.

Figure 2-1 Cluster architecture: nodes 1 through n, each with its own CPUs, memory, and disks, connected by an interconnect

The term MPP system is often used as a synonym for a cluster system with a large number of nodes. Each node has a set of local resources, such as one or more CPUs, memory, and disks, and each can use only its local resources. The nodes communicate via a shared network.

When each node can only access its own resources directly, it is called a shared nothing architecture. XPS and DB2 UDB DPF mirror this hardware architecture in software by having logical nodes, called coservers in XPS and partitions in DB2, that correspond to the physical nodes of a cluster. Each logical node has its local resources (processes) corresponding to CPUs, memory, and logical disks. The only communication between the logical nodes is via message passing with the XPS Message Passing Facility (XMF) in XPS and the Fast Communication Manager (FCM) in DB2 UDB DPF.

An alternative to a shared nothing architecture is a shared everything architecture, which is found in database systems that support only single symmetric multiprocessing (SMP) nodes. Another alternative is a shared disk architecture, which is used by some database systems for cluster systems and assumes synchronized access to a shared disk subsystem.

2.3 Defining an instance

We define an instance (of an RDBMS) as the physical instantiation of that RDBMS on a cluster of computers. A special case of a cluster is one that consists of just one computer (node). An instance comprises four basic components, all of which are resources that are necessary to operate an RDBMS in a cluster environment. These components are memory, CPU (processes), disk, and interconnect. This definition is a common definition for any RDBMS that works in a cluster environment, including both XPS and DB2 ESE DPF. The first three of these components are common to an RDBMS even in non-clustered environments. Figure 2-2 illustrates and describes these components.

Each logical node of an instance, connected to its peers by the interconnect, has the following components:
- Memory: One or more shared memory segments that are allocated to that instance. The allocation of these segments can be dynamic for some types of segments, or memory can be pre-allocated.
- Processes: One or more processes that do the tasks that are requested by the instance. The allocation of these processes can be dynamic or pre-allocated, depending on the type of process.
- Disk: A storage facility that contains persistent instance structures to hold data. The allocation of these structures can be dynamic or pre-allocated, depending on the type of disk allocation.

Figure 2-2 High-level instance architecture per logical node

Both XPS and DB2 use these common components. However, their implementations are different. For example, in XPS many structures are shared across the instance, which is very different from a DB2 instance, where many structures and processes are per database. The XPS memory and process footprint is more fixed in size than that of a DB2 instance. Although in XPS there are portions and segments that can grow, this should not be necessary given an adequate initial configuration and predicted activity growth. With DB2, processes can be spawned for many reasons, such as connection growth or database creation. This spawning can cause the DB2 memory or process footprint to grow dramatically, depending on the configuration.

There are configuration choices, however, that help pre-allocate either memory structures, processes, or disk so that the size is more predictable or more contained. Figure 2-3 depicts the overall instance architecture of XPS and DB2. This figure shows that the software architecture of XPS and DB2 maps very nicely to the cluster architecture shown in Figure 2-1 on page 20.

Figure 2-3 Components of an XPS or DB2 instance: logical nodes 1 through n, each with memory, processes/threads, and logical disks, connected by inter-node communication
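As a concrete illustration on the DB2 side, an instance is created and started with a few commands. This is a minimal sketch; the instance name db2inst1 and the fenced user db2fenc1 are hypothetical, and on UNIX platforms db2icrt is typically run as root:

   db2icrt -u db2fenc1 db2inst1
   db2ilist
   db2start

The first command creates the instance, db2ilist lists the instances defined on the computer, and db2start starts the current instance.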

The following sections discuss the components of the XPS and DB2 instances.

2.3.1 Informix XPS instance architecture

Figure 2-4 illustrates the high-level XPS architecture. Note that this figure could illustrate an environment of one user or several thousand users. The footprint of the instance could grow if needed, either dynamically or manually by the DBA. However, if the initial instance configuration was large enough, the instance size would not have to change to accommodate the large range of users.

For each coserver, the instance consists of:
- Memory: Shared memory segments: resident, virtual (several possible), and message (optional), plus an optional segment for inter-coserver communication.
- Processes: oninit processes, more commonly known as VPs, or virtual processors.
- Disk: The root dbslice (rootdbs), which contains one dbspace per coserver and holds the logical logs and the physical log; 0 to m dbslices for data, with any number of dbspaces per coserver; and 0 to m temp dbslices, again with any number of dbspaces per coserver.

Figure 2-4 High-level XPS instance architecture

Both XPS and DB2 can have from one to 256 instances on a single computer. Figure 2-5 on page 25 shows the most important subsystems of an XPS instance, which are as follows:
- ASF (Association Service Facility): The ASF subsystem is used for communicating with the client. It is also used when single queries involve several XPS (or IDS) instances and the instance therefore has to talk to other instances.
- SQL: The SQL subsystem is responsible for parsing a query, building a query plan, optimizing and parallelizing it (for example, by constructing a so-called xplan), and for implementing different types of iterators, such as hash joins and hash groups.
- XNC/XNP: This subsystem provides global transactions (for example, those transactions involving several coservers).

- DFM (Data Flow Manager): The DFM is responsible for executing parallel query plans and, as the name implies, controlling the data flow during the execution.
- XMF (XPS Message Passing Facility): The XMF subsystem is responsible for the inter-coserver communication.
- DG Layer (Datagram Layer): The DG layer is the lowest layer of the XMF subsystem. It is the only platform-dependent code of the inter-coserver communication. There are implementations of the DG layer that are specific to certain interconnects, but there are also implementations of the DG layer with UDP and with TCP/IP.
- RSAM (Relational Storage Access Method): The RSAM subsystem handles all the aspects of accessing and using the disks, including logging, scanning, and writing data.
- BELIB (Back-End Library): The BELIB provides an abstraction for the operating system and especially implements the non-preemptive threads of XPS.
- RGM (Resource Grant Manager): The RGM controls access to the XPS system and determines how much resource (especially memory) a parallel query gets.

Figure 2-5 on page 25 also shows how these subsystems handle the different components of an XPS instance. The XMF and DG Layer are responsible for the interconnect, RSAM for the logical disks, BELIB for processes and threads, and the RGM controls how much memory a parallel query gets.

Resources such as shared memory, logical disks (for example, dbspaces and dbslices), and logical and physical logs are allocated in XPS at the instance level, not at the database level. Also, most other configuration parameters are set at the instance level and not at the database level.

Figure 2-5 Subsystems of XPS: the client connects through ASF; XNC/XNP, SQL, DFM, RSAM, RGM, XMF (with the DG Layer beneath it), and BELIB sit between the client interface and the disk, interconnect, process, and memory subsystems

2.3.2 DB2 Universal Database instance architecture
With DB2, each instance might have one or more databases and one Database Manager Configuration file. In addition, each database has its own Database Configuration file, catalog tables, logs, reserved buffer pool area, and table spaces. Table spaces can be regular, long (for LOB data), user temporary, and system temporary. Tuning parameters, resource management, and logging can differ for each database in the instance and can be controlled at the database level. Configuration parameters at the instance and database level can be set using the Control Center.

Temporary table spaces are used to store temporary tables that are created and managed by the database management system for sorting and other DBMS activities. By default, one temporary table space, TEMPSPACE1, is created. But additional temporary table spaces can be created with different page sizes (for example, 4 KB, 8 KB, 16 KB, and 32 KB), as sketched below. Temporary objects are allocated to temporary table spaces in round-robin fashion. A default user temporary table space is not created with the installation.
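For illustration, the following is a minimal sketch of creating an additional system temporary table space with an 8 KB page size; the names bp8k and tempspace8k and the container path are hypothetical. A buffer pool with a matching page size must exist first:

   -- A buffer pool is required for each page size in use
   CREATE BUFFERPOOL bp8k SIZE 1000 PAGESIZE 8K;

   -- An additional system temporary table space that uses the 8 KB pool
   CREATE SYSTEM TEMPORARY TABLESPACE tempspace8k
     PAGESIZE 8K
     MANAGED BY SYSTEM USING ('/db2/tempspace8k')
     BUFFERPOOL bp8k;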

DB2 and XPS instance architecture differences
The DB2 instance has many components that are similar to XPS. But there are at least three major differences from a high-level architectural point of view:
1. The DB2 Administration Server (DAS) is a special type of instance that is typically created when a DB2 instance is first created. The DAS allows remote administration tools, such as the DB2 Control Center, access to a DB2 instance. (A brief start/stop sketch follows this list.)

Note: The DAS does not have to be running to use a DB2 instance. However, if the DBA wants to administer an instance remotely via tools such as the DB2 Control Center, then the DAS does have to be running.

There is only one DAS, which handles all instances, on each computer.

2. Memory allocation is done at the database manager and instance level, but also at a lower level of granularity, such as per connection, application, or database. You can find more detail in 2.6, “Memory management” on page 48.
3. Process allocation is done both dynamically and at instance startup. This is very different from the XPS instance, where the processes are pre-allocated and are not increased except by a DBA.
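Returning to the DAS mentioned in item 1, here is a minimal hedged sketch of its lifecycle commands, assuming the DAS has already been created on the server:

   db2admin start
   db2admin stop

The first command starts the DAS so that remote tools such as the Control Center can connect; the second stops it, while the DB2 instances themselves keep running.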

Figure 2-6 on page 27 shows a high-level view of a DB2 instance on a single computer.

Figure 2-6 DB2 architecture — one instance (a computer hosting the DAS and a DB2 instance with two databases, each with its own buffer pools, processes, log files, and table spaces)

The DB2 architecture that is shown in Figure 2-6 has the following components:
- One DAS (DB2 Administration Server)
- One instance with a Database Manager
- Two databases: database1 and database2
- The following allocations for each database:
  – Processes
  – Log buffers and log files
  – Buffer pools: two for database2 and one for database1
  – Table spaces:
    • Default table spaces (built when the database is created): syscatspace, userspace1, and tempspace1 for both databases
    • Table and index table spaces for database2: fact_table and fact_index

Figure 2-7 on page 29 shows the subsystems of DB2, which are described as follows:

- CCI (Client Connect Interface): The layer that is responsible for communication with a client. Thus, this subsystem roughly corresponds to the ASF subsystem of XPS.
- RDS (Relational Data Services): This subsystem optimizes a query and builds a query plan. The services that the RDS subsystem provides are implemented in XPS in the SQL subsystem.
- DMS (Data Management Services): The subsystem that is responsible for the execution of a query. It receives the query plan from the RDS subsystem and returns the results to the RDS after the execution of a query.
- FCM (Fast Communication Manager): This subsystem is responsible for the inter-partition communication. It roughly corresponds to the XMF and DG Layer subsystems of XPS.
- DPS (Data Protection Services): This subsystem contains the log manager, the lock manager, and the transaction manager. Most of the tasks of this subsystem are provided in XPS by the RSAM subsystem.
- BPS (Bufferpool Services): The subsystem that manages the buffer pool and retrieves data from and writes data to the disk subsystem by using the services provided by the OSS subsystem. Most of these services are provided by the RSAM subsystem in XPS.
- BSU (Base System Utilities): The “main engine” of DB2 that invokes all the other subsystems at startup. It allocates the memory for the instance and for the databases.
- OSS (Operating System Services): The subsystem that provides all the operating system services for all other subsystems. It handles memory and I/O, manages processes or threads, and provides synchronization mechanisms. It corresponds to the BELIB subsystem in XPS.

Figure 2-7 Subsystems of DB2: the client connects through CCI to RDS; DMS, BSU, FCM, DPS, and BPS sit above OSS, which interfaces with the disk, interconnect, memory, and process subsystems

Although there are many correspondences between the subsystems of XPS and DB2, there are also many differences in the details. DB2 uses a structure called Query Graph Model to internally represent a query plan.

2.4 Storage architecture

This section discusses how data is organized on disk in XPS and DB2, including the concepts and terminologies used for storing and retrieving data. The commonalities and differences in the storage architectures are highlighted.

Both XPS and DB2 have similar approaches to the allocation of disk space for any kind of storage, such as tables, indexes, and log files. However, there are some major differences that are valuable to understand, such as the terminologies. Some terms are the same for different structures, while others are different for identical structures.

Figure 2-8 shows the main concepts and terms for disk structures that are used by XPS and DB2.

- page: In XPS, configurable size per instance; in DB2, configurable size per table space.
- chunk: In XPS, contiguous pages from disk allocated to a dbspace.
- container: In DB2, contiguous pages from disk allocated to a table space.
- dbslice: In XPS, a non-empty ordered set of dbspaces.
- dbspace: In XPS, a logical collection of one or more chunks.
- tablespace: In DB2, a logical collection of one or more containers; in XPS, a logical collection of one or more extents for a table, partition, or fragment.
- extent: In XPS, contiguous pages from a chunk; in DB2, contiguous pages from a container.

Figure 2-8 Disk terms for XPS and DB2

Although data are organized logically in rows and columns, it is not possible for an I/O operation to read or write an individual column of a single row, or even the entire single row. An I/O operation always works on the set of rows that are on the same page. (Pages are discussed in more detail in 2.4.1, “Pages” on page 31.) A container in DB2 and a chunk in XPS contain a set of contiguous pages. (These two concepts are compared and explained in 2.4.2, “Containers and chunks” on page 31.) When pages are needed for a database object, such as a table, neither DB2 nor XPS allocates individual pages from a container or chunk, but rather a set of pages called an extent. (The concept of an extent is also discussed in 2.4.2, “Containers and chunks” on page 31.)

When a database object is created in SQL, one does not specify chunks or containers as the locations where the database objects are to be stored. Instead, a logical disk or a set of logical disks is specified. These logical disks are called dbspaces in XPS and table spaces in DB2. A dbspace contains one or more chunks, and a table space contains one or more containers. In XPS there is also the concept of dbslices; a dbslice is an ordered set of dbspaces. (Table spaces, dbspaces, and dbslices are discussed in 2.4.3, “Logical disks” on page 35.) The term table space is also used in XPS, but with a very different meaning than in DB2. A table space in XPS is the set of extents that belong to one single database object. (This meaning of table space is discussed in 2.4.2, “Containers and chunks” on page 31.)

2.4.1 Pages In XPS and DB2, the rows of a table are organized in pages. For XPS and DB2 each page consists of a page header and up to 255 so-called slots for storing rows. While in XPS, a single row might span several pages, in DB2 a single row is always stored on a single page. While the page size is the same for all pages in one instance in XPS, the page size can be selected individually for each table space in DB2. XPS supports page sizes of 2 KB, 4 KB, and 8 KB. Pages in DB2 can be 4 KB, 8 KB, 16 KB, and 32 KB. The page is the basic unit of I/O for XPS and DB2. XPS allows larger transfer sizes for so-called light scans and light appends. These do I/O by circumventing the bufferpool with transfer sizes of 64 KB or 128 KB (depending on the platform). Both database systems support the combination of adjacent pages in I/O operations.

The criteria for choosing the page size are similar for XPS and DB2. The advantage of large page sizes is that more data are read and written with a single I/O operation. Therefore, the performance of the instance for full table scans is usually better with large page sizes. The advantage of small page sizes is that if just a single short row has to be accessed, the amount of additional unwanted data that has to be read is smaller. Therefore, the same general rules apply for XPS and DB2: for a workload similar to OLTP, usually smaller page sizes are better, while for a data warehousing workload, usually larger page sizes are better. However, because at most 255 rows fit on a page, it is possible that space is wasted if short rows are stored on large pages. This is true for both XPS and DB2.

2.4.2 Containers and chunks
A set of contiguous pages allocated from the operating system in one piece is called a chunk in XPS and a container in DB2. Often a container or chunk is mapped to a single physical disk. However, the logical volume to which a container or chunk is mapped can also be a RAID volume or device.

The maximum size of a chunk in XPS is operating system dependent (for example, 1 TB on a Solaris operating system). In DB2, the maximum size of a container belonging to a regular table space is determined by the maximum size of a regular table space. This size again depends on the page size: 64 GB for 4 KB pages; 128 GB for 8 KB pages; 256 GB for 16 KB pages; and 512 GB for 32 KB pages.

Chunks in XPS can be put on raw devices or cooked files. If raw devices are used, one can also specify an offset on the raw device in addition to the size of the chunk.

DB2 supports two types of table spaces: System Managed Spaces (SMS) and Database Managed Spaces (DMS). Accordingly, there are two different types of containers, belonging either to SMS table spaces or to DMS table spaces. DMS containers are very similar to XPS chunks. They can be created on cooked files or on raw devices. As with chunks in XPS, these containers are completely allocated when the container is created, and the allocation of database objects within a container/table space is managed by the database system. In contrast to that, containers in an SMS table space are directories in a file system. The database objects correspond to individual files and are, therefore, managed by the operating system. For XPS customers who are familiar with Informix Standard Engine, the file structure and allocation behavior of SMS table spaces is very similar to that of the Standard Engine.

Many guides for DB2 recommend using either SMS or file-based DMS table spaces for temporary table spaces. The two styles are contrasted in the sketch that follows.
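For illustration, here is a minimal sketch of creating one table space of each type; all names, paths, and sizes (in pages) are hypothetical:

   -- SMS: containers are directories; the operating system manages the files
   CREATE TABLESPACE sms_ts
     MANAGED BY SYSTEM
     USING ('/db2/sms_ts');

   -- DMS: containers are pre-allocated files or raw devices managed by DB2
   CREATE TABLESPACE dms_ts
     MANAGED BY DATABASE
     USING (FILE '/db2/dms_cont1' 10000,
            DEVICE '/dev/rdsk/c1t0d0s4' 10000);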

The concept of an extent exists in both XPS and DB2. In both cases, an extent denotes a set of contiguous pages in a chunk or container, respectively. However, the use of extents is very different in the two systems. To better understand this concept, let us first look at how a dbspace or table space is filled when data is appended to a table. For simplicity, we consider only the non-DPF cases for table spaces, because the extension to a scenario with several partitions is straightforward.

Figure 2-9 on page 33 and Figure 2-10 on page 34 depict how the allocation of pages in tables is done in DB2 and XPS. We assume in both cases that a table is loaded or inserted into an empty dbspace or table space, respectively. The table fills 38 data pages in DB2 and XPS. It is assumed in these examples that the table space contains three containers and that the dbspace contains three chunks. The size of each container and chunk is 32 pages.

Figure 2-9 on page 33 shows how the page allocation can be done in DB2. The size of extents in the table space in this example is four pages. The extents are allocated in a round-robin fashion across all three containers:
- The first extent with four pages is allocated in container 1.
- The second extent in container 2.
- The third extent in container 3.
- The fourth again in container 1, and so on.

Typically, the prefetch size in this case would be set to 12 pages. Therefore, one access would read four pages from each container, which means that the three containers really act much like a software RAID 0. The extent size in DB2 is thus used similarly to a stripe size for RAID 0. A table space definition matching this example is sketched after Figure 2-9.

Figure 2-9 Allocation of data in containers in DB2: 4-page extents allocated round-robin across containers 1, 2, and 3 of one table space
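As a hedged illustration of the figure's layout, the following DDL creates a DMS table space with three containers, 4-page extents, and a 12-page prefetch size; the paths are hypothetical, and the container size (3200 pages each) is chosen arbitrarily rather than the 32 pages of the simplified figure:

   CREATE TABLESPACE ts_example
     MANAGED BY DATABASE
     USING (FILE '/db2/c1' 3200,
            FILE '/db2/c2' 3200,
            FILE '/db2/c3' 3200)
     EXTENTSIZE 4
     PREFETCHSIZE 12;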

Figure 2-10 on page 34 shows how pages are allocated in XPS. The extent size in XPS is an attribute of a table, not of the dbspace in which the table is created. There are also two different extent sizes that can be specified for a table: the initial extent size, which is 20 pages in this example, and an extent size for the following extents, which is 8 pages in this example.

Given this, the first extent with 20 pages is allocated from the first chunk. The second extent is allocated again from the first chunk. Because only four pages then remain and extents cannot span chunks, the third extent has to be allocated from the second chunk. Finally, the last extent of eight pages is also allocated from the second chunk. Because adjacent extents can be combined into a single extent, the last two extents form one extent with 16 pages. All these extents form a table space, which in XPS is the set of all extents belonging to one database object. A table definition matching this example is sketched after Figure 2-10.

Figure 2-10 Allocation of data in chunks in XPS: a 20-page initial extent and one 8-page next extent in chunk 1; the following two 8-page extents are allocated from chunk 2 and combine into a single 16-page extent
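For illustration, the following Informix DDL sketches a table with this extent behavior. The table and dbspace names are hypothetical, and a 4 KB page size is assumed, because EXTENT SIZE and NEXT SIZE are specified in kilobytes (20 pages = 80 KB, 8 pages = 32 KB):

   CREATE TABLE t1 (
       c1 INTEGER,
       c2 CHAR(10)
   ) IN dbspace1
     EXTENT SIZE 80
     NEXT SIZE 32;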

Using several containers provides a striping mechanism in the database. That is, the containers increase the parallelism for accessing the disks, while chunks are essentially a way to increase the capacity.

The following list summarizes the commonalities and differences of extents in DB2 and XPS:
- Extents in XPS and DB2 do not span chunk or container boundaries.
- Extent size in XPS is defined at the table level. Different tables in the same chunk can have extents with different sizes. Even the same table can have several extent sizes (even more than two, because the extent size can be reduced when little space remains in all chunks of a dbspace, and the extent size is doubled after a certain number of extent allocations for one table). In DB2, the size of all extents in the same table space is always the same.
- The purpose of extents in XPS is to allocate the space of a table contiguously, to allow sequential scans of a table without unnecessary disk head movements. Therefore, having one extent per table per chunk would be best. The purpose of extents in DB2 is to stripe data across containers and to allow parallel access to all containers. Parallel I/O is achieved via fragmentation (see 2.7.1, “Fragmentation in XPS” on page 54) in XPS.

2.4.3 Logical disks

In XPS, one or more chunks form a dbspace, while in DB2 one or more containers form a table space. In a DPF environment, a table space can be distributed across several database partitions, while in XPS a dbspace is always local to a single coserver. There is a separate concept in XPS called a dbslice. A dbslice is an ordered set of at least two dbspaces. One or several dbslices are used for fragmenting tables along the same lines as a table space in a partitioned environment is used to partition tables. However, while the dbslice concept is independent of the coserver concept (it is possible to have a dbslice within one coserver), the data partitioned in a table space is always partitioned along partition borders. A database partition group (see 2.5, “Parallelism” on page 42) can be specified, which determines which partitions are involved in a table space. Figure 2-11 gives examples of dbslices and table spaces in an environment with several logical nodes.

Figure 2-11 Examples of table spaces and dbslices in a multi-node environment: a dbslice typically spans one dbspace on each coserver, while a table space spans one or more partitions

Logical disks are used to store database objects such as tables and indexes. In XPS, the dbspace used for storing a table is specified during the creation of the table. Similarly, the table space used for storing a table in DB2 is determined during the creation of the table. In XPS, the dbspace that is used for an index is determined when the index is created. In DB2, the table space that is used for the indexes of a table is specified when the table is created; this is only possible for DMS table spaces. Indexes for tables in SMS table spaces are always in the same table space as the table. This means that two indexes (i1 and i2 for a table t) can be in different dbslices for XPS but are in the same table space for DB2. The sketch below illustrates placement at creation time.
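For illustration, a minimal sketch follows; all object names are hypothetical. On the DB2 side, the PARTITIONING KEY clause shown applies to DPF environments and uses the DB2 V8 syntax:

   -- DB2: data and index placement fixed when the table is created
   -- (INDEX IN is available for DMS table spaces)
   CREATE TABLE sales (
       cust_id  INTEGER NOT NULL,
       amount   DECIMAL(10,2)
   ) IN ts_data
     INDEX IN ts_index
     PARTITIONING KEY (cust_id) USING HASHING;

   -- XPS: index placement chosen when the index is created
   CREATE INDEX i1 ON t1(c1) IN dbspace2;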

In XPS, it is possible to specify that dbspaces should be mirrored by the database system. In DB2, mirroring of table spaces has to be done either by a logical volume manager or in hardware by the disk array.

The dbspaces and dbslices are owned by the instance, while table spaces are owned by the database. Therefore, tables from different databases can share the same dbslice in XPS (the root dbslice has to be shared between all databases), while in DB2 tables in different databases are always in different table spaces.

Logs in XPS (the logical logs and the physical log, which is used for before-images of data pages) are allocated in dbspaces and dbslices, while logs in DB2 are allocated independently of table spaces.

Backups (full, incremental, and delta, which are called level 0, level 1, and level 2 in XPS) can be done at the table space or dbspace level in both XPS and DB2. A restore at the table level is done with separate tools in both DB2 and XPS: High Performance Unload (HPU) in DB2 and archecker in XPS. A table space level backup is sketched below.
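To make the granularity concrete, here is a hedged sketch of a DB2 table space level backup and restore from the command line; the database name, table space name, and path are hypothetical:

   db2 backup database mydb tablespace (ts_data) online to /backups
   db2 restore database mydb tablespace (ts_data) online from /backups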

There are several kinds of dbspaces or dbslices and table spaces in XPS and DB2. XPS knows:
- Regular dbspaces or dbslices, which are used for permanent and temporary tables
- Temp dbspaces or dbslices, which are used for temporary tables

Regular dbspaces or dbslices allow operations that are logged, but temporary dbspaces do not allow logged operations on the objects created within them. DB2 knows:
- Regular table spaces
- System temporary table spaces
- User temporary table spaces
- Large table spaces

Regular table spaces are used for storing permanent tables. System temporary table spaces are used for storing temporary data that is needed by the database server itself, for example, temporary space for sorting or for storing the overflow of a hash join. User temporary table spaces are used for storing global temporary tables, and large table spaces are used for storing large objects.

Both XPS and DB2 require certain dbslices and table spaces in order to work. The only required dbslice in XPS is the root dbslice. The root dbslice is created automatically when the XPS server is initialized and contains exactly one dbspace per coserver. Although not required, it is recommended to have at least one regular dbslice for data (to separate the critical data stored in the root dbslice from the regular data) and to have a temporary dbslice for performance reasons. The following table spaces are required by DB2 for each database:
- Catalog table space (SYSCATSPACE)
- User table space (USERSPACE1)
- System temporary table space (TEMPSPACE1)

They are created when a new database is created. An additional user temporary table space is required when declared temporary tables have to be created.

2.4.4 Logging
Both XPS and DB2 have a number of log buffers and log files for capturing to disk the groups of changes made to pages in memory. The overall objective of logging is to provide the ability to replay or recover changes made in memory if the instance should fail. The approaches are very similar, but the implementations are significantly different in XPS and DB2.

XPS logging
XPS uses the logical log to record modifications to database objects for recovery or restore purposes. DDL operations (such as CREATE TABLE and DROP INDEX) are recorded for both logged and non-logged tables. DML operations (INSERT, UPDATE, DELETE) are recorded only for logged tables. Certain internal operations (such as checkpoints, extent allocations, and dbspace creations) are also logged.

XPS uses the physical log to store, in their pre-modified state, certain pages that have been modified since the last checkpoint. If the server shuts down abnormally, these before-images are recovered during the subsequent fast recovery, after which the appropriate operations recorded in the logical log are re-executed before the server switches to quiescent or online mode.

Figure 2-12, “XPS log buffer flushing” on page 38 shows the log structures that XPS uses.

The logical log buffers (by default, three memory-resident buffers of 16 pages each) capture changes to objects. At some point, the current buffer must be flushed to a logical log on disk; while this flushing is taking place, one of the other logical log buffers is used. The physical log buffers (by default, two buffers of 16 pages each) capture the first before-image of changed pages. At some point, the current buffer must be flushed to the physical log file on disk; while the flushing is taking place, the other physical log buffer is used. Flushing occurs when an application connected to an unbuffered-log database issues a COMMIT WORK, or when the current log buffer is full.

Figure 2-12 XPS log buffer flushing

Log files
The following are the types of log files in XPS:
- Physical log file (one only), which receives pages that are flushed from the physical log buffers.
- Logical log files (3 minimum; 32,767 maximum), which receive pages that are flushed from the logical log buffers.

Important: In XPS 8.4x or before, logical logs must be added manually if more are needed. A level-0 archive must also be taken before the new logs become available.

In XPS 8.4x and beyond, dynamic logging can be enabled by the DBA. In this case, logical logs can be added automatically by the instance if chunk space is available. If logs are added, the instance retains these. That is, they are not freed up when they are no longer needed.

XPS database modes
XPS has two log modes:
1. Logged database, which captures DML and DDL for this specific database via the physical and logical logs. Logical recovery is possible, and explicit transactions (BEGIN WORK/COMMIT WORK) are possible. If a BEGIN WORK is not passed by the application, the instance wraps the statement in a BEGIN WORK/COMMIT WORK. This is known as a singleton transaction.
2. ANSI database, which means that all DML and DDL are captured. Any SQL is always in a transaction, and logging cannot be turned off for this type of database. This is the type of database that DB2 has available.

DB2 logging and database modes
DB2, similar to XPS, must at some point log to disk the changes made to memory pages to facilitate recovery of a failed instance. DB2 has log buffers that are flushed to log files on disk, as does XPS, although the implementation is very different. Shown in Figure 2-13 on page 40 are the log buffers and files for two different databases. Recall that with DB2, each database has its own set of log buffers and files, while with XPS, log buffers and files are shared across the instance.

Each database has its own log buffers and its own primary and secondary log files. Log buffer flushing occurs when a transaction commits or a group of transactions commits, when the log buffer is full, or when some other internal database event occurs.

Figure 2-13 DB2 log buffer flushing

DB2 uses a logging approach similar to XPS, with some significant differences:
- Log files are allocated at the database level, one set per database. (This is different than XPS.)
- DB2 does not have a physical log file.

The logging mode of a database cannot be turned off. However, a table can be created (or altered) with the NOT LOGGED INITIALLY option on the CREATE TABLE or ALTER TABLE statement, in which case no logging activity is captured for operations on that table within the same unit of work; normal logging resumes afterwards. A sketch follows.
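For illustration, a minimal hedged sketch with a hypothetical staging table; the not-logged state lasts only until the unit of work that activates it is committed:

   -- Create the table with logging initially disabled
   CREATE TABLE staging (c1 INTEGER) NOT LOGGED INITIALLY;

   -- Later, re-activate the not-logged state for a bulk operation
   -- (must be done inside the unit of work that performs the operation)
   ALTER TABLE staging ACTIVATE NOT LOGGED INITIALLY;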

Types of log files
The following are the types of log files with DB2:
- Primary log files, which are a fixed amount of storage allocation. These files are pre-allocated at the first connection to the database.
- Secondary log files, which are allocated one at a time as needed (up to the value of the LOGSECOND parameter) when the primary log files become full.

Types of logging
There are two types of logging:
- Circular logging supports non-recoverable databases and uses only the log files that are active in the instance.

Note: Roll-forward recovery is not available with circular logging.

- Archival logging archives a log file as it becomes inactive. This type of logging supports roll-forward recovery and is enabled as sketched below. This method uses the following types of log files:
  a. Active files, which contain information for transactions that have not been committed or rolled back, as well as transactions that have been committed or rolled back but whose changes have not yet been written to disk.
  b. Online archived files, which are logs that are no longer needed and are considered closed, but that still reside in the active log subdirectory.
  c. Offline archived files, which are logs that have been moved out of the active log subdirectory, either manually or by a user exit program that is called (via USEREXIT) when a log file is ready for archiving. These logs can simply be moved out of the current active log directory or moved to tape.
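For illustration, a hedged sketch of enabling archival logging on a DB2 V8 database; the database name and path are hypothetical. Enabling log retention places the database in backup pending state, so a full backup is required before it can be used:

   db2 update db cfg for mydb using LOGRETAIN ON USEREXIT ON
   db2 backup database mydb to /backups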

2.4.5 Storage architecture summary
Table 2-1 summarizes most of the terms that are discussed in 2.4, “Storage architecture” on page 29.

Table 2-1 Disk terms comparison

page
  XPS: The basic unit of I/O. The same page size is used for the entire instance. The page size can be 2 KB, 4 KB, or 8 KB.
  DB2: The basic unit of I/O. The page size can be chosen for each table space separately. The page size can be 4 KB, 8 KB, 16 KB, or 32 KB.

chunk
  XPS: One or more contiguous pages on disk allocated to a dbspace.
  DB2: N/A

extent
  XPS: One or more pages from a chunk. Can be free or allocated, and can be used for tables, indexes, and log files, for example.
  DB2: One or more pages from a container allocated to the instance. Can be free or allocated, and can be used for a table space or index.

container
  XPS: N/A
  DB2: One or more contiguous pages on disk allocated to a table space.

dbspace
  XPS: A logical collection of one or more chunks.
  DB2: N/A

dbslice
  XPS: An ordered set of dbspaces.
  DB2: N/A

table space
  XPS: The extents belonging to one table.
  DB2: A logical collection of containers or disk storage.

2.5 Parallelism

This section discusses how XPS and DB2 make use of parallelism and how they differ in their use of parallelism.

XPS and DB2 support two kinds of parallelism:
- Within a logical node (intra-partition parallelism, intra-coserver parallelism)
- Between logical nodes (inter-partition parallelism, inter-coserver parallelism)

To understand how both servers make use of parallelism, you must understand the process or execution model of those two servers, which is discussed in the following sections.

2.5.1 The process model of XPS
Each XPS coserver uses a relatively small number of processes, called virtual processors (VPs). A VP is a process that is designed to do work similar to a physical processor on a computer. While a processor is responsible for managing processes on the system, a VP is responsible for managing threads.

Table 2-2 on page 43 shows the classes of virtual processors and the types of processing that are performed. Each class of virtual processor is dedicated to processing certain types of threads.

Table 2-2 VP class descriptions

VP Class Description

CPU All user threads and some threads for the server system run on VPs in this class. No blocking system calls are allowed on this VP, such as activities that read and write from disk or wait for messages from the application.

AIO The AIO (asynchronous I/O) VP performs disk I/O to cooked chunks, and to raw devices when kernel I/O is not turned on, including disk reads and writes needed for SQL statements, checkpoints, and other I/O activities. The number of AIO VPs can be configured by the administrator.

PIO The PIO VP runs threads for the server that perform writes to the physical log on disk. The PIO VPs are automatically allocated when the server is started.

LIO The LIO VP runs internal threads for the server that perform writes to the logical log on disk. The LIO VPs are automatically allocated when the server is started.

SHM The shared memory class handles the task of polling for new connections using the shared memory method of communication to the application. It also handles incoming messages from the application.

TLI The TLI class handles polling tasks for the TLI programming interface for TCP/IP or IPX/SPX communication with the application.

SOC The SOC class handles polling tasks for the TCP/IP Berkeley sockets method of communication with the application.

ADM Performs administrative functions.

FIFO Performs reads and inserts for high performance loading and unloading through FIFO.

Note: For more information, refer to Chapter 11 "Virtual Processors and Threads" in the IBM Informix Extended Parallel Server Performance Guide, G251-2235.

2.5.2 The process model of DB2

DB2 has processes that are allocated for the database manager (instance), database, or for the requesting connection. Figure 2-14 shows a detailed view of the DB2 process model. Notice that many processes are per instance or per database, which is very different from XPS, where all processes or VPs are shared across the instance.

Figure 2-14 DB2 process model with DPF

The following tables describe some of the important processes visible during different states of a DB2 instance:
- DB2 processes per instance
- DB2 processes per connection
- DB2 processes per database
- DB2 processes on the catalog partition

With a few exceptions, these processes are on all logical database partitions.

Table 2-3 on page 45 shows DB2 processes on each logical database partition per instance. These processes exist whether or not there are connections or active databases.

Table 2-3 DB2 processes per instance

Process Name Description

db2sysc Main DB2 system controller or engine.

db2gds Global Daemon Spawner that starts all EDUs (one per partition).

db2ipccm Inter-Process Communication Manager, listener for local client (one per partition).

db2tcpcm TCP/IP Communication Manager, listener for remote TCP/IP requests.

db2tcpdm TCP/IP Discovery Communication Manager for configuration.

db2snacm SNA/APPC Communication Manager.

db2wdog DB2 Watch Dog.

db2fcmdm Fast Communication Manager Daemon for handling inter-partition communications (DPF only).

db2pdbc Parallel Database Controller handles parallel requests from remote nodes (DPF only).

db2cart Determines when a log file can be archived and invokes the user exit to do the actual archiving. There is one db2cart process per instance, but it only runs if there is at least one database in the instance which has USEREXIT enabled.

db2fmtlg Pre-allocates log files in the log path when the database is configured with LOGRETAIN ON and USEREXIT OFF.

Table 2-4 shows additional DB2 processes per partition when there are connections present.

Table 2-4 DB2 processes per connection

Process name Description

db2agent DB2 coordinator agent.

db2agnta An idle subagent that was used in the past by a coordinator agent and is still associated with that coordinating agent process.

db2agntp A subagent that is currently performing work on behalf of the coordinating agent it is associated with.

Table 2-5 describes the processes that exist for each active database, in addition to the instance and connection processes.

Table 2-5 DB2 processes per database

Process name Description

db2pfchr The buffer pool prefetchers, which read data and index pages from disk into the database buffer pool(s) before the pages are requested on behalf of applications.

db2pclnr The buffer pool page cleaners, which asynchronously write dirty pages from the buffer pool(s) back to disk.

db2logts This process is used for collecting historical information about which logs are active when a table space is modified. This information is recorded in the DB2TSCHG.HIS file in the database directory. It is used to speed up table space roll forward recovery.

db2loggr The database log reader, which reads the database log files during transaction processing, restart recovery, and roll forward operations.

db2loggw The database log writer, which flushes log files to disk.

db2dlock Local deadlock detector; there is one per database partition. It scans the lock list and looks for deadlock conditions.

Table 2-6 shows DB2 processes that reside on the catalog partition.

Table 2-6 DB2 processes on catalog partition

Process name Description

db2glock Global deadlock detector.

2.5.3 Intra-node parallelism

With intra-node parallelism on an SMP machine, multiple processes can serve a given user simultaneously. A database operation is broken down into component steps that can be performed in parallel within a given coserver (XPS) or partition (DB2). In XPS, virtual processors of the CPU class can run multiple session threads, working in parallel, for a single client. With DB2, a coordinating agent (db2agent) process can use multiple subagent (db2agntp) processes.

For a DB2 database to exploit this type of parallelism, the database manager configuration parameter INTRA_PARALLEL must be set to YES. When performing a PREP or a BIND, the DEGREE parameter can be set to a specific integer degree of intra-partition parallelism to be generated, or it can be set to ANY. If set to ANY, the optimizer determines the degree of intra-partition parallelism. The degree for dynamic SQL can be specified by setting the CURRENT DEGREE register. The default value for DEGREE is controlled by a database configuration parameter named DFT_DEGREE.
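As a brief sketch of these settings (the database name mydb and the degree value of 4 are hypothetical, not from this redbook), the parameters can be set from the DB2 command line processor:

   db2 update dbm cfg using INTRA_PARALLEL YES
   db2 connect to mydb
   db2 update db cfg for mydb using DFT_DEGREE -1    # -1 is the equivalent of ANY
   db2 "SET CURRENT DEGREE = '4'"                    # degree for dynamic SQL in this session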

In addition to CPU parallelism, both products facilitate parallel I/O by fragmenting or striping (round robin) data across multiple storage devices.

2.5.4 Inter-node parallelism

You can configure the database server on a single computer or a parallel-processing platform as a set of coservers (XPS) or partitions (DB2). In such an environment, a single database operation can be performed in parallel on multiple coservers or partitions. This provides inter-node parallelism. As with intra-node parallelism, a database operation is broken down into component steps that can be performed in parallel. The steps are then executed across multiple coservers or partitions.

With inter-node parallelism on an MPP machine, multiple processes can serve a given user simultaneously. As before, a database operation is broken down into component steps that can be performed in parallel, here on multiple coservers (XPS) or partitions (DB2) in a database instance. In DB2, the Database Partitioning Feature (DPF) provides this functionality.

With both XPS and DB2, the node can be a logical node. Thus, this type of parallelism can be used on SMP machines, clusters, or MPP machines.

2.6 Memory management

This section describes the memory models for XPS and DB2.

2.6.1 XPS memory model

This section describes memory allocation for XPS. To aid in understanding, refer to the visual of memory allocation depicted in Figure 2-15. XPS shared memory is divided into memory segments that consist of three portions: the resident portion, the virtual portion, and the message portion.

Figure 2-15 XPS memory allocation

Segment descriptions
The XPS instance has a minimum of two shared memory segments: one resident and one virtual. The message portion is optional, but one or more message segments are typically present. However, if no shared memory connections are configured, there is no message portion.

Resident portion
The resident portion is a fixed-size portion that contains many structures. The significant structures for the resident portion are:
 Buffer cache (BUFFERS). A single, fixed-size set of memory buffers of 2 KB, 4 KB, or 8 KB. The buffer cache holds the majority of XPS data pages read from disk for sharing among users. Pages that are read via a light scan are not held in the buffer cache. A light scan is a type of read where pages are placed in private memory buffers. This type of read is typically used, for example, in data warehousing applications.
 Locks (LOCKS). A structure that is used to enforce locking in XPS. A single set of lock structures is allocated initially in the resident portion. If the instance needs more locks, it will attempt to allocate additional lock structures in the virtual segment if memory is available, up to a defined limit.
 Log buffers (LOGBUFF / PHYSBUFF). Memory buffers that temporarily store either pages or records that will be sent to either the logical log files or the physical log file.
 Least recently used queues (LRUS). Queues for organizing pages in the buffer cache with respect to popularity. Modified and unmodified pages are separated into different queues. The buffer cache pages are managed by LRU and by priority based on the specific page type.

Virtual portion The virtual portion contains any shared information in the database server that can grow or shrink, or be allocated or de-allocated (memory pools). The number of segments in the virtual portion can grow as needed during the life of the database server. If the instance needs more memory, it can request additional virtual segments if available. The instance retains these segments (does not release them automatically) until the next computer or instance reboot.

The most significant structures found in the virtual portion are:
 Session pools. For each connection, a session is created, including the SCB (session control block), TCB (thread control block), and RSTCB (RSAM thread control block). These control blocks hold session information for that session. When the application or user disconnects, the memory pools are freed and available for another usage.
 Miscellaneous memory pools. For operations such as sorting, caches (for example, the stored procedure cache), and PDQ allocations.
 Light scan buffers. Private buffers into which data is read for read-only queries. Typically used in data warehousing.
 Light append buffers. Used for loading data via the High Performance Loader Express (HPL Express). Using HPL allows bypassing of the BUFFERS and LRUS overhead in the resident segment.

Message portion
The message portion contains message buffers that are used for local/shared memory connections. These would be connections originating on the same computer as the instance. The segments in this portion have read/write permissions for all users. The significant structures in the message segment(s) are:
 Client-side message buffers, for the client connections to use when making requests to the instance via a shared memory connection.
 Server-side message buffers, for the instance to use when passing results back to the client shared memory connection.

Process memory footprint
With the XPS instance, the processes are referred to as virtual processors, or VPs. Initially a number of VPs of different classes are allocated when the instance starts. (For more information about VPs, see Table 2-2 on page 43.) VPs can also be added or dropped dynamically, depending on the class of VP.

When these processes are allocated, the process space for the instance is fixed. The multi-threaded architecture keeps that process space fixed. The existing VPs control the thread switching within the process space. The control blocks for the thread can in fact grow and shrink. But this is done in the virtual segment(s) of the instance, which is already allocated as well. While the XPS instance can add virtual segments of memory, it does not automatically allocate/spawn additional processes. Figure 2-16 on page 51 depicts how the process space (center portion of the diagram) remains fixed within XPS as the lightweight threads switch on and off the process or VP.



Figure 2-16 Multi-threaded architecture thread context switch

Note: You can find additional information about the XPS memory model in the IBM Informix Extended Parallel Server Performance Guide, G251-2235.

2.6.2 DB2 memory model

The different memory sets in DB2 UDB are Database Manager Shared Memory, Database Global Memory, Application Global Memory, and Agent Private Memory. Figure 2-17 shows the basic memory architecture of a DB2 server for a partition. Each logical database partition has this structure.


Figure 2-17 DB2 shared memory architecture

Database Manager Shared Memory
Database Manager Shared Memory is also known as instance shared memory. There is one Database Manager Shared Memory set on each partition of a DB2 instance. It is used for instance level tasks such as monitoring, auditing, and inter-node communication. This includes the following:
 Monitor heap, which holds Database System Monitor data.
 Audit buffer, which holds audit data if auditing is enabled.
 FCM buffers, which hold Fast Communications Manager buffers.

Database manager shared memory is allocated in full at instance start time (db2start) and de-allocated when it is stopped (db2stop).

Database shared memory
Database shared memory is also known as database global memory. There is one database shared memory set per database on each partition. This memory is used by DB2 to manage the activities of all connections to a specific database associated with an instance. This includes the following:
 Buffer pools hold table and index pages when they are read from disk.
 Database heap holds space for many items such as temporary memory for utilities, event monitor buffers, and log buffers.
 Utility heap specifies the maximum amount of memory that can be used for utilities such as backup, restore, and load (including load recovery). This area of memory might also contain temporary overflows from the package cache and from the catalog cache.
 Lock list specifies the amount of memory allocated to store all locks held by all applications concurrently connected to this database, that is, the total number of locks that can be used by all concurrent applications of a database.
 Catalog cache specifies the amount of memory used to cache system catalog information.
 Package cache specifies the amount of memory used to cache sections for both static and dynamic SQL statements.
 Shared sort memory specifies an upper limit on the amount of database shared memory that can be used for sorting.

Database shared memory is allocated in full at the first connection to the database or when the database is activated, and de-allocated when the database is deactivated (if it was activated) or the last connection disconnects.

Application group or application shared memory
Application group shared memory and application shared memory are used by all agents (both coordinating and subagents) that work for an application. This memory is only allocated in a partitioned database environment, in a non-partitioned database with the database manager intra-partition parallelism configuration parameter (intra_parallel) enabled, or in an environment in which the connection concentrator is enabled. This memory is allocated incrementally when an application starts, and is de-allocated when the application completes.

Agent private memory
This memory is allocated for an agent when that agent is created. The agent private memory contains memory allocations that will be used only by this specific agent, such as the sort heap and the allocation heap.

Note: For more information about DB2 Memory Management, refer to Chapter 8 "Operational Performance: Memory Usage" in the DB2 Administration Guide: Performance, SC09-2945.

Buffer pools
A buffer pool is an area of memory into which pages are temporarily moved from disk storage. DB2 agents read and modify data pages in the buffer pool. The buffer pool is a key influencer of overall database performance, because data can be accessed much faster from memory than from disk. If more of the data needed by applications is present in the buffer pool, less time is needed to access this data, thereby improving performance. Buffer pools can be defined with varying page sizes, including 4 KB, 8 KB, 16 KB, and 32 KB.
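For illustration only (the buffer pool name bp8k, table space name ts8k, and path are hypothetical), a buffer pool with a non-default page size might be created and then associated with a table space of the same page size:

   db2 "CREATE BUFFERPOOL bp8k SIZE 10000 PAGESIZE 8 K"
   db2 "CREATE TABLESPACE ts8k PAGESIZE 8 K MANAGED BY SYSTEM USING ('/db2/ts8k') BUFFERPOOL bp8k"

Note that a table space can only be associated with a buffer pool that has a matching page size.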

Block-based buffer pools
In DB2 V8, prefetching on certain platforms can be improved by creating block-based buffer pools. By default, buffer pools are page-based, which means that contiguous pages on disk are prefetched into non-contiguous pages in memory. Sequential prefetching can be enhanced if contiguous pages can be read from disk into contiguous pages within a buffer pool. The DB2 prefetching code recognizes when a block-based buffer pool is available and uses block I/O to read multiple pages into a contiguous area of the buffer pool in a single I/O, thereby significantly improving the performance of prefetching.

A block-based buffer pool consists of both a page area and a block area, where:

 The page area is required for non-sequential prefetching workloads.
 The block area consists of blocks where each block contains a specified number of contiguous pages, referred to as the block size.
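As a sketch (the name and sizes shown are illustrative assumptions), a block-based buffer pool is defined by adding the NUMBLOCKPAGES and BLOCKSIZE options to the CREATE BUFFERPOOL statement; NUMBLOCKPAGES sets the total number of pages in the block area, and BLOCKSIZE sets the number of pages per block:

   db2 "CREATE BUFFERPOOL bpblock SIZE 10000 PAGESIZE 4 K NUMBLOCKPAGES 2000 BLOCKSIZE 32"   # 2000 of the 10000 pages form the block area

The remaining pages form the page area, which continues to serve non-sequential access.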

2.7 Partitioning

One of the most important aspects of a database system for a cluster hardware architecture is the way data can be divided into disjoint data sets that can be processed independently. Dividing a table into disjoint data sets is called fragmentation in XPS and partitioning in DB2. Data must be divided into sets not only when it is stored on disk, but also when queries are processed, for example, when a hash join is performed.

The fragmentation methods of XPS are presented in 2.7.1, “Fragmentation in XPS” on page 54, and the partitioning methods of DB2 are discussed in 2.7.2, “Partitioning in DB2” on page 58, along with a discussion of how to map the fragmentation methods of XPS to DB2.

There are some general goals which partitioning and fragmentation should achieve:
 Uniform distribution of the data
 Avoiding waste of disk space
 Minimization of the administration overhead
 Allowing the elimination of unnecessary work
 Minimization of communication during queries
 Easy and efficient addition of data

2.7.1 Fragmentation in XPS

Fragmentation in XPS is independent of the coserver concept. When a database object (table or index) is fragmented in XPS, the kind of fragmentation (round robin, expression, hash, hybrid, or range) is specified, along with the list(s) of dbspaces or dbslice(s) into which the database object is put. Fragmentation also determines the amount of parallelism for scan, insert, update, and delete iterators. The default is one thread per fragment.

Fragmentation by round robin
Fragmentation by round robin distributes all the data evenly across all specified dbspaces, independent of the values of the columns of the row being distributed. Fragmentation by round robin ensures that there is no data skew, but it never allows collocated hash joins, and it is never possible to eliminate any fragments.
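For example (the table and dbspace names are illustrative), a round-robin table is created in XPS as follows:

   create table events (c0 integer, c1 char(10))
      fragment by round robin in dbsp1, dbsp2, dbsp3;  -- rows are spread evenly across the three dbspaces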

Fragmentation by expression
Fragmentation by expression allows you to specify a SQL expression for each dbspace in the fragmentation clause. The typical usage of fragmentation by expression is to specify (multidimensional) ranges, but it is also possible to do something similar to your own hashing by using a modulo function in the expression. The big advantage of fragmentation by expression is that it is very easy to reduce the amount of work done during a scan by fragment elimination, even if the WHERE clause in the SQL statement contains range expressions. The disadvantages of fragmentation by expression are that it never allows collocated hash joins and that the DBA is responsible for avoiding data skew by selecting appropriate expressions.
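A minimal sketch of fragmentation by expression (the table, column, dbspace names, and ranges are hypothetical); each fragment receives the rows satisfying its expression, and the remainder clause catches everything else:

   create table orders (order_num integer, amount integer)
      fragment by expression
         amount <= 100 in dbsp1,                   -- fragment elimination can apply to
         amount > 100 and amount <= 200 in dbsp2,  -- predicates on the amount column
         remainder in dbsp3;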

Fragmentation by hash
Fragmentation by hash distributes the data with a built-in hash function across the dbspaces specified in the fragmentation clause. The advantage of fragmentation by hash is that it usually distributes the data quite evenly across all fragments. This means a full table scan of a hash-fragmented table usually makes good use of the available hardware resources. It also allows collocated hash joins, and allows fragment elimination for equality predicates on the fragmentation columns.

Fragmentation by hybrid
Fragmentation by hybrid combines the advantages of fragmentation by expression and fragmentation by hash. It is essentially a two-dimensional fragmentation scheme where one dimension is hashed on a column which is used in the most important joins, and the other dimension is a set of ranges, often on a date column. Table t in Figure 2-18 on page 57 is an example of a hybrid fragmented table. Hybrid fragmentation also supports a rolling window of data very well by supporting the detaching and attaching of dbslices. Hybrid fragmentation is typically used for the largest (fact) tables of a data warehouse.

Fragmentation by range
Fragmentation by range (and the hybrid version of fragmentation) is mostly used for OLTP systems. Because we are mostly focused on data warehousing, this method of fragmentation is not discussed any further.

Fragmentation of indexes
Fragmentation of an index can be specified directly during the creation of the index or indirectly during the creation of the table. When no fragmentation is specified during index creation in XPS, the index is fragmented in the same way as the table. This kind of index is called an attached index, and it has its own tablespaces. A unique attached index can only be created on a set of columns that contains the partitioning column.

When a fragmentation clause is specified during index creation, there are two possibilities. If each row of the index is mapped to the same coserver as the corresponding table row, the index is called a locally detached index. Otherwise, it is called a globally detached index. Examples of attached, locally detached, and globally detached indexes are shown in Example 2-1. A globally detached index can also be created when the index is not fragmented but is put instead in a single dbspace, as depicted by the index i_gd2 in Example 2-1.

Example 2-1 Different kinds of indexes
-- Dbslices creation:
create dbslice dbsl0 from
   cogroup cogroup_all chunk "/dbslices/%c/dbsl0-1" size 30 gbytes,
   cogroup cogroup_all chunk "/dbslices/%c/dbsl0-2" size 30 gbytes;
create dbslice dbsl1 from
   cogroup cogroup_all chunk "/dbslices/%c/dbsl1" size 20 gbytes;
-- Table creation:
create table t (
   c0 integer,
   c1 integer,
   c2 integer
) fragment by hash(c0) in dbsl0;
-- Attached index:
create index i_a on t(c0);
-- Locally detached indexes:
create index i_ld1 on t(c0) fragment by hash(c0) in dbsl1;
create index i_ld2 on t(c1) fragment by hash(c0) in dbsl0;
-- Globally detached indexes:
create index i_gd1 on t(c1) fragment by hash(c1) in dbsl0;
create index i_gd2 on t(c0) in dbsl0.0;

A typical use of a locally detached index is a table with many data fragments on each coserver, and just one or a few index fragments per coserver. An example of this is shown in Figure 2-18 on page 57, where the table t is fragmented by hybrid and the index i fragmented by hash.


create table t (c0 integer, c1 integer, c2 integer)
   fragment by hybrid(c0) expression
      c1 <= 100 in dbsl1,
      c1 > 100 and c1 <= 200 in dbsl2,
      c1 > 200 and c1 <= 300 in dbsl3,
      c1 > 300 in dbsl4;


create index i on t(c0) fragment by hash(c0) in dbsl5;

Figure 2-18 Locally detached index on hybrid fragmented table

Partitioning of data during query processing
The fragmentation of the data automatically provides the partitioning of the scan iterators. There are essentially three partitioning methods in XPS for iterators higher up in the query tree, such as hash joins or hash groups:
 Broadcasting an intermediate result
 Partitioning the intermediate result via a hash function
 Passing the data on as they are currently partitioned

Broadcasting an intermediate result is used, for instance, for small table broadcasts in a hash join. This avoids the need for the larger table of the hash join to be distributed. Partitioning the intermediate result via a hash function is used, for instance, for hash joins and hash groups. The hash function used for that is the same as the hash function used for hash and hybrid fragmentation. Therefore, if the tables involved in a hash join are already fragmented by hash or hybrid on the join columns, no data has to be moved over the network. A join or group where the input data is already fragmented in the way required for the join or group is called collocated.

2.7.2 Partitioning in DB2

DB2 currently supports hash partitioning for distributing data across the logical nodes. When a database object is partitioned in DB2, a table space is specified which is defined over a database partition group, which is a subset of logical nodes. Therefore, DB2 partitioning is currently (Version 8.2) across logical nodes. When a partitioning of the data within one node is required, other techniques such as views with UNION ALL statements or multidimensional clustering (MDC) are used.

Hash partitioning
Similar to XPS, DB2 uses a hash function that maps rows to partitions according to the values of the partitioning columns. A major difference between XPS and DB2 is that DB2 first maps the partitioning columns via an internal hash function to a so-called partitioning map, an array of size 4096 which contains a target partition for each index value. This is shown in Figure 2-19. By default, the partitions are simply distributed uniformly in round robin fashion across the partitioning map, as can be seen in Figure 2-19. An advantage of this indirect approach of mapping rows to partitions is that the partitioning map can be modified manually if a table shows data skew.

The partitioning key of a row (for example, the value 12345) is hashed to an index into the partitioning map, which yields the target partition:

map index    0 1 2 3 4 5 6 7 8 9 ... 4093 4094 4095
partition #  1 2 3 4 1 2 3 4 1 2 ...  2    3    4

Figure 2-19 Partition map for hashing
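As a sketch in DB2 V8 syntax (the table and table space names are illustrative), the partitioning columns are declared with the PARTITIONING KEY clause when the table is created:

   db2 "CREATE TABLE t (c0 INTEGER, c1 INTEGER) IN ts1 PARTITIONING KEY (c0) USING HASHING"   # rows are hashed on c0 across the partition group of ts1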

Partitioning of indexes
The partitioning of a table is specified, as in XPS, when the table is created. By default, indexes are partitioned in the same way as the table. This corresponds to attached indexes in XPS. When a table is placed in a DMS table space, it is possible to specify a different table space for the indexes of this table. The restriction for this index table space is that it must be based on the same database partition group as the table space of the table itself. Therefore, indexes partitioned in this way are essentially locally detached indexes (to use XPS terminology). If you have to create indexes similar to globally detached indexes in XPS, another technique can be used. A materialized view of the original table with the same columns but a different partitioning has to be created. The partitioning columns of the materialized view should be the partitioning columns which one wants to use for the globally detached index. Then the index which is needed as a globally detached index can be created on the materialized view.

Other ways of partitioning data in DB2
When partitioning within a logical node is required, two other techniques can be used in DB2:
 Views with UNION ALL
 Multidimensional clustering

A view with UNION ALL has a body that performs a UNION ALL of several tables. Each table corresponds to a fragment in XPS. Each table also has to have a check condition associated with it. This check condition specifies, for instance, the range of values if the UNION ALL view is used to model a range fragmentation, which is done with fragmentation by expression in XPS. All the check constraints have to be mutually disjoint.
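A minimal sketch of this technique (the table names, column, and ranges are hypothetical); the mutually disjoint check constraints allow the optimizer to skip tables that a query does not need:

   db2 "CREATE TABLE sales_q1 (sale_month INTEGER, amount INTEGER, CHECK (sale_month BETWEEN 1 AND 3))"
   db2 "CREATE TABLE sales_q2 (sale_month INTEGER, amount INTEGER, CHECK (sale_month BETWEEN 4 AND 6))"
   db2 "CREATE VIEW sales AS SELECT * FROM sales_q1 UNION ALL SELECT * FROM sales_q2"   # queries against sales see all quarters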

When a UNION ALL view is combined with hash partitioning, it is similar to hybrid partitioning in XPS. When all the tables in the UNION ALL view correspond to tables on different logical nodes, this is similar to doing a fragmentation by expression across several nodes. With UNION ALL views it is possible to implement a mechanism similar to the detach and attach mechanism in XPS. Because check constraints are associated with each table in the UNION ALL view, the optimizer can eliminate tables which are not needed in a query. UNION ALL views should only be used if the number of partitions is small. If the number of partitions is large, multidimensional clustering is often a better technique for implementing a partitioning of the data corresponding to fragmentation by expression or hybrid fragmentation.

Partitioning of data during query processing
As in XPS, the best way of partitioning data in DB2 during the execution of a query is by not repartitioning the data, that is, by doing collocated joins. The requirements for collocated joins are very similar to the requirements in XPS.

Partitioning during the execution of a query is done in DB2 with so-called table queues. The following list shows the possible types of table queues:

 Broadcast table queue (BTQ)
 Directed table queue (DTQ)

Broadcast table queues correspond to the broadcast mechanism of XPS and are used in an analogous way. Directed table queues provide a mechanism to partition data for operators in the query plan with a hash function, as in XPS.

2.8 Terminology

Table 2-7 contains a summary of examples of XPS and DB2 terminology. This table focuses particularly on terms which have a small or large semantic difference between XPS and DB2. Terms which mean exactly the same in XPS and DB2, such as row, column, or table, are not listed in this summary.

Table 2-7 Terminology table

XPS DB2 Description

instance instance container for one or more databases, has physical resources associated with it

database database/schema logical/physical collection of database objects or resources

schema schema logical collection of database objects (tables, views, and so forth)

dbspace tablespace logical disk

dbslice tablespace set of logical disks

chunk container part of a logical disk (sometimes corresponding to a single physical disk/RAID array)

extent extent basic unit of allocation of space from a logical disk for a database object

page page basic unit of organization of data, often basic unit of I/O

tablespace whole space allocated for database object within a logical disk

fragment partition part of a database object located in a single logical disk

index index auxiliary structure for accessing database objects

coserver partition logical node

cogroup database partition group set of logical nodes

bufferpool bufferpool database cache for pages

decision support memory sort heap part of memory used for processing memory consuming iterators such as hash joins and sorts

sqlexec thread coordinator agent thread communicating with the client

aio vps prefetchers processes responsible for asynchronous I/O

CPU vps agents processes responsible for processing queries

catalog catalog repository of metadata information

VP/thread agent thread or process working on a part of the DBMS

thread or segment subsection piece independently executed part of query plan

temp dbspace or dbslice system temporary tablespace (set of) logical disk(s) that are used for allocating temporary space used internally by the database system

temp dbspace or dbslice user temporary tablespace (set of) logical disk(s) that are used for allocating temporary space used for temporary tables

dbspace or dbslice long table space (set of) logical disk(s) that are used for allocating space for large objects (that is, blobs)

table types not logged initially control of operations allowed, type of logging on tables

temp tables global temporary table non-permanent table




Chapter 3. Configuration

This chapter discusses DBMS configuration. It presents information about the capabilities and parameters of DB2 and how they relate to an Informix XPS environment. This information can help as you consider the issues that are involved in a transition from XPS to DB2.

This chapter provides an overview of the configuration of a DB2 instance if you are considering a transition from XPS to DB2. As with XPS, configuration choices inherently influence the performance of the DB2 instance. It is not the intent of this chapter to provide step-by-step instructions on how to configure DB2 or to make recommendations for tuning DB2.

Fundamentally, both XPS and DB2 use a shared nothing architecture implementation and are similar in many areas. Having a good understanding of the architecture of DB2 helps tremendously with configuration challenges and tasks. For more details on the architectures, see Chapter 2, “XPS and DB2 UDB architectures” on page 7.

DB2 configuration parameters are discussed in detail in the DB2 product documentation. There is also a very exhaustive description provided in Database Transition: Informix Dynamic Server to DB2 Universal Database, SG24-6367.

3.1 XPS and DB2 configuration

XPS and DB2 configurations are similar in many ways. However, there are many significant differences as well. This section describes the similarities and differences at a high level.

3.1.1 Knobs (configuration files and tuning parameters)

To an experienced XPS DBA, a first look at the DB2 configuration parameter choices might seem daunting and complicated. In reality, there are probably only a handful of knobs (configuration files and tuning parameters) that need to be tweaked to improve the performance of the database environment. Also, DB2 does provide tools to assist with the configuration and management of instances and databases. A good understanding of the cause and effect relationship of any tuning parameter is very important to ensure the smooth functioning of a database management system, independent of whether the DBMS is XPS or DB2.

3.1.2 Commands

In general, an XPS DBA who is most familiar with the command line nature of management might find DB2 to be much more verbose. Also, there is no mechanism to edit or change parameters that is similar to the ONCONFIG file in XPS. As familiarity with DB2 increases, the use of GUI tools, command line abbreviations, and aliases can ease the administration and management of the DB2 environment.

3.1.3 Granularity

Configuration parameters in XPS are set primarily in the ONCONFIG file, which affects the entire instance. However, there are session level changes that can be set to override the global settings from the ONCONFIG file. When configuring or tuning DB2, knobs can be set at the instance, individual database, table, or connection level.

Some of the other areas where XPS differs from DB2 in granularity are:
 Coserver and partition
The logical entity that consists of processor/memory/disk and serves as an administrative unit is referred to as a coserver in XPS and a database partition in DB2. Coserver 1 in XPS has a special purpose because it contains some additional administrative information. There are no such special partitions in DB2. However, it is not uncommon to find installations where the database catalog is maintained in a separate partition with no other data partitions.

 Transaction or logical logs
XPS has one set of logical logs per coserver within an instance. DB2 has a set of transaction logs for each data partition within the instance.
 Configuration files
XPS has one configuration file per instance, and if this file is not located on a shared file system, it must be identical on all coservers. In DB2, each database can be configured and tuned independently of other databases. While coserver-specific sections in an XPS configuration file can have differing values for a set of permissible parameters, this is rarely found in customer implementations. In DB2, the database configuration parameters could potentially be different on each of the database partitions. This is typically not a recommended configuration. The term configuration file in DB2 is somewhat of a misnomer, because it is not a human readable file that can be edited with a text editor.
 Buffer pools
XPS has one buffer pool. In DB2, each table space (data or index) can have a buffer pool. If there are multiple page sizes in the database, there must be at least one buffer pool for each page size in use.

3.1.4 Database manager

DB2 has the concept of a database manager (DBM) that houses all the databases. The DBM is known as the instance, and it has its own configuration file, which is different from individual database configuration files. While database configurations can be tuned independently of each other within an instance, the DBM parameters can be used as a resource control to ensure that system resources are not depleted to the detriment of the underlying operating system environment.

In XPS, the ONCONFIG file has settings for the instance that affect every user and database.

3.1.5 Dynamic parameters

With XPS, in some cases, the instance must be bounced before parameter changes can take effect. The term bounced indicates that the instance is taken offline and then brought back online.

With DB2, three different levels of configuration must be considered:
 Database manager (DBM) configuration parameters, which are typically considered instance level parameters and might require an instance bounce.
 Registry profile settings, which are settings that can be changed without stopping or starting the instance.
 Database (DB) configuration parameters. When changing most of these settings, only the database must be stopped and restarted, not the entire instance.

Note: This redbook does not attempt to categorize the list of configuration parameters into one of these three levels. Refer to the product documentation for access to the latest updates.

3.1.6 Cataloging

In general, cataloging in DB2 refers to the configuration of clients, databases, and nodes. This is somewhat similar to how connectivity is configured in XPS using the SQLHOSTS file. However, sometimes, just as in XPS, the system catalogs are also referred to using the same terminology.

3.1.7 Client access to DB2 instances

For clients to connect to a remote XPS database system, either the sqlhosts file (UNIX) or registry (Windows using setnet32) needs to be configured on the client. Similarly, DB2 clients need to have the remote system information cataloged locally on the client using either the GUI interface or the catalog command. There are a variety of methods to accomplish this. See the product documentation for details.

3.2 Configuration methods

XPS configuration methods are not covered in full detail because we assume that you are familiar with these methods. However, the following list provides a brief summary:
 Command line. XPS users typically use the command line approach to configuration.
 Informix Server Administrator (ISA). This is a Web-based utility that enables the DBA to control and configure many instances from one browser window.

3.2.1 DB2 configuration methods

While DB2 has more tunable parameters that you can use to build and administer an instance, it also has a broader choice of configuration tools and utilities. As examples, consider the following:

 Command line. The DB2 product has command line utilities for configuration. Again, XPS administrators might find the commands themselves to be more verbose. However, over time, this eases and the use seems less cumbersome. A DBA can also use aliases or can script the commands to be more familiar, if needed.
 GUI tools. For many years, DB2 users in the UNIX or Windows environments have had access to many GUI tools for administration, monitoring, and configuration. DB2 8.2 continues with this tradition in that there are new tools and many improvements on the existing tools. GUI tools are one of the greatest strengths of DB2.

Tip: For almost every choice within the DB2 GUI tools, there is a command line equivalent that can be displayed. This is an excellent way to learn commands and obtain exact command line syntax.

 The DB2 Control Center. The DB2 Control Center is a very powerful utility for configuring DB2. Figure 3-1 on page 68 shows a high-level view of the DB2 Control Center. This document does not address the operations of the various wizards that the Control Center provides. Refer to the product documentation, as well as Database Transition: Informix Dynamic Server to DB2 Universal Database, SG24-6367.


Figure 3-1 DB2 Control Center

Using the Control Center for command generation
You can use the DB2 Control Center to derive the CLP command for an operation by simply clicking the operation requested. For almost every operation, there is a Show SQL Command button that displays the actual syntax. This button can be very valuable to an XPS administrator who is not comfortable with the DB2 commands. Figure 3-2 on page 69 depicts an example of using Show SQL to display the command to create an SMS table space.


Figure 3-2 Control Center generation of SQL command

Example 3-1 illustrates the command that is derived by the Control Center.

Example 3-1 Deriving commands with the Control Center
CONNECT TO MARK;
CREATE REGULAR TABLESPACE NJ_TABLESPACE PAGESIZE 4 K MANAGED BY SYSTEM
   USING ('/home/marks/large_pb') EXTENTSIZE 32 OVERHEAD 12.67
   PREFETCHSIZE 32 TRANSFERRATE 0.18 BUFFERPOOL IBMDEFAULTBP
   DROPPED TABLE RECOVERY ON;
COMMENT ON TABLESPACE NJ_TABLESPACE IS 'NJ''s Tablespace';
CONNECT RESET;

3.2.2 Configuration Advisor and the AUTOCONFIGURE command

The Configuration Advisor wizard is a utility that comes in handy for a novice DB2 administrator. Based on responses to questions about anticipated workload and the system environment, it can suggest various database and instance configuration parameters to provide a starting point in the tuning efforts. Figure 3-3 on page 70 shows the results of a Configuration Advisor session.


Figure 3-3 Configuration advisor wizard

Note: These recommendations should be validated through actual measurements in regression test environments before committing the changes in the production environment.

The AUTOCONFIGURE command delivers the same results as the Configuration Advisor wizard, accepting the same input via options that are listed and described in Table 3-1. AUTOCONFIGURE calculates and displays the optimum values for the buffer pool size, database configuration, and database manager configuration parameters, with the option of applying these recommended values immediately.

You can use AUTOCONFIGURE with the CREATE DATABASE command to configure databases as soon as they are created. This feature calculates and displays initial values for the buffer pool size, database configuration, and database manager configuration parameters with the option of applying these recommended values. Example 3-2 shows the syntax for the AUTOCONFIGURE command.

Example 3-2 AUTOCONFIGURE from the command line
AUTOCONFIGURE
   [USING config-keyword value [{config-keyword value}...]]
   [APPLY {DB ONLY | DB AND DBM | NONE}]

config-keyword: MEM_PERCENT, WORKLOAD_TYPE, NUM_STMTS, TPM,
   ADMIN_PRIORITY, IS_POPULATED, NUM_LOCAL_APPS, NUM_REMOTE_APPS,
   ISOLATION, BP_RESIZEABLE

There are many options and settings for AUTOCONFIGURE. Table 3-1 shows these options, values, and their explanations.

Table 3-1 AUTOCONFIGURE configuration keywords

Keyword Valid values Default value Explanation

mem_percent 1-100 80 Percentage of memory to dedicate. If other applications (other than the operating system) are running on this server, set this to less than 100.

workload_type simple, mixed, complex mixed Simple workloads tend to be I/O intensive and mostly transactions, while complex workloads tend to be processor intensive and mostly queries.

num_stmts 1 to 1,000 K 10 Number of statements per unit of work.

tpm 1-200,000 60 Transactions per minute.

admin_priority performance, recovery, both both Optimize for better performance (more transactions per minute) or better recovery time.

is_populated yes, no yes Is the database populated?

num_local_apps 0-5,000 0 Number of connected local applications.


num_remote_apps 0-5,000 10 Number of connected remote applications.

isolation RR, RS, CS, UR RR Isolation level of applications connecting to this database (Repeatable Read, Read Stability, Cursor Stability, Uncommitted Read).

bp_resizeable yes, no yes Are buffer pools resizeable?
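For illustration (the database name sample and the keyword values shown are arbitrary choices, not recommendations), the same recommendations can be produced and applied from the command line:

   db2 connect to sample
   db2 "AUTOCONFIGURE USING mem_percent 60 workload_type complex is_populated no APPLY DB AND DBM"   # apply to both database and instance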

For more details on the AUTOCONFIGURE command, refer to DB2 UDB Command Reference, SC09-4828.

3.3 Configuration files and objects overview

In XPS, almost all files that are related to the operation of the database engine are located within the installation directory, commonly referred to as $INFORMIXDIR. Almost all files are owned and belong to the special user or group informix.

The DB2 product is typically installed in a /opt or /usr/opt file system, depending on the underlying operating system environment. An instance creation creates various directory structures in the home directory of the instance owner. An instance owner can be any user ID.

When changing parameters with XPS, the DBA typically edits the $ONCONFIG file directly, and often an instance stop or restart is necessary. However, which parameters require a stop or restart is not obvious; the DBA must know whether one is necessary.

With DB2, the DBA modifies one of the configuration files via the command line commands or via the GUI interface, not by direct editing. If it is necessary to stop or start the instance, or to deactivate or activate a database, the DBA is notified by the command.

3.3.1 Environment variables and the profile registry

In XPS, there are a few environment variables that need to be set to access the instance and the stored databases. In addition, there can be some occasions where the DBA might need to set specialized environment variables to provide session level overrides for system defaults or to implement a temporary fix for a known issue. There is also a command line mechanism for the administrator to change the operating parameters dynamically for a single active session or for the instance as a whole.

Similarly, in DB2, environment and registry variables control your database environment. There are various levels (instance, global, and node) at which registry variables can be set to provide a finer grain of control in the operation of the database system. Modifying these registry variables might require authority to be granted to the person executing the updates. When the registry is updated, changes do not affect the currently running DB2 applications or users. Applications started following the update use the new values.

3.3.2 Setting registry and environment variables

The db2set command is used to set registry variables. There are various options to this command that can set the registry variables at the appropriate level. These registry variables are stored in text files, in either a subdirectory of the instance directory or in a subdirectory under /var (depending on the operating system). The following are some examples of the db2set command:
 To display help information for the command:
db2set ?
 To view a list of all supported registry variables:
db2set -lr
 To change the value for a variable in the current or default instance:
db2set registry_variable_name=new_value
 To list all defined registry variables in the profile registry:
db2set -all
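For example (the instance name db2inst1 is illustrative), the same registry variable can be set at different levels:

   db2set -g DB2COMM=TCPIP            # global level, all instances on this system
   db2set -i db2inst1 DB2COMM=TCPIP   # instance level, overrides the global setting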

The closest equivalent to a registry in XPS would be the informix.rc file that you can use to set a list of variables that provide a default state of operation. In addition, there are some session level settings that you can implement via the sysdbopen procedure when connecting to a database.

Setting environment variables on UNIX systems
It is strongly recommended that you define all DB2-specific registry variables in the DB2 profile registry. If DB2 variables are set outside of the registry, remote administration of those variables is not possible.

On UNIX operating systems, you must set the system environment variable DB2INSTANCE. For users requiring access to a specific instance, the script db2profile or db2cshrc (depending on the user's shell environment), located in the sqllib subdirectory of the installation directory, can be sourced to set the required DB2 environment variables.

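As a sketch, assuming an instance owner named db2inst1 with a home directory of /home/db2inst1, the script is sourced as follows:

   . /home/db2inst1/sqllib/db2profile        # sh, ksh, or bash
   source /home/db2inst1/sqllib/db2cshrc     # csh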
Table 3-2 shows some of the commonly used environment and registry variables used in DB2.

Table 3-2 Common variables

Variable Description

DB2NODE Numeric value that specifies the database partition to connect to. By default, connection is to logical port 0 on the current host. Similar to setting INFORMIXSERVER.

DB2COMM Specifies the communication managers to start when DB2 is started. This is similar to starting the listener threads in XPS. Set it to TCPIP.

DB2INSTANCE Specifies the name of the database instance to work with. Similar to setting INFORMIXDIR and ONCONFIG.

DB2DBDFT Specifies default database name for implicit connections.

DB2SYSTEM Unique name that identifies the host system on the network. Typically set to the hostname of the system.

Enabling execution of remote commands
DB2 and XPS both require /etc/hosts.equiv or .rhosts to be configured to be able to execute remote commands among the partitions within an instance.

3.3.3 DB2 configuration files and objects

DB2 does not have specific files that can be edited with your favorite UNIX editor. The configuration files are updated via a command line or Control Center command.

DB2 directories
Access to both local and remote databases uses entries in the DB2 directories. Directories hide the requirement for the user to know where a database actually resides. The directories are:
 System database directory, which resides in the SQLDBDIR subdirectory in the instance directory and is used to catalog both local and remote databases.
 Local database directory, which resides in every drive or path that contains a database. It is used to access local databases in that subdirectory.
 Node directory, which contains entries for all instances that the client accesses, including information about the network connection to the instance. Each database client has a node directory. If multiple instances exist on a remote machine, then each instance must be cataloged as a separate node before you are able to access any information within the instance.
 DCS directory, which holds the connection information for DRDA® host databases. This directory only exists if DB2 Connect™ is installed on your system.
 Administrative node directory, which contains one definition for each remote system that is known to a DB2 client.

The Database Manager configuration file
The database manager configuration file configures the instance. These settings affect all databases and users connecting to this instance of DB2. To view the configuration for a database manager, there are two choices:
 db2 get database manager configuration
 db2 get dbm cfg

Updating database manager configuration parameters
To update configuration parameters for the instance, use one of the following:
 UPDATE DATABASE MANAGER CONFIGURATION USING parametername value
 UPDATE DBM CFG USING parametername value

The database configuration file
The database configuration file is present on each data partition in a multi-partition DB2 ESE or DPF environment. As a result, any command that displays or updates the configuration only affects the currently connected partition. The command for viewing database configuration parameters is:
db2 GET DATABASE CONFIGURATION (or GET DB CFG) [FOR database_name]

To update a database configuration parameter, use one of the following commands:
 UPDATE DATABASE CONFIGURATION USING parametername value
 UPDATE DB CFG USING parametername value
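For example (the database name sales and the value shown are hypothetical), the current settings can be displayed and the log file size changed as follows:

   db2 get db cfg for sales
   db2 update db cfg for sales using LOGFILSIZ 4096   # size of each transaction log file, in 4 KB pages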

In a partitioned database system, you might want to issue commands to run on servers in the instance or on database partition servers (nodes).

Issuing commands to multiple database partitions
So, how do you update the configuration on all the database partitions? You can use a utility that is similar to xctl. For the experienced XPS DBA, xctl is a tremendous help because it allows the modification and maintenance of coservers without requiring a login to the affected coservers. DB2 provides a similar utility called db2_all.

The simplest usage of this command is as follows:

db2_all "db2 update db cfg using parametername value"

This command executes the db2 command that is contained within the quotation marks on all the database partitions in serial. This operation is similar to the basic execution of the xctl command in XPS. Be aware that DB2 has yet another command, called rah (Run All Hosts), which operates a bit differently from db2_all.
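For example (the database name and parameter are hypothetical), prefixing the command string with a semicolon requests parallel execution in the background on all database partitions, per the rah/db2_all prefix conventions:

   db2_all ";db2 update db cfg for sales using LOGFILSIZ 4096"   # the leading ; runs the command in parallel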

Table 3-3 provides a summary of some of the commands that affect multiple partitions.

Table 3-3 Commands that affect multiple partitions

Command Description

rah Runs the command on all machines.

db2_all Runs the command on all database partition servers that you specify.

db2_kill Abruptly stops all processes that are running on multiple database partition servers and cleans up all resources on all database partition servers. Renders your databases inconsistent. Do not issue this command except under direction from IBM service.

Note: The db2_all command is the equivalent of xctl. There is no equivalent for rah in XPS. If there is only one logical data partition per physical node, then both rah and db2_all are similar in functionality.

Running db2_kill is similar to running xctl kill -9 on oninit processes on each coserver.

For information about using the rah and db2_all commands, see the DB2 V8 Administration Guide: Implementation Guide, SC09-4820.

Configuring multiple logical nodes
A logical DB2 partition or node is the same as an XPS coserver. There are multiple ways in which logical partitions (or, for that matter, coservers) can be configured:
 A standard configuration where each machine has only one database partition server (single coserver setup).
 A multiple logical node configuration where a machine has more than one database partition server (multiple coservers on a single host).
 A configuration where several logical nodes run on each of several machines (multiple coservers on multiple hosts).

In XPS, multiple coservers are set up by modifying the ONCONFIG file with the required coserver sections. Connectivity to the coservers is further established by modifying the sqlhosts file.

In DB2, the file db2nodes.cfg is used to describe the logical partitions defined in the instance. The format of the file is as follows:

dbpartitionnum hostname logical-port netname rsetname

where:
 dbpartitionnum is a unique number that defines a partition. Database partition numbers must be in ascending sequence, but you can have gaps in the sequence.
 hostname is the IP address or fully qualified hostname (if applicable) for inter-partition communications.
 logical-port specifies the logical port number for the node. This number is optional if there is only one logical partition per node. This number is used with the database manager instance name to identify a TCP/IP service name entry in the /etc/services file. The combination of the IP address and the logical port is used as a well-known address, and must be unique among all applications to support communications connections between nodes. For each hostname, one logical-port must be either zero (0) or blank (which defaults to 0). The node that is associated with this logical-port is the default node on the host to which clients connect. You can override this with the DB2NODE environment variable.
 netname is an optional parameter that is used to support a host that has more than one active TCP/IP interface, each with its own hostname.
 rsetname is an optional parameter that specifies the operating system resource set to use.

Chapter 3. Configuration 77 Example 3-3 shows the db2nodes.cfg file that we used in this redbook.

Example 3-3 db2nodes.cfg

1 CLYDE 0 CLYDE DB2/MLN1
2 CLYDE 1 CLYDE DB2/MLN2
3 CLYDE 2 CLYDE DB2/MLN3
4 CLYDE 3 CLYDE DB2/MLN4
5 CLYDE 4 CLYDE DB2/MLN5
6 CLYDE 5 CLYDE DB2/MLN6
7 CLYDE 6 CLYDE DB2/MLN7
8 CLYDE 7 CLYDE DB2/MLN8

Fast Communications Manager (FCM) communications
Inter-coserver communication in XPS is accomplished using a hybrid-datagram implementation. Where applicable, a shared-memory segment is used for coservers on the same node, and inter-node traffic uses TCP. The port numbers used are implementation dependent and use the CAMPORT configuration parameter as a starting point.

In a partitioned DB2 database environment, most communication between database partitions is handled by the Fast Communications Manager (FCM). To enable the FCM at a database partition and allow communication with other database partitions, define ports in the /etc/services file. The FCM uses the specified port to communicate. If there are multiple partitions defined on the same host, a range of ports must be set up, as shown in Example 3-4.

Example 3-4 The /etc/services entries for partitioned database FCM configuration
DB2_tpch      60000/tcp    # starting logical port (0) for FCM communication
DB2_tpch_1    60001/tcp    # logical port 1
DB2_tpch_2    60002/tcp    # logical port 2
DB2_tpch_3    60003/tcp    # logical port 3
DB2_tpch_4    60004/tcp    # logical port 4
DB2_tpch_5    60005/tcp    # logical port 5
DB2_tpch_6    60006/tcp    # logical port 6
DB2_tpch_7    60007/tcp    # logical port 7
DB2_tpch_END  60008/tcp    # last logical port (8) for FCM communication

Note: The service names specified in the example must follow a specific syntax. Each entry must be prefixed with DB2_ and the last entry must have the suffix _END.

Be advised that these service ports are not used for client communication. The SVCENAME database manager configuration parameter defines the service port on which the DB2 listener is started to listen for client communications.

Ensure that the number of ports allocated in the file is greater than or equal to the largest number of database partitions on any host in the instance. The FCM port entries in the /etc/services file must be identical on all hosts where there is a database partition for the instance.

In Example 3-4 on page 78, a range of nine ports is set up for FCM communications, implying that there cannot be more than nine database partitions on a single node.

Communication with logical partitions on the same node is accomplished using UNIX sockets by default. However, this behavior can be altered by setting the registry variable DB2_FORCE_FCM_BP. Setting this variable forces FCM memory buffers to be created in a separate shared memory segment so that communication between FCM daemons of different logical partitions on the same physical node occurs through shared memory. The amount of memory allocated is largely dependent on the number of FCM buffers to be created, as specified by the fcm_num_buffers database manager configuration parameter. A side effect of setting this variable is to reduce the maximum size of database buffer pools, particularly in 32-bit implementations.
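For example, a sketch of forcing shared-memory FCM communication between logical partitions on the same host; verify the buffer pool memory implications before using this in production:

db2set DB2_FORCE_FCM_BP=YES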

3.4 Configuring the instance

This section provides an overview of the parameters that are used in configuring a DB2 instance and highlights where DB2 differs from XPS.

3.4.1 Page size(s)
A DB2 instance can use multiple page sizes for different database objects and their associated buffer pools. Page size is associated with a table space. In many cases there are inherent performance advantages to having multiple page sizes, but the primary reason for them is that a data row cannot span pages in DB2. For example, if a table space is created with a 4 KB page size, any table or index created in that table space must have a row size of less than 4 KB (minus page or row level overhead). If the row size of a table needs to grow beyond the maximum row size for the page size, then the table must be dropped, a new table space and buffer pool created with a larger page size, and the table reloaded.
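A minimal sketch of that sequence follows, assuming a table t1 that has outgrown a 4 KB page size; the object names, path, and sizes are hypothetical, and the (...) stands for the original column list:

db2 "export to t1.del of del select * from t1"
db2 "drop table t1"
db2 "create bufferpool bp8k size 1000 pagesize 8K"
db2 "create tablespace ts8k pagesize 8K managed by system using ('/data/ts8k') bufferpool bp8k"
db2 "create table t1 (...) in ts8k"
db2 "load from t1.del of del insert into t1"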

Page sizes can be 4 KB, 8 KB, 16 KB, or 32 KB. All tables created within a table space have the page size of that table space. Table 3-4 shows the maximum number of columns and the maximum row size for each page size.

Table 3-4 Maximum row length by page size

Page size   Maximum # of columns   Maximum row size (bytes)
4 KB        500                    4,005
8 KB        1012                   8,101
16 KB       1012                   16,293
32 KB       1012                   32,677

The default system page size is 4 KB. In general, for DSS environments, it is advisable to use larger page sizes; this is true in both XPS and DB2. However, just as in XPS, the number of rows on a page is limited to 255.

3.4.2 Table spaces
A table space is a storage structure containing tables, indexes, large objects, and long data. Table spaces allow you to assign the location of database and table data directly into containers, which enables a flexible configuration that can result in improved performance. A container can be a directory name, a device name, or a file name. A table space is similar to an XPS dbspace, and containers are similar to chunks.

When creating tables in XPS, extent sizes can be specified for the initial and additional extents. In DB2, however, extent size is a table space parameter, and all tables created within a table space have the same extent size.

There are two types of table spaces in a DB2 system. A System Managed Space (SMS) table space is storage that is managed on demand. Data is stored in operating system files, and these files are typically expanded one page at a time. DB2 provides a utility called db2empfa that alters this behavior to allocate pages one extent at a time. Take care when using SMS table spaces, because the storage limit is determined by the availability of file system space. Containers in an SMS table space are directory paths. If multiple containers are allocated to an SMS table space and one of them exhausts available file system space, the SMS table space is considered full.
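For example, to enable multipage (extent-at-a-time) file allocation for the tpch database used in this book, the command is simply:

db2empfa tpch

Note that after multipage file allocation has been enabled for a database, it cannot be disabled.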

Database Managed Space (DMS) table spaces are managed by the database manager. A list of devices or files is selected to belong to a table space when the DMS table space is defined. DMS table spaces differ from SMS table spaces in that space is allocated when the table space is created, rather than when it is needed.

XPS space management is similar to the DMS management in DB2.

Comparison of SMS and DMS table spaces
There are a number of trade-offs to consider when determining which type of table space you should use to store your data. Here are a few examples:
 Advantages of an SMS table space:
– Space is not allocated by the system until it is required.
– Creating a table space requires less initial work, because you do not have to predefine the containers.
 Advantages of a DMS table space:
– The size of a table space can be increased by adding or extending containers, using the ALTER TABLESPACE statement. Existing data can be automatically rebalanced across the new set of containers to retain optimal I/O efficiency.
– A table can be split across multiple table spaces, based on the type of data being stored:
  • Long field and LOB data
  • Indexes
  • Regular table data

3.4.3 Bufferpools
In XPS, buffer pool memory is set up as part of the instance configuration; when a change is required, the ONCONFIG file is modified and the engine must be restarted. In a predominantly read-only environment, the number of buffers created in XPS is typically small.

In DB2, buffer pools are associated with a database, not with an instance. Unlike XPS, virtually all data manipulation takes place in buffer pools; only large objects and long field data are not manipulated in a buffer pool. There is no concept of a light scan. However, it is possible to create block-based buffer pools for reading blocks of pages. (For details, see the Administration Guide: Performance, SC09-4821.) As such, configuring buffer pools is the single most important area for performance tuning.

Memory is allocated for a buffer pool when a database is activated or when the first application connects to the database. Buffer pools can also be created, dropped, and resized while the database manager is running.

When you create a table space, you must associate it with a specific buffer pool. Different table spaces using the same page size can be associated with the same buffer pool. If multiple page sizes are used for the various table spaces within a database, then buffer pools with corresponding page sizes must be created. To ensure that an appropriate buffer pool is available in all circumstances, DB2 creates a small hidden buffer pool for each page size (4 KB, 8 KB, 16 KB, and 32 KB), each 16 pages in size.

Buffer pools are created using the CREATE BUFFERPOOL statement and modified using the ALTER BUFFERPOOL statement. If the IMMEDIATE keyword is used when altering a buffer pool to increase its size, memory is allocated immediately if it is available; if not, the change occurs when all applications are disconnected and the database is reactivated. If the size of a buffer pool is decreased, memory is de-allocated at commit time. When all applications are disconnected, the buffer pool memory is de-allocated.
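For example, a sketch of growing the default buffer pool immediately, assuming sufficient free memory (the size is an arbitrary illustration):

db2 "alter bufferpool ibmdefaultbp immediate size 100000"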

After a buffer pool has been created, it is not possible to change the page size for that buffer pool.

Important: It is typically not recommended to have multiple bufferpools of the same page size.

XPS and DB2 operate very similarly in terms of overall buffer pool management and page re-use.

3.4.4 Physical and Logical Logs
DB2 does not have the concept of a physical log. Transaction logs in DB2 serve fundamentally the same purpose as in XPS: they capture logical changes to the data or the instance and, if so configured, enable logical recovery. However, the management of these logs differs from XPS.

XPS uses a form of circular logging within a set of predefined logs. Prior to reuse, the logs should be archived (or sent to /dev/null if recoverability is not an issue). Logs in XPS are associated with the entire instance and are shared among the various databases within the instance.

Transaction logs in DB2 are associated with individual databases. DB2 also supports circular logging; however, there is no requirement to archive the logs before they become available for reuse. If recoverability is desired, a set of database configuration parameters must be set, and steps must be taken to ensure that each completed log is archived. Figure 3-4 shows the database configuration parameters associated with log maintenance.

Figure 3-4 Database log configuration parameters

When the database is created, the log files are created in a default directory within the database directory structure. These logs can be relocated to a different location by setting the NEWLOGPATH parameter. The changes are effective when the database is restarted.

Primary log files are those allocated when the database is created. Secondary logs are allocated on an as-needed basis when the primary logs are full. If LOGSECOND is set to -1, infinite logging is enabled; this has some implications for recovery.

If logs are not retained (LOGRETAIN=NO), then circular logging is active. If user exit (USEREXIT) is not enabled, then there is no archiving of transaction logs. If rollforward recovery is required for the database, set LOGRETAIN to YES and enable USEREXIT.
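As a sketch, enabling rollforward recovery for the tpch database used in this book might look like the following; after this change the database is placed in backup pending state and must be backed up before it can be used:

db2 "update db cfg for tpch using LOGRETAIN RECOVERY USEREXIT ON"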

Note: DB2 always maintains the number of active logs specified by the LOGPRIMARY parameter. If USEREXIT is not enabled when LOGRETAIN has been enabled, the transaction logs stay in the directory where they are created, waiting to be archived, eventually resulting in a file-system-full condition.

MIRRORLOGPATH can be set to allow the database to write an identical second copy of the logs to a different location. The database needs to be restarted for this parameter to become active. This is akin to the recommendations in XPS to mirror rootdbs and logical logs.

Some configuration differences
The following are some of the configuration differences between DB2 and XPS:
 Read ahead
In DB2, read ahead is not a database or instance level configuration parameter. Pre-fetching, as it is known, is a table space option. It can be set at table space creation time or altered later using the PREFETCHSIZE option. If it is omitted, the value of the DFT_PREFETCH_SZ database configuration parameter is used.
 LRU queues
The number of LRU queues is not configurable in DB2. The chngpgs_thresh database configuration parameter is the equivalent of LRU_MAX_DIRTY and specifies the percentage of dirty pages in memory before asynchronous cleaning is initiated.
 Diagnostic information
In XPS, the location of the message log is set in the MSGPATH parameter. DB2 uses DIAGPATH to specify the location of the diagnostic files. Additionally, the DIAGLEVEL database manager parameter specifies the amount of information written to the log files.
 Query parallelism
PDQPRIORITY and its associated parameters in XPS have no direct equivalent in DB2. Both XPS and DB2 have implemented function shipping and provide native parallelism for query execution. In addition, unlike XPS, DB2 provides some control over intra-parallelism within a data partition (see the sketch following this list).

The following configuration parameters control and manage intra-partition parallelism:

– The intra_parallel database manager configuration parameter enables or disables parallelism support.

– The dft_degree database configuration parameter sets the default value for the CURRENT DEGREE special register and the DEGREE bind option.
– The max_querydegree database manager configuration parameter sets an upper limit for the degree of parallelism for any query in the database. This value overrides the CURRENT DEGREE special register and the DEGREE bind option.

DEGREE refers to the number of parts of a query that execute concurrently. There is no strict relationship between the number of processors and the value that you select for the degree of parallelism: you can specify more or fewer than the number of processors on the machine. Even on uniprocessor machines you can set a degree higher than one to improve performance in some ways. Note, however, that each degree of parallelism adds to the system memory and processor overhead.
 Number of data partitions in a node
When configuring a large SMP system with XPS, it is very rare to define the number of coservers equal to the number of physical processors on the system. In most configurations, there are at least two to three processors per coserver. In DB2, it is not uncommon to specify one data partition per physical processor.
 Memory management
In XPS, the only memory management parameters that need to be set are SHMVIRTSIZE, SHMADD, SHMTOTAL, and DS_TOTAL_MEMORY. The database engine typically manages the memory resources that are needed for query operations and administrative actions by allocating and de-allocating from within the virtual segment. DB2 provides a much finer grain of control by allowing the database administrator to tune a number of parameters that control utilization of the memory resources. It is beyond the scope of this document to list and describe these parameters; see the Administration Guide: Performance, SC09-4821.
 Checkpoint

In XPS, checkpoints are initiated under various conditions, but primarily due to the setting of the CKPTINTVL parameter.

In DB2, checkpoints are also initiated under various conditions, but there is no periodic time interval that can be defined. However, the softmax configuration parameter can be set to specify what is called a soft checkpoint.

The softmax parameter influences the number of logs required for crash recovery, but has the side effect of initiating a checkpoint. It is expressed as a percentage of one log file needed to recover the database from a system crash. For example, if softmax is set to 300, the database engine tries to ensure that no more than three logs are required for crash recovery purposes. Indirectly, it is also the percentage of logical log space that can be filled between soft checkpoints before DB2 begins to clean buffer pool pages back to disk; so, in this example, 300% of a logical log file (three logs) can be filled between soft checkpoints.

The performance implications of setting softmax are described in detail in the Administration Guide: Performance, SC09-4821.
 Processor affinity
In XPS, on systems that support this feature, processor affinity is enabled by setting the AFF_NPROCS and AFF_SPROCS parameters. Setting up this feature in DB2 is more involved and might only be available in AIX and Linux environments. One method involves defining resource sets and associating them with the partition definitions in the db2nodes.cfg file. See 13.3.2, “Creating resource sets on AIX” on page 377 for an example of how we set up this feature for the case study.
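Picking up the query parallelism item in the list above, the following is a minimal sketch of enabling and bounding intra-partition parallelism. The tpch database name comes from this book's examples, and the degree values are arbitrary assumptions:

db2 "update dbm cfg using INTRA_PARALLEL YES"
db2 "update dbm cfg using MAX_QUERYDEGREE 4"
db2 "update db cfg for tpch using DFT_DEGREE ANY"

The instance must be restarted for a change to intra_parallel to take effect.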


Chapter 4. Instance and database operations

This chapter describes the key aspects of instance and database operations. Topics covered include instance operation and creation, configuration changes, managing database and log storage, backup and recovery, and high availability. Where possible, we provide extensive examples in order to familiarize you with the operational aspects of DB2 with the Database Partitioning Feature (DPF).

Although much of the operation of DB2 can be performed using the graphical user interface (GUI), we only illustrate a few operations specific to DPF. You can find more details about the generic operation of the DB2 GUI in Database Transition: Informix Dynamic Server to DB2 Universal Database, SG24-6367.

4.1 Instance operation modes

DB2 with DPF and XPS have similar operating modes: online, offline, and quiescent. The difference is that DB2 administrators do not need to explicitly initialize the instance for first use (for example, by using xctl -C oninit -iy). This section explores the operating modes of DB2 with DPF, their implications, and their differences from XPS.

4.1.1 Online mode
In DB2 with DPF, the db2start command starts the database manager background processes on all of the defined database partitions unless the DBPARTITIONNUM parameter is used. Example 4-1 shows a run of the db2start command.

Example 4-1 db2start command
3-CLYDE [tpch] $ db2start
10/18/2004 18:55:22     5   0   SQL1063N  DB2START processing was successful.
10/18/2004 18:55:22     8   0   SQL1063N  DB2START processing was successful.
10/18/2004 18:55:22     2   0   SQL1063N  DB2START processing was successful.
10/18/2004 18:55:22     6   0   SQL1063N  DB2START processing was successful.
10/18/2004 18:55:22     3   0   SQL1063N  DB2START processing was successful.
10/18/2004 18:55:22     4   0   SQL1063N  DB2START processing was successful.
10/18/2004 18:55:23     1   0   SQL1063N  DB2START processing was successful.
10/18/2004 18:55:23     7   0   SQL1063N  DB2START processing was successful.
SQL1063N  DB2START processing was successful.

When the db2start command is executed, the system reads the value of DB2INSTANCE and the contents of the db2nodes.cfg file and starts the specified database manager instance. DB2 reads the DBM configuration file and sets up UNIX processes to allow communication with the instance. Database manager global shared memory is allocated and remains allocated until the database manager is stopped with the db2stop command. This area contains information that the database manager uses to manage activity across all database connections. When the first application connects to a database, both global and private memory areas are allocated.

Connecting, database activation, and termination
Unlike XPS, an instance can be active while a database within it is not; a database becomes active only when a user connects to it with the connect command or an explicit activate command is issued. This means that memory usage for the overall instance might be lower, because inactive databases do not consume memory resources.

To connect to a database, use the connect command as follows:

db2 connect to databasename user username using userpassword

For example:

db2 connect to db2tpch user micky using mypwd

If a database is not yet running, the connect command starts the database and then establishes a connection for the application issuing the command. After the database is running, additional applications can connect to and disconnect from the database, including the application that made the initial connection. The database continues running as long as at least one connection is attached to it. When all applications have disconnected from the database, the database shuts down automatically. To explicitly end a connection, issue the terminate command.

Because you can define multiple database partitions on one host, to connect to a specific database partition you need to set DB2NODE to the partition number, as shown in the following example on a UNIX platform:

db2 terminate
export DB2NODE=2
db2 connect to tpch user micky using mypwd

Tip: The db2 terminate step forces the db2bp back-end process (if one exists) to exit, so that the new DB2NODE value takes effect for the db2bp process that is started for the next connection.

If DB2NODE is not set, the default database partition to which you connect is the one that has a value of zero (0) for the logical-port parameter in the db2nodes.cfg file.

Instance attachment
At the instance level, an administrator who wants to perform instance-level maintenance must use the attach command to initiate operations with that instance. While multiple database connections can be maintained at a given time, only one attachment can be in effect. For example:

db2 attach to nodename user username using userpassword

If ATTACH has not been executed, instance-level commands are executed against the current instance, as specified by the DB2INSTANCE environment variable. To disconnect from an instance, issue the detach command.

4.1.2 Offline mode

The db2stop command stops the DB2 instance. However, if any partition has active database connections, the instance on those partitions is not brought down, and you get the error that is shown in Example 4-2.

Example 4-2 db2stop error messages
2-CLYDE [db2tpch] $ db2stop
10/27/2004 09:58:04     1   0   SQL1025N  The database manager was not stopped because databases are still active.
10/27/2004 09:58:04     2   0   SQL1064N  DB2STOP processing was successful.
10/27/2004 09:58:04     3   0   SQL1064N  DB2STOP processing was successful.
10/27/2004 09:58:04     4   0   SQL1064N  DB2STOP processing was successful.
10/27/2004 09:58:04     5   0   SQL1064N  DB2STOP processing was successful.
10/27/2004 09:58:04     6   0   SQL1064N  DB2STOP processing was successful.
10/27/2004 09:58:04     7   0   SQL1025N  The database manager was not stopped because databases are still active.
10/27/2004 09:58:05     8   0   SQL1064N  DB2STOP processing was successful.
SQL6033W  Stop command processing was attempted on "8" node(s). "6" node(s) were successfully stopped. "0" node(s) were already stopped. "2" node(s) could not be stopped.

You can override this error by using the db2stop force command. This is similar to the xctl onmode -ky command in XPS.

4.1.3 Quiescent mode
Quiescent mode restricts new users from connecting and allows the administrator to perform maintenance activities. This mode is equivalent to the XPS xctl onmode -s command. The following commands force all users off the specified instance or database and put it into quiesced mode:

quiesce instance instancename
quiesce database databasename

In quiesced mode, users cannot connect from outside of the database instance. After administrative tasks are complete, use one of the following commands to return the instance or database to normal use, allowing other users to connect without shutting it down and starting it again:

unquiesce instance instancename
unquiesce database databasename

While an instance or database is quiesced, only users with the appropriate authority for this restricted mode are allowed to attach or connect. Users with SYSADM, SYSMAINT, and SYSCTRL authority always have access to an instance while it is quiesced, and users with SYSADM authority always have access to a database while it is quiesced.

4.1.4 Creating and dropping the instance

The db2icrt command creates and configures a database manager instance on the server. Only the root user has the authority to run this command. As part of this process, the DB2INSTANCE environment variable is set to the name of the database manager instance, and PATH is set to include the path to the DB2 UDB binary files. A new directory, sqllib, is created in the $HOME directory of the user specified as the SYSADM.

Finally, the files necessary to set environment variables are created. The first of these two files is db2profile (or db2cshrc, depending on your shell), which sets the default environment variables. This file is often overwritten by new versions of DB2 UDB or by fix packs; do not make any changes to it. The second file, userprofile, is provided for you to set environment variables unique to your installation, and it is not overwritten by new versions of DB2 UDB or by fix packs. The db2idrop command drops a DB2 instance. However, it does not affect the contents of databases created in the instance; you can re-create the same instance, catalog the databases, and regain access to them.
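As a sketch, creating an instance on UNIX might look like the following; the instance name db2inst1 and the fenced user db2fenc1 are hypothetical:

db2icrt -u db2fenc1 db2inst1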

To remotely administer the instance or database from another machine, such as through the DB2 Control Center on a PC, use the dascrt command to create a DAS daemon on each physical machine where the instance resides. For more information about the syntax of the db2icrt and dascrt commands, refer to the IBM DB2 Universal Database Command Reference, SC09-2844.

4.2 Modifying the configuration

Many DB2 configuration parameters can now be set online. Refer to the DB2 UDB Administration Guide: Performance, SC09-4821, for a list of parameters. Changes to these configurable online configuration parameters take immediate effect without the need to stop and start the instance, or deactivate and activate the database. You no longer have to disconnect users when you fine-tune your system, giving you more flexibility for deciding when to change the configuration.

Examples of parameters that can be set online include memory heaps such as CATALOGCACHE_SZ, PCKCACHESZ, STMTHEAP, SORTHEAP, and UTIL_HEAP_SZ. Other parameters, such as LOCKLIST size, MAXLOCKS, and DLCHKTIME (deadlock check time), allow you to adjust the locking characteristics of your database system, which can improve performance. You can choose to defer a change to a configurable online configuration parameter so that the change is made at the next instance start or database activation. A SHOW DETAIL option is available in the GET DATABASE CFG and GET DATABASE MANAGER CONFIGURATION commands that lists both the current value and the value that will be used at the next instance start or database activation. In some cases, you can set a parameter to automatic, and DB2 then adjusts its value automatically as the workload on the system changes. For example, setting MAXAPPLS to automatic means there is no limit to the maximum number of applications, except when memory is exhausted. Database configuration parameters are stored in a file named SQLDBCONF for each database partition. These files cannot be edited directly; they can only be changed or viewed through a supplied API or by a tool that calls that API. This is unlike XPS, which allows you to edit the onconfig file. In addition, because each database partition maintains its own copy of the SQLDBCONF file, for a change to take effect on all database partitions you need to use the db2_all command to invoke the db2 command that changes the database configuration, as in the sketch below.
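For example, a sketch of changing SORTHEAP for the tpch database on every partition at once; the value is an arbitrary illustration:

db2_all "db2 update db cfg for tpch using SORTHEAP 2048"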

4.2.1 Working with the DAS
The DB2 Administration Server (DAS) provides administration services to remote clients, as well as scheduling services. In XPS, each instance operates autonomously; there is no concept of a DAS.

The DAS is the connection between the DB2 GUI tools on the client and those on the server. If the DAS has not been created or has been stopped, you cannot connect to the databases using the GUI tools. Because there can be only one DAS running on a physical machine, multiple instances and versions of DB2 UDB, such as V8.1 or V8.2, connect through the same DAS. It is not necessary to create a DAS if you do not want to use the GUI administration tools to manage your instance and databases remotely.

If you did not select DAS creation during DB2 installation, use the following commands to create the DAS manually:

dascrt -u dasuser (UNIX)
db2admin create /user:username /password:password (INTEL)

To start and stop the DAS:

db2admin start
db2admin stop

To list the DAS:

db2set -g DB2ADMINSERVER
daslist

To remove the DAS:

dasdrop (UNIX)
db2admin drop (INTEL)

4.2.2 Viewing or updating the configuration using Control Center

Within the Control Center, you can use the Configure Instance option to set the database manager configuration parameters on either a client or a server, and the Configure Database option to alter the values of database configuration parameters. The DB2 Control Center also provides the Configuration Advisor, which recommends optimal configuration values based on your responses to a set of questions about, for example, the workload and the type of transactions that run against the database. For more details, see 9.3.1, “Control Center” on page 278.

Now consider the option for configuring the instance. To access this option:
1. Right-click the instance name in the Control Center and select Configure Parameters, as shown in Figure 4-1.

Figure 4-1 Configure parameters for instance

2. After you make this selection, you are prompted for your user ID and password. Note that you must have the appropriate level of instance or database authority to update the configuration. You are shown a list of the available instance-level configuration parameters, conveniently categorized into areas such as applications, communications, diagnostic, monitor, and parallel. We choose to update SHEAPTHRES, which is under the Performance category. This parameter controls the amount of memory that can be allocated across the instance for sort heaps. By clicking the +++ symbols next to the current value, you can enter a new value, as shown in Figure 4-2.

Figure 4-2 Update SHEAPTHRES configuration parameter

3. Click OK and you are returned to the parameters list. Notice in Figure 4-3 on page 95 that there are three differences from Figure 4-2. First, there is now a Pending Value for SHEAPTHRES, indicating that a change is pending. Second, Pending Value Effective indicates that this pending value will take effect after the instance restarts. Last, the Show Command button is no longer greyed out.


Figure 4-3 Pending configuration value

4. You can also choose to save the command syntax by clicking Save, either as a file or as a task in the Task Center. Select Close and then OK to save the configuration changes.
5. Click Show Command and the Control Center displays the command syntax that is needed to update the configuration, as shown in Figure 4-4 on page 96.


Figure 4-4 Update SHEAPTHRES command syntax

4.2.3 Managing database partition groups
DB2 with DPF has the concept of a database partition group, which is similar to a cogroup in XPS. A database partition group is a set of one or more database partitions. When you want to create tables for the database, you first create the database partition group where the table spaces will be stored, and then you create the table spaces where the tables will be stored.

You can define named subsets of one or more database partitions in a database. Each subset that you define is known as a database partition group. Each subset that contains more than one database partition is known as a multi-partition database partition group. Multi-partition database partition groups can only be defined with database partitions that belong to the same instance.

You create a new database partition group using the CREATE DATABASE PARTITION GROUP statement. You can modify it using the ALTER DATABASE PARTITION GROUP statement. Data is divided across all the partitions in a database partition group, and you can add or drop one or more database partitions from a database partition group.

Each database partition that is part of the database system configuration must already be defined in a partition configuration file called db2nodes.cfg. A database partition group can contain as few as one database partition or as many as the entire set of database partitions defined for the database system.

Example 4-3 shows an example of creating a database partition group, onepartition_group, which consists of database partition number 2. The create tablespace command in this example is explained in “Creating and altering table spaces and containers” on page 101.

Example 4-3 Database partition group
create database partition group onepartition_group on dbpartitionnums(2);

create tablespace space_small in database partition group onepartition_group
pagesize 4K managed by system
using (
  '/wrk2/tablespaces/db2f1. $N /space_small',
  '/wrk2/tablespaces/db2f2. $N /space_small',
  '/wrk2/tablespaces/db2f3. $N /space_small',
  '/wrk2/tablespaces/db2f4. $N /space_small'
)
bufferpool ibmdefaultbp;
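Continuing the example, the following is a sketch of later adding database partition 3 to onepartition_group while deferring container creation for its table spaces; the partition number is an arbitrary assumption:

alter database partition group onepartition_group add dbpartitionnum (3) without tablespaces;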

The default database partition groups that were created when the database was created are used by the database manager. IBMCATGROUP is the default database partition group for the table space containing the system catalogs. IBMTEMPGROUP is the default database partition group for system temporary table spaces. IBMDEFAULTGROUP is the default database partition group for the table spaces containing the user-defined tables that you might choose to put there. A user temporary table space for a declared temporary table can be created in IBMDEFAULTGROUP or any user-created database partition group, but not in IBMTEMPGROUP.

To list all of the database partition groups that have been created, use the list database partition groups command, as shown in Example 4-4.

Example 4-4 List partition groups

4-CLYDE [tpch] $ db2 list database partition groups

DATABASE PARTITION GROUP
-------------------------
IBMCATGROUP
IBMDEFAULTGROUP
ONEPARTITION_GROUP

3 record(s) selected.

You can also get the list of partition groups by using the Control Center GUI, as depicted in Figure 4-5.

Figure 4-5 List database partition groups in Control Center

4.2.4 Managing buffer pools

Buffer pools are a critically important memory component, and are the single most important element for good performance. A buffer pool is memory used to cache table and index data pages as they are being read from disk or modified. Because memory access is much faster than disk access, the less often the database manager needs to read from or write to disk, the better the performance. Only large objects and long field data are not manipulated in a buffer pool. DB2 buffer pools are similar to XPS BUFFERS, except that DB2 buffer pools are assigned to individual databases instead of the instance, and are associated with one or more table spaces.

With DB2, every database must have at least one buffer pool. One default buffer pool, IBMDEFAULTBP, is created when the CREATE DATABASE command is processed.

Memory is allocated for the buffer pool when a database is activated or when the first application connects to the database. Buffer pools can also be created, dropped, and resized while the database manager is running. If you use the IMMEDIATE keyword when you use the ALTER BUFFERPOOL command to increase the size of the buffer pool, memory is allocated as soon as you enter the command, if the memory is available. If the memory is not available, the change occurs when all applications are disconnected and the database is reactivated. If you decrease the size of the buffer pool, memory is deallocated at commit time. When all applications are disconnected, the buffer pool memory is de-allocated.

To create a buffer pool, use the CREATE BUFFERPOOL statement. After a buffer pool has been created, it can be associated with one or more table spaces through the create tablespace command, which is discussed in “Creating and altering table spaces and containers” on page 101. Buffer pools can be resized to accommodate changing memory requirements using the ALTER BUFFERPOOL statement.

Example 4-5 shows an example of increasing the size of the default IBMDEFAULTBP bufferpool, and creating a new bufferpool via the command line. Alternatively you can use the Control Center GUI tool to perform the same operation. (For more detailed examples, see Database Transition: Informix Dynamic Server to DB2 Universal Database, SG24-6367.)

Example 4-5 Bufferpool
alter bufferpool ibmdefaultbp size 50000;

create bufferpool bp16k all dbpartitionnums size 250000 pagesize 16K;

4.3 Managing database storage

In DB2, there are two areas of storage management to consider: database disk management and log file management. Chapter 3, “Configuration” on page 63, discusses table spaces and containers in detail, along with best practices for setting up your environment. This section discusses how to set up and manage storage for data, as well as storage for transaction logging and maintaining database integrity.

4.3.1 Table spaces and containers
DB2 database data is managed within table spaces and containers, which are similar to the XPS concepts of dbspaces (or dbslices) and chunks. Table spaces can be of two types: SMS (System Managed Space) or DMS (Database Managed Space). An SMS table space is created using directory containers. A DMS table space is created using either file containers or device containers. Device containers are similar to XPS raw disk.

Important: The SMS table space is full as soon as any one of its containers is full. Thus, it is important to have the same amount of space available to each container.

There are a number of trade-offs to consider when determining which type of table space you should use to store your data.

Advantages of an SMS Table Space include:
 Space is not allocated by the system until it is required.
 Creating a table space requires less initial work, because you do not have to predefine the containers.

Advantages of a DMS Table Space include:
 The size of a table space can be increased by adding or extending containers, using the ALTER TABLESPACE statement. Existing data can be automatically rebalanced across the new set of containers to retain optimal I/O efficiency.
 A table can be split across multiple table spaces, based on the type of data being stored:
– Long field and LOB data
– Indexes
– Regular table data

 The location of the data on the disk can be controlled, if this is allowed by the operating system.
 If all table data is in a single table space, a table space can be dropped and redefined with less overhead than dropping and redefining a table.
 In general, a well-tuned set of DMS table spaces will outperform SMS table spaces.

For more details, see IBM DB2 Administration Guide: Planning, SC09-4822.

Creating and altering table spaces and containers
The syntax and process to create table spaces is similar whether you are using DMS or SMS. The basic parameters needed to explicitly create a table space are the table space name, how it is managed, and the container(s). Optionally, you can specify the use of the table space, such as regular or user temporary. You can also optionally specify the database partition group, page size, extent size, prefetch size, and buffer pool name. Unlike XPS, which uses kilobytes, DB2 UDB accepts sizes in the following units:
 integer — pages
 integer K — kilobytes
 integer M — megabytes
 integer G — gigabytes

In XPS, a dbslice resides on the coservers defined by a cogroup, and all of the dbspace names contain a coserver suffix. In DB2, there is no concept of a dbslice; however, a table space can reside on one or more partitions defined in a database partition group.

Example 4-6 on page 102 shows the creation of a DMS table space named space_dataindex, using four raw devices as containers in the ibmdefaultgroup database partition group, and an SMS table space named space_small in the onepartition_group database partition group, using four directory paths as containers.

Example 4-6 DMS table space

create tablespace space_dataindex in database partition group ibmdefaultgroup
pagesize 16K managed by database
using (
  device '/wrk2/tablespaces/db2a1. $N ' 1250000,
  device '/wrk2/tablespaces/db2a2. $N ' 1250000,
  device '/wrk2/tablespaces/db2a3. $N ' 1250000,
  device '/wrk2/tablespaces/db2a4. $N ' 1250000
)
bufferpool bp16k extentsize 32 prefetchsize 128;

create tablespace space_small in database partition group onepartition_group
pagesize 4K managed by system
using (
  '/wrk2/tablespaces/db2f1. $N /space_small',
  '/wrk2/tablespaces/db2f2. $N /space_small',
  '/wrk2/tablespaces/db2f3. $N /space_small',
  '/wrk2/tablespaces/db2f4. $N /space_small'
)
bufferpool ibmdefaultbp;

As discussed in “Managing database partition groups” on page 96, ibmdefaultgroup consists of all database partitions, and onepartition_group consists of database partition number 2 only. Therefore, table space space_dataindex resides on all database partitions, and space_small resides only on database partition number 2.

Similar to the %c formatting string used in XPS, the argument $N ([blank]$N) indicates a database partition expression. A database partition expression can be used anywhere in the container name, and multiple database partition expressions can be specified. Terminate the database partition expression with a space character, and whatever follows the space is appended to the container name after the database partition expression is evaluated. If there is no space character in the container name after the database partition expression, it is assumed that the rest of the string is part of the expression. The arguments for creating container operators can only be used in one of the forms depicted in the examples in Table 4-1 on page 103, and they are evaluated from left to right. The database partition number in these examples is assumed to be 5.

Table 4-1 Arguments for creating container operators

Syntax                          Example       Value
[blank]$N                       “$N”          5
[blank]$N+[number]              “$N+1011”     1016
[blank]$N%[number]              “$N%3”        2
[blank]$N+[number]%[number]     “$N+12%13”    4
[blank]$N%[number]+[number]     “$N%3+20”     22

4.3.2 Monitoring table space and container storage
Two methods are available to help you understand how much disk space your database objects are using: DB2 commands and the Storage Management tool.

Monitoring table space storage from the command line
Within DB2 UDB, you can use the db2 list tablespaces show detail command to view the table space type, page allocation and usage, extent size, page size, number of containers, and state for all table spaces. You first need a database connection to do so. Keep in mind that if you are using an SMS table space, all of the allocated pages are shown as used, and the free page value is not applicable. If you just want summary information, you can use the list tablespaces command. In a partitioned database environment, it only lists the results for the partition you are connected to. To get information from all partitions, invoke the command via db2_all as follows:

db2_all "db2 connect to tpch ; db2 list tablespaces"

To avoid including the database connection in each db2_all command, you can set the DB2DBDFT registry variable, which specifies the name of the database to be used for implicit connects. If an application has no database connection but issues SQL statements, an implicit connect is made to the database defined by DB2DBDFT.

Example 4-7 shows a run of the list tablespaces show detail command against database partition number 2 via db2_all.

Example 4-7 List table spaces

3-CLYDE [db2tpch] $ db2set DB2DBDFT=tpch
3-CLYDE [db2tpch] $ db2_all "<<+2< db2 list tablespaces show detail"

Tablespaces for Current Database

Tablespace ID                        = 1
Name                                 = TEMPSPACE1
Type                                 = System managed space
Contents                             = System Temporary data
State                                = 0x0000
  Detailed explanation:
    Normal
Total pages                          = 4
Useable pages                        = 4
Used pages                           = 4
Free pages                           = Not applicable
High water mark (pages)              = Not applicable
Page size (bytes)                    = 4096
Extent size (pages)                  = 32
Prefetch size (pages)                = 768
Number of containers                 = 4

Tablespace ID                        = 2
Name                                 = USERSPACE1
Type                                 = System managed space
Contents                             = Any data
State                                = 0x0000
  Detailed explanation:
    Normal
Total pages                          = 1
Useable pages                        = 1
Used pages                           = 1
Free pages                           = Not applicable
High water mark (pages)              = Not applicable
Page size (bytes)                    = 4096
Extent size (pages)                  = 32
Prefetch size (pages)                = 192
Number of containers                 = 1

Tablespace ID                        = 3
Name                                 = SPACE_DATAINDEX
Type                                 = Database managed space
Contents                             = Any data
State                                = 0x0000
  Detailed explanation:
    Normal
Total pages                          = 5000000
Useable pages                        = 4999808
Used pages                           = 1144736
Free pages                           = 3855072
High water mark (pages)              = 1178944
Page size (bytes)                    = 16384
Extent size (pages)                  = 32
Prefetch size (pages)                = 128
Number of containers                 = 4

Tablespace ID                        = 4
Name                                 = SPACE_SMALL
Type                                 = System managed space
Contents                             = Any data
State                                = 0x0000
  Detailed explanation:
    Normal
Total pages                          = 5547
Useable pages                        = 5547
Used pages                           = 5547
Free pages                           = Not applicable
High water mark (pages)              = Not applicable
Page size (bytes)                    = 4096
Extent size (pages)                  = 32
Prefetch size (pages)                = 768
Number of containers                 = 4

Tablespace ID                        = 5
Name                                 = SPACE_TEMP
Type                                 = System managed space
Contents                             = Any data
State                                = 0x0000
  Detailed explanation:
    Normal
Total pages                          = 4
Useable pages                        = 4
Used pages                           = 4
Free pages                           = Not applicable
High water mark (pages)              = Not applicable
Page size (bytes)                    = 16384
Extent size (pages)                  = 32
Prefetch size (pages)                = 128
Number of containers                 = 4

DB21011I In a partitioned database server environment, only the table spaces on the current node are listed.
CLYDE: db2 list tablespaces ... completed ok

To find more information about the containers in a table space, you can use the following DB2 command:

list tablespace containers for tablespace_number show detail

You can obtain the table space number from the list tablespaces show detail command; the containers command then provides the path to each storage container. With this information, you can view the container using system tools, such as ls -al or du -k -s directory_name.
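For example, to list the containers of table space ID 3 (SPACE_DATAINDEX in Example 4-7) on database partition 2, the command would be:

db2_all "<<+2< db2 list tablespace containers for 3 show detail"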

The alternative is to use the db2pd command. Running the db2pd command with the -db and -tablespaces options gives output similar to that of onstat -d in XPS. It also has options to collect information for a set of database partitions, -dbpartitionnum [,], and for all database partitions, -alldbpartitionnums.

Example 4-8 illustrates the command options to collect table space information for database partition number 3.

Example 4-8 Collecting table space information
4-CLYDE [db2tpch] $ db2pd -dbpartitionnum 3 -db tpch -tablespaces

Database Partition 3 -- Database TPCH -- Active -- Up 0 days 00:37:29

Tablespaces:
Address            Id Type Content PageSize ExtentSize Auto Prefetch BufID BufIDDisk State      TotPages UsablePgs UsedPgs PndFreePgs FreePgs HWM     MinRecTime NQuiescers NumCntrs MaxStripe Name
0x0780000073A2F000 1  SMS  SysTmp  4096     32         Yes  768      1     1         0x00000000 0        0         0       0          0       0       0          0          4        0         TEMPSPACE1
0x07800001772768C0 2  SMS  Any     4096     32         Yes  192      1     1         0x00000000 0        0         0       0          0       0       0          0          1        0         USERSPACE1
0x0780000177276EE0 3  DMS  Any     16384    32         No   128      2     2         0x00000000 5000000  4999808   1143648 0          3856160 1177856 0          0          4        0         SPACE_DATAINDEX
0x07800000739EA120 5  SMS  Any     16384    32         No   128      2     2         0x00000000 0        0         0       0          0       0       0          0          4        0         SPACE_TEMP

Containers:
Address            TspId ContainNum Type TotalPages UseablePgs StripeSet Container
0x0780000073A2F620 1     0          Path 0          0          0         /wrk2/tablespaces/db2f1.3/systmp
0x0780000073A2F770 1     1          Path 0          0          0         /wrk2/tablespaces/db2f2.3/systmp
0x0780000073A2F8C0 1     2          Path 0          0          0         /wrk2/tablespaces/db2f3.3/systmp
0x0780000073A2FA10 1     3          Path 0          0          0         /wrk2/tablespaces/db2f4.3/systmp
0x0780000073A2FB80 2     0          Path 0          0          0         /wrk2/db2tpch/db2tpch/NODE0003/SQL00001/SQLT0002.0
0x0780000177277500 3     0          Disk 1250000    1249952    0         /wrk2/tablespaces/db2a1.3
0x0780000177277650 3     1          Disk 1250000    1249952    0         /wrk2/tablespaces/db2a2.3
0x07800001772777A0 3     2          Disk 1250000    1249952    0         /wrk2/tablespaces/db2a3.3
0x07800001772778F0 3     3          Disk 1250000    1249952    0         /wrk2/tablespaces/db2a4.3
0x0780000177277A60 5     0          Path 0          0          0         /wrk2/tablespaces/db2f1.3/space_temp
0x0780000177277BB0 5     1          Path 0          0          0         /wrk2/tablespaces/db2f2.3/space_temp
0x0780000177277D00 5     2          Path 0          0          0         /wrk2/tablespaces/db2f3.3/space_temp
0x0780000177277E50 5     3          Path 0          0          0         /wrk2/tablespaces/db2f4.3/space_temp

In the output of db2pd with the -tablespaces option, the table space state is shown as a hexadecimal value instead of the text shown in the output of the list tablespaces command. The db2tbst (Get Tablespace State) command can be used to obtain the table space state associated with a given hexadecimal value.
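For example, a sketch of decoding the hexadecimal state 0x0020:

$ db2tbst 0x0020
State = Backup Pending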

Example 4-9 shows the state bit definitions from sqlutil.h.

Example 4-9 Table space bit definitions
0x0        Normal
0x1        Quiesced: SHARE
0x2        Quiesced: UPDATE
0x4        Quiesced: EXCLUSIVE
0x8        Load pending
0x10       Delete pending
0x20       Backup pending
0x40       Roll forward in progress
0x80       Roll forward pending
0x100      Restore pending
0x100      Recovery pending (not used)
0x200      Disable pending
0x400      Reorg in progress
0x800      Backup in progress
0x1000     Storage must be defined
0x2000     Restore in progress
0x4000     Offline and not accessible
0x8000     Drop pending
0x2000000  Storage may be defined
0x4000000  StorDef is in 'final' state
0x8000000  StorDef was changed prior to rollforward
0x10000000 DMS rebalancer is active
0x20000000 TBS deletion in progress
0x40000000 TBS creation in progress
0x8        For service use only

Monitoring table space storage from the Control Center
You can monitor table spaces in the DB2 Control Center, which provides a Storage Management tool with the ability to manage the storage of a specific database or database partition over the long term. It also allows you to capture data distribution snapshots and to view storage history.

A detailed example of using this tool is provided in Database Transition: Informix Dynamic Server to DB2 Universal Database, SG24-6367.

4.3.3 Transactions and logs
Both XPS and DB2 UDB support many variations of logging at the database level. While the objective is the same (physical or logical recovery of changes to the instance or databases), the implementation of the logging is very different. The following sections provide details on each approach.

Transaction logging occurs at the database level in DB2 UDB and can be enabled differently for each database in the instance. Unlike XPS, which uses both physical and logical logs, DB2 uses a single set of transaction logs. In DB2 UDB, logging must be either circular or archival. As in XPS, there is no option to create a database with a NO LOG option. At the table level, you can use the NOT LOGGED INITIALLY option on the CREATE TABLE or ALTER TABLE statement to avoid logging transactions during an initial table create and load. With this option, any changes made to the table by an Insert, Delete, Update, Create Index, Drop Index, or Alter Table operation in the same unit of work in which the table is created are not logged. The behavior is similar to a raw table in XPS.
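For example, a sketch of creating a staging table whose initial load is not logged; the table and source names are hypothetical, and auto-commit is turned off (see the Tip that follows) so that the create and load share one unit of work:

export DB2OPTIONS=+c
db2 "create table stage (c1 int) not logged initially"
db2 "insert into stage select c1 from source_table"
db2 "commit work"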

With DB2 you do not need to mark the beginning of a transaction, you only need to mark the end. Transactions begin implicitly with the first SQL statement executed, and end with a COMMIT or ROLLBACK.

Tip: By default, the DB2 CLP turns on auto-commit; it implicitly performs a COMMIT WORK after each SQL statement. To turn off auto-commit in the DB2 CLP, invoke it with the +c option (db2 +c) or set the environment variable DB2OPTIONS="+c", so that it behaves like dbaccess in XPS.

XPS uses a combination of a physical log and logical logs for database recovery. The physical log contains an image of each page prior to change, and the logical log records the transactions. DB2 UDB implements write-ahead logging, in which the changed data is always written to the log files before the change is committed. DB2 UDB logs changes, including the old version of the data, in its log files.

Managing logs
DB2 uses database configuration parameters to control the number and size of the logs and other aspects of logging behavior.

Unlike XPS, which maintains its log files in its dbspaces, DB2 creates the recovery log files for a database in a subdirectory of the directory containing the database. The default is a subdirectory named SQLOGDIR under the directory created for the database.

The current location of the log files is shown in the Path to log files field of the database configuration. You cannot change this parameter directly because it is set by the database manager after a change to the NEWLOGPATH parameter becomes effective.

Log files can be system files or raw devices. You can control the number and size of the DB2 log files via the LOGPRIMARY, LOGSECOND, and LOGFILSIZ database parameters. Consult the DB2 UDB Administration Guide, SC09-4821, for more details.

The DB2 Transaction Log is an important part of the backup and recovery processes. The Transaction Log records all committed changes to the database and it is used to bring the database to a consistent point after a power failure. It can also be used to restore a database to its current state after media failure or application failure.

A total log size of 256 GB is supported. In addition to the primary logs, DB2 UDB also uses a number (between 0 and 126) of secondary logs, as specified by the LOGSECOND parameter. Secondary logs do not occupy permanent file space; they are allocated one at a time as needed and de-allocated when they are no longer needed. Secondary logs are used when the primary logs fill up and more log space is required. Infinite active logging is also new in DB2 Version 8. It allows an active unit of work to span the primary logs and archive logs, effectively allowing a transaction to use an infinite number of log files. Without infinite active logging enabled, the log records for a unit of work must fit in the primary log space. Infinite active logging is enabled by setting LOGSECOND to -1, and can be used to support environments with large jobs that require more log space than you would normally allocate to the primary logs. Another important configuration parameter, BLK_LOG_DSK_FUL, allows you to specify that DB2 should not fail applications when a disk-full condition occurs on the active log path. When you enable this option, DB2 retries every five minutes, allowing you to resolve the full disk situation and allowing the applications to complete.

The amount of space (in bytes) that is required for primary log files is:

(LOGPRIMARY * (LOGFILSIZ + 2) * 4096) + 8192

The amount of space (in bytes) that is required for primary and secondary log files is:

((LOGPRIMARY + LOGSECOND) * (LOGFILSIZ + 2) * 4096) + 8192
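As a worked example with assumed values LOGPRIMARY = 3, LOGSECOND = 2, and LOGFILSIZ = 1000 (4 KB pages):

primary space   = (3 * (1000 + 2) * 4096) + 8192 = 12,320,768 bytes (about 11.8 MB)
total log space = ((3 + 2) * (1000 + 2) * 4096) + 8192 = 20,529,152 bytes (about 19.6 MB)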

You need to monitor the log-storage directory to ensure you have enough file system space to hold the logs.

Logging modes
There are two types of DB2 logging: circular and archive. Each provides a different level of recovery capability.

Circular logging is the default behavior when a new database is created. (The LOGARCHMETH1 and LOGARCHMETH2 database configuration parameters are set to OFF.) With this type of logging, only full, offline backups of the database are allowed. The database must be offline (inaccessible to users) when a full backup is taken. As the name suggests, circular logging uses several reusable online logs to provide recovery from transaction and system failures. The logs are used and retained only to the point of ensuring the integrity of current transactions. Circular logging does not allow you to roll a database forward through transactions performed after the last full backup operation. All changes occurring since the last backup operation are lost. Because this type of restore operation recovers your data to the specific point in time at which a full backup was taken, it is called version recovery. This type of logging is similar to setting the XPS logical log backup device to NULL.

Archive logging is used specifically for rollforward recovery. Archived logs are logs that were active but are no longer required for failure recovery. Use the LOGARCHMETH1 database configuration parameter to enable archive logging.
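As an illustration (the archive path /archive/logs and the database name sample are assumptions for this sketch), archive logging to disk can be enabled with a single configuration update:

db2 update db cfg for sample using LOGARCHMETH1 DISK:/archive/logs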

The advantage of choosing archive logging is that rollforward recovery can use both archived logs and active logs to rebuild a database either to the end of the logs, or to a specific point in time. The archived log files can be used to recover changes made after the backup was taken. This is different from circular logging where you can only recover to the time of the backup, and all changes made after that are lost.

Taking online backups is only supported if the database is configured for archive logging. During an online backup operation, all activities against the database are logged. When an online backup image is restored, the logs must be rolled forward at least to the point in time at which the backup operation completed. For this to happen, the logs must have been archived and made available when the database is restored. After an online backup is complete, DB2 forces the currently active log to be closed and, as a result, it will be archived. This ensures that your online backup has a complete set of archived logs available for recovery.

There are several options for how logs are archived with this mode. You can archive to disk, archive to tape, use tape management software such as Tivoli® Storage Manager, use third-party APIs, or write your own user exit. The LOGARCHMETH1 database configuration parameter specifies the media type of the primary destination for archived logs. LOGARCHMETH2 specifies the media type of the secondary destination for archived logs. If LOGARCHMETH2 is specified, log files are archived to both that destination and the destination specified by the LOGARCHMETH1 database configuration parameter.

DB2 UDB supports log mirroring at the database level, which is enabled by setting the MIRRORLOGPATH database configuration parameter. The MIRRORLOGPATH configuration parameter allows the database to write an identical second copy of log files to a different path. It is recommended that you place the secondary log path on a physically separate disk, preferably one that is also on a different disk controller. That way, the disk controller cannot be a single point of failure.

When MIRRORLOGPATH is first enabled, it is not actually used until the next database startup, similar to the NEWLOGPATH configuration parameter.

For more information, refer to DB2 Data Recovery and High Availability Guide and Reference, SC09-4831.

4.4 Backup and recovery

DB2 has a rich architecture for defining and managing backup and recovery tasks. It can back up and restore at both the database level and the table space level. A database can become unusable because of hardware or software failure, or both. Each failure scenario requires a different recovery action. It is critical to protect your data against the possibility of loss by having a well-rehearsed recovery strategy in place, just as you do with XPS. When making a transition from XPS to DB2, you should also consider:
- Will the database be recoverable?
- How much time can be spent recovering the database?
- How much time will pass between backup operations?
- How much storage space can be allocated for backup copies and archived logs?
- Will table space level backups be sufficient, or will full database backups be necessary?

The overall strategy should also include procedures for recovering command scripts, applications, user defined functions (UDFs), stored procedure code in operating system libraries, and load copies.

4.4.1 Recovery types
The concept of a database backup for DB2 is the same as for XPS. That is, taking a copy of the data and then storing it on a different medium in case of failure or damage to the original. The simplest case of a backup involves shutting down the database to ensure that no further transactions occur, and then simply backing it up. This is referred to as an offline backup. You can then rebuild the database if it becomes damaged or corrupted.

In DB2, the rebuilding of the database is called recovery. There are three types of recovery: version recovery, crash recovery, and rollforward recovery. We can now look at each of them.

Version recovery
This type of recovery involves the restoration of a previous version of the database, such as using an image that was created during a backup operation. Version recovery is equivalent to performing a physical restore in XPS.

Crash recovery
Crash recovery is the automatic recovery of the database if a failure occurs before all of the changes that are part of one or more units of work (transactions) are completed and committed. This type of recovery is equivalent to XPS fast recovery. DB2 performs crash recovery by rolling back incomplete transactions and completing committed transactions that were still in memory when the crash occurred. Crash recovery is performed automatically when required. Recovery log files and the recovery history file are created automatically when a database is created. These log files are important if you need to recover data that is lost or damaged.

Each database partition has its own recovery logs, which are used to recover from application or system errors. In combination with the database backups, they are used to recover the consistency of the database right up to the point in time when the error occurred. The recovery history file contains a summary of the backup information that can be used to determine recovery options, if all or part of the database must be recovered to a given point in time. It is used to track recovery-related events such as backup and restore operations, among others. This file is located in the database directory.

The table space change history file, which is also located in the database directory, contains information that can be used to determine which log files are required for the recovery of a particular table space. You cannot directly modify the recovery history file or the table space change history file. However, you can delete entries from the files using the PRUNE HISTORY command. You can also use the REC_HIS_RETENTN database configuration parameter to specify the number of days that these history files will be retained.
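As an illustration (the timestamp and retention value are hypothetical), history entries can be pruned and the retention period set as follows:

db2 prune history 20041001
db2 update db cfg for sample using REC_HIS_RETENTN 90

The first command removes history file entries older than October 1, 2004; the second retains history entries for 90 days.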

Data that is easily re-created can be stored in a non-recoverable (circular logging) database, as discussed in “Logging modes” on page 110. When circular logging is enabled, the only logs that are kept are those required for crash recovery. These logs are known as active logs, and they contain current transaction data. Version recovery using offline backups is the primary means of recovery for a non-recoverable database. Such a database can only be restored offline. It is restored to the state it was in when the backup image was taken and rollforward recovery is not supported.

Rollforward recovery
Rollforward recovery is the reapplication of transactions recorded in the database log files after a database or a table space backup image has been restored. Rollforward recovery is the equivalent of applying the logical logs in XPS.

When using a recoverable, or archiving enabled, database, both active logs and archived logs (which contain committed transaction data) can be used for recovery. This is known as rollforward recovery. Using the ROLLFORWARD utility, when you restore the online backup, you can then roll the database forward and apply transactions that took place after the backup by using the active and archived logs. You can rollforward to either a specific point in time, or to the end of the active logs.

For recoverable databases, backup operations can be performed either offline or online (online meaning that other applications can connect to the database during the backup operation). Online table space restore and rollforward operations are supported only if the database is recoverable. If the database is non-recoverable, database restore and rollforward operations must be performed offline. During an online backup operation, rollforward recovery ensures that all table changes are captured and reapplied if that backup is restored.

If you have a recoverable database, you can back up, restore, and roll individual table spaces forward, rather than the entire database. When you back up a table space online it is still available for use, and simultaneous updates are recorded in the logs. When you perform an online restore or rollforward operation on a table space, the table space itself is not available for use until the operation completes, but users are not prevented from accessing tables in other table spaces.

Automated backup operations
Because it can be time-consuming to determine whether and when to run maintenance activities such as backup operations, you can use the Configure Automatic Maintenance wizard to do this for you. With automatic maintenance, you specify your maintenance objectives, including when automatic maintenance can run. DB2 then uses these objectives to determine if the maintenance activities need to be done and then runs only the required maintenance activities during the next available maintenance window (a user-defined time period for the running of automatic maintenance activities). You can still perform manual backup operations when automatic maintenance is configured. DB2 only performs automatic backup operations if they are required. For more information, see 9.6.1, “Configuring automatic maintenance” on page 293.

4.4.2 Backup and restore methods
Similar to XPS, you can perform both full and incremental backups and restores. An incremental backup is a backup image that contains only pages that have been updated since the previous backup was taken. In addition to updated data and index pages, each incremental backup image also contains all of the initial database metadata (such as database configuration, table space definitions, database history, and so on) that is normally stored in full backup images.

Two types of incremental backup are supported:
- Incremental: An incremental backup image is a copy of all database data that has changed since the most recent, successful, full backup operation. This is also known as a cumulative backup image, because a series of incremental backups taken over time will each have the contents of the previous incremental backup image. The predecessor of an incremental backup image is always the most recent successful full backup of the same object, which is the equivalent of an XPS level-0 archive.
- Delta: A delta, or incremental delta, backup image is a copy of all database data that has changed since the last successful backup (full, incremental, or delta) of the table space in question. This is also known as a differential, or non-cumulative, backup image. The predecessor of a delta backup image is the most recent successful backup containing a copy of each of the table spaces in the delta backup image.

The key difference between incremental and delta backup images is their behavior when successive backups are taken of an object that is continually changing over time. Each successive incremental image contains the entire contents of the previous incremental image, plus any data that has changed, or is new, since the previous full backup was produced. Delta backup images contain only the pages that have changed since the previous image of any type was produced.

Combinations of database and table space incremental backups are permitted, in both online and offline modes of operation. Be careful when planning your backup strategy, because combining database and table space incremental backups implies that the predecessor of a database backup (or a table space backup of multiple table spaces) is not necessarily a single image but could be a unique set of previous database and table space backups taken at different times. The history file can help you to decide the best approach for recovery.

If you wish to enable incremental backups, you need to set the database configuration parameter TRACKMOD to YES, by issuing the following command:
UPDATE DB CONFIG USING TRACKMOD YES
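After TRACKMOD is enabled and a full backup has been taken, an incremental backup cycle might look like the following sketch (the database name and path are assumptions carried over from the examples in this chapter):

db2 backup database sample online to /wrk28/backup
db2 backup database sample online incremental to /wrk28/backup
db2 backup database sample online incremental delta to /wrk28/backup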

To rebuild the database or the table space to a consistent state, the recovery process must begin with a consistent image of the entire object (database or table space) to be restored, and must then apply each of the appropriate incremental backup images in the order described below.

Backup
In contrast to XPS, on a DB2 partitioned database system, database partitions are backed up individually. The operation is local to the database partition server on which you invoke the utility. You can, however, issue the db2_all command from one of the database partition servers in the instance to invoke the backup utility on a list of servers, which you identify by partition number. (Use the LIST DBPARTITIONNUMS command to identify the database partition servers that have user tables on them.)

If you do this, you must back up the catalog node first, then back up the other database partitions. In Example 4-10, the database SAMPLE is defined on two partitions, numbered 0 and 1. The path /wrk28/backup is accessible from all partitions. Partition 0 is the catalog partition and needs to be backed up separately in the first command, because this is an offline backup. In the second command, the db2_all utility issues the same backup command to each database partition (except partition 0) in parallel in the background. After the two commands complete successfully, two backup image files are available in the /wrk28/backup directory. The backup image is located at the target location that was specified, or the directory from which the command was issued, which can be:
- A directory (for backups to disk or diskette)
- A device (for backups to tape)
- A Tivoli Storage Manager server
- Another vendor's server

On UNIX-based systems, file names for backup images created on disk consist of a concatenation of several elements separated by periods:
DB_alias.Type.Inst_name.NODEnnnn.CATNnnnn.timestamp.Seq_num

The recovery history file is updated automatically with summary information whenever you invoke a database backup operation. This file is created in the same directory as the database configuration file.

Example 4-10 Database backup
0-CLYDE [db2test] $ db2_all "<<+0< db2 backup database sample to /wrk28/backup"

Backup successful. The timestamp for this backup image is : 20041103110543

CLYDE: db2 backup database ... completed ok

0-CLYDE [db2test] $ db2_all "||<<-0< db2 backup database sample to /wrk28/backup"
rah: omitting logical node 0

CLYDE: Backup successful. The timestamp for this backup image is : 20041103110704
CLYDE: db2 backup database ... completed ok

0-CLYDE [db2test] $ ls /wrk28/backup
SAMPLE.0.db2test.NODE0000.CATN0000.20041103110543.001
SAMPLE.0.db2test.NODE0001.CATN0000.20041103110704.001

You can also use the Command Editor to back up database partitions. You should also keep a copy of the db2nodes.cfg file with any backup copies you take, as protection against possible damage to this file. In addition, for an offline backup, the catalog partition must be backed up separately first. You might consider not storing any user data on the catalog partition, so that the catalog partition requires minimal time to finish the offline backup.

The following restrictions apply to the backup utility:
- A table space backup operation and a table space restore operation cannot be run at the same time, even if different table spaces are involved.
- If you want to be able to do rollforward recovery in a partitioned database environment, you must regularly back up the database on the list of database partitions, and you must have at least one backup image of the rest of the database partitions in the system (even those that do not contain user data for that database). Two situations require the backed-up image of a database partition at a database partition server that does not contain user data for the database:
  – You added a database partition server to the database system after taking the last backup, and you need to do forward recovery on this database partition server.
  – Point-in-time recovery is used, which requires that all database partitions in the system are in rollforward pending state.
- Online backup operations for DMS table spaces are incompatible with the following operations: load, reorganization (online and offline), drop table space, table truncation, index creation, and not logged initially (used with the CREATE TABLE and ALTER TABLE statements).

Note that, different from XPS, DB2 log files are not included in the backup image by default. You can specify the INCLUDE LOGS option to enable this. If you are using a database with recovery enabled, you will have other backup options, such as performing an incremental or delta backup, and performing the backup online instead of offline.
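For example (a sketch reusing the SAMPLE database and backup path from Example 4-10), an online backup that also stores the log files needed to restore it:

db2 backup database sample online to /wrk28/backup include logs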

Recovery
There are two methods for database recovery. The first uses the RESTORE command combined with the ROLLFORWARD command. The second uses the new RECOVER command.

RESTORE and ROLLFORWARD commands
Before you recover a database, it is important to first understand the recovery history file. A recovery history file contains the history of backup and restore events and is created with each database. It is updated automatically whenever any database object is backed up, restored, or rolled forward; when table spaces are created, altered, or dropped; or when a table is loaded, dropped, or reorganized.

You can use the summarized backup information in this file to recover all or part of a database to a given point in time. The information in the file includes:
- An identification (ID) field to uniquely identify each entry.
- The part of the database that was copied, and how.
- The time the copy was made.
- The location of the copy (stating both the device information and the logical way to access the copy).
- The last time a restore operation was done.
- The time at which a table space was renamed, showing the previous and the current name of the table space.
- The status of a backup operation: active, inactive, expired, or deleted.
- The last log sequence number saved by the database backup or processed during a rollforward recovery operation.

To see the entries in the recovery history file, use the History tab in the Journal, or issue the LIST HISTORY command from the command line. Example 4-11 shows the backup history for the database SAMPLE on partition number 1 since 2004/11/03 12:00:00 PM.

Example 4-11 Backup history
0-CLYDE [db2test] $ db2_all "<<+1< db2 list history backup since 20041103120000 for database sample"

List History File for sample

Number of matching file entries = 1

Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ----------- ---------
 B   D 20041103135049001   F    D  S0000000.LOG S0000000.LOG
----------------------------------------------------------------------------
Contains 3 tablespace(s):
00001 USERSPACE1
00002 QPCONTROL
00003 QPRESULT
----------------------------------------------------------------------------
Comment: DB2 BACKUP SAMPLE OFFLINE
Start Time: 20041103135049
End Time: 20041103135111
Status: A
----------------------------------------------------------------------------
EID: 46 Location: /wrk28/backup

CLYDE: db2 list history backup ... completed ok

Every backup operation (database, table space, or incremental) includes a copy of the recovery history file. The recovery history file is linked to the database and, therefore, dropping a database deletes the recovery history file. Restoring a database to a new location restores the recovery history file. Restoring does not overwrite the existing recovery history file unless the file that exists on disk has no entries. In that case, the database history will be restored from the backup image.

If the current database is unusable or not available, and the associated recovery history file is damaged or deleted, an option on the RESTORE command allows only the recovery history file to be restored. The recovery history file can then be reviewed to provide information about which backup to use to restore the database.
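A sketch of such a history-file-only restore, assuming the backup image taken in Example 4-10:

db2 restore database sample history file from /wrk28/backup taken at 20041103110543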

To restore the SAMPLE database that was backed up in Example 4-10, we restore the catalog partition first, then all other database partitions of the SAMPLE database from the /wrk28/backup directory, issuing the commands illustrated in Example 4-12 from one of the database partitions.

Example 4-12 Database restore
0-CLYDE [db2test] $ db2_all "<<+0< db2 restore database sample from /wrk28/backup taken at 20041103110543 into sample replace existing"

SQL2539W Warning! Restoring to an existing database that is the same as the backup image database. The database files will be deleted.
DB20000I The RESTORE DATABASE command completed successfully.
CLYDE: db2 restore database ... completed ok

0-CLYDE [db2test] $ db2_all "<<+1< db2 restore database sample from /wrk28/backup taken at 20041103110704 into sample replace existing"

SQL2539W Warning! Restoring to an existing database that is the same as the backup image database. The database files will be deleted.
DB20000I The RESTORE DATABASE command completed successfully.
CLYDE: db2 restore database ... completed ok

You might consider regularly performing a backup of the catalog node and avoid putting user data on it, because other data increases the time required for the backup and restore.

A database restore operation requires an exclusive connection. That is, no applications can be running against the database when the operation starts, and the restore utility prevents other applications from accessing the database until the restore operation completes successfully. A table space restore operation, however, can be done online. A table space is not usable until the restore operation (followed by rollforward recovery) completes successfully. In addition, you can perform restores across similar operating systems. This means you can restore between AIX, Solaris, and HP-UX; from Windows to Windows; and from Linux to Linux.

You can also redefine table space containers by invoking the RESTORE DATABASE command and specifying the REDIRECT parameter, or by using the Containers page of the Restore Database notebook in the Control Center. The process for invoking a redirected restore of an incremental backup image is similar to the process for a non-incremental backup image: Call the RESTORE DATABASE command with the REDIRECT parameter and specify the backup image from which the database should be incrementally restored.

During a redirected restore operation, directory and file containers are automatically created if they do not already exist. The database manager does not automatically create device containers.

Container redirection provides considerable flexibility for managing table space containers. For example, even though adding containers to SMS table spaces is not supported, you could accomplish this by specifying an additional container when invoking a redirected restore operation.
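A minimal sketch of a redirected restore follows; the table space ID 2 and the new container path are hypothetical, and the backup image is the one from Example 4-10:

db2 restore database sample from /wrk28/backup taken at 20041103110543 redirect
db2 "set tablespace containers for 2 using (path '/newdisk/ts2')"
db2 restore database sample continue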

Different from XPS, where a logical restore applies the transaction logs, in DB2 you use the ROLLFORWARD DATABASE command to recover a database by applying the transactions recorded in the database log files. It is invoked after a database or a table space backup image has been restored, or if any table spaces have been taken offline by the database due to a media error. The database must be recoverable (that is, the LOGARCHMETH1 or LOGARCHMETH2 database configuration parameters must be set to a value other than OFF) before the database can be recovered with rollforward recovery.

In a DB2 partitioned database environment, this command can only be invoked from the catalog partition. A database or table space rollforward operation to a specified point in time affects all partitions that are listed in the db2nodes.cfg file. A database or table space rollforward operation to the end of logs affects only the partitions that are specified. If no partitions are specified, it affects all partitions that are listed in the db2nodes.cfg file; if rollforward recovery is not needed on a particular partition, that partition is ignored. Example 4-13 gives a simple ROLLFORWARD DATABASE command example that rollforward recovers a table space residing on eight database partitions (3 to 10) listed in the db2nodes.cfg file. The database partitions on which the table space resides do not have to be specified; the utility defaults to the db2nodes.cfg file. See the DB2 Command Reference, SC26-8967, for more details and examples.

Example 4-13 Database rollforward
db2 rollforward database dwtest to end of logs on dbpartitionnums (3 to 10) tablespace (tssprodt)

RECOVER DATABASE command
The new RECOVER DATABASE command combines the functionality of the RESTORE DATABASE and ROLLFORWARD DATABASE commands. When you use this command, you specify the point-in-time to which you want the database recovered. You do not need to indicate which database backup image must be restored or which log files are required to reach the specified point-in-time. The RECOVER DATABASE command also supports recovery operations to the end of the log files.
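As a brief sketch (the timestamp is illustrative), either form can be used:

db2 recover database sample to 20041103135049
db2 recover database sample to end of logs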

See more details in Data Recovery and High Availability Guide and Reference, SC09-4831.

4.4.3 Table level restore
As a separate tool, DB2 High Performance Unload (HPU) for Multiplatforms and Workgroups Version 2.2 can extract tables from full, offline DB2 backups. It uses the catalog information in the backups, if provided. It allows extraction even if DB2 is down, and allows the extraction on a different system. It also supports the DB2 backup compression introduced in DB2 Version 8.1 Fix Pack 4. This tool provides some functionality for table level restore, similar to the archecker utility in XPS.

4.5 High availability

In addition to log mirroring at the database level, the DB2 product family offers several replication solutions. Although the high availability disaster recovery (HADR) feature is not supported when you have multiple database partitions on DB2 UDB ESE, WebSphere Information Integrator and DB2 UDB include SQL replication and queue replication solutions that can be used, in some configurations, to provide high availability. Failover protection can also be achieved by keeping a copy of your database on another machine that is perpetually rolling the log files forward. Log shipping is the process of copying whole log files to a standby machine, either from an archive device, or through a user exit program running against the primary database.

With this approach, the primary database is restored to the standby machine, using either the DB2 restore utility or the split mirror function. You can also use suspended I/O support to quickly initialize the new database. The standby database on the standby machine continuously rolls the log files forward. If the primary database fails, any remaining log files are copied over to the standby machine. After a rollforward to the end of the logs and stop operation, all clients are reconnected to the standby database on the standby machine.

4.5.1 Log mirroring
DB2 supports log mirroring at the database level, while with XPS you can mirror not only logs but also any dbspace. Mirroring log files helps protect a database from accidental deletion of an active log and from data corruption caused by hardware failure. The DB2 configuration parameter MIRRORLOGPATH specifies a secondary path for the database to manage copies of the active log, mirroring the volumes on which the logs are stored. The MIRRORLOGPATH configuration parameter allows the database to write an identical second copy of log files to a different path. It is recommended that you place the secondary log path on a physically separate disk (preferably one that is also on a different disk controller). That way, the disk controller cannot be a single point of failure.

When MIRRORLOGPATH is first enabled, it will not actually be used until the next database startup. This is similar to the NEWLOGPATH configuration parameter. If there is an error writing to either the active log path or the mirror log path, the database marks the failing path as bad, writes a message to the administration notification log, and writes subsequent log records to the remaining good log path only. DB2 does not attempt to use the bad path again until the current log file is completed. When DB2 needs to open the next log file, it verifies that this path is valid, and if so, begins to use it. If not, DB2 does not attempt to use the path again until the next log file is accessed for the first time.

There is no attempt to synchronize the log paths. However, DB2 keeps information about access errors that occur, so that the correct paths are used when log files are archived. If a failure occurs while writing to the remaining good path, the database shuts down.
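As a sketch (the mirror path is an assumption), log mirroring is enabled with a single configuration update and takes effect at the next database startup:

db2 update db cfg for sample using MIRRORLOGPATH /mirrorlogs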

4.5.2 Replication
DB2 UDB supports both SQL replication and queue replication to provide high availability. These functions maintain logically consistent copies of database tables at multiple locations. In addition, they provide flexibility and complex functionality, such as support for column and row filtering, data transformation, and updates to any copy of a table, and they can be used in partitioned database environments. DB2 offers two choices for replication: using SQL and using queues. The queue version of replication is targeted more at the needs of high availability. DB2 embeds MQSeries technology for guaranteed message delivery.

4.5.3 Online split mirror and suspended I/O support
You can create a split mirror image of your database as a snapshot or clone, a standby, or a backup image. Frequently this is achieved with the use of disk hardware or third-party packages that create a split mirror or snapshot of the database.

A split mirror is an instantaneous copy of the database that can be made by mirroring the disks containing the data, and splitting the mirror when a copy is required. Disk mirroring is the process of writing all of your data to two separate hard disks, with one being the mirror of the other. Splitting a mirror is the process of separating the primary and secondary copies of the database.

If you would rather not back up a large database using the DB2 backup utility, you can make copies from a mirrored image by using suspended I/O and the split mirror function. This approach also eliminates backup operation overhead from the production machine, and is a fast way to clone systems. Use the db2inidb command in conjunction with the suspend and resume commands to do this.

The SNAPSHOT option specifies that the mirrored database will be initialized as a clone of the primary database. You can then use this clone for testing or as a point in time failover environment. When you use the STANDBY option, the database is placed in roll forward pending state. New logs from the primary database can be fetched and applied to the standby database. The standby database can then be used in place of the primary database if it goes down. The MIRROR option specifies that the mirrored database is to be used as a backup image which can be used to restore the primary database.
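The three options map to the three db2inidb invocations sketched below (the database alias sample is illustrative):

db2inidb sample as snapshot
db2inidb sample as standby
db2inidb sample as mirror

The first initializes a clone for testing or point-in-time failover, the second leaves the database in rollforward pending state so it can apply logs from the primary, and the third prepares the image for use in restoring the primary database.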

To clone a database, see Example 4-14.

Example 4-14 Cloning a database using split mirror
1. Suspend I/O on the primary database:
   db2 set write suspend for database
2. Use appropriate operating system-level commands to split the mirror or mirrors from the primary database.
3. Resume I/O on the primary database:
   db2 set write resume for database
4. Catalog the mirrored database on the secondary system.
   Note: By default, a mirrored database cannot exist on the same system as the primary database. It must be located on a secondary system that has the same directory structure and uses the same instance name as the primary database. If the mirrored database must exist on the same system as the primary database, you can use the db2relocatedb utility or the RELOCATE USING option of the db2inidb command to accomplish this.
5. Start the database instance on the secondary system:
   db2start
6. Initialize the mirrored database on the secondary system:
   db2inidb database_alias as snapshot

4.6 Security

Security constructs such as user and group authentication, password encryption, and alternative authentication methods are common to both DB2 and XPS and, thus, should not be a problem during transition. DB2 also offers a few additional security levels, but most of the privileges are comparable to those known by XPS. When discussing security, two terms must be clarified:
- Data encryption: Sensitive data is stored encrypted on a storage device, or is transmitted between sender and receiver cryptographically secured.
- Authentication: Only permitted personnel are allowed to log on to sensitive servers, such as a database engine.

The subsequent sections discuss the XPS and DB2 security mechanisms.

4.6.1 Authorization and privileges

Authorization is the process whereby DB2 obtains information about an authenticated DB2 user, indicating the database operations that user can perform and what data objects can be accessed. With each user request, there might be more than one authorization check, depending on the objects and operations involved.

Authorization is performed using DB2 facilities. DB2 tables and configuration files are used to record the permissions associated with each authorization name. The authorization name of an authenticated user, and those of groups to which the user belongs, are compared with the recorded permissions. Based on this comparison, DB2 decides whether to allow the requested access.

There are two types of permissions recorded by DB2 UDB: privileges and authority levels. A privilege defines a single permission for an authorization name, enabling a user to create or access database resources. Privileges are stored in the database catalogs. Authority levels provide a method of grouping privileges and control over higher-level database manager maintenance and utility operations. Database-specific authorities are stored in the database catalogs; system authorities are associated with group membership, and the group names that are associated with the authority levels are stored in the database manager configuration file for a given instance.

Groups provide a convenient means of performing authorization for a collection of users without having to grant or revoke privileges for each user individually. Unless otherwise specified, group authorization names can be used anywhere that authorization names are used for authorization purposes. In general, group membership is considered for dynamic SQL and non-database object authorizations (such as instance level commands and utilities) but is not considered for static SQL. The exception to this general case occurs when privileges are granted to PUBLIC. These are considered when static SQL is processed. Specific cases where group membership does not apply are noted throughout the DB2 UDB documentation, where applicable.

The following list describes all authorities known by DB2 at the database manager level:
- SYSADM: System administration authority. SYSADM authority is the highest level of authority and has control over all the resources created and maintained by the database manager. SYSADM authority includes all the authorities of DBADM, SYSCTRL, and SYSMAINT, and the authority to grant or revoke DBADM authorities.

- SYSCTRL: System control authority. SYSCTRL authority is the highest level of system control authority and applies only to operations affecting system resources. It does not allow direct access to data. This authority includes privileges to create, update, or drop a database; to stop an instance or a database; and to create or drop a table space.
- SYSMAINT: System maintenance authority. SYSMAINT authority is the second level of system control authority. A user with SYSMAINT authority can perform maintenance operations on all databases associated with an instance. It does not allow direct access to data. This authority includes the following privileges:
  – To update database configuration files
  – To back up a database or a table space
  – To restore an existing database
  – To monitor a database
- SYSMON: System monitor authority. SYSMON_GROUP defines the group name with SYSMON authority for the instance. Users with SYSMON authority at the instance level have the ability to take database system monitor snapshots of a database manager instance or its databases.
- DBADM: Database administration authority. DBADM authority is the administrative authority specific to a single database. This authority includes privileges to create objects, issue database commands, and access the data in any of its tables through SQL statements. DBADM authority also includes the authority to grant or revoke CONTROL and individual privileges.

Figure 4-6 provides a graphical depiction of the DB2 authorities and privileges.

[Figure: hierarchy with SYSADM (System Administrator) at the top; below it SYSCTRL (System Resource Administrator), SYSMAINT (System Maintenance Administrator), and SYSMON (System Monitor Administrator) on the instance side, and DBADM (Database Administrator) and database users with privileges on the database side.]

Figure 4-6 Hierarchy of DB2 authorities and privileges

System authorities are stored in the database manager configuration file. You need to use the DB2 Command Line Processor or the DB2 Control Center to view and change the settings. Example 4-15 shows how to view system authorities via the command line.

Example 4-15 Location of authority assignment
db2 get dbm cfg
...
Unit of work                                 (DFT_MON_UOW) = OFF
Monitor health of instance and databases      (HEALTH_MON) = ON

SYSADM group name                           (SYSADM_GROUP) = DB2GRP1
SYSCTRL group name                         (SYSCTRL_GROUP) =
SYSMAINT group name                       (SYSMAINT_GROUP) =
SYSMON group name                           (SYSMON_GROUP) =

Client Userid-Password Plugin             (CLNT_PW_PLUGIN) =
Client Kerberos Plugin                   (CLNT_KRB_PLUGIN) =
...
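To change an authority group assignment, update the database manager configuration and restart the instance; in this sketch the group name db2ctrl is hypothetical:

db2 update dbm cfg using SYSCTRL_GROUP db2ctrl
db2stop
db2start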

4.6.2 Roles and groups

XPS allows you to create database specific user groups, called roles. After a role has been defined, users are assigned to the role and further permissions are granted to the role.

DB2 is aware of groups and calls them just that, groups. However, in contrast to XPS, groups are not defined at database level but at operating system level. DB2 uses operating system groups.

Another difference between DB2 and XPS regarding groups is that with DB2 there is no set role SQL command to assign a user to a group role in order to inherit the permissions of a group. A user inherits all permissions of a group automatically because the user login group is already known.
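For reference, the XPS pattern that has no direct DB2 equivalent looks like the following sketch; the role, table, and user names are hypothetical:

-- XPS (Informix) role usage; DB2 V8 has no SET ROLE equivalent
CREATE ROLE r_report;
GRANT SELECT ON t1 TO r_report;
GRANT r_report TO u1;
SET ROLE r_report;  -- issued by user u1 in a session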

The permissions to a group are granted or revoked similar to XPS, as shown in Example 4-16.

Example 4-16 Grant/Revoke permissions to/from a group
GRANT CONNECT ON DATABASE TO <group>;
REVOKE CONNECT ON DATABASE FROM <group>;

Public group
The purpose of the PUBLIC group is the same as in XPS. Every user is assigned to this group; by that we mean all users who are known to the operating system, independent of what authentication method is used.

As with XPS, you must be careful when working with this group. If not administered closely, a security hole can be produced. Example 4-17 shows a situation where the permissions are not set as you might expect. That is, user u1 is still allowed to select data from table t1, because this user is part of the group PUBLIC.

Example 4-17 Is user u1 allowed to select table t1?
# uid=db2inst1
db2 => create database ifmx2db2
db2 => connect to ifmx2db2
db2 => create table t1 (col1 int)
db2 => grant connect on database to public
db2 => grant select on table t1 to u1
db2 => grant select on table t1 to public
db2 => revoke select on table t1 from u1

4.6.3 Security levels

Both XPS and DB2 apply security at different levels on different database objects. You can have specific privileges on tables, and other specific privileges on the columns of a table. At the database level, DB2 maintains security as authorities. Authorities apply to the entire database, whereas privileges apply to specific objects within the database.

As with XPS, the GRANT and REVOKE SQL statements are used in DB2 to assign authorities and privileges to a user. The following is a list of those authorities and privileges, with their descriptions:
- Default privileges: DB2 does not grant default privileges the way that XPS does. If you grant the database CONNECT privilege to a user, the user does not automatically have permission to query any table in the database. All subsequent privileges have to be configured manually. You will find that DB2 is more restrictive in this area than XPS.
- Database authorities: The syntax to administer database authorities in DB2 (similar to database level privileges in XPS) is slightly different than in XPS. The syntax to grant database authorities is depicted in Figure 4-7.

Figure 4-7 Database authorities syntax

Example 4-18 illustrates the Grant and Revoke commands.

Example 4-18 Granting and revoking permissions
GRANT CONNECT ON DATABASE TO mark;
REVOKE CONNECT ON DATABASE FROM nora;

DB2 offers more specific privileges than XPS. Some of these additional privileges are not covered here because XPS has nothing comparable, and they are therefore not relevant to a discussion on transitioning.

Figure 4-8 provides an overview of the authorities and privileges and how they relate.

[Figure: instance authorities (SYSADM, SYSCTRL, SYSMAINT, SYSMON) and database authorities (DBADM, BINDADD, CONNECT, CREATE_NOT_FENCED, CREATETAB, IMPLICIT_SCHEMA, LOAD) shown alongside the object privileges: table space (USE), schema (CREATEIN, ALTERIN, DROPIN), package (CONTROL, BIND, EXECUTE), table and view (ALL, ALTER, CONTROL, INDEX, INSERT, REFERENCES, SELECT, UPDATE), sequence (USAGE), index (CONTROL), and routine (EXECUTE).]

Figure 4-8 Privileges overview

The following is a list of the applicable authorities and privileges and their descriptions:
- BINDADD: Grants the authority to create packages. The creator of a package automatically has the CONTROL privilege on that package and retains this privilege even if the BINDADD authority is subsequently revoked.
- CONNECT: Grants the authority to access the database.

- CREATETAB: Grants the authority to create base tables. The creator of a base table automatically has the CONTROL privilege on that table. The creator retains this privilege even if the CREATETAB authority is subsequently revoked. There is no explicit authority required for view creation. A view can be created at any time if the authorization ID of the statement used to create the view has either CONTROL or SELECT privilege on each base table of the view.
- CREATE_EXTERNAL_ROUTINE: Grants the authority to register external routines. Care must be taken that routines so registered will not have adverse side effects. (For more information, see the description of the THREADSAFE clause on the CREATE or ALTER routine statements.) When an external routine has been registered, it continues to exist, even if CREATE_EXTERNAL_ROUTINE is subsequently revoked.
- CREATE_NOT_FENCED_ROUTINE: Grants the authority to register routines that execute in the database manager's process. Care must be taken that routines so registered will not have adverse side effects. (For more information, see the description of the FENCED clause on the CREATE or ALTER routine statements.) When a routine has been registered as not fenced, it continues to run in this manner, even if CREATE_NOT_FENCED_ROUTINE is subsequently revoked. CREATE_EXTERNAL_ROUTINE is automatically granted to an authorization-name that is granted CREATE_NOT_FENCED_ROUTINE authority.
- DBADM: Grants the database administrator authority and all other database authorities. A database administrator has all privileges against all objects in the database and can grant these privileges to others.

Note: All other database authorities are implicitly and automatically granted to an authorization-name that is granted DBADM authority.

- IMPLICIT_SCHEMA: Grants the authority to implicitly create a schema.
- LOAD: Grants the authority to load data into this database. This authority gives a user the right to use the LOAD utility in this database. SYSADM and DBADM also have this authority by default. However, if a user only has LOAD authority (not SYSADM or DBADM), the user is also required to have table-level privileges.

  In addition to LOAD privilege, the user must have:
  – INSERT privilege on the table for LOAD with mode INSERT, TERMINATE (to terminate a previous LOAD INSERT), or RESTART (to restart a previous LOAD INSERT)
  – INSERT and DELETE privilege on the table for LOAD with mode REPLACE, TERMINATE (to terminate a previous LOAD REPLACE), or RESTART (to restart a previous LOAD REPLACE)
  – INSERT privilege on the exception table, if such a table is used as part of LOAD
- QUIESCE_CONNECT: Grants the authority to access the database while it is quiesced.

Index privileges
Similar to XPS, in DB2 there is a privilege for controlling indexes. Figure 4-9 shows the syntax for the commands. CONTROL grants the privilege to drop the index. This is the CONTROL authority for indexes, which is granted automatically to creators of indexes.

GRANT CONTROL ON INDEX index-name
    TO {USER | GROUP} authorization-name, ... | PUBLIC

REVOKE CONTROL ON INDEX index-name
    FROM {USER | GROUP} authorization-name, ... | PUBLIC

Figure 4-9 GRANT/REVOKE index syntax

Package privileges
Packages contain static SQL statements that are executed by an application. Before an application can run against a DB2 database, the application-specific package has to be bound (registered) in the database. The syntax for the grant or revoke of a package privilege is depicted in Figure 4-10. A description of packages can be found in Chapter 11, “Application conversion considerations” on page 321.


GRANT {BIND | CONTROL | EXECUTE} ON PACKAGE [schema-name.]package-id
    TO {USER | GROUP} authorization-name, ... | PUBLIC [WITH GRANT OPTION]

REVOKE {BIND | CONTROL | EXECUTE} ON PACKAGE [schema-name.]package-id
    FROM {USER | GROUP} authorization-name, ... | PUBLIC [BY ALL]

Figure 4-10 GRANT/REVOKE package syntax

The following is a list of the privileges:
- BIND: Grants the privilege to bind a package. The BIND privilege allows a user to re-issue the BIND command against that package, or to issue the REBIND command. It also allows a user to create a new version of an existing package. In addition to the BIND privilege, a user must hold the necessary privileges on each table referenced by static DML statements contained in a program. This is necessary because authorization on static DML statements is checked at bind time.
- CONTROL: Grants the privilege to rebind, drop, or execute the package and extends package privileges to other users. The CONTROL privilege for packages is granted automatically to creators of packages. A package owner is the package binder or the ID specified with the OWNER option at bind or precompile time. BIND and EXECUTE are granted automatically to an authorization-name that is granted CONTROL privilege. CONTROL grants the ability to grant the above privileges (except for CONTROL) to others.
- EXECUTE: Grants the privilege to execute the package.

- WITH GRANT OPTION: Allows the specified authorization-name to GRANT the privileges to others. If the specified privileges include CONTROL, the WITH GRANT OPTION applies to all of the applicable privileges except for CONTROL (SQLSTATE 01516).

Routine privileges
Routine is a collective term for stored procedures, functions, and methods. The programming language in which a routine has been created is irrelevant to its privileges. EXECUTE grants the privilege to run the identified user-defined function, method, or stored procedure.
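For example (the schema, routine, and user names are hypothetical):

GRANT EXECUTE ON PROCEDURE myschema.load_sales TO USER mark;
GRANT EXECUTE ON FUNCTION myschema.fiscal_week TO PUBLIC;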

Schema privileges
A schema is a namespace where database objects are consolidated. A database can have many schemas. Figure 4-11 depicts the syntax for granting and revoking schema privileges.

GRANT {ALTERIN | CREATEIN | DROPIN} ON SCHEMA schema-name
    TO {USER | GROUP} authorization-name, ... | PUBLIC [WITH GRANT OPTION]

REVOKE {ALTERIN | CREATEIN | DROPIN} ON SCHEMA schema-name
    FROM {USER | GROUP} authorization-name, ... | PUBLIC [BY ALL]

Figure 4-11 GRANT/REVOKE schema syntax

The following is a list of the schema privileges that can be granted or revoked:
- ALTERIN: Grants the privilege to alter or comment on all objects in the schema. The owner of an explicitly created schema automatically receives the ALTERIN privilege.

- CREATEIN: Grants the privilege to create objects in the schema. Other authorities or privileges that are required to create the object (such as CREATETAB) are still required. The owner of an explicitly created schema automatically receives the CREATEIN privilege. An implicitly created schema has CREATEIN privilege granted to PUBLIC automatically.
- DROPIN: Grants the privilege to drop all objects in the schema. The owner of an explicitly created schema automatically receives the DROPIN privilege.

Sequence privileges
Figure 4-12 depicts the sequence syntax. The sequence privilege that can be granted or revoked is USAGE, which grants the privilege to reference a sequence using a nextval-expression or prevval-expression.

GRANT USAGE ON SEQUENCE sequence-name TO PUBLIC

REVOKE USAGE ON SEQUENCE sequence-name FROM PUBLIC

Figure 4-12 GRANT/REVOKE sequence syntax

Table, view, or nickname privileges
Most of the table privileges found here are identical to those of XPS, and they are described in the following list:
- ALL (ALL PRIVILEGES): Grants all the appropriate privileges, except CONTROL, on the base table, view, or nickname named in the ON clause. If the authorization ID of the statement has CONTROL privilege on the table, view, or nickname, or DBADM or SYSADM authority, then all the privileges applicable to the object (except CONTROL) are granted. Otherwise, the privileges granted are all those grantable privileges that the authorization ID of the statement has on the identified table, view, or nickname. If ALL is not specified, one or more of the keywords in the list of privileges must be specified.

- ALTER: Grants the privilege to:
  – Add columns to a base table definition.
  – Create or drop a primary key or unique constraint on a base table.
  – Create or drop a foreign key on a base table. (The REFERENCES privilege on each column of the parent table is also required.)
  – Create or drop a check constraint on a base table.
  – Create a trigger on a base table.
  – Add, reset, or drop a column option for a nickname.
  – Change a nickname column name or data type.
  – Add or change a comment on a base table or a nickname.
- CONTROL: Grants the following:
  – All of the appropriate privileges in the list, that is:
    • ALTER, CONTROL, DELETE, INSERT, INDEX, REFERENCES, SELECT, and UPDATE to base tables.
    • CONTROL, DELETE, INSERT, SELECT, and UPDATE to views.
    • ALTER, CONTROL, INDEX, and REFERENCES to nicknames.
    • The ability to grant the above privileges (except for CONTROL) to others.
  – The ability to drop the base table, view, or nickname. This ability cannot be extended to others on the basis of holding CONTROL privilege. The only way that it can be extended is by granting the CONTROL privilege itself, and that can only be done by someone with SYSADM or DBADM authority.
  – The ability to execute the RUNSTATS utility on the table and indexes.
  – The ability to issue the SET INTEGRITY statement against a base table, materialized query table, or staging table.
  The definer of a base table, materialized query table, staging table, or nickname automatically receives the CONTROL privilege. The definer of a view automatically receives the CONTROL privilege if the definer holds the CONTROL privilege on all tables, views, and nicknames that are identified in the full select.
- DELETE: Grants the privilege to delete rows from the table or updatable view.

- INDEX: Grants the privilege to create an index on a table, or an index specification on a nickname. This privilege cannot be granted on a view. The creator of an index or index specification automatically has the CONTROL privilege on the index or index specification (authorizing the creator to drop the index or index specification). In addition, the creator retains the CONTROL privilege even if the INDEX privilege is revoked.
- INSERT: Grants the privilege to insert rows into the table or updatable view and to run the IMPORT utility.
- REFERENCES: Grants the privilege to create and drop a foreign key referencing the table as the parent. If the authorization ID of the statement has one of:
  – DBADM or SYSADM authority
  – CONTROL privilege on the table
  – REFERENCES WITH GRANT OPTION on the table
  then the grantee(s) can create referential constraints using all columns of the table as parent key, even those added later using the ALTER TABLE statement. Otherwise, the privileges granted are all those grantable column REFERENCES privileges that the authorization ID of the statement has on the identified table. The privilege can be granted on a nickname, although foreign keys cannot be defined to reference nicknames.
- REFERENCES (column-name,...): Grants the privilege to create and drop a foreign key using only those columns specified in the column list as a parent key. Each column-name must be an unqualified name that identifies a column of the table identified in the ON clause. Column level REFERENCES privilege cannot be granted on typed tables, typed views, or nicknames (SQLSTATE 42997).
- SELECT: Grants the privilege to:
  – Retrieve rows from the table or view.
  – Create views on the table.
  – Run the EXPORT utility against the table or view.

- UPDATE: Grants the privilege to use the UPDATE statement on the table or updatable view identified in the ON clause. If the authorization ID of the statement has one of the following:
  – DBADM or SYSADM authority
  – CONTROL privilege on the table or view
  – UPDATE WITH GRANT OPTION on the table or view
  then the grantee(s) can update all updatable columns of the table or view on which the grantor has the with-grant privilege, as well as those columns added later using the ALTER TABLE statement. Otherwise, the privileges granted are all those grantable column UPDATE privileges that the authorization ID of the statement has on the identified table or view.
- UPDATE (column-name,...): Grants the privilege to use the UPDATE statement to update only those columns specified in the column list. Each column-name must be an unqualified name that identifies a column of the table or view identified in the ON clause. Column level UPDATE privilege cannot be granted on typed tables, typed views, or nicknames (SQLSTATE 42997).

Table space privileges
The table space privileges involve actions on the table spaces in a database. The following are the privileges that can be used:
- USE: Grants the privilege to specify, or default to, the table space when creating a table. The creator of a table space automatically receives the USE privilege with grant option.
- OF TABLESPACE: Identifies the table space on which the USE privilege is to be granted. The table space cannot be SYSCATSPACE (SQLSTATE 42838) or a system temporary table space (SQLSTATE 42809).

4.6.4 Client/server security

In a client/server environment, your data might also require protection because it is transmitted over a network between the participants. To provide security, you can choose to encrypt the data before it is transmitted. DB2 UDB V8.2 introduces client/server encryption. To use client/server encryption, enable the feature at the instance level, as shown in Example 4-19.

Example 4-19 DB2 encryption
Database manager authentication (AUTHENTICATION) = DATA_ENCRYPT

DB2 offers two settings for client/server encrypted communication:
 DATA_ENCRYPT: All clients must use data encryption.
 DATA_ENCRYPT_CMP: Clients can use data encryption; this mode exists for compatibility with down-level clients that do not support encryption.

The encryption method is 56-bit DES.
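As a sketch, assuming a standard CLP session, the parameter can be set and the instance recycled as follows:

db2 update dbm cfg using AUTHENTICATION DATA_ENCRYPT
db2stop
db2start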

4.6.5 Authentication methods
Similar to XPS, authentication of a user is completed using a security facility outside of DB2 UDB. In addition, the security facility that DB2 uses not only can be part of the operating system, but also can be a separate product or, in certain cases, might not exist at all. On UNIX-based systems, the security facility is in the operating system itself. DB2 UDB also supports LDAP and security plug-ins, which you can develop yourself or obtain from a third party.

The authentication type is stored in the database manager configuration file at the server. It is initially set when the instance is created. There is one authentication type per instance, which covers access to that database server and all the databases under its control.

The following authentication types are provided by DB2 UDB:
 SERVER
Specifies that authentication occurs on the server using local operating system security. This is the default security mechanism.
 SERVER_ENCRYPT
Specifies that the server accepts encrypted SERVER authentication schemes.
 CLIENT
Specifies that authentication occurs on the database partition where the application is invoked, using operating system security.
 KERBEROS
Used when both the DB2 UDB client and server are on operating systems that support the Kerberos security protocol.
 KRB_SERVER_ENCRYPT
Specifies that the server accepts KERBEROS authentication or encrypted SERVER authentication schemes.
 DATA_ENCRYPT
The server accepts encrypted SERVER authentication schemes and the encryption of user data.
 DATA_ENCRYPT_CMP
The server accepts encrypted SERVER authentication schemes and the encryption of user data. In addition, this authentication type allows compatibility with down-level products that do not support the DATA_ENCRYPT authentication type.
 GSSPLUGIN
Specifies that the server uses a GSS-API plug-in to perform authentication.
 GSS_SERVER_ENCRYPT
Specifies that the server accepts plug-in authentication or encrypted server authentication schemes.

For more detailed information, see the DB2 UDB Administration Guide: Implementation, SC09-4820, and Administration Guide: Performance, SC09-4821.



Chapter 5. Data types

This chapter discusses data type issues. Although XPS and DB2 support the same types of data in general terms, each has particulars in its implementation. Most of these particulars are internal and do not overtly affect your application. However, some of them certainly can if you use them. This chapter provides information about these potential issues.

5.1 Object names

XPS V8.40 and later has support for long object names. The implementation of long object names is such that you can use long names for any type of database object name. DB2 supports long names for some objects, but not all, as shown in Table 5-1.

If you upgraded to XPS V8.40 and have not taken advantage of long object names or if you are currently on a version of XPS prior to V8.40, you should not encounter issues in this area.

Table 5-1 Maximum object name lengths

Object type               XPS V8.40 and later    DB2 version 8.2
User                      32                     30
Constraint name           128                    18
Correlation name          128                    128
Cursor name               128                    18
Host identifier           128                    255
Schema name               32                     30
Database name             128                    8
Statement name            128                    18
Column name               128                    30
Table name                128                    128
View name                 128                    128
Stored procedure name     128                    128
Synonym/alias             128                    128
User-defined type         128                    18
User-defined function     128                    18
Tablespace/dbspace name   128                    18
Trigger                   128                    18
Index name                128                    18

If you have used long object names in XPS, when moving to DB2 you must physically shorten the object names in the DDL. Even if the first n characters of the object name are unique, the DB2 command line processor will not accept more than the maximum.

5.2 Data type mapping

Table 5-2 shows equivalent data types in XPS and DB2, together with space requirements expressed in bytes (octets, 8 bits).

Table 5-2 XPS and DB2 data type mapping

XPS data type          Space requirement     DB2 data type                  Space requirement
CHAR(n)                n <= 32,767           CHAR(n)                        n <= 254
                                             VARCHAR(n)                     n <= 32,672
                                             LONG VARCHAR(n)                n <= 32,700
VARCHAR(255)           1 byte per row        VARCHAR(32,672)                4 bytes overhead;
                       overhead                                             2 with value compression
NCHAR(n)                                     GRAPHIC(127)                   Up to 127 two-byte
                                                                            characters; <= 254 bytes
                                                                            total. LONG VARGRAPHIC
                                                                            is also available.
TEXT                   Up to 2 GB            CLOB(n)                        Up to 2 GB
                                             VARCHAR(n)                     n <= 32,672
                                             LONG VARCHAR(n)                n <= 32,700
INT, INTEGER           4 bytes               INT, INTEGER                   4 bytes
INT8                   8 bytes               BIGINT                         8 bytes
SMALLINT               2 bytes               SMALLINT                       2 bytes
DECIMAL(p,s),          s odd: (p+4)/2        DEC(p,s), DECIMAL(p,s),        p/2+1
DEC(p,s),              s even: (p+3)/2       p <= 31 digits
p <= 32 digits
DECIMAL(p)             See FLOAT             FLOAT                          8 bytes
FLOAT                  8 bytes               FLOAT                          8 bytes
SMALLFLOAT             4 bytes               REAL                           4 bytes
DOUBLE PRECISION       8 bytes               DOUBLE PRECISION               8 bytes
MONEY, such as                               DEC(p,s), DECIMAL(p,s),        p/2+1
DECIMAL(16,2);                               p <= 31 digits
MONEY(p,s), such as
DECIMAL(p,s)
BYTE                   Up to 2 GB            BLOB(n)                        Up to 2 GB
                                             VARCHAR FOR BIT DATA
                                             LONG VARCHAR FOR BIT DATA
DATETIME               (total digits)/2+1    DATE, TIME, or TIMESTAMP       TIMESTAMP: 10 bytes
(YYYY-MM-DD            (depends on           (YYYY-MM-DD-HH.MM.SS.nnnnnn)
HH:MM:SS.nnnnn)        precision length)
DATE                   4 bytes               DATE (MM/DD/YYYY USA, or       4 bytes
                                             other ISO or national format)
DATETIME               (total digits)/2+1    TIME                           8 bytes
(hour to sec.)
SERIAL, SERIAL8        4 or 8 bytes          INTEGER with IDENTITY,         4 or 8 bytes
                                             BIGINT with IDENTITY
INTERVAL               (total digits)/2+1    DECIMAL                        See 5.8.3, "INTERVAL
                                                                            data type" on page 154.

5.3 NULL values

DB2 uses an extra byte of storage for a NULL indicator. In DB2, for each nullable column (that is, each column defined without a NOT NULL constraint), 1 extra byte of storage is required. If you have a large table with a large number of nullable columns, the impact of this extra byte could be significant. You should consider this when planning storage.

Aside from internal storage, XPS and DB2 manipulate NULL values the same way. The presence of NULL values in columns affects joins and arithmetic. For example, the result of a calculation with a NULL value is NULL.

5.4 Disk considerations

DB2 supports 4 KB, 8 KB, 16 KB, and 32 KB page sizes. Table limits are determined based on the page size, as shown in Table 5-3. For a full list of limits in DB2, see Appendix A, "SQL Limits" in the DB2 SQL Reference, SC09-2974 and SC09-2975.

Table 5-3 Maximums per page size

Table limit                                   4 KB page   8 KB page   16 KB page   32 KB page
Maximum length of a row including all         4005        8101        16,293       32,677
overhead
Most columns in a table                       500         1012        1012         1012
Maximum size of a table (per partition)       64 GB       128 GB      256 GB       512 GB
Maximum size of an index (per partition)      64 GB       128 GB      256 GB       512 GB
Maximum size of a DMS table space             64 GB       128 GB      256 GB       512 GB
Most elements in a select list                500         1012        1012         1012
Maximum index key length                      1024        1024        1024         1024
Most columns in an index key                  16          16          16           16
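Because these limits scale with page size, a large table typically requires a table space with a larger page. A minimal sketch of creating such a table space (the names, path, and sizes are illustrative; a buffer pool with a matching page size is required):

CREATE BUFFERPOOL bp16k SIZE 10000 PAGESIZE 16K
CREATE TABLESPACE ts16k PAGESIZE 16K MANAGED BY SYSTEM USING ('/data/ts16k') BUFFERPOOL bp16k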

5.5 Character types

This section discusses the character types that each server supports, their characteristics, and how the servers handle them.

When providing examples of strings or single characters, the examples are enclosed in double quotation marks for XPS, because that is what users generally use. We recommend that, over time, you convert double quotation marks to single quotation marks, because single quotation marks are standard SQL. They are fully supported by XPS and required by DB2 (and all other database management systems conforming to the SQL standard).

5.5.1 Truncation
In XPS log mode ANSI databases and in DB2, when an inserted value exceeds the maximum length of the column, an error is returned. Truncation is not supported. For XPS logged, non-ANSI-mode databases, truncation occurs automatically if an inserted value exceeds the maximum length, and no error is returned.

Tip: Unless you have been using the XPS log mode ANSI database, a change in character truncation might have significant application ramifications.

It is possible that your XPS application uses host variables that are of generic size and that no checking is done for size limits. For these applications, you might see an increase in INSERT and UPDATE failures because the application might be attempting to process strings that are too long.

Tip: If your application does not check for failed INSERT or UPDATE errors, you can experience truncation failures and might not be aware of them. You might be losing data.

5.5.2 NCHAR data type
Although the NCHAR data type maps to the DB2 GRAPHIC data type, there are storage differences.

The XPS NCHAR data type can be used for single-byte or double-byte characters. XPS uses the codepage to determine whether an NCHAR string must store single-byte or double-byte data. NCHAR columns that store single-byte data use less disk space than NCHAR columns that store double-byte data.

With the DB2 GRAPHIC data type, DB2 always assumes the data is double byte and thus allocates double the length specified.

Tip: If your XPS application stores single-byte characters in NCHAR columns, with DB2 GRAPHIC or VARGRAPHIC your disk space requirements will double.

If your XPS database has been defined with NCHAR columns, but you are not using the NLS functionality, you might investigate moving to a DB2 CHAR or VARCHAR to avoid the extra disk consumption.

5.5.3 VARCHAR data type
XPS allows specification of minimum pre-allocated space for VARCHARs with a reserve parameter, as follows:
CREATE TABLE spoonman (c1 VARCHAR(max,reserve))

The reserve signals XPS to reserve a certain amount of space for VARCHAR columns when rows are INSERTed. Even if the value for that column is NULL or its length is less than the reserve, the reserve space is still allocated. This is useful when columns are initially INSERTed empty but UPDATEd later. Having the reserve allows the row to stay in place and avoids row chaining with a forward pointer.

DB2 does not support pre-allocation of space for VARCHARs, and thus the two-parameter declaration is not supported. Also, DB2 does not use row chaining, but uses a system of variable page sizes to accommodate larger rows. See 3.4, "Configuring the instance" on page 79.

Tip: In DB2, VARCHAR columns are the only columns whose data type can be altered with the ALTER TABLE statement.
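For example, to widen a VARCHAR column (the table and column names are hypothetical; in DB2 V8 the length can only be increased):

ALTER TABLE spoonman ALTER COLUMN c1 SET DATA TYPE VARCHAR(200)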

5.5.4 TEXT data type
The XPS TEXT data type contains only printable characters. One possible character contained within a TEXT value might be a carriage return character. The DB2 load utility, by default, uses a carriage return as a row delimiter. Thus, when moving XPS TEXT data with embedded carriage returns to DB2 VARCHAR, multiple rows can result because the DB2 load utility assumes the carriage returns are row delimiters.

To resolve this situation, set the delprioritychar file type modifier on the LOAD command to change the priority from row delimiter to character delimiter. When doing this, if a row delimiter is found within a character string, the character delimiter takes priority over the row delimiter and the row is not split.

5.6 Numerical data types

Both XPS and DB2 support the basic numerical data types. There are differences, however, in specialized numerical types and the detailed handling of all these types.

5.6.1 Numerical limits
Table 5-4 shows the numerical limits for XPS and DB2. In most cases, these limits are identical or differ only slightly because of implementation differences. In many cases, the XPS limit values are 1 less than the corresponding DB2 limits, because XPS internally uses the extreme largest/smallest values to track NULL conditions. This enables XPS to record NULL values without the cost of the extra octet needed when a DB2 column is nullable.

Table 5-4 Database numerical limits

Data type                                XPS limit                            DB2 limit
Smallest INTEGER                         -2 147 483 647                       -2 147 483 648
Largest INTEGER                          +2 147 483 647                       +2 147 483 647
Smallest BIGINT                          -9 223 372 036 854 775 807           -9 223 372 036 854 775 808
Largest BIGINT                           +9 223 372 036 854 775 807           +9 223 372 036 854 775 807
Smallest SMALLINT                        -32 767                              -32 768
Largest SMALLINT                         +32 767                              +32 767
Largest decimal precision (p)            32                                   31
Largest or smallest DOUBLE value         17 significant digits, floating      +/-1.79769E+308
                                         point, based on hardware limits
Smallest positive or negative DOUBLE     17 significant digits, floating      +/-2.225E-307
value                                    point, based on the hardware C
                                         double data type
Largest or smallest REAL value           Approximately nine significant       +/-3.402E+38
                                         digits, floating point, based on
                                         the hardware C float data type
Smallest positive or negative REAL       Approximately nine significant       +/-1.175E-37
value                                    digits, floating point, based on
                                         the hardware C float data type

5.7 DECIMAL

The XPS DECIMAL data type has a maximum precision of 32. The DB2 DECIMAL data type has a maximum precision of 31. XPS and DB2 have some significant differences regarding default scale and rounding that you should note. In XPS, if a column is declared as a DECIMAL with no precision and scale, it defaults to a precision of 16, while DB2 defaults to a precision of 5. When moving to DB2, you should take care to specify a DECIMAL precision. Doing so guarantees that your existing data fits.

It is a good idea to specify the precision of your DECIMAL data types. In XPS, a DECIMAL data type without a scale, such as DEC(9), is represented internally as a FLOAT. In DB2, a DECIMAL data type specified with a precision but without a scale defaults to a scale of 0 (DEC(9) means DEC(9,0)), which is not a floating point number. If your application relies on the XPS floating point behavior, make sure to convert to a floating point type or specify an explicit scale for the DECIMAL column.

Tip: For DECIMAL types, XPS rounds, while DB2 truncates.

In XPS, when a value of 123.456 is inserted into a DEC(5,2) column, XPS rounds at the second decimal place and stores 123.46. In DB2, when a value of 123.456 is inserted into a DEC(5,2) column, DB2 truncates after the second decimal place and stores 123.45. Obviously, this difference can have significant implications for monetary or financial applications.
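You can observe the DB2 behavior directly from the CLP; a minimal sketch:

VALUES CAST(123.456 AS DECIMAL(5,2))
-- returns 123.45: DB2 drops the trailing digit instead of rounding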

5.7.1 MONEY data type
Internally, XPS supports the MONEY data type with a DECIMAL data type. However, a column defined as MONEY(n) (with no scale) maps to a default scale of 2, in effect a DEC(16,2). As discussed in 5.7, "DECIMAL" on page 149, in DB2 a DEC(n) (with no scale) defaults to a scale of 0, not a floating point number. For this reason, users with columns defined as MONEY(n) (with no scale) should take care to add a scale of 2 when changing this to a decimal definition, that is, DEC(n,2).

The prominent feature of the XPS MONEY data type is that it provides automatic handling of currency symbols and formatting and is integrated with locales. If this full functionality is required in DB2, you need to create a user-defined type named MONEY and provide the associated user-defined functions to generate appropriate formatting.

5.7.2 SERIAL and SERIAL8
The SERIAL data type can map to DB2 by using a numeric data type with the IDENTITY column attribute. IDENTITY can be used with SMALLINT, INTEGER, BIGINT, and DECIMAL. When you define a column as IDENTITY, unique, sequential numeric values are generated for every row that is inserted.

The XPS SERIAL data type has several specific behaviors, which can be duplicated in DB2:
 In XPS, if you insert a row into a table with a SERIAL column but do not address the SERIAL column in the column and value lists, XPS still assigns a SERIAL value for you. Consider this table:
(XPS) CREATE TABLE t (id SERIAL, c1 CHAR(20))
In the following statement, XPS generates a SERIAL value for column id, even though id is not mentioned in the column and value lists:
(XPS) INSERT INTO t (c1) VALUES ("John Doe")
In DB2, use the GENERATED BY DEFAULT syntax along with the IDENTITY definition. DB2 generates a value for the column unless a value is explicitly provided, either by INSERT or by LOAD:
(DB2) CREATE TABLE t (id INTEGER GENERATED BY DEFAULT AS IDENTITY, c1 CHAR(20))
In the following statement, as in XPS, the system generates an IDENTITY value for column id, even though id is not mentioned in the column and value lists:
(DB2) INSERT INTO t (c1) VALUES ('John Doe')
 In XPS, if you insert a row and supply a value of 0 for the SERIAL column as a placeholder, XPS still generates the next sequential number:
(XPS) INSERT INTO t VALUES (0, "Jane Doe")

In DB2, use the term DEFAULT instead, as follows:
(DB2) INSERT INTO t VALUES (DEFAULT, 'Jane Doe')

 As with XPS, you can specify the starting value for the IDENTITY column, as follows:
(DB2) CREATE TABLE t (id INTEGER GENERATED BY DEFAULT AS IDENTITY (START WITH 1), c1 CHAR(20))
 In most cases, XPS users have UNIQUE indexes defined on SERIAL columns, although uniqueness is not required. This is also the case with DB2.
 XPS uses an internal counter to track the next SERIAL number to assign. An algorithm determines what the next serial value will be (and it is dependent on table fragmentation). If an inserted row explicitly supplies a value that skips ahead, the next generated value continues from the highest assigned value, which can leave gaps in the serial values.
(XPS) CREATE TABLE t (id SERIAL, c1 CHAR(20))
INSERT INTO t VALUES (0,"Alex")    -- id column will be assigned 1
INSERT INTO t VALUES (5,"Geddy")   -- id column will be assigned 5
INSERT INTO t VALUES (0,"Niel")    -- id column will be assigned 6
In DB2, you have more control over a similar internal counter, but the counter must be set manually with the RESTART command, as follows:
(DB2) CREATE TABLE t (id INTEGER GENERATED BY DEFAULT AS IDENTITY, c1 CHAR(20))
INSERT INTO t VALUES (DEFAULT,'Alex')   -- id column will be assigned 1
INSERT INTO t VALUES (5,'Geddy')        -- id column will be assigned 5
INSERT INTO t VALUES (DEFAULT,'Niel')   -- id column will be assigned 2
ALTER TABLE t ALTER id RESTART WITH 6   -- internal counter = 6
INSERT INTO t VALUES (DEFAULT,'Tom')    -- id column will be assigned 6
 In XPS programming, the SQLCA record (the SQL Communications Area) contains information from XPS about the last SQL statement executed by a program. After an INSERT, to programmatically obtain the value assigned to a SERIAL column, the program simply looks at the SQLCA record at some point after executing the table INSERT in question but before executing another SQL statement. As long as the program checks the SQLCA record prior to executing further SQL, the SQLCA still contains the assigned value.
(XPS) $INSERT INTO experts (lname) VALUES ("Scranton");
... /* other program logic here, no SQL though */ ...
x=sqlca.sqlerrd[1];
In DB2, use the IDENTITY_VAL_LOCAL built-in function. Its usage is more analogous to using the XPS dbinfo function. The IDENTITY_VAL_LOCAL function returns the most recently assigned value for an identity column, where the assignment occurred as a result of a single-row INSERT statement using a VALUES clause. Note that the IDENTITY_VAL_LOCAL results can be affected by other events, especially triggers. If you are transitioning from XPS and use the SQLCA, we highly recommend that you consult the DB2 manuals regarding the exact behavior of IDENTITY_VAL_LOCAL.
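As a sketch of its use (assuming the experts table has been recreated in DB2 with an IDENTITY column; the table name comes from the XPS example above):

INSERT INTO experts (lname) VALUES ('Scranton');
VALUES IDENTITY_VAL_LOCAL();
-- returns the identity value just assigned by the INSERT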

5.8 Date and time types

DATE is a basic type in both XPS and DB2. The two RDBMSs, however, have different implementations concerning the storage of time and date plus time. If you are new to DB2 and wish to understand how to manipulate dates and times, see the following article: http://www7b.software.ibm.com/dmdd/library/techarticle/0211yip/0211yip3.html

5.8.1 DATE data type
DB2 supports several formats for DATEs: ISO, USA, EUR, and JIS. Dates can be input in any of the four formats. The default format is the USA format, but the actual default is determined by the date format defined at installation time.

Tip: DB2 supports entry of dates in four digit year format only. There is no default century functionality similar to what is provided by XPS DBCENTURY.

5.8.2 DATETIME, TIME, and TIMESTAMP data types
For storing date plus time values or just time values, XPS uses one data type, DATETIME. The precision of what is stored is defined by qualifiers. Example 5-1 shows various XPS DATETIME definitions with qualifiers.

Example 5-1 XPS DATETIME definitions
DATETIME HOUR TO SECOND(5)
DATETIME YEAR TO DAY
DATETIME YEAR TO SECOND(2)

In DB2, there is no single counterpart to the XPS DATETIME data type. While XPS uses one configurable type, DB2 uses two predefined types, TIME and TIMESTAMP. These types have fixed units, and you cannot reduce their precision.

If you are using an XPS DATETIME YEAR TO DAY, the DB2 DATE format might be the most analogous data type to use. You might consider moving to a DATE rather than a TIMESTAMP.

If you use the DB2 load utility to load XPS timestamps, you can set the dateformat or timestampformat file type modifier to specify the format of the incoming date or datetime values. By doing this, DB2 can translate the input values into formats that it can load.
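For example (a sketch; the file name, table name, and format string are illustrative):

LOAD FROM orders.unl OF DEL MODIFIED BY timestampformat="yyyy-mm-dd hh:mm:ss" INSERT INTO orders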

If you are using XPS DATETIME with qualifiers that are a subset of a full timestamp (for example, MONTH TO HOUR), you have several options, all of which might have an impact on application handling and disk storage requirements. Your options are:
 Expand to a full TIMESTAMP. You might decide to expand the time you track to the full timestamp, including the entire date and the entire time. In the MONTH TO HOUR example, this would mean tracking years, minutes, and seconds in addition to what is already there. This would have an impact on loading your existing XPS data into DB2 because you would have to provide additional information prior to insertion. You would have to generate or assign year information for existing data. Minutes and seconds could be assigned 0.
 Use a character string. If you are storing a subset of a full timestamp and do not want to expand to a full timestamp, you could choose to store the information in a character string. This might be acceptable if you do not manipulate dates and do not require use of date-based built-in functions.
 Create a user-defined type. This could be a user-defined type that is derived from the base TIMESTAMP data type but that provides for input and output of a TIMESTAMP subset.

In DB2, when inserting data into a TIMESTAMP column, specification of all six microsecond digits is not required. At minimum, you must specify YYYY-MM-DD HH.MM.SS. When retrieving dates, the format is the default format specified at installation time. This format can be overridden temporarily for specific SQL statements in several ways:
 Using the CHAR function. The following SELECT statement shows how a default ISO date format can be converted to the USA format:
SELECT empno, CHAR(hiredate, USA) FROM emp
 Using DB2 built-in date/time arithmetic and casting. The following SELECT statement shows the use of the DAYS built-in function to convert a date into days:
SELECT * FROM orders WHERE DAYS(ship_date) - DAYS(order_date) > 5

5.8.3 INTERVAL data type

XPS offers an INTERVAL data type that represents a span of time. The INTERVAL data type supports input and output of intervals as character strings. INTERVALs, however, are internally stored as decimal numbers. Therefore, they are best mapped to a DB2 DECIMAL type. DB2 does not support a specific INTERVAL data type. Example 5-2 depicts an XPS INTERVAL data type.

Example 5-2 XPS INTERVAL
INTERVAL(123456 13:07:56) DAY(6) TO SECOND

This value is actually stored in the XPS database as the decimal number 123456130756, where the digits are the various components of the interval.

You might wish to examine how the MTK translates XPS INTERVAL values. When using the MTK to transition INTERVAL to DECIMAL, the MTK normalizes the internal representation to a total-number-of-seconds value. Thus, the DB2 value in seconds for the above example is:
123456*86400 + 13*3600 + 7*60 + 56 = 10666645676

Example 5-3 depicts how this type of translation would affect an INTERVAL default.

Example 5-3 XPS and DB2 INTERVAL with DEFAULT
(XPS) CREATE TABLE t(i INTERVAL DAY(6) TO SECOND DEFAULT INTERVAL (2 0:0:56) DAY(6) TO SECOND NOT NULL)

(DB2) CREATE TABLE t(i DECIMAL(20,5) DEFAULT 172856 NOT NULL)

The XPS INTERVAL data type provides not only storage for intervals but also special character string input and output support for intervals, which might impact your application.

DB2 does provide SQL syntax to support interval manipulation of dates and times, as shown in the following SELECT statement:
SELECT col1 FROM tab1 WHERE CURRENT DATE < (date1 + 10 YEARS)

5.9 FLOAT

Both XPS and DB2 support a FLOAT data type; however, there are differences.

The XPS FLOAT data type stores double-precision floating point numbers with up to 16 significant digits. When declared as FLOAT(n), the value of n must be a whole number between 1 and 14 to specify the precision.

The DB2 FLOAT data type is either single-precision or double-precision, depending upon the integer value specified when defining the column. The value of the integer is 1 to 53, with 1 through 24 specifying single-precision and 25 through 53 indicating double-precision. DB2 also accepts REAL for single-precision and both DOUBLE and DOUBLE PRECISION to define double-precision.

5.10 REAL or SMALLFLOAT

XPS implements REAL as a synonym for SMALLFLOAT. SMALLFLOAT stores single-precision floating point numbers with eight significant digits. This data type can be mapped to the DB2 data type FLOAT(8).

5.11 LOB data types

XPS uses TEXT, BYTE, BLOB, and CLOB data types for large character and binary data. DB2 has comparable data types, CLOB(n) and BLOB(n). Unlike XPS, DB2 requires that the maximum length for the CLOB or BLOB value be specified when the column is defined, and n must be specified. The maximum length of the DB2 CLOB and BLOB data types is 2 GB.

5.12 Sequence objects

Sequence objects are available in DB2 and IDS but are not available in XPS. Unlike SERIAL data types which generate a sequence of numbers for a table, a SEQUENCE is independent of a table and rows in a table. A knowledge of sequence objects is necessary to understand how to implement an equivalent of the XPS SERIAL and SERIAL8 in the target database.

Informix IDS has had sequence objects since IDS V9.40. Both IDS and DB2 support SEQUENCEs, although there are some differences in the implementations, as described in the following list:

 The syntax of the CREATE SEQUENCE statement is nearly identical in IDS and DB2. The only exception is that IDS specifies the attributes NOCYCLE, NOORDER, and NOCACHE as single keywords, while the same attributes in DB2 are two words separated by a space: NO CYCLE, NO ORDER, and NO CACHE.
 The syntax to drop a sequence in DB2 is DROP SEQUENCE RESTRICT. The RESTRICT keyword is not used in IDS.
 In both IDS and DB2, a reference to the NEXTVAL of a sequence increments the sequence and returns that new value. Multiple references to NEXTVAL in the same statement all return the same value.
 IDS supports CURRVAL and NEXTVAL, while DB2 supports PREVVAL and NEXTVAL. IDS uses CURRVAL as a way of obtaining the most recently generated sequence value without causing a new value to be generated. In DB2, PREVVAL returns the sequence value from the previous successful NEXTVAL reference.
 References to NEXTVAL and CURRVAL in IDS have the form "sequence_name.NEXTVAL" (or CURRVAL). The DB2 equivalent is "NEXTVAL FOR sequence_name" (or PREVVAL FOR sequence_name).
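As a quick sketch of the last two points, using the dept_seq sequence that Example 5-4 creates (the IDS statement uses the common systables idiom to return a single row):

(IDS) SELECT dept_seq.CURRVAL FROM systables WHERE tabid = 1;
(DB2) VALUES PREVVAL FOR dept_seq;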

Example 5-4 shows the use of sequence in both IDS and DB2.

Example 5-4 Sequence (IDS and DB2)
create table dept (
  deptno smallint not null,
  deptname varchar(36) not null,
  mgrno char(6),
  admrdept smallint not null,
  location char(30));
create sequence dept_seq start with 500 increment by 1 cache 20;

(IDS)
insert into dept values (dept_seq.nextval,'SALES','Eddie',50,'Downtown');
insert into dept values (dept_seq.nextval,'MARKETING','Alex',100,'Midtown');
insert into dept values (dept_seq.nextval,'ACCOUNTING','Sammy',150,'Uptown');

(DB2)
insert into dept values (nextval for dept_seq,'SALES','Eddie',50,'Downtown');
insert into dept values (nextval for dept_seq,'MARKETING','Alex',100,'Midtown');
insert into dept values (nextval for dept_seq,'ACCOUNTING','Sammy',150,'Uptown');

(IDS and DB2)
select * from dept order by deptno;

deptno    500
deptname  SALES
mgrno     Eddie
admrdept  50
location  Downtown

deptno    501
deptname  MARKETING
mgrno     Alex
admrdept  100
location  Midtown

deptno    502
deptname  ACCOUNTING
mgrno     Sammy
admrdept  150
location  Uptown

5.13 Other object limits in DB2

Table 5-5 provides a chart of miscellaneous limits on object names and content in DB2 as of DB2 UDB V8.2.

Table 5-5 DB2 limits

Limit                                                                 Value
Longest SQL statement length                                          2 MB
Longest authorization name (can be only single-byte characters)      30
Longest constraint name                                               18
Longest correlation name                                              128
Longest condition name                                                64
Longest cursor name                                                   18
Longest external program name                                         8
Longest host identifier                                               255
Longest schema name                                                   30
Longest server (database alias) name                                  8
Longest statement name                                                18
Longest unqualified column name                                       30
Longest unqualified package name                                      8
Longest unqualified table name, view name, stored procedure name,    128
nickname, or alias
Longest unqualified user-defined type, user-defined function,         18
buffer pool, table space, nodegroup, trigger, or index name

Note: The longest SQL statement length was increased from 65,535 bytes (DB2 UDB V8.1 and earlier) to 2 MB with DB2 UDB V8.2. The additional length was necessary to support creation of stored procedures in SQL and other enhanced programming features introduced with DB2 UDB V8.2.

Some of these limits on object names might be revised higher in later releases of DB2 UDB because the tendency is to increase the maximum length of object names to 128 Unicode character positions.

5.14 DB2 manuals

The primary manuals that are needed for understanding DB2 data types and DB2 SQL are:
 SQL Reference, Volumes 1 and 2, SC09-2974 and SC09-2975
 Data Movement Utilities Guide and Reference, SC09-2955



Chapter 6. Data partitioning and access methods

This chapter describes the different types of fragmentation schemes in XPS and how they can be mapped to DB2 partitioning. It also discusses the various types of data access and join methods in XPS and how they can be mapped to the various capabilities in DB2.

6.1 Benefits of data partitioning

Partitioning is an intelligent way of distributing (partitioning or fragmenting) the data across multiple disks on a single node or multiple nodes. Distributing data across multiple disks enables multiple coservers (or processors) to perform disk I/O operations in parallel, thereby accessing multiple table partitions or fragments concurrently. It also enables the SQL operations to be segmented into subtasks and executed on multiple nodes in parallel. In fact this parallelism (optimization) can be exploited within a physical SMP node by having multiple XPS coservers (or multiple DB2 partitions) on one physical node. The fact that data is split across the partitions is transparent to the users issuing SQL statements.

Splitting the data across multiple disks or nodes is known as fragmentation in XPS and database partitioning in DB2. We use the term fragmentation when referring to XPS and partitioning when referring to DB2. Throughout the chapter, when we mention the name DB2 or DB2 UDB, we are referring to DB2 UDB Enterprise Server Edition (ESE) with the Database Partitioning Feature (DPF).

Both DB2 and XPS are based on a shared-nothing architecture, meaning data or memory is not shared between database partitions (coservers). One of the benefits of this architecture, in addition to improved parallelism, is linear scalability.

Similar to XPS, DB2 supports all the partitioned hardware environments. For example:
 Single partition on a single processor (uniprocessor)
 Single partition with multiple processors (SMP)
 Multiple partitions with one processor (MPP)
 Multiple partitions with multiple processors (cluster of SMPs)

Table 6-1 summarizes the types of parallelism best suited to take advantage of the various hardware environments.

Table 6-1 Types of parallelism

Hardware environment                     I/O            Intra-partition   Inter-partition
                                         parallelism    parallelism       parallelism
Single partition, single processor       Yes            No (1)            No
Single partition, multiple               Yes            Yes               No
processors (SMP)
Multiple partitions, one                 Yes            No (1)            Yes
processor (MPP)
Multiple partitions, multiple            Yes            Yes               Yes
processors (cluster of SMPs)
Logical database partitions              Yes            Yes               Yes

(1) There can be an advantage to setting the degree of parallelism (using one of the configuration parameters) to some value greater than one, even on a single processor system, especially if the queries you execute are not fully using the processor (for example, if they are I/O bound).

The subsequent sections briefly discuss the fragmentation strategies, such as hash, round robin, expression and hybrid, that are available in XPS and how these strategies map into the DB2 partitioning schemes.

6.2 Hash fragmentation

This type of fragmentation uses an internal, system-defined hashing function that distributes rows with the object of keeping the same number of rows in each partition. Hash partitioning is supported in DB2 UDB with DPF. The hash fragmentation in XPS maps nicely into the database partitioning functionality of DB2.

Let us take an example from Chapter 13, "Large data volumes: A case study" on page 373. In Example 6-1 on page 162, in XPS, table partsupp is fragmented by hash using column ps_partkey in the dbslice s_c_p_ps, where the dbslice is defined in cogroup_all. For the four-coserver XPS environment that is used in the case study, it creates 32 dbspaces across the four coservers, with eight dbspaces on each coserver. It creates:
 s_c_p_ps.1, s_c_p_ps.5, .., s_c_p_ps.29 on coserver 1.
 s_c_p_ps.2, s_c_p_ps.6, .., s_c_p_ps.30 on coserver 2.
 And so on.

The DB2 equivalent syntax creates the table with the same column as a hash partitioning key in the partition group ibmdefaultgroup, spanning all partitions. Just as XPS substitutes %c with the coserver number, DB2 substitutes $N with the partition number. For the eight-partition setup, DB2 contains a total of 32 containers, with four containers on each partition.

Example 6-1 Hash partitioning
(XPS)
create dbslice s_c_p_ps from
cogroup cogroup_all chunk "/wrk4/tpch/disks/xps1.%c" offset 2500000 size 1000000,
cogroup cogroup_all chunk "/wrk4/tpch/disks/xps2.%c" offset 2500000 size 1000000,
cogroup cogroup_all chunk "/wrk4/tpch/disks/xps3.%c" offset 2500000 size 1000000,
cogroup cogroup_all chunk "/wrk4/tpch/disks/xps4.%c" offset 2500000 size 1000000,
cogroup cogroup_all chunk "/wrk4/tpch/disks/xps5.%c" offset 2500000 size 1000000,
cogroup cogroup_all chunk "/wrk4/tpch/disks/xps6.%c" offset 2500000 size 1000000,
cogroup cogroup_all chunk "/wrk4/tpch/disks/xps7.%c" offset 2500000 size 1000000,
cogroup cogroup_all chunk "/wrk4/tpch/disks/xps8.%c" offset 2500000 size 1000000;

create raw table partsupp (
  ps_partkey integer not null,
  ps_suppkey integer not null,
  ps_availqty integer,
  ps_supplycost decimal(12,2),
  ps_comment varchar(199)
)
fragment by hash(ps_partkey) in s_c_p_ps
extent size 512 next size 128 lock mode table;
------
(DB2)
create tablespace space_dataindex
in database partition group ibmdefaultgroup
pagesize 16K
managed by database using (
  device '/wrk2/tablespaces/db2a1.$N' 1250000,
  device '/wrk2/tablespaces/db2a2.$N' 1250000,
  device '/wrk2/tablespaces/db2a3.$N' 1250000,
  device '/wrk2/tablespaces/db2a4.$N' 1250000
)
bufferpool bp16k extentsize 32 prefetchsize 128;

create table partsupp (
  ps_partkey integer not null,
  ps_suppkey integer not null,
  ps_availqty integer,
  ps_supplycost decimal(12,2),
  ps_comment varchar(199)
)
partitioning key (ps_partkey) using hashing
in space_dataindex
not logged initially;

Note:
 The extent size is an attribute of the table in XPS but is an attribute of the table space in DB2. As a result, you can create tables with different extent sizes in a dbslice in XPS, whereas all the tables created in a table space have the same extent size in DB2.
 On the other hand, because pagesize is an attribute of the table space in DB2, you can create tables with different page sizes in a database instance, while in XPS you can use only one page size for an instance (the PAGESIZE onconfig parameter).
 When creating a dbslice, the size of the chunk is specified in units of kilobytes. When creating a table space, the size of the container is specified in number of pages.

Some of the considerations regarding hash partitioning are as follows:
 The criteria for choosing the columns for hash partitioning remain the same. You typically choose columns that contain unique values as your hashing column to obtain even data distribution across partitions. Lots of duplicates in the hashing column leads to data skew and negatively impacts performance.
 Choose the join columns as your hash partitioning key for tables that are joined most often to take advantage of co-located joins. Co-located joins generate less traffic between partitions because the joins can be done locally.
 Queries containing an equality expression in the predicates benefit from hash partitioning because they eliminate partitions based on the hashed column value.
 Hash partitioning does not help queries containing a range in the predicates. Such queries are optimized for tables that are fragmented by expression or hybrid. As we discuss later, this can be handled in DB2 using Multidimensional Clustering (MDC) or UNION ALL views.

6.3 Round robin fragmentation

In XPS, round robin fragmentation places rows one after another in fragments, rotating through the series of fragments to distribute the rows evenly. For INSERT statements, XPS uses a hash function on a random number to determine the fragment in which to place the row. For INSERT cursors, the database server places the first row in a random fragment, and the second and subsequent rows are assigned to fragments in sequence. If one of the fragments is full, that fragment is skipped. Fragment elimination cannot be done for tables fragmented by round robin.

In a non-partitioned database, DB2 distributes data (in units of extents) in round robin fashion across multiple containers in a table space automatically. For a partitioned database, you can implement round robin by creating a generated column of data type identity (similar to serial data type in XPS) and hash partitioning on this generated column. DB2 guarantees unique values for the identity data type when you use the generated always clause, as shown in Example 6-2.

Example 6-2 Round robin partitioning
(XPS)
create table round_robin (col1 integer, col2 char(10))
fragment by round robin in rootdbs;

(DB2)
create table round_robin (
  col1 integer,
  col2 char(10),
  hash_column bigint not null generated always as identity (start with 1, increment by 1))
partitioning key (hash_column) using hashing
in userspace1;

Note: For a single coserver environment in XPS and a single partition environment in DB2:
 A dbslice can contain multiple dbspaces within a coserver. A table space can contain multiple containers within a partition.
 In a round robin schema, XPS uses round robin across the dbspaces, but multiple chunks within a dbspace are used in serial fashion (not round robin). DB2 uses the containers in a round robin fashion. In this scenario, a dbspace equates to a container.
 XPS performs round robin data distribution in units of rows, while DB2 distributes in units of extents.

The round robin strategy is typically chosen when:
 You do not know the data distribution at the time of creating the table.
 You have chosen a column for hash partitioning that causes data skew.

6.4 Expression and range fragmentation

Expression-based fragmentation puts rows into fragments based on a fragmentation expression that you specify. This expression defines criteria, or rules, for assigning a set of rows to each fragment. Range fragmentation is a convenient alternative to fragmenting by expression. In a range-fragmented table, the database server implicitly clusters rows within the fragments, based on the range of the values in the fragmentation column. You do not need to specify explicitly the range values for each fragment.

In DB2, you can use the MDC functionality on a partitioned database to implement range partitioning. (The subsequent sections in this chapter discuss the MDC concepts.) The columns that are used in the expression or range fragmentation can be used as the dimension keys for MDC. Because a partitioning key is required in DB2, you have to identify a new column for hashing. If there are not any columns that fit the criteria (unique values), you can create a generated identity column and use that as the hash column, as discussed in 6.3, “Round robin fragmentation” on page 164.
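A hedged sketch of this mapping (the table, columns, and dbspace names are hypothetical; the generated o_ym column follows the INTEGER(date)/100 pattern used in the case study tables):

(XPS)
create table orders_hist (o_key integer, o_date date)
fragment by expression
  o_date < '01/01/2002' in dbsp1,
  remainder in dbsp2;

(DB2)
create table orders_hist (
  o_key integer not null,
  o_date date,
  o_ym integer generated always as (integer(o_date)/100))
partitioning key (o_key) using hashing
organize by (o_ym);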

6.5 Hybrid fragmentation

This type of fragmentation combines two types of distribution schemes to put rows into fragments in different dbslices and dbspaces. You specify an expression-based distribution scheme to choose a dbslice and a system-defined hash distribution scheme to fragment the table across dbspaces within that dbslice.

6.6 Range partitioning using MDC

Prior to DB2 UDB V8.1, clustering was implemented via a clustering index. However, clustering was supported in only a single dimension. Clustering indexes, like all indexes, were record-based and contained one entry for each record, which can result in large indexes.

Multidimensional Clustering is a feature introduced in DB2 UDB V8.1. It enables a table to be clustered physically on more than one key (or dimension). MDC ensures that rows are organized on disk in blocks, such that all rows within any block have a particular combination of dimension values. In fact, the block equates to an extent; thus, each extent has a particular combination of dimension values. Rows with a particular set of dimension values can fill up an extent, in which case multiple extents can be assigned to that set of dimension values.

MDC automatically creates a composite block index on the dimension columns and a dimension block index on each dimension. The composite block index contains one entry for each combination of the dimension values, and the dimension block index has one entry for each dimension value. Assuming the dimension columns contain lots of duplicates, a block index is much smaller than a record identifier (RID) index. The leaf pages of a block index contain pointers to extents.

Example 6-3 illustrates an MDC table. The table mdc_table is defined with the MDC clause ORGANIZE BY on three dimension columns: o_year, country, and color. You can create an MDC dimension on generated columns (in this case, o_year).

Example 6-3 MDC index
create table mdc_table (
  o_key int,
  country varchar(25),
  o_date date,
  color varchar(10),
  o_year int generated always as (year(o_date)),
  ..)
partitioning key (o_key) using hashing
ORGANIZE BY (o_year, country, color)

Suppose you query this table with the predicate <1992,Canada,yellow>. DB2 searches the composite block index for this combination key. The leaf of the key points to a list of extents that fall within the query criteria.

Figure 6-1 depicts a view of this example to explain the concepts and terminology of MDC.

Figure 6-1 MDC terminology (an MDC table organized along year, country, and color dimensions, showing slices, cells, and blocks)

Here are some brief definitions:
Dimension  An axis along which data is organized physically in the table. In Example 6-3 on page 166, the three dimension columns are o_year, country, and color.
Slice      A column or a row in the grid. It is the portion of the table that contains all rows having a specific value for one of the dimensions, for example, 2001.
Cell       An element that contains a unique combination of dimension values and that is organized physically as blocks of pages, where a block is a set of consecutive pages on disk, for example, the cell 2002, Mexico, blue.
Block      The smallest allocation unit of the MDC table. It is equivalent to an extent. The extent size was specified when creating the table space. For example, blocks 31, 45, and 127 identify the cell having dimension values 2002, Mexico, yellow. These blocks are numbered according to the logical order of allocated extents in the table.

DB2 creates two default indexes and a block map automatically after you create an MDC table. The following sections describe those two indexes.

Composite block index
For the table in Example 6-3 on page 166, a block index is created on the composite columns (o_year, country, and color). The structure of a block index is almost identical to a regular index. The major difference is that the leaf pages of a regular index are made up of pointers to rows, while the leaf pages of a block index contain pointers to extents. Because each entry of a block index points to an extent, while the entry in a RID index points to a row, a block index is much smaller than a RID index. In determining access paths for queries, the optimizer can use block indexes in the same way that it uses RID indexes. For example, the block indexes can be ANDed and ORed with other block indexes. They can also be ANDed and ORed with RID indexes. This operation is done at the block level and thus is much faster.

Figure 6-2 depicts a composite block index. This index is used for query processing as well as to quickly locate the candidate blocks for inserts.

Note: Because the block index contains pointers to extents, not rows, a block index cannot enforce uniqueness of rows. When uniqueness is a requirement, then a RID index on the column would be necessary.

Figure 6-2 Composite block index (keys such as 1997, Canada, Blue each point to a list of block IDs, for example (4,0), (84,0), (444,0))

Dimension block index
In Example 6-3 on page 166, a dimension block index is created on each of the three dimensions: o_year, country, and color. You can also specify a set of columns as a multi-column dimension by using parentheses around the required columns, as shown in Example 6-4. The example creates two dimension block indexes, (o_year, country) and color, in addition to the composite block index on (o_year, country, color).

Example 6-4 Multi-column dimension block index
create table mdc_table (
  o_key int,
  country varchar(25),
  o_date date,
  color varchar(10),
  o_year int generated always as (year(o_date)),
  ..)
partitioning key (o_key) using hashing
ORGANIZE BY ((o_year, country), color)

Figure 6-3 illustrates the index key for the key value Canada. The index key has a list of block IDs, where each block ID consists of the first pool-relative page of the block and a dummy slot (0). Compare this to a RID in a RID index, which is made up of the page number and slot number of a row in the table.

Figure 6-3 Dimension block index (the key for Canada has a list of BIDs: (4,0), (12,0), (48,0), (52,0), (76,0), (100,0), (216,0), (292,0), (304,0))

A BID (block ID) is <first pool-relative page of block, 0>. For example, if the extent size is 4 pages, block ID (4,0) represents pages 4, 5, 6, and 7.

Block map
In addition to the two indexes, MDC tables maintain a block map that contains a bitmap indicating the availability status of each block. This block map is stored as a separate object on disk. The bitmap stores status information such as block in use, block recently loaded, and block free.

Because each block has an entry in the block map, it grows as the table grows.

6.6.1 Benefits of MDC

Some of the benefits of using MDC are:

 The syntax is flexible and dynamic. For example, you do not need to create a new range for a new set of data, and you do not have to make sure that the new data obeys the constraints of a new range.
 Data having particular dimension values is guaranteed to be found in a set of blocks that contain only, and all, records having those values.
 Blocks are consecutive pages on disk, so access to the rows within a block is sequential, requiring minimal I/O.
 Clustering is maintained automatically over time.
 When existing blocks in a cell are full, DB2 reuses, or allocates, a block and adds it to the set of blocks for that cell.
 When a block is emptied of data, the block identifier (BID) is removed from the block indexes and can be reused for another cell when needed, thus minimizing space requirements.
 In terms of query performance, range queries involving any combination of the specified dimensions of the table benefit from clustering. Not only do these queries access only those pages that have records with the correct dimension values, those qualifying pages are grouped by extents. For example, a query that contains predicates on country and color uses the dimension block indexes that correspond to country and color to get the list of qualifying extents, thus optimizing performance.
 Because the BID indexes can be ANDed or ORed with RID indexes, a filter condition involving AND and OR of an MDC column with any other single clustered index is optimized.
 Although a table with a clustering index can become unclustered over time as space fills up in the table, an MDC table is able to maintain clustering over all dimensions automatically and continuously, thus eliminating the need to reorganize the table to restore the physical order of the data.
 The DB2 query compiler automatically derives predicates to favor MDC dimension columns. For example, consider the tables from Chapter 13, "Large data volumes: A case study" on page 373. The tables orders and lineitem contain MDC on the generated columns o_orderym = INTEGER(o_orderdate)/100 and l_shipym = INTEGER(l_shipdate)/100, respectively. The TPC-H query Q3 contains the predicates o_orderdate < date('1995-03-15') and l_shipdate > date('1995-03-15'). The optimizer derives the predicates o_orderym < 199503 and l_shipym > 199503.

6.6.2 Design considerations for MDC tables

To get the maximum benefit from MDC, you should carefully choose the set of dimension columns and the block size (table space extent size) for clustering a table. If they are chosen poorly, performance and space utilization can be less than optimal. Identifying the appropriate dimension columns is a crucial aspect of MDC. The ideal MDC columns are as follows:
 Columns frequently queried upon and containing lots of duplicates. Unique column values produce sparse blocks and cause excessive disk usage.
 Columns used for range, equality, and IN predicates, for example:
shipdate > '2002-05-14', shipdate = '2002-05-14', year(shipdate) in (1999, 2001, 2002)
 Columns used for roll-in or roll-out of data, for example:
delete from table where year(shipdate) = 1999
 Columns referenced in a GROUP BY or ORDER BY clause.
 A column with N duplicates in a 1:N join condition can be a good candidate.

Columns that are frequently updated are not good candidates. If there are few duplicates in a dimension column, consider a coarser granularity to optimize disk space. For example, if the dimension column is a date and each date has only 10 rows, a lot of disk space might be wasted. A better choice would be YearMonth as a dimension, as each YearMonth value then covers 30 × 10 = 300 rows.
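A minimal sketch of such a coarser dimension (the table and column names are hypothetical; the generated column uses the monotonic INTEGER(date)/100 expression, so year-month range scans remain possible, as the note below explains):

create table sales (
  s_key integer not null,
  s_date date,
  s_amount decimal(12,2),
  s_ym integer generated always as (integer(s_date)/100))
partitioning key (s_key) using hashing
organize by (s_ym)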

Note: Range scans on generated columns can be done only when the expression used to generate the column is monotonic. Monotonic means:
if (A > B) then expr(A) >= expr(B), and
if (A < B) then expr(A) <= expr(B)

As A increases in value, the expression based upon A also increases or remains constant.

Examples of monotonic operations include: A + B, A * B, integer(A).

Examples of non-monotonic operations are: A - B, month(A), day(A). The expression month(A) is non-monotonic because as A increases, the value of the expression fluctuates. month(20010531) equals 05; month(20021031) equals 10; but month(20020115) equals 01. So, as the date value increases, the value of the month fluctuates.

6.6.3 Operations on MDC tables

Clustering or indexing techniques imply additional overhead for maintenance. However, inserts, updates, deletes, and loads on MDC tables have been optimized for efficiency using new techniques.

Load
An algorithm creates bins for the data records in memory based on the dimension values. All records with a particular combination of dimension values are sent to the same bin. As the bins fill up, they are written to disk.

Insert
During insert of a record, DB2 searches the composite block index using the key value of the record that is being inserted. If it finds such a key, it inserts the new row into a block (corresponding to this key) that contains free space. The block map is used to find the blocks containing space. If it does not find any free blocks, it allocates a new block and adds it to the block map. The block ID is added to the leaf pages of the composite block index and each of the dimension block indexes. The row is then inserted into this new block.

If no such key exists in the block indexes, this new key is added to the block indexes, the row is inserted in a newly allocated block, and the block ID is added to the block indexes.

Delete
Bulk delete on an MDC table using a predicate on the dimension column outperforms the equivalent regular indexed delete. When all the pages in a block are empty, the block is marked free for reuse by future inserts or updates, and the corresponding block ID is removed from the dimension indexes.

Update
If the update results in the creation of an overflow, the overflow record is created in a block corresponding to the same cell. An update of the MDC columns is similar to a delete from the old cell and an insert into the new cell. For this reason, frequently updated columns should not be considered as MDC candidate columns.

6.6.4 Space requirement for MDC

The unit of storage allocation for an MDC table is an extent. For each unique combination of values for the chosen dimensions, at least one extent is required. If many rows have a particular combination of values, they might fill an extent and expand to one or more additional extents. But if only one row occurs with a particular combination of values, that single row occupies the entire extent. If your choice of dimensions leads to a lot of extents, each having only a few rows, excessive space will be used. These mostly empty blocks are sometimes called sparse blocks or sparse extents.

The number of cells in the MDC table is equal to the number of unique combinations of the dimension attributes. The number of rows returned by the query in Example 6-5 is equal to the number of cells, and the COUNT(*) column gives the number of rows per cell (RpC). If you have chosen appropriate dimension columns, the RpC should typically be a large value.

Example 6-5 Query to determine MDC cell count
SELECT dimension_col1, dimension_col2, ... , dimension_colN, COUNT(*)
FROM table
GROUP BY dimension_col1, dimension_col2, ... , dimension_colN

Our case study compared the space requirement for MDC tables with non-MDC tables. (See Chapter 13, “Large data volumes: A case study” on page 373 for more details.) For our data distribution and schema, the MDC table occupied six million data pages and 16 index pages of disk space, as compared to the original six million pages for the non-MDC table. The extra space requirement for our MDC table as compared to a non-MDC table was negligible.

We recommend that you refer to DB2 UDB ESE: Partitioning for Performance in an e-business Intelligence World, SG24-6917, which contains detailed information about how to calculate the space requirement for an MDC table.

6.7 Range-clustered tables in DB2

A range-clustered table (RCT) is a table layout scheme in which each record in the table has a predetermined record ID (RID), an internal identifier that is used to locate a record in a table.

An algorithm is used to equate the value of the key for the record with the location of a specific record within a table. The basic algorithm is fairly simple. In its most basic form (using a single column instead of two or more columns to comprise the key), the algorithm maps a sequence number to a logical record number. The algorithm also uses the record key to determine the logical page number and slot number. This process provides exceptionally fast access to records; that is, to specific rows in the table.

Each record key in the table should be unique, not null, and a monotonically increasing integer. Also, the key column values should fall within a predetermined set of ranges.

Space for the RCT is pre-allocated and reserved for use by the table even when no records have yet been loaded. At table creation time, there are no records in the table; however, the entire range of pages is pre-allocated. Pre-allocation is based on the record size and the maximum number of records to be stored. Applications where tightly clustered (dense) sequential key ranges are likely are excellent candidates for range-clustered tables. When this type of key is used to create a range-clustered table, the key is used to generate the logical location of a row in the table. This process avoids the need for a separate index.
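For illustration, the following is a minimal sketch of creating an RCT; the table name, columns, and range bounds are hypothetical:

CREATE TABLE student_records (
   student_id INTEGER NOT NULL,
   score      DECIMAL(5,2)
) ORGANIZE BY KEY SEQUENCE
   (student_id STARTING FROM 1 ENDING AT 1000000)
   DISALLOW OVERFLOW;

With DISALLOW OVERFLOW, space for exactly one million records is pre-allocated, and a row's student_id directly determines its logical page and slot.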

6.8 Roll-in and roll-out of data using UNION ALL views

In a typical business environment, the DBA has to roll data in and roll data out. For example, an application might require that you roll out the data for the three oldest months (the oldest quarter) and roll in the data for the new quarter. Typically, these tables are fragmented on some date column.

In XPS, the old data is rolled out into a separate table using the ALTER FRAGMENT DETACH command. The data for the new quarter is loaded into a staging table, and this table is attached to the base table using the ALTER FRAGMENT ATTACH command.

For MDC tables, this translates into an INSERT of the old data into a staging table, a DELETE of this data from the base table, and an INSERT of the new data into the base table. These SQL statements use predicates on the dimension columns. Because of the block-level deletes in MDC, the performance of the delete is significantly faster than an equivalent delete through a single clustered or regular index. The INSERT has some performance overhead due to the block search and the updates on the block indexes.

This section discusses the concept of UNION ALL views. You can use this feature to implement the XPS ATTACH or DETACH type of functionality in DB2. As an example, a physically large table is partitioned into a set of smaller tables called branches. Each of the branch tables has a mutually exclusive constraint on the same set of columns. A view is created using the UNION ALL construct to SELECT from all of these smaller tables. To the user, the view appears to be one big table, and this view can be used in any join condition, as though it were a single table.

To simulate the XPS ALTER FRAGMENT DETACH, you can drop the UNION ALL view and re-create it by excluding the base table that you need to detach. Similarly, for attaching a table, re-create the view by including the new table in the view definition.

An example of this process is depicted in Example 6-6, where orders_all is a view with UNION ALL on the three smaller tables: orders_quarter12 (has a constraint for quarters 1 and 2), orders_quarter3 (has a constraint for quarter 3), and orders_quarter4 (has a constraint for quarter 4).

Example 6-6 UNION ALL views
create table orders_quarter12 (
   o_orderkey decimal(10,0) not null,
   o_orderdate date,
   o_totalprice decimal(12,2),
   constraint Q12 check (o_orderdate < '2004-07-01')
) partitioning key (o_orderkey) using hashing in space_dataindex;

create table orders_quarter3 (
   o_orderkey decimal(10,0) not null,
   o_orderdate date,
   o_totalprice decimal(12,2),
   constraint Q3 check (o_orderdate >= '2004-07-01' and o_orderdate < '2004-10-01')
) partitioning key (o_orderkey) using hashing in space_dataindex;

create table orders_quarter4 (
   o_orderkey decimal(10,0) not null,
   o_orderdate date,
   o_totalprice decimal(12,2),
   constraint Q4 check (o_orderdate >= '2004-10-01' and o_orderdate < '2005-01-01')
) partitioning key (o_orderkey) using hashing in space_dataindex;

create view orders_all as (
   select * from orders_quarter12
   union all
   select * from orders_quarter3
   union all
   select * from orders_quarter4);
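For illustration, the following sketch simulates a DETACH of the oldest branch and an ATTACH of a new quarter; the table orders_quarter5 is hypothetical and is assumed to be already loaded and constrained like the other branches:

drop view orders_all;
create view orders_all as (
   select * from orders_quarter3
   union all
   select * from orders_quarter4
   union all
   select * from orders_quarter5);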

6.8.1 Query optimization of UNION ALL views

The DB2 query rewrite component of the SQL compiler optimizes the queries on UNION ALL views. It attempts to prune the number of tables that need to be accessed during query processing. This section discusses these optimizations.

Local predicate pushdown and redundant branch elimination
The query rewrite module pushes eligible local predicates (predicates not involving other tables) down through the SELECT, join, UNION, or GROUP BY. By applying the predicates at the lowest table-scan level, it can eliminate the unqualified rows at the earliest opportunity, so fewer rows move up to the upper levels (such as join and GROUP BY). Redundant branch elimination works in combination with local predicate pushdown: branches whose constraints contradict the local predicates are eliminated, because they cannot return any rows.

Let us go back to the view definition, orders_all, as discussed in Example 6-6 on page 175. In Example 6-7, we query on orders_all with a predicate on the constraint column o_orderdate. The query rewrite module optimizes the query to push this predicate to the individual branches before doing the UNION ALL. It also eliminates the branch orders_quarter4, because it has a contradictory condition o_orderdate <= '2004-09-27' when compared to its constraint o_orderdate >= '2004-10-01' and o_orderdate < '2005-01-01'.

Example 6-7 Local predicate pushdown
Original Statement:
------------------
select o_orderkey
from orders_all
where o_orderdate <= '2004-09-27'

Optimized Statement:
------------------
select q5.$c0 as "o_orderkey"
from (
   select q1.o_orderkey
   from db2test.orders_quarter12 as q1
   where (q1.o_orderdate <= '09/27/2004')
   union all
   select q3.o_orderkey
   from db2test.orders_quarter3 as q3
   where (q3.o_orderdate <= '09/27/2004')
) as q5

Predicates involving functions, such as UPPER(state) = 'CALIFORNIA', cannot be used for contradiction checks and, hence, cannot be used to eliminate branches. They are still pushed down to the branch level.

The exceptions are the YEAR and MONTH functions. For example, the predicate YEAR(sales_data) = 2004 AND MONTH(sales_data) = 2 is converted to sales_data >= '02-01-2004' AND sales_data < '03-01-2004'. For other functions, you can create a generated column that uses the function as a workaround.

Join pushdown
The DB2 SQL compiler also pushes down any equi-join predicates to the base tables. This has the same benefit as the local predicate pushdown, because it reduces the number of rows flowing up to the upper operators.

Even if the constraint conditions for the tables being joined are different, the SQL compiler attempts a join-pushdown. However, there are some limits beyond which the compiler will not do the join pushdown. In cases where the constraint conditions are the same, these limits are higher.

Let us take an example. We join two tables, sales_all and orders_all (defined in Example 6-6 on page 175). Suppose sales_all is defined along the same lines as orders_all, with the same constraints on the date column s_orderdate. For a join between these two views on the constraint (date) columns, the query rewrite pushes the join down to the base-table level, as shown in Example 6-8.

Example 6-8 Join pushdown
Original Statement:
------------------
select o_orderkey
from sales_all s, orders_all o
where s_orderdate = o_orderdate

Optimized Statement:
------------------
select q10.$c0 as "o_orderkey"
from (
   select q2.o_orderkey
   from db2test.sales_quarter12 as q1, db2test.orders_quarter12 as q2
   where (q1.s_orderdate = q2.o_orderdate)
   union all
   select q5.o_orderkey
   from db2test.sales_quarter3 as q4, db2test.orders_quarter3 as q5
   where (q4.s_orderdate = q5.o_orderdate)
   union all
   select q8.o_orderkey
   from db2test.sales_quarter4 as q7, db2test.orders_quarter4 as q8
   where (q7.s_orderdate = q8.o_orderdate)
) as q10

Join pushdown has some compile-time overhead, because the optimizer has to determine the optimal join ordering for each branch of the UNION ALL. Also, the memory requirements increase as the number of join combinations (after a pushdown) increases. When the optimizer determines that the required overhead is too large, based on the limits discussed previously, it does not push the join down.

GROUP BY pushdown
The query rewrite module also pushes the GROUP BY down through a UNION ALL view so that the grouping operation can be applied early, on a smaller set of rows. GROUP BY pushdown is especially effective when grouping on the partitioning column, although it works for other columns as well.

GROUP BY pushdown is performed only in the following circumstances:
 The aggregate functions must be MIN, MAX, SUM, COUNT, or AVG.
 The number of remaining branches after branch elimination must be fewer than 64.

Runtime branch elimination
In addition to branch elimination at compile time, DB2 can perform runtime branch elimination for equality or range predicates. This is useful when the statement contains host variables, whose values are not known at compile time.

6.8.2 Benefits of UNION ALL views
In addition to query performance, UNION ALL views in a partitioned database also have other advantages, such as:
 Better control of maintenance window utilities.
 Easier roll-in and roll-out of data.
 Ability to leverage different storage media.
 Branch-based performance tuning.
 Schema and data evolution.
 Decreased I/O through branch elimination.
 Increased parallelism.
 No index changes are required. In XPS, if the table being altered contains an index, an ATTACH or DETACH on such a table alters or rebuilds the index. For attached indexes, you create an index on the consumed table, and ALTER FRAGMENT ATTACH alters the index on the surviving table to include the newly built index. For detached indexes, the index on the surviving table is dropped, the consumed table is attached, and then the index is re-created. The elapsed time of such an ALTER also includes the time it takes to rebuild the index on the entire base table. There is no index build overhead in DB2.

6.8.3 Limitations of UNION ALL views
As the number of branches in the UNION ALL view increases, the complexity of the query after transformation increases. This leads to increased compile time and memory requirements for the SQL optimizer.

The join pushdown is done only for equi-joins and has certain restrictions, which are as follows:
1. For a UNION ALL view joined with N tables, where M is the number of branches remaining after local predicate pushdown, the join is pushed down when M <= 64 and M x N <= 180.
2. For UNION ALL (UA1) join UNION ALL (UA2) join N tables, where A is the number of remaining branches from UA1 and B is the number of remaining branches from UA2, the join is pushed down when A x B <= 64 and A x B x N <= 180.
3. For the special case of UNION ALL (UA1) join UNION ALL (UA2) join N tables, where the range of each branch from UA1 matches the corresponding branch in UA2 and the constraint columns are equi-joined, the limits for join pushdown are higher than in list item 2: the join is pushed down when A <= 64, B <= 64, and A x N <= 180.

In situations where DB2 is not able to perform a pushdown, it can materialize the entire view, which causes a severe performance impact. Also, with UNION ALL views, you cannot create globally unique indexes across the base tables.

6.9 MDC and UNION ALL views for roll-in and roll-out

If the number of branches and the number of joined tables fall within the limits, UNION ALL views are a very good and fast option for simulating the XPS ALTER FRAGMENT ATTACH and DETACH. Otherwise, you should consider implementing the roll-in and roll-out using MDC tables.

For the roll-in and roll-out of data using UNION ALL views, you have to drop and re-create all the UNION ALL views, so the views are effectively not accessible to users during this operation. Also, if you have view definitions or stored procedures that use the UNION ALL views, you need to re-create those views or rebind the packages. MDC tables, on the other hand, remain accessible while the roll-in and roll-out (via DML statements) is happening.

We used both options, MDC and UNION ALL views, in our case study, as described in Chapter 13, “Large data volumes: A case study” on page 373. We determined that, for our environment and schema, MDC was more manageable than UNION ALL views. Four of the 22 TPC-H queries performed a join between two UNION ALL views; they exceeded the pushdown limits and, thus, did not perform well. For our data distributions, the INSERT and DELETE operations on the MDC tables, simulating a DETACH and ATTACH of approximately 80 million rows, finished within a couple of minutes, even though these operations were logged. If the roll-out or roll-in of data is a monthly activity during a scheduled weekend downtime, MDC is the better option, considering its ease of use and flexibility.

6.10 Indexing strategies

Both DB2 and XPS implement their indexing schemes using B+ trees. This section discusses the syntax for CREATE INDEX, some additional clauses of CREATE INDEX in DB2, the various indexing methods available in DB2, and how some of the XPS index types can be translated to DB2.

6.10.1 Syntax for index creation
The general syntax for creating indexes is common between XPS and DB2, as shown in Example 6-9.

Example 6-9 Basic create index syntax in DB2 and XPS
CREATE [UNIQUE] INDEX <index_name> ON <table_name> (<column_list>)

The basic naming rules for creating indexes are also common. For example, an index name must be unique within a database, the identifier (the name of the index) must not be longer than 128 characters, and special characters are not allowed in the index name.

In contrast to XPS, DB2 allows indexing of columns wider than 255 characters; the indexed column width in DB2 can be up to 1024 characters. Keep in mind that it is not always ideal to index very large columns: the larger the indexed column, the less efficient the index.

In XPS it is a common practice to separate the index pages from the data. This is done by specifying a dbspace or dbslice that is different than the table dbspace or dbslice at index creation time. These are called detached indexes. Even if the distribution scheme specified for the index is identical to that specified for the table, the index is considered to be detached.

DB2 also allows you to place an index into a different table space. In contrast to XPS, the index table space is specified at table creation time, as shown in Example 6-10, and not at index creation time. All indexes for a given table go in the specified table space.

Example 6-10 DB2: create index in separate table space
CREATE TABLE t1 (
   col1 INTEGER,
   col2 CHAR(10)
) IN tbs1 INDEX IN idxtbs1;

CREATE INDEX idx1 ON t1 (col1);

6.10.2 DB2 index expansions
In the DB2 SQL Reference, SC09-2974, you can find the complete CREATE INDEX command syntax. There are more options and parameters to the syntax than with XPS. The following sections provide common examples of those.

Include option
Both XPS and DB2 allow key-only reads. These are reads from the database where all the columns in the select list are available via an index. Key-only reads are very efficient because there is no need to access data pages.

In XPS, it is not uncommon for users to add extra columns to unique indexes to facilitate key-only reads. These columns do not affect uniqueness at all. The problem with this practice is that the more columns you add into a composite index, the bigger each index entry grows at all levels of the index.

DB2 allows you to include columns (see the general syntax in Example 6-11) in a unique index; these columns are stored only at the leaf level. The index is therefore much smaller than with the XPS workaround, where the extra columns appear at all levels of the index.

Example 6-11 DB2 index with INCLUDE expansion
CREATE UNIQUE INDEX idx_name ON tab_name (col_list) INCLUDE (col_list)

Note: Include indexes are allowed only for unique indexes.
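As a concrete illustration, the following is a minimal sketch using the orders_quarter3 table from Example 6-6; the index name and the choice of o_totalprice as the included column are ours:

CREATE UNIQUE INDEX idx_q3_ok ON orders_quarter3 (o_orderkey) INCLUDE (o_totalprice);

A query such as SELECT o_orderkey, o_totalprice FROM orders_quarter3 WHERE o_orderkey = 1000 can then be answered with index-only access, while only o_orderkey determines uniqueness.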

MINPCTUSED parameter
An index tree is as dynamic as the data changes in the corresponding table. Index leaf pages are split into two new leaf pages if the original leaf page becomes full. If leaf pages become nearly empty because rows have been deleted from the corresponding table, both DB2 and XPS try to merge two leaf pages into one.

The MINPCTUSED parameter defines the percentage of used space below which a leaf page is considered for merging. The default value for this parameter is 0%, which means that an index leaf page must be completely empty before a reverse split occurs. You can set the MINPCTUSED parameter up to 99%, as shown in Example 6-12, but it is recommended that you not use values higher than 50%.

Example 6-12 DB2 index with MINPCTUSED
CREATE INDEX <index_name> ON <table_name> (<column_list>) MINPCTUSED [0..99]

PCTFREE parameter
The XPS onconfig parameter FILLFACTOR advises the database engine on how much space should be reserved within index leaf pages for upcoming INSERT statements. DB2 has a similar parameter, PCTFREE. The default value is 10, which means that 10% of the space in an index page is reserved for future modifications. Keep in mind that DB2 has different page sizes. This parameter is passed to the CREATE INDEX statement, as shown in Example 6-13.

Example 6-13 Index with PCTFREE
CREATE [UNIQUE] INDEX <index_name> ON <table_name> (<column_list>) PCTFREE [0..99]

ALLOW REVERSE SCANS option
As in XPS, DB2 index leaf pages are implemented as a doubly linked list, which allows access to the data in both ascending and descending order. Unlike XPS, when you create an index in DB2, you can specify which sort sequence the index should support. If you do not specify the sorting sequence, the index supports ascending order only. When ALLOW REVERSE SCANS is specified in the CREATE INDEX statement (see Example 6-14), the index can also be used for reverse sorting. The default is DISALLOW REVERSE SCANS.

Example 6-14 DB2 index with ALLOW REVERSE SCANS
CREATE INDEX <index_name> ON <table_name> (<column_list>) ALLOW REVERSE SCANS

COLLECT STATISTICS option
DB2 allows the collection of statistical information about an index at creation time, as demonstrated in Example 6-15. As such, there is no need to execute a RUNSTATS command after the index has been created.

There are three options when adding the COLLECT parameter:
1. STATISTICS: Collects basic statistical information about the index.
2. DETAILED STATISTICS: Creates a distribution curve of the index data.
3. SAMPLED DETAILED STATISTICS: Creates a sampled distribution curve of the index data.

Example 6-15 DB2 index with COLLECT STATISTICS
CREATE INDEX <index_name> ON <table_name> (<column_list>) COLLECT [[SAMPLED] DETAILED] STATISTICS;
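For instance, a minimal sketch against the orders_quarter3 table from Example 6-6 (the index name is our choice):

CREATE INDEX idx_q3_od ON orders_quarter3 (o_orderdate) COLLECT SAMPLED DETAILED STATISTICS;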

6.10.3 Index types and access methods
This section discusses the various types of indexes that are available on XPS and DB2 and how some of them map to each other.

Implicit indexes
When you create a primary key or foreign key constraint on columns, a corresponding index is necessary for those columns. If such an index does not exist, the database creates one for you. These kinds of indexes are called implicit indexes.

Both XPS and DB2 use implicit indexes. The composite block index and the dimension block indexes for an MDC table are also created implicitly by DB2.

Clustered index
A clustered index attempts to insert new rows physically close to the rows whose key values for this index are in the same range. In XPS, you cannot create clustered indexes on a standard table. Also, you cannot use the CLUSTER option and storage options in the same CREATE INDEX statement.

DB2 also supports clustered indexes. The syntax to create a clustered index in DB2 is slightly different from XPS, as shown in Example 6-16.

Example 6-16 Creating a clustered index
(XPS)
CREATE [UNIQUE] CLUSTER INDEX <index_name> ON <table_name> (<column_list>)

(DB2)
CREATE [UNIQUE] INDEX <index_name> ON <table_name> (<column_list>) CLUSTER

The parameter PCTFREE, which we previously discussed, becomes an important factor because it influences the future quality of a clustered index when new rows are inserted. This is similar to the XPS onconfig parameter FILLFACTOR.

Key-only scans
XPS implements key-only scans when all of the required data can be retrieved from the index without accessing the table. DB2 supports this as well, where it is called index-only access; the INCLUDE option of the CREATE INDEX syntax, discussed earlier in 6.10.2, “DB2 index expansions” on page 181, helps make more queries eligible for it.

Multiple index scans
The multi-index scan of XPS is also implemented in DB2, where it is called a multiple index scan. This feature allows the scan of multiple indexes on the same table to satisfy the predicates in the WHERE clause.

For example, let us say that you have indexes on the two columns job and years, and you run a query with the predicates job = 'engineer' OR years > 5. DB2 scans the index on job to produce a list of record IDs (RIDs) with the value engineer, and then scans the index on years to produce another list of RIDs with values greater than 5. These two lists of RIDs are combined, and duplicates are removed, before the table is accessed. This is known as index ORing. For an AND condition, a bitmap is used to output the common RIDs: based on the first predicate, the relevant index is scanned and a bitmap is created on the RIDs; scanning the second index and probing the bitmap produces the list of qualifying RIDs that satisfy both predicates. This is known as dynamic bitmap ANDing.

Index skip scans and RID-list-fetch
XPS uses an access method called an index skip scan on a single index to eliminate random disk I/O. It fetches all the qualifying RIDs from the index using the predicates on the index columns. These RIDs are then sorted. A scan through the table picks up the next page that has relevant records (RIDs), skipping those pages that do not contain any qualifying rows. Because the RIDs are sorted, data pages are also read in order, which ensures that each page is read only once.

The skip scan is called a RID-list-fetch in DB2. The RIDs are sorted for unclustered indexes, and the records are then fetched via the RIDs, similar to XPS.

GK indexes and Materialized Query Tables (MQT)
XPS supports the creation of Generalized Key indexes (GK indexes). GK indexes store information about the records of a static table, in an index, based on the results of a query. They provide a form of pre-computed index capability that allows faster query processing. Example 6-17 shows the XPS syntax for creating a GK index.

Example 6-17 XPS: syntax for GK index
create static table t1 (col1 int primary key, col2 int);
create static table t2 (col1 int primary key, col2 int);

CREATE GK INDEX gki ON t1
   (SELECT t1.col1, t1.col2 FROM t1, t2 WHERE t1.col1 = t2.col1)

Queries with the same join condition as the GK index use the GK index directly instead of joining the tables. Some of the restrictions on GK indexes are:
 A GK index can be created only on static tables; in fact, all the tables in the join must be static tables. Also, you cannot create a GK index on remote tables or views.
 Because the base tables are static, UPDATE, INSERT, DELETE, and LOAD operations are not allowed on them unless the GK index is dropped, the table type is changed, and the index is re-created afterwards.
 The tables that are mentioned in the FROM clause must be transitively joined on the primary key to the indexed table. In the example, col1 is the primary key used in the join condition.
 The SELECT and WHERE clauses must not contain functions such as USER, TODAY, CURRENT, or DBINFO.
 Key-only index scans are not available with GK indexes.

MQT is a feature available in DB2 that serves the same purpose as a GK index. An MQT is a table whose definition is based on the result of a query, and whose data is in the form of precomputed results that are taken from one or more tables on which the materialized query table definition is based. Example 6-18 shows the syntax for creating an MQT.

Example 6-18 DB2: Syntax for MQT
CREATE TABLE bad_account AS
   (SELECT customer_name, customer_id, a.balance
    FROM account a, customers c
    WHERE status IN ('delinquent', 'problematic', 'hot')
    AND a.customer_id = c.customer_id)
DATA INITIALLY DEFERRED REFRESH DEFERRED

A REFRESH TABLE statement is used to incrementally refresh the MQT that was defined with the REFRESH DEFERRED option. To have the MQT refreshed automatically when changes are made to the base table or tables, specify the REFRESH IMMEDIATE keyword.
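For illustration, after changes to the base tables, the bad_account MQT from Example 6-18 could be refreshed with:

REFRESH TABLE bad_account;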

In the SQL compiler, the query rewrite phase and the optimizer match queries with MQTs and determine whether to substitute an MQT for a query that accesses the base tables.

Some of the similarities and differences between GK indexes and MQTs are:
 As with a GK index, you cannot alter an MQT.
 MQTs are tables, not indexes, and can therefore be used in queries like any other regular table. The same is not true for GK indexes.
 Unlike GK indexes, you can create non-unique indexes on MQTs and execute RUNSTATS (the equivalent of the XPS UPDATE STATISTICS) on them. Unique indexes are not allowed on MQTs.
 Unlike with GK indexes, you can run UPDATE, INSERT, or DELETE on the base tables.

6.10.4 Space requirements for indexes
On XPS, you can use the onutil check info command to get the size of an existing index. On DB2, if you create an index using the DB2 Control Center Create Index wizard (as depicted in Figure 6-4 on page 187), you can get an estimate of the index size before creating the index. Just select Estimate Size... within the wizard. In the window that is displayed, enter the number of rows in the table and click Refresh.


Figure 6-4 Create Index Wizard

6.10.5 Table and index reorganization on DB2
After many changes to table data, logically sequential data might reside on non-sequential physical data pages, so additional read operations are required to access the data. The same is true for tables from which a significant number of rows have been deleted. The table and index reorganization feature of DB2 allows you to reclaim unused or deleted space from tables and indexes. Read the DB2 UDB Administration Guide for the syntax and specifics of these commands.

The b-tree cleaning feature in XPS is similar to index reorganization. In XPS, b-tree cleaning is performed automatically by the btcleaner thread.

On DB2, you can either explicitly run the command to perform the table reorganization or set the MINPCTUSED parameter when you create an index to automatically merge index leaf pages. If a key is deleted and the used space on a leaf page falls below the specified percentage, the leaf page is merged with a neighboring page. This process is called online index defragmentation.
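For illustration, a minimal sketch of explicit reorganization; the table name is taken from Example 6-6, and the db2test schema follows the earlier optimized statements:

REORG TABLE db2test.orders_quarter3;
REORG INDEXES ALL FOR TABLE db2test.orders_quarter3;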

6.11 Joins

This section discusses joins: first the syntax of joins, and then the different join methods that are available on DB2. In both of these aspects, DB2 UDB differs from Informix XPS.

The join syntax allows the user to determine which tables are to be joined and using which keys. The optimizer then selects an appropriate join method for the particular query. In particular, based on the distribution of the data in the database, the optimizer selects different methods to join tables that are distributed over multiple database partitions.

6.11.1 Join syntax
DB2 supports the full set of ANSI joins. The simplest join is expressed by the following SQL statement (run against the DB2 sample database):

SELECT empno, firstnme, lastname, deptname
FROM employee, department
WHERE workdept = deptno AND admrdept = 'A00'

This type of join is called an inner join, and the formal method of writing this query with DB2 is as follows:

SELECT empno, firstnme, lastname, deptname
FROM employee INNER JOIN department ON workdept = deptno
WHERE admrdept = 'A00'

You should note here that INNER JOIN is used in the FROM clause. The ON keyword specifies the join predicates and categorizes rows as either joined or not joined. The WHERE clause is thus used solely to filter rows.

There are three types of OUTER join:
 LEFT OUTER JOIN
SELECT empno, firstnme, lastname, deptname
FROM employee LEFT OUTER JOIN department ON workdept = deptno
 RIGHT OUTER JOIN
SELECT empno, firstnme, lastname, deptname
FROM employee RIGHT OUTER JOIN department ON workdept = deptno
 FULL OUTER JOIN
SELECT empno, firstnme, lastname, deptname
FROM employee FULL OUTER JOIN department ON workdept = deptno

The Venn diagram in Figure 6-5 shows these three types of ANSI OUTER joins.

Figure 6-5 Three types of OUTER join

Cartesian product (or CROSS JOIN)
The Cartesian product, or CROSS JOIN, joins each row of one table with every row of the other table. Cartesian products are rarely desirable and are usually the result of mis-coded SQL. Thus, a join such as the following (where no join predicates are specified) results in a Cartesian product:

SELECT empno, firstnme, lastname, deptname
FROM employee, department

With XPS V8.50, an alarm is triggered when a Cartesian product is used in a query.

ANSI-compliant joins with Informix IDS
With Informix IDS V9.4, you can specify the following:
 INNER JOIN
 CROSS JOIN
 NATURAL JOIN
 LEFT JOIN (or LEFT OUTER JOIN)
 RIGHT JOIN (or RIGHT OUTER JOIN)
 FULL JOIN (or FULL OUTER JOIN)
The OUTER keyword is optional in ANSI-compliant outer joins.

6.11.2 Join methods (generic)

Depending on the existence of a join predicate, as well as various costs involved as determined by table and index statistics, the optimizer chooses one of the following join methods:

 Nested-loop join
 Merge join
 Hash join

When two tables are joined, one table is selected as the outer table and the other as the inner. The outer table is accessed first and is scanned only once. Whether the inner table is scanned multiple times depends on the type of join and the indexes that are present.

Even if a query joins more than two tables, the optimizer joins only a pair of tables at a time. If necessary, temporary tables are created to hold intermediate results.

When joining tables, the optimizer needs to determine which algorithm, or join method, to use. The algorithm determines the basic procedure for combining the tuples of the two tables into result rows. The algorithm does not determine the result, which is specified by the SQL; it determines only which of the three approaches is the most efficient way to produce that result. While the results are determined by the SQL, the execution performance is determined by the algorithm used.

As stated, regardless of which join method is used, one table is selected as the OUTER table and the other as the INNER table. The optimizer decides which is which based on the calculated cost and the join method selected.

The optimizer understands the cost advantages and disadvantages of each of the three algorithms, and it chooses the algorithm that results in the lowest cost based on the current statistics in the system catalog.

The following is a high-level summary of some of the considerations used by the optimizer:
 The smaller table is more likely to be chosen as the outer table, thereby reducing the number of accesses to the inner table.
 A table is more likely to be chosen as the outer table if highly selective predicates can be applied to it, because this reduces the number of accesses to the inner table.
 If a suitable index lookup can be applied to one table, that table becomes a better candidate for the inner table, because indexing into the inner table is far more efficient than scanning it.
 The table with the fewest duplicates in the join keys is a better choice for the outer table of the join.

Tip: Best performance is achieved when the statistics in the system catalog reflect the actual state of the tables. Thus, when substantive changes are made to any of the tables, you should run the RUNSTATS command to update the statistics in the system catalog.
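For illustration, a minimal RUNSTATS sketch against one of the branch tables from Example 6-6 (the db2test schema follows the earlier optimized statements):

RUNSTATS ON TABLE db2test.orders_quarter3 WITH DISTRIBUTION AND INDEXES ALL;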

Nested loop join
To join rows in a nested loop join, successive rows are selected from the outer table, and for each such row the inner table is accessed for a match (preferably by indexing into the inner table but, if necessary, by scanning for a match).

Merge join
With the merge join, both tables need to be ordered by the join predicates; the ordering can be achieved by either a sort or indexed access. Each table is then read sequentially, and the join columns are matched up.

Hash joins
Hash joins require one or more equality join predicates between the joined tables, for which the column types are the same. The fact that they can handle more than one equality predicate between the joined tables is a distinct advantage over the other join methods.

The inner table is scanned and its rows are copied into memory buffers taken from the sort heap allocation. The memory buffers are divided into partitions based on a hash code that is computed from the columns of the join predicates. After processing the inner table, the outer table is scanned, and its rows are matched to rows of the inner table by computing the same hash code.

Hash joins require a significant amount of memory. Performance suffers if the size of the first table (the inner table) exceeds the available sort heap space, because then buffers from selected partitions have to be written to temporary tables on disk.

With DB2 V7, hash joins are only considered when the registry variable DB2_HASH_JOIN is set to YES. With DB2 UDB V8.1 for Linux, UNIX, and Windows (and later), this registry variable is set to YES by default.
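For illustration, on DB2 V7 you would enable hash joins by setting the registry variable and restarting the instance (a sketch; registry changes take effect only after a restart):

db2set DB2_HASH_JOIN=YES
db2stop
db2start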

Note: The hash join is only available with DB2 UDB for Linux, UNIX, and Windows. DB2 for z/OS® and OS/390® instead offers the hybrid join, which combines data and pointers to access and combine the rows from the tables being joined. This join type is beyond the scope of this book.

Use of indexes for joining
In general, DB2 recommends the use of explicit indexes to support joining, while XPS does not.

DB2 prefers the nested loop join when a small number of rows qualify for the join, because the nested loop join can take advantage of indexes on the inner (and thus larger) table. When the number of rows involved is larger, the merge join becomes the better choice, and again indexes might be helpful to access the rows.

Finally, in the case of a hash join, the inner table is kept in memory and disk-based table indexes are not of any use. The inner table is always selected as the smaller table to try to avoid memory spill for the hash tables.

Star joins with dynamic bitmap indexes
This join method can be accomplished through the following steps:
1. Process each dimension table by performing a semi-join between the dimension table and the foreign key index of the fact table to determine the qualifying rows of the fact table.
2. Hash the fact table row ID (RID) values to dynamically create bitmaps.
3. AND each new bitmap against the previous bitmap. This is called index ANDing.
4. Determine the surviving RIDs after processing the last bitmap.
5. Optionally, sort these RIDs.
6. Fetch the base table rows.
7. Re-join the fact table with each of its dimension tables, accessing the columns in the dimension tables that are needed for the SELECT clause.
8. Reapply the residual predicates.

6.11.3 Join strategies in a partitioned database
In a partitioned database, the join strategies are more complex than in a non-partitioned database. Additional techniques have to be applied to the standard join methods to improve performance.

Collocated table joins are the preferred join strategy when using the DB2 ESE database partitioning feature (DPF), because they do not require data to be shipped from partition to partition. This is only possible if the corresponding data from the joined tables is available in the same partition(s). Table collocation provides the means, in a partitioned database, to locate data from one table with the data from another table at the same partition, based on the same partitioning key. When collocated, the data to be joined can participate in a query without having to be moved to another database partition as part of the query activity. The result set (or answer set) does, however, need to be moved to the coordinator partition for delivery to the client.

Collocated table joins do not require that all data reside in one partition, but only that the corresponding rows (based on the join key) are in the same partition for each table. Thus, the two tables involved in the join must be partitioned using the same key(s).

When corresponding rows are not collocated, rows must be shipped. Two alternative approaches are available to allow rows to be joined:
 Directed join: Rows from one table are shipped (or directed) to the appropriate partition of the other table. The appropriate partition is computed by applying the hash function of the second table to the joining columns of the first table.
 Broadcast join: All the rows from one table are shipped (or broadcast, as the terminology is in this case) to all partitions where there are rows from the second table that will participate in the join.

In the directed join, the partitioning keys of at least one of the tables are used as the join fields, and the join fields do not match the partitioning keys of the other table (otherwise, a collocated join would be performed). Records of the first table are directed (sent) to the partition (node) of the second table, based on hashing the join field values using the partition map and partition (node) group of the second table. When the records have been joined, the results are forwarded to the coordinator partition.

The best description of the join strategies for multi-partition DB2 databases can be found in Chapter 6 “Understanding the SQL compiler” of the DB2 Administration Guide: Performance, SC09-4821.

Table queue terminology for DB2 database systems
The descriptions of join techniques in a DB2 partitioned database use the following terminology:
 Table queue: A mechanism for transferring rows between database partitions, or between processors in a single partition database.

 Directed table queue: A table queue in which rows are hashed to one of the receiving database partitions.

 Broadcast table queue: A table queue in which rows are sent to all of the receiving database partitions, but are not hashed.

A table queue is used in the following circumstances:
 To pass table data from one database partition to another when using inter-partition parallelism
 To pass table data within a database partition when using intra-partition parallelism
 To pass table data within a database partition when using a single partition database

Each table queue passes data in a single direction. The compiler decides where table queues are required and includes them in the plan. When the plan is executed, the connections between the database partitions initiate the table queues. The table queues close as processes end.

There are several types of table queues:
 Asynchronous table queues: These table queues are known as asynchronous because they read rows in advance of any FETCH being issued by the application. When the FETCH is issued, the row is retrieved from the table queue. Asynchronous table queues are used when you specify the FOR FETCH ONLY clause on the SELECT statement; if you are only fetching rows, the asynchronous table queue is faster.
 Synchronous table queues: These table queues are known as synchronous because they read one row for each FETCH that is issued by the application. At each database partition, the cursor is positioned on the next row to be read from that database partition. Synchronous table queues are used when you do not specify the FOR FETCH ONLY clause on the SELECT statement. In a partitioned database environment, if you are updating rows, the database manager uses synchronous table queues.
 Merging table queues: These table queues preserve order.
 Non-merging table queues: These table queues, also known as regular table queues, do not preserve order.
 Listener table queues: These table queues are used with correlated subqueries. Correlation values are passed down to the subquery, and the results are passed back up to the parent query block, using this type of table queue.

Collocated join
A collocated join occurs locally on the partition where the data resides; the partition sends its result data on after the join is complete. For the optimizer to consider a collocated join, the joined tables must be collocated, and all pairs of the corresponding partitioning keys must participate in the equality join predicates.

Directed join
Each row of one table is sent (directed) to one particular partition of the other table after the hash computation is performed. When received, the row is joined with the data of the second table that resides on that partition.

Broadcast join
Broadcast outer-table joins are a parallel join strategy that can be used if there are no equality join predicates between the joined tables. A broadcast join can also be used in other situations in which it is the most cost-effective join method. For example, a broadcast outer-table join might occur when there is one very large table and one very small table, neither of which is partitioned on the join predicate columns. Instead of partitioning both tables, it might be cheaper to broadcast the smaller table to the partitions of the larger table.

All rows of one table are sent (or broadcast) to all relevant partitions, where some rows are joined with the data of the second table that resides on that partition. The table that is broadcast is always the smallest table in any pair, and generally there is a limitation on the size of the table that is broadcast.

6.11.4 MERGE, UPDATE, and DELETE joins
Both XPS and DB2 support the MERGE INTO statement, which can be used to update one table (the target table) based on the contents of another table (for XPS, the second table can even be an external table).

Only XPS provides the ability to delete rows based upon the contents of a second table through a delete-join. However, DB2 can accomplish the same result by using the MERGE statement. The following statement deletes based on the contents of a second table:

MERGE INTO part_price old
USING new_part_price as new
ON old.part_id = new.part_id
WHEN MATCHED and old.part_id = 103
THEN DELETE;

Using the XPS DELETE join
Instead of writing a subquery in the WHERE clause, in XPS you can use a delete-join to join rows from various tables and delete those rows from a target table based on the join results. Suppose that you discover that some rows of the stock table in the storesdb database contain incorrect manufacturer codes. Rather than update them, you want to delete them so that they can be re-entered. You can use a delete-join query to achieve this as follows:

DELETE FROM stock
USING stock, manufact
WHERE stock.manu_code != manufact.manu_code

6.12 Optimizer

The key to performance in a modern database is the optimizer, which is discussed in the following sections:
 The role of the query optimizer
 LEO: Learning Optimizer

Beyond that, additional features of the DB2 optimizer are discussed in the following sections:
 Push-down hash join
 Optimization strategies for intra-partition parallelism
 Directives
 Optimization classes

6.12.1 The role of the query optimizer
Query optimizers are one of the most autonomic features of today's relational database systems, automatically determining the best way to execute a declarative SQL query. Since its inception, the DB2 query optimizer has automatically optimized even the most complex decision-support queries, without any of the hints from the user that some competing optimizers require.

The DB2 query optimizer uses a combination of:
 Powerful query rewrite rules to transform user-written queries into standardized, easier-to-optimize queries.
 A detailed cost model to generate and evaluate a rich set of alternative plans for executing the query.

The cost model can be displayed using Visual Explain. The optimizer automatically determines whether any existing Materialized Query Tables (MQTs) could benefit a query and, if so, routes the query to use the MQT without the query in the application program having to be altered. Further, it collects statistics on the size of each table and the distribution of each column to model how many rows must be processed by any query a user might submit.

The optimizer adapts its model to the local machine environment, factoring in the speed of the processor, the type and model of storage devices, and the network connecting machine clusters (in a shared-nothing environment) or sites (in a federated environment). In most cases, the optimizer minimizes the total overall resource consumption, but it automatically changes the optimization criterion to minimal elapsed time in parallel environments. The cost model includes detailed modeling of the availability of various memory categories (such as multiple buffer pools and sort heap) versus the amount needed, hit ratios, the cost to build temporary tables versus the cost to re-scan them, various flavors of prefetching and big-block I/O, non-uniformity of data distributions, and so on.

The optimizer even has a meta-optimizer, which determines automatically when a query is too complex to optimize using dynamic programming, and instead uses a “greedy” algorithm to save on optimization time and space.

6.12.2 LEO: Learning Optimizer
The new optimizer in DB2 UDB V8.2 for Linux, UNIX, and Windows, called LEO (Learning Optimizer), is part of the IBM SMART (Self-Managing And Resource Tuning) database technology and a typical example of the autonomic computing that is being incorporated into DB2. Key features include:
 LEO watches production queries and compares actual results with its estimates.
 The statistics gathered are used to improve future queries with similar characteristics.
 The optimizer learns from its mistakes over time.

Autonomic features such as these are being integrated into all future versions of DB2 and other IBM software products.

For further information about LEO, see “Related publications” on page 443 for a list of resources that are available.

Design purpose of LEO
LEO is a comprehensive way to repair incorrect statistics and cardinality estimates of a query execution plan (QEP). By monitoring previously executed queries, LEO compares the optimizer estimates with the actuals at each step in a QEP and computes adjustments to cost estimates and statistics that can be used during future query optimizations.

Chapter 6. Data partitioning and access methods 197 In practice, LEO learns from its past mistakes, that is, accelerating — sometimes drastically — future executions of similar queries while incurring a negligible monitoring overhead on query compilation and execution.

Rather than optimizing a query only once, when it is compiled, LEO watches production queries as they run and then fine-tunes them as it learns about data relationships and user needs. This approach works because LEO can empirically derive interesting things about the data. For example, LEO would eventually come to realize that a zip code can be associated with only one state, or that a Camry is made only by Toyota, even if those rules are not specified in advance.

LEO is expected to be most helpful in large and complex databases and in databases where inter-data relationships exist but are not explicitly declared by the database designers.

LEO at run time
No database maintains perfectly up-to-date statistics about itself. DB2 itself only updates its statistics when RUNSTATS is executed and statistics are gathered in detail or through sampling. At all other times these statistics are imperfect; even where detailed distributions are computed, they remain accurate only briefly.

LEO self-validates the query optimizer cardinality model by instrumenting the execution module to collect actual cardinalities at each step of the query execution plan (QEP). After the query completes, LEO compares these actuals to the query optimizer estimates, to produce adjustment factors to be exploited by future optimization of queries that have similar predicates. In this way, LEO actually learns from its previous estimates and executions over time by accumulating metadata in the database that augments the statistics indicating where data is queried the most.

This approach is very general and can correct the result of any sequence of operations, not just the access of a single table or even just the application of predicates.

6.12.3 Push-down hash join
In XPS, push-down hash joins are used only for tables in a star or snowflake schema. These schemas, often used for data warehouses and data marts, consist of one or two very large fact tables joined by foreign keys to several very small dimension tables. For information about star and snowflake schemas, refer to the IBM Informix: Database Design and Implementation Guide, G251-2271.

The primary purpose of a push-down hash join is to reduce the size of the initial probe table, usually a fact table. In a push-down hash-join plan, all of the hash tables are built before any hash joins occur. The database server pushes down one or more of the join keys to filter the fact table at the same time as it builds hash tables on the dimension tables. This dynamic filter on the fact table reduces the number of rows used in the probe phase of the actual hash joins. A more descriptive name for push-down hash joins might be push-down filters.

For Informix XPS, the database server chooses a push-down hash join for star or snowflake schema queries in the following circumstances:
 Neither the fact table nor any dimension table can participate in an outer join.
 The join between the fact table and any dimension table must be a primary key to foreign key equi-join, which makes a hash join possible.
 No table can be an external table.
 The values in the join-key columns of the dimension table must be unique. The optimizer uses data distribution statistics to determine whether a column contains unique values. The schema does not need to specify explicitly that a column is a primary key or apply the UNIQUE constraint to the column definition to produce a push-down hash join. If a table has a multi-column key, the table can be used as a dimension table only if one of the key columns is unique.
 The query must include a filter on at least one of the dimension tables.
 The number of rows selected from each dimension table must be less than the number of rows selected from the fact table after all scan filters are applied.

6.12.4 Optimization strategies for intra-partition parallelism
The DB2 optimizer can choose an access plan to execute a query in parallel within a single database partition if a degree of parallelism is specified when the SQL statement is compiled. At execution time, multiple database agents called subagents are created to execute the query. The number of subagents is less than or equal to the degree of parallelism specified when the SQL statement was compiled.

To make an access plan parallel, the optimizer divides it into a portion that is run by each subagent and a portion that is run by the coordinating agent. The subagents pass data through table queues to the coordinating agent or to other subagents. In a partitioned database, subagents can send or receive data through table queues from subagents in other database partitions.

Intra-partition parallel scan strategies
Relational scans and index scans can be performed in parallel on the same table or index. For parallel relational scans, the table is divided into ranges of pages or rows. A range of pages or rows is assigned to a subagent. A subagent scans its assigned range and is assigned another range when it has completed its work on the current range.

For parallel index scans, the index is divided into ranges of records based on the index key values and the number of index entries per key value. The parallel index scan proceeds like the parallel table scan, with subagents being assigned a range of records. A subagent is assigned a new range when it has completed its work on the current range.

The optimizer determines the scan unit (either a page or a row) and the scan granularity. Parallel scans provide an even distribution of work among the subagents. The goal of a parallel scan is to balance the load among the subagents and keep them equally busy. If the number of busy subagents equals the number of available processors and the disks are not overworked with I/O requests, then the machine resources are being used effectively.

6.12.5 Directives
Informix XPS and Informix IDS both support optimizer directives, including directives that specify how a particular join should be performed (USE_NL, USE_HASH) or not performed (AVOID_NL, AVOID_HASH).

DB2 does not currently (as of DB2 UDB V8.2.2) support optimizer directives. If directives were to be introduced, the arguments for both positive and negative directives, similar to those provided with Informix XPS and Informix IDS, would apply.

6.12.6 Optimization classes
While there are no optimizer directives as such in DB2, some control is available through optimization classes. You can specify one of the optimization classes when you compile an SQL query. Table 6-2 on page 201 provides a list of those classes and their descriptions.
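For illustration, a minimal sketch of setting the class for dynamic SQL in the current session; for static SQL, the QUERYOPT bind option serves the same purpose:

SET CURRENT QUERY OPTIMIZATION = 2;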

Table 6-2 Optimization classes

Class Description

0 This class directs the optimizer to use minimal optimization to generate an access plan. This optimization class has the following characteristics:
 Non-uniform distribution statistics are not considered by the optimizer.
 Only basic query rewrite rules are applied.
 Greedy join enumeration occurs.
 Only nested loop join and index scan access methods are enabled.
 List prefetch and index ANDing are not used in generated access methods.
 The star-join strategy is not considered.
This class should only be used in circumstances that require the lowest possible query compilation overhead. Query optimization class 0 is appropriate for an application that consists entirely of very simple dynamic SQL statements that access well-indexed tables.

Class 1
This optimization class has the following characteristics:
- Non-uniform distribution statistics are not considered by the optimizer.
- Only a subset of the query rewrite rules are applied.
- Greedy join enumeration occurs.
- List prefetch and index ANDing are not used in generated access methods, although index ANDing is still used when working with the semi-joins used in star joins.

Optimization class 1 is similar to class 0 except that merge scan joins and table scans are also available.


Class 2
This class directs the optimizer to use a degree of optimization significantly higher than class 1, while keeping the compilation cost significantly lower than classes 3 and above for complex queries. This optimization class has the following characteristics:
- All available statistics, including both frequency and quantile non-uniform distribution statistics, are used.
- All query rewrite rules are applied, including routing queries to materialized query tables, except computationally intensive rules that are applicable only in very rare cases.
- Greedy join enumeration is used.
- A wide range of access methods are considered, including list prefetch and materialized query table routing.
- The star-join strategy is considered, if applicable.

Optimization class 2 is similar to class 5 except that it uses Greedy join enumeration instead of dynamic programming. This class has the most optimization of all classes that use the Greedy join enumeration algorithm, which considers fewer alternatives for complex queries and therefore consumes less compilation time than classes 3 and above. Class 2 is recommended for very complex queries in a decision support or online analytical processing (OLAP) environment. In such environments, specific queries are rarely repeated exactly, so a query access plan is unlikely to remain in the cache until the next occurrence of the query.

Class 3
This class requests a moderate amount of optimization. This class comes closest to matching the query optimization characteristics of DB2 for MVS/ESA™, OS/390, or z/OS. This optimization class has the following characteristics:
- Non-uniform distribution statistics, which track frequently occurring values, are used if available.
- Most query rewrite rules are applied, including subquery-to-join transformations.
- Dynamic programming join enumeration is used, as follows:
  – Limited use of composite inner tables
  – Limited use of Cartesian products for star schemas involving look-up tables
- A wide range of access methods are considered, including list prefetch, index ANDing, and star joins.

This class is suitable for a broad range of applications. It improves access plans for queries with four or more joins. However, the optimizer might fail to consider a better plan that might be chosen with the default optimization class.


Class 5
This class directs the optimizer to use a significant amount of optimization to generate an access plan. This optimization class has the following characteristics:
- All available statistics are used, including both frequency and quantile distribution statistics.
- All of the query rewrite rules are applied, including the routing of queries to materialized query tables, except for those computationally intensive rules which are applicable only in very rare cases.
- Dynamic programming join enumeration is used, as follows:
  – Limited use of composite inner tables
  – Limited use of Cartesian products for star schemas involving look-up tables
- A wide range of access methods are considered, including list prefetch, index ANDing, and materialized query table routing.

When the optimizer detects that the additional resources and processing time are not warranted for complex dynamic SQL queries, optimization is reduced. The extent of the reduction depends on the machine size and the number of predicates. When the query optimizer reduces the amount of query optimization, it continues to apply all the query rewrite rules that would normally be applied. However, it uses the Greedy join enumeration method and reduces the number of access plan combinations that are considered. Query optimization class 5 is an excellent choice for a mixed environment consisting of both transactions and complex queries. This optimization class is designed to apply the most valuable query transformations and other query optimization techniques in an efficient manner.

Class 7
This class directs the optimizer to use a significant amount of optimization to generate an access plan. It is the same as query optimization class 5 except that it does not reduce the amount of query optimization for complex dynamic SQL queries.


Class 9
This class directs the optimizer to use all available optimization techniques, including:
- All available statistics
- All query rewrite rules
- All possibilities for join enumeration, including Cartesian products and unlimited composite inners
- All access methods
This class can greatly expand the number of possible access plans that are considered by the optimizer. You might use this class to determine whether more comprehensive optimization would generate a better access plan for very complex and very long-running queries that use large tables. Use Explain and performance measurements to verify that a better plan has actually been found.
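For reference, the optimization class can be set dynamically through the CURRENT QUERY OPTIMIZATION special register, or at bind time with the QUERYOPT bind option. A brief sketch follows; the class values and the bind file name are purely illustrative:

-- Dynamic SQL: affects statements compiled after this point
SET CURRENT QUERY OPTIMIZATION = 2;

-- Static SQL: set at bind time from the CLP, for example:
-- BIND myapp.bnd QUERYOPT 5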

6.13 Performance enhancements in DB2 UDB V8.1

A number of important performance enhancements were made to DB2 UDB with Version 8.1 and its fix packs. These enhancements include:
- Distributed catalog cache
- Prefetch
- Page cleaner
- Multithreading
- Join variations
- Increased use of bit-filters
- Informational constraints
- Uniform page size

6.13.1 Distributed catalog cache
DB2 UDB V8.1 gives the DBA the ability to distribute the catalog cache across all the database partitions that make up a partitioned database instance. The distributed catalog cache increases the performance of applications that connect to a non-catalog partition. All types of workloads can benefit from this feature, because there are now fewer constraints on choosing a coordinator partition for applications without sacrificing performance.

Furthermore, more information is now stored in the catalog cache, including trigger and check constraint information.

6.13.2 Prefetch

With DB2 UDB V8.1, when a block-based buffer pool is available to DB2 UDB, the prefetching code recognizes this and uses block I/O to read multiple pages into the buffer pool in a single I/O operation, which significantly improves the performance of prefetching.

The BLOCKSIZE parameter of the CREATE BUFFERPOOL and ALTER BUFFERPOOL statements defines the size of the blocks, and thus the number of pages read from disk in a single block I/O.
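As an illustration, a block area is reserved when the buffer pool is created or altered; the buffer pool name and sizes below are hypothetical:

-- Reserve 24000 of the 100000 pages as a block area,
-- with 32 contiguous pages per block
CREATE BUFFERPOOL bp4k_block
   SIZE 100000 PAGESIZE 4 K
   NUMBLOCKPAGES 24000 BLOCKSIZE 32;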

6.13.3 Page cleaner I/O improvements
DB2 UDB has long employed page cleaners that write dirty pages in the buffer pool back to disk. DB2 UDB V8.1.2 enhances the performance of these page cleaners by making them aware of the underlying complexities of the I/O subsystems (both hardware and software); the page cleaner algorithms now take these I/O subsystems into account. No configuration parameters are needed to use this information about the I/O primitives and how they are best applied to the work of DB2 UDB.

6.13.4 Multi-threading of Java-based routines
In DB2 UDB V7.2, Java routines were run in fenced mode by most customers and in vendor-supplied software applications. This process-based model is expensive, both in startup cost and in ongoing memory and operating system overhead.

With DB2 UDB V8.1.2, a variety of routines (stored procedures, user-defined functions, and methods) are now implemented using a thread-based model that results in a dramatic performance increase for database servers running these routines. This new approach allows resource sharing of the common Java Virtual Machine (JVM™) for Java routines and reduces the amount of context switching in general for users that run a large number of fenced-mode routines.

6.13.5 Join variations
DB2 UDB V8.1.2 introduces three new join variants to the methods discussed above. These variants are:
- Reverse outer join (or right outer join), which is an outer join that has the inner table as the row preserving side.
- Reverse early out, which is an inner join where a row from the inner table need only match the first matching row of the outer table.
- Anti join, which is an outer join that returns only rows from the row preserving side (either the inner or the outer table) that have no match to rows on the null producing side.

The reverse outer join in DB2 UDB V8.1.2 is only applicable to the hash join method, not to the nested loop join or the sort merge join. This emphasis on the hash join method follows a trend with XPS, where more complex techniques are needed for the approaches that are used for the largest queries.

The anti join, on the other hand, is applicable for all three join methods (HJ, NLJ, and SM). This variant results from an internal rewrite of a NOT EXISTS (or NOT IN) clause. This can be used, for instance, to rewrite the SELECT in Example 6-19 to the SELECT in Example 6-20.

Example 6-19 SELECT NOT EXISTS
SELECT … FROM t1 WHERE NOT EXISTS (SELECT … FROM t2 WHERE t2.c2 = t1.c1)

Example 6-20 SELECT rewrite
SELECT … FROM t1 ANTI JOIN t2 ON t2.c2 = t1.c1

The anti-join concept is best understood in terms of a business query such as, “Which of our customers have never placed an order?”
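A minimal sketch of that business query follows, assuming the customer and orders tables of the sample schema used elsewhere in this book; DB2 can internally apply the anti join variant to the NOT EXISTS form:

-- Customers that have never placed an order
SELECT c.customer_num, c.lname
FROM customer c
WHERE NOT EXISTS
   (SELECT 1
    FROM orders o
    WHERE o.customer_num = c.customer_num);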

6.13.6 Increased opportunity for selection of bit-filters
DB2 UDB V8.1.2 includes a performance improvement for hash joins by selecting bit-filters more often in situations where they help. This feature also improves the performance of large-scale hash joins through the addition of a new 32-bit hash code.

6.13.7 Informational constraints
DB2 UDB V8.1 introduced a new type of constraint called an informational constraint. Informational constraints are data model information that is captured in the system catalog and hence can be used to inform the optimizer. Generally, informational constraints are enforced by the ETL (extract, transform, and load) routines. They are not enforced or validated by DB2.

Because no constraint checking is performed by DB2 at run time, the performance of INSERT, UPDATE, DELETE, and LOAD operations is improved.
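As an illustration, an informational constraint is declared with the NOT ENFORCED and ENABLE QUERY OPTIMIZATION attributes; the table, column, and constraint names here are hypothetical:

-- Declared for the optimizer only; DB2 does not check it at run time
ALTER TABLE orders
   ADD CONSTRAINT fk_ord_cust
   FOREIGN KEY (customer_num) REFERENCES customer (customer_num)
   NOT ENFORCED
   ENABLE QUERY OPTIMIZATION;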

6.13.8 Uniform page size

DB2 UDB V8.2.2 includes a new capability that allows the DBA to use a single page size for their entire database.

The simplest example illustrating this can be seen in the following statement:
CREATE DATABASE eight PAGESIZE 8 K

In this database, all catalog tables are created with the larger page size (8192 bytes). Even the default buffer pool, IBMDEFAULTBP, uses an 8 KB page size instead of the customary 4 KB page size, which previously was the only size available.




Chapter 7. SQL considerations

Although both XPS and DB2 support the SQL standards, each also supports extensions to those standards, and those extensions are effectively differences between the products. There are both overt and subtle ramifications to these SQL differences. This chapter discusses some SQL considerations that involve these differences.

In this chapter, where we refer to XPS, the same considerations apply to the Informix Dynamic Server (IDS) unless explicitly noted.

7.1 SELECT issues

The SELECT statement is one of the most widely used and most flexible SQL statements. There are many aspects to the SELECT statement, and some of these differ between XPS and DB2.

7.1.1 Selectivity
This section discusses the capabilities that are available to enhance the selectivity of the SELECT statement.

SELECT FIRST
Both XPS and DB2 allow you to specify that only the first n rows are to be returned by a SELECT statement. However, XPS and DB2 use different placement and phraseology to specify this, as shown in the following:
(XPS) SELECT FIRST n ...

(DB2) SELECT ... FETCH FIRST n ROWS ONLY

Arbitrary rows are returned unless ORDER BY is also specified.

SELECT LAST can be achieved by reversing the direction of the ORDER BY clause.
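For example, to retrieve the ten most recent orders (assuming an order_date column on the orders table), the two phrasings are:

(XPS) SELECT FIRST 10 * FROM orders ORDER BY order_date DESC

(DB2) SELECT * FROM orders ORDER BY order_date DESC FETCH FIRST 10 ROWS ONLY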

DISTINCT and UNIQUE
Both XPS and DB2 support the ANSI/ISO SQL keyword DISTINCT in the SELECT clause. XPS, however, also supports the extended keyword UNIQUE as an alternate keyword for DISTINCT. DB2 does not support this keyword.

ROWID
ROWID is a row addressing mechanism used internally by XPS. Although an internal attribute, ROWID is exposed to XPS users for use in applications.

DB2 uses its own addressing mechanism and therefore does not support ROWID specifically. You can, however, simulate the ROWID functionality in DB2. For each table requiring ROWID, add an integer column named ROWID to that table and assign the column the IDENTITY attribute, which generates unique values. See 5.7.2, “SERIAL and SERIAL8” on page 150.
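A minimal sketch of this technique follows; the table and column definitions are hypothetical, and the GENERATED ALWAYS clause makes DB2 supply the values:

CREATE TABLE call_log (
   rowid      INTEGER GENERATED ALWAYS AS IDENTITY
                 (START WITH 1, INCREMENT BY 1),
   call_descr VARCHAR(240)
);
-- DB2 generates the rowid value automatically
INSERT INTO call_log (call_descr) VALUES ('First call');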

Substring functionality
Both XPS and DB2 support the SUBSTR (col, start, length) function. XPS also supports a column substring notation, col[start,end], which DB2 does not support. If you use the column substring notation, you need to change it to an equivalent SUBSTR function.

The following substring notations are equivalent (if you first compute length = end - start + 1):
(XPS column substring notation) col[start,end]

(XPS and DB2) SUBSTR(col,start,length)

IDS has a related SUBSTRING function, but it has a different syntax and different functionality for end-of-string and wrap behavior, as follows:
(IDS only) SUBSTRING(col FROM start [FOR length])

Stored procedure in SELECT
XPS supports the calling of a stored procedure from within a select list, as follows:
(XPS) SELECT my_stored_procedure(col1) FROM tab1

This type of stored procedure is really better designated as a user-defined function. In this case, the designation as a stored procedure rather than a user-defined function is historical, and it was superseded in IDS anyway.

DB2 does not support the calling of a stored procedure from within a select list, because it has traditionally defined stored procedures as not being usable except through the CALL keyword. The difference is more terminological than inherent. However, DB2 does support the calling of user-defined functions.

When transitioning from XPS to DB2, there are several options:
1. DB2 user-defined functions can be written entirely in SQL. If you have an XPS stored procedure that simply executes some SQL, then just CREATE it as a DB2 user-defined function rather than as a stored procedure.
2. Convert your XPS stored procedure to a DB2 stored procedure under a different name. Create a DB2 user-defined function with the same name as your XPS stored procedure, and have this user-defined function call the stored procedure. The SELECT then calls the user-defined function, which in turn calls the stored procedure.
3. Rewrite the stored procedure logic as a DB2 user-defined function in C or Java (or C# or Visual Basic in a .NET environment).

In all of these cases, the syntax issues are non-trivial. The classification as a user-defined function or a stored procedure is historical rather than actual.
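As a sketch of option 1, the following hypothetical function wraps a single SQL expression so that it can be called from a select list (the @ character is used as the statement terminator, as in Example 7-1 later in this chapter):

CREATE FUNCTION full_name (fname VARCHAR(15), lname VARCHAR(15))
RETURNS VARCHAR(32)
LANGUAGE SQL
CONTAINS SQL
RETURN lname || ', ' || fname@

SELECT full_name(fname, lname) FROM customer@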

7.1.2 Statistical sampling
To run business intelligence or OLAP queries, you might need to execute advanced analytical, mining, and statistical algorithms, or to interactively explore the vast amounts of data in a modern data warehouse. To speed up the calculations, you can use statistical sampling, which gives results that are often sufficiently accurate in a reasonable amount of time. Rather than reading each page and row, a sample of rows is chosen to provide the basis for computations. Generally, sampling is best applied to aggregates, such as AVG, rather than to enumerating individual rows.

Both XPS and DB2 (starting with V8.1.2) provide statistical sampling of data. IDS does not provide statistical sampling.

XPS and DB2 statistical sampling differ in both syntax and approach, as follows:
(XPS) SELECT AVG(invoice_amount) FROM 1000 SAMPLES OF orders

(DB2) SELECT AVG(invoice_amount) FROM orders TABLESAMPLE SYSTEM (2) REPEATABLE (12345)

In the XPS syntax, the number of rows that are required or analyzed is specified (in this example, 1000). The user needs to determine how many rows might provide statistically relevant results.

DB2 uses the ISO SQL-200n Change Proposal that has been nominally approved by the ISO Database Languages Working Group. Although this proposed ISO syntax permits a sampling clause to be associated with any table reference, DB2 currently restricts sampling to actual tables and does not allow it on logical views, nicknames (in federated databases), table expressions, table functions, or similar. Thus, only tables, materialized query tables (MQT/AST), and global temporary tables can be sampled.

DB2 sampling syntax and semantics
In DB2, any stored table appearing in the FROM clause of a SELECT query can be qualified with a sampling clause. The basic syntax of a sampling clause is as follows (illustrated here for a single table):
SELECT select_list FROM table_name TABLESAMPLE sampling_method (p) [ REPEATABLE (s) ]

Here, TABLESAMPLE is a new SQL keyword, introduced in V8.1.2, that tells DB2 to process only a sample of the rows of table_name rather than all of the rows. In addition, sampling_method specifies which of two methods (BERNOULLI or SYSTEM) should be used to sample the rows, and p specifies the target percentage of rows to retrieve (between 0 and 100).

When a given percentage p is requested, the rows processed or returned approximate the percentage requested and can differ from run to run. Thus, requesting 3% from the CUSTOMER table in the standard DB2 sample database (with 36 rows) might typically return between two and five rows. The number of rows selected from large tables will generally better approximate the percentage requested.

The optional REPEATABLE clause is useful for debugging sampling queries. The repeat value s provides a seed to the random number generator used for the selection. Thus, when the same repeat value is provided for the same data, the same results are returned.

DB2 supports two sampling methods, both of which are currently specified in the ISO proposal:
- BERNOULLI sampling (with sampling percentage p): Bernoulli sampling simulates “flipping a weighted coin” for each row individually, including that row in the sample with probability p/100 and excluding it with probability 1-p/100, independently of the other rows. For this reason, the BERNOULLI sampling method is sometimes called row-level Bernoulli sampling. Although on average the sample contains p percent of the rows in the table, the actual sample size is random and thus might differ on subsequent executions of the query. When no index is available, BERNOULLI sampling retrieves each row in the table, so there is no I/O savings, and sampling performance can be poor. When there is an index on one or more of the columns of the table, DB2 performs the Bernoulli “coin flips” on the RIDs in the index leaf pages, thereby retrieving only those pages that contain at least one sampled row. However, even under the most efficient possible implementation of BERNOULLI sampling, a large fraction of the pages must be retrieved unless the sampling fraction is extremely low.

- SYSTEM sampling: System sampling permits the query optimizer to determine the most efficient manner in which to perform the sampling. In most cases, SYSTEM sampling applied to a table means that each page of the table is included in the sample with probability p/100 and excluded with probability 1-p/100. For each page that is included, all rows on that page qualify for the sample. This sampling method is called page-level Bernoulli sampling. SYSTEM sampling generally executes much faster than BERNOULLI sampling, because fewer data pages need to be retrieved. SYSTEM sampling can, however, yield less accurate estimates for aggregate functions (for example, SUM(SALES)), especially if there are many rows on each page or if the rows of the table are clustered on any columns referenced in the query. The optimizer might, in certain circumstances, decide that it is more efficient to perform SYSTEM sampling as though it were BERNOULLI sampling, for example, when a predicate can be applied by an index and is much more selective than the sampling rate p.

Semantically, sampling of a table occurs before any other query processing, such as applying predicates or performing joins. You can envision that the original tables referenced in a query are initially replaced by smaller temporary tables that contain the sampled rows, and that normal query processing then commences on the reduced tables. (For performance reasons, actual processing does not occur in exactly this manner.)

You can find further details on statistical sampling in DB2 UDB in the following article: http://www.almaden.ibm.com/cs/people/peterh/idugjbig.pdf

The examples of statistical sampling in this article show that the results from a 1% sample of a large table are almost indistinguishable from those for the entire table, while the required processing time is reduced by over two orders of magnitude. The article also demonstrates the use of DB2 linear regression functions to fit a line to a set of data points, using both full data and statistical sampling.

In general, sampling techniques are ideally suited to discovering general trends and patterns in data.

Important: The results of a sampled query contain a certain amount of deviation from a complete scan of all rows. You can reduce the expected error to an acceptable level by increasing the proportion of sampled rows to actual rows. When you use sampling with tables that are joined, the expected error increases dramatically. You must use larger samples to retain an acceptable level of accuracy.

Other differences between XPS sampling and DB2 sampling
XPS sampling is sampling with replacement (that is, the same row can be returned or used multiple times). Because the user requests a specific number of rows, the precise number is returned or used.

DB2 sampling is sampling without replacement (that is, the same row is not used twice in any one run). The number of rows returned or used, however, is specified only as a percentage: although on average the sample contains p percent of the rows in the table, the actual sample size is random and thus can differ on subsequent executions of the query.

Tip: With XPS, you must run at least UPDATE STATISTICS LOW before you run the query with the SAMPLES OF option. If you have never run UPDATE STATISTICS, the SAMPLES OF clause is ignored, and all data values are returned. Generally, for the best results, run UPDATE STATISTICS MEDIUM at least once before ever using the SAMPLES OF option.

There is apparently no similar requirement to run RUNSTATS before using DB2 sampling, although keeping statistics current might produce a better access plan for the sampling query.

7.1.3 SELECT cursors
This section discusses the options that are available with cursors on SELECT statements.

SELECT ... FOR UPDATE
In XPS, for log mode ANSI databases, all cursors are update cursors by default. For logged, non-mode ANSI databases, cursors must be explicitly declared FOR UPDATE. DB2 also requires the FOR UPDATE phraseology.

In DB2, for cursors that are not used for updates, you might want to consider adding DB2's FOR READ ONLY phraseology to the cursor syntax. This clarifies the usage of the cursor and allows DB2 to avoid pessimistic locking.
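For example, in embedded SQL (the column names here are illustrative):

-- Read-only cursor: DB2 can avoid update locks
DECLARE c1 CURSOR FOR
   SELECT lname, phone FROM customer
   FOR READ ONLY;

-- Update cursor: must be declared FOR UPDATE in DB2
DECLARE c2 CURSOR FOR
   SELECT lname, phone FROM customer
   FOR UPDATE OF phone;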

Cursors WITH HOLD
In both XPS and DB2, by default, open cursors are closed upon the termination of a transaction. Also, both XPS and DB2 allow cursors to be declared WITH HOLD to override this behavior.

In XPS, declaring a cursor WITH HOLD holds the cursor open through any transaction end, both a COMMIT and a ROLLBACK. In DB2, WITH HOLD holds a cursor open only through a COMMIT.

Tip: DB2 cursors do not operate exactly the same as XPS cursors, and they provide many options. It is highly recommended that you read the DB2 documentation regarding SELECT statements and cursors. In addition, you should note that DB2 closes cursors whenever a ROLLBACK occurs, regardless of the WITH HOLD status.

7.1.4 Joins
The ANSI/ISO SQL standards support the following types of join:
- Inner join (the keyword INNER is typically optional), which returns rows with matching values from both joined tables and excludes all other rows.
- Cross join (the key phrase CROSS JOIN, although in older syntax it is omitted), which returns all possible combinations of tuples from the joined tables, often known as a Cartesian join.
- Outer join (with subtypes LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL [OUTER] JOIN, and UNION JOIN), which returns the rows of the inner join plus other rows from the joined tables, depending on the type of outer join that is used.

The old syntax (SQL-92) uses the WHERE clause and a relational expression, while the new syntax (SQL-99 and later) uses the ON clause. The old syntax was never fully standardized among database vendors, and thus IDS/XPS, Oracle, and MS SQL Server have slightly differing syntaxes. The various RDBMSs also have notable differences in their handling of NULL values in the join columns.

DB2 does not use any of the old syntax forms for outer joins, because it has used SQL-99-compliant syntax from the very beginning. In fact, the DB2 usage formed the basis for the SQL standard.

Thus, when transitioning from XPS to anything other than IDS, the problem is one of converting the old syntax format to the SQL standards format for join statements. The old syntax formats of XPS and IDS are idiosyncratic.

The differences between the various approaches to the SQL join statement are highlighted in Chapter 9, “Multi-table Queries,” of the following:

Alex Kriegel and Boris M. Trukhnov, SQL Bible (Indianapolis, IN: Wiley, 2003). [This book provides IBM DB2 UDB Personal Edition V8.1 on its enclosed CD.]

Outer join
XPS supports the same ANSI/ISO outer join syntax that DB2 supports. XPS also has an original, simple OUTER phraseology, which is the equivalent of a LEFT OUTER JOIN. DB2 does not support this stand-alone OUTER phraseology, so it must be converted to the standard ANSI syntax that DB2 supports, as shown in the following SELECT statements:
(XPS) SELECT c.customer_num, c.lname, c.company, c.phone, u.call_dtime, u.call_descr
      FROM customer c, OUTER cust_calls u
      WHERE c.customer_num = u.customer_num

(DB2) SELECT c.customer_num, c.lname, c.company, c.phone, u.call_dtime, u.call_descr
      FROM customer AS c LEFT OUTER JOIN cust_calls AS u
      ON c.customer_num = u.customer_num;

Restriction: XPS does not allow you to use an external table as:
- The outer table in an outer join
- The table in a self-join

GROUP BY
XPS supports a GROUP BY clause with either the column name or an ordinal number corresponding to the column position in the SELECT list. Currently, DB2 supports only the column name, not an ordinal number.
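Thus an XPS query that uses an ordinal must be rewritten with the column name for DB2. For example (using the orders table from earlier examples; the grouping column is illustrative):

(XPS) SELECT customer_num, COUNT(*) FROM orders GROUP BY 1

(DB2) SELECT customer_num, COUNT(*) FROM orders GROUP BY customer_num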

7.2 MATCHES predicate

For wildcard matching in SELECT WHERE clauses, XPS supports both the ANSI standard LIKE syntax and a MATCHES syntax, as shown in Table 7-1. DB2 supports only the ANSI standard LIKE syntax.

Table 7-1 MATCHES versus LIKE syntax

           Any one character   Any number of characters   Escape character
                               (including none)
MATCHES    ?                   *                          \
LIKE       _                   %                          \

The following are statement examples that use the MATCHES predicate, with their LIKE equivalents:
(XPS)
SELECT * FROM books WHERE title MATCHES "Atlas Shrug*"
UPDATE cars SET name = "Corvette" WHERE description MATCHES "C?"
DELETE FROM sale_info WHERE discount MATCHES '*5%*'

(XPS and DB2)
SELECT * FROM books WHERE title LIKE 'Atlas Shrug%'
UPDATE cars SET name = 'Corvette' WHERE description LIKE 'C_'
DELETE FROM sale_info WHERE discount LIKE '%5\%%' ESCAPE '\'

7.3 Comments

Support for comments includes the following:
- Both XPS and DB2 support double dashes (--) for line-by-line comments in SQL.
- Both XPS and DB2 support C-style /* ... */ delimiters around multi-line comments in embedded C/C++ applications.
- XPS also supports curly braces ({ ... }) for multi-line comments in SQL. DB2 supports C-style comments in DB2 SQL PL procedures.

7.4 SQLCODE and SQLSTATE

This section discusses the considerations for the SQLCODE and SQLSTATE variables.

SQLCODE and no rows found
In XPS log mode ANSI databases and in DB2, the engine sets SQLCODE to 100 (no rows found) for a SELECT that finds no rows to return, a DELETE that finds no rows to delete, an UPDATE that finds no rows to update, and an INSERT INTO ... SELECT FROM that finds no rows to insert.

In XPS logged, non-mode ANSI databases and in unlogged databases, SQLCODE is set to 100 (no rows found) only for a SELECT that finds no rows to return. For DELETE, UPDATE, and INSERT statements that execute correctly but do not find rows to operate upon, SQLCODE is set to zero (0).

Error and exception handling is very important to applications. It is possible that your application checks for no rows found conditions on SELECT statements only and not the other DML statements, simply because the condition has no impact on the application logic. If this is the case, a change in no rows found behavior will have minimal impact.

SQLSTATE
The SQLSTATE variable consists of two parts:
1. A two-character error class that identifies the general classification of the error
2. A three-character error subclass that identifies a specific type of error within a general error class

XPS and DB2 use many common SQLSTATE codes, but some codes can be specific to XPS or to DB2. If your application has error and exception handling, you will want to check for any proprietary error codes that the application examines.

The ANSI standard method for error handling is the GET DIAGNOSTICS statement.
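A brief sketch of its use in a DB2 SQL PL procedure follows; the procedure and variable names are hypothetical:

CREATE PROCEDURE clear_phone (IN p_cust INTEGER)
LANGUAGE SQL
BEGIN
   DECLARE v_rows INTEGER DEFAULT 0;
   UPDATE customer SET phone = NULL
      WHERE customer_num = p_cust;
   -- Number of rows affected by the preceding statement
   GET DIAGNOSTICS v_rows = ROW_COUNT;
END@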

7.5 Built-in functions

Table 7-2 on page 220 depicts both XPS and DB2 built-in functions. Remember that where an exact DB2 built-in equivalent does not exist for a specific XPS function, you can always create your own user-defined function. These functions can be written from scratch, or they can use similar existing DB2 functions, along with customization, to achieve the desired result. MTK converts any XPS construct that is also included in the list of IDS features that MTK converts.

Table 7-2 Function mapping

XPS function    DB2 equivalent                       Comments
ABS             ABS
AVG             AVG
CARDINALITY     -
CHAR_LENGTH     LENGTH                               Note 1
CASE            CASE                                 Note 5
COUNT           COUNT
CURRENT         CURRENT TIMESTAMP or CURRENT DATE
DATE            DATE
DAY             DAY
DBSERVERNAME    CURRENT SERVER
DECODE          CASE                                 Note 5
EXTEND          -                                    Note 4
HEX             HEX
INITCAP         -
LENGTH          -                                    Note 1
LOWER           LOWER
LPAD            -
MAX             MAX
MDY             -
MIN             MIN
MOD             MOD
MONTH           MONTH
NVL             COALESCE or VALUE
ROUND           ROUND
RPAD            -
STDEV           STDDEV
SUBSTR          SUBSTR
SUM             SUM
TODAY           CURRENT DATE
TO_CHAR         TO_CHAR                              Note 4
TO_DATE         TO_DATE                              Note 4
TRIM            LTRIM and RTRIM                      Note 2
TRUNC           TRUNC
UPPER           UPPER
USER            CURRENT USER
VARIANCE        VARIANCE
WEEKDAY         DAYOFWEEK(date) - 1                  Note 3
YEAR            YEAR

(A dash indicates that no direct DB2 equivalent exists; see the notes and the user-defined function discussion above.)

Notes:
1. In XPS, LENGTH returns the number of bytes in a column excluding trailing spaces (except for BLOB and TEXT, where it does include trailing spaces). For single-byte character sets, such as English, this equates to the number of characters in a column, because each character is one byte. CHAR_LENGTH, on the other hand, returns the number of logical characters in a column excluding trailing spaces. For double-byte character sets, this might differ from the number of bytes. For single-byte character sets, CHAR_LENGTH is still the same as LENGTH. The DB2 LENGTH function, by definition, behaves like the XPS CHAR_LENGTH function, not the XPS LENGTH function. When using single-byte character sets, because one character equals one byte, you can still use the DB2 LENGTH function, because it returns the same value as the XPS LENGTH function.
2. XPS uses a single TRIM function to trim leading or trailing spaces, controlled by a parameter to the TRIM function. DB2, however, uses the LTRIM function to trim leading spaces on the left and the RTRIM function to trim trailing spaces on the right. Here are the equivalent conversions:

(XPS) TRIM(LEADING FROM col1) TRIM(TRAILING FROM col1)

(DB2) LTRIM(col1) RTRIM(col1)

If your XPS application uses TRIM extensively, you might consider creating a DB2 user-defined function named TRIM that emulates the functionality of the XPS TRIM function.
3. The closest equivalent of the XPS WEEKDAY function is DB2 DAYOFWEEK.

Tip: Both XPS and DB2 count the days of the week starting on Sunday. XPS counts Sunday as zero (0), while DB2 counts Sunday as one (1).

Be aware of the XPS and DB2 day-of-week numbering difference. If your application uses SQL exclusively to manipulate dates, this might not be an issue for you, because all DB2 SQL functions work consistently on the same DB2 day-of-week numbering scheme. If your application uses custom logic that refers specifically to the day-of-week number, you can do one of the following:
– Apply the minus-one factor to the DAYOFWEEK function call to match the results to your existing application logic.
– Change your application logic, adding 1, to reflect the values that DB2 DAYOFWEEK returns.
– Create a simple DB2 user-defined function and use it as you have been using the XPS WEEKDAY function. This is recommended if you use WEEKDAY extensively (Example 7-1).

Example 7-1 Weekday user-defined function
CREATE FUNCTION weekday (d DATE)
RETURNS INT
LANGUAGE SQL
CONTAINS SQL
BEGIN ATOMIC
   RETURN dayofweek(d) - 1;
END@

Note: In DB2, user-defined functions written in SQL are optimized along with the invoking SQL. As such, they are not invoked through the normal UDF function call interface and, as a result, they are very efficient.

4. TO_DATE and TO_CHAR: Both XPS and DB2 provide TO_DATE and TO_CHAR functions for converting between character strings and date data types. The TO_CHAR function converts a date value into a character string representation, and the TO_DATE function converts a character string representation of a date into a date value. Both functions in both databases use a format-string parameter as a template for performing the conversion. There is no commonality between the two format strings, however. XPS accepts format parameters preceded by the percent sign (%), which are position dependent:

(XPS) SELECT TO_CHAR(begin_date, '%A %B %d, %Y %R') FROM tab1

DB2 accepts format strings such as:

(DB2) INSERT INTO in_tray (received) VALUES(TO_DATE('1999-12-31 23:59:59','YYYY-MM-DD HH24:MI:SS'));

Refer to the DB2 Version 8 SQL Reference for more documentation regarding the TO_DATE and TO_CHAR format parameters. Note: To ease transition efforts, DB2 user-defined functions can be used in place of the built-in TO_DATE and TO_CHAR.
5. DECODE: XPS supports both the DECODE function and the ANSI standard CASE expression. DB2 does not support the DECODE function but does support the CASE expression, which can easily be substituted for DECODE.
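As an illustration of note 5, a DECODE call translates mechanically into a CASE expression; the column and code values here are hypothetical:

(XPS) SELECT DECODE(status, 'P', 'Pending', 'S', 'Shipped', 'Other')
      FROM orders

(DB2) SELECT CASE status
               WHEN 'P' THEN 'Pending'
               WHEN 'S' THEN 'Shipped'
               ELSE 'Other'
             END
      FROM orders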

7.6 SQL access to system catalogs

Both XPS and DB2 allow SQL access to the system catalog.

Aside from the actual structural differences in the catalogs themselves, it should be noted that DB2 stores information differently in the catalogs than XPS does. DB2 folds the names of objects stored in the catalog to upper case; XPS does not. It is important to remember this when using SELECT from the catalog with a WHERE clause. As always, SQL phraseology itself is case insensitive for the naming of columns. Data (that is, the content of the column itself), however, is case sensitive.

In XPS, if you create a table called tab1 with CREATE TABLE tab1, it is stored in the system catalogs as tab1 (in lower case). If you want to query the catalog for tab1, you would say WHERE tabname = "tab1". The quotation marks can be either single (standard ANSI SQL) or double.

In DB2, even if you create tab1 with CREATE TABLE tab1, the value tab1 is stored in the catalog as TAB1 (in upper case). To query the catalog for information about this table, you would need to say WHERE name = 'TAB1' (note the upper case and the single quotation marks).

7.7 Quotations and character strings

XPS supports the usage of both single and double quotation marks around character strings, as shown here:
(XPS)
SELECT * FROM state WHERE name = "Texas"
SELECT * FROM books WHERE author = 'Carlton Doe'
SELECT * FROM games WHERE name = "Fishin' Fever"

DB2 supports only single quotation marks, because that is the ANSI standard. Care must be taken for cases where the character strings themselves contain single quotation marks. For these, you must double up the single quotation marks, as shown here:
(DB2)
SELECT * FROM state WHERE name = 'Texas'
SELECT * FROM books WHERE author = 'Carlton Doe'
SELECT * FROM games WHERE name = 'Fishin'' Fever'

The ANSI/ISO standard specifies that quoted strings be enclosed in single quotation marks. The XPS extension that allows double quotation marks provides no useful additional functionality and should generally be discontinued in favor of the SQL standard approach.

7.8 Concatenation behavior

Both XPS and DB2 support CONCAT and two vertical bars (||) as synonyms for the concatenation operator. Be aware of result data type and length differences between XPS and DB2. For DB2, refer to the DB2 SQL Reference, SC09-2974 and SC09-2975. In that book, in the “[Expressions ...] With the Concatenation Operator” chapter, there is an extensive table that shows the result data type and length for the various combinations of operands. For example, concatenating a CHAR(m) and a CHAR(n) produces a VARCHAR(m+n) in DB2 if (m+n) > 254, while XPS produces a CHAR(m+n) for (m+n) up to 2000.

Another difference in CONCAT behavior occurs when NULLs are involved. With DB2, if either operand is NULL, the result is NULL. With XPS, a NULL is treated as a zero-length string, so concatenation with a NULL does not produce a NULL result. If your application implements string concatenation, you might need to investigate the possibility of the occurrence of NULL strings and their potential impact on concatenation output. You might need to use the COALESCE or VALUE functions to override NULL values, as shown in Example 7-2.

Example 7-2 NULL concatenation
CREATE TABLE t(c1 VARCHAR(5));
INSERT INTO t VALUES ('zyx');
INSERT INTO t VALUES (null);
SELECT 'abc' || 'def' || c1 FROM t;

In XPS, the result of the above SELECT is:
abcdefzyx
abcdef

In DB2, the result is:
abcdefzyx
(null result)

To have the DB2 result match that of XPS, change the SELECT to use either the COALESCE or the VALUE expression, as shown in Example 7-3, where NULLs are changed to empty strings.

Example 7-3 DB2 COALESCE and VALUE functions
SELECT 'abc' || 'def' || COALESCE(c1,'') FROM t;
SELECT 'abc' || 'def' || VALUE(c1,'') FROM t;

In the output of SELECT statements in both XPS and DB2, by default the NULL values of a column appear last when that column is ordered in ascending sequence and first when it is ordered in descending sequence. The XPS SELECT statement allows NULLS FIRST or NULLS LAST to be specified as a way to override the default. DB2 does not provide this syntax, but a way to obtain a similar result is to use COALESCE() to convert NULL to the empty string, which sorts at the opposite end of the collating sequence from NULL, as shown in Example 7-4 on page 226.

Example 7-4 Using COALESCE to control NULL sort order in DB2
(a) SELECT lname FROM t ORDER BY lname ASC;                        -- nulls last
(b) SELECT COALESCE(lname,'') FROM t ORDER BY 1 ASC;               -- nulls first
(c) SELECT COALESCE(lname,'') AS lname FROM t ORDER BY lname ASC;  -- nulls first
(d) SELECT lname FROM t ORDER BY lname DESC;                       -- nulls first
(e) SELECT COALESCE(lname,'') FROM t ORDER BY 1 DESC;              -- nulls last
(f) SELECT COALESCE(lname,'') AS lname FROM t ORDER BY lname DESC; -- nulls last

With the approach in Example 7-4, you have to be careful, because the application actually sees empty strings returned, not NULLs. If that must be avoided, you can keep the original column in the select list and order on the coalesced form of it, as shown in Example 7-5.

Example 7-5 Altered COALESCE
(g) SELECT lname, COALESCE(lname,'') AS lname_no_nulls
    FROM t ORDER BY lname_no_nulls DESC; -- nulls last

7.9 Implicit casting

XPS supports implicit casting where possible. For example, you can concatenate a string with a number, because XPS implicitly casts the number into a character string prior to the concatenation:
(XPS) SELECT cust_num || lastname FROM customer

Other types of implicit casting take place in XPS as well, such as casting dates to integers for performing arithmetic on DATEs, and string manipulations on numbers.

DB2 does not support implicit casting between data types; it requires explicit casting. Below is an example of an implicit cast in XPS and an explicit cast in DB2. The result of both SELECT statements is the string '123'.
CREATE TABLE t (c1 INTEGER);
INSERT INTO t VALUES (12345);

(XPS) SELECT SUBSTR(c1,1,3) FROM t;

(DB2) SELECT SUBSTR(CHAR(c1),1,3) FROM t;

Note: DB2 performs implicit promotion between data types of the same family. For example, when joining a SMALLINT to a BIGINT, DB2 implicitly promotes the SMALLINT to a BIGINT before the join.

7.10 Deferred constraint checking

In XPS, referential integrity checking can be deferred to the end of a transaction with the SET CONSTRAINTS DEFERRED statement. With constraints deferred, XPS allows constraints to remain unsatisfied during the course of a transaction. However, the constraints are checked and enforced at the end of the transaction.

DB2 does not support transactional deferred constraint checking or the creation of the violations tables used with constraint checking. You can temporarily suspend and reactivate constraint checking for a table with the SET INTEGRITY statement. Doing so turns off constraint checking for the entire table, not just for the rows affected by a particular transaction, and puts the table in a limited-access check pending state.
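A minimal sketch of the DB2 statement pair, with a hypothetical table name:

-- Suspend checking; the table is placed in check pending state
SET INTEGRITY FOR part_price OFF;

-- ... perform the bulk changes ...

-- Re-check the whole table and take it out of check pending state
SET INTEGRITY FOR part_price IMMEDIATE CHECKED;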

7.11 Set Operators: UNION, INTERSECT, and MINUS

Set operators can be used to combine result sets in both XPS and DB2. The differences between the products are as follows (see the example after this list):
- DB2 supports the ALL option on each operator, allowing duplicates to be preserved. XPS allows ALL only on UNION.
- MINUS is not supported by DB2. The DB2 equivalent is EXCEPT (without the ALL option).
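For example, finding customers with no orders in each dialect (assuming the customer and orders tables used earlier):

(XPS) SELECT customer_num FROM customer
      MINUS
      SELECT customer_num FROM orders

(DB2) SELECT customer_num FROM customer
      EXCEPT
      SELECT customer_num FROM orders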

7.12 Multi-database access

Both XPS and DB2 support access to tables from individual databases within a single instance in a single SQL statement, such as a joined SELECT. This is a built-in capability of both XPS and DB2.

Tip: When using DB2, you must configure the database to use the built-in multi-database access feature. It is not available automatically.

7.13 Temporary tables

In XPS, there are two types of temporary tables, implicit and explicit.

7.13.1 Implicit
XPS customers often use temporary tables for complex SQL processing. Data is selected and placed in what is called an implicit temporary table. This data is then later selected out of the implicit temporary table and manipulated further. Selecting information in two steps can yield results that are more complex than is possible in one step. These temp tables are known as implicit because there is no explicit table creation statement; the structure of the temp table is implied by what is selected into it, as shown in Example 7-6.

Example 7-6 XPS implicit temp table
SELECT * FROM orders WHERE order_num < 525648 INTO TEMP y WITH NO LOG;
SELECT * FROM y GROUP BY 2 ORDER BY 1

DB2 does not support such implicit temp tables. However, DB2 allows SELECT from table expressions, which might negate the need for an implicit temp table. Table expressions are SELECT results that look similar to tables; you can SELECT from table expressions instead of from tables. Table expressions can be nested several levels deep and can be complex. Each can have its own column list, joins to other tables, and GROUP BY clauses. Example 7-7 depicts a simple table expression that is one level deep.

Example 7-7 DB2 SELECT with a simple table expression
SELECT * FROM (SELECT * FROM orders WHERE order_num < 525648) AS y

Example 7-8 depicts a more complex table expression. Note that each expression has its own set of clauses.

Example 7-8 DB2 SELECT with a more complex table expression
WITH paylevel AS
   ( SELECT empno, YEAR(hiredate) AS hireyear, edlevel,
            salary + bonus + comm AS total_pay
     FROM employee
     WHERE edlevel > 16 ),
paybyed (educ_level, year_of_hire, avg_total_pay) AS
   ( SELECT edlevel, hireyear, AVG(total_pay)
     FROM paylevel
     GROUP BY edlevel, hireyear )
SELECT empno, edlevel, year_of_hire, total_pay, avg_total_pay
FROM paylevel, paybyed
WHERE edlevel = educ_level
  AND hireyear = year_of_hire
  AND total_pay < avg_total_pay;

7.13.2 Explicit
DB2 does support the explicit creation of temporary tables. When defining an explicit temporary table, if you want the temporary table to look similar to a permanent table, you can use the LIKE phraseology, as follows:
DECLARE GLOBAL TEMPORARY TABLE SESSION.y LIKE orders;
INSERT INTO SESSION.y SELECT * FROM orders WHERE order_num < 525648

If you are selecting a subset of columns or a mixture of table columns into a temporary table, you can use the DEFINITION ONLY clause to avoid having to specify exact data types, as shown in Example 7-9. Also, in DB2, when a transaction commits, all rows are deleted from any temporary tables created during that transaction. This is not the behavior of XPS, and it might or might not affect your application. If you need to override the default behavior, use the ON COMMIT PRESERVE ROWS phrase.

Example 7-9 DB2 explicit temporary table
DECLARE GLOBAL TEMPORARY TABLE marys.temptab1
   AS (SELECT c1, c2, c3, c4 FROM tab1)
   DEFINITION ONLY
   ON COMMIT PRESERVE ROWS
   NOT LOGGED
   IN usertempsp1;

In XPS, if temporary tables are placed in a temporary dbspace, they are not logged automatically. You can, however, place temporary tables in any dbspace, and temporary tables placed in regular dbspaces might be logged.

In DB2, you can have system temporary table spaces and user temporary table spaces. Temporary tables must be placed in a user temporary table space, not a system temporary table space, and you can use the NOT LOGGED phrase to prevent logging against the temporary table.

Neither XPS (for CREATE TEMP TABLE) nor DB2 (for DECLARE GLOBAL TEMPORARY TABLE) places entries in the system catalog tables. Thus, in neither case is there any conflict between different applications, nor is any system catalog table access required.

7.14 Compound SQL

Both XPS and DB2 support compound SQL. SQL is compound when two or more SQL statements are bundled together and sent to the database instance as a unit.

XPS supports compound SQL in a prepared statement simply by placing semicolons between the statements. XPS does not guarantee that the statements complete in the order specified; longer-running statements might finish last in the compound group regardless of their placement within the group. Most likely, this is not significant to the application.

XPS compound SQL statements are atomic, meaning that they execute as a unit: if one statement in the compound group fails, the whole group fails. The group does not act as a formal independent transaction but is still governed by the application's transaction.

DB2 also supports compound SQL, but with a more verbose syntax, as shown in Example 7-10.

Example 7-10 DB2 compound SQL
EXEC SQL BEGIN COMPOUND ATOMIC STATIC
   INSERT INTO org (dept, deptname, location)
      VALUES (:dn, :dname, :dloc);
   UPDATE employee SET dept = :dn WHERE empnum > :empnum;
END COMPOUND;

DB2 supports atomic compound SQL, but also not atomic compound SQL. Not atomic means that the compound SQL does not act as a unit: the changes made by successful statements within the compound SQL statement remain effective even if some individual statements are not successful. As shown in Example 7-10, you must specify either ATOMIC or NOT ATOMIC.

7.15 INSERT cursors

XPS uses a concept of INSERT cursors to process batch INSERTs into a table.

DB2 supports batch INSERTs but does not support the XPS INSERT cursor method using PUT and FLUSH. In DB2, you accomplish batch inserts by simply specifying multiple values lists in the INSERT statement, as shown in Example 7-11.

Example 7-11 DB2 batch INSERT
INSERT INTO customer (fname, lname)
VALUES ('Uwe', 'Weber'), ('Donald', 'Duck'), ('P.', 'Frampton'), ('Minnie', 'Mouse');

7.16 MERGE INTO

In traditional OLTP systems, an application developer has to code separate UPDATE and INSERT statements when changes need to be made to an existing table. We can call the table that is to be updated the master table in this update process, and the secondary table the merge or transaction table. Combining update, insert, and even delete operations (merge operations, in effect) provides flexibility and allows the various updates to be performed in batch mode if conditions dictate.

This capability is often described as an UPSERT statement, because it allows you, in one step, to present data to the server and have the server decide whether an update or an insert is needed; even delete is available as an action.

The IBM SQL Language Council has described the UPSERT pattern in the following terms when discussing the need for the MERGE INTO statement: A frequently occurring database design is one that has a master table containing the existing or current knowledge of a domain (for example, a parts table or accounts table) and a transaction table (for example, a shipment or trades table) containing a set of changes to be applied to the master table. This transaction table can contain updates to objects existing in the master table or new objects that should be inserted into the master table. To apply the changes from the transaction table to the master table requires two separate operations, an update operation for those rows already existing in the master table and an insert operation for those rows that do not exist in the master table.

The MERGE INTO statement was introduced into XPS with V8.50 and into DB2 with V8.1.2. The MERGE INTO statement was made available in Oracle with Oracle 9i, but it is not currently available in IDS, nor in SQL Server.

Example 7-12 shows the high-level generic form of the MERGE INTO statement.

Here, table-name is any real table, updateable view, or, for DB2, a full-select statement that allows updates.

Example 7-12 General form of the MERGE INTO statement
MERGE INTO table-name AS correlation-name
USING table-reference
ON search-condition
WHEN matching-condition THEN modification-operation

The matching condition can be as simple as MATCHED or NOT MATCHED (or both in separate WHEN clauses, because multiple WHEN clauses are allowed). A relatively complex example is shown in Example 7-13, which demonstrates the power of this statement with DB2.

In Example 7-13, the EMPLOYEE table of the SAMPLE database is the master table that contains information about the employees. A number of updates have been generated by another system, and these have been made available in the EMP_UPDATES table. The updates now need to be processed into the data warehouse as either changes (updates) or new records (inserts).

Example 7-13 DB2 example of a MERGE INTO statement
MERGE INTO employee AS e
USING ( SELECT empno, firstnme, midinit, lastname, workdept, phoneno,
               hiredate, job, edlevel, sex, birthdate, salary, bonus, comm
        FROM emp_updates ) AS eu
ON e.empno = eu.empno
WHEN MATCHED THEN
   UPDATE SET (salary, bonus, comm) = (eu.salary, eu.bonus, eu.comm)
WHEN NOT MATCHED THEN
   INSERT (empno, firstnme, midinit, lastname, workdept, phoneno,
           hiredate, job, edlevel, sex, birthdate, salary, bonus, comm)
   VALUES (eu.empno, eu.firstnme, eu.midinit, eu.lastname, eu.workdept,
           eu.phoneno, eu.hiredate, eu.job, eu.edlevel, eu.sex,
           eu.birthdate, eu.salary, eu.bonus, eu.comm)

This example assigns a correlation name e to the master table (employee) into which the updates are being merged and a correlation name eu to the table from which the updates are processed (emp_updates). The correlation names are needed to ensure that there is no ambiguity between the column names of the two tables.

UPSERT in XPS prior to V8.50
The UPSERT issue in XPS prior to V8.50 is expressed in Figure 7-1. You can update an existing table using INSERT to add new rows or UPDATE to change existing rows.

Figure 7-1 The UPSERT issue in XPS prior to V8.50: how do you conditionally update the part_price table, so that new part prices are inserted and existing part prices are updated, using the information given in the new_part_price table?

To achieve the effects of the MERGE INTO statement prior to XPS V8.50, multiple SQL statements are required. This is illustrated in Example 7-14, where three statements and a temporary table are needed to perform the actions depicted in Figure 7-1.

Example 7-14 Workaround for UPSERT with XPS prior to V8.50
-- UPDATE through the following update-join statement
UPDATE part_price
SET old.part_price = new.part_price
FROM part_price AS old, new_part_price AS new
WHERE old.part_id = new.part_id;

-- INSERT using the following two statements
SELECT * FROM new_part_price
WHERE part_id NOT IN (SELECT part_id FROM part_price)
INTO TEMP t;
INSERT INTO part_price SELECT * FROM t;

The UPDATE-JOIN statement (that is, an UPDATE statement with two tables that are joined) used in this workaround for UPSERT is not standard SQL and hence is not available in DB2. If the XPS system has been upgraded to V8.50, the combination of three statements should be rewritten as a MERGE INTO statement, because the latter is one step and more efficient. With DB2, there is no choice but to rewrite the code, because UPDATE-JOIN is not available.

Restrictions and limitations on the MERGE statement
The XPS MERGE INTO statement has the following restrictions and limitations:
- The MERGE INTO target cannot have any violation tables defined.
- Insert or update triggers should not be invoked by the MERGE INTO statement.
- The MERGE INTO target cannot be an external table.
- The MERGE INTO target cannot be a remote table.
- The MERGE INTO target cannot be a duplicated table.
- The MERGE INTO target cannot be a catalog table or a read-only view, and it should not be a static or a pseudo table.
- The MERGE INTO statement is not allowed in a trigger action statement.
- A row cannot be updated more than once during the merge statement execution.
- Inserted rows are not updated during the merge statement execution.

7.17 Online analytical processing SQL

DB2 provides a variety of online analytical processing (OLAP) functions:
- Ranking functions: RANK and DENSE_RANK
- Row numbering function: ROW_NUMBER
- Aggregation function: OVER
- Super-grouping functionality: GROUPING SETS (including adding grand totals to queries), ROLLUP, and CUBE

The rank functions compute the ordinal rank of a row R within the query result set. Rows that are not distinct are assigned the same rank. There are two variants, indicated by the keywords RANK and DENSE_RANK.

 If RANK is specified, then the rank of row R is defined as 1 (one) plus the number of rows that precede R and are not peers of R.
 If DENSE_RANK is specified, then the rank of row R is defined as the number of rows preceding and including R that are distinct.

The ROW_NUMBER function computes the sequential row number, starting with 1 (one) for the first row, of the row within the result set of the query. When aggregation is performed (COUNT, SUM, AVG, and so forth), the OVER function can limit the aggregation to the current group under consideration.
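As an illustration (a sketch, not one of this redbook's numbered examples, using column names assumed from the sample employee table of Example 7-13), the following query applies these functions:

SELECT workdept, empno, salary,
       RANK()       OVER (PARTITION BY workdept ORDER BY salary DESC) AS sal_rank,
       DENSE_RANK() OVER (PARTITION BY workdept ORDER BY salary DESC) AS sal_dense_rank,
       ROW_NUMBER() OVER (ORDER BY empno) AS row_num,
       AVG(salary)  OVER (PARTITION BY workdept) AS dept_avg_salary
FROM employee

Each row carries its rank within its department and the department average salary, without a self-join or correlated subquery.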

The GROUPING SETS specification in the GROUP BY clause (SELECT ... GROUP BY GROUPING SETS (...)) is used to generate multiple aggregation groups in a single SQL statement. By using GROUPING SETS, you can calculate aggregations that would otherwise require a set operation such as UNION to put together the different components. Also, by using GROUPING SETS, you can add grand totals to queries.

ROLLUP grouping generates the various hierarchical groups in a single-pass SQL statement. The CUBE operation produces all combinations of groups, calculating row summaries and grand totals for the various dimensions in addition to all the totals provided by ROLLUP.
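Again as a sketch against the assumed employee sample, the following queries illustrate the super-grouping forms. The first produces per-department totals, per-job totals, and a grand total in one pass; the second produces the department/job hierarchy with subtotals and a grand total:

SELECT workdept, job, SUM(salary) AS total_salary
FROM employee
GROUP BY GROUPING SETS ((workdept), (job), ())

SELECT workdept, job, SUM(salary) AS total_salary
FROM employee
GROUP BY ROLLUP (workdept, job)

Replacing ROLLUP with CUBE in the second query additionally produces the job-only totals, that is, all combinations of the grouping columns.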

See “Related publications” on page 443 for a list of sources for more information.

The OLAP functions have their origins in the need to provide analytical tools in the SQL engines — tools that were previously only supplied by individual third-party packages. The market has come to realize that everyone would prefer to have basic analytical tools built into the database engines so that third-party packages could focus on real value-added analysis capabilities. Accordingly, the SQL standards community has offered an opportunity to standardize a selection of features, commonly called OLAP features, through an amendment to the ISO SQL 1999 standard (formerly known as SQL3).

7.18 Isolation levels

Locking isolation level affects the behavior of SQL statements. In XPS, the default isolation behavior is dependent on the database type, as shown in Table 7-3.

Table 7-3 XPS database types and default isolation levels

Database type            Default isolation level
unlogged                 DIRTY READ
logged, non mode-ANSI    COMMITTED READ
log mode ANSI            REPEATABLE READ

DB2 does not use database types. All DB2 databases are of the same type with default isolation of Cursor Stability (CS). Both XPS and DB2 allow you to override the default isolation level.

Unless your application explicitly directs isolation level, a transition to DB2 generally results in a change in locking behavior. If you have an XPS non-log-mode-ANSI database, your default isolation level will go from committed read to cursor stability (a more restrictive isolation level). If you have an XPS unlogged database, your default isolation level will go from dirty read to cursor stability (a very significant change). Depending on your application, you might end up with an increase in overall database locking requirements, and thus you might need to increase lock resources.

Both XPS and DB2 support a SET ISOLATION LEVEL statement. XPS uses the keyword TO, where DB2 prefers an equal sign (=) but now also accepts the keyword TO for compatibility with XPS and IDS.

The DB2 equivalent of the XPS dirty read isolation level is called uncommitted read. Many XPS applications set isolation to dirty read. For compatibility reasons, DB2 now supports the phrase dirty read as a substitute for uncommitted read, as shown here:

(XPS) SET ISOLATION TO DIRTY READ

(DB2) SET ISOLATION = DIRTY READ
      SET ISOLATION = UNCOMMITTED READ

Further details on DB2 use of SET ISOLATION
Prior to DB2 UDB V8.2, it was not possible to change the isolation level within an application. With DB2 UDB V8.2, the SET CURRENT ISOLATION statement assigns a value to the CURRENT ISOLATION special register. Similar to Informix IDS and XPS, this statement is not under transaction control, and thus the register setting is retained across COMMIT boundaries.

The statement can be embedded in an application program or issued through the use of dynamic SQL statements. It is an executable statement that can be dynamically prepared. No particular authorization is required to execute this statement.

The syntax for this statement is as follows:

SET [CURRENT] ISOLATION [=] {UR | CS | RR | RS | RESET}

With this statement, the value of the CURRENT ISOLATION special register is replaced by the specified value or set to blanks if RESET is specified. The following syntax is also supported:
 The word CURRENT is optional.
 The equal sign (=) is optional, and you can specify TO in place of the equal sign (=).
 DIRTY READ can be specified in place of UR.
 READ UNCOMMITTED can be specified in place of UR.
 READ COMMITTED is recognized and upgraded to CS.
 CURSOR STABILITY can be specified in place of CS.
 REPEATABLE READ can be specified in place of RR.
 SERIALIZABLE can be specified in place of RR.

Several of these alternative keywords are provided for compatibility with IBM Informix SQL syntax, for both IDS and XPS.

Tip: DB2 also supports specifying the isolation level of a particular SQL statement by using a WITH UR (or RR, CS, or RS) clause at the end of the SELECT statement, for example:

SELECT count(*) FROM SysCat.Tables
WHERE Type in ('A')
FOR FETCH ONLY WITH UR

7.19 Optimizer directives

XPS supports optimizer directives in SELECT statements. DB2 uses numerous optimizer level settings to control optimizer behavior but does not support directives from within individual SQL statements.

If you have used double dash (--) or C style (/* ... */) notation to specify XPS optimizer directives, these directives will simply be ignored by DB2 because they appear to DB2 as comments. See Example 7-3 on page 218. Curly braces ({ and }), however, are not supported as DB2 comments. If you have optimizer directives specified within { ... } comments, they must be removed.

Both XPS and DB2 support a SET OPTIMIZATION statement. XPS supports HIGH and LOW phraseology, while DB2 requires that the optimization be set to one of many possible numerical values. Also, DB2 requires the usage of the CURRENT keyword and the equal sign (=).

(XPS) SET OPTIMIZATION LOW

(DB2) SET CURRENT OPTIMIZATION = 3

The DB2 set optimization statement is only for dynamic SQL. For static SQL, you should use the bind option.

7.20 DDL issues

This section describes the details of DDL issues.

7.20.1 Creating and altering tables
Creating and altering tables in XPS is very similar to creating and altering tables in DB2. There are some exceptions in DB2, such as:
 Column data types cannot be changed after creation, except for increasing the length of a VARCHAR.
 You cannot explicitly fragment tables using expression or round-robin.
 You cannot drop a column.
 You specify a table space name rather than a dbspace name.
 You can associate tables with particular buffer pools.
 You must precede the constraint definition with the constraint name. See 7.20.4, “Constraint naming” on page 240.
 You cannot alter a table to add a NOT NULL constraint.

In DB2, you should plan table layout carefully. DB2 has restrictions on objects that can be altered, added, or dropped after table creation. See the DB2 SQL manual for complete syntax on ALTER TABLE. For example, you cannot ALTER a table to add a NOT NULL constraint after a table is created.

A number of these restrictions were lifted with DB2 UDB V8.2, where the changes made by the ALTER TABLE statement are masked by a “load new table and then drop old table” approach on the underlying table. No in-table change is currently performed.

7.20.2 Synonyms
DB2 supports synonyms, but they are called aliases. For compatibility reasons, DB2 supports the usage of the keyword SYNONYM in the CREATE statement. In XPS log mode ANSI databases and in DB2, all aliases are private and must be accessed using the alias owner syntax.

(XPS) SELECT * FROM synonym_name

(DB2) SELECT * FROM myschema.alias_name

Because they are private, privileges must be granted for others to use DB2 aliases. DB2 does not support public aliases, so a public or private toggle is not required or supported.

In XPS logged, non mode-ANSI databases and in unlogged databases, synonyms are public by default but can be made private using the PRIVATE keyword in the CREATE SYNONYM statement. Because aliases are always private in DB2, the PRIVATE keyword is not supported there; if you are using it, remove it.
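As a brief sketch (the alias, table, and user names here are hypothetical), creating and sharing a DB2 alias looks like this:

CREATE ALIAS myschema.emp_alias FOR myschema.employee;
GRANT SELECT ON myschema.emp_alias TO USER appuser;

-- the grantee qualifies the alias with its schema
SELECT * FROM myschema.emp_alias;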

Also, a DB2 alias can only refer to local database objects. DB2 uses nicknames (which are very similar to aliases) to refer to remote objects. Creating nicknames is part of the federated database feature which is a built-in capability of DB2.

Tip: When using DB2 you must configure the database to use the built-in federated feature. It is not available automatically.

7.20.3 Primary key definitions

Both XPS and DB2 do not allow any part of a primary key to contain a null value. XPS, in fact, implicitly assumes the NOT NULL constraint on primary keys and does not require an explicit definition. For clarity, DB2 does not make this assumption and thus requires the explicit definition of a NOT NULL constraint. Thus, the following statements achieve the same effect, namely a primary key that is also not null:

(XPS) CREATE TABLE manufact (manu_code CHAR(3) PRIMARY KEY)

(DB2) CREATE TABLE manufact (manu_code CHAR(3) PRIMARY KEY NOT NULL)

7.20.4 Constraint naming
There is a minor difference in the syntax for naming referential integrity constraints. XPS requires custom constraint names after the constraint type, while DB2 requires custom constraint names before the constraint type.

(XPS) CREATE TABLE t (c2 SMALLINT NOT NULL, c3 CHAR(40) NOT NULL,
         PRIMARY KEY (c2) CONSTRAINT pk_col2);

(DB2) CREATE TABLE t (c2 SMALLINT NOT NULL, c3 CHAR(40),
         CONSTRAINT pk_col2 PRIMARY KEY (c2));

7.21 Triggers

Triggers are used to start an action if a certain event occurs. Events that can be caught by a trigger are INSERT, UPDATE, and DELETE statements processed on a table or view. Generally, triggers are not an important feature in a data warehouse system.

The structure of a DB2 trigger is similar to that of an XPS trigger. A DB2 trigger has a unique name, a definition of the event on a table or view for which it executes, optional correlation names for a row (NEW and OLD), and finally the action to be performed when the trigger fires. Figure 7-2 on page 241 shows the complete CREATE TRIGGER syntax.


CREATE TRIGGER trigger-name
   {NO CASCADE BEFORE | AFTER | INSTEAD OF}
   {INSERT | DELETE | UPDATE [OF column-name, ...]}
   ON {table-name | view-name}
   [REFERENCING {OLD [AS] correlation-name |
                 NEW [AS] correlation-name |
                 OLD_TABLE [AS] correlation-name |
                 NEW_TABLE [AS] correlation-name} ...]
   {FOR EACH ROW | FOR EACH STATEMENT} MODE DB2SQL
   [WHEN (search-condition)]
   SQL-procedure-statement

Figure 7-2 DB2 CREATE TRIGGER statement (syntax diagram rendered as text)

7.21.1 SELECT triggers
Neither XPS nor DB2 supports SELECT triggers, but IDS does. On an IDS system, with this feature, a DBA can specify actions that are to occur when SELECTs on particular tables are executed. The feature can be used to call stored procedures or to INSERT a row into an audit table.

With DB2, you can achieve some level of query auditing for warehousing-type applications with the DB2 Query Patroller product, which is very similar to Informix I-Spy; however, these tools do have an effect on system performance.

In certain situations, you might consider using a DB2 table function to achieve behavior similar to SELECT triggers. Using table functions, however, requires a change in the instigating SELECT phraseology, and therefore might not be practical. DB2 table functions are a special type of function that return data in table format and replace the table name in a SELECT. Thus, IDS (but not XPS) supports:

(IDS) CREATE TRIGGER emptrig1
         ON SELECT FROM emp
         INSERT INTO audittab(current, "emp", USER, "select");

      SELECT * FROM emp WHERE empid = "5631";

To achieve this with DB2, the following table function would replace the IDS trigger, and the instigating SELECT would need to be changed:

(DB2) CREATE FUNCTION emp (a_empid INTEGER)
         RETURNS TABLE(empid INTEGER, name VARCHAR(20), salary DEC(10,2))
         MODIFIES SQL DATA
         RETURN
            SELECT empid, name, salary
            FROM NEW TABLE(
               INSERT INTO audit(tstamp, tabname, user, data)
                  INCLUDE(empid INTEGER, name VARCHAR(20), salary DEC(10,2))
                  SELECT CURRENT TIMESTAMP, 'EMP', USER, 'EMPID: ' || CHAR(empid),
                         empid, name, salary
                  FROM emp WHERE empid = a_empid) AS I

      SELECT emp.* FROM TABLE(emp(5631)) AS emp

7.21.2 BEFORE-statement triggers
XPS allows the definition of triggers that occur:
 Once AFTER the entire group of rows, that is, an after-statement trigger.
 Once AFTER each row.
 Once BEFORE each row.
 Once BEFORE the entire group of rows, that is, a before-statement trigger.

DB2 allows the definition of triggers that occur:
 Once AFTER the entire group of rows, that is, an after-statement trigger.
 Once AFTER each row.
 Once BEFORE each row.

DB2 does not support before-statement triggers. However, in cases where only a single row is processed, a before-statement trigger is actually analogous to a before-row trigger. If your database has defined before-statement triggers, but your application only operates upon single rows, you might be able to change the before-statement triggers to before-row triggers. Before-row triggers are supported in DB2.

7.21.3 Disabling triggers

DB2 does not support the disabling and enabling of triggers. To emulate such behavior, triggers would need to be dropped and re-created.
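As a minimal sketch of this emulation (the new_hire trigger and the company_stats table are hypothetical illustrations, not objects from this redbook):

DROP TRIGGER new_hire;
-- bulk maintenance runs here without trigger overhead
CREATE TRIGGER new_hire
   AFTER INSERT ON employee
   FOR EACH ROW MODE DB2SQL
   UPDATE company_stats SET nbemp = nbemp + 1;

Keep the original CREATE TRIGGER source available, because there is no ALTER TRIGGER statement to fall back on.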

7.22 Multidimensional Clustering in DB2

Although partitioning (or fragmentation, as it is known in XPS) by expression or range is not yet available (as of V8.2) in DB2 UDB, the Multidimensional Clustering (MDC) capabilities introduced with DB2 UDB V8.1 provide a similar, but finer-grained, segregation of rows. The MDC capability allows you to physically cluster data along multiple dimensions simultaneously.

By using MDC, you can create multiple, independently clustered indexes on a table simultaneously that improve disk access (data is clustered) and eliminate blocks of rows where the keys do not apply.

MDC is based on the definition of one or more orthogonal clustering attributes (or expressions) of a table. The table is organized physically by associating records with similar values for the dimension attributes in a cluster. Each clustering key is allocated one or more blocks of physical storage with the aim of storing the multiple records belonging to the cluster in almost contiguous fashion. The clustering index of the MDC table is based on a block-oriented index structure. Each block or cell has rows with the same key values for all the keys of the index. As data is initially loaded, or later added to, rows with the same set of key values are physically clustered into pages and extents — all rows in an extent have the same set of key values.

Example 7-15 shows the DDL for creating an MDC-based table. The ORGANIZE BY DIMENSIONS clause is used to specify which dimensions participate in the clustering.

Example 7-15 Creation of an MDC-based table
CREATE TABLE sales_fact (
   country  VARCHAR(20),   -- dimensions
   year     SMALLINT,
   color    VARCHAR(10),
   quantity INT,           -- measures
   price    DECIMAL(8,2)
) ORGANIZE BY DIMENSIONS (country, year, color)

This example DDL produces a table that is clustered in three dimensions: country, year, and color, as shown in Figure 7-3 on page 244. The individual logical cells, consisting of one or more extents, hold rows that have the same values for the dimension attributes (for example, in this figure, country = Mexico, year = 2002, and color = blue). A slice is the set of cells that share a common value for a particular dimension (for example, in this figure, year = 2001).

Figure 7-3 MDC table with three dimensions (country, year, and color). A dimension is an axis along which data is physically organized in an MDC table; the figure shows the cells formed by the dimension values and a slice for year = 2001

When there are no rows for a particular combination of dimension values, no cell is created.

When the first row for a particular combination of dimension values needs to be stored, a complete extent is allocated, but the extent can be placed anywhere because access to the cell is through the block pointer index. Subsequent rows with the same combination of dimension values will be stored in the same page or in other pages of the extent until the extent is filled. Another extent is allocated when needed, but it is unlikely that it will be contiguous with the original extent. All extents with this combination of dimension values are considered to be members of the cell. Thus, the MDC cell is a logical collection of extents.

In a data warehouse system built on DB2 with the data partitioning feature (DPF) option, a common approach would be to hash across the partitions with a dimension such as customer and then to use MDC to provide the equivalent of XPS hybrid fragmentation within the partitions. The case study used for the development of this redbook showed that DB2 hash partitioning plus MDC indexes could provide equivalent performance to an XPS hybrid fragmentation scheme.
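To make the combination concrete, here is a sketch (the customer_id column, the table name, and the table space tsp are illustrative assumptions, not DDL from the case study) of DB2 V8 DDL that hashes rows across the partitions and clusters them within each partition by MDC dimensions:

CREATE TABLE sales_fact_dpf (
   customer_id INT NOT NULL,    -- hash partitioning key
   country     VARCHAR(20),     -- MDC dimensions
   year        SMALLINT,
   color       VARCHAR(10),
   quantity    INT,             -- measures
   price       DECIMAL(8,2)
) IN tsp
  PARTITIONING KEY (customer_id)
  ORGANIZE BY DIMENSIONS (country, year, color)

The hash key spreads the rows evenly across the database partitions, while the MDC dimensions play the role that the range expressions play in an XPS hybrid fragmentation scheme.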

Prior to DB2 UDB V8.1, only one index per table could be designated as the clustering index, and all indexes in that approach are row-based rather than block-based. Clustering using this approach is only perfect at the moment of index creation and remains valid only until the next group of inserts or deletes on the table. This approach, similar to indexing in XPS, is still available alongside the MDC approach and can be used in addition to MDCs.

7.23 DB2 Materialized Query Tables

Materialized query tables (MQTs) were introduced into DB2 UDB with Version 5 and have been enhanced with each release since then. With DB2 UDB V8, automatic summary tables (ASTs) were renamed MQTs, and the feature was expanded to cover a wider range of capabilities. ASTs were only summary tables based on actual tables; starting with V8, MQTs are no longer required to contain summary data, and the concept has been generalized.

MQTs address issues of performance associated with analyzing large amounts of data. They:
 Allow the DBA to precompute and materialize an aggregate query
 Enhance the ability of the DB2 optimizer to rewrite queries against the base table into queries against the MQT instead

Options in the CREATE TABLE statement that defines (and populates) the MQT allow for:
 DEFINITION ONLY tables
 REFRESH DEFERRED or REFRESH IMMEDIATE
 ENABLE (or DISABLE) QUERY OPTIMIZATION
 MAINTAINED BY SYSTEM (or BY USER)

In addition, the REFRESH TABLE statement provides for INCREMENTAL or NON INCREMENTAL update. Here the NON INCREMENTAL approach specifies a full refresh by recomputing all rows from the MQT table definition, while an INCREMENTAL approach specifies that refresh is to be performed by considering only appended data or by using the content of an associated staging table. The simple REFRESH TABLE tablename command, without any further information specified, allows the system to determine whether it knows enough to do incremental processing or whether a full refresh is required.
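As an illustrative sketch (the sales table, its columns, and the MQT name are assumptions for this example, not objects from the redbook case study), a deferred-refresh, system-maintained MQT could be defined and refreshed as follows:

CREATE TABLE sales_by_region AS
   (SELECT region, COUNT(*) AS row_cnt, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region)
   DATA INITIALLY DEFERRED
   REFRESH DEFERRED
   ENABLE QUERY OPTIMIZATION
   MAINTAINED BY SYSTEM;

REFRESH TABLE sales_by_region;

After the refresh, queries that aggregate sales by region become candidates for the optimizer to rewrite against the small MQT instead of the base table.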

The DB2 MQT does not have any equivalent counterpart in XPS. Because the optimizer is MQT aware, the use of MQTs can improve server performance substantially. DB2 UDB V8.2 provides an MQT Advisor as part of the Design Advisor.

7.23.1 Using and Configuring MQTs

MQTs are a powerful way to improve response time for complex queries, especially queries that might require some of the following operations:
 Aggregated data over one or more dimensions
 Joined and aggregated data over a group of tables
 Data from a commonly accessed subset of data, that is, from a hot horizontal or vertical partition
 Re-partitioned data from a table, or part of a table, in a partitioned database environment

Knowledge of MQTs is integrated into the SQL compiler. In the SQL compiler, the query rewrite phase and the optimizer match queries with MQTs and determine whether to substitute an MQT for a query that accesses the base tables. If an MQT is used, the EXPLAIN facility (that is, Visual Explain) can provide information about which MQT was selected.

Because MQTs behave similarly to regular tables in many ways, the same guidelines for optimizing data access using table space definitions, creating indexes, and issuing RUNSTATS apply to MQTs.

There are two tools that make choosing MQTs easier:
 DB2 Cube Views, which makes recommendations based on modelling the business integration dimension model and on the characteristics of the tables involved in business integration queries.
 MQT Advisor, which makes recommendations based on user-provided workload analysis.

DB2 Cube Views has been available since DB2 UDB V8.1 FP2. It is included in the DB2 Data Warehouse Edition (DWE), but otherwise must be purchased separately. Cube Views has a model-based MQT advisor that considers the dimensional model, table stats, and samples of actual table data, and then comes up with MQT recommendations. It is capable of exchanging cube metadata with third party tools.

MQT Advisor was released with DB2 UDB V8.2 as part of a new combined Design Advisor, an autonomic enhancement now included with the DB2 UDB software. The MQT Advisor is a workload-based advisor, similar to the previously available (V8.1) Index Advisor, that works from a workload provided by the user as qualified by space and time constraints. The MQT Advisor develops costs for various plan alternatives with and without MQTs and comes up with a set of MQT recommendations.

Tip: The best overall approach to using MQTs is a complementary combination of both approaches: Best Performance = Model-Based MQT (using Cube Views) + Workload-Based MQT (using MQT Advisor)

The model-based approach is generally performed before you have detailed knowledge about your data and the workload-based approach after you have specific experience with your queries.

7.24 System commands

This section reviews some of the system commands.

7.24.1 CREATE DATABASE
In XPS, the CREATE DATABASE statement is considered to be SQL, but in DB2 it is considered to be a system command (and thus requires SYSADM authority).

7.24.2 Administrative commands
XPS uses certain utilities to perform some database administration activities. These utilities are not SQL statements, nor can they be mixed with SQL statements; they are command line utilities. For example, in XPS you use the onutil command to create a dbspace.

DB2, as a general rule, relies more on the SQL DDL interface to perform administrative functions. For example, in DB2, a table space is created with a CREATE TABLESPACE command in the SQL CLP.
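As a brief sketch of the contrast (the paths and sizes are illustrative assumptions only):

-- XPS: a command line utility, not SQL (onutil syntax abbreviated and approximate)
-- onutil> CREATE DBSPACE dbs1 CHUNK '/dev/rchunk1' SIZE 500000;

-- DB2: ordinary DDL issued from the command line processor
CREATE TABLESPACE tsp1
   MANAGED BY DATABASE
   USING (FILE '/db2data/tsp1c1' 50000);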

Attention: The notion of table space differs between DB2 and XPS/IDS. The DB2 table space corresponds to the same concept as an XPS/IDS dbspace — it is a logical collection of containers (physical allocations of disk corresponding to XPS/IDS chunks). Differences, and similarities, in terminology are listed in the Appendixes.

As a side issue, table space is two words in DB2 (but one compound word, TABLESPACE, in administrative commands), and just a single word in XPS/IDS.

7.25 Statistics for tables and indexes

The catalog tables within each database hold critical metadata about the structure of user-defined tables and other objects, and the statistics associated with these tables are used by the cost-based optimizer to determine the best execution paths and methods for the query. However, the optimizer can only use the information that it has.

The DB2 query optimizer uses a combination of:
 Powerful query rewrite rules to transform user-written queries into standardized, easier-to-optimize queries.
 A detailed cost model to generate and evaluate a rich set of alternative plans for executing the query.

The cost model can be displayed using Visual Explain. Thus, for instance, the optimizer automatically determines whether any existing Materialized Query Tables (MQT) could benefit a query, and, if so, rewrites the query to use the MQT without having to alter the query in the user application program. Further, it collects statistics on the size of each table and the distribution of each column to model how many rows must be processed by any query a user might submit.

The optimizer adapts its model to the local machine environment, factoring in the speed of the processor, the type and model of storage devices, and the network connecting machine clusters (in a shared-nothing environment) or sites (in a federated environment). In most cases, the optimizer minimizes the total overall resource consumption but changes the optimization criterion automatically to be minimal elapsed time in parallel environments. The cost model includes detailed modeling of the availability of various memory categories (multiple buffer pools, sort heap, and so on) versus the amount needed, hit ratios, the cost to build temporary tables versus the cost to re-scan them, various flavors of pre-fetching and big-block I/O, non-uniformity of data distributions, and so on.

UPDATE STATISTICS (XPS)
The UPDATE STATISTICS statement in XPS performs all of the following tasks:
 Determines the distribution of column values.
 Updates system catalog tables that the database server uses to optimize queries.
 Forces re-optimization of SPL routines.
 Converts existing indexes when you upgrade the database server.

RUNSTATS and REORG (DB2)
The runstats and reorg utilities are needed to update the catalog table statistics information and to improve access performance.
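As a quick illustration (the table names are placeholders), the corresponding commands are:

-- XPS: gather distributions for a table and its columns
UPDATE STATISTICS MEDIUM FOR TABLE t;

-- DB2: gather table, column distribution, and index statistics
RUNSTATS ON TABLE myschema.t WITH DISTRIBUTION AND DETAILED INDEXES ALL;

-- DB2: reorganize the table to reclaim space and restore clustering
REORG TABLE myschema.t;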

7.26 Query Monitoring

Explain (for XPS) and Visual Explain (for DB2) are important tools for users of SQL to aid in the development of efficient SQL. Both provide explanation and analysis of SQL statements, one in character format (XPS) and one in graphical format (DB2). Both provide the ability to develop an access plan with or without the execution of the plan.

Both servers also offer a number of real-time monitoring tools.

Explain (XPS and IDS)
To obtain the access plan (or explain report), you use the SET EXPLAIN statement to switch on explain mode for the session, or you can include EXPLAIN in the optimizer directives. In either case, explain information is written to a file (default name sqlexplain.out) that shows the query plan of the optimizer, an estimate of the number of rows returned, and the relative cost of the query.

Output from a SET EXPLAIN ON statement is directed to the appropriate file until you issue a SET EXPLAIN OFF statement or until the program ends. If you do not enter a SET EXPLAIN statement, the default behavior is OFF. The database server generates the optimization plan but does not produce the report that shows the measurements for queries.
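For example, as a minimal sketch (any query can stand in for the SELECT shown; for a local session, the plan is appended to sqlexplain.out in the current directory):

SET EXPLAIN ON;
SELECT * FROM part_price WHERE part_id = 100;
SET EXPLAIN OFF;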

The EXPLAIN process executes during the database server optimization phase that takes place when you initiate a query. For queries that are associated with a cursor, if the query is prepared and does not have host variables, optimization occurs when you prepare it. Otherwise, optimization occurs when you open the cursor.

For discussions of SET EXPLAIN and of analyzing the output of the optimizer, see the IBM Informix Extended Parallel Server Performance Guide, G251-2235-00.

I-Spy (XPS and IDS)
IBM Informix I-Spy monitors SQL statements submitted to an Informix database server and creates an activity log that you can use for performance analysis and tuning. I-Spy can also use a set of user-created rules to control SQL statement execution.

I-Spy itself is a daemon process that client applications see as a database server. I-Spy listens on a TCP port for connections and routes client connections and associated traffic to an actual Informix database server. It is often referred to as the shadow server. While it routes network traffic, the shadow server decodes all messages and logs the SQL statements and associated statistics to an I-Spy activity database.

I-Spy works transparently in your existing environment and requires no changes in the client applications or the database server. The only change for the client might be a different value for the INFORMIXSERVER environment variable.

Further details on I-Spy can be found in the IBM Informix I-Spy User Manual, Part No. 000-6830A.

Tuning queries with XPS
Performance tuning is an iterative process. Each query and each database server is different, so you must use your judgement to evaluate the output of tuning utility programs and make tuning decisions. After you make tuning adjustments, re-evaluate the effect and make further adjustments if necessary.

For further details on tuning queries, especially with XPS, see IBM Informix Extended Parallel Server Performance Guide, G251-2235, especially Chapter 7 “Tuning Specific Queries and Transactions.”

Explain tools and Visual Explain (DB2)
There are a number of SQL explain tools available for use in evaluating SQL access plans. The SQL explain tools available with DB2 UDB V8 are:
 dynexpln, which is used from the command line to explain dynamic SQL statements
 db2expln, which is used from the command line to explain both static and dynamic SQL
 db2exfmt, which is used from the command line to format the content of the explain tables in the database that are populated by the optimizer processing
 Visual Explain, which is used to graphically present both static and dynamic SQL statements

In all cases, the explain tools display such information as the indexes being used, join methods used, timerons (artificial units of time for comparison between alternate statements), index and table statistics, order of join operations, number and type of sort operations, and type of optimization used. When SQL access plan information has been captured and analyzed, you can use the DB2 Design Advisor to recommend the creation of additional indexes and similar items.
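As a short sketch of one of these workflows (the database name mydb and the output file name are placeholders, and the explain tables are assumed to have been created already), you can populate the explain tables from the CLP and then format them with db2exfmt:

-- capture the plans without executing the dynamic statements
SET CURRENT EXPLAIN MODE EXPLAIN;
SELECT * FROM t WHERE c2 = 5;
SET CURRENT EXPLAIN MODE NO;

Then, from the command line:

db2exfmt -d mydb -1 -o plan.txt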

For a complete list of DB2 Developer Tools and components, refer to DB2 Application Development Guide: Programming Client Applications, SC09-4826. For details on Visual Explain, refer to DB2 UDB V8 Administration Guide: Performance, SC09-4821, and DB2 V8 Visual Explain Tutorial, Form db2tvx80.

Visual Explain can be launched from the DB2 Control Center and the DB2 Command Center. Figure 7-4 shows a Visual Explain plan for a three-table join generated by running a SELECT statement from a command window. The figure also illustrates that further details can be obtained on individual components of the plan such as the lower hash-join in the visual plan.

Figure 7-4 Visual Explain running inside the DB2 Control Center

Monitoring (DB2) A number of different monitoring tools are available with DB2, including:  Query Patroller  SNAPSHOT and event monitoring


Chapter 8. Loading and unloading data

This chapter describes how loading and unloading of data is done in XPS, when loading and unloading of data is typically used, and how this loading and unloading can be done in DB2.

XPS and DB2 offer essentially two methods for loading data into a data warehouse:
 Loading and inserting data in a single stream
 Parallel bulk loading

This chapter also provides a short discussion of loading and inserting, and then focuses on parallel bulk loading.

8.1 Loading and inserting data in a single stream

Loading and unloading data in a single stream is often done with ETL (extract, transform, and load) tools such as WebSphere DataStage and Informatica. Such ETL programs should need little modification when migrated to a DB2 ESE DPF environment.

Sometimes, single stream loading and unloading is also used for transferring small amounts of data with manually written programs. The primary ways of doing this are:
 Using I-Star for getting data from other XPS or IDS systems.
 Using ESQL/C, ESQL/Cobol, or 4GL programs for inserting and selecting data.
 Using dbaccess load and unload commands for loading and unloading.

WebSphere Information Integrator provides functionality similar to I-Star. The WebSphere Information Integrator functionality that is necessary for accessing IDS or XPS systems is integrated in DB2 ESE. Therefore, no separate product has to be purchased in this case (see Getting Started on Integrating Your Information, SG24-6892).

A particular problem is the use of insert cursors in an embedded SQL program, because DB2 does not support insert cursors directly. A solution for this is to use the feature offered by Client SDK 2.90 which provides DB2 connectivity for Informix embedded SQL programs. This enables the use of insert cursors with DB2.

For more information about how to migrate Embedded SQL or 4GL programs, see Database Transition: Informix Dynamic Server to DB2 Universal Database, SG24-6367. The transformation of scripts that use the dbaccess load and unload commands is also briefly discussed in Transitioning: Informix 4GL to Enterprise Generation Language (EGL), SG24-6673.

In short, the DB2 import command corresponds to the dbaccess load command, and the DB2 export command corresponds to the unload command.
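As a small sketch of this correspondence (the file and table names are placeholders):

-- dbaccess (XPS)
LOAD FROM "t.unl" INSERT INTO t;
UNLOAD TO "t.unl" SELECT * FROM t;

-- DB2 CLP
IMPORT FROM t.del OF DEL INSERT INTO t;
EXPORT TO t.del OF DEL SELECT * FROM t;

Note that dbaccess uses pipe-delimited .unl files, while the DB2 DEL format defaults to comma-delimited files; the coldel modifier reconciles the two (see 8.2).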

Note that these types of loading and unloading are typically not used for performance critical tasks.

8.2 Parallel bulk loading

XPS uses so-called external tables for loading and unloading data. External tables provide a SQL interface to a set of load and unload files. The files can be named pipes or regular files. External tables can be used like regular internal tables in SQL statements, and such statements use the full parallelism provided by the database system. Therefore, a parallel load in XPS is simply an INSERT ... SELECT statement that inserts into an internal table while selecting from an external table.

The typical DB2 method for parallel bulk load is to use the DB2 load command. It lets you specify the load files, the target table, and many options for influencing the load. In a DPF environment, a load consists of essentially two phases:
1. The load files are split according to the partitioning key of the target.
2. A local load is performed on each partition.

For understanding how a load works in XPS and DB2, consider an example of a system with four processors on one SMP computer. (A typical XPS or DB2 DPF system is usually much larger, but this size was chosen so that the example remains readable.) In this example, there are:
 Eight input files (all approximately the same size), /data/loadfile[1-4]-[1-2].unl, in pipe-delimited ASCII format.
 A database table t, into which the data from these input files is loaded, that is distributed across eight physical disks (for simplicity, we do not consider RAID here).

We assume that the XPS configuration consists of two coservers, each with two CPU virtual processors (CPUVPs). The table t should be fragmented by hash in the dbslice dbsl, which consists of eight dbspaces, four on each coserver.

We assume that the DB2 configuration consists of four partitions. The table t should be in a table space tsp that consists of two containers per partition.

The external table t_ext describing the load files could be defined in XPS with the SQL that is shown in Example 8-1 on page 256. This specifies that table t_ext has the same columns as the internal table and that the load files are in pipe-delimited ASCII format, which is the default for external tables.

The names of the load files are specified in the using clause. The file specification consists of three parts, which are separated by colons. The first part specifies whether regular files or pipes are used; in this case, disk means that we are using regular files. The next part defines the coservers on which the load files are located; in this case, cogroup_all means that load files are located on all coservers. The third and last part specifies the actual file names. Here, %c means that the coserver ID appears at this position in the file name, and %r(1..4) means that there are four load files per coserver, with the placeholder replaced in the file names by 1, 2, 3, and 4, respectively.

Example 8-1 Definition of external table t_ext
create external table t_ext sameas t
using (
   datafiles (
      "disk:cogroup_all:/data/loadfile%r(1..4)-%c.unl"
   )
)

When the external table t_ext is defined, the load of table t from the load files can be done with the SQL insert statement that is shown in Example 8-2.

Example 8-2 Loading of table t in XPS insert into t select * from t_ext

What happens in the XPS server when this insert statement is executed is shown in Figure 8-1 on page 257. There are four xlread threads, one running in each CPUVP. The number of xlread threads is determined as the minimum of the total number of CPUVPs and of the number of load files. Each of the xlread threads is reading from one load file at a time.

When an xlread thread has completed reading a file, it switches to the next available file on the coserver on which it is running. The xlread threads pass the rows on to the xlconv threads. An xlconv thread is responsible for converting the rows from external format to internal format, for example, converting an integer from external ASCII representation to internal binary representation.

Usually there is one xlconv thread per CPUVP. The xlconv thread passes each row to the insert thread that is handling the fragment to which the row belongs. There is one insert thread for each dbspace that appears in the fragmentation clause of the table. Depending on the table type of table t, the insert is done with logging or as a so-called light append without logging.

Figure 8-1 Execution of the sample load in XPS: on each of the two coservers (two CPUVPs each), xlread threads read the external table files, xlconv threads convert the rows, and insert threads write the rows into the dbslice

Example 8-3 shows the load statement that you can use to do a similar load in DB2.

Example 8-3 Load statement for table t in DB2
load from /data/loadfile1-1.unl, /data/loadfile1-2.unl,
          /data/loadfile2-1.unl, /data/loadfile2-2.unl,
          /data/loadfile3-1.unl, /data/loadfile3-2.unl,
          /data/loadfile4-1.unl, /data/loadfile4-2.unl
   of del
   modified by coldel| anyorder nochardel
   insert into t
   nonrecoverable

The load statement lists the load files. These load files have to be on the same partition (usually the partition to which one is connected). The of clause specifies the format of the load files (fixed-length ASCII, delimited ASCII, an internal format, or a cursor).

In our example, del specifies that we are using delimited ASCII. The default column delimiter for DB2 is a comma. Therefore, we have to specify in the modified by clause, with the coldel modifier, that we are using pipe-delimited load files. The next modifier, anyorder, is very important for getting a parallel load. If this modifier is omitted, the load is done sequentially.

The nochardel modifier specifies that no additional delimiter is put around character strings. We use that because we have load files in the same format as used by XPS. If the load file was created by DB2 (for example, with the export command), then you do not need this modifier.

Finally, the insert clause tells us that the load appends rows to the table t. The very last clause, nonrecoverable, determines that we do not need recoverability of the load. That is, we accept that if the load fails the table has to be dropped or restored from a backup. This behavior is acceptable because we are assuming that our load is an initial load of table t.

Figure 8-2 on page 259 shows the processes that were created for executing this load statement and the communication between the processes. We are assuming that we are connected to partition one.

Figure 8-2 Execution of the sample load in DB2: the db2lpprt pre-partitioning agents feed the db2lpart partitioning agents in round-robin fashion; the partitioning agents hash the records to each partition, where the db2lmr media readers, db2lfrmX formatters, db2lrid ridder, and db2lbmX buffer manipulators write the data to the table space

The first step of processing the load is reading the load files and splitting them into partition-specific load files, which can later be processed locally. The pre-partitioning agent db2lpprt runs on the coordinator database partition. There are eight of these processes, one for each load file. If we did not specify anyorder in the load statement, then there would be only one db2lpprt process.

Each db2lpprt process reads from its corresponding load file and sends the data in a round-robin fashion to the partitioning agents (db2lpart). A db2lpart process extracts the partitioning column out of each record and sends the record to the media reader processes on each partition via TCP/IP. The number of db2lpart processes can be configured for the load command. When nothing is specified explicitly, as shown in Example 8-3 on page 257, the number of db2lpart processes p is given by the following equation:

p = (t / 4) + 1

The parameter t is the number of output partitions, which defaults to the number of partitions of the table that is loaded.

Therefore, because we did not specify anything else, we get two db2lpart processes in Example 8-3 on page 257. These run on partitions 2 and 3 because, by default, the coordinator partition is avoided by the db2lpart processes.

All this processing — starting with the media reader processes db2lmr — is happening local to a partition. Therefore, there are as many db2lmr processes as there are partitions for the target table.

A db2lmr process reads a buffer with raw data, finds the end of the last complete record in the buffer, and sends the complete records to a formatter process (db2lfrmX, where X is the number of the formatter on the local partition; these numbers start with 0 on each partition). If the buffer does not contain a complete record, all the following buffers are sent to the same formatter.

The db2lfrmX processes construct a list of records. This list of records is passed to the ridder (db2lrid) process. The number of db2lfrmX processes is influenced by the CPU_PARALLELISM parameter; the default is determined by the number of processors available to a partition. Therefore, in the example, we have four db2lfrmX processes, one per partition, each named db2lfrm0.

The ridder processes construct full extents from the record lists. These extents are passed to the buffer manipulator processes db2lbmX. The buffer manipulators write the extents to disk. There is one db2lrid process per partition.

Table 8-1 shows a summary of the number of the different load processes.

Table 8-1 Number of load processes

Process     Default number of processes                    Set explicitly with
db2lpprt    Number of load files                           -
db2lpart    t/4 + 1, where t is the number of output       PARTITIONING_DBPARTNUMS
            partitions (defaulting to the number of
            partitions of the table being loaded)
db2lmr      Number of partitions                           -
db2lfrmX    Number of processors per partition             CPU_PARALLELISM
db2lrid     1 per partition                                -
db2lbmX     max(4 * #db2lfrmX, max(50, #containers))       DISK_PARALLELISM

Table 8-2 provides a description of all of the agents. This table also contains descriptions of some additional processes that are involved in loading data and that are necessary in more complicated scenarios.

Table 8-2 Processes involved in DB2 load

Process name Description

db2lbs LOAD LOB scanner. They are only used when the load tool is loading into a table with LOB columns. These processes scan the LOB object of the table and read the information back in.

db2lbmX LOAD buffer manipulator. The character X indicates one or more. Writes loaded data to the database and can be involved in async I/O. There is always one and often more depending on a heuristic. The heuristic is based on the number of processors on the system and the number of containers being written to.

This "intelligent default" can be overridden by the DISK_PARALLELISM modifier to the LOAD command.

We should make it clear that this Async I/O is not the async file I/O supported by some operating systems; it just means that we have separate processes writing the I/O. This means that other processes that are formatting the data are not impacted by I/O waits.

db2lfrmX LOAD formatter process. The character X indicates one or more. This process formats the input data into internal form. It is always present in a LOAD. An intelligent default is used, which can be overridden by the CPU_PARALLELISM modifier to choose the optimum number of processors.

db2lfs These processes are used when the table being loaded has LONG VARCHAR columns. They are used to read and format the LONG VARCHAR columns in the table.

db2lmr This is the LOAD Media Reader process. It reads the load input file(s) and disappears when the input file(s) have been read completely. This happens even before the entire load operation has completed.

db2lmwX These are the LOAD media writer processes. The last character 'X' indicates one or more.

These processes make the load copy if this option is specified for the LOAD command. The load copy is essentially a backup of the data that was loaded into the table. These media writers are the same as the media writers used by BACKUP and RESTORE. There is one media writer invoked per copy session as described on the command line (you can create a load copy to multiple files). If there is no load copy there is no media writer. They get input from the other processes in load depending on what the data type is, but typically every bit of data that gets written by a buffer manipulator will be passed on to the media writer. As with all the other processes they are controlled by the load agent.


db2lrid This process performs the index sort and builds the index record IDs (RIDs) during the LOAD.

This process is not present in a non-parallel database instance (that is, when INTRA_PARALLEL is disabled); in a non-parallel instance, its tasks are performed by the formatter EDU.

This process performs the following functions:
 SMP synchronization
 Allocating RIDs and building the indexes
 Controlling the synchronization of the LOAD formatter processes

db2ltsc The LOAD table scanner. These processes scan the data object for the table being loaded and read the information for the LOAD tool. These are used during a LOAD append operation.

db2linit The LOAD initialization subagent. This subagent acquires the resources required on the database partitions and serializes the reply back to the load catalog subagent. Multi-partitioned database environment only.

db2lcata The LOAD catalog subagent. This subagent is executed only on the catalog partition and is responsible for:
 Spawning the initialization subagents
 Processing their replies
 Storing the lock information at the catalog partition

The catalog subagent also queries the system catalog tables to determine which partitions to use for data splitting and partitioning.

There is only one catalog subagent for a normal load job. The exception is a load that fails to acquire loading resources on some partitions. If setup errors are isolated on database partitions, the coordinator removes the failed partitions from the load's internal partition list and spawns a new catalog subagent. This process is repeated until resources are successfully acquired on all partitions, or failures are encountered on all partitions. Multi-partitioned database environment only.

db2lpprt Load pre-partition subagent. This subagent pre-partitions the input data from one input stream into multiple output streams, one for each partitioning subagent. There will be one pre-partitioning subagent for each input stream. Multi-partitioned database environment only.

db2lpart The load partition subagent. This subagent partitions the input data into multiple output streams, one for each database partition where the data will be written. The number of partitioning subagents can be configured by the user. The default number depends on the total number of output database partitions. Multi-partitioned database environment only.

db2lmibm The load mini-buffer manipulator subagent processes. This subagent writes the partitioned output file if the partition_only mode is used for the load. There is one mini-buffer manipulator subagent per output database partition. Multi-partitioned database environment only.

db2lload The load subagent processes. This subagent is responsible for carrying out the loading on each database partition. It spawns the formatter, ridder, buffer manipulator, and media writer EDUs and oversees their work. There is one load subagent for each output database partition. Multi-partitioned database environment only.

db2lrdfl The load read-file subagent processes. This subagent reads the message file on a given database partition and sends the data back to the client. There is a read-file subagent for each output partition, partitioning partition, and pre-partitioning partition. Multi-partitioned database environment only.

db2llqcl The load query cleanup subagent processes. This subagent removes all of the load temporary files from a given partition. There is one cleanup subagent for each output partition, partitioning partition, and pre-partitioning partition. Multi-partitioned database environment only.

db2lmitk The load mini-task subagent processes. This subagent frees all LOB locators used in a load from cursor call or a CLI load. There is one mini-task subagent per cursor/CLI load running on the coordinator partition. Multi-partitioned database environment only.

db2lurex The load user-exit subagent processes. This subagent runs the user's file transfer command. There is one user-exit subagent for each load job using the file transfer command option. Multi-partitioned database environment only.

db2lmctk This process is used to hold, release, or downgrade locks held on the catalog partition as a result of the load. Multi-partitioned database environment only.

db2med These processes handle the reading from or writing to the database table spaces for LOAD, BACKUP, and RESTORE.

8.2.1 Handling bad rows

When data is loaded, some might not be in the correct format. In XPS there are basically two ways to address bad rows. It is possible to define restrictions:

 On the external table
 On the internal table into which the data is inserted

Restrictions on the external table can have two different forms:
 The data type can be more restricted
 Constraints (not null or check constraints) can be defined on the external table

For example, if a date column is loaded, the corresponding column in the external table can be defined either as a date column or as a character column. In the first case, the converter thread already checks whether the date column is in the correct format; in the second case, the insert checks the date type (assuming the target table defines the column as a date type). The advantage of having a CHAR column in the external table is that more logic can be put in the select statement, so more complicated date formats can be handled. It is also possible to put check constraints and not null constraints on an external table.

Restrictions on an internal table can be, besides the data type specification, all the usual constraints such as primary and foreign key constraints and check constraints.

There is a very significant difference between restrictions on the external table and restrictions on an internal table. Restriction violations on an external table typically result in an entry in a so-called reject file, while a restriction violation on an internal table causes an error by default. However, there are exceptions to this statement. With the MAXERRORS parameter in an external table definition, it is possible to determine how many rows with errors can be put in the reject file. If the maximum number of errors is reached (or if no reject file is specified), the INSERT statement selecting from the external table and inserting into the internal table is aborted. It usually makes sense not to set MAXERRORS too high, because it is better to recognize a completely corrupted set of load files early.

A reasonably well-tuned load should typically be processor bound. If it is not, it is very likely that all of the rows that should be loaded are being written to the reject file instead.

Example 8-4 on page 265 shows how the REJECTFILE clause, MAXERRORS, and constraints can be used in an external table definition.

264 Database Strategies: Using Informix XPS and DB2 Universal Database Example 8-4 External table with rejectfile and MAXERRORS create external table t_ext1 ( date1 char(10), date2 date, date3 date check (date3 > date(’01/01/2004’) and date3 < date(’01/19/2004)’) date4 not null ) using ( datafiles ( “disk:1:/data/loadfile.unl” ), rejectfile “/data/rejectfile%c“, maxerrors 100000 )

We assume that we load the rows from this external table t_ext1 into the internal table t_int1, as defined in Example 8-5.

Example 8-5 Internal table for load with reject file
create table t_int1 (
   date1 date,
   date2 date,
   date3 date,
   date4 date
)

What would happen if we load the file shown in Example 8-6?

Example 8-6 load file /data/loadfile.unl
01/01/2004|01/01/2004|01/01/2004|01/01/2004|
01/02/2004|01/02/2004|01/02/2004|01/02/2004|
01/02/2004|01/02/2004|01/02/2004||
01/02/2004|02/30/2004|01/02/2004|01/02/2004|
02/30/2004|01/02/2004|01/01/2004|01/02/2004|
02/30/2004|01/02/2004|01/02/2004|01/02/2004|

The first row would generate an entry in the reject file (shown in Example 8-7 on page 266) because it violates the constraint defined for column date3 in the external table. The name of the constraint, c104_3, can be verified with the dbschema command. The second row is correctly inserted into the internal table because no constraints are violated. The third row creates an entry in the reject file because the NOT NULL constraint on column date4 is violated. The fourth row also creates an entry in the reject file because the second column, date2, does not contain a valid date. The fifth row contains an invalid date for the first column and a constraint violation for column date3; because only the constraint is defined in the external table, an entry in the reject file is created for the constraint violation. The sixth row also contains an invalid date for the first column. However, because this row does not violate any constraints defined on the external table, no entry in the reject file is created. Instead, the load is aborted when this row is processed.

Example 8-7 shows the reject file that results from trying to load the data in Example 8-6 on page 265. If the load should not abort in this situation, a so-called violation table must be defined.

Example 8-7 Reject file /data/rejectfile1

1,/data/loadfile.unl,1,CONSTRAINT(informix.c104_3),:01/01/2004|01/01/2004|01/01/2004|01/01/2004|
1,/data/loadfile.unl,3,NOT_NULL,date4:01/02/2004|01/02/2004|01/02/2004||
1,/data/loadfile.unl,4,CONVERT_ERR,date2:01/02/2004|02/30/2004|01/02/2004|01/02/2004|
1,/data/loadfile.unl,5,CONSTRAINT(informix.c104_3),:02/30/2004|01/02/2004|01/01/2004|01/02/2004|

How are bad rows handled in DB2?

Constraints for columns have to be defined on the target table in DB2. The target table t_int1 could be defined in DB2 as shown in Example 8-8.

Example 8-8 Definition of table t_int1 for DB2

create table t_int1 (
   d1 date,
   d2 date,
   d3 date check (d3 > date('01/01/2004') and d3 < date('01/19/2004')),
   d4 date not null
);

Now, the load statement shown in Example 8-9 can be used to load the data from Example 8-6 on page 265, which is contained in the load file /data/loadfile.unl, into the table t_int1.

Example 8-9 Load statement for handling bad load data in DB2

load from /data/loadfile.unl of del
   modified by anyorder coldel| dumpfile=/data/dumpfile
   warningcount 100000
   messages /data/messages
   insert into t_int1;

Two new modifiers are used:

 The dumpfile modifier specifies the prefix of the name of a file which is used for storing the rows that are rejected by the load command. This file corresponds to the reject file of XPS.

 The warningcount modifier specifies the number of rows which might be rejected before the load command stops. This modifier corresponds to the MAXERRORS parameter in the external table definition of XPS.

When a load command is stopped because warningcount is reached, the load command can be restarted. The messages option allows you to specify a file which contains the reason why each row was rejected. Therefore, the messages file and the dump file together contain the information found in the XPS reject file.

Example 8-10 contains the contents of the dump file when executing the load statement in Example 8-9 on page 266.

Example 8-10 Dump file /data/dumpfile.load.000

01/02/2004|01/02/2004|01/02/2004||
01/02/2004|02/30/2004|01/02/2004|01/02/2004|
02/30/2004|01/02/2004|01/01/2004|01/02/2004|
02/30/2004|01/02/2004|01/02/2004|01/02/2004|

Two rows were inserted into the target table and four rows were rejected. The four rejected rows are in /data/dumpfile.load.000. They are rows 3 to 6 from the load file. The reasons for the rejection can be found in the messages file: row 3 was rejected because of the violation of the NOT NULL constraint, and the remaining three rows were rejected because they contain an invalid date value (02/30/2004).

Example 8-11 contains the message file created during the execution of Example 8-9 on page 266.

Example 8-11 Messages file

SQL3501W The table space(s) in which the table resides will not be placed in backup pending state since forward recovery is disabled for the database.

SQL3109N The utility is beginning to load data from file “/data/loadfile.unl”.

SQL3500W The utility is beginning the “LOAD” phase at time “05/16/2005 20:24:30.590364”.

SQL3519W Begin Load Consistency Point. Input record count = “0”.

SQL3520W Load Consistency Point was successful.

SQL3116W The field value in row “3” and column “4” is missing, but the target column is not nullable.

SQL3185W The previous error occurred while processing data from row “3” of the input file.

SQL0181N The string representation of a datetime value is out of range. SQLSTATE=22007

SQL3185W The previous error occurred while processing data from row “4” of the input file.

SQL0181N The string representation of a datetime value is out of range. SQLSTATE=22007

SQL3185W The previous error occurred while processing data from row “5” of the input file.

SQL0181N The string representation of a datetime value is out of range. SQLSTATE=22007

SQL3185W The previous error occurred while processing data from row “6” of the input file.

SQL3110N The utility has completed processing. “6” rows were read from the input file.

SQL3519W Begin Load Consistency Point. Input record count = “6”.

SQL3520W Load Consistency Point was successful.

SQL3515W The utility has finished the “LOAD” phase at time “05/16/2005 20:24:30.623585”.

SQL3107W There is at least one warning message in the message file.

Number of rows read      = 6
Number of rows skipped   = 0
Number of rows loaded    = 2
Number of rows rejected  = 4
Number of rows deleted   = 0
Number of rows committed = 6

Therefore, the invalid values and the NOT NULL violations are already handled, but the row violating the check constraint (row 1) was inserted into the target table. Therefore, the target table is in an inconsistent state. This has to be fixed in the following way: First, an exception table has to be defined, which has the same structure as the target table, but will receive all the rows which violate constraints (besides NOT NULL constraints, which are already handled). Example 8-12 shows the definition of the exception table. Next, the SET INTEGRITY statement can be used to move all rows that violate constraints from the target table to the exception table.

Example 8-12 Definition of exception table

create table t_exc1 like t_int1

Example 8-13 shows how the SET INTEGRITY statement is used. It specifies the table for which constraints should be checked and the exception table where the rows violating the constraints should be placed.

Example 8-13 SET INTEGRITY statement

set integrity for t_int1 immediate checked
   for exception in t_int1 use t_exc1

Example 8-14 shows the content of the target table after the SET INTEGRITY statement was executed. It now contains only the second row from the load file.

Example 8-14 Contents of t_int1

D1         D2         D3         D4
---------- ---------- ---------- ----------
01/02/2004 01/02/2004 01/02/2004 01/02/2004

Example 8-15 shows the content of the exception table, which is the first row from the load file that is violating the check constraint on column d3.

Example 8-15 Contents of t_exc1

D1         D2         D3         D4
---------- ---------- ---------- ----------
01/01/2004 01/01/2004 01/01/2004 01/01/2004

8.2.2 Performance and tuning considerations for loading with DB2

As mentioned previously, the anyorder modifier is important for parallelism. There are also options for configuring the degree of parallelism of the different operators that are involved in the execution of a load statement, as sketched below.
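For example, a load statement can set these degrees explicitly. The following is a minimal sketch only, reusing the table t_int1 and the load file from the earlier examples; the parallelism values are purely illustrative and can normally be left to DB2 to choose:

load from /data/loadfile.unl of del
   modified by anyorder coldel|
   insert into t_int1
   cpu_parallelism 4
   disk_parallelism 8;

Here, CPU_PARALLELISM controls the number of processes or threads used to parse, convert, and format the data, and DISK_PARALLELISM controls the number of processes or threads used to write the data to disk.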

However, there are additional aspects of load performance. First, there is a high overhead for starting a load command in an environment with a large number of partitions. Therefore, it is usually faster to use an IMPORT statement if just a few rows are loaded.

Load from cursor can be a fast alternative to an INSERT ... SELECT statement if the number of partitions is small. Otherwise, the non-parallel cursor can become the bottleneck.
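As an illustration, a load from cursor in the CLP could look like the following minimal sketch; the source table t_stage1 is a hypothetical staging table assumed to have the same structure as t_int1:

declare c1 cursor for select * from t_stage1;
load from c1 of cursor insert into t_int1;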

Because the load files are read on only one partition, it is important to put the load files on file systems that can deliver a high I/O throughput. Therefore, these file systems should usually reside on a RAID 5 or RAID 10 array.

Load operations which have to do many sorts, for example because they are inserting into a table which uses MDC, require enough memory for efficient sorting. Therefore, the database configuration parameter UTIL_HEAP_SZ and the DATA BUFFER option of the load command have to be set sufficiently large.
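A minimal sketch of both settings, assuming the database is named tpch and using purely illustrative sizes (both values are counted in 4 KB pages):

update db cfg for tpch using util_heap_sz 100000;

load from /data/loadfile.unl of del
   modified by anyorder coldel|
   insert into t_int1
   data buffer 50000;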

See “Related publications” on page 443 for a list of available resource material.

8.3 Parallel unloading

This section discusses how parallel unloading is done in XPS and DB2 and particularly concentrates on a multi-partition environment.

For parallel unloading, XPS uses the same method as for loading: external tables. An external table is used for loading when it is one of the tables in the FROM clause of the SELECT statement that is used by an INSERT statement. It is used for unloading when it is the target of an INSERT statement.

DB2 provides several methods for unloading. However, because we focus on parallel unloading in a multi-partition environment, this section concentrates on the IBM DB2 High Performance Unload (HPU) for Multiplatforms V2.2 utility and on unloading in parallel by running several export commands manually.

8.3.1 XPS unloading

Let us look at another example. Assume that we want to unload the table t into the flat files that we used for loading the table in the example in 8.2, “Parallel bulk loading” on page 255. We use the same external table definition as in Example 8-1 on page 256. The actual unload is done with an INSERT statement selecting from the internal table and writing to the external table, as shown in Example 8-16.

Example 8-16 Unloading of table t in XPS

insert into t_ext select * from t

This INSERT statement reads in parallel from table t and distributes the data rows in a round-robin manner to the eight flat files. Therefore, all the unload files should be approximately the same size even if there is data skew in table t. The unload files are local to the corresponding coserver.

8.3.2 DB2 unloading

DB2 HPU for Multiplatforms V2.2 provides the following set of features to support partitioned DB2 database environments using either DB2 Enterprise Extended Edition (EEE) or the Database Partitioning Feature (DPF) for DB2 V8:
 You can unload DB2 table data from all the partitions where the data resides with a single execution of HPU.
 You can unload DB2 table data on all the partitions to a single output file or to multiple output files, either on the local nodes or on the current node where HPU is issued.
 HPU continues to unload data from the other nodes in the event that one node terminates abnormally.

Let us look at an example of the usage of HPU. Assume that we have a database tpch that contains a table customer which is distributed across eight partitions. Let us further assume that we want to unload each partition to a separate unload file starting with the prefix /wrk5/unloaddb21/customer-hpu.unl (HPU appends the partition number, as shown in Example 8-19 on page 272).

To do this, we first define an HPU control file as shown in Example 8-17.

Example 8-17 HPU control file hpu-customer-8c.ctrl

global connect to tpch;
unload tablespace
select * from customer;
output (
   on remote host "/wrk5/unloaddb21/customer-hpu.unl"
)
;

This control file consists of two so-called blocks: the global block and the unload block. The global block in this example is very simple and contains only the database to which HPU must connect to locate the tables for unloading. The unload tablespace line starts the unload block. The next line specifies that we want to unload from the table customer. Finally, the output statement determines that we want to have an unload file on each partition, starting with the prefix /wrk5/unloaddb21/customer-hpu.unl.

The unload is executed by calling the HPU executable, db2hpu, with the -f option and specifying the control file that is defined in Example 8-17 on page 271. Example 8-18 shows this invocation of HPU.

Example 8-18 HPU invocation

db2hpu -f hpu-customer-8c.ctrl

This invocation of HPU writes concurrently to eight unload files, one for each partition. Example 8-19 shows the log that HPU generates for this invocation.

Example 8-19 HPU log file

----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----10----+----11----+----12----+----13--
000001 global connect to tpch;
000002 unload tablespace
000003 select * from customer;
000004 output (
000005 on remote host "/wrk5/unloaddb21/customer-hpu.unl"
000006 )
000007
000008 ;
000009

INZU462I HPU control step start: 03:49:02.049.
INZU463I HPU control step end : 03:49:02.170.
INZU464I HPU run step start : 03:49:05.742.
INZU410I HPU utility has unloaded 1875837 rows on CLYDE host for DB2TPCH.CUSTOMER in /wrk5/unloaddb21/customer-hpu.unl.001.
INZU410I HPU utility has unloaded 1875024 rows on CLYDE host for DB2TPCH.CUSTOMER in /wrk5/unloaddb21/customer-hpu.unl.003.
INZU410I HPU utility has unloaded 1874761 rows on CLYDE host for DB2TPCH.CUSTOMER in /wrk5/unloaddb21/customer-hpu.unl.004.
INZU410I HPU utility has unloaded 1875205 rows on CLYDE host for DB2TPCH.CUSTOMER in /wrk5/unloaddb21/customer-hpu.unl.005.
INZU410I HPU utility has unloaded 1874889 rows on CLYDE host for DB2TPCH.CUSTOMER in /wrk5/unloaddb21/customer-hpu.unl.006.
INZU410I HPU utility has unloaded 1875803 rows on CLYDE host for DB2TPCH.CUSTOMER in /wrk5/unloaddb21/customer-hpu.unl.002.
INZU410I HPU utility has unloaded 1874835 rows on CLYDE host for DB2TPCH.CUSTOMER in /wrk5/unloaddb21/customer-hpu.unl.007.
INZU410I HPU utility has unloaded 1873646 rows on CLYDE host for DB2TPCH.CUSTOMER in /wrk5/unloaddb21/customer-hpu.unl.008.
INZU465I HPU run step end : 03:50:23.930.
INZU412I HPU successfully ended:
Real time -> 1m21.880676s
User time -> 0m20.799999s : Father -> 0m0.090000s, Children -> 0m20.709999s
Syst time -> 0m17.080000s : Father -> 0m0.040000s, Children -> 0m17.040001s

You can find more information about HPU in DB2 High Performance Unload for Multiplatforms and Workgroups User’s Guide Version 2 Release 2, SC88-9874.

8.3.3 Parallel exports

A simple method for parallel unloading which does not require any additional tool is the use of the export command. The syntax of the export command is similar to that of the import or load command, but it is used for unloading instead of loading. However, the export command writes to just a single output file. Therefore, to do a parallel export, a script must be used which runs one export command against each partition.

The ksh script in Example 8-20 shows how this can be done. It assumes that a table customer, which is distributed across all partitions, should be unloaded. It is also assumed that the table customer has a column c_custkey.

Example 8-20 Ksh script for parallel export

#!/bin/ksh
# parallel export

db2_all "; db2 \
export to /wrk5/unloaddb2\$DB2NODE/customer.unl of del \
select '*' \
from customer \
where 'nodenumber(c_custkey)' = current node \
> /wrk5/unloaddb2\$DB2NODE/customer.log 2>&1"

The db2_all command is used to execute the command specified as the argument in quotation marks against each partition. The initial semicolon causes all the CLP instances that are started to execute in parallel. The db2_all command makes sure that the environment variable DB2NODE is set to the correct value for each partition.

The export command specifies the unload file first. The unload file for each partition is called /wrk5/unloaddb2$DB2NODE/customer.unl, where $DB2NODE expands to the partition number. The of del clause means that a delimited ASCII file should be created. Because no delimiter is specified, the default delimiter of a comma (,) is used.

The SELECT statement determines the rows which should be unloaded into each local file. The most interesting part of the SELECT statement is the WHERE clause. The NODENUMBER(column) function yields the partition number of a specific row. By comparing it to CURRENT NODE (that is, the partition number of the partition to which we are connected), we make sure that we select only local rows. The last line of the script creates a local log file for each partition, so that we can verify whether all of the export statements were successful.

8.4 Specific issues

You cannot do the following directly in DB2:
 Use the database for processing flat files (select from an external table and insert into an external table)
 Join load files with flat files

Both of these issues can be solved by using staging tables in the database, which are loaded first. Then all the necessary processing can be done with normal SQL.

If tables are partitioned with UNION ALL views in DB2, it is not possible to use the view as a target for the load operation. This is usually not a problem, because the UNION ALL view allows incremental loading of a sliding window of data. Therefore, the load is done directly into the corresponding base table of the UNION ALL view.

When loading into MDC tables, each load always starts a new block. This might be an issue for incremental loads, which add just a few rows at a time, and can therefore result in a huge waste of space. A solution for this issue is again the use of a staging table, as sketched below. The load command loads the data into a non-MDC staging table. Then an INSERT ... SELECT statement can be used to update the MDC table.
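The following minimal sketch illustrates the staging table pattern; the table names t_mdc and t_stage and the load file name are hypothetical:

-- non-MDC staging table with the same structure as the MDC table
create table t_stage like t_mdc;

-- bulk load the increment into the staging table
load from /data/increment.unl of del insert into t_stage;

-- move the rows into the MDC table with ordinary SQL
insert into t_mdc select * from t_stage;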



Chapter 9. Administration tools and utilities

This chapter provides an overview of some of the DB2 administrative tools and capabilities. Topics include the graphical tools and wizards, system and database utilities, monitoring tools and advisors, and diagnostics. This brief overview is primarily to make you aware of these tools and capabilities so that you can understand their basic advantages and benefits. Where applicable, we try to relate them to similar tools or capabilities available with XPS.

9.1 Resource management

We discussed many of the similarities and differences in the DB2 and XPS architectures in Chapter 2, “XPS and DB2 UDB architectures” on page 7. From there, we learned that they each have different resource management requirements.

For example, in DB2 the DB2 Governor utility (db2gov) can be used to control the resources consumed by applications. The governor can monitor the behavior of applications that run against a database and can change certain behavior, based on the rules that you specify in the governor configuration file. In a partitioned database environment, you have a choice of whether to start the governor on all partitions or on an individual partition.

When the governor is active, a daemon on each partition collects information about the applications that run against the database. It then checks this information against the rules that you specified in the governor configuration file for this database. For example, applying a rule might indicate that an application is using too much of a particular resource. The rule would specify the action to take, such as to change the priority of the application or force it to disconnect from the database. If the action associated with a rule changes the priority of the application, the governor changes the priority of agents on the database partition where the resource violation occurred. In a partitioned database, if the application is forced to disconnect from the database, the action occurs even if the daemon that detected the violation is not running on the coordinator node of the application.
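As an illustration, a governor configuration file and its start command might look like the following minimal sketch; the file names gov.cfg and gov.log, the database name tpch, and the rule values are purely illustrative assumptions:

interval 300;
desc "force applications consuming too much CPU time" setlimit cpu 3600 action force;

db2gov start tpch gov.cfg gov.log

The interval clause sets the wake-up interval of the governor daemon in seconds, and the rule forces any application whose accumulated CPU consumption exceeds the specified limit.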

For more detailed information about the DB2 Governor utility, see the DB2 Administration Guide: Performance, SC09-4821.

9.2 Performance tuning

Relative to XPS, DB2 UDB has a significantly greater number of configuration parameters that can be tuned and that can impact performance at the instance level, database level, and application level. When you start a new instance of DB2, consider the following suggestions for a basic configuration:
 Use the Configuration Advisor in the Control Center to get advice about reasonable beginning defaults for your system. The defaults shipped with DB2 should be tuned for your unique hardware environment. You need to gather information about the hardware at your site so that you can answer the questions that are asked by the wizard. You can apply the suggested configuration parameter settings immediately or let the wizard create a script based on your answers, and run the script later. This script also provides a list of the most commonly tuned parameters for later reference.

 Use other wizards in the Control Center and Client Configuration Assistant for performance-related administration tasks. These tasks are usually those in which you can achieve significant performance improvements by spending a little time and effort. Other wizards can help you improve the performance of individual tables and general data access. These wizards include the Create Database, Create Table, Index, and Configure Multi-site Update wizards. The Health Center provides a set of monitoring and tuning tools.
 Use the Design Advisor tool from the Control Center or the db2advis command to determine which indexes, materialized query tables, multidimensional clustering tables, and database partitions would improve query performance.
 Use the ACTIVATE DATABASE command to start databases (see the sketch after this list). In a partitioned database, this command activates the database on all partitions and avoids the startup time required to initialize the database when the first application connects. If you use the ACTIVATE DATABASE command, you must shut down the database with the DEACTIVATE DATABASE command, because the last application that disconnects from the database does not shut it down.
 Consult the summary tables in Administration Guide: Performance, SC09-4821, that list and briefly describe each configuration parameter that is available for the database manager and each database. These summary tables contain a column that indicates whether tuning the parameter results in high, medium, low, or no performance changes, either for better or for worse. Use this table to find the parameters that you might tune for the largest performance improvements.
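For example, from the CLP, activating and deactivating a database, together with the AUTOCONFIGURE command that is the command line counterpart of the Configuration Advisor, could look like this minimal sketch; the database name tpch and the AUTOCONFIGURE input values are assumptions:

db2 activate database tpch
db2 deactivate database tpch

db2 connect to tpch
db2 autoconfigure using mem_percent 80 workload_type complex apply db and dbm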

The performance improvement process is an iterative, long term approach to monitoring and tuning aspects of performance. Depending on the result of monitoring, you and your performance team adjust the configuration of the database server and make changes to the applications that use the database server.

Base your performance monitoring and tuning decisions on your knowledge of the kinds of applications that use the data and the patterns of data access. Different kinds of applications have different performance requirements.

See Administration Guide: Performance, SC09-4821, for further guidelines and information.

9.3 Tools and wizards that are included with DB2

DB2 is rich in tools and graphical assistants to make your job easier. Many of the recent features that have been added to DB2 are centered around the concept of SMART (Self-Managing And Resource Tuning). This section introduces the DB2 tools that are available to you and that are accessible from the Control Center.

9.3.1 Control Center

The Control Center is the primary administration environment for DB2 and contains capabilities that extend beyond the XPS tools, such as dbaccess or Informix Server Administrator (ISA). The Control Center is written in Java. Therefore, it can run on any platform that has a JDK™. You should check the DB2 readme notes to verify supported Java releases.

Some of the key tasks that you can perform with the Control Center are:
 Add DB2 UDB systems, federated systems such as XPS, IDS, DB2 UDB for z/OS and OS/390 systems, instances, databases, and database objects to the object tree.
 Manage database objects. You can create, change, and drop databases, table spaces, tables, views, indexes, triggers, and schemas.
 Manage data. You can load, import, export, and reorganize data, as well as gather statistics and run queries.
 Perform preventive maintenance by backing up and restoring databases or table spaces.
 Configure and tune instances and databases.
 Manage database connections, such as DB2 Connect servers and subsystems.
 Manage DB2 UDB for z/OS and OS/390 subsystems.
 Manage applications.
 Analyze queries using Visual Explain to look at access plans.

9.3.2 Command Editor

Similar to the XPS dbaccess utility, the Command Editor is an interface for running SQL queries and operating system commands. Use the Command Editor to generate, edit, and execute SQL statements, work with the resulting output of DB2 commands, and view a graphical representation of the access plan for explained SQL statements.

9.3.3 Task Center

Use the Task Center to create, schedule, and run tasks. There is no equivalent facility in XPS. You can create the following types of tasks:

 DB2 scripts that contain DB2 commands.
 Operating system scripts that contain operating system commands.
 Grouping tasks that contain other tasks.

Task schedules are managed by a scheduler, included with DB2, while the tasks are run on one or more systems, called run systems. You define the conditions for a task to fail or succeed with a success code set. Based on the success or failure of a task or group of tasks, you can run additional tasks, disable scheduled tasks, and take other actions. You can also define notifications to send after a task completes. You can send an e-mail notification to people in your contacts list, or you can send a notification to the Journal.

9.3.4 SQL Assist

SQL Assist is a tool that uses an outline and details panels to help you organize the information that you need to create an SQL statement. You can launch SQL Assist from the SQL icon in the Command Editor.

9.3.5 Visual Explain

Visual Explain shows you a graphic of the access plan for explained SQL statements. This represents the graphical equivalent of the XPS SET EXPLAIN output. You can use the information available from the graph to tune your SQL queries for better performance. An access plan graph shows:
 Tables (and their associated columns) and indexes
 Operators (such as table scans, sorts, and joins)
 Table spaces and functions
 Total estimated cost and number of rows retrieved (cardinality)

In addition, Visual Explain displays the statistics that were used at the time of optimization. You can then compare these statistics to the current catalog statistics to help you determine whether rebinding the package might improve performance.

Figure 9-1 on page 280 shows an example of Visual Explain output for a query.


Figure 9-1 Visual Explain

9.3.6 Configuration Assistant

Use the Configuration Assistant to configure and maintain the database objects that you will be using. Unlike XPS, which requires client configuration at an instance level, DB2 requires you to configure access to each database from your DB2 client before you can work with it. You must configure your DB2 clients so they can work with the available objects. From the Configuration Assistant, you can work with existing database objects, add new ones, bind applications, set database manager configuration parameters, and import and export configuration information.

9.3.7 Journal

The Journal provides the ability to view historical information about tasks, database actions and operations, messages, and notifications. The Journal is the focal point for viewing historical information generated within the Control Center and its components. It provides an equivalent of the XPS onstat -m function plus additional capabilities.

9.3.8 Health Center

The Health Center is a graphical administration tool designed to support management-by-exception. The Health Center monitors and displays the alert states of all instances and their databases, and it recommends resolution actions for current alerts.

9.3.9 Replication Center

The DB2 Replication Center is a tool that you can use to set up and administer your replication environment. The Replication Center supports administration for DB2-to-DB2 replication environments, and administration for replication between DB2 and non-DB2 relational databases, such as the Informix database family. The DB2 Replication Center is part of the DB2 Control Center set of tools. You can use the Replication Center to set up the three types of replication that DB2 supports: SQL replication, Queue replication, and Event Publishing. You can specify unidirectional and bidirectional replication, with one or more servers. Use the Replication Center to:
 Create replication control tables
 Register replication sources
 Create subscription sets and add subscription-set members to subscription sets
 Operate the Capture program
 Operate the Apply program
 Monitor the replication process

9.3.10 License Center

The management of licenses for your DB2 Universal Database (DB2 UDB) products is done primarily through the License Center. From the License Center you can check the license information, statistics, registered users, and current users for each of your installed products. The License Center can be accessed from the Tools menu in the Control Center. There is no equivalent component for XPS.

9.3.11 Information Catalog Center

The Information Catalog Center provides you with the ability to manage descriptive data, also known as business metadata, through information catalogs. The descriptive data, which is organized into metadata objects, helps you identify and locate information. You can search for specific objects in the information catalog and view any relationships an object participates in or an object's lineage. You can also create comments for objects. Some users can also define additional objects in the information catalog.

9.3.12 Data Warehouse Center

You can use the Data Warehouse Center (DWC) to move data from operational databases to a data warehouse database that users can query for decision support. This process is also known as ETL: extract, transform, and load. You can use the DWC to define the structure of the operational databases, called sources. You can then specify how the operational data is to be transformed and moved to the data warehouse. You can model the structure of the tables in the data warehouse database, called targets, or build the tables automatically as part of the process of defining the data movement operations and loading the data.

For example, you can use the Data Warehouse Center to define your sales source data for use in a data warehouse, define a star schema for the data warehouse, and clean and transform the data to fit the star schema format.

The Data Warehouse Center, depicted in Figure 9-2, uses SQL, ODBC export, and the DB2 load and export utilities to move and transform data. You can use replication to copy large quantities of data from warehouse sources into a warehouse target, and then capture any subsequent changes to the source data. These operations are supported on all of the DB2 Universal Database workstation operating environments, DB2 Universal Database for zSeries, DB2 for iSeries, and non-DB2 databases such as IDS and XPS, with WebSphere Information Integrator. You can also use the Data Warehouse Center to move data into an OLAP (Online Analytical Processing) database. An expanded-functionality version of the Data Warehouse Center, called DB2 Warehouse Manager, is available as an additional cost option.

Figure 9-2 Data Warehouse Center

For more information about Data Warehouse Center, see Data Warehouse Center Administration Guide, SC26-9993.

9.3.13 Web administration

Two Web-based tools are available for remote administration of DB2: the DB2 Web Command Center and the DB2 Web Health Center. These tools run as Web applications on a Web application server to provide access to DB2 servers through Web browsers. They can play a similar role to the ISA in the XPS environment.

The DB2 Web Command Center is based on a three-tier architecture. The first tier is the Web client HTTP browser. The middle tier is an application server that hosts the business logic and set of applications. This middle tier provides the underlying mechanisms for the (HTTP/HTTPS) communication with the first tier (Web client browser) and also the third tier, which is the database or transaction server.

The DB2 Web Command Center implements many of the already existing features of the DB2 Command Center. However, it does not contain SQL Assist or Visual Explain.

The DB2 Web Command Center is targeted for use with the HTTP clients (browsers) available on mobile notebooks, as well as Web-enabled PDAs and Palm devices.

9.3.14 Wizards, advisors, and launchpads

The wizards and advisors that are part of DB2 greatly improve your productivity, especially as you make the transition from XPS to DB2. The DB2 advisors, wizards, and launchpads are integrated into the DB2 administration tools and assist you in completing administrative tasks by stepping you through the tasks.

Some of the wizards and launchpads that are available from other parts of the Control Center include:
 Configure Automatic Maintenance wizard
 Create Cache Table wizard
 Redistribute Data wizard
 Storage Management Setup launchpad
 Set up Activity Monitor wizard
 Set up High Availability Disaster Recovery (HADR) Databases wizard

9.4 Optional tools

The tools discussed in this section add value to your DB2 environment by helping you to maximize your productivity. Each is an optional cost item. For more details, see:
http://www.ibm.com/software/data/db2imstools/

9.4.1 DB2 Performance Expert

DB2 Performance Expert offers a comprehensive view that consolidates, reports, analyzes, and recommends changes on DB2 performance-related information. The tool includes a Performance Warehouse that stores performance data and analysis tools, as well as a Buffer Pool Analyzer that collects data and provides reports on related event activity. DB2 Performance Expert builds on IBM autonomic computing and on-demand expertise, providing recommendations for system tuning to gain optimum throughput. The tool is available for DB2 Universal Database on the Linux, UNIX, and Windows platforms as well as z/OS.

Consider DB2 Performance Expert if you need a comprehensive DB2 monitoring and reporting tool, need to monitor your DB2 systems across the enterprise, wish to utilize DBA skills for monitoring multiplatform DB2 databases and z/OS databases concurrently, or need in-depth information for long-term planning.

9.4.2 DB2 Recovery Expert

IBM DB2 Recovery Expert provides targeted, flexible, and automated recovery of database assets. DB2 Recovery Expert helps expert and novice DBAs to recover database objects safely, precisely, and quickly without having to resort to full disaster recovery processes.

Building on IBM autonomic computing expertise, the tool provides intelligent analysis and diagnostics of altered, incorrect, or missing database assets including tables, indexes, or data. It also automates the process of rebuilding those assets to a correct point-in-time, often without taking the database offline. In addition, you can mine the database logs to create undo or redo SQL and to remove errant transactions without having to do a full table space or database recovery.

9.4.3 DB2 High Performance Unload

High Performance Unload (HPU) is a high-speed DB2 utility for unloading DB2 tables from either a database or a backup. HPU is an extra-charge option of DB2 that provides the capability to unload data by bypassing the database manager and reading directly from the data blocks. The result is performance typically several times faster than the EXPORT utility.

For more information about High Performance Unload, see: http://www.ibm.com/software/data/db2imstools/db2tools/db2hpu/

9.4.4 DB2 Test Database Generator

IBM DB2 Test Database Generator rapidly populates application and testing environments and simplifies problem resolution. It can easily create test data from scratch or from existing data sources and maintains referential integrity while extracting data sets from source databases. It can create complete or scaled-down copies of production databases while masking sensitive production data for use in a test environment.

For more information about DB2 Test Database Generator, see: http://www.ibm.com/software/data/db2imstools/db2tools/db2tdbg/

9.4.5 DB2 Table Editor

IBM DB2 Table Editor quickly and easily accesses, updates, and deletes data across multiple DB2 database platforms. Key features include:
 Navigates IBM DB2 databases, tables, and views; finds related data; and quickly updates, deletes, or creates data with full support for your existing security and logon IDs.
 Edits DB2 tables everywhere with your choice of end-user entry points: Java-enabled Web browsers; Java-based interfaces launched from the IBM DB2 Control Center; or an ISPF interface.
 Provides drag-and-drop and wizards to rapidly create customized, task-specific Java- or Windows-based table editing forms containing built-in data validation and business rules.

Consider DB2 Table Editor when you need to edit DB2 tables, need easy-to-build forms capability for users, or need access to DB2 data. For more information about DB2 Table Editor, see: http://www.ibm.com/software/data/db2imstools/db2tools/db2te/

9.4.6 DB2 Web Query Tool

The DB2 Web Query Tool connects all your users directly to multiple enterprise databases, securely and simultaneously, regardless of database size, hardware, operating system, or location. Key features of the DB2 Web Query Tool are:
 Enables complex querying, data comparisons, and customized presentations.
 Provides rapid global access to business information over e-mail clients, including WAP-enabled devices such as PDAs, wireless phones, and text pagers.
 Supports standard browsers, giving administrators, developers, and users the ability to build queries that support multiple DB2 platforms, share and run the queries, and convert the results to XML and other highly transportable file formats.
 Is a J2EE-compliant Web application, so it can be deployed on WebSphere and other application servers.

Consider the DB2 Web Query Tool when you need comprehensive query and comparison capabilities without compromising DB2 security or data integrity, require thin client access from many different devices on your network, or need DB2 databases across an enterprise.

For more information about DB2 Web Query Tool, see: http://www-306.ibm.com/software/data/db2imstools/db2tools/db2wqt/

9.4.7 Query Patroller

DB2 Query Patroller is a powerful query management system that you can use to proactively and dynamically control the flow of queries against your DB2 database in the following key ways:
 Define separate query classes for queries of different sizes to better share system resources among queries and to prevent smaller queries from getting stuck behind larger ones.
 Give queries that are submitted by certain users high priority so that these queries run sooner.
 Automatically put large queries on hold so that they can be cancelled or scheduled to run during off-peak hours.
 Track and cancel runaway queries.

The features of DB2 Query Patroller allow you to regulate your database query workload so that small queries and high-priority queries can run promptly, and your system resources are used efficiently. In addition, information about completed queries can be collected and analyzed to determine trends across queries, heavy users, and frequently used tables and indexes.

This DB2 tool provides a similar set of functionality to I-SPY in the XPS environment. Figure 9-3 depicts the Query Patroller Center GUI tool being used to monitor and manage queries.

Figure 9-3 Query Patroller managing queries

9.5 Utilities

DB2 has a complete set of utilities, available via the command line and the GUI. Table 9-1 is a comparison chart that shows the major DB2 utilities with their XPS equivalents. For more information, see DB2 Command Reference, SC26-8967.

Table 9-1 DB2 and XPS utilities comparison

Function                          DB2                                         XPS
Backup/restore                    backup, restore, recover                    onbar
Load/unload data                  import (row at a time), load (bulk data),  pload, load, unload
                                  export, High Performance Unload (fast
                                  unload, a separately priced option),
                                  db2move (move database)
Check if database needs           reorgchk                                    onutil; sysmaster
reorganization
Reorganize                        reorg index, reorg table                    unload/reload, alter fragment init
Maintain database statistics      runstats (can also be automated)            update statistics
Analyze queries                   explain, db2exfmt, db2expln,                set EXPLAIN, I-Spy
                                  Visual Explain
DDL (schema) extraction           db2look                                     dbschema
Check database integrity          db2dart, inspect/db2inspf                   onutil
Command line database             db2pd (has many of the same                 onstat
monitoring                        commands as onstat)
Check backup                      db2ckbkp                                    archecker

9.5.1 Database reorganization

After many changes to table data, logically sequential data might be on non-sequential physical data pages, so that the database manager must perform additional read operations to access data. Additional read operations are also required if a significant number of rows have been deleted. In such a case, you might consider reorganizing the table to match the index and to reclaim space. You can reorganize the system catalog tables as well as database tables.

Consider the following factors, which might indicate that you should reorganize a table:
 A high volume of insert, update, and delete activity on tables accessed by queries.
 Significant changes in the performance of queries that use an index with a high cluster ratio.
 Executing runstats to refresh statistical information does not improve performance.
 The reorgchk command indicates a need to reorganize your table.
 The trade-off between the cost of increasing degradation of query performance and the cost of reorganizing your table, which includes the processor time, the elapsed time, and the reduced concurrency resulting from the REORG utility locking the table until the reorganization is complete.

To reduce the need for reorganizing a table, perform these tasks after you create the table (a sketch follows this list):
1. Alter the table to add PCTFREE.
2. Create a clustering index with PCTFREE on the index.
3. Sort the data.
4. Load the data.
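As an illustration, the first two tasks could look like the following minimal sketch; the table name customer, the column c_custkey, the index name i_custkey, and the PCTFREE values are assumptions carried over from the earlier examples:

-- reserve 10 percent free space on each data page for future inserts
alter table customer pctfree 10;

-- create a clustering index, reserving 20 percent free space on its pages
create index i_custkey on customer (c_custkey) cluster pctfree 20;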

After you have performed these tasks, the table, with its clustering index and the setting of PCTFREE on table, helps preserve the original sorted order. If enough space is allowed in table pages, new data can be inserted on the correct pages to maintain the clustering characteristics of the index. As more data is inserted and the pages of the table become full, records are appended to the end of the table so that the table gradually becomes unclustered.

Similarly, as tables are updated with deletes and inserts, index performance degrades in the following ways:
1. Fragmentation of leaf pages increases I/O costs because more leaf pages must be read to fetch table pages.
2. The physical index page order no longer matches the sequence of keys on those pages, which is referred to as a badly clustered index.
3. When leaf pages are badly clustered, sequential prefetching is inefficient and results in more I/O waits.

4. The index develops more than its maximally efficient number of levels.

If you set the MINPCTUSED parameter when you create an index, the database server automatically merges index leaf pages if a key is deleted and the free space is less than the specified percentage. This process is called online index defragmentation. To restore index clustering and free space, and to reduce leaf levels, you can use one of the following methods (a sketch follows this list):
1. Drop and re-create the index.
2. Use the reorg indexes command to reorganize indexes online. You might choose this method in a production environment because it allows users to read from and write to the table while its indexes are being rebuilt.
3. Use the reorg table command with options that allow you to reorganize both the table and its indexes offline.
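A minimal sketch of the last two methods from the CLP, again assuming the customer table from the earlier examples:

db2 reorg indexes all for table customer allow write access
db2 reorg table customer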

Tip: Creating Multidimensional Clustering (MDC) tables might reduce the need to reorganize. For MDC tables, clustering is maintained on the columns that you specify as arguments to the ORGANIZE BY DIMENSIONS clause of the CREATE TABLE statement. However, REORGCHK might recommend reorganization of an MDC table if it considers that there are too many unused blocks or that blocks should be compacted.

You can use the reorgchk command to assess the need for reorganization. It returns statistical information about data organization and can advise you about whether particular tables need to be reorganized. Running specific queries against the catalog statistics tables at regular intervals or specific times can also provide a performance history that allows you to spot trends that might have wider implications for performance. You can also use the reorgchk command to update table and index statistics in the catalogs.
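For example, assessing the need for reorganization and refreshing the statistics at the same time could look like this; the on table all scope is one of several possible options:

db2 reorgchk update statistics on table all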

9.5.2 Database statistics

Both XPS and DB2 use a cost-based optimizer, which means that statistical information, such as the number of rows in a table or the structure of an index, is used by the optimizer to calculate the cost of a query plan. It is up to the DBA to keep these statistics as current as possible and necessary. Without correct statistical information, the optimizer is not able to calculate an adequate query plan, which can result in degraded performance.

The DB2 version of update statistics is called runstats. Similar to the update statistics command known from XPS, you have to run runstats from time to time. The right time to execute runstats depends on the dynamic nature of the data. If the content of a table is changing constantly, you need to update the statistics more often. If the content of a table is static, a single runstats is sufficient. XPS behaves in almost the same manner.

However, there is a difference between XPS and DB2 regarding the kind of statistics collected. DB2 collects table, column, and index information. In addition, unlike update statistics in XPS, which collects statistics from all coservers, runstats in DB2 only collects statistics for tables on the partition from which you execute it. The runstats results from this partition are extrapolated to the other partitions. If the database partition from which you execute runstats does not contain a table partition, the request is sent to the first database partition in the database partition group that holds a partition for the table. In DB2, you can also automate statistics maintenance by using the Automated Database Maintenance feature. For a description, see 9.6.1, “Configuring automatic maintenance” on page 293.

Table 9-2 Basic update statistics and runstats comparison

XPS                        DB2
update statistics low      runstats on table schema.tabname and indexes all
update statistics medium   runstats on table schema.tabname with distribution on all columns
                           default num_freqvalues n num_quantiles m and sampled detailed
                           indexes all
update statistics high     runstats on table schema.tabname with distribution on all columns
                           and detailed indexes all

Note: All examples in Table 9-2 run against all columns and indexes of the table. You can also select single columns or indexes.

The update statistics medium equivalent for DB2 is not exactly the same as for XPS. The lower the NUM_FREQVALUES and NUM_QUANTILES parameters are, the smaller the samples that are taken.

9.5.3 Schema extraction

The db2look utility provides similar functionality to the XPS dbschema utility for extracting a database schema to an external file. It also has additional functionality, such as capturing all the statistics using SQL UPDATE statements in a script. You can use the DDL statements and statistics captured to reproduce the database objects of a production database in a test environment. Therefore, using this tool makes it possible to create a test system where access plans are similar to those that would be used in the production system.
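As an illustration, a db2look invocation that extracts the DDL together with the statistics-mimicking UPDATE statements might look like this; the database name tpch and the output file name schema.sql are assumptions:

db2look -d tpch -e -m -o schema.sql

Here, -e generates the DDL statements, -m generates the UPDATE statements that mimic the catalog statistics, and -o names the output file.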

9.5.4 Maintaining database integrity

You can check for database integrity using either of two options:
1. The inspect command allows you to inspect table spaces and tables for their architectural integrity while the database remains online. The inspection validates table objects and table space structures. In a partitioned database system, it inspects the collection of all logical partitions defined in db2nodes.cfg.
2. The db2dart utility can accomplish similar results as the inspect command. However, it only examines the database partition from which it is executed, and no database access is allowed while the tool is running.

Both inspect and db2dart are run from the command line. There are many options available, such as specifying table spaces, tables, and schemas. When you run inspect, you must use the db2inspf facility to format the results that are kept.
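A minimal sketch of an online inspection followed by the formatting step; the database name tpch and the file names inspect.out and inspect.txt are illustrative assumptions (the results file is written to the diagnostic data directory):

db2 connect to tpch
db2 inspect check database results keep inspect.out
db2inspf inspect.out inspect.txt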

9.5.5 Throttling utilities

DB2 maintenance utilities, such as load, backup, and rebalance, can be very resource intensive, and running them can impact the performance of your production system. With DB2 utility throttling, you can regulate the performance impact of maintenance utilities so that they can be run during production periods. You can develop a throttling policy that runs the utilities aggressively when the production workload is light, but runs them more conservatively as production demands increase.

The ability to throttle utilities allows you to:
 Execute maintenance tasks with total control over the performance impact on the production workload. This eliminates the need to identify off-peak hours or schedule downtime for utility tasks.
 Ensure that valuable system resources are fully utilized by utilities in periods of reduced demand.
 Eliminate performance impact as a consideration when monitoring a utility and configuring its parameters (for example, setting the PARALLELISM parameter for a backup invocation). When a throttling policy is established, it is the responsibility of the system to ensure that the policy is obeyed (to the extent possible).

The set util_impact_priority command changes the impact setting for a running utility. Using this command, you can throttle a utility that was invoked in un-throttled mode, un-throttle a throttled utility (disable throttling), or re-prioritize a throttled utility. You can monitor utility progress using the Utility Monitor tool, which is accessible in the Control Center by right-clicking the database and selecting Manage Utilities. The command line equivalent of the Utility Monitor is the list utilities command.

The general syntax of the command is:

set util_impact_priority for <utility-id> to <priority>

You can obtain the utility ID via the list utilities command (see the sketch after this list). This command can be used to monitor the progress of the following operations:
 Backup
 Restore
 Crash recovery
 Load
 Rebalance
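For example, throttling a running utility could look like the following minimal sketch; the utility ID of 2 and the priority of 50 are purely illustrative values taken from the list utilities output:

db2 list utilities show detail
db2 set util_impact_priority for 2 to 50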

9.5.6 Validating a backup

Similar to archecker in XPS, the db2ckbkp (Check Backup) utility can be used to test the integrity of a backup image and to determine whether the image can be restored. It can also be used to display the metadata stored in the backup header.

For more information, refer to DB2 Command Reference, SC09-2951.

9.6 Other administrative operations

Other administrative operations of general interest include configuring automatic maintenance, and working with databases and tables.

9.6.1 Configuring automatic maintenance

DB2 provides automatic maintenance capabilities for performing database backups, keeping statistics current, and reorganizing tables and indexes. Enablement of the automatic maintenance features is controlled using the automatic maintenance database configuration parameters. These are a hierarchical set of switches that allow for simplicity and flexibility in managing the enablement of these features. You can configure automatic maintenance using the Control Center. An automation example using the Control Center is presented in Database Transition: Informix Dynamic Server to DB2 Universal Database, SG24-6367.

Automatic database backup
Automatic database backup provides you with a solution to help ensure that your database is backed up both properly and regularly, without having to worry about when to back up or having knowledge of the backup command.

Automatic database backup determines the need to perform a backup operation based on one or more of the following criteria:
 You have never completed a full database backup.
 The time elapsed since the last full backup is more than a specified number of hours.
 The transaction log space consumed since the last backup is more than a specified number of 4 KB pages (in archive logging mode only).

Important: If backup to disk is selected, the automatic backup feature deletes backup images regularly from the directory specified in the Configure Automatic Maintenance wizard. Only the most recent backup image is guaranteed to be available at any given time. It is recommended that this directory be kept exclusively for the automatic backup feature and not be used to store other backup images.

The automatic database backup feature can be enabled or disabled by using the AUTO_DB_BACKUP and AUTO_MAINT database configuration parameters, as sketched below. In a partitioned database environment, the automatic database backup runs on each partition if the database configuration parameters are enabled on that partition. The backup policy for a database is created automatically when the DB2 Health Monitor first runs.
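As an illustration, enabling this feature from the CLP might look like the following minimal sketch; the database name tpch is an assumption, and AUTO_MAINT is switched on as the parent of AUTO_DB_BACKUP in the hierarchy:

db2 update db cfg for tpch using AUTO_MAINT on AUTO_DB_BACKUP on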

Through the Configure Automatic Maintenance wizard in the Control Center or Health Center, you can configure the requested time or number of log pages between backups, the backup media, and whether it will be an online or offline backup.

Automatic database statistics
Automatic statistics collection attempts to improve the performance of the database by maintaining up-to-date table statistics. Automatic statistics profiling advises when and how to collect table statistics by detecting outdated, missing, and incorrectly specified statistics and by generating statistical profiles based on query feedback.

Automatic statistics collection works by determining the minimum set of statistics that give optimal performance improvement. The decision to collect or update statistics is taken by observing and learning how often tables are modified and how much the table statistics have changed. The automatic statistics collection algorithm learns over time how fast the statistics change on a per-table basis and internally schedules runstats execution accordingly.

Normal database maintenance activities such as runstats, reorg, or altering or dropping a table are not affected by the enablement of this feature. The automatic statistics collection feature can be enabled or disabled by using the AUTO_RUNSTATS, AUTO_TBL_MAINT, and AUTO_MAINT database configuration parameters.

Tables considered for automatic statistics collection are configurable by you using the Automatic Maintenance wizard from the Control Center or Health Center.

Automatic reorganization
Automatic reorganization manages offline table and index reorganization without users having to worry about when and how to reorganize their data. Automatic reorganization determines the need for reorganization on tables by using the REORGCHK formulas. It periodically evaluates tables that have had their statistics updated to see if reorganization is required. If so, it internally schedules a classic reorganization (offline) for the table. This requires that your applications function without write access to the tables being reorganized.

The automatic reorganization feature can be enabled or disabled by using the AUTO_REORG, AUTO_TBL_MAINT, and AUTO_MAINT database configuration parameters.

You can configure automatic reorganization using the Automatic Maintenance wizard from the Control Center or Health Center.

Automatic maintenance windows
The automatic maintenance features consume resources on your system and might affect the performance of your database when they run. Automatic reorganization and offline database backup also restrict access to the tables and database when these utilities are run.

Therefore, it is necessary to provide appropriate periods of time when these maintenance activities can be scheduled internally for execution. These times can be specified as offline and online maintenance time periods using the automatic maintenance wizard from the DB2 Control Center or DB2 Health Center.

Offline database backups and table and index reorganization are run in the offline maintenance time period. These features run to completion even if they go beyond the time period specified. The internal scheduling mechanism learns over time and estimates job completion times. If the offline time period is too small for a particular database backup or reorganization activity, the scheduler will not start the job the next time around, and it relies on the Health Monitor to provide notification of the need to increase the offline maintenance time period.

Automatic statistics collection and profiling, as well as online database backups, are run in the online maintenance time period. To minimize the impact on the system, they are throttled by the adaptive utility throttling mechanism. The internal scheduling mechanism uses the online maintenance time period to start the online jobs. These features run to completion even if they go beyond the time period specified.

9.7 Monitoring tools and advisors

DB2 offers a wide variety of monitoring tools and advisors, which are discussed in this section. These tools help the DBA proactively monitor conditions and advise on a course of action. To facilitate monitoring, DB2 collects information from the database manager, its databases, and any connected applications. With this information you can do the following, and more:
- Forecast hardware requirements based on database usage patterns.
- Analyze the performance of individual applications or SQL queries.
- Track the usage of indexes and tables.
- Pinpoint the cause of poor system performance.
- Assess the impact of optimization activities (for instance, altering database manager configuration parameters, adding indexes, or modifying SQL queries).

9.7.1 Health check tools

This section looks at the Health Monitor, Health Center, and recommendation advisor. The Health Monitor is a server-side tool that adds a management-by-exception capability by constantly monitoring the health of an instance and its active databases, table spaces, and table space containers, even without user interaction. The Health Monitor proactively detects issues that might lead to hardware failure, or to unacceptable system performance or capability. Its proactive nature enables users to address an issue before it becomes a problem that affects system performance. This management-by-exception model frees up valuable DBA resources by generating alerts to potential system health issues without requiring active monitoring.

If the Health Monitor finds that a defined threshold has been exceeded (for example, the available log space is not sufficient) or if it detects an abnormal state for an object (for example, an instance is down), it raises an alert. When an alert is raised, two things can occur:
1. Alert notifications can be sent by e-mail or to a pager address, allowing you to contact whoever is responsible for a system.
2. Pre-configured actions can be taken. For example, a script or a task (implemented from the new Task Center) can be run.

A health indicator is a system characteristic that the Health Monitor checks against health-indicator thresholds when determining whether to issue an alert. The Health Monitor comes with a set of predefined thresholds for these health indicators. Using the Health Center, commands, or APIs, you can customize the threshold settings of the health indicators, and define who should be notified and what script or task should be run if an alert is issued.

The Health Monitor can only evaluate health indicators on a database and its objects when the database is active. You can keep the database active either by starting it with the activate database command or by maintaining a permanent connection to the database.
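For example (the database name SAMPLE is illustrative):

   db2 activate database SAMPLE

The database then remains active, and available for Health Monitor evaluation, even when no applications are connected; db2 deactivate database SAMPLE reverses this.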

Note: The Health Monitor gathers information about the health of the system using interfaces that do not impose a performance penalty, such as data retrieved from database system monitor elements, and the operating system. It does not turn on any snapshot monitor switches to collect information.

The Health Center provides the graphical interface to the Health Monitor. Use the Health Center to configure the Health Monitor, and to see the alert state of your instances and database objects. With the Health Monitor drill-down capability, you can access details about current alerts and obtain a list of recommended actions that describe how to resolve the alert. You can follow one of the recommended actions to address the alert. If the recommended action is to make a database or database manager configuration change, a new value will be recommended. You can then implement the recommendation by clicking a button directly from within the tool. In other cases, the recommendation is to investigate the problem further by launching a tool, such as the CLP or the Memory Visualizer.

9.7.2 Memory Visualizer

The Memory Visualizer is a DB2 tool that helps database administrators to monitor the memory-related performance of an instance and all of its databases. Using the tool, you can select, display, and graph memory usage for one or more memory components.

The Memory Visualizer window displays two views of data: a tree view and a historical view. A series of columns show percentage threshold values for upper and lower alarms and warnings. The columns also display real-time memory utilization. With the Memory Visualizer, you can:
- View data on the memory utilization of selected components for a DB2 instance and its databases.
- Change settings for individual memory components by updating configuration parameters.
- Load performance data from a file into a Memory Visualizer window.
- Save the performance data for later analysis.

9.7.3 Storage Manager

You can use the Storage Manager, which is accessible from the Control Center, to manage table spaces and containers, to monitor size over time, and to set warning indicators and thresholds.

9.7.4 Event monitor

Event monitors are used to collect information about the database and any connected applications when specified events occur. The event monitors must be created and enabled. There is no equivalent in XPS; however, running onstat commands repeatedly enables you to capture some sequence of events. Events represent transitions in database activity (for example, connections, deadlocks, statements, and transactions). You can define an event monitor by the type of event or events you want it to monitor. For example, a deadlock event monitor waits for a deadlock to occur; when one does, it collects information about the applications involved and the locks in contention. By default, all databases have an event monitor named DB2DETAILDEADLOCK defined, which keeps track of DEADLOCKS WITH DETAILS. The DB2DETAILDEADLOCK event monitor starts automatically when the database starts.

While the snapshot monitor is typically used for preventative maintenance and problem analysis, event monitors are used to alert administrators to immediate problems or to track impending ones. To create an event monitor, use the CREATE EVENT MONITOR SQL statement. Event monitors collect event data only when they are active. To activate or deactivate an event monitor, use the set event monitor state statement. The status of an event monitor (whether it is active or inactive) can be determined by the SQL function EVENT_MON_STATE.

Event monitor output can be directed to SQL tables, a file, or a named pipe. For example, you can request that DB2 log the occurrence of deadlocks between connections to a database.
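As a minimal sketch (the monitor name dlmon and the output directory are illustrative, and the directory must already exist):

   db2 "CREATE EVENT MONITOR dlmon FOR DEADLOCKS WITH DETAILS WRITE TO FILE '/tmp/dlmon'"
   db2 "SET EVENT MONITOR dlmon STATE 1"
   db2 "VALUES EVENT_MON_STATE('DLMON')"

The first statement defines a deadlock event monitor that writes to files, the second activates it, and the third returns 1 while it is active.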

9.7.5 Snapshots

Snapshots are useful for diagnosing both operational and application issues, especially for situations that develop over time, such as deadlocks. Prior to the new db2pd command, snapshots and event monitors were the methods for obtaining point-in-time information from the instance and database.

Similar to event monitors, you have the option of storing monitor information in files or SQL tables, viewing it on screen (directing it to standard-out), or processing it with a client application.

The snapshot monitor provides two categories of information for each level being monitored: state and counters. State includes information such as the current status of the database or application, information about the current or most recent unit of work, the list of locks being held, the current number of database connections, and the most recent SQL statement. Counters accumulate counts for activities from the time monitoring started until the time a snapshot is taken. Counters are kept for items such as the number of deadlocks, the number of database transactions, and application lock wait time.

Unlike onstat, the snapshot monitor must be enabled via database manager configuration parameters or via application snapshot switches; otherwise it can capture only a limited set of information. You can select the appropriate switches to enable, which include: buffer pool, lock, sort, statement, table, and unit of work.

There are two ways to set monitor switches: at the database manager level, using the update dbm cfg command, and at the application level, using the command update monitor switches.

To set or deactivate a switch at the instance level, use the syntax:

   db2 update dbm cfg using <switchname> ON | OFF

In this command, <switchname> is one of the following:
- DFT_MON_BUFPOOL
- DFT_MON_LOCK
- DFT_MON_SORT
- DFT_MON_STMT
- DFT_MON_TABLE
- DFT_MON_UOW

To set or deactivate a switch at the application level, use the syntax:

   db2 update monitor switches using <switch> ON | OFF

Here, <switch> is one of: bufferpool, lock, sort, statement, table, or uow.

To display monitor switches for your instance, use the db2 get dbm cfg command as follows:

   db2inst2:/> db2 get dbm cfg

To display monitor switches for your application, issue the following command:

   db2inst2:/> db2 get monitor switches
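Once a switch is enabled, you can take a snapshot. A minimal sketch (the database name SAMPLE is illustrative):

   db2 update monitor switches using lock on
   db2 get snapshot for locks on SAMPLE

The first command enables lock monitoring for the current session; the second returns the lock-related state and counters collected for the database.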

9.7.6 Activity Monitor

The Activity Monitor is a tool that assists database administrators in improving the efficiency of database performance monitoring, problem determination, and resolution. The Activity Monitor focuses on monitoring application performance, application concurrency, resource consumption, and SQL statement usage. It helps the DBA to diagnose the cause of database performance problems, such as application locking situations, and to tune queries for optimal use of database resources.

The Activity Monitor provides easy access to relevant and well-organized monitor data through a set of predefined reports such as Top processor-time consuming applications and SQL statements with the largest total sort time. For each predefined report, appropriate actions might be recommended to help resolve resource utilization problems, to optimize performance, or to invoke another tool for further investigation. Lock monitor data is also supplied to illustrate the details of lock waiting situations. Application lock chains can be displayed to show lock waiting dependencies.

9.7.7 DB2 Performance Expert

This optional tool provides additional monitoring capabilities, such as the ability to graph performance elements and to discover longer-term performance trends.

9.7.8 The db2pd utility, an onstat equivalent

The db2pd utility provides a command line interface for monitoring DB2 instances and databases. This feature was implemented specifically to address the needs of the Informix DBA, who has derived great productivity and functionality from the onstat utility.

The db2pd utility was created with a look, feel, and function similar to the XPS onstat utility. Like onstat, it runs from the command line with an optional interactive mode. The tool can provide a wide range of information useful for troubleshooting and problem determination, performance improvements, and application development design, including:
- Locks
- Buffer pools
- Table spaces
- Containers
- Dynamic SQL statements
- Agents
- Applications
- Memory pools and sets
- Transactions
- Logs

The db2pd utility is similar to onstat in that it does not acquire any locks or latches or use instance resources. However, because the DB2 architecture is different than that of XPS, especially with regard to having more memory areas, db2pd output is somewhat different.

For example, you have the ability to limit the scope of output to either the instance or the database by using the -inst or -db <dbname> modifiers. Using db2pd with either modifier, by default, provides all information for the instance or database. It can also collect information for all database partitions with the -alldbpartitionnums modifier, or for a set of database partitions with the -dbpartitionnum <num>[,<num> ...] modifier.

At the instance level, information includes: DB2 version, operating system information (type, level, machine name), processor, memory for both server and database instance, agents, the database manager configuration, and utilities status. At the database level, in addition to DB2 version, operating system, processor, and memory, output includes applications connected, transactions, buffer pools, logs, locks, table spaces, containers, cache, dynamic SQL, packages, database configuration, table and index statistics, and database organization status.

One thing to note is the use of abbreviations in the syntax. Most db2pd options and sub-options have a three character minimum requirement. A user can use the full option name or any number of characters with a minimum of three. For purposes of our examples, we use the minimum.
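For example, the following two commands are equivalent under the three-character rule (the database name SAMPLE is illustrative):

   db2pd -db SAMPLE -tablespaces
   db2pd -db SAMPLE -tab

Both return the same table space report for the database.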

Table 9-3 on page 302 compares several onstat functions to their db2pd equivalents.

Table 9-3 onstat and db2pd equivalents

Function                                      db2pd                               onstat
Elapsed time since the instance was started   db2pd -                             onstat -
Version information                           db2pd -V                            onstat -V
All information (eve is an abbreviation       db2pd -eve                          onstat -a
for everything)
Repeat command                                db2pd <command> -repeat             onstat <command> -r <n>
                                              <num sec> <count>
Display locks                                 db2pd -db <dbname> -locks           onstat -k
Display session/application information       db2pd -db <dbname> -applications    onstat -u
Display threads/agents                        db2pd -agents                       onstat -g ath
Display storage information                   db2pd -db <dbname> -tablespaces     onstat -d
(dbspaces/table spaces)
Display logs information                      db2pd -db <dbname> -logs            onstat -l
Display transactions                          db2pd -db <dbname> -transactions    onstat -x
Display memory segments/sets                  db2pd [-db <dbname>] -memsets       onstat -g seg
Display memory pools                          db2pd [-db <dbname>] -mempools      onstat -g mem
Display configuration                         db2pd -dbmcfg                       onstat -c

You can see more examples comparing output of onstat and db2pd in Database Transition: Informix Dynamic Server to DB2 Universal Database, SG24-6367. For further details on db2pd, see the DB2 Command Reference, SC09-2951.

9.7.9 Diagnostic files

DB2 uses two sets of diagnostic files, while XPS uses one, the online.log. Prior to DB2 V8.1, there was one log, known as the db2diag.log. The db2diag.log has now been split into two separate logs, the administration notification log (<instance-name>.nfy) and the db2diag.log, to separate administration and user information (the admin log) from internal support information (db2diag.log). Both logs are located in the directory specified by the DIAGPATH database manager configuration parameter. DIAGPATH is equivalent to the XPS MSGPATH parameter in the onconfig file.

The information that DB2 records in the db2diag.log file and the <instance-name>.nfy file is determined by the DIAGLEVEL and NOTIFYLEVEL settings, respectively.

Valid values for DIAGLEVEL are:
0 - No diagnostic data captured
1 - Severe errors only
2 - All errors
3 - All errors and warnings
4 - All errors, warnings, and informational messages

The administration logs grow continuously. When they get too large, back them up and then erase the files. A new set of files is generated automatically the next time they are required by the system.

Valid values for NOTIFYLEVEL are:
0 - No administration notification messages captured. (This setting is not recommended.)
1 - Fatal or unrecoverable errors. Only fatal and unrecoverable errors are logged. To recover from some of these conditions, you might need assistance from DB2 service.
2 - Immediate action required. Conditions are logged that require immediate attention from the system administrator or the database administrator. If the condition is not resolved, it could lead to a fatal error. Notification of very significant, non-error activities (for example, recovery) can also be logged at this level. This level captures Health Monitor alarms.
3 - Important information, no immediate action required. Conditions are logged that are non-threatening and do not require immediate action but can indicate a non-optimal system. This level captures Health Monitor alarms, Health Monitor warnings, and Health Monitor attentions.
4 - Informational messages.

If you have the Health Monitor enabled, you can trigger certain messages to kick off event warnings and alarms, so that you are made aware of potential problems with your system proactively. Otherwise, we suggest that you periodically monitor both files, to ensure that any error messages are investigated.

Diagnostic log analysis tool for db2diag.log

A tool for filtering and formatting db2diag.log files (db2diag) is available in DB2 V8.2. You can use this tool to filter diagnostic log files. Among other options, you can indicate which fields to display, apply a grep-like filter to reduce the number of records, and omit empty fields.

Command line options include:
db2diag -help        provides a short description of the options
db2diag -h brief     provides descriptions for all options without examples
db2diag -h notes     provides usage notes and restrictions
db2diag -h examples  provides a small set of examples to get started
db2diag -h tutorial  provides examples for all available options
db2diag -h all       provides the most complete list of options

In Example 9-1, the command db2diag -gi "level=severe" -H 3d shows all severe errors recorded during the past three days.

Example 9-1 Diagnostic log filter example
0-CLYDE [db2test] $ db2diag -gi "level=severe" -H 3d
2004-11-08-13.19.03.784775-480 I1485497A411      LEVEL: Severe
PID     : 3035182             TID : 1            PROC : db2agent (instance) 0
INSTANCE: db2test             NODE : 000         APPHDL : 0-187
APPID   : G9012698.KA09.010F08211827
FUNCTION: DB2 UDB, base sys utilities, sqleattach_agent, probe:60
RETCODE : ZRC=0x81360012=-2127167470=SQLZ_RC_CMERR, SQLT_SQLJC
          "External Comm error"

9.7.10 Error message and command help

The question mark (?) within a DB2 command provides online syntax examples, as shown in Example 9-2.

Example 9-2 DB2 command line help
4-CLYDE [db2tpch] $ db2 ? list history
LIST HISTORY {BACKUP | ROLLFORWARD | REORG | CREATE TABLESPACE |
  ALTER TABLESPACE | DROPPED TABLE | LOAD | RENAME TABLESPACE |
  ARCHIVE LOG} {ALL | SINCE timestamp | CONTAINING {schema.object_name |
  object_name}} FOR [DATABASE] database-alias

NOTE: From the operating system prompt, prefix commands with 'db2'.
Special characters MAY require an escape sequence (\), for example:
  db2 \? change database
  db2 ? change database xxx comment with \"text\"

You can also obtain error message text by using the question mark (?) symbol with the error number, similar to using the XPS command finderr nnnn. In Example 9-3, we use the question mark to investigate an SQL error message from a SELECT statement.

Example 9-3 Error message help 0-CLYDE [db2tpch] $ db2 ? sql0911 SQL0911N The current transaction has been rolled back because of a deadlock or timeout. Reason code "".

Explanation: The current unit of work was involved in an unresolved contention for use of an object and had to be rolled back.

The reason codes are as follows:
2  - transaction rolled back due to deadlock.
68 - transaction rolled back due to lock timeout.
72 - transaction rolled back due to an error concerning a DB2 Data Links Manager involved in the transaction.

Note: The changes associated with the unit of work must be entered again. The application is rolled back to the previous COMMIT.

User Response: To help avoid deadlock or lock timeout, issue frequent COMMIT operations, if possible, for a long-running application, or for an application likely to encounter a deadlock.

Federated system users: the deadlock can occur at the federated server or at the data source. There is no mechanism to detect deadlocks that span data sources and potentially the federated system. It is possible to identify the data source failing the request (refer to the problem determination guide to determine which data source is failing to process the SQL statement). Deadlocks are often normal or expected while processing certain combinations of SQL statements. It is recommended that you design applications to avoid deadlocks to the extent possible.

sqlcode : -911
sqlstate : 40001




Chapter 10. Planning the transition

This chapter introduces the common concepts that are involved in planning a transition from XPS to DB2 UDB, as well as the available tools and resources. It is intended to be a high-level overview to help you get started in your planning process. The transition involves more than just moving the data from one DBMS to another. Thus, there are many other associated activities that you should consider as part of the overall project plan, such as:
- Education and training of IT staff
- Transition planning, testing, and verification
- Migration of the actual data
- Migration and modifications to the data schemas and metadata
- Evaluation, selection, and testing of auxiliary tools
- Changes to the applications
- Consideration and use of new software, utilities, and administration tools
- Performance tuning
- Backup, restore, and recovery requirements
- Resource requirements, including people and hardware
- Availability requirements and service level agreements
- Plans for delays and possible fallback
- Consulting and services requirements

In addition, there are IBM services resources available to assist with planning, estimating the cost, and performing the actual transition. These are trained and experienced resources that understand the process and can, therefore, save you significant time and effort.

10.1 Tasks and activities

Typical transition planning includes planning for activities that occur even before actually moving the data. For example, it might be a good time to consider architectural or structural changes that you have been contemplating. However, take caution: making multiple major changes during a transition exposes you to the potential for increased and overly complex issues. If time is on your side, it is typically better to make only one change at a time. Some of the typical steps to consider are:
- A readiness assessment
- Tool evaluation and selection
- Defining the scope of the project and the process steps
- Estimating durations of the process steps
- Planning the project
- Allocating resources

Each project is different, but there are some factors that are good indicators of the overall effort. For instance, for applications that frequently use stored procedures, the number and complexity of the stored procedures to be converted greatly affect the length of the application transition. Another area might be the use of times and dates: each DBMS has different internal formats and display techniques. Physical requirements, such as the use of raw and cooked disk areas, spaces, and nodes, can also represent a large amount of work, especially as data grows significantly over time.

A transition plan can be as simple as a spreadsheet that lists the primary tasks along with some of the associated information for each task, such as start date, end date, elapsed time, dependencies, and who is responsible. There are also project planning tools available that are specifically designed to plan and track projects. These tools let you assign tasks, establish dependencies among the various steps (for instance, you cannot start testing until you move the database structure and the test data), and chart the original plan against the in-process and completed activities.

10.1.1 Readiness assessment and scope

Planning a transition project begins with an assessment of the environment, the size of the project, and an understanding of the resources that can be used.

An accurate profile of the system architecture is key to success. The following is a list of some of the considerations that require attention:
- What characterizes the workload type mix: standardized reports or ad hoc queries?
- What languages are used for the applications? For example, Java, C, C++, and Visual Basic.
- What is the target operating environment? Specifically, such things as the operating system, version, release, and fix pack level.
- What is the target server hardware platform? It could be IBM pSeries, Compaq, HP, or Sun™, as examples.
- What is the typical configuration of the database server? For example, the number of boxes, number of processors, size of RAM, and disk capacity.

10.1.2 Tool evaluation

Although a migration can be performed without the help of tools, IBM has created the DB2 Migration ToolKit (MTK). It is specifically designed to make the transition from XPS as fast and as easy as possible. There might be special circumstances that also warrant the use of a third-party tool in conjunction with the MTK; that decision will be part of the evaluation and selection process.

The IBM MTK can be used to generate DDL scripts to create database objects such as tables, indexes, views, triggers, stored procedures, and so on. It also aids in moving data from XPS to DB2. For example, it can either connect directly to the source system and perform its own extraction of the DDL, or it can use a syntactically valid SQL script extracted by other tools, such as dbschema. For more information about the MTK, see Database Transition: Informix Dynamic Server to DB2 Universal Database, SG24-6367.

10.1.3 Estimating project duration

An accurate estimation of the duration of any project activity includes considerations for the scope of the project, the resources needed, and knowledge of the products, applications, and transition plan.

The MTK can also be used as an assessment tool to determine the complexity of the migration. Using the MTK in this manner, for example, reveals stored procedures or SQL that might require manual intervention. However, a good working knowledge of the MTK is mandatory. Estimating the duration of activities also heavily depends on the skill level of those performing the work.

The cost of new software and migration tools should also be considered when estimating the total project cost. For example, the MTK is provided free of charge, and DB2 can also be significantly less expensive than competitive products. Contact your IBM Sales Representative for more details.

Training costs, for IT staff and for users, should also be factored into the project plan. A good course for the experienced DBA is the IBM "Fast Path to DB2 for experienced relational DBAs" course, code CF281. See IBM Learning Services for more details:

http://www-3.ibm.com/services/learning/

In addition, hardware requirements must also be planned if your existing data server does not have the capacity to run the existing XPS instance, the IBM MTK, and DB2 simultaneously. If you are moving to a different or an upgraded platform, it will take time to accurately determine sizing requirements and then properly configure the system to adequately support DB2.

The IBM Software Migration Project Office (SMPO) can provide sizings and estimates. Contact the SMPO at:

http://www-3.ibm.com/software/solutions/softwaremigration/

10.2 Data conversion

Data conversion is a critical task in a transition project because you must ensure that all data is moved to the target database with integrity and in a timely fashion. There are a number of processes to help with this, including:
- Using the MTK to generate scripts and files or to move data online.
- Exporting XPS data manually to flat files and importing or loading them into DB2.
- Exporting the data through pipes.
- Using WebSphere Information Integrator.
- Using an alternative data conversion product.

10.2.1 Preparation overview

This section briefly discusses the steps that are required to prepare the DB2 target environment to receive the data from the XPS source. The steps include installation of DB2, instance and database creation, and table space planning. Before attempting DB2 installation, we strongly recommend that you read the installation instructions provided in Quick Beginnings for DB2 Servers, GC09-4836. See your IBM representative to get this document and the latest fix packs for DB2.

The following are some of the tasks that are required to prepare the DB2 environment:

1. DB2 V8 installation
   The first task is to decide which edition of DB2 fits your business requirements. The application assessment step provides you the base criteria for selection. Regardless of the platform, you need to verify whether the system satisfies the hardware and software requirements. For the DB2 installation process, we used both AIX and Windows as the platforms for installation in this redbook. Installing DB2 ESE on a platform such as Sun Solaris or HP-UX requires modifying the operating system kernel parameters. A system reboot is required afterwards.

2. Additional software requirements
   There are software considerations when running a database environment. These considerations revolve around the type of applications accessing the database. If database development is desired, then the proper versions of the different software components, such as C or Java compilers, must also be installed on the server.

3. Instance and database creation
   The next task to perform is to create a DB2 instance and database. You need to consider details such as instance and database location, user IDs that are required, and permission requirements.

4. Table space planning
   When the database has been created, it becomes ready for object creation, which requires space planning for the data. DB2 allows for two types of table spaces: System Managed Space (SMS), which is maintained by the system, and Database Managed Space (DMS), which is maintained by the DB2 administrator. A minimal example of creating each type follows this list. If you are satisfied with the way the data is organized in the source database, you can look for a generally compatible way to organize the data in the DB2 database. There are two key caveats to remember, however. First, the ability to intelligently fragment data in a DB2 instance is still under development. The other caveat is that without light scans, additional indexes will most likely be required. However, some of this can be mitigated through the use of DB2 Cube Views technology. DB2 Cube Views provide a multidimensional interface to Materialized Query Tables (MQTs) as well as Multidimensional Clustering (MDC) tables. With MQTs and MDCs, data can be pre-aggregated along traditional relational boundaries such as time, geography, or product. Cube Views can be created on top of these objects to leverage their relationships with each other to provide answers to business questions. These questions can be easily phrased through SQL statements against the Cube Views. The DB2 optimizer's query rewrite functionality automatically translates and executes the statements against the appropriate source table underneath the Cube View. Planning the logical and physical implementation of the database in DB2 can provide an opportunity for you to change the way the data is placed, which can have an impact on performance and maintenance. The DB2 manuals Administration Guide: Planning, SC09-4822; Administration Guide: Implementation; and Administration Guide: Performance, SC09-4821, provide detailed information about MQTs, MDCs, and Cube Views.

10.2.2 Data conversion process

The data conversion process can become quite complex, depending on the degree of customization. Before settling on a transfer method, you should test with only a portion of the data to validate the selected method. The tests could include a number of potential situations. Therefore, it is highly recommended that you start testing early.

Some of the typical tasks of a test phase include:
- Calculate the source data size and the space needed for unloading files to disk
- Select the tools and the conversion method
- Test the conversion using the chosen method with a small amount of data

With the results of some simple testing, you should be able to:
- Estimate the time for the complete data conversion process
- Create a plan for the development environment conversion
- Create a plan for the production environment conversion
- Schedule the transition dates and duration

The following can influence the time and complexity of the process:
- Volume of data and data changes: The more data you have to move, the more time you need. Consider the data changes as well as such potential issues as timestamp conversions. Converting the data from its native binary format to ASCII and then into binary within the new database environment requires significantly more time than a direct binary transfer.
- System availability: You can execute the data movement activities either while the production system is down, or while the business process is running by synchronizing the source and target databases. The strategy you choose will determine whether you need less or more time.
- Hardware resources: Be aware that you might need three times or more disk space during the data movement for:
  – The source data in XPS
  – The unloaded data stored in the file system
  – The data loaded into the target DB2

10.2.3 Time planning

After testing the data movement and choosing the proper tool and strategy, you should create a detailed time plan that includes some of the following tasks:
- Learn and test the selected data movement tools
- Implement or modify scripts for data unload and load
- Unload data from XPS
- Load data to DB2
- Back up the target database
- Test the loaded data for completeness and consistency
- Modify and switch the applications, including database interfaces
- Plan a fallback process

One of the more sensitive environments is a production system with a high availability requirement. Figure 10-1 depicts one approach for moving data to the target database in a high availability environment. The dark areas represent new data, and the lighter areas represent data that has been converted and moved. If possible, export the data from a standby database or mirror database to minimize the impact on the production environment. The following are some of the tasks to be considered:
1. Create scripts that export all data up to a defined timestamp.
2. Create scripts that export changed data since the last export. This includes new data as well as deleted data.
3. Repeat step 2 until all data is moved to the target database.
4. Define a fallback strategy and prepare fallback scripts.

Figure 10-1 Data movement strategy in a high availability environment

When the data is completely moved to the target database, you can switch the application and database. Prepare a well-defined rollout process for the applications, and allow time for unplanned incidents.

10.2.4 The database structure

Before data can be moved, the differences in database structures must be considered. These differences could be a result of different interpretations of SQL standards, or the addition or omission of particular functions. The differences can often be fixed syntactically, but in some cases, you must add functions or modify the application.

One way to describe a database structure is with an Entity-Relationship (ER) model of the data. The model describes the meaning of each entity, the relationships that exist, and the attributes. From this model, the SQL (DDL) statements that can be used to create the database can be captured. If the database structure is already in the form of ER metadata (that is, an ER modeling tool was used in the design of the system), it is often possible to have the modeling tool generate a new set of DDL that is specific to DB2. Otherwise, the DDL from the current system must be captured and then modified into a form that is compatible with DB2. After the DDL is modified, it can be loaded and executed to create a new database (tables, indexes, and constraints) in DB2.

There are three approaches that can be used to move the structure of a DBMS:
- Manual: Create the structure in DB2 by hand, and manually adjust for issues.
- Metadata transport: Extract the metadata (often called the schema) and import it to DB2.
- Migration tools: Use a tool to extract the structure, adjust it, and then implement it in DB2.

10.2.5 Data movement approaches

There are a number of ways to accomplish data movement during the process. We give a brief overview of some of them in the following list:
- Through flat files
  Before writing data into a flat file, ensure that the maximum file size of your operating system is big enough to hold the exported files. On UNIX systems, this is typically checked and adjusted with the ulimit command. The data can then be loaded or imported into DB2 with any tool or application desired.
- Using the MTK
  The MTK allows you to move data through its GUI online and without generated scripts. When moving data online, the MTK does not use the generated scripts when deploying the load or import from the GUI. For information about the MTK, see Chapter 12, "DB2 Migration ToolKit for Informix" on page 345.
- With named pipes
  You might need additional disk space to execute the data movement process, depending on which method you use. To avoid the extra space required for flat files when using UNIX-based systems, you can use named pipes. To use this function, the writer and reader of the named pipe must be on the same machine. In addition, you must create the named pipe on a local file system before exporting data from the XPS database. Because the named pipe is treated as a local device, there is no need to specify that the target is a named pipe. Here is an AIX example:
  a. Create a named pipe:
     mkfifo /u/dbuser/mypipe
  b. Use this pipe as the target of the data unload operation:
     <unload command> > /u/dbuser/mypipe
  c. Load data into DB2 from the pipe:
     <load command> < /u/dbuser/mypipe
  The commands in steps b and c only show the principle of using the pipes.

Note: It is important to start the pipe reader after starting the pipe writer. Otherwise, the reader finds an empty pipe and exits immediately.
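  As a more concrete sketch, assuming a table named customer exists in both databases and pipe-delimited unload output (verify the XPS UNLOAD statement and the DB2 IMPORT modifier against your versions):

     echo "UNLOAD TO '/u/dbuser/mypipe' DELIMITER '|' SELECT * FROM customer;" | dbaccess mydb &
     db2 "IMPORT FROM /u/dbuser/mypipe OF DEL MODIFIED BY COLDEL| INSERT INTO customer"

  The dbaccess session is started in the background as the pipe writer, and the DB2 IMPORT then reads from the pipe; as the note above states, the writer must be started first.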

- Third-party tools
  There is another class of tools called ETL tools (extract, transform, and load). These tools are specifically designed for taking data out of a source, transforming or translating it, and then loading it into a target. There are several tools that work with both XPS and DB2, including IBM WebSphere DataStage and Informatica PowerMart.

10.2.6 WebSphere Information Integrator

WebSphere Information Integrator is technology that allows DB2 to federate disparate data sources. Federation means that data can reside in different sources but appear as a single source.

In a high availability environment, you might have to move the data during production activity. A practical solution is the replication facility of WebSphere Information Integrator.

IBM DB2 Information Integrator provides integrated, real-time access to diverse data as though it were a single database, regardless of where it resides. As a result, you can hold the same data both in XPS and in DB2. You are then free to switch to the new DB2 database when the functionality of the ported database and application is guaranteed.

DB2 replication server (formerly known as DB2 Data Propagator, now known as DB2 SQL Replication) lets users manage data movement strategies between mixed relational data sources, including distribution and consolidation models.

Data movement can be managed table-at-a-time, such as for data warehouse loading during batch windows, or with transaction consistency for data that is never offline. It can be automated to occur on a specific schedule, at designated intervals, continuously, or as triggered by events. Transformations can be applied in-line with the data movement through standard SQL expressions and stored procedure execution. For porting data, you can use the replication server to support data consolidation, moving data from XPS to DB2.

For more information about replication, see A Practical Guide to DB2 Data Replication V8, SG24-6828.

Note: The new DB2 Q-Replication mechanism from IBM supports only DB2 and does not work with XPS. On the other hand, you can use SQL Replication with XPS.

10.2.7 Modifying the application

While transitioning the database structure and objects can be automated to some extent, application code changes mostly require manual conversion. If all database interaction is restricted to a database access layer, then the scope and complexity of the necessary changes is well defined and manageable. However, when database access is not isolated to a database access layer (that is, it is distributed throughout application code files, contained in stored procedures or triggers, or used in batch programs that interact with the database), then the effort required to convert and test the application code depends on how widely distributed the database access is and on the number of statements in each application source file that require conversion.

It is important to first migrate the database structure (DDL) and database objects (stored procedures, triggers, and user-defined functions). It is then useful to populate the database with a test set of data so that the application code can be ported and tested incrementally.

Few tools are available to transition actual application code because much of the work is dependent upon vendor-specific issues. These issues include adjustments to logic to compensate for differing approaches to transaction handling, join syntax, use of special system tables, and use of internal registers and values. Manual effort is normally required to make and test these adjustments. Often, proprietary functions used in the source DBMS will have to be emulated under DB2, usually by creating a DB2 user-defined function or stored procedure with the same name as the proprietary one being ported. This way, any SQL statements in the application code that call the proprietary function in question will not need to be altered. The IBM MTK is equipped with some of the most commonly used vendor-specific functions and will automatically create a DB2-equivalent function (or stored procedure) during the migration process.
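For example, a minimal sketch that emulates the Informix TODAY value with a DB2 SQL function (the function is illustrative of the technique; the MTK generates such compatibility functions automatically, and invocation syntax differences, such as the parameterless TODAY keyword, can still require edits):

   CREATE FUNCTION today() RETURNS DATE
      LANGUAGE SQL
      CONTAINS SQL
      NOT DETERMINISTIC
      RETURN CURRENT DATE

With this function in place, SQL statements that invoke today() can run against DB2 without being rewritten.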

Another issue when porting high-level language code (such as C, C++, Java, and COBOL) involves compiler differences. Modifications to the application code might be required if a different compiler or object library is used in the DB2 environment. This might be caused by the selection of a different hardware or OS platform. It is vital to fully debug and test such idiosyncrasies before moving a system into production.

10.2.8 Database objects and interfaces

This section describes the database objects and interfaces that are encountered during the modification of the applications.

Database objects

Database objects such as stored procedures, triggers, and user-defined functions can be part of the application logic that is contained within the database. Most of these objects are written in a language that is very specific to the source DBMS, or are written in a higher-level language that then must be compiled and somehow associated or bound to the target DBMS for use.

XPS supports stored procedures and triggers, but does not support user-defined functions. UDFs can be used in DB2 to add to or enhance application functionality.

Note: The addition of these kinds of objects requires testing. This might mean that test data is needed and must be populated into the database structure before testing can occur.

Database interfaces

Applications that connect to the source database using a standardized interface driver, such as ODBC and JDBC, usually require few changes to work with DB2. In most cases, simply providing the DB2-supported driver for these interfaces is enough for the application to run with a DB2 database.

There are certain circumstances where the DB2-supported driver for an interface does not implement or support one or more features specified in the interface standard. It is in these cases where you must take action to ensure that application functionality is preserved after the transition. This usually involves changing application code to remove references to the unsupported functions and either replacing them with supported ones, or simulating them by other means.

Applications that use specialized or native database interfaces will require application code changes. Such applications can be ported using the DB2 native CLI interface, or by using a standardized interface such as ODBC or JDBC. If porting to CLI, many native database-specific function calls will need to be changed to the CLI equivalents. This is not usually an issue as most database vendors implement a similar set of functions. The DB2 CLI is part of the SQL standard and mappings of functions between other source DBMSs and DB2 CLI can be found in the applicable DB2 porting guide.

DB2 also provides a library of administrative functions for applications to use. These functions are used to develop administrative applications that can administer DB2 instances, backup and restore databases, import and export data, and perform operational and monitoring functions. These administrative functions can also be run from the DB2 Command Line Processor (CLP), Control Center, and DB2 scripts.

The following lists some of the common interfaces that are used with DB2:

- JDBC and SQLj
  DB2 provides several JDBC drivers to write dynamic SQL programs in Java. DB2 provides support for the Type 2, Type 3, and Type 4 drivers. SQLj offers developers a way to write static SQL programs using Java. SQLj programs generally outperform their JDBC counterparts because the query access plans of executable statements have been optimized before run time.

- Embedded SQL (static and dynamic)
  DB2 provides the option of writing applications with SQL statements directly embedded within the host language. The SQL statements provide an interface to the database while the host language provides facilities to perform the application logic. DB2 supports several host languages, including C/C++, COBOL, and Java (SQLj). Programmers have the option of using static or dynamic SQL, depending on the nature of the application.

- ODBC
  Microsoft's ODBC standard provides a set of APIs for accessing a specific vendor database. Vendors must supply their own driver that implements a subset of the API for accessing the database. The DB2 CLI driver can be used on its own to access a DB2 database or as an ODBC driver. DB2 conforms to most of the Level 3 compliance level for ODBC.

- ADO and OLE DB
  Microsoft's ActiveX Data Objects provide a set of methods for accessing data from a wide variety of data sources, including relational databases, HTML, video, text, and just about any other source of data. Access to the data is handled by ADO and is accessed through a service such as OLE DB or ODBC. In order to use OLE DB with DB2, you must first download the newest driver from the OLE DB Web page.

- Microsoft .NET
  A Microsoft development platform that competes with the J2EE standard. The .NET framework programming model enables developers to build Web-based applications, smart client applications, and XML Web services applications, which expose their functionality programmatically over a network using standard protocols such as SOAP and HTTP. Full support for the .NET standard for DB2 is on the way. Currently, IBM is offering a .NET driver program for developers interested in writing applications using the .NET standard.

- Perl DBI
  DBI is an API that provides database access for client applications written in Perl. DBI defines a set of functions, variables, and conventions that provide a platform-independent database interface.

- DB2 CLI
  The DB2 CLI driver implements most of the function set defined in the ODBC standard, as well as additional functionality specific to DB2. This interface offers more available functionality than the other non-embedded driver options.

- Stored procedures
  Another popular method of interfacing with the database is through stored procedures. Stored procedures can be written in the DB2 SQL procedural language, or in an external programming language such as C/C++ or Java. Restricting database access through stored procedures offers numerous benefits, such as a reduction in network traffic (all processing takes place on the server) and an additional layer of isolation between the application code and business logic.

10.3 After the transition

In addition to the fundamentals of database administration, do not forget these other key administrative functions:

- Backup and recovery
  To prevent loss of data, you should have an adequate strategy for saving the data in your databases and for being able to restore it in case of a failure or error. The larger the amount of data you have, the longer and more sophisticated your strategy will become, especially if you need your database always online.

- Replication
  The DB2 replication feature enables you to transfer data between systems for the purpose of building new data stores, and duplicating all, or some, of the original data in another DBMS.

- High availability
  High availability ensures that your data sources are always available for use, even if there is a hardware or software failure. This concept is closely related to the backup and recovery process, and is often implemented at the same time. The most recent release of DB2 includes a new feature called High Availability Disaster Recovery (HADR), which is a port of the Informix Dynamic Server High Availability Data Replication (HDR) functionality. With HADR, a hot-site immediate fail-over copy of the database environment can be maintained to significantly reduce the risk of system downtime. Implementing HADR does require a full duplicate of the primary environment, including hardware, operating system, and database software.

- Federation
  Federation allows access to data in numerous, heterogeneous data stores from one query. This section introduces the federation concepts and guides you to resources that will assist in creating a federated design.

This chapter presented a brief overview of some of the alternatives that are available for transitioning. IBM also has a full set of offerings available to help you. If you want IBM to perform some or all of your transition activities, contact your IBM representative.



Chapter 11. Application conversion considerations

This chapter provides information about general application conversion considerations, along with some of the specific application development differences between DB2 and XPS. It gives the basics that you need when considering an application conversion project.

For more detailed information about this topic, refer to Chapter 12 of Database Transition: Informix Dynamic Server to DB2 Universal Database, SG24-6367. In general, there are very few differences between IDS and XPS when it comes to application development methodologies. Some of the most representative sections from that redbook are duplicated here for ease of use and a quick comparison.

11.1 Key considerations

In almost every situation, an application transition effort will require changes to the source code. The degree of difficulty in an application transition or conversion from one database environment to another is very much dependent on how modular the application is, as well as on any coding that takes advantage of the database vendor's extensions to standards. Almost all database vendors support some of the common programming languages, such as C, C++, and Java.

While some standards exist that provide guidance for using a common set of interfaces to the database backend, most database vendors have provided extensions to enable the application to take advantage of native capabilities of the particular database engine. Thus, it is possible for an application transition effort to become much more difficult than a database transition.

In addition to applications that were coded in-house, any purchased products in use might require attention as well. In general, these products transition more easily because the application provider typically has ports for the target database. However, in situations where customizations were made to take advantage of specific database extensions, there is most likely still work to do.

11.2 Application transitioning from XPS to DB2

XPS is a very specialized database management system. While there is nothing to prevent highly complex applications from being developed to interact with the XPS engine, the typical nature of most applications that work with a data warehousing engine such as XPS is to perform analytics. As such, there is much more emphasis on SQL than any complicated application programming logic.

Most XPS SQL statements translate to DB2 fairly easily, because they are based on the ANSI standard. Workarounds exist for some of the Informix extensions but not all of them. ESQL/C and JDBC/ODBC applications should translate easily also. If you have 4GL applications that have been written to interact with XPS, transitioning those to the EGL environment is necessary to work with DB2. For help with that effort, refer to Transitioning Informix 4GL to Enterprise Generation Language (EGL), SG24-6673.

11.3 Transactions

Both XPS and DB2 support transactions in terms of the ACID capabilities. ACID is an acronym for the following terms:
- Atomic
- Consistency
- Isolated
- Durable

DB2 supports only the XPS equivalent of log-mode ANSI databases. Any migration from a non-ANSI XPS database requires modification of all SQL to fully qualify any database object that is referenced in those statements.

Another issue with non-ANSI versus ANSI database, from a transaction perspective, is the need to explicitly initiate the transaction. For example, the SQL statement BEGIN WORK indicates to a non-ANSI database that a transaction is about to start. This statement is not applicable to ANSI mode databases. Thus, DB2 is not aware of it and returns an error to the application if it is not removed.

Transactions in DB2 are started implicitly with the first executable SQL statement. In general, the scope of a transaction in DB2 is a set of SQL statements that terminate with a COMMIT or ROLLBACK statement. Depending on the application development language that is used, there might be a need to set the AUTO-COMMIT parameter to OFF to prevent DB2 from committing after each SQL statement.
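For example, a minimal sketch from the DB2 CLP, where the +c option turns autocommit off (the table and values are illustrative):

   db2 +c "UPDATE accounts SET balance = balance - 100 WHERE id = 1"
   db2 +c "UPDATE accounts SET balance = balance + 100 WHERE id = 2"
   db2 commit

Both updates remain part of one open unit of work until the explicit COMMIT; issuing db2 rollback instead would undo both.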

If XPS applications terminate without an explicit commit or rollback, the database engine automatically rolls back the transaction. However, DB2 might commit transactions when applications end normally, except on Windows platforms.

11.4 Savepoints

Savepoints are a transactional extension of DB2 that allow you to roll back a transaction, not completely, but to a well-defined point within the transaction flow. Figure 11-1 shows a schematic diagram of how a savepoint affects a transaction.

Figure 11-1 Transaction flow when rolling back a transaction to a savepoint

It is possible to set several savepoints. Doing so allows you to roll back to different states of the transaction flow in your application. You can always roll back to a savepoint that was set before the current position in the transaction flow. However, when a transaction has been rolled back to a savepoint, you cannot roll back to a savepoint that was set after the savepoint you have currently reached.

Figure 11-2 presents a schematic illustration of this behavior.

Figure 11-2 Multiple savepoints with rollback commands

Figure 11-3 shows the syntax for setting a savepoint in a DB2 application.

SAVEPOINT savepoint-name [UNIQUE] ON ROLLBACK RETAIN CURSORS [ON ROLLBACK RETAIN LOCKS]

Figure 11-3 Syntax to set a savepoint in DB2
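The following is a minimal sketch of savepoint usage in SQL (the customer table is the hypothetical one used in examples later in this chapter):

INSERT INTO customer (fname, lname) VALUES ('Uwe', 'Weber');
SAVEPOINT sp1 ON ROLLBACK RETAIN CURSORS;
INSERT INTO customer (fname, lname) VALUES ('Mark', 'Scranton');
ROLLBACK TO SAVEPOINT sp1;
-- only the second INSERT is undone; the first remains part of the transaction
COMMIT;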

11.5 Locks and isolation levels

The lock mechanisms in XPS and DB2 are for the most part very similar, except that there is no concept of page level locks in DB2. However, DB2 automatically escalates locks internally to minimize the number of locks that an application holds. In addition, there is also the concept of a table space lock. However, this lock is typically the result of an internal operation.

As with XPS, locks in DB2 can be held in shared, update, and exclusive modes on database objects. In addition, there is another set of intent, weak, and super exclusive locks that are placed automatically on these objects depending on the nature of the activity. You can find a detailed description of these lock modes in the product documentation.

11.5.1 Lock escalation
Lock escalation is an internal mechanism that reduces the number of locks that are held. In a single table, locks can be escalated to a table lock from many row locks or, for Multidimensional Clustering (MDC) tables, from many row or block locks. Lock escalation occurs when applications hold too many locks of any type. It can also occur for a specific database agent if the agent exceeds its allocation of the lock list. Such escalation is handled internally. The only externally detectable result might be a reduction in concurrent access on one or more tables.

In a properly configured database, lock escalation should occur infrequently. If escalation does occur, you should monitor it to determine the root cause.
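For example, assuming a database named MYDB (the name is an assumption), the database snapshot exposes escalation counters that you can watch:

$ db2 get snapshot for database on mydb | grep -i escal
Lock escalations                     = 0
Exclusive lock escalations           = 0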

11.5.2 Deadlocks
Deadlocks occur when several sessions hold locks that the other sessions are waiting for, so that none of them can proceed. Each XPS coserver monitors and prevents potential deadlock situations by sending an error message to the application if a request for a lock would result in a deadlock. DB2 also has deadlock detection, but it is not deterministic regarding which session is rejected when a deadlock occurs.

11.5.3 Isolation levels
Isolation levels control when and what kind of locks are set on a database object such as a row, page, or table. DB2 implements the ANSI SQL requirements.

For your convenience, Table 11-1 on page 327 shows the comparison of XPS and DB2 isolation levels. The following sections provide a more detailed description.

Table 11-1 XPS and DB2 isolation levels

XPS DB2

Repeatable Read (RR) Repeatable Read (RR)

Cursor Stability (CS) Cursor Stability (CS)

N/A Read Stability (RS)

Committed Read (CR) N/A

Dirty Read (DR) Uncommitted Read (UR)

The default isolation level for an XPS database (non log-mode ANSI) is Committed Read. Most applications use this standard mode because it needs relatively few lock resources while providing relatively high transaction encapsulation. DB2 has no exact equivalent to Committed Read.

The DB2 default isolation level is Cursor Stability. DB2 offers different techniques for changing the isolation level.

Changing isolation levels in SQL
To set or change the isolation level for a single statement, you can use the WITH clause at the end of a SELECT statement. The SET ISOLATION statement can change the isolation level for the duration of an entire transaction or unit of work (UOW).

Example 11-1 shows the variants that are used to change isolation levels in SQL.

Example 11-1 Changing the isolation levels in SQL
SELECT ... FROM ... WITH {UR|RR|CS|RS};

SET ISOLATION={UR|RR|CS|RS};

11.6 Packages

The concept of packages is something completely new for an XPS application developer. Packages contain the compiled query plans for the Static SQL statements of an application that runs against DB2. Understanding packages is important when working with embedded SQL languages, such as ESQL/C. The query plan in the package is created at compile time. Therefore, the quality of the query plan depends on the accuracy of the optimizer statistics at compile time.

A package is represented by a file, called the bind file, that is created during the pre-compile procedure. Bind files that are created during the compilation process must be submitted to the database before the applications can run successfully. You can accomplish this with the bind command (see 11.6.2, “Binding” on page 329). Figure 11-4 depicts the process of getting from the source code (in this example, ESQL/C) to the database-ready application.

Figure 11-4 Steps to perform from source code to application

11.6.1 Static versus Dynamic SQL
There is a slight difference between XPS and DB2 in terms of the definition of Static and Dynamic SQL. DB2 defines Static SQL as SQL that is fully defined at compile time. The only information that can be specified at run time is the set of values for any host variables referenced by the statement. However, host variable information, such as data types, must still be precompiled. Anything else is dynamic. Consider Example 11-2, which depicts example code segments from an ESQL/C program.

Example 11-2 ESQL/C code segments
Segment 1:
EXEC SQL DECLARE mycur CURSOR FOR SELECT COUNT (*) FROM systables;

Segment 2:
strcpy (myhostvar, "SELECT colname FROM syscolumns WHERE colname NOT LIKE 'sys%'");
EXEC SQL PREPARE mypstmt FROM :myhostvar;

Both SQL statements would be considered Static SQL in XPS, while DB2 handles Segment 2 as Dynamic SQL because the SQL string does not appear explicitly in an EXEC SQL line. It is copied into a host variable first and then passed to the EXEC SQL line.

11.6.2 Binding
Binding is the procedure that passes a bind file to the database, where it is stored as a package. Binding packages requires the user to have the necessary privilege (BIND).

It is beyond the scope of this document to detail the various options of the bind command. However, Example 11-3 shows the syntax of the command.

Example 11-3 DB2 BIND syntax
BIND {filename | @filelist}
  [ACTION {ADD | REPLACE [RETAIN {YES | NO}] [REPLVER version-id]}]
  [BLOCKING {UNAMBIG | ALL | NO}]
  [CLIPKG number-of-packages]
  [COLLECTION collection-id]
  [DATETIME {DEF | USA | EUR | ISO | JIS | LOC}]
  [DEGREE {1 | degree-of-parallelism | ANY}]
  [DYNAMICRULES {RUN | BIND | INVOKERUN | INVOKEBIND | DEFINERUN | DEFINEBIND}]
  [EXPLAIN {NO | YES | REOPT | ALL}]
  [EXPLSNAP {NO | YES | REOPT | ALL}]
  [FEDERATED {NO | YES}]
  [FUNCPATH schema-name [{,schema-name} ...]]
  [GENERIC string]
  [GRANT {authid | PUBLIC}]
  [GRANT_GROUP group-name]
  [GRANT_USER user-name]
  [INSERT {BUF | DEF}]
  [ISOLATION {CS | RR | UR | RS | NC}]
  [MESSAGES message-file]
  [OWNER authorization-id]
  [QUALIFIER qualifier-name]
  [QUERYOPT optimization-level]
  [REOPT {NONE | ONCE | ALWAYS}]
  [SQLERROR {NOPACKAGE | CHECK | CONTINUE}]
  [VALIDATE {RUN | BIND}]
  [SQLWARN {NO | YES}]
  [STATICREADONLY {NO | YES}]
  [TRANSFORM GROUP transform-group]

Package isolation level
When using the bind command, it is also possible to specify the default isolation level for the package. All SQL statements of the package run at the isolation level that is specified, except for those that set their own isolation level.
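For example, to bind the bind file from Example 11-6 with Repeatable Read as the package default isolation level (a sketch):

$ db2 bind myexample.bnd isolation RR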

Rebinding
The bind file contains the Static SQL statements of an application. During the bind process, the query plans for the SQL are calculated. The calculation of query plans depends on the statistics to which the optimizer has access at the time of compilation.

The rebind command recalculates the query plans of the Static SQL in the application based on the current statistics. Figure 11-5 shows the REBIND command syntax. In general, it is good practice to rebind a package after you execute runstats. Doing so guarantees that the Static SQL statements in the application use the most appropriate query plan.

REBIND [PACKAGE] package-name [VERSION version-name] [RESOLVE {ANY | CONSERVATIVE}]

Figure 11-5 The REBIND command syntax
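For example (a sketch; the table name is an assumption), after refreshing statistics you might rebind the package that was created in Example 11-6:

$ db2 runstats on table db2inst2.customer and indexes all
$ db2 rebind package myexample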

Depending on the number of packages that are stored in a database, the rebind process can become an administrative issue. To keep it simple, you can use the db2rbind command, whose syntax is shown in Example 11-4, to rebind all existing packages.

Example 11-4 Syntax of db2rbind
Usage: db2rbind database-alias -l logfile [all] [-u userid -p password] [-r {conservative | any}]

Package versions
DB2 V8 introduced the ability to store more than one package for the same application in a database. The advantage of this is that applications can work with a proven version while others test a new version of a package. The versioning of packages is reflected by additional parameters in all the bind utilities and statements described here.
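For example, a sketch of adding a second version of a package (the VERSION precompile option and the ACTION bind option carry the version information):

$ db2 prep myexample.sqc bindfile version 1.0
$ db2 bind myexample.bnd action add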

11.7 Cursors

Cursor management in DB2 is very similar to that of XPS, with the following differences:
 Hold cursors in DB2 are closed after a ROLLBACK WORK statement, as shown in the sketch after Example 11-5.
 There is no implicit support for scrollable cursors. However, applications written in ODBC/CLI or Java/JDBC can call the built-in functions and methods provided to use a scroll cursor.
 All allocated resources for a cursor are freed automatically upon closing the cursor.
 DB2 currently does not support Insert cursors. However, the INSERT SQL statement does provide the capability to perform batch inserts by specifying values for multiple rows in a single statement. Example 11-5 shows a batch insert statement.

Example 11-5 Batch insert
INSERT INTO customer (fname, lname)
VALUES ('Uwe', 'Weber'), ('Mark','Scranton'), ('P.','Frampton'), ('Mutt', 'Bonkey');
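The following embedded SQL sketch (hypothetical cursor and host variable names) illustrates the hold cursor behavior noted in the first difference above:

EXEC SQL DECLARE c1 CURSOR WITH HOLD FOR SELECT fname, lname FROM customer;
EXEC SQL OPEN c1;
EXEC SQL FETCH c1 INTO :fname, :lname;
EXEC SQL COMMIT;    /* c1 remains open and keeps its position */
EXEC SQL ROLLBACK;  /* in DB2, the rollback closes c1 */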

11.8 Stored procedures

Stored procedures are typically small pieces of code that are used to encapsulate business logic to ensure consistency in the application of such logic. As with its counterpart, Informix Dynamic Server, XPS provides an SQL-based language with some language control and cursor management capabilities.

DB2 provides a robust environment for stored procedures, which can be written in SQL PL (an extension that is provided by DB2) as well as in external programming languages, such as C or Java. SQL PL offers a host of extensions that emulate programming constructs such as loops and cursors. It also allows you to work with dynamic SQL statements.
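As a small illustration, the following SQL PL sketch (hypothetical procedure name, reusing the customer table from earlier examples) returns a row count through an OUT parameter:

CREATE PROCEDURE count_customers (OUT p_count INTEGER)
LANGUAGE SQL
BEGIN
  -- SELECT INTO assigns the single-row result to the OUT parameter
  SELECT COUNT(*) INTO p_count FROM customer;
END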

SYSDBOPEN and SYSDBCLOSE procedures
Currently, DB2 does not have any equivalent mechanism to the XPS SYSDBOPEN and SYSDBCLOSE stored procedures, and, at this writing, there is no effective way to simulate them.

11.9 Programming languages

This section presents some general information regarding certain programming languages and their database interfaces. This is not an exhaustive discussion, but simply provides some information of which you should be aware when porting an application from XPS to DB2.

11.9.1 ESQL/C
ESQL/C is one of the most popular programming language interfaces found in the Informix environment. ESQL/C lets the programmer write SQL statements in the C source code.

A precompiler is needed to convert the SQL statements into regular C function calls. After the conversion, you can compile the resulting C file into an executable or library. Example 11-6 demonstrates the steps you take to create an ESQL/C application in DB2 (see also Figure 11-4 on page 328).

Example 11-6 Steps to perform from source code to application
$ db2 connect to ifmx2db2

   Database Connection Information

 Database server       = DB2/6000 8.2.0
 SQL authorization ID  = DB2INST2
 Local database alias  = IFMX2DB2

$ db2 prep myexample.sqc bindfile

LINE MESSAGES FOR myexample.sqc
SQL0060W The "C" precompiler is in progress.
SQL0091W Precompilation or binding was ended with "0" errors and "0" warnings.

$ db2 bind myexample.bnd

LINE MESSAGES FOR myexample.bnd
SQL0061W The binder is in progress.
SQL0091N Binding was ended with "0" errors and "0" warnings.

$ gcc -I$HOME/sqllib/include myexample.c -o myexample -L$HOME/sqllib/lib -ldb2

Host variables
C variables that are used in SQL statements are called host variables. Host variables are declared in a special section of an application, called the declare section. The way host variables are declared is identical in XPS and DB2, as shown in Example 11-7.

Example 11-7 Host variable declaration in DB2 ESQL/C
[...]
EXEC SQL BEGIN DECLARE SECTION;
sqlint32 rowcount;
char dbname[9];
EXEC SQL END DECLARE SECTION;
[...]

Even if the declaration of host variables is identical, you should be aware of the following differences in what types of variables you can declare in the section:
 Integer variables
If you want to declare an integer variable, you have to use sqlint32 or sqlint16. C int variables are not accepted by the DB2 precompiler.
 Pointers (C strings)
In XPS, you can declare a pointer to a C string, as shown in Example 11-8.

Example 11-8 Dynamic C string declaration in Informix
EXEC SQL BEGIN DECLARE SECTION;
char *mycstr;
EXEC SQL END DECLARE SECTION;

DB2 allows you the same declaration, but it has a different meaning. It does not declare a pointer to the first character of a null-terminated C string; it declares a pointer to a single char. To achieve behavior similar to XPS, you have to use the SQLDA structure, which allows dynamically created C string host variables.

 Assignments
As with XPS, you can assign values to host variables in the declare section. Note the C string declarations and assignments that are shown in Example 11-9.

Example 11-9 DB2 version of assignments in declare section
EXEC SQL BEGIN DECLARE SECTION;
sqlint32 colint = 42;                  /* ok */
char *ptrchar1 = "I'm invalid.";       /* wrong */
char arrchar1[] = "I'm invalid, too."; /* wrong */
char arrchar2[128] = "I'm fine.";      /* ok */
EXEC SQL END DECLARE SECTION;

 C structures
The declaration of C structures in DB2 is slightly different from XPS. It is not necessary to name a C structure, but you need to name the variable that represents the C structure, as shown in Example 11-10.

Example 11-10 Using C structures in DB2 ESQL/C
EXEC SQL BEGIN DECLARE SECTION;
struct {
  sqlint32 col1;
  char col2[11];
} mystruct;
EXEC SQL END DECLARE SECTION;
[...]
EXEC SQL INSERT INTO mytable VALUES (:mystruct);
[...]

 typedef
The reserved C keyword typedef is not allowed in declare sections at all.

11.9.2 JDBC
If your application is written in Java/JDBC, it might be relatively simple to port it to DB2. Conceptually, only a few changes are required.

You should be aware that DB2 delivers a JDBC Type 4 driver with V8. However, Type 2 and Type 3 JDBC drivers are still available. The driver types are loaded through different driver classes, as shown in Table 11-2 on page 335.

Table 11-2 DB2 JDBC drivers

JDBC type Driver class

Type 2 com.ibm.db2.jdbc.app.DB2Driver (only for applications)

Type 3 (deprecated) com.ibm.db2.jdbc.net.DB2Driver (only for applets)

Type 4 com.ibm.db2.jcc.DB2Driver (for applications and applets)

Connection string
The connection string for the DB2 JDBC driver is different from the XPS version, as shown in Example 11-11.

Example 11-11 Establishing a JDBC connection against a DB2 database
import java.sql.*;

class rsetClient {
  public static void main (String args []) throws SQLException {
    // Load the DB2 JDBC application driver
    try {
      Class.forName("COM.ibm.db2.jdbc.app.DB2Driver");
    } catch (Exception e) {
      e.printStackTrace();
    }
    // Connect to the database
    Connection conn =
      DriverManager.getConnection("jdbc:db2:dbname","uid","pwd");
  }
}

11.9.3 ODBC/CLI
DB2 Call Level Interface (DB2 CLI) is the IBM callable SQL interface to the DB2 family of database servers. It is a C and C++ API for relational database access that uses function calls to pass dynamic SQL statements as function arguments. It is an alternative to embedded dynamic SQL. However, unlike embedded SQL, DB2 CLI does not require host variables or a precompiler.

DB2 CLI is based on the Microsoft ODBC specification and the International Standard for SQL/CLI. These specifications were chosen as the basis for the DB2 CLI in an effort to follow industry standards and to provide a shorter learning curve for those application programmers familiar with either of these database interfaces. In addition, some DB2 specific extensions have been added to help the application programmer specifically exploit DB2 features.

The DB2 CLI driver also acts as an ODBC driver when loaded by an ODBC driver manager. It conforms to ODBC 3.51.

Setting up the CLI environment
Runtime support for DB2 CLI applications is contained in all DB2 UDB clients. Support for building and running DB2 CLI applications is contained in the DB2 Application Development (DB2 AD) Client.

The CLI/ODBC driver is bound automatically on the first connection to the database, provided that the user has the appropriate privilege or authorization. Alternatively, the administrator might want to perform the first connect or explicitly bind the required files.

For a DB2 CLI application to successfully access a DB2 database, you need to:
1. Ensure that the DB2 CLI/ODBC driver was installed during the DB2 client install.
2. Catalog the DB2 database and node if the database is being accessed from a remote client. On the Windows platform, you can use the CLI/ODBC settings GUI to catalog the DB2 database.
3. Optional: Explicitly bind the DB2 CLI/ODBC bind files to the database with the following command:
   db2 bind ~/sqllib/bnd/@db2cli.lst blocking all messages cli.msg \
   grant public
   On the Windows platform, you can use the CLI/ODBC settings GUI to bind the DB2 CLI/ODBC bind files to the database.
4. Optional: Change the DB2 CLI/ODBC configuration keywords by editing the db2cli.ini file, located in the sqllib directory on Windows and in the sqllib/cfg directory on UNIX platforms.

On the Windows platform, you can use the CLI/ODBC settings GUI to set the DB2 CLI/ODBC configuration keywords.

11.9.4 C++

Informix offers a proprietary C++ API that provides classes and methods to access an XPS database. Behind the scenes, the C++ API is a wrapper that maps the non-object-oriented Informix ESQL/C to an object-oriented environment.

The Informix C++ API does not support access to DB2.

You should consider using DB2 ESQL/C, which also allows embedding in C++, or using ODBC/CLI functions.

11.9.5 Large objects
Large objects, such as a Binary Large Object (BLOB), are data types that can store a variable length of any type of data. XPS BLOBs are differentiated into two sub-types of large objects:
 TEXT BLOB columns contain printable information.
 BYTE BLOB columns contain any binary information.

DB2 also supports large objects. If you declare a large object in DB2, you can choose among three different declaration techniques, described in the following sections. (Table 5-2 on page 155 lists the data type mappings of large objects.)

LOB host variable
Select the entire LOB value into a host variable. The entire LOB value is copied from the server to the client, as shown in Example 11-12. This method is inefficient and sometimes not feasible, because host variables use the client memory buffer, which might not have the capacity to hold larger LOB values.

Example 11-12 Using DB2 LOB host variable
EXEC SQL BEGIN DECLARE SECTION;
SQL TYPE IS CLOB (1k) myclob;
sqlint16 myclob_ind;
EXEC SQL END DECLARE SECTION;
[...]
EXEC SQL DECLARE cur1 CURSOR FOR SELECT clobcolumn FROM mytab;
EXEC SQL OPEN cur1;
EXEC SQL FETCH cur1 INTO :myclob:myclob_ind;
[...]
printf ("length of clob: %d\n", myclob.length);
[...]

A declaration such as:

SQL TYPE IS CLOB (1k) myclob;

results in the following C structure:
static struct myclob_t {
  sqluint32 length;  /* length of data */
  char data[1024];   /* the actual data */
} myclob;

LOB locator host variable
Select only a LOB locator into a host variable. The LOB value remains on the server, and the LOB locator moves to the client. If the LOB value is very large and is needed only as an input value for one or more subsequent SQL statements, then it is best to keep the value in a locator. The use of a locator eliminates any client/server communication traffic needed to transfer the LOB value to the host variable and back to the server. Example 11-13 depicts this method.

Example 11-13 Using DB2 LOB locator host variable
EXEC SQL BEGIN DECLARE SECTION;
SQL TYPE IS CLOB_LOCATOR my_locator;
EXEC SQL END DECLARE SECTION;
[...]
EXEC SQL FREE LOCATOR :my_locator;

A declaration such as:
SQL TYPE IS CLOB_LOCATOR my_locator;

results in the following C variable declaration:
sqlint32 my_locator; /* LOB handle */

LOB file host variable
Select the entire LOB value into a file reference variable. The LOB value (or a part of it) is moved to a file at the client without going through the application memory, as shown in Example 11-14.

Example 11-14 Using DB2 LOB file host variable
EXEC SQL BEGIN DECLARE SECTION;
SQL TYPE IS BLOB_FILE myfile;
EXEC SQL END DECLARE SECTION;
[...]
strcpy (myfile.name, "blob.dat");
myfile.name_length = strlen (myfile.name);
myfile.file_options = SQL_FILE_OVERWRITE;
[...]
EXEC SQL SELECT blobcolumn INTO :myfile ...

A declaration such as:
SQL TYPE IS BLOB_FILE myfile;

results in the C structure shown in Example 11-15.

Example 11-15 BLOB file declaration
struct {
  sqluint32 name_length;  /* length of file name */
  sqluint32 data_length;  /* write file (SELECT): bytes written */
                          /* read file (INSERT): how many bytes to read */
  sqluint32 file_options; /* SQL_FILE_READ, SQL_FILE_CREATE, */
                          /* SQL_FILE_OVERWRITE, SQL_FILE_APPEND */
  char name[255];         /* file name */
} myfile;

11.9.6 SQL Communications Area
The SQL Communications Area (SQLCA) is a standardized interface that is used by applications to exchange non-user data, such as status information, with the database engine.

Even if the SQLCA structure is standardized, database vendors tend to expand the structure. Thus, you might have to handle some differences between XPS and DB2 SQLCA structures.

Table 11-3 shows all elements of the DB2 SQLCA structure in a generalized form. The data types noted are SQL data types. Depending on the programming interface that you use, appropriate data types are provided.

Table 11-3 DB2 SQLCA structure description

Name Data type Field value

sqlcaid CHAR(8) An indicator for storage dumps containing SQLCA. The sixth byte is L if line number information is returned from parsing an SQL procedure body.

sqlcabc INTEGER Contains the length of the SQLCA (136 bytes).

sqlcode INTEGER Contains the SQL return code, which means:
 0: Successful execution (although one or more SQLWARN indicators can be set).
 Positive: Successful execution, but with a warning condition.
 Negative: Error condition.

sqlerrml SMALLINT Length indicator for sqlerrmc, in the range 0 through 70. 0 means that the value of sqlerrmc is not relevant.

sqlerrmc VARCHAR (70) Contains one or more tokens, separated by X'FF', which are substituted for variables in the descriptions of error conditions.

This field is also used when a successful connection is completed.

When a NOT ATOMIC compound SQL statement is issued, it can contain information about up to seven errors.

sqlerrp CHAR(8) Begins with a three-letter identifier indicating the product, followed by five digits indicating the version, release, and modification level of the product. For example, SQL08010 means DB2 UDB V8 Release 1 Modification level 0.

If SQLCODE indicates an error condition, this field identifies the module that returned the error.

This field is also used when a successful connection is completed.

sqlerrd ARRAY Six INTEGER variables that provide diagnostic information. These values are generally empty if there are no errors, except for sqlerrd(6) from a partitioned database.

sqlerrd(1) INTEGER If connection is invoked and successful, contains the maximum expected difference in length of mixed character data (CHAR data types) when converted to the database code page from the application code page. A value of 0 or 1 indicates no expansion; a value greater than 1 indicates a possible expansion in length; a negative value indicates a possible contraction.

On successful return from an SQL procedure, contains the return status value from the SQL procedure.

sqlerrd(2) INTEGER If connection is invoked and successful, contains the maximum expected difference in length of mixed character data (CHAR data types) when converted to the application code page from the database code page. A value of 0 or 1 indicates no expansion; a value greater than 1 indicates a possible expansion in length; a negative value indicates a possible contraction. If the SQLCA results from a NOT ATOMIC compound SQL statement that encountered one or more errors, the value is set to the number of statements that failed.


sqlerrd(3) INTEGER If PREPARE is invoked and successful, contains an estimate of the number of rows that will be returned. After INSERT, UPDATE, DELETE, or MERGE, contains the actual number of rows that qualified for the operation. If compound SQL is invoked, contains an accumulation of all sub-statement rows. If CONNECT is invoked, contains 1 if the database can be updated, or 2 if the database is read only.

If the OPEN statement is invoked, and the cursor contains SQL data change statements, this field contains the sum of the number of rows that qualified for the embedded insert, update, delete, or merge operations.

If CREATE PROCEDURE for an SQL procedure is invoked, and an error is encountered when parsing the SQL procedure body, contains the line number where the error was encountered. The sixth byte of sqlcaid must be L for this to be a valid line number.

sqlerrd(4) INTEGER If PREPARE is invoked and successful, contains a relative cost estimate of the resources required to process the statement. If compound SQL is invoked, contains a count of the number of successful sub-statements. If CONNECT is invoked, contains 0 for a one-phase commit from a down-level client; 1 for a one-phase commit; 2 for a one-phase, read-only commit; and 3 for a two-phase commit.

sqlerrd(5) INTEGER Contains the total number of rows that were deleted, inserted, or updated as a result of both:
 The enforcement of constraints after a successful delete operation
 The processing of triggered SQL statements from activated triggers
If compound SQL is invoked, contains an accumulation of the number of such rows for all sub-statements. In some cases, when an error is encountered, this field contains a negative value that is an internal error pointer. If CONNECT is invoked, contains an authentication type value of 0 for server authentication; 1 for client authentication; 2 for authentication using DB2 Connect; 3 for DCE security services authentication; and 255 for unspecified authentication.

sqlerrd(6) INTEGER For a partitioned database, contains the partition number of the partition that encountered the error or warning. If no errors or warnings were encountered, this field contains the partition number of the coordinator node. The number in this field is the same as that specified for the partition in the db2nodes.cfg file.

sqlwarn ARRAY A set of warning indicators, each containing a blank or W. If compound SQL is invoked, contains an accumulation of the warning indicators set for all sub-statements.

sqlwarn0 CHAR(1) Blank if all other indicators are blank; contains W if at least one other indicator is not blank.

sqlwarn1 CHAR(1) Contains W if the value of a string column was truncated when assigned to a host variable. Contains N if the null terminator was truncated. Contains A if the CONNECT or ATTACH is successful, and the authorization name for the connection is longer than 8 bytes.

sqlwarn2 CHAR(1) Contains W if null values were eliminated from the argument of a function.

sqlwarn3 CHAR(1) Contains W if the number of columns is not equal to the number of host variables.

sqlwarn4 CHAR(1) Contains W if a prepared UPDATE or DELETE statement does not include a WHERE clause.

sqlwarn5 CHAR(1) Reserved for future use.

sqlwarn6 CHAR(1) Contains W if the result of a date calculation was adjusted to avoid an impossible date.

sqlwarn7 CHAR(1) Reserved for future use. If CONNECT is invoked and successful, contains E if the DYN_QUERY_MGMT database configuration parameter is enabled.

sqlwarn8 CHAR(1) Contains W if a character that could not be converted was replaced with a substitution character.

sqlwarn9 CHAR(1) Contains W if arithmetic expressions with errors were ignored during column function processing.


sqlwarn10 CHAR(1) Contains W if there was a conversion error when converting a character data value in one of the fields in the SQLCA.

sqlstate CHAR(5) A return code that indicates the outcome of the most recently executed SQL statement.

11.9.7 SQLDA
An SQL Descriptor Area (SQLDA) is a collection of variables that are required for execution of the SQL DESCRIBE statement. The SQLDA variables are options that can be used by the PREPARE, OPEN, FETCH, and EXECUTE statements. An SQLDA is used to communicate with dynamic SQL. It can be used in a DESCRIBE statement, modified with the addresses of host variables, and then reused in a FETCH or EXECUTE statement.

SQLDAs are supported for all languages, but predefined declarations are provided only for C, REXX, FORTRAN, and COBOL.

The meaning of the information in an SQLDA depends on its use. In PREPARE and DESCRIBE, an SQLDA provides information to an application program about a prepared statement. In OPEN, EXECUTE, and FETCH, an SQLDA describes host variables.

Because the entire explanation of how to work with the SQLDA structure is beyond the scope of this book, refer to IBM DB2 Universal Database, SQL Reference Volume 2, SC09-4845.



Chapter 12. DB2 Migration ToolKit for Informix

This chapter introduces the DB2 Migration ToolKit for Informix (MTK) and describes its functions and features. It is a technical overview and includes information about implementation and deployment that can help if you plan to transition from XPS to DB2 UDB.

Database migrations vary in size and complexity due to the countless number of database designs that have been implemented and used. As a result, each database migration has a unique set of issues and challenges. However, there are technical elements that are common to all database applications, and it is this commonality that motivates the development and use of migration tools to facilitate the migration process. The MTK was designed by IBM to do just that. Although the MTK cannot resolve all of the possible migration issues automatically, it can simplify and manage many of the conversion tasks and guide you in the right direction.

The MTK was developed as a joint venture by the IBM Silicon Valley Lab in California and the IBM Research Laboratory in Hawthorne, New York. The product is free of charge and can be downloaded from the IBM Web site.

12.1 Features and functionality

You can use the MTK to migrate database objects, SQL, and data from an XPS source database to a DB2 UDB target database. It supports Extended Parallel Server (XPS) 8.3, 8.4, and 8.5. Only initial support of XPS is provided, and manual intervention is required, particularly when creating objects that reside on database partitions.

On the target side, MTK supports the following databases:
 DB2 UDB V8.1 and V8.2 for Linux, UNIX, and Windows
 DB2 UDB for iSeries 5.2 and 5.3

The important features and functionality of MTK are as follows:
 Translating objects and SQL:
– Tables
• Maps data types, except for collections and other complex data types
• Includes primary and foreign keys; unique, check, and null constraints
• With some limitations on SQL used in check constraints
• Includes schema names
– Views, with some limitations
– Indexes, regular only
– SPL routines (functions and procedures), translated to SQL PL with some limitations
– Triggers, with some limitations
– Synonyms
– Sequences
– SQL statements, with some limitations
– Built-in functions, with some exceptions (as examples, Char_length, range, dbinfo)
 Translating some of the options that are used with the following data definition statements for supported objects:
– Create
– Alter
– Drop
 Converting the following transaction control statements:
– Set isolation
– Commit
– Rollback

 Performing these additional migration tasks:
– Extracts object definitions directly from XPS system tables.
– Imports object definitions from SQL scripts sourced by external methods (as examples, dbschema scripts, manually created scripts).
– Generates data extract scripts.
– Generates data load or import scripts.
– Generates scripts to move data using Named Pipes.
– Automates the migration of data (including LOBs).
– Automates the deployment of database objects.
– Generates DB2 source scripts for manual deployment of objects.
– Maps a subset of XPS SQLCODEs to DB2 SQLCODEs.
– Creates a DB2 database with supporting table spaces (SMS) and buffer pools.
– Executes the runstats utility after data migration.
– Performs referential integrity checking on migrated data.
– Performs dynamic conversions of individual SQL and DDL statements using the interactive SQL Translator feature.
– Allows for manual override of selected data types.
– Allows for renaming of tables (including columns) and indexes.
– Tracks, logs, and reports the status of object translations, data movement, DDL changes, and deployment.

12.2 Recommendations for use

This section provides some recommendations for using the MTK, relative to installation, configuration, and deployment.

12.2.1 MTK installation and configuration
The MTK can be installed on the following platforms:
 Windows (NT 4.0, 2000, XP Professional)
 AIX (4.3.3.0 or later)
 Linux (tested on Mandrake V8.1 with pdksh installed)
 Sun Solaris (5.7)
 HP-UX (v11i)

Chapter 12. DB2 Migration ToolKit for Informix 347 Note: If you are also migrating data and you want to use the MTK GUI to automate the movement of the data, then be sure to install the MTK on a system that has sufficient disk space for the data files.

12.2.2 MTK configurations
You can use several MTK configurations:
 Option 1
The MTK, the XPS server, and the DB2 server are all installed on the same machine. This configuration is often used on Windows when testing the MTK.
 Option 2
The MTK, the XPS server, and the DB2 server are each located on separate machines. For example, MTK on Windows, and XPS and DB2 on separate UNIX machines. This configuration is convenient for object conversions but requires that there is sufficient disk space on Windows for data extract files. In addition, if you wish to automate the migration of your data, your data cannot include LOBs. If either of these is not the case, then, with this configuration, the MTK data transfer scripts can be used to migrate the data manually.
 Option 3
The MTK is installed on the same machine as the XPS server; DB2 is installed on a separate machine. Though possible, this is not recommended for real-life migration projects.
 Option 4
The MTK is installed on the same machine as the DB2 server; XPS is installed on a separate machine. This is the suggested configuration for a real-life migration project. It supports MTK automated data movement of LOBs.

All four options require that database connectivity is established between the MTK and the XPS server and between the MTK and the DB2 server. For options 2, 3 and 4, network connectivity must also exist between the XPS server machine and the DB2 server machine.

The recommended configuration for a migration project is option 4 (the MTK is installed on the same machine as the DB2 server). The reasons for this are:
 When migrating LOB data using the MTK GUI, the MTK must be installed on the same machine as the DB2 server. LOB data cannot be loaded over the network using client-side load.
 If deploying to a local database, MTK can create the database for you.
 For option 3, it is unlikely that the MTK will be allowed to be installed on the same machine as the production XPS server.
 There is no real benefit to placing the MTK anywhere other than on the DB2 machine. Although the data extraction phase takes longer than the load phase, this is primarily due to data transformation processing that takes place while building data extract files.

Deploying to DB2
It is recommended that during the development and unit testing phase of your project, you let MTK create your database and deploy your objects to DB2. MTK creates a database with default parameters and creates SMS table spaces and buffer pools to get you started quickly.

For the system testing, performance testing and cut-over to production phases, it is recommended that you build and deploy your database manually with the help of the scripts that MTK generates. This allows you to tune the physical design of your database to meet the processing needs of your applications.

ISV migrations
It is not recommended that you use the MTK to convert ISV schema definitions, for the following reasons:
 Most ISVs supply their own conversion tools, and some require that their tool be used or the migrated database will not be certified.
 MTK data type mappings might not match the ISV's specifications.

The MTK can be used to prototype the conversion of customizations made to ISV SQL, including procedures and triggers. However, the tables that the procedures and triggers depend on must also be converted in MTK.

12.3 Technical overview of MTK

This section provides some of the technical details of the MTK. It can give you a good basic understanding of the functions and capabilities of the MTK.

12.3.1 The MTK GUI
The MTK GUI, shown in Figure 12-1, presents five tabs, each of which represents a specific task in the conversion process.

Figure 12-1 The MTK GUI

The tabs are, from left to right:
 Specify Source
 Convert
 Refine
 Generate Data Transfer Scripts
 Deploy to DB2

350 Database Strategies: Using Informix XPS and DB2 Universal Database The menu bar contains the following choices:

 Application, which allows you to set up your preferences, such as an editor.  Project, which allows you to start a new project or open an existing project.  Tools, from here you can launch the SQL Translator, reports, and the log.  Help, which provides MTK online help text that describes error messages and how to use the generated data movement scripts to migrate the data manually.

12.3.2 The migration process
The five tabs in the MTK user interface represent the five steps in the migration process, which are described in the following sections.

Specify Source step
If using EXTRACT to specify that the source of the objects for conversion is an XPS database, a database connection must be established. The easiest way to establish a connection is to use a JDBC connection. The JDBC driver that is required for this connection is provided with MTK in the ifxjdbc.jar file. All you need to do is to supply the information that is requested.

You can also use an ODBC connection provided that you are running XPS and MTK on a Windows platform. You can set up an ODBC data source for your XPS database by using Administrative Tools from the Control Panel.

You can also use IMPORT to specify that the source of your conversion is an external SQL script file containing XPS object definitions or SQL statements.

The Convert and Refine steps
The next two steps, convert and refine, are used together and usually require several passes. During the convert step, you can convert the database objects, including procedures, functions, and triggers. If you have used an SQL script as input to the conversion, you can also convert SQL statements. MTK uses the files that are generated by this phase to deploy to DB2 automatically, or you can deploy to DB2 manually using the DB2 Command Line Processor.

During the refine step, you have a chance to review translation information, warnings, and errors for your conversion. You can then make modifications and run convert again.

How the translator works
The converter (also referred to as the translator) is written in Java and uses ANTLR (Another Tool for Language Recognition) as its parsing engine. The converter is a language translator that operates similar to a compiler for a conventional programming language. It takes as input a sequence of XPS SQL scripts and generates a corresponding DB2 SQL script as output. It also generates metadata information about each XPS object definition and corresponding generated DB2 object definition. The metadata is encoded in the XML-based Metadata Interchange (XMI) format for easy reuse. The metadata information summarizes important properties about source objects and is used by MTK to generate an overview of source and target objects in the Refine step (where the results of the translation are reported).

An XPS SQL script is a sequence of SQL statements and SPL commands. The SQL statements are translated as they are encountered in the script. Therefore, the order in which XPS objects are defined in the source script is critical for proper conversion. The converter requires that an object be defined before it is used. Queries of an object cannot be translated if the object has not yet been defined. When the source of the object definitions are extracted (using MTK) directly from the system catalogs, the object definitions are in dependency order. When the source of the object definitions are imported into the MTK from an external file, some manual reordering to satisfy dependencies might be required.

For each DB2 UDB statement generated in a converted script or stored procedure, the converter normally copies the corresponding XPS statement as a comment preceding the generated target statement, as shown in Example 12-1.

Example 12-1 Converter output
--| create table mytab
--| (dte datetime year to day default datetime(2000-01-01) year to day;

CREATE TABLE mytab (dte DATE DEFAULT '2000-01-01-00.00.00.000000')!

This annotation makes it easier to understand how the generated code relates to the source and how to perform manual refinement of the generated code if necessary. If an error occurs during the conversion, the error message appears after the source code, and any invalid DB2 statement that results from the error is commented out. If you prefer not to see the commented source in the output file, you can disable it in the MTK.

In some cases, because of identifier length restrictions (for example, on indexes, constraints, and triggers), the converter generates truncated target names for some source objects. The converter generates new names such that they do not conflict with pre-existing names. However, the name generation process depends on the order in which object definitions occur in a script. As a result of the renaming process, if the converter is used to convert two scripts separately that contain the same object definitions but in different orders, the resulting target scripts might contain inconsistently renamed target objects.

Translating tables, indexes, and views
This phase of the translation is the most straightforward. The MTK converts DDL with very little manual intervention required. However, the MTK does not convert dbspaces. You need to develop a script to create your DB2 table spaces manually. You can then edit your DB2 table DDL to assign table spaces to your tables and indexes and to add other DDL changes to optimize performance. You then deploy the DDL to DB2 manually.

In the beginning phase of your conversion, you can use MTK to deploy your tables and indexes and to create the table spaces for you automatically.

Translating built-in functions
The MTK comes with pre-written DB2 user-defined functions that match the functionality of many XPS built-in functions. There are three possible scenarios when converting XPS built-in functions to DB2 SQL:
 An XPS built-in function has an equivalent DB2 function. In this case, function calls are mapped directly to DB2 SQL.
 An XPS built-in function does not have an equivalent DB2 function, yet a similar DB2 function is available.
 An XPS built-in function has no DB2 equivalent. In most of these cases, a DB2 SQL or Java UDF is provided by the MTK to provide similar functionality.

The user defined functions that are packaged with the MTK are contained in the INFX schema. You can find them in the MTK installation directory in files named mtkinfx.udf and infxUDFs.jar.

MTK installs the Java and SQL UDFs automatically during the deploy to DB2 step. The DEPLOY_yourprojectname_UDF.log file contains information about the success or failure of UDF deployment.

An example of a Java UDF packaged with the MTK is the SYSTEM function. The SYSTEM function is used to translate SPL procedures using the system command to equivalent functionality in DB2 procedures, as shown in Example 12-2.

Example 12-2 System function
CREATE FUNCTION INFX.system(cmd varchar(2000))
RETURNS INT
EXTERNAL NAME 'infx.udfjar:com.ibm.db2.tools.mtk.mtkinfxudf.infxUDFs.os_cmd'
LANGUAGE java
PARAMETER STYLE JAVA
DETERMINISTIC
FENCED
NOT NULL CALL
NO SQL
NO EXTERNAL ACTION
NO SCRATCHPAD
NO FINAL CALL
ALLOW PARALLEL
NO DBINFO !

Example 12-3 shows the Java code.

Example 12-3 Java code
public static int os_cmd(String cmd)
{
  Runtime rt = Runtime.getRuntime();
  Process p = null;
  int success = 0;
  try {
    p = rt.exec(cmd);
  } catch (IOException e) {
    success = -1;
  }
  return (success);
}

Translating functions and procedures
The conversion of functions and procedures generally requires more manual intervention than the conversion of tables and indexes. However, MTK provides a quick start for these conversions.

The MTK translates the XPS Create Function and Create Procedure statements to DB2 functions and procedures based on the following rules:
 XPS procedures with no return values are translated to DB2 UDB procedures.
 Functions and procedures with multiple return values are translated as DB2 UDB procedures, with additional OUT parameters for the return values.
 Functions with a single return value are translated to DB2 UDB functions, unless they contain specific features that require them to be translated as DB2 UDB procedures.

XPS functions and procedures (collectively called routines) can contain statements of the form RETURN…WITH RESUME. Such routines are called cursor functions because they return a series of values to the caller. In SPL, these functions must be called from a FOREACH statement. On the application side, a cursor is declared to contain the results of the routine call, and then FETCH is used to retrieve the results. MTK translates cursor functions to DB2 stored procedures that return a result set using cursor processing or a FOR loop.
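A minimal SQL PL sketch of this target pattern (hypothetical names; not MTK's literal output) is a procedure that opens a WITH RETURN cursor and leaves it open for the caller:

CREATE PROCEDURE all_customers ()
LANGUAGE SQL
DYNAMIC RESULT SETS 1
BEGIN
  DECLARE c1 CURSOR WITH RETURN FOR SELECT fname, lname FROM customer;
  -- the cursor is left open; the calling application fetches the result set
  OPEN c1;
END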

The MTK does not convert GLOBAL variables found in SPL routines. The easiest way to migrate these to DB2 is to convert global variables to INOUT parameters for all procedures that require them. Another method to convert global variables is to make use of the DB2 Declared Global Temporary Tables to share data between procedures within a session.

The Generate Data Transfer Scripts step
This step involves two tasks:
 Generating scripts that transform XPS data into DB2 format and that extract the data to a file.
 Generating scripts to read the data from a file and load the data into DB2.

The MTK builds SELECT statements with built-in functions to transform XPS data and extract the data into files, as shown in Example 12-4.

Example 12-4 Select statement
SELECT customer_num,
       TO_CHAR(call_dtime,'%Y-%m-%d-%H.%M.00.000000'),
       user_id, call_code, call_descr,
       TO_CHAR(res_dtime,'%Y-%m-%d-%H.%M.00.000000'),
       res_descr
FROM cust_calls;

MTK uses the DB2 load or import utility to load the data into DB2. As previously mentioned, with MTK, the transformation and extraction phase generally takes more execution time than loading the data into DB2.
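For example, if the extract step produced a delimited file for the cust_calls table (a sketch; the file name and format are assumptions), the load could be run as:

$ db2 load from cust_calls.del of del insert into cust_calls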

In order to understand the script generation step more fully, some background information about how the MTK migrates data is helpful. There are two ways that you can deploy data to DB2 with MTK:
1. Automatic transfer by using the Deploy to DB2 tab from the MTK GUI.
2. Manual transfer (no MTK GUI involved) by installing and executing scripts that were generated by MTK. The scripts can be used for two methods of data transfer:
– A staged approach using intermediate files
– Piping the data using Named Pipes

If the data migration is performed using the Deploy to DB2 step from the GUI, then data is migrated automatically by the MTK. For this to occur, there must be sufficient disk space available on the machine where the MTK is installed. The data can be loaded into a local DB2 database or to a remote DB2 database provided there is a client connection to the remote DB2 server. If data is to be migrated to a remote DB2 and there is not enough disk space on the MTK machine, or if the data includes LOBs, then the data transfer scripts can be used to migrate the data. To migrate the data manually using MTK generated scripts:
 The source XPS machine must be accessible through the network to the DB2 machine.
 The data transfer scripts generated by MTK must be installed as directed in the Help on the MTK menu bar.

Note: It is not recommended to extract XPS data on one system and transport the data files over the network to the DB2 system for loading.

The Deploy to DB2 step
In this step, you use the MTK GUI to automatically:
1. Create your objects on DB2.
2. Extract your data from XPS.
3. Load your data to DB2.

You can use the MTK GUI to deploy to either a local or remote DB2 UDB database; however, as previously mentioned, the following restrictions apply:
 MTK cannot automate the creation of a remote DB2 database.
 There must be sufficient disk space on the MTK machine for migrating data.
 LOB data can only be loaded into a local DB2.

Here is some additional information about the Deploy to DB2 step:
 A DB2 UDB database name has a limit of eight characters.
 During deployment, the connection to DB2 uses a Java native driver, and your DB2 server must be configured properly for Java 1.3.1.

 Before you can deploy to a remote DB2, you must also establish DB2 client connectivity with the server using the DB2 Catalog Database and Catalog Node commands.
 The User Defined Functions packaged with MTK are created on the DB2 database automatically.
 To deploy objects and data to a previously created database, the user ID must have DBADM authority.
 If you are deploying procedures to DB2 UDB V8.1 or prior releases, you must have a C compiler installed on the same machine as DB2.
 If you are using the MTK GUI to deploy the data, the data is unloaded into a directory that is local to where the MTK is installed.
 Before you can deploy using MTK, you must first generate the scripts from the Generate Data Transfer Scripts step.
 When a database is created by the MTK, a buffer pool and three table spaces are created with a page size of 32 KB. This is to provide enough space for the deployment of tables with any row length. However, a 32 KB page size might not be optimal for tables with a smaller row length. Therefore, before deploying your database into production, adjust the table space sizes accordingly. For each table space page size used, there must be at least one buffer pool with the same page size created.

Note: With DB2, an entire row must fit on a single page and no row chaining is allowed. This is unlike XPS, where a row can be longer than the page size used.

 The default collating behavior between DB2 UDB and XPS is different. There is an option on the Deploy tab that allows you to select a collating sequence for the DB2 database. The default collating sequence for DB2 is SYSTEM. When SQL and data are deployed to DB2, run tests to determine whether to modify DB2 UDB to use the IDENTITY collating sequence.
 If data is loaded, the integrity of the data is checked and the Runstats utility is executed.

 Two files are produced at the end of the deployment process:
– The DB2 deployment log, whose output is generated by DB2 for each statement or command executed.
– The Verification Report, which contains the results of the deployment in report format.
 After deployment completes, you can go back to the Generate Data Transfer Scripts tab and view the SELECT statements that were used to extract the data from XPS tables and view the data.

Figure 12-2 shows a view of the data extraction information.

Figure 12-2 View of data extraction information

Figure 12-3 shows a sample of the extracted data.

Figure 12-3 Sample of extracted data

Connecting to a remote DB2 UDB database
If you are using MTK to deploy to a remote DB2 UDB database, you must first establish connectivity to access the remote database. You establish connectivity on the client machine where the MTK is installed by following these steps:
1. Telnet to the remote database server and run:
   db2 list database directory
   This command lists the local databases on that server. Verify that the database you need to access has been created.
2. On your local DB2 machine, issue two commands:
   db2 catalog tcpip node node_alias remote ip_address server service_name
   db2 catalog db dbname as alias_name at node node_alias
   In these commands:
   – node_alias can be any eight-character name
   – ip_address is the IP address of the remote server
   – service_name is the service name for the remote server (a port number can also be used)
   – alias_name is any eight-character name that refers to the remote DB2 database

3. To test your connection to the remote DB2, enter the following from your local machine:
   db2 connect to alias_name user userid using password
   In this command:
   – alias_name is the database you have just cataloged
   – userid and password are a user ID and password that have been created at the remote server

Loading data to a remote DB2
You might also find that, before using the MTK to perform a client-side load to the remote DB2, you have to bind a file to add an additional package on the remote DB2 by following these steps:
1. From your local DB2 system, connect to the remote DB2:
   db2 connect to alias_name user userid using password
2. Bind the file named db2ucktb.bnd found on your local DB2 system:
   db2 bind db2ucktb.bnd blocking all grant public

Configuring a C++ compiler for DB2 UDB prior to V8.2
If you are using DB2 V8.2, a C++ compiler is not required for deploying SQL stored procedures. If you are using a DB2 version prior to V8.2, a C++ compiler is required.
• Windows: After you have installed MS Visual C++®, you need to point DB2 to the directory where the compiler is installed. You can do so by issuing the following command:
  db2set DB2_SQLROUTINE_COMPILER_PATH="installation path for compiler"
  Use the following command to verify that the DB2 registry variable has been set:
  db2set -all
• AIX: If you have installed the IBM VisualAge® C++ 5.0 compiler, the DB2 registry variables are preconfigured to use the compiler. If you are using the IBM C for AIX version 5.0 compiler, enter the following as one command to set the DB2 registry variable:
  db2set DB2_SQLROUTINE_COMPILE_COMMAND="xlc_r -I$HOME/sqllib/include SQLROUTINE_FILENAME.c -bE:SQLROUTINE_FILENAME.exp -e SQLROUTINE_ENTRY -o SQLROUTINE_FILENAME -L$HOME/sqllib/lib -ldb2"

12.4 DB2 Data Partitioning Facility considerations

Most XPS customers use fragmented database structures. When these customers consider a transition to DB2, they must consider the ramifications of using the DB2 Data Partitioning Facility (DPF).

For an overview of the XPS capabilities with fragmented databases, refer to Chapter 6, “Data partitioning and access methods” on page 159. For an overview of DB2 capabilities with partitioned databases, refer to 2.7.2, “Partitioning in DB2” on page 58.

The MTK does not provide for the automated movement of data from an XPS fragmented environment to a DB2 DPF environment. Manual intervention is required for this task. Prior to moving data between these two environments, significant planning is required because many of the XPS data fragmentation capabilities do not yet exist in DB2. However, DB2 has powerful capabilities with Multidimensional Clustering (MDC) that offer alternative approaches to data partitioning.

During the writing of this redbook, the project team implemented both XPS fragmentation and DB2 DPF. We ran tests to enable a high-level comparison of these two implementations from several perspectives, such as:
• Disk space utilization
• Performance
• Load and unload capabilities
• Impact on applications

For an overview of some of the observations that we made during testing, refer to 12.4, “DB2 Data Partitioning Facility considerations” on page 361 and 13.7, “Observations” on page 409.

12.5 Installing and executing the MTK

You can download the MTK from the following URL:

http://www-306.ibm.com/software/data/db2/migration/mtk/

Hardware requirements
The following are the hardware requirements for installing and executing the MTK for Informix:
• Disk space
  – 50 MB for installation
  – 5 MB per project
  – Additional space varies by the number of source script files and data
• Memory: 512 MB (minimum)
• Processor speed for Windows platforms: 300 MHz

Software requirements, by platform
• Windows platforms
  To deploy SQL stored procedures prior to DB2 UDB V8.2, Microsoft Visual C++ version 5 or later must be installed.
• UNIX platforms
  To deploy SQL stored procedures prior to DB2 UDB V8.2, a C++ compiler must be installed. You can use the GNU compiler.
  Java 1.3.1 must be installed and added to the system $PATH environment variable in .profile. For example, on AIX, perform the following:
  export PATH=/usr/java131/bin:/usr/java131/jre/bin:$PATH
  If Java 1.3.1 is not installed on your machine, you can download it from:
  http://www6.software.ibm.com/dl/lxdk/lxdk-p

Installation procedures, by platform
• Windows
  i. Unzip and extract the package. Run setup.exe. The installation defaults to the C:\MTK directory.

  ii. Run the InstallShield wizard and follow its instructions.
  iii. To launch the MTK, click Start → Programs → IBM DB2 Migration Toolkit → Toolkit.

• UNIX platforms

  i. Log in with the user ID you will use to install MTK. Do not install as root. Install MTK with a user ID in the db2admin group (SYSADM authority).
  ii. Verify that the DB2 $INSTHOME environment variable is set up to point to your DB2 instance directory and that it is properly exported when you start. For example, in Korn shell type:
     export INSTHOME=/home/db2inst2
     echo $INSTHOME
  iii. Download or copy the MTK into a newly created directory, and expand the package. For example, on AIX, uncompress and extract db2mtk_version_aix.tar.Z as follows:
     uncompress db2mtk_version_aix.tar.Z
     tar -xvf db2mtk_version_aix.tar
  iv. Launch the MTK from the directory in which it was installed by typing:
     ./MTKMain

Note: For further installation instructions, refer to the readme file that is packaged with the MTK installation files or to the release notes on the Web site.

12.5.1 Using MTK with manual deployment to DB2 UDB
This section describes how to use the MTK in a manual deployment scenario.

Creating the instance and database
You can have DB2 UDB create the DB2 instance automatically while installing DB2 UDB. You can also create the DB2 instance manually after the installation is completed. On AIX, the DB2 instance can be created by executing the db2setup program used to install DB2, manually through the command line by issuing the db2icrt command, or by using the Control Center provided by DB2.

Using the db2setup and db2isetup utilities
The db2setup utility provides an easy way to create a DB2 instance. As root, perform the following steps:
1. Launch the db2setup utility.
2. Select Create a new DB2 instance or set up an existing DB2 instance. This screen allows you to configure the DB2 administration server and its user, which are used as a repository for the GUI administration tools that are provided with DB2, such as the Control Center. The default value for this user is dasusr1, with a default home directory of /home/dasusr1.

3. Click Instance setup and choose Create DB2 instance - 32 bit.

4. For a single partition instance, choose the first option.
5. On the Set User Information for the DB2 Instance Owner screen, you need to identify a system user who will be the instance owner. If you choose a new user, specify the name of the user and the user's password. The default values are user db2inst1 and group db2grp1. You also have to specify the home directory for this user, for example /home/db2inst1. By default, any databases that are created under this instance are created in this directory unless otherwise specified. The installer creates both the user and the home directory.
6. The Set User Information for the Fenced User screen allows you to specify the fenced user's name and password. The default user is db2fenc1, assigned to group db2fgrp1, with home directory /home/db2fenc1.
7. The Tools Catalog screen is for preparing the DB2 tools catalog on the server. If you do not need the tools catalog installed, choose Do not prepare the DB2 tools catalog on this computer.
8. Set the administrator contact information and click Finish.

As part of the instance creation, the installer creates all three users identified previously: db2inst1, db2fenc1, and dasusr1. If you do not want to use the default users, you can create the user IDs and groups ahead of time and use those IDs when creating the instance.

The installer also adds the following entry to the /etc/services file in order to allow communication from DB2 clients:
db2c_db2inst1 50000
Here, db2c_db2inst1 indicates the service name and 50000 indicates the port number. Subsequent instances can be created on the same server simply by invoking the db2isetup utility and going through the steps just described.

Using DB2 commands
You can also create a DB2 instance manually by following these steps:
1. Log on to the AIX system as root.
2. Create the necessary groups for the DB2 instance owner, administration server, and fenced ID using the following commands:
   groupadd db2grp1
   groupadd db2fenc1
   groupadd dasadm1
3. Create the DB2 instance user ID, administration server user ID, and fenced ID, and assign them to their respective groups using the following commands:
   useradd -g db2grp1 -d /home/db2inst1 -p my_password db2inst1
   useradd -g db2fenc1 -d /home/db2fenc1 -p my_password db2fenc1
   useradd -g dasadm1 -d /home/dasusr1 -p my_password dasusr1
4. Issue the command:
   db2icrt -u db2fenc1 db2inst1
5. Edit the /etc/services file and add the following entries:
   db2c_db2inst1 50000/tcp #DB2 port for remote clients
   db2idb2inst1 50001/tcp #interrupt port for DB2 1.x clients
6. Log on as the instance owner and update the database manager configuration (dbm cfg) to reflect the service name in the /etc/services file:
   db2 update dbm cfg using SVCENAME db2c_db2inst1
7. Set up the default communication protocol:
   db2set -i db2inst1 DB2COMM=TCPIP
8. Set the instance to auto-start with the system, if desired:
   db2set -i db2inst1 DB2AUTOSTART=TRUE

At this point, the server is ready for database creation. To simplify the database connectivity test, you can create a sample database using the following steps:
1. Log on to the AIX system as the instance owner db2inst2.
2. Execute the db2sampl command, located in the sqllib/bin directory under the home directory of the DB2 instance. The db2sampl executable is a script that automatically creates a small database called SAMPLE.
3. Connect to the SAMPLE database by issuing the db2 connect command. In our example the command becomes db2 connect to sample, which should display the following connection confirmation on the screen:
   Database server = DB2/6000 8.2.0
   SQL authorization ID = DB2INST2
   Local database alias = SAMPLE
4. To see the results, issue a SQL query such as:
   db2 "select * from staff"

You can create a DB2 database either with the Control Center or by using the command line. To create a DB2 database manually:
1. Log on to the AIX system as the instance owner db2inst2.
2. Because DB2 allows one instance to have multiple databases, it is always recommended to attach to the desired instance before the create database command is issued:
   db2 attach to instance_name
   where the instance name in our case is db2inst2.
3. Issue the create database command. In its simplest form, the command is:
   db2 "create database my_database on /db_path"

This command creates a database and the following three table spaces:
• SYSCATSPACE to store system catalog tables
• USERSPACE1 to store user defined objects
• TEMPSPACE1 to store temporary objects

You can view these table spaces by issuing the command:
db2 list tablespaces

There are many options that can be included in the create database command. You can see some of the available options in Example 12-5. Also refer to DB2 SQL Reference Volume 2, SC09-4845, for more details on the create database command.

Example 12-5 The create database command
CREATE DATABASE my_db ON /db_path
  ALIAS warehouse_db
  USING CODESET code_set TERRITORY US
  COLLATE USING SYSTEM
  USER TABLESPACE MANAGED BY SYSTEM USING ('/user_tablespace_path')
  CATALOG TABLESPACE MANAGED BY SYSTEM USING ('/catalog_tablespace_path')
  TEMPORARY TABLESPACE MANAGED BY SYSTEM USING ('/temp_tablespace_path')

The default value for db_path is the home directory of the instance owner; for db2inst2, that is /home/db2inst2.

Note: If you are manually creating a database to migrate to, rather than letting the MTK create it during deployment, ensure that you run the following DB2 commands:
db2 update database manager configuration using keepdari no
db2stop force
db2start

Run these commands each time you create a new instance. If the MTK creates the database, it runs these commands for you.

Table space planning
DB2 UDB provides two types of table spaces, SMS and DMS. Both types have containers (data files) associated with them. In this section, we discuss both types of table spaces. For a summary of the differences between them, refer to Table 12-1.

Table 12-1 SMS and DMS table space differences

Table space feature                                                  SMS   DMS
Can dynamically increase the number of containers in a table space   No    Yes
Can store indexes for a table in a separate table space              No    Yes
Can store long data for a table in a separate table space            No    Yes
One table can span multiple table spaces                             No    Yes
Space allocated only when needed                                     Yes   No
Table space can be placed on different disks                         Yes   Yes
Extent size can be changed after creation                            No    No

There are three categories of table space:
• Regular table space, which can store regular, index, and long data. Nevertheless, this type of table space is not optimized for long data.
• Large table space, which is designed to store long character or LOB data.
• Temporary table space, which is designed to store temporary tables. A user cannot define a table in a temporary table space.

Note: Only users with SYSADM or SYSCTRL authority can create table spaces.

SMS table space
This type of table space stores its containers in the form of operating system directories. Because SMS table spaces cannot be resized manually, enlarging the underlying file system is the way to increase the size of the table space. SMS table spaces acquire more space only as needed.

There are a few advantages associated with SMS table spaces, such as ease of creation and maintenance. The main disadvantage of an SMS table space is that it can only be created as regular or temporary and cannot store long data types.


DMS table space
The containers that are associated with a DMS table space are either operating system files or raw devices. A DMS table space can be resized manually with the alter tablespace command using the RESIZE option. The DBA decides the location of containers belonging to the table space and when to add containers. A DMS table space can be defined as regular, large, or temporary.
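As an illustration of the RESIZE option, the following resizes the file container of the EMP_TBS table space that is created later in Example 12-6 to 51200 pages; the new size is illustrative:

db2 "alter tablespace emp_tbs resize (file '/db2/user_data' 51200)"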

When planning for table spaces, you should consider the table space size, type, and the placement on the physical drive. Migration time is a good time to redesign the table spaces of your database if you have been considering it. XPS chunks are most similar to the DB2 UDB DMS table space container.

Creating table spaces
The command to create a table space takes the following form:
CREATE Tablespace_data_type TABLESPACE Tablespace_name
  PAGESIZE Integer K
  MANAGED BY Tablespace_type
  USING Container_path

In this command:
• Tablespace_data_type indicates whether the table space is regular, large, or temporary.
• Tablespace_name indicates the name of the table space.
• Integer indicates the page size in kilobytes (KB).
• Tablespace_type indicates either SYSTEM or DATABASE, for SMS and DMS table spaces respectively.
• Container_path indicates the path and name of a container.

Note: DB2 supports page sizes of 4 KB, 8 KB, 16 KB, and 32 KB. If you are creating tables with row sizes wider than 4 KB (for tables without LOBs), then you must create a table space with a page size large enough to support the width of that table; you must also create a buffer pool using the same page size as your table space if one does not already exist. Buffer pools are created using the CREATE BUFFERPOOL statement.

The following are examples of some DB2 commands to use when working with table spaces. Example 12-6 shows the command to create a regular table space.

Example 12-6 Create tablespace command
CREATE REGULAR TABLESPACE EMP_TBS
MANAGED BY DATABASE
USING (FILE '/db2/user_data' 25600);

Example 12-7 shows the syntax that is used to create a table space for storing large objects in a DB2 database.

Example 12-7 Create a table space of type large
CREATE LARGE TABLESPACE lob_tbs
MANAGED BY DATABASE
USING (FILE '/db2/lob/user_lobs' 25600);

Example 12-8 shows the syntax that is used to create a table space for storing indexes in a DB2 database.

Example 12-8 Create a table space to store indexes
CREATE TABLESPACE ind_tbs
MANAGED BY DATABASE
USING (FILE '/db2/indx/user_indx' 25600);

To obtain information about existing table spaces, the DBA can issue the following command from the CLP:
db2 list tablespaces

If detailed information is required, issue the following command:
db2 list tablespaces show detail

Security considerations
DB2 UDB uses existing operating system users as database users. In an environment such as AIX, users are simply added to specific operating system groups and are authenticated at the operating system level.

DB2 provides two levels of security to users. The first is called an authority, which bundles specific privileges over the database in its entirety, such as creating a database, creating a table space, and performing backup and recovery tasks. The second is called a privilege, which allows a user to access, create, or manipulate a specific database object in the database, such as a table, view, or index. The instance owner, however, has extra authority at the instance level.

DB2 UDB authorities
An authority in DB2 UDB is defined as a group in AIX, and granting a specific user an authority simply means that the user is assigned to the corresponding group in the /etc/group file. The levels of authority in DB2 UDB are classified as follows:
• SYSADM: Administrative authority. System administrators are given full privileges over the entire DB2 instance. SYSADM cannot be granted with a SQL statement.
• SYSCTRL: System control authority. System controllers are given full privileges for managing the system, but are not allowed access to data. SYSCTRL cannot be granted with a SQL statement.
• SYSMAINT: System maintenance authority. System maintainers are given a subset of privileges to manage the system. SYSMAINT cannot be granted with a SQL statement.
• DBADM: Database administrative authority. Database administrators have control over an individual database. DBADM can be granted with a SQL statement, as shown after this list.
• LOAD: The LOAD authority is granted at the database level. Users with LOAD authority can load data into a table. To load data into a table, the INSERT privilege on the table is also required. Depending on the load activity, the UPDATE and DELETE privileges on the table might also be needed.
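Because DBADM is grantable through SQL, a user with SYSADM authority could, for example, issue the following; the user name smith is illustrative:

db2 "grant dbadm on database to user smith"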

DB2 UDB privileges
Database privileges are granted within the database through the SQL command GRANT. Privileges are stored in the system catalog tables within the database. There are three types of privileges:
• Ownership or CONTROL privileges: In most cases, the database user who creates a database object is automatically granted the CONTROL privilege. This privilege permits the user to grant other database users certain privileges on the object, through the GRANT statement.
• Individual privileges: Classic examples of this type are the SELECT, INSERT, UPDATE, and DELETE privileges.
• Implicit privileges: An implicit privilege is granted automatically to a user when that user is granted a higher-level privilege.

Grant command syntax
The syntax of the grant and revoke commands is as follows:
GRANT privilege ON object_name TO USER username
REVOKE privilege ON object_name FROM username

Example 12-9 includes examples of granting a database authority and a table privilege to a user.

Example 12-9 Granting create table privilege to user smith
GRANT CREATETAB ON DATABASE TO USER smith;
GRANT INSERT ON emp_table TO USER smith;
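To remove a privilege, the REVOKE form shown previously applies; for example, using the same illustrative user and table:

REVOKE INSERT ON emp_table FROM USER smith;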

Creating DB2 database users
In DB2 UDB, users are created at the operating system level using operating system commands and utilities. For example, to create a new database user called db2user and grant the user select, insert, and update privileges on table accounts in an AIX environment, we perform the following steps:
1. Log on to the AIX server as root and create a group:
   mkgroup id=995 accttab
2. Create a user and assign it to group accttab:
   mkuser id=1001 pgrp=accttab groups=accttab home=/home/db2user db2user

3. Edit the .profile file for user db2user, add the db2profile path to it, and execute .profile to pick up the changes:
   . /db2/home/db2inst2/sqllib/db2profile
   . ./.profile
4. Log on to the AIX server as the instance owner or any authorized user and connect to the database:
   su - db2inst2
   db2 connect to sample
5. Grant the desired privileges to the group:
   db2 "grant select, insert, update on accounts to group accttab"
6. Log on as user db2user, connect to database sample, and issue a SQL statement against table accounts:
   su - db2user
   db2 connect to sample
   db2 "select * from db2inst2.accounts"



Chapter 13. Large data volumes: A case study

This chapter discusses the case study that is used to demonstrate the transition of large volumes of data from XPS to DB2 UDB Enterprise Server Edition (ESE) with the Data Partitioning Feature (DPF). It also discusses the various tools and scripts that we used for transferring the schema, data, queries, and applications from XPS to DB2. Based on our test results, it provides some recommendations and tips that can help make the transitioning process as smooth as possible.

The TPC-H benchmark data schema, which is defined by the Transaction Processing Performance Council, was used as a representative schema for large data volumes. Although the TPC-H benchmark is a standard test case that is primarily used for measuring performance, that was neither our objective nor our intent. We simply used this benchmark for our transitioning case study because it was available, and is well known and understood by many. Although this chapter briefly touches on some performance tuning considerations, the topic is not discussed in depth because it is beyond the scope of this redbook.

13.1 Project environment

Both XPS and DB2 were installed on the same logical partition (LPAR) of a single IBM eServer. We could have installed XPS and DB2 on separate LPARs, but decided to use this configuration to demonstrate and test the co-existence of both on a single logical partition.

The processor and memory resources that were available were split evenly between XPS and DB2. There were eight processors each for XPS and DB2. To split the processor resources, we used the processor affinity setting for XPS and the AIX resource set functionality for DB2.

The details about how to set the processor affinity and the AIX resource sets are explained in 13.3, “Splitting the CPU resources” on page 376.

To split the memory, we set the configuration parameters SHMTOTAL on XPS and INSTANCE_MEMORY on DB2. Although we created separate volume groups for XPS and DB2, the disks were effectively shared.

The following are the products used for the hardware and software environments:
• Hardware:
  – IBM eServer pSeries® Model 690
  – 128 GB of memory
  – 32 hard disks, each with 140 GB of disk space
  – 16 POWER4+™ TURBO processors, 1.9 GHz each
• Software:
  – AIX 5.2 ML 3
  – XPS 8.50.FC1
  – CSDK 2.81.FC2
  – I-Spy 2.00.FD2
  – DB2 UDB ESE with DPF V8.2
  – Query Patroller V8.2
  – DB2 High Performance Unload (HPU) V2.2
  – TPC-H schema, and data and query generator

13.2 Disk layout

There were 32 disks available on the server, each containing 140 GB of space. We created 32 volume groups, with each volume group containing these five logical volumes corresponding to disk N (where N = 1..32):
• /wrkN (20 GB): cooked file system; /wrk[1-4] contained the installation of DB2 and XPS, and the TPC-H datagen software.
• /dev/rxpsN (40 GB): raw device used for the XPS dbslices.
• /dev/rdb2aN (20 GB): raw device used for DB2 table spaces.
• /dev/rdb2bN (20 GB): raw device used for DB2 table spaces.
• /db2fN (40 GB): cooked file system used for the DB2 system TEMP table space and TPC-H load datafiles.

Figure 13-1 illustrates the layout of the disks.

Figure 13-1 Disk layout (the five logical volumes /wrkN, /dev/rxpsN, /dev/rdb2aN, /dev/rdb2bN, and /db2fN on each of drives 1 through 32, spread across four controllers)

13.3 Splitting the CPU resources

This section describes how we split the CPU resources between XPS and DB2 by using processor affinity and AIX resource sets, respectively. Out of the available 16 CPUs, we reserved CPUs 0 through 7 for XPS and CPUs 8 through 15 for DB2. By doing this, XPS and DB2 do not compete with each other for CPU resources, even though they still compete for I/O bandwidth.

13.3.1 Configuring processor affinity on XPS
On multiprocessor computers that support processor affinity, binding a CPU virtual processor (CPUVP) to a CPU causes the CPUVP to run exclusively on that CPU. The database server assigns CPUVPs to CPUs in serial fashion, starting with the processor number that is specified by the onconfig parameter AFF_SPROC. The onconfig parameter AFF_NPROCS specifies the number of CPUs to which the database server can bind CPUVPs.

The onconfig parameter NUMCPUVPS was set to 2 (4 coservers x 2 CPUVPs each = 8 processors). AFF_NPROCS was set to 2 in the global onconfig section, and the AFF_SPROC parameter was set in each coserver-specific section. CPUs 0 and 1 were bound to coserver 1, CPUs 2 and 3 were bound to coserver 2, and so on. See Example 13-1 for a snippet from our onconfig file.

Example 13-1 Processor affinity setting in the onconfig file NUMCPUVPS 2 # Number of user (cpu) vps AFF_NPROCS 2 # Affinity number of processors

COSERVER 1 NODE CLYDE AFF_SPROC 0 # Affinity start processor END

COSERVER 2 NODE CLYDE AFF_SPROC 2 # Affinity start processor END

COSERVER 3 NODE CLYDE AFF_SPROC 4 # Affinity start processor END

COSERVER 4 NODE CLYDE AFF_SPROC 6 # Affinity start processor END

13.3.2 Creating resource sets on AIX

The following are the steps that we took to enable processor affinity on DB2:

1. We defined eight new resource sets in the /etc/rsets file that correspond to the eight database partitions in our DB2 environment. CPUs 8 through 15 were assigned to DB2. These eight resource sets are named DB2/MLN[1-8], as shown in Example 13-2.

Example 13-2 The /etc/rsets file
DB2/MLN1:
    owner = db2tpch
    group = db2admin
    perm = rwr-r-
    resources = sys/cpu.00008

DB2/MLN2:
    owner = db2tpch
    group = db2admin
    perm = rwr-r-
    resources = sys/cpu.00009

DB2/MLN3:
    owner = db2tpch
    group = db2admin
    perm = rwr-r-
    resources = sys/cpu.00010

DB2/MLN4:
    owner = db2tpch
    group = db2admin
    perm = rwr-r-
    resources = sys/cpu.00011

DB2/MLN5:
    owner = db2tpch
    group = db2admin
    perm = rwr-r-
    resources = sys/cpu.00012

DB2/MLN6:
    owner = db2tpch
    group = db2admin
    perm = rwr-r-
    resources = sys/cpu.00013

DB2/MLN7:
    owner = db2tpch
    group = db2admin
    perm = rwr-r-
    resources = sys/cpu.00014

DB2/MLN8:
    owner = db2tpch
    group = db2admin
    perm = rwr-r-
    resources = sys/cpu.00015

2. The newly defined resource sets were added to the kernel data structures using the following SMIT fast path:
   $ smit reloadrsetcntl
   This menu gives you the option to reload the resource set database now, at the next boot, or both. Because this was the first time we used the new resource sets, we selected both so that the rsets would be loaded immediately and after each reboot. If you had changed an existing rset, you might select now.
3. The eight resource sets are specified in the resourcesetname column of the db2nodes.cfg file (equivalent to the COSERVER section in the XPS onconfig file) against each database partition. This is depicted in Example 13-3.

Example 13-3 Linking resource set to db2nodes.cfg 1 CLYDE 0 CLYDE DB2/MLN1 2 CLYDE 1 CLYDE DB2/MLN2 3 CLYDE 2 CLYDE DB2/MLN3 4 CLYDE 3 CLYDE DB2/MLN4 5 CLYDE 4 CLYDE DB2/MLN5 6 CLYDE 5 CLYDE DB2/MLN6 7 CLYDE 6 CLYDE DB2/MLN7 8 CLYDE 7 CLYDE DB2/MLN8

13.4 TPC-H Data generation

For this case study, we used 100 GB of data that was generated by the TPC-H data generation tool, dbgen. Table 13-1 lists the eight tables of the TPC-H schema, along with their respective data volumes.

Table 13-1 TPC-H tables

Table        Rows
lineitem     600,037,902
orders       150,000,000
customer     15,000,000
supplier     1,000,000
part         20,000,000
partsupp     80,000,000
nation       25
region       5

The TPC-H program, dbgen, was used to generate the load files. Example 13-4 gives a snippet from the dbgen help output.

Example 13-4 The dbgen utility - help
$ dbgen -help
TPC-H Population Generator (Version 1.3.0)
Copyright Transaction Processing Performance Council 1994 - 2000
USAGE:
dbgen [-{vfFD}] [-O {fhmsv}] [-T {pcsoPSOL}] [-s <scale>] [-C <procs>] [-S <step>]
dbgen [-v] [-O {dfhmr}] [-s <scale>] [-U <updates>] [-r <percent>]

-C <n> -- use <n> processes to generate data [Under DOS, must be used with -S]
-F     -- generate flat files output
-s <n> -- set Scale Factor (SF) to <n>
-T c   -- generate customers ONLY
-T l   -- generate nation/region ONLY
-T L   -- generate lineitem ONLY
-T n   -- generate nation ONLY
-T o   -- generate orders/lineitem ONLY
-T O   -- generate orders ONLY
-T p   -- generate parts/partsupp ONLY
-T P   -- generate parts ONLY
-T r   -- generate region ONLY
-T s   -- generate suppliers ONLY
-T S   -- generate partsupp ONLY

You can create a datafile for a particular table using the -T option. If n is passed as a parameter to the -C option, dbgen creates the datafiles as n streams.

We created 16 streams for lineitem (the largest table in the database), one stream for nation and region (smallest tables) and eight streams for the remaining five tables. The size of the database (100 GB in this case) was passed to the -s option.

We used the script in Example 13-5 to generate the load files. This script generates load files in the directory /db2f[1-n]/dbgen, where n is the number of splits. For example, the 16 load files for lineitem are created as /db2f1/dbgen/lineitem.tbl.1, /db2f2/dbgen/lineitem.tbl.2, ..., /db2f16/dbgen/lineitem.tbl.16. These directories correspond to disks 1 through 16, as seen in the disk layout in Figure 13-1 on page 375.

Note: The load files are evenly spread across the four disk controllers to maximize I/O throughput.

Example 13-5 Script to generate load datafiles
#!/bin/ksh
# Creates datafiles in dir /db2f[1-n]/dbgen/<table>.tbl.n
TPCH_SIZE=100
gen() {
    # Parms = TABLE, TABLE_ACRONYM, NUM_SPLIT
    # don't create links for small_table(s);
    # create the links to the dirs and then run dbgen
    # (note: the if guard below is reconstructed; single-split
    # tables skip the link creation and the -C option)
    if [ $3 -gt 1 ]
    then
        i=1
        while [ $i -le $3 ]
        do
            mkdir -p /db2f$i/dbgen
            rm $1.tbl.$i 2> /dev/null
            rm /db2f$i/dbgen/$1.tbl.$i 2> /dev/null
            ln -s /db2f$i/dbgen/$1.tbl.$i $1.tbl.$i
            let i=i+1
        done
        ./dbgen -s $TPCH_SIZE -T $2 -C $3
    else
        ./dbgen -s $TPCH_SIZE -T $2
    fi
}

cd /home/informix/dbgen
gen partsupp S 8
gen supplier s 8
gen part P 8
gen orders O 8
gen lineitem L 16
gen customer c 8
gen nation n 1
gen region r 1

13.5 XPS configuration

We installed the following Informix products in the directory /wrk3:
• XPS 8.50.FC1
• ClientSDK version 2.81.FC2
• I-Spy version 2.00.FD2

The CSDK libraries are used by WebSphere Information Integrator to interact with the XPS database.

We chose a four coserver environment with two CPUVPs per coserver. Out of the total 128 GB of memory, 64 GB was assigned to XPS, of which we reserved 58 GB (90% of 64 GB) for the database server. These are some of the coserver-specific parameters:
• SHMTOTAL = 14.5 GB (58 GB across four coservers)
• SHMVIRTSIZE = 10 GB
• SHMADD = 2 GB
• DS_TOTAL_MEMORY = 11 GB (75% of SHMTOTAL)

The general steps we took to run the TPC-H queries on XPS are shown in Figure 13-2 on page 382. The scripts used to create the dbslices and tables are discussed in the subsequent sections of this chapter.

Figure 13-2 XPS setup process (initialize disks and create dbslices and database; create raw tables; load tables from flat files; alter tables to operational and create indices; run update statistics; run the TPC-H queries)

13.5.1 Onconfig file
Example 13-6 shows some of the interesting parameters from our onconfig file.

Example 13-6 The onconfig file snippet
MULTIPROCESSOR 1      # 0 for single-processor, 1 for multi-processor
NUMCPUVPS 2           # Number of user (cpu) vps
SINGLE_CPU_VP 0       # If non-zero, limit number of cpu vps to one
NOAGE 1               # Process aging
AFF_NPROCS 2          # Affinity number of processors

PAGESIZE 8192         # System Pagesize

# Read Ahead Variables
RA_PAGES 256          # Number of pages to attempt to read ahead
RA_THRESHOLD 128      # Number of pages left before next group
IDX_RA_PAGES 64       # Number of index pages to read ahead
IDX_RA_THRESHOLD 32   # Number of index pages left before next group

# Shared Memory Parameters
LOCKS 10000           # Maximum number of locks
BUFFERS 50000         # Maximum number of shared buffers
NUMAIOVPS 4           # Number of IO vps
NUMFIFOVPS 2          # Number of FIFO vps
PHYSBUFF 32           # Physical log buffer size (Kbytes)
LOGBUFF 32            # Logical log buffer size (Kbytes)
CLEANERS 2            # Number of buffer cleaner processes
SHMBASE 0x700000000000000L  # Shared memory base address
SHMVIRTSIZE 10000000  # initial virtual shared mem segment size
SHMADD 2000000        # Size of new shared memory segments (Kbytes)
SHMTOTAL 14680064     # Total shared memory (Kbytes). 0=>unlimited
CKPTINTVL 30000       # Check point interval (in sec)
LRUS 8                # Number of LRU queues
LRU_MAX_DIRTY 60      # LRU percent dirty begin cleaning limit
LRU_MIN_DIRTY 50      # LRU percent dirty end cleaning limit
TXTIMEOUT 300         # Transaction timeout (in sec)
STACKSIZE 64          # Stack size (Kbytes)

DS_TOTAL_MEMORY 11010048  # Decision support memory (Kbytes)

SBUFFER 2048
LBUFFER 24576
DGINFO SHM_SBUFS,96,SHM_LBUFS,256,SHM_HBUFS,16

COSERVER 1
NODE CLYDE
AFF_SPROC 0           # Affinity start processor
END
COSERVER 2
NODE CLYDE
AFF_SPROC 2           # Affinity start processor
END
COSERVER 3
NODE CLYDE
AFF_SPROC 4           # Affinity start processor
END
COSERVER 4
NODE CLYDE
AFF_SPROC 6           # Affinity start processor
END

13.5.2 Creating dbslices

We created the symbolic links to the 32 raw disks by using the script that is shown in Example 13-7. We used these links to create the dbslices on the four coservers.

Example 13-7 Symbolic links to disks #!/bin/ksh

# create links to the raw disks for 4 coservers

ln -s /dev/rxps1 /wrk4/tpch/disks/xps1.1 ln -s /dev/rxps2 /wrk4/tpch/disks/xps2.1 ln -s /dev/rxps3 /wrk4/tpch/disks/xps3.1 ln -s /dev/rxps4 /wrk4/tpch/disks/xps4.1 ln -s /dev/rxps5 /wrk4/tpch/disks/xps5.1 ln -s /dev/rxps6 /wrk4/tpch/disks/xps6.1 ln -s /dev/rxps7 /wrk4/tpch/disks/xps7.1 ln -s /dev/rxps8 /wrk4/tpch/disks/xps8.1

ln -s /dev/rxps9 /wrk4/tpch/disks/xps1.2 ln -s /dev/rxps10 /wrk4/tpch/disks/xps2.2 ln -s /dev/rxps11 /wrk4/tpch/disks/xps3.2 ln -s /dev/rxps12 /wrk4/tpch/disks/xps4.2 ln -s /dev/rxps13 /wrk4/tpch/disks/xps5.2 ln -s /dev/rxps14 /wrk4/tpch/disks/xps6.2 ln -s /dev/rxps15 /wrk4/tpch/disks/xps7.2 ln -s /dev/rxps16 /wrk4/tpch/disks/xps8.2

ln -s /dev/rxps17 /wrk4/tpch/disks/xps1.3 ln -s /dev/rxps18 /wrk4/tpch/disks/xps2.3 ln -s /dev/rxps19 /wrk4/tpch/disks/xps3.3 ln -s /dev/rxps20 /wrk4/tpch/disks/xps4.3 ln -s /dev/rxps21 /wrk4/tpch/disks/xps5.3 ln -s /dev/rxps22 /wrk4/tpch/disks/xps6.3 ln -s /dev/rxps23 /wrk4/tpch/disks/xps7.3 ln -s /dev/rxps24 /wrk4/tpch/disks/xps8.3

ln -s /dev/rxps25 /wrk4/tpch/disks/xps1.4 ln -s /dev/rxps26 /wrk4/tpch/disks/xps2.4 ln -s /dev/rxps27 /wrk4/tpch/disks/xps3.4 ln -s /dev/rxps28 /wrk4/tpch/disks/xps4.4 ln -s /dev/rxps29 /wrk4/tpch/disks/xps5.4 ln -s /dev/rxps30 /wrk4/tpch/disks/xps6.4 ln -s /dev/rxps31 /wrk4/tpch/disks/xps7.4 ln -s /dev/rxps32 /wrk4/tpch/disks/xps8.4

Example 13-8 shows the script that we used to create the dbslices. Each slice contains 32 dbspaces that are spread across the 32 disks. We used offsets into the raw disks to create the dbslices on these 32 disks. As you can see from the dbslice definition, the coserver 1 dbspaces reside on disks 1..8, the coserver 2 dbspaces reside on disks 9..16, and so on. The tables lineitem and orders are created on the 84 slices lo_mon (1..84). As the name suggests, temp_slice is a temporary dbslice. Indexes are created on the dbslice olind, and the remaining tables reside in dbslice s_c_p_ps.

Example 13-8 The dbslice creation script
#!/bin/ksh
onutil <<EOF
...
offset 2500000 size 1000000;

create dbslice lo_mon1 from
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps1.%c" offset 3500000 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps2.%c" offset 3500000 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps3.%c" offset 3500000 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps4.%c" offset 3500000 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps5.%c" offset 3500000 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps6.%c" offset 3500000 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps7.%c" offset 3500000 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps8.%c" offset 3500000 size 95240;
...
create dbslice lo_mon84 from
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps1.%c" offset 11404920 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps2.%c" offset 11404920 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps3.%c" offset 11404920 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps4.%c" offset 11404920 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps5.%c" offset 11404920 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps6.%c" offset 11404920 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps7.%c" offset 11404920 size 95240,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps8.%c" offset 11404920 size 95240;

create dbslice olind from
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps1.%c" offset 11500160 size 2000000,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps2.%c" offset 11500160 size 2000000,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps3.%c" offset 11500160 size 2000000,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps4.%c" offset 11500160 size 2000000,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps5.%c" offset 11500160 size 2000000,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps6.%c" offset 11500160 size 2000000,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps7.%c" offset 11500160 size 2000000,
    cogroup cogroup_all chunk "/wrk4/tpch/disks/xps8.%c" offset 11500160 size 2000000;
EOF

13.5.3 Creating and loading the table
We created and loaded the tables using external tables, with the script shown in “XPS schema and load scripts” on page 411. We fragmented the two largest tables, lineitem and orders, on the 84 dbslices lo_mon (1..84) using a hybrid strategy. The hashing column was the key column (l_orderkey and o_orderkey), and the expression column was the date column (l_shipdate and o_orderdate).

Figure 13-3 shows the layout of these two tables on the 84 dbslices.

Figure 13-3 Tables lineitem, orders created in dbslices lo_mon (1..84)

The smallest tables, nation and region, were created in rootdbs.1. The remaining tables, customer, supplier, parts, and partsupp, were hash fragmented on a key, residing in the dbslice s_c_p_ps. Figure 13-4 shows the layout of these four tables across the four coservers.

Figure 13-4 Tables supplier, customer, parts and partsupp in dbslice s_c_p_ps

All the tables were created as raw, with the lock mode set to table (to avoid the allocation of locks). The data was loaded using external tables. After the load, the table type was changed to operational (to enable index creation) and the lock mode was changed to page (a real-world scenario).

Note: The load files were spread across the four coservers so that the load files could be processed in parallel.

13.5.4 Index builds and update statistics
We used the script in Example 13-9 to create the indexes and run update statistics. We ran a high level of update statistics on the smaller tables nation and region, and a medium level of update statistics, with a resolution of 0.1 and a confidence level of 0.99, on the remaining tables.

Example 13-9 Index Build and Update statistics script on XPS
#!/bin/ksh

DBNAME=tpch

dbaccess -e $DBNAME <<EOF
...
EOF

dbaccess -e $DBNAME <<EOF
...
EOF
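The SQL inside the two dbaccess heredocs is not reproduced here. A plausible sketch, based on the index definitions in Example 13-17 and the statistics levels just described, and not the book's actual script, follows:

# hypothetical reconstruction of the two heredoc bodies
dbaccess -e $DBNAME <<EOF
create index l_i1_k on lineitem (l_orderkey) fragment by hash(l_orderkey) in olind;
create index o_i1_k on orders (o_orderkey) fragment by hash(o_orderkey) in olind;
EOF

dbaccess -e $DBNAME <<EOF
update statistics high for table nation;
update statistics high for table region;
update statistics medium for table lineitem resolution 0.1 0.99;
update statistics medium for table orders resolution 0.1 0.99;
EOF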

Figure 13-5 shows the layout of the index on the dbslice olind.

Figure 13-5 Indexes l_i1_k and o_i1_k created in dbslice olind

13.5.5 Running TPC-H queries
We ran the 22 queries provided by TPC-H using both the single-stream and multistream methods.

Single stream run of TPC-H queries
To see the 22 queries that we ran on XPS and DB2, see “Using the Web material” on page 432, which provides a pointer to a zipped file that contains these queries.

Multistream run
For a multistream run, we ran five client sessions in parallel. Each client session contained the 22 queries in a random order. We used the qgen utility that is provided by TPC-H to generate the five scripts for the stream run. Example 13-10 on page 391 shows the script that created the five stream files for the 100 GB database tpch. Each script contains the 22 queries in a random order, with a randomly generated seed for the filters in the WHERE condition. The query output is created in the results directory.

Example 13-10 Script to create stream files
#!/bin/ksh
for i in 1 2 3 4 5
do
    qgen -n tpch -c -o results -p $i -s 100 > stream0$i.sql
done

13.6 DB2 configuration

We chose an eight-partition environment for the DB2 instance, corresponding to the eight CPUs reserved for DB2. This section describes the DB2 configuration and the scripts that we used to create the table spaces, load the tables, and create the indexes.

13.6.1 Database and database manager configuration
Some of the interesting configuration parameters that we set at the database and database manager levels are shown in Example 13-11 on page 392. The parameter UTIL_HEAP_SZ was set to 65536 during the loads and later changed to 5000 for the query runs, because this parameter is only relevant to loads.

In a partitioned database environment, most communication between database partitions is handled by the Fast Communications Manager (FCM). If the DB2_FORCE_FCM_BP variable is set to Yes, the FCM buffers are always created in a separate memory segment so that communication between FCM daemons of different logical partitions on the same physical node occurs through shared memory. Otherwise, FCM daemons on the same node communicate through UNIX Sockets.

If DB2_PARALLEL_IO is not set, the I/O parallelism is equal to the number of containers in the table space. If it is set, the I/O parallelism is equal to the result of prefetch size divided by extent size. For example, even though a table space has only two containers, if this variable is set and prefetch size is four times the extent size, a prefetch request for this table space will be broken into four requests (one extent per request) with a possibility of four prefetchers servicing the requests in parallel.
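To make the arithmetic concrete with the extent and prefetch sizes used later for space_dataindex, this is roughly the effect; the comments are illustrative:

# With DB2_PARALLEL_IO set for a table space, prefetch parallelism =
# PREFETCHSIZE / EXTENTSIZE. space_dataindex uses extentsize 32 and
# prefetchsize 128, so each prefetch request is split into 128/32 = 4
# extent-sized requests that can be serviced by 4 prefetchers in parallel.
db2set DB2_PARALLEL_IO="*"    # enable for all table spaces (as in Example 13-11)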

The DB2_LARGE_PAGE_MEM registry variable is used to enable large page support when running on an operating system that supports this feature (for example, AIX 5.2). If this is set, each DB2 UDB agent consumes at least one large page (16 MB) of physical memory.

Chapter 13. Large data volumes: A case study 391 The TPC-H queries Q21 and Q22 contain a subquery in the NOT EXISTS clause. Setting DB2_ANTIJOIN to ON rewrites the NOT EXISTS to an anti-join and dramatically improves the performance of these two queries.

For more details on any specific parameter, refer to DB2 UDB Administration Guide: Implementation, SC09-4820.

Example 13-11 Database and Database Manager configuration
#!/bin/ksh

db2set DB2_FORCE_FCM_BP=yes
db2set DB2_PARALLEL_IO="*"
db2set DB2_LARGE_PAGE_MEM=DB
db2set DB2_ANTIJOIN=ON

# Database configuration
db2 update dbm cfg using UTIL_IMPACT_LIM 100
db2_all db2 -v update db cfg for tpch using NUM_FREQVALUES 0
db2_all db2 -v update db cfg for tpch using NUM_QUANTILES 1000
db2_all db2 -v update db cfg for tpch using DBHEAP 7500
db2_all db2 -v update db cfg for tpch using LOGBUFSZ 1024
db2_all db2 -v update db cfg for tpch using UTIL_HEAP_SZ 65536
db2_all db2 -v update db cfg for tpch using LOCKLIST 8196
db2_all db2 -v update db cfg for tpch using SOFTMAX 300
db2_all db2 -v update db cfg for tpch using SORTHEAP 50000
db2_all db2 -v update db cfg for tpch using SHEAPTHRES_SHR 250
db2_all db2 -v update db cfg for tpch using STMTHEAP 20000
db2_all db2 -v update db cfg for tpch using STAT_HEAP_SZ 10000
db2_all db2 -v update db cfg for tpch using LOGFILSIZ 8192
db2_all db2 -v update db cfg for tpch using LOGPRIMARY 50
db2_all db2 -v update db cfg for tpch using LOGSECOND 5
db2_all db2 -v update db cfg for tpch using DATABASE_MEMORY automatic

# Database Manager configuration
db2 -v update dbm cfg using SHEAPTHRES 500000
db2 -v update dbm cfg using FCM_NUM_BUFFERS 262144

13.6.2 Creation of nodegroups and table spaces

The PAGESIZE in XPS was set to 8 KB. On DB2, we used a page size of 16 KB for the tables and indexes. The buffer pools were created or altered using the following commands:

alter bufferpool ibmdefaultbp size 50000;
create bufferpool bp16k all dbpartitionnums size 250000 pagesize 16K;

We created a partition group corresponding to partition #2. The smallest tables, nation and region, reside on partition #2. The partition group was created with the following command:

create database partition group onepartition_group on dbpartitionnums(2);

We then created the following table spaces:
• space_small, which is managed by SMS and used to host data for the tables nation and region. It uses the buffer pool ibmdefaultbp with the default page size of 4 KB.
• space_dataindex, which is managed by DMS and used to host data and indexes for the remaining tables. It uses the buffer pool bp16k with a page size of 16 KB.

Example 13-12 shows the script that we used to create these table spaces.

Example 13-12 Script to create table spaces on DB2
#!/bin/ksh

if [ $# -lt 2 ]
then
    echo "\tusage: ${0} <dbname> <size>"
    exit
fi

export DB2OPTIONS="-tv +p"

# 1250000 pages of 16K page = 20G
total_size=$2

db2 <<EOF
create tablespace space_dataindex in database partition group ibmdefaultgroup
    pagesize 16K managed by database using (
    ...
    device '/wrk2/tablespaces/db2a4.\$N' ${total_size}
    )
    bufferpool bp16k extentsize 32 prefetchsize 128;

create tablespace space_small in database partition group onepartition_group
    pagesize 4K managed by system using (
    '/wrk2/tablespaces/db2f1.\$N/space_small',
    '/wrk2/tablespaces/db2f2.\$N/space_small',
    '/wrk2/tablespaces/db2f3.\$N/space_small',
    '/wrk2/tablespaces/db2f4.\$N/space_small'
    ) bufferpool ibmdefaultbp;
EOF
db2 terminate

13.6.3 Transfer of schema
You can use the DB2 Migration ToolKit (MTK) for Informix to transfer your schemas from XPS to DB2, as explained in Chapter 12, “DB2 Migration ToolKit for Informix” on page 345. For our case study, we wanted to use the manual method to identify similarities and differences between the two. We used the following steps to create a DB2 compatible SQL command script:
1. Schema download
   We ran dbschema with the fragmentation option (-ss) to unload the schema to a file, schema.out. In this file, we changed the XPS specific syntax to DB2 syntax manually, as follows:
   dbschema -d tpch -ss > schema.out

Note: The MTK V1.3 ignores the fragmentation clauses in the XPS input file and creates an output without any partitioning information. You have to change the output to add the DB2 partitioning clauses manually. Future versions will support this feature.

2. Data types
   All the data types in the TPC-H schema exist in DB2, so we did not change any data types. If your schema contains data types such as serial, interval, text, byte, money, datetime, and nchar, they need to be translated to their equivalent DB2 data types. MTK does this for you automatically.

3. Table type and schema name
   The table type operational was changed to not logged initially (equivalent to the table type raw in XPS) to improve load performance. The schema name informix was replaced with the DB2 user db2tpch. Example 13-13 shows the commands that we used.

Example 13-13 Changing schema name and table type
(XPS)
create operational table "informix".nation (n_nationkey integer not null, ..)
in rootdbs.1;

(DB2)
create table "db2tpch".nation (n_nationkey integer not null, ..)
in space_small not logged initially;

4. Hash fragmentation
   Tables that were fragmented by hash on a column were changed to the equivalent DB2 syntax: a partitioning key on that column (DPF). Example 13-14 shows the commands to do that.

Example 13-14 XPS hash to DB2 partitioning
(XPS)
create operational table "informix".partsupp (
    ps_partkey integer not null,
    ps_suppkey integer not null,
    ps_availqty integer,
    ps_supplycost decimal(12,2),
    ps_comment varchar(199)
) fragment by hash (ps_partkey) in s_c_p_ps;

(DB2)
create table partsupp (
    ps_partkey integer not null,
    ps_suppkey integer not null,
    ps_availqty integer,
    ps_supplycost decimal(12,2),
    ps_comment varchar(199)
) partitioning key (ps_partkey) using hashing
in space_dataindex not logged initially;

5. Hybrid fragmentation

   The hybrid fragmentation on XPS tables translates to a partitioning key on the hash column (DPF) and MDC on the expression column.

Note: On XPS, lineitem and orders are fragmented by hybrid on 84 dbslices. Each slice contains data for one month. The 84 dbslices contain seven years of data (84/12=7). Creating an MDC on the date column would create a sparse block index. Therefore, we created an MDC on the generated column YearMonth. As a result, each MDC slice contains one month of data.

Example 13-15 shows how the orders table is created in DB2 using MDC on a generated column. The function integer('1998-11-01') returns the integer value 19981101; dividing by 100 yields the YearMonth value 199811, which is what the generated column stores.

Example 13-15 Conversion of orders table
(XPS)
create table orders (
    o_orderkey decimal(10,0) not null,
    o_custkey integer,
    ...
    o_orderdate date,
    ...
    o_comment varchar(79)
) fragment by hybrid(o_orderkey) expression
    o_orderdate < '1992-02-01' in lo_mon1,
    o_orderdate >= '1992-02-01' and o_orderdate < '1992-03-01' in lo_mon2,
    o_orderdate >= '1992-03-01' and o_orderdate < '1992-04-01' in lo_mon3,
    ...
    o_orderdate >= '1998-11-01' and o_orderdate < '1998-12-01' in lo_mon83,
    o_orderdate >= '1998-12-01' in lo_mon84
extent size 512 next size 128 lock mode table;

(DB2)
create table orders (
    o_orderkey decimal(10,0) not null,
    o_custkey integer,
    ...
    o_comment varchar(79),
    o_orderym generated always as (integer(o_orderdate)/100)
) partitioning key (o_orderkey) using hashing
organize by(o_orderym)
in space_dataindex not logged initially;

6. Extent size

   On DB2, the extent size and next size cannot be specified directly in a CREATE TABLE statement, so we had to remove these clauses from the CREATE TABLE. In DB2, the extent size of a table is determined by the extent size of its table space. Example 13-16 shows that the extent size for orders, which resides in table space space_dataindex, is 32. There is no equivalent of next size in DB2.

Example 13-16 DB2 extent size
(DB2)
create tablespace space_dataindex in database partition group ibmdefaultgroup
    pagesize 16K managed by database using (...)
    bufferpool bp16k extentsize 32 prefetchsize 128;

7. Table lock mode
   The default lock mode in DB2 is row-level locking. In XPS, we used a table-level lock before the load and then later changed it to page level.
8. Indexes
   The CREATE INDEX statement in DB2 does not contain storage clauses. If you need to create indexes in a different table space than the table, you have to specify that table space when creating the table. We created the same two indexes (on lineitem and orders) on XPS and DB2, as shown in Example 13-17. On DB2, the indexes were created in the same table space (space_dataindex) as the tables.

Example 13-17 Index creation
(XPS)
create index l_i1_k on lineitem (l_orderkey) fragment by hash(l_orderkey) in olind;
create index o_i1_k on orders (o_orderkey) fragment by hash(o_orderkey) in olind;

(DB2)
create index l_i1_k on lineitem (l_orderkey);
create index o_i1_k on orders (o_orderkey);

9. Grants and Revokes

   We did not create any roles in XPS. If you have roles defined in your XPS instance, you will have to create these roles as operating system groups for DB2. There was only one user, informix, working on the XPS server.

   Example 13-18 shows the default GRANT and REVOKE statements that were generated by the dbschema command for the table partsupp. The AS clause is not valid on DB2. The equivalent grant command on DB2 is shown in the same example.

Example 13-18 Grant statements
(XPS)
grant select on "informix".partsupp to "public" as "informix";
grant update on "informix".partsupp to "public" as "informix";
grant insert on "informix".partsupp to "public" as "informix";
grant delete on "informix".partsupp to "public" as "informix";
grant index on "informix".partsupp to "public" as "informix";

(DB2)
grant select on "informix".partsupp to "public";

13.6.4 Creating and loading the table
On DB2, we ran the same TPC-H queries, using the single-stream and multistream methods, on the following configurations:
• Multidimensional Clustering (MDC) tables
• Non-MDC tables (partitioning using DPF)
• UNION ALL views

Loads were done using disk files as well as pipes. We tested a load for a smaller table using WebSphere Information Integrator. We also carried out roll-in and roll-out of data on the MDC tables and UNION ALL views.

MDC tables
The scripts used for the MDC approach are described in “DB2 schema and load scripts” on page 421. The tables lineitem, orders, customer, parts, supplier, and partsupp and the three indexes were created in the table space space_dataindex. These tables were partitioned across the eight partitions using a key. The smaller tables, nation and region, were created in the table space space_small, located only on partition 2.

We created the MDC dimension on a generated column based on the date columns (analogous to the expression column of the hybrid fragmented table in XPS) for the largest tables, lineitem and orders. The generated column was the YearMonth part of the date column, so the MDC granularity was one month. This is similar to XPS, where each dbslice contained one month of data. Example 13-19 shows the command to create the lineitem table using MDC.

Example 13-19 MDC table
create table lineitem (
    l_orderkey decimal(10,0) not null,
    l_partkey integer,
    l_suppkey integer,
    l_linenumber integer,
    l_quantity decimal(12,2),
    l_extendedprice decimal(12,2),
    l_discount decimal(12,2),
    l_tax decimal(12,2),
    l_returnflag char(1),
    l_linestatus char(1),
    l_shipdate date,
    l_commitdate date,
    l_receiptdate date,
    l_shipinstruct char(25),
    l_shipmode char(10),
    l_comment varchar(44),
    l_shipym generated always as (integer(l_shipdate)/100)
)
partitioning key (l_orderkey) using hashing
organize by(l_shipym)
in space_dataindex
not logged initially;
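As a quick sanity check of the generated-column expression: in DB2, converting a DATE to INTEGER yields the value yyyymmdd, so integer division by 100 leaves the yyyymm value that forms the MDC cell. The sample date here is ours, not from the case study data:

(DB2)
-- returns 199703, the cell value for March 1997
values integer(date('1997-03-15'))/100;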

Figure 13-6 shows the layout of the TPC-H tables in the DB2 environment.

Figure 13-6 Layout of the tables on DB2 (table space space_dataindex spans partitions P1 to P8 and holds customer, parts, supplier, partsupp, lineitem, orders, and the indexes on lineitem and orders; table space space_small on partition 2 holds nation and region)

Non-MDC tables

For this configuration, we re-created the lineitem and orders tables without the MDC ORGANIZE BY clause. All the other tables were exactly the same as in the MDC approach.

13.6.5 Transfer of data

This section discusses the methods that we used to transfer data between XPS and DB2.

Load using ASCII flat files

We unloaded the data from the XPS tables into delimited ASCII flat files using an external table unload. These data files were then loaded into the DB2 tables using the LOAD command.

The scripts that we used to load the data into the MDC tables are described in “DB2 schema and load scripts” on page 421.

Load using pipes

We also tested unloading from XPS and loading into DB2 using pipes, because both XPS and DB2 reside on the same physical server. You can use this approach to save the disk space that would otherwise be required for staging the load files. If your XPS and DB2 systems are on separate machines, you could use NFS mount points to transfer data between them. However, you would have to weigh the performance impact of using NFS against the added disk requirement.

Writing a script to implement this was a little tricky. Sometimes the INSERT would fail with a "pipe not open" error, because DB2 had not yet opened the pipe for reading. So, we wrote a script, wait_load_start.sh, to delay the INSERT until DB2 had opened the pipe on all the partitions and started the second phase. We also wrote a script, wait_load_stop.sh, to make the loads run sequentially. The scripts that we used to load the data using pipes are shown in Example 13-20.

Example 13-20 Unload and load using pipes
#!/bin/ksh
# load_using_pipe.sh
dbaccess -e tpch - <<EOF > unload.out 2>&1
create external table partsupp_ext sameas partsupp
using(datafiles("PIPE:1:/db2f1/dbgen/partsupp.tbl.1",
                "PIPE:2:/db2f2/dbgen/partsupp.tbl.2",
                "PIPE:3:/db2f3/dbgen/partsupp.tbl.3",
                "PIPE:4:/db2f4/dbgen/partsupp.tbl.4",
                "PIPE:1:/db2f5/dbgen/partsupp.tbl.5",
                "PIPE:2:/db2f6/dbgen/partsupp.tbl.6",
                "PIPE:3:/db2f7/dbgen/partsupp.tbl.7",
                "PIPE:4:/db2f8/dbgen/partsupp.tbl.8"));

!db2_load_partsupp.sh &
!wait_load_start.sh PARTSUPP 8
insert into partsupp_ext select * from partsupp;
!wait_load_stop.sh db2_load_
EOF

======
#!/bin/ksh
# db2_load_partsupp.sh
export DB2OPTIONS="-tv +p"
db2 <<EOF > z_pipe.log 2>&1
connect to tpch;
load from /db2f1/dbgen/partsupp.tbl.1,
          /db2f2/dbgen/partsupp.tbl.2,
          /db2f3/dbgen/partsupp.tbl.3,
          /db2f4/dbgen/partsupp.tbl.4,
          /db2f5/dbgen/partsupp.tbl.5,
          /db2f6/dbgen/partsupp.tbl.6,
          /db2f7/dbgen/partsupp.tbl.7,
          /db2f8/dbgen/partsupp.tbl.8
of del modified by coldel| anyorder
replace into partsupp nonrecoverable data buffer 40000
partitioned db config mode partition_and_load;
EOF

db2 terminate

======
#!/bin/ksh
# wait_load_start.sh
if [ $# -lt 2 ]
then
    echo "\tusage: ${0} <table> <number_of_partitions>"
    exit
fi

number_partitions=0
table=${1}
while [ "x$number_partitions" != "x$2" ]
do
    # ps -ef | grep db2lpprt
    number_partitions=$(db2pd -utilities -alldbpartitionnum | grep ${table} | awk '{print $12}' | grep 2 | wc -l | tr -d ' ')
    sleep 1
done
sleep 10

======
#!/bin/ksh
# wait_load_stop.sh
if [ $# -lt 1 ]
then
    echo "\tusage: ${0} <process_name>"
    exit
fi

rc=0
while [ $rc -eq 0 ]
do
    dummy=$(ps -ef | grep ${1} | grep -v grep | grep -v $0)
    rc=$?
    sleep 1
done
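The PIPE: datafiles in Example 13-20 assume that the named pipes already exist in the file system; the case study scripts do not show their creation. A minimal sketch of a hypothetical helper that creates the partsupp pipes:

#!/bin/ksh
# create_pipes.sh (hypothetical helper, not part of the case study scripts)
for n in 1 2 3 4 5 6 7 8
do
    mkfifo /db2f${n}/dbgen/partsupp.tbl.${n}
done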

WebSphere Information Integrator

We used WebSphere Information Integrator to demonstrate queries across the federated XPS and DB2 databases. We also demonstrated that WebSphere Information Integrator can transfer data for the smaller tables.

13.6.6 Index builds and runstats

We created the same two indexes on DB2 as were on XPS, and ran the runstats utility on all eight tables by connecting to partition 1. With DB2 runstats, only rows that reside on the connection partition are sampled. This is different from XPS, where rows from all the coservers are used for sampling. Example 13-21 shows the commands that we used for runstats.

Example 13-21 Index builds and runstats
(DB2)
create index l_i1_k on lineitem (l_orderkey);
create index o_i1_k on orders (o_orderkey);

-- for each table
runstats on table ${table} with distribution on all columns and indexes all;
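Because only the connection partition is sampled, it matters which partition you connect to before running runstats. A minimal sketch using the DB2NODE environment variable, which selects the partition for subsequent connections in a DPF instance:

export DB2NODE=1    # direct subsequent connections to partition 1
db2 terminate       # needed for the new DB2NODE value to take effect
db2 connect to tpch
db2 "runstats on table informix.lineitem with distribution on all columns and indexes all"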

13.6.7 Changes to the TPC-H queries

We made the following changes to the TPC-H queries to make them work with the DB2 syntax:
- XPS does not support ANSI outer joins; it uses its own semantics for outer joins. We can convert the simple case of an XPS outer join into an equivalent ANSI outer join. However, in the case of outer(tab1,tab2), this is not so straightforward. For the simplest case, the join conditions in the WHERE clause of the XPS outer join move into the join condition of the ANSI outer join syntax. Filter conditions on the subordinate table also move into the join condition (XPS applies them while preserving the dominant rows), while the remaining filter conditions move into the WHERE clause of the ANSI outer join.

Example 13-22 shows how the query Q13 is rewritten to use a semantically equivalent ANSI outer join.

Example 13-22 ANSI outer join

(XPS)
select c_count, count(*) as custdist
from ( select c_custkey, count(o_orderkey) c_count
       from customer, outer orders
       where c_custkey = o_custkey
         and o_comment not like '%pending%packages%'
       group by c_custkey
     ) as c_orders
group by c_count
order by custdist desc, c_count desc;

(DB2)
select c_count, count(*) as custdist
from ( select c_custkey, count(o_orderkey) c_count
       from customer left outer join orders
         on c_custkey = o_custkey
        and o_comment not like '%pending%packages%'
       group by c_custkey
     ) as c_orders
group by c_count
order by custdist desc, c_count desc;

- A cardinal number (column position) in the GROUP BY column list is not valid syntax on DB2 V8.2. So, we changed Q9 to use the column expression, as shown in Example 13-23.

Example 13-23 Cardinal number to column name
(XPS)
group by n_name, 2
order by n_name, 2 desc;

(DB2)
group by n_name, year(o_orderdate)
order by n_name, year(o_orderdate) desc;

10. The FIRST syntax is different in DB2, so it had to be changed for query Q3. Example 13-24 shows the rewritten query.

Example 13-24 FIRST syntax

(XPS)
select FIRST 10 l_orderkey, ...
from customer, orders, lineitem
where ... group by ... order by ...;

(DB2)
select l_orderkey, ...
from customer, orders, lineitem
where ... group by ... order by ...
fetch first 10 rows only;

11. The interval syntax needed to be changed. For example, Q1 had to be changed as shown in Example 13-25.

Example 13-25 Interval syntax
(XPS)
l_shipdate <= date('1998-12-01') - interval (90) day (3) to day

(DB2)
l_shipdate <= date('1998-12-01') - 90 day

13.6.8 Roll-in and roll-out of data

We tested roll-in and roll-out of data in several environments. The results are discussed in the following sections.

XPS

We ran DETACH and ATTACH on the lineitem table to perform roll-in and roll-out of data. Because the lineitem table has a detached index (the fragmentation strategy of the index is different from the table fragmentation), we had to drop the index before the ATTACH/DETACH and re-create it afterwards. Also note that the new table that results from the execution of the DETACH clause does not inherit any indexes or constraints from the original table. That is why you see an ALTER TABLE MODIFY before re-attaching the new table. If this constraint is not added, the ALTER FRAGMENT ATTACH gives a "Cannot attach because of incompatible schema" error.

Example 13-26 shows the script that we used to run this.

Example 13-26 Roll-in/Roll-out: XPS

(XPS)
#!/bin/ksh
# detach_and_attach.sh
export DBDATE=y4md-

dbaccess -e - - <<EOF > setup.out 2>&1
database tpch;
begin work;

-- need to do this because index is detached
drop index l_i1_k;

-- Detach the month 03/1997
alter fragment on table lineitem detach lo_mon63.1 detached_lo;

alter table detached_lo modify (l_orderkey decimal(10,0) not null);

alter fragment on table lineitem attach detached_lo
    as l_shipdate >= '1997-03-01' and l_shipdate < '1997-04-01'
    after lo_mon62.1;

create index "informix".l_i1_k on "informix".lineitem (l_orderkey)
    fragment by hash (l_orderkey) in olind;
commit work;
EOF

DB2: Using MDC tables

We used a single SQL script to perform a DELETE and INSERT of the rows on the base table. Example 13-27 shows the script that we used to perform a roll-in and roll-out on an MDC table.

Example 13-27 Roll-in and roll-out: DB2
#!/bin/ksh
# delete_and_insert.sh
export DB2OPTIONS="+c -tv +p -s"
db2 <<EOF

create table detached_lo like lineitem
partitioning key (l_orderkey) using hashing
in space_dataindex not logged initially;

declare detach_cursor cursor for
(select * from old table
    (DELETE from lineitem
     where l_shipdate >= '1997-03-01'
       and l_shipdate < '1997-04-01')
);

load from detach_cursor of cursor
modified by anyorder
replace into detached_lo nonrecoverable;

insert into lineitem select * from detached_lo;
commit work;
EOF

13.6.9 Roll-in and roll-out using UNION ALL views

We created 28 smaller tables for each of the lineitem and orders tables, where each table contained 3 months of data. The lineitem and orders views were then defined as a UNION ALL of these 28 smaller tables. Example 13-28 shows a snippet of the script that we used to perform this. Roll-in involved creating a new table and re-creating the view with an additional SELECT of this table. Similarly, to perform roll-out, we excluded the table (to be detached) from the view definition (a sketch follows Example 13-28).

Example 13-28 Roll-in/Roll-out: UNION ALL views
create table orders_before_1992_04 (
    o_orderkey decimal(10,0) not null,
    o_custkey integer,
    o_orderstatus char(1),
    o_totalprice decimal(12,2),
    o_orderdate date,
    o_orderpriority char(15),
    o_clerk char(15),
    o_shippriority integer,
    o_comment varchar(79),
    constraint before_1992_04 check (o_orderdate < '1992-04-01')
        enforced enable query optimization
)
partitioning key (o_orderkey) using hashing
in space_dataindex
not logged initially;

create table orders_1992_04_06 (
    o_orderkey decimal(10,0) not null,
    o_custkey integer,
    o_orderstatus char(1),
    o_totalprice decimal(12,2),
    o_orderdate date,
    o_orderpriority char(15),
    o_clerk char(15),
    o_shippriority integer,
    o_comment varchar(79),
    constraint in_1992_04_06 check (o_orderdate >= '1992-04-01'
        and o_orderdate < '1992-07-01')
        enforced enable query optimization
)
partitioning key (o_orderkey) using hashing
in space_dataindex
not logged initially;

... <and so on for all the 28 tables>

create view orders as
( select * from orders_before_1992_04 union all
  select * from orders_1992_04_06 union all
  select * from orders_1992_07_09 union all
  select * from orders_1992_10_12 union all
  select * from orders_1993_01_03 union all
  select * from orders_1993_04_06 union all
  select * from orders_1993_07_09 union all
  select * from orders_1993_10_12 union all
  select * from orders_1994_01_03 union all
  select * from orders_1994_04_06 union all
  select * from orders_1994_07_09 union all
  select * from orders_1994_10_12 union all
  select * from orders_1995_01_03 union all
  select * from orders_1995_04_06 union all
  select * from orders_1995_07_09 union all
  select * from orders_1995_10_12 union all
  select * from orders_1996_01_03 union all
  select * from orders_1996_04_06 union all
  select * from orders_1996_07_09 union all
  select * from orders_1996_10_12 union all
  select * from orders_1997_01_03 union all
  select * from orders_1997_04_06 union all
  select * from orders_1997_07_09 union all
  select * from orders_1997_10_12 union all
  select * from orders_1998_01_03 union all
  select * from orders_1998_04_06 union all
  select * from orders_1998_07_09 union all
  select * from orders_after_1998_10 );
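Roll-out is then just a view redefinition. A minimal, abbreviated sketch (DB2 UDB V8.2 has no CREATE OR REPLACE VIEW, so the view is dropped and re-created; the full SELECT list follows the pattern of Example 13-28):

drop view orders;
create view orders as
( select * from orders_1992_04_06 union all
  select * from orders_1992_07_09 union all
  ... <the remaining tables, omitting the one being rolled out> ...
  select * from orders_after_1998_10 );
drop table orders_before_1992_04;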

The queries that joined the lineitem and orders tables along with the other tables did not perform well because there was no join pushdown. This was because the number of tables exceeded the limits of the UNION ALL views as described in 6.8.3, "Limitations of UNION ALL views" on page 179. It particularly impacted queries that did not contain any filter criteria on these two tables. The setup to create the UNION ALL views for larger tables can be cumbersome. This is one of the reasons we believe MDC is a better alternative in such cases.

13.7 Observations

Here are some observations from the case study. Bear in mind that these observations are based on our testing in a minimally tuned environment. We did not take the time for performance tuning because that was not a primary objective of the project. In addition, the XPS and DB2 environments were not similar in a number of respects. For example, XPS had a four-coserver environment as compared to eight partitions on DB2. Also, the maximum number of chunks on XPS was around 2500, while DB2 used 32 containers.

The following is a list of observations specific to our environment:
- The total query times on XPS and DB2 for both the single-stream and multi-stream runs were approximately the same.
- Query Q22 initially performed faster on XPS. But when we set the DB2 registry variable DB2_ANTIJOIN to ON, it performed the same as on XPS. This registry variable transforms a NOT EXISTS into an anti-join; the queries Q16, Q21, and Q22 contained a NOT EXISTS clause. (A sketch of setting the variable follows this list.)
- We found the partitioning and MDC syntax on DB2 to be more manageable than UNION ALL views (DB2) or hybrid fragmentation (XPS). For XPS, the user must know the data set values for the fragmentation columns in order to create an expression to be able to ATTACH a fragment to an existing table.
- Roll-in/roll-out of data for an MDC table can be done with INSERT and DELETE statements. Our tests finished in acceptable times. For example, a DELETE and INSERT of around 8 million rows finished in just seconds.
- There is some additional overhead when running a DETACH/ATTACH in XPS that you will not encounter in DB2. If the table has a detached index on XPS, you have to drop and re-create the index on the base table. If there are outstanding DML statements running on the table, you cannot run an ATTACH/DETACH on the table. You need to add constraints to the new table (corresponding to the constraints of the base table) before an ATTACH.
- Even taking into consideration the logging overhead, MDC tables performed much better than UNION ALL tables. This is because we did not get a join pushdown on some of the queries using UNION ALL views. They also out-performed non-MDC tables (those that used pure hash partitioning).
- Loading on XPS using external tables was faster than the DB2 load. However, this is not a fair comparison because XPS had many more data fragments and hence used more parallelism.
- The extra disk usage for MDC tables was negligible (less than 0.001%).
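The registry variable in the second observation is set with the db2set command. A minimal sketch; as with most DB2 registry variables, assume the instance must be restarted for the change to take effect:

db2set DB2_ANTIJOIN=ON
db2stop
db2start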

You will get fast roll-in and roll-out times with UNION ALL views, but you have to make sure that the queries using these views do not fall under the limitations mentioned in 6.8.3, "Limitations of UNION ALL views" on page 179. UNION ALL views work very well with smaller numbers of tables. However, if your XPS environment contains tables with a large number of fragments and you do not need very fast roll-in and roll-out of data, you might find the DELETE and INSERT times to be within acceptable limits. MDC tables seem to be the best option available on DB2 UDB V8.2.



Appendix A. Case study schemas definitions

This appendix lists the XPS and DB2 table schemas that we used in Chapter 13, “Large data volumes: A case study” on page 373.

XPS schema and load scripts

We used the script depicted in Example A-1 to create the TPC-H tables and the external tables on XPS. It also contains the statements that we used to load the TPC-H data into these tables.

Example: A-1 XPS: schema and load script
#!/bin/ksh

# create the database and load the respective files
DBNAME=tpch
export DBDATE=y4md-

dbaccess -e - - <<EOF

set pdqpriority 100;

create raw table partsupp (
    ps_partkey integer not null,
    ps_suppkey integer not null,
    ps_availqty integer,
    ps_supplycost decimal(12,2),
    ps_comment varchar(199)
) fragment by hash(ps_partkey) in s_c_p_ps
extent size 512 next size 128
lock mode table;

create external table partsupp_ext sameas partsupp
using(datafiles("DISK:1:/db2f1/dbgen/partsupp.tbl.1",
                "DISK:2:/db2f2/dbgen/partsupp.tbl.2",
                "DISK:3:/db2f3/dbgen/partsupp.tbl.3",
                "DISK:4:/db2f4/dbgen/partsupp.tbl.4",
                "DISK:1:/db2f5/dbgen/partsupp.tbl.5",
                "DISK:2:/db2f6/dbgen/partsupp.tbl.6",
                "DISK:3:/db2f7/dbgen/partsupp.tbl.7",
                "DISK:4:/db2f8/dbgen/partsupp.tbl.8"));

insert into partsupp select * from partsupp_ext;
drop table partsupp_ext;
alter table partsupp type (operational);
------------------------------------------------------------
create raw table supplier (
    s_suppkey integer not null,
    s_name char(25),
    s_address varchar(40),
    s_nationkey integer,
    s_phone char(15),
    s_acctbal decimal(12,2),
    s_comment varchar(101)
) fragment by hash(s_suppkey) in s_c_p_ps
extent size 512 next size 128
lock mode table;

create external table supplier_ext sameas supplier
using( format "delimited",
       datafiles("DISK:1:/db2f1/dbgen/supplier.tbl.1",
                 "DISK:2:/db2f2/dbgen/supplier.tbl.2",
                 "DISK:3:/db2f3/dbgen/supplier.tbl.3",
                 "DISK:4:/db2f4/dbgen/supplier.tbl.4",
                 "DISK:1:/db2f5/dbgen/supplier.tbl.5",
                 "DISK:2:/db2f6/dbgen/supplier.tbl.6",
                 "DISK:3:/db2f7/dbgen/supplier.tbl.7",
                 "DISK:4:/db2f8/dbgen/supplier.tbl.8"
));
insert into supplier select * from supplier_ext;
drop table supplier_ext;
alter table supplier type(operational);
------------------------------------------------------------
create raw table part (
    p_partkey integer not null,
    p_name varchar(55),
    p_mfgr char(25),
    p_brand char(10),
    p_type varchar(25),
    p_size integer,
    p_container char(10),
    p_retailprice decimal(12,2),
    p_comment varchar(23)
) fragment by hash(p_partkey) in s_c_p_ps
extent size 512 next size 128
lock mode table;

create external table part_ext sameas part
using ( format "delimited",
        datafiles("DISK:1:/db2f1/dbgen/part.tbl.1",
                  "DISK:2:/db2f2/dbgen/part.tbl.2",
                  "DISK:3:/db2f3/dbgen/part.tbl.3",
                  "DISK:4:/db2f4/dbgen/part.tbl.4",
                  "DISK:1:/db2f5/dbgen/part.tbl.5",
                  "DISK:2:/db2f6/dbgen/part.tbl.6",
                  "DISK:3:/db2f7/dbgen/part.tbl.7",
                  "DISK:4:/db2f8/dbgen/part.tbl.8"
));
insert into part select * from part_ext;
drop table part_ext;
alter table part type(operational);
------------------------------------------------------------
create raw table orders (
    o_orderkey decimal(10,0) not null,
    o_custkey integer,
    o_orderstatus char(1),
    o_totalprice decimal(12,2),
    o_orderdate date,
    o_orderpriority char(15),
    o_clerk char(15),
    o_shippriority integer,
    o_comment varchar(79)
) fragment by hybrid(o_orderkey) expression
o_orderdate < '1992-02-01' in lo_mon1,
o_orderdate >= '1992-02-01' and o_orderdate < '1992-03-01' in lo_mon2,
o_orderdate >= '1992-03-01' and o_orderdate < '1992-04-01' in lo_mon3,
o_orderdate >= '1992-04-01' and o_orderdate < '1992-05-01' in lo_mon4,
o_orderdate >= '1992-05-01' and o_orderdate < '1992-06-01' in lo_mon5,
o_orderdate >= '1992-06-01' and o_orderdate < '1992-07-01' in lo_mon6,
o_orderdate >= '1992-07-01' and o_orderdate < '1992-08-01' in lo_mon7,
o_orderdate >= '1992-08-01' and o_orderdate < '1992-09-01' in lo_mon8,
o_orderdate >= '1992-09-01' and o_orderdate < '1992-10-01' in lo_mon9,
o_orderdate >= '1992-10-01' and o_orderdate < '1992-11-01' in lo_mon10,
o_orderdate >= '1992-11-01' and o_orderdate < '1992-12-01' in lo_mon11,
o_orderdate >= '1992-12-01' and o_orderdate < '1993-01-01' in lo_mon12,
o_orderdate >= '1993-01-01' and o_orderdate < '1993-02-01' in lo_mon13,
o_orderdate >= '1993-02-01' and o_orderdate < '1993-03-01' in lo_mon14,
o_orderdate >= '1993-03-01' and o_orderdate < '1993-04-01' in lo_mon15,
o_orderdate >= '1993-04-01' and o_orderdate < '1993-05-01' in lo_mon16,
o_orderdate >= '1993-05-01' and o_orderdate < '1993-06-01' in lo_mon17,
o_orderdate >= '1993-06-01' and o_orderdate < '1993-07-01' in lo_mon18,
o_orderdate >= '1993-07-01' and o_orderdate < '1993-08-01' in lo_mon19,
o_orderdate >= '1993-08-01' and o_orderdate < '1993-09-01' in lo_mon20,
o_orderdate >= '1993-09-01' and o_orderdate < '1993-10-01' in lo_mon21,
o_orderdate >= '1993-10-01' and o_orderdate < '1993-11-01' in lo_mon22,
o_orderdate >= '1993-11-01' and o_orderdate < '1993-12-01' in lo_mon23,
o_orderdate >= '1993-12-01' and o_orderdate < '1994-01-01' in lo_mon24,
o_orderdate >= '1994-01-01' and o_orderdate < '1994-02-01' in lo_mon25,
o_orderdate >= '1994-02-01' and o_orderdate < '1994-03-01' in lo_mon26,
o_orderdate >= '1994-03-01' and o_orderdate < '1994-04-01' in lo_mon27,
o_orderdate >= '1994-04-01' and o_orderdate < '1994-05-01' in lo_mon28,
o_orderdate >= '1994-05-01' and o_orderdate < '1994-06-01' in lo_mon29,
o_orderdate >= '1994-06-01' and o_orderdate < '1994-07-01' in lo_mon30,
o_orderdate >= '1994-07-01' and o_orderdate < '1994-08-01' in lo_mon31,
o_orderdate >= '1994-08-01' and o_orderdate < '1994-09-01' in lo_mon32,
o_orderdate >= '1994-09-01' and o_orderdate < '1994-10-01' in lo_mon33,
o_orderdate >= '1994-10-01' and o_orderdate < '1994-11-01' in lo_mon34,
o_orderdate >= '1994-11-01' and o_orderdate < '1994-12-01' in lo_mon35,
o_orderdate >= '1994-12-01' and o_orderdate < '1995-01-01' in lo_mon36,
o_orderdate >= '1995-01-01' and o_orderdate < '1995-02-01' in lo_mon37,
o_orderdate >= '1995-02-01' and o_orderdate < '1995-03-01' in lo_mon38,
o_orderdate >= '1995-03-01' and o_orderdate < '1995-04-01' in lo_mon39,
o_orderdate >= '1995-04-01' and o_orderdate < '1995-05-01' in lo_mon40,
o_orderdate >= '1995-05-01' and o_orderdate < '1995-06-01' in lo_mon41,
o_orderdate >= '1995-06-01' and o_orderdate < '1995-07-01' in lo_mon42,
o_orderdate >= '1995-07-01' and o_orderdate < '1995-08-01' in lo_mon43,
o_orderdate >= '1995-08-01' and o_orderdate < '1995-09-01' in lo_mon44,
o_orderdate >= '1995-09-01' and o_orderdate < '1995-10-01' in lo_mon45,
o_orderdate >= '1995-10-01' and o_orderdate < '1995-11-01' in lo_mon46,
o_orderdate >= '1995-11-01' and o_orderdate < '1995-12-01' in lo_mon47,
o_orderdate >= '1995-12-01' and o_orderdate < '1996-01-01' in lo_mon48,
o_orderdate >= '1996-01-01' and o_orderdate < '1996-02-01' in lo_mon49,
o_orderdate >= '1996-02-01' and o_orderdate < '1996-03-01' in lo_mon50,
o_orderdate >= '1996-03-01' and o_orderdate < '1996-04-01' in lo_mon51,
o_orderdate >= '1996-04-01' and o_orderdate < '1996-05-01' in lo_mon52,
o_orderdate >= '1996-05-01' and o_orderdate < '1996-06-01' in lo_mon53,
o_orderdate >= '1996-06-01' and o_orderdate < '1996-07-01' in lo_mon54,
o_orderdate >= '1996-07-01' and o_orderdate < '1996-08-01' in lo_mon55,
o_orderdate >= '1996-08-01' and o_orderdate < '1996-09-01' in lo_mon56,
o_orderdate >= '1996-09-01' and o_orderdate < '1996-10-01' in lo_mon57,
o_orderdate >= '1996-10-01' and o_orderdate < '1996-11-01' in lo_mon58,
o_orderdate >= '1996-11-01' and o_orderdate < '1996-12-01' in lo_mon59,
o_orderdate >= '1996-12-01' and o_orderdate < '1997-01-01' in lo_mon60,
o_orderdate >= '1997-01-01' and o_orderdate < '1997-02-01' in lo_mon61,
o_orderdate >= '1997-02-01' and o_orderdate < '1997-03-01' in lo_mon62,
o_orderdate >= '1997-03-01' and o_orderdate < '1997-04-01' in lo_mon63,
o_orderdate >= '1997-04-01' and o_orderdate < '1997-05-01' in lo_mon64,
o_orderdate >= '1997-05-01' and o_orderdate < '1997-06-01' in lo_mon65,
o_orderdate >= '1997-06-01' and o_orderdate < '1997-07-01' in lo_mon66,
o_orderdate >= '1997-07-01' and o_orderdate < '1997-08-01' in lo_mon67,
o_orderdate >= '1997-08-01' and o_orderdate < '1997-09-01' in lo_mon68,
o_orderdate >= '1997-09-01' and o_orderdate < '1997-10-01' in lo_mon69,
o_orderdate >= '1997-10-01' and o_orderdate < '1997-11-01' in lo_mon70,
o_orderdate >= '1997-11-01' and o_orderdate < '1997-12-01' in lo_mon71,
o_orderdate >= '1997-12-01' and o_orderdate < '1998-01-01' in lo_mon72,
o_orderdate >= '1998-01-01' and o_orderdate < '1998-02-01' in lo_mon73,
o_orderdate >= '1998-02-01' and o_orderdate < '1998-03-01' in lo_mon74,
o_orderdate >= '1998-03-01' and o_orderdate < '1998-04-01' in lo_mon75,
o_orderdate >= '1998-04-01' and o_orderdate < '1998-05-01' in lo_mon76,
o_orderdate >= '1998-05-01' and o_orderdate < '1998-06-01' in lo_mon77,
o_orderdate >= '1998-06-01' and o_orderdate < '1998-07-01' in lo_mon78,
o_orderdate >= '1998-07-01' and o_orderdate < '1998-08-01' in lo_mon79,
o_orderdate >= '1998-08-01' and o_orderdate < '1998-09-01' in lo_mon80,
o_orderdate >= '1998-09-01' and o_orderdate < '1998-10-01' in lo_mon81,
o_orderdate >= '1998-10-01' and o_orderdate < '1998-11-01' in lo_mon82,
o_orderdate >= '1998-11-01' and o_orderdate < '1998-12-01' in lo_mon83,
o_orderdate >= '1998-12-01' in lo_mon84
extent size 512 next size 128
lock mode table;

create external table order_ext sameas orders
using(datafiles("DISK:1:/db2f1/dbgen/orders.tbl.1",
                "DISK:2:/db2f2/dbgen/orders.tbl.2",
                "DISK:3:/db2f3/dbgen/orders.tbl.3",
                "DISK:4:/db2f4/dbgen/orders.tbl.4",
                "DISK:1:/db2f5/dbgen/orders.tbl.5",
                "DISK:2:/db2f6/dbgen/orders.tbl.6",
                "DISK:3:/db2f7/dbgen/orders.tbl.7",
                "DISK:4:/db2f8/dbgen/orders.tbl.8"
));

insert into orders select * from order_ext;
drop table order_ext;
alter table orders type (operational);
------------------------------------------------------------
create raw table region (
    r_regionkey integer not null,
    r_name char(25),
    r_comment varchar(152)
) in rootdbs.1 lock mode table;

create external table region_ext sameas region
using( format "delimited",
       datafiles("DISK:1:/db2f1/dbgen/region.tbl.1"));

insert into region select * from region_ext;
drop table region_ext;
alter table region type(operational);
------------------------------------------------------------
create raw table lineitem (
    l_orderkey decimal(10,0) not null,
    l_partkey integer,
    l_suppkey integer,
    l_linenumber integer,
    l_quantity decimal(12,2),
    l_extendedprice decimal(12,2),
    l_discount decimal(12,2),
    l_tax decimal(12,2),
    l_returnflag char(1),
    l_linestatus char(1),
    l_shipdate date,
    l_commitdate date,
    l_receiptdate date,
    l_shipinstruct char(25),
    l_shipmode char(10),
    l_comment varchar(44)
) fragment by hybrid (l_orderkey) expression
l_shipdate < '1992-02-01' in lo_mon1,
l_shipdate >= '1992-02-01' and l_shipdate < '1992-03-01' in lo_mon2,
l_shipdate >= '1992-03-01' and l_shipdate < '1992-04-01' in lo_mon3,
l_shipdate >= '1992-04-01' and l_shipdate < '1992-05-01' in lo_mon4,
l_shipdate >= '1992-05-01' and l_shipdate < '1992-06-01' in lo_mon5,
l_shipdate >= '1992-06-01' and l_shipdate < '1992-07-01' in lo_mon6,
l_shipdate >= '1992-07-01' and l_shipdate < '1992-08-01' in lo_mon7,
l_shipdate >= '1992-08-01' and l_shipdate < '1992-09-01' in lo_mon8,
l_shipdate >= '1992-09-01' and l_shipdate < '1992-10-01' in lo_mon9,
l_shipdate >= '1992-10-01' and l_shipdate < '1992-11-01' in lo_mon10,
l_shipdate >= '1992-11-01' and l_shipdate < '1992-12-01' in lo_mon11,
l_shipdate >= '1992-12-01' and l_shipdate < '1993-01-01' in lo_mon12,
l_shipdate >= '1993-01-01' and l_shipdate < '1993-02-01' in lo_mon13,
l_shipdate >= '1993-02-01' and l_shipdate < '1993-03-01' in lo_mon14,
l_shipdate >= '1993-03-01' and l_shipdate < '1993-04-01' in lo_mon15,
l_shipdate >= '1993-04-01' and l_shipdate < '1993-05-01' in lo_mon16,
l_shipdate >= '1993-05-01' and l_shipdate < '1993-06-01' in lo_mon17,
l_shipdate >= '1993-06-01' and l_shipdate < '1993-07-01' in lo_mon18,
l_shipdate >= '1993-07-01' and l_shipdate < '1993-08-01' in lo_mon19,
l_shipdate >= '1993-08-01' and l_shipdate < '1993-09-01' in lo_mon20,
l_shipdate >= '1993-09-01' and l_shipdate < '1993-10-01' in lo_mon21,
l_shipdate >= '1993-10-01' and l_shipdate < '1993-11-01' in lo_mon22,
l_shipdate >= '1993-11-01' and l_shipdate < '1993-12-01' in lo_mon23,
l_shipdate >= '1993-12-01' and l_shipdate < '1994-01-01' in lo_mon24,
l_shipdate >= '1994-01-01' and l_shipdate < '1994-02-01' in lo_mon25,
l_shipdate >= '1994-02-01' and l_shipdate < '1994-03-01' in lo_mon26,
l_shipdate >= '1994-03-01' and l_shipdate < '1994-04-01' in lo_mon27,
l_shipdate >= '1994-04-01' and l_shipdate < '1994-05-01' in lo_mon28,
l_shipdate >= '1994-05-01' and l_shipdate < '1994-06-01' in lo_mon29,
l_shipdate >= '1994-06-01' and l_shipdate < '1994-07-01' in lo_mon30,
l_shipdate >= '1994-07-01' and l_shipdate < '1994-08-01' in lo_mon31,
l_shipdate >= '1994-08-01' and l_shipdate < '1994-09-01' in lo_mon32,
l_shipdate >= '1994-09-01' and l_shipdate < '1994-10-01' in lo_mon33,
l_shipdate >= '1994-10-01' and l_shipdate < '1994-11-01' in lo_mon34,
l_shipdate >= '1994-11-01' and l_shipdate < '1994-12-01' in lo_mon35,
l_shipdate >= '1994-12-01' and l_shipdate < '1995-01-01' in lo_mon36,
l_shipdate >= '1995-01-01' and l_shipdate < '1995-02-01' in lo_mon37,
l_shipdate >= '1995-02-01' and l_shipdate < '1995-03-01' in lo_mon38,
l_shipdate >= '1995-03-01' and l_shipdate < '1995-04-01' in lo_mon39,
l_shipdate >= '1995-04-01' and l_shipdate < '1995-05-01' in lo_mon40,
l_shipdate >= '1995-05-01' and l_shipdate < '1995-06-01' in lo_mon41,
l_shipdate >= '1995-06-01' and l_shipdate < '1995-07-01' in lo_mon42,
l_shipdate >= '1995-07-01' and l_shipdate < '1995-08-01' in lo_mon43,
l_shipdate >= '1995-08-01' and l_shipdate < '1995-09-01' in lo_mon44,
l_shipdate >= '1995-09-01' and l_shipdate < '1995-10-01' in lo_mon45,
l_shipdate >= '1995-10-01' and l_shipdate < '1995-11-01' in lo_mon46,
l_shipdate >= '1995-11-01' and l_shipdate < '1995-12-01' in lo_mon47,
l_shipdate >= '1995-12-01' and l_shipdate < '1996-01-01' in lo_mon48,
l_shipdate >= '1996-01-01' and l_shipdate < '1996-02-01' in lo_mon49,
l_shipdate >= '1996-02-01' and l_shipdate < '1996-03-01' in lo_mon50,
l_shipdate >= '1996-03-01' and l_shipdate < '1996-04-01' in lo_mon51,
l_shipdate >= '1996-04-01' and l_shipdate < '1996-05-01' in lo_mon52,
l_shipdate >= '1996-05-01' and l_shipdate < '1996-06-01' in lo_mon53,
l_shipdate >= '1996-06-01' and l_shipdate < '1996-07-01' in lo_mon54,
l_shipdate >= '1996-07-01' and l_shipdate < '1996-08-01' in lo_mon55,
l_shipdate >= '1996-08-01' and l_shipdate < '1996-09-01' in lo_mon56,
l_shipdate >= '1996-09-01' and l_shipdate < '1996-10-01' in lo_mon57,
l_shipdate >= '1996-10-01' and l_shipdate < '1996-11-01' in lo_mon58,
l_shipdate >= '1996-11-01' and l_shipdate < '1996-12-01' in lo_mon59,
l_shipdate >= '1996-12-01' and l_shipdate < '1997-01-01' in lo_mon60,
l_shipdate >= '1997-01-01' and l_shipdate < '1997-02-01' in lo_mon61,
l_shipdate >= '1997-02-01' and l_shipdate < '1997-03-01' in lo_mon62,
l_shipdate >= '1997-03-01' and l_shipdate < '1997-04-01' in lo_mon63,
l_shipdate >= '1997-04-01' and l_shipdate < '1997-05-01' in lo_mon64,
l_shipdate >= '1997-05-01' and l_shipdate < '1997-06-01' in lo_mon65,
l_shipdate >= '1997-06-01' and l_shipdate < '1997-07-01' in lo_mon66,
l_shipdate >= '1997-07-01' and l_shipdate < '1997-08-01' in lo_mon67,
l_shipdate >= '1997-08-01' and l_shipdate < '1997-09-01' in lo_mon68,
l_shipdate >= '1997-09-01' and l_shipdate < '1997-10-01' in lo_mon69,
l_shipdate >= '1997-10-01' and l_shipdate < '1997-11-01' in lo_mon70,
l_shipdate >= '1997-11-01' and l_shipdate < '1997-12-01' in lo_mon71,
l_shipdate >= '1997-12-01' and l_shipdate < '1998-01-01' in lo_mon72,
l_shipdate >= '1998-01-01' and l_shipdate < '1998-02-01' in lo_mon73,
l_shipdate >= '1998-02-01' and l_shipdate < '1998-03-01' in lo_mon74,
l_shipdate >= '1998-03-01' and l_shipdate < '1998-04-01' in lo_mon75,
l_shipdate >= '1998-04-01' and l_shipdate < '1998-05-01' in lo_mon76,
l_shipdate >= '1998-05-01' and l_shipdate < '1998-06-01' in lo_mon77,
l_shipdate >= '1998-06-01' and l_shipdate < '1998-07-01' in lo_mon78,
l_shipdate >= '1998-07-01' and l_shipdate < '1998-08-01' in lo_mon79,
l_shipdate >= '1998-08-01' and l_shipdate < '1998-09-01' in lo_mon80,
l_shipdate >= '1998-09-01' and l_shipdate < '1998-10-01' in lo_mon81,
l_shipdate >= '1998-10-01' and l_shipdate < '1998-11-01' in lo_mon82,
l_shipdate >= '1998-11-01' and l_shipdate < '1998-12-01' in lo_mon83,
l_shipdate >= '1998-12-01' in lo_mon84
extent size 512 next size 128
lock mode table;

create external table lineitem_ext sameas lineitem
using(datafiles("DISK:1:/db2f1/dbgen/lineitem.tbl.1",
                "DISK:2:/db2f2/dbgen/lineitem.tbl.2",
                "DISK:3:/db2f3/dbgen/lineitem.tbl.3",
                "DISK:4:/db2f4/dbgen/lineitem.tbl.4",
                "DISK:1:/db2f5/dbgen/lineitem.tbl.5",
                "DISK:2:/db2f6/dbgen/lineitem.tbl.6",
                "DISK:3:/db2f7/dbgen/lineitem.tbl.7",
                "DISK:4:/db2f8/dbgen/lineitem.tbl.8",
                "DISK:1:/db2f9/dbgen/lineitem.tbl.9",
                "DISK:2:/db2f10/dbgen/lineitem.tbl.10",
                "DISK:3:/db2f11/dbgen/lineitem.tbl.11",
                "DISK:4:/db2f12/dbgen/lineitem.tbl.12",
                "DISK:1:/db2f13/dbgen/lineitem.tbl.13",
                "DISK:2:/db2f14/dbgen/lineitem.tbl.14",
                "DISK:3:/db2f15/dbgen/lineitem.tbl.15",
                "DISK:4:/db2f16/dbgen/lineitem.tbl.16"
));

insert into lineitem select * from lineitem_ext;
drop table lineitem_ext;
alter table lineitem type(operational);
------------------------------------------------------------
create raw table nation (
    n_nationkey integer not null,
    n_name char(25),
    n_regionkey integer,
    n_comment varchar(152)
) in rootdbs.1 lock mode table;

create external table nation_ext sameas nation
using( format "delimited",
       datafiles("DISK:1:/db2f1/dbgen/nation.tbl.1"));
insert into nation select * from nation_ext;
drop table nation_ext;
alter table nation type (operational);
------------------------------------------------------------
create raw table customer (
    c_custkey integer not null,
    c_name varchar(25),
    c_address varchar(40),
    c_nationkey integer,
    c_phone char(15),
    c_acctbal decimal(12,2),
    c_mktsegment char(10),
    c_comment varchar(117)
) fragment by hash(c_custkey) in s_c_p_ps
extent size 512 next size 128
lock mode table;

create external table cust_ext sameas customer
using ( format "delimited",
        datafiles("DISK:1:/db2f1/dbgen/customer.tbl.1",
                  "DISK:2:/db2f2/dbgen/customer.tbl.2",
                  "DISK:3:/db2f3/dbgen/customer.tbl.3",
                  "DISK:4:/db2f4/dbgen/customer.tbl.4",
                  "DISK:1:/db2f5/dbgen/customer.tbl.5",
                  "DISK:2:/db2f6/dbgen/customer.tbl.6",
                  "DISK:3:/db2f7/dbgen/customer.tbl.7",
                  "DISK:4:/db2f8/dbgen/customer.tbl.8"
));
insert into customer select * from cust_ext;
drop table cust_ext;
alter table customer type (operational);

alter table region lock mode(page);
alter table nation lock mode(page);
alter table part lock mode(page);
alter table supplier lock mode(page);
alter table partsupp lock mode(page);
alter table customer lock mode(page);
alter table orders lock mode(page);
alter table lineitem lock mode(page);
EOF

onutil alter cogroup cogroup_all reset backup

DB2 schema and load scripts

This section contains the scripts that we used in DB2 to create our case study tables and to load data. Example A-2 contains the script that we used to create the tables.

Example: A-2 DB2: schema
#!/bin/ksh

if [ $# -lt 1 ]
then
    echo "\tusage: ${0} <database>"
    exit
fi

export DB2OPTIONS="-tv +p"

echo Start $0 at `date`

db2 <<EOF

connect to ${1};

create table partsupp (
    ps_partkey integer not null,
    ps_suppkey integer not null,
    ps_availqty integer,
    ps_supplycost decimal(12,2),
    ps_comment varchar(199)
)
partitioning key (ps_partkey) using hashing
in space_dataindex
not logged initially;

create table supplier (
    s_suppkey integer not null,
    s_name char(25),
    s_address varchar(40),
    s_nationkey integer,
    s_phone char(15),
    s_acctbal decimal(12,2),
    s_comment varchar(101)
)
partitioning key (s_suppkey) using hashing
in space_dataindex
not logged initially;

create table part (
    p_partkey integer not null,
    p_name varchar(55),
    p_mfgr char(25),
    p_brand char(10),
    p_type varchar(25),
    p_size integer,
    p_container char(10),
    p_retailprice decimal(12,2),
    p_comment varchar(23)
)
partitioning key (p_partkey) using hashing
in space_dataindex
not logged initially;

create table orders (
    o_orderkey decimal(10,0) not null,
    o_custkey integer,
    o_orderstatus char(1),
    o_totalprice decimal(12,2),
    o_orderdate date,
    o_orderpriority char(15),
    o_clerk char(15),
    o_shippriority integer,
    o_comment varchar(79),
    o_orderym generated always as (integer(o_orderdate)/100)
)
partitioning key (o_orderkey) using hashing
organize by(o_orderym)
in space_dataindex
not logged initially;

create table region (
    r_regionkey integer not null,
    r_name char(25),
    r_comment varchar(152)
)
in space_small
not logged initially;

create table nation (
    n_nationkey integer not null,
    n_name char(25),
    n_regionkey integer,
    n_comment varchar(152)
)
in space_small
not logged initially;

create table lineitem (
    l_orderkey decimal(10,0) not null,
    l_partkey integer,
    l_suppkey integer,
    l_linenumber integer,
    l_quantity decimal(12,2),
    l_extendedprice decimal(12,2),
    l_discount decimal(12,2),
    l_tax decimal(12,2),
    l_returnflag char(1),
    l_linestatus char(1),
    l_shipdate date,
    l_commitdate date,
    l_receiptdate date,
    l_shipinstruct char(25),
    l_shipmode char(10),
    l_comment varchar(44),
    l_shipym generated always as (integer(l_shipdate)/100)
)
partitioning key (l_orderkey) using hashing
organize by(l_shipym)
in space_dataindex
not logged initially;

create table customer (
    c_custkey integer not null,
    c_name varchar(25),
    c_address varchar(40),
    c_nationkey integer,
    c_phone char(15),
    c_acctbal decimal(12,2),
    c_mktsegment char(10),
    c_comment varchar(117)
)
partitioning key (c_custkey) using hashing
in space_dataindex
not logged initially;

EOF

db2 terminate
echo Done $0 at `date`
echo ""

Example A-3 contains the script that we used to load from the datafiles.

Example: A-3 DB2: Load scripts

#!/bin/ksh
if [ $# -lt 1 ]
then
    echo "\tusage: ${0} <database>"
    exit
fi
export DB2OPTIONS="-tv +p"
echo Start $0 at `date`
db2 <<EOF
connect to ${1};

load from /db2f1/dbgen/partsupp.tbl.1,
          /db2f2/dbgen/partsupp.tbl.2,
          /db2f3/dbgen/partsupp.tbl.3,
          /db2f4/dbgen/partsupp.tbl.4,
          /db2f5/dbgen/partsupp.tbl.5,
          /db2f6/dbgen/partsupp.tbl.6,
          /db2f7/dbgen/partsupp.tbl.7,
          /db2f8/dbgen/partsupp.tbl.8
of del modified by coldel| anyorder
replace into partsupp nonrecoverable data buffer 40000
-- partitioned db config mode partition_and_load
;
values(current timestamp);

load from /db2f1/dbgen/supplier.tbl.1,
          /db2f2/dbgen/supplier.tbl.2,
          /db2f3/dbgen/supplier.tbl.3,
          /db2f4/dbgen/supplier.tbl.4,
          /db2f5/dbgen/supplier.tbl.5,
          /db2f6/dbgen/supplier.tbl.6,
          /db2f7/dbgen/supplier.tbl.7,
          /db2f8/dbgen/supplier.tbl.8
of del modified by coldel| anyorder
replace into supplier nonrecoverable data buffer 40000
-- partitioned db config mode partition_and_load
;
values(current timestamp);

load from /db2f1/dbgen/part.tbl.1,
          /db2f2/dbgen/part.tbl.2,
          /db2f3/dbgen/part.tbl.3,
          /db2f4/dbgen/part.tbl.4,
          /db2f5/dbgen/part.tbl.5,
          /db2f6/dbgen/part.tbl.6,
          /db2f7/dbgen/part.tbl.7,
          /db2f8/dbgen/part.tbl.8
of del modified by coldel| anyorder
replace into part nonrecoverable data buffer 40000
-- partitioned db config mode partition_and_load
;
values(current timestamp);

load from /db2f1/dbgen/orders.tbl.1,
          /db2f2/dbgen/orders.tbl.2,
          /db2f3/dbgen/orders.tbl.3,
          /db2f4/dbgen/orders.tbl.4,
          /db2f5/dbgen/orders.tbl.5,
          /db2f6/dbgen/orders.tbl.6,
          /db2f7/dbgen/orders.tbl.7,
          /db2f8/dbgen/orders.tbl.8
of del modified by coldel| anyorder
replace into orders nonrecoverable data buffer 40000
-- partitioned db config mode partition_and_load
;
values(current timestamp);

load from /db2f1/dbgen/region.tbl.1
of del modified by coldel| anyorder
replace into region nonrecoverable data buffer 40000
-- partitioned db config mode partition_and_load
;
values(current timestamp);

load from /db2f1/dbgen/lineitem.tbl.1,
          /db2f2/dbgen/lineitem.tbl.2,
          /db2f3/dbgen/lineitem.tbl.3,
          /db2f4/dbgen/lineitem.tbl.4,
          /db2f5/dbgen/lineitem.tbl.5,
          /db2f6/dbgen/lineitem.tbl.6,
          /db2f7/dbgen/lineitem.tbl.7,
          /db2f8/dbgen/lineitem.tbl.8,
          /db2f9/dbgen/lineitem.tbl.9,
          /db2f10/dbgen/lineitem.tbl.10,
          /db2f11/dbgen/lineitem.tbl.11,
          /db2f12/dbgen/lineitem.tbl.12,
          /db2f13/dbgen/lineitem.tbl.13,
          /db2f14/dbgen/lineitem.tbl.14,
          /db2f15/dbgen/lineitem.tbl.15,
          /db2f16/dbgen/lineitem.tbl.16
of del modified by coldel| anyorder
replace into lineitem nonrecoverable data buffer 40000
-- partitioned db config mode partition_and_load
;
values(current timestamp);

load from /db2f1/dbgen/nation.tbl.1
of del modified by coldel| anyorder
replace into nation nonrecoverable data buffer 40000
-- partitioned db config mode partition_and_load
;
values(current timestamp);

load from /db2f1/dbgen/customer.tbl.1,
          /db2f2/dbgen/customer.tbl.2,
          /db2f3/dbgen/customer.tbl.3,
          /db2f4/dbgen/customer.tbl.4,
          /db2f5/dbgen/customer.tbl.5,
          /db2f6/dbgen/customer.tbl.6,
          /db2f7/dbgen/customer.tbl.7,
          /db2f8/dbgen/customer.tbl.8
of del modified by coldel| anyorder
replace into customer nonrecoverable data buffer 40000
-- partitioned db config mode partition_and_load
;

values(current timestamp);

EOF

db2 <

db2 terminate
echo Done $0 at `date`
echo ""

DB2 federated database system support

As an add-on product, WebSphere Information Integrator merges diverse types of data into a format that provides easy access to information across an enterprise. With WebSphere Information Integrator you can perform the following tasks:
- Access traditional forms of data and emerging data sources
- Use data that is structured, semi-structured, and unstructured
- Retrieve, update, transform, and replicate information from diverse distributed sources

Access to data that is stored in IBM databases (DB2 Universal Database and Informix) is built into DB2 Universal Database for Linux, UNIX, and Windows. We used this feature in our environment to access tables stored in XPS directly from DB2 via nicknames. Example A-4 on page 429 shows the DDL to create the nicknames, and Example A-5 shows an example of loading the table supplier from XPS via a load cursor.

Example: A-4 DDL to create nicknames

CREATE WRAPPER "IFMX" LIBRARY 'libdb2informix.a' OPTIONS( ADD DB2_FENCED 'N');

CREATE SERVER XPS TYPE INFORMIX VERSION '8' WRAPPER "IFMX" OPTIONS( ADD NODE 'xps', DBNAME 'tpch');

CREATE NICKNAME DB2TPCH.ORDERS_XPS FOR XPS."informix"."orders";
CREATE NICKNAME DB2TPCH.LINEITEM_XPS FOR XPS."informix"."lineitem";
CREATE NICKNAME DB2TPCH.PARTSUPP_XPS FOR XPS."informix"."partsupp";
CREATE NICKNAME DB2TPCH.SUPPLIER_XPS FOR XPS."informix"."supplier";
CREATE NICKNAME DB2TPCH.PART_XPS FOR XPS."informix"."part";
CREATE NICKNAME DB2TPCH.REGION_XPS FOR XPS."informix"."region";
CREATE NICKNAME DB2TPCH.NATION_XPS FOR XPS."informix"."nation";
CREATE NICKNAME DB2TPCH.CUSTOMER_XPS FOR XPS."informix"."customer";

Example A-5 shows the script for loading via nickname.

Example: A-5 Load supplier table directly from XPS via nickname
declare load_cursor cursor for
    (select * from supplier_xps);
load from load_cursor of cursor
modified by anyorder
replace into supplier nonrecoverable
data buffer 40000;
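With the nicknames in place, a cross-database join is ordinary SQL. A minimal sketch of the kind of federated query this setup enables, joining an XPS nickname with a local DB2 table (the column names come from the case study schema):

select c.c_nationkey, sum(o.o_totalprice) as revenue
from db2tpch.customer_xps c, orders o
where c.c_custkey = o.o_custkey
group by c.c_nationkey;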




Appendix B. Additional material

This redbook refers to additional material that can be downloaded from the Internet as described below.

Locating the Web material

The Web material associated with this redbook is available in softcopy on the Internet from the IBM Redbooks Web server. Point your Web browser to: ftp://www.redbooks.ibm.com/redbooks/SG246437

Alternatively, you can go to the IBM Redbooks Web site at: ibm.com/redbooks

Select Additional materials and open the directory that corresponds with the redbook form number, SG246437.

Using the Web material

The additional Web material that accompanies this redbook includes the following files:

File name           Description
TPC-H Queries.zip   Zipped PDF file containing the TPC-H benchmark queries

System requirements for downloading the Web material

The following system configuration is recommended:

Hard disk space:    200 MB minimum
Operating system:   Windows
Memory:             128 MB or higher

How to use the Web material

Create a subdirectory (folder) on your workstation, and unzip the contents of the Web material zipped file into that folder.

Glossary

Access Control List (ACL). The list of principals Chunk. A collection of contiguous pages on a disk that have explicit permission (to publish, to allocated to a dbspace. subscribe to, and to request persistent delivery of a publication message) against a topic in the topic Commit. An operation that applies all the changes tree. The ACLs define the implementation of made during the current unit of recovery or unit of topic-based security. work. After the operation is complete, a new unit of recovery or unit of work begins. Aggregate. Pre-calculated and pre-stored summaries, kept in the data warehouse to improve Compensation. The ability of DB2 to process SQL query performance that is not supported by a data source on the data from that data source. Aggregation. An attribute level transformation that reduces the level of detail of available data. For Composite Key. A key in a fact table that is the example, having a Total Quantity by Category of concatenation of the foreign keys in the dimension Items rather than the individual quantity of each item tables. in the category. Computer. A device that accepts information (in Application Programming Interface. An the form of digitalized data) and manipulates it for interface provided by a software product that some result based on a program or sequence of enables programs to request services. instructions on how the data is to be processed.

Asynchronous Messaging. A method of Configuration. The collection of brokers, their communication between programs in which a execution groups, the message flows and sets that program places a message on a message queue, are assigned to them, and the topics and associated then proceeds with its own processing without access control specifications. waiting for a reply to its message. Connector. See Message processing node Attribute. A field in a dimension table/ connector.

BLOB. Binary Large Object, a block of bytes of data Data Append. A data loading technique where (for example, the body of a message) that has no new data is added to the database leaving the discernible meaning, but is treated as one solid existing data unaltered. entity that cannot be interpreted. Data Cleansing. A process of data manipulation Block. In multidimensional clustering terms, the and transformation to eliminate variations and smallest allocation unit in a Multidimensional inconsistencies in data content. This is typically to Clustering table. improve the quality, consistency, and usability of the data. Cell. In multidimensional clustering terms, the portion of a table containing rows having the same unique set of dimension values. It is the intersection of the slices from each of the dimensions.

© Copyright IBM Corp. 2005. All rights reserved. 433 Data Federation. The process of enabling data DB Connect. Enables connection to several from multiple heterogeneous data sources to appear relational database systems and the transfer of data as though it is contained in a single relational from these database systems into the SAP Business database. Can also be referred to “distributed Information Warehouse. access”. dbslice. In XPS, a named set of dbspaces that can Data Mart. An implementation of a data span multiple coservers. warehouse, typically with a smaller and more tightly restricted scope - such as for a department or dbspace. Logical collection of 1 or more chunks. workgroup. It could be independent, or derived from another data warehouse environment. DDL (Data Definition Language). a SQL statement that creates or modifies the structure of a table or Data Mining. A mode of data analysis that has a database. For example, CREATE TABLE, DROP focus on the discovery of new information, such as TABLE, ALTER TABLE, CREATE DATABASE. unknown facts, data relationships, or data patterns. Debugger. A facility on the Message Flows view in Data Partition. The data residing in a partition of a the Control Center that enables message flows to be partitioned database. It can be accessed and visually debugged. operated on independently even though it is part of a larger database. Deploy. Make operational the configuration and topology of the broker domain. Data Refresh. A data loading technique where all the data in a database is completely replaced with a Dimension. Data that further qualifies or describes new set of data. a measure, such as amounts or durations.

Data Warehouse. A specialized data environment Dimension. In multidimensional clustering terms, developed, structured, and used specifically for it is an axis along which data is organized in a decision support and informational applications. It is multidimensional clustered table. subject oriented rather than application oriented. Data is integrated, non-volatile, and time variant. Distributed Application In message queuing, a set of application programs that can each be Database Instance. A specific independent connected to a different queue manager, but that implementation of a DBMS in a specific collectively constitute a single application. environment. For example, there might be an independent DB2 DBMS implementation on a Linux DML (Data Manipulation Language). an INSERT, server in Boston supporting the Eastern offices, and UPDATE, DELETE, or SELECT SQL statement. another separate and independent DB2 DBMS on the same Linux server supporting the western Drill-down. Iterative analysis, exploring facts at offices. They would represent two instances of DB2. more detailed levels of the dimension hierarchies.

Database Partition. Part of a database that Dynamic SQL. SQL that is interpreted during consists of its own data, indexes, configuration files, execution of the statement. and transaction logs. Engine. A program that performs a core or DataBlades. These are program modules that essential function for other programs. A database provide extended capabilities for Informix engine performs database functions on behalf of the databases, and are tightly integrated with the database user programs. DBMS.

434 Database Strategies: Using Informix XPS and DB2 Universal Database Enrichment. The creation of derived data. An Materialized Query Table. A table where the attribute level transformation performed by some results of a query are stored, for later reuse. type of algorithm to create one or more new (derived) attributes. Measure. A data item that measures the performance or behavior of business processes. Extenders. These are program modules that provide extended capabilities for DB2, and are Message domain. The value that determines how tightly integrated with DB2. the message is interpreted (parsed).

Extent. Contiguous pages from a single chunk Message flow. A directed graph that represents allocated to a table. the set of activities performed on a message or event as it passes through a broker. A message flow FACTS. A collection of measures, and the consists of a set of message processing nodes and information to interpret those measures in a given message processing connectors. context. Message parser. A program that interprets the bit Federated Server. Any DB2 server where the stream of an incoming message and creates an WebSphere Information Integrator is installed. internal representation of the message in a tree structure. A parser is also responsible to generate a Federation. Providing a unified interface to diverse bit stream for an outgoing message from the internal data. representation.

Gateway. A means to access a heterogeneous Meta Data. Typically called data (or information) data source. It can use native access or ODBC about data. It describes or defines data elements. technology. MOLAP. Multidimensional OLAP. Can be called Grain. The fundamental lowest level of data MD-OLAP. It is OLAP that uses a multidimensional represented in a dimensional fact table. database as the underlying data structure.

Instance. A particular realization of a computer Multidimensional analysis. Analysis of data process. Relative to database, the realization of a along several dimensions. For example, analyzing complete database environment. revenue by product, store, and date.

Java Database Connectivity. An application Multidimensional Clustering (MDC). A DB2 data programming interface that has the same organization technique whereby data is characteristics as ODBC but is specifically designed automatically and continually clustered along for use by Java database applications. multiple dimensions.

Java Development Kit. Software package used to Multi-Tasking. Operating system capability which write, compile, debug and run Java applets and allows multiple tasks to run concurrently, taking applications. turns using the resources of the computer.

Java Message Service. An application Multi-Threading. Operating system capability that programming interface that provides Java language enables multiple concurrent users to use the same functions for handling messages. program. This saves the overhead of initiating the program multiple times. Java Runtime Environment. A subset of the Java Development Kit that allows you to run Java applets and applications.

Nickname. An identifier that is used to reference the object located at the data source that you want to access.

Node Group. Group of one or more database partitions.

Node. See Message processing node and Plug-in node.

ODS. (1) Operational Data Store: a relational table for holding clean data to load into InfoCubes, which can support some query activity. (2) Online Dynamic Server: an older name for IDS.

OLAP. OnLine Analytical Processing. Multidimensional data analysis, performed in real time. Not dependent on the underlying data schema.

Open Database Connectivity. A standard application programming interface for accessing data in both relational and non-relational database management systems. Using this API, database applications can access data stored in database management systems on a variety of computers even if each database management system uses a different data storage format and programming interface. ODBC is based on the call level interface (CLI) specification of the X/Open SQL Access Group.

Optimization. The capability to enable a process to execute and perform in such a way as to maximize performance, minimize resource utilization, and minimize the process execution response time delivered to the user.

Partition. Part of a database that consists of its own data, indexes, configuration files, and transaction logs.

Pass-through. The act of passing the SQL for an operation directly to the data source without being changed by the federation server.

Pivoting. Analysis operation where the user takes a different viewpoint of the results, for example, by changing the way the dimensions are arranged.

Primary Key. Field in a table that is uniquely different for each record in the table.

Process. An instance of a program running in a computer.

Program. A specific set of ordered operations for a computer to perform.

Pushdown. The act of optimizing a data operation by pushing the SQL down to the lowest point in the federated architecture where that operation can be executed. More simply, a pushdown operation is one that is executed at a remote server.

ROLAP. Relational OLAP. Multidimensional analysis using a multidimensional view of relational data. A relational database is used as the underlying data structure.

Roll-up. Iterative analysis, exploring facts at a higher level of summarization.

Server. A computer program that provides services to other computer programs (and their users) in the same or other computers. However, the computer that a server program runs in is also frequently referred to as a server.

Shared nothing. A data management architecture where nothing is shared between processes. Each process has its own processor, memory, and disk space.

Slice. That portion of a table which contains all rows having a specific value for one of the dimensions.

Static SQL. SQL that has been compiled prior to execution. Typically provides the best performance.

Subject Area. A logical grouping of data by categories, such as customers or items.
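The roll-up operation defined above maps naturally onto SQL grouping; a minimal sketch, reusing the hypothetical sales table from the MDC example:

   -- One row per (region, sale_month), plus a subtotal per region and
   -- a grand total: facts explored at higher levels of summarization.
   SELECT region, sale_month, SUM(amount) AS revenue
   FROM sales
   GROUP BY ROLLUP (region, sale_month);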

Synchronous Messaging. A method of communication between programs in which a program places a message on a message queue and then waits for a reply before resuming its own processing.

Tablespace. In XPS, a logical collection of one or more extents allocated to a table.
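As an illustration of how those extents are allocated, a sketch of XPS-style DDL follows; the table, the dbspace name, and the sizes (in KB) are assumptions, not values from this book:

   -- First extent of 64 KB; each additional extent is also 64 KB.
   CREATE TABLE orders (
       order_num  INTEGER,
       order_date DATE
   ) IN dbspace1 EXTENT SIZE 64 NEXT SIZE 64;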

Task. The basic unit of programming that an operating system controls. Also see Multi-Tasking.

Thread. The placeholder information associated with a single use of a program that can handle multiple concurrent users. Also see Multi-Threading.

Type Mapping. The mapping of a specific data source type to a DB2 UDB data type.

Unit of Work. A recoverable sequence of operations performed by an application between two points of consistency.
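A minimal sketch of a unit of work, using an invented account table; the two updates fall between two points of consistency and commit or roll back together:

   UPDATE account SET balance = balance - 100 WHERE acct_id = 1;
   UPDATE account SET balance = balance + 100 WHERE acct_id = 2;
   -- COMMIT ends the unit of work and establishes the new point of
   -- consistency; ROLLBACK would undo both updates instead.
   COMMIT;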

User Mapping. An association made between the federated server user ID and password and the user ID and password of the data source to be accessed.

Virtual Database. A federation of multiple heterogeneous relational databases.

Warehouse Catalog. A subsystem that stores and manages all the system metadata.

Wrapper. The means by which a data federation engine interacts with heterogeneous sources of data. Wrappers take the SQL that the federation engine uses and map it to the API of the data source to be accessed. For example, they take DB2 SQL and transform it to the language understood by the data source to be accessed.

xtree. A query-tree tool that allows you to monitor the query plan execution of individual queries in a graphical environment.
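Several of the federation terms in this glossary (wrapper, user mapping, nickname, pushdown) come together in DDL along the following lines. This is only a sketch: every object name is invented, and the exact wrapper and server options vary by data source and platform:

   -- Register a wrapper for a remote source, then the server itself.
   CREATE WRAPPER informix;
   CREATE SERVER xps_store TYPE informix VERSION '8.5'
       WRAPPER informix OPTIONS (NODE 'xps_node', DBNAME 'stores');
   -- Map the federated user to credentials at the data source.
   CREATE USER MAPPING FOR fed_user SERVER xps_store
       OPTIONS (REMOTE_AUTHID 'ifx_user', REMOTE_PASSWORD 'ifx_pwd');
   -- A nickname references the remote table as if it were local.
   CREATE NICKNAME store_orders FOR xps_store.ifx_user.orders;
   -- The predicate is a candidate for pushdown to the remote server.
   SELECT COUNT(*) FROM store_orders WHERE order_date > '2005-01-01';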


Abbreviations and acronyms

ACS   access control system
ADK   Archive Development Kit
AIX   Advanced Interactive eXecutive from IBM
API   Application Programming Interface
AQR   automatic query rewrite
AR   access register
ARM   automatic restart manager
ART   access register translation
ASCII   American Standard Code for Information Interchange
AST   Application Summary Table
BID   Block IDentifier
BLOB   Binary Large OBject
BW   Business Information Warehouse (SAP)
CCMS   Computing Center Management System
CFG   Configuration
CLI   Call Level Interface
CLOB   Character Large OBject
CLP   Command Line Processor
CORBA   Common Object Request Broker Architecture
CPU   Central Processing Unit
CS   Cursor Stability
DAS   DB2 Administration Server
DB   Database
DB2   Database 2™
DB2 UDB   DB2 Universal DataBase
DBA   Database Administrator
DBM   DataBase Manager
DBMS   DataBase Management System
DCE   Distributed Computing Environment
DCM   Dynamic Coserver Management
DCOM   Distributed Component Object Model
DDL   Data Definition Language - an SQL statement that creates or modifies the structure of a table or database, for example, CREATE TABLE or DROP TABLE (a short example follows this list)
DES   Data Encryption Standard
DIMID   Dimension Identifier
DLL   Dynamically Linked Library
DML   Data Manipulation Language - an INSERT, UPDATE, DELETE, or SELECT SQL statement
DMS   Database Managed Space
DPF   Database Partitioning Feature
DRDA   Distributed Relational Database Architecture™
DSA   Dynamic Scalable Architecture
DSN   Data Source Name
DSS   Decision Support System
EAI   Enterprise Application Integration
EBCDIC   Extended Binary Coded Decimal Interchange Code
EDA   Enterprise Data Architecture
EDU   Engine Dispatchable Unit
EGM   Enterprise Gateway Manager

EJB™   Enterprise Java Beans
ER   Enterprise Replication
ERP   Enterprise Resource Planning
ESE   Enterprise Server Edition
ETL   extract, transform, and load
FP   Fix Pack
FTP   File Transfer Protocol
Gb   gigabits
GB   gigabytes
GUI   Graphical User Interface
HADR   High Availability Disaster Recovery
HDR   High availability Data Replication
HPL   High Performance Loader
I/O   Input/Output
IBM   International Business Machines Corporation
ID   Identifier
IDE   Integrated Development Environment
IDS   Informix Dynamic Server
IMG   Integrated Implementation Guide (for SAP)
IMS™   Information Management System
ISAM   Indexed Sequential Access Method
ISM   Informix Storage Manager
ISV   Independent Software Vendor
IT   Information Technology
ITR   Internal Throughput Rate
ITSO   International Technical Support Organization
IX   Index
J2EE   Java 2 Platform Enterprise Edition
JAR   Java Archive
JDBC   Java DataBase Connectivity
JDK   Java Development Kit
JE   Java Edition
JMS   Java Message Service
JRE   Java Runtime Environment
JVM   Java Virtual Machine
KB   Kilobyte (1024 bytes)
LDAP   Lightweight Directory Access Protocol
LPAR   Logical Partition
LV   Logical Volume
Mb   megabits
MB   megabytes
MDC   Multidimensional Clustering
MPP   Massively Parallel Processing
MQI   Message Queuing Interface
MQT   Materialized Query Table
MRM   Message Repository Manager
MTK   DB2 Migration ToolKit for Informix
NPI   Non-Partitioning Index
ODBC   Open DataBase Connectivity
ODS   Operational Data Store
OLAP   OnLine Analytical Processing
OLE   Object Linking and Embedding
OLTP   OnLine Transaction Processing
ORDBMS   Object Relational DataBase Management System
OS   Operating System
O/S   Operating System
PDS   Partitioned Data Set
PIB   Parallel Index Build
PSA   Persistent Staging Area
RBA   Relative Byte Address
RBW   Red Brick™ Warehouse

RDBMS   Relational DataBase Management System
RID   Record Identifier
RR   Repeatable Read
RS   Read Stability
SCB   Session Control Block
SDK   Software Developers Kit
SID   Surrogate Identifier
SMIT   Systems Management Interface Tool
SMP   Symmetric Multiprocessing
SMS   System Managed Space
SOA   Service Oriented Architecture
SOAP   Simple Object Access Protocol
SPL   Stored Procedure Language
SQL   Structured Query Language
TCB   Thread Control Block
TMU   Table Management Utility
TS   Tablespace
UDB   Universal DataBase
UDF   User Defined Function
UDR   User Defined Routine
URL   Uniform Resource Locator
VG   Volume Group (RAID disk terminology)
VLDB   Very Large DataBase
VP   Virtual Processor
VSAM   Virtual Sequential Access Method
VTI   Virtual Table Interface
WSDL   Web Services Definition Language
WWW   World Wide Web
XBSA   X-Open Backup and Restore APIs
XML   Extensible Markup Language
XPS   Informix Extended Parallel Server
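To make the DDL and DML entries in this list concrete, here is a minimal sketch; the customer table is a made-up example:

   -- DDL: creates or modifies structure.
   CREATE TABLE customer (
       cust_id INTEGER NOT NULL PRIMARY KEY,
       name    VARCHAR(40)
   );
   -- DML: reads and manipulates rows.
   INSERT INTO customer VALUES (1, 'Acme');
   UPDATE customer SET name = 'Acme Corp' WHERE cust_id = 1;
   SELECT cust_id, name FROM customer;
   DELETE FROM customer WHERE cust_id = 1;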


Related publications

The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.

IBM Redbooks

For information about ordering these publications, see "How to get IBM Redbooks" on page 445. Note that some of the documents referenced here might be available in softcopy only.

- Up and Running with DB2 UDB ESE: Partitioning for Performance in an e-Business Intelligence World, SG24-6917
- Database Transition: Informix Dynamic Server to DB2 Universal Database, SG24-6367
- Transitioning: Informix 4GL to Enterprise Generation Language (EGL), SG24-6673

Other publications

These publications are also relevant as further information sources:

- IBM Toronto Lab white paper, DB2 Universal Database Version 8 Loader Performance, by Marko Milek, Aleksandrs Santars, Leo Lau, Mark Leitch, and Anoop Sood, July 2004
- DB2 V8.2 manual, Data Movement Utilities Guide and Reference, 2004
- DB2 UDB Administration Guide: Implementation, SC09-4820
- DB2 UDB Administration Guide: Performance, SC09-4821
- Data Warehouse Center Administration Guide, SC26-9993
- DB2 Command Reference, SC09-2951
- DB2 SQL Reference, SC09-2974 and SC09-2975
- DB2 Command Reference, SC26-8967

- Data Movement Utilities Guide and Reference, SC09-2955
- DB2 Application Development Guide: Programming Client Applications, SC09-4826

- IBM Informix Extended Parallel Server Administrator's Guide, G251-2231

- IBM Informix Extended Parallel Server Performance Guide, G251-2235
- IBM Informix: Database Design and Implementation Guide, G251-2271
- developerWorks article, Monitoring query execution in XPS with onstat, by Andreas Weininger, April 2004, at:
  http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0404weininger/index.html
- ECIS course L1-627.1 / ITES course CG-08, Transitioning to IBM DB2 Universal Database V8.2, October 2004
- IBM DB2 High Performance Unload for Multiplatforms and Workgroups User's Guide Version 2 Release 2, Fifth Edition, September 2004, SC88-9874
- White paper: Multi-Dimensional Clustering: A New Data Layout Scheme in DB2, by Sriram Padmanabhan, Bishwaranjan Bhattacharjee, Tim Malkemus, Leslie Cranston, and Matthew Huras
- DB2 UDB for Linux, UNIX, and Windows V8 Transition, course code CG082, IBM Learning Services
- White paper: Partitioning in DB2 Using the UNION ALL View, by Calisto Zuzarte, Robert Neugebauer, Natt Sutyanyong, Xiaoyan Qian, and Rick Berger
- V. Markl, G. M. Lohman, and V. Raman, "LEO: An Autonomic Query Optimizer for DB2," IBM Systems Journal 42:1 (2003), pp. 98-106. Available at:
  http://www.research.ibm.com/journal/sj/421/markl.pdf
- S. S. Lightstone, G. Lohman, and D. Zilio, "Toward Autonomic Computing with DB2 Universal Database," ACM SIGMOD Record - Web Edition, 31:3 (September 2002). Available at:
  http://www.acm.org/sigmod/record/issues/0209/lensilegman2.pdf
- Graeme Birchall, DB2 UDB V8.2 SQL Cookbook, last updated for DB2 UDB V8.2 on November 3, 2004. Available for free from:
  http://ourworld.compuserve.com/homepages/Graeme_Birchall
- George Baklarz and Bill Wong, DB2 Universal Database v8 for Linux, UNIX, and Windows: Database Administration Certification Guide (Upper Saddle River, NJ: Prentice Hall; Austin, TX: IBM International Technical Support Organization, 2003). ISBN 0-13-046361-2. Chapter 7: Advanced SQL - OLAP Features.

- Michael L. Gonzales, IBM Data Warehousing: With IBM Business Intelligence Tools (Indianapolis, IN: Wiley, 2003). ISBN 0-471-13305-1. Chapter 13: DB2 OLAP functions.

How to get IBM Redbooks

You can search for, view, or download Redbooks, Redpapers, Technotes, draft publications, and Additional materials, as well as order hardcopy Redbooks or CD-ROMs, at this Web site: ibm.com/redbooks

Help from IBM

IBM Support and downloads: ibm.com/support

IBM Global Services: ibm.com/services




Back cover

Database Strategies: Using Informix XPS and DB2 Universal Database

Understanding and exploiting the strengths of Informix XPS and DB2 UDB

Considerations for transitioning data and schemas to DB2 UDB

Working with very large data volumes

The acquisition of Informix by IBM has provided the opportunity for Informix customers to consider new alternatives to further enrich their data management systems infrastructure. They can now more easily take advantage of available products, services, and capabilities as they grow and change.

This IBM Redbook focuses on strategies, techniques, capabilities, and considerations for using Informix Extended Parallel Server (XPS) and DB2 Universal Database (UDB). It provides detailed discussions and data to give a good understanding of the two products, their capabilities, and their similarities. XPS customers can choose to adopt a database strategy of coexistence or consider transitioning to DB2 UDB.

The features and functionality of each DBMS are briefly described for a better understanding, in areas such as architecture, partitioning techniques, SQL considerations, configuration, indexing, data types, DML, and DDL. It also discusses products and tools to complement these database management systems. With this information, you can better decide which products satisfy your particular requirements, and better plan how to achieve your objectives as you develop your database management system strategy. You will be better positioned to make informed decisions that can give you the best return on your DBMS investment.

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks

SG24-6437-00 ISBN 0738493740