Session: F6 Put a smile on your DBAs face Performance for the Developer (and you know-it-all DBAs)

Marc Costa & Rob Crane FedEx Freight System

May 8, 2007 10:40 a.m. – 11:40 a.m.

Platform: DB2 for z/OS

This presentation is available in PowerPoint format to customize and use in your shop. Please send an email to Rob or Marc, or stop by to get a copy from the USB key.

Biography: Marc Costa BSBA Degree, Colorado State University, Fort Collins, CO December 1993. Accepted a Programmer position with Mutual of Omaha in 1994. Programmed against DB2 using Cobol and Visual Basic. Accepted a position as an application DBA, September 1998. Joined FedEx Freight System as a Sr. DBA in January 2003. Working on a DBA team supporting all aspects of DB2, including upgrades (V7, V8), application design, SQL tuning and system tuning. IBM Certified DBA V8.1 DB2 UDB for z/OS.

1 Performance for the Developer

⊕ Understanding your data and how DB2 sees it.

⊕ Clustering concepts and impacts on your SQL performance. Why clustering matters. What caused list prefetch to kick in? What is prefetch anyway and should I care if it changed?

⊕ Is there more than one way to do an existence check? Get off my case, my visual explain says I am using an index! Did you check the cardinality of that index? Visual explain shows I am only joining two tables together in my online transaction, it can’t take that long can it?

⊕ Is multiple index access similar to multiple personality disorder? Who is this Cartesian guy? Should I worry about him? I used LIKE because the user does not know what they want, why should I be concerned with re-optimization?

⊕ Runstats... Bueller? Reorg... Bueller? Rebind... Bueller? Anyone? Statistics rule, opthints drool. What can I do with real-time statistics? What is stats advisor? My drain is stuck, got any drano?

Put a smile on your DBAs face Performance for the Developer IDUG North America 2007 Page 2

Who is responsible for performance? No, not the wizard behind the curtain. Is it the application developers? How about the DBAs? Maybe both? Hopefully the answer is not “What is performance?”

In this presentation, we will explore simple things most shops have probably experienced or are currently experiencing with performance. We will discuss easy ways to spot performance issues in development before they become production nightmares. We will cover clustering, indexing and the importance of the 3 R’s (runstats, reorg, rebind) when analyzing performance issues. We will look at ways to efficiently ask common questions of the database (does this row exist, does this column add filtering to the index, etc.). Come away with an understanding of performance pitfalls and how to avoid them in your shop.

2 FedEx Freight Production System Summary

[Diagram: production configuration across two z9 processors, 2094-712 and 2094-701. Labels shown: AFW1 (12), AFW2 (4), AFWC (1), CF03 (1), CF04 (1); DB2 subsystems DBP1 and DBP2 (DS Group DSN), each with CICS, PIPES, BATCH and DDF; CICS FARM; Lock Structure; Shared Com Area; Group Buffer Pools; ESS Shark DASD; Virtual Tape Server; Automated Tape Library (128/12 drives, 22 drives); 8100; CA SPOOL; JHS; Emergency Failover Box.]

5 TB data | 4.2 billion SQL statements daily | 5,818 MIPS

FedEx Freight averages 70,000 shipments per day. This translates into 55-60 million CICS transactions per day, and 4.2 billion SQL statements per day. We typically average 800 transactions per second, the majority of those occurring on image AFW1. We are just shy of 5 terabytes of DB2 data in production. The company projects a steady growth pattern in the business, which translates into rapid growth in data and new systems to support our customers.

3 MOUNTAIN.PEAK Table

PEAK_NAME          STATE_CD  ELEVATION  DATUM  UTM_ZONE  EASTING  NORTHING  RANGE             US50  US48
MOUNT MCKINLEY     AK        20320      NAD27  5         601034   6994441   ALASKA            1
MOUNT BEAR         AK        14831      NAD27  7         492773   6794161   SAINT ELIAS       13
MOUNT WHITNEY      CA        14494      WGS84  11        384374   4048898   SIERRA            17    1
STARLIGHT PEAK     CA        14080      WGS84  11        365360   4106483   SIERRA            63    43
SPLIT MOUNTAIN     CA        14058      NAD27  11        373890   4097539   SIERRA            70    49
THUNDERBOLT PEAK   CA        14000      NAD27  11        365250   4106560   SIERRA            91    70
MOUNT HARVARD      CO        14420      NAD27  13        381385   4310318   SAWATCH           23    4
GRAYS PEAK         CO        14270      NAD27  13        429858   4387346   FRONT             31    12
LONGS PEAK         CO        14255      NAD27  13        447179   4456607   FRONT             37    18
MOUNT YALE         CO        14196      NAD27  13        386972   4299080   SAWATCH           45    26
MOUNT BROSS        CO        14172      NAD27  13        404598   4354359   MOSQUITO          46    27
PIKES PEAK         CO        14110      NAD27  13        496954   4298208   FRONT             58    38
SUNLIGHT PEAK      CO        14059      WGS84  13        270921   4167643   SAN JUAN          69    48
LITTLE BEAR PEAK   CO        14037      WGS84  13        456089   4157849   SANGRE DE CRISTO  76    55
MOUNT HOLY CROSS   CO        14005      NAD27  13        373170   4367510   SAWATCH           88    67
MOUNT RAINIER      WA        14410      NAD27  10        595190   5189085   CASCADE           24    5

The MOUNTAIN.PEAK table is a subset of the data available. There are actually 91 peaks in the United States above 13,999 feet; we have shown 16 in the following examples to help understand the concepts.

The MOUNTAIN.PEAKB table contains all 91 mountains in the United States above 13,999 feet (4,267 meters):
Colorado → 54 summits
Alaska → 21 summits
California → 15 summits
Washington → 1 summit

The MOUNTAIN.PEAK_50 table contains the 100 highest peaks in each of the 50 states regardless of the summit elevation. This table was used to get additional data for examples shown later in this presentation.

The various PEAK% tables were built from data found at www.americasroof.com and www.topozone.com. Data is for teaching concepts and not meant for navigation or mountaineering, although if lost in the mountains you could burn this presentation to keep warm.

Know DB2’s filtering order (the first 4 below):
1. Index Probe | Stage 1 | Probe navigating the index B-tree structure (matchcols >= 1)
2. Index Screening | Stage 1 | Index screening on leaf pages (Visual Explain denotes this type of access, the Plan_Table does not)
3. Data Screening | Stage 1 | Data page level filtering applied
4. Data Screening | Stage 2 | All predicate filtering not eligible for stage 1
5. Application Screening | Stage 10 | Program gymnastics
6. Distributed Application Screening | Stage 20 | Distributed program gymnastics

4 Peaks > 13,999 Feet Located in US

PEAK_NAME          STATE_CD  ELEVATION  DATUM  UTM_ZONE  EASTING  NORTHING  RANGE        US50  US48
MOUNT MCKINLEY     AK        20320      NAD27  5         601034   6994441   ALASKA       1
MOUNT BEAR         AK        14831      NAD27  7         492773   6794161   SAINT ELIAS  13
SPLIT MOUNTAIN     CA        14058      NAD27  11        373890   4097539   SIERRA       70    49
THUNDERBOLT PEAK   CA        14000      NAD27  11        365250   4106560   SIERRA       91    70
MOUNT HOLY CROSS   CO        14005      NAD27  13        373170   4367510   SAWATCH      88    67
MOUNT RAINIER      WA        14410      NAD27  10        595190   5189085   CASCADE      24    5

Goal of indexes is to minimize I/O

Indexes:
⊕ Clustering Index DX1PEAK on: STATE_CD, ELEVATION (desc), PEAK_NAME
⊕ Unique Index UX1PEAK on: PEAK_NAME, US50_HEIGHT_RANK, US48_HEIGHT_RANK
⊕ Unique PK Index PX1PEAK on: DATUM, UTM_ZONE, EASTING, NORTHING

GOAL OF INDEXES:
• Enforce uniqueness
• Minimize I/O
• Parent/Child RI navigation (for performance)
• Provide filtering
• Assist in sorting, grouping (aggregates), ordering

SQL Breakdown:

80% -
SELECT PEAK_NAME, ELEVATION FROM PEAK WHERE STATE_CD = 'AK' ORDER BY ELEVATION DESC;

SELECT PEAK_NAME, ELEVATION, STATE_CD FROM PEAK ORDER BY ELEVATION, STATE_CD;

SELECT MAX(ELEVATION), STATE_CD FROM PEAK GROUP BY STATE_CD;

SELECT * FROM PEAK WHERE STATE_CD = 'CO' ORDER BY ELEVATION;

15% -
SELECT PEAK_NAME, ELEVATION FROM PEAK WHERE PEAK_NAME = 'PIKES PEAK';

5% -
SELECT PEAK_NAME, STATE_CD FROM PEAK WHERE DATUM = 'NAD27' AND UTM_ZONE = '13' AND EASTING = 373170 AND NORTHING = 4367510;

5 CLUSTERING INDEX (DX1PEAK)

Key columns: STATE_CD, ELEVATION (desc), PEAK_NAME
Clustering: Y | Clustered: Y | Cluster Ratio: 100% | NLevels: 3

ROOT Page 717
CA 14080 Starlight Peak p. 552, 2
CO 14270 Grays Peak p. 553, 2
CO 14110 Pikes Peak p. 554, 2
WA 14410 Mount Rainier p. 555, 2

NON-LEAF Pages
Page 552: AK 14831 Mount Bear p. 501, 2 | CA 14080 Starlight Peak p. 502, 2
Page 553: CA 14000 Thunderbolt Peak p. 503, 2 | CO 14270 Grays Peak p. 504, 2
Page 554: CO 14196 Mount Yale p. 505, 2 | CO 14110 Pikes Peak p. 506, 2
Page 555: CO 14037 Little Bear Peak p. 507, 2 | WA 14410 Mount Rainier p. 508, 2

LEAF & DATA Pages
Page 501: AK 20320 Mount McKinley … | AK 14831 Mount Bear …
Page 502: CA 14494 Mount Whitney … | CA 14080 Starlight Peak …
Page 503: CA 14058 Split Mountain … | CA 14000 Thunderbolt Peak …
Page 504: CO 14420 Mount Harvard … | CO 14270 Grays Peak …
Page 505: CO 14255 Longs Peak … | CO 14196 Mount Yale …
Page 506: CO 14172 Mount Bross … | CO 14110 Pikes Peak …
Page 507: CO 14059 Sunlight Peak … | CO 14037 Little Bear Peak …
Page 508: CO 14005 Mount Holy Cross … | WA 14410 Mount Rainier …

The clustering index is one of the most critical decisions made when defining indexes against your tables. It is critical to understand how programs and ad hoc users will be coming after the data in the table. Cluster for most frequent access by the majority of your SQL requests. What data elements will be predicate candidates? Always favor naturally occurring keys that enforce business relationships/logic.

If possible, avoid columns that are updated frequently. Similar clustering on tables involved in join processing should also be considered. CLUSTERRATIOF indicates how closely the data pages match the order set by the clustering index. Higher cluster ratios yield more efficient data page access. A cluster ratio below 80% will disable sequential prefetch of data pages. Investigate FREEPAGE and PCTFREE changes to maintain acceptable cluster ratios over longer time periods. If you have INDEXONLY access, then CLUSTERRATIOF is not considered.

Always define a clustering index. If you do not define a clustering index, DB2 is forced to select the index with the lowest OBID to assist in insert row placement. When you have not specified a clustering index, the one with the lowest OBID is normally the first index created on that table. Regaining clustering order when you have not defined a clustering index is a crap shoot at best.

Duplicate index entries increase the CLUSTERRATIOF. The clustering index does not need to be a unique index (but can be if appropriate). There are several examples that warrant having a unique clustering index; however, most do not. Picture the phone book: would you want it clustered by PHONE_NUMBER rather than LAST_NAME, FIRST_NAME? Cluster the data for the sequential access needs of the object. Remember clustering during conversion efforts as well: feed the load utilities input files that are sorted in clustering sequence.

With table-based partitioning, the clustering of the data can be different from the partitioning of the data. For example, you can partition the SHIPMENT table by SHIP_DOB_DT and cluster the data within each partition by SHIP_DESTINATION_CENTER.

6 Unique Index (UX1PEAK)

Key columns: PEAK_NAME, US50_HEIGHT_RANK, US48_HEIGHT_RANK
Clustering: N | Clustered: N | Cluster Ratio: 50% | NLevels: 4

[Diagram: B-tree with ROOT Page 101, NON-LEAF Pages 201-202 and 301-304, LEAF Pages 401-408, and DATA Pages 501-508.]

LEAF Pages (key, then data page and slot)
Page 401: Grays Peak 31 12 p. 504, 2 | Little Bear Peak 76 55 p. 507, 2
Page 402: Longs Peak 37 18 p. 505, 1 | Mount Bear 13 _ p. 501, 2
Page 403: Mount Bross 46 27 p. 506, 1 | Mount Harvard 23 4 p. 504, 1
Page 404: Mount Holy Cross 88 67 p. 508, 1 | Mount McKinley 1 _ p. 501, 1
Page 405: Mount Rainier 24 5 p. 508, 2 | Mount Whitney 17 1 p. 502, 1
Page 406: Mount Yale 45 26 p. 505, 2 | Pikes Peak 58 38 p. 506, 2
Page 407: Split Mountain 70 49 p. 503, 1 | Starlight Peak 63 43 p. 502, 2
Page 408: Sunlight Peak 69 48 p. 507, 1 | Thunderbolt Peak 91 70 p. 503, 2

DATA Pages 501-508 hold the rows in clustering (DX1PEAK) order, as shown on slide 5.

-- Database for the sample PEAK objects
CREATE DATABASE DBUS14ER
  STOGROUP MNTSG
  BUFFERPOOL BP14
  INDEXBP BP13
  CCSID EBCDIC;

-- Segmented table space holding MOUNTAIN.PEAK
CREATE TABLESPACE TSPEAK01 IN DBUS14ER
  USING STOGROUP VENDORSG PRIQTY 720 SECQTY 720
  FREEPAGE 2 PCTFREE 10
  BUFFERPOOL BP14
  LOCKSIZE PAGE
  CLOSE NO
  SEGSIZE 28
  LOCKMAX SYSTEM
  CCSID EBCDIC;

-- US48_HEIGHT_RANK is nullable: Alaska peaks have no lower-48 rank
CREATE TABLE MOUNTAIN.PEAK
  ( PEAK_NAME          CHAR (30)  NOT NULL
  , CONTINENT          CHAR (20)  NOT NULL
  , STATE_CD           CHAR (2)   NOT NULL
  , ELEVATION          SMALLINT   NOT NULL
  , DATUM              CHAR (5)   NOT NULL
  , UTM_ZONE           CHAR (2)   NOT NULL
  , EASTING            INTEGER    NOT NULL
  , NORTHING           INTEGER    NOT NULL
  , RANGE              CHAR (18)  NOT NULL
  , US50_HEIGHT_RANK   SMALLINT   NOT NULL
  , US48_HEIGHT_RANK   SMALLINT
  , PEAK_DESCRIPTION_1 CHAR (255) NOT NULL WITH DEFAULT
  , PEAK_DESCRIPTION_2 CHAR (255) NOT NULL WITH DEFAULT
  , PEAK_DESCRIPTION_3 CHAR (255) NOT NULL WITH DEFAULT
  , PEAK_DESCRIPTION_4 CHAR (255) NOT NULL WITH DEFAULT
  , PEAK_DESCRIPTION_5 CHAR (255) NOT NULL WITH DEFAULT
  , PEAK_DESCRIPTION_6 CHAR (255) NOT NULL WITH DEFAULT
  , PEAK_DESCRIPTION_7 CHAR (255) NOT NULL WITH DEFAULT
  , CONSTRAINT PEAKPK01 PRIMARY KEY (DATUM, UTM_ZONE, EASTING, NORTHING)
  )
  IN DBUS14ER.TSPEAK01
  CCSID EBCDIC;

COMMENT ON TABLE MOUNTAIN.PEAK IS 'Peaks in US greater than 13,999 feet';

7 PRIMARY INDEX (PX1PEAK)

Key columns: DATUM, UTM_ZONE, EASTING, NORTHING
Clustering: N | Clustered: N | Cluster Ratio: 62.5% | NLevels: 4

[Diagram: B-tree with ROOT Page 101, NON-LEAF Pages 281-282 and 331-334, LEAF Pages 461-468, and DATA Pages 501-508.]

LEAF Pages (key, then data page and slot)
Page 461: NAD27 5 601034 6994441 p. 501, 1 | NAD27 7 492773 6794161 p. 501, 2
Page 462: NAD27 10 595190 5189085 p. 508, 2 | NAD27 11 365250 4106560 p. 503, 2
Page 463: NAD27 11 373890 4097539 p. 503, 1 | NAD27 13 373170 4367510 p. 508, 1
Page 464: NAD27 13 381385 4310318 p. 504, 1 | NAD27 13 386972 4299080 p. 505, 2
Page 465: NAD27 13 404598 4354359 p. 506, 1 | NAD27 13 429858 4387346 p. 504, 2
Page 466: NAD27 13 447179 4456607 p. 505, 1 | NAD27 13 496954 4298208 p. 506, 2
Page 467: WGS84 11 365360 4106483 p. 502, 2 | WGS84 11 384374 4048898 p. 502, 1
Page 468: WGS84 13 270921 4167643 p. 507, 1 | WGS84 13 456089 4157849 p. 507, 2

DATA Pages 501-508 hold the rows in clustering (DX1PEAK) order, as shown on slide 5.

CREATE INDEX MOUNTAIN.DX1PEAK ON MOUNTAIN.PEAK ( STATE_CD ASC , ELEVATION DESC , PEAK_NAME ASC ) USING STOGROUP VENDORSG FREEPAGE 20 PCTFREE 10 CLUSTER BUFFERPOOL BP13 PIECESIZE 2G;

CREATE UNIQUE INDEX MOUNTAIN.UX1PEAK ON MOUNTAIN.PEAK ( PEAK_NAME ASC , US50_HEIGHT_RANK ASC , US48_HEIGHT_RANK ASC ) USING STOGROUP VENDORSG FREEPAGE 20 PCTFREE 10 BUFFERPOOL BP13 PIECESIZE 2G;

CREATE UNIQUE INDEX MOUNTAIN.PX1PEAK ON MOUNTAIN.PEAK ( DATUM ASC , UTM_ZONE ASC , EASTING ASC , NORTHING ASC ) USING STOGROUP VENDORSG FREEPAGE 20 PCTFREE 10 BUFFERPOOL BP13 PIECESIZE 2G;
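
To get a feel for what a cluster ratio expresses, the sketch below walks an index in key order and counts how many entries point to a RID that does not move backwards relative to the previous entry. This is only a simplified illustration and not the formula RUNSTATS uses for CLUSTERRATIOF; the function name and the counting rule are assumptions. The RID lists are read off the leaf pages on slides 5 and 7, and for this small table the simple count happens to give the 100% and 62.5% shown there.

```python
# RIDs (data page, slot) in PX1PEAK key order, read off the leaf pages on slide 7.
PX1PEAK_RIDS = [
    (501, 1), (501, 2), (508, 2), (503, 2), (503, 1), (508, 1),
    (504, 1), (505, 2), (506, 1), (504, 2), (505, 1), (506, 2),
    (502, 2), (502, 1), (507, 1), (507, 2),
]

# DX1PEAK is the clustering index, so its key order follows the data page order.
DX1PEAK_RIDS = [(page, slot) for page in range(501, 509) for slot in (1, 2)]


def approx_cluster_ratio(rids):
    """Fraction of index entries whose RID does not move backwards.

    Hypothetical helper for illustration; RUNSTATS uses a more involved formula.
    """
    if not rids:
        return 0.0
    in_order = 1  # the first entry has no predecessor, so count it as in order
    for previous, current in zip(rids, rids[1:]):
        if current >= previous:
            in_order += 1
    return in_order / len(rids)


print(approx_cluster_ratio(DX1PEAK_RIDS))
print(approx_cluster_ratio(PX1PEAK_RIDS))
```

Every backwards step in the RID sequence is a place where reading the rows in index order would have to jump back to an earlier data page.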

8 Existence Checking

Fetch First 1 Row Only [1a]

SELECT '1'
FROM MOUNTAIN.PEAK
WHERE STATE_CD = 'CA'
FETCH FIRST 1 ROW ONLY
WITH UR;

Singleton Select [1b]

SELECT '1'
FROM MOUNTAIN.PEAK
WHERE STATE_CD = 'CO'
WITH UR;

CASE
  WHEN SQLCODE = 0 → 1 row found
  WHEN SQLCODE = 100 → no rows found
  WHEN SQLCODE = -811 → > 1 row found
END CASE

Select COUNT(*) [3]

SELECT COUNT(*)
FROM MOUNTAIN.PEAK
WHERE STATE_CD = 'WA'
WITH UR;

IF COUNT > 0 THEN → row exists
ELSE → no row exists
-OR-
IF COUNT = 0 THEN → no row exists
ELSE → row exists

SubSelect Exists [13]

SELECT '1'
FROM MOUNTAIN.PEAK OUTER_TBL
WHERE EXISTS
  (
    SELECT '1'
    FROM MOUNTAIN.PEAK INNER_TBL
    WHERE OUTER_TBL.STATE_CD = INNER_TBL.STATE_CD
    AND INNER_TBL.STATE_CD = 'AK'
  )
WITH UR;

There are a few ways to code an existence check in SQL. Listed below are four of the more popular examples of existence checks, in order of preference from a performance standpoint. The filtering predicate driving the existence check benefits from being a high-order column of an index on the table. Notice that the select does not retrieve a column; there is no need to bring data back if only existence is being verified. This is why the constant (SELECT '1') is used in the query.

Try one of the following approaches, depending on what you require. 1a is our pick.
[1a] Fetch First 1 Row Only
[1b] Singleton Select
[3] Select Count(*)
[13] Exists Sub Select

FETCH FIRST 1 ROW ONLY returns at most one row to the application. If no row exists, SQLCODE +100 is returned. This is different from OPTIMIZE FOR 1 ROW, which only influences the access path DB2 uses to retrieve all of the data that qualifies for the supplied predicates.
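
The SQLCODE handling shown for the singleton select [1b] can be sketched as a small helper. This is a minimal illustration in Python rather than host-language code from the presentation; the function name row_exists is hypothetical, and it assumes the calling program already has the SQLCODE of the statement in hand.

```python
def row_exists(sqlcode):
    """Interpret the SQLCODE of a singleton SELECT used as an existence check."""
    if sqlcode == 0:      # exactly one row qualified
        return True
    if sqlcode == 100:    # no row qualified
        return False
    if sqlcode == -811:   # more than one row qualified, so a row does exist
        return True
    # Anything else is a real error and should not be read as "no row".
    raise RuntimeError(f"unexpected SQLCODE {sqlcode}")


for code in (0, 100, -811):
    print(code, row_exists(code))
```

With FETCH FIRST 1 ROW ONLY [1a] the -811 branch is not needed, because at most one row is returned.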

Remember, I/O is done in pages. Depending on the index size (composite column length and number of levels in the index), the number of pages accessed to retrieve that one row will vary. In the MOUNTAIN.PEAK example, 2 pages were accessed due to a 2 level index. One root page and one leaf page were accessed.

9 One answer, two access paths; maybe there is a wizard?

Subselect Exist pitfalls (* = desired access path)

MOUNTAIN.PEAK (16 rows)

Desired AP  Access Path  Predicate  Matching Rows  CPU Cost MS  CPU Cost SU  FF Inner Tbl  FF Outer Tbl  O/I RIDs  NL RIDs
*           NLoopJoin    AK         2              1            2            0.125         0.125         2         4
*           NLoopJoin    CA         4              1            5            0.25          0.25          4         16
            NLoopJoin    CO         9              1            20           0.5625        0.5625        9         81
*           NLoopJoin    WA         1              1            1            0.0625        0.0625        1         1

Desired AP  Access Path  Predicate  Matching Rows  CPU Cost MS  CPU Cost SU  FF Inner Tbl  FF IN=OUT  FF Outer Tbl  RIDs
            CorSubQry    AK         2              1            9            0.125         0.25       1             2
            CorSubQry    CA         4              1            10           0.25          0.25       1             4
*           CorSubQry    CO         9              1            11           0.5625        0.25       1             4
            CorSubQry    WA         1              1            8            0.0625        0.25       1             1

MOUNTAIN.PEAK_50 (399 rows)

Desired AP  Access Path  Predicate  Matching Rows  CPU Cost MS  CPU Cost SU  FF Inner Tbl  FF Outer Tbl  O/I RIDs  NL RIDs
*           NLoopJoin    AK         99             56           1505         0.0186        0.0186        86.3636   7458.676
*           NLoopJoin    CA         100            75           2011         0.0215        0.0215        100       1000
*           NLoopJoin    CO         100            75           2011         0.0215        0.0215        100       1000
*           NLoopJoin    WA         100            56           1505         0.0186        0.0186        86.3636   7458.676

Desired AP  Access Path  Predicate  Matching Rows  CPU Cost MS  CPU Cost SU  FF Inner Tbl  FF IN=OUT  FF Outer Tbl  RIDs
            CorSubQry    AK         99             106          2850         0.0186        0.0204     1             86.3636
            CorSubQry    CA         100            106          2840         0.0215        0.0204     1             69
            CorSubQry    CO         100            106          2840         0.0186        0.0204     1             86.3636
            CorSubQry    WA         100            106          2850         0.0186        0.0204     1             86.3636

What qualifies for the 3 queries on slide 8 [1a, 1b, 3]? It is important to understand the difference between what qualifies and what is returned to the application. FETCH FIRST 120 ROWS ONLY can be very helpful in screen population applications. Why return all 8,000 rows if the application is only going to display 120? Also remember the role that filter factor calculations play in the optimizer’s decision-making algorithms, especially when predicates are sourced by host variables and REOPT(ALWAYS) is not being used.

For these 3 queries, the answer depends on the filtering predicate. Running SELECT STATE_CD, COUNT(*) with GROUP BY STATE_CD is another way to understand your data. This analysis normally leads to creating different RUNSTATS COLGROUP specifications for those objects. The data below is from Visual Explain output.

Table Creator  Table Name  Correlation Name  Rows  Pages  Qualified Rows
MOUNTAIN       PEAK        STATE_WA          16    8      1
MOUNTAIN       PEAK        STATE_AK          16    8      2
MOUNTAIN       PEAK        STATE_CA          16    8      4
MOUNTAIN       PEAK        STATE_CO          16    8      9
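
The qualified row counts above translate directly into per-value filter factors for STATE_CD. The sketch below assumes the simplest definition, qualifying rows divided by table rows; the names are illustrative, and the optimizer's own calculation depends on which statistics have been collected.

```python
TABLE_ROWS = 16
QUALIFIED_ROWS = {"WA": 1, "AK": 2, "CA": 4, "CO": 9}


def filter_factor(qualified, total):
    """Fraction of the table that an equal predicate on this value qualifies."""
    return qualified / total


factors = {state: filter_factor(rows, TABLE_ROWS)
           for state, rows in QUALIFIED_ROWS.items()}

for state, ff in sorted(factors.items(), key=lambda item: item[1]):
    print(f"STATE_CD = '{state}': filter factor {ff}")
```

A predicate on 'WA' filters out almost the whole table, while a predicate on 'CO' keeps more than half of it, which is why the same SQL text can deserve different access paths depending on the value supplied.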

10 Should I trust Visual Explain? YES and back it up with details to silence the critics.

PEAK | STATE_CD = CO

           SubQuery                 NLJoin
           INDB2_TIME  INDB2_CPU    INDB2_TIME  INDB2_CPU
Fetch      0.000215    0.000213     0.000218    0.000214
Open       0.000017    0.000017     0.000019    0.000019
total      0.000232    0.000230     0.000237    0.000233

% Lower Cost: INDB2_TIME 1.066098081 | INDB2_CPU 0.647948164

SELECT '1'
FROM PEAK OUTER_TBL_CO
WHERE EXISTS
  (
    SELECT '1'
    FROM PEAK INNER_TBL_CO
    WHERE OUTER_TBL_CO.STATE_CD = INNER_TBL_CO.STATE_CD
    AND OUTER_TBL_CO.STATE_CD = 'CO'
  )
WITH UR;

PEAK_50 | STATE_CD = CO

           SubQuery                 NLJoin
           INDB2_TIME  INDB2_CPU    INDB2_TIME  INDB2_CPU
Fetch      0.013093    0.010682     0.0008      0.000778
Open       0.000029    0.000029     0.000041    0.000027
total      0.013122    0.010711     0.000841    0.000805

% Lower Cost: INDB2_TIME 87.95387811 | INDB2_CPU 86.0194512

SELECT '1'
FROM PEAK_50 OUTER_TBL_CO
WHERE EXISTS
  (
    SELECT '1'
    FROM PEAK_50 INNER_TBL_CO
    WHERE OUTER_TBL_CO.STATE_CD = INNER_TBL_CO.STATE_CD
    AND INNER_TBL_CO.STATE_CD = 'CO'
  )
WITH UR;
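
The slide does not say how the "% Lower Cost" figures were derived. Working backwards from the totals, they match the difference between the two totals divided by their sum, so the sketch below uses that formula. Treat it as an inference from the numbers rather than the authors' stated method; the function name is hypothetical.

```python
def pct_lower_cost(cheaper, dearer):
    """Difference between two totals as a percentage of their sum (inferred formula)."""
    return (dearer - cheaper) / (dearer + cheaper) * 100


# PEAK, STATE_CD = CO: the subquery totals are the lower ones.
peak_time = pct_lower_cost(0.000232, 0.000237)
peak_cpu = pct_lower_cost(0.000230, 0.000233)

# PEAK_50, STATE_CD = CO: the nested loop join totals are the lower ones.
peak50_time = pct_lower_cost(0.000841, 0.013122)
peak50_cpu = pct_lower_cost(0.000805, 0.010711)

for label, value in [("PEAK time", peak_time), ("PEAK cpu", peak_cpu),
                     ("PEAK_50 time", peak50_time), ("PEAK_50 cpu", peak50_cpu)]:
    print(label, round(value, 4))
```

On the 16 row table the two access paths are within about one percent of each other; on PEAK_50 the gap is large enough to matter in an online transaction.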

RUNSTATS sample job and output:

//STATSCG1 EXEC DSNUPROC,SYSTEM=DSNB,UID='RUNSTRAC',UTPROC=''
//SYSPRINT DD SYSOUT=*
//DSNUPROC.RNPRIN01 DD SYSOUT=*
//DSNUPROC.SYSIN DD *
  RUNSTATS TABLESPACE DBUS14ER.TSPEAK01
    TABLE(MOUNTAIN.PEAK)
      COLGROUP(STATE_CD) FREQVAL NUMCOLS 2 COUNT 16
      COLGROUP(STATE_CD,ELEVATION) FREQVAL COUNT 16
      COLGROUP(PEAK_NAME,ELEVATION)
      COLGROUP(PEAK_NAME,US50_HEIGHT_RANK,US48_HEIGHT_RANK)
      COLGROUP(PEAK_NAME,EASTING,NORTHING)
    SORTDEVT SYSDA
    SHRLEVEL CHANGE
    REPORT YES

!DBT2 DSNUSUTS - SYSTABLESPACE CATALOG UPDATE FOR DBUS14ER.TSPEAK01 SUCCESSFUL
!DBT2 DSNUSUCD - SYSCOLDIST CATALOG STATISTICS FOR STATE_CD
      FREQUENCY    COLVALUE
      5.625E-01    X'C3D6'
      2.5E-01      X'C3C1'
      1.25E-01     X'C1D2'
      6.25E-02     X'E6C1'
!DBT2 DSNUSUCD - SYSCOLDIST CATALOG UPDATE FOR MOUNTAIN.PEAK SUCCESSFUL
DSNUSUCD - SYSCOLDIST CATALOG STATISTICS FOR STATE_CD,ELEVATION CARDINALITY = 1.6E+01
DSNUSUCD - SYSCOLDIST CATALOG STATISTICS FOR STATE_CD,ELEVATION
      FREQUENCY    COLVALUE
      6.25E-02     X'C1D2B9EF'
      6.25E-02     X'C1D2CF60'
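
The COLVALUE entries in the RUNSTATS report are hexadecimal EBCDIC, which makes the frequencies hard to read at a glance. The sketch below decodes them, assuming Python's cp037 codec is close enough to the subsystem's EBCDIC CCSID (the actual CCSID is not stated). For the STATE_CD,ELEVATION values it also assumes the SMALLINT is stored with its high-order bit flipped, which is inferred from the two values shown rather than taken from the presentation.

```python
STATE_FREQUENCIES = [
    (0.5625, "C3D6"),
    (0.25, "C3C1"),
    (0.125, "C1D2"),
    (0.0625, "E6C1"),
]


def decode_colvalue(hex_value, codec="cp037"):
    """Decode a character COLVALUE from hex EBCDIC (codec is an assumption)."""
    return bytes.fromhex(hex_value).decode(codec)


def decode_state_elevation(hex_value, codec="cp037"):
    """Split a STATE_CD,ELEVATION COLVALUE into its two columns."""
    raw = bytes.fromhex(hex_value)
    state = raw[:2].decode(codec)
    # Assumed encoding: big-endian SMALLINT with the sign bit flipped.
    elevation = int.from_bytes(raw[2:4], "big") ^ 0x8000
    return state, elevation


for frequency, colvalue in STATE_FREQUENCIES:
    print(decode_colvalue(colvalue), frequency)

print(decode_state_elevation("C1D2B9EF"))
print(decode_state_elevation("C1D2CF60"))
```

Decoded this way, the four STATE_CD frequencies line up with the 9, 4, 2 and 1 qualifying rows out of 16 seen earlier.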

11 List Prefetch

[Diagram: LEAF Pages 461-468 of the non-clustered index PX1PEAK next to DATA Pages 501-508. The RIDs that qualify for the query below are collected into a RID LIST, passed through SORTRID to produce a Sorted RID LIST, and the data pages are then read in physical I/O order. Callouts: "List Prefetch = unwanted disk head movement" and "(IX not Clustered)".]

SELECT PEAK_NAME, DATUM, US50_HEIGHT_RANK, US48_HEIGHT_RANK
FROM MOUNTAIN.PEAK
WHERE DATUM IN ('NAD27','WGS84')
AND UTM_ZONE IN ('10','11')
WITH UR;

RID – Record Identifier. The RID pool is used for record identifier processing, for enforcing unique keys while updating multiple rows, and for sorting RIDs for: 1) list prefetch, 2) access via multiple indexes, and 3) hybrid joins.

On-line transactions generally do not want lots of rows returned and would rather not have List Prefetch used. OPTIMIZE FOR 1 ROW will disable LIST PREFETCH. FETCH FIRST 1 ROW ONLY will use an implied OPTIMIZE FOR 1 ROW clause.
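
The RID sort at the heart of list prefetch can be sketched in a few lines. The RIDs below are the five PX1PEAK leaf entries from slide 7 that satisfy DATUM IN ('NAD27','WGS84') AND UTM_ZONE IN ('10','11'), in index key order. The function name and the grouping by page are illustrative assumptions, not a description of DB2 internals.

```python
from itertools import groupby

# (data page, slot) in index key order, as found on the PX1PEAK leaf pages.
QUALIFYING_RIDS = [(508, 2), (503, 2), (503, 1), (502, 2), (502, 1)]


def list_prefetch_order(rids):
    """Sort RIDs into data page order and group the slots wanted on each page."""
    sorted_rids = sorted(rids)
    return [(page, [slot for _, slot in group])
            for page, group in groupby(sorted_rids, key=lambda rid: rid[0])]


for page, slots in list_prefetch_order(QUALIFYING_RIDS):
    print(f"read page {page}, rows in slots {slots}")
```

In index order the data pages would be visited as 508, 503, 503, 502, 502; after the sort each page is read once, in ascending order. The price is the RID sort itself, which is why online transactions that want only a few rows usually prefer to avoid it.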

Index     Pages  ClusterRatio  StatsTime
AFIXCDPS  735    0.54          2007-02-03-21.16.12.792282
AFX5CDPS  461    0.42          2007-02-03-21.16.12.792282

AFIXCDPS  735    1.00          2007-02-03-14.32.58.886775
AFX5CDPS  461    0.41          2007-02-03-14.32.58.886775

What caused the cluster ratio of index AFIXCDPS to go from 100% to 54% in less than 7 hours?

ALTER INDEX AFIXCDPS NOT CLUSTER;
ALTER INDEX AFX2CDPS CLUSTER;

This is a very powerful feature of V8.

Index key columns:
AFIXCDPS        AFX5CDPS        AFX2CDPS
TERMINAL_ID     TERMINAL_ID     DRIVER_ID
STOP_NUMBER     PICKUP_DATE     PICKUP_DATE
COMPLETED_DATE COMPLETED_DATE CALL_COMPLETED_FLG TERMINAL_ID PICKUP_END_TIME ROUTE_CODE

12 Sequential Detection

[Diagram: DATA Pages 501-508 of MOUNTAIN.PEAK with the pages touched by two queries.]

ACTIVE (5 out of 8)
SELECT PEAK_NAME FROM PEAK WHERE UTM_ZONE = '13';
Pages shown: p. 508, p. 504, p. 505, p. 506, p. 507, p. 540, p. 526, p. 527, p. 547

NOT ACTIVE (3 out of 8)
SELECT PEAK_NAME FROM PEAK WHERE UTM_ZONE = '11';
Pages shown: p. 503, p. 502, p. 501, p. 529, p. 559, p. 577, p. 579, p. 597, p. 654

(data access sequential = more than 4 out of the last 8 pages are page-sequential)

P = DB2 Prefetch Quantity = 32 pages (4K page size, 4K buffer pool with 1,000 buffers). The prefetch quantity will vary by page size and number of buffers in the buffer pool for the object in question.

Page-sequential distance = P/2 = 32/2 = 16 pages

Is the next page within 16 advancing pages of the current page? If the answer is yes, the page is considered page-sequential. Once DB2 has determined that a page is page-sequential, it determines whether the data access is sequential. Data access is considered sequential if more than 4 out of the last 8 pages are page-sequential. Assuming this is true, sequential prefetch (getting 32 pages) will continue until the pattern stops.
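
The two-step test described above can be sketched as follows. The page numbers are the ones shown on slide 12, with each run arranged so that it starts in ascending order, which is an assumption about the actual access order. The function names and the fixed lookback of 8 are illustrative; DB2 tracks this continuously at run time.

```python
def page_sequential_flags(pages, prefetch_quantity=32):
    """Flag each page that lies within P/2 advancing pages of the previous one."""
    window = prefetch_quantity // 2
    return [0 < nxt - cur <= window for cur, nxt in zip(pages, pages[1:])]


def is_sequential(pages, prefetch_quantity=32, lookback=8, threshold=4):
    """Sequential when more than `threshold` of the last `lookback` pages qualify."""
    flags = page_sequential_flags(pages, prefetch_quantity)[-lookback:]
    return sum(flags) > threshold


utm_zone_13 = [504, 505, 506, 507, 508, 540, 526, 527, 547]
utm_zone_11 = [501, 502, 503, 529, 559, 577, 579, 597, 654]

for label, pages in [("UTM_ZONE = '13'", utm_zone_13),
                     ("UTM_ZONE = '11'", utm_zone_11)]:
    hits = sum(page_sequential_flags(pages))
    print(label, f"{hits} out of 8 page-sequential,",
          "ACTIVE" if is_sequential(pages) else "NOT ACTIVE")
```

A jump backwards, or a jump forward of more than 16 pages, breaks the pattern for that page; enough of those in the last 8 pages turns prefetch off until the pattern returns.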

13 Multiple Index Scan

SELECT *
FROM MOUNTAIN.PEAK_50
WHERE ((LATITUDE BETWEEN '191131N' AND '325836N')
OR (PEAK_NAME BETWEEN 'HARVARD, MOUNT' AND 'YALE, MOUNT'))
WITH UR;

[Access path diagram: two index scans, each followed by a SORTRID step, combined by IXOR.]

Based on the predicates, DB2 can perform multiple index filtering/screening. DB2 applies predicates to each index and then individually sorts the RIDs that qualified for each index used for the filtering. The IXOR process then unions those sorted RID lists (2 or more): a RID that appears in any of the lists is kept, and duplicates are removed, resulting in your final answer set. The performance of this access method depends on the uniqueness of the indexes chosen and the number of RIDs each index returns.
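
The IXOR step amounts to merging sorted RID lists and dropping duplicates. The sketch below shows that idea with two small, made-up RID lists; the values and the function name are hypothetical and do not come from PEAK_50.

```python
import heapq


def ixor(*sorted_rid_lists):
    """Union already-sorted RID lists, keeping each RID once."""
    result = []
    for rid in heapq.merge(*sorted_rid_lists):
        if not result or result[-1] != rid:
            result.append(rid)
    return result


# Hypothetical RIDs (page, slot) qualified by each index, already sorted.
latitude_rids = [(501, 2), (504, 1), (507, 1)]
peak_name_rids = [(504, 1), (505, 2), (506, 2)]

print(ixor(latitude_rids, peak_name_rids))
```

RID (504, 1) qualifies through both indexes but appears once in the result. The more RIDs each index contributes, the more sorting and merging is needed before a single data page is read.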

14 Cartesian Join

SELECT A.COL1, A.COL2, A.COL3, B.COL4, B.COL5 FROM TBL1 A, TBL2 B WHERE A.COL6 = ? AND B.COL7 = ? WITH UR;

Is something missing?
⊕ There is no magic column in the explain tables to alert you of this possible mistake.
⊕ This access path appears normal, watch out!

In simple terms, a Cartesian join occurs when two (or more) tables are joined together without a join predicate between the tables. Using a 16 row table and a 525 row table, with filtering on each table but no join criteria between the two tables, returned 3,510 rows. When the join criteria was added, the result set was 9 rows.

Accidentally doing this with bigger tables (in a production environment) resulted in 8,796,000 rows being returned (a 4,619 row table and a 253,441 row table). Adding the join predicate returned the desired 13,900 rows. One way to detect a Cartesian join is to perform counts on each table (with the filtering criteria of each table). The largest individual table count is the upper limit on what the join result set should contain if you have proper join criteria in place (using 3rd normal form).

It might seem odd that we even mention Cartesian joins, but we do see them now and then. Not catching a minor code mistake can have significant ramifications. Writing your SQL to clearly denote and separate your join predicates from your filtering predicates can help mitigate the occurrences of Cartesian products.
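
The count check described above can be tried out on a toy example. The sketch uses an in-memory SQLite database as a stand-in for DB2, with tiny hypothetical versions of TBL1 and TBL2; the JOIN_KEY column and all data values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TBL1 (JOIN_KEY INTEGER, COL6 TEXT);
    CREATE TABLE TBL2 (JOIN_KEY INTEGER, COL7 TEXT);
    INSERT INTO TBL1 VALUES (1, 'X'), (2, 'X'), (3, 'Y');
    INSERT INTO TBL2 VALUES (1, 'P'), (2, 'P'), (3, 'P'), (4, 'Q');
""")


def count(sql, params):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql, params).fetchone()[0]


# Counts per table, each with its own filtering predicate.
rows_a = count("SELECT COUNT(*) FROM TBL1 A WHERE A.COL6 = ?", ("X",))
rows_b = count("SELECT COUNT(*) FROM TBL2 B WHERE B.COL7 = ?", ("P",))

# The join as written on the slide: filtering only, no join predicate.
no_join = count(
    "SELECT COUNT(*) FROM TBL1 A, TBL2 B WHERE A.COL6 = ? AND B.COL7 = ?",
    ("X", "P"))

# The same query with the join predicate kept apart from the filtering.
with_join = count(
    "SELECT COUNT(*) FROM TBL1 A JOIN TBL2 B ON A.JOIN_KEY = B.JOIN_KEY "
    "WHERE A.COL6 = ? AND B.COL7 = ?",
    ("X", "P"))

upper_limit = max(rows_a, rows_b)
print("per-table counts:", rows_a, rows_b, "upper limit:", upper_limit)
print("without join predicate:", no_join, "suspicious:", no_join > upper_limit)
print("with join predicate:", with_join, "suspicious:", with_join > upper_limit)
```

Writing the join with an explicit ON clause keeps the join predicate visually separate from the filtering predicates, which makes a missing one much easier to spot in review.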

15 LIKE :HV% → The lazy way out?

SELECT COL1, COL2, COL3..
FROM MOUNTAIN.PEAK
WHERE PEAK_NAME LIKE :H
AND DATUM LIKE :H
AND UTM_ZONE LIKE :H
AND RANGE LIKE :H
AND STATE_CD LIKE :H
AND CONTINENT LIKE :H
WITH UR;

⊕ Do not rely on general purpose queries to answer multiple questions. Avoid these pitfalls:
  • Users won’t define search requirement minimums
  • Time and complexity to code a query for each “combination” required
  • Number of indexes to support a general purpose query
  • Inefficient access path when search criteria do not match the access path chosen

REOPT(ALWAYS)

With static SQL, the access path is chosen at bind time. In this example, DB2 chooses the index with the highest first key cardinality. In some cases, it may not be too costly to have the “wrong” access path selected when the query does not provide a value for that high key cardinality column. However, when the tables get larger, these general purpose queries become very expensive (the problem multiplies as tables with additional LIKE predicate filtering are added into the query stack).

REOPT(ALWAYS), or REOPT(VARS) if on V7, might help, but can cause other potential problems which will be discussed in detail during the reoptimization slides.

16 Reoptimization | What is it?

⊕ REOPTIMIZATION → Evaluate data values and access path at runtime
  • When two or more access paths are needed based on the content of the :HostVariables
  • Allow the optimizer to make a different decision based on knowing the data values of the filtering predicates
  • When the optimizer’s estimate of qualifying rows does not yield the desired access path which DB2 would select if it knew the content of the host variables prior to execution
  • Limit parts for partition scans and influence join sequence
  • The REOPT code path does not invoke Automatic Query Rewrite; AQR is currently only available for dynamic read-only SQL
⊕ REOPT(ALWAYS)
  • Available for static and dynamic SQL
  • Carry over of REOPT(VARS) from V7
⊕ REOPT(ONCE)
  • Available for dynamic SQL
  • How do you seed the correct values in the :HostVariables for the first execution? This will set the access path tied to the SQL statement in the dynamic statement cache. Don’t forget about resetting due to actions like IPL or RUNSTATS REPORT NO UPDATE NONE.
⊕ Static SQL | package level granularity
  • Consider isolating and consolidating your reopt statements to a few static packages.
  • Document which objects the reoptimized access path uses and be aware of other access paths needing the same objects.
  • Plan based SQL can also have reoptimization, although the best practice is to implement at the lower package granularity.
⊕ Dynamic SQL | statement level granularity allowing smaller scope and scale of impact
⊕ Did reoptimization benefit the SQL statement?
  • IFCID 0022
  • SQL object monitoring → what got touched for this access path (Detector/Subsystem Analyzer, Query Monitor, Apptune, etc.)

Query showing package based DBRMs with active reoptimization:

    SELECT DISTINCT APPLNAME, PROGNAME, WHEN_OPTIMIZE
      FROM PROD.PLAN_TABLE A
     WHERE WHEN_OPTIMIZE IN ('B')
       AND BIND_TIME =
           (SELECT MAX(BIND_TIME)
              FROM PROD.PLAN_TABLE B
             WHERE A.APPLNAME = B.APPLNAME
               AND A.PROGNAME = B.PROGNAME)
      WITH UR;

Query shows all plan based DBRMs that have had reoptimization active:

    SELECT DISTINCT APPLNAME, PROGNAME, WHEN_OPTIMIZE
      FROM PROD.PLAN_TABLE
     WHERE WHEN_OPTIMIZE IN ('B')
      WITH UR;

• Would static SQL benefit from being dynamic and taking advantage of dynamic cache?
• Reoptimization is done at open cursor.

17 Caveats with REOPT

[Timeline diagram. Recoverable labels: Executing a REOPT; REOPT contention on DSNDB06 SYSDBASE; TRAN DISABLE; BIND; SYSPLAN contention; plan EDM pool load failure; SYSPLAN outages; CICS TranID outage; ENABLE.]

Outage summary:
  ARPOSTQ outage → 15 minutes, 5 seconds (BIND of 177 DBRMs)
  RQ06 outage → 13 minutes, 21 seconds
  Other plan outages → 4 minutes, 40 seconds

Console messages and events, in time order:

13.05.43 STC13637 DSNT375I -DBP2 PLAN=DSNBIND WITH
         CORRELATION-ID=CGARPOST
         CONNECTION-ID=BATCH
         LUW-ID=USARFW01.LUDSNP2.C0079AC395F7=15307
         THREAD-INFO=STCUSER:*:*:*
         IS DEADLOCKED WITH PLAN=RQ06SGL WITH
         CORRELATION-ID=ENTRRQ060037
         CONNECTION-ID=CICSCOM2
         LUW-ID=USARFW01.LUDSNP.C0079A5B8C70=122665
         THREAD-INFO=CICSUSER:*:*:*
         ON MEMBER DBP1
13.05.43 STC13637 DSNT501I -DBP2 DSNILMCL RESOURCE UNAVAILABLE
         CORRELATION-ID=CGARPOST
         CONNECTION-ID=BATCH
         LUW-ID=USARFW01.LUDSNP2.C0079AC395F7=0
         REASON 00C90088
         TYPE 00000302
         NAME DSNDB06 .SYSDBASE.X'0007B3'
13.08.11 CICS Transaction RQ06 is disabled → tied to PLAN RQ06SGL using REOPT
13.08.13 CGARPOST BIND Begins
13.16.08 STC13637 DSNI031I -DBP2 DSNILKES - LOCK ESCALATION HAS OCCURRED FOR
         RESOURCE NAME = DSNDB06.SYSPLAN
         LOCK STATE = X
         PLAN NAME : PACKAGE NAME = DSNBIND : N/A
         COLLECTION-ID = N/A
         STATEMENT NUMBER = N/A
         CORRELATION-ID = CGARPOST
         CONNECTION-ID = BATCH
         LUW-ID = USARFW01.LUDSNP2.C0079D34792B
         THREAD-INFO = STCUSER : *
13.20.36 STC13736 DSNT501I -DBP1 DSNILMCL RESOURCE UNAVAILABLE
         CORRELATION-ID=POOLEQ030135
         CONNECTION-ID=CICSCOM2
         LUW-ID=USARFW01.LUDSNP.C0079DA6108D=0
         REASON 00C9008E
         TYPE 00000200
         NAME DSNDB06 .SYSPLAN
13.20.36 STC13736 DSNT501I -DBP1 DSNILMCL RESOURCE UNAVAILABLE
         CORRELATION-ID=IV710DO
         CONNECTION-ID=BATCH
         LUW-ID=USARFW01.LUDSNP.C0079D47A4A2=0
         REASON 00C9008E
         TYPE 00000200
         NAME DSNDB06 .SYSPLAN
13.20.36 STC13736 DSNT501I -DBP1 DSNILMCL RESOURCE UNAVAILABLE
         CORRELATION-ID=POOLIV780100
         CONNECTION-ID=CICSWEBB
         LUW-ID=USARFW01.LUDSNP.C0079D9E0E81=0
         REASON 00C9008E
         TYPE 00000200
         NAME DSNDB06 .SYSPLAN
13.20.48 CGARPOST BIND Completed, RC=0
13.21.32 CICS Transaction RQ06 enabled

The real effects of large plan binds are shown on DBP1, due to the lock escalation on SYSPLAN for the large plan bind of ARPOSTQ (177 DBRMs). Other skeleton cursor tables could not be loaded into the EDM pool, causing transactions not tied to the ARPOSTQ bind to fail.

All of the failures around 13:20 are tied to the lock escalation of the large plan-based bind. Outages from 13:20 on are not tied to REOPT; they are tied to being a PLAN-based shop. There is no versioning with plans. Packages allow versioning and, if implemented correctly, can eliminate these errors tied to DSNDB06.SYSPLAN.

Should code implementations be permitted 24/7?

Reoptimization uses a mini-bind. During the mini-bind, customers can experience the same locking needs as running bind operations in parallel. Collisions can lead to deadlocks on SYSDBASE (checking systabauth as an example). IBM is using an internal lock and latch that causes this deadlock. Until the operation is changed to support greater concurrency, issues can occur.

DSN1PRNT detail for the original SYSDBASE lock [DSNDB06 .SYSDBASE.X'0007B3']: drill down with DSN1PRNT to see which object is the source of the lock. In this case, it is a page in table SYSTABLESPACE associated with CUSTOMER_PROFILE. The reoptimization was to support the general-purpose LIKE wildcarding query.

18 Does this column add filtering to the index?

RUNSTATS TABLESPACE DBUS14ER.TSPEAK01
  TABLE(MOUNTAIN.PEAK) INDEX(ALL)

RUNSTATS INDEX MOUNTAIN.DX1PEAK
  FREQVAL NUMCOLS 2 COUNT 16

➢ Understanding your data is critical.
➢ Would colgroup stats help?
➢ Can I see this in Visual Explain?
➢ Does the column add uniqueness?
➢ Did the column increase the index levels?

    SELECT COUNT(*), STATE_CD
      FROM PEAK
     WHERE STATE_CD = 'CO'
     GROUP BY STATE_CD
      WITH UR;

    SELECT COUNT(*), STATE_CD, ELEVATION
      FROM PEAK
     WHERE STATE_CD = 'CO'
     GROUP BY STATE_CD, ELEVATION
      WITH UR;

Multiple columns frequency distribution (STATE_CD, ELEVATION)

Attribute     Frequency                    Explanation
(CO, 14420)   0.0625
(CO, 14270)   0.0625
(CO, 14255)   0.0625
(CO, 14196)   0.0625
(CO, 14172)   0.0625
(CO, 14110)   0.0625
(CO, 14059)   0.0625
(CO, 14037)   0.0625
(CO, 14005)   0.0625
(CA, 14494)   0.0625
(CA, 14080)   0.0625
(CA, 14058)   0.0625
(CA, 14000)   0.0625
(AK, 20320)   0.0625
(AK, 14831)   0.0625
Timestamp     2007-02-24 18:33:16.801237   Timestamp when last invocation of RUNSTATS

Frequency distribution (STATE_CD)

Attribute     Frequency                    Explanation
CO            0.5625
CA            0.25
AK            0.125
WA            0.0625
Cardinality   4.0                          Number of rows for the column group
Timestamp     2007-02-24 18:33:16.801237   Timestamp when last invocation of RUNSTATS

Colvalue   CardF   FrequencyF
AK         2       0.1250
CA         4       0.2500
CO         9       0.5625
WA         1       0.0625

ColGroupColNo   CardF   FrequencyF
AK 20320        1       0.0625
AK 14831        1       0.0625
CA xxxxx        1       0.0625
CO xxxxx        1       0.0625
WA xxxxx        1       0.0625

Colvalue   CardF   FrequencyF
AK         21      0.2308
CA         15      0.1648
CO         54      0.5934
WA         1       0.0110

ColGroupColNo   CardF   FrequencyF
AK xxxxx        1       0.0110
CA xxxxx        1       0.0110
CA 14162        2       0.0220
CA 14000        2       0.0220
CO xxxxx        1       0.0110
CO 14265        2       0.0220
CO 14162        2       0.0220
CO 14042        2       0.0220
CO 14014        2       0.0220
CO 14005        2       0.0220
CO 14197        3       0.0330
WA 14410        1       0.0110

Number Rows = SELECT COUNT(*) FROM PEAKS

SYSIBM.SYSCOLDIST / SYSIBM.SYSCOLDISTSTATS
CARDF = Number of distinct values for the column group
CARDF = SELECT STATE_CD, COUNT(*) FROM PEAKS GROUP BY STATE_CD
CARDF = SELECT STATE_CD, ELEVATION, COUNT(*) FROM PEAKS GROUP BY STATE_CD, ELEVATION

FREQUENCYF = CARDF / Number of rows

NUMCOLUMNS = Number of columns associated to the statistics collection

COUNT = Number of top frequently occurring values to collect statistics on (from FREQVAL NUMCOLS xx COUNT xx syntax of RUNSTATS)

COLGROUPCOLNO = Array of SMALLINT numbers with the dimension equal to NUMCOLUMNS. The examples show the actual column values to make them easier to follow. COLGROUP statistics are helpful in allowing the optimizer to make better decisions.

Smaller filter factors mean fewer rows qualify. Fewer qualifying rows help DB2 favor index access. If a large percentage of rows qualify, the filtering based on that column or index is poor; therefore, DB2 would likely favor a relational scan.
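
The frequency arithmetic above can be reproduced by hand. The following is a minimal sketch, assuming the MOUNTAIN.PEAK table and STATE_CD column from the slide; the derived table T and the column aliases are names made up for the example.

```sql
-- Share of rows per STATE_CD value: rows with the value / total rows.
-- T, VALUE_ROWS, TOTAL_ROWS and FREQ are illustrative names.
SELECT P.STATE_CD,
       COUNT(*)                                AS VALUE_ROWS,
       DECIMAL(COUNT(*), 15, 4) / T.TOTAL_ROWS AS FREQ
  FROM MOUNTAIN.PEAK P,
       (SELECT COUNT(*) AS TOTAL_ROWS
          FROM MOUNTAIN.PEAK) AS T
 GROUP BY P.STATE_CD, T.TOTAL_ROWS
 ORDER BY 3 DESC
  WITH UR;
```

With the 16-row sample on the slide, CO has 9 rows, so its frequency is 9 / 16 = 0.5625, while WA has 1 row and a frequency of 1 / 16 = 0.0625. An equal predicate on WA therefore filters far better than one on CO.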

19 Should merge scan joins be in your onlines?

Impact of merge scan join on OLTP
⊕ Most likely not the preferred access path for onlines
⊕ Better for warehouse and reporting queries

Plan based SQL query to find merge scan joins:

    SELECT INT(B.CARDF) AS ROWS, INT(B.NPAGESF) AS PAGES, A.*
      FROM PLAN_TABLE A, SYSIBM.SYSTABLES B
     WHERE A.APPLNAME LIKE '%your online pattern%'
       AND A.METHOD = 2
       AND A.TNAME = B.NAME
       AND A.CREATOR = B.CREATOR
       AND A.BIND_TIME =
           (SELECT MAX(C.BIND_TIME)
              FROM PLAN_TABLE C
             WHERE C.APPLNAME = A.APPLNAME
               AND C.PROGNAME = A.PROGNAME)
     ORDER BY 2 DESC
      WITH UR;

Merge scan join: the present composite table and the new table are scanned in the order of the join columns, then matching rows are joined.

20 Merge scan join in your onlines | continued

Two code changes occurred
⊕ Cursor was changed to multi-row fetch
⊕ Optimize for N rows was also added
⊕ VE CPU Cost (ms) of merge scan join was 3,488 and the CPU Cost (ms) of nested loop join was 8

Before, the query used 4 minutes 39.485703 seconds of CPU and performed 2,477,164 getpages.

After, the query used 0.086878 seconds of CPU and performed 828 getpages.
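
The syntax of the two code changes can be sketched as follows. This is a hedged illustration only: the cursor name, the host variables and host variable arrays, the rowset size of 20 and the PEAK_NAME column are assumptions, and the statement is a simple single-table query rather than the join from the tuned transaction.

```sql
-- Assumed names: cursor PEAKCSR, host variable :HVSTATE, host variable
-- arrays :HANAME and :HAELEV, column PEAK_NAME.
DECLARE PEAKCSR CURSOR WITH ROWSET POSITIONING FOR
  SELECT PEAK_NAME, ELEVATION
    FROM MOUNTAIN.PEAK
   WHERE STATE_CD = :HVSTATE
   ORDER BY ELEVATION DESC
   OPTIMIZE FOR 20 ROWS;     -- tells the optimizer only a few rows are wanted

OPEN PEAKCSR;

-- Multi-row fetch: one FETCH returns up to 20 rows into the arrays
FETCH NEXT ROWSET FROM PEAKCSR
  FOR 20 ROWS
  INTO :HANAME, :HAELEV;

CLOSE PEAKCSR;
```

OPTIMIZE FOR n ROWS tends to steer DB2 away from access paths that must materialize a large sorted result before the first row is returned, which is one reason it can favor nested loop join for an online screen of data.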

Visual Explain tip:

EXPLAINing with the Visual Explain stored procedure (DSN8EXP) → Requires execute authority on the stored procedure, but enables statements to be explained without the user holding the authority to execute those statements.

You will see this stored procedure running under:
PLANNAME → DISTSERV
PROGRAM → DSN8EXP
COLLID → DSN8EXP

21 What about hybrid join in your onlines?

Impact of hybrid join on onlines
⊕ May not be the preferred access path for onlines
⊕ Better for warehouse and reporting queries (along with Star Join)

Plan based SQL query to find hybrid scan joins:

    SELECT INT(B.CARDF) AS ROWS, INT(B.NPAGESF) AS PAGES, A.*
      FROM PLAN_TABLE A, SYSIBM.SYSTABLES B
     WHERE A.APPLNAME LIKE '%your online pattern%'
       AND A.METHOD = 4
       AND A.TNAME = B.NAME
       AND A.CREATOR = B.CREATOR
       AND A.BIND_TIME =
           (SELECT MAX(C.BIND_TIME)
              FROM PLAN_TABLE C
             WHERE C.APPLNAME = A.APPLNAME
               AND C.PROGNAME = A.PROGNAME)
     ORDER BY 2 DESC
      WITH UR;

Hybrid join: the current composite table is scanned in the order of the join-column rows of the new table. The new table is accessed using list prefetch.

Clicking on the FETCH box in Visual Explain (look for the “information box” on the left) will show if DB2 is invoking prefetch, and what kind of prefetch (dynamic, sequential or list). The appendix section of this presentation has an example of this screen shot showing list prefetch.

22 Hybrid join in your onlines | continued

Sometimes simple to fix, sometimes not
⊕ Statistics Advisor
  ➢ Generated the proper stats to facilitate nested loop join access path
  ➢ Sort after nested loop join is for Group By (METHOD=3)
⊕ New/altered indexes
  ➢ Altering existing indexes and adding just one column can make a difference
  ➢ Consider new indexes to accommodate business changes

VE cost estimates: For hybrid join, CPU Cost (ms) is 13 and CPU Cost (su) is 188. For the nested loop join, CPU Cost (ms) is 7 and CPU Cost (su) is 139.

The two tables involved in this example are not big tables. If you have large tables doing hybrid join, converting them to nested loop for the small result set scope of your OLTP needs can have significant savings.

What niche do various join methods favor?
Nested Loop (PLAN_TABLE.METHOD = 1) → Small result set or well clustered tables.
Merge Scan (PLAN_TABLE.METHOD = 2) → Poor cluster ratio, large result set.
Hybrid (PLAN_TABLE.METHOD = 4) → Poor cluster ratio, large number of columns on inner table.
Star (PLAN_TABLE.JOIN_TYPE = S) → Has all join methods in its bag of tricks, not for OLTP.

PLAN_TABLE.JOIN_TYPE
' ' → inner join
F → full join
L → left/right outer join
S → star join
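
To see how these join methods are distributed across your explained statements, a summary along these lines may help. It is a minimal sketch that assumes the PROD.PLAN_TABLE qualifier used elsewhere in this presentation; the labels in the CASE expression are just descriptive text.

```sql
-- Count PLAN_TABLE rows per join method and join type.
-- METHOD 0 (first table accessed) and 3 (sort steps) are not joins,
-- so they are left out.
SELECT CASE METHOD
         WHEN 1 THEN 'NESTED LOOP'
         WHEN 2 THEN 'MERGE SCAN'
         WHEN 4 THEN 'HYBRID'
       END      AS JOIN_METHOD,
       JOIN_TYPE,
       COUNT(*) AS PLAN_ROWS
  FROM PROD.PLAN_TABLE
 WHERE METHOD IN (1, 2, 4)
 GROUP BY METHOD, JOIN_TYPE
 ORDER BY 3 DESC
  WITH UR;
```

Note that this counts every row in the explain table, including rows from older binds. Add a MAX(BIND_TIME) subselect like the one in the hunting queries if only the current access paths matter.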

23 Good Will Hunting Queries

Packages with Merge Scan Join

    SELECT HUNTING.*
      FROM SYSIBM.SYSPLAN PLN, SYSIBM.SYSPACKAGE PKG,
           SYSIBM.SYSPACKLIST PKLST,
           (SELECT INT(B.CARDF) AS ROWS,
                   INT(B.NPAGESF) AS PAGES,
                   A.*
              FROM PROD.PLAN_TABLE A, SYSIBM.SYSTABLES B
             WHERE A.METHOD = 2
    --         AND A.COLLID = 'collection for onlines' -- If you have one --
               AND A.TNAME = B.NAME
               AND A.CREATOR = B.CREATOR
               AND A.BIND_TIME =
                   (SELECT MAX(C.BIND_TIME)
                      FROM PROD.PLAN_TABLE C
                     WHERE C.COLLID = A.COLLID
                       AND C.PLANNO = A.PLANNO
                       AND C.VERSION = A.VERSION)) AS HUNTING
     WHERE PLN.NAME = PKLST.PLANNAME
       AND PKLST.COLLID = PKG.COLLID
       AND PKG.NAME = HUNTING.PROGNAME
       AND PLN.NAME LIKE '%your online pattern%'
      WITH UR;

Packages with Hybrid Scan Join

    SELECT HUNTING.*
      FROM SYSIBM.SYSPLAN PLN, SYSIBM.SYSPACKAGE PKG,
           SYSIBM.SYSPACKLIST PKLST,
           (SELECT INT(B.CARDF) AS ROWS,
                   INT(B.NPAGESF) AS PAGES,
                   A.*
              FROM PROD.PLAN_TABLE A, SYSIBM.SYSTABLES B
             WHERE A.METHOD = 4
    --         AND A.COLLID = 'collection for onlines' -- If you have one --
               AND A.TNAME = B.NAME
               AND A.CREATOR = B.CREATOR
               AND A.BIND_TIME =
                   (SELECT MAX(C.BIND_TIME)
                      FROM PROD.PLAN_TABLE C
                     WHERE C.COLLID = A.COLLID
                       AND C.PLANNO = A.PLANNO
                       AND C.VERSION = A.VERSION)) AS HUNTING
     WHERE PLN.NAME = PKLST.PLANNAME
       AND PKLST.COLLID = PKG.COLLID
       AND PKG.NAME = HUNTING.PROGNAME
       AND PLN.NAME LIKE '%your online pattern%'
      WITH UR;

There is minimal difference between these two queries. Once you have created a good “hunting” query, you can easily modify it to search for your criteria.

Receiving no rows when running these hunting queries in your shop indicates you have your OLTP joins in decent shape. The key is ensuring the scope of what constitutes OLTP is accurately defined in the hunting queries. Do not forget about Murphy's law. Receiving an empty result set on Monday does not indicate that this same empty result set will persist over time.

24 Runstats... Bueller? Reorg... Bueller? Rebind... Bueller? Anyone?

RUNSTATS
⊕ Frequency
  ➢ Weekly, monthly, quarterly, annually or maybe never
⊕ Are you using RTS (Real Time Statistics)
  ➢ Why not?

REORG
⊕ Scheduled or Autonomic
  ➢ Scheduled is very common
  ➢ Can cause unwanted outages
⊕ Let RTS drive the need to reorg
  ➢ Only reorg objects that have statistically changed and require reorganization
  ➢ Be selective, if it is just one index, so be it

REBIND
⊕ After runstats?
  ➢ What if your runstats were at a disorganized time
⊕ After reorg?
  ➢ Gives DB2 the best picture for making access path decisions
⊕ Never? Spicoli doesn’t need rebind, but DB2 does!
  ➢ With dynamic SQL, you don’t get to ignore the need for rebind. DB2 initiates the rebind. Remember, runstats flushes the dynamic statement cache.

Commonly referred to as the 3 R’s (reorg, runstats, rebind). These utilities/commands are very helpful in fighting performance problems most organizations experience as their data and systems change.

RTS (Real Time Statistics) are gathered auto-magically by DB2 and externalized to tables (one for tablespaces and one for indexes). These statistics do not influence the optimizer at this point in time. They do provide a way to know when utility events should be triggered (reorg, image copy or runstats). Those event driven operations allow you to create a more dynamic solution for feeding the optimizer what it needs to make good decisions.

One of the major benefits with RTS is only taking action when required and not burning through utility cpu cycles unnecessarily. Only reorg, copy and runstat when RTS shows the need. If there is no significant change, why spend the cycles?
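
As a hedged sketch of letting RTS drive the decision, the query below lists table spaces where the changes since the last REORG exceed a percentage of the total rows. The 20 percent threshold is an arbitrary assumption for illustration, not a recommendation; DSNACCOR, covered later, applies its own configurable thresholds.

```sql
-- Table spaces whose inserts + deletes + updates since the last REORG
-- amount to at least 20% of the rows (the threshold is an assumed value).
SELECT DBNAME, NAME, PARTITION,
       TOTALROWS,
       REORGINSERTS, REORGDELETES, REORGUPDATES,
       REORGLASTTIME
  FROM SYSIBM.TABLESPACESTATS
 WHERE TOTALROWS > 0
   AND DOUBLE(REORGINSERTS) + REORGDELETES + REORGUPDATES
       >= TOTALROWS * 0.20
 ORDER BY DBNAME, NAME, PARTITION
  WITH UR;
```

Rows with NULL counters drop out of the result, which usually means the object has not been through the relevant utility since RTS collection began.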

25 Don’t fear statistics, get your DBA into the future

[Diagram: two weekend maintenance timelines, “Friday Night STATS” and “Friday Night Without STATS”. Recoverable labels: Friday night stats (burning CPU, undesired stats); Saturday move / BIND (DSNDB06); Sat/Sun OLR driven off Friday night stats versus Sat/Sun OLR driven off Real Time Stats (DSNACCOR); REORG stats; optimizer stats; REBIND; desired stats.]

Below is a sample query to evaluate object change using RTS detail. For example, if there is an effort to retire certain tables or indexes, verify those objects are not having their data content changed. The results may surprise you.

    SELECT 'TSREORG', NAME, PARTITION AS PART, UPDATESTATSTIME,
           NACTIVE AS PAGES, SPACE, EXTENTS,
           REORGLASTTIME AS UTILITYLASTTIME,
           REORGINSERTS AS INSERTS, REORGDELETES AS DELETES,
           REORGUPDATES AS UPDATES
      FROM SYSIBM.TABLESPACESTATS
     WHERE DBNAME = 'TFMSV15P'
       AND NAME IN ('TFMTS011', 'TFMTS013','TFMTS075','TFMTS076')
       AND ( REORGINSERTS >= 100 OR REORGDELETES >= 100
             OR REORGUPDATES >= 100 )
       AND UPDATESTATSTIME >= '2006-04-24-00.00.00.000000'
    UNION ALL
    SELECT 'TSSTATS', NAME, PARTITION AS PART, UPDATESTATSTIME,
           NACTIVE AS PAGES, SPACE, EXTENTS,
           STATSLASTTIME, STATSINSERTS, STATSDELETES, STATSUPDATES
      FROM SYSIBM.TABLESPACESTATS
     WHERE DBNAME = 'TFMSV15P'
       AND NAME IN ('TFMTS011', 'TFMTS013','TFMTS075','TFMTS076')
       AND ( STATSINSERTS >= 100 OR STATSDELETES >= 100
             OR STATSUPDATES >= 100 )
       AND UPDATESTATSTIME >= '2006-04-24-00.00.00.000000'
    UNION ALL
    SELECT 'TSCOPY', NAME, PARTITION AS PART, UPDATESTATSTIME,
           NACTIVE AS PAGES, SPACE, EXTENTS,
           COPYLASTTIME, COPYUPDATEDPAGES, COPYCHANGES, COPYCHANGES
      FROM SYSIBM.TABLESPACESTATS
     WHERE DBNAME = 'TFMSV15P'
       AND NAME IN ( 'TFMTS011', 'TFMTS013','TFMTS075','TFMTS076')
       AND ( COPYUPDATEDPAGES >= 10 OR COPYCHANGES >= 10 )
       AND UPDATESTATSTIME >= '2006-04-24-00.00.00.000000'
    UNION ALL
    SELECT 'IXREORG', INDEXSPACE, PARTITION AS PART, UPDATESTATSTIME,
           NACTIVE AS PAGES, SPACE, EXTENTS,
           REORGLASTTIME AS UTILITYLASTTIME,
           REORGINSERTS, REORGDELETES, REORGLEAFFAR
      FROM SYSIBM.INDEXSPACESTATS
     WHERE DBNAME = 'TFMSV15P'
       AND INDEXSPACE IN ( 'TFMIX011','TFMX2011','TFMX3011','TFMX4011',
                           'TFMIX013',
                           'TFMIX075','TFMX2075','TFMX3075','TFMX4075','TFMX5075',
                           'TFMX6075','TFMIX076','TFMX2076')
       AND ( REORGINSERTS >= 100 OR REORGDELETES >= 100
             OR REORGLEAFFAR >= 10 )
       AND UPDATESTATSTIME >= '2006-04-24-00.00.00.000000'
    UNION ALL
    SELECT 'IXSTATS', INDEXSPACE, PARTITION AS PART, UPDATESTATSTIME,
           NACTIVE AS PAGES, SPACE, EXTENTS,
           STATSLASTTIME, STATSINSERTS, STATSDELETES, STATSMASSDELETE
      FROM SYSIBM.INDEXSPACESTATS
     WHERE DBNAME = 'TFMSV15P'
       AND INDEXSPACE IN ( 'TFMIX011','TFMX2011','TFMX3011','TFMX4011',
                           'TFMIX013',
                           'TFMIX075','TFMX2075','TFMX3075','TFMX4075','TFMX5075',
                           'TFMX6075','TFMIX076','TFMX2076')
       AND ( STATSINSERTS >= 100 OR STATSDELETES >= 100
             OR STATSMASSDELETE >= 1 )
       AND UPDATESTATSTIME >= '2006-04-24-00.00.00.000000'
      WITH UR;

26 Stats, stats and more stats | Do you think we like stats? So does the database!

⊕ Statistics Advisor
  ➢ From Visual Explain (which is free)

⊕ Red is bad
  ➢ Indicates default stats used in access path selection

⊕ Clicking on Analyze invokes Statistics Advisor

DSNACCOR is an IBM-supplied stored procedure that uses the data in the real-time statistics tables to help maintain the operational efficiency of a DB2 subsystem/group.

DSNACCOR performs these actions:
• Recommends when to reorganize, image copy, or update statistics for tablespaces and/or indexspaces
• Indicates tablespaces or indexspaces that have exceeded their data set extents (extent management)
• Indicates whether objects are in a restricted state

The defaults for many of the parameters can be changed by specifying them as input to the DSNACCOR call statement. The RTS exception table can be utilized for "special" tables that do not require processing by DSNACCOR. The exception table excludes objects not in scope (for example, large partitioned tables with one or more NPIs, or tables which already have special processing established).

27 Stats Advisor, the one loop your DBA might like. If at first you don’t succeed

⊕ Iterative process
  ➢ Sometimes you need to do this multiple times (until you get the same recommendation or “No RUNSTATS commands recommended”)
  ➢ The runstats suggested can be executed via stored procedure, or cut and pasted into a batch job for execution
⊕ Real numbers
  ➢ Should lead to better choices
  ➢ If not, open an ETR

Executing runstats from Visual Explain will require:
• Having the stored procedure defined,
• Having access to execute the stored procedure, and
• Having authority to perform runstats on the objects.

Do not forget about the history statistics. They are very helpful with trending and performance.

Items to remember tied to runstats:
• Exclude volatile objects from your runstats to preserve your -1 in the various stats tables.
• Do you have NPGTHRSH=502 to support the -1 stats and index usage?
• Would a volatile table definition be better?
• If you have updated the catalog stats manually, RUNSTATS will overlay those updates.
• RUNSTATS will invalidate statements in the dynamic statement cache (the statements associated with the objects RUNSTATS is run against).
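
One way to spot objects still carrying default (-1) statistics, whether deliberately preserved or simply never collected, is to query the catalog. This is a minimal sketch; the database name filter reuses DBUS14ER from the RUNSTATS example earlier in the presentation and should be replaced with your own.

```sql
-- Tables whose catalog statistics are still at the -1 default.
SELECT CREATOR, NAME, DBNAME, TSNAME, CARDF, NPAGES, STATSTIME
  FROM SYSIBM.SYSTABLES
 WHERE TYPE = 'T'
   AND CARDF = -1
   AND DBNAME = 'DBUS14ER'
 ORDER BY CREATOR, NAME
  WITH UR;
```

Comparing this list against your intentionally volatile objects shows which -1 values are by design and which are simply missing statistics.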

28 Runstats invoked via stored procedure

VE results of running stats. Identical content / results as the green screen approach.

The same information is generated when runstats is executed from a batch job. Launching stats from Visual Explain results in the optimizer statistics being updated (there is no wizard behind the curtain). Stats advisor has the advantage of building out the statistics it sees are missing. Without these additional stats suggestions DB2 would potentially favor lesser access paths which are not the best fit for your data and queries.

At times, the recommendations from stats advisor are not enough to address the performance problem. For those scenarios, additional stats need to be run to account for potential data skews. The current release of Visual Explain operates at the statement level. Future releases (hopefully the release supporting DB2 9) will operate on the package/plan level and generate stats suggestions with that larger scope in mind.

29 Statistics Rule – Opthints Drool

⊕ Opthints sound good, but can be difficult to implement and maintain. Opthints typically lead to lackadaisical performance habits and practices.
  ➢ Solve the root cause of your performance issue.
  ➢ If more than 10% of your access paths are hinted, you are likely doing more harm than good.
  ➢ Usage should be documented and automated for future code moves.
  ➢ Create an ETR to ensure resolution of the root cause.
  ➢ DSNZPARM change required to activate.
⊕ Establish the source for the opthint by updating the PLAN_TABLE rows to add an OPTHINT name.
  ➢ Keep the hint scope small, only hint the necessary query numbers, don’t go wild.
  ➢ Are the opthint rows protected from explain table clean up processes?
⊕ Programmers should add QUERYNO to their code.
  ➢ If you really need the hint, take the time to implement the QUERYNO.
  ➢ Some CASE generation tools don’t support QUERYNO
    • Should you be using the hint?
    • Should you be using the CASE tool for the code that needs to be hinted?
⊕ How should opthints be named and managed?
  ➢ V8HINT1, V8RSU0702_HINT1 | CHAR(8) to VARCHAR(128) with V8.
  ➢ Can the existing change management software support the pattern/standard implemented?
  ➢ Are the hinted rows kept in a source code repository? Is the PLAN_TABLE backed-up/recoverable?
⊕ Verify hint is in use!
  ➢ BIND/REBIND SQLCODE +394, HINT_USED column of PLAN_TABLE
  ➢ Query the special register: CURRENT OPTIMIZATION HINT

+394 = User specified optimization hints used during access path selection. (SQLSTATE 01629)
+395 = User specified optimization hints are invalid (reason code = ' '). The optimization hints are ignored. (SQLSTATE 01628)
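
To confirm which statements actually picked up a hint after the rebind, the HINT_USED column can be queried. A minimal sketch, assuming the PROD.PLAN_TABLE qualifier and the hint name V8RSU0702_H1 used in the hint example later in this presentation:

```sql
-- Rows explained with the hint applied carry the hint name in HINT_USED.
SELECT QUERYNO, PROGNAME, VERSION, COLLID, HINT_USED, BIND_TIME
  FROM PROD.PLAN_TABLE
 WHERE HINT_USED = 'V8RSU0702_H1'
 ORDER BY PROGNAME, QUERYNO, BIND_TIME DESC
  WITH UR;

-- For dynamic SQL, the hint is requested through the special register.
SET CURRENT OPTIMIZATION HINT = 'V8RSU0702_H1';
```

An empty result after a rebind that was expected to return +394 is a sign that the hint was not applied and that the +395 reason code is worth checking.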

A DSNZPARM change is required to support optimization hints: specify YES in the OPTIMIZATION HINTS field of installation panel DSNTIP4. OPTHINT and HINT_USED go from CHAR(8) to VARCHAR(128) with V8, which gives you more flexibility to create meaningful hint names.

Recommend creating an index on PLAN_TABLE. DB2 sample library member DSNTESC has a sample create index for the plan table.

    CREATE TYPE 2 INDEX FXFP.DX7PLTB
        ON FXFP.PLAN_TABLE
           (OPTHINT ASC
           ,APPLNAME ASC
           ,PROGNAME ASC
           ,"COLLID" ASC
           ,VERSION ASC
           ,"QUERYNO" ASC)
        NOT PADDED
        USING STOGROUP DBASG
              PRIQTY 68400
              SECQTY 7200
        FREEPAGE 32
        PCTFREE 25
        BUFFERPOOL BP13
        CLOSE YES
        DEFER YES
        PIECESIZE 2097152 K;

This index is needed for the optimizer to use index access on your plan table when establishing and setting the optimization hints. Why have DB2 perform a tablespace scan when looking for a hint at bind and rebind time? Remember that with V8 you can alias folks (plan and package OWNERs) to one common set of explain tables, consolidating explain information in one place.

30 A day in the life of an Optimization Hint

1   UPDATE PROD.PLAN_TABLE
       SET OPTHINT = 'V8RSU0702_H1'
     WHERE PROGNAME = 'LIKESRCH'
       AND VERSION = '2006-10-26-23.14.02.002004'
       AND COLLID = 'CS_MATCH'
       AND QUERYNO IN (7448);

2   REBIND PACKAGE(CS_MATCH)
      MEMBER(LIKESRCH)
      VERSION('2007-02-28-13.22.01.003007')
      OPTHINT('V8RSU0702_H1')

3   Once the hint is statically bound, the optimizer does not have to access the PLAN_TABLE to retrieve the access path which is hinted, it can just run.

Is this index on your PLAN_TABLE?
  No → RScan DBUSXPTB.USTSPLTB
  Yes → Index Access PROD.DX1HINT (QUERYNO, APPLNAME, PROGNAME, VERSION, COLLID, OPTHINT)

PLAN_TABLE | before the OPTHINT is sourced

QUERYNO  APPLNAME  PROGNAME  VERSION                     COLLID    OPTHINT       HINT_USED
7448               LIKESRCH  2007-02-28-13.22.01.003007  CS_MATCH
7448               LIKESRCH  2006-10-31-13.07.04.004008  CS_MATCH
7448               LIKESRCH  2006-10-26-23.14.02.002004  CS_MATCH

PLAN_TABLE | After the OPTHINT is sourced (step 1)

QUERYNO  APPLNAME  PROGNAME  VERSION                     COLLID    OPTHINT       HINT_USED
7448               LIKESRCH  2007-02-28-13.22.01.003007  CS_MATCH
7448               LIKESRCH  2006-10-31-13.07.04.004008  CS_MATCH
7448               LIKESRCH  2006-10-26-23.14.02.002004  CS_MATCH  V8RSU0702_H1

PLAN_TABLE | After REBIND OPTHINT('V8RSU0702_H1') (step 2)

QUERYNO  APPLNAME  PROGNAME  VERSION                     COLLID    OPTHINT       HINT_USED
7448               LIKESRCH  2007-02-28-13.22.01.003007  CS_MATCH                V8RSU0702_H1
7448               LIKESRCH  2006-10-31-13.07.04.004008  CS_MATCH
7448               LIKESRCH  2006-10-26-23.14.02.002004  CS_MATCH  V8RSU0702_H1

Dynamic Statements: Query number is the statement number in the application where the prepare occurs. For some applications, such as DSNTEP2, the same statement in the application prepares each dynamic SQL statement. This makes the query number for each dynamic statement identical. Using QUERYNO eliminates ambiguity as to which rows in the PLAN_TABLE are associated with each SQL statement in the application.

Static Statements: If the application change results in a statement number changing, then the QUERYNO will change as well (unless it was hard coded). This would mean updates to your opthints in your old PLAN_TABLE rows. Assigning a QUERYNO will ensure you are controlling and tagging your SQL properly.
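
Hard coding the query number is done with the QUERYNO clause on the statement itself. A minimal sketch, reusing query number 7448 from the hint example; the cursor name, column names and host variable are assumptions for illustration.

```sql
-- QUERYNO pins the value written to PLAN_TABLE.QUERYNO, so it no longer
-- shifts when statements are added to or removed from the program source.
DECLARE LIKECSR CURSOR FOR
  SELECT CUST_ID, CUST_NAME
    FROM CUSTOMER_PROFILE
   WHERE CUST_NAME LIKE :HVNAME
   QUERYNO 7448;
```

With the number fixed in the source, the OPTHINT rows in the PLAN_TABLE keep matching the statement across code releases.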

Keep only the necessary rows in your plan_table to determine performance changes from various code releases. Clean up old versions of explain output when you deprecate prior packages. See deprecate slide for a safe way to remove obsolete code. Also consider updating the REMARKS column of PLAN_TABLE to include any notes about the performance for any QueryNo you have to hint.

IBM’s primary intent with opthints was to enable customers to resolve performance degradation by switching to the access path that performed well previously (also referred to in the manuals as “preserving the access path”). For the most part, you should not override what the optimizer wants to do. The optimizer knows best; give it better stats to do its job!

Download Visual Explain: http://www-306.ibm.com/software/data/db2/zos/downloads/ve.html

31 My drain is stuck, anyone have some Drano?

⊕ Claim – interest on a page set
⊕ Drain – required for certain operations
  ➢ Online schema evolution
  ➢ Utilities like the switch for online reorg
  ➢ Database commands and certain alters, creates, drops
⊕ COMMIT
  ➢ Do you have a fear of commitment?
⊕ Thread Reuse
  ➢ DRDA
  ➢ Enforce commit logic
  ➢ Thread growth
  ➢ 1.7 Gig limit DIST

[Diagram: claims held against MOUNTAIN.PEAK by threads running with UR, CS, RS and RR isolation, with a COMMIT releasing them.]

A claim is a notification to DB2 that an object is being accessed. A claim indicates to DB2 that there is activity on or interest in a particular page set or partition. Claims prevent drains from occurring until the claim is released. There are 3 classes of claims:
Write → Reading, updating, inserting, and deleting
Repeatable read → Reading only, with RR isolation
Cursor stability read → Reading only, with RS, CS or UR isolation

DB2 issues a warning message and generates a trace record for each time period that a task holds an uncommitted read claim. You can set the length of the period in minutes by using the LRDRTHLD subsystem parameter. An application using UR isolation cannot run concurrently with a utility that drains all claim classes.

A drain is the action of taking over access to an object by preventing new claims and waiting for existing claims to be released.

    -DIS DB(DBUS14ER) SPACE(*) CLAIMERS
    DSNT360I !DBT2 ***********************************
    DSNT361I !DBT2 * DISPLAY DATABASE SUMMARY
                   *   GLOBAL CLAIMERS
    DSNT360I !DBT2 ***********************************
    DSNT362I !DBT2   DATABASE = DBUS14ER  STATUS = RW
                     DBD LENGTH = 12104
    DSNT397I !DBT2
    NAME     TYPE PART  STATUS  CONNID  CORRID        CLAIMINFO
    ------
    TSPEAK01 TS         RW
    TSPEAK50 TS         RW      SERVER  qmfwin.exe:C  (CS,C)
             GA015A2A.G7E3.3496F045BC03=78078 ACCESSING DATA FOR
             MEMBER NAME DBT2
    TSPEAKB1 TS         RW

32 Session: F6 Title: Put a smile on your DBAs face, Performance for the Developer Marc Costa & Rob Crane

Questions? Thanks for coming!

[email protected] [email protected]

Biography: Rob Crane Rob is a database enthusiast who began programming against DB2 in 1990. He has held several roles interacting with DB2: programming, modeling, system design, sysprog and database administrator. Gained DB2 experience working for State Farm, The Principal, Platinum/CA and FedEx. Rob is responsible for leading FedEx Freight’s database teams, supporting all aspects of database development. IBM Certified DBA for DB2 on z. Co-founder & chairman of www.coloradodb2rug.org. Fortunate to be selected as a user speaker for IDUG NA conferences 2000-2007. Received top 10 speaker honors in 2000, 2005 & 2006 and Best User Speaker in 2005 (North America & Europe). Member of the IDUG NACPC 2007 & 2008.

33 Existence Check | Fetch First 1 Row Only
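
A hedged sketch of the technique named in this slide title: an existence check that stops after the first qualifying row. The table comes from the earlier MOUNTAIN.PEAK examples; the host variables are assumptions for illustration.

```sql
-- Existence check: stop after the first qualifying row instead of counting
-- or fetching all of them. :HVFOUND and :HVSTATE are assumed host variables.
SELECT 1
  INTO :HVFOUND
  FROM MOUNTAIN.PEAK
 WHERE STATE_CD = :HVSTATE
 FETCH FIRST 1 ROW ONLY;
-- SQLCODE 0 means at least one row exists; SQLCODE +100 means none.
```

The program only needs to test the SQLCODE; the value returned in the host variable is irrelevant.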

Each version of DB2 makes improvements and pushes more logic into stage one processing. Regardless of DB2 version, select only the columns and rows needed. The SQL should do a considerable amount of filtering up front.

Start by writing efficient SQL! Don’t wait to see explain output; encourage SQL coding habits that exploit the optimizer’s strongest features. If you knew the winning lotto numbers, wouldn’t you play?

Assuming the statement is not able to use an index, there is still considerable savings from being stage one vs. stage two. Learn the tricks of the trade to convert stage two into stage one statements. Pay attention to predicate usage and classification. Pushing even one part of the query to stage one can help in reducing the complexity and amount of stage two filtering that has to occur.
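
As a small illustration of converting a stage two predicate into a stage one predicate, consider arithmetic on a column. This sketch assumes the MOUNTAIN.PEAK table with an integer ELEVATION column; PEAK_NAME is a hypothetical column. In V8, an expression applied to the column is typically evaluated at stage two, whereas comparing the bare column to a constant is stage one and indexable. Confirm the classification for your release with the predicate tables in the DB2 manuals.

```sql
-- Stage two: the arithmetic on ELEVATION must be evaluated for each row.
SELECT PEAK_NAME, ELEVATION
  FROM MOUNTAIN.PEAK
 WHERE ELEVATION / 1000 >= 14
  WITH UR;

-- Stage one and indexable: the same rows for an integer ELEVATION,
-- with the arithmetic moved off the column.
SELECT PEAK_NAME, ELEVATION
  FROM MOUNTAIN.PEAK
 WHERE ELEVATION >= 14000
  WITH UR;
```

The rewrite costs nothing in the application, yet it lets DB2 apply the filter earlier and, where an index on ELEVATION exists, use it for matching.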

34 Existence Check | Singleton Select

Sample DSN1PRNT output. Tied to SYSDBASE contention discussed earlier.

    PAGE: # 00000C18 ------
    DATA PAGE: PGCOMB='10'X PGLOGRBA='C00BA78984DD'X PGNUM='00000C18'X PGFLAGS='
    PGFREE='000D'X PGFREEP=8087 PGFREEP='1F97'X PGHOLE1='0000'X PGMA
    PGNANCH=139
    PGTAIL: PGIDFREE='00'X PGEND='N'
    ID-MAP FOLLOWS:
    01 0014 00D4 0192 02A0 0365 040E 04C3 056E
    09 0617 06BB 0763 080F 08C2 0975 0A29 0ACC
    11 0B71 0C1C 0CC4 0D6D 0E0F 0EBA 0F64 1015
    19 10C6 117D 1230 12E7 139E 1440 14EF 15A3
    21 1658 170E 17C0 1870 1922 19CF 1A81 1B43
    29 1BF5 1CA7 1D69 1E1A 1ECB

PGSLTH='00C0'X PGSOBD='0011'X PGSBID='07'X 39383220 00084146 ...... AFTSFBSH..KBG1982 ..AF 044E414E 80018001 WFMSYS.m....BP24 ..R..NAN.... 20202020 20208020 .... N..V.C.... . 20000101 01000000 ..KBG1982 .. !'..A...... 000080FF 2059C51D ...... "PAvCFE...... Y.. 0000 N0...... ,.y......

PGSLTH='00BE'X PGSOBD='0012'X PGSBID='07'X 57464D53 59530000 ...... AFTSFBSH..AFWFMSYS.. 42320008 44423243 .. ..FP.xI..PRODDB2..DB2C 80008019 20404040 ATP ...... F..N..0.... @@@ 20800000 A7800069 @....Y.T .. !'..A. @@@@@ ...... i 00000000 80000001 x...... I ..."PAvCF...... ,

PGSLTH='010E'X PGSOBD='0013'X PGSBID='07'X 000C7216 00085348 ...... &...6..r...SH 53595300 08414654 IPMENT..AFW T..AFWFMSYS..AFT 20202020 20202020 SFBSH.m.!......

35 Existence Check | COUNT(*)


36 Existence Check | Nested Loop Join


37 Existence Check | Correlated Sub Query
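To make the existence-check forms named in these slide titles concrete, here is a minimal sketch of each. The tables CUSTOMER and SHIPMENT, their columns, and the literal values are hypothetical; the statements illustrate the general techniques, not the exact SQL shown in the session. Which form is cheapest depends on the available indexes and the access path chosen, so compare them with EXPLAIN.

```sql
-- Hypothetical tables for illustration only:
--   CUSTOMER(CUST_NBR INTEGER), SHIPMENT(PRO_NBR CHAR(10), CUST_NBR INTEGER)

-- COUNT(*): answers "how many?", so every qualifying row or index
-- entry must be counted even though only "any?" was asked.
SELECT COUNT(*)
  FROM SHIPMENT
 WHERE CUST_NBR = 1001;

-- Singleton-style select: stops after the first qualifying row.
SELECT 1
  FROM SHIPMENT
 WHERE CUST_NBR = 1001
 FETCH FIRST 1 ROW ONLY;

-- Join: returns one row per matching shipment unless duplicates are
-- removed. The optimizer may or may not choose a nested loop join.
SELECT DISTINCT C.CUST_NBR
  FROM CUSTOMER C, SHIPMENT S
 WHERE C.CUST_NBR = 1001
   AND S.CUST_NBR = C.CUST_NBR;

-- Correlated subquery: EXISTS is satisfied by the first matching
-- shipment for each customer row examined.
SELECT C.CUST_NBR
  FROM CUSTOMER C
 WHERE C.CUST_NBR = 1001
   AND EXISTS (SELECT 1
                 FROM SHIPMENT S
                WHERE S.CUST_NBR = C.CUST_NBR);
```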


38 List Prefetch | SORTRID


39 Deprecate Obsolete Packages

⊕ You have identified several packages no longer needed by the business. How can you safely remove these packages while keeping a fallback that does not require code to be moved into production?

⊕ Verify your list of obsolete packages against your accounting data to ensure the packages were not active during the previous quarter.

⊕ REBIND DISABLE(BATCH,CICS,DB2CALL,DLIBATCH,IMSBMP,IMSMPP,REMOTE,RRSAF)
   → SYSPKSYSTEM, SYSPLSYSTEM

⊕ If an application receives a SQLCODE -807, simply rebind the package, enabling the environment it runs in: REBIND ENABLE(CICS) and you are off to the races. Expect SQLCODE -923 for plan based DBRMs.

DSNT408I SQLCODE = -807, ERROR: ACCESS DENIED: PACKAGE FMR7448 IS NOT ENABLED FOR ACCESS FROM BATCH
DSNT418I SQLSTATE = 23509 SQLSTATE RETURN CODE

DSNT408I SQLCODE = -923, ERROR: CONNECTION NOT ESTABLISHED: DB2 ACCESS, REASON 00E3001B, TYPE 00000800, NAME FMR7448
DSNT418I SQLSTATE = 57015 SQLSTATE RETURN CODE

⊕ Update your package dependency rebind queries to exclude packages not enabled for any environment by adding a NOT EXISTS subselect against SYSPKSYSTEM.

⊕ Free obsolete packages after they have been disabled for xx days.

⊕ Clean up explain table rows that are no longer required.
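The NOT EXISTS exclusion against SYSPKSYSTEM might look roughly like the sketch below. The collection name 'MYCOLL' is a hypothetical stand-in for whatever predicates your own dependency query already uses. The sketch also assumes that any package with a disabled-environment row (ENABLE = 'N') should be skipped, which is broader than "disabled for every environment"; tighten the subselect if your shop selectively disables environments for packages that are still live.

```sql
-- Sketch only: list packages eligible for rebind, skipping any package
-- that has been disabled for at least one environment.
SELECT A.COLLID, A.NAME, A.VERSION
  FROM SYSIBM.SYSPACKAGE A
 WHERE A.COLLID = 'MYCOLL'            -- hypothetical; use your own predicates
   AND NOT EXISTS
       (SELECT 1
          FROM SYSIBM.SYSPKSYSTEM S
         WHERE S.LOCATION = A.LOCATION
           AND S.COLLID   = A.COLLID
           AND S.NAME     = A.NAME
           AND S.CONTOKEN = A.CONTOKEN
           AND S.ENABLE   = 'N')      -- assumption: 'N' rows mark disabled environments
  WITH UR;
```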


The first two SELECTs (black on the slide) build the PLAN rebinds (note: no REMOTE for plans).
The last three SELECTs (blue on the slide) build the PACKAGE rebinds (note: includes the version).

SELECT DISTINCT 4, '99999', NAME, '00000',
       ' REBIND PLAN(' || STRIP(B.NAME) || ') EXPLAIN(YES) - '
  FROM SYSIBM.SYSPLAN B
 WHERE NAME IN ('CS0018', 'BENDEP')
UNION ALL
SELECT DISTINCT 5, '99999', NAME, '00000',
       ' DISABLE(BATCH,CICS,DB2CALL,DLIBATCH,IMSBMP,IMSMPP,RRSAF)'
  FROM SYSIBM.SYSPLAN B
 WHERE NAME IN ('CS0018', 'BENDEP')
UNION ALL
SELECT 1, A.NAME, A.COLLID, A.VERSION,
       ' REBIND PACKAGE(' || STRIP(A.COLLID) || '.' || STRIP(A.NAME) ||
       '.(' || STRIP(A.VERSION) || ')) - '
  FROM SYSIBM.SYSPACKAGE A
 WHERE NAME IN ('CS0018', 'BENDEP')
UNION ALL
SELECT 2, A.NAME, A.COLLID, A.VERSION,
       ' EXPLAIN(YES) - '
  FROM SYSIBM.SYSPACKAGE A
 WHERE NAME IN ('CS0018', 'BENDEP')
UNION ALL
SELECT 3, A.NAME, A.COLLID, A.VERSION,
       ' DISABLE(BATCH,CICS,DB2CALL,DLIBATCH,IMSBMP,IMSMPP,REMOTE,RRSAF)'
  FROM SYSIBM.SYSPACKAGE A
 WHERE NAME IN ('CS0018', 'BENDEP')
 ORDER BY 4, 2, 3, 1
  WITH UR;

40