Trace Records: The Answer to All DB2 for z/OS Performance Questions

Bart Steegmans IBM

Session Code: B7–B8 Tue Nov 15 2016 – Part 1: 15:10 - 16:10 - Part 2: 16:30 - 17:30 | Platform: DB2 for z/OS Please Note:

• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.

• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. Outline • DB2 Instrumentation Trace Facility Overview • IFCIDs • Trace destinations • Trace types • Which Data / Traces Should I Collect When ? • Traces you should have on all the time • Stats • Accounting • More detailed trace to zoom in on a problem case • Traces for specific problems (usage scenarios) • Locking problem • SQL performance problem • Long service time suspensions (EU-switch) Outline • DB2 Instrumentation Trace Facility Overview • IFCIDs • Trace destinations • Trace types • Which Data / Traces Should I Collect When ? • Traces you should have on all the time • Stats • Accounting • More detailed trace to zoom in on a problem case • Traces for specific problems (usage scenarios) • Locking problem • SQL performance problem • Long service time suspensions (EU-switch) Introducing the DB2 Instrumentation Facility • Aka DB2 tracing • Traces contain a wealth of information • DB2 traces are grouped in trace types • Trace types use different trace classes • Trace classes consist of one or more IFCIDs • IFCID = Instrumentation Facility Component ID • This is an individual trace record

Statistics Accounting Audit

1 3 4 5 6 7 8 9 1 2 3 5 11 7 8 10 Global

3 239 Monitor 1 2 105 106 202 225 Performance Introducing IFCIDs • IFCIDs can have their own trace record • Like most performance trace IFCIDs • IFCID 21 (lock detail) creates an IFCID 21 trace record • IFCIDs can add fields to other IFCIDs • Like accounting IFCIDs • IFCID 232 (class2 times) adds the class 2 information to the IFCID 3 record • Can do both • Like accounting IFCIDs when they are part of a performance trace • IFCID 226-227 (page latch start/end) adds this info to accounting class 3, • Can also produce performance trace records as part of performance trace class(4) • Same IFCID can be part of multiple trace types/classes • IFCID 172 (deadlock) is part of stats class(3) and performance class (6) • IFCIDs can be part of no trace class • Always explicitly specified like IFCID 198 (getpage activity) IFCIDs can be

• Individual records • Paired records (begin/end) • Different records (IFCID 226 (begin page latch wait) – IFCID 227 (end ) ) • Same record for end of many types of events (IFCID 58 – end SQL) • Certain IFCIDs are not written at all When are trace records written • Vast majority is written when event occurs . Actually most of the time when the event is over, and it passes the filter criteria • Sometimes timer driven • DB2 statistics info (STATIME), or fixed intervals (like 1 min. stats) • DB2 dataset level statistics (DSSTIME) indicates when dataset level stats are to be reset for online monitors • When a thread de-allocates, is reused, goes inactive, rollup threshold is reached , … • DB2 accounting • When a trace is started • Eg. DB2 stats record is written when stats trace is started • (or modified which is an internal stop/start trace) • Never written • Some trace records are never externalized • They serve as switches to add info to certain records, or are only written to an internal trace table that shows up in an SVC dump Trace commands

• Starting a DB2 trace • -START TRACE • Or automatic at DB2 startup for some trace types (STA, ACCTG, MON, AUDIT, GLOBAL(probably not a good idea) • Displaying a DB2 trace • -DISPLAY TRACE • Changing an existing trace • -MODIFY TRACE • Performs a STO/STA trace under the covers • Stop a trace • -STOP TRACE Starting a DB2 trace • Automatically at DB2 startup via DSNZPARMs • Not for all traces can be started that way • Stats - SMFSTAT – default is yes, meaning 1,3,4,5,6 • Acctg - SMFACCT – default is 1 • Audit - AUDITST – default is NO • Monitor - MON – default is NO • Global - TRACSTR – default is NO

• Via –START TRACE command • Next foil -START TRACE command • Many additional filtering options since V9 (more details later) • Can specify a partial name, ending with “*” • E.g. PLAN(BARTPL*) • PLAN(*BARTPL) does not work • Can specify “_” (underscore) for a single char in the name • Can also use eXclude qualifiers • Eg. -STA TRA(P) CLASS(1) IFCID(198) DEST(GTF) TDATA(CPU,COR) XPLAN(ABC*) -START TRACE command

-START TRACE ( ACCTG ) STAT PERFM MONITOR AUDIT GLOBAL DEST ( SMF, GTF, Opn ,OPX, SRV ) SCOPE( LOCAL ) GLOBAL CLASS (1,2,3,4 . . .) AUTHID PLAN PKGLOC PKGCOL PKGPROG LOCATION USERID APPNAME WRKSTN CONNID CORRID ROLE XAUTHID XPLAN XPKGLOC XPKGCOL XPKGPROG XLOC XUSERID XAPPNAME XWRKSTN XCONNID XCORRID XROLE AUDTPLCY( ...) TDATA (COR,TRA,CPU,DIST) RMID (Resource-Manager-ID) ASID(x’yyyy’) IFCID (63,305,...) BUFSIZE(OP-Buffer-Size) COMMENT ('...... ‘) Trace Filtering • DB2 9 introduced extensive enhancements to instrumentation filtering • Filters now include PLAN, LOCATION, AUTHID, USERID, APPNAME, WRKSTN, PKGPROG, PKGLOC, PKGCOL, CONNID, CORRID, ROLE and eXclude keywords for each (eg. XPLAN) • Terminating and positional wildcards are allowed (eg. PLAN(DSNTEP*) , PLAN(PLAN_01)) • More details coming up • Trace filtering is important to reduce the trace volume (and trace overhead) Trace Filtering Details • For each affirmative filter, multiple values are allowed. Logic between them is OR. For example: • -START TRACE(PERFM) CLASS(3) PLAN(A, B) • Will write performance trace records if the PLAN = A OR PLAN = B • Note: This will start effectively 2 traces so the total trace limit of 32 may be reached more rapidly. Consider the use of wildcards if possible. • For each exclude filter, multiple values are allowed. Logic between these is AND. For example: • -START TRACE(PERFM) CLASS(3) XPLAN(A,B) • Will write performance trace records if the PLAN is not A AND is NOT B Using the IFCID keyword • Is used to specify additional IFCIDs on top of the IFCIDs that are part of the trace class that you are starting • Eg. –STA TRA(P) C(1) IFCID(21) • This will trace IFCID 001,002, 031,042,043,076,077,078,079,102,103 105,107,106,153 (that make up perf class 1) and also IFCID 21 • Using –STA TRA(P) C(1) IFCID(42,43) will still trace all the IFCIDs that are part of perf class 1. Specifying IFCID(42,43) is not needed as they are already part of perf class 1. • If you only want to trace certain IFCIDs , use an “empty” trace class. That is a class that has no IFCIDs assigned to it by default. • For the performance trace, these are classes 29, 30, 31, 32 • Eg. To only trace IFCID 42,43 (system checkpoint sta/end) ,use - STA TRA(P) C(32) IFCID(42,43) Using the IFCID keyword ...

° For people who like the gory details ... ° If you specify the IFCID(xxx) keyword and IFCID xxx is already part of the traces in the CLASS, there is NO override of the default ‘routing’ information • Eg. –STA TRA(A) C(1,2,3) IFCID(226,227) DEST(SMF)

• IFCID 226,227 are part of acctg trace class 3, BUT they are NOT written to an external destination. They use an internal trace destination called SR1 • Specifying IFCID(226,227) DEST(SMF) does NOT override the ‘default’ • Result is no IFCID 226,227 records will be written to SMF • You can use –STA TRA(P) C(30) IFCID(226,227) DEST(SMF) to produce IFCID 226/227 trace records Using the SCOPE keyword

• Applies to data sharing only • SCOPE(LOCAL) • Is used to start a trace on the member that issues the –STA TRA command • Default • SCOPE(GROUP) • Will start this trace on ALL members of the data sharing group • The data goes to the DEST for each member • To collect trace info from all members in one place, use IFI READA or READS calls to collect the data • This info passed between members via NOTIFY msgs, so be careful with trace volume Using the TDATA keyword • Is used to specify which trace headers you want DB2 to use when writing the trace record • The default trace headers depend on the trace type • Best practice is to specify them to make sure you have the headers you want • Standard header • Cannot be specified – always present • See DSNDQWHS (in hlq.SDSNMACS) • Contains great stuff, such as • LUWID, DB2ID, DB2 version, STCK , IFCID#, … • QWHSISEQ - IFCID SEQUENCE NUMBER • QWHSWSEQ - DESTINATION SEQUENCE NUMBER • Latter two can be used to check whether trace records were lost Using the TDATA keyword ... • COR : correlation header • With good stuff such as connection type, correlation-id, authid, planname, workstation- name, … (not package name !) • see DSNDQWHC • DIST : distributed header • See DSNDQWHD • TRA : trace header • See DSNDQWHT • Mostly used for R14 info in combination with global trace • CPU : CPU header • See DSNDQWHU • CPU time of the currently dispatched execution unit (TCB or SRB) • Includes (‘normalized’) CPU consumed on an specialty engine • DB2 11 zIIP time in a separate counter • Important to specify for performance traces • Data sharing • Cannot be specified – always present for data sharing • Contains member and group name • See DSNDQWHA Displaying a DB2 trace

• Via –DIS TRACE command • Starting in DB2 9, DB2 also lists the individual IFCIDs that are specified on the IFCID() keyword Displaying a DB2 trace • IFCIDs and qualifications are displayed with -DIS TRA(*) DETAIL(*) DSNW127I -DB9A CURRENT TRACE ACTIVITY IS - TNO TYPE CLASS DEST QUAL IFCID 01 STAT 01,03,04,05, SMF NO 01 06 02 ACCTG 01,02,03,07, SMF NO 02 08 03 ACCTG 01 OP1 NO 04 PERFM 01 GTF YES 198 *********END OF DISPLAY TRACE SUMMARY DATA********* DSNW143I -DB9A CURRENT TRACE QUALIFICATIONS ARE - DSNW152I -DB9A BEGIN TNO 01 QUALIFICATIONS: NO QUALIFICATIONS END TNO 01 QUALIFICATIONS DSNW152I -DB9A BEGIN TNO 02 QUALIFICATIONS: NO QUALIFICATIONS END TNO 02 QUALIFICATIONS DSNW152I -DB9A BEGIN TNO 03 QUALIFICATIONS: NO QUALIFICATIONS END TNO 03 QUALIFICATIONS DSNW152I -DB9A BEGIN TNO 04 QUALIFICATIONS: EXCLUDE PLAN: ABC* END TNO 04 QUALIFICATIONS DSNW148I -DB9A ******END OF DISPLAY TRACE QUALIFICATION DATA****** DSN9022I -DB9A DSNWVCM1 '-DIS TRACE' NORMAL COMPLETION Modifying a DB2 trace

• Via –MODIFY TRACE command • Based on the trace number TNO (see –DIS TRA output) • Performs an internal –STO/-STA TRACE command • Can only change the CLASS and IFCID specification Stopping traces

• Via –STOP TRACE command • Constraint block allows to specify similar keywords than -STA TRA command • Best to qualify using TNO to avoid stopping the wrong trace Outline • DB2 Instrumentation Trace Facility Overview • IFCIDs • Trace destinations • Trace types • Which Data / Traces Should I Collect When ? • Traces you should have on all the time • Stats • Accounting • More detailed trace to zoom in on a problem case • Traces for specific problems (usage scenarios) • Locking problem • SQL performance problem • Long service time suspensions (EU-switch) Using the DEST keyword • SMF—System Management Facility • Typically used for ‘standard monitoring’, stats, acctg, and audit • GTF—Generalized Trace Facility • Typically used for ‘detailed monitoring’, performance traces • OPn / OPX—Destination for trace output • Used by monitor programs that issue IFI READA • [RES—Resident trace table ] • Used for global trace • Will show up in a dump (Size of trace table controlled by TRACTBL DSNZPARM ) • [SRV—Serviceability routine ] (never seen it used other than DB2 internally) • Always keep an eye out for: • DSNW133I csect-name TRACE DATA LOST, dest NOT ACCESSIBLE RC=code Using SMF as trace destination • Losing trace records is usually not a problem • Flooding could be a problem • As SMF is often used for charge back accounting • When using logstreams for SMF, they should be able to handle large trace volumes • Usually used for stats and accounting - Not often for performance traces • Rollup accounting (ACCUMACC) to reduce the volume • Use SMFCOMP=YES to compress trace data for SMF • SMF record types used by DB2 • SMF 100 = Mainly DB2 statistics (IFCID 1, 2, 202, 225, 230) • SMF 101 = DB2 accounting (IFCID 3, 239) • SMF 102 = Mainly DB2 performance (All other IFCIDs including IFCID 0 (global trace)) • SMF must be active and SMFPRMxx member allows 100-102 eg. SYS(TYPE(0:255),...) Using GTF as trace destination • Mainly used for performance traces • Losing trace records can be a problem • GTF particularities: • GTF data set is used in wrap-around • GTF data set cannot use extents • GTF can use multiple data sets in parallel • Sample JCL in appendix • GTF can use more buffers • Set up GTF to only trace DB2 events • To monitor via GTF use: • Start GTF proc • Start DB2 traces • Trace what is needed • Stop DB2 traces • Stop GTF proc Using OPX/OPx as trace destination

• You can specify the trace buffer you want to use, eg. DEST(OP1) • OP1 -> OP8 • Or let DB2 assign one to you that is not in use, DEST(OPX) • OPx is used by IFI READA requests DB2 • Can specify the size of the buffer IFC – BUFSIZE(xxx) (in KB from 256 - 65536 KB) OP1 OP2 … OP8 • Default is MONSIZE zparm setting . • Note that READS does not use User program or DSN1SDMP OPx buffers

Trace data file Tracing Details • Trace overhead consists of: • Collecting trace data • Externalizing trace data • When the IFCID is active, trace data is collected • Before writing the trace record, the filter criteria are checked • When the trace record passes the filter criteria it is written out to the trace destination • If not, the trace record is discarded • So a restrictive filter saves in output not in trace collection overhead • Filter criteria checked when record is written based on who is writing it • Prefetch I/O is not traced if PLAN(xxx) specified, as prefetch is done under a system thread, not under the thread running plan xxx • Using XPKGPROG(MYPACK) to avoid package info to be written for MYPACK only works if MYPACK is the active package when the record is written • Not effective to filter package acctg records (the package is typically not active when acctg is written) Tracing Details ... • Accounting is “special” in that starting additional trace classes don’t create extra trace records • They just add counters to existing records • The result of that is that if • -STA TRA(A) C(1) DEST(SMF) • -STA TRA(A) C(1,2,3) DEST(OPX) • The SMF records will also contain the class 2 and 3 data • The record is only built once in storage • DB2 instrumentation facility is usually not effective to trace events during startup and shutdown • DB2 Trace records are described in : • hlq.SDSNIVPD(DSNWMSGS) • DSNDQxxx macros of hlq.SDSNMACS (most up to date) Outline • DB2 Instrumentation Trace Facility Overview • IFCIDs • Trace destinations • Trace types • Which Data / Traces Should I Collect When ? • Traces you should have on all the time • Stats • Accounting • More detailed trace to zoom in on a problem case • Traces for specific problems (usage scenarios) • Locking problem • SQL performance problem • Long service time suspensions (EU-switch) DB2 Trace Types

• Statisics Trace • Accounting Trace • Audit Trace • Performance Trace • Monitor Trace • [Global Trace] Statistics Trace

• DB2 Statistics reports are typically based on IFCID 1, 2, 225. • Used as a prime indicator for subsystem-related problems • Contains (an awful lot of) information about • SQL usage (DML, DCL, DDL, direct row access) and DB2 commands • Stored proc, triggers, UDFs • Authorization management • EDM pool • Locking activity / Data-sharing locking • Subsystem services • Query parallelism • Open/Close activity • CPU times • Log activity • DB2 IFI requests, IFC, and data capture • Plan/package processing • DB2 latch counters • DB2 commands • Buffer pool and Group buffer pool activity • RID list processing • DDF activity • Dynamic statement cache • Storage statistics • Accelerator data • Workfile database • Aggregated accounting statistics Statistics Trace ... • Stats class 1 - IFCID 1, 2,105,106, 202, 225 • Since DB2 10 IFCID 1, 2, 202, (217), 225, and 230 no longer controlled by STATIME. Written at fixed, one-minute intervals. • Stats class 3 – IFCID 172 , 196, 250, 258, 261, 262, 313, 330, 335, 337 (deadlock, timeout, long running threads) • Stats class 4 – mainly DRDA exceptions and ASUTIME exceeded • Stats class 5 - Data sharing statistics • Stats class 7 – IFCID 365 (DDF location statistics) • Stats class 8 - IFCID 199 (dataset I/O statistics) • Can be very useful to identify ‘hot’ objects • Stats class 9 – IFCID 369 (Aggregated Accounting Statistics) • Only when accounting is also active • Summary info per connection type at 1 minute intervals • Piggybacks on accounting - very little additional overhead 34 Accounting Trace • Accounting reports are based on accounting records (IFCID 3, 239) • Contains local and distributed DB2 activity associated with a thread • Used as a prime indicator for thread-related problems • Class 1 data • Class 1 times • Accelerator info • SQL statements counters • Dynamic statement cache • RID list processing • Logging • Buffer pool and group buffer pool activity • Query parallelism • Regular and data sharing locking • Stored Procedures / triggers / UDFs • Claim and drain activity • Resource limit facility (RLF) • Distributed data facility (DDF) Accounting Trace ...

• Class 2 data • Adds class 2 times to (existing) record • Class 3 data • Activates all suspension counters and times (adds to existing record) • Class 5 data • Activates IFI counters

• Class 7 data • Package level info (similar to class 2 info but at package level) • Class 8 data • Package level wait counters • Class 10 data • Detailed package accounting info (SQL stmt by type, locking, BP) • [Class 11 • To ‘separate’ writing IFCID 3 from 239 (V11 only) • -STA TRA(A) C(11) DEST(SMF) if you never want package info written to SMF ] DB2 Accounting Times Terminology

ß Class 1 times ß Application time from connect to DB2 (thread creation) ß till disconnect (thread termination), reuse , inactive ß Both class 1 elapsed time and class 1 CPU time are reported PLAN ß Class 2 times ß Time spent within DB2 ß Both class 2 elapsed time and class 2 CPU time are reported ß Class 3 times ß Thread suspension time, e.g. for synchronous I/O

ß Class 5 times ß Time spent for IFI calls ß Both class 5 elapsed time and class 5 CPU time are reported

ß Class 7 times ß Like class 2, but at package/DBRM level PACKAGE ß Class 7 elapsed time and class 7 CPU time are reported ß Class 8 times ßLike class 3, but at package/DBRM level Accounting trace • Accounting trace records are cut for different reasons • When thread ends • When thread is reused • When connection goes inactive (DRDA) • When using rollup accounting for DRDA and RRS • Every ACCUMACC times you would have normally written an accounting record • If gets stale (not enough occurrences of the rollup “group”) • If rollup blocks are using too much storage • But IFCID 3 + 239 always cut together as part of accounting class 1 • QWACRINV indicates reason why acctg record was written Rollup accounting - ACCUMACC • Supported only for RRS and DRDA • Goodness • Dramatic reduction in the number of accounting records • Disadvantage • Loss of transaction granularity (complicates analysis of short hiccups) • Loss of certain info in the accounting record • No DDF info in V8/9 • All packages are rolled up into a single *rollup* package in V8/9 • Some of the acctg record ‘identification info’ like LUWID is from the last record that was rolled up • DB2 10 added DSNDQWAR - records begin/endtime/ACE of first 25 tran • Alternatives to handle large numbers of accounting records • Use SMFCOMP=YES ZPARM to compress records written to SMF • Little overhead (~1%), great compression ratio (~70-80%) • Use logstreams for SMF data (can handle large trace volumes) DB2 Times Terminology - Accounting Class 1,2,3

Class 1 Class 2 in Appl in DB2 IRLM 1st SQL.. (in DB2) (Creating Thread)

SQL ..

SQL ..

...Wait for lock

...Wait for I/O

...Commit

(Terminating Thread) Class 3 (susp time) Agent Agent, non-nested CPU Agent, non-nested ET Audit Trace • More popular than ever before • Some require AUDIT attribute at table to be turned on • A lot of changes here in V10 with the introduction of audit policies • Uses SMF 102 records when DEST(SMF) is used • Overhead typically less than 5% • Consider the frequency of certain events. For example, security violations are not as frequent as table accesses

Trace class Description IFCIDs 1 Access attempts denied due to inadequate authorization 140 2 Explicit GRANT and REVOKE 141 3 CREATE, ALTER, and DROP operations against audited tables 142 4 First change of audited object 143 5 First read of audited object 144 6 Bind time information about SQL statements that involve audited objects 145 7 Assignment or change of authorization ID 55,83,87,169,319 8 Utilities 23,24,25,219,220 10 Trusted context information 269,270 11 Audit administrative authorities 361 (*) Monitor Trace • Used by online monitors • READA • Using OPx trace buffers – that are then read by the monitor program • READS • Uses a return area (no OP buffer) to obtain realtime data • Request waits until the requested data is available • Some IFCIDs are only available via READS • IFCIDs 124, 129, 147, 148, 149, 150, 185, 187, 197, 306

Class Description of class Activated IFCIDs 1 Standard accounting data. 0200 2 Entry or exit from DB2 event signaling. 0232 3 DB2 wait times 0006-0009, 0032, 0033,0044, 0045, 0117, 0118, 0127, 0128, 0170, 0171, 0174, 0175, 0213,0 214, 0215, 0216, 0226, 0227, 0242, 0243, 0321, 0322, 0378, 0379, 0382, 0383 5 Time spent processing IFI requests. 0187 6 Changes to tables created with DATA CAPTURE CHANGES. 0185 7 Entry or exit from DB2 event signaling for package accounting. 0200, 0232, 0240

8 Wait time for a package 0006-0009, 0032, 0033, 0044, 0045, 0051, 0052, 0056, 0057, 0117, 0118, 0127, 0128, 0170,171,174, 175, 213-216, 226, 227, 239, 241-243, 321, 322, 0378, 0379, 0382, 0383 9 Enables statement level accounting. 0124 10 Package detail for buffer manager, lock manager and SQL statistics. 0239 Performance Trace • Great for problem identification • See DB2 command reference –STA TRA command Trace class Description IFCIDs 1 Background events 1,2,31,42,43,76-79,102,103,105-107,153 2 Subsystem events 3,68-75,80-89,106,174,175 22,53,55,58-66,92,95-97,106,112,173,177,233, 3 SQL events 237,272,273,325,341,343,350,360 6-10,29,30,105-107,127,128,226,227,321,322, 4 Reads to and writes from the buffer and EDM pools 357,358 5 Write to active archive log 32-41,104,106,114-120,228,229 6 Summary lock information 20,44,45,105-107,172,196,213,214,218,337 7 Detailed lock information 21,105-107,223 8 Data scanning detail 13-18,105-107,125,221,222,231,305,311,363 9 Sort detail 26-28,95-96,106 10 BIND, commands, and utilities detail 23-25,90,91,105-111,201,219,220,256,360 11 Execution unit switch and latch contentions 46-52,56,57,93,94,106,113 12 Storage manager 98-101,106 13 Edit and validation exits 11,12,19,105-107 14 Entry from and exit to an application 67,106,121,122 16 Distributed processing 157-163,167,183 17 Claim and drain information 211-216 18 Event-based console messages 197 19 Data set open/close 370,371 20 Data sharing coherency summary 249-251,256,257,261,262,267,268 21 Data sharing coherency detail 255,259,263,329 22 Authorization exit parameters 314 23 LE Runtime diagnosis 327 24 Stored Procedure detail 380,499 29-32 Available for local use None Performance Trace ...

• Performance overhead CLASS EVENTS OVERHEAD • 1 BACKGROUND EVENTS LOW It depends 2 SUBSYSTEM EVENTS LOW • Typically 20~100 % 3 SQL EVENTS MEDIUM 4 BUFFER MANAGER + EDM I/O HIGH • Perf class 1,2,3 5 LOG MANAGER I/O HIGH • Typically 5-30% 6 LOCKING SUMMARY MEDIUM 7 LOCKING DETAIL VERY HIGH • Tracing IFCID 198 8 DATA MANAGER DETAIL MEDIUM 9 SORT DETAIL HIGH • Every getpage / 10 UTILITIES, BINDS, COMMANDS LOW release page / 11 EUS, LATCH CONTENTION VERY HIGH set write 12 STORAGE MANAGER VERY HIGH 13 EDIT AND VALIDATION VERY HIGH • Very expensive 14 IN AND OUT OF DB HIGH • Trace volume 16 REMOTE LOCATION EVENTS MEDIUM • CPU overhead 17 CLAIM AND DRAIN LOW 19 PHYSICAL OPEN CLOSE LOW • Filter as much as 20 DATA SHARING SUMMARY LOW 21 DATA SHARING DETAIL HIGH possible 22 AUTHORIZATION EXIT PARAMETERS HIGH • Significantly reduces 24 STORED PROCEDURE DETAIL MEDIUM the overhead 30..32 None none DB2 Instrumentation Facility Interface - Summary

ßTrace types ßTrace headers (TDATA) ß Accounting ß CORrelation header ß Audit ß CPU header ß Performance ß DISTributed header ß Statistics ß TRAce header ß Monitor ßQualifications ß Global ß AUTHID ßTrace classes ß PLAN ß Multiple trace classes per trace type ß LOCATION ß IFCID as basic unit of reporting ß IFCID ß Instrumentation Facility Component ß Many more since V9 ID entifier ßTrace destinations ß SMF - Daily monitoring ß GTF - High volume ß OPx - Online monitoring ß (SRV - Exit to a user-written routine) Outline • DB2 Instrumentation Trace Facility Overview • IFCIDs • Trace destinations • Trace types • Which Data / Traces Should I Collect When ? • Traces you should have on all the time • Stats • Accounting • More detailed trace to zoom in on a problem case • Traces for specific problems (usage scenarios) • Locking problem • SQL performance problem • Long service time suspensions (EU-switch) Which Data / Traces to Collect and When ? • Start at subsystem level (stats) and zoom in from there (top-down)

Always active Statistics Subsystem

Minimum 1 hour during peak time Accounting Thread

Performance Statement Activate when needed • Standard tracing - active all the time • Statistics class 1,3,4,5,(7),(8), 9 • Accounting 1,(2),3,(7,8) ,(10) • If you cannot afford accounting 1,2,3 to be active all the time, turn it on for one hour during peak time every day Standard DB2 Traces - ‘Typical’ Overhead

• DB2 statistics • Negligible • DB2 accounting • CLASS(1,3) (together with STAT(1,3,4,(6)): 2-5% • CLASS(3): Typically less than 1% overhead • Could be higher when system shows very high DB2 internal latch contention • CLASS(2): 1-10% (Higher for applications that go a lot back and forth between application and DB2, eg. FETCH intensive applications) • CLASS(7,8) (Package accounting): Less than 5% • CLASS(10) (Package detail since PK28561): higher CPU overhead Outline • DB2 Instrumentation Trace Facility Overview • IFCIDs • Trace destinations • Trace types • Which Data / Traces Should I Collect When ? • Traces you should have on all the time • Stats • Accounting • More detailed trace to zoom in on a problem case • Traces for specific problems (usage scenarios) • Locking problem • SQL performance problem • Long service time suspensions (EU-switch) Tracing for Specific Problems • SQL runtime behavior does not seem to match the PLAN_TABLE • Gather detailed statement level execution info • (IRLM) Locking problems • DB2 page latch contention • DB2 internal latch contention • Long service task wait times (aka EU switch) Trace info at the SQL Statement level

• When DB2 accounting (plan and package level) and DB2 statistics info are not enough to identify the problem • Zoom in at the SQL statement level, via : • Performance trace • Dynamic Statement Cache (DSC) runtime statistics • Static SQL statement level statistics via IFI 401 Obtain Detailed Runtime Execution Info • Use DB2 Performance Traces • Works for static and dynamic SQL • -STA TRA(P) C(1,2,3) DEST(xxx) • Filter as much as possible (PLAN, AUTHID ...) to reduce the overhead • If you suspect a sort related problem, add CLASS 9 • Be careful: It may seem that sort IFCID 95-96 took a long time but the time can also include the time to process the object itself DB2 kicks off the sort IFCID 95 (sort start) at beginning of the stmt, before there are rows to sort – then starts accessing the objects and as rows qualify they are handed to sort – only when sort ends IFCID 96 is created • If need data manager detail, add CLASS 8 • Provides the order in which objects are accessed • Provides high level info about activity against each object • If need I/O details add, CLASS 4 • Be careful with filtering for async read (and write) I/Os • If you need detailed getpage info, add IFCID(198) • Be careful, the trace volume becomes huge very quickly ! Obtain Detailed Runtime Execution Info … • Use DB2 Performance Traces ... • Verify that runtime access path (IFCID 22) is still the same than what you got in the PLAN_TABLE • Verify that the sequence in which objects are accessed matches with perf class 8 info • To find CPU time used by an SQL stmt you need the CPU header • Subtract total CPU at stmt start from total CPU at stmt end • Most monitor programs will do this for you • CPU info also available in IFCID 58 when IFCID400 is active • Sometimes the problem is related to the value of the input host variables at runtime • IFCID 247 (*) can be used to trace those • If you want to see the values that are being returned to the application (output host variables) • IFCID 248 (*) can be used to trace those (*) Typically high volume/high overhead so be careful Obtain Detailed Runtime Execution Info … • Obtain statement level runtime info for dynamic SQL • IFI READS 316, 317, (318) for DSC • -STA TRA(P) C(30) IFCID(316,317,318) • IFCID 316 - Info about statements in the dynamic statement cache (DSC) • IFCID 317 – Complete SQL statement text • IFCID 318 – (Total) runtime execution information at the SQL stmt level • EXPLAIN STMTCACHE ALL and IFCID318 • Similar info to IFI READS – but via a simple SQL statement • -STA TRA(P) C(30) IFCID(318) is enough • Stopping IFCID 318 will reset the execution stats So issue EXPLAIN STMTCACHE ALL prior to stopping the trace • Need appropriate authority to use EXPLAIN STMTCACHE ALL • SQLADM authority / System DBADM authority / SYSADM authority • Populates user.DSN_STATEMENT_CACHE_TABLE • DDL in *.SDSNSAMP(DSNTESC) • Can use Data Studio to capture this info Using Data Studio to capture DSC Analyzing Dynamic SQL using the DSC Stats

• Statistics counters in the DSC are cumulative (totals) since IFCID 318 was activated • Are zeroed when IFCID 318 is stopped so capture DSC before stopping the trace • Make sure that no other trace with IFCID 318 is already running • In that case nothing happens if an additional trace for IFCID 318 is started and the STAT_TS will reflect the old TS (when the first trace with IFCID 318 was initially started) • EXPLAIN STMTCACHE ALL only shows what is CURRENTLY in the cache • (Important) statements may have LRU-ed out • In V10 IFCID 316 can be written to SMF/GTF when a statement is removed from the cache (eg. –STA TRA(P) C(30) IFCID(316,317,318) DEST(SMF) ) • Use EXPLAIN STMTCACHE STMTID xxx to capture the access path into in the PLAN_TABLE (and other access path related tables)

• Find SQL statements that: • Use most CPU/ET/.. in total • Use most CPU/ET/.. on average • Automate and execute on a daily basis during peak workload times Obtain Detailed Runtime Execution Info

• Obtain statement level runtime info for static SQL • IFI READS 401 (400) • Similar info as dynamic SQL stmts from DSC • ONLY available via IFI program (not via an SQL stmt ) • Perf trace records (like IFCID 58) will contain extra info if IFCID 400 is active • Stmt level info at end of the statement (or close cursor for cursor based SQL)

• IFCID 401 and 316 are available • via IFI READS program calls • Can be written to a trace destination (like SMF) but only when a stmt is evicted from the DSC or EDM pool • Eg. Invalidated in DSC because a RUNSTATS was run DSN_STATEMENT_CACHE_TABLE info

• General statement information • STMT_ID - Statement ID - this value is the EDM unique token for the statement. Unique for a member. Can change if stmt is removed from the cache and later re- inserted. • STMT_TOKEN - Statement token you can provide an identification string on prepare • COLLID - The collection ID: DSNDYNAMICSQLCACHE • PROGRAM_NAME - The name of the package that performed the initial PREPARE for the statement • LINES - The precompiler line number from the initial PREPARE of the statement • PRIMAUTH - The primary authorization ID that did the initial PREPARE of the statement • CURSQLID -The CURRENT SQLID that did the initial PREPARE of the statement • SCHEMA - The value of the CURRENT SCHEMA special register • BIND_QUALIFIER - The BIND qualifier. For unqualified table names, this is the object qualifier • BIND_ISO - The value of the ISOLATION BIND option • BIND_CDATA - The value of the CURRENTDATA BIND option • BIND_DYNRL - The value of the DYNAMICRULES BIND option • BIND_DEGRE - The value of the CURRENT DEGREE special register DSN_STATEMENT_CACHE_TABLE info ... • BIND_SQLRL - The value of the CURRENT RULES special register • BIND_CHOLD - The value of the WITH HOLD attribute of the PREPARE for this stmt • BIND_RO_TYPE - The current specification of the REOPT option for the statement

• LITERAL_REPL - Identifies cached statements where the literal values are replaced by the '&' symbol • BIND_RA_TOT - The total number of REBIND commands that have been issued for the dynamic statement because of the REOPT(AUTO) option • GROUP_MEMBER - The member name of the DB2 that executed EXPLAIN • EXPANSION_REASON - Applies to only statements that reference archive tables or temporal tables • STMT_TEXT - The statement that is being explained (in CLOB). • INV_DROPALT - Whether the statement has been invalidated by a DROP or ALTER • INV_REVOKE - Whether the statement has been invalidated by a REVOKE • INV_LRU - Whether the statement has been removed from the cache by LRU • INV_RUNSTATS- Whether the statement has been invalidated by RUNSTATS • USERS - The number of current users of the statement that have prepared or run the statement during their current unit of work • COPIES - The number of copies of the statement that are owned by all threads in the system DSN_STATEMENT_CACHE_TABLE info ... • Timestamps • CACHED_TS - When the statement was stored in the dynamic statement cache • STAT_TS - Timestamp when IFCID 318 is started • EXPLAIN_TS - The timestamp for when the statement cache table is populated. • Statistical information • STAT_EXECB - The number of times this statement has been run. For a statement with a cursor, this is the number of OPENs • STAT_ELAP - The accumulated elapsed time that is used for the statement • STAT_CPU - The accumulated CPU time that is used for the statement

• STAT_SUS_SYNIO - The accumulated wait time for synchronous I/O operations • STAT_SUS_LOCK - The accumulated wait time for lock requests • STAT_SUS_SWIT - The accumulated wait time for synchronous execution unit switch • STAT_SUS_GLCK - The accumulated wait time for global locks • STAT_SUS_OTHR - The accumulated wait time for read activity that is done by another thread • STAT_SUS_OTHW - The accumulated wait time for write activity done by another thread DSN_STATEMENT_CACHE_TABLE info

• STAT_SUS_LATCH - The accumulated wait time for latch requests • STAT_SUS_PLATCH - The accumulated wait time for page latch requests • STAT_SUS_DRAIN - The accumulated wait time for drain lock requests • STAT_SUS_CLAIM - The accumulated wait time for claim count requests • STAT_SUS_LOG - Accumulated wait time for log writer requests • STAT_GPAGB - The number of getpage operations that are performed • STAT_SYNRB - The number of synchronous buffer reads that are performed • STAT_WRITB - The number of buffer write operations that are performed • STAT_EROWB - The number of rows that are examined • STAT_PROWB - The number of rows that are processed • STAT_SORTB -The number of sorts that are performed • STAT_INDXB - The number of index scans that are performed • STAT_RSCNB - The number of table space scans that are performed • STAT_PGRPB -The number of parallel groups that are created • STAT_RIDLIMTB - The number of times a RID list was not used because the number of RIDs would have exceeded DB2 limits • STAT_RIDSTORB - The number of times a RID list was not used because there is not enough storage available to hold the list of RIDs • Added global lock wait fields in DB2 12 (similar to fields in accounting records) Tracing for Specific Problems • SQL runtime behavior does not seem to match the plan_table • Gather detailed statement level execution info • (IRLM) Locking problems • DB2 page latch contention • DB2 internal latch contention • Long service task wait times (aka EU switch) (IRLM) Locking Problems • Deadlock and timeouts • Stats class 3 info (IFCID 172, 196) - should always be active • Better than syslog msgs as they give (all) waiters/holders • STMT_ID info added in V10 is very valuable • To see the lock suspensions (‘wait for locks’) • Use performance trace class 6 (IFCID 44-45) • To see who is holding the lock • That requires performance trace class 7 (IFCID 21) • To see which statement triggered the lock request • Performance trace class 3 is the easiest (avoids having to specify the individual IFCIDs for all individual types of SQL statements) • In DB2 10 you also have the STMT_ID to identify the SQL stmts involved in deadlock and timeout • If you know the problem is related to page set or page P-lock requests or negotiation • Limit the tracing to IFCID 251 and 259 (IRLM) Locking Problems ... • -START TRACE(P) CLASS(1,2,3,6,7,17) IFCID(226,227) TDATA(CPU,COR,TRA) DEST(GTF) PLAN(xx) AUTHID(xx) PKGPROG(xx) • PC01 – includes IFCIDs for EOT and EOM conditions • PC02 – includes IFCIDs related to thd create/end, commit, rollback • PC03 – incudes IFCIDs related to SQL stmt execution • PC06 – incudes IFCIDs related to lock suspensions • PC07 – included IFCIDs related to lock/unlock/change requests • PC17 – includes IFCIDs related to claim/drain processing • IFCID 226,227 are to track BM page latch suspensions • As they often go hand in hand with locking problems • TDATA(CPU..) – to include the CPU header • DEST(GTF) – write to GTF if you we expect high trace volumes • PLAN, AUTHID, PKGPROG – filter as much as possible to reduce the trace volume Tracing for Specific Problems • SQL runtime behavior does not seem to match the plan_table • Gather detailed statement level execution info • (IRLM) Locking problems • DB2 page latch contention • DB2 internal latch contention • Long service task wait times (aka EU switch) DB2 Buffer Manager Page Latch Contention • Page latches are managed by BM, not IRLM • Before BM can do anything with a page it has to be latched (S/X) • Shows up in accounting class 3 wait time (ET and #suspensions) • Often a symptom of something else • One thread waiting for something (eg. Write I/O to complete), all others waiting for a page latch • Usually the page latch is obtained first (so held by the 1 st thd that is waiting for something else, and all other threads are waiting for the page latch) • Use IFCID 226, 227 for detail analysis • Will provide dbid/obid and page number • No IFCID for page latch requests, only for page latch suspension/resume • -START TRACE(P) CLASS(30) IFCID(226,227) TDATA(CPU,COR,TRA) DEST(GTF) PLAN(xx) AUTHID(xx) DB2 Buffer Manager Page Latch Contention ... • Often observed during high insert activity • Space map (SM) page • Partitioning (allows more concurrent insert into different parts) • MEMBER CLUSTER • TRACKMOD NO (instead of the default YES) • Data page • Randomize inserts • Index page • V9 RANDOM instead of ASC/DESC ‹ Trade-off with higher data getpage, I/O, lock requests • V9 index page size: smaller if random key insert, bigger if sequential key insert • When observed on the workfile database (typically SM page) • Increase # workfile table spaces • When the result of sort activity, is it possible to add an index to avoid sort • Also consider • Higher VDWT to possibly reduce write frequency • Faster active log and database write I/O device Tracing for Specific Problems • SQL runtime behavior does not seem to match the plan_table • Gather detailed statement level execution info • (IRLM) Locking problems • DB2 page latch contention • DB2 internal latch contention • Long service task wait times (aka EU switch) DB2 Internal Latch Contention • Another serialization mechanism used by DB2 • ‘Cheap’ mechanism to quickly serialize on ‘something’ • Short code path (no call to IRLM) • ‘Something‘ is just an address in memory, so can be anything • DB2 internal latches are grouped in classes (32+1) • Information about DB2 internal latch contention • DB2 statistics record (every minute) • DB2 accounting record (per thread) , suspension time and #suspensions • If stats and accounting is not enough • Latch class details in IFCID 51/52 (Share) and 56/57 (Exclusive) • No IFCID for individual latch requests, only latch suspensions • -START TRACE(P) CLASS(30) IFCID(56,57) TDATA(CPU,COR,TRA) DEST(GTF) PLAN(xx) AUTHID(xx) • Typically tracing the x-latch suspensions is enough to find the problem latch • High volume high overhead trace (trace for a short time, use filtering if possible) • Disabling accounting class 3 trace can help reduce CPU time when high latch contention is observed DB2 Internal Latch Contention ...

• In Statistics reports LATCH CNT /SECOND /SECOND /SECOND /SECOND ------LC01-LC04 0.00 0.00 0.00 0.00 LC05-LC08 0.00 75.62 0.00 0.01 LC09-LC12 0.00 0.79 0.00 1.25 LC13-LC16 0.01 676.17 0.00 0.00 LC17-LC20 0.00 0.00 105.58 0.00 LC21-LC24 0.08 0.00 6.01 4327.87 LC25-LC28 4.18 0.00 0.02 0.00 LC29-LC32 0.00 0.20 0.57 25.46 LC254 0.00

ROT Try to keep latch contention rate < 1K-10K per second • Typical high latch contention classes highlighted (more in appendix) • LC06 = Index split latch (in data sharing) • LC14 = Buffer pool LRU and hash chain latch • LC19 = Log latch • LC24 = Prefetch latch (or EDM LRU chain latch) • In Accounting reports

CLASS 3 SUSPENSIONS ELAPSED TIME EVENTS TIME/EVENT ------LOCK/LATCH(DB2+IRLM) 4:58:57.6319 43 417.15423 IRLM LOCK+LATCH 9.285339 21 0.442159 DB2 LATCH 4:58:48.3466 22 814.92484 Tracing for Specific Problems • SQL runtime behavior does not seem to match the plan_table • Gather detailed statement level execution info • (IRLM) Locking problems • DB2 page latch contention • DB2 internal latch contention • Long service task wait times (aka EU switch) Long Service Task Switches • Execution Unit switch aka service task switch • Used when DB2 has to perform a task that cannot run under the current Execution Unit (EU) • Opening or closing a dataset is a good example. Datasets are allocated by the DBM1 address space, not the user address space. Therefore DB2 switches to run on a TCB in the DBM1 asid to open/close datasets. • Accounting data contains info about how long and how often a tran has to wait for an EU switch to complete CLASS 3 SUSPENSIONS ELAPSED TIME EVENTS ------...... SER.TASK SWTCH 27.363426 3 UPDATE COMMIT 0.000828 1 OPEN/CLOSE 27.362598 2 SYSLGRNG REC 0.000000 0 EXT/DEL/DEF 0.000000 0 OTHER SERVICE 0.000000 0

• CPU used by the EU is normally charged to a DB2 system address space (MSTR, DBM1, DIST) Long Service Task Switches … • If more details are required • -STA TRACE(P) C(11) ... • But that also traces internal DB2 latch suspensions (can be high volume) • Better to use a more specific trace request –STA TRACE(P) C(30) IFCID(46,47,48,49,50) ... • If you only want to know which EU switch took place, you can trace IFCID 170/171 • However that only traces the ‘outer’ EU switch • If an EU invokes another EU switch IFCID 170/171 does not show that How to Find Which Service Task is Invoked

• Performance trace with IFCID 46 thru IFCID 50 • IFCID 46 for the service task switch • 47 and 48 for Begin and End of SRB service task • 49 and 50 for Begin and End of TCB service task • Shows all nested service tasks

• Or use IFCID 170 and 171 for less output • Shows one (outermost) service task for each event in class 3 accounting • EU switches are identified by the DB2 resource manager (RMID) and the function code (FC) that they invoke • A long time gap between IFCID 46 and 47 (begin SRB task) or 49 (begin TCB task) can indicate that DB2 is waiting for a service task to become available Commonly Observed Service Tasks

• Commonly observed service tasks and RMID/FC identifiers as shown in IFCID 46, 47, 49, and 170 • Update Commit x’03’/x’49’

• Dataset Open x’0A’/x’59’ • Dataset Close x’0A’/x’4F’

• SYSLGRNX Update x’15’/x’44’

• Dataset Extend x’12’/x’65’ • Dataset Delete x’12’/x’6A’ • Dataset Define x’12’/x’68’ • Dataset Reset x’12’/x’6C’ • Dataset Count x’12’/x’88’ Commonly Observed “OTHER” Service Tasks

• DB2 for z/OS application requestor using TCP/IP waiting for server response x’1B’/x’8F’

• Parallel query cleanup x’14’/x’77’ or x’84’ • Shows up in repeated execution of short-running queries in parallelism • Use higher SPRMPTH ZPARM value to replace this (default=120(ms)) • Of course, for a long-running query, parallel query with degree ANY can result in many times reduction in elapsed time

• VSAM catalog update x’0A’/x’94’

• Peer BSDS read x'04‘/x'41‘ • For programs reading the log via IFI 306 in a data sharing environment (eg. the capture task of replication products) Outline • DB2 Instrumentation Trace Facility Overview • IFCIDs • Trace destinations • Trace types • Which Data / Traces Should I Collect When ? • Traces you should have on all the time • Stats • Accounting • More detailed trace to zoom in on a problem case • Traces for specific problems (usage scenarios) • Locking problem • SQL performance problem • Long service time suspensions (EU-switch) Trace Usage Scenarios • Using DB2 Accounting data to zoom in on a locking problem • Long (drain) lock suspension time • Using DB2 Performance trace to zoom in on an SQL problem • Runtime execution is not as expected • Using DB2 Performance trace to zoom in on long service tasks waits #1 Using Accounting info for a locking problem • Accounting trace contains summarized information about locking • The types of lock/latch resource (suspension time and #suspensions) • DB2 internal latch • IRLM lock + IRLM latch • Page latch • Drain/claim • Global lock – P-lock and L-lock • The numbers to differ the suspend type • Local lock/latch suspend • IRLM lock or IRLM latch • Global lock suspend • False contention • XES contention • Real contention • Heuristic conversion: SYNC to ASYNC Use Accounting Trace to Identify a Lock Issue

• The numbers: • IRLM lock + latch : 3 • Claim release: 0 • DB2 latch: 12 • Page latch: 0 • Drain lock: 1 • Global contention: 1

CLASS 3 SUSPENSIONS ELAPSED TIME EVENTS ------LOCK/LATCH(DB2+IRLM) 0.002913 15 IRLM LOCK+LATCH 0.038008 3 DB2 LATCH 0.002913 12 DRAIN LOCK 6.955973 1 CLAIM RELEASE 0.000000 0 PAGE LATCH 0.000000 0 GLOBAL CONTENTION 0.518333 1 Use Accounting Trace to Identify a Lock Issue

LOCKING TOTAL • 3 IRLM lock + latch ------• (Real) lock suspend: 0 TIMEOUTS 0 DEADLOCKS 0 • IRLM latch suspend: 3 LOCK REQUEST 136 • UNLOCK REQST 2 3 IRLM latch suspensions QUERY REQST 0 took 38 ms CHANGE REQST 0 OTHER REQST 0 TOTAL SUSPENSIONS 3 LOCK SUSPENS 0 IRLM LATCH SUSPENS 3 OTHER SUSPENS 0 Use Accounting Trace to Identify a Lock Issue • What is the global contention for ? • Parent L-lock • 1 parent L-lock suspension lasted 518ms

GLOBAL CONTENTION L-LOCKS ELAPSED TIME EVENTS ------L-LOCKS 0.518333 1 PARENT (DB,TS,TAB,PART) 0.518333 1 CHILD (PAGE,ROW) 0.000000 0 OTHER 0.000000 0 GLOBAL CONTENTION P-LOCKS ELAPSED TIME EVENTS ------P-LOCKS 0.000000 0 PAGESET/PARTITION 0.000000 0 PAGE 0.000000 0 OTHER 0.000000 0 Use Accounting Trace to Identify a Lock Issue

• Is the 518 ms parent L-lock contention real IRLM global lock contention? • IRLM global contention: 0 DATA SHARING TOTAL • XES contention: 0 ------LOCK - XES 22 • UNLOCK-XES 1 Heuristic conversion: 0 CHANGE-XES 0 • SUSP - IRLM 0 False contention: 2 SUSP - XES 0 CONV - XES 0 • It’s false contention! FALSE CONT 2 INCOMP.LOCK 0 • 1 is the parent L-lock NOTIFY SENT 0 • 2 false contention? • What’s the other one? Use Accounting Trace to Identify a Lock Issue • Where is the drain/claim counted, local or global? • Local if it is local contention • OTHER SUSPENS 0 • Global if it is global contention • FALSE CONT 2 • It is also a false contention !

CLASS 3 SUSPENSIONS ELAPSED TIME EVENTS ------LOCK/LATCH(DB2+IRLM) 0.002913 15 IRLM LOCK+LATCH 0.038008 3 DB2 LATCH 0.002913 12 DRAIN LOCK 6.955973 1 CLAIM RELEASE 0.000000 0 PAGE LATCH 0.000000 0 GLOBAL CONTENTION 0.518333 1 Start a Locking Trace To Verify

• To double check that it is really false contention, collect a performance trace • -START TRACE(P) CLASS(3,6,7,17) IFCID(226,227) TDATA(CPU,COR,TRA) DEST(GTF) PLAN(xx) AUTHID(xx) PKGPROG(xx) • Class 3 – SQL activity trace • Class 6 – Lock summary (including suspensions) • Class 7 – Lock detail • Class 17 – Claim and drain • IFCID 226/227 – Page latch contention As lock contentions often go hand in hand with page latch contention, it is usually a good idea to trace those as well Start a Locking Trace To Verify

• Locking trace confirms it is false contention

Lock suspend: 11:00:37.16219897 Lock resume : 11:00:44.11817197 Lock type: DRAIN CS Lock resource: DB =NIGHTDB OB =SSSINCT EVENT SPECIFIC DATA ------SUSP.TIME =6.955973 LOCAL CONTENTION=N DURATION =MANUAL LATCH CONTENTION=N STATE =S IRLM QUEUED REQ =N RESUME RSN=NORMAL INTER-SYSTEM =Y* NOTIFY MSG SENT =N* FALSE CONTENTION What’s Next?

• False contention time is expected to be very short. 6.9 seconds is too long, then what’s next? • Check DB2 statistics trace for more clues • IRLM CPU was very high at the problem time • Check RMF CF and XCF report

250 200 Good period 150 100 Bad period 50 0 MSTR DBM1 IRLM DIST 1-minute interval statistics trace – address space CPU Trace Usage Scenarios • Using DB2 Accounting data to zoom in on a locking problem • Long (drain) lock suspension time • Using DB2 Performance trace to zoom in on an SQL problem • Runtime execution is not as expected • Using DB2 Performance trace to zoom in on long service tasks waits #2 Runtime Execution is Not as Expected • Access path looks ok in the PLAN_TABLE • Runtime behavior is not ok • Use DB2 accounting info (plan and/or package level ) as a starting point • What is not as expected ? • Elapsed time too high • Class 2 time (in DB2 or not) • Class 3 time • CPU time too high (in DB2 or not) • Locking behavior • #SQL calls • (High) BP activity • Zoom in to the statement level • Detailed DB2 performance trace • READS IFCID 401(static), 316-318 (dynamic) via MON trace class 29 • EXPLAIN STMTCACHE ALL Analyzing an SQL Performance Problem • Accounting data is always a good starting point • Accounting class 1,2,3,7,8,10 was active • Use package level info to demonstrate ------Overview by OMPE |PROGRAM NAME CLASS 7 ELAPSED TIME CONSUMERS | |DSNREXX |> 1% | Easy to find high |GLWJEMP1 |=> 2% | |GLWJEMP2 | | CL7 ET or CPU |DPTDEL |==> 5% | |DPTSEL | | |EMPDEL |=> 2% | |EMPFND | | |EMPQR2 |> 1% | |EMPSEL |======> 80% |

EMPSEL VALUE EMPSEL TIMES/OCCURR TIMES/PKG ALLOC ------TYPE PACKAGE ELAP-CL7 TIME-AVG 48.198704 0.016906 CP CPU TIME 26.644116 0.009346 LOCATION DB1B AGENT 26.644116 0.009346 CLASS 7 COLLECTION ID GLWSAMP PAR.TASKS 0.000000 0 PROGRAM NAME EMPSEL SE CPU TIME 0.000000 0 info SUSPENSION-CL8 6.340073 0.002224 ACTIVITY TYPE NATIVE SQL PROC AGENT 6.340073 0.002224 ACTIVITY NAME EMPSEL PAR.TASKS 0.000000 0 SCHEMA NAME GLWSAMP NOT ACCOUNTED 15.214516 0.005337 SUCC AUTH CHECK 2 AVG.DB2 ENTRY/EXIT 5702.00 2 OCCURRENCES 2 DB2 ENTRY/EXIT 11404 4 NBR OF ALLOCATIONS 5702 SQL STMT - AVERAGE 11404.00 CP CPU SU 2186183.50 766.812 SQL STMT - TOTAL 22808 AGENT 2186183.50 766.812 NBR RLUP THREADS 2 PAR.TASKS 0.00 0 SE CPU SU 0.00 0

manually added for easy reading Analyzing an SQL Performance Problem • Package class 8 and class 10 info • (Averages are per package occurrence in the acctg records)

EMPSEL AVERAGE TIME AVG.EV TIME/EVENT ------CLASS 8 info LOCK/LATCH 6.278443 49.1K 0.000128 IRLM LOCK+LATCH 4.656214 32.00 0.145507 DB2 LATCH 1.622229 49.0K 0.000033 SYNCHRONOUS I/O 0.012326 57.50 0.000214 OTHER READ I/O 0.048752 92.50 0.000527 OTHER WRITE I/O 0.000000 0.00 N/C SERV.TASK SWITCH 0.000546 1.00 0.000546 ARCH.LOG(QUIESCE) 0.000000 0.00 N/C ARCHIVE LOG READ 0.000000 0.00 N/C DRAIN LOCK 0.000000 0.00 N/C CLAIM RELEASE 0.000000 0.00 N/C PAGE LATCH 0.000006 2.00 0.000003 NOTIFY MESSAGES 0.000000 0.00 N/C GLOBAL CONTENTION 0.000000 0.00 N/C CLASS 10 info TCP/IP LOB XML 0.000000 0.00 N/C ACCELERATOR 0.000000 0.00 N/C PQ SYNCHRONIZATION 0.000000 0.00 N/C TOTAL CL8 SUSPENS. 6.340073 49.2K 0.000129

EMPSEL AVERAGE TOTAL EMPSEL AVERAGE TOTAL EMPSEL AVERAGE TOTAL ------SELECT 8553.00 17106 BPOOL HIT RATIO (%) 100.00 N/A TIMEOUTS 0.00 0 INSERT 0.00 0 GETPAGES 51432.8K 102866K DEADLOCKS 0.50 1 UPDATE 0.00 0 BUFFER UPDATES 0.00 0 ESCAL.(SHARED) 0.00 0 DELETE 0.00 0 SYNCHRONOUS WRITE 0.00 0 ESCAL.(EXCLUS) 0.00 0 SYNCHRONOUS READ 57.50 115 MAX PG/ROW LOCKS HELD 67.50 135 DESCRIBE 0.00 0 SEQ. PREFETCH REQS 0.00 0 LOCK REQUEST 650.00 1300 PREPARE 0.00 0 LIST PREFETCH REQS 0.00 0 UNLOCK REQUEST 30.50 61 OPEN 0.00 0 DYN. PREFETCH REQS 4127.7K 8255484 QUERY REQUEST 0.00 0 FETCH 0.00 0 PAGES READ ASYNCHR. 1308.50 2617 CHANGE REQUEST 0.00 0 CLOSE 0.00 0 OTHER REQUEST 0.00 0 TOTAL SUSPENSIONS 32.00 64 LOCK TABLE 0.00 0 LOCK SUSPENSIONS 31.00 62 CALL 0.00 0 IRLM LATCH SUSPENS. 1.00 2 Analyzing an SQL Performance Problem • SQL Activity report BY STMTID • STMTID is a better identifier than the statement number

EVENT COUNT TOT.ELAPS TOTAL TCB DETAIL AET/EVENT TCB/EVENT ------PACKAGE DB1B.GLWSAMP.EMPSEL.X'1977A1EC125C0A1C' V1 ACQUIRE(USE) REOPT(N) RELEASE(COMMIT) ISO(CS) DYNAMICRULES(RUN) PREPARE(NODEFER) KEEPDYNAMIC(NO) PROTOCOL(DRDA) OPTHINT(N/P)

6977 5702 0.082504 0.030489 SELECT 0.000014 0.000005 .... 6979 5702 0.080681 0.036136 SELECT IFCID 401 type info also 0.000014 0.000006 .... present in IFCID 58 6980 265 0.003914 0.001883 SELECT 0.000015 0.000007 .... 6982 5435 1:34.075850 53.127033 SELECT 0.017309 0.009775 STMT ID : 6982 STMT TYPE : STATIC SORTS : 0 GET PAGES : 102819543 PARALLEL GRP CREATES: 0 SYNC BUFF READS : 98 BUFFER WRITES : 0 INDEX SCANS : 5435 TABLESPACE SCANS : 0 ROWS EXAMINED : 114931333 ROWS PROCESSED : 0 RID-LIMIT EXC. : 0 RID-NO STORAGE : 0 IN-DB2 ELAPSED : 1:34.026607 IN-DB2 CPU : 53.109795 Avg 18900 getpages GLOBAL LOCK : 0.000000 DRAIN LOCK : 0.000000 LOCK/LATCH : 7.362120 LATCH : 3.244350 per execution ! PAGE LATCH : 0.000012 CLAIM COUNT : 0.000000 SYNCHRON. I/O : 0.020706 UNIT SWITCH : 0.000000 READ-OTH. THREAD: 0.074695 WRITE-OTH. THREAD : 0.000000 LOG WRITER : 0.000000 Analyzing an SQL Performance Problem • IFCID 58 (END SQL) of an individual execution

BART DB2CALL CF8F311F7790 BART DB2CALL GLWRUN1 BART GLWRUN1 DB2CALL 20:03:02.97629991 243037 1 58 END SQL <-- NETWORKID: USIBMSC LUNAME: SCPD1B1 LUWSEQ: 21 DSNREXX 'BLANK' 0.22242555 |------|LOCATION NAME : DB1B |PKG COLLECTION ID : GLWSAMP |PROGRAM NAME : EMPSEL |CONSISTENCY TOKEN : X'1977A1EC125C0A1C' |STATEMENT NUMBER : 1 QUERY COMMAND ID : X'0000000000000000' QUERY INSTANCE ID: X'0000000000000000' |SQL REQUEST TYPE : 17 EXPANSION REASON : NO EXPANSION | SQLCODE : 0 SQLSTATE: 00000 |SQLCAID: SQLCA SQLCABC 136 SQLERRP : DSN SQLEXT : 00000 |SQLERRD1 0 SQLERRD2 0 SQLERRD3 0 SQLWARN0: SQLWARN1: SQLWARN2: SQLWARN3: |SQLERRD4 -1 SQLERRD5 0 SQLERRD6 0 SQLWARN4: SQLWARN5: SQLWARN6: SQLWARN7: | SQLWARN8: SQLWARN9: SQLWARNA: |SQLERRM: |...... |DATA TYPE INDX ROW PROC 83 ROW EXAM 83 STG1-QUAL 1 STG2-QUAL 1 ROW INSRT 0 |ROW UPDTE 0 ROW DELET 0 PAGES 84 RI SCAN 0 RI DELET 0 |LOB SCAN 0 LOB UPDTE 0 |DATA TYPE SEQD ROW PROC 25992 ROW EXAM 25992 STG1-QUAL 0 STG2-QUAL 0 ROW INSRT 0 |ROW UPDTE 0 ROW DELET 0 PAGES 23360 RI SCAN 0 RI DELET 0 |LOB SCAN 0 LOB UPDTE 0 |...... |TYPE OF STATEMENT : STATIC STATEMENT IDENTIFIER : 6982 |NBR OF SYNC READS : 0 NBR OF GETPAGES : 23444 |NBR OF ROWS EXAMINED : 25992 NBR OF ROWS PROCESSED : 0 |NBR OF SORTS : 0 NBR OF INDEX SCANS : 1 |NBR OF TABLESPACE SCANS : 0 NBR OF BUFFER WRITES : 0 |NBR OF PAR. GRPS CREATED: 0 | |ACCUMULATED TIME VALUES: |IN-DB2 ELAPSED : 0.013705 IN-DB2 CPU : 0.012275 |WAIT FOR SYNC I/O : N/P WAIT FOR LOCK/LATCH : N/P |SYNC EXEC UNIT SWITCH : N/P WT FOR GLOBAL LOCKS : N/P |WT FOR READ BY OTHER THR: N/P WT FOR WRTE BY OTHER THR: N/P | |NBR OF TIMES RID LIST NOT USED, BECAUSE: |NBR EXCEEDS DB2 LIMITS : 0 NOT ENOUGH STORAGE : 0 Analyzing an SQL Performance Problem • Statement Type = STATIC • We should be able to find it in the catalog

SELECT SUBSTR(COLLID,1,18) AS COLL_ID , SUBSTR(NAME,1,8) AS PACKAGE_ID , QUERYNO , STMT_ID , HEX(STMT_ID) AS HEX_STMT_ID , STATEMENT FROM SYSIBM.SYSPACKSTMT WHERE STMT_ID = 6982 ;

------+------+------+------+------+------+------+------+------COLL_ID PACKAGE_ID QUERYNO STMT_ID HEX_STMT_ID ------+------+------+------+------+------+------+------+------GLWSAMP EMPSEL 2610 6982 0000000000001B46

-+------+------+------+------+------+------+------+------+------+-- STATEMENT -+------+------+------+------+------+------+------+------+------+-- SELECT MIN(EMP_NO) INTO :H:H FROM GLWTEMP WHERE EMP_NO >= :H:H AND MANAGER = :H:H QUERYNO 2610 Analyzing an SQL Performance Problem • Check the PLAN_TABLE (or use EXPLAIN(ONLY) to extract the info • Use the QUERYNO to find the stmt

-----+------+----+------+------+------+------+------+------+------+------+----+------QUERYNO QBLOCKNO PROGNAME PLANNO METHOD CREATOR TNAME ACCESSTYPE MATCHCOLS ACCESSCREATOR ACCESSNAME -----+------+----+------+------+------+------+------+------+------+------+----+------2610 1 EMPSEL 1 0 GLWSAMP GLWTEMP I 1 GLWSAMP GLWXEMP3

----+------+------+------+-----+------+----+------INDEXONLY PREFETCH COLUMN_FN_EVAL SCAN_DIRECTION SORT(all) ----+------+------+------+-----+------+----+------N R F N

-- Database=GLWSAMP -- Index=GLWSAMP.GLWXEMP3 On GLWSAMP.GLWTEMP Access Path does ------not look too bad CREATE UNIQUE INDEX GLWSAMP.GLWXEMP3 ON GLWSAMP.GLWTEMP (EMP_NO ASC) USING STOGROUP GLWG01 PRIQTY 300 SECQTY 100 ERASE NO FREEPAGE 0 PCTFREE 10 GBPCACHE CHANGED BUFFERPOOL BP16 CLOSE YES COPY NO PIECESIZE 2 G; -- Analyzing an SQL Performance Problem • What if we added another index ? • Create index – Runstats – Rebind explain(yes)

-- Database=GLWSAMP -- Index=GLWSAMP.GLWXEMP4 On GLWSAMP.GLWTEMP ------IX is used -- CREATE UNIQUE INDEX GLWSAMP.GLWXEMP4 MC=1 but now IXONLY ON GLWSAMP.GLWTEMP (EMP_NO DESC, MANAGER ASC) USING STOGROUP GLWG01 PRIQTY 300 SECQTY 100 ERASE NO FREEPAGE 0 PCTFREE 10 GBPCACHE CHANGED BUFFERPOOL BP16 CLOSE YES COPY NO PIECESIZE 2 G; --

-----+------+----+------+------+------+------+------+------+------+------+----+------QUERYNO QBLOCKNO PROGNAME PLANNO METHOD CREATOR TNAME ACCESSTYPE MATCHCOLS ACCESSCREATOR ACCESSNAME -----+------+----+------+------+------+------+------+------+------+------+----+------2610 1 EMPSEL 1 0 GLWSAMP GLWTEMP I 1 GLWSAMP GLWXEMP4

----+------+------+------+-----+------+----+------INDEXONLY PREFETCH COLUMN_FN_EVAL SCAN_DIRECTION SORT(all) ----+------+------+------+-----+------+----+------Y R N 96 Analyzing an SQL Performance Problem • Run again – Did it help ? • CPU time /stmt exectution 0.009775 -> 0.000007 ( almost 1400x less) • 36570 executions (vs 5435 before) in the same time interval • Complete workload consists of many different tran • Overall workload throughput increased 7 times by just adding this index

EVENT COUNT TOT.ELAPS TOTAL TCB DETAIL AET/EVENT TCB/EVENT ------6982 36570 0.466894 0.241616 SELECT 0.000013 0.000007 STMT ID : 6982 STMT TYPE : STATIC SORTS : 0 GET PAGES : 73899 PARALLEL GRP CREATES: 0 SYNC BUFF READS : 0 BUFFER WRITES : 0 INDEX SCANS : 36570 TABLESPACE SCANS : 0 ROWS EXAMINED : 0 ROWS PROCESSED : 0 RID-LIMIT EXC. : 0 RID-NO STORAGE : 0 IN-DB2 ELAPSED : 0.216235 IN-DB2 CPU : 0.143823 GLOBAL LOCK : 0.000000 DRAIN LOCK : 0.000000 LOCK/LATCH : 0.000000 LATCH : 0.000023 Avg 2 getpages PAGE LATCH : 0.000000 CLAIM COUNT : 0.000000 SYNCHRON. I/O : 0.000000 UNIT SWITCH : 0.000000 per execution READ-OTH. THREAD: 0.000000 WRITE-OTH. THREAD : 0.000000 LOG WRITER : 0.000000 Trace Usage Scenarios • Using DB2 Accounting data to zoom in on a locking problem • Long (drain) lock suspension time • Using DB2 Performance trace to zoom in on an SQL problem • Runtime execution is not as expected • Using DB2 Performance trace to zoom in on long service tasks waits #3 Analyzing Long Service Task Switches • Accounting data is always a good starting point • Long open/close time observed

TIMES/EVENTS APPL(CL.1) DB2 (CL.2) IFI (CL.5) CLASS 3 SUSPENSIONS ELAPSED TIME EVENTS ------ELAPSED TIME 1:55.93590 27.496244 N/P LOCK/LATCH(DB2+IRLM) 0.000000 0 NONNESTED 1:55.93590 27.496244 N/A IRLM LOCK+LATCH 0.000000 0 STORED PROC 0.000000 0.000000 N/A DB2 LATCH 0.000000 0 UDF 0.000000 0.000000 N/A SYNCHRON. I/O 0.106740 72 TRIGGER 0.000000 0.000000 N/A DATABASE I/O 0.106740 72 LOG WRITE I/O 0.000000 0 CP CPU TIME 0.031744 0.028250 N/P OTHER READ I/O 0.000171 1 AGENT 0.031744 0.028250 N/A OTHER WRTE I/O 0.000000 0 NONNESTED 0.031744 0.028250 N/P SER.TASK SWTCH 27.363426 3 STORED PRC 0.000000 0.000000 N/A UPDATE COMMIT 0.000828 1 UDF 0.000000 0.000000 N/A OPEN/CLOSE 27.362598 2 TRIGGER 0.000000 0.000000 N/A SYSLGRNG REC 0.000000 0 PAR.TASKS 0.000000 0.000000 N/A EXT/DEL/DEF 0.000000 0 ... OTHER SERVICE 0.000000 0 SUSPEND TIME 0.000000 27.471342 N/A ARC.LOG(QUIES) 0.000000 0 AGENT N/A 27.471342 N/A LOG READ 0.000000 0 PAR.TASKS N/A 0.000000 N/A DRAIN LOCK 0.000000 0 STORED PROC 0.000000 N/A N/A CLAIM RELEASE 0.000000 0 UDF 0.000000 N/A N/A PAGE LATCH 0.000000 0 NOTIFY MSGS 0.000000 0 NOT ACCOUNT. N/A N/C N/A GLOBAL CONTENTION 0.001006 3 DB2 ENT/EXIT N/A 70 N/A COMMIT PH1 WRITE I/O 0.000000 0 EN/EX-STPROC N/A 0 N/A ASYNCH CF REQUESTS 0.000000 0 EN/EX-UDF N/A 0 N/A TCP/IP LOB XML 0.000000 0 DCAPT.DESCR. N/A N/A N/P ACCELERATOR 0.000000 0 LOG EXTRACT. N/A N/A N/P AUTONOMOUS PROCEDURE 0.000000 0 PQ SYNCHRONIZATION 0.000000 0 TOTAL CLASS 3 27.471342 79 Analyzing Long Service Task Switches • EU-switches • Used when DB2 has to perform a task that cannot run under the current Execution Unit (EU) • OPEN/CLOSE is a good example when a EU-switch is needed • TRA(P) CLASS(19) to trace physical open/close (IFCID 370/371) • IFCID 105/107 does NOT trace physical open • IFCID370/371 distinguishes between • Allocation time – basically z/OS part of the open (like SVC99) • Open time – basically Media Manager part of the open processing • Both are ok here – now what ?

03:26:17.58460764 972270 1 370 OPEN DATA SET N/P INFORMATION ------DATABASE OPEN INFORMATION DATA SET NAME : RMTH.DSNDBC.THVCDM00.IXVC0562.I0001.A002 FLAGS : X'C7' ACE ADDRESS : X'1E3983A0' DATABASE ID : 670 OBID : 1772 PART NUMBER : X'00000002' INSTANCE NUMBER : X'00000001' DSMAX : 70000 OPENED DATA SETS : 40126 ALLOCATION TIME : 0.020264 OPEN TIME : 0.020337 ------Check the Complete EU Switch Processing • Maybe we are out of service tasks to perform the open • Trace the EU switch activity • Can be done as part of TRACE(P) C(11) • But that also traces internal DB2 latch suspensions (can be high volume) • Better to trace as –STA TRACE(P) C(30) IFCID(46,47,48,49,50) • If you only want to know which EU switch trace IFCID170/171 • Nope – start running on the OPEN TCB right away RECORD TIME DESTNO ACE IFC DESCRIPTION DATA TCB CPU TIME ID ------03:25:50.24408312 965630 1 46 SERVICE RECORD ENTRVI8F N/P NETWORKID: RDNET LUNAME: TD2DB2AT LUWSEQ: 2 QW0046AC X'1E3983A0' QW0046FC X'0059' QW0046ID X'0A' 03:25:50.24415357 965633 1 49 SERVICE RECORD NETWORKID: RDNET LUNAME: TD2DB2AT LUWSEQ: 2 N/P QW0049ID X'0A' QW0049FC X'0059' ... 03:26:17.58460764 972270 1 370 OPEN DATA SET ENTRVI8F N/P INFORMATION NETWORKID: RDNET LUNAME: TD2DB2AT LUWSEQ: 2 |------| DATABASE OPEN INFORMATION |DATA SET NAME : RMTH.DSNDBC.THVCDM00.IXVC0562.I0001.A002 FLAGS : X'C7' | ..... (see previous slide) |------03:26:17.58475006 972271 1 50 SERVICE RECORD ENTRVI8F N/P NETWORKID: RDNET LUNAME: TD2DB2AT LUWSEQ: 2 Check the z/OS Side • Dataset open also needs to access the ICF catalog • RMF ENQ postprocessor report (needs SMF type 77) • REPORTS(ENQ(DETAIL)) • Waiting to access the ICF catalog (UCAT DEVXR3 contains our dsn)

E N Q U E U E A C T I V I T Y z/OS V2R1 SYSTEM ID RD01 DATE 07/29/2015 INTERVAL 14.59.959 CONVERTED TO z/OS V2R2 RMF TIME 03.15.00 CYCLE 1.000 SECONDS

ENQUEUE DETAIL ACTIVITY GRS MODE: STAR -NAME------CONTENTION TIME ------JOBS AT MAXIMUM CONTENTION-- ... MAJOR MIN MAX TOT AVG ----- OWN ------WAIT ---- ... MINOR TOT NAME TOT NAME ... SYSNAME SYSNAME ... SYSIGGV2 CATALOG. DEVXR3 (SYSTEMS) 0.000 62.617 178.15 0.019 1 CATALOG (S) 59 CATALOG (E) ... RD01 RD02 CATALOG (S) RD01

ENQUEUE DETAIL ACTIVITY -NAME- ...-----%QLEN DISTRIBUTION- AVG Q -REQUEST TYPE - --- CONTENTION --- MAJOR ... MIN 1 2 3 4+ LNGTH -EXCL- -SHARE- EVENT --STAT CHNG- MINOR ... MIN MAX MIN MAX TOTAL TOTAL %NODET

SYSIGGV2 CATALOG.DEVXE3 (SYSTEMS) ... 57.7 25.5 9.9 6.8 2.17 0 7 0 59 9159 N/A N/A Thank You Questions Notices and Disclaimers

Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.

Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law. Notices and Disclaimers (con’t)

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

• IBM, the IBM logo, .com, ®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, ®, StoredIQ, Tealeaf®, Tivoli®, ®, Unica®, urban{code}®, , WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml . APPENDIX Using GTF as Trace Destination • Sample GTF PROC with multiple (“striping”) datasets

//GTFDB2 PROC MEMBER=GTFDB2 //IEFPROC EXEC PGM=AHLGTF, // PARM='MODE=EXT,DEBUG=NO,TIME=YES,BLOK=1M,SIZE=8M,NOPROMPT', // REGION=0M //GTFOUTxx DD DSNAME=myprefix.GTFTRAxx,UNIT=SYSDA, // VOL=SER=xxxxxx, // DISP=(NEW,KEEP),SPACE=(CYL,(1000)),DCB=NCP=255 //* //* repeated up to 15 times for all the other trace data sets !! //* //SYSLIB DD DSNAME=SYS1.PARMLIB(&MEMBER),DISP=SHR • SYS1.PARMLIB(GTFDB2) contains

TRACE=USRP USR=(FB9) <-- this is to limit the input only from DB2 END Using GTF as Trace Destination • GTF “striped” data MUST be merged (via IPCS) before usage

//GTFMERGE JOB (XXX),'BART.STEEGMANS', // MSGLEVEL=(1,1), // NOTIFY=&SYSUID,MSGCLASS=H, // TIME=1440,REGION=256M //IPCS EXEC PGM=IKJEFT01,DYNAMNBR=20,REGION=1500K //IPCSDDIR DD DSN=myprefix.IPCSDDIR.DD,DISP=OLD //TRACE1 DD DISP=SHR,DSN=myprefix.GTF01 //TRACE2 DD DISP=SHR,DSN=myprefix.GTF02 //TRACEALL DD DISP=(NEW,CATLG), // DSN=myprefix.GTF.MERGE, // DCB=(myprefix.GTF01), // UNIT=(SYSDA,15),SPACE=(CYL,(500,500),RLSE) //IPCSPRNT DD SYSOUT=* //SYSTSPRT DD SYSOUT=* //SYSTSIN DD * IPCS NOPARM SETDEF NOCONFIRM PRINT NOTERM DROPD RECORDS(ALL) DDNAME(TRACE1) DROPD RECORDS(ALL) DDNAME(TRACE2) COPYTRC TYPE(GTF) + INFILE(TRACE1,TRACE2) + OUTFILE(TRACEALL) SETDEF CONFIRM NOPRINT TERM END DB2 Internal Latch Contention – Common LC

• LC06 for index tree P-lock during index split • ‘Real’ Latch Class (LC) X’46’ in IFCID 57 performance trace • Index split is particularly painful in data sharing • Results in two forced physical log writes (one in DB2 11) • Index split time can be significantly reduced by using faster active log device • Options to reduce index split • Index free space tuning for random insert • Minimum index key size especially if unique index • NOT PADDED index for large varchar columns (V8) • Large index page size (V9) • Asymmetric leaf-page split (V9) • Indication of number of index splits in LEAFNEAR/FAR in SYSINDEXPART and RTS REORGLEAFNEAR/FAR • V10: New IFCID 359 to monitor Index splits DB2 Internal Latch Contention ...

• LC14 Buffer Manager latch • If many table spaces and indexes, assign to separate buffer pools with an even getpage frequency • If objects bigger than buffer pool, try enlarging buffer pool if possible • If high LC14 contention, use buffer pool with at least 4000 buffers • Use PGSTEAL=FIFO or NONE rather than LRU buffer steal algorithm if there is no read I/O, i.e. object(s) entirely in buffer pool • LRU = Least Recently Used buffer steal algorithm (default) • FIFO = First In First Out buffer steal algorithm • Eliminates the need to maintain LRU chain which in turn • Reduces CPU time for LRU chain maintenance • Reduces CPU time for LC14 contention processing • V10: NONE = Buffer pool used for "in-memory" objects • Data is preloaded into the buffer pool at the time the object is physically opened and remains resident • Eliminates the need to maintain LRU chain DB2 Internal Latch Contention ...

• LC19 Log latch • Minimise #log records created via • LOAD RESUME/REPLACE with LOG NO instead of massive INSERT/UPDATE/DELETE • Segmented or UTS table space if mass delete occurs • Increase size of output log buffer if non-zero unavailable count • When unavailable, first agent waits for log write • All subsequent agents wait for LC19 • Reduce size of output log buffer if non-zero output log buffer paging • See Log Activity section in the DB2 statistics report • Reduced LC19 Log latch contention in DB2 9 • Log latch not held while spinning for unique LRSN • Further improvements in DB2 10 • Log latch time held minimized • Conditional attempts to get log latch before unconditional request DB2 Internal Latch Contention ...

• LC24 latch • Used for multiple latch contention types • (1) EDM LRU latch – LC X’18’ in IFCID 57 performance trace • Thread reuse with RELEASE DEALLOCATE instead of RELEASE COMMIT for frequently executed packages • V10: Moving CT,PT from EDM pool to thread pools reduced LC24 significantly • (2) Prefetch scheduling – LC X’38’ in IFCID 57 performance trace • Higher contention possible with many concurrent prefetches • Disable dynamic prefetch if unnecessary by setting VPSEQT=0 (V9) • V10: PGSTEAL NONE for in memory data/index BP • Use more partitions (one x’38’ latch per data set) Bart Steegmans IBM Please fill out your session [email protected] evaluation before leaving! Sessions: B7 + B8 Title: Trace Records: The Answer to All DB2 for z/OS Performance Questions