
Using the EXPLAIN Feature to Improve DB2 Query Performance

Frank Solcntan, Blue Cross Blue Shield of CT, North Haven, CT 06473

ABSTRACT

This paper discusses the use of the SQL EXPLAIN feature of DB2 to assist in understanding how DB2 handles an SQL query request and how to look for tuning opportunities for better query performance. It explains how DB2 processes an SQL request, how to use the SQL EXPLAIN statement with the SAS SQL PASS-THROUGH Facility, how to obtain the EXPLAIN results, and how to interpret what those results mean. In addition, this paper presents some helpful ideas for optimizing queries to save system overhead.

INTRODUCTION

To acquire the skills to enhance SQL query performance it is important to understand the underlying component structures that make up the DB2 DBMS. DB2 is a Database Management System comprising two dimensional tables and indexes that form a basic relational structure. SQL provides the means for data access and retrieval from DB2 tables. Internally, DB2 contains tables that form a catalog that stores information about all DB2 objects. An intelligent component of DB2, the optimizer, determines the probable access path of a single SQL statement to the data, based on information contained in the DB2 catalog. The optimizer is an inference engine that makes all DB2 data access decisions. The optimizer's decision can only be indirectly influenced through coding strategies in the SQL code. The SQL EXPLAIN is used to request that DB2 explain the optimizer's access path decision. Access path performance may be affected by:

  - Number of rows to be accessed
  - Distribution of data
  - Access strategy
  - Sequencing of data
  - Condition of the databases
  - Requested approach to the data

Table INDEXES enhance access path performance by:

  - Providing pointer controlled access to data
  - Avoiding scanning of all table data
  - Avoiding sorts
  - Avoiding access to the table data entirely

How does DB2 process an SQL query?

The SQL statement that is presented to DB2, whether dynamically (SAS, QMF) or through the BIND process (application programs), must move through many internal DB2 components to retrieve physical table data. The following gives a flow and description of the DB2 components used to satisfy an SQL query request:

SQL Parser
  - Checks for proper SQL syntax

Optimizer
  - Confirms the user's authority
  - Decomposes the query into query blocks (a query block represents a section of an SQL statement, e.g., a base SELECT, a subquery, a UNION)
  - Merges the query with any DB2 table views being accessed in the query
  - Obtains catalog statistics for the DB2 objects accessed
  - Chooses an access path:
      - Sequences query blocks
      - Computes row filter factors
      - Computes access path costs
      - Determines whether to use an index
      - Determines indexes to use
      - Chooses least costly path
      - Creates the access plan

Code Generator
  - Converts the access plan to object machine code
  - Stores the access plan as an Application Plan for statically bound applications

Executor
  - Executes stored Application Plans to access data
  - Executes dynamic (ad hoc) queries one time only

Data Retrieval

  Data Manager (DM)
    - Stage 1 row filter, called sargable
    - Evaluates predicates (from the WHERE clause) as the data is retrieved from physical storage (resource efficient)
    - Uses the Buffer Manager to access the tables
    - Used by the Relational Data System (RDS)

  Relational Data System (RDS)
    - Stage 2 row filter, called non-sargable
    - Interface for queries (creates the results table)
    - Makes calls to the Data Manager
    - Evaluates predicates (from the WHERE clause) after the data has been retrieved into the buffer (resource intensive)

  Buffer Manager
    - Works like a scratch pad area for DB2
    - Interfaces between the VSAM Media Manager and the Data Manager

  VSAM Media Manager

    PROGNAME        CHAR(8)      NOT NULL,

    SORTN_UNIQ      CHAR(1)      NOT NULL,
    SORTN_JOIN      CHAR(1)      NOT NULL,
    SORTN_ORDERBY   CHAR(1)      NOT NULL,
    SORTN_GROUPBY   CHAR(1)      NOT NULL,
    SORTC_UNIQ      CHAR(1)      NOT NULL,
    SORTC_JOIN      CHAR(1)      NOT NULL,
    SORTC_ORDERBY   CHAR(1)      NOT NULL,
    SORTC_GROUPBY   CHAR(1)      NOT NULL,
    TSLOCKMODE      CHAR(3)      NOT NULL,
    TIMESTAMP       CHAR(16)     NOT NULL,
    REMARKS         VARCHAR(254) NOT NULL,
    PREFETCH        CHAR(1)      NOT NULL WITH DEFAULT,
    COLUMN_FN_EVAL  CHAR(1)      NOT NULL WITH DEFAULT,
    MIXOPSEQ        SMALLINT     NOT NULL WITH DEFAULT,
    VERSION         VARCHAR(64)  NOT NULL WITH DEFAULT,
    COLLID          CHAR(18)     NOT NULL WITH DEFAULT)
    IN DBNAME.TSNAME

The following is a simple coding sample to illustrate issuing an SQL EXPLAIN for a query SELECT using the SQL PASS-THROUGH Facility:

    OPTIONS LINESIZE=250 PAGESIZE=60;
    PROC SQL;
    SELECT *
    FROM
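The sample above breaks off before the EXPLAIN request itself. As a rough sketch only, the Python helper below assembles the text of one possible pass-through request: the EXPLAIN ALL SET QUERYNO = n FOR prefix placed directly in front of the SELECT, wrapped in an EXECUTE statement, and followed by a query that reads the explained rows back from the PLAN_TABLE. The function name, the SSID value DSN, and the table and column names in the usage lines are hypothetical and are not taken from the paper; the original sample may well have been laid out differently.

```python
def build_explain_request(queryno, select_sql, ssid="DSN"):
    """Return SAS PROC SQL pass-through text that EXPLAINs one SELECT.

    ssid is a hypothetical DB2 subsystem id; substitute the site's own value.
    """
    select_sql = select_sql.strip().rstrip(";")
    if not select_sql.upper().startswith("SELECT"):
        raise ValueError("this sketch only handles SELECT statements")
    # The EXPLAIN prefix goes directly in front of the SELECT being studied.
    explain = f"EXPLAIN ALL SET QUERYNO = {int(queryno)} FOR {select_sql}"
    return "\n".join([
        "PROC SQL;",
        f"  CONNECT TO DB2 (SSID={ssid});",
        # EXECUTE passes DBMS-specific SQL such as EXPLAIN through to DB2.
        f"  EXECUTE ({explain}) BY DB2;",
        # Read the explained rows back, in step order.
        "  SELECT * FROM CONNECTION TO DB2",
        f"    (SELECT * FROM PLAN_TABLE WHERE QUERYNO = {int(queryno)}",
        "     ORDER BY QBLOCKNO, PLANNO, MIXOPSEQ);",
        "  DISCONNECT FROM DB2;",
        "QUIT;",
    ])


# Hypothetical table and column; the "?" is a parameter marker.
request = build_explain_request(
    1000, "SELECT CLM_ID FROM PROD.CLAIMS WHERE CLM_ID = ?"
)
print(request)
```

Generating the text from a helper keeps the QUERYNO used in the EXPLAIN and the QUERYNO used to read the PLAN_TABLE in step, which is the detail most easily mistyped by hand.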

To explain a query, the only element that is required is that the statement EXPLAIN ALL SET QUERYNO = n FOR must precede the SELECT clause. To issue an EXPLAIN, the EXECUTE statement (which enables DBMS specific SQL) must be used or an error will occur. The EXECUTE EXPLAIN statement can be commented out when ready to actually execute the query. When executing these statements, DB2 will insert rows representing the EXPLAIN data in the PLAN_TABLE. The query explain rows will be identified by the number that is used in the SET QUERYNO = n, so that this particular query can be easily located when querying the PLAN_TABLE. Notice that "?" (parameter markers) are used in place of actual data. Using actual data may cause a different access path than using "?" due to differing filter factors. Use "?" when the data is unknown or will vary, to obtain the DB2 default filter factor. Use actual data if it is known or static, to obtain the actual filter factor results.

A filter factor of a WHERE predicate is a percentage that estimates how many rows are rejected by, and how many rows are not rejected by, the WHERE predicate. Filter factors greatly affect the estimated costs of the access paths considered by the optimizer, because they estimate the percentage of rows returned after the WHERE predicate is evaluated.

Use the &SQLXMSG macro variable, which contains the return code and error descriptions produced by DB2, to check for successful execution of the SQL statements. Appendix A shows an example of some explained output from a PLAN_TABLE.

What do the EXPLAIN columns mean?

The PLAN_TABLE columns that relate to GENERAL information about the explained SQL are as follows:

  QUERYNO    - Query number assigned by the explaining user
  QBLOCKNO   - Number that identifies the SQL query sections for a query with a SELECT, subquery, UNION
  APPLNAME   - Plan name for programs with embedded SQL (C, COBOL, etc.)
  PROGNAME   - Program name for programs with embedded SQL
  PLANNO     - Identifies the order of operations within a QBLOCKNO
  CREATOR    - Creator of a new table (materialized view) accessed
  TNAME      - Name of new table (materialized view) accessed
  TABNO      - Number that identifies the sequence of references to the same table in the FROM clause
  TSLOCKMODE - The TABLESPACE lock mode:
                 IS  - Intent share
                 IX  - Intent exclusive
                 SIX - Share with intent exclusive
                 S   - Share
                 U   - Update
                 X   - Exclusive
  TIMESTAMP  - Date and time the EXPLAIN was processed
  REMARKS    - Field in which comments can be inserted

The PLAN_TABLE columns that relate to INDEX usage information about the explained SQL are as follows:

  ACCESSTYPE    - Type of table INDEX usage:
                    R  - Full table scan (uses no index)
                    I  - Use an index
                    I1 - One-fetch scan (MIN or MAX functions)
                    N  - Index scan (predicate uses an IN)
                    M  - Multi-index scan, followed by:
                    MX - Matching index scan on row pointers only
                    MI - Consolidation (intersect) of row pointers from multiple indexes
                    MU - Combination (union) of row pointers from multiple indexes
  MATCHCOLS     - Number of index keys used in an index scan
  ACCESSCREATOR - Creator of the index being used
  ACCESSNAME    - Name of the index being used
  INDEXONLY     - Y if all data comes from the index, N if data must come from the table
  MIXOPSEQ      - Sequence of steps in a multi-index scan

The PLAN_TABLE columns that relate to join and sort usage information about the explained SQL are as follows:

  METHOD         - Indicates the table JOIN method:
                     0 - First table accessed, first outer table

                     1 - Nested loop join
                     2 - Merge scan join
                     3 - Sort required for ORDER BY, GROUP BY, UNION, DISTINCT
                     4 - Hybrid join
  SORTN_UNIQ     - Not being used (always N)
  SORTN_JOIN     - Sort inner table for join processing (Y, N)
  SORTN_ORDERBY  - Not being used (always N)
  SORTN_GROUPBY  - Not being used (always N)
  SORTC_UNIQ     - Sort to remove duplicates (Y, N)
  SORTC_JOIN     - Sort outer table for join processing (Y, N)
  SORTC_ORDERBY  - Sort for the ORDER BY clause (Y, N)
  SORTC_GROUPBY  - Sort for the GROUP BY clause (Y, N)
  PREFETCH       - Physical data read in advance:
                     S     - Sequential prefetch
                     L     - List prefetch
                     Blank - No prefetch
  COLUMN_FN_EVAL - Indicates when a column function is evaluated:
                     R - At data retrieval time (stage 1)
                     S - At sort time (stage 2)

What should you be looking for?

Know what columns make up the indexes on the tables being accessed. Indexes enhance performance and reduce costs. Look at the ACCESSTYPE to see if an index is being used. An ACCESSTYPE of "R" means all the data in the table must be scanned, and no indexes are being used. For larger tables and joins this is very expensive. Look at MATCHCOLS to see how many index keys are being used. The more the better. Check to see if INDEXONLY is "Y". A "Y" means that data will be retrieved from the index rather than the table (very fast). Matching on a unique index provides the best filtering of data. Change the WHERE clause to take advantage of indexes. Have indexes created to match WHERE clauses that are common and often used.

Avoid unnecessary sorts as much as possible. DISTINCT, UNION, ORDER BY, GROUP BY, and some joins require DB2 to sort data. DB2 is sorting if METHOD = "3" and the SORTN or SORTC columns = "Y".

PREFETCH is good if lots of rows are being accessed. Prefetch is when DB2 fetches physical data into the buffer ahead of time, before it is analyzed. It improves performance of some types of queries by reducing the number of physical I/Os required to return large sets of data. It is very effective when the table data is in clustered sequence. It could be a problem or degrade performance if only a small amount of data is expected to be accessed.

Table JOINS support multiple table access through matching and frequently make use of primary and foreign keys. JOIN performance is contingent on indexes to support matching and WHERE predicates. When joining tables look at the METHOD column. A nested loop join is preferred if accessing a small percentage of rows or when index columns are used in the join. Nested loop joins are favored by DB2 when the appropriate indexes exist and the results table will be small. A merge scan join is preferred when accessing a large percentage of rows or when index columns are not used in the join. Merge scan joins may require sorting prior to joining. Hybrid joins make use of list prefetch and process duplicate outer table matches most efficiently. DB2 favors a hybrid join for medium sized result tables. Joins are more efficient than subqueries. Joins make better use of indexes than subqueries. Rewrite subqueries as joins when possible. Again, check the METHOD column to determine the JOIN type. If using subqueries, use correlated subqueries when possible. A correlated subquery refers to an outer query. DB2 will sometimes hold the results of a correlated subquery to evaluate against each row from the outer query.

Make predicates indexable whenever possible. Indexable predicates are usually evaluated at STAGE 1 filtering through the DB2 Data Manager. STAGE 1 filtering is more efficient because data is filtered as the rows are being retrieved. The following are indexable predicates:

    COLUMN = value
    COLUMN > value (also >=, not >)
    COLUMN < value (also <=, not <)
    COLUMN IS NULL
    COLUMN LIKE 'char%'
    COLUMN IN (list)
    COLUMN BETWEEN value1 AND value2
    the above connected by an AND

STAGE 2 predicates are usually less efficient because data is filtered or evaluated after the rows have been physically retrieved. STAGE 2 predicates are evaluated last and are the costliest to process. The following are STAGE 2 predicates:

    A.COLUMN1 = B.COLUMN2 (with any condition)
    COLUMN = value + 1 (any numeric calculation)
    DAY(COLUMN) = 1 (any scalar functions)
    COLUMN1 || COLUMN2 = 'value' (concatenations)

Rewrite predicates to take advantage of indexes and STAGE 1 processing. Check the ACCESSTYPE column.
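The effect of rewriting a predicate can be tried out on any engine that reports its access path. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN purely as a stand-in for the DB2 EXPLAIN: SQLite has no PLAN_TABLE and no stage 1 or stage 2 distinction, and the table, index, and column names are made up for the illustration. It compares a predicate that wraps the column in arithmetic with an equivalent predicate on the bare column.

```python
import sqlite3

# In-memory stand-in database; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, amount INTEGER, status TEXT)"
)
conn.execute("CREATE INDEX ix_claims_amount ON claims (amount)")
conn.executemany(
    "INSERT INTO claims (amount, status) VALUES (?, ?)",
    [(n % 500, "OPEN") for n in range(5000)],
)


def access_path(sql):
    """Return the plan description lines SQLite reports for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description text


# Arithmetic on the column: the index on amount cannot be matched.
arithmetic = access_path("SELECT * FROM claims WHERE amount + 1 = 101")
# Same rows requested, with the arithmetic moved to the constant side.
rewritten = access_path("SELECT * FROM claims WHERE amount = 100")

print(arithmetic)
print(rewritten)
```

SQLite reports the first statement as a scan of the whole table and the second as a search using the index, which is the same kind of change the paper describes as moving from ACCESSTYPE "R" to an index access in DB2. The exact wording of the description text varies between SQLite versions.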

Some performance tips.

The following are some performance tips you might consider when constructing your SQL, or when changing it in response to EXPLAIN results, to improve the retrieval of data.

• Make all queries as explicit as possible
• The WHERE clause is a filter, use it effectively
• Select only columns that are needed. Do not use *
• Use the IN statement instead of multiple ORs
• Qualify all join columns in a JOIN. Join with as many of the index columns as possible
• Do not use a JOIN without a qualifying WHERE clause
• Do not specify tables in a JOIN if you don't need them
• Avoid using arithmetic expressions in the WHERE clause
• Avoid using character concatenations (||) in the WHERE clause
• Avoid using SUBSTRING in the WHERE clause
• Use comparison operators (< and >) instead of NOT BETWEEN
• Avoid using NOT =, NOT BETWEEN, NOT IN, NOT LIKE
• Do not use an ORDER BY with DISTINCT or UNION if the order you want is the column order of the SELECT clause
• Avoid using a subquery in an IN list if the subquery returns multiple rows. Use a JOIN instead
• Use NOT EXISTS instead of NOT IN for a subquery

CONCLUSION

The key to a healthy query environment is the way in which we access and manipulate the data that is made available to us. Using the EXPLAIN feature of DB2 can assist in giving insight into the way DB2 will handle the execution of the SQL query. Although the DB2 optimizer makes that decision for us, it is possible to help influence that decision by applying various techniques and re-formulating the SQL code. Understanding DB2 and the processes that take place behind the scenes can greatly enhance one's performance and promote a more efficient query environment for all to use. Make it a common practice to use the EXPLAIN statement and develop the skills to write performance-oriented query requests.

REFERENCES

SAS Technical Report P-221, SAS/ACCESS Software: Changes and Enhancements, Release 6.07. SAS Institute Inc.

IBM Database 2, Application Programming and SQL Guide, Version 2. IBM Inc.

IBM Database 2, Administration Guide, Version 2. IBM Inc.

APPENDIX A

QUERYNO QBLOCKNO APPLNAME PROGNAME PLANNO METHOD CREATOR TNAME             TABNO ACCESSTYPE MATCHCOLS
------- -------- -------- -------- ------ ------ ------- ----------------- ----- ---------- ---------
1000    1                 DSQ8ESQL        0      PROD    TITCP012_CLM_VR_2       M          0
1000    1                 DSQ8ESQL        0      PROD    TITCP012_CLM_VR_2       MX         1
1000    1                 DSQ8ESQL        0      PROD    TITCP012_CLM_VR_2       MX         1
1000    1                 DSQ8ESQL        0      PROD    TITCP012_CLM_VR_2       MU         0
1000    1                 DSQ8ESQL        0      PROD    TITCP012_CLM_VR_2       MX         1
1000    1                 DSQ8ESQL        0      PROD    TITCP012_CLM_VR_2       MU         0
1000    2                 DSQ8ESQL        0      PROD    TITCP013_CLM_VR_3       M          0
1000    2                 DSQ8ESQL        0      PROD    TITCP013_CLM_VR_3       MX         1
1000    2                 DSQ8ESQL        0      PROD    TITCP013_CLM_VR_3       MX         1
1000    2                 DSQ8ESQL        0      PROD    TITCP013_CLM_VR_3       MU         0
1000    2                 DSQ8ESQL        0      PROD    TITCP013_CLM_VR_3       MX         1
1000    2                 DSQ8ESQL        0      PROD    TITCP013_CLM_VR_3       MU         0

                                      SORTN SORTN SORTN   SORTN   SORTC SORTC SORTC   SORTC
ACCESSCREATOR ACCESSNAME  INDEXONLY   UNIQ  JOIN  ORDERBY GROUPBY UNIQ  JOIN  ORDERBY GROUPBY TSLOCKMODE
------------- ----------- ---------   ----- ----- ------- ------- ----- ----- ------- ------- ----------
                          N           N     N     N       N       N     N     N       N       S
DBA           TIICP012_02 Y           N     N     N       N       N     N     N       N       S
DBA           TIICP012_02 Y           N     N     N       N       N     N     N       N       S
                          N           N     N     N       N       N     N     N       N       S
DBA           TIICP012_02 Y           N     N     N       N       N     N     N       N       S
                          N           N     N     N       N       N     N     N       N       S
                          N           N     N     N       N       N     N     N       N       S
DBA           TIICP013_02 Y           N     N     N       N       N     N     N       N       S
DBA           TIICP013_02 Y           N     N     N       N       N     N     N       N       S
                          N           N     N     N       N       N     N     N       N       S
DBA           TIICP013_02 Y           N     N     N       N       N     N     N       N       S
                          N           N     N     N       N       N     N     N       N       S

                          COLUMN_FN
TIMESTAMP        PREFETCH EVAL      MIXOPSEQ COLLID
---------------- -------- --------- -------- ------
1994031315265939 L                  0        Q
1994031315265939                    1        Q
1994031315265939                    2        Q
1994031315265939                    3        Q
1994031315265939                    4        Q
1994031315265939                    5        Q
1994031315265939 L                  0        Q
1994031315265939                    1        Q
1994031315265939                    2        Q
1994031315265939                    3        Q
1994031315265939                    4        Q
1994031315265939                    5        Q
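The checks described under "What should you be looking for?" can also be run as a query against the PLAN_TABLE itself. The sketch below loads a few made-up rows into a cut-down copy of the table, held in an in-memory SQLite database through Python's sqlite3 module only so that the example is self-contained, and flags table scans and sorts. The rows, the QUERYNO values, and the table names are hypothetical; only the column names and code values come from the descriptions in this paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down PLAN_TABLE: only the columns needed for the two checks below.
conn.execute("""
    CREATE TABLE PLAN_TABLE (
        QUERYNO INTEGER, QBLOCKNO INTEGER, PLANNO INTEGER, METHOD INTEGER,
        TNAME TEXT, ACCESSTYPE TEXT, MATCHCOLS INTEGER, INDEXONLY TEXT,
        SORTN_JOIN TEXT, SORTC_UNIQ TEXT, SORTC_JOIN TEXT,
        SORTC_ORDERBY TEXT, SORTC_GROUPBY TEXT)
""")
hypothetical_rows = [
    # Query 1000: table scan, then a sort step for an ORDER BY.
    (1000, 1, 1, 0, "CLAIMS", "R", 0, "N", "N", "N", "N", "N", "N"),
    (1000, 1, 2, 3, "", "", 0, "N", "N", "N", "N", "Y", "N"),
    # Query 2000: index-only access on two matching columns, no sort.
    (2000, 1, 1, 0, "MEMBERS", "I", 2, "Y", "N", "N", "N", "N", "N"),
]
conn.executemany(
    "INSERT INTO PLAN_TABLE VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    hypothetical_rows,
)

# Flag ACCESSTYPE 'R' (no index used) and METHOD 3 steps with a sort flag set.
findings = conn.execute("""
    SELECT QUERYNO, PLANNO,
           CASE WHEN ACCESSTYPE = 'R' THEN 'table scan' ELSE 'sort' END
    FROM PLAN_TABLE
    WHERE ACCESSTYPE = 'R'
       OR (METHOD = 3 AND (SORTN_JOIN = 'Y' OR SORTC_UNIQ = 'Y'
                           OR SORTC_JOIN = 'Y' OR SORTC_ORDERBY = 'Y'
                           OR SORTC_GROUPBY = 'Y'))
    ORDER BY QUERYNO, PLANNO
""").fetchall()

for queryno, planno, finding in findings:
    print(queryno, planno, finding)
```

With these rows the query reports a table scan at step 1 and a sort at step 2 of query 1000, and nothing for query 2000. The SELECT uses only plain comparisons, so a similar statement could be adapted for the real PLAN_TABLE.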