
Versus - Many SQL roads lead to Rome... But which is best?

Serge Rielau www.tinyurl.com/SQLTips4DB2 IBM Canada [email protected]

Session Code: D15 | May 6, 2011 10:30 | Platform: DB2 9.7 for LUW

Motivation

• DB2 9.7 has the biggest infusion of SQL since DB2 V2.1

• Many ways can be used to fix any business problem

• Which feature is best depends

• This talk tries to describe what it depends on

Setup

● The machine
  • My T400s laptop, 3GB RAM, dual core 2.5GHz
  • One 120GB SSD drive
  • Windows XP
● The database
  • DB2 9.7.3
  • Default database, automatic storage, STMM
  • Additional log files
● The schema
  • T(c1 INT), 1 million rows
  • (Presumably in memory)

Compiled vs. Inlined logic: Introducing the contestants

● Compiled SQL PL
  • Introduced in DB2 7.1 (procedures) up to DB2 9.7
  • Opaque (black box) to invoker
  • Statically compiled
  • Each SQL statement compiled separately

● Inlined SQL PL
  • Introduced in DB2 2.1 (triggers) up to DB2 7.2 (functions, logic)
  • Structure visible to invoking SQL statement's optimizer
  • Macro expanded
  • One super statement for logic and relational parts

Compiled vs. Inlined logic: Strengths and weaknesses

Property              SQL PL   SQL      SQL PL    C        C          SQL
                      Comp     Return   Inlined   Fenced   Unfenced   Sourced
Speed                 O        ++       +         --       +          ++
Plan Stability        ++       O        O         ++       ++         ++
Global optimization   --       ++       O         --       --         ++
Package Cache         ++       -        --        ++       ++         ++
Functionality         ++       -        O         ++       ++         -
Debugging             ++       -        --        O        O          O
Portability           ++       ++       ++        O        O          ++

Compiled vs. Inlined logic: The experiment: “plus”

● SQL Return
    CREATE FUNCTION PLUS_INLINED_RET(a INT, b INT)
      RETURNS INT
      RETURN a + b;
● SQL PL inlined
    CREATE FUNCTION PLUS_INLINED(a INT, b INT)
      RETURNS INT
    BEGIN ATOMIC
      DECLARE res INTEGER;
      SET res = a + b;
      RETURN res;
    END;
● SQL PL compiled
    CREATE FUNCTION PLUS_COMPILED(a INT, b INT)
      RETURNS INT
    BEGIN
      RETURN a + b;
    END;
● SQL Sourced
    CREATE FUNCTION PLUS_SOURCED(INT, INT)
      RETURNS INT
      SOURCE SYSIBM."+"(INT, INT);

Compiled vs. Inlined logic

● Fenced C UDF
    CREATE FUNCTION PLUS_C_FENCED(a INT, b INT)
      RETURNS INT
      LANGUAGE C
      EXTERNAL NAME 'plus'
      PARAMETER STYLE SQL
      FENCED;
● Unfenced C UDF
    CREATE FUNCTION PLUS_C_UNFENCED(a INT, b INT)
      RETURNS INT
      LANGUAGE C
      EXTERNAL NAME 'plus'
      PARAMETER STYLE SQL
      NOT FENCED;
● C UDF source code
    void SQL_API_FN Plus(SQLUDF_INTEGER  *a,
                         SQLUDF_INTEGER  *b,
                         SQLUDF_INTEGER  *out,
                         SQLUDF_SMALLINT *aNullInd,
                         SQLUDF_SMALLINT *bNullInd,
                         SQLUDF_SMALLINT *outNullInd,
                         SQLUDF_TRAIL_ARGS)
    {
      /* NULL in, NULL out */
      if (*aNullInd == -1 || *bNullInd == -1) {
        *outNullInd = -1;
      } else {
        *out = *a + *b;
        *outNullInd = 0;
      }
    }

Compiled vs. Inlined logic: Empirical data

• Test
    SELECT MAX() FROM T;
• SQL RETURN and SOURCE “dissolve”

Mode               Time (s)
Baseline           0.11
“+”                0.14
SQL RETURN         0.15
SQL Source         0.14
SQL PL Inline      1.25
SQL PL Compile     27.5
C Unfenced         10.2
C Fenced           104
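A sketch of what such a timing run could look like: every row of T goes through the function under test, and MAX collapses the result so that almost nothing travels back to the client. The slide elides the argument list, so the constant second argument, the baseline query and the choice of contestants shown here are assumptions, not the author's benchmark script.

```sql
-- Assumed baseline: scan and aggregate only, no function call.
SELECT MAX(c1) FROM T;

-- Built-in operator.
SELECT MAX(c1 + 1) FROM T;

-- One run per contestant; only the function name changes.
SELECT MAX(PLUS_INLINED_RET(c1, 1)) FROM T;
SELECT MAX(PLUS_SOURCED(c1, 1))     FROM T;
SELECT MAX(PLUS_INLINED(c1, 1))     FROM T;
SELECT MAX(PLUS_COMPILED(c1, 1))    FROM T;
SELECT MAX(PLUS_C_FENCED(c1, 1))    FROM T;
```

Because the aggregate returns a single row, differences between runs should mostly reflect the per-row invocation cost rather than result transfer.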

• For “small” logic inline SQL PL beats unfenced C
• Unfenced C (with no SQL) beats compiled SQL PL
• Question: When does SQL PL overhead beat unfenced C impedance mismatch?

Views vs. Table functions vs. Created Cursors: Introducing the contestants

● General
  • Encapsulate semantics
    • definer, path, schema
  • Access control
  • Return result sets

● SQL table functions
  • Introduced in DB2 7.1
  • Always unordered
  • Take parameters
  • Must be embedded in a query
  • May be MODIFIES SQL DATA

● Views
  • Introduced in DB2 2.1
  • Always unordered
  • No parameters
  • READS SQL DATA query
  • Must be embedded in a query
  • Can have public synonyms

● Created Cursors
  • Introduced in DB2 9.7
  • Can be ordered
  • Take parameters
  • Cannot be embedded in a query
  • May be MODIFIES SQL DATA
  • Can be members of modules

Views vs. Table Functions vs. Created Cursors: Strengths and weaknesses

Property              View     Created Cursor   SQL Table Function
Speed                 ++       ++               ++
Plan Stability        --       --               --
Global optimization   ++       ++               O
Access Control        ++       O                O
Functionality         O        ++               +
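To make the "no parameters" versus "take parameters" distinction concrete, here is a minimal sketch against the talk's table T. The object names t_below_100 and t_below are hypothetical, and created cursors are left out of the sketch.

```sql
-- View: fixed predicate, no parameters, referenced like a table.
CREATE VIEW t_below_100 AS
  SELECT c1 FROM T WHERE c1 < 100;

SELECT COUNT(*) FROM t_below_100;

-- SQL table function: same query shape, but the limit is a parameter.
CREATE FUNCTION t_below(lim INT)
  RETURNS TABLE (c1 INT)
  RETURN SELECT c1 FROM T WHERE c1 < lim;

-- A table function must be wrapped in TABLE(...) inside a FROM clause.
SELECT COUNT(*) FROM TABLE(t_below(100)) AS f;
```

Both forms have to be embedded in a query, which is the property the contestants list calls out.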

Array vs. Temp vs. Common Table Expression: Introducing the contestants

● Array
  • Shared type defined in catalog
  • Uses main memory
  • One index
  • Dense or sparse
  • Content is private
  • Session lifetime
  • Non transactional
  • Non SQL API
  • Usage limited to SQL PL and PL/SQL
  • Use in FROM via UNNEST
  • No stats
  • Pass between client and server via procedures
  • Can be scalar array or array of rows
  • Produce scalar arrays with column function array_agg()

    CREATE TYPE intarraysparse AS INTEGER ARRAY[INTEGER];
    CREATE TYPE intarraydense AS INTEGER ARRAY[];
    CREATE VARIABLE sparseints intarraysparse;
    CREATE VARIABLE denseints intarraydense;
    SET denseints[7] = 1000;

Array vs. Temp vs. Common Table Expression: Introducing the contestants

● User Temporary Table
  • Uses bufferpool and disk
  • SQL API (INSERT, …)
  • Shared or private definition
  • Private content
  • Stats optional
  • Index optional
  • Sparse
  • Session lifetime
  • Optionally transactional

● Common table expression
  • Uses bufferpool and disk
  • SQL API (WITH)
  • Private definition
  • No stats
  • No index
  • Statement lifetime
  • Sparse

    CREATE GLOBAL TEMPORARY TABLE temp_logged(idx INT, val INT) LOGGED;
    CREATE INDEX idx1 ON temp_logged(idx);

    DECLARE GLOBAL TEMPORARY TABLE temp_not_logged(idx INT, val INT) NOT LOGGED;
    CREATE INDEX idx2 ON temp_not_logged(idx);

    WITH cte AS (SELECT * FROM T) SELECT * FROM cte;

Array vs. Temp vs. Common Table Expression: Introducing the contestants

Property              Array    User Temporary Table   Common Table Expression
Speed                 ++       ++                     +
SQL Integration       O        ++                     ++
SQL PL Integration    ++       O                      O
Memory footprint      -        +                      +
Disk footprint        ++       +                      +
SQL Optimization      --       ++                     +
SQL PL Optimization   ++       --                     --
Transactional         --       ++                     O

Array vs. Temp vs. Common Table Expression: Empirical Data
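To give the fill and access modes some shape, the sketch below contrasts a set-oriented fill (ARRAY_AGG) and read (UNNEST) with element-by-element loops. It reuses the intarraydense type from the array slide; the procedure names, the @ statement terminator and the assumption that c1 holds no NULLs are illustrative choices, and this is not the author's benchmark code.

```sql
-- Assumes @ as statement terminator (e.g. db2 -td@), because the
-- procedure bodies contain semicolons.

-- Set oriented: one statement fills the array, one statement reads it.
CREATE PROCEDURE sum_array_agg(OUT total BIGINT)
BEGIN
  DECLARE ints intarraydense;
  SELECT ARRAY_AGG(c1) INTO ints FROM T;
  SELECT SUM(BIGINT(u.val)) INTO total
    FROM UNNEST(ints) AS u(val);
END@

-- Procedural: fill and sum the same array one element at a time.
CREATE PROCEDURE sum_array_loop(OUT total BIGINT)
BEGIN
  DECLARE ints intarraydense;
  DECLARE i INT DEFAULT 0;
  FOR r AS SELECT c1 FROM T DO
    SET i = i + 1;
    SET ints[i] = r.c1;
  END FOR;
  SET total = 0;
  WHILE i > 0 DO
    SET total = total + ints[i];
    SET i = i - 1;
  END WHILE;
END@
```

The first procedure hands both steps to the SQL engine; the second executes one SQL PL statement per element.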

Action   Mode                           Time (s)
         Baseline                       0.19
Fill     ARRAY Sparse Loop              55.6
Fill     ARRAY Dense Loop               52.5
Fill     ARRAY Dense ARRAY_AGG          0.19
Fill     TEMP LOGGED                    6.59
Fill     TEMP NOT LOGGED                4.20
Fill     PL/SQL ARRAY Dense BULK COLL   10.2
Fill     PL/SQL ARRAY Sparse BULK COLL  10.4
Access   ARRAY UNNEST                   0.12
Access   ARRAY Dense Loop Sum           28.4
Access   ARRAY Sparse Loop Sum          29.5
Access   TEMP SELECT                    0.08

SQL PL Recursion vs. SQL Recursion: Introducing the contestants

● SQL PL Recursion
  • Introduced in DB2 7.1
  • Max 64 level
  • Full semantic control
  • Payload
  • Descending, ascending and both
  • Depth or breadth first
  • No optimization

● SQL Recursion
  • Introduced in DB2 5.1
  • Unlimited
  • Limited semantic control
  • Payload
  • Descending
  • Breadth first
  • Cost based optimization

Recursion inside out

1. Seed -> rec-temp
2. For each row in temp execute recursion
   • Append to temp
3. Finish when step 2 catches up with appends

[Figure: the seed rows fill rec-temp; a read cursor walks the temp while an insert cursor appends the rows produced by the recursion.]

Rec-Plan

    WITH rec(i) AS
      (VALUES (1)
       UNION ALL
       SELECT i + 1
       FROM rec
       WHERE i < 1000)
    SELECT * FROM rec

                  RETURN
                  (   1)
                    |
                  TBSCAN
                  (   2)
                    |
                   TEMP
                  (   3)
                    |
                  UNION
                  (   4)
      n->n+1   /----+----\   Seed
           TBSCAN       TBSCAN
           (   5)       (   6)
             |            |
            TEMP     TABFNC: GENROW
           (   3)

SQL PL recursion vs. SQL recursion: Empirical results

● SUM(x..y) <- SUM(x..(x+y)/2) + SUM((x+y)/2+1..y)
  • SQL PL
    • Stored procedure divides on the way down
    • Sum on the way up
  • SQL
    • Divide on the way down
    • Straight final sum
● SQL is much faster
  • Less overhead
● SQL PL is much more expressive
  • Not all can be written in SQL

Mode     Time (s)
SQL      4.61
SQL PL   231

SQL vs SQL PL: Introducing the contestants

● SQL
  • Relational, set oriented
  • Parallelizable
  • Cost based optimization
  • Distributed (DPF)
  • Limited plan stability

● Compiled SQL PL
  • Procedural
  • Single thread
  • Some rule based optimization
  • Execute on coordinator only
  • Deterministic plan

● Inlined SQL PL
  • Procedural
  • Single thread
  • Some global rule based optimization
  • Execute on coordinator only

SQL vs SQL PL: Empirical Data

● Join 2 tables of 25k rows each
  • The best SQL PL can do is barely beat a nested loop (NL) join
  • “Correlated” => Predicate applied to INNER
  • “Uncorrelated” => Predicate applied to cross product
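The two procedural variants can be sketched as follows. T1, T2 and the column k are hypothetical stand-ins for the two 25k-row tables, the procedure names are made up for illustration, and @ is assumed as the statement terminator.

```sql
-- Set oriented: the optimizer is free to choose, for example, a hash join.
SELECT COUNT(*) FROM T1 JOIN T2 ON T1.k = T2.k@

-- "Correlated": the predicate is part of the inner query,
-- so each inner cursor only returns matching rows.
CREATE PROCEDURE join_correlated(OUT matches INT)
BEGIN
  SET matches = 0;
  FOR o AS SELECT k AS outer_k FROM T1 DO
    FOR i AS SELECT k AS inner_k FROM T2 WHERE T2.k = outer_k DO
      SET matches = matches + 1;
    END FOR;
  END FOR;
END@

-- "Uncorrelated": the inner cursor returns all of T2 for every outer
-- row and the predicate is applied afterwards, on the cross product.
CREATE PROCEDURE join_uncorrelated(OUT matches INT)
BEGIN
  SET matches = 0;
  FOR o AS SELECT k AS outer_k FROM T1 DO
    FOR i AS SELECT k AS inner_k FROM T2 DO
      IF outer_k = inner_k THEN
        SET matches = matches + 1;
      END IF;
    END FOR;
  END FOR;
END@
```

With 25k rows on each side, the uncorrelated form evaluates the IF 625 million times, whereas the correlated form only iterates over rows the inner query has already filtered.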

Mode                               Time (s)
SQL
  Hash                             0.01
  Index only Hash                  0.01
  Nested loop (correlated)         8.50
  Nested loop (non correlated)     10.1
Inline SQL PL
  Nested FOR (correlated)          9.47
  Nested FOR (non correlated)      >>
Compiled SQL PL
  Nested FOR (correlated)          6.83
  Nested FOR (non correlated)      112

Private vs. Public Aliases vs. PATH: Introducing the contestants

● Private Alias
  • Forward address from a specific schema to
    • Table, View, Nickname, Alias, Module, Sequence
  • Sourced functions equal private alias for functions
  • Introduced in DB2 2.1
● Public Alias (aka public synonym)
  • Broadcast object's one-part name to all schemas
    • Table, View, Nickname, Alias, Module, Sequence
  • Overruled by schema local definition
  • Introduced in DB2 9.7
● PATH
  • Subscribe to see objects in a set of schemas
    • Function, Procedure, Module, Type
  • Set path using CONNECT_PROC DB CFG (9.7.3)

Private vs. Public Alias vs. PATH
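A minimal sketch of the three mechanisms side by side. The schemas CORE and APP1 and the table CORE.ORDERS are hypothetical names chosen for illustration; none of them come from the talk.

```sql
-- Private alias: forwards the name ORDERS in schema APP1 only.
CREATE ALIAS app1.orders FOR core.orders;

-- Public alias: the one-part name ORDERS resolves from every schema,
-- unless a schema defines its own ORDERS.
CREATE PUBLIC ALIAS orders FOR TABLE core.orders;

-- PATH: make functions, procedures, modules and types in CORE
-- visible without schema qualification for this session.
SET PATH = CURRENT PATH, core;
```

The aliases cover tables and similar objects while PATH covers routines and types, so the sketch uses both kinds of statement without overlap.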

● Alias and PATH do not compete
  • Nearly disjoint set of objects
● Private Alias
  • Use when objects are of special interest to specific schemas
● Public Alias
  • Use when objects are of general interest to most/all schemas
  • Use for “committed version” of objects and “privatize” with local overloading.

Conclusion

● We barely scratched the surface here
● Outcomes may vary widely depending on the specific business need and implementation
● 30 years of cost based optimization do pay off

● “Don't trust a statistic you haven't faked yourself”

● The author learned a lot during creation of this talk.
  • Especially to be careful when proposing an all new talk. :-)

Serge Rielau, IBM Canada
[email protected], www.tinyurl.com/SQLTips4DB2

D15 Versus - Many SQL roads lead to Rome... But which is best?