Database Tuning: Principles, Experiments and Troubleshooting Techniques
http://www.mkp.com/dbtune
Dennis Shasha ([email protected])
Philippe Bonnet ([email protected])

© Dennis Shasha, Philippe Bonnet - 2002 1

Database Tuning

Database Tuning is the activity of making a database application run more quickly. "More quickly" usually means higher throughput, though it may mean lower response time for time-critical applications.

© Dennis Shasha, Philippe Bonnet - 2002 2

[Architecture diagram: who tunes what]
• Application: tuned by the application programmer (e.g., business analyst, data architect)
• Query Processor (indexes), Storage Subsystem, Concurrency Control, Recovery: tuned by the sophisticated application programmer (e.g., SAP admin) and the DBA/tuner
• Hardware: Processor(s), Disk(s), Memory

© Dennis Shasha, Philippe Bonnet - 2002 3

Outline

1. Basic Principles
2. Tuning the Guts
3. Indexes
4. Relational Systems
5. Application Interface
6. E-commerce Applications
7. Data Warehouse Applications
8. Distributed Applications
9. Troubleshooting

© Dennis Shasha, Philippe Bonnet - 2002 4

Goal of the Tutorial

• To show:
  – Tuning principles that port from one system to another and to new technologies
  – Experimental results that show the effect of these tuning principles
  – Troubleshooting techniques for chasing down performance problems

1 - Basic Principles © Dennis Shasha, Philippe Bonnet - 2002 5

Tuning Principles Leitmotifs

• Think globally, fix locally (does it matter?)
• Partitioning breaks bottlenecks (temporal and spatial)
• Start-up costs are high; running costs are low (disk transfer, cursors)
• Be prepared for trade-offs (indexes and inserts)

1 - Basic Principles © Dennis Shasha, Philippe Bonnet - 2002 6

Experiments -- why and where

• Simple experiments to illustrate the performance impact of tuning principles.
• See http://www.diku.dk/dbtune/experiments for the SQL scripts, the data and a tool to run the experiments.

1 - Basic Principles © Dennis Shasha, Philippe Bonnet - 2002 7

Experimental DBMS and Hardware

• Results presented throughout this tutorial were obtained with:
  – SQL Server 7, SQL Server 2000, Oracle 8i, Oracle 9i, DB2 UDB 7.1
  – Three configurations:
    1. Dual Xeon (550MHz, 512Kb), 1Gb RAM, internal RAID controller from Adaptec (80Mb), 2 Ultra 160 channels, 4x18Gb drives (10000RPM), Windows 2000.
    2. Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.
    3. Pentium III (1GHz, 256Kb), 1Gb RAM, Adaptec 39160 with 2 channels, 3x18Gb drives (10000RPM), Linux Debian 2.4.

© Dennis Shasha, Philippe Bonnet - 2002 8

Tuning the Guts

• Concurrency Control
  – How to minimize lock contention?
• Recovery
  – How to manage the writes to the log (and to dumps)?
• OS
  – How to optimize buffer size, process scheduling, ...
• Hardware
  – How to allocate CPU, RAM and disk subsystem resources?

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 9

Isolation

• Correctness vs. Performance
  – Number of locks held by each transaction
  – Kind of locks
  – Length of time a transaction holds locks

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 10

Isolation Levels

• Read Uncommitted (no lost update)
  – Exclusive locks for write operations are held for the duration of the transaction
  – No locks for reads
• Read Committed (no dirty retrieval)
  – Shared locks are released as soon as the read operation terminates.
• Repeatable Read (no unrepeatable reads for read/write)
  – Two-phase locking
• Serializable (read/write/insert/delete model)
  – Table locking or index locking to avoid phantoms
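The experiments that follow switch between these levels per session. A minimal sketch in standard SQL / T-SQL, using the summation query from the serializability experiment below (READ COMMITTED is the default in the systems tested):

  -- choose the correctness/performance trade-off explicitly
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

  BEGIN TRANSACTION;
  SELECT sum(balance) FROM accounts;   -- scan now sees a consistent state
  COMMIT;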

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 11

Snapshot isolation

• Each transaction executes against the version of the data items that was committed when the transaction started:
  – No locks for reads
  – Costs space (old copies of data must be kept)
• Almost a serializable level:
  – T1: x:=y ; T2: y:=x
  – Initially x=3 and y=17
  – Serial execution: x,y=17 or x,y=3
  – Snapshot isolation: x=17, y=3 if both transactions start at the same time
[Timeline diagram: T1 issues R(X), R(Y), R(Z) and sees the values committed when it started, even though T2 (W(Y:=1)) and T3 (W(X:=2), W(Z:=3)) write and commit concurrently.]

© Dennis Shasha, Philippe Bonnet - 2002 12

6 Value of Serializability -- Data

Settings: accounts( number, branchnum, balance); create clustered index c on accounts(number); – 100000 rows – Cold buffer; same buffer size on all systems. – Row level locking – Isolation level (SERIALIZABLE or READ COMMITTED) – SQL Server 7, DB2 v7.1 and Oracle 8i on Windows 2000 – Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 13

Value of Serializability -- transactions Concurrent Transactions: – T1: summation query [1 thread] select sum(balance) from accounts; – T2: swap balance between two account numbers (in order of scan to avoid deadlocks) [N threads] valX:=select balance from accounts where number=X; valY:=select balance from accounts where number=Y; update accounts set balance=valX where number=Y; update accounts set balance=valY where number=X;

© Dennis Shasha, Philippe Bonnet - 2002 14

Value of Serializability -- results

• With SQL Server and DB2, the scan returns incorrect answers if the read committed isolation level (the default setting) is used.
• With Oracle, correct answers are returned (snapshot isolation), but beware of swapping.
[Graphs: ratio of correct answers vs. number of concurrent update threads (0-10), read committed vs. serializable, for SQL Server and Oracle.]

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 15

Cost of Serializability

• Because the update conflicts with the scan, correct answers are obtained at the cost of decreased concurrency and thus decreased throughput.
[Graphs: throughput (trans/sec) vs. number of concurrent update threads (0-10), read committed vs. serializable, for SQL Server and Oracle.]

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 16

8 Locking Overhead -- data

Settings: accounts( number, branchnum, balance); create clustered index c on accounts(number);

– 100000 rows – Cold buffer – SQL Server 7, DB2 v7.1 and Oracle 8i on Windows 2000 – No lock escalation on Oracle; Parameter set so that there is no lock escalation on DB2; no control on SQL Server. – Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 17

Locking Overhead -- transactions

No Concurrent Transactions: – Update [10 000 updates] update accounts set balance = Val; – Insert [10 000 transactions], e.g. typical one: insert into accounts values(664366,72255,2296.12);

© Dennis Shasha, Philippe Bonnet - 2002 18

Locking Overhead

• Row locking is barely more expensive than table locking because recovery overhead is higher than row locking overhead.
  – Exception: updates on DB2, where table locking is distinctly less expensive than row locking.
[Graph: throughput ratio (row locking / table locking) for updates and inserts on DB2, SQL Server and Oracle.]

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 19

Logical Bottleneck: Sequential Key Generation

• Consider an application in which one needs a sequential number to act as a key in a table, e.g., invoice numbers for bills.
• Ad hoc approach: a separate table holding the last invoice number. Fetch and update that number on each insert transaction.
• Counter approach: use a facility such as Sequence (Oracle) / Identity (SQL Server), as sketched below.
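A minimal sketch of the counter approach on the two systems used in the next experiment (the sequence name and cache size are illustrative assumptions, not taken from the slides):

  -- Oracle: create a sequence and use it when inserting
  create sequence invoice_seq cache 1000;      -- caching numbers avoids a log write per key
  insert into accounts values (invoice_seq.nextval, 94496, 2789);

  -- SQL Server: let an IDENTITY column generate the key
  create table accounts (
    number    int identity(1,1),
    branchnum int,
    balance   float
  );
  insert into accounts values (94496, 2789);   -- number is generated automatically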

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 20

10 Counter Facility -- data

Settings: accounts( number, branchnum, balance); create clustered index c on accounts(number);

counter ( nextkey ); insert into counter values (1);

– default isolation level: READ COMMITTED; Empty tables – Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 21

Counter Facility -- transactions

No Concurrent Transactions: – System [100 000 inserts, N threads] • SQL Server 7 (uses Identity column) insert into accounts values (94496,2789); • Oracle 8i insert into accounts values (seq.nextval,94496,2789); – Ad-hoc [100 000 inserts, N threads] begin transaction NextKey:=select nextkey from counter; update counter set nextkey = NextKey+1; commit transaction begin transaction insert into accounts values(NextKey,?,?); commit transaction

© Dennis Shasha, Philippe Bonnet - 2002 22

Avoid Bottlenecks: Counters

• A system-generated counter (system) is much better than a counter managed as an ad hoc attribute value within a table (ad hoc).
• The Oracle counter can become a bottleneck if every update is logged to disk, but caching many counter numbers is possible.
• Counters may miss ids.
[Graphs: throughput (statements/sec) vs. number of concurrent insertion threads (0-50), system vs. ad hoc, for SQL Server and Oracle.]

© Dennis Shasha, Philippe Bonnet - 2002 23

Insertion Points -- transactions

No Concurrent Transactions:
  – Sequential [100 000 inserts, N threads]: insertions into the account table with a clustered index on ssnum; data is sorted on ssnum; single insertion point.
  – Non-sequential [100 000 inserts, N threads]: insertions into the account table with a clustered index on ssnum; data is not sorted (uniform distribution); 100 000 insertion points.
  – Hashing key [100 000 inserts, N threads]: insertions into the account table with an extra attribute att and a clustered index on (ssnum, att); att contains a hash key (1021 possible values); 1021 insertion points.

© Dennis Shasha, Philippe Bonnet - 2002 24

Insertion Points

• Page locking: a single insertion point is a source of contention (sequential key with clustered index, or heap).
• Row locking: no contention between successive insertions.
• DB2 v7.1 and Oracle 8i do not support page locking.
[Graphs: throughput (tuples/sec) vs. number of concurrent insertion threads (0-60) for sequential, non-sequential and hashing-key insertions, under row locking and under page locking.]

© Dennis Shasha, Philippe Bonnet - 2002 25

Atomicity and Durability

• Every transaction either commits or aborts. It cannot change its mind.
• Even in the face of failures:
  – Effects of committed transactions should be permanent;
  – Effects of aborted transactions should leave no trace.
[State diagram: BEGIN TRANS leads to ACTIVE (running, waiting); COMMIT leads to COMMITTED; ROLLBACK leads to ABORTED.]

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 26

[Diagram: in unstable storage, the database buffer holds modified pages (Pi, Pj) and the log buffer holds log records (lri, lrj). Log records are written to the log on stable storage before commit; modified pages are written to the data disks after commit. Recovery uses the log to restore the data on stable storage.]

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 27

Log IO -- data

Settings: lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER , L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT ); – READ COMMITTED isolation level – Empty table – Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 28

14 Log IO -- transactions

No Concurrent Transactions:
  Insertions [300 000 inserts, 10 threads], e.g.,
  insert into lineitem values (1,7760,401,1,17,28351.92,0.04,0.02,'N','O','1996-03-13','1996-02-12','1996-03-22','DELIVER IN PERSON','TRUCK','blithely regular ideas caj');

© Dennis Shasha, Philippe Bonnet - 2002 29

Group Commits

• DB2 UDB v7.1 on Windows 2000
• Log records of many transactions are written together
  – Increases throughput by reducing the number of writes
  – At the cost of increased minimum response time
[Graph: throughput (tuples/sec) vs. size of group commit.]

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 30

Put the Log on a Separate Disk

• DB2 UDB v7.1 on Windows 2000
• 5% performance improvement if the log is located on a different disk
• A controller cache hides the negative impact
  – Mid-range server, with Adaptec RAID controller (80Mb RAM) and 2x18Gb disk drives.
[Graph: throughput (tuples/sec) with log on the same disk vs. log on a separate disk, with and without controller cache.]

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 31

Tuning Database Writes

• Dirty data is written to disk:
  – When the number of dirty pages is greater than a given parameter (Oracle 8)
  – When the number of dirty pages crosses a given threshold (less than 3% of free pages in the database buffer for SQL Server 7)
  – When the log is full, a checkpoint is forced. This can have a significant impact on performance.

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 32

Tune Checkpoint Intervals

• Oracle 8i on Windows 2000
• A checkpoint (partial flush of dirty pages to disk) occurs at regular intervals or when the log is full:
  – Impacts the performance of on-line processing
  + Reduces the size of the log
  + Reduces the time to recover from a crash
[Graph: throughput ratio with 0 checkpoints vs. 4 checkpoints.]

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 33

Database Buffer Size

• Buffer too small, then hit ratio too small
  hit ratio = (logical accesses - physical accesses) / (logical accesses)
• Buffer too large, paging
• Recommended strategy: monitor the hit ratio and increase the buffer size until the hit ratio flattens out. If there is still paging, then buy memory.
[Diagram: database processes read through the RAM database buffer; misses go to the log and data disks; an oversized buffer causes OS paging.]
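A small worked example of the hit-ratio formula above (the access counts are made up for illustration):

  hit ratio = (1,000,000 - 50,000) / 1,000,000 = 0.95

That is, 95% of page requests are served from the buffer; only the remaining 5% cause physical reads.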

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 34

17 Buffer Size -- data

Settings: employees(ssnum, name, lat, long, hundreds1, hundreds2); clustered index c on employees(lat); (unused) – 10 distinct values of lat and long, 100 distinct values of hundreds1 and hundreds2 – 20000000 rows (630 Mb); –Warm Buffer – Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000 RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 35

Buffer Size -- queries

Queries: – Scan Query

select sum(long) from employees; – Multipoint query

select * from employees where lat = ?;

© Dennis Shasha, Philippe Bonnet - 2002 36

Database Buffer Size

• SQL Server 7 on Windows 2000
• Scan query:
  – LRU (least recently used) does badly when the table spills to disk, as Stonebraker observed 20 years ago.
• Multipoint query:
  – Throughput increases with buffer size until all data is accessed from RAM.
[Graphs: throughput (queries/sec) vs. buffer size (0-1000 Mb) for the scan query and the multipoint query.]

© Dennis Shasha, Philippe Bonnet - 2002 37

Scan Performance -- data

Settings: lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER , L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT );

– 600 000 rows – Lineitem tuples are ~ 160 bytes long – Cold Buffer – Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000. © Dennis Shasha, Philippe Bonnet - 2002 38

19 Scan Performance -- queries

Queries: select avg(l_discount) from lineitem;

© Dennis Shasha, Philippe Bonnet - 2002 39

Prefetching

• DB2 UDB v7.1 on Windows 2000
• Scan throughput increases up to a certain point when the prefetching size increases.
[Graph: throughput (trans/sec) for the scan with prefetching sizes of 32Kb, 64Kb, 128Kb and 256Kb.]

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 40

Usage Factor

• DB2 UDB v7.1 on Windows 2000
• The usage factor is the percentage of the page used by tuples and auxiliary data structures (the rest is reserved for the future).
• Scan throughput increases with the usage factor.
[Graph: scan throughput (trans/sec) for usage factors of 70, 80, 90 and 100%.]

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 41

RAID Levels

• RAID 0: striping (no redundancy)
• RAID 1: mirroring (2 disks)
• RAID 5: parity checking
  – Read: stripes read from multiple disks (in parallel)
  – Write: 2 reads + 2 writes
• RAID 10: striping and mirroring
• Software vs. Hardware RAID:
  – Software RAID: runs on the server's CPU
  – Hardware RAID: runs on the RAID controller's CPU

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 42

Why 4 read/writes when updating a single stripe using RAID 5?

• Read old data stripe; read parity stripe (2 reads).
• XOR old data stripe with replacing one.
• Take result of XOR and XOR with parity stripe.
• Write new data stripe and new parity stripe (2 writes).
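Equivalently, the new parity can be computed from only the blocks that are read and written, which is why the other data disks need not be touched:

  new parity = old parity XOR old data XOR new data

Each parity bit flips exactly when the corresponding data bit changes, so the parity stripe remains the XOR of all data stripes.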

© Dennis Shasha, Philippe Bonnet - 2002 43

RAID Levels -- data

Settings: accounts( number, branchnum, balance); create clustered index c on accounts(number);

– 100000 rows – Cold Buffer – Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 44

22 RAID Levels -- transactions

No Concurrent Transactions: – Read Intensive: select avg(balance) from accounts; – Write Intensive, e.g. typical insert: insert into accounts values (690466,6840,2272.76); Writes are uniformly distributed.

© Dennis Shasha, Philippe Bonnet - 2002 45

RAID Levels

• SQL Server 7 on Windows 2000 (SoftRAID means striping/parity at the host).
• Read-intensive:
  – Using multiple disks (RAID 0, RAID 10, RAID 5) increases throughput significantly.
• Write-intensive:
  – Without cache, RAID 5 suffers. With cache, it is ok.
[Graphs: throughput (tuples/sec) for the read-intensive and write-intensive workloads on Soft-RAID5, RAID5, RAID0, RAID10, RAID1 and a single disk.]

© Dennis Shasha, Philippe Bonnet - 2002 46

RAID Levels

• Log file
  – RAID 1 is appropriate: fault tolerance with high write throughput. Writes are synchronous and sequential. No benefit in striping.
• Temporary files
  – RAID 0 is appropriate: no fault tolerance, high throughput.
• Data and index files
  – RAID 5 is best suited for read-intensive apps or if the RAID controller cache is effective enough.
  – RAID 10 is best suited for write-intensive apps.

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 47

Controller Prefetching: no; Write-back: yes

• Read-ahead:
  – Prefetching at the disk controller level.
  – No information on the access pattern.
  – Better to let the database management system do it.
• Write-back vs. write-through:
  – Write-back: the transfer is terminated as soon as the data is written to the cache.
    • Batteries guarantee the write-back in case of power failure.
  – Write-through: the transfer is terminated as soon as the data is written to disk.

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 48

24 SCSI Controller Cache -- data

Settings: employees(ssnum, name, lat, long, hundreds1, hundreds2); create clustered index c on employees(hundreds2); – Employees table partitioned over two disks; Log on a separate disk; same controller (same channel). – 200 000 rows per table – Database buffer size limited to 400 Mb. – Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 49

SCSI (not disk) Controller Cache -- transactions No Concurrent Transactions: update employees set lat = long, long = lat where hundreds2 = ?; – cache friendly: update of 20,000 rows (~90Mb) – cache unfriendly: update of 200,000 rows (~900Mb)

© Dennis Shasha, Philippe Bonnet - 2002 50

SCSI Controller Cache

• SQL Server 7 on Windows 2000. 2 disks, cache size 80Mb.
• Adaptec ServerRaid controller: 80Mb RAM, write-back mode.
• Updates.
• The controller cache increases throughput whether the operation is cache friendly (90Mb) or not (900Mb).
  – Efficient replacement policy!
[Graph: throughput (tuples/sec) with and without controller cache, for the cache-friendly and cache-unfriendly updates.]

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet - 2002 51

Index Tuning

• Index issues:
  – Indexes may be better or worse than scans
  – Multi-table joins that run on for hours, because the wrong indexes are defined
  – Concurrency control bottlenecks
  – Indexes that are maintained and never used

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 52

Clustered / Non-clustered Index

• Clustered index (primary index)
  – A clustered index on attribute X co-locates records whose X values are near to one another.
• Non-clustered index (secondary index)
  – A non-clustered index does not constrain table organization.
  – There might be several non-clustered indexes per table.
[Diagram: a clustered index points to contiguous records; a non-clustered index points to records scattered across pages.]

© Dennis Shasha, Philippe Bonnet - 2002 53
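A minimal sketch of defining the two kinds of index, in SQL Server syntax on the accounts table used earlier (only one clustered index is allowed per table, since it dictates the physical row order):

  create clustered index c on accounts(number);         -- rows stored in number order
  create nonclustered index nc on accounts(branchnum);  -- separate structure pointing to rows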

Dense / Sparse Index

• Sparse index
  – Pointers are associated with pages.
• Dense index
  – Pointers are associated with records.
  – Non-clustered indexes are dense.
[Diagram: a sparse index entry points to a page (P1, P2, ..., Pi); a dense index entry points to an individual record.]

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 54

Index Implementations in some major DBMS

• SQL Server
  – B+-tree data structure
  – Clustered indexes are sparse
  – Indexes are maintained as updates/insertions/deletes are performed
• DB2
  – B+-tree data structure, spatial extender for R-tree
  – Clustered indexes are dense
  – Explicit command for index reorganization
• Oracle
  – B+-tree, hash, bitmap, spatial extender for R-tree
  – Clustered index = index-organized table (unique/clustered)
  – Clusters used when creating tables
• TimesTen (main-memory DBMS)
  – T-tree

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 55

Types of Queries

• Point Query
  SELECT balance
  FROM accounts
  WHERE number = 1023;
• Multipoint Query
  SELECT balance
  FROM accounts
  WHERE branchnum = 100;
• Range Query
  SELECT number
  FROM accounts
  WHERE balance > 10000 and balance <= 20000;
• Prefix Match Query
  SELECT *
  FROM employees
  WHERE name = 'J*';

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 56

More Types of Queries

• Extremal Query
  SELECT *
  FROM accounts
  WHERE balance = max(select balance from accounts);
• Grouping Query
  SELECT branchnum, avg(balance)
  FROM accounts
  GROUP BY branchnum;
• Join Query
  SELECT distinct branch.adresse
  FROM accounts, branch
  WHERE accounts.branchnum = branch.number
    and accounts.balance > 10000;
• Ordering Query
  SELECT *
  FROM accounts
  ORDER BY balance;

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 57

Index Tuning -- data

Settings: employees(ssnum, name, lat, long, hundreds1, hundreds2); clustered index c on employees(hundreds1) with fillfactor = 100; nonclustered index nc on employees (hundreds2); index nc3 on employees (ssnum, name, hundreds2); index nc4 on employees (lat, ssnum, name); – 1000000 rows ; Cold buffer – Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 58

Index Tuning -- operations

Operations:
  – Update:
    update employees set name = 'XXX' where ssnum = ?;
  – Insert:
    insert into employees values (1003505,'polo94064',97.48,84.03,4700.55,3987.2);
  – Multipoint query:
    select * from employees where hundreds1 = ?;
    select * from employees where hundreds2 = ?;
  – Covered query:
    select ssnum, name, lat from employees;
  – Range query:
    select * from employees where long between ? and ?;
  – Point query:
    select * from employees where ssnum = ?;

© Dennis Shasha, Philippe Bonnet - 2002 59

Clustered Index

• Multipoint query that returns 100 records out of 1000000.
• Cold buffer.
• The clustered index is twice as fast as the non-clustered index and orders of magnitude faster than a scan.
[Graph: throughput ratio for clustered index, non-clustered index and no index, on SQL Server, Oracle and DB2.]

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 60

Index "Face Lifts"

• The index is created with fillfactor = 100.
• Insertions cause page splits and extra I/O for each query.
• Maintenance consists in dropping and recreating the index.
• With maintenance, performance is constant, while without maintenance performance degrades significantly.
[Graph: SQL Server throughput (queries/sec) vs. % increase in table size (0-100), with and without maintenance.]

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 61

Index Maintenance

• In Oracle, clustered indexes are approximated by an index defined on a clustered table.
• No automatic physical reorganization.
• Index defined with pctfree = 0.
• Overflow pages cause performance degradation.
[Graph: Oracle throughput (queries/sec) vs. % increase in table size (0-100), no maintenance.]

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 62

Covering Index - defined

• Select name from employee where department = "marketing"
• A good covering index would be on (department, name).
• An index on (name, department) is less useful.
• An index on department alone is moderately useful.

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 63

Covering Index - impact

• A covering index performs better than a clustering index when the first attributes of the index are in the where clause and the last attributes are in the select.
• When the attributes are not in order, performance is much worse.
[Graph: SQL Server throughput (queries/sec) for covering, covering not ordered, non-clustering and clustering indexes.]

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 64

Scan Can Sometimes Win

• IBM DB2 v7.1 on Windows 2000; range query.
• If a query retrieves 10% of the records or more, scanning is often better than using a non-clustering, non-covering index.
• The crossover is above 10% when records are large or the table is fragmented on disk, since the scan cost increases.
[Graph: throughput (queries/sec) vs. % of selected records (0-25) for scan and non-clustering index.]

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 65

Index on Small Tables

• Small table: 100 records, i.e., a few pages.
• Two concurrent processes perform updates (each process works for 10ms before it commits).
• No index: the table is scanned for each update. No concurrent updates.
• A clustered index allows the updates to take advantage of row locking.
[Graph: throughput (updates/sec) with no index vs. with an index.]

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 66

33 Bitmap vs. Hash vs. B+-Tree

Settings: employees(ssnum, name, lat, long, hundreds1, hundreds2); create cluster c_hundreds (hundreds2 number(8)) PCTFREE 0; create cluster c_ssnum(ssnum integer) PCTFREE 0 size 60;

create cluster c_hundreds(hundreds2 number(8)) PCTFREE 0 HASHKEYS 1000 size 600; create cluster c_ssnum(ssnum integer) PCTFREE 0 HASHKEYS 1000000 SIZE 60;

create bitmap index b on employees (hundreds2); create bitmap index b2 on employees (ssnum);

– 1000000 rows; Cold buffer
– Dual Xeon (550MHz, 512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 67

Multipoint query: B-Tree, Hash Tree, Bitmap

• There is an overflow chain in a hash index.
• In a clustered B-tree index, records are on contiguous pages.
• A bitmap is proportional to the size of the table and non-clustered for record access.
[Graph: multipoint query throughput (queries/sec) for B-tree, hash index and bitmap index.]

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 68

B-Tree, Hash Tree, Bitmap

• Range queries: hash indexes don't help when evaluating range queries.
• Point queries: a hash index outperforms a B-tree on point queries.
[Graphs: range query throughput (queries/sec) for B-tree, hash index and bitmap index; point query throughput (queries/sec) for B-tree and hash index.]

3 - Index Tuning © Dennis Shasha, Philippe Bonnet - 2002 69

Tuning Relational Systems

• Schema Tuning
  – Denormalization
• Query Tuning
  – Query rewriting
  – Materialized views

4 - Relational Systems © Dennis Shasha, Philippe Bonnet - 2002 70

35 Denormalizing -- data

Settings: lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT ); region( R_REGIONKEY, R_NAME, R_COMMENT ); nation( N_NATIONKEY, N_NAME, N_REGIONKEY, N_COMMENT,); supplier( S_SUPPKEY, S_NAME, S_ADDRESS, S_NATIONKEY, S_PHONE, S_ACCTBAL, S_COMMENT); – 600000 rows in lineitem, 25 nations, 5 regions, 500 suppliers

© Dennis Shasha, Philippe Bonnet - 2002 71

Denormalizing -- transactions

  lineitemdenormalized ( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_REGIONNAME );
  – 600000 rows in lineitemdenormalized
  – Cold Buffer
  – Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 72

36 Queries on Normalized vs. Denormalized Schemas Queries: select L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, R_NAME from LINEITEM, REGION, SUPPLIER, NATION where L_SUPPKEY = S_SUPPKEY and S_NATIONKEY = N_NATIONKEY and N_REGIONKEY = R_REGIONKEY and R_NAME = 'EUROPE';

select L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_REGIONNAME from LINEITEMDENORMALIZED where L_REGIONNAME = 'EUROPE';

© Dennis Shasha, Philippe Bonnet - 2002 73

Denormalization

• TPC-H schema.
• Query: find all lineitems whose supplier is in Europe.
• With a normalized schema this query is a 4-way join.
• If we denormalize lineitem and add the name of the region for each lineitem (foreign key denormalization), throughput improves by 30%.
[Graph: throughput (queries/sec) for the normalized vs. denormalized schema.]

4 - Relational Systems © Dennis Shasha, Philippe Bonnet - 2002 74

37 Queries

Settings: employee(ssnum, name, dept, salary, numfriends); student(ssnum, name, course, grade); techdept(dept, manager, location); clustered index i1 on employee (ssnum); nonclustered index i2 on employee (name); nonclustered index i3 on employee (dept); clustered index i4 on student (ssnum); nonclustered index i5 on student (name); clustered index i6 on techdept (dept); – 100000 rows in employee, 100000 students, 10 departments; Cold buffer – Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 75

Queries – View on Join

View Techlocation: create view techlocation as select ssnum, techdept.dept, location from employee, techdept where employee.dept = techdept.dept; Queries: –Original: select dept from techlocation where ssnum = ?; – Rewritten: select dept from employee where ssnum = ?;

© Dennis Shasha, Philippe Bonnet - 2002 76

Query Rewriting - Views

• All systems expand the selection on a view into a join.
• The difference between a plain selection and a join (on a primary key-foreign key) followed by a projection is greater on SQL Server than on Oracle and DB2 v7.1.
[Graph: throughput improvement (percent) of the rewritten query over the view query for SQL Server 2000, Oracle 8i and DB2 V7.1.]

4 - Relational Systems © Dennis Shasha, Philippe Bonnet - 2002 77

Queries – Correlated Subqueries

Queries:
  – Original:
    select ssnum from employee e1
    where salary = (select max(salary) from employee e2 where e2.dept = e1.dept);
  – Rewritten:
    select max(salary) as bigsalary, dept into TEMP
    from employee
    group by dept;
    select ssnum
    from employee, TEMP
    where salary = bigsalary and employee.dept = temp.dept;

© Dennis Shasha, Philippe Bonnet - 2002 78

Query Rewriting – Correlated Subqueries

• SQL Server 2000 does a good job of handling correlated subqueries (a hash join is used as opposed to a nested loop between query blocks).
  – The techniques implemented in SQL Server 2000 are described in "Orthogonal Optimization of Subqueries and Aggregates" by C. Galindo-Legaria and M. Joshi, SIGMOD 2001.
[Graph: throughput improvement (percent) from rewriting the correlated subquery on SQL Server 2000, Oracle 8i and DB2 V7.1; two bars exceed the chart scale (> 1000%, > 10000%).]

Aggregate Maintenance -- data

Settings: orders( ordernum, itemnum, quantity, purchaser, vendor ); create clustered index i_order on orders(itemnum); store( vendor, name ); item(itemnum, price); create clustered index i_item on item(itemnum); vendorOutstanding( vendor, amount); storeOutstanding( store, amount);

– 1000000 orders, 10000 stores, 400000 items; Cold buffer – Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 80

40 Aggregate Maintenance -- triggers Triggers for Aggregate Maintenance create trigger updateVendorOutstanding on orders for insert as update vendorOutstanding set amount = (select vendorOutstanding.amount+sum(inserted.quantity*item.price) from inserted,item where inserted.itemnum = item.itemnum ) where vendor = (select vendor from inserted) ; create trigger updateStoreOutstanding on orders for insert as update storeOutstanding set amount = (select storeOutstanding.amount+sum(inserted.quantity*item.price) from inserted,item where inserted.itemnum = item.itemnum ) where store = (select store.name from inserted, store where inserted.vendor = store.vendor) ;

© Dennis Shasha, Philippe Bonnet - 2002 81

Aggregate Maintenance -- transactions

Concurrent Transactions:
  – Insertions:
    insert into orders values (1000350,7825,562,'xxxxxx6944','vendor4');
  – Queries (first without, then with the redundant tables):
    select orders.vendor, sum(orders.quantity*item.price)
    from orders, item
    where orders.itemnum = item.itemnum
    group by orders.vendor;
    vs. select * from vendorOutstanding;

    select store.name, sum(orders.quantity*item.price)
    from orders, item, store
    where orders.itemnum = item.itemnum and orders.vendor = store.vendor
    group by store.name;
    vs. select * from storeOutstanding;

© Dennis Shasha, Philippe Bonnet - 2002 82

Aggregate Maintenance

• SQL Server 2000 on Windows 2000.
• Using triggers for view maintenance.
• If queries are frequent or important, then aggregate maintenance is good.
[Graph: percent gain with aggregate maintenance: about +21,900% for the vendor total query, +31,900% for the store total query, and -62.2% for inserts.]

4 - Relational Systems © Dennis Shasha, Philippe Bonnet - 2002 83

Superlinearity -- data

Settings: sales( id, itemid, customerid, storeid, amount, quantity); item (itemid); customer (customerid); store (storeid); A sale is successful if all foreign keys are present.

successfulsales(id, itemid, customerid, storeid, amount, quantity);

unsuccessfulsales(id, itemid, customerid, storeid, amount, quantity);

tempsales( id, itemid, customerid, storeid, amount,quantity); © Dennis Shasha, Philippe Bonnet - 2002 84

42 Superlinearity -- indexes

Settings (non-clustering, dense indexes): index s1 on item(itemid); index s2 on customer(customerid); index s3 on store(storeid);

index succ on successfulsales(id); – 1000000 sales, 400000 customers, 40000 items, 1000 stores – Cold buffer – Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 85

Superlinearity -- queries

Queries:
  – Insert/delete:
    insert into successfulsales
    select sales.id, sales.itemid, sales.customerid, sales.storeid, sales.amount, sales.quantity
    from sales, item, customer, store
    where sales.itemid = item.itemid
      and sales.customerid = customer.customerid
      and sales.storeid = store.storeid;

insert into unsuccessfulsales select * from sales; go delete from unsuccessfulsales where id in (select id from successfulsales)

© Dennis Shasha, Philippe Bonnet - 2002 86

43 Superlinearity -- batch queries

Queries: – Small batches DECLARE @Nlow INT; DECLARE @Nhigh INT; DECLARE @INCR INT; set @INCR = 100000 set @NLow = 0 set @Nhigh = @INCR WHILE (@NLow <= 500000) BEGIN insert into tempsales select * from sales where id between @NLow and @Nhigh set @Nlow = @Nlow + @INCR set @Nhigh = @Nhigh + @INCR delete from tempsales where id in (select id from successfulsales); insert into unsuccessfulsales select * from tempsales; delete from tempsales; END © Dennis Shasha, Philippe Bonnet - 2002 87

Superlinearity -- outer join

Queries: –outerjoin insert into successfulsales select sales.id, item.itemid, customer.customerid, store.storeid, sales.amount, sales.quantity from ((sales left outer join item on sales.itemid = item.itemid) left outer join customer on sales.customerid = customer.customerid) left outer join store on sales.storeid = store.storeid;

insert into unsuccessfulsales select * from successfulsales where itemid is null or customerid is null or storeid is null; go delete from successfulsales where itemid is null or customerid is null or storeid is null© Dennis Shasha, Philippe Bonnet - 2002 88

Circumventing Superlinearity

• SQL Server 2000.
• The outer join achieves the best response time.
• Small batches do not help because the overhead of crossing the application interface is higher than the benefit of joining with smaller tables.
[Graph: response time (sec) on the small and large data sets for insert/delete without index, insert/delete indexed, small batches, and outer join.]

4 - Relational Systems © Dennis Shasha, Philippe Bonnet - 2002 89

Tuning the Application Interface

• 4GL
  – Power++, Visual Basic
• Programming language + call-level interface
  – ODBC: Open DataBase Connectivity
  – JDBC: Java-based API
  – OCI (C++/Oracle), CLI (C++/DB2), Perl/DBI
• In the following experiments, the client program is located on the database server site. The overhead is due to crossing the application interface.

5 - Tuning the API © Dennis Shasha, Philippe Bonnet - 2002 90

45 Looping can hurt -- data

Settings:

lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT ); – 600 000 rows; warm buffer. – Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 91

Looping can hurt -- queries

• Queries: – No loop: sqlStmt = “select * from lineitem where l_partkey <= 200;” odbc->prepareStmt(sqlStmt); odbc->execPrepared(sqlStmt);

– Loop: sqlStmt = “select * from lineitem where l_partkey = ?;” odbc->prepareStmt(sqlStmt); for (int i=1; i<100; i++) { odbc->bindParameter(1, SQL_INTEGER, i); odbc->execPrepared(sqlStmt); }

© Dennis Shasha, Philippe Bonnet - 2002 92

Looping can Hurt

• SQL Server 2000 on Windows 2000.
• Crossing the application interface has a significant impact on performance.
• Why would a programmer use a loop instead of relying on set-oriented operations? Object orientation?
[Graph: throughput (records/sec) for the loop vs. the no-loop query.]

5 - Tuning the API © Dennis Shasha, Philippe Bonnet - 2002 93

Cursors are Death -- data

Settings: employees(ssnum, name, lat, long, hundreds1, hundreds2);

– 100000 rows ; Cold buffer – Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 94

47 Cursors are Death -- queries

Queries: –No cursor select * from employees; – Cursor DECLARE d_cursor CURSOR FOR select * from employees; OPEN d_cursor while (@@FETCH_STATUS = 0) BEGIN FETCH NEXT from d_cursor END CLOSE d_cursor go

© Dennis Shasha, Philippe Bonnet - 2002 95

Cursors are Death

• SQL Server 2000 on Windows 2000.
• Response time is a few seconds with a SQL query and more than an hour iterating over a cursor.
[Graph: throughput (records/sec) for the cursor vs. the SQL query.]

5 - Tuning the API © Dennis Shasha, Philippe Bonnet - 2002 96

48 Retrieve Needed Columns Only - data Settings: lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT ); create index i_nc_lineitem on lineitem (l_orderkey, l_partkey, l_suppkey, l_shipdate, l_commitdate); – 600 000 rows; warm buffer. – Lineitem records are ~ 10 bytes long – Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 97

Retrieve Needed Columns Only - queries Queries: –All Select * from lineitem; – Covered subset Select l_orderkey, l_partkey, l_suppkey, l_shipdate, l_commitdate from lineitem;

© Dennis Shasha, Philippe Bonnet - 2002 98

Retrieve Needed Columns Only

• Avoid transferring unnecessary data.
• May enable the use of a covering index.
• In the experiment the subset contains 1/4 of the attributes.
  – Reducing the amount of data that crosses the application interface yields a significant performance improvement.
• Experiment performed on Oracle 8i EE on Windows 2000.
[Graph: throughput (queries/msec) for "all" vs. "covered subset", with and without the index.]

5 - Tuning the API © Dennis Shasha, Philippe Bonnet - 2002 99

Bulk Loading Data

Settings:

lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT ); – Initially the table is empty; 600 000 rows to be inserted (138Mb) – Table sits one disk. No constraint, index is defined. – Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 100

50 Bulk Loading Queries

Oracle 8i sqlldr directpath=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl

load data
infile "lineitem.tbl"
into table LINEITEM append
fields terminated by '|'
( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE DATE "YYYY-MM-DD", L_COMMITDATE DATE "YYYY-MM-DD", L_RECEIPTDATE DATE "YYYY-MM-DD", L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT )

© Dennis Shasha, Philippe Bonnet - 2002 101

Direct Path

• Direct path loading bypasses the query engine and the storage manager. It is orders of magnitude faster than conventional bulk load (commit every 100 records) and than inserts (commit for each record).
• Experiment performed on Oracle 8i EE on Windows 2000.
[Graph: throughput (rec/sec) for insert, conventional bulk load and direct path load.]

5 - Tuning the API © Dennis Shasha, Philippe Bonnet - 2002 102

Batch Size

• Throughput increases steadily until the batch size reaches about 100,000 records; it remains constant afterwards.
• Trade-off between performance and the amount of data that has to be reloaded in case of a problem.
• Experiment performed on SQL Server 2000 on Windows 2000.
[Graph: throughput (records/sec) vs. batch size (0-600,000 records).]
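A hedged sketch of how the batch size is typically specified for a SQL Server bulk load (the file path is an illustrative assumption; BATCHSIZE makes each batch of rows its own transaction, which is the knob varied in the graph above):

  BULK INSERT lineitem
  FROM 'C:\data\lineitem.tbl'
  WITH (
    FIELDTERMINATOR = '|',   -- TPC-H text files are '|'-delimited
    BATCHSIZE = 100000       -- commit every 100,000 rows
  );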

5 - Tuning the API © Dennis Shasha, Philippe Bonnet - 2002 103

Tuning E-Commerce Applications Database-backed web-sites: – Online shops – Shop comparison portals – MS TerraServer

6 - Ecommerce © Dennis Shasha, Philippe Bonnet - 2002 104

E-commerce Application Architecture

[Diagram: clients connect through web caches to web servers, which call application servers (with DB caches), which query the database server.]

6 - Ecommerce © Dennis Shasha, Philippe Bonnet - 2002 105

E-commerce Application Workload

• Touristic searching (frequent, cached)
  – Access the top few pages. Pages may be personalized. Data may be out of date.
• Category searching (frequent, partly cached, need for timeliness guarantees)
  – Down some hierarchy, e.g., men's clothing.
• Keyword searching (frequent, uncached, need for timeliness guarantees)
• Shopping cart interactions (rare, but transactional)
• Electronic purchasing (rare, but transactional)

6 - Ecommerce © Dennis Shasha, Philippe Bonnet - 2002 106

Design Issues

• Need to keep historic information
  – Electronic payment acknowledgements get lost.
• Preparation for variable load
  – Regular patterns of web site accesses during the day and within a week.
• Possibility of disconnections
  – State information transmitted to the client (cookies).
• Special consideration for low bandwidth
• Schema evolution
  – Representing e-commerce data as attribute-value pairs (IBM Websphere)

6 - Ecommerce © Dennis Shasha, Philippe Bonnet - 2002 107

Caching

• Web cache:
  – Static web pages
  – Caching fragments of dynamically created web pages
• Database cache (Oracle9iAS, TimesTen's FrontTier):
  – Materialized views represent the cached data.
  – Queries are executed either using the database cache or the database server. Updates are propagated to keep the cache(s) consistent.
  – Note to vendors: it would be good to have queries distributed between cache and server.

6 - Ecommerce © Dennis Shasha, Philippe Bonnet - 2002 108

54 Ecommerce -- setting

Settings:

shoppingcart( shopperid, itemid, price, qty); – 500000 rows; warm buffer – Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 109

Ecommerce -- transactions

Concurrent Transactions: –Mix insert into shoppingcart values (107999,914,870,214); update shoppingcart set Qty = 10 where shopperid = 95047 and itemid = 88636; delete from shoppingcart where shopperid = 86123 and itemid = 8321; select shopperid, itemid, qty, price from shoppingcart where shopperid = ?; – Queries Only select shopperid, itemid, qty, price from shoppingcart where shopperid = ?;

© Dennis Shasha, Philippe Bonnet - 2002 110

Connection Pooling (no refusals)

• Each thread establishes a connection and performs 5 insert statements.
• If a connection cannot be established, the thread waits 15 secs before trying again.
• The number of connections is limited to 60 on the database server.
• Using connection pooling, the requests are queued and serviced when possible. There are no refused connections.
• Experiment performed on Oracle 8i on Windows 2000.
[Graph: response time vs. number of client threads (20-140) for connection pooling vs. simple connections.]

6 - Ecommerce © Dennis Shasha, Philippe Bonnet - 2002 111

Indexing

• Using a clustered index on shopperid in the shopping cart provides:
  – Query speed-up
  – Update/deletion speed-up
• Experiment performed on SQL Server 2000 on Windows 2000.
[Graph: throughput (queries/sec) for the mix and queries-only workloads, with no index vs. a clustered index.]

6 - Ecommerce © Dennis Shasha, Philippe Bonnet - 2002 112

Capacity Planning

• Arrival Rate
  – A1 is given as an assumption
  – A2 = (0.4 x A1) + (0.5 x A2)
  – A3 = 0.1 x A2
• Service Time (S)
  – S1, S2, S3 are measured
• Utilization
  – U = A x S
• Response Time
  – R = U/(A(1-U)) = S/(1-U)  (assuming Poisson arrivals)
• Getting the demand assumptions right is what makes capacity planning hard.
[Diagram: three stations, Entry (S1), Search (S2) and Checkout (S3); 0.4 of entries go to search, 0.5 of searches re-enter search, and 0.1 of searches go to checkout.]
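A small worked example of the formulas above (the numbers are illustrative assumptions): take A1 = 10 requests/sec and S2 = 0.05 sec at the search station. Then

  A2 = 0.4 x 10 + 0.5 x A2  =>  A2 = 4 / 0.5 = 8 requests/sec
  U2 = A2 x S2 = 8 x 0.05 = 0.4
  R2 = S2 / (1 - U2) = 0.05 / 0.6 ≈ 0.083 sec

As utilization approaches 1 the response time blows up, which is why getting the arrival-rate assumptions right matters so much.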

6 - Ecommerce © Dennis Shasha, Philippe Bonnet - 2002 113

Datawarehouse Tuning

• Aggregate (strategic) targeting:
  – Aggregates flow up from a wide selection of data, and then
  – Targeted decisions flow down
• Examples:
  – Riding the wave of clothing fads
  – Tracking delays for frequent-flyer customers

7 - Datawarehouse © Dennis Shasha, Philippe Bonnet - 2002 114

Data Warehouse Workload

• Broad
  – Aggregate queries over ranges of values, e.g., find the total sales by region and quarter.
• Deep
  – Queries that require precise individualized information, e.g., which frequent flyers have been delayed several times in the last month?
• Dynamic (vs. Static)
  – Queries that require up-to-date information, e.g., which nodes have the highest traffic now?

7 - Datawarehouse © Dennis Shasha, Philippe Bonnet - 2002 115

Tuning Knobs

• Indexes
• Materialized views
• Approximation

7 - Datawarehouse © Dennis Shasha, Philippe Bonnet - 2002 116

58 Bitmaps -- data

Settings: lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT ); create bitmap index b_lin_2 on lineitem(l_returnflag); create bitmap index b_lin_3 on lineitem(l_linestatus); create bitmap index b_lin_4 on lineitem(l_linenumber); – 100000 rows ; cold buffer – Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 117

Bitmaps -- queries

Queries: – 1 attribute select count(*) from lineitem where l_returnflag = 'N'; – 2 attributes select count(*) from lineitem where l_returnflag = 'N' and l_linenumber > 3; – 3 attributes select count(*) from lineitem where l_returnflag = 'N' and l_linenumber > 3 and l_linestatus = 'F';

© Dennis Shasha, Philippe Bonnet - 2002 118

Bitmaps

• An order of magnitude improvement compared to a scan.
• Bitmaps are best suited for multiple conditions on several attributes, each having a low selectivity.
[Graph: throughput (queries/sec) for linear scan vs. bitmap, for queries on 1, 2 and 3 attributes (l_returnflag, l_linestatus, l_linenumber).]

7 - Datawarehouse © Dennis Shasha, Philippe Bonnet - 2002 119

Multidimensional Indexes -- data

Settings: create table spatial_facts ( a1 int, a2 int, a3 int, a4 int, a5 int, a6 int, a7 int, a8 int, a9 int, a10 int, geom_a3_a7 mdsys.sdo_geometry ); create index r_spatialfacts on spatial_facts(geom_a3_a7) indextype is mdsys.spatial_index; create bitmap index b2_spatialfacts on spatial_facts(a3,a7); – 500000 rows ; cold buffer – Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

© Dennis Shasha, Philippe Bonnet - 2002 120

60 Multidimensional Indexes -- queries Queries: – Point Queries select count(*) from fact where a3 = 694014 and a7 = 928878;

select count(*) from spatial_facts where SDO_RELATE(geom_a3_a7, MDSYS.SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(694014,928878, NULL), NULL, NULL), 'mask=equal querytype=WINDOW') = 'TRUE'; – Range Queries select count(*) from spatial_facts where SDO_RELATE(geom_a3_a7, mdsys.sdo_geometry(2003,NULL,NULL, mdsys.sdo_elem_info_array(1,1003,3),mdsys.sdo_ordinate_array (10,800000,1000000,1000000)), 'mask=inside querytype=WINDOW') = 'TRUE'; select count(*) from spatial_facts where a3 > 10 and a3 < 1000000 and a7 > 800000 and a7 < 1000000;

© Dennis Shasha, Philippe Bonnet - 2002 121

Multidimensional Indexes

• Oracle 8i on Windows 2000
• Spatial extension:
  – 2-dimensional data
  – Spatial functions used in the query
• The R-tree does not perform well because of the overhead of the spatial extension.
[Graph: response time (sec) for the point query and the range query, bitmap on two attributes vs. R-tree.]

7 - Datawarehouse © Dennis Shasha, Philippe Bonnet - 2002 122

Multidimensional Indexes

R-Tree plan:
  SELECT STATEMENT
    SORT AGGREGATE
      TABLE ACCESS BY INDEX ROWID SPATIAL_FACTS
        DOMAIN INDEX R_SPATIALFACTS

Bitmap plan:
  SELECT STATEMENT
    SORT AGGREGATE
      BITMAP CONVERSION COUNT
        BITMAP AND
          BITMAP INDEX SINGLE VALUE B_FACT7
          BITMAP INDEX SINGLE VALUE B_FACT3

© Dennis Shasha, Philippe Bonnet - 2002 123

Materialized Views -- data

Settings: orders( ordernum, itemnum, quantity, purchaser, vendor ); create clustered index i_order on orders(itemnum);

store( vendor, name );

item(itemnum, price); create clustered index i_item on item(itemnum);

– 1000000 orders, 10000 stores, 400000 items; Cold buffer – Oracle 9i – Pentium III (1 GHz, 256 Kb), 1Gb RAM, Adapter 39160 with 2 channels, 3x18Gb drives (10000RPM), Linux Debian 2.4.

© Dennis Shasha, Philippe Bonnet - 2002 124

62 Materialized Views -- data

Settings: create materialized view vendorOutstanding build immediate refresh complete enable query rewrite as select orders.vendor, sum(orders.quantity*item.price) from orders,item where orders.itemnum = item.itemnum group by orders.vendor;

© Dennis Shasha, Philippe Bonnet - 2002 125

Materialized Views -- transactions Concurrent Transactions: – Insertions insert into orders values (1000350,7825,562,'xxxxxx6944','vendor4');

–Queries select orders.vendor, sum(orders.quantity*item.price) from orders,item where orders.itemnum = item.itemnum group by orders.vendor; select * from vendorOutstanding;

© Dennis Shasha, Philippe Bonnet - 2002 126

Materialized Views

• Graphs: Oracle 9i on Linux; the total sale by vendor is materialized.
• Trade-off between query speed-up and view maintenance:
  – The impact of incremental maintenance on performance is significant.
  – Rebuild maintenance achieves a good throughput.
  – A static data warehouse offers a good trade-off.
[Graphs: query response time (sec) with vs. without the materialized view; throughput (statements/sec) for the materialized view with fast refresh on commit, with complete refresh on demand, and with no materialized view.]

© Dennis Shasha, Philippe Bonnet - 2002 127

Materialized View Maintenance

• Problem when there is a large number of views to maintain.
• The order in which views are maintained is important:
  – A view can be computed from an existing view instead of being recomputed from the base relations (the total per region can be computed from the total per nation).
• Algorithm:
  – Let the views and base tables be nodes v_i.
  – Let there be an edge from v_1 to v_2 if it is possible to compute the view v_2 from v_1. Associate the cost of computing v_2 from v_1 with this edge.
  – Compute all-pairs shortest paths, where the start nodes are the set of base tables.
  – The result is an acyclic graph A. Take a topological sort of A and let that be the order of view construction.

© Dennis Shasha, Philippe Bonnet - 2002 128

64 Approximations -- data

Settings: – TPC-H schema – Approximations insert into approxlineitem select top 6000 * from lineitem where l_linenumber = 4;

insert into approxorders select O_ORDERKEY, O_CUSTKEY, O_ORDERSTATUS, O_TOTALPRICE, O_ORDERDATE, O_ORDERPRIORITY, O_CLERK, O_SHIPPRIORITY, O_COMMENT from orders, approxlineitem where o_orderkey = l_orderkey; © Dennis Shasha, Philippe Bonnet - 2002 129

Approximations -- queries

insert into approxsupplier
  select distinct S_SUPPKEY, S_NAME, S_ADDRESS, S_NATIONKEY, S_PHONE, S_ACCTBAL, S_COMMENT
  from approxlineitem, supplier
  where s_suppkey = l_suppkey;

insert into approxpart
  select distinct P_PARTKEY, P_NAME, P_MFGR, P_BRAND, P_TYPE, P_SIZE, P_CONTAINER, P_RETAILPRICE, P_COMMENT
  from approxlineitem, part
  where p_partkey = l_partkey;

insert into approxpartsupp
  select distinct PS_PARTKEY, PS_SUPPKEY, PS_AVAILQTY, PS_SUPPLYCOST, PS_COMMENT
  from partsupp, approxpart, approxsupplier
  where ps_partkey = p_partkey and ps_suppkey = s_suppkey;

insert into approxcustomer
  select distinct C_CUSTKEY, C_NAME, C_ADDRESS, C_NATIONKEY, C_PHONE, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT
  from customer, approxorders
  where o_custkey = c_custkey;

insert into approxregion select * from region;
insert into approxnation select * from nation;

© Dennis Shasha, Philippe Bonnet - 2002 130

65 Approximations -- more queries

Queries: – Single table query on lineitem select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where datediff(day, l_shipdate, '1998-12-01') <= '120' group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;

© Dennis Shasha, Philippe Bonnet - 2002 131

Approximations -- still more

Queries:
  – 6-way join:
    select n_name, avg(l_extendedprice * (1 - l_discount)) as revenue
    from customer, orders, lineitem, supplier, nation, region
    where c_custkey = o_custkey
      and l_orderkey = o_orderkey
      and l_suppkey = s_suppkey
      and c_nationkey = s_nationkey
      and s_nationkey = n_nationkey
      and n_regionkey = r_regionkey
      and r_name = 'AFRICA'
      and o_orderdate >= '1993-01-01'
      and datediff(year, o_orderdate, '1993-01-01') < 1
    group by n_name
    order by revenue desc;

© Dennis Shasha, Philippe Bonnet - 2002 132

Approximation Accuracy

• Good approximation for query Q1 on lineitem.
• The aggregated values obtained on the query with a 6-way join are significantly different from the actual values -- for some applications they may still be good enough.
[Graphs: approximated aggregate values as a percentage of the base aggregate values, for 1% and 10% samples; TPC-H query Q1 on lineitem (8 aggregates) and TPC-H query Q5 (6-way join, 5 groups returned).]

7 - Datawarehouse © Dennis Shasha, Philippe Bonnet - 2002 133

Approximation Speedup

[Chart: response time (sec) for TPC-H queries Q1 and Q5 with the 1% and 10% samples]

• Aqua approximation on the TPC-H schema base relations – 1% and 10% lineitem samples propagated.
• The query speed-up obtained with approximated relations is significant.

7 - Datawarehouse © Dennis Shasha, Philippe Bonnet - 2002 134

Tuning Distributed Applications

• Queries across multiple databases
  – Federated Datawarehouse
  – IBM's DataJoiner, now integrated in DB2 v7.2
  – The source should perform as much work as possible and return as little data as possible for processing at the federated server (see the sketch after this slide)
• Processing data across multiple databases

8 - Distributed Apps © Dennis Shasha, Philippe Bonnet - 2002 135
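A hedged illustration of the push-work-to-the-source idea. The pyodbc driver, the "federated" DSN, and the remote nickname remote_lineitem are assumptions made for this example, not part of the original material; the point is only where the filter and the aggregation run.

import pyodbc  # assumption: any DB-API driver to the federated server

conn = pyodbc.connect("DSN=federated")  # hypothetical DSN
cur = conn.cursor()

# Anti-pattern: ships every remote row to the federated server, which then
# filters and groups the data locally.
pull_everything = """
    select l_returnflag, l_linestatus, l_quantity, l_shipdate
    from remote_lineitem
"""

# Better: written so the remote source can evaluate the filter and the
# aggregate, and only a handful of summary rows cross the network.
push_down = """
    select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty
    from remote_lineitem
    where l_shipdate <= '1998-12-01'
    group by l_returnflag, l_linestatus
"""

cur.execute(push_down)
for row in cur.fetchall():
    print(row)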

A puzzle

• Two databases X and Y
  – X records inventory data to be used for restocking
  – Y contains sales data about shipments
• You want to improve shipping
  – Certain data of X should be postprocessed on Y shortly after it enters X
  – You are allowed to add/change transactions on X and Y
  – You want to avoid losing data from X and you want to avoid double-processing data on Y, even in the face of failures.

8 - Distributed Apps © Dennis Shasha, Philippe Bonnet - 2002 136

Two-Phase Commit

[Diagram: X = Source (coordinator & participant), Y = Destination (participant); commit messages flow between them]

+ Commits are coordinated between the source and the destination
- If one participant fails, then blocking can occur
- Not all db systems support a prepare-to-commit interface

(A sketch of the coordinator's protocol follows this slide.)

8 - Distributed Apps © Dennis Shasha, Philippe Bonnet - 2002 137
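A minimal sketch of the coordinator side of two-phase commit. The Participant class is a hypothetical stand-in for the prepare-to-commit interface a real system would expose (e.g., through XA); it is not a vendor API.

class Participant:
    """Hypothetical participant; real systems expose prepare/commit/abort
    through an XA-style prepare-to-commit interface."""
    def __init__(self, name, will_prepare=True):
        self.name, self.will_prepare = name, will_prepare

    def prepare(self):
        # Phase 1 vote: make the transaction durable but not yet visible.
        return self.will_prepare

    def commit(self):
        print(f"{self.name}: committed")

    def abort(self):
        print(f"{self.name}: aborted")


def two_phase_commit(participants):
    # Phase 1: collect votes. Any "no" vote (or timeout) forces a global abort.
    prepared = []
    for p in participants:
        if p.prepare():
            prepared.append(p)
        else:
            for q in prepared:
                q.abort()
            return False
    # Phase 2: the decision is commit; tell everyone. A participant that
    # loses contact with the coordinator after voting yes must block,
    # holding its locks, until it learns the decision -- the drawback
    # noted on the slide.
    for p in participants:
        p.commit()
    return True


if __name__ == "__main__":
    ok = two_phase_commit([Participant("X (source)"), Participant("Y (destination)")])
    print("global commit" if ok else "global abort")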

Replication Server

[Diagram: X = Source replicates to Y = Destination]

+ Destination within a few seconds of being up-to-date
+ Decision support queries can be asked on the destination db
- Administrator is needed when the network connection breaks!

8 - Distributed Apps © Dennis Shasha, Philippe Bonnet - 2002 138

Staging Tables

[Diagram: a staging table sits between X = Source and Y = Destination]

+ No specific mechanism is necessary at source or destination
- Coordination of transactions on X and Y

8 - Distributed Apps © Dennis Shasha, Philippe Bonnet - 2002 139

Staging Tables

[Figure: staging-table protocol, steps 1 and 2]

STEP 1 (M1): read the unprocessed rows from table S in database X (the source).
STEP 2 (M2): write those rows, still marked unprocessed, into staging table I in database Y (the destination).

8 - Distributed Apps © Dennis Shasha, Philippe Bonnet - 2002 140

Staging Tables

[Figure: staging-table protocol, steps 3 and 4]

STEP 3 (M3): a transaction on the destination site updates the destination -- the newly staged rows in table I are postprocessed and marked processed.
STEP 4 (M4): a transaction updates the source site -- the processed rows are deleted from source table S first, then from the destination's staging table I.

(A code sketch of the four steps follows this slide.)

© Dennis Shasha, Philippe Bonnet - 2002 141
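A minimal sketch of the four steps under stated assumptions: DB-API connections with qmark parameters, an id key, a payload column, and a processed flag on the staging table. The table and column names are illustrative, not the authors' schema; the ordering (mark processed at Y, delete at X before Y) is what avoids lost and double-processed data across failures.

# Staging-table protocol sketch (illustrative names, DB-API connections).

def step1_read_source(x_conn):
    """STEP 1: read the rows waiting in table S at source X."""
    cur = x_conn.cursor()
    cur.execute("select id, payload from S")
    return cur.fetchall()

def step2_write_staging(y_conn, rows):
    """STEP 2: append the rows to staging table I at destination Y.
    Rows already present (e.g., re-sent after a crash) are skipped, so a
    retry cannot lead to double-processing."""
    cur = y_conn.cursor()
    for row_id, payload in rows:
        cur.execute("select 1 from I where id = ?", (row_id,))
        if cur.fetchone() is None:
            cur.execute("insert into I (id, payload, processed) values (?, ?, 0)",
                        (row_id, payload))
    y_conn.commit()

def step3_process_destination(y_conn, post_process):
    """STEP 3: one transaction at Y postprocesses the staged rows and marks
    them processed, so a rerun never consumes them twice."""
    cur = y_conn.cursor()
    cur.execute("select id, payload from I where processed = 0")
    for row_id, payload in cur.fetchall():
        post_process(cur, row_id, payload)   # application logic at Y
        cur.execute("update I set processed = 1 where id = ?", (row_id,))
    y_conn.commit()

def step4_cleanup(x_conn, y_conn):
    """STEP 4: delete the processed rows -- source first, then destination.
    A crash in between leaves the rows marked processed in I, so they are
    neither lost at X nor processed a second time at Y."""
    ycur = y_conn.cursor()
    ycur.execute("select id from I where processed = 1")
    ids = [r[0] for r in ycur.fetchall()]
    xcur = x_conn.cursor()
    for row_id in ids:
        xcur.execute("delete from S where id = ?", (row_id,))
    x_conn.commit()
    for row_id in ids:
        ycur.execute("delete from I where id = ?", (row_id,))
    y_conn.commit()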

Troubleshooting Techniques(*)

• Extraction and Analysis of Performance Indicators
  – Consumer-producer chain framework
  – Tools
    • Query plan monitors
    • Performance monitors
    • Event monitors

(*) From Alberto Lerner’s chapter 9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 142

A Consumer-Producer Chain of a DBMS’s Resources

[Diagram: probing spots for indicators along the consumer-producer chain]

High-level consumers (sql commands): PARSER, OPTIMIZER
Intermediate resources/consumers: EXECUTION SUBSYSTEM, DISK SUBSYSTEM, LOCKING SUBSYSTEM, CACHE MANAGER, LOGGING SUBSYSTEM
Primary resources: CPU, DISK/CONTROLLER, MEMORY, NETWORK

© Dennis Shasha, Philippe Bonnet - 2002 143

Recurrent Patterns of Problems

Effects are not always felt first where the cause is!

• An overloading high-level consumer • A poorly parameterized subsystem • An overloaded primary resource

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 144

A Systematic Approach to Monitoring

Extract indicators to answer the following questions:
• Question 1: Are critical queries being served in the most efficient manner?
• Question 2: Are subsystems making optimal use of resources?
• Question 3: Are there enough primary resources available?

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 145

Investigating High Level Consumers

• Answer question 1: “Are critical queries being served in the most efficient manner?”

1. Identify the critical queries
2. Analyze their access plans
3. Profile their execution

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 146

Identifying Critical Queries

Critical queries are usually those that:
• Take a long time
• Are frequently executed

Often, a user complaint will tip us off.

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 147

Event Monitors to Identify Critical Queries
• If no user complains...
• Capture usage measurements at specific events (e.g., end of each query) and then sort by usage (a small sorting sketch follows this slide)
• Less overhead than other types of tools because the indicators are usually a by-product of the events monitored
• Typical measures include CPU used, IO used, locks obtained, etc.

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 148
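A toy sketch of the sort-by-usage step. The event records and their field names are invented for illustration; a real event monitor (Trace Data Viewer, DB2 Event Monitor, SQL Server Profiler) emits richer records, but the ranking idea is the same.

from collections import defaultdict

# Hypothetical end-of-query event records (illustrative field names/values).
events = [
    {"sql": "select c_name, n_name from customer join nation ...", "cpu_ms": 6670, "reads": 272618},
    {"sql": "select l_returnflag, ... from lineitem ...",           "cpu_ms": 5200, "reads": 143075},
    {"sql": "select l_returnflag, ... from lineitem ...",           "cpu_ms": 4900, "reads": 140990},
]

# Aggregate by statement text, then rank by total CPU: queries that are
# expensive or frequently executed float to the top.
totals = defaultdict(lambda: {"count": 0, "cpu_ms": 0, "reads": 0})
for e in events:
    t = totals[e["sql"]]
    t["count"] += 1
    t["cpu_ms"] += e["cpu_ms"]
    t["reads"] += e["reads"]

for sql, t in sorted(totals.items(), key=lambda kv: kv[1]["cpu_ms"], reverse=True):
    print(f"{t['cpu_ms']:>8} ms  {t['count']:>3} execs  {t['reads']:>10} reads  {sql[:45]}")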

An example Event Monitor

• CPU indicators sorted by Oracle’s Trace Data Viewer
• Similar tools: DB2’s Event Monitor and MSSQL’s Server Profiler

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 149

An example Plan Explainer

• Access plan according to MSSQL’s Query Analyzer
• Similar tools: DB2’s Visual Explain and Oracle’s SQL Analyze Tool

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 150

Finding Strangeness in Access Plans

What to pay attention to in a plan (a sketch for pulling a textual plan follows this slide):
• Access paths for each table
• Sorts or intermediary results
• Order of operations
• Algorithms used in the operators

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 151
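One hedged way to pull a textual plan to inspect for these points, assuming SQL Server reached through pyodbc (the tpch DSN is hypothetical). SET SHOWPLAN_TEXT is SQL Server's switch; other systems use their own mechanism (DB2's EXPLAIN, Oracle's EXPLAIN PLAN).

import pyodbc  # assumption: an ODBC driver to the SQL Server instance

conn = pyodbc.connect("DSN=tpch", autocommit=True)   # hypothetical DSN
cur = conn.cursor()

# With SHOWPLAN_TEXT ON the server returns the estimated plan of the
# statements that follow instead of executing them.
cur.execute("SET SHOWPLAN_TEXT ON")
cur.execute("""
    select c_name, n_name
    from CUSTOMER join NATION on c_nationkey = n_nationkey
    where c_acctbal > 0
""")
# Walk every result set; each row is one line of the plan tree. Look for
# the access path per table, sorts, operator order, and join algorithms.
while True:
    for row in cur.fetchall():
        print(row[0])
    if not cur.nextset():
        break
cur.execute("SET SHOWPLAN_TEXT OFF")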

To Index or not to index?

select c_name, n_name
from CUSTOMER join NATION
  on c_nationkey = n_nationkey
where c_acctbal > 0

Which plan performs best? (nation_pk is a non-clustered index over n_nationkey, and similarly acctbal_ix over c_acctbal)

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 152

Non-clustering indexes can be trouble

For a predicate with low selectivity (it selects a large fraction of the rows), each access to the index generates a random access to the table – possibly to a page that has already been read. The number of pages read from the table ends up greater than the size of the table, i.e., a table scan is way better (a sketch for reproducing the comparison follows this slide).

                       Table Scan       Index Scan
CPU time               5 sec            76 sec
data logical reads     143,075 pages    272,618 pages
data physical reads    6,777 pages      131,425 pages
index logical reads    136,319 pages    273,173 pages
index physical reads   7 pages          552 pages

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 153
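A hedged sketch for reproducing this comparison from a script. It assumes SQL Server (matching the Query Analyzer plans above), pyodbc, and a hypothetical tpch DSN; WITH (INDEX(...)) is SQL Server's table-hint syntax (INDEX(0) forces a scan), and client-side timing is only a rough stand-in for the monitor's counters.

import time
import pyodbc  # assumption: ODBC driver to the test database

QUERY = """
select c_name, n_name
from CUSTOMER with (index({hint})) join NATION
  on c_nationkey = n_nationkey
where c_acctbal > 0
"""

def timed(cur, sql):
    start = time.perf_counter()
    cur.execute(sql)
    rows = cur.fetchall()            # force full retrieval
    return len(rows), time.perf_counter() - start

conn = pyodbc.connect("DSN=tpch")    # hypothetical DSN
cur = conn.cursor()
for hint, label in [("0", "table scan"), ("acctbal_ix", "non-clustered index")]:
    n, secs = timed(cur, QUERY.format(hint=hint))
    print(f"{label}: {n} rows in {secs:.1f} s")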

An example Performance Monitor (query level)

• Buffer and CPU consumption for a query according to DB2’s Benchmark tool
• Similar tools: MSSQL’s SET STATISTICS switch and Oracle’s SQL Analyze Tool

Statement number: 1
select C_NAME, N_NAME
from DBA.CUSTOMER join DBA.NATION on C_NATIONKEY = N_NATIONKEY
where C_ACCTBAL > 0

Number of rows retrieved is: 136308
Number of rows sent to output is: 0
Elapsed Time is: 76.349 seconds
…
Buffer pool data logical reads = 272618
Buffer pool data physical reads = 131425
Buffer pool data writes = 0
Buffer pool index logical reads = 273173
Buffer pool index physical reads = 552
Buffer pool index writes = 0
Total buffer pool read time (ms) = 71352
Total buffer pool write time (ms) = 0
…
Summary of Results
             Elapsed    Agent CPU   Rows      Rows
Statement #  Time (s)   Time (s)    Fetched   Printed
1            76.349     6.670       136308    0

© Dennis Shasha, Philippe Bonnet - 2002 154

An example Performance Monitor (system level)

• An IO indicator’s consumption evolution (qualitative and quantitative) according to DB2’s System Monitor
• Similar tools: Windows’ Performance Monitor and Oracle’s Performance Manager

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 155

Investigating High Level Consumers: Summary

[Flowchart]
Find critical queries → Found any?
  no  → Investigate lower levels
  yes → Answer Q1 over them → Overconsumption?
          yes → Tune problematic queries
          no  → Investigate lower levels

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 156

Investigating Primary Resources

• Answer question 3: “Are there enough primary resources available for a DBMS to consume?”
• Primary resources are: CPU, disk & controllers, memory, and network
• Analyze specific OS-level indicators to discover bottlenecks.
• A system-level Performance Monitor is the right tool here

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 157

CPU Consumption Indicators at the OS Level

[Chart: % of CPU utilization over time, showing total usage and system usage against 100% and 70% reference lines]

• Sustained utilization over 70% should trigger the alert.
• System utilization shouldn’t be more than 40%.
• The DBMS (on a non-dedicated machine) should be getting a decent time share.

(A small monitoring sketch follows this slide.)

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 158
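A small sketch of checking these thresholds from a script. It assumes the psutil library as a stand-in for whatever OS-level monitor is at hand, and it samples only one short interval; a real check would look at a sustained window.

import psutil  # assumption: psutil is installed; any OS monitor exposes the same counters

SAMPLE_SECONDS = 5

# cpu_times_percent blocks for the interval and reports how CPU time was
# split between user, system (kernel) and idle.
cpu = psutil.cpu_times_percent(interval=SAMPLE_SECONDS)
total = 100.0 - cpu.idle
system = cpu.system

print(f"total={total:.0f}%  system={system:.0f}%")
if total > 70.0:
    print("ALERT: CPU utilization above 70%")
if system > 40.0:
    print("ALERT: system (kernel) time above 40%")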

Disk Performance Indicators at the OS Level

[Diagram: new requests enter a wait queue before the disk; the indicators are Average Queue Size, Wait times, and Disk Transfers/second]

• Average queue size should be close to zero.
• Wait times should also be close to zero.
• Idle disk with pending requests? Check controller contention.
• Also, transfers should be balanced among disks/controllers.

© Dennis Shasha, Philippe Bonnet - 2002 159

Memory Consumption Indicators at the OS Level

[Diagram: real memory vs. virtual memory (pagefile)]

• Page faults per unit of time should be close to zero. If paging happens, at least it should not hit DB cache pages.
• The % of pagefile in use (the pagefile is a fixed file/partition) will tell you how much memory is “lacking.”

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 160

Investigating Intermediate Resources/Consumers

• Answer question 2: “Are subsystems making optimal use of resources?”

• Main subsystems: Cache Manager, Disk subsystem, Lock subsystem, and Log/Recovery subsystem
• Similarly to Q3, extract and analyze relevant Performance Indicators

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 161

Cache Manager Performance Indicators

[Diagram: a table scan issues readpage() calls (logical reads) to the Cache Manager, which holds data pages and free page slots, picks victims, and performs page reads/writes against disk (physical IO)]

• If a page is not in the cache, the readpage (logical) generates an actual IO (physical). The ratio of readpages that did not generate physical IO should be 90% or more (the computation is sketched after this slide).
• Pages are regularly saved to disk to make free space. The # of free slots should always be > 0.

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 162
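For instance, the buffer-pool counters shown a few slides back (logical vs. physical reads for the customer/nation query) give the hit ratio directly; a tiny sketch of the computation:

# Counters taken from the DB2 Benchmark tool output shown earlier.
data_logical, data_physical = 272_618, 131_425
index_logical, index_physical = 273_173, 552

def hit_ratio(logical, physical):
    """Fraction of logical reads served from the cache (no physical IO)."""
    return 1.0 - physical / logical if logical else 1.0

print(f"data hit ratio:  {hit_ratio(data_logical, data_physical):.1%}")    # ~51.8%
print(f"index hit ratio: {hit_ratio(index_logical, index_physical):.1%}")  # ~99.8%
# The data hit ratio is far below the 90% rule of thumb; the index pages are fine.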

Disk Manager Performance Indicators

[Diagram: simplified storage hierarchy -- rows, pages, extents, files, disks]

• Row displacement: should be kept under 5% of rows.
• Free space fragmentation: pages with little free space should not be in the free list.
• Data fragmentation: ideally, the files that store DB objects (table, index) should be in one or few (<5) contiguous extents.
• File position: should balance the workload evenly among all disks.

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 163

Lock Manager Performance Indicators

[Diagram: lock requests wait on a pending list; entries on the lock list record (Object, Lock Type, TXN ID)]

• Lock wait time for a transaction should be a small fraction of the whole transaction time.
• The number of locks on wait should be a small fraction of the number of locks on the lock list.
• Deadlocks and timeouts should seldom happen (no more than 1% of the transactions). (A small check of these ratios is sketched after this slide.)

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 164
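These rules of thumb are easy to check once the lock counters are exported; a tiny sketch with invented counter values (a real script would read them from the lock-manager monitor):

# Illustrative counter values -- not measurements from the slides.
lock_wait_time_ms, txn_time_ms = 120, 4_000
locks_waiting, locks_on_list = 3, 400
deadlocks_plus_timeouts, transactions = 2, 1_000

checks = {
    "lock wait / transaction time": lock_wait_time_ms / txn_time_ms,        # want: small
    "locks waiting / locks held":   locks_waiting / locks_on_list,          # want: small
    "deadlock+timeout rate":        deadlocks_plus_timeouts / transactions, # want: < 1%
}
for name, value in checks.items():
    flag = "  <-- ALERT" if name == "deadlock+timeout rate" and value > 0.01 else ""
    print(f"{name}: {value:.1%}{flag}")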

Investigating Intermediate and Primary Resources: Summary

[Flowchart]
Answer Q3 → Problems at OS level?
  yes → Tune low-level resources
  no  → Answer Q2 → Problematic subsystems?
          yes → Tune subsystems
          no  → Investigate upper level

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 165

Troubleshooting Techniques

• Monitoring a DBMS’s performance should be based on queries and resources.
  – The consumption chain helps distinguish problems’ causes from their symptoms
  – Existing tools help extract the relevant performance indicators

9 - Troubleshooting © Dennis Shasha, Philippe Bonnet - 2002 166

Recall Tuning Principles

• Think globally, fix locally (troubleshoot to see what matters)
• Partitioning breaks bottlenecks (find parallelism in processors, controllers, caches, and disks)
• Start-up costs are high; running costs are low (batch size, cursors)
• Be prepared for trade-offs (unless you can rethink the queries)

© Dennis Shasha, Philippe Bonnet - 2002 167
