B. How Could a Table Scan Cause Poor Performance in SQL Statements?

Table Scan.
a. What is a table scan?
b. How could a table scan cause poor performance in SQL statements?
c. Describe a situation and provide an SQL statement from one of our sample databases that illustrates the problem.

1. A table scan (also called a full table scan) reads every row of a table in sequential order to find the data that a query is looking for. On a table with millions of rows, this kind of reading can make SQL queries very slow. Several situations lead to a full table scan. A query with no WHERE clause to filter the rows returned reads the whole table. A query that does have a WHERE clause still performs a full table scan if none of the columns in that clause match the leading column of an index on the table (Oracle, 2008). Finally, a WHERE clause with no reasonably selective condition can also cause a full table scan, which in turn slows the query down. The following SQL statement causes a full table scan:

Select productid, prodname
From Product
Where proddescr like '%hard disk%';

Execution Plan
----------------------------------------------------------
Plan hash value: 427209646

-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |    22 | 44924 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| PRODUCT |    22 | 44924 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------

The plan shows a TABLE ACCESS FULL on PRODUCT: the WHERE condition has to be checked against every row of the table. The optimizer estimates only 22 matching rows here, but on a large table the query would still have to read every row, possibly millions of them, which is what makes it slow. There are two reasons the scan cannot be avoided. First, the column in the WHERE clause (PRODDESCR) is neither an indexed column nor the primary key. Second, even if it were indexed, a LIKE pattern with a leading wildcard ('%hard disk%') cannot use an index range scan, because a B-tree index is ordered by the beginning of the value and the pattern does not fix that beginning.

Table Joins.
a. How does the order of joins in an SQL statement affect the performance of the join?
b. What can the DBA do to determine the preferred order of joins for an SQL statement that includes the join of at least three tables?
c. Provide an example SQL join from Global Engineering or the Retail Company (but not both) and discuss the preferred join order.

2. The order in which the tables in a query are joined can affect how the query performs. If a query joins all the large tables first and only then joins a smaller table, it can do a lot of unnecessary processing. A good join order passes the fewest possible rows from each step to the next, which makes the query perform better. DBAs usually look for the best possible access method for each table by running EXPLAIN PLAN on the query and by checking the optimizer statistics. They also tune the SQL workload by making sure that the columns used in the WHERE clause to join the tables are properly indexed. Before relying on the WHERE clause, DBAs try to place the most selective condition in the ON clause of the join; this filters the data and reduces the size of the join result itself, so subsequent joins work on already filtered data, and the WHERE clause then applies the remaining filter conditions. (For inner joins, the Oracle optimizer treats ON and WHERE predicates as equivalent and may apply them in whatever order it estimates to be cheapest, so this ordering is logical rather than guaranteed.) DBAs also try to eliminate unnecessary full scans of large tables.

Consider the join query below. Its execution plan shows how the statement is executed, together with the estimated number of rows, the bytes, the cost (with its CPU percentage) and the time for each step. The query returns a final result set of 665 rows.

SET AUTOTRACE TRACEONLY EXPLAIN

Select P.ProductID, PS.ProductID, COL.ProductID
From Customerorderitem COL
Inner Join ProductSupplier PS On COL.ProductID = PS.ProductID
Inner Join Product P On COL.ProductID = P.ProductID;

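-------------------------------------------------------------------------------------------
| Id  | Operation              | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                   |  4882 |  185K |    11  (10)| 00:00:01 |
|   1 |  NESTED LOOPS          |                   |  4882 |  185K |    11  (10)| 00:00:01 |
|*  2 |   HASH JOIN            |                   |  4882 |  123K |    11  (10)| 00:00:01 |
|   3 |    INDEX FAST FULL SCAN| PRODSUPPLPK       |   288 |  3744 |     3   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL   | CUSTOMERORDERITEM |  4882 | 63466 |     7   (0)| 00:00:01 |
|*  5 |   INDEX UNIQUE SCAN    | PRODUCTPK         |     1 |    13 |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------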

After the join order was changed, the query was run again with the execution plan displayed. The estimated total cost is the same in both plans (11), but fewer rows flow through the second plan: the nested loop between ProductSupplier and Product now returns only 288 rows before they are hash-joined with CustomerOrderItem, so the PRODUCTPK index is probed an estimated 288 times instead of 4882 times as in the first plan. The query still returns the same final result set of 665 rows. This join order is therefore preferred over the first query.

SET AUTOTRACE TRACEONLY EXPLAIN

Select P.ProductID, PS.ProductID, COL.ProductID
From Customerorderitem COL
Inner Join Product P On COL.ProductID = P.ProductID
Inner Join ProductSupplier PS On P.ProductID = PS.ProductID;

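-------------------------------------------------------------------------------------------
| Id  | Operation              | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                   |  4882 |  185K |    11  (10)| 00:00:01 |
|*  1 |  HASH JOIN             |                   |  4882 |  185K |    11  (10)| 00:00:01 |
|   2 |   NESTED LOOPS         |                   |   288 |  7488 |     3   (0)| 00:00:01 |
|   3 |    INDEX FAST FULL SCAN| PRODSUPPLPK       |   288 |  3744 |     3   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN   | PRODUCTPK         |     1 |    13 |     0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL    | CUSTOMERORDERITEM |  4882 | 63466 |     7   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------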
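For question b, one practical approach is to pin each candidate join order and compare the resulting plans. In Oracle this is typically done with the LEADING or ORDERED optimizer hint (for example /*+ LEADING(PS P COL) */) together with AUTOTRACE, as above. The sketch below shows the same workflow on SQLite through Python's built-in sqlite3 module, so it is an analogy rather than Oracle behavior: SQLite keeps the left table of a CROSS JOIN as the outer loop, and EXPLAIN QUERY PLAN lists the loops from outermost to innermost. The cut-down table definitions and the index name idx_coi_product are hypothetical; only the table and column names come from the query above.

```python
import sqlite3

# Hypothetical, cut-down versions of the Retail Company tables (SQLite, not Oracle).
SCHEMA = """
CREATE TABLE product (productid INTEGER PRIMARY KEY, prodname TEXT);
CREATE TABLE productsupplier (
    productid  INTEGER,
    supplierid INTEGER,
    PRIMARY KEY (productid, supplierid)
);
CREATE TABLE customerorderitem (orderid INTEGER, productid INTEGER, qty INTEGER);
CREATE INDEX idx_coi_product ON customerorderitem (productid);
"""

# The join from the text. In SQLite, CROSS JOIN keeps the left table as the
# outer loop, so the FROM order below is the join order that gets executed.
JOIN_SQL = """
SELECT product.productid, productsupplier.productid, customerorderitem.productid
FROM {0} CROSS JOIN {1} CROSS JOIN {2}
WHERE customerorderitem.productid = product.productid
  AND productsupplier.productid = product.productid
"""


def plan_details(conn, tables):
    """Return the EXPLAIN QUERY PLAN detail strings for one pinned join order."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + JOIN_SQL.format(*tables))
    return [row[3] for row in rows]  # column 3 holds the human-readable detail


def loop_order(details, tables):
    """Table names in the order the plan visits them (outermost loop first)."""
    order = []
    for detail in details:
        for token in detail.split():
            if token in tables and token not in order:
                order.append(token)
    return order


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# The two orders discussed above: large table first vs. small tables first.
candidates = [
    ("customerorderitem", "productsupplier", "product"),
    ("productsupplier", "product", "customerorderitem"),
]
for tables in candidates:
    details = plan_details(conn, tables)
    print(" -> ".join(loop_order(details, tables)))
    for detail in details:
        print("    " + detail)
```

Each candidate prints its loop order followed by SQLite's plan lines. SQLite's plan does not include a cost figure, so in practice the comparison of the pinned orders would be made on Oracle's AUTOTRACE output (cost and rows per step) for each hinted version of the query.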

Indexes. Consider the Global Engineering and the Retail Company databases in your answers to these questions.
a. What problems could be caused by not having appropriate indexes?
b. What problems could be caused by having too many indexes?
c. What do database statistics contribute to defining appropriate indexes?

3. Without an appropriate index, a SELECT statement that filters on a non-indexed column in its WHERE clause has to perform a full table scan. Such queries increase the number of logical reads on the table and the number of disk input/output calls, because the statement has to examine the table row by row to find the data it is looking for. For example, consider the following SQL statement, executed in SQL*Plus with the explain plan option:

SET AUTOTRACE TRACEONLY EXPLAIN

Select * from customer Where Custlname = 'Dobson';

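------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |     7 |   910 |   171   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| CUSTOMER |     7 |   910 |   171   (1)| 00:00:03 |
------------------------------------------------------------------------------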

This query performs a full table scan on the “CUSTOMER” table because its WHERE condition uses “CUSTLNAME”, which is not an indexed column.

Now consider another example, in which the WHERE condition uses an indexed column.

SET AUTOTRACE TRACEONLY EXPLAIN

Select * from customer Where CustID = 98;

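Execution Plan
----------------------------------------------------------
Plan hash value: 908218400

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |   130 |     0   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CUSTOMER   |     1 |   130 |     0   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | CUSTOMERPK |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------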

This query performs an INDEX UNIQUE SCAN on the “CUSTOMERPK” index and then a TABLE ACCESS BY INDEX ROWID to fetch the matching row. Comparing the two plans shows the effect of using the index: the estimated cost drops from 171 to 0, the %CPU figure from 1 to 0, and the estimated time from 00:00:03 to 00:00:01.
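The contrast between the two CUSTOMER plans, and the leading-wildcard LIKE from question 1, can be reproduced with a small sketch. It uses SQLite through Python's built-in sqlite3 module rather than Oracle, so the wording differs: SQLite reports SCAN where Oracle shows TABLE ACCESS FULL, and SEARCH where Oracle uses an index lookup. The cut-down customer table and the index idx_customer_lname are hypothetical (the sample database described above has no index on CUSTLNAME).

```python
import sqlite3


def plan(conn, sql, params=()):
    """Join the EXPLAIN QUERY PLAN detail strings for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical cut-down CUSTOMER table; custid plays the role of CUSTOMERPK.
conn.execute(
    "CREATE TABLE customer (custid INTEGER PRIMARY KEY, custfname TEXT, custlname TEXT)"
)

by_name = "SELECT * FROM customer WHERE custlname = ?"
by_id = "SELECT * FROM customer WHERE custid = ?"
by_pattern = "SELECT * FROM customer WHERE custlname LIKE ?"

# 1. No index on custlname: every row has to be read (full scan).
no_index_plan = plan(conn, by_name, ("Dobson",))
# 2. Primary-key lookup: a direct search, comparable to INDEX UNIQUE SCAN in Oracle.
pk_plan = plan(conn, by_id, (98,))

# 3. Add an index on the filtered column (hypothetical name).
conn.execute("CREATE INDEX idx_customer_lname ON customer (custlname)")
indexed_plan = plan(conn, by_name, ("Dobson",))
# 4. A pattern with a leading wildcard still cannot use that index.
wildcard_plan = plan(conn, by_pattern, ("%obs%",))

for label, text in [("no index", no_index_plan), ("primary key", pk_plan),
                    ("with index", indexed_plan), ("leading %", wildcard_plan)]:
    print(f"{label:12} {text}")
```

Only the first two plans correspond to the Oracle examples above; the third shows what an index on CUSTLNAME would change, and the fourth that a pattern starting with % still forces a scan even when the column is indexed.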

One problem caused by having too many indexes is slower data modification. For example, consider the “customerorderitem” table in retaildba. This table is modified frequently every day, as customers order products or return purchased products. If such a table has too many indexes, modifications become slow, because every INSERT and DELETE on the table (and every UPDATE of an indexed column) must also update each affected index, which uses more CPU, more memory and more I/O. If a table is modified only rarely, having more indexes is much less likely to be a problem.
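The maintenance cost described above can be made visible without timing anything. The sketch below uses SQLite through Python's built-in sqlite3 module (not Oracle): it compiles the same INSERT against a hypothetical cut-down customerorderitem table carrying zero, one or three secondary indexes, and counts the IdxInsert instructions in SQLite's bytecode, that is, the index updates performed for every inserted row. The index names and columns are illustrative assumptions. Oracle does not expose its internals this way, but it likewise has to maintain every index on the table for each inserted row.

```python
import sqlite3

# Hypothetical secondary indexes a busy order-item table might accumulate.
INDEXES = [
    "CREATE INDEX idx_coi_product ON customerorderitem (productid)",
    "CREATE INDEX idx_coi_order ON customerorderitem (orderid)",
    "CREATE INDEX idx_coi_qty ON customerorderitem (qty)",
]

INSERT_SQL = "INSERT INTO customerorderitem (orderid, productid, qty) VALUES (?, ?, ?)"


def index_writes_per_insert(n_indexes):
    """Count index-insert steps in SQLite's compiled program for one INSERT."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customerorderitem ("
        "itemid INTEGER PRIMARY KEY, orderid INTEGER, productid INTEGER, qty INTEGER)"
    )
    for ddl in INDEXES[:n_indexes]:
        conn.execute(ddl)
    # EXPLAIN lists the bytecode without running the INSERT; column 1 is the opcode.
    program = conn.execute("EXPLAIN " + INSERT_SQL, (1, 2, 3)).fetchall()
    conn.close()
    return sum(1 for step in program if step[1] == "IdxInsert")


for n in (0, 1, 3):
    print(f"{n} index(es): {index_writes_per_insert(n)} index write(s) per inserted row")
```

The count grows by one for each index, and that extra work is repeated for every row inserted into a heavily used table such as customerorderitem.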

Statistics can be gathered on tables, on indexes and on individual columns. If for some reason the table or index statistics have not been kept up to date, the result may be a full table scan. This is because most RDBMSs have query optimizers that use these statistics to decide whether using an index is worthwhile. Using the statistics, the optimizer can quickly estimate which execution plan is likely to be the fastest and most efficient. If the statistics are not available, the RDBMS may wrongly conclude that a full table scan is more efficient than using an index (Larsen, 2013).
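A small sketch can show what gathered statistics actually record. In Oracle, statistics are collected with the DBMS_STATS package (for example DBMS_STATS.GATHER_TABLE_STATS); the example below instead uses SQLite's ANALYZE command through Python's built-in sqlite3 module and reads the result back from the sqlite_stat1 table. The custorder table, its two-valued status column and the index idx_custorder_status are hypothetical. After ANALYZE, the statistics state that each status value matches about half of the 1000 rows, which is the kind of information an optimizer uses to judge that such an index is unlikely to beat a full table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical order table with a low-selectivity status column.
conn.execute("CREATE TABLE custorder (orderid INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE INDEX idx_custorder_status ON custorder (status)")
conn.executemany(
    "INSERT INTO custorder (orderid, status) VALUES (?, ?)",
    [(i, "OPEN" if i % 2 else "SHIPPED") for i in range(1000)],
)


def index_stats(conn, index_name):
    """Return (rows in index, average rows per key value) from sqlite_stat1."""
    row = conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = ?", (index_name,)
    ).fetchone()
    if row is None:
        return None  # no statistics recorded for this index
    total, per_key = row[0].split()[:2]
    return int(total), int(per_key)


# Before ANALYZE, the sqlite_stat1 table does not exist yet, so check the schema.
has_stats_table = conn.execute(
    "SELECT count(*) FROM sqlite_master WHERE name = 'sqlite_stat1'"
).fetchone()[0]
print("stat table before ANALYZE:", bool(has_stats_table))

conn.execute("ANALYZE")
print("after ANALYZE:", index_stats(conn, "idx_custorder_status"))
```

Without the ANALYZE step the optimizer has to fall back on built-in guesses about the table, which is the situation described above in which a poor plan can be chosen.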

References:

Larsen, G. (2013). SQL Server How important are Index Statistics. Retrieved from http://www.databasejournal.com/features/mssql/sql-server-how-important-are-index-statistics.html

Oracle. (2003). Database Performance Tuning Guide 10g Release 1 (10.1). Retrieved from http://docs.oracle.com/cd/B14117_01/server.101/b10752/sql_1016.htm

Oracle. (2008). Database Performance Tuning Guide 11g Release 1 (11.1). Retrieved from http://docs.oracle.com/cd/B28359_01/server.111/b28274/optimops.htm#PFGRF001
