NESUG 17 Posters

Effectively Using the Indices in an Oracle® with SAS® Kenneth W. Borowiak, Howard M. Proskin & Associates, Inc.

Abstract

Key features of a database are indices, which can substantially aid in reducing the time it takes for queries submitted against the database to run. However, making effective use of them is a non-trivial task. This paper outlines how to identify indices, create a to see if and what indices are being used, create table indices and provide hints to the query optimizer in order to fine-tune queries submitted against an with SAS.

Introduction

Important components of any medium to large sized database are indices (plural for index)1. Indices are guides to where the values of variables are located in the database. Making effective use of table indices when querying a database can substantially reduce the time it takes for queries to run. To aid in the fine-tuning of queries submitted against an Oracle database through SAS, this paper outlines how to:

• identify table indices, • create a query execution plan to see if and what indices are being used, • provide hints to the query optimizer, and • create table indices.

The obvious target audience of this paper is those SAS users who need to extract data from an Oracle database. However, SAS data sets can also be buttressed with indices. Therefore, the concepts presented in this paper (but not necessarily the code) can be useful for any SAS user interested in making effective use of indices.

Preliminaries

The values of the user id, password and path needed to make the connection to the Oracle tables will be embedded in the macro variables me, myob, and lesstaken, respectively. Oracle tables will be written in mixed-case and fields in lower-case; both will be in Courier New font.

An Index Paradigm

An index is a guide to where the values of a variable or group of variables reside in a database. A collection of table indices can be likened to an index at the end of a textbook. When you want to find information on a particular topic in the book you could look up the word in index and find what pages related information is located. The textbook index

1 Other important considerations are database design and partitioning.

1 NESUG 17 Posters

paradigm of database indices just presented will continue to be built upon throughout the paper.

What Indices Exist?

Database indices have other similar characteristics to a textbook index. First, not every word that appears in a textbook will appear in the index because the numerous words or topics to keep track of would too cumbersome. Many words would not be very informative, such as the word ‘the’. Similarly, not every variable in a database should be indexed because resources are needed to create, store and maintain them. There are a couple of ways to ascertain the existence of table indices in an Oracle database. The database administrators may have readily accessible metadata on the available table indices, possibly in an HTML format. Oracle also provides a series of metadata or system tables. The All_Ind_Columns table contains information on which variables (found in the column_name field) are indexed (found in the index_name field). The following scripts can be submitted to extract information on table indices and indexed columns2:

/* Identifying table indices */

libname Orasys oracle user=&me password=&myob path=&lesstaken schema=SYS;

proc ; /* Pass-Through method #1 */ connect to oracle (user=&me password=&myob path=&lesstaken); create table

as select * from connection to oracle (select index_name, column_name from All_Ind_Columns); disconnect from oracle; /* or */

/*Pass-Through method #2-Requires privileges to create Oracle tables */ connect to oracle (user=&me password=&myob path=&lesstaken); execute ( create table

as select index_name, column_name from All_Ind_Columns) by oracle; disconnect from oracle; /* or */

/* Accessing the metadata w/ a libref */ create table

as select index_name, column_name from Orasys.All_Ind_Columns;

quit;

2 Setting Oracle system tables in a DATA step is another way to extract information from these tables. The SAS counterpart of the Oracle system tables are the dictionary tables in the SASHELP library. However, extracting information from the SAS dictionary table into a data set can only be accomplished using Proc SQL. To avoid confusion as to how to retrieve information from the various metadata sources, my suggestion is to avoid using a DATA step.

2 NESUG 17 Posters

It is important to know which columns in the database are indexed because including these variables in the WHERE clause of SQL queries can substantially reduce the time it takes for queries to finish executing.

Are the Indices Being Used?

A common problem that both the users of a database and a reader of a textbook face is whether or not to use an index. Let us suppose a SAS user wanted to find out information on ‘procs’ in the Procedures manual. After looking up the word in the index the user might find that the word ‘proc’ appears on 95% of the pages in the manual. The user could consult the index to find where the first page the word ‘proc’ occurs, check that page for the information of interest, consulting the index again if necessary, checking that page and so forth. Alternatively, the user could start from page 1 and read every page until she finds the information of interest.

As it turns out, the decision to use an index when querying a database is not totally left up to the user. When an SQL statement is submitted, the information in the code is passed along to the query optimizer, which determines the best possible execution plan to return the records you requested. The Oracle query optimizer uses a cost-based approach to make its decision based upon statistics gathered about the tables and indices. The SAS user has the opportunity to inspect the query execution plan, but must first define the table that the execution plan information will populate by submitting the following code:

proc sql; connect to oracle (user=&me password=&myob path=&lesstaken); execute( create table Plan_Table ( statement_id varchar2(30), timestamp date, remarks varchar2(80), operation varchar2(30), options varchar2(30), object_node varchar2(128), object_owner varchar2(30), object_name varchar2(30), object_instance number(38), object_type varchar2(30), optimizer varchar2(255), search_columns number, id number(38), parent_id number(38), position number(38), cost number(38), cardinality number(38), bytes number(38), other_tag varchar2(255), partition_start varchar2(255), partition_stop varchar(255), partition_id number(38), other long, distribution varchar2(30) ) ) by oracle;

3 NESUG 17 Posters

By default, the table Plan_Table is where the information regarding the query plan will be outputted. However, any table with the same defining characteristics as Plan_Table is a candidate to be filled with the query plan information, if instructed to do so. Another table with same table definition as Plan_Table can be created by submitting the following piece of code:

execute( create table like Plan_Table) by oracle;

The section of code below now requests that the query plan information be outputted into an Oracle table:

execute( explain plan > for enter SQL code here3 ) by oracle; disconnect from oracle; quit;

Requested information regarding the query execution plan does not overwrite any pre- existing information in the designated table, but appends the new information to it. To delete the records from an existing table for re-use the following code should be submitted:

execute( delete * from

) by oracle;

The query plan table has the potential to contain a lot of information, but the discussion here will focus solely on index usage. To ascertain which indices (if any) are used to retrieve the query, the variables operation, object_name and operations from the execution plan table should be consulted. The field operation identifies what operation is performed at that step of the query execution, where the user would be looking for the entry ‘INDEX’. The field object_name will identify the index used. The field operations identifies whether a range scan or a unique scan was used to retrieve the records from the indexed column(s).

The SAS counterpart to Oracle’s query execution plan functionality is the _tree option in Proc SQL4. Unfortunately, through SAS Version 8.2 the _tree option does not provide as nearly as much information on the query execution plan as Oracle and no information on index use. For this reason, using the explain plan statement in tandem with the SQL Pass-Through facility is essential to the SAS user for fine-tuning queries against an Oracle database.

3 Omit the CREATE statement and start with SELECT statement.

4 The option _tree is a more detailed and graphical description of the _method option. Lavery (2004) provides one of the first detailed descriptions of _tree.

4 NESUG 17 Posters

Though a variable which is indexed is included in the WHERE clause of a query, it is not guaranteed that the index will be used to execute the query. Generally speaking, if the amount of information requested in the WHERE clause of an indexed variable exceeds 30% of the total amount of data for that variable then the index will not be used. Returning to the scenario where a SAS user wanted to obtain information on ‘procs’ from the Procedures manual, she certainly would certainly find it more efficient to start reading the book from the beginning rather than using the index.. Also, using a wildcard at the beginning of a logical expression involving an indexed variable in the WHERE clause will preclude the index from being used. For example, suppose a user needed to retrieve records for all females from a database and that the last digit of the indexed character field subject_id identifies the subject’s gender. In this case, using the LIKE operator with the % wildcard will assist in returning all the relevant records, but without help from the index.

/* Index will not be used */

where subject_id LIKE '%F'

The paper by Raithel (2004) provides an excellent reference on forms of the WHERE clause that allow for the possibility of indices to be used for SAS data set, which generally apply when querying Oracle tables.

Providing Hints to the Query Optimizer

Queries submitted against a database sometimes do not execute well, and sometimes they never finish at all. It is possible that the statistics garnered about the indices by the DBAs are old or unreliable or that the user has knowledge about the database that the query optimizer does not. This may cause queries to execute based upon a suboptimal execution plan, which can be verified by the user. The user has the ability to provide hints to the query optimizer, which can alter the query execution plan. Hints are embedded within /*+ */ in the SELECT statement, as demonstrated below:

/* Sending hints to the query optimizer */

proc sql;

/* Example of the INDEX hint using Pass-Through */ connect to oracle (user=&me password=&myob path=&lesstaken); create table

as select * from connection to oracle (select /*+ INDEX (
) */ remaining SQL code here ); disconnect from oracle; quit;

In the example above, the INDEX hint explicitly requests a scan of the index in the specified table. There are other hints such as:

5 NESUG 17 Posters

• INDEX_ASC – used for a range scan, which is often associated with the logical operators BETWEEN, >, >=, <>, < and <=.

• NO_INDEX- disallows a set of indices5.

When joining many tables together the user may have to provide a series of hints to get the ideal plan of execution.

Creating an Index

The methods outlined in this paper so far have dealt with existing table indices. However, if a user has privileges to create Oracle tables then she has the ability to create indices on those tables. It was mentioned earlier in the paper that resources are needed to store indices, so some things to consider before creating them are:

• Null values in an Oracle table are not included in an index. Therefore, the higher the proportion of null values in a column the less effective the index may be. • Columns which have a more diverse collection of values (i.e. Social Security numbers) are better candidates to be indexed than those with less varied values (i.e. Gender). • An index can be created on a single column (called a simple index) or on a group of columns (called a composite index). If a group of variables are often included in the WHERE clause of queries, then it is more efficient to create a composite index on those variables rather than multiple simple indices.

The code below outlines how to create and drop6 an index from an Oracle table:

proc sql; connect to oracle (user=&me password=&myob path=&lesstaken);

/* Creating an index */ execute( create index on

(variable1, variable2, etc) ) by oracle;

/* Dropping an index */ execute( drop index on

) by oracle;

disconnect from oracle; quit;

5 There are other types of hints which do not pertain to index usage.

6 When an indexed Oracle table is deleted, so are the indices.

6 NESUG 17 Posters

Conclusion

The methods outlined in this paper to aid in the fine-tuning of queries against an Oracle database were presented somewhat independently, but their use in practice should not. Getting acclimated with the indices in the database is strongly encouraged before writing queries. After perusing the query execution plan and monitoring query performance, the user has the ability to alter the query plan by rewriting the query, creating new indices and providing hints to the query optimizer. In fact, a hint may be given to use the index that you just created! But any attempt to alter the original execution plan should be followed by inspection of the new query plan.

References

Ensor, D. & Stevenson, I. Oracle Design. Sebastopol, CA: O’Reilly & Assoc., Inc., 1997.

Lavery, Russell (2004), “An Animated Guide: The SQL Optimizer: Reading Output for the _method and _tree Options”. Proceedings of the Seventeenth Annual Northeast SAS Users Group Conference, USA.

McCullough, C. Oracle 8 for Dummies. Foster, CA: IDG Books Worldwide, Inc., 1997.

Raithel, Michael A. (2004), “Creating and Exploiting SAS® Indexes”. Proceeding of the Twenty-Eighth SAS Users Group International Conference. USA.

Acknowledgements

I would like to thank Mike Molter, Jenni Borowiak and especially Steve Paciocco for their insightful comments on this paper.

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Oracle is a Registered Trademark of Oracle Corporation.

Contact Information

Your comments and questions are valued and encouraged.

Contact the author at: Kenneth W. Borowiak Howard M. Proskin & Associates, Inc. 300 Red Creek Drive, Suite 220 Rochester, NY 14623

E-mail: [email protected]

7