as select index_name, column_name from Orasys.All_Ind_Columns; quit;
2 Setting Oracle system tables in a DATA step is another way to extract information from these tables. The SAS counterpart of the Oracle system tables are the dictionary tables in the SASHELP library. However, extracting information from the SAS dictionary table into a data set can only be accomplished using Proc SQL. To avoid confusion as to how to retrieve information from the various metadata sources, my suggestion is to avoid using a DATA step.
2 NESUG 17 Posters
It is important to know which columns in the database are indexed because including these variables in the WHERE clause of SQL queries can substantially reduce the time it takes for queries to finish executing.
Are the Indices Being Used?
A common problem that both the users of a database and a reader of a textbook face is whether or not to use an index. Let us suppose a SAS user wanted to find out information on ‘procs’ in the Procedures manual. After looking up the word in the index the user might find that the word ‘proc’ appears on 95% of the pages in the manual. The user could consult the index to find where the first page the word ‘proc’ occurs, check that page for the information of interest, consulting the index again if necessary, checking that page and so forth. Alternatively, the user could start from page 1 and read every page until she finds the information of interest.
As it turns out, the decision to use an index when querying a database is not totally left up to the user. When an SQL statement is submitted, the information in the code is passed along to the query optimizer, which determines the best possible execution plan to return the records you requested. The Oracle query optimizer uses a cost-based approach to make its decision based upon statistics gathered about the tables and indices. The SAS user has the opportunity to inspect the query execution plan, but must first define the table that the execution plan information will populate by submitting the following code:
proc sql; connect to oracle (user=&me password=&myob path=&lesstaken); execute( create table Plan_Table ( statement_id varchar2(30), timestamp date, remarks varchar2(80), operation varchar2(30), options varchar2(30), object_node varchar2(128), object_owner varchar2(30), object_name varchar2(30), object_instance number(38), object_type varchar2(30), optimizer varchar2(255), search_columns number, id number(38), parent_id number(38), position number(38), cost number(38), cardinality number(38), bytes number(38), other_tag varchar2(255), partition_start varchar2(255), partition_stop varchar(255), partition_id number(38), other long, distribution varchar2(30) ) ) by oracle;
3 NESUG 17 Posters
By default, the table Plan_Table is where the information regarding the query plan will be outputted. However, any table with the same defining characteristics as Plan_Table is a candidate to be filled with the query plan information, if instructed to do so. Another table with same table definition as Plan_Table can be created by submitting the following piece of code:
execute( create table like Plan_Table) by oracle;
The section of code below now requests that the query plan information be outputted into an Oracle table:
execute( explain plan > for enter SQL code here3 ) by oracle; disconnect from oracle; quit;
Requested information regarding the query execution plan does not overwrite any pre- existing information in the designated table, but appends the new information to it. To delete the records from an existing table for re-use the following code should be submitted:
execute( delete * from
) by oracle; The query plan table has the potential to contain a lot of information, but the discussion here will focus solely on index usage. To ascertain which indices (if any) are used to retrieve the query, the variables operation, object_name and operations from the execution plan table should be consulted. The field operation identifies what operation is performed at that step of the query execution, where the user would be looking for the entry ‘INDEX’. The field object_name will identify the index used. The field operations identifies whether a range scan or a unique scan was used to retrieve the records from the indexed column(s).
The SAS counterpart to Oracle’s query execution plan functionality is the _tree option in Proc SQL4. Unfortunately, through SAS Version 8.2 the _tree option does not provide as nearly as much information on the query execution plan as Oracle and no information on index use. For this reason, using the explain plan statement in tandem with the SQL Pass-Through facility is essential to the SAS user for fine-tuning queries against an Oracle database.
3 Omit the CREATE statement and start with SELECT statement.
4 The option _tree is a more detailed and graphical description of the _method option. Lavery (2004) provides one of the first detailed descriptions of _tree.
4 NESUG 17 Posters
Though a variable which is indexed is included in the WHERE clause of a query, it is not guaranteed that the index will be used to execute the query. Generally speaking, if the amount of information requested in the WHERE clause of an indexed variable exceeds 30% of the total amount of data for that variable then the index will not be used. Returning to the scenario where a SAS user wanted to obtain information on ‘procs’ from the Procedures manual, she certainly would certainly find it more efficient to start reading the book from the beginning rather than using the index.. Also, using a wildcard at the beginning of a logical expression involving an indexed variable in the WHERE clause will preclude the index from being used. For example, suppose a user needed to retrieve records for all females from a database and that the last digit of the indexed character field subject_id identifies the subject’s gender. In this case, using the LIKE operator with the % wildcard will assist in returning all the relevant records, but without help from the index.
/* Index will not be used */
where subject_id LIKE '%F'
The paper by Raithel (2004) provides an excellent reference on forms of the WHERE clause that allow for the possibility of indices to be used for SAS data set, which generally apply when querying Oracle tables.
Providing Hints to the Query Optimizer
Queries submitted against a database sometimes do not execute well, and sometimes they never finish at all. It is possible that the statistics garnered about the indices by the DBAs are old or unreliable or that the user has knowledge about the database that the query optimizer does not. This may cause queries to execute based upon a suboptimal execution plan, which can be verified by the user. The user has the ability to provide hints to the query optimizer, which can alter the query execution plan. Hints are embedded within /*+ */ in the SELECT statement, as demonstrated below:
/* Sending hints to the query optimizer */
proc sql;
/* Example of the INDEX hint using Pass-Through */ connect to oracle (user=&me password=&myob path=&lesstaken); create table
as select * from connection to oracle (select /*+ INDEX ( ) */ remaining SQL code here ); disconnect from oracle; quit; In the example above, the INDEX hint explicitly requests a scan of the index in the specified table. There are other hints such as:
5 NESUG 17 Posters
• INDEX_ASC – used for a range scan, which is often associated with the logical operators BETWEEN, >, >=, <>, < and <=.
• NO_INDEX- disallows a set of indices5.
When joining many tables together the user may have to provide a series of hints to get the ideal plan of execution.
Creating an Index
The methods outlined in this paper so far have dealt with existing table indices. However, if a user has privileges to create Oracle tables then she has the ability to create indices on those tables. It was mentioned earlier in the paper that resources are needed to store indices, so some things to consider before creating them are:
• Null values in an Oracle table are not included in an index. Therefore, the higher the proportion of null values in a column the less effective the index may be. • Columns which have a more diverse collection of values (i.e. Social Security numbers) are better candidates to be indexed than those with less varied values (i.e. Gender). • An index can be created on a single column (called a simple index) or on a group of columns (called a composite index). If a group of variables are often included in the WHERE clause of queries, then it is more efficient to create a composite index on those variables rather than multiple simple indices.
The code below outlines how to create and drop6 an index from an Oracle table:
proc sql; connect to oracle (user=&me password=&myob path=&lesstaken);
/* Creating an index */ execute( create index on (variable1, variable2, etc) ) by oracle; /* Dropping an index */ execute( drop index on ) by oracle; disconnect from oracle; quit;
5 There are other types of hints which do not pertain to index usage.
6 When an indexed Oracle table is deleted, so are the indices.
6 NESUG 17 Posters
Conclusion
The methods outlined in this paper to aid in the fine-tuning of queries against an Oracle database were presented somewhat independently, but their use in practice should not. Getting acclimated with the indices in the database is strongly encouraged before writing queries. After perusing the query execution plan and monitoring query performance, the user has the ability to alter the query plan by rewriting the query, creating new indices and providing hints to the query optimizer. In fact, a hint may be given to use the index that you just created! But any attempt to alter the original execution plan should be followed by inspection of the new query plan.
References
Ensor, D. & Stevenson, I. Oracle Design. Sebastopol, CA: O’Reilly & Assoc., Inc., 1997.
Lavery, Russell (2004), “An Animated Guide: The SQL Optimizer: Reading Output for the _method and _tree Options”. Proceedings of the Seventeenth Annual Northeast SAS Users Group Conference, USA.
McCullough, C. Oracle 8 for Dummies. Foster, CA: IDG Books Worldwide, Inc., 1997.
Raithel, Michael A. (2004), “Creating and Exploiting SAS® Indexes”. Proceeding of the Twenty-Eighth SAS Users Group International Conference. USA.
Acknowledgements
I would like to thank Mike Molter, Jenni Borowiak and especially Steve Paciocco for their insightful comments on this paper.
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Oracle is a Registered Trademark of Oracle Corporation.
Contact Information
Your comments and questions are valued and encouraged.
Contact the author at: Kenneth W. Borowiak Howard M. Proskin & Associates, Inc. 300 Red Creek Drive, Suite 220 Rochester, NY 14623
E-mail: [email protected]
7