Effectively Using the Indices in an Oracle® Database with SAS® Kenneth W

NESUG 17 Posters Effectively Using the Indices in an Oracle® Database with SAS® Kenneth W. Borowiak, Howard M. Proskin & Associates, Inc. Abstract Key features of a database are indices, which can substantially aid in reducing the time it takes for queries submitted against the database to run. However, making effective use of them is a non-trivial task. This paper outlines how to identify table indices, create a query plan to see if and what indices are being used, create table indices and provide hints to the query optimizer in order to fine-tune queries submitted against an Oracle database with SAS. Introduction Important components of any medium to large sized database are indices (plural for index)1. Indices are guides to where the values of variables are located in the database. Making effective use of table indices when querying a database can substantially reduce the time it takes for queries to run. To aid in the fine-tuning of queries submitted against an Oracle database through SAS, this paper outlines how to: • identify table indices, • create a query execution plan to see if and what indices are being used, • provide hints to the query optimizer, and • create table indices. The obvious target audience of this paper is those SAS users who need to extract data from an Oracle database. However, SAS data sets can also be buttressed with indices. Therefore, the concepts presented in this paper (but not necessarily the code) can be useful for any SAS user interested in making effective use of indices. Preliminaries The values of the user id, password and path needed to make the connection to the Oracle tables will be embedded in the macro variables me, myob, and lesstaken, respectively. Oracle tables will be written in mixed-case and fields in lower-case; both will be in Courier New font. An Index Paradigm An index is a guide to where the values of a variable or group of variables reside in a database. A collection of table indices can be likened to an index at the end of a textbook. When you want to find information on a particular topic in the book you could look up the word in index and find what pages related information is located. The textbook index 1 Other important considerations are database design and partitioning. 1 NESUG 17 Posters paradigm of database indices just presented will continue to be built upon throughout the paper. What Indices Exist? Database indices have other similar characteristics to a textbook index. First, not every word that appears in a textbook will appear in the index because the numerous words or topics to keep track of would too cumbersome. Many words would not be very informative, such as the word ‘the’. Similarly, not every variable in a database should be indexed because resources are needed to create, store and maintain them. There are a couple of ways to ascertain the existence of table indices in an Oracle database. The database administrators may have readily accessible metadata on the available table indices, possibly in an HTML format. Oracle also provides a series of metadata or system tables. The All_Ind_Columns table contains information on which variables (found in the column_name field) are indexed (found in the index_name field). The following scripts can be submitted to extract information on table indices and indexed columns2: /* Identifying table indices */ libname Orasys oracle user=&me password=&myob path=&lesstaken schema=SYS; proc sql; /* Pass-Through method #1 */ connect to oracle (user=&me password=&myob path=&lesstaken); create table <table name> as select * from connection to oracle (select index_name, column_name from All_Ind_Columns); disconnect from oracle; /* or */ /*Pass-Through method #2-Requires privileges to create Oracle tables */ connect to oracle (user=&me password=&myob path=&lesstaken); execute ( create table <table name> as select index_name, column_name from All_Ind_Columns) by oracle; disconnect from oracle; /* or */ /* Accessing the metadata w/ a libref */ create table <table name> as select index_name, column_name from Orasys.All_Ind_Columns; quit; 2 Setting Oracle system tables in a DATA step is another way to extract information from these tables. The SAS counterpart of the Oracle system tables are the dictionary tables in the SASHELP library. However, extracting information from the SAS dictionary table into a data set can only be accomplished using Proc SQL. To avoid confusion as to how to retrieve information from the various metadata sources, my suggestion is to avoid using a DATA step. 2 NESUG 17 Posters It is important to know which columns in the database are indexed because including these variables in the WHERE clause of SQL queries can substantially reduce the time it takes for queries to finish executing. Are the Indices Being Used? A common problem that both the users of a database and a reader of a textbook face is whether or not to use an index. Let us suppose a SAS user wanted to find out information on ‘procs’ in the Procedures manual. After looking up the word in the index the user might find that the word ‘proc’ appears on 95% of the pages in the manual. The user could consult the index to find where the first page the word ‘proc’ occurs, check that page for the information of interest, consulting the index again if necessary, checking that page and so forth. Alternatively, the user could start from page 1 and read every page until she finds the information of interest. As it turns out, the decision to use an index when querying a database is not totally left up to the user. When an SQL statement is submitted, the information in the code is passed along to the query optimizer, which determines the best possible execution plan to return the records you requested. The Oracle query optimizer uses a cost-based approach to make its decision based upon statistics gathered about the tables and indices. The SAS user has the opportunity to inspect the query execution plan, but must first define the table that the execution plan information will populate by submitting the following code: proc sql; connect to oracle (user=&me password=&myob path=&lesstaken); execute( create table Plan_Table ( statement_id varchar2(30), timestamp date, remarks varchar2(80), operation varchar2(30), options varchar2(30), object_node varchar2(128), object_owner varchar2(30), object_name varchar2(30), object_instance number(38), object_type varchar2(30), optimizer varchar2(255), search_columns number, id number(38), parent_id number(38), position number(38), cost number(38), cardinality number(38), bytes number(38), other_tag varchar2(255), partition_start varchar2(255), partition_stop varchar(255), partition_id number(38), other long, distribution varchar2(30) ) ) by oracle; 3 NESUG 17 Posters By default, the table Plan_Table is where the information regarding the query plan will be outputted. However, any table with the same defining characteristics as Plan_Table is a candidate to be filled with the query plan information, if instructed to do so. Another table with same table definition as Plan_Table can be created by submitting the following piece of code: execute( create table <new table> like Plan_Table) by oracle; The section of code below now requests that the query plan information be outputted into an Oracle table: execute( explain plan <into <new table> > for enter SQL code here3 ) by oracle; disconnect from oracle; quit; Requested information regarding the query execution plan does not overwrite any pre- existing information in the designated table, but appends the new information to it. To delete the records from an existing table for re-use the following code should be submitted: execute( delete * from <table name> ) by oracle; The query plan table has the potential to contain a lot of information, but the discussion here will focus solely on index usage. To ascertain which indices (if any) are used to retrieve the query, the variables operation, object_name and operations from the execution plan table should be consulted. The field operation identifies what operation is performed at that step of the query execution, where the user would be looking for the entry ‘INDEX’. The field object_name will identify the index used. The field operations identifies whether a range scan or a unique scan was used to retrieve the records from the indexed column(s). The SAS counterpart to Oracle’s query execution plan functionality is the _tree option in Proc SQL4. Unfortunately, through SAS Version 8.2 the _tree option does not provide as nearly as much information on the query execution plan as Oracle and no information on index use. For this reason, using the explain plan statement in tandem with the SQL Pass-Through facility is essential to the SAS user for fine-tuning queries against an Oracle database. 3 Omit the CREATE statement and start with SELECT statement. 4 The option _tree is a more detailed and graphical description of the _method option. Lavery (2004) provides one of the first detailed descriptions of _tree. 4 NESUG 17 Posters Though a variable which is indexed is included in the WHERE clause of a query, it is not guaranteed that the index will be used to execute the query. Generally speaking, if the amount of information requested in the WHERE clause of an indexed variable exceeds 30% of the total amount of data for that variable then the index will not be used. Returning to the scenario where a SAS user wanted to obtain information on ‘procs’ from the Procedures manual, she certainly would certainly find it more efficient to start reading the book from the beginning rather than using the index.. Also, using a wildcard at the beginning of a logical expression involving an indexed variable in the WHERE clause will preclude the index from being used.

Load more