Database High Availability Management
“OPTIMAL USAGE OF ORACLE’S PARTITIONING OPTION”
Frank Bommarito, SageLogix, Inc.
PARTITIONING - THE BEGINNING
Partitioning, as a concept, has existed since the earliest large databases (e.g., data warehouses). The basic idea of partitioning is to divide one large table into multiple smaller units. Each of the smaller units (or partitions) can then be accessed and managed separately.
The enormous growth of many industries in the last decade has produced a phenomenal increase in data and, therefore, in the size of the databases and tables that hold this data. Such large-scale growth introduced interesting challenges for database administrators. Operations such as rebuilding indexes became nearly impossible to complete within designated outage windows. Copying tables from production to test environments became unmanageable. Query tuning was complicated by the sheer volume of data, resulting in sub-optimal performance for both index and full table scans. Partitioning was introduced to relieve these issues, allowing these large tables to keep growing while giving the database administrator the ability to manage the database within smaller maintenance windows.
Oracle-based partitioning was implemented fully with Oracle Enterprise Edition 8.0. From an application perspective, Oracle masks the partitioning, allowing SELECT, UPDATE, INSERT, and DELETE operations against all the partitions without application modifications. Additionally, Oracle’s optimizer is ‘partition aware’, meaning that it avoids operating on partitions that cannot match the query criteria. This process is known as partition pruning, and it adds a dimension of performance tuning that was previously unavailable for very large tables.
PARTITIONING CONCEPTS
When a partitioned table is created, two distinct levels of object exist for that table: GLOBAL and LOCAL. GLOBAL objects refer to the table as a whole and are not concerned with any of the individual pieces. LOCAL objects are the individual partitions themselves.
A standard Oracle table can have indexes, constraints, triggers, etc. These same features are available for partitioned tables. However, the implementation of the indexes is slightly different for partitioned tables. This difference stems from the fact that the table’s rows are physically stored in multiple objects as opposed to one object.
Consider the following example:
Create table range_partition (
  Part_key number,
  Value1   varchar2(30),
  Value2   number)
Partition by range (part_key) (
  partition p1   values less than (80000),
  partition p2   values less than (160000),
  partition pmax values less than (maxvalue)
);
In this example, a single table is created, but it contains three physical segments. Indexes must therefore be created so that they can access all of the table’s segments.
This statement creates a NON-PREFIXED LOCAL index. LOCAL indicates that the index is divided into THREE separate pieces, one for each table partition. NON-PREFIXED indicates that the index does not have the partition key as its leading column. The partition key is derived from the clause “Partition by range (part_key)”; in this example, the column part_key is the partition key.
Create index idx_example1 on range_partition (value2) LOCAL;
This statement creates one single index. This index is known as a GLOBAL index because it includes rows from all partitions.
Create index idx_example2 on range_partition (value1) GLOBAL;
This statement creates a PREFIXED LOCAL index, because its leading column is the partition key.
Create index idx_example3 on range_partition (part_key) LOCAL;
After executing the three index creation statements, three new database objects (indexes) and seven new database segments (physical segments) will exist: three for each LOCAL index and one for the GLOBAL index.
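This layout can be checked in the data dictionary. The sketch below assumes the table and indexes above were created in the current schema. USER_PART_INDEXES lists only partitioned indexes, so the GLOBAL index idx_example2 (a single, non-partitioned segment) appears only in the USER_SEGMENTS query; on releases that defer segment creation, empty partitions may not show a segment yet.

```sql
-- LOCALITY is LOCAL or GLOBAL; ALIGNMENT is PREFIXED or NON_PREFIXED
SELECT index_name, partitioning_type, locality, alignment
  FROM user_part_indexes
 WHERE table_name = 'RANGE_PARTITION';

-- One row per physical index segment: INDEX PARTITION rows for each LOCAL
-- index (three apiece) and a single INDEX row for the GLOBAL index
SELECT segment_name, partition_name, segment_type
  FROM user_segments
 WHERE segment_name IN ('IDX_EXAMPLE1', 'IDX_EXAMPLE2', 'IDX_EXAMPLE3')
 ORDER BY segment_name, partition_name;
```

idx_example1 would be expected to report NON_PREFIXED and idx_example3 PREFIXED.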
Performance Considerations: Please look at the following example.
Facts: The table range_partition is loaded with 256,000 rows. All three partitions have a near equal distribution of those rows. All three columns have unique values. The three indexes above have been created on the table.
20,000 queries are generated and executed for each column that is indexed.
Results (partitioned table):
Column     Index Type           Total Time
Value1     GLOBAL               7 minutes 20 seconds
Value2     NON-PREFIXED LOCAL   12 minutes 30 seconds
Part_key   PREFIXED LOCAL       6 minutes 10 seconds
The same table was created without partitioning. Same table, same rows!
Column     Index Type           Total Time
Value1     Standard             7 minutes 20 seconds
Value2     Standard             7 minutes 20 seconds
Part_key   Standard             7 minutes 20 seconds
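GLOBAL indexes perform like indexes on non-partitioned tables. PREFIXED LOCAL indexes are the fastest option because partition pruning can occur: the partition key in the predicate identifies the single index partition to search. NON-PREFIXED LOCAL indexes are the slowest because, without the partition key, each lookup must probe every index partition (in this example, three index scans per query instead of one).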
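The pruning difference can be seen in the execution plan. A minimal sketch, assuming Oracle9i Release 2 or later (for DBMS_XPLAN) and a PLAN_TABLE available to the current user; the literal 100000 is an arbitrary key that falls in P2. The first plan would typically show a PARTITION RANGE SINGLE step, and the second a PARTITION RANGE ALL step covering all three partitions, although the optimizer may choose a different access path depending on statistics.

```sql
-- Predicate on the partition key: the prefixed LOCAL index idx_example3
-- needs to be searched in only one partition
EXPLAIN PLAN FOR
  SELECT * FROM range_partition WHERE part_key = 100000;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Predicate on value2 only: every partition of the non-prefixed LOCAL
-- index idx_example1 has to be probed
EXPLAIN PLAN FOR
  SELECT * FROM range_partition WHERE value2 = 100000;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The Pstart and Pstop columns of each plan show which partitions are actually visited.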
Page 2 of 10 Frank Bommarito – SageLogix, Inc. Paper# 35697 Database High Availability Management
PARTITION MAINTENANCE
Given these performance considerations, why would anyone use a non-prefixed LOCAL index? The answer is maintenance. Partitioning makes it possible to perform maintenance on excessively large tables in a timely manner, largely because the table is split into smaller, more manageable units. Maintenance can be performed on each individual partition without affecting the other partitions, and because the partitions are independent, maintenance on several partitions can take place simultaneously.
However, maintenance on any partition does affect GLOBAL table items. Each GLOBAL item that partition maintenance can affect is described below. Maintenance operations may include:
1. Rebuilding a specific partition’s data segments
2. Exchanging a non-partitioned table with a partition
3. Merging two partitions together
4. Splitting one partition into two
5. Adding new partitions to the table
6. Dropping old partitions from the table
GLOBAL INDEXES
Whenever even a single ROW is affected by a partition maintenance operation, the ENTIRE global index is marked UNUSABLE. Starting with Oracle9i, the UPDATE GLOBAL INDEXES clause can be added to partition maintenance operations. With this clause, Oracle maintains only the GLOBAL index entries affected by the operation instead of invalidating the whole index. This maintenance slows the operation down; however, the slowdown is minimal compared to the impact of INVALIDATING important indexes.
Example:
Alter table range_partition move partition p1 tablespace new_tablespace;
This command rebuilds the partition and places the rebuilt partition in the tablespace new_tablespace. If partition p1 contains one or more rows, then any global indexes become UNUSABLE. This means that applications will begin to receive errors (typically ORA-01502) whenever they need to access the index. The following command rebuilds the unusable index:
Alter index idx_example2 rebuild;
However, this command must re-index the entire table’s contents rather than only the changes.
The following command (Oracle9i and above) performs the same move but adds the task of maintaining the global index as part of the operation:
Alter table range_partition move partition p1 tablespace new_tablespace update global indexes;
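After a MOVE PARTITION, it is worth checking which index pieces need attention. A sketch assuming the names used above: USER_INDEXES reports VALID or UNUSABLE for the non-partitioned GLOBAL index (and N/A for partitioned indexes), while USER_IND_PARTITIONS reports each LOCAL index partition. UPDATE GLOBAL INDEXES does not maintain LOCAL indexes, so the moved partition’s LOCAL index partitions are typically left UNUSABLE and can be rebuilt one at a time; unless named explicitly, LOCAL index partitions take the table partition names (P1 here).

```sql
-- GLOBAL (non-partitioned) index: VALID or UNUSABLE
SELECT index_name, status
  FROM user_indexes
 WHERE table_name = 'RANGE_PARTITION';

-- LOCAL index partitions: USABLE or UNUSABLE
SELECT index_name, partition_name, status
  FROM user_ind_partitions
 WHERE index_name IN ('IDX_EXAMPLE1', 'IDX_EXAMPLE3')
 ORDER BY index_name, partition_name;

-- Rebuild only the LOCAL index pieces belonging to the moved partition,
-- not the whole index
ALTER INDEX idx_example1 REBUILD PARTITION p1;
ALTER INDEX idx_example3 REBUILD PARTITION p1;
```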
CONSTRAINTS
Constraints are likely the single largest obstacle to partition maintenance. Most partition maintenance operations do not work while constraints are enabled. Typically, the constraint needs to be dropped and re-applied after the partition maintenance operations. To this end, Oracle has added syntax that is handy when disabling constraints:
Alter table table_name disable constraint pk_constraint keep index;
The KEEP INDEX clause preserves the index instead of dropping it. The maintenance operations can proceed, and the index pieces that need rebuilding can be rebuilt. Once the work is complete, the constraint can be re-enabled within a relatively short time period.
Example:
CREATE TABLE part_test (ID NUMBER NOT NULL, NUMB NUMBER)
  PARTITION BY RANGE (ID)
  (PARTITION P1 VALUES LESS THAN (10),
   PARTITION P2 VALUES LESS THAN (20));
CREATE unique INDEX part_test_pkx ON part_test (ID) LOCAL;
ALTER TABLE part_test ADD CONSTRAINT part_test_pk PRIMARY KEY (ID) USING INDEX;
create table fk_table (id number, descr varchar2(30));
ALTER TABLE fk_table ADD CONSTRAINT fk_table_fk FOREIGN KEY (ID) REFERENCES PART_TEST(ID);
create table part_exch (ID NUMBER NOT NULL, NUMB NUMBER);
insert into part_test values (1,1);
alter table part_test exchange partition p1 with table part_exch;
ERROR at line 1: ORA-02266: unique/primary keys in table referenced by enabled foreign keys
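One way around ORA-02266 is to disable the referencing foreign key for the duration of the exchange. A sketch against the tables above; it assumes the exchange also needs a primary key on PART_EXCH matching PART_TEST’s (otherwise a constraint-mismatch error such as ORA-14130 is typically raised), and the constraint name part_exch_pk is hypothetical. INCLUDING INDEXES swaps PART_EXCH’s index with the P1 partition of the LOCAL index part_test_pkx, so that index partition should not need a rebuild afterwards.

```sql
-- 1. Give PART_EXCH a primary key matching PART_TEST's (hypothetical name)
ALTER TABLE part_exch ADD CONSTRAINT part_exch_pk PRIMARY KEY (id);

-- 2. Temporarily disable the foreign key that references PART_TEST
ALTER TABLE fk_table DISABLE CONSTRAINT fk_table_fk;

-- 3. Swap segments: P1's row (1,1) moves to PART_EXCH and P1 becomes empty
ALTER TABLE part_test EXCHANGE PARTITION p1 WITH TABLE part_exch INCLUDING INDEXES;

-- 4. Re-enable the foreign key; existing FK_TABLE rows are validated again
ALTER TABLE fk_table ENABLE CONSTRAINT fk_table_fk;
```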
Stored PL/SQL
Oracle databases often contain stored PL/SQL. These program units depend on database objects; when a database object is modified, the dependent PL/SQL program units need recompilation to ensure that the modifications are valid. One might expect partition maintenance operations to be exempt from this, since adding a new partition does not appear to have any logical impact on stored PL/SQL. However, adding a new partition will invalidate any dependent PL/SQL program. Oracle9i and above handle this by automatically recompiling the invalidated programs.
When will PL/SQL become invalid? The answer is whenever the data dictionary needs to add or remove a row resulting from the partition maintenance operation.
The following command does not cause invalidation as the data dictionary is simply updated. Alter table range_partition exchange partition p1 with table no_partition;
The following command does cause invalidation as the data dictionary is removing a row. Alter table range_partition drop partition p1;
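The effect can be observed with USER_OBJECTS. In this sketch the procedure get_range_rows is hypothetical, and splitting PMAX is used in place of the DROP PARTITION above so that no data is lost; by the rule above, the split also adds a dictionary row. UPDATE GLOBAL INDEXES keeps idx_example2 usable while rows move into the new partition P3. Whether the procedure is actually marked INVALID may vary by release.

```sql
-- Hypothetical dependent procedure, created before the maintenance
CREATE OR REPLACE PROCEDURE get_range_rows (p_key IN NUMBER, p_cnt OUT NUMBER) AS
BEGIN
  SELECT COUNT(*) INTO p_cnt FROM range_partition WHERE part_key = p_key;
END;
/

-- Splitting PMAX adds a partition row to the data dictionary
ALTER TABLE range_partition SPLIT PARTITION pmax AT (240000)
  INTO (PARTITION p3, PARTITION pmax)
  UPDATE GLOBAL INDEXES;

-- Expected to show STATUS = 'INVALID' on the releases discussed here
SELECT object_name, object_type, status
  FROM user_objects
 WHERE object_name = 'GET_RANGE_ROWS';

-- Recompile explicitly instead of waiting for automatic recompilation on the next call
ALTER PROCEDURE get_range_rows COMPILE;
```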
TYPES OF PARTITIONING
The example shown above utilized a partitioning type known as RANGE partitioning.
Oracle supports two other types of partitioning: HASH and LIST. A partition can also be further divided into sub-partitions. Sub-partitioning, also referred to as composite partitioning, can be either HASH or LIST.
RANGE
Range partitions are the most common. Table and index partitions are based on ranges of values in the partition key columns, which tells the database the partition in which each row is stored. These partitions are typically used within data warehousing systems, and the most common range boundary is based on dates.
Each partition is defined with an upper boundary. The storage location of each occurrence is then found by comparing the partitioning key of the occurrence with this upper boundary. This upper boundary is non-inclusive; in other words, the key of each occurrence must be less than this limit for the record to be stored in this partition.
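The non-inclusive boundary can be demonstrated on the numeric RANGE_PARTITION table from the PARTITIONING CONCEPTS section, assuming partition P1 still exists. HIGH_VALUE in USER_TAB_PARTITIONS shows each partition’s upper limit, and a partition-extended table name such as PARTITION (p1) restricts a query to one partition. The value1 strings are arbitrary sample data.

```sql
-- HIGH_VALUE holds each partition's non-inclusive upper limit
SELECT partition_name, high_value
  FROM user_tab_partitions
 WHERE table_name = 'RANGE_PARTITION'
 ORDER BY partition_position;

-- 79999 is below P1's limit of 80000; 80000 is not, so it is stored in P2
INSERT INTO range_partition (part_key, value1, value2) VALUES (79999, 'boundary test a', 1);
INSERT INTO range_partition (part_key, value1, value2) VALUES (80000, 'boundary test b', 2);
COMMIT;

-- Read each partition directly to see where the two rows landed
SELECT part_key FROM range_partition PARTITION (p1) WHERE value1 LIKE 'boundary test%';
SELECT part_key FROM range_partition PARTITION (p2) WHERE value1 LIKE 'boundary test%';
```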
HASH
Hash partitions are ideal when there is no practical way to divide a table by range. Hash partitioning applies a hashing algorithm to the partition key value to determine the partition in which each row is stored; unlike range partitions, hash partitions have no upper boundaries. This type of partitioning is recommended when it is difficult to define the criteria for the distribution of data.
LIST
List partitions have a hard-coded LIST of values that determines which rows belong in each partition. A common usage is with states: a table partitioned by state would commonly have 50 partitions, one for each state.
SUB-PARTITIONS
Sub-partitions are used most often when the partitioning strategy does not produce partitions small enough to meet maintenance goals. When this is true, sub-partitions can further divide a table based on another column.
Examples
RANGE: A max partition will capture any values beyond the stated ranges, including NULLs.
Create table range_partition (
  date_col date)
partition by RANGE (date_col) (
  partition p_jan_2001 values less than (to_date('01022001','ddmmyyyy')),
  partition p_feb_2001 values less than (to_date('01032001','ddmmyyyy')),
  partition pmax values less than (maxvalue)
);
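A quick check of the MAXVALUE behavior, assuming the date-based RANGE_PARTITION table just defined (it reuses the earlier table name, so the earlier table would have to be dropped, or this one renamed, first). Without PMAX, both inserts would typically fail with ORA-14400.

```sql
-- A NULL partition key sorts above every value except MAXVALUE,
-- so the row is stored in PMAX
INSERT INTO range_partition (date_col) VALUES (NULL);

-- A date past the last explicit bound (1 March 2001) is also stored in PMAX
INSERT INTO range_partition (date_col) VALUES (TO_DATE('15062001', 'ddmmyyyy'));
COMMIT;

SELECT date_col FROM range_partition PARTITION (pmax);
```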
HASH: Hash partitions distribute rows most evenly when the number of partitions is a power of two, such as 8, 16, or 32.
Create table hash_partition (account_id varchar2(30))
partition by HASH (account_id)
partitions 16;
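To see how evenly rows spread across the hash partitions, one can load sample rows and compare NUM_ROWS per partition. A sketch with hypothetical generated account IDs; the CONNECT BY LEVEL row generator is a common idiom rather than anything specific to partitioning, and NUM_ROWS is only populated after statistics are gathered.

```sql
-- Hypothetical sample data: 100,000 distinct account IDs
INSERT INTO hash_partition (account_id)
  SELECT 'ACCT' || LPAD(LEVEL, 6, '0') FROM dual CONNECT BY LEVEL <= 100000;
COMMIT;

-- Populate NUM_ROWS for the table and each of its partitions
EXECUTE DBMS_STATS.GATHER_TABLE_STATS(USER, 'HASH_PARTITION', GRANULARITY => 'ALL');

-- With 16 (a power of two) partitions, the counts should be roughly equal
SELECT partition_name, num_rows
  FROM user_tab_partitions
 WHERE table_name = 'HASH_PARTITION'
 ORDER BY partition_position;
```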
LIST:
Create table list_partition (state_id varchar2(2))
partition by LIST (state_id) (
  partition P_MI values ('MI'),
  partition P_CO values ('CO')
);
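A value that appears in no partition’s list cannot be stored. A sketch assuming Oracle9i Release 2 or later, where a DEFAULT list partition is available; the partition name p_other is hypothetical.

```sql
-- 'TX' is in no value list, so this insert fails with
-- ORA-14400: inserted partition key does not map to any partition
INSERT INTO list_partition (state_id) VALUES ('TX');

-- A DEFAULT partition catches every value not listed elsewhere
ALTER TABLE list_partition ADD PARTITION p_other VALUES (DEFAULT);

-- The same insert now succeeds and the row is stored in P_OTHER
INSERT INTO list_partition (state_id) VALUES ('TX');
COMMIT;
SELECT state_id FROM list_partition PARTITION (p_other);
```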
PRACTICAL PARTITIONING USAGES
There are four widely accepted usage models for partitioning, each tailored to a particular need. Using partitioning within the boundaries of these models allows significant application improvements in the areas of performance, scalability, availability, and organization.
Partition Usage I: Data Warehousing. Partitions are typically based on date ranges (daily or monthly).
Partition Usage II: OLTP. Partitions are typically based on a frequently accessed key.
Partition Usage III: ODS. Partitions are typically based on a date range and a key.
Partition Usage IV: Temporary Storage. Partitions rotate and are reused over time.
A typical example is a table partitioned by day of the month. Thirty-one partitions are created, and a date function is used to place each row in the partition for its day of the month. Another application reads these partitions and TRUNCATES each one after reading its data.
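A sketch of this rotation, with hypothetical names: the table staging_rotate stores the day number (1 to 31) in its own column, which the loading application fills from SYSDATE, and the 31 LIST partitions are generated in a PL/SQL loop rather than typed out. RANGE partitioning on the same column would work just as well.

```sql
-- Build and run the DDL for a table with one partition per day of the month
DECLARE
  l_ddl VARCHAR2(4000) :=
    'CREATE TABLE staging_rotate (day_of_month NUMBER(2) NOT NULL, payload VARCHAR2(100)) ' ||
    'PARTITION BY LIST (day_of_month) (';
BEGIN
  FOR d IN 1 .. 31 LOOP
    -- D01 holds day 1, D02 holds day 2, ..., D31 holds day 31
    l_ddl := l_ddl || 'PARTITION d' || TO_CHAR(d, 'FM00') || ' VALUES (' || d || ')';
    IF d < 31 THEN
      l_ddl := l_ddl || ', ';
    END IF;
  END LOOP;
  EXECUTE IMMEDIATE l_ddl || ')';
END;
/

-- The loader places each row in the partition for the current day
INSERT INTO staging_rotate (day_of_month, payload)
  VALUES (TO_NUMBER(TO_CHAR(SYSDATE, 'DD')), 'sample row');
COMMIT;

-- The reader empties a day's partition after consuming it (day 5 shown)
ALTER TABLE staging_rotate TRUNCATE PARTITION d05;
```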
STATISTICS
The cost-based optimizer of Oracle is partitioning-aware. In fact, the rule-based optimizer does not “do” partitions.
The cost-based optimizer relies on statistics. Statistics on standard tables are easier to generate and comprehend than statistics on partitioned tables.
Statistics are the number one problem with partitioning implementations.
With partitions, there are LOCAL and GLOBAL statistics. GLOBAL statistics are utilized whenever GLOBAL operations are performed. LOCAL statistics are utilized when the partition key is available and partition elimination is possible.
Consider the following examples: Select * from range_partition where value1 = :b1;
In this example, value1 is indexed GLOBALLY, so only global statistics are reviewed. The optimizer then determines whether a full table scan or an index lookup is most appropriate.
Select * from range_partition Where value1 = :b1 And value2 = :b2 And part_key = :b3
In this example, local statistics are evaluated along with global statistics. Local statistics come into play because the PART_KEY is within the where clause.
Statistics can be gathered LOCALLY or GLOBALLY. Once gathered, they are, in effect, tied together. This means that partition maintenance operations that affect GLOBAL objects will also affect GLOBAL statistics. If a partition is added to an existing table, the GLOBAL statistics will “disappear”.
The low down on statistics is as follows:
NO table statistics
If there are NO table statistics at all, then the optimizer acts “relatively” rule-based.
Relatively Rule
Rule 1: If a GLOBAL index exists and can be used, it will be. LOCAL indexes are not considered unless no usable GLOBAL index exists.
Rule 2: If there are no GLOBAL indexes, LOCAL indexes will be used if they exist.
This means that having NO statistics is a viable option if all indexes created on the table are good choices and any GLOBAL indexes are superior to the LOCAL indexes. In conclusion, if the table is queried through a single column, and that column is both the partition key and indexed, then not gathering statistics at all is optimal.
Gathering Statistics - LOCALLY
If the following commands are used INITIALLY to gather statistics, and no other complementary command is used, then the GLOBAL statistics are derived:
execute dbms_stats.gather_table_stats(owner,'RANGE_PARTITION','P2',CASCADE=>TRUE);
or
execute dbms_stats.gather_table_stats(owner,'RANGE_PARTITION','P2',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE 200');
The only difference between these two statistics commands is the generation of histograms.
Consideration when generating statistics locally: GLOBAL statistics are populated after each run of a LOCAL command. Before statistics have been generated on any of the partitions:
Select num_rows from dba_tables where table_name = 'RANGE_PARTITION';
NUM_ROWS = NULL
Select partition_name, num_rows from dba_tab_partitions where table_name = 'RANGE_PARTITION';
All rows have NUM_ROWS = NULL
After a SINGLE execution of a local partition statistics statement:
execute dbms_stats.gather_table_stats('SYSTEM','RANGE_PARTITION','P1',CASCADE=>TRUE);
Select num_rows from dba_tables where table_name = 'RANGE_PARTITION';
Global result: NUM_ROWS = 250,000
Select partition_name, num_rows from dba_tab_partitions where table_name = 'RANGE_PARTITION';
Results:
P1     79999
P2     NULL
PMAX   NULL
The GLOBAL statistics are “guessed” at and populated. Once all LOCAL statistics are generated, the GLOBAL statistics are still an aggregate and not “reality”. What this means is that gathering statistics this way still uses the relatively rule method of optimization. If a GLOBAL index exists, it is used. LOCAL indexes are evaluated if the where clause allows for partition pruning.
Why is this? Because the above commands did not and do not account for GLOBAL table units (The GLOBAL indexes were never analyzed).
Once GLOBAL indexes are analyzed, then, all needed “units” have statistics and the optimizer takes over (heaven help us). execute dbms_stats.gather_index_stats('SYSTEM','RANGE_PARTITION_DESC');
Gathering Statistics - GLOBALLY
Using the following command to gather statistics will gather both GLOBAL and LOCAL statistics:
execute dbms_stats.gather_table_stats(owner,'RANGE_PARTITION',GRANULARITY=>'ALL',CASCADE=>TRUE);
This one command is equivalent to all of the commands above. This is the recommended approach for the initial gathering of statistics on partitions as this ensures that ALL statistics are gathered.
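Whether GLOBAL statistics were really gathered, rather than derived from the partitions, can be checked through the GLOBAL_STATS column. A sketch using the USER views: YES indicates statistics collected for the object as a whole, NO indicates statistics aggregated from lower-level (partition) statistics.

```sql
-- Gather GLOBAL and LOCAL statistics in one pass, as recommended above
EXECUTE DBMS_STATS.GATHER_TABLE_STATS(USER, 'RANGE_PARTITION', GRANULARITY => 'ALL', CASCADE => TRUE);

-- Table level: GLOBAL_STATS should now be YES
SELECT table_name, num_rows, global_stats, last_analyzed
  FROM user_tables
 WHERE table_name = 'RANGE_PARTITION';

-- Partition level
SELECT partition_name, num_rows, global_stats, last_analyzed
  FROM user_tab_partitions
 WHERE table_name = 'RANGE_PARTITION'
 ORDER BY partition_position;

-- Index level, including the GLOBAL index
SELECT index_name, num_rows, global_stats, last_analyzed
  FROM user_indexes
 WHERE table_name = 'RANGE_PARTITION';
```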
Partition Maintenance Effects
The effects of performing partition maintenance vary by release. In release 8.x, the GLOBAL statistics temporarily disappear (the num_rows value becomes NULL). In release 9.x, the GLOBAL statistics do not change.
In either case, the statistics are no longer valid and need updating. One advantage of a partitioned table is that maintenance work can be performed on smaller segments, which means that ONLY the modified partitions need their statistics updated. This update will correct the GLOBAL statistics for the table (including GLOBAL index statistics).
After any partition modification, gather statistics immediately on the affected partitions. Failure to do so can leave the optimizer unable to parse the SQL statement (this could lead to hanging). If this phenomenon occurs, the best corrective action is to remove all statistics and generate them again.
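A sketch of both the routine refresh and the corrective reset, assuming the RANGE_PARTITION table and that P2 is the partition touched by maintenance. DBMS_STATS.DELETE_TABLE_STATS removes table, partition, column, and (by default) index statistics.

```sql
-- Routine case: refresh only the partition that was modified (P2 here)
EXECUTE DBMS_STATS.GATHER_TABLE_STATS(USER, 'RANGE_PARTITION', 'P2', CASCADE => TRUE);

-- Corrective case: remove every statistic on the table and its indexes ...
EXECUTE DBMS_STATS.DELETE_TABLE_STATS(USER, 'RANGE_PARTITION');

-- ... then regather GLOBAL and LOCAL statistics in one pass
EXECUTE DBMS_STATS.GATHER_TABLE_STATS(USER, 'RANGE_PARTITION', GRANULARITY => 'ALL', CASCADE => TRUE);
```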
PARTITIONING OPTIONS
Enable Row Movement
Row movement is a partitioning option available in Oracle8i and above. It allows an update to the partition key when that update would relocate a row from one partition to another. Please note that such an operation WILL CHANGE the ROWID of the row, because the partition identification is stored within the ROWID. This could affect application programs that use ROWIDs.
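A sketch against the original numeric RANGE_PARTITION table from the PARTITIONING CONCEPTS section; the demo row and its value1 marker are hypothetical. With row movement disabled (the default), the first UPDATE is expected to fail with ORA-14402; after ENABLE ROW MOVEMENT, the same update relocates the row from P1 to PMAX, and its ROWID changes.

```sql
-- Hypothetical demo row, stored in P1 (part_key < 80000)
INSERT INTO range_partition (part_key, value1, value2) VALUES (100, 'row movement demo', 0);
COMMIT;

-- Fails while row movement is disabled: ORA-14402: updating partition key
-- column would cause a partition change
UPDATE range_partition SET part_key = 999999 WHERE value1 = 'row movement demo';

ALTER TABLE range_partition ENABLE ROW MOVEMENT;

SELECT ROWID FROM range_partition WHERE value1 = 'row movement demo';  -- ROWID in P1
UPDATE range_partition SET part_key = 999999 WHERE value1 = 'row movement demo';
COMMIT;
SELECT ROWID FROM range_partition WHERE value1 = 'row movement demo';  -- new ROWID in PMAX
```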
Exchange without validation
A partition can be exchanged with another table without validating the incoming rows, for example:
Alter table part_table exchange partition p1 with table fk_table without validation;
When this is done, the rows in the new partition may not be queried transparently. Validation ensures that the rows in fk_table qualify for the given partition. When this check is bypassed (for performance reasons), care must be taken to ensure that the new partition rows do not violate the partition boundaries.
Because partition pruning occurs before rows are selected, rows that violate the boundaries can produce false results from a query.
Example:
CREATE TABLE part_test (ID NUMBER NOT NULL)
  PARTITION BY RANGE (ID)
  (PARTITION P1 VALUES LESS THAN (10),
   PARTITION P2 VALUES LESS THAN (20));
create table fk_table (id number not null);
insert into part_test values (5);
insert into fk_table values (5);
commit;
alter table part_test exchange partition p2 with table fk_table without validation;
-- Returns 2 rows, both with the value 5
select * from part_test;
-- Returns 1 row with the value 5 (pruning searches only P1)
select * from part_test where id = 5;
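Before skipping validation, the incoming table can be checked explicitly. This sketch is meant to run in place of the final ALTER TABLE of the example above, while FK_TABLE still holds the value 5; P2 accepts values from 10 (P1’s limit) up to, but not including, 20.

```sql
-- Rows that do not belong in P2's range [10, 20); any count above zero
-- means the exchange must not be done WITHOUT VALIDATION
SELECT COUNT(*) AS out_of_range
  FROM fk_table
 WHERE id < 10 OR id >= 20;

-- Alternatively, let Oracle check: WITH VALIDATION rejects the same data with
-- ORA-14099: all rows in table do not qualify for specified partition
ALTER TABLE part_test EXCHANGE PARTITION p2 WITH TABLE fk_table WITH VALIDATION;
```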
Oracle Initialization Parameters
Oracle’s partitioning option converts one single table into many physical segments. More physical segments require more resources from the Oracle SGA. In particular, the Oracle initialization parameter DML_LOCKS must be set to accommodate partitioning. If a table has 1000 partitions, then DML_LOCKS must be set to at least 1000, or the table cannot be created.
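To size DML_LOCKS, a DBA can compare the current setting with the partition counts of the largest tables. A sketch assuming SELECT access to V$PARAMETER and an instance started from an SPFILE; DML_LOCKS is a static parameter, so a change takes effect only after a restart, and 2000 is an arbitrary example value.

```sql
-- Current setting (in SQL*Plus, SHOW PARAMETER dml_locks gives the same)
SELECT name, value
  FROM v$parameter
 WHERE name = 'dml_locks';

-- Partition counts per table in the current schema, largest first
SELECT table_name, COUNT(*) AS partition_count
  FROM user_tab_partitions
 GROUP BY table_name
 ORDER BY partition_count DESC;

-- Static parameter: record the new value in the SPFILE, then restart
ALTER SYSTEM SET dml_locks = 2000 SCOPE = SPFILE;
```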
CONCLUSION
With the advent of partitioning, maintenance operations can occur at the partition level rather than at the table or index level, allowing Database Administrators to provide improved SLAs. This alone makes partitioning a crucial aspect of any database containing large amounts of data. As maintenance windows shrink because of the cost of downtime, understanding methods to shorten database downtime is critical for success. After researching the various methods, each DBA should test the partitioning scheme best suited to his or her environment. Performance and maintenance are the primary concerns to account for when implementing the partitioning option. Partitioning large tables allows faster data access as well as shorter maintenance windows. Partitioning should obviously not be taken lightly; however, it should be considered for any database with excessive data or when excessive growth is anticipated.
Please check out our website at www.sagelogix.com/partitioning. This location offers a downloadable zip file containing source code that automates the maintenance of date-based range partitions.