Partition Techniques for Database Systems
Presentation for the seminar General Informatics
Sascha Herrmann

Overview

- Motivation
- Types of partitioning
- Benefits of partitioning
- Implementation of partitioning
- Case study
- Conclusion

Motivation

- The growing demand for information availability leads to ever larger databases.
- Data warehouse systems and other databases tend to grow to sizes in the terabyte range.
- In 1998 the biggest known centralized database had a size of 4.3 terabytes; by December 2001 it had grown to 10.4 terabytes.
- Databases, and even single tables, of several terabytes place high performance demands on database management systems.

Problems of unpartitioned tables

- Big, unpartitioned tables are hard for database systems to manage.
- There is no easy way to parallelize queries on a single table.
- Data throughput is limited to the capabilities of one physical storage medium.
- Seeks of hard disk heads can't be done in parallel.
- Loading or deleting large amounts of data results in long-lived table locks and puts heavy load on table indices.
- There is no way to control where particular tuples are physically stored.

Types of partitioning

- table partitioning
  - horizontal partitioning
    - value based (one- / multidimensional)
      - interval
      - hash
    - not value based
      - random
      - round-robin
  - vertical partitioning

From: NOWITZKY, P.
345

Range partitioning

T: (measure_station; measure_point; measure_date)

F_p =
    P1 if 2001-01-01 ≤ DATE < 2001-04-01
    P2 if 2001-04-01 ≤ DATE < 2001-07-01
    P3 if 2001-07-01 ≤ DATE < 2001-10-01
    P4 if 2001-10-01 ≤ DATE < 2002-01-01
    P5 else

[Figure: table T split into partitions 1 to 5]

Index partitioning

- Indices can be partitioned too.
- local indices
  - local prefixed indices
  - local non-prefixed indices
- global indices

Benefits of partitioned tables

- reduced seek times and increased throughput of physical devices
- exclusion of unaffected partitions
- easy parallelism of queries
- localization of maintenance operations
- control over the physical location of data
- greater fault tolerance
- transparent to the modelling / user view

Partition exclusion

- The partition in which a tuple is stored is unambiguously determined by the partition criteria.
- Queries which use the partition criteria can exclude partitions from search operations:

    select measure_point from T
    where measure_date between '2001-06-21' and '2001-09-02'

- This greatly reduces the amount of data that has to be processed.
- Partitioning clusters data together by its partition criteria.

Parallelism

- Queries can process several partitions in parallel.
- Seek time is reduced because physical seeks are no longer serialized.
- I/O throughput increases because data is read in parallel.
- Operations can be calculated in parallel: interim results are computed in parallel and then joined to the final result.

    select avg(measure_point) from T where measure_station = 1

Physical Storage Control

- For unpartitioned tables, physical storage can only be controlled per table.
- With partitioning, control is extended to the partition level.
- Partitions with historical or infrequently used data can be stored on cheap storage media.
- Partitions can be set read-only to protect data that is no longer being modified.
- Storage on read-only media is possible as well.

Localized maintenance

- For unpartitioned tables, maintenance operations affect the whole table.
- On partitioned tables, maintenance operations can be localized to single partitions.
- Backups can be done
  partition based.
- Large insert or delete operations affect only single partitions.
- Large delete jobs can be avoided by dropping whole table partitions.

Manual partitioning

- Not all database systems support partitioning.
- Combining identically structured tables with UNION can serve as a substitute.
- Parallel queries are possible.
- Indices are localized.
- Physical storage can be controlled.
- Maintenance can be localized.
- There are no global indices.
- Partition exclusion depends on the query optimizer.
- Transparency is hard to achieve.

    create table part_test1 (id int not null, name varchar(200), mdate date not null, value int);
    alter table part_test1 add constraint check1 check (mdate < '2003-01-01');

    create table part_test2 (id int not null, name varchar(200), mdate date not null, value int);
    alter table part_test2 add constraint check2 check (mdate between '2003-01-01' and '2003-12-31');

    create table part_test3 (id int not null, name varchar(200), mdate date not null, value int);
    alter table part_test3 add constraint check3 check (mdate > '2003-12-31');

    create or replace view part_test as
      select * from part_test1 where mdate < '2003-01-01'
      union all
      select * from part_test2 where mdate between '2003-01-01' and '2003-12-31'
      union all
      select * from part_test3 where mdate > '2003-12-31';

    create or replace rule part_test_rule1 as
      on insert to part_test where new.mdate < '2003-01-01'
      do instead insert into part_test1 values (new.id, new.name, new.mdate, new.value);

    create or replace rule part_test_rule2 as ...
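The manual scheme above (one table per date range, a UNION ALL view on top, and insert routing) can be tried out without a server. The following is a minimal sketch, not the setup used in the presentation: it assumes Python's built-in sqlite3 module, stores dates as ISO-8601 text so that string comparison matches date order, and swaps the rules shown above for SQLite INSTEAD OF triggers, since SQLite has no rule system. The trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One physical table per date range; the CHECK constraint plays the
# role of the partition criterion (dates are ISO-8601 text here).
ranges = {
    "part_test1": "mdate < '2003-01-01'",
    "part_test2": "mdate BETWEEN '2003-01-01' AND '2003-12-31'",
    "part_test3": "mdate > '2003-12-31'",
}
for table, cond in ranges.items():
    cur.execute(
        f"CREATE TABLE {table} (id INTEGER NOT NULL, name TEXT, "
        f"mdate TEXT NOT NULL, value INTEGER, CHECK ({cond}))"
    )

# The view presents the three tables as one logical table to readers.
cur.execute(
    "CREATE VIEW part_test AS "
    + " UNION ALL ".join(f"SELECT * FROM {t}" for t in ranges)
)

# Assumption: INSTEAD OF triggers stand in for the insert rules.
# Each trigger fires only when the new row matches its date range.
for table, cond in ranges.items():
    new_cond = cond.replace("mdate", "NEW.mdate")
    cur.execute(
        f"CREATE TRIGGER {table}_ins INSTEAD OF INSERT ON part_test "
        f"WHEN {new_cond} BEGIN "
        f"INSERT INTO {table} VALUES (NEW.id, NEW.name, NEW.mdate, NEW.value); "
        f"END"
    )

# Inserts go through the view and are routed to the matching table.
rows = [
    (1, "a", "2002-06-15", 10),
    (2, "b", "2003-03-01", 20),
    (3, "c", "2004-02-02", 30),
]
cur.executemany("INSERT INTO part_test VALUES (?, ?, ?, ?)", rows)

counts = {
    t: cur.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in ranges
}
total = cur.execute("SELECT COUNT(*) FROM part_test").fetchone()[0]
print(counts, total)
```

Whether a query on the view actually skips the tables that cannot match is left to the query optimizer, which is the drawback the list above points out.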
Partitioned tables

- Partitioning support by the DBMS solves the problems of manual partitioning:
  - It is transparent to applications.
  - Global indices are available.
  - The query optimizer is prepared for partitioned tables.
- Since partitioning is done by the DBMS, the probability of user errors is reduced.

    create table bde (measure_station number, measure_point number, measure_date date)
      storage (initial 10M) tablespace ts1
      partition by range (measure_date) (
        partition bde_q1 values less than ('01-APR-1997'),
        partition bde_q2 values less than ('01-JUL-1997') storage (initial 20M) tablespace ts2,
        partition bde_q3 values less than ('01-OCT-1997') storage (initial 20M) tablespace ts3,
        partition bde_q4 values less than (MAXVALUE) storage (initial 40M) tablespace ts4);

From: HERRMANN, P. 208

Case Study

- Case study by Sanjay Mishra
- Two tables, each containing 3,650,000 rows:
  - sales_forecast(part_id; forecast_date; quantity)
  - actual_sales(part_id; sale_date; quantity)
- Two queries:

    select count(*) from actual_sales
    where part_id < 1600 and quantity < 500

    select count(*) from sales_forecast f, actual_sales s
    where s.part_id = f.part_id and s.sale_date = f.forecast_date
      and s.quantity > f.quantity

- Comparison of the performance of an unpartitioned table and a table range partitioned by part_id into 10 partitions

Case Study Result

                                  result    time elapsed   physical reads
    query 1 without partitioning  29245     7.04 s         10535
    query 1 with partitioning     29245     2.04 s         2124
    query 2 without partitioning  1825057   3:57 min       110480
    query 2 with partitioning     1825057   1:54 min       44270

Conclusion

- Partitioning offers great performance increases if queries follow a pattern that partitioning can exploit.
- It allows localized maintenance.
- Failures of single partitions don't affect the whole table.
- Partitioning is transparent to the user / application view.
- Most larger DBMSs support partitioning: Oracle, DB2, Informix.
- For Oracle with a single-CPU, unlimited-user licence, the partitioning option costs about 10,000 €.

Sources

NOWITZKY: Informatik Spektrum,
December 2001, Heidelberg, P. 345-355
HERRMANN: Oracle 8 für den DBA, Herrmann; Lenz; Unbescheid, 1998, Bonn
MISHRA: http://www.dbazine.com/mishra2.shtml, Sanjay Mishra, 13.05.2004