
VLDB - An Analysis of DB2 at Very Large Scale

Austin Clifford IBM DRAFT Session Code: 2130 Fri, May 18, 2012 (08:00 AM - 09:00 AM) | Platform: DB2 for LUW - II

Presentation Objectives

1) Design & implementation of a VLDB. 2) Benefits and best practice use of DB2 Warehouse features. 3) Ingesting data into VLDB. 4) Approach & considerations to scaling out VLDB as the system grows. 5) Management and problem diagnosis of a VLDB.

Disclaimer

●© Copyright IBM Corporation 2012. All rights reserved. ●U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

●THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.

•IBM, the IBM logo, ibm.com, and DB2 are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml 4

What is a Very Large Database?

A very large database, or VLDB, is a database that contains an extremely high number of tuples (database rows), or occupies an extremely large physical filesystem storage space. The most common definition of VLDB is a database that occupies more than 1 terabyte. 5

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB 6

VLDB Mission

● Real-time analytics is placing increasing demands on data warehouse systems. ● Verify the performance and scalability of DB2 and its complementary products at the Petabyte scale. ● Simulate heavy on-peak analytics in parallel with other essential system functions such as data ingest and backup and recovery. ● Guide best practices and future product direction. ● Develop techniques for massive scale rapid data generation. 7

Digital Data 101 – What is a Petabyte?

● 1 Bit = Binary Digit ● 8 Bits = 1 Byte ● 1024 Bytes = 1 Kilobyte ● 1024 Kilobytes = 1 Megabyte ● 1024 Megabytes = 1 Gigabyte ● 1024 Gigabytes = 1 Terabyte ● 1024 Terabytes = 1 Petabyte ● 1024 Petabytes = 1 Exabyte ● 1024 Exabytes = 1 Zettabyte ● 1024 Zettabytes = 1 Yottabyte ● 1024 Yottabytes = 1 Brontobyte ● 1024 Brontobytes = 1 Geopbyte 8

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB 9

The Building Blocks We Start with the Storage:

1x = 450GB

1PB of DB Data = Raw Data + RAID + Contingency = 1.6PB

4,608 x = 1.6PB 10

The Building Blocks ● Disks get housed in EXP5000 enclosures ● EXP5000 can hold 16 disks

4608/16 = x 288

● EXP5000 need a DS5300 Storage controller to manage the IO activity (1 DS for 18 EXP)

x 288 = x 16 11

The Building Blocks

●That's the storage done – now we need to drive the system with servers. ●To maximise the advantages of parallel processing, the 16 Storage controllers & disks are assigned to 1 cluster each, with a Smart Analytics guideline of 4 p550 Servers per cluster (64 servers total)

= 4 x 12

The Building Blocks ●The communication between devices takes place via Juniper Network switches for the copper networks and IBM SAN switches for the fiber networks

●The server control for the 64 servers is managed by the HMC (Hardware Management Console) 13

Hardware Summary

● Full VLDB deployment: ● Smart Analytics like configuration ● 64 p550 Servers ● 16 DS5300 Storage Controllers ● 288 EXP5000 Disk Enclosures ● 4,608 Disks (450GB each -> 1.6PB) ● 8 IBM SAN switches (24p/40p) ● 7 Juniper Network switches (48p) ● 2 HMCs ● 6KM of copper cables ● 2KM of fiber cables ● Occupies 33 fully loaded racks ● Latest 'Free cooling' designs are incorporated into the lab ● Resulting in a predicted saving of 60% of the power required for cooling 14

Where is the system housed?

● The VLDB deployment when racked up, occupies 33 fully populated racks ● At project inception, there was no lab on the Dublin campus that could house the power and cooling requirements ● A brand new lab was built ● Each device and Rack for the VLDB system was delivered individually in its own packaging and had to be unpacked and racked ● Packaging should not be underestimated!! ● The VLDB project filled 7 industrial dumpsters with packaging. 15

Free Cooling

● There are 6 CRAC (Computer Room Air Con) units in the IM Lab ● Ireland's favourable (?) climate results in significant savings for Computer room cooling ● As long as the outside air temp is below 9.5 degrees C, 100% of the cooling of the room is by fresh air ● Over the full year, 80% of the cooling will be fresh air provisioned 16

Expansion Groups 17

Software Stack

● The following software was installed on the system: ● DB2 (Server 9.7 Fix Pack 5) ● IBM AIX 6.1 TL6 SP5 ● IBM General Parallel File System (GPFS™ ) 3.3.0.14 ● IBM Tivoli System Automation for Multi-Platforms 3.1.1.3 ● IBM DS Storage Manager 10.60.G5.16. 18

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB 19

Shared Nothing Architecture

select … from table

Tables

[Diagram: Fast Communication Manager connecting Engine + data+log on Partition 1, Partition 2, Partition 3, … Partition n]

Database ● Partitioned Database Model ● Database is divided into 504 partitions ● Partitions run on 63 physical nodes (8 partitions per host) ● Each Partition Server has dedicated resources ● Parallel Processing occurs on all partitions: coordinated by the DBMS ● Single system image to user and application 20

Shared Nothing Architecture ● Hash Partitioning ● Provides the best parallelism and maximizes I/O capability ● VLDB management (recovery, maintenance, etc.) ● Large scans automatically run in parallel... ● All nodes work together ● Truly scalable performance ● As we have 504 partitions, then it should finish in 1/504th of the time ● And not just the queries, but the utilities too (backup/restore, load, index build etc) 21
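
A minimal sketch of the point about utilities (the database name and backup path are hypothetical; ON ALL DBPARTITIONNUMS is standard partitioned-database syntax): one command drives the backup on every partition in parallel.

db2 "BACKUP DATABASE myDB ON ALL DBPARTITIONNUMS TO /backup COMPRESS"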

Mapping DB2 Partitions to Servers

[Diagram: Node 1 hosts part0 and part1, Node 2 hosts part2 and part3; the partitions communicate through the FCM]

# db2nodes.cfg
0 node1 0
1 node1 1
2 node2 0
3 node2 1

•DB2 instance configuration file sqllib/db2nodes.cfg
•All databases in the instance share this definition
•File in the DB2 instance directory
•Sqllib directory located on one node of the system
•GPFS/NFS mounted by all other nodes 22

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB 23

Database Design

● Star and snowflake ● Sampled production database artifacts. ● Dimensional levels and hierarchies. ● Larger dimension tables are typically snow-flaked. ● No referential integrity – relationships inferred. ● Dimension tables have surrogate PKs ● Fact tables - composite PK or non-unique PK. ● Dimension FKs are indexed. ● All tables are compressed. 24

Database Design

● Star schema for 4 largest fact tables 25

Database Design

● Partition Groups ● Small dimension tables in SDPG. ● Fact and large dimension tables are partitioned. ● Collocation of Facts and largest/frequently joined dimension. ● Disjoint partition groups to drive table queueing. 26
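
A sketch of the partition-group layout described above (group, partition and tablespace names are illustrative): small dimensions go to a single-partition group, facts and large dimensions to a group spanning the data partitions.

CREATE DATABASE PARTITION GROUP sdpg ON DBPARTITIONNUM (1);
CREATE DATABASE PARTITION GROUP pdpg ON DBPARTITIONNUMS (1 TO 503);
CREATE TABLESPACE ts_dim_small IN DATABASE PARTITION GROUP sdpg;
CREATE TABLESPACE ts_fact_data IN DATABASE PARTITION GROUP pdpg;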

Database Design

● Partitioning key ● A subset of the primary key ● DISTRIBUTE BY HASH ● Fewer columns is better ● Surrogate key with high cardinality is ideal ● Collocation ● Possible for tables with same partitioning key ● Data type must match ● Collocate Fact with largest commonly joined dimension table ● Use table replication for other non-collocated dimensions. ● Trade-off between partition balancing and optimal collocation ● Skew ● Aim for skew of less than 10% ● Avoid straggler partition. 27
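
A sketch of collocation (table, column and tablespace names are illustrative): the fact table and its largest, most frequently joined dimension share the same distribution key, with matching data types, and live in the same partition group, so the join needs no data movement between partitions.

CREATE TABLE bi_schema.tb_date_dim (
   date_id   INTEGER NOT NULL PRIMARY KEY,
   cal_date  DATE
)
IN ts_fact_data
DISTRIBUTE BY HASH (date_id);

CREATE TABLE bi_schema.tb_sales_fact (
   date_id   INTEGER NOT NULL,
   store_id  INTEGER NOT NULL,
   amount    DECIMAL(15,2)
)
IN ts_fact_data
DISTRIBUTE BY HASH (date_id)
COMPRESS YES;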

Check Skew

-- rows per partition
SELECT dbpartitionnum(date_id) AS "Partition number",
       count(1)*10 AS "Total # records"
FROM bi_schema.tb_sales_fact TABLESAMPLE SYSTEM (10)
GROUP BY dbpartitionnum(date_id)

Partition number    Total # records
1                   10,313,750
2                   10,126,900
3                    9,984,910
4                   10,215,840

-- Space allocation per partition
SELECT dbpartitionnum, SUM(data_object_l_size) AS size_kb
FROM sysibmadm.admintabinfo
WHERE tabschema = 'THESCHEMA' AND tabname = 'THETABLE'
GROUP BY ROLLUP( dbpartitionnum )
ORDER BY 2;
28

Database Design

● Separate tablespaces for: ● Staging Tables ● Indexes ● MQTs ● Table data ● Individual data partitions in large range partitioned tables ● Page Size ● On VLDB, tablespaces of all page sizes are included (4K, 8K, 16K, 32K). ● Typically larger tables have a larger page size. ● Range Partitioning ● Most Fact tables and large dimension tables are RP ● Range partitioned by date interval. ● Less than 100 ranges ideal. ● Partitioned (local) indexes. 29
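
A sketch of range partitioning with dedicated tablespaces and a partitioned index (all object names, and the 32K page size choice, are illustrative):

CREATE BUFFERPOOL bp32k PAGESIZE 32K;
CREATE TABLESPACE ts_fact_2012m01 PAGESIZE 32K BUFFERPOOL bp32k;
CREATE TABLESPACE ts_fact_2012m02 PAGESIZE 32K BUFFERPOOL bp32k;

CREATE TABLE bi_schema.tb_txn_fact (
   txn_date   DATE    NOT NULL,
   account_id INTEGER NOT NULL,
   amount     DECIMAL(15,2)
)
DISTRIBUTE BY HASH (account_id)
PARTITION BY RANGE (txn_date)
  (PARTITION p2012m01 STARTING '2012-01-01' ENDING '2012-01-31' IN ts_fact_2012m01,
   PARTITION p2012m02 STARTING '2012-02-01' ENDING '2012-02-29' IN ts_fact_2012m02);

CREATE INDEX bi_schema.ix_txn_acct ON bi_schema.tb_txn_fact (account_id) PARTITIONED;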

Database Design

● Multi-dimensional Clustering ● MDC with various number of cells ● Performance, less REORG. ● “Coarsify” dimensions. ● Monotonic functions. ● MDC and RP combination ● Careful with the resulting number of cells... ● Materialized Query Tables ● Pre-compute costly aggregations and joins. ● REFRESH DEFERRED. ● Replicated tables for non-collocated dimension. ● Layering of MQT. 30
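
Two sketches of the features above (all object names are illustrative). First, an MDC fact table clustered on a coarsified, monotonic month dimension; second, a replicated, deferred-refresh MQT over a small non-collocated dimension.

CREATE TABLE bi_schema.tb_txn_fact_mdc (
   txn_date   DATE    NOT NULL,
   region_id  INTEGER NOT NULL,
   account_id INTEGER NOT NULL,
   amount     DECIMAL(15,2),
   txn_month  INTEGER GENERATED ALWAYS AS (INTEGER(txn_date)/100)
)
DISTRIBUTE BY HASH (account_id)
ORGANIZE BY DIMENSIONS (txn_month, region_id);

CREATE TABLE bi_schema.tb_store_dim_rep AS
   (SELECT store_id, store_name, region_id FROM bi_schema.tb_store_dim)
   DATA INITIALLY DEFERRED REFRESH DEFERRED
   IN ts_fact_data
   REPLICATED;
REFRESH TABLE bi_schema.tb_store_dim_rep;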

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB 31

Intelligent Data Generation

● Workloads and schema. ● 574 Tables ● 7,500 complex SQL statements ● Representative of a cross-section of real production data warehouses
[Diagram: Data Generator]

● Synthetic data ● Referential integrity determined from SQL joins ● Valid result sets for the queries ● Data generated using prime sequences to prevent primary key collisions (patent pending) 32

Prime Sequences

● Prevent key collisions ● Duplicates are very costly during load. ● Avoiding PK collisions essential. ● Nested sequences are unique, but result in skewed values. ● => use cycling sequences

Nested Sequences Cycling Sequences 33

Prime Sequences

● Problem ● Cycling sequences can hit a collision before the full cartesian product if constituent columns share a common factor...

● Solution ● Use sequences with prime cardinality.... 34

Prime Sequences

● Easy algorithm with no need for counters etc ● Just need the ranges for the columns and the row number to determine the key values

MOD(N - 1, R) + 1    N = Row Number    R = Range (Cardinality)

● Example: ● Col1 has a range of 2 values ● Col2 has a range of 3 values ● Col3 has a range of 5 values ● Full cartesian product would contain 30 rows 35

Prime Sequences

● Easy algorithm with no need for counters etc ● Just need the ranges for the columns and the row number to determine the key values
Col1: MOD(22 - 1, 2) + 1 = 2    22 = Row Number    2 = Range (Cardinality)

● Example: ● Col1 has a range of 2 values ● Col2 has a range of 3 values ● Col3 has a range of 5 values ● Full cartesian product would contain 30 rows 36
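
A hedged worked example in DB2 SQL: a recursive common table expression generates row numbers 1..30 and applies the MOD(N - 1, R) + 1 formula to the three illustrative ranges (2, 3 and 5). Because the ranges are pairwise prime, all 30 combinations of the Cartesian product appear before any key repeats.

WITH gen(n) AS (
   VALUES (1)
   UNION ALL
   SELECT n + 1 FROM gen WHERE n < 30
)
SELECT n,
       MOD(n - 1, 2) + 1 AS col1,
       MOD(n - 1, 3) + 1 AS col2,
       MOD(n - 1, 5) + 1 AS col3
FROM gen;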

Prime Sequences

Unique Primes 37

Scaleup Fact Table

● Generate a base set of data and then "Scale Up" the rest ● Transpose an existing piece of data into a new piece of data for the scaleup ● Facts and Dimensions ● Facts are range partitioned into 100 parts ● Populate part 0 for each and then scaleup to fill the remaining 99
[Diagram: PART 0 is scaled up to populate PART 1, PART 2, PART 3, …]
38

Scaleup

MOD((L + V – 1), R) + 1
[Diagram: the base range of rows is extracted and its key columns are transposed with the formula above to produce the next range] 39

Scaleup

MOD((L + V – 1), R) + 1
[Diagram: rows are extracted from the base range, their key columns transposed, and the result loaded into the next range] 40

Scaleup

MOD((L + V – 1), R) + 1
[Diagram: the key values produced by the transposition match those the original formula would have generated; extract and load can run range by range] 41

ETL

● Requirement ● Identify a high speed tool to scale-up the initial base data-set in parallel on each host in isolation. ● Avoid bottlenecks that could impede scalability e.g. network bandwidth. ● Ensure ETL scales out linearly ● Examined three main approaches ● Datastage ● Native DB2 Methods ● Optim High Performance Unload 42

ETL

● Datastage ● Offers sophisticated ETL capabilities ● Access to the DB2 partitioning algorithm. ● Slower than collocated HPU->PIPE->LOAD scaleup. ● Native DB2 methods ● LOAD FROM CURSOR ● LOAD is however serialized through the coordinator. ● INSERT-SELECT ● Collocated INSERT-SELECT on NLI tables. ● Faster than LOAD FROM CURSOR; slower than HPU->LOAD ● High Performance Unload ● HPU and DB2 LOAD both facilitate direct access to database containers ● Parallel feature "ON HOST" used, repartitioning TARGET KEYS. ● A 1 Petabyte population milestone in approximately 30 days on 63 hosts. 43

Scaleup Implementation

[Diagram: NO TRAFFIC BETWEEN SERVERS. DataServer1 runs logical nodes 1,2,3 and DataServer2 runs logical nodes 4,5,6. On each server, Db2hpu -i instance -f VLDBcontrolfile unloads data from the containers for the local nodes, updates the key columns, and passes the data through named pipes (pipe.001–pipe.003 on DataServer1, pipe.004–pipe.006 on DataServer2). A load from pipe .. partitioned db config mode outputdbnums(1,2,3) / outputdbnums(4,5,6) then loads the data back to the containers on the same server.] 44

Ingesting Data

● LOAD ● The fastest utility for ingesting. ● Table is not fully available. ● COPY YES loads can have impact on tape library. ● Specify the DATA BUFFER parameter. ● Pre-sorted data to improve performance especially for MDC. ● Import/Insert ● Slow into very large scale partitioned database. ● Buffered/Array inserts offers superior throughput. ● Alternatively, LOAD NONRECOVERABLE into staging table and INSERT-SELECT into target table. ● Adjust commit size to tune ingest performance / row locking. 45
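
A sketch of the staging pattern mentioned above (paths, table names and the buffer size are illustrative): a nonrecoverable LOAD from a named pipe into a staging table with an explicit data buffer, followed by an INSERT-SELECT into the target.

mkfifo /work/pipe.001
db2 "LOAD FROM /work/pipe.001 OF DEL
     REPLACE INTO stage.tb_sales_stage
     NONRECOVERABLE
     DATA BUFFER 8192"
db2 "INSERT INTO bi_schema.tb_sales_fact SELECT * FROM stage.tb_sales_stage"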

Scale-up

● Using HPU and load this is extremely fast

● On one server, getting speeds of: ● 4,043,422 rows/min

● Full PB would take: ● 7 years on one server ● 1 month on 63 servers ● Linear out-scaling

[Chart: TBs per Day versus Number of Physical Nodes (4 to 64)] 46

The Big Four

● 574 Tables in total ● > 90% of the total raw data is contained in 4 large fact tables ● The four big fact tables and their associated dimensions:
[Chart: Total Raw Data equivalent] 48

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB 49

Conducting Workload

● The VLDB workload includes - ● Query workload ● ETL ● Buffered inserts via Datastage DB2 Connector – partition level ● Insert-Update-Delete ● Administration activities ● REORG – online, offline, indexes ● RUNSTATS ● DDL – alter/drop/create tablespace/table/index/view/mqt/procedure ● DDL - REFRESH MQT ● ATTACH/DETACH/SET INTEGRITY on range partition tables ● Backup and Recovery ● Workload Manager ● E.g. ALTER THRESHOLD WLMBP_WRITE_DML_ROWSREAD WHEN SQLROWSREAD > 1000000 50
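
The ALTER THRESHOLD example above appears truncated; a hedged completion (the threshold name comes from the slide, and the STOP EXECUTION action is one possible choice):

db2 "ALTER THRESHOLD WLMBP_WRITE_DML_ROWSREAD
     WHEN SQLROWSREAD > 1000000
     STOP EXECUTION"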

What else is being tested?

● Database Partitions ● 128, 240, 472 and 1000 partitions ● System expansion & Redistribution ● System expanded from 20 to 32 hosts, then from 32 to 63 hosts. ● Redistribution stability and performance ● High Availability ● Interrupt & ABTerm ● Optimizer Execution Plan stability ● As database scales. 50TB, 100TB, 250TB, 400TB, 750TB and 1PB ● Under different fix packs (e.g. FP1 versus FP3) ● Manageability and PD Tools. ● Integration with Optim Performance Manager 51

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB 52

Performance Monitoring

● System performance ● CPU ● Vmstat, nmon ● System CPU should be <= 25% of User CPU ● I/O waits < 25% ● RunQueue more representative ● I/O ● Disk: Iostat ● Network (FCM): Netstat, entstat ● Memory ● Svmon, vmstat ● VLDB ● Scripts to automate above collection on 60 second interval ● Augments existing topas output in /etc/perf/daily on 5 min interval ● Facilitates retrospective diagnosis. 53

Performance Monitoring

● DB2 ● Monitoring table functions. ● Lower overhead than older snapshot based functions ● MON_GET_UNIT_OF_WORK – monitor long running queries ● MON_GET_CONNECTION – aggregated measures for connected applications. Useful for checking locks. ● On VLDB use the MON_GET_MEMORY_POOL function to track instance memory allocations (FCMBP, Bufferpool, Sortheap) ● Db2top ● Quick interactive view ● Obtain data for a single partition using db2top -P ● Optim Performance Manager (OPM) ● Sophisticated graphical web monitoring. ● Facilitates retrospective analysis ● Leverages monitoring table functions 54
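
A sketch using one of the monitoring functions named above: the ten heaviest readers aggregated across all members (-2 requests every database partition), following the documented MON_GET_CONNECTION(application_handle, member) signature.

SELECT application_handle,
       SUM(rows_read) AS rows_read,
       SUM(total_cpu_time) AS total_cpu_time
FROM TABLE(MON_GET_CONNECTION(CAST(NULL AS BIGINT), -2)) AS t
GROUP BY application_handle
ORDER BY rows_read DESC
FETCH FIRST 10 ROWS ONLY;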

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB 55

Expanding the System

● Add additional nodes to provide additional capacity. ● Mixed generation systems possible ● Must support same OS Level. ● Plan ahead ● Begin planning when growth capacity has reached 60% and is projected to reach 80% within 12 months ● Use REDISTRIBUTE command ● REDISTRIBUTE PARTITION GROUP PDPG UNIFORM NOT ROLLFORWARD RECOVERABLE DATA BUFFER 300000 ● PRECHECK ONLY option available in 9.7 Fix Pack 5 ● Ensure enough space to rebuild indexes on largest table. ● INDEXING MODE DEFERRED ● Extensive testing on VLDB ● System expanded in phases. 56
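
A hedged sketch of the expansion flow (host and partition numbers are hypothetical; the REDISTRIBUTE options are the ones listed on this slide, shown in one possible ordering):

db2start DBPARTITIONNUM 504 ADD DBPARTITIONNUM HOSTNAME newhost1 PORT 0 WITHOUT TABLESPACES
db2 "ALTER DATABASE PARTITION GROUP pdpg ADD DBPARTITIONNUMS (504) WITHOUT TABLESPACES"
-- create the tablespace containers on the new partition, then:
db2 "REDISTRIBUTE DATABASE PARTITION GROUP pdpg
     NOT ROLLFORWARD RECOVERABLE
     UNIFORM
     DATA BUFFER 300000
     INDEXING MODE DEFERRED"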

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB 57

VLDB Tips

● Configure AUTO-RUNSTATS *Tips* ● Ensures stats are current on all tables, including system catalog ● Create statistics profile. RUNSTATS SET PROFILE ● Include STATISTICS USE PROFILE with LOAD to prevent AUTO-RUNSTATS blocking LOAD ● Use sampling for very large tables ● Runstats on table scm.tab on key columns with distribution on key columns tablesample system(1) ● If data distribution is uneven, call RUNSTATS on the biggest partition. ● Do not configure AUTO-REORG ● Instead use MDC to prevent the requirement to reorganize large tables. ● Use multiple coordinators ● Spread client connections across partitions ● Prevents over committing memory on any one host e.g. sortheap 58
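
A sketch combining the two tips above (the table name scm.tab comes from the slide; the input file path is hypothetical): register a sampled statistics profile once, then have LOAD reuse it so automatic RUNSTATS does not collide with the load.

db2 "RUNSTATS ON TABLE scm.tab
     ON KEY COLUMNS WITH DISTRIBUTION ON KEY COLUMNS
     TABLESAMPLE SYSTEM (1)
     SET PROFILE"
db2 "LOAD FROM /work/feed.del OF DEL
     INSERT INTO scm.tab
     STATISTICS USE PROFILE
     NONRECOVERABLE"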

VLDB Tips ● Use ssh for instance remote shell *Tips* ● DB2SET DB2RSHCMD=/bin/ssh ● Particularly important when > 200 partitions as this is the rsh limit ● Use connection concentrator ● For large numbers of applications. ● Use MAX_CONNECTIONS > MAX_COORDAGENTS (fixed) ● Use explicit activation ● db2 activate db myDB ● Use split diagnostics directories ● Avoid contention on a single diagnostics log ● Use db2diag -global -merge to merge. ● Use tablespace backup and rebuild utility to restore ● Allows finer grained recovery ● Hot, Warm, Cold data – backup the current data most frequently ● Avoid disproportionately large tablespaces 59
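
A sketch of the instance-level settings above (paths are illustrative; DB2RSHCMD is the registry variable for the remote shell, and the $h$n tokens split the diagnostic directory per host and partition):

db2set DB2RSHCMD=/bin/ssh
db2 update dbm cfg using diagpath '"/db2/db2inst1/db2dump $h$n"'
db2 activate db myDB
db2diag -global -merge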

VLDB Tips

● Compression *Tips* ● Enable compression if system is I/O bound (IO Waits) ● Do not enable compression if the system is CPU bound. ● Estimate compression ratios using the administration function ADMIN_GET_TAB_COMPRESS_INFO_V97 ● For optimal compression on big tables use REORG TABLE ... RESETDICTIONARY. ● The dictionary will be based on a sample of the whole table rather than just the first 2MB used with automatic dictionary creation. ● Perform RUNSTATS after REORG operation. ● MQTs ● Use compression on MQTs too. ● Perform RUNSTATS on MQT after compressing ● For large replicated tables on VLDB use a partitioned MQT to distribute the table replication across all partitions.... 60
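
A sketch of the compression-estimate check using the administrative function named above (schema and table names are placeholders):

SELECT *
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO_V97('THESCHEMA', 'THETABLE', 'ESTIMATE')) AS t;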

Refresh Large Replicated MQT

● Base Table is 500MB ● Admin NIC can handle 250MB/s ● Each server can receive/write a max of 200MB/s ● Configuration below takes 4 mins

Sending 1GB in total @ 250MB/s

[Diagram: the Admin node holds the Base table and sends it at 125MB/s to each of Data1 and Data2, which hold the MQT copies] 61

Add More Data Nodes

● Refresh MQT ● Base Table is 500MB ● Admin NIC can handle 250MB/s ● Each server can receive/write a max of 200MB/s ● Configuration below takes 8 mins

Sending 2GB in total @ 250MB/s

[Diagram: the Admin node holds the Base table and sends it at 72.5MB/s to each of Data1, Data2, Data3 and Data4, which hold the MQT copies] 62

Introduce a Distributed MQT

● Base table now on all servers (DPF) ● Each server now sends 1/5th of the table to each of the other servers ● So 100MB x 4 servers each to be transmitted ● 400MB on each server to be received ● Will take 2 minutes Each server sending 400MB in total @ a potential 250MB/s

[Diagram: with the Base table distributed across Admin, Data1, Data2, Data3 and Data4, each server holds a slice of the Base table plus an MQT copy and sends to the others at up to 200MB/s] 63

VLDB Tips ● Avoid global monitoring snapshot *Tips* ● GET SNAPSHOT FOR... GLOBAL ● Deprecated functionality – may over commit memory ● Instead use monitoring functions. ● MON_GET_CONNECTION, MON_GET_TABLESPACE etc. ● Avoid over committing memory - paging ● Particularly important with High Availability. ● FCM channel and buffer allocation. ● Spread application connections ● Do not exceed the AIX ephemeral port range ● The number of ports allocated for FCM conduits is ● ( Number of Partitions × (Number of Partitions – 1) ) / Number of Hosts ● Avoid running many instances with a large number of partitions ● Avoid having too many tablespaces, too many table ranges.
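
As a hedged illustration using the numbers quoted earlier in this deck: with 504 partitions spread over 63 hosts, the formula gives (504 × 503) / 63 = 4,024 FCM ports per host, which must fit comfortably within the AIX ephemeral port range.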

Austin Clifford IBM [email protected] Session VLDB - An Analysis of DB2 at Very Large Scale


VLDB - An Analysis of DB2 at Very Large Scale

Austin Clifford IBM DRAFT Session Code: 2130 Fri, May 18, 2012 (08:00 AM - 09:00 AM) | Platform: DB2 for LUW - II

Abstract: The Very Large Database project is an exciting and unprecedented initiative to verify the performance and scalability of DB2 and its complementary products at very large scale. The trend towards real-time analytics is placing increasing demands on data warehouse systems. The investigations by the team in Dublin include simulating heavy on-peak analytics in parallel with other essential system functions such as data ingest and backup and recovery. In order to achieve a database of this magnitude, the team have developed and patented innovative techniques for rapid population of customer-like data. Valuable insights are being learned and these will feed into product design and best practice recommendations, to ensure that DB2 continues to outpace future customer needs. This presentation will take us through these insights and highlight the key considerations to ensure a successful large scale data warehouse solution.

1

2

Presentation Objectives

1) Design & implementation of a VLDB. 2) Benefits and best practice use of DB2 Warehouse features. 3) Ingesting data into VLDB. 4) Approach & considerations to scaling out VLDB as the system grows. 5) Management and problem diagnosis of a VLDB.

3

Disclaimer

●© Copyright IBM Corporation 2012. All rights reserved. ●U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

●THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.

•IBM, the IBM logo, ibm.com, and DB2 are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

4 4

What is a Very Large Database?

A very large database, or VLDB, is a database that contains an extremely high number of tuples (database rows), or occupies an extremely large physical filesystem storage space. The most common definition of VLDB is a database that occupies more than 1 terabyte.

Speaker Bio: Austin is a DB2 Data Warehouse QA Specialist in Dublin Information Management. Prior to joining IBM in 2009 Austin worked as a database consultant in the Banking sector and has 15 years industry experience in Data Modelling, Database Design, Database Administration and design of ETL applications. Austin holds degrees in Engineering and Management Science from University College Dublin.

Austin is the technical lead on the VLDB project since 2010. He works closely with DB2 Best Practices and is a Customer Lab Advocate.

4

5 5

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB

This presentation will walk us through the Very Large Database project from inception to the present time.

We'll look at how we built a system of this unprecedented scale from literally the ground up. We'll look at the hardware building blocks, including the vast amount of storage to accommodate a database of this magnitude.

We'll do a brief review of the shared nothing paradigm and how this is implemented using DB2 Partitioning Feature.

Next up, we'll turn our attention to the design of the large scale database. We'll look at the warehouse best practice and where we intentionally departed from this, for example using disjoint partition groups.

Once we've got a sound foundation (database design), we need to generate and load the vast amount of data. We'll look at the algorithms we developed to synthesize intelligent data based on real-life data warehouse artifacts.

Then once we have a fully built and populated system we can drive the complex mixed workload which is representative of the demands on production data warehouses.

Then we'll look at how we monitor the performance of a VLDB and then how to plan and expand the system for future growth.

Finally, we'll look at the insights we gained and useful tips for administering databases of this scale.

5

6 6

VLDB Mission

● Real-time analytics is placing increasing demands on data warehouse systems. ● Verify the performance and scalability of DB2 and its complementary products at the Petabyte scale. ● Simulate heavy on-peak analytics in parallel with other essential system functions such as data ingest and backup and recovery. ● Guide best practices and future product direction. ● Develop techniques for massive scale rapid data generation.

The VLDB mission was established in 2009 by Enzo Cialini, STSM, Chief Architect in DB2 SVT.

Work commenced in Jan 2010 to build a new lab to house the system in the Dublin campus. The database creation and data generation commenced in June 2010.

6

7 7

Digital Data 101 – What is a Petabyte?

● 1 Bit = Binary Digit ● 8 Bits = 1 Byte ● 1024 Bytes = 1 Kilobyte ● 1024 Kilobytes = 1 Megabyte ● 1024 Megabytes = 1 Gigabyte ● 1024 Gigabytes = 1 Terabyte ● 1024 Terabytes = 1 Petabyte ● 1024 Petabytes = 1 Exabyte ● 1024 Exabytes = 1 Zettabyte ● 1024 Zettabytes = 1 Yottabyte ● 1024 Yottabytes = 1 Brontobyte ● 1024 Brontobytes = 1 Geopbyte

VLDB is over 1 PB compressed which is equivalent to several PBs of raw data.

It is currently the biggest DB2 LUW system worldwide.

7

8 8

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB

We'll walk through the next 10 slides very quickly...

8

9 9

The Building Blocks We Start with the Storage:

1x = 450GB

1PB of DB Data = Raw Data + RAID + Contingency = 1.6PB

4,608 x = 1.6PB

4,608 spindles!!... RAID-5.... hot spares... during peak workload we can have several disk failures in a week, but the RAID configuration has provided 100% protection against data loss

9

10 10

The Building Blocks ● Disks get housed in EXP5000 enclosures ● EXP5000 can hold 16 disks

4608/16 = x 288

● EXP5000 need a DS5300 Storage controller to manage the IO activity (1 DS for 18 EXP)

x 288 = x 16

288 EXP5000 enclosures behind 16 DS5300 SAN controllers

10

11 11

The Building Blocks

●That's the storage done – now we need to drive the system with servers. ●To maximise the advantages of parallel processing, the 16 Storage controllers & disks are assigned to 1 cluster each, with a Smart Analytics guideline of 4 p550 Servers per cluster (64 servers total)

= 4 x

64 p550 servers in Smart Analytics 7600 configuration.

1 management, 1 admin, 62 data nodes.

Each configured with 64GB physical memory.

11

12 12

The Building Blocks ●The communication between devices takes place via Juniper Network switches for the copper networks and IBM SAN switches for the fiber networks

●The server control for the 64 servers is managed by the HMC (Hardware Management Console)

Dual bonded (2Gbps) FCM network. Separate network for Hardware Management.

12

13 13

Hardware Summary

● Full VLDB deployment: ● Smart Analytics like configuration ● 64 p550 Servers ● 16 DS5300 Storage Controllers ● 288 EXP5000 Disk Enclosures ● 4,608 Disks (450GB each -> 1.6PB) ● 8 IBM SAN switches (24p/40p) ● 7 Juniper Network switches (48p) ● 2 HMCs ● 6KM of copper cables ● 2KM of fiber cables ● Occupies 33 fully loaded racks ● Latest 'Free cooling' designs are incorporated into the lab ● Resulting in a predicted saving of 60% of the power required for cooling

13

14 14

Where is the system housed?

● The VLDB deployment when racked up, occupies 33 fully populated racks ● At project inception, there was no lab on the Dublin campus that could house the power and cooling requirements ● A brand new lab was built ● Each device and Rack for the VLDB system was delivered individually in its own packaging and had to be unpacked and racked ● Packaging should not be underestimated!! ● The VLDB project filled 7 industrial dumpsters with packaging.

14

15 15

Free Cooling

● There are 6 CRAC (Computer Room Air Con) units in the IM Lab ● Ireland's favourable (?) climate results in significant savings for Computer room cooling ● As long as the outside air temp is below 9.5 degrees C, 100% of the cooling of the room is by fresh air ● Over the full year, 80% of the cooling will be fresh air provisioned

15

16 16

Expansion Groups

The system was built in phases (we'll talk about expansion later in the presentation).

Each set of 2 racks constitutes an “expansion group” of which there are 16 in total. (1 additional rack for network switches).

Each expansion group contains 16 EXP5000 drawers of storage, 1 DS5300 SAN controller and 4 P550 servers.

The expansion groups are linked through the FCM interconnect only.

16

17 17

Software Stack

● The following software was installed on the system: ● DB2 (Server 9.7 Fix Pack 5) ● IBM AIX 6.1 TL6 SP5 ● IBM General Parallel File System (GPFS™ ) 3.3.0.14 ● IBM Tivoli System Automation for Multi-Platforms 3.1.1.3 ● IBM DS Storage Manager 10.60.G5.16.

Testing commenced on DB2 9.7 FP1 and we continued testing through the levels. We are now testing the next version of DB2.

17

18 18

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB

Now that we've walked through the building of the VLDB infrastructure, we can turn our attention to the database layer.

First, let's do a quick refresh of the shared nothing paradigm...

18

19 19

Shared Nothing Architecture

select … from table

Tables

[Diagram: Fast Communication Manager connecting Engine + data+log on Partition 1, Partition 2, Partition 3, … Partition n]

Database ● Partitioned Database Model ● Database is divided into 504 partitions ● Partitions run on 63 physical nodes (8 partitions per host) ● Each Partition Server has dedicated resources ● Parallel Processing occurs on all partitions: coordinated by the DBMS ● Single system image to user and application

Shared nothing means exactly that – no shared disk, no shared memory or processors. Not unique to DB2 – also used in Teradata, Netezza, Datallegro etc.

19

20 20

Shared Nothing Architecture ● Hash Partitioning ● Provides the best parallelism and maximizes I/O capability ● VLDB management (recovery, maintenance, etc.) ● Large scans automatically run in parallel... ● All nodes work together ● Truly scalable performance ● As we have 504 partitions, then it should finish in 1/504th of the time ● And not just the queries, but the utilities too (backup/restore, load, index build etc)

Hash partitioning into buckets which are mapped to partitions using a 4K partition map (pre DB2 9.7) or a 32K partition map (DB2 9.7 onwards). The partitioning key is crucial and is discussed later in this presentation.

20

21 21

Mapping DB2 Partitions to Servers

[Diagram: Node 1 hosts part0 and part1, Node 2 hosts part2 and part3; the partitions communicate through the FCM]

# db2nodes.cfg
0 node1 0
1 node1 1
2 node2 0
3 node2 1

•DB2 instance configuration file sqllib/db2nodes.cfg
•All databases in the instance share this definition
•File in the DB2 instance directory
•Sqllib directory located on one node of the system
•GPFS/NFS mounted by all other nodes

Db2nodes.cfg in the instance home sqllib directory maps partitions to their host servers. On VLDB there are 8 DB2 partitions (aka logical nodes) per server. The port range for the FCM is reserved for the instance in /etc/services on each host. When the database activates, an FCM conduit is allocated between each pair of partitions of the instance. With a large number of partitions this can consume a large number of ports (from the ephemeral port range).

21

22 22

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB

Now that we've covered the db2 instance configuration let's look at the approach to designing a Very Large Database...

22

23 23

Database Design

● Star and snowflake ● Sampled production database artifacts. ● Dimensional levels and hierarchies. ● Larger dimension tables are typically snow-flaked. ● No referential integrity – relationships inferred. ● Dimensions tables have surrogate PKs ● Fact tables - composite PK or non-unique PK. ● Dimension FKs are indexed. ● All tables are compressed.

Combination of star and snowflake.
Based on artifacts sampled from real-life data warehouses.
Dimensional levels and hierarchies.
Larger dimension tables are typically snow-flaked.
No referential integrity – relationships inferred from indexing and joins (deviation from best practice).
Most dimension tables have surrogate primary keys (best practice).
Fact tables have a mixture of composite PK and non-unique PK (best practice).
Dimension foreign keys are indexed.
All tables are compressed. Reduces the storage requirement for the de-normalized schema design. Overall compression ratio of around 3.

23

24 24

Database Design

● Star schema for 4 largest fact tables

24

25 25

Database Design

● Partition Groups ● Small dimension tables in SDPG. ● Fact and large dimension tables are partitioned. ● Collocation of Facts and largest/frequently joined dimension. ● Disjoint partition groups to drive table queueing.

Partition Groups Small dimension tables (< 1 m rows) are placed on a single database partition (SDPG). Fact and large dimension tables are partitioned in partition groups containing 503 partitions across 63 hosts. Facts and largest/most frequently joined dimensions are collocated. Also, disjoint partition groups to drive FCM traffic (table queueing).

25

26 26

Database Design

● Partitioning key ● A subset of the primary key ● DISTRIBUTE BY HASH ● Fewer columns is better ● Surrogate key with high cardinality is ideal ● Collocation ● Possible for tables with same partitioning key ● Data type must match ● Collocate Fact with largest commonly joined dimension table ● Use table replication for other non-collocated dimensions. ● Trade-off between partition balancing and optimal collocation ● Skew ● Aim for skew of less than 10% ● Avoid straggler partition.

Partitioning key A subset of the primary key DISTRIBUTE BY HASH Fewer columns is better Surrogate key with high cardinality is ideal candidate Collocation Collocation possible for tables with same partitioning key Data type must match Collocate Fact with largest commonly joined dimension table Use table replication for other non-collocated dimensions. Trade-off between partition balancing and optimal collocation Skew Aim for skew of less than 10% Deviation from even skew should be lower rather than larger than average rowcount to avoid outlier

26

27 27

Check Skew

-- rows per partition
SELECT dbpartitionnum(date_id) AS "Partition number",
       count(1)*10 AS "Total # records"
FROM bi_schema.tb_sales_fact TABLESAMPLE SYSTEM (10)
GROUP BY dbpartitionnum(date_id)

Partition number    Total # records
1                   10,313,750
2                   10,126,900
3                    9,984,910
4                   10,215,840

-- Space allocation per partition
SELECT dbpartitionnum, SUM(data_object_l_size) AS size_kb
FROM sysibmadm.admintabinfo
WHERE tabschema = 'THESCHEMA' AND tabname = 'THETABLE'
GROUP BY ROLLUP( dbpartitionnum )
ORDER BY 2;

27

28 28

Database Design

● Separate tablespaces for: ● Staging Tables ● Indexes ● MQTs ● Table data ● Individual data partitions in large range partition tables ● Page Size ● On VLDB, tablespaces with all pagesize included (4K,8K,16K,32K). ● Typically larger tables have larger pagesize. ● Range Partitioning ● Most Fact tables and large dimension tables are RP ● Range partitioned by date interval. ● Less that 100 ranges ideal. ● Partitioned (local) indexes.

●Range Partitioning ● Most Fact tables and large dimension tables are RP ● Range partitioned by date interval. ● VLDB employs less that 100 range partitions except for 4 very big fact tables (> Trillion rows) which have several hundred ranges.

28

29 29

Database Design

● Multi-dimensional Clustering ● MDC with various number of cells ● Performance, less REORG. ● “Coarsify” dimensions. ● Monotonic functions. ● MDC and RP combination ● Careful with the resulting number of cells... ● Materialized Query Tables ● Pre-compute costly aggregations and joins. ● REFRESH DEFERRED. ● Replicated tables for non-collocated dimension. ● Layering of MQT.

Multi-dimensional Clustering MDC with varying number of cells: 100, 1000, 10000, 1000000 cells. Performance, reduced requirement to REORG. Aggregate functions to “coarsify” dimensions. Monotonic functions. MDC and RP combination Careful with the resulting number of cells... Materialized Query Tables MQTs incorporated to pre-compute costly aggregations and joins. REFRESH DEFERRED with and without staging tables. Replicated tables build on non-collocated dimension tables (technique later for refresh of large base table). Layering of MQT used for ROLAP drill-down dimensions.

29

30 30

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB

Now that we've designed and created the database, let's take a look at how we generated and loaded (via ETL), the vast amount of data required to achieve a Petabyte milestone...

30

31 31

Intelligent Data Generation

● Workloads and schema. ● 574 Tables ● 7,500 complex SQL statements ● Representative of a cross Data Generator section of real production data warehouses

● Synthetic data ● Referential integrity determined from SQL joins ● Valid result sets for the queries ● Data generated using prime sequences to prevent primary key collisions (patent pending)

We sampled 7,500 complex Select statements from real production warehouses, representing a vast array of query constructs and resulting execution plan operators. We looked at existing toolsets for generating intelligent data to satisfy the select statements (i.e. to return a valid result set) and there was no tool that would both satisfy the statements and also allow rapid data generation of trillions of rows in a matter of weeks rather than months or years. We developed new algorithms for generating and subsequently scaling up (via ETL) intelligent data, while maintaining referential integrity (preventing orphaned fact rows). A patent was filed in 2010 for these algorithms.

31

32 32

Prime Sequences

● Prevent key collisions ● Duplicates are very costly during load. ● Avoiding PK collisions essential. ● Nested sequences are unique, but results in skewed values. ● => use cycling sequences

Nested Sequences Cycling Sequences

Prevent key collisions. Eliminating duplicates during the index build phase is very costly compared to bulk load. It can be an order of magnitude slower to delete a small percentage of rows compared to loading everything direct to container... The problem is particularly acute at the VLDB scale when we're loading tens of billions of rows into a single table range. => Avoiding PK collisions is essential for high speed data population. Nested sequences do guarantee uniqueness, but result in a skewed distribution of values and very sparse fact tables. => use cycling sequences.

32

33 33

Prime Sequences

● Problem. ● Cycling sequences can hit collision before full cartesian product if constituent columns share common factor.....

● Solution ● Use sequences with prime cardinality....

Collisions are much more likely to occur when generating huge datasets. These are caused by cycling sequences sharing a common factor. We need a simple, efficient way to prevent these which use simple arithmetic operators (for performance) and does not require counters etc. to track previously used combinations...

33

34 34

Prime Sequences

● Easy algorithm with no need for counters etc ● Just need the ranges for the columns and the row number to determine the key values

MOD(N - 1, R) + 1    N = Row Number    R = Range (Cardinality)

● Example: ● Col1 has a range of 2 values ● Col2 has a range of 3 values ● Col3 has a range of 5 values ● Full cartesian product would contain 30 rows

So, what if we use prime sequences?

Prime numbers by their very definition do not share a common factor and therefore this guarantees that the cartesian product of the sequences can be reached without encountering a collision.

Furthermore, this simple formula is all we need to calculate the value of the key column for a given row. Also, this formula lends itself to partitioning the generation i.e. a range of rows can be generated independently of another range. This is important, as parallelism is essential to obtaining the throughput required to generate billions of rows quickly....

34

35 35

Prime Sequences

● Easy algorithm with no need for counters etc ● Just need the ranges for the columns and the row number to determine the key values
Col1: MOD(22 - 1, 2) + 1 = 2    22 = Row Number    2 = Range (Cardinality)

● Example: ● Col1 has a range of 2 values ● Col2 has a range of 3 values ● Col3 has a range of 5 values ● Full cartesian product would contain 30 rows

And the same calculation for the 2nd key column...

35

36 36

Prime Sequences

Unique Primes

Next, to make sure that the joins (reverse-engineered from the SELECT statements) work, we need to propagate the same prime cardinality to all related columns.

We also need to ensure that the prime cardinality is unique among all tables that it is propagated to... This check is performed using a recursive SELECT using a common table expression (WITH clause).

36

37 37

Scaleup Fact Table

● Generate a base set of data and then "Scale Up" the rest ● Transpose an existing piece of data into a new piece of data for the scaleup ● Facts and Dimensions ● Facts are range partitioned into 100 parts ● Populate part 0 for each and then scaleup to fill the remaining 99
[Diagram: PART 0 is scaled up to populate PART 1, PART 2, PART 3, …]

Even when using the prime cardinality algorithm and partitioning this across multiple parallel (java) threads, the throughput is still not enough. The throughput is still governed by the time taken to generate the non-key columns which are (seeded) randomly generated number/strings etc. depending on data type. Therefore, rather than generating all non-key values from first principles we scale-up.... Scaleup as described reduces the cpu intensive random number generation and more closely approaches pure DISK I/O speed. i.e. much faster.

37

38 38

Scaleup

MOD((L + V – 1), R) + 1
[Diagram: the base range of rows is extracted and its key columns are transposed with the formula above to produce the next range]

The scaleup algorithm to transpose the sequential keys into the subsequent range is a close variation to that used to generate the sequential keys.

38

39 39

Scaleup

MOD((L + V – 1), R) + 1
[Diagram: rows are extracted from the base range, their key columns transposed, and the result loaded into the next range]

We'll look at the exact implementation of the Extract and Load in the coming slides....

39

40 40

Scaleup

MOD((L + V – 1), R) + 1
[Diagram: the key values produced by the transposition match those the original formula would have generated; extract and load can run range by range]

As you can see the key values calculated using the formula are exactly the same as the values which would have been generated by the original formula..

Importantly, the scaleup algorithm can also be partitioned...

40

41 41

ETL

● Requirement ● Identify a high speed tool to scale-up the initial base data-set in parallel on each host in isolation. ● Avoid bottlenecks that could impede scalability e.g. network bandwidth. ● Ensure ETL scales out linearly ● Examined three main approaches ● Datastage ● Native DB2 Methods ● Optim High Performance Unload

So, now we have our algorithm for rapid scaleup of data (while preventing collisions and maintaining data integrity).... the next question is how to implement this?.. It boils down to a choice of three....

Ensure ETL scales-out linearly. What we require here is that each host scales-up a table at a constant speed so that ingest rate per host remains constant as the number of hosts increases.

41

42 42

ETL

● Datastage ● Offers sophisticated ETL capabilities ● Access to DB2 partitioning algorithm. ● Slower than collocated HPU->PIPE->LOAD scaleup. ● Native DB2 methods ● LOAD FROM CURSOR ● LOAD is however serialized through the coordinator. ● INSERT-SELECT ● collocated INSERT-SELECT on NLI tables. ● Faster than LOAD FROM CURSOR; slower than the HPU->LOAD ● High Performance Unload ● HPU and DB2 LOAD both facilitate direct access to database containers ● Parallel feature “ON HOST” used, repartitioning TARGET KEYS. ● A 1 Petabyte population milestone in approximately 30 days on 63 hosts.

Datastage: Datastage offers sophisticated job control, meta-data, lineage, restart and dimensional capabilities. Datastage can scale-up data much faster than LOAD FROM CURSOR as it has access to the DB2 partitioning algorithm and can spread the repartitioning across hosts. Datastage is however slower than collocated HPU->PIPE->LOAD scaleup. Native DB2 methods: LOAD FROM CURSOR – LOAD is however serialized through the coordinator => bottleneck of the NIC on the coordinator. INSERT-SELECT – collocated INSERT-SELECT on tables altered with NOT LOGGED INITIALLY; faster than LOAD FROM CURSOR, slower than HPU->LOAD. High Performance Unload: HPU and DB2 LOAD both facilitate direct path access to database containers. Parallel feature "ON HOST" used, with the repartitioning TARGET KEYS option during LOAD, which facilitates parallel scale-up through multiple coordinator partitions. A 1 Petabyte population milestone is achievable in approx 30 days on 63 hosts.

42

43 43

Scaleup Implementation

[Diagram: NO TRAFFIC BETWEEN SERVERS. DataServer1 runs logical nodes 1,2,3 and DataServer2 runs logical nodes 4,5,6. On each server, Db2hpu -i instance -f VLDBcontrolfile unloads data from the containers for the local nodes, updates the key columns, and passes the data through named pipes (pipe.001–pipe.003 on DataServer1, pipe.004–pipe.006 on DataServer2). A load from pipe .. partitioned db config mode outputdbnums(1,2,3) / outputdbnums(4,5,6) then loads the data back to the containers on the same server.]

This diagram depicts the selected implementation of the scaleup ETL process using High Performance Unload to Extract the data which is then passed via a named/FIFO pipe directly to db2 Load utility for each db2 partition in parallel. By further restricting the scaleup algorithm to not change the distribution key and thus collocating the ETL on each server, this results in the scaleup process being extremely rapid.

43

44 44

Ingesting Data

● LOAD ● The fastest utility for ingesting. ● Table is not fully available. ● COPY YES loads can have impact on tape library. ● Specify the DATA BUFFER parameter. ● Pre-sorted data to improve performance especially for MDC. ● Import/Insert ● Slow into very large scale partitioned database. ● Buffered/Array inserts offers superior throughput. ● Alternatively, LOAD NONRECOVERABLE into staging table and INSERT-SELECT into target table. ● Adjust commit size to tune ingest performance / row locking.

LOAD The fastest utility for loading data. Table is not fully available. Read only is possible. Recoverable (COPY YES) loads can have serious impact on tape library (TSM etc.). Specify the DATA BUFFER parameter as default is often too small. Pre-sorted data will improve load throughput especially for MDC. Import/Insert Slow into very large scale partitioned database. Buffered/Array inserts offers superior throughput. Alternatively, LOAD NONRECOVERABLE into staging table and INSERT-SELECT with collocated inserts. Adjust commit size to tune ingest performance / row locking.

44

45 45

Scale-up

● Using HPU and load this is extremely fast

● On one server, getting speeds of: ● 4,043,422 rows/min

● Full PB would take: ● 7 years on one server ● 1 month on 63 servers ● Linear out-scaling

[Chart: TBs per Day versus Number of Physical Nodes (4 to 64)]

The chart may look contrived being so linear, but this is in reality exactly what we observe in terms of scale-up performance due to the collocated ETLs.

45

46 46

The Big Four

● 574 Tables in total ● > 90% of the total raw data is contained in 4 large fact tables ● The four big fact tables and their associated dimensions:

46

47

Total Raw Data equivalent

The compression ratio here does not include indexes and therefore the ratio is understated.

48 48

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB

Now that we've got the database populated we can move on to running the workload and testing the system...

48

49 49

Conducting Workload

● The VLDB workload includes - ● Query workload ● ETL ● Buffered inserts via Datastage DB2 Connector – partition level ● Insert-Update-Delete ● Administration activities ● REORG – online, offline, indexes ● RUNSTATS ● DDL – alter/drop/create tablespace/table/index/view/mqt/procedure ● DDL - REFRESH MQT ● ATTACH/DETACH/SET INTEGRITY on range partition tables ● Backup and Recovery ● Workload Manager ● E.g. ALTER THRESHOLD WLMBP_WRITE_DML_ROWSREAD WHEN SQLROWSREAD > 1000000

Essentially the workload consists of the full data life-cycle for a typical Very Large Data Warehouse.

49

50 50

What else is being tested?

● Database Partitions ● 128, 240, 472 and 1000 partitions ● System expansion & Redistribution ● System expanded from 20 to 32 hosts, then from 32 to 63 hosts. ● Redistribution stability and performance ● High Availability ● Interrupt & ABTerm ● Optimizer Execution Plan stability ● As database scales. 50TB, 100TB, 250TB, 400TB, 750TB and 1PB ● Under different fix packs (e.g. FP1 versus FP3) ● Manageability and PD Tools. ● Integration with Optim Performance Manager

Realistic scenarios start with DB2 Best Practices and are then widened to examine PMRs/APARs.

50

51 51

Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB

Next, let's talk about performance monitoring. Monitoring the performance of a VLDB is challenging but vital....

51

52 52

Performance Monitoring

● System performance ● CPU ● Vmstat, nmon ● System CPU should be <= 25% of User CPU ● I/O waits < 25% ● RunQueue more representative ● I/O ● Disk: Iostat ● Network (FCM): Netstat, entstat ● Memory ● Svmon, vmstat ● VLDB ● Scripts to automate above collection on 60 second interval ● Augments existing topas output in /etc/perf/daily on 5 min interval ● Facilitates retrospective diagnosis.

Automating the collection of performance metrics is important to understand the system behavior under normal, peak and offline workload. Establish a baseline and retain these metrics for planning system expansion and performance tuning.

RunQueue.... In a cluster system like the IBM Smart Analytics System, with multiple servers with multi-core and multithreaded CPUs, we have to check first at what level the monitoring tool is calculating 100%: it might be the maximum utilization of all threads but can be scaled down to thread level too. In the latter case a fully utilized 16 threaded system would show a utilization of 1600%. In the case of a CPU related bottleneck it might be possible that we see a CPU utilization of around 7% only. But looking into the details – which would be the thread or core level – will show us that one thread is utilized at 100% and the other 15 threads of the 16 thread system are staying idle waiting for tasks. To get an impression regarding the load on a cluster, the length of the runqueue is a good measure: it gives the administrator a good hint on how busy in terms of parallel running jobs the system is.

52


Performance Monitoring

● DB2 ● Monitoring table functions ● Lower overhead than the older snapshot-based functions ● MON_GET_UNIT_OF_WORK – monitor long-running queries ● MON_GET_CONNECTION – aggregated measures for connected applications. Useful for checking locks. ● On VLDB use the MON_GET_MEMORY_POOL function to track instance memory allocations (FCMBP, buffer pool, sort heap) ● db2top ● Quick interactive view ● Obtain data for a single partition using db2top -P ● Optim Performance Manager (OPM) ● Sophisticated graphical web monitoring ● Facilitates retrospective analysis ● Leverages the monitoring table functions

From V9.7 onwards, use the monitoring table functions; they have lower overhead than the older snapshot-based functions. MON_GET_UNIT_OF_WORK monitors long-running queries. MON_GET_CONNECTION provides aggregated performance measures for connected applications and is useful for checking locks. Use -2 for the second parameter to obtain metrics for all partitions. On VLDB use the MON_GET_MEMORY_POOL function to track instance memory allocations (FCMBP, buffer pool, sort heap). db2top is useful for a quick interactive view of system performance; the monitoring functions are more useful for capturing periodic snapshots. Obtain data for a single partition using db2top -P. Optim Performance Manager (OPM) provides a sophisticated graphical, web-based monitoring capability, captures metrics on a periodic basis for detailed retrospective analysis, and leverages the monitoring table functions. A sketch of the table-function calls follows.
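A minimal sketch of how these table functions might be invoked; the column names are taken from the 9.7 monitoring interfaces and should be verified against your level, and -2 requests data from all partitions:

  -- longest-running units of work across all partitions
  SELECT member, application_handle, uow_id, total_rqst_time, rows_read
    FROM TABLE(MON_GET_UNIT_OF_WORK(NULL, -2)) AS t
    ORDER BY total_rqst_time DESC
    FETCH FIRST 10 ROWS ONLY;

  -- per-partition memory pool usage (FCMBP, buffer pool, sort heap, ...)
  SELECT member, memory_pool_type, memory_pool_used
    FROM TABLE(MON_GET_MEMORY_POOL(NULL, CURRENT SERVER, -2)) AS t
    ORDER BY memory_pool_used DESC;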

53


Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB

Sooner or later a successful data warehouse will require additional capacity. Indeed, the VLDB system was expanded in phases...

54


Expanding the System

● Add additional nodes to provide additional capacity ● Mixed generation systems possible ● Must support the same OS level ● Plan ahead ● Begin planning when growth capacity has reached 60% and is projected to reach 80% within 12 months ● Use the REDISTRIBUTE command (see the sketch below) ● REDISTRIBUTE DATABASE PARTITION GROUP PDPG NOT ROLLFORWARD RECOVERABLE UNIFORM DATA BUFFER 300000 ● PRECHECK ONLY option available in 9.7 Fix Pack 5 ● Ensure enough space to rebuild indexes on the largest table ● INDEXING MODE DEFERRED ● Extensive testing on VLDB ● System expanded in phases.
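A minimal sketch of the redistribution step, using the partition-group name and DATA BUFFER size from the slide; the exact spelling and ordering of the PRECHECK option should be confirmed against the Command Reference for your fix pack:

  -- 9.7 Fix Pack 5 onwards: validate space and configuration without moving data
  REDISTRIBUTE DATABASE PARTITION GROUP PDPG
    NOT ROLLFORWARD RECOVERABLE UNIFORM
    PRECHECK ONLY;

  -- the actual redistribution; defer index rebuild to save space on the largest tables
  REDISTRIBUTE DATABASE PARTITION GROUP PDPG
    NOT ROLLFORWARD RECOVERABLE UNIFORM
    DATA BUFFER 300000
    INDEXING MODE DEFERRED;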

Good capacity planning practices can assist in the early detection of trends in resource usage. You should create and document a performance baseline for each workload and a forecast baseline for the next 12 months. Add a data module to expand storage capacity and reduce the data volume per database partition, or add a user module to increase the capacity of the system to accommodate more users. Review your backup and recovery infrastructure to ensure that you can maintain service level objectives after the expansion. ETL applications and maintenance scripts will need to be reviewed to accommodate the expanded system, for example the DataStage DB2 connector.

Mixed-generation modules must run the same OS level, e.g. adding 7600 R2 or 7700 modules to a Smart Analytics 7600 R1 cluster; the additional partitions are created on the next-generation module.

55


Agenda ● VLDB Mission ● What is a PetaByte? ● Building a PetaByte System ● Shared Nothing Architecture ● Database Design ● Data Generation & ETL ● Workload & Testing ● Performance Monitoring ● Expanding the System ● Useful Tips for VLDB

56


VLDB Tips

● Configure AUTO-RUNSTATS *Tips* ● Ensures stats are current on all tables, including the system catalog ● Create a statistics profile: RUNSTATS ... SET PROFILE (see the sketch below) ● Include STATISTICS USE PROFILE with LOAD to prevent AUTO-RUNSTATS blocking the LOAD ● Use sampling for very large tables ● RUNSTATS ON TABLE scm.tab ON KEY COLUMNS WITH DISTRIBUTION ON KEY COLUMNS TABLESAMPLE SYSTEM(1) ● If data distribution is uneven, run RUNSTATS on the biggest partition ● Do not configure AUTO-REORG ● Instead use MDC to avoid the need to reorganize large tables ● Use multiple coordinators ● Spread client connections across partitions ● Prevents overcommitting memory (e.g. sort heap) on any one host
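A minimal sketch combining these tips, using the scm.tab name from the slide (the input file name is hypothetical):

  -- sampled statistics on key columns, registered as the table's statistics profile
  RUNSTATS ON TABLE scm.tab
    ON KEY COLUMNS
    WITH DISTRIBUTION ON KEY COLUMNS
    TABLESAMPLE SYSTEM(1)
    SET PROFILE;

  -- LOAD gathers statistics using that profile, so AUTO-RUNSTATS does not need to follow behind it
  LOAD FROM /stage/tab.del OF DEL
    INSERT INTO scm.tab
    STATISTICS USE PROFILE;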

57


VLDB Tips ● Use ssh for the instance remote shell *Tips* ● db2set DB2RSHCMD=/bin/ssh ● Particularly important when > 200 partitions, as this is the rsh limit ● Use the connection concentrator ● For large numbers of applications ● Set MAX_CONNECTIONS > MAX_COORDAGENTS (fixed) ● Use explicit activation ● db2 activate db myDB ● Use split diagnostics directories ● Avoid contention on a single diagnostics log ● Use db2diag -global -merge to merge ● Use tablespace backups and the rebuild utility to restore ● Allows finer-grained backups ● Hot, warm, cold data – back up the current data most frequently ● Avoid disproportionately large tablespaces
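The corresponding commands, sketched; the registry variable and db2diag options are as named on the slide, while the MAX_CONNECTIONS/MAX_COORDAGENTS values shown are purely illustrative:

  db2set DB2RSHCMD=/bin/ssh                # use ssh rather than rsh for the instance remote shell
  db2 update dbm cfg using MAX_CONNECTIONS 5000 MAX_COORDAGENTS 500    # concentrator is enabled when MAX_CONNECTIONS > MAX_COORDAGENTS
  db2 activate db myDB                     # explicit activation
  db2diag -global -merge                   # merge the split diagnostic logs from all hosts and partitions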

58


VLDB Tips

● Compression *Tips* ● Enable compression if the system is I/O bound (I/O waits) ● Do not enable compression if the system is CPU bound ● Estimate compression ratios using the administration function ADMIN_GET_TAB_COMPRESS_INFO_V97 (see the sketch below) ● For optimal compression on big tables use REORG TABLE ... RESETDICTIONARY ● The dictionary will be based on a sample of the whole table rather than just the first ~2MB used with automatic dictionary creation ● Perform RUNSTATS after the REORG operation ● MQTs ● Use compression on MQTs too ● Perform RUNSTATS on the MQT after compressing ● For large replicated tables on VLDB use a partitioned MQT to distribute the table replication across all partitions....
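A minimal sketch against a hypothetical SCM.FACT_SALES table; 'ESTIMATE' mode projects the savings from a sample, and the column names should be verified for your level:

  -- projected savings before enabling compression
  SELECT dbpartitionnum, pages_saved_percent, bytes_saved_percent
    FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO_V97('SCM', 'FACT_SALES', 'ESTIMATE')) AS t;

  -- enable compression, build the dictionary from a table-wide sample, then refresh statistics
  ALTER TABLE scm.fact_sales COMPRESS YES;
  REORG TABLE scm.fact_sales RESETDICTIONARY;
  RUNSTATS ON TABLE scm.fact_sales WITH DISTRIBUTION AND DETAILED INDEXES ALL;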

59


Refresh Large Replicated MQT

● Base Table is 500MB ● Admin NIC can handle 250MB/s ● Each server can receive/write a max of 200MB/s ● Configuration below takes 4 mins

[Diagram: the Admin node holds the base table and sends a full copy to each of the two data nodes (Data1, Data2) holding the MQT, at 125MB/s per stream – 1GB in total through the 250MB/s admin NIC.]

60


Add More Data Nodes

● Refresh MQT ● Base Table is 500MB ● Admin NIC can handle 250MB/s ● Each server can receive/write a max of 200MB/s ● Configuration below takes 8 mins

[Diagram: the Admin node now sends a full copy of the base table to each of four data nodes (Data1–Data4) holding the MQT – 2GB in total through the same 250MB/s admin NIC, i.e. roughly 62.5MB/s per stream when shared four ways.]

This is a likely scenario in very large data warehouses. As the "smaller" non-collocated dimension tables grow and the number of database partitions increases, more and more FCM traffic is forced through the coordinator's FCM adapter. By using the technique described here of distributing the base table and replicating it via a partitioned MQT, this bottleneck can be avoided.

61


Introduce a Distributed MQT

● Base table now distributed across all servers (DPF) ● Each server now sends 1/5th of the table to each of the other servers ● So 100MB to be transmitted to each of the 4 other servers ● 400MB on each server to be received ● Will take 2 minutes ● Each server sends 400MB in total @ a potential 250MB/s

[Diagram: the base table and the MQT now reside on all five servers (Admin, Data1–Data4); each server exchanges its slice with the others at up to 200MB/s.]
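The deck does not show the DDL behind this layout, so the following is only a sketch of one possible shape, with hypothetical table, column and tablespace names: distribute the base table across the data partitions, then define the replicated MQT over it so that each partition sources the refresh from its own slice.

  -- base table hash-distributed across all data partitions (hypothetical names)
  CREATE TABLE mart.dim_store (
    store_id   INTEGER NOT NULL,
    store_name VARCHAR(100)
  ) IN ts_pd_data
    DISTRIBUTE BY HASH (store_id);

  -- replicated MQT: a full copy maintained on every partition,
  -- refreshed in parallel from the distributed base table
  CREATE TABLE mart.dim_store_rep AS
    (SELECT store_id, store_name FROM mart.dim_store)
    DATA INITIALLY DEFERRED REFRESH DEFERRED
    IN ts_pd_data REPLICATED;

  REFRESH TABLE mart.dim_store_rep;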

62


VLDB Tips ● Avoid global monitoring snapshots *Tips* ● GET SNAPSHOT FOR... GLOBAL ● Deprecated functionality – may overcommit memory ● Instead use the monitoring functions ● MON_GET_CONNECTION, MON_GET_TABLESPACE etc. ● Avoid overcommitting memory – paging ● Particularly important with High Availability ● FCM channel and buffer allocation ● Spread application connections ● Do not exceed the AIX ephemeral port range ● The number of ports allocated for FCM conduits is (Number of Partitions x (Number of Partitions – 1)) / Number of Hosts ● Avoid running many instances with a large number of partitions ● Avoid having too many tablespaces or too many table ranges.

● Avoid overcommitting memory – FCM channel and buffer allocation is proportional to the total number of partitions in the instance.
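As a worked example of that formula, using the partition and host counts quoted earlier in this deck (1000 partitions across 63 hosts), and assuming the default AIX ephemeral range of 32768–65535 (roughly 32,000 ports):

  Ports per host = (Partitions x (Partitions - 1)) / Hosts
                 = (1000 x 999) / 63
                 ≈ 15,857

This fits inside the default ephemeral range, but because the port count grows with the square of the partition count, the headroom shrinks quickly as the system expands.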

63


Austin Clifford, IBM – [email protected]
Session: VLDB - An Analysis of DB2 at Very Large Scale

64