VLDB - an Analysis of DB2 at Very Large Scale - D13
VLDB - An Analysis of DB2 at Very Large Scale - D13
Austin Clifford, IBM
Session Code: 2130
Fri, May 18, 2012 (08:00 AM - 09:00 AM) | Platform: DB2 for LUW - II

Presentation Objectives
1) Design & implementation of a VLDB.
2) Benefits and best practice use of DB2 Warehouse features.
3) Ingesting data into VLDB.
4) Approach & considerations to scaling out VLDB as the system grows.
5) Management and problem diagnosis of a VLDB.

Disclaimer
● © Copyright IBM Corporation 2012. All rights reserved.
● U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
● THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.
● IBM, the IBM logo, ibm.com, and DB2 are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries.
A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

What is a Very Large Database?
A very large database, or VLDB, is a database that contains an extremely high number of tuples (database rows), or occupies an extremely large physical filesystem storage space. The most common definition of VLDB is a database that occupies more than 1 terabyte.

Agenda
● VLDB Mission
● What is a PetaByte?
● Building a PetaByte System
● Shared Nothing Architecture
● Database Design
● Data Generation & ETL
● Workload & Testing
● Performance Monitoring
● Expanding the System
● Useful Tips for VLDB
● Conclusion

VLDB Mission
● Increasing demands from real-time analytics are placing additional pressure on warehouse systems.
● Demonstrate the performance and scalability of DB2 and its complementary products at the Petabyte scale.
● Simulate heavy on-peak analytics in parallel with other essential system functions such as data ingest and backup and recovery.
● Guide best practices and future product direction.
● Develop techniques for massive scale rapid data generation.

Digital Data 101 – What is a Petabyte?
● 1 Bit = Binary Digit
● 8 Bits = 1 Byte
● 1024 Bytes = 1 Kilobyte
● 1024 Kilobytes = 1 Megabyte
● 1024 Megabytes = 1 Gigabyte
● 1024 Gigabytes = 1 Terabyte
● 1024 Terabytes = 1 Petabyte
● 1024 Petabytes = 1 Exabyte
● 1024 Exabytes = 1 Zettabyte
● 1024 Zettabytes = 1 Yottabyte
● 1024 Yottabytes = 1 Brontobyte
● 1024 Brontobytes = 1 Geopbyte

The Building Blocks
We start with the storage:
● 1 disk = 450GB
● 1PB of DB data: raw data + RAID + contingency = 1.6PB
● 4,608 disks = 1.6PB

The Building Blocks
● Disks get housed in EXP5000 enclosures
● EXP5000 can hold 16 disks: 4608/16 = 288 enclosures
● EXP5000 need a DS5300 storage controller to manage the IO activity (1 DS for 18 EXP): 288/18 = 16 controllers

The Building Blocks
● That's the storage done – now we need to drive the system with servers.
● 16 clusters
● Smart Analytics guideline of 4 p550 servers per cluster
● Each cluster attached to 1 DS5300
● 64 servers total = 4 x 16

The Building Blocks
● The communication between devices
  ● Juniper Network switches for the copper networks
  ● IBM SAN switches for the fiber networks
● The server control for the 64 servers is managed by the HMC (Hardware Management Console)

Expansion Groups
● P550 x 4
● EXP5000 x 6
● EXP5000 x 12
● DS5300 x 1

Hardware Summary
Full VLDB deployment:
● Smart Analytics-like configuration
● 64 p550 Servers
● 16 DS5300 Storage Controllers
● 288 EXP5000 Disk Enclosures
● 4,608 Disks (450GB each -> 1.6PB)
● 8 IBM SAN switches (24p/40p)
● 7 Juniper Network switches (48p)
● 2 HMCs
● 6km of copper cables
● 2km of fiber cables
● Occupies 33 fully loaded racks
● Latest "free cooling" designs are incorporated into the lab

Free Cooling
● 6 CRAC (Computer Room Air Con) units in the VLDB lab
● Ireland's favourable (?)
climate
● Significant savings for Computer room cooling
● As long as outside air temp is below 9.5 degrees C, 100% of the cooling of the room is by fresh air
● Over a full year, 80% of the cooling is fresh air provisioned

Software Stack
● DB2 (Server 9.7 Fix Pack 5)
● IBM General Parallel File System (GPFS™) 3.3.0.14
● IBM Tivoli System Automation for Multi-Platforms 3.1.1.3
● IBM AIX 6.1 TL6 SP5
● IBM DS Storage Manager 10.60.G5.16

VLDB in the flesh

Shared Nothing Architecture
[Figure: a "select … from table" query is distributed by the Fast Communication Manager to one engine per partition (Partition 1, Partition 2, Partition 3, … Partition n), each with its own data+log; together the partitions form one database.]
● Partitioned Database Model
● Database is divided into 504 partitions
● Partitions run on 63 physical nodes (8 partitions per host)
● Each Partition Server has dedicated resources
● Parallel Processing occurs on all partitions: coordinated by DB2
● Single system image to user and application

Shared Nothing Architecture
● Hash Partitioning
● Provides the best parallelism and maximizes I/O capability
● VLDB management (recovery, maintenance, etc.)
● Large scans automatically run in parallel...
● All nodes work together
● Truly scalable performance
● Ideally, 504 partitions will complete the job in 1/504th of the time
● Queries and Utilities too (backup/restore, load, index build etc)

Mapping DB2 Partitions to Servers
[Figure: Node 1 hosts part0 and part1, Node 2 hosts part2 and part3; each partition has its own FCM.]

# db2nodes.cfg
# sqllib/db2nodes.cfg
#
0 node1 0
1 node1 1
2 node2 0
3 node2 1
........

● DB2 instance configuration file
● All databases in the instance share this definition
● File stored in DB2 instance sqllib directory and shared to other nodes via GPFS
● Specifies the host name or the IP address of the high speed interconnect for FCM communication

Logical Design
● Star and snowflake
● Larger dimension tables are often snow-flaked
● Sampled production database artifacts
● Fact tables - composite PK or non-unique PK
● Dimension tables have surrogate PKs
● Dimension FKs are indexed
● VLDB: Relationships inferred
● All tables are compressed

Sample Schema
● Star schema for 4 largest fact tables

Partition Group Design
● Partition Groups
  ● Small dimension tables in Single Database Partition Group (SDPG).
  ● Fact and large dimension tables are partitioned.
  ● Collocation of Facts and largest/frequently joined dimension.
  ● VLDB - disjoint partition groups used to drive FCM, Table Queues harder.
[Figure labels: PG_DISJOINT1, PG_DISJOINT2]
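
The partition-to-server mapping from the "Mapping DB2 Partitions to Servers" slide (one line per partition: partition number, host name, logical port) can be written out for the full system described earlier, 504 partitions on 63 hosts with 8 partitions per host. The following is a minimal Python sketch, not the deck's own tooling: the helper name `db2nodes_lines` and the host names `node1` … `node63` are hypothetical, and it assumes partitions are numbered consecutively from 0 and logical ports restart at 0 on each host, as in the four-line example on the slide.

```python
def db2nodes_lines(hosts, partitions_per_host):
    """Build db2nodes.cfg-style lines: '<partition> <host> <logical port>'.

    Assumption: partition numbers run consecutively from 0 across hosts,
    and the logical port restarts at 0 on every host.
    """
    lines = []
    partition = 0
    for host in hosts:
        for logical_port in range(partitions_per_host):
            lines.append(f"{partition} {host} {logical_port}")
            partition += 1
    return lines


# Hypothetical host names for the 63 physical nodes mentioned in the deck.
hosts = [f"node{i}" for i in range(1, 64)]
lines = db2nodes_lines(hosts, 8)

print(len(lines))            # number of partitions defined
print("\n".join(lines[:9]))  # all of node1 plus the first line of node2
```

Calling `db2nodes_lines(["node1", "node2"], 2)` reproduces the four-line example from the slide, which is a quick sanity check before generating the full file.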
Choosing the Partitioning Key
● Partitioning key
  ● A subset of the primary key
  ● DISTRIBUTE BY HASH
  ● Fewer columns is better
  ● Surrogate key with high cardinality is ideal
● Collocation
  ● Possible for tables with same partitioning key
  ● Collocate Fact with largest commonly joined dimension table
  ● Consider adding redundant column to Fact PK
  ● Replicate other dimension tables
  ● Trade-off between partition balancing and optimal collocation
● Skew
  ● Avoid skew of more than 10%
  ● Avoid straggler partition.

Check Skew

-- rows per partition (10% sample, so counts are scaled by 10)
SELECT dbpartitionnum(date_id) AS "Partition number",
       count(1)*10 AS "Total # records"
FROM bi_schema.tb_sales_fact TABLESAMPLE SYSTEM (10)
GROUP BY dbpartitionnum(date_id);

Partition number   Total # records
----------------   ---------------
               1        10,313,750
               2        10,126,900
               3         9,984,910
               4        10,215,840

-- Space allocation per partition
SELECT DBPARTITIONNUM, SUM(DATA_OBJECT_L_SIZE) SIZE_KB
FROM SYSIBMADM.ADMINTABINFO
WHERE (tabschema,tabname) = ('THESCHEMA','THETABLE')
GROUP BY ROLLUP( DBPARTITIONNUM )
ORDER BY 2;

Physical Design
● Separate tablespaces for:
  ● Staging Tables
  ● Indexes
  ● MQTs
  ● Table data
● VLDB - typically larger tables have larger pagesize
● Range Partitioning
● Most Fact and large