Session: A12

Space: The Storage Frontier for Distributed DB2 DBA’s

Jerry Spence National City Corporation

May 10, 2007 9:20 a.m. – 10:20 a.m.

Platform: DB2 for LUW

All databases reside on storage. To most people, storage is nothing but boxes of hardware, but the DBA must go beyond the basics and understand how storage is laid out, what the components of the storage design are, and how they relate to the database. This presentation covers storage concepts, how they relate to the database, and some considerations for deploying a database on various storage devices.

1 Objectives:

• The hardware side of storage: arrays, controllers, HBAs, RAID, SAN, disks

• The AIX view of storage: the Logical Volume Manager

• The database view of storage: tablespaces, automatic storage, tables, indexes, pages, extents, prefetching

• DPF and the storage design: using the BCU model

• Backup and recovery and storage strategies


1. Overview: Why you need to understand the storage system, and why it isn't just a black box you can ignore.
2. The hardware components of storage: Discusses server, network, and storage array components and how they relate.
3. The OS view of storage: Discusses the AIX Logical Volume Manager and how file systems are created.
4. OK, I have a database to create, what's next? Discusses how the storage design impacts the database design, plus some considerations.
5. Multi-partition and shared nothing: Multiple partitions present challenges in the storage design. Discusses some considerations when building a multi-partition database.
6. Backup and recovery and the storage design: How you define your backup and recovery procedure has a direct impact on storage usage. Discusses the various strategies and how they relate to disk usage.

2 Overview

Storage is an integral part of databases. Databases could not exist without some media for saving data. Storage comes in many designs and configurations, and how it is used can have a direct impact on the performance of a database. This presentation gives a basic review of storage and how it relates to DB2 databases.


3 Storage


4 Hardware: The Basics

• Hard drives are composed of many parts, but the key parts are the platters, the spindle and motor, the read/write heads, and the head actuator.
• The platters are circular disks typically made of a light aluminum alloy, glass, or ceramic material. Each platter is coated with a magnetic material.
• Platters are magnetized on both sides.
• The platters are separated by spacers and attached to a spindle. The spindle is attached to a motor, and when the motor spins, the spindle and platters spin in unison.


5 Hardware: The Basics

• Read/write heads, as the name implies, read and write data from the disk platters.
• There is generally one head per platter side.
• The heads fly just above the platter surface; some fly heights are as little as 3 nanometres (3 billionths of a meter).
• Read and write heads have evolved over time from the magnetic coil to GMR (Giant Magnetoresistive) heads, which are very sensitive.



7 Hardware: The Basics

• The read/write heads are mounted to an actuator assembly, or head assembly.
• The heads are attached to a device called a slider.
• The slider's function is to support the head and hold it in the correct location over the platter.


8 Inside the Disk Drive


9 Hardware: How data is stored

• Platters are broken down into tracks (concentric rings) and sectors, or blocks. A block or sector is generally the smallest addressable unit.
• Data is stored or read magnetically via the read/write heads on the surface of the platters.


10 The Platter


11 Hardware: Terms

• Seek time measures the time it takes to move the read/write heads to a track on the platters.
• Rotational delay (latency) is the time it takes for the desired sector or block within a track to rotate under the read/write heads.
• Transfer time is the time taken to actually move the data to and from the disk.
• Areal density, or bit density, is the amount of data that can be packed onto a storage medium, measured in gigabits per square inch.
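As a rough worked example of how these terms combine, assume a hypothetical 15,000 RPM drive with an average seek time of 3.5 ms (the figures are illustrative only):

One revolution = 60 s / 15,000 revolutions ≈ 4.0 ms
Average rotational delay ≈ half a revolution ≈ 2.0 ms
Average access time ≈ seek time + rotational delay ≈ 3.5 ms + 2.0 ms = 5.5 ms, plus the transfer time for the data itself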


12 Hardware: Terms

• SATA - Serial Advanced Technology Attachment. Based on serial signaling technology; cables are thinner and more flexible.
• LUN - Logical Unit Number.
• Controller software - Software that manages devices.
• Mirroring - RAID 1. Drives are mirrored; every write is done twice.
• Disk striping - Spreading data across many disks.

http://www.snia.org/education/dictionary/ is a very good site for network and storage terms.

13 Hardware: Storage Arrays

• Simply stated, storage arrays are large boxes that hold many hard disks.
• IBM, HP, EMC, Sun, and Hitachi, to name a few, offer storage arrays in different sizes and performance ranges.
• Storage arrays can scale to the multi-terabyte range.
• Storage arrays include memory cache and come in different sizes.


14 Storage Array


15 Hardware: RAID

• RAID stands for Redundant Array of Independent (originally Inexpensive) Disks.
• Provides a method to access multiple disks as if they were one large disk.
• Data can be spread across multiple disks, improving performance because more read/write heads are involved.
• Reliability can be increased with certain RAID types.
• There are multiple RAID types: 0, 1, 1+0, 2, 3, 4, 5, 6, etc.


• RAID-0: This technique has striping but no redundancy of data. It offers the best performance but no fault tolerance.
• RAID-1: This type is also known as disk mirroring and consists of at least two drives that duplicate the storage of data. There is no striping. Read performance is improved since either disk can be read at the same time. Write performance is the same as for single-disk storage. RAID-1 provides the best performance and the best fault tolerance in a multi-user system.
• RAID-2: This type uses striping across disks, with some disks storing error checking and correcting (ECC) information. It has no advantage over RAID-3.
• RAID-3: This type uses striping and dedicates one drive to storing parity information. The embedded error checking (ECC) information is used to detect errors. Data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the other drives. Since an I/O operation addresses all drives at the same time, RAID-3 cannot overlap I/O. For this reason, RAID-3 is best for single-user systems with long-record applications.
• RAID-4: This type uses large stripes, which means you can read records from any single drive. This allows you to take advantage of overlapped I/O for read operations. Since all write operations have to update the parity drive, no I/O overlapping is possible. RAID-4 offers no advantage over RAID-5.
• RAID-5: This type includes a rotating parity array, thus addressing the write limitation in RAID-4, so all read and write operations can be overlapped. RAID-5 stores parity information but not redundant data (but parity information can be used to reconstruct data). RAID-5 requires at least three and usually five disks for the array. It is best for multi-user systems in which performance is not critical or which do few write operations.
• RAID-6: This type is similar to RAID-5 but includes a second parity scheme that is distributed across different drives and thus offers extremely high fault- and drive-failure tolerance.
• RAID-7: This type includes a real-time embedded operating system as a controller, caching via a high-speed bus, and other characteristics of a stand-alone computer. One vendor offers this system.
• RAID-10: Combining RAID-0 and RAID-1 is often referred to as RAID-10, which offers higher performance than RAID-1 but at much higher cost. There are two subtypes: in RAID-0+1, data is organized as stripes across multiple disks, and then the striped disk sets are mirrored; in RAID-1+0, the data is mirrored and the mirrors are striped.
• RAID-50 (or RAID-5+0): This type consists of a series of RAID-5 groups striped in RAID-0 fashion to improve RAID-5 performance without reducing data protection.
• RAID-53 (or RAID-5+3): This type uses striping (in RAID-0 style) for RAID-3's virtual disk blocks. This offers higher performance than RAID-3 but at much higher cost.
• RAID-S (also known as Parity RAID): This is an alternate, proprietary method for striped parity RAID from EMC Symmetrix that is no longer in use on current equipment. It appears to be similar to RAID-5 with some performance enhancements, as well as the enhancements that come from having a high-speed disk cache on the disk array.

16 RAID 0


RAID 0 implements block striping, where data is broken into logical blocks and is striped across several drives. Unlike other RAID levels, there is no facility for redundancy. In the event of a disk failure, data is lost. In block striping, the total disk capacity is equivalent to the sum of the capacities of all drives in the array. This combination of drives appears to the system as a single logical drive. RAID 0 provides the highest performance. It is fast because data can be simultaneously transferred to or from every disk in the array. Furthermore, read/writes to separate drives can be processed concurrently.

17 RAID 1


RAID 1 implements disk mirroring, where a copy of the same data is recorded onto two drives. By keeping two copies of data on separate disks, data is protected against a disk failure. If, at any time, a disk in the RAID 1 array fails, the remaining good disk (copy) can provide all of the data needed, thus preventing downtime. In disk mirroring, the total usable capacity is equivalent to the capacity of one drive in the RAID 1 array. Thus, combining two 1-Gbyte drives, for example, creates a single logical drive with a total usable capacity of 1 Gbyte. This combination of drives appears to the system as a single logical drive.

18 RAID 0 + 1


RAID 1+0 combines RAID 0 and RAID 1 to offer mirroring and disk striping. Using RAID 1+0 is a time-saving feature that enables you to configure a large number of disks for mirroring in one step. It is not a standard RAID level option that you can select; it does not appear in the list of RAID level options supported by the controller. If four or more disk drives are chosen for a RAID 1 logical drive, RAID 1+0 is performed automatically.

19 RAID 5


RAID 5 implements multiple-block striping with distributed parity. This RAID level offers redundancy with the parity information distributed across all disks in the array. Data and its parity are never stored on the same disk. In the event that a disk fails, original data can be reconstructed using the parity information and the information on the remaining disks.
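As a rough worked illustration of the parity arithmetic, assume a hypothetical RAID 5 array built from five 146 GB drives (the figures are illustrative only):

Usable capacity = (5 - 1) x 146 GB = 584 GB; one drive's worth of space holds parity, spread across all five drives.
Parity for a stripe: P = D1 XOR D2 XOR D3 XOR D4
If the disk holding D2 fails: D2 = D1 XOR D3 XOR D4 XOR P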

20 Hardware: SAN

• A high-performance network whose primary purpose is to enable storage devices to communicate with computer systems and with each other.
• Can use Fibre Channel, Ethernet, or any other type of interconnect technology.
• Disks, tapes, RAID subsystems, and file servers can be connected to a SAN.
• SANs support disk mirroring, backup and restore, and sharing of data among different servers.


21 Hardware: SAN

• A SAN provides universal connectivity.
• Any-to-any connectivity between storage and servers.

http://www.snia.org - Storage Networking Industry Association


23 AIX Logical Volume Manager LVM

• Volume management creates a layer of abstraction over the storage.
• Applications use virtual storage, which is managed by volume management software, a Logical Volume Manager (LVM).
• The LVM hides the details of where data is stored, on which actual hardware and where on that hardware, from the rest of the system.
• Volume management lets you change the storage configuration without actually changing anything on the hardware side, and vice versa.
• By hiding the hardware details, it completely separates hardware and software storage management, so that it is possible to change the hardware side without the software ever noticing, all during runtime.


Advantages:
• File systems can span disks, so size is not limited by the capacity of physical disks.
• Can expand file systems on the fly.
• Can add additional disks to an existing pool of disk space (VG).
• Can mirror important data on multiple physical disks for redundancy.
• Can "export" an entire VG so that all disks in the VG can be easily physically disconnected, moved to another machine, and "imported".
Limitations:
• Must reduce the VG when removing a disk.
• When a single disk in a VG dies, the whole VG is affected.
• "Brick wall" between VGs: LVs can't cross a VG boundary.
• Cannot shrink file systems.

24 Logical Volume Manager


25 AIX Logical Volume Manager LVM

• Physical Volume (PV): Synonym for "hard disk"; a single physical hard drive.
• Volume Group (VG): A set of one or more PVs which form a single storage pool. You can define multiple VGs on each AIX system.
• Physical Partition (PP): The smallest allocation unit in the LVM. All PPs within a VG are the same size, usually 4 or 8 MB.


Logical Volume Manager: The set of operating system commands, library subroutines, and other tools that allow you to establish and control logical volume storage is called the Logical Volume Manager (LVM). The LVM controls disk resources by mapping data between a more simple and flexible logical view of storage space and the actual physical disks. The LVM does this using a layer of device-driver code that runs above traditional disk device drivers.

The LVM consists of the logical volume device driver (LVDD) and the LVM subroutine interface library. The LVDD is a pseudo-device driver that manages and processes all I/O; it translates logical addresses into physical addresses and sends I/O requests to specific device drivers. The LVM subroutine interface library contains routines that are used by the system management commands to perform system management tasks for the logical and physical volumes of a system. For more information about how the LVM works, see Understanding the Logical Volume Device Driver in AIX 5L Version 5.3 Kernel Extensions and Device Support Programming Concepts, and Logical Volume Programming Overview in AIX 5L Version 5.3 General Programming Concepts: Writing and Debugging Programs.

26 AIX Logical Volume Manager LVM

• Volume Group (VG): A set of one or more PVs which form a single storage pool. You can define multiple VGs on each AIX system.
• Logical Partition (LP): A logical mapping to one or more PPs within the same VG, regardless of whether they are on the same PV or not.
• Logical Volume (LV): A set of one or more LPs within the same VG which form a usable unit of disk space. LVs are used analogously to partitions on PCs or slices under Solaris: they usually contain file systems or paging spaces ("swap").


Logical volume storage concepts: The logical volume (which can span physical volumes) is composed of logical partitions allocated onto physical partitions.
[Figure 1: Volume Group. A volume group composed of three physical volumes with the maximum range specified; the logical volume spans the physical volumes and is composed of logical partitions allocated onto physical partitions.]

• Physical volumes: A disk must be designated as a physical volume and be put into an available state before it can be assigned to a volume group.
• Volume groups: A volume group is a collection of 1 to 32 physical volumes of varying sizes and types.
• Physical partitions: When you add a physical volume to a volume group, the physical volume is partitioned into contiguous, equal-sized units of space called physical partitions. A physical partition is the smallest unit of storage space allocation and is a contiguous space on a physical volume.
• Logical volumes: After you create a volume group, you can create logical volumes within that volume group.
• Logical partitions: When you create a logical volume, you specify the number of logical partitions for the logical volume.
• File systems: The logical volume defines allocation of disk space down to the physical-partition level. Finer levels of data management are accomplished by higher-level software components such as the Virtual Memory Manager or the file system. Therefore, the final step in the evolution of a disk is the creation of file systems.
• Raw logical volumes: A raw logical volume is an area of physical and logical disk space that is under the direct control of an application, such as a database or a partition, rather than under the direct control of the operating system or a file system.
• Mirroring: Volume groups, including the root volume group (rootvg), can be mirrored and unmirrored.
• Disk removal: On certain systems, a hot-removability feature lets you remove a disk without turning the system off, so the system remains available.
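A minimal AIX sketch of these concepts follows. The disk names hdisk2/hdisk3, the volume group name datavg, and the logical volume name db2lv are hypothetical; verify options against your AIX level:

# Create a volume group from two physical volumes
mkvg -y datavg hdisk2 hdisk3

# Create a JFS2-type logical volume of 100 logical partitions in that volume group
mklv -y db2lv -t jfs2 datavg 100

# Inspect the resulting layout
lsvg datavg
lsvg -l datavg
lslv db2lv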

27 AIX Logical Volume Manager LVM Journaled File System JFS JFS2
• A hierarchical structure of files and directories.
• Journaled file systems are created on top of the LVM.
• File systems also contain a superblock, i-nodes, data blocks, and allocation bitmaps.
• The superblock maintains information about the file system: the size of the file system, the number of blocks in the file system, and a flag indicating the state of the file system.


The journaled file system (JFS) and the enhanced journaled file system (JFS2) are built into the base operating system. Both file system types link their file and directory data to the structure used by the AIX® Logical Volume Manager for storage and retrieval. A difference is that JFS2 is designed to accommodate a 64-bit kernel and larger files. Unless otherwise noted, the following sections apply equally to JFS and JFS2.
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.baseadmn/doc/baseadmndita/fs_jfs_jfs2.htm

28 AIX Logical Volume Manager LVM Journaled File System JFS JFS2
• Logical blocks contain file or directory data.
• Disk i-nodes: each file or directory has an i-node.
• I-nodes contain information such as the owner's user ID, the number of links to the file, access permissions for users and groups, the size of the file, real disk addresses, the last time the file was modified, and the last time it was accessed.


29 Sample /etc/filesystems

/:
        dev             = /dev/hd4
        vol             = "root"
        mount           = automatic
        check           = false
        free            = true
        vfs             = jfs2
        log             = /dev/hd8
        type            = bootfs

/home:
        dev             = /dev/hd1
        vol             = "/home"
        mount           = true
        check           = true
        free            = false
        vfs             = jfs2
        log             = /dev/hd8


http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/IBMp690/IBM/usr/share/man/info/en_US/a_doc_lib/aixbman/admnconc/lvm_overview.htm#HDRBA7C2CF562MART

The main /etc/filesystems attributes:
• account: Used by the dodisk command to determine the file systems to be processed by the accounting system. This value can be either True or False.
• boot: Used by the mkfs command to initialize the boot block of a new file system. This specifies the name of the load module to be placed into the first block of the file system.
• check: Used by the fsck command to determine the default file systems to be checked. The True value enables checking while the False value disables checking. If a number, rather than the True value, is specified, the file system is checked in the specified pass of checking. Multiple-pass checking, described in the fsck command, permits file systems on different drives to be checked in parallel.
• dev: Identifies, for local mounts, either the block special file where the file system resides or the file or directory to be mounted. System management utilities use this attribute to map file system names to the corresponding device names. For remote mounts, it identifies the file or directory to be mounted.
• free: This value can be either true or false. Obsolete and ignored.
• mount: Used by the mount command to determine whether this file system should be mounted by default. The possible values are:
  - automatic: Automatically mounts a file system when the system is started. Unlike the true value, file systems mounted with the automatic value are not mounted with the mount all command or unmounted with the unmount all command. By default, the /, /usr, /var, and /tmp file systems use the automatic value.
  - false: This file system is not mounted by default.
  - true: This file system is mounted by the mount all command. It is unmounted by the unmount all command. The mount all command is issued during system initialization to mount automatically all such file systems.
• nodename: Used by the mount command to determine which node contains the remote file system. If this attribute is not present, the mount is a local mount. The value of the nodename attribute should be a valid node nickname. This value can be overridden with the mount -n command.
• options: Comma-separated list of keywords that have meaning specific to a file system type. The options are passed to the file system at mount time.
• size: Used by the mkfs command for reference and to build the file system. The value is the number of 512-byte blocks in the file system.
• type: Used to group related mounts. When the mount -t String command is issued, all of the currently unmounted file systems with a type attribute equal to the String parameter are mounted.

30 AIX Logical Volume Manager LVM Journaled File System JFS JFS2
• You can create one file system per logical volume. To create a file system, use the crfs command (a hedged example follows below).
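A hedged example of creating a file system on the logical volume from the earlier sketch (the mount point /db2/data01 is hypothetical; options vary by AIX level):

# Create a JFS2 file system on logical volume db2lv, mounted at /db2/data01,
# and record it in /etc/filesystems
crfs -v jfs2 -d db2lv -m /db2/data01 -A yes

# Mount it now, and grow it later on the fly if needed
mount /db2/data01
chfs -a size=+1G /db2/data01    # the +1G notation is accepted on recent AIX levels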


31 Database view of storage

• Databases and tablespaces are created on file systems.
• Containers can be SMS or DMS (see the sketch below).
• Many parameters control the use of storage: prefetching, concurrent I/O, automatic storage, data compression, extent size, DB2 striping.
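A hedged sketch of the two container types; the tablespace names, paths, and page counts are illustrative and the file systems are assumed to exist:

# SMS: the operating system's file system manages the space; it grows as needed
db2 "CREATE TABLESPACE ts_sms MANAGED BY SYSTEM USING ('/db2/sms01/ts_sms')"

# DMS: DB2 manages pre-allocated file containers (25600 pages of 4 KB = 100 MB here)
db2 "CREATE TABLESPACE ts_dms MANAGED BY DATABASE USING (FILE '/db2/dms01/ts_dms.dat' 25600)"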


32 Database view of storage: CIO

• A new file system feature called Concurrent I/O (CIO) was introduced in the Enhanced Journaled File System (JFS2) in AIX 5L™ Version 5.2.0.10, also known as maintenance level 01 (announced May 27, 2003).
• CIO provides performance advantages similar to raw logical volumes.
• Use of Direct I/O (DIO) is implicit with CIO.
• CIO does not do inode locking.

http://www3.software.ibm.com/ibmdl/pub/software/dw/dm/db2/dm-0408lee/CIO-article.pdf
http://www-03.ibm.com/servers/aix/whitepapers/db_perf_aix.pdf

33 Database view of storage: CIO


34 Database view of storage: CIO


35 Database view of storage: CIO


36 Database view of storage: CIO

• Buffered I/O is the default when creating a tablespace.
• To use CIO, the file system must be JFS2 and NO FILE SYSTEM CACHING must be specified for the tablespace, either at tablespace creation or with the ALTER TABLESPACE command (see the sketch below).
• If CIO is enabled but not supported at the file system level (for example, on JFS), then DIO will be used instead.
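A hedged example of both forms; the tablespace name and path are illustrative, and the underlying file system should be JFS2 for CIO to take effect:

# Create a tablespace that bypasses the file system cache (CIO/DIO)
db2 "CREATE TABLESPACE ts_app MANAGED BY DATABASE USING (FILE '/db2/dms01/ts_app.dat' 25600) NO FILE SYSTEM CACHING"

# Or switch an existing tablespace
db2 "ALTER TABLESPACE ts_app NO FILE SYSTEM CACHING"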


37 Database view of storage: Automatic Storage
• Tablespaces can be created without specifying the containers; DB2 takes care of creating and managing them.
• A database can only be enabled for automatic storage at creation time. You cannot enable it for a database that was not created with it enabled.
• You can't disable automatic storage for a database created with automatic storage.
• Available in Version 8.2.2 (8.1 FixPak 9).

ftp://ftp.software.ibm.com/software/data/db2/9/labchats/20061114-slides.pdf

38 Database view of storage: Automatic Storage
• AUTORESIZE YES allows DMS tablespaces to grow automatically.

ALTER TABLESPACE ts1
    AUTORESIZE YES
    INCREASESIZE integer PERCENT | integer K | M | G
    MAXSIZE integer K | M | G | NONE

CREATE TABLESPACE ts1
    MANAGED BY AUTOMATIC STORAGE
    INITIALSIZE x K | M | G
    INCREASESIZE x PERCENT | x K | M | G
    MAXSIZE x K | M | G | NONE


SMS tablespaces grow automatically. RAW containers must be increased at the OS level to grow. Only DMS tablespaces can be automatically resized or increased.
http://blogs.ittoolbox.com/database/technology/archives/new-automatic-storage-in-822-4720
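A hedged end-to-end sketch, roughly in the DB2 9 syntax; the database name, storage paths, and sizes are illustrative:

# Create a database enabled for automatic storage over two storage paths
db2 "CREATE DATABASE PRODDB AUTOMATIC STORAGE YES ON /db2/stor01, /db2/stor02"

# Tablespace with no explicit containers; DB2 places them on the storage paths and grows them
db2 "CREATE TABLESPACE ts_auto MANAGED BY AUTOMATIC STORAGE INITIALSIZE 100 M INCREASESIZE 10 PERCENT MAXSIZE 10 G"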

39 Database view of storage: Compression
• Version 9 of DB2 LUW provides row compression.
• V8 provides backup compression.
• V8 provides MDC, which reduces the size of indexes because of block-type indexes.
• Row compression looks at the entire contents of a table for repeating strings, stores them in a dictionary, and replaces each occurrence with a 12-bit symbol that represents the data stored in the dictionary.
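A hedged DB2 9 sketch; the table name is illustrative, and the compression dictionary is built by a classic REORG after compression is enabled:

# Estimate the potential savings first
db2 "INSPECT ROWCOMPESTIMATE TABLE NAME sales RESULTS KEEP sales.insp"

# Enable row compression and build the dictionary
db2 "ALTER TABLE sales COMPRESS YES"
db2 "REORG TABLE sales"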


40 Database view of storage: Prefetching

• Prefetching means reading data in from disk into the buffer pool before it is requested.
• List prefetch: fetching a set of non-sequential data pages by using an index and RIDs to fetch rows.
• Sequential prefetch: reading consecutive data pages into the buffer pool before they are needed.
• Block-based buffer pools (Version 8): contiguous pages on disk can be read into contiguous pages of the buffer pool.
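A hedged sketch; the names and sizes are illustrative, and PREFETCHSIZE is commonly set to the extent size multiplied by the number of containers:

# Raise the prefetch size for a tablespace (in pages)
db2 "ALTER TABLESPACE ts_app PREFETCHSIZE 64"

# Reserve a block area in a buffer pool for block-based (sequential) prefetching
db2 "ALTER BUFFERPOOL ibmdefaultbp NUMBLOCKPAGES 4096 BLOCKSIZE 32"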


41 Database view of storage: Extents

• Extent size is the number of pages written to a container before moving on to the next container.
• Extents are sliced across all containers: for DMS, an extent is written to the first container, then the next extent to the second container, and so forth.
• SMS can use multi-page file allocation to allocate space an extent at a time instead of a page at a time.
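A hedged example tying extents, containers, and prefetching together; the three containers and sizes are illustrative:

# Each extent is 8 pages; one prefetch of 24 pages touches all three containers
db2 "CREATE TABLESPACE ts_big MANAGED BY DATABASE
     USING (FILE '/db2/c1/ts_big.dat' 25600,
            FILE '/db2/c2/ts_big.dat' 25600,
            FILE '/db2/c3/ts_big.dat' 25600)
     EXTENTSIZE 8 PREFETCHSIZE 24"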


Tablespace maps are used to map DMS storage allocation. There is very good information on tablespace maps in the DB2 Information Center.

42 Database view of storage: Extents


43 Database view of storage: Extents


44 DPF and storage

• Shared-nothing environment.
• Parallelism is the key to performance.
• Place separate partitions on separate disk arrays.
• Each partition has its own logs, temp space, tablespaces, indexes, and buffer pools (see the container sketch below).
• Consider using the BCU (Balanced Configuration Unit) model for designing the database.
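A hedged sketch of keeping each partition's containers separate by using the database partition expression in the container name; the partition group pg_all and the paths are hypothetical, and ' $N' is replaced by the partition number on each partition:

# One container per partition; ' $N' expands to the database partition number
db2 "CREATE TABLESPACE ts_fact IN DATABASE PARTITION GROUP pg_all
     MANAGED BY DATABASE
     USING (FILE '/db2/data/cont $N' 128000)
     EXTENTSIZE 16"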


45 DPF and storage

• A good partition key is essential; an even distribution of data is desired.
• DB2ADVIS will recommend partition keys.
• Periodically check the distribution of data (a hedged query follows below).
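A hedged way to check for skew, assuming a hypothetical table sales distributed on cust_id; DBPARTITIONNUM returns the partition number that holds each row:

db2 "SELECT DBPARTITIONNUM(cust_id) AS partition_num, COUNT(*) AS row_count
     FROM sales
     GROUP BY DBPARTITIONNUM(cust_id)
     ORDER BY 1"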


46 Backup and Restore Considerations

• Back up to disk, or back up directly to tape.
• Consider using TSM or something similar, such as Veritas.
• db2adutl works directly with TSM and is tightly integrated with it.
• You can specify multiple output paths on a backup to increase performance.
• Consider setting parallelism to a higher number to back up more than one tablespace at a time (see the sketch below).
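A hedged example of a multi-path, parallel backup; the database name, paths, buffer counts, and parallelism are illustrative:

# Spread the image over two paths, use 8 buffers, and back up 4 tablespaces at a time
db2 "BACKUP DATABASE PRODDB TO /db2/backup1, /db2/backup2 WITH 8 BUFFERS BUFFER 4096 PARALLELISM 4 COMPRESS"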


47 Backup and Restore Considerations

• Consider using INCLUDE LOGS to keep the needed logs with the backup image.
• Archive logs must be managed; develop a script to delete old logs.
• Back up configuration parameters and use db2look to save the DDL (see the sketch below).
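A hedged sketch of the points above; the database name, paths, and file names are illustrative:

# Online backup that carries the logs needed to restore to a consistent point
db2 "BACKUP DATABASE PRODDB ONLINE TO /db2/backup1 INCLUDE LOGS"

# Save the DDL and configuration alongside the backup image
db2look -d PRODDB -e -l -x -o proddb_ddl.sql
db2 "GET DB CFG FOR PRODDB" > proddb_dbcfg.txt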


48 LINKS

• http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp
• http://blogs.ittoolbox.com/database/technology/
• http://www.db2mag.com/


49 Session: A12 Space: The Storage Frontier for Distributed DB2 DBA’s

Jerry Spence National City Corporation [email protected]
