Lenovo RAID Introduction Reference Information


Using a Redundant Array of Independent Disks (RAID) to store data remains one of the most common and cost-efficient methods to increase a server's storage performance, availability, and capacity. RAID increases performance by allowing multiple drives to process I/O requests simultaneously. RAID can also prevent data loss in case of a drive failure by reconstructing (or rebuilding) the missing data from the failed drive using the data on the remaining drives. This guide describes RAID technology and its capabilities, compares the various RAID levels, introduces Lenovo RAID controllers, and provides RAID selection guidance for ThinkSystem, ThinkServer, and System x servers.

Introduction to RAID

A RAID array (also known as a RAID drive group) is a group of multiple physical drives that uses a common method to distribute data across the drives. A virtual drive (also known as a virtual disk or logical drive) is a partition in the drive group that is made up of contiguous data segments on the drives. The virtual drive is presented to the host operating system as a physical disk that can be partitioned to create OS logical drives or volumes. This concept is illustrated in the following figure.

Figure 1. RAID overview

The data in a RAID drive group is distributed in segments or strips. A segment or strip (also known as a stripe unit) is the portion of data that is written to one drive immediately before the write operation continues on the next drive. When the last drive in the array is reached, the write operation continues on the first drive, in the block adjacent to the previous strip written to that drive, and so on. The group of strips written across all drives in the array (from the first drive to the last) before the write operation returns to the first drive is called a stripe, and the process of distributing data in this way is called striping. A strip is the smallest element that can be read from or written to the RAID drive group, and strips contain either data or recovery information.

The particular method of distributing data across the drives in a drive group is known as the RAID level. The RAID level defines the array's fault tolerance, performance, and effective storage capacity, because achieving redundancy always reserves some disk space for storing recovery information.

Recommendation: Always combine drives of the same type, rotational speed, and size into a single RAID drive group.

RAID levels

There are basic RAID levels (0, 1, 5, and 6) and spanned RAID levels (10, 50, and 60). Spanned RAID arrays combine two or more basic RAID arrays to provide higher performance, capacity, and availability by overcoming the limit on the maximum number of drives per array that a particular RAID controller supports.

RAID 0

RAID 0 distributes data across all drives in the drive group, as shown in the following figure. RAID 0 is often called striping.

Figure 2. RAID 0

RAID 0 provides the best performance and capacity of all RAID levels; however, it does not offer any fault tolerance, and a drive failure causes data loss of the entire volume. The total capacity of a RAID 0 array is equal to the size of the smallest drive multiplied by the number of drives. RAID 0 requires at least two drives.
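
To make the striping terms above concrete, the following is a minimal Python sketch (not taken from the Lenovo guide) that maps a logical offset to a drive and a strip in a RAID 0 drive group and computes RAID 0 capacity. The strip size, drive count, and example values are assumptions chosen purely for illustration.

```python
# Minimal striping sketch: map a logical offset to (drive, strip) in a RAID 0
# drive group. The strip size and drive count are illustrative assumptions,
# not values taken from the Lenovo documentation.

STRIP_SIZE_KB = 64          # assumed strip (stripe unit) size
NUM_DRIVES = 4              # assumed number of drives in the drive group

def locate_block(logical_offset_kb: int) -> tuple[int, int]:
    """Return (drive index, strip index on that drive) for a logical offset."""
    strip_number = logical_offset_kb // STRIP_SIZE_KB   # which strip overall
    drive = strip_number % NUM_DRIVES                   # strips rotate across drives
    strip_on_drive = strip_number // NUM_DRIVES         # stripe number on that drive
    return drive, strip_on_drive

def raid0_capacity(drive_sizes_gb: list[int]) -> int:
    """RAID 0 capacity: smallest drive multiplied by the number of drives."""
    return min(drive_sizes_gb) * len(drive_sizes_gb)

if __name__ == "__main__":
    print(locate_block(0))      # (0, 0): first strip on the first drive
    print(locate_block(256))    # strip 4 wraps back to drive 0 in the second stripe
    print(raid0_capacity([600, 600, 600, 600]))   # 2400 GB
```

With a 64 KB strip and four drives, offsets 0-63 KB land on the first drive, 64-127 KB on the second, and so on; offset 256 KB wraps back to the first drive in the next stripe.
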
RAID 1

RAID 1 consists of two drives, and data is written to both drives simultaneously, as shown in the following figure. RAID 1 is also known as mirroring.

Figure 3. RAID 1

RAID 1 provides very good performance for both read and write operations, and it also offers fault tolerance: in case of a drive failure, the remaining drive holds a copy of the data from the failed drive, so no data is lost. RAID 1 also offers a fast rebuild time. However, the effective storage capacity is half of the total capacity of the drives in the array. The capacity of a RAID 1 array is equal to the size of the smallest drive.

RAID 5

RAID 5 distributes data and parity across all drives in the drive group, as shown in the following figure. RAID 5 is also called striping with distributed parity.

Figure 4. RAID 5

RAID 5 offers fault tolerance against a single drive failure with the smallest reduction in total storage capacity; however, it has a slow rebuild time. RAID 5 provides excellent read performance, similar to RAID 0; however, write performance is only satisfactory because of the overhead of updating parity for each write operation. In addition, read performance can suffer when the volume is in degraded mode, that is, after a drive failure. The capacity of a RAID 5 array is equal to the size of the smallest drive multiplied by the number of drives, minus the capacity of the smallest drive. RAID 5 requires a minimum of three drives.

RAID 6

RAID 6 is similar to RAID 5, except that RAID 6 writes two parity segments for each stripe, as shown in the following figure, enabling a volume to sustain up to two simultaneous drive failures. RAID 6 is also known as striping with dual distributed parity.

Figure 5. RAID 6

RAID 6 offers the highest level of fault tolerance among the basic RAID levels, and it provides excellent read performance, similar to RAID 0; however, write performance is only satisfactory (and slightly slower than RAID 5) because of the additional overhead of updating two parity blocks for each write operation. In addition, read performance can suffer when the volume is in degraded mode, that is, after a drive failure. The capacity of a RAID 6 array is equal to the size of the smallest drive multiplied by the number of drives, minus twice the capacity of the smallest drive. RAID 6 requires a minimum of four drives.

RAID 10

RAID 10 is a combination of RAID 0 and RAID 1 in which data is striped across multiple RAID 1 drive groups, as shown in the following figure. RAID 10 is also known as spanned mirroring.

Figure 6. RAID 10

RAID 10 provides fault tolerance by sustaining a single drive failure within each span, and it offers very good performance with concurrent I/O processing on all drives. The capacity of a RAID 10 array is equal to half of the total capacity of the drives. RAID 10 requires at least four drives.

RAID 50

RAID 50 is a combination of RAID 0 and RAID 5 in which data is striped across multiple RAID 5 drive groups, as shown in the following figure. RAID 50 is also known as spanned striping with distributed parity.

Figure 7. RAID 50

RAID 50 provides fault tolerance by sustaining a single drive failure within each span, and it offers excellent read performance. It also improves write performance compared to RAID 5 because parity is calculated separately within each span. The capacity of a RAID 50 array is equal to the size of the smallest drive multiplied by the number of drives, minus the capacity of the smallest drive multiplied by the number of spans. RAID 50 requires a minimum of six drives.
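
As a conceptual illustration of how a parity-based level such as RAID 5 recovers data in degraded mode and during a rebuild, the following Python sketch uses the standard single-parity XOR relationship: the parity strip is the XOR of the data strips in a stripe, so any one lost strip can be recomputed from the survivors. This is a simplified model, not the behavior of a specific Lenovo controller.

```python
# Simplified illustration of single-parity reconstruction (the principle behind
# RAID 5 rebuilds): the parity strip is the XOR of the data strips, so any one
# lost strip can be recomputed from the surviving strips plus parity.
# Conceptual sketch only; not a Lenovo controller implementation.

from functools import reduce

def parity(strips: list[bytes]) -> bytes:
    """XOR equal-length strips into a parity strip."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

def rebuild(surviving_strips: list[bytes], parity_strip: bytes) -> bytes:
    """Recover the missing strip: XOR of the parity strip with all surviving strips."""
    return parity(surviving_strips + [parity_strip])

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data strips in one stripe
p = parity(data)                     # parity strip written to the fourth drive

# Simulate losing the second drive and rebuilding its strip.
recovered = rebuild([data[0], data[2]], p)
assert recovered == b"BBBB"
```
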
RAID 60

RAID 60 is a combination of RAID 0 and RAID 6 in which data is striped across multiple RAID 6 drive groups, as shown in the following figure. RAID 60 is also known as spanned striping with dual distributed parity.

Figure 8. RAID 60

RAID 60 provides the best fault tolerance by sustaining two simultaneous drive failures within each span, and it offers excellent read performance. It also improves write performance compared to RAID 6 because parity is calculated separately within each span. The total capacity of a RAID 60 array is equal to the size of the smallest drive multiplied by the number of drives, minus the capacity of the smallest drive multiplied by twice the number of spans. RAID 60 requires a minimum of eight drives.

Hot Spares

Hot spares are designated drives that automatically and transparently take the place of a failed drive in fault-tolerant RAID arrays. They minimize the time the array remains in degraded mode after a drive failure by automatically starting the rebuild process that restores the data of the failed drive onto the hot spare. Hot spare drives can be global or dedicated. A global hot spare can replace a failed drive in any fault-tolerant drive group, as long as its capacity is equal to or larger than the capacity of the failed drive used by the RAID array. A dedicated hot spare can replace a failed drive only in its designated drive group.

RAID level comparison

The selection of a RAID level is driven by the following factors:
• Read performance
• Write performance
• Fault tolerance
• Degraded array performance (for fault-tolerant RAID levels)
• Effective storage capacity

The following table summarizes the RAID levels and their characteristics to assist you in selecting the most appropriate RAID level for your needs.

Table 1. RAID level comparison

Feature                      RAID 0     RAID 1     RAID 5        RAID 6        RAID 10            RAID 50            RAID 60
Minimum drives               1          2          3             4             4                  6                  8
Maximum drives*              32         2          32            32            16**               192***             192***
Tolerance to drive failures  None       1 drive    1 drive       2 drives      1 drive per span   1 drive per span   2 drives per span
Rebuild time                 None       Fast       Slow          Slow          Fast               Slow               Slow
Read performance             Excellent  Very good  Excellent     Excellent     Very good          Excellent          Excellent
Write performance            Excellent  Very good  Satisfactory  Satisfactory  Very good          Good               Good
Degraded array performance   None       Very good  Satisfactory  Satisfactory  Very good          Good               Good
Capacity overhead            None       Half       1 drive       2 drives      Half               1 drive per span   2 drives per span

* The maximum number of drives in a RAID array is controller and system dependent.
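
The capacity rules stated for each RAID level above can be collected into a small helper. The following Python sketch applies those formulas; the function name, drive sizes, and example values are illustrative assumptions and are not part of the Lenovo guide.

```python
# Minimal sketch of the capacity formulas described in this guide. Drive sizes
# are in GB; names and example values are illustrative assumptions.

def usable_capacity(level: int, smallest_drive_gb: int, num_drives: int, spans: int = 1) -> int:
    """Usable capacity per this guide: smallest drive x drive count, minus redundancy overhead."""
    total = smallest_drive_gb * num_drives              # the guide sizes every level from the smallest drive
    if level == 0:
        return total                                    # striping only, no overhead
    if level == 1:
        return smallest_drive_gb                        # two mirrored drives keep one drive's worth
    if level == 5:
        return total - smallest_drive_gb                # one drive's worth of parity
    if level == 6:
        return total - 2 * smallest_drive_gb            # two drives' worth of parity
    if level == 10:
        return total // 2                               # mirrored spans keep half of the total
    if level == 50:
        return total - smallest_drive_gb * spans        # one parity drive's worth per span
    if level == 60:
        return total - 2 * smallest_drive_gb * spans    # two parity drives' worth per span
    raise ValueError(f"unsupported RAID level: {level}")

# Example: eight 900 GB drives
print(usable_capacity(5, 900, 8))            # 6300 GB as RAID 5
print(usable_capacity(60, 900, 8, spans=2))  # 3600 GB as RAID 60 with two spans
```
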