10 Rules for Scalable Performance in 'Simple Operation' Datastores

Total Page:16

File Type:pdf, Size:1020Kb

10 Rules for Scalable Performance in 'Simple Operation' Datastores contributedarticles Doi:10.1145/1953122.1953144 sharing the following features: Partition data and operations, keep ˲˲ Disk-oriented storage; ˲˲ Tables stored row-by-row on disk, administration simple, do not assume hence, a row store; one size fits all. ˲˲ B-trees as the indexing mecha- nism; By MiChAeL StonebrakeR AnD Rick Cattell ˲˲ Dynamic locking as the concurren- cy-control mechanism; ˲˲ A write-ahead log, or WAL, for crash recovery; ˲˲ SQL as the access language; and ˲˲ A “row-oriented” query optimizer 10 Rules for 7 and executor, pioneered in System R. The 1970s and 1980s were charac- terized by a single major DBMS mar- scalable ket—business data processing—today called online transaction processing, or OLTP. Since then, DBMSs have come to be used in a variety of new markets, Performance including data warehouses, scientific databases, social-networking sites, and gaming sites; the modern-day DBMS market is characterized in the figure here. in ‘simple The figure includes two axes: hori- zontal, indicating whether an applica- tion is read-focused or write-focused, and vertical, indicating whether an ap- operation’ plication performs simple operations (read or write a few items) or complex operations (read or write thousands of items); for example, the traditional Datastores OLTP market is write-focused with simple operations, while the data warehouse market is read-focused with complex operations. Many appli- key insights Many scalable sQL and nosQL datastores the relatioNal MoDel of data was proposed in 1970 have been introduced over the past five 5 years, designed for Web 2.0 and other by Ted Codd as the best solution for the DBMS problems applications that exceed the capacity of of the day—business data processing. Early relational single-server RDBMss. 2 9 Major differences characterize these systems included System R and Ingres, and almost all new datastores as to their consistency guarantees, per-server performance, commercial relational DBMS (RDBMS) implementations scalability for read versus write loads, today trace their roots to these two systems. automatic recovery from failure of a server, programming convenience, and As such, unless you squint, the dominant commercial administrative simplicity. vendors—Oracle, IBM, and Microsoft—as well as the Applications must be designed for scalability, partitioning application major open source systems—MySQL and PostgreSQL— data into “shards,” avoiding operations that span partitions, designing for all look about the same today; we term these systems parallelism, and weighing requirements general-purpose traditional row stores, or GPTRS, for consistency guarantees. 72 Communications of The acm | juNe 2011 | voL. 54 | No. 6 high-level Look for languages are Plan to carefully shared-nothing GooD LeVerage main scalability. and need not hurt memory databases. 1 2performance. 3 high availability and automatic onLine recovery are essential for eVeRyThinG. SO4 scalability. 5 Don’t try to Avoid Look for build ACiD Multi-noDe administrative consistency operations. simpliCiTy. 6 7yourself. 8 Pay attention Open source to noDe gives you more performance. ControL over your future. 9 10juNe 2011 | voL. 54 | No. 6 | Communications of The acm 73 contributedarticles cations are, of course, in between; for objects, each with a key and a payload, transactions and/or by relaxing ACID example, social-networking applica- providing little or no ability to inter- semantics, by, say, supporting only tions involve mostly simple operations pret the payload as a multi-attribute “eventual consistency” on multiple but also a balance of reads and writes. object, with no query mechanism for versions of data. Hence, the figure should be viewed as non-primary attributes; These systems are driven by a vari- a continuum in both directions, with Document stores. Includes Couch- ety of motivations. For some, it is dis- any given application somewhere in DB, MongoDB, SimpleDB, and Ter- satisfaction with the relational model between. rastore in which the data model con- or the “heaviness” of RDBMSs. For oth- The major commercial engines sists of objects with a variable number ers, it is the needs of large Web prop- and open source implementations of of attributes, some allowing nested erties with some of the most demand- the relational model are positioned objects. Collections of objects are ing SO problems around. Large Web as “one-size-fits-all” systems; that is, searched via constraints on multiple properties were frequently start-ups their implementations are claimed to attributes through a (non-SQL) query lucky enough to experience explosive be appropriate for all locations in the language or procedural mechanism; growth, the so-called hockey-stick ef- figure. Extensible record stores. Includes fect. They typically use an open source However, there is also some dis- BigTable, Cassandra, HBase, HyperT- DBMS, because it is free or already satisfaction with one-size-fits-all. Wit- able, and PNUTS providing variable- understood by the staff. A single-node ness, for example, the commercial width record sets that can be par- DBMS solution might be built for ver- success of the so-called column stores titioned vertically and horizontally sion 1, which quickly exhibits scalabil- in the data-warehouse market. With across multiple nodes. They are gener- ity problems. The conventional wis- these products, only those columns ally not accessed through SQL; and dom is then to “shard,” or partitioning needed in the query are retrieved from SQL DBMSs. Focus on SO applica- the application data over multiple disk, eliminating overhead for unused tion scalability, including MySQL Clus- nodes that share the load. A table can data. In addition, superior compres- ter, other MySQL derivatives, VoltDB, be partitioned this way; for example, sion and indexing is obtained, since NimbusDB, and Clustrix. They retain employee names can be partitioned only one kind of object exists on each SQL and ACID (Atomicity, Consisten- onto 26 nodes by putting all the “A”s on storage block, rather than several-to- cy, Isolation, and Durability)a transac- node 1 and so forth. It is now up to ap- many. Finally, main-memory band- tions, but their implementations are plication logic to direct each query and width is economized through a query often very different from those of GP- update to the correct node. However, executor that operates on compressed TRS systems. such sharding in application logic has data. For these reasons, column stores We do not claim this classification a number of severe drawbacks: are remarkably faster than row stores is precise or exhaustive, though it does ˲˲ If a cross-shard filter or join must on typical data-warehouse workloads, cover the major classes of newcomer. be performed, then it must be coded in and we expect them to dominate the Moreover, the market is changing rap- the application; data-warehouse market over time. idly, so the reader is advised to check ˲˲ If updates are required within a Our focus here is on simple-opera- other sources for the latest. For a more transaction to multiple shards, then tion (SO) applications, the lower por- thorough discussion and references the application is responsible for tion of the figure. Quite a few new, non- for these systems, see Cattell4 and the somehow guaranteeing data consis- GPTRS systems have been designed table here. tency across nodes; to provide scalability for this market. The NoSQL movement is driven ˲˲ Node failures are more common Loosely speaking, we classify them largely by the systems in the first three as the system scales. A difficult prob- into the following four categories: categories, restricting the traditional lem is how to maintain consistent Key-value stores. Includes Dynamo, notion of ACID transactions by allow- replicas, detect failures, fail over to Voldemort, Membase, Membrain, Sca- ing only single-record operations to be replicas, and replace failed nodes in a laris, and Riak. These systems have the running system; simplest data model: a collection of a ACID; see http://en.wikipedia.org/wiki/ACID ˲˲ Making schema changes without taking shards “offline” is a challenge; A characterization of DBMs applications. and ˲˲ Reprovisioning the hardware to Complex operations data warehouses add additional nodes or change the configuration is extremely tedious and, likewise, much more difficult if social networking the shards cannot be taken offline. Many developers of sharded Web simple operations applications experience severe pain OLTP because they must perform these func- tions in application-level logic; much write-focus read-focus of the NoSQL movement targets this pain point. However, with the large number of new systems and the wide 74 Communications of The acm | juNe 2011 | voL. 54 | No. 6 contributedarticles range of approaches they take, cus- but limits the scalability of a shared cel, Netezza, and Teradata. Moreover, tomersb might have difficulty under- disk configuration to a small number DB2 can run shared-nothing, as do standing and choosing a system to of nodes, typically fewer than 10. many NoSQL engines. Shared-nothing meet their application requirements. Oracle RAC is a popular example of engines normally perform automat- Here, we present 10 rules we advise a DBMS running shared disk, and it ic sharding (partitioning) of data to any customer to consider with an SO is difficult to find RAC configurations achieve parallelism. Shared-nothing application and in examining non-GP- with a double-digit number of nodes. systems scale only if data objects are TRS systems. They are a mix of DBMS Oracle recently announced Exadata partitioned across the system’s nodes requirements and guidelines concern- and Exadata 2 running shared disk in a manner that balances the load. If ing good SO application design. We at the top level of a two-tier hierarchy there is data skew or “hot spots,” then state them in the context of customers while running shared-nothing at the a shared-nothing system degrades in running software in their own environ- bottom level.
Recommended publications
  • Aware, Workstation-Based Distributed Database System
    THE ARCHITECTURE OF AN AUTONOMIC, RESOURCE- AWARE, WORKSTATION-BASED DISTRIBUTED DATABASE SYSTEM Angus Macdonald PhD Thesis February 2012 Abstract Distributed software systems that are designed to run over workstation machines within organisations are termed workstation-based. Workstation-based systems are characterised by dynamically changing sets of machines that are used primarily for other, user-centric tasks. They must be able to adapt to and utilize spare capacity when and where it is available, and ensure that the non-availability of an individual machine does not affect the availability of the system. This thesis focuses on the requirements and design of a workstation-based database system, which is motivated by an analysis of existing database architectures that are typically run over static, specially provisioned sets of machines. A typical clustered database system — one that is run over a number of specially provisioned machines — executes queries interactively, returning a synchronous response to applications, with its data made durable and resilient to the failure of machines. There are no existing workstation-based databases. Furthermore, other workstation-based systems do not attempt to achieve the requirements of interactivity and durability, because they are typically used to execute asynchronous batch processing jobs that tolerate data loss — results can be re-computed. These systems use external servers to store the final results of computations rather than workstation machines. This thesis describes the design and implementation of a workstation-based database system and investigates its viability by evaluating its performance against existing clustered database systems and testing its availability during machine failures. ACKNOWLEDGEMENTS I’d like to thank my supervisors, Professor Alan Dearle and Dr Graham Kirby, for the opportunities, the support, and the education that they have given me.
    [Show full text]
  • Providing High Availability for SAP Resources with Oracle Clusterware 11 Release 2
    Providing High Availability for SAP Resources with Oracle Clusterware 11 Release 2 An Oracle White Paper September 2011 Document Version 6.0 Providing High Availability for SAP Resources Overview of High Availability for SAP Resources.................................................... 3 New Functionality...................................................................................................... 3 SAP Support for High Availability ............................................................................ 4 Installation and Management ..................................................................................... 7 Overview Of Installation and Configuration.............................................................. 8 Functionality............................................................................................................. 13 Conclusion................................................................................................................ 16 Worked Example...................................................................................................... 17 Appendix 1 – Sample profile scripts........................................................................ 22 Appendix 2 – Troubleshooting and Log Files.......................................................... 24 Appendix 2 - SAPCTL Bill of Materials.................................................................. 25 Appendix 3 – CRS resources and types ................................................................... 25 Appendix 4 –
    [Show full text]
  • Using Oracle Goldengate 12C for Oracle Database
    An Oracle White Paper Updated September 2013 Using Oracle GoldenGate 12c for Oracle Database Using Oracle GoldenGate 12c for Oracle Database Executive Overview ........................................................................... 2 Introduction ....................................................................................... 3 Architecture Overview ....................................................................... 4 Oracle GoldenGate Capture .......................................................... 5 Oracle GoldenGate Trail Files ....................................................... 6 Oracle GoldenGate Delivery .......................................................... 7 Oracle GoldenGate Manager ......................................................... 9 Associated Products ...................................................................... 9 One Platform, Many Solutions ......................................................... 11 Zero Downtime Migrations and Upgrades .................................... 12 Query Offloading ......................................................................... 12 Disaster Recovery and Data Protection ....................................... 13 Active-Active Database Replication ............................................. 13 Operational Reporting and Real-Time Data Warehousing ........... 13 Data Distribution and Synchronization for OLTP Systems ........... 14 Oracle GoldenGate for Oracle Database ......................................... 15 Capture (Extract) ........................................................................
    [Show full text]
  • Providing High Availability for SAP Resources
    Providing High Availability for SAP Resources An Oracle White Paper April 2006 Providing High Availability for SAP Resources Overview of High Availability for SAP Resources .......................................3 SAP Support for High Availability..................................................................3 Installation and Management ...........................................................................5 Overview Of Installation and Configuration.................................................6 Functionality.......................................................................................................9 Usage ...............................................................................................................9 Conclusion........................................................................................................ 12 W orked Example............................................................................................. 13 Appendix 1 – Standard SAP Script Modifications..................................... 18 Script: startsap............................................................................................. 18 Script: stopsap............................................................................................. 21 Appendix 2 – Sample profile scripts - ENQUEUE Service (ASC)......... 23 Appendix 3 – Sample profile scripts - REPLICATION Service (ENR)25 Appendix 4 – Troubleshooting and Log Files............................................ 27 Appendix 5 - SAPCTL Bill of Materials.....................................................
    [Show full text]
  • Magic Quadrant for Data Warehouse Database Management Systems
    Magic Quadrant for Data Warehouse Database Management Systems Gartner RAS Core Research Note G00209623, Donald Feinberg, Mark A. Beyer, 28 January 2011, RV5A102012012 The data warehouse DBMS market is undergoing a transformation, including many acquisitions, as vendors adapt data warehouses to support the modern business intelligence and analytic workload requirements of users. This document compares 16 vendors to help you find the right one for your needs. WHAT YOU NEED TO KNOW Despite a troubled economic environment, the data warehouse database management system (DBMS) market returned to growth in 2010, with smaller vendors gaining in acceptance. As predicted in the previous iteration of this Magic Quadrant, 2010 brought major acquisitions, and several of the smaller vendors, such as Aster Data, Ingres and Vertica, took major strides by addressing specific market needs. The year also brought major market growth from data warehouse appliance offerings (see Note 1), with both EMC/Greenplum and Microsoft formally introducing appliances, and IBM, Oracle and Teradata broadening their appliance lines with new offerings. Although we believe that much of the growth was due to replacements of aging or performance-constrained data warehouse environments, we also think that the business value of using data warehouses for new applications such as performance management and advanced analytics has driven — and is driving — growth. All the vendors have stepped up their marketing efforts as the competition has grown. End-user organizations should ignore marketing claims about the applicability and performance capabilities of solutions. Instead, they should base their decisions on customer references and proofs of concept (POCs) to ensure that vendors’ claims will hold up in their environments.
    [Show full text]
  • Oracle Database Net Services Reference, 12C Release 1 (12.1) E17611-13
    Oracle®[1] Database Net Services Reference 12c Release 1 (12.1) E17611-13 December 2014 Oracle Database Net Services Reference, 12c Release 1 (12.1) E17611-13 Copyright © 2002, 2014, Oracle and/or its affiliates. All rights reserved. Primary Author: Caroline Johnston Contributor: The Oracle Database 12c documentation is dedicated to Mark Townsend, who was an inspiration to all who worked on this release. Contributors: Robert Achacoso, Abhishek Dadhich, Santanu Datta, Steve Ding, Feroz Khan, Peter Knaggs, Bhaskar Mathur, Scot McKinley, Ed Miner, Sweta Mogra, Srinivas Pamu, Kant Patel, Hector Pujol, Murali Purayathu, Karthik Rajan, Saravanakumar Ramasubramanian, Sudeep Reguna, Ching Tai, Norman Woo This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S.
    [Show full text]
  • Technical Comparison of Oracle Real Application Cluster 11G Vs. IBM
    Technical Comparison of Oracle Real Application Clusters 11 g vs. IBM DB2 v9.5 for Linux, Unix, and Windows An Oracle White Paper August 2008 Technical Comparison of Oracle Real Application Cluster 11 g vs. IBM DB2 v9.5 for Linux, Unix, and Windows Introduction ....................................................................................................... 3 Enterprise Grid Computing........................................................................ 3 Enabling Enterprise Grids........................................................................... 4 Enterprise Grid Deployment........................................................................... 5 Shared-Nothing Architecture...................................................................... 5 DB2 Shared-Nothing............................................................................... 5 Shared Disk Architecture............................................................................. 6 Best Choice For Deployment ..................................................................... 7 IBM DB2 Partitioning.................................................................................. 8 Nodegroups............................................................................................... 8 IBM DB2 Design Advisor...................................................................... 9 Scalability and Performance............................................................................. 9 Adding nodes..............................................................................................
    [Show full text]
  • Oracle Database 10G Vs. IBM DB2 UDB Technical Overview
    Oracle Database10g vs IBM DB2 UDB 8.1 Technical Overview An Oracle Competitive White Paper March 2004 Technical Overview of Oracle Database 10g with IBM DB2 UDB V8.1 Introduction....................................................................................................... 3 Grid Computing................................................................................................ 3 Manageability..................................................................................................... 7 Self-Managing Database.............................................................................. 8 Application/SQL Tuning...................................................................... 10 Managing the Enterprise............................................................................ 11 High Availability.............................................................................................. 12 Bounding Database Crash Recovery....................................................... 13 Human Error Recovery............................................................................. 14 Online Maintenance................................................................................... 15 Data Centre Disasters ................................................................................ 15 Data W arehousing and Business Intelligence............................................. 16 Partitioning.................................................................................................. 18 Data Loading and Archiving....................................................................
    [Show full text]
  • Progress Datadirect for JDBC for Oracle User's Guide
    Progress DataDirect for JDBC for Oracle User©s Guide Release 6.0.0 Copyright © 2020 Progress Software Corporation and/or its subsidiaries or affiliates. All rights reserved. These materials and all Progress® software products are copyrighted and all rights are reserved by Progress Software Corporation. The information in these materials is subject to change without notice, and Progress Software Corporation assumes no responsibility for any errors that may appear therein. The references in these materials to specific platforms supported are subject to change. Corticon, DataDirect (and design), DataDirect Cloud, DataDirect Connect, DataDirect Connect64, DataDirect XML Converters, DataDirect XQuery, DataRPM, Defrag This, Deliver More Than Expected, Icenium, Ipswitch, iMacros, Kendo UI, Kinvey, MessageWay, MOVEit, NativeChat, NativeScript, OpenEdge, Powered by Progress, Progress, Progress Software Developers Network, SequeLink, Sitefinity (and Design), Sitefinity, SpeedScript, Stylus Studio, TeamPulse, Telerik, Telerik (and Design), Test Studio, WebSpeed, WhatsConfigured, WhatsConnected, WhatsUp, and WS_FTP are registered trademarks of Progress Software Corporation or one of its affiliates or subsidiaries in the U.S. and/or other countries. Analytics360, AppServer, BusinessEdge, DataDirect Autonomous REST Connector, DataDirect Spy, SupportLink, DevCraft, Fiddler, iMail, JustAssembly, JustDecompile, JustMock, NativeScript Sidekick, OpenAccess, ProDataSet, Progress Results, Progress Software, ProVision, PSE Pro, SmartBrowser, SmartComponent,
    [Show full text]
  • Implementation of Oracle Real Application Cluster
    REVIEW OF COMPUTER ENGINEERING A publication of IIETA STUDIES ISSN: 2369-0755 (Print), 2369-0763 (Online) Vol. 3, No. 4, December 2016, pp. 81-85 DOI: 10.18280/rces.030401 Licensed under CC BY-NC 4.0 http://www.iieta.org/Journals/RCES Implementation of Oracle Real Application Cluster Hayat Z.1*, Soomro T.R.2 *1 Ankabut IT, Khalifa University of Science & Technology, UAE 2 Department of Computer Science, SZABIST Dubai Campus, UAE Email: [email protected] ABSTRACT Oracle Real Application Cluster (RAC) is Oracle technology component that can be active on multiple servers to serve as Oracle database server unique. Each server has separate own database instance and these all act as instance of single database for the external network access on the unique virtual IP address. The architecture of the RAC is provides fault tolerance and a great power of treatment. Oracle RAC provides several features in any database operation of the enterprise level for example the scalability, availability, load balancing and failover. The purpose of this study is to serve as a quick reference guide to the installation of management and operation of the Oracle Cluster-ware Real Application Cluster (RAC) database. This feature is initially targeted for the technical staff mainly for DBA Admins of the system. Keywords: RAC, Shared Storage, SAN, Cluster-ware, Oracle ASM, Oracle Kernel. 1. INTRODUCTION systems in the cloud and enterprise software products. Oracle is the world's leading supplier of software for information Cluster database instances are common challenging for management, but is best known for its refined products for each vendor application is presenting its own architecture relational databases.
    [Show full text]
  • Business Continuity with IBM Hyperswap, Oracle Real Application Cluster (RAC), and Vmware Site Recovery Manager a Solution Overview
    IBM Systems Technical White Paper November 2017 Business continuity with IBM HyperSwap, Oracle Real Application Cluster (RAC), and VMware Site Recovery Manager A solution overview Overview In today’s world, coming up with a business continuity plan (BCP) is imperative to the sustainability of your business. Without a well-thought-out Challenge plan in place, it is highly unlikely that your company will be able to survive and recover from disasters. Do you have a business continuity plan to mitigate threats and risks for The BCP involves creation of a strategy through the recognition of threats and your business in the event of risks facing a company, with an eye to ensure that personnel and assets are potential disaster? Do you have an protected and are able to function in the event of a disaster. orchestrated way to quickly get your Business continuity is a proactive plan to avoid and mitigate risks associated business running post disaster? with a disruption of operations. It details the steps to be taken before, during, and after an event to maintain the financial viability of an organization. Solution People often mistake disaster recovery as business continuity plan. It is in fact Consider a multifold solution that a reactive plan for responding after an event. brings together the best of the However important the BCP is, there are several major roadblocks (such as industry technologies together. IBM business priority, prohibitive costs, high complexity, and willingness of the HyperSwap provides an active/active people involved) for successful implementation. replication of volumes across metro distances, Oracle Real Application Having a BCP enhances an organization's image with employees, shareholders, and customers by demonstrating a proactive attitude.
    [Show full text]
  • Oracle Real Application Clusters 11G
    Oracle Real Application Clusters 11g An Oracle Technical White Paper April 2007 Oracle Real Application Clusters 11g Introduction.......................................................................................................3 W hat is Oracle Real Application Clusters 11g?............................................3 Real Application Clusters Architecture.....................................................4 Oracle Clusterware...................................................................................5 Hardware Architecture............................................................................5 File Systems and Volume Management................................................6 Virtual Internet Protocol Address (VIP)..............................................6 Cluster Verification Utility......................................................................6 RAC on Extended Distance Clusters....................................................7 Benefits of Oracle Real Application Clusters................................................7 High Availability...........................................................................................7 Scalability........................................................................................................8 Managing Your Oracle Real Application Clusters Database......................9 Enterprise Manager......................................................................................9 Rolling Patch Application..........................................................................11
    [Show full text]