Database Reliability Engineering DESIGNING and OPERATING RESILIENT DATABASE SYSTEMS

Total Page:16

File Type:pdf, Size:1020Kb

Database Reliability Engineering DESIGNING and OPERATING RESILIENT DATABASE SYSTEMS Database Reliability Engineering DESIGNING AND OPERATING RESILIENT DATABASE SYSTEMS Laine Campbell & Charity Majors Database Reliability Engineering Designing and Operating Resilient Database Systems Laine Campbell and Charity Majors Beijing Boston Farnham Sebastopol Tokyo Database Reliability Engineering by Laine Campbell and Charity Majors Copyright © 2018 Laine Campbell and Charity Majors. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/insti‐ tutional sales department: 800-998-9938 or [email protected]. Editors: Courtney Allen and Virginia Wilson Indexer: Ellen Troutman-Zaig Production Editor: Melanie Yarbrough Interior Designer: David Futato Copyeditor: Bob Russell, Octal Publishing, Inc. Cover Designer: Karen Montgomery Proofreader: Matthew Burgoyne Illustrator: Rebecca Demarest November 2017: First Edition Revision History for the First Edition 2017-10-26: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781491925942 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Database Reliability Engineering, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights. 978-1-491-92594-2 [LSI] Table of Contents Foreword. xi Preface. xiii 1. Introducing Database Reliability Engineering. 1 Guiding Principles of the DBRE 2 Protect the Data 2 Self-Service for Scale 3 Elimination of Toil 4 Databases Are Not Special Snowflakes 5 Eliminate the Barriers Between Software and Operations 5 Operations Core Overview 6 Hierarchy of Needs 7 Survival and Safety 7 Love and Belonging 8 Esteem 9 Self-actualization 10 Wrapping Up 11 2. Service-Level Management. 13 Why Do I Need Service-Level Objectives? 13 Service-Level Indicators 15 Latency 15 Availability 16 Throughput 16 Durability 16 Cost or Efficiency 16 Defining Service Objectives 17 iii Latency Indicators 17 Availability Indicators 20 Throughput Indicators 23 Monitoring and Reporting on SLOs 25 Monitoring Availability 25 Monitoring Latency 28 Monitoring Throughput 28 Monitoring Cost and Efficiency 28 Wrapping Up 29 3. Risk Management. 31 Risk Considerations 32 Unknown Factors and Complexity 32 Availability of Resources 32 Human Factors 33 Group Factors 34 What Do We Do? 35 What Not to Do 35 A Working Process: Bootstrapping 36 Service Risk Evaluation 37 Architectural Inventory 39 Prioritization 40 Control and Decision Making 42 Ongoing Iterations 45 Wrapping Up 47 4. Operational Visibility. 49 The New Rules of Operational Visibility 51 Treat OpViz Systems Like BI Systems 52 Distributed Ephemeral Environments Trending to the Norm 52 Store at High Resolutions for Key Metrics 54 Keep Your Architecture Simple 55 An OpViz Framework 56 Data In 57 Telemetry/Metrics 59 Events 60 Logs 60 Data Out 60 Bootstrapping Your Monitoring 61 Is the Data Safe? 63 Is the Service Up? 64 Are the Consumers in Pain? 65 iv | Table of Contents Instrumenting the Application 66 Distributed Tracing 66 Events and Logs 68 Instrumenting the Server or Instance 68 Events and Logs 70 Instrumenting the Datastore 71 Datastore Connection Layer 71 Utilization 71 Saturation 72 Errors 73 Internal Database Visibility 74 Throughput and Latency Metrics 74 Commits, Redo, and Journaling 75 Replication State 75 Memory Structures 76 Locking and Concurrency 77 Database Objects 78 Database Queries 79 Database Asserts and Events 79 Wrapping Up 80 5. Infrastructure Engineering. 81 Hosts 81 Physical Servers 81 Operating a System and Kernel 82 Storage Area Networks 92 Benefits of Physical Servers 92 Cons of Physical Servers 92 Virtualization 93 Hypervisor 93 Concurrency 94 Storage 94 Use Cases 94 Containers 95 Database as a Service 95 Challenges of DBaaS 96 The DBRE and the DBaaS 96 Wrapping Up 97 6. Infrastructure Management. 99 Version Control 100 Configuration Definition 101 Table of Contents | v Building from Configuration 103 Maintaining Configuration 104 Enforcement of Configuration Definitions 105 Infrastructure Definition and Orchestration 105 Monolithic Infrastructure Definitions 106 Separating Vertically 107 Separated Tiers (Horizontal Definitions) 108 Acceptance Testing and Compliance 109 Service Catalog 109 Bringing It All Together 110 Development Environments 111 Wrapping Up 112 7. Backup and Recovery. 113 Core Concepts 114 Physical versus Logical 114 Online versus Offline 114 Full, Incremental, and Differential 115 Considerations for Recovery 115 Recovery Scenarios 116 Planned Recovery Scenarios 116 Unplanned Scenarios 118 Scenario scope 121 Scenario Impact 121 Anatomy of a Recovery Strategy 122 Building Block 1: Detection 122 Building Block 2: Tiered Storage 124 Building Block 3: A Varied Toolbox 125 Building Block 4: Testing 127 A Recovery Strategy Defined 128 Online, Fast Storage with Full and Incremental Backups 128 Online, Slow Storage with Full and Incremental Backups 129 Offline Storage 130 Object Storage 131 Wrapping Up 132 8. Release Management. 133 Education and Collaboration 133 Become a Funnel 134 Foster Conversations 134 Domain-Specific Knowledge 135 Collaboration 137 vi | Table of Contents Integration 138 Prerequisites 139 Testing 141 Test-Friendly Development Practices 142 Post-Commit Testing 143 Full Dataset Testing 144 Downstream Tests 145 Operational Tests 145 Deployment 146 Migrations and Versioning 146 Impact Analysis 147 Migration Patterns 148 Manual or Automated 151 Wrapping Up 151 9. Security. 153 The Purpose of Security 153 Protecting Data from Theft 154 Protecting from Purposeful Damage 154 Protecting from Accidental Damage 154 Protecting Data from Exposure 155 Compliance and Auditing Standards 155 Database Security as a Function 155 Education and Collaboration 155 Self-Service 156 Integration and Testing 157 Operational Visibility 158 Vulnerabilities and Exploits 160 STRIDE 160 DREAD 161 Basic Precautions 162 Denial of Service 163 SQL Injection 166 Network and Authentication Protocols 168 Encryption of Data 168 Financial Data 169 Personal Health Data 169 Private Individual Data 169 Military or Government Data 170 Confidential/Sensitive Business Data 170 Data in Transit 170 Data in the Database 174 Table of Contents | vii Data in the Filesystem 177 Wrapping Up 179 10. Data Storage, Indexing, and Replication. 181 Data Structure Storage 181 Database Row Storage 182 Sorted-String Tables and Log-Structured Merge Trees 185 Indexing 188 Logs and Databases 189 Data Replication 189 Single-Leader 190 Multi-Leader Replication 203 Wrapping Up 209 11. Datastore Field Guide. 211 Conceptual Attributes of a Datastore 212 The Data Model 212 Transactions 215 BASE 221 Internal Attributes of a Datastore 222 Storage 222 The Ubiquitous CAP Theorem Section 223 Consistency Latency Trade-offs 225 Availability 226 Wrapping Up 227 12. A Data Architecture Sampler. 229 Architectural Components 229 Frontend Datastores 229 Data Access Layer 230 Database Proxies 231 Event and Message Systems 233 Caches and Memory Stores 235 Data Architectures 238 Lambda and Kappa 238 Event Sourcing 241 CQRS 242 Wrapping Up 243 13. Making the Case For DBRE. 245 A Culture of Database Reliability 246 Breaking-Down Barriers 246 viii | Table of Contents Data-Driven Decision Making 251 Data Integrity and Recoverability 252 Wrapping Up 252 Index. 253 Table of Contents | ix Foreword Collectively, we are witnessing a time of unprecedented change and disruption in the database industry. Technology adoption life cycles have accelerated to the point where all of our heads are spinning—with both challenge and opportunity. Architectures are evolving so quickly that the tasks we became accustomed to per‐ forming are no longer required, and the related skills we invested in so heavily are barely relevant. Emerging innovations and pressures in security, Infrastructure as Code, and cloud capabilities (such as Infrastructure and Database as a Service), have allowed us—and required us, actually—to rethink how we build. By necessity, we have moved away from our traditional, administrative workloads to a process emphasizing architecture, automation, software engineering, continuous integration and delivery, and systems instrumentation skills, above all. Meanwhile, the value and importance of the data we’ve been protecting and caring for all along has increased by an order of magnitude or more, and we see no chance of a future in which it doesn’t continue to increase in value. We find ourselves in the fortunate posi‐ tion of being able to make a meaningful, important difference in the world. Without a doubt, many of us who once considered ourselves outstanding database administrators are at risk of being overwhelmed or even
Recommended publications
  • An Architect's Guide to Site Reliability Engineering Nathaniel T
    An Architect's Guide to Site Reliability Engineering Nathaniel T. Schutta @ntschutta ntschutta.io https://content.pivotal.io/ ebooks/thinking-architecturally Sofware development practices evolve. Feature not a bug. It is the agile thing to do! We’ve gone from devs and ops separated by a large wall… To DevOps all the things. We’ve gone from monoliths to service oriented to microserivces. And it isn’t all puppies and rainbows. Shoot. A new role is emerging - the site reliability engineer. Why? What does that mean to our teams? What principles and practices should we adopt? How do we work together? What is SRE? Important to understand the history. Not a new concept but a catchy name! Arguably goes back to the Apollo program. Margaret Hamilton. Crashed a simulator by inadvertently running a prelaunch program. That wipes out the navigation data. Recalculating… Hamilton wanted to add error-checking code to the Apollo system that would prevent this from messing up the systems. But that seemed excessive to her higher- ups. “Everyone said, ‘That would never happen,’” Hamilton remembers. But it did. Right around Christmas 1968. — ROBERT MCMILLAN https://www.wired.com/2015/10/margaret-hamilton-nasa-apollo/ Luckily she did manage to update the documentation. Allowed them to recover the data. Doubt that would have turned into a Hollywood blockbuster… Hope is not a strategy. But it is what rebellions are built on. Failures, uh find a way. Traditionally, systems were run by sys admins. AKA Prod Ops. Or something similar. And that worked OK. For a while. But look around your world today.
    [Show full text]
  • Chapter 6 Structural Reliability
    MIL-HDBK-17-3E, Working Draft CHAPTER 6 STRUCTURAL RELIABILITY Page 6.1 INTRODUCTION ....................................................................................................................... 2 6.2 FACTORS AFFECTING STRUCTURAL RELIABILITY............................................................. 2 6.2.1 Static strength.................................................................................................................... 2 6.2.2 Environmental effects ........................................................................................................ 3 6.2.3 Fatigue............................................................................................................................... 3 6.2.4 Damage tolerance ............................................................................................................. 4 6.3 RELIABILITY ENGINEERING ................................................................................................... 4 6.4 RELIABILITY DESIGN CONSIDERATIONS ............................................................................. 5 6.5 RELIABILITY ASSESSMENT AND DESIGN............................................................................. 6 6.5.1 Background........................................................................................................................ 6 6.5.2 Deterministic vs. Probabilistic Design Approach ............................................................... 7 6.5.3 Probabilistic Design Methodology.....................................................................................
    [Show full text]
  • An Introduction to Psychometric Theory with Applications in R
    What is psychometrics? What is R? Where did it come from, why use it? Basic statistics and graphics TOD An introduction to Psychometric Theory with applications in R William Revelle Department of Psychology Northwestern University Evanston, Illinois USA February, 2013 1 / 71 What is psychometrics? What is R? Where did it come from, why use it? Basic statistics and graphics TOD Overview 1 Overview Psychometrics and R What is Psychometrics What is R 2 Part I: an introduction to R What is R A brief example Basic steps and graphics 3 Day 1: Theory of Data, Issues in Scaling 4 Day 2: More than you ever wanted to know about correlation 5 Day 3: Dimension reduction through factor analysis, principal components analyze and cluster analysis 6 Day 4: Classical Test Theory and Item Response Theory 7 Day 5: Structural Equation Modeling and applied scale construction 2 / 71 What is psychometrics? What is R? Where did it come from, why use it? Basic statistics and graphics TOD Outline of Day 1/part 1 1 What is psychometrics? Conceptual overview Theory: the organization of Observed and Latent variables A latent variable approach to measurement Data and scaling Structural Equation Models 2 What is R? Where did it come from, why use it? Installing R on your computer and adding packages Installing and using packages Implementations of R Basic R capabilities: Calculation, Statistical tables, Graphics Data sets 3 Basic statistics and graphics 4 steps: read, explore, test, graph Basic descriptive and inferential statistics 4 TOD 3 / 71 What is psychometrics? What is R? Where did it come from, why use it? Basic statistics and graphics TOD What is psychometrics? In physical science a first essential step in the direction of learning any subject is to find principles of numerical reckoning and methods for practicably measuring some quality connected with it.
    [Show full text]
  • Form Follows Function Model-Driven Engineering for Clinical Trials
    Form Follows Function Model-Driven Engineering for Clinical Trials Jim Davies1, Jeremy Gibbons1, Radu Calinescu2, Charles Crichton1, Steve Harris1, and Andrew Tsui1 1 Department of Computer Science, University of Oxford Wolfson Building, Parks Road, Oxford OX1 3QD, UK http://www.cs.ox.ac.uk/firstname.lastname/ 2 Computer Science Research Group, Aston University Aston Triangle, Birmingham B4 7ET, UK http://www-users.aston.ac.uk/~calinerc/ Abstract. We argue that, for certain constrained domains, elaborate model transformation technologies|implemented from scratch in general- purpose programming languages|are unnecessary for model-driven en- gineering; instead, lightweight configuration of commercial off-the-shelf productivity tools suffices. In particular, in the CancerGrid project, we have been developing model-driven techniques for the generation of soft- ware tools to support clinical trials. A domain metamodel captures the community's best practice in trial design. A scientist authors a trial pro- tocol, modelling their trial by instantiating the metamodel; customized software artifacts to support trial execution are generated automati- cally from the scientist's model. The metamodel is expressed as an XML Schema, in such a way that it can be instantiated by completing a form to generate a conformant XML document. The same process works at a second level for trial execution: among the artifacts generated from the protocol are models of the data to be collected, and the clinician conduct- ing the trial instantiates such models in reporting observations|again by completing a form to create a conformant XML document, represent- ing the data gathered during that observation. Simple standard form management tools are all that is needed.
    [Show full text]
  • Training-Sre.Pdf
    C om p lim e nt s of Training Site Reliability Engineers What Your Organization Needs to Create a Learning Program Jennifer Petoff, JC van Winkel & Preston Yoshioka with Jessie Yang, Jesus Climent Collado & Myk Taylor REPORT Want to know more about SRE? To learn more, visit google.com/sre Training Site Reliability Engineers What Your Organization Needs to Create a Learning Program Jennifer Petoff, JC van Winkel, and Preston Yoshioka, with Jessie Yang, Jesus Climent Collado, and Myk Taylor Beijing Boston Farnham Sebastopol Tokyo Training Site Reliability Engineers by Jennifer Petoff, JC van Winkel, and Preston Yoshioka, with Jessie Yang, Jesus Climent Collado, and Myk Taylor Copyright © 2020 O’Reilly Media. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more infor‐ mation, contact our corporate/institutional sales department: 800-998-9938 or [email protected]. Acquistions Editor: John Devins Proofreader: Charles Roumeliotis Development Editor: Virginia Wilson Interior Designer: David Futato Production Editor: Beth Kelly Cover Designer: Karen Montgomery Copyeditor: Octal Publishing, Inc. Illustrator: Rebecca Demarest November 2019: First Edition Revision History for the First Edition 2019-11-15: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781492076001 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Training Site Reli‐ ability Engineers, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
    [Show full text]
  • Cluster Analysis for Gene Expression Data: a Survey
    Cluster Analysis for Gene Expression Data: A Survey Daxin Jiang Chun Tang Aidong Zhang Department of Computer Science and Engineering State University of New York at Buffalo Email: djiang3, chuntang, azhang @cse.buffalo.edu Abstract DNA microarray technology has now made it possible to simultaneously monitor the expres- sion levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremen- dous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than the points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expres- sion data, and also new algorithms have recently been proposed specifically aiming at gene ex- pression data. These clustering algorithms have been proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data.
    [Show full text]
  • Reliability Engineering: Today and Beyond
    Reliability Engineering: Today and Beyond Keynote Talk at the 6th Annual Conference of the Institute for Quality and Reliability Tsinghua University People's Republic of China by Professor Mohammad Modarres Director, Center for Risk and Reliability Department of Mechanical Engineering Outline – A New Era in Reliability Engineering – Reliability Engineering Timeline and Research Frontiers – Prognostics and Health Management – Physics of Failure – Data-driven Approaches in PHM – Hybrid Methods – Conclusions New Era in Reliability Sciences and Engineering • Started as an afterthought analysis – In enduing years dismissed as a legitimate field of science and engineering – Worked with small data • Three advances transformed reliability into a legitimate science: – 1. Availability of inexpensive sensors and information systems – 2. Ability to better described physics of damage, degradation, and failure time using empirical and theoretical sciences – 3. Access to big data and PHM techniques for diagnosing faults and incipient failures • Today we can predict abnormalities, offer just-in-time remedies to avert failures, and making systems robust and resilient to failures Seventy Years of Reliability Engineering – Reliability Engineering Initiatives in 1950’s • Weakest link • Exponential life model • Reliability Block Diagrams (RBDs) – Beyond Exp. Dist. & Birth of System Reliability in 1960’s • Birth of Physics of Failure (POF) • Uses of more proper distributions (Weibull, etc.) • Reliability growth • Life testing • Failure Mode and Effect Analysis
    [Show full text]
  • Reliability: Software Software Vs
    Reliability Theory SENG 521 Re lia bility th eory d evel oped apart f rom th e mainstream of probability and statistics, and Software Reliability & was usedid primar ily as a tool to h hlelp Software Quality nineteenth century maritime and life iifiblinsurance companies compute profitable rates Chapter 5: Overview of Software to charge their customers. Even today, the Reliability Engineering terms “failure rate” and “hazard rate” are often used interchangeably. Department of Electrical & Computer Engineering, University of Calgary Probability of survival of merchandize after B.H. Far ([email protected]) 1 http://www. enel.ucalgary . ca/People/far/Lectures/SENG521/ ooene MTTF is R e 0.37 From Engineering Statistics Handbook [email protected] 1 [email protected] 2 Reliability: Natural System Reliability: Hardware Natural system Hardware life life cycle. cycle. Aging effect: Useful life span Life span of a of a hardware natural system is system is limited limited by the by the age (wear maximum out) of the system. reproduction rate of the cells. Figure from Pressman’s book Figure from Pressman’s book [email protected] 3 [email protected] 4 Reliability: Software Software vs. Hardware So ftware life cyc le. Software reliability doesn’t decrease with Software systems time, i.e., software doesn’t wear out. are changed (updated) many Hardware faults are mostly physical faults, times during their e. g., fatigue. life cycle. Each update adds to Software faults are mostly design faults the structural which are harder to measure, model, detect deterioration of the and correct. software system. Figure from Pressman’s book [email protected] 5 [email protected] 6 Software vs.
    [Show full text]
  • SUSE BU Presentation Template 2014
    TUT7317 A Practical Deep Dive for Running High-End, Enterprise Applications on SUSE Linux Holger Zecha Senior Architect REALTECH AG [email protected] Table of Content • About REALTECH • About this session • Design principles • Different layers which need to be considered 2 Table of Content • About REALTECH • About this session • Design principles • Different layers which need to be considered 3 About REALTECH 1/2 REALTECH Software REALTECH Consulting . Business Service Management . SAP Mobile . Service Operations Management . Cloud Computing . Configuration Management and CMDB . SAP HANA . IT Infrastructure Management . SAP Solution Manager . Change Management for SAP . IT Technology . Virtualization . IT Infrastructure 4 About REALTECH 2/2 5 Our Customers Manufacturing IT services Healthcare Media Utilities Consumer Automotive Logistics products Finance Retail REALTECH Consulting GmbH 6 Table of Content • About REALTECH • About this session • Design principles • Different layers which need to be considered 7 The Inspiration for this Session • Several performance workshops at customers • Performance escalations at customer who migrated from UNIX (AIX, Solaris, HP-UX) to Linux • Presenting the experiences made at these customers in this session • Preventing the audience from performance degradation caused from: – Significant design mistakes – Wrong architecture assumptions – Having no architecture at all 8 Performance Optimization The False Estimation Upgrading server with CPUs that are 12.5% faster does not improve application
    [Show full text]
  • Biostatistics (BIOSTAT) 1
    Biostatistics (BIOSTAT) 1 This course covers practical aspects of conducting a population- BIOSTATISTICS (BIOSTAT) based research study. Concepts include determining a study budget, setting a timeline, identifying study team members, setting a strategy BIOSTAT 301-0 Introduction to Epidemiology (1 Unit) for recruitment and retention, developing a data collection protocol This course introduces epidemiology and its uses for population health and monitoring data collection to ensure quality control and quality research. Concepts include measures of disease occurrence, common assurance. Students will demonstrate these skills by engaging in a sources and types of data, important study designs, sources of error in quarter-long group project to draft a Manual of Operations for a new epidemiologic studies and epidemiologic methods. "mock" population study. BIOSTAT 302-0 Introduction to Biostatistics (1 Unit) BIOSTAT 429-0 Systematic Review and Meta-Analysis in the Medical This course introduces principles of biostatistics and applications Sciences (1 Unit) of statistical methods in health and medical research. Concepts This course covers statistical methods for meta-analysis. Concepts include descriptive statistics, basic probability, probability distributions, include fixed-effects and random-effects models, measures of estimation, hypothesis testing, correlation and simple linear regression. heterogeneity, prediction intervals, meta regression, power assessment, BIOSTAT 303-0 Probability (1 Unit) subgroup analysis and assessment of publication
    [Show full text]
  • On the Performance Variation in Modern Storage Stacks
    On the Performance Variation in Modern Storage Stacks Zhen Cao1, Vasily Tarasov2, Hari Prasath Raman1, Dean Hildebrand2, and Erez Zadok1 1Stony Brook University and 2IBM Research—Almaden Appears in the proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST’17) Abstract tions on different machines have to compete for heavily shared resources, such as network switches [9]. Ensuring stable performance for storage stacks is im- In this paper we focus on characterizing and analyz- portant, especially with the growth in popularity of ing performance variations arising from benchmarking hosted services where customers expect QoS guaran- a typical modern storage stack that consists of a file tees. The same requirement arises from benchmarking system, a block layer, and storage hardware. Storage settings as well. One would expect that repeated, care- stacks have been proven to be a critical contributor to fully controlled experiments might yield nearly identi- performance variation [18, 33, 40]. Furthermore, among cal performance results—but we found otherwise. We all system components, the storage stack is the corner- therefore undertook a study to characterize the amount stone of data-intensive applications, which become in- of variability in benchmarking modern storage stacks. In creasingly more important in the big data era [8, 21]. this paper we report on the techniques used and the re- Although our main focus here is reporting and analyz- sults of this study. We conducted many experiments us- ing the variations in benchmarking processes, we believe ing several popular workloads, file systems, and storage that our observations pave the way for understanding sta- devices—and varied many parameters across the entire bility issues in production systems.
    [Show full text]
  • Sql Server to Aurora Postgresql Migration Playbook
    Microsoft SQL Server To Amazon Aurora with Post- greSQL Compatibility Migration Playbook 1.0 Preliminary September 2018 © 2018 Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only. It represents AWS’s current product offer- ings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS’s products or services, each of which is provided “as is” without war- ranty of any kind, whether express or implied. This document does not create any warranties, rep- resentations, contractual commitments, conditions or assurances from AWS, its affiliates, suppliers or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agree- ments, and this document is not part of, nor does it modify, any agreement between AWS and its cus- tomers. - 2 - Table of Contents Introduction 9 Tables of Feature Compatibility 12 AWS Schema and Data Migration Tools 20 AWS Schema Conversion Tool (SCT) 21 Overview 21 Migrating a Database 21 SCT Action Code Index 31 Creating Tables 32 Data Types 32 Collations 33 PIVOT and UNPIVOT 33 TOP and FETCH 34 Cursors 34 Flow Control 35 Transaction Isolation 35 Stored Procedures 36 Triggers 36 MERGE 37 Query hints and plan guides 37 Full Text Search 38 Indexes 38 Partitioning 39 Backup 40 SQL Server Mail 40 SQL Server Agent 41 Service Broker 41 XML 42 Constraints
    [Show full text]