Database Schema Migration in Highly Available Services
Total Page:16
File Type:pdf, Size:1020Kb
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2021 Database schema migration in highly available services JAVIER BENITEZ FERNANDEZ KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE Database schema migration in highly available services BENITEZ FERNANDEZ, JAVIER ICT Innovation Master’s Program Date: January 20, 2021 Supervisor: Ying Liu Examiner: Benoit Baudry School of Electrical Engineering and Computer Science Host company: Eficode Oy Swedish title: Databasschema migreringar i mycket tillgängliga tjänster iii Abstract Since the rise of DevOps, companies have been trying to provide rapid and agile cycles on the development of new products, reducing time and costs to go from definition to production. However, whenever a MySQL system with a zero-downtime requirement is found, the database deployment cycle is still manual, leaving it less evolved than other parts of the development cycle, es- pecially when applying schema updates without affecting the availability of a MySQL system (Database Online Schema migrations) on Continuous Inte- gration and Continuous Delivery pipelines. As a consequence, it delays the process. There is a strong belief that the different affected technologies are ready to automatize Online Schema migrations in production. In this thesis, we verify the viability and study the reliability of automatizing Online Schema migrations with a zero-downtime requirement. We achieved this by testing the most common errors in a setup without error detection mechanisms and then testing our proposed design. We conclude that is viable and reliable to apply these techniques and strategies for providing Continuous Delivery. Our design prevents most common errors when updating SQL schemas online with some inherent implications and flaws in MySQL systems. iv Sammanfattning Sedan DevOps uppkomst har företag försökt ge snabba och smidiga cykler för utveckling av nya produkter, vilket reducerar tid och kostnader för att gå från definition till produktion. Men när ett noll-stilleståndskrav hittas i ett MySQL- system, är databasdistributionscykeln fortfarande rudimentär och manuell, vil- ket gör att den har mindre utvecklats än andra delar av utvecklingscykeln, sär- skilt migrationsdatabas online-schema för kontinuerlig integration och kon- tinuerliga leveransrörledningar . Som en konsekvens försenar det processen. Även om det inte finns någon tidigare litteratur om det, finns det en stark över- tygelse om att de olika berörda teknikerna är redo att automatisera online- schema-migrationer i produktionen. I den här avhandlingen verifierar vi livs- kraften och studerar tillförlitligheten för att automatisera Online Schema mi- greringar med ett noll-stilleståndskrav. Vi gjorde det genom att testa de van- ligaste felen i en normal installation och när vi använder vår föreslagna de- sign. Vi drar slutsatsen att det är genomförbart och pålitligt att tillämpa dessa tekniker och strategier för att tillhandahålla kontinuerlig leverans. Vår design förhindrar vanliga fel vid uppdatering av SQL-scheman online med några in- neboende implikationer och brister. Contents 1 Introduction 1 1.1 Problem Statement . 3 1.2 Research Question . 4 1.3 Structure of the Document . 5 1.4 Thesis Contribution . 5 1.5 Ethics and Sustainability . 6 2 Background 7 2.1 Devops, Continuous Integration and Continuous Delivery . 7 2.2 MySQL Systems . 9 2.3 Online Schema Migrations . 12 2.4 Summary . 13 3 Related Work 14 3.1 Methods . 14 3.2 Related Works . 15 3.3 Summaries and Discussions . 18 4 Methodology 20 5 Solution Design 22 5.1 Environment Design . 22 5.2 Requirements . 23 5.3 Expected Outputs . 26 5.4 Pipeline Structure . 26 5.5 Setup . 28 5.6 Implementation . 31 5.7 Summaries and Discussions . 36 v vi CONTENTS 6 Experiment 37 6.1 Research Questions . 37 6.2 Metrics . 38 6.3 Case Study and Scenarios . 39 6.4 RQ1 . 43 6.5 RQ2 . 45 6.6 RQ3 . 47 7 Discussion 49 7.1 Results of the Conducted Experiment . 49 7.2 General Considerations . 52 7.3 Impact of the Pipeline . 52 7.4 Threats to Validity . 53 8 Conclusions and Future Work 54 8.1 Conclusions . 54 8.2 Future Work . 55 Bibliography 56 List of Figures 1.1 DBA migration flow . 3 2.1 DevOps Workflow [23] . 8 2.2 MySQL cluster master-slaves . 10 2.3 MySQL cluster master-master . 11 2.4 AWS Aurora clustering master-slaves . 12 5.1 Pipeline stages . 26 5.2 Pipeline stages . 27 5.3 Continuous delivery flows . 28 5.4 Infrastructure architecture . 32 6.1 New column runtime error . 43 6.2 New column runtime latency . 44 6.3 Alter column backward compatibility error . 44 6.4 Alter column backward compatibility traffic.......... 45 6.5 Alter column runtime error . 46 6.6 Alter column runtime traffic .................. 46 6.7 Alter column runtime database CPU . 47 6.8 Drop column database CPU . 48 6.9 Drop column latency . 48 vii Chapter 1 Introduction Since the rise of DevOps, many companies have been trying to improve the development cycles and achieve more agile cycles [1][2]. The state of the art has been focusing on improving the agility of software development in gen- eral, however, some areas are still immature with limited research work. One of them is the necessity of predictable automatized updates for the system life- cycle of high availability systems. Jim Gray et al. describe a high availability system as a system with a low unavailable period of 5 minutes per year [3]. Due to technical challenges further explained in this document, DevOps practices, especially Continuous Delivery (CD) and Continuous Deployment (CDL) pipelines are not fully exploited in the scenarios involving updates for high availability systems. CD and CDL are two complementary components. CD pipelines are usually custom pipelines for each project to define differ- ent jobs for building and testing software automatically. CDL pipelines have the aim of automatically releasing the software previously processed by a CD pipeline. V. Holt et al investigated the adoption of the best practices for database management in the database community [4]. They found high use of agile de- velopment methodologies. However, half of the respondents remarked a lack of a defined database development life cycle, including fault recovery time ob- jectives. This makes it difficult to ensure the stability of the state conservation during database evolution without producing downtime, outage, or any other issue that compromises the availability of the system. Traditionally, a Database Administrator (DBA) maintains and manages all 1 2 CHAPTER 1. INTRODUCTION the changes related to a MySQL system, related to the code development life- cycle and the changes related to MySQL maintenance [5] as shown in Figure 1.1. Both types of changes are manually managed by DBAs. The flow de- scribed in Figure 1.1 is split in 7 steps. A developer updates a SQL schema in step 1 which would be assessed by a DBA, this process consists of analyzing the SQL statements validity and the impact that they may have over the sys- tem to prevent errors such as compatibility breaks and data integrity breaks. Following, a DBA tests the changes locally as manual integration tests before approving the migration. Once the migration is approved at step 4, a DBA prepares the MySQL system to be migrated, prepare a backup, schedule a maintenance time, and other project-specific tasks. Once the MySQL system is prepared for the migration, in step 6 a DBA proceeds with the online schema migration approach and supervises the operations while the data is migrated to the new schema. Then, a DBA checks that the migration has been successfully applied in step 7. Steps 5 and 6 can take from hours to days in a production database with required supervision. It presents a clear bottleneck and obstacle to automatize a system continuous delivery cycle. Even if, in some cases, there are already tools that can be effectively used in some stages of a CD pipeline or a CDL pipeline such as SQL lints for step 2 and online schema migration tools for step 5, they do not cover steps 5, 6 and 7 without compromising the system integrity. SQL lints perform sanity checks, mainly checking the syntax, how- ever, they lack breaking change checks or data integrity break check. Online schema migration tools manage part of the migration, it applies the changes to the schema without downtime but it lacks database state and performance monitoring which implies that a DBA needs to supervise the process. CD and CDL pipelines as automatic processes have to ensure the repeatability of themself and system integrity [6]. Currently, it is not possible to perform these tasks in CD and CDL pipelines without compromising the robustness and sta- bility of the system with the current tools for MySQL systems. Consequently, most companies keep using manual schema migration meth- ods for database schema changes. A DBA will manually apply and manage the changes, generating a delay in the development cycle. In some use cases, due to the complexity of the process and the required manual time, some tools re- duce manual interaction in some of the steps such as SQL lint at step 2 and online schema migration tools at step 6. However, these steps are just partially automated. CHAPTER 1. INTRODUCTION 3 Figure 1.1: DBA migration flow 1.1 Problem Statement There is already research around the different areas that may allow automa- tizing these steps. Schema transformation and rollbacks is a widely studied topic [7][8]. It has a special focus on the design of Database Evolution Lan- guage (DEL) to improve the process. On the other hand, relational database migrations are also widely studied [9][10][11][12][8]. The previous studies are particularly focused on analyzing, and designing strategies and techniques to face the resulting challenges [13][14].