High-Availability Database Systems: Evaluation of Existing Open Source Solutions


Aalto University
School of Science
Degree Programme of Computer Science and Engineering

Tuure Laurinolli

High-Availability Database Systems: Evaluation of Existing Open Source Solutions

Master's Thesis
Espoo, November 19, 2012

Supervisor: Professor Heikki Saikkonen
Instructor: Timo Lättilä, M.Sc. (Tech.)

Abstract of Master's Thesis

Pages: 90
Professorship: Software Systems
Code: T-106

In recent years the number of open-source database systems offering high-availability functionality has exploded. The functionality offered ranges from simple one-to-one asynchronous replication to self-managing clustering that both partitions and replicates data automatically.

In the thesis I evaluated database systems for use as the basis for high availability of a command and control system that should remain available to operators even upon loss of a whole datacenter. In the first phase of evaluation I eliminated systems that appeared to be unsuitable based on documentation. In the second phase I tested both throughput and fault tolerance characteristics of the remaining systems in a simulated WAN environment.

In the first phase I reviewed 24 database systems, of which I selected six, split in two categories based on consistency characteristics, for further evaluation. Experimental evaluation showed that two of these six did not actually meet my requirements. Of the remaining four systems, MongoDB proved troublesome in my fault tolerance tests, although the issues seemed resolvable, and Galera's slight issues were due to its configuration mechanism. This left one in each category.
They, Zookeeper and Cassandra, did not exhibit any problems in my tests.

Keywords: database, distributed system, consistency, latency, causality
Language: English

Abstract of Master's Thesis in Finnish (Diplomityön tiivistelmä)

Title in Finnish: Korkean saavutettavuuden tietokantajärjestelmät: Olemassa olevien avoimen lähdekoodin ratkaisuiden arviointi

In recent years, open-source database systems that enable high availability have become common. High-availability solutions range from simple asynchronous one-to-one replication to clustering that partitions and replicates data autonomously.

In this thesis I evaluated the suitability of database systems as the basis for high-availability functionality in a command center system that must remain available even when an entire datacenter fails. In the first phase of the evaluation I eliminated systems that were clearly unsuitable based on their documentation. In the second phase I tested both the fault tolerance and the throughput of the systems in a simulated high-latency network.

In the first phase I examined 24 database systems, of which I selected six for closer evaluation. I divided the systems evaluated more closely into two categories based on their consistency characteristics. In the experiments I found that two of these six did not meet the requirements I had set. Of the remaining four systems, MongoDB caused problems in my fault tolerance tests, although the problems appeared to be fixable, and Galera's minor problems were due to its configuration system. This left Zookeeper from the first category and Cassandra from the second; I found no fault tolerance problems with either in my tests.

Keywords: database, distributed system, consistency, latency, causality
Language: English

Acknowledgements

I would like to thank Portalify Ltd for offering me an interesting thesis project and ample time to work on it. At Portalify I'd especially like to thank M.Sc. Timo Lättilä, my instructor, for putting me on the right track from the start. Outside Portalify, I would like to thank Professor Heikki Saikkonen for taking the time to supervise my thesis.

I want to also thank my friends and family for providing me support and, perhaps even more importantly, welcome distractions. Aalto on Waves was downright disruptive, and learning to fly at Polyteknikkojen Ilmailukerho took its time too. However, constant support from old friends was the most important. Thank you, Juha and #kumikanaultimate!

Helsinki, November 19, 2012
Tuure Laurinolli

Abbreviations and Acronyms

2PC     Two-phase Commit
ACID    Atomicity, Consistency, Isolation, Durability
API     Application Programming Interface
ARP     Address Resolution Protocol
CAS     Compare And Set
FMEA    Failure Modes and Effects Analysis
FMECA   Failure Modes, Effects and Criticality Analysis
FTA     Fault Tree Analysis
HAPS    High Availability Power System
HTTP    Hypertext Transfer Protocol
JSON    JavaScript Object Notation
LAN     Local Area Network
MII     Media Independent Interface
NAT     Network Address Translation
PRA     Probabilistic Risk Assessment
REST    Representational State Transfer
RPC     Remote Procedure Call
RTT     Round-Trip Time
SDS     Short Data Service
SLA     Service Level Agreement
SSD     Solid State Drive
SQL     Structured Query Language
TAP     Linux network tap
TCP     Transmission Control Protocol
TETRA   Terrestrial Trunked Radio
VM      Virtual Machine
WAN     Wide Area Network
XA      X/Open Extended Architecture

Contents

Abbreviations and Acronyms
1 Introduction
  1.1 High-Availability Command and Control System
  1.2 Open-Source Database Systems
  1.3 Evaluation of Selected Databases
  1.4 Structure of the Thesis
2 High Availability and Fault Tolerance
  2.1 Terminology
  2.2 Overcoming Faults
  2.3 Analysis techniques
3 System Architecture
  3.1 Background
  3.2 Network Communications Architecture
  3.3 Software Architecture
  3.4 FMEA Analysis of System
  3.5 FTA Analysis of System
  3.6 Software Reliability Considerations
  3.7 Conclusions on Analyses
4 Evaluated Database Systems
  4.1 Database Requirements
  4.2 Rejected Databases
  4.3 Databases Selected for Limited Evaluation
  4.4 Databases Selected for Full-Scale Evaluation
5 Experiment Methodology
  5.1 Test System
  5.2 Test Programs
  5.3 Fault Simulation
  5.4 Test Runs
6 Experiment Results
  6.1 Throughput Results
  6.2 Fault Simulation Results
7 Comparison of Evaluated Systems
  7.1 Full-Scale Evaluation
  7.2 Limited Evaluation
8 Conclusions
A Remaining throughput results
B Remaining fault test results

Chapter 1 Introduction

In this thesis I present my research related to the adoption of an existing open-source database system as the basis for high availability in a command and control system being developed by Portalify Ltd.

1.1 High-Availability Command and Control System

The command and control system is designed to support operations of rescue personnel by automatically tracking the status and location of field units so that dispatching operators always have a correct and up-to-date view of available units. It tracks locations of TETRA handsets and vehicle radios, and handles status messages sent by field personnel in response to events such as receiving dispatch orders. The system also allows operators to dispatch a unit on a mission, and automatically sends the necessary information to the unit.

The system should scale to installations that span large geographical areas, with dispatching operators located in multiple, geographically diverse control rooms, and thousands of controlled units spread over the geographical area. Typically operators in one control room would be responsible for controlling units in a specific area, but it should be possible for another control room to take over the area in case the original control room cannot handle its tasks, for example because it has lost electrical power.

In this thesis I concentrate on hardware fault tolerance of the command and control system and also the database system, since studying software faults of large, existing software systems appears to be an unsolved problem. However, in Chapter 3 I touch on higher-level approaches that could be used in practice to enhance the software fault tolerance of a complex system.

I introduce terminology and analysis methods related to availability and fault tolerance in Chapter 2.
In Chapter 3 I present more elaborate requirements for the system, a system architecture based on those requirements, and a fault-tolerance analysis of the architecture model based on the analysis methods introduced in Chapter 2.

1.2 Open-Source Database Systems

The system described above must be able to share data between operators working on different workstations, located in different control rooms, distributed across a country. A database system for storing the data and controlling access to it is required. Because of the fault tolerance requirements presented in Chapter 3, the database system must be geographically distributed.

The main functional requirement for the database is that it must provide an atomic update primitive, preferably with causal consistency and read committed visibility semantics. The main non-functional requirements are quick, automatic handling of