
Transactions and Data Management in NoSQL Cloud Databases

Adewole Conrad Ogunyadeka November, 2016

Department of Computing and Communications Technologies, Faculty of Technology, Design and Environment, Oxford Brookes University

A dissertation submitted to the Faculty of Technology, Design and Environment in partial fulfilment of the requirements of the award of Doctor of Philosophy.

Abstract

NoSQL databases have become the preferred option for storing and processing data in the cloud as they are capable of providing high data availability, scalability and efficiency. But in order to achieve these attributes, NoSQL databases make certain trade-offs. First, NoSQL databases cannot guarantee strong consistency of data. They only guarantee a weaker consistency which is based on eventual consistency. Second, NoSQL databases adopt a simple data model which makes it easy for data to be scaled across multiple nodes. Third, NoSQL databases do not support joins and referential integrity which, by implication, means they cannot implement complex queries. The combination of these factors implies that NoSQL databases cannot support transactions. Motivated by these crucial issues, this thesis investigates transactions and data management in NoSQL databases.

It presents a novel approach that implements transactional support for NoSQL databases in order to ensure stronger data consistency and provide an appropriate level of performance. The novelty lies in the design of a Multi-Key transaction model that guarantees the standard properties of transactions in order to ensure stronger consistency and integrity of data. The model is implemented in a novel loosely-coupled architecture that separates the implementation of transactional logic from the underlying data, thus ensuring transparency and abstraction in cloud and NoSQL databases. The proposed approach is validated through the development of a prototype system using a real NoSQL database, MongoDB. An extended version of the standard Yahoo! Cloud Serving Benchmark (YCSB) has been used in order to test and evaluate the proposed approach. Various experiments have been conducted and sets of results have been generated. The results show that the proposed approach meets the research objectives: it maintains stronger consistency of cloud data as well as an appropriate level of reliability and performance.

Declaration

Some parts of the work presented in this thesis have previously appeared in the following published paper:

Ogunyadeka A, Younas M, Zhu H, Aldea A. “A Multi-Key Transactions Model for NoSQL Cloud Systems”, Proc. of the 2nd IEEE International Conference on Big Data Computing Service and Applications (BigDataService 2016), Oxford, England, UK, 29 March -1 April 2016.


Acknowledgements

I would like to express my deepest gratitude and appreciation to my supervisor and Director of Studies Dr. Muhammad Younas, for his exceptional leadership, patience and mentoring. His guidance and support was unwavering throughout the period of this thesis. I could not have wished for a better supervisor.

I would also like to thank my co-supervisors Dr Arantza Aldea and Professor Hong Zhu, for their support, encouragement and insightful comments in the period of this research. I am truly grateful for such a wonderful team.

I will also like to thank my church family, members of Holding Forth the Word Ministry, Milton Keynes and in particular Revd. Biyi Ajala, for his support during the period of this study. I will also like to appreciate my first boss, Mr Isaac Orolugbagbe, for his support and for encouraging me to aim higher.

Also, my appreciation goes to my family members who have stayed by me the last few years and to my brothers Soji, Gbolahan and Bolaji.

I will particularly like to thank my father Mr Ayo Ogunyadeka for his immense support and also for bearing the entire financial burden of this research.

Finally, I will like to thank my wife Ibijoke for her patience and understanding during this period and to my wonderful son, Olufemi. God bless you both.

To God be all the Glory, great things He has done, greater things He will do!


TABLE OF CONTENTS

Abstract ...... i
Declaration ...... ii
Acknowledgements ...... iii
TABLE OF CONTENTS ...... iv
LIST OF FIGURES ...... viii
LIST OF TABLES ...... x
LIST OF ABBREVIATIONS ...... xi
CHAPTER 1 ...... 1
INTRODUCTION ...... 1
1.1 CLOUD COMPUTING ...... 2
1.2 NoSQL DATABASES and TRANSACTIONS ...... 3
1.3 MOTIVATION AND RATIONALE OF THE RESEARCH ...... 5
1.3.1 Statement of the Research Problem ...... 5
1.4 AIM and OBJECTIVES ...... 6
1.5 RESEARCH METHODS ...... 7
1.6 MAIN CONTRIBUTIONS ...... 8
1.7 STRUCTURE OF THE THESIS ...... 9
CHAPTER 2 ...... 11
BACKGROUND ...... 11
2.1 DATABASE MANAGEMENT SYSTEMS ...... 11
2.2 DATABASE TRANSACTION MANAGEMENT ...... 12
2.2.1 ACID Properties ...... 13
2.2.2 Serializability ...... 15
2.2.3 Concurrency Control Techniques ...... 16
2.3 DISTRIBUTED TRANSACTION MANAGEMENT ...... 17
2.4 TRANSACTION RECOVERY PROTOCOLS ...... 19
2.4.1 Two Phase Commit ...... 19
2.5 BIG DATA and NoSQL DATABASES ...... 24
2.6 BIG DATA MANAGEMENT IN NOSQL DATABASES ...... 25


2.6.1 Partitioning ...... 25
2.6.2 Scaling ...... 26
2.6.3 ...... 27
2.6.4 Failure Detection and Recovery ...... 28
2.6.5 Load Balancing ...... 28
2.6.6 Garbage collection ...... 29
2.7 ANALYSIS OF CAP THEOREM ...... 29
2.7.1 BASE ...... 32
2.7.2 Other Consistency Models ...... 34
2.8 SUMMARY ...... 34
CHAPTER 3 ...... 36
DATA PROCESSING IN CLOUD COMPUTING ...... 36
3.1 ARCHITECTURAL CONSIDERATIONS of CLOUD DATABASES ...... 37
3.1.1 Loose Coupling VS Tight Coupling ...... 37
3.1.2 Share Nothing VS Shared Disk ...... 38
3.1.3 Data Model ...... 38
3.1.4 Concurrency Control Techniques ...... 39
3.1.5 Replication ...... 39
3.1.6 Master-Slave VS Peer to Peer Architecture ...... 40
3.1.7 Query Processing Approach ...... 40
3.1.8 Read Optimised VS Write Optimised ...... 41
3.1.9 Latency VS Durability ...... 42
3.2 NOSQL DATABASES AND BIG DATA ...... 42
3.2.1 BIGTABLE ...... 44
3.2.2 MONGODB ...... 45
3.2.3 DYNAMO ...... 46
3.2.4 CASSANDRA ...... 47
3.2.5 PNUTS ...... 47
3.3 TRANSACTIONS IN CLOUD DATABASES ...... 48
3.3.1 The Integrated Approach ...... 49
3.3.2 The Approach ...... 52
3.3.3 The API Approach ...... 54
3.4 ANALYSIS OF OTHER TRANSACTION MODELS AND PROTOCOLS ...... 56


3.5 DISCUSSION AND CONCLUSION ...... 58
CHAPTER 4 ...... 61
MODELLING AND DESIGN OF THE PROPOSED APPROACH – NoSQL-TX ...... 61
4.1 NoSQL TRANSACTIONS ...... 61
4.2 SYSTEM DESIGN APPROACH ...... 64
4.2.1 ...... 64
4.2.2 Rationale for Snapshot Isolation ...... 66
4.3 ARCHITECTURE OF THE NoSQL-TX ...... 67
4.3.1 Transaction Processing Engine (TPE) ...... 67
4.3.2 Data Management Store ...... 68
4.3.3 Time Stamp Manager ...... 69
4.4 TRANSACTION STATE TRANSITION MODEL ...... 70
4.5 INTERACTION BETWEEN SYSTEM COMPONENTS ...... 72
4.6 COMMIT PROTOCOL ...... 75
4.7 ABORT SCENARIOS ...... 77
4.8 A PROTOCOL FOR MANAGING TRANSACTIONS ACROSS ASYNCHRONOUS DATA REPLICATION ...... 79
4.9 SUMMARY ...... 82
CHAPTER 5 ...... 84
IMPLEMENTATION OF THE NoSQL-TX SYSTEM ...... 84
5.1 DESIGN OBJECTIVES ...... 84
5.2 IMPLEMENTATION TOOLS AND TECHNOLOGIES ...... 85
5.3 IMPLEMENTATION OF TRANSACTION OPERATIONS ...... 89
5.3.1 Types of Operations ...... 90
5.3.2 Aborts Scenarios for Operation ...... 99
5.3.3 Optimisation Decisions ...... 104
5.4 APPLICATION DOMAIN ...... 105
5.5 SUMMARY ...... 109
CHAPTER 6 ...... 111
EXPERIMENTAL EVALUATION ...... 111
6.1 EVALUATION BENCHMARKS AND WORKLOADS ...... 111
6.1.1 YCSB and YCSB+T Benchmark ...... 112
6.1.2 Workloads for Experiments ...... 112


6.2 EXPERIMENTS AND RESULTS ...... 116
6.2.1 Experiment – Set 1 ...... 118
6.2.2 Experiment – Set 2 ...... 121
6.2.3 Experiment – Set 3 ...... 122
6.2.4 Experiment – Set 4 ...... 124
6.2.5 Experiment – Set 5 ...... 126
6.2.6 Experiment – Set 6 ...... 127
6.2.7 Experiment – Set 7 ...... 128
6.2.8 Experiment – Set 8 ...... 129
6.3 ANALYSIS OF THE PROPOSED SYSTEM AND EXISTING APPROACHES ...... 130
6.4 SUMMARY ...... 132
CHAPTER 7 ...... 133
CONCLUSION AND FUTURE WORK ...... 133
7.1 CONTRIBUTIONS ...... 134
7.2 CRITICAL ANALYSIS ...... 135
7.3 FUTURE WORK ...... 136
REFERENCES ...... 138


LIST OF FIGURES

Figure 2.1: Two Phase Commit State Diagram 21

Figure 2.2: CAP Theorem classification of NoSQL Databases 32

Figure 4.1: Snapshot Isolation 65

Figure 4.2: Transaction State Diagram 72

Figure 4.3: Component of Proposed System- NoSQL-TX 73

Figure 4.4: Interaction between Components of the System(NoSQL-TX) 75

Figure 5.1: MAAS Head Controller Configuration 87

Figure 5.2: Nodes in the cluster with their local addresses 88

Figure 5.3: Configuration of One of the Nodes - Address 10.0.0.110 88

Figure 5.4: MongoDB service running via Putty 89

Figure 5.5: Hardware Setup of Proposed System 89

Figure 5.6: Read Operation Codes in Python 91

Figure 5.7: Read-latest operation 93

Figure 5.8: Account Details for an Account User 106

Figure 5.9: Transaction Records 108

Figure 6.1: 119

Figure 6.2: Workload F vs. Workload G 121

Figure 6.3: Average latency per transaction 122


Figure 6.4: Percentage of completed transactions 123

Figure 6.5: Percentage throughput of Multi-key transactions 123

Figure 6.6: Total Number of Aborted Transactions 124

Figure 6.7: Workload A vs Workload G (Aborted Transaction) 125

Figure 6.8: Period between transaction start time and commit time 126

Figure 6.9: Latency of Workload Requests 128

Figure 6.10: Transactional Overhead of Update Operations 128


LIST OF TABLES

Table 2.1: Main Differences between NoSQL and Classical Databases 35

Table 6.1: Workloads for Evaluation 114

Table 6.2: Number of Clients and Operations Executed 116

Table 6.3: Total Number of Completed Transactions per Second 119

Table 6.4: Percentage distribution of Transaction 127

Table 6.5: Anomaly Score 129


LIST OF ABBREVIATIONS

2PC: Two Phase Commit

2PL: Two Phase Locking

ACID: Atomicity, Consistency, Isolation and Durability

ACP: Atomic Commit Protocol

BASE: Basically Available, Soft-state Eventual Consistency

CAP: Consistency, Availability and Partition tolerance

DBMS: Database Management System

DDBMS: Distributed Database Management System

DMS: Data Management Store

GFS: Google File System

IaaS: Infrastructure as a Service

JSON: JavaScript Object Notation

MAAS: Metal As A Service

MMDB: Main Memory Database

NoSQL: Not Only SQL

NoSQL-TX: Not Only SQL Transaction

PaaS: Platform as a Service

PNUTS: Platform for Nimble Universal Table Storage

QoS: Quality of Service

RDBMS: Relational Database Management System

ROWA: Read One Write All

SaaS: Software as a Service

SOA: Service Oriented Architecture

TPE: Transaction Processing Engine

TSM: Time Stamp Manager


CHAPTER 1

INTRODUCTION

Cloud computing technologies feature among the top ten most disruptive technological trends of this era [1]. Cloud computing offers far more flexibility than traditional legacy systems, and at a lower cost, which makes it attractive to consumers as well as service providers.

Cloud service providers such as Google (Google Apps), Microsoft (Azure), Amazon (Amazon Web Services), and Salesforce (Salesforce CRM tools) provide infrastructure, platform and software services which are generally deployed (and run) on inexpensive commodity computing infrastructure. Such infrastructure is generally composed of tens of thousands of servers and network components which are located in different data centres around the world.

There has been rapid development in cloud and NoSQL (Not Only SQL) databases in order to store large volumes of data and to make such data highly available and efficient for cloud service provisioning. However, there still exist important challenges such as security, privacy, network QoS, data consistency, availability, reliability, performance, environmental issues, and economic and business related issues [2] [3] [4] that need further research. This thesis investigates the transaction management of NoSQL databases in order to ensure consistency of cloud data and to provide an appropriate level of reliability and performance.

This chapter is organised as follows. Section 1.1 describes background and fundamentals of cloud computing. Section 1.2 explains the characteristics of NoSQL databases and the properties of transactions. Section 1.3 explains the motivation and rationale for this research. It also specifies the scope of the research problem addressed in this thesis. Section 1.4 explains the aim and objectives of this research. Section 1.5 explains the research methodology.


Section 1.6 identifies the contributions of this research. Section 1.7 explains the structure of this thesis.

1.1 CLOUD COMPUTING

Cloud computing is defined as “a model for enabling ubiquitous, convenient, on- demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” [5] .

Cloud computing follows architectural designs and service provisioning models that deviate from the conventional perspectives, design and services of traditional computing systems. Cloud computing offers information technology resources provisioned (mainly through the Internet) in a Service Oriented Architecture (SOA) [6] where users only pay for the resources they consume. Compared to classical Internet and distributed systems, cloud computing has various distinguishing features, including a pay-as-you-go business model, virtualization, large scale storage facilities, support for big data, high compute power, and support for exploiting the strength of both powerful as well as commodity hardware [7]. This model of pay-as-you-go services offered by cloud computing vendors gives consumers the illusion of access to infinite computing resources. Resources are completely elastic and can be provided as and when needed. Organisations and individuals therefore do not need to make capital investments in computing resources. These features have enabled cloud computing technologies to become an attractive service hosting and delivery platform for various organisations, educational institutions, the public sector and companies such as Google, Microsoft, Amazon and Facebook amongst others. Generally, cloud computing services are broadly categorized into three different types [2]:

Software-as-a-service (SaaS): SaaS is a multi-tenant platform where users buy a subscription from the publisher or service provider. Publishers of software that make use of SaaS have the advantage of deploying new features in their software as soon as they are released, while this may be very difficult to achieve in traditional software. Salesforce is a major provider of SaaS.

Platform-as-a-service (PaaS) – PaaS provides developers with operating systems and an environment (application development frameworks) comprising the end-to-end lifecycle of developing, testing, deploying and hosting a web application as a service. Google (App Engine) and Microsoft (Azure platform) are major providers of this service.

Infrastructure-as-a-service (IaaS) – IaaS refers to on-demand provisioning of infrastructural resources, usually in the form of Virtual Machines. IaaS providers include Amazon (AWS), Dimension Data and Rackspace.

As mentioned earlier, cloud service providers provide infrastructure, platform and software services which are deployed on inexpensive commodity computing infrastructure located in many data centres across the world. This level of service provisioning introduces challenges of security, consistency, availability and performance among other issues.

1.2 NoSQL DATABASES and TRANSACTIONS

Cloud service provisioning can handle large volumes of data (referred to as Big Data) in order to support large scale applications and a large population of users/clients. For example, managing data related to millions of Facebook, Twitter or Google Mail users needs effective data management and processing capabilities to achieve data consistency and high availability, but in a reliable and efficient way.

NoSQL databases are most commonly used to store and manage Big Data, which has distinguishing characteristics such as large volume, wide variety and high velocity. NoSQL databases offer a paradigm shift from the complex query processing capabilities and monolithic architecture of relational databases [8] to simple queries with parallel processing capabilities so as to increase performance.

However, this shift in paradigm and design has some negative effects on data consistency which makes NoSQL databases impractical for use in certain application domains that need strong consistency of data such as financial or banking applications. This is due to the fact that NoSQL systems do not offer support for transactions which is a classic feature of the relational databases.

Different models and techniques have been developed by different cloud computing and NoSQL database vendors in order to effectively manage the (large amount of) Big Data. The majority of the existing techniques focus on improving efficiency and availability of big data but do not give appropriate attention to ensuring data consistency. This research focuses on transaction processing in cloud computing and NoSQL databases with particular attention to data consistency and the implementation of transactions in NoSQL databases.

TRANSACTIONS

A transaction is defined as an execution of a software program which contains multiple ‘read’ and ‘write or update’ operations in order to read data from and update data in a database. In traditional databases, operations in transactions are executed concurrently in an interleaved fashion in order to make optimal use of computing resources [9]. Therefore, to ensure correctness and integrity of a database, transactions must obey the rules set by the ACID (Atomicity, Consistency, Isolation and Durability) properties. Atomicity means all the operations in a transaction must be completely carried out, otherwise none of the operations must be carried out. Consistency means that after the completion of a transaction, the database must remain in a valid state. Isolation requires that transactions must not interfere with each other, and durability ensures that changes made by a transaction remain permanent in a database [10].


1.3 MOTIVATION AND RATIONALE OF THE RESEARCH

In order for NoSQL databases to effectively scale data across multiple nodes, certain trade-offs have been made in their design [11]. These include:

• Simple data model, which means data entities do not have to be normalized and can be spread across different nodes.
• No support for joins or referential integrity, which is easily achieved because of the simple data model.
• Relaxed consistency guarantees and a lack of support for transactions.

The above mentioned trade-offs have implications on the way NoSQL databases process and manage data. The simple data model means that there can be no complex operations or queries in NoSQL databases. Most NoSQL databases have simple ‘get’ or ‘put’ requests and they do not support online transaction processing (OLTP) applications [12]. The lack of support for ‘joins’ and ‘referential integrity’ means that there can be no interaction between rows or tables in the database. Finally, managing consistency across replicated data is non-trivial. Replication can either be synchronous or asynchronous. In choosing synchronous replication, there is a risk of compromising data availability: if any of the replicas is not available, the data item is not available. On the other hand, with asynchronous replication, data consistency might be sacrificed. This means that a replica can be outdated and would still be allowed to respond to requests (or be read by applications). There are diverse adaptations of quorum based protocols [13] that are used to manage replicas, such as Paxos [14] [15], read one write all (ROWA) and read one write all available (ROWAA). Though there are various issues involved, the following section defines the research problem which is to be addressed in this thesis.
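To make the replication trade-off concrete, the short Python sketch below (an illustration added for this discussion, not part of the thesis design) checks the standard quorum condition used by many replicated stores: with N replicas, a write quorum W and a read quorum R are guaranteed to overlap, so that a read always observes the latest write, only when R + W > N. ROWA corresponds to the special case W = N and R = 1.

def quorums_overlap(n_replicas: int, write_quorum: int, read_quorum: int) -> bool:
    # True if every possible read quorum intersects every possible write quorum.
    return read_quorum + write_quorum > n_replicas

# Examples (hypothetical replica counts):
print(quorums_overlap(3, 2, 2))  # True:  majority quorums, reads see the latest write
print(quorums_overlap(3, 1, 1))  # False: fast, but reads may return stale replicas
print(quorums_overlap(3, 3, 1))  # True:  ROWA, write all replicas and read any one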

1.3.1 Statement of the Research Problem

It is observed from the above discussion that NoSQL databases completely avoid the need to support multi-row operations. A multi-row operation is a group of operations affecting multiple key items (also referred to as multi-key). NoSQL databases also do not implement ACID-based transactions. This decision is also supported by the fact that applications that make use of NoSQL databases, such as Facebook, Twitter, and email, do not necessarily need ACID level consistency [94]. Also, research shows that the rigidity of ACID transactions in relational databases makes them unfit for certain large scale and complex applications [16].

Even so, enterprise users, whose applications are mainly driven by transaction processes, have shown little or no interest in NoSQL databases because of their lack of support for ACID transactions [17] [18]. Also, NoSQL databases are inadequate for future applications, such as online gaming applications that make use of multi-user collaborations, because such games need ACID transactional properties during the execution of a game. This has necessitated the need for research into how transactions can be implemented in NoSQL databases. Implementing transactions in NoSQL databases will increase the functional applications of NoSQL databases, which will in turn increase their use among enterprise users. Some of the existing research [19] also advocates the support of ACID transactions in NoSQL databases, stating that it would not be wise to sacrifice the support for transactions and the “golden standard” ACID consistency of SQL databases. However, the major strength of NoSQL databases is their ability to scale seamlessly to a large volume of data by taking advantage of parallel processing. This research proposes an approach that combines the advantages of the two worlds, i.e. taking advantage of the parallel processing power of NoSQL databases while not sacrificing the support for transactions and consistency in NoSQL databases.

1.4 AIM and OBJECTIVES

The aim of this research is to investigate the transaction processing and data management of NoSQL cloud databases in order to develop a new framework that enhances the efficiency, consistency and reliability of NoSQL cloud databases. To achieve this aim, the following objectives are defined:


I. Conduct an in-depth study of the literature and current state-of-the-art methods related to data management and transactions in the cloud, including both NoSQL and classical SQL databases: The objective is to identify current challenges and to explore potential research issues in cloud computing and NoSQL data processing. This will also explore ways in which the current methods used in transaction processing can be optimized in terms of efficiency, reliability and availability of cloud data.

II. Design a new framework for transaction management in NoSQL databases: A new framework is to be designed which has the potential of addressing the main research issues and improving data consistency, reliability and efficiency in NoSQL databases.

III. Develop and implement the proposed framework as a prototype system using cloud data management tools and technologies: The proposed framework is to be developed and implemented using cloud architecture and appropriate NoSQL database technologies.

IV. Test and evaluate the prototype system using cloud benchmarks: Appropriate cloud testing benchmarks will be used in order to test the validity of the proposed framework in the transaction management of NoSQL cloud databases.

In order to achieve the above aim and objectives, this research will follow appropriate methodological approach, which is described in the following section.

1.5 RESEARCH METHODS

A research method is defined as all the techniques used during the course of a research project to perform operations with the aim of providing a solution to the given research problem [20]. This research was carried out over a period of three (3) years, and the approach followed, which was adapted from a combination of methods [20] [21] [22], is outlined below.

Define research problem – The problem area is first identified and defined. From the definition of the problem, the aim and objectives of the research are clearly spelt out.

Literature survey – The approach to the literature review starts from a holistic study of the broader cloud computing technologies and service oriented architecture, combined with a review of existing techniques used in relational database transactions. The study was then narrowed down to a study of big data management techniques, NoSQL databases as well as the CAP theorem. Finally, this phase ends with an in-depth study of the state-of-the-art techniques for implementing transactions in NoSQL databases. From this study, the shortcomings of existing systems and gaps in knowledge are identified.

Modelling and Specification – Having identified the shortcomings of the current state-of-the-art systems, and with a proper understanding of the research area, this thesis proposes a model that aims to solve the problem of implementing transactions in NoSQL databases. Taking into consideration existing theorems and techniques, the thesis specifies the theoretical scope of the problem. This theoretical specification enables the components of the proposed transaction model to be formally verified [21].

Develop Proposed Design – Based on the specification of the model, the proposed design is implemented as a prototype using relevant tools and techniques.

Experimental Evaluation – The evaluation is carried out using benchmarks that are relevant to transaction processing and cloud computing environments. The evaluation involves a set of experiments which test the performance and correctness of the system under load. The system is also compared with existing systems to evaluate its strengths and advantages [22]. From the evaluation, the results are collected, analysed and interpreted to identify the strengths and weaknesses of the system.

1.6 MAIN CONTRIBUTIONS

This thesis designs and develops a new approach, called NoSQL-TX (NoSQL Transactions). The main contributions of the NoSQL-TX approach are as follows.

1. An in-depth study of the architecture of NoSQL databases and clearer definitions of the factors that limit them from supporting transactions.


2. The design and definition of a new Multi-Key transaction model for NoSQL databases that guarantees ACID properties of transactions in order to ensure stronger consistency and integrity of data.

3. The development of a novel loosely-coupled architecture that separates the implementation of transactional logic from the underlying data, thus ensuring transparency and abstraction in cloud and NoSQL databases.

4. The design and implementation of a novel protocol for managing consistency across asynchronous replication in NoSQL databases.

5. The definition and implementation of new types of database operations (‘read’ and ‘write’) which provide flexibility for adjusting data consistency based on user requirements.

6. The development of a prototype system using a real NoSQL system, MongoDB, which is evaluated using the YCSB+T benchmark based on the standard Yahoo! Cloud Serving Benchmark (YCSB). The proposed system is believed to enhance consistency and performance of NoSQL databases.

1.7 STRUCTURE OF THE THESIS

The rest of the thesis is structured as follows:

• Chapter 2 reviews fundamental techniques in transactions and distributed transaction processing, such as two phase commit and two phase locking. Furthermore, existing literature on the validity of the CAP theorem and its implications on distributed databases is also critically analysed. The characteristics of big data and the techniques used in managing big data are explained.

• Chapter 3 gives an overview of various NoSQL databases and their characteristics. A brief overview of various approaches to implementing transactions in NoSQL databases is given and finally, a review of state-of-the-art NoSQL systems that support transactions is given, with their shortcomings identified.

• Chapter 4 builds on top of the various system designs reviewed in Chapter 3 and presents a theoretical modelling of the proposed solution. The architecture of the proposed system as well as the scope of the proposed NoSQL transaction are clearly explained and defined. The details of how the system components interact to implement consistency are also explained.

• Chapter 5 explains the implementation details used to develop the prototype system. The application domain which is used to implement the system is explained, along with the tools used in implementing the system. The algorithms that each of the operations follows during execution are presented.

• Chapter 6 evaluates the proposed solution and the implemented prototype system. The justification behind the workloads chosen to evaluate the system is also explained. From the evaluation results, the shortcomings and trade-offs of the proposed system are deduced. The performance overhead of implementing transactions is clearly identified.

• Chapter 7 concludes this thesis with a summary of the contributions of this thesis and a critical analysis of the proposed system. The chapter also provides possible areas of improvement for the proposed system which can serve as direction for future research.


CHAPTER 2

BACKGROUND

Cloud based NoSQL databases and their transactional systems follow some of the principles and techniques of classical (or standard) databases and transaction management. This chapter therefore reviews the fundamentals of classical database management systems in section 2.1 and explains database transactions in section 2.2. Section 2.3 explains transaction management in distributed databases and section 2.4 examines various transaction models and protocols. Section 2.5 describes the characteristics of big data and NoSQL databases while section 2.6 illustrates the techniques used in managing big data.

Finally, section 2.7 describes and critically reviews the CAP theorem, which is the motivation behind the design of NoSQL databases, and its implications on distributed systems, cloud and NoSQL databases.

2.1 DATABASE MANAGEMENT SYSTEMS

A database management system (DBMS) is defined as “a software designed to assist in maintaining and utilizing large collections of data” [23].

A database is “a well-organized collection of data that are related in a meaningful way, which can be accessed in different logical orders” [24].

The objectives of using a DBMS are as follows [24].

Data availability – The database can be queried to retrieve information that is meaningful to the user.

Data integrity – This means that the integrity of data must be preserved. The consistency and accuracy of data stored in a database must be maintained.


Data security – Only users who have been granted authority may be allowed to access the data stored in a database.

Data independence and transparency – Users should not necessarily be concerned with how the data is physically represented on the database.

The most widely used DBMS is the Relational Database Management System (RDBMS), which was designed according to the above objectives. The RDBMS was designed based on strong mathematical principles [25] (set theory) that make use of normalization and strong referential integrity in order to represent relationships among data entities. These characteristics, including the objectives stated above, have influenced the way RDBMSs support transaction management. However, as explained later, relational databases are monolithic in their design and therefore cannot scale out to host large volumes of data and a large population of users (as in the cloud) [12].

2.2 DATABASE TRANSACTION MANAGEMENT

In database systems, one of the main strategies to maintain the consistency of shared data during the concurrent execution of multiple requests (from multiple users) is the transaction management technique. Database systems group multiple read and write operations into (atomic) transactions that follow the ACID (Atomicity, Consistency, Isolation, Durability) properties. In order to preserve data consistency, the execution of transactions must be serializable. A serializable execution of transactions is an execution whose output would yield the same result as when the transactions are executed serially. In relational databases, a scheduler is used to ensure that the executions of transactions are serializable. A scheduler is normally used in conjunction with a technique called locking [26]. Before a transaction starts, it acquires locks on all the data items involved in the transaction and holds the locks until all the operations in the transaction have been processed. During this period, no other transaction can change the data that has been locked. This guarantees that transactions are isolated from each other. After processing the operations, a transaction releases all the locks. This process is called two phase locking (2PL). In the first phase, a transaction acquires locks for all the data items involved in the transaction. In the second phase, all acquired locks are released. After locks have been released, the transaction is not allowed to request any other locks. This activity, put in place to ensure that transactions always leave the database in a consistent state, is known as concurrency control. However, the mechanism of concurrency control becomes more complex in distributed database environments. The schedulers in each participating database are responsible for managing the subset of data which that database stores. Distributed transactions are explained in section 2.3. Essentially, cloud computing technologies fall into the category of distributed computing.

2.2.1 ACID Properties

The following scenarios describe two simple but crucial examples about databases and transactions. Consider a database that manages data of a banking application. Assume that the initial balance of a user X is £1,000 and the initial balance of a user Y is £1,000. Consider the following basic scenarios.

SCENARIO 1: A user X intends to transfer an amount of money (say £100) from his account to user Y.

User X initiates a transaction to carry out the transfer which should leave his account with a balance of £900 while user Y’s balance should be £1,100.

The transaction begins by reading the initial balances of users X and Y. Assume that £100 was deducted from user X’s account but a failure occurred before that money was credited to user Y’s account. This will leave the database in an inconsistent state. The database should have a way to react to this sort of failure and preserve the integrity of the data.

SCENARIO 2: Two users A and B intend to transfer different amounts to the account of user X.

Assume a user X has an initial account balance of £1,000. User A initiates a transaction Ta to transfer £5,000 to the account of user X and user B also initiates another transaction Tb to transfer £1,000 to user X. These two transactions, Ta and Tb, occur concurrently. Each transaction involves read and write operations.

After successful execution of the two transactions, the correct outcome is that user X should have a balance of £7,000. However, as stated in section 1.2, operations in a system are carried out in an interleaved fashion, i.e. they can be in any order so as to make substantial performance gains; as such, the operations in scenario 2 can be carried out in the following order:

1) Read_a(Account X) returns £1,000

2) Read_b(Account X) returns £1,000

3) Write_a(Account X, +£5,000) balances account X at £6,000

4) Write_b(Account X, +£1,000) balances account X at £2,000

Thus account X ends up with a balance of £2,000 instead of £7,000. If this scenario occurs, then Ta and Tb are said to be in conflict. Two transactions are in conflict if they concurrently operate on the same data item and at least one of them is a write [27]. Two transactions are said to be concurrent if their executions are overlapping, for instance, if the commit-time of one of them is in the interval between the start time and commit-time of the other transaction.
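The anomaly in Scenario 2 can be reproduced directly. The following short Python sketch (added here purely for illustration; it is not part of the thesis prototype) mimics the interleaving above: both transfers read account X's balance before either writes, so the second write silently discards the first, leaving £2,000 instead of £7,000.

account_x = {"balance": 1000}

# Both transactions read the balance before either one writes (steps 1 and 2 above).
read_a = account_x["balance"]           # Ta reads £1,000
read_b = account_x["balance"]           # Tb reads £1,000

# The writes then overlap (steps 3 and 4 above).
account_x["balance"] = read_a + 5000    # Ta writes £6,000
account_x["balance"] = read_b + 1000    # Tb writes £2,000, overwriting Ta's update

print(account_x["balance"])             # 2000: Ta's transfer has been lost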

The above scenarios, though basic, form the bedrock of transaction management in relational database systems. In order to preserve the consistency of a database, the ACID properties of transactions were defined. The Atomicity property prevents scenario 1 from occurring because atomicity means that either all operations of a transaction must occur or none of them. This would force a rollback in which the database is returned to its initial state before the transaction took place.

Database systems recognize certain key words that define the scope of the atomicity of operations in a transaction. These key words include:

Begin – This marks the start of the set of operations to be carried out as part of a transaction. Operations may be read or write operations.


Commit – This is the point where the operations end and are committed atomically, meaning that all operations between a ‘begin’ and a ‘commit’ must be successful. Failure of any operation in this scope would lead to a Rollback operation.

Rollback – The rollback operation is activated when there is a failure in any of the operations in a transaction. This means that all operations that have been executed must be undone to restore the database to its initial state.
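The following sketch illustrates this Begin/Commit/Rollback scope using Python's built-in sqlite3 module and the Scenario 1 transfer. It is a hedged illustration only (the thesis prototype targets MongoDB, not SQLite): either both updates are committed together, or a failure triggers a rollback that restores the initial balances.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (owner TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("X", 1000), ("Y", 1000)])
conn.commit()

try:
    # 'Begin' is implicit before the first update; both writes belong to one transaction.
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE owner = 'X'")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE owner = 'Y'")
    conn.commit()       # Commit: both updates become durable together
except sqlite3.Error:
    conn.rollback()     # Rollback: undo any partial updates if a failure occurs

print(conn.execute("SELECT owner, balance FROM accounts").fetchall())
# e.g. [('X', 900), ('Y', 1100)]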

The consistency property ensures that the database remains in a consistent state after the execution of transactions. For example, in scenario 1, the sum of both accounts should be the same before and after the transaction. The isolation property would prevent the occurrence of scenario 2. Isolation ensures that if two transactions are executed concurrently, the effect will be the same as if they are performed one after the other i.e. serially. For transactions to be isolated from each other, they have to be serializable. Durability would prevent loss of data and can be used for recovery from failures. In order to maintain these ACID properties, the DBMS employs certain techniques which are explained later in this chapter. Before going further, the concept of serializable execution of transactions and some of the anomalies caused by non-serializable execution of transactions are explained below.

2.2.2 Serializability

As explained earlier, a serializable execution of transactions is an execution whose output would yield the same result as when transactions are executed serially. Serializability is a technique used to preserve the isolation property of transactions. Without serialization, the executions of transactions are prone to certain errors identified below [28].

Lost Updates - A lost update occurs when two concurrently running transactions read a data item and both of them write (or update) the same data item. The effect of one transaction can cancel the effect of the other, leading to a lost update. Scenario 2 (as above) demonstrates this error.


Inconsistent Reads - An inconsistent read occurs when one transaction contains multiple write operations that modify (update) multiple data items. If this transaction has only executed some of the write operations and if another transaction reads the same set of data then it would see only a partial update of the initial transaction.

Dirty Reads - Dirty reads occur when a transaction makes changes to a data item and another transaction reads the changes made. The first transaction is then aborted and rolled back. This would mean that the second transaction has read a data item value that does not exist.

The next section explains certain concurrency control techniques implemented by DBMSs to achieve serializability and to maintain the ACID properties during the execution of transactions.

2.2.3 Concurrency Control Techniques

Commonly used concurrency control techniques are as follows.

Scheduling – Database management systems have schedulers which manage the execution of transactions. The main function of a scheduler is to ensure that concurrent execution of transactions results in a serializable execution. To achieve this, when a scheduler receives an operation request, it can either execute, delay or reject the operation depending on the state of concurrent transactions in a database. Schedulers help to reduce the possibility of conflicting operations. Schedulers make use of different techniques to guarantee isolation and consistency. Some of the main techniques include:

Locking – Locking implies that data items involved in an active (running) transaction are locked in order to prevent other transactions from accessing the same data items. This prevents transaction conflicts. Relational databases employ locking [26] to enforce isolation and consistency. Locking can also be referred to as a pessimistic concurrency control mechanism. Most DBMSs use two phase locking [9] in order to implement locking of data items. There are two variations of two phase locking, namely conservative and strict. Conservative two phase locking requires that a transaction must obtain all the locks before it can proceed to carry out operations. On the other hand, strict two phase locking does not require a transaction to obtain all needed locks before it proceeds, but locks held can only be released after the transaction aborts or commits.
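As an illustration of the strict variant, the minimal Python sketch below (an assumed, simplified lock manager written for this chapter, not taken from the thesis or from any DBMS) acquires exclusive locks as data items are touched and releases them only together at commit or abort time.

class StrictTwoPhaseLocking:
    # A toy lock manager: exclusive locks only, released as a group at commit/abort.
    def __init__(self):
        self.lock_table = {}              # data item -> transaction id holding its lock

    def acquire(self, txn_id, item):
        holder = self.lock_table.get(item)
        if holder is not None and holder != txn_id:
            return False                  # conflict: the caller must wait or abort
        self.lock_table[item] = txn_id
        return True

    def release_all(self, txn_id):
        # Strict 2PL: every lock held by txn_id is released at commit/abort time.
        for item in [k for k, v in self.lock_table.items() if v == txn_id]:
            del self.lock_table[item]

lm = StrictTwoPhaseLocking()
assert lm.acquire("Ta", "X")              # Ta locks data item X
assert not lm.acquire("Tb", "X")          # Tb is blocked: X is already locked
lm.release_all("Ta")                      # Ta commits and releases its locks
assert lm.acquire("Tb", "X")              # Tb can now proceed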

Write-ahead logging - To enforce durability, DBMSs make use of a technique called logging. Logs are used for recovery purposes in the event of a failure [29]. The write-ahead log (WAL) protocol [30] [31] ensures that changes made to data must be recorded in a log file on stable storage before the data is changed in memory. This ensures that in the event of a failure, the changes made to the data can be replayed from the log file.

Non-Locking Schedulers – Aside from locking, other techniques, such as Timestamp Ordering (TO), are used to ensure transaction isolation. In TO, each transaction is issued a unique timestamp. Transactions are then ordered according to their timestamps. If two transactions T1 and T2 conflict on a shared data item x, then operations in T1 must be processed before operations in T2 if and only if transaction T1 receives its unique timestamp before transaction T2.
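A compact sketch of the basic timestamp ordering rule is given below (again an illustration written for this chapter, under the simplifying assumption of a single shared table of read and write timestamps): an operation is rejected, and its transaction restarted, whenever it arrives too late, i.e. after a younger transaction has already read or written the item.

read_ts = {}    # data item -> largest timestamp of a transaction that has read it
write_ts = {}   # data item -> largest timestamp of a transaction that has written it

def try_read(item, ts):
    if ts < write_ts.get(item, 0):
        return False                                # item already overwritten by a younger transaction
    read_ts[item] = max(read_ts.get(item, 0), ts)
    return True

def try_write(item, ts):
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        return False                                # too late: abort and restart with a new timestamp
    write_ts[item] = ts
    return True

print(try_read("x", 1))    # True
print(try_write("x", 2))   # True
print(try_write("x", 1))   # False: an older transaction may not overwrite a younger write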

2.3 DISTRIBUTED TRANSACTION MANAGEMENT

Distributed transaction management has been conventionally implemented in distributed databases. According to [32], “a distributed database is a collection of multiple, logically interrelated databases distributed over a computer network”. A distributed database management system (DDBMS) is also defined in [32] as “a software system that permits the management of the distributed database and makes the distribution transparent to the users”.

The process of enforcing ACID properties across transactions in distributed database management systems (DDBMSs) adds more complexity to concurrency control mechanisms than in non-distributed databases. In distributed databases, users are abstracted from the complexities of interactions among computers. To achieve this, it is important to create standards on which computer interactions are based. The keywords identified in section 2.2 are often wrapped up in remote procedure calls [33] (RPC), in what is known as Transactional RPC [34], and are used as a method for computer interactions. Various protocols are used to implement ACID transactions across sites in distributed database systems. Some of the techniques highlighted in section 2.2.2 are also applied in distributed transactions, albeit somewhat modified. Below is a brief explanation of the techniques.

Distributed Two Phase Locking – As stated earlier, locking ensures that conflicting transactions are executed serially. In a single stand-alone DBMS, locking is implemented by the two phase locking (2PL) protocol discussed earlier. However, the mechanism for implementing two phase locking is more complicated in a distributed DBMS. Each DBMS has a local scheduler that is responsible for managing the data stored on it. When a distributed transaction is initiated, the transaction sends its operations (read and write) to each DBMS involved in the transaction. The local scheduler of each DBMS then allocates locks for each of the data items stored at its local site. The lock information is sent to the schedulers of all the participating sites.

Distributed Timestamp Ordering - In this technique, each site (component database) also has its own scheduler that issues timestamps to transactions. The decisions taken on each transaction are entirely left to the scheduler of that system.

Deadlock Management – A deadlock occurs when two transactions are waiting for each other to release their locks on data items. Consider a scenario where two transactions Ta and Tb are running concurrently and both involve two data items X and Y. Ta locks data item X and Tb locks data item Y. In this situation, Ta would have to wait for Tb to commit so it can obtain a lock on Y, while Tb would have to wait for Ta to commit before it can obtain a lock on X. This leads to a deadlock since both transactions are waiting for each other. One way to detect deadlocks is to use timeouts.
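A timeout-based scheme can be sketched as follows (an illustrative assumption building on the simplified lock manager sketched in section 2.2.3, not a mechanism taken from the thesis): a transaction that cannot obtain a lock within a fixed period is presumed to be involved in a deadlock and is aborted, which breaks the circular wait between Ta and Tb.

import time

LOCK_TIMEOUT = 2.0   # seconds; an assumed value chosen only for illustration

def acquire_with_timeout(lock_manager, txn_id, item):
    # lock_manager is assumed to expose acquire(txn_id, item) as in the earlier sketch.
    deadline = time.monotonic() + LOCK_TIMEOUT
    while time.monotonic() < deadline:
        if lock_manager.acquire(txn_id, item):
            return True
        time.sleep(0.05)          # back off briefly and retry
    return False                  # presumed deadlock: the caller aborts txn_id and releases its locks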



2.4 TRANSACTION RECOVERY PROTOCOLS

Recovery is defined as “the activity of ensuring that software and hardware failures do not corrupt persistent data” [9]. The Atomic Commit Protocol (ACP) is a procedure that ensures that all participants involved in a transaction either commit or abort that transaction at their local sites. In other words, it guarantees that all the participants in a transaction reach the same decision in order to preserve atomicity. This is very important because in distributed environments, any of the participating sites can fail. It is important that on recovery, a failed site reaches the same decision as the other sites. The ACPs, which are discussed in the next sections, guarantee the following criteria [9]:

• All sites must reach the same decision
• Once a decision has been made, it cannot be changed
• The decision to commit can only stand if all sites agree

There exist various ACPs such as two-phase commit, three-phase commit, presumed abort, presumed commit and so on [34]. In the following, two-phase commit is explained given that it is a widely used protocol. However, detailed description and analysis of such protocols are beyond the scope of this thesis.

2.4.1 Two Phase Commit

The two phase commit protocol (2PC) is a protocol used to guarantee ACID consistency in distributed databases as well as web databases. 2PC aims to achieve a form of consensus among participating systems. In 2PC [35]:

• Each site has the responsibility of logging the actions that take place at that site. There is no notion of a global log.
• Exactly one site must play the special role of coordinator, which is usually the site where the transaction originates. The coordinator site makes the final decision on whether the transaction should commit or abort.


• Messages are exchanged between the coordinator and the other participants. Each participant logs the message that it sends out to help it recover from a failure.

Under normal circumstances when there is no failure, the protocol is completed in the two phases stated:

Phase 1: This phase, also known as the voting phase, involves the following steps:

• The coordinator sends a message (vote request) to each of the participants asking them to vote. After sending this message, the coordinator enters a state known as the WAIT state. This message is also logged at the coordinator site.
• Once the message has been sent, each site that receives the message responds to the coordinator with a YES or NO message after the individual decision has been logged at each individual site. If any of the participants votes NO, that participant automatically aborts the transaction; otherwise the participants enter a READY state.

Phase 2: This phase, which is called the commit phase or decision phase, is carried out as follows.

• The coordinator receives the decisions made by all the participants involved in the transaction.
• If all the participants responded with a YES vote, the coordinator sends out a message instructing all the participants to commit the changes made by the transaction and enters a COMMIT state. If any of the participants responded with a NO message, then the coordinator instructs all participants to abort and enters an ABORT state.
• Each of the participants that voted YES awaits the decision from the coordinator. If they receive a commit message from the coordinator, they commit the operations at their local site and respond with an acknowledgement to the coordinator. The coordinator then completes the transaction. If on the other hand they receive an abort message from the coordinator, they begin a rollback of all the operations performed and respond with an acknowledgement to the coordinator.


As stated earlier, each of these steps taken by any of the participants is logged locally at the participant’s site. A participant is in a period of uncertainty when it has responded with a YES vote to the coordinator and is yet to receive an instruction from the coordinator, in other words, when it is in the READY state. See Figure 2.1.

Figure 2.1: Two Phase Commit State Diagram
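To make the failure-free flow of the protocol concrete, the sketch below gives a much-simplified coordinator in Python (written only as an illustration; the Participant class, its vote_request and decide methods, and the log function are assumed names, not part of the thesis or of any real system): votes are collected in the WAIT state, and COMMIT is broadcast only if every participant voted YES.

class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit

    def vote_request(self):
        return "YES" if self.can_commit else "NO"   # Phase 1: the participant's vote

    def decide(self, decision):
        print(f"{self.name}: {decision}")           # Phase 2: apply COMMIT or ABORT locally

def log(entry):
    print("log:", entry)                            # stand-in for the coordinator's recovery log

def two_phase_commit(participants):
    log("vote-request sent")                        # Phase 1 (voting); coordinator enters WAIT
    votes = [p.vote_request() for p in participants]
    decision = "COMMIT" if all(v == "YES" for v in votes) else "ABORT"
    log(f"decision: {decision}")                    # Phase 2 (decision)
    for p in participants:
        p.decide(decision)
    return decision

print(two_phase_commit([Participant("site1"), Participant("site2", can_commit=False)]))
# -> ABORT, because one participant voted NO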

During the execution of the 2PC protocol, failures can occur. Such failures can be a result of the following:

1) Loss of messages as a result of network failures
2) Duplication of messages
3) Failure of any of the servers involved in the protocol

During the 2PC protocol, the participants and coordinator have to wait for messages. To prevent a transaction from unnecessary delays, timeout actions are introduced. In phase one, the participants wait for a vote-request from the coordinator. Also, the coordinator has to wait for a response from the participants after sending a vote-request (i.e. during its WAIT state). In the second phase, participants that voted YES after receiving the vote-request from the coordinator also have to wait for a commit or abort message from the coordinator (i.e. when they are in a READY state). At this point, the participants are said to be in an uncertain state. This is because a participant has voted YES to commit but cannot take a decision to commit or abort until it hears from the coordinator. However, if the participant had voted NO, the participant can proceed to abort the transaction on its site, knowing that the transaction cannot proceed on other sites because it has voted NO. Also, according to the rules that guide an ACP, once a participant has voted to commit, it cannot change its mind. If a participant at this point is unable to reach the coordinator, the restart and termination protocols determine the behaviour of the system after a failure or timeout. A restart protocol specifies how a protocol should be restarted in the event of a failure. A termination protocol specifies the procedure a transaction should follow in the event of a timeout. The termination and restart protocols have several implementations depending on the state of the transaction when the timeout occurs. The following are the various implementations of the protocol [36]:

Coordinator Restart Protocol - specifies how a coordinator should restart in the event of its failure. If a coordinator fails either before sending a vote-request message or in the WAIT state and cannot recall the responses of the participants, it can either re-send the request or abort the transaction. If it fails after sending a commit or abort message, it must send the commit or abort message again (as the case may be) and wait for the participants’ responses.

Coordinator Termination Protocol - specifies how the coordinator should behave in the event of a time out due to failure of a participant or loss of message from a participant. In the event of a time-out, the coordinator resends the message according to the current state of the transaction.

Participant Restart Protocol – Specifies how a participant should proceed after it recovers from a failure. If a participant fails before receiving a vote-request, it can decide to respond with a NO when it receives the request. If the failure occurs during the prepared state, in other words, after it has responded with a YES message (the uncertainty period), it must wait for the coordinator to resend the commit or abort message.

In particular cases, where a participant cannot establish a connection with the coordinator, it can be allowed to contact other participants to learn the final decision of the coordinator. The participant in this scenario, also known as the initiator, sends a Decision-request message to some other participant, known as the responder. If the responder has received a commit or abort message from the coordinator, then the initiator commits or aborts as the case may be. If the responder has not voted, then it can decide to vote NO and respond to the initiator with a NO message. However, if the responder has voted YES but has not received a commit or abort message from the coordinator, then the initiator would have to wait in this uncertain state. This protocol is referred to as the Cooperative Termination Protocol.

Participant Termination Protocol - specifies how a participant should terminate in the event of a coordinator failure. If a participant times out while waiting for a vote-request, it can decide to abort and respond with a NO when it receives the request. Its behaviour in other scenarios is essentially the same as in the participant restart protocol.

Therefore, from the above, when a participant is in a state of uncertainty and it is unable to contact other participants or the coordinator, it is said to be in a blocked state. To prevent transactions from entering a blocked state, the three phase commit protocol was proposed.

Similar to 2PC, other protocols (such as 3PC, presumed abort, presumed commit, etc) ensure that database systems in distributed environment reach an agreement during concurrent implementation of transactions in order to preserve the consistency of data and correctness of applications.

SUMMARY OF (CONVENTIONAL) DATABASES AND TRANSACTIONS

The above sections provided an overview of conventional databases and related models and protocols of transaction management. Such databases have been dependent on the relational data model. Relational databases are predominant for storing structured data and follow ACID properties in order to maintain consistency of data. They also support relationships, associations and normalization of data. With all such features, relational databases have proved very useful for applications that require strong consistency, such as banking applications, e-commerce and online shopping, etc.

Despite the above benefits, relational databases are no longer sufficient for the needs of modern (Internet and cloud) applications such as social media, business analytics, and online reviews. The speed and scale at which data is generated and processed is beyond the processing capabilities of relational databases and transaction management techniques. These applications (or services) generally do not need well-structured and normalized data, nor do they need strict consistency and ACID style transactions. They demand new ways of managing data and transactions in an easy and efficient manner. This has resulted in new theories, techniques and technologies such as NoSQL databases and big data, the CAP theorem, BASE properties, etc.

2.5 BIG DATA and NoSQL DATABASES

Big data is characterised by the 3Vs (Volume, Velocity and Variety) [37][38] or 4Vs (Volume, Variety, Velocity, and Value) model. Volume refers to the large size of data, which is possibly beyond the processing capabilities of conventional databases. Variety means that big data may have structured, semi-structured and unstructured formats. Velocity indicates the speed at which data is generated, which is usually high. Value refers to the benefits that can be derived from the data.

Various techniques are employed to manage big data efficiently. These techniques are explained in section 2.6. Big data is generated on a daily basis from a number of sources which include data warehouses, sensor networks, text search, scientific databases and XML databases [39]. Commerce and business, society administration and scientific research are three identified areas that currently produce and make use of big data and data-intensive applications [40]. Also, the increase in the use of online services has led to the design of a variety of web applications (e.g., social media, road traffic, etc.) that generate a large volume
of data. However, traditional databases are unable to meet the demands of such applications [41], for example processing data from millions of tweets in real time. The concept of "one size fits all" in the database industry is no longer sufficient [39]. This has led to the design of specialized databases called NoSQL databases, which are explained in detail in the next chapter. Processing big data has certain requirements that are lacking in traditional databases. These requirements include elasticity, scalability, flexibility and fault tolerance [42] [43]. The fault tolerance feature prevents any single point of failure across the system. As stated earlier, failure is the norm in these environments and should not prevent the smooth running of the database. Nevertheless, a distinction is made between elasticity and scalability. Elasticity requires that the system should be able to scale up or scale down as the need arises, while scalability refers to the ability of a database to be scaled across multiple systems. The 3/4Vs characteristics of big data make NoSQL databases better suited for processing big data. NoSQL databases have simple data models and can process unstructured data without the need for normalization [44]. However, most NoSQL databases can only perform simple operations and single-key transactions. The various NoSQL databases and their characteristics are explained in the next chapter. However, below is a brief explanation of some of the main techniques employed in NoSQL systems to process big data.

2.6 BIG DATA MANAGEMENT IN NOSQL DATABASES

There are various techniques involved in managing big data in the cloud environment. Some of the commonly used techniques are explained below.

2.6.1 Partitioning

Partitioning involves splitting a database into smaller parts called partitions [7]. There are two types of partitioning, namely vertical partitioning and horizontal partitioning. In vertical partitioning, a database table is split along its attributes (columns), while in horizontal partitioning, the database is split by rows. When
a single database is partitioned and split (or scaled) across multiple servers, each server that manages a section of the database is known as a database shard [45]. This process is known as sharding. Sharding is a form of partitioning, and both partitioning and sharding involve splitting the database into partitions (or sections). In partitioning, each section of the split database may or may not be managed by a separate server. However, in sharding, each section is managed by a separate server. There are three types of partitioning, namely hash partitioning, range partitioning and round-robin partitioning [46].

Hash partitioning – To implement hash partitioning, a hash function is applied to each data key. The output of the hash function determines the node that hosts the key. Hash partitioning is more suitable for applications that make extensive use of random access. To find any data item, the hash function is applied to the key of the data; the result yields the location of the data item.

Range partitioning – In range partitioning, each node stores a distinct range of data keys. Range partitioning works efficiently with applications that mainly need sequential scans, as data items whose keys are closely related will be stored on the same node.

Round-robin partitioning – In round-robin partitioning, key items are distributed evenly (in a ring fashion) according to the number of nodes. For instance, if there are 3 nodes, key items 1 to 3 will be spread across nodes 1 to 3, key items 4 to 6 will be spread across nodes 1 to 3, and so on.
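
The three partitioning schemes can be illustrated with a short Python sketch. The node names, key ranges and hash function below are assumptions made only for the example; real systems choose their own hash functions and range boundaries.

import hashlib

NODES = ["node1", "node2", "node3"]   # hypothetical three-node cluster

def hash_partition(key: str) -> str:
    """Hash partitioning: a hash of the key decides the hosting node."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

def range_partition(key: str) -> str:
    """Range partitioning: each node owns a contiguous range of keys."""
    if key < "h":
        return NODES[0]        # keys starting a..g
    if key < "p":
        return NODES[1]        # keys starting h..o
    return NODES[2]            # keys starting p..z

def round_robin_partition(item_number: int) -> str:
    """Round-robin partitioning: items are dealt out to nodes in turn."""
    return NODES[(item_number - 1) % len(NODES)]

print(hash_partition("customer42"), range_partition("customer42"),
      round_robin_partition(4))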

2.6.2 Scaling

Scaling is a technique used to increase the processing capability of a database system and is of two types, namely vertical scaling and horizontal scaling. In vertical scaling, the number of processors, the memory size and the disk size of a machine are increased to enable the machine to process more data. Horizontal scaling, on the other hand, means adding more machines, i.e. increasing the number of nodes involved in processing data. The rationale behind horizontal scaling has two dimensions: firstly, doubling the hardware will reduce the time taken to perform a task by half; secondly, doubling the hardware will perform twice as much
task in the same time [46]. There is a limit to how much a single node can scale vertically [47]. Also, making the case for parallel processing (horizontal scaling), recent research [48] has shown that Moore's law (which states that "the number of transistors on a microprocessor chip will double about every two years") is fast becoming unachievable. This makes horizontal scaling more practical for big data processing than vertical scaling. NoSQL databases are designed to scale horizontally while relational databases are designed to scale vertically. This gives NoSQL databases a clear advantage, as there is in principle no limit to the number of nodes that can be added to the database cluster.

2.6.3 Replication

Replication is the process of maintaining multiple copies of a database in different locations to provide fault tolerance. This results in higher levels of availability but introduces a new set of challenges. Replication is classified into two types, namely eager (synchronous) and lazy (asynchronous) replication [49]. In eager replication, replicas are updated during the transaction, while in lazy replication, replicas are updated at a later time. Keeping the replicas consistent introduces a set of problems and may involve certain trade-offs. Eager replication reduces performance, consumes more bandwidth and increases the latency of transactions. Lazy replication, on the other hand, means that data on some replicas can be stale and out-of-date. Out-of-date replicas may not be allowed to respond to client requests in applications that need a high level of data consistency. The number of replicas implemented by a database system also has conflicting effects on the system. For instance, a high number of replicas implies that the system is able to provide stronger tolerance to faults. However, a high number of replicas also implies that more effort (in terms of bandwidth and latency) will be needed to keep the replicas consistent. There are various protocols and techniques used by developers to optimize their replication processes. One such technique is primary-secondary replication, where secondary replicas can process reads while only the primary replica can process writes (or updates). There are also various quorum or consensus protocols used to guarantee consistency across replicas. The replication model adopted depends on application-specific needs. All NoSQL databases make use of some form of replication for fault
tolerance. Dynamo [50] uses consistent hashing for replica placement and an eventual consistency model for replica management. Google Megastore [51] uses Paxos [15] to synchronously manage writes across replicas.
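
The difference between eager and lazy replication can be sketched as follows. This is a minimal, single-process illustration, assuming an in-memory Replica class and a simple propagation queue; it is not the mechanism used by any particular database.

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
lazy_queue = []   # updates waiting to be propagated asynchronously

def eager_write(key, value):
    """Eager (synchronous) replication: every replica is updated
    before the write is acknowledged to the client."""
    for r in replicas:
        r.apply(key, value)
    return "ack"

def lazy_write(key, value):
    """Lazy (asynchronous) replication: only the primary is updated now;
    the remaining replicas are refreshed later, so they may serve stale data."""
    replicas[0].apply(key, value)       # primary accepts the write
    lazy_queue.append((key, value))     # propagate in the background later
    return "ack"

def propagate():
    """Background step that eventually brings the secondaries up to date."""
    while lazy_queue:
        key, value = lazy_queue.pop(0)
        for r in replicas[1:]:
            r.apply(key, value)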

2.6.4 Failure Detection and Recovery

As explained earlier, in the cloud computing environment failure is the norm. This is because hundreds to thousands of machines are used to process data in parallel and these machines are mainly commodity machines. There has to be an efficient mechanism to detect machines that have failed in order to bring them up to date and guarantee availability and consistency. Most clusters have algorithms for detecting failure. Dynamo uses a gossip-based protocol (explained later in section 3.2.3). BigTable uses heartbeat messages which are exchanged between the master server and slave servers. If there is no response from a particular node within a set timeout period, that node is assumed to have failed. Each of these systems implements various techniques for recovering a failed node after detecting its failure.
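
A heartbeat-based failure detector of the kind described above can be sketched in a few lines of Python. The timeout value, node names and timestamps are assumptions made for the example.

import time

HEARTBEAT_TIMEOUT = 5.0                         # seconds; assumed value
last_heartbeat = {"node1": time.time(),         # last heartbeat seen per node
                  "node2": time.time() - 12.0}  # node2 has gone quiet

def record_heartbeat(node):
    """Called by the master whenever a heartbeat arrives from a slave."""
    last_heartbeat[node] = time.time()

def failed_nodes(now=None):
    """Nodes whose last heartbeat is older than the timeout are presumed dead."""
    now = now or time.time()
    return [n for n, t in last_heartbeat.items()
            if now - t > HEARTBEAT_TIMEOUT]

print(failed_nodes())   # node2 has missed its heartbeats and is flagged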

2.6.5 Load Balancing

Load balancing [52] is a technique used to manage, distribute and re-distribute data across nodes to ensure that no single node is overloaded. The objectives of load balancing and distribution are to achieve high throughput, efficient resource utilization and low latency, and to avoid hotspots across the cluster. Load balancing also aims to improve fault tolerance and to optimize the process of replica distribution in a cluster system. For instance, shard-aware or rack-aware placement is a load balancing technique that ensures that replicas are distributed in such a way that network failures on a single rack do not affect availability. A well-known rack-aware technique is that used in HDFS [53], where two replicas are placed on two different nodes in a local rack while the third replica is placed on a node in a different rack. That way, a network partition affecting one rack will not affect the availability of any data item.
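
The HDFS-style rack-aware placement described above can be illustrated with the following sketch, assuming a hypothetical two-rack cluster layout; the rack and node names are invented for the example.

import random

# Hypothetical cluster layout: rack id -> nodes in that rack.
RACKS = {"rackA": ["a1", "a2", "a3"], "rackB": ["b1", "b2", "b3"]}

def place_replicas(local_rack):
    """HDFS-style placement: two replicas on different nodes in the local
    rack, the third on a node in a different rack, so the loss of one
    rack never removes every copy of a block."""
    local = random.sample(RACKS[local_rack], 2)
    remote_rack = random.choice([r for r in RACKS if r != local_rack])
    remote = random.sample(RACKS[remote_rack], 1)
    return local + remote

print(place_replicas("rackA"))   # e.g. ['a2', 'a1', 'b3']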

2.6.6 Garbage collection

Most cloud databases tend to keep more than one version of all data items to guarantee availability, even at the expense of consistency. There is a need for an effective garbage collection process to manage computing resources (storage/memory) and to prevent the database from storing unnecessary or unused data. Also, garbage collection of log files must be carried out with care to ensure that only logs that will no longer be needed are deleted. The process of garbage collection should not affect the smooth running of the system. In [54], garbage collection is carried out in batches when the master is in a quiescent state.

The above techniques are used to various degrees when managing big data. The overall combination of these techniques implemented by a database management system determines how suitable that database is for a given application.

2.7 ANALYSIS OF CAP THEOREM

The CAP Theorem [55] states that a distributed system can offer at most two of the three desirable properties: Consistency, Availability and tolerance to network Partition (CAP). The cloud computing environment is characteristically a distributed environment and NoSQL databases are therefore built to scale across multiple nodes. Techniques such as partitioning, horizontal scaling and replication are used to achieve high availability, which is one of the unique characteristics of NoSQL databases. NoSQL databases cannot provide the three afore-mentioned properties simultaneously and are designed to trade off either consistency or availability. The implication of the CAP theorem is that if the high availability characteristic of NoSQL databases is to be guaranteed, consistency must be sacrificed. However, consistency is an important feature of transactions in relational databases. The CAP Theorem thus implies that NoSQL systems cannot support ACID-level consistent transactions.

Three fundamental requirements for providing the scalable network services needed in cloud computing are identified in [56]. They include:

1. Incremental scalability and overflow growth provisioning
2. "24/7 availability through fault masking"
3. Cost effectiveness

The three points to note in these requirements include:

Scalability - The ability for resource provisioning to be increased as user needs increase.

Availability - The promise of uninterrupted access to resources

Cost Effectiveness - Services must be economically justifiable.

Cloud computing services promise high availability and give the impression that computing resources are inexhaustible and available on demand. In order to provide high availability, replication is needed. Cloud service providers/vendors make use of highly distributed systems, replicated on a global scale. This normally involves the use of hundreds or even thousands of machines, and in these environments failures are not a rare occurrence. These systems aim to achieve high availability, low latency, partition tolerance and high scalability.

In distributed systems, enforcing ACID properties requires substantial effort. Moreover, achieving ACID-level guarantees in a distributed environment where data is replicated over a large geographical area is a highly demanding and non-trivial task. Some of these systems are expected to handle a very high write throughput, billions of writes per day, and are also expected to scale with the number of users [57]. In the presence of these failures, availability must be preserved because it is a key component of cloud computing, as consumers must have access to computing resources. Implementing complex techniques like 2PC in a cloud environment can be counter-productive. The cost of resolving the conflict between data consistency, system state and high availability is made more complex by the magnitude and robustness requirements of present-day business applications.

However, the ability to scale data comes with its own challenges. Distributed environments face the problems of network reliability, system failures, network security and latency. If there is a partition in the network, data servers may become unavailable. All of these led to the formulation of the CAP Theorem, which has been proven formally [58]. However, some questions have been raised about the validity of the CAP Theorem. The difference between the meaning of consistency in ACID and consistency in the CAP theorem has also led to some misconceptions. Consistency in ACID means that a transaction leaves the database in a consistent state and obeys all integrity rules. On the other hand, consistency in CAP refers to maintaining single-copy consistency across replicas. In [59], the author argues that CAP is confusing because it implies that systems are restricted to only two of the three properties. This can imply that distributed systems cannot be available and consistent at the same time, a situation which is clearly impractical. The timeline consistency model of PNUTS [60] (a NoSQL database designed by Yahoo) provides the basis for this argument. PNUTS relaxes its consistency and only guarantees that updates will be applied in the same order at all replicas, but does not guarantee that all replicas will be up-to-date. Also, if the master replica for a particular data item is unavailable, then that data item is not reachable at all. This implies that the system gives up or relaxes both availability and consistency. The choice of relaxing consistency across replicas was a result of the cost of implementing synchronous updates over a wide area network. PNUTS instead reduces the latency of updates by implementing an asynchronous model of replication. This model ensures that application developers do not need to worry much about implementing consistency; however, as can be seen, consistency is still relaxed. A model called PACELC has been suggested as a substitute for the CAP Theorem [61] [12]. PACELC means that "if there is a partition (P), how does the system trade-off between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system trade-off between latency (L) and consistency (C)?" [61]. This argument was also put forward in [62], with the author stating that "the CAP theorem only prevents everybody from being consistent and available". This seems to imply that even when there is no partition, distributed databases cannot experience both consistency and availability at the same time. There have also been other arguments about the
confusion caused by the CAP theorem [63], [64], which was acknowledged by the proponent of the CAP theorem [65]. However, the CAP theorem formed the basis for the design of NoSQL databases, and justifying the semantics of the wording used in the CAP theorem is not the focus of this work. With this understanding, the classification of NoSQL databases according to their characteristics on the CAP spectrum is illustrated in Figure 2.2 below.

Figure 2.2: CAP Theorem classification of NoSQL Databases

As an outcome of the CAP theorem, various other consistency models were proposed. BASE, a consistency model for distributed systems, was proposed by the proponent of the CAP theorem and is discussed in the next section.

2.7.1 BASE

The Basically Available Soft-state Eventual consistency model (BASE), suggested by Brewer, guarantees that after a specific time, all replicas will have received the update [66]. In BASE, the consistency part of the ACID properties of transactions is deliberately relaxed. Replicas do not necessarily need to have a consistent view of a data item. To guarantee that after a specific time replicas
would have received all updates, there is usually an exchange of messages between the replicas. There are various protocols used to implement this exchange of messages; these protocols are discussed in later sections. NoSQL databases adopt this model of consistency. Under BASE, various levels of consistency guarantees exist. The following are examples illustrating the different consistency levels that can be applied [67].

Strong consistency – After an update, all replicas immediately return the updated value. This provides ACID-level consistency guarantees. It is usually achieved by ensuring synchronous replication.

Weak consistency – The system does not guarantee that subsequent accesses will return the updated value until a number of conditions are met, such as the expiry of a time frame within which all replicas will have been updated. These systems offer BASE-level guarantees.

Eventual consistency – This is a form of weak consistency. In the absence of failures, the maximum time for which replicas are allowed to hold inconsistent values can be determined based on certain factors such as the number of replicas. In this work, a different consistency model is proposed which follows the eventual consistency model (for asynchronous replication) but provides a stronger consistency guarantee for operations.

In [67], a quorum model for measuring the trade-off between availability and consistency among replicas is defined and explained. The following definitions hold:

N = Number of replicas
W = Number of replicas needed to accept a write (write quorum)
R = Number of replicas needed to reach a read quorum

A distributed storage system that needs high availability and performance typically uses the configuration N = 3 (W = 2, R = 2). For such systems, a read quorum will always overlap with the write quorum, thus guaranteeing stronger consistency. However, for systems that serve very high read volumes and want to ensure a high tolerance to partitions and high availability, N can be as high as 10 while R can be as low as 1. This way availability is always guaranteed but consistency is very
weak. If R = 1 and W = N, the system is highly optimized for reads, and if W = 1 and R = N, the system is optimized for very fast writes. In summary, for most eventually consistent databases, the configuration used is W + R <= N. This configuration implies that read and write sets do not necessarily overlap.
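
The quorum configurations discussed above can be checked with a small helper that tests whether R + W > N, i.e. whether read and write quorums are guaranteed to overlap. The configurations listed are only the illustrative examples mentioned in the text.

def quorums_overlap(n: int, w: int, r: int) -> bool:
    """R + W > N guarantees that every read quorum intersects every
    write quorum, so a read always sees the latest acknowledged write."""
    return r + w > n

configs = {
    "balanced (N=3, W=2, R=2)":     (3, 2, 2),    # strong: quorums overlap
    "read-heavy (N=10, W=10, R=1)": (10, 10, 1),  # read one, write all
    "write-fast (N=10, W=1, R=10)": (10, 1, 10),  # write one, read all
    "eventual (N=3, W=1, R=1)":     (3, 1, 1),    # W + R <= N: may read stale data
}
for name, (n, w, r) in configs.items():
    print(f"{name}: strong reads = {quorums_overlap(n, w, r)}")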

2.7.2 Other Consistency Models

There are various other models of consistency that have been suggested in the literature. A consistency model that allows users to determine the level of transaction consistency is proposed in [68]. Users are provided with three levels of consistency from which they can choose. Users are also not restricted to a particular consistency guarantee and can switch their consistency guarantees at runtime depending on their needs. In timeline consistency [60], the order of updates is preserved on all replicas, albeit asynchronously.

2.8 SUMMARY

Though conventional databases and their transaction management techniques differ from cloud and NoSQL databases, they do provide the basis for cloud and NoSQL databases and their transaction management.

This chapter therefore discussed the fundamentals of database systems and transaction management techniques and protocols. The chapter also explained how transactional properties are guaranteed in distributed database systems, particularly focussing on the two phase commit protocol (used to guarantee transaction recovery). The characteristics of big data as well as the techniques used to manage big data were also discussed. Finally, an analysis of the CAP theorem was presented.

Table 2.1 summarises the main differences in techniques employed by NoSQL cloud databases and classical relational databases.

Table 2.1: Main Differences between NoSQL and Classical Databases

FEATURE | NOSQL DATABASES | CLASSICAL RELATIONAL DATABASES
CONSISTENCY | Eventual consistency | Strong consistency
SCALABILITY | Horizontally scalable | Vertically scalable
REPLICATION | Multiple replicas | Primary-secondary replication
AVAILABILITY | Always available | Can be unavailable during upgrades
TARGET APPLICATION | Social networking, emails, big data applications | Banking applications, OLTP
DATA MODEL | Unstructured data with dynamic schema | Structured data with defined schema and referential integrity between tables
QUERY LANGUAGE | Varies with databases (mainly programming languages) | Structured Query Language (SQL)
TRANSACTION PROCESSING | No transactions | Yes (ACID transactions)
QUERY COMPLEXITY | Simple 'get' and 'put' operations; no support for table joins | Supports complex queries and table joins
TOLERANCE TO PARTITION | Yes | No
ARCHITECTURE | Loosely coupled | Tightly coupled (monolithic)

The next chapter reviews related work on cloud and NoSQL databases and transaction management techniques and protocols.

CHAPTER 3

DATA PROCESSING IN CLOUD COMPUTING

One of the key characteristics of cloud computing is the impression given to consumers that computing resources are inexhaustible and available on demand [69]. Providing these kinds of services involves the use of hundreds or even thousands of commodity machines. In these environments, failures are not a rare occurrence, as some of these systems are expected to handle a very high write throughput, such as billions of writes per day. They are also expected to scale with the number of users. In the presence of these failures, availability must be preserved because it is a key component of cloud computing, as consumers must have access to computing resources and data. To guarantee availability, these systems make use of techniques such as partitioning and replication. As described in Chapter two, traditional databases do not scale well due to the complexity of their data model [70]. The characteristic of a database that allows it to scale across multiple systems is called scalability. NoSQL databases are highly scalable and, as such, a natural fit for the cloud environment. They are designed to scale up and scale down as and when the need arises. However, for a database to be scalable, certain design trade-offs are made.

This chapter therefore examines in section 3.1 the various architectural designs that vendors consider when designing their database. Section 3.2 gives examples of how these designs are implemented in some of the popular NoSQL databases highlighting their strengths and weaknesses. Furthermore, section 3.3 explains the state of the art approaches used in implementing transactions in NoSQL databases. Finally, section 3.4 gives a comparative analysis of the existing implementations and identifies the main research issues which are to be addressed in this thesis.

3.1 ARCHITECTURAL CONSIDERATIONS of CLOUD DATABASES

In designing a cloud or NoSQL database (in this thesis, cloud databases and NoSQL databases are used interchangeably), certain architectural considerations must be taken into account based on the requirements of the application which is to be hosted in the cloud. Different cloud vendors have come up with different decisions about cloud set-up, databases and related cloud applications. Thus, the various design decisions taken by cloud vendors on the operations of their systems have implications for the characteristics and properties of their databases and the kind of applications that can be managed by such databases. The design decisions taken by the vendors also determine factors such as the consistency of the system and the types of operation the database can handle. This in turn has an impact on the ability of the database to manage transactions.

Some of the commonly adopted architectures and models are discussed below.

3.1.1 Loose Coupling VS Tight Coupling

Traditional databases are monolithic (i.e. tightly coupled) in their design. Being monolithic means there is no separation of nodes or components of the database. For instance, the file system, transaction manager, metadata and storage are all tightly coupled in a single node. Such a design decision makes it difficult to shard the system and can result in system downtime (unavailability) during an upgrade or maintenance [43]. NoSQL databases tend to separate system state from application state in order to provide high scalability [71]. System state includes metadata management, which is crucial to the functioning of the system, while application state refers to the actual data of the application being managed [72]. For instance, the Google stack maintains a loosely coupled architecture consisting of GFS (file tier) [54], BigTable (record manager) [73] and Megastore [51] (for transactions), and makes use of the Chubby lock service [74] to manage the system state. Also, each cluster has a master server that manages placement of data on the other chunk (tablet) servers. Google also makes use of MapReduce [75] to process data on a large scale. PNUTS [60] also makes use of a loosely coupled architecture consisting of a tablet controller and message broker
to manage system state. The tablet controller manages the location and relocation of tablets (shards), while the message broker manages the mapping of tablets to their replicas. MongoDB consists of three server roles: the router (called mongos), which is used for routing requests to the cluster; the configuration server, which saves cluster metadata; and the database server (called mongod), which is used to store application data.

However, loose coupling also has its disadvantages. Communication and interaction among the various components of the system can introduce network latencies, network partitions and high traffic that may consume network bandwidth. Nevertheless, this overhead is considered inconsequential when the performance gains of parallel processing are taken into consideration.

3.1.2 Share Nothing VS Shared Disk

Most cloud databases run a shared-nothing architecture. In a shared-nothing architecture, each node is responsible for a subset of the entire data and does not share any hardware component with other nodes. This enhances scalability and parallel processing. Shared-nothing systems are known to scale faster and increase availability as there is no single point of failure [76]. Alternatively, it is possible to have multiple processing nodes share a single storage disk, which is known as shared disk. In spite of its advantages, the shared-nothing architecture comes with numerous maintenance issues, which include load balancing among nodes, a complex two-phase commit algorithm across nodes, and request routing, amongst others [77].

3.1.3 Data Model

As an implication of horizontal scale-out (sharding), NoSQL databases generally implement simple data models and can support only simple single-key operations [78]. There are no table joins or referential integrity between the entities stored in NoSQL databases. The choice of data model has crucial implications for the type of queries (operations) that the application will be able to
perform. It is therefore important to consider the application domain that the database is intended to serve when designing a NoSQL data model.

3.1.4 Concurrency Control Techniques

As stated earlier, concurrency control techniques help to maintain consistency and isolation in a database. NoSQL databases generally follow an eventual consistency model, which means replicas could be out of date for a period of time. Resolving conflicts among replicas is therefore important for the system to achieve consistency, albeit eventually. The choice of where and how to resolve conflicts is a critical issue. Key-value NoSQL databases tend to leave the task of conflict resolution to the application, which places a considerable burden on application designers. Dynamo [50] uses object versioning [79] to manage conflicts at the application level. Spanner, a transactional database designed at Google, uses the Paxos protocol [15] to manage concurrency and prevent conflicts, while GFS uses namespace locking to manage concurrency. Some systems, like ReTSO, use snapshot isolation (discussed later in section 5.1.1) to avoid conflicts.
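
Object versioning of the kind used by Dynamo is commonly realised with vector clocks. The following sketch illustrates how conflicting replica versions can be detected, under the assumption that each version carries a per-node update counter; it is not taken from any of the systems named above.

def descends(a: dict, b: dict) -> bool:
    """True if vector clock `a` has seen everything that `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def reconcile(v1, v2):
    """Return the newer version, or report a conflict that the application
    (or a rule such as last-writer-wins) must resolve."""
    if descends(v1, v2):
        return "v1 is newer or equal"
    if descends(v2, v1):
        return "v2 is newer"
    return "concurrent versions: conflict must be resolved"

# Two replicas updated the same object independently: a conflict.
print(reconcile({"n1": 2, "n2": 1}, {"n1": 1, "n2": 2}))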

3.1.5 Replication

In the cloud environment, a choice has to be made on whether database systems should use a synchronous (eager) or asynchronous (lazy) replication mechanism [80]. Synchronous replication ensures that all replicas have the same view of data at all times. This ensures strong consistency across replicas (i.e. one-copy serializability) but can affect availability if any one of the replicas is not available. Also, replication can be either local or geographic. Cloud database vendors need to decide which of them will best suit their applications. PNUTS uses asynchronous geographic replication while BigTable supports eventually consistent replication across geographic clusters. There is also a decision to be made about which, or how many, servers can respond to read and write requests. Relational databases mainly make use of a primary-secondary replication algorithm in which the primary server responds to read and write requests. The secondary replica only becomes active when the primary replica is down. In distributed
databases, where the replication factor is usually higher (say 3 or more), a quorum approach is implemented. There are various mechanisms used to arrive at a quorum so as to guarantee consistency when responding to read requests [14] [81] [82].

3.1.6 Master-Slave VS Peer to Peer Architecture

Generally, there are two main architectures used in cloud databases: master-slave and peer-to-peer [83]. In a master/slave approach, the master manages the system state and request routing, and is aware of changes across all the nodes in the cluster. The master is also responsible for deciding the location of data and for failure detection. In a peer-to-peer architecture, all servers have the same role and each server manages its own metadata and system state. Peer-to-peer servers are usually aware of information on other servers. Google systems make use of a master-slave approach, with one server (the master) managing both metadata and data leases during updates on behalf of the system. Cassandra [57] and Dynamo [50], on the other hand, make use of a peer-to-peer approach where all servers have equal roles and duties. Dynamo controls traffic by storing routing information in each node; a request is sent to any node in the cluster and re-routed to the node that manages the data. Cassandra also stores node information in ZooKeeper [84] for recovery in the event of a failure. The master-slave approach is prone to certain issues, such as bottlenecks when traffic is high [85] and outages in the event of a master failure. However, the master/slave approach is able to manage consistency easily and guarantees that requests are directed to the replica that has the most recent version. Systems that implement a peer-to-peer architecture, on the other hand, find it more challenging to guarantee consistency. The only way to guarantee that operations are directed towards the replica with the most recent version of data is by implementing synchronous replication, which requires more effort [83].

3.1.7 Query Processing Approach

In relational databases, the Structured Query Language (SQL) is recognized as a generalized language for querying databases. In NoSQL databases, there is no
generally accepted query language. Also, NoSQL databases leave complex query processing to the application level and only offer simple operations. For instance, BigTable makes use of MapReduce [75] for processing jobs, which can be implemented using different programming languages. Key-value databases tend to implement simple (put, get and delete) operations because of their data model. MongoDB [86] makes use of JSON statements for querying collections (tables) via a JavaScript shell and also has driver support for querying the database using the APIs of other programming languages [37]. In NoSQL databases, data processing paradigms are categorized into three main groups, namely batch processing, real-time processing and hybrid computation. The batch processing paradigm is appropriate when dealing with large volumes of data while real-time processing is appropriate for dealing with data coming in at high speeds and in real time. Hybrid computation is a combination of real-time and batch processing loads.

Another important issue is how queries are routed to individual nodes. Since NoSQL databases make use of shared-nothing architecture and a single database can be scaled across thousands of nodes, there must be a proper and efficient mechanism for routing requests. In BigTable, queries are routed through the master server which stores information of node location. The master then directs the request to the specific node hosting the requested data item. In MongoDB cluster, requests are handled by the routing server which is also known as the mongos. The mongos receives and directs incoming requests to the appropriate shard or node. This is not usually the case with relational databases. Relational databases are monolithic and there are no specific roles among database nodes. Requests and metadata management are handled by the same node hosting the database.

3.1.8 Read Optimised VS Write Optimised

The method of writing to disk affects performance and determines whether the system is optimised for reads or writes. In BigTable, writes are appended to a single file (SSTable) per server and these files are immutable. This decision implies that BigTable is optimized for writes, as a write operation is simply appended to the end of a single file. A read operation, on the other hand, would have to scan through
the file to be able to locate the data. Indeed, the results in [73] show that BigTable has a higher write throughput than read throughput. Dynamo [50] is optimized for reads because writes always involve disk seeks whereas a read request does not always involve disk access.
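
The trade-off between append-only (write-optimised) storage and scan-based reads can be sketched as follows. A Python list stands in for the immutable file; this is an illustration of the idea, not of BigTable's actual SSTable format.

class AppendOnlyStore:
    """Write-optimised storage in the spirit of an append-only file:
    writes are O(1) appends, reads scan the log for the latest value."""

    def __init__(self):
        self.log = []                       # stands in for an immutable file

    def write(self, key, value):
        self.log.append((key, value))       # just append; no seek needed

    def read(self, key):
        for k, v in reversed(self.log):     # scan from newest to oldest
            if k == key:
                return v
        return None

store = AppendOnlyStore()
store.write("row1", "a")
store.write("row1", "b")                    # newer version appended, not overwritten
print(store.read("row1"))                   # 'b' found only after scanning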

3.1.9 Latency VS Durability

When data is persisted to disk, it achieves durability. However, the overhead associated with disk I/O incurs a higher level of latency. If data is not persisted to disk first, the throughput increases but this can affect the durability of transactions. Some databases are designed to store their data primarily in memory in order to achieve high speed. Such databases are known as main memory databases (MMDB) [87]. In MMDBs, a copy of the database is usually also stored on disk. Most web applications make use of Memcached [88], an open source distributed memory caching system which reduces the load on a disk-based database by caching some of the information in memory.
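
The way a memory cache such as Memcached reduces the load on a disk-based database is usually a cache-aside pattern, sketched below. A plain dictionary stands in for the cache, and slow_db_read is a hypothetical disk-backed lookup introduced only for the example.

cache = {}                     # stands in for Memcached

def slow_db_read(key):
    """Hypothetical disk-backed lookup; the expensive path we want to avoid."""
    return f"value-of-{key}"

def cached_read(key):
    """Cache-aside: serve from memory when possible, otherwise read the
    database once and remember the result for subsequent requests."""
    if key in cache:
        return cache[key]              # fast path, no disk I/O
    value = slow_db_read(key)          # slow path, hits the database
    cache[key] = value
    return value

print(cached_read("user:42"))   # miss: goes to the database
print(cached_read("user:42"))   # hit: served from memory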

NoSQL database vendors generally implement a combination of the architectures discussed above in their database designs. This combination determines the characteristics and properties of the database. It also determines the availability and consistency guarantees that the database can provide, which in turn determine the suitability of a database for an application. The next section analyses a few of the popular NoSQL databases, their architectural implementation and the impact of these designs on their properties.

3.2 NOSQL DATABASES AND BIG DATA

One of the primary objectives of NoSQL databases is to store and manage big data. But as data increases, it becomes more difficult for a single node (of a NoSQL database) to process big data [89]. To efficiently process data, NoSQL databases make use of the parallel processing paradigm (i.e. horizontal scaling).

To be able to scale data, the data is de-normalized and spread across multiple systems. In order to ensure that requests are always served (high availability), NoSQL databases implement weak consistency models. These decisions have the following implications for the operations of NoSQL databases [11].

• De-normalization of data implies that there is no referential integrity among data entities. Thus, a simple data model is adopted.
• Unlike relational databases, there is a lack of support for join operations in NoSQL databases. This prevents the implementation of complex queries; NoSQL databases generally offer only simple queries.
• The flexible schema adopted by NoSQL means rows can have different attributes, i.e. there is no strict table or database schema. This makes NoSQL databases ideal for supporting unstructured data but not structured data.
• The relaxed consistency model means there can be no support for transactions. NoSQL databases cannot support the ACID properties of transactions and are therefore inappropriate for applications that need strong consistency.

NoSQL databases are generally classified into four main types which are explained below [90].

Document-Oriented databases – Document databases store data in XML, JSON or BSON formats. Tables are referred to as collections and each row is called a document. Documents can contain multiple field attributes, and each document can have different attributes, implying that document databases have a flexible schema. Every document is indexed and has an associated key which is used to identify the document. Examples of document databases include MongoDB and CouchDB.

Key-Value stores – These are the most basic form of NoSQL database. Each record is stored as a key-value pair where the value could be a BLOB object that the database stores without necessarily knowing what type of data is inside the value. Examples of key-value stores include Dynamo, Riak and Redis. Key-value stores support only basic operations such as get and put.

Column stores - Column databases store data in columns and each row can have a different number of columns. Furthermore, column stores introduce what are known as column families, where data that is associated together can be grouped to form a column family. Column databases can be used to store structured and semi-structured data. Examples of column databases include BigTable, HBase and Cassandra.

Graph databases – These databases are used to represent objects and relationships that exist between the objects. A node represents an object and the edges represent the relationship between objects. In graph databases, the relationships represent an important aspect of the database and provide the value that can be derived from the database.

Each of these classes of NoSQL database is better suited to some applications than others [91]. The next section examines some of the common NoSQL databases and their characteristics.

3.2.1 BIGTABLE

BigTable is a NoSQL database designed by Google that stores data for applications such as Google Earth and Google Finance. It belongs to the column-oriented class of NoSQL databases. In BigTable, data is ordered lexicographically by row key and partitioned across different nodes on the row keys using range partitions. This makes it suitable for applications that make use of sequential reads and writes. Data is indexed by row key, column key and timestamp, and tables are stored in the SSTable file format in GFS. The application state is managed by the Google File System (GFS). GFS divides its files into large chunks of 64 MB and uses a master-slave architecture. Files are also replicated across multiple slave servers. The master monitors the activities of other servers, detects failures among nodes (e.g. using regular HeartBeat messages) [54] and manages metadata, but it is not directly involved in reads and writes. During failures, the process of re-replication is prioritized based on factors such as how many replicas of the data are alive. GFS uses a relaxed consistency model that supports most of its applications. BigTable also uses the Chubby service [74] to manage its system state. BigTable allows users to group sets of columns frequently accessed together into locality groups
to speed up data scans. When a client makes a request, the request is forwarded to the master. The master is aware of the location of all data items and responds to the client with the location information of the needed data item. The client then pushes all the updates to the required node and its replicas and proceeds to write the data.

A perceived weakness of BigTable is that it is not very efficient for applications that have complex and evolving schemas. It is also limited in providing wide-area replication [32, 49]. Due to these limitations, Spanner was designed. Also, BigTable implements an eventually consistent model across replicas; hence it is not ideal for applications that have strong consistency requirements.

3.2.2 MONGODB

MongoDB belongs to the document class of NoSQL databases. It stores data as documents in a binary representation called BSON. Documents are organized into a table-like structure, which is referred to as a collection. MongoDB, like most other NoSQL databases, has a flexible schema model. MongoDB supports three types of partitioning (or sharding), namely range-based, hash-based and tag-aware sharding. In tag-aware sharding, the user specifies a configuration for grouping key ranges together. MongoDB automatically balances load in the cluster. Each data item (document) has an ID which can be indexed to enable faster queries. A MongoDB cluster consists of three server roles, which are the router (mongos), the configuration server and the database shards (mongod) or replica sets. A database shard stores a subset of the data, the config server stores metadata and information on data locations, while the mongos server acts as a router and routes requests (reads and writes) from the application to the shards. During operations, queries are sent to the router server (mongos). The mongos server then directs the query or update to the shard that stores the data. The mongos gets information on data location from the configuration server and caches it; the mongos itself has no persistent state. MongoDB uses a write-ahead log called the journal to ensure durability and recovery. The number of replicas is configurable and MongoDB sets one of the replicas to be the primary while the others are secondary.
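
The routing behaviour described above can be seen from the client side in a short pymongo sketch. It assumes a sharded cluster whose mongos router listens on localhost:27017; the database name, collection name and document are invented for the example.

from pymongo import MongoClient

# The application connects to the mongos router, never to a shard directly;
# the host, port, database and collection names here are assumptions.
client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

# Insert a document; mongos consults the config server's metadata and
# routes the write to the shard that owns this document's key range.
orders.insert_one({"_id": "order-1001", "customer": "c42", "total": 99.50})

# Point query by _id: routed to a single shard.
print(orders.find_one({"_id": "order-1001"}))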

3.2.3 DYNAMO

Dynamo [50] is a key-value store designed as a storage solution for the Amazon Simple Storage Service (Amazon S3). It provides a BASE-level consistency model but no isolation guarantees. Dynamo uses a peer-to-peer architecture where all nodes have equal responsibility and, as a result, there is no single point of failure. Each node has information about some of the other nodes in its range. Dynamo uses consistent hashing as its partitioning algorithm [92] and depends on the client for reconciling different versions of an object. It uses 3 replicas by default and uses a gossip-based membership protocol [93] for failure detection. In a gossip-based protocol, nodes exchange information with each other, allowing for failure detection when there is no response from a node. Operations in Dynamo are limited to single-key operations, which include get(key) and put(key, context, object), where context represents metadata; Dynamo does not support transactions. This is in sharp contrast with BigTable [73], which stores system metadata at the master. Dynamo is used to manage Amazon's shopping cart application and it ensures that customers never lose any item they place in a shopping cart. To achieve this, each key item has a preference list of the top N nodes that host replicas of that key. When there is a node failure, read and write operations can still continue on any of the nodes in the preference list for that key item. Dynamo then makes use of a form of object versioning technique that merges data in divergent replicas (which could be a result of failure) to ensure that no item is lost. Dynamo uses the formula R + W > N to maintain consistency among replicas (N = minimum number of nodes that store the object, R = minimum number of nodes involved in a read and W = minimum number of nodes involved in a write).
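
Consistent hashing and the per-key preference list can be illustrated with the following minimal ring, assuming MD5 as the hash function and invented node names; production implementations typically add virtual nodes and other refinements.

import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: nodes and keys share one hash space;
    a key is owned by the first node clockwise from its position, and its
    preference list is the next N distinct nodes on the ring."""

    def __init__(self, nodes):
        self.ring = sorted((_hash(n), n) for n in nodes)

    def preference_list(self, key: str, n: int = 3):
        start = bisect.bisect(self.ring, (_hash(key), ""))
        picked = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in picked:
                picked.append(node)
            if len(picked) == n:
                break
        return picked

ring = HashRing(["nodeA", "nodeB", "nodeC", "nodeD"])
print(ring.preference_list("cart:alice"))   # the 3 replica holders for this key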

Dynamo has some perceived weaknesses. It was designed as an in-house database to be used only in a trusted environment and hence has limited security mechanisms. Another weakness of Dynamo is that it relies on application logic to resolve conflicts, as it can only provide eventual consistency. The designers have chosen to view this as a strength because it allows application designers the flexibility to determine the logic that works for their applications. But in reality it is a weakness, as application designers have the extra responsibility of programming
consistency logic. Also, operations in Dynamo are optimized to handle small data objects typically less than 1 MB.

3.2.4 CASSANDRA

Cassandra is another storage solution, designed by Facebook to meet reliability and scalability needs as well as to handle the very high write throughputs typical of Facebook's applications. It belongs to the column class of NoSQL databases. It also runs on cheap commodity hardware. Cassandra uses a form of weak consistency model. Its architecture is a mixture of features from BigTable and Dynamo: its data model is very similar to BigTable's, but it uses a peer-to-peer architecture like Dynamo. It also makes use of consistent hashing as its partitioning algorithm (as in Dynamo). Cassandra introduces features such as the super-column and column family, and data is accessed using the arrangement column-family: super-column: column. Cassandra uses a system called ZooKeeper that stores data placement information and metadata in a fault-tolerant manner. This helps a recovering node know the ranges of data it is responsible for.

3.2.5 PNUTS

PNUTS [60] is a distributed database system used for Yahoo!'s web applications. PNUTS uses a data model similar to that of relational databases. The design requirements for this system were to achieve high scalability (a key requirement for web applications), low latency in accordance with Yahoo!'s SLA, high availability and fault tolerance. To achieve high scalability, there is no referential integrity enforcement across tables. Most of Yahoo!'s applications manipulate one record at a time and support a relaxed consistency model. In PNUTS, data is asynchronously replicated over geographic locations. Therefore, on the CAP spectrum, PNUTS favours partition tolerance and high availability, sacrificing consistency. The implementation of PNUTS requires two additional machines to serve as the configuration server and the router. The router server directs requests to the particular server that hosts the data. The configuration server, also called the tablet controller, stores and manages the mapping of data to the respective servers. PNUTS uses
the Yahoo Message Broker (YMB) to manage consistency. When a data item is published to the YMB, it is considered committed. Updates are then asynchronously propagated to other replicas. To manage inconsistencies across replicas, PNUTS uses the timeline consistency discussed earlier in section 2.7.2. To achieve this, each data item has a nominated master replica. Only the master can accept update requests from clients. Each version of a data item also has an increasing sequence number, which is used to identify outdated data.
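
The effect of the per-record sequence numbers can be sketched as follows. This is only an illustration of rejecting outdated versions at a replica, under assumed version numbers, and is not PNUTS code.

class TimelineReplica:
    """Accepts only versions newer than the one it already holds, so every
    replica moves forward along the same timeline of versions."""

    def __init__(self):
        self.value = None
        self.seq = 0                 # sequence number of the version held

    def apply(self, seq, value):
        if seq <= self.seq:          # outdated or duplicate update: ignore
            return False
        self.seq, self.value = seq, value
        return True

replica = TimelineReplica()
replica.apply(1, "v1")
replica.apply(3, "v3")               # replica may move ahead, but never back
print(replica.apply(2, "v2"))        # False: version 2 is already outdated
print(replica.seq, replica.value)    # 3 v3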

As mentioned earlier, NoSQL databases do not offer support for transactions. Also, a review of the existing databases shows that these systems can only support simple operations [94]. However, in recent times, efforts have been made to implement transactions in NoSQL databases. The next section examines the various ways in which this has been achieved.

3.3 TRANSACTIONS IN CLOUD DATABASES

In cloud environments, partitioning a database into shards improves performance and availability. But it also makes transaction processing more complex, particularly for transactions involving multiple data items. This is because a single database is split up across multiple nodes and each node is responsible for processing its own data (shared-nothing), without a centralised coordinator that can coordinate between the different nodes of a NoSQL database. This is contrary to the classical distributed database, wherein operations can take place between two or more different database management systems and are usually coordinated by a single database (known as the coordinator) using protocols such as the two-phase commit protocol. In the cloud, coordinating operations in a single database that scales across multiple systems (or nodes) is a challenging task [95].

Various approaches have been developed in order to solve the problem of transactions involving multiple data items in NoSQL databases. These approaches can be divided into three main categories [96] which include: (i) integrated approach, (ii) middleware approach and (iii) API approach. These approaches are discussed and analysed in the following sections.

3.3.1 The Integrated Approach

This approach involves building transaction support into the cloud data store. In other words, NoSQL databases should be designed such that there is built-in support for transactions. Examples of cloud databases that use this approach include Spanner [97] and COPS [98]. These are discussed below.

3.3.1.1 Google Spanner

Spanner [97] is a globally distributed database built by Google to support consistent transactions across a globally distributed environment. Spanner was designed to improve on the shortcomings of BigTable [73] and to be able to manage applications that have a complex structure and need strong consistency. Spanner is accessed through an API that implements read-only, read-write and snapshot-read transactions. It provides support for externally consistent reads and writes and globally consistent reads at a specific timestamp. Externally consistent transactions guarantee that they will always receive current information [99]. Every deployment of Spanner is implemented in an abstraction called a universe. A universe is divided into zones. Each universe consists of a universe master and a placement driver, while each zone consists of one zonemaster, one location proxy and up to thousands of spanservers. The universe master provides status on zones and the placement driver oversees the movement of data across zones. A timestamp is ascribed to data on commit (meaning that there can be multiple versions of a data item) and every spanserver in each replica maintains a lock table for concurrency control. Spanner implements a timing mechanism API called TrueTime which uses GPS and atomic clocks to measure timing. Each datacentre contains one time master and each machine in a datacentre has a timeslave daemon. Time master machines regularly compare their times against each other to ensure synchronization between them. The timeslave on each machine checks its time against a number of masters, and any machine whose local clock drifts beyond a given threshold is evicted. With this
timing in place, Spanner can control transactions between spanservers using the start timestamp and commit timestamp of transactions. Each set of replicas is referred to as a Paxos group and each group has a leader. During transactions, one of the participant group leaders is chosen as the coordinator of that transaction. Spanner introduces the concept of hierarchy at the level of the table to enforce relationships. The top-level table is referred to as a directory table and each row key in a descendant table starts with the key of the directory table.

3.3.1.2 Cluster of Order Preserving Servers (COPS)

COPS [98] is a database that provides causal+ consistency over a wide-area distribution. Causal+ is defined as a combination of causal consistency and a convergent conflict handling mechanism. Causal consistency ensures that the causal dependencies between the data (keys) in a database are preserved. The conflict handling mechanism guarantees that replicas never remain permanently divergent by applying update operations in the same order across all replicas. In order to achieve this, COPS introduces two variables known as 'version' and 'dependency'. Each data key can have multiple versions and is denoted as key_version. Updates to replicas always produce an increasing (or later) version of a key to preserve causal consistency. This is referred to as the progressing property in COPS. The dependency variable, on the other hand, refers to the ordering. For instance, if in a data store X_i precedes Y_j, then Y_j depends on X_i. COPS can then enforce causal consistency by ensuring that an update is written only after all its dependencies (dependent keys) have been written. Therefore, the key version is used to enforce ordering among different versions of a data (or key) item, while the dependency is used to enforce order across different keys. Convergent conflict handling ensures that conflicting operations are handled in the same manner across all replicas, so that the outcome of the conflict is the same across all replicas. This is achieved by using well-known techniques such as the last-writer-wins rule.

Each cluster is operated as a strongly consistent key-value store with the key space partitioned among nodes. A datacentre contains two replicas of the cluster. One of the replicas is known as the local (primary) cluster of a datacentre and the
other is a secondary replica. The local cluster is strongly consistent with its replica (synchronous replication) in the datacentre, while replication between clusters in different datacentres is asynchronous. The term 'equivalent nodes' is used to refer to the set of primary nodes across all clusters. When a write (or update) operation is performed on a local primary node, the updates are sent asynchronously in a queue to all equivalent nodes in other data centres. The equivalent nodes then enforce causal+ consistency in their update operations based on the information provided in the dependency list. COPS is implemented as a loosely coupled architecture which consists of two components: (i) a key-value database and (ii) a client library that exports 'get' and 'put' operations. The key-value database also stores metadata such as the key version number and implements slightly more complex operations such as 'get_by_version', 'put_after' and 'dep_check'. These features enable COPS to maintain causal+ consistency despite its asynchronous model of replication. COPS-GT is a flavour of COPS with an extra operation referred to as 'get_transaction'. In COPS-GT, each key is mapped to a version and a dependency value of the form key -> <version, dependency>, where the dependency, which is of the form <key, version>, tells the node which key a data item depends on. This helps it to implement ordering across keys. The client library of COPS-GT keeps information about dependencies in a 'context' parameter which is associated with every operation and is identified by a 'context_id' attribute. The context parameter is stored in a table and used to track dependencies across operations. A 'get' request will normally include a key and a dependency, while a 'put' request stores the version number in the 'context' parameter.
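
The behaviour of the 'dep_check' and 'put_after' operations described above can be illustrated with a simplified sketch. The stored versions, keys and dependency lists are invented for the example, and the real operations are considerably richer.

local_versions = {"photo": 3, "album": 7}   # versions already written locally

def dep_check(dependencies):
    """True only when every key this write depends on is already present
    locally at the required version (or a later one)."""
    return all(local_versions.get(key, 0) >= version
               for key, version in dependencies)

def put_after(key, version, dependencies):
    """Apply a replicated write once its causal dependencies are satisfied;
    otherwise hold it back (value storage is omitted in this sketch)."""
    if not dep_check(dependencies):
        return "deferred: dependencies not yet satisfied"
    local_versions[key] = version
    return f"applied {key} at version {version}"

# The comment depends on photo version 3 (present) and album version 8
# (not yet arrived), so the write is held back for now.
print(put_after("comment", 1, [("photo", 3), ("album", 8)]))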

COPS, like most systems, has its shortcomings. The process of enforcing causal ordering over a wide-area network is non-trivial and bandwidth intensive. This makes it impractical when the number of data centres involved is high. Also, failures in a datacentre could mean that updates not yet propagated to remote datacentres are permanently lost, since COPS uses asynchronous replication across datacentres.

3.3.2 The Middleware Approach

A second approach is to execute transactions on a cloud database using a middleware. Megastore [51], G-Store [100], CloudTPS [101] and Deuteronomy [102] are examples of this approach. The implementations of Megastore and G-Store are explained below.

3.3.2.1 Megastore

Megastore [51] was built by Google with the objective of achieving the scalability of NoSQL databases together with the consistency guarantees of traditional databases. The requirements that led to the design of Megastore include high scalability, a consistent view of data, low latency and high availability. Megastore uses the Paxos [15] algorithm to provide fault tolerance among replicas. In Megastore, data is partitioned into 'entity groups' and each entity group contains a set of keys which are synchronously replicated over a wide geographic area. Within an entity group, ACID properties are enforced. Operations across groups are asynchronous and are sent in a queue. For example, an email account forms an entity group in Megastore. Thus, operations within an email account would be ACID-level transactions but operations across email accounts make use of asynchronous messaging. Megastore uses BigTable as its back-end NoSQL database. To enforce relationships amongst tables in an entity group, Megastore makes use of a child-root table schema. Therefore, each child table must have a key that references its root table. Megastore tends to cluster keys that are read together and maps each entity to a single row, arranged in contiguous order in BigTable. The major difference between the data models of Megastore and a traditional relational database is in the way keys are physically stored. Since BigTable does not support table joins, the key of each row is derived by concatenating the keys of the child and parent tables in a row. Megastore exposes two types of indexes, namely a local index, used within an entity group, and a global index, used to search for entities when the entity group is not known in advance. Reads and writes can be processed from any replica and, as such, there is no notion of a fixed primary replica. This allows for higher availability and faster read and write operations, thereby
52

DATA PROCESSING IN CLOUD COMPUTING

reducing latency since applications can easily access the replica closer to them. This is achievable because Megastore makes use of Paxos to manage updates to replicas. The accessed replica represents the leader for that transaction and the logs are then replicated synchronously to a quorum of replica.

As noted earlier, ACID transactions are only achievable within an entity group. This is a known limitation of Megastore. Also, Megastore entity group membership is static in nature and, as such, keys that belong to an entity group must remain members of that group throughout their lifetime. This means that there can be no ACID transactions between keys that belong to different entity groups. This makes Megastore impractical for applications that need ACID operations across keys that do not belong to the same entity group. It is also unsuitable for applications that need dynamic grouping over their lifetime. An example of such an application is an online game that needs grouping of keys for the duration of the game alone. G-Store attempts to address this limitation. Megastore is also known to have relatively poor write throughput because of its synchronous replication within entity groups [97].
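To make the child-root key layout concrete, the sketch below shows one way a child row key could be derived by concatenating the root (entity group) key with the child key so that related rows sort contiguously in BigTable; the separator and key formats are assumptions, not Megastore's actual encoding.

# Illustrative only: derive a contiguous row key for a child entity.
def child_row_key(root_key: str, child_key: str) -> str:
    # Hypothetical separator; Megastore's real physical encoding differs.
    return root_key + "/" + child_key

# Rows of one email account (the entity group) then cluster together, e.g.
#   child_row_key("user:alice", "msg:0041")  ->  "user:alice/msg:0041"
#   child_row_key("user:alice", "msg:0042")  ->  "user:alice/msg:0042"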

3.3.2.2 G-Store

The design considerations for G-Store include high scalability, high availability and fault tolerance, as well as multi-key transactional access. G-Store builds on the same grouping concept as Megastore but uses a Key Grouping protocol [100] to group keys for applications that need multi-key transactional access. One distinguishing feature of G-Store, however, is that the grouping of keys is dynamic: a key can belong to different groups during its lifetime, but to only one group at any given time. Groups are formed by the applications. Keys of the same group are transferred to a single node for the period of their membership. This avoids the complexity involved in distributed synchronization. In G-Store, there is a concept of leader and follower keys. Every group formed has a leader, while the other keys of the group are known as follower keys. There are two phases involved, namely the group creation phase and the group deletion phase. The group creation phase is initiated when an application client chooses a leader. The leader in turn sends a join request to all members of the group after it has logged the list of members. The node where the leader is located is known as the owner of the group. Like 2PC, the group creation phase occurs in two steps. The basic protocol is highlighted below.

• Leader sends a Join Request {J} to the followers – to acquire ownership
• Followers respond with a Join Ack {JA} message

When all members have joined, the group creation phase terminates. After a group has been created, transactions can take place during the lifetime of the group. G-Store can only provide ACID transactions within the group, and transactions need not span more than one node. During the group deletion phase, ownership of individual keys is transferred from the leader back to each of the followers. The client sends a group delete request to the leader; the leader then logs the request and sends a delete request to all the followers. In the basic protocol, the followers do not need to respond to a delete request. The protocol was further optimized to deal with failure, concurrent group creation and recovery. In the optimised protocol, every group has a group id and a yield id for every operation, and G-Store logs this information using write-ahead logs, which is useful for recovery.
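The two-step group creation exchange can be sketched as follows; this is an illustrative, failure-free simulation with hypothetical class names, not G-Store's actual implementation.

# Illustrative, failure-free sketch of the basic group creation phase.
class Follower:
    def __init__(self, key):
        self.key = key
        self.owner = None

    def join_request(self, leader_key):     # receives {J}
        self.owner = leader_key             # ownership moves to the leader's node
        return True                         # responds with Join Ack {JA}

class Leader:
    def __init__(self, key, followers):
        self.key = key
        self.followers = followers
        self.log = [("members", [f.key for f in followers])]   # leader logs the member list

    def create_group(self):
        acked = {f.key for f in self.followers if f.join_request(self.key)}
        # Group creation terminates once all members have joined.
        return acked == {f.key for f in self.followers}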

G-Store can be implemented as a client-based library and also as a middleware on top of a key-value store. One limitation of G-Store's implementation of transactions is the high level of overhead incurred during the group formation stage.

3.3.3 The API Approach

A third approach is to provide transactional access to the data through an API, i.e. client applications access the data through an API that implements transaction semantics. Examples of this implementation include Percolator [103] and ReTSO [104].

3.3.3.1 Percolator

Percolator was built at Google partly to address the shortcomings of BigTable [18], which lacks support for multi-key transactions [32]. Percolator is designed to process incremental updates for web indexing. Percolator provides ACID transaction support with incremental computation. Percolator itself is made up of three components, namely: (i) a Percolator worker, (ii) a BigTable server and (iii) a GFS chunk-server. Every node in a Percolator cluster contains these three components. Percolator uses snapshot isolation to implement multi-key transactions and stores multiple versions of each key, using BigTable's timestamps to identify versions. Transactions make use of a distributed lock system and a timestamp oracle (TO). The timestamp oracle is a server that issues an always-increasing timestamp, which guarantees that transactions are properly ordered. To perform a write operation, a lock is requested on all the rows involved in the write. The client then uses the timestamp oracle to retrieve its commit time, after which it releases its locks. A transaction will abort if it sees a lock or a write that has occurred after its own start timestamp. Therefore, every transaction must contact the timestamp oracle twice, and the highest allocated timestamp is kept in stable storage. This increasing timestamp guarantees that a 'get' request will return only writes that committed before the transaction's start timestamp.
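A minimal sketch of the two-timestamp pattern described above, assuming a simple in-memory timestamp oracle and a per-row record of the last committed write (all names hypothetical; this is not Percolator's code):

import itertools

# Hypothetical, in-memory stand-ins for the timestamp oracle and row metadata.
_oracle = itertools.count(1)           # strictly increasing timestamps
_last_commit = {}                      # row key -> commit timestamp of the last write

def run_write_transaction(rows):
    start_ts = next(_oracle)           # first contact with the timestamp oracle
    # ... perform buffered writes against a snapshot taken at start_ts ...
    for row in rows:
        if _last_commit.get(row, 0) > start_ts:
            return "abort"             # a conflicting write committed after our start
    commit_ts = next(_oracle)          # second contact with the timestamp oracle
    for row in rows:
        _last_commit[row] = commit_ts  # make writes visible at commit_ts
    return "commit"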

Percolator lacks a global deadlock detector. This can cause an increase in latency when there are conflicting transactions. Therefore, Percolator is not ideal for environments that need extremely low latency.

3.3.3.2 ReTSO

ReTSO [104] is used to support client-side transactions for large-scale storage systems. ReTSO makes use of a centralized Transaction Status Oracle (TSO) to implement snapshot isolation. In snapshot isolation, transactions are carried out on a snapshot of the data taken at the time the transaction starts. ReTSO uses a system called BookKeeper [105] to persist write-ahead logs in order to achieve higher availability. Before a transaction starts, it must obtain a start timestamp from the TSO. The TSO manages incoming transaction requests and checks for conflicts between transactions. ReTSO generates a start timestamp and a commit timestamp for all transactions on all servers using a timestamp oracle in order to guarantee the integrity of the timestamps (i.e. timestamp ordering must be unique and incremental). The TSO stores the status of all active transactions and can be queried to verify the status of any transaction. A write request obtains a start timestamp from the TSO (the timestamp represents the version number of that data) and is saved in a 'PendingWrite' column in memory. Once the data is committed on the key-value store, the TSO generates a commit timestamp which is sent to the client. The client can then clean up its 'PendingWrite' column after updating its commit timestamp. In the case of an abort, the client deletes the data and also cleans up the 'PendingWrite' column. A read operation observes the last committed data before its own start timestamp.

Using a centralized Transaction Status Oracle can be a bottleneck in distributed systems. More importantly, when a transaction is trying to retrieve a commit time, the TSO may need to check its memory to be sure that there are no conflicting transactions. This can lead to long and unnecessary waits. ReTSO addresses this by limiting the amount of information kept in memory. ReTSO also uses write-ahead logs replicated across multiple dedicated storage devices (BookKeeper) to prevent loss of data. It should be noted that BookKeeper is dedicated to this task alone in order to achieve high performance.

3.4 ANALYSIS OF OTHER TRANSACTION MODELS AND PROTOCOLS

In addition to the above, various other approaches have been developed in order to implement transactions in highly scalable cloud databases. This section reviews and analyses some of the common approaches. In [106], transactions are implemented by adapting a relational database into the shared-nothing architecture used in NoSQL systems. In this approach, transactions are limited to execute on a single node, thereby avoiding the need for a two-phase commit protocol. A database contains a group of tables called a table group. Each node contains a subset of the entire database, which is a group of tables joined by a column known as the partitioning key. Each table group must be able to fit on a single node, such that the system can only provide ACID transaction support for data within a single node. Data is replicated in the cluster, but one replica serves as the primary replica, which is also used to handle updates.


The Deuteronomy approach is proposed in [102]. The architecture used in [102] is similar to the architecture proposed in this thesis; however, Deuteronomy makes use of data locks, which incur considerable overhead [107]. The Deuteronomy system contains two components: the transactional component (TC) and the data component (DC). The TC manages transactions and concurrency control, while data is stored on the DC. Transactions in Deuteronomy can span multiple DCs. Applications send their requests to the TC. Since the TC stores table information (metadata) and manages sessions, the TC knows which DC is hosting the requested data. When requests are submitted to the TC, the TC sends the operations to the required DCs. The DCs perform the necessary operations and the TC logs the operations after they have been completed.

ElasTras [108] [109] consists of three components: the Transaction Manager (TM), the Metadata Manager and the distributed storage layer, which does not support transactions. The storage layer manages issues such as replication and fault tolerance and implements an eventually consistent model of replication. The transaction manager is further divided into two layers, namely the Higher Level Transaction Manager (HTM) and the Owning Transaction Manager (OTM). The OTM has exclusive access rights to a subset of the data in the distributed storage layer and caches some of the data. Requests are sent to an HTM; the HTM then routes the request to the appropriate OTM in charge of the requested data. If the data is in the OTM's cache, the OTM performs the updates; otherwise, it requests the required data. The OTM also uses write-ahead logging to perform recovery. However, ElasTras can support only the mini-transactions defined in [110]. A mini-transaction allows users to atomically batch together updates and to conditionally modify data in multiple nodes. For instance, with mini-transactions, a transaction is regarded as committed only on the condition that all of its operations succeed. If any of the operations abort, the transaction is rolled back. This is called a conditional commit and can only work for certain types of applications.
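Purely as an illustration of the conditional-commit idea behind mini-transactions (hypothetical helper names, unrelated to the ElasTras code), a batch of updates is applied only if every precondition still holds:

# Illustrative mini-transaction: compare items, then conditionally apply a batch of writes.
def mini_transaction(store, compares, writes):
    """
    store    : dict acting as the key-value store
    compares : {key: expected_value} preconditions that must still hold
    writes   : {key: new_value} updates applied only if all preconditions hold
    """
    if all(store.get(k) == v for k, v in compares.items()):
        store.update(writes)           # conditional commit: apply the whole batch
        return "commit"
    return "abort"                     # any failed precondition rolls the batch back

# Example: a transfer succeeds only if the balance has not changed since it was read.
accounts = {"A": 100, "B": 50}
mini_transaction(accounts, compares={"A": 100}, writes={"A": 70, "B": 80})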

Warp [111] provides one-copy serializable transaction support implemented via a client library that supports linearizable transactions, i.e., transactions are executed in the order in which they arrive. The system consists of a client library, storage servers and a coordinator server which maps key ranges to storage servers. It implements a protocol that identifies conflicts in transactions by maintaining a dependency graph across transactions.

All of the systems described above have limitations in their ability to perform transactions. Most of the systems are unable to provide full ACID transactions; they only provide limited forms of the transaction properties, which makes them unsuitable for applications that need ACID guarantees.

3.5 DISCUSSION AND CONCLUSION

This chapter investigated the existing literature in order to provide an insight into the different aspects of data processing in the cloud environment, which include: architectural considerations and data models of cloud or NoSQL databases, the different types and characteristics of NoSQL databases, and the transaction management techniques developed for NoSQL databases. It is observed that multiple factors (such as architecture, data models, classes of databases, transaction models, etc.) have an impact on the performance, reliability, availability and consistency of NoSQL databases. It is also observed that each class of NoSQL database is better suited to some applications than others. For instance, BigTable, a column-oriented database, is used in Google Earth and Google Finance. Similarly, MongoDB is used in retail and online news applications such as eBay and Forbes online magazine.

This chapter then critically reviewed existing transaction management techniques which have been implemented in various industrial (commercial) NoSQL databases as well as in prototypical or research-based NoSQL databases. The review of the existing techniques showed that there is no single transaction management technique that provides all the desired features, such as improved performance, availability and consistency.

All of the approaches reviewed above share the main shortcoming (or trade-off) that they were designed with particular applications in mind. According to [94], NoSQL databases will not replace relational databases but are a better fit for certain applications. NoSQL databases generally offer only simple operations and will therefore work well only with certain applications. Dynamo [50], for instance, offers no data integrity guarantees and employs a very weak model of consistency. BigTable [73] has poor write performance and very limited query capabilities (as does Dynamo). Cassandra [57] is inefficient at processing ad hoc queries. A review of NoSQL systems that offer transaction support shows that most of the existing systems have limitations which make them unsuitable for certain applications [112]. These trade-offs, however, have an impact on the performance of the systems. In [113], the system makes use of data locks, a pessimistic concurrency control mechanism that reduces throughput and involves a high level of overhead. Megastore [51] requires data to be partitioned into groups and can only provide ACID transactions within a group. CloudTPS [101] makes use of two-phase commit, which can be cumbersome. G-Store [100] improves on Megastore by making groups more dynamic, such that any group can contain different data at different times, reducing the need for two-phase commit protocols. However, transactions are still limited to within a group. All these systems have certain performance issues that the proposed system intends to improve on.

As discussed above, there exist a number of research challenges in cloud and NoSQL databases. However, the work in this thesis focuses on the following main research issues.

• It has been observed that current research approaches and commercial NoSQL databases do not enforce strict consistency in big data processing and management. This is a consequence of providing high performance and high availability of big data in the current solutions.

• It has been identified that current designs of NoSQL databases do not support normalization and integrity constraints. This leads to the fact that complex queries and transactions are not supported by current NoSQL databases.

• It has also been observed that current solutions sacrifice support for transactions and the implementation of ACID properties in NoSQL databases in order to achieve scalability and efficiency.


The remaining chapters describe the proposed approach that aims to address the above research issues.


CHAPTER 4

MODELLING AND DESIGN OF THE PROPOSED APPROACH – NoSQL-TX

This chapter explains the theoretical model and design of the proposed transaction approach, which is referred to as NoSQL-TX. The acronym NoSQL-TX is inspired by WS-TX, which is used to describe web services transactions [114]. Section 4.1 defines and specifies the constraints of the proposed model. Section 4.2 explains snapshot isolation, which is the technique implemented by the proposed system. Section 4.3 describes the architecture of the system as well as the various components that make up the system. The approach of NoSQL-TX follows the middleware approach explained in section 3.3.2. The transaction state diagram, which models a transaction life-cycle, and the protocols for a commit operation in the multi-key transaction model are explained in section 4.4.

Section 4.5 explains the interactions that take place among the components of the system to execute a transaction. Section 4.6 explains the protocol followed by the system to perform a transaction commit and section 4.7 explains the various scenarios that can cause a transaction to abort. Finally, section 4.8 explains the protocol for replica management implemented by the prototype system.

4.1 NoSQL TRANSACTIONS

The ACID properties of transactions are the standard and most commonly used properties of database transactions. One of the objectives of the proposed model is to implement ACID properties in NoSQL databases. Before specifying the constraints and definitions of our NoSQL transaction model, a brief summary of the ACID properties is given.


Atomicity: Implies that all the operations in a transaction must be successfully executed or none of them must execute at all. In other words, a transaction is considered as an atomic unit of operations and the failure of any operation in a transaction means that all successful operations must be rolled back.

Consistency: The database must remain in a consistent state after the execution of a transaction. Therefore any updates made to data by a transaction must transform the database from one consistent state to another consistent state.

Isolation: Transactions must not expose their intermediate results to other concurrently running transactions. This means that the activities of a transaction must not affect the result of other on-going transactions.

Durability: Results of a completed transaction must be made permanent in the database store in order to provide fault tolerance in the event of failures.

Fundamental definitions of the proposed transaction model are given as follows.

Definition 4.1: A NoSQL transaction NST is defined as the execution of a (cloud) application which comprises different operations that provide transitions between (partially) consistent states of the shared data. Therefore, NST is a sequence of operations which are executed in a way such that all of them are successfully completed or none at all.

Definition 4.2: A NST is a multi-key transaction as it involves more than one data key item and one or more operations.

Recall that NoSQL databases do not perform multi-key operations. Rather, they support only simple single key operations such as a get() or put().

Based on the above, NST is formally defined as a tuple, NST = (OP, PaO), where OP is a set of operations, OP = {OPi | i = 1...n}, and PaO is a partial ordering of the operations which determines their order of execution. For instance, OPi < OPj represents that OPi is executed before OPj. The partial ordering comes from the fact that transactions do not necessarily commit in the order in which they arrive. Order is only enforced when there is a conflict between two transactions.

In the proposed model, OP1^r[DE] represents a read operation of NST, meaning that NST reads a data entity, DE, from a NoSQL database. Similarly, OP1^w[DE] represents a write operation of NST, meaning that NST writes (updates) a data entity, DE, to a NoSQL database. The read and write operations above (OP1^r[DE] and OP1^w[DE]) are used to model the CRUD (Create, Read, Update and Delete) operations which are most commonly implemented in NoSQL systems. Note that NoSQL databases adopt CRUD operations from traditional databases.

In the proposed model, OP1^r[DE] represents the Read (of CRUD) and OP1^w[DE] represents the Create, Update and Delete operations (of CRUD). OP1^r[DE] simply reads data without any modification to the data, while OP1^w[DE] writes data, meaning that data can be modified through a Create, Update or Delete operation. In addition to data read/write operations, NST is also associated with (control) operations: begin (or start), commit and abort. These are explained as follows.

Begin or start operation: The execution of each NST must be marked through a begin or start operation. That is, NST should begin first before any of its operations (OP = {OPi | i = 1...n} ∈ NST) can be executed.

Commit and Abort operations: Each NST terminates with either a commit or an abort operation. If NST is successfully executed then it terminates with a commit operation. If NST cannot be successfully executed then it terminates with an abort operation.

NST can be of type seq(Begin | OPi | Cmt | Abt), but with the condition that either commit (Cmt) or abort (Abt) occurs only once within the sequence. The events/scenarios that can lead to the system aborting a transaction are explained in later sections. Based on the above, the constraints on NST and its (begin, read, write, commit and abort) operations are specified below.

A NST comprises different read/write operations but can have either one commit or one abort operation. This is denoted as:

r r w w • NSTi = {Begin} {OP1 [DE], …,OPn [DE]} {OP1 [DE], …,OPn [DE]} {Cmti, Abti}

∪ ∪ ∪ 63

MODELLING AND DESIGN OF THE PROPOSED APPROACH – NoSQL-TX

Where DE is Data Entity.

• Cmti ∈ NSTi iff Abti ∉ NSTi, i.e., if NSTi is committed then the abort operation cannot be executed.

• Assume a (control) operation, cti, is Cmti or Abti (the transaction commits or aborts); then for any read/write operation OPi[DE] ∈ NSTi, OPi[DE] < cti. In other words, the commit or abort operation must come after the read/write operations.

• If OP1^r[DE], OP1^w[DE] ∈ NSTi, then such read/write operations should be ordered either as OP1^r[DE] < OP1^w[DE] or OP1^w[DE] < OP1^r[DE]. That is, the data entity, DE, should be read and written in a proper order.

• If Transaction A contains a data entity [DEα] and Transaction B also contains the same data entity [DEα], then both transactions cannot commit at the same time. In other words, the commit time, T, of Transaction A cannot be equal to the commit time of Transaction B, i.e., TA commit ≠ TB commit.
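Purely as an illustrative sketch (hypothetical class and field names; the formal model above is the authoritative definition), an NST and its "exactly one terminating control operation" constraint can be represented as follows:

# Illustrative representation of a multi-key NoSQL transaction (NST).
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class NST:
    ops: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)  # (kind, key, value)
    outcome: Optional[str] = None                                            # 'commit' or 'abort'

    def read(self, key):
        self.ops.append(("read", key, None))

    def write(self, key, value):
        self.ops.append(("write", key, value))      # models Create/Update/Delete

    def well_formed(self):
        # The transaction must end in exactly one of commit/abort and must
        # involve at least one read/write operation on a data entity.
        return self.outcome in ("commit", "abort") and len(self.ops) >= 1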

4.2 SYSTEM DESIGN APPROACH

The proposed design aims to achieve high availability and efficiency (or performance) without sacrificing consistency of data. To achieve high efficiency, the proposed design takes advantage of proven database concurrency control techniques such as snapshot isolation [115]. This is enhanced with the strength of the scalable shared-nothing architecture of NoSQL databases. Before proceeding to explain the architectural decisions, the technique of snapshot isolation is explained in detail below.

4.2.1 Snapshot Isolation

As stated earlier, snapshot isolation is an optimistic concurrency control mechanism that ensures that transactions are never blocked. In snapshot isolation, transactions perform read and write operations on a snapshot of the data. This means that a read operation can only read data that has been committed at the time of the read. A write operation can only succeed if no other write operation has committed on the same data during its lifetime. The lifetime of a transaction is the period between the transaction's start time and commit time. Each transaction has a start timestamp and a commit timestamp. A transaction Ti starts by obtaining a start timestamp Ti start-time. Before Ti commits, it tries to obtain a commit timestamp Ti commit-time. In this scenario, the transaction Ti would be successfully committed only if there has been no other transaction Tj whose commit timestamp falls within the interval between the start timestamp and commit timestamp (the transaction lifetime) of Ti. If the following situation happens:

Ti start-time → Tj commit-time → Ti commit-time, and both transactions write to the same data, then Ti must abort (where → represents the order of occurrence in time).

Figure 4.1: Snapshot Isolation (time axis showing three concurrent transactions A, B and C)

To explain further, assume that three transactions A, B and C (see Figure 4.1 above) are trying to modify a data entity [DEα]. Transaction A would commit even though it conflicts with B. This is because snapshot isolation follows the "First-Committer-Wins" rule. Transaction B reads data entity [DEα], which may (or may not) reflect the latest update written by transaction A. Transaction B then tries to commit a modification on data entity [DEα]; since A has updated data entity [DEα] and committed before transaction B commits, B must abort (notwithstanding the start time of A). This way, consistency among different transactions is preserved. Note that transaction B will only abort if it is in conflict with transaction A (see the definition of conflict in section 2.2). The commit time for every data item is also stored alongside the data item as a parameter. This will enable the system to maintain consistency in the presence of concurrently running transactions.

Systems implementing snapshot isolation in a cloud environment would normally have some form of time ordering among nodes in order to determine the start time and commit time of transactions. Spanner [97], ReTSO [104] and Walter [116] all make use of various forms of snapshot isolation.
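A minimal sketch of the first-committer-wins check, assuming an in-memory record of recently committed write sets (all names hypothetical):

# Illustrative first-committer-wins check for snapshot isolation.
committed = []   # list of (commit_ts, keys_written) for recently committed transactions

def try_commit(start_ts, write_set, candidate_commit_ts):
    """Return the commit timestamp, or None if the transaction must abort."""
    for commit_ts, keys in committed:
        # Conflict: another writer committed on a common key between this
        # transaction's start timestamp and its attempted commit timestamp.
        if start_ts < commit_ts <= candidate_commit_ts and keys & set(write_set):
            return None                               # first committer wins; abort
    committed.append((candidate_commit_ts, set(write_set)))
    return candidate_commit_ts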

4.2.2 Rationale for Snapshot Isolation

The decision to implement snapshot isolation as a concurrency control mechanism is due to the fact that it provides much higher concurrency than classical locking schemes. Snapshot isolation never delays or blocks any read. Also, locking imposes high processing overheads [41] [117] on databases, since any write operation must obtain a lock first even if there are no conflicting transactions.

Also, snapshot isolation avoids the following three anomalies [118].

Dirty reads - This occurs when a transaction A updates a data item and another transaction B reads the data item before A commits or rolls back. If A rolls back, transaction B would have read wrong data, since transaction A did not commit.

Non-repeatable reads - As the name implies, non-repeatable reads means that a repeated read by a single transaction returns different values. Assume a transaction A reads a data item and another transaction B either modifies or deletes that data item. If transaction A performs another read on the same item, it will yield a different result from the initial read. This means that a transaction that contains multiple read operations can return different results for a given item.

Phantom reads - Assume that a transaction A reads a set of data which satisfies a predicate condition supplied by a user. Another transaction B then inserts new data items that also match the predicate condition stated in transaction A and is committed. This introduces new sets of data that also satisfy the conditions stated in transaction A. Repeating the read in transaction A would produce a set of items that differs from the initial read of A. This is called a phantom read.


4.3 ARCHITECTURE OF THE NoSQL-TX

The architecture of the proposed system, NoSQL-TX, follows the loosely coupled architecture described in section 3.1.1. The loosely coupled architecture separates the mechanisms of transaction processing from data storage (see Figure 4.3). As mentioned earlier, the system comprises three main components: the Data Management Store (DMS), the Transaction Processing Engine (TPE) and the Time Stamp Manager (TSM). The functions of each of these components are explained in the next section. The architecture allows the system to scale in two dimensions. First, as the size of the data increases, the number of nodes in the data management store can be increased. Second, as the number of transactions increases, the number of transaction processing engines can also be increased in order to meet the demand.

4.3.1 Transaction Processing Engine (TPE)

The TPE is responsible for processing transactions in the system. The TPE operates like a normal relational database which stores schema information, allows for relationships and joins between entities and can also compute aggregate functions. The difference, however, is that the TPE does not store data. Rather, the TPE depends on the DMS to store data persistently. To execute a transaction, clients send requests directly to the TPE. The TPE then requests the required data items from the DMS. This is similar to the approach followed in [102] and [101]. The relationship between the TPE and the DMS can be implemented in two different ways. The first option is that each TPE is responsible for a disjoint set of data on the DMS, in which case each data item can only be accessed by the specific TPE assigned to it. In the second implementation, each TPE has access to all data on the cluster, such that the TPE acts as a transaction service to the DMS layer below. In this implementation, any TPE can access any data on any node in the DMS. As expected, the two implementations have different performance implications for the system. In the first approach, assigning TPEs to a disjoint set of data would mean that a transaction that involves two or more data items assigned to different TPEs would span more than one TPE. In such cases, ACID transactions can only be implemented with the two-phase commit protocol discussed in section 2.4.1. All TPEs involved in the transaction would typically take part in the two-phase commit protocol, and the TPE to which the transaction is submitted will act as the coordinator of the transaction. Two-phase commit is an expensive protocol. In the second approach, using the TPE as a service means that the system can avoid two-phase commit altogether. Thus, for each transaction, the TPE handling the transaction requests the data items involved in the transaction from all the DMS nodes that store them, and the transaction takes place in only one TPE. In both implementations, to improve performance, the TPE also stores information about the location of data on the DMS. However, the evaluation of the difference in performance cost between these two approaches is beyond the scope of this thesis. The main functions of the TPE are summarized below:

• Receiving transactional requests from clients and managing such transactions
• Storing schema information, as NoSQL systems do not provide facilities for schema information
• Defining relationships between different entities of data
• Providing support for join operations, as NoSQL databases do not support such operations
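As an illustration of how a client might drive the TPE (a hypothetical sketch; this is not the prototype's actual interface), the TPE can be pictured as exposing begin/read/write/commit calls while delegating persistent storage to the DMS:

import uuid

# Hypothetical client-facing TPE interface (illustrative only).
class TransactionProcessingEngine:
    def __init__(self, dms):
        self.dms = dms                               # persistent NoSQL storage layer (DMS)

    def begin(self):
        # The TPE generates the unique transaction identifier (UTID).
        return {"id": uuid.uuid4().hex, "state": "initial", "buffer": {}}

    def read(self, txn, key):
        # The DMS registers the transaction with the TSM before releasing data.
        return self.dms.fetch(txn["id"], key)

    def write(self, txn, key, value):
        txn["buffer"][key] = value                   # updates are buffered until commit

    def commit(self, txn):
        ok = self.dms.apply(txn["id"], txn["buffer"])   # DMS obtains a commit timestamp
        txn["state"] = "done" if ok else "cancelled"
        return ok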

4.3.2 Data Management Store

The Data Management Store (DMS) component represents the actual NoSQL cloud database, such as MongoDB [86]. It stores all (big) data persistently for the system. The DMS component is highly scalable in order to meet big data storage requirements. Further, it replicates data across different replicas in order to ensure improved efficiency, high availability and fault tolerance. Replication is a common approach across all NoSQL systems. In the proposed system, the DMS layer, in collaboration with the Time Stamp Manager (TSM), implements the snapshot isolation protocol as a concurrency control mechanism for transactions in the system. When a transaction starts, the TPE requests the data items involved in the transaction from the DMS. Before a data item is sent to the TPE, the DMS registers the transaction ID as well as the keys of the data items with the Time Stamp Manager (TSM). The function of the TSM is to issue transaction timestamps. The DMS can send the data to the TPE only after the start timestamp has been issued. The TPE can then perform updates on the data items. To commit data on the DMS, the DMS uses the transaction ID registered with the TSM to retrieve a commit timestamp. If the TSM refuses to issue a commit timestamp, the transaction is aborted and all changes made are rolled back. Whatever the case may be, the DMS notifies the TPE of the decision, and the TPE then sends the relevant information as a response to the client.

In addition, the system introduces a new attribute called lastModified. Every data item stored on the DMS has an associated lastModified attribute, which stores the commit timestamp issued by the TSM for the most recently committed transaction on that data item. This is used to guarantee stronger consistency across replicas and concurrent transaction execution, as explained in detail in section 4.8. Since the TSM issues the commit timestamps, the TSM always has the most recent commit timestamp for each data item. A transaction can then use this information to confirm that it has read the latest version of that data item.
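For illustration, a data item stored in a MongoDB-backed DMS might carry the lastModified attribute alongside its application fields; all field names except lastModified are hypothetical.

# Illustrative shape of a DMS data item carrying the lastModified commit timestamp.
data_item = {
    "_id": "account:alice",         # key of the data entity (hypothetical)
    "balance": 250,                 # application data (hypothetical)
    "lastModified": 1478563200042,  # commit timestamp issued by the TSM for the most
                                    # recently committed transaction on this item
}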

4.3.3 Time Stamp Manager

The TSM component is central to managing consistency across nodes in the system. It manages the ordering and scheduling of transactions in the system. The TSM also interacts with the DMS and the TPE in order to schedule the execution of the different operations of a transaction. The TSM can also store transaction information in memory in order to reduce latency due to disk I/O, thus improving performance. Storing information in memory allows the TSM to process information much faster, since it reduces the need to perform disk seeks. This improves the overall performance of the system, as the TSM is able to issue transaction start and commit times faster. The TSM also keeps track of all active transactions. To do this, the TSM stores the transaction ID and the keys of the data items involved in transactions, as well as the start timestamps and commit timestamps of each of these transactions. That way, the TSM is aware of the status of all ongoing transactions.

The main objective is to maintain consistency of data when it is concurrently accessed by different transactions. The proposed concurrency technique is snapshot isolation, which is non-blocking and provides higher concurrency and high efficiency in transaction processing. When a transaction request is made to the DMS, the DMS contacts the TSM for a start timestamp. Once granted, the TPE can begin to process the transaction. After the transaction has completed, the DMS applies the changes and requests a commit timestamp from the TSM. If there has been any other transaction whose commit timestamp falls within the interval between the initial transaction's start timestamp and commit timestamp, and they both write or update the same data, the TSM refuses to issue a commit timestamp. Failure to issue a commit timestamp implies that there is a conflicting transaction and the ongoing transaction must initiate an abort and rollback. This is how the first-committer-wins policy is implemented in this system. It is important to note the difference between reads and writes when implementing snapshot isolation: if the committed transaction contains only read operations, then the ongoing transaction need not abort.

Apart from issuing start timestamps, the TSM checks that a replica is up to date before it can grant that replica a transaction start time. The system employs an asynchronous model of replication. This means that replicas can be outdated for a very limited period; however, outdated replicas cannot be involved in transactions. Also, there is no notion of a master or primary replica in this system; any replica can be involved in any transaction. The TSM, in collaboration with the DMS, has a mechanism for identifying out-of-date replicas. This process is explained in detail in section 4.8.

4.4 TRANSACTION STATE TRANSITION MODEL

In order for a transaction to execute and access/modify a NoSQL database, it has to go through (or transition between) different states. This section explains the process of state transitioning of transactions in the proposed model. This process informs the design of the transaction execution protocols as well as their implementation, which are described in the next chapter.

As described above, the proposed system is made up of three major components: the TPE, the DMS and the TSM. The main duty of the TPE is to process NST transactions on behalf of the system, while the DMS serves as stable storage for the database. The TSM works in collaboration with the other two components to issue timestamps during transactions. It is necessary to add that the following constraint holds for NSTs: no two transactions that have a data item in common can have the same start timestamp or commit timestamp.

An active transaction that has not been aborted can be in one of the following four states, namely: (i) initial, (ii) pending, (iii) applied and (iv) done.

The transaction state is managed by the TPE. When a transaction is initiated by a client, the transaction state is set to initial. The initial state signifies that a transaction request has been initiated by a client. The transaction then proceeds to retrieve a start time from the TSM. If this is successful, the transaction state is updated to pending. The (CRUD) operations are then carried out. If there is no failure in any part of the operation, the transaction state progresses to the applied state; otherwise, the transaction is set to a cancelling state, where the rollback operation begins. A transaction in an applied state implies that all the enclosed operations in that transaction have been successfully executed. This does not, however, mean that the transaction will commit, as there could be other conflicting transactions. Once a transaction state has been set to applied, the TPE requests a commit time from the TSM. If the commit time is issued, the transaction state is set to done. At this point, a transaction is said to have committed and cannot be aborted. If a transaction is not issued a commit time, the transaction is set to a cancelling state, where the rollback operation begins. As soon as the rollback operation is completed, the transaction state is set to cancelled. A transaction in a cancelled state is deemed aborted and cannot commit. The state diagram for a transaction is shown in Figure 4.2 below.


Figure 4.2: Transaction State Diagram

Under normal circumstances (without failures), a transaction state should follow the sequence below:

initial → pending → applied → done (commit)

However, there are a few scenarios which can cause a transaction to abort. They are explained in section 4.7.
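The transitions of Figure 4.2 can also be summarised compactly; the encoding below is an illustrative sketch (hypothetical names and a simplified treatment of the abort from the initial state), not the prototype's internal representation.

# Illustrative encoding of the transaction state transitions in Figure 4.2.
TRANSITIONS = {
    "initial":    {"pending", "cancelled"},   # abort directly if no start timestamp is issued
    "pending":    {"applied", "cancelling"},  # an operation failure triggers a rollback
    "applied":    {"done", "cancelling"},     # a refused commit timestamp triggers a rollback
    "cancelling": {"cancelled"},              # rollback completes
    "done":       set(),                      # committed (terminal)
    "cancelled":  set(),                      # aborted (terminal)
}

def move(state, new_state):
    if new_state not in TRANSITIONS[state]:
        raise ValueError("illegal transition: " + state + " -> " + new_state)
    return new_state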

4.5 INTERACTION BETWEEN SYSTEM COMPONENTS

The three components of the proposed system interact with each other in order to coordinate different transactions. Figure 4.3 shows the various components and their interactions with each other. The TPE is the main component that carries out transactions, in collaboration with the TSM. Each of these components has sub-components that carry out certain functions. The next section describes the functions of each of the sub-components in the TPE and the TSM.


Figure 4.3: Components of the Proposed System – NoSQL-TX

The TPE which manages transactions contains the following sub-components.

Request Manager - The request manager handles requests from clients. It provides an interface between client applications and the TPE. The request manager establishes a connection and maintains a session between applications and the transaction manager. The request manager also ensures that applications receive an acknowledgement or response from the TPE.

Transaction Manager - The transaction manager is an essential module in the TPE and is central to transaction processing. The transaction manager processes CRUD operations on the data items. It receives details of the operations to be performed from the request manager and generates a transaction ID for each transaction. The transaction ID is a string of alphanumeric characters used to identify each transaction, known as the unique transaction identifier (UTID). The transaction manager also uses the transaction ID to manage the transaction states (explained in section 4.4). The transaction manager interacts with other components of the system to effectively perform transactions. For instance, before a transaction can operate on a data item, the transaction manager requests the required data from the DMS. It stores data location information in the location cache. This way, the transaction manager knows where each data item is located. The transaction manager also interacts with the schema manager to get information about the relationships that exist between the data entities. It is the duty of the transaction manager to ensure correctness in transaction processing.

Meta-data Manager – The metadata manager contains two sub-components: the location cache and the schema manager. The schema manager stores information about tables and relationships between the table entities, and specifies constraints that exist among the data objects. The schema manager also determines the level of permissions and access granted to data items. The location cache stores information about data location. This information is persisted on disk in the TPE but is also loaded into the location cache in memory to reduce disk I/O latency.

TSM Controller – The TSM controller manages interactions between the components of the system (the Transaction Processing Engine - TPE and the Data Management Store - DMS) and the TSM. It guarantees that the components receive the transaction start and commit timestamps issued by the TSM. The DMS and the TPE both contain TSM controllers, as they both interact with the TSM. However, it is the DMS that requests both start and commit timestamps from the TSM; the TSM interacts with both the DMS and the TPE through the TSM controller.

The Time Stamp Manager (TSM) contains the following sub-components:

Timestamp Issuer - The main duty of the TSM is to issue timestamps. The timestamp issuer works as an issuing authority in collaboration with the transaction information processor. The transaction information processor determines whether the timestamp issuer will issue a timestamp or not.

Transaction Information Processor - The transaction information processor is a component of the timestamp manager whose function is to implement snapshot isolation. The transaction information processor has a component called the transaction information cache, where it stores the transaction ID, the transaction start and commit timestamps, as well as the key identifiers of the data items involved in ongoing and recently committed transactions. Based on the stored information, the transaction information processor is able to identify any conflicts between transactions. This in turn determines whether the timestamp issuer will issue a commit timestamp or whether the transaction should be aborted.

The diagram in Figure 4.4 shows the communication and interaction between the different components of the proposed architecture. It depicts the flow of requests which are communicated between the client, TPE, DMS and TSM. The client represents a user's cloud application that submits transactions to the proposed system.

Figure 4.4: Interaction between Components of the System (NoSQL-TX)

Based on the above, the commit protocol for transactions in the proposed system is described as follows.

4.6 COMMIT PROTOCOL

The different steps (see Figure 4.4) involved in the protocol are explained as follows.

1. A client initiates a request to start a new NoSQL transaction (NST). Recall that an NST is a set of begin, read (OP^r[DE]), write (OP^w[DE]), commit (Cmti) and abort (Abti) operations (see section 4.1).

2. The TPE receives the client's request and generates an ID for the NST, which is to be executed on the NoSQL data. The TPE then sends the NST's ID and related information to the DMS.

3. The DMS receives the NST's ID and related information in order to know which data entities are to be accessed (read/updated) by the NST. The DMS then sends the NST's ID and related information about the data entities to the TSM in order to ensure the scheduling of the NST relative to other transactions.

4. The TSM saves the information about the NST and responds with a start time for the transaction. This time serves as a start timestamp, which is used to determine the order of execution and also the commitment of the NST.

As in section 4.1, if OPi^r[DE], OPi^w[DE] ∈ NSTi, then these read/write operations should be ordered either as OPi^r[DE] < OPi^w[DE] or OPi^w[DE] < OPi^r[DE]. That is, the data entity, DE, should be read and written in a proper order, following the timestamp information.

5. Based on the above, the DMS releases the required data entities to the TPE, where the NST actually takes place. Note that the proposed architecture separates transaction processing from the actual NoSQL database system in order to ensure abstraction and transparency.

6. Once the NST is completed, the TPE sends the updates (made to the data entities) to the DMS. This means that if the NST updates a data entity (modify, delete), then the DMS has to reflect this in the data store in order to ensure that the data is consistent.

7. The DMS contacts the TSM to request a commit timestamp. The TSM checks whether another transaction has updated the data after the start timestamp of the requesting transaction. If this has happened, then the NST aborts and the information is sent to the client through the TPE. Otherwise, it continues.

8. The TSM responds to the DMS with a commit timestamp. The DMS then stores the data in the data store.

9. The DMS responds with a commit message to the TPE. This means that the NST is successfully committed using the commit operation, Cmti.

The algorithm below shows the process involved in issuing a commit time to a transaction Ti.

getCommittime algorithm

1. Send ID of Ti to TSM
2. for each data item α in Ti, do:
3.     if α exists in any transaction Tj in the TSM's record, do:
4.         if Tj commit-time is between Ti start-time and Tnow (where Tnow is the current time)
5.             Send abort Ti
6.             End transaction Ti
7.         else
8.             commit Ti
9.     else
10.        commit Ti
11. End

Note that the protocol above applies to transactions that have reached the applied state i.e. any transaction whose operations have all been executed to satisfy the atomicity property of transactions. Once the protocol is completed, the transaction state is changed to done.
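A minimal Python sketch of the getCommittime check, assuming the TSM keeps an in-memory record of each transaction's data items and commit time (hypothetical names; not the prototype's code):

import time

# Hypothetical in-memory TSM record: txn_id -> {"items": set, "commit_time": float or None}
tsm_record = {}

def get_commit_time(txn_id, start_time):
    now = time.time()
    items = tsm_record[txn_id]["items"]
    for other_id, rec in tsm_record.items():
        if other_id == txn_id or rec["commit_time"] is None:
            continue
        # Conflict: another transaction touching a common item committed
        # between this transaction's start time and now.
        if items & rec["items"] and start_time < rec["commit_time"] <= now:
            return None                      # caller must abort and roll back Ti
    commit_time = now
    tsm_record[txn_id]["commit_time"] = commit_time
    return commit_time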

4.7 ABORT SCENARIOS

Failures can occur at each of the components at different stages of a transaction, and these can cause a transaction to abort. Three forms of abort are defined below. The abort scenarios are as follows:

• Ab-S1 – In this scenario, a transaction fails to collect a start timestamp. This can happen as a result of a connection failure, an error at the TPE or some other error. This leads to a situation whereby the TSM is unable to issue a start time. In this scenario, the transaction state follows the sequence (see the state transitions in Figure 4.2):

initial → (abort)

• Ab-S2 – This is the scenario in which any part of a transaction fails or does not complete. This can only happen at the TPE. There are several possible causes, such as a problem or error at the TSM, or the data not being reachable on the DMS (see the state transitions in Figure 4.2):

initial → pending → cancelling (rollback) → cancelled (abort)

• Ab-S3 – This is the scenario in which a transaction fails to collect a commit timestamp. As in Ab-S1, this can be the result of a connection failure, an error at the TPE or some other error; all such errors can prevent the TSM from issuing a commit time. In addition, scenario Ab-S3 is also caused by conflicting transactions. In the transaction state diagram (see Figure 4.2), the path followed is described below:

initial → pending → applied → cancelling (rollback) → cancelled (abort)

Besides the above stated abort scenarios, the system assumes a failure-free environment; this work does not consider system or network failures. When an exception or failure occurs, the transaction halts and commences a rollback, which ultimately leads to an abort. An exception is an event that causes a failure, thereby forcing a transaction to abort. It can be anything from the failure of the TSM to issue a start time or commit time, to conflicting transactions, which lead to an abort. Recall from section 4.1 that "a transaction must end in a commit or an abort, i.e. a transaction follows the sequence (Begin | OPi | Cmt | Abt), with the condition that either commit (Cmt) or abort (Abt) occurs only once within the sequence (where OPi is a set of operations that can occur during the transaction)". A rollback operation commences if a transaction is aborted mid-way, so as to preserve consistency. However, the rollback operation is determined by the state of the transaction when the abort took place.

The next section explains how the proposed system uses the TSM to maintain consistency among replicas in the presence of concurrent transactions.


4.8 A PROTOCOL FOR MANAGING TRANSACTIONS ACROSS ASYNCHRONOUS DATA REPLICATION

Replication is a commonly used technique to guarantee high availability in NoSQL databases. As explained in section 2.6.3, replication is the process of maintaining multiple copies of a database in different locations to provide fault tolerance and guarantee higher levels of availability. In asynchronous data replication, replicas are updated at a later time, after the transaction has committed. This means that a replica can hold outdated data for a short period of time.

This thesis introduces a new model for managing consistency across replicas using the TSM. In this section, a protocol is explained for ensuring that only up-to-date replicas are involved in transactions, in order to guarantee consistency. The goals of replication in NoSQL databases include:

Availability - Replication increases availability in that during node failures or network partitions, other replicas can still continue to process read and write requests.

Read/Write latency - Many NoSQL systems, such as PNUTS and Spanner, use wide-area (geographic) replication in order to lower response times. When requests are made, the replica that is closest to the client responds to the request. This is used to guarantee lower latencies.

Scalability - Replication is also used to balance load across multiple nodes (of NoSQL databases) such that each node is not under heavy traffic. When the number of requests on a replica increases, the requests exceeding the capacity can be directed to other replicas.

Fault tolerance and Data persistence - Replication guarantees that data is not lost during failures by providing multiple copies of the same data. Fault tolerance allows a system to continue operating as normal in the event of a failure.


Therefore, if one server is down, replication ensures that there are one or more other servers that can still respond to user requests.

However, implementing replication introduces the challenge of maintaining consistency among replicas. Generally, replication is classified into two main categories: lazy and eager replication. These are explained in section 2.6.3. Implementing eager replication is non-trivial and, as such, most systems implement various forms of lazy replication. However, lazy replication weakens consistency guarantees, because it allows transactions to see stale data versions [49]. As a result, most NoSQL databases use some form of quorum-based replication, in which an agreed number of replicas (usually less than the total number of replicas) must accept read and write requests. Also, to reduce bandwidth traffic, most replicated environments make use of master-slave replication, which is also known as primary-secondary replication.

This section explains how the proposed system achieves consistency across transactions. The proposed system implements an asynchronous (lazy) replication model. However, unlike most systems, in the proposed model the notion of a primary or master replica does not exist: any server can attend to requests such as reads and writes. The system depends on the TSM to identify replicas that are out of date and prevents transactions from accessing such replicas. As part of the protocol, the proposed system enforces the following constraints and conditions:

1. Recall the constraint of the proposed system in section 4.1

“No two transactions can have the same commit time”

Therefore, the constraint below applies such that

If {DEα ∈ Ta and DEα ∈ Tb}, and ({Ta commit < Tb commit · Ta commit > Tb start} | {Tb commit < Ta commit · Tb commit > Ta start})

Then Ta commit ≠ Tb commit

(Where DEα is a data entity α, (·) represents 'and', and (|) represents 'or')

The expression above implies that if two concurrent transactions Ta and Tb operate on the same data item DEα and are in conflict, then Ta cannot have the same commit timestamp as Tb. This constraint is used to enforce ordering across concurrent transactions.

2. The lastModified attribute (explained briefly in section 4.3.2) for each data item is the same across all replicas and is equal to the last commit timestamp issued by the TSM for the most recent transaction committed on the data item. If a data entity DEα is replicated on three servers A, B and C, then the lastModified attribute (which is a commit timestamp) of DEα should be the same across the three servers A, B and C. Therefore:

lastModifiedαA = lastModifiedαB = lastModifiedαC

(where lastModifiedαA is the lastModified attribute of data entity DEα on server A)

3. If the commit timestamp on a data item, α, from a replica A is different from the commit timestamp for the last transaction on α at the TSM, then either replica A is outdated or another transaction has committed on replica A.

The steps involved in the protocol are explained as follows

1. A transaction retrieves a commit timestamp from the TSM before it commits. This commit timestamp is stored in the lastModified attribute of all the data entities involved in the transaction.
2. The commit timestamp is sent along with the updated data to all the replica servers of the data entities after the transaction has committed on the server involved in the transaction. This is asynchronous replication.
3. The commit timestamp is also stored in the lastModified attribute of each of the replica servers. Therefore, the lastModified attribute for a particular data entity should always contain the same timestamp across all replicas. This is because the timestamp saved in the lastModified attribute is not the time at which the update reaches the replica but the commit time issued by the TSM to the replica that sends the transaction request.
4. When a new transaction is initiated, the TPE consults the TSM to verify that the data items from the DMS are not outdated and that no other transaction has committed on another replica of any of those data items. It can verify this by comparing the lastModified attribute with the most recent commit timestamp issued by the TSM for each of the data entities.
5. If the lastModified attribute is not the same as the last commit time issued by the TSM for each of the data entities, the TPE knows that the data is outdated. This way, the TPE can guarantee the consistency of the data items involved in the transaction.

By following these steps, transactions can identify stale replicas and transaction requests are redirected to the most current replica.
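The staleness check at the heart of this protocol can be illustrated with a short Python sketch. This is a minimal sketch only: the helper names (is_replica_stale, read_consistent) and the pymongo-style lookup are assumptions made for illustration and are not taken from the prototype code.

def is_replica_stale(replica_doc, tsm_latest_commit_ts):
    # The lastModified attribute holds the commit timestamp of the most recent
    # transaction applied to this copy of the data entity (see condition 2 above).
    return replica_doc["lastModified"] != tsm_latest_commit_ts

def read_consistent(replicas, key, tsm_latest_commit_ts):
    # Try each replica server (e.g. A, B and C) and return the first copy whose
    # lastModified matches the latest commit timestamp recorded at the TSM.
    for replica in replicas:
        doc = replica.find_one({"_id": key})   # pymongo-style lookup on one replica
        if doc is not None and not is_replica_stale(doc, tsm_latest_commit_ts):
            return doc                         # up-to-date copy found
    return None                                # all replicas are stale or missing

In this sketch a request simply moves on to the next replica whenever the current one is found to be stale, mirroring the redirection behaviour described above.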

4.9 SUMMARY

This chapter explained the proposed approach, referred to as NoSQL-TX, for implementing transactions in NoSQL cloud databases. The theoretical model of NoSQL-TX was also defined in this chapter. The approach makes use of snapshot isolation as a concurrency control technique. This means that the system avoids the overhead involved in locking data and improves availability. Snapshot isolation also helps to avoid anomalies such as dirty reads, phantom reads and non-repeatable reads.

The architecture of NoSQL-TX was described in this chapter as a loosely coupled architecture with three components: the Data Management Store, the Transaction Processing Engine and the Time Stamp Manager. Each of these components performs certain roles that are critical to the health of the system.

An ongoing transaction passes through four different phases during its lifetime. They include: initial, pending, applied and done phases. The events that trigger a transaction to change from one phase to another were explained in this chapter. When an abort occurs, the transaction moves to a cancelling phase. During this phase, all the changes made are rolled back before the transaction state moves to a cancelled phase.

To execute an operation successfully, all the components of the system must interact with each other. The protocol followed by their interaction is explained in this chapter.

Finally, the chapter explains the protocol for managing consistency when the data in the system is replicated. Most cloud databases make use of replication to improve availability. This chapter introduces a new protocol for guaranteeing consistency among replicas.

The next chapter explains in detail, the implementation of the proposed system.


CHAPTER 5

IMPLEMENTATION OF THE NoSQL-TX SYSTEM

This chapter discusses the implementation of the proposed system as a prototype. Section 5.1 outlines and explains the design objectives of the proposed NoSQL-TX system. The tools and technologies used in implementing the system are described in section 5.2. Section 5.3 explains the different types of operations supported by the system and the algorithm of each of these operations. Section 5.4 explains the application domain used to implement and test the proposed system.

5.1 DESIGN OBJECTIVES

In line with the requirements of transactional systems highlighted in [119], the primary requirement for the design of the prototype system is to provide transactional support which guarantees that ACID properties are preserved in NoSQL databases. The design also takes performance metrics into consideration. As such, the following characteristics represent the non-functional requirements of the prototype system.

• High throughput, which reflects a high rate of successful transactions per unit time.
• High concurrency that does not violate the consistency and isolation properties of transactions.
• Low latency and short response times in responding to client requests.

These design objectives are in line with objectives II and III of this thesis outlined in section 1.4 which include:

• Design a new framework for transaction management in NoSQL databases


• Develop and implement the proposed framework as a prototype system using cloud data management tools and technologies

5.2 IMPLEMENTATION TOOLS AND TECHNOLOGIES

The prototype implementation makes use of different tools and technologies in order to implement the various components of the proposed system such as the TPE, TSM, DMS, etc. The rationale for the choice of the various tools and technologies is explained and justified as follows:

MongoDB –NoSQL Database:

The proposed system uses MongoDB to implement the cloud storage part, i.e. the DMS layer. MongoDB is a NoSQL database that belongs to the document family of NoSQL databases (see section 3.2). It represents data using JavaScript Object Notation (JSON) documents. MongoDB does not support multi-key transactions.

The prototype system can be implemented using other NoSQL databases. This research chooses MongoDB for the implementation of some of the components of the proposed system due to the following reasons:

First, MongoDB is widely used in real applications and in industry. For instance, eBay uses MongoDB to store metadata for all items advertised on its website2. The UK Met Office also uses MongoDB for storing climate data used in weather forecasts3.

Second, the JSON notation of MongoDB allows relationships amongst data entities to be expressed. However, MongoDB does not enforce these relationships, i.e. it has a flexible schema, which is a key characteristic of NoSQL databases. Third, MongoDB offers relatively higher speed (for simple operations) when compared with relational databases [120]. This helps to achieve low latency for operations, which is in line with the design objectives set out for the proposed system.

2 https://www.mongodb.com/industries/retail 3 https://www.mongodb.com/industries/government


Programming Language:

The proposed system uses Python as the programming language to implement the proposed system following the middleware architectural approach. In the implementation, the TPE layer code is written in Python. The code in the Python modules defines a programming abstraction that implements the different operations of a transaction (which are explained in section 5.3.1) supported by the prototype. The Python version used is Python 2.74. Since MongoDB queries are expressed in JSON notation, the pymongo API5 was installed to access the MongoDB database; JSON-style queries can therefore be embedded into the Python programs.
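As an illustration of how the TPE layer can access the DMS through pymongo, a minimal sketch is given below. The connection address, database name and collection name are assumptions made for the example and do not reflect the actual prototype configuration.

from pymongo import MongoClient

# Connect to a MongoDB instance acting as the DMS (address is illustrative only).
client = MongoClient("mongodb://10.0.0.110:27017/")
db = client["bank"]                       # assumed database name
accounts = db["accounts"]                 # the accounts collection

# A JSON-style query embedded in Python: fetch one account by its identifier.
account = accounts.find_one({"_id": "acc-001"})
print(account)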

SQLite Database System:

The TSM is implemented using a combination of Python and a lightweight relational database, SQLite. SQLite can store information in memory, which makes transaction processing faster and easier.

The Python code implements the various constraints and the interaction between the TSM and the other components. It also generates and issues the transaction timestamps. SQLite serves as the storage system for the TSM: it stores transactional information about on-going and recently completed transactions, such as the transaction timestamps, transaction IDs and the data items involved in transactions. Because SQLite can hold this information in memory, transaction processing is faster and easier. To access SQLite from Python 2.7, the sqlite3 API6 was used. To update the database, SQL statements are embedded in the Python code.
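A minimal sketch of how the TSM's SQLite storage might be organised is shown below. The table and column names are assumptions made for illustration and are not the exact schema used by the prototype.

import sqlite3
import time

# In-memory SQLite database used by the TSM to track transactions (sketch only).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE transactions (
                    utid        TEXT PRIMARY KEY,   -- unique transaction ID
                    data_items  TEXT,               -- keys touched by the transaction
                    start_time  REAL,               -- start timestamp issued by the TSM
                    commit_time REAL                -- commit timestamp (NULL while on-going)
                )""")

# Record a new transaction and issue its start timestamp.
conn.execute("INSERT INTO transactions VALUES (?, ?, ?, NULL)",
             ("tx-001", "acc-001,acc-002", time.time()))
conn.commit()

# An on-going transaction is simply one that has not been issued a commit timestamp.
ongoing = conn.execute(
    "SELECT utid FROM transactions WHERE commit_time IS NULL").fetchall()
print(ongoing)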

Computer System and Hardware Specification:

The DMS storage layer is implemented on an 8GB RAM Linux Ubuntu system with three replicas. The replicas are hosted on Ubuntu Juju7 which is a cloud hosting Linux platform used to deploy, configure, manage, and scale cloud services on physical hardware. The nodes are connected via Ethernet to a MAAS (Metal as a Service8) cluster and managed by a cluster master in a local area network. See Figure 5.1 for the cluster controller (master) configuration. Figure 5.2 shows the nodes in the cluster and their addresses managed in Ubuntu. Figure 5.3 shows the configuration of one of the nodes (cgfk6.maas) in the cluster. Figure 5.4 shows a putty connection to the MongoDB service running in Juju.

4 https://www.python.org/download/releases/2.7/ 5 https://api.mongodb.com/python/current/ 6 https://docs.python.org/2/library/sqlite3.html 7 https://jujucharms.com/docs/1.24/about-juju

Figure 5.1: MAAS Head Controller Configuration

8 http://www.ubuntu.com/cloud/maas


Figure 5.2: Nodes in the cluster with their local addresses

Figure 5.3: Configuration of One of the Nodes - Address 10.0.0.110


Figure 5.4: MongoDB service running via Putty

The compute nodes on which the TSM and TPE are running have 16GB RAM and Intel i7 3.40GHz processors. The diagram below shows the component set-up.

Figure 5.5: Hardware Setup of Proposed System

5.3 IMPLEMENTATION OF TRANSACTION OPERATIONS

This section explains the set of operations which are implemented (and supported) by the prototype system. The decision to implement snapshot isolation using an entirely different component (the TSM) gives the system the advantage of being able to centrally monitor the execution of transactions across the system. This design also helps the system to easily identify all on-going transactions: the TSM can identify them by checking all transactions that have not been issued a commit time. This feature is very strategic to the system as it allows the system to introduce two new types of operations, namely (i) read-latest and (ii) update-latest. These two operations provide stronger consistency guarantees. Also, operations executed in this mode are serializable. The operations supported by the proposed system are explained below through various algorithms.

5.3.1 Types of Operations

In general, the operations supported by relational databases include Create, Read, Update and Delete (CRUD). The 'Create' operation, also known as an 'insert', adds one or more records to a database table. The Read operation is a 'select' operation which retrieves the result of a query from the database. An Update operation changes one or more existing records in a database. A Delete operation removes an existing record from a database. NoSQL databases support mainly 'get' and 'put' operations. A 'get' operation is equivalent to a 'read' while a 'put' operation can be either a 'create' or an 'update'. The operations supported by the prototype system are explained below.

5.3.1.1 Read

The read operation, which is equivalent to a read in CRUD, is a simple 'get' operation. The TPE simply requests a data item from the DMS using the key identifier of the required data item. The DMS responds to the TPE with the data. Read operations may not reflect updates from on-going transactions. Since there is no locking, data is always available but it may not always be consistent. This means that a read operation does not take ongoing transactions into consideration. As such, a read operation only reflects data items that have been committed at the time of the operation. Below is the algorithm for a read operation.

Algorithm Read

1. Function READ key
2. Generate UTID() (at TPE)
3. Send (UTID, key) to DMS
4. Value ← Return (key, data_item)
5. Return Value
6. End function

An example of a read operation is retrieving a customer record from a banking application. The user supplies an account ID which the system uses to identify the record in the database. The Figure 5.6 shows a snippet of the read operation.

Figure 5.6: Read Operation Codes in Python
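Since the code in Figure 5.6 is not reproduced here, the sketch below illustrates what a simple read operation of this kind might look like in Python with pymongo. The helper generate_utid and the collection argument are assumptions made for the example, not the prototype's actual identifiers.

import uuid

def generate_utid():
    # Assumed helper: the TPE tags every request with a unique transaction ID.
    return str(uuid.uuid4())

def read(accounts, key):
    # Simple read: fetch the committed value of a data item from the DMS.
    utid = generate_utid()                   # step 2 of the Read algorithm
    doc = accounts.find_one({"_id": key})    # steps 3-4: request the data item by key
    return doc                               # step 5: return the value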

5.3.1.2 Read-latest

This research introduces a new type of read operation called read-latest in order to guarantee stronger consistency. The read-latest operation is also a form of read operation, albeit more complex than the simple read operation. The read-latest operation aims to get the most recently committed value of the requested data and takes on-going transactions into consideration. Most concurrency control techniques do not take ongoing transactions into consideration. Even with locking, systems still allow locked data, which may be stale due to ongoing transactions, to be read. Before it responds to a request, a read-latest transaction checks the TSM to ensure that there is no on-going transaction for that data item. The TPE sends a request to the DMS. The DMS then contacts the TSM to confirm that there is no on-going transaction on the data item requested. Recall from section 4.5 that the TSM stores information on all running transactions and is able to monitor the activities of each transaction. If there is no such transaction, the DMS responds to the TPE with the data. If, on the other hand, there is an on-going transaction on the data item, the DMS waits for the transaction to complete before responding to the TPE. The read-latest operation may incur a slightly higher latency but it guarantees a stronger level of consistency for read operations. To prevent unnecessary waits, there is a maximum (configurable) upper-bound time limit for every read-latest operation. Once this is reached, the operation times out and the DMS must respond with a simple read or mark the data unavailable, in which case the client can request a simple read operation. A read-latest operation will incur a slightly higher latency than a read operation even if there are no on-going transactions. This is because the read-latest operation makes an extra journey to the TSM. To optimize the execution of this operation, the TPE can communicate directly with the TSM through its TSM controller. It can do this simultaneously while requesting the needed data from the DMS.

Using the earlier example, to retrieve an account, a user supplies the user account ID. The transaction uses the ID to verify from the TSM if there is any ongoing transaction on the user supplied ID. If there is, the transaction waits for a specified amount of time. Otherwise, it returns the account details.


Read-latest algorithm

1. Function READ-LATEST key

2. Generate UTID() for Ti
3. Send key to DMS
4. For each data item α in a read Ti, do:
5. If α exists in any on-going transaction Tj in TSMs record, do:
6. Start counter
7. while counter < upper-bound time
8. wait for all Tj to commit
9. if counter >= upper-bound
10. respond α is unavailable
11. exit
12. else
13. once all Tj committed, respond with latest value for α
14. else
15. respond with latest value for α
16. End Function

Below is a snippet of the code that implements the read-latest operation.

Figure 5.7: Read-latest operation
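As with the read operation, the code in Figure 5.7 is not reproduced here, so the sketch below illustrates the waiting behaviour of read-latest in Python. The tsm.has_ongoing helper and the default upper bound are assumptions made for the example.

import time

def read_latest(accounts, tsm, key, upper_bound=2.0):
    # Wait (up to the configurable upper bound) for any on-going transaction on
    # this key to commit, then return the most recently committed value.
    deadline = time.time() + upper_bound
    while tsm.has_ongoing(key):              # assumed helper: checks TSM records
        if time.time() >= deadline:
            return None                      # timed out: report data unavailable
        time.sleep(0.05)                     # wait for the transaction to commit
    return accounts.find_one({"_id": key})   # respond with the latest value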

5.3.1.3 Write-New

Write-New operation is equivalent to create in the CRUD operations of a database. It is a simple write issued to the DMS by the TPE. Since NoSQL databases are schema-less, inserting a new record will automatically succeed even if the table (or collection as the case may be) did not initially exist. A write-new operation does not necessarily need to contact the TSM. The client sends the request to the TPE and the TPE sends that data item to the DMS.

Write-New algorithm

1. Function WRITE-NEW (key, data-item)
2. Generate UTID() (at TPE)
3. Send (key, data-item) to DMS
4. Return (Ack)
5. End Function
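A minimal pymongo sketch of this operation is given below; the collection argument and identifier format are assumptions made for illustration.

import uuid

def write_new(accounts, key, data_item):
    # Write-New: insert a new document into the schema-less DMS (sketch only).
    utid = str(uuid.uuid4())                 # unique transaction ID generated at the TPE
    data_item["_id"] = key                   # key identifier of the new record
    accounts.insert_one(data_item)           # collection is created automatically if absent
    return "ack"                             # acknowledgement returned to the client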

5.3.1.4 Update

The update operation is a write operation that changes the value of an existing data item. For an update to succeed, the data item must exist already. An update operation starts with a read of the existing value of the data item. Therefore, an update operation takes the key and the data item (provided by the client) as arguments. The TPE requests for the data item from the DMS using the key to identify the requested data item. The DMS then sends a start-time request to the TSM. The TSM responds with the start time and the DMS then sends the data item to the TPE, where the update takes place. The updates are then sent to the DMS. As explained earlier, when an update takes place, and before it can commit, the DMS must check with the TSM that no other transaction has occurred on that key item. Otherwise the transaction aborts. Once the TSM guarantees that there are no conflicting transactions, the TSM sends the commit timestamp. The data is committed and an acknowledgement message is sent to the TPE. Below is the algorithm for the update operation.


Update algorithm

1. Function UPDATE (key, data-item)
2. Generate UTID Ti, (at TPE)
3. Send (UTID, key items) to DMS and TSM
4. DMS sends the UTID to the TSM and gets a start-time
5. DMS releases the data-item to the TPE
6. TPE performs all the updates and sends back to DMS
7. DMS applies update and proceeds to TSM
8. For each data item α in update Ti, do:
9. If α exists in any on-going transaction Tj in TSMs record, do:
10. If Tj commit-time between Ti start-time and Tnow (where Tnow is current-time)
11. Send abort Ti
12. End Transaction
13. Else
14. Issue Ti commit-time
15. Commit Ti
16. Else
17. Commit Ti
18. End Function

Using the same banking application as an example, to perform an Update operation, a user supplies the account ID of the account to be updated. The system performs a read operation described in section 5.3.1.1 to retrieve the data. The user then supplies the update information. The system then performs a commit using the protocol explained in section 4.6.

5.3.1.5 Update-Latest

Like the read-latest, the update-latest is also a new type of operation that has been implemented in the proposed system. The update-latest provides a stronger consistency guarantee and it reduces the probability that a write operation would abort. The TPE issues a write request with the key of the data-item to be updated. The DMS checks that the data item exists and then reads that data item. It then consults the TSM to know if there is any on-going transaction on that key-item. If there is no transaction, the DMS responds to the TPE with the data and the TPE performs the necessary update and sends it to the DMS. The DMS proceeds to commit the data using the commit protocol highlighted in section 4.6. If, on the other hand, there is an on-going transaction, again, the DMS must wait for the transaction to complete before the TSM issues a transaction start-time. Once a transaction start-time is issued, the DMS can then proceed to send the requested data item to the TPE. As in the read-latest operation, the proposed system sets a maximum (configurable) upper bound time limit for every update-latest operation in order to prevent unnecessary waits. Once this time is reached, the DMS must respond with a 'data unavailable' message. Again, an update-latest operation may also incur a higher latency than an update operation even if there are no on-going transactions. This is because of the extra time it may take to wait for on-going transactions to commit.


Update Latest algorithm

1. Function UPDATE-LATEST (key, data-item)
2. Generate UTID Ti, (at TPE)
3. Send (UTID, key items) to DMS and TSM
4. DMS sends the UTID to the TSM to retrieve a start-time
5. For each data item α in a read Ti, do:
6. If α exists in any on-going transaction Tj in TSMs record, do:
7. Start counter
8. while counter < upper-bound time
9. wait for all Tj to commit
10. if counter >= upper-bound
11. respond α is unavailable
12. exit
13. else
14. DMS releases the data-item to the TPE
15. TPE performs all the updates and sends back to DMS
16. DMS applies update and proceeds to TSM
17. For each data item α in update Ti, do:
18. If Tj commit-time between Ti start-time and Tnow (where Tnow is current-time)
19. Send abort Ti
20. End Transaction
21. Else
22. Commit Ti
23. Else
24. Commit Ti
25. End Function

5.3.1.6 Multi Key Transactions

A Multi-key transaction is a transaction that involves more than one data key item. As discussed extensively in chapter 2, NoSQL databases, because of their simple (de-normalized) data model, do not support multi-key transactions. In particular, because of the lack of support for table joins, they do not support multi-key transactions; note that a join essentially involves multiple keys from one or more tables. The support for multi-key transactions therefore represents a novel contribution of this work. The multi-key operation follows the pattern of the update operation but involves an exchange of information between two or more data items, such as a transfer of funds from one account to another. To perform this operation, a user follows scenario 1 given in section 2.2. A user supplies his account ID, the account ID of another user and the amount to be transferred. The system performs a read operation on both accounts and applies an update operation by deducting the amount from one account and adding it to the other account. The system also verifies that the sum of both accounts before the transaction is equal to the sum of both accounts after the transaction.

Recall from sections 4.3.1 and 4.5 that the TPE stores schema information in the schema manager. Therefore, the TPE understands the application logic and the relationships that exist between the data entities stored at the DMS, and the system can support applications that perform operations across multiple keys. The multi-key operation pseudocode is shown below.

Multi-Key algorithm

1. Function Multi-Key (key, data-item) 1..n
2. Generate UTID Ti, (at TPE)
3. Send (UTID, key items 1..n) to DMS and TSM
4. DMS sends the UTID to the TSM and gets a start-time
5. DMS releases the data-item(1..n) to the TPE
6. TPE performs all the updates and sends back to DMS
7. DMS applies update and proceeds to TSM
8. For each data item α in multi-key Ti, do:
9. If α exists in any on-going transaction Tj in TSMs record, do:
10. If Tj commit-time between Ti start-time and Tnow (where Tnow is current-time)
11. Send abort Ti
12. End Transaction
13. Else
14. Commit Ti
15. Else
16. Commit Ti
17. End Function
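The fund-transfer scenario described above can be sketched in Python as follows. The tsm client object, its get_start_time and request_commit_time methods, and the direct collection access are assumptions made for illustration; in the prototype these interactions are routed through the DMS as described in section 4.6.

import uuid

def multi_key_transfer(accounts, tsm, source_id, dest_id, amount):
    # Sketch of a multi-key transaction: transfer `amount` between two accounts.
    utid = str(uuid.uuid4())
    start_time = tsm.get_start_time(utid, [source_id, dest_id])

    src = accounts.find_one({"_id": source_id})
    dst = accounts.find_one({"_id": dest_id})

    # The TPE applies the updates to its local copies of the data items.
    src["Balance"] -= amount
    dst["Balance"] += amount

    # Ask the TSM for a commit timestamp; None signals that a conflicting
    # transaction committed between start_time and now, so this one must abort.
    commit_time = tsm.request_commit_time(utid, [source_id, dest_id], start_time)
    if commit_time is None:
        return "abort"

    for doc in (src, dst):
        doc["lastModified"] = commit_time    # same commit timestamp on every item
        accounts.replace_one({"_id": doc["_id"]}, doc)
    return "commit"

In this sketch the check that the sum of both accounts is unchanged holds whenever the transaction either commits in full or aborts before any update is written back.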


As can be seen from these operations, the system considers consistency a very important aspect of transactions that cannot be violated. As such, there are consistency checks on most types of writes. The system relies on the performance of the TSM to perform operations. The multi-key operation is a novel operation that most NoSQL databases do not support. The operation allows application logic to be expressed at the level of the database and also ensures that the consistency property of the database is preserved. The protocol explained in section 4.8 guarantees that stale replicas are not involved in transactions.

The next section explains the various scenarios that can cause a transaction to abort and how the system handles aborts to preserve consistency.

5.3.2 Abort Scenarios for Operations

Recall from the definition of snapshot isolation (section 4.2.1) that it does not block the operations of a transaction. Furthermore, read operations are always successful, i.e., read operations always return a value. However, as an exception, the read-latest operation introduced in this model may not necessarily return a value due to its consistency guarantee. For each of the operations described in section 5.3.1, an abort protocol can be triggered. Section 4.7 explains three types of scenarios which can lead to aborts. This section explains the algorithm that each of the operations (with the exception of read operations) follows to implement each of the three types of aborts. The abort scenarios are briefly reiterated as follows.

Ab-S1 – Occurs when a transaction is unable to retrieve a start timestamp.

Ab-S2 – Occurs when one part of the operations in a transaction fails thus violating the atomic properties of transactions

Ab-S3 – Refers to when a transaction fails to get a commit timestamp from the TSM.

The algorithms to implement the above mentioned scenarios in each of the operations are explained in the next section. The Read-Latest and Write-New operations do not implement any of the three abort scenarios because they do not require a start or commit timestamp in their execution.

5.3.2.1 Abort Scenario Ab-S1

Update (and Update Latest) Operation

The protocol below shows how an update operation executes abort scenario Ab-S1. Essentially, the update and update latest operations follow the same steps in implementing this algorithm.

Update algorithm – Abort Ab-S1

1. Function UPDATE-LATEST (key, data-item)
2. Generate UTID Ti, (at TPE)
3. Send (UTID, key items) to DMS and TSM
4. DMS sends the UTID to the TSM to retrieve a start-time
5. If start-time is not issued
6. Execute abort
7. exit

Multi-key Operation

The algorithm for abort Ab-S1 followed by the multi-key operation is detailed below.

Multi-Key algorithm – Abort Ab-S1

1. Function Multi-Key (key, data-item) 1..n
2. Generate UTID Ti, (at TPE)
3. Send (UTID, key items 1..n) to DMS and TSM
4. DMS sends the UTID to the TSM and gets a start-time
5. If start-time is not issued
6. Execute abort
7. exit


5.3.2.2 Abort Scenario Ab-S2

The Ab-S2 is executed when any of the operations of a transaction fails.

Multi-Key algorithm Abort Ab-S2

1. Function Multi-Key (key, data-item) 1..n
2. Generate UTID Ti, (at TPE)
3. Send (UTID, key items 1..n) to DMS and TSM
4. DMS sends the UTID to the TSM and gets a start-time
5. DMS releases the data-item(1..n) to the TPE
6. TPE performs all the updates and sends back to DMS
7. DMS applies update and proceeds to TSM
8. If any operation fails, do:
9. Initiate rollback on all successful operations
10. Once rollback complete, execute abort
11. Exit operation

5.3.2.3 Abort Scenario Ab-S3

Update (and Update Latest) Operation

The abort Ab-S3 occurs when a transaction is unable to retrieve a commit time. As in abort Ab-S1, the algorithm for Update and Update-Latest follows the same procedure.


Update Latest algorithm – Ab-S3

1. Function UPDATE-LATEST (key, data-item)
2. Generate UTID Ti, (at TPE)
3. Send (UTID, key items) to DMS and TSM
4. DMS sends the UTID to the TSM to retrieve a start-time
5. For each data item α in a read Ti, do:
6. If α exists in any on-going transaction Tj in TSMs record, do:
7. Start counter
8. while counter < upper-bound time
9. wait for all Tj to commit
10. if counter >= upper-bound
11. respond α is unavailable
12. exit
13. else
14. DMS releases the data-item to the TPE
15. TPE performs all the updates and sends back to DMS
16. DMS applies update and proceeds to TSM
17. For each data item α in update Ti, do:
18. If Tj commit-time between Ti start-time and Tnow (where Tnow is current-time)
19. Initiate rollback operation
20. Send abort Ti
21. End Transaction
22. Else
23. Issue Ti commit-time
24. if (Ti commit-time) ← failed
25. initiate rollback operation
26. Send abort Ti
27. End Transaction


Update algorithm – Ab-S3

1. Function UPDATE (key, data-item)
2. Generate UTID Ti, (at TPE)
3. Send (UTID, key items) to DMS and TSM
4. DMS sends the UTID to the TSM and gets a start-time
5. DMS releases the data-item to the TPE
6. TPE performs all the updates and sends back to DMS
7. DMS applies update and proceeds to TSM
8. For each data item α in update Ti, do:
9. If α exists in any on-going transaction Tj in TSMs record, do:
10. If Tj commit-time between Ti start-time and Tnow (where Tnow is current-time)
11. Initiate rollback operation
12. Send abort Ti
13. End Transaction
14. Else
15. Issue Ti commit-time
16. if (Ti commit-time) ← failed
17. initiate rollback operation
18. Send abort Ti
19. End Transaction


Multi-key Operation

Multi-Key algorithm- Abort Ab-S3

1. Function Multi-Key (key, data-item) 1..n
2. Generate UTID Ti, (at TPE)
3. Send (UTID, key items 1..n) to DMS and TSM
4. DMS sends the UTID to the TSM and gets a start-time
5. DMS releases the data-item(1..n) to the TPE
6. TPE performs all the updates and sends back to DMS
7. DMS applies update and proceeds to TSM
8. For each data item α in multi-key Ti, do:
9. If α exists in any on-going transaction Tj in TSMs record, do:
10. If Tj commit-time between Ti start-time and Tnow (where Tnow is current-time)
11. Initiate rollback operation
12. Send abort Ti
13. End Transaction
14. Else
15. Issue Ti commit-time
16. if (Ti commit-time) ← failed
17. initiate rollback operation
18. Send abort Ti
19. End Transaction

5.3.3 Optimisation Decisions

In order to improve the performance of the system, certain design decisions have been taken in the implementation of the proposed system. For instance, to limit the number of aborts (or cascading aborts), the system introduces a variable known as max-trax, which is the maximum number of allowed (concurrent) transactions that can take place on a key-item. Since the system makes use of snapshot isolation, operations are never blocked. This can lead to a high number of aborts when a particular data item is involved in many transactions, because the conflicts are only detected when requesting a commit time. This can degrade performance since transactions in this scenario have already reached an advanced phase (the applied phase). When too many transactions abort at this phase, it becomes a waste of processing resources. Once the number of on-going transactions involving a particular data item is equal to the max-trax value (which is configurable by the application), the system prevents transactions from accessing that data item until some of the transactions are completed.
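A minimal sketch of the max-trax check is shown below; the threshold value and the tsm.count_ongoing helper are assumptions made for illustration, not the prototype's actual code.

MAX_TRAX = 5   # assumed application-configurable limit of concurrent transactions per key

def can_access(tsm, key):
    # Refuse new transactions on a key that already has max-trax on-going
    # transactions (i.e. transactions without a commit timestamp at the TSM).
    return tsm.count_ongoing(key) < MAX_TRAX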

Also, the decision to put TSM controllers in both the DMS and the TPE means that both components have access to the TSM. This decision is strategic as it means that the TPE can also interact with the TSM directly. This improves the performance of the system because, when there is a loss of information (such as a transaction timestamp) at the TPE, the TPE can retrieve that information directly from the TSM.

5.4 APPLICATION DOMAIN

The application domain is used to test and evaluate the proposed system. The system implements a banking application known as the closed economy workload, similar to the implementation in [121], and is evaluated in the next chapter.

Typically, a banking application will have multiple bank accounts and allow banking transactions to take place across bank accounts. Operations that can take place on a bank account include checking accounts, cash deposits, cash transfers and deductions. Operations in the application must be atomic and consistent, i.e. they must follow the ACID properties. These operations allow users to insert, update and delete records, i.e., users can perform CRUD operations. In addition, users can perform multi-key operations such as transferring funds from one account to another. The application data is stored persistently at the DMS layer, which is the MongoDB database. Transactions take place at the TPE layer. So when a user makes a transfer operation (which is a multi-key transaction), the transaction takes the account details of both users and the amount to be transferred as arguments. The TPE receives the operation from the client and requests the users' account details from the DMS following the protocol in section 4.6. After the transaction, the updates are sent back to the DMS. There are two collections (tables are known as collections in MongoDB) in the DMS: (i) the accounts collection and (ii) the transactions collection. Each document (records are referred to as documents in MongoDB) in the database has multiple attributes. The accounts collection contains the following attributes:

_id: This is a unique identifier for each document in MongoDB. Every document in MongoDB must have an identifier attribute. When a new document is inserted in MongoDB, a value for the identifier may be specified; if it is not specified, MongoDB automatically assigns a value to the identifier (_id) attribute. In the proposed system, this attribute is used as a unique identifier for each account record. The unique identifier was generated using Python code.

Balance: The balance attribute is a user account attribute that stores the account balance for each record.

PendingTransactions: This attribute is an array type that stores the unique identifier for any ongoing transaction on that record i.e. it stores the transaction ID for ongoing transactions on that data item. Once a transaction is completed, its ID is pulled from the array of transaction IDs stored in this attribute. Since it is an array type, it can store multiple values. When there is no ongoing transaction, the value is an empty array ([]).

lastModified: As explained in section 4.8, every record contains the lastModified attribute which is a timestamp data type. The attribute stores the commit timestamp issued by the TSM for the most recently committed transaction on that record.

Figure 5.8 below shows the record for a user account stored at the DMS.

Figure 5.8: Account Details for an Account User
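Since Figure 5.8 is not reproduced here, an illustrative account document with the attributes described above is sketched below; all values are invented for the example.

# Illustrative document in the accounts collection (all values are invented).
account_doc = {
    "_id": "acc-001",                # unique account identifier
    "Balance": 2500,                 # current account balance
    "PendingTransactions": [],       # empty array: no on-going transaction
    "lastModified": 1478563200.0     # commit timestamp of the last committed transaction
}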


The second collection stored in the DMS is called the transactions collection. This collection stores information about all transactions that take place in the system. The collection contains the following attributes.

_id: This attribute, as mentioned earlier, is a unique attribute used to identify each document. For the transactions collection, the value of this attribute is not supplied by the user but automatically issued by MongoDB. This value is used to identify each transaction and represents the transaction ID for the application.

State: This attribute stores the state of the transaction which was explained in section 4.4. It is updated anytime the transaction state changes. The value of this attribute is used to know the current state of any transaction. This includes aborted transactions.

Value: For a multi-key transaction, the value attribute stores the amount of money to be transferred from one account to another. For an update operation, the attribute stores the new value supplied by the user, which will be set as the new balance for an account in the accounts collection.

Source: The Source attribute, stores the account ID of the user account from which the value would be deducted.

Destination: This attribute stores the account ID of the user account to which the value would be added.

lastModified: The lastModified attribute in the transactions document stores the timestamp of the last time the transaction state was changed. This also allows the system to monitor the time at which each operation in a transaction was executed.

Figure 5.9 below shows the record for a transaction in the done state. The records are also stored at the DMS.


Figure 5.9: Transaction Records
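As with Figure 5.8, an illustrative document from the transactions collection is sketched below; all values are invented for the example.

# Illustrative document in the transactions collection (all values are invented).
transaction_doc = {
    "_id": "tx-0042",                # transaction ID issued by MongoDB
    "State": "done",                 # current phase of the transaction
    "Value": 150,                    # amount transferred (or new balance for an update)
    "Source": "acc-001",             # account from which the value is deducted
    "Destination": "acc-002",        # account to which the value is added
    "lastModified": 1478563260.0     # time of the last state change
}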

The operations were implemented in the following methods.

ReadOp() – The ReadOp method takes a user account ID as an argument, and returns the account details record for that account ID.

Read-LatestOp() – This method also takes a user account ID as an argument and returns the account details for that account if there is no ongoing transaction.

UpdateOp() – The UpdateOp method takes two arguments, which include the account identifier and the new balance for that account. It returns a success message or a failure message depending on the outcome of the transaction.

Update-LatestOp() – The Update-LatestOp method also takes two arguments, which include the account identifier and the new balance for that account. It returns a success message or a failure message depending on the outcome of the transaction. If there is an ongoing transaction, it waits for a user-specified time before timing out.

MultiKeyTransactionOp() – The MultiKeyTransactionOp method invokes a transaction that involves multiple key items and operations. It involves the transfer of money from one account to another and also maintains the ACID properties of a NST transaction (see section 4.1).

In order to maintain consistency in the operation of multi-key transactions, the constraint holds that the total money in the closed economy is an invariant i.e. when money is transferred from one account to the other, the transaction must be atomic, leaving the database in a consistent state.


5.5 SUMMARY

This chapter provided an analysis of the proposed system. The design objectives which guided the approach taken in this system were clearly explained. The proposed approach aims to provide consistency and to maintain the ACID properties of transactions. The system design took into account the fact that cloud database systems need high availability and efficiency. Therefore, the implementation ensured that availability was not sacrificed while trying to achieve consistency. The implementation makes use of well-known industry tools and technologies such as MongoDB and SQLite. The programming language used in the implementation, Python 2.7, includes libraries that can interact with MongoDB, SQLite and a host of other cloud and relational databases. This provided a seamless interaction between the components of the system. The cloud layer was managed by Ubuntu Juju which is a cloud hosting platform that allows users to easily manage nodes in a cluster.

The chapter also explained the different types of operations that are supported by the prototype system. Three of these operations, which are novel to cloud systems, include the read-latest operation, the update-latest operation and the multi-key operation. The algorithms followed in the implementation of these operations were outlined and explained. Operations in any database can abort and the system must have a way to handle such scenarios to preserve consistency. This chapter explained the different scenarios that can cause an operation to abort. The chapter then explained the procedure followed by the proposed system to handle these scenarios for each of the operations.

Finally, this chapter explained the application domain which was used to implement the prototype system. The domain simulates a banking application which allows users to perform CRUD operations. The different components and attributes used in the implementation were explained. Since the system uses MongoDB to implement the DMS, the application stores its data persistently in MongoDB. The different data entities used in the implementation were explained.

The next chapter explains the various experimentation and metrics used to evaluate the system. The chapter also analyses the results of the experimentation.


CHAPTER 6

EXPERIMENTAL EVALUATION

This chapter evaluates the proposed system using workloads derived from well-known standard cloud benchmarks, namely YCSB (Yahoo! Cloud Serving Benchmark) and YCSB+T (an extension of YCSB with transactional workloads). The results of the evaluation are documented in this chapter. The proposed system is also evaluated in comparison to existing systems.

The evaluation of the proposed system is carried out assuming a failure-free environment. That is, failures such as system failures, network communication failures, etc., are not considered. This is in line with most existing approaches, which assume a failure-free environment in the evaluation of transaction processing in both classical databases and cloud databases.

Section 6.1 explains the benchmark and the workloads used to evaluate the proposed system. Section 6.2 explains the various experiments carried out and presents the results of the experiment. Section 6.3 analyses the proposed system in comparison with other similar systems.

6.1 EVALUATION BENCHMARKS AND WORKLOADS

The evaluation of the prototype system was carried out using a combination of two widely used cloud benchmarks, YCSB [122] and YCSB+T [123]. The YCSB benchmark is recognised as a standard benchmark used to evaluate cloud databases, while the YCSB+T benchmark was developed as an extension to the YCSB benchmark. The next section explains the two benchmarks as well as the workloads used in the evaluation.


6.1.1 YCSB and YCSB+T Benchmark

The workloads developed for evaluating the proposed system (explained in the next section) were adopted from a combination of the YCSB and the YCSB+T benchmarks. The YCSB is the most widely accepted cloud data benchmark and was developed to evaluate the performance of cloud data serving (distributed) systems. The YCSB benchmark evaluates performance by focussing on the latency of requests to the cloud database. The benchmark also aims to measure the trade-off between latency and throughput in cloud systems. As such, some of the workloads implemented in this evaluation were adopted from this benchmark. However, the benchmark was not designed to evaluate transactions and consistency, since most cloud databases do not provide transaction support. The YCSB+T benchmark was therefore designed as an extension of the YCSB benchmark to evaluate cloud databases that offer support for transactions. The YCSB+T takes into consideration the need to preserve ACID properties during the execution of operations in a transaction. Therefore, the YCSB+T benchmark is designed to detect consistency anomalies introduced during the execution of the transactions.

Therefore, the approach used in this evaluation is a combination of both benchmarks. The next section explains the workloads.

6.1.2 Workloads for Experiments

The proposed system supports the types of operations explained in section 5.3.1. The five (5) operations evaluated include the following:

Read: The read operation takes a data key identifier as an argument and returns the details for that key. Read operations are never blocked and will always return a result.

Read-Latest: This also takes a data key identifier as an argument and returns the details for that key if there is no running transaction on that key item. If there is, it waits for a specified amount of time and returns a failure if the specified time has elapsed and the transaction is still running.

Update: This takes a data key identifier and the detail to be updated as arguments. It retrieves the data and performs an update on the data.

Update-Latest: Update-Latest also performs an update on a data item if there are no running transactions involving that data item.

Multi-key: The multi-key transaction involves more than one data item and multiple operations. The execution of a multi-key transaction follows the constraints of a NST transaction defined in section 4.1 and preserves the ACID properties of transactions.

For read-latest and update-latest operations, an upper-limit value is arbitrarily chosen by the user, as explained in sections 5.3.1.2 and 5.3.1.5. The upper-limit value determines how long each of these two operations must wait for an ongoing transaction before it times out and aborts. This is similar to the timeout mechanisms applied in existing transactional protocols. For instance, in the 2PC protocol, the coordinator and participants have to wait for messages from each other. To prevent unnecessary delays, the protocol times out after a certain amount of time. The termination and restart protocol (explained in section 2.4.1) determines how the system behaves when such timeouts occur.

In choosing upper limit values for the Read-Latest and Update-Latest operations, the following factors were put into consideration.

(1) A very high upper-limit time means transactions may have a higher latency, which is not ideal for applications.

(2) A very high upper-limit time would mean that there would hardly be any aborts due to conflicts, since transactions would wait longer for on-going transactions to complete. This would defeat the purpose of evaluating the system, as it would be difficult to evaluate how the system reacts to conflicting operations.

(3) A very low upper-limit time would make it difficult to see the difference between an ordinary read (or update) operation and a read-latest (or update-latest) operation.


Putting these into consideration, the upper-limit time was set to two (2) seconds and three (3) seconds for read-latest and update latest operations respectively. The values were different for the two operations because it is expected that an update operation would normally take longer to process than a read operation. There is no generally accepted standard for setting transaction timeouts. It may vary from one protocol (or application) to another.

As stated earlier, the workloads adopted for this thesis were a combination of the workloads used in the YCSB and YCSB+T benchmarks. Workloads A, B and C were adopted from the YCSB benchmark. Workloads E and G were adopted from the YCSB+T benchmark and were used in this thesis to evaluate the performance difference between strictly read operations and strictly write operations. Workloads D and F were developed for the purpose of this research as both workloads are made up of operations that are novel to cloud database systems; the YCSB and YCSB+T benchmarks do not contain these operations. The application domain used to evaluate this system is closely similar to that of the YCSB+T benchmark, which simulates a banking application. Table 6.1 shows the workload mixtures used to evaluate the system.

Table 6.1: Workloads for Evaluation

WORKLOAD | TYPE | OPERATIONS
A | Read only | Read 100%
B | Read-heavy | Read 90%, Update 10%
C | Update heavy | Read 50%, Update 50%
D | Read-latest heavy | Read-latest 90%, Update 10%
E | Update only | Update 100%
F | Multi-key heavy | Multi-key 50% and Update-latest 50%
G | Multi-key transaction | Multi-key transactions 100%

The explanation of the operations in each workload is as follows.

Workload A: This workload contains only read operations. Therefore all the client operations generated by this workload are limited to read operation i.e., 100% read operation.


Workload B: This workload contains a mixture of read and update operations. This would help to assess if the system is optimized for read or for write operations. 90% of the operations generated are read operations and the remaining 10% are update operations. Therefore it is called Read-heavy.

Workload C: This workload also contains a mixture of read and writes operations. However, the ratio of read to write operation is 1:1. Therefore, 50% of the operations are read operations while update operations also represent 50%.

Workload D: Workload D aimed to assess the impact of the read-latest operation on performance and latency. As stated earlier, the upper-limit time for read-latest operation was set to two (2) seconds. Therefore, this workload was made up of 90% read-latest and 10% update operations.

Workload E: This workload is made up of only update operations. The Workload is compared with workload A to compare read versus write performance. The workload is therefore 100% update operations.

Workload F: This workload is a mixture of multi-key operations and update-latest operations. Both are write operations, and the workload is designed to assess the performance of the system under heavy writes.

Workload G: Workload G is 100% multi-key operations. This workload is carried out to assess the impact of maintaining ACID properties on the system.

Each of these workloads was run on various experiments based on certain metrics which are explained in the next section. These metrics are used to evaluate the system.

The data to be used for the evaluation was loaded into the data management store (DMS). For workloads A, B, C, D and E, one thousand (1000) unique data items were loaded into the system, while for workloads F and G, two thousand (2000) unique data key items were loaded into the system. Workloads F and G contain two thousand data items because these workloads include multi-key operations, which involve multiple key items. For a cloud system, one thousand (or two thousand) key items is relatively low. However, this number was chosen because the lower the number of data items involved in transactions, the higher the chances of a conflict. This allows us to evaluate the consistency of the system. For each of the workloads, one thousand requests per client are generated. The requests are randomly generated to perform operations on the pool of data loaded into the system. The requests are generated for 1, 2, 4, 8, 16 and 32 clients to check the performance of the system under load (each client generates one thousand requests). Therefore, at 32 clients, with each client generating one thousand requests, the total workload generated is thirty-two thousand (32,000) requests. Table 6.2 shows the total number of operations generated by the clients. In the YCSB+T benchmark, the performance of transactions was measured from one (1) to sixteen (16) clients. However, for this thesis, each experiment is extended to thirty-two (32) clients in order to add extra load to the system.

Table 6.2: Number of Clients and Operations Executed

Number of Clients | Operations Generated per Client | Total Number of Operations Executed
1 | 1,000 | 1,000
2 | 1,000 | 2,000
4 | 1,000 | 4,000
8 | 1,000 | 8,000
16 | 1,000 | 16,000
32 | 1,000 | 32,000

6.2 EXPERIMENTS AND RESULTS

Based on the above workloads, various experiments have been conducted in order to evaluate the proposed approach against the objectives of this research set out in Chapter 1.

Different sets of experiments are conducted by taking into account different factors. These include:


Set 1: Number of transactions per second:

This experiment measures the throughput (i.e. number of completed transactions per second) that the proposed system can handle for each of the workloads. All the workloads are run, and the number of completed transaction per second is recorded.

Set 2: Latency of each operation

This set of experiments measures the latency of each operation in each of the workloads while varying the number of client threads. The latency of an operation is the average time it takes to complete that operation.

Set 3: Percentage of total completed transactions

As stated earlier, the experiments assume a failure-free environment. However, the evaluation also considers that there are situations which would lead to transaction aborts. This set of experiments measures the percentage of operations in each workload that end with a successful commit.

Set 4: Number of failed transactions

This experiment measures the total number of transactions that were aborted for each of the workloads.

Set 5: Distribution of failed transactions

This set of experiments measures the distribution of aborted transactions. Recall from section 5.3.2, that there are three scenarios which can lead to aborts. This experiment measures the percentage distribution of each of these scenarios that caused the aborts.

Set 6: Overall latency of each workload

This set of experiments measures the total time it takes for each of the workloads to complete execution. Table 6.2 shows the total number of operations that are executed for each workload.


Set 7: Consistency Overhead

The consistency overhead measures the performance cost (measured by the latency of workloads) of implementing consistency in the system. The experiment compares the latency of update operations performed directly on the DMS with the latency of update operations performed via the TPE. Operations performed via the TPE implement snapshot isolation and provide stronger consistency guarantees.

Set 8: Anomaly scores

The consistency of the system is evaluated by measuring the correctness of multi-key transactions (workload G). The correctness of transactions is evaluated for varying numbers of clients (using 1, 2, 4, 8, 16 and 32 client threads). This involves a consistency check on the records in the DMS to determine if there is any anomaly. An anomaly is a scenario in which the system deviates from the expected behaviour, which could affect the consistency of the system. For instance, if a transaction fails to complete a rollback before it aborts, this represents an anomaly and leaves the database in an inconsistent state. The calculation of the anomaly score is adopted from the YCSB+T benchmark. An anomaly score is defined in [119] as the number of anomalies that are introduced into the system during the run of the workloads. This evaluation is carried out after each run of workload G. The expectation is that a consistent system should have an anomaly score of zero.

6.2.1 Experiment – Set 1

This experiment evaluates the ‘number of transactions per second’. The following table (Table 6.3) shows the throughput (i.e. number of transactions per second) that the prototype system can handle for each of the workloads.


Table 6.3: Total Number of Completed Transactions per Second

No of Clients | Workload A (Read 100%) | Workload B (Read 90%, Update 10%) | Workload C (Update heavy) | Workload D (Read-latest 90%, Update 10%) | Workload E (Update 100%) | Workload F (Multi-key 50%, Update-latest 50%) | Workload G (Multi-key 100%)
1 | 128 | 28 | 8 | 31 | 4 | 5 | 3
2 | 207 | 56 | 10 | 52 | 5 | 8 | 4
4 | 206 | 59 | 11 | 57 | 5 | 6 | 4
8 | 164 | 46 | 9 | 54 | 5 | 8 | 5
16 | 146 | 39 | 11 | 52 | 5 | 5 | 6
32 | 134 | 47 | 11 | 42 | 5 | 5 | 5

Note that the YCSB+T and YCSB benchmarks do not specify the size of the data used for transactions. However, the average size of each key item (or document, as in MongoDB) used in each operation is 112 bytes. Each multi-key transaction contains two key items (and 4 interleaved read and write operations). The data in the table above is presented as a graph below to show the relationships between the workloads.

Figure 6.1: Transactions per Second

Figure 6.1 above shows the number of operations (and transactions) per second that the prototype system can handle. The figure also shows that the system is highly optimised for reads. Other operations such as Update, Read-latest, Update-latest and Multi-key transactions are relatively slower because they provide a stronger consistency guarantee. In order to achieve that, they incur extra message communication and processing delays. Each of these operations must contact the TSM for information on previous transactions and wait for the TSM's response before they can progress. This introduces some latency into the system. However, this is a trade-off that must be made for the system to guarantee consistency. For example, a simple update operation without any consistency check takes an average of 0.009 s per operation, while adding the consistency check increases the average time for a transaction to 0.2 s. This is because adding consistency means every transaction must make at least two trips to the TSM. The additional latency, Lu, incurred by an update (update and update-latest) operation with consistency is explained with the equation below.

Lu = [4 Ttsm + Tst + Tct] (6.1)

Where Ttsm is the time taken for a network message trip between the DMS and the TSM (i.e. each round trip to receive a timestamp from the TSM represents two network message journeys, to and fro), Tst is the time taken for the TSM to process and issue a start time and Tct is the time taken for the TSM to process and issue a commit time. It is expected that Tct would be greater than Tst because, before Tct is issued, the TSM must check all on-going operations as well as operations that started after the current operation's start time to ensure that there are no conflicting operations. For read operations, the latency is lower because a read operation does not need any form of contact with the TSM.

The equation is slightly more complex for a multi-key transaction because a multi-key transaction involves multiple data items. It involves a new variable Nd, which is the number of data items involved in the transaction. Therefore, for a multi-key transaction, equation (6.1) above is modified as follows:

Lu = [(Nd * 4 Ttsm) + Tst + Tct] (6.2)
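As a purely illustrative calculation of equations (6.1) and (6.2), the short Python sketch below uses assumed (not measured) values for the message and processing times.

# Purely illustrative calculation of equations (6.1) and (6.2); all values are assumed.
T_tsm = 0.02   # assumed one-way DMS-TSM message time (seconds)
T_st  = 0.01   # assumed TSM processing time to issue a start time
T_ct  = 0.03   # assumed TSM processing time to issue a commit time
N_d   = 2      # number of data items in a multi-key transaction

L_update    = 4 * T_tsm + T_st + T_ct           # eq. (6.1): 0.12 s of extra latency
L_multi_key = N_d * 4 * T_tsm + T_st + T_ct     # eq. (6.2): 0.20 s of extra latency
print(L_update, L_multi_key)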

The system is able to handle roughly 207 read operations per second at its peak. For write operations (update, update-latest and multi-key), the number of transactions per second is reduced. For instance, the results show that the throughput of Workload G is about six transactions per second. This is expected, as a write operation will incur higher latency than read operations due to the forced-writes or logs involved. Also, the new operations, read-latest and update-latest, have lower throughput than the read and update operations respectively. Again, this is expected behaviour. These two operations give users the power to determine the level of consistency that they feel would be suitable for their application. Thus, a higher value for the upper-limit would lead to a stronger consistency guarantee. However, as stated earlier, this is a trade-off against latency, as the transaction can be slower depending on the number of ongoing transactions on that data item. This approach to calculating the network latencies has been used in previous research. For instance, in [16], the latency of various protocols was calculated using this approach; the performance of 2PC is measured using the number of messages exchanged, forced-write operations, etc.

Figure 6.2: Workload F vs. Workload G

6.2.2 Experiment – Set 2

This experiment measures the average time taken to complete an operation for each of the workloads.


Figure 6.3: Average latency per transaction

Figure 6.3 above shows that a read transaction takes an average of 0.006 seconds, an update transaction an average of 0.2 seconds, and a multi-key transaction an average of 0.3 seconds to complete. This shows that the system is able to provide lower latency than some existing systems. For instance, Megastore [51] has a latency of about 0.4 seconds for a write operation and up to tens of milliseconds for read operations, both of which are higher than those of the proposed system.

6.2.3 Experiment – Set 3:

This experiment evaluates the “Percentage of Total Completed Transactions”. As stated earlier, for each of the workloads one thousand requests were generated per client. The graph in Figure 6.4 shows the percentage of requests that were completed for workloads A–F. A completed request means operations and transactions that were committed, i.e. read operations that returned a result and successful write operations (and transactions). Recall from the definitions and constraints of a NST in section 4.1 that a NST is of type seq (Begin | OPi | Cmt | Abt), with the condition that either commit (Cmt) or abort (Abt) occurs only once within the sequence. The formula below shows the calculation of the percentage transaction completion rate PTcr.

PTcr = (Nct / Nr) * 100 (6.3)


Where Nct is the total number of completed transactions per workload and Nr is the total number of client requests issued.
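As a simple illustration, the completion rate of equation (6.3) can be computed as follows. The workload counts used here are placeholders for illustration, not the measured results of the experiments.

```python
def completion_rate(n_ct, n_r):
    """Percentage transaction completion rate, equation (6.3): PTcr = (Nct / Nr) * 100."""
    return (n_ct / n_r) * 100

# Hypothetical example: 992 committed transactions out of 1000 client requests.
print(completion_rate(n_ct=992, n_r=1000))  # 99.2
```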

Figure 6.4: Percentage of completed transactions

For read operations (workload A), all requests were completed without any aborts. This is in line with the behaviour of read operations under snapshot isolation, as explained in section 5.1.1. For the other workloads there were a few aborted operations due to the failure of the TSM to issue a start-time or commit-time. Nevertheless, most of the workloads maintained a completion rate above 99%, with the exception of workload G, which is shown in Figure 6.5 below.

Figure 6.5: Percentage throughput of Multi-key transactions


Workload G had a relatively lower percentage throughput (PTcr) when compared to the other workloads. Again, this is expected, as a multi-key operation is a transaction involving more than one data item and multiple operations. Also recall from definition one in chapter four that a NST is a sequence of operations which are executed such that either all of them complete successfully or none at all; this preserves the atomicity property of transactions. As such, it is expected that a multi-key operation will have a lower throughput than the other operations. However, even with thirty-two clients (32,000 operations), the system is able to process over 80% of transactions to completion.
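The all-or-nothing behaviour of a multi-key transaction described above can be illustrated with a minimal sketch. This is not the prototype's code path; the in-memory store, the undo log and the injected failure condition are assumptions used purely to show that either every key is updated or none is.

```python
class AbortTransaction(Exception):
    """Raised when any write in a multi-key transaction fails."""

def multi_key_write(store, updates):
    """Apply all key/value updates atomically: keep an undo log and roll back
    every applied write if any single write fails."""
    undo = {}
    try:
        for key, value in updates.items():
            if key not in store:               # illustrative failure condition
                raise AbortTransaction(key)
            undo[key] = store[key]             # remember the old value
            store[key] = value                 # tentative write
    except AbortTransaction:
        for key, old in undo.items():          # roll back the applied writes
            store[key] = old
        return False                           # none of the updates survive
    return True                                # all updates applied

store = {"k1": 1, "k2": 2}
assert multi_key_write(store, {"k1": 10, "missing": 99}) is False
assert store == {"k1": 1, "k2": 2}             # atomicity preserved
```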

6.2.4 Experiment – Set 4

The figure below shows the total number of aborted transactions for workloads A to F.

Figure 6.6: Total Number of Aborted Transactions

Figure 6.6 shows that there were no aborted operations in workload A. All other workloads show relatively low numbers of aborted transactions. Figure 6.7 below shows that multi-key transactions have a relatively higher number of aborted transactions when compared to the other workloads. Figure 6.7 also shows a comparison between the number of aborted transactions in workload A (read-only operations) and workload G (multi-key operations).

Figure 6.7: Workload A vs Workload G (Aborted Transaction)

Generally, most of the aborts were a result of the inability of the TSM to keep up with the speed of incoming requests when issuing transaction start-times. When an operation is unable to receive a start timestamp, this leads to an abort of type Ab-S1, as explained in section 4.7. This accounted for a very high percentage of the aborts, as most of the aborted transactions occurred while the transaction was in the initial state. Also, because multi-key transactions involve multiple data items, each of these data items could be involved in other transactions, which leads to a higher number of aborts in workload G. Future work will implement queues to control the rate at which client requests arrive at the TSM. This can improve the rate of transaction completion but will incur a higher latency.

The number of Ab-S2 and Ab-S3 aborts was very low. The main cause of an Ab-S2 abort is the failure of one part of a transaction, leading to a total abort of the operation. On the other hand, an Ab-S3 abort occurs when a transaction is not issued with a commit timestamp, mainly as a result of a conflicting transaction. The rate of data conflict occurrence (transactions aborted in the “applied” state) was very low; as such, transactions that aborted at the point of retrieving the transaction commit-time were very rare. This is because, for this type of failure to occur, another transaction must have its commit-time in the interval between the start-time and commit-time of an ongoing transaction. For instance, for an Ab-S3 abort to occur on a transaction Ti, a transaction Tj must have its commit time within the interval S (see figure below) and must share one or more data items with transaction Ti.


Figure 6.8: Period between transaction start time and commit time

Where S is the interval/period during which transaction Ti takes place, Ti start is the start time of transaction Ti, and Ti commit is the commit time of transaction Ti.

As Figure 6.8 above shows, for a multi-key transaction, which has the highest latency, S = 0.3s. Since each request was drawn randomly from a set of one thousand data items, the probability of an Ab-S3 abort occurring was very low (this conflict condition is sketched below). The next set of results shows the distribution of the aborted transactions.
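The Ab-S3 condition can be expressed as a small predicate: a committed transaction Tj conflicts with an ongoing transaction Ti if Tj's commit time falls inside Ti's interval S and the two transactions touch at least one common data item. The sketch below is illustrative only; the function and field names are assumptions, not the prototype's actual data structures.

```python
def causes_ab_s3(ti_start, ti_commit_request, tj_commit, ti_keys, tj_keys):
    """Return True if committed transaction Tj forces an Ab-S3 abort of Ti:
    Tj committed inside Ti's interval S and the transactions share a data item."""
    inside_interval = ti_start < tj_commit < ti_commit_request
    shares_data = bool(set(ti_keys) & set(tj_keys))
    return inside_interval and shares_data

# Hypothetical timestamps (seconds) and key sets.
print(causes_ab_s3(10.0, 10.3, 10.1, {"acct1", "acct2"}, {"acct2"}))  # True
print(causes_ab_s3(10.0, 10.3, 10.1, {"acct1"}, {"acct3"}))           # False
```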

6.2.5 Experiment – Set 5

This set of experiments measures the distribution of aborted transactions. As mentioned earlier, the three types of aborts that can occur are Ab-S1, Ab-S2 and Ab-S3. Abort Ab-S1 has the highest frequency. As explained, an Ab-S1 abort occurs when a transaction (or operation) does not get a start timestamp; at this stage, the transaction is in the initial phase. As soon as a start time is issued, a transaction enters a pending state, where the write operations take place. If any part of the write operation fails, then a rollback begins, followed by an abort; this is abort Ab-S2. If all operations are successful, the transaction moves into an applied state and proceeds to retrieve a commit time from the TSM. Failure of a transaction to retrieve a commit time leads to abort type Ab-S3.

Most of the aborts, however, are caused by the TSM. As such, for workloads B and C, the results showed that most of the aborts occurred while the transaction was still at the initial stage. This was because the transactions were unable to receive a start-time, as the TSM could not keep up with the speed of the requests. The TSM therefore represents a bottleneck and a single point of failure for the system. Future work will look at ways in which the TSM can be further optimised, in order to ensure proper synchronization across replicas of the TSM. Table 6.4 below shows the state the transactions were in when they were aborted.

State    | Workload A | Workload B | Workload C | Workload D | Workload E | Workload F | Workload G
Initial  | 0          | 99.4%      | 99.6%      | 98.3%      | 95.5%      | 93.37%     | 99.7%
Pending  | 0          | 0.6%       | 0.4%       | 1.7%       | 0.5%       | 6.34%      | 0.3%
Applied  | 0          | 0          | 0          | 0          | 0          | 0.29%      | 0
Total    | 0          | 100%       | 100%       | 100%       | 100%       | 100%       | 100%

Table 6.4: Percentage distribution of transaction state when abort took place

The table above shows that a very high percentage of the total aborted transactions in each workload took place while the transactions were still in their initial state. Though this affects the throughput, it does not affect the consistency of the data. That is, in the initial state transactions do not manipulate (update) data, and thus there is no risk of data inconsistency.
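The relationship between the transaction states and the abort types discussed in this section can be summarised in a small sketch. The state names follow the description above; the dictionary-based representation is an assumption for illustration, not the prototype's implementation.

```python
# Map each transaction state to the abort type raised if the transaction
# fails while in that state (as described for Ab-S1, Ab-S2 and Ab-S3).
ABORT_TYPE_BY_STATE = {
    "initial": "Ab-S1",   # no start timestamp received from the TSM
    "pending": "Ab-S2",   # a write failed, rollback then abort
    "applied": "Ab-S3",   # no commit timestamp (conflict detected)
}

def classify_abort(state):
    """Return the abort type for a transaction that fails in the given state."""
    return ABORT_TYPE_BY_STATE[state]

print(classify_abort("initial"))  # Ab-S1
```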

6.2.6 Experiment – Set 6

This experiment evaluates the total time taken for each of the workloads to complete one thousand requests per client. The latency measured covers both committed and aborted requests. Figure 6.9 below displays the total request latency for each of the workloads. The results show that the system is able to process client requests at relatively low latencies. The system can complete thirty-two thousand (32,000) read transactions in about 236 seconds. For a single client, the system can process one thousand read operations in just under 8 seconds.


Figure 6.9: Latency of Workload Requests

6.2.7 Experiment – Set 7

This experiment measures the performance cost of implementing consistency. Figure 6.10 below compares the latency of a simple update operation on the DMS (with no consistency check) with the latency of our update operation, which guarantees stronger consistency by implementing snapshot isolation (using the TSM). This result shows the overhead (in terms of latency cost) involved in guaranteeing consistency. Figure 6.10 shows that, in order to maintain consistency, the system incurs a higher level of latency due to the extra processing and messaging time (see equation (6.1) in section 6.2.1). However, this is a trade-off between latency/performance and a stronger consistency guarantee.

Figure 6.10: Transactional Overhead of Update Operations
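A minimal way to reproduce this kind of overhead comparison is to time both variants of the update over the same number of requests. The sketch below is a generic timing harness built on assumed stand-in functions (plain_update and consistent_update); it is not the YCSB-based harness used in the actual experiments.

```python
import time

def mean_latency(operation, n=1000):
    """Average wall-clock latency (seconds) of `operation` over n calls."""
    start = time.perf_counter()
    for _ in range(n):
        operation()
    return (time.perf_counter() - start) / n

def plain_update():        # stand-in for a direct DMS update (no consistency check)
    time.sleep(0.0001)

def consistent_update():   # stand-in for an update that round-trips to the TSM
    time.sleep(0.001)

overhead = mean_latency(consistent_update) - mean_latency(plain_update)
print(f"consistency overhead per update: {overhead:.4f}s")
```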


6.2.8 Experiment – Set 8

An anomaly occurs when the system deviates from its expected behaviour, leading to errors and violating the consistency guarantee provided by the system. As mentioned earlier, the method for calculating anomalies was adopted from the YCSB+T benchmark and was also adopted in [121]. An example of an anomaly in this case is a transaction that aborts mid-way without the rollback taking place. Another anomaly that can occur is the failure of the TSM to detect conflicts between transactions. As mentioned in section 5.4, the implementation domain used to evaluate the prototype system is known as the closed-economy workload. This is also adopted from the YCSB+T benchmark, which models a banking environment. The formula for calculating the anomaly score is given below.

ϒ = |Sinitial − Sfinal| / ɳ (6.4)

Where ϒ is the simple anomaly score, Sinitial is the initial sum of all the accounts, Sfinal is the final sum of all the accounts and ɳ is the number of operations executed. The table below shows the anomaly score for the multi-key workloads.

Number of clients | Number of operations | Initial sum | Final sum  | Anomaly score
1                 | 1000                 | 400000000   | 400000000  | 0
2                 | 2000                 | 400000000   | 400000000  | 0
4                 | 4000                 | 400000000   | 400000000  | 0
8                 | 8000                 | 400000000   | 400000000  | 0
16                | 16000                | 400000000   | 400000000  | 0
32                | 32000                | 400000000   | 400000000  | 0

Table 6.5: Anomaly Score

The formula is well suited to evaluating the consistency of multi-key transactions, as a multi-key transaction must leave the database in a consistent state. In the banking application domain, the total amount of money in the economy is an invariant. Therefore, during a transfer of funds from one account to another (which is essentially a multi-key transaction), the amount deducted from one account must be added to the other account in order to complete the transaction atomically and leave the database in a consistent state. As such, the sum of all accounts must remain invariant irrespective of the number of operations (or transfers, in this case) executed. Any difference between the initial sum of the accounts and the final sum of the accounts after the execution of the workload would therefore imply that the database is in an inconsistent state. The anomaly score is, however, calculated in relation to the number of operations because the higher the number of requests, the higher the possibility of inconsistencies. Table 6.5 shows that our system maintains an anomaly score of zero irrespective of the number of clients or the number of operations. This shows that the system has a strong consistency guarantee with no errors. The price of this, however, is a slightly higher latency for transactions. A comparison of the anomaly scores derived from this evaluation with the anomaly scores reported for ReTSO in [121] shows that this system does better than ReTSO with respect to consistency guarantees. Those experiments were designed to identify the anomalies present in the execution of operations, and the results show that 11% of the total number of failed transactions in ReTSO were due to consistency anomalies.
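The closed-economy invariant described above can be checked with a few lines of code. The sketch below assumes the anomaly score is the absolute difference between the initial and final account totals divided by the number of operations, matching the variables defined earlier; the account data and transfer routine are assumptions for illustration, not the actual workload implementation.

```python
def anomaly_score(initial_sum, final_sum, n_operations):
    """Simple anomaly score: difference of the account totals per operation.
    A score of zero means the closed-economy invariant was preserved."""
    return abs(final_sum - initial_sum) / n_operations

def transfer(accounts, src, dst, amount):
    """A multi-key transaction: debit one account and credit another atomically."""
    accounts[src] -= amount
    accounts[dst] += amount

accounts = {"a": 1000, "b": 1000, "c": 1000}
initial = sum(accounts.values())
for _ in range(100):
    transfer(accounts, "a", "b", 5)
print(anomaly_score(initial, sum(accounts.values()), 100))  # 0.0 -> consistent
```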

6.3 ANALYSIS OF THE PROPOSED SYSTEM AND EXISTING APPROACHES

In this thesis, the evaluation of the proposed system was carried out using two standard benchmarks, YCSB and YCSB+T, rather than by comparing it to a single existing approach. The evaluation therefore provided an in-depth analysis of the various features of the proposed approach and determined that it has met the objectives set for this research.

Further, the reason for not restricting the evaluation of the proposed approach to a comparison with a single existing approach is that there are significant differences between the transaction models, protocols, design and implementation of the proposed system and those of existing systems.

The architecture of the system implemented in this thesis is similar to the architectures of Deuteronomy [102] and Megastore [51], which follow the middleware approach explained in section 3.3.2. The rationale behind the choice of this middleware architecture is that it allows our system to separate transactional functionality from storage functionality. However, the method of implementing transactional semantics deviates from the implementation in Deuteronomy [102] [113]. Deuteronomy makes use of data locking, which adds extra overhead and reduces concurrency.

The proposed system implements transactional semantics in a way similar to [104], using an optimistic concurrency control technique which allows for a higher level of concurrency. Compared to the approach in [104], the critical and novel aspect of the implementation in this thesis is the commit ("lastmodified") timestamp parameter associated with every data item – a commit timestamp that allows the proposed system to identify the replica holding the latest version of any key item. Since the prototype system does not follow the one-copy serializable transaction model of [111], the "lastmodified" timestamp parameter becomes an important mechanism for maintaining consistency in our system. This removes the extra effort required to maintain one-copy serializable transactions without jeopardizing the consistency of the database. Maintaining one-copy serialization can affect availability negatively, since replicas are not allowed to be out of date; this can lead to transactions being rejected when a replica has failed.
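A minimal sketch of how a per-item commit timestamp can be used to pick the freshest copy is shown below. The replica layout and field names (value, lastmodified) are assumptions for illustration rather than the prototype's exact MongoDB schema.

```python
def latest_copy(replicas, key):
    """Given several replicas of the same key-value store, return the copy of
    `key` carrying the highest 'lastmodified' commit timestamp."""
    copies = [r[key] for r in replicas if key in r]
    return max(copies, key=lambda doc: doc["lastmodified"])

replica_1 = {"user:42": {"value": "v1", "lastmodified": 1001}}
replica_2 = {"user:42": {"value": "v2", "lastmodified": 1005}}  # most recent commit
print(latest_copy([replica_1, replica_2], "user:42"))  # {'value': 'v2', 'lastmodified': 1005}
```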

Moreover, the proposed system incorporates a Timestamp Manager, similar to ReTSO [104], which allows it to manage transactions across the system. ReTSO also makes use of snapshot isolation but has a different architecture and a different set of operations. The proposed system design has taken advantage of this centralized timestamp manager to introduce new types of operations called update-latest and read-latest. A brief comparison with ReTSO shows that ReTSO has a higher throughput than the proposed system [104]. However, ReTSO stops short of providing multi-key transactions and cannot guarantee consistency at the level of our system.

Megastore [51], G-Store [100] and [106] all implement transactions using a middleware architecture, as in our system; however, their transactions can only occur among data in a subset of the total data in the system (typically among data that share a common boundary). The proposed system can implement transactions across all the data items in the system and does not limit transactional capabilities to a subset of data (or entity groups). As stated earlier, the experimental results also show that the proposed system has a lower latency per operation than Megastore [51]. The results show that Megastore has a write latency of about 0.4 seconds and a read latency of up to tens of milliseconds, whereas the proposed system has a write latency of between 0.2 and 0.3 seconds and a read latency of 0.006 seconds.

6.4 SUMMARY

The aim of this research was to implement transactions in a NoSQL cloud database. In line with the design objectives highlighted in section 5.1, the prototype implementation aimed to achieve high availability and a high level of concurrency without sacrificing consistency. This evaluation has shown that these objectives were met. The above results show that the system is able to provide multi-key transaction support on a cloud database.

In summary, the system is able to reliably manage transactions with ACID-level consistency. When a failure or a conflict occurs in a transaction, the system is able to detect it. It moves the transaction into a “cancelling” state and initiates the rollback action. Rollbacks have been designed to be idempotent. On completion of a rollback action, the system changes the transaction's state to “cancelled”. Also, as shown in [111], the prototype model can be implemented using other key-value stores.
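The cancelling/cancelled sequence described above can be sketched as follows. This is illustrative only: the transaction record, the undo log and the state names mirror the description in this summary, not the prototype's actual code.

```python
def rollback(txn, store):
    """Idempotent rollback: restore the pre-transaction values and mark the
    transaction 'cancelled'. Re-running it leaves the store unchanged."""
    if txn["state"] == "cancelled":       # already rolled back: do nothing
        return
    txn["state"] = "cancelling"
    for key, old_value in txn["undo"].items():
        store[key] = old_value            # restoring old values is safe to repeat
    txn["state"] = "cancelled"

store = {"k": 5}
txn = {"state": "applied", "undo": {"k": 1}}
rollback(txn, store)
rollback(txn, store)                      # second call is a no-op
print(store, txn["state"])                # {'k': 1} cancelled
```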


CHAPTER 7

CONCLUSION AND FUTURE WORK

This thesis researched cloud computing technologies in general and NoSQL databases in particular. NoSQL databases were designed mainly to solve big data problems such as efficiency, availability and scalability. Though classical relational databases ensure strong consistency of data and support transactions, they were found to be inadequate for the requirements of big data. The architectural style and design of NoSQL databases allow them to store large volumes of big data and to provide high efficiency, availability and scalability in the processing of big data, but at the cost of data consistency and a lack of support for transactions. The research carried out in this thesis has clearly identified the need for and importance of transactional support in NoSQL databases. Further, existing research [19] has stressed the need for transaction support in NoSQL databases and for the golden standard of ACID properties.

This research addressed the problem of implementing ACID transactions in NoSQL databases. In order to address this problem, the research followed an appropriate methodological approach.

The challenges with NoSQL databases and their design decisions were thoroughly analysed and understood. The research problems were identified based on an extensive review of the existing literature and state-of-the-art systems. With an in-depth understanding and appreciation of the problem area, the research proceeded to propose a solution for implementing ACID transactions in NoSQL databases. The constraints of the proposed system were clearly defined and specified. Based on these constraints, the prototype system was developed, implemented and tested. The system was then evaluated using standard cloud database benchmarks. The results of the evaluation show that the system was able to process transactions efficiently and maintain a high level of consistency.


This chapter outlines the contributions made by this research, critically analyses the system and highlights its limitations. Finally, it suggests ways in which the system can be improved. These improvements serve as a basis for future research opportunities.

7.1 CONTRIBUTIONS

The following are the contributions made by this thesis.

1. The definition of a new Multi-Key transaction model for NoSQL systems: This thesis defined a model for implementing transactions in NoSQL databases (see chapter 4). The approach differs from others in that it does not sacrifice the consistency of the system, as most NoSQL databases do. The theoretical basis of the model was defined in section 4.1.

2. Development of a loosely-coupled architecture that implements transaction logic using a middleware approach: To realise the defined multi-key transaction model, this thesis developed and implemented an architecture containing different components that interact with each other to achieve the model. Each of these components has specific duties and functions, which were explained in section 4.3.

3. A new protocol for asynchronous replica management: Maintaining consistency among replicas of a database is non-trivial. Section 4.8 explains a new protocol for managing replicas designed and implemented in this thesis. The protocol makes use of the Time Stamp Manager (TSM) component of the system to monitor and identify replicas that are stale. The stale replicas are not allowed to be involved in transactions until they are up to date.

4. The development of new types of operations (read and write) that have a stronger consistency guarantee which can be adjusted based on user requirements: Each of these novel operations allows users/applications to determine their level of consistency and latency. This thesis outlined the protocol followed by these operations in section 5.3.1.

5. The development of a prototype system using a real NoSQL system, MongoDB, which is evaluated using the YCSB+T benchmark based on the standard Yahoo! Cloud Services Benchmark (YCSB): The proposed approach (NoSQL-TX) was implemented using cloud technologies and languages including MongoDB, Python, SQL and JavaScript. The results show enhanced consistency and performance.

The proposed approach is suitable for applications that need transactions and strong consistency. The approach can also work with applications that involve interactions between different key items, for instance an online application that allows multiple users to buy and sell their items; in such applications there would be interaction among user IDs. The system can also be used to manage an online shopping application where users can bid for certain items. In such applications, an item remains available even when a user has put in a bid for it; however, as soon as a user buys the item, it becomes unavailable. The Time Stamp Manager (TSM) can help the system determine which user has paid for the item first (using the commit timestamp) and then reject all other bids. The TSM can also make bidders aware of the number of bids currently in the system. This is similar to an application used to book seats on a flight during check-in: multiple passengers can see a seat as available until the first passenger books it.
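In the bidding scenario, the commit timestamps issued by the TSM provide a natural way to decide which purchase succeeds. The sketch below is a simplified illustration under assumed data structures; it is not the prototype's bidding logic.

```python
def settle_bids(bids):
    """bids: list of (user, commit_timestamp) pairs for the same item.
    The bid that committed first wins; every other bid is rejected."""
    winner = min(bids, key=lambda bid: bid[1])
    rejected = [user for user, ts in bids if (user, ts) != winner]
    return winner[0], rejected

winner, rejected = settle_bids([("alice", 105), ("bob", 101), ("carol", 110)])
print(winner, rejected)  # bob ['alice', 'carol']
```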

7.2 CRITICAL ANALYSIS

This section provides a critical analysis of the system developed in this research. On the one hand the analysis provides a critique of the proposed system, and on the other hand it sets the directions for future work.

First, the proposed system is built around a single NoSQL database, MongoDB. Though this research provided a justification for the choice of MongoDB, the proposed system could also have been validated using other NoSQL databases. As mentioned earlier, MongoDB is a document-class NoSQL database and allows relationships among data entities to be expressed. As a result, the model implemented in this work may not necessarily work with all NoSQL databases. Some NoSQL databases do not allow relationships among data objects to be expressed, and the model may not be suitable for such databases. This represents a limitation of the system.

Second, some of the components of the prototype system could have been implemented so that they cope better with an increasing number of incoming transactions. For instance, the Time Stamp Manager was not fully optimised to keep up with the speed of incoming requests, especially when the number of requests increases. The reason for this is that, before the TSM issues a commit time, it has some processing work to do: it must check for conflicts among ongoing transactions as well as recently completed transactions. When two or more transactions approach the TSM at the same time, the TSM queues them. However, when the number of concurrent transactions gets very high, the TSM tends to reject requests. In addition, the TSM could have been replicated in order to survive possible failures. Currently the proposed system uses a single TSM, which is susceptible to failures.

To mitigate these limitations, the next section outlines possible solutions that future research can explore.

7.3 FUTURE WORK

This section explores the various options that can be implemented to optimise the model of transaction processing proposed by this research. The suggestions are explained below.

Controller - Implementing a queuing system that can manage traffic between the system components and the TSM can improve the transaction throughput (rate of completion) of the system. Requests can be sent through a queue so that, as the rate of incoming requests increases, they are buffered rather than lost. This can have a slight effect on latency but can improve the rate of completion of requests. One such technology that could be used in our system is RabbitMQ [124]. RabbitMQ is a messaging system that allows applications to connect to each other and scale.
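As a sketch of how such a controller could sit in front of the TSM, the snippet below publishes timestamp requests to a durable RabbitMQ queue using the pika client. The queue name, message format and connection settings are assumptions; the actual integration with the TSM would need further design.

```python
import json
import pika  # RabbitMQ client library

# Producer side: a DMS/TPE node enqueues a timestamp request instead of
# calling the TSM directly, so bursts of requests are buffered, not dropped.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="tsm_requests", durable=True)

request = {"txn_id": "t-001", "kind": "start_time"}   # hypothetical message format
channel.basic_publish(
    exchange="",
    routing_key="tsm_requests",
    body=json.dumps(request),
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)
connection.close()
```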

Optimization of messaging processes – In implementing transactions, there are message exchanges between components of the system. The process of messaging (and information exchange between the DMS, TPE and TSM) increases the latency of operations, particularly write (update and multi-key) operations. Future research should look into how this information exchange can be optimised in order to reduce the number of network trips.

Main memory processing for the TSM – To speed up its processing, the TSM can be designed to store and process its data in main memory. This will reduce disk latency and the overheads caused by disk I/O. The design of the system included this feature; however, the prototype system stored its components on hard disk to prevent failures and loss of data. Also, the prototype system was implemented on commodity servers. It is believed that if the Time Stamp Manager (TSM) is implemented on a high-performance system, it will improve the throughput and performance of the system.

Replicating the TSM – As mentioned earlier, the TSM represents a single point of failure for the system. If the TSM fails, transactions will never be issued with a start-time or commit-time and therefore cannot progress. Future work should consider implementing multiple TSMs to process transactions when the rate of client requests gets very high. The TSM can also be replicated for fault tolerance.


REFERENCES

[1] D. W. Cearly and M. J. Walker, “The Top 10 Strategic Technology Trends for 2015,” Gartner Press Release, 2015. [Online]. Available: https://www.gartner.com/doc/2966917?srcId=1-3132930191#- 1012988714. [Accessed: 03-May-2016].

[2] Q. Zhang, L. Cheng, and R. Boutaba, “Cloud computing: State-of-the-art and research challenges,” J. Internet Serv. Appl., vol. 1, no. 1, pp. 7–18, 2010.

[3] M. Armbrust, I. Stoica, M. Zaharia, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, and A. Rabkin, “A view of cloud computing,” Commun. ACM, vol. 53, no. 4, p. 50, 2010.

[4] D. J. Abadi, “Data Management in the Cloud : Limitations and Opportunities,” Bull. IEEE Comput. Soc. Tech. Commitee Data Eng., pp. 1– 10, 2009.

[5] P. Mell and T. Grance, “The NIST Definition of Cloud Computing Recommendations of the National Institute of Standards and Technology,” 2011.

[6] The Open Group, “Service Oriented Architecture.” [Online]. Available: http://www.opengroup.org/subjectareas/soa. [Accessed: 01-Feb-2016].

[7] D. Kossmann, T. Kraska, and S. Loesing, “An Evaluation of Alternative Architectures for Transaction Processing in the Cloud,” Proc. 2010 Int. Conf. Manag. data - SIGMOD ’10, p. 579, 2010.

[8] J. Hamilton, J. M. Hellerstein, M. Stonebraker, and J. Hamilton, “Architecture of a Database System,” J. M. Hellerstein, vol. 1, no. 2, pp. 141–259, 2007.

[9] P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1986.

[10] R. Smith, R. Harrison, S. Wood, D. Sussman, A. Fedorov, S. Murphy, and Home, Professional Active Server Pages 2.0, 2nd ed. Birmingham, UK, UK: Wrox Press Ltd., 1998.

[11] A. Ogunyadeka, M. Younas, H. Zhu, and A. Aldea, “A Multi-Key Transactions Model for NoSQL Cloud Database Systems,” in IEEE Bigdata2016, 2016.

[12] M. Indrawan-Santiago, “Database research: Are we at a crossroad? Reflection on NoSQL,” Proc. 2012 15th Int. Conf. Network-Based Inf. Syst. NBIS 2012, pp. 45–51, 2012.

[13] R. Jimenez-Peris, M. Patino-Martinez, G. Alonso, and B. Kemme, “How to select a Replication Protocol according to Scalability, Availability and Communication overhead,” Proc. 20th IEEE Symp. Reliab. Distrib. Syst., 2001.

[14] L. Lamport, “The Part-Time Parliament,” ACM Trans. Comput. Sys-tems, vol. 16, no. 2, pp. 133–169, 1998.

[15] L. Lamport, “Paxos Made Simple,” ACM SIGACT News, vol. 32, no. 4, pp. 51–58, 2001.

[16] M. Younas, B. Eaglestone, and K.-M. Chao, “A low-latency resilient protocol for e-business transactions,” Engineering, vol. 1, no. 3, pp. 278–296, 2004.

[17] C. Babcock, “Surprise: 44% Of Business IT Pros Never Heard of NoSQL,” Information Week, 2010. [Online]. Available: http://www.informationweek.com/database/surprise-44--of-business-it- pros-never-heard-of-/d/d-id/1092523?

[18] M. Stonebraker, “Why Enterprises Are Uninterested in NoSQL,” Commun. ACM, vol. 54, no. 8, p. 10, 2011.

[19] M. Stonebraker, “SQL databases v. NoSQL databases,” Commun. ACM, vol. 53, no. 4, p. 10, 2010.

[20] C. Kothari, Research methodology: methods and techniques. 2004.


[21] R. Elio, J. Hoover, I. Nikolaidis, M. Salavatipour, L. Stewart, and K. Wong, “About Computing Science Research Methodology,” p. 9, 2011.

[22] M. Berndtsson, J. Hansson, B. Olsson, and B. Lundell, “Thesis Projects: A Guide for Students in Computer Science and Information Systems,” Springer-Verlag New York, Inc., Secaucus, NJ, USA, vol. 1, 2007.

[23] R. Ramakrishnan and J. Gehrke, Database Management Systems, vol. 8, no. 4. 2003.

[24] S. Sumathi and S. Esakkirajan, Fundamentals of Relational Database Management Systems. 2007.

[25] E. F. Codd, “A relational model of data for large shared data banks,” Commun. ACM, vol. 26, no. 6, pp. 64–69, 1983.

[26] K. P. Eswaran, J. N. Gray, R. a. Lorie, and I. L. Traiger, “The notions of consistency and predicate locks in a database system,” Commun. ACM, vol. 19, no. 11, pp. 624–633, 1976.

[27] C. Lu, T. Masuzawa, and M. Mosbah, Eds., “Principles of Distributed Systems - 14th International Conference, {OPODIS} 2010, Tozeur, Tunisia, December 14-17, 2010. Proceedings,” 2010, vol. 6490.

[28] David M. Kroenke and D. J. Auer, Database Processing, Fundamentals, Designing, and Implementations, 12th ed., vol. 1. Pearson, 2012.

[29] J. Kreps, “The Log : What every software engineer should know about real- time data ’ s unifying abstraction,” Linkedin Engineering, 2013. [Online]. Available: https://engineering.linkedin.com/distributed-systems/log-what- every-software-engineer-should-know-about-real-time-datas-unifying.

[30] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz, “ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging,” ACM Trans. Database Syst., vol. 17, no. 1, pp. 94–162, 1992.

[31] A. Jhingran and P. Khedkar, “Analysis of Recovery in a Database System Using a Write-ahead Log Protocol,” SIGMOD Rec., vol. 21, no. 2, pp. 175– 184, Jun. 1992.


[32] M. Özsu and P. Valduriez, Principles of distributed database systems. 2011.

[33] J. N. Gray, “Transparency in its Place - The Case Against Transparent Access to Geographically Distributed Data,” Readings Database Syst., no. 21667, pp. 592–602, 1994.

[34] A. S. Tanenbaum and M. Van Steen, Distributed Systems: Principles and Paradigms, 2/E. 2007.

[35] H. Garcia-Molina, J. D. Ullman, and J. Widom, “Database Systems: The Complete Book,” Education, p. 1248, 2008.

[36] Gerhard Weikum and G. Vossen, “Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery,” ACM SIGMOD Rec., vol. 30, no. 4, p. 853, 2001.

[37] R. Casado and M. Younas, “Emerging Trends and Technologies in Big Data Processing,” Concurr. Comput. Pr. Exper., vol. 27, no. 8, pp. 2078–2091, Jun. 2015.

[38] P. Zikopoulos, C. Eaton, D. Deroos, T. Deutsch, and G. Lapis, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, vol. 53, no. 9. McGraw-Hill Osbord MEdia, 2012.

[39] M. Stonebraker and U. Çetintemel, “‘One Size Fits All’: An Idea Whose Time Has Come and Gone,” 2005.

[40] C. L. Philip Chen and C. Y. Zhang, “Data-intensive applications, challenges, techniques and technologies: A survey on Big Data,” Inf. Sci. (Ny)., vol. 275, no. January, pp. 314–347, 2014.

[41] M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland, “The End of an Architectural Era (It’s Time for a Complete Rewrite),” Dbms, vol. 12, pp. 1150–1160, 2007.

[42] ScaleDB, “Big Data and Transactional Databases: Exploding Data Volume is Creating New Stresses on Traditional Transactional Databases.” [Online]. Available: http://docplayer.net/1826199-Big-data-and-transactional-databases-exploding-data-volume-is-creating-new-stresses-on-traditional-transactional-databases.html. [Accessed: 04-Nov-2015].

[43] T. Kraska and B. Trushkowsky, “The new database architectures,” IEEE Internet Comput., vol. 17, pp. 72–75, 2013.

[44] R. Hecht and S. Jablonski, “NoSQL evaluation: A use case oriented survey,” Proc. - 2011 Int. Conf. Cloud Serv. Comput. CSC 2011, pp. 336–341, 2011.

[45] T. Hoff, “An Unorthodox Approach To Database Design: The Coming Of The Shard,” High Scalability, 2009. [Online]. Available: http://highscalability.com/unorthodox-approach-database-design-coming-shard.

[46] D. J. DeWitt and J. Gray, “Parallel Database Systems: The Future of Database Processing or a Passing Fad?,” vol. 19, no. 4, pp. 104–112, 1990.

[47] D. Rubio, “Web Application Performance and scalability techniques,” Web Forefront, 2013. [Online]. Available: http://www.webforefront.com/performance/scaling101.html.

[48] M. M. Waldrop, “More than Moore,” Nature, vol. 530, p. 145, 2016.

[49] J. Gray, P. Helland, P. O’Neil, and D. Shasha, “The Dangers of Replication and a Solution,” 1996, no. P115, p. 0.

[50] G. Decandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo: Amazon’s Highly Available Key-value Store,” pp. 205–220, 2007.

[51] J. Baker, C. Bond, J. Corbett, and J. Furman, “Megastore: Providing Scalable, Highly Available Storage for Interactive Services.,” Cidr, pp. 223– 234, 2011.

[52] T. Desai and J. Prajapati, “A Survey of Various Load Balancing Techniques and Challenges in Cloud Computing,” Int. J. Sci. Technol. Res., vol. 2, no. 11, pp. 158–161, 2013.


[53] D. Borthakur, “HDFS architecture guide,” Hadoop Apache Proj. http//hadoop apache …, pp. 1–13, 2008.

[54] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” ACM SIGOPS Oper. Syst. Rev., vol. 37, p. 29, 2003.

[55] E. A. Brewer, “Towards robust distributed systems,” Proc. Annu. ACM Symp. Princ. Distrib. Comput., vol. 19, pp. 7–10, 2000.

[56] A. Fox, S. D. Gribble, and E. a Brewer, “Cluster-Based Scalable Network Services Symposium on Operating Systems Principles,” no. October, 1997.

[57] Laksham Avinash and Prashant Malik, “Cassandra: a decentralized structured storage system,” ACM SIGOPS Oper. Syst. Rev., pp. 1–6, 2010.

[58] S. Gilbert and N. Lynch, “Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services,” ACM SIGACT News, vol. 33, no. 2, p. 51, 2002.

[59] D. Abadi, “Problems with CAP, and Yahoo’s little known NoSQL system,” 2010. [Online]. Available: http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and- yahoos-little.html\npapers3://publication/uuid/7223E100-97C1-4BF8- BAF9-EC66435F781F. [Accessed: 20-Apr-2016].

[60] B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni, “PNUTS: Yahoo!’s Hosted Data Serving Platform,” Proc. VLDB Endow., vol. 1, no. 2, pp. 1277– 1288, Aug. 2008.

[61] D. Abadi, “Consistency Tradeoffs in Modern Distributed Database System Design: CAP is Only Part of the Story,” Computer (Long. Beach. Calif)., vol. 45, no. 2, pp. 37–42, 2012.

[62] M. Brooker, “CAP and PACELC: Thinking More Clearly About Consistency.” [Online]. Available: https://brooker.co.za/blog/2014/07/16/pacelc.html.

[63] H. Robinson, “CAP Confusion: Problems with ‘partition tolerance,’” Cloudera Engineering Blog, 2010. [Online]. Available: http://blog.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/.

[64] M. Stonebraker, “Errors in Database Systems, Eventual Consistency, and the CAP Theorem,” Communications of the ACM, 2010. [Online]. Available: http://cacm.acm.org/blogs/blog-cacm/83396-errors-in-database-systems- eventual-consistency-and-the-cap-theorem/fulltext.

[65] E. Brewer, “CAP twelve years later: How the ‘rules’ have changed,” Computer (Long. Beach. Calif)., vol. 45, no. 2, pp. 23–29, 2012.

[66] P. Bailis and A. Ghodsi, “Eventual Consistency Today: Limitations, Extensions, and Beyond,” Commun. ACM, vol. 56, no. 5, pp. 55–63, 2013.

[67] W. Vogels, “Eventually Consistent,” Queue, vol. 6, p. 14, 2008.

[68] T. Kraska, M. Hentschel, G. Alonso, and D. Kossmann, “Consistency Rationing in the Cloud : Pay only when it matters,” Proc. VLDB Endow., vol. 2, no. 1, pp. 253–264, 2009.

[69] N. Antonopoulos and L. Gillam, “Cloud Computing: Principles, Systems and Applications,” Media, vol. 54. p. 379, 2010.

[70] J. Han, E. Haihong, G. Le, and J. Du, “Survey on NoSQL database,” Proc. - 2011 6th Int. Conf. Pervasive Comput. Appl. ICPCA 2011, pp. 363–366, 2011.

[71] D. Agrawal, A. El Abbadi, S. Antony, and S. Das, “Data Management Challenges in Cloud Computing Infrastructures,” Dnis, pp. 1–10, 2010.

[72] S. Das, S. Antony, D. Agrawal, and A. El Abbadi, “Clouded Data: Comprehending Scalable Data Management Systems,” Analysis, no. November, 2008.

[73] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. a. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, “Bigtable: A distributed storage system for structured data,” 7th Symp. Oper. Syst. Des. Implement. (OSDI ’06), Novemb. 6-8, Seattle, WA, USA, pp. 205–218, 2006.


[74] M. Burrows, “The Chubby lock service for loosely-coupled distributed systems,” OSDI ’06 Proc. 7th Symp. Oper. Syst. Des. Implement. SE - OSDI '06, pp. 335–350, 2006.

[75] J. Dean and S. Ghemawat, “Simplified data processing on large clusters,” Sixth Symp. Oper. Syst. Des. Implement., vol. 51, no. 1, pp. 107–113, 2004.

[76] M. Stonebraker, “The Case for Shared Nothing,” Contract, vol. 9, pp. 1–5, 1986.

[77] M. Hogan, “Shared-Disk vs. Shared-Nothing: Comparing Architectures for Clustered Databases by,” 2015.

[78] R. Burtica, E. M. Mocanu, M. I. Andreica, and N. Tapus, “Practical application and evaluation of no-SQL databases in Cloud Computing,” SysCon 2012 - 2012 IEEE Int. Syst. Conf. Proc., pp. 79–84, 2012.

[79] L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System,” Commun. ACM, vol. 21, no. 7, pp. 558–565, Jul. 1978.

[80] Highly Scalable Blog, “Distributed Algorithms in NoSQL Databases,” 2012. [Online]. Available: https://highlyscalable.wordpress.com/2012/09/18/distributed-algorithms- in-nosql-databases/.

[81] R. H. Thomas, “A Majority consensus approach to concurrency control for multiple copy databases,” ACM Trans. Database Syst., vol. 4, no. 2, pp. 180–209, 1979.

[82] C. Plattner and G. Alonso, “Ganymed: scalable replication for transactional web applications,” Proc. ACM/IFIP/USENIX Int. Conf. Middlew., pp. 155– 174, 2004.

[83] ORACLE, “Master-slave vs. peer-to-peer archictecture: benefits and problems,” 2014. [Online]. Available: https://blogs.oracle.com/NoSQL/entry/master_slave_vs_peer_to.

[84] P. Hunt, M. Konar, F. Junqueira, and B. Reed, “ZooKeeper: Wait-free Coordination for Internet-scale Systems,” USENIX Annu. Tech., vol. 8, pp. 11–11, 2010.

[85] C. Ji, Y. Li, W. Qiu, U. Awada, and K. Li, “Big data processing in cloud computing environments,” Proc. 2012 Int. Symp. Pervasive Syst. Algorithms, Networks, I-SPAN 2012, pp. 17–23, 2012.

[86] MongoDB, “MongoDB Documentation,” 2013. [Online]. Available: https://docs.mongodb.com/manual/.

[87] K. Salem and H. Garcia-Molina, “Main Memory Database Systems: An Overview,” in IEEE Transactions On Knowledge and Data Engineering, 1992.

[88] B. Fitzpatrick, “Distributed Caching with Memcached,” Linux J., no. 124, p. 5–, Aug. 2004.

[89] D. Obasanjo, “When Databases Lie: Consistency vs. Availability in Distributed Systems,” 2007. [Online]. Available: http://www.25hoursaday.com/weblog/2007/10/10/WhenDatabasesLieCo nsistencyVsAvailabilityInDistributedSystems.aspx.

[90] P. Sadalage, “NoSQL Databases: An Overview,” ThoughtWorks, 2014. [Online]. Available: https://www.thoughtworks.com/insights/blog/nosql- databases-overview.

[91] R. Cattell, “Scalable SQL and NoSQL data stores,” ACM SIGMOD Rec., vol. 39, no. 4, p. 12, 2011.

[92] D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin, “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web,” in Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing, 1997, pp. 654–663.

[93] P. Raman, A. D. George, M. Radlinski, and R. Subramaniyan, “GEMS : Gossip-Enabled Monitoring Service for Heterogeneous Distributed Systems,” J. Networks, Softw. Tools Appl. Comput., vol. 9, pp. 101–120, 2006.


[94] K. K. A and S. Surendran, “BigTable , Dynamo & Cassandra – A Review,” Int. J. Electron. Comput. Sci. Eng., vol. 2, pp. 133–141, 2012.

[95] D. Obasanjo, “Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes,” newtelligence dasBlog, 2009. [Online]. Available: http://www.25hoursaday.com/weblog/2009/01/16/BuildingScalableDatab asesProsAndConsOfVariousDatabaseShardingSchemes.aspx. [Accessed: 12- Feb-2016].

[96] A. Dey, a Fekete, and U. Röhm, “Scalable transactions across heterogeneous NoSQL key-value data stores,” Proc. VLDB Endow., vol. 6, no. 12, pp. 1434–1439, 2013.

[97] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford, “Spanner: Google’s Globally-Distributed Database,” Osdi, pp. 251–264, 2012.

[98] W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen, “Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS,” pp. 1–16.

[99] D. K. Gifford, “Information Storage in a Decentralized Computer System,” Stanford University, Stanford, CA, USA, 1981.

[100] S. Das and A. El Abbadi, “G-Store : A Scalable Data Store for Transactional Multi key Access in the Cloud,” Group, pp. 163–174, 2010.

[101] Z. Wei, G. Pierre, and C. H. Chi, “CloudTPS: Scalable transactions for web applications in the cloud,” IEEE Trans. Serv. Comput., vol. 5, pp. 525–539, 2012.

[102] J. J. Levandoski, “Deuteronomy : Transaction Support for Cloud Data,” Read, vol. 48, pp. 123–133, 2011.


[103] D. Peng and F. Dabek, “Large-scale Incremental Processing Using Distributed Transactions and Notifications,” Dbms, vol. 2006, pp. 1–15, 2010.

[104] F. Junqueira, B. Reed, and M. Yabandeh, “Lock-free transactional support for large-scale storage systems,” Proc. Int. Conf. Dependable Syst. Networks, pp. 176–181, 2011.

[105] F. Junqueira and B. Reed, “BookKeeper,” Confluence, 2016. [Online]. Available: https://cwiki.apache.org/confluence/display/BOOKKEEPER/BookKeeper.

[106] P. a. Bernstein, I. Cseri, N. Dani, N. Ellis, A. Kalhan, G. Kakivaya, D. B. Lomet, R. Manne, L. Novik, and T. Talius, “Adapting microsoft SQL server for cloud computing,” 2011 IEEE 27th Int. Conf. Data Eng., pp. 1255–1263, 2011.

[107] S. Harizopoulos and D. Abadi, “OLTP through the looking glass, and what we found there,” Proc. 2008 …, vol. pages, p. 981, 2008.

[108] S. Das, D. Agrawal, and A. El Abbadi, “ElasTraS: An Elastic Transactional Data Store in the Cloud,” p. 5, 2010.

[109] S. Das, D. Agrawal, and A. El Abbadi, “ElasTraS: An elastic, scalable, and self-managing transactional database for the cloud,” ACM Trans. Database Syst., vol. 38, no. 1, pp. 5:1–5:45, 2013.

[110] M. K. Aguilera, M. K. Aguilera, A. Merchant, A. Merchant, M. Shah, M. Shah, A. Veitch, A. Veitch, C. Karamanolis, and C. Karamanolis, “Sinfonia: a new paradigm for building scalable distributed systems,” {SIGOPS} Oper. Syst. Rev., vol. 41, no. Figure 1, pp. 159–174, 2007.

[111] R. Escriva, B. Wong, and E. Sirer, “Warp: Lightweight Multi-Key Transactions for Key-Value Stores,” Rescrv.Net, 2013.

[112] A. E. Lotfy, A. I. Saleh, H. A. El-Ghareeb, and H. A. Ali, “A middle layer solution to support ACID properties for NoSQL databases,” J. King Saud Univ. - Comput. Inf. Sci., vol. 28, no. 1, pp. 133–145, 2016.

[113] D. Lomet, A. Fekete, G. Weikum, and M. Zwilling, “Unbundling Transaction Services in the Cloud,” arXiv Prepr. arXiv0909.1768, pp. 1–10, 2009.

[114] F. Cabrera, G. Copeland, B. Cox, T. Freund, J. Klein, T. Storey, and S. Thatte, “Web Services Transaction (WS-Transaction),” IBM Dev., p. 25, 2002.

[115] D. R. K. Ports and K. Grittner, “Serializable Snapshot Isolation in PostgreSQL,” Proc. VLDB Endow., vol. 5, no. 12, pp. 1850–1861, 2012.

[116] Y. Sovran, R. Power, M. K. Aguilera, and J. Li, “Transactional storage for geo-replicated systems,” Proc. Twenty-Third ACM Symp. Oper. Syst. Princ. - SOSP ’11, p. 385, 2011.

[117] M. Stonebraker and R. Cattell, “10 Rules for Scalable Performance in ‘Simple Operation’ Datastores,” Commun. ACM, vol. 54, no. 6, p. 72, 2011.

[118] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O’Neil, and P. O’Neil, “A Critique of ANSI SQL Isolation Levels,” pp. 1–10, 2007.

[119] A. S. Dey, “CHERRY GARCIA : TRANSACTIONS ACROSS HETEROGENEOUS DATA STORES,” University of Sydney, 2015.

[120] Z. Parker, S. Poe, and S. V. Vrbsky, “Comparing NoSQL MongoDB to an SQL DB,” Proc. 51st ACM Southeast Conf. - ACMSE ’13, p. 1, 2013.

[121] Y. Lu, “Serializable Snapshot Isolation in Shared-Nothing, Distributed Database Management Systems,” pp. 1–10.

[122] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, “Benchmarking cloud serving systems with YCSB,” Proc. 1st ACM Symp. Cloud Comput. - SoCC ’10, pp. 143–154, 2010.

[123] A. Dey, A. Fekete, R. Nambiar, and U. Rohm, “YCSB+T: Benchmarking web- scale transactional databases,” Proc. - Int. Conf. Data Eng., pp. 223–230, 2014.

[124] W. Smith, “An Information Architecture Based on Publish/Subscribe Messaging,” in Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery, 2011, pp. 27:1–27:2.
