The Current Positioning of the NoSQL Database
THE CURRENT POSITIONING OF THE NOSQL DATABASE
SHOULD NOSQL DATABASES SUBSTITUTE OR COMPLEMENT TRADITIONAL RELATIONAL DATABASES AND HOW DOES THIS HAVE IMPLICATIONS FOR UNIVERSITY EDUCATION GIVEN THE PRACTICAL USE?

Word count: 31,527
Ewoud Stroom
Student number: 01409310
Supervisor: Prof. Dr. Geert Poels
Master’s Dissertation submitted to obtain the degree of: Master in Business Engineering: Data Analytics
Academic year: 2018 – 2019

This page is not available because it contains personal information. Ghent University Library, 2021.

Preface

This Master’s Dissertation was written to obtain the degree of Master of Science in Business Engineering at Ghent University. I chose the subject of the current positioning of NoSQL database technology because of my growing interest in this emerging trend. The subject was first proposed by my supervisor, Prof. Dr. Geert Poels, who emphasized how highly topical discussions about NoSQL databases are. In the current literature, there is no real consensus about the positioning of NoSQL in the database landscape. This challenge encouraged me to analyse the subject and to contribute my own insights.

In addition, it was an opportunity for me to use the knowledge I gained during 5 years of education to its full potential. On the one hand, the experience I had gained in conducting sound qualitative and quantitative research could be put into practice with programs such as SPSS and NVivo. On the other hand, the content of the subject could be linked with my specialization in Data Analytics, and because it was highly topical, I was able to gain insight into the current trends in today’s world of data.

I would also like to thank a few people in particular. First, I want to thank my supervisor, Prof. Dr. Geert Poels, for giving me the opportunity to work on this project and for steering me in the right direction whenever I was facing difficulties. For me, the most challenging aspects were interpreting the results of the quantitative research and finding a correct way to analyse the impact of this subject on university education, but thanks to the insights and feedback from my supervisor, this came to a good end. Furthermore, I would like to thank the interviewees for their time and cooperation, as well as my friends and acquaintances for helping to find the interviewees and putting me in contact with them. Finally, I would also like to thank my family and friends for their support during some long days and nights.

Ewoud Stroom
Ghent, 4th of June 2019

Abstract

Nowadays, data is everything. The world is built on digitally stored information derived from data; without our realising it, data is ingrained in today’s culture and influences everyone’s private and professional lives. This dependency on data requires a reliable way of storing, managing, and retrieving all kinds of data, and this is where database technology comes in. Trends in the database landscape, such as the development of the Internet and the emergence of Big Data, cloud computing, and social media, have caused issues for traditional relational databases and resulted in the rise of a hype called NoSQL databases. An organization’s choice of which database type to use in a particular situation is essential, and therefore it is important to evaluate both database types and gain insight into the positioning of NoSQL databases in comparison with relational databases, which means defining them as complements or substitutes. This insight is exactly where the focus of this thesis lies. To obtain this insight, a qualitative study was conducted among 6 employees of different organizations, to discover possible patterns in practice.
An additional quantitative study was performed in 2 Flemish universities to measure the current level of education about this subject. This resulted in a well-founded comparison of NoSQL and relational databases with implications for university education.

Keywords: database, relational, NoSQL, complements, Big Data, cloud computing

Table of contents

PREFACE
ABSTRACT
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES AND TABLES
1. INTRODUCTION
2. LITERATURE STUDY
  2.1. RELATIONAL DATABASES
    2.1.1. PROBLEM
    2.1.2. SOLUTION: THE RELATIONAL DATABASE
      2.1.2.1. Origin: the relational model of E.F. Codd
        2.1.2.1.1. OPTIMIZING DATA INDEPENDENCY
        2.1.2.1.2. INTERPRETATION OF THE RELATIONAL VIEW
        2.1.2.1.3. LIMITATIONS OF CODD’S MODEL
      2.1.2.2. Timeline: from Codd’s relational view to the current relational database
        2.1.2.2.1. THE ‘70S
        2.1.2.2.2. THE ‘80S
        2.1.2.2.3. THE ‘90S AND EARLY 2000S
      2.1.2.3. General definition
    2.1.3. ADVANTAGES
      2.1.3.1. Data independence
      2.1.3.2. Acid properties
      2.1.3.3. Simplicity
      2.1.3.4. Security
      2.1.3.5. Multiple access
  2.2. NOSQL DATABASES
    2.2.1. PROBLEM
      2.2.1.1. Increasing volumes of data
      2.2.1.2. Rise of unstructured data
      2.2.1.3. Scalability
      2.2.1.4. Connectivity
      2.2.1.5. Cost
    2.2.2. SOLUTION: NOSQL DATABASES
      2.2.2.1. Initial rise
      2.2.2.2. General definition
        2.2.2.2.1. DEFINITION
        2.2.2.2.2. NOSQL: NO SQL – NOT ONLY SQL – NON-RELATIONAL
        2.2.2.2.3. TYPES
          2.2.2.2.3.1. Key-value store databases
          2.2.2.2.3.2. Document store databases
          2.2.2.2.3.3. Column-oriented databases
          2.2.2.2.3.4. Graph databases
    2.2.3. ADVANTAGES
      2.2.3.1. Big Data handling
      2.2.3.2. Scalability
      2.2.3.3. Continuous availability
      2.2.3.4. Open-source
      2.2.3.5. Cloud computing
      2.2.3.6. Suitable architectures
  2.3. COMPARISON OF RELATIONAL & NOSQL DATABASES
  2.4. TRENDS IN THE DATABASE TECHNOLOGY
    2.4.1. THE NOSQL ‘HYPE’
    2.4.2. CURRENT POSITIONING
  2.5. CONCLUSION LITERATURE STUDY
3. METHODOLOGY
  3.1. RESEARCH QUESTION
  3.2. RESEARCH FRAMEWORK
  3.3. RESEARCH SCOPE
  3.4. RESEARCH DESIGN
    3.4.1. QUALITATIVE RESEARCH
    3.4.2. QUANTITATIVE RESEARCH
  3.5. RESEARCH INSTRUMENTS
    3.5.1. INTERVIEW AS AN INSTRUMENT
    3.5.2. RATING SCALE AS AN INSTRUMENT
  3.6. DATA COLLECTION
    3.6.1. LITERATURE STUDY
    3.6.2. EMPIRICAL RESEARCH
      3.6.2.1. Interview research
        3.6.2.1.1. DATA COLLECTION PLAN
        3.6.2.1.2. SAMPLE
          3.6.2.1.2.1. Sampling method
          3.6.2.1.2.2. Sample size
      3.6.2.2. Study guide research
        3.6.2.2.1. DATA COLLECTION PLAN
        3.6.2.2.2. SAMPLE
          3.6.2.2.2.1. Sampling method
          3.6.2.2.2.2. Sample size
  3.7. DATA ANALYSIS
    3.7.1. ANALYSIS OF THE INTERVIEWS
    3.7.2. ANALYSIS OF THE STUDY GUIDES
  3.8. DATA RESULTS COMPARED
  3.9. VALIDITY AND RELIABILITY
4. RESULTS
  4.1. INTERVIEW RESEARCH
    4.1.1. CHARACTERISTICS OF THE USED DATABASE
    4.1.2. CHARACTERISTICS OF THE DATABASE TECHNOLOGY
    4.1.3. UNIVERSITY IMPLICATIONS
    4.1.4. OPINION ABOUT THE DATABASE TECHNOLOGY
      4.1.4.1. Feeling about databases
      4.1.4.2. Importance of the database type
    4.1.5. ACQUIRED INSIGHTS
      4.1.5.1. The link between the importance and the overall feeling
      4.1.5.2. The link between timing and the way of obtaining and maintaining
      4.1.5.3. The link between the positioning of the DB technology and the importance
      4.1.5.4. Differences according to the type of employee
  4.2. STUDY GUIDE RESEARCH
    4.2.1. CHARACTERISTICS OF THE COURSES
      4.2.1.1. Descriptive statistics
      4.2.1.2. Statistical validation
    4.2.2. INFLUENCES ON THE CHARACTERISTICS OF THE COURSES
      4.2.2.1. Descriptive statistics
      4.2.2.2. Statistical validation
5. DISCUSSION
  5.1. DISCUSSION EMERGING FROM THE RESEARCH & IMPLICATIONS
  5.2. LIMITATIONS OF THE RESEARCH
  5.3. SUGGESTIONS FOR FURTHER RESEARCH
6. CONCLUSION
REFERENCES
APPENDICES
  A.1 INTERVIEWING GUIDE
  A.2 CHECKLIST WITH KEYWORDS
  A.3 SPSS RESULTS

List of abbreviations

DB: database
DBMS: database management system
RDBMS: relational database management system
SQL: Structured Query Language
IDC: International Data Corporation
ER: entity relationship

List of figures and tables

Figures

FIGURE 2.1. OVERVIEW OF THE LITERATURE STUDY
FIGURE 2.2. EXAMPLE OF A RELATIONAL MODEL EXPLAINED WITH THE CONCEPTS
FIGURE 2.3. EXAMPLE OF THE WORKING OF A DBMS
FIGURE 2.4. SCALABLE OF DATA SIZE FROM 2007 TO 2010 (TAURO, ARAVINDH, & SHREEHARSHA, 2012)
FIGURE 2.5. DIFFERENCE BETWEEN STRUCTURED DATA AND UNSTRUCTURED
FIGURE 2.6. GROWTH OF INFORMATION CONNECTEDNESS (TAURO, ARAVINDH, & SHREEHARSHA, 2012)
FIGURE 2.7. EXAMPLE OF A DOCUMENT STORE DATABASE
FIGURE 2.8.