A Methodology for Evaluating Relational and Nosql Databases for Small-Scale Storage and Retrieval

Air Force Institute of Technology AFIT Scholar Theses and Dissertations Student Graduate Works 9-1-2018 A Methodology for Evaluating Relational and NoSQL Databases for Small-Scale Storage and Retrieval Ryan D. Engle Follow this and additional works at: https://scholar.afit.edu/etd Part of the Databases and Information Systems Commons Recommended Citation Engle, Ryan D., "A Methodology for Evaluating Relational and NoSQL Databases for Small-Scale Storage and Retrieval" (2018). Theses and Dissertations. 1947. https://scholar.afit.edu/etd/1947 This Dissertation is brought to you for free and open access by the Student Graduate Works at AFIT Scholar. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of AFIT Scholar. For more information, please contact [email protected]. A METHODOLOGY FOR EVALUATING RELATIONAL AND NOSQL DATABASES FOR SMALL-SCALE STORAGE AND RETRIEVAL DISSERTATION Ryan D. L. Engle, Major, USAF AFIT-ENV-DS-18-S-047 DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY AIR FORCE INSTITUTE OF TECHNOLOGY Wright-Patterson Air Force Base, Ohio DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited. AFIT-ENV-DS-18-S-047 The views expressed in this paper are those of the author and do not reflect official policy or position of the United States Air Force, Department of Defense, or the U.S. Government. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States. i AFIT-ENV-DS-18-S-047 A METHODOLOGY FOR EVALUATING RELATIONAL AND NOSQL DATABASES FOR SMALL-SCALE STORAGE AND RETRIEVAL DISSERTATION Presented to the Faculty Department of Systems and Engineering Management Graduate School of Engineering and Management Air Force Institute of Technology Air University Air Education and Training Command In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy Ryan D. L. Engle, BS, MS Major, USAF September 2018 DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited. ii AFIT-ENV-DS-18-S-047 A METHODOLOGY FOR EVALUATING RELATIONAL AND NOSQL DATABASES FOR SMALL-SCALE STORAGE AND RETRIEVAL Ryan D. L. Engle, BS, MS Major, USAF Committee Membership: Brent T. Langhals, PhD Chairman Michael R. Grimaila, PhD, CISM, CISSP Member Douglas D. Hodson, PhD Member ADEDJI B. BADIRU, PhD Dean, Graduate School of Engineering and Management iii AFIT-ENV-DS-18-S-047 Abstract Modern systems record large quantities of electronic data capturing time-ordered events, system state information, and behavior. Subsequent analysis enables historic and current system status reporting, supports fault investigations, and may provide insight for emerging system trends. Unfortunately, the management of log data requires ever more efficient and complex storage tools to access, manipulate, and retrieve these records. Truly effective solutions also require a well- planned architecture supporting the needs of multiple stakeholders. Historically, database requirements were well-served by relational data models, however modern, non-relational databases, i.e. NoSQL, solutions, initially intended for “big data” distributed system may also provide value for smaller-scale problems such as those required by log data. However, no evaluation method currently exists to adequately compare the capabilities of traditional (relational database) and modern NoSQL solutions for small-scale problems. This research proposes a methodology to evaluate modern data storage and retrieval systems. While the methodology is intended to be generalizable to many data sources, a commercially- produced unmanned aircraft system served as a representative use case to test the methodology for aircraft log data. The research first defined the key characteristics of database technologies and used those characteristics to inform laboratory simulations emulating representative examples of modern database technologies (relational, key-value, columnar, document, and graph). Based on those results, twelve evaluation criteria were proposed to compare the relational and NoSQL database types. The Analytical Hierarchy Process was then used to combine literature findings, laboratory simulations, and user inputs to determine the most suitable database type for the log data use case. The study results demonstrate the efficacy of the proposed methodology. iv AFIT-ENV-DS-18-S-047 Dedication This work is dedicated to my mother who instilled integrity in me and always inspired me to do my best. I also wish to thank my wife and children for their consistent support, encouragement, and patience. Finally, I want to thank God for unwavering grace and love. v AFIT-ENV-DS-18-S-047 Acknowledgments This research project would not have been possible without the support and guidance of my research committee. First and foremost, I want to thank my advisor, Dr. Brent Langhals, who asked me if I ever considered earning a PhD, helped me apply for this program, guided me towards and ultimately across the PhD finish line, and provided professional mentorship. Second, I want to thank Dr. Michael Grimaila for guiding me through my initial research effort, always pushing me to perform at a high level, and providing personal and professional mentorship and support. Next, I would also like to thank Dr. Douglas Hodson for providing peerless software engineering expertise, a multitude of creative ideas for research topics, questions, and directions, travel opportunities and companionship, unending support and encouragement, and great stories. Next, I want to thank Dan Koorn, for his expertise, time, and patience. I also want to thank TSgt Charlie Loper for the same things. Additionally, I would like to thank Dr. Timothy Lacey, Ryan Harris, and John Phillips for their support and computing resources for my research. Also, I wish to thank the field study participants but since I promised to not reveal their information, no names will be provided. Finally, I would like to thank Lt Col Logan Mailloux who provided encouragement, military mentorship and comradery, and ultimately friendship. vi AFIT-ENV-DS-18-S-047 List of Figures Figure 1. Simplified Data Lifecycle.............................................................................................. 21 Figure 2. Graph of Relationship between Data Lifecycle and 5 V's. ............................................ 21 Figure 3. Simplified Traditional Three-Tier Database System Architecture. ............................... 25 Figure 4. Data Warehouse Architecture Overview (Elmasri & Navathe, 2016, p. 1103) ............ 26 Figure 5. Data Lake Layers. .......................................................................................................... 28 Figure 6. Examples of rows in a columnar database. ................................................................... 35 Figure 7. Example of a property graph model. ............................................................................. 38 Figure 8. Evaluation Criteria......................................................................................................... 46 Figure 9. Operational View of the Unmanned Aircraft System of Interest. ................................. 56 Figure 10. UAS Log Data Question List. ..................................................................................... 59 Figure 11. Example of a Unique Key Using Flight Metadata and Sample Id. ............................. 71 Figure 12. Example Riak KV Aggregate. ..................................................................................... 71 Figure 13. Example MongoDB Document Containing UAS Log Data. ...................................... 72 Figure 14. HBase Query #1 Table Logical Overview. ................................................................. 74 Figure 15. HBase Flights Table Depicting Column Families and Columns................................. 75 Figure 16. Neo4j Schema for Log Data. ....................................................................................... 77 Figure 17. MySQL Engine Table Example .................................................................................. 78 Figure 18. Three Examples of Cypher Code to Import a Variable with a Changing Name. ........ 88 Figure 19. Query #2 Written in Python 2.7 for Riak KV. ............................................................ 99 Figure 20. Query #2 Written in Neo4j’s Cypher Query Language (CQL). ................................ 101 Figure 21. Query #2 Written in MySQL’s Structured Query Language (SQL). ........................ 101 Figure 22. Query #2 Written in JavaScript for MongoDB’s shell. ............................................. 102 Figure 23. Importance Priority Vectors - Location Charlie. ....................................................... 119 Figure 24. Importance Priority Vectors – Location Hotel. ......................................................... 124 Figure 25. Global Priorities for Location Hotel SMEs. .............................................................. 125 Figure 26. Importance Priority Vectors - Location Kilo. ............................................................ 127 Figure 27. Global Priorities for Location Kilo SMEs. ................................................................ 128 Figure 28. Importance Priority Vectors – Location Mike. .......................................................... 130 Figure 29. Global Priorities for Location Mike SMEs. .............................................................. 130 vii AFIT-ENV-DS-18-S-047

A Methodology for Evaluating Relational and Nosql Databases for Small-Scale Storage and Retrieval

Oracle Database VLDB and Partitioning Guide, 11G Release 2 (11.2) E25523-01

Vmware Tunnel

Data Warehouse Fundamentals for Storage Professionals – What You Need to Know EMC Proven Professional Knowledge Sharing 2011

Not ACID, Not BASE, but SALT a Transaction Processing Perspective on Blockchains

MDA Process to Extract the Data Model from a Document-Oriented Nosql Database Amal Ait Brahim, Rabah Tighilt Ferhat, Gilles Zurfluh

Database Systems Administrator FLSA: Exempt

MDA Process to Extract the Data Model from Document-Oriented Nosql Database

What Is Nosql? the Only Thing That All Nosql Solutions Providers Generally Agree on Is That the Term “Nosql” Isn’T Perfect, but It Is Catchy

The Relational Data Model and Relational Database Constraints

Study Material for B.Sc.Cs Dataware Housing and Mining Semester - Vi, Academic Year 2020-21

SQL Vs Nosql: a Performance Comparison

Index Selection for In-Memory Databases