A Methodology to Identify Alternative Suitable NoSQL Data Models via Observation of Relational Database Interactions

DISSERTATION

Paul M. Beach, Major, USAF
AFIT-ENV-DS-20-S-056

DEPARTMENT OF THE AIR FORCE
AIR UNIVERSITY
AIR FORCE INSTITUTE OF TECHNOLOGY

Wright-Patterson Air Force Base, Ohio

DISTRIBUTION STATEMENT A. APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.

The views expressed in this document are those of the author and do not reflect the official policy or position of the United States Air Force, the United States Department of Defense, or the United States Government. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States.

AFIT-ENV-DS-20-S-056

A METHODOLOGY TO IDENTIFY ALTERNATIVE SUITABLE NOSQL DATA MODELS VIA OBSERVATION OF RELATIONAL DATABASE INTERACTIONS

DISSERTATION

Presented to the Faculty
Department of Systems Engineering and Management
Graduate School of Engineering and Management
Air Force Institute of Technology
Air University
Air Education and Training Command
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

Paul M. Beach, BS, MS
Major, USAF

August 2020

DISTRIBUTION STATEMENT A. APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.

AFIT-ENV-DS-20-S-056

A METHODOLOGY TO IDENTIFY ALTERNATIVE SUITABLE NOSQL DATA MODELS VIA OBSERVATION OF RELATIONAL DATABASE INTERACTIONS

DISSERTATION

Paul M. Beach, BS, MS
Major, USAF

Committee Membership:

Brent T. Langhals, PhD
Chairman

Michael R. Grimaila, PhD, CISM, CISSP
Member

Douglas D. Hodson, PhD
Member

Maj. Ryan D. L. Engle, PhD
Member

ADEDEJI B. BADIRU, PhD
Dean, Graduate School of Engineering and Management

AFIT-ENV-DS-20-S-056

Abstract

The effectiveness and performance of data-intensive applications are influenced by the data models upon which they are built. The relational data model has been the de facto data model underlying most database systems since the 1970s, but the recent emergence of NoSQL data models has provided users with alternative ways of storing and manipulating data. Previous research demonstrated the potential value in applying NoSQL data models in non-distributed environments. However, knowing when to apply these data models has generally required inputs from system subject matter experts to make this determination. This research considers an existing approach for selecting suitable data models based on a set of 12 criteria and extends it with a novel methodology to characterize and assess the suitability of the relational and NoSQL data models based solely on observations of a user's interactions with an existing relational database system. Results from this work show that this approach is able to identify and characterize the observed usage of existing systems and produce suitability recommendations for alternate data models.


This work is dedicated to all who poured out the overwhelming love and support I needed to navigate this difficult season. Especially to my brilliant daughter - may you grow up knowing that you can accomplish anything you set your mind to.

Acknowledgements

I would like to offer my deepest and most sincere thanks to all who helped me throughout this endeavour. In particular, I would like to thank Lt. Col. Logan Mailloux, who saw promise in me and helped guide me in the early days of my research. Many thanks to Dr. Robert Mills, who taught me how to take an idea from “I believe...” and develop it into a research effort. I would like to express my immeasurable gratitude to my faculty advisor, Dr. Brent Langhals, for your unwavering support and counsel, and for helping to keep my feet on the ground when my head was up in the clouds. I would like to express my deepest appreciation to my committee members: Dr. Michael Grimaila and Dr. Douglas Hodson for sharing your vast expertise, ideas, and feedback throughout my research, and to Maj. Ryan Engle for your keen insights, innumerable suggestions, and for allowing me to stand on your shoulders and continue with your work. I would like to acknowledge Dr. Eric Blasch and the Air Force Office of Scientific Research for the generous funding which enabled me to conduct this research. Finally, I lift up my utmost thanks and praises to God, my creator, redeemer, and sustainer, for the countless blessings and opportunities he has bestowed upon me.

Paul M. Beach

Contents

Page

Abstract...... iv
Acknowledgements...... vi

List of Figures...... x
List of Tables...... xii
I. Introduction...... 1

1.1 Background...... 1
1.2 Motivation...... 4
1.3 Problem Statement...... 5
1.4 Research Questions & Hypotheses...... 6
1.5 Methodology...... 7
Assumptions/Limitations...... 10
Implications...... 10
Preview...... 11
II. Literature Review...... 12
2.1 Chapter Overview...... 12
2.2 Database Systems...... 12
Relational Data Model...... 14
NoSQL Data Models...... 17
NoSQL on a Single Machine...... 24
2.3 Typical NoSQL Use Cases...... 26
Key-Value - Session Management, User Profiles, and Shopping Carts...... 26
Document - Event Logging, Content Management Systems, and E-Commerce Applications...... 27
Column-oriented - Event Logging and Content Management Systems...... 27
Graph - Social Networking and Recommendation Systems...... 28
2.4 Characterizing Database Usage Patterns...... 29
Engle Criteria...... 29
2.5 Mapping of Engle Criteria to Data Models...... 32
2.6 Multi-Criteria Decision-Making...... 40
2.7 Conclusion...... 41


III. Methodology...... 43

3.1 Research Design...... 43
3.2 Research Questions & Hypotheses...... 43
Mapping of Research Questions to Methodology...... 44
Hypotheses...... 44
3.3 Instrumentation...... 45
System Design...... 46
Strategies for Observing the Engle Criteria...... 50
3.4 Simulation...... 55
Simulation Breakdown...... 58
Online Forum System: phpBB...... 58
Authoritative Domain Name System: PowerDNS...... 61
Social Networking Site: Elgg...... 61
3.5 Data Analysis - Calculating Relative Weights...... 62
3.6 Data Analysis - Simple Additive Weighting Decision Model...... 64
3.7 Classicmodels Initial Pilot Study...... 65
3.8 Classicmodels Follow-up Pilot Study...... 73
3.9 Conclusion...... 79

IV. Findings...... 80

4.1 Introduction...... 80
4.2 Simulation Results...... 80
phpBB...... 80
Elgg...... 88
PowerDNS...... 96
4.3 Discussion...... 98
Emphasis on Key-Value...... 98
Sensitivity of Query Omniscience Measurements...... 99
System Overhead...... 100
Viability of Results...... 101
4.4 Conclusions of Research...... 101
Research Hypotheses...... 103

V. Conclusions, Significance of Research, & Future Considerations...... 105

5.1 Chapter Overview...... 105
5.2 Significance of Research...... 106
5.3 Comments on the Methodology...... 107
5.4 Recommendations for Future Research...... 109
5.5 Summary...... 110


Appendix A. eval.py code...... 112

Appendix B. classic agent image.py code...... 131
Appendix C. phpbb agent.py code...... 138

Appendix D. elgg agent.py code...... 141
Appendix E. powerdns agent.py code...... 146
Bibliography...... 1

List of Figures

Figure Page

1 Examples of Unstructured, Semi-structured and Structured Data (Salam & Stevens, 2007)...... 3

2 Methodology Timeline...... 9

3 Database System (Elmasri & Navathe, 2016)...... 13

4 Key-Value Data Model...... 19

5 Column-oriented Data Model (Vargas, 2019)...... 20

6 Document Data Model (Vargas, 2019)...... 22

7 Graph Data Model (Vargas, 2019)...... 24

8 Heat Map, Observable Engle Performance Priorities vs. Data Models (Engle, 2018)...... 31

9 Notional (Logical) Simulation Setup...... 47

10 Software Flow Activity Diagram...... 48

11 Sample MySQL Packet Capture...... 58

12 Performance Priorities (Engle, 2018)...... 64

13 Initial Classicmodels Schema (mysqltutorial.org, 2020)...... 66

14 ECAT Metrics Report Output...... 68

15 Variance and 95% Confidence Intervals, phpBB Simulation #1...... 83

16 Variance and 95% Confidence Intervals, phpBB Simulation #2...... 85

17 Variance and 95% Confidence Intervals, phpBB Simulation #3...... 87

18 Variance and 95% Confidence Intervals, Elgg Simulation #1...... 90

19 Variance and 95% Confidence Intervals, Elgg Simulation #2...... 92


20 Variance and 95% Confidence Intervals, Elgg Simulation #3...... 94

21 Variance and 95% Confidence Intervals, Elgg Simulation #4...... 96

22 Variance and 95% Confidence Intervals, PowerDNS Simulation #1...... 98

List of Tables

Table Page

1 Engle Criteria (Engle, 2018)...... 8

2 SQL Commands and Functions (Refsnes Data, 2019a, 2019b)...... 17

3 Engle Criteria (Engle, 2018)...... 30

4 Research Questions and Associated Research Methodologies...... 44

5 Hypothetical SAW output...... 65

6 Importance Priority Vector, Classicmodels Relational Simulation...... 68

7 Global Priority (SAW) Outputs, Classicmodels Relational Simulation...... 69

8 Importance Priority Vector, Classicmodels Key-Value Simulation...... 69

9 Global Priority (SAW) Outputs, Classicmodels Key-Value Simulation...... 69

10 Importance Priority Vector, Classicmodels Columnar Simulation...... 70

11 Global Priority (SAW) Outputs, Classicmodels Columnar Simulation...... 70

12 Importance Priority Vector, Classicmodels Graph Simulation...... 71

13 Global Priority (SAW) Outputs, Classicmodels Graph Simulation...... 71

14 Importance Priority Vector, Classicmodels Document Simulation...... 72

15 Global Priority (SAW) Outputs, Classicmodels Document Simulation...... 72


16 Importance Priority Vector, Classicmodels Unstructured Data Simulation...... 72

17 Global Priority (SAW) Outputs, Classicmodels Unstructured Data Simulation...... 73

18 Importance Priority Vector, Classicmodels Follow-Up Simulation, Relational with Unstructured Data...... 74

19 Global Priority (SAW) Outputs, Relational with Unstructured Data...... 75

20 Importance Priority Vector, Classicmodels Follow-Up Simulation, Key-Value with Unstructured Data...... 76

21 Global Priority (SAW) Outputs, Key-Value with Unstructured Data...... 76

22 Importance Priority Vector, Classicmodels Follow-Up Simulation, Columnar with Unstructured Data...... 76

23 Global Priority (SAW) Outputs, Columnar with Unstructured Data...... 77

24 Importance Priority Vector, Classicmodels Follow-Up Simulation, Graph with Unstructured Data...... 77

25 Global Priority (SAW) Outputs, Graph with Unstructured Data...... 78

26 Importance Priority Vector, Classicmodels Follow-Up Simulation, Document with Unstructured Data...... 78

27 Global Priority (SAW) Outputs, Document with Unstructured Data...... 79

28 Importance Priority Vector, phpBB Simulation 1...... 81

29 Global Priority (SAW) Outputs, phpBB Simulation 1...... 82

30 Importance Priority Vector, phpBB Simulation 2...... 84

31 Global Priority (SAW) Outputs, phpBB Simulation 2...... 84

32 Importance Priority Vector, phpBB Simulation 3...... 86


33 Global Priority (SAW) Outputs, phpBB Simulation 3...... 86
34 Importance Priority Vector, Elgg Simulation 1...... 89

35 Global Priority (SAW) Outputs, Elgg Simulation 1...... 89

36 Importance Priority Vector, Elgg Simulation 2...... 91

37 Global Priority (SAW) Outputs, Elgg Simulation 2...... 91

38 Importance Priority Vector, Elgg Simulation 3...... 93

39 Global Priority (SAW) Outputs, Elgg Simulation 3...... 93

40 Importance Priority Vector, Elgg Simulation 4...... 95

41 Global Priority (SAW) Outputs, Elgg Simulation 4...... 95

42 Importance Priority Vector, PowerDNS Simulation 1...... 97

43 Global Priority (SAW) Outputs, PowerDNS Simulation 1...... 97

A METHODOLOGY TO IDENTIFY ALTERNATIVE SUITABLE NOSQL DATA MODELS VIA OBSERVATION OF RELATIONAL DATABASE INTERACTIONS

I. Introduction

1.1 Background

In the late 1960s, an IBM scientist by the name of Dr. Edgar F. Codd was dissatisfied with the database products available at the time, and in 1970 he formulated a revolutionary new data model based on set theory and first-order predicate logic now known as the "relational model," which has been the de facto data model underlying most database systems since the 1970s (Hernandez, 1997; Worboys, 1999). The success of the relational model over the existing hierarchical and network database models was due in large part to the relational model's ability to reduce data redundancy, increase data integrity, and execute powerfully flexible queries across multiple tables (which are collections of related data values) independent of how the data was physically stored on a computer (Codd, 1970; Elmasri & Navathe, 2016). As database systems were created to leverage this new relational model, methods of specifying how to interact with these relational database systems emerged, and the Structured Query Language (SQL) soon became the standard language synonymous with relational database systems (Codd, 1970; Hernandez, 1997).

By the mid-2000s, the data landscape was changing. Globalization gave rise to users of systems located around the world, and Web 2.0 technologies led to an unprecedented increase in the need to store, process, and retrieve large amounts of data (Engle et al., 2018). As a result, the era of Big Data had arrived. Big Data is a

term that encompasses the unique challenges associated with the marked increase in the Volume, Variety, and Velocity of data being generated and stored by individuals, governments, and corporations. Often referred to as the "3 V's" of Big Data, these attributes present new challenges for processing (including the storage and access of) this data (Dugane & Raut, 2014). Volume refers to the quantity of data being processed (NIST Big Data Public Working Group Definitions and Taxonomies Subgroup, 2015). Presently, there is no agreed-upon threshold (in terms of quantity of data) that must be achieved before this criterion is met; however, various authors have cited values between a terabyte and a petabyte (Schroeck et al., 2012) or at least 30-50 terabytes (Sawant & Shah, 2013) as suggestions for quantifying the scale of Big Data. Regardless of the definition used, it became evident that many existing traditional systems were struggling to cope with these ever-increasing amounts of data via the traditional approach of vertical scaling, and instead a new paradigm based on increasing storage and computational power via horizontal scaling across distributed systems was required (NIST Big Data Public Working Group Definitions and Taxonomies Subgroup, 2015).

Beyond sheer size, the Variety (or heterogeneity) of data has also presented a challenge to the data structure requirements of relational systems. Depending on the source, data may arrive in a structured format which consists of clearly defined types (such as zip codes, telephone numbers, or Social Security numbers), a semi-structured format which leverages meta tags or other markings to identify data elements (such as XML or JSON-formatted documents), or an unstructured format which lacks any predefined structure (such as images, videos, and social media posts) (Salam & Stevens, 2007). Figure 1 provides examples of each of these types of data.

The third "V", Velocity, refers to the rate at which the data is being produced, used, or processed (transformed and/or analyzed). Large corporations, such as

Figure 1. Examples of Unstructured, Semi-structured and Structured Data (Salam & Stevens, 2007)

Walmart, collect information from hundreds of thousands of shoppers every hour, and businesses are interested in trying to gain valuable insights from this data at near real-time speeds in order to capitalize on "perishable" data that may lose its value over time. These requirements necessitated looking beyond traditional data management systems in favor of those that can handle the deluge of data being collected (Gandomi & Haider, 2015).

As people set their sights on addressing the problems presented by the 3 V's, new designs and approaches for dealing with data emerged to cope with these challenges. Formerly, systems were upgraded and sized "vertically" to meet growing demands on the system (i.e., by increasing the storage, memory, and processing capabilities available to a given system). As the cost and practical limits on how large a single system could grow became significant constraints, companies began to consider scaling "horizontally", creating clusters of smaller machines instead of massive, monolithic servers (Sadalage & Fowler, 2013; Singh & Reddy, 2015). In addition to the cost-effectiveness of purchasing cheaper commodity hardware, distributing data across clusters of computers also increased the resiliency of

these systems, since a failure in any given node could be compensated for by others in the cluster (Sadalage & Fowler, 2013). It is worth mentioning, however, that the advantages of distributed processing for achieving highly scalable performance were not attainable without certain trade-offs, such as relaxed consistency constraints (Sullivan, 2015).

Beyond architectural considerations like horizontal scalability, a move towards data models that were not based on the existing relational model (as well as the many database implementations built upon them) also became a necessary and relevant development. Referred to by the moniker "NoSQL" (to set them apart from the existing SQL-based relational model), these newer non-relational systems adopted fundamentally different data models from the traditional relational data model (Evans, 2009; Sullivan, 2015). These new approaches enabled the development of systems which helped solve problems that relational-based systems were now struggling to answer (such as the cost and size limitations mentioned earlier).

1.2 Motivation

Although the NoSQL data models (and the databases built upon them) were initially created to deal with large datasets residing on distributed systems, previous research has highlighted the potential value of employing NoSQL databases in non-distributed configurations (i.e., the database and any associated database management system software is hosted on a single box, but the system may still service multiple users) (Beach et al., 2019; Engle et al., 2018). Some of the reasons proposed for the use of such systems include "efficient write performance and fast, low-latency access, among others" (Engle et al., 2018).

There are several possible scenarios in which an existing system may be employing a relational data model when a different data model would actually be more appropriate.

One possibility is that the application developer was unfamiliar with the recent development of the NoSQL data models when they designed the system. Alternatively, the usage of a database system may have evolved over time from a point when a relational data model made the most sense to something different which now more closely aligns with a NoSQL data model. Whatever the reason, identifying opportunities where a NoSQL data model may be more suitable for the purposes at hand is one means of improving overall system performance.

This research intends to further explore the concept of using a NoSQL data model on non-distributed systems by demonstrating how the usage of an existing relational database system can be evaluated in order to determine if there may be a more suitable NoSQL data model for that observed usage.

1.3 Problem Statement

While some technically-inclined users may be interested in the inner workings of a database system, many users are less concerned with what is taking place "under the hood" and instead care primarily about the usability and performance aspects of such systems (Xie, 2003). Furthermore, system developers may not be well-versed in the fairly recent emergence of NoSQL data models, and therefore they may not recognize the potential benefits of employing these data models in their systems (Venkatraman et al., 2016). For these reasons, the underlying data model of a database may not align with the most suitable data model for a given use case. Thus, a useful (yet presently lacking) capability for determining the suitability of a database system involves identifying ways to better align the underlying storage, processing, and retrieval mechanisms with the manner in which they are being used, in a way that is as transparent as possible to the user or developer. This research proposes a methodology to address this problem.

1.4 Research Questions & Hypotheses

The development of the methodology proposed in this research required a number of steps. First, it needed a means of characterizing how users are employing a relational database management system in order to identify different aspects of usage which make various NoSQL solutions either more or less suitable for the current usage. Second, it required an understanding of how to assess the suitability of a non-relational database system based on that observed usage. Therefore, the overarching goal of this research was to develop a methodology for characterizing the suitability of the different NoSQL data models for a particular use case based on observations of user interactions with an existing non-distributed (i.e., single-box) relational database system. In order to further develop this methodology, the following questions were proposed to guide this research.

1. Which observable criteria exist for assessing the usage patterns of a non-distributed relational database in an automated fashion?

2. Using an automated methodology, can non-distributed relational database usage patterns be mapped to more suitable data models using the observable criteria?

In order to support answering the stated research questions, this research also sought to evaluate the following three hypotheses:

1. One or more elements of the Engle criteria (introduced in the next section) can be mapped to observable characteristics of relational database systems.

2. It is possible to profile usage of a database based on observable characteristics of a relational database system.

3. A set of decision criteria based on the observational profile can be incorporated

into a decision model that can characterize the suitability of a non-relational (NoSQL) data model for the observed usage characteristics.

How these research questions and hypotheses will be addressed is now discussed in the following section.

1.5 Methodology

Existing research conducted by Engle produced a normalized set of performance priorities for the NoSQL and relational data models across 12 distinct criteria (hereafter termed "the Engle criteria") (Engle, 2018). These criteria describe key characteristics of a database's usage for a particular use case, such as Query Complexity (QC), Query Omniscience (QO), Transparency (T), and so on (see Table 1 for the full list; these criteria will be further discussed in Chapter 2). In that research, the performance priorities for the 12 criteria were coupled with inputs from subject matter experts (SMEs) who were knowledgeable about the application being serviced by a database (although the SMEs themselves were not experts in database systems) through a multiple-criteria decision-making (MCDM) process called the Analytic Hierarchy Process (AHP). The output of this process produced a list of "global priorities" for each of the five data models, which represented the rank-ordered suitability of each data model for the SMEs' systems based on their inputs.

In this research, the performance priorities for the Engle criteria were again employed. However, in lieu of soliciting the relative importance rankings for each of the Engle criteria from SMEs, observations of simulated system usage were used as proxies to infer the degree to which each of the criteria is present. Specifically, SQL queries and responses (i.e., any data returned from the DBMS in response to the query) between a particular application and an existing relational database management system (RDBMS) were monitored and then parsed, and features of these observed interactions were extracted using a tool built during the course of this research, called the "Engle Criteria Analysis Tool" (ECAT), in order to produce counts of observed attributes for each of the criteria and, ultimately, their relative weighted metrics.

Table 1. Engle Criteria (Engle, 2018)

Cross-Aggregate Consistency [1]: Ability of a DBMS to "perform cascading updates across data and relationships."
Data Typing Enforcement: Ability of a DBMS to apply schema enforcement of data types during transactions. [2]
Large Aggregate Transactions: Ability of a DBMS to "store, retrieve and update large aggregates quickly" (>1 TB).
Small Aggregate Transactions: Ability of a DBMS to "store, retrieve and update small aggregates quickly" (<1 kB).
Manipulation: Ability of a DBMS to "update elements of stored aggregates independently from other aggregates and elements."
Plasticity: Ability of a DBMS to "add or remove elements within stored aggregates."
Pre-processing: Level of effort required to pre-process data into the DBMS (operations required to import/load data).
Structural Malleability: Ability of a DBMS to add/remove "types" of aggregates.
Transparency: Ability of a DBMS to "store aggregates such that individual elements within the aggregate can be viewed and retrieved during read transactions."
Query Complexity [3]: Ability of a DBMS to "perform simple and complex queries." [4]
Query Omniscience: "Degree to which the complete set of possible queries is known by the user before system is implemented."
Result Timeliness: Speed with which results are returned to the user following a request.

[1] "An aggregate is formally defined as a composite data object that is considered the atomic unit for Create, Read, Update and Delete (CRUD) operations" (Engle, 2018), borrowing from (Sadalage & Fowler, 2013).
[2] A transaction is defined as "a [single] CRUD operation performed on a single aggregate" (Engle, 2018).
[3] A query is defined as "an action performed on a database, using a language such as SQL, to access a specific set of data" (Engle, 2018), borrowing from (Baker, 2011; Codd, 1970).
[4] "A simple query retrieves data using only a unique identifier and/or values. Complex queries involve additional conditions enforced on the operations" (Engle, 2018).

Finally, by employing an MCDM process known as Simple Additive Weighting (SAW) (Yoon & Hwang, 1995), the performance priorities and the extracted weighted metrics were joined to generate a decision model reporting the suitability rankings for each of the data models.

Figure 2 provides a high-level overview of the timeline of events for this methodology. In the first step, the system's database usage is captured. System users interact with an application (such as a web or desktop application) which interfaces with a back-end database in order to provide access to the necessary data. While the application is interfacing with the database, the data packets traversing the network between the application and the database are observed and captured. In step 2, this packet capture is then used as an input to ECAT, which parses the packet capture to extract the relative counts for each of the Engle criteria and produces a normalized vector of Engle criteria metrics. In the third and final step, these metrics are used as an input into a SAW decision model in order to produce a rank-ordered output listing for each of the five data models (relational, document, key-value, column-oriented, and graph) characterizing the suitability of each for the observed usage.

Figure 2. Methodology Timeline
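The sketch below is a minimal illustration of steps 2 and 3, not the ECAT implementation: it counts a few crude features of parsed SQL statements as stand-ins for criteria observations, normalizes the counts into a weight vector, and combines the weights with a performance-priority matrix via SAW. The criteria subset, the matching rules, the example statements, and the priority values are all hypothetical placeholders rather than Engle's published priorities.

    # Minimal sketch of steps 2-3, assuming hypothetical criteria, rules, and priorities.
    from collections import Counter

    STATEMENTS = [  # stand-in for SQL statements extracted from a packet capture
        "SELECT name, department FROM current_students WHERE anticipated_graduation_year = 2020",
        "SELECT s.name, d.chair FROM current_students s JOIN departments d ON s.dept_id = d.id",
        "UPDATE current_students SET major = 'Systems Engineering' WHERE student_id = 12345",
    ]

    def observe(statements):
        """Count crude proxies for a few criteria (illustrative only)."""
        counts = Counter()
        for s in statements:
            u = s.upper()
            counts["Query Complexity"] += 1 if "JOIN" in u else 0        # complex-query proxy
            counts["Transparency"] += 1 if u.startswith("SELECT") and "*" not in u else 0
            counts["Manipulation"] += 1 if u.startswith("UPDATE") else 0
        return counts

    def normalize(counts):
        total = sum(counts.values()) or 1
        return {c: n / total for c, n in counts.items()}

    # Hypothetical performance priorities: data model -> criterion -> score in [0, 1].
    PRIORITIES = {
        "relational": {"Query Complexity": 0.9, "Transparency": 0.8, "Manipulation": 0.7},
        "document":   {"Query Complexity": 0.6, "Transparency": 0.9, "Manipulation": 0.8},
        "key-value":  {"Query Complexity": 0.2, "Transparency": 0.3, "Manipulation": 0.5},
    }

    def saw(weights, priorities):
        """Simple Additive Weighting: weighted sum of priorities per alternative."""
        return sorted(
            ((sum(weights.get(c, 0.0) * score for c, score in crit.items()), model)
             for model, crit in priorities.items()),
            reverse=True,
        )

    if __name__ == "__main__":
        weights = normalize(observe(STATEMENTS))
        for score, model in saw(weights, PRIORITIES):
            print(f"{model:10s} {score:.3f}")

Running the sketch prints the hypothetical data models in descending order of weighted score, mirroring the rank-ordered output described above.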

Assumptions/Limitations.

Presently, the software that was developed for this methodology relies on third-party software (called Percona Toolkit) in order to parse and extract SQL queries from RDBMS packet captures. This software is currently only capable of parsing MySQL network packets, so a current limitation of this approach is that the usage must be observed between a MySQL RDBMS and its user(s)/application(s) (Percona, LLC., 2020b). It should be noted this is a limitation of the current software implementation, not the methodology. Furthermore, there is currently no way to differentiate between different users or applications interacting with a given database, so for the purposes of this research all observed usage is assumed to be restricted to a single application and database on an RDBMS. While there is no way to isolate traffic between different applications within ECAT, this assumption could potentially be enforced in real-world systems by segmenting the specific network traffic between the RDBMS and the application and ensuring only a single database on the RDBMS is being queried in order to isolate only the traffic of interest.

Implications.

The methodology suggested in this dissertation proposes a novel approach for passively observing RDBMS usage and attempting to characterize that usage through those observations alone. Although some of the tools and techniques used in this research have been used by database administrators to analyze characteristics of specific MySQL queries, a lack of similar approaches in the literature suggests that this technique represents a previously unseen method for characterizing the generalized usage of a database. Furthermore, since this approach requires no direct input from system users, it has

10 the advantage of being able to produce objective insights about how a given system is being utilized by all users, either at a particular snapshot in time or to observe how that usage is evolving over time. Finally, this methodology has the potential to be opportunistically applied by system owners who are looking for possible solutions to improve system performance or enhance system usability by adopting a more suitable underlying data model.

Preview.

This dissertation is organized such that each chapter builds upon the previous one. Chapter 2 lays the foundation by providing a review of key concepts and previous research that this work builds upon. In Chapter 3, the proposed methodology is introduced, including a discussion of how ECAT's instrumentation monitors, collects, and processes the observed usage data, and how the three use case simulations developed for this research were used to generate representative real-world data and demonstrate ECAT's application. Then, a discussion of the relationships between the Engle criteria and the data models is offered, and finally Chapter 3 concludes with a pilot study validating that the ECAT software adequately characterizes the usage of a system based on controlled inputs. Chapter 4 then presents the findings of the three use case simulations that were previewed in the third chapter. Finally, Chapter 5 concludes this research with a summary of findings, their significance, and recommendations for future research.

II. Literature Review

2.1 Chapter Overview

The purpose of this chapter is to provide a brief introduction to some of the key concepts relating to the proposed research. First, the relational and NoSQL data models are introduced. Second, typical use cases for each of the NoSQL data models are provided. The third section explores current approaches for characterizing database usage patterns. Then, a discussion of postulated mappings between the Engle criteria and observable attributes is presented, followed by an overview of the Simple Additive Weighting decision model. Finally, the proposed research hypotheses are offered.

2.2 Database Systems

What is a database? Fundamentally, in order to be useful a database must be able to do two things: "when you give it some data, it should store the data, and when you ask it again later, it should give the data back to you" (Kleppmann, 2017). Figure 3 shows how various database components (the database(s), the DBMS, and database applications) work together to create a database system. Databases are collections of related data. However, this alone is not useful without a way of interacting with this data or understanding how the data is stored together. Therefore, a database management system (DBMS) is also typically required, along with a description of the "data types, structures, and constraints of the data to be stored in the database" in the Stored Database Definition (also called the meta-data) (Elmasri & Navathe, 2016). A DBMS is a "general-purpose software system that facilitates the process of defining, constructing, manipulating, and sharing databases among various users and

applications." The DBMS additionally "provides users a conceptual representation of data" which omits certain details of how data is stored and interacted with, providing a beneficial level of abstraction to the user. Going further, a DBMS employs a "data model" as a means of data abstraction in order to provide this conceptual representation and obfuscate the "storage and implementation details that are not of interest to most database users" (Elmasri & Navathe, 2016). Putting everything together, a "database system" is the holistic composition of one or more databases, their associated meta-data, and the DBMS software necessary to interact with the system (Elmasri & Navathe, 2016).

Figure 3. Database System (Elmasri & Navathe, 2016)

Depending on the selection of a particular DBMS implementation (in terms of things such as vendor-specific features, the underlying data model, and so forth) there may be differences in how a user or system interacts with the database and how the underlying data is stored, processed, and retrieved. These differences can have

significant impacts on the strengths and weaknesses of a database system, and can dictate whether or not a particular database system is suitable for a given task. Additionally, application programs may be considered part of a database system as they interact with databases by querying the DBMS. "Query" has often been understood to refer to a request for data from a system. Commands sent to a database system to create or modify data have been called transactions (Elmasri & Navathe, 2016). According to the standards document that codifies the SQL language, "SQL-statement" is used to refer to the strings of characters that conform "to the Format and Syntax Rules specified in one of the parts of ISO/IEC 9075" (Melton, 2003). For simplicity, in this document "query", "statement", or "command" shall be used interchangeably to mean any instructions sent to the database, whether the purpose is to create data, modify data, or retrieve data.

Relational Data Model.

As mentioned in the Introduction, databases built upon the relational data model have been the dominant form of data storage for many years, due in large part to their ability to reduce data redundancy (i.e., storing the same data in multiple locations), increase data integrity (e.g., by enforcing constraints on what type of data can be stored, ensuring that each record contains a unique value for a particular field, or by requiring that a certain data field must be related to another one), execute powerfully flexible instructions on a database, and consolidate data from multiple tables at query time (Elmasri & Navathe, 2016). In fact, there is a common misconception that the relational data model derives its name from the fact that these tables can be joined at query time (i.e., "related"); in actuality, "relation" is a term borrowed from the mathematical set theory upon which the model was based. In typical usage, "relation" is commonly observed to be synonymous with "table" (Hernandez, 1997; Pratt, 2005).

One of the key features of the relational model is that data organized as tabular rows and columns is aggregated from smaller tables consisting of rows of tuples. Tuples represent records, or individual rows of data, and are a rigid data structure that does not allow for the nesting of other tuples within them (Hernandez, 1997; Sadalage & Fowler, 2013). This is in contrast to the aggregate-oriented model common in many NoSQL data models, discussed below.

The process of decomposing data into smaller subsets is called "normalization," and it is intended to avoid storing redundant data, which also helps ensure the integrity of the data as it is inserted, updated, or deleted. These were key design features when Dr. Codd developed the model. The process of normalization occurs during the design phase of the database and typically requires that tables contain one or more fields that uniquely identify each of the records they contain (called a "primary key"). Tables may also have fields defined as a "foreign key" that are used to reference the primary keys in another table, enabling a relationship to be established between the two tables (Hernandez, 1997).

Another key design feature of relational databases is the guarantee of ACID transactions. A transaction refers to "logical units of database processing" that specify "one or more database accesses, such as reading or updating of database records" which "must be executed in its entirety to ensure correctness" (Elmasri & Navathe, 2016). ACID refers to four desirable properties found in relational databases: Atomicity (transactions occur as an indivisible, atomic unit), Consistency (transactions are not able to violate integrity constraints in the system), Isolation (transactions in a system are not visible to concurrent users until the transaction is complete), and Durability (once a transaction has completed, the data will remain present even in the event of a disruption such as a power loss) (Sullivan, 2015). These properties are desirable because they help ensure the predictability and integrity of the data, but

these are not the only way relational databases seek to address data integrity. Primary and foreign keys are two kinds of integrity constraints that a relational database can enforce. For example, an RDBMS may not allow a user to insert a new record into a table if that record has a foreign key value that does not exist as a primary key in the referenced table. Additionally, a third type of constraint that can be specified in relational databases is "legal values." These restrict the values that may be entered in the database in order to ensure the values comply with business rules. An example of this type of constraint would be the requirement that a salary field contain only positive numerical values, or that a customer's shipping state match a state that a company actually ships products to (Hernandez, 1997; Pratt, 2005).

Users typically use the Structured Query Language (SQL) to express queries within relational databases, and as such, SQL has become practically synonymous with relational databases over time (Redmond & Wilson, 2012). Table 2 provides a list of common SQL commands and functions that are also referenced throughout this document. For instance, the SELECT command is one of the most fundamental SQL commands because it is used to retrieve data from the DBMS. As an illustrative example, envision a hypothetical database table containing student records called "current_students". A query to retrieve a list of all student names and their respective departments who are expecting to graduate this year might look like: "SELECT name, department FROM current_students WHERE anticipated_graduation_year = 2020". A query to update a student's major might look like: "UPDATE current_students SET major = 'Systems Engineering' WHERE student_id = 12345". Examples of some popular relational database management systems that employ various forms of SQL include Oracle, MySQL, Microsoft SQL Server, and PostgreSQL (solid IT, 2019).

Table 2. SQL Commands and Functions (Refsnes Data, 2019a, 2019b)

SELECT: "Selects data from a database"
INSERT INTO: "Inserts new rows in a table"
UPDATE: "Updates existing rows in a table"
DELETE: "Deletes rows from a table"
JOIN: "Joins tables"
LIKE: "Searches for a specified pattern in a column"
COUNT: "Returns the number of rows that matches a specified criteria"
SUM: "Returns the total sum of a numeric column"
FROM: "Specifies which table to select or delete data from"
WHERE: "Filters a result set to include only records that fulfill a specified condition"
AND: "Only includes rows where both conditions are true"
ORDER BY: "Sorts the result set in ascending or descending order"
ASC: "Sorts the result set in ascending order"
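To ground these ideas in runnable form, the following sketch (an invented example, not taken from the dissertation) recreates the hypothetical current_students scenario using SQLite from the Python standard library in place of a full RDBMS. The departments table, its columns, and the sample values are assumptions made for the illustration; the final insert shows the DBMS rejecting a row whose foreign key has no matching primary key.

    # Illustrative sketch (assumed schema and values): primary/foreign key constraints
    # enforced by an in-memory SQLite database.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

    conn.execute("""
        CREATE TABLE departments (
            dept_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL
        )""")
    conn.execute("""
        CREATE TABLE current_students (
            student_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            major      TEXT,
            anticipated_graduation_year INTEGER,
            dept_id    INTEGER REFERENCES departments(dept_id)  -- foreign key constraint
        )""")

    conn.execute("INSERT INTO departments VALUES (1, 'Systems Engineering and Management')")
    conn.execute(
        "INSERT INTO current_students VALUES (12345, 'Jane Doe', 'Systems Engineering', 2020, 1)")

    # The kinds of queries shown in the text: a simple retrieval and an update.
    rows = conn.execute(
        "SELECT name, major FROM current_students WHERE anticipated_graduation_year = 2020"
    ).fetchall()
    print(rows)
    conn.execute(
        "UPDATE current_students SET major = 'Systems Engineering' WHERE student_id = 12345")

    # Referential integrity in action: dept_id 99 has no matching primary key.
    try:
        conn.execute("INSERT INTO current_students VALUES (67890, 'John Doe', NULL, 2021, 99)")
    except sqlite3.IntegrityError as err:
        print("rejected by the DBMS:", err)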

NoSQL Data Models.

Despite the lengthy and well-earned success of relational databases, the advent of Big Data resulted in amounts of data that existing relational database systems could no longer cope with. Scaling systems vertically (i.e., by increasing the amount of storage, memory and processing power) could only go so far, and these systems were quite costly as well (Sadalage & Fowler, 2013). Several limitations on why monolithic systems are constrained in their ability to scale vertically include the likelihood that no “sufficiently capable hardware exists”, and because such hardware is typically expensive and generally requires downtime during hardware changes (Xiong et al., 2014). Internet giants like Google and Amazon discovered that they needed economical systems that could scale beyond a single relational system and were early pioneers in the NoSQL movement (McCreary & Kelly, 2014). Today, hundreds of different NoSQL database implementations exist and each was created to address some problem or provide some necessary feature that didn’t exist at the time. NoSQL databases are typically classified into one of four groups: key-value, column-oriented, document, or graph. As of March 2019, there were 156

different implementations of these four database types being tracked on the popular database ranking website db-engines.com and 225 on the more specialized website nosql-database.org (solid IT, 2019).

As mentioned above, many of the NoSQL data models follow a different approach to organizing data than the relational model in that they are aggregate-oriented. The term "aggregate" has been borrowed from Evans' Domain-Driven Design, whereby "an aggregate is a collection of related objects that we wish to treat as a unit" (Sadalage & Fowler, 2013). In particular, the key-value, document, and column-oriented data models are considered to be aggregate-oriented. By dealing with data in the form of aggregates, data that is related can be stored together to help improve retrieval performance through locality and denormalization (i.e., the same data may be stored in multiple locations in order to minimize the number of aggregates needing to be accessed, thus speeding up read times) (Sadalage & Fowler, 2013).

One important consequence of aggregate boundaries is that they define the bounds for what can be updated in a single transaction. Unlike the ACID guarantees of relational databases, which allow for the manipulation of many rows spanning multiple tables in a single transaction, the three aggregate-oriented data models (i.e., key-value, document, and column-oriented) do not guarantee that updates spanning multiple aggregates will execute as an atomic operation (Sadalage & Fowler, 2013).
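To make the notion of an aggregate concrete, the invented sketch below (not drawn from the dissertation or any particular database product) groups an order and its line items into one unit that is read and written as a whole; the field names and product codes are made up for the illustration.

    # Illustrative sketch (assumed example): one aggregate bundles an order with its
    # line items, rather than normalizing them across separate tables.
    order_aggregate = {
        "order_id": 1001,
        "customer": {"id": 42, "name": "Jane Doe"},
        "lines": [
            {"sku": "S10_1678", "qty": 2, "price": 95.70},
            {"sku": "S18_2325", "qty": 1, "price": 108.06},
        ],
    }

    # A store keyed by order_id treats each aggregate as the atomic unit of access.
    orders = {order_aggregate["order_id"]: order_aggregate}

    # Updating elements *within* one aggregate touches a single unit...
    orders[1001]["lines"].append({"sku": "S24_3969", "qty": 3, "price": 35.29})

    # ...but changes spanning two aggregates (e.g., two different orders) are two
    # separate operations, with no cross-aggregate atomicity guaranteed by the model.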

Key-Value Data Model.

Key-value is the simplest NoSQL data model, whereby stored values are retrieved by providing the key that has been associated with that value. Keys can take on a wide variety of forms, such as a string of characters; however, the one required property is that each key is unique from every other key within a given namespace (which is a collection of key-value pairs), so that they can uniquely identify

18 the associated value being stored (Sullivan, 2015). In the key-value data model, the key serves as an identifier to access the aggregate (i.e. value) of interest (Sadalage & Fowler, 2013). Meanwhile, the value stored in a key-value database can be virtually any set of bytes, including “integers, floating-point numbers, strings of characters, binary large objects (BLOBs), semi-structured constructs such as JSON objects, images, audio and just about any other data type you can represent as a series of bytes” (Sullivan, 2015). Common uses of the key-value data model include document and file stores, dictionaries, and lookup tables (McCreary & Kelly, 2014). The image in Figure 4 shows a key-value data model with four entities (in reality, what is depicted is actually more of a meta-model representing one possible way to structure data, as the “userID” field would actually be some unique value that relates to a particular person). An entity is a representation of “a real-world object or concept ... that is described in the database” (Elmasri & Navathe, 2016).

Figure 4. Key-Value Data Model

Owing to the simple design of the key-value data model, implementations of this model tend to operate quickly and are easy to employ. One of the drawbacks to this simplistic design is that key-value stores do not support queries on values, but

19 rather only on the keys. Also, since the data model does not know or consider what kind of data is being stored, it is up to the application developer to determine the type of data being stored (e.g., text string, binary image, XML file, etc.) (McCreary & Kelly, 2014). Some of the implications of this are that it is now the application developer’s responsibility to keep track of the keys used to reference the data, as well as determining how to process and manipulate the data once it is retrieved from the database(Sadalage & Fowler, 2013). Redis, DynamoDB and Memcached are presently three of the most popular Key-Value databases (solid IT, 2019).
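As a minimal, invented sketch of this access pattern (not the API of Redis, DynamoDB, or Memcached), the class below offers put/get/delete by key and treats the stored value as opaque bytes that the application must interpret.

    # Minimal sketch of the key-value pattern (assumed example, not a real product API).
    class KeyValueStore:
        """Keys are unique within a namespace; values are opaque bytes to the store."""
        def __init__(self):
            self._data = {}

        def put(self, key: str, value: bytes) -> None:
            self._data[key] = value      # create or overwrite the whole aggregate

        def get(self, key: str) -> bytes:
            return self._data[key]       # lookup is by key only; values cannot be queried

        def delete(self, key: str) -> None:
            self._data.pop(key, None)

    store = KeyValueStore()
    # The application, not the store, decides what the bytes mean (e.g., a session blob).
    store.put("session:2f9a", b'{"userID": 42, "cart": ["S10_1678"]}')
    print(store.get("session:2f9a"))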

Column-Oriented Data Model.

Figure 5. Column-oriented Data Model (Vargas, 2019)

Although many contemporary sources attribute Google’s Bigtable paper (Chang et al., 2008) as the genesis of the column-oriented data model (Redmond & Wilson, 2012; Sullivan, 2015; Yang et al., 2015), earlier works first presented the notion of vertically partitioning the data based on columns in order to improve performance through the principle of data locality, through a design known as the Decomposition Storage Model (Cao et al., 2012; Copeland & Khoshafian, 1985). The column-oriented (also called the columnar or column-family) data model bears a loose resemblance to the relational

data model in that it consists of tables with rows and columns; however, there are some key architectural differences. Rows and columns are employed as lookup keys in this data model (which can also be considered a two-level map), enabling the system to sparsely populate only the values that are present (a key improvement over relational database implementations, which are not efficient at sparsely storing data where fields in a tuple do not contain any data) (McCreary & Kelly, 2014). The column-oriented data model also organizes data into column families, which helps keep columns of data that are likely to be accessed together close to each other, improving query performance due to the locality of the data (Sullivan, 2015), although the increasing adoption of solid-state storage technology is mitigating the impact of this aspect. Like the key-value data model, the column-oriented data model allows for the efficient storage and retrieval of data across distributed systems, which is a trait common to most NoSQL data models (graph databases being a notable exception) (McCreary & Kelly, 2014). The image in Figure 5 shows a column-oriented data model containing three rows and six columns. Note that not all columns contain all rows, demonstrating this data model's ability to sparsely store only the data of interest. Cassandra and HBase are two of the most popular column-oriented datastores (solid IT, 2019).
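The two-level-map view described above can be sketched in a few lines (an invented example; real column-oriented stores such as Cassandra or HBase add persistence, partitioning, and much more).

    # Sketch of the column-family idea as a two-level map (assumed example).
    from collections import defaultdict

    # row key -> column family -> column -> value; absent columns simply are not stored.
    table = defaultdict(lambda: defaultdict(dict))

    table["user:42"]["profile"]["name"] = "Jane Doe"
    table["user:42"]["profile"]["city"] = "Dayton"
    table["user:42"]["activity"]["last_login"] = "2020-08-01"
    table["user:99"]["profile"]["name"] = "John Doe"   # no "city", no "activity" family

    # Reading one family touches only the columns grouped with it (locality by design).
    print(dict(table["user:42"]["profile"]))
    # Sparse storage: row user:99 holds just the single column that was written.
    print({fam: dict(cols) for fam, cols in table["user:99"].items()})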

Document Data Model.

The document data model employs a similar design as the key-value data model (i.e., specifying a unique key will return its associated value), but the document itself is considered a collection of key-value pairs. A key difference between this data model and the key-value data model is that the document data model also allows for the storage and retrieval of hierarchically organized data (e.g., XML, JSON) making it extremely flexible, generalizable, and popular (McCreary & Kelly, 2014; Sullivan, 2015). Implementations of the document data model organize documents into related

sets called collections, which are somewhat analogous to the concept of a table in the relational data model, while the documents themselves are analogous to the rows. Another difference from the key-value model is that stored values in the document model are not considered opaque; they can be queried against, which enables full-text search capabilities (Khazaei et al., 2016). Since the aggregate boundary for the document model is the document itself, it is unable to provide the capability of updating multiple documents in a single atomic action (Khazaei et al., 2016; Sadalage & Fowler, 2013). However, the transparent nature of the aggregate-oriented document data model means that it is possible to perform queries against data elements stored within a document (as opposed to having to address or manipulate the entire document itself), which can simplify the process for the user and reduce the overhead that might otherwise be required for processing unnecessary data (Sullivan, 2015).

Figure 6. Document Data Model (Vargas, 2019)

The image in Figure 6 represents an example document with a number of fields; some containing a single value (e.g., 'Paul', 447557505611, 'London') and others

containing more complex structures such as an array of strings, or even another nested document (e.g., 'Cars') with sub-documents of its own. Presently, the most popular document database is MongoDB (solid IT, 2019).
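The following invented sketch echoes the structure described for Figure 6 (it is not MongoDB's query API): a collection of documents whose fields may differ, one holding a nested sub-document, queried directly on a field value.

    # Sketch of a document collection queried by field (assumed example).
    customers = [  # a "collection" of documents
        {
            "_id": 1,
            "firstname": "Paul",
            "phone": 447557505611,
            "city": "London",
            "tags": ["premium", "newsletter"],
            "cars": {"make": "Bentley", "year": 1973},     # nested sub-document
        },
        {"_id": 2, "firstname": "Dana", "city": "Dayton"},  # different fields are fine
    ]

    def find(collection, **criteria):
        """Return documents whose fields match all of the given criteria."""
        return [doc for doc in collection
                if all(doc.get(field) == value for field, value in criteria.items())]

    # Unlike an opaque key-value blob, individual elements can be queried directly.
    print(find(customers, city="London"))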

Graph Data Model.

The graph data model stores data in two kinds of elements: as vertices, and as connections between vertices (also called edges or arcs). The highly interconnected nature of elements in the graph data model makes it useful for storing relationship-heavy data, where different data elements have a high degree of interconnection, such as is commonly found in social networks and rules-based engines. However, it also means that, unlike the other NoSQL models previously discussed, implementations of the graph data model are not well suited for distributed processing applications due to the heavily connected nature of individual nodes in a graph. In fact, one of the often-touted strengths of NoSQL data models is their ability to scale horizontally in distributed systems. However, the aggregate-ignorant nature of the graph data model makes it difficult to effectively draw boundaries around elements of data (since this model is all about creating relationships, not boundaries). For example, if a graph grows to the point where it cannot be hosted on a single machine, then as the database tries to follow edges between vertices it is likely that these traversal operations will span across machines, which will adversely impact performance. As a result, it is more difficult to horizontally scale graph databases than the key-value, document, or column-oriented data models (Sadalage & Fowler, 2013; Sullivan, 2015).

Edges and vertices are both capable of storing data in this model (for instance, you can add weights or directions1 to edges). This provides the data model with the ability to model many real-world problems such as social networks (e.g., where

1Edges can be directed, which represents a one-way relationship between two nodes, or undirected which assumes that the relationship makes sense in both directions (Sullivan, 2015).

vertices represent people and directed edges define their relationships, as in "Jim" is the supervisor of "John") or transportation systems (e.g., vertices represent cities and weighted edges represent the distances between them, as in "Dayton, OH" is 16 miles from "Xenia, OH") (Sadalage & Fowler, 2013; Sullivan, 2015). The image in Figure 7 shows an example of the graph data model with seven vertices interconnected by six directed edges (Vargas, 2019). Neo4J remains the most popular graph database for the time being (solid IT, 2019).

Figure 7. Graph Data Model (Vargas, 2019)
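A minimal sketch of vertices with labeled, weighted, directed edges, using the relationships mentioned above (an invented representation; graph databases such as Neo4J provide far richer storage and traversal).

    # Sketch of the graph idea: vertices plus labeled/weighted directed edges
    # (assumed example using the relationships mentioned in the text).
    vertices = {"Jim": {"role": "supervisor"}, "John": {}, "Dayton, OH": {}, "Xenia, OH": {}}
    edges = [
        ("Jim", "John", {"label": "supervises"}),                          # directed edge
        ("Dayton, OH", "Xenia, OH", {"label": "distance", "miles": 16}),   # weighted edge
    ]

    def neighbors(node):
        """Follow outgoing edges from a vertex: the basic traversal operation."""
        return [(dst, attrs) for src, dst, attrs in edges if src == node]

    print(neighbors("Jim"))          # [('John', {'label': 'supervises'})]
    print(neighbors("Dayton, OH"))   # [('Xenia, OH', {'label': 'distance', 'miles': 16})]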

NoSQL on a Single Machine.

Although many of the commonly cited reasons for using NoSQL models are centered around their use in distributed environments (e.g., for scalability across commodity hardware), there are several compelling reasons to consider their use on non-distributed (i.e., single-box) deployments as well. For example, the rigid structure of relational databases often requires that arriving data be properly formatted (i.e., having a similar structure in terms of delineated values conforming to expected data types) in order to comply with the existing schema. This process, known as extract, transform, and load (ETL), can impose upfront costs in terms of time, effort, and computational overhead. By contrast, NoSQL data models are able

to accept heterogeneous2 forms of data, decreasing the amount of data pre-processing relative to what would be required in a relational data model (Sawant & Shah, 2013). Additionally, this allows data to be stored in a form closer to that in which it originally arrived, which may facilitate later retrieval. Finally, it affords a great deal of flexibility to change relative to the relational model, as elements within an aggregate can be added or removed without concern for modifying the rest of the existing database (Engle et al., 2018).

Beyond simply being adaptable to changes, the schemaless nature of NoSQL data models means they are also well suited to supporting multiple types of data structures. As described above, key-value stores are agnostic regarding what kind of data is being stored, which can vary from a single binary value to a more complex data structure like a list from one aggregate to the next. Document databases extend this flexibility with their ability to represent complex hierarchical structures, commonly represented in formats such as XML or JSON. The raison d'être for column-oriented databases is that they provide efficient storage and retrieval of related data by grouping data into, unsurprisingly, column families. Graph databases, being focused on the interconnected nature of related data, can similarly store vastly different kinds of data from one node or edge to another (Engle et al., 2018; Sadalage & Fowler, 2013).

The highly structured nature of the Structured Query Language affords users a well-defined means of interacting with relational databases and some of the key features it provides (namely the joining of tables and assurance of data integrity). While some implementations of NoSQL databases have attempted to make the transition easier for developers by adopting SQL-like interfaces, the ability of NoSQL to operate independently from SQL means that access to these databases can be achieved in ways that are simpler for the developer, such as via application

2“Heterogenous” is another way of referring to data variety (Allen, 2015).

25 programming interfaces (APIs) or web-based (i.e., HTTP or REST-based) interfaces (Engle et al., 2018; Redmond & Wilson, 2012; Sadalage & Fowler, 2013). One key insight from all this is to ultimately recognize that “NoSQL is driven by application-specific access patterns” (Engle et al., 2018). A component of this research will focus on how to identify these access patterns as a means of determining the suitability of a data model based on observed usage.
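As a small, invented illustration of the reduced pre-processing burden described above, heterogeneous records can be placed into a schemaless store in the form they arrived in, with no table definition, type declarations, or ETL step; the record layouts below are made up for the example.

    # Sketch (assumed example): a schemaless store accepts heterogeneous aggregates as-is,
    # whereas a relational schema would require up-front ETL into a fixed row format.
    records = [
        {"type": "image", "id": "img-1", "blob": b"\x89PNG...", "tags": ["logo"]},
        {"type": "post", "id": "p-42", "author": "jdoe", "text": "Hello", "likes": 3},
        {"type": "sensor", "id": "s-7", "readings": [21.4, 21.9, 22.1]},
    ]

    store = {}  # no schema, no column types, no pre-processing step
    for record in records:
        store[record["id"]] = record      # stored in the form it arrived in
    print(store["p-42"]["author"])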

2.3 Typical NoSQL Use Cases

In this section, common use cases for NoSQL data models are introduced. This discussion will highlight some of the benefits and shortcomings of each of the data models in the context of the proposed criteria.

Key-Value - Session Management, User Profiles, and Shopping Carts.

The central premise of the key-value data model is that it is focused around "simple read and write operations to a data item that is uniquely identified by a key" (DeCandia et al., 2007). Thus, it follows that the ideal circumstances for employing this model involve cases where some unique entity (i.e., one that can be represented by a unique key) refers to a collection of data that will generally be interacted with as an atomic unit (i.e., the collection of data it refers to is aggregate-oriented). Several use cases that are ideal in this context include session management, user profiles, and shopping carts (DeCandia et al., 2007; Sadalage & Fowler, 2013; Srivastava & Shekokar, 2016). For instance, all three typically employ some kind of unique identifier per user (e.g., a session id or user id), so they are ideally suited to perform key-based lookups in order to retrieve the value(s) of interest. This holds as long as the application is only concerned with operating on a single user's data at a time, since the key-value model does not allow operating across multiple keys in a single transaction, nor does it handle relationships across different sets of data (Sadalage & Fowler, 2013). Finally, it is unlikely that any of these applications would generally3 require the ability to conduct complex queries based on stored values, which of course is not possible with this data model.

Document - Event Logging, Content Management Systems, and E-Commerce Applications.

The flexible schema and the ability to natively store semi-structured or unstructured data make the document data model well suited to a number of applications, including event logging, content management systems, and e-commerce applications. Event logs are likely to contain many semi-structured fields that will vary in content and structure depending on the application generating them, and as software evolves over time, the data these logs capture may also need to change. Similarly, the dynamic nature of websites, blogging platforms, and web forums means they also need to store semi-structured data in a way that is flexible to future changes. Finally, the ability to store online orders in a denormalized state, along with support for evolving product catalogs as new models are offered, means that many e-commerce applications can also benefit from this data model (Sadalage & Fowler, 2013).

Column-oriented - Event Logging and Content Management Systems.

Similar to the reasons listed above for the document data model, the column-oriented data model's flexibility in storing a variety of data structures and its

3It is certainly conceivable that an interesting query could be generated that would require searching by value (such as wanting to know how many users have a particular hot-selling item in their shopping cart at any given time); however, this would not be the primary purpose of an online shopping cart, and thus it does not negate the argument that a shopping cart is ideally suited here.

ability to group related fields into column families also make it a good candidate for event logging or content management system applications. The key determinant of whether a column family would be suitable in these applications is the application developer's ability to define appropriate column families based on expected usage patterns. Since real-world applications may involve hundreds of columns or more, poorly chosen column families may result in accessing (and thus expending extra resources on) unnecessary data if they are too broad, or in having to look up multiple column families to retrieve all of the necessary data (which may also increase the required time) if they are defined too narrowly (Yang et al., 2015).

Graph - Social Networking and Recommendation Systems.

The graph data model is quite unlike the other NoSQL data models in several key aspects. First, it is not aggregate-oriented like the other NoSQL data models but rather is considered aggregate-ignorant (as is the relational data model). "Aggregate-ignorant" means that there is not a well-defined aggregate structure providing clear boundaries around the data. This is not to say that this is an undesirable trait; rather, it means the graph data model may be a good choice when there is not going to be a commonly used structure for organizing and interacting with the data. As a consequence, considering which data fields will be accessed together is less important for this data model than for the previously mentioned data models. Instead, this data model was designed to excel at describing relationships between highly interconnected data (Sadalage & Fowler, 2013). While it is possible to capture these same interrelationships in a relational data model, doing so often comes at the cost of joins, which can become computationally expensive, particularly on large tables (Sullivan, 2015). Therefore, it naturally follows that the ideal use cases for this data model are applications like social networking sites and recommendation systems, which rely on being able to quickly and easily make connections between various entities. Social networks and recommendation systems (containing linkages between elements such as individuals, their likes and preferences, etc.) typically evolve over time. However, because much of the work of creating these linkages is done at insert time (as opposed to query time), query performance in this model would be much better than if the same data were queried in a relational system, which would likely require expensive joins based on multiple foreign keys (Sadalage & Fowler, 2013).

2.4 Characterizing Database Usage Patterns

Users will interact with systems in a variety of ways depending on their needs and the features the system offers. Systems with few features and/or users may be used infrequently, while complex systems with many users may be used more intensively. The actual usage patterns a database system experiences will shift depending on how users are interacting with the system, owing to factors such as changes in the types and frequency of users' requests (Cooper, 2001). In order to develop a methodology for determining suitable NoSQL data models based on observed usage, an objective set of criteria for measuring this usage becomes a necessary prerequisite. This research considers the application of an existing set of criteria, called the Engle criteria, in order to characterize the usage of an existing relational database.

Engle Criteria.

As introduced in Chapter 1, the "Engle criteria" refers to a set of 12 evaluation criteria that were proposed in the 2018 PhD dissertation of Dr. Ryan Engle (listed again in Table 3 for convenience). Collectively, these criteria are intended to characterize functional "relational and NoSQL database traits" (as opposed to other system traits such as security, affordability, etc.) that are relevant when considering applications in a single computer environment (Engle, 2018).

Table 3. Engle Criteria (Engle, 2018)

Cross-Aggregate Consistency: Ability of a DBMS to "perform cascading updates across data and relationships."
Data Typing Enforcement: Ability of a DBMS to apply schema enforcement of data types during transactions.
Large Aggregate Transactions: Ability of a DBMS to "store, retrieve and update large aggregates quickly" (>1 TB).
Small Aggregate Transactions: Ability of a DBMS to "store, retrieve and update small aggregates quickly" (<1 kB).
Manipulation: Ability of a DBMS to "update elements of stored aggregates independently from other aggregates and elements."
Plasticity: Ability of a DBMS to "add or remove elements within stored aggregates."
Pre-processing: Level of effort required to pre-process data into the DBMS (operations required to import/load data).
Structural Malleability: Ability of a DBMS to add/remove "types" of aggregates.
Transparency: Ability of a DBMS to "store aggregates such that individual elements within the aggregate can be viewed and retrieved during read transactions."
Query Complexity: Ability of a DBMS to "perform simple and complex queries."
Query Omniscience: "Degree to which the complete set of possible queries is known by the user before system is implemented."
Result Timeliness: Speed with which results are returned to the user following a request.

Engle studied and assigned relative performance rankings to all 12 of these criteria for each of the NoSQL data models as well as the relational data model. The relevance of these criteria to this research will be discussed in greater detail in Section 2.5, but an example of how to interpret these performance rankings is warranted here. Figure 8 contains a heat map based on a subset of the performance priorities assigned by Engle for each of the criteria (the numerical scores used to produce this map will be discussed and employed later in the SAW decision model). In this figure, the column "RT" refers to the Result Timeliness criterion, which is the speed with which results are returned to the user following a request (Engle, 2018). For this criterion, the Key-Value (KV) data model is coded green to indicate that it is well supported. Another way of interpreting this is that if result timeliness is an important factor for a user, then the key-value data model would be more suitable than any of the alternatives. This interpretation agrees with other literature suggesting that one of the primary strengths of the key-value model is the speed at which it operates (Sullivan, 2015).

Figure 8. Heat Map, Observable Engle Performance Priorities vs. Data Models (Engle, 2018)

In addition to generating relative performance priorities for each of the criteria, Engle's work leveraged the insights of several Subject Matter Experts (SMEs) who were familiar with desirable characteristics of a database system for their given missions. These SMEs were interviewed and asked to provide pairwise comparisons of each of the 12 criteria, evaluating their relative importance based on how each would pertain to a desired log storage system for an Unmanned Aerial System (UAS). By combining his research into the performance ratings with the importance priorities for each of the criteria reported by the SMEs, he was able to derive a rank-ordered list of global priorities for each data model, based on the perceived needs of each SME. This process was accomplished using an MCDM process known as the Analytic Hierarchy Process (AHP) (Engle, 2018; Saaty, 1980). A significant motivation behind this research was the belief that it is possible to infer which of these criteria are important to a system's users based on observing their relative presence or absence in the usage of an existing system. Once observed, it is then also possible to automate the process of determining which, if any, of the NoSQL data models would be well suited for the existing application (and notably, without the requirement to obtain inputs from system SMEs). Therefore, a key contribution of this work focused on developing ways of assessing these criteria solely through observations of a user's interactions with an RDBMS. Accordingly, this research has been scoped to focus solely on assessing the usage of non-distributed relational database systems.

2.5 Relation of Engle Criteria to Data Models

Based on the preliminary research highlighted above about the NoSQL data models and the relative strengths and weaknesses of each, initial efforts in this research sought to identify the strongest mappings between each of the NoSQL data models and the proposed database usage characterization criteria, as this was an important precursor to identifying which observable characteristics would later support each of the data models. In addition to the discussion below, referring back to Figure 8 provides the reader with a summarized view of whether a given criterion strongly supports (green), moderately supports (yellow/orange), or weakly supports (red) the suitability of a given data model when there are indications that the criterion is present in the observed usage. Three of Engle's criteria (Pre-processing, Large Aggregate Transactions, and Small Aggregate Transactions) have been omitted from this figure since they were not ultimately used in this methodology (as explained in the individual criterion sections below).

Cross-Aggregate Consistency.

This criterion refers to the ability of a DBMS to "perform cascading updates across data and relationships" (Engle, 2018). The definition of an aggregate discussed earlier highlights the fact that aggregates are already treated as atomic units in terms of how they are operated upon. As such, for the aggregate-oriented data models (i.e., key-value, document, and column family), changes that must be performed across aggregates would typically require separate operations to update each one (Sadalage & Fowler, 2013). Thus, if there are indications this criterion is present in the observed data, these aggregate-oriented data models may be less suitable for the current application, which is why they are coded red in Figure 8. On the other hand, the graph data model was specifically designed to deal with interrelated data, as those relationships are not computed at query time but are instead stored as edges during data insertion. This means that traversing related data in order to perform the necessary updates would be faster than in other data models, where these relationships would have to be identified and computed at query time (Sadalage & Fowler, 2013). When observing usage of a relational database, one possible strategy to identify this criterion is to look for operations that affect multiple rows of data in a single transaction.

Data Type Enforcement.

This criterion refers to the ability of a DBMS to apply schema enforcement of data types during transactions (Engle, 2018). Relational databases impose constraints as part of their defined schemas. These constraints restrict the values of the data being entered into the database based on rules established by the schema, so that a field where only positive values are permissible (such as a salary) will not accept a negative number (Sullivan, 2015). As Figure 8 shows, strong evidence of this criterion would indicate that the relational data model is well supported, since the upfront "schema-on-write" requirement of the relational model enforces this behavior (Lu et al., 2018). On the other hand, since the value being stored in the key-value data model is considered opaque, this data model is unable to enforce data type constraints, potentially limiting its usefulness (Sadalage & Fowler, 2013). One strategy for detecting this criterion involves looking for errors returned by the RDBMS indicating an integrity constraint violation. Admittedly, it is less likely this criterion will be observed if the queries are being generated by a well-designed web application (as opposed to a human operator writing ad hoc SQL queries). Therefore, in this research it was also considered that any observed updates to the database's schema involving changes to an existing attribute's data type can be construed to mean this criterion is applicable.

Large Aggregates.

This criterion refers to the ability of a DBMS to "store, retrieve and update large aggregates quickly" (Engle, 2018). Both the document and column family data models are well suited for dealing with large aggregates, so the presence of this criterion would support the selection of one of these models. In theory, detection of this criterion is straightforward, as the sizes of responses to observed queries can be measured in the network traffic. However, during the course of this research no viable way of reliably extracting this information was found, so it has been omitted from Figure 8.

Small Aggregates.

This criterion refers to the ability of a DBMS to "store, retrieve and update small aggregates quickly" (Engle, 2018). Because of its emphasis on speed and simplicity, the key-value data model is well suited to dealing with small aggregates, while the column-oriented data model, with the complexity and overhead introduced by column families, is not. This criterion would be detected in a manner similar to the Large Aggregates criterion if a viable approach were found. However, it was also not implemented during this research and has been omitted from Figure 8.

Manipulation.

This criterion refers to the ability of a DBMS to "update elements of stored aggregates independently from other aggregates and elements" (Engle, 2018). Since the key-value data model's opacity prevents it from accessing individual elements within stored aggregates, it is poorly supported by this criterion and coded red in Figure 8, while all others are coded green. Detection of this criterion requires observing which columns are being manipulated across various queries for a given table. If different queries are observed to be interacting with different columns within the same table, it can be surmised that certain elements of a given row are being interacted with independently from others, which may indicate this criterion is important.

Plasticity.

This criterion refers to the ability of a DBMS to "add or remove elements within stored aggregates" (Engle, 2018). Since values in the key-value data model are opaque, any changes involving the addition or removal of such elements would need to be handled outside of the DBMS by the application developer, making this data model somewhat less ideal. On the other hand, the document, column-oriented, and graph data models are all well suited to adapting dynamically to such changes, as represented by their green coding in Figure 8. Within a relational data model, the addition or removal of an element can be detected by observing the presence of an "ALTER TABLE" command, since adding or removing elements would require changes to the existing schema.

Pre-processing.

This criterion refers to the level of effort required to pre-process data into a DBMS (operations required to import/load data) (Engle, 2018). Since this criterion by definition refers to activity that occurs before data is loaded into a relational database, no observable features of a relational database system will point to this criterion during normal use. As a result, there is no viable method for assessing this criterion using this methodology. Accordingly, it has been omitted from Figure 8 and from further discussion of this methodology.

Structural Malleability.

This criterion refers to the ability of a DBMS to add/remove "types" of aggregates (Engle, 2018). Because they are able to add or remove elements from aggregates dynamically on write, the schemaless, aggregate-oriented key-value and document data models are well suited to instances where this criterion is important, explaining their green coding in Figure 8. In a (normalized) relational database, the addition or removal of an aggregate type would likely be implemented as a separate table. Therefore, the creation (i.e., "CREATE TABLE") or deletion (i.e., "DROP TABLE") of a table may indicate that this criterion is important. Similarly, such a change would also likely require an associated change in foreign keys to another table, so an "ALTER TABLE" command may also be observed.

Transparency.

This criterion refers to the ability of a DBMS to "store aggregates such that individual elements within the aggregate can be viewed and retrieved during read transactions" (Engle, 2018). Since the key-value data model is opaque, by definition it is not well supported by this criterion. Also, since column families (a necessary component for data retrieval in column-oriented data models) need to be predefined, this limits their flexibility, and thus the column-oriented data model also has a lower score for Transparency (Hecht & Jablonski, 2011). However, the remaining three data models are able to access elements individually within aggregates. Therefore, they would be well suited if this criterion were considered important, and accordingly they are coded green in Figure 8. One strategy for detecting this criterion in a relational database involves enumerating the columns for every table and checking whether queries typically accessed only subsets of these columns (as opposed to the entire tuple).

Query Complexity.

This criterion refers to the ability of a DBMS to "perform simple and complex queries" (Engle, 2018). The distinction between a simple and a complex query is whether the query retrieves data "using only a unique identifier and/or values" or imposes more complex conditions on the data (Engle, 2018). Therefore, in order to assess the presence or absence of this criterion, the queries would be parsed looking for advanced operators such as JOIN, LIKE, COUNT, SUM, etc. The values of data in key-value data stores are opaque to the database system, and by extension such systems are limited to querying/interacting with the data by the key alone. This simplicity means that the key-value data model is negatively impacted in its ability to cope with query complexity; for instance, it is unable to handle complex queries since the system cannot be queried based on values. The column-oriented data model also generally lacks the ability to perform complex queries, as it is more focused on optimizing access to large quantities of data that are related and often accessed together (Sadalage & Fowler, 2013). For these reasons, the key-value and column-oriented data models are coded red in Figure 8. On the other hand, both the relational and the graph data models support rich access to the stored elements and are thus able to execute complex queries, so they are coded green (Redmond & Wilson, 2012).

Query Omniscience.

This criterion refers to the "degree to which the complete set of possible queries is known by the user before the system is implemented" (Engle, 2018). The column-oriented data model's emphasis is on creating column families in order to store related data that will commonly be accessed together so it can be interacted with efficiently. Thus, instances where there is solid foreknowledge of the types of queries that will commonly be seen make this a suitable choice. Similarly, knowing the complete set of queries in advance means that the key-value data model can also be configured ahead of time to support them, since the opaque nature of the stored values makes filtering of any returned results the responsibility of an external application (Engle, 2018). Thus, if there is a high degree of query omniscience, then these two data models may be well suited for the given usage since they can be optimized to support those types of queries, shown again by the green coding in Figure 8. One approach for detecting this criterion involves looking at the diversity of observed queries, where a larger diversity of queries and an evolution of queries over time may indicate that newer queries are being generated, and thus not all queries were known at the system's inception.

Result Timeliness.

This criterion refers to the speed with which results are returned to the user following a request (Engle, 2018). The simplistic nature of the key-value data model means that it may be well suited to cases where this criterion is important, which is the primary reason this data model is the only one coded green in Figure 8. There are several approaches for trying to infer whether or not this criterion is important based on observed usage. One indicator to consider is the presence of LIKE clauses or wildcard operators (e.g., * or ?). As these operations are known to be extremely slow (Pratt, 2005), one might infer that RT is not an important criterion if these are frequently seen. Additionally, JOIN operations also tend to be computationally expensive, so large numbers of these may also support the claim that RT is less important to the users. Ultimately, for the purpose of this research, the most straightforward way to determine whether result timeliness was an important criterion was to assess the number of queries that could execute in less than one millisecond. In a system where response times are critical and result timeliness is an important criterion, one can infer that queries will be optimized to execute as quickly as possible, so a higher prevalence of such queries would indicate that this criterion is important.

2.6 Multi-Criteria Decision-Making

The foundational research discussed in the "Engle Criteria" section used inputs from system Subject Matter Experts in order to produce a decision support aid for selecting an appropriate data model, via an MCDM method known as the Analytic Hierarchy Process (AHP). As a follow-on to the previous study, this research intended to demonstrate the feasibility of producing a similar decision support model; however, in lieu of SME inputs, it was instead based on observations of an existing system. Thus, a different approach for employing Engle's AHP-based priorities through an alternative multi-criteria decision-making model was pursued. The Simple Additive Weighting (SAW) method is one of the best known (and therefore widely used) MCDM methods (Yoon & Hwang, 1995). Due to its ability to improve quality and speed in decision-making, SAW has been applied to diverse problem sets such as assigning percentage increases to employee raises, making hiring decisions, and determining student admissions in vocational schools (Afshari et al., 2010; Sahir et al., 2017; Wijayanto et al., 2019). In SAW, alternative outcomes are calculated by multiplying the weight for a given criterion against that alternative's score for the criterion and then summing the results across all criteria (Yoon & Hwang, 1995). As an extremely simplistic example, suppose a person is trying to decide which of two cars to purchase based on two (normalized) criteria, cost and speed. Cost is considered twice as important as speed, so its weight is 0.666 compared to speed's 0.333. Car A can reach a top speed of 55 mph while Car B can reach 80 mph, and since a larger value is desirable here, their normalized values are found using a linear sum benefit criteria method (Vafaei et al., 2016). Thus, Car A's normalized speed value is given by (55)/(55 + 80) = 0.407 and Car B's speed value is given by (80)/(55 + 80) = 0.593. Regarding costs, Car A costs $6,000 and Car B costs $8,000. Since a smaller value for cost is preferable, the normalized values are found using a linear sum cost criteria method (Vafaei et al., 2016). Thus, Car A's normalized cost value is given by (1/6000)/((1/6000) + (1/8000)) = 0.571 and Car B's cost value is given by (1/8000)/((1/6000) + (1/8000)) = 0.429. Using SAW, the value of Car A is computed as (0.571)(0.666) + (0.407)(0.333) = 0.516. Similarly, Car B is computed as (0.429)(0.666) + (0.593)(0.333) = 0.484. Therefore, Car A is the slightly better choice given these options. Previous research has demonstrated the joint application of AHP and SAW, using the AHP method to generate the initial weights and then leveraging the SAW method to determine the ranking of alternatives (Wijayanto et al., 2019). A similar approach is used here. In this research, Engle's performance priorities serve as the foundation for the SAW weighting, while the observations of actual database usage provide the normalized scoring values for each criterion. Collectively, these efforts yield an effective, objective methodology for selecting alternative NoSQL data models for a given observed use case. A more thorough treatment of this approach will be discussed in the next chapter.
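Returning to the car-purchase example above, the following Python sketch reproduces its arithmetic. It is illustrative only: the function names are invented here, and intermediate values are rounded to three decimal places to mirror the hand calculations shown in the text.

```python
# Minimal sketch of the Simple Additive Weighting (SAW) car-purchase example.
# Weights and raw values come from the illustration in the text; intermediate
# values are rounded to three decimals to match the hand-calculated numbers.

def normalize_benefit(values):
    """Linear-sum normalization for a benefit criterion (larger is better)."""
    total = sum(values)
    return [round(v / total, 3) for v in values]

def normalize_cost(values):
    """Linear-sum normalization for a cost criterion (smaller is better)."""
    total = sum(1.0 / v for v in values)
    return [round((1.0 / v) / total, 3) for v in values]

speeds = [55, 80]       # mph, Car A and Car B
costs = [6000, 8000]    # dollars, Car A and Car B
weights = {"cost": 2 / 3, "speed": 1 / 3}   # cost is twice as important as speed

speed_norm = normalize_benefit(speeds)      # [0.407, 0.593]
cost_norm = normalize_cost(costs)           # [0.571, 0.429]

for name, s, c in zip(["Car A", "Car B"], speed_norm, cost_norm):
    score = weights["cost"] * c + weights["speed"] * s
    print(f"{name}: {score:.3f}")           # Car A: 0.516, Car B: 0.484
```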

2.7 Conclusion

The purpose of this chapter was to review the trends in database storage models beginning with the relational data model of the 1970’s through the emergence of today’s Big Data-driven NoSQL data models. The characteristics and strengths of these data models make them attractive candidates for use in several interesting use cases. Existing research that studied ways of characterizing the usage of a relational database was also discussed.

That treatment served as the foundation for the next discussion, which will introduce a novel approach for characterizing the usage of existing relational databases in an automated fashion in order to assess the suitability of various NoSQL data models within the context of an existing relational database system. The following chapter discusses the proposed approach for satisfying this objective and demonstrates its usage through an exemplary pilot study.

III. Methodology

3.1 Research Design

In prior research conducted by Engle (Engle, 2018), a set of twelve criteria (termed the "Engle criteria") was defined that collectively can be used to assess the suitability of a given data model based on a user's subjective assessments of how important each of those criteria is (relative to the others) for a particular system and use case under consideration. This research built upon his work by extending the approach with a novel methodology utilizing objective measurements of the system's usage in lieu of subjective assessments from system users. Several steps were required in order to implement this methodology. First, a tool (called the "Engle Criteria Analysis Tool" or ECAT) was developed which passively observes interactions between a user or application and a relational database in order to infer the degree to which the observed traffic contains elements of the Engle criteria. Second, a series of simulations based on real-world use cases was constructed to demonstrate the tool against different types of usage profiles and to generate observations. Finally, the observations were used to generate relative weights, which were then employed in a Simple Additive Weighting decision model to provide insights regarding the suitability of alternative NoSQL data models for the observed use cases by producing a rank-ordered list of suggested data models.

3.2 Research Questions & Hypotheses

In order to guide this research, a number of research questions and hypotheses were considered.

Mapping of Research Questions to Methodology.

The research questions introduced in Chapter 1 are repeated in Table 4 for ease of reference, along with a mapping of the methods used to answer each question. Both questions rely upon a simulation study, described in the following section, which was employed to identify suitable features in the observed data. These features were measured to produce the weighting of the Engle criteria for use in the SAW decision model, which maps the observed usage to the most suitable underlying data models.

Table 4. Research Questions and Associated Research Methodologies

1. Which observable criteria exist for assessing the usage patterns of a non-distributed relational database in an automated fashion? (Methods: Simulation, Decision Model)
2. Using an automated methodology, can non-distributed relational database usage patterns be mapped to more suitable data models using the observable criteria? (Methods: Simulation, Decision Model)

Hypotheses.

In order to support answering the stated research questions, this research sought to test the following three hypotheses:

1. One or more elements of the Engle criteria can be mapped to observable characteristics of relational database systems.

2. It is possible to profile usage of a database based on observable characteristics of a relational database system.

3. A set of decision criteria based on the observational profile can be incorporated into a decision model that can characterize the suitability of a non-relational (NoSQL) data model for the observed usage characteristics.

Regarding the first hypothesis, the Engle criteria have already been shown to be useful for the selection of an appropriate data model in a single-box environment (Engle, 2018). However, to date these criteria have only been applied to notional database systems with the assistance of inputs from system SMEs. This research proposed a different approach, investigating the use of the Engle criteria in an automated fashion by considering whether there are observable characteristics of relational database systems that might inform or "map" back to the various Engle criteria. Testing this hypothesis involved demonstrating whether or not this mapping could be accomplished. If the first hypothesis is satisfied, then one or more characteristics of a relational database system can be observed and mapped to the Engle criteria. The second hypothesis posits that there is sufficient evidence provided by these observable characteristics to quantitatively profile the aggregate usage patterns (in terms of the Engle criteria) of the users of a relational database system. Finally, the third hypothesis builds on the previous two: if enough observable characteristics exist to develop quantitative profiles of relational database users, then these observations can be used to categorize and rank the suitability of the NoSQL data models for the observed usage.

3.3 Instrumentation

In order to develop the instrumentation for this research, an initial pilot study was undertaken into the original 12 Engle criteria (Engle, 2018), which were introduced in the previous chapter. This pilot study was built upon an open-source database called "Classicmodels" (which will be discussed in greater detail in Sections 3.7 and 3.8). The purpose of the pilot study was to serve as a test bed where each of the criteria could be studied in order to determine under what circumstances it could be detected through observation of SQL queries from a system of interest, as well as to demonstrate the feasibility of this approach and highlight potential issues that might be encountered during later research. Based on this initial investigation, it was determined that 9 of the 12 criteria could be partially or wholly observed, and these became candidates for implementation into the tool. One of the criteria, Pre-processing, relies solely on information that is outside the observable scope of this work (i.e., it requires an understanding of processing actions that take place on the data prior to its use in the relational database); therefore, this criterion was not considered in this research. Additionally, it was determined that while the necessary information for calculating the Large Aggregate Transactions and Small Aggregate Transactions criteria does exist in the observed data, no means of reliably extracting the aggregate sizes and relating them back to the associated queries was found during the course of this study, and thus these two criteria were also not implemented. The remaining criteria were developed into nine individual functions in the ECAT software code, each assessing one criterion. The individual functions are discussed in more detail later in this section. For the interested reader, the code that comprises ECAT can be found in Appendix A.

System Design.

As an introduction to the high-level design of the system, consider Figure 9. This graphic depicts the logical system setup, including the users and/or applications interfacing with a database system, as well as the ECAT software that observes and analyzes the traffic passing to and from the database system.

Figure 9. Notional (Logical) Simulation Setup

Figure 10 depicts an activity diagram of the software flow (i.e., the process the software follows to implement the proposed methodology). When executed, the software begins in step 1 by collecting network packets going between the RDBMS and users/applications. The software must be run from a computer on the same local subnet as the relational database server so that the network traffic flowing between these systems can be observed and collected. The particular implementation used in this research setup was focused on capturing network traffic operating on TCP port 3306, which is the port used by MySQL. In the use case simulations that are described in the next section, a number of software agents are employed to simulate human users interacting with the system, so these agents are launched in step 2. Each agent interacts with an application, causing it to generate traffic to and from the RDBMS, which is then captured and stored. Additional details relating to their usage and behavior will be described

Figure 10. Software Flow Activity Diagram

further in Section 3.4. Periodically, the packets that are collected are parsed, and ECAT begins processing the data in step 3. ECAT was written primarily in the Python programming language, utilizing several other key open-source libraries such as the Percona Toolkit

to help facilitate the distillation of observed traffic into individual SQL queries (Percona, LLC., 2019). The Percona Toolkit was used here to assist in decoding the MySQL packets and extracting the embedded SQL statements. After the SQL statements are extracted, ECAT discards any statements that do not provide useful information about how the database is being used (e.g., "administrator command", "version()", "set session mode", etc.). Then, ECAT processes the remaining SQL statements through the individual functions that each handle one of the nine Engle criteria. The output from each function is a count of the number of SQL statements that support the importance of the given criterion. The per-criterion counts generated in the previous step are then used in step 4 to calculate the relative weights of the criteria. This is accomplished by summing the counts across all of the criteria and then dividing each criterion's count by that total. Step 5 takes the relative weights for each of the criteria and cross-multiplies them by their associated performance priorities in order to build the Simple Additive Weighting model, which defines the relative ranking for each of the data models based on the data observed up to this point. Finally, step 6 compares the output from the SAW model in step 5 to the outputs from any previous periodic checks. In order to determine whether a sufficient amount of data has been collected, the variance of the four most recent rankings for each of the data models is computed, as well as a confidence interval based on all of the iterations seen thus far. A threshold of 1 × 10⁻⁵ is used to determine whether the data model ranking values differ significantly from one iteration to the next. Additionally, the width of the 95% confidence interval is computed and inspected to determine whether it has fallen below 0.01. If either of these criteria has not yet been met, then the process loops back to step 2 to continue generating observations.
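The sketch below illustrates steps 4 through 6 as described above (relative weighting, SAW scoring, and the convergence check). The structure and thresholds follow the preceding description; the function names, data layout, and the use of a normal approximation for the 95% confidence interval are assumptions of this sketch rather than details of the ECAT implementation.

```python
# Minimal sketch of steps 4-6 of the software flow; not ECAT's actual code.
import statistics
from typing import Dict, List

def relative_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Step 4: divide each criterion's supporting-statement count by the total."""
    total = sum(counts.values())
    return {c: (n / total if total else 0.0) for c, n in counts.items()}

def saw_scores(weights: Dict[str, float],
               priorities: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Step 5: cross-multiply observed weights by each data model's
    performance priority for the criterion and sum across criteria."""
    models = {m for per_model in priorities.values() for m in per_model}
    return {m: sum(w * priorities[c].get(m, 0.0) for c, w in weights.items())
            for m in models}

def converged(history: List[Dict[str, float]]) -> bool:
    """Step 6: stop once, for every data model, the variance of the four most
    recent scores is below 1e-5 and the 95% CI width is below 0.01 (normal
    approximation assumed here)."""
    if len(history) < 4:
        return False
    for model in history[-1]:
        series = [h[model] for h in history]
        if statistics.pvariance(series[-4:]) >= 1e-5:
            return False
        ci_width = 2 * 1.96 * statistics.stdev(series) / len(series) ** 0.5
        if ci_width >= 0.01:
            return False
    return True
```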

Strategies for Observing the Engle Criteria.

After considering the Engle Criteria and analyzing the potential ways each of the criteria could be detected or inferred from the observed database usage data, the best identified approaches were written into the ECAT software. The following discussion explains the strategies that were selected and implemented in ECAT for each of the observable criteria.

Criterion 1 - Pre-Processing (N/A).

As previously mentioned, this criterion relies on information about processing that occurs before data is entered into the RDBMS. As such, this criterion was deemed unobservable. Therefore, it was not implemented in ECAT.

Criterion 2 - Structural Malleability.

This function is designed to provide evidence of RDBMS usage indicating that the "DBMS's ability to add/remove 'types' of aggregates to and from the DBMS" is considered important (Engle, 2018). The primary strategy of this function is to look for the presence of any "CREATE TABLE" or "DROP TABLE" commands, since an aggregate type in a normalized RDBMS would likely be implemented as a separate table. Furthermore, since it would be in a separate table, an "ALTER TABLE" command updating a foreign key constraint would further support this criterion. However, if a "RENAME TABLE" is observed following the "CREATE" or "ALTER" commands, then those commands are not counted here but instead are counted later by the Plasticity criterion.
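A simplified sketch of this strategy is shown below. It counts CREATE TABLE, DROP TABLE, and foreign-key-related ALTER TABLE statements; the RENAME TABLE bookkeeping described above is omitted, and the regular expressions are illustrative rather than ECAT's actual patterns.

```python
# Simplified, illustrative sketch of the Structural Malleability check.
import re

_PATTERNS = [
    re.compile(r"^\s*CREATE\s+TABLE", re.IGNORECASE),
    re.compile(r"^\s*DROP\s+TABLE", re.IGNORECASE),
    re.compile(r"^\s*ALTER\s+TABLE.+FOREIGN\s+KEY", re.IGNORECASE | re.DOTALL),
]

def count_structural_malleability(queries):
    """Return the number of statements supporting this criterion."""
    return sum(1 for q in queries if any(p.search(q) for p in _PATTERNS))
```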

Criterion 3 - Transparency.

This function is designed to provide evidence of RDBMS usage indicating that the DBMS's ability to "store aggregates such that individual elements within the aggregate can be viewed and retrieved during read transactions" is considered important (Engle, 2018). This function evaluates queries to determine whether "SELECT" statements are querying a subset of the attributes for a given relation or requesting all attributes. Generally, one would request all attributes by using a "SELECT *" statement; however, it is possible to enumerate each attribute, so this must also be accounted for. This is done by comparing the list of attributes against the schema of the table. Changes made to the schema during execution (i.e., through "ALTER TABLE" commands) are identified and stored so that the current version of the schema is referenced for any given query. Finally, when "JOIN" operations are present, specific attributes may be referenced using an alias and dot notation (e.g., a.col_one, b.col_two). Therefore, this function also considers the possibility that multiple aggregates may be accessed in a single statement, and it keeps track of the elements and whether or not all of them are accessed across all specified relations.
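The following simplified sketch illustrates the core of this check for single-table SELECT statements, assuming a schema dictionary that maps table names to their current column lists; ECAT's actual implementation additionally tracks schema changes and handles joined relations. The example schema is hypothetical.

```python
# Simplified sketch of the Transparency check: does a SELECT read only a
# subset of a table's columns?
import re

_SELECT_RE = re.compile(r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)",
                        re.IGNORECASE | re.DOTALL)

def selects_subset(query, schema):
    m = _SELECT_RE.match(query)
    if not m:
        return False
    cols = [c.strip().split(".")[-1].lower() for c in m.group("cols").split(",")]
    table_cols = {c.lower() for c in schema.get(m.group("table").lower(), [])}
    if "*" in cols or not table_cols:
        return False                      # full-row read or unknown table
    return set(cols) != table_cols        # True when only some columns are read

# Hypothetical schema for illustration:
schema = {"employees": ["id", "name", "salary", "dept"]}
print(selects_subset("SELECT name, salary FROM employees", schema))  # True
```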

Criterion 4 - Query Omniscience.

This function is designed to provide evidence of RDBMS usage indicating the "degree to which the complete set of possible queries is known by the user before (the) system is implemented" (Engle, 2018). This function takes a time-ordered list of fingerprinted SQL queries that have been run against the database and then partitions the list into three equal lists. The query fingerprints are sorted alphabetically, and a similarity comparison is made between each of the three lists. The premise is that if the usage of a database is homogeneous across the entire span of observed usage, then each of the three subsets of queries should contain approximately the same number of each type of query. By comparing each subset to the others, it can be determined whether or not the types of queries that were observed evolved or changed over time. Observing a change in usage over time may imply that the complete set of possible queries was not determined by the users prior to system implementation. This comparison is performed using an implementation of the Jaccard metric from the Python SciPy library (Jaccard, 1912; Virtanen et al., 2020). The score used here reflects the overlap in query occurrences from one set to another, so two lists that are completely identical return a value of 1, whereas two completely different lists return a value of 0. By performing three pairwise comparisons among the three lists (i.e., comparing the first list to the second, the second to the third, and the first to the third), three scores are generated. These scores are then averaged to obtain an overall score across the entire set of observed queries. Homogeneous lists will return higher overall scores, indicating a higher degree of query omniscience. Conversely, heterogeneous lists will return low overall scores, indicating that a change in usage likely occurred during the observed period.
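The sketch below illustrates this measure. Where ECAT relies on the SciPy implementation over query fingerprints, this sketch computes the set-based Jaccard similarity directly (1.0 for identical sets, 0.0 for disjoint sets) and averages it over the three pairwise comparisons.

```python
# Illustrative sketch of the Query Omniscience measure using set-based
# Jaccard similarity instead of the SciPy routine used in ECAT.
from itertools import combinations

def jaccard_similarity(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def query_omniscience(fingerprints):
    """Split the time-ordered fingerprints into thirds and average the
    pairwise similarities; higher values suggest a stable, known-in-advance
    query workload."""
    n = len(fingerprints)
    thirds = [fingerprints[: n // 3],
              fingerprints[n // 3: 2 * n // 3],
              fingerprints[2 * n // 3:]]
    pairs = list(combinations(thirds, 2))
    return sum(jaccard_similarity(a, b) for a, b in pairs) / len(pairs)
```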

Criterion 5 - Query Complexity.

This function is designed to provide evidence of RDBMS usage indicating the "DBMS's ability to perform both simple and complex queries" (Engle, 2018). Simple queries are those in which the data is retrieved "using only a unique identifier or a range of identifiers and/or values" (Engle, 2018), whereas more complex queries may impose additional conditions or perform aggregation functions (e.g., sum, count, max, etc.) on the results. This function searches through the queries using string-matching techniques to identify queries that are considered complex and then returns the number of complex queries observed, since higher values indicate that this criterion is important.
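A minimal sketch of this check follows; the keyword list is illustrative and may differ from the string-matching rules actually used in ECAT.

```python
# Simplified sketch of the Query Complexity check: count statements containing
# operators beyond a simple key/value lookup.
import re

_COMPLEX_RE = re.compile(
    r"\b(JOIN|LIKE|COUNT|SUM|AVG|MIN|MAX|GROUP\s+BY|HAVING|UNION|DISTINCT)\b",
    re.IGNORECASE)

def count_complex_queries(queries):
    return sum(1 for q in queries if _COMPLEX_RE.search(q))
```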

Criterion 6 - Result Timeliness.

This function is designed to provide evidence of RDBMS usage indicating "how quickly the results are provided to the user after a request is made" (Engle, 2018). This function leverages the Percona Toolkit to search for queries exceeding a specified threshold, set here to one millisecond. The observation of a large proportion of queries taking longer than this to run may imply that the user is less concerned with this criterion; therefore, this value is subtracted from the total number of observed queries to produce the number of queries that ran quickly, since higher values denote that this criterion may be more important.
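The following sketch assumes per-query execution times have already been extracted (in ECAT's case, via the Percona Toolkit) and are supplied as (query, seconds) pairs.

```python
# Simplified sketch of the Result Timeliness check.
FAST_THRESHOLD_S = 0.001   # one millisecond

def count_fast_queries(timed_queries):
    """Return the number of queries that completed under the threshold."""
    return sum(1 for _query, seconds in timed_queries if seconds < FAST_THRESHOLD_S)
```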

Criterion 7 - Cross-Aggregate Consistency.

This function is designed to provide evidence of RDBMS usage indicating the "DBMS's ability to perform cascading updates to data and relationships" (Engle, 2018). In a relational database, this can be observed through update or delete operations involving joins, so this function employs string-matching techniques to identify such statements and returns a count of these queries, since higher values indicate that this criterion is important.

Criterion 8 - Large Transactions (N/A).

This function is designed to provide evidence of RDBMS usage indicating that the "DB can store, retrieve, and update large (>1TB) aggregates quickly (within a few seconds or less)" (Engle, 2018). Since the results from any query are observable in the data, it should be possible to measure these results and obtain a count of the number of large transactions being observed. However, a means of accurately extracting this information was not found and remains a goal for future research. Should this data be obtained, higher occurrences of this criterion in the observed usage would indicate that this criterion is important.

Criterion 9 - Small Transactions (N/A).

This function is designed to provide evidence of RDBMS usage indicating that the "DB can store, retrieve, and update small (<1kB) aggregates quickly (within a few seconds or less)" (Engle, 2018). As with the previous criterion, this information also exists within the observed data; however, a means of extracting it was not found. As before, should this data be obtained, higher occurrences of this criterion in the observed usage would indicate that this criterion is important.

Criterion 10 - Plasticity.

This function is designed to provide evidence of RDBMS usage indicating the "DBMS's ability to add or remove elements within stored aggregates" (Engle, 2018). In a relational database, such changes would be identified by the presence of "ALTER TABLE" commands that "ADD" and/or "DROP" columns. Therefore, this function returns a count of any such observed occurrences.

Criterion 11 - Data Typing Enforcement.

This function is designed to provide evidence of RDBMS usage indicating "DB enforcement of data types during transactions. Data types may include floats, doubles, Booleans, strings and others" (Engle, 2018). Relational databases restrict the "types" of data that are permitted to be written into them, which helps ensure the consistency of the data. If this criterion is important to the user, it may manifest itself through updates to the database's schema as users seek to ensure the consistency of their data in response to changing needs. Thus, this function looks for the presence of "ALTER TABLE" commands that either "CHANGE" or "MODIFY" the schema and returns a count of such occurrences.
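The sketch below illustrates how this criterion and the preceding Plasticity criterion might both be detected from ALTER TABLE statements, distinguishing column additions/removals from changes to existing column definitions. The patterns are illustrative only and are not ECAT's actual rules.

```python
# Simplified, illustrative classification of ALTER TABLE statements into
# Plasticity (ADD/DROP columns) and Data Typing Enforcement (CHANGE/MODIFY).
import re

_ALTER_RE = re.compile(r"^\s*ALTER\s+TABLE", re.IGNORECASE)
_PLASTICITY_RE = re.compile(
    r"\b(ADD|DROP)\s+(COLUMN\s+)?(?!CONSTRAINT|FOREIGN|INDEX|KEY|PRIMARY)\w+",
    re.IGNORECASE)
_DATA_TYPING_RE = re.compile(r"\b(CHANGE|MODIFY)\s+(COLUMN\s+)?\w+", re.IGNORECASE)

def classify_alter_statements(queries):
    plasticity = data_typing = 0
    for q in queries:
        if not _ALTER_RE.match(q):
            continue
        if _PLASTICITY_RE.search(q):
            plasticity += 1
        if _DATA_TYPING_RE.search(q):
            data_typing += 1
    return plasticity, data_typing
```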

Criterion 12 - Manipulation.

Finally, this last function is designed to provide evidence of RDBMS usage indicating the "DBMS's ability to update elements within stored aggregates independently from other aggregates and elements" (Engle, 2018). This function's strategy is to evaluate any "UPDATE" queries and check whether the query specifies values for all elements in a given relation or only a subset of them. If the latter, such queries are counted, since they indicate this criterion is important.
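A simplified sketch of this check is shown below. As with the Transparency sketch, it assumes a schema dictionary mapping table names to column lists, and it uses deliberately naive parsing compared to ECAT's implementation.

```python
# Simplified sketch of the Manipulation check: does an UPDATE set only some of
# a table's columns?
import re

_UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(?P<table>\w+)\s+SET\s+(?P<assigns>.+?)(\s+WHERE\b|$)",
    re.IGNORECASE | re.DOTALL)

def updates_subset(query, schema):
    m = _UPDATE_RE.match(query)
    if not m:
        return False
    set_cols = {a.split("=")[0].strip().lower() for a in m.group("assigns").split(",")}
    table_cols = {c.lower() for c in schema.get(m.group("table").lower(), [])}
    return bool(table_cols) and set_cols < table_cols   # proper subset of columns

# Hypothetical schema for illustration:
schema = {"employees": ["id", "name", "salary", "dept"]}
print(updates_subset("UPDATE employees SET salary = 50000 WHERE id = 7", schema))  # True
```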

3.4 Simulation

This research employed a series of simulations in order to generate and collect data representative of typical real-world relational database systems based on different types of application systems and usage styles. The purpose of the simulation study was to demonstrate that the Engle criteria discussed in Chapter 1 can be assessed by a system observing the usage of a relational database by a user/application, and that this information can then be used to determine whether a more suitable data model exists for the given use case. As pictured in Figure 9, the "Engle Criteria Analysis Tool" acted as a "man-in-the-middle" between a user/application and a relational database system. Given its placement, this tool was able to observe all queries sent to the database system, as well as the responses from the database system. This allowed it to log all activity, which was then analyzed in order to produce metrics about the usage characteristics of the database system for the given use case in the following simulations. This in turn formed the basis for determining the weights for the decision model discussed in the next section.

Inputs to the simulation were controlled by programmed agents, which are pieces of software designed and constructed to interact with several different applications. Their purpose is to simulate the usage styles of human operators interfacing with these applications. These agents acted as the user in Figure 9. Different parameters for each of the agents were adjusted to vary aspects of their interactions, such as how frequently they accessed particular functions of the web application. This provided for the evaluation of different types of use cases for each of the evaluated systems. The output from the collected samples was a packet capture containing all network traffic between the relational database and the application, which could be parsed to produce an ordered list of SQL queries and other related metadata. These data were subsequently assessed by the ECAT tool discussed in the previous section.

Several different use cases were implemented in order to demonstrate a diverse selection of possible database uses in these simulations. These simulations leveraged open-source software implementing real-world web applications such as an online forum system, an authoritative Domain Name System (DNS) server, and a social-networking website. All of these applications leverage the open-source MySQL RDBMS as their back-end database, and each was chosen because of its support for this particular RDBMS implementation. It should be noted, however, that the principles of this research are intended to be implementation-agnostic and would apply to other types of RDBMS implementations given the proper changes to system setup (such as adjusting the code to interpret other protocols beyond MySQL).

For the online forum system, phpBB was selected. PowerDNS was chosen for the authoritative DNS server use case. Finally, Elgg was selected to represent the social networking use case. In each use case (i.e., phpBB, PowerDNS, and Elgg), agents were run against the particular web application (or DNS service), prompting it to execute SQL queries to the RDBMS via TCP port 3306 in order to satisfy the data requirements generated by the agents' actions. The traffic exchanged between the two hosts using the MySQL protocol is unencrypted, which allows the SQL queries and responses to be easily collected.

Since the systems being simulated reside on the same subnet, traffic passing between the two hosts can be "sniffed" using the tcpdump packet capture software running on the ECAT host. As the system collecting and analyzing the data operates passively, it does not impact the system under observation. Following the data collection, the data is parsed and analyzed by ECAT in order to extract features of the observed data and compute quantitative metrics for each of the criteria presented in Table 1 and discussed in Section 3.3.

Figure 11 shows a sample packet capture from the phpBB web application for the following SQL query: "SELECT p.post_id FROM phpbb_posts WHERE p.topic_id = 1 AND (p.post_visibility = 1) ORDER BY p.post_time ASC, p.post_id ASC LIMIT 10". To demonstrate how this information could be used to inform one of the Engle criteria, consider the query complexity criterion, which measures the importance of being able to perform complex queries in addition to simple ones. The query in this example not only retrieves the desired data through a combination of two selection parameters (p.topic_id = 1 and p.post_visibility = 1), but it also limits the results based on a sorted subset of the data. This indicates a non-trivial amount of query complexity. Thus, this query would lend support to the importance of the query complexity criterion and be counted in this metric. A high ranking of this criterion, relative to the others, could suggest that the key-value data model is not appropriate for this use case since it does not inherently provide the ability to perform complex queries.

Figure 11. Sample MySQL Packet Capture

Simulation Breakdown.

This section will discuss the specific sequences of simulations that were performed. Each of the use cases employed one or more simulation runs in order to identify which of the Engle criteria were observable and to what degree. In the use cases where more than one simulation was performed, the behavior of the automated agents was adjusted to observe how changes in user behavior affected the system’s outputs.

Online Forum System: phpBB.

The online forum system use case was implemented using the phpBB online bulletin board system (phpBB Limited, 2020). This web application was specifically chosen for this research for several reasons:

• It is open-source (thus it lacks licensing costs while also providing the ability to modify it as required for any simulation/testing)

• It supports MySQL as a back-end database by default, so its queries are compatible with the ECAT software

• The nature of online forum systems aligns with several characteristics of the document or column-oriented NoSQL data models (e.g., containing lots of dynamically changing semi-structured or unstructured data) (Sadalage & Fowler, 2013)

Online forum systems such as this typically involve groups of users having conversations with one another in the form of posted messages. As such, the agents for this use case have each been programmed to replicate this behavior by use of the following functions:

• Administrative actions (e.g., logging in and out)

• Select an available sub-forum, and enter it

• Create a new discussion thread

• Select an existing discussion thread, and enter it

• Create a new comment on an existing discussion thread

Generally, online forum systems have more read operations performed upon them than write operations. A 2004 study of user behavior in university online forums found that the average ratio of views to posts was 47.6:1 (Burr & Spennemann, 2004). Thus, the simulation setup adjusted the agents' behavior to follow similar parameters, with agents programmed to employ read functions (reading sub-forums or discussion threads) 98 times out of 100 and write functions (creating new discussion topics or comments) the remaining two. The behavior for each of the individual agents was adjusted stochastically during execution. Each time an agent was instantiated, a specific percentage was assigned to each of the read and write functions, which the agent used for the duration of that run. Each run involved the agent logging in and then performing approximately 30 user actions (pseudo-randomly chosen from a range of 20-40 actions for each run) from the available options: accessing an existing sub-forum (15%−20%), creating a new discussion thread (1%−3%), reading an existing discussion thread (75%−80%), or creating a new comment on an existing thread (1%−3%). The agent would then log out. Each agent performed this sequence repeatedly until a sufficient amount of data had been collected.

The aforementioned user behavior study also highlighted different access patterns throughout the day and week during an academic semester. These patterns could also have been programmed into the agents' behavior to increase the level of realism in the simulation. The study indicated that their university supported nearly 40,000 students, and their data collection encompassed 589,359 posts and 28,025,746 views over a three-year period (Burr & Spennemann, 2004). As a three-year simulation was not feasible for this study, agents were instead programmed to operate as quickly as the overall simulation permitted. Since most of the overhead was on the client side (i.e., launching and operating a graphical web browser), multiple agent systems were launched simultaneously in order to speed up data collection.

At periodic intervals, the ECAT software was run against the packets collected thus far to monitor the results. Eventually, the SAW output results began to converge to the desired level of fidelity (three decimal places), and the simulation was terminated.
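To make the agent logic concrete, the following is a minimal sketch of how such a session-based, weighted action selection could be implemented; the action weights and the agent methods shown are illustrative assumptions rather than the actual agent code, which appears in Appendix C.

```python
import random

# Illustrative read/write mix drawn from the ranges described above; the real
# agents re-drew these percentages each time they were instantiated.
ACTIONS = {
    "read_thread": 0.78,
    "browse_subforum": 0.17,
    "create_thread": 0.02,
    "create_comment": 0.03,
}

def run_session(agent):
    """One agent session: log in, perform ~30 weighted actions, log out.

    `agent` is assumed to expose one method per action name (hypothetical API).
    """
    agent.login()
    for _ in range(random.randint(20, 40)):
        action = random.choices(list(ACTIONS), weights=list(ACTIONS.values()))[0]
        getattr(agent, action)()
    agent.logout()
```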

Authoritative Domain Name System: PowerDNS.

The PowerDNS authoritative domain name system was deliberately chosen for this research to demonstrate a simple use case with only one possible agent action:

• Querying a domain

As with the previous use case, PowerDNS was chosen from among a variety of options because it is open-source and supports MySQL. Additionally, the simplistic nature of this use case mirrors a common design feature of the key-value NoSQL data model: the user provides the application with a hostname (i.e., a key), and it returns the associated DNS information (i.e., a value). As this was an authoritative DNS server, it provided only lookup capabilities for the entries stored in its database (as opposed to a recursive DNS server, whose function is to forward DNS queries to the appropriate authoritative DNS servers). The lookup queries were performed directly from Python using the built-in socket library.
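As a sketch of how such lookups could be driven from Python with the built-in socket library, the loop below issues repeated hostname queries; the hostname list is hypothetical, and it assumes the client's resolver is configured to point at the PowerDNS server under test (the actual simulation code appears in Appendix E).

```python
import random
import socket

# Hypothetical hostnames assumed to exist in the PowerDNS back-end database.
HOSTNAMES = [f"host{i}.example.test" for i in range(100)]

def query_random_domain():
    """Issue one lookup; assumes the client's resolver points at the
    authoritative PowerDNS server under test."""
    hostname = random.choice(HOSTNAMES)
    try:
        return hostname, socket.gethostbyname(hostname)
    except socket.gaierror:
        return hostname, None

if __name__ == "__main__":
    for _ in range(1000):
        query_random_domain()
```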

Social Networking Site: Elgg.

Finally, the most complex use case was the Elgg social networking site. As with the previous use cases, it is built on open-source software and uses MySQL for its back-end database. Additionally, at first glance the Elgg social networking site use case bears a resemblance to the phpBB online forum system, as both are primarily concerned with agents interacting with a website in order to read and create content. However, looking beyond this superficial similarity reveals a key difference in how these systems operate; namely, there is a rigid hierarchical structure in the online forum system that is not present in the social networking use case. Instead, the focus is on highly interconnected data, as users create posts that do not follow hierarchical structures and create ad hoc links between each other. Therefore, the following functions were created for the agent supporting this use case:

• Administrative actions (e.g., logging in and out, creating new users)

• Becoming friends with a random user

• Reading a page (similar to viewing a thread of comments on a Facebook post)

• Creating a new page (similar to a new post on Facebook)

• Creating a comment on a page (similar to a comment on a post on Facebook)

• Retrieving a list of friends for a given user

As with the online forum system, ten agents were programmed to simultaneously interact with the social networking site, following a less “academic” style of posting. That is, there was more emphasis on adding comments and content, as might be expected on a more “social” platform. Each agent was programmed to log in and pseudo-randomly perform 20 actions from the available options: becoming friends with a new user (4%−6%), reading a user's profile (7%−13%), reading a page (25%−35%), retrieving a list of friends for a pseudo-randomly chosen user (7%−13%), commenting on another user's post (25%−35%), or creating a new page (7%−13%). The agent would then log out.

As before, the simulation was checked periodically and the packet capture processed by ECAT to determine when the SAW output results started to converge. At that point, the simulation was terminated.

3.5 Data Analysis - Calculating Relative Weights

Once the simulation data had been acquired, the next step was to calculate the relative weights of the nine criteria based on their observed prevalence during a particular use case. Note that these weights were also calculated and periodically updated during the data collection phases between packet captures in order to determine how long to allow the simulations to run. However, this discussion focuses on the final output that was run against the entirety of the collected data.

To simplify this discussion, an example is offered of how the collected metrics were calculated using three of the nine Engle criteria: Cross-Aggregate Consistency (CAC), Plasticity (Pla), and Query Complexity (QC). We assume the number of observations for each criterion has been captured during a simulation, along with a count of the total number of observed queries for the entire simulation. In this hypothetical example, we observed 500 SQL queries and evaluated them according to the criteria discussed in Section 3.3.

Cross-Aggregate Consistency is evaluated based on operations that affect multiple rows of data. For this example, assume that 50 of the 500 queries made adjustments to results across more than one table; therefore, the relative count for CAC is 50. Plasticity is evaluated based on queries containing “ALTER TABLE” statements which add or remove attributes in an existing table. For this example, assume that 5 of the 500 queries made adjustments of this kind during the observed usage; therefore, the relative count for Pla is 5. Query Complexity is evaluated based on the ratio of queries that are not simple lookups (e.g., they contain a complex operation such as JOIN, LIKE, SUM, etc.). For this example, assume that 250 of the 500 queries contained complex operators; therefore, the relative count for QC is 250.

Summing over the criteria gives a total of 50 + 5 + 250 = 305. To compute their relative weights, each count is now divided by this total. For CAC, the relative weight is 50/305 = .164. For Pla, the relative weight is 5/305 = .016. Finally, for QC the relative weight is 250/305 = .820. These normalized values are now suitable for use in the decision model described in the next section.
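The normalization step just described can be sketched as follows; the classification rules shown are simplified stand-ins for the fuller heuristics implemented in ECAT, and the counts mirror the hypothetical example above.

```python
import re

# Counts from the worked example above: of 500 observed queries,
# 50 triggered CAC, 5 triggered Pla, and 250 triggered QC.
criterion_counts = {"CAC": 50, "Pla": 5, "QC": 250}

def relative_weights(counts):
    """Normalize raw criterion counts so that they sum to 1.0."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

def classify(query):
    """Simplified per-query classifier in the spirit of the rules above."""
    hits = set()
    if re.search(r"\bALTER\s+TABLE\b", query, re.IGNORECASE):
        hits.add("Pla")                     # schema plasticity
    if re.search(r"\b(JOIN|LIKE|SUM|GROUP\s+BY)\b", query, re.IGNORECASE):
        hits.add("QC")                      # query complexity
    return hits

print(relative_weights(criterion_counts))
# {'CAC': 0.164, 'Pla': 0.016, 'QC': 0.820} (to three decimal places)
```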

3.6 Data Analysis - Simple Additive Weighting Decision Model

After obtaining the relative weights for each of the criteria, the decision model can be calculated. Incorporating the Simple Additive Weighting (SAW) method discussed in Chapter 2, this model takes the relative weights as inputs for each of the criteria and cross-multiplies them against the performance priorities established by Engle, found in Figure 12.

Figure 12. Performance Priorities (Engle, 2018)

This cross-multiplication occurs for each criterion against each of the five data models (i.e., relational, document, column-oriented, graph, and key-value). For example, the ranking for the key-value model is calculated as (.164)(.048) + (.016)(.033) + (.820)(.039) = .040. The results for all of the models are shown in Table 5. In this example, the relational data model is given the highest ranking of .449, followed by the graph data model at .343. Given the heavy weight placed on the query complexity criterion in this simplified example, and the strong performance of the relational and graph models on both query complexity and cross-aggregate consistency, it is not surprising that these two data models are ranked at the top. The normalized “Total” column represents the final output of the process, which is the rank-ordered suitability of each of the five data models for the observed usage.

Table 5. Hypothetical SAW output

       CAC              Pla              QC               Total
       Weight   Perf    Weight   Perf    Weight   Perf
KV     0.164    0.048   0.016    0.033   0.820    0.039   0.040
Doc    0.164    0.048   0.016    0.302   0.820    0.135   0.123
Col    0.164    0.048   0.016    0.302   0.820    0.039   0.045
Gra    0.164    0.429   0.016    0.302   0.820    0.326   0.343
Rel    0.164    0.429   0.016    0.062   0.820    0.461   0.449
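The SAW calculation itself reduces to a weighted sum per data model, as in the following sketch; the weights and performance values are the illustrative numbers from Table 5 rather than Engle's complete priority matrix.

```python
# Relative weights observed for each criterion (from the worked example above).
weights = {"CAC": 0.164, "Pla": 0.016, "QC": 0.820}

# Performance priorities per data model and criterion (illustrative values
# from Table 5, a subset of Engle's full matrix).
performance = {
    "KV":  {"CAC": 0.048, "Pla": 0.033, "QC": 0.039},
    "Doc": {"CAC": 0.048, "Pla": 0.302, "QC": 0.135},
    "Col": {"CAC": 0.048, "Pla": 0.302, "QC": 0.039},
    "Gra": {"CAC": 0.429, "Pla": 0.302, "QC": 0.326},
    "Rel": {"CAC": 0.429, "Pla": 0.062, "QC": 0.461},
}

def saw_scores(weights, performance):
    """Simple Additive Weighting: sum of weight x performance per model."""
    return {model: sum(weights[c] * perf[c] for c in weights)
            for model, perf in performance.items()}

for model, score in sorted(saw_scores(weights, performance).items(),
                           key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {score:.3f}")
# Rel: 0.449, Gra: 0.343, Doc: 0.123, Col: 0.045, KV: 0.040
```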

3.7 Classicmodels Initial Pilot Study

In order to validate ECAT's ability to capture each of the nine assessed Engle criteria and to use the SAW decision model to recommend a suitable data model, a simulation pilot study was undertaken in which specific use cases were designed in such a way as to trigger each criterion. By deliberately testing specific kinds of queries that supported certain Engle criteria, it was possible to demonstrate ECAT's ability to characterize the observed usage and select the most suitable data model across the entire spectrum of supported data models. The pilot study involved six sets of queries: five designed and implemented to demonstrate ECAT's ability to appropriately rank each of the five data models based on the overall behavior of the observed queries, plus a sixth test using unstructured (i.e., image) data to further help validate the results. This pilot study employed the open-source “Classicmodels” SQL database to query against (see Figure 13 for the starting schema).

In order to construct the queries, the data model to Engle criteria mapping discussed earlier in Figure 8 was consulted to determine which criteria most strongly supported a given data model (i.e., by selecting the criteria that were coded green for each of the data models). Then, a sequence of queries was designed to emulate how a person attempting to use a relational database in a NoSQL data model style might do so. The following sections discuss the approaches that were taken with each of the simulations.

Figure 13. Initial Classicmodels Schema (mysqltutorial.org, 2020)

Classicmodels Simulation #1: Relational.

In the first simulation of the pilot study, a single agent was launched to interact with the MySQL database, which generated 264 observed queries. During the simulation, 156 queries were retained for analysis and 108 queries were omitted by the ECAT tool. Regarding the 108 omitted queries, ECAT was programmed to disregard queries which do not provide any useful information towards profiling the usage patterns (such as “” statements, or administrative commands such as connect or ping). The queries of interest that were retained included, but were not limited to, the following (illustrative examples are sketched after the list):

• Creating a new table “persons” into which new data could be placed

• Altering the newly created table to modify one of the column attributes (in order to demonstrate the DTE criterion)

• Inserting data into the table “persons”

• Selecting data with queries involving a join between the “persons” and “customers” tables

• Updating existing data in “persons” using a join clause on “customers”
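The statements below sketch the kinds of queries just listed as they could be issued from Python through the mysql-connector-python driver; the connection parameters and the exact column definitions are illustrative assumptions, not the precise queries used in the pilot study.

```python
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(user="ecat", password="ecat",
                               host="127.0.0.1", database="classicmodels")
cur = conn.cursor()

# Create a new table into which new data can be placed.
cur.execute("CREATE TABLE persons (id INT PRIMARY KEY, name VARCHAR(100))")

# Alter the table to modify a column attribute (triggers the DTE criterion).
cur.execute("ALTER TABLE persons MODIFY name VARCHAR(255)")

# Insert data into the new table.
cur.execute("INSERT INTO persons (id, name) VALUES (%s, %s)", (1, "Example"))

# Select data using a join between persons and customers.
cur.execute("""SELECT p.name, c.customerName
               FROM persons p JOIN customers c ON p.name = c.contactLastName""")

# Update existing data in persons using a join clause on customers.
cur.execute("""UPDATE persons p JOIN customers c ON p.name = c.contactLastName
               SET p.name = c.customerName""")

conn.commit()
cur.close()
conn.close()
```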

Since this use case was developed to specifically trigger some of the criteria most relevant to the relational data model, seven of the nine criteria were observed. The counts of each of these events can be seen in the raw metrics report produced by ECAT, which is shown in Figure 14. In order to produce the normalized scores required by SAW, the counts of each of the criteria were summed together and their relative weights were calculated. The results of this process are shown in Table 6. Of the seven criteria that were identified, result timeliness was the overwhelmingly highest value at 0.370, as almost every query evaluated (150/155) completed in under a millisecond. A relatively high query complexity score (.247) combined with the result timeliness score to drive down the ranking for the key-value data model (which, although strongly supported by a high RT score, is not supported by complex queries). This ultimately resulted in a recommendation for the relational data model (see Table 7).

Figure 14. ECAT Metrics Report Output

Table 6. Importance Priority Vector, Classicmodels Relational Simulation

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.007  0.126  0.123  0.247  0.370  0.123  0.000  0.002  0.000

Table 7. Global Priority (SAW) Outputs, Classicmodels Relational Simulation

Relational   0.264
Key-Value    0.257
Graph        0.216
Document     0.160
Columnar     0.104

Classicmodels Simulation #2: Key-Value.

For the second simulation of the pilot study, the key-value data model was targeted as the desired output. To demonstrate this, an extremely simple agent was programmed to do a lookup of the customers table and retrieve a list of all customer numbers. Then, it performed 200 lookups requesting a customer's name via a random customer number as the lookup value (i.e., “SELECT customerName FROM customers WHERE customerNumber = X”). A total of 406 queries were observed, and 201 of those were retained (the remainder were administrative commands such as “COMMIT”). In this very simple simulation, only three of the nine criteria were identified, all of which were approximately equal in observances. Query omniscience was very slightly lower than the other two because the first SELECT statement differed from the following 200, resulting in an extremely high, but not equivalent, query omniscience score relative to the other two criteria (see Table 8). Although the transparency criterion (which was relatively high at 0.334) does not well support the key-value data model (since aggregates in this model are not able to select elements within the aggregate), the high result timeliness (0.334) and query omniscience (0.331) scores were more than sufficient to rank key-value as the most suitable model (see Table 9).

Table 8. Importance Priority Vector, Classicmodels Key-Value Simulation

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.000  0.334  0.331  0.000  0.334  0.000  0.000  0.000  0.000

Table 9. Global Priority (SAW) Outputs, Classicmodels Key-Value Simulation

Key-Value    0.262
Columnar     0.248
Document     0.225
Graph        0.159
Relational   0.105
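A minimal sketch of the lookup agent described above might look like the following; the connection details are assumed, and the loop simply mirrors the 200 single-key lookups discussed in the text.

```python
import random
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(user="ecat", password="ecat",
                               host="127.0.0.1", database="classicmodels")
cur = conn.cursor()

# One scan to obtain the set of valid keys (customer numbers).
cur.execute("SELECT customerNumber FROM customers")
customer_numbers = [row[0] for row in cur.fetchall()]

# 200 key-value style lookups: a single key in, a single value out.
for _ in range(200):
    key = random.choice(customer_numbers)
    cur.execute("SELECT customerName FROM customers WHERE customerNumber = %s",
                (key,))
    cur.fetchone()

cur.close()
conn.close()
```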

Classicmodels Simulation #3: Columnar.

In the third simulation of the pilot study, the columnar data model was targeted as the intended output from ECAT. This was done by executing a number of queries to modify a table by adding and removing columns in order to trigger manipulation, and by keeping the types of commands observed consistent (i.e., the same types were observed in the beginning as at the end) in order to support query omniscience since these two criteria predominantly support this data model.

Table 10. Importance Priority Vector, Classicmodels Columnar Simulation

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.001  0.000  0.331  0.000  0.112  0.000  0.222  0.000  0.334

Of the five criteria that were identified, manipulation (0.334) and query omniscience (0.331) ended up being the dominating values (see Table 10). As intended, these factors combined to heavily support the columnar data model, which was the highest ranked, as seen in Table 11.

Table 11. Global Priority (SAW) Outputs, Classicmodels Columnar Simulation

Columnar     0.292
Key-Value    0.209
Document     0.198
Graph        0.177
Relational   0.124

Classicmodels Simulation #4: Graph.

The fourth simulation of the pilot study targeted the graph data model. Several key strengths of the graph data model are the ability to update values across aggregates (Cross-Aggregate Consistency), to add or remove elements within stored aggregates (Plasticity), to execute complex queries (Query Complexity), and to update elements within stored aggregates (Manipulation). Thus, the queries for this simulation focused on those attributes by frequently altering an existing table, and updating and selecting elements within stored aggregates using JOINs.

Table 12. Importance Priority Vector, Classicmodels Graph Simulation

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.001  0.000  0.098  0.150  0.076  0.150  0.150  0.000  0.376

Of the six criteria that were identified in this experiment, manipulation was the overwhelmingly highest value (0.376), which moderately supports the graph data model (see Table 12). However, the scores for plasticity (0.150), query complexity (0.150), cross-aggregate consistency (0.150), and query omniscience (0.098) also combined to support graph as the chosen outcome (see Table 13). Since the relational data model is not well supported by a high plasticity metric, this was a major factor in ranking it below graph in this simulation.

Table 13. Global Priority (SAW) Outputs, Classicmodels Graph Simulation

Graph        0.263
Relational   0.249
Columnar     0.195
Document     0.187
Key-Value    0.107

Classicmodels Simulation #5: Document.

In order to validate a use case that would highlight the document data model as the recommended solution, this fifth simulation in the pilot study contained a large number of queries that altered existing tables by adding many new columns (resulting in a large number of sparsely populated columns in the database). This was designed to mimic the type of schemaless behavior that would be expected when employing the document data model, typified by high values for the plasticity and structural malleability criteria.

Table 14. Importance Priority Vector, Classicmodels Document Simulation

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.140  0.038  0.000  0.035  0.070  0.000  0.717  0.000  0.000

Of the five criteria that were identified, plasticity was the overwhelmingly highest value (0.717), followed by a value of 0.140 for structural malleability (see Table 14). These two factors combined to heavily support the document data model (see Table 15).

Table 15. Global Priority (SAW) Outputs, Classicmodels Document Simulation

Document     0.294
Graph        0.272
Columnar     0.239
Key-Value    0.110
Relational   0.086

Classicmodels Simulation #6: Unstructured data.

As a capstone simulation in the initial pilot study, an agent was programmed to insert and query large amounts of unstructured data (in the form of base64-encoded jpeg images) in order to demonstrate the impact of such data on a relational database. A total of 80 images were inserted (at approximately 3-4 MB each), and then 25 of them were randomly read back out in order to perform this test.

Table 16. Importance Priority Vector, Classicmodels Unstructured Data Simulation

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.011  0.057  0.875  0.046  0.011  0.000  0.000  0.000  0.000

Of the five criteria that were identified, query omniscience was the overwhelmingly highest value (0.875), which is unsurprising given the simple setup of this simulation (a large number of homogeneous “INSERT” statements, followed by a large number of homogeneous “SELECT” statements). However, the extremely low score for result timeliness (0.011) is noteworthy because it demonstrates that the relational database did not process queries with large amounts of unstructured data quickly (see Table 16). Despite the fact that the key-value data model is generally helped by higher result timeliness scores, this data model is also supported by a high query omniscience score, which resulted in it narrowly beating out the columnar data model (which is also strongly supported by query omniscience) as the selection for the most suitable data model in this scenario (see Table 17).

Table 17. Global Priority (SAW) Outputs, Classicmodels Unstructured Data Simulation

Key-Value    0.376
Columnar     0.374
Document     0.101
Graph        0.078
Relational   0.027

3.8 Classicmodels Follow-up Pilot Study

Given the common association between NoSQL data models and unstructured data, another area of interest was how incorporating unstructured data into scenarios similar to the initial pilot study would affect the results. In particular, it was theorized that increasing the amount of data the DBMS had to process would increase query execution times, thus driving down the result timeliness metric. As a result, an additional five simulations were constructed and executed. Each of these simulations was run twice, once using a modest amount of unstructured data (approximately 30 kilobytes) and then again using a larger amount of unstructured data (approximately 500 kilobytes). One primary interest of this follow-up study was to determine the impact of using such variable amounts of unstructured data on the result timeliness criterion. The code used to run these follow-up simulations can be found in Appendix B.

Classicmodels Follow-up Simulation #1: Relational with Unstructured Data.

This simulation’s setup largely followed the setup from the relational simulation in the initial pilot study; however, an additional attribute was added to the table to hold the unstructured data. This field was created as a “LONGBLOB”, a data type designed for storing binary strings (such as a picture file). Table 18 shows the impact of changing the average file size between the two runs. The dramatic decrease in value for result timeliness (from 0.415 to 0.300) allowed for increases in the other criteria.
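The following sketch shows how unstructured data could be inserted in this manner, reusing the hypothetical “persons” table from the earlier example; the column name and file path are illustrative assumptions rather than the exact code used in the follow-up study.

```python
import base64
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(user="ecat", password="ecat",
                               host="127.0.0.1", database="classicmodels")
cur = conn.cursor()

# Add a LONGBLOB attribute to hold the unstructured data.
cur.execute("ALTER TABLE persons ADD COLUMN photo LONGBLOB")

# Insert a base64-encoded image alongside the structured attributes.
with open("sample.jpg", "rb") as f:
    encoded = base64.b64encode(f.read())
cur.execute("INSERT INTO persons (id, name, photo) VALUES (%s, %s, %s)",
            (2, "Example", encoded))

conn.commit()
cur.close()
conn.close()
```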

Table 18. Importance Priority Vector, Classicmodels Follow-Up Simulation, Relational with Unstructured Data

         SM     T      QO     QC     RT     CAC    Pla    DTE    M
∼30kb    0.003  0.210  0.058  0.208  0.415  0.000  0.000  0.002  0.104
∼500kb   0.004  0.249  0.071  0.248  0.300  0.000  0.000  0.002  0.125

The corresponding effect on the global priorities for this shift can then be observed in Table 19. The increased amount of data resulted in a shift in the overall highest ranked data models. When only a small amount of unstructured data was handled by ECAT, the key-value (0.249) data model was selected as the most suitable, followed closely by relational (0.245). However, when larger amounts of data were included, key-value (0.202) fell to third, after relational (0.264) and graph (0.217). This finding is somewhat counter-intuitive, since one of the strengths of the key-value data model is its emphasis on quickly returning results containing arbitrary data; it highlights an underlying assumption (that results are returned quickly) which must be considered in the particular context of a system's usage.

Table 19. Global Priority (SAW) Outputs, Relational with Unstructured Data

             ∼30kb   ∼500kb
Relational   0.245   0.264
Graph        0.199   0.217
Key-Value    0.249   0.202
Document     0.200   0.200
Columnar     0.107   0.118

Classicmodels Follow-up Simulation #2: Key-Value with Unstructured Data.

In this second set of follow-up simulations, a setup similar to the one used in the initial study was used; however, instead of merely querying data from the database, this time unstructured data was also inserted and then read back out. Because of the simplistic nature of this setup (effectively using only two types of queries, insert and select), again only three primary criteria were identified. As seen in Table 20, of the three observed criteria, query omniscience naturally scored very high, dominating the values observed for the result timeliness and structural malleability scores. Owing to the fact that so few criteria were identified, the impact of using larger amounts of unstructured data was particularly pronounced in this simulation, where the result timeliness metric dropped from 0.396 in the first run to only 0.005 during the second.

The corresponding impact of the changes in the priority vector on the SAW outputs is faint but can be observed in Table 21, where the overall priorities for the top three data models do not change in their standings and the bottom two change only very modestly.

Table 20. Importance Priority Vector, Classicmodels Follow-Up Simulation, Key-Value with Unstructured Data

         SM     T      QO     QC     RT     CAC    Pla    DTE    M
∼30kb    0.002  0.000  0.602  0.000  0.396  0.000  0.000  0.000  0.000
∼500kb   0.000  0.000  0.995  0.000  0.005  0.000  0.000  0.000  0.000

Table 21. Global Priority (SAW) Outputs, Key-Value with Unstructured Data

             ∼30kb   ∼500kb
Key-Value    0.447   0.415
Columnar     0.272   0.413
Document     0.128   0.084
Graph        0.072   0.050
Relational   0.082   0.039

Classicmodels Follow-up Simulation #3: Columnar with Unstructured Data.

In this third simulation of the follow-up study, half of the insert queries from the initial simulation were adjusted in order to incorporate the unstructured data elements. As observed in the previous studies, the impact on the results can be seen in Table 22.

Table 22. Importance Priority Vector, Classicmodels Follow-Up Simulation, Columnar with Unstructured Data

         SM     T      QO     QC     RT     CAC    Pla    DTE    M
∼30kb    0.006  0.000  0.457  0.134  0.067  0.000  0.269  0.000  0.067
∼500kb   0.006  0.000  0.489  0.144  0.002  0.000  0.288  0.000  0.072

The notable decrease in the result timeliness metric affected the overall weighting for the global priorities, but not significantly enough to adjust the rankings of the data models (see Table 23). It did, however, solidify the selection of columnar as the most suitable data model, with its score increasing from 0.296 to 0.313, while there was a corresponding decrease in the key-value score from 0.241 to 0.223.

Table 23. Global Priority (SAW) Outputs, Columnar with Unstructured Data

             ∼30kb   ∼500kb
Columnar     0.296   0.313
Key-Value    0.241   0.223
Graph        0.172   0.177
Document     0.169   0.167
Relational   0.122   0.121

Classicmodels Follow-up Simulation #4: Graph with Unstructured Data.

In this fourth simulation, the inclusion of unstructured data again demonstrated the interaction between a decreasing result timeliness score (as seen in Table 24) and the other observed criteria.

Table 24. Importance Priority Vector, Classicmodels Follow-Up Simulation, Graph with Unstructured Data

         SM     T      QO     QC     RT     CAC    Pla    DTE    M
∼30kb    0.004  0.066  0.023  0.194  0.325  0.065  0.259  0.000  0.065
∼500kb   0.005  0.077  0.027  0.226  0.214  0.075  0.301  0.000  0.075

Similar to the results from the second follow-up study (the key-value study), the impact on the overall ordering of the global priorities seen in Table 25 can only be seen in the bottom two rows, where the distant 4th and 5th ranked data models (key-value and columnar) shift places.

Table 25. Global Priority (SAW) Outputs, Graph with Unstructured Data

             ∼30kb   ∼500kb
Graph        0.239   0.261
Relational   0.217   0.228
Document     0.209   0.211
Columnar     0.141   0.155
Key-Value    0.195   0.146

Classicmodels Follow-up Simulation #5: Document with Unstructured Data.

In the final follow-up simulation, the strategy was to emulate the type of behavior that might be seen if a relational database were used in the way a document data model would be, so this test involved the addition of a large number of tables, as well as adding many new elements within existing tables. This resulted in many sparsely populated tables, similar to how documents might have many different elements from one aggregate to another. The result of this simulation was a high number of queries classified under the plasticity criterion and the structural malleability criterion (shown in Table 26).
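A sketch of the kind of query stream used to emulate this schemaless behavior is shown below; the table and column names are illustrative, and the real simulation issued far more of these statements.

```python
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(user="ecat", password="ecat",
                               host="127.0.0.1", database="classicmodels")
cur = conn.cursor()

# Emulate document-style (schemaless) usage on a relational back end:
# create several new tables and add ad hoc attributes to them, leaving
# many sparsely populated columns behind.
for t in range(5):
    cur.execute(f"CREATE TABLE doc_like_{t} (id INT PRIMARY KEY)")
    for a in range(10):
        cur.execute(f"ALTER TABLE doc_like_{t} ADD COLUMN attr_{a} TEXT")
    cur.execute(f"INSERT INTO doc_like_{t} (id, attr_{t}) VALUES (%s, %s)",
                (1, "sparse value"))

conn.commit()
cur.close()
conn.close()
```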

Table 26. Importance Priority Vector, Classicmodels Follow-Up Simulation, Document with Unstructured Data

         SM     T      QO     QC     RT     CAC    Pla    DTE    M
∼30kb    0.146  0.034  0.000  0.032  0.034  0.000  0.754  0.000  0.000
∼500kb   0.153  0.040  0.000  0.040  0.023  0.000  0.744  0.000  0.000

A high plasticity score supports the document, column, and graph data models, while a high structural malleability score primarily supports the key-value and document data models. As both of these criteria support the document data model, it was given the highest ranking. Table 27 shows the slight impact on the global priority outputs as a result of adjusting the image size, which led to a decrease in result timeliness between the two runs.

Table 27. Global Priority (SAW) Outputs, Document with Unstructured Data

             ∼30kb   ∼500kb
Document     0.299   0.299
Graph        0.278   0.279
Columnar     0.246   0.246
Key-Value    0.095   0.092
Relational   0.081   0.084

3.9 Conclusion

The results of the pilot studies demonstrated that, for each of the sets of queries built to showcase typical usage patterns for a given data model, ECAT ultimately selected that data model as the highest-ranked (i.e., most suitable) option. This study provided a level of validation for the viability of the ECAT software to accurately characterize observed database usage, and for the methodology as a whole. Thus, the next step in this research involved applying this methodology to the three open-source use cases introduced in Section 3.4. That is, the environments for each use case were constructed and software agents employed to generate the usage data to be observed. Following the collection of this data, the ECAT tool counted the number of observations for each of the nine criteria. These results were then fed into the SAW model, and the rank-ordered results for each of the observed use cases were determined. The results of this experimental setup and the simulations are discussed next in Chapter 4.

IV. Findings

4.1 Introduction

This chapter discusses the findings of this research. First, the results of the simulation studies described in Chapter 3 are presented. Then, a discussion of the findings from the simulation results is offered. Finally, the research questions and hypotheses that were previously introduced are addressed.

4.2 Simulation Results

For each of the three use cases (i.e., phpBB, Elgg, and PowerDNS), the parameters for each simulation described in Section 3.4 were adjusted in order to understand the effects of changing the system's behavior on the simulation outputs. The general configurations, descriptions of the observed queries, and the observed results are discussed in this section. In each of these simulations, as the queries were being collected the data was periodically analyzed by ECAT until the output results began to stabilize. In the interest of time, simulations were manually terminated once the SAW output results converged and remained consistent to three decimal places across subsequent measurements.

phpBB.

For the phpBB use case, four discrete simulations were run. During each simulation, the agents running on each of the client systems interacted with the phpBB web application according to the general profile described in Section 3.4, performing the types of actions that would be typical of a system user (e.g., navigating the website, creating and reading content, etc.). Each simulation was designed to adjust the general behavior of the agents from one simulation to another to determine the impact that the change in behavior would have on the ECAT outputs; however, the general core of the code used to execute these agents can be found in Appendix C.

phpBB Simulation #1.

In this first simulation, each agent was programmed to pseudo-randomly choose between 32 and 42 actions to accomplish (average = 36.96, σ = 3.76), resulting in the following average number of actions performed each session: reading threads (average = 27.30, σ = 3.02), browsing sub-forums (average = 6.12, σ = 0.88), creating new threads (average = 0.78, σ = 0.42), and creating new comments (average = 0.77, σ = 0.42). Finally, logging in and out accounted for two actions in each simulation run.

A total of 2,369 sessions were launched, generating a total of 2,377,164 observed queries. Of those, 1,751,014 queries were retained for analysis and 626,150 queries were omitted by the ECAT tool (overhead commands such as "administrator command", "version()", "set session sql mode", etc. do not provide useful information to the analysis and are therefore discarded). Six of the nine criteria were observed during this simulation run, and the normalized counts of each criterion are given in Table 28. Due to the extremely small number of occurrences of queries that triggered the cross-aggregate consistency criterion (N = 6), this value was not significant enough to be reported at three decimal places.

Table 28. Importance Priority Vector, phpBB Simulation 1

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.000  0.218  0.310  0.165  0.295  0.000  0.000  0.000  0.013

Of the remaining five criteria that were identified, query omniscience was the highest value (0.310), followed by result timeliness (0.295). These two factors both heavily support the key-value data model (see Figure 8) and, as a result, the SAW calculation selected this data model as the most suitable for the given use case (see Table 29).

Table 29. Global Priority (SAW) Outputs, phpBB Simulation 1

Key-Value    0.289
Relational   0.196
Columnar     0.182
Document     0.170
Graph        0.165

The variability and 95% confidence intervals for the highest ranked data model (i.e., key-value) are shown in Figure 15. The x-axes in both plots indicate repeating iterations within the simulation (see Figure 10), so each dot represents one epoch of agent executions followed by analysis of the aggregate of observed data (i.e., all observed SQL statements collected to date). The variance plot shows the variability in the results over a sliding window of the four most recent epochs. As seen here, the variance drops off quickly after the first few iterations and falls to 2.95 × 10−9, which is well below the established threshold of 1 × 10−5. In the 95% confidence interval plot, the blue dots represent the normalized score for this data model as reported by the SAW decision model at each iteration of the simulation, and the shaded region represents the interval within which the final result is expected to fall, based on the aggregated measurements for each epoch. The final results for the 95% confidence interval fell within the range of [.286, .291], an interval whose width is well below the desired threshold of .01. In this and the following sections, the data for the remaining four data models are not shown for the sake of space; however, they all follow similar trends to the key-value results and fall below the established thresholds.

82 ·10−5 6 0.32

4 0.3

0.28 2

0.26 0 0 5 10 15 0 5 10 15 Variance 95% Conf. Int.

Figure 15. Variance and 95% Confidence Intervals, phpBB Simulation #1
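The convergence check described above can be sketched as follows; the thresholds match those stated in the text, while the function itself is an illustrative reconstruction rather than the exact ECAT implementation (in particular, the confidence interval here uses a simple normal approximation).

```python
import statistics

VARIANCE_THRESHOLD = 1e-5   # variance over the four most recent epochs
CI_WIDTH_THRESHOLD = 0.01   # width of the 95% confidence interval

def has_converged(epoch_scores):
    """Decide whether the SAW score for one data model has stabilized.

    `epoch_scores` holds the normalized SAW score for that data model,
    recomputed after each epoch over all data collected so far.
    """
    if len(epoch_scores) < 4:
        return False
    variance = statistics.variance(epoch_scores[-4:])
    stdev = statistics.stdev(epoch_scores)
    # Normal-approximation 95% confidence interval for the mean score.
    half_width = 1.96 * stdev / len(epoch_scores) ** 0.5
    return variance < VARIANCE_THRESHOLD and 2 * half_width < CI_WIDTH_THRESHOLD
```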

phpBB Simulation #2.

For the second phpBB simulation, the agents again chose to perform between 32 and 42 actions (average = 36.84, σ = 3.78); however, the profiles for the agents running on each of the client systems were modified to pseudo-randomly accomplish the following number of actions on average each run: reading threads (average = 16.87, σ = 3.44), browsing sub-forums (average = 4.36, σ = 0.78), creating new threads (average = 6.81, σ = 2.26), and creating new comments (average = 6.80, σ = 2.11). Finally, logging in and out accounted for two actions in each simulation run.

A total of 569 agents were launched, generating a total of 1,137,392 observed queries. Of those, 825,325 queries were retained for analysis and 312,067 queries were omitted by the ECAT tool. Six of the nine criteria were observed during this simulation run, and the normalized counts of each criterion are given in Table 30. Due to the extremely small number of occurrences of queries that triggered the cross-aggregate consistency criterion (N = 3), this value again was not large enough to be reported at three decimal places.

Table 30. Importance Priority Vector, phpBB Simulation 2

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.000  0.203  0.296  0.147  0.315  0.000  0.000  0.000  0.038

As seen in Table 30, the change in the ratios of the observed queries relative to the first phpBB simulation was negligible. As a result, the SAW calculation again selected key-value as the most suitable data model for this use case (see Table 31).

Table 31. Global Priority (SAW) Outputs, phpBB Simulation 2

Key-Value    0.293
Relational   0.192
Columnar     0.181
Document     0.172
Graph        0.162

The variability and 95% confidence intervals for the highest ranked data model (i.e., key-value) are shown in Figure 16. As before, the variance drops off quickly after the first few iterations and falls to 1.18 × 10−7, well below the established threshold of 1 × 10−5. Regarding the 95% confidence interval, the final results fell within the range of [.292, .295], an interval whose width remains below the established threshold of .01.

84 ·10−5

3 0.3

2 0.28

1

0.26 0 0 5 10 15 0 5 10 15 Variance 95% Conf. Int.

Figure 16. Variance and 95% Confidence Intervals, phpBB Simulation #2

phpBB Simulation #3.

The third phpBB simulation was designed to highlight a shift in agent behavior mid-way through the data collection. That is, the agents behaved according to one set of parameters in the first half of the simulation, and then according to another set of parameters in the second half. This was primarily designed to test the effect of this change on the query omniscience criterion. During this simulation, the agents again chose to perform between 32 and 42 actions (average = 36.95, σ = 3.79) per session. Across the entire simulation (i.e., across both phases), the agents running on each of the client systems pseudo-randomly performed the following actions: reading threads (average = 19.96, σ = 10.33), browsing sub-forums (average = 5.52, σ = 1.19), creating new threads (average = 4.74, σ = 5.60), and creating new comments (average = 4.75, σ = 5.60). Finally, logging in and out accounted for two actions in each simulation run.

A total of 1,247 agents were launched, generating a total of 2,090,511 observed queries. Of those, 1,584,585 queries were retained for analysis and 505,926 queries were omitted by the ECAT tool. Six of the nine criteria were observed during this simulation run, and the normalized counts of each criterion are given in Table 32. Due to the extremely small number of occurrences of queries that triggered the cross-aggregate consistency criterion (N = 5), this value again was not large enough to be reported at three decimal places.

Table 32. Importance Priority Vector, phpBB Simulation 3

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.000  0.242  0.191  0.177  0.359  0.000  0.000  0.000  0.031

As Table 32 shows, the priority score for query omniscience (0.191) did drop drastically from the previous two simulation runs (0.310 and 0.296, respectively), which was an expected observation due to the change in agent behaviors midway through the simulation. However, this change alone was not significant enough to drive a change in the overall model output rankings, as the result timeliness value still dominated over the other values. As a result, the SAW calculation again selected key-value as the most suitable data model for this use case (see Table 33). The change did, however, move the ranking of the columnar data model (which had been ranked third in both of the previous simulations) to last, as the ranking of this data model is adversely affected by low query omniscience scores.

Table 33. Global Priority (SAW) Outputs, phpBB Simulation 3

Key-Value    0.274
Relational   0.217
Document     0.185
Graph        0.180
Columnar     0.144

The variability and 95% confidence intervals for the highest ranked data model (i.e., key-value) are shown in Figure 17. As before, the variance drops off quickly after the first few iterations; however, it spikes in the middle. This is expected, since the agents' behavior during the simulation also dramatically shifts at the mid-point. The variance again drops off quickly after the behavior shift occurs as the agents continue to follow their new pattern, resulting in a final variance of 4.39 × 10−7. Regarding the 95% confidence interval, the final results fell within the range of [.271, .277], an interval whose width is also below the established threshold of .01.


Figure 17. Variance and 95% Confidence Intervals, phpBB Simulation #3

phpBB Simulation #4.

A fourth and final phpBB simulation was designed to test the effect of sending large amounts of semi-structured text through the phpBB web application to the database and to observe the impact of this change. Although the web application was designed to support field entries of up to 16 MB, in practice the web application would frequently time out for entries of only a few (i.e., 1-4) megabytes. Thus, the results of this run were not included in this research due to low confidence in the reliability of the data.

Elgg.

For the Elgg social networking site use case, four discrete simulations were run. In each simulation, the behavior of the agents was modified to reflect different styles of user behavior (e.g., a focus on creating or reading content, or a focus on creating or viewing friend connections). While the parameters were adjusted from one simulation to another, the general construct of the code used to execute these agents can be found in Appendix D.

Elgg Simulation #1.

In this first simulation, the agents running on each of the client systems interacted with the Elgg web application according to the general profile described in Section 3.4. That is, at run-time each agent would create a new user and then pseudo-randomly choose between 40 and 60 additional actions to accomplish (average = 47.76, σ = 7.57). It also randomly assigned percentages to the available actions, resulting in the following average number of actions per session: make friends (average = 2.40, σ = 0.59), read profiles (average = 4.85, σ = 1.29), read posts (average = 14.52, σ = 3.21), view friends of a user (average = 4.77, σ = 1.38), comment on an existing post (average = 14.47, σ = 3.46), and create a new post (average = 4.83, σ = 1.41).

A total of 519 agents were launched, generating a total of 3,357,677 observed queries (the Elgg web application is much more verbose in terms of SQL queries than the phpBB web application, so fewer agent actions were required to generate comparable amounts of data to the phpBB simulation runs). During this run, 3,062,984 queries were retained for analysis and 294,693 queries were omitted by the ECAT tool. Five of the nine criteria were observed during this simulation run, and the normalized counts of each criterion are given in Table 34. Although seen too infrequently in the phpBB simulations to be reported in the numbers, the cross-aggregate consistency criterion appeared to have zero instances in the Elgg use case, which is why it is not observed here.

Table 34. Importance Priority Vector, Elgg Simulation 1

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.000  0.158  0.323  0.205  0.291  0.000  0.000  0.000  0.024

Of the five criteria that were identified, query omniscience was the highest value (0.323), followed by result timeliness (0.291). These had also been the dominant factors in the first two phpBB simulations. Again, as they both heavily support the key-value data model (see Figure 8), the SAW calculation selected this data model as the most suitable for the given use case (see Table 35).

Table 35. Global Priority (SAW) Outputs, Elgg Simulation 1

Key-Value    0.292
Relational   0.200
Columnar     0.184
Graph        0.163
Document     0.161

The variability and 95% confidence intervals for the highest ranked data model (i.e., key-value) are shown in Figure 18. The variance remains low throughout the entire simulation with a final measurement of 1.25 × 10−10. Regarding the 95% confidence interval, the final results fell within the range of [.292, .293], an interval whose width is also below the established threshold of .01.


Figure 18. Variance and 95% Confidence Intervals, Elgg Simulation #1

Elgg Simulation #2.

In this second simulation, the agents running on each of the client systems interacted with the Elgg web application according to the general profile described in Section 3.4. That is, at run-time each agent would create a new user and then pseudo-randomly choose between 40 and 60 additional actions to accomplish (average = 50.47, σ = 7.26). It also randomly assigned percentages to the available actions, resulting in the following average number of actions per session: make friends (average = 9.85, σ = 2.16), read profiles (average = 2.44, σ = 0.55), read posts (average = 2.44, σ = 0.58), view friends of a user (average = 29.47, σ = 5.33), comment on an existing post (average = 2.38, σ = 0.61), and create a new post (average = 2.39, σ = 0.65). A total of 449 agents were launched, generating a total of 1,001,410 observed queries. During this run, 905,385 queries were retained for analysis and 96,025 queries were omitted by the ECAT tool. Five of the nine criteria were observed during this simulation run, and the normalized counts of each criterion are given in Table 36.

Table 36. Importance Priority Vector, Elgg Simulation 2

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.000  0.156  0.319  0.208  0.304  0.000  0.000  0.000  0.013

Of the five criteria that were identified, query omniscience was the highest value (0.319), followed closely by result timeliness (0.304). Again, as they both heavily support the key-value data model (see Figure 8), the SAW calculation selected this data model as the most suitable for the given use case (see Table 37).

Table 37. Global Priority (SAW) Outputs, Elgg Simulation 2

Key-Value    0.297
Relational   0.200
Columnar     0.180
Graph        0.163
Document     0.161

The variability and 95% confidence intervals for the highest ranked data model (i.e., key-value) are shown in Figure 19. The variance remains low throughout the entire simulation with a final measurement of 4.25 × 10−8. Regarding the 95% confidence interval, the final results fell within the range of [.296, .298], an interval whose width is also below the established threshold of .01.


Figure 19. Variance and 95% Confidence Intervals, Elgg Simulation #2

Elgg Simulation #3.

In this third simulation, the agents running on each of the client systems interacted with the Elgg web application according to the general profile described in Section 3.4. That is, at run-time each agent would create a new user and then pseudo-randomly choose between 40 and 60 additional actions to accomplish (average = 50.47, σ = 7.26). It also randomly assigned percentages to the available actions, resulting in the following average number of actions per session: make friends (average = 9.85, σ = 2.16), read profiles (average = 2.44, σ = 0.55), read posts (average = 2.44, σ = 0.58), view friends of a user (average = 29.47, σ = 5.33), comment on an existing post (average = 2.38, σ = 0.61), and create a new post (average = 2.39, σ = 0.65). A total of 451 agents were launched, generating a total of 1,605,234 observed queries. During this run, 1,433,284 queries were retained for analysis and 171,950 queries were omitted by the ECAT tool. Five of the nine criteria were observed during this simulation run, and the normalized counts of each criterion are given in Table 38.

Table 38. Importance Priority Vector, Elgg Simulation 3

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.000  0.148  0.320  0.201  0.300  0.000  0.000  0.000  0.023

Of the five criteria that were identified, query omniscience was the highest value (0.320), followed again by result timeliness (0.300). As before, they both heavily support the key-value data model (see Figure 8); thus, the SAW calculation selected this data model as the most suitable for the given use case (see Table 39).

Table 39. Global Priority (SAW) Outputs, Elgg Simulation 3

Key-Value    0.299
Relational   0.198
Columnar     0.182
Graph        0.161
Document     0.161

The variability and 95% confidence intervals for the highest ranked data model (i.e., key-value) are shown in Figure 20. The variance falls quickly after the first few iterations and then remains low throughout the remainder of the simulation with a final measurement of 2.67 × 10−7. Regarding the 95% confidence interval, the final results fell within the range of [.297, .301], an interval whose width is also below the established threshold of .01.


Figure 20. Variance and 95% Confidence Intervals, Elgg Simulation #3

Elgg Simulation #4.

In this fourth simulation for the Elgg use case, the agents running on each of the client systems were programmed to modify their behavior over the course of the simulation. In the initial phase the agents performed primarily write-based actions such as making friends, commenting on posts, and creating new posts. In the second phase, these actions ceased and the agents instead switched to primarily read-based operations such as reading profiles, reading posts, and searching for existing friends. As in previous simulations, each agent would create a new user and then pseudo-randomly choose between 40 and 60 additional actions to accomplish for each session (average = 49.98, σ = 8.26). It also randomly assigned percentages to the available actions, resulting in the following typical number of actions performed each session during the first phase: make friends (average = 24.36, σ = 3.92), comment on an existing post (average = 11.82, σ = 2.97), and create a new post (average = 11.68, σ = 3.25). During the second phase, the average number of actions performed each session changed to the following: read profiles (average = 14.91, σ = 3.15), read posts (average = 14.74, σ = 3.15), and view friends of a user (average = 19.44, σ = 3.87).

A total of 200 agents were launched, generating a total of 1,001,410 observed queries. During this run, 905,385 queries were retained for analysis and 96,025 queries were omitted by the ECAT tool. Five of the nine criteria were observed during this simulation run, and the normalized counts of each criterion are given in Table 40.

Table 40. Importance Priority Vector, Elgg Simulation 4

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.000  0.174  0.271  0.220  0.320  0.000  0.000  0.000  0.015

Of the five criteria that were identified, result timeliness was the highest value (0.320), followed by query omniscience (0.271). The query omniscience score observed here was lower than in the previous Elgg simulations, which was an anticipated outcome for this simulation based on the dramatic shift in user behavior between the first part of the simulation and the second. As before, these criteria both heavily support the key-value data model (see Figure 8), so the SAW calculation selected this data model as the most suitable for the given use case (see Table 41).

Table 41. Global Priority (SAW) Outputs, Elgg Simulation 4

Key-Value    0.286
Relational   0.212
Graph        0.171
Document     0.167
Columnar     0.164

The variability and 95% confidence intervals for the highest ranked data model (i.e., key-value) are shown in Figure 21. Again, due to the change in behavior, there is a visible spike in the variance half-way through the simulation before it again tapers off to a final measurement of 3.36 × 10−7. Regarding the 95% confidence interval, the final results fell within the range of [.284, .288], an interval whose width is also below the established threshold of .01.


Figure 21. Variance and 95% Confidence Intervals, Elgg Simulation #4

PowerDNS.

The third use case that was tested by the ECAT software was a PowerDNS authoritative DNS server. Initially, this study intended to evaluate the effects of different types of read and update loads on this system. However, due to a bug in the Percona Toolkit (Percona, LLC., 2020a), simulations that performed update operations on the database consistently crashed this software. Therefore, only a single simulation running the tool against a read-only load was tested. The code used to run this simulation can be found in Appendix E.

A single agent was repeatedly launched, generating a total of 221,526 observed queries. During the simulation, 55,012 queries were retained for analysis and 166,514 queries were omitted by the ECAT tool. Owing to the low diversity in the types of observed queries (only five distinct types of queries were observed), this simulation triggered only four of the nine criteria. The normalized counts of each criterion are given in Table 42.

Table 42. Importance Priority Vector, PowerDNS Simulation 1

SM     T      QO     QC     RT     CAC    Pla    DTE    M
0.000  0.118  0.411  0.059  0.412  0.000  0.000  0.000  0.000

Of the four criteria that were identified, result timeliness was the highest value (0.412), followed closely by query omniscience (0.411). These two factors had also been the dominant factors in the first two phpBB simulations, but the relatively lower counts on all other metrics only strengthened the selection of the key-value data model (see Figure 8) even further. The key-value rating of 0.381 was the highest seen in any of the simulations (see Table 43). Given that the DNS use case was selected because of its suitability for the key-value data model (a simple system based on a non-complex key lookup with an emphasis on speed), this outcome was not surprising. What was noteworthy was that the high query omniscience score for this use case resulted in columnar being selected as the second most suitable model (where it had ranked no higher than third in any of the previous simulation runs).

Table 43. Global Priority (SAW) Outputs, PowerDNS Simulation 1

Key-Value    0.381
Columnar     0.210
Document     0.156
Relational   0.137
Graph        0.116

The variability and 95% confidence intervals for the highest ranked data model (i.e., key-value) are shown in Figure 22. Given the simplicity of this simulation and the lack of variation in agent behavior, the minuscule variance of 8.25 × 10−10 is expected. Regarding the 95% confidence interval, the final results fell within the range of [.381, .381], an interval of effectively zero width, well below the established threshold of .01.


Figure 22. Variance and 95% Confidence Intervals, PowerDNS Simulation #1

4.3 Discussion

Based on the results from the previous sections and chapters, the following discussion highlights several notable findings from this research.

Emphasis on Key-Value.

While the overall rankings did shift somewhat from one run to another, key-value remained the top-ranked data model in each simulation, as it is strongly supported by the result timeliness criterion, which was measured at significant levels in every run. Since each of the use cases leveraged open-source software employing predefined queries that dealt with relatively small amounts of data, the high measurement of this criterion is reasonable. One possible point of contention is what constitutes a “fast” query. In this research, one millisecond was used as the threshold because many (but not all) of the observed queries operated under or near that threshold. However, as demonstrated in the follow-up pilot study in Chapter 3, once the average data sizes began to increase to ∼500 kb, this metric began to drop significantly, so there is a significant trade-off between the average query size and the speed of the results.

Sensitivity of Query Omniscience Measurements.

The query omniscience criterion seeks to measure the “degree to which the complete set of possible queries is known by the user before (the) system is implemented” (Engle, 2018). Since a stated goal of this research was to attempt to infer the Engle criteria without the benefit of interviewing system users, a proxy measurement was proposed in which the types of queries observed by the system would be compared over time. If the types of queries seen at the beginning of the analysis were different from those seen at the end, an argument can be made that the usage of the system shifted during the course of its execution and that the complete set of possible queries may not have been known at the outset.

However, a pitfall of this approach is that it assumes data is being collected over a long period of time (at least long enough to capture any shifts in behavior that may have occurred). If the methodology were applied only to a brief snapshot in time (e.g., an hour or less), then it is foreseeable that specific events could skew this measurement. For example, if during the hour of observation a large amount of data is written into the database in the beginning and then read back out or updated heavily in the latter part of the hour, then ECAT would likely detect this as a significant shift in the type of usage and assign it a low query omniscience score. On the other hand, if this was actually a common occurrence (e.g., for some type of daily report), then the results of this repeated event throughout the course of a week would be smoothed out and the query omniscience score would be reported at a much higher level.
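An illustrative sketch of such a proxy measurement is shown below; it reduces each query to a coarse “type” by stripping literals and then compares the types seen early in the capture with those seen later. This is a simplified stand-in for the comparison ECAT performs, not its actual implementation.

```python
from collections import Counter
import re

def query_shape(sql):
    """Reduce a query to a coarse 'type' by stripping literals (illustrative)."""
    sql = re.sub(r"'[^']*'", "?", sql)      # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)      # numeric literals
    return " ".join(sql.upper().split())

def omniscience_proxy(queries):
    """Compare query-type distributions in the first and second halves.

    Returns the fraction of observed types that appear in both halves;
    a low value suggests the workload shifted during the observation window.
    """
    mid = len(queries) // 2
    first = Counter(query_shape(q) for q in queries[:mid])
    second = Counter(query_shape(q) for q in queries[mid:])
    all_types = set(first) | set(second)
    shared = set(first) & set(second)
    return len(shared) / len(all_types) if all_types else 1.0
```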

System Overhead.

An unanticipated aspect of this methodology was the amount of system overhead, in terms of storage and processing requirements, needed to perform the analysis. On the data storage side, packet captures were commonly observed to reach three to four times the size of the original data being transmitted, due to the verbosity of these applications. For example, when 32 megabytes of data was transmitted to and from the database, the resulting packet capture was expected to be close to 64 megabytes in size, plus a reasonable amount for overhead. In reality, the packet capture during that simulation was observed to be over 250 megabytes. That said, to mitigate the impact of excessive storage requirements, it would be possible to alter the ECAT software to utilize a log rotation-style packet capture strategy and then continuously process the data as it arrives, extracting and retaining only the salient features required for the analysis. The large packet captures could then be discarded, reducing the overall storage requirements. Similarly, the processing necessary to compute the Engle criteria required several passes over the data (in order to extract the queries, build and track changes to the schema, and assess each of the nine measured criteria). While some aspects of the ECAT software could have been parallelized to improve performance, most of the I/O-heavy operations (including all results from the Percona Toolkit) were single-threaded, which bottlenecked the overall throughput. In practice, the simulation was able to generate data faster than ECAT could process it. As a result, depending on the volume and velocity of observed traffic, the system performing the ECAT analysis may require significant computational resources to keep up. For reference, the second Elgg simulation resulted in a packet capture that was 5.5 gigabytes in size. Ten agents running at full speed were able to generate this file in only 2 hours and 11 minutes, while the complete processing of this file required 3 hours and 45 minutes on a virtual machine equipped with 8 virtual CPUs (vCPUs) from an Intel Xeon CPU E5-2687W V2 operating at 3.40 GHz (again, due to the single-threaded nature of the software, only a single vCPU was in use by ECAT for processing of the data at any given time).
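
As a rough sketch of the log rotation mitigation described above (an assumed restructuring of the capture step, not part of the delivered ECAT software), tcpdump's built-in rotation options could be used to close a capture file at a fixed interval, after which each completed file is digested and discarded so that only the extracted query timeline is retained. The process_capture() helper, the interface name, and the output file name are hypothetical; the pt-query-digest invocation mirrors the one used in Appendix A.

import glob
import os
import subprocess

def process_capture(pcap_path):
    # Hypothetical incremental step: keep only the fingerprinted query timeline from one
    # completed capture file, then discard the raw packets to bound storage requirements.
    timeline = subprocess.check_output(
        ["pt-query-digest", "--type", "tcpdump", pcap_path,
         "--timeline", "--no-report"]).decode("latin1")
    with open("query_timeline.log", "a") as out:
        out.write(timeline)
    os.remove(pcap_path)

# Rotate the capture every 900 seconds; filter on the default MySQL port so that only
# database traffic is written (interface name and port are assumptions for this sketch).
capture = subprocess.Popen(
    ["tcpdump", "-i", "eth0", "port", "3306", "-s", "0",
     "-G", "900", "-w", "ecat_%Y%m%d%H%M%S.pcap"])

# Elsewhere, completed files would be picked up periodically and processed; the newest
# file is skipped because tcpdump may still be writing to it.
for pcap in sorted(glob.glob("ecat_*.pcap"))[:-1]:
    process_capture(pcap)

Folding each digest into the running criteria counts as it arrives would also move ECAT closer to the continuous-monitoring mode discussed later in Section 5.4.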

Viability of Results.

Despite these shortcomings, the initial pilot studies discussed in Chapter 3 and the three use case simulations presented here in Chapter 4 have demonstrated that there are measurable characteristics of the Engle criteria which can be captured and assessed solely through observation of a user’s or application’s interactions with a relational database. Furthermore, these measurements can be used to produce a rank-ordered assessment of the suitability of the relational and NoSQL data models for the observed usage. This assessment provides the user not only with insights about how their system is being used, but may also ultimately point them towards considering an alternative underlying data model which may be more suitable for their system.

4.4 Conclusions of Research

The first research question asked, “Which observable criteria exist for assessing the usage patterns of a non-distributed relational database in an automated fashion?” This question was intended to identify which kinds of criteria were available to help inform the selection of a suitable underlying data model, and the research previously conducted by Engle served as an excellent foundation to build upon. That research (and its associated methodology for identifying suitable data models based on the intended or perceived usage of a database system) introduced a comprehensive list of 12 criteria that evaluated many important aspects of a system's usage. Through critical, subjective assessments by system subject matter experts, those 12 criteria would be compared and prioritized to help a decision-maker determine which underlying data model was the most suitable based on their individual preferences.

Although all 12 of the criteria provide useful inputs for evaluating a given solution, not all of them were found to be observable in practice (from the perspective of observing database usage in an automated fashion). The methodology presented in this research was developed to determine which of the existing criteria could be measured through an automated assessment of database usage and, additionally, to what extent. The development of the ECAT software, along with the pilot studies and simulations performed during this research, demonstrated that a subset of the original 12 criteria could be evaluated against observations of an existing relational database system. Due to limitations inherent in this approach, measurement of one criterion (pre-processing) was deemed not possible, while measurements for two other criteria (large and small aggregate transactions) were considered theoretically achievable, yet a means of identifying them remained elusive during this research. The results of the pilot studies and simulations ultimately produced a number of findings and insights about how the observation and assessment of those criteria could help a user understand which criteria were actually present in their systems.

The second research question built upon the first, asking: “Using an automated methodology, can non-distributed relational database usage patterns be mapped to more suitable data models using the observable criteria?” The outputs from both the pilot studies and the simulations demonstrated that this methodology is able to employ the observable criteria as inputs to a decision model which provides information to a user regarding the suitability of the NoSQL data models based solely on observations of interactions with an existing relational database. However, due to some of the limitations of this approach (e.g., not all criteria are able to be measured and thus given equal consideration), it cannot be definitively claimed that an alternative NoSQL solution would be the “most” suitable for the given usage based on the limited set of information available to such a system. Rather, it is more defensible to claim that this methodology provides the user with insights about how an existing database system is being used and, based on the observations of system usage, is able to provide a suggested mapping to suitable NoSQL data models, justified by the observable characteristics that were measured.

Research Hypotheses.

As stated in Section 3.2, this research explored three hypotheses, repeated here:

1. One or more elements of the Engle criteria can be mapped to observable characteristics of relational database systems.

2. It is possible to profile usage of a database based on observable characteristics of a relational database system.

3. A set of decision criteria based on the observational profile can be incorporated into a decision model that can characterize the suitability of a non-relational (NoSQL) data model for the observed usage characteristics.

Through the development of the ECAT software and its application to the pilot studies and experimental simulations, the first research hypothesis was shown to be true. Although only a subset of the original 12 Engle criteria could be evaluated, the ECAT software demonstrated that the remaining nine criteria could be assessed through observable characteristics of an existing relational database system. Furthermore, the pilot studies and experimental simulations also demonstrated that adjustments to the “behaviors” of agents using those systems produced measurable changes in the outputs of the ECAT software, allowing the usage of those systems to be profiled. Thus, the second research hypothesis was also supported. Finally, the profiled usage of a database system (based on a subset of the original 12 Engle criteria) was shown to be useful as an input into a Simple Additive Weighting MCDM model that is able to characterize the suitability of the NoSQL data models based on the observed usage. Thus, the final research hypothesis was also shown to be true.
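
To make this final step concrete, the following minimal sketch reproduces the Simple Additive Weighting calculation: each data model's score is the sum of its per-criterion performance priorities (the weights encoded in the ECAT source in Appendix A, drawn from Engle (2018), Table 30; only the key-value and relational vectors are shown here) multiplied by the normalized importance priorities derived from the observed usage. The observed priority vector below is a hypothetical example, not output from one of the simulations.

# Performance priorities for two of the five data models, as encoded in Appendix A
kv  = {"CAC": .048, "DTE": .057, "LAT": .194, "SAT": .416, "M": .027, "Pla": .033,
       "Pre": .180, "QC": .039, "QO": .415, "RT": .496, "SM": .347, "T": .034}
rel = {"CAC": .429, "DTE": .562, "LAT": .076, "SAT": .203, "M": .243, "Pla": .062,
       "Pre": .042, "QC": .461, "QO": .038, "RT": .149, "SM": .033, "T": .280}

# Hypothetical normalized importance priorities produced from observed usage; the
# unobservable criteria (Pre, LAT, SAT) are held at zero, as in the ECAT implementation
observed = {"CAC": 0.01, "DTE": 0.00, "LAT": 0.0, "SAT": 0.0, "M": 0.05, "Pla": 0.00,
            "Pre": 0.0, "QC": 0.12, "QO": 0.30, "RT": 0.47, "SM": 0.00, "T": 0.05}

def saw_score(model_weights, priorities):
    # Simple Additive Weighting: sum of (criterion weight x observed importance priority)
    return sum(model_weights[c] * priorities[c] for c in model_weights)

for name, weights in [("Key-Value", kv), ("Relational", rel)]:
    print(name, round(saw_score(weights, observed), 3))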

V. Conclusions, Significance of Research, & Future Considerations

5.1 Chapter Overview

Chapter 2 introduced key concepts and trends relating to database storage models, beginning with the relational data model in the 1970's and continuing through today's modern NoSQL data models. Also presented were a discussion of typical use cases for NoSQL databases, previous work on characterizing database usage patterns on non-distributed systems, and a treatment of the multi-criteria decision-making processes underlying this research. This provided a theoretical foundation for the discussion that followed.

Next, Chapter 3 continued from the theoretical foundation laid in Chapter 2 with the introduction of a methodology to identify alternative suitable NoSQL data models solely through the observation of user/application interactions with a relational database system. This novel approach to characterizing the usage of an existing database system through the assessment of observed usage data was demonstrated through two pilot studies, which illustrated how the Engle Criteria Analysis Tool (ECAT) could be employed to implement this methodology. Additionally, three distinct use case simulations (phpBB, Elgg, and PowerDNS) were proposed.

Then, in Chapter 4, a description of the execution of the experimental setup for the three simulations described in Chapter 3 was presented, along with the results from those simulations. The chapter concluded with a discussion of the results and the research findings and addressed how the research questions and hypotheses were answered in the course of this research. Finally, this chapter highlights the contributions of this research and proposes recommendations for future research.

5.2 Significance of Research

This research was built upon the foundations laid by previous researchers (Engle, 2018; Hecht & Jablonski, 2011) who characterized various aspects of database usage in order to assess the suitability of an underlying data model. However, this research was novel in several ways. First, this research represents the first attempt to characterize the usage of an existing database by extracting queries through network packet sniffing. Although the open-source toolkit used to extract the queries from packet captures has generally been used by database administrators to troubleshoot aspects of system performance, there are no indications in the literature to suggest that anyone has yet attempted to aggregate the results in the manner proposed in this research in order to profile the usage of a system.

This unprecedented approach to profiling the usage of a system was then incorporated into the development of the ECAT software (provided in Appendix A). This software demonstrated how the captured network packets could be parsed and analyzed so that the strategies discussed in Section 3.3 for identifying the Engle criteria could then be applied. The observed usage was used to generate objective measurements of how the system was actually being used in terms of the Engle criteria, extending the previous work, which had employed subjective user inputs as a means of assessing the most suitable solution. This is not a claim that the methodology presented here is superior to the existing one; rather, the significance of this approach is that it lends a new, alternative perspective to an existing problem.

Finally, by taking the objective measurements produced by ECAT and incorporating them with the performance priorities for each of the data models via a Simple Additive Weighting model, this research demonstrated the ability of this methodology to produce a suitability ranking of each of the five data models (relational, key-value, document, columnar, and graph) based solely on observations of a system's usage.

There are several benefits to be realized from the practical application of this research to real-world systems. First, the Engle criteria metrics reported by ECAT can inform the system owner about various aspects of their system's usage at a given period in time; repeated measurements indicating changes in one or more of the criteria could help the system owner discover changes in the underlying usage patterns of their users, prompting them to better understand why users are changing their behavior. Furthermore, this methodology could be applied during agile software development to help point application developers towards an appropriate data model during the early stages of a system's development. Finally, output from the SAW decision model indicating that a more suitable data model exists for a given application may motivate a system owner to conduct a more in-depth analysis of the potential advantages and trade-offs of switching to the suggested data model.

5.3 Comments on the Methodology

There were a number of limitations inherent in the approach employed during this research. One of the biggest shortcomings was a lack of access to real-world usage data. Although this research demonstrates the ease with which this data can be accessed, concerns relating to privacy, security, and system performance made it difficult to find system owners who were willing to share this data. By employing open-source software in the experimental simulations and designing software-based agents to interact with these systems similarly to how human users would, this limitation was mitigated to the greatest extent possible.

Another limitation of employing this methodology against well-defined software like phpBB or Elgg is that each of these applications was optimized to perform its key functions in particular ways. For instance, when navigating to another page in phpBB (whether the user had just created a new post or was browsing an existing one), the application uses the same types of SQL queries to populate the footer information (with information such as who else is currently online), which is generated on each page reload. Receiving this information may not have been the primary objective of the user at that time; however, the behavior is still captured through this process. Thus, even with significant shifts in the user's behavior, some queries performed by the underlying application remain unchanged. Therefore, when considering shifts in usage patterns, this factor results in less pronounced changes in output than might otherwise have been expected for these types of systems.

Engle's weighting of the transparency criterion (see Figure 8) strongly disfavors the key-value data model because, as discussed in Chapter 2, the aggregate in this data model is an opaque value to the database system (and thus it is not possible to retrieve individual elements within the aggregate). Intuitively, it may stand to reason that there would be an inverse relationship between the observation of this criterion and the ranking of the key-value data model (that is, as the transparency metric increases, the scoring for the key-value data model should decrease). However, in practice the opposite was seen to be true. The measurement of the transparency criterion in this research was based on the observation of SELECT queries, which were almost universally observed executing in the sub-millisecond time frame. As a result, even in instances where there was a high reported value for the transparency criterion, the extremely strong weighting of the result timeliness criterion favoring the key-value data model (0.496) overshadowed any advantages the other data models gained from the transparency criterion. As identified during the pilot studies, this impact tends to be more pronounced when the evaluation primarily consists of observations of small amounts of structured data. However, for systems dealing with larger amounts of semi-structured or unstructured data, the result timeliness criterion would take on a less dramatic role in the model's scoring. One possible way to address this limitation would be the inclusion of criterion-specific coefficients that could dampen or amplify the results for the result timeliness criterion based on the types of data being observed.

This research sought to understand what characterizations could be made regarding the suitability of a NoSQL data model based only on observations of user interactions with a relational database. Although this provides a second perspective on the problem, a better solution would likely incorporate inputs from both this methodology and the methodology proposed by Engle in order to provide a more comprehensive view of the ideal solution.
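
As a hypothetical illustration of that suggestion (it is not implemented in ECAT), a criterion-specific coefficient could shrink the normalized result timeliness priority when the observed results are predominantly small, structured rows, leaving it untouched for workloads returning larger semi-structured or unstructured aggregates. The threshold, floor, and average-result-size input are all assumptions chosen for the example.

def dampen_result_timeliness(rt_priority, avg_result_bytes, threshold=4096, floor=0.25):
    # Scale the result timeliness priority toward 'floor' when the average observed
    # result size is small (i.e., mostly small structured rows).
    if avg_result_bytes >= threshold:
        return rt_priority  # larger, semi-/unstructured results: leave the priority unchanged
    scale = floor + (1 - floor) * (avg_result_bytes / threshold)
    return rt_priority * scale

# Example: a 0.47 result timeliness priority observed on a workload averaging ~300 bytes per result
print(round(dampen_result_timeliness(0.47, 300), 3))

Because the importance priorities are normalized before scoring, the remaining criteria would need to be renormalized after such an adjustment so that the SAW scores remain comparable.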

5.4 Recommendations for Future Research

There are a number of excellent opportunities to extend this research in the future. First, this research focused on assessing MySQL databases because of the existing availability of tools to decode and parse the MySQL protocol. However, this methodology can theoretically be applied to any SQL-based implementation. Future work extending this approach to other SQL-based implementations would help extend the value of this work to a larger audience of practitioners.

Additionally, future work improving the performance and extending the capabilities of the ECAT software would also serve to increase its value to practitioners. Rewriting the software to be multi-threaded (where possible) would be one potential avenue to pursue. Additionally, redesigning it to continuously monitor and assess system usage might provide valuable insights into how usage patterns change over time, which might have implications that extend much further than just the underlying data model (e.g., system capacity planning, performance issues, etc.).

Furthermore, the value of this methodology could be greatly improved through the development of a method to discriminate between different classes or groups of users. Conceptually, different groups of users might use the same system in different ways. Observations based on an aggregation of different groups of users may identify the best solution for the group as a whole. However, considering these groups separately might reveal that each possesses a different “optimal” solution (which could then potentially be realized through a hybrid database architecture). The addition of such a capability to this methodology would greatly increase the significance of this work and the benefit it could provide to users (a minimal sketch of one possible approach appears at the end of this section).

Finally, if additional time and resources were made available for this research, the methodology presented here could be further validated by demonstrating that a data model ranked ahead of the relational model by ECAT is able to outperform it using some objectively quantifiable suitability metric (e.g., overall speed). This could be accomplished by implementing the database in the alternative NoSQL data model, adjusting the application code to support the new underlying data model, and then running the necessary benchmarking tests.
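
The following sketch outlines one assumed way the user-group idea might be approached: partition the observed queries by the client address that issued them before running the existing per-criterion analysis, so that each class of user receives its own suitability ranking. The observation tuple layout and the run_engle_criteria_analysis() wrapper are hypothetical; ECAT's current pt-query-digest timeline output would need to be extended to carry the client address through to this stage.

from collections import defaultdict

def group_by_client(observations):
    # Partition observed queries by the client address that issued them. Each observation
    # is assumed to be a (client_ip, timestamp, query, fingerprint) tuple.
    groups = defaultdict(list)
    for client_ip, timestamp, query, fingerprint in observations:
        groups[client_ip].append((timestamp, query, fingerprint))
    return groups

# Hypothetical usage: run the criteria assessment once per client group, producing a
# separate suitability ranking for each class of user.
# for client, queries in group_by_client(observations).items():
#     rankings[client] = run_engle_criteria_analysis(queries)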

5.5 Summary

The effectiveness and performance of data-intensive applications are influenced by the suitability of the data models upon which they are built. The relational data model has been the de facto data model underlying most database systems since the 1970's. However, the recent emergence of NoSQL data models has provided users with alternative ways of storing and manipulating data. Previous research has demonstrated the potential value of applying NoSQL data models in non-distributed environments; however, knowing when to apply these data models has generally required input from system subject matter experts. This research, sponsored by the Air Force Office of Scientific Research, considers an existing approach for selecting suitable data models based on a set of 12 criteria and extends it with a novel methodology to characterize and assess the suitability of the relational and non-relational (i.e., NoSQL) data models based solely on observations of a user's interactions with an existing relational database system. Results from this work show that this approach is able to identify and characterize the pre-established criteria in the observed usage of existing systems and produce suitability recommendations for alternate data models based on those observations.

Appendix A. eval.py code

#!/usr/bin/python3
# File: eval.py
# Date: 11 Mar 20
# Author: Paul Beach, [email protected]

quickly = .001  # How long in seconds we consider quickly (i.e., for the purposes of RT/LAT/SAT)

import mysql.connector
import os
import subprocess
import time
import datetime
import hashlib
import re
import itertools
import sqlparse
import sys
import copy
from scipy.spatial import distance  # Used in C4 for Jaccard dissimilarity calc
from sqlparse.sql import IdentifierList, Identifier
from sqlparse.tokens import Keyword, DML

# Load the performance priorities from Engle (2018), Table 30 kv = {"CAC": .048, "DTE": .057, "LAT": .194, "SAT": .416, "M": .027, "Pla": .033, "Pre": .180, "QC": .039, "QO": .415, "RT": .496, "SM": .347, "T": .034} doc = {"CAC": .048, "DTE": .116, "LAT": .327, "SAT": .123, "M": .243, "Pla": .302, "Pre": .232, "QC": .135, "QO": .083, "RT": .196, "SM": .347, "T": .280} col = {"CAC": .048, "DTE": .031, "LAT": .327, "SAT": .073, "M": .243, "Pla": .302, "Pre": .08, "QC": .039, "QO": .415, "RT": .055, "SM": .089, "T": .126} gra = {"CAC": .429, "DTE": .233, "LAT": .076, "SAT": .186, "M": .243, "Pla": .302, "Pre": .466, "QC": .326, "QO": .05, "RT": .105, "SM": .184, "T": .280} rel = {"CAC": .429, "DTE": .562, "LAT": .076, "SAT": .203, "M": .243, "Pla": .062, "Pre": .042, "QC": .461, "QO": .038, "RT": .149, "SM": .033, "T": .280} def c2_structural_malleability(queries): # DBMS s ability to add/remove types of aggregates to and from the DBMS. (Engle, 2018) # Strategy 1: Look for observation of CREATE TABLE with foreign key constraint or DROP TABLE # TODO: -- Not followed by RENAME TABLE and then "DROP TABLE", since this is more likely a plasticity issue (copy original table into new one with changed attributes, then remove original) # -- TODO: Possibly followed by DROP INDEX or CREATE INDEX ? Will this give me more relevant information than I already have? # TODO: Strategy 2: Observation of CREATE VIEW or DROP VIEW # TODO: deal with multiple table adds/drops in single statement # TODO: add count of ALTER TABLE commands that follow that have foreign key constraint, since these support this criteria

events = {} # Dict to store occurences of interest count = 0 # Count to store number of occurences - redundant, but may be useful as a sanity check regex = [’create table.+foreign key’, ’drop table’] # TODO: A stronger guarantee of SM for drop table would be achieved if we could check first to see if the table had FK references (i.e., this will still count a drop table as SM even if the table is standalone)

112 for pattern in regex: #Cycle through various regex to find matching queries compiled = re.compile(pattern) for query in queries: if compiled.search(queries[query][1]): # If the query matches the regex pattern count += 1 events[query] = queries[query][1]

return count, events def c3_transparency(queries, schema): # DBMS s ability to store aggregates such that individual elements within the aggregate can be viewed # and retrieved during read transactions. (Engle, 2018) # Strategy: Parse each query, and check if it uses all elements or not. # This function looks at each query, extracts which tables and attributes are looked up. Then it consults the schema and loads a copy of # the tables that were referenced. Using the process of elimination, it removes any referenced attributes from the copy. If anything # remains at the end, then it did not use all attributes and we can infer that the transparency criteria was applicable for that query. # TODO for validation/sanity check, save queries into two lists (those that used every element, those that didn ’t) # TODO not sure if cartesian joins really need to be handled separately, but this works for now

use_all = 0 # Num of queries using all elements didnt_use_all = 0 # Sanity check to make sure I didn’t miss any cartesian = {} # Variable to store cartesian join queries noncartesian = {} # Variable to store all other queries checked = {}

compiled = re.compile(’select.+from’) # Regex to isolate select queries

for query in queries: # Divide up the list into cartesian and noncartesian queries, so they can be handled separately if compiled.search(queries[query][1]) and ("insert into " not in queries[query][1]): # If the query matches the regex pattern (i.e., it’s a select statement) # The "insert into" is an edge case from phpbb using the insert into select statment, which accidentally gets grouped into here tokenized = queries[query][1].split() from_pos = tokenized.index("from") try : where_pos = tokenized.index("where") if (where_pos - from_pos > 2): # More than one table between from and where - involves a Cartesian join cartesian[query]=queries[query] #TODO: this sometimes grabs non-cartesian joins but it still works else : noncartesian[query]=queries[query] except ValueError: # No ’where’ in query noncartesian[query]=queries[query]

for query in cartesian: # Handle cartesian joins separately if (query in checked): # Already checked, don’t recompute if (checked[query]):

113 use_all += 1 else : didnt_use_all += 1 else : index = 0 while (index + 1 < len(schema)): # Ensure we can actually increment the index if (schema[index+1][0] > query): # Found a schema more recent than the current query, so stop break else : index += 1 # Not found yet, keep incrementing tokenized = queries[query][1].split() if (tokenized[0] == "execute"): # Commands that start with execute seem to confuse sqlparse, so remove it since it’s not needed here tokenized.remove("execute")

from_pos = tokenized.index("from") single_table_name = tokenized[from_pos+1].strip()

sql = ’ ’.join(tokenized) sql, sep, tail = sql.partition(’order by’) # Trim everything after order by parsed = sqlparse.parse(sql)[0] sel_pos = -1 # Variables needed in next section to locate the attributes att_pos = 0 from_pos = 0 attributes = ""

for x in range(len(parsed.tokens)): # Find where the attributes are located. This is ugly, but it seems to work if str(parsed.tokens[x]) == "select": sel_pos = x elif ((from_pos == 0) and (sel_pos != -1) and (att_pos == 0) and (str(parsed.tokens[x]) != " ") and (attributes == "")): att_pos = x attributes = str(parsed.tokens[x]) elif str(parsed.tokens[x]) == "from": from_pos = x

attributes = [x.strip() for x in attributes.split(’,’)] # Tokenize by comma and remove whitespace tables = extract_tables(sql) # List of tables found, with alias if present (e.g., "table t")

aliases = {} # Create a dict to store aliases (key: alias, value: table name) stack = [] # Create a table to store table names and attributes from the original schema temp = []

for table in tables: # Load used tables into temporary list

name = table.split()[0] if len(table.split()) == 2: # If value is 2, alias is present alias = table.split()[1] aliases[alias] = name # Store alias and table name in dictionary temp = [] temp.append(name) # Add table name

114 col_list = [] # Place to store the names of columns loaded from the schema for entry in schema[index][1][name]: # Extract the attribute names and load them into col_list col_list.append(entry[0])

temp.append(col_list) # Append list of attributes - needs to be list of cols from table(name ) stack.append(temp) # Toss on the stack

else : # No aliases to contend with if name == "(": break temp = [] temp.append(table) col_list = [] try : if name in schema[index][1][name]: for entry in schema[index][1][name]: # Extract just the attribute name col_list.append(entry[0]) temp.append(col_list) stack.append(temp) except KeyError: print("KeyError in c3_transparency()") for attribute in attributes: # Parse attributes one by one if attribute.startswith("max("): # Need to strip off aggregate functions like min/max/avg/count/ sum attribute = attribute.replace("max(", "") attribute = attribute.replace(")", "") elif attribute.startswith("min("): attribute = attribute.replace("min(", "") attribute = attribute.replace(")", "") elif attribute.startswith("count("): attribute = attribute.replace("count(", "") attribute = attribute.replace(")", "") elif attribute.startswith("avg("): attribute = attribute.replace("avg(", "") attribute = attribute.replace(")", "") elif attribute.startswith("sum("): attribute = attribute.replace("sum(", "") attribute = attribute.replace(")", "") elif attribute.startswith("stdev("): attribute = attribute.replace("sum(", "") attribute = attribute.replace(")", "")

if (" as " in attribute): # Strip off aliases attribute = attribute.split(" as ", 1)[0]

if "." in attribute: # Handle aliases period = attribute.find(".") alias = attribute[0:period] table_name = aliases.get(alias) # lookup table name via alias if "*" in attribute: # remove table from temp list, all used for entry in range(len(stack)):

115 if stack[entry][0] == table_name: del stack[entry] # Remove table from stack break else : # Remove element from temp list # Now find element after dot and delete element = attribute[period+1:] for i in range(len(stack)): if stack[i][0] == table_name: try : stack[i][1].remove(element) except ValueError: pass break

elif "*" in attribute: # This is a basic select * from table name query, so just remove the table from the list for entry in range(len(stack)): if stack[entry][0] == single_table_name: del stack[entry] break else : #remove single element from temp list try : for x in range(len(stack)): if (attribute in stack[x][1]): stack[x][1].remove(attribute) except ValueError: pass

if not stack: use_all += 1 # This query used every single element from every table referenced, so it did not need to access individual elements checked[query] = True else : didnt_use_all += 1 # This query didn’t use everything. Keeping track just to make sure these two numbers tally up. checked[query] = False for query in noncartesian: # Handle any remaining query that is not a cartesian join

if (query in checked): # Already checked, don’t recompute if (checked[query]): use_all += 1 else : didnt_use_all += 1

else : index = 0 while (index + 1 < len(schema)): # Ensure we can actually increment the index if (schema[index+1][0] > query): # Found a schema more recent than the current query, so stop break else : index += 1 # Not found yet, keep incrementing tokenized = queries[query][1].split() if (tokenized[0] == "execute"): # Commands that start with execute seem to confuse sqlparse, so

116 remove it since it’s not needed here tokenized.remove("execute") from_pos = tokenized.index("from") single_table_name = tokenized[from_pos+1].strip() sql = ’ ’.join(tokenized) sql, sep, tail = sql.partition(’order by’) # Trim everything after order by parsed = sqlparse.parse(sql)[0] sel_pos = -1 # Variables needed in next section to locate the attributes att_pos = 0 from_pos = 0 attributes = "" for x in range(len(parsed.tokens)): # Find where the attributes are located. This is ugly, but it seems to work if str(parsed.tokens[x]) == "select": sel_pos = x elif ((from_pos == 0) and (sel_pos != -1) and (att_pos == 0) and (str(parsed.tokens[x]) != " ") and (attributes == "")): att_pos = x attributes = str(parsed.tokens[x]) elif str(parsed.tokens[x]) == "from": from_pos = x attributes = [x.strip() for x in attributes.split(’,’)] # Tokenize by comma and remove whitespace tables = extract_tables(sql) # List of tables found, with alias if present (e.g., "table t") aliases = {} # Create a dict to store aliases (key: alias, value: table name) stack = [] # Create a table to store table names and attributes from the original schema for table in tables: # Load used tables into temporary list name = table.split()[0] if len(table.split()) == 2: # If value is 2, alias is present alias = table.split()[1] aliases[alias] = name # Store alias and table name in dictionary temp = [] temp.append(name) # Add table name

col_list = [] # Place to store the names of columns loaded from the schema for entry in schema[index][1][name]: # Extract the attribute names and load them into col_list col_list.append(entry[0])

temp.append(col_list) # Append list of attributes - needs to be list of cols from table(name ) stack.append(temp) # Toss on the stack

else : # No aliases to contend with temp = [] temp.append(table) col_list = [] for entry in schema[index][1]: # Extract just the attribute name

117 col_list.append(entry) temp.append(col_list) stack.append(temp) for attribute in attributes: # Parse attributes one by one if attribute.startswith("max("): # Need to strip off functions like min/max/avg/count/sum. attribute = attribute.replace("max(", "") attribute = attribute.replace(")", "") elif attribute.startswith("min("): attribute = attribute.replace("min(", "") attribute = attribute.replace(")", "") elif attribute.startswith("count("): attribute = attribute.replace("count(", "") attribute = attribute.replace(")", "") elif attribute.startswith("avg("): attribute = attribute.replace("avg(", "") attribute = attribute.replace(")", "") elif attribute.startswith("sum("): attribute = attribute.replace("sum(", "") attribute = attribute.replace(")", "") elif attribute.startswith("stdev("): attribute = attribute.replace("sum(", "") attribute = attribute.replace(")", "")

if (" as " in attribute): # Strip off aliases attribute = attribute.split(" as ", 1)[0] if "." in attribute: # Handle aliases period = attribute.find(".") alias = attribute[0:period] table_name = aliases.get(alias) # lookup table name via alias if "*" in attribute: # remove table from temp list, all used for entry in range(len(stack)): if stack[entry][0] == table_name: del stack[entry] # remove table from stack break else : # remove element from temp list #find element after dot and delete element = attribute[period+1:] for i in range(len(stack)): if stack[i][0] == table_name: try : stack[i][1].remove(element) except ValueError: pass break

elif "*" in attribute: # This is a basic select * from table name query, so just remove the table from the list for entry in range(len(stack)): if stack[entry][0] == single_table_name: del stack[entry] break else : #remove single element from temp list try :

118 stack[0][1].remove(attribute) except ValueError: pass

if not stack: use_all += 1 # This query used every single element from every table referenced, so it did not need to access individual elements checked[query] = True else : didnt_use_all += 1 # This query didn’t use everything. Keeping track just to make sure these two numbers tally up. checked[query] = False

return didnt_use_all def c4_query_omniscience(queries): # The degree to which the complete set of possible queries is known by the user before (the) system is implemented. (Engle, 2018) # Strategy: Log the fingerprint for each observed query and determine if the same types of queries are observed over the entire period, or if they evolve # during the course of the observation period (e.g., some queries are seen at the beginning but not the end, and/or others are seen at the end but not the beginning) # For now, subdivide the list of queries into three lists, sort them, and compute the Jaccard dissimlarity score between each of the lists

if (len(queries) < 3): # Set to 3 for testing purposes, in reality this needs to be much larger to provide meaningful data print ("c4: Not enough data") return 0, 0, 0

elements = round(len(queries)/3)

temp = [] # Temp array to store just the fingerprints

for query in queries: # Extract the fingerprints element = queries[query][2] temp.append(element)

#temp = [’a’, ’a’, ’a’, ’a’, ’a’, ’b’, ’b’, ’b’, ’b’] # Testing

temp = list(divide_chunks(temp, elements)) # Divide queries into 3 (roughly) equal lists list1 = temp[0] list2 = temp[1] list3 = temp[2]

list1.sort() # Sort each of the lists list2.sort() list3.sort()

list1 = list1[0:min(len(list1), len(list2), len(list3))] # Ensures all three lists same size, otherwise Jaccard calc crashes list2 = list2[0:min(len(list1), len(list2), len(list3))] # Ensures all three lists same size, otherwise Jaccard calc crashes list3 = list3[0:min(len(list1), len(list2), len(list3))] # Ensures all three lists same size, otherwise Jaccard calc crashes

one_two = 1 - (distance.jaccard(list1, list2)/len(list1))
two_three = 1 - (distance.jaccard(list2, list3)/len(list2))
one_three = 1 - (distance.jaccard(list1, list3)/len(list3))

return one_two, two_three, one_three def c5_query_complexity(queries): # DBMS s ability to perform both simple and complex queries. (Engle, 2018) # Strategy 1: Look for prevalence of queries following WHERE id = X # Strategy 2: Presence of queries containing JOIN, LIKE, COUNT, SUM # Strategy 3: Look for queries using views, assess if those would be considered simple/complex # -- SHOW FULL TABLES IN database_name WHERE TABLE_TYPE LIKE ’VIEW ; # Current strategy - start with list of all queries, and extract complex ones. See what you have left.

remaining = [] for query in queries: # Create an initial list to evaluate, skipping obviously complex queries if ("join" not in queries[query][1]) and ("like" not in queries[query][1]) and ("on duplicate key" not in queries[query][1]): remaining.append(queries[query][1])

regex = [’where.+in\(.+\)’, ’on duplicate key’, ’.+ as total .+’, ’count\(’, ’not in\(.+\)’, ’.count\(.+\)’, ’.min\(.+\)’, ’.max\(.+\)’, ’.order by .+’, ’<>’, ’.+ limit ?.+’]

for pattern in regex: #Cycle through various regex to find complex queries compiled = re.compile(pattern) for query in remaining: if compiled.search(query): # If the query matches the regex pattern remaining.remove(query)

return len(queries) - len(remaining) def c6_result_timeliness(queries): # How quickly the results are provided to the user after a request is made. (Engle, 2018) # Strategy: Evaluate the maximum execution times for each type of query, look for those that take more than X seconds to run # May need to do lit review to determine acceptable times # What is the right threshold? 10 seconds? 1 minute?

count = 0 for query in queries: if queries[query][0] <= quickly: # The query times were already captured during the extract_queries() function count += 1

return count def c7_cross_aggregate_consistency(queries): # DBMS s ability to perform cascading updates to data and relationships. (Engle, 2018) # TODO: Strategy1: Evaluate UPDATE or DELETE commands affecting elements with foreign key constraints via CASCADE # Strategy 2: look for UPDATE/DELETE queries involving JOINS

events = [] # List to store occurences of interest

120 count = 0 # Count to store number of occurences - redundant, but may be useful as a sanity check sql = [] # Temporary structure to store parsed queries

for query in queries: sql.append(queries[query][1])

regex = [’update .+ join ’, ’delete .+ join ’]

for pattern in regex: #Cycle through various regex to find complex queries compiled = re.compile(pattern) for query in sql: if compiled.search(query): # If the query matches the regex pattern count += 1 events.append(query)

return count, events def c10_plasticity(queries): # DBMS s ability to add or remove elements within stored aggregates. ( Engle, 2018) # Strategy 1: Presence of an ALTER TABLE command with ADD or DROP COLUMN

events = {} # Dict to store occurences of interest count = 0 # Count to store number of occurences - redundant, but may be useful as a sanity check regex = [’alter table.+add (?!index|constraint)’, ’alter table.+drop (?!index|constraint)’]

for pattern in regex: #Cycle through various regex to find complex queries compiled = re.compile(pattern) for query in queries: if compiled.search(queries[query][1]): # If the query matches the regex pattern count += 1 events[query] = queries[query][1]

# TODO: Strategy 2: Observation of CREATE TABLE followed by RENAME TABLE renamining new table to old table

return count, events def c11_data_typing_enforcement(queries, schema): # DB enforcement of data types during transactions. Data types may include floats, doubles, Booleans, strings and others. (Engle, 2018) # Strategy 1: Look for presence of ALTER TABLE commands changing the stored data types. # Strategy 2: Not implemented, but another idea would be to look for errors indicating the user had attempted to violate stored data types (since most of the use cases involve presumably well-programmed applications, this is unlikely to be seen)

count = 0 events = []

compiled = re.compile(’alter table.+modify’) # Regex to isolate alter table ... modify queries for query in queries: if compiled.search(queries[query][1]): # If the query matches the regex pattern (i.e., it’s an alter table statement) tokenized = queries[query][1].split() index = 0 while (index + 1 < len(schema)): # Ensure we can actually increment the index if (schema[index+1][0] > query): # Found a schema more recent than the current query, so stop

121 break else : index += 1 # Not found yet, keep incrementing for i in schema[index][1][tokenized[2]]: if i[0] == tokenized[4]: # Find attribute in list if i[1] != tokenized[5].rstrip(’,;’): # Check data type against initial schema events.append(queries[query][1]) count += 1 else : print ("CAUTION: Alter table query does not appear to have changed from original schema: ", queries[query][1])

return count, events def c12_manipulation(queries, schema): # DBMS s ability to update elements within stored aggregates independently from other aggregates and elements. (Engle, 2018) # Strategy: Compare queries against known table structures to determine if all columns are being updated or only a subset of them # TODO: Fix bug where joins are not captured

use_all = 0 # Num of queries using all elements didnt_use_all = 0

compiled = re.compile(’update.+set’) # Regex to isolate update queries for query in queries: if compiled.search(queries[query][1]): # If the query matches the regex pattern (i.e., it’s an update statement ) if "join" not in queries[query][1]: tokenized = queries[query][1].split() # This block extracts the table name for later, assuming a simple query (i.e., no joins) TODO: probably a smarter way to do this single_table_name = tokenized[1].strip() #TODO: look for LOW_PRIORITY or IGNORE keywords, remove? #set_pos = tokenized.index("set") try : # Check to see if there is a "where" in the update query where_pos = tokenized.index("where") except ValueError: print("Error: ValueError in c12") where_pos = len(tokenized) # Pretend there is a where at the end count = (where_pos - 3)/3 # Total number of elements being accessed/updated (div 3 because 2nd element should be equals sign & 3rd is assignment value in each 3 item set) index = 0 while (index + 1 < len(schema)): # Ensure we can actually increment the index if (schema[index+1][0] > query): # Found a schema more recent than the current query, so stop break else : index += 1 # Not found yet, keep incrementing

if (count == len(schema[index][1][single_table_name])): # This query is updating every element use_all += 1 else : didnt_use_all += 1 else : pass

return didnt_use_all

122 def divide_chunks(l, n): # Used in Q4 to subdivide lists, Copied from https://www.geeksforgeeks.org/break-list- chunks-size-n-python/ for i in range(0, len(l), n): yield l[i:i + n] def extract_from_part(parsed): from_seen = False for item in parsed.tokens: if item.is_group: for x in extract_from_part(item): yield x if from_seen: if is_subselect(item): for x in extract_from_part(item): yield x elif item.ttype is Keyword and item.value.upper() in [’ORDER’, ’GROUP’, ’BY’, ’HAVING’, ’GROUP BY’]: from_seen = False StopIteration else : yield item if item.ttype is Keyword and item.value.upper() == ’FROM’: from_seen = True def extract_queries(): digest = "pt-query-digest --type tcpdump tcpdump.pcap --output slowlog --limit=100% --no-report 2>&1" digest_dump = subprocess.check_output(digest, shell = True).decode("utf-8").split(’\n’)

# First, parse the full digest dump to pull out the query response sizes # Store them in a dict using timestamp as a key tempqueries = {} for row in digest_dump: if "# Time: " in row: date_string = row.split()[2] + " " + row.split()[3] timestamp = datetime.datetime.strptime(date_string, ’%y%m%d %H:%M:%S.%f’)

elif "Query_time:" in row: time = float(row.split()[2].strip()) tempqueries[timestamp] = time

extract = "pt-query-digest --type tcpdump tcpdump.pcap --timeline --no-report 2>/dev/" process_extract = subprocess.check_output(extract, shell = True).decode("latin1").split(’\n’) queries = {} # Next, pull out the fingerprinted queries for row in process_extract[3:-1]: timestamp = datetime.datetime.strptime(row[2:28], ’%Y-%m-%d %H:%M:%S.%f’) tokenized = row.split(None, 5) interval = tokenized[3] query = tokenized[5] fingerprint = hashlib.md5(query.encode(’latin1’)).hexdigest() try : queries[timestamp]=([tempqueries[timestamp], query, fingerprint]) # Store fingerprinted queries along with the query time except KeyError:

123 print("KeyError in extract_queries():", timestamp)

starting_count = len(queries) queries = strip_edge_cases(queries) ending_count = len(queries) return starting_count, ending_count, queries def extract_schema(config, c2_events, c10_events): # This function queries the db for the schema at the time of evaluation (currently post-data collection) # It also accounts for changes observed during execution by collecting the events that took place in C2 & C2 # and it creates a time-dimensional array of dicts, each using a table name as the key and a list of columns # In practice, you would need to capture/load the initial schema, then begin data collection # To aid in testing purposes, instead the current database schema is assumed to be the same as the "initial" state # This also assumes every observed SQL command successfully executes, a more robust program would check but I ’m not sure that’s possible # TODO: This function probably won’t handle tables with spaces in their names very well...

schema_change_events = [] schema = [] current_schema = {}

# First, the two lists for event in c2_events: temp = [event, c2_events[event]] schema_change_events.append(temp) for event in c10_events: temp = [event, c10_events[event]] schema_change_events.append(temp)

# Sort the two-dimensional list by the timestamp (first element) in descending order schema_change_events.sort(key=lambda x: x[0])

cnx = mysql.connector.connect(**config) cursor = cnx.cursor() cursor.execute("SHOW TABLES") tables = cursor.fetchall() # Get a list of all the table names in the db

for table in tables: query = ("DESCRIBE " + table[0]) cursor.execute(query) attributes = cursor.fetchall() # Get a list of the attributes for each table temp = [] for attribute in attributes: # Build a set of the attributes values = [attribute[0], attribute[1], attribute[2], attribute[3], attribute[4], attribute[5]] # Field , Type, Null?, Key, Default, Extra (e.g., auto_increment) temp.append(values) current_schema[table[0]] = temp # Store the set in a dict with the table name as the key

cursor.close() # Clean up cnx.close()

schema.append([datetime.datetime.now() - datetime.timedelta(days=4000), current_schema]) # Store the initial schema, using 4000 days ago as the "starting" period

124 # Now, iterate through the sorted list of schema changes, and build out the schema at each point in time for event in schema_change_events: if "create table" in event[1]: columns = [] # Place to store added columns token = event[1].split() table_name = token[2] foreign_pos = token.index("foreign") for x in range(3, foreign_pos, 2): if ([token[x] != "primary"]) and ([token[x] != "key"]): columns.append([token[x].lstrip(’(’), 0, 0, 0, 0, 0]) temp_schema = copy.copy(schema[-1][1]) temp_schema[table_name] = columns schema.append([event[0], temp_schema]) # Save the updated schema containing the new table elif "drop table" in event[1]: columns = [] # Place to store added columns token = event[1].split() table_name = token[2] temp_schema = copy.copy(schema[-1][1]) try : del temp_schema[table_name] schema.append([event[0], temp_schema]) except KeyError: pass

elif "alter table" in event[1]: if "drop" in event[1]: columns = [] # Place to store added columns token = event[1].split() drop_pos = token.index("drop") table_name_pos = drop_pos - 1 temp_schema = copy.copy(schema[-1][1]) temp_table = copy.copy(temp_schema[token[table_name_pos].strip(’‘’)]) column_name = token[table_name_pos+2].lstrip(’‘’).rstrip(’‘;’) for x in temp_table:

if x[0] == column_name: # Find the element matching the column name to be removed temp_table.remove(x) break temp_schema[token[table_name_pos].lstrip(’‘’).rstrip(’‘’)] = temp_table schema.append([event[0], temp_schema])

elif "add" in event[1]: columns = [] # Place to store added columns token = event[1].split() drop_pos = token.index("add") table_name_pos = drop_pos - 1 temp_schema = copy.copy(schema[-1][1]) temp_table = copy.copy(temp_schema[token[table_name_pos]]) temp_table.append([token[table_name_pos+2], 0, 0, 0, 0, 0]) temp_schema[token[table_name_pos].lstrip(’‘’).rstrip(’‘’)] = temp_table schema.append([event[0], temp_schema])

else : print ("ERROR: Everything should have been done by now.")

125 return schema def extract_table_identifiers(token_stream): for item in token_stream: if isinstance(item, IdentifierList): for identifier in item.get_identifiers(): value = identifier.value.replace(’"’, ’’).lower() yield value elif isinstance(item, Identifier): value = item.value.replace(’"’, ’’).lower() yield value def extract_tables(sql): # let’s handle multiple statements in one sql string extracted_tables = [] statements = list(sqlparse.parse(sql)) for statement in statements: if statement.get_type() != ’UNKNOWN’: stream = extract_from_part(statement) extracted_tables.append(set(list(extract_table_identifiers(stream)))) return list(itertools.chain(*extracted_tables)) def is_subselect(parsed): if not parsed.is_group: return False for item in parsed.tokens: if item.ttype is DML and item.value.upper() == ’SELECT’: return True return False def strip_edge_cases(queries): # This function removes edge_cases that do not significantly contribute to the analysis retained = {}

edge_cases = ["select @@version_comment limit", "select database()", "show warnings", "show tables", "commit" , "prepare", "reset", "administrator command", "@@session.autocommit", "sql_calc_found_rows", "version() ", "@@session.sql_mode", "found_rows()", "set session", "set names"] # This will toss anything with any of these words in it - probably should be rewritten as more exacting regex

for query in queries: if any(case in queries[query][1] for case in edge_cases): pass else : retained[query]=queries[query] return retained def takeSecond(elem): # For final results prettiness sorting return elem[1] def main(usecase):

sys.stdout = open(’eval_output_final.txt’, ’w+’) # Redirect stdout to file

126 if (usecase == "phpbb"): config = { # DB variables ’user’: ’phpbb_user’, ’password’: ’password’, ’host’: ’192.168.2.112’, ’database’: ’phpbb’, ’raise_on_warnings’: True } print ("Decoding: PhpBB use case") elif (usecase == "elgg"): config = { # DB variables ’user’: ’elgg’, ’password’: ’password’, ’host’: ’192.168.2.112’, ’database’: ’elgg’, ’raise_on_warnings’: True } print ("Decoding: Elgg use case") elif (usecase == "powerdns"): config = { # DB variables ’user’: ’power_admin’, ’password’: ’password’, ’host’: ’192.168.2.112’, ’database’: ’pdns’, ’raise_on_warnings’: True } print ("Decoding: PowerDNS use case") elif (usecase == "classic"): config = { ’user’: ’user’, ’password’: ’password’, ’host’: ’192.168.2.112’, ’database’: ’classicmodels’, ’raise_on_warnings’: True } else : print("No use case specified, terminating!")

# Get the total number of queries, the number kept and a dict containing the query duration, query, and fingerprint keyed by timestamp starting_count, ending_count, queries = extract_queries()

if (len(sys.argv) > 2): if (sys.argv[2] == "queries"): # Command line debug switch to see list of queries that will be analyzed for query in queries: print(queries[query][1])

print ("\n") print ("********************METRICS REPORT BEGINS*******************")

print (starting_count, " queries observed") print (ending_count, " queries retained") print (str(starting_count - ending_count), " queries removed")

c2_count, c2_events = c2_structural_malleability(queries)
c10_count, c10_events = c10_plasticity(queries)
print ("\nC2 - Structural Malleability:")
print (" Probable occurences of potential structural malleability events: ", c2_count)

schema = extract_schema(config, c2_events, c10_events)

c3_didnt_use_all = c3_transparency(queries, schema) print ("\nC3 - Transparency:") print (" SELECT queries using only a subset of elements from each referenced table: ", c3_didnt_use_all)

c4_1_2, c4_2_3, c4_1_3 = c4_query_omniscience(queries) c4_avg =(c4_1_2 + c4_2_3 + c4_1_3)/3 c4_score = c4_avg * ending_count print ("\nC4 - Query Omniscience: ") print (" Similarity ratio between 1st and 2nd thirds of queries: ", c4_1_2) print (" Similarity ratio between 2nd and 3rd thirds of queries: ", c4_2_3) print (" Similarity ratio between 1st and 3rd thirds of queries: ", c4_1_3) print (" Overall similarity across all three sets: ", c4_avg) print (" Pro-rated score based on average: ", c4_score)

c5_score = c5_query_complexity(queries) print ("\nC5 - Query Complexity:") print (" Number of complex queries: ", c5_score)

c6_count = c6_result_timeliness(queries) print ("\nC6 - Result Timeliness:") print (" Number of events under", quickly, "seconds: ", c6_count)

c7_count, c7_events = c7_cross_aggregate_consistency(queries) print ("\nC7 - Cross-Aggregate Consistency:") print (" Number of UPDATE/DELETE events involving JOINs: ", c7_count)

# Needed to run C10 earlier to update the schema - look near C2 print ("\nC10 - Plasticity:") print (" Probable occurences of potential plasticity events: ", c10_count)

c11_count, c11_events = c11_data_typing_enforcement(queries, schema) print ("\nC11 - Data Type Enforcement:") print (" Probable occurences of potential data type enforcement events: ", c11_count)

c12_didnt_use_all = c12_manipulation(queries, schema) print ("\nC12 - Manipulation:") print (" UPDATE queries NOT using every element from each referenced table: ", c12_didnt_use_all)

print ("*********************METRICS REPORT ENDS*********************")

print ("\n**********************SAW OUTPUT BEGINS**********************") counted_events = c2_count + c3_didnt_use_all + c4_score + c5_score + c6_count + c7_count + c10_count + c11_count + c12_didnt_use_all # Count all of the observed events, so we can generate a normalized total pre_pri = 0 # No currently viable method to evaluate sm_pri = c2_count/counted_events tra_pri = c3_didnt_use_all/counted_events # c3_use_all is # of queries that used entire aggregate, so

128 subtract from # of eval’d queries qo_pri = c4_score/counted_events qc_pri = c5_score/counted_events rt_pri = c6_count/counted_events cac_pri = c7_count/counted_events lat_pri = 0 # No currently viable method to evaluate sat_pri = 0 # No currently viable method to evaluate pla_pri = c10_count/counted_events dte_pri = c11_count/counted_events m_pri = c12_didnt_use_all/counted_events

#print("Counted events: ", counted_events) # Sum of all counted events above, used to normalize the ratios of each criteria print("\nImportance Priority Vector:\n------") #print("{:<5}{:^3}{:<.3f}{:<2}".format("| Pre", " | ", round(pre_pri, 3), " |")) print("{:<5}{:^3}{:<.3f}{:<2}".format("| SM", " | ", round(sm_pri, 3), " |")) print("{:<5}{:^3}{:<.3f}{:<2}".format("| T", " | ", round(tra_pri, 3), " |")) print("{:<5}{:^3}{:<.3f}{:<2}".format("| QO", " | ", round(qo_pri, 3), " |")) print("{:<5}{:^3}{:<.3f}{:<2}".format("| QC", " | ", round(qc_pri, 3), " |")) print("{:<5}{:^3}{:<.3f}{:<2}".format("| RT", " | ", round(rt_pri, 3), " |")) print("{:<5}{:^3}{:<.3f}{:<2}".format("| CAC", " | ", round(cac_pri, 3), " |")) #print("{:<5}{:^3}{:<.3f}{:<2}".format("| LAT", " | ", round(lat_pri, 3), " |")) #print("{:<5}{:^3}{:<.3f}{:<2}".format("| SAT", " | ", round(sat_pri, 3), " |")) print("{:<5}{:^3}{:<.3f}{:<2}".format("| Pla", " | ", round(pla_pri, 3), " |")) print("{:<5}{:^3}{:<.3f}{:<2}".format("| DTE", " | ", round(dte_pri, 3), " |")) print("{:<5}{:^3}{:<.3f}{:<2}".format("| M", " | ", round(m_pri, 3), " |")) print("------") print("\nGlobal Priorities for observed use case:\n------") kv_score = (kv["Pre"]*(pre_pri))+(kv["CAC"]*(cac_pri))+(kv["DTE"]*(dte_pri))+(kv["M"]*(m_pri))+(kv["Pla"]*( pla_pri))+(kv["QC"]*(qc_pri))+(kv["QO"]*(qo_pri))+(kv["RT"]*(rt_pri))+(kv["SM"]*(sm_pri))+(kv["T"]*( tra_pri))+(kv["LAT"]*(lat_pri))+(kv["SAT"]*(sat_pri)) doc_score = (doc["Pre"]*(pre_pri))+(doc["CAC"]*(cac_pri))+(doc["DTE"]*(dte_pri))+(doc["M"]*(m_pri))+(doc["Pla "]*(pla_pri))+(doc["QC"]*(qc_pri))+(doc["QO"]*(qo_pri))+(doc["RT"]*(rt_pri))+(doc["SM"]*(sm_pri))+(doc[" T"]*(tra_pri))+(doc["LAT"]*(lat_pri))+(doc["SAT"]*(sat_pri)) col_score = (col["Pre"]*(pre_pri))+(col["CAC"]*(cac_pri))+(col["DTE"]*(dte_pri))+(col["M"]*(m_pri))+(col["Pla "]*(pla_pri))+(col["QC"]*(qc_pri))+(col["QO"]*(qo_pri))+(col["RT"]*(rt_pri))+(col["SM"]*(sm_pri))+(col[" T"]*(tra_pri))+(col["LAT"]*(lat_pri))+(col["SAT"]*(sat_pri)) gra_score = (gra["Pre"]*(pre_pri))+(gra["CAC"]*(cac_pri))+(gra["DTE"]*(dte_pri))+(gra["M"]*(m_pri))+(gra["Pla "]*(pla_pri))+(gra["QC"]*(qc_pri))+(gra["QO"]*(qo_pri))+(gra["RT"]*(rt_pri))+(gra["SM"]*(sm_pri))+(gra[" T"]*(tra_pri))+(gra["LAT"]*(lat_pri))+(gra["SAT"]*(sat_pri)) rel_score = (rel["Pre"]*(pre_pri))+(rel["CAC"]*(cac_pri))+(rel["DTE"]*(dte_pri))+(rel["M"]*(m_pri))+(rel["Pla "]*(pla_pri))+(rel["QC"]*(qc_pri))+(rel["QO"]*(qo_pri))+(rel["RT"]*(rt_pri))+(rel["SM"]*(sm_pri))+(rel[" T"]*(tra_pri))+(rel["LAT"]*(lat_pri))+(rel["SAT"]*(sat_pri)) sort = [] sort.append(("Key-Value",round(kv_score,3))) sort.append(("Document",round(doc_score ,3))) sort.append(("Columnar",round(col_score ,3))) sort.append(("Graph",round(gra_score ,3))) sort.append(("Relational",round(rel_score ,3)))

sort.sort(key=takeSecond) for i in reversed(sort): print("{:<2}{:<12}{:^3}{:>.3f}{:>2}".format("| ", i[0], " | ", i[1], " |")) print("------")

129 print ("***********************SAW OUTPUT ENDS***********************")

return datetime.datetime.now(), ending_count, round(rel_score, 5), round(kv_score, 5), round(doc_score, 5), round(col_score, 5), round(gra_score, 5) if __name__ == ’__main__’: sys.exit(main(sys.argv[1]))

Appendix B. classic_agent_image.py code

#!/usr/bin/python3
# File: classic_agent_image.py
# Author: Paul Beach, [email protected]
# This script simulates a user interacting with the "classicmodels" sample SQL database.
# https://www.mysqltutorial.org/getting-started-with-mysql/mysql-sample-database-aspx/

import mysql.connector
import sys
import glob
import random
import string
import os
import time
import base64
from mysql.connector import Error
from loremipsum import get_sentences

config = {  # DB variables
    'user': 'user',
    'password': 'password',
    'host': '192.168.2.112',
    'database': 'classicmodels',
    'raise_on_warnings': True
}


def convertToBinaryData(filename):
    with open(filename, 'rb') as file:
        binaryData = file.read()
    return binaryData


def writeToFile(filename, photo):
    savePath = os.path.join("/home/labuser/Pictures/output", filename)
    print("Saving to: ", savePath)
    with open(savePath, 'wb') as f:
        f.write(base64.b64decode(photo))


def trigger_relational(cnx):
    iterations = 50
    customers = get_customers(cnx)
    cursor = cnx.cursor()
    try:
        cursor.execute("DROP TABLE IF EXISTS persons")
    except mysql.connector.errors.DatabaseError:
        pass
    cursor.execute("CREATE TABLE persons (PhotoID int PRIMARY KEY, modified int(1), customerNumber int(11), photo LONGBLOB, FOREIGN KEY (customerNumber) REFERENCES customers(customerNumber));")
    cnx.commit()
    cursor.execute("ALTER TABLE persons MODIFY PhotoID int(11) NOT NULL AUTO_INCREMENT")  # Trigger DTE by specifying int field constraints
    cnx.commit()
    cursor.execute("ALTER TABLE persons MODIFY photo LONGBLOB NOT NULL")  # Trigger DTE again
    cnx.commit()

    # Load Files into DB
    path = '/home/labuser/Pictures_small/'
    counter = 0
    photos = []
    for filename in glob.glob(os.path.join(path, "*.jpg")):
        photo = convertToBinaryData(filename)
        head, tail = os.path.split(filename)
        photos.append(base64.b64encode(photo))

    sql = "INSERT INTO persons (customerNumber, photo) VALUES (%s, %s)"

    for i in range(iterations):
        customerNumber = customers[random.randint(0, len(customers)-1)][0]  # Get a random customer from the list
        insert_tuple = (customerNumber, random.choice(photos).decode('UTF-8'))
        cursor.execute(sql, insert_tuple)
        cnx.commit()
        counter += 1

    print("Photos inserted: ", counter)
    counter = 0
    sql = "SELECT c.contactLastName, p.photo FROM persons p INNER JOIN customers c ON p.customerNumber = c.customerNumber WHERE p.customerNumber = '%s'"
    for x in range(iterations):
        customerNumber = customers[random.randint(0, len(customers)-1)][0]  # Get a random customer from the list
        print("Looking up customer number: ", customerNumber)
        cursor.execute(sql % customerNumber)
        counter += 1
        for entry in cursor.fetchall():
            print("Writing out:", entry[0])
            writeToFile(entry[0], entry[1])
        cnx.commit()

    print("Select statements executed: ", counter)

    sql = "UPDATE persons INNER JOIN customers ON persons.customerNumber = customers.customerNumber SET persons.modified = '1' WHERE persons.customerNumber = '%s'"
    for i in range(iterations):
        customerNumber = customers[random.randint(0, len(customers)-1)][0]  # Get a random customer from the list
        cursor.execute(sql % customerNumber)
        cnx.commit()

    cursor.execute("DROP TABLE IF EXISTS persons")


def trigger_document(cnx):
    print("Pilot Study Follow-up Test: Document data model")
    iterations = 25
    cursor = cnx.cursor()
    customers = get_customers(cnx)
    tables = []

    path = '/home/labuser/Pictures_tiny/'
    counter = 0
    photos = []
    for filename in glob.glob(os.path.join(path, "*.jpg")):
        photo = convertToBinaryData(filename)
        head, tail = os.path.split(filename)
        photos.append(base64.b64encode(photo))
    for i in range(iterations*2):  # Create a bunch of tables
        table = randomString().strip()
        entry = [table, ["Photo", "PersonID", "customerNumber", "neverUsed"]]
        tables.append(entry)
        cursor.execute("CREATE TABLE %s (Photo LONGBLOB, PersonID blob, customerNumber int(11), neverUsed blob, FOREIGN KEY (customerNumber) REFERENCES customers(customerNumber));" % table)
        cnx.commit()
    for i in range(iterations*2):  # Modify the tables to add a bunch of columns
        table = tables.pop(random.randint(0, len(tables)-1))  # Remove a random element from the list
        elements = random.randint(5, 15)  # Choose between 5 and 15 new columns to create
        for j in range(elements):
            column = randomString().strip()
            cursor.execute("ALTER TABLE %s ADD %s BLOB" % (table[0], column))
            cnx.commit()
            table[1].append(column)
        tables.append(table)
    print(tables)
    for i in range(iterations):  # Insert some data into the database
        table = tables.pop(random.randint(0, len(tables)-1))  # Get a random table from the list
        elements = random.randint(3, len(table[1]))  # Select the number of columns to pull from the table
        columns = []
        data = []
        for j in range(elements):
            print(j)
            columns.append(table[1][j])
        for k in range(elements+1):
            if k == 1:
                data.append(random.choice(photos).decode('UTF-8'))  # Insert a random photo
            elif k == 3:
                data.append(customers[random.randint(0, len(customers)-1)][0])  # Insert a random customer number
            else:
                data.append(randomString()*k*k)  # Create some text data to be stored
        data.pop(0)
        if (elements == 3):
            cursor.execute("INSERT INTO %s (%s, %s, %s) VALUES ('%s', '%s', '%s');" % (table[0], columns[0], columns[1], columns[2], data[0], data[1], data[2]))
        elif (elements == 4):
            cursor.execute("INSERT INTO %s (%s, %s, %s, %s) VALUES ('%s', '%s', '%s', '%s');" % (table[0], columns[0], columns[1], columns[2], columns[3], data[0], data[1], data[2], data[3]))
        elif (elements == 5):
            cursor.execute("INSERT INTO %s (%s, %s, %s, %s, %s) VALUES ('%s', '%s', '%s', '%s', '%s');" % (table[0], columns[0], columns[1], columns[2], columns[3], columns[4], data[0], data[1], data[2], data[3], data[4]))
        elif (elements == 6):
            cursor.execute("INSERT INTO %s (%s, %s, %s, %s, %s, %s) VALUES ('%s', '%s', '%s', '%s', '%s', '%s');" % (table[0], columns[0], columns[1], columns[2], columns[3], columns[4], columns[5], data[0], data[1], data[2], data[3], data[4], data[5]))
        elif (elements == 7):
            cursor.execute("INSERT INTO %s (%s, %s, %s, %s, %s, %s, %s) VALUES ('%s', '%s', '%s', '%s', '%s', '%s', '%s');" % (table[0], columns[0], columns[1], columns[2], columns[3], columns[4], columns[5], columns[6], data[0], data[1], data[2], data[3], data[4], data[5], data[6]))
        else:
            cursor.execute("INSERT INTO %s (%s, %s, %s, %s, %s, %s, %s, %s) VALUES ('%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s');" % (table[0], columns[0], columns[1], columns[2], columns[3], columns[4], columns[5], columns[6], columns[7], data[0], data[1], data[2], data[3], data[4], data[5], data[6], data[7]))
        cnx.commit()
        print("Inserting data: ", data[0])
        tables.append(table)
    for i in range(iterations):  # Select some data from the database
        table = tables.pop(random.randint(0, len(tables)-1))  # Remove a random element from the list
        elements = random.randint(3, len(table[1]))  # Select the number of columns to pull from the table (there should be at least 3 in every table)
        columns = []
        for j in range(elements):
            columns.append(table[1][j])  # TODO: Should check to make sure it isn't already used
        if (elements == 1):
            cursor.execute("SELECT x.%s, c.phone FROM %s x INNER JOIN customers c ON x.customerNumber = c.customerNumber" % (columns[0], table[0]))
        elif (elements == 2):
            cursor.execute("SELECT x.%s, x.%s, c.phone FROM %s x INNER JOIN customers c ON x.customerNumber = c.customerNumber" % (columns[0], columns[1], table[0]))
        elif (elements == 3):
            cursor.execute("SELECT x.%s, x.%s, x.%s, c.phone FROM %s x INNER JOIN customers c ON x.customerNumber = c.customerNumber" % (columns[0], columns[1], columns[2], table[0]))
        elif (elements == 4):
            cursor.execute("SELECT x.%s, x.%s, x.%s, x.%s, c.phone FROM %s x INNER JOIN customers c ON x.customerNumber = c.customerNumber" % (columns[0], columns[1], columns[2], columns[3], table[0]))
        elif (elements == 5):
            cursor.execute("SELECT x.%s, x.%s, x.%s, x.%s, x.%s, c.phone FROM %s x INNER JOIN customers c ON x.customerNumber = c.customerNumber" % (columns[0], columns[1], columns[2], columns[3], columns[4], table[0]))
        elif (elements == 6):
            cursor.execute("SELECT x.%s, x.%s, x.%s, x.%s, x.%s, x.%s, c.phone FROM %s x INNER JOIN customers c ON x.customerNumber = c.customerNumber" % (columns[0], columns[1], columns[2], columns[3], columns[4], columns[5], table[0]))
        elif (elements == 7):
            cursor.execute("SELECT x.%s, x.%s, x.%s, x.%s, x.%s, x.%s, x.%s, c.phone FROM %s x INNER JOIN customers c ON x.customerNumber = c.customerNumber" % (columns[0], columns[1], columns[2], columns[3], columns[4], columns[5], columns[6], table[0]))
        else:
            cursor.execute("SELECT x.%s, x.%s, x.%s, x.%s, x.%s, x.%s, x.%s, x.%s, c.phone FROM %s x INNER JOIN customers c ON x.customerNumber = c.customerNumber" % (columns[0], columns[1], columns[2], columns[3], columns[4], columns[5], columns[6], columns[7], table[0]))
        for row in cursor.fetchall():
            print(row)
        tables.append(table)

    for table in tables:  # Clean up
        cursor.execute("DROP TABLE IF EXISTS %s;" % table[0])


def trigger_graph(cnx):
    iterations = 50
    cursor = cnx.cursor()
    customers = get_customers(cnx)
    path = '/home/labuser/Pictures_tiny/'
    photos = []
    for filename in glob.glob(os.path.join(path, "*.jpg")):  # Load photos into data structure
        photo = convertToBinaryData(filename)
        head, tail = os.path.split(filename)
        photos.append(base64.b64encode(photo))
    try:
        cursor.execute("DROP TABLE IF EXISTS persons")
    except mysql.connector.errors.DatabaseError:
        pass

    cursor.execute("CREATE TABLE persons (photo LONGBLOB, PersonID int, LastName varchar(255), customerNumber int(11), FOREIGN KEY (customerNumber) REFERENCES customers(customerNumber));")
    for i in range(iterations):
        PID = random.randint(1, 100000)
        cursor.execute("ALTER TABLE persons DROP LastName;")
        cursor.execute("ALTER TABLE persons ADD LastName varchar(255);")
        cursor.execute("ALTER TABLE persons ADD NickName varchar(255);")
        cursor.execute("ALTER TABLE persons DROP NickName;")
        cursor.execute("INSERT INTO persons (photo, PersonID, customerNumber, LastName) VALUES ('%s', 1, '%s', 'Smith');" % (random.choice(photos).decode('UTF-8'), customers[random.randint(0, len(customers)-1)][0]))
    for i in range(iterations):
        cursor.execute("UPDATE persons INNER JOIN customers ON persons.customerNumber = customers.customerNumber set persons.LastName = customers.contactLastName WHERE persons.LastName LIKE 'Sm%'")
        cursor.execute("SELECT p.LastName, c.phone FROM persons p INNER JOIN customers c ON p.customerNumber = c.customerNumber WHERE p.LastName LIKE 'S%' ORDER BY RAND()")
        cursor.fetchall()
        cursor.execute("UPDATE persons INNER JOIN customers ON persons.customerNumber = customers.customerNumber set persons.LastName = customers.contactLastName WHERE persons.LastName LIKE 'Sm%'")
        cnx.commit()
    cursor.execute("DROP TABLE IF EXISTS persons")


def trigger_column(cnx):
    iterations = 25
    cursor = cnx.cursor()
    customers = get_customers(cnx)
    path = '/home/labuser/Pictures_small/'
    photos = []
    for filename in glob.glob(os.path.join(path, "*.jpg")):  # Load photos into data structure
        photo = convertToBinaryData(filename)
        head, tail = os.path.split(filename)
        photos.append(base64.b64encode(photo))

    try:
        cursor.execute("DROP TABLE IF EXISTS persons")
    except mysql.connector.errors.DatabaseError:
        pass

    cursor.execute("CREATE TABLE persons (photo LONGBLOB, PersonID int, customerNumber int(11), FOREIGN KEY (customerNumber) REFERENCES customers(customerNumber));")
    for i in range(iterations):
        cursor.execute("ALTER TABLE persons ADD LastName varchar(255);")
        cursor.execute("ALTER TABLE persons ADD NickName varchar(255);")
        cursor.execute("INSERT INTO persons (photo, PersonID, customerNumber, LastName) VALUES ('%s', 1, '%s', 'Smith');" % (random.choice(photos).decode('UTF-8'), customers[random.randint(0, len(customers)-1)][0]))
        cursor.execute("INSERT INTO persons (photo, PersonID, customerNumber, LastName) VALUES ('%s', 2, '%s', 'Beach');" % (random.choice(photos).decode('UTF-8'), customers[random.randint(0, len(customers)-1)][0]))
        cursor.execute("UPDATE persons set LastName = 'Smithy' WHERE LastName LIKE 'Sm%'")
        cnx.commit()
        cursor.execute("SELECT photo, customerNumber FROM persons WHERE LastName LIKE 'Smit%'")
        cursor.fetchall()
        cursor.execute("ALTER TABLE persons DROP LastName;")
        cursor.execute("ALTER TABLE persons DROP NickName;")
    cursor.execute("DROP TABLE IF EXISTS persons")


def randomString(stringLength=12):
    # pynative.com/python-generate-random-string/
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for i in range(stringLength))


def trigger_key_value(cnx):
    customers = get_customers(cnx)  # Get list of valid customer id's
    cursor = cnx.cursor()

    path = '/home/labuser/Pictures_small/'

    try:
        cursor.execute("DROP TABLE IF EXISTS persons")
    except mysql.connector.errors.DatabaseError:
        pass

    cursor.execute("CREATE TABLE persons (customerNumber int, photo LONGBLOB, FOREIGN KEY (customerNumber) REFERENCES customers(customerNumber));")
    cnx.commit()

    usedNumbers = []  # Place to store the numbers that were actually used
    for filename in glob.glob(os.path.join(path, "*.jpg")):
        photo = convertToBinaryData(filename)
        customerNumber = customers[random.randint(0, len(customers)-1)][0]  # Get a random customer from the list
        cursor.execute("INSERT INTO persons (customerNumber, photo) VALUES ('%s', '%s');" % (customerNumber, base64.b64encode(photo).decode('utf-8')))
        cnx.commit()
        usedNumbers.append(customerNumber)

    for i in range(200):
        custNum = usedNumbers[random.randint(0, len(usedNumbers)-1)]  # Super not pythonic
        cursor.execute("SELECT * FROM persons WHERE customerNumber = '%s';" % custNum)
        cursor.fetchall()
        cnx.commit()

    cursor.execute("DROP TABLE IF EXISTS persons")


def get_customers(cnx):
    cursor = cnx.cursor()
    cursor.execute("SELECT customerNumber FROM customers")
    customers = []
    for customer in cursor.fetchall():  # Get a list of all the customers in the db
        customers.append(customer)
    return customers


def main(datamodel):
    cnx = mysql.connector.connect(**config)

    if (datamodel == "col"):
        trigger_column(cnx)
    elif (datamodel == "gra"):
        trigger_graph(cnx)
    elif (datamodel == "rel"):
        trigger_relational(cnx)
    elif (datamodel == "doc"):
        trigger_document(cnx)
    elif (datamodel == "kv"):
        trigger_key_value(cnx)
    else:
        print("ERROR: No data model specified. Exiting.")


if __name__ == '__main__':
    sys.exit(main(sys.argv[1]))
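The script is driven entirely by its single command-line argument, which selects one of the five workload generators above. The following hypothetical driver is a minimal sketch of how the listing, saved as classic_agent_image.py, could be exercised programmatically; it assumes the MySQL host and credentials in config are reachable and that the classicmodels sample database is loaded.

# Hypothetical driver for the listing above (saved as classic_agent_image.py).
import classic_agent_image as agent

for model in ("rel", "doc", "col", "gra", "kv"):   # keywords recognized by main()
    agent.main(model)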

Appendix C. phpbb agent.py code

#!/usr/bin/python3
# File: agent.py
# Date: 4 Sept 19
# Author: Paul Beach, [email protected]
# This script simulates a user browsing the phpbb web application

from __future__ import print_function
import random, time
from selenium import webdriver
from loremipsum import get_sentences
import re
import sys


def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)


def fix_loremipsum(loremipsum_string):
    # Fix issue in loremipsum library
    # From https://stackoverflow.com/questions/50133478/byte-prefix-in-python-string
    loremipsum_string = re.sub("B'(.*?)'", lambda x: x.group(1).title(), loremipsum_string)
    loremipsum_string = re.sub("b'(.*?)'", lambda x: x.group(1), loremipsum_string)
    return loremipsum_string


def login(driver):
    # Open the website
    print("Opening browser...")
    driver.get('http://192.168.2.114')
    time.sleep(.5)

    # Find the username and password boxes, enter the data and submit
    print("Logging in...")
    time.sleep(1.5)
    id_box = driver.find_element_by_id('username')
    id_box.send_keys('admin')
    pass_box = driver.find_element_by_id('password')
    pass_box.send_keys('password')
    login_button = driver.find_element_by_name('login')
    login_button.click()


def logout(driver):
    # Close the browser to clear the session
    print("Closing browser...")
    driver.close()
    time.sleep(1.5)


def read_subforum(driver):
    # Select an existing subforum and enter it
    print("Entering subforum...")
    time.sleep(1.5)
    forum = driver.find_element_by_link_text('Your first forum')
    forum.click()


def new_thread(driver):
    # Create a new topic
    print("Creating topic...")
    subject = "Random thread from agent.py: #" + str(random.randint(10000, 99999))  # Generate random subject number
    sentences = get_sentences(6, False)  # Generate random message text
    text = ''
    for i in sentences:
        text = text + ' ' + fix_loremipsum(i)
    time.sleep(1.5)
    driver.get("http://192.168.2.114/posting.php?mode=post&f=2")
    subject_box = driver.find_element_by_id('subject')
    subject_box.send_keys(subject)
    message_box = driver.find_element_by_id('message')
    message_box.send_keys(text)
    submit_button = driver.find_element_by_name('post')
    submit_button.click()


def read_thread(driver):
    driver.get('http://192.168.2.114/viewtopic.php?f=2&t=' + str(random.randint(1, 1276)))


def new_comment(driver):
    read_thread(driver)
    print("Creating comment...")
    time.sleep(1.5)
    sentences = get_sentences(6, False)  # Generate random message text
    text = ''
    for i in sentences:
        text = text + ' ' + fix_loremipsum(i)
    time.sleep(1.5)
    new_post = driver.find_element_by_link_text('Post Reply')
    new_post.click()
    message_box = driver.find_element_by_id('message')
    message_box.send_keys(text)
    submit_button = driver.find_element_by_name('post')
    submit_button.click()


def main():
    seed = sys.argv[1]
    random.seed(seed)
    driver = webdriver.Chrome()
    login(driver)
    actions_to_complete = random.randint(30, 40)  # Set the duration of the user session between 30 and 40 actions
    num_comments = round(actions_to_complete * random.uniform(.1, .3))  # Default values: .01 - .03
    num_threads = round(actions_to_complete * random.uniform(.1, .3))  # Default values: .01 - .03
    num_subforum_reads = round(actions_to_complete * random.uniform(.10, .15))  # Default values: .10 - .20
    num_thread_reads = actions_to_complete - num_comments - num_threads - num_subforum_reads

    print("Actions to complete: ", actions_to_complete + 2)
    print("Subforums to read: ", num_subforum_reads)
    print("Threads to read: ", num_thread_reads)
    print("Threads to create: ", num_threads)
    print("Comments to create: ", num_comments)

    if (num_thread_reads < 1):
        num_thread_reads = 1
    if (num_subforum_reads < 1):
        num_subforum_reads = 1

    read_subforum(driver)  # Need to start off with opening the subforum

    for x in range(num_threads):  # If we're going to make a new post, now is a good time
        new_thread(driver)

    for x in range(num_subforum_reads - 1):
        read_subforum(driver)

    read_thread(driver)  # Need to read a thread to post

    for x in range(num_comments):
        new_comment(driver)

    for x in range(num_thread_reads - 1):
        read_thread(driver)
    logout(driver)
    eprint(actions_to_complete + 2)  # Send number of completed actions to stderr (include log in and log out)
    print(actions_to_complete + 2)
    print("Totals:", str(int(actions_to_complete + 2)), ",", str(num_subforum_reads), ",", str(num_thread_reads), ",", str(num_threads), ",", str(num_comments))


main()
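Because the action mix is derived entirely from the seeded random number generator, a session's composition can be recomputed offline from the same seed. The following standalone sketch mirrors the order of random draws in main() above (it omits the minimum-of-one adjustments applied later in main()); the seed value shown is hypothetical.

import random

def action_mix(seed):
    # Rebuild the session composition that main() derives from a given seed.
    random.seed(seed)
    actions = random.randint(30, 40)
    comments = round(actions * random.uniform(.1, .3))
    threads = round(actions * random.uniform(.1, .3))
    subforum_reads = round(actions * random.uniform(.10, .15))
    thread_reads = actions - comments - threads - subforum_reads
    return actions + 2, subforum_reads, thread_reads, threads, comments

print(action_mix("42"))   # hypothetical seed; same sequence of draws as main()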

Appendix D. elgg_agent.py code

#!/usr/bin/python3
# File: elgg_agent.py
# Date: 23 Mar 20
# Author: Paul Beach, [email protected]
# This script simulates a user browsing the Elgg web application

from __future__ import print_function
import random, time
import mysql.connector
import re
import sys
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import NoSuchElementException
from loremipsum import generate_paragraph, get_sentences

config = {  # DB variables
    'user': 'elgg',
    'password': 'password',
    'host': '192.168.2.112',
    'database': 'elgg',
    'raise_on_warnings': True
}


def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)


def enumerate_users():
    # Query DB to enumerate all current users
    cnx = mysql.connector.connect(**config)
    cursor = cnx.cursor()
    cursor.execute("SELECT username FROM elgg_users_entity")
    users = []
    for user in cursor.fetchall():  # Get a list of all the usernames in the db
        users.append(user[0])
    return users


def find_friends(user, driver):
    friends = []
    driver.get('http://192.168.2.115/friendsof/' + user)
    print("Finding friends of " + user)
    more = True  # Bool to determine if there are additional friends to find, assume true at start
    while (more):
        for x in range(10):
            try:
                iterator = '/html/body/div[1]/div[5]/div/div/div[1]/ul[1]/li[' + str(x + 1) + ']/div/div[2]/h3/a'
                friend = driver.find_element_by_xpath(iterator)
                print("Friend of " + user + ": " + friend.text)
                if ("Admin" in friend.text):  # Catch case where admin is in friends list
                    friends.append("admin")
                else:
                    number = friend.text.split(" ")[-1]  # Parse out username
                    parsed_name = "labuser" + number
                    friends.append(parsed_name)
            except NoSuchElementException:  # End of list found
                break
        try:
            next_btn = driver.find_element_by_partial_link_text("Next")  # load next page
            next_btn.click()
        except NoSuchElementException:
            more = False  # Already on last page, stop working
    return friends


def fix_loremipsum(loremipsum_string):
    # Fix issue in loremipsum library
    # From https://stackoverflow.com/questions/50133478/byte-prefix-in-python-string
    loremipsum_string = re.sub("B'(.*?)'", lambda x: x.group(1).title(), loremipsum_string)
    loremipsum_string = re.sub("b'(.*?)'", lambda x: x.group(1), loremipsum_string)
    return loremipsum_string


def login(user, driver):
    # Log a user in (only need username, password always 'password')
    print("Logging in: " + user)
    try:
        id_box = driver.find_element_by_xpath('/html/body/div[1]/div[4]/div/div/div[2]/div/div[2]/form/fieldset/div[1]/input')
        id_box.send_keys(user)
        pass_box = driver.find_element_by_xpath('/html/body/div[1]/div[4]/div/div/div[2]/div/div[2]/form/fieldset/div[2]/input')
        pass_box.send_keys('password')
        login_button = driver.find_element_by_xpath('/html/body/div[1]/div[4]/div/div/div[2]/div/div[2]/form/fieldset/div[3]/div/input')
        login_button.click()
    except NoSuchElementException:  # Already logged in
        print("Error logging in... user already logged in")


def make_friends(user1, user2, driver):
    login(user1, driver)
    driver.get('http://192.168.2.115/profile/' + user2)
    print("Making friends: " + user1 + ' and ' + user2)
    try:
        add_friend_button = driver.find_element_by_link_text('Add friend')
        add_friend_button.click()
    except NoSuchElementException:
        print("Error, " + user1 + " and " + user2 + " are already friends. Skipping.")


def open_session():
    # Opens a new connection
    options = webdriver.ChromeOptions()
    options.add_argument("window-size=1920x1080")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options)
    driver.get('http://192.168.2.115')  # Open the website
    return driver


def new_comment(driver):
    # Generates a random comment to an existing thread
    driver.get('http://192.168.2.115/activity')
    last_page = driver.find_element_by_xpath("//a[contains(text(),'Next')]/../preceding-sibling::li[1]/a")  # Find button before last page
    href = last_page.get_attribute('href')
    page_count = int(href.split('=', 1)[-1].rstrip('#'))  # Strip the url that looks like http://192.168.2.115/activity?offset=220# to just the number
    select_page = random.randint(1, page_count-20)  # Choose a random page from the activity list, skip last page to avoid guessing at non-existent topic
    driver.get('http://192.168.2.115/activity?offset=' + str(select_page) + '#')  # Load that page of topics
    random_topic = driver.find_element_by_xpath('/html/body/div[1]/div[5]/div/div/div[1]/ul[2]/li[' + str(random.randint(1, 20)) + ']/div/div[2]/div[1]/a[2]')
    while ("http://192.168.2.115/profile" in random_topic.get_attribute('href')):  # Make sure it is a page that can be commented on
        select_page = random.randint(1, page_count-20)  # Choose a random page from the activity list, skip last page to avoid guessing at non-existent topic
        print('http://192.168.2.115/activity?offset=' + str(select_page) + '#')
        driver.get('http://192.168.2.115/activity?offset=' + str(select_page) + '#')  # Load that page of topics
        random_topic = driver.find_element_by_xpath('/html/body/div[1]/div[5]/div/div/div[1]/ul[2]/li[' + str(random.randint(1, 20)) + ']/div/div[2]/div[1]/a[2]')
    random_topic.click()
    sentences = get_sentences(2, True)  # Generate lorem ipsum text
    text = ''
    print("Creating comment")
    for i in sentences:
        text = text + ' ' + fix_loremipsum(i)
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.TAG_NAME, "iframe")))
    driver.switch_to.frame(driver.find_element_by_tag_name("iframe"))  # Find the iframe
    text_box = driver.find_element_by_xpath('/html/body/p')
    text_box.send_keys(text)
    driver.switch_to.default_content()
    submit_button = driver.find_element_by_xpath('/html/body/div[1]/div[5]/div/div/div[1]/div[3]/form/fieldset/div[2]/input[2]')
    submit_button.click()


def new_page(driver):
    title = "Random page from elgg_agent.py: #" + str(random.randint(10000, 99999))  # Generate random title number
    sentences = get_sentences(6, True)  # Generate lorem ipsum text
    text = ''
    for i in sentences:
        text = text + ' ' + fix_loremipsum(i)
    print("Creating page: " + title)
    driver.get('http://192.168.2.115/pages/all')
    add_page_button = driver.find_element_by_link_text('Add a page')
    add_page_button.click()
    title_box = driver.find_element_by_xpath('/html/body/div[1]/div[5]/div/div/div[1]/form/fieldset/div[1]/input')
    title_box.send_keys(title)
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.TAG_NAME, "iframe")))
    driver.switch_to.frame(driver.find_element_by_tag_name("iframe"))  # Find the iframe
    text_box = driver.find_element_by_xpath('/html/body/p')
    text_box.send_keys(text)
    driver.switch_to.default_content()
    submit_button = driver.find_element_by_xpath('/html/body/div[1]/div[5]/div/div/div[1]/form/fieldset/div[6]/input[3]')
    submit_button.click()


def new_user(driver):
    users = enumerate_users()
    number = str(random.randint(1, 999999))
    user = 'labuser' + number
    while user in users:
        number = str(random.randint(1, 999999))
        user = 'labuser' + number
    print("Creating new user: " + user)
    display_name = "Lab User " + number
    driver.get('http://192.168.2.115/register')
    display_box = driver.find_element_by_xpath('/html/body/div[1]/div[4]/div/div/div/form/fieldset/div[1]/input')
    display_box.send_keys(display_name)
    email_box = driver.find_element_by_xpath('/html/body/div[1]/div[4]/div/div/div/form/fieldset/div[2]/input')
    email_box.send_keys(user + '@admin.com')
    user_box = driver.find_element_by_xpath('/html/body/div[1]/div[4]/div/div/div/form/fieldset/div[3]/input')
    user_box.send_keys(user)
    pass1_box = driver.find_element_by_xpath('/html/body/div[1]/div[4]/div/div/div/form/fieldset/div[4]/input')
    pass1_box.send_keys('password')
    pass2_box = driver.find_element_by_xpath('/html/body/div[1]/div[4]/div/div/div/form/fieldset/div[5]/input')
    pass2_box.send_keys('password')
    register_button = driver.find_element_by_xpath('/html/body/div[1]/div[4]/div/div/div/form/fieldset/div[6]/div/input')
    register_button.click()
    return user


def read_page(page, driver):
    driver.get('http://192.168.2.115/pages/view/' + str(page) + '/')
    print("Reading page: " + str(page))


def read_profile(user, driver):
    driver.get('http://192.168.2.115/profile/' + user)
    print("Reading profile: " + user)


def main():
    seed = sys.argv[1]
    random.seed(seed)
    driver = open_session()
    user = new_user(driver)  # Create a new user
    users = enumerate_users()

    actions_to_complete = random.randint(40, 60)  # Set the duration of the user session to approximately 40 to 60 actions
    num_make_friends = round(actions_to_complete * random.uniform(.15, .25))  # Default .04-.06
    num_read_profile = round(actions_to_complete * random.uniform(.04, .06))  # Default .07-.13
    num_read_page = round(actions_to_complete * random.uniform(.04, .06))  # Default .25-.35
    num_find_friends = round(actions_to_complete * random.uniform(.55, .65))  # Default .07-.13
    num_comment_post = round(actions_to_complete * random.uniform(.04, .06))  # Default .25-.35
    num_create_page = round(actions_to_complete * random.uniform(.04, .06))  # Default .07-.13

    print("Approx actions to complete: ", actions_to_complete + 1)
    print("Make friends: ", num_make_friends)
    print("Read profiles: ", num_read_profile)
    print("Read pages: ", num_read_page)
    print("Find friends: ", num_find_friends)
    print("Comment posts: ", num_comment_post)
    print("Create pages: ", num_create_page)

    existing_users = random.sample(users, num_make_friends)
    for existing_user in existing_users:
        make_friends(user, existing_user, driver)

    existing_users = random.sample(users, num_read_profile)
    for existing_user in existing_users:
        read_profile(existing_user, driver)

    for x in range(num_read_page):
        read_page(random.randint(100, 8200), driver)

    for x in range(num_find_friends):
        find_friends(existing_users[0], driver)

    for x in range(num_comment_post):
        new_comment(driver)

    for x in range(num_create_page):
        new_page(driver)

    driver.close()
    actions_completed = 1 + num_make_friends + num_read_profile + num_read_page + num_find_friends + num_comment_post + num_create_page  # Find out how many we actually did

    eprint(actions_completed)
    print("Totals:", str(int(actions_completed)), ",", str(num_make_friends), ",", str(num_read_profile), ",", str(num_read_page), ",", str(num_find_friends), ",", str(num_comment_post), ",", str(num_create_page))


main()
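Each invocation of the script above simulates a single user session, seeded from the command line. The following hypothetical batch driver is a minimal sketch of how several sessions could be run back to back; it assumes the listing is saved as elgg_agent.py, that the Elgg instance at 192.168.2.115 and the MySQL host in config are reachable, and that chromedriver is on the PATH.

# Hypothetical batch driver: one simulated Elgg user session per seed value.
import subprocess

for seed in range(1, 6):
    subprocess.run(["python3", "elgg_agent.py", str(seed)], check=True)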

Appendix E. powerdns_agent.py code

#!/usr/bin/python3
# File: powerdns_agent.py
# Date: 29 Mar 20
# Author: Paul Beach, [email protected]
# This script simulates a user interacting with the PowerDNS authoritative DNS server

import os
import subprocess
import mysql.connector
import random
import string
import socket
import dns.resolver
import time

config = {  # DB variables
    'user': 'power_admin',
    'password': 'password',
    'host': '192.168.2.112',
    'database': 'pdns',
    'raise_on_warnings': True
}


def get_all_domains():
    cnx = mysql.connector.connect(**config)
    cursor = cnx.cursor()
    cursor.execute("SELECT name FROM domains")
    domains = []
    for domain in cursor.fetchall():  # Get a list of all the domain names in the db
        domains.append(domain[0])
    return domains


def generate_new_domain(domains):
    domain = randomString() + '.com'
    while (domain in domains):  # Avoid duplicates
        domain = randomString() + '.com'
    hostname = randomString()  # Create some random data
    ip_addr1 = str(random.randint(10, 222)) + '.' + str(random.randint(0, 254)) + '.' + str(random.randint(0, 254)) + '.' + str(random.randint(0, 254))
    ip_addr2 = str(random.randint(10, 222)) + '.' + str(random.randint(0, 254)) + '.' + str(random.randint(0, 254)) + '.' + str(random.randint(0, 254))
    print("Inserting new domain: " + domain)
    command = 'pdnsutil create-zone ' + domain + ' ns1.' + domain
    os.system(command)
    command = 'pdnsutil add-record ' + domain + ' ns1 A ' + ip_addr1
    os.system(command)
    command = 'pdnsutil add-record ' + domain + ' ' + hostname + ' A ' + ip_addr2
    os.system(command)
    command = 'pdnsutil add-record ' + domain + ' @ MX "10 mx.' + hostname + '"'
    os.system(command)
    domains.append(domain)
    return domains


def query_random_domain(domains):
    domain = random.sample(domains, 1)
    os.system('dig axfr ' + domain[0] + ' @127.0.0.1')


def randomString(stringLength=8):
    # https://pynative.com/python-generate-random-string/
    """Generate a random string of fixed length"""
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for i in range(stringLength))


def main():
    domains = get_all_domains()
    for i in range(20):
        domains = generate_new_domain(domains)
        time.sleep(.5)
    for i in range(10):
        query_random_domain(domains)
        time.sleep(.5)


main()
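The listing above imports dns.resolver but issues its zone transfers by shelling out to dig and pdnsutil, so it must run on the PowerDNS host itself. As a point of comparison, the sketch below shows an in-process lookup of one of the generated A records using dnspython; it assumes the authoritative server answers on 127.0.0.1, and the domain name shown is hypothetical.

# Sketch: resolve a generated record in-process with dnspython instead of dig.
import dns.resolver

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["127.0.0.1"]                     # query the local authoritative server
answer = resolver.resolve("ns1.exampledomain.com", "A")  # resolve() in dnspython 2.x; query() in 1.x
for record in answer:
    print(record.address)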

Bibliography

Afshari, A., Mojahed, M., & Yusuff, R. M. (2010). Simple Additive Weighting approach to Personnel Selection problem. International Journal of Innovation, 1 (5), 5.

Allen, M. (2015). RDBMS Are Not Designed for Heterogeneous Data. Retrieved September 17, 2019, from https://www.marklogic.com/blog/relational-databases-heterogeneous-data/

Baker, M. (2011). Search vs. Query. Retrieved September 3, 2019, from https://everypageispageone.com/2011/07/13/search-vs-query/

Beach, P. M., Langhals, B. T., Grimaila, M. R., Hodson, D. D., & Engle, R. D. L. (2019). Developing a Methodology for the Identification of Alternative NoSQL Data Models via Observation of Relational Database Usage, In Proceedings of the 18th Int'l Conf on Information & Knowledge Engineering.

Burr, L., & Spennemann, D. H. (2004). Patterns of User Behavior in University Online Forums. International Journal of Instructional Technology and Distance Learning, 1 (10), 11–28.

Cao, Q., Liang, Z., Fan, Y., & Meng, X. (2012). A Flash-Based Decomposition Storage Model. In D. Hutchison, T. Kanade, J. Kittler, J. M. Kleinberg, F. Mattern, J. C. Mitchell, M. Naor, O. Nierstrasz, C. Pandu Rangan, B. Steffen, M. Sudan, D. Terzopoulos, D. Tygar, M. Y. Vardi, G. Weikum, H. Yu, G. Yu, W. Hsu, Y.-S. Moon, . . . J. Yoo (Eds.), Database Systems for Advanced Applications. Berlin, Heidelberg, Springer Berlin Heidelberg. Retrieved September 15, 2019, from http://link.springer.com/10.1007/978-3-642-29023-7_8

Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., & Gruber, R. E. (2008). Bigtable: A Distributed Storage System for Structured Data. ACM Transactions on Computer Systems, 26 (2), 1–26. Retrieved January 9, 2019, from http://portal.acm.org/citation.cfm?doid=1365815.1365816

Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13 (6), 377–387. Retrieved February 27, 2019, from http://portal.acm.org/citation.cfm?doid=362384.362685

Cooper, M. D. (2001). Usage Patterns of a Web-Based Library Catalog. Journal of the American Society for Information Science and Technology, 52 (2), 137–148.

Copeland, G., & Khoshafian, S. (1985). A Decomposition Storage Model. ACM SIGMOD Record, 14 (4), 268–279.

DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., & Vogels, W. (2007). Dynamo: Amazon’s Highly Available Key-value Store, 16.

Dugane, R. A., & Raut, A. B. (2014). A Survey on Big Data in Real Time. International Journal on Recent and Innovation Trends in Computing and Communication, 2 (4), 4.

Elmasri, R., & Navathe, S. (2016). Fundamentals of database systems (Seventh edition). Hoboken, NJ, Pearson.

Engle, R. D. L. (2018). A Methodology for Evaluating Relational and NoSQL Databases for Small-scale Storage and Retrieval (Doctoral dissertation). Air Force Institute of Technology.

Engle, R. D. L., Langhals, B. T., Grimaila, M. R., & Hodson, D. D. (2018). The Case for NoSQL on a Single Desktop, In Proceedings of the 17th Int’l Conf on Information & Knowledge Engineering, Las Vegas, NV, CSREA Press.

Evans, E. (2009). NoSQL: What's in a name? Retrieved September 8, 2019, from http://blog.sym-link.com/posts/2009/30/nosql_whats_in_a_name/

Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35 (2), 137–144. Retrieved January 9, 2019, from https://linkinghub.elsevier.com/retrieve/pii/S0268401214001066

Hecht, R., & Jablonski, S. (2011). NoSQL evaluation: A use case oriented survey, In 2011 International Conference on Cloud and Service Computing, Hong Kong, China, IEEE. Retrieved May 12, 2019, from http://ieeexplore.ieee.org/document/6138544/

Hernandez, M. J. (1997). Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design. Reading, Massachusetts, Addison-Wesley Developers Press.

Jaccard, P. (1912). The Distribution of the Flora in the Alpine Zone. New Phytologist, 11 (2), 37–50. Retrieved April 20, 2020, from https://nph.onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-8137.1912.tb05611.x

Khazaei, H., Litoiu, M., Gaikwad, P., Shtern, M., Ramprasad, B., Beigi-Mohammadi, N., Zareian, S., & Fokaefs, M. (2016). How do I choose the right NoSQL solution? A comprehensive theoretical and experimental survey. Big Data and Information Analytics, 1 (2/3), 185–216. Retrieved May 12, 2019, from http://www.aimsciences.org/journals/displayArticlesnew.jsp?paperID=12989

Kleppmann, M. (2017). Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. Sebastopol, CA, O'Reilly Media.

Lu, J., Liu, Z. H., Xu, P., & Zhang, C. (2018). UDBMS: Road to Unification for Multi-model Data Management, XI’an, China, Springer.

McCreary, D., & Kelly, A. (2014). Making sense of NoSQL: A guide for managers and the rest of us. Shelter Island, Manning.

Melton, J. (2003). ISO/IEC 9075-2:2016 (SQL/foundation). ISO standard.

mysqltutorial.org. (2020). MySQL Sample Database. Retrieved June 26, 2020, from https://www.mysqltutorial.org/mysql-sample-database.aspx/

NIST Big Data Public Working Group Definitions and Taxonomies Subgroup. (2015). NIST Big Data Interoperability Framework: Volume 1, Definitions (tech. rep. NIST SP 1500-1). National Institute of Standards and Technology. Retrieved January 9, 2019, from https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1500-1.pdf

Percona, LLC. (2019). Percona Toolkit. Retrieved September 4, 2019, from https://www.percona.com/software/database-tools/percona-toolkit

Percona, LLC. (2020a). [PT-1852] pt-query-digest: Use of uninitialized value $types[0] - Percona JIRA. Retrieved May 30, 2020, from https://jira.percona.com/browse/PT-1852

Percona, LLC. (2020b). Pt-query-digest. Retrieved July 23, 2020, from https://www.percona.com/doc/percona-toolkit/LATEST/pt-query-digest.html

phpBB Limited. (2020). phpBB • Free and Open Source Forum Software. Retrieved April 27, 2020, from https://www.phpbb.com/

Pratt, P. (2005). A Guide to SQL (Seventh). Boston, MA, Thomson Course Technology.

Redmond, E., & Wilson, J. R. (2012). Seven Databases in Seven Weeks. The Pragmatic Bookshelf.

Refsnes Data. (2019a). SQL COUNT(), AVG() and SUM() Functions. Retrieved September 16, 2019, from https://www.w3schools.com/sql/sql_count_avg_sum.asp

Refsnes Data. (2019b). SQL Keywords Reference. Retrieved September 16, 2019, from https://www.w3schools.com/sql/sql_ref_keywords.asp

Saaty, T. L. (1980). The Analytic Hierarchy Process. New York, McGraw-Hill.

Sadalage, P. J., & Fowler, M. (2013). NoSQL distilled: A brief guide to the emerging world of polyglot persistence. Upper Saddle River, NJ, Addison-Wesley.

Sahir, S. H., Rosmawati, R., & Minan, K. (2017). Simple Additive Weighting Method to Determining Employee Salary Increase Rate, 7.

Salam, A., & Stevens, J. (Eds.). (2007). Semantic Web Technologies and E-Business: Toward the Integrated Virtual Organization and Business Process Automation. IGI Global. Retrieved July 22, 2020, from http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/978-1-59904-192-6

Sawant, N., & Shah, H. (2013). Big Data Application Architecture Q&A. New York, Apress.

Schroeck, M., Shockley, R., Smart, J., Romero-Morales, D., & Tufano, P. (2012). Analytics: The Real-world Use of Big Data (tech. rep.). IBM Institute for Business Value.

Singh, D., & Reddy, C. K. (2015). A survey on platforms for big data analytics. Journal of Big Data, 2 (1), 8. Retrieved September 8, 2020, from http://www.journalofbigdata.com/content/2/1/8

solid IT. (2019). DB-Engines Ranking. Retrieved May 7, 2019, from https://db-engines.com/en/ranking

Srivastava, K., & Shekokar, N. (2016). A Polyglot Persistence approach for E-Commerce business model, In 2016 International Conference on Information Science (ICIS).

Sullivan, D. (2015). NoSQL for mere mortals. Hoboken, NJ, Addison-Wesley.

Vafaei, N., Ribeiro, R. A., & Camarinha-Matos, L. M. (2016). Normalization Techniques for Multi-Criteria Decision Making: Analytical Hierarchy Process Case Study. In L. M. Camarinha-Matos, A. J. Falcão, N. Vafaei, & S. Najdi (Eds.), Technological Innovation for Cyber-Physical Systems. Cham, Springer International Publishing. Retrieved April 20, 2020, from http://link.springer.com/10.1007/978-3-319-31165-4_26

Vargas, K. (2019). The Main NoSQL Database Types. Retrieved September 13, 2019, from https://studio3t.com/knowledge-base/articles/nosql-database-types/

Venkatraman, S., Fahd, K., Venkatraman, R., & Kaspi, S. (2016). SQL Versus NoSQL Movement with Big Data Analytics. International Journal of Information Technology and Computer Science, 8 (12), 59–66. Retrieved March 9, 2020, from http://www.mecs-press.org/ijitcs/ijitcs-v8-n12/v8n12-7.html

Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., . . . van Mulbregt, P. (2020). SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17 (3), 261–272. Retrieved July 27, 2020, from http://www.nature.com/articles/s41592-019-0686-2

Wijayanto, S., Napitupulu, D., Adiyarta, K., & Windarto, A. P. (2019). Decision Support System of New Student Admission Using Analytical Hierarchy Process and Simple Additive Weighting Methods. Journal of Physics: Conference Series, 1255, 012054. Retrieved July 23, 2020, from https://iopscience.iop.org/article/10.1088/1742-6596/1255/1/012054

Worboys, M. F. (1999). Relational databases and beyond. Geographical information systems, 1, 373–384.

Xie, H. (2003). Supporting ease-of-use and user control: Desired features and structure of Web-based online IR systems. Information Processing & Management, 39 (6), 899–922. Retrieved May 13, 2019, from https://linkinghub.elsevier.com/retrieve/pii/S0306457302000420

Xiong, H., Fowley, F., Pahl, C., & Moran, N. (2014). Scalable Architectures for Platform-as-a-Service Clouds: Performance and Cost Analysis. In P. Avgeriou & U. Zdun (Eds.), Software Architecture. Cham, Springer International Publishing. Retrieved July 23, 2020, from http://link.springer.com/10.1007/978-3-319-09970-5_21

Yang, F., Milosevic, D., & Cao, J. (2015). An Evolutionary Algorithm for Column Family Schema Optimization in HBase, In 2015 IEEE First International Conference on Big Data Computing Service and Applications.

Yoon, K., & Hwang, C.-L. (1995). Multiple Attribute Decision Making: An Introduction. Thousand Oaks, CA, Sage Publications.

REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704–0188)

The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Department of Defense, Washington Headquarters Services, Directorate for Information Operations and Reports (0704–0188), 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202–4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS.

1. REPORT DATE (DD–MM–YYYY): 9–09–2020
2. REPORT TYPE: Doctoral Dissertation
3. DATES COVERED (From — To): October 2017 – September 2020
4. TITLE AND SUBTITLE: A Methodology to Identify Alternative Suitable NoSQL Data Models via Observation of Relational Database Interactions
5a. CONTRACT NUMBER:
5b. GRANT NUMBER: 18RT0095
5c. PROGRAM ELEMENT NUMBER:
5d. PROJECT NUMBER:
5e. TASK NUMBER:
5f. WORK UNIT NUMBER:
6. AUTHOR(S): Beach, Paul M., Maj, USAF
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Air Force Institute of Technology, Graduate School of Engineering and Management (AFIT/EN), 2950 Hobson Way, WPAFB OH 45433-7765
8. PERFORMING ORGANIZATION REPORT NUMBER: AFIT-ENV-DS-20-S-056
9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES): Air Force Research Laboratory, Dr. Erik P. Blasch, 875 N. Randolph St., Ste. 325, Arlington, VA 22203, [email protected]
10. SPONSOR/MONITOR'S ACRONYM(S): AFRL/AFOSR/RTA
11. SPONSOR/MONITOR'S REPORT NUMBER(S):
12. DISTRIBUTION / AVAILABILITY STATEMENT: DISTRIBUTION STATEMENT A: APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.
13. SUPPLEMENTARY NOTES: This work is declared a work of the U.S. Government and is not subject to copyright protection in the United States.
14. ABSTRACT: The effectiveness and performance of data-intensive applications are influenced by the data models upon which they are built. The relational data model has been the de facto data model underlying most database systems since the 1970's, but the recent emergence of NoSQL data models has provided users with alternative ways of storing and manipulating data. Previous research demonstrated the potential value in applying NoSQL data models in non-distributed environments. However, knowing when to apply these data models has generally required inputs from system subject matter experts to make this determination. This research considers an existing approach for selecting suitable data models based on a set of 12 criteria and extends it with a novel methodology to characterize and assess the suitability of the relational and NoSQL data models based solely on observations of a user's interactions with an existing relational database system. Results from this work show that this approach is able to identify and characterize the pre-established criteria in the observed usage of existing systems and produce suitability recommendations for alternate data models.
15. SUBJECT TERMS: Databases, Relational Databases, NoSQL Databases, Data Models, Automated Analysis, Database Observations
16. SECURITY CLASSIFICATION OF: a. REPORT: U; b. ABSTRACT: U; c. THIS PAGE: U
17. LIMITATION OF ABSTRACT: U
18. NUMBER OF PAGES: 168
19a. NAME OF RESPONSIBLE PERSON: Dr. Brent T. Langhals, AFIT/ENV
19b. TELEPHONE NUMBER (include area code): (937) 255-3636, x7402; Brent.Langhals@afit.edu

Standard Form 298 (Rev. 8–98), Prescribed by ANSI Std. Z39.18