SOFTWARE ARCHITECTURES AND PATTERNS FOR PERSISTENCE IN HETEROGENEOUS DATA-INTENSIVE SYSTEMS

by

AARON SCHRAM

B.S., University of Colorado at Boulder, 2006
M.S., University of Colorado at Boulder, 2011

A thesis submitted to the

Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Department of Computer Science
2015

This thesis entitled: Software Architectures and Patterns for Persistence in Heterogeneous Data-Intensive Systems written by Aaron Schram has been approved for the Department of Computer Science

______Professor Kenneth M. Anderson, Chair

______Professor Richard Han

______Professor James Martin

______Professor Nenad Medvidovic

______Professor Judith Stafford

Date______

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above-mentioned discipline.


ABSTRACT

Schram, Aaron (Ph.D., Computer Science)
Software Architectures and Patterns for Persistence in Heterogeneous Data-Intensive Systems
Thesis directed by Associate Professor Kenneth M. Anderson

Software engineers are faced with a variety of difficult choices when selecting appropriate technologies on which to base a software system. As the typical software user has become accustomed to systems being “on-demand” and “always-available,” the software engineer is more concerned than ever before about the issues of system scalability, availability, and durability. In the absence of expertise in distributed systems, architectural decisions become complex, slowing feature development and introducing error. The field of software engineering is in need of robust patterns and tools that increase the accessibility of specialized technologies developed for the completion of specialized tasks. This dissertation describes my existing work related to the challenges of domain modeling and data access in large-scale, heterogeneous data-intensive systems and extends this work to include novel architectures for utilizing multiple large-scale data stores effectively. This research focuses on increasing the accessibility and flexibility of these data stores, which typically afford scalability, availability, and durability at the cost of added complexity for the application developer. The resulting architecture and associated implementation alleviate common challenges faced by small and medium software enterprises during the development of heterogeneous data-intensive software applications.

DEDICATION

To my parents and grandparents, whom I will always admire for their struggles, sacrifices, and hard work

ACKNOWLEDGEMENTS

I would like to thank my colleagues in the various software engineering organizations that I have been a part of throughout my career for the valuable experience and lessons taken from each. Specifically, I would like to thank Nate Sammons, William Butler, and Burke Webster for their continued support and friendship through this and many other shared experiences. This dissertation would not have been possible without the positive and long-lasting relationship with my advisor, Dr. Kenneth M. Anderson, which began so many years ago when I was an optimistic and overly curious undergraduate. Thank you for guiding me on a journey only you saw I needed to take. Finally, I would like to thank my incredibly supportive network of friends and colleagues for, among many other things, ensuring I enjoy the moment.

CONTENTS

CHAPTER 1: INTRODUCTION
CHAPTER 2: LARGE-SCALE DATA STORES
2.1 DATA STORE OVERVIEW
2.1.1 Relational
2.1.2 Columnar
2.1.3 Key-Value
2.1.4 Document
2.1.5 Network
CHAPTER 3: TOWARDS N DATA STORE ARCHITECTURES
3.1 CHALLENGES FOR SOFTWARE ENGINEERING
3.2 OPPORTUNITIES FOR SOFTWARE ENGINEERING
CHAPTER 4: RELATED WORK
CHAPTER 5: EVOLUTION OF THE PERSISTENCE TIER
5.1 TRADITIONAL ENTERPRISE ARCHITECTURE
5.2 TRADITIONAL MULTI-DATA STORE APPROACH
5.3 TRADITIONAL PERSISTENCE ARCHITECTURE DESIGN CONSTRAINTS
CHAPTER 6: THE DIAMOND ARCHITECTURE
6.1 OVERVIEW
6.2 APPROACH
6.3 DIAMOND ARCHITECTURAL COMPONENTS
6.3.1 Command
6.3.2 Runner
6.3.3 Data Model Converter
6.3.4 Reconciler
6.3.5 Pipeline
6.3.6 RepositoryFacade (Optional)
6.3.7 Final Architecture
CHAPTER 7: IMPLEMENTATION: THE MACHINIST FRAMEWORK
7.1 THE MACHINIST FRAMEWORK
7.1.1 Command
7.1.2 Runner
7.1.3 Reconciler
7.1.4 Pipeline
7.1.5 Decorators
CHAPTER 8: MBOX: A MULTI-DATA STORE INBOX
8.1 OVERVIEW
8.2 DESIGN AND ARCHITECTURE
8.2.1 Data Model
8.2.2 Persist
8.2.3 Save
8.2.4 Find By User
8.2.5 Search
8.2.6 Decorators
CHAPTER 9: EVALUATION
9.1 OVERVIEW
9.2 APPROACH
9.3 ASSUMPTIONS
9.4 EVALUATION SCENARIOS
9.4.1 Additional Data Store
9.4.2 Persist Data to Additional Store
9.4.3 Retrieve Data from Additional Store
9.4.4 Split Data Model Across Data Stores
9.4.5 Per Data Store Data Model Attributes
9.4.6 Concurrent Persistence
9.4.7 Tiered Persistence
9.4.8 Retrieve Fastest Available
9.4.9 Enable or Disable Data Store
9.4.10 Results Reconciliation
9.5 QUANTITATIVE FEATURES OF THE EVALUATION
9.6 EVIDENCE OF USEFULNESS FOR DATA STORE EXPERIMENTATION
9.7 RESULTS
CHAPTER 10: FUTURE WORK
CHAPTER 11: CONCLUSIONS
REFERENCES
APPENDIX A – MBOX PERSISTENCE IMPLEMENTATIONS
APPENDIX B – MACHINIST CLASS DIAGRAM

TABLES

TABLE 2-1. POPULAR CLASSES OF LARGE-SCALE DATA STORES.
TABLE 5-1. COLLECTION BASED REPRESENTATIONS OF DATA STORE DATA MODELS.
TABLE 9-1. LINE OF CODE COUNTS FOR TRADITIONAL (LEFT) VS. DIAMOND (RIGHT) EMAILSERVICE IMPLEMENTATIONS UTILIZING FOUR DATA STORES.
TABLE 9-2. TOTAL LINE OF CODE DISPARITY BETWEEN THE TRADITIONAL AND DIAMOND ARCHITECTURAL APPROACHES.

FIGURES

FIGURE 5-1. APPLICATION FUNCTIONS WITH ASSOCIATED TECHNOLOGY TRADE-OFFS.
FIGURE 5-2. ADAPTED FROM THE REPOSITORY PATTERN AS DESCRIBED BY FOWLER.
FIGURE 5-3. A TRADITIONAL HOMOGENEOUS ENTERPRISE ARCHITECTURE.
FIGURE 5-4. AD-HOC POLYGLOT PERSISTENCE.
FIGURE 6-1. REPOSITORY PER DOMAIN OBJECT PER DATA STORE ARCHITECTURE.
FIGURE 6-2. [50].
FIGURE 6-3. CONVENTIONAL COMMAND INTERFACE.
FIGURE 6-4. DIAMOND COMMAND INTERFACE.
FIGURE 6-5. THE RUNNER INTERFACE.
FIGURE 6-6. THE GOOGLE GUAVA CONVERTER INTERFACE.
FIGURE 6-7. AN EXAMPLE DATA MODEL CONVERSION.
FIGURE 6-8. THE RECONCILER INTERFACE.
FIGURE 6-9. THE PIPELINE INTERFACE.
FIGURE 6-10. THE DIAMONDPIPELINE.
FIGURE 6-11. DIAMONDPIPELINE WITH REPOSITORYFACADE.
FIGURE 6-12. THE DIAMOND ARCHITECTURE.
FIGURE 6-13. DIAMOND ARCHITECTURE SEQUENCE DIAGRAM.
FIGURE 7-1. THE SAVECOMMAND CLASS DIAGRAM.
FIGURE 7-2. THE MACHINIST FRAMEWORK SAVECOMMAND IMPLEMENTATION.
FIGURE 7-3. THE FINDCOMMAND CLASS DIAGRAM.
FIGURE 7-4. THE MACHINIST FRAMEWORK FINDCOMMAND IMPLEMENTATION.
FIGURE 7-5. THE DELETECOMMAND CLASS DIAGRAM.
FIGURE 7-6. THE MACHINIST FRAMEWORK DELETECOMMAND IMPLEMENTATION.
FIGURE 7-7. THE MACHINIST FRAMEWORK SAVEALLCOMMAND IMPLEMENTATION.
FIGURE 7-8. THE MACHINIST FRAMEWORK FINDALLCOMMAND IMPLEMENTATION.
FIGURE 7-9. MACHINIST FRAMEWORK RUNNER IMPLEMENTATIONS.
FIGURE 7-10. THE FIRSTNOTNULLRESULTRECONCILER CLASS DIAGRAM.
FIGURE 7-11. THE MACHINIST PIPELINE CLASS STRUCTURE.
FIGURE 7-12. THE MACHINIST FRAMEWORK DIAMONDPIPELINE CLASS STRUCTURE.
FIGURE 7-13. THE CONSECUTIVEPIPELINE AND SIMPLEPIPELINE CLASS DIAGRAM.
FIGURE 7-14. THE CONCURRENTPIPELINE CLASS DIAGRAM.
FIGURE 7-15. THE FLASHPIPELINE CLASS DIAGRAM.
FIGURE 7-16. THE TIEREDPIPELINE AND SIMPLETIEREDPIPELINE CLASS STRUCTURE.
FIGURE 7-17. THE …
FIGURE 7-18. THE SOURCECOMMANDDECORATOR CLASS DIAGRAM.
FIGURE 7-19. THE TIMECOMMANDDECORATOR CLASS DIAGRAM.
FIGURE 8-1. THE MBOX APPLICATION.
FIGURE 8-2. THE MBOX ARCHITECTURE UTILIZING FOUR DISPARATE DATA STORES.
FIGURE 8-3. THE MBOX DATA MODEL.
FIGURE 8-4. THE EMAILSERVICE AND EMAILREPOSITORYFACADE DEFINITIONS.
FIGURE 8-5. THE SET OF EMAIL REPOSITORIES WITH DIVERSE METHOD DEFINITIONS.
FIGURE 8-6. THE SAVE EMAIL ARCHITECTURE.
FIGURE 8-7. THE EMAILKEYVALUECONVERTER IMPLEMENTATION OF THE MBOX APPLICATION.
FIGURE 8-8. THE EMAILDOCUMENTCONVERTER IMPLEMENTATION OF THE MBOX APPLICATION.
FIGURE 8-9. THE CONFIGURATION OF THE MBOX EMAIL SAVE PIPELINE.
FIGURE 8-10. THE FIND ALL EMAILS FOR USER ARCHITECTURE.
FIGURE 8-11. THE CONFIGURATION OF THE MBOX FIND EMAILS BY USER PIPELINE.
FIGURE 8-12. THE SEARCH EMAILS FOR USER ARCHITECTURE.
FIGURE 8-13. THE CONFIGURATION OF THE MBOX SEARCH PIPELINE.
FIGURE 8-14. THE MBOX APPLICATION COMMAND DECORATORS.
FIGURE 9-1. THE INITIAL EVALUATION SYSTEMS USING A DIAMOND ARCHITECTURE (LEFT) AND A TRADITIONAL THREE-TIER ARCHITECTURE (RIGHT).
FIGURE 9-2. THREE-TIER ARCHITECTURE WITH TWO DATA STORES.
FIGURE 9-3. DIAMOND ARCHITECTURE WITH TWO DATA STORES.
FIGURE 9-4. TRADITIONAL ARCHITECTURE EMAILSERVICE SAVE METHOD FOR TWO DATA STORES.
FIGURE 9-5. DIAMOND ARCHITECTURE EMAIL SAVE PIPELINE FOR TWO DATA STORES.
FIGURE 9-6. TRADITIONAL ARCHITECTURE SERVICE FIND METHOD FOR TWO DATA STORES.
FIGURE 9-7. DIAMOND ARCHITECTURE FIND PIPELINE FOR TWO DATA STORES.
FIGURE 9-8. LOG OUTPUT OF MBOX APPLICATION.

CHAPTER 1: Introduction

Big Data is a rapid increase in public awareness that data is a valuable resource for discovering useful and sometimes potentially harmful knowledge.

- Stephen Few [1]

This dissertation concerns itself with research issues related to the task of designing and implementing architectures capable of leveraging a variety of purpose-built frameworks and systems. While significant progress has been made in computer science research recently—specifically in the areas of large-scale computing and data processing—software engineering researchers have struggled to provide adequate and widely adopted abstractions for the construction of heterogeneous systems that are rapidly becoming commonplace in small and medium-sized enterprises. Minimizing the complexities encountered when adopting purpose-built frameworks aids in the greater transition towards a heterogeneous style of application architecture based upon numerous specialized technologies.

This introduction details the conditions that influenced the creation of today’s heterogeneous approach to application development by exploring the need for further research in the design and application of patterns for large-scale systems of systems. We present an overview of past advances in enterprise patterns and technologies that have made contributions to the success of this research domain and discuss remaining challenges.

In the late 1990s, software companies began to experience massive consumer adoption of their technologies. Desktop and laptop computers quickly became ubiquitous in homes, schools, and offices. The Internet was also experiencing adoption at a staggering rate, with 50% of adults (18+) going online by the year 2000 [2]. The rapidly growing base of users operated under new software paradigms and expectations compared to the expert users of the previous decades—expecting software packages to be accessible via a web browser. The browser greatly simplified the adoption of new software in a variety of ways for its users. The user no longer needed to install software from physical media (floppy or compact disk), thus avoiding many complex failure cases associated with user-initiated software installations on widely varying hardware. Additionally, the user need not worry about backing up important information stored by the program: if the user’s system experienced a hardware failure, the data was still safe with the application provider. Applications could be accessed that were running on machines with much more advanced hardware than the user owned, increasing performance while lowering cost for the user. Finally, the software was always up-to-date, providing the user with the latest in features and bug fixes without waiting for the next version to be shipped.

As more and more users adopted Internet-based applications, software companies providing these applications experienced an incredible amount of growth. The growth of these businesses forced software engineering teams to innovate rapidly. Some of these efforts were conducted in well-funded research and development organizations, such as government labs and large corporations (e.g., IBM), while others took place within the stereotypical software “startup” (e.g., Google, Yahoo!). Software applications of this nature became known as “Software as a Service” (SaaS) products. The name refers to the fact that software is delivered to the user as an “on-demand” service without the need for physical installation media.

While this drastically decreased production costs for the business, it shifted many critical responsibilities from the user to the vendor, specifically the software engineering team. This also drastically reduced the software release cycle, which allowed vendors to push updates for features and bug fixes more frequently.

A typical SaaS application is hosted in a data center and operated by the same organization that develops the application. Current examples include Google Gmail [3], Facebook [4], and other applications of similar nature where users access the application via a remote client (e.g., web browser, mobile client) and the hosted application manages all aspects of the user’s data. This style of application deployment motivated and continues to motivate software engineering teams to adopt multi-tenant architectures.

A multi-tenant architecture uses the same software and storage systems to service requests from many different clients and store data from many different users. A user from Company X will have their data stored in the same manner and possibly on the same physical system as a user from Company Y. This is an important architectural feature. Although it allows software engineers at the application level to quickly and easily interact with data, regardless of the customer, it implies co-locating customers of extremely high and low use. A quality of service problem, attributed to a customer of extremely high use, affects the experience of other users.

As user- and machine-generated data volumes grew rapidly, the bottlenecks of existing storage systems were quickly exposed. It was common for applications to be scaled “vertically,” meaning the application (deployed on a single host) would be relocated to more capable hardware, providing improved processing, storage, and memory. This was an expensive and often error-prone process. If scaling vertically was not practical for an organization, the other option afforded to the organization was to performance tune the application to run more efficiently on existing hardware. Although this avoided the costs associated with purchasing new hardware, scaling via this method was often a stopgap and as costly as new hardware—if not more so—in additional engineering time. As more and more systems reached the limits of what could be accomplished by scaling vertically or performance tuning, software teams strove for a new approach.

Running a SaaS application on a single host introduces a variety of problems that can have disastrous consequences for any business, small or large. A system failure at the application or storage level can affect the user experience or result in data loss—both of which are devastating problems when selling an “on-demand” and “always available” service. Researchers and application developers were motivated to create new systems which were “horizontally” scalable, available, and durable. Focusing on these attributes allowed the SaaS vendor to offer a high quality and cost-effective product to users while containing the costs of hosting, storage, and bandwidth.

Modern software systems are expected to be always available and, therefore, require redundancy throughout the application and the ability to quickly adapt to large increases in usage. For most applications this implies a distributed approach to system architecture. Increasing system capacity through distribution is described as scaling horizontally and is routinely accomplished by partitioning data and processing capacity across multiple hosts. Successfully implementing these strategies is extremely complex and requires expertise in distributed systems; however, the benefits for a software organization are numerous. Users expect on-demand products and providing them is now mandatory for even the smallest software-based businesses, which requires these software businesses to engineer their products to handle unavoidable situations, like hardware failures and extreme usage, without incurring downtime.

To keep up with changing demands, software businesses must architect systems that can scale efficiently without causing an interruption in service for their customers. As a business experiences success, there is incredible pressure to keep up with the largely unpredictable demand for their products or services. This makes it necessary for systems to be able to scale effectively with little to no notice.

In traditional vertically-scalable systems, the software company orders better hardware and software from a vendor and then slowly integrates the new capabilities into their existing product. This approach impacts existing and new customers, as the software team is required to shift their focus from implementing new features and customer requests to integrating new hardware and software. In some cases, the updates require new design and logic for the persistence tier of the application, which makes the persistence aspects of the application cumbersome to scale. Many applications are forced to implement complex “sharding” algorithms where the data is partitioned to accommodate future growth. This causes organizations to buy more equipment than they need and utilize it inefficiently unless the sharding algorithm is properly optimized. Undesired consequences of improper data sharding are numerous but often occur due to pinning a client to a single host that fails or becomes unresponsive due to high system load.
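To make the pinning problem concrete, consider the kind of naive hash-based shard router sketched below. The code is a hypothetical illustration, not taken from any system discussed in this dissertation: with hash(key) mod N, every key is pinned to one host, and changing N remaps most keys, which is why ad-hoc sharding is so costly to grow.

```java
import java.util.List;

// A minimal, hypothetical shard router. Because the shard is derived from
// hash(key) % N, adding or removing a host changes N and silently reassigns
// most keys, forcing a bulk data migration before the cluster is usable.
public class ModuloShardRouter {
    private final List<String> hosts; // e.g., ["db1", "db2", "db3"]

    public ModuloShardRouter(List<String> hosts) {
        this.hosts = hosts;
    }

    public String hostFor(String key) {
        // Math.floorMod avoids negative indices for negative hash codes.
        return hosts.get(Math.floorMod(key.hashCode(), hosts.size()));
    }
}
```

Every client must also agree on the same host list, so a single hot or failed shard degrades every key pinned to it, which is exactly the quality-of-service failure described above.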

SaaS systems today make use of a variety of horizontally-scalable data storage solutions. Many encourage adopters to take advantage of inexpensive, off-the-shelf hardware that can be easily added to existing systems. In these systems, determination of how to partition an ever-growing amount of data is left up to the data storage system, which transparently recognizes newly added hardware (i.e., servers) as additional resources, avoiding the need for expensive and customized hardware solutions. Through this strategy, increasing the capabilities of a software system to match business needs can be done incrementally and without incurring downtime, which allows a software or operations team to scale an application without diverting time away from feature development. With the advent of Infrastructure as a Service (IaaS) or “the Cloud,” vendors like Amazon Web Services enable organizations to scale systems up or down as needed, paying for only the processing and storage they use.

As the acquisition and maintenance of hardware continues to be commoditized by the Cloud, software engineers are provided an ever-increasing set of technologies on which to build their applications. These technologies are produced and distributed by experts in distributed systems, and, although these technologies are able to accommodate enormous growth in data, they are also extremely complex. Data storage, as a field of study, is where the most innovative approaches to scalability have been developed. Data is generated with staggering velocity and variation, which makes it extremely difficult to persist into rigid schemas. Applications making use of rigid specification languages such as XML, SQL, and others have struggled to adapt to ever-changing data structures that are generated by a diverse set of users and systems while gaining, modifying, and losing attributes with every iteration.

What was needed was the ability to easily accept data from a variety of sources without enforcing strict validation and binding logic. It became difficult to use contract-driven transport and storage interfaces. The popularization of REST [5] and JSON [6], [7] made communication between systems straightforward and flexible, through the use of standardized conventions, built upon HTTP and nested data structures. Additionally, key-value data structures became extremely prevalent throughout the persistence tier of the application. Many popular large-scale, open source storage systems use key-value pairs to achieve easily replicable and flexible schema-less data storage (e.g., Cassandra, Hadoop). These systems use a unique key to identify an object of interest and store an entire object graph representation (i.e., embedded object graph) as a value. This approach not only allows an application to store an object in its current structure, regardless of the defined storage schema, but it also allows for objects to be split and/or replicated transparently among a cluster of machines, thus improving data availability and application scalability. Avoiding costly, error-prone schema updates allows an organization to adapt quickly to new challenges.
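The embedded-object-graph idea can be sketched in a few lines. The example below is illustrative only: the User and Address classes are invented for this sketch, a plain map stands in for a real key-value store, and the Jackson library is assumed for JSON serialization.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain objects; public fields keep the sketch short.
class Address { public String city = "Boulder"; }

class User {
    public String id = "user-42";
    public List<Address> addresses = List.of(new Address());
}

public class EmbeddedGraphExample {
    public static void main(String[] args) throws Exception {
        Map<String, String> store = new HashMap<>(); // stand-in for a key-value store
        ObjectMapper mapper = new ObjectMapper();

        // The entire object graph becomes one opaque value under one key, so
        // no storage-side schema has to change when User gains a new field.
        User user = new User();
        store.put(user.id, mapper.writeValueAsString(user));

        // Reading reverses the process without joins or reference resolution.
        User loaded = mapper.readValue(store.get("user-42"), User.class);
        System.out.println(loaded.addresses.get(0).city); // prints "Boulder"
    }
}
```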

Software engineering, as a discipline, is increasingly at the center of the majority of engineering research, within and outside of computer science. Even beyond schools of engineering, it is common for research groups to incorporate a variety of available and adaptable software tools to interpret the growing amount of data being generated and captured in nearly every discipline. The most straightforward and commonly used systems are now capable of activities that would have been out of reach to all but the most well-funded projects just a decade ago. It has never been easier to develop and deploy a SaaS application. The growing acceptance of Cloud-based systems and web frameworks (e.g., Ruby on Rails) has redefined the skillsets required to be a successful application developer.

Understanding the motivating factors that led to the development and adoption of SaaS architectures is critical to the study of software engineering for heterogeneous systems. With complex system attributes such as scalability, availability, and durability required to meet user expectations, research is needed to understand and extract accessible patterns. Doing so will allow software engineers to benefit from these technologies without requiring expertise in distributed systems, which can overwhelm even the most experienced application development teams. Organizations looking to manage the complexities associated with these technologies are subdividing team members into specialized groups of Software Configuration Management (SCM) engineers who manage the deployment and maintenance of these systems. These teams refer to themselves as developer operations (i.e., DevOps). Furthering software engineering research in this domain, we work to minimize the complexities inherent in developing distributed systems, offering an alternative to segmenting the engineering team in an effort to limit the amount of valuable engineering hours required to build and deploy heterogeneous systems—returning full focus to application development.

Indeed, in today’s SaaS architectures scalability, availability, and durability are not “nice to haves” but instead are basic requirements upon which the foundation of an application is built. Better patterns and architectures in this area of research will allow software engineers to return focus to abstractions specific to their problem domain, without being overwhelmed by the complexities associated with the implementation of specific distributed systems. In the next chapter, we will discuss, in detail, the classes of data storage technologies available to the software engineer and the complexities associated with the evolution from homogeneous to heterogeneous SaaS architectures.

CHAPTER 2: Large-Scale Data Stores

In this chapter, we present an introduction to large-scale heterogeneous systems and the specialized frameworks used to construct them. We focus on software as a service (SaaS) architectures as SaaS is the primary architecture used by production web applications today. We will also discuss, in detail, examples of the different types of specific tasks required by large-scale systems and the frameworks commonly used to accomplish each. The conclusion of this chapter will outline the challenges faced by an organization as they begin to design and implement a system of this nature. Given a thorough review of these specialized technologies, we will be well suited to recommend how each technology can be made more accessible to the software engineering community.

The primary roles fulfilled by software applications involve capturing, understanding, and presenting data. As data sets become too large to fit into memory or on local disks, applications need to contend with “Big Data” [8]. Because data continues to grow exponentially, the reasonable mitigation strategy is to acquire more resources to store and process it. This comes in the form of cheap and expendable computer servers or nodes, which can be acquired for minimal cost and in large numbers (i.e., Moore’s Law [9]). Networking these nodes into a cluster allows systems to be scaled horizontally with incremental and predictable cost to the organization.

Distributed systems researchers have provided software engineers with a diverse set of tools and techniques that can be used to efficiently utilize clusters of servers. The focus of tools in this research area is largely concentrated on improving data storage and access algorithms at low levels of abstraction and is primarily presented at conferences such as VLDB [10]. It must be noted that this research is incredibly important; without it, the software industry would not be where it is today. However, an unfortunate side effect of a focus on low-level system attributes is that the abstractions provided to users (i.e., application developers) by these technologies require an in-depth knowledge of the underlying distributed systems to correctly and effectively use the provided Application Programming Interfaces (APIs). This forces the application developer to deal with immense amounts of complexity when adopting new data storage technologies. To successfully adopt these technologies, an application developer must not only envision the current and future needs of the end user (i.e., the system user), but also understand the constraints which the architect of the system was under—appreciating the perspective, values, and trade-offs made by the system architect is key to effective utilization.

To better understand the current state of these frameworks, it is helpful to organize the most widely adopted technologies and present an overview of each.

2.1 Data Store Technology Overview

Capturing data is core to every software application and, as such, will be the focus of this section and a critical aspect of this dissertation. As the rate at which data is captured continues to increase in volume and velocity, the need for specialized data storage systems is drastically increasing. Software engineers must choose from an ever-growing list of technologies developed to capture and sift through data. These technologies fall into one or more of the following classes shown in Table 2-1 and will be described throughout the remainder of this chapter.

Class       Technology                    Attributes
Relational  Oracle, MySQL                 Highly-structured data model, complex scalability, optimized for OLTP
Columnar    Bigtable, Cassandra           Semi-structured data model, linearly scalable, storage optimized for data reads
Key-value   GFS, Hadoop, Dynamo           Hash-based data model, linearly scalable, efficient reads and writes
Document    Lucene, MongoDB               Easily adoptable data model, used for information retrieval, optimized for flexible ad-hoc data access
Network     Pregel, Neo4j, Giraph, Titan  Relationship data model, small objects only, optimized for network traversals

Table 2-1. Popular classes of large-scale data stores.

2.1.1 Relational

Typically, an application begins by persisting data in a relational database management system (RDBMS). This type of solution stores data in a series of tables, each of which contains rows and columns. Data is grouped together on disk as contiguous rows in a random access structure or an order generally matching the access pattern (e.g., B-tree). Each row is a datum, which is required to conform to the schema associated with the table, thus having a set of attributes represented by columns. Tables are linked via foreign keys that allow for traversal from one row to another. Object graphs are stored in a reference based and normalized representation. This design allows for space efficient utilization of storage resources (i.e., no data duplication due to normalization), but it requires random access to the storage layer when data references are resolved. This can make read operations on large data sets compute intensive, especially if the data is too large for memory or a solid-state drive (SSD), forcing data retrieval from a hard disk drive (HDD).

Much research has been done on making relational systems more read efficient, including denormalizing the schema as in a Star Schema [11]. Even with this approach, row based systems have been shown to underperform other designs when it comes to online analytical processing tasks [12]. However, even with their limitations, relational data stores provide end users with an extremely accessible data model that affords straightforward persistence and ad-hoc querying. To scale relational technologies to accommodate large amounts of data, data is partitioned in predetermined and application specific shards, which makes persistence and retrieval across shards difficult if not carefully implemented. Availability is accomplished by replication of data to a “hot” backup. In the case of a system failure, all requests are switched to the replicas. Flexibility is largely given up to provide the user with data validation and normalization. It is worth noting that systems of this nature have been widely used for decades and scaled to extreme use. Harizopoulos showed that typical online transaction processing (OLTP) systems spend their time divided almost evenly between logging, locking, latching, and buffer management [13]. However, removing any of these features from the OLTP system has been proven to drastically increase its performance [14], making RDBMS a viable technology choice for organizations willing to customize, tune, and maintain them. Typical vendors for this class of storage system are Oracle, IBM, SAP, and Microsoft. Open Source alternatives are MySQL [15] and PostgreSQL [16], among others.

2.1.2 Columnar

Columnar data stores provide an easily distributed storage system while still providing users a simple data model. The data model typically provides structure in the form of Rows and Columns, essentially exposing a multidimensional map or tree in which to store data. The wide-column data model, first implemented by Google Bigtable [17], uses rows or “row keys” to index into the first (i.e., outer) map. The second (i.e., inner) map contains columns and column values, which do not need to adhere to any schema, providing flexibility to the user and grouping columns together on disk. Other columnar technologies, such as Google Dremel [18], allow the enforcement of a flexible schema and differ widely in their implementation, yet still group columns together for persistence on disk, which enables extremely efficient data access.
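The nested-map shape of this data model can be written down directly. The toy class below is an illustrative approximation and not any vendor's API: the outer map is indexed by row key and the inner sorted map holds an arbitrary, per-row set of columns.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// A toy wide-column table: rowKey -> (columnName -> value). Real systems
// add timestamps, column families, and on-disk sorting, omitted here.
public class WideColumnTable {
    private final Map<String, SortedMap<String, byte[]>> rows = new ConcurrentHashMap<>();

    public void put(String rowKey, String column, byte[] value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    public byte[] get(String rowKey, String column) {
        SortedMap<String, byte[]> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(column);
    }
}
```

Note that two rows may hold entirely different column sets, which is the schema flexibility described above.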

Regardless of the system, domain modeling for columnar storage systems is non-trivial and is a research area investigated by the author [19]. Further work is needed to better provide tools and processes for adopting columnar systems in place of existing relational-based applications. Distributed columnar stores, such as Cassandra [20], distribute data across a cluster of machines by row. This operation is often transparent to the user and done in a way that enables trivial expansion of the cluster as needed. Cassandra achieves high levels of availability by replicating rows and columns on many machines throughout the cluster. If a node is lost due to a failure, the data is available to be read from an active replica. It is common to set a replication factor when deploying such a system, which determines the number of replicas that must be maintained by the system. Often replication is handled without intervention from the user, making columnar stores compelling technologies for organizations in need of “out of the box” scalability and availability.

Columnar systems have been deployed at Google [17] and Facebook [20] for many years but recently have experienced tremendous growth among small to medium-sized enterprises. Google Dremel, released as Google BigQuery [21], has gained popularity for its analytics capabilities and has a popular open source implementation, Apache Parquet [22]. HBase [23] and Cassandra are open source implementations of wide-column stores which have both sparked an assortment of vendors seeking to ease complex transitions to columnar stores from relational data stores—the most prominent being DataStax [24], which has focused primarily on making Cassandra accessible to the enterprise.

2.1.3 Key-Value

Key-value technologies are typically implemented as distributed hashes. In contrast to columnar (i.e., wide-column) systems, key-value stores do not provide the user with a data model abstraction on top of a simple map. Operations such as get() and put() are supported across these systems, with each providing enhanced API functionality depending on anticipated use. There are a variety of in-memory key-value systems including Redis [25], MemcacheD [26], and EhCache [27]. These systems distribute a hash of values across a cluster using memory as storage for quick access. Hashes can typically be configured to be persistent through a machine power cycle. If the data is too large to be stored in memory, a more appropriate, file system based technology is typically chosen. Systems such as Google File System [28], Hadoop [29], and Dynamo [30] provide the user with a persistent distributed hash. Google File System and Hadoop in particular expose a file system based API, similar to POSIX [28], which allows consumers to invoke create, read, update, and delete (CRUD) operations on files and folders, addressing data via paths. In the case of GFS and Hadoop, the file data split and placement is determined by a single Master node. All file access requests from clients are handled by the Master node, which sends back location information for each split that the client can then use to request data directly from individual cluster nodes. Master nodes are often replicated in near real time to allow for “hot” failovers, should the Master node become unavailable.

Dynamo does not provide a file system abstraction, instead implementing a very simple get()-put() interface and focusing on scalability, availability, and durability. Unlike GFS and Hadoop, Dynamo does not rely on a single Master node, instead using consistent hashing to partition its data, which allows for easily expanding a database across a cluster of machines. Dynamo allows any node within the cluster to be addressed directly to read or write data, routing clients to the correct host within one network hop. In this way, Dynamo strives to be “always-writable” or available, even while experiencing multiple node failures.
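The full contract exposed by such a store is remarkably small. The interface below is a hedged sketch of the get()-put() style just described, not Dynamo's actual client API; versioning, consistency options, and failure handling are deliberately omitted.

```java
// A minimal key-value contract in the spirit of the get()-put() interfaces
// described above. Real systems add vector clocks, quorum parameters, and
// timeouts; none of that surface area is modeled here.
public interface KeyValueStore {
    byte[] get(String key);             // null when the key is absent
    void put(String key, byte[] value);
    void delete(String key);
}
```

Everything richer than this, from secondary indexes to ad-hoc queries, must be built by the application developer on top of these three operations, which is the accessibility gap discussed throughout this chapter.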

All of the described disk-based key-value systems support out-of-the-box replication that is largely transparent to the user, providing a high degree of availability. By completely eliminating the ability to impose a data model, key-value (distributed file system) technologies provide the foundation for other large-scale data formats and stores, such as Apache Parquet and HBase, respectively.

Unfortunately, the lack of a flexible domain model forces software engineers to store complex objects and relationships in unfamiliar and convoluted ways, often denormalizing object graphs and requiring the persistence of data in a way that anticipates how it will be accessed. Although this challenge is not unique to key-value stores, the lack of even the most basic support for a data model makes adoption of these technologies difficult for enterprises struggling to scale.

2.1.4 Document

Document data stores provide the user with the ability to model objects as documents. Document data stores often expose the binding of fields to primary types such as strings, numbers, and dates. This paradigm is familiar to anyone who has worked with an object oriented language and especially those who have previous experience with object relational mapping technologies (ORMs). While the type of document that each vendor supports differs (XML, JSON, BSON, etc.), they typically require each document to provide a unique identifier when stored. They differ from relational systems by allowing flexibility at the field level of each document. In a relational database, each row in a table must have the same number of columns as all the other rows present in the table. Document stores give the user the flexibility to persist documents (i.e., rows) in a collection that differ in the number and type of fields (i.e., columns) associated with each document.
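That per-document flexibility is easy to see in code. The sketch below uses plain maps as stand-in documents rather than any vendor's driver; the field names and "_id" convention are assumptions for illustration. The same collection accepts two documents with different field sets, something a fixed relational schema would reject.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// A toy "collection" of schema-free documents. Each document supplies its
// own fields; only the unique "_id" entry is required by convention.
public class DocumentCollectionExample {
    public static void main(String[] args) {
        List<Map<String, Object>> emails = new ArrayList<>();

        // Two documents in the same collection with different fields.
        emails.add(Map.of("_id", "1", "from", "a@example.com", "subject", "hi"));
        emails.add(Map.of("_id", "2", "from", "b@example.com",
                          "attachments", 3, "labels", List.of("inbox", "work")));

        // A reader must tolerate absent fields instead of relying on a schema.
        for (Map<String, Object> doc : emails) {
            System.out.println(doc.get("_id") + " labels=" + doc.getOrDefault("labels", List.of()));
        }
    }
}
```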

In addition to providing a relaxed schema, document stores are often associated with information retrieval tasks. The APIs exposed by document stores are rich with query support. Lucene [31], a search engine technology, is one of the most popular open source implementations of a document store. It provides full text search, attribute filtering, and faceting over any set of documents stored in its index. It also provides out of the box support for text stemming, tokenization, stop word removal, and result scoring. The flexibility of Lucene has led to the development of two popular distributed search engines, Solr [32] and Elasticsearch [33]. While they differ in implementation, they both provide a highly available and scalable storage solution without relinquishing a familiar domain model and ad-hoc query support, as other large-scale data stores do. However, these features do come at a cost. Document data stores are not designed to handle very large (e.g., gigabytes) individual files like distributed file systems are and, as such, are not optimized for reading and appending large amounts of data from and to disk. This often makes them a poor choice for the underlying systems of batch processing frameworks such as MapReduce [34]. In practice, they can be problematic to scale for large amounts of data and difficult to make durable. MongoDB [35] in particular has had significant scaling problems documented by the teams struggling to adopt this promising technology [36].

2.1.5 Network

Network-based data storage technologies persist an interconnected graph of vertices and edges. A graph is an extremely expressive way to capture objects and relationships and allows for an easy domain model transition from a relational database because most of the available vendor solutions allow the addition of an arbitrary number of properties to any vertex or edge, otherwise referred to as a property graph. Network stores are most commonly used when the number of edges between nodes becomes unmanageable in any other storage technology or when the application benefits greatly from graph algorithms and theory (e.g., shortest path between two nodes). Many organizations use a network to model friend and follower relationships, allowing quick access to large numbers of friends or followers and making the extremely common “Do you know?” feature trivial to implement.
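The adjacency structure behind such a feature fits in a short sketch. The class below is illustrative only; production graph stores add edge properties, indexes, and disk-backed storage, none of which are modeled here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A toy undirected friendship graph: vertex -> adjacent vertices.
public class FriendGraph {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    public void addFriendship(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Friends-of-friends who are not already direct friends: the core of a
    // "Do you know?" suggestion, one edge traversal away.
    public Set<String> suggestions(String user) {
        Set<String> direct = adjacency.getOrDefault(user, Set.of());
        Set<String> result = new HashSet<>();
        for (String friend : direct) {
            for (String candidate : adjacency.getOrDefault(friend, Set.of())) {
                if (!candidate.equals(user) && !direct.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }
}
```

In a relational store this query requires a self-join over an edge table; in a network store the traversal itself is the primitive, which is why edge-heavy workloads favor this class of technology.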

However, network stores are typically poor at handling large files or vertices and edges that have a large number of properties associated with them. Scalability and fault-tolerance of these systems vary widely between implementations. Systems like Google’s Pregel [37] are extremely scalable (e.g., thousands of commodity machines) and fault-tolerant. An open source implementation of Pregel is available as Apache Giraph [38]. The open source graph database Titan [39] layers a graph API on top of the columnar store Cassandra. Titan cleverly maps networks onto Cassandra’s data model, which allows Titan to take full advantage of Cassandra’s high availability and scalability attributes. However, the most popular network store available to the enterprise is Neo4j [40], which can be made highly available through master-slave replication and scalable by the addition of more read-capable replicas.

All of the described technologies provide a wide array of desirable features for applications. Unfortunately, the research surrounding these technologies focuses primarily on storing and retrieving arbitrary data. While this is an important focus, it is limited from the perspective of the software engineer who has become accustomed to working with objects and relationships. Seldom does the software engineer decompose problems in terms of just data. A general lack of empathy for the software engineering perspective has resulted in numerous challenges that must be overcome in order for an application developer to properly benefit from this valuable set of technologies.

CHAPTER 3: Towards N Data Store Architectures

Although there has been an explosion of new technologies available to the software engineer in the last decade, significant challenges still remain. The distributed systems researchers who developed the systems described in the previous chapter were focused on issues concerning scalability, availability, and durability. While these issues are of extreme importance, they are not the only issues of concern to a software engineer tasked with making technology choices for their application.

3.1 Challenges for Software Engineering

Software engineers, as end users of large-scale technologies, often lack expertise in distributed systems. These large-scale technologies introduce scalability, availability, and durability at the cost of complexity for the end user, the software engineer. Although these system attributes are extremely attractive for any application developer, the complexity of adopting them may make “scaling up” or performance tuning more viable for an organization that lacks distributed system expertise. This is unfortunate, as “scaling out” has been shown time and again to better serve an organization in terms of cost and performance.

Software engineering organizations are required to make technology decisions frequently, and one of the primary attributes these organizations evaluate when looking to adopt new technologies is accessibility. Many of the technologies detailed in the previous chapter lack a vision for how application developers will use them. Recognizing what influences an application developer to adopt a technology is essential to creating an accessible data store.

Understanding features that motivate an application developer to adopt a particular technology is an emerging area of software engineering research. Empirical evidence is not available for all technology choices, but much work has been done on programming language adoption. Meyerovich et al. found that open source libraries, existing code, and experience largely determine language choice while intrinsic features such as performance, reliability, and simple semantics do not [41]. The features found to matter least in language adoption are broadly the focus of large-scale data stores. While we cannot empirically claim that the same features influence data store adoption as language adoption, we do contend that both are motivated by accessibility, which has not been the focus of large-scale data stores thus far. Features such as scalability, availability, and durability are as essential to data store technologies as performance and reliability are to programming languages, but they may not be of primary importance with regard to user adoption.

Hanenberg et al. [42]–[45] provide another example of accessibility motivating technology choices. Hanenberg and his colleagues focus on providing empirical evidence for claims on both sides of the “static vs. dynamic” language debate. Their work shows that there are clearly defined areas where it does and does not matter what type of language one uses to accomplish a development task. Their work suggests that there are certain activities where static type systems have distinct advantages. The results show that static, in comparison to dynamic, type systems improve system maintainability [42], provide implicit design decision documentation [43], and improve the usability of unfamiliar APIs (even with the aid of a modern IDE) [44]. In spite of empirical evidence, dynamic languages are increasingly popular due to their accessibility. Proponents argue that type conversion is time-consuming and errors due to type inference are simple to correct despite research showing this to be invalid for systems over ten lines of code [45].

Regardless of the advantages promised by large-scale data stores, they must be carefully weighed against the complexities they will introduce. The advances in scalability, availability, and durability attack aspects of what Fred Brooks describes as the accidental difficulties of software engineering [46]. Although these attributes contribute to reducing some difficulties, they create a great deal of accidental complexities that distract the software engineer from focusing on essential difficulties. In his article, No Silver Bullet, Brooks identifies three “breakthroughs” in accidental difficulties that large-scale systems reintroduce during adoption.

Brooks notes that advances in high-level languages will decrease complexities encountered during software development. Today, it is common for a large-scale data store to ship with its own domain specific language (DSL) or data access paradigm, which results in additional complexity for the software engineer as now they must gain an in-depth understanding of additional abstractions for each specific system they wish to introduce into their architecture. Examples include SQL, CQL (Cassandra), MapReduce (Hadoop), Gremlin (Titan), and Cypher (Neo4j). Adopting an additional system results in intensive effort for the engineer, which deters them from choosing the best system for the task at hand even when the system offers a much-desired feature set.

The second accidental difficulty Brooks considers solved is time-sharing. Time-sharing, he states, “preserves immediacy”. In a large-scale distributed system, resource coordination happens rapidly at varying levels of abstraction. However, at the layer closest to the software engineer, the processing paradigm for large amounts of data is batch processing. The most widely used large-scale system for data processing is currently Hadoop, which is an open source implementation of Google’s MapReduce. MapReduce, and as a result Hadoop, is a distributed system that allows the user to harness the power of an arbitrary number of computers without having to understand the underlying complexities. Utilizing MapReduce affords an organization an incredible amount of data processing capabilities that has helped propel the widespread adoption of Hadoop. What many organizations do not comprehend is the trade-off in productivity they are making because Hadoop exposes its processing capabilities via a batch processing interface, which by default does not ensure proper time-sharing of the system by leaving submitted jobs in a queue until cluster resources are made available for processing. For this reason, even Google, the creators of MapReduce, have largely abandoned the framework, opting for more real time technologies [47] that facilitate more iterative development. Large-scale data stores often make it difficult for software engineers to preserve in their minds the context of the problem they are solving; while batch jobs or long running queries are being scheduled and run over hours, days and—in extreme cases—months, the software engineer will likely move to other tasks only to be interrupted at a later point with job failures or incorrect results.

The third accidental difficulty cited by Brooks as solved is that of the unified programming environment. We have yet to see a body of work on this subject, but one can imagine a need to again solve this difficulty for the class of applications addressing “Big Data” challenges. A unified programming environment for large-scale systems will need to be designed from inception to work with heterogeneous systems that leverage a diverse array of languages and access patterns. Without research in this area, software engineers are forced to context switch frequently and gain in-depth understandings of the complete system. This results in immense levels of complexity for the software engineer, which, in turn, limits the quality and velocity of application development in comparison to homogeneous systems.

The software engineering researchers of the 1980s and 1990s worked diligently to reduce the accidental complexities involved in software development. They pioneered new languages, patterns, processes, and programming environments that allowed the software engineer to focus on the essential difficulties of software. As applications have moved from single-install desktop environments to complex data-intensive infrastructures deployed across thousands of computers, research is needed to afford the same support of the past to the software engineers of the present. Only through the development of new tools and techniques can the field of software engineering improve the accessibility of large-scale, distributed, heterogeneous systems of systems.

3.2 Opportunities for Software Engineering

This dissertation examines open issues in the area of software architectures for large-scale, heterogeneous data-intensive systems. Software engineering research is largely deficient in the study of emerging trends related to the development of highly-distributed SaaS applications composed of specialized data storage technologies, instead of “one-size fits all” systems. The lack of widely adopted and well-proven architectures and patterns in this area leads software organizations to make costly investments in both financial and human capital before pursuing technologies that will benefit themselves and their customers.

In the previous chapter, we provided an overview of existing technologies related to large-scale data storage, detailing the minimum knowledge required by a user for non-expert level utilization. These systems, without modification, typically require expert-level knowledge in distributed systems and a deep understanding of the data modeling capabilities of each. Data models afforded by these technologies even go so far as to force the denormalization of object graphs, which causes data duplication and the elimination of object references. This technique, among others, is foreign to software organizations that have previously relied on only relational data storage technologies and requires well-documented abstractions to increase system accessibility. Without research into the complexities encountered during the adoption of large-scale data stores, software engineers will continue to give up accessibility and flexibility in exchange for scalability, availability, and durability.

The idea that software engineers can be afforded all of these highly-desirable features simultaneously serves as motivation for this work.

By leveraging an understanding of available technologies, this dissertation will work to abstract away complexities inherent in the adoption of one or many large-scale data stores. Patterns and architectures have been proven to aid in software engineering tasks by allowing the engineering team to efficiently communicate complex subjects without lengthy and distracting discourse [48], [49].

While there is a significant amount of peer-reviewed research available for small, object-oriented systems [50], much work is needed before a team of software engineers can effectively communicate about the issues and concerns that present themselves while developing with a variety of disparate data stores.

This dissertation develops and documents novel architectures and patterns to be applied by organizations that wish to adopt specialized technologies into homogeneous or existing heterogeneous environments. The use of these abstractions, alone or in combination with others, will greatly decrease the risk required for an organization to adopt highly specialized frameworks and serve to increase the accessibility of these frameworks for all involved.

CHAPTER 4: Related Work

Much research has been accomplished by the distributed systems community to provide application developers with numerous, extremely capable technologies. Other computer science disciplines are now diligently working to contribute. Although the resources were available for the software engineering community to devote attention to this area, its focus was elsewhere. Software engineering has improved the accessibility of software development immensely in the past. Unfortunately, many of these advances are not directly applicable to large-scale heterogeneous systems without modification or extension. As detailed in previous chapters, the lack of architectural patterns, unified programming languages, and development environments for distributed, heterogeneous systems forces the software engineer to cope with immense accidental complexities.

Homogeneous “one-size fits all” systems afforded a set of constraints that made eliminating numerous accidental difficulties reasonable. The first desktop environments and object oriented programming languages were proposed decades ago and continue to be improved by the software engineering community to this day. With many common problems solved, the focus of the community has evolved beyond object oriented design patterns (e.g., the transformation of the OOPSLA conference into the SPLASH conference). The rapid growth in data has pushed database designers to innovate independent of the software engineering community.

Michael Stonebraker first noted the move from traditional database management systems by describing the death of the “one-size fits all” data store [51], [52], foreseeing that traditional relational systems—in use for over 25 years—would be replaced by a variety of specialized technologies. Stonebraker has since been involved in the creation of many well-known large-scale data stores including C-Store [14] and H-Store [53], which later became commercialized as Vertica [54] and VoltDB [55]. It was the distributed systems community that anticipated and responded to the seemingly ever-increasing amount of data software engineers are faced with today.

Abstracting away the complexities of working with multiple data stores is an emerging area of interest for both academia and industry. Scott Leberknight was the first from industry to attach a name to this challenge. In a blog post later popularized by Martin Fowler [56], [57], Leberknight describes polyglot persistence; he says, “Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand” [58]. This description captures the motivation for this dissertation. Software engineers should be encouraged to use the appropriate technology for data storage just as they do with languages.

However, similar to how introducing an excess amount of programming languages into an application stack complicates application development, so does the introduction of many data stores. In polyglot programming, researchers are developing systems to allow application developers to easily program in one to many languages within the same file with limited complexity [59]. Similar efforts—albeit in spirit rather than implementation—are required to address the complexities of 30 polyglot persistence. Although some efforts are detailed below, they largely result from work within the distributed systems community and focus on attributes other than those of accessibility and flexibility.

One approach to polyglot persistence simplification is through the use of a universal REST-based API. A REST-based approach to persistence across one or many database-as-a-service systems (cloud-based databases) was first proposed by

Haselmann et al [60]. They acknowledge the growing variety of DBaaS vendors and

APIs and stress the need for a common access standard based on REST. The goals of their proposed API are to be flexible, exchangeable, and comparable. They describe ways to attack these desired attributes by focusing on entity handling, schema definition, and querying. The API allows for the persistence and retrieval of any entity through reserved URIs such as “/schema” and “/queries” where the developer would register a series of schemas and queries to be enforced and used by the application. Attributes of stored entities can be retrieved through the use of

XPath expressions under the reserved URI “/xpath”. The paper does not provide an implementation of the API and, as stated by the authors, “...is particularly meant as a basis for further discussion of the general notion of a universal DaaS-API”. Aside from the lack of API implementation, the work does not discuss some very complex issues. In particular, developing a strategy for the decomposition of an object graph for storage in relational and non-relational systems is non-trivial and critical to successful adoption of the proposed API. Additionally, the focus on schema development and enforcement as a primary aspect of the API is a poor choice for the 31 purposes of this dissertation: many large-scale data stores lack schemas and are able to provide high degrees of flexibility because of it. Additionally, the notion of ad-hoc querying and query languages beyond those specified in the paper is largely ignored, forcing the user to register queries with the “/queries” endpoint before being able to execute them and only supporting SQL as a query language.

Extending upon the universal REST-based API proposal of Haselmann et al. is the ORESTES system by Gessert et al. which provides a practical implementation of a unified, REST-based storage and access API for four popular

NoSQL data stores (Versant, db4o, Redis, and MongoDB) [61]. While a primary motivation for the system is improving network latencies for applications accessing multiple DBaaS systems, the ORESTES middleware also provides a REST-based implementation for accomplishing polyglot persistence. The resulting API appears to expose an interface similar to Haselmann et al. but with a variety of additions, including caching and support for transactions. In a subsequent paper, they detail their efforts towards a unified REST API for DBaaS and briefly describe the notion of a Polyglot Persistence Mediator, a system component that “routes data to different storage backends based on declarative SLAs.” The prototype implementation described in the paper is capable of routing entities to either the

MongoDB and/or Redis data stores. Of interest is the authors’ choice to allow the user to specify field-level boundaries on latency, availability, and replication factors in a declarative way. Issues concerning latency are certainly important when working with a variety of Cloud providers and allowing thresholds to be set in a 32 declarative way makes these otherwise non-trivial guarantees largely transparent to the user. While ORESTES is a significant advancement over the work proposed by Haselmann et al., it reflects some critical design choices that make it difficult to extend for the purposes of this dissertation. First, increasing the accessibility of polyglot persistence for the software engineer is not the primary motivation for the system. As such, the system designers developed an independent system that must be deployed and maintained in addition to the application and data storage systems, thereby adding to the complexities already present when adopting multiple data stores and adding additional overhead in the case of cache misses during entity retrieval. The authors’ implementation of a unified REST-based API for cloud data stores ignores the need for an abstraction of the querying languages of each data store, leaving it up to the user to understand the intricacies of each.

Similar to the schema constraints imposed by Haselmann et al., the ORESTES system enforces schemas on schema-less data stores, while going even further to provide transactions on top of systems that lack transaction support. While their prototype looks promising, again, the primary system motivator is SLA fulfillment, not end user accessibility of polyglot persistence.

Polyglot persistence not only introduces complexities associated with data persistence but also with data access (e.g., retrieval and querying). As discussed by

Gessert et al., one approach to making the retrieval of data from a variety of systems more accessible is to automatically route access requests to the “best” store, based on some set of criteria. One system developed by Lim et al. focuses 33 specifically on the implications of polyglot persistence in relation to query support

[62]. Their system, Cyclops, is a concrete implementation of an approach they refer to as DBMS+ that focuses on providing unified query support for Esper, Storm, and

Hadoop. The choice of these systems is interesting because of the mix of stream processing and batch oriented interfaces. When the user submits a query to the

DBMS+ system, it is routed to the best system for the task. The researchers accomplished automatic routing by developing a query language that allows their system at run-time to choose between available query processing frameworks. As with ORESTES, the user is able to declaratively register execution requirements with the system. Doing so allows the user to easily describe what performance they require without implementing complex system logic. The authors’ also take an

“imperial” approach to implementing their query API, which they believe gives them an advantage over a federated approach by providing “full control of what gets executed and how”. A federated approach, in contrast to an imperial approach (i.e., direct execution engine access), makes use of the storage system-specific query language, possibly introducing errors during query translation. An imperial approach to polyglot persistence—if imperial access were provided by all vendors— is extremely interesting and, as shown by Lim et al., could greatly enhance accessibility to application developers while decreasing the effort needed to access many data stores simultaneously. A limitation of the imperial approach is that offering an expert user extension points into an underlying system becomes difficult as a result of bypassing the system specific query language. 34

While the work discussed here is an important first step toward providing abstractions on top of the inherent complexities of working with heterogeneous systems at scale, the work contributed by this dissertation offers a different approach. We are not primarily motivated by issues of performance or the creation of a new language or system, but by the need for expressive patterns that eliminate the tight coupling of an application to its data stores. As far as we are aware, we are among the first members of the software engineering community to not only acknowledge the lack of accessibility provided by these technologies, but to actively work towards rectifying it. To do so, we leverage previous work in the area of enterprise architectures and patterns for one-size-fits-all systems, and work to adapt them to polyglot persistence systems. The contributed work differs from others described in this chapter by focusing on limiting the accidental complexities of adopting these technologies, not only the practical matters associated with system performance. We believe this approach results in greater accessibility of these technologies, allowing software engineers to focus on the essential difficulties of software development. 35

CHAPTER 5: Evolution of the Persistence Tier

The development of new persistence architectures for aiding in the utilization of specialized persistence technologies is needed to encourage software engineers to focus on the non-functional attributes of most importance to their application. A software application typically fulfills the functions of capturing (i.e., collecting and storing), understanding (i.e., analyzing), and presenting (i.e., reporting) data (see

Figure 5-1). To satisfy each function, application technologies must be chosen carefully. Technologies provided by distributed systems researchers are designed and evaluated with scalability, availability, and durability as primary drivers, leaving the application developer to choose from a set of complex technologies to fulfill each function of the application.

Figure 5-1. Application functions with associated technology trade-offs. 36

In addition to scalability, availability, and durability, we propose that software engineers make technology choices based on accessibility and flexibility.

As discussed in Chapter 3, software engineers often choose technologies based on these two attributes. This behavior follows as a practicing software engineer is less a scientist than engineer, primarily concerned with task completion rather than pure scientific pursuit.

The movement away from one-size-fits-all data stores and towards specialized data stores has allowed applications to adapt to significant increases in data and usage. However, adopting a heterogeneous architecture can be overwhelmingly complex for an application developer. Each specialized system provides the end user with a different set of data storage and access APIs. In addition to mastering a variety of interfaces, the application developer must also understand how to modify the data model for acceptance by each system, which requires a deep understanding of the underlying distributed system. To successfully use many of these systems, the software engineer must move past the tools of object oriented analysis and design, and force themselves into the realm of the system designer whose focus is on performance and storage of the data, not necessarily providing accessible APIs.

This dissertation details the development of an and associated framework that provides accessible and flexible polyglot persistence. This work extends the abstract concept of polyglot persistence, which encourages the use of many specialized data stores, by providing an expressive and extensible 37 implementation. Pairing polyglot persistence with extensions to enterprise architectural patterns allows for the encapsulation of logic related to persisting and retrieving objects and relationships from one to many data stores while also decoupling the application from data store specific implementation details. Pairing in this manner enables the adoption and abandonment of data stores without the need for the entire development organization to gain expertise in the intricacies of each data store utilized throughout the architecture.

5.1 Traditional Enterprise Architecture

A traditional enterprise architecture will isolate application code (e.g., controllers, validators, etc.) from business logic via a service tier [63]. The service tier, in turn, leverages a Repository pattern to manage persistence details.

Repositories encapsulate all knowledge related to persisting and retrieving objects and relationships to and from data stores. Repositories typically accept and return data models, isolating the application from the details of the underlying persistence framework (e.g., JPA, JDBC, ODBC, ActiveRecord, etc.). The Repository pattern, shown via sequence diagram, is given in Figure 5-2. 38

Figure 5-2. Adapted from the Repository pattern as described by Fowler

In Fowler’s representation [64], [65], the client interacts with the Repository to persist and query the underlying storage mechanism which could range from an in-memory collection to a database or distributed data store. The Repository exposes interfaces to the client that accept and return data models, masking the details of data mapping and allowing the replacement or extension of persistence technologies without forcing changes through dependent code paths. It is common practice to implement a Repository per domain object. An example of a homogeneous enterprise architecture, utilizing a service and persistence tier (i.e., a three-tier or multi-tier architecture), is shown in Figure 5-3. 39

Figure 5-3. A traditional homogeneous enterprise architecture.

In the above architecture, each Repository only interfaces with one persistence technology such as a relational database. This is an accepted initial architecture for SaaS applications as it provides the application with isolation of concerns at each level of the software stack, and allows the developer to pass a domain model from tier to tier similar to the Model-View-Controller [64] concept, also widely used throughout SaaS applications.

5.2 Traditional Multi-Data Store Approach

As an application matures and requires the use of a specialized storage system, the persistence or service tier must be augmented to interface with multiple persistence technologies. Adding additional responsibilities to Repositories or 40

Services achieves varied results. This approach primarily violates the single responsibility principle [66]. As detailed previously, the service tier is designed to handle business logic and the Repository is primarily concerned with the details of persistence. This approach to polyglot persistence is provided for reference in

Figure 5-4.

Figure 5-4. Ad-hoc polyglot persistence.

While this approach can lead to successful polyglot persistence, as experienced by Project EPIC [19], [63], the onus is on the software engineering team to understand the nuances of each underlying data store and associated persistence framework. Although the above architecture isolates the application from the complexities of persistence among multiple data stores, it tightly couples the 41

Repositories or Services to more than one persistence framework, which makes working with multiple data stores brittle and difficult to adopt or abandon.

Additionally, this approach has the undesired side effect of putting a great deal of intelligence into a single Service or Repository, depending on where the developer chooses to embed this behavior. Each Service or Repository will be responsible for understanding how to convert data models, interact with multiple stores, and reconcile differences among data stores. While this approach may be maintainable for a small number of data stores, it does little to mitigate the complexities associated with implementing a large number of data stores. Combining all of these responsibilities into otherwise straightforward components is a clear violation of the single responsibility principle.

5.3 Traditional Persistence Architecture Design Constraints

It is the goal of this dissertation to enable the adoption of any number of stores without vast increases in complexity for the software engineer. The number of specialized systems available to application developers is likely to continue to increase, not decrease, and the software engineer must be afforded the tools to mitigate the inherent complexities of adopting such an architecture. To accomplish this, the enterprise architecture pattern must be enhanced to take advantage of polyglot persistence. As the isolation of data mapping through persistence frameworks already occurs within the persistence tier, this tier provides a reasonable point of extension. What is required of this extension is the ability for the persistence tier to continue to accept and return a data model regardless of the 42 underlying storage system. Doing so will allow applications to adopt polyglot persistence without incurring the cost of the redevelopment of their existing

Services, thereby increasing accessibility and providing the application with the flexibility of multiple, possibly redundant, data stores. Designing, implementing, and maintaining such an architecture is non-trivial and fraught with challenges.

Satisfying the need for the persistence tier to accept and return a data model to its client—required to provide accessibility—is not straightforward when implementing polyglot persistence. When using one data store, maintaining this contract between the client and the Repository is trivial. There is one strategy for data model decomposition and reconstitution provided by the application developer or the underlying persistence framework. Extending this contract across multiple persistence frameworks becomes complex as each data store exposes its own strategy for data mapping. Numerous systems enforce schema and data normalization (e.g., relational data stores) while others encourage denormalization

(e.g., columnar, key-value). A reasonable abstraction of how each data store requires its data model to be formatted for effective storage is shown in Table 5-1.

43

Class Data Model Backing Collection Interface

add(), Relational Table Array[row][column] remove()

Map.Entry<[row key], put(), get(), Columnar Row Map<[column name], [value]>> del()

put(), get(), Key-value Hash Map<[key], [value]> del()

put(), get(), Document Document Map<[field name], [value]> del()

put(), get(), Vertex, Map<[property name], [value]>, Network del() Edge Map<[from id], [to id]>

Table 5-1. Collection based representations of data store data models.

Without proper isolation, these data mapping strategies are implemented in an ad-hoc manner, blending separations of concern and limiting knowledge reuse.

To decrease the complexities associated with data model transformations, it is helpful to abstract these details into reusable components, decoupling them from the Repository. To accomplish this abstraction, each component must understand the type of data store being utilized when performing domain model decomposition.

While relational data stores may be more straightforward due to technologies such as object relational mappers, similar technologies are not provided by all data stores. Understanding the analogous implementation of a primary key or query criteria in a columnar, key-value, document, or network data store is difficult and dependent on use case. It is common to use complex object identifiers in non- 44 relational stores that can communicate a great deal more information than a simple incremental counter. In Project EPIC (Empowering the Public with Information in

Crisis), the composite key used to store data in Cassandra is a combination of a number of factors chosen by its application developer including a social media search term, the current date, and a fragment of the MD5-hash of the content being stored, all delimited by colons (e.g. “flood:2015106:a”). In addition to providing a great deal of information about the content being stored, the Project EPIC key, through the hash fragment, also determines how data is distributed among the cluster for well-balanced storage. By slightly modifying an otherwise static row key with a dynamic hash fragment, hot spots are avoided because a slight, but predictable, variation in row key forces Cassandra to place the key throughout the cluster instead of on a single node. Creating an abstraction capable of adapting to such a use case is indeed difficult.

The next chapter introduces and elaborates on the Diamond architectural style for polyglot persistence. This architectural style [67] has been developed to represent how the persistence tier of a modern enterprise application should work to reduce the amount of work required to adopt an unknown number of data stores.

The Diamond architecture provides a set of abstractions that require the application developer to construct the persistence tier of their application in an extensible way, allowing the adoption or abandonment of data stores while also isolating any necessary code modifications from the rest of the application. 45

CHAPTER 6: The Diamond Architecture

This chapter introduces the Diamond architecture for polyglot persistence.

The Diamond architecture is a set of abstractions developed by the author that, when utilized in combination, greatly reduce the level of effort required to extend the persistence tier of an application to interface with an unknown or varying number of data stores. The Diamond architecture is designed to not only augment a traditional persistence architecture but to further serve as an archetypal style during application inception.

6.1 Overview

Traditional enterprise applications rely on persistence tiers that interface with a single data store type. It is the responsibility of that data store to serve as the one-size-fits-all solution for the persistence needs of the application. Although the data store itself may provide mechanisms for scalability, availability, and durability, the persistence tier is often architected to interact with one and only one data store. The Diamond architecture recognizes the need to design for an unknown or varying number of data stores and emphasizes the concepts of accessibility and flexibility for data store utilization. Designing the persistence tier to interface with

1 to N data stores not only allows the application to easily adopt new and highly desirable data store technologies, but also provides the application with the ability to incorporate scalability, flexibility and durability within the persistence tier. 46

The Diamond architecture encourages the reuse of the existing traditional

(i.e., three-tier architecture) architectural components, honoring contracts already established between the application, service, and persistence tiers. A traditional enterprise persistence architecture can be adapted to a Diamond architectural style by, at a high level, leveraging two novel abstractions within the persistence tier.

Transitioning to a polyglot persistence architecture through the Diamond architecture is done additively, in an effort to limit changes to existing code paths.

This dissertation describes the design and implementation of a polyglot persistence architecture in its entirety. As described in the previous section, this is a non-trivial task, but one that has the potential to make a significant contribution in the area of software engineering for large-scale, heterogeneous data-intensive systems. The rest of this chapter describes the design of such an architecture.

6.2 Approach

We are beginning with the popular enterprise application architecture, or three-tier architecture, to form our initial design constraints [68]. In this approach, an application is composed of an application tier, a service tier, and a persistence tier. Often the persistence tier is implemented as a Repository pattern that is responsible for interacting with an external data store (e.g., a relational database).

Data is passed between tiers using a rich data model that can be easily mutated along the way. Communication between the application and the data store is done through a persistence technology (e.g., JDBC, ORM, etc.). Applying these constraints to our design forces the architecture towards acceptance of a rich data 47 model while additionally working within the constraints of existing method contracts provided by Services. The ideal design would work within these constraints, requiring very few architectural changes. The primary design constraint is that of a single point of entry and egress into the Service (e.g., the application tier expects a traditional interface for persistence, regardless of the number of underlying data stores). A successful design will allow for this constraint but also easily adapt to an unknown number of data stores.

We begin with a naive approach to polyglot persistence provided by the traditional enterprise architecture. To adapt to multiple data stores through this approach, each data store is communicated with directly through complex Services or Repositories, as shown in Figure 5-4 of the previous chapter. A slightly less naive approach is to develop one Repository component per data model destined to be stored in a particular data store. 48

Figure 6-1. Repository per domain object per data store architecture.

Implementation of a more informed architectural approach presents three distinct challenges. First, the Repository pattern, as described by Fowler, ensures isolation of persistence from the client but requires an increase in intelligence to properly route data between different data stores. Abstracting this intelligence in a way that maintains existing contracts between the persistence and service tiers, while preserving the desire to keep data store-specific persistence logic decoupled and straightforward, is difficult. Second, data model decomposition and reconstitution is required to persist to and read from multiple data stores with varying data models. Third, invoking multiple Repositories incurs the cost of needing to handle multiple results, which can be complex. An ideal architecture for polyglot persistence would provide reasonable solutions for each of these challenges while also allowing extension in areas where the end user may have unanticipated 49 needs. Each of these challenges, as addressed by the Diamond architecture, will be discussed, in detail, throughout the remainder of this chapter.

6.3 Diamond Architectural Components

6.3.1 Command

The Command pattern is documented by Gamma et al. [50] as a behavioral design pattern. Due to its broad applicability, the Diamond architectural style has been influenced directly by the Command pattern. The Command pattern is shown in Figure 6-2.

Figure 6-2. Command pattern [50].

The Command pattern is primarily used to isolate behavior needed by a class for future execution. A Command, as defined by Gamma et al., is responsible for implementing a single method, execute(), which has no return value and, when invoked, performs a discrete unit of work. The interface is provided in Figure 6-3. 50

Figure 6-3. Conventional Command interface.

In this way the Command pattern creates a loosely coupled processing element that holds state. The Diamond architecture makes use of this pattern to encapsulate typical persistence scenarios such as CRUD operations in concrete

Command implementations. Using Commands in this manner forces loosely coupled and highly reusable persistence logic throughout the application. For the purposes of polyglot persistence it is necessary to enhance the signature of the

Command interface. The Diamond architecture recommends the Command interface shown in Figure 6-4.

Figure 6-4. Diamond Command interface. 51

The Command pattern recommended by the Diamond architecture affords the ability to accept an arbitrary input and return an arbitrary output. Although not required, this small change is recommended in an effort to avoid forcing the

Command to hold the state of objects (input and output) that will likely change with each invocation. Adding input and output to the interface allows the Command invoker to pass context sensitive input to the Command and have the Command respond with context sensitive output. Enabling this behavior at the method level, instead of during object construction, eliminates the need to construct a new object as input changes. In this way, many Commands with distinct behavior can be allocated only once, referenced later, and invoked as needed.

6.3.2 Runner

The Runner architectural component is responsible for invoking an arbitrary set of Commands. The Runner accepts a collection of Commands and then invokes them based on its defined behavior. The Runner is meant to encapsulate the manner in which Commands are executed. This responsibility keeps the Runner ignorant of the capabilities of the Commands and leaves it to focus on how to execute an arbitrary series of Commands. The Runner interface is straightforward and requires that the implementer accept a collection of Commands and input to pass to each Command during its execution. Running multiple Commands will produce multiple results; therefore, the implementer is additionally required to return the total set of results occurring from Command execution. The Runner interface is shown in Figure 6-5. 52

Figure 6-5. The Runner interface.

In practice, it would be typical to provide, at a minimum, the ability to run

Commands consecutively and concurrently. Other interesting Runners are described in Chapter 7. The Runner fulfills the role of the Invoker as described by

Gamma et al. Encapsulating the details of invoking a Command enables code reuse in complex scenarios such as the concurrent execution of Commands.

6.3.3 Data Model Converter

The data models provided by each data store vary greatly among data store implementations. To effectively persist and retrieve data from each store, data models passed from the application or service tiers require modification. Data models must be forward converted in preparation for persistence by a data store.

During forward conversion, the original data model may lose fidelity via attribute removal or be transformed in other ways to adapt to effective storage by each store.

Backward conversion is the forward conversion process in reverse, where an attempt is made to return a data model to the client that may have been pieced together from multiple underlying data stores. Isolating this responsibility from 53 the Repository allows the Repository to remain focused on delegation and persistence details. The Converter interface is implemented by the user, likely through callbacks, which decouples the rest of the architecture from domain model conversion. Ideally this conversion logic is reused throughout the application where required. Issues of data model versioning and migration will need to be addressed during implementation. As discussed by Stonebraker, in the future, all but the most complex data transformations might be handled without the need for a programmer through reuse of common use cases or integrations with external tools

[69]. Sensible default Converters will provide the ability to convert a reference based data structure to an embedded structure and vice versa.

Figure 6-6. The Google Guava Converter interface.

For the sake of reuse, the Converter interface shown in Figure 6-6 is that of the Google Guava framework [70]. This interface has been found by the author to be well designed and applicable to many usage scenarios, such as persistence in the case of the Diamond architecture. Another Converter that provides the functionality to forward and backward convert data could certainly be used, but for the purposes of this dissertation, we will use the interface provided by the Google 54

Guava framework. To justify our use of the Converter, it is helpful to describe an example. Given a data model where a User owns many Documents, we will illustrate the conversion process in Figure 6-7.

Figure 6-7. An example data model conversion.

In the example above (Figure 6-7), the User data model is passed to two

Converters that convert the data model for storage in relational and key-value data stores. The UserRelationalConverter performs no operation in this example, assuming the underlying persistence framework will accept the User data model as is (e.g., via the use of an object-relational mapper). The UserKeyValueConverter 55 implementation converts the data model from a reference-based structure to an embedded document model representation that can easily be persisted to a key- value data store. To accomplish this conversion, the UserKeyValueConverter creates a composite key from the name and ID of the User and serializes the full data model to JSON to store as a value. Although the actual behavior of each converter will differ by data model and use case, this component attempts to isolate these details from the rest of the architecture. For this example, the backward conversion behavior would perform the analogous operations in reverse. The

UserRelationalConverter would pass the returned object graph from the object- relational mapper back to the service tier that requested it. The

UserKeyValueConverter would perform the forward conversion process in reverse, using the key to create a User object with the appropriate ID and name and the information in the embedded documents attributes to recreate a collection containing a Document with the correct ID and title. The resulting object graph would match the one that was originally handed to the Converter for the forward conversion process. The important lesson here is that these types of conversions can be codified to handle a significant number of real-world data models and that straightforward extension points can be added to default Converters to allow software engineers to handle additional object models that require customization to be converted accurately. 56

6.3.4 Reconciler

Invoking more than one Command will produce more than one result or results set. A core principle of the Diamond architecture is allowing an application to adopt additional data stores without forcing contract changes further up in the application stack (e.g., the service and application tiers). To achieve this, the persistence tier must “fan-out” using a Runner to interact with multiple data stores and “fan-in” before returning a result to the greater application stack. To accomplish this fan-in behavior, the architecture provides for a Reconciler component that is responsible for this behavior. A Reconciler is required to make decisions about how to map the multiple results produced by invoking a number of

Commands into a single result. The interface for a Reconciler is shown in Figure

6-8. The Reconciler reconcile() method accepts a list of results and is required to return a single result.

Figure 6-8. The Reconciler interface.

In practice, a Reconciler will have a variety of context sensitive behavior. It may be favorable to apply an order of precedence when multiple results have been returned by multiple data stores (e.g., take from relational first, columnar second, 57 key-value third, etc.). It is also possible to perform a merge of all the results, forming a composite result that is a combination of attributes stored across disparate data stores. These are just a few of the possible use cases for such a construct. Although similar to the Reduce described in MapReduce, the Reconciler is meant to force the transformation of many results to a single result, not apply an arbitrary transform to the results.

6.3.5 Pipeline

In the context of the Diamond architecture, a Pipeline is a construct that has a single point of entry and egress, much like a Command or Service, and is typically responsible for maintaining a collection of Commands. The Pipeline is, as described by Gamma et al., the Client for the Command pattern. In the Diamond architecture, this is largely an implementation detail rather than a constraint. Because it is common for Commands to not hold state in our architecture, they may be created anywhere in the application and added to a Pipeline at any time during the application lifecycle. A Pipeline is responsible for implementing a single method, run(), which begins execution of the processing elements it contains. However, it will also likely provide the ability to add Commands to its state. Figure 6-9 provides the high-level Pipeline interface. 58

Figure 6-9. The Pipeline interface.

The reader will note that the definition for a Pipeline and a Command are nearly identical. This is intentional. The Pipeline is a container of Commands that similarly accepts input and returns output. In practice, this constraint is used to force the Commands present in the Pipeline to accept and return the same output as the Pipeline.

Adding onto the Pipeline construct, we define the DiamondPipeline. A

DiamondPipeline is a Pipeline that requires a Runner and a Reconciler, which provides the “fan-out” and “fan-in” behavior that lends the Diamond architecture its name. The conceptual DiamondPipeline architecture is depicted in Figure 6-10. 59

Figure 6-10. The DiamondPipeline.

As can be seen in Figure 6-10, the DiamondPipeline is composed of a Runner, a collection of Commands, and a Reconciler. Once defined, the DiamondPipeline can be used throughout the application, encapsulating all the necessary behavior to interact with any number of data stores. Although a Pipeline can be executed directly by a Service in the service tier, it may still be helpful to provide a

Repository that contains many Pipelines. 60

6.3.6 RepositoryFacade (Optional)

To extend the Repository pattern to accommodate polyglot persistence, the existing Repository in use by a Service must take on additional responsibilities.

Ideally, the polyglot Repository adheres externally to the responsibilities imposed by Fowler but internally performs similarly to a [50] thus, wrapping the complexities associated with delegating persistence tasks. The client (i.e., service tier) invokes methods on the RepositoryFacade, which, in turn, delegates to the Pipeline that interacts with traditional Repositories through Commands developed for each underlying class of data store as in Figure 6-11.

Figure 6-11. DiamondPipeline with RepositoryFacade. 61

This implies the creation of a Repository per data store class (e.g. relational, columnar, key-value, document, and network). Each sub-Repository would be configured with the specific persistence framework for each data store in use. In practice, the configuration of these Repositories will likely occur via an framework (a.k.a. ), alleviating the need to instantiate and configure Repositories, Commands, and Pipelines before each use.

6.3.7 Final Architecture

The design of an architecture for polyglot persistence involves the extension and creation of non-obvious, architectural patterns within the persistence tier. The motivation behind the Diamond architecture is the desire to maintain a simple, data model-based persistence contract between the service tier and the greater application. Doing so allows business logic to be isolated within the service tier and persistence details to be confined to the Repository. To the greatest extent possible, the service tier is not to be concerned with the details of persistence, for a single data store or multiple data stores. Through the use of Converters, the data model of the application can be converted for use by a variety of stores without complicating the delegation responsibilities of the RepositoryFacade. The final Diamond architecture is provided in Figure 6-12. 62

Figure 6-12. The Diamond architecture.

To provide a clear understanding of the common communications between the architectural components, a sequence diagram describing the interactions of each component is shown in Figure 6-13. 63

Figure 6-13. Diamond architecture sequence diagram.

The architecture is designed to increase both the accessibility and flexibility of large-scale data stores. To achieve accessibility, the architecture focuses on maintaining existing concepts of enterprise architectures and preserving the desire of application developers to work with rich data models. Flexibility is accomplished by allowing the application developer to easily adopt or abandon data stores. These attributes are of the utmost importance to application developers as it is highly likely they will be confronted with an increasing, not decreasing, amount of specialized technology choices in the future. 64

CHAPTER 7: Implementation: The Machinist Framework

The Diamond architecture is intended to be easily implementable in a variety of languages. For the purposes of this dissertation the reference implementation was developed in the programming language [71]. Java was largely chosen due to its widespread adoption among enterprise software organizations as well as the increasing number of Java-based data stores. Regardless of language preference, the overall Diamond architecture is designed to afford accessible polyglot persistence for the implementer.

7.1 The Machinist Framework

The Java-based implementation of the Diamond architecture described by this dissertation is produced as the Machinist framework, the name originating from a machinist being a person who uses machine tools to make or modify parts of a greater system. This name is meant to enforce the belief that software architectures are moving away from one-size-fits-all data stores and towards data stores (i.e., tools) built specifically for specialized tasks (e.g., information retrieval, graphs, analytics, real-time streaming, etc.). The remainder of this chapter focuses on describing the Machinist framework as a concrete implementation of the

Diamond architecture. The full class diagram for this framework appears in

Appendix B. 65

7.1.1 Command

The Machinist framework offers implementations of Commands required by typical CRUD use cases. These Commands encapsulate common behaviors necessary when persisting and retrieving data from a data store. The implementation of these commonly occurring usage scenarios will be discussed, in detail, through the remainder of this section.

7.1.1.1 Persist

Persistence logic in an enterprise application is isolated from the rest of the application through use of the Repository pattern. The Diamond architecture recommends that existing and newly developed Repositories be wrapped by a

Command which provides a callback for the execution of Repositories after data model conversion. The workflow of a persist Command is shown below:

1. Forward convert input for persistence

2. Pass converted input to Repository through a callback for persistence

3. Backward convert Repository result and return the converted result

Although the persist Command does not hold onto input or output as state, it does require a Converter and a Repository callback during initialization. The

Converter is used to convert domain objects forwards into a form suitable for use by the callback and wrapped Repository for persisting. The class diagram for the

SaveCommand is shown in Figure 7-1. 66

Figure 7-1. The SaveCommand class diagram.

An example implementation of the SaveCommand provided by the Machinist framework is shown below.

Figure 7-2. The Machinist framework SaveCommand implementation.

The code given is straightforward and simply applies the workflow outlined previously in this section. For implementation purposes, Machinist relies on Google 67

Guava for Converter and Function implementations. This is not required but leveraging Guava-provided implementations will allow those who have already implemented data model Converters or useful Functions to utilize their existing implementations. The code shown in Figure 7-2 enables the enforcement of constraints through the use of Java generics. The application developer is required to provide two generic types when instantiating the SaveCommand, one for input

(I), the other for the converted or intermediate (M) value. The goal of this enforcement is to align the Converter output with the input required by the

Repository. In this manner, a data model can be adapted to any Repository signature through conversion and allow the conversion to enforce type safety through generics. Although not shown separately, the update workflow is typically the same or similar to the persist workflow.

7.1.1.2 Read

Reading from a data store presents a variety of challenges that are often associated with querying. The Machinist framework makes no effort to provide the user with a single, unified query interface. Instead, the framework relies on each

Repository to understand how to interact with its underlying data store. Through a

Repository callback, any method exposed on the Repository can be accessed by the

Command that holds on to the callback. If needed, custom Converters that understand how to create complex, data store specific, queries using the data model provided to the Converter can be implemented. When retrieving objects from a data store, it is common to only need a forward conversion to generate a key or complex 68 query. For this reason, Commands responsible for retrieving data from data stores receive an additional argument during construction, which is responsible for providing the Command with a way to construct keys or queries for data retrieval.

The enumerated workflow for read Commands is listed below.

1. Generate a lookup key or complex query from input

2. Pass the generated lookup or query to Repository through a callback

3. Backward convert Repository result and return the converted result

The Machinist implementation of this workflow is similar to that of the

SaveCommand code shown in Figure 7-2. The additional constructor argument provided to the FindCommand is a Guava Function that is responsible for generating a lookup key or complex query from the provided input. The class diagram and implementation code is provided in Figure 7-3 and Figure 7-4 below.

Figure 7-3. The FindCommand class diagram. 69

Figure 7-4. The Machinist framework FindCommand implementation.

The Machinist implementation of the read workflow again uses the Google

Guava Converter and Function interfaces to provide key generation, data model conversion, and a Repository callback. Java generics are used to provide enforcement of data types. The FindCommand enforces key or query (ID), input (I), intermediate (M), and output (O) types. An example of a callback for a

FindCommand is shown on the third page of Appendix A; this callback shows how the search functionality for the example application is implemented for the relational data store. Examples of other callback functions appear throughout the code samples of this dissertation. For example, the solid outline-highlighted sections of code in Figures 9-5 and 9-7, are instances of commands being configured with callbacks.

7.1.1.3 Delete

The recommended delete workflow is a simplified read workflow. Delete implementations require the ability to generate a lookup key or complex query 70 criteria to locate candidate results, but they do not require data model conversion, as they do not return results. The delete workflow is as follows:

1. Generate a lookup key or query from input

2. Pass generated lookup to Repository through a callback

The Machinist implemented delete workflow uses two functions, a lookup key or query criteria generator and a Repository callback. It provides generic type enforcement for input (I) and keys (ID). The class diagram and implementation is shown in Figure 7-5 and Figure 7-6.

Figure 7-5. The DeleteCommand class diagram. 71

Figure 7-6. The Machinist framework DeleteCommand implementation.

7.1.1.4 Many or All

Applications often require the persistence or retrieval of more than one object at a time. This allows the application to take advantage of batch persist and read operations provided by data stores. To persist more than one entity at a time, the

Machinist SaveAllCommand implements a modified persist workflow.

1. For each input: forward convert input for persistence

2. Persist batch of converted entities using Repository through callback

3. For each result: backward convert Repository result

4. Return converted results

To retrieve multiple objects, the read workflow is also slightly modified. The following workflow is appropriate for retrieving multiple results from a data store.

72

1. Generate lookup key or complex query from input

2. Pass generated lookup to Repository through callback

3. For each result: backward convert Repository result

4. Return converted results

The Machinist implementations for each workflow are given in Figure 7-7 and Figure 7-8 for reference.

Figure 7-7. The Machinist framework SaveAllCommand implementation. 73

Figure 7-8. The Machinist framework FindAllCommand implementation.

The implementations make use of the same constructs (e.g., Functions,

Converters, and Repository callbacks) as the previous Commands, but they enforce slightly different generic types, requiring java.lang.Iterables to handle multiple inputs and outputs.

7.1.2 Runner

To provide flexible methods of invoking commands, the Machinist framework implements three useful Runners. Runners provided by the Machinist framework are capable of executing Commands consecutively or concurrently. 74

Figure 7-9. Machinist framework Runner implementations.

The concurrent implementations provide blocking or non-blocking behavior.

If blocking, the results of each Command will be collected by the Runner and returned once all Commands have executed (i.e., the ConcurrentRunner behavior).

If a non-blocking Runner is used, such as the

TakeFirstNotNullResultConcurrentRunner, the first available result from any

Command will be immediately returned, canceling any other Commands.

The default behavior for consecutive Runners is to block until the Command has completed execution. Machinist provides two consecutive Runners. The

ConsecutiveRunner executes Commands serially, collecting all returned results as it goes. The ChainedConsecutiveRunner also executes Commands serially but uses the output of the previous Command as the input for the next Command, eventually 75 returning the result of the last Command executed. This behavior is helpful when executing trees of Commands.

7.1.3 Reconciler

Reconcilers are often highly application dependent, and, because of this,

Machinist provides only one basic but useful implementation. The default

Reconciler returns the first non-null result from a set of results. This basic behavior provides a reasonable “fan-in” behavior for the framework. Many applications will need to implement more complex implementations (e.g., sort results via a

Comparator, query an additional data source for metadata, etc.). The class diagram for the FirstNotNullResultReconciler is shown in Figure 7-10.

Figure 7-10. The FirstNotNullResultReconciler class diagram.

7.1.4 Pipeline

Machinist provides a variety of Pipeline implementations that are intended to be applicable to a wide variety of production usage scenarios. Although the high- level interface is quite simplistic, the class structure provided by the Machinist 76 framework offers the user a large degree of expressivity and flexibility through composition. In this section, we will provide an overview of each Machinist Pipeline and their anticipated usage scenarios.

Machinist extends the top-level Pipeline interface, creating the

CommandPipeline interface, which adds the ability to hold a collection of

Commands to the top-level Pipeline. The internal structure for this storage is a java.util.Collection as implemented by AbstractCommandPipeline. Machinist

Pipelines that extend the AbstractCommandPipeline use a java.util.ArrayList as the concrete implementation, but any java.util.Collection can be used. ArrayList was chosen as the default to preserve execution order and allow the user to duplicate Commands if necessary. Figure 7-11 shows the class diagram for the

AbstractCommandPipeline. 77

Figure 7-11. The Machinist Pipeline class structure.

Pipelines that extend the AbstractCommandPipeline are not required to conform to the Diamond architecture, as the CommandPipeline does not require the use of a Runner and Reconciler. The CommandPipeline interface is available to users of the framework that wish to use the Pipeline and Command concepts outside of the persistence tier. Omitting the use of Runners and Reconcilers in the persistence tier is possible but not recommended.

The recommended point of extension for users of the Machinist framework is the DiamondPipeline. This Pipeline requires a Runner and a Reconciler upon initialization, which enforce a “fan-out” and “fan-in” behavior with each run of the

Pipeline. This behavior is responsible for the Pipeline’s characteristic shape as 78 shown in Figure 6-10. The DiamondPipeline class is not abstract and implements the run() method of Pipeline which simply delegates to the Runner and Reconciler given at initialization.

Figure 7-12. The Machinist framework DiamondPipeline class structure.

Pipelines that extend the DiamondPipeline are straightforward to develop.

Extension is typically accomplished through composition and is achieved by initializing a Pipeline with a custom Runner or Reconciler, which is exactly how the

Pipelines of the Machinist framework are implemented. There are a variety of

Pipelines provided by the framework to satisfy commonly encountered production usage scenarios. Each will be briefly detailed in the following sub-sections. 79

7.1.4.1 Consecutive Pipelines

Figure 7-13. The ConsecutivePipeline and SimplePipeline class diagram.

The consecutive Pipelines run Commands in consecutive order using the

ConsecutiveRunner. Input is passed to each Command and the Runner collects the results. The ConsecutivePipeline can be instantiated directly but requires that a

Reconciler be provided through the constructor. The SimplePipeline is an extension of the ConsecutivePipeline. It utilizes the ConsecutiveRunner but defaults the

Reconciler to the FirstNotNullResultReconciler. The SimplePipeline is the most basic Pipeline available to an adopter of the Machinist framework. 80

7.1.4.2 Concurrent Pipeline

Figure 7-14. The ConcurrentPipeline class diagram.

The ConcurrentPipeline invokes Commands simultaneously using the

ConcurrentRunner. By using the ConcurrentRunner, each Command is executed concurrently, but the Runner will block until all Commands have returned. This behavior is implemented using the Java concurrency API (e.g., java.util.concurrent.Executors, Callable, etc.). The results of the concurrently executed Commands are collected by the Runner and passed to the Reconciler by the Pipeline, which is user defined for this Pipeline. 81

7.1.4.3 Flash Pipeline

Figure 7-15. The FlashPipeline class diagram.

The FlashPipeline was developed to retrieve results via the fastest available method from any available data store. A typical usage scenario is to concurrently query a number of data stores that hold similar information and return the first result available. This behavior, similar to the ConcurrentPipeline, is implemented using the Java concurrency framework with the primary difference being the

FlashPipeline returns the output of a Command as soon as one is available, without awaiting the completion of additional Commands. For this reason, the user is not required to provide a Reconciler, as there will be only a single fastest result or results set. The Reconciler used by the FlashPipeline serves as an additional null check but is largely a no-op. 82

7.1.4.4 Tiered Pipelines

Figure 7-16. The TieredPipeline and SimpleTieredPipeline class structure.

When persisting to multiple data stores, the need to break persistence operations into tiers occurs frequently. For instance, persisting to a relational database before any other data stores will assign a uniquely generated primary key to the data model. Although this is not the only way to define a primary key for data model lookup, many systems already persist to a relational database, and doing so before persisting to additional data stores ensures that at least one copy of the data model has been persisted successfully with the added benefit of assigning a primary key. Non-traditional classes of data stores, such as key-value and columnar, frequently make use of composite keys for entity lookup. Already having 83 an identifier present on the data model allows it to be combined with other attributes to form such a composite.

The TieredPipeline is a DiamondPipeline with an ancillary method that allows the addition of a tier. This allows the user to create an arbitrary number of tiered Commands. In the Machinist framework each tier can either be a single Command or a Pipeline. If a Pipeline is used, it is able to use an arbitrary Runner and Reconciler, independent of the parent TieredPipeline. This is an interesting feature of the TieredPipeline, as it allows the user to configure execution behavior per tier (e.g., execute the first tier consecutively, the second concurrently, etc.). In this way, the user can ensure that data is written to critical data stores first and then concurrently, possibly asynchronously, written to additional stores. The SimpleTieredPipeline provides defaults for the Runner and Reconciler of the TieredPipeline. The default Reconciler, similar to the SimplePipeline, is the FirstNotNullResultReconciler. The default Runner is the ChainedConsecutiveRunner, which uses the output of the previous tier as input for the next tier.
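A minimal sketch of this chained execution style is given below. The Tier interface is a hypothetical stand-in for "a Command or a whole Pipeline" and is not part of the framework's actual API.

import java.util.List;

// Hypothetical stand-in for a tier, which may be a Command or a Pipeline.
interface Tier<T> {
    T run(T input) throws Exception;
}

// A chained, tier-by-tier runner in the spirit of the ChainedConsecutiveRunner:
// the output of each tier becomes the input of the next.
class ChainedConsecutiveRunnerSketch<T> {
    T run(List<Tier<T>> tiers, T input) throws Exception {
        T current = input;
        for (Tier<T> tier : tiers) {
            current = tier.run(current); // previous tier's output feeds the next tier
        }
        return current;
    }
}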

7.1.5 Decorators

The Machinist implementation was developed to be expressive and extensible, specifically focusing on the composability of its components. One way the framework supports these attributes is by providing a series of Decorators for each architectural component.

Figure 7-17. The Decorator pattern.

The most common component to decorate is the Command interface. Decorating a Command is straightforward and allows for the modification of behavior before and after the execution of the decorated Command. Machinist offers two useful Command decorators: a SourceCommandDecorator and a TimeCommandDecorator.

7.1.5.1 Source Decorator

In applications utilizing polyglot persistence, it is helpful to understand the specific data store the data has been retrieved from. This information can be used in a variety of ways, such as aiding debugging and providing the Reconciler with source metadata to make more intelligent reconciliation decisions. Providing source information also allows the application to reconstitute a result or results set that has been split apart and persisted across multiple data stores (e.g., attribute one in data store one, attribute two in data store two, etc.).

Figure 7-18. The SourceCommandDecorator class diagram.

The Machinist framework provides an extensible abstract class that accepts a Command during initialization. Extending this class, as the SourceCommandDecorator does, allows the application developer to easily make new Decorators for any Command. In the case of the SourceCommandDecorator, the source is defined during construction of the Decorator. The Decorator then executes the decorated Command, and the source is set on the result of the Command execution before the Decorator returns.
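A sketch of a Decorator in this style is shown below; the Command and SourceAware shapes are simplified assumptions rather than the framework's actual interfaces.

// Simplified assumptions: the Command interface from the earlier sketches and
// a result type exposing a mutable source attribute.
interface Command<I, O> {
    O execute(I input) throws Exception;
}

interface SourceAware {
    void setSource(String source);
}

// Execute the wrapped Command, then stamp the originating data store onto the
// result before returning it.
class SourceCommandDecoratorSketch<I, O extends SourceAware> implements Command<I, O> {
    private final Command<I, O> delegate;
    private final String source;

    SourceCommandDecoratorSketch(Command<I, O> delegate, String source) {
        this.delegate = delegate;
        this.source = source;
    }

    @Override
    public O execute(I input) throws Exception {
        O result = delegate.execute(input);
        if (result != null) {
            result.setSource(source); // attach source metadata for the Reconciler or UI
        }
        return result;
    }
}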

7.1.5.2 Time Decorator

How to gather empirical data on the performance of each data store in use by an application is not always apparent when utilizing multiple data stores. Capturing this information through the Machinist framework is straightforward. Similar to attaching a source to a Command result, a Decorator can be used to instrument Command execution with a timer.

Figure 7-19. The TimeCommandDecorator class diagram.

The TimeCommandDecorator accepts a callback that will be passed the elapsed runtime, in nanoseconds, of the decorated Command as a method argument. Using this information it is possible to determine the runtime of each Command, which is helpful for debugging, performance tuning, and generally gaining a better understanding of how each data store in an application is functioning. An interesting use of this data is to empirically show which data stores perform better for similar or identical tasks. Because the framework offers a callback, any number of behaviors can be implemented, including logging timings or asynchronously persisting timings to additional data stores for future analysis.
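A sketch of such a timing Decorator follows, assuming the same simplified Command interface as the earlier sketches; the callback is modeled here as a java.util.function.LongConsumer.

import java.util.function.LongConsumer;

// Simplified assumption of the Command interface, as before.
interface Command<I, O> {
    O execute(I input) throws Exception;
}

// Measure the elapsed nanoseconds around the wrapped Command and hand the
// value to a caller-supplied callback.
class TimeCommandDecoratorSketch<I, O> implements Command<I, O> {
    private final Command<I, O> delegate;
    private final LongConsumer onElapsedNanos;

    TimeCommandDecoratorSketch(Command<I, O> delegate, LongConsumer onElapsedNanos) {
        this.delegate = delegate;
        this.onElapsedNanos = onElapsedNanos;
    }

    @Override
    public O execute(I input) throws Exception {
        long start = System.nanoTime();
        try {
            return delegate.execute(input);
        } finally {
            onElapsedNanos.accept(System.nanoTime() - start); // reported even on failure
        }
    }
}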

CHAPTER 8: MBox: A Multi-Data Store Inbox

To fully demonstrate the expressivity of the Diamond architecture and its implementation (the Machinist framework), it is helpful to look at a concrete, multi-data store application. MBox is a web-based email reader designed to take advantage of multiple data stores. Implemented in Java, MBox serves as a functioning example of the Diamond architecture, depending directly on the Machinist framework. In this chapter we will provide an overview of the MBox application, showcasing many useful features of the architecture and framework.

8.1 Overview

MBox is a read-only IMAP client developed using core Java and the Spring Framework (i.e., Spring Boot, Spring Data, and Spring MVC [72]). The user interface has been developed with React [73], a JavaScript framework developed by Facebook, and Bootstrap [74], a CSS framework developed by Twitter. A screenshot of the application is provided in Figure 8-1 below.

Figure 8-1. The MBox application.

The MBox application is modest, as can be seen in Figure 8-1, but also designed to exercise a variety of complex usage scenarios. It is worth noting the attributes of each email displayed by the user interface. The following attributes are required to be available to the user interface to render properly:

• Title - the title of the email

• From address - the email address or addresses of the sender

• To address - the email address or addresses of recipients

• Date received - the date and time that the email was received

• Content - the text or HTML content of the email

• Datasource - the datasource that the email was retrieved from

The application also provides search functionality over all fetched emails via the search input in the upper right of the layout. Searches are executed as the characters are typed to give the user immediate feedback. The goal of the MBox application is to provide a non-trivial example of the Diamond architecture through the use of the Machinist framework. We feel that the feature set contributed covers many commonly encountered usage scenarios for heterogeneous data-intensive systems.

8.2 Design and Architecture

The MBox architecture follows an enterprise architecture pattern consisting of an application, service, and persistence tier. The persistence tier, originally built around a traditional Repository pattern, has been enhanced to make use of the Diamond architecture through the Machinist framework. The MBox application was designed to use four independent classes of data stores: relational, key-value, columnar, and document. The high-level architecture of the MBox application is shown in Figure 8-2.

Figure 8-2. The MBox architecture utilizing four disparate data stores.

The above architecture implements the full Diamond architecture, including the RepositoryFacade. To utilize multiple data stores, a series of Pipelines were developed for fulfilling persistence operations (e.g., save, find, find all, search, etc.). The application architecture follows the pattern shown in Figure 6-12, creating a Service and RepositoryFacade per domain object.

8.2.1 Data Model

The domain objects available in the MBox application are User and Email. The class diagram for the data model is provided via UML in Figure 8-3.

Figure 8-3. The MBox data model.

The common parent of the data model is the DomainObject, which is designed to hold attributes commonly needed for persistence, regardless of data store. Each object in the data model extends this common parent and adds its own attributes. The User class is intended to be simplistic and only holds the email address associated with an IMAP account to enable the fetching of messages from a hosted email server. The Email model mimics common attributes provided by email message APIs; specifically, MBox uses the javax.mail API.

8.2.2 Persist

The Machinist framework strives to leverage the existing persistence strategy of the application. The persistence strategy of the MBox application is implemented through use of the Repository pattern. In this pattern there will be a Repository per data store per domain object. The advantages of this approach are numerous, but of particular note is the composability of this strategy. Architecting persistence in this way allows the application to use one or many Repositories through composition as needed, by interacting with multiple data stores, multiple data models, or both. The Diamond architecture supports all of these actions, but, if required, the application can choose to communicate directly with any data store through a Repository, bypassing the architecture (e.g., Pipelines and Commands). This approach results in a high degree of flexibility, allowing the architecture to adapt to unanticipated use cases if necessary. Three typical persistence and retrieval scenarios are implemented in the MBox application for the Email domain object. Each scenario leverages multiple data stores and independent aspects of the Machinist framework. We will describe, in detail, the following scenarios: saving Emails, finding all Emails for a given User, and searching persisted Emails.

For reference, the interfaces for the EmailService, EmailRepositoryFacade, and underlying Repositories are provided in Figure 8-4 and Figure 8-5. Of note is the heterogeneity of the underlying Repository method definitions. Coping with this level of heterogeneity is one of the strengths of the Diamond architecture. Converters are used by Commands to adapt the domain model to any signature required by the underlying Repository.

Figure 8-4. The EmailService and EmailRepositoryFacade definitions.

Figure 8-5. The set of Email Repositories with diverse method definitions.

8.2.3 Save

An MBox Email is persisted to all four available data stores of the application. To accomplish this using the Machinist framework, a Pipeline containing a series of SaveCommands is created. Each SaveCommand implements the persist workflow as described in Chapter 7 and requires the application developer to implement a data model Converter and Repository callback. A SaveCommand is constructed per data store. The conceptual architecture with associated concrete Machinist implementations is given in Figure 8-6 below.

Figure 8-6. The save Email architecture.

As shown above, the individual Repositories are implemented on a per data store basis and are responsible for encapsulating all behavior necessary for CRUD operations. The Repository callback wraps direct method calls to the Repository to provide one level of indirection, which enables the execution of arbitrary behavior, not only calls to the Repository. The relational and document Repositories utilize the Spring Data Framework [75], which affords full object relational mapping capabilities for the data model. The key-value Repository is implemented as an in-memory hash, and the columnar Repository is implemented using the DataStax Java Driver for Apache Cassandra [76]. All Repositories implement a save method with varying method signatures, as previously shown in Figure 8-5. To adapt to the heterogeneous method signatures of these Repositories, Converters must be implemented. For the Email save Pipeline, one framework-provided Converter and three custom Converters are needed.

The relational and document Repositories accept rich data models for persistence. Spring Data affords object relational mapping capabilities for these Repositories. No conversion is necessary for the relational Repository; however, the SaveCommand constructor requires a Converter, so the NoOpConverter is used. This Converter passes input through without modification, which is a common use case for object relational mapper-based Repositories that internally understand how to properly deconstruct and reconstruct a data model for persistence.

The key-value and columnar Repositories each utilize similarly developed Converters. Generally, in key-value and columnar data stores, a complex and normalized object graph must be modified significantly to enable effective storage. Because these data stores do not provide traditional table structures and data reference capabilities, it is common to de-normalize the data model and embed referenced structures. This is the approach taken by the MBox application for key-value and columnar persistence.

Figure 8-7. The EmailKeyValueConverter implementation of the MBox application.

To effectively store an Email in the key-value and columnar data stores, the Converter is responsible for embedding the associated User during conversion. The result of the conversion is a hash with a composite key and a JSON representation of the embedded object graph as its value. The composite key is a string generated from attributes present on the Email (e.g., ID, UID, User ID, User email address, etc.) joined by a colon. The JSON representation is generated by the Google Gson library [77] and stored as a string.
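Because Figure 8-7 is rendered as an image, a rough textual sketch of this conversion is given below. The Email and User shapes are hypothetical minimal stand-ins for the MBox data model, not the application's actual classes.

import com.google.gson.Gson;

// Hypothetical minimal shapes for illustration; the real MBox data model
// carries more attributes.
class User {
    long id;
    String emailAddress;
}

class Email {
    long id;
    long uid;
    User user;
    String content;
}

// Colon-joined composite key plus a Gson-serialized, de-normalized value
// with the User embedded, as described above.
class EmailKeyValueConverterSketch {
    private final Gson gson = new Gson();

    String[] convert(Email email) {
        String key = String.join(":",
                String.valueOf(email.id),
                String.valueOf(email.uid),
                String.valueOf(email.user.id),
                email.user.emailAddress);
        String value = gson.toJson(email); // JSON object graph stored as a string
        return new String[] { key, value };
    }
}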

Although the document Repository provides object relational mapping capabilities through Spring Data, a Converter is still needed to effectively store the data model. The Converter used for the document Repository is necessary to enhance the data model before persistence. This is done through the addition of attributes to the data model. For information retrieval tasks, content is often duplicated and analyzed (e.g., splitting, stemming, n-gram generation, etc.) differently to fulfill a variety of use cases. In the MBox application, Email content is typically a complex MIME object such as an HTML document. Although a search index could be built from HTML, it would likely result in poor search results. Parsing the content of the Email and storing it separately results in a dramatic increase in relevant search results.

Figure 8-8. The EmailDocumentConverter implementation of the MBox application.

The EmailDocumentConverter adds two additional fields to the Email data model to achieve effective persistence. The first field is used to store the parsed Email content, which is generated by the jsoup library [78]. As previously mentioned, the parsed content is used for search-specific tasks. The second additional field is the ID of the User associated with the Email. This is necessary to filter search results (i.e., Emails) by User. The addition of this field allows the application to store all Emails, regardless of User, in a single index while only returning results belonging to the current User. This behavior is accomplished via a filter query.
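A sketch of this enrichment step is shown below, reusing the hypothetical Email shape from the previous sketch; EmailDocument is an illustrative stand-in rather than the actual MBox class.

import org.jsoup.Jsoup;

// Illustrative enriched shape for the document store.
class EmailDocument {
    String content;       // original MIME/HTML body
    String parsedContent; // added field: plain text for the search index
    long userId;          // added field: enables filtering results by User
}

// Strip markup from the HTML body with jsoup and record the owning User's ID.
class EmailDocumentConverterSketch {
    EmailDocument convert(Email email) {
        EmailDocument doc = new EmailDocument();
        doc.content = email.content;
        doc.parsedContent = Jsoup.parse(email.content).text(); // plain text only
        doc.userId = email.user.id;
        return doc;
    }
}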

To execute the four individual SaveCommands, which make use of the previously described Repositories and Converters, a Pipeline must be created. As described in the previous chapter, there are a variety of implementations provided by the framework. For the purposes of the MBox application, a TieredPipeline is used. This is the optimal Pipeline for this usage scenario because we would like to ensure the Email is persisted to at least one data store (i.e., relational) before attempting to persist to additional data stores. Because a relational database will also assign a unique identifier (i.e., primary key) upon a successful save, we are able to use this unique identifier where required in other data stores (e.g., composite keys, document identifiers, etc.).

Figure 8-9. The configuration of the MBox Email save Pipeline.

By using a SimpleTieredPipeline, the Pipeline will be defaulted to make use of a ChainedConsecutiveRunner and a FirstNotNullResultReconciler. The first tier is simply the Command used to save Emails to the relational data store. The second tier is a ConcurrentPipeline that will concurrently persist to the key-value, document, and columnar data stores. Of note is the type enforcement, through Java generics, of the Pipeline inputs and outputs: the Pipeline is defined to accept an Email as input and requires that an Email be provided as output. This ensures that the Pipeline enforces a method signature similar to the Facade or Service method that is executing it. In the case of the MBox application, the application tier will initiate the call sequence by invoking the save method on the Service. The Service will in turn invoke the save method on the Facade, which will then delegate to the configured Pipeline and result in the persistence of the Email messages.

8.2.4 Find By User

In many applications that allow for the creation of a user account, it is common to fetch all entities associated with the user account; the MBox application is no different. When a user authenticates to the application, Emails are fetched from the server and persisted in the manner described in the previous section.

Each Email is associated with a User before persistence, which provides a data model capable of retrieving all Emails belonging to a given User. The MBox application is concerned with providing optimal performance when retrieving data, so our choice of storage strategies and Pipelines reflects this consideration. The high-level architecture used to find all Emails belonging to a given User is shown in Figure 8-10 below.

Figure 8-10. The find all Emails for User architecture.

The find architecture is similar to the save architecture with a few notable differences. The architecture makes use of the framework-provided FindAllCommand, which is responsible for implementing the read many or all workflow described in Chapter 7. In the find workflow, a Function is needed to produce a key suitable for data store specific lookup. The result of this operation could be as simple as calling a method that returns the object's primary key (i.e., ID) or as involved as constructing complex query criteria that account for advanced operations such as paging and filtering. The key generators developed for use in the find all Emails belonging to a User scenario are responsible for accepting a User and generating a simple data store specific lookup key. Because the relational Repository accepts a User in its find method, no key generator is necessary. The key-value and columnar key generators create composite keys from a variety of User attributes (i.e., ID, email address) by joining the attributes together with colons to create the final lookup key. The document key generator simply returns the primary key (i.e., ID) of the given User, which will be used to filter the eventual results set by User. The Converters remain the same as for the save case.

This is one of the advantages of encapsulating all behavior pertaining to data model conversion in discrete components. Having this behavior spread throughout a Repository or set of Repositories would negatively impact reuse. In fact, many of the Converters developed by the application make use of existing key generators through composition to generate composite keys during forward conversion in preparation for persistence.
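As an illustration, a key generator for the key-value store might be sketched as follows, reusing the hypothetical User shape from the earlier converter sketch; the use of java.util.function.Function as the key generator type is an assumption made for the sketch.

import java.util.function.Function;

// Colon-join the User's attributes into the composite lookup key used by the
// key-value store, mirroring the forward-conversion key construction.
class UserKeyValueKeyGeneratorSketch implements Function<User, String> {
    @Override
    public String apply(User user) {
        return String.join(":", String.valueOf(user.id), user.emailAddress);
    }
}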

The second notable difference between the save and find architectures is the Pipeline used. For retrieving results in the most efficient manner available, the framework provides the FlashPipeline. This Pipeline concurrently executes its Commands and awaits the first non-null result or results set returned. Once the result has been received, all Commands still executing are cancelled. The FlashPipeline is ideal for MBox because the application is not opinionated about where a result originates and makes a conscious choice to duplicate all information required by the user interface in each data store. The resulting query operation always retrieves results from the fastest available data store, even if the fastest store changes from query to query. This desirable behavior is easy to achieve using the Diamond architecture. The code to accomplish this behavior is shown in Figure 8-11.

Figure 8-11. The configuration of the MBox find Emails by User Pipeline.

8.2.5 Search

Search has come to be an expected feature of many applications. MBox implements comprehensive search across multiple data stores, again using the Diamond architectural style. However, not all data stores are suitable for search tasks. Document stores are a class of data store that typically excels at information retrieval tasks. To enable full-text search over a user's Emails, the MBox application utilizes the capabilities of the document and relational data stores. In a production application, a relational data store is unlikely to perform as well as a document store for search tasks. Using both classes of stores in this application is intended to show the flexibility of the architecture and additionally demonstrate a usage scenario where the quality of results is able to degrade slightly to provide durability and optimal performance. The high-level search architecture is shown in Figure 8-12 below.

Figure 8-12. The search Emails for User architecture.

As the diagram indicates, the search architecture only utilizes two of the four available data stores. The EmailService and EmailRepositoryFacade both provide search methods that take a query string as a method argument. This query is passed to the Pipeline as input, enforced by Java generics as a string, and requires no conversion before being given to each Repository.

Figure 8-13. The configuration of the MBox search Pipeline.

To enable proper search functionality, the application is again utilizing a FlashPipeline to optimize the persistence tier for the best performance possible among the available data stores. If the application had instead optimized for improved precision or recall of search results, a ConcurrentPipeline, which blocks until all Commands have executed and returned, with a custom Reconciler might have been used. To increase precision, a custom reconciliation strategy should be implemented to apply an order of precedence to the results set; for instance, the Reconciler could be implemented to prefer results from the more search-capable document store. Increasing recall could be accomplished by creating a reconciliation strategy that merges all results from both data stores.
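A recall-oriented strategy of the kind described could be sketched as follows; the Reconciler interface is again a simplified assumption, not the framework's exact signature.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Simplified assumption of the Reconciler interface, as before.
interface Reconciler<O> {
    O reconcile(List<O> results);
}

// Union the result sets returned by every data store, preserving first-seen
// order and dropping duplicates, to maximize recall.
class MergingReconcilerSketch<T> implements Reconciler<List<T>> {
    @Override
    public List<T> reconcile(List<List<T>> resultSets) {
        LinkedHashSet<T> merged = new LinkedHashSet<>();
        for (List<T> results : resultSets) {
            if (results != null) {
                merged.addAll(results);
            }
        }
        return new ArrayList<>(merged);
    }
}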

8.2.6 Decorators

The framework-provided Decorators are used throughout the MBox application, most notably to attach a data source to a result. The data source attribute is displayed in the footer of each Email presented by the user interface, as seen in Figure 8-1. For debugging purposes, the TimeCommandDecorator is also used frequently to compare and contrast the performance of each data store on a task-by-task basis. Having empirical timings of such operations is easily achieved using the Machinist framework. The code required to decorate the Commands used by MBox is provided below.


Figure 8-14. The MBox application Command Decorators.

Use of the Decorators is straightforward and easily extended. The callback required by the TimeCommandDecorator, for our purposes, simply prints the provided timings to standard out. A more advanced implementation could keep track of timings for other scenarios, such as determining which configured Command would be most efficient to execute based on historical runtimes, avoiding the need to execute all configured Commands when there is a high likelihood the same Command will always return first.


CHAPTER 9: Evaluation

In this chapter we present the evaluation of our architecture using a variety of common scenarios encountered by heterogeneous data-intensive applications. The developed framework, Machinist, and associated application, MBox, are used to perform the evaluation.

9.1 Overview

The ideal architecture for persistence in heterogeneous data-intensive systems will result in accessible and flexible polyglot persistence. In addition, the implementation of an associated framework will not hinder attributes afforded by the underlying, large-scale data storage systems. These systems are designed to offer scalability, availability, and durability with a consistent focus on performance.

The ideal architecture must minimize the overall effort required to adopt large-scale data stores without conceding the desirable attributes of these technologies.

By specifically leveraging the author's professional background as the co-founder and Chief Technology Officer of a large-scale data-intensive software startup, an independent software consultant, a software engineer at a publicly traded company, and an academic research assistant, we have the necessary experience to discuss the constraints present in highly demanding, large-scale software environments. Each existing system the author has contributed to leverages state-of-the-art tools from many computer science disciplines outside of software engineering, including distributed systems, programming languages, machine learning, natural language processing, and human-centered computing.

9.2 Approach

The approach taken for the evaluation of the Diamond architecture and associated implementation combines analytical and quantitative methods. We will enumerate a variety of scenarios that commonly occur within heterogeneous data-intensive systems. For each scenario, we will be able to compare and contrast our approach, through the use of our architecture and framework, to a more traditional approach that does not utilize our architecture. We will use the functionality of the MBox application to constrain each evaluation scenario, putting each usage scenario in the context of an existing application's architecture to cater to a familiar problem domain and data model.

9.3 Assumptions

To analytically evaluate the Diamond architecture against more traditional approaches to polyglot persistence, we must begin by describing our initial assumptions of both architectures. Our focus will be on the architectural changes that must occur in the architectures to fulfill each scenario. For the sake of this evaluation, we will assume that the data stores in use are already available to the application developer (e.g., data store deployment, schema definition, etc.). Both architectures will be assumed to be operating on the same data model, that of the MBox application, which is composed of a User and a set of Email messages.

The initial starting point for each approach (three-tier vs. Diamond) will be a full architecture implementation that makes use of a relational data store. The traditional enterprise application architecture will consist of application, service, and persistence tiers. The service and persistence tiers make use of Repository implementations that provide save and find operations. The Diamond approach will consist of application, Service, RepositoryFacade, Pipeline, and Sub-Repository components for a relational data store. It is also assumed that the Diamond approach will have existing implementations for save and find operations against this data store. The initial conceptual architectures are provided in Figure 9-1.

Figure 9-1. The initial evaluation systems using a Diamond architecture (left) and a traditional three-tier architecture (right).

As can be seen above, for a single data store there is some amount of additional work required by the Diamond architecture. The Diamond architecture requires a Pipeline for CRUD-based methods present in the service tier (e.g., save, find, find all, search, etc.). Each of these Pipelines requires a Command that in turn requires a Function (i.e., key or query generator for finds), Converter, and Repository. The addition of the Pipeline and its components, as well as the RepositoryFacade, is additional work in comparison to the traditional architecture. However, this effort is similar to the work of developing a system in anticipation of the system being run using multiple threads or on multiple servers. Up-front effort is required even if the software is initially running in a single thread or on a single server, but that effort results in measurable savings when the system is later scaled. The implementation code for both architectures is given in Appendix A.

9.4 Evaluation Scenarios

The remainder of this chapter will discuss a number of evaluation scenarios that contrast the advantages and disadvantages of each architectural approach. As a result, we contribute a thorough evaluation of our work.

9.4.1 Additional Data Store

As an application transitions from a homogeneous to a heterogeneous architectural style, it will require the ability to add an additional data store to its existing architecture. The need for an additional data store can arise from a variety of requirements including specialization, such as information retrieval tasks, performance, or redundancy, among others. Using the two initial architectures, we will discuss the steps required to add an additional data store to both architectures.

In a traditional enterprise architecture, there are multiple possible points of extension for the addition of a data store, the first being the existing Repository. The existing Repository already isolates the details of persistence for a single data store from the service tier, making the Repository a natural point of extension. The second reasonable point of extension is the existing service tier. Selecting the service tier as the point of extension is also reasonable, as it allows the Repository to continue to focus on encapsulating the persistence details of a single store. Because the MBox application is using Spring Data for its relational Repository, extending existing Services (i.e., EmailService) requires the least amount of effort. A Spring Data Repository abstracts and obfuscates many of the implementation details associated with relational persistence, making it difficult to adapt for polyglot persistence. It is far easier to create additional Repositories than to enhance the existing Spring Data Repository. Applying this reasonable strategy, the following steps are required to add an additional data store to our traditional architecture.

1. Create an additional Repository for the new data store

2. Inject the new Repository into existing Services

A new Repository must be created to encapsulate the persistence details of the store and then added to the existing service tier for later use.

Figure 9-2. Three-tier architecture with two data stores.

The steps required to prepare the Diamond architecture for an additional data store are very similar to the steps required to prepare a traditional three-tier architecture. It is helpful to point out that we will compare the details of actual persistence scenarios (i.e., making use of the additional Repository) later in this chapter. For now, we wish to focus on the amount of work required to prepare the architectures for an additional data store without explicitly invoking it (e.g., save, find, find all, etc.). The steps for additional data store inclusion required by the Diamond architecture are listed below.

1. Create an additional Repository for the new data store

2. Ensure the new Repository is available for use by existing Pipelines


As with the traditional approach, only two high-level steps are required by the Diamond architecture, and they are similar in effort. Allowing the Repositories to be discovered by the service tier or existing Pipelines can be done by directly instantiating the Repositories. However, in an enterprise software organization, this is likely to occur through an inversion of control framework, such as the Spring Framework, which provides access to the Repositories through an application context (i.e., a global context of instantiated classes). Regardless, the Diamond architecture requires less modification than the traditional architecture since the service tier is already configured to invoke Pipeline Commands via the RepositoryFacade (requiring no Service refactoring). The new data store will automatically be invoked correctly as soon as the Pipelines are updated to include the new Commands that read and write to the new data store through the new Repository. The prepared two data store Diamond architecture is shown in Figure 9-3 below for reference.

Figure 9-3. Diamond architecture with two data stores.

9.4.2 Persist Data to Additional Store

When adding an additional data store, the first step is to enable the persistence of data to the new store. In a traditional three-tier architecture, as described in the previous scenario and shown in Figure 9-2, an existing Service will be required to take on additional responsibilities. The following steps are required by the traditional architecture to enable persistence to a newly available data store.


1. Create save method on new data store Repository

2. Add data model conversion code to the existing Service save method

3. Invoke each Repository

4. Collect results of each Repository invocation

5. Implement the reverse data model conversion per Repository result

6. Add additional logic to Service to reconcile multiple Repository results

The steps listed above would be added to the save method of the EmailService in the traditional architecture. Doing so requires the refactoring of the save method to adapt to multiple data stores, converting, invoking, and reconciling with different behavior for each store within a single method. The implementation of the three-tier architecture EmailService save method is below.

Figure 9-4. Traditional architecture EmailService save method for two data stores.

Figure 9-4 explicitly shows the steps required to achieve polyglot persistence in a traditional architecture across a relational and document data store. Each line style represents a different behavior required to enable polyglot persistence. The solid outline sections show interactions with the underlying traditional, per data store, Repositories. The cross-hash (i.e., x) annotation highlights code associated with data model conversion, both forwards and backwards. Finally, the dashed annotation shows the code necessary to provide reconciliation of the multiple results returned by each Repository. Of note is that adapting each of the highlighted sections of code to accommodate an increasing number of data stores is non-linear. Additional data stores will require additional code across all highlighted sections. This additional code may need to be intermingled throughout the existing method to provide the necessary functionality, as is shown by persisting to the relational data store first to assign a primary key for use later in the method (i.e., solid outline sections). Additionally, conversion and reconciliation code (i.e., cross-hash and dashed sections respectively) is arbitrarily complex and will result in effort-intensive refactorings as the number of data stores increases.

The Diamond architecture provides constructs for many of these behaviors. The steps required by the Diamond architecture are listed below.

1. Create additional save method on data store Repository

2. Implement a new data model Converter for the data store

3. Add a new SaveCommand to the existing save Pipeline

The Machinist framework allows the user to utilize full implementations of common constructs encountered during the development of heterogeneous persistence tiers. Leveraging these components during data persistence results in clear isolation of concerns. The code for adding an additional data store to the save Pipeline using the Machinist framework is given below.

Figure 9-5. Diamond architecture Email save Pipeline for two data stores.

Similar to Figure 9-4, Figure 9-5 explicitly shows the code required to achieve polyglot persistence through utilization of the Machinist framework. In contrast to the traditional approach, the Diamond architecture scales linearly through the use of its architectural components. Adding an additional data store to an application is accomplished by adding an additional Command to an existing Pipeline (i.e., enforcing a linear relationship between Commands and data stores). All conversion, Repository, and reconciliation behavior (i.e., cross-hash, solid, and dashed highlights respectively) is encapsulated throughout the architecture, enforcing the single responsibility principle and minimizing non-linear growth as additional persistence technologies are adopted. This characteristic gives the Diamond architecture a significant advantage when adapting an application to a large number of data stores.

9.4.3 Retrieve Data from Additional Store

After the successful persistence of data to a newly available store, the need for data retrieval becomes apparent. In a traditional architecture, the service tier requires modification to take full advantage of the new store. Below, we list the steps required to enable an existing service tier to retrieve data from an additional data store.

1. Add additional find method to data store Repository

2. Add code to generate key from data model

3. Invoke each Repository

4. Collect results of each Repository invocation

5. Implement data model conversion per Repository result

6. Add additional logic to reconcile multiple Repository results

Again, the existing find Service method must be refactored to accommodate an additional data store. The refactored method is given below. Although in our implementation we are leveraging Spring Data to enable data model conversions automatically, this is not the case with other data stores and would result in a more complex refactoring.

Figure 9-6. Traditional architecture Service find method for two data stores.

At this point, there is code duplication between the save and find data model conversion, and there is likely to be duplication between the invocation, collection, and reconciliation strategies of each method. The Diamond architecture requires the following steps for the identical scenario.

1. Create additional find method on data store Repository

2. Implement a data model key generator Function for the data store

3. Add a new FindCommand, using the key generator and existing Converter, to the existing find Email Pipeline

The Diamond approach again leads to clear isolation of concerns and allows the user to leverage the existing data model Converter from the persistence scenario. The code to add an additional data store to the find Pipeline is given below.

Figure 9-7. Diamond architecture find Pipeline for two data stores.

9.4.4 Split Data Model Across Data Stores

Given a complex data model, it can be beneficial to store different aspects of the data model in a disparate number of data stores. One common example of this scenario is when a data model holds on to a large amount of content (e.g., HTML content body, encoded media data such as a large photo or video). Storing the metadata for the data model in a different store than the content can lead to performance advantages. Splitting the data model in the traditional architecture again requires the modification of the existing Service methods to accommodate the following steps.

1. Service save method must be modified to select only the needed attributes from the data model before invoking each Repository

2. Service find method must be modified to query each store for a given attribute

3. Service save method must be modified to reconstruct the data model from results

4. Service find method must be modified to reconstruct the data model from results

In the traditional architecture, any method that interacts with the data store through the use of the data model (likely most methods, excluding calls like count queries) will require modification if the data model must be split after the initial persistence tier architecture is implemented. The Diamond architecture requires the steps listed below.


1. Modify existing Converters to not pass through all attributes to the conversion result

2. Add reconstitution logic to existing Reconciler

The advantage of the Diamond architectural approach is that, by modifying the existing Converters to only pass through certain attributes on forward conversion (e.g., through predicates or other means), the split is accomplished for all usages of the Converters. The same is true of the reconstitution behavior in the Reconciler. A user of the framework is able to leverage the isolation of concerns provided by these architectural components. Although here we are evaluating a Service with only a few methods, updating a larger traditional-architecture-based system to support this type of change would quickly become overwhelming and error prone.

9.4.5 Per Data Store Data Model Attributes

Data stores vary widely in their methods of data storage. To effectively utilize a purpose-built data store, the data model is often required to adapt to the needs of the data store, which, in some cases, requires the addition of data model attributes on a per data store basis. The MBox application, described in Chapter 8, was required to add additional attributes to its data model for the document store to provide full-text search across a user's Emails. A traditional architecture would require the refactoring of the Service's save method to the workflow shown below.


1. Existing Service save method must be modified to add additional attributes to the data model before Repository invocation

2. Existing save method must be modified to recognize or disregard the attribute when backwards converting the Repository result

3. Existing find method must be modified to recognize or disregard the attribute when backwards converting the Repository result

The work required per data store to modify the traditional architecture to support additional data model attributes is similar to the split data model scenario. Any location in the Service that provides the data model to a Repository will need to add additional attributes before doing so. This implies modifying the Service anywhere that attributes could be used, including all save and find methods. The Diamond architecture requires similar modifications but provides existing constructs to do so, thus isolating the Service from any changes.

1. Modify existing Converter to add additional attributes

Again, utilizing the Converter construct allows the developer to make a change in a single place without the need to refactor an existing Service method. This type of isolation allows for changes to be made throughout the persistence tier without affecting the rest of the application.

9.4.6 Concurrent Persistence

An efficient application will not want to serially execute each persistence operation on every Repository. Significant performance improvements in polyglot persistence can be achieved by concurrently persisting across multiple data stores. Modifying the traditional architecture requires the enhancement of many aspects of the existing save method. The existing conversion and reconciliation code could be used unmodified, but the code responsible for invoking each Repository must be refactored.

1. Create a thread pool

2. Wrap the relational Repository call in a thread

3. Wrap the additional data store Repository call in a thread

4. Execute each threaded Repository call concurrently

5. Block until all Repository threads have returned results

Again, in the simple case, a single instance of this refactoring is not a large amount of work, but within a larger architecture, it could quickly become daunting. The Machinist approach is much simpler.

1. Switch the existing save Pipeline for the ConcurrentPipeline


As requirements change, the method in which Commands are executed can be easily changed using the Diamond architecture, allowing the developer to experiment with different styles of Command invocation to determine which style best suits their application’s needs.

9.4.7 Tiered Persistence

Consecutive or concurrent persistence is adequate for many environments, but for some applications it is necessary to persist in tiers. The canonical use case for this type of behavior in a polyglot persistence environment is the need to ensure that an action has occurred before executing additional actions. In the context of persistence, an application may want to ensure that a data model is successfully written to a relational store before being persisted to additional stores. This provides a simplistic form of data integrity by forcing the data to be placed successfully in one data store before being written, possibly asynchronously and concurrently, to additional stores. In the traditional architecture, the Repository invocation behavior of the Service’s save method must be modified to achieve this.


1. Group Repository calls and associated conversion code in an arbitrary order

2. Write a method of execution per group of Repository calls (concurrent, consecutive, etc.)

3. At the successful execution of each group

   a. Convert results

   b. Reconcile to one result

4. If not the last group, pass the result to the next group and execute

The code required to accomplish this in a traditional architecture is largely dependent on how many data stores the architecture is utilizing. As we are focused here on the work required to change the invocation methods of the Repositories, using many data stores will result in large amounts of code that suffers from the drawbacks described in the previous persistence scenarios such as lack of code reuse and major refactorings. Using the Pipeline construct from the Diamond architecture requires minimal steps.

1. Switch existing Pipeline implementation for TieredPipeline

2. Add each Command or set of Commands to a tier as needed

The Machinist framework provides a TieredPipeline implementation that can be swapped for any existing Pipeline. Creating the TieredPipeline requires the developer to perform the additional work of grouping Commands into sub-Pipelines and adding them as tiers to the parent Pipeline. However, the extra effort expended on creating sub-Pipelines provides the application developer with a flexible approach to tiered persistence, which has many uses including defining transactional boundaries and ensuring data integrity while simultaneously being conscious of performance.

9.4.8 Retrieve Fastest Available

Retrieving a result from all available data stores is important when the data model has been split across the data stores. However, in situations where the data has been duplicated instead of split, it is not necessary to wait for each store to return the same result. When the data model has been duplicated across many available data stores, the application would like to take the fastest result without concern for where it was retrieved. Accomplishing this in a traditional architecture would require the refactoring of the invocation of the available Repositories.

1. Create thread pool

2. Wrap each Repository call in a thread

3. Execute each threaded Repository call concurrently

4. Take the result of the first thread to return

5. Cancel all outstanding thread executions

The Machinist framework provides a Pipeline for this exact purpose; thus, there is again only one change required for the Diamond architecture.

1. Switch existing Pipeline for FlashPipeline implementation

The FlashPipeline implementation handles the details of non-blocking concurrent execution for the application developer. Changing an existing consecutive or concurrent Pipeline usage to a FlashPipeline involves changing a single line of code. Doing so ensures that, even if the data store that performs the fastest from request to request varies, application code remains unchanged.

9.4.9 Enable or Disable Data Store

As a business experiences growth, the software products it produces adapt incrementally. It is often helpful for application developers to test multiple deployments of similar data stores or the same data store with varying configurations, resulting in the need to quickly disable or re-enable the code paths responsible for utilizing a data store. In both architectures, the application developer has two relatively basic options: they can simply comment out the code path, or they can surround all code paths that are using the data store with a conditional check of a property that indicates if the data store should be used at the present time.

In the traditional architecture, commenting or removing all code used to interact with a data store (i.e., converting, Repository invocation, result reconciliation) would be error prone and time consuming. The better approach would be to wrap each of the functions with a conditional statement to facilitate the enablement or disablement of a store through a property. This style of programming is not ideal and results in errors and a tremendous amount of work for the developer.

The Diamond architecture lends itself to such a usage scenario. Because Commands contain the requisite knowledge for interfacing with a specific data store, the developer can simply remove them from inclusion by the Pipeline by commenting out or removing the Commands. Additionally, because the Pipeline controls which Commands are given to the Runner for execution, a Pipeline can be decorated or extended to only pass Commands associated with enabled data stores to the Runner.

9.4.10 Results Reconciliation

In any application where multiple data stores are being utilized, the application is constantly presented with results from each data store. Determining which results to return to the rest of the application can be a complex task. As additional data stores are added, the result reconciliation logic must be adapted in all locations where results are reconciled. Changing the reconciliation logic in a traditional architecture requires:

1. The modification of each method that handles results from multiple data stores: at a minimum, the find and save methods in our example


The Diamond architecture provides the Reconciler construct to encapsulate this behavior. Common behaviors, such as taking the first non-null result from a set of similar results, can be easily implemented by using the provided FirstNotNullResultReconciler. For this example scenario, the Diamond architecture requires a single change.

1. Add additional behavior to the existing Reconciler

This modification would change the reconciliation behavior of every Pipeline making use of the Reconciler.

9.5 Quantitative Features of the Evaluation

To evaluate the overall effort required by each architecture for data store adoption, it is helpful to provide a quantitative comparison between the two. To enable this comparison we again make use of the MBox application. The final application provides a full implementation of the EmailService interface for each architectural approach. The three-tier architecture provides an EmailService implementation that contains all the behavior required to save Emails, find Emails by a given User, and search Emails. The Diamond architecture implementation configures the necessary Pipelines for the same functionality outside of the EmailService and ensures the Pipelines are available to the RepositoryFacade, which in turn is invoked by the Diamond EmailService. Both implementations utilize four disparate classes of data stores: relational, document, key-value, and columnar. The line-of-code counts for each approach are provided in Table 9-1.

Traditional                                Diamond
Component                 Lines of Code    Component             Lines of Code
save(email:Email)         37               SavePipeline          34
save(emails:Collection)   71               SaveAllPipeline       16
findByUser(user:User)     40               FindByUserPipeline    16
search(query:String)      9                SearchPipeline        9

Table 9-1. Line of code counts for traditional (left) vs. Diamond (right) EmailService implementations utilizing four data stores.

The line-of-code counts for the traditional approach were derived from the ThreeTierEmailService implementation. The counts for the Diamond architecture were taken from the custom classes required to configure the necessary Pipelines and provide conversion of the MBox data model (i.e., EmailDocumentConverter, EmailKeyValueConverter, EmailColumnarConverter, EmailKeyValueKeyGenerator, EmailDocumentKeyGenerator). The total line-of-code counts for each approach are provided in Table 9-2 below.

Traditional    Diamond
157 LOC        75 LOC

52% line of code reduction

Table 9-2. Total line of code disparity between the traditional and Diamond architectural approaches.

As is shown by the line-of-code comparison, the Diamond architectural approach to polyglot persistence results in significant code savings for the persistence tier developer. When considered in combination with the analytical evaluation conducted previously in this chapter, the Diamond architecture minimizes the complexities and overall effort associated with the adoption of numerous specialized data storage technologies.

9.6 Evidence of Usefulness for Data Store Experimentation

When adopting new data stores, a developer will often experiment with a variety of specialized technologies. Using the Diamond architecture, the application developer can not only more easily add numerous data stores to their existing application but can also easily experiment with the available data stores once they have been integrated. Experimentation leads to a better understanding of how data stores compare to one another in terms of performance, durability, and flexibility, as well as an overall understanding of how the greater application benefits from the utilization of multiple data stores. The MBox application greatly benefitted from this type of exploration, primarily through the utilization of the TimeCommandDecorator, which provided empirical timings of data store operations. An example log output of a typical user workflow is given in Figure 9-8.

Figure 9-8. Log output of MBox application.

The log output shows the empirical timings, in milliseconds, of the four disparate data stores utilized by the MBox application (i.e., relational, document, key-value, and columnar). The log output shows the user login, the Email fetch from IMAP, Email persistence across all four data stores, and Email search. Beginning at 2015-04-10 10:27:58.477, the MBox application temporarily shut down the document store to demonstrate the durability of the Diamond approach. During this time period the user continued to issue searches against the system that were fulfilled solely by the relational store. At 2015-04-10 10:32:49.550 the document store is restored to full functionality and begins to again service requests. Once the cache of the document store is again primed (2015-04-10 10:32:51.839), it returns all Emails for a User more than 10 milliseconds faster than the other three data stores. While these performance results are indeed interesting, they additionally serve to highlight how valuable data store experimentation can be to understanding what scenarios and conditions result in the effective utilization of a data store within the persistence tier and what system attributes propagate throughout the greater application.

9.7 Results

The results of our evaluation show the many advantages the Diamond architecture has over the traditional approach. Although the Diamond architecture does require some amount of up-front work to implement, it is similar in scope to developing for multi-threaded or multi-server environments. The results show that, through many common polyglot persistence scenarios, the implementation of the traditional three-tier architecture requires up to 50% more work than the implementation of the Diamond architecture utilizing the Machinist framework. Additionally, the Diamond architecture limits greater-than-linear growth in persistence code by providing a one-to-one mapping between Commands and data stores. This strategy allows the developer to better prepare the persistence tier of their application for increasing numbers of data stores. The major lessons learned from our evaluation are:

1. Isolating all persistence responsibilities within the Service tier is a poor approach that

   a. Results in too much responsibility for a component that is intended to focus solely on business logic.

   b. Ruins testability of the Service.

2. Isolating the persistence workflows within Commands allows for reuse by limiting behavior duplication in Services.

3. Conversion code (e.g., data model conversion, selective attributes, additional attributes) is difficult to keep isolated in a Service method. Using a Converter encapsulates these responsibilities and ensures single points of code change.

4. Methods of Repository invocation are diverse and subject to change. Changing the Pipeline implementation is far easier than refactoring a Service method.

5. Diamond architecture Service and Facade methods are simplistic, often only invoking the run method of a Pipeline. This facilitates testability and maintains the focus of the Service on business logic.

6. Creating new Functions (i.e., key generators and callbacks), Converters, Runners, Reconcilers, and Pipelines is easily accomplished through composition and Decorators. The same is not possible through Service extensions.

The level of effort required to provide persistence for two data stores without the Diamond architecture and Machinist framework has been demonstrated to be substantial. Adding additional stores using the traditional architecture would also be substantial and would quickly become unmanageable. Maintaining a Service that must coordinate N data stores is incredibly complex; however, the Diamond architecture was designed specifically to fulfill this task. The level of effort to add an additional store is the same as described in our evaluation scenarios using the Machinist framework. Through this design, we have provided an architecture capable of adapting to varying numbers and types of data stores in a given system over time. Furthermore, the types of small, isolated changes required by the Machinist framework to work with a new data store significantly reduce the costs typically associated with the task. This reduction, in turn, significantly increases the accessibility of working with multiple data stores. Since the cost of entry has been significantly reduced, developers can feel more confident that "the leap" to experiment with or adopt a new data store is one that is manageable and safe as opposed to error-prone and dangerous. Finally, the architecture developed provides the scalability, availability, and durability required to meet the expectations of current users of large-scale, heterogeneous data-intensive systems.


CHAPTER 10: Future Work

The movement away from one-size-fits-all systems and toward systems composed of specialized data storage technologies will continue to grow, motivating future research on architectural patterns for large-scale heterogeneous systems.

A future direction for work in this research area is data model decomposition and reconstitution. Currently, the application developer is responsible for designing how to decompose the data model for effective storage by each data store, as well as for implementing the details of data model reconstitution after retrieval from multiple stores. These complexities could be hidden from non-expert users or handled automatically based on best practices or other information inferred from the code base. For example, a successful implementation would allow the application developer to declaratively specify which data store to persist the data model to, without needing to also determine how best to store it. Such a system would drastically increase the accessibility of these technologies by limiting the accidental complexities associated with data model conversion. Research is needed to understand how to ensure that data model Converters understand data model versioning and the underlying data store migrations that support it. An automated data model Converter generator could greatly aid these tasks.
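As a rough sketch of what such a declarative mechanism might look like, consider a hypothetical annotation that names the target stores while leaving decomposition, conversion, and reconstitution to generated Converters. This annotation does not exist in the Machinist framework; it only illustrates the proposal.

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Hypothetical annotation: the developer declares *where* an entity is persisted,
    // while a future framework derives *how* (decomposition, conversion, reconstitution).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface PersistTo {
        String[] stores();
    }

    // The data model is declared once; per-store representations would be generated.
    @PersistTo(stores = { "relational", "document", "columnar" })
    class Email {
        String id;
        String subject;
        String body;
    }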

An additional research direction is the creation of a federated or imperial query API for data stores. It is overwhelming to learn the details of all the available data store query technologies because they vary drastically, from high-level languages (e.g., SQL, CQL, Cypher, Gremlin) to extensive programming paradigms (e.g., MapReduce). A query framework that could provide an extensible abstraction over the underlying query technologies would revolutionize the accessibility of data store technologies. This research could be extended further to enable the automatic detection and execution of a query on the system best suited for the task. Enabling this type of query intelligence would minimize the expertise required to implement the individual Repositories leveraged by the Diamond architecture.
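One possible shape for such an abstraction is sketched below: a store-agnostic query contract behind which adapters for SQL, CQL, or MapReduce could sit, with a planner choosing the best-suited store. The interface and all names are hypothetical illustrations of the idea, not an existing API.

    import java.util.List;
    import java.util.Map;

    // Hypothetical store-agnostic query contract: one entry point regardless of the
    // underlying query technology (SQL, CQL, Cypher, Gremlin, MapReduce, ...).
    interface FederatedQuery {
        <T> List<T> find(Class<T> type, Map<String, Object> criteria);
    }

    // Each adapter translates the generic criteria into one native technology.
    class CqlQueryAdapter implements FederatedQuery {
        public <T> List<T> find(Class<T> type, Map<String, Object> criteria) {
            // build and execute a CQL statement against the columnar store
            throw new UnsupportedOperationException("sketch only");
        }
    }

    // A planner could route each query to the store best suited to execute it.
    class QueryPlanner implements FederatedQuery {
        private final List<FederatedQuery> adapters;
        QueryPlanner(List<FederatedQuery> adapters) { this.adapters = adapters; }
        public <T> List<T> find(Class<T> type, Map<String, Object> criteria) {
            // cost-based adapter selection elided; a real planner would inspect the criteria
            return adapters.get(0).find(type, criteria);
        }
    }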

Utilizing multiple data stores introduces a variety of data integrity issues for the application developer. Although these issues already exist in most modern software companies, increasing the number of places that data is stored is likely to intensify them. Providing direction on how to deal with these issues in highly heterogeneous environments would have a major impact on this style of application architecture. Ideally, the architectural constructs provided by this dissertation could serve as the foundation for such work: a series of Pipelines and Commands could be developed to help the persistence architect understand where data was placed and, in the most ideal scenario, where data should be placed. This type of tool support would provide incredible value to organizations adopting heterogeneous architectures. The Commands and Pipelines provided through the Machinist framework are useful but basic in comparison to their full potential.

A final research area is the challenge of data integration at scale. Integrating data from disparate sources is difficult and at the core of the promises made by “Big Data”. Stonebraker suggests the following techniques for data integration at scale: human involvement, transformations, entity consolidation, cleaning, and wrappers [69]. This is likely another area where one size does not fit all, and data integration technologies will need to employ many techniques to achieve success. The Data Tamer System [79], which works to automate the data integration process, is one such example. Future work in architectural patterns will need to provide for data integration systems that process large numbers of disparate data sources and produce canonical forms of data for storage. Leveraging the work of the Data Tamer System, intelligent and even run-time-configured Converters, Commands, and Pipelines could be built to help adapt integration strategies to multi-data-store environments. Integrating these technologies with heterogeneous architectures is important because the canonical representation of an object is likely to be formed through an iterative process as new knowledge is discovered over time. Aspects of the architecture that interact directly with the data model may need to be notified of changes made by the data integration system; for example, an intelligent data model decomposition strategy may need to incorporate knowledge of a new attribute or type added by the data integration system. Research in this area could result in systems capable of storing and integrating vast and disparate data sources without the need for time-consuming development tasks.
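A minimal sketch of this notification idea follows, with entirely hypothetical names: a data integration component publishes schema changes, and registered Converters adjust their store-specific mappings in response.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical listener contract: Converters register to learn of attributes
    // added by the data integration system, so decomposition stays current.
    interface SchemaChangeListener {
        void attributeAdded(String entity, String attribute, Class<?> type);
    }

    class SchemaChangePublisher {
        private final List<SchemaChangeListener> listeners = new ArrayList<>();
        void register(SchemaChangeListener listener) { listeners.add(listener); }
        void announce(String entity, String attribute, Class<?> type) {
            listeners.forEach(l -> l.attributeAdded(entity, attribute, type));
        }
    }

    class EmailConverter implements SchemaChangeListener {
        public void attributeAdded(String entity, String attribute, Class<?> type) {
            // extend this Converter's store-specific mapping to include the new attribute
        }
    }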


CHAPTER 11: Conclusions

The contribution of this dissertation is the design, implementation, and evaluation of an architecture that enables polyglot persistence. The Diamond architecture provides a series of constructs that allow a traditional enterprise architecture (i.e., the three-tier architecture) to adopt an unknown or varying number of data stores. In this way, we contribute to the field of software engineering an architecture useful for polyglot persistence. The resulting software enables application developers to adopt and/or abandon large-scale data stores without overwhelming complexity. Minimizing the complexities of polyglot persistence makes this style of architecture accessible to more software engineers, which, in turn, will aid the transition toward a multitude of heterogeneous systems that are not built solely upon one-size-fits-all technologies. The experience gained through the design and implementation of this polyglot persistence framework will enable future recommendations on how best to develop tools and techniques for this new architectural era.


REFERENCES

[1] S. Few, “Visual Business Intelligence – If Big Data Is Anything at All, This Is It,” Perceptual Edge, 18-Aug-2014. [Online]. Available: http://www.perceptualedge.com/blog/?p=1955. [Accessed: 14-Mar-2015].

[2] K. Zickuhr and A. Smith, “Digital differences,” Pew Internet, Washington, DC, Apr. 2012.

[3] “Gmail - Free Storage and Email from Google.” [Online]. Available: https://www.gmail.com/intl/en/mail/help/about.html. [Accessed: 14-Mar-2015].

[4] “Welcome to Facebook - Log In, Sign Up or Learn More,” Facebook. [Online]. Available: https://www.facebook.com/. [Accessed: 14-Mar-2015].

[5] R. T. Fielding, “Architectural styles and the design of network-based software architectures,” Ph.D. dissertation, University of California, Irvine, 2000.

[6] D. Crockford, “JSON,” JSON, 2002. [Online]. Available: http://www.json.org/. [Accessed: 20-Mar-2015].

[7] T. Bray, Ed., “RFC 7159 - The JavaScript Object Notation (JSON) Data Interchange Format,” Mar-2014. [Online]. Available: http://tools.ietf.org/html/rfc7159. [Accessed: 14-Mar-2015].

[8] M. Cox and D. Ellsworth, “Application-controlled demand paging for out-of-core visualization.,” in Proceedings of the 8th conference on Visualization, Phoenix, AZ, USA, 1997, p. 235–ff.

[9] G. E. Moore, “Cramming More Components Onto Integrated Circuits,” Proceedings of the IEEE, vol. 86, no. 1, pp. 82–85, Jan. 1998.

[10] “VLDB Endowment Inc.” [Online]. Available: http://www.vldb.org/. [Accessed: 16-Mar-2015].

[11] “Star schema - Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Star_schema. [Accessed: 16-Mar-2015].

[12] D. J. Abadi, S. R. Madden, and N. Hachem, “Column-stores vs. row-stores: how different are they really?,” in Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 2008, pp. 967–980.

[13] S. Harizopoulos, D. J. Abadi, S. Madden, and M. Stonebraker, “OLTP through the looking glass, and what we found there,” in Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 2008, pp. 981–992.

[14] M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O’Neil, and others, “C-store: a column-oriented DBMS,” in Proceedings of the 31st international conference on Very large data bases, 2005, pp. 553–564.

[15] “MySQL :: The world’s most popular open source database.” [Online]. Available: http://www.mysql.com/. [Accessed: 14-Mar-2015].

[16] “PostgreSQL: The world’s most advanced open source database.” [Online]. Available: http://www.postgresql.org/. [Accessed: 14-Mar-2015].

[17] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, “Bigtable: A distributed storage system for structured data,” ACM Transactions on Computer Systems (TOCS), vol. 26, no. 2, p. 4, 2008.

[18] S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis, “Dremel: interactive analysis of web-scale datasets,” Proceedings of the VLDB Endowment, vol. 3, no. 1–2, pp. 330–339, 2010.

[19] A. Schram and K. M. Anderson, “MySQL to NoSQL: data modeling challenges in supporting scalability,” in Proceedings of the 3rd annual conference on Systems, programming, and applications: software for humanity, 2012, pp. 191–202.

[20] A. Lakshman and P. Malik, “Cassandra: a decentralized structured storage system,” ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010.

[21] “Google BigQuery - Fully Managed Big Data Analytics Service — Google Cloud Platform.” [Online]. Available: https://cloud.google.com/bigquery/. [Accessed: 14-Mar-2015].

[22] “Apache Parquet.” [Online]. Available: http://parquet.incubator.apache.org/. [Accessed: 14-Mar-2015].

[23] “HBase – Apache HBase™ Home.” [Online]. Available: http://hbase.apache.org/. [Accessed: 14-Mar-2015].

[24] “DataStax - NoSQL Cassandra Database | Fastest, Most Scalable.” [Online]. Available: http://www.datastax.com/. [Accessed: 14-Mar-2015].

[25] “Redis.” [Online]. Available: http://redis.io. [Accessed: 18-Mar-2015].

[26] “memcached - a distributed memory object caching system.” [Online]. Available: http://memcached.org/. [Accessed: 14-Mar-2015].

[27] “Ehcache | Performance at Any Scale.” [Online]. Available: http://ehcache.org/. [Accessed: 14-Mar-2015].

[28] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” in ACM SIGOPS operating systems review, 2003, vol. 37, pp. 29–43.

[29] “Welcome to Apache™ Hadoop®!,” Apache Hadoop, 26-Feb-2015. [Online]. Available: http://hadoop.apache.org/. [Accessed: 14-Mar-2015].

[30] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo: amazon’s highly available key-value store,” in ACM SIGOPS Operating Systems Review, 2007, vol. 41, pp. 205–220.

[31] “Apache Lucene - Welcome to Apache Lucene.” [Online]. Available: http://lucene.apache.org/. [Accessed: 14-Mar-2015].

[32] “Apache Solr.” [Online]. Available: http://lucene.apache.org/solr/. [Accessed: 14-Mar-2015].

[33] “Elasticsearch: RESTful, Distributed Search & Analytics | Elastic.” [Online]. Available: https://www.elastic.co/products/elasticsearch. [Accessed: 14-Mar-2015].

[34] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, p. 107, Jan. 2008.

[35] “MongoDB.” [Online]. Available: http://www.mongodb.org/. [Accessed: 14-Mar-2015].

[36] E. Bentley, “MongoDB mocked after posting ‘100GB Scaling Checklist’ - JAXenter,” JAXenter, 03-Oct-2013. [Online]. Available: http://jaxenter.com/mongodb-mocked-after-posting-100gb-scaling-checklist-106822.html. [Accessed: 14-Mar-2015].

[37] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: a system for large-scale graph processing,” in Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, 2010, pp. 135–146.

[38] “Giraph - Welcome To Apache Giraph!” [Online]. Available: http://giraph.apache.org/. [Accessed: 14-Mar-2015].

[39] “Titan: Distributed Graph Database,” Titan: Distributed Graph Database. [Online]. Available: http://thinkaurelius.github.io/titan/. [Accessed: 14-Mar-2015].

[40] “Neo4j, the World’s Leading Graph Database.” [Online]. Available: http://neo4j.com/. [Accessed: 14-Mar-2015].

[41] L. A. Meyerovich and A. S. Rabkin, “Empirical analysis of programming language adoption,” in Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA), 2013, pp. 1–18.

[42] S. Kleinschmager, S. Hanenberg, R. Robbes, É. Tanter, and A. Stefik, “Do static type systems improve the maintainability of software systems? An empirical study,” in Program Comprehension (ICPC), 2012 IEEE 20th International Conference on, 2012, pp. 153–162.

[43] C. Mayer, S. Hanenberg, R. Robbes, É. Tanter, and A. Stefik, “An empirical study of the influence of static type systems on the usability of undocumented software,” ACM SIGPLAN Notices, vol. 47, no. 10, p. 683, Nov. 2012.

[44] P. Petersen, S. Hanenberg, and R. Robbes, “An empirical comparison of static and dynamic type systems on API usage in the presence of an IDE: Java vs. groovy with eclipse,” 2014, pp. 212–222.

[45] S. Hanenberg, “An experiment about static and dynamic type systems: Doubts about the positive impact of static type systems on development time,” in ACM Sigplan Notices, 2010, vol. 45, pp. 22–35.

[46] F. P. Brooks, Jr., “No Silver Bullet: Essence and Accidents of Software Engineering,” Computer, vol. 20, no. 4, pp. 10–19, Apr. 1987.

[47] Y. Sverdlik, “Google Replaces MapReduce With New Hyper-Scale Cloud Analytics System,” Data Center Knowledge, 25-Jun-2014. [Online]. Available: http://www.datacenterknowledge.com/archives/2014/06/25/google-dumps-mapreduce-favor-new-hyper-scale-analytics-system/. [Accessed: 14-Mar-2015].

[48] L. Prechelt, B. Unger-Lamprecht, M. Philippsen, and W. F. Tichy, “Two controlled experiments assessing the usefulness of design pattern documentation in program maintenance,” Software Engineering, IEEE Transactions on, vol. 28, no. 6, pp. 595–606, 2002.

[49] B. Unger and W. F. Tichy, “Do Design Patterns Improve Communication? An Experiment with Pair Design,” in Proceedings of International Workshop Empirical Studies of Software Maintenance, 2000.

[50] E. Gamma, R. Helm, R. Johnson, and J. M. Vlissides, Design patterns: elements of reusable object-oriented software. Reading, Mass: Addison-Wesley, 1994.

[51] M. Stonebraker and U. Cetintemel, “‘One size fits all’: an idea whose time has come and gone,” in Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on, 2005, pp. 2–11.

[52] M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland, “The end of an architectural era: (it’s time for a complete rewrite),” in Proceedings of the 33rd international conference on Very large data bases, 2007, pp. 1150–1160.

[53] R. Kallman, Y. Zhang, J. Hugg, D. J. Abadi, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, and M. Stonebraker, “H-store: a high-performance, distributed main memory transaction processing system,” Proceedings of the VLDB Endowment, vol. 1, no. 2, pp. 1496–1499, Aug. 2008.

[54] “Real-Time Analytics Platform | Big Data Analytics | MPP Data Warehouse.” [Online]. Available: http://www.vertica.com/. [Accessed: 14-Mar-2015].

[55] “In-Memory Database, NewSQL & Real-Time Analytics | VoltDB.” [Online]. Available: http://voltdb.com/. [Accessed: 14-Mar-2015].

[56] M. Fowler, “PolyglotPersistence,” Martin Fowler, 16-Nov-2011. [Online]. Available: http://martinfowler.com/bliki/PolyglotPersistence.html. [Accessed: 19-Mar-2015].

[57] P. J. Sadalage and M. Fowler, NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Upper Saddle River, NJ: Addison-Wesley, 2013.

[58] S. Leberknight, “Polyglot Persistence,” Altamira, 15-Oct-2008. [Online]. Available: https://www.altamiracorp.com/blog/employee-posts/polyglot-persistence. [Accessed: 14-Mar-2015].

[59] C. Omar, D. Kurilova, L. Nistor, B. Chung, A. Potanin, and J. Aldrich, “Safely composable type-specific languages,” in ECOOP 2014–Object-Oriented Programming, Springer, 2014, pp. 105–130.

[60] T. Haselmann, G. Thies, and G. Vossen, “Looking into a REST-based universal API for Database-as-a-Service systems,” in Commerce and Enterprise Computing (CEC), 2010 IEEE 12th Conference on, 2010, pp. 17–24.

[61] F. Gessert, S. Friedrich, W. Wingerath, M. Schaarschmidt, and N. Ritter, “Towards a Scalable and Unified REST API for Cloud Data Stores,” in Jahrestagung der Gesellschaft für Informatik, Stuttgart, Germany, 2014, pp. 723–734.

[62] H. Lim, Y. Han, and S. Babu, “How to Fit when No One Size Fits,” in CIDR, 2013, vol. 4, p. 35.

[63] K. M. Anderson and A. Schram, “Design and implementation of a data analytics infrastructure in support of crisis informatics research: NIER track,” in Proceedings of the 33rd International Conference on Software Engineering (ICSE), 2011, p. 844.

[64] M. Fowler, Patterns of enterprise application architecture. Boston: Addison-Wesley, 2003.

[65] E. Hieatt and R. Mee, “P of EAA: Repository,” Martin Fowler. [Online]. Available: http://martinfowler.com/eaaCatalog/repository.html. [Accessed: 14-Mar-2015].

[66] R. C. Martin, Agile software development: principles, patterns, and practices. Upper Saddle River, N.J: Prentice Hall, 2003.

[67] D. Garlan and M. Shaw, “An Introduction to Software Architecture,” in Advances in Software Engineering and Knowledge Engineering, vol. I, V. Ambriola and G. Tortora, Eds. New Jersey: World Scientific Publishing Company, 1993.

[68] “Multitier architecture,” Wikipedia, the free encyclopedia. 01-Mar-2015.

[69] M. Stonebraker, “What Does ‘Big Data’ Mean (Part 4)? | blog@CACM | Communications of the ACM,” Communications of the ACM, 14-Mar-2013. [Online]. Available: http://cacm.acm.org/blogs/blog-cacm/162095-what-does-big-data-mean-part-4/fulltext. [Accessed: 14-Mar-2015].

[70] “google/guava · GitHub.” [Online]. Available: https://github.com/google/guava. [Accessed: 14-Mar-2015].

[71] “Java Software | Oracle.” [Online]. Available: https://www.oracle.com/java/index.html. [Accessed: 14-Mar-2015].

[72] “Spring Framework,” 2015. [Online]. Available: http://projects.spring.io/spring-framework/. [Accessed: 14-Mar-2015].

[73] “A JavaScript library for building user interfaces | React.” [Online]. Available: http://facebook.github.io/react/. [Accessed: 14-Mar-2015].

[74] “Bootstrap · The world’s most popular mobile-first and responsive front-end framework.” [Online]. Available: http://getbootstrap.com/. [Accessed: 14-Mar-2015].

[75] “Spring Data.” [Online]. Available: http://projects.spring.io/spring-data/. [Accessed: 16-Mar-2015].

[76] “datastax/java-driver · GitHub.” [Online]. Available: https://github.com/datastax/java-driver. [Accessed: 14-Mar-2015].

[77] “google-gson - A Java library to convert JSON to Java objects and vice-versa - Google Project Hosting.” [Online]. Available: https://code.google.com/p/google-gson/. [Accessed: 16-Mar-2015].

[78] “jsoup Java HTML Parser, with best of DOM, CSS, and jquery.” [Online]. Available: http://jsoup.org/. [Accessed: 16-Mar-2015].

[79] M. Stonebraker, D. Bruckner, I. F. Ilyas, G. Beskales, M. Cherniack, S. B. Zdonik, A. Pagan, and S. Xu, “Data Curation at Scale: The Data Tamer System.,” in CIDR, 2013.


APPENDIX A – MBox Persistence Implementations

Three-tier architecture, single data store, EmailService implementation for the MBox application.

Diamond architecture RepositoryFacade implementation used by MBox.

The Pipeline implementation for a dual data-store MBox application.

APPENDIX B – Machinist Class Diagram