Universal Integration Architecture for Heterogeneous Datasources and Optimisation Methods

UNIVERSAL INTEGRATION ARCHITECTURE FOR HETEROGENEOUS DATASOURCES AND OPTIMISATION METHODS
(Polish title: UNIWERSALNA ARCHITEKTURA INTEGRACYJNA DLA HETEROGENICZNYCH ŹRÓDEŁ DANYCH I METOD OPTYMALIZACJI)

This dissertation is submitted for the degree of Doctor of Philosophy
by Michał Chromiak
Faculty of Mathematics, Physics and Computer Science, Maria Curie-Skłodowska University, Lublin
Advisor: prof. dr hab. Krzysztof Stencel
Institute of Fundamental Technological Research, Polish Academy of Sciences
Warsaw 2015

Table of Contents

LISTINGS ... 5
LIST OF FIGURES ... 6
LIST OF TABLES ... 8
ABSTRACT ... 9

CHAPTER 1. INTRODUCTION ... 19
  1.1 Motivation ... 19
  1.2 Considerations, Objectives and the Thesis ... 20
  1.3 History and Related Work ... 22
  1.4 Thesis Outline ... 23

CHAPTER 2. THE STATE OF THE ART AND THE RELATED WORKS ... 25
  2.1 Integrity - the Philosophy of Integration ... 25
  2.2 Integration - Cure for Chaos of Multiplicity, General Considerations ... 27
    2.2.1 At the beginning there was a relation ... 28
    2.2.2 Revolution - the Web changes everything ... 30
    2.2.3 Integration - Principia and Taxonomy ... 35
    2.2.4 Data Integration Practices ... 38
    2.2.5 Integration Theory ... 42
    2.2.6 Data Integration Issues ... 47
  2.3 Data Stores - the Integration Targets ... 51
    2.3.1 Database modelling - persistence ... 51
    2.3.2 Relational Model ... 51
    2.3.3 Object-oriented Database Model ... 55
    2.3.4 Column-oriented Relational Database Model (CORDB) – Relational Approach ... 56
    2.3.5 NoSQL – Distributed Storage Services ... 57
    2.3.6 NewSQL ... 63
    2.3.7 Big Data - all or nothing ... 66
    2.3.8 After SQL Era ... 68
    2.3.9 Database taxonomy ... 70
  2.4 Related Works - Overview of Modern Integrating Solutions ... 71
    2.4.1 OLTP & OLAP - sets of operations ... 72
    2.4.2 Metamodels - Metadata ... 78
    2.4.3 Distributed File Systems - Embracing Scaling Up in Size ... 80
    2.4.4 Enterprise Service Bus (ESB) ... 94
    2.4.5 ESB / SOA - Rules of Engagement ... 96
  2.5 Conclusions ... 97

CHAPTER 3. THE MODEL OF THE ARCHITECTURE ... 99
  3.1 Data vs Application Integration Patterns ... 100
    3.1.1 Patterns in Software Development ... 100
    3.1.2 Architectural Patterns in Integration ... 101
  3.2 General Architecture and Assumptions ... 103
    3.2.1 Virtualization as the Key to Integration – Postulates ... 103
    3.2.2 Polyglot Persistence – building "The Tower of Babel" ... 105
    3.2.3 Event Sourcing as a Persistence Technique ... 108
    3.2.4 Command Query Responsibility Separation (CQRS) Pattern ... 109
    3.2.5 OMG CORBA - Standard Specification ... 115
    3.2.6 Metadata ... 116
    3.2.7 Design Patterns - Study of Utility ... 116
    3.2.8 Integration Database Model - IDBM ... 118
    3.2.9 Indexing Role in Integrated Datamodel ... 119
  3.3 The Architecture ... 121
    3.3.1 Principia – Assumptions and Directions ... 121
    3.3.2 Components of the Architecture ... 124
    3.3.3 Workflow ... 139
  3.4 Faced Challenges ... 141

CHAPTER 4. APPLICATIONS ... 143
  4.1 Integration ... 143
    4.1.1 Polystores as the Next-gen Federations vs Qboid-based Architecture for BigData Integration ... 145
  4.2 Optimization ... 147
    4.2.1 Indexing Distributed and Heterogeneous Data ... 148
    4.2.2 Indexing Projections ... 148
    4.2.3 Exploiting Order Dependencies Optimization Technique for Qboid-based Integration Architecture ... 150
    4.2.4 Polyglot Persistence as an Optimization Technique for Integration Architecture ... 156
  4.3 Conclusions ... 161

CHAPTER 5. SUMMARY AND CONCLUSIONS ... 163
  5.1 The Limitation of Prototype and Further Works ... 164
  5.2 Additional Mediator Functionalities ... 165

APPENDIX A. PROTOTYPE IMPLEMENTATION ... 167
  A.1 Integration Layer ... 167
    A.1.1 The IDL Scheme for Integration Contexts of Qboid and the Integration View ... 169
    A.1.2 The Integration Scheme in Action – Example ... 172
APPENDIX B. STANDARDS AND CLASSIFICATIONS ... 177
APPENDIX C. HADOOP ECOSYSTEM ... 185
BIBLIOGRAPHY ... 189

Listings

  2.1 OWL/XML Syntax for Ontology Management ... 41
  2.2 GaV on data sources ... 44
  2.3 GaV based query ... 45
  2.4 GaV query unfolding ... 45
  2.5 LaV S1_emp(Name, Age) ... 45
  2.6 LaV S2_emp(Name, Age) ... 45
  2.7 Declare emp_type object with methods - PL/SQL style ... 55
  2.8 Define emp_type object with methods - PL/SQL style ... 55
  2.9 Define column and table of emp_type type ... 55
  2.10 Query column of emp_type type ... 55
  2.11 Column ... 59
  2.12 Super-Column ... 59
  2.13 ColumnFamily - simplified notation - i.e. no timestamps and column/super-column names removed ... 59
  2.14 Raw XML based document ... 61
  2.15 JSON-based document; MongoDB style ... 61
  2.16 Metadata document for page node ... 62
  3.1 Employee class ... 113
  3.2 Employee repository class ... 113
  3.3 Employee class ... 113
  3.4 Employee repository class ... 113
  3.5 Employee repository class – now handles COMMANDS ... 114
  3.6 Extracted query search handler class ... 114
  3.7 SQL based FAM selection ... 126
  3.8 Contributory View metadata schema. Some parts omitted for readability ... 126
  3.9 Remote Database Object Reference (rDOR) ... 128
  3.10 Contact and Connection Details of a rDOR ... 131
  3.11 Virtual, BRI-based data identification strategy ... 133
  3.12 Exemplary Cell Definition ... 135
  3.13 Exemplary Tuple Definition ... 135
  3.14 Exemplary Record Definition ... 136
  3.15 Exemplary Record Definition ... 136
  3.16 SQL based FAM selection ... 136
  3.17 Qboid Layer ... 137
  3.18 Qboid replica ... 137
  3.19 Qboid replication ... 138
  4.1 BigDAWG selection ... 146
  4.2 Index on Employee's salary ... 149
  4.3 Index on Employee's salary ... 150
  4.4 A query for sales in the indicated period ... 151
  4.5 A rewritten query for sales in the indicated period ... 152
  4.6 Query general schema ... 152
  4.7 PL/SQL function that finds minimal Fact_ID for a given date ... 153
  4.8 Simple rewrite with sub-queries ...

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    203 pages
