UNIVERSAL INTEGRATION ARCHITECTURE FOR HETEROGENEOUS DATASOURCES AND OPTIMISATION METHODS

UNIWERSALNA ARCHITEKTURA INTEGRACYJNA DLA HETEROGENICZNYCH ŹRÓDEŁ DANYCH I METOD OPTYMALIZACJI

This dissertation is submitted for the degree of Doctor of Philosophy
by Michał Chromiak
Faculty of Mathematics, Physics and Computer Science, Maria Curie-Skłodowska University, Lublin

Advisor: prof. dr hab. Krzysztof Stencel

Institute of Fundamental Technological Research, Polish Academy of Sciences
Warsaw 2015

Table of Contents

LISTINGS  5
LIST OF FIGURES  6
LIST OF TABLES  8
ABSTRACT  9

CHAPTER 1. INTRODUCTION  19
  1.1 Motivation  19
  1.2 Considerations, Objectives and the Thesis  20
  1.3 History and Related Work  22
  1.4 Thesis Outline  23

CHAPTER 2. THE STATE OF THE ART AND THE RELATED WORKS  25
  2.1 Integrity - the Philosophy of Integration  25
  2.2 Integration - Cure for Chaos of Multiplicity, General Considerations  27
    2.2.1 At the beginning there was a relation  28
    2.2.2 Revolution - the Web changes everything  30
    2.2.3 Integration - Principia and Taxonomy  35
    2.2.4 Data Integration Practices  38
    2.2.5 Integration Theory  42
    2.2.6 Data Integration Issues  47
  2.3 Data Stores - the Integration Targets  51
    2.3.1 Database modelling - persistence  51
    2.3.2 Relational Model  51
    2.3.3 Object-oriented Database Model  55
    2.3.4 Column-oriented Relational Database Model (CORDB) – Relational Approach  56
    2.3.5 NoSQL – Distributed Storage Services  57
    2.3.6 NewSQL  63
    2.3.7 Big Data - all or nothing  66
    2.3.8 After SQL Era  68
    2.3.9 Database taxonomy  70
  2.4 Related Works - Overview of Modern Integrating Solutions  71
    2.4.1 OLTP & OLAP - sets of operations  72
    2.4.2 Metamodels - Metadata  78
    2.4.3 Distributed File Systems - Embracing Scaling Up in Size  80
    2.4.4 Enterprise Service Bus (ESB)  94
    2.4.5 ESB / SOA - Rules of Engagement  96
  2.5 Conclusions  97

CHAPTER 3. THE MODEL OF THE ARCHITECTURE  99
  3.1 Data vs Application Integration Patterns  100
    3.1.1 Patterns in Software Development  100
    3.1.2 Architectural Patterns in Integration  101
  3.2 General Architecture and Assumptions  103
    3.2.1 Virtualization as the Key to Integration – Postulates  103
    3.2.2 Polyglot Persistence – building "The Tower of Babel"  105
    3.2.3 Event Sourcing as a Persistence Technique  108
    3.2.4 Command Query Responsibility Separation (CQRS) Pattern  109
    3.2.5 OMG CORBA - Standard Specification  115
    3.2.6 Metadata  116
    3.2.7 Design Patterns - Study of Utility  116
    3.2.8 Integration Database Model - IDBM  118
    3.2.9 Indexing Role in Integrated Datamodel  119
  3.3 The Architecture  121
    3.3.1 Principia – Assumptions and Directions  121
    3.3.2 Components of the Architecture  124
    3.3.3 Workflow  139
  3.4 Faced Challenges  141

CHAPTER 4. APPLICATIONS  143
  4.1 Integration  143
    4.1.1 Polystores as the Next-gen Federations vs Qboid-based Architecture for BigData Integration  145
  4.2 Optimization  147
    4.2.1 Indexing Distributed and Heterogeneous Data  148
    4.2.2 Indexing Projections  148
    4.2.3 Exploiting Order Dependencies Optimization Technique for Qboid-based Integration Architecture  150
    4.2.4 Polyglot Persistence as an Optimization Technique for Integration Architecture  156
  4.3 Conclusions  161

CHAPTER 5. SUMMARY AND CONCLUSIONS  163
  5.1 The Limitation of Prototype and Further Works  164
  5.2 Additional Mediator Functionalities  165

APPENDIX A. PROTOTYPE IMPLEMENTATION  167
  A.1 Integration Layer  167
    A.1.1 The IDL Scheme for Integration Contexts of Qboid and the Integration View  169
    A.1.2 The Integration Scheme in Action – Example  172
APPENDIX B. STANDARDS AND CLASSIFICATIONS  177
APPENDIX C. HADOOP ECOSYSTEM  185
BIBLIOGRAPHY  189

Listings

  2.1 OWL/XML Syntax for Ontology Management  41
  2.2 GaV on data sources  44
  2.3 GaV based query  45
  2.4 GaV query unfolding  45
  2.5 LaV S1_emp(Name, Age)  45
  2.6 LaV S2_emp(Name, Age)  45
  2.7 Declare emp_type object with methods - PL/SQL style  55
  2.8 Define emp_type object with methods - PL/SQL style  55
  2.9 Define column and table of emp_type type  55
  2.10 Query column of emp_type type  55
  2.11 Column  59
  2.12 Super-Column  59
  2.13 ColumnFamily - simplified notation - i.e. no timestamps and column/super-column names removed  59
  2.14 Raw XML based document  61
  2.15 JSON-based document; MongoDB style  61
  2.16 Metadata document for page node  62
  3.1 Employee class  113
  3.2 Employee repository class  113
  3.3 Employee class  113
  3.4 Employee repository class  113
  3.5 Employee repository class – now handles COMMANDS  114
  3.6 Extracted query search handler class  114
  3.7 SQL based FAM selection  126
  3.8 Contributory View metadata schema. Some parts omitted for readability  126
  3.9 Remote Database Object Reference (rDOR)  128
  3.10 Contact and Connection Details of a rDOR  131
  3.11 Virtual, BRI-based data identification strategy  133
  3.12 Exemplary Cell Definition  135
  3.13 Exemplary Tuple Definition  135
  3.14 Exemplary Record Definition  136
  3.15 Exemplary Record Definition  136
  3.16 SQL based FAM selection  136
  3.17 Qboid Layer  137
  3.18 Qboid replica  137
  3.19 Qboid replication  138
  4.1 BigDAWG selection  146
  4.2 Index on Employee’s salary  149
  4.3 Index on Employee’s salary  150
  4.4 A query for sales in the indicated period  151
  4.5 A rewritten query for sales in the indicated period  152
  4.6 Query general schema  152
  4.7 PL/SQL function that finds minimal Fact_ID for a given date  153
  4.8 Simple rewrite with sub-queries ...