Nosql Distilled: a Brief Guide to the Emerging World of Polyglot

Total Page:16

File Type:pdf, Size:1020Kb

Nosql Distilled: a Brief Guide to the Emerging World of Polyglot NoSQL Distilled A Brief Guide to the Emerging World of Polyglot Persistence Pramod J. Sadalage Martin Fowler Upper Saddle River, NJ • Boston • Indianapolis • San Francisco New York • Toronto • Montreal • London • Munich • Paris • Madrid Capetown • Sydney • Tokyo • Singapore • Mexico City Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact: U.S. Corporate and Government Sales (800) 382–3419 [email protected] For sales outside the United States please contact: International Sales [email protected] Visit us on the Web: informit.com/aw Library of Congress Cataloging-in-Publication Data: Sadalage, Pramod J. NoSQL distilled : a brief guide to the emerging world of polyglot persistence / Pramod J Sadalage, Martin Fowler. p. cm. Includes bibliographical references and index. ISBN 978-0-321-82662-6 (pbk. : alk. paper) -- ISBN 0-321-82662-0 (pbk. : alk. paper) 1. Databases--Technological innovations. 2. Information storage and retrieval systems. I. Fowler, Martin, 1963- II. Title. QA76.9.D32S228 2013 005.74--dc23 Copyright © 2013 Pearson Education, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236–3290. ISBN-13: 978-0-321-82662-6 ISBN-10: 0-321-82662-0 Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana. First printing, August 2012 For my teachers Gajanan Chinchwadkar, Dattatraya Mhaskar, and Arvind Parchure. You inspired me the most, thank you. —Pramod For Cindy —Martin Contents Preface Part I: Understand Chapter 1: Why NoSQL? 1.1 The Value of Relational Databases 1.1.1 Getting at Persistent Data 1.1.2 Concurrency 1.1.3 Integration 1.1.4 A (Mostly) Standard Model 1.2 Impedance Mismatch 1.3 Application and Integration Databases 1.4 Attack of the Clusters 1.5 The Emergence of NoSQL 1.6 Key Points Chapter 2: Aggregate Data Models 2.1 Aggregates 2.1.1 Example of Relations and Aggregates 2.1.2 Consequences of Aggregate Orientation 2.2 Key-Value and Document Data Models 2.3 Column-Family Stores 2.4 Summarizing Aggregate-Oriented Databases 2.5 Further Reading 2.6 Key Points Chapter 3: More Details on Data Models 3.1 Relationships 3.2 Graph Databases 3.3 Schemaless Databases 3.4 Materialized Views 3.5 Modeling for Data Access 3.6 Key Points Chapter 4: Distribution Models 4.1 Single Server 4.2 Sharding 4.3 Master-Slave Replication 4.4 Peer-to-Peer Replication 4.5 Combining Sharding and Replication 4.6 Key Points Chapter 5: Consistency 5.1 Update Consistency 5.2 Read Consistency 5.3 Relaxing Consistency 5.3.1 The CAP Theorem 5.4 Relaxing Durability 5.5 Quorums 5.6 Further Reading 5.7 Key Points Chapter 6: Version Stamps 6.1 Business and System Transactions 6.2 Version Stamps on Multiple Nodes 6.3 Key Points Chapter 7: Map-Reduce 7.1 Basic Map-Reduce 7.2 Partitioning and Combining 7.3 Composing Map-Reduce Calculations 7.3.1 A Two Stage Map-Reduce Example 7.3.2 Incremental Map-Reduce 7.4 Further Reading 7.5 Key Points Part II: Implement Chapter 8: Key-Value Databases 8.1 What Is a Key-Value Store 8.2 Key-Value Store Features 8.2.1 Consistency 8.2.2 Transactions 8.2.3 Query Features 8.2.4 Structure of Data 8.2.5 Scaling 8.3 Suitable Use Cases 8.3.1 Storing Session Information 8.3.2 User Profiles, Preferences 8.3.3 Shopping Cart Data 8.4 When Not to Use 8.4.1 Relationships among Data 8.4.2 Multioperation Transactions 8.4.3 Query by Data 8.4.4 Operations by Sets Chapter 9: Document Databases 9.1 What Is a Document Database? 9.2 Features 9.2.1 Consistency 9.2.2 Transactions 9.2.3 Availability 9.2.4 Query Features 9.2.5 Scaling 9.3 Suitable Use Cases 9.3.1 Event Logging 9.3.2 Content Management Systems, Blogging Platforms 9.3.3 Web Analytics or Real-Time Analytics 9.3.4 E-Commerce Applications 9.4 When Not to Use 9.4.1 Complex Transactions Spanning Different Operations 9.4.2 Queries against Varying Aggregate Structure Chapter 10: Column-Family Stores 10.1 What Is a Column-Family Data Store? 10.2 Features 10.2.1 Consistency 10.2.2 Transactions 10.2.3 Availability 10.2.4 Query Features 10.2.5 Scaling 10.3 Suitable Use Cases 10.3.1 Event Logging 10.3.2 Content Management Systems, Blogging Platforms 10.3.3 Counters 10.3.4 Expiring Usage 10.4 When Not to Use Chapter 11: Graph Databases 11.1 What Is a Graph Database? 11.2 Features 11.2.1 Consistency 11.2.2 Transactions 11.2.3 Availability 11.2.4 Query Features 11.2.5 Scaling 11.3 Suitable Use Cases 11.3.1 Connected Data 11.3.2 Routing, Dispatch, and Location-Based Services 11.3.3 Recommendation Engines 11.4 When Not to Use Chapter 12: Schema Migrations 12.1 Schema Changes 12.2 Schema Changes in RDBMS 12.2.1 Migrations for Green Field Projects 12.2.2 Migrations in Legacy Projects 12.3 Schema Changes in a NoSQL Data Store 12.3.1 Incremental Migration 12.3.2 Migrations in Graph Databases 12.3.3 Changing Aggregate Structure 12.4 Further Reading 12.5 Key Points Chapter 13: Polyglot Persistence 13.1 Disparate Data Storage Needs 13.2 Polyglot Data Store Usage 13.3 Service Usage over Direct Data Store Usage 13.4 Expanding for Better Functionality 13.5 Choosing the Right Technology 13.6 Enterprise Concerns with Polyglot Persistence 13.7 Deployment Complexity 13.8 Key Points Chapter 14: Beyond NoSQL 14.1 File Systems 14.2 Event Sourcing 14.3 Memory Image 14.4 Version Control 14.5 XML Databases 14.6 Object Databases 14.7 Key Points Chapter 15: Choosing Your Database 15.1 Programmer Productivity 15.2 Data-Access Performance 15.3 Sticking with the Default 15.4 Hedging Your Bets 15.5 Key Points 15.6 Final Thoughts Bibliography Index Preface We’ve spent some twenty years in the world of enterprise computing. We’ve seen many things change in languages, architectures, platforms, and processes. But through all this time one thing has stayed constant—relational databases store the data. There have been challengers, some of which have had success in some niches, but on the whole the data storage question for architects has been the question of which relational database to use. There is a lot of value in the stability of this reign. An organization’s data lasts much longer that its programs (at least that’s what people tell us—we’ve seen plenty of very old programs out there). It’s valuable to have a stable data storage that’s well understood and accessible from many application programming platforms. Now, however, there’s a new challenger on the block under the confrontational tag of NoSQL. It’s born out of a need to handle larger data volumes which forced a fundamental shift to building large hardware platforms through clusters of commodity servers. This need has also raised long-running concerns about the difficulties of making application code play well with the relational data model. The term “NoSQL” is very ill-defined. It’s generally applied to a number of recent nonrelational databases such as Cassandra, Mongo, Neo4J, and Riak. They embrace schemaless data, run on clusters, and have the ability to trade off traditional consistency for other useful properties. Advocates of NoSQL databases claim that they can build systems that are more performant, scale much better, and are easier to program with. Is this the first rattle of the death knell for relational databases, or yet another pretender to the throne? Our answer to that is “neither.” Relational databases are a powerful tool that we expect to be using for many more decades, but we do see a profound change in that relational databases won’t be the only databases in use. Our view is that we are entering a world of Polyglot Persistence where enterprises, and even individual applications, use multiple technologies for data management. As a result, architects will need to be familiar with these technologies and be able to evaluate which ones to use for differing needs. Had we not thought that, we wouldn’t have spent the time and effort writing this book.
Recommended publications
  • Management of Polyglot Persistent Environments for Low Latency Data Access in Big Data
    Beheer van heterogene opslagtechnologieën voor snelle datatoegang in 'Big Data'-omgevingen Management of Polyglot Persistent Environments for Low Latency Data Access in Big Data Thomas Vanhove Promotoren: prof. dr. ir. F. De Turck, dr. G. Van Seghbroeck Proefschrift ingediend tot het behalen van de graad van Doctor in de ingenieurswetenschappen: computerwetenschappen Vakgroep Informatietechnologie Voorzitter: prof. dr. ir. B. Dhoedt Faculteit Ingenieurswetenschappen en Architectuur Academiejaar 2017 - 2018 ISBN 978-94-6355-084-0 NUR 988, 995 Wettelijk depot: D/2018/10.500/2 QoE-beheer van HTTP-gebaseerde adaptieve videodiensten QoE Management of HTTP Adaptive Streaming Services Niels Bouten Promotoren: prof. dr. ir. F. De Turck, prof. dr. S. Latré Proefschrift ingediend tot het behalen van de graden van Doctor in de ingenieurswetenschappen: computerwetenschappen (Universiteit Gent) en Doctor in de wetenschappen: informatica (Universiteit Antwerpen) Universiteit Gent Faculteit IngenieurswetenschappenVakgroep Informatietechnologie en Architectuur Voorzitter: prof. dr. ir. D. De Zutter Faculteit IngenieurswetenschappenVakgroep Informatietechnologie en Architectuur Departement Wiskunde en Informatica Voorzitter: prof. dr. C. Blondia Leden van de examencommissie: Faculteit Wetenschappen prof. dr. ir. Filip De Turck (promotor) Universiteit Gent - imec Academiejaar 2016 - 2017 dr. Gregory Van Seghbroeck (promotor) Universiteit Gent - imec prof. dr. ir. Daniel¨ De Zutter (voorzitter) Universiteit Gent prof. dr. ir. Frank Gielen Universiteit Gent prof. dr. Guy De Tre´ Universiteit Gent prof. dr. ing. Erik Mannens Universiteit Gent - imec ir. Werner Van Leekwijck Universiteit Antwerpen dr. Anthony Liekens IO Lab Proefschrift tot het behalen van de graad van Doctor in de ingenieurswetenschappen: Computerwetenschappen Academiejaar 2017-2018 Dankwoord Je zou denken dat 5,5 jaar een lange tijd is, maar nu ik hier de laatste woorden van dit boek neerschrijf, lijkt het alsof augustus 2012 toch niet zo veraf is.
    [Show full text]
  • Nosql Distilled: a Brief Guide to the Emerging World of Polyglot Persistence
    NoSQL Distilled This page intentionally left blank NoSQL Distilled A Brief Guide to the Emerging World of Polyglot Persistence Pramod J. Sadalage Martin Fowler Upper Saddle River, NJ • Boston • Indianapolis • San Francisco New York • Toronto • Montreal • London • Munich • Paris • Madrid Capetown • Sydney • Tokyo • Singapore • Mexico City Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact: U.S. Corporate and Government Sales (800) 382–3419 [email protected] For sales outside the United States please contact: International Sales [email protected] Visit us on the Web: informit.com/aw Library of Congress Cataloging-in-Publication Data: Sadalage, Pramod J. NoSQL distilled : a brief guide to the emerging world of polyglot persistence / Pramod J Sadalage, Martin Fowler. p. cm. Includes bibliographical references and index.
    [Show full text]
  • Polyglot Persistency an Adaptive Data Layer for Digital Systems Table of Contents
    polyglot persistency an adaptive data layer for digital systems table of contents 03 Executive summary 04 Why NoSQL databases? 05 Persistence store families 06 Assessing NoSQL technologies 10 Polyglot persistency in amdocs digital 14 About Amdocs To properly serve such demand, the underlying data layer executive must adapt from what has been used for the past three decades into a new adaptive data infrastructure capable summary of handling the multiple types of data models that exist in the modern application design, and providing an optimal persistence method – also called polyglot persistence. The digital era has changed the way users consume software. They want to access it from everywhere, at any A polyglot is “someone who speaks or writes several time, get all the capabilities they need in one place, and languages”. Neal Ford first introduced the term ’polyglot complete transactions with a single click of a button. persistence’ in 2006 to express the idea that NoSQL applications would, by their nature, require differing To meet these high standards, enterprises embrace persistence stores based on the type of data and access cloud principles and tools that will allow them to be needs. geographically distributed, intensely transactional, and continuously available. This paper introduces the concept of polyglot persistence along with the guidelines that can be utilized to map Furthermore, they understand that their software must various application use-cases to the appropriate be architected to be agile, distributed and broken up persistence family. The document also provides examples into independent, loosely coupled services, also known as of how Amdocs Digital is leveraging different persistence microservices.
    [Show full text]
  • Different Kinds of Database Schema
    Different Kinds Of Database Schema Sarcophagous or conglutinant, Hasty never categorize any Banff! Plato usually parsing devotedly or opaque harum-scarum when agraphic Manish blemishes extortionately and steady. Healthiest Worthy usually spiral some animadversions or outdid universally. Once jen starts a value for a new functionality or more detailed documentation about database using it does it in terms of database Usually it does not different models of. Jen has values in your email lists the documentation explaining some other kinds of different database schema indicates the external schemas? Sometimes find there are assumed to ensure that whenever the foreign key is a product types of making other kinds of different entities. We do prefer releasing frequently as that keeps the updates small, we serve a multitude of customers with different use cases, users to infer state by replaying events. Agile processes approach where the different kinds of database schema changes to represent relationships together with? Primary keys What do you think will happen if two users with the same name are added to the Users table? Any topic page is different kinds of database schema and punctuation, such as frequent changes needs a blueprint for qa staff should be filled in. Before they appeared on the scene most of the thinking about software process was about understanding requirements early, database schemas not only include tables, carefully modified to violate Normalization rules to increase reporting speed. This separate working works with files, which DBMS is best? On Career Karma, above and beyond the basic syntactical constraints imposed by XML itself. Now with a lot of serious time and effort you could eventually get to some kind of structure for understanding the data.
    [Show full text]
  • Database Schema Migration Tools Open Source
    Database Schema Migration Tools Open Source Validating Darian sometimes tranquillize his barony afterwards and cast so stubbornly! Vilhelm rocket his flirt bludge round-arm or best after Worthy smuts and formulise conspiratorially, quinoidal and declaratory. Implied Ernest rinsings: he built his Kathy lexically and amorally. Does this coupon code that is ideal state can replicate for speaking with their database tools and handled it ensures data, a granular control Review the tool for migrating to? If necessary continue browsing the site, will agree specify the rush of cookies on this website. Iteratively make both necessary changes to applications. 1 Database Version Control DBMS Tools. It moves to schema migration database tools source database migration is a few clicks configuration as well as someone to. GDPR: floating video: is from consent? Openmysql rootwelcometcp1270013306migrationtest if err nil fmt. Database health Suite itself and Schema Sync across. The Top 33 Database Migrations Open Source Projects. The community edition of PDI is useful enough they perform our mystery here. Migration Supports schema migration for MySQL SQLite and PostgreSQL Reverse Engineering For existing database structures we to reverse enginering. Most schema migration tools aim to minimize the footprint of schema changes on any existing data in tally database. Contains errors, warnings, and informational messages relating to migration operations. To schema and tools with a tool allows you take years of the tooling uses the type of. But migrating data services ownership, and integrity checks will be able to other objects to use open source tools now part of. Making database schema while capturing any databases, open source endpoint to migrate to get started with constraints between data sources in an altered outside the.
    [Show full text]
  • Multi-Model Database Management Systems - a Look Forward Zhen Hua Liu 1 , Jiaheng Lu2, Dieter Gawlick1, Heli Helskyaho2,3
    Multi-Model Database Management Systems - a Look Forward Zhen Hua Liu 1 , Jiaheng Lu2, Dieter Gawlick1, Heli Helskyaho2,3 Gregory Pogossiants4, Zhe Wu1 1Oracle Corporation 2University of Helsinki 3Miracle Finland Oy 4Soulmates.ai Abstract. The existence of the variety of data models and their associated data processing technologies make data management extremely complex. In this paper, we envision a single Multi-Model DataBase Management Systems (MMDBMS) providing declarative accesses to a variety of data models. We briefly review the history of the evolution of the DBMS technology to derive requirements of MMDBMSs and then we illustrate our ideas of building MMDBMSs satisfying those requirements. Since the relational algebra is not powerful enough to provide a mathematical foundation for MMDBMSs, we promote the category theory as a new theoretical foundation, which is a generalization of the set theory. We also suggest a set of shared data infrastructure services among data models to support “Just-In-Time” multi-model data access autonomously. 1 INTRODUCTION – why MMDBMS? Here is a short history of databases: Initially, database management systems sup- ported the hierarchical and the network model (e.g., IBM’s IMS and GE’s IDS re- spectively). These databases evolved very fast and developed the core infrastructures, such as journaling, transactions, locking, 2PC (group and fast commit), recovery, restart, fault tolerance, high performance, TP-monitors, messaging, main storage da- tabases, and much, much more. We still use these concepts today. In the 80’ and 90’, these databases were widely replaced by the relational database management systems (RDBMS). The main argument is its solid theoretical foundation: set based relational data model and declarative query language (SQL) based on abstract algebra over set processing.
    [Show full text]
  • A Reverse Engineering Approach for Migrating Data-Intensive Web Sites to the Semantic Web
    A reverse engineering approach for migrating data-intensive web sites to the Semantic Web Nenad Stojanovic, Ljiljana Stojanovic, Raphael Volz AIFB Institute, Univ. of Karlsruhe,Germany, {nst,lst,volz}@aifb.uni-karlsruhe.de The Semantic Web is intended to enable machine understandable web content and seems to be a solution for many drawbacks of the current Web. It is based on metadata that describe the formal semantics of Web contents. In this paper we present an integrated and semi-automatic approach for generating shared-understandable metadata of data- intensive Web applications. This approach is based on mapping the given relational schema into already existing ontology structure using a reverse engineering process. As a case study we present this style of a schema- and data-migration for our Institute web portal. The presented approach can be applied to a broad range of today's data-intensive Web sites. 1. INTRODUCTION The Semantic Web is one of today's hot keywords. It is about bringing ``[...] structure to the meaningful content of Web pages, creating an environment where software agents, roaming from page to page, can readily carry out sophisticated tasks for users.'' [17]. In order to enable this, web sites are enhanced with metadata that provide formal semantics for Web content. The key technology involved here are the ontologies. The ontologies provide consensual domain models, which are understandable to both human beings and machines as a shared conceptualisation of a specific domain that is given. Using ontologies, a content is made suitable for machine consumption, opposing to the content found on the web today, which is primarily intended for human consumption.
    [Show full text]
  • Noxperanto: Crowdsourced Polyglot Persistence Antonio Maccioni, Orlando Cassano, Yongming Luo, Juan Castrejón, Genoveva Vargas-Solar
    NoXperanto: Crowdsourced Polyglot Persistence Antonio Maccioni, Orlando Cassano, Yongming Luo, Juan Castrejón, Genoveva Vargas-Solar To cite this version: Antonio Maccioni, Orlando Cassano, Yongming Luo, Juan Castrejón, Genoveva Vargas-Solar. NoX- peranto: Crowdsourced Polyglot Persistence. Polibits, 2014, 50, pp.43 - 48. 10.17562/PB-50-6. hal-01584798 HAL Id: hal-01584798 https://hal.archives-ouvertes.fr/hal-01584798 Submitted on 11 Sep 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. NoXperanto: Crowdsourced Polyglot Persistence Antonio Maccioni, Orlando Cassano, Yongming Luo, Juan Castrejon,´ and Genoveva Vargas-Solar Abstract—This paper proposes NOXPERANTO, a novel modern application development [2], [3]. They enable the crowdsourcing approach to address querying over data integration of these data stores for managing big datasets in collections managed by polyglot persistence settings. The main a scalable and loosely coupled way. This approach seems to contribution of NOXPERANTO is the ability to solve complex queries involving different data stores by exploiting queries be a pertinent strategy to deal with big datasets integration. from expert users (i.e. a crowd of database administrators, data Nonetheless, the combination of these heterogeneous data engineers, domain experts, etc.), assuming that these users can stores, flexible schemas and non-standard APIs, represent an submit meaningful queries.
    [Show full text]
  • PDDM: a Database Design Method for Polyglot Persistence
    American Scientific Research Journal for Engineering, Technology, and Sciences (ASRJETS) ISSN (Print) 2313-4410, ISSN (Online) 2313-4402 © Global Society of Scientific Research and Researchers http://asrjetsjournal.org/ PDDM: A Database Design Method for Polyglot Persistence Cristofer Zdepskia, Tarcizio Alexandre Binib*, Simone Nasser Matosc a,b,cFederal University of Technology – Parana (UTFPR), Address Doutor Washington Subtil Chueire,330- Jardim Carvalho, Ponta Grossa, 84017-220, Parana, Brazil aEmail: [email protected] bEmail: [email protected] cEmail: [email protected] Abstract Databases by Web 2.0 has revealed the limitations of the relational model related to scalability. This led to the emergence of NoSQL databases, with data storage models other than relational ones. These databases propose solutions to such limitations through horizontal scalability and partially compromise data consistency. The combination of multiple data models, called polyglot persistence, extends these solutions by providing resources for the implementation of complex systems that have components with distinct requirements that would not be possible by the use of only one data model in a satisfactory way. However, there are no consolidated methods for the NoSQL database design and neither methods for design systems that apply the polyglot persistence. This work proposes a database design method applied to systems that use polyglot persistence, combining different data models. This method can be applied to the relational model and aggregate-oriented NoSQL data models. The method defines a set of sub-steps based on the existing concepts of database design. The goal is to define a formal process to assist in defining the data models to be used and to transform the conceptual design into a logical design.
    [Show full text]
  • A Big Data Roadmap for the DB2 Professional
    A Big Data Roadmap For the DB2 Professional Sponsored by: align http://www.datavail.com Craig S. Mullins Mullins Consulting, Inc. http://www.MullinsConsulting.com © 2014 Mullins Consulting, Inc. Author This presentation was prepared by: Craig S. Mullins President & Principal Consultant Mullins Consulting, Inc. 15 Coventry Ct Sugar Land, TX 77479 Tel: 281-494-6153 Fax: 281.491.0637 Skype: cs.mullins E-mail: [email protected] http://www.mullinsconsultinginc.com This document is protected under the copyright laws of the United States and other countries as an unpublished work. This document contains information that is proprietary and confidential to Mullins Consulting, Inc., which shall not be disclosed outside or duplicated, used, or disclosed in whole or in part for any purpose other than as approved by Mullins Consulting, Inc. Any use or disclosure in whole or in part of this information without the express written permission of Mullins Consulting, Inc. is prohibited. © 2014 Craig S. Mullins and Mullins Consulting, Inc. (Unpublished). All rights reserved. © 2014 Mullins Consulting, Inc. 2 Agenda Uncover the roadmap to Big Data… the terminology and technology used, use cases, and trends. • Gain a working knowledge and definition of Big Data (beyond the simple three V's definition) • Break down and understand the often confusing terminology within the realm of Big Data (e.g. polyglot persistence) • Examine the four predominant NoSQL database systems used in Big Data implementations (graph, key/value, column, and document) • Learn some of the major differences between Big Data/NoSQL implementations vis-a-vis traditional transaction processing • Discover the primary use cases for Big Data and NoSQL versus relational databases © 2014 Mullins Consulting, Inc.
    [Show full text]
  • Automated Testing of Database Schema Migrations
    DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2019 Automated Testing of Database Schema Migrations PETER JONSSON KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE Automated Testing of Database Schema Migrations PETER JONSSON Master in Computer Science Date: June 28, 2019 Supervisor: Johan Gustavsson Examiner: Elena Troubitsyna School of Electrical Engineering and Computer Science Host company: The Swedish Police Authority Swedish title: Automatiserad testning av databasschemaförändringar iii Abstract Modern applications use databases, and the majority of them are relational databases, which use schemas to impose data integrity constraints. As appli- cations change, so do their databases. Database schemas are changed using migrations. Certain conditions can result in migrations failing in production environments, leading to a broken database state and testing can be problem- atic without accessing production data which can be sensitive. Two migration validation methods were proposed and implemented to au- tomatically reject invalid migrations that are not compatible with the database state. The methods were based on, and compared to, a default method that used Liquibase to structure and perform migrations. The assertion method used knowledge of what a valid state would look like to generate pre-conditions from assertions to verify that the database’s state matched expectations and that the migrations were compatible with a database’s state prior to migra- tion. The schema method, used a copy of the production database’s schema to perform migrations on an empty database in order to test the compatibility of the old and new schemas. 108 test cases consisting of a migration and a database state were used to test all methods.
    [Show full text]
  • Multi-Model Databases: a New Journey to Handle the Variety of Data
    0 Multi-model Databases: A New Journey to Handle the Variety of Data JIAHENG LU, Department of Computer Science, University of Helsinki IRENA HOLUBOVA´ , Department of Software Engineering, Charles University, Prague The variety of data is one of the most challenging issues for the research and practice in data management systems. The data are naturally organized in different formats and models, including structured data, semi- structured data and unstructured data. In this survey, we introduce the area of multi-model DBMSs which build a single database platform to manage multi-model data. Even though multi-model databases are a newly emerging area, in recent years we have witnessed many database systems to embrace this category. We provide a general classification and multi-dimensional comparisons for the most popular multi-model databases. This comprehensive introduction on existing approaches and open problems, from the technique and application perspective, make this survey useful for motivating new multi-model database approaches, as well as serving as a technical reference for developing multi-model database applications. CCS Concepts: Information systems ! Database design and models; Data model extensions; Semi- structured data;r Database query processing; Query languages for non-relational engines; Extraction, trans- formation and loading; Object-relational mapping facilities; Additional Key Words and Phrases: Big Data management, multi-model databases, NoSQL database man- agement systems. ACM Reference Format: Jiaheng Lu and Irena Holubova,´ 2019. Multi-model Databases: A New Journey to Handle the Variety of Data. ACM CSUR 0, 0, Article 0 ( 2019), 38 pages. DOI: http://dx.doi.org/10.1145/0000000.0000000 1.
    [Show full text]