Nosql Distilled: a Brief Guide to the Emerging World of Polyglot
Total Page:16
File Type:pdf, Size:1020Kb
NoSQL Distilled A Brief Guide to the Emerging World of Polyglot Persistence Pramod J. Sadalage Martin Fowler Upper Saddle River, NJ • Boston • Indianapolis • San Francisco New York • Toronto • Montreal • London • Munich • Paris • Madrid Capetown • Sydney • Tokyo • Singapore • Mexico City Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact: U.S. Corporate and Government Sales (800) 382–3419 [email protected] For sales outside the United States please contact: International Sales [email protected] Visit us on the Web: informit.com/aw Library of Congress Cataloging-in-Publication Data: Sadalage, Pramod J. NoSQL distilled : a brief guide to the emerging world of polyglot persistence / Pramod J Sadalage, Martin Fowler. p. cm. Includes bibliographical references and index. ISBN 978-0-321-82662-6 (pbk. : alk. paper) -- ISBN 0-321-82662-0 (pbk. : alk. paper) 1. Databases--Technological innovations. 2. Information storage and retrieval systems. I. Fowler, Martin, 1963- II. Title. QA76.9.D32S228 2013 005.74--dc23 Copyright © 2013 Pearson Education, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236–3290. ISBN-13: 978-0-321-82662-6 ISBN-10: 0-321-82662-0 Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana. First printing, August 2012 For my teachers Gajanan Chinchwadkar, Dattatraya Mhaskar, and Arvind Parchure. You inspired me the most, thank you. —Pramod For Cindy —Martin Contents Preface Part I: Understand Chapter 1: Why NoSQL? 1.1 The Value of Relational Databases 1.1.1 Getting at Persistent Data 1.1.2 Concurrency 1.1.3 Integration 1.1.4 A (Mostly) Standard Model 1.2 Impedance Mismatch 1.3 Application and Integration Databases 1.4 Attack of the Clusters 1.5 The Emergence of NoSQL 1.6 Key Points Chapter 2: Aggregate Data Models 2.1 Aggregates 2.1.1 Example of Relations and Aggregates 2.1.2 Consequences of Aggregate Orientation 2.2 Key-Value and Document Data Models 2.3 Column-Family Stores 2.4 Summarizing Aggregate-Oriented Databases 2.5 Further Reading 2.6 Key Points Chapter 3: More Details on Data Models 3.1 Relationships 3.2 Graph Databases 3.3 Schemaless Databases 3.4 Materialized Views 3.5 Modeling for Data Access 3.6 Key Points Chapter 4: Distribution Models 4.1 Single Server 4.2 Sharding 4.3 Master-Slave Replication 4.4 Peer-to-Peer Replication 4.5 Combining Sharding and Replication 4.6 Key Points Chapter 5: Consistency 5.1 Update Consistency 5.2 Read Consistency 5.3 Relaxing Consistency 5.3.1 The CAP Theorem 5.4 Relaxing Durability 5.5 Quorums 5.6 Further Reading 5.7 Key Points Chapter 6: Version Stamps 6.1 Business and System Transactions 6.2 Version Stamps on Multiple Nodes 6.3 Key Points Chapter 7: Map-Reduce 7.1 Basic Map-Reduce 7.2 Partitioning and Combining 7.3 Composing Map-Reduce Calculations 7.3.1 A Two Stage Map-Reduce Example 7.3.2 Incremental Map-Reduce 7.4 Further Reading 7.5 Key Points Part II: Implement Chapter 8: Key-Value Databases 8.1 What Is a Key-Value Store 8.2 Key-Value Store Features 8.2.1 Consistency 8.2.2 Transactions 8.2.3 Query Features 8.2.4 Structure of Data 8.2.5 Scaling 8.3 Suitable Use Cases 8.3.1 Storing Session Information 8.3.2 User Profiles, Preferences 8.3.3 Shopping Cart Data 8.4 When Not to Use 8.4.1 Relationships among Data 8.4.2 Multioperation Transactions 8.4.3 Query by Data 8.4.4 Operations by Sets Chapter 9: Document Databases 9.1 What Is a Document Database? 9.2 Features 9.2.1 Consistency 9.2.2 Transactions 9.2.3 Availability 9.2.4 Query Features 9.2.5 Scaling 9.3 Suitable Use Cases 9.3.1 Event Logging 9.3.2 Content Management Systems, Blogging Platforms 9.3.3 Web Analytics or Real-Time Analytics 9.3.4 E-Commerce Applications 9.4 When Not to Use 9.4.1 Complex Transactions Spanning Different Operations 9.4.2 Queries against Varying Aggregate Structure Chapter 10: Column-Family Stores 10.1 What Is a Column-Family Data Store? 10.2 Features 10.2.1 Consistency 10.2.2 Transactions 10.2.3 Availability 10.2.4 Query Features 10.2.5 Scaling 10.3 Suitable Use Cases 10.3.1 Event Logging 10.3.2 Content Management Systems, Blogging Platforms 10.3.3 Counters 10.3.4 Expiring Usage 10.4 When Not to Use Chapter 11: Graph Databases 11.1 What Is a Graph Database? 11.2 Features 11.2.1 Consistency 11.2.2 Transactions 11.2.3 Availability 11.2.4 Query Features 11.2.5 Scaling 11.3 Suitable Use Cases 11.3.1 Connected Data 11.3.2 Routing, Dispatch, and Location-Based Services 11.3.3 Recommendation Engines 11.4 When Not to Use Chapter 12: Schema Migrations 12.1 Schema Changes 12.2 Schema Changes in RDBMS 12.2.1 Migrations for Green Field Projects 12.2.2 Migrations in Legacy Projects 12.3 Schema Changes in a NoSQL Data Store 12.3.1 Incremental Migration 12.3.2 Migrations in Graph Databases 12.3.3 Changing Aggregate Structure 12.4 Further Reading 12.5 Key Points Chapter 13: Polyglot Persistence 13.1 Disparate Data Storage Needs 13.2 Polyglot Data Store Usage 13.3 Service Usage over Direct Data Store Usage 13.4 Expanding for Better Functionality 13.5 Choosing the Right Technology 13.6 Enterprise Concerns with Polyglot Persistence 13.7 Deployment Complexity 13.8 Key Points Chapter 14: Beyond NoSQL 14.1 File Systems 14.2 Event Sourcing 14.3 Memory Image 14.4 Version Control 14.5 XML Databases 14.6 Object Databases 14.7 Key Points Chapter 15: Choosing Your Database 15.1 Programmer Productivity 15.2 Data-Access Performance 15.3 Sticking with the Default 15.4 Hedging Your Bets 15.5 Key Points 15.6 Final Thoughts Bibliography Index Preface We’ve spent some twenty years in the world of enterprise computing. We’ve seen many things change in languages, architectures, platforms, and processes. But through all this time one thing has stayed constant—relational databases store the data. There have been challengers, some of which have had success in some niches, but on the whole the data storage question for architects has been the question of which relational database to use. There is a lot of value in the stability of this reign. An organization’s data lasts much longer that its programs (at least that’s what people tell us—we’ve seen plenty of very old programs out there). It’s valuable to have a stable data storage that’s well understood and accessible from many application programming platforms. Now, however, there’s a new challenger on the block under the confrontational tag of NoSQL. It’s born out of a need to handle larger data volumes which forced a fundamental shift to building large hardware platforms through clusters of commodity servers. This need has also raised long-running concerns about the difficulties of making application code play well with the relational data model. The term “NoSQL” is very ill-defined. It’s generally applied to a number of recent nonrelational databases such as Cassandra, Mongo, Neo4J, and Riak. They embrace schemaless data, run on clusters, and have the ability to trade off traditional consistency for other useful properties. Advocates of NoSQL databases claim that they can build systems that are more performant, scale much better, and are easier to program with. Is this the first rattle of the death knell for relational databases, or yet another pretender to the throne? Our answer to that is “neither.” Relational databases are a powerful tool that we expect to be using for many more decades, but we do see a profound change in that relational databases won’t be the only databases in use. Our view is that we are entering a world of Polyglot Persistence where enterprises, and even individual applications, use multiple technologies for data management. As a result, architects will need to be familiar with these technologies and be able to evaluate which ones to use for differing needs. Had we not thought that, we wouldn’t have spent the time and effort writing this book.