Nosql Distilled: a Brief Guide to the Emerging World of Polyglot Persistence

NoSQL Distilled This page intentionally left blank NoSQL Distilled A Brief Guide to the Emerging World of Polyglot Persistence Pramod J. Sadalage Martin Fowler Upper Saddle River, NJ • Boston • Indianapolis • San Francisco New York • Toronto • Montreal • London • Munich • Paris • Madrid Capetown • Sydney • Tokyo • Singapore • Mexico City Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact: U.S. Corporate and Government Sales (800) 382–3419 [email protected] For sales outside the United States please contact: International Sales [email protected] Visit us on the Web: informit.com/aw Library of Congress Cataloging-in-Publication Data: Sadalage, Pramod J. NoSQL distilled : a brief guide to the emerging world of polyglot persistence / Pramod J Sadalage, Martin Fowler. p. cm. Includes bibliographical references and index. ISBN 978-0-321-82662-6 (pbk. : alk. paper) -- ISBN 0-321-82662-0 (pbk. : alk. paper) 1. Databases--Technological innovations. 2. Information storage and retrieval systems. I. Fowler, Martin, 1963- II. Title. QA76.9.D32S228 2013 005.74--dc23 Copyright © 2013 Pearson Education, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236–3290. ISBN-13: 978-0-321-82662-6 ISBN-10: 0-321-82662-0 Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana. Third printing, September 2013 For my teachers Gajanan Chinchwadkar, Dattatraya Mhaskar, and Arvind Parchure. You inspired me the most, thank you. —Pramod For Cindy —Martin This page intentionally left blank Contents Preface ........................................................................................................ xiii Part I: Understand .......................................................................................... 1 Chapter 1: Why NoSQL? ....................................................................... 3 1.1 The Value of Relational Databases ............................................ 3 1.1.1 Getting at Persistent Data .................................................... 3 1.1.2 Concurrency ...................................................................... 4 1.1.3 Integration ......................................................................... 4 1.1.4 A (Mostly) Standard Model .................................................. 4 1.2 Impedance Mismatch ................................................................ 5 1.3 Application and Integration Databases ...................................... 6 1.4 Attack of the Clusters ................................................................ 8 1.5 The Emergence of NoSQL ......................................................... 9 1.6 Key Points ................................................................................ 12 Chapter 2: Aggregate Data Models ...................................................... 13 2.1 Aggregates ............................................................................... 14 2.1.1 Example of Relations and Aggregates .................................... 14 2.1.2 Consequences of Aggregate Orientation ................................ 19 2.2 Key-Value and Document Data Models ................................... 20 2.3 Column-Family Stores ............................................................. 21 2.4 Summarizing Aggregate-Oriented Databases ........................... 23 2.5 Further Reading ....................................................................... 24 2.6 Key Points ................................................................................ 24 Chapter 3: More Details on Data Models ............................................. 25 3.1 Relationships ........................................................................... 25 3.2 Graph Databases ..................................................................... 26 vii viii Contents 3.3 Schemaless Databases .............................................................. 28 3.4 Materialized Views .................................................................. 30 3.5 Modeling for Data Access ........................................................ 31 3.6 Key Points ................................................................................ 36 Chapter 4: Distribution Models ........................................................... 37 4.1 Single Server ............................................................................ 37 4.2 Sharding .................................................................................. 38 4.3 Master-Slave Replication ......................................................... 40 4.4 Peer-to-Peer Replication ........................................................... 42 4.5 Combining Sharding and Replication ...................................... 43 4.6 Key Points ................................................................................ 44 Chapter 5: Consistency ........................................................................ 47 5.1 Update Consistency ................................................................. 47 5.2 Read Consistency ..................................................................... 49 5.3 Relaxing Consistency ............................................................... 52 5.3.1 The CAP Theorem .............................................................. 53 5.4 Relaxing Durability ................................................................. 56 5.5 Quorums ................................................................................. 57 5.6 Further Reading ....................................................................... 59 5.7 Key Points ................................................................................ 59 Chapter 6: Version Stamps ................................................................... 61 6.1 Business and System Transactions ............................................ 61 6.2 Version Stamps on Multiple Nodes .......................................... 63 6.3 Key Points ................................................................................ 65 Chapter 7: Map-Reduce ....................................................................... 67 7.1 Basic Map-Reduce ................................................................... 68 7.2 Partitioning and Combining ..................................................... 69 7.3 Composing Map-Reduce Calculations ..................................... 72 7.3.1 A Two Stage Map-Reduce Example ...................................... 73 7.3.2 Incremental Map-Reduce ..................................................... 76 7.4 Further Reading ....................................................................... 77 7.5 Key Points ................................................................................ 77 Part II: Implement ........................................................................................ 79 Chapter 8: Key-Value Databases .......................................................... 81 8.1 What Is a Key-Value Store ....................................................... 81 8.2 Key-Value Store Features ......................................................... 83 Contents ix 8.2.1 Consistency ....................................................................... 83 8.2.2 Transactions ...................................................................... 84 8.2.3 Query Features ................................................................... 84 8.2.4 Structure of Data ................................................................ 86 8.2.5 Scaling .............................................................................. 86 8.3 Suitable Use Cases ................................................................... 87 8.3.1 Storing Session Information ................................................. 87 8.3.2 User Proﬁles, Preferences ..................................................... 87 8.3.3 Shopping Cart Data ............................................................ 87 8.4 When Not to Use ..................................................................... 87 8.4.1 Relationships among Data ..................................................

Nosql Distilled: a Brief Guide to the Emerging World of Polyglot Persistence

Management of Polyglot Persistent Environments for Low Latency Data Access in Big Data

Polyglot Persistency an Adaptive Data Layer for Digital Systems Table of Contents

Multi-Model Database Management Systems - a Look Forward Zhen Hua Liu 1 , Jiaheng Lu2, Dieter Gawlick1, Heli Helskyaho2,3

Noxperanto: Crowdsourced Polyglot Persistence Antonio Maccioni, Orlando Cassano, Yongming Luo, Juan Castrejón, Genoveva Vargas-Solar

PDDM: a Database Design Method for Polyglot Persistence

A Big Data Roadmap for the DB2 Professional

Multi-Model Databases: a New Journey to Handle the Variety of Data

Nosql Distilled: a Brief Guide to the Emerging World of Polyglot

Polyglot Persistence: Handling Multiple Datastores

Polyglot Database Architectures = Polyglot Challenges

POLYGLOT PERSISTENCE for ADDRESSING DATA MANAGEMENT on the CLOUD Genoveva Vargas-Solar, CNRS, LIG-LAFMIA, Juan Carlos Castrejon, U

An Approach for Modeling Polyglot Persistence