Making Sense of NoSQL

Making Sense of NoSQL
A GUIDE FOR MANAGERS AND THE REST OF US

DAN MCCREARY ANN KELLY

MANNING SHELTER ISLAND

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email: [email protected]

©2014 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964

Development editor: Elizabeth Lexleigh
Copyeditor: Benjamin Berg
Proofreader: Katie Tennant
Typesetter: Dottie Marsico
Cover designer: Leslie Haimes

ISBN 9781617291074 Printed in the United States of America 1 2 3 4 5 6 7 8 9 10 – MAL – 18 17 16 15 14 13

To technology innovators and early adopters… those who shake up the status quo

We dedicate this book to people who understand the limitations of our current way of solving technology problems. They understand that by removing limitations, we can solve problems faster and at a lower cost and, at the same time, become more agile. Without these people, the NoSQL movement wouldn’t have gained the critical mass it needed to get off the ground.

Innovators and early adopters are the people within organizations who shake up the status quo by testing and evaluating new architectures. They initiate pilot projects and share their successes and failures with their peers. They use early versions of software and help shake out the bugs. They build new versions of NoSQL distributions from source and explore areas where new NoSQL solutions can be applied. They’re the people who give solution architects more options for solving business problems. We hope this book will help you to make the right choices.

brief contents

PART 1 INTRODUCTION ...... 1
  1 ■ NoSQL: It’s about making intelligent choices 3
  2 ■ NoSQL concepts 15

PART 2 DATABASE PATTERNS ...... 35
  3 ■ Foundational data architecture patterns 37
  4 ■ NoSQL data architecture patterns 62
  5 ■ Native XML databases 96

PART 3 NOSQL SOLUTIONS ...... 125
  6 ■ Using NoSQL to manage big data 127
  7 ■ Finding information with NoSQL search 154
  8 ■ Building high-availability solutions with NoSQL 172
  9 ■ Increasing agility with NoSQL 192

PART 4 ADVANCED TOPICS ...... 207
  10 ■ NoSQL and functional programming 209
  11 ■ Security: protecting data in your NoSQL systems 232
  12 ■ Selecting the right NoSQL solution 254


contents

foreword xvii
preface xix
acknowledgments xxi
about this book xxii

PART 1 INTRODUCTION ...... 1

1 NoSQL: It’s about making intelligent choices 3
    1.1 What is NoSQL? 4
    1.2 NoSQL business drivers 6
        Volume 7 ■ Velocity 7 ■ Variability 7 ■ Agility 8
    1.3 NoSQL case studies 8
        Case study: LiveJournal’s Memcache 9 ■ Case study: Google’s MapReduce—use commodity hardware to create search indexes 10 ■ Case study: Google’s Bigtable—a table with a billion rows and a million columns 11 ■ Case study: Amazon’s Dynamo—accept an order 24 hours a day, 7 days a week 11 ■ Case study: MarkLogic 12 ■ Applying your knowledge 12
    1.4 Summary 13


2 NoSQL concepts 15
    2.1 Keeping components simple to promote reuse 15
    2.2 Using application tiers to simplify design 17
    2.3 Speeding performance by strategic use of RAM, SSD, and disk 21
    2.4 Using consistent hashing to keep your cache current 22
    2.5 Comparing ACID and BASE—two methods of reliable database transactions 24
        RDBMS transaction control using ACID 25 ■ Non-RDBMS transaction control using BASE 27
    2.6 Achieving horizontal scalability with database sharding 28
    2.7 Understanding trade-offs with Brewer’s CAP theorem 30
    2.8 Apply your knowledge 32
    2.9 Summary 33
    2.10 Further reading 33

PART 2 DATABASE PATTERNS ...... 35

3 Foundational data architecture patterns 37
    3.1 What is a data architecture pattern? 38
    3.2 Understanding the row-store design pattern used in RDBMSs 39
        How row stores work 39 ■ Row stores evolve 41 ■ Analyzing the strengths and weaknesses of the row-store pattern 42
    3.3 Example: Using joins in a sales order 43
    3.4 Reviewing RDBMS implementation features 45
        RDBMS transactions 45 ■ Fixed data definition language and typed columns 47 ■ Using RDBMS views for security and access control 48 ■ RDBMS replication and synchronization 49
    3.5 Analyzing historical data with OLAP, data warehouse, and business intelligence systems 51
        How data flows from operational systems to analytical systems 52 ■ Getting familiar with OLAP concepts 54 ■ Ad hoc reporting using aggregates 55


    3.6 Incorporating high availability and read-mostly systems 57
    3.7 Using hash trees in revision control systems and database synchronization 58
    3.8 Apply your knowledge 60
    3.9 Summary 60
    3.10 Further reading 61

4 NoSQL data architecture patterns 62
    4.1 Key-value stores 63
        What is a key-value store? 64 ■ Benefits of using a key-value store 65 ■ Using a key-value store 68 ■ Use case: storing web pages in a key-value store 70 ■ Use case: Amazon simple storage service (S3) 71
    4.2 Graph stores 72
        Overview of a graph store 72 ■ Linking external data with the RDF standard 74 ■ Use cases for graph stores 75
    4.3 Column family (Bigtable) stores 81
        Column family basics 82 ■ Understanding column family keys 82 ■ Benefits of column family systems 83 ■ Case study: storing analytical information in Bigtable 85 ■ Case study: Google Maps stores geographic information in Bigtable 85 ■ Case study: using a column family to store user preferences 86
    4.4 Document stores 86
        Document store basics 87 ■ Document collections 88 ■ Application collections 88 ■ Document store APIs 89 ■ Document store implementations 89 ■ Case study: ad server with MongoDB 90 ■ Case study: CouchDB, a large-scale 91
    4.5 Variations of NoSQL architectural patterns 91
        Customization for RAM or SSD stores 92 ■ Distributed stores 92 ■ Grouping items 93
    4.6 Summary 95
    4.7 Further reading 95

5 Native XML databases 96
    5.1 What is a native XML database? 97
    5.2 Building applications with a native XML database 100
        Loading data can be as simple as drag-and-drop 101 ■ Using collections to group your XML documents 102 ■ Applying simple queries to transform complex data with XPath 105 ■ Transforming your data with XQuery 106 ■ Updating documents with XQuery updates 109 ■ XQuery full-text search standards 110
    5.3 Using XML standards within native XML databases 110
    5.4 Designing and validating your data with XML Schema and Schematron 112
        XML Schema 112 ■ Using Schematron to check document rules 113
    5.5 Extending XQuery with custom modules 115
    5.6 Case study: using NoSQL at the Office of the Historian at the Department of State 115
    5.7 Case study: managing financial derivatives with MarkLogic 119
        Why financial derivatives are difficult to store in RDBMSs 119 ■ An investment bank switches from 20 RDBMSs to one native XML system 119 ■ Business benefits of moving to a native XML document store 121 ■ Project results 122
    5.8 Summary 122
    5.9 Further reading 123

PART 3 NOSQL SOLUTIONS ...... 125

6 Using NoSQL to manage big data 127
    6.1 What is a big data NoSQL solution? 128
    6.2 Getting linear scaling in your data center 132
    6.3 Understanding linear scalability and expressivity 133
    6.4 Understanding the types of big data problems 135
    6.5 Analyzing big data with a shared-nothing architecture 136
    6.6 Choosing distribution models: master-slave versus peer-to-peer 137


    6.7 Using MapReduce to transform your data over distributed systems 139
        MapReduce and distributed filesystems 140 ■ How MapReduce allows efficient transformation of big data problems 142

    6.8 Four ways that NoSQL systems handle big data problems 143
        Moving queries to the data, not data to the queries 143 ■ Using hash rings to evenly distribute data on a cluster 144 ■ Using replication to scale reads 145 ■ Letting the database distribute queries evenly to data nodes 146
    6.9 Case study: event log processing with Apache Flume 146
        Challenges of event log data analysis 147 ■ How Apache Flume works to gather distributed event data 148 ■ Further thoughts 149
    6.10 Case study: computer-aided discovery of health care fraud 150
        What is health care fraud detection? 150 ■ Using graphs and custom shared-memory hardware to detect health care fraud 151
    6.11 Summary 152
    6.12 Further reading 153

7 Finding information with NoSQL search 154
    7.1 What is NoSQL search? 155
    7.2 Types of search 155
        Comparing Boolean, full-text keyword, and structured search models 155 ■ Examining the most common types of search 156
    7.3 Strategies and methods that make NoSQL search effective 158
    7.4 Using document structure to improve search quality 161
    7.5 Measuring search quality 162
    7.6 In-node indexes versus remote search services 163
    7.7 Case study: using MapReduce to create reverse indexes 164
    7.8 Case study: searching technical documentation 166
        What is technical document search? 166 ■ Retaining document structure in a NoSQL document store 167


    7.9 Case study: searching domain-specific languages—findability and reuse 168
    7.10 Apply your knowledge 170
    7.11 Summary 170
    7.12 Further reading 171

8 Building high-availability solutions with NoSQL 172
    8.1 What is a high-availability NoSQL database? 173
    8.2 Measuring availability of NoSQL databases 174
        Case study: the Amazon S3 SLA 176 ■ Predicting system availability 176 ■ Apply your knowledge 177
    8.3 NoSQL strategies for high availability 178
        Using a load balancer to direct traffic to the least busy node 178 ■ Using high-availability distributed filesystems with NoSQL databases 179 ■ Case study: using HDFS as a high-availability filesystem to store master data 180 ■ Using a managed NoSQL service 182 ■ Case study: using Amazon DynamoDB for a high-availability data store 182
    8.4 Case study: using Apache Cassandra as a high-availability column family store 184
        Configuring data to node mappings with Cassandra 185
    8.5 Case study: using Couchbase as a high-availability document store 187
    8.6 Summary 189
    8.7 Further reading 190

9 Increasing agility with NoSQL 192
    9.1 What is software agility? 193
        Apply your knowledge: local or cloud-based deployment? 195
    9.2 Measuring agility 196
    9.3 Using document stores to avoid object-relational mapping 199
    9.4 Case study: using XRX to manage complex forms 201
        What are complex business forms? 201 ■ Using XRX to replace client JavaScript and object-relational mapping 202 ■ Understanding the impact of XRX on agility 205


    9.5 Summary 205
    9.6 Further reading 206

PART 4 ADVANCED TOPICS ...... 207

10 NoSQL and functional programming 209
    10.1 What is functional programming? 210
        Imperative programming is managing program state 211 ■ Functional programming is parallel transformation without side effects 213 ■ Comparing imperative and functional programming at scale 216 ■ Using referential transparency to avoid recalculating transforms 217
    10.2 Case study: using NetKernel to optimize web page content assembly 219
        Assembling nested content and tracking component dependencies 219 ■ Using NetKernel to optimize component regeneration 220
    10.3 Examples of functional programming languages 222
    10.4 Making the transition from imperative to functional programming 223
        Using functions as a parameter of a function 223 ■ Using recursion to process unstructured document data 224 ■ Moving from mutable to immutable variables 224 ■ Removing loops and conditionals 224 ■ The new cognitive style: from capturing state to isolated transforms 225 ■ Quality, validation, and consistent unit testing 225 ■ Concurrency in functional programming 226
    10.5 Case study: building NoSQL systems with Erlang 226
    10.6 Apply your knowledge 229
    10.7 Summary 230
    10.8 Further reading 231

11 Security: protecting data in your NoSQL systems 232
    11.1 A security model for NoSQL databases 233
        Using services to mitigate the need for in-database security 235 ■ Using data warehouses and OLAP to mitigate the need for in-database security 235 ■ Summary of application versus database-layer security benefits 236


    11.2 Gathering your security requirements 237
        Authentication 237 ■ Authorization 240 ■ Audit and logging 242 ■ Encryption and digital signatures 243 ■ Protecting public websites from denial of service and injection attacks 245
    11.3 Case study: access controls on key-value store—Amazon S3 246
        Identity and Access Management (IAM) 247 ■ Access-control lists (ACL) 247 ■ Bucket policies 248
    11.4 Case study: using key visibility with Apache Accumulo 249
    11.5 Case study: using MarkLogic’s RBAC model in secure publishing 250
        Using the MarkLogic RBAC security model to protect documents 250 ■ Using MarkLogic in secure publishing 251 ■ Benefits of the MarkLogic security model 252
    11.6 Summary 252
    11.7 Further reading 253

12 Selecting the right NoSQL solution 254
    12.1 What is architecture trade-off analysis? 255
    12.2 Team dynamics of database architecture selection 257
        Selecting the right team 258 ■ Accounting for experience bias 259 ■ Using outside consultants 259
    12.3 Steps in architectural trade-off analysis 260
    12.4 Analysis through decomposition: quality trees 263
        Sample quality attributes 264 ■ Evaluating hybrid and cloud architectures 266
    12.5 Communicating the results to stakeholders 267
        Using quality trees as navigational maps 267 ■ Apply your knowledge 269 ■ Using quality trees to communicate project risks 270
    12.6 Finding the right proof-of-architecture pilot project 271
    12.7 Summary 273
    12.8 Further reading 274

index 275

foreword

Where does one start to explain a topic that’s defined by what it isn’t, rather than what it is? Believe me, as someone who’s been trying to educate people in this field for the past three years, it’s a frustrating dilemma, and one shared by lots of technical experts, consultants, and vendors. Even though few think the name NoSQL is optimal, almost everyone seems to agree that it defines a category of products and technologies better than any other term. My best advice is to let go of whatever hang-ups you might have about the semantics, and just choose to learn about something new. And trust me please…the stuff you’re about to learn is worth your time.

Some brief personal context up front: as a publisher in the world of information management, I had heard the term NoSQL, but had little idea of its significance until three years ago, when I ran into Dan McCreary in the corridor of a conference in Toronto. He told me a bit about his current project and was obviously excited about the people and technologies he was working with. He convinced me in no time that this NoSQL thing was going to be huge, and that someone in my position should learn as much as I could about it. It was excellent advice, and we’ve had a wonderful partnership since then, running a conference together, doing webinars, and writing white papers. Dan was spot on…this NoSQL stuff is exciting, and the people in the community are quite brilliant.

Like most people who work in arcane fields, I often find myself trying to explain complex things in simple terms for the benefit of those who don’t share the same passion or context that I have. And even when you understand the value of the perfect elevator pitch, or desperately want to explain what you do to your mother, the right explanation can be elusive. Sometimes it’s even more difficult to explain new things to people who have more knowledge, rather than less. Specifically in terms of NoSQL, that’s the huge community of relational DBMS devotees who’ve existed happily and efficiently for the past 30 years, needing nothing but one toolkit.

That’s where Making Sense of NoSQL comes in. If you’re in an enterprise computing role and trying to understand the value of NoSQL, then you’re going to appreciate this book, because it speaks directly to you. Sure, you startup guys will get something out of it, but for enterprise IT folks, the barriers are pretty daunting—not the least of which will be the many years of technical bias accumulated against you from the people in your immediate vicinity, wondering why the heck you’d want to put your data into anything but a nice, orderly table.

The authors understand this, and have focused a lot of their analysis on the technical and architectural trade-offs that you’ll be facing. I also love that they’ve undertaken so much effort to offer case studies throughout the book. Stories are key to persuasion, and these examples drawn from real applications provide a storyline to the subject that will be invaluable as you try to introduce these new technologies into your organization.

Dan McCreary and Ann Kelly have provided the first comprehensive explanation of what NoSQL technologies are, and why you might want to use them in a corporate context. While this is not meant to be a technical book, I can tell you that behind the scenes they’ve been diligent about consulting with the product architects and developers to ensure that the nuances and features of different products are represented accurately.

Making Sense of NoSQL is a handbook of easily digestible, practical advice for technical managers, architects, and developers. It’s a guide for anyone who needs to understand the full range of their data management options in the increasingly complex and demanding world of big, fast data. The title of chapter 1 is “NoSQL: It’s about making intelligent choices,” and based on your selection of this book, I can confirm that you’ve made one already.

TONY SHAW
FOUNDER AND CEO
DATAVERSITY

preface

Sometimes we’re presented with facts that force us to reassess what we think we know. After spending most of our working life performing data modeling tasks with a focus on storing data in rows, we learned that the modeling process might not be necessary. While this information didn’t mean our current knowledge was invalid, it forced us to take a hard look at how we solved business technology problems. Armed with new knowledge, techniques, and problem-solving styles, we broadened the repertoire of our solution space.

In 2006, while working on a project that involved the exchange of real estate transactions, we spent many months designing XML schemas and forms to store the complex hierarchies of data. On the advice of a friend (Kurt Cagle), we found that storing the data into a native XML database saved our project months of object modeling, relational database design, and object-relational mapping. The result was a radically simple architecture that could be maintained by nonprogrammers.

The realization that enterprise data can be stored in structures other than RDBMSs is a major turning point for people who enter the NoSQL space. Initially, this information may be viewed with skepticism, fear, and even self-doubt. We may question our own skills as well as the educational institutions that trained us and the organizations that reinforce the notion that RDBMS and objects are the only way to solve problems. Yet if we’re going to be fair to our clients, customers, and users, we must take a holistic approach to find the best fit for each business problem and evaluate other database architectures.

In 2010, frustrated with the lack of exposure NoSQL databases were getting at large enterprise data conferences, we approached Tony Shaw from DATAVERSITY about starting a new conference. The conference would be a venue for anyone interested in learning about NoSQL technologies and exposing individuals and organizations to the NoSQL databases available to them. The first NoSQL Now! conference was successfully held in San Jose, California, in August of 2011, with approximately 500 interested and curious attendees.

One finding of the conference was that there was no single source of material that covered NoSQL architectures or introduced a process to objectively match a business problem with the right database. People wanted more than a collection of “Hello World!” examples from open source projects. They were looking for a guide that helped them match a business problem to an architecture first, and then a process that allowed them to consider open source as well as commercial database systems.

Finding a publisher that would use our existing DocBook content was the first step. Luckily, we found that Manning Publications understands the value of standards.

acknowledgments

We’d like to thank everyone at Manning Publications who helped us take our raw ideas and transform them into a book: Michael Stephens, who brought us on board; Elizabeth Lexleigh, our development editor, who patiently read version after version of each chapter; Nick Chase, who made all the technology work like it’s supposed to; the marketing and production teams, and everyone who worked behind the scenes—we acknowledge your efforts, guidance, and words of encouragement.

To the many people who reviewed case studies and provided us with examples of real-world NoSQL usage—we appreciate your time and expertise: George Bina, Ben Brumfield, Dipti Borkar, Kurt Cagle, Richard Carlsson, Amy Friedman, Randolph Kahle, Shannon Kempe, Amir Halfon, John Hudzina, Martin Logan, Michaline Todd, Eric Merritt, Pete Palmer, Amar Shan, Christine Schwartz, Tony Shaw, Joe Wicentowski, Melinda Wilken, and Frank Weige.

To the reviewers who contributed valuable insights and feedback during the development of our manuscript—our book is better for your input: Aldrich Wright, Brandon Wilhite, Craig Smith, Gabriela Jack, Ian Stirk, Ignacio Lopez Vellon, Jason Kolter, Jeff Lehn, John Guthrie, Kamesh Sampah, Michael Piscatello, Mikkel Eide Eriksen, Philipp K. Janert, Ray Lugo, Jr., Rodrigo Abreu, and Roland Civet.

We’d like to say a special thanks to our friend Alex Bleasdale for providing us with working code to support the role-based, access-control case study in our chapter on NoSQL security and secure document publishing. Special thanks also to Tony Shaw for contributing the foreword, and to Leo Polovets for his technical proofread of the final manuscript shortly before it went to production.


about this book

In writing this book, we had two goals: first, to describe NoSQL databases, and second, to show how NoSQL systems can be used as standalone solutions or to augment current SQL systems to solve business problems. We invite anyone who has an interest in learning about NoSQL to use this book as a guide. You’ll find that the information, examples, and case studies are targeted toward technical managers, solution architects, and data architects who have an interest in learning about NoSQL.

This material will help you objectively evaluate SQL and NoSQL database systems to see which business problems they solve. If you’re looking for a programming guide for a particular product, you’ve come to the wrong place. In this book you’ll find information about the motivations behind NoSQL, as well as related terminology and concepts. There might be sections and chapters of this book that cover topics you already understand; feel free to skim or skip over them and focus on the unknown.

Finally, we feel strongly about and focus on standards. The standards associated with SQL systems allow applications to be ported between databases using a common language. Unfortunately, NoSQL systems can’t yet make this claim. In time, NoSQL application vendors will pressure NoSQL database vendors to adopt a set of standards to make them as portable as SQL.

Roadmap

This book is divided into four parts.

Part 1 sets the stage by defining NoSQL and reviewing the basic concepts behind the NoSQL movement.

In chapter 1, “NoSQL: It’s about making intelligent choices,” we define the term NoSQL, talk about the key events that triggered the NoSQL movement, and present a high-level view of the business benefits of NoSQL systems. Readers already familiar with the NoSQL movement and the business benefits might choose to skim this chapter.

In chapter 2, “NoSQL concepts,” we introduce the core concepts associated with the NoSQL movement. Although you can skim this chapter on a first read-through, it’s important for understanding material in later chapters. We encourage you to use this chapter as a reference guide as you encounter these concepts throughout the book.

In part 2, “Database patterns,” we do an in-depth review of SQL and NoSQL database architecture patterns. We look at the different database structures and how we access them, and present use cases to show the types of situations where each architectural pattern is best used.

Chapter 3 covers “Foundational data architecture patterns.” It begins with a review of the drivers behind RDBMSs and how the requirements of ERP systems shaped the features we have in current RDBMS and BI/DW systems. We briefly discuss other database systems such as object databases and revision control systems. You can skim this chapter if you’re already familiar with these systems.

In chapter 4, “NoSQL data architecture patterns,” we introduce the database patterns associated with NoSQL. We look at key-value stores, graph stores, column family (Bigtable) systems, and document databases. The chapter provides definitions, examples, and case studies to facilitate understanding.

Chapter 5 covers “Native XML databases,” which are most often found in government and publishing applications, as they are known to lower costs and support the use of standards. We present two case studies from the financial and government publishing areas.

In part 3, we look at how NoSQL systems can be applied to the problems of big data, search, high availability, and agile web development.

In chapter 6, “Using NoSQL to manage big data,” you’ll see how NoSQL systems can be configured to efficiently process large volumes of data running on commodity hardware. We include a discussion on distributed computing and horizontal scalability, and present a case study where commodity hardware fails to scale for analyzing large graphs.

In chapter 7, “Finding information with NoSQL search,” you’ll learn how to improve search quality by implementing a document model and preserving the document’s content. We discuss how MapReduce transforms are used to create scalable reverse indexes, which result in fast search. We review the search systems used on documents and databases and show how structured search solutions are used to create accurate search result rankings.

Chapter 8 covers “Building high-availability solutions with NoSQL.” We show how the replicated and distributed nature of NoSQL systems can be used to result in systems that have increased availability. You’ll see how many low-cost CPUs can provide higher uptime once data synchronization technologies are used. Our case study shows how full peer-to-peer architectures can provide higher availability than other distribution models.

In chapter 9, we talk about “Increasing agility with NoSQL.” By eliminating the object-relational mapping layer, NoSQL software development is simpler and can quickly adapt to changing business requirements. You’ll see how these NoSQL systems allow the experienced developer, as well as nonprogramming staff, to become part of the software development lifecycle process.

In part 4, we cover the “Advanced topics” of functional programming and security, and then review a formalized process for selecting the right NoSQL system.

In chapter 10, we cover the topic of “NoSQL and functional programming” and the need for distributed transformation architectures such as MapReduce. We look at how functional programming has influenced the ability of NoSQL solutions to use large numbers of low-cost processors and why several NoSQL databases use actor-based systems such as Erlang. We also show how functional programming and resource-oriented programming can be combined to create scalable performance on distributed systems with a case study of the NetKernel system.

Chapter 11 covers the topic of “Security: protecting data in your NoSQL systems.” We review the history and key security considerations that are common to NoSQL solutions. We provide examples of how a key-value store, a column family store, and a document store can implement a robust security model.

In chapter 12, “Selecting the right NoSQL solution,” we walk through a formal process that organizations can use to select the right database for their business problem. We close with some final thoughts and information about how these technologies will impact business system selection.

Code conventions and downloads

Source code in listings or in text is in a fixed-width font like this to separate it from ordinary text. You can download the source code for the listings from the Manning website, www.manning.com/MakingSenseofNoSQL.

Author Online

The purchase of Making Sense of NoSQL includes free access to a private web forum run by Manning Publications, where you can make comments about the book, ask technical questions, and receive help from the authors and from other users. To access the forum and subscribe to it, point your web browser to www.manning.com/MakingSenseofNoSQL. This page provides information on how to get on the forum once you are registered, what kind of help is available, and the rules of conduct on the forum.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the authors can take place. It is not a commitment to any specific amount of participation on the part of the authors, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the authors some challenging questions lest their interest stray!

The Author Online forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

About the authors

DAN MCCREARY is a data architecture consultant with a strong interest in standards. He has worked for organizations such as Bell Labs (integrated circuit design), the supercomputing industry (porting UNIX), and Steve Jobs’ NeXT Computer (software evangelism), as well as founded his own consulting firm. Dan started working with US federal data standards in 2002 and was active in the adoption of the National Information Exchange Model (NIEM). Dan started doing NoSQL development in 2006 when he was exposed to native XML databases for storing form data. He has served as an invited expert on the World Wide Web XForms standard group and is a cofounder of the NoSQL Now! Conference.

ANN KELLY is a software consultant with Kelly McCreary & Associates. After spending much of her career working in the insurance industry developing software and managing projects, she became a NoSQL convert in 2011. Since then, she has worked with her customers to create NoSQL solutions that allow them to solve their business problems quickly and efficiently while providing them with the training to manage their own applications.

Part 1 Introduction

In part 1 we introduce you to the topic of NoSQL. We define the term NoSQL, talk about why the NoSQL movement got started, look at the core topics, and review the business benefits of including NoSQL solutions in your organization.

In chapter 1 we begin by defining NoSQL and talk about the business drivers and motivations behind the NoSQL movement. Chapter 2 expands on the foundation in chapter 1 and provides a review of the core concepts and important definitions associated with NoSQL.

If you’re already familiar with the NoSQL movement, you may want to skim chapter 1. Chapter 2 contains core concepts and definitions associated with NoSQL. We encourage everyone to read chapter 2 to gain an understanding of these concepts, as they’ll be referenced often and applied throughout the book.

NoSQL: It’s about making intelligent choices

This chapter covers

■ What’s NoSQL?
■ NoSQL business drivers
■ NoSQL case studies

The complexity for minimum component costs has increased at a rate of roughly a factor of two per year...Certainly over the short term this rate can be expected to continue, if not to increase.
—Gordon Moore, 1965

…Then you better start swimmin’…Or you’ll sink like a stone…For the times they are a-changin’.
—Bob Dylan

In writing this book we have two goals: first, to describe NoSQL databases, and second, to show how NoSQL systems can be used as standalone solutions or to augment current SQL systems to solve business problems. Though we invite anyone who has an interest in NoSQL to use this as a guide, the information, examples, and case studies are targeted toward technical managers, solution architects, and data architects who are interested in learning about NoSQL.

This material will help you objectively evaluate SQL and NoSQL database systems to see which business problems they solve. If you’re looking for a programming guide for a particular product, you’ve come to the wrong place. Here you’ll find information about the motivations behind NoSQL, as well as related terminology and concepts. There may be sections and chapters of this book that cover topics you already understand; feel free to skim or skip over them and focus on the unknown.

Finally, we feel strongly about and focus on standards. The standards associated with SQL systems allow applications to be ported between databases using a common language. Unfortunately, NoSQL systems can’t yet make this claim. In time, NoSQL application vendors will pressure NoSQL database vendors to adopt a set of standards to make them as portable as SQL.

In this chapter, we’ll begin by giving a definition of NoSQL. We’ll talk about the business drivers and motivations that make NoSQL so intriguing to and popular with organizations today. Finally, we’ll look at five case studies where organizations have successfully implemented NoSQL to solve a particular business problem.

1.1 What is NoSQL?

One of the challenges with NoSQL is defining it. The term NoSQL is problematic since it doesn’t really describe the core themes in the NoSQL movement. The term originated from a group in the Bay Area who met regularly to talk about common concerns and issues surrounding scalable open source databases, and it stuck. Descriptive or not, it seems to be everywhere: in trade press, product descriptions, and conferences.

We’ll use the term NoSQL in this book as a way of differentiating a system from a traditional relational database management system (RDBMS). For our purpose, we define NoSQL in the following way:

NoSQL is a set of concepts that allows the rapid and efficient processing of data sets with a focus on performance, reliability, and agility.

Seems like a broad definition, right? It doesn’t exclude SQL or RDBMS systems, right? That’s not a mistake. What’s important is that we identify the core themes behind NoSQL, what it is, and most importantly what it isn’t. So what is NoSQL?

■ It’s more than rows in tables—NoSQL systems store and retrieve data from many formats: key-value stores, graph databases, column-family (Bigtable) stores, document stores, and even rows in tables.
■ It’s free of joins—NoSQL systems allow you to extract your data using simple interfaces without joins.
■ It’s schema-free—NoSQL systems allow you to drag-and-drop your data into a folder and then query it without creating an entity-relational model.


■ It works on many processors—NoSQL systems allow you to store your database on multiple processors and maintain high-speed performance.
■ It uses shared-nothing commodity computers—Most (but not all) NoSQL systems leverage low-cost commodity processors that have separate RAM and disk.
■ It supports linear scalability—When you add more processors, you get a consistent increase in performance.
■ It’s innovative—NoSQL offers options to a single way of storing, retrieving, and manipulating data. NoSQL supporters (also known as NoSQLers) have an inclusive attitude about NoSQL and recognize SQL solutions as viable options. To the NoSQL community, NoSQL means “Not only SQL.”

Equally important is what NoSQL is not:

■ It’s not about the SQL language—The definition of NoSQL isn’t an application that uses a language other than SQL. SQL as well as other query languages are used with NoSQL databases.
■ It’s not only open source—Although many NoSQL systems have an open source model, commercial products use NoSQL concepts as well as open source initiatives. You can still have an innovative approach to problem solving with a commercial product.
■ It’s not only big data—Many, but not all, NoSQL applications are driven by the inability of a current application to efficiently scale when big data is an issue. Though volume and velocity are important, NoSQL also focuses on variability and agility.
■ It’s not about cloud computing—Many NoSQL systems reside in the cloud to take advantage of its ability to rapidly scale when the situation dictates. NoSQL systems can run in the cloud as well as in your corporate data center.
■ It’s not about a clever use of RAM and SSD—Many NoSQL systems focus on the efficient use of RAM or solid state disks to increase performance. Though this is important, NoSQL systems can run on standard hardware.
■ It’s not an elite group of products—NoSQL isn’t an exclusive club with a few products. There are no membership dues or tests required to join. To be considered a NoSQLer, you only need to convince others that you have innovative solutions to their business problems.

NoSQL applications use a variety of data store types (databases). From the simple key-value store that associates a unique key with a value, to graph stores used to associate relationships, to document stores used for variable data, each NoSQL type of data store has unique attributes and uses as identified in table 1.1.


Table 1.1 Types of NoSQL data stores—the four main categories of NoSQL systems, and sample products for each data store type

Type: Key-value store—A simple data storage system that uses a key to access a value
Typical usage: Image stores • Key-based filesystems • Object cache • Systems designed to scale
Examples: Berkeley DB • Memcache • Redis • Riak • DynamoDB

Type: Column family store—A sparse matrix system that uses a row and a column as keys
Typical usage: Web crawler results • Big data problems that can relax consistency rules
Examples: Apache HBase • Apache Cassandra • Hypertable • Apache Accumulo

Type: Graph store—For relationship-intensive problems
Typical usage: Social networks • Fraud detection • Relationship-heavy data
Examples: Neo4j • AllegroGraph • Bigdata (RDF data store) • InfiniteGraph (Objectivity)

Type: Document store—Storing hierarchical data structures directly in the database
Typical usage: High-variability data • Document search • Integration hubs • Web content management • Publishing
Examples: MongoDB (10Gen) • CouchDB • Couchbase • MarkLogic • eXist-db • Berkeley DB XML

NoSQL systems have unique characteristics and capabilities that can be used alone or in conjunction with your existing systems. Many organizations considering NoSQL systems do so to overcome common issues such as volume, velocity, variability, and agility, the business drivers behind the NoSQL movement.

1.2 NoSQL business drivers

The scientist-philosopher Thomas Kuhn coined the term paradigm shift to identify a recurring process he observed in science, where innovative ideas came in bursts and impacted the world in nonlinear ways. We’ll use Kuhn’s concept of the paradigm shift as a way to think about and explain the NoSQL movement and the changes in thought patterns, architectures, and methods emerging today.

Many organizations supporting single-CPU relational systems have come to a crossroads: the needs of their organizations are changing. Businesses have found value in rapidly capturing and analyzing large amounts of variable data, and making immediate changes in their businesses based on the information they receive.

Figure 1.1 shows how the demands of volume, velocity, variability, and agility play a key role in the emergence of NoSQL solutions. As each of these drivers applies pressure to the single-processor relational model, its foundation becomes less stable and in time no longer meets the organization’s needs.



Figure 1.1 In this figure, we see how the business drivers volume, velocity, variability, and agility apply pressure to the single CPU system, resulting in the cracks. Volume and velocity refer to the ability to handle large datasets that arrive quickly. Variability refers to how diverse data types don’t fit into structured tables, and agility refers to how quickly an organization responds to business change.

1.2.1 Volume

Without a doubt, the key factor pushing organizations to look at alternatives to their current RDBMSs is a need to query big data using clusters of commodity processors. Until around 2005, performance concerns were resolved by purchasing faster processors. In time, the ability to increase processing speed was no longer an option. As chip density increased, heat could no longer dissipate fast enough without chip overheating. This phenomenon, known as the power wall, forced systems designers to shift their focus from increasing speed on a single chip to using more processors working together. The need to scale out (also known as horizontal scaling), rather than scale up (faster processors), moved organizations from serial to parallel processing where data problems are split into separate paths and sent to separate processors to divide and conquer the work.

1.2.2 Velocity

Though big data problems are a consideration for many organizations moving away from RDBMSs, the ability of a single processor system to rapidly read and write data is also key. Many single-processor RDBMSs are unable to keep up with the demands of real-time inserts and online queries to the database made by public-facing websites. RDBMSs frequently index many columns of every new row, a process which decreases system performance. When single-processor RDBMSs are used as a back end to a web store front, the random bursts in web traffic slow down response for everyone, and tuning these systems can be costly when both high read and write throughput is desired.

1.2.3 Variability

Companies that want to capture and report on exception data struggle when attempting to use rigid database schema structures imposed by RDBMSs. For example, if a business unit wants to capture a few custom fields for a particular customer, all customer rows within the database need to store this information even though it doesn’t apply. Adding new columns to an RDBMS requires the system be shut down and ALTER TABLE commands to be run. When a database is large, this process can impact system availability, costing time and money.
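To make the schema pain concrete, the short sketch below contrasts the two approaches. It is a hedged illustration, not code from any particular product: the customer table, the loyalty_tier field, and the in-memory dictionary standing in for a document store are all assumptions chosen for the example.

    # Rigid schema: adding one optional field means altering the whole table,
    # typically during a maintenance window on a large production database.
    alter_statement = "ALTER TABLE customer ADD COLUMN loyalty_tier VARCHAR(20);"

    # Schema-free alternative: only the records that need the new field carry it.
    customers = {
        "cust-1001": {"name": "Acme Corp", "region": "US-East"},
        "cust-1002": {"name": "Globex", "region": "EU", "loyalty_tier": "gold"},
    }

    def loyalty_tier(doc):
        """Read the optional field, falling back when a record lacks it."""
        return doc.get("loyalty_tier", "standard")

    for key, doc in customers.items():
        print(key, loyalty_tier(doc))

The point is not the syntax but the operational difference: the first path changes every row at once, while the second leaves existing records untouched.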


1.2.4 Agility

The most complex part of building applications using RDBMSs is the process of putting data into and getting data out of the database. If your data has nested and repeated subgroups of data structures, you need to include an object-relational mapping layer. The responsibility of this layer is to generate the correct combination of INSERT, UPDATE, DELETE, and SELECT SQL statements to move object data to and from the RDBMS persistence layer. This process isn’t simple and is associated with the largest barrier to rapid change when developing new or modifying existing applications.

Generally, object-relational mapping requires experienced software developers who are familiar with object-relational frameworks such as Java Hibernate (or NHibernate for .Net systems). Even with experienced staff, small change requests can cause slowdowns in development and testing schedules.

You can see how velocity, volume, variability, and agility are the high-level drivers most frequently associated with the NoSQL movement. Now that you’re familiar with these drivers, you can look at your organization to see how NoSQL solutions might impact these drivers in a positive way to help your business meet the changing demands of today’s competitive marketplace.

1.3 NoSQL case studies

Our economy is changing. Companies that want to remain competitive need to find new ways to attract and retain their customers. To do this, the technology and people who create it must support these efforts quickly and in a cost-effective way. New thoughts about how to implement solutions are moving away from traditional methods toward processes, procedures, and technologies that at times seem bleeding-edge.

The following case studies demonstrate how business problems have successfully been solved faster, cheaper, and more effectively by thinking outside the box. Table 1.2 summarizes five case studies where NoSQL solutions were used to solve particular business problems. It presents the problems, the business drivers, and the ultimate findings. As you view subsequent sections, you’ll begin to see a common theme emerge: some business problems require new thinking and technology to provide the best solution.

Table 1.2 The key case studies associated with the NoSQL movement—the name of the case study/standard, the business drivers, and the results (findings) of the selected solutions

Case study/standard: LiveJournal’s Memcache
Driver: Need to increase performance of database queries.
Finding: By using hashing and caching, data in RAM can be shared. This cuts down the number of read requests sent to the database, increasing performance.

Case study/standard: Google’s MapReduce
Driver: Need to index billions of web pages for search using low-cost hardware.
Finding: By using parallel processing, indexing billions of web pages can be done quickly with a large number of commodity processors.

Case study/standard: Google’s Bigtable
Driver: Need to flexibly store tabular data in a distributed system.
Finding: By using a sparse matrix approach, users can think of all data as being stored in a single table with billions of rows and millions of columns without the need for up-front data modeling.

Case study/standard: Amazon’s Dynamo
Driver: Need to accept a web order 24 hours a day, 7 days a week.
Finding: A key-value store with a simple interface can be replicated even when there are large volumes of data to be processed.

Case study/standard: MarkLogic
Driver: Need to query large collections of XML documents stored on commodity hardware using standard query languages.
Finding: By distributing queries to commodity servers that contain indexes of XML documents, each server can be responsible for processing data in its own local disk and returning the results to a query server.

1.3.1 Case study: LiveJournal’s Memcache

Engineers working on the blogging system LiveJournal started to look at how their systems were using their most precious resource: the RAM in each web server. LiveJournal had a problem. Their website was so popular that the number of visitors using the site continued to increase on a daily basis. The only way they could keep up with demand was to continue to add more web servers, each with its own separate RAM.

To improve performance, the LiveJournal engineers found ways to keep the results of the most frequently used database queries in RAM, avoiding the expensive cost of rerunning the same SQL queries on their database. But each web server had its own copy of the query in RAM; there was no way for any web server to know that the server next to it in the rack already had a copy of the query sitting in RAM.

So the engineers at LiveJournal created a simple way to create a distinct “signature” of every SQL query. This signature or hash was a short string that represented a SQL SELECT statement. By sending a small message between web servers, any web server could ask the other servers if they had a copy of the SQL result already executed. If one did, it would return the results of the query and avoid an expensive round trip to the already overwhelmed SQL database. They called their new system Memcache because it managed RAM memory cache.

Many other software engineers had come across this problem in the past. The concept of large pools of shared-memory servers wasn’t new. What was different this time was that the engineers for LiveJournal went one step further. They not only made this system work (and work well), they shared their software using an open source license, and they also standardized the communications protocol between the web front ends (called the memcached protocol). Now anyone who wanted to keep their database from getting overwhelmed with repetitive queries could use their front end tools.
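The query-signature flow described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not LiveJournal’s code or the memcached wire protocol: the shared dictionary stands in for the pool of cache servers, and run_query and the database object are hypothetical names.

    import hashlib

    query_cache = {}   # stands in for a pool of shared memcached servers

    def run_query(sql, database):
        """Return a cached result when any web server has already run this SQL."""
        # The "signature" of the query: a short hash of the SQL text.
        signature = hashlib.md5(sql.encode("utf-8")).hexdigest()
        if signature in query_cache:
            return query_cache[signature]        # cache hit: no database round trip
        results = database.execute(sql)          # cache miss: query the RDBMS once
        query_cache[signature] = results         # share the result for the next caller
        return results

A production cache would also expire and evict entries; the sketch omits that to keep the read-through pattern visible.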


1.3.2 Case study: Google’s MapReduce—use commodity hardware to create search indexes

One of the most influential case studies in the NoSQL movement is the Google MapReduce system. In this paper, Google shared their process for transforming large volumes of web data content into search indexes using low-cost commodity CPUs.

Though sharing of this information was significant, the concepts of map and reduce weren’t new. Map and reduce functions are simply names for two stages of a data transformation, as described in figure 1.2. The initial stages of the transformation are called the map operation. They’re responsible for data extraction, transformation, and filtering of data. The results of the map operation are then sent to a second layer: the reduce function. The reduce function is where the results are sorted, combined, and summarized to produce the final result.

The core concepts behind the map and reduce functions are based on solid computer science work that dates back to the 1950s when programmers at MIT implemented these functions in the influential LISP system. LISP was different than other programming languages because it emphasized functions that transformed isolated lists of data. This focus is now the basis for many modern functional programming languages that have desirable properties on distributed systems.

Google extended the map and reduce functions to reliably execute on billions of web pages on hundreds or thousands of low-cost commodity CPUs. Google made map and reduce work reliably on large volumes of data and did it at a low cost. It was Google’s use of MapReduce that encouraged others to take another look at the power of functional programming and the ability of functional programming systems to scale over thousands of low-cost CPUs. Software packages such as Hadoop have closely modeled these functions.

[Figure 1.2 callouts: The map layer extracts the data from the input and transforms the results into key-value pairs; the key-value pairs are then sent to the shuffle/sort layer. The shuffle/sort layer returns the key-value pairs sorted by the keys. The reduce layer collects the sorted results and performs counts and totals before it returns the final results.]

Figure 1.2 The map and reduce functions are ways of partitioning large datasets into smaller chunks that can be transformed on isolated and independent transformation systems. The key is isolating each function so that it can be scaled onto many servers.
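A tiny word count shows the two stages in miniature. The sketch below is a schematic of the map and reduce roles only, not Google’s implementation or the Hadoop API; the sample documents and the in-process shuffle step are assumptions made for the illustration.

    from collections import defaultdict

    documents = ["nosql scales out", "sql scales up", "nosql is not only sql"]

    def map_phase(doc):
        """Map: emit (key, value) pairs from one isolated input record."""
        return [(word, 1) for word in doc.split()]

    # Shuffle/sort: group the emitted values by key.
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            grouped[key].append(value)

    # Reduce: combine the values for each key into the final result.
    word_counts = {key: sum(values) for key, values in grouped.items()}
    print(word_counts)   # {'nosql': 2, 'scales': 2, 'sql': 2, ...}

Because each map call sees only its own record and each reduce call sees only one key’s values, both stages can be spread across many machines, which is the property the chapter emphasizes.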


The use of MapReduce inspired engineers from Yahoo! and other organizations to create open source versions of Google’s MapReduce. It fostered a growing awareness of the limitations of traditional procedural programming and encouraged others to use functional programming systems.

1.3.3 Case study: Google’s Bigtable—a table with a billion rows and a million columns

Google also influenced many software developers when they announced their Bigtable system white paper titled A Distributed Storage System for Structured Data. The motivation behind Bigtable was the need to store results from the web crawlers that extract HTML pages, images, sounds, videos, and other media from the internet. The resulting dataset was so large that it couldn’t fit into a single relational database, so Google built their own storage system. Their fundamental goal was to build a system that would easily scale as their data increased without forcing them to purchase expensive hardware. The solution was neither a full relational database nor a filesystem, but what they called a “distributed storage system” that worked with structured data.

By all accounts, the Bigtable project was extremely successful. It gave Google developers a single tabular view of the data by creating one large table that stored all the data they needed. In addition, they created a system that allowed the hardware to be located in any data center, anywhere in the world, and created an environment where developers didn’t need to worry about the physical location of the data they manipulated.

1.3.4 Case study: Amazon’s Dynamo—accept an order 24 hours a day, 7 days a week

Google’s work focused on ways to make distributed batch processing and reporting easier, but wasn’t intended to support the need for highly scalable web storefronts that ran 24/7. This development came from Amazon. Amazon published another significant NoSQL paper: Amazon’s 2007 Dynamo: A Highly Available Key-Value Store. The business motivation behind Dynamo was Amazon’s need to create a highly reliable web storefront that supported transactions from around the world 24 hours a day, 7 days a week, without interruption.

Traditional brick-and-mortar retailers that operate in a few locations have the luxury of having their cash registers and point-of-sale equipment operating only during business hours. When not open for business, they run daily reports, and perform backups and software upgrades. The Amazon model is different. Not only are their customers from all corners of the world, but they shop at all hours of the day, every day. Any downtime in the purchasing cycle could result in the loss of millions of dollars. Amazon’s systems need to be iron-clad reliable and scalable without a loss in service.

In its initial offerings, Amazon used a relational database to support its shopping cart and checkout system. They had unlimited licenses for RDBMS software and a consulting budget that allowed them to attract the best and brightest consultants for their projects. In spite of all that power and money, they eventually realized that a relational model wouldn’t meet their future business needs.

Many in the NoSQL community cite Amazon’s Dynamo paper as a significant turning point in the movement. At a time when relational models were still used, it challenged the status quo and current best practices. Amazon found that because key-value stores had a simple interface, it was easier to replicate the data and more reliable. In the end, Amazon used a key-value store to build a turnkey system that was reliable, extensible, and able to support their 24/7 business model, making them one of the most successful online retailers in the world.
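The appeal of that simple interface is easy to show. The toy sketch below is not Dynamo; the three in-memory dictionaries standing in for storage nodes and the put/get functions are assumptions used only to illustrate why a two-operation contract is straightforward to replicate.

    # The entire key-value contract is put and get, which keeps replication simple.
    replicas = [dict(), dict(), dict()]   # stand-ins for three storage nodes

    def put(key, value):
        """Write the value to every replica so any node can serve a later read."""
        for node in replicas:
            node[key] = value

    def get(key):
        """Read from the first replica that holds the key."""
        for node in replicas:
            if key in node:
                return node[key]
        return None

    put("cart:alice", ["book-123", "toaster-987"])
    print(get("cart:alice"))   # the value survives the loss of any single replica

Real systems such as Dynamo add consistent hashing, versioning, and quorum reads on top of this contract, but the narrow interface is what makes those techniques tractable.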

1.3.5 Case study: MarkLogic

In 2001 a group of engineers in the San Francisco Bay Area with experience in document search formed a company that focused on managing large collections of XML documents. Because XML documents contained markup, they named the company MarkLogic.

MarkLogic defined two types of nodes in a cluster: query and document nodes. Query nodes receive query requests and coordinate all activities associated with executing a query. Document nodes contain XML documents and are responsible for executing queries on the documents in the local filesystem. Query requests are sent to a query node, which distributes queries to each remote server that contains indexed XML documents. All document matches are returned to the query node. When all document nodes have responded, the query result is then returned.

The MarkLogic architecture, moving queries to documents rather than moving documents to the query server, allowed them to achieve linear scalability with petabytes of documents.

MarkLogic found a demand for their products in US federal government systems that stored terabytes of intelligence information and large publishing entities that wanted to store and search their XML documents. Since 2001, MarkLogic has matured into a general-purpose highly scalable document store with support for ACID transactions and fine-grained, role-based access control. Initially, the primary language of MarkLogic developers was XQuery paired with REST; newer versions support Java as well as other language interfaces.

MarkLogic is a commercial product that requires a software license for any datasets over 40 GB. NoSQL is associated with commercial as well as open source products that provide innovative solutions to business problems.

1.3.6 Applying your knowledge

To demonstrate how the concepts in this book can be applied, we introduce you to Sally Solutions. Sally is a solution architect at a large organization that has many business units. Business units that have information management issues are assigned a solution architect to help them select the best solution to their information challenge.


Sally works on projects that need custom applications developed and she's knowledgeable about SQL and NoSQL technologies. Her job is to find the best fit for the business problem.

Now let's see how Sally applies her knowledge in two examples. In the first example, a group that needed to track equipment warranties of hardware purchases came to Sally for advice. Since the hardware information was already in an RDBMS and the team had experience with SQL, Sally recommended they extend the RDBMS to include warranty information and create reports using joins. In this case, it was clear that SQL was appropriate.

In the second example, a group that was in charge of storing digital image information within a relational database approached Sally because the performance of the database was negatively impacting their web application's page rendering. In this case, Sally recommended moving all images to a key-value store, which referenced each image with a URL. A key-value store is optimized for read-intensive applications and works with content distribution networks. After removing the image management load from the RDBMS, the web application as well as other applications saw an improvement in performance.

Note that Sally doesn't see her job as a black-and-white, RDBMS versus NoSQL selection process. Sometimes the best solution involves using hybrid approaches.

1.4 Summary

This chapter began with an introduction to the concept of NoSQL and reviewed the core business drivers behind the NoSQL movement. We then showed how the power wall forced systems designers to use highly parallel processing designs and required a new type of thinking for managing data. You also saw that traditional systems that use object-middle tiers and RDBMS databases require the use of complex object-relational mapping systems to manipulate the data. These layers often get in the way of an organization's ability to react quickly to changes (agility).

When we venture into any new technology, it's critical to understand that each area has its own patterns of problem solving. These patterns vary dramatically from technology to technology. Making the transition from SQL to NoSQL is no different. NoSQL is a new paradigm and requires a new set of pattern recognition skills, new ways of thinking, and new ways of solving problems. It requires a new cognitive style.

Opting to use NoSQL technologies can help organizations gain a competitive edge in their market, making them more agile and better equipped to adapt to changing business conditions. NoSQL approaches that leverage large numbers of commodity processors save companies time and money and increase service reliability.

As you've seen in the case studies, these changes impacted more than early technology adopters: engineers around the world realize there are alternatives to the RDBMS-as-our-only-option mantra. New companies focused on new thinking, technologies, and architectures have emerged not as a lark, but as a necessity to solving real


business problems that don’t fit into a relational mold. As organizations continue to change and move into global economies, this trend will continue to expand. As we move into our next chapter, we’ll begin looking at the core concepts and technologies associated with NoSQL. We’ll talk about simplicity of design and see how it’s fundamental to creating NoSQL systems that are modular, scalable, and ultimately lower-cost to you and your organization.

NoSQL concepts

This chapter covers

 NoSQL concepts
 ACID and BASE for reliable database transactions
 How to minimize downtime with database sharding
 Brewer's CAP theorem

Less is more. —Ludwig Mies van der Rohe

In this chapter, we'll cover the core concepts associated with NoSQL systems. After reading this chapter, you'll be able to recognize and define NoSQL concepts and terms, you'll understand NoSQL vendor products and features, and you'll be able to decide if these features are appropriate for your NoSQL system. We'll start with a discussion about how using simple components in the application development process removes complexity and promotes reuse, saving you time and money in your system's design and maintenance.

2.1 Keeping components simple to promote reuse

If you've worked with relational databases, you know how complex they can be. Generally, they begin as simple systems that, when requested, return a selected row



from a single flat file. Over time, they need to do more and evolve into systems that manage multiple tables, perform join operations, do query optimization, replicate transactions, run stored procedures, set triggers, enforce security, and perform indexing. NoSQL systems use a different approach to solving these complex problems by creating simple applications that distribute the required features across the network. Keeping your architectural components simple allows you to reuse them between applications, aids developers in understanding and testing, and makes it easier to port your application to new architectures.

From the NoSQL perspective, simple is good. When you create an application, it's not necessary to include all functions in a single software application. Application functions can be distributed to many NoSQL (and SQL) databases that consist of simple tools that have simple interfaces and well-defined roles. NoSQL products that follow this rule do a few things and they do them well. To illustrate, we'll look at how systems can be built using well-defined functions and focus on how easy it is to build these new functions.

If you're familiar with UNIX operating systems, you might be familiar with the concept of UNIX pipes. UNIX pipes are a set of processes that are chained together so that the output of one process becomes the input to the next process. Like UNIX pipes, NoSQL systems are often created by integrating a large number of modular functions that work together. An example of creating small functions with UNIX pipes to count the total number of figures in a book is illustrated in figure 2.1.

What's striking about this example is that by typing about 40 characters you've created a useful function. This task would be much more difficult to do on systems that don't support UNIX-style functions. In reality, only a query on a native XML database might be shorter than this command, but it wouldn't also be general-purpose.

[Figure 2.1: a UNIX pipeline in which cat ch*.xml is piped through grep (keeping only the lines that contain a figure tag) and then wc; the output is the number of figures in all chapters.]

Figure 2.1 UNIX pipes as an example of reusing simple tools to create new functions. This figure concatenates (puts together) all chapter files in a book into a single file and counts the number of figures in all chapters. With UNIX pipes, we do this by stringing together three simple commands: concatenate (cat), a search function called grep, and word count (wc). No additional code is needed; each function takes the output from the previous function and processes it.
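For readers who'd rather see the same idea in a general-purpose language, here is a rough Python sketch of the pipeline in figure 2.1. It isn't from the book: the ch*.xml file pattern and the <figure marker are assumptions about how the chapters are tagged.

    # Count the figures in all chapter files, the way the UNIX pipeline
    # cat | grep | wc does: read every file, keep matching lines, count them.
    import glob

    def count_figures(pattern="ch*.xml", marker="<figure"):
        count = 0
        for path in sorted(glob.glob(pattern)):            # like cat ch*.xml
            with open(path, encoding="utf-8") as f:
                count += sum(1 for line in f if marker in line)   # like grep | wc -l
        return count

    if __name__ == "__main__":
        print(count_figures())

As with the shell version, each step does one small thing, and the steps compose.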


Many NoSQL systems are created using a similar philosophy of modular components that can work together. Instead of having a single large database layer, they often have a number of simpler components that can be reassembled to meet the needs of different applications. For example, one function allows sharing of objects in RAM (memcache), another function runs batch jobs (MapReduce), and yet another function stores binary documents (key-value stores). Note that most UNIX pipes were designed to transform linear pipelines of line-oriented data streams on a single processor. NoSQL components, though modular, are more than a series of linear pipeline components. Their focus is on efficient data services that are frequently used to power distributed web services. NoSQL systems can be documents, messages, message stores, file stores, REST, JSON, or XML web services with generic application program interfaces (APIs). There are tools to load, validate, transform, and output large amounts of data, whereas UNIX pipes were really designed to work on a single processor.

STANDARDS WATCH: UNIVERSAL PIPES FOR STRUCTURED DATA
At this point you might be asking if you can still use the concepts behind UNIX pipes to process your structured data. The answer is yes, if you use JSON or XML standards! The World Wide Web Consortium has recognized the universal need for standard pipe concepts that work with structured data. They have provided a standard for processing pipelined data called XProc. Several NoSQL databases have XProc built in as part of their architecture. XProc standards allow data transformation pipelines built with one NoSQL database to be ported to other XProc systems without modification. You can read more about XProc at http://www.w3.org/TR/xproc/. There's also a version of the UNIX shell that's specifically used for XML called XMLSH. You can read more about XMLSH at http://www.xmlsh.org.

The concept of simple functions in NoSQL systems will be a recurring theme. Don't be afraid to suggest or use a NoSQL system even if it doesn't meet all your needs in a single system. As you come to learn more about NoSQL systems, you'll think of them as collections of tools that become more useful the more you know about how they fit together. Next, we'll see how the concept of simplicity is important in the development of the application layer with respect to NoSQL applications.

2.2 Using application tiers to simplify design

Understanding the role of tiered applications is important to objectively evaluate one or more application architectures. Since functions move between tiers, the comparison at times might not be as clear as when you look at a single tier. The best way to fairly and objectively compare systems is to take a holistic approach, looking at the overall application and how well it meets system requirements.

Using application tiers in your architecture allows you to create flexible and reusable applications. By segregating an application into tiers, you have the option of modifying or adding a specific layer instead of reworking an entire application when modifications are required. As you'll see in the following example, which compares RDBMSs and NoSQL systems, functions in NoSQL applications are distributed differently.


When application designers begin to think about software systems that store persistent data, they have many options. One choice is to determine whether they need to use application tiers to divide the overall functionality of their application. Identifying each layer breaks the application into separate architectural components, which allows the software designer to determine the responsibility of each component. This separation of concerns allows designers to make the complicated seem simple when explaining the system to others.

Application tiers are typically viewed in a layer-cake-like drawing, as shown in figure 2.2. In this figure, user events (like a user clicking a button on a web page) trigger code in the user interface. The output or response from the user interface is sent to the middle tier. This middle tier may respond by sending something back to the user interface, or it could access the database layer. The database layer may in turn run a query and send a response back to the middle tier. The middle tier then uses the data to create a report and sends it to the user. This process is the same whether you're using Microsoft Windows, Apple's OS X, or a web browser with HTML links.

[Figure 2.2 diagram: events trigger the user interface, which passes requests to the middle tier, which in turn calls the database layer; responses flow back up the same stack.]

Figure 2.2 Application tiers are used to simplify system design. The NoSQL movement is concerned with minimizing bottlenecks in overall system performance, and this sometimes means moving key components out of one tier and putting them into another tier.

When designing applications it's important to consider the trade-offs when putting functionality in each tier. Because relational databases have been around for a long time and are mature, it's common for database vendors to add functionality at the database tier and release it with their software rather than reusing components already delivered or developed. NoSQL system designers know that their software must work in complex environments with other applications where reuse and seamless interfaces are required, so they build small independent functions. Figure 2.3 shows the differences between RDBMS and NoSQL applications.

In figure 2.3 we compare the relational versus NoSQL methods of distributing application functions between the middle and database tiers. As you can see, both models have a user interface tier at the top. In the relational database, most of the application functionality is found in the database layer. In the NoSQL application, most of the application functionality is found in the middle tier. In addition, NoSQL systems leverage more services for managing BLOBs of data (the key-value store), for storing full-text indexes (the Lucene indexes), and for executing batch jobs (MapReduce).

A good NoSQL application design comes from carefully considering the pros and cons of putting functions in the middle versus the database tier. NoSQL solutions allow you to carefully consider all the options, and if the requirements include a high-scalability component, you can choose to keep the database tier simple. In traditional relational database systems, the complexity found in the database tier impacts the overall scalability of the application.


[Figure 2.3 diagram: the RDBMS stack shows a user interface tier, a middle tier with object-relational and relational-object mapping plus caching, and a database tier containing search, BLOBs, transactions, batch, and triggers; the NoSQL stack shows a user interface tier, a middle tier handling caching, search, transactions, and batch, backed by services such as a key-value store, Lucene indexes, MapReduce, and document stores.]

Figure 2.3 This figure compares application layers in RDBMSs and NoSQL systems. RDBMSs, on the left, have focused on putting many functions into the database tier where you can guarantee security and transactional integrity. The middle tier is used to convert objects to and from tables. NoSQL systems, on the right, don’t use object-relational mapping concepts; they move database functions into the middle tier and leverage external services.
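One way to picture "moving database functions into the middle tier" is a read-through cache in application code. The sketch below is ours, not the book's: the Python dict stands in for a shared RAM cache such as memcached, and fetch_from_database is a hypothetical placeholder for a call to the database tier.

    # Middle-tier read-through cache: check a fast in-memory store before
    # falling back to the slower database tier.
    cache = {}  # stand-in for a shared cache service such as memcached

    def fetch_from_database(key):
        # Hypothetical slow call into the database tier.
        return "value-for-" + key

    def get(key):
        if key in cache:                      # cache hit: no database round trip
            return cache[key]
        value = fetch_from_database(key)      # cache miss: ask the database tier
        cache[key] = value                    # remember it for next time
        return value

Keeping logic like this in the middle tier is exactly the kind of repartitioning of functions that the following trade-off analysis weighs.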

Remember, if you focus on a single tier you'll never get a fair comparison. When performing a trade-off analysis, you should compare RDBMSs with NoSQL systems as well as think about how repartitioning will impact functionality at each tier. This process is complex and requires an understanding of both RDBMS and NoSQL architectures. The following is a list of sample pros and cons you'll want to consider when performing a trade-off analysis.

RDBMS pros:
 ACID transactions at the database level make development easier.
 Fine-grained security on columns and rows using views prevents views and changes by unauthorized users.
 Most SQL code is portable to other SQL databases, including open source options.
 Typed columns and constraints will validate data before it's added to the database and increase data quality.
 Existing staff members are already familiar with entity-relational design and SQL.

RDBMS cons:
 The object-relational mapping layer can be complex.
 Entity-relationship modeling must be completed before testing begins, which slows development.
 RDBMSs don't scale out when joins are required.
 Sharding over many servers can be done but requires application code and will be operationally inefficient.
 Full-text search requires third-party tools.
 It can be difficult to store high-variability data in tables.


NoSQL pros:
 Loading test data can be done with drag-and-drop tools before ER modeling is complete.
 Modular architecture allows components to be exchanged.
 Linear scaling takes place as new processing nodes are added to the cluster.
 Lower operational costs are obtained by autosharding.
 Integrated search functions provide high-quality ranked search results.
 There's no need for an object-relational mapping layer.
 It's easy to store high-variability data.

NoSQL cons:
 ACID transactions can be done only within a document at the database level. Other transactions must be done at the application level.
 Document stores don't provide fine-grained security at the element level.
 NoSQL systems are new to many staff members and additional training may be required.
 The document store has its own proprietary, nonstandard query language, which prohibits portability.
 The document store won't work with existing reporting and OLAP tools.

Understanding the role of placing functions within an application tier is important to understanding how an application will perform. Another important factor to consider is how memory such as RAM, SSD, and disk will impact your system.

Terminology of database clusters
The NoSQL industry frequently refers to the concept of processing nodes in a database cluster. In general, each cluster consists of racks filled with commodity computer hardware, as shown in figure 2.4.

Figure 2.4 Some of the terminology used in distributed database clusters. A cluster is composed of a set of processors, called nodes, grouped together in racks. Nodes are commodity processors, each of which has its own local CPU, RAM, and disk.


(continued)
Each independent computer is called a node. For the purposes of this book, unless we're discussing custom hardware, we'll define nodes as containing a single logical processor called a CPU. Each node has its own local RAM and disk. The CPU may in fact be implemented using multiple chips, each with multiple core processors. The disk system may also be composed of several independent drives. Nodes are grouped together in racks that have high-bandwidth connections between all the nodes within a rack. Racks are grouped together to form a database cluster within a data center. A single data center location may contain many database clusters. Note that some NoSQL transactions must store their data on two nodes in different geographic locations to be considered successful transactions.

2.3 Speeding performance by strategic use of RAM, SSD, and disk

How do NoSQL systems use different types of memory to increase system performance? Generally, traditional database management systems weren't concerned with memory management optimization. In contrast, NoSQL systems are designed to be cost effective when creating fast user response times by minimizing the amount of expensive resources you need.

If you're new to database architectures, it's good to start with a clear understanding about the difference in performance between queries that retrieve their data from RAM (volatile random access memory) and queries that retrieve their data from hard drives. Most people know that when they turn off their computer after a long work day, the data in RAM is erased and must be reloaded. Data on solid state drives (SSDs) and hard disk drives (HDDs) persists. We also know that RAM access is fast and, in comparison, disk access is much slower. Let's assume 1 nanosecond is equal to approximately a foot, which is in fact roughly the time it takes for light to travel one foot. That means your RAM is 10 feet away from you, but your hard drive is over 10 million feet away, or about 2,000 miles. If you use a solid state disk, the result is slower than RAM, but not nearly as slow as a spinning disk drive (see figure 2.5).

Let's start by putting you in the city of Chicago, Illinois. If you want to get something from your RAM, you can usually find it in your back yard. If you're lucky enough to have data stored in a solid state disk, you can find it by making a quick trip somewhere in your neighborhood. But if you want to get something from your hard drive, you'll need to go to the city of Los Angeles, California, which is about 2,000 miles away. Not a round trip you want to make often if you can avoid it. Rather than drive all the way to Los Angeles and back, what if you could check around your neighborhood to see if you already have the data? The time it takes to do a calculation in a chip today is roughly the time it takes light to travel across the chip.


Figure 2.5 To get a feel for how expensive it is to access your hard drive compared to finding an item in RAM cache, think of how long it might take you to pick up an item in your back yard (RAM). Then think of how long it would take to drive to a location in your neighborhood (SSD), and finally think of how long it would take to pick up an item in Los Angeles if you lived in Chicago (HDD). This shows that finding a query result in a local cache is more efficient than an expensive query that needs HDD access.
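The back yard, neighborhood, and Los Angeles comparisons follow directly from the "1 nanosecond is about a foot" rule of thumb. The short sketch below redoes that arithmetic in Python; the RAM and hard drive latencies come from the text, while the SSD number (about 0.1 millisecond) is a typical value we've assumed rather than one the book states.

    # Turn access latency into "distance," using 1 nanosecond ~ 1 foot of light travel.
    FEET_PER_MILE = 5280

    latencies_ns = {
        "RAM": 10,            # ~10 ns  -> about 10 feet: your back yard
        "SSD": 100_000,       # ~0.1 ms -> a trip in your neighborhood (assumed typical value)
        "HDD": 10_000_000,    # ~10 ms  -> roughly 2,000 miles: Chicago to Los Angeles
    }

    for device, ns in latencies_ns.items():
        feet = ns  # one nanosecond of light travel is about one foot
        print(f"{device}: {ns:,} ns is roughly {feet:,} feet ({feet / FEET_PER_MILE:,.0f} miles)")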

You can do a few trillion calculations while you're waiting for your data to get back from LA. That's why calculating a hash is much faster than going to disk, and the more RAM you have, the lower the probability you need to make that long round trip. The solution to faster systems is to keep as much of the right information in RAM as you can, and check your local servers to see if they might also have a copy. This local fast data store is often called a RAM cache or memory cache. Yet, accomplishing this and determining when the data is no longer current turn out to be difficult questions.

Many memory caches use a simple timestamp for each block of memory in the cache as a way of keeping the most recently used objects in memory. When memory fills up, the timestamp is used to determine which items in memory are the oldest and should be overwritten. A more refined view can take into account how much time or resources it'll take to re-create the dataset and store it in memory. This "cost model" allows more expensive queries to be kept in RAM longer than similar items that could be regenerated much faster.

The effective use of RAM cache is predicated on the efficient answer to the question, "Have we run this query before?" or equivalently, "Have we seen this document before?" These questions can be answered by using consistent hashing, which lets you know if an item is already in the cache or if you need to retrieve it from SSD or HDD.

2.4 Using consistent hashing to keep your cache current

You've learned how important it is to keep frequently used data in your RAM cache, and how by avoiding unnecessary disk access you can improve your database performance. NoSQL systems expand on this concept and use a technique called consistent hashing to keep the most frequently used data in your cache.

Consistent hashing is a general-purpose process that's useful when evaluating how NoSQL systems work. Consistent hashing quickly tells you if a new query or


document is the same as one already in your cache. Knowing this information prevents you from making unnecessary calls to disk for information and keeps your databases running fast.

[Figure 2.6 diagram: invoice.xml is passed through an MD5 hash function, producing the string d41d8cd98f00b204e9800998ecf8427e; the same step in XQuery is written as let $hash := hash($invoice, 'md5').]

Figure 2.6 Sample hashing process. An input document such as a business invoice is sent through a hashing function. The result of the hashing function is a string that's unique to the original document. A change of a single byte in the input will return a different hash string. A hash can be used to see if a document has changed or if it's already located in a RAM cache.

A hash string (also known as a checksum or hash) is a process that calculates a sequence of letters by looking at each byte of a document. The hash string uniquely identifies each document and can be used to determine whether the document you're presented with is the same document you already have on hand. If there's any difference between two documents (even a single byte), the resulting hash will be different. Since the 1990s, hash strings have been created using standardized algorithms such as MD5, SHA-1, SHA-256, and RIPEMD-160. Figure 2.6 illustrates a typical hashing process.

Hash values can be created for simple queries or complex JSON or XML documents. Once you have your hash value, you can use it to make sure that the information you're sending is the same information others are receiving. Consistent hashing occurs when two different processes running on different nodes in your network create the same hash for the same object. Consistent hashing confirms that the information in the document hasn't been altered and allows you to determine whether the object exists in your cache or message store, saving precious resources by only rerunning processes when necessary.

Hash collisions
There's an infinitesimally small chance that two different documents could generate the same hash value, resulting in a hash collision. The likelihood of this occurring is related to the length of the hash value and how many documents you're storing. The longer the hash, the lower the odds of a collision. As you add more documents, the chance of a collision increases. Many systems use the MD5 hash algorithm that generates a 128-bit hash string. A 128-bit hash can generate approximately 10^38 possible outputs. That means that if you want to keep the odds of a collision low, for example odds of under one in 10^18, you want to limit the number of documents you keep to under 10^13, or about 10 trillion documents.
For most applications that use hashing, accidental hash collisions aren't a concern. But there are situations where avoiding hash collisions is important. Systems that use hashes for security verification, like government or high-security systems, require hash values to be greater than 128 bits. In these situations, algorithms that generate a hash value greater than 128 bits like SHA-1, SHA-256, SHA-384, or SHA-512 are preferred.
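The XQuery call shown in figure 2.6 has a direct analogue in most languages. The sketch below uses Python's hashlib to produce the same kind of MD5 string and then uses that string as a cache key; the cache dict and the expensive_processing stub are illustrative assumptions, not part of the book's example.

    import hashlib

    def md5_of_file(path):
        # Hash every byte of the document; changing a single byte changes the result.
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    def expensive_processing(path):
        # Stand-in for a query or transformation you'd rather not repeat.
        with open(path, "rb") as f:
            return len(f.read())

    cache = {}  # hash string -> previously computed result

    def process(path):
        key = md5_of_file(path)
        if key in cache:                 # same bytes seen before: reuse the result
            return cache[key]
        result = expensive_processing(path)
        cache[key] = result
        return result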


Consistent hashing is also critical for synchronizing distributed databases. For example, version control systems such as Git or Subversion can run a hash not only on a single document but also on hashes of hashes for all files within a directory. By doing this consistently, you can see if your directory is in sync with a remote directory and, if not, you can run update operations on only the items that have changed.

Consistent hashing is an important tool to keep your cache current and your system running fast, even when caches are spread over many distributed systems. Consistent hashing can also be used to assign documents to specific database nodes on distributed systems and to quickly compare remote databases when they need to be synchronized. Distributed NoSQL systems rely on hashing for rapidly enhancing database read times without getting in the way of write transactions.

2.5 Comparing ACID and BASE—two methods of reliable database transactions

Transaction control is important in distributed computing environments with respect to performance and consistency. Typically, one of two types of transaction control models is used: ACID, used in RDBMSs, and BASE, found in many NoSQL systems. Even if only a small percentage of your database transactions requires transactional integrity, it's important to know that both RDBMSs and NoSQL systems are able to create these controls. The difference between these models is in the amount of effort required by application developers and the location (tier) of the transactional controls.

Let's start with a simple banking example to represent a reliable transaction. These days, many people have two bank accounts: savings and checking. If you want to move funds from one account to the other, it's likely your bank has a transfer form on their website. This is illustrated in figure 2.7.

[Figure 2.7 diagram: a web transfer form (From account: Savings; To account: Checking; Amount: 1,000.00; a Transfer button) alongside the transaction steps: Start (Begin transaction), 1 (Withdraw 1,000 from savings), 2 (Deposit 1,000 into checking), End (End transaction).]

Figure 2.7 The atomic set of steps needed to transfer funds from one bank account to another. The first step subtracts the transfer amount from the source savings account. The second step adds the same transfer amount to the destination checking account. For the transaction to be considered reliable, both steps must work or both need to be undone. Between steps, no reports should be allowed to run that show the total amount as dropping by the transaction amount.
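The two steps in figure 2.7 map directly onto a database transaction. Here is a minimal sketch using Python's built-in sqlite3 module, which we've chosen only for illustration; the book doesn't tie the example to any particular product. The with-block plays the role of the BEGIN TRANSACTION and END TRANSACTION statements: if either UPDATE fails, both are rolled back and the combined balance never changes.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("savings", 5000.00), ("checking", 500.00)])
    conn.commit()

    def transfer(amount, source, destination):
        # Both updates succeed together or are rolled back together.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, source))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, destination))

    transfer(1000.00, "savings", "checking")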


When you click the Transfer button on the web page, two discrete operations must happen in unison. The funds are subtracted from your savings account and then added to your checking account. Transaction management is the process of making sure that these two operations happen together as a single unit of work or not at all. If the computer crashes after the first part of the transaction is complete and before the second part of the transaction occurs, you'd be out $1,000 and very unhappy with your bank.

Traditional commercial RDBMSs are noted for their reliability in performing financial transactions. This reputation has been earned not only because they've been around for a long time and diligently debugged their software, but also because they've made it easy for programmers to make transactions reliable by wrapping critical transactions in statements that indicate where transactions begin and end. These are often called BEGIN TRANSACTION and END TRANSACTION statements. By adding them, developers can get high-reliability transaction support. If either one of the two atomic units doesn't complete, both of the operations will be rolled back to their initial settings.

The software also ensures that no reports can be run on the accounts halfway through the operations. If you run a "combined balance" report during the transaction, it'd never show a total that drops by 1,000 and then increases again. If a report starts while the first part of the transaction is in process, it'll be blocked until all parts of the transaction are complete.

In traditional RDBMSs the transaction management complexity is the responsibility of the database layer. Application developers only need to be able to deal with what to do if an entire transaction fails and how to notify the right party or how to keep retrying until the transaction is complete. Application developers don't need to know how to undo various parts of a transaction, as that's built into the database.

Given that reliable transactions are important in most application systems, the next two sections will take an in-depth look at RDBMS transaction control using ACID, and NoSQL transaction control using BASE.

2.5.1 RDBMS transaction control using ACID

RDBMSs maintain transaction control by using atomic, consistent, isolated, and durable (ACID) properties to ensure transactions are reliable. The following defines each of the associated properties:

 Atomicity—In the banking transaction example, we said that the exchange of funds from savings to checking must happen as an all-or-nothing transaction. The technical term for this is atomicity, which comes from the Greek term meaning "not dividable." Systems that claim they have atomic transactions must consider all failure modes: disk crashes, network failures, hardware failures, or simple software errors. Testing atomic transactions even on a single CPU is difficult.
 Consistency—In the banking transaction example, we talked about the fact that when moving funds between two related accounts, the total account balance


must never change. This is the principle of consistency. It means that your database must never have a report that shows the withdrawal from savings has occurred but the addition to checking hasn't. It's the responsibility of the database to block all reports during atomic operations. This has an impact on the speed of a system when many atomic transactions and reports are all being run on the same records in your database.
 Isolation—Isolation refers to the concept that each part of a transaction occurs without knowledge of any other transaction. For example, the transaction that adds funds doesn't know about the transaction that subtracts funds from an account.
 Durability—Durability refers to the fact that once all aspects of a transaction are complete, it's permanent. Once the transfer button is selected, you have the right to spend the money in your checking account. If the banking system crashes that night and they have to restore the database from a backup tape, there must be some way to make sure the record of this transfer is also restored. This usually means that the bank must create a transaction log on a separate computer system and then play back the transactions from the log after the backup is complete.

If you think that the software to handle these rules must be complex, you're right; it's very complex and one of the reasons that relational databases can be expensive. If you're writing a database on your own, it could easily double or triple the amount of software that has to be written. This is why new databases frequently don't support database-level transaction management in their first release. That's added only after the product matures.

Many RDBMSs restrict transaction location to a single CPU. If you think about the situation where your savings account information is stored in a computer in New York and your checking account information is stored in a computer in San Francisco, the complexity increases, since you have a greater number of failure points and the number of reporting systems that must be blocked on both systems increases.

Although supporting ACID transactions is complex, there are well-known and well-publicized strategies to do this. All of them depend on locking resources, putting extra copies of the resources aside, performing the transaction and then, if all is well, unlocking the resources. If any part of a transaction fails, the original resource in question must be returned to its original state. The design challenge is to create systems that support these transactions, make it easy for the application to use transactions, and maintain database speed and responsiveness.

ACID systems focus on the consistency and integrity of data above all other considerations. Temporarily blocking reporting mechanisms is a reasonable compromise to ensure your systems return reliable and accurate information. ACID systems are said to be pessimistic in that they must consider all possible failure modes in a computing environment. At times ACID systems seem to be guided by Murphy's Law—if anything


can go wrong it will go wrong—and must be carefully tested in order to guarantee the integrity of transactions.

While ACID systems focus on high data integrity, NoSQL systems that use BASE take into consideration a slightly different set of constraints. What if blocking one transaction while you wait for another to finish is an unacceptable compromise? If you have a website that's taking orders from customers, sometimes ACID systems are not what you want.

2.5.2 Non-RDBMS transaction control using BASE

What if you have a website that relies on computers all over the world? A computer in Chicago manages your inventory, product photos are on an image database in Virginia, tax calculations are performed in Seattle, and your accounting system is in Atlanta. What if one site goes down? Should you tell your customers to check back in 20 minutes while you solve the problem? Only if your goal is to drive them to your competitors. Is it realistic to use ACID software for every order that comes in? Let's look at another option.

Websites that use the "shopping cart" and "checkout" constructs have a different primary consideration when it comes to transaction processing. The issue of reports that are inconsistent for a few minutes is less important than something that prevents you from taking an order, because if you block an order, you've lost a customer. The alternative to ACID is BASE, which stands for these concepts:

 Basic availability allows systems to be temporarily inconsistent so that transactions are manageable. In BASE systems, the information and service capability are "basically available."
 Soft-state recognizes that some inaccuracy is temporarily allowed and data may change while being used to reduce the amount of consumed resources.
 Eventual consistency means eventually, when all service logic is executed, the system is left in a consistent state.

Unlike RDBMSs that focus on consistency, BASE systems focus on availability. BASE systems are noteworthy because their number-one objective is to allow new data to be stored, even at the risk of being out of sync for a short period of time. They relax the rules and allow reports to run even if not all portions of the database are synchronized. BASE systems aren't considered pessimistic in that they don't fret about the details if one process is behind. They're optimistic in that they assume that eventually all systems will catch up and become consistent.

BASE systems tend to be simpler and faster because they don't have to write code that deals with locking and unlocking resources. Their mission is to keep the process moving and deal with broken parts at a later time. BASE systems are ideal for web storefronts, where filling a shopping cart and placing an order is the main priority.

Prior to the NoSQL movement, most database experts considered ACID systems to be the only type of transactions that could be used in business. NoSQL systems are


Figure 2.8 ACID versus BASE—understanding the trade-offs. This figure compares the rigid financial accounting rules of traditional RDBMS ACID transactions with the more laid-back BASE approach used in NoSQL systems. RDBMS ACID systems are ideal when all reports must always be consistent and reliable. NoSQL BASE systems are preferred when priority is given to never blocking a write transaction. Your business requirements will determine whether traditional RDBMS or NoSQL systems are right for your application.

highly decentralized and ACID guarantees may not be necessary, so they use BASE and take a more relaxed approach. Figure 2.8 shows an accurate and somewhat humorous representation of ACID versus BASE philosophies.

A final note: ACID and BASE aren't rigid points on a line; they lie on a continuum where organizations and systems can decide where and how to architect systems. They may allow ACID transactions on some key areas but relax them in others. Some database systems offer both options by changing a configuration file or using a different API. The systems administrator and application developer work together to implement the right choice after considering the needs of the business.

Transactions are important when you move from centralized to distributed systems that need to scale in order to handle large volumes of data. But there are times when the amount of data you manage exceeds the size of your current system and you need to use database sharding to keep systems running and minimize downtime.

2.6 Achieving horizontal scalability with database sharding

As the amount of data an organization stores increases, there may come a point when the amount of data needed to run the business exceeds the current environment and some mechanism for breaking the information into reasonable chunks is required. Organizations and systems that reach this capacity can use automatic database sharding (breaking a database into chunks called shards and spreading the chunks across a number of distributed servers) as a means to continuing to store data while minimizing system downtime. On older systems this might mean taking the system down for a few hours while you manually reconfigure the database and copy data from the old system to a new system, yet NoSQL systems do this automatically. How a database grows and its tolerance for automatic partitioning of data is important to NoSQL systems. Sharding

has become a highly automated process in both big data and fault-tolerant systems. Let's look at how sharding works and explore its challenges.

Let's say you've created a website that allows users to log in and create their own personal space to share with friends. They have profiles, text, and product information on things they like (or don't like). You set up your website, store the information in a MySQL database, and you run it on a single CPU. People love it, they log in, create pages, invite their friends, and before you realize it your disk space is 95% full. What do you do? If you're using a typical RDBMS system, the answer is buy a new system and transfer half the users to the new system. Oh, and your old system might have to be down for a while so you can rewrite your application to know which database to get information from. Figure 2.9 shows a typical example of database sharding.

The process of moving from a single to multiple databases can be done in a number of ways; for example:

1 You can keep the users with account names that start with the letters A-N on the first drive and put users from O-Z on the new system.
2 You can keep the people in the United States on the original system and put the people who live in Europe on the new system.
3 You can randomly move half the users to the new system.

Each of these alternatives has pros and cons. For example, in option 1, if a user changes their name, should they be automatically moved to the new drive? In option 2, if a user moves to a new country, should all their data be moved? If people tend to share links with people near them, would there be performance advantages to keeping these users together? What if people in the United States tend to be active at the same time in the evening?

[Figure 2.9 diagram: before sharding, a single processor is at 90% capacity, so half of the data is copied to a new processor; after sharding, the original processor and the new processor each carry half of the load.]

Figure 2.9 Sharding is performed when a single processor can’t handle the throughput requirements of a system. When this happens you’ll want to move the data onto two systems that each take half the work. Many NoSQL systems have automatic sharding built in so that you only need to add a new server to a pool of working nodes and the database management system automatically moves data to the new node. Most RDBMSs don’t support automatic sharding.
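A common refinement of the splitting schemes just listed is to hash each user's key and let the hash pick the shard, so no one has to track name ranges or geography by hand. The sketch below is ours rather than the book's; it uses a plain modulo for clarity, which, unlike true consistent hashing, reshuffles most keys whenever a node is added.

    import hashlib

    NODES = ["node0", "node1"]  # the original processor and the new one

    def shard_for(user_id, nodes=NODES):
        # Hash the key so users spread evenly, then map the hash onto a node.
        digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
        return nodes[int(digest, 16) % len(nodes)]

    print(shard_for("alice"))  # every caller computes the same node for "alice"
    print(shard_for("bob"))

Automatic sharding in NoSQL systems performs this kind of assignment behind the scenes, so adding a node doesn't require rewriting application code.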


Would one database get overwhelmed and the other be idle? What happens if your site doubles in size again? Do you have to continue to rewrite your code each time this happens? Do you have to shut the system down for a weekend while you upgrade your software?

As the number of servers grows, you find that the chance of any one server being down remains the same, so for every server you add the chance of one part not working increases. So you think that perhaps the same process you used to split the database between two systems can also be used to duplicate data to a backup or mirrored system if the first one fails. But then you have another problem. When there are changes to a master copy, you must also keep the backup copies in sync. You must have a method of data replication. The time it takes to keep these databases in sync can decrease system performance. You now need more servers to keep up!

Welcome to the world of database sharding, replication, and distributed computing. You can see that there are many questions and trade-offs to consider as your database grows. NoSQL systems have been noted for having many ways to allow you to grow your database without ever having to shut down your servers. Keeping your database running when there are node or network failures is called partition tolerance—a new concept in the NoSQL community and one that traditional database managers struggle with.

Understanding transaction integrity and autosharding is important with respect to how you think about the trade-offs you're faced with when building distributed systems. Though database performance, transaction integrity, and how you use memory and autosharding are important, there are times when you must identify those system aspects that are most important and focus on them while leaving others flexible. Using a formal process to understand the trade-offs in your selection process will help drive your focus toward things most important to your organization, which we turn to next.

2.7 Understanding trade-offs with Brewer's CAP theorem

In order to make the best decision about what to do when systems fail, you need to consider the properties of consistency and availability when working with distributed systems over unreliable networks. Eric Brewer first introduced the CAP theorem in 2000. The CAP theorem states that any distributed database system can have at most two of the following three desirable properties:

 Consistency—Having a single, up-to-date, readable version of your data available to all clients. This isn't the same as the consistency we talked about in ACID. Consistency here is concerned with multiple clients reading the same items from replicated partitions and getting consistent results.
 High availability—Knowing that the distributed database will always allow database clients to update items without delay. Internal communication failures between replicated data shouldn't prevent updates.


 Partition tolerance—The ability of the system to keep responding to client requests even if there's a communication failure between database partitions. This is analogous to a person still having an intelligent conversation even after a link between parts of their brain isn't working.

Remember that the CAP theorem only applies in cases when there's a broken connection between partitions in your cluster. The more reliable your network, the lower the probability you'll need to think about CAP.

The CAP theorem helps you understand that once you partition your data, you must consider the availability-consistency spectrum in a network failure situation. Then the CAP theorem allows you to determine which options best match your business requirements. Figure 2.10 provides an example of the CAP application. The client writes to a primary master node, which replicates the data to another backup slave node. CAP forces you to think about whether you accept a write if the communication link between the nodes is down. If you accept it, you must take responsibility for making sure the remote node gets the update at a later time, and you risk a client reading inconsistent values until the link is restored. If you refuse the write, you sacrifice availability and the client must retry later.

Although the CAP theorem has been around since 2000, it's still a source of confusion. The CAP theorem limits your design options in a few rare end cases and usually only applies when there are network failures between data centers. In many cases, reliable message queues can quickly restore consistency after network failures.

[Figure 2.10 diagram, three panels: under normal operation a client write goes to the master node, is copied to the slave node, and a client reads from the slave; in the high-availability option the link between master and slave is down and the write is accepted anyway; in the consistency option the link is down and the write is refused.]

Figure 2.10 The partition decision. The CAP theorem helps you decide the relative merits of availability versus consistency when a network fails. In the left panel, under normal operation a client write will go to a master and then be replicated over the network to a slave. If the link is down, the client API can decide the relative merits of high availability or consistency. In the middle panel, you accept a write and risk inconsistent reads from the slave. In the right panel, you choose consistency and block the client write until the link between the data centers is restored.
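The decision in figure 2.10 can be written down as a few lines of logic. The sketch below is purely illustrative; the link_is_up flag and the replication queue are assumptions standing in for real cluster machinery, but they show the trade-off: accept the write and repair consistency later, or refuse it and stay consistent.

    PREFER_AVAILABILITY = True   # the business choice the CAP theorem forces on you

    replication_queue = []       # writes waiting to be copied to the slave node

    def write(master, key, value, link_is_up):
        if link_is_up:
            master[key] = value              # normal operation: replicate immediately
            return "ok"
        if PREFER_AVAILABILITY:
            master[key] = value              # high-availability option: accept the write
            replication_queue.append((key, value))  # sync the slave when the link returns
            return "ok (slave temporarily stale)"
        return "refused: retry when the link is restored"  # consistency option

    master_node = {}
    print(write(master_node, "cart:42", ["book"], link_is_up=False))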


If you have…                                          then you get…
A single processor or many processors                 Consistency AND availability
on a working network
Many processors and network failures                  Each transaction can select between
                                                      consistency OR availability depending
                                                      on the context

Figure 2.11 The CAP theorem shows that you can have both consistency and availability if you're only using a single processor. If you're using many processors, you can choose between consistency and availability depending on the transaction type, user, estimated downtime, or other factors.

The rules about when the CAP theorem applies are summarized in figure 2.11. Tools like the CAP theorem can help guide database selection discussions within an organization and prioritize what properties (consistency, availability, and scalability) are most important. If high consistency and update availability are simultaneously required, then a faster single processor might be your best choice. If you need the scale-out benefits that distributed systems offer, then you can make decisions about your need for update availability versus read consistency for each transaction type.

Whichever option you choose, the CAP theorem provides you with a formal process that can help you weigh the pros and cons of each SQL or NoSQL system, and in the end you'll make an informed decision.

2.8 Apply your knowledge

Sally has been assigned to help a team design a system to manage loyalty gift cards, which are similar to bank accounts. Card holders can add value to a card (deposit), make a purchase (withdrawal), and verify the card's balance. Gift card data will be partitioned and replicated to two data centers, one in the U.S. and one in Europe. People who live in the U.S. will have their primary partition in the U.S. data center and people in Europe will have their primary partition in Europe.

The data line between the two data centers has been known to fail for short periods of time, typically around 10-20 minutes each year. Sally knows this is an example of a split partition and that it'll test the system's partition tolerance. The team needs to decide whether all three operations (deposit, withdraw, and balance) must continue when the data line is down.

The team decides that deposits should continue to work even if the data line is down, since a record of the deposit can update both sites later when the connection is restored. Sally mentions that split partitions may generate inconsistent read results if one site can't update the other site with new balance information. But the team decides that bank balance requests that occur when the link is down should still return the last balance known to the local partition.

For purchase transactions, the team decides that the transaction should go through during a link failure as long as the user is connecting to the primary partition. To limit risk, withdrawals to the replicated partition will only work if the transaction is under a specific amount, such as $100. Reports will be used to see how often multiple withdrawals on partitions generate a negative balance during network outages.
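Sally's team's choices can be captured as a small decision function. The sketch below is our rendering of those rules, not code from the book, and the $100 limit is the team's example figure.

    REPLICA_WITHDRAWAL_LIMIT = 100.00  # example limit from the team's decision

    def allow_operation(op, amount, on_primary_partition, link_is_up):
        if link_is_up:
            return True        # normal operation: all three operations work
        if op == "deposit":
            return True        # always accepted; both sites reconcile later
        if op == "balance":
            return True        # return the last balance known to the local partition
        if op == "withdraw":
            return on_primary_partition or amount <= REPLICA_WITHDRAWAL_LIMIT
        return False

    print(allow_operation("withdraw", 250.00, on_primary_partition=False, link_is_up=False))  # False
    print(allow_operation("withdraw", 250.00, on_primary_partition=True,  link_is_up=False))  # True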


2.9 Summary

In this chapter, we covered some of the key concepts and insights of the NoSQL movement. Here's a list of the important concepts and architectural guidelines we've discussed so far; you'll see these concepts mentioned and discussed in future chapters:

 Use simple building blocks to build applications.
 Use a layered architecture to promote modularity.
 Use consistent hashing to distribute data over a cluster.
 Use distributed caching, RAM, and SSD to speed database reads.
 Relaxing ACID requirements often gives you more flexibility.
 Sharding allows your database cluster to grow gracefully.
 The CAP theorem allows you to make intelligent choices when there's a network failure.

Throughout this book we emphasize the importance of using a formal process in evaluating systems to help identify what aspects are most important to the organization and what compromises need to be made.

At this point you should understand the benefits of using NoSQL systems and how they'll assist you in meeting your business objectives. In the next chapter, we'll build on our pattern vocabulary and review the strengths and weaknesses of RDBMS architectures, and then move on to patterns that are associated with NoSQL data architectures.

2.10 Further reading

 "Birthday problem." Wikipedia. http://mng.bz/54gQ.
 "Disk sector." Wikipedia. http://mng.bz/Wfm5.
 "Dynamic random-access memory." Wikipedia. http://mng.bz/Z09P.
 "MD5: Collision vulnerabilities." Wikipedia. http://mng.bz/157p.
 "Paxos (computer science)." Wikipedia. http://mng.bz/U5tm.
 Preshing, Jeff. "Hash Collision Probabilities." Preshing on Programming. May 4, 2011. http://mng.bz/PxDU.
 "Quorum (distributed computing)." Wikipedia. http://mng.bz/w2P8.
 "Solid-state drive." Wikipedia. http://mng.bz/sg4R.
 W3C. "XProc: An XML Pipeline Language." http://www.w3.org/TR/xproc/.
 XMLSH. http://www.xmlsh.org.

Part 2 Database patterns

Part 2 covers three main areas: legacy database patterns (which most solution architects are familiar with), NoSQL patterns, and native XML databases.

Chapter 3 reviews legacy SQL patterns associated with relational and data warehouse databases. If you're already familiar with online transactional processing (OLTP), online analytical processing (OLAP), and the concepts used in distributed revision control systems, you can skim this chapter.

Chapter 4 introduces and describes the new NoSQL patterns. You'll learn about key-value stores, graph stores, column family stores, and document stores. This chapter should be read carefully, as it'll be referenced throughout the text.

Chapter 5 looks at patterns that are unique to native XML databases and standards-driven systems. These databases are important in areas such as government, health care, finance, publishing, integration, and document search. If you're not concerned with portability, standards, and markup languages, you can skim this chapter.

Foundational data architecture patterns

This chapter covers

 Data architecture patterns
 RDBMSs and the row-store design pattern
 RDBMS implementation features
 Data analysis using online analytical processing
 High-availability, read-mostly systems
 Hash trees in revision control systems and databases

If I have seen further it is by standing on the shoulders of giants. —Isaac Newton

You may be asking yourself, "Why study relational patterns? Isn't this book about NoSQL?" Remember, NoSQL means "Not only SQL." Relational databases will continue to be an appropriate solution to many business problems for the foreseeable future. But there are situations where relational databases aren't the best match for a business problem.



This chapter will review how RDBMSs store data in tabular, row-oriented structures used by online transaction systems, and the performance challenges this creates in distributed environments. We'll begin with a definition of data architecture patterns and look at how the needs of enterprise resource planning (ERP) systems drove RDBMS feature sets. We'll then look at the most common SQL patterns, such as row stores (used in most RDBMSs) and star schemas (used in OLAP, data warehouse, and business intelligence systems). After reading this chapter, you'll understand the strengths and weaknesses of RDBMS systems, recognize key RDBMS and SQL terms, be familiar with the main features of directory services, DNS services, and document revision control systems, and know when a NoSQL solution is a better fit.

Before we dive into RDBMS strengths and weaknesses, we'll start with a definition of a data architecture pattern and talk about its significance when selecting a database for your business application.

3.1 What is a data architecture pattern?

So what exactly is a data architecture pattern, and why is it useful in selecting the right database? Architectural patterns allow you to give precise names to recurring high-level data storage patterns. When you suggest a specific data architecture pattern as a solution to a business problem, you should use a consistent process that allows you to name the pattern, describe how it applies to the current business problem, and articulate the pros and cons of the proposed solution. It's important that all team members have the same understanding about how a particular pattern solves your problem so that when it's implemented, business goals and objectives are met.

The word pattern has many meanings. In general, it implies that given a new problem, you have the ability to recognize structures that you've seen in the past. For our purposes, we define a data architecture pattern as a consistent way of representing data in a regular structure that will be stored in memory. Although the memory you store data in is usually long-term persistent memory, such as solid state disks or hard drives, these structures can also be stored in RAM and then transferred to persistent memory by another process.

It's also important to understand the difference between a broad, high-level data architecture pattern that identifies how data is stored in a system and a narrow, low-level design pattern that identifies how you interact with the data. For example, figure 3.1 shows the high-level, row-store data architecture pattern used in RDBMSs at the top of the diagram, and low-level design patterns like joins, transactions, and views in the bottom part of the diagram.

As we continue along the NoSQL journey, we'll talk about traditional RDBMS patterns as well as patterns specific to the NoSQL movement. You'll quickly come to recognize these patterns and how they're used to build solutions that apply to your organization's business requirements. We'll begin our pattern discussion by looking at the RDBMS row-store pattern and the design patterns associated with it.


Figure 3.1 High-level data architecture patterns (such as the row store) are used to discuss the fundamental ways data is stored in a system. Once you select a high-level data architecture pattern, there are many lower-level design patterns that a system may implement, such as joins, transactions, views, typed columns, a data definition language, and fine-grained authorization.

3.2 Understanding the row-store design pattern used in RDBMSs

Now that you have a basic understanding of an architectural pattern, let's look at the concepts and principles of the row-store pattern associated with relational database management systems. Understanding the row-store pattern and its use of joins is essential in helping you determine whether a system can scale to a large number of processors. Unfortunately, the features that make row-store systems so flexible also limit their ability to scale.

Almost all RDBMSs store their data in a uniform object called a row. Rows consist of data fields that are associated with a column name and a single data type. Since rows are added and deleted as atomic units (a unit of work that's independent of any other transaction) using insert, update, or delete commands, the technical name for this data architecture pattern is row store; such systems are more commonly known as RDBMSs or SQL databases.

We should note that not all SQL databases use the row-store pattern. Some databases use columns as the atomic unit of storage. As you might expect, these systems are called column stores. Column stores shouldn't be confused with the term column family store, which is used in Bigtable systems. Column-store systems are used when aggregate (counts, sums, and so on) reporting speed is a priority over insert performance.

3.2.1 How row stores work

Rows are the atomic unit of data storage in RDBMSs. The general concept of a row store is illustrated in figure 3.2. The rows you insert make up your tables in an RDBMS. Tables can be related to other tables, and data relationships are also stored in tables. The following list shows how you might use an RDBMS to solve a business problem:

 A database modeling team meets with their business users. The business data is modeled in a logical way to understand data types, groupings, and repeating fields. When the modeling process is complete, the team has a physical table/column model. This process is used for new data created within the organization as well as for data that's transmitted to the organization from an external source.


ID   ITEM_ID   PRICE   COLOR    SIZE
1    12744     14.99   green    large
2    56289     15.99   yellow   extra large
3    43589     13.99   blue     small
4    35734     24.45   red      medium

Figure 3.2 The basic rules of a row-store system. Row stores are created by first declaring a table with a fixed number of columns, each with a distinct name and a data type. Data is added row by row, and each row must have data present for each column. All the data in a row is added as a unit and stored as a unit on disk, new rows are inserted at the end of the table, individual cell values can be updated, and columns may reference the IDs of rows in other tables.

 Tables are created using a specialized language called a data definition language, or DDL. The entire table, with all column definitions and their data types, must be created before the first row is inserted into the table. Indexes are also created on columns of large tables that have many rows to increase access speed.
 Columns must have unique names within a table and a single data type (for example, string, date, or decimal), which is set when the table is first defined. The semantics, or meaning, of each column is stored in an organization's data dictionary.
 New data is added to a table by inserting new rows using INSERT statements or bulk loading functions. Repeating fields are associated with parent rows by referencing the parent's row identifier.
 The SQL INSERT statement can be used to insert a new row with any available data that's provided to the INSERT statement. SQL UPDATE operations can then be used to change specific values of a row, but row identifiers must be used to indicate which row to update.
 Reports are generated to create logical business documents from each table by selecting all related rows with JOIN statements.
 Database rules, called triggers, can be set up to automatically delete all rows associated with a business record.

(A minimal code sketch of the declare, insert, and update steps in this list appears at the end of this section.)

Many commercial RDBMSs started with simple features. In time, new features were added to meet the needs of large enterprise resource planning (ERP) systems until they became robust and hardened for reliable commercial use. Initially, organizations needed a way to store financial information in order to produce accurate business statements.


RDBMSs were created to store information about assets, sales, and purchases, from which a SQL reporting system created queries (reports) to show the income, expenses, cash flow, and the organization's overall net worth. These financial statements were used to help decision makers think about whether to invest in new ventures or conserve their cash.
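To tie the list above together, here's a minimal sketch of the declare, insert, and update steps, using Python's built-in sqlite3 module as a stand-in for a server RDBMS. The table mirrors figure 3.2; the names and values are illustrative only.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# DDL: the table structure (columns and their types) is declared before any data is added.
db.execute("""
    CREATE TABLE sales_item (
        id       INTEGER PRIMARY KEY,   -- row identifier
        item_id  INTEGER NOT NULL,
        price    DECIMAL(8,2) NOT NULL,
        color    VARCHAR(20),
        size     VARCHAR(20)
    )
""")

# New data arrives as whole rows.
db.execute("INSERT INTO sales_item (item_id, price, color, size) VALUES (12744, 14.99, 'green', 'large')")
db.execute("INSERT INTO sales_item (item_id, price, color, size) VALUES (56289, 15.99, 'yellow', 'extra large')")

# Updates must name the row identifier of the row to change.
db.execute("UPDATE sales_item SET price = 13.49 WHERE id = 1")

for row in db.execute("SELECT * FROM sales_item"):
    print(row)
```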

3.2.2 Row stores evolve

Beginning in the 1970s, many companies purchased separate siloed software applications for different aspects of their business, as seen in the left panel of figure 3.3. In this stage, one database from a particular vendor might contain human resource (HR) information, another database from a different vendor might store sales information, and possibly a third vendor system would store customer relationship management (CRM) information. This structure mimicked the structure of the organization, where individual departments operated in their own world with limited communication between groups. Companies found these siloed systems to be appropriate and secure in situations where it was necessary to protect sensitive data (such as employee salary information or customer payment information). The isolated systems made it easy for each department to manage and protect their own resources, but they introduced challenges for the organization.

A key problem with siloed systems was the challenge of creating up-to-date reports that merged data from multiple systems. For example, a sales report in the sales tracking system might be used to generate commission information, but the list of sales staff and their commission rates might be stored in a separate HR database. As each change is made in the HR system, the new data must be moved to the sales tracking system. The costs of continually moving data between separate siloed systems became one of the largest budget items in many IT departments.

Figure 3.3 How NoSQL systems fit into the enterprise can be seen by looking at three phases of enterprise databases: siloed systems (1970s-'90s), ERP-driven BI/data warehouse systems (1990s-2000s), and NoSQL plus documents (today). Initially organizations used standalone isolated systems (left panel). In time, the need arose to create integrated reports, and the siloed systems were merged into ERP systems with fine-grained security controls, with separate systems for reporting on historical transactions (middle panel). These data warehouse and business intelligence systems use online analytical processing (OLAP) to create ad hoc reports without impacting ERP performance. The next stage (right panel) is when NoSQL systems are added for specialized tasks that RDBMSs aren't well suited for, and also serve as a bridge to document integration.


To combat the problem of high integration costs, many organizations moved away from siloed systems and toward more integrated systems, as shown in the middle panel of figure 3.3. As organizations evolved, the need for an integrated view of the enterprise became a requirement for assessing organizational health and improving competitive position. Organizations invested large sums to install and customize ERP packages, which came with hefty vendor license fees and promises of new features, like fine-grained, role-based access to company data, that would continue to support the customers' needs. IT managers continued to face high customization costs and, in time, the lack of scalability began to cripple them. The license terms and associated technology didn't support migration to a large number of commodity processors, and most organizations lacked integrated document stores, making it difficult to retrieve information from disparate systems.

This was the technology landscape as we entered the twenty-first century. As we discussed in chapter 1, the power wall emerged, and with it the inability of CPUs to continue to get faster, which led to a new set of NoSQL technologies. In the right panel of figure 3.3, we see the emergence of NoSQL systems not as standalone systems, but as additions to the traditional RDBMSs. These systems are now key in solving problems that RDBMSs can't handle. The scalability of NoSQL solutions makes them ideal for transforming large amounts of data used in data warehouse applications. Additionally, their document-oriented approach allows smooth integration of corporate documents directly into the analytical reporting and search services of an organization.

3.2.3 Analyzing the strengths and weaknesses of the row-store pattern

Let's take a look at some of the strengths and weaknesses of a typical enterprise RDBMS system. This information is summarized in table 3.1.

Table 3.1 RDBMS strengths and weaknesses—RDBMS feature sets were driven by early financial systems that stored data in tables and needed consistent, secure reporting over a large number of tables.

 Joins between tables. Strength: new views of data from different tables can easily be created. Weakness: all tables must be on the same server to make joins run efficiently, which makes it difficult to scale to more than one processor.
 Transactions. Strength: defining the begin point, end point, and completion of critical transactions in an application is simple. Weakness: read and write transactions may be slowed during critical times in a transaction unless the transaction isolation level is changed.
 Fixed data definitions and typed columns. Strength: an easy way to define structure and enforce business rules when tables are created; you can verify on insert that all data conforms to specific rules, and range indexes can be built over columns. Weakness: difficult to work with highly variable and exception data when adding to a column.
 Fine-grained security. Strength: data access control by row and column can be done with a series of view and grant statements. Weakness: setting up and testing security access for many roles can be a complex process.
 Document integration. Strength: none; few RDBMSs are designed to easily query document structures. Weakness: difficult to create reports using both structured and unstructured data.

We should note that RDBMSs are continuing to evolve to add finer-grained control of ACID transactions. Some RDBMSs allow you to specify the isolation level of a transaction using a command such as SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. Setting this option permits a dirty read, a read of data that hasn't yet been committed. If you add this option to a transaction, reports may have inconsistent results, but they can return faster, which can result in increased read performance.

When you look back at the information in this section, you can see that the needs of ERP systems were influential in the RDBMS feature sets of today. This means that if your business system has requirements similar to those of an ERP system, then an RDBMS might be the right choice for you. Now let's take a closer look at how these systems work by using a sales order tracking example.

3.3 Example: Using joins in a sales order

Now that you know about row stores and how they work, we'll talk about how RDBMSs use joins to create reports using data from multiple tables. As you'll see, joins are flexible from a reporting perspective, but create challenges when trying to scale RDBMSs to multiple processors. Understanding joins is important since most NoSQL architecture patterns are free of joins (with the exception of graph patterns, discussed in chapter 4). The lack of joins allows NoSQL solutions to resolve the scalability problem associated with single-processor systems by scaling across multiple systems.

A join is the process of using a row identifier in a column of one table to reference a particular row in another table. Relational databases are designed to find ways to create relationships between tables of related data. The classic RDBMS example is a sales order tracking system similar to the virtual shopping cart you'll find on amazon.com. See figure 3.4 for an example of how the data associated with a sales order might be represented in an RDBMS.


Table: SALES_ORDER (ORDER_ID is the primary key)

ORDER_ID   ORDER_DATE   SHIP_STATUS   TOTAL
123        2012-07-11   SHIPPED       39.45
124        2012-07-12   BACKORDER     29.37
125        2012-07-13   SHIPPED       42.47

Table: ORDER_ITEMS (ORDER_ID is a foreign key referencing SALES_ORDER)

ORDER_ID   ITEM_ID      PRICE
123        83924893     10.00
123        563344893    20.00
123        343978893    9.45
124        83924893     29.37
125        563344893    20.00
125        343978893    22.47

Figure 3.4 Join example using sales orders and line items—how relational databases use an identifier column to join records together. Every row in the SALES_ORDER table contains a unique identifier in the ORDER_ID column. This number is created when the row is added to the table, and no two rows may have the same ORDER_ID. When you add a new item to your order, you add a new row to the ORDER_ITEMS table and "relate" it back to the ORDER_ID of the order it belongs to. This allows all the line items in an order to be joined with the main order when creating a report.

In this figure there are two distinct tables: the main SALES_ORDER table and the individual ORDER_ITEMS table. The SALES_ORDER table contains one row for each order and has a unique identifier associated with it called the primary key. The SALES_ORDER table summarizes all the items in the ORDER_ITEMS table but contains no detailed information about each item. The ORDER_ITEMS table contains one row for each item ordered and contains the order number, item ID, and price. When you add new items to an order, the system's application must add a new row to the ORDER_ITEMS table with the appropriate order ID and update the total in the SALES_ORDER table.

When you want to run a report that lists all the information associated with an order, including all the line items, you'd write a SQL report that joins the main SALES_ORDER table with the ORDER_ITEMS table. You can do this by adding a WHERE clause to the report that selects the items from the ORDER_ITEMS table that have the same ORDER_ID. Figure 3.5 provides the SQL code required to perform this join operation. As you can see from this example, sales order and line-item information fit well into a tabular structure, since there's not much variability in this type of sales data.

SELECT * FROM SALES_ORDER, ORDER_ITEMS
WHERE SALES_ORDER.ORDER_ID = ORDER_ITEMS.ORDER_ID

Figure 3.5 SQL JOIN example—the query will return a new table that has all of the information from both tables. The first line selects the data, and the second line restricts the results to include only those lines associated with the order.


There are challenges when retrieving sales data from any RDBMS. Before you begin to write your query, you must know and understand the data structures and their dependencies. Tables themselves don't show you how to create joins. This information can be stored in other tools, such as entity-relationship design tools, but this relationship metadata isn't part of the core structure of a table. The more complex your data is, the more complex your joins will be. Creating a report that uses data from a dozen tables may require complex SQL statements with many WHERE clauses to join the tables together.

The use of row stores and the need for joins between tables can impact how data is partitioned over multiple processors. Complex joins between two tables stored on different nodes require that a large amount of data be transferred between the two systems, making the process very slow. This slowdown can be circumvented by storing joined rows on the same node, but RDBMSs don't have automatic methods to keep all rows for related objects together on the same system. Implementing this strategy requires careful consideration, and the responsibility for this type of distributed storage may need to be moved from the database to the application tier.

Now that we've reviewed the general concepts of tables, row stores, and joins, and you understand the challenges of distributing this data over many systems, we'll look at other features of RDBMSs that make them ideal solutions for some business problems and awkward for others.

3.4 Reviewing RDBMS implementation features

Let's take a look at the key features found in most RDBMSs today:

 RDBMS transactions
 Fixed data definition language and typed columns
 Using RDBMS views for security and access control
 RDBMS replication and synchronization

Understanding that these features are generally built in to most RDBMS systems is critical when you're selecting a database for a new project. If your project needs some or all of these features, an RDBMS might be the right solution. Selecting the right data architecture can save your organization time and money by avoiding rework and costly mistakes before software implementation. It's our goal to provide you with a good understanding of these key RDBMS features (transactions, typed columns, security, and replication) and why they matter.

3.4.1 RDBMS transactions

Using our SALES_ORDER sample from section 3.3, let's look at how a typical RDBMS database controls transactions and the steps that an application performs to maintain consistency in the database, beginning with the following terms:

 Transactions—A single atomic unit of work within a database management system that's performed against a database


 Begin/End transaction—Commands to begin and end a batch of transactions (inserts, updates, or deletes) that either succeed or fail as a group
 Rollback—An operation that returns a database to some previous state

In our SALES_ORDER example, there are two tables that should be updated together. When new items are added to an order, a new record is inserted into the ORDER_ITEMS table (which contains the detail about each item) and the total in the SALES_ORDER table is updated to reflect the new amount owed.

In RDBMSs it's easy to make sure these two operations either both complete successfully or don't occur at all by using the database transaction control statements shown in figure 3.6. The first statement, BEGIN TRANSACTION, marks the beginning of the series of operations to perform. Following the BEGIN TRANSACTION, you'd call the code that inserts the new item into the ORDER_ITEMS table, followed by the code that updates the total in the SALES_ORDER table. The last statement, COMMIT TRANSACTION, signals to the system that your transaction is finished and no further processing is required. The database will prevent (block) any other operations from occurring on either table while this transaction is in process, so that reports that access these tables will reflect the correct values.

If for some reason the database fails in the middle of a transaction, the system will automatically roll back all parts of the transaction and return the database to the state it was in prior to the BEGIN TRANSACTION. The transaction failure can be reported to the application, which can attempt a retry operation or ask the user to try again later.

The functions that guarantee transaction reliability could be performed by any application. The key is that RDBMS implementations make some parts of this automatic and easy for the software developer. Without these functions, application developers must create an undo process for each part of the transaction, which may require a great deal of effort.

BEGIN TRANSACTION;
-- code to insert new item into the order here...
-- code to update the order total with new amount here...
COMMIT TRANSACTION;
GO

Figure 3.6 This code shows how the BEGIN TRANSACTION and COMMIT TRANSACTION lines are added to SQL to ensure that both the new items are added to a sales order and the total of the sales order is updated as an atomic transaction. The effect is that the transactions are done together or not at all. The benefit is that the SQL developer doesn’t have to test to make sure that both changes occurred and then undo one of the transactions if the other one fails. The database will always be in a consistent state.


Some NoSQL systems don't support transactions across multiple records. Others support transaction control, but only within an atomic unit of work such as a single document. If your system has many places that require careful transaction control, RDBMSs may be the best solution.
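From the application's side, the begin/commit/rollback pattern in figure 3.6 typically looks like the sketch below. We use Python's sqlite3 module here only as a convenient, self-contained stand-in; the table names and amounts are the hypothetical sales-order example, not production code.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales_order (order_id INTEGER PRIMARY KEY, total NUMERIC)")
db.execute("CREATE TABLE order_items (order_id INTEGER, item_id INTEGER, price NUMERIC)")
db.execute("INSERT INTO sales_order VALUES (123, 39.45)")

try:
    # The connection context manager issues BEGIN/COMMIT; any exception inside
    # the block triggers an automatic ROLLBACK, so both statements succeed or
    # neither does.
    with db:
        db.execute("INSERT INTO order_items VALUES (123, 83924893, 10.00)")
        db.execute("UPDATE sales_order SET total = total + 10.00 WHERE order_id = 123")
except sqlite3.Error:
    print("transaction rolled back; the database is unchanged")

print(db.execute("SELECT total FROM sales_order").fetchone())
```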

3.4.2 Fixed data definition language and typed columns

RDBMSs require you to declare the structure of all tables prior to adding data to any table. These declarations are created using a SQL data definition language (DDL), which allows the database designer to specify all columns of a table, the column types, and any indexes associated with the table. A list of typical SQL data types from a MySQL system can be seen in table 3.2.

Table 3.2 A sample of RDBMS column types for MySQL. Each column in an RDBMS is assigned one type; trying to add data of the wrong type results in an error.

Category        Types
Integer         INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT
Numeric         DECIMAL, NUMERIC, FLOAT, DOUBLE
Boolean         BIT
Date and time   DATE, DATETIME, TIMESTAMP
Text            CHAR, VARCHAR, BLOB, TEXT
Sets            ENUM, SET
Binary          TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB

The strength of this system is that it enforces the rules about your data up front and prevents you from adding any data that doesn't conform to those rules. The disadvantage is that in situations where the data needs to vary, you can't simply insert it into the database. These variations must be stored in other columns with other data types, or the column type needs to be changed to something more flexible.

In organizations that have existing databases with millions of rows in their tables, those tables may have to be dropped, re-created, and reloaded when data types change. This can result in downtime and loss of productivity for your staff, your customers, and ultimately the company bottom line. Application developers sometimes use the metadata associated with a column type to create rules that map the columns into object data types, which means the object-relational mapping software must also be updated at the same time the database changes. Though these may seem like minor annoyances to someone building a new system with a small test data set, the process of restructuring the database in a production environment may take weeks, months, or longer. There's anecdotal evidence of organizations that have spent millions of dollars simply to change the number of digits in a data field. The Year 2000 problem (Y2K) is one example of this type of challenge.
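To see this up-front enforcement in action, here's a tiny sketch. It assumes a Python build whose bundled SQLite is version 3.37 or newer (which supports STRICT tables); a server RDBMS such as MySQL enforces its declared column types without any special keyword.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# STRICT asks SQLite to enforce the declared column types (SQLite 3.37+).
db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, price REAL NOT NULL) STRICT")

db.execute("INSERT INTO item (price) VALUES (14.99)")            # conforms: accepted
try:
    db.execute("INSERT INTO item (price) VALUES ('fourteen')")   # wrong type: rejected
except sqlite3.Error as err:
    print("rejected:", err)
```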


3.4.3 Using RDBMS views for security and access control

Now that you understand the concepts and structure of RDBMSs, let's think about how you might securely add sensitive information. Let's expand the SALES_ORDER example to allow customers to pay by credit card. Because this information is sensitive, you need a way to capture and protect this data. Your company security policy may allow individuals with appropriate roles in the company to see sales data. Additionally, you may have security rules that dictate that only a select few individuals in the organization are allowed to see a customer's credit card number. One solution would be to put the numbers in a separate hidden table and perform a join operation to retrieve the information when required, but RDBMS vendors provide an easier solution: a separate view of any table or query. An example of this is shown in figure 3.7.

In this example, users don't access the actual tables. Instead, they see only a report of information from the table, which excludes any sensitive information they don't have access to under your company security policy. The ability to use dynamic calculations to create table views and grant access to those views using roles defined within an organization is one of the features that make RDBMSs flexible.

Physical table (includes all columns, including credit card info; only select users ever see it):

ORDER_ID   ORDER_DATE   SHIP_STATUS   CARD_INFO    TOTAL
123        2012-07-11   SHIPPED       VISA-1234…   39.45
124        2012-07-12   BACKORDER     MC-5678…     29.37
125        2012-07-13   SHIPPED       AMEX-9012…   42.47

View of the table (excludes fields like credit card information; all sales analysts have access to the view):

ORDER_ID   ORDER_DATE   SHIP_STATUS   TOTAL
123        2012-07-11   SHIPPED       39.45
124        2012-07-12   BACKORDER     29.37
125        2012-07-13   SHIPPED       42.47

Figure 3.7 Data security and access control—how sensitive columns can be hidden from some users using views. In this example, the physical table that stores order information contains credit card information that should be restricted from general users. To protect this information without duplicating the table, RDBMSs provide a restricted view of the table that excludes this credit card information. Even if the user has a general reporting tool, they won’t be able to view this data because they haven’t been granted permission to view the underlying physical table, only a view of the table.


Many NoSQL systems don't allow you to create multiple views of the physical data and then grant access to those views to users with specific roles. If your requirements include these types of functions, then an RDBMS solution might be a better match for your needs.
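The view mechanism in figure 3.7 takes only a few lines of SQL. The sketch below runs that SQL through Python's sqlite3 module with hypothetical table and column names; SQLite has no GRANT statement, so the role-based grants a server RDBMS would add are shown only as comments.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE sales_order (
                  order_id    INTEGER PRIMARY KEY,
                  order_date  TEXT,
                  ship_status TEXT,
                  card_info   TEXT,     -- sensitive column
                  total       NUMERIC)""")
db.execute("INSERT INTO sales_order VALUES (123, '2012-07-11', 'SHIPPED', 'VISA-1234', 39.45)")

# The view exposes every column except the credit card information.
db.execute("""CREATE VIEW sales_order_public AS
              SELECT order_id, order_date, ship_status, total
              FROM sales_order""")

# On a server RDBMS you would then restrict access with role-based grants, such as
# GRANT SELECT ON sales_order_public TO sales_analyst (and no grant at all on the
# underlying sales_order table).

print(db.execute("SELECT * FROM sales_order_public").fetchall())
```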

3.4.4 RDBMS replication and synchronization

As we've mentioned, early RDBMSs were designed to run on single CPUs. When organizations have critical data, it's stored on a primary hard disk, and a duplicate copy of each insert, update, and delete transaction is recorded in a journal or log file on a separate drive. If the database becomes corrupt, a backup of the database is loaded and the journal is "replayed" to bring the database back to the point it was at when it halted.

Journal files add overhead and slow the system down, but they're essential to guarantee the ACID nature of RDBMSs. There are situations when a business can't wait for a backup to be restored and the journal files to be replayed. In these situations, the data can be written immediately not only to the master database but also to a copy (or mirror) of the original database. Figure 3.8 demonstrates how mirroring is applied in RDBMSs.

In a mirrored database, when the master database crashes, the mirrored system (the slave) takes over the primary system's operations. When additional redundancy is required, more than one mirror system is created; the likelihood of two or more systems all crashing at the same time is slim, which is generally sufficient protection for most business processes.

The replication process solves some of the challenges associated with creating high-availability systems: if the master system goes down, a slave can step in to take its place. That said, it introduces database administration staff to the challenges of distributed computing. For example, what if one of the slave systems crashes for a while? Should the master system stop accepting transactions while it waits for the slave system to come back online?

Figure 3.8 Replication and mirroring—applications are configured to read and write all their data to a single master database. Any change to the master immediately triggers a process that copies the transaction information (inserts, updates, and deletes) to one or more slave systems that mirror the master database. These slave servers can quickly take over the load of the master if the master database becomes unavailable, allowing the database to provide high-availability data services.


How does one system get "caught up" on the transactions that occurred while it was down? Who should store those transactions, and where should they be stored? These questions led to a new class of products that specialize in database replication and synchronization.

Replication is different from sharding, which we discussed in chapter 2. Sharding stores each record on a different processor but doesn't duplicate the data. Sharding allows reads and writes to be distributed to multiple systems, but it doesn't increase system availability. Replication, on the other hand, can increase availability and read access speeds by allowing read requests to be served by slave systems. In general, replication doesn't increase the performance of write operations to a database; because data has to be copied to multiple systems, it sometimes lowers total write throughput. In the end, replication and sharding are independent processes, and in appropriate situations they can be used together.

So what should happen if a slave system crashes? It doesn't make sense to have the master reject all transactions, since that would render the system unavailable for writes whenever any slave system crashed. If you allow the master to continue accepting updates, you'll need a process to resync the slave system when it comes back online. One common solution to the slave resync problem is to use a completely separate piece of software called a reliable messaging system or message store, as shown in figure 3.9.

Reliable messaging systems accept messages even if a remote system isn't responding. When used in a master/slave configuration, these systems queue all update messages while one or more slave systems are down, and send them on when the slave systems come back online, allowing all messages to be posted so that the master and slaves remain in sync.

Replication is a complex problem when one or more systems go offline, even if only for a short period of time. Knowing exactly what information has changed and resyncing the changed data is critical for reliability. Without some way of breaking large databases into smaller subsets for comparison, replication becomes impractical. This is one reason the consistent hashing techniques used by NoSQL databases (discussed in chapter 2) may be a better solution.

NoSQL systems also need to solve the database replication problem, but unlike relational databases, NoSQL systems need to synchronize not only tables but other structures as well, such as graphs and documents.

Figure 3.9 Using message stores for reliable data replication—how message stores can increase the reliability of the data on each slave database, even if the slave systems are unavailable for a period of time. The master writes all update transactions to a message store, and update messages stay in the message store until all subscribers get a copy. When slave systems restart, they can access the external message store to retrieve the transactions they missed while they were unavailable.


The technologies used to replicate these structures will at times be similar to message stores and at other times will need more specialized structures.

Now that we've taken a tour of the main features of RDBMS systems typically used in online transaction systems, let's see how a related class of systems delivers large, complex reports over millions of historical transactions without sacrificing transactional system performance: data warehouse and business intelligence systems.

3.5 Analyzing historical data with OLAP, data warehouse, and business intelligence systems

Most RDBMSs are used to handle real-time transactions such as online sales orders or banking transactions. Collectively, these systems are known as online transaction processing (OLTP) systems. In this section, we'll shift our focus away from real-time OLTP and look at a different class of data patterns associated with creating detailed, ad hoc reports from historical transactions. Instead of using records that are constantly changing, the records used in these analyses are written once but read many times. We call these systems online analytical processing (OLAP) systems.

OLAP systems empower nonprogrammers to quickly generate ad hoc reports on large datasets. The data architecture patterns used in OLAP are significantly different from those of transactional systems, even though they rely on tables to store their data. OLAP systems are usually associated with front-end business intelligence software applications that generate graphical outputs used to show trends and help business analysts understand and define their business rules. OLAP systems are frequently used to feed data mining software that automatically looks for patterns in data and detects errors or cases of fraud.

Understanding what OLAP systems are, what concepts they use, and the types of problems they solve will help you determine when each should be used. You'll see how these differences are critical when you're performing software selection and architectural trade-off analysis. Table 3.3 summarizes the differences between OLTP and OLAP systems with respect to business focus, type of updates, key structures, and criteria for success.

Table 3.3 A comparison of OLTP and OLAP systems

 Business focus. OLTP: managing accurate real-time transactions with ACID constraints. OLAP: rapid ad hoc analysis of historical event data by nonprogrammers, even if there are millions or billions of records.
 Type of updates. OLTP: a mix of reads, writes, and updates by many concurrent users. OLAP: daily batch loads of new data and many reads; concurrency is not a concern.
 Key structures. OLTP: tables with multiple levels of joins. OLAP: star or snowflake designs with a large central fact table and dimension tables to categorize facts; aggregate structures with summary data are precomputed.
 Typical criteria for success. OLTP: handles many concurrent users constantly making changes without any bottlenecks. OLAP: analysts can easily generate new reports on millions of records, quickly get key insights into trends, and spot new business opportunities.

In this chapter, we've focused on general-purpose transactional database systems that interact in a real-time environment, on an event-by-event basis. These real-time systems are designed to store and protect records of events such as sales transactions, button-clicks on a web page, and transfers of funds between accounts. The class of systems we turn to now isn't concerned with button-clicks, but rather with analyzing past events and drawing conclusions based on that information.

3.5.1 How data flows from operational systems to analytical systems

OLAP systems, frequently used in data warehouse/business intelligence (DW/BI) applications, aren't concerned with new data; rather, they focus on the rapid analysis of past events to make predictions about future events. In OLAP systems, data flows from real-time operational systems into downstream analytical systems as a way to separate daily transactions from the job of doing analysis on historical data. This separation of concerns is important when designing NoSQL systems, as the requirements of operational systems are dramatically different from the requirements of analytical systems.

BI systems evolved because running summary reports on production databases, traversing millions of rows of information, was inefficient and slowed production systems during peak workloads. Running reports on a mirrored system was an option, but the reports still took a long time to run and were inefficient from an employee productivity perspective. Sometime in the '80s a new class of databases emerged, specifically designed for rapid ad hoc analysis of data even if there were millions or billions of rows. The pioneers in these systems came not from web companies but from firms that needed to understand retail store sales patterns and predict what items should be in the store and when.

Let's look at a data flow diagram of how this works. Figure 3.10 shows the typical data flow and some of the names associated with different regions of the business intelligence and data warehouse data flow. Each region in this diagram is responsible for specific tasks.



Figure 3.10 Business intelligence and data warehouse (BI/DW) data flow—how data flows into a typical OLAP data warehouse system. In the first step, new transactions are copied from the operational source systems and loaded into a temporary staging area. Data in the staging area is then transformed to create fact and dimension tables that are used to build OLAP cube structures. These cubes contain precalculated aggregate structures holding summary information, which must be updated as new facts are added to the fact tables. The information in the OLAP cubes is then accessed from a graphical front-end tool through the security and data services layers. The precise meaning of data in any part of the system is stored in a separate metadata registry database that ensures data is used and interpreted consistently despite the many layers of transformation.

Data that's constantly changing during daily operations is stored on the left side of the diagram, inside computers that track daily transactions. These computers are called the operational source systems. At regular intervals new data is extracted from these source systems and stored in the temporary staging area, shown in the dashed-line box in the center. The staging area is a series of computers that contain more RDBMS tables, where the data is massaged using extract, transform, and load (ETL) tools. ETL tools are designed to extract data in tables from one RDBMS and move the data, after transformation, into another set of RDBMS tables. Eventually, the new data is added to fact tables that store the fine-grained events of the system. Once the fact tables have been updated, new sums and totals that include this new information are created. These are called aggregate tables.

Generally, NoSQL systems aren't intended to replace all components in a data warehouse application. They target areas where scalability and reliability are important. For example, many ETL systems can be replaced by MapReduce-style transforms that have better scale-out properties.
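To see what an aggregate table is in practice, here's a toy version of the fact-table-to-aggregate-table step. The schema is hypothetical and far simpler than a real star schema, and we again use Python's sqlite3 module purely for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# A tiny fact table: one row per sale, with keys that would normally point to
# dimension tables (date, store, and so on).
db.execute("CREATE TABLE sales_fact (date_key TEXT, store_key TEXT, net_profit NUMERIC)")
db.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
               [("2013-Q1", "MN-1", 3.2), ("2013-Q1", "MN-2", 4.1),
                ("2013-Q4", "MN-1", 2.7), ("2013-Q4", "MN-2", 3.9)])

# The nightly load rebuilds an aggregate table so that reports read precomputed
# totals instead of re-scanning every individual fact row.
db.execute("""CREATE TABLE sales_agg AS
              SELECT date_key, store_key, SUM(net_profit) AS net_profit
              FROM sales_fact
              GROUP BY date_key, store_key""")

print(db.execute("SELECT * FROM sales_agg ORDER BY date_key, store_key").fetchall())
```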


3.5.2 Getting familiar with OLAP concepts

Generally, OLAP systems use the same row-store pattern as OLTP systems, but the concepts and constructs are different. Let's look at the core OLAP concepts to see how they're combined to generate sub-second reports over millions of transactions:

 Fact table—A central table of events that contains foreign keys to other tables and integer and decimal values called measures.
 Dimension table—A table used to categorize every fact. Examples of dimensions include time, geography, product, and promotion.
 Star schema—An arrangement of tables with one fact table surrounded by dimension tables. Each transaction is represented by a single row in the central fact table.
 Categories—A way to divide all the facts into two or more classes. For example, products may have a Seasonal category indicating they're only stocked part of the year.
 Measures—A number in a column of a fact table that you can sum or average. Measures are usually things like sales counts or prices.
 Aggregates—Precomputed sums used by OLAP systems to quickly display results to users.
 MDX—A query language that's used to extract data from cubes. MDX looks similar to SQL in some ways, but is customized to select data into pivot-table displays. For a comparison of MDX with SQL, see figure 3.11.

In this example, we're placing the total of each of the store sales in Minnesota (WHERE STORE.USA.MN) in the columns and placing the sales quarters (Q1, Q2, Q3, and Q4) in the rows. The result is a grid with the stores on one axis and dates on the other axis. Each cell in the grid holds the total sales for that store in that quarter. The SELECT and WHERE statements are identical to SQL, but ON COLUMNS and ON ROWS are unique to MDX. The output of this query might be viewed in a chart like the one in figure 3.12. Note that this chart would typically be displayed by an OLAP system in less than a second: the software doesn't have to recompute sales totals to generate the chart.

SELECT
   { Measures.STORE_SALES_NET_PROFIT } ON COLUMNS,
   { Date.2013.Q1, Date.2013.Q4 } ON ROWS
FROM SALES
WHERE ( STORE.USA.MN )

Figure 3.11 A sample MDX query—like SQL, MDX uses the keywords SELECT, FROM, and WHERE. MDX is distinct from SQL in that it always returns a two-dimensional grid of values for both column and row categories. Note the ON COLUMNS and ON ROWS keywords, which show this difference.


The OLAP process creates precomputed structures called aggregates for the monthly store sales as the new transactions are loaded into the system. The only calculation needed is to add the monthly totals associated with each quarter to generate quarterly figures.

Figure 3.12 Sample business intelligence report that leverages summary information—a bar chart of store net profit (in thousands, shown monthly) for stores MN-1 through MN-4 across quarters Q1 to Q4. It's the result of a typical MDX query that places measures (the vertical axis) within categories (the store axis) to create graphical reports. The report doesn't have to create results by directly using each individual sales transaction; the results are created by accessing precomputed summary information in aggregate structures. Even new reports that derive data from millions of transactions can be generated on an ad hoc basis in less than a second.

3.5.3 Ad hoc reporting using aggregates

Why is it important for users to create ad hoc reports using prebuilt summary data created by OLAP systems? Ad hoc reporting is important for organizations that rely on analyzing patterns and trends in their data to make business decisions. As you'll see here, NoSQL systems can be combined with other SQL and NoSQL systems to feed data directly into OLAP reporting tools. Many organizations find OLAP a cost-effective way to perform detailed analyses of a large number of past events. Their strength comes in allowing nonprogrammers to quickly analyze large datasets or big data.

To generate reports, all you need to understand is how categories and measures are combined. This empowerment of the nonprogramming staff in the purchasing departments of retail stores has been one of the key factors driving down retail costs for consumers. Stores are filled with what people want, when they want it. Although this data may represent millions or billions of transactions spread out over the last 10 years, the results are usually returned to the screen in less than a second. OLAP systems are able to do this by precomputing the sums of measures in the fact tables using categories such as time, store number, or product category code. Does it sound like you'll need a lot of disk to store all of this information? You might, but remember that disk is cheap these days, and the more disk space you assign to your OLAP systems, the more precomputed sums you can create. The more information you have, the easier it might be to make the right business decision.

One of the nice things about using OLAP systems is that as a user you don't need to know how aggregates are created or what they contain. You only need to understand your data and how it's most appropriately totaled, averaged, or studied. In addition, system designers don't need to understand how the aggregates are created; their focus is on defining cube categories and measures, and mapping the data from the fact and dimension tables into the cube. The OLAP software does the rest.

When you go to your favorite retailer and find the shelves stocked with your favorite items, you'll understand the benefits of OLAP.


Tens of thousands of buyers and inventory specialists use these tools every day to track retail trends and make adjustments to their inventory and deliveries. Due to the popularity of OLAP systems and their empowerment of nonprogrammers to create ad hoc queries, the probability is low that the fundamental structures of OLAP and data warehouse systems will be quickly displaced by NoSQL solutions. What will change is how tools such as MapReduce are used to create the aggregates used by the cubes. To be effective, OLAP systems need tools that efficiently create the precomputed sums and totals. In a later chapter, we'll talk about how NoSQL components are appropriate for performing analysis on large datasets.

In the past 10 years, the use of open source OLAP tools such as Mondrian and Pentaho has allowed organizations to dramatically cut their data warehouse costs. In order to be viable ad hoc analysis tools, NoSQL systems must be as low-cost and as easy to use as these systems. They must have the performance and scalability benefits that current systems lack, and they must have the tools and interfaces that facilitate integration with existing OLAP systems.

Despite the fact that OLAP systems have become commodity products, the cost of setting up and maintaining them can still be a hefty part of an organization's IT budget. The ETL tools that move data between operational and analytical systems still usually run on single processors, perform costly join operations, and limit the amount of data that can be moved each night between the operational and analytical systems. These challenges and costs are even greater when organizations lack strong data governance policies or have inconsistent category definitions. Though not necessarily data architecture issues, these fall under enterprise semantics and standards concerns, and should be taken to heart in both RDBMS and NoSQL solutions.

Standards watch: standards for OLAP

Several XML standards associated with OLAP systems promote portability of your MDX applications between OLAP systems. These standards include XML for Analysis (XMLA) and the Common Warehouse Metamodel (CWM).

The XMLA standard is an XML wrapper standard for exchanging MDX statements between various OLAP servers and clients. XMLA systems allow users to use many different MDX clients, such as JPivot, against many different OLAP servers.

CWM is an XML standard for describing all the components you might find in an OLAP system, including cubes, dimensions, measures, tables, and aggregates. CWM systems allow you to define your OLAP cubes in terms of a standardized and portable XML file so that your cube definition can be exchanged between multiple systems. In general, commercial vendors make it easy to import CWM data but frequently make it difficult to export this data. This makes it easy to start using their products but difficult to leave them. Third-party vendor products are frequently needed to provide high-quality translation from one system to another.


OLAP systems are unique in that once event records are written to a central fact table, they're usually not modified. This write-once, read-many pattern is also common in log file and web usage statistics processing, as we'll see next.

3.6 Incorporating high availability and read-mostly systems

Read-mostly non-RDBMSs such as directory services and DNS are used to provide high availability for systems where the data is written once and read often. You use these high-availability systems to guarantee that data services are always available and that productivity isn't lost because login and password information is unavailable on the local area network. These systems use many of the same replication features that you see in NoSQL systems to provide high-availability data services. Studying them carefully gives you an appreciation for their complexity and helps you understand how NoSQL systems can be enhanced to benefit from these same replication techniques.

If you've ever set up a local area network (LAN), you might be familiar with the concept of directory services. When you create a LAN, you select one or more computers to store the data that's common to all computers on the network. This information is stored in a highly specialized database called a directory server. Generally, directory servers hold a small amount of data that's read frequently; write operations are rare. Directory services don't have the same capabilities as RDBMSs and don't use a query language. They're not designed to handle complex transactions and don't provide ACID guarantees or rollback operations. What they do provide is a fast and ultra-reliable way to look up a username and password and authenticate a user.

Directory services need to be highly available. If you can't authenticate a user, they can't log in to the network and no work gets done. In order to provide a high-availability service, directory services are replicated across the network on two, three, or four different servers. If any of the servers becomes unavailable, the remaining servers can provide the data you need. You can see that by replicating their data, directory services can provide high service levels to applications that need high availability.

Another reference point for high-availability systems is the Domain Name System (DNS). DNS servers provide a simple lookup service that translates a logical, human-readable domain name like danmccreary.com into the numeric Internet Protocol (IP) address associated with a remote host, such as 66.96.132.92. DNS servers, like directory servers, need to be reliable; if they're not working properly, people can't get to the websites they need unless they know the server IP address.

We mention directory services and DNS-type systems because they're true database systems and are critical for solving highly specialized business problems where high availability can only be achieved by eliminating single points of failure. They also do this better than a general RDBMS.
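As a trivial illustration of the lookup service DNS provides, the sketch below asks the local resolver for the IP address behind the book's example domain name. It needs network access, and the address returned today may differ from the one printed in the text.

```python
import socket

# Translate a human-readable domain name into a numeric IP address via DNS.
print(socket.gethostbyname("danmccreary.com"))
```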


Directory services and DNS are great examples of how different data architecture patterns are used in conjunction with RDBMSs to provide specialized data services. Because their data is relatively simple, they don’t need complex query languages to be effective. These highly distributed systems sit at different points in the CAP triangle to meet different business objectives. NoSQL systems frequently incorporate techniques used in these distributed systems to achieve different availability and performance objectives. In our last section, we’ll look at how document revision control systems provide a unique set of services that NoSQL systems also share.

3.7 Using hash trees in revision control systems and database synchronization
As we come to our last section, we’ll look at some innovations in revision control systems for software engineering and see how these innovations are being used in NoSQL systems. We’ll touch on how innovations in distributed revision control systems like Subversion and Git make the job of distributed software development much easier. Finally, we’ll see how revision control systems use hashes and deltas to synchronize complex documents.

Is it “version” or “revision” control?
The terms version control and revision control are both commonly used to describe how you manage the history of a document. Although there are many definitions, version control is a general term applied to any method that tracks the history of a document. This would include tools that store multiple binaries of your Microsoft Word documents in a document management system like SharePoint. Revision control is a more specific term that describes a set of features found in tools like Subversion and Git. Revision control systems include features such as adding release labels (tags), branching, merging, and storing the differences between text documents. We’ll use the term revision control, as it’s more specific to our context.

Revision control systems are critical for projects that involve distributed teams of developers. For these types of projects, losing code or using the wrong code means lost time and money. These systems use many of the same patterns you see in NoSQL systems, such as distributed systems, document hashing, and tree hashing, to quickly determine whether things are in sync.

Early revision control systems (RCSs) weren’t distributed. There was a single hard drive that stored all source code, and all developers used a networked filesystem to mount that drive on their local computer. There was a single master copy that everyone used, and no tools were in place to quickly find differences between two revisions, making it easy to inadvertently overwrite code. As organizations began to realize that talented development staff didn’t necessarily live in their city, developers were recruited from remote locations, and a new generation of distributed revision control systems was needed.

In response to the demand for distributed development, a new class of distributed revision control systems (DRCs) emerged. Systems like Subversion, Git, and Mercurial have the ability to store local copies of a revisioned database and quickly sync up to a master copy when needed. They do this by calculating a hash of each of the revision objects (directories as well as files) in the system. When remote systems need to be synced, they compare the hashes, not the individual files, which allows syncing to occur quickly even on large and deep trees of data.

The data structure used to detect whether two trees are the same is called a hash tree or Merkle tree. Hash trees work by calculating the hash values of each leaf of a tree and then using these hash values to create a node object. Node objects can then be hashed, resulting in a new hash value for the entire directory. An example of this is shown in figure 3.13.

(Figure 3.13 levels, from bottom to top: hashes of individual files, hashes of hashes, and the hash of the root node.)

Figure 3.13 A hash tree, or Merkle tree, is created by calculating the hash of all of the leaf structures in a tree. Once the leaf structures have been hashed, all the nodes within a directory combine their hash values to create a new document that can also be hashed. This “hash of hashes” becomes the hash of the directory. This hash value can in turn be used to create a hash of the parent node. In this way you can compare the hashes of any point in two trees and immediately know if all of the structures below a particular node are the same.

Hash trees are used in most distributed revision control systems. If you make a copy of your current project’s software, store it on your laptop, and head to the North Woods for a week to write code, when you return you simply reconnect to the network and merge your changes with all the updates that occurred while you were gone. The software doesn’t need to do a byte-by-byte comparison to figure out what revision to use. If your system has a directory with the same hash value as the base system, the software instantly knows they’re the same by comparing the hash values. The “gone to the North Woods for a week” synchronization scenario is similar to the problem of what happens when any node of a distributed database is disconnected from the other nodes for a period of time. You can use the same data structures and algorithms to keep NoSQL databases in sync as in revision control systems.
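To make the idea concrete, here is a minimal Python sketch (not taken from any particular revision control system) that builds a Merkle-style hash for a directory tree held in memory and compares two trees by comparing only their root hashes. The data layout and function name are illustrative assumptions.

import hashlib

def merkle_hash(node):
    """Return a hex digest for a file (bytes) or a directory (dict of name -> node)."""
    if isinstance(node, bytes):                      # leaf: hash the file contents
        return hashlib.sha256(node).hexdigest()
    # directory: hash the sorted list of (name, child-hash) pairs, so the
    # directory hash changes whenever any child changes
    combined = "".join(name + merkle_hash(child)
                       for name, child in sorted(node.items()))
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()

laptop = {"src": {"main.py": b"print('v2')"}, "README": b"notes"}
server = {"src": {"main.py": b"print('v1')"}, "README": b"notes"}

# Comparing two root hashes is enough to know whether any file differs.
print(merkle_hash(laptop) == merkle_hash(server))                        # False: src/main.py changed
print(merkle_hash(laptop["README"]) == merkle_hash(server["README"]))    # True: unchanged subtree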


Say you need to upgrade some RAM on one of your six servers, and that server provides replication for a master node. You shut down the server, install the RAM, and restart the server. While the slave server was down, additional transactions were processed and now need to be replicated. Copying the entire dataset would be inefficient. Using hash trees allows you to simply check which directories and files have new hash values and synchronize those files—you’re done.

As you’ve seen, distributed revision control systems are important in today’s work environment for database as well as software development scenarios. The ability to synchronize data by reconnecting to a network and merging changes saves valuable time and money for organizations and allows them to focus on other business concerns.

3.8 Apply your knowledge
Sally is working on a project that uses a NoSQL document database to store product reviews for hundreds of thousands of products. Since products and product reviews have many different types of attributes, Sally agrees that a document store is ideal for storing this high-variability data. In addition, the business unit needs full-text search capability, also provided by the document store.

The business unit has come to Sally: they want to perform aggregate analysis on a subset of all the properties that have been standardized across the product reviews. The analysis needs to show total counts and averages for different categories of products. Sally has two choices. She can use the aggregate functions supplied by the NoSQL document database, or she can create a MapReduce job to summarize the data and then use existing OLAP software to do the analysis.

Sally realizes that both options require about the same amount of programming effort. But the OLAP solution allows more flexible ad hoc query analysis using a pivot-table-like interface. She decides to use a MapReduce transform to create a fact table and dimension tables, and then builds an OLAP cube from the star schema. In the end, product managers can create ad hoc reports on product reviews using the same tools they use for product sales.

This example shows that NoSQL systems may be ideal for some data tasks, but they may not have the same features as a traditional table-centric OLAP system for some analyses. Here, Sally combined parts of a new NoSQL approach with a traditional OLAP tool to get the best of both worlds.

3.9 Summary
In this chapter, we reviewed many of the existing features of RDBMSs, as well as their strengths and weaknesses. We looked at how relational databases use the concept of joins between tables and the challenge this can present when scalability across multiple systems is desired.

We reviewed how the large integration costs of siloed systems drove RDBMS vendors to create larger centralized systems that allowed up-to-date integrated reporting with fine-grained access control.


We also reviewed how online analytical systems allow nonprogrammers to quickly create reports that slice and dice sales into the categories they need. We then took a short look at how specific non-RDBMS database systems like directory services and DNS are used for high availability. Lastly, we showed how distributed document revisioning systems have developed rapid ways to compare document trees, and how these same techniques can be used in distributed NoSQL systems.

There are several take-away points from this chapter. First, RDBMSs continue to be the appropriate solution for many business problems, and organizations will continue to use them for the foreseeable future. Second, RDBMSs are continuing to evolve and are making it possible to relax ACID requirements and manage document-oriented structures. For example, IBM, Microsoft, and Oracle now support XML column types and limited forms of XQuery. Reflecting on how RDBMSs were impacted by the needs of ERP systems, we should remember that even if NoSQL systems have cool new features, organizations must include integration costs when calculating their total cost of ownership.

One of the primary lessons of this chapter is how critical cross-vendor and cross-product query languages are in the creation of software platforms. NoSQL systems will almost certainly stay in small niche areas until universal query standards are adopted. The fact that object-oriented databases still have no common query language despite being around for 15 years is a clear example of the role of standards. Only after application portability is achieved will software vendors consider large-scale migration away from SQL to NoSQL systems.

The data architecture patterns reviewed in this chapter provide the foundation for our next chapter, where we’ll look at a new set of patterns called NoSQL patterns. We’ll see how NoSQL patterns fit into new and existing infrastructures to assist organizations in solving business problems in different ways.

3.10 Further reading
■ “Database transaction.” Wikipedia. http://mng.bz/1m55.
■ “Hash tree.” Wikipedia. http://mng.bz/zQbT.
■ “Isolation (database systems).” Wikipedia. http://mng.bz/76AF.
■ PostgreSQL. “Table 8-1. Data Types.” http://mng.bz/FAtT.
■ “Replication (computing).” Wikipedia. http://mng.bz/5xuQ.

NoSQL data architecture patterns

This chapter covers

■ Key-value stores
■ Graph stores
■ Column family stores
■ Document stores
■ Variations of NoSQL architecture patterns

...no pattern is an isolated entity. Each pattern can exist in the world only to the extent that it is supported by other patterns: the larger patterns in which it is embedded, the patterns of the same size that surround it, and the smaller patterns which are embedded in it. —Christopher Alexander, A Timeless Way of Building

One of the challenges for users of NoSQL systems is that there are many different architectural patterns from which to choose. In this chapter, we’ll introduce the most common high-level NoSQL data architecture patterns, show you how to use them, and give you some real-world examples of their use. We’ll close out the chapter by looking at some NoSQL pattern variations, such as RAM-based and distributed stores.



Table 4.1 NoSQL data architecture patterns—the most important patterns introduced by the NoSQL movement, brief descriptions, and examples of where these patterns are typically used

Key-value store
  Description: A simple way to associate a large data file with a simple text string
  Typical uses: Dictionary, image store, document/file store, query cache, lookup tables

Graph store
  Description: A way to store the nodes and arcs of a graph
  Typical uses: Social network queries, friend-of-friends queries, inference, rules systems, and pattern matching

Column family (Bigtable) store
  Description: A way to store sparse matrix data using a row and a column as the key
  Typical uses: Web crawling, large sparsely populated tables, highly adaptable systems, systems that have high variance

Document store
  Description: A way to store tree-structured hierarchical information in a single unit
  Typical uses: Any data that has a natural container structure, including office documents, sales orders, invoices, product descriptions, forms, and web pages; popular in publishing, document exchange, and document search

Table 4.1 lists the significant data architecture patterns associated with the NoSQL movement. After reading this chapter, you’ll know the main NoSQL data architectural patterns, how to classify the associated NoSQL products and services with a pattern, and the types of applications that use each pattern. When confronted with a new business problem, you’ll have a better understanding of which NoSQL pattern might help provide a solution.

We talked about data architecture patterns in chapter 3; to refresh your memory, a data architecture pattern is a consistent way of representing data in a structure. This is true for SQL as well as NoSQL patterns. In this chapter, we’ll focus on the architectural patterns associated with NoSQL. We’ll begin with the simplest NoSQL pattern, the key-value store, and then look at graph stores, column family stores, document stores, and some variations on the NoSQL theme.

4.1 Key-value stores
Let’s begin with the key-value store and then move on to its variants, and how this pattern is used to cost-effectively solve a variety of business problems. We’ll talk about

■ What a key-value store is
■ Benefits of using a key-value store
■ How to use a key-value store in an application
■ Key-value store use cases

We’ll start by giving you a clear definition.


4.1.1 What is a key-value store?
A key-value store is a simple database that, when presented with a simple string (the key), returns an arbitrarily large BLOB of data (the value). Key-value stores have no query language; they provide a way to add and remove key-value pairs (a combination of key and value where the key is bound to the value until a new value is assigned) into and from a database.

A key-value store is like a dictionary. A dictionary has a list of words, and each word has one or more definitions, as shown in figure 4.1. The dictionary is a simple key-value store where word entries represent keys and definitions represent values. Because dictionary entries are sorted alphabetically by word, retrieval is quick; it’s not necessary to scan the entire dictionary to find what you’re looking for.

Like the dictionary, a key-value store is also indexed by the key; the key points directly to the value, resulting in rapid retrieval, regardless of the number of items in your store. One of the benefits of not specifying a data type for the value of a key-value store is that you can store any data type that you want in the value. The system will store the information as a BLOB and return the same BLOB when a GET (retrieval) request is made. It’s up to the application to determine what type of data is being used, such as a string, XML file, or binary image.

The key in a key-value store is flexible and can be represented by many formats:

■ Logical path names to images or files
■ Artificially generated strings created from a hash of the value
■ REST web service calls
■ SQL queries

Values, like keys, are also flexible and can be any BLOB of data, such as images, web pages, documents, or videos. See figure 4.2 for an example of a common key-value store.

Figure 4.1 A sample dictionary entry showing how a dictionary is similar to a key-value store. In this case, the word you’re looking up (amphora) is called the key and the definitions are the values.


Key (key type) -> Value
image-12345.jpg (image name) -> Binary image file
http://www.example.com/my-web-page.html (web page URL) -> HTML of a web page
N:/folder/subfolder/myfile.pdf (file path name) -> PDF document
9e107d9d372bb6826bd81d3542a419d6 (MD5 hash) -> The quick brown fox jumps over the lazy dog
view-person?person-12345 (web service call) -> SELECT PERSON FROM PEOPLE 12345

Figure 4.2 Sample items in a key-value store. A key-value store has a key that’s associated with a value. Keys and values are flexible. Keys can be image names, web page URLs, or file path names that point to values like binary images, HTML web pages, and PDF documents.

The many names of a key-value store
A key-value store has different names depending on what system or programming language you’re using. The process of looking up a stored value using an indexed key for data retrieval is a core data access pattern that goes back to the earliest days of computing. A key-value store is used in many different computing systems but wasn’t formalized as a data architecture pattern until the early 1990s. Popularity increased in 1992, when the open source Berkeley DB libraries popularized it by including the key-value store pattern in the free UNIX distribution. Key-value store systems are sometimes referred to as key-data stores, since any type of byte-oriented data can be stored as the value. For the application programmer, a structure of an array with two columns is generally called an associative array or map, and each programming language may call it something slightly different—a hash, a dictionary, or even an object. The current convention, and this text, uses the term key-value store. For more on the history of Berkeley DB, see http://mng.bz/kG9c.

4.1.2 Benefits of using a key-value store
Why are key-value stores so powerful, and why are they used for so many different purposes? To sum it up: their simplicity and generality save you time and money by moving your focus from architectural design to reducing your data services costs through

■ Precision service levels
■ Precision service monitoring and notification
■ Scalability and reliability
■ Portability and lower operational costs


PRECISION SERVICE LEVELS
When you have a simple data service interface that’s used across multiple applications, you can focus on things like creating precise service levels for data services. A service level doesn’t change the API; it only puts precise specifications on how quickly or reliably the service will perform under various load conditions. For example, for any data service you might specify

■ The maximum read time to return a value
■ The maximum write time to store a new key-value pair
■ How many reads per second the service must support
■ How many writes per second the service must support
■ How many duplicate copies of the data should be created for enhanced reliability
■ Whether the data should be duplicated across multiple geographic regions in case some data centers experience failures
■ Whether to use transaction guarantees for consistency of data or whether eventual consistency is adequate

One of the best ways to visualize how developers control this is to think of a series of knobs and controls similar to what you’d see on a radio tuner, as shown in figure 4.3. Each input knob can be adjusted to tune the service level that your business needs. Note that as the knobs are adjusted, the estimated monthly cost of providing this data service will change. It can be difficult to precisely estimate the total cost, since the actual cost of running the service is driven by market conditions and other factors, such as the cost of moving data into and out of the service.

You can configure your system to use a simple input form for setting up and allocating resources to new data services. By changing the information in the form, you can quickly change the number of resources allocated to the service. This simple interface allows you to set up new data services and reconfigure existing data services quickly without the additional overhead of operations staff. Because service levels can be tuned to an application’s requirements, you can rapidly allocate the appropriate reliability and performance resources to the system.

(Figure 4.3 inputs: max read time, max write time, reads per second, writes per second, duplicate copies, multiple data centers, transaction guarantees. Output: estimated price per month, for example $435.50.)

Figure 4.3 NoSQL data services can be adjusted like the tuning knobs on a radio. Each knob can individually be adjusted to control how many resources are used to provide service guarantees. The more resources you use, the higher the cost will be.
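The service-level parameters listed above can be captured in a simple configuration structure. The following minimal Python sketch shows one way to do it; the field names and the toy cost formula are illustrative assumptions, not part of any particular NoSQL product.

# Hypothetical service-level specification for a key-value data service.
service_level = {
    "max_read_ms": 10,           # maximum read time to return a value
    "max_write_ms": 25,          # maximum write time to store a new pair
    "reads_per_second": 5000,
    "writes_per_second": 500,
    "replica_count": 3,          # duplicate copies for enhanced reliability
    "multi_region": True,        # duplicate across geographic regions
    "strong_consistency": False  # False means eventual consistency is adequate
}

def estimated_monthly_cost(spec):
    """Toy cost model: more throughput, replicas, and guarantees cost more."""
    cost = 0.01 * spec["reads_per_second"] + 0.05 * spec["writes_per_second"]
    cost *= spec["replica_count"]
    if spec["multi_region"]:
        cost *= 2
    if spec["strong_consistency"]:
        cost *= 1.5
    return round(cost, 2)

print(estimated_monthly_cost(service_level))   # for example, 450.0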


PRECISION SERVICE MONITORING AND NOTIFICATION
In addition to specifying service levels, you can also invest in tools to monitor your service level. When you configure the number of reads per second a service performs, setting the parameter too low may mean the user would experience a delay during peak times. By using a simple API, detailed reports showing the expected versus actual loads can point you to system bottlenecks that may need additional resource adjustments.

Automatic notification systems can also trigger email messages when the volume of reads or writes exceeds a threshold within a specified period of time. For example, you may want to send an email notification if the number of reads per second exceeds 80% of some predefined number within a 30-minute period. The email message could contain a link to the monitoring tools as well as links that would allow you to add more servers if the service level was critical to your users.

SCALABILITY AND RELIABILITY
When a database interface is simple, the resulting systems can have higher scalability and reliability. This means you can tune any solution to the desired requirements. Keeping an interface simple allows novice as well as advanced data modelers to build systems that utilize this power. Your only responsibility is to understand how to put this power to work solving business problems. A simple interface allows you to focus on load and stress testing and monitoring of service levels. Because a key-value store is simple to set up, you can spend more time looking at how long it takes to put or get 10,000 items. It also allows you to share these load- and stress-testing tools with other members of your development team.

PORTABILITY AND LOWER OPERATIONAL COSTS
One of the challenges for information systems managers is to continually look for ways to lower the operational costs of deploying systems. It’s unlikely that a single vendor or solution will have the lowest cost for all of your business problems. Ideally, information systems managers would like to annually request data service bids from their database vendors. In the traditional relational database world, this is impractical, since porting applications between systems is too expensive compared to the relative savings of hosting your data on a new vendor’s system. The more complicated and nonstandardized applications are, the less portable they can be and the more difficult it is to move them to the lowest-cost operator (see figure 4.4).

Figure 4.4 Portability of any application depends on database interface complexity. The low-portability system on the left has many complex, nonstandard interfaces between the application and the database, and porting the application between two databases might be a complex process requiring a large testing effort. In contrast, the high-portability application on the right uses only a few simple, standardized interfaces, such as put, get, and delete, and could be quickly ported to a new database with lower testing cost.


Standards watch: complex APIs can still be portable if they’re standardized and have portability tests
It’s important to note that complex interfaces can still permit high portability. For example, XQuery, a query language used in XML systems, has hundreds of functions, which can be considered a complex application-database interface. But these functions have been standardized by the World Wide Web Consortium (W3C) and are still considered a low-cost and highly portable application-database interface layer. The W3C provides a comprehensive XQuery test suite to verify whether XQuery interfaces are consistent between implementations. Any XQuery implementation with over a 99% pass rate allows applications to be ported without significant change.

4.1.3 Using a key-value store
Let’s take a look at how an application developer might use a key-value store within an application. The best way to think about using a key-value store is to visualize a single table with two columns. The first column is called the key and the second column is called the value. There are three operations performed on a key-value store: put, get, and delete. These three operations form the basis of how programmers interface with the key-value store. We call this set of programmer interfaces the application program interface, or API. The key-value interface is summarized in figure 4.5. Instead of using a query language, application developers access and manipulate a key-value store with the put, get, and delete functions, as shown here:

1 put($key as xs:string, $value as item()) adds a new key-value pair to the table and will update the value if the key is already present.
2 get($key as xs:string) as item() returns the value for any given key, or it may return an error message if the key isn’t in the key-value store.
3 delete($key as xs:string) removes a key and its value from the table, or it may return an error message if the key isn’t in the key-value store.


Figure 4.5 The key-value store API has three simple commands: put, get, and delete. This diagram shows how the put command inserts the input key "123" and value "value123" into a new key-value pair; the get command presents the key "456" and retrieves the value "value456"; and the delete command presents the key "789" and removes the key-value pair.


put('/images/my-image.png', $image-data)
get('/images/my-image.png')
delete('/images/my-image.png')

Figure 4.6 The code for using the commands associated with a key-value store. To add a new key-value pair, you use the put command; to retrieve the value, you use the get command; and to remove the pair, you use the delete command.

Figure 4.6 shows how an application would store (put), retrieve (get), and remove (delete) an image from a key-value store.
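As a concrete illustration of this three-operation API, here is a minimal in-memory key-value store sketched in Python. It isn’t any particular NoSQL product, just a dictionary wrapped in put, get, and delete methods that mirror the interface described above.

class KeyValueStore:
    """A toy key-value store: keys are strings, values are arbitrary blobs."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Adds a new pair, or overwrites the value if the key already exists.
        self._data[key] = value

    def get(self, key):
        # Returns the value, or raises KeyError if the key isn't present.
        return self._data[key]

    def delete(self, key):
        # Removes the pair, or raises KeyError if the key isn't present.
        del self._data[key]

store = KeyValueStore()
store.put("/images/my-image.png", b"\x89PNG...binary image data...")
print(len(store.get("/images/my-image.png")))
store.delete("/images/my-image.png")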

Standards watch: REST API
Note that we use the verb put instead of add in a key-value store to align with the standard Representational State Transfer (REST) protocol, a style of software architecture for distributed systems that uses clients to initiate requests and servers to process requests and return responses. The use of as xs:string indicates that the key can be any valid string of characters with the exception of binary structures. The item() references a single structure that may be a binary file. The xs: prefix indicates that the format follows the W3C definition of data types that’s consistent with XML Schema and the closely related XPath and XQuery standards.

In addition to the put, get, and delete API, a key-value store has two rules: distinct keys and no queries on values:

1 Distinct keys—You can never have two rows with the same key value. This means that all the keys in any given key-value store are unique.
2 No queries on values—You can’t perform queries on the values of the table.

The first rule, distinct keys, is straightforward: if you can’t uniquely identify a key-value pair, you can’t return a single result. The second rule requires some additional thought if your knowledge base is grounded in traditional relational databases. In a relational database, you can constrain a result set using the where clause, as shown in figure 4.7. A key-value store prohibits this type of operation: you can’t select a key-value pair using the value. The key-value store resolves the issues of indexing and retrieval in large datasets by transferring the association of the key with the value to the application layer, allowing the key-value store to retain a simple and flexible structure. This is an example of the trade-offs between application and database layer complexity we discussed in chapter 2.


There are few restrictions about what you can use as a key, as long as it’s a reasonably short string of characters. There are also few restrictions about what types of data you can put in the value of a key-value store. As long as your storage system can hold it, you can store it in a key-value store, making this structure ideal for multimedia: images, sounds, and even full-length movies.

Figure 4.7 The trade-offs associated with traditional relational models (which focus on the database layer) versus key-value store models (which focus on the application layer). In the traditional relational model, result sets are based on row values, the values of rows in large datasets must be indexed, and the values of a column must all have the same data type. In the key-value store model, all queries return a single item, there are no indexes on values, and values may contain any data type.

GENERAL PURPOSE
In addition to being simple, a key-value store is a general-purpose tool for solving business problems. It’s the Swiss Army knife of databases. What enables this generality is the ability for the application programmer to define what their key structures will be and what type of data they’re going to store in the values.

Now that we’ve looked at the benefits and uses of a key-value store, we’ll look at two use case examples. The first, storing web pages in a key-value store, shows how web search engines such as Google easily store entire websites in a key-value store. So if you want to store external websites in your own local database, this type of key-value store is for you. The second use case, Amazon Simple Storage Service (S3), shows how you can use a key-value store like S3 as a repository for your content in the cloud. If you have digital media assets such as images, music, or video, you should consider using a key-value store to increase reliability and performance for a fraction of the cost.

4.1.4 Use case: storing web pages in a key-value store
We’re all familiar with web search engines, but you may not realize how they work. Search engines like Google use a tool called a web crawler to automatically visit a website to extract and store the content of each web page. The words in each web page are then indexed for fast keyword search.

When you use your web browser, you usually enter a web address such as http://www.example.com/hello.html. This uniform resource locator, or URL, represents the key of a website or a web page. You can think of the web as a single large table with two columns, as depicted in figure 4.8. The URL is the key, and the value is the web page or resource located at that key.

Figure 4.8 How you can use URLs as keys in a key-value store. Since each web page has a unique URL (for example, http://www.example.com/index.html, .../about.html, .../products.html, or .../logo.png), you can be assured that no two web pages have the same URL.


If all the web pages in even part of the web were stored in a single key-value store system, there might be billions or trillions of key-value pairs. But each key would be unique, just as the URL of a web page is unique. The ability to use a URL as a key allows you to store all of the static or unchanging components of your website in a key-value store. This includes images, static HTML pages, CSS, and JavaScript code. Many websites use this approach; only the dynamic portions of a site, where pages are generated by scripts, are not stored in the key-value store.
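Here is a minimal sketch of this pattern in Python, using the requests library together with the toy KeyValueStore class sketched earlier in this chapter. The URL is the example domain used above; a real crawler would add link extraction, politeness rules, and keyword indexing.

import requests

store = KeyValueStore()           # the toy store defined earlier

url = "http://www.example.com/index.html"
response = requests.get(url)      # the crawler fetches the page...
store.put(url, response.content)  # ...and stores the raw bytes under its URL as the key

print(len(store.get(url)), "bytes cached for", url)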

4.1.5 Use case: Amazon Simple Storage Service (S3)
Many organizations have thousands or millions of digital assets they want to store. These assets can include images, sound files, and videos. By using Amazon Simple Storage Service, which is really a key-value store, a new customer can quickly set up a secure web service accessible to anyone, as long as they have a credit card.

Amazon S3, launched in the U.S. in March 2006, is an online storage web service that uses a simple REST API for storing and retrieving your data, at any time, from anywhere on the web. At its core, S3 is a simple key-value store with some enhanced features:

■ It allows an owner to attach metadata tags to an object, providing additional information about the object, such as content type, content length, cache control, and object expiration.
■ It has an access control module that allows a bucket or object owner to grant rights to individuals, groups, or everyone to perform put, get, and delete operations on an object, a group of objects, or a bucket.

At the heart of S3 is the bucket. All objects you store in S3 are kept in buckets. Buckets store key/object pairs, where the key is a string and the object is whatever type of data you have (like images, XML files, or digital music). Keys are unique within a bucket, meaning no two objects will have the same key. S3 uses the same HTTP REST verbs (PUT, GET, and DELETE) discussed earlier in this section to manipulate objects:

■ New objects are added to a bucket using the HTTP PUT message.
■ Objects are retrieved from a bucket using the HTTP GET message.
■ Objects are removed from a bucket using the HTTP DELETE message.

To access an object, you can generate a URL from the bucket/key combination; for example, to retrieve an object with a key of gray-bucket in a bucket called testbucket, the URL would be http://testbucket.s3.amazonaws.com/gray-bucket.png. The result on your screen would be the image shown in figure 4.9.

Figure 4.9 This image is the result of performing a GET request for http://testbucket.s3.amazonaws.com/gray-bucket.png from an Amazon S3 bucket.
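The same put/get/delete pattern is visible through Amazon’s SDKs. The following Python sketch uses the boto3 library; it assumes you have AWS credentials configured, and the bucket and key names are the hypothetical ones from the example above.

import boto3

s3 = boto3.client("s3")

# PUT: store an object under a key in the bucket
with open("gray-bucket.png", "rb") as f:
    s3.put_object(Bucket="testbucket", Key="gray-bucket.png", Body=f.read())

# GET: retrieve the object by bucket and key
obj = s3.get_object(Bucket="testbucket", Key="gray-bucket.png")
image_bytes = obj["Body"].read()

# DELETE: remove the object from the bucket
s3.delete_object(Bucket="testbucket", Key="gray-bucket.png")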


In this section, we looked at key-value store systems and how they can benefit an organization, saving time and money by moving the focus from architectural design to reducing data services costs. We’ve demonstrated how these simple and versatile structures can be, and are, used to solve a broad range of business problems for organizations with similar as well as different business requirements. As you attack your next business problem, you’ll be able to determine whether a key-value store is the right solution.

Now that you understand the key-value store, let’s move to a similar but more complex data architecture pattern: the graph store. As you move through the graph store section, you’ll see some similarities to the key-value store as well as different business situations where using a graph store is the more appropriate solution.

4.2 Graph stores
Graph stores are important in applications that need to analyze relationships between objects or visit all the nodes in a graph in a particular manner (graph traversal). Graph stores are highly optimized to efficiently store graph nodes and links, and they allow you to query these graphs. Graph databases are useful for any business problem that has complex relationships between objects, such as social networking, rules-based engines, creating mashups, and graph systems that can quickly analyze complex network structures and find patterns within these structures.

By the end of this section, you’ll be able to identify the key features of a graph store and understand how graph stores are used to solve specific business problems. You’ll become familiar with graph terms such as nodes, relationships, and properties, and you’ll know about the published W3C standards for graph data. You’ll also see how graph stores have been effectively implemented by companies to perform link analysis, work with rules and inference engines, and integrate linked data.

4.2.1 Overview of a graph store
A graph store is a system that contains a sequence of nodes and relationships that, when combined, create a graph. You know that a key-value store has two data fields: the key and the value. In contrast, a graph store has three data fields: nodes, relationships, and properties. Some types of graph stores are referred to as triple stores because of their node-relationship-node structure (see figure 4.10).

Figure 4.10 A graph store consists of many node-relationship-node structures. Properties are used to describe both the nodes and the relationships.

In the last section, you saw how the structure of a key-value store is general and can be applied to many different situations. This is also true of the basic node-relationship-node structure of a graph store. Graph stores are ideal when you have many items that are related to each other in complex ways and these relationships have properties (like “sister of” or “brother of”). Graph stores allow you to do simple queries that show you the nearest neighboring nodes as well as queries that look deep into networks and quickly find patterns.

For example, if you use a relational database to store your list of friends, you can produce a list of your friends sorted by their last name. But if you use a graph store, you can not only get a list of your friends by their last name, you can also get a list of which friends are most likely to buy you a beer! Graph stores don’t just tell you there’s a relationship—they can give you detailed reports about each of your relationships.

Graph nodes are usually representations of real-world objects, like nouns. Nodes can be people, organizations, telephone numbers, web pages, computers on a network, or even biological cells in a living organism. The relationships can be thought of as connections between these objects and are typically represented as arcs (lines that connect) between circles in diagrams. Graph queries are similar to traversing nodes in a graph. You can query to ask things like these:

■ What’s the shortest path between two nodes in a graph?
■ What nodes have neighboring nodes that have specific properties?
■ Given any two nodes in a graph, how similar are their neighboring nodes?
■ What’s the average connectedness of various points on a graph with each other?

As you saw in chapter 2, RDBMSs use artificial numbers as primary and foreign keys to relate rows in tables that are stored on different sections of a single hard drive. Performing a join operation in an RDBMS is expensive in terms of latency as well as disk input and output. Graph stores relate nodes together directly, understanding that two nodes with the same identifier are the same node. Graph stores assign internal identifiers to nodes and use those identifiers to join networks together. But unlike RDBMS joins, graph store joins are computationally lightweight and fast. This speed is attributed to the small size of each node and the ability to keep graph data in RAM, which means that once the graph is loaded into memory, retrieving the data doesn’t require disk input and output operations.

Unlike the other NoSQL patterns we’ll discuss in this chapter, graph stores are difficult to scale out on multiple servers due to the close connectedness of each node in a graph. Data can be replicated on multiple servers to enhance read and query performance, but writes to multiple servers and graph queries that span multiple nodes can be complex to implement.

Although graph stores are built around the simple and general-purpose node-relationship-node data structure, they come with their own complex and inconsistent jargon when they’re used in different ways. You’ll find that you interact with graph stores in much the same way you do other types of databases. For example, you’ll load, query, update, and delete data. The difference is found in the types of queries you use. A graph query will return a set of nodes that are used to create a graph image on the screen to show you the relationships in your data.
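To make the query types above concrete, here is a small Python sketch using the networkx library as a stand-in for a graph store; real graph databases expose similar traversals through their own query languages, and the people and relationships here are invented for illustration.

import networkx as nx

g = nx.Graph()
# Nodes with properties, and relationships connecting them
g.add_node("Ann", city="Minneapolis")
g.add_node("Dan", city="Minneapolis")
g.add_node("Sue", city="Chicago")
g.add_edge("Ann", "Dan", relationship="friend")
g.add_edge("Dan", "Sue", relationship="friend")

# Shortest path between two nodes
print(nx.shortest_path(g, "Ann", "Sue"))         # ['Ann', 'Dan', 'Sue']

# Neighboring nodes whose properties match a condition
print([n for n in g.neighbors("Dan")
       if g.nodes[n]["city"] == "Minneapolis"])   # ['Ann']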


Let’s take a look at the variations on the terms used to describe different types of graphs. As you use the web, you’ll often see links on a page that take you to another page; these links can be represented by a graph or triple. The current web page is the first or source node, the link is the arc that “points to” the second page, and the second or destination page is the second node. In this example, the first node is represented by the URL of the source page, and the second node or destination is represented by the URL of the destination page. This linking process can be found in many places on the web, from page links to wiki sites, where each source and destination node is a page URL. Figure 4.11 is an example of a graph store that holds a web page that links to other web pages.

Figure 4.11 An example of using a graph store to represent a web page that contains links to two other web pages. The URL of the source web page is stored as a URL property, and each link is a relationship that has a “points to” property. Each link is represented as another node with a property that contains the destination page’s URL.

The concept of using URLs to identify nodes is appealing, since it’s human readable and provides structure within the URL. The W3C generalized this structure to store the information about the links between pages, as well as the links between objects, into a standard called the Resource Description Framework, more commonly known as RDF.

4.2.2 Linking external data with the RDF standard
In a general-purpose graph store, you can create your own method to determine whether two nodes reference the same point in a graph. Most graph stores will assign internal IDs to each node as they load the nodes into RAM. The W3C has focused on a process of using URL-like identifiers called uniform resource identifiers (URIs) to create explicit node identifiers for each node. This standard is the W3C Resource Description Framework (RDF).

RDF was specifically created to join together external datasets created by different organizations. Conceptually, you can load two external datasets into one graph store and then perform graph queries on this joined database. The trick is knowing when two nodes reference the same object. RDF uses directed graphs, where the relationship specifically points from a source node to a destination node. The terminology for the source, link, and destination may vary based on your situation, but in general the terms subject, predicate, and object are used, as shown in figure 4.12. These terms come from formal logic systems and language. This terminology for describing how nodes are identified has been standardized by the W3C in their RDF standard.

Figure 4.12 How RDF uses specific names for the general node-relationship-node structure. The source node is the subject, and the destination node is the object. The relationship that connects them together is the predicate. The entire structure is called an assertion.


In RDF, each node-arc-node relationship is called a triple and is associated with an assertion of fact. In figure 4.13, the first assertion is (Book, has-author, Person123), and the second assertion is (Person123, has-name, “Dan”). When stored in a graph store, the two statements are independent and may even be stored on different systems around the world. But if the URI of the Person123 structure is the same in both assertions, your application can figure out that the author of the book has the name “Dan”, as shown in figure 4.14.

Figure 4.13 Two distinct RDF assertions. The first assertion states that a book has a person as its author. The second assertion states that this person has the name Dan. Since the object of the first and the subject of the second have the same URI, they can be joined together.

Figure 4.14 How two distinct RDF assertions can be joined together to create a new assertion. From this graph you can answer yes to the question, “Does this book have an author that has the name ‘Dan’?”

The ability to traverse a graph relies on the fact that two nodes in different groups reference the same physical object. In this example, the Person123 node needs to globally refer to the same item. Once you determine that the nodes are the same, you can join the graphs together. This process is useful in areas like logic inference and complex pattern matching.

As you can imagine, the W3C, who created the RDF standard, is highly motivated to be consistent across all of its standards. Since it already has a method for identifying an HTML page anywhere in the world using a uniform resource locator structure, it makes sense to repurpose these structures whenever possible. The major difference is that, unlike a URL, a URI doesn’t have to point to any actual website or web page. The only criterion is that you must have a way to make URIs globally consistent across the entire web and match exactly when you compare two nodes.

While a pure triple store is the ideal, in the real world triple stores frequently associate other information with each triple. For example, they might include the group ID the graph belongs to, the date and time the node was created or last updated, or the security groups associated with the graph. These attributes are frequently called link metadata because they describe information about the link itself. Storing this metadata with every node does take more disk space, but it makes the data much easier to audit and manage.
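The following Python sketch expresses the two assertions from figures 4.13 and 4.14 with the rdflib library, and runs a SPARQL query that joins them through the shared Person123 URI. The example.org URIs are made up for illustration.

from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()

# Assertion 1: the book has Person123 as its author
g.add((EX.Book42, EX["has-author"], EX.Person123))
# Assertion 2: Person123 has the name "Dan"
g.add((EX.Person123, EX["has-name"], Literal("Dan")))

# Because both triples share the Person123 URI, a query can join them.
query = """
    SELECT ?name WHERE {
        ?book <http://example.org/has-author> ?person .
        ?person <http://example.org/has-name> ?name .
    }
"""
for row in g.query(query):
    print(row[0])    # prints: Dan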

4.2.3 Use cases for graph stores
In this section, we’ll look at situations where a graph store can be used to effectively solve a particular business problem:


■ Link analysis is used when you want to perform searches and look for patterns and relationships in situations such as social networking, telephone, or email records.
■ Rules and inference are used when you want to run queries on complex structures such as class libraries, taxonomies, and rule-based systems.
■ Integrating linked data is used with large amounts of open linked data to do real-time integration and build mashups without storing data.

LINK ANALYSIS
Sometimes the best way to solve a business problem is to traverse graph data—a good example of this is social networking. An example of a social network graph is shown in figure 4.15. As you add new contacts to your friends list, you might want to know whether you have any mutual friends. To get this information, you’d first need to get a list of your friends, and for each one of them get a list of their friends (friends-of-friends). Though you can do this type of search against a relational database, after the initial pass of listing out your friends, the system performance drops dramatically.

Figure 4.15 A social network graph generated by the LinkedIn InMap system. Each person is represented by a circle, and a line is drawn between two people that have a relationship. People are placed on the graph based on the number of connections they have with all the other people in the graph. People and relationships are shaded the same when there’s a high degree of connectivity between the people. Calculating the placement of each person in a social network map is best performed by an in-memory graph traversal program.


Doing this type of analysis using an RDBMS would be slow. In the social networking scenario, you can create a “friends” table with three columns: the ID of the first person, the ID of the second person, and the relationship type (family, close friend, or acquaintance). You can then index that table on both the first and second person, and an RDBMS will quickly return a list of your friends and your friends-of-friends. But in order to determine the next level of relationships, another SQL query is required. As you continue to build out the relationships, the size of each query grows quickly. If you have 100 friends who each have 100 friends, the friends-of-friends query (the second-level friends) returns 10,000 (100 x 100) rows. As you might guess, doing this type of query in SQL can become complex. Graph stores can perform these operations much faster by using techniques that consolidate and remove unwanted nodes from memory. Though graph stores are clearly much faster for link analysis tasks, they usually require enough RAM to store all the links during analysis.

Graph stores are used for things beyond social networking—they’re appropriate for identifying distinct patterns of connections between nodes. For example, creating a graph of all incoming and outgoing phone calls between people in a prison might show a concentration of calls (patterns) associated with organized crime. Analyzing the movement of funds between bank accounts might show patterns of money laundering or credit card fraud. Companies that are under criminal investigation might have all of their email messages analyzed using graph software to see who sent what information to whom, and when. Law firms, law enforcement agencies, intelligence agencies, and banks are the most frequent users of graph store systems, both to detect legitimate activities and for fraud detection.

Graph stores are also useful for linking together data and searching for patterns within large collections of text documents. Entity extraction is the process of identifying the most important items (entities) in a document. Entities are usually the nouns in a document, like people, dates, places, and products. Once the key entities have been identified, they’re used to perform advanced search functions. For example, if you know all the dates and people mentioned in a document, you can create a report that shows which documents mention which people, and when.

This entity extraction process (a type of natural language processing, or NLP) can be combined with other tools to extract simple facts or assertions made within a document. For example, the sentence “John Adams was born on October 19, 1735” can be broken into the following assertions:

1 A person record was found with the name of John Adams and is the subject.
2 The born-on relationship links the subject to the object.
3 A date object record was found that has the value of October 19, 1735.

Although simple assertions can be easy to find using simple NLP processing, the process of fully understanding every sentence can be complex and dependent on the context of the situation. Our key takeaway is that if assertions are found in text, they can best be represented in graph structures.
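Here is a small Python sketch of the friends-of-friends expansion described above, using an in-memory adjacency list as a stand-in for a graph store; the names are invented, and a real system would also rank and filter the results.

friends = {
    "You":   ["Ann", "Bob"],
    "Ann":   ["You", "Carol", "Dave"],
    "Bob":   ["You", "Dave", "Erin"],
    "Carol": ["Ann"],
    "Dave":  ["Ann", "Bob"],
    "Erin":  ["Bob"],
}

def friends_of_friends(person):
    """Return the second-level friends of a person, excluding the person
    and their direct friends."""
    direct = set(friends[person])
    second = set()
    for friend in direct:
        second.update(friends[friend])
    return second - direct - {person}

print(friends_of_friends("You"))    # {'Carol', 'Dave', 'Erin'}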


GRAPHS, RULES, AND INFERENCE
The term rules can have multiple meanings that depend on where you’re coming from and the context of the situation. Here, we use the term to describe abstract rules that relate to an understanding of the objects in a system, and how the objects’ properties allow you to gain insight into and make better use of large datasets. RDF was designed to be a standard way to represent many types of problems in the structure of a graph. A primary use for RDF is to store logic and rules. Once you’ve set these rules up, you can use an inference or rules engine to discover other facts about a system.

In our section on link analysis, we looked at how text can be encoded with entities such as people and dates to help you find facts. We can now take things a step further to get additional information from the facts that will help you solve business problems. Let’s start with trust, since it’s an important consideration for businesses that want to attract and retain customers. Suppose you have a website that allows anyone to post restaurant reviews. Would there be value in allowing you to indicate which reviewers you trust? You’re going out to dinner and you’re considering two restaurants. Each restaurant has positive and negative reviews. Can you use simple inference to help you decide which restaurant to visit?

As a first test, you could see whether your friends reviewed the restaurants. But a more powerful test would be to see whether any of your friends-of-friends also reviewed the restaurants. If you trust John and John trusts Sue, what can you infer about your ability to trust Sue’s restaurant recommendations? Chances are that your social network will help you use inference to calculate which reviews should carry more weight. This is a simple example of using networks, graphs, and inference to gain additional knowledge on a topic (a minimal sketch of this kind of rule follows the figure description below).

The use of RDF and inference isn’t limited to social networks and product reviews. RDF is a general-purpose structure that can be used to store many forms of business logic. The W3C does more than define RDF; it has an entire framework of standards for using RDF to solve business problems. This framework is frequently referred to as the Semantic Web Stack. Some of these layers are described in figure 4.16.

Figure 4.16 A typical Semantic Web stack, with common low-level standards like Unicode, URI, XML, and RDF at the bottom of the stack. The middle layers include standards for querying (SPARQL), taxonomies (RDFS), ontologies (OWL), and rules (RIF/SWRL). At the top of the stack are the user interface and application layers, above abstract layers of unifying logic, proof, and trust. Cryptography standards run alongside the stack.
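Here is a minimal Python sketch of the trust example above, treating trust statements as triples and applying one hand-written transitive rule; real inference engines generalize this idea into full rule languages such as RIF/SWRL, and the names, restaurant, and rule here are illustrative only.

# Facts, written as (subject, predicate, object) triples
facts = {
    ("You", "trusts", "John"),
    ("John", "trusts", "Sue"),
    ("Sue", "reviewed", "Cafe Verde"),
}

def infer_trust(facts):
    """Apply one rule repeatedly: if A trusts B and B trusts C, then A trusts C."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        new = {(a, "trusts", c)
               for (a, p1, b) in inferred if p1 == "trusts"
               for (b2, p2, c) in inferred if p2 == "trusts" and b2 == b}
        if not new.issubset(inferred):
            inferred |= new
            changed = True
    return inferred

closed = infer_trust(facts)
print(("You", "trusts", "Sue") in closed)   # True: Sue's review should carry more weight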


At the bottom of the stack are standards that are used in many areas, such as a standardized character set (Unicode) and standards for representing identifiers to objects in a URI-like format. Above that, RDF is stored in XML files, a good example of using the XML tree-like document structure to contain graphs. Above the XML layer are the ways that items are classified using a taxonomy (RDFS), and above this are the standards for ontologies (OWL) and rules (RIF/SWRL). The SPARQL query language also sits above the RDF layer. Above these areas are the layers that are still not standardized: logic, proof, and trust. This is where much of the research and development in the Semantic Web is focused. At the top, the user interface layer is similar to the application layers we talked about in chapter 2. Finally, alongside the stack are the cryptography standards that are used to securely exchange data over the public internet.

Many of the tools and languages associated with the upper layers of the Semantic Web Stack are still in research and development, and investment case studies showing a significant ROI remain few and far between. A more practical step is to store original source documents with their extracted entities (annotations) directly in a document store that supports mixed content. We’ll discuss these concepts and techniques in the next chapter, when we look at XML data stores.

In the next section, we’ll look at how organizations are combining publicly available datasets (linked open data) from domains such as media, medical and environmental science, and publications to perform real-time extract, transform, and display operations.

USING GRAPHS TO PROCESS PUBLIC DATASETS
Graph stores are also useful for doing analysis on data that wasn’t created by your organization. What if you need to do analysis with three different datasets that were created by three different organizations? These organizations may not even know each other exists! So how can you automatically join their datasets together to get the information you need? How do you create mashups, or recombinations of this data, in an efficient way?

One answer is by using a set of tools referred to as linked open data, or LOD. You can think of LOD as an integration technique for doing joins between disparate datasets to create new applications and new insights. LOD strategies are important for anyone doing research or analysis using publicly available datasets. This research includes topics such as customer targeting, trend analysis, sentiment analysis (the application of NLP, computational linguistics, and text analytics to identify and extract subjective information in source materials), and the creation of new information services.

Recombining data into new forms provides opportunities for new businesses. As the amount of LOD grows, there are new opportunities for business ventures that combine and enrich this information. LOD integration creates new datasets by combining information from two or more publicly available datasets that conform to the LOD structures, such as RDF and URIs. A diagram of some of the popular LOD sites, called an LOD cloud diagram, is shown in figure 4.17.


Figure 4.17 The linked open data cloud is a series of shaded circles that are connected by lines. The shades indicate the domain—for example, darker for geographic datasets, lighter for life sciences. (Diagram by Richard Cyganiak and Anja Jentzsch: http://lod-cloud.net)

At the center of LOD cloud diagrams you'll see sites that contain a large number of general-purpose datasets. These sites include LOD hub sites such as DBpedia or Freebase. DBpedia is a website that attempts to harvest facts from Wikipedia and convert them into RDF assertions. The data in the info boxes in Wikipedia is a good example of a source of consistent data in wiki format. Due to the diversity of data in DBpedia, it's frequently used as a hub to connect different datasets together. Once you find a site that has the RDF information you're looking for, you can proceed in two ways. The first is to download all the RDF data on the site and load it into your graph store. For large RDF collections like DBpedia that have billions of triples, this can be impractical. The second and more efficient method is to find a web service for the RDF site called a SPARQL endpoint. This service allows you to submit SPARQL queries to extract the data you need from each website in an RDF form that can then be joined with other RDF datasets. By combining the data from SPARQL queries, you can create new data mashups that join data together in the same way joins combine data from two different tables in an RDBMS.
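To make this concrete, here's a minimal sketch of querying a SPARQL endpoint over HTTP. It assumes the public DBpedia endpoint at https://dbpedia.org/sparql is reachable and that the Python requests library is installed; the class and property used in the query are only illustrative. A real mashup would run a second query against another endpoint and join the results on the shared URIs.

import requests

# Ask DBpedia for a few cities and their populations. Every city is
# identified by a URI, so the same URIs can later be used to join these
# rows with triples published by a completely different organization.
query = """
SELECT ?city ?population WHERE {
  ?city a <http://dbpedia.org/ontology/City> ;
        <http://dbpedia.org/ontology/populationTotal> ?population .
}
LIMIT 5
"""

response = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
for row in response.json()["results"]["bindings"]:
    print(row["city"]["value"], row["population"]["value"])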


The key difference between a SPARQL query and an RDBMS join is the process that creates the primary/foreign keys. In an RDBMS, all of the keys are in the same domain, but in the LOD the data was created by different organizations, so the only way to join the data is to use consistent URIs to identify nodes. The number of datasets that participate in the LOD community is large and growing, but as you might guess, there are few ways to guarantee the quality and consistency of public data. If you find inconsistencies and missing data, there's no easy way to create bulk updates to correct the source data. This means you may need to manually edit hundreds of wiki pages in order to add or correct data. After this is done, you may need to wait until the next time the pages get indexed by the RDF extraction tools. These challenges have led to the concept of curated datasets, which are based on public data but then undergo a postprocessing cleanup and normalization phase to make the data more usable by organizations. In this section, we've covered graph representations and shown how organizations are using graph stores to solve business problems. We now move on to our third NoSQL data architecture pattern. 4.3 Column family (Bigtable) stores As you've seen, key-value stores and graph stores have simple structures that are useful for solving a variety of business problems. Now let's look at how you can combine a row and column from a table to use as the key. Column family systems are important NoSQL data architecture patterns because they can scale to manage large volumes of data. They're also known to be closely tied to many MapReduce systems. As you may recall from our discussion of MapReduce in chapter 2, MapReduce is a framework for performing parallel processing on large datasets across multiple computers (nodes). In the MapReduce framework, the map operation has a master node that breaks up an operation into subparts and distributes each operation to another node for processing, and reduce is the process where the master node collects the results from the other nodes and combines them into the answer to the original problem. Column family stores use row and column identifiers as general-purpose keys for data lookup. They're sometimes referred to as data stores rather than databases, since they lack features you may expect to find in traditional databases. For example, they lack typed columns, secondary indexes, triggers, and query languages. Almost all column family stores have been heavily influenced by the original Google Bigtable paper. HBase, Hypertable, and Cassandra are good examples of systems that have Bigtable-like interfaces, although how they're implemented varies. We should note that the term column family is distinct from a column store. A column-store database stores all information within a column of a table at the same location on disk, in the same way a row store keeps row data together. Column stores are used in many OLAP systems because their strength is rapid column aggregate calculation. MonetDB, SybaseIQ, and Vertica are examples of column-store systems. Column-store databases provide a SQL interface to access their data.


Figure 4.18 Using a row and column to address a cell. The cell at row 3, column B contains the text "Hello World!"; its address of 3B can be thought of as the lookup key in a sparse matrix system.

4.3.1 Column family basics Our first example of using rows and columns as a key is the spreadsheet. Though most of us don't think of spreadsheets as a NoSQL technology, they serve as an ideal way to visualize how keys can be built up from more than one value. Figure 4.18 shows a spreadsheet with a single cell at row 3 and column 2 (the B column) that contains the text "Hello World!" In a spreadsheet, you use the combination of a row number and a column letter as an address to "look up" the value of any cell. For example, the third column in the second row in the figure is identified by the key C2. In contrast to the key-value store, which has a single key that identifies the value, a spreadsheet has row and column identifiers that make up the key. But like the key-value store, you can put many different items in a cell. A cell can contain data, a formula, or even an image. The model for this key is shown in figure 4.19.

This is roughly the same concept in column family systems. Each item of data can only be found by knowing information about the row and column identifiers. And, like a spreadsheet, you can insert data into any cell at any time. Unlike an RDBMS, you don't have to insert all the column's data for each row.

Figure 4.19 Spreadsheets use a row-column pair (row number, column letter) as a key to look up the value of a cell. This is similar to using a key-value system where the key has two parts. Like a key-value store, the value in a cell may take on many types, such as strings, numbers, or formulas.

4.3.2 Understanding column family keys Now that you’re comfortable with slightly more complex keys, we’ll add two additional fields to the keys from the spreadsheet example. In figure 4.20 you can see we’ve added a column family and timestamp to the key.

Key = (Row-ID, Column family, Column name, Timestamp) -> Value

Figure 4.20 The key structure in column family stores is similar to a spreadsheet but has two additional attributes. In addition to the column name, a column family is used to group similar column names together. The addition of a timestamp in the key also allows each cell in the table to store multiple versions of a value over time.


The key in the figure is typical of column family stores. Unlike the typical spreadsheet, which might have 100 rows and 100 columns, column family stores are designed to be...well...very big. How big? Systems with billions of rows and hundreds or thousands of columns are not unheard of. For example, a geographic information system (GIS) like Google Earth might have a row ID for the longitude portion of a map and use the column name for the latitude of the map. If you have one map for each square mile on Earth, you could have 15,000 distinct row IDs and 15,000 distinct column IDs. What's unusual about these large implementations is that if you viewed them in a spreadsheet, you'd see that few cells contain data. This sparse matrix implementation is a grid of values where only a small percentage of cells contain values. Unfortunately, relational databases aren't efficient at storing sparse data, but column family stores are designed exactly for this purpose. With a traditional relational database, you can use a simple SQL query to find all the columns in any table; when querying sparse matrix systems, you must look at every element in the database to get a full listing of all column names. One problem that may occur with many columns is that running reports that list columns and related columns can be tricky unless you use a column family (a high-level category of data, also known as an upper-level ontology). For example, you may have groups of columns that describe a website, a person, a geographical location, and products for sale. In order to view these columns together, you'd group them in the same column family to make retrieval easier. Not all column family stores use a column family as part of their key. If they do, you'll need to take this into account when storing an item key, since the column family is part of the key and retrieval of data can't occur without it. Because the API is simple, these systems can scale to manage large volumes of data, adding new rows and columns without needing to modify a data definition language. A minimal sketch of this key structure appears below.
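The following sketch is plain Python rather than any particular NoSQL product's API; it models the four-part key as a tuple in a dictionary, which is enough to show why a sparse matrix of billions of cells only stores the cells that actually have values.

import time

# A toy column family store: the key is (row_id, column_family,
# column_name, timestamp) and only populated cells take up space.
store = {}

def put(row_id, family, column, value, ts=None):
    ts = ts if ts is not None else time.time()
    store[(row_id, family, column, ts)] = value

def get_versions(row_id, family, column):
    """Return all stored versions of a cell, newest first."""
    versions = [(ts, v) for (r, f, c, ts), v in store.items()
                if (r, f, c) == (row_id, family, column)]
    return sorted(versions, reverse=True)

put("user42", "profile", "displayName", "Dan")
put("user42", "profile", "displayName", "Daniel")   # a newer version of the same cell
put("user42", "prefs", "notifyByEmail", "true")

print(get_versions("user42", "profile", "displayName"))

Real column family systems keep rows sorted by row ID and distribute ranges of rows across nodes, but the addressing idea is the same.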

4.3.3 Benefits of column family systems The column family approach of using a row ID and column name as a lookup key is a flexible way to store data, gives you benefits of higher scalability and availability, and saves you time and hassles when adding new data to your system. As you read through these benefits, think about the data your organization collects to see if a column fam- ily store would help you gain a competitive advantage in your market. Since column family systems don’t rely on joins, they tend to scale well on distrib- uted systems. Although you can start your development on a single laptop, in produc- tion column family systems are usually configured to store data in three distinct nodes in possibly different geographic regions (geographically distinct data centers) to ensure high availability. Column family systems have automatic failover built in to detect failing nodes and algorithms to identify corrupt data. They leverage advanced hashing and indexing tools such as Bloom filters to perform probabilistic analysis on large data sets. The larger the dataset, the better these tools perform. Finally, column family implementations are designed to work with distributed filesystems (such as the


Hadoop distributed filesystem) and MapReduce transforms for getting data into or out of the systems. So be sure to consider these factors before you select a column family implementation.

HIGHER SCALABILITY The word Big in the title of the original Google paper tells us that Bigtable-inspired column family systems are designed to scale beyond a single processor. At the core, column family systems are noted for their scalable nature, which means that as you add more data to your system, your investment goes into the new nodes added to the computing cluster. With careful design, you can achieve a linear relationship between the way data grows and the number of processors you require. The principal reason for this relationship is the simple way that row IDs and column names are used to identify a cell. By keeping the interface simple, the back-end system can distribute queries over a large number of processing nodes without performing any join operations. With careful design of row IDs and columns, you give the system enough hints to tell it where to get related data and avoid unnecessary network traffic, which is crucial to system performance.

HIGHER AVAILABILITY By building a system that scales on distributed networks, you gain the ability to replicate data on multiple nodes in a network. Because column family systems use efficient communication, the cost of replication is lower. In addition, the lack of join operations allows you to store any portion of a column family matrix on remote computers. This means that if the server that holds part of the sparse matrix crashes, other computers are standing by to provide the data service for those cells.

EASY TO ADD NEW DATA Like the key-value and graph stores, a key feature of the column family store is that you don't need to fully design your data model before you begin inserting data. But there are a couple of constraints that you should know about before you begin. Your groupings of column families should be known in advance, but row IDs and column names can be created at any time. For all the good things that you can do with column family systems, be warned that they're designed to work on distributed clusters of computers and may not be appropriate for small datasets. You usually need at least five processors to justify a column family cluster, since many systems are designed to store data on three different nodes for replication. Column family systems also don't support standard SQL queries for real-time data access. They may have higher-level query languages, but these systems are often used to generate batch MapReduce jobs. For fast data access, you'll use a custom API written in a procedural language like Java or Python. In the next three sections, we'll look at how column family implementations have been efficiently used by companies like Google to manage analytics, maps, and user preferences.


4.3.4 Case study: storing analytical information in Bigtable In Google's Bigtable paper, they described how Bigtable is used to store website usage information in Google Analytics. The Google Analytics service allows you to track who's visiting your website. Every time a user clicks on a web page, the hit is stored in a single row-column entry that has the URL and a timestamp as the row ID. The row IDs are constructed so that all page hits for a specific user session are together. As you can guess, viewing a detailed log of all the individual hits on your site would be a long process. Google Analytics makes it simple by summarizing the data at regular intervals (such as once a day) and creating reports that allow you to see the total number of visits and the most popular pages that were requested on any given day. Google Analytics is a good example of a large database that scales in a linear fashion as the number of users increases. As each transaction occurs, new hit data is immediately added to the tables even if a report is running. The data in Google Analytics, like other logging-type applications, is generally written once and never updated. This means that once the data is extracted and summarized, the original data is compressed and put into an intermediate store until archived. This pattern of storing write-once data is the same pattern we discussed in the data warehouse and business intelligence section in chapter 3. In that section, we looked at sales fact tables and how business intelligence/data warehouse (BI/DW) problems can be cost-effectively solved by Bigtable implementations. Once the data from event logs is summarized, tools like pivot tables can use the aggregated data. The events can be web hits, sales transactions, or any type of event-monitoring system. The last step will be to use an external tool to generate the summary reports. In the case of using HBase as a Bigtable store, you'll need to store the results in the Hadoop distributed filesystem (HDFS) and use a reporting tool such as Hadoop Hive to generate the summary reports. Hadoop Hive has a query language that looks similar to SQL in many ways, but it also requires you to write a MapReduce function to move data into and out of HBase.
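The exact row-key layout Google uses isn't given here, but the following minimal sketch shows the general pattern being described: building a composite row ID from the site, the session, and a timestamp so that all hits for one session sort next to each other and can be scanned as a contiguous range.

from datetime import datetime, timezone

def hit_row_id(site, session_id, when):
    """Compose a sortable row ID: site, then session, then time.

    Because column family stores keep rows sorted by row ID, every hit
    for a given session lands in one contiguous range of rows.
    """
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{site}|{session_id}|{stamp}"

now = datetime(2013, 6, 1, 12, 30, 5, tzinfo=timezone.utc)
print(hit_row_id("example.com", "sess-0042", now))
# example.com|sess-0042|20130601123005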

4.3.5 Case study: Google Maps stores geographic information in Bigtable Another example of using Bigtable to store large amounts of information is in the area of geographic information systems (GIS). GIS systems, like Google Maps, store geographic points on Earth, the moon, or other planets by identifying each location using its longitude and latitude coordinates. The system allows users to travel around the globe and zoom into and out of places using a 3D-like graphical interface. When viewing the satellite maps, you can then choose to display the map layers or points of interest within a specific region of a map. For example, if you post vacation photos from your trip to the Grand Canyon on the web, you can identify each photo’s location. Later, when your neighbor, who heard about your awesome vacation, is searching for images of the Grand Canyon, they’ll see your photo as well as other pho- tos with the same general location.


GIS systems store items once and then provide multiple access paths (queries) to let you view the data. They're designed to cluster similar row IDs together, resulting in rapid retrieval of all images/points that are near each other on the map.

4.3.6 Case study: using a column family to store user preferences Many websites allow users to store preference information as part of their profile. This account-specific information can store privacy settings, contact information, and how they want to be notified about key events. Typically the size of this user preference page for a social networking site is under 100 fields or about 1 KB in size, which is rea- sonable as long as there’s not a photo associated with it. There are some things about user preference files that make them unique. They have minimal transactional requirements, and only the individual associated with the account makes changes. As a result, ensuring an ACID transaction isn’t as important as making sure the transaction occurs when a user attempts to save or update their pref- erence information. Other factors to consider are the number of user preferences you have and system reliability. It’s important that these read-mostly events are fast and scalable so that when a user logs in, you can access the preferences and customize their screen regard- less of the number of concurrent system users. Column family systems can provide the ideal match for storing user preferences when combined with an external reporting system. These reporting systems can be set up to provide high availability through redundancy, and yet still allow reporting to be done on the user preference data. In addition, as the number of users increases, the size of the database can expand by the addition of new nodes to your system without changing the architecture. If you have large datasets, big data stores may provide an ideal way to create reliable yet scalable data services. Column family systems are known for their ability to scale to large datasets but they’re not alone in this regard; document stores with their general and flexible nature are also a good pattern to consider when scalability is a requirement. 4.4 Document stores Our coverage of NoSQL data patterns wouldn’t be complete without talking about the most general, flexible, powerful, and popular area of the NoSQL movement: the doc- ument store. After reading this section, you’ll have a clear idea of what document stores are and how they’re used to solve typical business problems. We’ll also look at some case studies where document stores have been successfully implemented. As you may recall, key-value and Bigtable stores, when presented with a key, return the value (a BLOB of data) associated with that key. The key-value store and Bigtable values lack a formal structure and aren’t indexed or searchable. Document stores work in the opposite manner: the key may be a simple ID which is never used or seen. But you can get almost any item out of a document store by querying any value or con- tent within the document. For example, if you queried 500 documents associated with


the Civil War, you could search for those documents that referenced General Robert E. Lee. The query would return a list of documents that contained his name. A consequence of using a document store is that everything inside a document is automatically indexed when a new document is added. Though the indexes are large, everything is searchable. This means that if you know any property of a document, you can quickly find all documents with the same property. Document stores can tell you not only that your search item is in the document, but also the search item's exact location, by using the document path, a type of key, to access the leaf values of a tree structure, as illustrated in figure 4.21. Even if a document structure is complex, a document store search API can remain simple and provide an easy way to select a document or a subset of a document. A key-value store can store an entire document in the value portion of the key-value store, but a document store can quickly extract subsections of a large number of documents without loading each document into memory. If you want to display a single paragraph of a book, you don't need to load the entire book into RAM. A minimal sketch of this "index everything" idea follows.
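The sketch below is plain Python rather than any particular document store's API; it builds a tiny inverted index so that any property value can be used to find the documents that contain it.

# A tiny inverted index: every (field, value) pair in a document points
# back to the document's ID, so any property can drive a lookup.
documents = {
    "doc1": {"title": "Gettysburg report", "general": "Robert E. Lee"},
    "doc2": {"title": "Antietam report", "general": "George McClellan"},
    "doc3": {"title": "Appomattox report", "general": "Robert E. Lee"},
}

index = {}
for doc_id, doc in documents.items():
    for field, value in doc.items():
        index.setdefault((field, value), set()).add(doc_id)

# Which documents reference General Robert E. Lee?
print(sorted(index[("general", "Robert E. Lee")]))   # ['doc1', 'doc3']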

We'll begin our document store learning by visualizing something familiar: a tree with roots, branches, and leaves. We'll then look at how document and application collections use the document store concept and the document store API. Finally, we'll look at some case studies and popular software implementations that use document stores. 4.4.1 Document store basics Think of a document store as a tree-like structure, as shown in figure 4.21. Document trees have a single root element (or sometimes multiple root elements). Beneath the root element there is a sequence of branches, sub-branches, and values. Each branch has a related path expression that shows you how to navigate from the root of the tree to any given branch, sub-branch, or value. Each branch may have a value associated with that branch. Sometimes the existence of a branch in the tree has specific meaning, and sometimes a branch must have a given value to be interpreted correctly.

Figure 4.21 Document stores use a tree structure that begins with a root node and has branches and sub-branches that can also contain sub-branches. The actual data values are usually stored at the leaf levels of a tree.


4.4.2 Document collections Most document stores group documents together in collections. These collections look like a directory structure you might find in a Windows or UNIX filesystem. Document collections can be used in many ways to manage large document stores. They can serve as ways to navigate document hierarchies, logically group similar documents, and store business rules such as permissions, indexes, and triggers. Collections can contain other collections, and trees can contain subtrees. If you're familiar with RDBMSs, you might think it natural to visualize document collections as tables in an RDBMS, especially if you've used an XML column data type within your system; in that view, a collection is a single table that contains XML documents. The problem with this view is that in the relational world, tables don't contain other tables, so you'd be missing the power and flexibility that comes with using a document store: allowing collections to contain other collections. Document collections can also be used as application collections, which are containers for the data, scripts, views, and transforms of a software application. Let's see how an application collection (package) is used to load software applications into a native XML database like eXist.

4.4.3 Application collections In some situations, the collection in a document store is used as a container for a web application package, as shown in figure 4.22. This packaging format, called a xar file, is similar to a Java JAR file or a WAR file on Java web servers. Packaged applications can contain scripts as well as data. They're loaded into the document store and use packaging tools (scripts) to load the data if it's not already there. These packaging features make document stores more versatile, expanding their functionality to become application servers as well as document stores. The use of collection structures to store application packages shows that a document store can be used as a container of high-level reusable components that can run on multiple NoSQL systems. If these developments continue, a market for reusable applications that are easy to install by nonprogrammers and can run on multiple NoSQL systems will soon be a reality.

Figure 4.22 Document store collections can contain many objects, including other collections and application packages. This is an example of a package repository that's used to load application packages into the eXist native XML database.


Figure 4.23 How a document path is used like a key to get the value out of a specific cell in a document. The sample document has a People root with Person children; the Person with id 123 and FirstName Dan has an Address containing CityName Minneapolis, StateCode MN, and a Street with StreetNumber 107 and StreetName Main. In this example, the path to the street name is People/Person[id='123']/Address/Street/StreetName/text().

4.4.4 Document store APIs Each document store has an API or query language that specifies the path or path expression to any node or group of nodes. Generally, nodes don't need to have distinct names; instead, a position number can be used to specify any given node in the tree. For example, to select the seventh person in a list of people, you might specify this query: Person[7]. Figure 4.23 is a more complex example of a complete path expression. In figure 4.23 you begin by selecting a subset of all people records that have the identifier 123. Often this points to a single person. Next you look in the Address section of the record and select the text from the Address street name. The full path name to the street name is the following: People/Person[id='123']/Address/Street/StreetName/text(). If you think this seems complicated, know that path expressions are simple and easy to learn. When looking for something, you specify the correct child element down a path of the tree structure, or you can use a where clause, called a predicate, at any point in the path expression to narrow down the items selected. We'll discuss more on using the World Wide Web standard for selecting a path using the XPath language in the next chapter.
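The path expression above can be tried out directly. This minimal sketch uses Python's lxml library (an assumption; it isn't part of the book's toolchain) purely to show that the path acts like a key that reaches one leaf value in the tree.

from lxml import etree

# The same People document that figure 4.23 describes.
doc = etree.fromstring("""
<People>
  <Person>
    <id>123</id>
    <FirstName>Dan</FirstName>
    <Address>
      <CityName>Minneapolis</CityName>
      <StateCode>MN</StateCode>
      <Street>
        <StreetNumber>107</StreetNumber>
        <StreetName>Main</StreetName>
      </Street>
    </Address>
  </Person>
</People>
""".strip())

# The predicate [id='123'] narrows the Person elements to the one whose
# <id> child has the string value '123'; text() returns the leaf value.
print(doc.xpath("/People/Person[id='123']/Address/Street/StreetName/text()"))
# ['Main']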

4.4.5 Document store implementations A document store can come in many varieties. Some are based on simple serialized object trees and some are more complex, containing content that might be found in web pages with text markup. Simpler document structures are often associated with serialized objects and may use the JavaScript Object Notation (JSON) format. JSON allows


arbitrarily deep nesting of tree structures, but it doesn't support the ability to store and query text markup such as bold, italics, or hyperlinks within the text. We call this complex content. In our context, we refer to JSON data stores as serialized object stores and to document stores that support complex content as true document stores, with XML as the native format.

4.4.6 Case study: ad server with MongoDB Do you ever wonder how those banner ads show up on the web pages you browse, or how they really seem to target the things you like or are interested in? It's not a coincidence that they match your interests: they're tailored to you. It's done with ad serving. The original reason for MongoDB, a popular NoSQL product, was to create a service that would quickly send a banner ad to an area on a web page for millions of users at the same time. The primary purpose behind ad serving is to quickly select the most appropriate ad for a particular user and place it on the page in the time it takes a web page to load. Ad servers should be highly available and run 24/7 with no downtime. They use complex business rules to find the most appropriate ad to send to a web page. Ads are selected from a database of ad promotions of paid advertisers that best match the person's interest. There are millions of potential ads that could be matched to any one user. Ad servers can't send the same ad repeatedly; they must be able to send ads of a specific type (page area, animation, and so on) in a specific order. Finally, ad systems need accurate reporting that shows what ads were sent to which user and which ads the user found interesting enough to click on. The business problem 10gen (creators of MongoDB) was presented with was that no RDBMSs could support the complex real-time needs of the ad service market. MongoDB proved that it could meet all of the requirements at its inception. It has built-in autopartitioning, replication, load balancing, file storage, and data aggregation. It uses a document store structure that avoids the performance problems associated with most object-relational systems. In short, it was custom designed to meet the needs and demands of the ever-growing ad serving business and in the process turned out to be a good strategy for other problems that don't have real-time requirements, but want the ability to avoid the complex and slow object-relational mapping problems of traditional systems. As well as being used as a basis for banner ad serving, MongoDB can be used in some of the following use cases (a minimal code sketch follows the list):

 Content management—Store web content and photos and use tools such as geolocation indexes to find items.  Real-time operational intelligence—Ad targeting, real-time sentiment analysis, customized customer-facing dashboards, and social media monitoring.  Product data management—Store and query complex and highly variable product data.


 User data management—Store and query user-specific data on highly scalable web applications. Used by video games and social network applications.  High-volume data feeds—Store large amounts of real-time data into a central database for analysis characterized by asynchronous writes to RAM.
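The following minimal sketch uses the pymongo driver, assuming it's installed and a MongoDB server is running on localhost; the database, collection, and field names are made up for illustration. It shows the document-per-ad style of storage and a simple targeting query.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
ads = client["adserver"]["ads"]   # database and collection names are illustrative

# Each ad is a single document; no table design is needed up front.
ads.insert_one({
    "campaign": "summer-sale",
    "format": "banner-728x90",
    "topics": ["travel", "hotels"],
    "active": True,
})

# Find an active banner ad matching one of the user's interests.
match = ads.find_one({"active": True,
                      "format": "banner-728x90",
                      "topics": {"$in": ["travel", "music"]}})
print(match)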

4.4.7 Case study: CouchDB, a large-scale object database In 2005, Damien Katz was looking for a better way to store a large number of complex objects using only commodity hardware. A veteran Lotus Notes user, he was familiar with its strengths and weaknesses, but he wanted to do things differently, and he created a system called CouchDB (cluster of unreliable commodity hardware), which was released as an open source document store with many of the same features of distributed computing as part of its core architecture. CouchDB has document-oriented data synchronization built in at a low level. This allows multiple remote nodes to each have different versions of documents that are automatically synchronized if communication between the two nodes is interrupted. CouchDB uses MVCC to achieve document-oriented ACID transactions, and also has support for document versioning. Written in Erlang, a functional programming language, CouchDB has the ability to rapidly and reliably send messages between nodes without a high overhead. This feature makes CouchDB remarkably reliable even when using a large number of processors over unreliable networks. Like MongoDB, CouchDB stores documents in a JSON-style format and uses a JavaScript-like language to perform queries on the documents. Because of its powerful synchronization abilities, CouchDB is also used to synchronize mobile phone data. Although CouchDB remains an active Apache project, many of the original developers of CouchDB, including Katz, are now working on another document store through the company Couchbase. Couchbase provides a distinct version of the product with an open source license. The four main patterns—key-value store, graph store, Bigtable store, and document store—are the major architecture patterns associated with NoSQL. As with most things in life, there are always variations on a theme. Next, we'll take a look at a representative sample of the types of pattern variations and how they can be combined to build NoSQL solutions in organizations.


4.5.1 Customization for RAM or SSD stores In chapter 2 we reviewed how access times change based on the type of memory media used. Some NoSQL products are designed to specifically work with one type of memory; for example, Memcache, a key-value store, was specifically designed to see if items are in RAM on multiple servers. A key-value store that only uses RAM is called a RAM cache; it’s flexible and has general tools that application developers can use to store global variables, configuration files, or intermediate results of document trans- formations. A RAM cache is fast and reliable, and can be thought of as another pro- gramming construct like an array, a map, or a lookup system. There are several things about them you should consider:

 Simple RAM resident key-value stores are generally empty when the server starts up and can only be populated with values on demand.  You need to define the rules about how memory is partitioned between the RAM cache and the rest of your application.  RAM resident information must be saved to another storage system if you want it to persist between server restarts. The key is to understand that RAM caches must be re-created from scratch each time a server restarts. A RAM cache that has no data in it is called a cold cache, which is why some systems get faster the more they're used after a reboot. SSD systems provide permanent storage and are almost as fast as RAM for read operations. The Amazon DynamoDB key-value store service uses SSDs for all its storage, resulting in high-performance read operations. Write operations to SSDs can often be buffered in large RAM caches, resulting in fast write times until the RAM becomes full. A small sketch of the cold-cache behavior follows.
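Here is a minimal sketch (plain Python, no external cache product) of the on-demand population just described: the cache starts cold, and each miss pays the full cost of computing or fetching the value before storing it for later requests.

cache = {}   # starts cold: empty after every restart

def expensive_lookup(key):
    # Stand-in for a slow query or document transformation.
    return f"value-for-{key}"

def cached_get(key):
    """Return the cached value, populating the cache on a miss."""
    if key not in cache:
        cache[key] = expensive_lookup(key)   # only misses pay this cost
    return cache[key]

print(cached_get("homepage"))   # miss: computed, then cached
print(cached_get("homepage"))   # hit: returned from RAM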

As you'll see, using RAM and SSDs efficiently is critical when using distributed systems that provide for higher volume and availability. 4.5.2 Distributed stores Now let's see how NoSQL data architecture patterns vary as you move from a single processor to multiple processors that are distributed over data centers in different geographic regions. The ability to elegantly and transparently scale to a large number of processors is a core property of most NoSQL systems. Ideally, the process of data distribution is transparent to the user, meaning that the API doesn't require you to know how or where your data is stored. But knowing that your NoSQL software can scale, and how it does this, is critical in the software selection process. If your application uses many web servers, each caching the result of a long-running query, it's most efficient to have a method that allows the servers to work together to avoid duplication. This mechanism, known as memcache, was introduced in the LiveJournal case study in chapter 1. Whether you're using NoSQL or traditional SQL systems, RAM continues to be the most expensive and precious resource in an


application server's configuration. If you don't have enough RAM, your application won't scale. The solution used in a distributed key-value store is to create a simple, lightweight protocol that checks whether any other server has an item in its cache. If it does, the item is quickly returned to the requester and no additional searching is required. The protocol is simple: each memcache server has a list of the other memcache servers it's working with. Whenever a memcache server receives a request that's not in its own cache, it checks with the other peer servers by sending them the key. The memcache protocol shows that you can create simple communication protocols between distributed systems to make them work efficiently as a group. This type of information sharing can be extended to other NoSQL data architectures such as Bigtable stores and document stores. You can generalize the key-value pair to other patterns by referring to them as cached items. Cached items can also be used to enhance the overall reliability of a data service by replicating the same items in multiple caches. If one server goes down, other servers quickly fill in so that the application gives the user the feeling of service without interruption. To provide a seamless data service without interruption, the cached items need to be replicated automatically on multiple servers. If the cached items are stored on two servers and the first one becomes unavailable, the second server can quickly return the value; there's no need to wait for the first server to be rebooted or restored from backup. In practice, almost all distributed NoSQL systems can be configured to store cached items on two or three different servers. The decision of which server stores which key can be determined by implementing a simple round-robin or random distribution system. There are many trade-offs relating to how loads are distributed over large clusters of key-value store systems and how the cached items in unavailable systems can be quickly replicated onto new nodes. A small sketch of the peer check follows.
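This minimal sketch uses plain Python dictionaries to stand in for the local and remote caches; the real memcache wire protocol is not shown.

# Each "server" is represented by its cache; a real deployment would use
# the memcache network protocol instead of in-process dictionaries.
local_cache = {"page:/home": "<html>home</html>"}
peer_caches = [
    {"page:/about": "<html>about</html>"},
    {"page:/contact": "<html>contact</html>"},
]

def get(key):
    """Check the local cache first, then ask each peer for the key."""
    if key in local_cache:
        return local_cache[key]
    for peer in peer_caches:
        if key in peer:
            local_cache[key] = peer[key]   # keep a local copy for next time
            return peer[key]
    return None   # a miss everywhere: the application must rebuild the value

print(get("page:/about"))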

In organizations that have large collections of data items, it becomes cumbersome to deal with these items if they can only be accessed in a single linear listing. You can group items together in different ways to make them easier to manage and navigate, as you'll next see. 4.5.3 Grouping items In the key-value store section, we looked at how web pages can be stored in a key-value store using a website URL as the key and the web page as the value. You can extend this construct to filesystems as well. In a filesystem, the key is the directory or folder path, and the value is the file content. But unlike web pages, filesystems have the ability to list all the files in a directory without having to open the files. If the file content is large, it would be inefficient to load all of the files into memory each time you want a listing of the files.


To make this easier and more efficient, a key-value store can be modified to include additional information in the structure of the key to indicate that the key-value pair is associated with another key-value pair, creating a collection: a general-purpose structure used to group resources. Though each key-value store system might call it something different (such as folders, directories, or buckets), the concept is the same. The implementation of a collection system can also vary dramatically based on what NoSQL data pattern you use. Key-value stores have several methods to group similar items based on attributes in their keys. Graph stores associate one or more group identifiers with each triple. Big data systems use column families to group similar columns. Document stores use the concept of a document collection. Let's take a look at some examples used by key-value stores. One approach to grouping items is to have two key-value data types, the first called resource keys and the second collection keys. You can use collection keys to store a list of keys that are in a collection. This structure allows you to store a resource in multiple collections and also to store collections within collections. Using this design poses some complex issues that require careful thought and planning about what should be done with a resource if it's in more than one collection and one of the collections is deleted. Should all resources in a collection be automatically deleted? To simplify this process and subsequent design decisions, key-value systems can include the concept of creating collection hierarchies and require that a resource be in one and only one collection. The result is that the path to a resource is essentially a distinct key for retrieval. Also known as a simple document hierarchy, the familiar concept of folders and documents resonates well with end users. Once you've established the concept of a collection hierarchy in a key, you can use it to perform many functions on groups of key-value pairs, for example (a small sketch of resource and collection keys follows this list):

 Associate metadata with a collection (who created the collection, when it was created, the last time it was modified, and who last modified the collection).  Give the collection an owner and group, and associate access rights with the owner group and other users in the same way UNIX filesystems use permissions.  Create an access control permission structure on a collection, allowing only users with specific privileges the ability to read or modify the items within the collection.  Create tools to upload and/or download a group of items into a collection.  Set up systems that compress and archive collections if they haven’t been accessed for a specific period of time. If you’re thinking, “That sounds a lot like a filesystem,” you’re right. The concept of associating metadata with collections is universal, and many file and document management systems use concepts similar to key-value stores as part of their core infrastructure.
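The following minimal sketch (plain Python, with key names invented for illustration) shows the resource-key/collection-key design described above: collection keys hold lists of member keys, so a collection can contain both resources and other collections.

# Resource keys map to the stored values; collection keys map to the
# list of keys (resources or other collections) that belong to them.
store = {
    "resource:/orders/2013-06-01.xml": "<order>...</order>",
    "resource:/orders/2013-06-02.xml": "<order>...</order>",
    "collection:/orders/2013-06": ["resource:/orders/2013-06-01.xml",
                                   "resource:/orders/2013-06-02.xml"],
    "collection:/orders/2013": ["collection:/orders/2013-06"],
}

def list_collection(key, depth=0):
    """Walk a collection key, printing member resources and subcollections."""
    for member in store[key]:
        print("  " * depth + member)
        if member.startswith("collection:"):
            list_collection(member, depth + 1)

list_collection("collection:/orders/2013")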


4.6 Summary In this chapter, we reviewed some of the basic NoSQL data architectural patterns and data structures. We looked at how simple data structures like key-value stores can lead to a simple API, which makes the structures easy to implement and highly portable across systems. We progressed from a simple interface such as the key-value store to the more complex document store, where each branch or leaf in a document can be selected to create query results. By looking at the fundamental data structures being used by each data architecture pattern, you can understand the strengths and weaknesses of each pattern for various business problems. These patterns are useful for classifying many commercial and open source products and understanding their core strengths and weaknesses. The challenge is that many real-world systems rarely fit into a single category. They may start with a single pattern, but then so many features and plug-ins are added that they become difficult to put neatly into a single classification system. Many products that began as simple key-value stores now have features common to Bigtable stores. So it's best to treat these patterns as guidelines rather than rigid classification rules. In our next chapter, we'll take an in-depth look at the richest data architecture pattern: the native XML database with complex content. We'll see some case studies where native XML database systems are used for enterprise integration and content publishing. 4.7 Further reading  Berners-Lee, Tim. "What the Semantic Web can represent." September 1998. http://mng.bz/L9a2.  "Cypher query language." From The Neo4j Manual. http://mng.bz/hC3g.  DeCandia, Giuseppe, et al. "Dynamo: Amazon's Highly Available Key-value Store." Amazon.com. 2007. http://mng.bz/YY5A.  MongoDB. Glossary: collection. http://mng.bz/Jl5M.  Stardog. An RDF triple store that uses SPARQL. http://stardog.com  "The Linking Open Data cloud diagram." September 2011. http://richard.cyganiak.de/2007/10/lod/.  W3C. "Packaging System: EXPath Candidate Module 9." May 2012. http://expath.org/spec/pkg.

Native XML databases

This chapter covers

 Building native XML database applications  Using XML standards to accelerate application development  Designing and validating documents with XML schemas  Extending XQuery with custom modules

The value of a standard is proportional to the square of the number of systems that use it. —rephrasing of Metcalfe’s law and the network effect as applied to standards

Have you ever wanted to perform a query on the information in a web page, a Microsoft Word document, or an Open Office presentation? Just think, if you have links in an HTML page, wouldn’t it be nice to run a query to validate that all the links are working? All of these document types have a common property referred to as mixed content (data containing text, dates, numbers, and facts). One of the challenges of using document databases is that they don’t support the use of que- ries on mixed content. Native XML databases, which have been around longer than



other NoSQL databases, do allow queries on mixed content and as a result can sup- port a number of security models and standardized query languages. Native XML databases store and query a wider range of data types than any other NoSQL data store. They have an expressive data format that allows you to store struc- tured tables as well as unstructured documents, and they provide superior search ser- vices. They tend to support web standards better than other document stores, which increases their portability between native XML systems. But the reason most organiza- tions choose native XML databases is because they increase application development productivity and their ease of use allows nondevelopment staff to build and maintain applications. After reading this chapter, you’ll understand the basic features of native XML data- bases and how they’re used to solve business problems in specific areas such as pub- lishing or search. You’ll become familiar with the process of building a native XML database application and transforming XML data, searching, and updating elements, and you’ll see how standards are used to accelerate application development. Finally, since almost all native XML databases use the XQuery language to query documents, we’ll look at how XQuery is used to transform XML documents and how it can be extended to include new functionality. Defining a native XML database is the first step in understanding when and how they can help you solve business problems. In the next section, we’ll start with a defini- tion and then give some examples of the types of problems native XML databases can help you solve. 5.1 What is a native XML database? Native XML databases use a document data architecture pattern. Like other document- oriented databases, they have no need for middle-tier, object-relational mapping, nor do they use join operations to store or extract complex data. Their ability to calculate hashes for each query and document makes them cache-friendly, and their ability to store data in a distributed environment allows them to scale elegantly. Native XML databases existed before the term NoSQL was popular. They’re more mature in some areas, and they integrate well with web standards managed by the World Wide Web consortium (W3C). Today you see native XML databases such as MarkLogic in areas of government, intelligence, integration, publishing, and content management. Native XML databases are unique in the NoSQL world because they attempt to reuse data formats and standards to lower overall development costs. The philosophy is driven by avoiding the introduction of proprietary database-specific query lan- guages that will lock your application into a single database. Native XML databases are forced to compete against each other in terms of ease of use, performance, and the ability to scale queries without needing to introduce new APIs. Many of the open source native XML databases share both concepts and extension functions.


Like the JSON (JavaScript Object Notation) data format (a text-based standard for human-readable data interchange), XML stores its information in a hierarchical tree structure. Unlike JSON, XML can store documents with mixed content and namespaces. Mixed-content systems allow you to mix sequences of text and data in any order. Elements in XML can contain text that has other trees of data interspersed throughout the text. For example, you can add HTML links that contain bold, italic, links, or images anywhere inside a paragraph of text. HTML files are a perfect example of the type of mixed content that can’t be stored by or queried in a document data- base that only supports the JSON format. Figure 5.1 shows the relationship between the features of comma-separated value (CSV) flat files used in many SQL systems, JSON, and XML documents. If you use spreadsheets or load data into RDBMS tables, you know that CSV files are an ideal mechanism for storing data that’s loaded into a single table. The CSV struc- ture of commas separating fields and newline characters separating rows is frequently used to transfer data between spreadsheets and RDBMS tables. JSON files are ideal for sending serialized objects to and from a web browser. JSON allows objects to contain other objects, and works well in hierarchical structures that don’t need to store mixed content or use multiple namespaces. Due to its familiarity in the JavaScript world, JSON is the de facto standard for storing hierarchical docu- ments within a document store. But it was never designed as a general-purpose con- tainer for mixed-content markup languages such as HTML. As mentioned, XML files are the best choice when your document includes mixed content. XML also supports an often-controversial feature: namespaces. Namespaces allow you to mix data elements from different domains within the same document, yet retain the source meaning of each element. Documents that support multiple namespaces allow applications to add new elements in new namespaces without dis- rupting existing data queries.


Figure 5.1 The expressiveness of three document formats. Comma-separated value (CSV) files are designed to only store flat files that don’t contain hierarchy. JavaScript Object Notation (JSON) files can store flat files as well as hierarchical documents. Extensible Markup Language (XML) files can store flat files, hierarchical documents, and documents that contain mixed content and namespaces.


For example, a database of online articles may contain elements that reference an external library classification standard called the Dublin Core (elements like title, author, and subject from traditional book cataloguing), tags for RSS feeds (Atom), and additional presentation elements (HTML). By noting they’re in the Dublin Core namespace, all cataloguing tools can be set up to automatically recognize these ele- ments. Reusing external standards instead of inventing new tag names and constantly remapping them to external standards makes it easier for external tools to understand these documents. Though useful, namespaces also have a cost. Tools that use multiple namespaces need to be aware of multiple namespaces, and developers need training to use these tools correctly. Namespaces are controversial in that they add an additional layer of complexity that’s frequently not necessary in simple domain documents. Since single- domain documents are used in training development staff, namespaces may seem unnecessary and difficult to understand when first introduced. Without proper tools and training, developers can become frustrated with documents that contain multiple namespaces. If you’re familiar with the format of a web page that uses HTML, you’ll quickly understand how XML works. Figure 5.2 shows a sales order with XML markup.

Figure 5.2 A sales order in XML markup format. The file begins with a processing instruction that indicates what version of XML the file uses, as well as what character encoding system is used (UTF-8). Each element has matching begin and end tags. The begin tags start with < and the end tags start with </.


The key difference between HTML and XML is that the semantics (or meaning) of each HTML element has been predefined by W3C standards. In the sales order exam- ple, the meaning of each sales order element is determined by the individual organi- zation. Even though there may be an XML standard for sales orders, each organization can decide to create data element names specific to their organization and store those element names, and their definition, in a data dictionary. Anyone who’s familiar with XML recognizes it as a mature standard with an ecosys- tem of tools to support it. There are tools to extract subtrees from XML documents (XPath), tools to query XML documents (XQuery), tools to validate XML documents (XML Schema and Schematron), and tools to transform XML documents from one format to another. RDBMS vendors like Oracle, Microsoft, and IBM have included XML management features in their products. Their approach is to add an XML column type to the RDBMS. Once added, XML documents are stored in the column in the same way a binary large object (BLOB) is stored. Though this strategy meets the needs of many use cases, it lacks portability since object-relational mapping tools don’t generate the cor- rect SQL to select XML elements within BLOBs. The main disadvantage of XML, and a limitation of native XML systems, is that XML is a standard that attempts to solve many different types of problems with a single for- mat. Without adequate tools and training, development staff may become frustrated with XML’s complexity and continue to use simpler formats such as JSON or CSV. With- out good GUI tools, developers are forced to view raw XML files that can be more ver- bose than the corresponding JSON representation. Despite this issue, many developers find that native XML databases offer a simpler way of solving problems. Their rich query language and XML standards help to lower overall costs. It should also be noted that native XML databases don’t store literal XML. They store a compact version in a compressed format. XML files are only used to put data into and retrieve data from the database. Now that you have an understanding of what native XML databases are and the kinds of problems they solve, let’s explore how native XML databases are used to build applications that can add, transform, search, and update XML documents. 5.2 Building applications with a native XML database Native XML databases and their associated tools are key to increasing developer pro- ductivity. Using simple query languages allows programmers and nonprogrammers to simply create and customize reports. The combination of standards, mature query languages, robust tools, and document orientation makes the application develop- ment process faster. You’ll find it difficult to convince native XML users to move to another platform once they’ve created applications with these systems. Getting started with a native XML database can be simpler than you think. If you understand the concept of dragging files and folders, you’re well on your way to creat- ing your first native XML database. As you browse through the next section, you’ll see how simple it can be to get started.


5.2.1 Loading data can be as simple as drag-and-drop Most people are familiar with the concept of dragging and dropping to copy files from one location to another. Adding data to a native XML database can be that easy. Figure 5.3 shows how simple it can be to add new sales order data to an XML database. You should note that you don’t need to perform any entity-relational modeling prior to loading your data. The metadata structures of an XML file will be used to cre- ate all relevant indexes within the database. By default, every element is indexed for immediate search, and each leaf element is treated as a string unless previously associ- ated with a data type such as a decimal or date. Native XML databases have many options to load data. For example, you can

 Use an integrated XML IDE such as the Eclipse-based oXygen XML editor with built-in support for a native XML databases to upload a single file or a collection of XML files. The oXygen IDE can be used as a standalone program or as an Eclipse plug-in. Note: oXygen is a commercial product sold by SyncroSoft.  Use a command-line tool or a UNIX shell script to load data from a file on your filesystem.  Use a build script and Apache Ant task to load data. Many native XML databases come with Apache Ant extensions for all database operations.  Use an “uploader” web page that allows you to upload a local file into a remote XML database.  Use a backup-and-restore tool to load many XML files from an archive file.

Figure 5.3 Adding new sales order data to an XML database can be as easy as doing a drag-and-drop. Many desktop operating systems such as Windows and Mac OS X support remote access using the WebDAV protocol. This allows you to add new files to your XML database by dragging an XML file into a database collection. All metadata associated with the XML file is used to index each element. As soon as the file is added, the data can be immediately searched using XQuery. Data modeling is not required before the data is added.


 Use a full desktop-based client that has been created for use with a specific native XML database. For example, a thick Java client provided with the database may provide upload or store features.
 Use a low-level Java API or an XML-RPC interface to load data (this requires a background in Java development).

There are also third-party applications like Cyberduck (for Windows and OS X) or Transmit (for OS X) that support the WebDAV protocol and will make your database look like an extension of your filesystem. These tools are preferred by many developers for managing native XML collections because they can be used by nonprogrammers. They can be used for uploading new documents as well as moving, copying, and renaming files in the same way you move and copy files in a filesystem.

If you wanted to perform these same move, copy, or rename operations on RDBMSs, it would require writing complex SQL statements or using a graphical user interface tool customized for database administration. With native XML systems, these operations are done with generic tools that may already be on your desktop and familiar to your users, which results in rapid learning and fewer dollars invested in training development skills.

Now that you know that native XML databases can be manipulated like folders, let's see how these structures (called collections) are used by native XML databases.

5.2.2 Using collections to group your XML documents

Filesystems use a folder concept where folders can contain other folders as well as documents. Native XML databases support a similar structure called collection hierarchies, which can contain both XML files and other collections. Using collections to store your documents will make them easier to manage.

Collections group your documents and data into subcollections in a way that's meaningful to your organization. For example, you can place daily sales orders for each month into a separate collection for that year and month. Native XML databases can easily query all XML documents in any folder or any subfolder to find the document you request.
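For example, the following XQuery sketch finds every large order stored anywhere below a monthly collection. The collection path and the order and total element names are assumptions for illustration; only the standard fn:collection() function is used.

(: find all orders over 1000 loaded into the June 2013 collection or any of its subcollections :)
for $order in collection('/db/sales/2013/06')//order
where xs:decimal($order/total) > 1000
return $order

Because the query references the root of the collection hierarchy, it keeps working even if the orders are later regrouped into daily subcollections.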

CUSTOMIZING UPDATE BEHAVIOR WITH TRIGGERS
Collections also allow you to customize the behavior of any action on all documents within a collection by using database triggers. These triggers are executed when any document is created, updated, deleted, or viewed. For example, a typical function of a document collection is to indicate which elements in its documents should be indexed. Once a trigger has been configured for a collection, any document added or changed below that collection or its subcollections will be automatically processed and indexed. Triggers are usually defined by placing an XML configuration file in the collection; the file associates an XQuery function with each of the trigger types (insert, update, delete). Triggers are ideal places to specify document rule checks (such as validation), copy the prior version of a document into a version store, place a backup copy on a remote server, or log specific events.
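As a rough sketch, such a configuration file might look like the following. The element and attribute names here are purely illustrative assumptions; each native XML database defines its own trigger configuration format, so check your product's documentation for the real syntax.

<!-- hypothetical trigger configuration stored in a collection -->
<triggers>
  <!-- run an XQuery logging function whenever a document is created or updated -->
  <trigger events="create update" module="audit.xqm" function="local:log-change"/>
  <!-- archive the prior version of a document before it is deleted -->
  <trigger events="delete" module="versioning.xqm" function="local:archive-prior-version"/>
</triggers>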


WORKING WITH FLEXIBLE DATABASES
XML databases are designed to be flexible with the amount of data stored in a collection or XML file. They work equally well if you have a single file with 10,000 sales transactions or 10,000 individual transaction files. The query you write won't need to be changed if you alter how the files are grouped, as long as you reference the root collection of your documents.

There are some design considerations to think about when you're performing concurrent reads and writes on a document collection. Some native XML databases only guarantee atomic operations within a single document, so placing two data elements that must stay consistent within a single document might be a good practice. For example, a sales order might have both the individual line-item amounts and the sales order total within the same XML document. When the document is updated, all numbers will be updated in a consistent state.

You may also want to use a database lock on a document or a subdocument to prevent other processes from modifying it at critical times. This feature is similar to the file locks used on a shared file in a filesystem. For example, if you're using a forms application to edit a document, you may not want someone else to save their version over your document. Locking a document helps you avoid conflicts when multiple people attempt to edit the same document at the same time. You can also calculate a hash of a document as you load it into an editor to verify that it hasn't been modified by someone else while you've been editing it. This strategy, along with the use of HTTP ETags, helps you avoid the missing-updates problem: two users open the same file, and the changes made by user 1 are overwritten by user 2.
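A sketch of such a self-contained sales order is shown below; the element names and values are illustrative assumptions.

<!-- hypothetical sales order: the line-item prices and the order total live in
     the same document, so one atomic document update keeps them consistent -->
<sales-order id="SO-1001">
  <order-item><product>A-100</product><price>20.00</price></order-item>
  <order-item><product>B-200</product><price>15.50</price></order-item>
  <order-total>35.50</order-total>
</sales-order>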

STORING DOCUMENTS OF DIFFERENT TYPES
Native XML database collections can be used as a "primary taxonomy" to store documents of different types. This is similar to the categorization of a book in a library, where the book is found according to its primary subject. But unlike a physical book, a document can contain many different category elements and thus doesn't have to be copied or moved between collections. An XML document can contain keywords and category terms that allow search tools to find documents that fit multiple subject categories.

GROUPING DOCUMENTS BY ACCESS PERMISSIONS
Collections can also be used as a way to group documents according to who can access them. You may have an internal policy that only allows users with a particular role to modify a document. Some native XML systems provide basic UNIX-style permissions on each collection to allow specific groups write access. UNIX-style permissions result in a fast calculation of which users have access to a document.

The disadvantage of using UNIX-style permissions is that only a single group can be associated with a collection. Other native XML collection management systems add more flexible access control lists to protect each collection according to multiple groups or roles. Role-based collection management is the most robust permission system and is preferred by large organizations with many concurrent users from multiple business units. Unfortunately, only some native XML databases support role-based access control.


An example of using security-based collections to group web application functionality is shown in figure 5.4.

Figure 5.4 An example of using native XML collections to manage a set of web applications. The root folder is in the main database, and kma-site is a subfolder for one specific website. The apps folder within it contains all of the applications, one collection per application. The books collection contains subfolders for that application: books/data contains the XML book data, books/edit stores tools to change the book data, and books/views holds the different report views of the book data. Each collection may have different permissions associated with it, so users who don't have modify rights may not be able to access the books/edit collection.


In this example, only a subset of all roles can add or update the XML files in the data collection. A larger number of roles can view and search for books using the queries in the view and search collections. The settings in the overall application information file (app-info.xml) associate each role with specific read and write permissions.

Native XML databases can store many different views of the same structure. Each view is a different transformation of the underlying XML data. In the next two sections, you'll see how XPath and XQuery are used to create these views.

5.2.3 Applying simple queries to transform complex data with XPath

XPath, a language that works within XQuery, allows you to easily retrieve the data you're looking for, even in complex documents, using short path expressions. Keeping your path expressions short allows you to quickly locate and retrieve the specific information within the document you're interested in, and not spend time writing long queries or looking at extraneous data.

XPath expressions are similar to the path commands you type into a DOS or UNIX shell to navigate to a specific directory or folder. XPath expressions consist of a series of steps that tell the system how to navigate to a specific part of the document. For example, the XPath expression $my-sales-order/order-item[3]/price will return the price of the third item in a sales order. The forward slash indicates that the next step in the hierarchy should be the child of the current location.

XML documents have a reputation for being complex, with good reason. When you apply all the features of XML in a single document (for example, mixed content, namespaces, and encoded elements), you can create documents that are difficult for humans to read. These documents are often complex because they try to accurately capture the structure of the real world, which is sometimes complex. Yet using complex structures doesn't imply that the queries must also be complex. This fact may not be intuitive, but it's one of the most important and sometimes overlooked qualities of native XML databases and XQuery. Let's review this concept using a concrete example.

Each of the figures in this book is assigned a number. The first digit represents the chapter number (1, 2, 3), and the number following the chapter is a sequential number which represents where the figure is located in the chapter. To calculate the figure number, we only need to count the number of prior figures in each chapter. Figure 5.5 is an example of an XPath expression that does this:

count($figure/preceding::figure) + 1

Figure 5.5 How a complex document is queried using a simple XPath query expression. Starting from the current figure element ($figure), the preceding:: axis selects all the prior figure elements in the chapter, the count() function counts them, and adding one gives the current figure number. This allows sequential numbering of figures in each chapter as the book is converted from XML to HTML or PDF.


This XPath expression may be unfamiliar to many, but its structure isn't complex. You tell the system to start at the current figure, count the number of preceding figures in the chapter, and add one to get the current figure number. To make finding path expressions easy, there are also XML tools that allow you to select any element in an XML file, and the tools will show you the path expression required to select that specific element.

XPath is a key component that makes managing complexity easy. If you were using SQL, you'd need to store your complex data in many tables and use joins to extract the right data. Although an individual RDBMS table may have a simple structure, storing complex data in SQL usually requires complex queries. Native XML systems are just the opposite. Even if the data is complex, there are often simple XPath expressions that can be used to efficiently get the data you need out of the system.

Unlike other XML systems, native XML databases tend to use short, single-element XPath expressions because each individual element is indexed as it's added to the database. For example, the path expression collection('/my-collection')//PersonBirthDate will find all the person birth date records in a collection even if each document has a radically different structure. The downside to this is that you need more disk space to store your XML data. The upside is that queries are simple and fast.

Now that you have a feel for how XPath works, let's look at how the XQuery language is used to integrate XPath expressions to deliver a full solution that converts XML data directly into other structures.

5.2.4 Transforming your data with XQuery

Using the XQuery language and its advanced functional programming and parallel query execution features allows you to rapidly and effortlessly transform large datasets. In the NoSQL world, XQuery replaces both SQL and application-level functional programming languages with a single language.

One of the greatest advantages of using native XML databases over other document stores is how they use the XQuery language to transform XML data. As you'll see, XQuery is designed to query tabular as well as unstructured document data. XQuery is a mature and widely adopted standardized query language, carefully constructed by query experts using a rigorous peer review process. Although other languages can be used with native XML databases, XQuery is preferred due to its advanced functional structure and ability to run on parallel systems. A sample of XQuery is shown in figure 5.6.

As we move into our next few sections, you'll see why users that need to query large amounts of unstructured data prefer XQuery over other transformation languages.

XQUERY—A FLEXIBLE LANGUAGE
The XQuery language was developed by the World Wide Web Consortium (W3C), the same organization that defined many other web and XML standards.


Not long after the XML standards were published, the W3C formed a standards body that included many experts in query language design. Some of these experts assisted in developing standards for SQL, and in time found SQL inappropriate for querying nontabular data. The W3C query standards group was charged with creating a single query language that would work well for all use cases, which included business data as well as text documents, books, and articles. Beginning with a list of relational and document query languages and a set of 70 use cases for structured and unstructured datasets, the W3C group embarked on a multiyear mission to define the XQuery language. XQuery was designed to be a flexible language that allowed each of the 70 use cases to be queried and run on multiple processors, and to be easy to learn, parse, and debug. To accomplish its task, it borrowed concepts from many other query systems. This rigorous process resulted in a query language that's widely adopted and used in many products. These XQuery products include not only native XML databases, but also in-memory data transformation and integration tools.

Figure 5.6 A sample FLWOR statement in XQuery that shows the for, where, order by, and return clauses. Many structures in XQuery, such as the where and order by clauses, are similar to SQL, so many SQL developers quickly learn basic XQuery statements:

for $sale in doc('sales.xml')//sale
where $sale/amount > 15
order by $sale/amount descending
return $sale

This FLWOR statement has a for loop without any let clause. The for clause gets all the sale nodes in the sales.xml file, the where clause (similar to SQL) returns only amounts over 15, order by changes the sort order, and return returns a sequence of sale nodes.

XQUERY—A FUNCTIONAL PROGRAMMING LANGUAGE
XQuery is defined as a functional programming language because its focus is the parallel transformation of sequences of data items using functions. Note that we'll cover the topic of functional programming in depth in chapter 10. With XQuery, functions can be passed as parameters to other functions. XQuery has many features not found in SQL that are used for the efficient transformation of hierarchical XML data. For example, XQuery allows you to call recursive functions, and it returns not only tables but any other tree-like data structure. XQuery can return simple XML or a sequence of items, including JSON or graph structures. Due to its functional nature, XQuery can be easier to execute on multiple-CPU systems.

The parallel processing power of XQuery comes from the FLWOR statement. FLWOR stands for for, let, where, order, and return, as shown in figure 5.6. Unlike the for loops found in procedural languages such as Java or .NET, FLWOR statements can execute as independent parallel processes and run on a large number of CPUs or different processors.
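As a small illustration of this functional style, the sketch below defines a recursive user-defined function that computes how deeply an XML document is nested. The function name and the sample file are assumptions; only standard XQuery 1.0 constructs are used.

(: recursively compute the nesting depth of an element tree :)
declare function local:depth($e as element()) as xs:integer
{
  if ($e/*)
  then 1 + max(for $child in $e/* return local:depth($child))
  else 1
};

local:depth(doc('sales.xml')/*)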

XQUERY—CONSISTENT WITH WEB STANDARDS
XQuery is designed to be consistent with other W3C standards beyond XPath. For example, XQuery shares data types with other XML standards such as XML Schema, XProc, and Schematron. Because of this standardization, XQuery applications tend to be more portable than applications that must be ported between SQL databases.


XQuery can return any tree-structured data, including tables, graphs, or entire web pages. This feature eliminates the need for a middle-tier, table-to-HTML translation layer. Avoiding a middle-tier translation, which includes a separate programming language and another data type translation, makes the software development process simpler and more accessible to nonprogrammers. The elimination of the middle-tier, relational-to-object-to-HTML translation is one of the core simplification patterns that makes NoSQL systems more agile and allows faster development of web applications.
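A minimal sketch of this pattern is shown below: a FLWOR statement builds an HTML table directly from XML data, with no middle-tier translation layer. The file name and the customer and amount element names are illustrative assumptions.

(: render sales data directly as an HTML table :)
<table>{
  for $sale in doc('sales.xml')//sale
  order by xs:decimal($sale/amount) descending
  return
    <tr>
      <td>{ string($sale/customer) }</td>
      <td>{ string($sale/amount) }</td>
    </tr>
}</table>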

ONE QUERY LANGUAGE OR MANY?
There are many trade-offs to consider when selecting a query language for your database. Small template languages are easy to create and easy to learn, but can be difficult to extend. Other languages are designed to be extensible with hundreds of extension functions. Of the many new query languages created over the last dozen years, XQuery stands out as the most ambitious. Defined by the W3C over a six-year period using query experts from many domains, its goal was to create a single language that can query a broad spectrum of data structures. Where SQL is perfect for working with tabular data, and languages like XSLT are ideal for transforming documents, XQuery alone has become the central unification language for querying diverse types of data in a single language. XQuery also attempts to combine the best features of modern functional programming languages to prevent side effects and promote caching (see chapter 10 for further details).

Some of the use cases that drove the specifications of XQuery include these:

 Queries on hierarchical data—Generating a table of contents on a document with nested sections and subsections.
 Queries on sequences—Queries based on a sequence of items. For example, in a calendar, what events happen between two other events?
 Queries on relational data—Queries similar to SQL where joins are used to merge data from multiple tables that share common keys.
 Queries on documents—Finding titles, paragraphs, or other markup within the chapter of a book.
 Queries on strings—Scanning news feeds and merging data from press releases and stock data.
 Queries on mixed vocabularies—XML data that merges information from auctions, product descriptions, and product reviews.
 Recursive parts explosion—How a recursive query can be used to construct a hierarchical document of arbitrary depth from flat structures.
 Queries with strongly typed data—Using type information in an XML schema to transform a document.

Now that we've discussed the steps used to import XML and how to use XPath and XQuery to transform XML data from one form into another, we'll review how to run updates on XML data and search XML documents that contain full text (structures that contain natural-language text, for example, English).


JSONiq and XSPARQL
There are two proposed language extensions to XQuery: JSONiq, which adds the ability to query JSON documents as well as XML structures, and XSPARQL, which allows RDF data stores to be queried using XQuery. Some of the features can be implemented using standard XQuery functions, but there are additional benefits to extending the XQuery language definition to include these features.

5.2.5 Updating documents with XQuery updates

As you may recall from section 5.2.1, when new XML documents are loaded into a native XML database, every element is immediately indexed. If you think about it, you'll see that this can be an expensive process, especially when you have large documents that also contain many full-text structures that must be re-indexed each time the document is saved.

To make saving documents with small changes easier, the W3C has provided a standard for updating one or more elements within an XML document without having to update the entire document and the associated indexes. The update operations are

 insert—Insert a new element or attribute into a document.
 delete—Delete an element or attribute in a document.
 replace—Replace an element or attribute in a document with a new value.
 rename—Change the name of any element or attribute to a new name.
 transform—Transform an element into a new format without changing the underlying structure on disk.

For example, to change the price of a specific book in your database, the code might look like figure 5.7. XQuery update operations weren't part of the original 2006 XQuery 1.0 specification, and at this point you might see books and XML systems that haven't taken advantage of the full specification. Some systems provide a method of updating XML, but use a nonstandard method of doing so. Update operations are critical to make it easy

replace value of node doc('books.xml')//book[ISBN='12345']/price with 39.95


Figure 5.7 A sample XQuery replace statement that changes the price of a book with a specific ISBN number to a new price. replace value of node is placed before the element and with is placed before the new price. The net effect is that you can update a specific element without the overhead of replacing and re-indexing the entire document.
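The other update operations follow the same pattern. The sketches below show an insert and a delete against the same books.xml file, using the W3C XQuery Update Facility syntax; the discount element and the second ISBN value are illustrative assumptions.

(: add a discount element to the book updated above :)
insert node <discount>0.10</discount> into doc('books.xml')//book[ISBN='12345']

(: remove an out-of-print book entirely :)
delete node doc('books.xml')//book[ISBN='99999']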


and efficient to update XML documents. The larger the XML documents are, the more critical it is to use the update function. The W3C standardized XQuery updates in 2011. The other XQuery specification that was released by the W3C in 2011 was the full-text extension, which we’ll cover next.

5.2.6 XQuery full-text search standards

Native XML systems are used to store both business documents like sales orders and invoices, and written documents like articles or books. Because native XML databases are used to store large amounts of textual information, there's a strong demand for high-quality search of these documents. Fortunately, the W3C has also created XQuery-based standards for full-text search.

XQuery supports a search extension module that standardizes how search functions are performed on a full-text database (a database that contains the complete text of the books, journals, magazines, newspapers, or other kinds of textual material in a collection). This extension to the XQuery language specifies how search queries should be expressed in terms of XQuery functions.

Search standards are important because they allow your XQuery search applications to be portable to multiple native XML databases. In addition, using standards in full-text search code and processes allows staff to port their knowledge from one XML database to another, reducing training and application development time. The specification also provides guidelines on advanced functions such as Boolean and nearness operations. The key difference is that with native XML databases, each node can be considered its own document and have its own indexing rules. This allows you to set up rules to weight matches in the title of a document above matches in the body of a document. We'll review the concepts of search and weighting in greater detail in chapter 7.

We've now reviewed how to build a web application using a native XML database. You've seen how data is loaded, transformed, updated, and searched using XQuery and XQuery extensions. Now let's take a look at other standards used in native XML databases that allow you to build portable applications.

5.3 Using XML standards within native XML databases

XML standards allow you to reuse your knowledge and code as you move from one native XML database to another. They also allow you to keep your NoSQL applications portable across different implementations of NoSQL databases and prevent vendor lock-in. Standards make it easier to learn a new NoSQL system, which helps you and your team get your products to market faster. If you're already familiar with an API or data standard, your application development using that standard will be faster with each subsequent project.

Let's begin with an overview of some of the XML standards we've discussed and add some new standards to the mix. Table 5.1 lists the key standards used in native XML systems, the organization associated with the standard, and a description of how the standard is used.


Table 5.1 XML standards. Application portability between native XML systems is highly sensitive to the standards each database implements. Note that not all standards are published by the W3C.

Standard name | Standards organization | Description

Extensible Markup Language (XML) | W3C | The XML standard specifies how tree-structured data is stored in a text file using elements and attributes. The standard has precise information about what character sets are used, how special characters are escaped, and any special processing instructions that might be used by an application. Unlike JSON, XML supports multiple namespaces and mixed content.

XPath | W3C | The XPath specification describes how to select a subset of an XML file using simple path expressions. Path expressions are steps into a part of the document or complex expressions with conditional statements and loops. XPath is a building-block specification that's used in other XML specifications including XSLT, XQuery, Schematron, XForms, and XProc.

XML Schema | W3C | XML schemas are XML files used to quickly validate the structure of an XML document in a single pass and check the format rules of each leaf element. XML schemas are designed so that validation on large documents occurs quickly. XML Schema 1.1 has added new features that allow XPath expressions to be used to validate documents. XML Schema is a mature standard and is supported by graphical design tools.

XQuery | W3C | XQuery is a W3C standard for querying XML files and XML databases. XQuery is considered a functional programming language and is built around a parallel programming construct called a FLWOR statement that can be easily executed on multiple processors.

XQuery/XPath full-text search | W3C | The W3C full-text search standard specifies how full-text searches should be implemented in any XQuery or XPath engine.

Schematron | ISO/IEC | Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. Unlike XML Schema, Schematron allows you to express if/then rules in XPath that can apply to any node in an XML document. A rule about the last node in a large file may reference an element in the first node of the file, so the entire file may need to be in memory to validate the rules.

XProc | W3C | XProc is a W3C XML declarative language for pipeline processing of documents. Typical steps might include expanding includes, validating, splitting documents, joining documents, transforming, and storing documents. XProc leverages other XML standards including XPath.

XForms | W3C | XForms is a W3C XML declarative language standard for building client applications that use a model-view-controller (MVC) architecture. It has the ability to create complex web applications without using JavaScript.

XSL-FO | W3C | XSL-FO is a document formatting standard ideal for specifying paginated layout used in printed material. Unlike HTML, XSL-FO has features that allow you to avoid placing items on page break boundaries.

EXPath | EXPath W3C Committee | Repository of XML-related standards that aren't currently in the scope of other standards organizations. Example libraries include HTTP, cryptography, filesystems, FTP, SFTP, and packaging and compression libraries for XML applications.

NVDL | ISO/IEC | NVDL (Namespace-based Validation Dispatching Language) is an XML schema language for validating XML documents that integrate multiple namespaces. It's used within rule-based text editing systems that provide as-you-type rule checking.

You may find each of the standards listed here applicable in different business situations. Some standards are used to define document structures, and others focus on validating or transforming XML documents. This isn't an exhaustive list, but rather a representative sample of available standards. As we move into our next section, you'll see how you can apply the XML Schema standard to validate your XML data.

5.4 Designing and validating your data with XML Schema and Schematron

XML schemas are used to design and validate XML documents stored in native XML databases. A schema can be used to precisely communicate the structure of your documents with others as well as validate the XML document structure. Frequently, simple graphical tools are used in the design process, which lets subject matter experts and business analysts participate and take control of this task.

The terms schemaless and schema-free occur frequently in NoSQL. In general, the terms indicate that you don't have to create a full entity-relation-driven physical schema, using a data definition language, prior to storing data in your NoSQL system. This is true for all of the NoSQL databases we've discussed (key-value stores, graph stores, Bigtable stores, and document stores). Native XML databases, though, provide the option of designing and validating documents at any time in the data loading lifecycle using a schema. In our case, the schemas aren't required to load data; they're only optionally used to design and validate documents.

5.4.1 XML Schema

XML Schema is a foundational W3C specification that's reused by other specifications to make XML standards consistent. A good example of this reuse is the data type system defined in the original XML Schema specification. This data type system is reused in other XML specifications. Because XPath, XQuery, XProc, and XForms all use the


same data types defined in the original XML Schema specification, it's easy to verify data types and check for data types in your functions. Once you learn the data types used on one system, you'll know how to use them in all systems.

For example, if you have a data element that must be a nonzero positive integer, you can declare the data type of that element in your XML schema as xs:positiveInteger. You can then use an XML schema to validate those elements and receive notification when data varies from a valid format. If a zero, negative, or non-integer value is used in that element, you can receive a notification at any stage of your data loading process. Even if you have data issues, you can choose to load the data and perform cleanup operations using a script later on. In a similar way, an XQuery function that must have a positive integer as a parameter can use the same positive integer data type and perform the same consistency checks on input or output elements.

Because XML Schema is a widely used mature standard, there are graphical tools available to create and view these structures. An example of this view is shown in figure 5.8 using the oXygen IDE. This figure shows how simple graphical symbols are used to show both the structure and rules of a document. After learning the meaning of around a dozen symbols, nontechnical users can play an active role in the design and verification of document structures. For example, a solid black line in a schema diagram indicates that a required element must be present for the document to be valid. A gray line implies that an element is optional. One quick glance at a diagram can quickly indicate a specific rule. Using a black solid line for required elements isn't part of any W3C standard, but most XML developer tools use similar conventions.

XML schemas are designed to perform a single pass, looking at the structure of the document and the data formats in each of the leaf elements. This single-pass approach can check approximately 95% of the rules that concern business users. There are still a few types of rules that XML schemas aren't designed to check. For these rules, you use a companion format called Schematron.
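Before we look at Schematron, here's a minimal sketch of the positive-integer declaration described above. The order-quantity element name is an illustrative assumption; only the standard XML Schema vocabulary is used.

<!-- declare an element whose content must be a nonzero positive integer -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order-quantity" type="xs:positiveInteger"/>
</xs:schema>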

5.4.2 Using Schematron to check document rules

Schematron is considered the "feather duster" of document validation. Schematron's focus is the hard-to-reach areas of data type checking that can't be done in a single pass with XML Schema. Let's say you want to check that the sum of all the line items in a sales order is equal to the sales total; you could do this with a Schematron rule. Schematron rules are used whenever you're comparing two or more places in an XML document.

Users like Schematron document rules because they can customize the error message associated with each rule to have a user-appropriate meaning. This customization is more difficult with XML Schema, where an error message tells you where the error occurred within the file but may not return a user-friendly message. For this reason, Schematron is sometimes preferred in situations where error messages are sent to a system user.
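A minimal sketch of such a rule is shown below. The sales-order, line-item, amount, and total element names are illustrative assumptions; the rule itself uses the standard ISO Schematron vocabulary, and the assert message is the custom, user-friendly error text.

<!-- hypothetical Schematron rule: the order total must equal the sum of its line items -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <sch:rule context="sales-order">
      <sch:assert test="number(total) = sum(line-item/amount)">
        The sales order total must equal the sum of the line-item amounts.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>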


Figure 5.8 Sample XML schema diagram showing how symbols are used to display document structure. In this example, the file shows the structure of a collection of one to many books. Each book has required elements (shown with darker lines, like id and title) as well as optional elements (shown with lighter lines, like description). Some elements, such as author-name, can be repeated, allowing many authors to be associated with a single book. The XML schema elements that have a suffix of code use enumerated values to specify the possible element values. Data types such as decimal can also be specified with each element.

All Schematron rules are expressed using XPath expressions. This means you can use simple path statements to select two parts of a document and do a comparison. Since native XML databases index each element specified by a path, only part of the document needs to be moved into RAM when running a rules check. Schematron rules can also be configured to perform date checks and run web service validations. Schematron is both simple and powerful. Many users say it's one of the most underutilized features in the XML family of products.


Together, XML Schema and Schematron provide powerful and easy-to-use tools to design XML documents and validate their structures. Graphical tools and simple path expressions make these tools available to a wide audience. Making projects easier to use is a central theme of many XML projects. Next you'll see how XQuery developers are creating custom modules that make it easier to take advantage of XQuery extensions.

5.5 Extending XQuery with custom modules

Originally, XQuery was designed to have a narrow focus: querying XML files and databases. Today, XQuery is used in a broader context that includes additional use cases. For example, XQuery is replacing middle-tier languages like PHP in many web applications. As a result, there's an increased demand for new functions and modules that go beyond the functions that were part of the original XQuery 1.0 specification. EXPath is a central repository of these functions and can be used with multiple databases. Using EXPath functions will make your applications more portable between these databases.

Unlike SQL, XQuery is extensible, allowing you to use your own as well as other developers' custom functions. XQuery functions are ideal for removing repetitive code or abstracting complex code into understandable units. The XQuery 1.0 specification contains more than 100 built-in functions for tasks like processing strings, dates, URIs, sums, and other common data structures. XQuery 3.0 has also added a handful of new functions for formatting dates and numbers. But the real strength of XQuery is the easy way new functions are added.

An excellent example of this is the FunctX library. FunctX contains almost 150 additional functions that can be added to your system by downloading the function library and adding an import statement to your program. The FunctX library extends the basic XQuery functions to allow you to perform operations on strings, numbers, dates, times, and durations, as well as work with sequences, XML elements, attributes, nodes, and namespaces.

EXPath modules pick up where the XQuery 1.0 function library leaves off. They include cryptography functions, filesystem libraries, HTTP client calls, and functions to compress and uncompress data. Many EXPath modules are wrappers for existing libraries written in Java or other languages.

5.6 Case study: using NoSQL at the Office of the Historian at the Department of State

In this case study, the ability to store, modify, and search historical documents containing mixed content was a key business requirement. You'll see how the Office of the Historian at the Department of State used an open source native XML database to build a low-cost system with advanced features typically found in enterprise content management systems. The historical mixed-content documents contain presentation annotations (bold, italics, and so on) as well as object entity annotations such as people, dates, terms, and organizations. Object entity annotations are critical for high-value documents where precision search and navigation are required.


After reading this case study, you'll understand how annotations are used to solve business problems and how native XML databases are unique in their ability to query text with rich annotations. You'll also become familiar with how open source native XML databases use XQuery and Lucene full-text search library functions to create high-quality search tools.

The Office of the Historian at the Department of State is charged by statute with publishing the official records associated with US foreign relations. A declassified analysis of specific periods of US diplomatic history is published in a series of volumes titled Foreign Relations of the United States (FRUS). Through a detailed editing and peer review process, the Office of the Historian has become the "gold standard" for accuracy in the history of international diplomacy. FRUS documents are used in political science and diplomacy classes as well as for other training throughout the world.

In 2008, the Office of the Historian embarked on an initiative to convert the printed FRUS textbooks into an online format that could be easily searched and viewed using multiple formats. The Office of the Historian chose a standard XML format widely used for encoding historical documents called the Text Encoding Initiative (TEI). TEI was chosen because it has precise XML elements to encode a digital representation of historical documents and includes elements for indicating the people, organizations, locations, dates, and terms used in the documents.

To convert the FRUS volumes (each over 1,000 pages long) to TEI format, the documents are first sent to an outside service that enters the information into two separate XML documents using an XML editor. The two XML files are compared against each other to ensure accuracy. The TEI-encoded XML documents are then returned to the Office of the Historian ready to be indexed and transformed into HTML, PDF, or other formats. Figure 5.9 outlines this encoding process.

Figure 5.9 The overall document workflow for converting printed historical documents into an online system using TEI encoding. TEI-encoded documents are validated using XML schemas and Schematron rules files and saved into a Subversion revision control system. XML documents are then loaded into the eXist native XML database. Search forms are used to send keyword queries to a REST XQuery search service. This service uses the eXist document tree indexes and Lucene indexes to create search results.


The TEI-encoded FRUS documents are validated using XML validation tools (XML Schema and Schematron) and uploaded into the eXist DB native XML database, where each data element is automatically indexed. When XML elements contain text, they’re also automatically indexed using Apache Lucene libraries, resulting in full-text indexes of each document. When pages are viewed on the website, XQuery performs a transformation and converts the TEI XML format into HTML. XQuery programs are also used to transform the TEI XML into other formats, including RSS/Atom feeds, PDF, and EPUB. No preconversion of TEI to other formats is required until a page or document is requested on the website. A critical success factor for the Office of the Historian at the Department of State project was the need for high-quality search. A sample search result for the query “nixon in china” is shown in figure 5.10.

Figure 5.10 A sample web search result from the Office of the Historian at the Department of State. The result page uses Apache Lucene full-text indexes to quickly search and rank many documents. The bold words in the search result use the key-word-in-context (KWIC) function to show the search keywords found in the documents. The search interface allows users to utilize advanced search options to limit scope, and includes features such as Boolean, wildcard, and nearness, or proximity search.


The TEI documents contain many entities (people, dates, terms) that are annotated with TEI tags. For example, each person has a tag wrapping the name of the individual mentioned. A sample of these tags is shown in table 5.2.

Table 5.2 Sample of TEI entity annotations for people, dates, glossary terms, and geolocations. Note that an XML attribute such as corresp for persons is used to reference a global dictionary of entities. Annotations are wrappers around text to describe the text. Attributes such as corresp="" are key-value pairs within the annotation elements that add specificity to the annotations.

Entity type Example

Person the president

Date June 9th

Glossary term Phantom F–4 aircraft

Geolocations China
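To make the annotations concrete, the sketch below shows what a sentence with this style of markup might look like. The TEI element names (persName, date, term, placeName) and the corresp identifier are illustrative assumptions, not the exact markup used in the FRUS volumes.

<!-- hypothetical TEI-style annotated sentence -->
<p>On <date>June 9th</date>, <persName corresp="#p_nixon">the president</persName>
   met with officials from <placeName>China</placeName> to discuss the
   <term>Phantom F-4 aircraft</term>.</p>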

XQuery makes it easy to query any XML document for all entities within the document. For example, in figure 5.11 the XPath expression //person will return all person elements found in a document, including those found at the beginning, in the middle, and at the end.

Figure 5.11 Each page of the FRUS document lists the entities found on that page. For example, the people and terms referenced in this page are also shown in the right margin of the page. Users can click on each entity for a full definition of that person or term.


An important note about this project: it was done on a modest budget, by nontechnical internal staff and limited outside contractors. The internal staff had no prior experience with native XML systems or the XQuery language. One member of the staff, a historian by training, learned XQuery over the course of several months and created a prototype website using online examples and assistance from other members of the eXist and TEI communities. There are currently hundreds of completed FRUS volumes in the system, with more being added each month. Search performance has met all the requirements for the site, with page rendering and web searches averaging well under 500 ms.

5.7 Case study: managing financial derivatives with MarkLogic

In this case study, we'll look at how a financial institution implemented a commercial native XML database (MarkLogic) to manage a high-stakes financial derivatives system. This study is an excellent example of how organizations with highly variable data are moving away from relational databases even when they're managing high-stakes financial transactions. High-variability data is difficult to store in relational databases, since each variation may need new columns and tables created in an RDBMS as well as new reports.

After reading this study, you'll understand how organizations with high-variability data can use document stores for transactional data. You'll also see how these organizations manage ACID transactions and use database triggers to process event streams.

5.7.1 Why financial derivatives are difficult to store in RDBMSs

This section presents an overview of financial derivatives and provides insight as to why they're not well suited for storage in tables within an RDBMS.

Let's start with a quick comparison. If you purchase items from any web retailer, the information you enter for each item you want to purchase is limited. When you purchase a dress or shirt, you choose the item name or number, size, color, and perhaps a few other details such as the material type or item length. This information fits neatly into the rows of an RDBMS.

Now consider purchasing a complex financial instrument like a derivative, where each item has thousands of parameters and the parameters for every item are different. Most derivatives contain a product ID, but they also contain conditional logic, mathematical equations, lookup tables, decision trees, and even the full text of legal contracts. In short, the information doesn't lend itself to an RDBMS table. Note that it's possible to store the item as a binary large object (BLOB) in a traditional RDBMS, but you wouldn't be able to access any property inside the BLOB for reporting.

5.7.2 An investment bank switches from 20 RDBMSs to one native XML system

A large investment bank was using 20 different RDBMSs to store complex financial instruments called over-the-counter derivative contracts, as shown in figure 5.12.


Figure 5.12 A sample data flow of an operational data store (ODS) for a complex financial derivatives system using multiple RDBMSs to store the data. The trading systems each stored data into RDBMSs using complex SQL INSERT statements. SQL SELECT statements were used to extract data. Each new derivative type required custom software to be written.

Highlights of the bank's conversion process included these:

 Each system had its own method for ingesting the transactions, converting them to row structures, storing the rows in tables, and reporting on the transactions.
 Custom software was required for each new derivative type so key parameters could be stored and queried.
 In many instances, a single column stored different types of information based on other parameters in the transaction.
 After the data was stored, SQL queries were written to extract information for downstream processing when key events occurred.
 Because different data was shoehorned into the same column based on the derivative type, reporting was complex and error prone.
 Errors resulted in data quality issues and required extensive auditing of output results before the data could be used by downstream systems.

This complex conversion process made it difficult for the bank to get consistent and timely reports and to efficiently manage document workflow. What they needed was a flexible way to store the derivative documents in a standard format such as XML, and to be able to report on the details of the data. If all derivatives were stored as full XML documents, each derivative could contain its unique parameters, without changes to the database.

As a result of this analysis, the bank converted their operational data store (ODS) to a native XML database (MarkLogic) to store their derivative contracts. Figure 5.13 shows how the MarkLogic database was integrated into the financial organization's workflow. MarkLogic is a commercial document-oriented NoSQL system that has been around since before the term NoSQL was popular. Like other document stores, MarkLogic excels at storing data with high variability and is compliant with W3C standards such as XML, XPath, and XQuery.


Figure 5.13 callouts: Trade capture and risk management systems write trade data to the ODS, where trades are stored as XML documents. All XML data elements get indexed upon ingestion. Post-trade-processing workflows are triggered by events as soon as new data is inserted. Consumers of the data can fetch complete documents or specific data elements with XPath and can run ad hoc queries against the operational data store. Systems can attach additional data; for example, a confirmation PDF can be attached to a trade document.

Figure 5.13 Financial derivatives are stored in a native XML database being used as an ODS. Trading systems send XML documents for each trade or contract directly to the database where each element is directly and immediately indexed. Update triggers automatically send event data to a workflow system and system users use simple XPath expressions to perform ad hoc queries.

The bank's new system was ideal as a centralized store for the highly variable derivatives contracts. Since MarkLogic supports ACID transactions and replication, the bank maintained the reliability and availability guarantees it had with its RDBMSs. MarkLogic also supports event triggers on document collections. These are scripts that are executed each time an XML file is inserted, updated, or deleted. Whereas RDBMSs require every record in a database to have the same structure and data types, document stores are more flexible and allow organizations to capture the variations in their data in a single database. As we move to our next section, we'll take a look at the major benefits of using a native XML document store.

5.7.3 Business benefits of moving to a native XML document store

The move to a document-centric architecture resulted in the following tangible benefits to the organization:

 Faster development—New instrument types added to the system by the front-office traders didn't require additional software development, and therefore could be supported in a matter of hours rather than days, weeks, or even months.
 Higher data quality—As new derivatives were loaded into the system as XML documents, the system was able to append the document with additional XML elements needed to more precisely describe the derivative. Downstream analysis and reporting became easier to manage and less error prone.
 Better risk management—New reporting capabilities aggregated the bank's position in real time, and provided an instant, accurate view of exposure to certain risk aspects such as counterparties, currencies, or geographies.
 Lower operational costs—The elimination of processing errors associated with multiple operational stores containing conflicting data reduced cost per trade; the reduction of database administrators needed from 10 to 1 lowered human resource expense; mechanisms that trigger all post-trade processing workflows from a single source instead of 20 databases increased operational efficiencies; and the ability to query the content of each individual derivative lowered reporting costs.

With its new infrastructure, the bank didn't need to add resources to meet regulators' escalating demands for more transparency and increased stress-testing frequency. In addition to the more tangible benefits of the new system, the bank was able to bring new products to market faster and perform more detailed quality checks on diverse data. As a result of the newfound confidence in the data quality and accuracy, the solution was adopted by other parts of the bank.

5.7.4 Project results

The new MarkLogic system allowed the bank to cut the costs of building and maintaining an operational data store for complex derivatives. In addition, the bank became more responsive to the needs of the organization when new derivatives needed to be added. Derivative contracts are now kept in a semantically precise and flexible XML format while maintaining high data integrity, even as the format moves into remote reporting and workflow systems. These changes had a positive impact on the entire lifecycle of derivative contract management.

5.8 Summary

If you talk with people who've been using native XML databases for several years, they tell you they're happy with these systems and express their reluctance to return to RDBMSs. Their primary reason for liking native XML systems isn't centered around performance, although there are commercial native XML databases such as MarkLogic that store petabytes of information. Their primary reason is the increase in developer productivity and the ability of nonprogrammers to participate in the development process.

Seasoned software developers have exposure to good training, and they frequently use tools customized to the XML development process. They have large libraries of XQuery code that can quickly be customized to create new applications in a short period of time. The ability to quickly create new applications shortens development cycles and helps new products make tight time-to-market deadlines.


Although XML is frequently associated with slow processing speeds, this often has more to do with a specific implementation of an XML parser or a slow virtual machine. It has little to do with how native XML systems work. The generation of XML directly from a compressed tree storage structure is usually on par with any other format such as CSV or JSON.

All native XML databases start with the cache-friendly document store pattern and gain from the elimination of middle-tier, object-translation layers. They then leverage the power of standards to gain both portability and reuse of XQuery function libraries. The use of standard metaphors like data folders to manage document collections and simple path expressions in queries makes native XML databases easy to set up and administer for nontechnical users. This combination of features has yet to appear in other NoSQL systems, since standardization only becomes critical as third-party software developers look for application portability.

Despite the W3C's work on extending XQuery for updates and full-text search, there are still areas that lack standardization. Although native XML databases allow you to create custom indexes for things like geolocation, RDF data, and graphs, there are still few standards in these areas, making porting applications between native XML databases more difficult than it needs to be. New work by the W3C, EXPath developers, and other researchers may mitigate these problems in the future. If these standards continue to be developed, XQuery-based document stores may become a more robust platform for NoSQL developers.

The cache-friendliness of documents and the parallel nature of the FLWOR statement make native XML databases inherently more scalable than SQL systems. In the next chapter, we'll focus on some of the techniques NoSQL systems use when managing large datasets.

5.9 Further reading

 eXist-db. http://exist-db.org/.
 EXPath. http://expath.org.
 JSONiq. "The JSON Query Language." http://www.jsoniq.org.
 MarkLogic. http://www.marklogic.com.
 "Metcalfe's Law." Wikipedia. http://mng.bz/XqMT.
 "Network effect." Wikipedia. http://mng.bz/7dIQ.
 Oracle. "Oracle Berkeley DB XML & XQuery." http://mng.bz/6w3z.
 TEI. "TEI: Text Encoding Initiative." http://www.tei-c.org.
 W3C. "EXPath Community Group." http://mng.bz/O3j8.
 W3C. "XML Query Use Cases." http://mng.bz/h25P.
 W3C. "XML Schema Part 2: Datatypes Second Edition." http://mng.bz/F8Gx.
 W3C. "XQuery and XPath Full Text 1.0." http://mng.bz/Bd9E.
 W3C. "XQuery implementations." http://mng.bz/49rG.
 W3C. "XQuery Update Facility 1.0." http://mng.bz/SN6T.
 XSPARQL. http://xsparql.deri.org.

Part 3 NoSQL solutions

Part 3 is a tour of how NoSQL solutions solve the real-world business problems of big data, search, high availability, and agility. As you go through each chapter, you'll be presented with a business problem, and then see how one or more NoSQL technologies can be cost-effectively implemented to result in a positive return on investment for an organization.

Chapter 6 tackles the issues of big data and linear scalability. You'll see how NoSQL systems leverage large numbers of commodity CPUs to solve large dataset and big data problems. You'll also get an in-depth review of MapReduce and the need for parallel processing. In chapter 7 we identify the key features associated with a strong search system and show you how NoSQL systems can be used to create better search applications. Chapter 8 covers how NoSQL systems are used to address the issues of high availability and minimal downtime. Chapter 9 looks at agility and how NoSQL systems can help organizations quickly respond to changing organizational needs. Many people who are new to the NoSQL movement underestimate how constraining RDBMSs can be when market demand or business conditions change. This chapter shows how NoSQL systems can be more adaptable to changing system and market requirements and provide a competitive edge to an organization.

Using NoSQL to manage big data

This chapter covers

 What is a big data NoSQL solution?
 Classifying big data problems
 The challenges of distributed computing for big data
 How NoSQL handles big data

By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some of the Nation’s most pressing challenges. —US Federal Government, “Big Data Research and Development Initiative”

Have you ever wanted to analyze a large amount of data gathered from log files or files you've found on the web? The need to quickly analyze large volumes of data is the number-one reason organizations leave the world of single-processor RDBMSs and move toward NoSQL solutions. You may recall our discussion in chapter 1 on the key business drivers: volume, velocity, variability, and agility. The first two, volume and velocity, are the most relevant to big data problems.



Twenty years ago, companies managed datasets that contained approximately a million internal sales transactions, stored on a single processor in a relational database. As organizations generated more data from internal and external sources, datasets expanded to billions and trillions of items. The amount of data made it difficult for organizations to continue to use a single system to process this data. They had to learn how to distribute the tasks among many processors. This is what is known as a big data problem.

Today, using a NoSQL solution to solve your big data problems gives you some unique ways to handle and manage your big data. By moving queries to the data, using hash rings to distribute the load, using replication to scale your reads, and allowing the database to distribute queries evenly to your data nodes, you can manage your data and keep your systems running fast.

What's driving the focus on solving big data problems? First, the amount of publicly available information on the web has grown exponentially since the late 1990s and is expected to continue to increase. In addition, the availability of low-cost sensors lets organizations collect data from everything; for instance, from farms, wind turbines, manufacturing plants, vehicles, and meters monitoring home energy consumption. These trends make it strategically important for organizations to efficiently and rapidly process and analyze large datasets.

Now let's look at how NoSQL systems, with their inherently horizontal scale-out architectures, are ideal for tackling big data problems. We'll look at several strategies that NoSQL systems use to scale horizontally on commodity hardware. We'll see how NoSQL systems move queries to the data, not data to the queries. We'll see how they use hash rings to evenly distribute the data on a cluster and use replication to scale reads. All these strategies allow NoSQL systems to distribute the workload evenly and eliminate performance bottlenecks.

6.1 What is a big data NoSQL solution?
So what exactly is a big data problem? A big data class problem is any business problem that's so large that it can't be easily managed using a single processor. Big data problems force you to move away from a single-processor environment toward the more complex world of distributed computing. Though great for solving big data problems, distributed computing environments come with their own set of challenges (see figure 6.1).

We want to stress that big data isn't the same as NoSQL. As we've defined NoSQL in this book, it's more than dealing with large datasets. NoSQL includes concepts and use cases that can be managed by a single processor and have a positive impact on agility and data quality. But we consider big data problems a primary use case for NoSQL.

Before you assume you have a big data problem, you should consider whether you need all of your data or a subset of your data to solve your problem. Using a statistical sample allows you to use a subset of your data and look for patterns in the subset.


[Figure 6.1 compares the two options. One database: easy to understand; easy to set up and configure; easy to administer; a single source of truth; limited scalability. Many databases: data partitioning; replication; clustering; query distribution; load balancing; consistency/syncing; latency/concurrency; clock synchronization; network bottlenecks/failures; multiple data centers; distributed backup; node failure; voting algorithms for error detection; administration of many systems; monitoring; scalable if designed correctly.]

Figure 6.1 One or many databases? Here are some of the challenges you face when you move from a single processor to a distributed computing system. Moving to a distributed environment is a nontrivial endeavor and should be done only if the business problem really warrants the need to handle large data volumes in a short period of time. This is why platforms like Hadoop provide a framework that hides much of this complexity from the application developer.

The trick is to come up with a process to ensure the sample you choose is a fair representation of the full dataset. You should also consider how quickly you need your data processed. Many data analysis problems can be handled by a batch-type solution running on a single processor; you may not need an immediate answer. The key is to understand the true time-critical nature of your situation.

Now that you know that distributed databases are more complex than a single-processor system and that there are alternatives to using a full dataset, let's look at why organizations are moving toward these complex systems. Why is the ability to handle big data strategically important to many organizations? Answering this question involves understanding the external factors that are driving the big data marketplace. Here are some typical big data use cases:

 Bulk image processing—Organizations like NASA regularly receive terabytes of incoming data from satellites or even rovers on Mars. NASA uses a large number of servers to process these images and perform functions like image enhancement and photo stitching. Medical imaging systems like CAT scans and MRIs need to convert raw image data into formats that are useful to doctors and patients. Custom imaging hardware has been found to be more expensive than renting a large number of processors on the cloud when they're needed. For example, the New York Times converted 3.3 million scans of old newspaper articles into web formats using tools like Amazon EC2 and Hadoop for a few hundred dollars.


 Public web page data—Publicly accessible pages are full of information that organizations can use to be more competitive. They contain news stories, RSS feeds, new product information, product reviews, and blog postings. Not all of the information is authentic. There are millions of pages of fake product reviews created by competitors or third parties paid to disparage other sites. Finding out which product reviews are valid is a topic for careful analysis.
 Remote sensor data—Small, low-power sensors can now track almost any aspect of our world. Devices installed on vehicles track location, speed, acceleration, and fuel consumption, and tell your insurance company about your driving habits. Road sensors can warn about traffic jams in real time and suggest alternate routes. You can even track the moisture in your garden, lawn, and indoor plants to suggest a watering plan for your home.
 Event log data—Computer systems create logs of read-only events from web page hits (also called clickstreams), email messages sent, or login attempts. Each of these events can help organizations understand who's using what resources and when systems may not be performing according to specification. Event log data can be fed into operational intelligence tools to send alerts to users when key indicators fall out of acceptable ranges.
 Mobile phone data—Every time users move to new locations, applications can track these events. You can see when your friends are around you or when customers walk through your retail store. Although there are privacy issues involved in accessing this data, it's forming a new type of event stream that can be used in innovative ways to give companies a competitive advantage.
 Social media data—Social networks such as Twitter, Facebook, and LinkedIn provide a continuous real-time data feed that can be used to see relationships and trends. Each site creates data feeds that you can use to look at trends in customer mood or get feedback on your own as well as competitor products.
 Game data—Games that run on PCs, video game consoles, and mobile devices have back-end datasets that need to scale quickly. These games store and share high scores for all users as well as game data for each player. Game site back ends must be able to scale by orders of magnitude if viral marketing campaigns catch on with their users.
 Open linked data—In chapter 4 we looked at how organizations can publish public datasets that can be ingested by your systems. Not only is this data large, but it may require complex tools to reconcile, remove duplication, and find invalid items.
When looking at these use cases, you see that some problems can be described as independent parallel transforms since the output from one transform isn't used as an input to another. This includes problems like image and signal processing. Their focus is on efficient and reliable data transformation at scale. These use cases don't need the query or transaction support provided by many NoSQL systems.

They read and write to key-value stores or distributed filesystems like Amazon's Simple Storage Service (S3) or Hadoop Distributed File System (HDFS) and may not need the advanced features of a document store or an RDBMS. Other use cases are more demanding and need more features. Big data problems like event log data and game data do need to store their data directly into structures that can be queried and analyzed, so they will need different NoSQL solutions. To be a good candidate for a general class of big data problems, NoSQL solutions should

 Be efficient with input and output and scale linearly with growing data size.
 Be operationally efficient. Organizations can't afford to hire many people to run the servers.
 Allow reports and analyses to be performed by nonprogrammers using simple tools—not every business can afford a full-time Java programmer to write on-demand queries.
 Meet the challenges of distributed computing, including consideration of latency between systems and eventual node failures.
 Meet both the needs of overnight batch processing economy-of-scale and time-critical event processing.
RDBMSs can, with enough time and effort, be customized to solve some big data problems. Applications can be rewritten to distribute SQL queries to many processors and merge the results of the queries. Databases can be redesigned to remove joins between tables that are physically located on different nodes. SQL systems can be configured to use replication and other data synchronization processes. Yet these steps all take considerable time and money. In the long run, it might make sense to move to a framework that has already solved many of these problems.

Original SQL systems were revolutionary with their standardized declarative language. By declarative, we mean that a developer can “declare” what data they want and yet not be concerned with how they get it or where they get the data from. SQL developers want and need to be isolated from the question of how to optimize a query, how to fetch the data, and what server the data is on. Unless your database isolates you from these questions, you lose many of the benefits of declarative systems like SQL.

NoSQL systems try to isolate developers from the complexities of distributed computing. They provide interfaces that allow users to tell a cluster how many nodes a record must be written to or read from before a valid response is returned. The goal is to keep the benefits of both declarative systems and horizontal scalability as you move to distributed computing platforms.

If NoSQL systems really do have better horizontal scaling characteristics, you need to be able to measure these characteristics. So let's take a look at how horizontal scalability and NoSQL might be measured.
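Before moving on, the read/write rule just described can be made concrete with a small sketch. This is a minimal, product-neutral illustration of the quorum idea, assuming N replicas, W write acknowledgments, and R read responses; the function name is hypothetical, not taken from any particular NoSQL product.

# A minimal sketch of the quorum idea: with N replicas, a write waits for W
# acknowledgments and a read waits for R responses. When R + W > N, every
# read quorum overlaps every write quorum, so a read sees at least one
# up-to-date copy.

def is_strongly_consistent(n_replicas: int, write_quorum: int, read_quorum: int) -> bool:
    """Return True if the chosen quorums guarantee read/write overlap."""
    return read_quorum + write_quorum > n_replicas

# 3 replicas, write to 2, read from 2: the quorums overlap.
print(is_strongly_consistent(3, 2, 2))   # True
# 3 replicas, write to 1, read from 1: reads may return stale data.
print(is_strongly_consistent(3, 1, 1))   # False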


6.2 Getting linear scaling in your data center
One of the core concepts in big data is linear scaling. When a system has linear scaling, you automatically get a proportional performance gain each time you add a new processor to your cluster, as shown in figure 6.2.

[Figure 6.2 annotations: linearly scalable architectures provide a constant rate of additional performance as the number of processors increases; nonscalable systems reach a plateau where adding new processors does not add incremental performance.]

Figure 6.2 How some systems continue to add performance as more nodes are added to the system. Performance can be a measure of read operations, write operations, or transformations. Systems are considered linearly scalable if the performance curve doesn’t flatten out at some threshold. Many components can cause bottlenecks in performance, so testing for linear scalability is critical in system design.

There are additional types of scaling that might be important to you based on the type of problem you’re trying to solve. For example:

 Scaling independent transformations—Many big data problems are driven by discrete transformations on individual items without interaction among the items. These types of problems tend to be the easiest to solve: simply add a new node to your cluster. Image transformation is a good example of this.
 Scaling reads—In order to keep your read latency low, you must replicate your data on multiple servers and move the servers as close to the users as possible using tools like content distribution networks (CDNs). CDNs keep copies of data in each geographic region so that the distance that data moves over a network can be minimized. The challenge is that the more servers you have and the farther apart they are, the more difficult it is to keep them in sync.
 Scaling totals—Scaling totals involves how quickly you can perform simple math functions (count, sum, average) on large quantities of data. This type of scaling is most often addressed by OLAP systems by precalculating subset totals in structures called aggregates so that most of the math is already done. For example, if you have the total daily hits for a website, the weekly total is the sum of each day in a particular week.
 Scaling writes—In order to avoid blocking writes, it's best to have multiple servers that accept writes and never block each other. To make reads of these writes consistent, the servers should be as close to each other as possible. Users that write their own data should always be able to read it back in a consistent state.

www.it-ebooks.info Understanding linear scalability and expressivity 133

 Scaling availability—Duplicate the writes onto multiple servers in data centers in distinct geographic regions. If one data center experiences an outage, the other data centers can supply the data. Scaling availability keeps replica copies in sync and automates the switchover if one system fails.
Figure 6.3 is an example of linear write scalability analysis done by Netflix using an Amazon Elastic Compute Cloud (EC2) system.

[Figure 6.3 chart: client writes per second by node count (replication factor = 3), measured on up to 288 m1.xlarge instances (4 CPU, 15 GB RAM, 8 ECU) running Cassandra 0.86; the benchmark configuration existed for only about an hour. Throughput grows from 174,373 writes/s on the smallest cluster to 1,099,837 writes/s on the largest.]

Figure 6.3 An example of a Cassandra cluster that has been used to simulate a large number of writes per second on multiple nodes. The start of the simulation shows around 50 nodes accepting 170,000 writes per second. As the cluster grows to over 300 nodes, the system can accept over a million writes per second. The simulation was done on a rented Amazon Elastic Compute Cloud (EC2) cluster. The ability to “rent” CPUs on an hourly basis has made it easy to test a NoSQL system for linear scalability. (Reference: Netflix)

The ability to scale linearly is critical to cost-effective big data processing. But the ability to read and write single records isn't the only concern of many business problems. Systems must also be able to effectively perform queries on your data, as you'll see next.

6.3 Understanding linear scalability and expressivity
What's the relationship between scalability and your ability to perform complex queries on your data? As we mentioned earlier, linear scalability is the ability to get a consistent amount of performance improvement as you add additional processors to your cluster. Expressivity is the ability to perform fine-grained queries on individual elements of your dataset.

Understanding how well each NoSQL technology performs in terms of scalability and expressivity is necessary when you're selecting a NoSQL solution. To select the right system, you'll need to identify the scalability and expressivity requirements of your system and then make sure the system that you select meets both of these criteria. Scalability and expressivity can be difficult to quantify, and vendor claims may not match actual performance for a particular business problem.


If you're making critical business decisions, we recommend you create a pilot project and simulate an actual load using leased cloud systems.

Let's look at two extreme cases: a key-value store and a document store. After reviewing the vendor brochures, you feel that both of these systems have similar linear scalability rules that meet your business growth needs. Which one is right for your project? The answer lies in how you want to retrieve the data from each of these systems. If you only need to store images and using a URL to specify an image is appropriate, then a key-value store is the right choice. If you need to be able to store items and query on a specific subset of items based on the items' properties, then a key-value store isn't a good match, since the value portion of a key-value pair is opaque to queries. In contrast, a document store that can index dates, amounts, and item descriptions might be a better match. Figure 6.4 shows a sample chart that ranks systems based on their scalability versus expressivity.

The challenge is that both scalability and expressivity rankings are dependent on your specific business situation. The scalability requirements for some systems might focus on a high number of reads per second and others might focus on writes per second. Other scalability requirements might only specify that a large amount of data be transformed overnight. In the same way, your expressivity could include the requirements for ranked full-text search or the ability to query for annotations within text. If you're involved in the software selection process, you want to remember that there's seldom one perfect solution. The scalability and expressivity analysis is a good example of this trade-off analysis. As we look at other tools to help you make these trade-off decisions, you'll see that understanding your data will help you classify your big data problems.
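To make the contrast concrete, here is a minimal sketch, with hypothetical data and field names, of why the value in a key-value store is opaque to queries while a document store can filter on fields.

# Key-value store: the value is an opaque blob; you can only fetch by key.
kv_store = {"order:1001": b"...serialized blob..."}
blob = kv_store["order:1001"]          # fine: lookup by key
# The store cannot answer "all orders over $100" without the application
# scanning and deserializing every value itself.

# Document store: fields inside each document are visible to the database,
# so it can index and filter on them.
documents = [
    {"id": 1001, "date": "2013-06-01", "amount": 150.0, "item": "router"},
    {"id": 1002, "date": "2013-06-02", "amount": 40.0,  "item": "cable"},
]
large_orders = [d for d in documents if d["amount"] > 100.0]
print(large_orders)                    # only the matching document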

[Figure 6.4 plot: the vertical scalability axis runs from single CPU through cluster configuration to distributed by design; the horizontal expressivity axis runs from key-value store and column family store through row store to JSON and XML. Systems plotted include Memcached, in-memory caches, column family stores, row stores, graph stores, and document stores.]
Figure 6.4 A sample of how you might rank scalability (vertical axis) and expressivity (horizontal axis) for your requirements. Simple key-value stores are almost always the least expressive but most scalable. Document stores are usually the most expressive. How you rate scalability and expressivity may depend on your business situation.


6.4 Understanding the types of big data problems
There are many types of big data problems, each requiring a different combination of NoSQL systems. After you've categorized your data and determined its type, you'll find there are different solutions. How you build your own big data classification system might be different from this example, but the process of differentiating data types should be similar. Figure 6.5 is a good example of a high-level big data classification system.

[Figure 6.5 taxonomy: big data divides into read-mostly and read-write problems. Read-mostly covers images, event logs (real-time clickstreams and batch operational logs), documents (full-text, including simple text and annotations), and graphs. Read-write covers high availability and transactions.]

Figure 6.5 A sample of a taxonomy of big data types. This chapter deals with read-mostly problems. Chapter 8 has a focus on read/write big data problems that need high availability.

Let's take a look at some ways you classify big data problems and see how NoSQL systems are changing the way organizations use data.

 Read-mostly—Read-mostly data is the most common classification. It includes data that's created once and rarely altered. This type of data is typically found in data warehouse applications but is also identified as a set of non-RDBMS items like images or video, event-logging data, published documents, or graph data. Event data includes things like retail sales events, hits on a website, system logging data, or real-time sensor data.
 Log events—When operational events occur in your enterprise, you can record them in a log file and include a timestamp so you know when each event occurred. Log events may be a web page click or an out-of-memory warning on a disk drive. In the past, the cost and amount of event data produced were so large that many organizations opted not to gather or analyze it. Today, NoSQL systems are changing companies' thoughts on the value of log data as the cost to store and analyze it is more affordable. The ability to cost-effectively gather and store log events from all computers in your enterprise has led to BI operational intelligence systems. Operational intelligence goes beyond analyzing trends in your web traffic or retail transactions. It can integrate information from network monitoring systems so you can detect problems before they impact your customers. Cost-effective NoSQL systems can be part of good operations management solutions.


 Full-text documents—This category of data includes any document that contains natural-language text like the English language. An important aspect of document stores is that you can query the entire contents of your office document in the same way you would query rows in your SQL system. This means that you can create new reports that combine traditional data in RDBMSs as well as the data within your office documents. For example, you could create a single query that extracted the authors and titles of PowerPoint slides that contained the keywords NoSQL or big data. This list of authors could then be filtered against job titles in the HR database to show which people had the title of Data Architect or Solution Architect. This is a good example of how organizations are trying to tap into the hidden skills that already exist within an organization for training and mentorship. Integrating documents into what can be queried is opening new doors in knowledge management and efficient staff utilization.
As you can see, you might encounter many different flavors of big data. As we move forward, you'll see how using a shared-nothing architecture can help you with most of your big data problems, whether they're read-mostly or read/write data.

6.5 Analyzing big data with a shared-nothing architecture
There are three ways that resources can be shared between computer systems: shared RAM, shared disk, and shared-nothing. Figure 6.6 shows a comparison of these three distributed computing architectures. Of the three alternatives, a shared-nothing architecture is most cost effective in terms of cost per processor when you're using commodity hardware.


Figure 6.6 Three ways to share resources. The left panel shows a shared RAM architecture, where many CPUs access a single shared RAM over a high-speed bus. This system is ideal for large graph traversal. The middle panel shows a shared disk system, where processors have independent RAM but share disk using a storage area network (SAN). The right panel shows the shared-nothing architecture used in most big data solutions: cache-friendly and built from low-cost commodity hardware.


As we continue, you'll see how each of these architectures works to solve big data problems with different types of data.

Of the architectural data patterns we've discussed so far (row store, key-value store, graph store, document store, and Bigtable store), only two (key-value store and document store) lend themselves to cache-friendliness. Bigtable stores scale well on shared-nothing architectures because their row-column identifiers are similar to key-value stores. But row stores and graph stores aren't cache-friendly since they don't allow a large BLOB to be referenced by a short key that can be stored in the cache.

For graph traversals to be fast, the entire graph should be in main memory. This is why graph stores work most efficiently when you have enough RAM to hold the graph. If you can't keep your graph in RAM, graph stores will try to swap the data to disk, which will decrease graph query performance by a factor of 1,000. The only way to combat the problem is to move to a shared-memory architecture, where multiple threads all access a large RAM structure without the graph data moving outside of the shared RAM.

The rule of thumb is if you have over a terabyte of highly connected graph data and you need real-time analysis of this graph, you should be looking for an alternative to a shared-nothing architecture. A single CPU with 64 GB of RAM won't be sufficient to hold your graph in RAM. Even if you work hard to only load the necessary data elements into RAM, your links may traverse other nodes that need to be swapped in from disk. This will make your graph queries slow. We'll look into alternatives to this in a case study later in this chapter.

Knowing the hardware options available to big data is an important first step, but distributing software in a cluster is also important. Let's take a look at how software can be distributed in a cluster.

6.6 Choosing distribution models: master-slave versus peer-to-peer
From a distribution perspective, there are two main models: master-slave and peer-to-peer. Distribution models determine the responsibility for processing data when a request is made.

Understanding the pros and cons of each distribution model is important when you're looking at a potential big data solution. Peer-to-peer models may be more resilient to failure than master-slave models. Some master-slave distribution models have single points of failure that might impact your system availability, so you might need to take special care when configuring these systems.

Distribution models get to the heart of the question who's in charge here? There are two ways to answer this question: one node or all nodes. In the master-slave model, one node is in charge (the master). When there's no single node with a special role in taking charge, you have a peer-to-peer distribution model. Figure 6.7 shows how these models each work.



Figure 6.7 Master-slave versus peer-to-peer—the panel on the left illustrates a master-slave configuration where all incoming database requests (reads or writes) are sent to a single master node and redistributed from there. The master node is called the NameNode in Hadoop. This node keeps a database of all the other nodes in the cluster and the rules for distributing requests to each node. The panel on the right shows how the peer-to-peer model stores all the information about the cluster on each node in the cluster. If any node crashes, the other nodes can take over and processing can continue.

Let's look at the trade-offs. With a master-slave distribution model, the role of managing the cluster is done on a single master node. This node can run on specialized hardware such as RAID drives to lower the probability that it crashes. The cluster can also be configured with a standby master that's continually updated from the master node. The challenge with this option is that it's difficult to test the standby master without jeopardizing the health of the cluster. Failure of the standby master to take over from the master node is a real concern for high-availability operations.

Peer-to-peer systems distribute the responsibility of the master to each node in the cluster. In this situation, testing is much easier since you can remove any node in the cluster and the other nodes will continue to function. The disadvantage of peer-to-peer networks is that there's increased complexity and communication overhead that must occur for all nodes to be kept up to date with the cluster status.

The initial versions of Hadoop (frequently referred to as the 1.x versions) were designed to use a master-slave architecture, with the NameNode of a cluster being responsible for managing the status of the cluster. NameNodes usually don't deal with any MapReduce data themselves. Their job is to manage and distribute queries to the correct nodes on the cluster. Hadoop 2.x versions are designed to remove single points of failure from a Hadoop cluster.

Using the right distribution model will depend on your business requirements: if high availability is a concern, a peer-to-peer network might be the best solution. If you can manage your big data using batch jobs that run in off hours, then the simpler master-slave model might be best. As we move to the next section, you'll see how MapReduce systems can be used in multiprocessor configurations to process your big data.


6.7 Using MapReduce to transform your data over distributed systems
Now let's take an in-depth look to see how MapReduce systems can be used to process large datasets on multiple processors. You'll see how MapReduce clusters work in conjunction with distributed filesystems such as the Apache Hadoop Distributed File System (HDFS) and review how NoSQL systems such as Hadoop use both the map and reduce functions to transform data that's stored in NoSQL databases.

If you've been moving data between SQL systems, you're familiar with the extract, transform, and load (ETL) process. The ETL process is typically used when extracting data from an operational RDBMS to transfer it into the staging area of a data warehouse. We reviewed this process and ETL in chapter 3 when we covered OLAP systems.

ETL systems are typically written in SQL. They use the SELECT statement on a source system and INSERT, UPDATE, or DELETE functions on the destination system. SQL-based ETL systems usually don't have the inherent ability to use a large number of processors to do their work. This single-processor bottleneck is common in data warehouse systems as well as areas of big data.

To solve this type of problem, organizations have moved to a distributed transformation model built around the map and reduce functions. To effectively distribute work evenly over a cluster of processors, the output of a map phase must be a set of key-value pairs where part of the key structure is used to correlate results into the reduce phase. These functions are designed to be inherently linearly scalable using a large number of shared-nothing processors.

Yet, at its core, the fundamental process of MapReduce is the parallel transformation of data from one form to another. MapReduce processes don't require the use of databases in the middle of the transformation. To work effectively on big data problems, MapReduce operations do require a large amount of input and output. In an ideal situation, data transformed by a MapReduce server will have all the input on the local disk of a shared-nothing cluster and write the results to the same local disk. Moving large datasets in and out of a MapReduce cluster can be inefficient.

The MapReduce way of processing data is to specify a series of stepwise functions on uniform input data. This process is similar to the functional programming constructs that became popular in the 1950s with the LISP systems at MIT. Functional programming is about taking a function and a list, and returning a list where the function has been applied to each member of the list. What's different about modern MapReduce is all the infrastructure that goes with the reliable and efficient execution of transforms on lists of billions of items. The most popular implementation of the MapReduce algorithm is the Apache Hadoop system.

The Hadoop system doesn't fundamentally change the concepts of mapping and reducing functions. What it does is provide an entire ecosystem of tools to allow map and reduce functions to have linear scalability. It does this by requiring that the output of all map functions return a key-value pair. This is how work can then be distributed evenly over the nodes of a Hadoop cluster.


Hadoop addresses all of the hard parts of distributed computing for large datasets for you so that you can concentrate on writing the map and reduce operations.

It's also useful to contrast the MapReduce way with RDBMSs. MapReduce is a way of explicitly specifying the steps in a transformation, with the key being to use key-value pairs as a method of distribution to different nodes in a cluster. SQL, on the other hand, attempts to shield you from the process steps it uses to get data from different tables to perform optimal queries.

If you're using Hadoop, MapReduce is a disk-based, batch-oriented process. All input comes from disk, and all output writes to disk. Unlike MapReduce, the results of SQL queries can be loaded directly into RAM. As a result, you'd rarely use the result of a MapReduce operation to populate a web page while users are waiting for the page to render.

The Hadoop MapReduce process is most similar to the data warehouse process of precalculating sums and totals in an OLAP data warehouse. This process is traditionally done by extracting new transactions each night from an operational data store and converting them to facts and aggregates in an OLAP cube. These aggregates allow sums and totals to be quickly calculated when users are looking for trends in purchasing decisions.

NoSQL systems each vary in how they implement map and reduce functions and how they integrate with existing Hadoop clusters. Some NoSQL systems such as HBase are designed to run directly within a Hadoop system. Their default behavior is to read from HDFS and write the results of their transforms to HDFS. By taking this approach, HBase can leverage existing Hadoop infrastructure and optimize input and output processing steps.

Most other NoSQL systems that target big data problems provide their own ways to perform map and reduce functions or to integrate with a Hadoop cluster. For example, MongoDB provides its own map and reduce operations that work directly on MongoDB documents for both input and output. Figure 6.8 shows an example comparing MongoDB map and reduce functions with SQL. Now that you have an understanding of how MapReduce leverages many processors, let's see how it interacts with the underlying filesystems.

6.7.1 MapReduce and distributed filesystems
One of the strengths of a Hadoop system is that it's designed to work directly with a filesystem that supports big data problems. As you'll see, Hadoop makes big data processing easier by using a filesystem structure that's different from a traditional system.

The Hadoop Distributed File System (HDFS) provides many of the supporting features that MapReduce transforms need to be efficient and reliable. Unlike an ordinary filesystem, it's customized for transparent, reliable, write-once, read-many operations. You can think of HDFS as a fault-tolerant, distributed, key-value store tuned to work with large files.


Figure 6.8 A comparison of a MySQL SQL query with MongoDB's map and reduce functions. The queries perform similar functions, but the MongoDB query can easily be distributed over hundreds of processors. (Attribution to Rick Osborne)

Traditional filesystems store data in a single location; if a drive fails, data is restored from a backup drive. By default, files in HDFS are stored in three locations; if a drive fails, the data is automatically replicated to another drive. It's possible to get the same functionality using a fault-tolerant system like RAID drives. But RAID drives are more expensive and difficult to configure on commodity hardware.

HDFS is also different in that it uses a large (64 megabytes by default) block size to handle data. Figure 6.9 shows how large HDFS blocks are compared to a typical operating system.

HDFS has other properties that make it different from an ordinary filesystem. You can't update the value of a few bytes in an existing block without deleting the old block and adding an entirely new block. HDFS is designed for large blocks of immutable data that are created once and read many times. Efficient updates aren't a primary consideration for HDFS.

Although HDFS is considered a filesystem, and can be mounted like other filesystems, you don't usually work with HDFS as you would an additional disk drive on your Windows or UNIX system. An HDFS system wouldn't be a good choice for storing typical Microsoft Office documents that are updated frequently.


[Figure 6.9 annotations: a typical OS filesystem block is 4 KB; the default HDFS block is 64 MB.]

Figure 6.9 The size difference between a filesystem block size on a typical desktop or UNIX operating system (4 KB) and the logical block size within the Apache Hadoop Distributed File System (64 MB), which is optimized for big data transforms. The default block size defines a unit of work for the filesystem. The fewer blocks used in a transfer, the more efficient the transfer process. The downside of using large blocks is that if data doesn’t fill an entire physical block, the empty section of the block can’t be used.
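As a rough illustration of why the larger block size matters, the following back-of-the-envelope calculation, using the default sizes mentioned in the figure, shows how many units of work a 1 GB file represents under each block size.

# How many blocks does a 1 GB file occupy under each block size?
FILE_SIZE = 1 * 1024**3          # 1 GB in bytes
HDFS_BLOCK = 64 * 1024**2        # 64 MB default HDFS block
OS_BLOCK = 4 * 1024              # 4 KB typical OS filesystem block

print(FILE_SIZE // HDFS_BLOCK)   # 16 blocks to track and transfer
print(FILE_SIZE // OS_BLOCK)     # 262,144 blocks to track and transfer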

HDFS is designed to be a highly available input or output destination for gigabyte and larger MapReduce batch jobs. Now let's take a closer look at how MapReduce jobs work over distributed clusters.

6.7.2 How MapReduce allows efficient transformation of big data problems
In previous chapters, we looked at MapReduce and its exceptional horizontal scale-out properties. MapReduce is a core component in many big data solutions. Figure 6.10 provides a detailed look at the internal components of a MapReduce job.


Figure 6.10 The basics of how the map and reduce functions work together to gain linear scalability over big data transforms. The map operation takes input data and creates a uniform set of key-value pairs. In the shuffle phase, which is done automatically by the MapReduce framework, key-value pairs are automatically distributed to the correct reduce node based on the value of the key. The reduce operation takes the key-value pairs and returns consolidated values for each key. It's the job of the MapReduce framework to get the right keys to the right reduce nodes.
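As a minimal illustration of the flow in figure 6.10, the following single-process sketch runs the same three phases (map, shuffle, reduce) on a tiny word-count problem. Real frameworks such as Hadoop run these phases across many nodes, so this is only a conceptual model, not Hadoop code.

from collections import defaultdict

def map_phase(document: str):
    # The map phase turns each input into uniform key-value pairs.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # The shuffle phase groups all values for a key so they reach one reducer.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # The reduce phase consolidates the values for each key.
    return (key, sum(values))

docs = ["NoSQL handles big data", "big data needs big clusters"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(result)   # {'nosql': 1, 'handles': 1, 'big': 3, 'data': 2, ...}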


The first part of a MapReduce job is the map operation. Map operations retrieve data from your source database and convert it into a series of independent transform operations that can be executed on different processors. The output of all map operations is a key-value structure where the keys are uniform across all input documents. The second phase is the reduce operation. The reduce operation uses the key-value pairs created in the map as input, performs the requested operation, and returns the values you need.

When creating a MapReduce program, you must ensure that the map function is only dependent on the inputs to the map function and that the output of the map operation doesn't change the state of data; it only returns a key-value pair. In MapReduce operations, no other intermediate information can be passed between map functions.

At first glance, it may seem like creating a MapReduce framework would be simple. Realistically, it's not. First, what if your source data is replicated on three or more nodes? Do you move the data between nodes? Not if you want your job to be efficient. Then you must consider which node the map function should run on. How do you assign the right key to the right reduce processor? What happens if one of the map or reduce jobs fails in mid-operation? Do you need to restart the entire batch or can you reassign the work to another node? As you can see, there are many factors to consider and in the end it's not as simple as it appears.

The good news is that if you stick to these rules, a MapReduce framework like Hadoop can do most of the hard work: finding the right processor to do the map, making sure the right reduce node gets the input based on the keys, and making sure that the job finishes even if there's hardware failure during the job.

Now that we've covered the types of big data problems and some of the architecture patterns, let's look into the strategies that NoSQL systems use to attack these problems.

6.8 Four ways that NoSQL systems handle big data problems
As you've seen, understanding your big data is important in determining the best solution. Now let's take a look at four of the most popular ways NoSQL systems handle big data challenges. Understanding these techniques is important when you're evaluating any NoSQL system. Knowing that a product will give you linear scaling with these techniques will help you not only to select the right NoSQL system, but also to set up and configure your NoSQL system correctly.

6.8.1 Moving queries to the data, not data to the queries
With the exception of large graph databases, most NoSQL systems use commodity processors that each hold a subset of the data on their local shared-nothing drives.


When a client wants to send a general query to all nodes that hold data, it's more efficient to send the query to each node than it is to transfer large datasets to a central processor. This may seem obvious, but it's amazing how many traditional databases still can't distribute queries and aggregate query results.

This simple rule helps you understand how NoSQL databases can have dramatic performance advantages over systems that weren't designed to distribute queries to the data nodes. Consider an RDBMS that has tables distributed over two different nodes. In order for the SQL query to work, information about rows on one table must all be moved across the network to the other node. Larger tables result in more data movement, which results in slower queries. Think of all the steps involved. The tables must be extracted, serialized, sent through the network interface, transmitted over networks, reassembled, and then compared on the server with the SQL query. Keeping all the data within each data node in the form of logical documents means that only the query itself and the final result need to be moved over a network. This keeps your big data queries fast.
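A minimal sketch of this idea follows: the same small query is sent to every data node in parallel, and only the matching documents come back. The node list and query function are hypothetical stand-ins for a real cluster.

from concurrent.futures import ThreadPoolExecutor

def query_node(node, predicate):
    """Run the predicate locally on the node's data; return only matches."""
    return [doc for doc in node["docs"] if predicate(doc)]

nodes = [
    {"name": "node-1", "docs": [{"id": 1, "status": "error"}, {"id": 2, "status": "ok"}]},
    {"name": "node-2", "docs": [{"id": 3, "status": "error"}]},
]

# Scatter the query to every node, then gather only the matching documents.
with ThreadPoolExecutor() as pool:
    partials = pool.map(lambda n: query_node(n, lambda d: d["status"] == "error"), nodes)

results = [doc for partial in partials for doc in partial]
print(results)   # only the matching documents crossed the "network"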

6.8.2 Using hash rings to evenly distribute data on a cluster
One of the most challenging problems with distributed databases is figuring out a consistent way of assigning a document to a processing node. Using a hash ring technique with a randomly generated 40-character key is a good way to evenly distribute big data loads over many servers. Hash rings are common in big data solutions because they consistently determine how to assign a piece of data to a specific processor. Hash rings take the leading bits of a document's hash value and use them to determine which node the document should be assigned to. This allows any node in a cluster to know what node the data lives on and how to adapt to new assignment methods as your data grows. Partitioning keys into ranges and assigning different key ranges to specific nodes is known as keyspace management. Most NoSQL systems, including MapReduce, use keyspace concepts to manage distributed computing problems.

In chapters 3 and 4 we reviewed the concepts of hashing, consistent hashing, and key-value stores. A hash ring uses these same concepts to assign an item of data to a specific node in a NoSQL database cluster. Figure 6.11 is a diagram of a sample hash ring with four nodes.

As you can see from the figure, each input will be assigned to a node based on the 40-character random key. One or more nodes in your cluster will be responsible for storing this key-to-node mapping algorithm. As your database grows, you'll update the algorithm so that each new node will also be assigned some range of key values. The algorithm also needs to replicate items within these ranges from the old nodes to the new nodes.

The concept of a hash ring can also be extended to include the requirement that an item must be stored on multiple nodes. When a new item is created, the hash ring rules might indicate both a primary and a secondary copy of where an item is stored. If the node that contains the primary fails, the system can look up the node where the secondary item is stored.


[Figure 6.11 annotations: a 40-character hex number contains 160 bits, so any key value from 0 to 2^160 can be assigned to a specific node; the sample ring divides the space of all possible hash values into equal parts, assigning 2^160/4 of the keyspace to each of 4 nodes; the shading of each key's bar indicates which node it will be stored on.]

Figure 6.11 Using a hash ring to assign a node to a key that uses a 40-character hex number. A 40-character hex number represents 160 bits, so there are 2^160 possible key values. The first bits in the hash can be used to map a document directly to a node. This allows documents to be randomly assigned to nodes and new assignment rules to be updated as you add nodes to your cluster.
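The following is a minimal sketch of one common hash ring variant, using a 40-character (160-bit) SHA-1 hash as in figure 6.11. The node names are hypothetical, and real systems typically add virtual nodes and replication on top of this idea.

import bisect
import hashlib

class HashRing:
    def __init__(self, nodes):
        # Place each node on the ring at the hash of its name.
        self.ring = sorted((self._hash(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        # SHA-1 produces a 40-character hex string: an integer in 0 .. 2**160 - 1.
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first node at or past the key's hash position.
        idx = bisect.bisect(self.positions, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-1", "node-2", "node-3", "node-4"])
print(ring.node_for("document-42"))   # the same key always maps to the same node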

6.8.3 Using replication to scale reads
In chapter 3 we showed how databases use replication to make backup copies of data in real time. We also showed how load balancers can work with the application layer to distribute queries to the correct database server. Now let's look at how using replication allows you to horizontally scale read requests. Figure 6.12 shows this structure.

This replication strategy works well in most cases. There are only a few times when you must be concerned about the lag time between a write to the read/write node and a client reading that same record from a replica. One of the most common operations after a write is a read of that same record. If a client does a write and then an immediate read from that same node, there's no problem.


Figure 6.12 How you can replicate data to speed read performance in NoSQL systems. All incoming client requests enter from the left. All reads can be directed to any node, either a primary read/write node or a replica node. All write transactions can be sent to a central read/write node that will update the data and then automatically send the updates to replica nodes. The time between the write to the primary and the time the update arrives on the replica nodes determines how long it takes for reads to return consistent results.


Figure 6.13 NoSQL systems move the query to a data node, but don't move data to a query node. In this example, all incoming queries arrive at query analyzer nodes. These nodes then forward the queries to each data node. If they have matches, the documents are returned to the query node. The query won't return until all data nodes (or a response from a replica) have responded to the original query request. If the data node is down, a query can be redirected to a replica of the data node.

The problem arises when a read hits a replica node before the update has arrived. This is an example of an inconsistent read. The best way to avoid this type of problem is to direct a client's reads to the node that handled its write for a short period after the write has been done. This logic can be added to a session or state management system at the application layer. Almost all distributed databases relax database consistency rules when a large number of nodes permit writes. If your application needs fast read/write consistency, you must deal with it at the application layer.
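Here is a minimal application-layer sketch of that read-your-writes rule. The class, the replication-lag window, and the routing policy are all hypothetical simplifications.

import random
import time

class SessionRouter:
    def __init__(self, primary, replicas, replication_lag=0.5):
        self.primary, self.replicas = primary, replicas
        self.lag = replication_lag            # seconds until replicas catch up
        self.last_write = 0.0

    def write(self, key, value):
        self.primary[key] = value
        self.last_write = time.time()         # remember when this session wrote

    def read(self, key):
        if time.time() - self.last_write < self.lag:
            return self.primary.get(key)      # recent write: read the primary
        return random.choice(self.replicas).get(key)  # otherwise spread the reads

router = SessionRouter(primary={}, replicas=[{}, {}])
router.write("user:1", "active")
print(router.read("user:1"))   # 'active', served from the primary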

6.8.4 Letting the database distribute queries evenly to data nodes
In order to get high performance from queries that span multiple nodes, it's important to separate the concerns of query evaluation from query execution. Figure 6.13 shows this structure.

The approach shown in figure 6.13 is one of moving the query to the data rather than moving the data to the query. This is an important part of NoSQL big data strategies. In this instance, moving the query is handled by the database server, and distribution of the query and waiting for all nodes to respond is the sole responsibility of the database, not the application layer.

This approach is somewhat similar to the concept of federated search. Federated search takes a single query, distributes it to distinct servers, and then combines the results to give the user the impression they're searching a single system. In some cases, these servers may be in different geographic regions. In this case, you're sending your query to a single cluster that's not only performing search queries on a single local cluster but also performing update and delete operations.

6.9 Case study: event log processing with Apache Flume
In this case study, you'll see how organizations use NoSQL systems to gather and report on distributed event logs. Many organizations use NoSQL systems to process their event log data because the datasets can be large, especially in distributed environments. As you can imagine, each server generates hundreds of thousands of event records every day. If you multiply that by the number of servers to monitor, well, you get the picture: it's big data.


Few organizations store their raw event log data in RDBMSs, because they don't need the update and transactional processing features. Because NoSQL systems scale and integrate with tools like MapReduce, they're cost effective when you're looking to analyze event log data.

Though we'll use the term event log data to describe this data, a more precise term is timestamped immutable data streams. Timestamped immutable data is created once but never updated, so you don't have to worry about update operations. You only need to focus on the reliable storage of the records and the efficient analysis of the data, which is the case with many big data problems.

Distributed log file analysis is critical to allow an organization to quickly find errors in systems and take corrective action before services are disrupted. It's also a good example of the need for both real-time analysis and batch analysis of large datasets.

6.9.1 Challenges of event log data analysis
If you've ever been responsible for monitoring web or database servers, you know that you can see what's happening on a server by looking at its detailed log file. Log events add a record to the log file when your system starts up, when a job runs, and when warnings or errors occur.

Events are classified according to their severity level using a standardized set of severity codes. An example of these codes (from lowest to highest severity level) might be TRACE, DEBUG, INFO, WARNING, ERROR, or FATAL. These codes have been standardized in the Java Log4j system. Most events found in log files are informational (INFO level) events. They tell you how fast a web page is served or how quickly a query is executed. Informational events are generally used for looking at system averages and monitoring performance. Other event types such as WARNING, ERROR, or FATAL events are critical and should notify an operator to take action or intervene.

Filtering and reporting on log events on a single system is straightforward and can be done by writing a script that searches for keywords in the log file. In contrast, big data problems occur when you have hundreds or thousands of systems all generating events on servers around the world. The challenge is to create a mechanism to get immediate notification of critical events and allow the noncritical events to be ignored.

A common solution to this problem is to create two channels of communication between a server and the operations center. Figure 6.14 shows how these channels work. At the top of the diagram, you see where all events are pulled from the server, transformed, and then the aggregates updated in a reliable filesystem such as HDFS. In the lower part of the diagram, you see the second channel, where critical events are retrieved from the server and sent directly to the operations dashboard for immediate action.
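A minimal sketch of this two-channel routing idea looks like the following. The channel objects are simple stand-ins, and the choice of which severity levels count as critical is an assumption made for illustration.

import time

CRITICAL = {"ERROR", "FATAL"}                # levels treated as time-critical here

fast_channel, slow_channel = [], []          # stand-ins for the dashboard and HDFS feeds

def route(event):
    # Time-critical events go to the fast channel; everything else is queued
    # for bulk storage and later batch analysis.
    target = fast_channel if event["severity"] in CRITICAL else slow_channel
    target.append(event)

route({"ts": time.time(), "severity": "INFO",  "msg": "page served in 120 ms"})
route({"ts": time.time(), "severity": "FATAL", "msg": "disk full on node-7"})

print(len(fast_channel), len(slow_channel))  # 1 critical event, 1 bulk event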


[Figure 6.14: all events flow from the server through MapReduce transforms (minutes or more of delay) into an event database of updated aggregates; time-sensitive critical events flow in real time (millisecond delay) to the operations dashboard.]

Figure 6.14 Critical time-sensitive events must be quickly extracted from log event streams and routed directly to an operator's console. Other events are processed in bulk using MapReduce transforms after they've been stored in a reliable filesystem such as HDFS.

To meet these requirements, your system must meet the following objectives:

 It must filter out time-sensitive events based on a set of rules.
 It must efficiently and reliably transmit all events in large batch files to a centralized event store.
 It must reliably route all time-sensitive events using a fast channel.
Let’s see how you can meet these objectives with Apache Flume.

6.9.2 How Apache Flume works to gather distributed event data

Apache Flume is an open source Java framework specifically designed to process event log data. The word flume refers to a water-filled trough used to transport logs in the lumber industry. Flume is designed to provide a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from different sources to a centralized data store. Because Flume was created by members of the Hadoop community, HDFS and HBase are the most common storage targets. Flume is built around the concept of a flow pipeline, as depicted in figure 6.15.

Figure 6.15 The key components of an Apache Flume flow pipeline. Data arrives at a Flume agent through a source component that’s driven by a client Java component. The agent contains multiple data channels that are made available to one or more sink objects.


Figure 6.16 How Log4j agents might be configured to write log data to a Flume agent with a slow and a fast channel. All data will be written directly to HDFS. All time-critical data will be written directly to an operator console.

Here’s a narrative of how a flow pipeline works:

1 A client program such as a Log4jAppender writes log data into a log file. The client program typically is part of the application being monitored.
2 A source within a Flume agent program receives all events and writes them to one or more durable channels. Channel events persist even if a server goes down and must be restarted.
3 Once an event arrives in a channel, it’ll stay there until a sink service removes it. The channel is responsible for making sure that all events are reliably delivered to a sink.
4 A sink is responsible for pulling events off of the channel and delivering them to the next stage. This can be the source of another Flume agent, or a terminal destination such as HDFS. Terminal sinks will typically store the event on three or more separate servers to provide redundancy in case of a node failure.
Now let’s look at how you can configure Apache Flume to meet the specific slow and fast processing requirements we just described. Figure 6.16 is an example of this configuration.
Once log events are stored in HDFS, a regularly scheduled batch tool can be run periodically to summarize totals and averages for various events. For example, a report might generate average response times of web services or web page rendering times.
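Once events are sitting in bulk storage, the summary report just described is a simple aggregation. The sketch below assumes each stored event is a small dictionary with hypothetical service and response_ms fields; a production job would express the same grouping as a MapReduce job over HDFS.

from collections import defaultdict

def average_response_times(events):
    """Group INFO-level timing events by service and compute average response time."""
    totals = defaultdict(lambda: [0.0, 0])   # service -> [sum of ms, count]
    for event in events:
        if event.get("level") == "INFO" and "response_ms" in event:
            entry = totals[event["service"]]
            entry[0] += event["response_ms"]
            entry[1] += 1
    return {service: total / count for service, (total, count) in totals.items()}

events = [
    {"level": "INFO", "service": "checkout", "response_ms": 120},
    {"level": "INFO", "service": "checkout", "response_ms": 180},
    {"level": "INFO", "service": "search",   "response_ms": 45},
]
print(average_response_times(events))   # {'checkout': 150.0, 'search': 45.0}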

6.9.3 Further thoughts

This case study showed how Apache Flume supplies an infrastructure for allowing programs to subscribe to key events and route them to different services with different latency requirements. Apache Flume is a custom-built framework specifically created with the intent of reliably transferring event log data into a central data store such as HDFS. HDFS, in turn, is ideally suited to storing large blocks of read-mostly data. HDFS has no extra overhead for transaction control or update operations; its focus is large and reliable storage.
HDFS is designed as an efficient source for all your analytical reports written in MapReduce. Since data can be evenly distributed over hundreds of nodes in a Hadoop cluster, the MapReduce reports can quickly build whatever summary data you need.


This is ideal for creating materialized views and storing them in your RDBMSs or NoSQL database.
Although Apache Flume was originally written for processing log files, it’s a general-purpose tool and can be used on other types of immutable big data problems such as data loggers or raw data from web crawling systems. As data loggers get lower in price, tools like Apache Flume will be needed to preprocess more big data problems.

6.10 Case study: computer-aided discovery of health care fraud

In this case study, we’ll take a look at a problem that can’t be easily solved using a shared-nothing architecture: looking for patterns of fraud in large graphs. Highly connected graphs aren’t partition tolerant—meaning that you can’t divide the queries on a graph across two or more shared-nothing processors. If your graph is too large to fit in the RAM of a commodity processor, you may need to look at an alternative to a shared-nothing system.
This case study is important because it explores the limits of what a cluster of shared-nothing systems can do. We include this case study because we want to avoid a tendency for architects to recommend large shared-nothing clusters for all problems. Although shared-nothing architectures work for many big data problems, they don’t provide for linear scaling of highly connected data such as graphs or RDBMSs containing joins. Looking for hidden patterns in large graphs is one area that’s best solved with a custom hardware approach.

6.10.1 What is health care fraud detection?

The US Office of Management and Budget estimates that improper payments in Medicare and Medicaid came to $50.7 billion in 2010, nearly 8.5% of the annual Medicare budget. A portion of this staggering figure is the result of improper documentation, but it’s certain that Medicare fraud costs taxpayers tens of billions of dollars annually.
Existing efforts to detect fraud have focused on searching for suspicious submissions from individual beneficiaries and health care providers. These efforts yielded $4.1 billion in fraud recovery in 2011, around 10% of the total estimated fraud.
Unfortunately, fraud is becoming more sophisticated, and detection must move beyond the search for individuals to the discovery of patterns of collusion among multiple beneficiaries and/or health care providers. Identifying these patterns is challenging, as fraudulent behaviors continuously change, requiring the analyst to hypothesize that a pattern of relationships could indicate fraud, visualize and evaluate the results, and iteratively refine the hypothesis.


6.10.2 Using graphs and custom shared-memory hardware to detect health care fraud

Graphs are valuable in situations where data discovery is required. Graphs can show relationships between health care beneficiaries, their claims, associated care providers, tests performed, and other relevant data. Graph analytics search through the data to find patterns of relationships between all of these entities that might indicate collusion to commit fraud.
The graph representing Medicare data is large: it represents six million providers, a hundred million patients, and billions of claim records. The graph data is interconnected between health care providers, diagnostic tests, and common treatments associated with each patient and their claim records. This amount of data can’t be held in the memory of a single server, and partitioning the data across multiple nodes in a computing cluster isn’t feasible. Attempts to do so may result in incomplete queries due to all the links crossing partition boundaries, the need to page data in and out of memory, and the delays added by slower network and storage speeds. Meanwhile, fraud continues to occur at an alarming rate.
Medicare fraud analytics requires an in-memory graph solution that can merge heterogeneous data from a variety of sources, use queries to find patterns, and discover similarities as well as exact matches. With every item of data loaded into memory, there’s no need to contend with the issue of graph partitioning. The graph can be dynamically updated with new data easily, and existing queries can integrate the new data into the analytics being performed, making the discovery of hidden relationships in the data feasible. Figure 6.17 shows the high-level architecture of how shared-memory systems are used to look for patterns in large graphs.
With these requirements in mind, a US federally funded lab with a mandate to identify Medicare and Medicaid fraud deployed YarcData’s Urika appliance. The appliance is capable of scaling from 1–512 terabytes of memory, shared by up to 8,192 graph accelerator CPUs.

Figure 6.17 How large graphs are loaded into a central shared-memory structure. This example shows a graph in a central multi-terabyte RAM store with potentially hundreds or thousands of simultaneous threads in CPUs performing queries on the graph. Note that, like other NoSQL systems, the data stays in RAM while the analysis is processing. Each CPU can perform an independent query on the graph without interfering with the others.
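To make the idea of an in-memory graph pattern query concrete, here is a toy Python sketch. It is purely illustrative: the claim records are invented, the "providers who share unusually many patients" rule is a simplistic stand-in for real collusion patterns, and it has nothing to do with the appliance’s actual RDF/SPARQL interface.

from itertools import combinations

# Hypothetical claim records: (provider, patient) pairs held entirely in memory.
claims = [
    ("provider_a", "patient_1"), ("provider_a", "patient_2"), ("provider_a", "patient_3"),
    ("provider_b", "patient_1"), ("provider_b", "patient_2"), ("provider_b", "patient_3"),
    ("provider_c", "patient_9"),
]

# Build the graph as an adjacency map: provider -> set of patients billed.
patients_by_provider = {}
for provider, patient in claims:
    patients_by_provider.setdefault(provider, set()).add(patient)

def suspicious_pairs(graph, threshold=3):
    """Return provider pairs whose patient overlap meets the threshold."""
    results = []
    for (p1, s1), (p2, s2) in combinations(graph.items(), 2):
        shared = s1 & s2
        if len(shared) >= threshold:
            results.append((p1, p2, sorted(shared)))
    return results

print(suspicious_pairs(patients_by_provider))
# [('provider_a', 'provider_b', ['patient_1', 'patient_2', 'patient_3'])]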


Figure 6.18 Interacting with the Urika graph analytics appliance. Users load RDF data into the system and then send graph queries using SPARQL. The results of these queries are then sent to tools that allow an analyst to view graphs or generate reports.

These graph accelerator CPUs were purpose-built for the challenges of graph analytics, and are instrumental in enabling Urika to deliver two to four orders of magnitude better performance than conventional clusters.
The impact of this performance is impressive. Interactive responses to queries become the norm, with responses in seconds instead of days. That’s important because when queries reveal unexpected relationships, analysts can, within minutes, modify their searches to leverage the findings and uncover additional evidence. Discovery is about finding unknown relationships, and this requires the ability to quickly test new hypotheses.
Now let’s see how users can interact with a typical graph appliance. Figure 6.18 shows how data is moved into a graph appliance like Urika and how outputs can be visualized by a user. The software stack of the appliance leverages the RDF and SPARQL W3C standards for graphs, which facilitates the import and integration of data from multiple sources. The visualization and dashboard tools required for fraud analysis have their own unique requirements, so the appliance’s ability to quickly and easily integrate custom visualization and dashboards is key to rapid deployment.
Medicare fraud analytics is similar to financial fraud analysis, or the search for persons of interest in counter-terrorism or law enforcement agencies, where the discovery of unknown or hidden relationships in the data can lead to substantial financial or safety benefits.

6.11 Summary

In this chapter, we reviewed the ability of NoSQL systems to handle big data problems using many processors. It’s clear that moving from a single CPU to distributed database systems adds new management challenges that must be considered. Luckily, most NoSQL systems are designed with distributed processing in mind. They use techniques to spread the computing load evenly among hundreds or even thousands of nodes.
The problems of large datasets that need rapid analysis won’t go away. Barring an event like the zombie apocalypse, big data problems will continue to grow at exponential rates. As long as people continue to create and share data, the need to quickly analyze it and discover patterns will continue to be part of most business plans.


To be players in the future, almost all organizations will need to move away from single-processor systems to distributed computing to handle the ever-increasing demands of big data analysis.
Having large numbers of records and documents in your NoSQL database can complicate the process of finding one or more specific items. In our next chapter, we’ll tackle the problems of search and findability.

6.12 Further reading

 Apache Flume. http://flume.apache.org/.
 Barney, Blaise. “Introduction to Parallel Computing.” https://mng.bz/s59m.
 Doshi, Paras. “Who on earth is creating Big data?” Paras Doshi—Blog. http://mng.bz/wHtd.
 “Expressive Power in Database Theory.” Wikipedia. http://mng.bz/511S.
 “Federated search.” Wikipedia. http://mng.bz/oj3i.
 Gottfrid, Derek. “The New York Times Archives + Amazon Web Services = TimesMachine.” New York Times, August 2, 2013. http://mng.bz/77N6.
 Hadoop Wiki. “Mounting HDFS.” http://mng.bz/b0vj.
 Haslhofer, Bernhard, et al. “European RDF Store Report.” March 8, 2011. http://mng.bz/q2HP.
 “Java logging framework.” Wikipedia. http://mng.bz/286z.
 Koubachi. http://www.koubachi.com/features/system.
 McColl, Bill. “Beyond Hadoop: Next-Generation Big Data Architectures.” GigaOM, October 23, 2010. http://mng.bz/2FCr.
 whitehouse.gov. “Obama Administration Unveils ‘Big Data’ Initiative: Announces $200 Million in New R&D Investments.” March 29, 2012. http://mng.bz/nEZM.
 YarcData. http://www.yarcdata.com.

Finding information with NoSQL search

This chapter covers

 Types of search
 Strategies and methods for NoSQL search
 Measuring search quality
 NoSQL index architectures

What we find changes who we become. —Peter Morville

We’re all familiar with web search sites such as Google and Bing where we enter our search criteria and quickly get high-quality search results. Unfortunately, many of us are frustrated by the lack of high-quality search tools on our company intranets or within our database applications. NoSQL databases make it easier to build high-quality search directly into a database application by integrating the database with search frameworks and tools such as Apache Lucene, Apache Solr, and ElasticSearch.



NoSQL systems combine document store concepts with full-text indexing solutions, which results in high-quality search solutions that produce better search results. Understanding why NoSQL search results are superior will help you evaluate the merits of these systems.
In this chapter, we’ll show you how NoSQL databases can be used to build high-quality and cost-effective search solutions, and help you understand how findability impacts NoSQL system selection. We’ll start this chapter with definitions of search terms, and then introduce some more complex concepts used in search technologies. Later, we’ll look at three case studies that show how reverse indexes are created and how search is applied in technical documentation and reporting.

7.1 What is NoSQL search?

For our purposes, we’ll define search as finding an item of interest in your NoSQL database when you have partial information about the item. For example, you may know some of the keywords in a document, but not know the document title, author, or date of creation. Search technologies apply to highly structured records similar to those in an RDBMS as well as “unstructured” plain-text documents that contain words, sentences, and paragraphs. There are also a large number of documents that fall somewhere in the middle, called semi-structured data.
Search is one of the most important tools to help increase the productivity of knowledge workers. Studies show that finding the right document quickly can save hours of time each day. Companies such as Google and Yahoo!, pioneers in the use of NoSQL systems, were driven by the problems involved in document search and retrieval. Before we begin looking at how NoSQL systems can be used to create search solutions, let’s define some terms used when building search applications.

7.2 Types of search

As you’re building applications, you’ll come to the point where providing search will be important to your users. So let’s look at the types of search you could provide: Boolean search used in RDBMSs, full-text keyword search used in frameworks such as Apache Lucene, and structured search popular in NoSQL systems that use XML or JSON type documents.

7.2.1 Comparing Boolean, full-text keyword, and structured search models

If you’ve used RDBMSs, you might be familiar with creating search programs that look for specific records in a database. You might also have used tools such as Apache Lucene and Apache Solr to find specific documents using full-text keyword search. In this section, we’ll introduce a new type of search: structured search. Structured search combines features from both Boolean and full-text keyword search. To get us started, table 7.1 compares the three main search types.


Table 7.1 A comparison of Boolean, full-text keyword, and structured search. Most users are already familiar with the benefits of Boolean and full-text keyword search. NoSQL databases that use document stores offer a third type, structured search, that retains the best features of Boolean and full-text keyword search. Only structured search gives you the ability to combine AND/OR statements with ranked search results.

Boolean search—used in RDBMSs. Ideal for searches where AND/OR logic can be applied to highly structured data.
  Structures used: rows of tables that conditionally match a WHERE clause
  Ranked search results: no
  Combines full-text and conditional logic: no
  Best for: highly structured data

Full-text keyword search—used for unstructured document search of natural language text.
  Structures used: documents, keywords, and vector distance results
  Ranked search results: yes
  Combines full-text and conditional logic: no
  Best for: unstructured text files

Structured search—a combination of full-text and Boolean search tools.
  Structures used: XML or JSON documents; XML documents may include entity markup
  Ranked search results: yes
  Combines full-text and conditional logic: yes
  Best for: semi-structured documents

The challenge with Boolean search systems is that they don’t provide any “fuzzy match” functions. They either find the information you’re looking for or they don’t. To find a record, you must use trial and error by adding and deleting parameters to expand and contract the search results. RDBMS search results can’t be sorted by how closely the search results match the request. They must be sorted by database properties such as the date last modified or the author.
In contrast, the challenge with full-text keyword search is that it’s sometimes difficult to narrow your search by document properties. For example, many document search interfaces don’t allow you to restrict your searches to include documents created over a specific period of time or by a specific group of authors.
If you use structured search, you get the best of both worlds in a single search function. NoSQL document stores can combine the complex logic functions of Boolean AND/OR queries and use the ranked matches of full-text keywords to return the right documents in the right order.
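As a rough illustration of this “best of both worlds” idea, the sketch below filters documents with Boolean conditions on structured fields and then ranks the survivors by a simple keyword count. The documents, field names, and scoring rule are invented for illustration; real document stores push this work into their query engines and full-text indexes.

def structured_search(docs, author=None, after_year=None, keywords=()):
    """Boolean filter on metadata, then rank by keyword frequency in the body."""
    def matches(doc):
        if author and doc["author"] != author:
            return False
        if after_year and doc["year"] <= after_year:
            return False
        return True

    def score(doc):
        words = doc["body"].lower().split()
        return sum(words.count(k.lower()) for k in keywords)

    hits = [(score(d), d["title"]) for d in docs if matches(d)]
    return sorted((h for h in hits if h[0] > 0), reverse=True)

docs = [
    {"title": "NoSQL at scale", "author": "Lee", "year": 2013,
     "body": "NoSQL systems scale out across many nodes"},
    {"title": "SQL tuning",     "author": "Lee", "year": 2011,
     "body": "Indexes and query plans"},
]
print(structured_search(docs, author="Lee", after_year=2012, keywords=["NoSQL", "scale"]))
# [(2, 'NoSQL at scale')]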

7.2.2 Examining the most common types of search

If you’re selecting a NoSQL system, you’ll want to make sure to look at the findability of the system: the characteristics of a database that help users find the records they need. NoSQL systems excel at combining structured and fuzzy search logic in ways that may not be found in RDBMSs. Here are a few types of searches you may want to include in your system:


 Full-text search—Full-text search is the process of finding documents that contain natural language text such as English. Full-text search is appropriate when your data has free-form text like you’d see in an article or a book. Full-text search techniques include processes for removing unimportant short stop words (and, or, the) and removing suffixes from words (stemming).
 Semi-structured search—Semi-structured searches are searches of data that has both the rigid structure of an RDBMS and full-text sentences like you’d see in a Microsoft Word document. For example, an invoice for hours worked on a consulting project might have long sentences describing the tasks that were performed on a project. A sales order might contain a full-text description of products in the order. A business requirements document might have structured fields for who requested a feature, what release it will be in, and a full-text description of what the feature will do.
 Geographic search—Geographic search is the process of changing search result ranking based on geographic distance calculations. For example, you might want to search for all sushi restaurants within a five-minute drive of your current location. Search frameworks such as Apache Lucene now include tools for integrating location information in search ranking.
 Network search—Network search is the process of changing search result rankings based on information you find in graphs such as social networks. You might want your search to only include restaurants that your friends gave a four- or five-star rating. Integrating network search results can require use of social network APIs to include factors such as “average rating by my Facebook friends.”
 Faceted search—Faceted search is the process of including other document properties within your search criteria, such as “all documents written by a specific author before a specific date.” You can think of facets as subject categories to narrow your search space, but facets can also be used to change search ranking. Setting up faceted search on an ordinary collection of Microsoft Word documents can be done by manually adding multiple subject keywords to each document. But the costs of adding keywords can be greater than the benefits gained. Faceted search is used when there’s high-quality metadata (information about the document) associated with each document. For example, most libraries purchase book metadata from centralized databases to allow you to narrow searches based on subject, author, publication date, and other standardized fields. These fields are sometimes referred to as the Dublin Core properties of a document.
 Vector search—Vector search is the process of ranking document results based on how close they are to search keywords using multidimensional vector distance models. Each keyword can be thought of as its own dimension in space, and the distance between a query and each document can be calculated much like a geographic distance. This is illustrated in figure 7.1.


Figure 7.1 Vector search is a way to find documents that are closest to a keyword. By counting the number of keywords per page, you can rank all documents in a keyword space, where the search score is a distance measurement.
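Here is a toy version of the distance idea in figure 7.1, using just two made-up keyword dimensions: each document becomes a point whose coordinates are keyword counts, and the document closest to the query ranks first. Real engines use many more dimensions and more sophisticated weighting, but the geometry is the same.

import math

def keyword_vector(text, keywords):
    # Count how often each keyword appears; each keyword is one dimension.
    words = text.lower().split()
    return [words.count(k) for k in keywords]

def distance(v1, v2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

keywords = ["love", "war"]                      # two keyword dimensions
docs = {
    "sonnet": "love love",
    "history": "war war love",
}
query_vec = keyword_vector("love", keywords)    # the user's search keyword
ranked = sorted(docs, key=lambda d: distance(keyword_vector(docs[d], keywords), query_vec))
print(ranked)   # ['sonnet', 'history'] -- the sonnet is the closer match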

As you might guess, calculating search vectors is complex. Luckily, vector distance calculations are included in most full-text search systems. Once your full-text indexes have been created, the job of building a search engine can be as easy as combining your query with a search system query function. Vector search is one of the key technologies that allow users to perform fuzzy searches. They help you find inexact matches to documents that are “in the neighborhood” of your query keywords. Vector search tools also allow you to treat entire documents as a keyword collection for additional searches. This feature allows search systems to add functions such as “find similar documents” to an individual document.
 N-gram search—N-gram search is the process of breaking long strings into short, fixed-length strings (typically three characters long) and indexing these strings for exact match searches that may include whitespace characters. N-gram indexes can take up a large amount of disk space, but are the only way to quickly search some types of text such as software source code (where all characters including spaces may be important). N-gram indexes are also used for finding patterns in long strings of text such as DNA sequences.
Although there are clearly many types of searches, there are also many tools that make these searches fast. As we move to our next section, you’ll see how NoSQL systems are able to find and retrieve your requested information rapidly.

7.3 Strategies and methods that make NoSQL search effective

So how are NoSQL systems able to take your search request and return the results so fast? Let’s take a look at the strategies and methods that make NoSQL search systems so effective:

 Range index—A range index is a way of indexing all database element values in increasing order. Range indexes are ideal for alphabetical keywords, dates, timestamps, or amounts where you might want to find all items equal to a specific value or between two values. Range indexes can be created on any data type as long as that data type has a logically distinct way of sorting items. It wouldn’t make sense to create a range index on images or full-text paragraphs.


 Reverse index—A reverse index is a structure similar to the index you’ll find in the back of a book. In a book, each of the entries is listed in alphabetical order at the end of the book with the page numbers where the entry occurs. You can go to any entry in the index and quickly see where that term is used in the book. Without an index, you’d be forced to scan through the entire book. Search software uses reverse indexes in the same way. For each word in a document collection there’s a list of all the documents that contain that word. Figure 7.2 contains a screen image of a Lucene index of the works of Shakespeare. Search frameworks such as Apache Lucene are designed to create and maintain reverse indexes for large document collections. These reverse indexes are used to speed the lookup of documents that contain keywords.
 Search ranking—Search ranking is the process of sorting search results based on the likelihood that the found item is what the user is looking for. The higher a document’s keyword density for the requested word, the more likely that document is about that keyword. The term keyword density refers to how often the word occurs in a document, weighted by the size of the document.

Figure 7.2 Browsing a reverse index of Shakespeare plays for the keywords that start with the string “love.” In this example, the plays were encoded in the TEI XML format and then indexed by Apache Lucene.


If you only counted keyword occurrences, then longer documents with more keywords would always get a higher ranking. Search ranking should take into account the number of times a keyword appears in a document and the total number of words in the document so that longer documents don’t always appear first in search results. Ranking algorithms might consider other factors such as document type, recommendations from your social networks, and relevance to a specific task.
 Stemming—Stemming is the process of allowing a user to include variations of a root word in a search but still match different forms of a word. For example, if a person types in the keyword walk, then documents with the words walks, walked, and walking might all be included in search results.
 Synonym expansion—Synonym expansion is the process of including synonyms of specific keywords in search results. For example, if a user typed in aspirin for a keyword, the chemical names for the drugs salicylic acid and acetylsalicylic acid might be added to the keywords used in a search. The WordNet database is a good example of using a thesaurus to include synonyms in search results.
 Entity extraction—Entity extraction is the process of finding and tagging named entities within your text. Objects such as dates, person names, organizations, geolocations, and product names might be types of entities that should be tagged by an entity extraction program. The most common way of tagging text is by using XML wrapper elements. Native XML databases, such as MarkLogic, provide functions for automatically finding and tagging entities within your text.
 Wildcard search—Wildcard search is the process of adding special characters to indicate you want multiple characters to match a query. Most search frameworks support suffix wildcards, where the user specifies a query such as dog*, which will match words such as dog, dogs, or dogged. You can use * to match zero or more characters and ? to match a single character. Apache Lucene allows you to add a wildcard in the middle of a string. Most search engines don’t support leading wildcards, or wildcards before a string. For example, *ing would match all words that end with the suffix ing. This type of search isn’t frequently requested, and adding support for leading wildcards can double the size of the stored indexes.
 Proximity search—Proximity search allows you to search for words that are near other words in a document. Here you can indicate that you’re interested in all documents that have dog and love within 20 words of each other. Documents that have these words closer together will get a higher ranking in the returned results.
 Key word in context (KWIC)—Key-word-in-context libraries are tools that help you add keyword highlighting to each search result. This is usually done by adding an element wrapper around the keywords within the resulting document fragments in the search results page.


 Misspelled words—If a user misspells a keyword in a search form and the word the user entered is a nondictionary word, the search engine might return a “Did you mean” panel with spelling alternatives for the keyword. This feature requires that the search engine be able to find words similar to the misspelled word.
Not all NoSQL databases support all of these features. But this list is a good starting point if you’re comparing two distinct NoSQL systems. Next we look at one type of NoSQL database, the document store, that lends itself to high-quality search.

7.4 Using document structure to improve search quality

In chapter 4 we introduced the concept of document stores. You may recall that document stores keep data elements together in a single object. Document stores don’t “shred” elements into rows within tables; they keep all information together in a single hierarchical tree. Document stores are popular for search because this retained structure can be used to pinpoint exactly where in the document a keyword match is found. Using this keyword match position information can make a big difference in finding a single document in a large collection of documents.
If you retain the structure of the document, you can in effect treat each part of a large document as though it were another document. You can then assign different search result scores based on where in the document each keyword was found. Figure 7.3 shows how document stores leverage a retained structure model to create better search results.

Figure 7.3 Comparison of two types of document structures used in search. The left panel is the bag-of-words search, based on an extraction of all words in a document without consideration of where words occur in the document structure; all keywords sit in a single container and only count frequencies are stored with each word. The right panel shows a retained structure search that treats each node in the document tree as a separate document, so keywords are associated with each subdocument component. This allows keyword matches in the title to have a higher rank than keywords in the body of a document.


Let’s assume you’re searching for books on the topic of NoSQL. When you go to a publisher’s website and type NoSQL into a keyword search form, you’ll get many matches. The search system finds the keyword NoSQL in multiple places in each book:

 In the title of a book or title of a chapter
 In a glossary term or back-of-book index term
 In the body of the text of a book
 In a bibliographic reference
As you can guess, if a book has the keyword NoSQL in the title, there’s a good chance that the entire book is about NoSQL. On the other hand, there may be related books that have a chapter on the topic of NoSQL and a larger set of books that reference the term NoSQL in the text or in a bibliographic reference.
When the search system returns the results to the user, it would make sense to give the matches to a book title the highest score and the matches to a chapter title the second-highest score. A match in a glossary term or indexed word term might be next, followed by a match in body text. The last results might be in a bibliographic reference.
The business rules for raising the search score based on where in a document the word is found are called boosting. If you have no way to specify and find book and chapter titles within your documents, it’ll be difficult to boost their ranking. Using a larger font or a different font color won’t help search tools find the right documents. This is why using structured document formats such as DocBook can create higher-precision search rankings than using the bag-of-words pattern.
You can see how easy it is to improve your search results by using a document’s original structure. As we move to our next section, you’ll see how measuring search quality will help you compare NoSQL options.

7.5 Measuring search quality

Accurately measuring search quality is an important process in selecting a NoSQL database. From a quality perspective, you want your results to contain the search key and to rank the results accurately. To measure search quality, you use two metrics: precision and recall.

As you’ll see, combining these metrics will help you objectively measure the quality of your search results. An illustration of search quality is shown in figure 7.4. Your goal is to maximize both precision and recall. A metric called the F-measure, the harmonic mean of precision and recall, combines these values; a larger F-measure indicates higher search quality.

Figure 7.4 Search precision and recall. Search precision shows you the percent of target documents that are returned in actual search results. Two of the four documents in the actual search result are in the target area, for a precision of .5. Recall is the fraction of all target documents (darker dots) found in your actual search results. In this example, only two of the three darker dots are in the actual search results, for a recall of .66.
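Using the numbers from figure 7.4, a quick calculation shows how the two metrics combine into a single F-measure (here the common F1 form, the harmonic mean of precision and recall).

def f_measure(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

returned = 4           # documents in the actual search results
relevant_returned = 2  # of those, documents that were real targets
total_relevant = 3     # target documents in the whole collection

precision = relevant_returned / returned        # 0.5
recall = relevant_returned / total_relevant     # 0.666...
print(round(precision, 2), round(recall, 2), round(f_measure(precision, recall), 2))
# 0.5 0.67 0.57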


Organizations that sell search services have dedicated quality teams that continually monitor and modify search ranking algorithms to get higher F-measures. They develop processes that detect which search results users click on and look for ways to automatically increase the ranking score of relevant items while lowering the ranking score of items that users deem to be unrelated to their query.
Each search system has a way of changing the balance of precision and recall by including a broader set of documents in search results. A search engine can look up synonyms for a keyword and return all documents that contain either the keyword or one of its synonyms. Adding more documents to the search results will lower precision numbers and increase recall numbers. It’s important to strike a balance between precision and recall percentages that fits your system requirements.
Not all database selection projects will take the time to carefully measure precision and recall of competing systems. Setting up a large collection of documents and measuring relevancy of ranked search results can be time consuming and difficult to automate. But by retaining document structure, document stores have shown dramatic gains in both precision and recall.
Now that we’ve covered the types of searches and how NoSQL systems speed up these searches, we can compare how distributed systems use different strategies to store the indexes used in search optimization.

7.6 In-node indexes versus remote search services

There are two different ways that NoSQL systems store search indexes: in-node indexes and remote search services. Most NoSQL systems keep their data and indexes on the same node. But some NoSQL systems use external search services for full-text search. These systems keep the full-text indexes on a remote cluster and use a search API to generate search results. Since most NoSQL systems use one method or the other, understanding the trade-offs of each method will help you evaluate NoSQL options. Figure 7.5 illustrates these two options.

Figure 7.5 Integrated search vs. search services. The panel on the left shows how NoSQL systems store the indexes on the same node as the indexed data. The panel on the right shows a remote search service where indexes are stored on a remote cluster that executes a search service through an API.


When you use an in-node search system, the reverse indexes are located on the same node as the data. This allows you to send a query to each node and have it respond with the search results without having to do any additional input/output to include search information. If you retain document structure, you can also use structural match rules to change the query results based on where in a document a match occurs.
In contrast, a search service sends documents to an external search cluster when they need to be indexed. This is usually done by a collection trigger that’s fired when any document is added or updated in the database. Even if a single word within a document is altered, the entire document is sent to the remote service and re-indexed. When a search is performed, the keywords in the search are sent to the remote system and all document IDs that match the search are returned. Note that the actual documents aren’t returned. Only a list of the document IDs and their ranking scores is returned. Apache Solr and ElasticSearch are both examples of software that can be configured as a remote search service.
Let’s look at the various trade-offs of these two approaches.
Advantages of in-node index architecture:
 Lower network usage; documents aren’t sent between clusters, resulting in higher performance
 Ideal for large documents that have many small and frequent changes
 Better fine-grained control of search results on structured documents
Advantages of remote service architecture:
 Ability to take advantage of prebuilt and pretested components for standard functions such as creating and maintaining full-text search indexes
 Easier to upgrade to new features of remote search services
 Ideal for documents that are added once without frequent updates
These are high-level guidelines, and each NoSQL system or version might have exceptions to these rules. You can see that how often you update documents has an impact on which architecture is right for you. Whatever architecture you select, we recommend that you take the time to test a configuration that closely matches your business challenge.
Our next section will take a look at one way to speed up the initial document indexing process: the creation of reverse indexes to support full-text search.

7.7 Case study: using MapReduce to create reverse indexes

One of the most time-consuming parts of building any search system is creating the reverse indexes for new full-text documents as they’re imported into your NoSQL database. A typical 20-page document with 5,000 words can result in 5,000 additions to your reverse index. Indexing 1,000 documents into your collection would require approximately five million index updates. Spreading the load of this process over multiple servers is the best way to index large document collections.


MapReduce is an ideal tool to use when creating reverse indexes due to its ability to scale horizontally. Creating reverse indexes was the primary driver behind the Google MapReduce project, and the reason the Hadoop framework was created. Let’s take a step-by-step look at how you can use MapReduce to create reverse indexes.
To design a MapReduce job, you must break the problem into multiple steps. The first step is to write a map function that takes your inputs (the source documents) and returns a set of key-value pairs. The second step is to write a reduce function that will return your results. In this case, the results will be the reverse index files. For each keyword, the reverse index lists what documents contain that word.
You may recall that the interface between the map and reduce phases must be a set of key-value pairs. The next question to answer is what to return for the key. The most logical key would be the word itself. The “value” of the key-value pair would be a list of all the document identifiers that contain that word.
Figure 7.6 shows the detailed steps in this process. You can see from this figure that before you process the inputs, you convert all words to lowercase and remove small stop words such as the, and, or, and to, since it’s unlikely they’ll be used as keywords. You then create a list of key-value pairs for each word where the document ID is the “value” part of the key-value pair. The MapReduce infrastructure then performs the “shuffle and sort” steps and passes the output to the final reduce phase, which collapses each group of word-document pairs into a word-document list item—the format of the reverse index.
In our next two sections we’ll look at case studies to see how search can be used to solve specific business problems.

Figure 7.6 Using the MapReduce algorithm to create a reverse index for three small documents (d1 “Sue likes cats.”, d2 “Cats like cat food.”, and d3 “Cats like to play.”). The normalization step removes punctuation and stop words and converts all words to lowercase. The output of the map phase must be a set of key-value pairs. The reduce function groups the keyword documents to form the final reverse index.
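A minimal single-process sketch of the map and reduce steps in figure 7.6: the map step normalizes each document and emits (word, document ID) pairs, and the reduce step groups the pairs into the final word-to-document list. In a real Hadoop job the same two functions would run in parallel across many nodes; the stop word list and normalization here are simplified.

from collections import defaultdict

STOP_WORDS = {"the", "and", "or", "to"}

def map_phase(doc_id, text):
    """Normalize the text and emit (word, doc_id) key-value pairs."""
    words = text.lower().replace(".", "").split()
    return [(w, doc_id) for w in words if w not in STOP_WORDS]

def reduce_phase(pairs):
    """Group the pairs by word to build the reverse index."""
    index = defaultdict(list)
    for word, doc_id in sorted(pairs):
        if doc_id not in index[word]:
            index[word].append(doc_id)
    return dict(index)

docs = {"d1": "Sue likes cats.", "d2": "Cats like cat food.", "d3": "Cats like to play."}
pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
print(reduce_phase(pairs))
# {'cat': ['d2'], 'cats': ['d1', 'd2', 'd3'], 'food': ['d2'], 'like': ['d2', 'd3'],
#  'likes': ['d1'], 'play': ['d3'], 'sue': ['d1']}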


7.8 Case study: searching technical documentation

This case study will look at the problem of searching technical documents. Having high-quality search for technical documentation can save you time when you’re looking for information. For example, if you’re using a complex software package and need help with a specific function, a high-quality, accurate search can quickly get you to the right feature.
As you’ll see, retaining document structure creates search systems with higher precision and recall. In the following example, we’ll use a specific XML file format called DocBook, which is ideal for search and retrieval of technical information. You’ll see how Apache Lucene can be integrated directly into a NoSQL database to create high-quality search. Note that the concepts used in this section are general and can be applied to formats other than DocBook.

7.8.1 What is technical document search?

Technical document search focuses on helping you quickly find a specific area of interest in technical documents. For example, you might be looking for a how-to tip in a software users’ guide, a diagram in a car repair manual, an online help system, or a college textbook. Technical publications use a process called single-source publishing, where all the output formats, such as web, online help, print, or EPUB, are derived from the same document source format. Figure 7.7 shows an example of how the DocBook XML format stores technical documentation.
DocBook is an XML standard specifically targeting technical publishing. DocBook defines over 600 elements that are used to store the content of a technical publication, including information about authors, revisions, sections, paragraph text, figures, captions, tables, glossary tags, and bibliographic information.

Figure 7.7 A sample of a DocBook XML file. The <title> element directly under the <book> element is the title of the book. A keyword hit within a book title has a higher score than a hit within the body text of the book, and hits in glossary terms may get a higher boost value. A key-word-in-context (KWIC) function can be used to highlight the keywords in the search hit.

DocBook is frequently customized for different types of publishing. Each organization that’s publishing a document will select a subset of DocBook elements and then add their own elements to meet their specific application. For example, a math textbook might include XML markup for equations (MathML), a chemistry textbook might include markup for chemical symbols (ChemML), and an economics textbook might add charts in XML format. These new XML vocabularies can be placed in different namespaces added to DocBook XML without disrupting the publishing processes.

7.8.2 Retaining document structure in a NoSQL document store

There are several ways to perform search on large collections of DocBook files. The most straightforward is to strip out all the markup information and send each document to Apache Lucene to create a reverse index. Each word would then be associated with a single document ID. The problem with this approach is that all the information about the word location within the document is lost. If a word occurs in a book or chapter title, it can’t be ranked higher than if the word occurs in a bibliographic note. Ideally, you want to retain the entire document structure and store the XML file in a native XML database. Then any match within a title can have a higher rank than if the match occurs within the body of the text.
The first step in creating a search function is to load all the XML documents into a collection structure. This structure logically groups similar documents and makes it easy to navigate the documents, similar to a file browser. After the documents have been loaded, you can run a script to find all unique elements in the document collection. This is known as an element inventory. The element inventory is then used as a basis for deciding what elements might contain information that you want to index for quick searches, and what index types you’ll use. Elements that contain dates might use a range index, and elements such as <title> and <para> that contain full text might use a full-text index.
In addition to the index type, you can also rank the probability that any element might be a good summary of the concepts in a section. We call this ranking process setting the boost values for a document collection. For example, a match on the title of a chapter will rank higher than a section title or a glossary keyword. After semantic weights have been created, a configuration file is created and the indexing process begins. Table 7.2 shows an example of these boost values.

Table 7.2 Example of boost values for a technical book search site

Element                    Boost value
Book title                 5.0
Chapter title              4.0
Glossary term              3.0
Indexed term               2.0
Paragraph text             1.0
Bibliographic reference    0.5

We should note that the boost values are also stored with the search result indexes so that they can be used to create precise search rankings. This means that if you change the boost values, the documents must be re-indexed. Although this example is somewhat simplified, it shows that accurate markup of book elements is critical to the search ranking process.
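As an illustration of how boost values like those in table 7.2 feed into a ranking, the sketch below scores a document by multiplying each element’s keyword-hit count by its boost. The element names and hit counts are invented; in practice a search framework such as Lucene applies boosts when the index or query is built.

# Boost values from table 7.2 (element name -> weight).
BOOSTS = {
    "book_title": 5.0, "chapter_title": 4.0, "glossary_term": 3.0,
    "indexed_term": 2.0, "paragraph": 1.0, "biblio_ref": 0.5,
}

def boosted_score(hits_by_element):
    """Sum keyword-hit counts weighted by the boost of the element they occur in."""
    return sum(BOOSTS.get(element, 1.0) * count
               for element, count in hits_by_element.items())

# Hypothetical hit counts for the keyword "NoSQL" in two books.
book_a = {"book_title": 1, "paragraph": 4}    # keyword in the title and body
book_b = {"paragraph": 6, "biblio_ref": 1}    # keyword only in body text and a reference
print(boosted_score(book_a), boosted_score(book_b))
# 9.0 6.5 -- book_a wins despite fewer raw hits because of the title boost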
Once you’ve determined the elements and boost values, you’ll create a configuration file that identifies the fields you’re interested in indexing. From there you can run a process that takes each document and creates a reverse full-text index using the element and boost values from your configuration file. Apache Lucene is an example of a framework that creates and maintains this type of index. All the keywords found in an element can then be associated with that element using a node identifier for the element. By storing the element node as well as the document, you know exactly in what element of the document the keyword was found.
After indexing, you’re now ready to create search functions that can work with both range and full-text indexes. The most common way to integrate text searches is by using an XQuery full-text library that returns the ranked results of a keyword query. The query is similar to a WHERE clause in SQL, but it also returns a score used to order all search results. Your XQuery can return any type of node within DocBook, such as a book, article, chapter, section, figure, or bibliographic entry.
The final step is to return a fragment of HTML for each hit in the search. At the top of the page, you’ll see the hits with the highest score. Most search tools return a block of text that shows the keyword highlighted within the text. This is known as a key-word-in-context (KWIC) function.

7.9 Case study: searching domain-specific languages—findability and reuse

Although we frequently think of search quality as a characteristic associated with a large number of text documents, there are also benefits to finding items such as software subroutines or specific types of programs created with domain-specific languages (DSLs). This case study shows how a search tool saved an organization time and money by allowing employees to find and reuse financial chart objects.
A large financial institution had thousands of charts used to create graphical financial dashboards. Most charts were generated by an XML specification file that described the features of each chart, such as the chart type (line chart, bar chart, scatter plot), title, axis, scaling, and labels. One of the challenges that the dashboard authors faced was how to lower the cost of creating a new chart by using an existing chart as a starting template.
All charts were stored on a standard filesystem. Each organization that requested charts had a folder that contained its charts. Because of this structure, there was no way to find charts sorted by their characteristics. Experienced chart authors knew where to look in the filesystem for an example of a template, but new chart authors often spent hours digging through old charts to find an old template that matched up with the new requirement.
One day a new staff member spent most of his day re-creating a chart when a similar chart already existed, but couldn’t be found. In a staff meeting a manager asked if there was some way that the charts could be loaded into a database and searched.
Storing charts in a relational database would’ve been a multimonth-long task. There were hundreds of chart properties and multiple chart variations. Even the process of adding keywords to each chart and placing them in a Word document would’ve been time consuming. This is an excellent example showing that high-variability data is best stored in a NoSQL system.
Instead of loading the charts into an RDBMS, the charts were loaded into an open source native XML document store (eXist-db) and a series of path expressions were created to search for various chart types. For example, all charts that had time across the horizontal x-axis could be found using an XPath expression on the x-axis descriptor. After finding specific charts with queries, chart keywords could be added to the charts using XQuery update statements.
You might find it ironic that the XML-based charting system was the preferred solution of an organization that had hundreds of person-years of experience with RDBMSs in the department. But the cost estimates to develop a full RDBMS solution seriously outweighed the benefits. Since the data was in XML format, there was no need for data modeling; they simply loaded and queried the information.
A search form was then added to find all charts with specific properties. The chart titles, descriptions, and developer note elements were indexed using the Apache Lucene full-text indexing tools. The search form allowed users to restrict searches by various chart properties, organization, and dates. After entering search criteria, the user performed a search, and preview icons of the charts were returned directly in the search results page.
As a result of creating the chart search service, the time for finding a chart in the chart library dropped from hours to a matter of seconds. A close match to the new target chart was usually returned within the first 10 results in the search screen. The company achieved additional benefits from being able to perform queries over all the prior charts. Quality and consistency reports were created to show which charts were consistent with the bank’s approved style guide. New charts could also be validated against quality and consistency guidelines before they were used by a business unit.
An unexpected result of the new system was that other groups within the organization began to use the financial dashboard system. Instead of building custom charts with low-level C programs, statistical programs, or Microsoft Excel, there was increased use of the XML chart standard, because non-experts could quickly find a chart that was similar to their needs. Users also knew that if they created a high-quality chart and added it to the database, there was a greater chance that others could reuse their work.
This case study shows that as software systems increase in complexity, finding the right chunk of code becomes increasingly important. Software reuse starts with findability. The phrase “you can’t reuse what you can’t find” is a good summary of this approach.

7.10 Apply your knowledge

Sally works in an information technology department with many people involved in the software development lifecycle (SDLC). SDLC documents include requirements, use cases, test plans, business rules, business terminology, report definitions, bug reports, and documentation, as well as the actual source code being developed.
Although Sally’s department built high-quality search solutions for their business units, the proverb “The shoemaker’s children go barefoot” seemed to apply to their group. SDLC documents were stored in many formats such as MS Word, wikis, spreadsheets, and source code repositories, in many locations. There were always multiple versions, and it wasn’t clear which versions were approved by a business unit or who should approve documents.
The source code repositories the department used had strong keyword search, yet there was no way users could perform faceted queries such as “show all new features in the 3.0 release of an internal product approved by Sue Johnson after June 1.”
Sally realized that putting SDLC documents in a single NoSQL database that had integrated search features could help alleviate these problems. All SDLC documents, from requirements to source code and bugs, could be treated as documents and searched with the tools provided by the NoSQL database vendor. Documents that had structure could also be queried using faceted search interfaces. Since almost all documents had timestamps, the database could create timeline views that allowed users to see when code was checked in and by which developers, and relate these to bugs and problem reports.
The department also started to add more metadata to the searchable database. This included information about database elements and their definitions, lists of tables, columns, business rules, and process flows. This became a flexible metadata registry for an official reviewed and approved “single version of the truth.”
Using a NoSQL database as an integrated document store and metadata registry allowed the team to quickly increase the productivity of the department. In time, new web forms and easy-to-modify wiki-like structures were created to make it easier for developers to add and update SDLC data.

7.11 Summary

In the chapter on big data, you saw that the amount of available data generated by the web and internal systems continues to grow exponentially. As organizations continue to put this information to use, the ability to locate the right information at the right time is of growing concern. In this chapter, we’ve focused attention on showing you how to find the right item in your big data collection. We’ve talked about the types of searches that can be done by your NoSQL database and the ways in which NoSQL systems make searching fast.
You’ve seen how retaining a document’s structure in a document store can increase the quality of search results. This process is enabled by associating a keyword, not with a document, but with the element that contains the keyword within a document.
Although we focused on topics you’ll need to fairly evaluate search components of a NoSQL system, we also demonstrated that highly scalable processes such as MapReduce can be used to create reverse indexes that enable fast search. Finally, our case studies showed how search solutions can be created using open source native XML databases and Apache Lucene frameworks.
Both the previous chapter on big data and this chapter on search emphasize the need for multiple processors working together to solve problems. Most NoSQL systems are a great fit for these tasks. NoSQL databases integrate the complex concepts of information retrieval to increase the findability of items in your database. In our next chapter, we’ll focus on high availability: how to keep all these systems running reliably.

7.12 Further reading

 AnyChart. “Building a Large Chart Ecosystem with AnyChart and Native XML Databases.” http://mng.bz/Pknr.
 DocBook. http://docbook.org/.
 “Faceted search.” Wikipedia. http://mng.bz/YgQq.
 Feldman, Susan, and Chris Sherman. “The High Cost of Not Finding Information.” 2001. http://mng.bz/IX01.
 Manning, Christopher, et al. Introduction to Information Retrieval. 2008, Cambridge University Press.
 McCreary, Dan. “Entity Extraction and the Semantic Web.” http://mng.bz/20A7.
“Entity Extraction and the Semantic Web.” http://mng.bz/20A7.  Morville, Peter. Ambient Findability. October 2005, O’Reilly Media.  NLP. “XML retrieval.” http://mng.bz/1Q9i.</p><p> www.it-ebooks.info Building high-availability solutions with NoSQL</p><p>This chapter covers</p><p> What is high availability?  Measuring availability  NoSQL strategies for high availability</p><p>Anything that can go wrong will go wrong. —Murphy’s law</p><p>Have you ever been using a computer application when it suddenly stops respond- ing? Intermittent database failures can be merely an annoyance in some situations, but high database availability can also mean the success or failure of a business. NoSQL systems have a reputation for being able to scale out and handle big data problems. These same features can also be used to increase the availability of data- base servers. There are several reasons databases fail: human error, network failure, hard- ware failure, and unanticipated load, to name a few. In this chapter, we won’t dwell on human error or network failure. We’ll focus on how NoSQL architectures use parallelism and replication to handle hardware failure and scaling issues.</p><p>172</p><p> www.it-ebooks.info What is a high-availability NoSQL database? 173</p><p>You’ll see how NoSQL databases can be configured to handle lots of data and keep data services running without downtime. We’ll begin by defining high-availability data- base systems and then look at ways to measure and predict system availability. We’ll review techniques that NoSQL systems use to create high-availability systems even when subcomponents fail. Finally, we’ll look at three real-world NoSQL products that are associated with high-availability service. 8.1 What is a high-availability NoSQL database? High-availability NoSQL databases are systems designed to run without interruption of service. Many web-based businesses require data services that are available without interruption. For example, databases that support online purchasing need to be avail- able 24 hours a day, 7 days a week, 365 days a year. Some requirements take this a step further, specifying that the database service must be “always on.” This means you can’t take the database down for scheduled maintenance or to perform software upgrades. Why must they be always on? Companies demanding an always-on environment can document a measurable loss in income for every minute their service isn’t avail- able. Let’s say your database supports a global e-commerce site; being down for even a few minutes could wipe out a customer’s shopping cart. Or what if your system stops responding during prime-time shopping hours in Germany? Interruptions like these can drive shoppers to your competitor’s site and lower customer confidence. From a software development perspective, always-on databases are a new require- ment. Before the web, databases were designed to support “bankers’ hours” such as 9 a.m. to 5 p.m., Monday through Friday in a single time zone. During off hours, these systems might be scheduled for downtime to perform backups, run software updates, run reports, or export daily transactions to data warehouse systems. But bankers’ hours are no longer appropriate for web-based businesses with customers around the world. A web storefront is a good example of a situation that needs a high-availability database that supports both reads and writes. 
Read-mostly systems optimized for big data analysis are relatively simple to configure for high availability using data replication. Our focus here is on high availability for large-volume read/write applications that run on distributed systems.

Always-on database systems aren't new. Companies like Tandem Computers have provided commercial high-availability database systems (such as the NonStop line) for ATM networks, telephone switches, and stock exchanges since the 1970s. These systems use symmetrical, peer-to-peer, shared-nothing processors that send messages between processors about overall system health. They use redundant storage and high-speed failover software to provide continuous database services. The biggest drawbacks to these systems are that they're proprietary, difficult to set up and configure, and expensive on a cost-per-transaction basis.

Distributed NoSQL systems can lower the per-transaction cost of systems that need to be both scalable and always-on. Although most NoSQL systems use nonstandard query languages, their design and the ability to be deployed on low-cost cloud computing platforms make it possible for startups, with minimal cash, to provide always-on databases for their worldwide customers.

Understanding the concept of system availability is critical when you're gathering high-level system requirements. Since NoSQL systems use distributed computing, they can be configured to achieve high availability of a specific service at a minimum cost.

To understand the concepts in this chapter, we'll draw on the CAP theorem from chapter 2. We said that when communication channels between partitions are broken, system designers need to choose the level of availability they'll provide to their customers. Organizations often place a higher priority on availability than on consistency. The phrase "A trumps C" implies that keeping orders flowing through a system is more important than consistent reporting during a temporary network failure to a replica server. Recall that these decisions are only relevant when there are network failures. During normal operations, the CAP theorem doesn't come into play.

Now that you have a good understanding of what high-availability NoSQL systems are and why they're good choices, let's find out how to measure availability.

8.2 Measuring availability of NoSQL databases

System availability can be measured in different ways and with different levels of precision. If you're writing availability requirements or comparing the SLAs of multiple systems, you may need to be specific about these measurements. We'll start with some broad measures of overall system availability and then dig deeper into more subtle measurements of system availability.

The most common notation for describing overall system availability is to state availability in terms of "nines," which is a count of how many times the number 9 appears in the designed availability. So three nines means that a system is predicted to be up 99.9% of the time, and five nines means the system should be up 99.999% of the time. Table 8.1 shows some sample calculations of downtime per year based on typical availability targets.

Stating your uptime requirements isn't an exact science. Some businesses can associate a revenue-lost-per-minute figure with total data service unavailability.
There are also gray areas where a system is so slow that a few customers abandon their shopping carts and move on to another site. There are other factors, not as easily measured, which can lead to losses as well, such as poor reputation and lack of confidence.

Table 8.1 Sample availability targets and annual downtime

Availability %            Annual downtime
99% (two nines)           3.65 days
99.9% (three nines)       8.76 hours
99.99% (four nines)       52.56 minutes
99.999% (five nines)      5.26 minutes

Measuring overall system availability is more than generating a single number. To fairly evaluate NoSQL systems, you need an understanding of the subtleties of availability measurements. If a business unit indicates they can't afford to be down more than eight hours in a calendar year, then you want to build an infrastructure that would provide three-nines availability. Most land-line telephone switches are designed for five-nines availability, or no more than five minutes of downtime per year. Today five nines is considered the gold standard for data services, with few situations warranting greater availability.

Although the use of counted nines is a common way to express system availability, it's usually not detailed enough to understand business impact. An outage of 30 seconds may seem to users like a slow day on the web. Some systems may show partial outage but have other functions that can step in to take their place, making the system only appear to work slowly. The end result is that no simple metric can be used to measure overall system availability. In practice, most systems look at the percentage of service requests that go beyond a specific threshold.

As a result, the term service level agreement or SLA is used to describe the detailed availability targets for any data service. An SLA is a written agreement between a service provider and a customer who uses the service. The SLA doesn't concern itself with how the data service is provided. It defines what services will be provided and the availability and response time goals for the service. Some items to consider when creating an SLA are:

■ What are the general service availability goals of the service in terms of percentage uptime over a one-year period?
■ What are the typical average response times for the service under normal operations? Typically these are specified in milliseconds between service request and response.
■ What is the peak volume of requests that the service is designed to handle? This is typically specified in requests per second.
■ Are there any cyclical variations in request volumes? For example, do you expect to see peaks at specific times of day, days of the week or month, or times of the year like holiday shopping or sporting events?
■ How will the system be monitored and service availability reported?
■ What is the shape of the service-call response distribution curve? Keeping track of the average response time may not be useful; many organizations focus on the slowest 5% of service calls.
■ What procedures should be followed during a service interruption?

NoSQL system configuration may depend on some of the exceptions to these general rules. The key focus should not be a single availability metric.

8.2.1 Case study: the Amazon S3 SLA

Now let's look at Amazon's SLA for their S3 key-value store service.
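The downtime figures in table 8.1 (and the roughly 53 minutes per year implied by the four-nines figure Amazon quotes for S3, discussed next) follow directly from the availability percentage. The short Python sketch below is our own illustration of that arithmetic; it isn't part of any vendor SLA.

MINUTES_PER_YEAR = 365 * 24 * 60          # 525,600 minutes in a (non-leap) year

def annual_downtime_minutes(availability_percent):
    # Unavailability is whatever fraction of the year falls outside the availability target.
    unavailable_fraction = 1.0 - (availability_percent / 100.0)
    return MINUTES_PER_YEAR * unavailable_fraction

for target in (99.0, 99.9, 99.99, 99.999):
    print("%.3f%% -> %8.1f minutes/year" % (target, annual_downtime_minutes(target)))

# The output reproduces table 8.1: two nines allows about 5,256 minutes (3.65 days),
# three nines about 526 minutes (8.76 hours), four nines about 52.6 minutes,
# and five nines about 5.3 minutes of downtime per year.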
Amazon's S3 is known as the most reliable cloud-based key-value store service available. S3 consistently performs well, even when the number of reads or writes on a bucket spikes. The system is rumored to be the largest, containing more than 1 trillion stored objects as of the summer of 2012. That's about 150 objects for every person on the planet. Amazon discusses several availability numbers on their website:

■ Annual durability design—This is the designed probability that a single key-value item will be lost over a one-year period. Amazon claims their design durability is 99.999999999%, or 11 nines. This number is based on the probability that your data object, which is typically stored on three hard drives, has all three drives fail before the data can be backed up. This means that if you store 10,000 items each year in S3 and continue to do so for 10 million years, there's about a 50% probability you'll lose one file. Not something that you should lose much sleep over. Note that a design is different from a service guarantee.
■ Annual availability design—This is a worst-case measure of how much time, over a one-year period, you'll be unable to write new data or read your data back. Amazon claims a worst-case availability of 99.99%, or four-nines availability for S3. In other words, in the worst case, Amazon thinks your key-value data store may not work for about 53 minutes per year. In reality, most users get much better results.
■ Monthly SLA commitment—In the S3 SLA, Amazon will give you a 10% service credit if your system is not up 99.9% of the time in any given month. If your data is unavailable for 1% of the time in a month, you'll get a 25% service credit. In practice, we haven't heard of any Amazon customer getting SLA credits.

It's also useful to read the wording of the Amazon SLA carefully. For example, it defines an error rate as the number of S3 requests that return an internal status error code. There's nothing in the SLA about slow response times. In practice, most users will get S3 availability that far exceeds the minimum numbers in the SLA. One independent testing service found essentially 100% availability for S3, even under high loads over extended periods of time.

8.2.2 Predicting system availability

If you're building a NoSQL database, you need to be able to predict how reliable your database will be. You need tools to analyze the response times of database services. Availability prediction methods calculate the overall availability of a system by looking at the predicted availability of each of the dependent (single-point-of-failure) subcomponents. If each subsystem is expressed as a simple availability prediction such as 99.9%, then multiplying the numbers together will give you an overall availability prediction. For example, if you have three single points of failure—99.9% for network, 99% for master node, and 99.9% for power—then the total system availability is the product of these three numbers: 98.8% (99.9% x 99% x 99.9%).

If there are single points of failure such as a master or name node, then NoSQL systems have the ability to gracefully switch over to use a backup node without a major service interruption. If a system can quickly recover from a failing component, it's said to have the property of automatic failover. Automatic failover is the general property of any service to detect a failure and switch to a redundant component.
Failback is the process of restoring a system component to its normal operation. Generally, this process requires some data synchronization. If your systems are configured with a single failover, you must use the probability that the failover process doesn't work in combination with the odds that the failover system fails before failback.

There are other metrics you can use besides the failure metric. If your system has a client request timeout of 30 seconds, you'll want to measure the total percentage of client requests that fail. In such a case, a better metric might be a factor called client yield, which is the probability of any request returning within a specified time interval. Other metrics, such as a harvest metric, apply when you want to include partial API results. Some services, such as federated search engines, may also return partial results. For example, if you search 10 separate remote systems and one of the sites is down for your call window of 30 seconds, you'd have a 90% harvest for that specific call. Harvest is the data available divided by the total data sources.

Finding the best NoSQL service for your application may require comparing the architecture of two different systems. The actual architecture may be hidden from you behind a web service interface. In these cases, it might make the most sense to set up a small pilot project to test the services under a simulated load.

When you set up a pilot project that includes stress testing, a key measurement will be a frequency distribution chart of read and write response times. These distributions can give you hints about whether a database service will scale. A key point of this analysis is that instead of focusing on average or mean response times, you should look at how long the slowest 5% of your services take to return. In general, a service with consistent response times will have higher availability than systems that sometimes have a high percentage of slow responses. Let's take a look at an example of this.

8.2.3 Apply your knowledge

Sally is evaluating two NoSQL options for a business unit that's concerned about web page response times. Web pages are rendered with data from a key-value store. Sally has narrowed down the field to two key-value store options; we'll call them Service A and Service B. Sally uses JMeter, a popular performance monitoring tool, to create a chart that shows read service response distributions, as shown in figure 8.1.

When Sally looks at the data, she sees that service A has faster mean response times. But at the 95th percentile level, they're longer than service B. Service B may have slower average response times, but they're still within the web page load time goals.

Figure 8.1 Frequency distribution chart showing mean vs. 95th percentile response times. Notice the two web service response distributions for two NoSQL key-value data stores under load. Service A has a faster mean value response time but much longer response times at the 95th percentile. Service B has longer mean value response times but shorter 95% value responses.

After discussing the results with the business unit, the team selects service B, since they feel it'll have more consistent response time under load.
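If you capture the raw response times from a JMeter run (or any load test), the comparison Sally made can be reproduced with a few lines of Python. The sketch below is only an illustration; the sample timings are made up to mirror the shapes in figure 8.1.

def percentile(samples, pct):
    # Nearest-rank percentile; good enough for comparing two services.
    ordered = sorted(samples)
    index = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[index]

# Hypothetical response times in milliseconds from a load test of each service.
service_a = [8, 9, 9, 10, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 15, 16, 18, 20, 140, 160]
service_b = [30, 31, 32, 32, 33, 33, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39, 40, 42, 45]

for name, samples in (("Service A", service_a), ("Service B", service_b)):
    mean = sum(samples) / float(len(samples))
    print("%s: mean %.1f ms, 95th percentile %d ms" % (name, mean, percentile(samples, 95)))

# Service A has the lower mean but a much worse 95th percentile, which is why the
# team chose the service with the more consistent response times.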
Now that we've looked at how you predict and measure system availability, let's take a look at strategies that NoSQL clusters use to increase system availability.

8.3 NoSQL strategies for high availability

In this section, we'll review several strategies NoSQL systems use to create high-availability services backed by clusters of two or more systems. This discussion will include the concepts of load balancing, clusters, replication, load and stress testing, and monitoring. As we look at each of these components, you'll see how NoSQL systems can be configured to provide maximum availability of the data service. One of the first questions you might ask is, "What if the NoSQL database crashes?" To get around this problem, a replica of the database can be created.

8.3.1 Using a load balancer to direct traffic to the least busy node

Websites that aim for high availability use a front-end service called a load balancer. A diagram of a load balancer is shown in figure 8.2. In this figure, service requests enter on the left and are sent to a pool of resources called the load balancer pool. The service requests are sent to a master load balancer and then forwarded to one of the application servers. Ideally, each application server has some type of load indication that tells the load balancer how busy it is. The least-busy application server will then receive the request. Application servers are responsible for servicing the request and returning the results. Each application server may request data services from one or many NoSQL servers. The results of these query requests are returned and the service is fulfilled.

Figure 8.2 A load balancer is ideal when you have a large number of processors that can each fulfill a service request. To gain performance advantages, all service requests arrive at a load balancer service that distributes the request to the least-busy processor. A heartbeat signal from each application server provides a list of which application servers are working. An application server may request data from one or more NoSQL databases.

8.3.2 Using high-availability distributed filesystems with NoSQL databases

Most NoSQL systems are designed to work on a high-availability filesystem such as the Hadoop Distributed File System (HDFS). If you're using a NoSQL system such as Cassandra, you'll see that it has its own HDFS-compatible filesystem. Building a NoSQL system around a specific filesystem has advantages and disadvantages.

Advantages of using a distributed filesystem with a NoSQL database:

■ Reuse of reliable components—Reusing prebuilt and pretested system components makes sense with respect to time and money. Your NoSQL system doesn't need to duplicate the functions in a distributed filesystem. Additionally, your organization may already have an infrastructure and trained staff who know how to set up and configure these systems.
■ Customizable per-folder availability—Most distributed filesystems can be configured on a folder-by-folder basis for high availability.
This gets around using a local filesystem with single points of failure to store input or output datasets. These systems can be configured to store your data in multiple locations; the default is generally three. This means that a client request would only fail if all three systems crashed at the same time. The odds of this occurring are low enough that three are sufficient for most service levels.  Rack and site awareness—Distributed filesystem software is designed to factor in how computer clusters are organized in your data center. When you set up your filesystem, you indicate which nodes are placed in which racks with the assump- tion that nodes within a rack have higher bandwidth than nodes in different racks. Racks can also be placed in different data centers, and filesystems can</p><p> www.it-ebooks.info 180 CHAPTER 8 Building high-availability solutions with NoSQL</p><p> immediately duplicate data on racks in remote data centers if one data center goes down. Disadvantages of using a distributed filesystem with a NoSQL database:  Lower portability—Some distributed filesystems work best on UNIX or Linux serv- ers. Porting these filesystems to other operating systems, such as Windows, may not be possible. If you need to run on Windows, you may need to add an addi- tional virtual machine layer, which may impose a performance penalty.  Design and setup time—When you’re setting up a well-designed distributed file- system, it may take some time to figure out the right folder structure. All the files within a folder share the same properties, such as replication factor. If you use creation dates as folder names, you may be able to lower replication for files that are over a specific period of time such as two years.  Administrative learning curve—Someone on your staff will need to learn how to set up and manage a distributed filesystem. These systems need to be moni- tored and sensitive data must be backed up. </p><p>8.3.3 Case study: using HDFS as a high-availability filesystem to store master data In chapter 6 on managing big data, we introduced the Hadoop Distributed File Sys- tem (HDFS). As you recall, HDFS can reliably store large files, typically from a gigabyte to a terabyte. HDFS can also tune replication on a file-by-file basis. By default most files in HDFS have a replication factor of 3, meaning that the data blocks that make up these files are located on three distinct nodes. A simple HDFS shell command can change the replication factor for any HDFS file or folder. There are two reasons you should raise the replication factor of a file in HDFS:</p><p> To lower the chance that the data will become unavailable—For example, if you have a data service that depends on this data with a five-nines guarantee, you might want to increase the replication factor from 3 to 4 or 5.  To increase the read-access times—If files have a large number of concurrent reads, you can increase the replication factor to allow more nodes to respond to those read requests. The primary reasons you lower replication is if you’re running out of disk space or you no longer require the same service levels required by high replication counts. If you’re concerned about running out of disk space, the replication factor can be low- ered again when the lack of data availability costs are lower and the read requirements aren’t as demanding. It’s not unusual for replication counts to go down as data gets older. 
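As an illustration only (not a standard Hadoop utility), the following Python sketch shows how such an age-based policy might be scripted on top of the stock hdfs dfs -setrep shell command; the folder layout, years, and replication thresholds are hypothetical.

import subprocess
from datetime import date

def set_replication(path, factor):
    # Shells out to the standard HDFS command; -w waits for re-replication to finish.
    subprocess.check_call(["hdfs", "dfs", "-setrep", "-w", str(factor), path])

def replication_for_age(age_in_years):
    # Policy: full replication for fresh data, fewer copies as data ages.
    if age_in_years < 1:
        return 3
    if age_in_years < 5:
        return 2
    return 1

this_year = date.today().year
for year in range(2008, this_year + 1):      # assumes folders named by year, such as /data/2011
    set_replication("/data/%d" % year, replication_for_age(this_year - year))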
For example, data that's over a year old might only have a replication factor of 2, and data that's over five years old may have a replication factor of 1.

One of the nice features of HDFS is its rack awareness. This is the ability for you to logically group HDFS nodes together in structures that reflect the way processors are stored in physical racks and connected together by an in-rack network. Nodes that are physically stored in the same rack usually have higher bandwidth connectivity between the nodes, and using this network keeps data off of other shared networks. This is shown in figure 8.3.

Figure 8.3 HDFS is designed to have rack awareness so that you can instruct data blocks to be spread onto multiple racks, which could be located in multiple data centers. In this example, all blocks are stored on three separate servers (replication factor of 3) and HDFS spreads the blocks over two racks: block 1 is stored on two servers in rack 1 and one server in rack 2, while block 2 is stored on two servers in rack 2 and one server in rack 1. If either rack becomes unavailable, there will always be a replica of both block 1 and block 2.

One of the advantages of rack awareness is that you can increase your availability by carefully distributing HDFS blocks on different racks. We've also discussed how NoSQL systems move the queries to the data, not the data to the query. Since the data is stored on three nodes, which node should run the query? The answer is usually the least-busy node. How does the query distribution system know what nodes hold the data? This is where there must be a tight coupling of the filesystem and the query distribution system. The information about what node the data is on must be communicated to the client program.

The primary disadvantage of using an external filesystem is that your database may not be as portable on operating systems that don't support these filesystems. For example, HDFS is usually run on UNIX or Linux operating systems. If you want to deploy HBase—which is designed to run on HDFS—you may have more hoops to jump through to get HDFS to run on a Windows system. Using a virtual machine is one way to do this, but there can be a performance penalty with using virtual machines.

We should note that you can get the same replication factor by using an off-the-shelf storage area network (SAN). What you won't get with this configuration is an easy way to keep the query and the data on the same server. Using a SAN for high-availability databases can work for small datasets, but larger datasets will result in excessive network traffic. In the long run, the Hadoop architecture of shared-nothing processors all working on their own copy of a large dataset is the most scalable solution.

Setting up a Hadoop cluster can be a great way to make sure your NoSQL database has both high availability and scale-out performance. Early versions of Hadoop (usually referred to as version 1.x) had a single point of failure on the NameNode service. For high availability, Hadoop used a secondary failover node that was automatically
Since 2010, there have been specialized releases of Hadoop that removed the single point of failure of the Hadoop NameNode. Although the NameNode was a weak link in setting up early Hadoop clusters, it was usually not the primary cause of most service failures. Facebook did a study of their service failures and found that only 10% were related to NameNode failures. Most were the result of human error or systematic bugs on all Hadoop nodes. </p><p>8.3.4 Using a managed NoSQL service Organizations find that even with an advanced NoSQL database, it takes a huge amount of engineering to create and maintain predictable high-availability data ser- vices that scale. Unless you have a large IT budget and specialized staff, you’ll find it more cost effective to let companies experienced in database setup and configuration handle the job and let your own staff focus on application development. Today the costs for using cloud-based NoSQL applications are a fraction of what internal IT departments charge to set up and configure systems. Let’s take a look at how an Amazon DynamoDB key-value store can be configured to give you high-availability. </p><p>8.3.5 Case study: using Amazon DynamoDB for a high-availability data store The original Amazon DynamoDB paper, introduced in chapter 1, was one of the most influential papers in the NoSQL movement. This paper detailed how Amazon rejected RDBMS designs and used its own custom distributed computing system to sup- port the requirements of horizontal scalability and high availability for their web shop- ping cart. Originally, Amazon didn’t make the DynamoDB software open source. Yet despite the lack of source code, the DynamoDB paper heavily influenced other NoSQL sys- tems such as Cassandra, Redis, and Riak. In February 2012, Amazon made DynamoDB available as a database service for other developers to use. This case study reviews the Amazon DynamoDB service and how it can be used as a fully managed, highly avail- able, scalable database service. Let’s start by looking at DynamoDB’s high-level features. Dynamo’s key innovation is its ability to quickly and precisely tune throughput. The service can reliably handle a large volume of read and write transactions, which can be tuned on a minute-by- minute basis by modifying values on a web page. Figure 8.4 shows an example of this user interface. DynamoDB handles how many servers are used and how the loads are balanced between the servers. Amazon provides an API so you can change the provisioned throughput programmatically based on the results of your load monitoring system. No operator intervention is required. Your monthly Amazon bill will be automatically adjusted as these parameters change.</p><p> www.it-ebooks.info NoSQL strategies for high availability 183</p><p>Figure 8.4 Amazon DynamoDB table throughput provisioning. By changing the number of read capacity units or write capacity units, you can tune each table in your database to use the correct number of servers based on your capacity needs. Amazon also provides tools to help you calculate the number of units your application will need.</p><p>Amazon implements the DynamoDB service using complex algorithms to evenly dis- tribute reliable read and write transactions over tens of thousands of servers. DynamoDB is also unique in that it was one of the first systems to be deployed exclu- sively on solid state disks (SSDs). Using SSDs allows DynamoDB to have a predictable service level. 
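To give a flavor of what the programmatic throughput changes described above might look like, here's a brief sketch using the AWS SDK for Python (boto3); the table name and capacity values are hypothetical, and a real deployment would derive them from its load-monitoring data rather than hard-coding them.

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Raise the provisioned capacity of a (hypothetical) orders table ahead of a peak period.
dynamodb.update_table(
    TableName="orders",
    ProvisionedThroughput={
        "ReadCapacityUnits": 500,    # reads per second the table should sustain
        "WriteCapacityUnits": 200,   # writes per second
    },
)

# Because Amazon bills for provisioned capacity, a companion call would scale the
# table back down once monitoring shows the peak has passed.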
One of the goals of DynamoDB is to provide consistent single-digit millisecond read times. The exclusive use of SSDs means that DynamoDB never has to take disk-access latency into consideration. The net result is that your users get consistent responses to all GET and PUT operations, and web pages built with data on DynamoDB will seem faster than most disk-based databases.

The DynamoDB API gives you fine-grained control of read consistency. A developer can choose if they want an immediate read of a value from the local node (called an eventually consistent read) or a slower but guaranteed consistent read of an item. The guaranteed consistent read will take a bit longer to make sure that the node you're reading from has the latest copy of your item. If your application knows that the values haven't changed, the immediate read will be faster.

This fine-grained control of how you read data is an excellent example of how you can use your knowledge of consistency (covered in chapter 2) to fine-tune your application. It's important to note that you can always force your reads to be consistent, but it would be challenging to configure SQL-backed data services to integrate this feature. SQL has no option to "make consistent before select" when working with distributed systems.

DynamoDB is ideal for organizations that have elastic demand. The notion of paying for what you use is a primary way to save on hosting expenses. Yet when you do need to scale, DynamoDB has the headroom for quick growth. DynamoDB also offers scalable transforms through Elastic MapReduce, which allows you to move large amounts of data into and out of DynamoDB if you need efficient and scalable extract, transform, and load (ETL) processes.

Now that we've shown you how NoSQL systems use various strategies to create high-availability database services, let's look at two NoSQL products with strong reputations for high availability.

8.4 Case study: using Apache Cassandra as a high-availability column family store

This case study will look at Apache Cassandra, a NoSQL column-family store with a strong reputation for scalability and high availability, even under intense write loads. Cassandra was an early adopter of a pure peer-to-peer distribution model. All nodes in a Cassandra cluster have identical functionality, and clients can write to any node in the cluster at any time. Because Cassandra doesn't have any single master node, there's no single point of failure and you don't have to set up and test a failover node.

Apache Cassandra is an interesting combination of NoSQL technologies. It's sometimes described as a Bigtable data model with a Dynamo-inspired implementation. In addition to its robust peer-to-peer model, Cassandra has a strong focus on making it easy to set up and configure both write and read consistency levels. Table 8.2 shows the various write consistency-level settings that you can use once you've configured your replication level.

Table 8.2 The codes used to specify consistency levels in Cassandra tables on writes. Each table in Cassandra is configured to meet the consistency levels you need when the table is created. You can change the consistency level at any time and Cassandra will automatically reconfigure to the new settings. There are similar codes for read levels.

Level    Write guarantee

ZERO (weak consistency) No write confirmation is done. No consistency guarantees.
If the server crashes, the write may never actually happen.</p><p>ANY (weak consistency) A write confirmation from any single node, including a special “hinted hand-off” node, will be sufficient for the write to be acknowledged.</p><p>ONE (weak consistency) A write confirmation from any single node will be sufficient for the write to be acknowledged.</p><p>TWO Ensure that the write has been written to at least two replicas before responding to the client.</p><p>THREE Ensure that the write has been written to at least three replicas before responding to the client.</p><p>QUORUM (strong consistency) N/2 + 1 replicas where N is the replication factor.</p><p>LOCAL_QUORUM Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes, within the local data center (requires NetworkTopologyStrategy).</p><p> www.it-ebooks.info Case study: using Apache Cassandra as a high-availability column family store 185</p><p>Table 8.2 (continued)</p><p>Level Write guarantee</p><p>EACH_QUORUM Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes in each data center (requires NetworkTopologyStrategy).</p><p>ALL (strong consistency) All replicas must confirm that the data was written to disk.</p><p>Next, you consider what to do if one of the nodes is unavailable during a read transac- tion. How can you specify the number of nodes to check before you return a new value? Checking only one node will return a value quickly, but it may be out of date. Checking multiple nodes may take a few milliseconds longer, but will guarantee you get the latest version in the cluster. The answer is to allow the client reader to specify a consistency level code similar to the write codes discussed here. Cassandra clients can select from codes of ONE, TWO, THREE, QUORUM, LOCAL_QUORUM, EACH _QUORUM, and ALL when doing reads. You can even use the EACH_QUORUM code to check multiple data centers around the world before the client returns a value. As you’ll see next, Cassandra uses specific configuration terms that you should understand before you set up and configure your cluster. </p><p>8.4.1 Configuring data to node mappings with Cassandra In our discussion of consistent hashing, we introduced the concept of using a hash to evenly distribute data around a cluster. Cassandra uses this same concept of creating a hash to evenly distribute their data. Before we dive into how Cassandra does this, let’s take a look at some key terms and definitions found in the Cassandra system. </p><p>ROWKEY A rowkey is a row identifier that’s hashed and used to place the data item on one or more nodes. The rowkey is the only structure used to place data onto a node. No col- umn values are used to place data on nodes. Designing your rowkey structure is a crit- ical step to making sure similar items are clustered together for fast access. </p><p>PARTITIONER A partitioner is the strategy that determines how to assign a row to a node based on its key. The default setting is to select a random node. Cassandra uses an MD5 hash of the key to generate a consistent hash. This has the effect of randomly distributing rows evenly over all the nodes. The other option is to use the actual bytes in a rowkey (not a hash of the key) to place the row on a specific node. </p><p>KEYSPACE A keyspace is the data structure that determines how a key is replicated over multiple nodes. 
By default, replication might be set to 3 for any data that needs a high degree of availability.

Figure 8.5 A sample of how replication works on a Cassandra keyspace using the SimpleStrategy configuration. Items A1, B1, C1, and D1 are written to a four-node cluster with a replication factor of 3. Each item is stored on three distinct nodes. After writing to an initial node, Cassandra "walks around the ring" in a clockwise direction until it stores the item on two additional nodes.

An example of a Cassandra keyspace is usually drawn as a ring, as illustrated in figure 8.5. Cassandra allows you to fine-tune replication based on the properties of a keyspace. When you add any row to a Cassandra system, you must associate a keyspace with that row. Each keyspace allows you to set and change the replication factor of that row. An example of a keyspace declaration is shown in figure 8.6.

Using this strategy will allow you to evenly distribute the rows over all the nodes in the cluster, eliminating bottlenecks. Although you can use specific bits in your key, this practice is strongly discouraged, as it leads to hotspots in your cluster and can be administratively complex. The main disadvantage with this approach is that if you change your partitioning algorithm, you have to save and restore your entire dataset.

When you have multiple racks or multiple data centers, the algorithm might need to be modified to ensure data is written to multiple racks or even to multiple data centers before the write acknowledgement is returned. If you change the placement-strategy value from SimpleStrategy to NetworkTopologyStrategy, Cassandra will walk around the ring until it finds a node that's on a different rack or a different data center.

Because Cassandra has a full peer-to-peer deployment model, it's seen as a good fit for organizations that want both scalability and availability in a column family system. In our next case study, you'll see how Couchbase 2.0 uses a peer-to-peer model with a JSON document store.

CREATE KEYSPACE myKeySpace
  with placement_strategy='org.apache.cassandra.locator.SimpleStrategy'
  and strategy_options={replication_factor:3};

Figure 8.6 A sample of how replication is configured within Cassandra. Replication is a property of a keyspace that indicates the type of network you're on and the replication for each key. For a single-location site, use SimpleStrategy; if you have multiple sites, use NetworkTopologyStrategy. The replication factor of 3 means each item will be stored on three separate nodes in the cluster.

8.5 Case study: using Couchbase as a high-availability document store

Couchbase 2.0 is a JSON document store that uses many of the same replication patterns found in other NoSQL databases.

Couchbase vs. CouchDB
Couchbase technology shouldn't be confused with Apache CouchDB. Though both are open source technologies, they're separate open source projects that have significantly different capabilities, and they support different application developer needs and use cases. Under the covers, Couchbase has more in common with the original Memcached project than the original CouchDB project.
Couchbase and CouchDB share the same way of generating views of JSON documents, but their implementations are different.

Like Cassandra, Couchbase uses a peer-to-peer distribution model, where all nodes provide the same services, thus eliminating the possibility of a single point of failure. Unlike Cassandra, Couchbase uses a document store rather than a column family store, which allows you to query based on a document's content. Additionally, Couchbase uses the keyspace concept that associates key ranges with individual nodes. Figure 8.7 shows the components in a multidata-center Couchbase server.

Couchbase stores document collections in containers called buckets, which are configured and administered much like folders in a filesystem. There are two types of buckets: a memcached bucket (which is volatile and resides in RAM), and a Couchbase bucket, which is backed up by disk and configured with replication. A Couchbase JSON document is written to one or more disks. For our discussions on high-availability systems, we'll focus on Couchbase buckets.

Figure 8.7 High-availability documents in Couchbase. Couchbase buckets are logical collections of documents that are configured for high availability. Couchbase clients use cluster maps to find the physical node where the active version of a document is stored. The cluster map directs the client to a file on the active node (1). If Data server 1 is unavailable, the cluster map will make a replica of doc1 on Data server 2 the active version (2). If the entire US West data center goes down, the client will use cross data center replication (XDCR) and make a copy on Data server 3 active (3) from the US East region.

Internally, Couchbase uses a concept called a vBucket (virtual bucket) that's associated with one or more portions of a hash-partitioned keyspace. Couchbase keyspaces are similar to those found in Cassandra, but Couchbase keyspace management is done transparently when items are stored. Note that a vBucket isn't a single range in a keyspace; it may contain many noncontiguous ranges in keyspaces. Thankfully, users don't need to worry about managing keyspaces or how vBuckets work. Couchbase clients simply work with buckets and let Couchbase worry about what node will be used to find the data in a bucket. Separating buckets from vBuckets is one of the primary ways that Couchbase achieves horizontal scalability.

Using information in the cluster map, Couchbase stores data on a primary node as well as a replica node. If any node in a Couchbase cluster fails, the node will be marked with a failover status and the cluster maps will all be updated. All data requests to the node will automatically be redirected to replica nodes.

After a node fails and replicas have been promoted, users will typically initiate a rebalance operation to add new nodes to the cluster to restore the full capacity of the cluster. Rebalancing effectively changes the mapping of vBuckets to nodes.
During a rebalance operation, vBuckets are evenly redistributed between nodes in the cluster to minimize data movement. Once a vBucket has been re-created on the new node, it’ll be automatically disabled on the original node and enabled on the new node. These functions all happen without any interruption of services. Couchbase has features to allow a Couchbase cluster to run without interruption even if an entire data center fails. For systems that span multiple data centers, Couch- base uses cross data center replication (XDCR), which allows data to automatically be repli- cated between remote data centers and still be active in both data centers. If one data center becomes unavailable, the other data center can pick up the load to provide continuous service. One of the greatest strengths of Couchbase is the built-in, high-precision monitor- ing tools. Figure 8.8 shows a sample of these monitoring tools. These fine-grained monitoring tools allow you to quickly locate bottlenecks in Couchbase and rebalance memory and server resources based on your loads. These tools eliminate the need to purchase third-party memory monitoring tools or config- ure external monitoring frameworks. Although it takes some training to understand how to use these monitoring tools, they’re the first line of defense when keeping your Couchbase clusters healthy. Couchbase also has features that allow software to be upgraded without an inter- ruption in service. This process involves replication of data to a new node that has a new version of software and then cutting over to that new node. These features allow you to provide a 24/365 service level to your users without downtime. </p><p> www.it-ebooks.info Summary 189</p><p>Figure 8.8 Couchbase comes with a set of customizable web-based operations monitoring reports to allow you to see the impact of loads on Couchbase resources. The figure shows a minute-by-minute, operations-per-second display on the default bucket. You can select from any one of 20 views to see additional information.</p><p>8.6 Summary In this chapter, you learned how NoSQL systems can be configured to create highly available data services for key-value stores, column family stores, and document stores. Not only can NoSQL data services be highly available, they can also be tuned to meet precise service levels at reasonable costs. We’ve looked at how NoSQL databases lever- age distributed filesystems like Hadoop with fine-grained control over file replication. Finally, we’ve reviewed some examples of how turnkey data services have been created to take advantage of NoSQL architectures. Organizations have found that high-availability NoSQL systems that run on multi- ple processors can be more cost-effective than RDBMSs, even if the RDBMSs are config- ured for high availability. The principal reason has to do with the use of simple</p><p> www.it-ebooks.info 190 CHAPTER 8 Building high-availability solutions with NoSQL</p><p> distributed data stores like key-value stores. Key-value stores don’t use joins; they lever- age consistent hashing; and they have strong scale-out properties. Simplicity of design frequently promotes high availability. With all the architectural advantages of NoSQL for creating cost-effective, high- availability databases, there are drawbacks as well. The principal drawback is that NoSQL systems are relatively new and may contain bugs that become apparent in rare circumstances or unusual configurations. 
The NoSQL community is full of stories where high-visibility web startups experienced unexpected downtimes using new ver- sions of NoSQL software without adequate training of their staff and enough time and budget to do load and stress testing. Load and stress testing take time and resources. To be successful, your project may need the people with the right training and experience using the same tools and con- figuration you have. With NoSQL still newer than traditional RDBMSs, the training budgets for your staff need to be adjusted accordingly. In our next chapter, you’ll see how using NoSQL systems will help you be agile with respect to developing software applications to solve your business problems. 8.7 Further reading  “About Data Partitioning in Cassandra.” DataStax. http://mng.bz/TI33.  “Amazon DynamoDB.” Amazon Web Services. http://aws.amazon.com/ dynamodb.  “Amazon DynamoDB: Provisioned Throughput.” Amazon Web Services. http://mng.bz/492J.  “Amazon S3 Service Level Agreement.” Amazon Web Services. http://aws.amazon.com/s3-sla/.  “Amazon S3—The First Trillion Objects.” Amazon Web Services Blog. http://mng.bz/r1Ae.  Apache Cassandra. http://cassandra.apache.org.  Apache JMeter. http://jmeter.apache.org/.  Brodkin, Jon. “Amazon bests Microsoft, all other contenders in cloud storage test.” Ars Technica. December 2011. http://mng.bz/ItNZ.  “Data Protection.” Amazon Web Services. http://mng.bz/15yb.  DeCandia, Giuseppe, et al. “Dynamo: Amazon’s Highly Available Key-Value Store.” Amazon.com. 2007. http://mng.bz/YY5A.  Hale, Coda. “You Can’t Sacrifice Partition Tolerance.” October 2010. http://mng.bz/9i3I.  “High Availability.” Neo4j. http://mng.bz/9661.  “High-availability cluster.” Wikipedia. http://mng.bz/SHs5.  “In Search of Five 9s: Calculating Availability of Complex Systems.” edgeblog. Octo- ber 2007. http://mng.bz/3P2e.</p><p> www.it-ebooks.info Further reading 191</p><p> Luciani, Jake. “Cassandra File System Design.” DataStax. February 2012. http://mng.bz/TfuN.  Ryan, Andrew. “Hadoop Distributed Filesystem reliability and durability at Face- book.” Lanyrd. June 2012. http://mng.bz/UAX9.  “Tandem Computers.” Wikipedia. http://mng.bz/yljh.  Vogels, Werner. “Amazon DynamoDB—A Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications.” January 2012. http://mng.bz/ HpN9.</p><p> www.it-ebooks.info Increasing agility with NoSQL</p><p>This chapter covers</p><p> Measuring agility  How NoSQL increases agility  Using document stores to avoid object-relational mapping</p><p>Change is no longer just incremental. Radical “nonlinear change” which brings about a different order is becoming more frequent. —Cheshire Henbury, “The Problem”</p><p>Can your organization quickly adapt to changing business conditions? Can your computer systems rapidly respond to increased workloads? Can your developers quickly add features to your applications to take advantage of new business oppor- tunities? Can nonprogrammers maintain business rules without needing help from software developers? Have you ever wanted to build a web application that works with complex data, but you didn’t have the budget for teams of database modelers, SQL developers, database administrators, and Java developers?</p><p>192</p><p> www.it-ebooks.info What is software agility? 193</p><p>If you answered yes to any of these questions, you should consider evaluating a NoSQL solution. We’ve found that NoSQL solutions can reduce the time it takes to build, scale, and modify applications. 
Whereas scalability is the primary reason compa- nies move away from RDBMSs, agility is the reason NoSQL solutions “stick.” Once you’ve experienced the simplicity and flexibility of NoSQL, the old ways seem like a chore. As we move through this chapter, we’ll talk about agility. You’ll learn about the challenges one encounters when attempting to objectively measure it. We’ll quickly review the problems encountered when trying to store documents in a relational database and the problems associated with object-relational mapping. We’ll close out the chapter with a case study that uses a NoSQL solution to manage complex busi- ness forms. 9.1 What is software agility? Let’s begin by defining software agility and talk about why businesses use NoSQL tech- nologies to quickly build new applications and respond to changes in business requirements. We define software agility as the ability to quickly adapt software systems to chang- ing business requirements. Agility is strongly coupled with both operational robust- ness and developer productivity. Agility is more than rapidly creating new applications; it means being able to respond to changing business rules without rewrit- ing code. To expand, agility is the ability to rapidly</p><p> Build new applications  Scale applications to quickly match new levels of demand  Change existing applications without rewriting code  Allow nonprogrammers to create and maintain business logic From the developer productivity perspective, agility includes all stages of the software development lifecycle (SDLC) from creating requirements, documenting use cases, and creating test data to maintaining business rules in existing applications. As you may know, some of these activities are handled by staff who aren’t traditionally thought of as developers or programmers. From a NoSQL perspective, agility can help to increase the productivity of programmers and nonprogrammers alike. Traditionally, we think of “programmers” as staff who have a four-year degree in computer science or software engineering. They understand the details of how</p><p>Agility vs. agile development Our discussion of agility shouldn’t be confused with agile development, which is a set of guidelines for managing the software development process. Our focus is the impact of database architecture on agility.</p><p> www.it-ebooks.info 194 CHAPTER 9 Increasing agility with NoSQL</p><p> computers work and are knowledgeable about issues related to memory allocation, pointers, and multiple languages like Java, .Net, Perl, Python, and PHP. Nonprogrammers are people who have exposure to their data and may have some experience with SQL or writing spreadsheet macros. Nonprogrammers focus on get- ting work done for the business; they generally don’t write code. Typical nonprogram- mer roles might include business analyst, rules analyst, data quality analyst, or quality assurance. There’s a large body of anecdotal evidence that NoSQL solutions have a positive impact on agility, but there are few scientific studies to support the claim. One study, funded by 10gen, the company behind MongoDB, found that more than 40% of the organizations using MongoDB had a greater than 50% improvement in developer productivity. These results are shown in figure 9.1. When you ask people why they think NoSQL solutions increase agility, many rea- sons are cited. 
Some say NoSQL allows programmers to stay focused on their data and build data-centric solutions; others say the lack of object-relational mapping opens up opportunities for nonprogrammers to participate in the development process and shorten development timelines, resulting in greater agility. By removing object-relational mapping, someone with a bit of background in SQL, HTML, or XML can build and maintain their own web applications with some training. After this training, most people are equipped to perform all of the key operations such as create, read, update, delete, and search (CRUDS) on their records. Programmers also benefit from no object-relational mapping, as they can move their focus from mapping issues to creating automated tools for others to use. But the impact of all these time- and money-saving NoSQL trends puts more pressure on an enterprise solution architect to determine whether NoSQL solutions are right for the team.

If you've spent time working with multilayered software architectures, you're likely familiar with the challenges of keeping these layers in sync. User interfaces, middle tier objects, and databases must all be updated together to maintain consistency. If any layer is out of sync, the systems fail. It takes a great deal of time to keep the layers in sync. The time to sync and retest each of the layers slows down a team and hampers agility. NoSQL architectures promote agility because there are fewer layers of software, and changes in one layer don't cause problems with other layers. This means your team can add new features without the need to sync all the layers.

Figure 9.1 Results of a 10gen survey of their users show that more than 40% of development teams using MongoDB had a greater than 50% increase in productivity. Asked "By what percentage has MongoDB increased the productivity of your development team?", respondents answered: greater than 75% (8%), 50%-74% (33%), 25%-49% (34%), 10%-24% (20%), and less than 10% (5%). This study included 61 organizations and the data was validated in May of 2012. (Source: TechValidate. TVID: F1D-0F5-7B8)

In NoSQL, schema-less usually refers to datasets that don't require predefined tables, columns (with types), and primary-foreign key relationships. Datasets that don't require these predefined structures are more adaptable to change. When you first begin to design your system, you may not know what data elements you need. NoSQL systems allow you to use new elements and associate the data types, indexes, and rules to new elements when you need them, not before you get the data. As new data elements are loaded into some NoSQL databases, indexes are automatically created to identify this data. If you add a new element for PersonBirthDate anywhere in a JSON or XML input file, it'll be added to an index of other PersonBirthDate elements in your database. Note that a range index on dates for fast sorting may still need to be configured.

To take this a step further, let's look at how specific NoSQL data services can be more agile than an entire RDBMS. NoSQL systems frequently deliver data services for specific portions of a large website or application. They may use dozens of CPUs working together to deliver these services in configurations that are designed to duplicate data for faster guaranteed response times and reliability. The NoSQL data service CPUs are often dedicated to these services and no other functions.
As the requirements for performance and reliability change, more CPUs can be automatically added to share the load, improve response times, and lower the probability of downtime. This architecture of using dedicated NoSQL servers to create highly tunable data services is in sharp contrast to traditional RDBMSs, which typically have hundreds or thousands of tables all stored on a single CPU. Trying to create precise data service levels for one service can be difficult, if not impossible, when some data services are negatively impacted by large query loads on other, unrelated database tables. NoSQL data architectures, when combined with distributed processing, allow organizations to be more agile and resilient to the changing needs of businesses.

Our focus in this chapter is the impact of NoSQL database architecture on overall software agility. But before we wrap up our discussion of defining agility as it relates to NoSQL architecture, let's take a look at how deployment strategies also impact agility.

9.1.1 Apply your knowledge: local or cloud-based deployment?
Sally is working on a project that has a tight budget and a short timeline. The organization she works for prefers to use database servers in their own data center, but in the right situation they allow cloud-based deployments. Since the project is a new service, the business unit is unable to accurately predict either the demand for the service or the throughput requirements. Sally wants to consider the impact of a cloud-based deployment on the project's scalability and agility.

She asks a friend in operations how long it typically takes for the internal information technology department to order and configure a new database server. She gets an email message with a link to a spreadsheet shown in figure 9.2. This figure shows a typical estimate of the steps Sally's information technology department uses to provision a new database server.

Figure 9.2 Average time required to provision a new database server in a typical large organization. Because NoSQL servers can be deployed as a managed service, a month-long provisioning process can be reduced to the few minutes, if not seconds, it takes to change the number of nodes in a cluster.

As you can see by the Total Business Days calculation, it'll take 19 business days, or about a month, to complete the project. This stands in sharp contrast to a cloud-based NoSQL deployment that can add capacity in minutes or even seconds. The company does have some virtual machine–based options, but there are no clear guarantees of average response times for the virtual machine options.

Sally opts to use a cloud-based deployment for her NoSQL database for the first year of the project. After the first year, the business unit will reevaluate the costs and compare them with internal costs. This allows the team to quickly move forward with their scale-out testing without incurring the up-front capital expenditures associated with ordering and configuring up to a dozen database servers.

Our goal in this chapter is not to compare local versus cloud-based deployment methods. It's to understand how NoSQL architecture impacts a project's development speed. But the choice of a local or cloud-based deployment should be a consideration in any project. In chapter 1 we talked about how the business drivers of volume, velocity, variability, and agility were associated with the NoSQL movement.
Now that you're familiar with these drivers, you can look at your organization to see how NoSQL solutions might positively impact these drivers to help your business meet the changing demands of today's competitive marketplace.

9.2 Measuring agility
Understanding the overall agility of a project and team is the first step in determining the agility associated with one or more database architectures. We'll now look at developer agility to see how it can be objectively measured.

Measuring pure agility in the NoSQL selection process is difficult since it's intertwined with developer training and tools. A person who's an expert with Java and SQL might create a new web application faster than someone who's a novice with a NoSQL system. The key is to remove the tools and staff-dependent components from the measurement process.

Figure 9.3 The factors that make it challenging to measure the impact of database architecture on overall software agility. Database architecture is only a single component of the entire SDLC ecosystem. Developer agility is strongly influenced by an individual's background, training, and motivation. The tools layer includes items such as the integrated development environment (IDE), app generators, and developer tools. The language layer includes language-specific APIs and libraries such as SQL, MDX, and XQuery. The interface layer includes items such as the command-line interface (CLI) as well as interface protocols such as REST, SOAP, JDBC, ODBC, and RPC. At the bottom of the stack sits the database layer, the database architecture whose contribution we would like to measure.

An application development architecture's overall software agility can be precisely measured. You can track the total number of hours it takes to complete a project using both an RDBMS and a NoSQL solution. But measuring the relationship between the database architecture and agility is more complex, as seen in figure 9.3.

Figure 9.3 shows how the database architecture is a small part of an overall software ecosystem. The diagram identifies all the components of your software architecture and the tools that support it. The architecture has a deep connection with the complexity of the software you use. Simple software can be created and maintained by a smaller team with fewer specialized skills. Simplicity also requires less training and allows team members to assist each other during development.

To determine the relationship between the database architecture and agility, you need a way to subtract the nondatabase architecture components that aren't relevant. One way to do this is to develop a normalization process that tries to separate the unimportant processes from agility measurements. This process is shown in figure 9.4.

This model is driven by selecting key use cases from your requirements and analyzing the amount of effort required to achieve your business goals. Although this sounds complicated, once you've done this a few times, the process seems straightforward.

Let's use the following example. Your team has a new project that involves importing XML data and creating RESTful web services that return only portions of this data using a search service. Your team meets and talks about the requirements, and the development staff creates a high-level outline of the steps and effort required.
You've narrowed down the options to a native XML database using XQuery and an RDBMS using a Java middle tier. For the sake of simplicity, effort is categorized using a scale of 1 to 5, where 1 is the least effort and 5 is the most effort. A sample of this analysis is shown in table 9.1.

Figure 9.4 Factors such as development tools, training, architectures, and use cases all impact developer agility. In order to do a fair comparison of the impact of NoSQL architecture on agility, you need to normalize the non-architecture components (in the diagram, development tools, developer training, and use cases feed an objective normalization step that yields agility metrics for each architecture). Once you balance these factors, you can compare how different NoSQL architectures drive the agility of a project.

Table 9.1 High-level effort analysis to build a RESTful search service from an XML dataset. The steps to build the service are counted and a rough effort level (1-5) is used to measure the difficulty of each step.

NoSQL document store:
1. Drag and drop XML file into database collection (1)
2. Write XQuery (2)
3. Publish API document (1)
Total effort: 4 units

SQL/Java method:
1. Inventory all XML elements (2)
2. Design data model (5)
3. Write create table statements (5)
4. Execute create table (1)
5. Convert XML into SQL insert statements (4)
6. Run load-data scripts (1)
7. Write SQL scripts to query data (3)
8. Create Java JDBC program to query data and Java REST programs to convert SQL results into XML (5)
9. Compile Java program and install on middle-tier server (2)
10. Publish API document (1)
Total effort: 29 units

Performing this type of analysis can show you how suitable an architecture is for a particular use case. Large projects may have many use cases, and you'll likely get conflicting results. The key is to involve a diverse group of people to create a fair and objective estimate of the total effort that's decoupled from background and training issues.

The amount of time you spend looking at the effort involved in each use case is up to you and your team. Informal "thought experiments" work well if the team has people with adequate experience in each alternative database and a high level of trust exists in the group's ability to fairly compare effort. If there are disputes about the relative effort, you may need to write sample code for some use cases. Although this takes time and slows a selection process, it's sometimes necessary to fairly evaluate alternatives and validate objective effort comparisons.

Many organizations that are comparing RDBMSs and document stores find a significant reduction in effort in use cases that focus on data import and export. In our next section, we'll focus on a detailed analysis of the translations RDBMSs need to convert data to and from natural business objects.

9.3 Using document stores to avoid object-relational mapping
You may be familiar with the saying "A clever person solves a problem. A wise person avoids it." Organizations that adopt document stores to avoid an object-relational layer are wise indeed. The conversion of object hierarchies to and from rigid tabular structures can be one of the most vexing problems in building applications. Avoiding the object-relational mapping layer is a primary reason developer productivity increases when using NoSQL systems.
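To make the document-store column of table 9.1 concrete, here is a minimal sketch of the no-mapping workflow in XQuery. It assumes an eXist-db-style xmldb module for the store call; the collection path, resource name, and the shape of the order document are illustrative, and other native XML databases expose equivalent but differently named functions. The point is that the document is saved and queried in exactly the form the application uses, with no object-relational layer in between.

  xquery version "3.0";

  (: Store a business document as-is and search it directly.        :)
  (: xmldb:store() is an eXist-db convention and is an assumption;  :)
  (: the collection and element names are also illustrative.        :)
  let $order :=
      <order id="1001">
          <customer>Acme Corp</customer>
          <item sku="A-14" qty="3"/>
      </order>
  let $saved := xmldb:store("/db/orders", "order-1001.xml", $order)
  return
      (: the RESTful search service of table 9.1 is little more than this :)
      collection("/db/orders")/order[customer = "Acme Corp"]

The rest of this section looks at how the translation layers this avoids came to be.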
Early computer systems used in business focused on the management of financial and accounting data. This tabular data was represented in a flat file of consistent rows and columns. For example, financial systems stored general ledger data in a series of columns and rows that represented debits and credits. These systems were easy to model, the data was extremely consistent, and the variability in how customers stored financial information was minimal.

Later, other departments began to see how storing and analyzing data could help them manage inventory and make better business decisions. For many departments, the types of data captured and stored needed to change. A simple tabular structure could no longer meet the organization's needs. Engineers did what they do best: they attempted to work within their existing structure to accommodate the new requirements. After all, they had a sizeable investment in people and systems, so they wanted to use the same RDBMS used in accounting.

As things evolved, business components were represented in a middle tier using object models. Object models were more flexible than the original punch card models, as they naturally represented the way business entities were stored. Objects could contain other objects, which could in turn contain additional objects. To save and later reassemble the state of an object, many SQL statements would need to be generated. In the late 1990s these objects also needed to be viewed and edited using a web browser, sometimes requiring additional translations. The most common design was to create a four-step translation process, as shown in figure 9.5.

Figure 9.5 The four-translation web-object-RDBMS model. This model is used when objects are used as a middle layer between a web page and a relational database. The first translation (T1) is the conversion from HTML web pages to middle-tier objects. The second translation (T2) is a conversion from middle-tier objects to relational database statements such as SQL. RDBMSs return only tables, so the third translation (T3) is the transformation of tables back into objects. The fourth translation (T4) is converting objects into HTML for display in web pages.

If you think this four-step translation process is complex, it can be. Let's see how you might compare the four-translation pain to putting away your clothing. Imagine using this process on a daily basis as you return home from a long day at the office. You'd take off your clothes and start by removing all of the thread, dismantling your clothes bit by bit, and then putting them away in uniform bolts of cloth. When you get up in the morning to go to work, you'd then retrieve your needle and thread and re-sew all your clothes back together again. If you're thinking, "This seems like a lot of unnecessary work," that's the point of the example. Today, NoSQL document stores allow you to avoid the complexities that were caused by the original requirements to store flat tabular data. They really allow development teams to avoid a lot of unnecessary work.

There have been multiple efforts to mitigate the complexities of four-step translation. Tools like Hibernate and Ruby on Rails are examples of frameworks that try to manage the complexity of object-relational mapping.
These were the only options available until developers realized that a NoSQL solution could store the document structure directly in the database, without converting it to another format or shredding it into tables and rows. This lack of translation makes NoSQL systems simpler to use and, in turn, allows subject matter experts (SMEs) and other nonprogramming staff to participate directly in the application development process. By encouraging SMEs to have a direct involvement in building applications, course corrections can be made early in the software development process, saving the time and money associated with rework.

NoSQL technologies show how moving from storing data in tables to storing data in documents opens up possibilities for new ways of using and presenting data. As you move your systems out of the back room to the World Wide Web, you'll see how NoSQL solutions can make implementation less painful. Next we'll look at combining a no-translation architecture with web standards to create a development platform that's easy to use and portable across multiple NoSQL platforms.

9.4 Case study: using XRX to manage complex forms
This case study will look at how zero translation can be used to store complex form data. We'll look at the characteristics of complex forms, how XForms works, and how the XRX web application architecture uses a document store to allow nonprogrammers to build and maintain these form applications.

XRX stands for the three standards that are used to build web form–based applications. The first X stands for XForms; the R stands for REST; and the final X stands for XQuery, the W3C server-side functional programming language we introduced you to in chapter 5.

9.4.1 What are complex business forms?
If you've built HTML form applications, you know that building simple forms like a login page or registration form is simple and straightforward. Generally, simple forms have a few fields, a selection list, and a Save button, all of which can be built using a handful of HTML tags.

This case study looks at a class of forms that need more than simple HTML elements. These complex forms are similar to those you'll find in a large company or perhaps a shopping cart form on a retail website. If you can store your form data in a single row within an RDBMS table, then you might not have a complex business application as defined here. Using simple HTML and a SQL INSERT statement may be the best solution for your problem. But our experience is that there's a large class of business forms that go beyond what HTML forms can do.

Complex business forms have complex data and also have complex user interfaces. They share some of the following characteristics:

■ Repeating elements—Conditionally adding two or more items to a form. For example, when you enter a person they may have multiple phone numbers, interests, or skills.
■ Conditional views—Conditionally enabling a region of a form based on how the user fills out various fields. For example, the question "Are you pregnant?" should be disabled if a patient's gender is male.
■ Cascading selection—Changing one selection list based on the value of another list. For example, if you select a country code, a list of state or province codes will appear for that country.
■ Field-by-field validation—Business rules check the values of each field and give the user immediate feedback.
■ Context help—Fields have help and hint text to guide users through the selection process.
■ Role-based contextualization—Each role in an organization might see a slightly different version of the form. For example, only users with a role of Publisher might see an Approve for Publishing button.
■ Type-specific controls—If you know an element is a specific data type, XForms can automatically change the user interface control. Boolean true/false values appear as check boxes, and date fields have calendars that allow you to pick the date.
■ Autocomplete—As users enter characters in a field, you want to be able to suggest the remainder of the text in the field. This is also known as autosuggest.

Although these aren't the only features that XForms supports, they give you some idea of the complexity involved in creating forms. Now let's see how these features can be added to forms without the need for complex JavaScript.

9.4.2 Using XRX to replace client JavaScript and object-relational mapping
All of the features listed in the prior section could be implemented by writing JavaScript within the web browser. Your custom JavaScript code would send your objects to an object-relational mapping layer for storage within an RDBMS, which is how many enterprise forms are created today. If you and your team are experienced JavaScript developers and know your way around object-relational frameworks, this is one option for you to consider. But NoSQL document stores provide a much simpler option. With XRX, you can develop these complex forms without using JavaScript or an object-relational layer. First let's review some terminology and then see how XRX can make this process simpler.

Figure 9.6 shows the three main components of a form: model, binding, and view. When a user fills out a form, the data that's stored when they click Save is called the model. The components that are on the screen and visible to the user, including items in a selection list, are referred to as the view components of a form. The process of associating the model elements with the view is called binding. We call this a model-view forms architecture.

Figure 9.6 The main components in a form. The model holds the business data to be saved to the server. The view displays the individual fields, including selection list options. The binding associates the user interface fields with model elements. The sample model shown in the figure is:

  <item>
    <id>1</id>
    <name>Item Name</name>
    <desc>Detailed description of item 1</desc>
    <category>medium</category>
    <status>draft</status>
    <tag>tag-1</tag>
  </item>

To use a form in a web browser, you must first move the default or existing form data from the database to the model. After the model data is in place, the form takes over and manages movement of data from the model to and from the views using the bindings. As a user enters information into a field, the model data is updated. When
the user selects Save, the updated model data is stored on the server.

Figure 9.7 shows a typical breakdown of code needed to implement business forms using a standard model and view architecture. From the figure, you can see that close to 90% of this code is similar for most business forms and involves moving data between layers. This code does all the work of getting data in and out of the database and moving the data around the forms as users fill in the fields. These steps include transferring the data to a middle tier, then moving the code into the model of the browser, and finally, when the user clicks Save, the path is reversed.

Figure 9.7 The breakdown of code used in a typical forms application when an RDBMS, middle-tier objects, and JavaScript are used. Middle-tier objects are moved to and from a model within the client. The wedges represent logic to validate field-level data and display rules (top left), object-relational mapping, updating view elements from the model, binding model elements to the view, and AJAX calls and libraries. XRX attempts to automate all but the critical business logic components of forms processing (top-left wedge).

XRX attempts to automate all generic code that moves the data. In turn, the software developer focuses on selecting and validating field-level controls as the user enters the data and the logic to conditionally display the form. This architecture has a positive impact on agility, and it empowers nonprogrammers to build robust applications without ever learning JavaScript or object-relational mapping systems.

Figure 9.8 shows how this works within the XRX web application architecture. With XForms, the entire structure that you're editing is stored as an XML document within the model of the browser. On the server side, document stores, such as a native XML database, are used to store these structures. The beauty of this architecture is there's no conversion or need for object-relational mapping; everything fits together like a hand in a glove.

Most native XML databases come with a built-in REST interface. This is where the R in XRX comes in. XForms allows you to add a single XML element called <submission> to specify how the Save button on your form will call a REST function on your server. These functions are usually written in XQuery, the final X in XRX.

Figure 9.8 How XML files from a native XML database are loaded into the model of an XForms application. This is an example of a zero-translation architecture. Once the XML data is loaded into the XForms model, views will allow the user to edit model content in a web form. When the user selects Save, the instance data in the model is saved directly to the database without the need for translation into objects or tables. No object-relational mapping layer is needed.

XForms is an example of a declarative domain-specific language customized for data entry forms. Like other declarative systems, XForms allows you to specify what you want the forms to do but tries to avoid the details of how the forms perform the work. XForms is a collection of approximately 25 elements that are used to implement complex business forms. A sample of XForms markup is shown in figure 9.9. Because XForms is a W3C standard, it reuses many of the same standards found in XQuery, such as XPath and XML Schema datatypes. XForms works well with native XML databases.

There are several ways to implement an XForms solution. You can use a number of open source XForms frameworks (Orbeon, BetterForm, XSLTForms, OpenOffice 3) or a commercial XForms framework (IBM Workplace, EMC XForms). These tools work with web browsers to interpret the XForms elements and convert them into the appropriate behavior.
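Whichever framework interprets the form markup, the server side of an XRX application stays small. The REST function that an XForms <submission> element posts to is typically a few lines of XQuery. The sketch below is hedged: it assumes eXist-db-style request, util, and xmldb modules (request:get-data(), util:uuid(), and xmldb:store()), and the collection path is illustrative; other native XML databases offer equivalent calls under different names. The instance arrives as XML and is stored as XML.

  xquery version "3.0";

  (: Receive the XML instance posted by the form's <submission> element :)
  (: and store it unchanged. Module and function names are eXist-db     :)
  (: conventions and are assumptions; the collection path is made up.   :)
  let $instance := request:get-data()
  let $name     := concat(util:uuid(), ".xml")
  return
      <result>
          <stored>{ xmldb:store("/db/forms/expenses", $name, $instance) }</stored>
      </result>

Because nothing is shredded or mapped, adding a field to the form requires no change to a module like this.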
Some XForms tools from Orbeon, OpenOffice, and IBM also have form-builder tools that use XForms to lay out and connect the form elements to your model. After you've created your form, you need to be able to save your form data to the database. This is where using a native XML database shines. XForms stores the data you've entered in the form into a single XML document. Most save operations into a native XML database can be done with a single line of code, even if your form is complex.

Figure 9.9 This figure shows how data from a native XML database is loaded into the model of XForms. Once the XML data is loaded into the model, XForms views will allow the user to edit the model content in a web form. When the user selects Save, the model is saved without change directly into the database. No object-relational mapping is needed.

The real strength of XRX emerges when you need to modify your application. Adding a new field can be done by adding a new element to the model and a few lines of code to the view elements. That's it! No need to change the database, no recompilation of middle-tier objects, and no need to write additional JavaScript code within the browser. XRX keeps things simple.

9.4.3 Understanding the impact of XRX on agility
Today, we live in a world where many information technology departments aren't focused on making it easy for business users to build and maintain their own applications. Yet this is exactly what XRX does. Form developers are no longer constrained by the IT development schedules of overworked staff who don't have time to learn the complex business rules of each department. With some training, many users can maintain and update their own forms applications.

XRX is a great example of how simplicity and standards drive agility. The fewer the components, the easier it is to build new forms and change existing ones. Because standards are reused, your team doesn't have to learn a new JavaScript library or a new data format. XRX and NoSQL can have a transformative impact on the way work is divided in a project. It means that information technology staff can focus on other tasks rather than updating business rules in Java and JavaScript every time there's a change request.

The lessons you've learned with XRX can be applied to other areas as well. If you have a JSON document store like MongoDB, Couchbase, or CouchDB, you can build JSON services to populate XForms models. There are even versions of XForms that load and save JSON documents. The important architectural element is the elimination of the object-relational layer and the ability to avoid JavaScript programs on the client. If you do this, then your SMEs can take a more active role in your projects and increase organizational agility.

9.5 Summary
The primary business driver behind the NoSQL movement was the need for graceful horizontal scalability. This forced a break from the past and opened new doors to innovative ways of storing data. It allowed innovators to build mature systems that facilitate agile software development. The phrase "We came for the scalability—we stayed for the agility" is an excellent summary of this process.

In this chapter, we stressed that comparing the agility of two database architecture alternatives is difficult, because overall agility is buried in the software developer's stack.
Despite the challenges, we feel it's worthwhile to create use-case-driven thought experiments to help guide objective evaluations.

Almost all NoSQL systems demonstrate excellent scale-out architecture for operational agility. Simple architectures that drive complexity out of the process enable agility in many areas. Our evidence shows that both key-value stores and document stores can provide huge gains in developer agility if applied to the right use cases. Native XML systems that work with web standards also work well with other agility-enhancing systems and empower nonprogrammers.

This is the last of our four chapters on building NoSQL solutions for specific types of business problems. We've focused on big data, search, high availability, and agility. Our next chapter takes a different track. It'll challenge you to think about how you approach problem solving, and introduce new thinking styles that lend themselves to parallel processing. We'll then wrap up with a discussion on security before we look at the formal methods of system selection.

9.6 Further reading
■ Henbury, Cheshire. "The Problem." http://mng.bz/9I9e.
■ "Hibernate (Java)." Wikipedia. http://mng.bz/tEr6.
■ Hodges, Nick. "Developer Productivity." November 2012. http://mng.bz/6c3q.
■ Hugos, Michael. Business Agility. Wiley, 2009. http://mng.bz/6z3I.
■ ——— "IT Agility Means Simple Things Done Well, Not Complex Things Done Fast." CIO, International Data Group, February 2008. http://mng.bz/dlzN.
■ "JSON Processing Using XQuery." Progress Software. http://mng.bz/aMkn.
■ TechValidate. 10gen, MongoDB, productivity chart. March 2012. http://mng.bz/gH0M.
■ Robie, Jonathan. "JSONiq: XQuery for JSON, JSON for XQuery." 2012 NoSQL Now! Conference. http://mng.bz/uBSe.
■ "Ruby on Rails." Wikipedia. http://mng.bz/7q73.

Part 4 Advanced topics

In part 4 we look at two advanced topics associated with NoSQL: functional programming and system security. In our capstone chapter we combine all of the information we've covered so far and walk you through the process of selecting the right SQL solution, NoSQL solution, or combination of systems for your project.

At this point the relationship between NoSQL and horizontally scalable systems should be clear. Chapter 10 covers how functional programming, REST, and actor-based frameworks can be used to make your software more scalable.

In chapter 11 we compare the fine-grained control of RDBMS systems to document-related access control systems. We also look at how you might control access to NoSQL data using views, groups, and role-based access control (RBAC) mechanisms to get the level of security your organization requires.

In our final chapter we take all of the concepts we've explored so far and show you how to match the right database to your business problem. We cover the architectural trade-off process from gathering high-level requirements to communicating results using quality trees. This chapter provides you with a roadmap for selecting the right NoSQL system for your next project.

NoSQL and functional programming

This chapter covers
■ Functional programming basics
■ Examples of functional programming
■ Moving from imperative to functional programming

The world is concurrent. Things in the world don't share data. Things communicate with messages.
Things fail.
—Joe Armstrong, cocreator of Erlang

In this chapter, we'll look at functional programming, the benefits of using a functional programming language, and how functional programming forces you to think differently when creating and writing systems.

The transition to functional programming requires a paradigm shift away from software designed to control state and toward software that has a focus on independent data transformation. Most popular programming languages used today, such as C, C++, Java, Ruby, and Python, were written with the needs of a single node as a target platform in mind. Although the compilers and libraries for these languages do support multiple threads on multicore CPUs, the languages and their libraries were created before NoSQL and horizontal scalability on multiple-node clusters became a business requirement. In this chapter, we'll look at how organizations are using languages that focus on isolated data transformations to make working with distributed systems easier.

In order to meet the needs of modern distributed systems, you must ask yourself how well a programming language will allow you to write applications that can exponentially scale to serve millions of users connected on the web. It's no longer sufficient to design a system that will scale to 2, 4, or 8 core processors. You need to ask if your architecture will scale to 100, 1,000, or even 10,000 processors.

As we've discussed throughout this book, most NoSQL solutions have been specifically designed to work on many computers. It's the hallmark of horizontal scalability to keep all processors in your cluster working together and adapting to changes in the cluster automatically. Adding these features after the fact is usually not possible. It must be part of the initial design, in the lowest levels of your application stack. The inability of SQL joins to scale out is an excellent example of how retrofits don't work.

Some software architects feel that to make the shift to true horizontal scale-out, you need to make a paradigm shift at the language and runtime library level. This is the shift from a traditional world of object and procedural programming to functional programming. Today most NoSQL systems are embracing the concepts of functional programming, even if they're using some traditional languages to implement the lower-level algorithms. In this chapter, we'll look into the benefits of the functional programming paradigm and show how it's a significant departure from the way things are taught in most colleges and universities today.

10.1 What is functional programming?
To understand what functional programming is and how it's different from other programming methods, let's look at how software paradigms are classified. A high-level taxonomy of software paradigms is shown in figure 10.1.

We'll first look at how most popular languages today are based on managing program state and memory values. We'll contrast this with functional programming, which has a focus on data transformation. We'll also look at how functional programming systems are closely matched with the requirements of distributed systems. After reading this chapter, you'll be able to visualize functional programming as isolated transformations of data flowing through a series of pipes.
If you can keep this model in mind, you'll understand how these transforms can be distributed over multiple processors and promote horizontal scalability. You'll also see how side effects prevent systems from achieving these goals.

Figure 10.1 A high-level taxonomy of software paradigms. In the English language, an imperative sentence is a sentence that expresses a command. "Change that variable now!" is an example of an imperative sentence. In computer science, an imperative programming paradigm contains sequences of commands that focus on updating memory. Procedural paradigms wrap groups of imperative statements in procedures and functions. Declarative paradigms focus on what should be done, but not how. Functional paradigms are considered a subtype of declarative programming because they focus on what data should be transformed, but not how the transforms will occur. (In the figure, the imperative branch splits into procedural, which wraps statements in procedures, and object-oriented, which wraps statements in objects; the declarative branch includes functional, characterized by transforms with no side effects.)

10.1.1 Imperative programming is managing program state
The programming paradigm of most computer systems created in the past 40 years centers around state management, or what are called imperative programming systems. Procedural and object languages are both examples of imperative programming systems. Figure 10.2 is an illustration of this state-management-focused system.

Figure 10.2 Imperative programs divide physical memory (RAM) into two functional regions. One region holds the data block and the other the program block. The state of the program is managed by a program counter that steps through the program block, reading instructions that read and write variables in computer memory. The programs must carefully coordinate the reading and writing of data and ensure that the data is valid and consistent.

Using program counters and memory to manage state was the goal of John von Neumann and others in the 1940s when they developed the first computer architecture. It specified that both data and programs were stored in the same type of core memory and a program counter stepped through a series of instructions and updated memory values. High-level programming languages divided memory into two regions: data and program memory. After a computer program was compiled, a loader loaded the program code into program memory, and the data region of memory was allocated to store programming variables.

This architecture worked well for single-processor systems as long as it was clear which programs were updating specific memory locations. But as programs became more complex, there was a need to control which programs could update various regions of memory. In the late 1950s, a new trend emerged that required the ability to protect data by allowing only specific access methods to update regions of data memory. The data and the methods, when used together, formed new programming constructs called objects. An example of the object state is shown in figure 10.3.
Objects are ideal for simulating real-world objects on a single processor. You model the real world in a series of programming objects that represent the state of the objects you're simulating. For example, a bank account might have an account ID, an account holder name, and a current balance. A single bank location might be simulated as an object that contains all the accounts at that bank, and an entire financial institution might be simulated by many bank objects.

But the initial object model using methods to guard object state didn't have any inherent functions to manage concurrency. Objects themselves didn't take into account that there might be hundreds of concurrent threads all trying to update the state of the objects. When it came to "undoing" a series of updates that failed halfway through a transaction, the simple object model became complex to manage. Keeping track of the state of many objects in an imperative world can become complex.

Figure 10.3 Object architecture uses encapsulation to protect the state of any object with accessor functions or methods (getters and setters). In this example, there are two internal states, a and b. In order to get the state of a, you must use the get-a method. To set the state of a, you must use the set-a method. These methods are the gatekeepers that guard all access to the internal state variables of an object.

Let's take a look at a single operation on an object. In this example, the objective is to increment a single variable. The code might read x = x + 1, meaning "x is assigned a new value of itself plus one." What if one computer or part of a network crashes right around this line of code? Does that code get executed? Does the current state represent the correct value, or do you need to rerun that line of code? Keeping track of the state of an ever-changing world is easy on a single system, but becomes more complex when you move to distributed computing.

Functional programming offers a different way of approaching computation by recasting computation as a series of independent data transformations. After you make this transition, you don't need to worry about the details of state management. You move into a new paradigm where all variables are immutable and you can restart transforms without worrying about what external variables you've changed in the process.

10.1.2 Functional programming is parallel transformation without side effects
In contrast with keeping track of the state of objects, functional programming focuses on the controlled transformation of data from one form to another with a function. Functions take the input data and generate output data in the same way that you create mathematical functions in algebra. Figure 10.4 is an example of how functional programs work in the same way mathematical functions work.

It's important to note that when using functional programming there's a new constraint. You can't update the state of the external world at any time during your transform. Additionally, you can't read from the state of the external world when you're executing a transform. The output of your transform can only depend on the inputs to the function. If you adhere to these rules, you'll be granted great flexibility!
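As a small illustration of these rules, the sketch below rewrites the bank-account update from the earlier x = x + 1 discussion as a pure XQuery function. The element names are made up for the example. The function reads nothing but its arguments and writes nothing at all; it simply returns a new document with the updated balance, so it can be rerun or moved to another processor safely.

  xquery version "1.0";

  (: A side-effect-free transform: the output depends only on the inputs. :)
  (: Element names are illustrative.                                      :)
  declare function local:credit(
      $account as element(account),
      $amount  as xs:decimal
  ) as element(account) {
      <account>
          { $account/id, $account/holder }
          <balance>{ xs:decimal($account/balance) + $amount }</balance>
      </account>
  };

  local:credit(
      <account><id>42</id><holder>Pat</holder><balance>100.00</balance></account>,
      25.00)

Instead of mutating a balance in place, the call produces a new account element, leaving the original input untouched.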
You can execute your transform on clusters that have thousands of processors waiting to run your code. If any processor fails, the process can be restarted using the same input data without issue. You've entered the world of side-effect-free concurrent programming, and functional programming has opened this door for you.

Figure 10.4 Functional programming works similarly to mathematical functions. Instead of methods that modify the state of objects, functional programs transform only input data without side effects.

The concepts of serial processing are rooted in imperative systems. As an example, let's look at how a for loop or iteration is processed in a serial versus a parallel processing environment. Whereas imperative languages perform transformations of data elements serially, with each loop starting only after the prior loop completes, functional programming can process each loop simultaneously and distribute the processing on multiple threads. An example of this is shown in figure 10.5.

Figure 10.5 Iterations or for loops in imperative languages calculate one iteration of a loop and allow the next iteration to use the results of a prior loop. The left panel shows an example using JavaScript with a mutable variable that's incremented:

  for (i = 0; i < 5; i++) {
    n = n + 1;
  }

With some functional programming languages, iteration can be distributed on independent threads of execution. The result of one loop can't be used in other loops. The right panel shows an example of an XQuery for loop:

  let $seq := ('a', 'b', 'c'…)
  for $item in $seq
  return my-transform($item)

As you can see, imperative programming can't process this in parallel because the state of the variables must be fully calculated before the next loop begins. Some functional programming languages such as XQuery keep each loop as a separate and fully independent thread. But in some situations, parallel execution isn't desirable, so there are now proposals to add a sequential option to XQuery functions.

To understand the difference between an imperative and a functional program, it helps to have a good mental model. The model of a pipe without any holes, as shown in figure 10.6, is a good representation. The shift of focus from updating mutable variables to only using immutable variables within independent transforms is the heart of the paradigm shift that underpins many NoSQL systems. This shift is required so that you can achieve reliable and high-performance horizontal scaling in multiprocessor data centers.

Figure 10.6 The functional programming paradigm relies on creating a distinct output for each data input in an isolated transformation process. You can think of this as a data transformation pipe. When the transformation of input to output is done without modification of external memory, it's called a zero-side-effect pipeline. This means you can rerun the transform many times from any point without worrying about the impact of external systems. Additionally, if you prevent reads from external memory during the transformation, you have the added benefit of knowing the same input must generate the exact same output.
Then you can hash the input and check a cache to see if the transform has already been done.

The deep roots of functional programming
A person who's grown up in the imperative world might be thinking, These people must be crazy! How could people ever write software without changing variables? The concepts behind functional programming aren't new. They go back to the foundations of computer science in the 1940s. The focus on transformation is a core theme of lambda calculus, first introduced by Alonzo Church. It's also the basis of many LISP-based programming languages popular in the 1950s. These languages were used in many artificial intelligence systems because they're ideal for managing symbolic logic.
Despite their popularity in AI research, LISP languages and their kin (Scheme, Clojure, and other functional languages) aren't typically found in business applications. Fortran dominated scientific programming because it could be vectorized with specialized compilers. COBOL became popular for accounting systems because it tried to represent business rules in a more natural language structure. Procedural languages such as Pascal followed in the 1970s, and object-oriented systems such as C++ and Java in the 1980s and 1990s. As horizontal scalability becomes more relevant, it's important to review the consequences of the programming language choices you've made. Functional programming has many advantages as processor counts increase.

One way to visualize functional programming in a scale-out system is to look at a series of inputs being transformed by independent processors. A consequence of using independent threads of execution is that you can't pass the result of an intermediate calculation from one loop to another. Figure 10.7 illustrates this process.

Figure 10.7 Functional programming means that the intermediate results of the transformation of item A can't be used in the transformation of item B, and the intermediate results of B can't be used in calculating C. Only the end results of each of these transforms can be used together to create aggregate values.

This constraint is critical when you design transformations. If you need intermediate results from a transformation, you must exit the function and return the output. If the intermediate result needs additional transformation, then a second function should be called.

Another item to consider is that the order of execution and the time that transforms take to complete may vary based on the size of the data elements. For example, if your inputs are documents and you want to count the number of words in each document, the time it takes to count words in a 100-page document will be 10 times longer than the time it takes to count words in a 10-page document (see figure 10.8).

Figure 10.8 Functional programming means that you can't guarantee the order in which items will be transformed or which items will finish first. In the figure, three transforms started in the order 3, 1, 2 and take 5, 2, and 3 seconds to execute.

This variation in transformation times and the location of the input data adds an additional burden on the scheduling system. In distributed systems, input data is replicated on multiple nodes in the cluster. To be efficient, you want the longest-running jobs to start first.
To maximize your resources, the tools that place tasks on different nodes in a cluster must be able to gather processing information from multiple data sources. Schedulers that can determine how long a transform will take to run on different nodes and how busy each node is will be most efficient. This information is generally not provided by imperative systems. Even mature systems like HDFS and MapReduce continue to refine their ability to efficiently transform large datasets.

10.1.3 Comparing imperative and functional programming at scale
Now let's compare the capability of imperative and functional systems to support processing large amounts of shared data being accessed by many concurrent CPUs. A comparison of imperative versus functional pipelines is shown in figure 10.9.

Figure 10.9 Imperative programming (left panel) and functional programming (right panel) use different rules when transforming data. To gain the benefits of referential transparency, the output of a transform must be completely determined by the inputs to the transform. No other memory should be read or written during the transformation process. Instead of a pipe with holes that leak data and allow side effects, as on the left, you can visualize your transformation pipes as having solid steel sides that don't transfer any information.

You can see that when you prevent writes during a transformation, you get the benefit of no side effects. This means that you can restart a failed transformation and be certain that if it didn't finish, the external state of a system wasn't already updated. With imperative systems, you can't make this guarantee. Any external changes may need to be undone if there's a failure during a transformation. Keeping track of which operations have been done can add complexity that will slow large systems down. The no-side-effect guarantee is critical in your ability to create reproducible transforms that are easy to debug and easy to optimize. Not allowing external side-effect writes during a transform keeps transforms running fast.

The second scalability benefit happens when you prohibit external reads during a transform. This rule allows you to know with certainty that the outputs are completely driven by the inputs of a transformation. If the time to create a hash of the input is small relative to the time it takes to run the transform, you can check a cache to see if the transform has already been run.

One of the central ideas of lambda calculus is that the result of a transform of any data can be used in place of rerunning the transform on that data. This ability to substitute cached results, instead of having to rerun a long-running transform, is one way that functional programming systems can be more efficient than imperative systems.

The ability to rerun a transform many times and not alter data is called an idempotent transform or an idempotent transaction. Idempotent transforms are transformations that will change the state of the world in consistent ways the first time they're run, but rerunning the transform many times won't corrupt your data. For example, if you have a filter that will insert missing required elements into an XML file, that filter should check to make sure the elements don't already exist before adding them.
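Here is what that kind of idempotent filter can look like in XQuery. The <order> and <status> element names are illustrative assumptions; the shape of the check is the point. Because the function tests for the element before adding it, applying the filter once or many times produces the same document.

  xquery version "1.0";

  (: Add a default <status> element only when it's missing.          :)
  (: Element names are illustrative. Running the filter repeatedly   :)
  (: yields the same result, which is what makes it idempotent.      :)
  declare function local:ensure-status($order as element(order)) as element(order) {
      if ($order/status)
      then $order
      else
          <order>
              { $order/@*, $order/node(), <status>new</status> }
          </order>
  };

  (: applying it twice changes nothing the second time :)
  local:ensure-status(local:ensure-status(<order><id>7</id></order>))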
Idempotent transforms can also be used in transaction processing. Since idempotent transforms don't change external state, there's no need to create an undo process. Additionally, you can use transaction identifiers to guarantee idempotent transforms. If you're running a transaction on an item of data that increments a bank account, you might record a transaction ID in the bank account transaction history. You can then create a rule that only runs the update if that transaction ID hasn't been run already. This guarantees that a transaction won't be run more than once.

Idempotent transactions allow you to use referential transparency. An expression is said to be referentially transparent if it can be replaced with its value without changing the behavior of the program. Any functional programming statement can have this property if the output of the transform can be replaced with the function call to the transform itself. Referential transparency allows both the programmer and the compiler system to look for ways to optimize repeated calls to the same set of functions on the same set of data. But this optimization technique is only possible when you move to a functional programming paradigm. In the next section, we'll take a detailed look at how referential transparency allows you to cache results from functional programs.

10.1.4 Using referential transparency to avoid recalculating transforms
Now that you know how functional programs promote idempotent transactions, let's look at how these results can be used to speed up your system. You can use these techniques in many systems, from web applications to NoSQL databases to the results of MapReduce transforms.

Figure 10.10 Your local browser will check to see if the URL of an image is in a local cache before it goes out to the web to get the image. Once the image is fetched from a remote host (usually a slow process), it can be cached for future use. Getting an image from a local cache is much more efficient.

Referential transparency allows functional programming and distributed systems to be more efficient because they can be clever about avoiding the recalculation of the results of transforms. Referential transparency allows you to use a cached version of an item that stands in for something that takes a long time to calculate or a long time to read from a disk filesystem.

Let's take a simple example, as shown in figure 10.10, to see how an image is displayed within a web browser. By default, browsers treat all images referenced by the same URL as the same "static" image, since the content of the image doesn't change even if you request it repeatedly. If the same image is rendered on multiple web pages, the image will be retrieved only once from the web server. But to do this, all pages must reference the exact same URL. Subsequent references will use a copy of the image stored in a local client cache. If the image changes, the only way to get the updated image is to remove the original item from your local cache and refresh your screen, or to reference a new URL. By default your browser considers many items static, such as CSS files, JavaScript programs, and some data files, if they're marked accordingly by properties in the HTML headers.
If your data files do change, you can instruct your server to add information to a file to indicate that the cached copy is no longer valid and to get a new copy from the server.

This same concept of caching documents also applies to queries and transforms used to generate reports or web pages. If you have a large document that depends on many transforms of smaller documents, then you can reassemble the large document from the cached copies of smaller documents that don't change and only re-execute the transforms of smaller documents that do change. The process of using the transform of documents from a cache is shown in figure 10.11.

Figure 10.11 You can use referential transparency to increase the performance of any transform by checking to see if the output of a transform is already in a cache. If it's in the cache, you can use the cached value instead of rerunning an expensive transformation process. If the item isn't in the cache, you can run the transform and store the result in a RAM cache such as memcache to reuse the output.

This transform optimization technique fits perfectly into functional programming systems and serves as the basis for innovative, high-performance website tools. As we move into our case study, you'll see how NetKernel, a performance optimization tool, is used to optimize web page assembly.

10.2 Case study: using NetKernel to optimize web page content assembly
This case study looks at how functional programming and REST services can be combined to optimize the creation of dynamic nested structures. We'll use as an example the creation of a home page for a news-oriented website. We'll look at how an approach called resource-oriented computing (ROC) is implemented by a commercial framework called NetKernel.

10.2.1 Assembling nested content and tracking component dependencies
Let's assume you're running a website that displays news stories in the center of a web page. The web page is dynamically assembled using page headers, footers, navigation menus, and advertising banners around the news content. A sample layout is shown in the left panel of figure 10.12. You use a dependency tree, as shown in the right panel of figure 10.12, to determine when each web page component should be regenerated.

Though we think of public URLs as identifying web pages and images, you can use this same philosophy with each subcomponent in the dependency tree by treating each component as a separate resource and assigning it an internal identifier called a uniform resource identifier (URI). In the web page model, each region of the web page is a resource that can be assembled by combining other static and dynamic resources. Each distinct resource has its own URI.

In the previous section, you saw how functional programming uses referential transparency to allow cached results to be used in place of the output of a function. We'll apply this same concept to web page construction. You can use URIs to check whether a function call has already been executed on the same input data, and you can track dependencies to see when functions call other functions. If you do this, you can avoid calling the same functions on the same input data and reuse information fragments that are expensive to generate.
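Before looking at how NetKernel industrializes this idea, the cache-check pattern of figure 10.11 can be reduced to a short sketch. The version below uses an XQuery 3.1 map as the cache and, because the transform is referentially transparent, uses the input value itself as the cache key; a framework such as NetKernel substitutes URIs and dependency tracking for this simple key. The upper-case stand-in for an expensive transform and the function names are illustrative.

  xquery version "3.1";

  (: Check the cache before running the transform (figure 10.11).      :)
  (: The cache is an immutable map that is threaded through the calls. :)
  declare function local:with-cache(
      $cache     as map(*),
      $input     as xs:anyAtomicType,
      $transform as function(*)
  ) as map(*) {
      if (map:contains($cache, $input))
      then $cache                                         (: reuse the stored output :)
      else map:put($cache, $input, $transform($input))    (: compute once, then store :)
  };

  let $slow := function($s) { upper-case($s) }   (: stand-in for an expensive transform :)
  let $cache :=
      fold-left(("a", "b", "a"), map {},
          function($c, $i) { local:with-cache($c, $i, $slow) })
  return $cache("a")    (: "a" was transformed only once, even though it appeared twice :)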
If you do this, you</p><p> www.it-ebooks.info 220 CHAPTER 10 NoSQL and functional programming</p><p>Sample HTML news page Dependency tree of page components</p><p>Page Header</p><p>Article Ad Header Content Footer Article Article Ad Left Left Center Right Article Ad Article Article Article Article Ad Ad Article Ad</p><p>Footer Image Headline Teaser-test</p><p>Figure 10.12 Web pages for a typical news site are generated using a tree structure. All components of the web page can be represented in a dependency tree. As low-level content such as news or ads change, only the dependent parts of the web page need to be regenerated. If a low-level component changes (heavy-line links), then all ancestor nodes must be regenerated. For example, if some text in a news article changes, that change will cause the center section, the central content region, and the entire page to be regenerated. Other components such as the page borders can be reused from a cache layer without regeneration.</p><p> can avoid calling the same functions on the same input data and reuse information fragments that are expensive to generate. NetKernel knows what functions are called with what input data, and uses a URI to identify if the function has already generated results for the same input data. NetKer- nel also tracks URI dependencies, only re-executing functions when input data changes. This process is known as the golden thread pattern. To illustrate, think of hanging your clean clothes out on a clothes line. If the clothes line breaks, the clothes fall to the ground and must be washed again. Similarly, if an input item changes at a low level of a dependency tree, all the items that depend on its content must be regenerated. NetKernel automatically regenerates content for internal resources in its cache and can poll external resource timestamps to see if they’ve changed. To determine if any resources have changed, NetKernel uses an XRL file (an XML file used to track resource dependencies) and a combination of polling and expiration timestamps. </p><p>10.2.2 Using NetKernel to optimize component regeneration The NetKernel system takes a systematic approach to tracking what’s in your cache and what should be regenerated. Instead of using hashed values for keys, NetKernel constructs URIs that are associated with a dependency tree and uses models that calcu- late the effort to regenerate content. NetKernel performs smart cache-content optimi- zation and creates cache-eviction strategies by looking at the total amount of work it takes to generate a resource. NetKernel uses an ROC approach to determine the total effort required to generate a resource. Though ROC is a term specific to NetKernel, it</p><p> www.it-ebooks.info Case study: using NetKernel to optimize web page content assembly 221 signifies a different class of computing concepts that challenge you to reflect on how computation is done. ROC combines the UNIX concept of using small modular transforms on data mov- ing through pipes, with the distributed computing concepts of REST, URIs, and cach- ing. These ideas are all based on referential transparency and dependency tracking to keep the right data in your cache. ROC requires you to associate a URI with every com- ponent that generates data: queries, functions, services, and codes. By combining these URIs, you can create unique signatures that can be used to determine whether a resource is already present in your cache. A sample of the NetKernel stack is shown in figure 10.13. 
As you can see from figure 10.13, the NetKernel software is layered between two layers of resources and services. Resources can be thought of as static documents, and services as dynamic queries. This means that moving toward a service-oriented intermediate layer between your database and your application is critical to the optimization process. You can still use NetKernel with traditional consistent hashing and without a service layer, but you won't get the same level of clever caching that the ROC approach gives you.

By using ROC, NetKernel takes REST concepts to a level beyond caching images or documents on your web server. Your cache is no longer subject to a simple timestamp-based eviction policy, and the most valuable items remain in cache. NetKernel can be configured to use complex algorithms to calculate the total effort it takes to put an item in cache, and it only evicts items that take little effort to regenerate or are unlikely to be needed. To get the most benefit, a service-oriented REST approach should be used, and those services need to be powered by functions that return data with referential transparency. You can only begin this journey if your system is truly free of side effects, and this implies you may need to take a close look at the languages your systems use.

Figure 10.13 NetKernel sits between your static resources and dynamic services layers to prevent unneeded calls to your NoSQL database, separating logical resources expressed as URIs from the physical calls to the database. In this respect NetKernel works similarly to a memcache system that separates the application from the database; unlike memcache, it's tightly coupled with a layer built around logical URIs and tracks the dependencies of expensive-to-calculate objects that can be cached.

In summary, if you associate URIs with the output of referentially transparent functions and services, frameworks such as NetKernel can bring the following benefits:

■ Faster web page response time for users
■ Reduced wasteful re-execution of the same functions on the same data
■ More efficient use of front-end cache RAM
■ Decreased demand on your database, network, and disk resources
■ Consistent development architecture

Note that the concepts used in this front-end case study can also be applied to back-end analytics. Any time you see functions being re-executed on the same datasets, there are opportunities for optimization using these techniques. In our next section, we'll look at functional programming languages and see how their properties allow you to address specific types of performance and scalability issues.

10.3 Examples of functional programming languages

Now that you have a feeling for how functional programs work and how they're different from imperative programs, let's look at some real-world examples of functional programming languages. The LISP programming language is considered the pioneer of functional programming. LISP was designed around the concept of no-side-effect functions that work on lists, and the concept of recursion over lists was used frequently. The Clojure language is a modern LISP dialect that offers many benefits of functional programming with a focus on the development of multithreaded systems.
Developers who work with content management and single-source publishing sys- tems may use transformation languages such as XSLT and XQuery (introduced in chapter 5). It’s no surprise that document stores also benefit from functional lan- guages that leverage recursive processing. Document hierarchies that contain other hierarchies are ideal candidates for recursive transformation. Document structures can be easily traversed and transformed using recursive functions. The XQuery lan- guage is a perfect fit with document stores because it supports recursion and func- tional programming, and yet uses database indexes for fast retrieval of elements. There has been strong interest by developers working on high-availability systems in a functional programming language called Erlang. Erlang has become one of the most popular functional languages for writing NoSQL databases. Erlang was originally developed by Ericsson, the Swedish telecommunications firm, to support distributed, highly available phone switches. Erlang supports features that allow the runtime libraries to be upgraded without service interruption. NoSQL databases that focus on high availability such as CouchDB, Couchbase, Riak, and Amazon’s SimpleDB services are all written in Erlang. The Mathematica language and the R language for doing statistical analysis also use functional programming constructs. These ideas allow them to be extended to run on a large numbers of processors. Even SQL, which doesn’t allow for mutable values, has</p><p> www.it-ebooks.info Making the transition from imperative to functional programming 223</p><p> some properties of functional languages. Both the SQL-like HIVE language and the PIG system that are used with Hadoop include functional concepts. Several multiparadigm languages have also been created to help bridge the gap between the imperative and functional systems. The programming language Scala was created to add functional programming features to the Java language. For software developers using Microsoft tools, Microsoft created the F# (F sharp) language to meet the needs of functional programmers. These languages are designed to allow develop- ers to use multiple paradigms, imperative and functional, within the same project. They have an advantage in that they can use libraries written in both imperative and functional languages. The number of languages that integrate functional programming constructs in dis- tributed systems is large and growing. Perhaps this is driven by the need to write MapReduce jobs in languages in which people feel comfortable. MapReduce jobs are being written in more than a dozen languages today and that list continues to grow. Almost any language can be used to write MapReduce jobs as long as those programs don’t have side effects. This requires more discipline and training when using impera- tive languages that allow side effects, but it’s possible. This shows that functional programming isn’t a single attribute of a particular lan- guage. Functional programming is a collection of properties that make it easier to solve specific types of performance, scalability, and reliability problems within a pro- gramming language. This means that you can add features to an old procedural or object-oriented language to make it behave more like a pure functional language. 10.4 Making the transition from imperative to functional programming We’ve spent a lot of time defining functional programming and describing how it’s different from imperative programming. 
Now that you have a clear definition of func- tional programming, let’s look at some of the things that will change for you and your development team.</p><p>10.4.1 Using functions as a parameter of a function Many of us are comfortable passing parameters to functions that have different data types. A function might have input parameters that are strings, integers, floats, Bool- eans, or a sequence of items. Functional programming adds another type of parame- ter: the function. In functional programming, you can pass a function as a parameter to another function, which turns out to be incredibly useful. For exam- ple, if you have a compressed zip file, you might want to uncompress it and pass a fil- ter function to only extract specific data files. Instead of extracting everything and then writing a second pass on the output, the filter intercepts files before they’re uncompressed and stored. </p><p> www.it-ebooks.info 224 CHAPTER 10 NoSQL and functional programming</p><p>10.4.2 Using recursion to process unstructured document data If you’re familiar with LISP, you know that recursion is a popular construct in functional programming. Recursion is the process of creating functions that call themselves. In our experience, people either love or hate recursion—there’s seldom a middle ground. If you’re not comfortable with recursion, it can be difficult to create new recursive programs. Yet once you create them, they seem to almost acquire a magical property. They can be the smallest programs that produce the biggest results. Functional programs don’t manage state, but they do use a call stack to remember where they are when moving through lists. Functional programs typically analyze the first element of a list, check whether there are additional items, and if there are, then call themselves using the remaining items. This process can be used with lists as well as tree structures like XML and JSON files. If you have unstructured documents that consist of elements and you can’t predict the order of items, then recursive processing may be a good way to digest the content. For example, when you write a paragraph, you can’t predict the order in which bold or italic text will appear in the paragraph. Languages such as XQuery and JASONiq sup- port recursion for this reason. </p><p>10.4.3 Moving from mutable to immutable variables As we’ve mentioned, functional programming variables are set once but not changed within a specific context. This means that you don’t need to store a variable’s state, because you can rerun the transform without the side effects of the variable being incremented again. The downside is that it may take some time for your staff to rid themselves of old habits, and you may need to rewrite your current code to work in a functional environment. Every time you see variables on both the left and right side of an assignment operator, you’ll have to modify your code. When you port your imperative code, you’ll need to refactor algorithms and remove all mutable variables used within for loops. This means that instead of using counters that increment or decrement variables in for loops, you must use a “for each item” type function. Another way to convert your code is to introduce new variable names when you’re referencing a variable multiple times in the same block of code. By doing this, you can build up calculations that nest references on the right side of assignment statements. 
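The three habits described above — passing functions as parameters, recursing over nested documents, and avoiding mutable counters — often show up together. Here's a minimal Python sketch under those assumptions; the document shape, the collect function, and the predicate are invented for illustration.

def collect(node, predicate):
    """Recursively gather every value in a nested document that satisfies a
    caller-supplied predicate function, without mutating any state."""
    if isinstance(node, dict):
        return [v for child in node.values() for v in collect(child, predicate)]
    if isinstance(node, list):
        return [v for child in node for v in collect(child, predicate)]
    return [node] if predicate(node) else []

order = {
    "customer": {"name": "Dana", "vip": True},
    "items": [{"sku": "A1", "price": 40}, {"sku": "B7", "price": 15}],
}

# The predicate is passed in as a parameter, much like the zip-file filter above.
is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
total = sum(collect(order, is_number))   # no counter variable is ever incremented
assert total == 55

Nothing in the traversal is assigned more than once, so the same call can be repeated, cached, or distributed without side effects.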
</p><p>10.4.4 Removing loops and conditionals One of the first things imperative programmers do when they enter the world of func- tional programs is try to bring their programming constructs with them. This includes the use of loops, conditionals, and calls to object methods. These techniques don’t transfer to the functional programming world. If you’re new to functional programming and in your functions you see complex loops with lay- ers of nested if/then/else statements, this tells you it’s time to refactor your code.</p><p> www.it-ebooks.info Making the transition from imperative to functional programming 225</p><p>The focus of a functional programmer is to rethink the loops and conditionals in terms of small, isolated transforms of data. The results are services that know what functions to call based on what data is presented to the inputs. </p><p>10.4.5 The new cognitive style: from capturing state to isolated transforms Imperative programming has a consistent problem-solving (or cognitive) style that requires a developer to look at the world around them and capture the state of the world. Once the initial state of the world has been precisely captured in memory or object state, developers write carefully orchestrated methods to update the state of interacting objects. The cognitive styles used in functional programming are radically different. Instead of capturing state, the functional programmer views the world as a series of transformations of data from the initial raw form to other forms that do useful work. Transforms might convert raw data to forms used in indexes, convert raw data to HTML formats for display in a web page, or generate aggregate values (counts, sums, averages, and others) used in a data warehouse. Both object-oriented and functional programming do share one similar goal— how to structure libraries that reuse code to make the software easier to use and main- tain. With object orientation, the primary way you reuse code is to use an inheritance hierarchy. You move common data and methods up to superclasses, where they can be reused by other object classes. With functional programming, the goal is to create reusable transformation functions that build hierarchies where each component in the hierarchy is regenerated when the dependent element’s data changes. The key consequence of what style you choose is scalability. When you choose the functional programming approach, your transforms will scale to run on large clusters with hundreds or thousands of nodes. If you choose the imperative programming route, you must carefully maintain the state of a complex network of objects when there are many threads accessing these objects at the same time. Inevitably you’ll run into the same scalability problems you saw with graph stores in chapter 6 on big data. Large object networks might not fit within RAM, locking systems will consume more and more CPU cycles, and caching won’t be used effectively. You’ll be spending most of your CPU cycles moving data around and managing locks. </p><p>10.4.6 Quality, validation, and consistent unit testing Regardless of whether imperative or functional programming is used, there’s one observation that won’t change. The most reliable programs are those that have been adequately tested. Too often, we see good test-driven imperative programmers leap into the world of functional programming and become so disoriented that they forget everything that they learned about good test-driven development. 
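One way to keep that discipline is to treat every transform as a small, independently testable function: assert on its outputs, and assert that its inputs are untouched. The sketch below uses Python's standard unittest module; the temperature transform itself is a made-up example, not taken from any particular codebase.

import unittest

def celsius_to_fahrenheit(readings):
    """A pure transform: the same input always yields the same output and
    nothing outside the function is modified."""
    return [round(c * 9 / 5 + 32, 1) for c in readings]

class TransformTest(unittest.TestCase):
    def test_known_values(self):
        self.assertEqual(celsius_to_fahrenheit([0, 100]), [32.0, 212.0])

    def test_input_is_not_mutated(self):
        readings = [20]
        celsius_to_fahrenheit(readings)
        self.assertEqual(readings, [20])   # no side effects to undo or mock

if __name__ == "__main__":
    unittest.main()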
Functional programs seem to be inherently more reliable. They don’t have to deal with which objects need to deallocate memory and when the memory can be released.</p><p> www.it-ebooks.info 226 CHAPTER 10 NoSQL and functional programming</p><p>Their focus is on messaging rather than memory locks. Once a transform is complete, the only artifact is the output document. All intermediate values are easily removed and the lack of side effects makes testing functions an atomic process. Functional programming languages also may have type checking for input and output parameters, and validation functions that execute on incoming items. This makes it easier to do compile-time checking and produces accurate runtime checks that can quickly help developers to isolate problems. Yet all these safeguards won’t replace a robust and consistent unit-testing process. To avoid sending corrupt, inconsistent, and missing data to your functions, a compre- hensive and complete testing plan should be part of all projects. </p><p>10.4.7 Concurrency in functional programming Functional programming systems are popular when multiple processes need to reli- ably share information either locally or over networks. In the imperative world, shar- ing information between processes involves multiple processes reading and writing shared memory and setting other memory locations called locks to determine who has exclusive rights to modify memory. The complexities about who can read and write shared memory values are referred to as concurrency problems. Let’s take a look at some of the problems that can occur when you try to share memory: </p><p> Programs may fail to lock a resource properly before they use it.  Programs may lock a resource and neglect to unlock it, preventing other threads from using a resource.  Programs may lock a resource for a long time and prevent others from using it for extended periods.  Deadlocks occur where two or more threads are blocked forever, waiting for each other to be unlocked. These problems aren’t new, nor are they exclusive to traditional systems. Traditional as well as NoSQL systems have challenges locking resources on distributed systems that run over unreliable networks. Our next case study looks at an alternative approach to managing concurrency in distributed systems. 10.5 Case study: building NoSQL systems with Erlang Erlang is a functional programming language optimized for highly available distrib- uted systems. As we mentioned before, Erlang has been used to build several popular NoSQL systems including CouchDB, Couchbase, Riak, and Amazon’s SimpleDB. But Erlang is used for writing more than distributed databases that need high availability. The popular distributed messaging system RabbitMQ is also written in Erlang. It’s no coincidence that these systems have excellent reputations for high availability and scalability. In this case study, we’ll look at why Erlang has been so popular and how these NoSQL systems have benefited from Erlang’s focus on concurrency and mes- sage-passing architecture.</p><p> www.it-ebooks.info Case study: building NoSQL systems with Erlang 227</p><p>We’ve already discussed how difficult it is to maintain consistent memory state on multithreaded and parallel systems. Whenever you have multiple threads executing on systems, you need to consider the consequences of what happens when two threads are both trying to update shared resources. There are several ways that computer sys- tems share memory-resident variables. 
The most common way is to create stringent rules requiring all shared memory to be controlled by locking and unlocking func- tions. Any thread that wants to access global values must set a lock, make a change, and then unset the lock. Locks are difficult to reset if there are errors. Locking in dis- tributed systems has been called one of the most difficult problems in all of computer science. Erlang solves this problem by avoiding locking altogether. Erlang uses a different pattern called actor, illustrated in figure 10.14. The actor model is similar to the way that people work together to solve problems. When people work together on tasks, our brains don’t need to share neurons or access shared memory. We work together by talking, chatting, or sending email—all forms of message passing. Erlang actors work in the same way. When you program in Erlang, you don’t worry about setting locks on shared memory. You write actors that communicate with the rest of the world through message passing. Each actor has a queue of messages that it reads to perform work. When it needs to communicate with other actors, it sends them messages. Actors can also create new actors. By using this actor model, Erlang programs work well on a single processor, and they also have the ability to scale their tasks over many processing nodes by sending messages to processors on remote nodes. This single messaging model provides many benefits for including high availability and the ability to recover gracefully from both network and hardware errors. Erlang also provides a large library of modules called OTP that make distributed computing problems much easier.</p><p>Actor Message queue Actor Message queue</p><p>Actor Message queue Actor Message queue</p><p>Figure 10.14 Erlang uses an actor model, where each process has agents that can only read messages, write messages, and create new processes. When you use the Erlang actor model, your software can run on a single processor or thousands of servers without any change to your code.</p><p> www.it-ebooks.info 228 CHAPTER 10 NoSQL and functional programming</p><p>What is OTP? OTP is a large collection of open source function modules used by Erlang applica- tions. OTP originally stood for Open Telecom Platform, which tells you that Erlang was designed for running high-availability telephone switches that needed to run without interruption. Today OTP is used for many applications outside the telephone industry, so the letters OTP are used without reference to the telecommunications industry. </p><p>Together, Erlang and the OTP modules provide the following features to application developers:</p><p> Isolation—An error in one part of the system will have minimum impact on other parts of the system. You won’t have errors like Java NullPointerExceptions (NPEs) that crash your JVM.  Redundancy and automatic failover (supervision)—If one component fails, another component can step in to replace its role in the system.  Failure detection—The system can quickly detect failures and take action when errors are detected. This includes advanced alerting and notification tools.  Fault identification—Erlang has tools to identify where faults occur and inte- grated tools to look for root causes of the fault.  Live software updates—Erlang has methods for software updates without shutting down a system. This is like a version of the Java OSGi framework to enable remote installation, startup, stopping, updating, and uninstalling of new mod- ules and functions without a reboot. 
This is a key feature missing from many NoSQL systems that need to run nonstop.  Redundant storage—Although not part of the standard OTP modules, Erlang can be configured to use additional modules to store data on multiple locations if hard drives fail. Because it’s based around the actor model and messaging, Erlang has inherent scale- out properties that come for free. As a result, these features don’t need to be added to your code. You get high availability and scalability just by using the Erlang infrastruc- ture. Figure 10.15 shows how Erlang components fit together. Erlang puts the agent model at the core of its own virtual machine. In order to have consistent and scalable properties, all Erlang libraries need to be built on this</p><p>C or other language Erlang applications Figure 10.15 The Erlang application runs on a series of services such as the Mnesia Mnesia SNMP Web SASL database, Standard Authentication and DBMS agents server Security Layer (SASL) components, monitoring OTP modules agents, and web servers. These services make calls to standardized OTP libraries that call the Erlang runtime system Erlang runtime system. Programs written in other languages don’t have the same support Operating system that Erlang applications have.</p><p> www.it-ebooks.info Apply your knowledge 229</p><p> infrastructure. The downside is that because Erlang depends on the actor model, it becomes difficult to integrate imperative systems like Java libraries and still benefit from its high availability and integrated scale-out features. Imperative functions that perform consistent transforms can be used, but object frameworks that need to manage external state will need careful wrapping. Because Erlang is based on Prolog, its syntax may seem unusual for people familiar with C and Java, so it takes some getting used to. Erlang is a proven way to get high availability and scale out of your distributed applications. If your team can overcome the steep learning curve, there can be great benefits down the road. 10.6 Apply your knowledge Sally is working on a business analytics dashboard project that assembles web pages that are composed of many small subviews. Most subviews have tables and charts that are generated from the previous week’s sales data. Each Sunday morning, the data is refreshed in the data warehouse. Ninety-five percent of the users will see the same tables and charts for the previous week’s sales, but some receive a customized view for their projects. The database that’s currently being used is an RDBMS server that’s overloaded and slow during peak daytime hours. During this time, reports can take more than 10 min- utes to generate. Susan, who is Sally’s boss, is concerned about performance issues. Susan tells Sally that one of the key goals of the project is to help people make better decisions through interactive monitoring and discovery. Susan lets Sally know in no uncertain terms that she feels users won’t take the time to run reports that take more than 5 minutes to produce a result. Sally gets two different proposals from different contractors. The architectures of both systems are shown in figure 10.16. </p><p>Proposal A Proposal B Without cache layer With cache layer Two views HTML Bar HTML Bar The transformation results of the table chart table chart are also stored in the cache. same data in a report. 
Java Java Transform Transform</p><p>JDBC JDBC REST Data service</p><p>SQL SQL Cache SQL</p><p>A single query All views of the same RDBMS is made to the RDBMS data get their data database for all Two queries for from the cache. views of a report. the same data.</p><p>Figure 10.16 Two business intelligence dashboard architectures. The left panel shows that each table and chart will need to generate multiple SQL statements, slowing the database down. The right panel shows that all views that use the same data can simply create new transforms directly from the cache, which lowers load on the database and increases performance.</p><p> www.it-ebooks.info 230 CHAPTER 10 NoSQL and functional programming</p><p>Proposal A uses Java programs that call a SQL database and generate the appropriate HTML and bitmapped images for the tables and charts of each dashboard every time a widget is viewed. In this scenario, two views of the same data, a bar chart in one view and an HTML table in another view, will rerun the exact same SQL code on the data warehouse. There’s no caching layer. Proposal B has a functional programming REST layer system that first generates an XML response from the SQL SELECT for each user interface widget and then caches this data. It then transforms the data in the cache into multiple views such as tables and charts. The system looks at the last-modified dates in the database to know if any of the data in the cache should be regenerated. Proposal B also has tools that prepopulate the cache with frequent reports after the data in the warehouse changes. The system is 25% more expensive, but the vendor claims that their solution will be less expensive to operate due to lower demand on the database server. The vendor behind proposal B claims the average dashboard widget generates a view in under 50 milliseconds if the data is in cache. The vendor also claims that if Sally uses SVG vector charts, not the larger bitmapped images, then the cached SVG charts will only occupy less than 30 KB in cache, and less than 3 KB if they’re compressed. Sally looks at both proposals and selects proposal B, despite its higher initial cost. She also makes sure the application servers are upgraded from 16 GB to 32 GB of RAM to provide more memory for the caches. According to her calculations, this should be enough to store around 10 million compressed SVG charts in the RAM cache. Sally also runs a script on Sunday night that prepopulates the cache with the most common reports, so that when users come in on Monday morning, the most frequent reports are already available from the cache. There’s almost no load on the database server after the reports are in the cache. When the project rolls out, the average page load times, even with 10 charts per page, are well under 3 seconds. Susan is happy and gives Sally a bonus at the end of the year. You should notice that in this example, the additional REST caching layer in a soft- ware application isn’t dependent on your using a NoSQL database. Since most NoSQL databases provide REST interfaces that provide cache-friendly results, they provide additional ways your applications can use a cache to lower the number of calls to your database. 10.7 Summary In this chapter, you’ve learned about functional programming and how it’s different from imperative programming. 
You learned how functional programming is the pre- ferred method for distributing isolated transformations of data over distributed sys- tems, and how systems are more scalable and reliable when functional programming is used. Understanding the power of functional programming will help you in several ways. It’ll help you understand that state management systems are difficult to scale and that</p><p> www.it-ebooks.info Further reading 231</p><p> to really benefit from horizontal scale-out, your team is going to have to make some paradigm shifts. Second, you should start to see that systems that are designed with both concurrency and high availability in mind tend to be easier to scale. This doesn’t mean you need to write all your business applications in Erlang func- tions. Some companies are doing this, but they tend to be people writing the NoSQL databases and high-availability messaging systems, not true business applications. Algorithms such as MapReduce and languages such as HIVE and PIG share some of the same low-level concepts that you see in functional languages. You should be able to use these languages and still get many of the benefits of horizontal scalability and high availability that functional languages offer. In our next chapter, we’ll leave the abstract world of cognitive styles and the theo- ries of computational energy minimization and move on to a concrete subject: secu- rity. You’ll see how NoSQL systems can keep your data from being viewed or modified by unauthorized users. 10.8 Further reading  “Deadlock.” Wikipedia. http://mng.bz/64J7.  “Declarative programming.” Wikipedia. http://mng.bz/kCe3.  “Functional programming.” Wikipedia. http://mng.bz/T586.  “Idempotence.” Wikipedia. http://mng.bz/eN5G.  “Lambda calculus and programming languages.” Wikipedia. http://mng.bz/15BH.  MSDN. “Functional Programming vs. Imperative Programming.” http://mng.bz/8VtY.  “Multi-paradigm programming language.” Wikipedia. http://mng.bz/3HH2.  Piccolboni, Antonio. “Looking for a map reduce language.” Piccolblog. April 2011. http://mng.bz/q7wD.  “Referential transparency (computer science).” Wikipedia. http://mng.bz/85rr.  “Semaphore (programming).” Wikipedia. http://mng.bz/5IEx.  W3C. Example of forced sequential execution of a function. http://mng.bz/aPsR.  W3C. “XQuery Scripting Extension 1.0.” http://mng.bz/27rU.</p><p> www.it-ebooks.info Security: protecting data in your NoSQL systems</p><p>This chapter covers</p><p> NoSQL database security model  Security architecture  Dimensions of security  Application versus database-layer security trade-off analysis</p><p>Security is always excessive until it’s not enough. —Robbie Sinclair</p><p>If you’re using a NoSQL database to power a single application, strong security at the database level probably isn’t necessary. But as the NoSQL database becomes popular and is used by multiple projects, you’ll cross departmental trust boundar- ies and should consider adding database-level security. Organizations must comply with governmental regulations that dictate systems, and applications need detailed audit records anytime someone reads or changes data. For example, US health care records, governed by the Health Information Privacy Accountability Act (HIPAA) and Health Information Technology for Economic and</p><p>232</p><p> www.it-ebooks.info A security model for NoSQL databases 233</p><p>Clinical Health Act (HITECH Act) regulations, require audits of anyone who has accessed personally identifiable patient data. 
Many organizations need fine-grained controls over what fields can be viewed by different classes of users. You might store employee salary information in your data- base, but want to restrict access to that information to the individual employee and a specific role in HR. Relational database vendors have spent decades building security rules into their databases to grant individuals and groups of users access to their tabu- lar data at the column and row level. As we go through this chapter, you’ll see how NoSQL systems can provide enterprise-class security at scale. NoSQL systems are, by and large, a new generation of databases that focus on scale-out issues first and use the application layer to implement security features. In this chapter, we’ll talk about the dimensions of database security you need in a proj- ect. We’ll also look at tools to help you determine whether security features should be included in the database. Generally, RDBMSs don’t provide REST services as they are part of a multitier archi- tecture with multiple security gates. NoSQL databases do provide REST interfaces, and don't have the same level of protection, so it’s important to carefully consider security features for these databases. 11.1 A security model for NoSQL databases When you begin a database selection pro- cess, you start by sitting down with your General public</p><p> business users to define the overall secu- Intranet users rity requirements for the system. Using a Authenticated concentric ring model, as shown in users</p><p> figure 11.1, we’ll start with some terminol- Database ogy to help you understand how to build a administrators basic security model to protect your data. This model is ideal for getting started with a single application and a single data collection. It’s a simplified model Figure 11.1 One of the best ways to visualize a that categorizes users based on their database security system is to think of a series of concentric rings that act as walls around your access type and role within an organiza- data. The outermost ring consists of users who tion. Your job as a database architect is to access your public website. Your company’s select a NoSQL system that supports the internal employees might consist of everyone on your company intranet who has already been security requirements of the organiza- validated by your company local area network. tion. As you’ll see, the number of appli- Within that group, there might be a subset of cations that you run within your users to whom you’ve granted special access; database, your data classification, report- for example, a login and password to a database account. Within your database you might have ing tools, and the number of roles within structures that grant specific users special your organization will dictate what secu- privileges. A special class of users, database rity features your NoSQL database administrators, is granted all rights within the system. should have.</p><p> www.it-ebooks.info 234 CHAPTER 11 Security: protecting data in your NoSQL systems</p><p>Firewalls and application servers protect databases from unauthorized access.</p><p>Figure 11.2 If your database sits Internet Firewall App server behind an application server, the Database application server can protect the Reporting database from unauthorized access. If tools Reporting tools run directly on a you have many applications, including database so the database may reporting tools, you should consider need its own security layer. 
some database-level security controls.</p><p>If your concentric ring model stays simple, most NoSQL databases will meet your needs, and security can be handled at the application level. But large organizations with complex security requirements that have hundreds of overlapping circles for doz- ens of roles and multiple versions of these maps will find that only a few NoSQL sys- tems satisfy these requirements. As we discussed in chapter 3, most RDBMSs have mature security systems with fine-grained permission control at the column and row level associated with the database. In addition, data warehouse OLAP tools allow you to add rules that protect individual cell-based reports. Figure 11.2 shows how report- ing tools typically need direct access to the entire database. There are ways you can protect subsections of your data, and most reporting tools can be customized to access specific parts of your NoSQL database. Unfortunately, not all organizations have the ability to customize reporting tools or to limit the subsets of data that reporting tools can access. To function well, tools such as MapReduce also need to be aware of your security policy. As the use of the database grows within an organization, the need to access data crosses organizational trust boundaries. Eventu- ally, the need for in-database security will transition from “Not required” to “Nice to have” and finally to “Must have.” Figure 11.3 is an illustration of this process. Next, we’ll look at two methods that organizations can used to mitigate the need for in-database security models. </p><p>Must have Need for in- Enterprise-wide database security Regulated Nice to have Multiple projects</p><p>Multidivision reporting Single project Role-based access control</p><p>Not required Enterprise rollout timeline</p><p>Figure 11.3 As the number of projects that use a database grows, the need for in- database security increases. The tipping point occurs when an organization needs integrated real-time reports for operational data in multiple collections.</p><p> www.it-ebooks.info A security model for NoSQL databases 235</p><p>11.1.1 Using services to mitigate the need for in-database security One of the most time-consuming and expensive transitions organizations make is con- verting standalone applications running on siloed databases with their own security model to run in a centralized enterprise-wide database with a different security model. But if an organization splits its application into a series of reusable data services, they could avoid or delay this costly endeavor. By separating the data that each service pro- vides from other database components, the service can continue to run on a stand- alone database. Recall that in section 2.2 we talked about the concept of application layers. We compared the way functionality is distributed in an RDBMS to how a NoSQL system could address those same concerns by adding a layer of services. You can use this same approach to create service-based applications that run on separate lightweight NoSQL databases with independent in-database security models. To continue this service-driven strategy, you might need to provide more than sim- ple request-response services that take inputs and return outputs. This service-driven strategy works well for search or lookup services, but what if you have data that must be merged or joined with other large datasets? To meet these requirements, you must provide dumps of data as well as incremental updates to users for new and changing data. 
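As a small illustration of keeping the policy in the service layer, the Python sketch below shows a read service that redacts sensitive fields before returning a document. The field names and the hr-admin role are invented for this example, and a production service would of course sit behind real authentication.

SENSITIVE_FIELDS = {"salary", "ssn"}

def get_employee(document, caller_roles):
    """Return an employee document, redacting sensitive fields unless the
    caller holds the hr-admin role. The standalone database behind this
    service never needs its own field-level security rules."""
    if "hr-admin" in caller_roles:
        return document
    return {k: v for k, v in document.items() if k not in SENSITIVE_FIELDS}

record = {"name": "Pat Lee", "department": "sales", "salary": 72000}
assert "salary" not in get_employee(record, ["employee"])
assert get_employee(record, ["hr-admin"])["salary"] == 72000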
In some cases, these services can be used directly within ad hoc reporting tools. How long will the service-oriented strategy work? It starts to fail when the data vol- ume and synchronization complexity becomes too costly. </p><p>11.1.2 Using data warehouses and OLAP to mitigate the need for in-database security The need for security reporting tools is one of the primary reasons enterprises require security within the database, rather than at the application level. Let’s look at why this is sometimes not a relevant requirement. Let’s say the data in your standalone NoSQL database is needed to generate ad hoc reports using a centralized data warehouse. The key to keeping NoSQL systems inde- pendent is to have a process that replicates the NoSQL database information into your data warehouse. As you may recall from chapter 3, we reviewed the process of how data can be extracted from operational systems and stored in fact and dimension tables within a data warehouse. This moves the burden of providing security away from standalone performance- driven NoSQL services to the OLAP tools. OLAP tools have many options for protect- ing data, even at the cell level. Policies can be set up so that reports will only be gener- ated if there’s a minimum number of responses so that an individual can’t be identified or their private data viewed. For example, a report that shows the average math test score for third graders by race will only display if there are more than 10 stu- dents in a particular category. The process of moving data from NoSQL systems into an OLAP cube is similar to the process of moving from a RDBMS; the difference comes in the tools used. Instead</p><p> www.it-ebooks.info 236 CHAPTER 11 Security: protecting data in your NoSQL systems</p><p> of running overnight ETL jobs, your NoSQL database might use MapReduce pro- cesses to extract nightly data feeds on new and updated data. Document stores can run reports using XQuery or another query language. Graph stores can use SPARQL or graph query reporting tools that extract new operational data and load it into a central staging area that’s then loaded into OLAP cube structures. Though these archi- tectural changes might not be available to all organizations, they show that the needs of specialized data stores for specific performance and scale-out can still be integrated into an overall enterprise architecture that satisfies both security and ad hoc reporting requirements. Now that we’ve looked at ways to keep security at the application level, we’ll sum- marize the benefits of each approach. </p><p>11.1.3 Summary of application versus database-layer security benefits Each organization that builds a database can choose to put security at either the appli- cation or the database level. But like everything else, there are benefits and trade-offs that should be considered. As you review your organization’s requirements, you’ll be able to determine which method and benefits are the best fit. Benefits of application-level security:  Faster database performance—Your database doesn’t have to slow down to check whether a user has permission on a data collection or an item.  Lower disk usage—Your database doesn’t have to store access-control lists or visi- bility rules within the database. In most cases, the disk space used by access con- trol lists is negligible. There are some databases that store access within each key, and for these systems, the space used for storing security information must be taken into account. 
 Additional control using restricted APIs—Your database might not be configured to support multiple types of ad hoc reports that consume your CPU resources. Although NoSQL systems leverage many CPUs, you still might want to limit reports that users can execute. By restricting access to reporting tools for some roles, these users can only run reports that you provide within an application. Benefits of database-level security:  Consistency of security policy—You don’t have to put individualized security poli- cies within each application and limit the ability of ad hoc reporting tools.  Ability to perform ad hoc reporting—Often users don’t know exactly what types of information they need. They create initial reports that show them only enough information to know they need to dig deeper. Putting security within the data- base allows users to perform their own ad hoc reporting and doesn’t require your application to limit the number of reports that users can run.  Centralized audit—Organizations that run in heavily regulated industries such as health care need centralized audit. For these organizations, database-level secu- rity might be the only option.</p><p> www.it-ebooks.info Gathering your security requirements 237</p><p>Now that you know how a NoSQL system can fit into your enterprise, let’s look at how you can qualify a NoSQL database by looking at its ability to handle authentication, authorization, audit, and encryption requirements. Taking a structured approach to comparing NoSQL databases against these components will increase your organiza- tion’s confidence that a NoSQL database can satisfy security concerns. 11.2 Gathering your security requirements Selecting the right NoSQL system will depend on how complex your security require- ments are and how mature the security model is within your NoSQL database. Before embarking on a NoSQL pilot project, it’s a good idea to spend some time understand- ing your organization’s security requirements. We encourage our customers to group security requirements into four areas, as outlined in figure 11.4. </p><p>Are users and requests from Do users have read and/or write the people they claim to be? access to the appropriate data?</p><p>Authentication Authorization</p><p>Audit Encryption</p><p>Can you track who read or Can you convert data to a form that updated data and when they did it? can’t be used by unauthorized viewers?</p><p>Figure 11.4 The four questions of a secure database. You want to make sure that only the right people have access to the appropriate data in your database. You also want to track their access and transmit data securely in and out of the database.</p><p>The remainder of this chapter will focus on a review of authentication, authorization, audit, and encryption processes followed by three case studies that apply a security policy to a NoSQL database. Let’s begin by looking at the authentication process to see how it can be structured within your security requirements.</p><p>11.2.1 Authentication Authenticating users is the first step in protecting your data. Authentication is the pro- cess of validating the identity of a specific individual or a service request. Figure 11.5 shows a typical authentication process. As you’ll see, there are many ways to verify the identity of users, which is why many organizations opt to use an external service for the verification process. 
The good news is that many modern databases are used for web-only access, which allows them to use web standards and protocols outside of the database to verify a user. With this model, only validated users will ever connect with the database and the user’s ID can then be placed directly in an HTTP header. From there the database can look up the groups and roles for each user from an internal or external source.</p><p> www.it-ebooks.info 238 CHAPTER 11 Security: protecting data in your NoSQL systems</p><p>Authentication Authorization</p><p>Look up Role Database Id in Yes Yes Get/put Return groups or has access request header? data result roles to data? No No Company Log in Return error directory Database</p><p>No Login Yes Deny access ok</p><p>Figure 11.5 The typical steps to validate a web-based query before it executes within your database. The initial step checks for an identifier in the HTTP header. If the header is present, the authentication is done as shown on the left side of the figure. If the user ID isn’t in the HTTP header, the user is asked to log in and their ID and password are verified against a company-wide directory. The authorization phase shown on the right side of the figure will look up roles for a user and get a list of roles associated with that user. If any role has the rights, the query will execute and the results are returned to the user.</p><p>In large companies, a database may need to communicate with a centralized company service that validates a user’s credentials. Generally, this service is designed to be used by all company databases and is called a single sign-on, or SSO, system. If the company doesn’t provide an SSO interface, your database will have to validate users using a directory access API. The most common version is called the Lightweight Directory Access Protocol (LDAP). There are six types of authentication: basic access, digest access, public key, multi- factor, Kerberos, and Simple Authentication and Security Layer (SASL), as we’ll see next. </p><p>BASIC ACCESS AUTHENTICATION This is the simplest level of authenticating a database user. The login and password are put into the HTTP header in a clear plain-text format. This type of authentication should always be used with Secure Sockets Layer (SSL) or Transport Layer Security (TLS) over a public network; it can also be adequate for an internal test system. It doesn’t require web browser cookies or additional handshakes. </p><p>DIGEST ACCESS AUTHENTICATION Digest access authentication is more complicated and requires a few additional hand- shakes between the client and the database. It can be used over an unencrypted non- SSL/TLS connection in low-stakes situations. Because digest authentication uses a standard MD5 hash function, it’s not considered a highly secure authentication method unless other steps have been implemented. Using digest access authentica- tion over SSL/TLS is a good way to increase password security. </p><p> www.it-ebooks.info Gathering your security requirements 239</p><p>PUBLIC KEY AUTHENTICATION Public key authentication uses what’s known as asymmetric cryptography, where a user has a pair of two mutually dependent keys. What’s encrypted with one of these keys can only be decrypted with the other. Typically, a user makes one of these keys public, but keeps the other key completely private (never giving it out to anyone). For authentication, the user encrypts a small piece of data with their private key and the receiver verifies it using the user’s public key. 
This is the same type of authentication used with the secure shell (SSH) command. The drawback of this method is that if a hacker breaks into your local computer and takes your private key, they could gain access to the database. Your database is only as secure as the private keys. </p><p>MULTIFACTOR AUTHENTICATION Multifactor authentication relies on two or more forms of authentication. For exam- ple, one factor might be something you have, such as a smart card, as well as some- thing you know, like a PIN number. To gain access you must have both forms. One of the most common methods is a secure hardware token that displays a new six-digit number every 30 seconds. The sequences of passwords are synced to the database using accurate clocks that are resynchronized each time they’re used. The user types in their password and the PIN from the token to gain access to the database. If either the password or the PIN is incorrect, access is denied. As an additional security measure, you can restrict database access to a range of IP addresses. The problem with this method is that the IP address assignments can change frequently for remote users, and IP addresses can be “faked” using sophisti- cated software. These types of filters are usually placed within firewalls that are in front of your database. Most cloud hosting services allow these rules to be updated via a web page. </p><p>KERBEROS PROTOCOL AUTHENTICATION If you need to communicate in a secure way with other computers over an insecure network, the Kerberos system should be considered. Kerberos uses cryptography and trusted third-party services to authenticate a user’s request. Once a trust network has been set up, your database must forward the information to a server to validate the user’s credentials. This allows a central authority to control the access policy for each session. </p><p>SIMPLE AUTHENTICATION AND SECURITY LAYER Simple Authentication and Security Layer (SASL) is a standardized framework for authentication in communication protocols. SASL defines a series of challenges and responses that can be used by any NoSQL database to authenticate incoming network requests. SASL decouples the intent of validating a network request from the underly- ing mechanism for validating the request. Many NoSQL systems simply define a SASL layer to indicate that at this layer a valid request has entered the database. </p><p> www.it-ebooks.info 240 CHAPTER 11 Security: protecting data in your NoSQL systems</p><p>11.2.2 Authorization Once you’ve verified the identity of your users (or their agents), you’re ready to grant them access to some or all of the database. The authorization process is shown in figure 11.5. Unlike authentication, which occurs once per session or query request, authori- zation is a more complex process since it involves applying a complex, enterprise-wide, access-control policy to many data items. If not implemented carefully, authorization can negatively impact the performance of large queries. A second requirement to con- sider is the issue of security granularity and its impact on performance, as illustrated in figure 11.6. As you leave the authentication step and move toward the authorization phase of a query, you’ll usually have an identifier that indicates which user is making the request. You can use this identifier to look up information about each user; for exam- ple, what department, groups, and projects they’re associated with and which roles they’ve been assigned. 
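In code, that lookup can be as small as two tables: one mapping users to roles and one mapping roles to permissions. The Python sketch below mirrors the authorization path in figure 11.5; the user names, roles, and collection paths are invented, and a real system would load them from a company directory or the database's own security store.

user_roles = {"smithj": ["sales-analyst", "employee"]}

role_permissions = {
    "sales-analyst": {("/reports/sales", "read")},
    "hr-admin":      {("/hr/salaries", "read"), ("/hr/salaries", "write")},
    "employee":      {("/public", "read")},
}

def is_authorized(user, collection, action):
    """Grant access if any of the user's roles carries the permission."""
    return any((collection, action) in role_permissions.get(role, set())
               for role in user_roles.get(user, []))

assert is_authorized("smithj", "/reports/sales", "read")
assert not is_authorized("smithj", "/hr/salaries", "read")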
Inside NoSQL databases, you think of this information as an individual’s smart-card badge and your database as a series of rooms in a building with security checkpoints at each door. But most user interfaces use folder icons that contain other folders. Here, each folder corresponds to a directory (or collection) and document within the database. In other systems, this same folder concept uses buckets to describe a collection of documents. Figure 11.7 is an example of this folder/collection concept. You use the information about a user and their groups to determine whether they have access to a folder and what actions they can perform. This implies that if you want to read a file, you’ll need read access to the directories that contain the file as well as all the ancestor folders up to the root folder of the database. As you can see, the deeper you go into a directory, the more checks you need to perform. So the checks need to be fast if you have many folders within folders. In most large database systems, the authorization process will first look up addi- tional information about each user. The most typical information might be what orga- nization you work for in the company, what projects or groups you’re associated with, and what roles you’ve been assigned. You can use the user identifier directly against</p><p>Figure 11.6 Before you create a NoSQL application, you must Coarse-grained access control consider the granularity of – little performance impact security your applications needs. A course grain allows Database access control at the entire database or collection level. Collection Finer-grained controls allow you to control access to a collection, Document an individual document, or an Fine-grained access control element within a document. But – large performance impact Element fine-grain may have performance impacts on your system.</p><p> www.it-ebooks.info Gathering your security requirements 241</p><p>Database root collection</p><p>Database Department collection</p><p>Application collection</p><p>Document Element</p><p>Figure 11.7 A document store is like a collection of folders, each with its own lock. To get to a specific element within a document, you need to access all the containers that hold the element, including the document and the ancestor folders. your database, but keeping track of each user and their permission gets complicated quickly, even for a small number of users and records using a UNIX permission model. </p><p>THE UNIX PERMISSION MODEL First let’s look at a simple model for protecting resources, the one used in UNIX, POSIX, Hadoop, and the eXist native XML database. We use the term resource in this context as either a directory or a file. In the UNIX model, when you create any resource, you associate a user and group as the owner of that resource. You then associate three bits with the owner, three bits with the user, and three bits for everyone that’s outside the group. This model is shown in figure 11.8. One positive aspect of a UNIX filesystem permission model is that they’re efficient, since you only need to calculate the impact of nine bits on your operations. But the problem with this model is that it doesn’t scale, because each resource (a folder or file) is typically owned by one group, and granting folder rights to multiple groups isn’t permitted. 
This single-owner restriction prevents organizations from applying detailed access-control policies at the database level. Next, we'll look at an alternative model, role-based access control, which scales to large organizations with many departments, groups, projects, and roles.

USING ROLES TO CALCULATE ACCESS CONTROL
An alternative authorization model, one that associates permissions with roles, has turned out to be scalable and is the predominant choice for large organizations: role-based access control (RBAC), shown in simplified form in figure 11.9. Using a role-based access-control model requires each organization to define a set of roles associated with each set of data. Typically, applications are used to control collections of data, so each application may have a set of roles that can be configured. Once roles have been identified, each user is assigned one or more roles. The application will then look up all the roles for each user and apply them against a set of permissions at the application level to determine whether a user has access.

It's clear that most large organizations can't manage a detailed access-control policy using a simple nine-bit structure like UNIX's. One of the most difficult questions for an application architect is whether access can be controlled at the application level instead of the database level. Note that some applications need to support more than simple read and write access control. For example, content management systems can restrict who can update, delete, search, copy, or include various document components when new documents are generated. These fine-grained actions on data collections are generally controlled at the application level.

Figure 11.9 A simplified role-based access control (RBAC) model: each user has one or more roles, each role is associated with one or more permissions, and each permission binds a capability such as read, write, update, or delete to a resource such as a collection or document. The RBAC model makes a security policy more maintainable because users aren't tied to particular resources.
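To make figure 11.9 concrete, here's a minimal RBAC check sketched in Python. The users, roles, resource names, and permission names are invented for illustration and aren't modeled on any specific NoSQL product.

# Each user is assigned one or more roles.
user_roles = {
    "ann": {"author", "reviewer"},
    "dan": {"editor"},
    "sue": {"reviewer"},
}

# Each resource (a collection or document URI) grants permissions to roles.
resource_permissions = {
    "/books/nosql/chapter-11": {
        "author":   {"read", "write"},
        "editor":   {"read", "write"},
        "reviewer": {"read"},
    },
}

def is_allowed(user, permission, resource):
    """Allow the request if any of the user's roles grants the permission."""
    grants = resource_permissions.get(resource, {})
    return any(permission in grants.get(role, set())
               for role in user_roles.get(user, set()))

print(is_allowed("sue", "write", "/books/nosql/chapter-11"))   # False
print(is_allowed("dan", "write", "/books/nosql/chapter-11"))   # True

Because permissions hang off roles rather than individual users, adding a new team member is a single entry in the user-to-role mapping instead of a change to every resource, and that's what keeps the policy maintainable as an organization grows.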
11.2.3 Audit and logging
Knowing who accessed or updated what records and when they took these actions is the job of the auditing component of your database. Good auditing systems allow for a detailed reconstruction and examination of a sequence of events if there are security breaches or system failures. A key component of auditing is to make sure that the right level of detail is logged before you need it. It's always a problem to say "Yes, we should've been logging these events" after the events have occurred.

Auditing can be thought of as event logging and analysis. Most programming languages have functions that add events to a log file, so almost all custom applications can be configured to add data to event logs at the application layer. There are some exceptions to this rule, for example, when you're using third-party applications whose source code can't be modified. In these situations database triggers can be used to add logging information. Most mature databases come with an extensive set of auditing reports that show detailed activity of security-related transactions, such as

 Web page requests—What web pages were requested, by what users (or IP addresses), and when they were accessed. Additional data, such as the response time, can also be added to the log files. This logging is typically done by different web servers and merged into central access logs.
 Last logins—The most recent logins to the database, sorted with the latest at the top of the report.
 Last updates—The most recent updates to the database. These reports can have options for sorting by date or by the collections modified.
 Failed login attempts—The prior login attempts to the database that failed.
 Password reset requests—A list of the most recent password reset requests.
 Import/upload activity—A list of the most recent database imports or bulk loads.
 Delete activity—A list of the most recent records removed from the database.
 Search—A list of the most recent searches performed on the database. These reports can also include the most frequent queries over given periods of time.
 Backup activity—When data was backed up or restored from backup.

In addition to these standard audit reports, there may be specialized reports that depend on the security model you implement. For example, if you're using role-based access control, you might want a detailed accounting of which user was assigned a role and when. Applications might also require that special audit information be added to log files, such as actions that have a high impact on the organization. This information can be added at the application level, and if you have control of all applications, this method is appropriate. There's some additional logging that should be done at the database layer. In RDBMSs, triggers can be written to log data when an insert, update, or delete operation occurs. In NoSQL databases that use collections, triggers can be added to collections as well. Trigger-based logging is ideal when there are many applications that can change your data.

11.2.4 Encryption and digital signatures
The final concern of NoSQL security is how a database encrypts and digitally signs documents to verify they haven't been modified. These processes can be done at the application level as well as the database level; NoSQL databases aren't required to have built-in libraries to perform these functions. Yet for some applications, knowing the database hasn't been modified by unauthorized users can be critical for regulatory compliance.

Encryption processes use tools similar to the hash functions we described in chapter 2. The difference is that private and public keys are used in combination with certificates to encrypt and decrypt documents. Your job as an architect is to determine whether encryption should be done at the application or database layer.
By adding these functions in the database layer, you have centralized control over the methods that store and access the data. Adding the functions at the application layer requires each application to control the quality of the cryptographic library. The key issue with any encryption tool isn’t the algorithm itself, but the process of ensur- ing the cryptographic algorithm hasn’t been tampered with by an unauthorized party. Some NoSQL databases, especially those used in high-stakes security projects, are required to have their cryptographic algorithms certified by an independent auditor. The US National Institute of Standards and Technology (NIST) has published Federal Information Processing Standard (FIPS) Publication 140-2 that specifies multiple levels of certification for cryptographic libraries. If your database holds data that must be securely encrypted before it’s transmitted, then these standards might be required in your database.</p><p>XML SIGNATURES The World Wide Web consortium has defined a standard method of digitally signing any part of an XML file using an XML Signature. XML Signatures allow you to verify the authenticity of any XML document, any part of an XML document, or any resource represented in a URI with a cryptographic hash function. An XML Signature allows you to specify exactly what part of a large XML document should be signed and how that digital signature is included in the XML document. One of the most challenging problems in digitally signing an entire XML docu- ment is that there are multiple ways a document might be represented in a single string. In order for digital signatures to be verified, both the sender and receiver must agree on a consistent way to represent XML. For example, you might agree that ele- ments shouldn’t be on separate lines or have indented spaces, attributes should be in alphabetical order, and characters should be represented using UTF-8 encoding. Digi- tal signatures can avoid these problems by only signing the data within XML elements, not the element and attribute names that contain the values. As an example, US federal law requires that all transmissions of a doctor’s prescrip- tions of controlled substances be digitally signed before they’re transmitted between computer systems. But the entire prescription doesn’t need to be verified. Only the critical elements such as the doctor prescribing the drug, the drug name, and the drug quantity need to be digitally signed and verified. An example of this is shown in listing 11.1, which shows how the rules of extracting part of the text within a docu- ment are specified before they’re signed. 
In the example, the prescriber (identified by a Drug Enforcement Administration, or DEA, number), the drug description, the quantity, and the date are digitally signed, but other parts of the document aren't included in the final string to be signed.

Listing 11.1 Adding a digital signature to a document

<ds:Signature>
  <ds:SignedInfo>
    <ds:CanonicalizationMethod Algorithm="xml-exc-c14n#"/>
    <!-- use the SHA-256 hash algorithm -->
    <ds:SignatureMethod Algorithm="xmldsig#rsa-sha256"/>
    <ds:Reference>
      <ds:Transforms>
        <!-- rules to extract the text within the document that you sign -->
        <ds:Transform
            Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
          <ds:XPath>
            concat(
              Message/Body/*/Prescriber/Identification/DEANumber/text(),
              Message/Body/*/MedicationPrescribed/DrugDescription/text(),
              Message/Body/*/MedicationPrescribed/Quantity/Value/text(),
              Message/Body/*/MedicationPrescribed/WrittenDate/text()
            )
          </ds:XPath>
        </ds:Transform>
      </ds:Transforms>
      <ds:DigestMethod
          Algorithm="http://www.w3.org/2000/09/xmldsig#sha256"/>
      <ds:DigestValue>UjBsR09EbGhjZ0dTQUxNQUFBUUNBRU1tQ1p0dU1GUXhEUzhi</ds:DigestValue>
    </ds:Reference>
  </ds:SignedInfo>
  <ds:SignatureValue>XjQsL09EbGhjZ0dTQUxNQUFBUUNBRU1tQ1p0dU1GUXhEUzhi</ds:SignatureValue>
  <ds:KeyInfo> ... </ds:KeyInfo>
</ds:Signature>

One useful rule in digital signatures is to "only sign what the users see." To get consistent digital signatures of XML prescriptions and avoid the problems of changing element names and canonicalization, XPath expressions can be used to extract only the values from XML elements and exclude all the element names and paths from the string that's signed. As long as both the transmitter and the receiver use the same XPath expressions, the digital signatures will match. The XML Signature standard allows you to specify precisely the path expressions that you used to sign the documents.

11.2.5 Protecting public websites from denial of service and injection attacks
Databases with public interfaces are vulnerable to two special types of threats: denial of service (DOS) and injection attacks. A DOS attack occurs when a malicious party attempts to shut down your servers by repeatedly sending queries to your website with the intention of overwhelming it and preventing access by valid users. The best way to prevent DOS attacks is to look for repeated rapid requests from the same IP address.

An injection attack occurs when a malicious user adds code to an input field in a web form to directly control your database. For example, a SQL injection attack might embed a SQL query in a search field string to get a listing of all users of the system. Protecting NoSQL systems is no different: all input fields that have public access, such as search forms, should be filtered to remove invalid query strings before processing, and each public interface must have filters that reject invalid queries to your system. Preventing these types of attacks isn't usually the job of your database; the task is the responsibility of your front-end application or firewall. But the libraries and code samples for each database should include examples that show you how to prevent these types of attacks.
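As a rough illustration of both kinds of protection, here's a minimal front-end gate in Python that throttles repeated requests from the same IP address and whitelists what a public search field may contain. The thresholds, the allowed-character pattern, and the function names are invented for this sketch; in practice the logic usually lives in a reverse proxy, a web application firewall, or the application's request handlers rather than in the database itself.

import re
import time
from collections import defaultdict, deque

MAX_REQUESTS = 100                                  # arbitrary per-client limit
WINDOW_SECONDS = 60.0                               # arbitrary sliding window
SEARCH_PATTERN = re.compile(r"^[\w\s-]{1,100}$")    # letters, digits, spaces, hyphens only

_request_log = defaultdict(deque)   # client IP -> timestamps of recent requests

def under_rate_limit(client_ip, now=None):
    """Reject clients that send too many requests inside the window."""
    now = time.monotonic() if now is None else now
    history = _request_log[client_ip]
    while history and now - history[0] > WINDOW_SECONDS:
        history.popleft()               # forget requests that left the window
    if len(history) >= MAX_REQUESTS:
        return False                    # looks like a flood: reject or throttle
    history.append(now)
    return True

def safe_search_term(term):
    """Accept only simple characters so a search box can't smuggle query syntax."""
    return bool(SEARCH_PATTERN.match(term))

def handle_search(client_ip, term):
    if not under_rate_limit(client_ip):
        return "429 Too Many Requests"
    if not safe_search_term(term):
        return "400 Bad Request"
    return "200 OK"                     # safe to pass the term on to the database

print(handle_search("203.0.113.7", "wind turbines"))   # 200 OK
print(handle_search("203.0.113.7", "' OR 1=1 --"))     # 400 Bad Request

The same pattern applies whatever query language your NoSQL database exposes: decide what a legitimate request can look like, and reject everything else before it reaches the database.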
This concludes our discussion of security requirements. Now we'll look at three case studies: security in practice for one key-value store, one column family store, and one document store.

11.3 Case study: access controls on a key-value store—Amazon S3
Amazon Simple Storage Service (S3) is a web-based service that lets you store your data in the cloud. From time to time our customers ask, "Aren't you worried about security in the cloud? How can you make sure your data is secure?" Authentication mechanisms are important to make sure your data is secure from unwanted access. Industries such as health insurance and government, and regulations such as HIPAA, require you to keep your customers' data private and secure or face repercussions.

In S3, data such as images, files, or documents (known as objects) is securely stored in buckets, and only bucket/object owners are allowed access. To access an object, you call the Amazon API with the appropriate call and credentials. You first build a string to sign containing the date, the GET request, the bucket name, and the object name (see the following listing).

Listing 11.2 XQuery code for creating a string to sign using your AWS credentials

let $nl := "&#10;"   (: the newline character :)
let $date := aws-utils:http-date()
let $string-to-sign :=
    concat('GET', $nl, $nl, $nl, $nl,
           'x-amz-date:', $date, $nl,
           '/', $bucket, '/', $object)

Once the string to sign is built, it's combined with your S3 secret key using an HMAC hash function to produce the signature, as shown in the following listing.

Listing 11.3 XQuery for signing a string with AWS secret key and hmac()

let $signature :=
    crypto:hmac($string-to-sign, $s3-secret-key, "SHA-1", "base64")

Finally, the headers are constructed and the call to retrieve the object is made. As you can see in listing 11.4, the signature is combined in the Authorization header with your S3 access key and the call is made.

Listing 11.4 XQuery for a REST HTTP GET with the AWS security credentials

let $headers :=
    <headers>
        <header name="Authorization" value="AWS {$s3-access-key}:{$signature}"/>
        <header name="x-amz-date" value="{$date}"/>
    </headers>
let $url := concat($amazon-s3:endpoint, $bucket, '/', $object)
let $results := httpclient:get($url, false(), $headers)

In addition to this security for retrieving objects, S3 provides additional access-control mechanisms (ACMs) that allow others to view, download, and update your buckets and objects. For example:

 Identity and Access Management (IAM)
 Access-control lists (ACLs)
 Bucket policies

11.3.1 Identity and Access Management (IAM)
IAM allows you to have multiple users within an AWS account, assign credentials to each user, and manage their permissions. Generally, IAM is used in organizations where there's a desire to grant multiple employees access to a single AWS account. To do this, permissions are managed using a set of IAM policies that are attached to specific users. For example, you can allow a user dan to have permission to add and delete images from your web-site-images bucket.

11.3.2 Access-control lists (ACLs)
Access-control lists can be used to grant access to either buckets or individual objects. Like IAM, they only grant permissions; they can't deny or restrict access at an account level. In other words, you can only grant other AWS accounts access to your Amazon S3 resources.
Each access-control list can have up to 100 grants, which can be either individual account holders or one of Amazon's predefined groups:

 Authenticated Users group—Consists of all AWS accounts
 All Users group—Consists of anyone; the request can be signed or unsigned

When using ACLs, a grantee can be an AWS account or one of the predefined Amazon S3 groups, but the grantee can't be an IAM user.

11.3.3 Bucket policies
Bucket policies are the most flexible method of security control, because they can grant as well as deny access to some or all objects within a specified bucket at both the account and user levels. Though you can use bucket policies in conjunction with IAM policies, bucket policies can be used on their own and achieve the same result. For example, figure 11.10 demonstrates how two users (Ann and Dan) have been granted authority to put objects into a bucket called bucket_kma.

Figure 11.10 You can use a bucket policy to grant users access to your AWS S3 objects without using IAM policies. On the left, the IAM policy allows the PutObject action for bucket_kma in an AWS account, and then gives the users Ann and Dan permission to access that account/action. On the right, the bucket policy is attached to bucket_kma and, like the IAM policy, gives Ann and Dan permission to access PutObject on the bucket.

Perhaps you're wondering when to use a bucket policy versus an ACL. The answer is that it depends on what you're trying to accomplish. Access-control lists provide a coarse-grained approach to granting access to your buckets/objects; bucket policies provide a finer-grained approach. There are times when using both bucket policies and ACLs makes sense, such as when

 You want to grant a wide variety of permissions to objects but you only have a bucket policy.
 Your bucket policy is greater than 20 KB in size. The maximum size for a bucket policy is 20 KB, so if you have a large number of objects and users, you can grant additional permissions using an ACL.

There are a few things to keep in mind when combining bucket policies and ACLs:

 If you use ACLs with bucket policies, S3 will use both to determine whether the account has permission to access an object.
 If an account has access to an object through an ACL, it'll be able to access the requested bucket/object.
 When using a bucket policy, a deny statement overrides a grant statement and will restrict an account's ability to access a bucket/object, essentially making the bucket policy a superset of the permissions you grant with an ACL.
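To give a feel for what the bucket policy in figure 11.10 looks like in practice, here's a short sketch that attaches such a policy to bucket_kma. It uses Python and the AWS boto3 SDK rather than the XQuery HTTP client shown in the earlier listings, and the account ID and user ARNs are placeholders you'd replace with your own values.

import json
import boto3

# A bucket policy in the spirit of figure 11.10: allow two IAM users
# (Ann and Dan) to put objects into bucket_kma. The account ID and
# user names are placeholders, not real identities.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowAnnAndDanPutObject",
        "Effect": "Allow",
        "Principal": {"AWS": [
            "arn:aws:iam::111122223333:user/Ann",
            "arn:aws:iam::111122223333:user/Dan",
        ]},
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::bucket_kma/*",
    }],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="bucket_kma", Policy=json.dumps(policy))

A matching statement with "Effect": "Deny" would override the allow, which is the behavior described in the last bullet above.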
As you can see, S3 offers multiple mechanisms for keeping your data safe in the cloud. With that in mind, is it possible to grant fine-grained security on your tabular data without impacting performance? Let's look at how Apache Accumulo uses roles to control user access to the confidential information stored in your database.

11.4 Case study: using key visibility with Apache Accumulo
The Apache Accumulo project is similar to other column family architectures, but with a twist. Accumulo has an innovative way of granting fine-grained security to tabular data that's flexible and doesn't have a negative performance impact for large datasets. Accumulo's method of protecting fine-grained data in the database layer is to place role, permission, or access-list information directly in the key of a key-value store. Since organizations can have many different models of access control, this could make the size of keys unwieldy and take considerable disk space. Instead of putting multiple security models into multiple fields of a key, Accumulo adds a single general field, called Visibility, that's evaluated for each query and returns true or false each time the key is accessed. This is illustrated in figure 11.11.

Figure 11.11 An Accumulo key consists of a row ID, a column (made up of a family, a qualifier, and a visibility), and a timestamp, which together map to a value. Accumulo adds the Visibility field to the column portion of each key. The Visibility field contains a Boolean expression composed of authorization tokens that each return true or false. When a user tries to access a key, the Visibility expression is evaluated, and if it returns true, the value is returned. By restricting the Visibility expression to Boolean values, there's a minimal performance penalty even for large datasets.

The format of the Visibility field isn't restricted to a single user, group, or role. Because you can place any Boolean expression in this field, you can use a variety of different access-control mechanisms. Every time you access Accumulo, your query context has a set of Boolean authorization tokens that are associated with your session. For example, your username, role, and project might be set as authorization tokens. The visibility of each key-value pair is calculated by evaluating the Boolean AND (&) and OR (|) combination of authorization strings, which must return true for a user to view the value. For example, if you add the expression (admin|system)&audit, you'd need either the admin or the system authorization, and the audit authorization, to be able to read this record. Although putting the actual logic of evaluation within the key itself is unusual, it allows many different security models to be implemented within the same database. As long as the logic is restricted to evaluating simple Boolean expressions, there's a minimal impact on performance.
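Here's a minimal sketch of how an expression like that might be evaluated against a session's authorization tokens. It's a simplified illustration written in Python, not the parser Accumulo itself uses (Accumulo's own handling of visibility expressions is stricter, for instance about how & and | may be combined without parentheses).

import re

def evaluate_visibility(expression, authorizations):
    """Evaluate a visibility expression such as '(admin|system)&audit'
    against the set of authorization tokens held by the session."""
    tokens = re.findall(r"[A-Za-z0-9_.-]+|[()&|]", expression.replace(" ", ""))
    pos = 0

    def parse_expr():
        nonlocal pos
        value = parse_term()
        while pos < len(tokens) and tokens[pos] in "&|":
            op = tokens[pos]
            pos += 1
            right = parse_term()
            value = (value and right) if op == "&" else (value or right)
        return value

    def parse_term():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            value = parse_expr()
            pos += 1                      # consume ")"
            return value
        token = tokens[pos]
        pos += 1
        return token in authorizations    # a bare token is true if the session holds it

    return parse_expr()

print(evaluate_visibility("(admin|system)&audit", {"system", "audit"}))  # True
print(evaluate_visibility("(admin|system)&audit", {"audit"}))            # False

Because the check is a short Boolean evaluation performed as each key-value pair is read, it adds very little to scan time, which is exactly the property the figure caption describes.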
In our last case study, we'll look at how to use MarkLogic's role-based access-control security model for secure publishing.

11.5 Case study: using MarkLogic's RBAC model in secure publishing
In this case study, we'll look at how role-based access control (RBAC) can protect highly sensitive documents within a large organization and still allow for fine-grained status reporting. Our example will allow a distributed team of authors, editors, and reviewers to create and manage confidential documents and yet prevent unauthorized users from accessing the text of those documents.

Let's assume you're a book publisher and you have a contract to create a new book on a hot, new NoSQL database that's being launched in a few months. The problem is that the company developing the database wants assurances that only a small list of people will be able to access the book's content during its development. Your contract specifically states that no employees other than the few listed in the contract can have access to the text of the documents. Your payments are contingent on the contents of the book staying private. The contract only allows high-level reports of book metrics to be viewed outside the small authoring team.

Your publishing system has four roles defined: Author, Editor, Publisher, and Reviewer. Authors and Editors can change content, but only users with the Publisher role can make a document available to reviewers. Reviewers have collections configured so that they can add comments in a comment log, but they can't change the main document content.

11.5.1 Using the MarkLogic RBAC security model to protect documents
MarkLogic has built-in, database-layer support for role-based access control, as described in "Using roles to calculate access control" in section 11.2.2. The MarkLogic security model uses many of the same RBAC concepts and implements some enhanced functionality as well. MarkLogic applies its security policy at the collection and document levels and allows users to create functions that have elevated permissions. This feature enables element-level control of selected documents without compromising performance. This case study will first review the MarkLogic security model, then show how it can be applied in a secure publishing example, and finally review the business benefits of this model. A logical diagram of how the MarkLogic security model works is shown in figure 11.12.

Figure 11.12 The MarkLogic security model is based on the role-based access control (RBAC) model with extensions to allow elevated permissions for executing specific functions and queries. Roles exist in a hierarchy, and lower roles inherit permissions from upper roles. Users and roles both have default permissions for documents and collections, and multiple roles can be associated with special privileges on functions, queries, and URIs (amplified permissions, execute privileges, and URI privileges). Each permission record, stored with a document or collection, associates a single capability (read, write, update, or execute) with a single role, and each document and collection is associated with a URI and a set of permissions.

Here are some of the interesting points of the MarkLogic RBAC security model:

 Role hierarchy—Roles are configured in a hierarchy, so a lower-level role will automatically inherit the permissions of any parent roles.
 Default permissions—Users and roles can each be configured to provide default permissions for both documents and collections.
 Elevated security functions—Functions can run at an elevated security level. The elevated security only applies within the context of the function; when the function finishes, the security level is lowered again.
 Compartments—An additional layer of security beyond RBAC is available with an optional license. Compartmentalized security allows complex Boolean AND/OR business rules to be associated with a security policy.

11.5.2 Using MarkLogic in secure publishing
To enforce the contract rules, create a new role for the project called secret-nosql-book using the web-based role administration tools and associate the new role with the collection that contains all of the book's documents, including text, images, and reviewer feedback.
Then configure that collection to include the role of secret- nosql-book to have read and update access to that collection. Also remove all read access for people not within this group from the collection permissions. Make sure that all new documents and subcollections created within this collection use the cor- rect default permission setting. Finally, add the role of secret-nosql-book to only the users assigned to the project. The project also needs to provide a book progress report that an external project manager can run on demand. This report counts the total number of chapters, sec- tions, words, and figures in the book to estimate chapter-by-chapter percentage com- pletion status. To implement this, give the report elevated rights to access the content using functions that use amplified permission (AMP) settings. External project</p><p> www.it-ebooks.info 252 CHAPTER 11 Security: protecting data in your NoSQL systems</p><p> managers don’t need access to the content of the book, since the functions that use the amplified permissions will only gather metrics and not return any text within the book. Note that in this example, application-level security wouldn’t work. If you used application-level security, anyone who has access to the reporting tools would be able to run queries on your confidential documents. </p><p>11.5.3 Benefits of the MarkLogic security model The key benefit of the RBAC model combined with elevated security functions is that access control can be driven from a consistent central control point and can’t be cir- cumvented by reporting tools. Element-level reports can still be executed on secure collections for specialized tasks. This implementation allows flexibility with minimal performance impact—something that’s critical for large document collections. MarkLogic has many customers in US federal systems and enterprise publishing. These industries have stringent requirements for database security and auditing. As a result, MarkLogic has one of the most robust, yet flexible, security models of any NoSQL database. The MarkLogic security model may seem complex at first. But once you under- stand how roles drive security policy, you’ll find you can keep documents secure and still allow reporting tools full access to the database layer. Experienced MarkLogic developers feel that the security model should be designed at an early stage of a project to ensure that the correct access controls are granted to the right users. Careful definition of roles within each project is required to ensure that security policies are correctly enforced. Once the semantics of roles has been clearly defined, implementing the policy is a straightforward process. In addition to the RBAC security model supported by MarkLogic, there are also specialized versions of MarkLogic that allow the creation of collections of highly sensi- tive containers. These containers have additional security features that allow for the storage and auditing of classified documents. MarkLogic also integrates auditing reports with their security model. Auditors can view reports every time elevated security functions are executed by a user or roles are changed. A detailed history of every role change can be generated for each project. These reports show how security policy has been enforced and which users had access to collection content and when. The RBAC security model isn’t the only feature that MarkLogic has implemented to meet the security demands of its customers. 
Other security-related features include tamper-resistance of cryptography and network libraries, extensive auditing tools and reports, and third-party auditing of security libraries. Each of these features becomes more important as your NoSQL database is used by a larger community of security conscious users. 11.6 Summary In this chapter, you’ve learned that, for simple applications, NoSQL databases have minimal security requirements. As the complexity of your applications increases, your</p><p> www.it-ebooks.info Further reading 253</p><p> security requirements grow until you reach the need for enterprise-grade security within your NoSQL database. You’ve also learned that by using a service-oriented architecture, you can mini- mize the need for in-database security. Service-driven NoSQL databases have lower in- database security requirements and provide specialized data services that can be reused at the application level. Early implementations of NoSQL databases focused on new architectures that had strong scale-out performance. Security wasn’t a primary concern. In the case studies, we’ve shown that there are now several NoSQL databases that have flexible security models for key-value stores, column family stores, and document stores. We’re confi- dent that other NoSQL databases will include additional security features as they mature. So far, we’ve covered many concepts. We’ve given you visuals, examples, and case studies to enhance learning. In our last chapter, we’ll pull it all together to see how these concepts can be applied in a database selection process. 11.7 Further reading  Apache Accumulo. http://accumulo.apache.org/.  ———“Apache Accumulo User Manual: Security.” http://mng.bz/o4s7.  ———“Apache Accumulo Visibility, Authorizations, and Permissions Example.” http://mng.bz/vP4N.  AWS. “Amazon Simple Storage Service (S3) Documentation.” http:// aws.amazon.com/documentation/s3/.  “Discretionary access control.” Wikipedia. http://mng.bz/YB0o.  GitHub. ml-rbac-example. https://github.com/ableasdale/ml-rbac-example.  “Health Information Technology for Economic and Clinical Health Act.” Wikipe- dia. http://mng.bz/R8f6.  “Lightweight Directory Access Protocol.” Wikipedia. http://mng.bz/3l24.  MarkLogic. “Common Criteria Evaluation and Validation Scheme Validation Report.” July 2010. http://mng.bz/Y73g.  ———“Security Entities.” Administrator’s Guide. http://docs.marklogic.com/ guide/admin/security.  ———“MarkLogic Server Earns Common Criteria Security Certification.” August 2010. http://mng.bz/ngJI.  National Council for Prescription Drug Programs Forum. http://www.ncpdp.org/ standards.aspx.  “Network File System.” NFSv4. Wikipedia. http://mng.bz/11p9.  NIAP CCEVS—http://www.niap-ccevs.org/st/vid10306/.  “Role-based access control.” Wikipedia. http://mng.bz/idZ7.  “The Health Insurance Portability and Accountability Act.” US Department of Health & Human Services. http://www.hhs.gov/ocr/privacy/.  W3C. “XML Signature Syntax and Processing.” June 2008. http://www.w3.org/TR/ xmldsig-core.</p><p> www.it-ebooks.info Selecting the right NoSQL solution</p><p>This chapter covers</p><p> Team dynamics in database architecture selection  The architectural trade-off process  Analysis through decomposition  Communicating results  Quality trees  The Goldilocks pilot</p><p>Marry your architecture in haste, and you can repent in leisure. 
—Barry Boehm (from Evaluating Software Architectures: Methods and Case Studies, by Clements et al.)</p><p>If you’ve ever shopped for a car, you know it’s a struggle to decide which car is right for you. You want a car that’s not too expensive, has great acceleration, can seat four people (plus camping gear), and gets great gas mileage. You realize that no one car has it all and each car has things you like and don’t like. It’s your job to</p><p>254</p><p> www.it-ebooks.info What is architecture trade-off analysis? 255</p><p> figure out which features you really want and how to weigh each feature to help you make the final decision. To find the best car for you, it’s important to first understand which features are the most important to you. Once you know that, you can prioritize your requirements, check the car’s specifications, and objectively balance trade-offs. Selecting the right database architecture for your business problem is similar to purchasing a car. You must first understand the requirements and then rank how important each requirement is to the project. Next, you’ll look at the available data- base options and objectively measure how each of your requirements will perform in each database option. Once you’ve scored how each database performs, you can tally the results and weigh the most important criteria accordingly. Seems simple, right? Unfortunately, things aren’t as straightforward as they seem; there are usually com- plications. First, all team stakeholders might not agree on project priorities or their relative importance. Next, the team assigned to scoring each NoSQL database might not have hands-on experience with a particular database; they might only be familiar with RDBMSs. To complicate matters, there are often multiple ways to recombine com- ponents to build a solution. The ability to move functions between the application layer and the database layer make comparing alternatives even more challenging. Despite the challenges, the fate of many projects and companies can depend on the right architectural decision. If you pick a solution that’s a good fit for the prob- lem, your project can be easier to implement and your company can gain a competi- tive advantage in the market. You need an objective process to make the right decision. In this chapter, we’ll use a structured process called architecture trade-off analysis to find the right database architecture for your project. We’ll walk through the process of collecting the right information and creating an objective architecture-ranking sys- tem. After reading this chapter, you’ll understand the basic steps used to objectively analyze the benefits of various database architectures and how to build quality trees and present your results to project stakeholders. 12.1 What is architecture trade-off analysis? Architecture trade-off analysis is the process of objectively selecting a database archi- tecture that’s the best fit for a business problem. A high-level description of this pro- cess is shown in figure 12.1. The qualities of software applications are driven by many factors. Clear require- ments, trained developers, good user interfaces, and detailed testing (both functional and stress testing) will continue to be critical to successful software projects. Unfortu- nately, none of that will matter if your underlying database architecture is the wrong architecture. 
You can add more staff to the testing and development teams, but chang- ing an architecture once the project is underway can be costly and result in significant delays. For many organizations, selecting the right database architecture can result in mil- lions of dollars in cost savings. For other organizations, selecting the wrong database</p><p> www.it-ebooks.info 256 CHAPTER 12 Selecting the right NoSQL solution</p><p>Select top four architectures</p><p>Find Score effort Gather all Total effort architecturally for each business for each significant requirement for requirements requirements each architecture architecture</p><p>Figure 12.1 The database architecture selection process. This diagram shows the process flow of selecting the right database for your business problem. Start by gathering business requirements and isolating the architecturally significant items. Then test the amount of effort required for each of the top alternative architectures. From this you can derive an objective ranking for the total effort of each architecture.</p><p> architecture could mean they’ll no longer be in business in a few years. Today, busi- ness stakes are high, and as the number of new data sources increases, the stakes will increase proportionately. Some people think of architecture trade-off analysis as an insurance package. If they have strong exposure to many NoSQL databases, senior architects on a selection team may have an intuitive sense of what the right database architecture might be for a new application. But doing your analysis homework will not only create assurance that the team is right, it’ll help everyone understand why the fit is good. An architecture trade-off analysis process can be completed in a few weeks and should be done at the beginning of your project. The artifacts created by the process will be reused in later phases of the project. The overall costs are low and the benefits of a good architectural match are high. Selecting the right architecture should be done before you start looking at various products, vendors, and hosting models. Each product, be it open source or commer- cial license, will add an additional layer of complexity as it introduces the variables of price, long-term viability of the vendor, and hosting costs. The top database architec- tures will be around for a long time, and this work won’t need to be redone in the short term. We think that selecting an architecture before introducing products and vendors is the best approach. There are many benefits to doing a formal architecture trade-off analysis. Some of the most commonly cited benefits are</p><p> Better understanding of requirements and priorities  Better documentation on requirements and use cases  Better understanding of project goals, trade-offs, and risks  Better communication among project stakeholders and shared understanding of concerns of other stakeholders  Higher credibility of team decisions These benefits go beyond the decision of what product and what hosting model an organization will use. The documentation produced during this process can be used</p><p> www.it-ebooks.info Team dynamics of database architecture selection 257</p><p> throughout the project lifecycle. Not only will your team have a detailed list of critical success factors, but these items will be logically categorized and prioritized. Many government agencies are required by law to get multiple bids on software sys- tems that exceed a specific cost amount. 
The processes outlined in this chapter can be used to create a formal request for proposal (RFP) which can be sent to potential vendors and integrators. Vendors that respond to RFPs in writing are legally obligated to state whether they satisfy requirements. This gives the buyer control of many aspects of a purchase that they wouldn’t normally have. Let’s now review the composition and structure of a NoSQL database architecture selection team. 12.2 Team dynamics of database architecture selection Your goal in an architecture selection project is to select the database architecture that best fits your business problem. To do this, you should use a process that’s objec- tive, fair, and has a high degree of credibility. It would also be ideal if, in the process, you can build consensus with all stakeholders so that when the project is complete, everyone will support the decision and work hard to make the project successful. To achieve this, it’s important to take multiple perspectives into account and weigh the requirements appropriately; see figure 12.2. </p><p>Make it easy to Make it easy to use and extend. create and maintain code.</p><p>Business unit Developers</p><p>Provide a long-term Make it easy to competitive advantage. monitor and scale. Architecture selection team</p><p>Marketing Operations</p><p>Figure 12.2 The architecture selection team should take into account many different perspectives. The business unit wants a system that’s easy to use and to extend. Developers want a system that’s easy to build and maintain. Operations wants a database that can be easily monitored and scaled by adding new nodes to a cluster. Marketing staff want to have a system that gives them a long-term competitive advantage over other companies.</p><p> www.it-ebooks.info 258 CHAPTER 12 Selecting the right NoSQL solution</p><p>12.2.1 Selecting the right team One of the most important things to consider before you begin your analysis is who will be involved in making the decision. It’s important to have the right people and keep the size of the team to a minimum. A team of more than five people is unwieldy, and scheduling a meeting with too many calendars is a nightmare. The key is that the team should fairly represent the concerns of each stakeholder and weigh those con- cerns appropriately. If you have a formal voting process to rank feature priorities, it’s good to have an odd number of people on the team or have one person designated as the tie-breaker vote. Here’s a short list of the key questions to ask about the team makeup:</p><p> Will the team fairly represent all the diverse groups of stakeholders?  Are the team members familiar with the architecture trade-off process?  Does each member have adequate background, time, and interest to do a good job?  Are team members committed to an objective analysis process? Will they put the goals of the project ahead of their own personal goals?  Do the team members have the skills to communicate the results to each of the stakeholders? If the architecture selection process is new to some team members, you’ll want to take some time initially to get everyone up to speed. If done well, this can be a positive shared learning experience for new members of the selection team. These steps are also the initial phase of agreeing, not only on the critical success factors of the project, but the relative priority of each feature of the system. 
Building consensus in the early phases of a project is key to getting buy-in from the diverse community of people that will fund and support ongoing project operations. Getting your architecture selection team in alignment and even creating enthusias- tic support about your decision involve more than technical decisions. Experience has shown that the early success of a project is part organizational psychology, part com- munication, and part architectural analysis. Securing executive support, a senior proj- ect manager, and open-minded head of operations are all factors that will contribute to the ultimate success of the pilot project. One of the worst possible outcomes in a selection process is selecting a database that one set of stakeholders likes but another group hates. A well-run selection pro- cess and good communication can help everyone realize there’s no single solution that meets everyone’s needs. Compromises must be made, risks identified, and plans put in place to mitigate risk. Project managers need to determine whether team mem- bers are truly committed or using passive-aggressive behavior to undermine the proj- ect. Adopting new technologies is difficult even when all team members are committed to the decision. If there’s division, the process will only be more difficult. Keeping your database selection objective is one of the most difficult parts of any database selection process. As you’ll see, there are some things, such as experience</p><p> www.it-ebooks.info Team dynamics of database architecture selection 259</p><p> bias and using outside consultants, that impact the process. But if you keep these in mind, you can make adjustments and still make the selection process neutral. </p><p>12.2.2 Accounting for experience bias When you perform an architecture trade-off analysis, you must bear in mind that each person involved has their own set of interests and biases. If members of your project team have experience with a particular technology, they’ll naturally attempt to map the current problem to a solution they’re familiar with. New problems will be applied to the existing pattern-matching circuits in their brains without conscious thought. This doesn’t mean the individual is putting self-interest above the goals of the project; it’s human nature. If you have people who are good at what they do and have been doing it for a long time, they’ll attempt to use the skills and experience they’ve used in previous projects. People with these attributes may be most comfortable with existing technologies and have a difficult time objectively matching the current business problems to an unfa- miliar technology. To have a credible recommendation, all team members must com- mit to putting their personal skills and perspectives in the context of the needs of the organization. This doesn’t imply that existing staff or current technologies shouldn’t be consid- ered. People on your architecture trade-off team must create evaluations that will be weighted in ways that put the needs of the organization before their personal skills and experience. </p><p>12.2.3 Using outside consultants Something each architecture selection team should consider is whether the team should include external consultants. If so, what role should they play? Consultants who specialize in database architecture selection may be familiar with the strengths and weaknesses of multiple database architectures. 
But there’s a good chance they won’t be familiar with your industry, your organization, or your internal systems. The cost-effectiveness of these consultants is driven by how quickly they can understand requirements. External consultants can come up to speed quickly if you have well-written detailed requirements. Having well-written system requirements and a good glossary of busi- ness terms that explain internal terminology, usage, and acronyms can lower database selection costs and increase the objectivity of database selection. This brings an addi- tional level of assurance for your stakeholders. High-quality written requirements not only allow new team members to come up to speed, they can also be used downstream when building the application. In the end, you need someone on the team who knows how each of these architectures works. If your internal staff lacks this experience, then an outside consultant should be considered. With your team in place, you’re ready to start looking at the trade-off analysis process. </p><p> www.it-ebooks.info 260 CHAPTER 12 Selecting the right NoSQL solution</p><p>12.3 Steps in architectural trade-off analysis Now that you’ve assembled an architectural selection team who’ll be objective and represent the perspectives of various stakeholders, you’re ready to begin the formal architectural trade-off process. Here are the typical steps used in this process:</p><p>1 Introduce the process—To begin, it’s important to provide each team member with a clear explanation of the architecture trade-off analysis process and why the group is using it. From there the team should agree on the makeup of the team, the decision-making process, and the outcomes. The team should know that the method has been around for a long time and has a proven track record of positive results if done correctly. 2 Gather requirements—Next, gather as many requirements as is practical and put them in a central structure that can be searched and reported on. Require- ments are a classic example of semistructured data, since they contain both structured fields and narrative text. Organizations that don’t use a require- ments database usually put their requirements into MS Word or Excel spread- sheets, which makes them difficult to manage. 3 Select architecturally significant requirements—After requirements are gathered, you should review them and choose a subset of the requirements that will drive the architecture. The process of filtering out the essential requirements that drive an architecture is somewhat complex and should be done by team members who have experience with the process. Sometimes a small requirement can require a big change in architecture. The exact number of architecturally sig- nificant requirements used depends on the project, but a good guideline is at least 10 and not more than 20. 4 Select NoSQL architectures—Select the NoSQL architectures you want to consider. The most likely candidates would include a standard RDBMS, an OLAP system, a key-value store, a column family store, a graph database, and a document data- base. At this point, it’s important not to dive into specific products or imple- mentations, but rather to understand the architectural fit with the current problem. In many cases, you’ll find that some obvious architectures aren’t appropriate and can be eliminated. For example, you can eliminate an OLAP implementation if you need to perform transaction updates. 
You can also elimi- nate a key-value store if you need to perform queries on the values of a key- value pair. This doesn’t mean you can’t include these architectures in hybrid systems; it means that they can’t solve the problem on their own. 5 Create use cases for key requirements—Use cases are narrative documents that describe how people or agents will interact with your system. They’re written by subject matter experts or business analysts who understand the business context. Use cases should provide enough detail so that an effort analysis can be deter- mined. They can be simple statements or multipage documents, depending on the size of the project and detail necessary. Many use cases are structured</p><p> www.it-ebooks.info Steps in architectural trade-off analysis 261</p><p> around the lifecycle of your data. You might have one use case for adding new records, one for listing records, one for searching, and one for exporting data, for example. 6 Estimate effort level for each use case for each architecture—For each use case, you’ll determine a rough estimate of the level of effort that’s required and apply a scoring system, such as 1 for difficult and 5 for easy. As you determine your effort, you’ll place the appropriate number for each use case into a spread- sheet, as shown in figure 12.3. 7 Use weighted scores to rank each architecture—In this stage, you’ll combine the effort level with some type of weight to create a single score for each architecture. Items that are critical to the success of the project and that are easy to imple- ment will get the highest scores. Items that are lower in priority and not easy to implement will get lower scores. By adding up the weighted scores, as shown in figure 12.4, you’ll generate a composite score that can be used to compare each architecture. In the first pass at weighting, effort and estimation may be rough. You can start with a simple scale of High, Medium, and Low. As you become more com- fortable with the results, you can add a finer scale of 1–5, using a higher num- ber for lower effort. How each use case is weighted against the others should also help your group build consensus on the relative importance of each feature. Use cases can also be used to understand risk factors of the project. Features marked as critical will need special attention by project managers doing project risk assessment. 8 Document results—Each step in the architecture trade-off process creates a set of documents that can be combined into a report and distributed to your stake- holders. The report will contain starting points for the information you need to communicate to your stakeholders. These documents can be shared in many forms, such as a report-driven website, MS Word documents, spreadsheets and</p><p>Figure 12.3 A sample architecture trade-off score card for a specific project with categorized use cases. For simplicity, all of the use cases have the same weighted value.</p><p> www.it-ebooks.info 262 CHAPTER 12 Selecting the right NoSQL solution</p><p> slide presentations, colorful wall-sized posters, and quality trees, which we’ll review later in this chapter. 9 Communicate results—Once you have your documentation in order, it’s time to present the results. This needs to be done in a way that will focus on building credibility for your process team. Sending a 100-page report as an email attach- ment isn’t going to generate excitement. 
But if you present your case in an interactive way, you can create a sense of urgency using vocabulary the audience understands. The communication job isn't done until you've ensured that the stakeholders understand the key points you want to communicate. Creating an online quiz will usually turn people off. You want to create two-way dialogs that verify the information has been transmitted and understood so the stakeholders can move on to the next stage of the project. An example of a weighted scorecard is shown in figure 12.4.

Figure 12.4 A weighted scorecard for a software selection process for a statute management system. The grid shows the ease of performing a task for four alternative architectures. A score of 0–4 is used, with 4 indicating the lowest effort. The most critical features have the highest weight in the total score.

We want to make it clear that we think selecting an architecture before selecting a specific NoSQL product is the preferred method. We've been involved in situations where a team is trying to decide between two products that use different architectures. The team must then combine both architectural decisions and vendor-specific issues in one session. This prevents the team from understanding the real underlying architectural fit issues.

It might be interesting to compare the steps in the trade-off analysis process with a similar process developed by the Software Engineering Institute (SEI) at Carnegie Mellon. A high-level description of the Architecture Trade-off Analysis Method (ATAM) process developed by Carnegie Mellon is shown in figure 12.5. Originally, the ATAM process was designed to analyze the strengths and weaknesses of a single architecture and highlight project risk. It does this by identifying the high-level requirements of a system and determining how easily the critical requirements fit into a given architecture. We'll slightly modify the ATAM process to compare different NoSQL architectures.

Figure 12.5 High-level information flow for the SEI ATAM process: business drivers, quality attributes, and user stories feed the analysis from one side, while the architecture plan, architectural approaches, and architectural decisions feed it from the other; the analysis distills trade-offs, sensitivity points, impacts, risks, non-risks, and risk themes. Although this process isn't specifically targeted at NoSQL database selection, it shares many of the same processes that we recommend in this chapter. Business requirements (driven by marketing) and architectural alternatives (driven by your knowledge of the NoSQL options available) have separate paths into the analysis process. The outcome of this analysis is a set of documents that shows the strengths and weaknesses of one architecture.

12.4 Analysis through decomposition: quality trees
The heart of an objective evaluation is to break a large application into smaller discrete processes and determine the level of effort each architecture demands to meet the system requirements. This is a standard divide-and-conquer decomposition analysis process. To use decomposition, you continue to break large pieces of functionality into smaller components until you understand and can communicate the relative strengths and weaknesses of each architecture for the critical areas of your application.
Although there are several ways you can use decomposition, the best practice is to break up the overall functionality of an application into small stories that describe how a user will interact with the system. We call these the system's scenarios or use cases. Each process is documented in a story that describes how actors will interact with a system. This can be as simple as "loading data" or as complex as "upgrading software with new releases." For each process, you'll compare how easy or difficult it is to perform this task on each of the systems under consideration. Using a simple category (easy, medium, hard) or a 1–5 ranking is a good way to get started.

One of the keys to objective comparison is to group requirements into logical hierarchies. Each upper node within a hierarchy will have a label. Your goal is to create a hierarchy that has no more than 10 top-tier branches with meaningful labels.

Once you've selected an architecture, you're ready to build a quality tree. Quality trees are hierarchies of quality attributes that determine how well your specific situation fits with the system you're selecting. These are sometimes called the -ilities, as a shorthand for properties such as scalability, searchability, availability, agility, maintainability, portability, and supportability that we've discussed.

12.4.1 Sample quality attributes

Now that you have an idea of what quality trees are and how they're used to classify system requirements, let's take a look at some sample qualities that you might consider in your NoSQL database selection.

- Scalability—Will you get an incremental performance gain when you add additional nodes to a cluster? Some people call this linear or horizontal scalability. How difficult is it to add or remove nodes? What's the operational overhead when you add or remove nodes from a cluster? Note that read and write scalability can have different answers to these questions. We try to avoid using the word performance in our architectural quality trees, as it means different things to different people. Performance can mean the number of transactions per second in an RDBMS, the number of reads or writes per second in a key-value store, or the time to display a total count in an OLAP system. Performance is usually tied to a specific test with a specific benchmark. You're free to use it as a top-level category as long as you're absolutely clear that everyone on the team will use the same definition. When in doubt, use the most specific word you can find. One good example of how difficult it is to measure performance is the degree to which data will be kept in a caching layer of a NoSQL database. A benchmark that selects distinct data may run slowly on one system, but it may not represent the real-world query distribution of your application. We covered many issues related to horizontal scale-out in chapter 6 on big data, and the issues of referential transparency and caching in chapter 10.

- Availability—Will the system keep running if one or more components fail? Some groups call this attribute reliability, but we prefer availability. We covered high availability in chapter 8, including the subtleties of measuring performance and the pros and cons of different distribution models.
In chapter 2 and in other case studies, we also discussed how, when network errors occur, many NoSQL systems allow you to make intelligent choices about how services should deal with partition tolerance.

- Supportability—Can your data center staff monitor the performance of the system and proactively fix problems before they impact a customer? Are there high-quality monitoring tools that show the status of resources like RAM cache, input and output queries, and faults on each of the nodes in a cluster? Are there data feeds that provide information that conforms to monitoring standards such as SNMP? Note that supportability is related to both scalability and availability. A system that automatically rebalances data in a cluster when a node fails will get a higher score for scalability and availability as well as for supportability.

- Portability—Can you write code for one platform and port it to another platform without significant change? This is one area where many NoSQL systems don't perform well. Many NoSQL databases have interfaces and query languages that are unique to their product, and porting applications can be difficult. Here SQL databases have an advantage over NoSQL systems that require database developers to use proprietary interfaces. One method you can use to gain portability is to build database-independent wrapper libraries around your database-access functions; porting to a new database then means implementing a new version of your wrapper library for each database (see the sketch after this list). As we discussed in chapter 4, building wrappers can be easy when the APIs are simple, such as GET, PUT, and DELETE. Adding a wrapper layer for increased portability becomes more expensive when there are many complex API calls to the database.

- Sustainability—Will the developers and vendors continue to support this product, or this version of the product, in the future? Is the data model shared by other NoSQL systems? How viable are the organizations supporting the software? If the software is open source, can you hire developers to fix bugs or create extensions? The way that document fragments are addressed using path expressions is a universal data-access pattern common to all document stores; path expressions will be around for a long time. In contrast, there are many new key-value stores that store structures within values and have their own way of accessing this data that's unique to that system. It's difficult to predict whether these will become future standards, so there's higher risk in these areas. Note that the issues of portability and sustainability are tied together. If your application uses a standard query language, it'll be more portable, so the concerns about sustainability are lower. If your NoSQL solution uses a query language that's unique to that product, you're locked in to a single solution.

- Security—Can you provide the appropriate access controls so that only authorized users can view the appropriate elements? What level of granularity does the database support? The key question we consider is whether security should be placed within the database layer itself or whether there are methods of keeping security at the application layer.

- Searchability—Can you quickly find the items you're looking for in your database?
If you have large amounts of full text, can you search by using a keyword</p><p> www.it-ebooks.info 266 CHAPTER 12 Selecting the right NoSQL solution</p><p> or phrase? Can you rank search results according to how closely a record matches your keyword search? This topic was covered in chapter 7.  Agility—How quickly can the software be modified to meet changing business requirements? We prefer the term agility over modifiability, since agility is associ- ated more closely with a business objective, but we’ve seen both terms in use. This topic was covered in chapter 9.</p><p>Beware of vendor-funded performance benchmarks and claims of high availability Several years ago, one of us was working on a database selection project when a team member produced a report which claimed that vendor A had a tenfold perfor- mance advantage over the other vendors. It took our team several days of research to find that the benchmark was created by an “independent” company that was paid by vendor A. This so-called independent company selected samples and tuned the configuration of the funding vendor to be optimized. No other databases were tuned for the benchmark. Most external database performance benchmarks are targeting generic read, write, or search metrics. They won’t be tuned to the needs of a specific project and as such aren’t useful for making valid comparisons. We avoid using external benchmarks in comparing performance. If you’re going to do performance bake-offs, you must make sure that each database is configured appro- priately and that experts from each vendor are given ample time to tune both the hard- ware and software to your workload. Make sure the performance benchmark will precisely match the data types you’ll use and use your predicted peak loads for reads, writes, updates, and deletes. Also make sure that your benchmark includes repeated read queries so that the impact of caching can be accounted for. Vendor claims of high availability should also be backed up by visits to other custom- ers’ sites that can document service levels. If something sounds too good to be true, it probably is. Any vendor with astounding claims should back up the claims with astounding evidence. </p><p>12.4.2 Evaluating hybrid and cloud architectures It’s usually the case that no single database product will meet all the needs of your application. Even full NoSQL solutions that leverage other tools such as search and indexing libraries may have gaps in functionality. For items like image management and read-mostly BLOB storage, you might want to blend a key-value store with another database. This frequently requires the architectural assessment team to carefully parti- tion the business problem into areas of concern that need to be grouped together. One of the major themes of this book has been the role of simple, modular, data- base components that can snap together like Lego blocks. This modular architecture makes it easier to customize service levels for specific parts of your application. But it also makes it difficult to do an apples-to-apples comparison. Running a single</p><p> www.it-ebooks.info Communicating the results to stakeholders 267</p><p> database cluster might be operationally more efficient than running one cluster for data, one cluster for search, and one cluster for key-value stores. This is why in many of our evaluations we include a separate column for a hybrid solution that uses different components for different aspects of an application. 
Beware of vendor-funded performance benchmarks and claims of high availability

Several years ago, one of us was working on a database selection project when a team member produced a report claiming that vendor A had a tenfold performance advantage over the other vendors. It took our team several days of research to find that the benchmark was created by an "independent" company that was paid by vendor A. This so-called independent company hand-picked the sample data and tuned the funding vendor's configuration; no other databases were tuned for the benchmark.

Most external database performance benchmarks target generic read, write, or search metrics. They won't be tuned to the needs of a specific project and as such aren't useful for making valid comparisons. We avoid using external benchmarks when comparing performance.

If you're going to do performance bake-offs, you must make sure that each database is configured appropriately and that experts from each vendor are given ample time to tune both the hardware and software to your workload. Make sure the performance benchmark precisely matches the data types you'll use, and use your predicted peak loads for reads, writes, updates, and deletes. Also make sure that your benchmark includes repeated read queries so that the impact of caching can be accounted for.

Vendor claims of high availability should also be backed up by visits to other customers' sites that can document service levels. If something sounds too good to be true, it probably is. Any vendor with astounding claims should back up the claims with astounding evidence.

12.4.2 Evaluating hybrid and cloud architectures

It's usually the case that no single database product will meet all the needs of your application. Even full NoSQL solutions that leverage other tools such as search and indexing libraries may have gaps in functionality. For items like image management and read-mostly BLOB storage, you might want to blend a key-value store with another database. This frequently requires the architectural assessment team to carefully partition the business problem into areas of concern that need to be grouped together.

One of the major themes of this book has been the role of simple, modular database components that can snap together like Lego blocks. This modular architecture makes it easier to customize service levels for specific parts of your application. But it also makes it difficult to do an apples-to-apples comparison. Running a single database cluster might be operationally more efficient than running one cluster for data, one cluster for search, and one cluster for key-value stores. This is why in many of our evaluations we include a separate column for a hybrid solution that uses different components for different aspects of an application. Supporting multiple clusters allows the team to use specialized databases that can be tuned for different tasks, but each cluster needs its own operational budget extending out over multiple years.

A best practice is to use cloud-based services whenever possible. These services can reduce your operational overhead costs and, in some cases, make them more predictable. Predicting the costs of cloud-based NoSQL services can still be complicated, since the service's charges are based on many factors, such as disk space used, the number of bytes of input and output, and the number of transactions; the sketch at the end of this section illustrates a simple cost model. There are times when cloud computing shouldn't be used—say, if your business problem requires data to be constantly moved between a private data center and the cloud. Storing all static resources on a public cloud works well when your users are based on the internet. This means that the I/O of users hitting your local site will be much lower. Since I/O is sometimes a key limitation for a website, removing static data from the site can be a great way to lower overall cost.
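Because several pricing dimensions interact, even a crude cost model can make a cloud estimate more predictable. The sketch below is purely illustrative: the rates are invented placeholders rather than any provider's actual prices, and real services add further dimensions such as provisioned throughput and inter-region transfer.

```python
# Toy monthly cost model for a cloud-hosted NoSQL service (all rates are assumed).
def monthly_cost(storage_gb, transfer_gb, requests_millions,
                 storage_rate=0.10,    # $ per GB stored per month (assumption)
                 transfer_rate=0.09,   # $ per GB transferred out (assumption)
                 request_rate=0.40):   # $ per million requests (assumption)
    return (storage_gb * storage_rate
            + transfer_gb * transfer_rate
            + requests_millions * request_rate)

# A quiet month versus a spiky month shows how variable the bill can be.
print(monthly_cost(storage_gb=500, transfer_gb=200, requests_millions=50))
print(monthly_cost(storage_gb=500, transfer_gb=900, requests_millions=220))
```

Plugging in your own measured volumes, and the real price sheet of the service you're considering, turns this into a quick sanity check for the multiyear operational budget discussed above.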
As you complete your analysis, you begin thinking about the best way to communicate your findings and recommendations to your team and stakeholders and move forward with the project.

12.5 Communicating the results to stakeholders

To move your NoSQL pilot project forward, you need to present a compelling argument to your stakeholders. Frequently, those who make this important decision are also the least technical people in the organization. To receive serious consideration, you must communicate your results and findings clearly, concisely, and in a way everyone understands. Drawing detailed architecture diagrams will only be valuable if your intended audience understands the meaning of all the boxes and lines in the diagram. Many projects do select the best match of a business problem to the right NoSQL architecture, yet they fail to communicate the details of why a decision was made to the right people, and so the endeavor fails. The architecture trade-off analysis process should be integrated with the project communication plan. To create a successful outcome, you need to trade your analyst hat for a marketing hat.

12.5.1 Using quality trees as navigational maps

A goal of this book is to create a vocabulary and a set of tools for helping everyone, including nontechnical staff, to understand the key issues of database selection. One of the most difficult parts of presenting results to your stakeholders is creating a shared navigational map that helps everyone orient themselves to the results of the evaluation. Navigational maps help stakeholders determine whether their concerns have been addressed and how well the solution meets those concerns.

For example, if a group of stakeholders has expressed concerns about using roles to grant users access to data, they should be able to guess that roles are listed in your map in the area labeled Security. From within that part of the map, they will see how role-based access is listed under the authorization area. Although there are multiple ways to display these topic maps, one of the most popular is a tree-like diagram that places the highest category of concern (the -ilities) on the left side of a tree, and then presents increasing levels of detail as you move to the right. We call these structures quality trees. An example of a quality tree is shown in figure 12.6.

This example uses a three-level structure to break down the concerns of the project. Each of the seven qualities can be further broken down into subfeatures. A typical single-page quality tree will usually stop at the third level of detail because of the limitations of reading small fonts on a single page. Software tools can allow interactive sites to drill further down to a requirements database and view use cases for each requirement.

Figure 12.6 Sample quality tree—how various qualities (-ilities) are grouped together and then scored. High-level categories on the left are broken down into lower-level categories. A short text describing the quality is followed by letter codes to show how important a feature is (C=Critical, VH=Very High, H=High, M=Medium, L=Low) and how well the proposed solution meets the requirement. The tree in the figure is rooted at "Solution support," and its leaf requirements, grouped by quality, are:
- Searchability: full-text search on document data (H, H); fast range searching (H, H); score rank via XQuery (M, H)
- XML importability: easy to add new XML data (C, H); batch import tools (M, M)
- Transformability: easy to transform XML data (H, H); transform to HTML or PDF (VH, H); works with web standards (VH, H)
- Affordability: use open source software (H, H); no complex languages to learn (H, H); staff training (H, M)
- Sustainability: use established standards (VH, H); use declarative languages (VH, H); customizable by nonprogrammers (H, H)
- Interoperability: use W3C standards (VH, H); mashups with REST interfaces (VH, H); works with W3C Forms (H, M)
- Security: prevents unauthorized access (H, H); collection-based access (H, M); centralized security policy (H, M)

Quality trees are one method of using graphical presentations to quickly explain the process and findings of your project to stakeholders. Sometimes the right image or metaphor will make all the difference. Let's see how finding the right metaphor can make all the difference in the adoption of new technologies.

12.5.2 Apply your knowledge

Sally is working on a team that has looked at two options to scale out their database. The company provides data services for approximately 100 large corporate customers, who run reports on only their own data. Only a small number of internal reports span multiple customers' files. The team is charged with coming up with an architecture that can scale out and allow the customer base to continue to grow without impacting database performance.

The team agrees on two options for distributing the customer data over a cluster of servers. Option A puts all of a customer's data on a single server. This architecture allows for scale-out, and new database servers can be added as the customer base grows. When new servers are added, they'll use the first letter of the customer's name to divide the data between the servers. Option B distributes customer records evenly across multiple servers using hashing. Any single customer will have their data on every node in the cluster, replicated three times for high availability.
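Before looking at how Sally wins the argument, it helps to see why the two options distribute load so differently. The following sketch is not Sally's code; it simulates both placement schemes with a made-up customer list to show how dividing customers by the first letter of their name can overload some nodes while leaving others idle, whereas hashing spreads the keys evenly.

```python
# Toy comparison of two placement schemes on a 4-node cluster (invented data).
import hashlib
from collections import Counter

customers = ["Acme", "Apex", "Atlas", "Aurora", "Borealis", "Chronos",
             "Quanta", "Zenith"]   # real rosters are rarely uniform across the alphabet
NODES = 4

def node_by_first_letter(name):
    # Option A: divide the alphabet into four contiguous ranges.
    ranges = ["ABCDEF", "GHIJKLM", "NOPQRS", "TUVWXYZ"]
    return next(i for i, letters in enumerate(ranges) if name[0].upper() in letters)

def node_by_hash(name):
    # Option B: hash the customer key and take it modulo the node count.
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % NODES

print("Option A:", Counter(node_by_first_letter(c) for c in customers))
print("Option B:", Counter(node_by_hash(c) for c in customers))
```

With this list, option A piles six of the eight customers onto one node and leaves another completely empty; hashing has no such bias toward the shape of the names, and with three replicas of each record every node can also take part in running a long report.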
Sally's team is concerned that some stakeholders might not understand the trade-offs and believe that storing all of a customer's data on a single node with replication might be a better option. But if they implement option A, half of the nodes in the cluster will be idle and only in use when a failure occurs. The team wants to show everyone how evenly distributing the queries over the cluster will return faster results, even if queries are running on each node in the cluster. Yet after a long discussion, the team can't convince their stakeholders of the advantages of their plan. One of the team members has to leave early. He has to catch a plane and is worried about the long lines at the airport. Suddenly, Sally has an idea.

At the group's next meeting, Sally presents the drawings in figure 12.7 and uses her airport metaphor to describe how most of the nodes in the cluster would remain idle, even when a customer is waiting many hours for long reports to run. By evenly distributing the data over the cluster, the load will be evenly distributed and reports will run on every node until the report finishes. No long lines!

Figure 12.7 Rather than only creating a report that describes how one option provides better distribution of load over a cluster, use graphs and metaphors to describe the key differences. The drawing pairs a CPU utilization chart, in which one customer's long-running report ties up a single server while other servers sit idle, with an airline check-in line in which many people are forced into the longest line even when other agents aren't busy. While many people have seen CPU utilization charts, waiting in line at the airport is a metaphor they've most likely experienced.

Sally's metaphor works, and everyone on the project understands the consequences of uneven distribution of data in a cluster and how costs can be lower and performance improved if the right architecture is used. Though you can create a detailed report of why one scale-out architecture works better than another using bulleted lists, graphics, models, or charts, a metaphor everyone understands gets to the heart of the matter quickly and drives the decision-making process.

As with buying a car, there will seldom be a perfect match for all your needs. Every year the number of specialized databases seems to grow. Not many free, standards-compliant open source systems scale well and store all data types automatically without the need for programmers and operations staff. If they did, we would already be using them. In the real world, we need to make hard choices.

12.5.3 Using quality trees to communicate project risks

As we mentioned, the architectural trade-off process creates documentation that can be used in subsequent phases of the project. A good example of this is how you can use a quality tree to identify and communicate potential project risks. Let's see how you might use this to communicate risk in your project.

A quality tree contains a hierarchy of your requirements grouped into logical branches. Each branch has a high-level label that allows you to drill down into more detailed requirements. Each leaf of the tree has a written requirement and two scores. The first score is how important the feature is to the overall project. The second is how well the architecture or product will implement this feature.

Risk analysis is the process of identifying and analyzing the gaps between how important a feature is and how well an architecture implements the feature. The greater the gap, the greater the project risk. If a database doesn't implement a feature that's a low priority for your project, it may not be a risk. But if a feature is critical to project success and the architecture doesn't support it, then you have a gap in your architecture. Gaps equal project risk, and communicating potential risks and their impact to stakeholders is necessary and important, as shown in figure 12.8.

Figure 12.8 An example of using a quality tree to communicate risk-gap analysis. You can use color, patterns, and symbols to show how gaps in lower-level features will contribute to overall project risk: a solid border indicates that the architecture and product meet the requirement, a dashed border indicates architectural risk, and each branch in the tree inherits the highest level of risk from its subelements. In this example, Portability (code can be ported to other databases: importance M, fit L) and Security (only some roles can update some records: importance H, fit L) are flagged as risks, which helps rank the project's risks.
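The gap calculation itself is mechanical once the tree has scores attached. The sketch below is a simplified illustration rather than a tool from this chapter: the numeric mapping of the letter codes and the sample leaves (borrowed loosely from figures 12.6 and 12.8) are assumptions, but it shows the two ideas that matter: a positive gap between importance and fit marks a risk, and each branch inherits the worst gap of its leaves.

```python
# Simplified sketch of risk-gap analysis over a scored quality tree.
# Letter codes use the (importance, fit) convention from figures 12.6 and 12.8;
# the numeric mapping and the sample leaves are illustrative assumptions.
LEVELS = {"L": 1, "M": 2, "H": 3, "VH": 4, "C": 5}

quality_tree = {
    "Portability": [("Code can be ported to other databases", "M", "L")],
    "Security":    [("Only some roles can update some records", "H", "L"),
                    ("Prevents unauthorized access", "H", "H")],
    "Scalability": [("Add nodes without downtime", "H", "H")],
}

def gap(importance, fit):
    # A positive gap means the requirement matters more than the architecture delivers.
    return LEVELS[importance] - LEVELS[fit]

for branch, leaves in quality_tree.items():
    worst = max(gap(imp, fit) for _, imp, fit in leaves)  # roll the worst leaf up to the branch
    status = "no risk" if worst <= 0 else f"risk (gap {worst})"
    print(f"{branch:12s} {status}")
```

Sorting the branches by their worst gap gives you a first-cut risk register that the project managers doing risk assessment can pick up directly.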
12.6 Finding the right proof-of-architecture pilot project

After you've finished your architecture trade-off analysis and have chosen an architecture, you're ready for the next stage: a proof-of-architecture (POA) pilot project, sometimes called a proof-of-concept project. Selecting the right POA project isn't always easy, and selecting the wrong project can prevent new technologies from gaining acceptance in an organization.

The most critical factor in selecting a NoSQL pilot project is to identify a project with the right properties. In the same way Goldilocks waited to find the item that was "just right," you want to select a pilot project that's a good fit for the NoSQL technology you're recommending. A good POA project looks at:

1 Project duration—Pilot projects should be of a medium duration. If the project can be completed in less than a week, it might be dismissed as trivial. If the project duration is many months, it becomes a target for those opposed to change and new technology. To be effective, the project should be visible long enough to achieve strategic victories without being vulnerable to constantly changing budgets and shifts in strategic direction. Though this may sound overly cautious, introducing new technology into an organization can be a battle between old and new ideas. If you assume that enemies of change are everywhere, you can adjust your plans to ensure victory.
2 Technology transfer—Using an external consultant to build your pilot project without involving your staff prevents internal staff from understanding how the system works and getting some exposure to the new technology. It also prevents your team from finding allies and building bridges of understanding. The best approach is to create teams that pair experienced NoSQL developers with the internal staff who will support and extend the databases.

3 The right types of data—Each NoSQL project involves different types of data. It may be binaries, flat files, or document-oriented data. You must make sure the pilot project's data fits the NoSQL solution you're evaluating.

4 Meaningful and quality data—Some projects are chosen as NoSQL pilots because they have high-variability data. That's the right reason to move from a SQL database, but the data still needs to have strong semantics and data quality. NoSQL isn't a magic bullet that will make bad data good. There are many ways that schema-less systems can ingest data, run statistical analysis on it, and integrate machine learning to analyze it.

5 Clear requirements and success criteria—We've already covered the importance of understandable, written requirements in NoSQL projects. A NoSQL pilot project should also have clearly defined, written success criteria associated with measurable outcomes that the team agrees to.

Finding the right pilot project to introduce a new NoSQL database is critical for a project's successful adoption. A NoSQL pilot project that fails, even if the reason for failure isn't associated with architectural fit, will dampen an organization's willingness to adopt new database technologies in the future. Projects that lack strong executive support and leadership can be difficult, if not impossible, to move forward when problems arise that aren't related to the new architecture. Our experience is that getting high-quality data into a new system is sometimes the biggest barrier to getting users to adopt it. This is often not the fault of the architecture, but rather a symptom of the organization's ability to validate, clean, and manage metadata.

Before you proceed from the architecture selection phase to the pilot phase, you want to stop and think carefully. Sometimes a team is highly motivated to get off older architectures and will take on impossible tasks to prove that new architectures are better. Sometimes the urge to start coding immediately overrides common sense. Our advice is not to accept the first project that's suggested, but to find the right fit before you proceed. Sometimes waiting is the fastest way to success.

12.7 Summary

In this chapter, we've looked at how a formal architecture trade-off process can be used to select the right database architecture for a specific business project. When architectures were few in number, the process was simple and could be done informally by a group of in-house experts. There was no need for a detailed explanation of their decisions. But as the number of NoSQL database options increases, the selection process becomes more complex. You need an objective ranking system that helps you narrow down the options and then compare the trade-offs.

After reading the case studies, we believe with certainty that the NoSQL movement has triggered, and will continue to trigger, dramatic cost reductions in building applications.
But the number of new options makes the process of objectively selecting the right database architecture more difficult. We hope that this book helps guide teams through this important but sometimes complex process and helps save both time and money, increasing your ability to adapt to changing business conditions.

When Charles Darwin visited the Galapagos Islands, he collected different types of birds from many of the islands. After returning to England, he discovered that a single finch had evolved into roughly 15 different species. He noted that the size and shape of the birds' beaks had changed to allow the birds to feed on seeds, cacti, or insects. Each island had different conditions, and in time the birds evolved to fit the requirements.

Seeing this gradation and diversity of structure in one small, intimately related group of birds, one might really fancy that from an original paucity of birds in this archipelago, one species had been taken and modified for different ends. (Charles Darwin, Voyage of the Beagle)

NoSQL databases are like Darwin's finches. New species of NoSQL databases that match the conditions of different types of data continue to evolve. Companies that try to use a single database to process different types of data will in time go the way of the dinosaur. Your task is to match the right types of data to the right NoSQL solutions. If you do this well, you can build organizations that are healthy, insightful, and agile, and that can take advantage of a changing business climate.

For all the goodness that diversity brings, standards are still a must. Standards allow you to reuse tools and training and to leverage prebuilt and preexisting solutions. Metcalfe's Law, which says the value of a network grows roughly with the square of the number of users, applies to NoSQL standards as well as to network protocols. The diversity-standardization dilemma won't go away; it'll continue to play a role in databases for decades to come.

When we write reports for organizations considering NoSQL pilot projects, we imagine Charles Darwin sitting on one side and Robert Metcalfe sitting on the other: two insightful individuals using the underlying patterns in our world to help organizations make the right decision. These decisions are critical; the future of many jobs depends on making them well. We hope this book will guide you to an enlightening and profitable future.

For up-to-date analysis of the current portfolio of SQL and NoSQL architectures, please refer to http://manning.com/mccreary/ or go to http://danmccreary.com/nosql. Good luck!

12.8 Further reading

- Clements, Paul, et al. Evaluating Software Architectures: Methods and Case Studies. Addison-Wesley, 2001.
- "Darwin's finches." Wikipedia. http://mng.bz/2Zpa.
- SEI. "Software Architecture: Architecture Tradeoff Analysis Method." http://mng.bz/54je.
".jpg" : ".webp"); imgEle.src = imgsrc; var $imgLoad = $('<div class="pf" id="pf' + endPage + '"><img src="/loading.gif"></div>'); $('.article-imgview').append($imgLoad); imgEle.addEventListener('load', function () { $imgLoad.find('img').attr('src', imgsrc); pfLoading = false }); if (endPage < 7) { adcall('pf' + endPage); } } }, { passive: true }); </script> <script> var sc_project = 11552861; var sc_invisible = 1; var sc_security = "b956b151"; </script> <script src="https://www.statcounter.com/counter/counter.js" async></script> </html>