Seven Databases in Seven Weeks, Second Edition
Total Page:16
File Type:pdf, Size:1020Kb
Prepared exclusively for Shohei Tanaka Prepared exclusively for Shohei Tanaka What Readers Are Saying About Seven Databases in Seven Weeks, Second Edition Choosing a database is perhaps one of the most important architectural decisions a developer can make. Seven Databases in Seven Weeks provides a fantastic tour of different technologies and makes it easy to add each to your engineering toolbox. ➤ Dave Parfitt Senior Site Reliability Engineer, Mozilla By comparing each database technology to a tool you’d find in any workshop, the authors of Seven Databases in Seven Weeks provide a practical and well-balanced survey of a very diverse and highly varied datastore landscape. Anyone looking to get a handle on the database options available to them as a data platform should read this book and consider the trade-offs presented for each option. ➤ Matthew Oldham Director of Data Architecture, Graphium Health Reading this book felt like some of my best pair-programming experiences. It showed me how to get started, kept me engaged, and encouraged me to experiment on my own. ➤ Jesse Hallett Open Source Mentor This book will really give you an overview of what’s out there so you can choose the best tool for the job. ➤ Jesse Anderson Managing Director, Big Data Institute Prepared exclusively for Shohei Tanaka We've left this page blank to make the page numbers the same in the electronic and paper books. We tried just leaving it out, but then people wrote us to ask about the missing pages. Anyway, Eddy the Gerbil wanted to say “hello.” Prepared exclusively for Shohei Tanaka Seven Databases in Seven Weeks, Second Edition A Guide to Modern Databases and the NoSQL Movement Luc Perkins with Eric Redmond and Jim R. Wilson The Pragmatic Bookshelf Raleigh, North Carolina Prepared exclusively for Shohei Tanaka Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and The Pragmatic Programmers, LLC was aware of a trademark claim, the designations have been printed in initial capital letters or in all capitals. The Pragmatic Starter Kit, The Pragmatic Programmer, Pragmatic Programming, Pragmatic Bookshelf, PragProg and the linking g device are trade- marks of The Pragmatic Programmers, LLC. Every precaution was taken in the preparation of this book. However, the publisher assumes no responsibility for errors or omissions, or for damages that may result from the use of information (including program listings) contained herein. Our Pragmatic books, screencasts, and audio books can help you and your team create better software and have more fun. Visit us at https://pragprog.com. The team that produced this book includes: Publisher: Andy Hunt VP of Operations: Janet Furlow Managing Editor: Brian MacDonald Supervising Editor: Jacquelyn Carter Series Editor: Bruce A. Tate Copy Editor: Nancy Rapoport Indexing: Potomac Indexing, LLC Layout: Gilson Graphics For sales, volume licensing, and support, please contact [email protected]. For international rights, please contact [email protected]. Copyright © 2018 The Pragmatic Programmers, LLC. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher. Printed in the United States of America. ISBN-13: 978-1-68050-253-4 Encoded using the finest acid-free high-entropy binary digits. Book version: P1.0—April 2018 Prepared exclusively for Shohei Tanaka Contents Acknowledgments . vii Preface . ix 1. Introduction . 1 It Starts with a Question 2 The Genres 3 Onward and Upward 8 2. PostgreSQL . 9 That’s Post-greS-Q-L 9 Day 1: Relations, CRUD, and Joins 10 Day 2: Advanced Queries, Code, and Rules 21 Day 3: Full Text and Multidimensions 36 Wrap-Up 50 3. HBase . 53 Introducing HBase 54 Day 1: CRUD and Table Administration 55 Day 2: Working with Big Data 67 Day 3: Taking It to the Cloud 82 Wrap-Up 88 4. MongoDB . 93 Hu(mongo)us 93 Day 1: CRUD and Nesting 94 Day 2: Indexing, Aggregating, Mapreduce 110 Day 3: Replica Sets, Sharding, GeoSpatial, and GridFS 124 Wrap-Up 132 5. CouchDB . 135 Relaxing on the Couch 135 Prepared exclusively for Shohei Tanaka Contents • vi Day 1: CRUD, Fauxton, and cURL Redux 137 Day 2: Creating and Querying Views 145 Day 3: Advanced Views, Changes API, and Replicating Data 158 Wrap-Up 174 6. Neo4J . 177 Neo4j Is Whiteboard Friendly 177 Day 1: Graphs, Cypher, and CRUD 179 Day 2: REST, Indexes, and Algorithms 189 Day 3: Distributed High Availability 202 Wrap-Up 207 7. DynamoDB . 211 DynamoDB: The “Big Easy” of NoSQL 211 Day 1: Let’s Go Shopping! 216 Day 2: Building a Streaming Data Pipeline 233 Day 3: Building an “Internet of Things” System Around DynamoDB 246 Wrap-Up 255 8. Redis . 259 Data Structure Server Store 259 Day 1: CRUD and Datatypes 260 Day 2: Advanced Usage, Distribution 274 Day 3: Playing with Other Databases 289 Wrap-Up 303 9. Wrapping Up . 305 Genres Redux 305 Making a Choice 309 Where Do We Go from Here? 309 A1. Database Overview Tables . 311 A2. The CAP Theorem . 315 Eventual Consistency 316 CAP in the Wild 317 The Latency Trade-Off 317 Bibliography . 319 Index . 321 Prepared exclusively for Shohei Tanaka Acknowledgments A book with the size and scope of this one is never the work of just the authors, even if there are three of them. It requires the effort of many very smart people with superhuman eyes spotting as many mistakes as possible and providing valuable insights into the details of these technologies. We’d like to thank, in no particular order, all of the folks who provided their time and expertise: Dave Parfitt Jerry Sievert Jesse Hallett Matthew Oldham Ben Rady Nick Capito Jesse Anderson Sean Moubry Finally, thanks to Bruce Tate for his experience and guidance. We’d also like to sincerely thank the entire team at the Pragmatic Bookshelf. Thanks for entertaining this audacious project and seeing us through it. We’re especially grateful to our editor, Jackie Carter. Your patient feedback made this book what it is today. Thanks to the whole team who worked so hard to polish this book and find all of our mistakes. For anyone we missed, we hope you’ll accept our apologies. Any omissions were certainly not intentional. From Eric: Dear Noelle, you’re not special; you’re unique, and that’s so much better. Thanks for living through another book. Thanks also to the database creators and committers for providing us something to write about and make a living at. From Luc: First, I have to thank my wonderful family and friends for making my life a charmed one from the very beginning. Second, I have to thank a handful of people who believed in me and gave me a chance in the tech industry at different stages of my career: Lucas Carlson, Marko and Saša Gargenta, Troy Howard, and my co-author Eric Redmond for inviting me on board to Prepared exclusively for Shohei Tanaka report erratum • discuss Acknowledgments • viii prepare the most recent edition of this book. My journey in this industry has changed my life and I thank all of you for crucial breakthroughs. From Jim: First, I want to thank my family: Ruthy, your boundless patience and encouragement have been heartwarming. Emma and Jimmy, you’re two smart cookies, and your daddy loves you always. Also, a special thanks to all the unsung heroes who monitor IRC, message boards, mailing lists, and bug systems ready to help anyone who needs you. Your dedication to open source keeps these projects kicking. Prepared exclusively for Shohei Tanaka report erratum • discuss Preface If we use oil extraction as a metaphor for understanding data in the contem- porary world, then databases flat-out constitute—or are deeply intertwined with—all aspects of the extraction chain, from the fields to the refineries, drills, and pumps. If you want to harness the potential of data—which has perhaps become as vital to our way of life as oil—then you need to understand databases because they are quite simply the most important piece of modern data equipment. Databases are tools, a means to an end. But like any complex tool, databases also harbor their own stories and embody their own ways of looking at the world. The better you understand databases, the more capable you’ll be of tapping into the ever-growing corpus of data at our disposal. That enhanced understanding could lead to anything from undertaking fun side projects to embarking on a career change or starting your own data-driven company. Why a NoSQL Book What exactly does the term NoSQL even mean? Which types of systems does the term include? How will NoSQL impact the practice of making great soft- ware? These were questions we wanted to answer—as much for ourselves as for others. Looking back more than a half-decade later, the rise of NoSQL isn’t so much buzzworthy as it is an accepted fact. You can still read plenty of articles about NoSQL technologies on HackerNews, TechCrunch, or even WIRED, but the tenor of those articles has changed from starry-eyed prophecy (“NoSQL will change everything!”) to more standard reporting (“check out this new Redis feature!”). NoSQL is now a mainstay and a steadily maturing one at that. But don’t write a eulogy for relational databases—the “SQL” in “NoSQL”—just yet. Although NoSQL databases have gained significant traction in the tech- nological landscape, it’s still far too early to declare “traditional” relational database models as dead or even dying.