Eric Redmond, Jim R. Wilson: Seven Databases in Seven Weeks
What Readers Are Saying About Seven Databases in Seven Weeks

The flow is perfect. On Friday, you’ll be up and running with a new database. On Saturday, you’ll see what it’s like under daily use. By Sunday, you’ll have learned a few tricks that might even surprise the experts! And next week, you’ll vault to another database and have fun all over again.
➤ Ian Dees
Coauthor, Using JRuby

Provides a great overview of several key databases that will multiply your data modeling options and skills. Read if you want database envy seven times in a row.
➤ Sean Copenhaver
Lead Code Commodore, backgroundchecks.com

This is by far the best substantive overview of modern databases. Unlike the host of tutorials, blog posts, and documentation I have read, this book taught me why I would want to use each type of database and the ways in which I can use them in a way that made me easily understand and retain the information. It was a pleasure to read.
➤ Loren Sands-Ramshaw
Software Engineer, U.S. Department of Defense

This is one of the best CouchDB introductions I have seen.
➤ Jan Lehnardt
Apache CouchDB Developer and Author

Seven Databases in Seven Weeks is an excellent introduction to all aspects of modern database design and implementation. Even spending a day in each chapter will broaden understanding at all skill levels, from novice to expert: there’s something there for everyone.
➤ Jerry Sievert
Director of Engineering, Daily Insight Group

In an ideal world, the book cover would have been big enough to call this book “Everything you never thought you wanted to know about databases that you can’t possibly live without.” To be fair, Seven Databases in Seven Weeks will probably sell better.
➤ Dr Nic Williams
VP of Technology, Engine Yard

Seven Databases in Seven Weeks
A Guide to Modern Databases and the NoSQL Movement

Eric Redmond
Jim R. Wilson

The Pragmatic Bookshelf
Dallas, Texas • Raleigh, North Carolina

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and The Pragmatic Programmers, LLC was aware of a trademark claim, the designations have been printed in initial capital letters or in all capitals. The Pragmatic Starter Kit, The Pragmatic Programmer, Pragmatic Programming, Pragmatic Bookshelf, PragProg and the linking g device are trademarks of The Pragmatic Programmers, LLC.

Every precaution was taken in the preparation of this book. However, the publisher assumes no responsibility for errors or omissions, or for damages that may result from the use of information (including program listings) contained herein.

Our Pragmatic courses, workshops, and other products can help you and your team create better software and have more fun. For more information, as well as the latest Pragmatic titles, please visit us at http://pragprog.com.

Apache, Apache HBase, Apache CouchDB, HBase, CouchDB, and the HBase and CouchDB logos are trademarks of The Apache Software Foundation. Used with permission. No endorsement by The Apache Software Foundation is implied by the use of these marks.

The team that produced this book includes:
Jackie Carter (editor)
Potomac Indexing, LLC (indexer)
Kim Wimpsett (copyeditor)
David J Kelly (typesetter)
Janet Furlow (producer)
Juliet Benda (rights)
Ellie Callahan (support)

Copyright © 2012 Pragmatic Programmers, LLC. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher.

Printed in the United States of America.
ISBN-13: 978-1-93435-692-0
Encoded using the finest acid-free high-entropy binary digits.
Book version: P1.0, May 2012

Contents

Foreword
Acknowledgments
Preface
1. Introduction
   1.1 It Starts with a Question
   1.2 The Genres
   1.3 Onward and Upward
2. PostgreSQL
   2.1 That’s Post-greS-Q-L
   2.2 Day 1: Relations, CRUD, and Joins
   2.3 Day 2: Advanced Queries, Code, and Rules
   2.4 Day 3: Full-Text and Multidimensions
   2.5 Wrap-Up
3. Riak
   3.1 Riak Loves the Web
   3.2 Day 1: CRUD, Links, and MIMEs
   3.3 Day 2: Mapreduce and Server Clusters
   3.4 Day 3: Resolving Conflicts and Extending Riak
   3.5 Wrap-Up
4. HBase
   4.1 Introducing HBase
   4.2 Day 1: CRUD and Table Administration
   4.3 Day 2: Working with Big Data
   4.4 Day 3: Taking It to the Cloud
   4.5 Wrap-Up
5. MongoDB
   5.1 Hu(mongo)us
   5.2 Day 1: CRUD and Nesting
   5.3 Day 2: Indexing, Grouping, Mapreduce
   5.4 Day 3: Replica Sets, Sharding, GeoSpatial, and GridFS
   5.5 Wrap-Up
6. CouchDB
   6.1 Relaxing on the Couch
   6.2 Day 1: CRUD, Futon, and cURL Redux
   6.3 Day 2: Creating and Querying Views
   6.4 Day 3: Advanced Views, Changes API, and Replicating Data
   6.5 Wrap-Up
7. Neo4J
   7.1 Neo4J Is Whiteboard Friendly
   7.2 Day 1: Graphs, Groovy, and CRUD
   7.3 Day 2: REST, Indexes, and Algorithms
   7.4 Day 3: Distributed High Availability
   7.5 Wrap-Up
8. Redis
   8.1 Data Structure Server Store
   8.2 Day 1: CRUD and Datatypes
   8.3 Day 2: Advanced Usage, Distribution
   8.4 Day 3: Playing with Other Databases
   8.5 Wrap-Up
9. Wrapping Up
   9.1 Genres Redux
   9.2 Making a Choice
   9.3 Where Do We Go from Here?
A1. Database Overview Tables
A2. The CAP Theorem
   A2.1 Eventual Consistency
   A2.2 CAP in the Wild
   A2.3 The Latency Trade-Off
Bibliography
Index

Foreword

Riding up the Beaver Run SuperChair in Breckenridge, Colorado, we wondered where the fresh powder was.
Breckenridge made snow, and the slopes were immaculately groomed, but there was an inevitable sameness to the conditions on the mountain. Without fresh snow, the total experience was lacking.

In 1994, as an employee of IBM’s database development lab in Austin, I had very much the same feeling. I had studied object-oriented databases at the University of Texas at Austin because after a decade of relational dominance, I thought that object-oriented databases had a real chance to take root. Still, the next decade brought more of the same relational models as before. I watched dejectedly as Oracle, IBM, and later the open source solutions led by MySQL spread their branches wide, completely blocking out the sun for any sprouting solutions on the fertile floor below.

Over time, the user interfaces changed from green screens to client-server to Internet-based applications, but the coding of the relational layer stretched out to a relentless barrage of sameness, spanning decades of perfectly competent tedium. So, we waited for the fresh blanket of snow.

And then the fresh powder finally came. At first, the dusting wasn’t even enough to cover this morning’s earliest tracks, but the power of the storm took over, replenishing the landscape and delivering the perfect skiing experience with the diversity and quality that we craved. Just this past year, I woke up to the realization that the database world, too, is covered with a fresh blanket of snow. Sure, the relational databases are there, and you can get a surprisingly rich experience with open source RDBMS software. You can do clustering, full-text search, and even fuzzy searching. But you’re no longer limited to that approach. I have not built a fully relational solution in a year. Over that time, I’ve used a document-based database and a couple of key-value datastores.

The truth is that relational databases no longer have a monopoly on flexibility or even scalability. For the kinds of applications that we build, there are more appropriate models that are simpler, faster, and more reliable. As a person who spent ten years at IBM Austin working on databases with our labs and customers, this development is simply stunning to me.

In Seven Databases in Seven Weeks, you’ll work through examples that cover a beautiful cross section of the most critical advances in the databases that back Internet development. Within key-value stores, you’ll learn about the radically scalable and reliable Riak and the beautiful query mechanisms in Redis. From the columnar database community, you’ll sample the power of HBase, a close cousin of the relational database models. And from the document-oriented database stores, you’ll see the elegant solutions for deeply nested documents in the wildly scalable MongoDB. You’ll also see Neo4J’s spin on graph databases, allowing rapid traversal of relationships.

You won’t have to use all of these databases to be a better programmer or database admin. As Eric Redmond and Jim Wilson take you on this magical tour, every step will make you smarter and lend the kind of insight that is invaluable in a modern software professional. You will know where each platform shines and where it is the most limited. You will see where your industry is moving and learn the forces driving it there. Enjoy the ride.

Bruce Tate
author of Seven Languages in Seven Weeks
Austin, Texas, May 2012

Acknowledgments

A book with the size and scope of this one cannot be done by two mere authors alone.