Neo4j in Action

Aleksa Vukotic and Nicki Watt WITH Tareq Abedrabbo Dominic Fox Jonas Partner FOREWORD BY Jim Webber AND Ian Robinson MANNING Neo4j in Action ALEKSA VUKOTIC NICKI WATT with TAREQ ABEDRABBO DOMINIC FOX JONAS PARTNER MANNING SHELTER ISLAND For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact Special Sales Department Manning Publications Co. 20 Baldwin Road PO Box 761 Shelter Island, NY 11964 Email: [email protected] ©2015 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps. Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine. Manning Publications Co. Development editor: Karen Miller 20 Baldwin Road Technical development editor Gordon Dickens PO Box 761 Copyeditor: Andy Carroll Shelter Island, NY 11964 Proofreader: Elizabeth Martin Technical proofreader: Craig Taverner Typesetter: Dennis Dalinnik Cover designer: Marija Tudor ISBN: 9781617290763 Printed in the United States of America 12345678910–EBM–191817161514 brief contents PART 1INTRODUCTION TO NEO4J ............................................1 1 ■ A case for a Neo4j database 3 2 ■ Data modeling in Neo4j 19 3 ■ Starting development with Neo4j 32 4 ■ The power of traversals 50 5 ■ Indexing the data 64 PART 2APPLICATION DEVELOPMENT WITH NEO4J ...................79 6 ■ Cypher: Neo4j query language 81 7 ■ Transactions 110 8 ■ Traversals in depth 125 9 ■ Spring Data Neo4j 147 PART 3NEO4J IN PRODUCTION.............................................181 10 ■ Neo4j: embedded versus server mode 183 11 ■ Neo4j in production 216 iii contents foreword xi preface xiii acknowledgments xiv about this book xvi about the authors xx about the cover illustration xxii PART 1INTRODUCTION TO NEO4J.................................1 A case for a Neo4j database 3 1 1.1 Why Neo4j? 4 1.2 Graph data in a relational database 5 Querying graph data using MySQL 6 1.3 Graph data in Neo4j 8 Traversing the graph 9 1.4 SQL joins versus graph traversal on a large scale 11 1.5 Graphs around you 14 1.6 Neo4j in NoSQL space 14 Key-value stores 15 ■ Column-family stores 15 Document-oriented databases 16 ■ Graph databases 16 NoSQL categories compared 16 v vi CONTENTS 1.7 Neo4j: the ACID-compliant database 17 1.8 Summary 18 Data modeling in Neo4j 19 2 2.1 What is a data model for Neo4j? 19 Modeling with diagrams: a simple example 20 Modeling with diagrams: a complex example 22 2.2 Domain modeling 24 Entities and properties 24 2.3 Further examples 27 Underground stations example 28 ■ Band members example 29 2.4 Summary 31 Starting development with Neo4j 32 3 3.1 Modeling graph data structures 33 3.2 Using the Neo4j API 36 Creating nodes 36 ■ Creating relationships 39 Adding properties to nodes 40 ■ Node type strategies 42 Adding properties to relationships 44 3.3 Node labels 46 3.4 Summary 48 The power of traversals 50 4 4.1 Traversing using the Neo4j Core Java API 51 Finding the starting node 51 ■ Traversing direct relationships 52 ■ Traversing second-level relationships 54 Memory usage considerations 57 4.2 Traversing using the Neo4j Traversal API 58 Using Neo4j’s built-in traversal constructs 58 Implementing a custom evaluator 60 4.3 Summary 63 Indexing the data 64 5 5.1 Creating the index entry 65 5.2 Finding the user by their email 66 5.3 Dealing with more than one match 68 5.4 Dealing with changes to indexed data 69 CONTENTS vii 5.5 Automatic indexing 70 Schema indexing 70 ■ Auto-indexing 73 5.6 The cost/benefit trade-off of indexing 75 Performance benefit of indexing when querying 76 Performance overhead of indexing when updating and inserting 77 Storing the index 78 5.7 Summary 78 PART 2APPLICATION DEVELOPMENT WITH NEO4J .......79 Cypher: Neo4j query language 81 6 6.1 Introduction to Cypher 82 Cypher primer 82 ■ Executing Cypher queries 84 6.2 Cypher syntax basics 89 Pattern matching 89 ■ Finding the starting node 93 Filtering data 97 ■ Getting the results 98 6.3 Updating your graph with Cypher 101 Creating new graph entities 102 ■ Deleting data 104 Updating node and relationship properties 104 6.4 Advanced Cypher 105 Aggregation 106 ■ Functions 106 ■ Piping using the with clause 108 ■ Cypher compatibility 109 6.5 Summary 109 Transactions 110 7 7.1 Transaction basics 111 Adding in a transaction 112 ■ Finishing what you start and not trying to do too much in one go 113 7.2 Transactions in depth 114 Transaction semantics 115 ■ Reading in a transaction and explicit read locks 117 ■ Writing in a transaction and explicit write locks 118 ■ The danger of deadlocks 120 7.3 Integration with other transaction management systems 121 7.4 Transaction events 122 7.5 Summary 124 viii CONTENTS Traversals in depth 125 8 8.1 Traversal ordering 125 Depth-first 126 ■ Breadth-first 128 ■ Comparing depth-first and breadth-first ordering 130 8.2 Expanding relationships 132 StandardExpander 132 ■ Ordering relationships for expansion 135 ■ Custom expanders 136 8.3 Managing uniqueness 138 NODE_GLOBAL uniqueness 138 ■ NODE_PATH uniqueness 140 ■ Other uniqueness types 142 8.4 Bidirectional traversals 143 8.5 Summary 145 Spring Data Neo4j 147 9 9.1 Where does SDN fit in? 148 What is Spring and how is SDN related to it? 149 What is SDN good for (and not good for)? 150 Where to get SDN 150 ■ Where to get more information 151 9.2 Modeling with SDN 151 Initial POJO domain modeling 151 ■ Annotating the domain model 154 ■ Modeling node entities 155 Modeling relationship entities 158 ■ Modeling relationships between node entities 160 9.3 Accessing and persisting entities 163 Supporting Spring configuration 163 ■ Neo4jTemplate class 163 ■ Repositories 165 ■ Other options 168 9.4 Object-graph mapping options 169 Simple mapping 169 ■ Advanced mapping based on AspectJ 173 ■ Object mapping summary 175 9.5 Performing queries and traversals 176 Annotated queries 177 ■ Dynamically derived queries 178 Traversals 180 9.6 Summary 180 CONTENTS ix PART 3NEO4J IN PRODUCTION .................................181 Neo4j: embedded versus server mode 183 10 10.1 Usage modes overview 184 10.2 Embedded mode 186 Core Java integration 186 ■ Other JVM-based integration 189 10.3 Server mode 190 Neo4j server overview 191 ■ Using the fine-grained Neo4j server REST API 192 ■ Using the Cypher Neo4j server REST API endpoint 194 ■ Using a remote client library to help access the Neo4j server 196 ■ Server plugins and unmanaged extensions 197 10.4 Weighing the options 198 Architectural considerations 199 ■ Performance considerations 201 ■ Other considerations 204 10.5 Getting the most out of the server mode 205 Avoid fine-grained operations 206 ■ Using Cypher 207 Server plugins 209 ■ Unmanaged extensions 212 Streaming REST API 214 10.6 Summary 215 Neo4j in production 216 11 11.1 High-level Neo4j architecture 217 Setting the scene ... 218 ■ Disks 218 ■ Store files 220 Neo4j caches 222 ■ Transaction logs and recoverability 227 Programmatic APIs 229 11.2 Neo4j High Availability (HA) 230 Neo4j clustering overview 231 ■ Setting up a Neo4j cluster 234 Replication—reading and writing strategies 237 Cache sharding 241 ■ HA summary 243 11.3 Backups 244 Offline backups 244 ■ Online backups 246 Restoring from backup 248 11.4 Topics we couldn’t cover but that you should be aware of 248 Security 248 ■ Monitoring 249 x CONTENTS 11.5 Summary 249 11.6 Final thoughts 249 appendix A Installing Neo4j server 250 appendix B Setting up and running the sample code 255 appendix C Setting up your project to use SDN 261 appendix D Getting more help 268 index 269 foreword The database world is experiencing an enormous upheaval, with the hegemony of relational databases being challenged by a plethora of new technologies under the NoSQL banner. Among these approaches, graphs are gaining substantial credibility as a means of analyzing data across a broad range of domains. Most NoSQL databases address the perceived performance limitations of relational databases, which flounder when confronted with the exponential growth in data vol- umes that we’ve witnessed over the last few years. But data growth is only one of the challenges we face. Not only is data growing, it’s also becoming more interconnected and more variably structured. In short, it’s becoming far more networked. In addressing performance and scalability, NoSQL has generally given up on the capabilities of the relational model with regard to interconnected data. Graph databases, in contrast, revitalize the world of connected data, outperforming relational databases by several orders of magnitude. Many of the most interesting questions we want to ask of our data require us to understand not only that things are connected, but also the differences between those connections. Graph databases offer the most powerful and best-performing means for generating this kind of insight. Connected data poses difficulties for most NoSQL databases, which manage doc- uments, columns, or key/value pairs as disconnected aggregates. To create any semblance of connectedness using these technologies, we must find a way to both denormalize data and fudge connections onto an inherently disconnected model.

Load more