
MEAP Edition
Manning Early Access Program
Graph Databases in Action
Version 1
Copyright 2019 Manning Publications
For more information on this and other Manning titles go to manning.com
©Manning Publications Co. We welcome reader comments about anything in the manuscript - other than typos and other simple mistakes. These will be cleaned up during production of the book by copyeditors and proofreaders. https://forums.manning.com/forums/graph-databases-in-action
welcome
Thank you for purchasing the MEAP for Graph Databases in Action. Though cliché, the motivation behind this book is to write the book that I wish had existed when I started working with graph databases. Though there is a large amount of information available on the web about graph databases, it tends to be either very rudimentary or extremely advanced. What’s lacking is information to help people go from just getting started with graph databases to being proficient in the practical aspects of building applications with them. This is the void that I am looking to fill with this book.
My approach to teaching graph databases is to draw on the familiar concepts of relational databases for comparison, so having a background in data modeling and querying relational databases is suggested. To make the most of this book you will also need to be familiar with building out Java applications on top of relational database systems.
In this book you will gain an understanding of how graph databases work, how they differ from relational databases, and how to use them to develop applications. You’ll learn the fundamentals of graph databases by following along as we build a fictitious graph-based application called GluttonApp.
The book is divided into three parts. Part 1 explains what a graph database is, when to use one, and what the ecosystem of graph database options looks like. It also covers the fundamental principles of graph data modeling in action, as we apply them to our sample application.
In Part 2, you will learn how to query a graph database and build a basic Java application using a TinkerPop-enabled graph database named TinkerGraph. Part 3 will move us to some more advanced concepts of graph databases, such as performance tuning, application pitfalls, and anti-patterns.
My goal in this book is to remain vendor agnostic in the concepts and techniques, so you can easily transfer the skills you learn here to a variety of popular databases when you actually go to put your application into production. To build out the application, I decided to go with the Apache TinkerPop project because it has the widest adoption among database vendors, including AWS and Azure, and because of its open source in-memory database, TinkerGraph, which we will use throughout the book.
If you have any questions, comments, or suggestions, please share them in Manning’s Author Online forum for my book.
Dave Bechberger
brief contents
PART 1: GETTING STARTED WITH GRAPH DATABASES
1 What is a Graph and what can I do with it?
2 Do I have a Graph problem?
3 Graph Data Modeling
4 Data modeling in practice
PART 2: BUILDING ON GRAPH DATABASES
5 Querying Graph databases
6 Developing our application in Java
7 Beyond basic querying
PART 3: MOVING BEYOND THE BASICS
8 Performance tuning our application
9 Graph pitfalls and anti-patterns
10 Graph analytics for non-Graph people
APPENDIXES:
A Apache TinkerPop installation and overview
B Gremlin steps cheatsheet
C An overview of property model graph databases and tools
1 What is a Graph and What Can it Do?
This chapter covers
• What graph databases are and why you want to use them to solve highly connected data problems
• How graph databases compare to relational and NoSQL databases
• Introduction to “just enough” graph theory and terminology to get started
In May of 2016, a massive leak of over 11 million documents, totaling roughly 2.6 terabytes of data, was released by the International Consortium of Investigative Journalists (ICIJ) (https://www.icij.org/investigations/panama-papers/), in what has become known as the Panama Papers. This release was a coordinated effort between journalists in nearly 80 countries to examine and connect information on more than 200,000 secret offshore companies based in Panama (https://www.icij.org/investigations/panama-papers/pages/panama-papers-about-the-investigation/). Their investigation led to the naming of many celebrities, politicians, and their families as potentially using offshore bank accounts to hide their fortunes. Due to the sheer volume, the number of records, and the highly connected nature of the data, the ICIJ decided to use a graph database named Neo4j to handle and coordinate the distributed efforts to connect the various pieces of data. Why would you choose to use a graph database over a more standard tool, such as a relational database, to answer these sorts of questions?
Graph databases are the only option when trying to make sense of the vast terabytes of connected data that we are producing more and more of, and are an essential tool for international agencies, governments, financial services, and security firms trying to uncover the truth.
Emil Eifrem, CEO of Neo4j Inc., in reference to the Panama Papers
In other words, Mr. Eifrem's quote above references the fundamental power of graph databases: to show the richness of highly interconnected data in a manner that is unavailable with other types of databases. Graphs and graph databases are powerful tools that enable us to better understand real-world problems that deal with highly connected data, such as social networking, search, infrastructure management, recommendation engines, or, in the case of the Panama Papers, fraud detection.
The driving goal of this book is to empower you, the reader, to use graph databases as a tool for building applications. Throughout this book, we will examine how graphs and graph databases give the end user a tremendous amount of additional power to navigate and explore data in ways that cannot be accomplished easily within a traditional relational database. We will achieve this goal by walking through the process of building a fictitious application, which we will call GluttonApp.
GluttonApp is an application that provides personalized restaurant reviews, similar to Yelp. This application will also allow you to connect with your friends and develop a social network, similar to Facebook or Twitter. Finally, the app will use your friends' restaurant reviews and ratings to personalize your restaurant recommendations. Each chapter will cover a step of the software development process and will build upon the previous chapters' work on GluttonApp. By the end of this book, we will have created a functioning application on a graph database using the skills learned along the way.
In this chapter, we begin our journey by gaining an understanding of what graphs and graph databases are and how they compare with traditional tools, such as relational and document databases.
1.1 What are Graph Databases?
Graph databases are a type of database that uses graph structures, specifically vertices (also called nodes) and edges, to store and query complex data.
Figure 1.1 A Simple Graph showing a Vertex and an Edge
These databases combine these basic graph structures with the fundamental constructs of graph theory to provide a database that facilitates fast and straightforward retrieval of complex data and relationships. In this section, we will introduce what a graph is and how graph databases fundamentally differ from relational databases.
1.1.1 What is a Graph and Why Use one?
A graph is a mathematical construct used to model relationships between items. It provides an abstract method for connecting objects and representing the relationships between them. In the figure below, we see a small social network graph where the people (items) appear as vertices (circles) and the relationships are represented by the lines connecting them, known as edges.
Figure 1.2 A small social network graph
It is human nature to attempt to view systems of interconnected entities in the real world as graphs. When thinking about a set of data that contains a vast array of highly interconnected items, such as the Panama Papers, it is a natural tendency to analyze and describe them as a web of interconnected things, which is just another way to describe a graph.
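To make the vocabulary of vertices and edges concrete, here is a minimal sketch of a small social network in plain Java. It is an illustration only and does not show how TinkerPop or any graph database stores data: the SocialGraph class, its methods, the names Ann, Bob, and Cara, and the "friends" label are all hypothetical and are not taken from Figure 1.2.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal in-memory graph; not the TinkerPop API.
final class SocialGraph {

    // An edge connects two vertices and carries a label naming the relationship.
    static final class Edge {
        final String from;
        final String label;
        final String to;

        Edge(String from, String label, String to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    // Each vertex (identified by name) maps to the edges that touch it.
    private final Map<String, List<Edge>> edgesByVertex = new LinkedHashMap<>();

    // Adds a vertex if it is not already present.
    void addVertex(String name) {
        edgesByVertex.putIfAbsent(name, new ArrayList<>());
    }

    // Adds an undirected edge, creating either endpoint if it is missing.
    void addEdge(String from, String label, String to) {
        addVertex(from);
        addVertex(to);
        Edge edge = new Edge(from, label, to);
        edgesByVertex.get(from).add(edge);
        edgesByVertex.get(to).add(edge);
    }

    // Returns the vertices exactly one edge away from the given vertex.
    List<String> neighbors(String name) {
        List<String> result = new ArrayList<>();
        List<Edge> edges = edgesByVertex.getOrDefault(name, Collections.<Edge>emptyList());
        for (Edge edge : edges) {
            result.add(edge.from.equals(name) ? edge.to : edge.from);
        }
        return result;
    }

    int vertexCount() {
        return edgesByVertex.size();
    }

    public static void main(String[] args) {
        SocialGraph graph = new SocialGraph();
        graph.addEdge("Ann", "friends", "Bob");
        graph.addEdge("Bob", "friends", "Cara");
        System.out.println(graph.neighbors("Bob")); // prints [Ann, Cara]
    }
}
```

Asking for Bob's neighbors simply follows the edges attached to the Bob vertex. Following connections from one item to the next like this is the kind of question a graph structure is meant to make easy.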
In the real world, items are related to other items in rich and varying ways, which are not well represented by the uniform and rigid structure of columns, rows, and tables used by relational databases. In the figure below, we visualize the business connections of the family members of Syrian President Bashar al-Assad (https://www.icij.org/investigations/panama-papers/the-power-players/).