System G Distributed Graph Database
Gabriel Tanase, IBM T.J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598
Toyotaro Suzumura, IBM T.J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598
Jinho Lee, IBM T.J. Watson Research Center, 11501 Burnet Rd, Austin, TX 78759
ChunFu (Richard) Chen, IBM T.J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598
Jason Crawford, IBM T.J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598
Hiroki Kanezashi, IBM T.J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598

arXiv:1802.03057v1 [cs.DB] 8 Feb 2018

ABSTRACT

Motivated by the need to extract knowledge and value from interconnected data, graph analytics on big data is a very active area of research in both industry and academia. To support graph analytics efficiently, a large number of in-memory graph libraries, graph processing systems and graph databases have emerged. Projects in each of these categories focus on particular aspects such as static versus dynamic graphs, offline versus online processing, small versus large graphs, etc.

While there has been much progress in graph processing in the past decades, there is still a need for fast graph processing using a cluster of machines with distributed storage. In this paper, we discuss a novel distributed graph database called System G, designed for efficient graph data storage and processing on modern computing architectures. In particular, we describe a single node graph database and a runtime and communication layer that allows us to compose a distributed graph database from multiple single node instances. From various industry requirements, we find that fast insertions and large volume concurrent queries are critical parts of graph databases, and we optimize our database for such features. We experimentally show the efficiency of System G for storing data and processing graph queries on state-of-the-art platforms.

1. INTRODUCTION

Data is generated at an increasing rate, and researchers in industry and academia are faced with novel challenges on how to extract knowledge from it that can subsequently bring value to a community or company. A first set of problems that need to be addressed when working with big data is how to store and query it efficiently. In this paper we focus on these challenges in the context of large connected data sets that are modeled as graphs. Graphs are often used to represent linked concepts like communities of people and things they like, bank accounts and transactions, or cities and connecting roads. For all these examples, vertices are on the order of hundreds of millions and edges or relations are on the order of tens of billions. For a property graph model [37], which is the focus of our work, there can be an additional arbitrary number of properties for each vertex and edge, thus largely increasing the size of the data to be maintained and queried.

The interest in graph databases has grown rapidly in the past decade, with many discussions on multiple aspects including internal data structures and storage, query languages, programming models and analytic algorithms. As a result, a few open-source graph database solutions have arisen [37, 6] and there are attempts to standardize public APIs and frameworks to query the graph data [10, 38]. Also, a number of programming models for fast processing of OLAP (On-Line Analytical Processing) workloads [33, 17, 28, 29] have been proposed. However, there is not enough discussion on optimizing for performance in distributed OLTP (On-Line Transaction Processing) targeted graph databases, and ways to design such databases are still under debate. Compared to OLAP-oriented approaches, OLTP systems need to support massive changes to the data, usually with continuous, large volume insertions and numerous small-scale neighborhood searches. These performance requirements force certain design choices and performance versus functionality trade-offs that need to be considered.

In this work, we introduce the main components of the IBM System G graph database ecosystem: a fast, ACID-compliant, single node graph database and a distributed graph database that relaxes some of the ACID properties in order to achieve high performance. The scenarios we are currently targeting with our database are from the financial world, where the graph is constantly updated with hundreds of thousands of edges per second corresponding to transactions between various accounts, while at the same time performing thousands of queries per second to look for special patterns in small-scale neighborhoods. Additionally, we found that many users do not keep the graph database as their primary database engine. Instead, they often choose to migrate data from another type of database, often SQL, and run analytics on the graphs. In those cases, building the graph may take several days or even weeks, dominating the time for the actual analytics phase. Therefore, the requirements for our graph database are:

1. Fast insertions of vertices and edges, on the order of 100,000 per second.
2. Support for hundreds of thousands of parallel queries, which are often small-scale searches around a given vertex or edge.

We built System G because, as far as we know, existing solutions could not provide enough performance to meet the requirements above for some of our industry collaborators.

The contributions of our paper are a very fast single node graph database implemented on top of a key value store and a distributed graph database designed as a composition of a set of single node databases referred to as shards. To support high throughput insertions and low latency queries, we introduce a novel runtime supporting thousands of concurrent requests per second and a Remote Procedure Call (RPC) based communication model that allows the database to move data efficiently between the machines hosting different shards. On top of our distributed design we introduce two novel techniques for adding edges that reduce the number of exchanged messages compared to a trivial implementation. We quantitatively evaluate our design and compare the performance of our database with existing solutions such as Neo4J [37] and JanusGraph [6].

The remainder of the paper is organized as follows. Section 2 presents related work, Section 3 briefly introduces our single node graph database, and Section 4 delves into our distributed graph database design and organization. Section 5 describes a general approach for writing graph queries on the distributed graph, Section 6 introduces two novel techniques for adding edges, Section 7 describes the consistency model supported by our current API, Section 8 includes our evaluation and comparison with other graph databases, and Section 9 concludes our paper.

2. RELATED WORK

In this section we discuss related projects and classify them into four broad categories: graph data structure libraries; graph processing frameworks, where the emphasis is on the programming models; graph databases, where the focus is on storage; and hardware based approaches.

Graph libraries: Graph libraries for in-memory-only [...] with support for persistent storage and a flexible runtime for better work scheduling.

OLAP Graph processing frameworks: Pregel and Giraph [33, 9, 42] employ a parallel programming model called Bulk Synchronous Parallel (BSP), where the computation consists of a sequence of iterations. In each iteration, the framework invokes a user-defined function for each vertex in parallel. This function usually reads messages sent to this vertex in the last iteration, sends messages to other vertices that will be processed in the next iteration, and modifies the state of this vertex and its outgoing edges. GraphLab [31] is a parallel programming and computation framework targeted at sparse data and iterative graph algorithms. PowerGraph [17] extends GraphLab with the gather-apply-scatter model to optimize distributed graph processing. Pregel/Giraph and GraphLab are good at processing sparse data with local dependencies using iterative algorithms. PGX.D [23] boosts the performance of parallel graph processing through communication optimization and workload balancing. GraphX [43] is another Pregel based distributed framework that runs on Spark [3].

Another type of approach, which reduces network communication overhead, is out-of-memory graph processing [28, 39, 45, 32, 20]. These frameworks exploit large storage devices (e.g., HDD, SSD). They often focus on minimizing the number of storage accesses [28, 45], utilizing sequential accesses [39], or maintaining multiple outstanding I/O transactions to draw maximum bandwidth out of flash devices [44, 20]. There are also attempts to use mmap for efficient storage management [32, 30]. However, none of these are designed to answer ad hoc, small-scale queries or to process graphs with rich properties.

OLTP Graph processing frameworks: TinkerPop [10] is an open-source graph ecosystem consisting of key interfaces and tools needed in the graph processing space, including the property graph model (Blueprints), data flow (Pipes), graph traversal and manipulation (Gremlin), graph-object mapping (Frames), graph algorithms (Furnace) and graph server (Rexster and Gremlin Server). Interfaces defined by TinkerPop are becoming popular in the graph community. As an example, Titan [11] and JanusGraph [6] adhere to many of the APIs defined by TinkerPop and use data stores such as HBase and Cassandra as their scale-out persistent layer. TinkerPop focuses on defining data exchange formats, protocols and APIs, rather than on offering high-performance software.

Graph stores: Neo4J [37] provides a disk-based, pointer-chasing graph storage model.
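To make the Bulk Synchronous Parallel model attributed to Pregel and Giraph more concrete, the following is a minimal single-process sketch in Python. It is not code from Pregel, Giraph or System G: the names `run_bsp` and `shortest_hops`, the dictionary-based graph and the stopping rule (halt when no messages were sent) are assumptions made for illustration. The per-vertex invocation is run sequentially rather than in parallel, and the vertex function updates only the vertex value, not its outgoing edges.

```python
from collections import defaultdict

INF = float("inf")


def run_bsp(graph, compute, state, initial_messages, max_supersteps=100):
    """Run `compute` on every vertex in synchronous supersteps.

    graph: dict mapping each vertex to its list of out-neighbors
           (assumption: every vertex appears as a key).
    compute(vertex, value, messages, out_neighbors) returns
           (new_value, [(target_vertex, message), ...]).
    """
    state = dict(state)
    inbox = defaultdict(list)
    for target, message in initial_messages:
        inbox[target].append(message)

    supersteps = 0
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        # Sequential stand-in for the per-vertex parallel invocation.
        for vertex, neighbors in graph.items():
            new_value, sent = compute(
                vertex, state[vertex], inbox.get(vertex, []), neighbors
            )
            state[vertex] = new_value
            for target, message in sent:
                outbox[target].append(message)
        # Barrier: messages sent in this superstep are read in the next one.
        inbox = outbox
        supersteps += 1
    return state, supersteps


def shortest_hops(vertex, dist, messages, out_neighbors):
    """Hypothetical vertex function: hop distance from a source vertex."""
    best = min(messages, default=INF)
    if best < dist:
        # Distance improved: record it and notify the out-neighbors.
        return best, [(n, best + 1) for n in out_neighbors]
    return dist, []


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
distances, supersteps = run_bsp(
    graph, shortest_hops, {v: INF for v in graph}, [("a", 0)]
)
print(distances, supersteps)
```

In this sketch the source vertex is seeded with an initial message carrying distance 0, and the run stops once a superstep produces no messages. A vertex only sees messages sent in the previous superstep, which is the property that distinguishes BSP from asynchronous message passing.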