
A Comparison of Different Graph Database Types


Jieru Yao

August 22, 2018

MSc in High Performance Computing with Data Science
The University of Edinburgh
Year of Presentation: 2018

Abstract

In the modern era of information, a great amount of data is produced and changed every day by different enterprises and individuals. The Database Management System (DBMS), as an effective and efficient means of data storage, management, maintenance and security, is highly popular in areas such as business, industry and education. The Relational Database Management System is the most familiar kind and is known to a great many people. A Relational Database uses 'Tables' as its basic storage units. Data of various types and categories in real life can be abstracted into different 'Tables' as entities, while a relationship is represented by a correlation between two entities, which means a correlation table is created whenever two entities are related. A negative consequence is that a great number of correlation tables are produced when there is more than one relationship between entities. In other words, if the relationships among entities are complex, it is difficult for designers to model the data using a Relational Database. Therefore, when such problems arise, an alternative type of database based on a graph structure needs to be introduced to improve performance: the Graph Database. A Graph Database uses nodes and edges to represent data with complex relationships. This basic concept helps the DBMS arrange and simplify the sophisticated relationships in massive data sets, which contributes to improving database performance. More detailed information is presented and discussed in later chapters. The dissertation mainly focuses on the performance of two different types of Graph Database (i.e. the RDF and LPG types). Their different storage data types are tested and analysed with different sizes of data set on a Windows 10 system and an Ubuntu 16.04 system.

Contents

Chapter 1 Introduction ...... 1

1.1 The importance of Graph Database ...... 1

1.2 Objectives ...... 4

Chapter 2 Literature Review ...... 5

2.1 Graph Databases ...... 5

2.2 RDF Graph Databases ...... 6

2.2.1 RDF graph ...... 6

2.2.2 SPARQL ...... 8

2.2.3 OpenLink Virtuoso ...... 8

2.3 Labeled Property Graph Database ...... 9

2.3.1 Labeled Property Graph ...... 9

2.3.2 Neo4j ...... 12

2.4 Open source data sets available in RDF ...... 12

2.4.1 DBpedia ...... 12

2.5 Introduction of two formats of RDF data sets ...... 13

2.5.1 Turtle ...... 13

2.5.2 N-triples ...... 13

Chapter 3 Research Methodology ...... 14

3.1 Data sets preparation from DBpedia ...... 14

3.2 Loading RDF data sets into OpenLink Virtuoso database ...... 15

3.2.1 Loading RDF data sets into OpenLink Virtuoso database on Windows 10 system ...... 15

3.2.2 Loading RDF data sets into OpenLink Virtuoso database on Ubuntu 16.04 system ...... 18

3.3 Loading RDF data sets into Neo4j ...... 19

3.3.1 Loading RDF data sets into Neo4j database on Windows 10 system ...... 19

3.3.2 Loading RDF data sets into Neo4j database on Ubuntu 16.04 system...... 20

3.4 Measuring loading times on Windows 10 system ...... 20

3.4.1 Measuring loading times of Virtuoso database on Windows 10 system ...... 21

3.4.2 Measuring loading times of Neo4j database on Windows 10 system ...... 23

3.5 Measuring loading times on Ubuntu 16.04 system ...... 24

3.5.1 Measuring loading times of Virtuoso database on Ubuntu 16.04 system .....25

3.5.2 Measuring loading times of Neo4j database on Ubuntu 16.04 system ...... 26

3.6 Measuring query times on Windows 10 system ...... 26

3.6.1 Measuring query times of Virtuoso database on Windows 10 system ...... 26

3.6.2 Measuring query times of Neo4j database on Windows 10 system ...... 27

3.7 Measuring query times on Ubuntu 16.04 system ...... 27

3.7.1 Measuring query times of Virtuoso database on Ubuntu 16.04 system ...... 27

3.7.2 Measuring query times of Neo4j database on Ubuntu 16.04 system ...... 27

Chapter 4 Experimental Work Carried Out ...... 28

4.1 Hardware and Software configurations of test systems ...... 28

4.1.1 Windows 10 system ...... 28

4.1.2 Ubuntu 16.04 system ...... 28

4.2 Virtuoso Installation ...... 29

4.2.1 Virtuoso Installation on Windows system ...... 29

4.2.2 Virtuoso Installation on Ubuntu 16.04 ...... 31

4.3 Neo4j Installation ...... 32

4.3.1 Neo4j installation on Windows 10 system ...... 32

4.3.2 Neo4j installation on Ubuntu 16.04 system ...... 33

4.4 Transformation process from RDF graph to LPG ...... 33

4.4.1 How to use the transformation plugin in Neo4j on Windows 10 system ...... 39

4.4.2 How to use the transformation plugin in Neo4j on Ubuntu 16.04 system ....41

Chapter 5 Results and Analysis ...... 43

5.1 Loading time of two database systems on Windows 10 system ...... 43

5.1.1 Loading time of Virtuoso on Windows 10 system...... 43

5.1.2 Loading time of Neo4j on Windows 10 system ...... 44

5.1.3 Comparison of loading time of both database systems on Windows system 45

5.2 Loading time of two database systems on Ubuntu system ...... 46

5.2.1 Loading time of Virtuoso on Ubuntu system ...... 46

5.2.2 Loading time of Neo4j on Ubuntu system ...... 47

5.2.3 Comparison of loading time of both database systems on Ubuntu system ...49

5.3 Comparison of loading time on Windows system and Ubuntu system for Virtuoso ...... 50

5.4 Comparison of loading time on Windows system and Ubuntu system for Neo4j ...... 51

5.5 Query time of Virtuoso on both systems ...... 52

5.5.1 Query time of Virtuoso on Windows system ...... 52

5.5.2 Query time of Virtuoso on Ubuntu system...... 53

5.5.3 Comparison of query time on Windows system and Ubuntu system for Virtuoso ...... 54

5.6 Query time of Neo4j on both systems ...... 54

5.6.1 Query time of Neo4j on Windows system ...... 54

5.6.2 Query time of Neo4j on Ubuntu system ...... 55

5.6.3 Comparison of query time on Windows system and Ubuntu system for Neo4j ...... 56

Chapter 6 Conclusion and Further Work ...... 57

Appendix A Database Installation Package………………………………………...59

A.1 OpenLink Virtuoso on Windows 10 system…………...……………...59

A.2 Neo4j on Windows 10 system…………...……………………….…...59

Appendix B Original data sets from DBpedia……………………………………...60

B.1 Original data sets from DBpedia…………………………………...…60

Appendix C Detailed measurement results………………………………………..61

C.1 Loading time measured in Virtuoso on Windows 10 system………….61

C.2 Loading time measured in Neo4j on Windows 10 system……………61

C.3 Loading time measured in Virtuoso on Ubuntu 16.04 system………...61

C.4 Loading time measured in Neo4j on Ubuntu 16.04 system……………62

C.5 Query time measured in Virtuoso on Windows 10 system……………62

C.6 Query time measured in Neo4j on Windows 10 system………………62

C.7 Query time measured in Virtuoso on Ubuntu 16.04 system…………..62

C.8 Query time measured in Neo4j on Ubuntu 16.04 system……………..63

C.9 Query results in Neo4j………………………………………………...63

References ...... 64

List of Tables

Table 1: Measuring function for loading time in Virtuoso database………………..…23

Table 2: Statements executed for measuring loading time in each measurement on Windows 10 system………………………………………………………………...…24

Table 3: Statements executed for measuring loading time in each measurement on Ubuntu 16.04 system…………………………………………………….……………26

Table 4: Representation of RDF triples based on XML type………………….……….38

List of Figures

Figure 1: ‘Film’ instance represented in a Relational Database……………………….2

Figure 2: Film instance modelling in a Relational Database………………………...... 3

Figure 3: Film instance modelling in a Graph Database……………………………….4

Figure 4: Relational Database based on ‘Film’ instance…………………………….....5

Figure 5: Graph Database based on ‘Film’ instance…………………………………...6

Figure 6: An instance of RDF graph……………………………………………...... 8

Figure 7: An instance of RDF graph………………………………………………….10

Figure 8: An instance of LPG…………………………………………………………10

Figure 9: An instance in LPG…………………………………………………………11

Figure 10: A segment of Turtle syntax………………………………………………13

Figure 11: A segment of N-triples format……………………………………………...13

Figure 12: A segment of the original RDF data set written in N-Triples type…………16

Figure 13: ‘Quad Store Upload’ function in Virtuoso Conductor…………………..…17

Figure 14: SPARQL Execution in Virtuoso Conductor……………………………….17

Figure 15: ‘DirsAllowed’ parameter in Virtuoso configuration file on Windows 10 system…………………………………………………………………………………19

Figure 16: ‘DirsAllowed’ parameter in Virtuoso configuration file on Ubuntu 16.04 system[28]……………………………………………….…………………………....19

Figure 17: An example of loading function statement in Neo4j browser on Windows 10 system…………………………………………………………………………………20

Figure 18: Loading status and results after executing the statement in Figure 17……..21

Figure 19: ‘DB.DBA.LOAD_LIST’ table after ld_dir() function……………………..22

Figure 20: Results of loading time of RDF data set ‘rdfdata1.ttl’……………………..23

Figure 21: Loading time result for loading RDF data set ‘rdfdata1.ttl’…………..……25

Figure 22: Query statement for counting the number of distinct ‘Subject’ in RDF triples………………………………………………………………………………….27

Figure 23: Cypher query for counting the number of distinct labels of nodes in RDF data sets…………………………………………………………………………………….28

Figure 24: Two Eleanor Cloud instances for test systems…………………………….30

Figure 25: Main files and directories listed in Virtuoso Home Directory on Windows 10 system…………………………………………………………………………………31

Figure 26: Commands to have operations on Windows service for Virtuoso………….32

Figure 27: Virtuoso conductor visited from localhost………………………………....32

Figure 28: Commands for Virtuoso installation on Ubuntu 16.04 system……………..33

Figure 29: UI of Neo4j Desktop version……………………………………………….34

Figure 30: Commands used in Ubuntu system for Neo4j installation…………………34

Figure 31: Principle 1(Basic) of Transformation from RDF graph to LPG……………35

Figure 32: Principle 2(Basic) of Transformation from RDF graph to LPG……………36

Figure 33: Principle 3(Basic) of Transformation from RDF graph to LPG……………36

Figure 34: An instance shown in RDF graph from W3C page[31]……………………37

Figure 35: Principle 4(Improvement) Transformation from RDF graph to LPG……....38

Figure 36: Transformation result for the specific instance mentioned above………….39

Figure 37: A part of UI of Neo4j Desktop version 3.3.4………………………………41

Figure 38: A part of UI of Neo4j Desktop version 3.3.4……………………………….41

Figure 39: The prefix listed that the transformation plugin requires…………………..42

Figure 40: Average loading time of Virtuoso on Windows system…………………...45

Figure 41: Average loading time of Neo4j on Windows system……………………...46

Figure 42: Average loading time of Virtuoso and Neo4j on Windows system………..47

Figure 43: Average loading time of Virtuoso on Ubuntu system……………………...48

Figure 44: Average loading time of Neo4j on Ubuntu system - (100k to 1.5m)……….49

Figure 45: Average loading time of Neo4j on Ubuntu system - (after 1.5m)…………..50

Figure 46: Average loading time of Virtuoso and Neo4j on Ubuntu system……….…..51

Figure 47: Average loading time in Virtuoso on Windows system and Ubuntu system…..……………………………………………………………………………..52

Figure 48: Average loading time in Neo4j on Windows system and Ubuntu system……………………………………………………………………………...….53

Figure 49: Average query time in Virtuoso on Windows system…………………..….54

Figure 50: Average query time in Virtuoso on Ubuntu system……..………………….54

Figure 51: Average query time in Virtuoso on Windows system and Ubuntu system….55

Figure 52: Average query time in Neo4j on Windows system ………………………...56

Figure 53: Average query time of Neo4j on Ubuntu system……………………….…..56

Figure 54: Average query time in Neo4j on Windows system and Ubuntu system…….57

Acknowledgements

Firstly, I would like to thank my supervisor Dr. Charaka Palansuriya; with his kind help and support, I completed this dissertation and gained a deeper understanding of the area of Graph Database Systems.

Next, I would like to thank all EPCC staff and my classmates for their patient help and companionship during this year of study.

Finally, I would like to thank my parents for their financial and spiritual support.

Chapter 1

Introduction

1.1 The importance of Graph Database

The Database Management System (DBMS), as an effective and efficient container and repository for data storage, data processing and querying, brings a large number of advantages and benefits to various enterprises and individuals. The three mainstream database systems common in the past were Relational Databases, Hierarchical Databases and Network Databases, while in recent years database systems are commonly divided into Relational Databases and NoSQL Databases [1][2]. In more detail, NoSQL databases have four basic types: Key-Value-based, Column-based, Document-based and Graph-based [3][6]. Given the topic of this dissertation, Graph-based NoSQL databases will be described and discussed, from abstract concepts to real practice and applications, together with the design and results of a series of experiments. At the outset, however, it is important to explain why Graph Databases are needed in some circumstances in preference to Relational Databases. Relational databases are still widely used [4]; for instance, they are popular for financial transactions, where ACID transactions are needed. Nevertheless, Relational Databases have limitations, for example in representing data with complex relationships. Relational Databases use tables as their data structure: different entities are stored in different tables and the diverse relationships between entities are maintained between tables, whereas in Graph Databases a record is represented as a node and the connections between nodes represent the relationships. There are two types of Graph Databases based on different data storage types, the Labeled Property Graph (LPG) type and the RDF type [5][6], which will be introduced in later sections.

In recent years, networked data has become common and a huge amount of data is naturally described in a network-based form. The 'film' domain is a simple example. In a film setting, a designer encounters a series of sophisticated problems during the design and modelling process when using a Relational Database. For instance, the people involved in a movie include leading actors, supporting actors and directors. In a Relational Database, the people in a movie are abstracted as a 'Person' type corresponding to a table for storage. At the same time, a director can be an actor in another movie or TV series, a singer, or even an investor in some film and television companies; and these companies are usually themselves the investors behind a series of movies and TV series. As we can see, the interconnected relationships are extremely complex. In addition, there are often multiple different relationships between two entities at the same time. Such relationships are shown in Figure 1.

Figure 1: ‘Film’ instance represented in a Relational Database

When designers try to use a Relational Database to model these relationships, they need to set up a series of tables to represent all the kinds of entities: in more detail, a table representing a person, a table representing a movie, a table representing a TV series and a table representing a film company all need to be defined. These tables then need to be associated with a series of association tables that record exactly which movies, TV series, songs and companies a person has been involved in. At the same time, the designer needs to create further correlation tables to record who is the leading actor in a movie, who is the supporting actor, who is the director, and who did the special effects. As we can see, the designer needs a large number of association tables to record this series of complex relationships. As more entities are introduced, more and more association tables are required, making the modelling and the resulting solution increasingly cumbersome and error-prone when a Relational Database is used.

The problem above is caused by Relational Databases themselves, which are designed around the basic idea of entity modelling. This design concept provides no direct support for the relationships between entities. When designers need to describe such relationships, they usually have to create an association table to record them, and these association tables rarely hold any data other than keys. A primary key uniquely identifies a record, i.e. a row in a table, and the same value can also act as a foreign key referencing a record in another table. In other words, association tables only simulate the relationships between entities through the existing mechanisms of Relational Databases. This simulation is very likely to lead to two negative consequences: the Relational Database has to maintain the relationships between entities indirectly through the association tables, which makes the execution of the database inefficient, and the number of association tables increases sharply.

Figure 2: Film instance modelling in a Relational Database

As Figure 2 shows, the model representing the film instance in a Relational Database is more complex than the equivalent model in a Graph Database, which is shown in Figure 3. As we can see in Figure 3, the relationships between nodes can themselves carry properties, which is convenient because no additional correlation table is needed.

Figure 3: Film instance modelling in a Graph Database
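For illustration, a minimal Cypher sketch in the spirit of Figure 3 could look as follows; the labels, relationship types and property names used here are illustrative and are not the exact ones used later in the project:

    CREATE (p:Person  {name: 'Tom Hanks'})
    CREATE (m:Movie   {title: 'Forrest Gump', released: 1994})
    CREATE (c:Company {name: 'Paramount Pictures'})
    // the relationships themselves carry properties, so no correlation table is needed
    CREATE (p)-[:ACTED_IN   {role: 'Forrest Gump'}]->(m)
    CREATE (c)-[:INVESTED_IN {share: 0.6}]->(m)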

1.2 Objectives

The aim of the project is to compare two popular Graph Database Management Systems, Neo4j and OpenLink Virtuoso, where the former is based on the Labeled Property Graph data structure while the latter uses the RDF graph. The comparison has two main parts: the data type transformation between these two Graph Databases, and performance tests with different sizes of data set.

Chapter 2

Literature Review

2.1 Graph Databases

A Graph Database is a type of NoSQL database that uses a nodes-and-edges model to represent data [1]. Compared with a Relational Database, a 'node' in a Graph Database corresponds to a row of a 'table' (an entity) in a Relational Database, while the 'edges' of a Graph Database describe relationships, which in a Relational Database would be held in correlation tables between 'tables'. Taking 'Film' as an example, Figure 4 shows the tables storing the different entities (rectangles) together with the corresponding correlation tables representing the relationships (rhombuses) between entities. For each table, the diverse properties represent the columns (ovals) stored in that table.

Figure 4: Relational Database based on ‘Film’ instance

On the other side, Figure 5 presents the same 'Film' instance in terms of a Graph Database. In Figure 5, the nodes and edges represent entities and relationships respectively. The structure shown in Figure 5 with a Graph Database is obviously simpler than that shown in Figure 4 with a Relational Database.

Figure 5: Graph Database based on ‘Film’ instance

Both Figure 4 and Figure 5 show an instance of limited size. If thousands or millions of records were stored using this model, a Relational Database would become far more complex, with a great many correlation tables, so that querying such a sophisticated model could become a heavy burden with relatively low efficiency. A Graph Database can be an effective alternative, and the query efficiency would be improved [7][8].

2.2 RDF Graph Databases

2.2.1 RDF graph

RDF is the abbreviation of Resource Description Framework, which is a W3C standard for describing web resources [9][10], for instance the titles of web pages, the authors of web pages, the modification dates of web pages, their contents and their copyright information. In general, RDF was created to describe resources on the Web, and it can be written with XML syntax [10][11]. In this context, metadata is data that describes other data, i.e. information about information. RDF is intended to be read by computers rather than displayed to users; with RDF, search engines can understand the exact meaning of metadata. However, RDF is not limited to Web resources [11]: it can also be applied to any data that can be identified on the Web.

Coming back to the concept of a graph, RDF forms a graph that has three types of nodes: resource nodes, literal nodes and blank nodes [10]. These categories represent different kinds of node. A resource node represents a thing as a resource, while a literal node contains a value such as a string. There is a strict principle that an edge may point from a resource node to either a resource node or a literal node [9][10]; conversely, a literal node is not able to point to a resource node [9][10]. Next come the concepts of RDF triples. Data in RDF is described as S-P-O triples [9], where 'S' refers to the 'Subject', 'P' represents the 'Predicate' and 'O' means the 'Object'. Subjects and Predicates are defined by URIs (Uniform Resource Identifiers). A URI identifies a resource uniquely. It is important to distinguish a URI from a URL (Uniform Resource Locator): all URLs are URIs, but a URI is not necessarily a URL.

For example, suppose there is a web page with the URL 'https://www.jackWebsite/homePage' that was created by 'Jack'. This natural-language statement can be expressed in RDF with the 'Subject' 'https://www.jackWebsite/homePage', the 'Predicate' 'Author' and the 'Object' 'Jack'. In an RDF graph, the parts of a triple are mapped to nodes and edges: in general, nodes represent the 'Subject' and the 'Object', whereas an edge refers to the 'Predicate'. The instance above, shown as an RDF graph, is displayed in Figure 6.


Figure 6: An instance of RDF graph

In terms of modelling languages for RDF data, both RDFS (RDF Schema) and OWL (Web Ontology Language) can be used for ontology description [12][13]. An ontology is important because it describes a thing with its structure, its inherent relationships and any constraints that exist, which makes it convenient to inspect intuitively. One difference between the two modelling languages is that OWL has a richer vocabulary for data description than RDFS [15]. Both the amount and the richness of the vocabulary have made OWL increasingly popular, and applications based on OWL are widespread [14].

2.2.2 SPARQL

SPARQL (SPARQL Protocol and RDF Query Language) is a query language for RDF [17]. It can be used in RDF-type graph databases such as OpenLink Virtuoso. Following the structure of an RDF graph, SPARQL queries are also based on S-P-O triple patterns. The main difference between SPARQL and other query languages, for instance SQL for relational databases, is that SPARQL can query and operate not only on data sets in local databases but also on open data sets reached through URLs over the network [16].
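As a small illustration of the triple-pattern style (a generic sketch rather than a query used later in the experiments), the following SPARQL query returns the first ten S-P-O triples in the default graph:

    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10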

2.2.3 OpenLink Virtuoso

OpenLink Virtuoso is described as a next-generation universal server and service manager [18]. It provides a platform for the development and deployment of new-generation network applications, and it can connect different databases and merge data sources together. In addition, Virtuoso supports the RDF graph model, with SPARQL as its query language. The good performance of Virtuoso has been recognised by the academic community: a great number of evaluation results show that it is one of the best open source database systems for dealing with RDF data [18].

2.3 Labeled Property Graph Database

2.3.1 Labeled Property Graph

The Labeled Property Graph (LPG) is a data structure mainly used for data storage and data querying. An LPG has the same basic vertices-and-edges structure as an RDF graph; the vertices and edges of an LPG represent nodes and relationships respectively. In an LPG, each node and each relationship has a unique ID to identify it. An RDF graph has an equivalent unique identifier, the URI, for each element of its S-P-O triple structure. However, there are several differences between an LPG and an RDF graph. In an LPG, each node or relationship additionally has a set of key-value pairs and a type, which are used together with the unique ID to describe its features. This set of key-value pairs used for feature description in an LPG is called its internal structure, and it is the most significant difference compared with an RDF graph [19].

Certainly, there are other differences between an LPG and an RDF graph [19], but the remaining differences all derive from the most important one described above. Firstly, in an RDF graph, if a relationship of a given type already connects two nodes, further relationships of the same type cannot be added between the same two nodes, because relationship instances in an RDF graph cannot be distinguished from one another. For instance, take the natural-language sentence 'Jack loves Mary'. If we put this data into an RDF graph, the structure is the one shown in Figure 7, and there is no problem in representing the sentence with such a graph. However, if 'Jack loves Mary' is asserted another two times, a problem arises. Using SPARQL to create this relationship two more times appears to succeed, but a query that counts the number of times 'Jack loves Mary' still returns just 1. This undesirable behaviour occurs because multiple relationships of the same type cannot be identified uniquely in an RDF graph. Fortunately, the difficulty is resolved with an LPG. If we perform the same creation with Cypher in an LPG, the data takes the structure shown in Figure 8, and when the same counting query is run the result is three, which matches the data we actually created and stored in the LPG. The key point is that an LPG has an internal structure, a set of key-value pairs, that identifies each relationship uniquely [19].

Figure 7: An instance of RDF graph

Figure 8: An instance of LPG
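A minimal Cypher sketch of this example follows; the date property attached to each relationship is illustrative and is not part of the original example:

    CREATE (j:Person {name: 'Jack'})
    CREATE (m:Person {name: 'Mary'})
    CREATE (j)-[:LOVES {at: '2018-01-01'}]->(m)
    CREATE (j)-[:LOVES {at: '2018-02-14'}]->(m)
    CREATE (j)-[:LOVES {at: '2018-03-08'}]->(m)

    // counting the LOVES relationships now returns 3, because each
    // relationship instance is identified individually in an LPG
    MATCH (:Person {name: 'Jack'})-[r:LOVES]->(:Person {name: 'Mary'})
    RETURN count(r)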

Next comes the second difference between an RDF graph and an LPG. In an RDF graph, a particular connection between two vertices cannot be qualified or restricted. For instance, take the natural-language sentence 'Beijing is one thousand kilometres away from Shanghai and the cost of the trip between these two cities is 150 pounds.' If we put this data into an RDF graph, a problem appears when we try to attach the cost and distance to the connection between 'Beijing' and 'Shanghai'. The reason is that each predicate such as connection or cost is generalised and global for the whole graph in RDF, which means that if more cities are inserted into the graph we cannot match a distance or cost with the corresponding pair of cities. The problem can be worked around by introducing an intermediate vertex [19]; however, with too many redundant vertices added, the complexity and difficulty of querying rise dramatically and the performance gets worse. Fortunately, the problem can be solved directly with an LPG, as shown in Figure 9.

Figure 9: An instance in LPG

The next difference is that in an RDF graph the same 'Subject' and 'Predicate' can have several different 'Objects', so a multi-valued attribute can be represented simply with S-P-O triples, whereas in an LPG arrays need to be introduced if the equivalent effect is required [19].

Finally, in terms of query language, an RDF graph is queried with SPARQL, while an LPG (in Neo4j) is queried with Cypher.

2.3.2 Neo4j

Neo4j is a popular graph database based on the LPG [20]. At present it is one of the most widely used open source graph frameworks based on Java [21]. It has three basic editions: a Desktop version, a free Community edition and a paid Enterprise edition. The Desktop version has an embedded web browser which makes it convenient to see the results as an LPG after executing Cypher statements; with the Community edition, the corresponding results of a Cypher statement can be viewed through localhost in a web browser. Neo4j offers a variety of APIs that can be used for data creation and queries and are easy to use. In server mode, the server allows clients to send HTTP requests in JSON format via a REST API that supports Cypher queries, so clients on any platform can access a Neo4j server [22].

Cypher is designed to be a query language with good human readability, which makes it convenient for developers and operations professionals to write queries in Neo4j [23]. Cypher is a declarative language: it expresses what data to retrieve from a large data set rather than how to retrieve it, leaving the database to optimise the queries [22][23]. Cypher also supports parameterised queries, which makes queries easier to reuse without building query strings by hand.
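As a brief illustration (the label, relationship type and property names here are hypothetical), a parameterised Cypher query looks like the following, where the value of $name is supplied separately by the client or driver rather than being concatenated into the query string:

    MATCH (p:Person {name: $name})-[:LOVES]->(other:Person)
    RETURN other.name

In the Neo4j browser, for example, the parameter can be supplied beforehand with a command such as :param name => "Jack".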

2.4 Open source data sets available in RDF

2.4.1 DBpedia

DBpedia is an open source data set and a notable application of the semantic web: it extracts structured data from Wikipedia to improve search capabilities, and it allows external data sets to link to Wikipedia. DBpedia is one of the largest knowledge ontologies in the world, applied in multiple domains, and is also part of the Linked Open Data cloud. DBpedia [24] has been regarded as the best semantic web application service.

12 2.5 Introduction of two formats of RDF data sets

2.5.1 Turtle

The Terse RDF Triple Language (Turtle) is a serialisation format for RDF [25]. Turtle is widely regarded as comparatively human readable. It supports namespace prefixes, lists and shorthand notations for strings. An example Turtle segment is shown in Figure 10.

Figure 10: A segment of Turtle syntax
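To give a flavour of the syntax (this snippet is illustrative and is not taken from the DBpedia data sets used later), a Turtle segment with namespace prefixes and the semicolon shorthand for a repeated subject could look like:

    @prefix dbo: <http://dbpedia.org/ontology/> .
    @prefix ex:  <http://example.org/> .

    ex:ForrestGump dbo:director ex:RobertZemeckis ;
                   dbo:starring ex:TomHanks .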

2.5.2 N-triples

N-Triples is another serialisation format for RDF data sets. The N-Triples format is written line by line [25]. For instance, Figure 11 illustrates a segment in N-Triples syntax; as can be seen in the segment, each line holds one S-P-O triple.

Figure 11: A segment of N-triples format
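The same two illustrative triples from the Turtle example above, written as N-Triples, would each occupy one line, with every URI written out in full and each line terminated by a full stop:

    <http://example.org/ForrestGump> <http://dbpedia.org/ontology/director> <http://example.org/RobertZemeckis> .
    <http://example.org/ForrestGump> <http://dbpedia.org/ontology/starring> <http://example.org/TomHanks> .

Because every syntactically valid N-Triples document is also valid Turtle, files written in N-Triples can normally be parsed by Turtle parsers, which becomes relevant to the loading experiments in Chapter 3.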

Chapter 3

Research Methodology

Chapter 3 describes the research methodology used in our project. We identified a suitable open source data set (the DBpedia data set), loaded the data into both Virtuoso and Neo4j, and measured the loading performance. We then measured the query performance on each database and compared the results.

3.1 Data sets preparation from DBpedia

The data sets in our project were obtained from DBpedia, which, as an open source data set available in RDF, provides a great deal of data. We downloaded the file 'mappingbased_objects_wkd_uris_sv.ttl.bz2' (see Appendix B) from the official DBpedia website and unzipped it in a suitable location. The original data set is written in the N-Triples structure; a short segment of it is shown in Figure 12. Note that the file name has the suffix '.ttl', which suggests the file type is 'Turtle'. This is explained by the fact that the 'Turtle' format is compatible with the 'N-Triples' format to a large extent [26], which is confirmed in the following experiments when we load the RDF data into both the Neo4j and Virtuoso databases.


Figure 12: A segment of the original RDF data set written in N-Triples type

With respect to the original file itself, it contains 5,232,657 lines of real data, excluding the first and last lines, which describe the file. Each line contains one S-P-O triple, as can be seen in Figure 12. In the project we planned to run experiments with different sizes of data set, so we extracted different numbers of lines from the original data set, giving thirty sub-files in total. The smallest sub-file has 100,000 lines, equivalent to 100,000 S-P-O triples, while the largest contains 3,000,000 lines and the same number of triples. The sub-file size increases in steps of one hundred thousand lines, from one hundred thousand lines up to three million lines.
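The extraction itself is straightforward on the command line; for example (a sketch, assuming the unzipped file name below and skipping the leading description line), the smallest sub-file can be produced with:

    tail -n +2 mappingbased_objects_wkd_uris_sv.ttl | head -n 100000 > rdfdata1.ttl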

3.2 Loading RDF data sets into OpenLink Virtuoso database

3.2.1 Loading RDF data sets into OpenLink Virtuoso database on Windows 10 system

There are two methods of loading data sets into the Virtuoso database, and both were tried on the Windows 10 system in this project. The first is to use the Virtuoso Conductor. Figure 13 shows the user interface of the 'Quad Store Upload' function, which is reached by clicking 'Linked Data' and then the 'Quad Store Upload' button. For example, as Figure 13 shows, we selected a file named 'rdfdata1.ttl', which is the smallest data set with 100,000 S-P-O triples. In the 'Named Graph IRI' field we simply wrote the namespace 'http://example/data'. On clicking the 'Upload' button, the selected file is imported into the Virtuoso database under the corresponding namespace.

Figure 13: ‘Quad Store Upload’ function in Virtuoso Conductor

We could then run queries on the uploaded file from the 'SPARQL' page shown in Figure 14. The namespace given in Figure 13 has to be entered into the 'Default Graph IRI' field shown in Figure 14. Figure 14 shows an example SPARQL query counting the number of S-P-O triples, together with the query result displayed after clicking the 'Execute' button.

Figure 14: SPARQL Execution in Virtuoso Conductor

The second way to load data sets into the Virtuoso database is through the Virtuoso Interactive SQL tool, iSQL. We used iSQL from the CMD prompt on the Windows 10 system. In the CMD prompt we first changed to the location of the Virtuoso 'bin' directory and then entered the command 'isql' to start the Virtuoso Interactive SQL tool; this requires a Virtuoso service to be running. In iSQL we used the 'Bulk Loader' function [27] for loading the RDF data sets. Bulk loading is divided into two parts: registering files in the table 'DB.DBA.LOAD_LIST' and then loading the files listed in that table into the Virtuoso database. In the first part, we used the function 'ld_dir ('/path/to/files', 'File Types', 'Graph name');' to register the files in the table, which can be regarded as a waiting list; at this stage the files are not yet loaded into the Virtuoso database. In the second part, we performed the bulk load with the function 'rdf_loader_run();'. After this function has been executed in iSQL, the data sets are actually loaded into the Virtuoso database. This method can be used for loading a single file or many files, which is very convenient.

In the project we chose to use bulk loading for importing the RDF data sets into the database, since it is convenient from the command line. In the function 'ld_dir ('/path/to/files', 'File Types', 'Graph name');', the first parameter is the path to the files we want to load. This path has to be allowed in the Virtuoso configuration file, called 'virtuoso.ini', which is located in the directory 'virtuoso/database'. Figure 15 shows the line that needs to be modified for the file path. We created a folder called 'rdflist' on our computer at 'C:\Users\yaojr\Desktop\rdflist'. In the configuration file, the entry added to 'DirsAllowed' specifies the path that the bulk loading functions are permitted to access. After modifying it with the correct Windows path, we saved the configuration file and restarted the Virtuoso service. The command 'select cfg_item_value (virtuoso_ini_path (), 'Parameters','DirsAllowed');' can be used to check whether the path in the configuration file is in effect. Once the path was configured successfully, we loaded the first fifteen data sets of increasing size into the Virtuoso database, replacing the data set in the 'rdflist' folder on 'C:\Users\yaojr\Desktop' each time.


Figure 15: ‘DirsAllowed’ parameter in Virtuoso configuration file on Windows 10 system
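For reference, after the modification the relevant entry in 'virtuoso.ini' looked roughly like the following (a sketch; the other items in the list are installation defaults and may differ between versions):

    [Parameters]
    ...
    DirsAllowed = ., ../vad, C:\Users\yaojr\Desktop\rdflist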

3.2.2 Loading RDF data sets into OpenLink Virtuoso database on Ubuntu 16.04 system

In the project we also chose the Bulk Loader for loading the RDF data sets into the Virtuoso database on the Ubuntu 16.04 system, because our Ubuntu system is a remote virtual instance on which it is not convenient to open the Virtuoso Conductor from 'localhost'. The operation is similar to that on the Windows 10 system. The most important step is to choose a location as the source of the files; this was set to '/home/ubuntu'. We then uploaded the thirty RDF data sets one by one to that location through WinSCP, a graphical tool similar to the 'scp' command on a Linux system. The path allowed for the function 'ld_dir ('/path/to/files', 'File Types', 'Graph name');' is set in the configuration file 'virtuoso.ini', which is in the directory '/etc/virtuoso-opensource-6.1'. Figure 16 shows the specific modification of the 'DirsAllowed' parameter in 'virtuoso.ini'. After changing 'DirsAllowed' and saving the file, the running Virtuoso service has to be stopped and started again with the commands 'sudo /etc/init.d/virtuoso-opensource-6.1 stop' and 'sudo /etc/init.d/virtuoso-opensource-6.1 start'. Then we entered iSQL with the statement '/usr/bin/isql-vt' and loaded the thirty RDF data sets into the Virtuoso database in turn.

Figure 16: ‘DirsAllowed’ parameter in Virtuoso configuration file on Ubuntu 16.04 system[28]

3.3 Loading RDF data sets into Neo4j

3.3.1 Loading RDF data sets into Neo4j database on Windows 10 system

In the project we used the transformation plugin written by J. Barrasa [29]; how the plugin works is described in Chapter 4. On the Windows 10 system we used Neo4j Desktop and operated on the data in the Neo4j browser. The plugin provides several functions for different purposes; we used the 'semantics.importRDF("file:///...","DataType", {shortenUrls: true})' function to load the thirty RDF data sets. For instance, the first parameter of the function was set to the specific path 'file:///C:\\Users\\yaojr\\Desktop\\rdflist\\rdfdata1.ttl', where 'rdfdata1.ttl' is the RDF data set with 100,000 S-P-O triples. The second parameter was set to 'Turtle' according to the type of the RDF data sets we loaded; we could also have set the type to 'N-Triples', because 'Turtle' is largely compatible with 'N-Triples' as mentioned in the previous section. Both types can therefore be recognised by the function, but the suffix of the file has to match the type given in the second parameter: for example, 'rdfdata1.nt' should be used with the parameter 'N-Triples' and 'rdfdata1.ttl' with the parameter 'Turtle'.

After setting all the parameters of the function, we called it to load the RDF data set 'rdfdata1.ttl' with the statement shown in Figure 17. Figure 18 shows the loading status and results after executing that statement. To load the fifteen RDF data sets into the Neo4j database, only the file name in the statement needs to be changed each time.

Figure 17: An example of loading function statement in Neo4j browser on Windows 10 system
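Concretely, the statement shown in Figure 17 has the following form (reconstructed from the parameters described above; note the doubled backslashes required in the Windows path):

    CALL semantics.importRDF("file:///C:\\Users\\yaojr\\Desktop\\rdflist\\rdfdata1.ttl", "Turtle", {shortenUrls: true});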


Figure 18: Loading status and results after executing the statement in Figure 17.

3.3.2 Loading RDF data sets into Neo4j database on Ubuntu 16.04 system

We used the same transformation plugin for loading the RDF data sets into the Neo4j database on the Ubuntu 16.04 system. As on the Windows 10 system, we needed to adjust the parameters of the function semantics.importRDF("file:///...","DataType", {shortenUrls: true}). We uploaded the RDF data sets from the Windows 10 system to the directory '/home/ubuntu' on the Ubuntu system through the WinSCP tool. The full statement called in Cypher-Shell was therefore 'CALL semantics.importRDF("file:///home/ubuntu/rdfdata1.ttl","Turtle", {shortenUrls: true});', which loaded the data set 'rdfdata1.ttl' into the Neo4j database. The biggest difference in the use of the semantics.importRDF function between the two operating systems is the file path, that is, the first parameter of the function. To load the thirty RDF data sets one by one, only the file name in the first parameter of the loading function needs to be changed.

3.4 Measuring loading times on Windows 10 system

In the experiment we measured the loading times for different sizes of RDF data set, in order to find the trend and pattern of the loading time as the number of S-P-O triples increases. On the Windows 10 system we tested the RDF data sets whose size increases in steps of one hundred thousand lines, from one hundred thousand lines up to one and a half million lines.

3.4.1 Measuring loading times of Virtuoso database on Windows 10 system

We measured the loading times of the Bulk Loader in the Virtuoso database. As mentioned in the previous section, the 'DB.DBA.LOAD_LIST' table records the files together with their status. For example, we imported the RDF data set with 100,000 S-P-O triples into the empty database with the statements 'ld_dir ('C:\\Users\\yaojr\\Desktop\\rdflist', '*.ttl', 'http://dbpedia.org');' and 'rdf_loader_run();'. Next, we used the statement 'select * from DB.DBA.LOAD_LIST;' to inspect the table; the result of executing this statement is shown in Figure 19.

Figure 19: ‘DB.DBA.LOAD_LIST’ table after ld_dir() function

As Figure 19 shows, a value of '2' for 'll_state' indicates that the loading of the data set is completely finished. The columns 'll_started' and 'll_done' record the timestamps at which loading of a file started and completed. Therefore, the difference between 'll_started' and 'll_done' is the loading time resulting from executing the statement 'rdf_loader_run();'. The function we used to calculate the loading time is shown in Table 1; the graph-name placeholder in its 'like' clause should be replaced by the actual graph name used in the ld_dir() function. As Figure 19 shows, the graph name here is 'http://dbpedia.org'. The resulting loading time for the current file is shown in Figure 20: the value '2' of the 'delta' column is the loading time measured in seconds.

Measuring function for loading time in Virtuoso database

select min(ll_started) as start, max(ll_done) as finish, datediff('second', min(ll_started), max(ll_done)) as delta from load_list where ll_graph like '';

Table 1: Measuring function for loading time in Virtuoso database

Figure 20: Results of loading time of RDF data set ‘rdfdata1.ttl’

In order to obtain relatively accurate results, we measured the loading time for each RDF data set ten times and calculated the average. Before each measurement of an RDF data set we deleted all data in the current database to obtain a clean test environment, because the occupied and remaining memory space would otherwise influence the measured performance. We therefore used the statements shown in Table 2 for each measurement. In Table 2, line 1 deletes the current loading list to avoid conflicts with the next loading; line 2 registers the data set in the loading list and line 4 loads the data into the database. The statements in lines 3 and 5 check the status of the data sets in the loading list: the former confirms that the current data set has been added to the loading list, while the latter confirms that it has been completely loaded into the database. Line 6 is then executed to measure the loading time of the current loading process, and line 7 deletes all data in the database to leave a clean environment for the next measurement. These seven statements in Table 2 were executed for each measurement of an RDF data set. In the experiment we measured fifteen RDF data sets with sizes from 100,000 lines to one and a half million lines; each data set was measured ten times and the results are shown in Chapter 5.

Statements executed for measuring loading time in each measurement on Windows 10 system

1 delete from db.dba.load_list;

2 ld_dir ('C:\\Users\\yaojr\\Desktop\\rdflist', '*.ttl', 'http://dbpedia.org');

3 select * from DB.DBA.LOAD_LIST;

4 rdf_loader_run();

5 select * from DB.DBA.LOAD_LIST;

6 select min(ll_started) as start, max(ll_done) as finish, datediff('second', min(ll_started), max(ll_done)) as delta from load_list where ll_graph like 'http://dbpedia.org';

7 SPARQL CLEAR GRAPH <http://dbpedia.org>;

Table 2: Statements executed for measuring loading time in each measurement on Windows 10 system

3.4.2 Measuring loading times of Neo4j database on Windows 10 system

The approach for measuring the loading time of the RDF data sets in the Neo4j database is simpler and more intuitive. On the Windows 10 system the measurement was carried out in the Neo4j browser provided by Neo4j Desktop. For example, we imported the RDF data set 'rdfdata1.ttl' into the Neo4j database using the import function of the transformation plugin. As Figure 21 shows, the results are displayed below the execution window; the area marked with a red rectangle shows the time spent executing the import statement. We used the execution time of the statement directly because it is difficult to find a way to calculate the loading time more precisely. In this example the loading time was 15005 milliseconds, which is roughly 15.01 seconds. After each measurement we deleted all the data in the database with the Cypher statement 'MATCH (n) DETACH DELETE n', to guarantee a clean environment for the next loading. With this approach the remaining fourteen RDF data sets were tested one by one, ten times each, and the results are shown in Chapter 5.

Figure 21: Loading time result for loading RDF data set ‘rdfdata1.ttl’

3.5 Measuring loading times on Ubuntu 16.04 system

The loading time measurements on Ubuntu covered all thirty RDF data sets, with sizes from one hundred thousand S-P-O triples to three million triples. The maximum data set size on the Ubuntu system is double that on the Windows system because of the difference in hardware configuration between the two test systems.

3.5.1 Measuring loading times of Virtuoso database on Ubuntu 16.04 system

The method for measuring the loading times of the Virtuoso database on the Ubuntu 16.04 system is the same as on the Windows 10 system: we used the difference between the start and completion timestamps in the 'LOAD_LIST' table to calculate the loading time of the RDF data set currently listed in the table. The only difference in the executed statements on the Ubuntu system, compared with the Windows system, is the loading path parameter, which is shown in line 2 of Table 3. We then measured all thirty RDF data sets in the Virtuoso database, ten times each, and the corresponding results are shown in Chapter 5.

Statements executed for measuring loading time in each measurement on Ubuntu 16.04 system

1 delete from db.dba.load_list;

2 ld_dir ('/home/ubuntu', '*.ttl', 'http://dbpedia.org');

3 select * from DB.DBA.LOAD_LIST;

4 rdf_loader_run();

5 select * from DB.DBA.LOAD_LIST;

6 select min(ll_started) as start, max(ll_done) as finish, datediff('second', min(ll_started), max(ll_done)) as delta from load_list where ll_graph like 'http://dbpedia.org';

7 SPARQL CLEAR GRAPH <http://dbpedia.org>;

Table 3: Statements executed for measuring loading time in each measurement on Ubuntu 16.04 system

3.5.2 Measuring loading times of Neo4j database on Ubuntu 16.04 system

On the Ubuntu 16.04 system we took the same approach as on the Windows 10 system. The only difference is that we used the Neo4j browser on Windows 10, whereas we used Cypher-Shell on Ubuntu 16.04, because our Ubuntu system is a remote virtual instance on which it is difficult to open the Neo4j web browser from localhost. The difference between the Neo4j browser and Cypher-Shell is that the former has a visual user interface for displaying graphs, while the latter is command-line based and shows the results of a statement as text and tables.

3.6 Measuring query times on Windows 10 system

We planned to measure the time taken to count the number of S-P-O triples in the Virtuoso database and the number of distinct node labels in the Neo4j database, in order to find the trends and characteristics of these queries as the size of the RDF data sets increases. On the Windows system we tested fifteen RDF data sets, the largest of which contains 1.5 million S-P-O triples.

3.6.1 Measuring query times of Virtuoso database on Windows 10 system

For the Virtuoso database, we executed the SPARQL query statements in iSQL. In iSQL, every SPARQL query statement needs the prefix 'SPARQL', whereas this prefix is not required in the Virtuoso Conductor. The query statement for counting the number of triples is shown in Figure 22; the query result for each RDF data set should equal the size of the corresponding data set. In the experiment we ran this query ten times for each RDF data set and the results are shown in Chapter 5.

Figure 22: Query statement for counting the number of distinct ‘Subject’ in RDF triples
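A counting query of this kind has roughly the following shape (a sketch; the exact statement used in the experiments is the one in Figure 22, and the leading 'SPARQL' keyword is the iSQL prefix mentioned above):

    SPARQL SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o };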

3.6.2 Measuring query times of Neo4j database on Windows 10 system

In the Neo4j database, the queries were executed in the Neo4j browser on the Windows 10 system. We used the Cypher query shown in Figure 23 to count the number of distinct node labels; the query results are listed in Appendix C. In our experiment we ran this query ten times for each RDF data set and the timing results are shown in Chapter 5.

Figure 23: Cypher query for counting the number of distinct labels of nodes in RDF data sets
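A Cypher query of this kind has roughly the following shape (a sketch; the exact statement used is the one shown in Figure 23):

    MATCH (n) UNWIND labels(n) AS label RETURN count(DISTINCT label) AS distinctLabels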

3.7 Measuring query times on Ubuntu 16.04 system

On the Ubuntu 16.04 system we tested thirty RDF data sets, the largest of which contains 3 million lines.

3.7.1 Measuring query times of Virtuoso database on Ubuntu 16.04 system

We executed the same SPARQL query statement in the Virtuoso database on the Ubuntu system. The results for the thirty data sets are shown in Chapter 5.

3.7.2 Measuring query times of Neo4j database on Ubuntu 16.04 system

The same Cypher query statement was tested in the Neo4j database on the Ubuntu system, and the results for all data sets are shown in Chapter 5.

Chapter 4

Experimental Work Carried Out

4.1 Hardware and Software configurations of test systems

4.1.1 Windows 10 system

One of the test environments is a Windows 10 system on a Lenovo ThinkPad T450. In more detail, the operating system of this test system is 64-bit Windows 10 Home Edition, build 17134. The CPU of this computer is an Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz (4 logical CPUs, ~2.4GHz). The machine has 8192MB of RAM and a 1TB disk.

The test for connecting to Neo4j is implemented in Java, version 1.8.0_74. The Java(TM) SE Runtime Environment is 'build 1.8.0_74-b02' and the Java HotSpot(TM) 64-Bit Server VM is 'build 25.74-b02, mixed mode'. In addition, the process of loading the RDF data sets into Neo4j was tested using a driver obtained through Maven.

4.1.2 Ubuntu 16.04 system

The other test system for the experiments is a virtual machine running Ubuntu 16.04, created using the Eleanor Cloud service. In more detail, the two test instances for Virtuoso and Neo4j are created with the same software and hardware configurations to guarantee an equal environment. Each instance is configured with 16GB of RAM, 8 VCPUs and a 160GB disk.

The instance for the Virtuoso tests is called 'testForVirtuoso', while the instance for the Neo4j tests is named 'testForNeo4j'; both are shown in Figure 24. As Neo4j is implemented in Java, a Java Runtime Environment (JRE) is required. The Java version used on the 'testForNeo4j' instance is '1.8.0_181'. In addition, the OpenJDK Runtime Environment is 'build 1.8.0_181-8u181-b13-0ubuntu0.16.04.1-b13' and the OpenJDK 64-Bit Server VM is 'build 25.181-b13, mixed mode'.

Figure 24: Two Eleanor Cloud instances for test systems

4.2 Virtuoso Installation

4.2.1 Virtuoso Installation on Windows system

In the experiment we downloaded OpenLink Virtuoso version 7.2.4 for the Windows 10 system, which was the latest release at the time. The zip file, named 'virtuoso-opensource-win-x64-20160425' and listed in Appendix A, was downloaded from the official OpenLink website, and after decompression the package was moved to the 'Program Files' directory. The layout of the Virtuoso package is shown in Figure 25. Following the installation instructions, the Microsoft Visual C++ 2012 Redistributable Package was downloaded to satisfy the prerequisite environment, and after this prerequisite was set up the Windows 10 machine needed to be restarted for it to take effect. The system path environment variable also had to be set. After all of this preparation was configured successfully, we verified whether Virtuoso had been installed on the system by typing the command 'virtuoso-t -?' at the CMD prompt; the usage instructions printed back in the CMD prompt showed that the installation was successful.

In terms of the specific steps to start the service: first we navigated from the CMD prompt to the location of the Virtuoso installation, then changed to the 'database' directory shown in Figure 25 and typed 'virtuoso-t +service create +instance "New Instance Name" +configfile virtuoso.ini' to create a brand-new Windows service, replacing the content of "New Instance Name" accordingly.

Figure 25: Main files and directories listed in Virtuoso Home Directory on Windows 10 system

Next, we used the command line 'virtuoso-t +instance "Instance Name" +service start' to start the service we had just created, giving the correct service name. The remaining commands shown in Figure 26 can be used to list all of the services and to start, stop or delete any of them.

Figure 26: Commands to have operations on Windows service for Virtuoso

In addition, Virtuoso has a visual administration interface, the Conductor, which can be visited from localhost at the address shown in Figure 27.

Figure 27: Virtuoso conductor visited from localhost

4.2.2 Virtuoso Installation on Ubuntu 16.04

The Ubuntu system for the Virtuoso tests was created on the Eleanor Cloud. To connect to it, we first logged into the virtual machine by typing 'ssh -i key_file2 [email protected]' at the CMD prompt. On the system, we used the commands shown in Figure 28 to install Virtuoso step by step. The version of the Virtuoso database is 'virtuoso-opensource-6.1', and the initial username and password for the database are 'dba' and 'dba' respectively. To start the service we used the command 'sudo /etc/init.d/virtuoso-opensource-6.1 start', and then checked whether the service was running with 'sudo /etc/init.d/virtuoso-opensource-6.1 status'. Once the 'Active' field reported 'active (running)', the service was confirmed to be working. By typing the command '/usr/bin/isql-vt', we entered the isql binary for further operations on the Virtuoso database.

Figure 28: Commands for Virtuoso installation on Ubuntu 16.04 system
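On Ubuntu 16.04 the package is available from the standard repositories, so the installation commands in Figure 28 essentially amount to the following (a sketch; the exact commands used are those shown in Figure 28):

    sudo apt-get update
    sudo apt-get install virtuoso-opensource    # installs the 6.1 server on Ubuntu 16.04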

4.3 Neo4j Installation

4.3.1 Neo4j installation on Windows 10 system

In the experiments we chose the Neo4j Desktop version as the database under test on the Windows 10 system, because the Desktop version has an intuitive user interface, which is more convenient than the Community edition, for which the CMD prompt is required.

We downloaded the installer 'neo4j-desktop-offline-1.0.15-setup' from the official Neo4j website (see Appendix A), ran it and installed the Neo4j Desktop version easily. The UI of the database is shown in Figure 29. After creating a database for the tests, we started the service by clicking the 'Start' button and then entered the Neo4j browser.


Figure 29: UI of Neo4j Desktop version

4.3.2 Neo4j installation on Ubuntu 16.04 system

The Ubuntu system for the Neo4j tests was created on the Eleanor Cloud. We logged into the virtual machine by typing 'ssh -i key_file [email protected]' at the CMD prompt. On the system, we used the commands shown in Figure 30 to install Neo4j step by step. After restarting the Neo4j service, we could enter the Neo4j database with the default username and password; for security, we changed the password immediately.

Figure 30: Commands used in Ubuntu system for Neo4j installation
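At the time, the usual route was to install Neo4j from its Debian repository; a sketch of that procedure (the repository URLs shown here were the ones in common use in 2018 and may have changed since) is:

    wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
    echo 'deb https://debian.neo4j.org/repo stable/' | sudo tee /etc/apt/sources.list.d/neo4j.list
    sudo apt-get update
    sudo apt-get install neo4j
    sudo service neo4j restart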

4.4 Transformation process from RDF graph to LPG

As mentioned in Chapter 2, Neo4j is a Labeled Property Graph database, while RDF data is based on the RDF graph. Although the LPG and the RDF graph are both graphs with a similar structure of vertices and edges, there are still some differences between them in terms of their specific structure. Therefore, a transformation between these two graph types is required when RDF data needs to be loaded into Neo4j. The transformation method used here was written by J. Barrasa [29], who published it on his blog; in this project, the process of loading the RDF data sets into Neo4j is implemented based on his method.

As J. Barrasa describes, the first step is to map the components of RDF S-P-O triples onto the node-and-relationship structure of an LPG. In an RDF graph, every ‘Subject’ and ‘Predicate’ is a resource identified by a unique URI. An ‘Object’, however, can be a resource, a literal or a blank node. A literal cannot be the ‘Subject’ of an S-P-O triple, which means a literal cannot act as a root node and no further nodes can extend from it.

To make the process clear, three principles are used to map the parts of an S-P-O triple onto nodes, node properties and relationships in an LPG. The first principle is that a ‘Subject’ in an RDF triple maps to a node in the LPG, as shown in Figure 31.

Figure 31: Principle 1(Basic) of Transformation from RDF graph to LPG

The other two principles decide how a ‘Predicate’ is mapped onto LPG components, depending on the type of the ‘Object’. The second principle is that a ‘Predicate’ becomes a property of a node in the LPG when the ‘Object’ in the same RDF triple is a literal; this is shown in Figure 32. The third principle covers the case where the ‘Object’ of an S-P-O triple is a resource with its own URI: as shown in Figure 33, the ‘Predicate’ is then mapped onto the relationship between the corresponding nodes in the LPG, if and only if the ‘Object’ of the current S-P-O triple is a resource.

Figure 32: Principle 2(Basic) of Transformation from RDF graph to LPG

Figure 33: Principle 3(Basic) of Transformation from RDF graph to LPG

In order to make these principles easier to understand, we introduce an example RDF graph. Figure 34 shows an RDF model from a W3C page. The ovals and rectangles in Figure 34 represent resources and literals respectively. We can express the graph in Figure 34 in natural language as: ‘There is a page with URL http://www.w3.org/Home/Lassila that is created by a staff member with ID number 85740, and the name and email of this staff member are Ora Lassila and lassila@w3.org respectively’. From this sentence we can extract three RDF triples. The first triple is composed of ‘Subject’ – ‘http://www.w3.org/Home/Lassila’, ‘Predicate’ – ‘Creator’ and ‘Object’ – ‘http://www.w3.org/staffId/85740’. The second S-P-O triple consists of ‘Subject’ – ‘http://www.w3.org/staffId/85740’, ‘Predicate’ – ‘Name’ and ‘Object’ – ‘Ora Lassila’, while ‘Subject’ – ‘http://www.w3.org/staffId/85740’, ‘Predicate’ – ‘Email’ and ‘Object’ – ‘lassila@w3.org’ form the third RDF triple in Figure 34 [31].

35

Figure 34: An example instance shown as an RDF graph, from the W3C page [31]

The representation in natural language is understandable; however, the information should be structured in a more concise and systematic way. Therefore, we express these data in an XML-based form, shown in Table 4. In Table 4, the first line expresses the first triple, where ‘w3’ represents ‘http://www.w3.org/’, ‘c’ indicates the prefix URI of ‘Creator’ that is omitted in Figure 34, and ‘w3staff’ stands for ‘http://www.w3.org/staffId/’. The second line in Table 4 represents the second triple, where ‘n’ indicates the prefix URI of ‘Name’ omitted in Figure 34, and the last line expresses the third triple, where ‘e’ indicates the prefix URI of ‘Email’ omitted in Figure 34. Next, we can transform this RDF representation into a Labelled Property Graph with the three principles described above. For the first triple shown in Table 4, using the first and the third principles, the RDF data can be converted into ‘(:Resource { uri:"w3:Lassila"})-[:'c:creator']->(:Resource { uri:"w3staff:85740"})’ in Labelled Property Graph form. For the second triple, we can use the first two principles to obtain ‘(:Resource { uri:"w3staff:85740", 'n:name': "Ora Lassila"})’. Similarly, the third triple can be converted to ‘(:Resource { uri:"w3staff:85740", 'e:email': "lassila@w3.org"})’ through principles one and two.

        Subject -> Node      Predicate -> Relationship      Object (Literal) -> Properties
    1   w3:Lassila           c:creator                      w3staff:85740
    2   w3staff:85740        n:name                         "Ora Lassila"
    3   w3staff:85740        e:email                        "lassila@w3.org"

Table 4: Representation of RDF triples based on XML type
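To make the result of these mappings concrete, the following Cypher sketch builds the corresponding labelled property graph by hand. It is only an illustration of the three principles, not the statements the plugin actually generates; the relationship type and property keys are chosen for readability rather than taken from the plugin's output.

    // Principle 1: subjects become nodes identified by their URI
    MERGE (page:Resource {uri: 'w3:Lassila'})
    MERGE (staff:Resource {uri: 'w3staff:85740'})

    // Principle 3: a predicate whose object is a resource becomes a relationship
    MERGE (page)-[:creator]->(staff)

    // Principle 2: predicates whose objects are literals become node properties
    SET staff.name = 'Ora Lassila',
        staff.email = 'lassila@w3.org';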

In addition, the method has a fourth principle as an improvement. As mentioned in previous sections, RDF can express both data and metadata. Metadata, namely rdf:type statements in RDF, describe the classes to which the corresponding data belong. Such classes in an RDF graph are equivalent to categories in an LPG, and categories correspond to Labels in a Labeled Property Graph. This principle is also helpful when thousands or millions of records share the same category and we need to delete all of them. For example, suppose we have millions of records describing different people. When we want to remove all of these people from the database, we can delete them directly by using the Label ‘Person’. Alternatively, if we do not use a Label, each record can point to an external node that represents the ‘Person’ category; however, this is an effective but not an efficient way to handle a large amount of data. Therefore, principle 4 is a necessary improvement when there is a large amount of data in the database. It is shown in Figure 35 and sketched below.

Figure 35: Principle 4(Improvement) Transformation from RDF graph to LPG
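A rough Cypher sketch of the difference follows; the label, URI and property names are illustrative only. With principle 4 an rdf:type statement becomes a label, so a whole category can be removed in one statement, whereas without it the category is just another related node.

    // With principle 4, an rdf:type triple becomes a label on the node
    MERGE (p:Resource:Person {uri: 'w3staff:85740'});

    // All members of the category can then be removed in a single statement
    MATCH (p:Person)
    DETACH DELETE p;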

Finally, in order to present the result of the transformation more intuitively, we represent it as an LPG in Figure 36. As shown in Figure 36, the two circles labelled ‘Resource’ and ‘Person’ represent two nodes in the LPG, and the connection from node ‘Resource’ to node ‘Person’ expresses the relationship ‘Creator’ between them. The contents of the braces next to node ‘Person’ are its properties, which state that its name is ‘Ora Lassila’ and its staff ID number is ‘85740’. With this, we have completed the transformation from an RDF graph to an LPG through the specific example taken from the W3C page.

Figure 36: Transformation result for the specific instance mentioned above

At this point, it is worth recalling which components of an RDF graph we have mapped onto an LPG and which special cases need to be considered. In an RDF graph, the S-P-O triple is the basic structure. As mentioned before, a ‘Subject’ must be a resource with a uniquely identified URI. A ‘Predicate’ becomes either a relationship between two nodes or a property of a node, depending on the ‘Object’ it links to. An ‘Object’ is normally either a resource or a literal. However, an ‘Object’ may also be blank; in this case it is treated as a special resource called a blank node, which is given a specific URI during the transformation so that it can be distinguished within the resulting LPG.

The process above, including the three basic principles and the additional principle for improvement, is implemented by J. Barrasa [29]. The method is written in Java as a Neo4j plugin.

The Java plugin implemented by J. Barrasa can be downloaded via the link listed in reference [30]. There are three released versions of the code. Each version has some modifications compared with the previous one, including bug fixes and added principles, and each release is suited to corresponding versions of Neo4j. We used the transformation plugin on both the Windows 10 system and the Ubuntu 16.04 system, but there are some small differences in how the plugin is used on the two operating systems.

4.4.1 How to use the transformation plugin in Neo4j on Windows 10 system

In our project, the version of Neo4j on the Windows 10 system is 3.3.4. Therefore, the project uses the ‘neosemantics-3.3.0.2.jar’ package as the transformation plugin on Windows, because it is the release suitable for Neo4j version 3.3.x.

After downloading the ‘neosemantics-3.3.0.2.jar’ package from J. Barrasa’s repository [29] on GitHub [30], we started Neo4j Desktop version 3.3.4 and put ‘neosemantics-3.3.0.2.jar’ into the ‘/plugins’ directory. That directory is easy to find from the Neo4j Desktop UI: as shown in Figure 37, we click the ‘Manage’ button at the bottom of the user interface, which leads to the page shown in Figure 38, where the ‘/plugins’ directory is listed in the ‘Open Folder’ drop-down menu. We then stopped the Neo4j service, closed Neo4j Desktop and restarted it to make sure the plugin would be loaded.


Figure 37: A part of UI of Neo4j Desktop version 3.3.4

Figure 38: A part of UI of Neo4j Desktop version 3.3.4

After restarting the Neo4j service in Neo4j Desktop, we checked whether the plugin worked normally. Before using the functions in the plugin, we first added, through the Neo4j browser, the Cypher statements listed in Figure 39. These statements define the prefix list the plugin requires, as mentioned in the description of the transformation from RDF graph to LPG in the previous section. Then, we tested the function for importing RDF data into Neo4j with the statement ‘CALL semantics.importRDF("file:///...","Turtle", {shortenUrls: true})’. At this point we did not fill in a valid path as the first parameter, because we only wanted to test whether the plugin was available. Since the statement did not return a ‘Function Not Found’ error, the plugin was confirmed to be installed correctly.

Figure 39: The prefix list that the transformation plugin requires
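Once the plugin is loaded, an actual import looks roughly like the sketch below; the file path and the follow-up count query are illustrative examples rather than the exact statements used in the experiments.

    // Import a Turtle file from the local file system (hypothetical path)
    CALL semantics.importRDF("file:///C:/data/dbpedia_100k.ttl", "Turtle", {shortenUrls: true});

    // A simple check that triples arrived: count the imported resource nodes
    MATCH (n:Resource)
    RETURN count(n) AS importedResources;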

4.4.2 How to use the transformation plugin in Neo4j on Ubuntu 16.04 system

On Ubuntu we used Release 3.4.0.1 of the transformation plugin, which was the latest version at the time. We downloaded the ‘neosemantics-3.4.0.1.zip’ package from J. Barrasa’s repository on GitHub [30], unzipped the file and installed Maven on the Ubuntu system. Next, we built the project with the command ‘mvn clean package’. The project contains some embedded tests, which are executed when the project builds successfully. After a successful build, the outputs were a pom file and a ‘target’ folder in the current directory. We moved the produced plugin files into the directory /var/lib/neo4j/plugins, similar to what we did on the Windows 10 system. Next, we restarted the service with the commands ‘sudo service neo4j stop;’ and ‘sudo service neo4j start;’ to guarantee that the plugin would be loaded. Because our Ubuntu system is a remote virtual instance, it would be too slow, if possible at all, to access Neo4j via a GUI browser. Therefore, we relied entirely on Cypher Shell for data operations in the database, which differs from the Neo4j browser used with Neo4j Desktop on the Windows 10 system.
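A plausible command sequence for this build-and-install step is sketched below. The directory name is illustrative, and in a typical Maven build the plugin jar is produced under target/, so copying that jar is usually sufficient.

    # Build the plugin from the unzipped source (directory name is illustrative)
    cd neosemantics-3.4.0.1
    mvn clean package

    # Copy the built jar into Neo4j's plugin directory and restart the service
    sudo cp target/neosemantics-*.jar /var/lib/neo4j/plugins/
    sudo service neo4j stop
    sudo service neo4j start

    # Enter Cypher Shell (default credentials neo4j/neo4j, changed on first login)
    cypher-shell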

We used the command ‘cypher-shell’ to enter Cypher Shell on the Ubuntu system, where we were required to input the username and matching password. The initial username and password were both ‘neo4j’, and we changed them immediately for data security. Next, we checked whether the transformation plugin worked by calling the importer function with the statement ‘CALL semantics.importRDF("file:///...","Turtle", {shortenUrls: true});’. We did not supply a correct path for the first parameter, because we only wanted to test the availability of the plugin at that time; the specific usage is described where we load data into the database. Since Cypher Shell did not respond that no such function exists, the plugin was confirmed to work normally.

Chapter 5

Results and Analysis

5.1 Loading time of two database systems on Windows 10 system

5.1.1 Loading time of Virtuoso on Windows 10 system

Figure 40 illustrates the average loading times of the Virtuoso database on the Windows 10 system, measured in seconds. Each average for an RDF data set was calculated from ten measurements; the individual results are shown in Appendix C. As can be seen in Figure 40, the average loading time for the RDF data set with 100,000 triples is approximately 1 second, while the data set with 1.5 million triples takes about 12 seconds, which is over ten times the result for the 100,000-triple data set. The average loading time rises steadily as the size of the RDF data sets increases, although there are some fluctuations along the trend. These slight fluctuations may be caused by background processes on the Windows 10 system.


Figure 40: Average loading time of Virtuoso on Windows system

5.1.2 Loading time of Neo4j on Windows 10 system

Figure 41 shows the average loading time of the Neo4j database on the Windows 10 system. The original measurements are in milliseconds, but we converted them to seconds for convenience when comparing Virtuoso and Neo4j in the next section.

As can be seen in Figure 41, the RDF data set with 100,000 triples has the minimum average loading time of approximately 6 seconds, while the data set with the maximum size of 1.5 million triples reaches a peak of nearly 153 seconds, roughly 25 times the result for the smallest RDF data set. The average loading time shows a clear increasing trend as the size of the RDF data sets rises.


Figure 41: Average loading time of Neo4j on Windows system

5.1.3 Comparison of loading time of both database systems on Windows system

Figure 42 shows the average loading times of Virtuoso and Neo4j on the Windows system. As we can see in the graph, the times measured for Neo4j rise sharply, while the results for Virtuoso grow relatively steadily in comparison. At the beginning, the difference in loading time between Virtuoso and Neo4j is small: for the RDF data set with 100k triples, both are in the range from 0 to 10 seconds. However, the difference between the two database systems increases gradually as the size of the RDF data sets increases. As can be seen in the graph, the time for the data set with 500,000 triples in Virtuoso is about 4 seconds, still in the same range as the 100k data set, while the same data set takes approximately 45 seconds in Neo4j, nearly eight times the time Neo4j needs for the 100k data set. Thus, at the 500k-triple point, the time measured in Neo4j is nearly ten times that in Virtuoso, and for the data sets with 1 million and 1.5 million triples the Neo4j time reaches over ten times the corresponding Virtuoso time. The reason for this large difference in loading time is that Neo4j needs a transformation process to convert the data from RDF to LPG, while Virtuoso, as an RDF-type graph database, can load RDF data sets directly without any extra steps.


Figure 42: Average loading time of Virtuoso and Neo4j on Windows system

However, the growth in Neo4j between the data sets with 1.2 million and 1.5 million triples gradually levels off, which may be caused by the limitations of the hardware configuration of the Windows 10 system.

5.2 Loading time of two database systems on Ubuntu system

5.2.1 Loading time of Virtuoso on Ubuntu system

Figure 43 illustrates the loading times measured in the Virtuoso database on the Ubuntu system. As can be seen in the graph, the RDF data set with 100,000 triples takes 3.4 seconds on average, and the time keeps increasing to a peak of 106.6 seconds for the RDF data set containing 3 million triples. The maximum average loading time, for the data set with the most triples, is over 30 times the minimum average loading time, for the data set with the fewest triples. The increase in average loading time for Virtuoso on the Ubuntu system is relatively even.


Figure 43: Average loading time of Virtuoso on Ubuntu system

5.2.2 Loading time of Neo4j on Ubuntu system

Figure 44 and Figure 45 show the average loading times measured in Neo4j on the Ubuntu 16.04 system. In this experiment, we planned to measure all thirty RDF data sets, from 100,000 triples to 3 million triples. Figure 44 shows the results from the data set with 100,000 triples to the one with 1.5 million triples. As can be seen in Figure 44, the average loading time increases gradually over this range. The minimum loading time is approximately 5 seconds, while the maximum average loading time is about 75 seconds, fifteen times the minimum. The upward trend of the average loading time is steady, without notable fluctuations.


Figure 44: Average loading time of Neo4j on Ubuntu system - (100k to 1.5m)

However, the loading time measured for the RDF data set containing 1.6 million triples is extremely high compared with the time for the data set with 1.5 million triples.

As we can see in Figure 45, the data set with 1.6 million triples takes approximately 750 seconds, equivalent to about 12.5 minutes for a single measurement, while the average loading time for the data set with 1.5 million triples is only about one tenth of that. We continued the measurements to verify whether this phenomenon was a special case. We tested the data sets with 1.7 million and 1.8 million triples, but the average loading time remained high, at a level similar to the result for 1.6 million triples. As can be seen in Figure 45, the average loading time reaches about 950 seconds for the data set with 1.8 million triples. Therefore, we stopped measuring after the data set with 1.8 million triples, because larger data sets would be too time-consuming. The sudden, extreme increase after the data set with 1.5 million triples may be caused by Java itself: Neo4j is built on Java and the transformation plugin is written in Java, so when a data set with too many triples is loaded, the running time is affected by the memory allocated to the Java heap.


Figure 45: Average loading time of Neo4j on Ubuntu system - (after 1.5m)
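If the Java heap is indeed the bottleneck, one mitigation would be to raise the heap limits in neo4j.conf before loading large data sets. The values below are only a hypothetical starting point, not settings we tuned in these experiments.

    # neo4j.conf (Neo4j 3.x): enlarge the JVM heap used by the database and its plugins
    dbms.memory.heap.initial_size=2g
    dbms.memory.heap.max_size=4g

    # Optionally also adjust the page cache used for the store files
    dbms.memory.pagecache.size=2g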

5.2.3 Comparison of loading time of both database systems on Ubuntu system

Figure 46 shows the loading times measured in Virtuoso and Neo4j on the Ubuntu 16.04 system. Given the results in Section 5.2.2, we only plot the Neo4j results up to the data set with 1.5 million triples in Figure 46. As can be seen in the graph, the average loading times of both Virtuoso and Neo4j trend upwards as the size of the RDF data sets increases, but the two trends differ. For the smallest data set, the average loading time of the 100k-triple data set in Neo4j is about 5 seconds, only 2 seconds more than the corresponding result in Virtuoso. However, as the size of the data sets increases, the gap between Virtuoso and Neo4j widens. At the data set containing 1 million triples, the loading time measured in Neo4j is approximately 1.5 times the corresponding time in Virtuoso, and the difference reaches about 14 seconds, roughly 7 times the difference observed at the smallest data set. This growing difference is caused by the transformation process in Neo4j: as the size of the RDF data sets increases, the time spent converting from RDF to LPG in Neo4j grows, while RDF data sets can be loaded into Virtuoso directly.


Figure 46: Average loading time of Virtuoso and Neo4j on Ubuntu system

5.3 Comparison of loading time on Windows system and Ubuntu system for Virtuoso

Figure 47 shows the average loading times of the Virtuoso database on the two operating systems. As can be seen in Figure 47, the loading times on both operating systems trend upwards, but the growth on the Ubuntu system is larger than that on the Windows system. On Windows, background applications may influence loading, whereas the background environment on the Ubuntu system is simpler.


Figure 47: Average loading time in Virtuoso on Windows system and Ubuntu system

5.4 Comparison of loading time on Windows system and Ubuntu system for Neo4j

Figure 48 illustrates the average loading times of the Neo4j database on the two operating systems. As can be seen in Figure 48, the average loading times on both the Windows system and the Ubuntu system grow as the size of the data sets increases, but the difference between the two operating systems is obvious. There are several fluctuations in the Windows trend, which may be caused by the unstable background environment of the Windows 10 system, while the Ubuntu trend shows steady growth. In addition, the loading time on the Windows system is much longer than on the Ubuntu system. On Windows 10 we used the Neo4j Desktop version with the Neo4j browser, so the corresponding LPG is rendered after executing a statement, while on Ubuntu we used Cypher-Shell commands, where results are shown as text and tables. Rendering the LPG takes more time than a text-only result, which contributes to the difference shown in Figure 48.


Figure 48: Average loading time in Neo4j on Windows system and Ubuntu system

5.5 Query time of Virtuoso on both systems

5.5.1 Query time of Virtuoso on Windows system

Figure 49 illustrates the average query time of Virtuoso on the Windows 10 system. The query statement counts the number of triples in the RDF data set, so the result of each query equals the size of the corresponding data set. In this experiment, we tested the 15 RDF data sets from the minimum size of 100k triples to the maximum size of 1.5 million triples. As can be seen in Figure 49, the average query time is about 0.03 seconds for the data set containing 100,000 triples, and the times continue to grow with the size of the data sets, reaching a peak of about 0.52 seconds for the data set containing 1.5 million triples, which is over 15 times the time for the smallest data set.


Figure 49: Average query time in Virtuoso on Windows system
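The triple-counting query has roughly the following form when run from Virtuoso's isql client; the named graph IRI below is a hypothetical placeholder for whichever graph a data set was loaded into, and the COUNT(*) shorthand is Virtuoso's aggregate syntax.

    -- Run from the isql client; the graph IRI is a hypothetical example
    SPARQL
    SELECT COUNT(*)
    FROM <http://example.org/graphs/dataset_100k>
    WHERE { ?s ?p ?o };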

5.5.2 Query time of Virtuoso on Ubuntu system

Figure 50 illustrates the average query time in Virtuoso on the Ubuntu system. In this experiment, we tested all thirty RDF data sets, with the largest containing 3 million triples. The query time increases by roughly 0.02 seconds for every additional 100k triples.

Figure 50: Average query time in Virtuoso on Ubuntu system

5.5.3 Comparison of query time on Windows system and Ubuntu system for Virtuoso

Figure 51 shows the query times measured in Virtuoso on both operating systems. As can be seen in the graph, the query times on the two operating systems trend upwards in proportion to the number of triples in the corresponding data set (i.e. the query result). However, there are some fluctuations on the Windows 10 system, while the growth on the Ubuntu system is stable. This is because the background environment on the Windows 10 system is much more complex than that on the Ubuntu system.

Figure 51: Average query time in Virtuoso on Windows system and Ubuntu system

5.6 Query time of Neo4j on both systems

5.6.1 Query time of Neo4j on Windows system

Figure 52 shows the average query time in Neo4j on the Windows system. In this experiment, the Neo4j query counts the number of distinct labels in the data set. As can be seen at the start of the graph, the RDF data set with 100k triples takes about 0.125 seconds to query, while the data set with 1.5 million triples takes about 1.348 seconds, over ten times the corresponding time for the 100,000-triple data set. The overall query time rises steadily as the size of the data sets increases.


Figure 52: Average query time in Neo4j on Windows system
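One possible Cypher formulation of such a label-counting query is sketched below; it illustrates the idea rather than necessarily being the exact statement used in the measurements.

    // Count the distinct labels that occur on nodes in the database
    MATCH (n)
    UNWIND labels(n) AS label
    RETURN count(DISTINCT label) AS distinctLabels;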

5.6.2 Query time of Neo4j on Ubuntu system

Figure 53 shows the average query time measured in Neo4j on the Ubuntu system. As we can see in the graph, the query time grows slightly as the size of the data sets increases. The average query times for the thirty RDF data sets range from 3.5 seconds to just over 5.5 seconds.

Figure 53: Average query time of Neo4j on Ubuntu system

5.6.3 Comparison of query time on Windows system and Ubuntu system for Neo4j

Figure 54 shows the average query times measured in Neo4j on both operating systems. As can be seen in Figure 54, the trends on the two operating systems are similar, and both are proportional to the number of distinct labels in the corresponding data set (i.e. the query result, shown in Appendix C). The difference in query time between any two adjacent data sets is small. Consequently, the difference between the smallest and largest data sets is not large on either operating system, which means the query times do not increase much as the size of the data sets grows.

Figure 54: Average query time in Neo4j on Windows system and Ubuntu system

Finally, comparing Figure 51 and Figure 54, we can see that the query time measured in Virtuoso increases noticeably as the size of the data sets grows, while in Neo4j the query time changes only slightly.

Chapter 6

Conclusion and Further Work

In this project, RDF data sets of different sizes were loaded into the OpenLink Virtuoso database and Neo4j on both Windows 10 and Ubuntu 16.04 systems. In the Virtuoso database, RDF data sets were loaded directly because Virtuoso is an RDF-type database, while in Neo4j the data sets had to be transformed from RDF to LPG using a transformation plugin written in Java. The transformation process shows that the data in these two graph database systems is interoperable. In addition, we measured the loading and query times of RDF data sets of different sizes in Virtuoso and Neo4j on the two operating systems. These experiments aimed to determine how loading and query times change as the size of the RDF data sets increases. The results show that Neo4j needs more time to load RDF data sets because of the transformation process, on both Windows 10 and Ubuntu 16.04, while the query times in Neo4j grow only slightly compared with Virtuoso as the data sets get larger, which means the query performance of Neo4j is not greatly influenced by the increase in data set size.

In terms of the two operating systems, the Windows 10 system served as a local testing environment that loaded relatively small data sets, while the Ubuntu 16.04 system, a remote virtual instance on Eleanor Cloud, loaded data sets up to twice the size of those used on Windows. Regarding how the two graph database systems were operated, we used isql for Virtuoso on both operating systems, but different approaches for Neo4j on each. On Windows we chose the Neo4j browser, a visual UI that displays results as LPG graphs, while on Ubuntu we used Cypher-Shell commands. This is because the Neo4j Desktop version, with its embedded Neo4j browser, is simpler and more intuitive, whereas on the Ubuntu system, a remote virtual instance on Eleanor Cloud, it is difficult to open the Neo4j web browser from localhost, so we used Cypher-Shell instead. However, this difference in approach means that some queries on Windows take relatively longer because their results are rendered as LPG graphs.

For future work, we plan to use Cypher-Shell on both operating systems, i.e. to use the Neo4j Community version from the command prompt on Windows and execute Cypher statements through Cypher-Shell, so that the time spent rendering LPG results in the Neo4j browser can be eliminated from the measurements. In addition, to make the data sets more readable, we plan to use the default example graphs in Neo4j: by extracting and exporting such a default graph, which is an LPG, into an RDF file, we could import the same data into the Virtuoso database and see the differences between an LPG and an RDF representation more clearly. Furthermore, the code of the transformation plugin could be improved for efficiency and performance based on the existing version, and the memory allocated to Java should be examined more carefully in further experiments. We also plan to carry out more experiments with additional queries in Neo4j and Virtuoso, including more readable and practically meaningful queries over both data sets.

In conclusion, the experiments suggest that Neo4j is more suitable for managing data sets of moderate size; for instance, it may suit a new company, which does not have much data at the beginning. Neo4j is also good at querying data with a complex structure. Virtuoso, on the other hand, may be more suitable for large data sets. Both types of graph database systems therefore have their own suitable areas of application.

Appendix A

Database Installation Package

A.1 OpenLink Virtuoso on Windows 10 system

Microsoft Visual C++ 2012 Redistributable (x64) - 11.0.61030

virtuoso-opensource-win-x64-20160425

A.2 Neo4j on Windows 10 system

neo4j-desktop-offline-1.0.15-setup

Appendix B

Original data sets from DBpedia

B.1 Original data sets from DBPedia

mappingbased_objects_wkd_uris_sv.ttl.bz2

– (http://downloads.dbpedia.org/2016-10/tmp/data/sv/raw/)

Appendix C

Detailed measurements results

C.1 Loading time measured in Virtuoso on Windows 10 system

C.2 Loading time measured in Neo4j on Windows 10 system

C.3 Loading time measured in Virtuoso on Ubuntu 16.04 system

C.4 Loading time measured in Neo4j on Ubuntu 16.04 system

C.5 Query time measured in Virtuoso on Windows 10 system

C.6 Query time measured in Neo4j on Windows 10 system

C.7 Query time measured in Virtuoso on Ubuntu 16.04 system

C.8 Query time measured in Neo4j on Ubuntu 16.04 system

C.9 Query results in Neo4j

References

[1] Vicknair, C., Macias, M., Zhao, Z., Nan, X., Chen, Y., & Wilkins, D. (2010, April). A comparison of a graph database and a relational database: a data provenance perspective. In Proceedings of the 48th Annual Southeast Regional Conference (p. 42). ACM.
[2] Wotring, S. C., & Ripley, J. R. (2005). U.S. Patent No. 6,853,997. Washington, DC: U.S. Patent and Trademark Office.
[3] Nayak, A., Poriya, A., & Poojary, D. (2013). Type of NOSQL databases and its comparison with relational databases. International Journal of Applied Information Systems, 5(4), 16-19.
[4] Zhang, C., Naughton, J., DeWitt, D., Luo, Q., & Lohman, G. (2001, May). On supporting containment queries in relational database management systems. In ACM SIGMOD Record (Vol. 30, No. 2, pp. 425-436). ACM.
[5] Wood, D., Zaidman, M., Ruth, L., & Hausenblas, M. (2014). Linked Data. Manning Publications Co.
[6] Angles, R. (2012, April). A comparison of current graph database models. In Data Engineering Workshops (ICDEW), 2012 IEEE 28th International Conference on (pp. 171-177). IEEE.
[7] Riesen, K., & Bunke, H. (2008, December). IAM graph database repository for graph based pattern recognition and machine learning. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (pp. 287-297). Springer, Berlin, Heidelberg.
[8] Robinson, I., Webber, J., & Eifrem, E. (2013). Graph Databases. O'Reilly Media, Inc.
[9] Lassila, O., & Swick, R. R. (1999). Resource Description Framework (RDF) model and syntax specification.
[10] CambridgeSemantics page for RDF. https://www.cambridgesemantics.com/blog/semantic-university/learn-rdf/ - Accessed on 2nd August.
[11] CambridgeSemantics page for RDF and XML. https://supportcenter.cambridgesemantics.com/semantic-university/rdf-vs-xml - Accessed on 2nd August.

[12] Matono, A., Amagasa, T., Yoshikawa, M., & Uemura, S. (2003, September). An indexing scheme for RDF and RDF Schema based on suffix arrays. In SWDB (pp. 151-168).
[13] Antoniou, G., & Van Harmelen, F. (2004). Web Ontology Language: OWL. In Handbook on Ontologies (pp. 67-92). Springer, Berlin, Heidelberg.
[14] CambridgeSemantics page for OWL ontology. https://www.cambridgesemantics.com/blog/semantic-university/learn-owl-rdfs/owl-101/ - Accessed on 3rd August.
[15] CambridgeSemantics page for RDFS and OWL. https://www.cambridgesemantics.com/blog/semantic-university/learn-owl-rdfs/ - Accessed on 2nd August.
[16] Harris, S., Seaborne, A., & Prud'hommeaux, E. (2013). SPARQL 1.1 query language. W3C Recommendation, 21(10).
[17] CambridgeSemantics page for SPARQL. https://www.cambridgesemantics.com/blog/semantic-university/learn-sparql/ - Accessed on 4th August.
[18] Erling, O. (2012). Virtuoso, a hybrid RDBMS/graph column store. IEEE Data Eng. Bull., 35(1), 3-8.
[19] Neo4j blog page for LPG and RDF graph. https://neo4j.com/blog/rdf-triple-store-vs-labeled-property-graph-difference/ - Accessed on 3rd August.
[20] Developers, N. (2012). Neo4j. Graph NoSQL Database [online].
[21] Miller, J. J. (2013, March). Graph database applications and concepts with Neo4j. In Proceedings of the Southern Association for Information Systems Conference, Atlanta, GA, USA (Vol. 2324, p. 36).
[22] Hecht, R., & Jablonski, S. (2011, December). NoSQL evaluation: A use case oriented survey. In Cloud and Service Computing (CSC), 2011 International Conference on (pp. 336-341). IEEE.
[23] Wikipedia page of Neo4j. https://en.wikipedia.org/wiki/Neo4j - Accessed on 3rd August.
[24] DBpedia introduction page. https://wiki.dbpedia.org/about - Accessed on 3rd August.
[25] World Wide Web Consortium. (2014). RDF 1.1 concepts and abstract syntax.
[26] W3C page for Turtle and N-Triples. https://www.w3.org/TeamSubmission/turtle/ - Accessed on 5th August.
[27] OpenLink page for the Bulk Loader in Virtuoso. http://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader - Accessed on 5th August.

[28] Virtuoso installation on Ubuntu. http://vos.openlinksw.com/owiki/wiki/VOS/VOSUbuntuNotes - Accessed on 5th August.
[29] Transformation plugin page. https://jbarrasa.com/2016/06/07/importing-rdf-data-into-neo4j/ - Accessed on 3rd August.
[30] GitHub page of the transformation plugin. https://github.com/jbarrasa/neosemantics/releases - Accessed on 3rd August.
[31] An example of an RDF graph. https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/ - Accessed on 7th August.
