1st National Conference on New Approaches in Electrical and Computer Engineering(NAECE2016) May.12-13th2016–Islamic Azad University Of Khorram Abad Branch

Graph Based NoSQL Databases: Abilities and Applications

Parisa Delfani (Author), Lorestan University, Khorramabad, Iran

Parastou Alavi (Author), Lorestan University, Khorramabad, Iran

Ehsan Azizi Khadem (Author), Lorestan University, Khorramabad, Iran

Abstract: Most graph databases are NoSQL in nature and store their data in a key-value store or a document-oriented database. In general terms, they can be considered key-value databases with the additional concept of relationships. Relationships allow the values in the store to be related to each other in a free-form way, as opposed to traditional relational databases, where the relationships are defined within the data itself. These relationships allow complex hierarchies to be traversed quickly, addressing one of the more common performance problems found in traditional key-value stores. Most graph databases also add the concept of tags or properties, which are essentially relationships lacking a pointer to another document. In computing, a graph database is a database that uses graph structures for semantic queries, with nodes, edges and properties to represent and store data. In this article, we first describe the graph databases Neo4j, GraphDB, Sesame, AllegroGraph and DEX, and then compare their features.

INTRODUCTION

Compared with relational databases, graph databases are often faster for associative data sets and map more directly to the structure of object-oriented applications. They can scale more naturally to large data sets, as they do not typically require expensive join operations. As they depend less on a rigid schema, they are more suitable for managing ad hoc and changing data with evolving schemas. Conversely, relational databases are typically faster at performing the same operation on large numbers of data elements.

Graph databases are a powerful tool for graph-like queries, for example computing the shortest path between two nodes in the graph. Other graph-like queries can be performed over a graph database in a natural way (for example, computing a graph's diameter or detecting communities).

Graph databases are based on graph theory. Graph databases employ nodes, properties, and edges.

• Nodes represent entities such as people, businesses, accounts, or any other item you might want to keep track of.

• Properties are pertinent information that relates to nodes. For instance, if Wikipedia were one of the nodes, one might have it tied to properties such as website, reference material, or word that starts with the letter w, depending on which aspects of Wikipedia are pertinent to the particular database.

• Edges are the lines that connect nodes to nodes, or nodes to properties, and they represent the relationship between the two. Most of the important information is stored in the edges. Meaningful patterns emerge when examining the connections and interconnections of nodes, properties, and edges.

AllegroGraph:

AllegroGraph is a graph database built around the W3C specification for the Resource Description Framework (RDF). It is designed for handling Linked Data and the Semantic Web. It supports SPARQL, RDFS++, and Prolog. AllegroGraph is a proprietary product of Franz Inc., which markets a number of Semantic Web products, including its flagship set of LISP-based development tools. The company claims Pfizer, Ford, Kodak, NASA and the Department of Defense among its AllegroGraph customers.
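The shortest-path query mentioned in the introduction can be illustrated with a minimal sketch. It is not taken from any of the products discussed in this paper: the node names, relationship labels and helper functions below are hypothetical, and the graph is kept in plain Python dictionaries rather than in a graph database. It shows how a breadth-first traversal follows edges directly instead of joining tables.

```python
from collections import deque

# Hypothetical toy data: each edge is (source node, relationship label, target node).
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "WORKS_AT", "acme"),
    ("alice", "KNOWS", "carol"),
]


def build_adjacency(edge_list):
    """Index directed edges by source node so neighbours are found without a join."""
    adjacency = {}
    for source, _label, target in edge_list:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search on an unweighted graph; returns a node list or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start node.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in adjacency[current]:
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


adjacency = build_adjacency(edges)
print(shortest_path(adjacency, "alice", "acme"))  # ['alice', 'carol', 'acme']
```

Because the edges are directed in this sketch, no path exists in the opposite direction, from "acme" to "alice".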


| Name | AllegroGraph |
|---|---|
| Description | High performance, persistent RDF store with additional support for Graph DBMS |
| Developer | Franz Inc. |
| Initial release | 2004 |
| Database as a Service (DBaaS) | No |
| Data scheme | Yes |
| APIs and other access methods | RESTful HTTP API, SPARQL |
| Supported programming languages | C#, Java, Lisp, Python, Ruby, Scala |
| Triggers | Yes |
| In-memory capabilities | No |
| Durability | Yes |
| User concepts | Users with fine-grained authorization concept, user roles and pluggable authentication |

Table-1: Noted Features of AllegroGraph

Specific characteristics:

AllegroGraph is 100 percent ACID, supporting transactions with commit, rollback, and checkpointing. Its other characteristics are:

• Triple Level Security with Security Filters.

• Gruff: graph visualization, with visual generation of SPARQL and Prolog queries.

• The AllegroGraph server can be scripted using the JavaScript API.

• Full and fast recoverability.

• 100% read concurrency, near-full write concurrency.

• Online backups, point-in-time recovery, replication, warm standby.

• Dynamic and automatic indexing: all committed triples are always indexed (7 indices).

• Advanced text indexing: text indexing per predicate.

• SOLR and MongoDB integration.

• SPIN support (SPARQL Inferencing Notation). The SPIN API allows you to define a function in terms of a SPARQL query and then call that function in other SPARQL queries. These SPIN functions can appear in FILTERs and can also be used to compute values in assignment and select expressions.

Competitive advantages:

AllegroGraph is uniquely suited to support ad hoc queries through SPARQL, Prolog and languages like JavaScript. AllegroGraph uses sorted quintuple indices that index every primary and non-primary field, so users never have to worry about whether a certain field is indexed or not. There are 6 default indices, which are in general sufficient to execute almost any ad hoc, set-based query.

One of the most powerful features of AllegroGraph is that it is possible to mix Geospatial, Temporal, Social Network Analytics, and Reasoning, all in the same query (SPARQL or Prolog). An example:

“Find whether the most important friend of Sonya (using the SNA centrality statistic) made a payment within 100 miles of Rotterdam, NY (using geospatial) within the last 10 years (temporal).”

This would be impossible to express in such a concise way in any other graph database, RDF store, document store or Hadoop-like solution.

• AllegroGraph combines Geospatial, Temporal, and SNA into a single “Golden” query.

• W3C standards based query language and data format via SPARQL and RDF.

• GRUFF: data explorer, graph visualization, graphical query generation.

• AllegroGraph has a built-in Rule Based System on top of an ISO compliant Prolog. Users can write rules and stored procedures in Prolog and make them available to other rules and/or users.

• Triple Level Security.

• True durability in AllegroGraph: many RDF and Graph DBMS systems do NOT write transaction logs for every transaction, so in essence these databases are NOT durable. Experiment: perform multiple commits per second and run the monitor "vmstat 1" to look at blocks in and out (column io, bi/bo).

System Properties Comparison AllegroGraph vs. GraphDB

| Name | AllegroGraph | GraphDB |
|---|---|---|
| Description | High performance, persistent RDF store with additional support for Graph DBMS | Graph database and RDF triplestore built on OWL standards |
| Database model | Graph DBMS, RDF store | Graph DBMS, RDF store |
| Developer | Franz Inc. | Ontotext |
| Initial release | 2004 | 2000 |
| Database as a Service (DBaaS) | No | No |
| Implementation language | | Java |
| Server operating systems | Linux, OS X, Windows | All OS with a Java VM, Linux, OS X, Windows |
| Server-side scripts | yes | Java Server Plugin |
| Partitioning methods | with Federation | None |
| Replication methods | Master-slave replication | Master-master replication |
| Consistency concepts | Immediate Consistency or Eventual Consistency depending on configuration | Eventual Consistency, Local consistency configurable in High Availability Cluster setup |
| Triggers | Yes | No |
| Market metrics | | GraphDB is the most utilized semantic triplestore for mission critical enterprise... |
| Key customers | | BBC, Press Association, Financial Times, DK, Euromoney, AstraZeneca, The British... |
| Licensing and pricing models | | GraphDB-Free is free to use. SE and Enterprise are license per CPU-Core used. Perpetual,... |

Table-2: Feature comparison of AllegroGraph and GraphDB

System Properties Comparison AllegroGraph vs. Sesame

| Name | AllegroGraph | Sesame |
|---|---|---|
| Description | High performance, persistent RDF store with additional support for Graph DBMS | Sesame is a framework for processing RDF data, supporting both memory-based and disk-based storage. |
| Database model | Graph DBMS, RDF store | RDF store |
| Developer | Franz Inc. | Aduna |
| Server operating systems | Linux, OS X, Windows | Linux, OS X, Unix, Windows |
| Data scheme | yes | yes |
| XML support | no | |
| Secondary indexes | yes | Yes |
| APIs and other access methods | RESTful HTTP API, SPARQL | Java API, RIO, Sail API, SeRQL, Sesame REST HTTP Protocol, SPARQL |
| SQL | No | No |
| Partitioning methods | with Federation | None |
| Replication methods | Master-slave replication | None |
| Consistency concepts | Immediate Consistency or Eventual Consistency depending on configuration | |
| Transaction concepts | ACID | ACID |
| User concepts | Users with fine-grained authorization concept, user roles and pluggable authentication | No |
| Specific characteristics | AllegroGraph is 100 percent ACID, supporting Transactions: Commit, Rollback, and... | |
| Competitive advantages | AllegroGraph is uniquely suited to support adhoc queries through SPARQL, Prolog... | |

Table-3: Feature comparison of AllegroGraph and Sesame

System Properties Comparison AllegroGraph vs. Neo4j

| Name | AllegroGraph | Neo4j |
|---|---|---|
| Description | High performance, persistent RDF store with additional support for Graph DBMS | Open source graph database |
| Database model | Graph DBMS, RDF store | Graph DBMS |
| Developer | Franz Inc. | Neo Technology |
| License | commercial | Open Source |
| Server operating systems | Linux, OS X, Windows | Linux, OS X, Windows |
| Triggers | yes | Yes |
| Partitioning methods | with Federation | None |
| Replication methods | Master-slave replication | Master-slave replication |
| Consistency concepts | Immediate Consistency or Eventual Consistency depending on configuration | Eventual Consistency configurable in High Availability Cluster setup, Immediate Consistency |
| Transaction concepts | ACID | ACID |
| User concepts | Users with fine-grained authorization concept, user roles and pluggable authentication | No |
| Specific characteristics | AllegroGraph is 100 percent ACID, supporting Transactions: Commit, Rollback, and... | Unlock the value of data relationships with Neo4j, a transactional database that... |
| Typical application scenarios | | Real-Time Recommendations, Master Data Management, Identity and Access Management, Network... |
| Key customers | | eBay, Walmart, Cisco, UBS, HP, CenturyLink, Telenor, TomTom, Telenor, The National... |
| Market metrics | | Neo4j boasts the world's largest graph database ecosystem with more than a million... |
| Licensing and pricing models | | GPL v3 license that can be used all the places where you might use MySQL. Neo4j Commercial... |

Table-4: Feature comparison of AllegroGraph and Neo4j

Neo4j

This is one of the most popular databases in the category, and one of the only open source options. It is the product of the company Neo Technology, which recently moved the community edition of Neo4j from the AGPL license to the GPL license. However, its enterprise edition remains under proprietary and AGPL licensing. Neo4j is ACID compliant. It is Java based but has bindings for other languages, including Ruby and Python. Neo Technology cites several customers, though none of them are household names.

System Properties Comparison GraphDB vs. Neo4j

| Name | GraphDB | Neo4j |
|---|---|---|
| Description | Graph database and RDF triplestore built on OWL standards | Open source graph database |
| Database model | Graph DBMS, RDF store | Graph DBMS |
| Developer | Ontotext | Neo Technology |
| Initial release | 2000 | 2007 |
| License | commercial | Open Source |
| Database as a Service (DBaaS) | no | No |
| Secondary indexes | realizable with Solr/Lucene/Elasticsearch connectors | Yes |
| Server-side scripts | Java Server Plugin | yes |
| Consistency concepts | Eventual Consistency, Local consistency configurable in High Availability Cluster setup | Eventual Consistency configurable in High Availability Cluster setup, Immediate Consistency |
| Specific characteristics | GraphDB Enterprise is a high-performance semantic repository created by Ontotext.... | Unlock the value of data relationships with Neo4j, a transactional database that... |
| Competitive advantages | Compliance to W3C standards, performant, extensible, scalable, high-availability... | Neo4j is the only transactional database that combines everything you need for performance... |
| Key customers | BBC, Press Association, Financial Times, DK, Euromoney, AstraZeneca, The British... | eBay, Walmart, Cisco, UBS, HP, CenturyLink, Telenor, TomTom, Telenor, The National... |
| Licensing and pricing models | GraphDB-Free is free to use. SE and Enterprise are license per CPU-Core used. Perpetual,... | GPL v3 license that can be used all the places where you might use MySQL. Neo4j Commercial... |

Table-5: Feature comparison of GraphDB and Neo4j

System Properties Comparison Neo4j vs. Sesame

| Name | Neo4j | Sesame |
|---|---|---|
| Description | Open source graph database | Sesame is a framework for processing RDF data, supporting both memory-based and disk-based storage. |
| Database model | Graph DBMS | RDF store |
| Developer | Neo Technology | Aduna |
| Initial release | 2007 | 2004 |
| Implementation language | Java | Java |
| Supported programming languages | .Net, Clojure, Go, Groovy, Java, JavaScript, Perl, PHP, Python, Ruby, Scala | Java, PHP, Python |
| Partitioning methods | none | none |
| Replication methods | Master-slave replication | none |
| Specific characteristics | Unlock the value of data relationships with Neo4j, a transactional database that... | |
| Competitive advantages | Neo4j is the only transactional database that combines everything you need for performance... | |
| Typical application scenarios | Real-Time Recommendations, Master Data Management, Identity and Access Management, Network... | |
| Licensing and pricing models | GPL v3 license that can be used all the places where you might use MySQL. Neo4j Commercial... | |

Table-6: Feature comparison of Neo4j and Sesame

FlockDB

FlockDB was created by Twitter for relationship-related analytics. Twitter's Kevin Weil talked about the creation of the database, along with Twitter's use of other NoSQL databases, at Strange Loop last year. There is no stable release of FlockDB, and there is some controversy as to whether it can truly be referred to as a graph database. Michael Marr wrote about this question in a DevWebPro article, which led MyNoSQL blogger Alex Popescu to write: “Without traversals it is only a persisted graph. But not a graph database.”

| Name | FlockDB |
|---|---|
| Database model | Graph DBMS |
| Developer | Twitter |
| Current release | 1.8.5, February 2012 |
| License | Open Source |
| Database as a Service (DBaaS) | no |

Table-7: Noted Features of FlockDB

GraphDB

GraphDB is a graph database built in .NET by the German company sones. sones was founded in 2007 and received a new round of funding earlier this year, said to be a “couple million” Euros. The community edition is available under an APL 2 license, while the enterprise edition is commercial and proprietary. It is available as a cloud service through Amazon S3 or Microsoft Azure.

| Name | GraphDB | Sesame |
|---|---|---|
| Description | Graph database and RDF triplestore built on OWL standards | Sesame is a framework for processing RDF data, supporting both memory-based and disk-based storage. |
| Database model | Graph DBMS, RDF store | RDF store |
| Developer | Ontotext | Aduna |
| Initial release | 2000 | 2004 |
| License | commercial | Open Source |
| Server operating systems | All OS with a Java VM, Linux, OS X, Windows | Linux, OS X, Unix, Windows |
| Data scheme | schema-free | Yes |
| Supported programming languages | .Net, C#, Clojure, Java, JavaScript (Node.js), PHP, Python, Ruby, Scala | Java, PHP, Python |
| Specific characteristics | GraphDB Enterprise is a high-performance semantic repository created by Ontotext.... | |
| Competitive advantages | Compliance to W3C standards, performant, extensible, scalable, high-availability... | |
| Licensing and pricing models | GraphDB-Free is free to use. SE and Enterprise are license per CPU-Core used. Perpetual,... | |

Table-8: Feature comparison of GraphDB and Sesame

| Name | Sesame |
|---|---|
| Description | Sesame is a framework for processing RDF data, supporting both memory-based and disk-based storage. |
| Database model | RDF store |
| Developer | Aduna |
| Initial release | 2004 |
| License | Open Source |
| Database as a Service (DBaaS) | no |
| Implementation language | Java |
| Server operating systems | Linux, OS X, Unix, Windows |
| Data scheme | yes |
| Typing | yes |
| Secondary indexes | yes |
| SQL | no |
| APIs and other access methods | Java API, RIO, Sail API, SeRQL, Sesame REST HTTP Protocol, SPARQL |
| Supported programming languages | Java, PHP, Python |
| Server-side scripts | yes |
| Triggers | yes |
| Partitioning methods | none |
| Replication methods | none |
| MapReduce | no |
| Transaction concepts | ACID |
| Concurrency | yes |
| Durability | yes |
| User concepts | no |

Table-9: Noted Features of Sesame
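Several of the systems compared in the tables above (AllegroGraph, GraphDB, Sesame) are RDF stores, which hold data as subject, predicate, object triples. The following is a minimal sketch of that data model only, assuming a hypothetical in-memory `TripleStore` class; it does not use or imitate the API of Sesame, AllegroGraph or any other product, and the triples are taken from the "Database model" and "Developer" rows of the tables.

```python
class TripleStore:
    """Hypothetical in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("Sesame", "type", "RDF store")
store.add("AllegroGraph", "type", "RDF store")
store.add("AllegroGraph", "type", "Graph DBMS")
store.add("Neo4j", "type", "Graph DBMS")
store.add("Sesame", "developer", "Aduna")

# Pattern "?system type 'RDF store'": which systems are RDF stores?
rdf_stores = [s for s, _p, _o in store.match(predicate="type", obj="RDF store")]
print(rdf_stores)  # ['AllegroGraph', 'Sesame']
```

A query language such as SPARQL expresses the same idea declaratively: a query is a set of triple patterns with variables, and the store returns the bindings that match.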

DEX is a new graph database defined and implemented using a combination of several specialized structures that allow efficient management of very large graphs. It fulfills the conditions of a graph database model: its data representation is in the form of a large graph; its query operations are based on graph operations or extensions to graph operations; query results are also in the form of new graphs; and there are constraints based on node and edge types, explicit and implicit relationships, and attribute domains. DEX is based on a graph database model [2] that is characterized by three properties: data structures are graphs or structures similar to a graph; data manipulation and queries are based on graph-oriented operations; and data constraints guarantee the integrity of the data and its relationships.

A DEX graph is a Labeled Directed Attributed Multigraph. It is labeled because nodes and edges in a graph belong to types; directed because it supports directed as well as undirected edges; attributed because both nodes and edges may have attributes; and a multigraph because there may be multiple edges between the same nodes, even if they are of the same edge type.
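The four properties of a labeled directed attributed multigraph can be made concrete with a small sketch. This is not DEX's implementation or API; the class and method names (`LabeledAttributedMultigraph`, `add_node`, `add_edge`, `edges_between`) and the sample data are hypothetical and only illustrate the data model described above.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    node_id: int
    node_type: str                      # "labeled": every node belongs to a type
    attributes: dict = field(default_factory=dict)


@dataclass
class Edge:
    edge_id: int
    edge_type: str                      # "labeled": every edge belongs to a type
    tail: int
    head: int
    directed: bool = True               # undirected edges are also supported
    attributes: dict = field(default_factory=dict)


class LabeledAttributedMultigraph:
    def __init__(self):
        self.nodes = {}
        self.edges = {}
        self._ids = count(1)

    def add_node(self, node_type, **attributes):
        node = Node(next(self._ids), node_type, attributes)
        self.nodes[node.node_id] = node
        return node.node_id

    def add_edge(self, edge_type, tail, head, directed=True, **attributes):
        if tail not in self.nodes or head not in self.nodes:
            raise KeyError("both endpoints must already exist")
        # Edges have their own ids, so parallel edges of one type can coexist.
        edge = Edge(next(self._ids), edge_type, tail, head, directed, attributes)
        self.edges[edge.edge_id] = edge
        return edge.edge_id

    def edges_between(self, a, b):
        """Edges usable from a to b: directed a->b, or undirected either way."""
        result = []
        for edge in self.edges.values():
            forward = edge.tail == a and edge.head == b
            backward = not edge.directed and edge.tail == b and edge.head == a
            if forward or backward:
                result.append(edge)
        return result


graph = LabeledAttributedMultigraph()
person = graph.add_node("person", name="Ann")
movie = graph.add_node("movie", title="Example")
graph.add_edge("rated", person, movie, stars=4)
graph.add_edge("rated", person, movie, stars=5)        # second edge, same type
graph.add_edge("linked", movie, person, directed=False)

print(len(graph.edges_between(person, movie)))  # 3
print(len(graph.edges_between(movie, person)))  # 1
```

The two "rated" edges between the same pair of nodes show the multigraph property, and the undirected "linked" edge is the only one that can be followed from the movie back to the person.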


REFERENCES

[1] “Short overview on the emerging world of graph databases,” http://www.graph-database.org/overview.html.

[2] Neo4j Blog, Internet: http://blog.neo4j.org/2009/04/current-database-debate-andgraph.html, 2010.

[3] Brewer, E. (2000). Towards robust distributed systems. In Proceedings of the 19th ACM Symposium on Principles of Distributed Computing, New York, NY, USA. ACM.

[4] http://readwrite.com/2011/04/20/5-graph-databases-to-consider/

[5] https://www.safaribooksonline.com/library/view/graph-databases-2nd/9781491930885/ch01.html

[6] http://franz.com/agraph/support/documentation/current/agraph-introduction.html

[7] https://fa.wikipedia.org/wiki/%D9%BE%D8%A7%DB%8C%DA%AF%D8%A7%D9%87_%D8%AF%D8%A7%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%DA%AF%D8%B1%D8%A7%D9%81

[8] R. Angles and C. Gutierrez. Survey of graph database models. Technical Report TR/DCC-2005-10, Computer Science Department, Universidad de Chile, October 2005.
