InBrief
Neo4j

Philip Howard – Research Director, Information Management
Daniel Howard – Senior Researcher

Neo4j
111 E 5th Avenue, San Mateo, CA 94401, USA
Tel: +1 855 636 4532
Email: [email protected]
www..com

The company

Neo4j Inc (previously Neo Technologies) was founded in 2000 in Sweden, although it is now based in the United States. Outside of these two countries the company also has offices in the UK, Germany, France and Japan. The company’s eponymous product is available in both Community and Enterprise Editions, both on-premises and via the Google, Amazon and Microsoft Azure cloud platforms. The company has a significant partner base, as illustrated in Figure 1. Notable amongst these is Pitney Bowes, which embeds Neo4j within its MDM offering. It is also worth mentioning Structr.org, which is an open source, graph-based (Neo4j) low code development and runtime environment for mobile and web applications.

Mutable Quadrant (axes: Creativity, Scale, Execution, Technology): The image in this Mutable Quadrant is derived from 13 high level metrics; the more the image covers a section the better. Execution metrics relate to the company, Technology to the product, Creativity to both technical and business innovation, and Scale covers the potential business and market impact.

Figure 1 – Neo4j partner base

What is it?

Neo4j is a labelled property graph database with a native engine that is targeted at operational and hybrid operational/analytic (HTAP) use cases, as illustrated in Figure 2. It is ACID compliant and supports immediate consistency. Most users (see below) employ Cypher or OpenCypher (the open source version), which is the declarative language developed by Neo4j. It is notable that SAP, Redis, Memgraph and others have adopted OpenCypher and it is also being used within several open source projects including Cypher for Apache Spark and Cypher for Gremlin, as well as in research projects like InGraph for streaming queries. As with any declarative language this is best implemented along with a database optimiser and the company has devoted considerable resources to this, extending beyond an original rules-based optimiser so that it is now primarily cost-based, supporting optimisation for writes as well as reads.

Figure 2 – Neo4j as an HTAP database

What does it do?

Historically, Neo4j has prioritised performance over scale but over the last couple of years it has put significant emphasis on scalability. This started with support for read replicas and, in the latest release (3.4), the implementation of full horizontal multi-cluster scaling. Also in the 3.4 release the product supports native string indexes, which will improve write performance; 3D geospatial search; a bulk data loader; security as applied against property values; a new date/time datatype; and faster Cypher run-times.
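To make the labelled property graph model and the declarative, pattern-based style of querying mentioned above more concrete, the following is a minimal sketch in plain Python. It is not Neo4j's engine or API: the class names (`Node`, `Relationship`, `PropertyGraph`), the `match` helper and the sample data are all hypothetical and chosen purely for illustration. In Cypher, the same question would be expressed roughly as `MATCH (p:Person)-[:WORKS_AT]->(c:Company) RETURN p.name`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node carrying a set of labels and key/value properties."""
    node_id: int
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge that may carry its own properties."""
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory labelled property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, frozenset(labels), properties)

    def add_relationship(self, start, end, rel_type, **properties):
        self.relationships.append(Relationship(start, end, rel_type, properties))

    def match(self, start_label, rel_type, end_label):
        """Return (start, end) node pairs matching
        (:start_label)-[:rel_type]->(:end_label)."""
        pairs = []
        for rel in self.relationships:
            a, b = self.nodes[rel.start], self.nodes[rel.end]
            if (rel.rel_type == rel_type
                    and start_label in a.labels
                    and end_label in b.labels):
                pairs.append((a, b))
        return pairs


# Hypothetical sample data: two people, one company.
graph = PropertyGraph()
graph.add_node(1, ["Person"], name="Ann")
graph.add_node(2, ["Person"], name="Bob")
graph.add_node(3, ["Company"], name="Acme")
graph.add_relationship(1, 3, "WORKS_AT", since=2015)
graph.add_relationship(2, 3, "WORKS_AT", since=2018)
graph.add_relationship(1, 2, "KNOWS")

# Who works at a company? The caller states the pattern, not the traversal.
employees = [a.properties["name"]
             for a, _ in graph.match("Person", "WORKS_AT", "Company")]
print(employees)  # prints ['Ann', 'Bob']
```

The sketch scans every relationship linearly. A real graph database would instead rely on indexes and a query optimiser of the kind discussed above to decide how to execute the pattern.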

© Bloor 2019

Chart labels: Analytics, Language, Ease of Use, Operations, Features, Performance, Integration, Scalability

Other major developments in Neo4j include support for high performance graph algorithms; preparatory work to leverage new hardware capabilities such as Intel Optane and IBM Power 9; integration and the introduction of Neo4j Bloom, which was released in May 2018. This provides a visualisation and communication interface for non-technical users so that they can explore, edit and search graphs, and create storyboards. An illustration of Neo4j Bloom is provided in Figure 3.

Figure 3 – The Neo4j Bloom interface

Unusually for a property graph, SPARQL is supported. So too is Gremlin (part of the Apache Tinkerpop project). However, the emphasis is on Cypher and the company plans to introduce a “Cypher for Gremlin” capability in addition to the “Cypher for Spark” capability that is already available. In the context of the latter the company currently has a product in alpha testing called Morpheus for Apache Spark. This is intended to allow graph analytics within a data lake (Hadoop and Hive) with in-memory graphs, graph storage within Neo4j and high-speed data transfer between the two environments. In the future, Morpheus will support any source supported by the open source Kettle data integration suite.

Also, importantly, the company has introduced a GQL (graph query language) Manifesto as a step towards having a common standard for graph databases. This has been proposed to the ANSI SQL committee for approval and is supported by a range of technology vendors including Talend, SAP, Tableau and others. This proposal is running in parallel to the ANSI SQL property graph extension program.

“I’d like to comment on Neo4j’s scalability and capability of looking at millions and millions of nodes. We have a “big data” problem — not only in structured data, but in unstructured data — and we are continually gathering more data. At NASA, my focus right now is on the unstructured data. And I need a product or an application that can go across and develop millions if not billions of nodes, connect that information and at fast speeds. Neo4j is that tool.”
NASA

Why should you care?

Neo4j is the clear market leader in the graph space. It has the most users, it uses a widely adopted language (not just by Neo4j but also many other suppliers of graph databases) that is much easier to use than Gremlin and, in many respects, it has consistently been a lot more innovative than its competitors. This is in part because of the maturity of the product and partly because its success has meant that it has the resources to introduce such developments more quickly. Its competitors have historically argued that the product did not scale well but the multi-clustering now available should knock that argument on its head. Some vendors that specialise in analytics will claim that they can outperform Neo4j and this may be valid, but Neo4j does not have this limited focus: it is, in effect, the Oracle or SQL Server of the graph database world. It is not the equivalent of Teradata. In other words, it is a general-purpose graph database, and it is no coincidence that it is the leading product in that space. In this context, we should comment that we have little faith in the various benchmarks published by other vendors; not least because they are not typically comparing apples with apples.

“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code. At the same time, Neo4j allowed us to add functionality that was previously not possible.”
eBay Shutl

The Bottom Line

Whenever we talk to a vendor in the graph database space it is Neo4j they compare themselves to. Even if they do something different and address a different market, Neo4j is the benchmark. In pretty much every instance Neo4j should be on your short list.
