Graph Databases, GraphQL and IAM

Published 11 February 2020

Abstract

The Digital Enterprise requires speed, scale and contextual awareness across increasingly complex and diverse relationships. The Lightweight Directory Access Protocol (LDAP) has been the standard for Identity and Access Management (IAM)-centric enterprise directories for 30 years. LDAP relies on a hierarchical data model that begins with a top-level root entry, then moves to subordinate branches and ends in leaf nodes. LDAP has traditionally supported security operations by answering queries for the authentication and authorization attributes needed to make informed security decisions.

The current and future challenge is the requirement for an ever larger array of context signals (identity attributes, devices, location, source, etc.). These signals lead to complex LDAP database structures and often result in the need for meta-directory or virtual directory solutions. As a result, vendors have long been investigating non-hierarchical database models, most notably the relational database management system (RDBMS), to scale directories by attaching them to a highly performant, replication-ready database. The challenge is that relational databases also introduce complexity related to efficiently joining data across numerous rows and tables during runtime authentication and authorization processing.

This, in turn, has led to the investigation of other database alternatives, most notably GraphQL and graph databases. A graph database uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data. The “graph” relates the data items to a collection of nodes and edges, with the edges representing relationships. Such relationships allow data in the storage system to be linked together directly and, in many cases, retrieved with one operation.
This report starts by providing a graph database and GraphQL level-set, then evaluates whether this approach has long-term merit as a solid database foundation for IAM solutions in general, as well as for specific IAM use cases such as Customer IAM (CIAM).

Authors:
Doug Simmons, Principal Consulting Analyst
Archie Reed, Principal Consulting Analyst

© 2020 TechVision Research, all rights reserved
www.techvisionresearch.com

Executive Summary

A graph database is a database that uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data. A key concept of the system is the graph, which relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. Such relationships allow data in the storage system to be linked together directly and, in many cases, retrieved with one operation.

Graph databases have emerged over the past few years as reasonably good alternatives to the rigid schema of hierarchical databases such as LDAP, as well as to the complex, costly join operations inherent in relational databases. But the graph database, while increasingly popular with social networking companies such as Facebook, LinkedIn, Twitter and Google as a way to maintain complex relationships (e.g., “friends”) among end users, is a relatively new concept.

Do graph databases hold the key to large-scale directory services, alongside distributed data and contextual signal sources, in support of IAM? The answer is “most likely”. Over the course of the next 3-5 years, TechVision expects a newer breed of IAM solutions with graph database underpinnings to begin to overtake the technologies we have used for the past few decades. Our principal recommendation is that you begin to investigate this fascinating way of managing identity data soon.
In this way, you will gain a useful introduction to the universe of the graph and can better prepare your organization for the next wave of IAM solutions. For those of you in the throes of developing a new CIAM infrastructure to replace an aging or under-performing platform, we strongly recommend that you prioritize CIAM solutions that incorporate graph technology. For ‘Microsoft shops’, the writing is on the wall; it would behoove you to start your journey with an up-to-date mindset based on where both Microsoft and the IAM industry are headed.

In this report, we will describe how access control policies may lend themselves to better management within a graph database. TechVision Research expects graph database technology to grow rapidly, and in the case of IoT implementations this growth may be dramatic. There are already some very good tools on the market, so the time may be right to begin thinking about building your ‘Next-gen IAM’ solution on a graph database foundation.

In particular, graph databases are gaining popularity in support of graph-based access control (GBAC), which offers a declarative way to define access rights, task assignments, recipients and content in information systems. Access rights are granted on objects such as files or documents, but also on business objects such as an account. Compared with role-based access control (RBAC) and attribute-based access control (ABAC), GBAC has so far been shown to return run-time authorization decisions much faster (some claim more than twice as fast). Given that runtime access controls have been a challenge for those responsible for information security for the past few decades, we believe it is a good time for most enterprises to familiarize themselves with graph database and GraphQL technology, bring some flavor of it in-house and ‘experiment with it in a sandbox’.
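As a starting point for that kind of sandbox experiment, the sketch below models a GBAC-style check as a graph traversal in plain Python. It is only an illustration under stated assumptions: the `AccessGraph` class, the `MEMBER_OF` and `CAN_READ` edge labels, and the sample identities are hypothetical names chosen for this example, not part of any product or of a GBAC specification.

```python
from collections import defaultdict, deque


class AccessGraph:
    """Tiny in-memory labeled graph (hypothetical, for illustration only)."""

    def __init__(self):
        # node -> set of (relation, target) pairs
        self._edges = defaultdict(set)

    def add_edge(self, source, relation, target):
        self._edges[source].add((relation, target))

    def is_allowed(self, subject, permission, resource):
        """Return True if the subject, or any group reachable through
        MEMBER_OF edges, has a `permission` edge to `resource`."""
        seen = {subject}
        queue = deque([subject])
        while queue:
            node = queue.popleft()
            for relation, target in self._edges.get(node, ()):
                if relation == permission and target == resource:
                    return True
                # Follow group membership transitively, visiting each node once.
                if relation == "MEMBER_OF" and target not in seen:
                    seen.add(target)
                    queue.append(target)
        return False


graph = AccessGraph()
graph.add_edge("alice", "MEMBER_OF", "finance")
graph.add_edge("finance", "MEMBER_OF", "staff")
graph.add_edge("finance", "CAN_READ", "ledger")
graph.add_edge("staff", "CAN_READ", "handbook")

print(graph.is_allowed("alice", "CAN_READ", "handbook"))
print(graph.is_allowed("alice", "CAN_WRITE", "ledger"))
```

The point of the sketch is that the authorization decision is a path query: nested memberships are followed as edges rather than resolved through repeated lookups. A real graph database would express the same traversal in its own query language and index the edges for scale.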
Perhaps a small ‘tiger team’ can be formed in order to build some meaningful expertise in the use of this technology for IAM, whether consumer focused (CIAM), for improved access control policy management or for IoT scenario testing.

Introduction

The Lightweight Directory Access Protocol (LDAP) has been the industry standard for Identity and Access Management (IAM)-centric enterprise directories for almost three decades. Today, it would be difficult, if not impossible, to find an organization that does not rely on LDAP for user (and device) authentication and authorization. To make this point even stronger, consider that Microsoft Active Directory and Azure Active Directory have been built on the LDAP model since inception. Having worked in countless customer organizations for the past thirty years as IAM consultants and architects, we can assure you that LDAP has become one of the most pervasive subsystems in the history of IT.

Derived from the International Organization for Standardization’s 1988 X.500 Directory Services model, LDAP relies on a hierarchical database model that begins with a top-level root entry, branches off into subordinate branches and ends in leaf nodes. One of the principal challenges with using a hierarchical structure, or namespace, for directories is that the schema and the namespace itself often need to change in concert with business focus, organizational changes (including mergers and acquisitions), and the general evolution of computing and IAM itself. Adding to the problem, as the LDAP directories in many enterprise IAM systems have grown in size and complexity, they have become slower and less responsive. When this happens, directory performance (or lack thereof) can impact the performance of every application that depends on them.
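To make the namespace problem concrete, the following minimal sketch shows why a reorganization ripples through a hierarchical directory: an entry's distinguished name (DN) encodes its position in the tree, so moving one branch changes the name of every entry beneath it. The helper names `parent_dn` and `move_subtree` and the sample DNs are hypothetical, and the string handling is deliberately naive (it assumes no escaped commas and consistent case), so treat it as an illustration rather than an LDAP client.

```python
def parent_dn(dn):
    """Return the parent of a DN, or None for a single-component DN."""
    _, separator, parent = dn.partition(",")
    return parent if separator else None


def move_subtree(dns, old_base, new_base):
    """Map each DN at or under old_base to its new name under new_base."""
    moved = {}
    for dn in dns:
        if dn == old_base or dn.endswith("," + old_base):
            # Keep the leading components, swap the base suffix.
            moved[dn] = dn[: len(dn) - len(old_base)] + new_base
    return moved


directory = [
    "dc=example,dc=com",
    "ou=Sales,dc=example,dc=com",
    "uid=alice,ou=Sales,dc=example,dc=com",
    "uid=bob,ou=Engineering,dc=example,dc=com",
]

# A reorganization places Sales under a new EMEA branch.
renamed = move_subtree(
    directory,
    "ou=Sales,dc=example,dc=com",
    "ou=Sales,ou=EMEA,dc=example,dc=com",
)
for old, new in renamed.items():
    print(old, "->", new)
```

Every application that stored or cached one of the old DNs has to be updated after such a move, which is one reason a hierarchy that mirrors the organization chart tends to age badly.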
While the hierarchical database model used by LDAP has endured, vendors have been investigating the use of non-hierarchical database models, most notably the relational database, almost since the inception of LDAP. The attraction is that relational databases provide highly performant, replication-ready support that can also serve applications via Structured Query Language (SQL). An additional benefit of SQL is that in-house skills in it may already be abundant within the enterprise. However, relational databases carry their own complexity related to efficiently joining data across numerous rows and tables during runtime authentication and authorization processing. The need for scale and performance without the complexity of relational databases has led to further investigation into other database alternatives, most notably graph databases and GraphQL.

A key issue the industry is addressing, and the focus of this paper, is whether graph databases may be a better alternative to the traditional hierarchical LDAP or relational SQL structures. These discussions have been going on for the past several years, and graph databases are beginning to emerge as a reasonably good alternative to the rigid schema of hierarchical databases such as LDAP. TechVision consistently recommends flexibility in future-state IAM strategies, and rigid schemas are not consistent with this goal. That said, the graph database, while increasingly popular with companies such as Facebook, Netflix, Twitter, LinkedIn and Google as a way to maintain complex relationships (e.g., “friends”) amongst end users at scale, is a relatively new concept. The question we are looking to answer is this: do graph databases hold the key to large-scale directory services, alongside distributed data and signal sources, in support of IAM?
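The join overhead mentioned above can be seen in a small example. The sketch below answers one authorization question ("may this user perform this action on this resource?") first with SQL joins, using Python's standard `sqlite3` module, and then with direct relationship lookups of the kind a graph model encourages. The table names, column names and sample data are assumptions made up for this illustration; they do not come from any particular IAM product.

```python
import sqlite3

# Relational form: facts are spread across tables and recombined at query time.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE membership (user_id INTEGER, group_name TEXT);
    CREATE TABLE grant_rule (group_name TEXT, action TEXT, resource TEXT);
    INSERT INTO app_user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO membership VALUES (1, 'finance');
    INSERT INTO grant_rule VALUES ('finance', 'read', 'ledger');
    """
)


def can_access_sql(name, action, resource):
    """Two joins are needed to connect a user to a grant via a group."""
    row = conn.execute(
        """
        SELECT 1
        FROM app_user u
        JOIN membership m ON m.user_id = u.id
        JOIN grant_rule g ON g.group_name = m.group_name
        WHERE u.name = ? AND g.action = ? AND g.resource = ?
        LIMIT 1
        """,
        (name, action, resource),
    ).fetchone()
    return row is not None


# Graph-style form: the same facts held as relationships that are followed directly.
member_of = {"alice": {"finance"}}
granted = {"finance": {("read", "ledger")}}


def can_access_graph(name, action, resource):
    """Follow user -> group -> grant without recombining tables."""
    return any(
        (action, resource) in granted.get(group, set())
        for group in member_of.get(name, set())
    )


print(can_access_sql("alice", "read", "ledger"), can_access_graph("alice", "read", "ledger"))
```

Both functions give the same answer for this data. The difference shows up as the model grows: each new relationship type (device, location, delegation) typically adds another table and another join to the SQL form, whereas in a graph model it adds another edge type to follow.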
The following sections examine graph database