Graph Databases, GraphQL and IAM
Published 11 February 2020

Abstract
The Digital Enterprise requires speed, scale and contextual awareness across increasingly complex and diverse relationships. The Lightweight Directory Access Protocol (LDAP) has been the standard for Identity and Access Management (IAM)-centric enterprise directories for 30 years. LDAP relies on a hierarchical data model that begins with a top-level root entry, then moves to subordinate branches and ends in leaf nodes. LDAP has traditionally supported security operations by querying authentication and authorization attributes to make informed security decisions. The current and future challenge is the requirement for an ever larger array of context signals (identity attributes, devices, location, source, etc.). These signals, in turn, lead to complex LDAP structures and often result in the need for meta- or virtual-directory solutions.

As a result, vendors have long been investigating non-hierarchical database models, most notably the RDBMS, to support scaling directories by attaching them to highly performant, replication-ready databases. The challenge is that relational databases also introduce complexity related to efficiently joining data across numerous rows and tables during runtime authentication and authorization processing. This, in turn, has led to the investigation of other database alternatives, most notably GraphQL and graph databases. A graph database uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. The “graph” relates the data items to a collection of nodes and edges, with the edges representing relationships. Such relationships allow data in the storage system to be linked together directly and, in many cases, retrieved with one operation. This report starts by providing a graph database and GraphQL level-set, then evaluates whether this approach has long-term merit as a solid database foundation for IAM solutions in general as well as for specific IAM use cases such as Customer IAM (CIAM).

Authors:

Doug Simmons, Principal Consulting Analyst
Archie Reed, Principal Consulting Analyst

Graph Databases and IAM Simmons, Reed

© 2020 TechVision Research, all rights reserved www.techvisionresearch.com

Executive Summary
A graph database is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph, which relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. Such relationships allow data in the storage system to be linked together directly and, in many cases, retrieved with one operation. Graph databases have emerged over the past few years as reasonably good alternatives to the rigid schema of hierarchical databases such as LDAP, as well as to the complex, costly operations inherent in relational databases. But the graph database, while becoming increasingly popular with social networking solutions such as LinkedIn in order to maintain complex relationships (e.g., “friends”) among end users, is a relatively new concept.

Do graph databases hold the key to large-scale directory services, alongside distributed data and contextual signal sources, in support of IAM? The answer is “most likely”. Over the course of the next 3-5 years, TechVision expects a newer breed of IAM solutions with graph database underpinnings to begin to overtake the technologies we have used for the past few decades. Our principal recommendation is that you begin to investigate this way of managing identity data soon. In this way, you will gain a useful introduction to the universe of the graph and can better prepare your organization for the next wave of IAM solutions. For those of you in the throes of developing a new CIAM infrastructure to replace an aging or under-performing platform, we strongly recommend that you prioritize CIAM solutions that incorporate graph technology. For ‘Microsoft shops’, the writing is on the wall; it would behoove you to start your journey with an up-to-date mindset based on where both Microsoft and the IAM industry are headed.
In this report, we describe how access control policies may lend themselves to better management within a graph database. TechVision Research expects graph database technology to grow rapidly and, in the case of IoT implementations, this growth may be dramatic. There are already some very good tools on the market, so the time may be right to begin thinking about building your ‘Next-gen IAM’ solution on a graph database foundation. In particular, graph databases are gaining popularity in support of graph-based access control (GBAC), which offers a declarative way to define access rights, task assignments, recipients and content in information systems. The access rights are granted to objects such as files or documents, but also to business objects such as an account. Compared with role-based access control (RBAC) and attribute-based access control (ABAC), GBAC has so far been shown to return run-time authorization decisions much faster (some claim more than twice as fast). Given that runtime access controls have been a challenge for those responsible for information security for the past few decades, we believe it is a good time for most enterprises to familiarize themselves with graph database and GraphQL technology, bring some flavor of this in-house and ‘experiment with it in a sandbox’. Perhaps a small ‘tiger team’ can be formed in order to build some meaningful expertise in the use of this technology for IAM, whether consumer focused (CIAM), for improved access control policy management or for IoT scenario testing.

Introduction
The Lightweight Directory Access Protocol (LDAP) has been the industry standard for Identity and Access Management (IAM)-centric enterprise directories for almost three decades. Today, it would be difficult, if not impossible, to find an organization that does not rely on LDAP for user (and device) authentication and authorization. To make this point even stronger, consider that Microsoft Active Directory and Azure Active Directory have been built on the LDAP model since inception. Having worked in countless customer organizations for the past thirty years as IAM consultants and architects, we can assure you that LDAP has become one of the most pervasive subsystems in the history of IT.

Derived from the International Organization for Standardization’s 1988 X.500 Directory Services model, LDAP relies on a hierarchical data model that begins with a top-level root entry, branches off into subordinate branches and ends in leaf nodes. One of the principal challenges with using a hierarchical structure, or namespace, for directories is that the schema and namespace itself often need to change in concert with business focus, organizational changes (including mergers and acquisitions), and the general evolution of IAM itself. Adding to the problem, as the LDAP directories in many enterprise IAM systems have grown in size and complexity, they have become slower or less responsive. When this happens, directory performance (or lack thereof) can impact the performance of every application that depends on them. While the hierarchical database model used by LDAP has endured, vendors have been investigating the use of non-hierarchical database models, most notably the relational database (RDBMS), almost since the inception of LDAP.
The attraction is that relational databases provide highly performant, replication-ready support that can also serve applications via Structured Query Language (SQL). An additional benefit of SQL is that it may be an area in which an enterprise’s in-house skills are already abundant. However, relational databases retain their own level of complexity related to efficiently joining data across numerous rows and tables during runtime authentication and authorization processing. The need for scale and performance without the complexity of relational databases has led to further investigation into other database alternatives, most notably graph databases and GraphQL.

A key issue the industry is addressing, and the focus of this paper, is whether graph databases may be a better alternative to the traditional hierarchical LDAP or relational SQL structures. These discussions have been going on for the past several years, and graph databases are beginning to emerge as a reasonably good alternative to the rigid schema of hierarchical databases such as LDAP. TechVision consistently recommends flexibility in future-state IAM strategies, and rigid schemas are not consistent with this goal. That said, the graph database, while becoming increasingly popular with social networking solutions such as Facebook, Netflix, Twitter, LinkedIn and Google to maintain complex relationships (e.g., “friends”) amongst end users at scale, is a relatively new concept. The question we are looking to answer is whether graph databases hold the key to large-scale directory services, alongside distributed data and signal sources, in support of IAM. The following sections examine graph database technology and compare and contrast it with hierarchical and relational databases in support of scalable IAM solutions.

The Graph Database
So, what is a graph database? A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept is the graph, which relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. Such relationships allow data in the storage system to be linked together directly and, in many cases, retrieved with one operation. The concept of the graph database is illustrated below, courtesy of a leader in graph database technology.

Figure 1: Graph Database Concept

Graph databases hold the relationships between data as a priority. Querying relationships within a graph database is fast because the data relationships themselves are perpetually stored within the database. Furthermore, data relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.

Graph databases portray the data as it is viewed conceptually. This is accomplished by transferring the data into nodes and its relationships into edges. Figure 2 below illustrates the node and edge relationships in the graph database.

Figure 2: Nodes and Edges in the Graph Database

A graph within a graph database is a set of objects, each either a node or an edge, defined as follows:
• Nodes represent entities or instances such as people, businesses, accounts, or any other item to be tracked. They are roughly the equivalent of a record, relation, or row in a relational database, or a document in a document-store database.
• Edges, also termed relationships, are the lines that connect nodes to other nodes, representing the relationship between them. Meaningful patterns emerge when viewing the connections and interconnections of nodes, properties and edges. The edges can be either directed or undirected. In an undirected graph, an edge connecting two nodes has a single meaning. In a directed graph, the edges connecting two different nodes have different meanings depending on their direction. Edges are the key concept in graph databases, representing an abstraction that is not directly implemented in either a hierarchical or a relational model.
• Properties are information germane to nodes. For example, if TechVision Research were one of the nodes, it might be tied to properties such as website, research documents, or words that start with the letter T, depending on which aspects of TechVision Research are germane to a given database.
The concepts of nodes, edges and properties are further illustrated below.
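
The node, edge and property definitions above can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular graph database is implemented; the class names (Node, Edge, PropertyGraph), the labels (MEMBER_OF, REPORTS_TO) and the sample identities are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity to be tracked, such as a person, group, or account."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labelled relationship from one node to another."""
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> Node
        self.edges = []   # list of Edge

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label, **properties):
        # An edge only makes sense between nodes that already exist.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(Edge(source, target, label, properties))

    def outgoing(self, node_id, label=None):
        """Targets of edges leaving node_id, optionally filtered by label."""
        return [e.target for e in self.edges
                if e.source == node_id and (label is None or e.label == label)]


graph = PropertyGraph()
graph.add_node("alice", kind="person", department="Finance")
graph.add_node("bob", kind="person")
graph.add_node("finance", kind="group")
graph.add_edge("alice", "finance", "MEMBER_OF", since=2019)
graph.add_edge("bob", "alice", "REPORTS_TO")

print(graph.outgoing("alice"))              # ['finance']
print(graph.outgoing("bob", "REPORTS_TO"))  # ['alice']
```

Because the edges are directed, "bob reports to alice" does not imply the reverse: asking for alice's outgoing REPORTS_TO edges returns an empty list.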

Figure 3: Graph Database Drilldown

It is important to note that the underlying storage mechanism of graph databases can vary. Some implementations utilize a relational database to store the graph data in a table (note that a table is a logical element, meaning this approach imposes another level of abstraction between the graph database, the graph database management system and the physical devices where the data is actually stored). Other graph database implementations use a key-value store or document-oriented database for storage, making them inherently NoSQL structures. An example of a NoSQL database that utilizes this method is ArangoDB, a native multi-model database that supports graphs as one of its data models. It stores graphs by holding edges and nodes in separate collections of documents. A node is represented like any other document in the store, but edges that link two different nodes hold special “linking” attributes inside each document.

Data lookup performance is dependent on the access speed from one node to another. The concept of index-free adjacency is important to understanding how graph databases work. Graph databases look up adjacent nodes in a graph via a direct walk of memory (i.e., pointer hopping), which currently is the fastest way computers can look at data relationships. Therefore, graph databases utilize direct physical RAM addresses for each node in the graph. Each node’s RAM address is a pointer that is created when data is loaded into the graph database, not when the data is queried. This means there is no need for an index (or many indices) to look up data relationships; they are hard-coded within each node.

Figure 4: Index-free Adjacency Means Hard-coding Data Relationships

As a result, index-free adjacency usually requires the nodes to have direct physical RAM addresses and to physically point to other adjacent nodes to enable extremely fast data retrieval. Native graph databases use index-free adjacency to process create, read, update and delete (CRUD) operations on the stored data. A native graph system with index-free adjacency does not have to move through any other type of data structure to find links between the nodes. Directly related nodes in a graph are stored in the cache once one of the nodes is retrieved, making the data lookup even faster than the first time a user fetches a node. However, this advantage comes at a cost: index-free adjacency sacrifices the efficiency of queries that do not use such graph traversals.
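
The difference between pointer hopping and an index lookup can be sketched as follows. This is a simplified Python illustration, in which object references stand in for the physical RAM addresses described above; the names (Node, edge_table, two_hops) and the sample data are hypothetical and do not reflect any product's internals.

```python
class Node:
    """A node that holds direct references to its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # object references, fixed at load time

    def link(self, other):
        self.neighbours.append(other)


# Relationships as they might arrive at load time: (source, target) pairs.
edge_table = [("alice", "finance"), ("finance", "ledger-app"), ("bob", "finance")]


def build_index(edges):
    """Index-based alternative: a lookup structure kept apart from the nodes."""
    index = {}
    for source, target in edges:
        index.setdefault(source, []).append(target)
    return index


def neighbours_via_index(index, name):
    # Every hop consults the shared index by key.
    return index.get(name, [])


# Load phase: the name-to-node dictionary is used only here, to wire references.
nodes = {name: Node(name) for name in ("alice", "bob", "finance", "ledger-app")}
for source, target in edge_table:
    nodes[source].link(nodes[target])


def two_hops(start):
    """Follow references two steps out without consulting any index."""
    return sorted({second.name
                   for first in start.neighbours
                   for second in first.neighbours})


index = build_index(edge_table)
print(two_hops(nodes["alice"]))              # ['ledger-app']
print(neighbours_via_index(index, "alice"))  # ['finance']
```

Once the references are wired, a traversal never returns to a shared lookup structure. The trade-off noted above also shows up here: a query that is not a traversal, such as "find the node named ledger-app", gains nothing from the references and still needs some other lookup.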

As discussed, graph databases are part of the family of NoSQL databases created to address the limitations that exist with relational databases. While the graph model explicitly lays out the dependencies between nodes of data, the relational model and other NoSQL database models link the data by implicit connections. Graph databases, by design, allow simple and fast retrieval of complex hierarchical structures that are difficult to model in relational systems. Retrieving data from a graph database requires a query language other than SQL, which was designed for the manipulation of data in a relational system and therefore cannot efficiently handle graph traversal. At present, no single graph query language has been universally adopted in the same way as SQL was for relational databases, and there is a wide variety of systems, most often tightly tied to one product. Some standardization efforts have occurred, leading to multi-vendor query languages like SPARQL and GraphQL, discussed in more detail below. In addition to having query language interfaces, many graph databases are accessed through application programming interfaces (APIs).

The underlying storage mechanism of graph databases can vary. Some depend on a relational engine and “store” the graph data in a table. However, a table is a logical element, so this approach imposes another level of abstraction between the graph database, the graph database management system and the physical devices where the data is actually stored. Such an approach can hinder the performance of a graph by imposing this overhead. Others, however, use a key-value store or document-oriented database for storage, making them inherently NoSQL structures. As you might suspect, such non-relational underlying databases are able to function at the ‘true speed of the graph’, which means much faster, because of the lack of overhead.
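
The document-oriented storage approach, with nodes and edges held in separate collections and each edge document carrying “linking” attributes, can be sketched with plain Python dictionaries. The `_from` and `_to` attribute names follow the convention ArangoDB uses for edge documents; everything else here (the collection contents, the `type` field and the helper functions) is a hypothetical simplification and not ArangoDB's API.

```python
# Two "collections" of documents, modelled as plain lists of dicts.
nodes = [
    {"_id": "users/alice", "name": "Alice"},
    {"_id": "users/bob", "name": "Bob"},
    {"_id": "groups/finance", "name": "Finance"},
]
edges = [
    # Each edge is itself a document whose linking attributes name two nodes.
    {"_from": "users/alice", "_to": "groups/finance", "type": "member_of"},
    {"_from": "users/bob", "_to": "groups/finance", "type": "member_of"},
]

# Key-value style access to node documents by identifier.
nodes_by_id = {doc["_id"]: doc for doc in nodes}


def linked_documents(node_id, edge_type):
    """Node documents reached by following edges of edge_type out of node_id."""
    return [nodes_by_id[edge["_to"]]
            for edge in edges
            if edge["_from"] == node_id and edge["type"] == edge_type]


def members_of(group_id):
    """Names of nodes with a member_of edge pointing at group_id."""
    return sorted(nodes_by_id[edge["_from"]]["name"]
                  for edge in edges
                  if edge["_to"] == group_id and edge["type"] == "member_of")


print([doc["name"] for doc in linked_documents("users/alice", "member_of")])  # ['Finance']
print(members_of("groups/finance"))  # ['Alice', 'Bob']
```

The sketch scans the whole edge list on every call for brevity; a real store would index the linking attributes rather than scan.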
As previously discussed, most graph databases based on non-relational storage engines also add the concept of tags or properties, which are essentially relationships having a pointer to another document. As we have said, graph databases allow data elements to be categorized for easy retrieval at large scale, which will be particularly important as we evolve toward IoT security/management and Zero Trust infrastructure reliant upon such scale and performance.
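
To connect this to the graph-based access control (GBAC) idea raised in the Executive Summary, the sketch below treats an authorization decision as a graph traversal: a subject is allowed if it can reach, through membership relationships, a node that holds the requested permission on the resource. This is one simple reading of the idea offered under our own assumptions, not a GBAC specification; the relationship names (member_of, part_of, can_read) and the sample data are hypothetical.

```python
from collections import deque

# Directed edges as (source, relationship, target) triples.
EDGES = [
    ("alice", "member_of", "finance"),
    ("finance", "part_of", "corporate"),
    ("finance", "can_read", "ledger"),
    ("corporate", "can_read", "handbook"),
    ("bob", "member_of", "engineering"),
]

# Relationships a subject may follow to inherit rights.
MEMBERSHIP = {"member_of", "part_of"}


def build_adjacency(edges):
    """Group outgoing (relationship, target) pairs by source node."""
    adjacency = {}
    for source, relationship, target in edges:
        adjacency.setdefault(source, []).append((relationship, target))
    return adjacency


def is_allowed(adjacency, subject, permission, resource):
    """Breadth-first search from subject along membership edges."""
    seen = {subject}
    queue = deque([subject])
    while queue:
        current = queue.popleft()
        for relationship, target in adjacency.get(current, []):
            if relationship == permission and target == resource:
                return True
            if relationship in MEMBERSHIP and target not in seen:
                seen.add(target)
                queue.append(target)
    return False


adjacency = build_adjacency(EDGES)
print(is_allowed(adjacency, "alice", "can_read", "ledger"))    # True
print(is_allowed(adjacency, "alice", "can_read", "handbook"))  # True
print(is_allowed(adjacency, "bob", "can_read", "ledger"))      # False
```

Access rights are declared once as edges, and moving a user between groups means changing a single membership edge rather than restructuring a namespace.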

While there can be differences below the GraphQL interface, there is generally only one query target. So, unlike REST APIs, GraphQL allows, and in fact requires, a client-side call to specify the properties of the query across objects. This is more flexible than an RDBMS or LDAP server, and even more so than a strict API, Interface Definition Language (IDL) or strongly defined schema. The approach is as simple as is illustrated below:

Source: https://ldapwiki.com/wiki/GraphQL
Figure 5: GraphQL Queries

Note that graph databases differ from graph compute engines. Graph databases are technologies that are translations of the relational online transaction processing (OLTP) databases. On the other hand, graph compute engines are used in online analytical processing (OLAP) for bulk analysis. Graph databases have attracted considerable attention over the past 20+ years due to the successes of major technology corporations such as Netflix, Facebook, LinkedIn, and many others using proprietary graph databases, and the introduction of open-source graph databases.
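
The requirement that a GraphQL client name the properties it wants, described above, can be mimicked in plain Python. The sketch below is not a GraphQL implementation and uses no GraphQL library: QUERY_TEXT shows what such a query might look like against a hypothetical schema, and the select function applies the same field selection, written as a nested dictionary, to sample data that is also hypothetical.

```python
# Illustrative GraphQL query text against a hypothetical schema:
# the client, not the server, names the fields to be returned.
QUERY_TEXT = """
{
  user(id: "alice") {
    name
    groups {
      name
    }
  }
}
"""

# The same selection as a nested dict: None marks a plain field,
# a nested dict marks a sub-selection on a related object.
SELECTION = {"name": None, "groups": {"name": None}}

# Everything the backend knows about the user (sample data).
USER = {
    "name": "Alice",
    "department": "Finance",
    "groups": [
        {"name": "Finance", "costCentre": "F-100"},
        {"name": "Audit", "costCentre": "A-200"},
    ],
}


def select(data, selection):
    """Return only the fields named in selection, recursing into related objects."""
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field_name, sub_selection in selection.items():
        value = data[field_name]
        result[field_name] = value if sub_selection is None else select(value, sub_selection)
    return result


print(select(USER, SELECTION))
# {'name': 'Alice', 'groups': [{'name': 'Finance'}, {'name': 'Audit'}]}
```

Fields the client did not ask for (department, costCentre) never leave the backend, and a different client can request a different shape from the same single query target.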

About TechVision
World-class research requires world-class consulting analysts and our team is just that. Gaining value from research also means having access to research. All TechVision Research licenses are enterprise licenses; this means everyone who needs access to content can have it. We know major technology initiatives involve many different skillsets across an organization, and limiting content to a few can compromise the effectiveness of the team and the success of the initiative. Our research leverages our team’s in-depth knowledge as well as their real-world consulting experience. We combine great analyst skills with real-world client experiences to provide a deep and balanced perspective.

TechVision Consulting builds off our research with specific projects to help organizations better understand, architect, select, build, and deploy infrastructure technologies. Our well-rounded experience and strong analytical skills help us separate the “hype” from the reality. This provides organizations with a deeper understanding of the full scope of vendor capabilities and product life cycles, and a basis for making more informed decisions. We also support vendors in areas such as product and strategy reviews and assessments, requirement analysis, target market assessment, technology trend analysis, go-to-market plan assessment, and gap analysis.

TechVision Updates will provide regular updates on the latest developments with respect to the issues addressed in this report.

About the Authors
Doug Simmons brings more than 25 years of experience in IT security, risk management and identity and access management (IAM). Doug holds a double major in Computer Science and Business Administration. While leading consulting at Burton Group for 10 years and security and identity management consulting at Gartner for 5 years, Doug performed hundreds of engagements for large enterprise clients in multiple vertical industries including financial services, health care, higher education, federal and state government, manufacturing, aerospace, energy, utilities and critical infrastructure.

Archie Reed has a career of more than 25 years spanning many evolutions of the technology industry, from identity management to cloud computing, from machine learning to DevOps security. He worked on the early development of standards in OASIS, IETF and other standards groups, introduced the early concepts of “Context Based Identity Management”, which formed the basis of many identity-based security solutions, and was engaged in the early evolution of the Cloud Security Alliance. Archie has authored a number of well-referenced books, including “The Definitive Guide to Identity Management” and “Silver Clouds, Dark Linings: The Executive Guide to Cloud Computing”.
