A Mechanized Formalization of Graphql

A Mechanized Formalization of GraphQL Tomás Díaz Federico Olmedo Éric Tanter IMFD Chile Computer Science Department Computer Science Department [email protected] University of Chile & IMFD Chile U. of Chile & IMFD Chile & Inria Paris [email protected] [email protected] Abstract 1 Introduction GraphQL is a novel language for specifying and querying GraphQL is an increasingly popular language to define in- web APIs, allowing clients to flexibly and efficiently retrieve terfaces and queries to data services. Originally developed data of interest. The GraphQL language specification is un- internally by Facebook as an alternative to RESTful Web Ser- fortunately only available in prose, making it hard to de- vices [21], GraphQL was made public in 2015, along with velop robust formal results for this language. Recently, Har- a reference implementation1 and a specification—both of tig and Pérez proposed a formal semantics for GraphQL which have naturally evolved [16]. Since early 2019, as a in order to study the complexity of GraphQL queries. The result of its successful adoption by major players in indus- semantics is however not mechanized and leaves certain try, GraphQL is driven by an independent foundation2. The key aspects unverified. We present GraphCoQL, the first key novelty compared to traditional REST-based services is mechanized formalization of GraphQL, developed in the that tailored queries can be formulated directly by clients, Coq proof assistant. GraphCoQL covers the schema defini- allowing a very precise selection of which data ought to be tion DSL, query definitions, validation of both schema and sent back as response. This supports a much more flexible queries, as well as the semantics of queries over a graph and efficient interaction model for clients of services, who data model. We illustrate the application of GraphCoQL by do not need to gather results of multiple queries on their formalizing the key query transformation and interpretation own, possibly wasting bandwidth with unnecessary data. techniques of Hartig and Pérez, and proving them correct, af- The official GraphQL specification, called Spec hereafter, ter addressing some imprecisions and minor issues. We hope covers the definition of interfaces, the query language, and that GraphCoQL can serve as a solid formal baseline for the validation processes, among other aspects. The specifica- both language design and verification efforts for GraphQL. tion undergoes regular revisions by an open working group, which discusses extensions and improvements, as well as CCS Concepts • Information systems → Web services; addressing specification ambiguities. Indeed, as often hap- Query languages; • Theory of computation → Semantics pens, the Spec is an informal specification written in natural and reasoning. language. Considering the actual vibrancy of the GraphQL Keywords GraphQL, mechanized metatheory, Coq. community, sustained by several implementations in a vari- ety of programming languages and underlying technologies, ACM Reference Format: having a formal specification ought to bring some welcome Tomás Díaz, Federico Olmedo, and Éric Tanter. 2020. A Mechanized clarity for all actors. Formalization of GraphQL. In Proceedings of the 9th ACM SIGPLAN Recently, Hartig and Pérez [18] proposed the first (and International Conference on Certified Programs and Proofs (CPP ’20), so far only) formalization of GraphQL, called H&P here- January 20–21, 2020, New Orleans, LA, USA. ACM, New York, NY, after. H&P is a formalization “on paper” that was used to USA, 14 pages. https://doi.org/10.1145/3372885.3373822 prove complexity boundaries for GraphQL queries. Having a mechanized formalization would present many additional ∗This work is partially funded by ERC Starting Grant SECOMP (715753). benefits, such as potentially providing a faithful reference implementation, and serving as a solid basis to prove formal Permission to make digital or hard copies of all or part of this work for results about the GraphQL semantics. personal or classroom use is granted without fee provided that copies For instance, the complexity results of Hartig and Pérez are not made or distributed for profit or commercial advantage and that rely on two techniques: a) transforming queries to equivalent copies bear this notice and the full citation on the first page. Copyrights queries in some normal form and, interpreting queries for components of this work owned by others than the author(s) must b) be honored. Abstracting with credit is permitted. To copy otherwise, or in a simplified but equivalent definition of the semantics. republish, to post on servers or to redistribute to lists, requires prior specific However, Hartig and Pérez do not provide an algorithmic permission and/or a fee. Request permissions from [email protected]. definition of query normalization, let alone proving itcor- CPP ’20, January 20–21, 2020, New Orleans, LA, USA rect and semantics preserving; nor do they prove that the © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-7097-4/20/01...$15.00 1https://github.com/graphql/graphql-js https://doi.org/10.1145/3372885.3373822 2https://foundation.graphql.org/ CPP ’20, January 20–21, 2020, New Orleans, LA, USA Tomás Díaz, Federico Olmedo, and Éric Tanter type Artist type Animation implements Movie simplified semantics is equivalent to the original one when { { applied to normalized queries. id: ID id: ID name: String title: String This work presents the first mechanized formaliza- artworks(role:Role): [Artwork] year: Int tion of GraphQL, carried out in the Coq proof assistant [13], } cast: [Artist] e style: Style called GraphCoQL (|græf·co·k l|). GraphCoQL currently in- enum Role { } cludes the definition of the Schema DSL, query definition, ACTOR DIRECTOR enum Style { schema and query validation, and the semantics of queries, WRITER 2D defined over a graph data model asin H&P (§3). } 3D STOPMOTION As well as precisely capturing the semantics of GraphQL, union Artwork = Fiction } GraphCoQL makes it possible to specify and prove the cor- | Animation | Book type Book rectness of query transformations, as well as other exten- { sions and optimizations made to the language and its algo- interface Movie id: ID { title: String rithms. We illustrate this by studying H&P’s notion of query id: ID year: Int normalization (§4). Specifically, we define an algorithmic title: String ISBN: String year: Int author: Artist presentation of normalization, which we then prove correct cast: [Artist] } and semantics preserving. We also formalize and prove the } type Query { equivalence between the original query evaluation semantics type Fiction implements Movie artist(id:ID): Artist and the simplified semantics used by H&P for normalized { movie(id:ID): Movie id: ID } queries. In the process, we address some imprecisions and title: String minor issues that Coq led us to uncover. year: Int schema { cast: [Artist] query: Query We discuss the limitations and validation of GraphCoQL, } } including its trustworthiness, in §5. We hope that GraphCoQL can serve as a starting point for a formal specification of Figure 1. Example of GraphQL schema. GraphQL from which reference implementations can be extracted. Although we have not yet experimented with ex- The first step when defining a GraphQL traction, GraphCoQL facilitates this vision by relying on GraphQL schema. API is to define the service’s schema, describing the type boolean reflection as much as possible. system and its capabilities. The type system describes how We briefly introduce GraphQL in §2. We discuss related data is structured and precisely what can be requested and work in §6 and conclude in §7. The fundamental challenge received. A schema consists of a set of types, which can of the Coq formalization of GraphQL in comparison to sim- have declared fields. Only fields that are part of such type ilar formalization efforts (e.g. [1, 2, 4]) resides in the non- definitions can be accessed via a GraphQL query. In con- structural nature of most definitions related to GraphQL trast to traditional data management systems, field accesses queries. In particular, as we will see, both query semantics in GraphQL are handled by the backend via user-defined and query normalization are defined by well-founded induc- functions (called resolvers). tion on a notion of query size (§ 3.3). Figure1 depicts the schema for the Artists dataset, which GraphCoQL and the results presented in this paper have contains artists and their artworks. The definition is done been developed in Coq v.8.9.1 and are available online on using the GraphQL Schema Definition Language. The object GitHub3. GraphCoQL extensively uses the Mathematical type Artist declares three fields: an identifier, a name, and Components [20], Ssreflect [15] and Eqations [22] li- a list of artworks in which the artist has participated. The braries. This work is based on the latest GraphQL specifica- list of artworks is declared of type [Artwork]. Note that tions, dated June 2018 [16]. the field may receive an argument, role, which can be used to select only those artworks for which the author plays a 2 A Brief Introduction to GraphQL certain role, such as being the director. Possible roles are We briefly introduce GraphQL by means of the running modeled with the enum type Role, containing three scalar example that we use in the rest of the presentation. The values. An Artwork is a disjoint union of fictional movies, example is about a fictional dataset Artists, containing in- animated movies and books. Both fictional and animated formation about artists and the artworks they have been movies are defined as object types, namely Fiction and involved in, particularly movies and books. We discuss the Animation, implementing the same interface type Movie.4 dataset schema, the underlying data model and the query An object implementing an interface may add more fields, evaluation. such as the field style in the type Animation. To model the animation style, we use the enum type Style containing 4Note that GraphQL still requires objects to redeclare the implemented 3https://github.com/imfd/GraphCoQL fields.

A Mechanized Formalization of Graphql

Empirical Study on the Usage of Graph Query Languages in Open Source Java Projects

Graphql Attack

Graphql-Tools Merge Schemas

Red Hat Managed Integration 1 Developing a Data Sync App

Dedalus: Datalog in Time and Space

IBM Filenet Content Manager Technology Preview: Content Services Graphql API Developer Guide

A Deductive Database with Datalog and SQL Query Languages

DATALOG and Constraints

Graphql at Enterprise Scale a Principled Approach to Consolidating a Data Graph

Easy Web API Development with SPARQL Transformer

Constraint-Based Synthesis of Datalog Programs

Schemas and Types for JSON Data