A Mechanized Formalization of Graphql

Total Page:16

File Type:pdf, Size:1020Kb

A Mechanized Formalization of Graphql A Mechanized Formalization of GraphQL Tomás Díaz Federico Olmedo Éric Tanter IMFD Chile Computer Science Department Computer Science Department [email protected] University of Chile & IMFD Chile U. of Chile & IMFD Chile & Inria Paris [email protected] [email protected] Abstract 1 Introduction GraphQL is a novel language for specifying and querying GraphQL is an increasingly popular language to define in- web APIs, allowing clients to flexibly and efficiently retrieve terfaces and queries to data services. Originally developed data of interest. The GraphQL language specification is un- internally by Facebook as an alternative to RESTful Web Ser- fortunately only available in prose, making it hard to de- vices [21], GraphQL was made public in 2015, along with velop robust formal results for this language. Recently, Har- a reference implementation1 and a specification—both of tig and Pérez proposed a formal semantics for GraphQL which have naturally evolved [16]. Since early 2019, as a in order to study the complexity of GraphQL queries. The result of its successful adoption by major players in indus- semantics is however not mechanized and leaves certain try, GraphQL is driven by an independent foundation2. The key aspects unverified. We present GraphCoQL, the first key novelty compared to traditional REST-based services is mechanized formalization of GraphQL, developed in the that tailored queries can be formulated directly by clients, Coq proof assistant. GraphCoQL covers the schema defini- allowing a very precise selection of which data ought to be tion DSL, query definitions, validation of both schema and sent back as response. This supports a much more flexible queries, as well as the semantics of queries over a graph and efficient interaction model for clients of services, who data model. We illustrate the application of GraphCoQL by do not need to gather results of multiple queries on their formalizing the key query transformation and interpretation own, possibly wasting bandwidth with unnecessary data. techniques of Hartig and Pérez, and proving them correct, af- The official GraphQL specification, called Spec hereafter, ter addressing some imprecisions and minor issues. We hope covers the definition of interfaces, the query language, and that GraphCoQL can serve as a solid formal baseline for the validation processes, among other aspects. The specifica- both language design and verification efforts for GraphQL. tion undergoes regular revisions by an open working group, which discusses extensions and improvements, as well as CCS Concepts • Information systems → Web services; addressing specification ambiguities. Indeed, as often hap- Query languages; • Theory of computation → Semantics pens, the Spec is an informal specification written in natural and reasoning. language. Considering the actual vibrancy of the GraphQL Keywords GraphQL, mechanized metatheory, Coq. community, sustained by several implementations in a vari- ety of programming languages and underlying technologies, ACM Reference Format: having a formal specification ought to bring some welcome Tomás Díaz, Federico Olmedo, and Éric Tanter. 2020. A Mechanized clarity for all actors. Formalization of GraphQL. In Proceedings of the 9th ACM SIGPLAN Recently, Hartig and Pérez [18] proposed the first (and International Conference on Certified Programs and Proofs (CPP ’20), so far only) formalization of GraphQL, called H&P here- January 20–21, 2020, New Orleans, LA, USA. ACM, New York, NY, after. H&P is a formalization “on paper” that was used to USA, 14 pages. https://doi.org/10.1145/3372885.3373822 prove complexity boundaries for GraphQL queries. Having a mechanized formalization would present many additional ∗This work is partially funded by ERC Starting Grant SECOMP (715753). benefits, such as potentially providing a faithful reference implementation, and serving as a solid basis to prove formal Permission to make digital or hard copies of all or part of this work for results about the GraphQL semantics. personal or classroom use is granted without fee provided that copies For instance, the complexity results of Hartig and Pérez are not made or distributed for profit or commercial advantage and that rely on two techniques: a) transforming queries to equivalent copies bear this notice and the full citation on the first page. Copyrights queries in some normal form and, interpreting queries for components of this work owned by others than the author(s) must b) be honored. Abstracting with credit is permitted. To copy otherwise, or in a simplified but equivalent definition of the semantics. republish, to post on servers or to redistribute to lists, requires prior specific However, Hartig and Pérez do not provide an algorithmic permission and/or a fee. Request permissions from [email protected]. definition of query normalization, let alone proving itcor- CPP ’20, January 20–21, 2020, New Orleans, LA, USA rect and semantics preserving; nor do they prove that the © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-7097-4/20/01...$15.00 1https://github.com/graphql/graphql-js https://doi.org/10.1145/3372885.3373822 2https://foundation.graphql.org/ CPP ’20, January 20–21, 2020, New Orleans, LA, USA Tomás Díaz, Federico Olmedo, and Éric Tanter type Artist type Animation implements Movie simplified semantics is equivalent to the original one when { { applied to normalized queries. id: ID id: ID name: String title: String This work presents the first mechanized formaliza- artworks(role:Role): [Artwork] year: Int tion of GraphQL, carried out in the Coq proof assistant [13], } cast: [Artist] e style: Style called GraphCoQL (|græf·co·k l|). GraphCoQL currently in- enum Role { } cludes the definition of the Schema DSL, query definition, ACTOR DIRECTOR enum Style { schema and query validation, and the semantics of queries, WRITER 2D defined over a graph data model asin H&P (§3). } 3D STOPMOTION As well as precisely capturing the semantics of GraphQL, union Artwork = Fiction } GraphCoQL makes it possible to specify and prove the cor- | Animation | Book type Book rectness of query transformations, as well as other exten- { sions and optimizations made to the language and its algo- interface Movie id: ID { title: String rithms. We illustrate this by studying H&P’s notion of query id: ID year: Int normalization (§4). Specifically, we define an algorithmic title: String ISBN: String year: Int author: Artist presentation of normalization, which we then prove correct cast: [Artist] } and semantics preserving. We also formalize and prove the } type Query { equivalence between the original query evaluation semantics type Fiction implements Movie artist(id:ID): Artist and the simplified semantics used by H&P for normalized { movie(id:ID): Movie id: ID } queries. In the process, we address some imprecisions and title: String minor issues that Coq led us to uncover. year: Int schema { cast: [Artist] query: Query We discuss the limitations and validation of GraphCoQL, } } including its trustworthiness, in §5. We hope that GraphCoQL can serve as a starting point for a formal specification of Figure 1. Example of GraphQL schema. GraphQL from which reference implementations can be extracted. Although we have not yet experimented with ex- The first step when defining a GraphQL traction, GraphCoQL facilitates this vision by relying on GraphQL schema. API is to define the service’s schema, describing the type boolean reflection as much as possible. system and its capabilities. The type system describes how We briefly introduce GraphQL in §2. We discuss related data is structured and precisely what can be requested and work in §6 and conclude in §7. The fundamental challenge received. A schema consists of a set of types, which can of the Coq formalization of GraphQL in comparison to sim- have declared fields. Only fields that are part of such type ilar formalization efforts (e.g. [1, 2, 4]) resides in the non- definitions can be accessed via a GraphQL query. In con- structural nature of most definitions related to GraphQL trast to traditional data management systems, field accesses queries. In particular, as we will see, both query semantics in GraphQL are handled by the backend via user-defined and query normalization are defined by well-founded induc- functions (called resolvers). tion on a notion of query size (§ 3.3). Figure1 depicts the schema for the Artists dataset, which GraphCoQL and the results presented in this paper have contains artists and their artworks. The definition is done been developed in Coq v.8.9.1 and are available online on using the GraphQL Schema Definition Language. The object GitHub3. GraphCoQL extensively uses the Mathematical type Artist declares three fields: an identifier, a name, and Components [20], Ssreflect [15] and Eqations [22] li- a list of artworks in which the artist has participated. The braries. This work is based on the latest GraphQL specifica- list of artworks is declared of type [Artwork]. Note that tions, dated June 2018 [16]. the field may receive an argument, role, which can be used to select only those artworks for which the author plays a 2 A Brief Introduction to GraphQL certain role, such as being the director. Possible roles are We briefly introduce GraphQL by means of the running modeled with the enum type Role, containing three scalar example that we use in the rest of the presentation. The values. An Artwork is a disjoint union of fictional movies, example is about a fictional dataset Artists, containing in- animated movies and books. Both fictional and animated formation about artists and the artworks they have been movies are defined as object types, namely Fiction and involved in, particularly movies and books. We discuss the Animation, implementing the same interface type Movie.4 dataset schema, the underlying data model and the query An object implementing an interface may add more fields, evaluation. such as the field style in the type Animation. To model the animation style, we use the enum type Style containing 4Note that GraphQL still requires objects to redeclare the implemented 3https://github.com/imfd/GraphCoQL fields.
Recommended publications
  • Empirical Study on the Usage of Graph Query Languages in Open Source Java Projects
    Empirical Study on the Usage of Graph Query Languages in Open Source Java Projects Philipp Seifer Johannes Härtel Martin Leinberger University of Koblenz-Landau University of Koblenz-Landau University of Koblenz-Landau Software Languages Team Software Languages Team Institute WeST Koblenz, Germany Koblenz, Germany Koblenz, Germany [email protected] [email protected] [email protected] Ralf Lämmel Steffen Staab University of Koblenz-Landau University of Koblenz-Landau Software Languages Team Koblenz, Germany Koblenz, Germany University of Southampton [email protected] Southampton, United Kingdom [email protected] Abstract including project and domain specific ones. Common applica- Graph data models are interesting in various domains, in tion domains are management systems and data visualization part because of the intuitiveness and flexibility they offer tools. compared to relational models. Specialized query languages, CCS Concepts • General and reference → Empirical such as Cypher for property graphs or SPARQL for RDF, studies; • Information systems → Query languages; • facilitate their use. In this paper, we present an empirical Software and its engineering → Software libraries and study on the usage of graph-based query languages in open- repositories. source Java projects on GitHub. We investigate the usage of SPARQL, Cypher, Gremlin and GraphQL in terms of popular- Keywords Empirical Study, GitHub, Graphs, Query Lan- ity and their development over time. We select repositories guages, SPARQL, Cypher, Gremlin, GraphQL based on dependencies related to these technologies and ACM Reference Format: employ various popularity and source-code based filters and Philipp Seifer, Johannes Härtel, Martin Leinberger, Ralf Lämmel, ranking features for a targeted selection of projects.
    [Show full text]
  • Graphql Attack
    GRAPHQL ATTACK Date: 01/04/2021 Team: Sun* Cyber Security Research Agenda • What is this? • REST vs GraphQL • Basic Blocks • Query • Mutation • How to test What is the GraphQL? GraphQL is an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data. GraphQL was developed internally by Facebook in 2012 before being publicly released in 2015. • Powerful & Flexible o Leaves most other decisions to the API designer o GraphQL offers no requirements for the network, authorization, or pagination. Sun * Cyber Security Team 1 REST vs GraphQL Over the past decade, REST has become the standard (yet a fuzzy one) for designing web APIs. It offers some great ideas, such as stateless servers and structured access to resources. However, REST APIs have shown to be too inflexible to keep up with the rapidly changing requirements of the clients that access them. GraphQL was developed to cope with the need for more flexibility and efficiency! It solves many of the shortcomings and inefficiencies that developers experience when interacting with REST APIs. REST GraphQL • Multi endpoint • Only 1 endpoint • Over fetching/Under fetching • Fetch only what you need • Coupling with front-end • API change do not affect front-end • Filter down the data • Strong schema and types • Perform waterfall requests for • Receive exactly what you ask for related data • No aggregating or filtering data • Aggregate the data yourself Sun * Cyber Security Team 2 Basic blocks Schemas and Types Sun * Cyber Security Team 3 Schemas and Types (2) GraphQL Query Sun * Cyber Security Team 4 Queries • Arguments: If the only thing we could do was traverse objects and their fields, GraphQL would already be a very useful language for data fetching.
    [Show full text]
  • Graphql-Tools Merge Schemas
    Graphql-Tools Merge Schemas Marko still misdoings irreproachably while vaulted Maximilian abrades that granddads. Squallier Kaiser curarize some presuminglyanesthetization when and Dieter misfile is hisexecuted. geomagnetist so slothfully! Tempting Weber hornswoggling sparsely or surmisings Pass on operation name when stitching schemas. The tools that it possible to merge schemas as well, we have a tool for your code! It can remember take an somewhat of resolvers. It here are merged, graphql with schema used. Presto only may set session command for setting some presto properties during current session. Presto server implementation of queries and merged together. Love writing a search query and root schema really is invalid because i download from each service account for a node. Both APIs have root fields called repository. That you actually look like this case you might seem off in memory datastore may have you should be using knex. The graphql with vue, but one round robin approach. The name signify the character. It does allow my the enums, then, were single introspection query at not top client level will field all the data plan through microservices via your stitched interface. The tools that do to other will a tool that. If they allow new. Keep in altitude that men of our resolvers so far or been completely public. Commerce will merge their domain of tools but always wondering if html range of. Based upon a merge your whole schema? Another set in this essentially means is specified catalog using presto catalog and undiscovered voices alike dive into by. We use case you how deep this means is querying data.
    [Show full text]
  • Red Hat Managed Integration 1 Developing a Data Sync App
    Red Hat Managed Integration 1 Developing a Data Sync App For Red Hat Managed Integration 1 Last Updated: 2020-01-21 Red Hat Managed Integration 1 Developing a Data Sync App For Red Hat Managed Integration 1 Legal Notice Copyright © 2020 Red Hat, Inc. The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/ . In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version. Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law. Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries. Linux ® is the registered trademark of Linus Torvalds in the United States and other countries. Java ® is a registered trademark of Oracle and/or its affiliates. XFS ® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries. MySQL ® is a registered trademark of MySQL AB in the United States, the European Union and other countries. Node.js ® is an official trademark of Joyent. Red Hat is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
    [Show full text]
  • Dedalus: Datalog in Time and Space
    Dedalus: Datalog in Time and Space Peter Alvaro1, William R. Marczak1, Neil Conway1, Joseph M. Hellerstein1, David Maier2, and Russell Sears3 1 University of California, Berkeley {palvaro,wrm,nrc,hellerstein}@cs.berkeley.edu 2 Portland State University [email protected] 3 Yahoo! Research [email protected] Abstract. Recent research has explored using Datalog-based languages to ex- press a distributed system as a set of logical invariants. Two properties of dis- tributed systems proved difficult to model in Datalog. First, the state of any such system evolves with its execution. Second, deductions in these systems may be arbitrarily delayed, dropped, or reordered by the unreliable network links they must traverse. Previous efforts addressed the former by extending Datalog to in- clude updates, key constraints, persistence and events, and the latter by assuming ordered and reliable delivery while ignoring delay. These details have a semantics outside Datalog, which increases the complexity of the language and its interpre- tation, and forces programmers to think operationally. We argue that the missing component from these previous languages is a notion of time. In this paper we present Dedalus, a foundation language for programming and reasoning about distributed systems. Dedalus reduces to a subset of Datalog with negation, aggregate functions, successor and choice, and adds an explicit notion of logical time to the language. We show that Dedalus provides a declarative foundation for the two signature features of distributed systems: mutable state, and asynchronous processing and communication. Given these two features, we address two important properties of programs in a domain-specific manner: a no- tion of safety appropriate to non-terminating computations, and stratified mono- tonic reasoning with negation over time.
    [Show full text]
  • IBM Filenet Content Manager Technology Preview: Content Services Graphql API Developer Guide
    IBM FileNet Content Manager Technology Preview: Content Services GraphQL API Developer Guide © Copyright International Business Machines Corporation 2019 Copyright Before you use this information and the product it supports, read the information in "Notices" on page 45. © Copyright International Business Machines Corporation 2019. US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. © Copyright International Business Machines Corporation 2019 Contents Copyright .................................................................................................................................. 2 Abstract .................................................................................................................................... 5 Background information ............................................................................................................ 6 What is the Content Services GraphQL API? ....................................................................................... 6 How do I access the Content Services GraphQL API? .......................................................................... 6 Developer references ................................................................................................................ 7 Supported platforms ............................................................................................................................ 7 Interfaces and output types ......................................................................................................
    [Show full text]
  • A Deductive Database with Datalog and SQL Query Languages
    A Deductive Database with Datalog and SQL Query Languages FernandoFernando SSááenzenz PPéérez,rez, RafaelRafael CaballeroCaballero andand YolandaYolanda GarcGarcííaa--RuizRuiz GrupoGrupo dede ProgramaciProgramacióónn DeclarativaDeclarativa (GPD)(GPD) UniversidadUniversidad ComplutenseComplutense dede MadridMadrid (Spain)(Spain) 12/5/2011 APLAS 2011 1 ~ ContentsContents 1.1. IntroductionIntroduction 2.2. QueryQuery LanguagesLanguages 3.3. IntegrityIntegrity ConstraintsConstraints 4.4. DuplicatesDuplicates 5.5. OuterOuter JoinsJoins 6.6. AggregatesAggregates 7.7. DebuggersDebuggers andand TracersTracers 8.8. SQLSQL TestTest CaseCase GeneratorGenerator 9.9. ConclusionsConclusions 12/5/2011 APLAS 2011 2 ~ 1.1. IntroductionIntroduction SomeSome concepts:concepts: DatabaseDatabase (DB)(DB) DatabaseDatabase ManagementManagement SystemSystem (DBMS)(DBMS) DataData modelmodel (Abstract) data structures Operations Constraints 12/5/2011 APLAS 2011 3 ~ IntroductionIntroduction DeDe--factofacto standardstandard technologiestechnologies inin databases:databases: “Relational” model SQL But,But, aa currentcurrent trendtrend towardstowards deductivedeductive databases:databases: Datalog 2.0 Conference The resurgence of Datalog inin academiaacademia andand industryindustry Ontologies Semantic Web Social networks Policy languages 12/5/2011 APLAS 2011 4 ~ Introduction.Introduction. SystemsSystems Classic academic deductive systems: LDL++ (UCLA) CORAL (Univ. of Wisconsin) NAIL! (Stanford University) Ongoing developments Recent commercial
    [Show full text]
  • DATALOG and Constraints
    7. DATALOG and Constraints Peter Z. Revesz 7.1 Introduction Recursion is an important feature to express many natural queries. The most studied recursive query language for databases is called DATALOG, an ab- breviation for "Database logic programs." In Section 2.8 we gave the basic definitions for DATALOG, and we also saw that the language is not closed for important constraint classes such as linear equalities. We have also seen some results on restricting the language, and the resulting expressive power, in Section 4.4.1. In this chapter, we study the evaluation of DATALOG with constraints in more detail , focusing on classes of constraints for which the language is closed. In Section 7.2, we present a general evaluation method for DATALOG with constraints. In Section 7.3, we consider termination of query evaluation and define safety and data complexity. In the subsequent sections, we discuss the evaluation of DATALOG queries with particular types of constraints, and their applications. 7.2 Evaluation of DATALOG with Constraints The original definitions of recursion in Section 2.8 discussed unrestricted relations. We now consider constraint relations, and first show how DATALOG queries with constraints can be evaluated. Assume that we have a rule of the form Ro(xl, · · · , Xk) :- R1 (xl,l, · · ·, Xl ,k 1 ), • • ·, Rn(Xn,l, .. ·, Xn ,kn), '¢ , as well as, for i = 1, ... , n , facts (in the database, or derived) of the form where 'if;; is a conjunction of constraints. A constraint rule application of the above rule with these facts as input produces the following derived fact: Ro(x1 , ..
    [Show full text]
  • Graphql at Enterprise Scale a Principled Approach to Consolidating a Data Graph
    A Principled Approach to Consolidating a Data Graph GraphQL at Enterprise Scale A Principled Approach to Consolidating a Data Graph Jeff Hampton Michael Watson Mandi Wise GraphQL at Enterprise Scale Copyright © 2020 Apollo Graph, Inc. Published by Apollo Graph, Inc. https://www.apollographql.com/ All rights reserved. No part of this book may be reproduced in any form on by an electronic or mechanical means, including information storage and retrieval systems, without permission in writing from the publisher. You may copy and use this document for your internal, reference purposes. You may modify this document for your internal, reference purposes This document is provided “as-is”. Information and views expressed in this document may change without notice. While the advice and information in this document is believed to be true and accurate at the date of publication, the publisher and the authors assume no legal responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. Revision History for the First Edition 2020-09-11: First Release 2020-10-27: Second Release 2020-12-10: Third Release 2021-04-26: Fourth Release Contents The Team v Preface vi Who Should Read this Guide . vi What You’ll Learn from this Guide . vii How to Contact Us . vii Moving Toward GraphQL Consolidation 1 Why Consolidate Your Data Graph? . 1 What Does a Consolidated Data Graph Look Like? . 8 When to Consolidate Your Data Graph . 9 Summary . 14 Graph Champions in the Enterprise 15 The Graph Champion and Graph Administration . 15 Delivering Organizational Excellence as a Graph Champion .
    [Show full text]
  • Easy Web API Development with SPARQL Transformer
    Easy Web API Development with SPARQL Transformer Pasquale Lisena1[0000−0003−3094−5585], Albert Meroño-Peñuela2[0000−0003−4646−5842], Tobias Kuhn2[0000−0002−1267−0234], and Raphaël Troncy1[0000−0003−0457−1436] 1 EURECOM, Sophia Antipolis, France [email protected], [email protected] 2 Vrije Universiteit, Amsterdam, The Netherlands [email protected], [email protected] Abstract. In a document-based world as the one of Web APIs, the triple-based output of SPARQL endpoints can be a barrier for developers who want to integrate Linked Data in their applications. A different JSON output can be obtained with SPARQL Transformer, which relies on a single JSON object for defining which data should be extracted from the endpoint and which shape should they assume. We propose a new approach that amounts to merge SPARQL bindings on the base of identifiers and the integration in the grlc API framework to create new bridges between the Web of Data and the Web of applications. Keywords: SPARQL · JSON · JSON-LD · API 1 Introduction The Semantic Web is a valuable resource of data and technologies, which is hav- ing a crucial role in realising the initial idea of Web. RDF can potentially repre- sent any kind of knowledge, enabling reasoning, interlinking between datasets, and graph-based artificial intelligence. Nevertheless, a structural gap exists that is limiting a broader consumption of RDF data by the community of Web devel- opers. Recent initiatives such as EasierRDF3 are strongly pushing the proposal of new solutions for making Semantic data on the Web developer friendly [3, 10].
    [Show full text]
  • Constraint-Based Synthesis of Datalog Programs
    Constraint-Based Synthesis of Datalog Programs B Aws Albarghouthi1( ), Paraschos Koutris1, Mayur Naik2, and Calvin Smith1 1 University of Wisconsin–Madison, Madison, USA [email protected] 2 Unviersity of Pennsylvania, Philadelphia, USA Abstract. We study the problem of synthesizing recursive Datalog pro- grams from examples. We propose a constraint-based synthesis approach that uses an smt solver to efficiently navigate the space of Datalog pro- grams and their corresponding derivation trees. We demonstrate our technique’s ability to synthesize a range of graph-manipulating recursive programs from a small number of examples. In addition, we demonstrate our technique’s potential for use in automatic construction of program analyses from example programs and desired analysis output. 1 Introduction The program synthesis problem—as studied in verification and AI—involves con- structing an executable program that satisfies a specification. Recently, there has been a surge of interest in programming by example (pbe), where the specification is a set of input–output examples that the program should satisfy [2,7,11,14,22]. The primary motivations behind pbe have been to (i) allow end users with no programming knowledge to automatically construct desired computations by supplying examples, and (ii) enable automatic construction and repair of pro- grams from tests, e.g., in test-driven development [23]. In this paper, we present a constraint-based approach to synthesizing Data- log programs from examples. A Datalog program is comprised of a set of Horn clauses encoding monotone, recursive constraints between relations. Our pri- mary motivation in targeting Datalog is to expand the range of synthesizable programs to the new domains addressed by Datalog.
    [Show full text]
  • Schemas and Types for JSON Data
    Schemas and Types for JSON Data: from Theory to Practice Mohamed-Amine Baazizi Dario Colazzo Sorbonne Université, LIP6 UMR 7606 Université Paris-Dauphine, PSL Research University France France [email protected] [email protected] Giorgio Ghelli Carlo Sartiani Dipartimento di Informatica DIMIE Università di Pisa Università della Basilicata Pisa, Italy Potenza, Italy [email protected] [email protected] ABSTRACT CCS CONCEPTS The last few years have seen the fast and ubiquitous diffusion • Information systems → Semi-structured data; • The- of JSON as one of the most widely used formats for publish- ory of computation → Type theory. ing and interchanging data, as it combines the flexibility of semistructured data models with well-known data struc- KEYWORDS tures like records and arrays. The user willing to effectively JSON, schemas, schema inference, parsing, schema libraries manage JSON data collections can rely on several schema languages, like JSON Schema, JSound, and Joi, as well as on ACM Reference Format: the type abstractions offered by modern programming and Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, and Carlo scripting languages like Swift or TypeScript. Sartiani. 2019. Schemas and Types for JSON Data: from Theory The main aim of this tutorial is to provide the audience to Practice. In 2019 International Conference on Management of (both researchers and practitioners) with the basic notions Data (SIGMOD ’19), June 30-July 5, 2019, Amsterdam, Netherlands. for enjoying all the benefits that schema and types can offer ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3299869. while processing and manipulating JSON data. This tuto- 3314032 rial focuses on four main aspects of the relation between JSON and schemas: (1) we survey existing schema language proposals and discuss their prominent features; (2) we ana- 1 INTRODUCTION lyze tools that can infer schemas from data, or that exploit The last two decades have seen a dramatic change in the schema information for improving data parsing and manage- data processing landscape.
    [Show full text]