Mechanically Verified Graph Query Processing

Context Regular Datalog Regular Datalog Formalization Conclusions Incremental Maintenance of Graph Views Stefania Dumbrava March 9, 2021 1 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions This Talk Our Work A machine-verified incremental graph query engine. 2 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions This Talk Our Work A machine-verified incremental graph query engine. • ...relying on the Coq proof assistant 2 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions This Talk Our Work A machine-verified incremental graph query engine. • ...relying on the Coq proof assistant • ...for regular queries 1 (closure operators as primitive) expressive & tractable 2 ) amenable to graph querying logic-based language ) amenable to formal verification 1 Moshe Y. Vardi: A Theory of Regular Queries. PODS 2016: 1-9 2 Juan L. Reutter, et al.: Regular Queries on Graph Databases. Th. Comput. Syst. 61(1): 31-83 (2017) 3 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Datalog • logic programming without function symbols • consists of facts and rules • finite model semantics • sound, complete and terminating evaluation ! bottom-up iteration of a fixpoint operator 4 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Datalog Example Assume we have the following simple graph database instance: ...and that we want to compute all the pairs of potential friends. 5 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Datalog Example 5 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Datalog Example 5 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Datalog Example 5 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Datalog Example 5 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Regular Datalog • safe: head variables in each clause appear in its body • recursion is restricted to transitive closure ...and internalized into special literals • distinguished top clause whose head we call view • all symbols (except the view one) are binary • stratified: no clauses with body symbols not previously defined 6 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Datalog vs. Regular Datalog • Regular Datalog corresponds to UC2RPQ under transitive closure ! expressive: allows for complex graph patterns ! tractable: parallelizability & decidable query containement 7 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Building Blocks Graph Databases finite sets of constants V (nodes) & symbols Σ (edge labels). Graph Instance G over Σ: set of directed labeled edges, E, where E ⊆ V × Σ × V. Graph Database D(G) over G: G can be seen as a database D(G) = fs(n1; n2) j (n1; s; n2) 2 Eg s1 sk Path ρ of length k in G: sequence n1 −! n2 ::: nk−1 −! nk ∗ Path Label: λ(ρ) = s1 ::: sk 2 Σ 8 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Syntax Regular Datalog (RD) Expressions Terms (Node IDs) t ::= x j n where x 2 Var; n 2 V Atoms A ::= s(t1; t2) where s 2 Σ Literals L ::= A j A+ Conjunctive Body B ::= L1 ^ ::: ^ Ln where n 2 N Disjunctive Body D ::= B1 _ ::: _ Bn where n 2 N Clauses C ::=( t1; t2) D ProgramsΠ:Σ ! fC1;:::; Cng where n 2 N Regular Queries (RQ) over G • RD-program Π • distinguished query clause Ω whose head is the top-level view 9 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Semantics (I/II) Interpretations (G) Modeled as indexed relations (Σ × f; +g) !P(V × V). 10 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Semantics (I/II) Interpretations (G) Modeled as indexed relations (Σ × f; +g) !P(V × V). 10 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Semantics (I/II) Interpretations (G) Modeled as indexed relations (Σ × f; +g) !P(V × V). Interpretation Well-Formedness (wfG) G(s; +) has to correspond to the transitive closure of G(s; ): wfG(G) ()8 s; is closure(G(s; ); G(s; +)) + is closure(gs ; gp) () 8(n1; n2) 2 gp; 9ρ 2 V ; path(gs ; n1; ρ) ^ last(ρ) = n2 path(g; n1; ρ) () 8i 2 f1 ::: jρjg; (ni ; ni+1) 2 g 10 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Semantics (II/II) Literal Satisfaction l For L ≡ s (n1; n2), G j= L () (n1; n2) 2 G(s; l). Clause Satisfaction For C ≡ (t1; t2) (L1;1 ^ ::: ^ L1;n) _ ::: _ (Lm;1 ^ ::: ^ Lm;n), W V G j=s L () 8η; i=1::m( j=1::n G j= η(Li;j )) ) G j= η(s(t1; t2)). Program Satisfaction For Π ≡ Σ ! fC1;:::; Cng, G j=Σ Π () 8 s 2 Σ; G j=s Π(s). 11 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions View Computation: Clausal Evaluation W For each clause C ≡ (t1; t2) i=1::n Bi corresponding to Π(s) • match each conjunctive body Bi against the interpretation A extend substitutions MG (a) matching a given atom a B ...uniformly, to the entire body as MG (Bi ) • collect the substitutions matching the entire disjunctive body Clausal Consequence Operator Π;s [ B T (G) ≡ fσ(t1; t2) j σ 2 MG (Bi )g i=1::n 12 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions View Computation: Stratified Program Evaluation Stratification Condition A program Π is stratified if: there exists a mapping σ : Σ ! [1; n] such that, for all s in Σ, the Π(s) clause (t1; t2) B satisfies: max σ(r) < σ(s) r2sym(B) 13 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions View Computation: Stratified Program Evaluation + Π1 = connected(X ; Y ) follows (X ; Y ) − Π2 = pfriends(X ; Y ) ((knows ^ (connected · connected )) _ contacted)(X ; Y ) + Π3 = pconnected(X ; Y ) pfriends (X ; Y ) 14 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions View Computation: Stratified Program Evaluation + Π1 = connected(X ; Y ) follows (X ; Y ) − Π2 = pfriends(X ; Y ) ((knows ^ (connected · connected )) _ contacted)(X ; Y ) + Π3 = pconnected(X ; Y ) pfriends (X ; Y ) Π1;connected ∆1 = T (;) = connected ! f(Alice; Bill); (Bob; Bill)g 14 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions View Computation: Stratified Program Evaluation + Π1 = connected(X ; Y ) follows (X ; Y ) − Π2 = pfriends(X ; Y ) ((knows ^ (connected · connected )) _ contacted)(X ; Y ) + Π3 = pconnected(X ; Y ) pfriends (X ; Y ) Π1;connected ∆1 = T (;) = connected ! f(Alice; Bill); (Bob; Bill)g Π2;pfriends ∆2 = T (∆1) = pfriends ! f(Alice; Bob); (Bob; Eve)g 14 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions View Computation: Stratified Program Evaluation + Π1 = connected(X ; Y ) follows (X ; Y ) − Π2 = pfriends(X ; Y ) ((knows ^ (connected · connected )) _ contacted)(X ; Y ) + Π3 = pconnected(X ; Y ) pfriends (X ; Y ) Π1;connected ∆1 = T (;) = connected ! f(Alice; Bill); (Bob; Bill)g Π2;pfriends ∆2 = T (∆1) = pfriends ! f(Alice; Bob); (Bob; Eve)g Π3;pconnected ∆3 = T (∆1 [ ∆2) = pconnected ! f(Alice; Bob); (Bob; Eve); (Alice; Eve)g: 14 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Modeling Updates Updates An update ∆ ≡ (∆+; ∆−) is a pair of disjoint graphs ∆+; ∆−. • ∆+ ≡ bulk insertions • ∆− ≡ bulk deletions Update Application G :+: ∆ ≡ G n ∆− [ ∆+ ∆fs ! (g+; g−)g ≡ (∆+fs ! g+g; ∆−fs ! g− n g+g) 15 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions View Maintenance Computation + Π1 = connected(X ; Y ) follows (X ; Y ) − Π2 = pfriends(X ; Y ) ((knows ^ (connected · connected )) _ contacted)(X ; Y ) + Π3 = pconnected(X ; Y ) pfriends (X ; Y ) 0 Π1;connected ∆1 = Tsupp (∆1) = ∆1fconnected ! (f(Bill; Bob)g; ;)g ... where supp = fknows; contacted; follows+; connected; pconnectedg 16 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions View Computation vs. Maintenance W For each clause C ≡ (t1; t2) i=1::n Bi corresponding to Π(s) Incremental Clausal Consequence Operator Π;s Π;s T (G :+: ∆); (s 2= supp) _ (∆− [ D) 6= ; TG;supp(∆) = S B i=1::n MG;∆(Bi ); otherwise 17 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Full Engine Overview • stratified, single-pass, bottom-up heuristic • non-recursive (recursion internalized in closure computation) • supports both view computation and incremental maintenance • core component: clause evaluation forward-chain clausal consequence operator (fwd or clause) based on a matching algorithm corresponds to computing a nested-loop join 18 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Full Engine Overview Fixpoint fwd_program Π G supp ∆ ΣB ΣC : edelta := match ΣC with | [::] => ∆ | [:: s & ss] => let (arg, body) := Π s in let ∆0 := fwd_or_clause G supp ∆ s arg body in let ∆0 := compute_closures G ∆0 s in 0 fwd_program Π G supp ∆ (s [ s+ [ ΣB) ss 19 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Regular Datalog: Engine Characterization Theorem (Soundness) • Π { a safe, stratifiable, Regular Datalog program • Σ { its set of symbols • G { a graph instance • ∆ { an update The IVM-engine cumulatively processes symbols in Σ, such that if: • the already processed symbols, ΣB, are a well-formed Π-slice • ∆ only modifies ΣB, i.e., sym(∆) ⊆ ΣB • G :+: ∆ j= Π ΣB Then, it outputs ∆O , such that G :+:∆ O j=Σ Π . 20 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Incremental View Maintenance (IVM) V ∆ V [G] V ν ∆ Π V G G ] :+: :+:∆ [G Π maintenance :+: ∆] V [G V recomputation G G :+: ∆ ∆ G ≡ base graph; Π ≡ RD program; V ≡ top-view; ∆ ≡ update. Soundness ∆ If V [G] j= ΠG , the engine outputs an incremental update V s.t ∆ V [G] :+: V j= ΠG:+:∆ 21 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Incremental ∆-Matching Compute V ∆ s.t V [G :+: ∆] = V [G] :+: V ∆ with a delta program1 Idea: distributing deltas over joins and factoring. 1 Ashish Gupta, et al.: Maintaining Views Incrementally. SIGMOD 1993: 157-166 22 / 24 Context Regular Datalog Regular Datalog Formalization Conclusions Incremental ∆-Matching Compute V ∆ s.t V [G :+: ∆] = V [G] :+: V ∆ with a delta program1 Idea: distributing deltas over joins and factoring. Delta Program For V L1;:::; Ln, a delta program is a set of clauses of the form ∆ ν ν V L1;:::; Li−1; Li ; Li+1;:::; Ln 1 Ashish Gupta, et al.: Maintaining Views Incrementally.

Mechanically Verified Graph Query Processing

Empirical Study on the Usage of Graph Query Languages in Open Source Java Projects

Graphql Attack

Graphql-Tools Merge Schemas

Red Hat Managed Integration 1 Developing a Data Sync App

IBM Filenet Content Manager Technology Preview: Content Services Graphql API Developer Guide

Graphql at Enterprise Scale a Principled Approach to Consolidating a Data Graph

Easy Web API Development with SPARQL Transformer

Schemas and Types for JSON Data

Elixir Generate Graphql Schema from Database

Graphql API Backed by a Graph Database

Linked Data Querying with Graphql

Defining Schemas for Property Graphs by Using the Graphql Schema Definition Language