Regular Path Query Evaluation on Streaming Graphs

Regular Path Query Evaluation on Streaming Graphs Anil Pacaci 1 Angela Bonifati 2 M. Tamer Ozsu¨ 1 1University of Waterloo 2Lyon 1 University 1 Characteristics of Streaming Graphs I Streams are unbounded { no global access I High streaming rates in real-world applications Streaming Graphs Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u b a a u z v v y a b u x a z t10 2 Characteristics of Streaming Graphs I Streams are unbounded { no global access I High streaming rates in real-world applications Streaming Graphs Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u b a a b u z v x v v a y y a b b u u b x x a a z z t10 t11 2 Characteristics of Streaming Graphs I Streams are unbounded { no global access I High streaming rates in real-world applications Streaming Graphs Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u b a a b u z v x v v a y y a b b u u b x x a a z z t10 t11 2 Characteristics of Streaming Graphs I Streams are unbounded { no global access I High streaming rates in real-world applications Streaming Graphs Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x b a a b a u z v x y v v v a a y y a y b b b u u a u b b x x x a a a z z z t10 t11 t13 2 Characteristics of Streaming Graphs I Streams are unbounded { no global access I High streaming rates in real-world applications Streaming Graphs Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z b a a b a b u z v x y u v v v v a a y y a y y a b b b b u u a u a u b b b x x x x b a a a a z z z z t10 t11 t13 t14 2 Characteristics of Streaming Graphs I Streams are unbounded { no global access I High streaming rates in real-world applications Streaming Graphs Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z b a a b a b u z v x y u v v v v v a a y y a y y a y a b b b b u u a u a u a u b b b b x x x x b x b a a a a a z z z z z t10 t11 t13 t14 t15 2 Characteristics of Streaming Graphs I Streams are unbounded { no global access I High streaming rates in real-world applications Streaming Graphs Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z u b a a b a b Xb u z v x y u x v v v v v v a a a y y a y y a y a y b b b b u u a u a u a u a u b b b b x x x x b x b x b a a a a a a z z z z z z t10 t11 t13 t14 t15 t16 2 Streaming Graphs Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z u b a a b a b Xb u z v x y u x Characteristics of Streaming Graphs I Streamsv are unboundedv { nov global accessv v v I High streaming rates in real-world applications a a a y y a y y a y a y b b b b u u a u a u a u a u b b b b x x x x b x b x b a a a a a a z z z z z z t10 t11 t13 t14 t15 t16 2 Intrusion detection on networks [Kent et al., 2015; Choudhury et al., 2015] TCP TCP Attacker TCP Victim TCP Bots (b) Denial-of-service (DOS) attack (Taken from [Choudhury et al., 2015]) Applications of Streaming Graphs Fraud detection in e-commerce [Qiu et al., 2018] Criminal Merchant credit 2 3 payment 4 transfer transfer 1 1 Middleman accounts (a) Credit-card fraud (Taken from [Qiu et al., 2018]) 3 Applications of Streaming Graphs Fraud detection in e-commerce [Qiu et al., 2018] Intrusion detection on networks [Kent et al., 2015; Choudhury et al., 2015] Criminal Merchant credit 2 3 payment TCP TCP 4 transfer transfer Attacker TCP Victim TCP 1 1 Bots Middleman accounts (a) Credit-card fraud (b) Denial-of-service (DOS) attack (Taken from [Qiu et al., 2018]) (Taken from [Choudhury et al., 2015]) 3 Applications of Streaming Graphs Fraud detection in e-commerce [Qiu et al., 2018] Intrusion detection on networks [Kent et al., 2015; Choudhury et al., 2015] OurCriminal setting: Query processingMerchant over streaming graphs credit 2 3 payment TCP I Persistent graph queries on streaming dataTCP Real-time results that are4 continuously updated I transfer transfer Attacker TCP Victim TCP 1 1 Bots Middleman accounts (a) Credit-card fraud (b) Denial-of-service (DOS) attack (Taken from [Qiu et al., 2018]) (Taken from [Choudhury et al., 2015]) 3 Path Navigation (follows · mentions)+ P1 P2 Path Navigation Queries Property paths in SPARQL v1.1 Single-label reachability in Cypher Path expressions in G-CORE & PGQL Regular Path Queries Generalized reachability & traversal Directed paths that match regular expressions What is a graph query? Graph queries feature: Subgraph Pattern c worksAt worksAt u1 u2 follows 4 Path Navigation Queries Property paths in SPARQL v1.1 Single-label reachability in Cypher Path expressions in G-CORE & PGQL Regular Path Queries Generalized reachability & traversal Directed paths that match regular expressions What is a graph query? Graph queries feature: Subgraph Pattern Path Navigation c (follows · mentions)+ worksAt worksAt P1 P2 u1 u2 follows 4 Regular Path Queries Generalized reachability & traversal Directed paths that match regular expressions What is a graph query? Graph queries feature: Subgraph Pattern Path Navigation c (follows · mentions)+ worksAt worksAt P1 P2 u1 u2 follows Path Navigation Queries Property paths in SPARQL v1.1 Single-label reachability in Cypher Path expressions in G-CORE & PGQL 4 What is a graph query? Graph queries feature: Subgraph Pattern Path Navigation c (follows · mentions)+ worksAt worksAt P1 P2 u1 u2 follows Path Navigation Queries Property paths in SPARQL v1.1 Single-label reachability in Cypher Path expressions in G-CORE & PGQL Regular Path Queries Generalized reachability & traversal Directed paths that match regular expressions 4 Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013] Cost-based planning for SPARQL property paths [Yakovets et al., 2016] Windowed processing Incremental & non-blocking operators Largely focus on pattern matching RPQ evaluation on static graphs The objective of this paper PersistentUnboundednessRPQ evaluation & high over streamingstreaming rates graphs Continuous processing of streaming graphs Regular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries in Wikidata query logs [Bonifati et al., 2019] 5 Windowed processing Incremental & non-blocking operators Largely focus on pattern matching The objective of this paper PersistentUnboundednessRPQ evaluation & high over streamingstreaming rates graphs Continuous processing of streaming graphs Regular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries in Wikidata query logs [Bonifati et al., 2019] RPQ evaluation on static graphs Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013] Cost-based planning for SPARQL property paths [Yakovets et al., 2016] 5 Largely focus on pattern matching The objective of this paper Persistent RPQ evaluation over streaming graphs Continuous processing of streaming graphs Regular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries in Wikidata query logs [Bonifati et al., 2019] RPQ evaluation on static graphs Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013] Cost-based planning for SPARQL property paths [Yakovets et al., 2016] Unboundedness & high streaming rates Windowed processing Incremental & non-blocking operators 5 The objective of this paper Persistent RPQ evaluation over streaming graphs Regular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries in Wikidata query logs [Bonifati et al., 2019] RPQ evaluation on static graphs Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013] Cost-based planning for SPARQL property paths [Yakovets et al., 2016] Unboundedness & high streaming rates Windowed processing Incremental & non-blocking operators Continuous processing of streaming graphs Largely focus on pattern matching 5 Regular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries in Wikidata query logs [Bonifati et al., 2019] RPQ evaluation on static graphs Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013] The objectiveCost-based of this planning paper for SPARQL property paths [Yakovets et al., 2016] PersistentUnboundednessRPQ evaluation & high over streamingstreaming rates graphs Windowed processing Incremental & non-blocking operators Continuous processing of streaming graphs Largely focus on pattern matching 5 2 Path semantics used in practice Simple paths (no repeating vertex): navigation on road networks Arbitrary paths: reachability on communication networks 3 Result semantics & stream types Append-only streams with fast insertions Support for explicit deletions 4 Physical operators for path navigation queries Compatible with existing languages: Cypher, G-CORE, PGQL, SPARQL v1.1 Efficiency & efficacy in real-world workloads Our Contributions Persistent Regular Path Query Evaluation on Streaming Graphs 1 The design space of persistent RPQ algorithms Path semantics Result semantics & stream types 6 3 Result semantics & stream types Append-only streams with fast insertions Support for explicit deletions 4 Physical operators for path navigation queries Compatible with existing languages: Cypher, G-CORE, PGQL, SPARQL v1.1 Efficiency & efficacy in real-world workloads Our Contributions Persistent Regular Path Query Evaluation on Streaming Graphs 1 The design space of persistent RPQ algorithms Path semantics Result semantics & stream types 2 Path semantics used in practice Simple paths (no repeating vertex):

Regular Path Query Evaluation on Streaming Graphs

Data Streams and Applications in Computer Science

Efficient Semi-Streaming Algorithms for Local Triangle Counting In

Streaming Quantiles Algorithms with Small Space and Update Time

Turnstile Streaming Algorithms Might As Well Be Linear Sketches

Finding Graph Matchings in Data Streams

Lecture 1 1 Graph Streaming

Streaming Algorithms

Anytime Algorithms for Stream Data Mining

Regular Path Query Evaluation on Streaming Graphs

Stochastic Streams: Sample Complexity Vs. Space Complexity

Finding Matchings in the Streaming Model

Suffix Tree, Minwise Hashing and Streaming Algorithms for Big Data Analysis in Bioinformatics