Regular Path Query Evaluation on Streaming Graphs
Anil Pacaci 1 Angela Bonifati 2 M. Tamer Ozsu¨ 1
1University of Waterloo
2Lyon 1 University
1 Characteristics of Streaming Graphs
I Streams are unbounded – no global access I High streaming rates in real-world applications
Streaming Graphs
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u
b a a
u z v
v
y a b
u
x a
z
t10 2 Characteristics of Streaming Graphs
I Streams are unbounded – no global access I High streaming rates in real-world applications
Streaming Graphs
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u
b a a b
u z v x
v v a y y a b b
u u b
x x a a
z z
t10 t11 2 Characteristics of Streaming Graphs
I Streams are unbounded – no global access I High streaming rates in real-world applications
Streaming Graphs
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u
b a a b
u z v x
v v a y y a b b
u u b
x x a a
z z
t10 t11 2 Characteristics of Streaming Graphs
I Streams are unbounded – no global access I High streaming rates in real-world applications
Streaming Graphs
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x
b a a b a
u z v x y
v v v a a y y a y b b b
u u a u
b b
x x x a a a
z z z
t10 t11 t13 2 Characteristics of Streaming Graphs
I Streams are unbounded – no global access I High streaming rates in real-world applications
Streaming Graphs
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z
b a a b a b
u z v x y u
v v v v a a y y a y y a b b b b
u u a u a u
b b b
x x x x b a a a a
z z z z
t10 t11 t13 t14 2 Characteristics of Streaming Graphs
I Streams are unbounded – no global access I High streaming rates in real-world applications
Streaming Graphs
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z
b a a b a b
u z v x y u
v v v v v a a y y a y y a y a b b b b
u u a u a u a u
b b b b
x x x x b x b a a a a a
z z z z z
t10 t11 t13 t14 t15 2 Characteristics of Streaming Graphs
I Streams are unbounded – no global access I High streaming rates in real-world applications
Streaming Graphs
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z u b a a b a b Xb u z v x y u x
v v v v v v a a a y y a y y a y a y b b b b
u u a u a u a u a u
b b b b
x x x x b x b x b a a a a a a
z z z z z z
t10 t11 t13 t14 t15 t16 2 Streaming Graphs
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z u b a a b a b Xb u z v x y u x Characteristics of Streaming Graphs
I Streamsv are unboundedv – nov global accessv v v I High streaming rates in real-world applications a a a y y a y y a y a y b b b b
u u a u a u a u a u
b b b b
x x x x b x b x b a a a a a a
z z z z z z
t10 t11 t13 t14 t15 t16 2 Intrusion detection on networks [Kent et al., 2015; Choudhury et al., 2015]
TCP TCP
Attacker TCP Victim TCP Bots
(b) Denial-of-service (DOS) attack
(Taken from [Choudhury et al., 2015])
Applications of Streaming Graphs
Fraud detection in e-commerce [Qiu et al., 2018]
Criminal Merchant credit 2 3 payment
4 transfer transfer 1 1 Middleman accounts
(a) Credit-card fraud
(Taken from [Qiu et al., 2018])
3 Applications of Streaming Graphs
Fraud detection in e-commerce [Qiu et al., 2018] Intrusion detection on networks [Kent et al., 2015; Choudhury et al., 2015]
Criminal Merchant credit 2 3 payment TCP TCP 4 transfer transfer Attacker TCP Victim TCP 1 1 Bots Middleman accounts
(a) Credit-card fraud (b) Denial-of-service (DOS) attack
(Taken from [Qiu et al., 2018]) (Taken from [Choudhury et al., 2015])
3 Applications of Streaming Graphs
Fraud detection in e-commerce [Qiu et al., 2018] Intrusion detection on networks [Kent et al., 2015; Choudhury et al., 2015]
OurCriminal setting: Query processingMerchant over streaming graphs credit 2 3 payment TCP I Persistent graph queries on streaming dataTCP Real-time results that are4 continuously updated I transfer transfer Attacker TCP Victim TCP 1 1 Bots Middleman accounts
(a) Credit-card fraud (b) Denial-of-service (DOS) attack
(Taken from [Qiu et al., 2018]) (Taken from [Choudhury et al., 2015])
3 Path Navigation (follows · mentions)+
P1 P2
Path Navigation Queries Property paths in SPARQL v1.1 Single-label reachability in Cypher Path expressions in G-CORE & PGQL Regular Path Queries Generalized reachability & traversal Directed paths that match regular expressions
What is a graph query?
Graph queries feature: Subgraph Pattern c worksAt
worksAt
u1 u2 follows
4 Path Navigation Queries Property paths in SPARQL v1.1 Single-label reachability in Cypher Path expressions in G-CORE & PGQL Regular Path Queries Generalized reachability & traversal Directed paths that match regular expressions
What is a graph query?
Graph queries feature: Subgraph Pattern Path Navigation c (follows · mentions)+ worksAt
worksAt P1 P2 u1 u2 follows
4 Regular Path Queries Generalized reachability & traversal Directed paths that match regular expressions
What is a graph query?
Graph queries feature: Subgraph Pattern Path Navigation c (follows · mentions)+ worksAt
worksAt P1 P2 u1 u2 follows
Path Navigation Queries Property paths in SPARQL v1.1 Single-label reachability in Cypher Path expressions in G-CORE & PGQL
4 What is a graph query?
Graph queries feature: Subgraph Pattern Path Navigation c (follows · mentions)+ worksAt
worksAt P1 P2 u1 u2 follows
Path Navigation Queries Property paths in SPARQL v1.1 Single-label reachability in Cypher Path expressions in G-CORE & PGQL Regular Path Queries Generalized reachability & traversal Directed paths that match regular expressions
4 Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013] Cost-based planning for SPARQL property paths [Yakovets et al., 2016]
Windowed processing Incremental & non-blocking operators
Largely focus on pattern matching
RPQ evaluation on static graphs The objective of this paper
PersistentUnboundednessRPQ evaluation & high over streamingstreaming rates graphs
Continuous processing of streaming graphs
Regular Path Queries
RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries in Wikidata query logs [Bonifati et al., 2019]
5 Windowed processing Incremental & non-blocking operators
Largely focus on pattern matching
The objective of this paper
PersistentUnboundednessRPQ evaluation & high over streamingstreaming rates graphs
Continuous processing of streaming graphs
Regular Path Queries
RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries in Wikidata query logs [Bonifati et al., 2019] RPQ evaluation on static graphs Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013] Cost-based planning for SPARQL property paths [Yakovets et al., 2016]
5 Largely focus on pattern matching
The objective of this paper
Persistent RPQ evaluation over streaming graphs
Continuous processing of streaming graphs
Regular Path Queries
RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries in Wikidata query logs [Bonifati et al., 2019] RPQ evaluation on static graphs Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013] Cost-based planning for SPARQL property paths [Yakovets et al., 2016] Unboundedness & high streaming rates Windowed processing Incremental & non-blocking operators
5 The objective of this paper
Persistent RPQ evaluation over streaming graphs
Regular Path Queries
RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries in Wikidata query logs [Bonifati et al., 2019] RPQ evaluation on static graphs Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013] Cost-based planning for SPARQL property paths [Yakovets et al., 2016] Unboundedness & high streaming rates Windowed processing Incremental & non-blocking operators Continuous processing of streaming graphs Largely focus on pattern matching
5 Regular Path Queries
RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries in Wikidata query logs [Bonifati et al., 2019] RPQ evaluation on static graphs Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013] The objectiveCost-based of this planning paper for SPARQL property paths [Yakovets et al., 2016] PersistentUnboundednessRPQ evaluation & high over streamingstreaming rates graphs Windowed processing Incremental & non-blocking operators Continuous processing of streaming graphs Largely focus on pattern matching
5 2 Path semantics used in practice Simple paths (no repeating vertex): navigation on road networks Arbitrary paths: reachability on communication networks 3 Result semantics & stream types Append-only streams with fast insertions Support for explicit deletions 4 Physical operators for path navigation queries Compatible with existing languages: Cypher, G-CORE, PGQL, SPARQL v1.1 Efficiency & efficacy in real-world workloads
Our Contributions
Persistent Regular Path Query Evaluation on Streaming Graphs
1 The design space of persistent RPQ algorithms Path semantics Result semantics & stream types
6 3 Result semantics & stream types Append-only streams with fast insertions Support for explicit deletions 4 Physical operators for path navigation queries Compatible with existing languages: Cypher, G-CORE, PGQL, SPARQL v1.1 Efficiency & efficacy in real-world workloads
Our Contributions
Persistent Regular Path Query Evaluation on Streaming Graphs
1 The design space of persistent RPQ algorithms Path semantics Result semantics & stream types 2 Path semantics used in practice Simple paths (no repeating vertex): navigation on road networks Arbitrary paths: reachability on communication networks
6 4 Physical operators for path navigation queries Compatible with existing languages: Cypher, G-CORE, PGQL, SPARQL v1.1 Efficiency & efficacy in real-world workloads
Our Contributions
Persistent Regular Path Query Evaluation on Streaming Graphs
1 The design space of persistent RPQ algorithms Path semantics Result semantics & stream types 2 Path semantics used in practice Simple paths (no repeating vertex): navigation on road networks Arbitrary paths: reachability on communication networks 3 Result semantics & stream types Append-only streams with fast insertions Support for explicit deletions
6 Our Contributions
Persistent Regular Path Query Evaluation on Streaming Graphs
1 The design space of persistent RPQ algorithms Path semantics Result semantics & stream types 2 Path semantics used in practice Simple paths (no repeating vertex): navigation on road networks Arbitrary paths: reachability on communication networks 3 Result semantics & stream types Append-only streams with fast insertions Support for explicit deletions 4 Physical operators for path navigation queries Compatible with existing languages: Cypher, G-CORE, PGQL, SPARQL v1.1 Efficiency & efficacy in real-world workloads
6 Our Solution
7 Spanning-tree index to encode partial results
Design Space for Streaming RPQ
An automata-based algorithm Automata transition graph to guide traversals
8 Design Space for Streaming RPQ
An automata-based algorithm Automata transition graph to guide traversals Spanning-tree index to encode partial results
8 Design Space for Streaming RPQ
An automata-based algorithm Automata transition graph to guide traversals Spanning-tree index to encode partial results
Result Semantics Path Semantics Append-only Explicit Deletions
Arbitrary O(n · k2) O(n2 · k) Simple1 O(n · k2) O(n2 · k).
1 These results hold in the absence of conflicts, a condition on cyclic structure of the query and graph. 8 Design Space for Streaming RPQ
An automata-based algorithm Automata transition graph to guide traversals Spanning-tree index to encode partial results
Result Semantics Path Semantics Append-only Explicit Deletions
Arbitrary O(n · k2) O(n2 · k) Simple1 O(n · k2) O(n2 · k).
1 These results hold in the absence of conflicts, a condition on cyclic structure of the query and graph. 8 Cross-edge
Append-only Algorithm for Arbitrary Path Semantics
I O(n) amortized insertion cost (n = # of vertices in the(x, 0) window W ) I Expired tuples are removed in batches I Explicit windows are supported via negative tuples(y, 1) (z, 1) I A variation of Counting and DRed [Gupta et al., 1993] (u, 2) (w, 2)
(v, 1)
(y, 2)
Incremental Maintenance of Spanning Trees
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x
b a a b a
u z v x y
+ Q1 = (a · b) a
start s0 s1 s2 a b
9 Append-only Algorithm for Arbitrary Path Semantics
I O(n) amortized insertion cost (n = # of vertices in the window W ) I Expired tuples are removed in batches I Explicit windows are supported via negative tuples I A variation of Counting and DRed [Gupta et al., 1993]
Cross-edge
Incremental Maintenance of Spanning Trees
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x
b a a b a
u z v x y
(x, 0) + Q1 = (a · b)
a (y, 1) (z, 1)
start s0 s1 s2 a b (u, 2) (w, 2)
(v, 1)
(y, 2)
9 Append-only Algorithm for Arbitrary Path Semantics
I O(n) amortized insertion cost (n = # of vertices in the window W ) I Expired tuples are removed in batches I Explicit windows are supported via negative tuples I A variation of Counting and DRed [Gupta et al., 1993]
Incremental Maintenance of Spanning Trees
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z
b a a b a b
u z v x y u
(x, 0) + Q1 = (a · b)
a (y, 1) (z, 1)
start s0 s1 s2 a b (u, 2) (w, 2)
Cross-edge (v, 1)
(y, 2)
9 Append-only Algorithm for Arbitrary Path Semantics
I O(n) amortized insertion cost (n = # of vertices in the window W ) I Expired tuples are removed in batches I Explicit windows are supported via negative tuples I A variation of Counting and DRed [Gupta et al., 1993]
Cross-edge
Incremental Maintenance of Spanning Trees
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z z
b a a b a b b
u z v x y u w
(x, 0) + Q1 = (a · b)
a (y, 1) (z, 1)
start s0 s1 s2 a b (u, 2) (w, 2)
(v, 1)
(y, 2)
9 Append-only Algorithm for Arbitrary Path Semantics
I O(n) amortized insertion cost (n = # of vertices in the window W ) I Expired tuples are removed in batches I Explicit windows are supported via negative tuples I A variation of Counting and DRed [Gupta et al., 1993]
Cross-edge
Incremental Maintenance of Spanning Trees
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z z
b a a b a b b
u z v x y u w
(x, 0) + Q1 = (a · b)
a (y, 1) (z, 1)
start s0 s1 s2 a b (u, 2) (w, 2)
(v, 1)
(y, 2)
9 Append-only Algorithm for Arbitrary Path Semantics
I O(n) amortized insertion cost (n = # of vertices in the window W ) I Expired tuples are removed in batches I Explicit windows are supported via negative tuples I A variation of Counting and DRed [Gupta et al., 1993]
Cross-edge
Incremental Maintenance of Spanning Trees
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z z
b a a b a b b
u z v x y u w
(x, 0) + Q1 = (a · b)
a (y, 1) (z, 1)
start s0 s1 s2 a b (u, 2) (w, 2)
(v, 1)
(y, 2)
9 Cross-edge
Incremental Maintenance of Spanning Trees
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z z
b a a b a b b Append-onlyu Algorithmz forv Arbitraryx Pathy u Semanticsw
I O(n) amortized insertion cost (n = # of vertices in the(x, 0) window W ) Q = (a · b)+ I Expired tuples1 are removed in batches a I Explicit windows are supported via negative tuples(y, 1) (z, 1) IstartA variations0 of Countings1 ands2 DRed [Gupta et al., 1993] a b (u, 2) (w, 2)
(v, 1)
(y, 2)
9 Append-only Algorithm for Arbitrary Path Semantics
I O(n) amortized insertion cost (n = # of vertices in the window W ) I Expired tuples are removed in batches I Explicit windows are supported via negative tuples I A variation of Counting and DRed [Gupta et al., 1993]
Cross-edge
Incremental Maintenance of Spanning Trees
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z z v
b a a b a b b b
u z v x y u w y
(x, 0) + Q1 = (a · b)
a (y, 1) (z, 1)
start s0 s1 s2 a b (u, 2) (w, 2)
(v, 1)
(y, 2)
9 Append-only Algorithm for Arbitrary Path Semantics
I O(n) amortized insertion cost (n = # of vertices in the window W ) I Expired tuples are removed in batches I Explicit windows are supported via negative tuples I A variation of Counting and DRed [Gupta et al., 1993]
Cross-edge
Incremental Maintenance of Spanning Trees
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z z v
b a a b a b b b
u z v x y u w y
(x, 0) + Q1 = (a · b)
a (y, 1) (z, 1)
start s0 s1 s2 a b (u, 2) (w, 2)
(v, 1)
(y, 2)
9 Append-only Algorithm for Arbitrary Path Semantics
I O(n) amortized insertion cost (n = # of vertices in the window W ) I Expired tuples are removed in batches I Explicit windows are supported via negative tuples I A variation of Counting and DRed [Gupta et al., 1993]
Incremental Maintenance of Spanning Trees
Time t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 y x u u x z z v
b a a b a b b b
u z v x y u w y
(x, 0) + Q1 = (a · b)
a (y, 1) (z, 1)
start s0 s1 s2 a b (u, 2) (w, 2)
Cross-edge (v, 1)
(y, 2)
9 Optimized for append-only streams An optimistic conflict-detection mechanism Aggressive pruning – backtracking only if absolutely necessary Correctness & complexity proofs are in the paper
Conflict-freeness: A sufficient condition for efficient evaluation A condition on the cyclic structure of the graph and the automata An arbitrary path pa : u→v =⇒ a simple path ps : u→v Requires DFS ordering
Our contribution: A streaming algorithm for conflict detection
How to Find Simple Paths?
Finding simple paths for arbitrary graphs and queries is NP-complete [Mendelzon and Wood, 1995]
10 Optimized for append-only streams An optimistic conflict-detection mechanism Aggressive pruning – backtracking only if absolutely necessary Correctness & complexity proofs are in the paper
Requires DFS ordering
Our contribution: A streaming algorithm for conflict detection
How to Find Simple Paths?
Finding simple paths for arbitrary graphs and queries is NP-complete [Mendelzon and Wood, 1995] Conflict-freeness: A sufficient condition for efficient evaluation A condition on the cyclic structure of the graph and the automata An arbitrary path pa : u→v =⇒ a simple path ps : u→v
10 Optimized for append-only streams An optimistic conflict-detection mechanism Aggressive pruning – backtracking only if absolutely necessary Correctness & complexity proofs are in the paper
Our contribution: A streaming algorithm for conflict detection
How to Find Simple Paths?
Finding simple paths for arbitrary graphs and queries is NP-complete [Mendelzon and Wood, 1995] Conflict-freeness: A sufficient condition for efficient evaluation A condition on the cyclic structure of the graph and the automata An arbitrary path pa : u→v =⇒ a simple path ps : u→v Requires DFS ordering
10 Optimized for append-only streams An optimistic conflict-detection mechanism Aggressive pruning – backtracking only if absolutely necessary Correctness & complexity proofs are in the paper
How to Find Simple Paths?
Finding simple paths for arbitrary graphs and queries is NP-complete [Mendelzon and Wood, 1995] Conflict-freeness: A sufficient condition for efficient evaluation A condition on the cyclic structure of the graph and the automata An arbitrary path pa : u→v =⇒ a simple path ps : u→v Requires DFS ordering
Our contribution: A streaming algorithm for conflict detection
10 An optimistic conflict-detection mechanism Aggressive pruning – backtracking only if absolutely necessary Correctness & complexity proofs are in the paper
How to Find Simple Paths?
Finding simple paths for arbitrary graphs and queries is NP-complete [Mendelzon and Wood, 1995] Conflict-freeness: A sufficient condition for efficient evaluation A condition on the cyclic structure of the graph and the automata An arbitrary path pa : u→v =⇒ a simple path ps : u→v Requires DFS ordering
Our contribution: A streaming algorithm for conflict detection Optimized for append-only streams
10 Aggressive pruning – backtracking only if absolutely necessary Correctness & complexity proofs are in the paper
How to Find Simple Paths?
Finding simple paths for arbitrary graphs and queries is NP-complete [Mendelzon and Wood, 1995] Conflict-freeness: A sufficient condition for efficient evaluation A condition on the cyclic structure of the graph and the automata An arbitrary path pa : u→v =⇒ a simple path ps : u→v Requires DFS ordering
Our contribution: A streaming algorithm for conflict detection Optimized for append-only streams An optimistic conflict-detection mechanism
10 Correctness & complexity proofs are in the paper
How to Find Simple Paths?
Finding simple paths for arbitrary graphs and queries is NP-complete [Mendelzon and Wood, 1995] Conflict-freeness: A sufficient condition for efficient evaluation A condition on the cyclic structure of the graph and the automata An arbitrary path pa : u→v =⇒ a simple path ps : u→v Requires DFS ordering
Our contribution: A streaming algorithm for conflict detection Optimized for append-only streams An optimistic conflict-detection mechanism Aggressive pruning – backtracking only if absolutely necessary
10 How to Find Simple Paths?
Finding simple paths for arbitrary graphs and queries is NP-complete [Mendelzon and Wood, 1995] Conflict-freeness: A sufficient condition for efficient evaluation A condition on the cyclic structure of the graph and the automata An arbitrary path pa : u→v =⇒ a simple path ps : u→v Requires DFS ordering
Our contribution: A streaming algorithm for conflict detection Optimized for append-only streams An optimistic conflict-detection mechanism Aggressive pruning – backtracking only if absolutely necessary Correctness & complexity proofs are in the paper
10 Over 70% of the workloads 2 − 5× impact on the tail latency
Update stream of LDBC SNB [Erling et al., 2015] Stackoverflow temporal graph Yago2s RDF graph Sub-millisecond tail latency (99thpercentile) Performance matching the complexity analysis Finding simple paths
Scalability analysis using synthetic RPQ workloads (gMark [Bagan et al., 2016])
Performance Results
Most common RPQs in Wikidata query logs [Bonifati et al., 2019]
11 Over 70% of the workloads 2 − 5× impact on the tail latency
Sub-millisecond tail latency (99thpercentile) Performance matching the complexity analysis Finding simple paths
Scalability analysis using synthetic RPQ workloads (gMark [Bagan et al., 2016])
Performance Results
Most common RPQs in Wikidata query logs [Bonifati et al., 2019] Update stream of LDBC SNB [Erling et al., 2015] Stackoverflow temporal graph Yago2s RDF graph
11 Over 70% of the workloads 2 − 5× impact on the tail latency
Performance matching the complexity analysis Finding simple paths
Scalability analysis using synthetic RPQ workloads (gMark [Bagan et al., 2016])
Performance Results
Most common RPQs in Wikidata query logs [Bonifati et al., 2019] Update stream of LDBC SNB [Erling et al., 2015] Stackoverflow temporal graph Yago2s RDF graph Sub-millisecond tail latency (99thpercentile)
11 Over 70% of the workloads 2 − 5× impact on the tail latency
Finding simple paths
Scalability analysis using synthetic RPQ workloads (gMark [Bagan et al., 2016])
Performance Results
Most common RPQs in Wikidata query logs [Bonifati et al., 2019] Update stream of LDBC SNB [Erling et al., 2015] Stackoverflow temporal graph Yago2s RDF graph Sub-millisecond tail latency (99thpercentile) Performance matching the complexity analysis
11 Scalability analysis using synthetic RPQ workloads (gMark [Bagan et al., 2016])
Performance Results
Most common RPQs in Wikidata query logs [Bonifati et al., 2019] Update stream of LDBC SNB [Erling et al., 2015] Stackoverflow temporal graph Yago2s RDF graph Sub-millisecond tail latency (99thpercentile) Performance matching the complexity analysis Finding simple paths Over 70% of the workloads 2 − 5× impact on the tail latency
11 Performance Results
Most common RPQs in Wikidata query logs [Bonifati et al., 2019] Update stream of LDBC SNB [Erling et al., 2015] Stackoverflow temporal graph Yago2s RDF graph Sub-millisecond tail latency (99thpercentile) Performance matching the complexity analysis Finding simple paths Over 70% of the workloads 2 − 5× impact on the tail latency Scalability analysis using synthetic RPQ workloads (gMark [Bagan et al., 2016])
11 Support for explicit edge deletions Implicit & explicit window semantics
Safely prune the search space to avoid exhaustive search Tractable evaluation on conflict-free graphs (matching tractability results [Mendelzon and Wood, 1995])
Scalability w.r.t. the window & query size Feasibility of simple path semantics
2 Persistent RPQs under arbitrary path semantics
3 Streaming conflict-detection algorithm
4 Extensive of experiments with real-world workloads [Bonifati et al., 2019]
Wrapping-up
1 Design space of non-blocking RPQ operators Path semantics Result semantics
12 Safely prune the search space to avoid exhaustive search Tractable evaluation on conflict-free graphs (matching tractability results [Mendelzon and Wood, 1995])
Scalability w.r.t. the window & query size Feasibility of simple path semantics
3 Streaming conflict-detection algorithm
4 Extensive of experiments with real-world workloads [Bonifati et al., 2019]
Wrapping-up
1 Design space of non-blocking RPQ operators Path semantics Result semantics 2 Persistent RPQs under arbitrary path semantics Support for explicit edge deletions Implicit & explicit window semantics
12 Scalability w.r.t. the window & query size Feasibility of simple path semantics
4 Extensive of experiments with real-world workloads [Bonifati et al., 2019]
Wrapping-up
1 Design space of non-blocking RPQ operators Path semantics Result semantics 2 Persistent RPQs under arbitrary path semantics Support for explicit edge deletions Implicit & explicit window semantics 3 Streaming conflict-detection algorithm Safely prune the search space to avoid exhaustive search Tractable evaluation on conflict-free graphs (matching tractability results [Mendelzon and Wood, 1995])
12 Wrapping-up
1 Design space of non-blocking RPQ operators Path semantics Result semantics 2 Persistent RPQs under arbitrary path semantics Support for explicit edge deletions Implicit & explicit window semantics 3 Streaming conflict-detection algorithm Safely prune the search space to avoid exhaustive search Tractable evaluation on conflict-free graphs (matching tractability results [Mendelzon and Wood, 1995]) 4 Extensive of experiments with real-world workloads [Bonifati et al., 2019] Scalability w.r.t. the window & query size Feasibility of simple path semantics
12 13 ReferencesI
Bagan, G., Bonifati, A., Ciucanu, R., Fletcher, G. H., Lemay, A., and Advokaat, N. (2016). gmark: schema-driven generation of graphs and queries. IEEE Trans. Knowl. and Data Eng., 29(4):856–869. Bagan, G., Bonifati, A., and Groz, B. (2013). A trichotomy for regular simple path queries on graphs. In Proc. 32nd ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, pages 261–272. Barbieri, D. F., Braga, D., Ceri, S., Della Valle, E., and Grossniklaus, M. (2009). C-sparql: Sparql for continuous querying. In Proc. 18th Int. World Wide Web Conf., pages 1061–1062. Bonifati, A., Martens, W., and Timm, T. (2019). Navigating the maze of wikidata query logs. In Proc. 28th Int. World Wide Web Conf., pages 127–138. ACM. Calbimonte, J.-P., Corcho, O., and Gray, A. J. (2010). Enabling ontology-based access to streaming data sources. In Proc. 9th Int. Semantic Web Conf., pages 96–111. Springer. Choudhury, S., Holder, L. B., Jr, G. C., Agarwal, K., and Feo, J. (2015). A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs. In Proc. 18th Int. Conf. on Extending Database Technology, pages 157–168.
14 ReferencesII
Dell’Aglio, D., Calbimonte, J.-P., Della Valle, E., and Corcho, O. (2015). Towards a unified language for rdf stream query processing. In Proc. 12th Extended Semantic Web Conf., pages 353–363. Springer. Erling, O., Averbuch, A., Larriba-Pey, J., Chafi, H., Gubichev, A., Prat, A., Pham, M.-D., and Boncz, P. (2015). The ldbc social network benchmark: Interactive workload. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 619–630. http://doi.acm.org/10.1145/2723372.2742786. Gupta, A., Mumick, I. S., and Subrahmanian, V. S. (1993). Maintaining views incrementally. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 157–166. Iyer, A. P., Li, L. E., Das, T., and Stoica, I. (2016). Time-evolving graph processing at scale. In Proc. 4th Int. Workshop on Graph Data Management Experiences and Systems, pages 1–6. Kent, A. D., Liebrock, L. M., and Neil, J. C. (2015). Authentication graphs: Analyzing user behavior within an enterprise network. Computers & Security, 48:150–166. Kumar, P. and Huang, H. H. (2020). Graphone: A data store for real-time analytics on evolving graphs. ACM Trans. Storage, 15(4):1–40.
15 ReferencesIII
Le-Phuoc, D., Dao-Tran, M., Parreira, J. X., and Hauswirth, M. (2011). A native and adaptive approach for unified processing of linked streams and linked data. In Proc. 10th Int. Semantic Web Conf., pages 370–388. Springer. Mariappan, M. and Vora, K. (2019). Graphbolt: Dependency-driven synchronous processing of streaming graphs. In Proc. 14th ACM SIGOPS/EuroSys European Conf. on Comp. Syst., pages 1–16. Mendelzon, A. O. and Wood, P. T. (1995). Finding regular simple paths in graph databases. SIAM J. on Comput., 24(6):1235–1258. Qiu, X., Cen, W., Qian, Z., Peng, Y., Zhang, Y., Lin, X., and Zhou, J. (2018). Real-time constrained cycle detection in large dynamic graphs. Proc. VLDB Endowment, 11(12):1876–1888. Yakovets, N., Godfrey, P., and Gryz, J. (2016). Query planning for evaluating sparql property paths. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1875–1889. ACM.
16 Additional Slides
17 This presentation is not
1 A streaming RDF engine C-SPARQL [Barbieri et al., 2009], CQELS [Le-Phuoc et al., 2011], SPARQLstream [Calbimonte et al., 2010], W3C RSP-QL [Dell’Aglio et al., 2015] Based on SPARQL v1.0 - no path navigation Our algorithms can be incorporated to provide incremental RPQ evaluation 2 A graph analytics engine for dynamic graphs GraphOne [Kumar and Huang, 2020], GraphBolt [Mariappan and Vora, 2019], GraphTau [Iyer et al., 2016] Efficient maintenance of graph snapshots Iterative graph analytics in the vertex-centric model We focus on persistent evaluation of path queries that are specified declaratively
18 Workloads
Table: Most common RPQs in real workloads [Bonifati et al., 2019].
Name Query Name Query ∗ ∗ Q1 a Q7 a ◦ b ◦ c ∗ ∗ Q2 a ◦ b Q8 a? ◦ b ∗ ∗ + Q3 a ◦ b ◦c Q9 (a1 + a2 + ··· + ak ) ∗ ∗ Q4 (a1 + a2 + ··· + ak ) Q10 (a1 + a2 + ··· + ak ) ◦ b ∗ Q5 a ◦ b ◦ c Q11 a1 ◦ a2 ◦ · · · ◦ ak ∗ ∗ Q6 a ◦ b
19 Throughput & Tail Latency
Throughput (edges/s) Tail Latency (μs) Throughput (edges/s) Tail Latency (μs)
105 9.2 × 101 105 104 9 × 101
104 8.8 × 101 104 103 8.6 × 101
3 3 10 8.4 × 101 10 102 Tail Latency (μs) Tail Latency (μs) 8.2 × 101 Throughput (edges/s) Throughput (edges/s) 102 102 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q1 Q2 Q3 Q5 Q6 Q7 Q11 Query Query
(a) Yago2s (b) LDBC SF10
Throughput (edges/s) Tail Latency (μs)
105
104 104
103 Tail Latency (μs) Throughput (edges/s) 102 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Query
(c) Stackoverflow 20 Figure: Throughput and tail latency of the Algorithm ??. Y axis is given in log-scale. Index Size
# of Trees # of Nodes
40 20.0 35 17.5 30 15.0 25 12.5 20 10.0 15 7.5
10 # of nodes (1M) # of trees(1000) 5.0
2.5 5
0.0 0 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Query
Figure: Size of the tree index ∆ on the SO graph with |W | = 30days.
21 Scalability - Tail Latency (99thPercentile)
Q1 Q3 Q5 Q7 Q9 Q11 Q2 Q4 Q6 Q8 Q10
3 × 102
2 × 102 Tail latency (μs)
5M 10M 15M 20M 0.5M 1M 1.5M 2M Window size Slide size Figure: Tail latency with various |W | and β
22 Scalability - Window Management Time
Q1 Q3 Q5 Q7 Q9 Q11 Q2 Q4 Q6 Q8 Q10
3 × 105
2 × 105
5
Expiry Time (μs) 10
5M 10M 15M 20M 0.5M 1M 1.5M 2M Window size Slide size Figure: The average window maintenance cost with various |W | and β
23 Explicit Deletions
Q1 Q3 Q5 Q7 Q9 Q11 Q2 Q4 Q6 Q8 Q10
140
130
120
110
100 Tail Latency (μs) 90
80 0 2 4 6 8 10 % of explicit deletions
Figure: Impact of the ratio of explicit deletions on tail latency for all queries on Yago2s RDF graph with |W | =≈ 10M tuples.
24 Impact of DFA Construction
12 ) s
120 / s e
10 g )
100 d k ( e
s 8 3 e 0
t 80 1 a ( t
t s
6 u f 60 p o
h g #
4 40 u o r h
2 20 T 2 4 6 8 10 12 14 16 18 20 2 4 6 8 10 12 Query size (|QR|) # of states (k)
Figure: The impact of the query length |QR | on the automata size, k, and the throughput. RPQs are generated using gMark [Bagan et al., 2016] where the query size ranges from 2 to 20. Each RPQ is formulated by grouping labels into concatenations and alternations of size up to 3, and each group has a 50% probability of having ∗ and +.
25