Streaming Graph

CSC 581 Christopher Siu 1 / 22 Finding the Majority Element

Consider the following problem: Given an unsorted list of arbitrary elements, find the element that makes up more than half the elements in the list.

Example Given: [−1, 8, λ, 8, 8, a, ?, 8, ?, 94, 8, 8, 8, 8, ∅] . . . return 8.

2 / 22 Naïve Solution

MajorityElement(A ← [a1, a2, . . . , an]) Input: A finite list of n elements, A Output: The majority element in the list 1: for i ← 1 to n do 2: let count ← 0 3: for j ← 1 to n do 4: if aj = ai then 5: count ← count + 1 n 6: if count > b /2c then 7: return ai 8: return “None”

This runs in On2 time and O(1) space.

3 / 22 Divide and Conquer Solution

MajorityElement(A ← [a1, a2, . . . , an]) Input: A finite list of n elements, A Output: The majority element and the number of times it occurs 1: if n = 1 then 2: return (a1, 1) 3: else n 4: let mid ← b /2c 5: let (al, cl) ← MajorityElement([a1, . . . , amid]) 6: let (ar, cr) ← MajorityElement([amid+1, . . . , an]) 7: if al = “None” and cr = “None” then 8: return (“None”, −1) 9: else if al = ar then 10: return (al, cl + cr) . .

4 / 22 Divide and Conquer Solution

MajorityElement(A ← [a1, a2, . . . , an]) . . 11: else 12: cl ← cl + Count(al, [amid+1, . . . , an]) 13: cr ← cr + Count(ar, [a1, . . . , amid]) n 14: if al 6= “None” and cl > b /2c then 15: return (al, cl) n 16: else if ar 6= “None” and cr > b /2c then 17: return (ar, cr) 18: else 19: return (“None”, −1)

This algorithm runs in O(n log n) time and O(n) space.

5 / 22 Hashing Solution

MajorityElement(A ← [a1, a2, . . . , an]) Input: A finite list of n elements, A Output: The majority element in the list 1: let T ← [] 2: for i ← 1 to n do 3: if ai ∈/ T then 4: T [ai] ← 1 5: else 6: T [ai] ← T [ai] + 1 n 7: if T [ai] > b /2c then 8: return ai 9: return “None”

This algorithm runs in O(n) time and O(n) space.

6 / 22 Boyer-Moore Majority Vote Algorithm

MajorityVote(A ← [a1, a2, . . . , an]) Input: A finite list of n elements, A Output: The majority element in the list, assuming one exists 1: let x ← a1 2: let votes ← 1 3: for i ← 2 to n do 4: if votes = 0 then 5: x ← ai 6: votes ← 1 7: else if ai = x then 8: votes ← count + 1 9: else 10: votes ← count − 1 11: return x

7 / 22 Boyer-Moore Majority Vote Algorithm

Example Let A = [−1, 8, λ, 8, 8, a, ?, 8, ?, 94, 8, 8, 8, 8, ∅].

Element Candidate Votes (continued) −1 −1 1 ?? 1 8 −1 0 94 ? 0 λ λ 1 8 8 1 8 λ 0 8 8 2 8 8 1 8 8 3 a 8 0 8 8 4 ?? 1 ∅ 8 3 8 ? 0 return 8

This algorithm runs in O(n) time and O(1) space.

8 / 22 Boyer-Moore Majority Vote Algorithm

In a sense, this algorithm is “complete”, but not “sound”. If there is a majority element, this algorithm will find it. If this algorithm finds an element, that element is not necessarily the majority element.

MajorityElement(A ← [a1, a2, . . . , an]) Input: A finite list of n elements, A Output: The majority element in the list 1: let x ← MajorityVote(A) n 2: if Count(x, A) > b /2c then 3: return x 4: else 5: return “None”

This algorithm, too, runs in O(n) time and O(1) space.

9 / 22 Boyer-Moore Majority Vote Algorithm

The Majority Vote algorithm loses context. The algorithm doesn’t keep complete occurrence information. The algorithm is not capable of .

The algorithm discards data it has already processed. The algorithm is capable of operating in constant space.

Karp, Shenker, and Papadimitriou generalized the Majority Vote 1  algorithm to find all elements with freqeuncy > θ in O /θ space, although it requres two passes.

10 / 22 Streaming Algorithms

A streaming algorithm processes an indefinite of elements, S = hs1, s2, . . . , sn,...i. Since the goal is to avoid storing the dataset in memory, the algorithm must operate under severe memory constraints.

Since the algorithm cannot know the future, it must process data as it arrives, in the order it arrives.

Streaming algorithms are ideal for processing massive datasets or responding to real-time events.

11 / 22 The Seven Bridges of Königsberg

A graph G = (V,E) consists of a set of vertices, V , and a set of edges, E. By convention, |V | = n and |E| = m.

Each edge e = (u, v) connects two vertices, u and v. Graphs are used to represent relationships between entities.

12 / 22 Graphs

Graphs can be used to represent: Bridges and landmasses Diseased populations Cities and roads States and transitions Computers on the Internet Friends on social networks Flights between airports ...

Given a graph, we might be interested in: Eulerian paths Densest subgraphs Hamiltonian paths Shortest paths Spanning trees Largest cliques Network flow ...

13 / 22 Finding A Minimum Spanning

Consider the following problem: Given a connected, undirected, weighted graph, find the tree of minimum total weight that connects all vertices.

Example

Given: v 3 6 8 2 u x

7 2 7 w . . . return {(u, v, 3) , (u, x, 2) , (w, x, 2)}.

14 / 22 Kruskal’s Algorithm

MinimumSpanningTree(G ← (V,E)) Input: A connected, undirected, weighted graph G Output: A minimum spanning tree of G 1: let T be an empty tree 2: while E 6= ∅ do 3: let e ← MinWeightEdge(E) 4: E ← E \{e} 5: if T ∪ {e} does not contain a cycle then 6: T ← T ∪ {e} 7: return T

This algorithm runs in O (m log m) time and O (n + m) space. Selecting the minimum weight edge requires that the algorithm access the edge set multiple times.

15 / 22 Graph Streaming Algorithms

Often, a semi-streaming model must be used. An algorithm is allowed O(n polylog n) space. The time constraint is similarly relaxed.

McGregor distinguishes three types of graph streams: Insert-Only An unordered stream of edges, S = he1, e2, . . . , emi Graph Sketch A stream of edge additions and removals, S = ha1, a2, . . . , aki, where ai = (ei, δi ∈ {1, −1}). Sliding Window An indefinite stream where at time t, only the last w edges are considered, S = h. . . , et−w+1, . . . , et−1, et,...i.

16 / 22 Finding a Minimum Spanning Tree

The Cycle Property For any cycle C in a graph G, if e is the strictly maximum weight edge in C, then e is not part of any MST of G.

MinimumSpanningTree(G ← he1, e2, . . . , emi) Input: A stream of edges defining a connected, undirected graph G Output: A minimum spanning tree of G 1: let T be an empty tree 2: for e ∈ G do 3: T ← T ∪ {e} 4: if T contains a cycle, C then 5: T ← T \{MaxWeightEdge(C)} 6: return T This algorithm only requires one pass through the edge set.

17 / 22 Selected Graph Streaming Algorithms

Algorithm

Problem Insert-Only Graph Sketch Sliding Window

Connectivity X X X Two-coloring X X X

Min. (1 + )-approx. or (1 + )-approx. Spanning Tree X O (log n) passes Max. 2-approx. Multiple passes (3 + )-approx. Matching

McGregor, Feigenbaum et al.

18 / 22 Future Work

Most research has been done with undirected graphs, yet many naturally occurring graphs are directed.

Constructing a graph generally does not produce edges in a truly random order; can we take advantage of that fact?

19 / 22 Questions?

20 / 22 References

1 R. M. Karp, S. Shenker, and C. H. Papadimitriou. “A Simple Algorithm for Finding Frequent Elements in Streams and Bags”. ACM Transactions on Database Systems, Vol. 28, No. 1, 2003, pp. 51-55. [Online]. Available: https://dl.acm.org/citation.cfm?id=762473

2 R. S. Boyer and J. S. Moore. “MJRTY — A Fast Majority Vote Algorithm”, in Automated Reasoning: Essays in Honor of Woody Bledsoe R. S. Boyer, Ed. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1991, pp. 105-117. [Online]. Available: http://www.cs.utexas.edu/~moore/best-ideas/mjrty/index.html

3 N. Alon, Y. Matias, and M. Szegedy. “The Space Complexity of Approximating the Frequency Moments”, in Proc. of the 28th Annual ACM Symposium on Theory of Computing, 1996, pp. 20-29. [Online]. Available: https://dl.acm.org/citation.cfm?id=237823

4 A. McGregor. “Graph Stream Algorithms: A Survey”, in ACM SIGMOD Record, Vol. 43, No. 1, 2014, pp. 9-20. [Online]. Available: https://dl.acm.org/citation.cfm?id=2627694

21 / 22 References

5 R. E. Tarjan. “Minimum Spanning Trees”, in Data Structures and Network Algorithms, Philadelphia, USA: Society for Industrial and Applied Mathematics, 1983, pp. 71-83.

6 J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. “On Graph Problems in a Semi-Streaming Model”, in Theoretical Computer Science, Vol. 348, No. 2, 2005, pp. 207-216. 7 D. Ediger and J. P. Fairbanks. “Deriving Streaming Graph Algorithms from Static Definitions”, in IEEE Intl. Parallel and Distributed Processing Symposium Workshops, 2017, pp. 637-642. [Online]. Available: https://ieeexplore.ieee.org/document/7965103/

8 J. Riedy and D. A. Bader. “Multithreaded Community Monitoring for Massive Streaming Graph Data”, in IEEE 27th Intl. Symposium on Parallel and Distributed Processing Workshops and PhD Forum, 2013, pp. 1646-1655. [Online]. Available: https://ieeexplore.ieee.org/document/6651061/

22 / 22