Graphs Part 3 Strongly Connected Components This is another example application of Depth First Search (DFS). Definition: Maximal set of vertices for a , so that for each pair of vertices, u and v, there is a path both from u to v and from v to u. Usefulness: Groups of people or populations that are connected amongst themselves; brain regions that are strongly connected; figuring out if a network such as telephone lines are strongly connected (you can dial anyone in the network). We will use Depth First Search (DFS), but not just by itself. Consider panel (a) in the figure below (we will look at panels b and c afterwards): 616 Chapter 22 Elementary Graph Algorithms

abcd 13/14 11/16 1/10 8/9

(a)

12/15 3/4 2/7 5/6 efgh

abcd

(b)

efgh

cd

(c) abe

fg h

Figure 22.9 (a) AdirectedgraphG.EachshadedregionisastronglyconnectedcomponentofG. Each vertex is labeled with its discovery and finishing times in a depth-first search, and edges are shaded. (b) The graph GT,thetransposeofG,withthedepth-firstforestcomputedinline3 of STRONGLY-CONNECTED-COMPONENTS shown and tree edges shaded. Each strongly connected component corresponds to one depth-first tree. Vertices b, c, g,andh,whichareheavilyshaded,are the roots of the depth-first trees produced by the depth-first search of GT. (c) The acyclic component graph GSCC obtained by contracting all edges within each strongly connected component of G so that only a single vertex remains in each component.

Our algorithm for finding strongly connected components of a graph G D .V; E/ uses the transpose of G,whichwedefinedinExercise22.1-3tobethe graph GT .V; ET/,whereET .u; !/ .!;u/ E .Thatis,ET consists of D D f W 2 g the edges of G with their directions reversed. Given an adjacency-list representa- tion of G,thetimetocreateGT is O.V E/.ItisinterestingtoobservethatG C and GT have exactly the same strongly connected components: u and ! are reach- able from each other in G if and only if they are reachable from each other in GT. Figure 22.9(b) shows the transpose of the graph in Figure 22.9(a), with the strongly connected components shaded. Each cluster of strongly connected components is in gray. For example, there is a path from node c to d, and from node d to c. There is also a path from any node in a,b,e to other nodes in that cluster. If we started DFS from node c, we could discover strongly connected component cluster c and d. But if we started the DFS search from node a, we will actually discover all the nodes of this graph, and not an individual strongly connected component cluster (because b has an arrow out of the cluster going to c, so c would be discovered too, although it is not strongly connected with a,b,e). Solution: We need to call BFS from the right places to find the strongly connected components! Algorithm: 1. Call DFS on graph G to compute finish times u.f for each vertex u in graph G (this is step a in the figure above)

2. Compute the transpose of graph G: T G

This is the graph G, but with all edge directions reversed (see panel b in figure above). Note that G and its transpose have the same strongly connected component clusters. 3. Call DFS on the transpose graph, but in the DFS loop consider vertices in decreasing order of finish time from the DFS we ran on G (starting from highest finish time). This process is also shown in panel b. Since we are considering vertices in decreasing order of finish times from the DFS in panel (a), we first consider vertex b and perform DFS in panel (b) on the transposed graph. We find strongly connected components b,a,e in this way. From this cluster of 3 vertices, there are no are no more vertices going out in the transpose graph. So as in DFS, we choose a new starting node, and again take the largest remaining finish time from panel (a) and therefore start in panel (b) from node c. We find nodes c and d in this way. Again, there are no more nodes going out of the cluster. We start from g, and find strongly connected component cluster g,f. Finally, we also start from vertex h and find only the vertex itself as the last strongly connected component (with itself). 4. Panel c shows the resulting strongly connected components, with each cluster written out as its own vertex. This also shows the idea of strongly connected components. Each cluster has a path between all nodes in the cluster, but between clusters there is only a single arrow (otherwise if there were arrows going both ways, both clusters would form a larger strongly connected component). You could also see that the strongly connected components of graph G and its transpose are the same (just the arrow between strongly connected components is reversed in the transpose graph). Why it works: The main intuition is that the finish times in the original DFS on graph G, give us the proper starting points for our second DFS, such that we each time discover a strongly connected component cluster and there are no arrows to other undiscovered clusters. For example, the first cluster b,a,e in panel (b), had no out going arrows from these vertices. The second cluster c,d only had out going arrows to the cluster that was already discovered. And so on. There is a lemma and proof in the book, which we did not go over. Dijkstra’s shortest path algorithm 1. This algorithm finds a single source shortest path on a directed graph, with weights for the edges (for instance, weights could be driving distances between locations). 2. This is a greedy algorithm, and the weights must be non negative (otherwise, for instance if we want to represent positive profit and negative loss, we can use a “cousin” algorithm, Bellman Ford, which we do not discuss here, and uses dynamic programming). 3. Remember that Breadth First Search (BFS) also finds a shortest path, but in the case of unweighted edges. Dijkstra can be seen as a generalization of BFS. Approach: We first describe the main approach, and then look at an example from the text-book, which will make it more concrete. The main idea is that we maintain a set of vertices S whose final shortest path lengths have already been determined. Each time we consider the not yet discovered vertices in the graph, and all edges going from a discovered vertex (u) to an undiscovered vertex (v). We choose an undiscovered vertex with an edge from u to v, that gives the shortest path length. The length from u to v for each vertex v, is given by the length of u, plus the weight between u and v. In the initialization, we just include source node s in the set of discovered nodes, and set its length to 0. All other lengths are initially infinity. Then we keep expanding set S of discovered nodes in a greedy manner, as in the example figure below. In the figure, the black vertices at each step are those vertices added to set S. Initially, only s is in set S. s can go to t (length 10) or y (length 5), and y yields the shortest path. Now both s and y are in set S. We consider all possibilities from set S (vertices s and y) to other vertices. We already have the s to t (length 10) stored and we also look at y to t (5 + 3 = 8); y to z (5 + 2 = 7); and y to x (5 + 9) – 14. The shortest greedy choice is y to z (length 7), so now z is added to our set of discovered nodes S. And so on; see figure below. 24.3 Dijkstra’s algorithm 659

tx tx tx 1 1 1 ∞∞ 10 ∞ 8 14 10 10 10 9 9 9 s 0 2 3 4 6 s 0 2 3 4 6 s 0 2 3 4 6 7 7 7 5 5 5 ∞∞ 5 ∞ 5 7 yz2 yz2 yz2 (a) (b) (c) tx tx tx 1 1 1 8 13 8 9 8 9 10 10 10 9 9 9 s 0 2 3 4 6 s 0 2 3 4 6 s 0 2 3 4 6 7 7 7 5 5 5 5 7 5 7 5 7 yz2 yz2 yz2 (d) (e) (f)

Figure 24.6 The execution of Dijkstra’s algorithm. The source s is the leftmost vertex. The shortest-path estimates appear within the vertices, and shaded edges indicate predecessor values. Black vertices are in the set S,andwhiteverticesareinthemin-priorityqueueQ V S. (a) The D ! situation just before the first iteration of the while loop of lines 4–8. The shaded vertex has the mini- mum d value and is chosen as vertex u in line 5. (b)–(f) The situation after each successive iteration of the while loop. The shaded vertex in each part is chosen as vertex u in line 5 of the next iteration. The d values and predecessors shown in part (f) are the final values.

and added to S exactly once, so that the while loop of lines 4–8 iterates exactly V j j times. Because Dijkstra’s algorithm always chooses the “lightest” or “closest” vertex in V S to add to set S,wesaythatitusesagreedystrategy.Chapter16explains Run time: We noted! very briefly in class that we go though each vertex once, and then for each vertexgreedy we strategies need in to detail, look but at you its need adjacency not have read list. that If chapterthere toare understand n vertices Dijkstra’s algorithm. Greedy strategies do not always yield optimal results in gen- and m edges, weeral, have but as the the followingusual (n theorem + m) andnumber its corollary of operations. show, Dijkstra’s However, algorithm does each operation takes time,indeed computesince we shortest need paths. to Thefind key the is to minimum show that each amongst time it adds all a vertex possibleu to set S,wehaveu:d ı.s; u/. edges. We did not go into detail,D but this can be done efficiently by extracting the minimum from a TheoremQueue, 24.6 with (Correctness each such of Dijkstra’s minimum algorithm) taking O(log n) time. Overall we have O( (n+m)Dijkstra’s log n) algorithm, run on a weighted, directed graph G .V; E/ with non- D negative weight function w and source s,terminateswithu:d ı.s; u/ for all D vertices u V . 2