
ROBERT SEDGEWICK | KEVIN WAYNE

4.2 DIRECTED GRAPHS

Algorithms, FOURTH EDITION

‣ introduction
‣ digraph API
‣ digraph search
‣ topological sort
‣ strong components

http://algs4.cs.princeton.edu

Directed graphs

Digraph. Set of vertices connected pairwise by directed edges.

[Figure: a 13-vertex digraph, annotated with a vertex of outdegree 4 and indegree 2, a directed path from 0 to 2, and a directed cycle.]

Road network

Vertex = intersection; edge = one-way street.

[Figure: street map around the Holland Tunnel, New York, NY 10013. ©2008 Google - Map data ©2008 Sanborn, NAVTEQ™]

Political blogosphere graph

Vertex = political blog; edge = link.

[Figure: community structure of political blogs. Source: The Political Blogosphere and the 2004 U.S. Election: Divided They Blog, Adamic and Glance, 2005]

Overnight interbank loan graph

Vertex = bank; edge = overnight loan.

[Figure: components of the federal funds network, labeled GSCC, GWCC, GIN, GOUT, DC, and Tendril. Source: The Topology of the Federal Funds Market, Bech and Atalay, 2008]

Uber taxi graph

Vertex = taxi pickup; edge = taxi ride.

[Figure source: http://blog.uber.com/2012/01/09/uberdata-san-franciscomics/]

Implication graph

Vertex = variable; edge = logical implication.

[Figure: implication graph on the variables x0 to x6 and their negations. Example edge: if x5 is true, then x0 is true.]

Combinational circuit

Vertex = logical gate; edge = wire.

WordNet graph

Vertex = synset; edge = hypernym relationship.

[Figure: excerpt of the WordNet hypernym hierarchy rooted at "event". Source: http://wordnet.princeton.edu]

Digraph applications

   digraph                 vertex                directed edge
   transportation          street intersection   one-way street
   web                     web page              hyperlink
   food web                species               predator-prey relationship
   WordNet                 synset                hypernym
   scheduling              task                  precedence constraint
   financial               bank                  transaction
   cell phone              person                placed call
   infectious disease      person                infection
   game                    board position        legal move
   citation                journal article       citation
   object graph            object                pointer
   inheritance hierarchy   class                 inherits from
   control flow            code block            jump

Some digraph problems

   problem               description
   s→t path              Is there a path from s to t?
   shortest s→t path     What is the shortest path from s to t?
   directed cycle        Is there a directed cycle in the graph?
   topological sort      Can the digraph be drawn so that all edges point upwards?
   strong connectivity   Is there a directed path between all pairs of vertices?
   transitive closure    For which vertices v and w is there a directed path from v to w?
   PageRank              What is the importance of a web page?

Digraph API

Almost identical to Graph API.

public class Digraph

   Digraph(int V)                   create an empty digraph with V vertices
   Digraph(In in)                   create a digraph from input stream
   void addEdge(int v, int w)       add a directed edge v→w
   Iterable<Integer> adj(int v)     vertices pointing from v
   int V()                          number of vertices
   int E()                          number of edges
   Digraph reverse()                reverse of this digraph
   String toString()                string representation
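The API lists reverse() without showing how it might be computed. The following is a minimal sketch, assuming a stand-in class named SimpleDigraph (a hypothetical name, built on java.util lists rather than the Bag type used in these slides): it walks every adjacency list once and adds each edge v→w to the new digraph as w→v.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal adjacency-lists digraph using java.util only.
// SimpleDigraph is an illustrative stand-in, not the Digraph class from the slides.
final class SimpleDigraph {
    private final int V;
    private int E;
    private final List<List<Integer>> adj;

    SimpleDigraph(int V) {
        this.V = V;
        adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
    }

    int V() { return V; }
    int E() { return E; }

    // Add the directed edge v->w; only v's list changes.
    void addEdge(int v, int w) {
        adj.get(v).add(w);
        E++;
    }

    // Vertices pointing from v.
    Iterable<Integer> adj(int v) { return adj.get(v); }

    // Reverse of this digraph: every edge v->w becomes w->v.
    SimpleDigraph reverse() {
        SimpleDigraph R = new SimpleDigraph(V);
        for (int v = 0; v < V; v++)
            for (int w : adj.get(v))
                R.addEdge(w, v);
        return R;
    }

    public static void main(String[] args) {
        SimpleDigraph G = new SimpleDigraph(3);
        G.addEdge(0, 1);
        G.addEdge(0, 2);
        G.addEdge(2, 1);
        SimpleDigraph R = G.reverse();
        for (int v = 0; v < R.V(); v++)
            for (int w : R.adj(v))
                System.out.println(v + "->" + w);
    }
}
```

Building the reverse this way takes time proportional to E + V, the same as one pass over the adjacency lists.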

Digraph API

   % java Digraph tinyDG.txt
   0->5
   0->1
   2->0
   2->3
   3->5
   3->2
   4->3
   4->2
   5->4
   ⋮
   11->4
   11->12
   12->9

   In in = new In(args[0]);              // read digraph from input stream
   Digraph G = new Digraph(in);

   for (int v = 0; v < G.V(); v++)       // print out each edge (once)
      for (int w : G.adj(v))
         StdOut.println(v + "->" + w);

Digraph representation: adjacency lists

Maintain vertex-indexed array of lists.

[Figure: tinyDG.txt (V = 13, E = 22) and the corresponding adjacency-lists array adj[].]

Digraph representations

In practice. Use adjacency-lists representation.
・Algorithms based on iterating over vertices pointing from v.
・Real-world digraphs tend to be sparse (huge number of vertices, small average vertex degree).

   representation      space    insert edge      edge from      iterate over vertices
                                from v to w      v to w?        pointing from v?
   list of edges       E        1                E              E
   adjacency matrix    V^2      1†               1              V
   adjacency lists     E + V    1                outdegree(v)   outdegree(v)

   † disallows parallel edges

Adjacency-lists graph representation (review): Java implementation

   public class Graph
   {
      private final int V;
      private final Bag<Integer>[] adj;        // adjacency lists

      public Graph(int V)                      // create empty graph with V vertices
      {
         this.V = V;
         adj = (Bag<Integer>[]) new Bag[V];
         for (int v = 0; v < V; v++)
            adj[v] = new Bag<Integer>();
      }

      public void addEdge(int v, int w)        // add edge v–w
      {
         adj[v].add(w);
         adj[w].add(v);
      }

      public Iterable<Integer> adj(int v)      // iterator for vertices adjacent to v
      {  return adj[v];  }
   }

Adjacency-lists digraph representation: Java implementation

   public class Digraph
   {
      private final int V;
      private final Bag<Integer>[] adj;        // adjacency lists

      public Digraph(int V)                    // create empty digraph with V vertices
      {
         this.V = V;
         adj = (Bag<Integer>[]) new Bag[V];
         for (int v = 0; v < V; v++)
            adj[v] = new Bag<Integer>();
      }

      public void addEdge(int v, int w)        // add edge v→w
      {
         adj[v].add(w);
      }

      public Iterable<Integer> adj(int v)      // iterator for vertices pointing from v
      {  return adj[v];  }
   }

Depth-first search in digraphs

Problem. Find all vertices reachable from s along a directed path.

Same method as for undirected graphs.
・Every undirected graph is a digraph (with edges in both directions).
・DFS is a digraph algorithm.

DFS (to visit a vertex v)

Mark v as visited. Recursively visit all unmarked vertices w pointing from v.

Depth-first search demo

To visit a vertex v:
・Mark vertex v as visited.
・Recursively visit all unmarked vertices pointing from v.

[Figure: a 13-vertex digraph with edges 4→2, 2→3, 3→2, 6→0, 0→1, 2→0, 11→12, 12→9, 9→10, 9→11, 8→9, 10→12, 11→4, 4→3, 3→5, 6→8, 8→6, 5→4, 0→5, 6→4, 6→9, 7→6; the vertices reachable from vertex 0 are highlighted.]

    v   marked[]   edgeTo[]
    0      T          –
    1      T          0
    2      T          3
    3      T          4
    4      T          5
    5      T          0
    6      F          –
    7      F          –
    8      F          –
    9      F          –
   10      F          –
   11      F          –
   12      F          –

Depth-first search (in undirected graphs)

Recall code for undirected graphs.

   public class DepthFirstSearch
   {
      private boolean[] marked;                  // true if connected to s

      public DepthFirstSearch(Graph G, int s)    // constructor marks vertices connected to s
      {
         marked = new boolean[G.V()];
         dfs(G, s);
      }

      private void dfs(Graph G, int v)           // recursive DFS does the work
      {
         marked[v] = true;
         for (int w : G.adj(v))
            if (!marked[w]) dfs(G, w);
      }

      public boolean visited(int v)              // client can ask whether any vertex is connected to s
      {  return marked[v];  }
   }

Depth-first search (in directed graphs)

Code for directed graphs identical to undirected one. [substitute Digraph for Graph]

   public class DirectedDFS
   {
      private boolean[] marked;                  // true if path from s

      public DirectedDFS(Digraph G, int s)       // constructor marks vertices reachable from s
      {
         marked = new boolean[G.V()];
         dfs(G, s);
      }

      private void dfs(Digraph G, int v)         // recursive DFS does the work
      {
         marked[v] = true;
         for (int w : G.adj(v))
            if (!marked[w]) dfs(G, w);
      }

      public boolean visited(int v)              // client can ask whether any vertex is reachable from s
      {  return marked[v];  }
   }

Reachability application: program control-flow analysis

Every program is a digraph.
・Vertex = basic block of instructions (straight-line program).
・Edge = jump.

Dead-code elimination. Find (and remove) unreachable code.

Infinite-loop detection. Determine whether exit is unreachable.

Reachability application: mark-sweep garbage collector

Every data structure is a digraph.
・Vertex = object.
・Edge = reference.

Roots. Objects known to be directly accessible by program (e.g., stack).

Reachable objects. Objects indirectly accessible by program (starting at a root and following a chain of pointers).

Mark-sweep algorithm. [McCarthy, 1960]
・Mark: mark all reachable objects.
・Sweep: if object is unmarked, it is garbage (so add to free list).

Memory cost. Uses 1 extra mark bit per object (plus DFS stack).

Depth-first search in digraphs summary

DFS enables direct solution of simple digraph problems.
・Reachability. ✓
・Path finding.
・Topological sort.
・Directed cycle detection.

Basis for solving difficult digraph problems.
・2-satisfiability.
・Directed Euler path.
・Strongly-connected components.

[Figure: first page of Robert Tarjan, "Depth-First Search and Linear Graph Algorithms," SIAM J. Comput., Vol. 1, No. 2, June 1972.]

Breadth-first search in digraphs

Same method as for undirected graphs.
・Every undirected graph is a digraph (with edges in both directions).
・BFS is a digraph algorithm.

BFS (from source vertex s)

Put s onto a FIFO queue, and mark s as visited.
Repeat until the queue is empty:
 - remove the least recently added vertex v
 - for each unmarked vertex pointing from v:
   add to queue and mark as visited.

Proposition. BFS computes shortest paths (fewest number of edges) from s to all other vertices in a digraph in time proportional to E + V.
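The BFS steps above can be sketched in Java as follows. This is an illustrative version under stated assumptions: the class name DigraphBfs, the build helper, and the 6-vertex example edge list are all hypothetical, and java.util collections stand in for the Queue and Bag types used in these slides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Sketch of single-source BFS in a digraph (names are illustrative).
final class DigraphBfs {
    // Build adjacency lists for a digraph with V vertices from {v, w} pairs.
    static List<List<Integer>> build(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);   // directed edge e[0]->e[1]
        return adj;
    }

    // dist[v] = fewest edges on a directed path from s to v, or -1 if v is unreachable.
    static int[] distTo(List<List<Integer>> adj, int s) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);            // -1 doubles as "unmarked"
        Queue<Integer> queue = new ArrayDeque<>();
        dist[s] = 0;                      // mark s as visited
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();       // least recently added vertex
            for (int w : adj.get(v)) {
                if (dist[w] == -1) {      // unmarked vertex pointing from v
                    dist[w] = dist[v] + 1;
                    queue.add(w);
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Hypothetical 6-vertex example digraph.
        int[][] edges = { {0, 1}, {0, 2}, {1, 2}, {2, 4}, {4, 3}, {3, 5}, {3, 2}, {5, 0} };
        System.out.println(Arrays.toString(distTo(build(6, edges), 0)));
    }
}
```

Each vertex is enqueued at most once and each adjacency list is scanned at most once, which is where the E + V bound in the proposition comes from.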

Directed breadth-first search demo

Repeat until queue is empty:
・Remove vertex v from queue.
・Add to queue all unmarked vertices pointing from v and mark them.

[Figure: BFS from vertex 0 in digraph G (tinyDG2.txt, V = 6, E = 8); final state shown below.]

   v   edgeTo[]   distTo[]
   0      –          0
   1      0          1
   2      0          1
   3      4          3
   4      2          2
   5      3          4

Multiple-source shortest paths

Multiple-source shortest paths. Given a digraph and a set of source vertices, find shortest path from any vertex in the set to each other vertex.

Ex. S = { 1, 7, 10 }.
・Shortest path to 4 is 7→6→4.
・Shortest path to 5 is 7→6→0→5.
・Shortest path to 12 is 10→12.
・…

[Figure: tinyDG.txt (V = 13, E = 22) with its adjacency lists adj[].]

Q. How to implement multi-source shortest paths algorithm?
A. Use BFS, but initialize by enqueuing all source vertices.
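The answer above ("initialize by enqueuing all source vertices") can be sketched as follows. The class name MultiSourceBfs and its helpers are hypothetical, and the main method uses the 22-edge list from this lecture's depth-first search demo as sample input.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Queue;

// Sketch of multiple-source BFS (names are illustrative).
final class MultiSourceBfs {
    final int[] distTo;   // edges on a shortest path from the nearest source, -1 if unreachable
    final int[] edgeTo;   // previous vertex on that path, -1 for sources and unreached vertices

    MultiSourceBfs(List<List<Integer>> adj, int[] sources) {
        int V = adj.size();
        distTo = new int[V];
        edgeTo = new int[V];
        Arrays.fill(distTo, -1);
        Arrays.fill(edgeTo, -1);
        Queue<Integer> queue = new ArrayDeque<>();
        for (int s : sources) {              // the only change from single-source BFS
            if (distTo[s] == -1) {
                distTo[s] = 0;
                queue.add(s);
            }
        }
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adj.get(v)) {
                if (distTo[w] == -1) {
                    distTo[w] = distTo[v] + 1;
                    edgeTo[w] = v;
                    queue.add(w);
                }
            }
        }
    }

    // Vertices on a shortest path from some source to v (empty if v is unreachable).
    List<Integer> pathTo(int v) {
        Deque<Integer> stack = new ArrayDeque<>();
        if (distTo[v] == -1) return new ArrayList<>();
        for (int x = v; x != -1; x = edgeTo[x]) stack.push(x);
        return new ArrayList<>(stack);       // iterates from the source down to v
    }

    static List<List<Integer>> build(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    public static void main(String[] args) {
        int[][] edges = {
            {4, 2}, {2, 3}, {3, 2}, {6, 0}, {0, 1}, {2, 0}, {11, 12}, {12, 9},
            {9, 10}, {9, 11}, {8, 9}, {10, 12}, {11, 4}, {4, 3}, {3, 5}, {6, 8},
            {8, 6}, {5, 4}, {0, 5}, {6, 4}, {6, 9}, {7, 6}
        };
        MultiSourceBfs bfs = new MultiSourceBfs(build(13, edges), new int[] {1, 7, 10});
        System.out.println(bfs.pathTo(4));
        System.out.println(bfs.pathTo(5));
        System.out.println(bfs.pathTo(12));
    }
}
```

Because all sources start at distance 0, the first time BFS reaches a vertex it has found a shortest path from whichever source is nearest.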

Breadth-first search in digraphs application: web crawler

Goal. Crawl web, starting from some root web page, say www.princeton.edu.

Solution. [BFS with implicit digraph]
・Choose root web page as source s.
・Maintain a Queue of websites to explore.
・Maintain a SET of discovered websites.
・Dequeue the next website and enqueue websites to which it links (provided you haven't done so before).

Q. Why not use DFS?

[Figure: a 50-vertex web digraph. How many strong components are there in this digraph?]

Bare-bones web crawler: Java implementation

   Queue<String> queue = new Queue<String>();        // queue of websites to crawl
   SET<String> marked = new SET<String>();           // set of marked websites

   String root = "http://www.princeton.edu";
   queue.enqueue(root);                              // start crawling from root website
   marked.add(root);

   while (!queue.isEmpty())
   {
      String v = queue.dequeue();                    // read in raw html from next website in queue
      StdOut.println(v);
      In in = new In(v);
      String input = in.readAll();

      String regexp = "http://(\\w+\\.)+(\\w+)";     // use regular expression to find all URLs
      Pattern pattern = Pattern.compile(regexp);     // in website of form http://xxx.yyy.zzz
      Matcher matcher = pattern.matcher(input);      // [crude pattern misses relative URLs]
      while (matcher.find())
      {
         String w = matcher.group();
         if (!marked.contains(w))
         {
            marked.add(w);                           // if unmarked, mark it and put on the queue
            queue.enqueue(w);
         }
      }
   }

Web crawler output

BFS crawl

   http://www.princeton.edu
   http://www.w3.org
   http://ogp.me
   http://giving.princeton.edu
   http://www.princetonartmuseum.org
   http://www.goprincetontigers.com
   http://library.princeton.edu
   http://helpdesk.princeton.edu
   http://tigernet.princeton.edu
   http://alumni.princeton.edu
   http://gradschool.princeton.edu
   http://vimeo.com
   http://princetonusg.com
   http://artmuseum.princeton.edu
   http://jobs.princeton.edu
   http://odoc.princeton.edu
   http://blogs.princeton.edu
   http://www.facebook.com
   http://twitter.com
   http://www.youtube.com
   http://deimos.apple.com
   http://qeprize.org
   http://en.wikipedia.org
   ...

DFS crawl

   http://www.princeton.edu
   http://deimos.apple.com
   http://www.youtube.com
   http://www.google.com
   http://news.google.com
   http://csi.gstatic.com
   http://googlenewsblog.blogspot.com
   http://labs.google.com
   http://groups.google.com
   http://img1.blogblog.com
   http://feeds.feedburner.com
   http://buttons.googlesyndication.com
   http://fusion.google.com
   http://insidesearch.blogspot.com
   http://agoogleaday.com
   http://static.googleusercontent.com
   http://searchresearch1.blogspot.com
   http://feedburner.google.com
   http://www.dot.ca.gov
   http://www.TahoeRoads.com
   http://www.LakeTahoeTransit.com
   http://www.laketahoe.com
   http://ethel.tahoeguide.com
   ...

Precedence scheduling

Goal. Given a set of tasks to be completed with precedence constraints, in which order should we schedule the tasks?

Digraph model. vertex = task; edge = precedence constraint.

   tasks
   0. Algorithms
   1. Complexity Theory
   2. Artificial Intelligence
   3. Intro to CS
   4. Cryptography
   5. Scientific Computing
   6. Advanced Programming

[Figure: precedence constraint graph on the seven tasks, and a feasible schedule.]

Topological sort

DAG. Directed acyclic graph.

Topological sort. Redraw DAG so all edges point upwards.

   directed edges
   0→5   0→2   0→1   3→6   3→5   3→4   5→2   6→4   6→0   3→2   1→4

[Figure: the DAG, and the same DAG redrawn in topological order.]

Solution. DFS. What else?

Topological sort demo

・Run depth-first search.
・Return vertices in reverse postorder.

[Figure: a directed acyclic graph (tinyDAG7.txt, V = 7, E = 11) with edges 0→5, 0→2, 0→1, 3→6, 3→5, 3→4, 5→2, 6→4, 6→0, 3→2, 1→4.]

   postorder           4 1 2 5 0 6 3
   topological order   3 6 0 5 2 1 4

Depth-first search order

   public class DepthFirstOrder
   {
      private boolean[] marked;
      private Stack<Integer> reversePostorder;

      public DepthFirstOrder(Digraph G)
      {
         reversePostorder = new Stack<Integer>();
         marked = new boolean[G.V()];
         for (int v = 0; v < G.V(); v++)
            if (!marked[v]) dfs(G, v);
      }

      private void dfs(Digraph G, int v)
      {
         marked[v] = true;
         for (int w : G.adj(v))
            if (!marked[w]) dfs(G, w);
         reversePostorder.push(v);
      }

      public Iterable<Integer> reversePostorder()    // returns all vertices in "reverse DFS postorder"
      {  return reversePostorder;  }
   }

Topological sort in a DAG: intuition

Why does topological sort algorithm work?
・First vertex in postorder has outdegree 0.
・Second vertex in postorder can only point to first vertex.
・...

   postorder           4 1 2 5 0 6 3
   topological order   3 6 0 5 2 1 4
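As a worked example of reverse postorder, here is a self-contained sketch that runs on the seven-task DAG from the precedence-scheduling slide. The class name TopologicalSketch and the isTopological checker are hypothetical additions, and java.util.ArrayDeque replaces the Stack type used above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: topological order of a DAG as reverse DFS postorder (names are illustrative).
final class TopologicalSketch {
    // Vertices in reverse DFS postorder; a topological order when the digraph is a DAG.
    static List<Integer> reversePostorder(List<List<Integer>> adj) {
        boolean[] marked = new boolean[adj.size()];
        Deque<Integer> stack = new ArrayDeque<>();
        for (int v = 0; v < adj.size(); v++)
            if (!marked[v]) dfs(adj, v, marked, stack);
        return new ArrayList<>(stack);      // iterates from the most recently pushed vertex
    }

    private static void dfs(List<List<Integer>> adj, int v, boolean[] marked, Deque<Integer> stack) {
        marked[v] = true;
        for (int w : adj.get(v))
            if (!marked[w]) dfs(adj, w, marked, stack);
        stack.push(v);                      // v is done: record it in postorder
    }

    // True if every edge v->w has v before w in the given order.
    static boolean isTopological(List<List<Integer>> adj, List<Integer> order) {
        int[] pos = new int[adj.size()];
        for (int i = 0; i < order.size(); i++) pos[order.get(i)] = i;
        for (int v = 0; v < adj.size(); v++)
            for (int w : adj.get(v))
                if (pos[v] >= pos[w]) return false;
        return true;
    }

    static List<List<Integer>> build(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    public static void main(String[] args) {
        int[][] edges = { {0, 5}, {0, 2}, {0, 1}, {3, 6}, {3, 5}, {3, 4},
                          {5, 2}, {6, 4}, {6, 0}, {3, 2}, {1, 4} };
        List<List<Integer>> dag = build(7, edges);
        List<Integer> order = reversePostorder(dag);
        System.out.println(order + " valid: " + isTopological(dag, order));
    }
}
```

A DAG usually has many topological orders. This sketch visits neighbors in insertion order, so the order it prints may differ from the one on the slide while still being valid.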

Topological sort in a DAG: correctness proof

Proposition. Reverse DFS postorder of a DAG is a topological order.

Pf. Consider any edge v→w. When dfs(v) is called:
・Case 1: dfs(w) has already been called and returned. Thus, w was done before v.
・Case 2: dfs(w) has not yet been called. dfs(w) will get called directly or indirectly by dfs(v) and will finish before dfs(v). Thus, w will be done before v.
・Case 3: dfs(w) has already been called, but has not yet returned. Can't happen in a DAG: function call stack contains path from w to v, so v→w would complete a cycle.

   dfs(0)
     dfs(1)
       dfs(4)
       4 done
     1 done
     dfs(2)
     2 done
     dfs(5)
       check 2
     5 done
   0 done
   check 1
   check 2
   dfs(3)              v = 3
     check 2           case 1
     check 4
     check 5
     dfs(6)            case 2
       check 0
       check 4
     6 done
   3 done
   check 4
   check 5
   check 6
   done

All vertices pointing from 3 are done before 3 is done, so they appear after 3 in topological order.

Directed cycle detection

Proposition. A digraph has a topological order iff no directed cycle.

Pf.
・If directed cycle, topological order impossible.
・If no directed cycle, DFS-based algorithm finds a topological order.

[Figure: a digraph with a directed cycle, with a trace of marked[], edgeTo[], and onStack[] while finding a directed cycle in a digraph.]

Goal. Given a digraph, find a directed cycle.

Solution. DFS. What else? See textbook.
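The slide defers the cycle-finding code to the textbook. One common way to do it with DFS is sketched below, tracking which vertices are on the current call stack, as the marked[], edgeTo[], and onStack[] arrays in the trace suggest. The class name DirectedCycleSketch and its methods are hypothetical, and this is not claimed to match the textbook's code line for line.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: find one directed cycle with DFS (names are illustrative).
final class DirectedCycleSketch {
    private final boolean[] marked;
    private final boolean[] onStack;   // is the vertex on the current recursive call stack?
    private final int[] edgeTo;        // edgeTo[w] = vertex from which DFS first reached w
    private List<Integer> cycle;       // null until a cycle is found

    DirectedCycleSketch(List<List<Integer>> adj) {
        int V = adj.size();
        marked = new boolean[V];
        onStack = new boolean[V];
        edgeTo = new int[V];
        for (int v = 0; v < V && cycle == null; v++)
            if (!marked[v]) dfs(adj, v);
    }

    private void dfs(List<List<Integer>> adj, int v) {
        marked[v] = true;
        onStack[v] = true;
        for (int w : adj.get(v)) {
            if (cycle != null) return;             // stop as soon as one cycle is known
            if (!marked[w]) {
                edgeTo[w] = v;
                dfs(adj, w);
            } else if (onStack[w]) {
                // Case 3 of the proof: the call stack holds a path w -> ... -> v,
                // so edge v->w closes a directed cycle.
                Deque<Integer> path = new ArrayDeque<>();
                for (int x = v; x != w; x = edgeTo[x]) path.push(x);
                path.push(w);
                cycle = new ArrayList<>(path);     // w, ..., v
                cycle.add(w);                      // close the cycle back at w
            }
        }
        onStack[v] = false;
    }

    boolean hasCycle() { return cycle != null; }

    // Vertices of one directed cycle (first == last), or null if the digraph is a DAG.
    List<Integer> cycle() { return cycle; }

    static List<List<Integer>> build(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    public static void main(String[] args) {
        int[][] edges = { {0, 5}, {5, 4}, {4, 3}, {3, 5} };
        System.out.println(new DirectedCycleSketch(build(6, edges)).cycle());
    }
}
```

If hasCycle() returns false the digraph is a DAG, so reverse postorder gives a topological order.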

Directed cycle detection application: precedence scheduling

Scheduling. Given a set of tasks to be completed with precedence constraints, in what order should we schedule the tasks?

[Figure: http://xkcd.com/754]

Remark. A directed cycle implies scheduling problem is infeasible.

Directed cycle detection application: cyclic inheritance

The Java compiler does cycle detection.

   public class A extends B
   {
      ...
   }

   public class B extends C
   {
      ...
   }

   public class C extends A
   {
      ...
   }

   % javac A.java
   A.java:1: cyclic inheritance involving A
   public class A extends B { }
          ^
   1 error

Directed cycle detection application: spreadsheet recalculation

Microsoft Excel does cycle detection (and has a circular reference toolbar!)

Depth-first search orders

Observation. DFS visits each vertex exactly once. The order in which it does so can be important.

Orderings.
・Preorder: order in which dfs() is called.
・Postorder: order in which dfs() returns.
・Reverse postorder: reverse order in which dfs() returns.

   private void dfs(Graph G, int v)
   {
      marked[v] = true;
      preorder.enqueue(v);
      for (int w : G.adj(v))
         if (!marked[w]) dfs(G, w);
      postorder.enqueue(v);
      reversePostorder.push(v);
   }

Strongly-connected components

Def. Vertices v and w are strongly connected if there is both a directed path from v to w and a directed path from w to v.

Key property. Strong connectivity is an equivalence relation:
・v is strongly connected to v.
・If v is strongly connected to w, then w is strongly connected to v.
・If v is strongly connected to w and w to x, then v is strongly connected to x.

Def. A strong component is a maximal subset of strongly-connected vertices.
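The definition can be checked directly, if inefficiently, with two reachability searches per query. This is a brute-force sketch only: the class name StrongPairCheck and its helpers are hypothetical, and each query costs time proportional to E + V rather than constant time.

```java
import java.util.ArrayList;
import java.util.List;

// Brute-force check of the definition of strong connectivity (names are illustrative).
final class StrongPairCheck {
    // Is there a directed path from s to t? (A vertex reaches itself by the empty path.)
    static boolean reachable(List<List<Integer>> adj, int s, int t) {
        boolean[] marked = new boolean[adj.size()];
        dfs(adj, s, marked);
        return marked[t];
    }

    private static void dfs(List<List<Integer>> adj, int v, boolean[] marked) {
        marked[v] = true;
        for (int w : adj.get(v))
            if (!marked[w]) dfs(adj, w, marked);
    }

    // v and w are strongly connected if each can reach the other.
    static boolean stronglyConnected(List<List<Integer>> adj, int v, int w) {
        return reachable(adj, v, w) && reachable(adj, w, v);
    }

    static List<List<Integer>> build(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    public static void main(String[] args) {
        List<List<Integer>> adj = build(3, new int[][] { {0, 1}, {1, 0}, {1, 2} });
        System.out.println(stronglyConnected(adj, 0, 1));
        System.out.println(stronglyConnected(adj, 1, 2));
    }
}
```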

[Figure: a digraph and its strong components (5 strongly-connected components).]

Connected components vs. strongly-connected components

v and w are connected if there is a path between v and w.

[Figure: a graph and its connected components (3 connected components).]

connected component id (easy to compute with DFS)

           0  1  2  3  4  5  6  7  8  9 10 11 12
   id[]    0  0  0  0  0  0  1  1  1  2  2  2  2

   public boolean connected(int v, int w)             // constant-time client connectivity query
   {  return id[v] == id[w];  }

v and w are strongly connected if there is both a directed path from v to w and a directed path from w to v.

[Figure: a digraph and its strong components (5 strongly-connected components).]

strongly-connected component id (how to compute?)

           0  1  2  3  4  5  6  7  8  9 10 11 12
   id[]    1  0  1  1  1  1  3  4  3  2  2  2  2

   public boolean stronglyConnected(int v, int w)     // constant-time client strong-connectivity query
   {  return id[v] == id[w];  }

Strong component application: ecological food webs

Food web graph. Vertex = species; edge = from producer to consumer.

[Figure: http://www.twingroves.district96.k12.il.us/Wetlands/Salamander/SalGraphics/salfoodweb.gif]

Strong component. Subset of species with common energy flow.

Strong component application: software modules

Software module dependency graph.
・Vertex = software module.
・Edge: from module to dependency.

[Figures: module dependency graphs of Firefox and Internet Explorer.]

Strong component. Subset of mutually interacting modules.

Approach 1. Package strong components together.
Approach 2. Use to improve design!

Strong components algorithms: brief history

1960s: Core OR problem.
・Widely studied; some practical algorithms.
・Complexity not understood.

1972: linear-time DFS algorithm (Tarjan).
・Classic algorithm.
・Level of difficulty: Algs4++.
・Demonstrated broad applicability and importance of DFS.

1980s: easy two-pass linear-time algorithm (Kosaraju-Sharir).
・Forgot notes for lecture; developed algorithm in order to teach it!
・Later found in Russian scientific literature (1972).

1990s: more easy linear-time algorithms.
・Gabow: fixed old OR algorithm.
・Cheriyan-Mehlhorn: needed one-pass algorithm for LEDA.

Kosaraju-Sharir algorithm: intuition

Reverse graph. Strong components in G are same as in G^R.

Kernel DAG. Contract each strong component into a single vertex.

Idea. [how to compute?]
・Compute topological order (reverse postorder) in kernel DAG.
・Run DFS, considering vertices in reverse topological order.

The first vertex in reverse topological order is a sink (has no edges pointing from it).

[Figure: digraph G and its strong components; kernel DAG of G (topological order: A B C D E); kernel DAG in reverse topological order.]

Kosaraju-Sharir algorithm demo

Phase 1. Compute reverse postorder in G^R.
Phase 2. Run DFS in G, visiting unmarked vertices in reverse postorder of G^R.

[Figure: digraph G.]

Kosaraju-Sharir algorithm demo

Phase 1. Compute reverse postorder in G^R.

   1 0 2 4 5 3 11 9 12 10 6 7 8

[Figure: reverse digraph G^R.]

Phase 2. Run DFS in G, visiting unmarked vertices in reverse postorder of G^R.

   1 0 2 4 5 3 11 9 12 10 6 7 8

    v   id[]
    0    1
    1    0
    2    1
    3    1
    4    1
    5    1
    6    3
    7    4
    8    3
    9    2
   10    2
   11    2
   12    2

Kosaraju-Sharir algorithm

Simple (but mysterious) algorithm for computing strong components.
・Phase 1: run DFS on G^R to compute reverse postorder.
・Phase 2: run DFS on G, considering vertices in order given by first DFS.

DFS in reverse digraph G^R: check unmarked vertices in the order 0 1 2 3 4 5 6 7 8 9 10 11 12. The resulting reverse postorder, for use in second dfs(), is 1 0 2 4 5 3 11 9 12 10 6 7 8.

DFS in original digraph G: check unmarked vertices in the order 1 0 2 4 5 3 11 9 12 10 6 7 8.

[Figure: DFS call traces for both phases.]

Proposition. Kosaraju-Sharir algorithm computes the strong components of a digraph in time proportional to E + V.

Pf.
・Running time: bottleneck is running DFS twice (and computing G^R).
・Correctness: tricky, see textbook (2nd printing).
・Implementation: easy!

Connected components in an undirected graph (with DFS)

   public class CC
   {
      private boolean marked[];
      private int[] id;
      private int count;

      public CC(Graph G)
      {
         marked = new boolean[G.V()];
         id = new int[G.V()];

         for (int v = 0; v < G.V(); v++)
         {
            if (!marked[v])
            {
               dfs(G, v);
               count++;
            }
         }
      }

      private void dfs(Graph G, int v)
      {
         marked[v] = true;
         id[v] = count;
         for (int w : G.adj(v))
            if (!marked[w])
               dfs(G, w);
      }

      public boolean connected(int v, int w)
      {  return id[v] == id[w];  }
   }

Strong components in a digraph (with two DFSs)

   public class KosarajuSharirSCC
   {
      private boolean marked[];
      private int[] id;
      private int count;

      public KosarajuSharirSCC(Digraph G)
      {
         marked = new boolean[G.V()];
         id = new int[G.V()];
         DepthFirstOrder dfs = new DepthFirstOrder(G.reverse());
         for (int v : dfs.reversePostorder())
         {
            if (!marked[v])
            {
               dfs(G, v);
               count++;
            }
         }
      }

      private void dfs(Digraph G, int v)
      {
         marked[v] = true;
         id[v] = count;
         for (int w : G.adj(v))
            if (!marked[w])
               dfs(G, w);
      }

      public boolean stronglyConnected(int v, int w)
      {  return id[v] == id[w];  }
   }

Digraph-processing summary: algorithms of the day

   problem                                      algorithm
   single-source reachability in a digraph      DFS
   topological sort in a DAG                    DFS
   strong components in a digraph               Kosaraju-Sharir DFS (twice)
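For readers who want to run the two-phase algorithm without the course library, here is a self-contained sketch of the Kosaraju-Sharir approach. The class name KosarajuSketch and its helpers are hypothetical, java.util collections replace Bag and Stack, and the main method uses the 13-vertex, 22-edge edge list from this lecture's depth-first search demo.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the two-phase Kosaraju-Sharir algorithm (names are illustrative).
final class KosarajuSketch {
    private final boolean[] marked;
    private final int[] id;      // id[v] = index of the strong component containing v
    private int count;           // number of strong components

    KosarajuSketch(List<List<Integer>> adj) {
        int V = adj.size();
        // Phase 1: reverse postorder of the reverse digraph.
        List<List<Integer>> rev = reverse(adj);
        boolean[] seen = new boolean[V];
        Deque<Integer> order = new ArrayDeque<>();
        for (int v = 0; v < V; v++)
            if (!seen[v]) postorder(rev, v, seen, order);
        // Phase 2: DFS in the original digraph, taking start vertices in that order.
        marked = new boolean[V];
        id = new int[V];
        for (int v : order) {        // the deque iterates from the most recently pushed vertex
            if (!marked[v]) {
                dfs(adj, v);
                count++;
            }
        }
    }

    private static void postorder(List<List<Integer>> g, int v, boolean[] seen, Deque<Integer> order) {
        seen[v] = true;
        for (int w : g.get(v))
            if (!seen[w]) postorder(g, w, seen, order);
        order.push(v);
    }

    private void dfs(List<List<Integer>> adj, int v) {
        marked[v] = true;
        id[v] = count;
        for (int w : adj.get(v))
            if (!marked[w]) dfs(adj, w);
    }

    int count() { return count; }
    boolean stronglyConnected(int v, int w) { return id[v] == id[w]; }

    static List<List<Integer>> reverse(List<List<Integer>> adj) {
        List<List<Integer>> rev = new ArrayList<>();
        for (int v = 0; v < adj.size(); v++) rev.add(new ArrayList<>());
        for (int v = 0; v < adj.size(); v++)
            for (int w : adj.get(v)) rev.get(w).add(v);
        return rev;
    }

    static List<List<Integer>> build(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    public static void main(String[] args) {
        int[][] edges = {
            {4, 2}, {2, 3}, {3, 2}, {6, 0}, {0, 1}, {2, 0}, {11, 12}, {12, 9},
            {9, 10}, {9, 11}, {8, 9}, {10, 12}, {11, 4}, {4, 3}, {3, 5}, {6, 8},
            {8, 6}, {5, 4}, {0, 5}, {6, 4}, {6, 9}, {7, 6}
        };
        KosarajuSketch scc = new KosarajuSketch(build(13, edges));
        System.out.println(scc.count() + " strong components");
    }
}
```

The component numbering depends on the order in which neighbors are visited, so individual id[] values may differ from the table on the demo slide even though the grouping of vertices into components is the same.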