Graphs, Connectivity, and Traversals

Definitions

Like trees, graphs represent a fundamental data structure used in computer science. We often hear about cyber space as being a new frontier for mankind, and if we look at the structure of cyberspace, we see that it is structured as a graph; in other words, it consists of places (nodes), and connections between those places. Some applications of graphs include

• representing electronic circuits

• modeling object interactions (e.g. used in the Unified Modeling Language)

• showing ordering relationships between computer programs

• modeling networks and network traffic

• the fact that trees are a special case of graphs, in that they are acyclic and connected graphs, and that trees are used in many fundamental data structures

1 An undirected graph G = (V,E) is a pair of sets V , E, where

• V is a set of vertices, also called nodes.

• E is a set of unordered pairs of vertices called edges, and are of the form (u, v), such that u, v ∈ V .

• if e = (u, v) is an edge, then we say that u is adjacent to v, and that e is incident with u and v.

• We assume |V | = n is finite, where n is called the order of G.

•| E| = m is called the size of G.

• A path P of length k in a graph is a sequence of vertices v0, v1, . . . , vk, such that (vi, vi+1) ∈ E for every 0 ≤ i ≤ k − 1.

– a path is called simple iff the vertices v0, v1, . . . , vk are all distinct. – a path is called a cycle iff its length is at least 3, and the start and end vertices are the same: i.e. v0 = vk. • a geometrical representation of a graph is obtained by treating the vertices as points on a plane, and edges e = (u, v) as smooth arcs which terminate at both u and v.

• the degree of a vertex v, denoted as deg(v), equals the number of edges that are incident with v.

• Handshaking Theorem. P deg(v) = 2|E|. v∈V

2 Example 1. Let G = (V,E), where

V = {SD, SB, SF, LA, SJ, OAK} are cities in California, and

E = {(SD, LA), (SD,SF ), (LA, SB), (LA, SF ), (LA, SJ), (LA, OAK), (SB,SJ)} are edges which represent flights between two cities. Provide the following for the graph:

• the order and size

• a geometrical representation

• a path of length 8

• the longest simple path

• the largest cycle

• the degrees of each vertex

3 A directed graph is a graph G = (V,E) whose edges have direction. In this case, given (u, v) ∈ E, u is called the start vertex and v is called the end vertex. Moreover, the in-degree of vertex v, denoted deg+(v), is the number of edges for which v is the end vertex. And the out-degree of vertex v, denoted deg−(v), is the number of edges for which v is the start vertex.

Similar to the handshaking theorem X X deg+(v) = deg−(v) = |E|. v∈V v∈V

An undirected graph can be made into a directed graph by orienting each edge of the graph; that is, by assigning a direction to each edge. Conversely, each directed graph has associated with it an underlying undirected graph which is obtained by removing the orientation of each edge.

Example 2. Redraw the graph from Example 1, but now with oriented edges.

4 Graph Connectivity and Traversals

Recall that a path in a graph G = (V,E) (either directed or undirected) from vertex u to vertex v is a sequence of vertices u = v0, v1, . . . , vn = v for which (vi, vi+1) ∈ E for all i = 0, 1, . . . , n − 1. We then say that G is connected provided there is a path from every vertex u ∈ V to every other vertex v ∈ V . In what follows we present that pertain to the degree of connectivity of a graph. To begin, we consider two different ways of traversing a graph; i.e. visiting each node of the graph. The first makes use of a FIFO queue data structure, and is called a breadth-first traversal, while the second uses a stack data structure, and is called a depth-first traversal.

Breadth-First Graph Traversal Algorithm.

Let G = (V,E) be a graph (either directed or undirected).

Initialize FIFO queue Q as being empty. Initialize each vertex of G as being unmarked.

While there exist unmarked vertices: Let u be one such unmarked vertex. Mark u and place it in Q.

While Q is nonempty: Remove node u from front of Q.

For every v that is a neighbor/child of u: If v is unmarked, then mark v and place it in Q.

In addition to visiting each node, this procedure implicitly yields a spanning forest of trees (or a spanning if the forest has only one tree) whose edges are comprised of those of the form (u, v) where u was the node that was removed from Q and led to v being marked. For undirected graphs, a breadth-first traversal partitions the edges of G into those that are used in the forest, and those that are not used. The latter edges are called cross edges, since they always connect nodes that are on different branches of one of the (spanning) trees of the spanning forest.

5 Example 3. For the graph G = (V,E), where

V = {a, b, c, d, e, f, g, h, i, j, k} and the edges are given by

E = {(a, b), (a, c), (b, c), (b, d), (b, e), (b, g), (c, g), (c, f),

(d, f), (f, g), (f, h), (g, h), (i, j), (i, k), (j, k)}. Show the forest that results when performing a breadth-first traversal of the G. Assume that all adjacency lists follow an alphabetical ordering.

6 Depth-First Graph Traversal Algorithm.

Let G = (V,E) be a graph (either directed or undirected).

Initialize stack S as being empty. Initialize each vertex of G as being unmarked.

While there exist unmarked vertices: Let u be one such unmarked vertex. Mark u and push it on to S.

While S is nonempty: Let u be at the front of S. Let v be the first unmarked neighbor/child of u.

If v does not exist: Pop u from S. Otherwise: Mark v and push v on to S.

In addition to visiting each node, this procedure also implicitly yields a spanning forest of trees (or a spanning tree if the forest has only one tree) whose edges are comprised of those of the form (u, v) where u was the node from the front of S that reached v and caused it to be marked. For undirected graphs, a depth-first traversal partitions the edges of G into those that are used in the forest, and those that are not used. The latter edges are called backward edges, since they always connect nodes that are on the same tree branch (i.e., the edge connects a descendant to an ancestor).

7 Example 4. For the graph G = (V,E), where

V = {a, b, c, d, e, f, g, h, i, j, k} and the edges are given by

E = {(a, b), (a, c), (b, c), (b, d), (b, e), (b, g), (c, g), (c, f),

(d, f), (f, g), (f, h), (g, h), (i, j), (i, k), (j, k)}. Show the forest that results when performing a breadth-first traversal of the G. Assume that all adjacency lists follow an alphabetical ordering.

8 Theorem 1. Undirected graph G = (V,E) is connected iff a breadth-first traversal of G yields a forest with exactly one tree.

Proof of Theorem 1. Suppose a breadth-first traversal of G yields a forest with exactly one tree T . Then G is connected since T is connected (by definition a tree is connected and acyclic), and is a spanning tree for G.

Now suppose G is connected. Suppose u is the root of the first tree. Let v be any other vertex of G. Then there is a path P : u = v0, v1, . . . , vn−1 = v from u to v. Notice that, in the first iteration of the outer while loop, v1 has the opportunity to be marked when v0 is removed from the queue. And inductively, assuming that vi−1 will be entered in the queue (during the first outer-loop iteration), vi will then have the opportunity to be marked. Therefore, by induction, v will be marked in the first iteration of the outer while loop. Since v was arbitrary, it follows that all vertices of G belong to the first and only tree (each iteration of the outer loop corresponds with a new tree).

Corollary 1. The size of the forest generated in a breadth-first traversal of a undirected graph equals the number of connected components of that graph. Moreover, each tree in the forest represents a spanning tree for one of the connected components; in other words, a tree that contains each vertex of a component.

A Connectivity Algorithm for Directed Graphs

For directed graphs, first note that there are two kinds of connectivity to consider. The first is called strong connectivity, and requires that all paths follow the orientation of each directed edge. The other kind is called weak connectivity and allows for paths that ignore edge orientation. Of course, weak connectivity can be tested using the breadth-first traversal algorithm described above. On the other hand, the (strong) connectivity of a directed graph G = (V,E) can be determined by making two depth-first traversals of the graph. In performing the first traversal, we recursively compute the post order of each vertex. In the first depth-first traversal, for the base case, let v be the first vertex that is marked, but has no unmarked children. Then the post order of v equals 1. For the recursive case, suppose v is a vertex and the post order of each of v’s children has been computed. Then the post order of v is one more than the largest post order of any of its children. Henceforth we let post(v) denote the post order of v.

The second depth-first traversal is performed with what is called the reversal of G. The reversal Gr of a directed graph G = (V,E) is defined as Gr = (V,Er), where Er is the set of edges obtained by reversing the orientation of each edge in E. Moreover, during this second depth-first search, when selecting a vertex to mark and enter into the stack during execution of the outer while loop, the vertex of highest post order (obtained from the first depth-first traversal) is always chosen.

Theorem 2. In the second depth-first traversal of directed graph G = (V,E), the resulting forest represents the set of strongly connected components of G.

9 Proof of Theorem 2. Let T1,T2,...,Tm, be the trees of the forest obtained from the second traversal, and in the order in which they were constructed. First note that, if u and v are in different trees, then u and v must be in different strongly-connected components. For example, suppose u is r in Ti and v is in Tj, with i < j, then there is no path in G from the root of Ti to v, which implies no path from v to the root of Ti in G. Thus, being in the same tree is a necessary condition for being in the same strongly-connected component. We now show that this condition is also sufficient, which will complete the proof.

Now let u and v be in the same tree Tj, for some j = 1, . . . , m. Let r be the root of this tree. Then there are paths from r to u and from r to v in Gr, which implies paths from u to r and from v to r in G. We now show that there is also a path from r to u in G (and similarly from r to v). This will imply a path from u to v (and similarly a path from v to u) in G by combining the paths from u to r, and then from r to v.

To this end, since r is the root of a second-round depth-first tree, we know that r must have higher post order than u, as computed during the first round. In the first round, r must be in the same tree as u, since there is a path from u to r and r has a higher post order than u. But then, since u has post order less than that of r, u must be a descendant of r, in which case there is a path from r to u.

10 Example 5. Use the algorithm described above to determine the strongly-connected components of the following directed graph. G = (V,E), where

V = {a, b, c, d, e, f, g, h, i, j} and the edges are given by

E = {(a, c), (b, a), (c, b), (c, f), (d, a), (d, c), (e, c),

(e, d), (f, b), (f, g), (f, h), (h, g), (h, i), (i, j), (j, h)}.

11 Biconnectivity

An undirected graph is called biconnected if there are no vertices, called articulation points, or edges, called bridges, whose removal disconnects the graph.

Example 6. For the graph G = (V,E), where V = {a, b, c, d, e, f, g} and

E = {(a, c), (a, d), (b, c), (b, e), (c, d), (c, e), (d, f), (e, g)},

find all articulation points and bridges.

12 Biconnectivity algorithm

We now show how to determine all articulation points and bridges using a single depth-first traversal for a connected undirected graph.

Let D denote the depth-first spanning tree that is constructed for G = (V,E). For all v ∈ V , let num(v) denote the order in which v is added to D (i.e. the order in which it is marked and placed in the stack), and low(v) be recursively defined as the minimum of the following:

1. num(v),

2. the lowest num(w) for any back edge (v, w), and

3. the lowest low(w) for any child w of v in D.

Lemma 1. Let D denote the depth-first spanning tree that is constructed for G = (V,E), and v be any vertex of D. Suppose T1 and T2 are two distinct subtrees rooted at v. Then there are no edges in G that connect a vertex in T1 with a vertex in T2.

Proof of Lemma 1. Assume that T1 is generated first in the depth-first traversal that constructs D. In other words, num(u) < num(w) for every u ∈ T1 and w ∈ T2. Thus, there cannot be an edge, say (u, w) connecting T1 and T2, since otherwise, during the depth-first traversal, edge (u, w) would have been added to D, and one would have w ∈ T1, a contradiction.

Theorem 3. Let G = (V,E) be an undirected and connected simple graph. Let D be a depth-first spanning tree for G. Then v ∈ V is an articulation point iff either v is the root of D and two or more subtrees are rooted at v, or v is not the root, but has a child w for which low(w) ≥ num(v).

Theorem 3. If v is the root and two or more subtrees are rooted at v, then the above lemma implies that v’s removal from G will disconnect all of those subtrees. Therefore v is an articulation point. Moreover, if there is only one subtree rooted at v, then removing v does not disconnect D, and hence v is not an articulation point.

Now suppose v is not the root, and there is a child w of v in D for which low(w) ≥ num(v). Then the only path from w to an ancestor of v must pass through v, which implies v is an articulation point, since its removal will disconnect w from the ancestor.

Finally, assume v is a non-root articulation point. First note that v cannot be a leaf of D, since the removal of v would still leave D connected, and thus all other nodes remain interconnected via D − {v}. Let Dˆ denote the part of D that remains after v and its subtrees are removed from D. Then it must be the case that there is at least one subtree rooted at v that has no back edges that connect to Dˆ. If this were not the case, then v would not be an articulation point, since all of its descendants would remain connected via Dˆ. Hence, letting w denote the root of this subtree, we have low(w) = num(w) > num(v); and the proof is complete.

13 Corrolary 2. Let G = (V,E) be an undirected and connected simple graph. Let D be a depth-first spanning tree for G. Then e ∈ E is a bridge iff e = (v, w) is an edge of D and low(w) ≥ num(v).

14 Example 7. Using the graph from Example 6, show the depth-first spanning tree, along with the low and num values of each vertex. Verify Proposition 3 for this example.

15 Directed Acyclic Graphs (DAGs)

A directed acyclic graph (DAG) is simply a directed graph that has no cycles (i.e. paths of length greater than zero that begin and end at the same vertex). DAGs have several practical applications. One such example is to let T be a set of tasks that must be completed in order to complete a large project, then one can form a graph G = (T,E), where (t1, t2) ∈ E iff t1 must be completed before t2 can be started. Such a graph can be used to form a schedule for when each task should be completed, and hence provide an estimate for when the project should be completed.

The following proposition suggests a way to efficiently check if a directed graph is a DAG.

Theorem 4. If directed graph G = (V,E) is a DAG, then it must have one vertex with out-degree equal to zero.

Proof of Theorem 4. If DAG G did not have such a vertex, then one could construct a path having arbitrary length. For example, if P is some path that ends at v ∈ V , then P can be extended by adding to it vertex w, where (v, w) ∈ E. We know that w exists, since deg+(v) > 0. Thus, in constructing a path of length greater than |V |, it follows that at least one vertex in |V | must occur more than once. Letting P 0 denote the subpath that begins and ends at this vertex, we see that P 0 is a cycle, which constradicts the fact that G is a DAG.

The following algorithm makes use of Proposition 5.

Algorithm for Deciding if a Directed Graph is Acyclic. Let G = (V, E, c) be a network.

Initialize queue Q with all vertices having out degree 0. Initialize function d:V->N so that d(v) is the out degree of v

While Q is nonempty: Exit node v from Q.

For every u that is a parent of v: Decrement d(u) by the number of connections from u to v. If d(u) == 0: Enter u in Q.

If d(u) == 0 for all u in V: Return true. Otherwise: Return false.

16 Note that a similar algorithm can be used to determine a topological sort of a DAG. Given DAG G = (V,E), of order n, then the ordering of V , v1, . . . , vn is said to be a topologically sorted iff, for all edges (u, v) ∈ E, u comes before v in the ordering. The only difference with the algorithm is that we use in-degrees instead of out-degrees. In this manner, the order in which nodes will be removed from the queue will be in topological-sorted order.

Example 8. Given the DAG G = ({a, b, c, d, e, f},E), with

E = {(a, b), (b, c), (b, d), (c, d), (e, f), (e, b), (f, a), (f, d)}, use the above algorithm to verify that G is a DAG, and provide a topological sort for V .

17 Exercises

For all graph-traversal problems, assume all neighbor lists are alphabetized, and that the selection of the next root vertex is based on alphabetical order.

1. The cubic graph Qn, n ≥ 1, has vertex set equal to the set of all binary strings of length n. Moreover, two vertices are adjacent iff they differ in at most one bit place. For example, in Q3, 000 is adjacent to 010, but not to 011. Draw Q1, Q2, and Q3. Show that Q3 has a Hamilton cycle, i.e. a path that visits every vertex exactly once, and returns the the starting vertex.

2. Provide formulas for both the order and size of Qn. Explain. 3. Prove that for any simple graph having n vertices, that there exist dn/2e vertices for which at least 3/4 of all edges are incident with.

4. Starting at vertex 000, perform a breadth-first traversal of Q3. Assume all adjacency lists are in numerical order. For example, (000, 001) occurs before (000, 010). Repeat using a depth-first traversal. In both cases show the resulting spanning trees. 5. Show the resulting spanning forests for both a breadth-first and depth-first traversal of the directed graph having vertex set a-k and edges {(j, a), (j, g), (a, b), (a, e), (b, c), (c, k), (d, e), (e, c), (e, f), (e, i), (f, k), (g, d), (g, e), (g, h), (h, e), (h, i), (i, f), (i, k)}. 6. Consider a breadth-first spanning tree for a graph. For each vertex v, let q(v) denote the order that v was added to the queue, while d(v) denotes the depth of v in the tree. Prove that if q(u) < q(v), then d(u) ≤ d(v). Hint: use mathematical induction on the height of the tree. 7. Consider a breadth-first spanning tree for a graph. Let d(v) denote the depth of v in the tree. Prove that if (u, v) is a cross edge, then |d(u) − d(v)| ≤ 1. Hint: use a proof by contradiction. 8. Let G be an undirected connected graph and suppose T is a breadth-first spanning tree for G. Prove that G has an odd cycle iff T has a cross edge (u, v) for which d(u) = d(v), where, e.g., d(u) denotes the depth of u in the tree. 9. For the the directed graph whose vertices are a-g, and whose edges are given by {(a, b), (b, g), (g, e), (b, e), (b, c), (a, c), (c, e), (c, d), (e, d), (e, f), (d, f), (d, a), (g, f), (f, d), (d, b)}. perform two depth-first traversals (one on G, the other on Gr) to obtain the strongly connected components of G. 10. Draw the simple graph whose vertices are a − k, and whose edges are given by {(a, c), (a, d), (c, b), (c, d), (c, f), (b, e), (e, f), (e, i), (e, h), (f, g), (h, j), (i, k), (j, k)}. Determine the articulation points of the graph by performing a depth-first traversal starting at vertex a, and computing num(v) and low(v) for each node.

18 11. Perform the topological-sort algorithm on the directed graph having vertex set a-k and edges

{(j, a), (j, g), (a, b), (a, e), (b, c), (c, k), (d, e), (e, c), (e, f), (e, i), (f, k),

(g, d), (g, e), (g, h), (h, e), (h, i), (i, f), (i, k)}. Show the state of the queue each time it changes state (by either inserting or removing a vertex).

19 Exercise Solutions

1. Q1 is shown below.

0 1

Q2 is shown below.

00 01

10 11

Q3 is shown below.

000 001 100 101

010 011 110 111

Hamilton Cycle: 000, 001, 101, 100, 110, 111, 011, 010, 000.

n n 2. |Vn| = 2 since there are 2 bit strings of length n. For |En|, note that each vertex is adjacent with exactly n other vertices, since a bit string of length n has n different bits that can be changed to produce a bit string that differs in exactly one place. Thus, by the Handshaking Theorem, X n deg(v) = n2 = 2|En|,

v∈Vn n−1 which implies |En| = n2 . 3. For simplicity assume G has n vertices, where n is even. Suppose we randomly select half of these vertices to color blue, while the remaining half are colored red. Then for a given edge of G, the vertices it is incident with have one of four color combinations: (blue, blue), (blue, red), (red, blue), (red, red). Then with probability 3/4, the edge will be incident with

20 a blue vertex. Letting m denote the size of G, we would thus expect 3m/4 of the edges to be incident with a blue vertex. Therefore, there must be an actual coloring of the vertices for which at least 3m/4 of the edges are incident with a blue vertex. For suppose every coloring produced fewer than 3m/4 edges that were incident with a blue edge. Then the average number of edges incident with a blue edge could not reach 3m/4. Therefore, at least 3m/4 of the edges are incident with a blue vertex from some actual coloring.

4. Breadth-first spanning tree for Q3:

000

001 010 100

011 101 110

111

A depth first traversal for Q3 starting at 000 consists of the following single branch (also known as a Hamilton path, since every node is visited exactly once): 000, 001, 011, 010, 110, 100, 101, 111. 5. The breadth-first spanning forest. a d g j

b e h

c f i

k

The Depth-first spanning forest is identical to the breadth-first forest. 6. For the basis step, assume the tree T has a height equal to zero. Then it only has one node, and the statement is true. Now suppose height(T ) = 1. Consider two vertices u, v ∈ T , and assume q(u) < q(v). If u is the root, then d(u) = 0 < d(v) = 1. Otherwise, u and v are both children of the root, in which case d(u) = d(v) = 1. For the inductive step, assume the statement is true for all trees with height h. Consider a tree T with height h + 1, and consider two vertices u, v ∈ T , with q(u) < q(v). Since q(u) < q(v) it follows that u was placed in the queue before v. This means that either u and v have the same parent (in which case they have the same depth), or that q(parent(u)) < q(parent(v)). In the latter case, the depths of these parents must be at most h. So by the inductive assumption, we must have d(parent(u)) ≤ d(parent(v)) which implies d(u) = d(parent(u)) + 1 ≤ d(parent(v)) + 1 = d(v).

21 7. Without loss of generality, assume that (u, v) is a cross edge, and that d(v) = d(u) + 2. Let w denote the parent of v. Then d(w) = d(u) + 1. Thus, by the previous exercise, it must be the case that u was added to and removed from the queue before w. But then, when u was removed, it must have been true that v had not yet been marked, since it gets marked by w which was either still in the queue, or had yet to be added to the queue. But since (u, v) is an edge, v would have been marked by u, and u would have been v’s parent, a contradiction. Therefore, it must be true that |d(u) − d(v)| ≤ 1.

8. Let G be an undirected connected graph and suppose T is a breadth-first spanning tree for G. First assume that T has a cross edge (u, v) for which d(u) = d(v). Let w be the first common ancestor of u and v in T . Then, since d(u) = d(v), the distance k from w to u equals the distance from w to v. Thus, if we combine the k edges from w to u with the k edges from w to v, and the edge (u, v), we obtain an odd cycle of length 2k + 1. Conversely, assume G has an odd cycle C. Color each vertex of G either red or blue, by coloring each level of T either all blue or all red, and alternating the two colors from one level to the next. Now since C has odd length, it also has an odd number of vertices. Thus, there must be either more red or more blue vertices. Without loss of generality, assume there are more red vertices in C. This means that two of the red vertices, say u and v, must be adjacent (you may want to draw some examples to convince yourself of this). Thus, the edge (u, v) must be a cross edge, since each tree edge is incident with both a blue and red vertex. But from the previous exercise, we know that |d(u) − d(v)| ≤ 1, which implies d(u) = d(v), since vertices of adjacent depths have different colors.

9. Depth-first spanning tree for G with post-order values.

a/7

b/6

c/4 g/5

d/2 e/3

f/1

22 Depth-first spanning tree for Gr. Note: since the 2nd traversal yields a single tree, we conclude that G is strongly connected. a

d

c e f

b g

23 10. Depth-first spanning tree for G with num and low values. Articulation points: c, e, f.

a/1/1

c/2/1

b/3/2 d/11/1

e/4/2

f/5/2 h/7/4

g/6/6 j/8/4

k/9/4

i/10/4

24 11. Algorithm details are provided in the table below. Note that the graph is empty once the queue is empty. Therefore, G is acyclic with a topological sort of j, a, g, b, d, h, e, c, i, f, k. Queue Vertex Removed Edges Removed deg+(v) = 0 Vertices {j} j (j, a), (j, g) a, g {a, g} a (a, h), (a, e) b {b, d, h} g (g, d), (g, e), (g, h) d, h {b, d, h} b (b.c) None {d, h} d (d, e) None {h} h (h, i), (h, e) e {e} e (e, c), (e, f), (e, i) c, i {c, i} c (c, k) None {i} i (i, f), (i, k) f {f} f (f, k) k {k} k None None ∅ None None None

25