MAL 376: Graph Algorithms
I Semester 2014-2015
Lecture 1: July 24
Course Coordinator: Prof. B. S. Panda
Scribes: Raghuvansh, Himanshu, Mohit, Abhishek

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Course Coordinator.

These notes describe depth-first search and biconnectivity in graphs. They begin by introducing graphs and the notation used throughout, and build up to an algorithm that uses depth-first search to enumerate all the biconnected components of a graph.

1.1 Preliminaries

A graph G = (V, E) consists of a set of vertices V and a set of edges E ⊆ V × V. A graph is called simple if it has no multiple edges (two or more edges between the same pair of vertices) and no self loops (edges from a vertex to itself). A graph is undirected if the endpoints of an edge are unordered, i.e. an edge from a to b is also an edge from b to a; otherwise it is directed. An edge (u, v) is said to be incident on the vertices u and v. The degree of a vertex is the number of edges incident on it. In a directed graph every vertex has two degrees: an in-degree (the number of edges entering it) and an out-degree (the number of edges leaving it). Henceforth, unless otherwise specified, the term graph means an undirected, simple graph. Pictorially, the graph with V = {r, s, u, v, w} and E = {(r, s), (r, v), (r, w), (s, v), (u, v), (u, w)} would be drawn as shown below:

Figure 1.1: A Graph
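One common way to store such a graph in a program is as adjacency lists: for every vertex, the list of its neighbours. The sketch below assumes a Python dictionary for this (the helper name `adjacency_lists` is hypothetical) and builds the lists for the graph of Figure 1.1. Since an undirected edge (a, b) is also the edge (b, a), each edge is recorded at both of its endpoints, and the degree of a vertex is simply the length of its list.

```python
from collections import defaultdict

def adjacency_lists(edges):
    """Build adjacency lists of an undirected simple graph from its edge list."""
    adj = defaultdict(list)
    for a, b in edges:
        # An undirected edge (a, b) is also the edge (b, a): record both directions.
        adj[a].append(b)
        adj[b].append(a)
    return dict(adj)

# The example graph of Figure 1.1.
V = ["r", "s", "u", "v", "w"]
E = [("r", "s"), ("r", "v"), ("r", "w"), ("s", "v"), ("u", "v"), ("u", "w")]
adj = adjacency_lists(E)

# The degree of a vertex is the number of edges incident on it.
degree = {x: len(adj.get(x, [])) for x in V}
print(degree)  # {'r': 3, 's': 2, 'u': 2, 'v': 3, 'w': 2}
```

Because every edge appears in exactly two lists, the degrees add up to twice the number of edges (here 12 = 2 · 6).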

A graph (V′, E′) is said to be a subgraph of another graph (V, E) if V′ ⊆ V and E′ ⊆ E (each edge in E′ joining two vertices of V′). A path is a sequence of edges that joins a sequence of distinct vertices. A graph is said to be connected if there is a path between every pair of its vertices; otherwise it is said to be disconnected. Every graph can be broken down into its maximal connected subgraphs, which are known as the connected components of the graph. We will use the ‘Big-Oh’ notation O in our analysis of algorithms. By saying that an algorithm is O(f(n)) in some input parameter n, we mean that there exist constants c > 0 and n₀ > 0 such that for all n > n₀, the algorithm terminates within at most c · f(n) units of time.

1.2 Definitions

Walk: A walk in the graph G = (V, E) is a finite sequence of the form $v_{i_0}, e_{j_1}, v_{i_1}, e_{j_2}, \ldots, e_{j_k}, v_{i_k}$ of alternating vertices and edges of G such that each edge $e_{j_t}$ in the sequence is incident on the vertices $v_{i_{t-1}}$ and $v_{i_t}$ just before and just after it. $v_{i_0}$ is called the initial vertex and $v_{i_k}$ the terminal vertex of the walk, and k is called its length.

Trail: A trail is a walk in which no edge is repeated.

Path: A path is a walk in which no vertex is repeated.

Circuit: A circuit is a trail that begins and ends at the same vertex.

Cycle: A cycle is a circuit in which no vertex is repeated except that the initial and terminal vertices coincide (informally, a path that begins and ends at the same vertex).

Tree: A connected graph with no cycles is termed a tree. A tree is said to be rooted if one special vertex has been designated as the root, and unrooted otherwise.

Parent: A vertex v is the parent of w in a rooted tree if v comes just before w on the path from the root to w.

Ancestor: A vertex v is an ancestor of w in a rooted tree if the path from the root to w includes v. In particular, every vertex is an ancestor of itself; an ancestor of w other than w itself is called a proper ancestor.

Descendant: A vertex v is a descendant of w if w is an ancestor of v in the tree (a proper descendant if, in addition, v ≠ w).
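To make the distinctions between these notions concrete, here is a small sketch that checks which of the labels walk, trail, path, circuit and cycle apply to a given sequence of vertices. It assumes the graph is simple and stored as a Python dictionary of adjacency sets; in a simple graph at most one edge joins two vertices, so a walk is fully described by its vertex sequence. The function name `classify_walk` is hypothetical.

```python
# Adjacency sets of the example graph of Figure 1.1.
adj = {
    "r": {"s", "v", "w"},
    "s": {"r", "v"},
    "u": {"v", "w"},
    "v": {"r", "s", "u"},
    "w": {"r", "u"},
}

def classify_walk(adj, seq):
    """Return the labels from Section 1.2 that apply to a vertex sequence.

    In a simple graph at most one edge joins two vertices, so a walk is fully
    described by its vertices; an empty set means seq is not a walk at all.
    """
    steps = list(zip(seq, seq[1:]))
    if any(b not in adj[a] for a, b in steps):
        return set()                                  # some consecutive pair is not an edge
    labels = {"walk"}
    edges = [frozenset(step) for step in steps]
    is_trail = len(set(edges)) == len(edges)          # no repeated edges
    closed = len(seq) > 1 and seq[0] == seq[-1]       # initial == terminal vertex
    if is_trail:
        labels.add("trail")
    if len(set(seq)) == len(seq):                     # no repeated vertices
        labels.add("path")
    if is_trail and closed:
        labels.add("circuit")
        if len(set(seq[:-1])) == len(seq) - 1:        # only the endpoints coincide
            labels.add("cycle")
    return labels

print(sorted(classify_walk(adj, ["r", "s", "v", "r"])))  # ['circuit', 'cycle', 'trail', 'walk']
```

For instance, r, s, v, u is a path (and hence a trail), while v, r, s, v, r is only a walk, because it uses the edge (v, r) twice.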

1.3 Depth-first Search

One of the most basic algorithms for traversing a graph (i.e., visiting all its vertices and/or edges) is depth-first search (DFS). As the name suggests, the basic idea of DFS is to search “deeper” in the graph whenever possible. It explores edges out of the most recently discovered vertex v that still has unexplored edges leaving it. Once all the edges of v have been explored, the search “backtracks” to explore the remaining unexplored edges leaving the vertex from which v was discovered. It stops once it has discovered all the vertices that are reachable from the source vertex. For example, Fig 1.2 shows one possible order in which DFS could explore the vertices of a graph; the numbers indicate the order in which the vertices are visited.

Figure 1.2: A DFS traversal

To understand DFS better, let us introduce a colouring of the vertices. A vertex coloured white has not yet been visited; a vertex coloured black has already been discovered.

The following is the pseudocode for DFS:

DFS(G):
    for each vertex u ∈ G.V
        u.color = WHITE
    for each vertex u ∈ G.V
        if u.color == WHITE
            DFS-VISIT(G, u)

DFS-VISIT(G, u):
    u.color = BLACK        // mark u as discovered before exploring its neighbours
    for each vertex v ∈ G.Adj[u]
        if v.color == WHITE
            DFS-VISIT(G, v)
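The pseudocode translates almost line for line into Python. The sketch below assumes the graph is given as a dictionary mapping each vertex to the list of its neighbours (playing the role of G.Adj); the names `dfs` and `dfs_visit` are illustrative. Besides colouring, it records the order in which vertices are first visited and the parent of each vertex, i.e. the vertex from which DFS-VISIT was called on it.

```python
WHITE, BLACK = "white", "black"

def dfs(adj):
    """DFS(G): visit every vertex, starting a new DFS-tree at each white vertex.

    adj maps each vertex to the list of its neighbours (G.Adj). Returns the
    vertices in the order they are first visited, and the parent of each
    vertex in the DFS-tree (None for a root).
    """
    color = {u: WHITE for u in adj}
    order, parent = [], {}

    def dfs_visit(u):
        color[u] = BLACK              # mark u discovered before scanning its neighbours
        order.append(u)
        for v in adj[u]:
            if color[v] == WHITE:
                parent[v] = u         # (u, v) becomes a tree edge
                dfs_visit(v)

    for u in adj:
        if color[u] == WHITE:
            parent[u] = None          # u is the root of a new DFS-tree
            dfs_visit(u)
    return order, parent

# The example graph of Figure 1.1, with each adjacency list in alphabetical order.
adj = {"r": ["s", "v", "w"], "s": ["r", "v"], "u": ["v", "w"],
       "v": ["r", "s", "u"], "w": ["r", "u"]}
order, parent = dfs(adj)
print(order)   # ['r', 's', 'v', 'u', 'w']
```

The visiting order depends on the order of the adjacency lists, which is why DFS can produce several different traversals of the same graph. The recursion mirrors the pseudocode; on very deep graphs Python's default recursion limit (see `sys.setrecursionlimit`) may be reached, in which case an explicit stack is preferable.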

Any run of the above algorithm defines a subset T of the edges E. An edge (v, w) is in T if, when the edge is explored from v, the vertex w has not yet been visited (so that DFS-VISIT(G, w) is called from DFS-VISIT(G, v)). These edges form a tree called the DFS-tree, rooted at the ‘source’ vertex of the DFS (if G is disconnected, the outer loop of DFS produces one such tree per connected component). The edges of G that are not in T are called non-tree edges. The following theorem shows that every non-tree edge of G connects an ancestor to a descendant in the DFS-tree.

Theorem 1.1 All non-tree edges of G connect an ancestor to a descendant in the DFS-tree.

Proof: Consider any edge (v, w) of G. Without loss of generality, assume that v was visited before w. Then, when DFS-VISIT(G, v) was called, the vertex w was still white. DFS-VISIT(G, v) examines every neighbour of v and does not return until each of them is black; since w is a neighbour of v, w becomes black during the call DFS-VISIT(G, v). Every vertex first visited while DFS-VISIT(G, v) is still active is reached from v through tree edges, so w is a descendant of v in the DFS-tree, i.e. v is an ancestor of w.
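Theorem 1.1 can also be checked mechanically on small examples. The sketch below (helper names hypothetical) runs a recursive DFS from a chosen root, stores the DFS-tree as a parent map, and tests that every edge not in T joins an ancestor to a descendant. It assumes the graph is connected, so that every vertex receives a parent entry.

```python
def dfs_parents(adj, root):
    """Recursive DFS from root; returns the DFS-tree as a parent map."""
    parent = {root: None}

    def visit(u):
        for v in adj[u]:
            if v not in parent:       # v is still white
                parent[v] = u         # (u, v) is a tree edge
                visit(v)

    visit(root)
    return parent

def is_ancestor(parent, a, d):
    """True if a lies on the tree path from the root to d (a vertex is its own ancestor)."""
    while d is not None:
        if d == a:
            return True
        d = parent[d]
    return False

def check_theorem_1_1(adj, root):
    """Check that every non-tree edge joins an ancestor to a descendant.

    Assumes adj describes a connected graph, so every vertex gets a parent entry.
    """
    parent = dfs_parents(adj, root)
    for u in adj:
        for v in adj[u]:
            if parent[v] == u or parent[u] == v:
                continue              # tree edge
            if not (is_ancestor(parent, u, v) or is_ancestor(parent, v, u)):
                return False
    return True

adj = {"r": ["s", "v", "w"], "s": ["r", "v"], "u": ["v", "w"],
       "v": ["r", "s", "u"], "w": ["r", "u"]}
print(all(check_theorem_1_1(adj, root) for root in adj))  # True
```

With root r, the DFS-tree is the path r, s, v, u, w, and the two non-tree edges (r, v) and (r, w) both lead from the root down to one of its descendants.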

1.4 Biconnectivity

Having seen depth-first search, we now turn to biconnectivity. As the name suggests, biconnectivity means being ‘connected twice over’. Formally, a connected graph G is said to be biconnected if for every triple of distinct vertices v, w and a, there exists a path connecting v and w that does not go through a. In other words, the connection between v and w does not depend solely on a: whichever single vertex a we pick, there is another route from v to w that avoids it. It follows directly that removing any single vertex (along with its incident edges) from a biconnected graph G leaves it connected. Indeed, take the removed vertex as our ‘a’; for any pair of the remaining vertices there exists a path not containing a, this path survives the removal of a, and so the graph remains connected. Conversely, if the graph were not biconnected, there would be three distinct vertices v, w and a such that every path from v to w passes through a, and then the removal of a would split the graph into at least two connected components. Such a vertex a is called an articulation vertex (or articulation point). Hence a connected graph is biconnected if and only if it has no articulation vertices.

Just as we defined the connected components of a graph, we can define its biconnected components. First, we define a relation R on the edges of a graph: edges e and f are related under R if either e = f or there exists a cycle that contains both e and f. It can be shown that R is an equivalence relation (reflexivity and symmetry are immediate; transitivity needs a short argument). For each equivalence class E_i of this relation, we can also prove that the subgraph induced by E_i is biconnected; these induced subgraphs are the biconnected components of G. A subgraph of a graph G is said to be induced by a set E′ of edges if its vertex set contains exactly those vertices of G that are incident on at least one edge in E′, and its edge set is E′.
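The definition of an articulation vertex suggests a direct, if slow, test: delete each vertex in turn and check whether the rest of the graph is still connected. The sketch below does exactly that for a connected graph stored as adjacency lists; it takes O(n(n + m)) time and is meant only as a reference against which the faster DFS-based method developed next can be compared. The function names and the ‘bowtie’ graph (two triangles sharing the vertex c) are hypothetical examples, not taken from the lecture.

```python
def is_connected(adj, removed=None):
    """Is the graph connected after deleting vertex `removed` (and its edges)?"""
    vertices = [x for x in adj if x != removed]
    if not vertices:
        return True
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v != removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(vertices)

def articulation_vertices_brute_force(adj):
    """Articulation vertices straight from the definition: O(n (n + m)) time."""
    return {a for a in adj if not is_connected(adj, removed=a)}

def is_biconnected(adj):
    """Connected, with no articulation vertex."""
    return is_connected(adj) and not articulation_vertices_brute_force(adj)

figure_1_1 = {"r": ["s", "v", "w"], "s": ["r", "v"], "u": ["v", "w"],
              "v": ["r", "s", "u"], "w": ["r", "u"]}
# Two triangles sharing the vertex c (a hypothetical example).
bowtie = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d", "e"],
          "d": ["c", "e"], "e": ["c", "d"]}
print(is_biconnected(figure_1_1), articulation_vertices_brute_force(bowtie))
# True {'c'}
```

The graph of Figure 1.1 is biconnected, while in the bowtie every path between the two triangles passes through c. The bowtie's two biconnected components are its two triangles, which are exactly the equivalence classes of R.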

Theorem 1.2 Each equivalence class of R induces a biconnected subgraph of G.

Proof: Let E_i be an equivalence class of R and let G_i be the subgraph it induces. Consider any triple of distinct vertices v, w and a of G_i, and assume for the sake of contradiction that a is an articulation vertex of G_i, so that every path from v to w in G_i passes through a. Consider any such path. Since a differs from v and w, it is an interior vertex of the path, so exactly two edges of the path are incident on a (one ‘coming into’ a and the other ‘going out’). Both edges lie in the same equivalence class E_i, so by the definition of R there is a cycle containing both of them; every edge of this cycle lies on a common cycle with these two edges, so the whole cycle belongs to G_i. If we travel the other way round this cycle, we can ‘bypass’ a, which yields a walk from v to w in G_i avoiding a, and hence a path from v to w that does not contain a. This contradicts our assumption. Hence G_i has no articulation vertex and is therefore biconnected.

1.5 Biconnectivity Algorithm

Having learnt about depth-first search and the DFS-tree, we can now proceed towards the algorithm that enumerates all biconnected components. Before that, we first define the DFSNUMBER and LOW of a vertex. The DFSNUMBER of a vertex is its position in the order in which DFS first visits the vertices: the root has DFSNUMBER 1, the next vertex visited has DFSNUMBER 2, and so on. The LOW of a vertex is defined as:

LOW(v) = min ({DFSNUMBER(v)} ∪ {DFSNUMBER(w) | ∃ a non-tree edge (x, w) ∈ E \ T such that x is a descendant of v (possibly v itself) and w is an ancestor of v}).
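To see what this definition computes, the following sketch evaluates LOW literally: it numbers the vertices in DFS order, and for every vertex v it scans all non-tree edges (x, w) with x a descendant of v and w an ancestor of v. This takes roughly quadratic time and is only meant for checking small examples; the function names and the bowtie graph (two triangles sharing c) are hypothetical, and the graph is assumed to be connected.

```python
def dfs_numbering(adj, root):
    """DFS from root; returns (DFSNUMBER, parent), with DFSNUMBER[root] == 1."""
    num, parent = {}, {root: None}

    def visit(u):
        num[u] = len(num) + 1             # order of first visit: 1, 2, 3, ...
        for v in adj[u]:
            if v not in num:
                parent[v] = u             # (u, v) is a tree edge
                visit(v)

    visit(root)
    return num, parent

def low_by_definition(adj, root):
    """LOW(v) evaluated literally from the first definition (slow; for checking)."""
    num, parent = dfs_numbering(adj, root)

    def ancestors(x):                     # x together with all its proper ancestors
        result = set()
        while x is not None:
            result.add(x)
            x = parent[x]
        return result

    anc = {x: ancestors(x) for x in num}
    low = {}
    for v in num:
        best = num[v]
        for x in num:
            if v not in anc[x]:           # x must be a descendant of v (or v itself)
                continue
            for w in adj[x]:
                tree_edge = parent[w] == x or parent[x] == w
                if not tree_edge and w in anc[v]:   # non-tree edge to an ancestor of v
                    best = min(best, num[w])
        low[v] = best
    return num, low

bowtie = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d", "e"],
          "d": ["c", "e"], "e": ["c", "d"]}
num, low = low_by_definition(bowtie, "a")
print(num)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
print(low)  # {'a': 1, 'b': 1, 'c': 1, 'd': 3, 'e': 3}
```

Here the DFS-tree is the path a, b, c, d, e. The child d of c has LOW(d) = 3 = DFSNUMBER(c), because nothing below c reaches above it: this is exactly the situation that makes c an articulation vertex. Excluding tree edges matters: if the tree edge from a vertex to its parent were counted, every LOW value would drop to at most its parent's number and this test would never succeed.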

To see how this helps, think of checking for articulation vertices in G. By Theorem 1.1, the only edges in G are the tree edges T and the non-tree edges connecting an ancestor to a descendant in the DFS-tree; in particular, no edge joins two different subtrees of a vertex. Let v be a non-root vertex in a connected component of G. Removing v separates the subtree of a child w of v from the rest of the component unless some non-tree edge joins a descendant of w (possibly w itself) to a proper ancestor of v. So v is not an articulation vertex if and only if, for every child w of v, LOW(w) < DFSNUMBER(v). The root is a special case: it has no proper ancestors, and it is an articulation vertex if and only if it has more than one child in the DFS-tree. Hence, using LOW, we can identify the articulation vertices and, from them, the biconnected components. The definition of LOW that we use in our algorithm is as follows:

LOW(v) = min ({DFSNUMBER(v)} ∪ {DFSNUMBER(w) | ∃ a non-tree edge (v, w) ∈ E \ T such that w is an ancestor of v} ∪ {LOW(w) | w is a child of v}).

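This definition is equivalent to the earlier one. We have split the non-tree edges (x, w) counted in the first definition into two groups: those with x = v, which appear directly in the second set, and those with x a proper descendant of v. A proper descendant x lies in the subtree of some child w′ of v, and every ancestor of v is also an ancestor of w′, so these edges are already accounted for in LOW(w′); the other values that LOW(w′) may contain come from w′ and its descendants, exceed DFSNUMBER(v), and so never affect the minimum. Computed this way, LOW(v) is available as soon as the DFS finishes at v, and it can be used to find the articulation vertices of G and even to enumerate its biconnected components. Whenever we find a vertex v with a child w such that LOW(w) ≥ DFSNUMBER(v), the edge (v, w) together with the edges in the subtree rooted at w that have not already been assigned to a component forms a biconnected component. Repeating this throughout the DFS enumerates all biconnected components. The data structure we use for this purpose is a stack of edges. What follows is the pseudocode for the whole process: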
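The recursive definition lets LOW be computed during a single DFS, since LOW of every child is known when the call on that child returns. The sketch below (function name hypothetical, graph assumed connected and stored as adjacency lists) computes DFSNUMBER and LOW this way and applies the articulation-vertex test described above: a non-root v is an articulation vertex if some child w has LOW(w) ≥ DFSNUMBER(v), and the root is one if it has more than one child.

```python
def articulation_vertices(adj, root):
    """DFSNUMBER, LOW and the articulation vertices of a connected graph, in one DFS.

    LOW uses the recursive definition: the minimum of DFSNUMBER(v), of
    DFSNUMBER(w) over non-tree edges (v, w) to ancestors w, and of LOW(w)
    over the children w of v.
    """
    num, low, parent = {}, {}, {root: None}
    cut = set()

    def visit(v):
        num[v] = low[v] = len(num) + 1
        children = 0
        for w in adj[v]:
            if w not in num:                      # (v, w) is a tree edge
                parent[w] = v
                children += 1
                visit(w)
                low[v] = min(low[v], low[w])
                # Non-root v: nothing below w reaches a proper ancestor of v.
                if parent[v] is not None and low[w] >= num[v]:
                    cut.add(v)
            elif w != parent[v]:
                # Non-tree edge. w is an ancestor or a descendant of v
                # (Theorem 1.1); a descendant has num[w] > num[v] >= low[v],
                # so only ancestors can lower LOW[v].
                low[v] = min(low[v], num[w])
        if parent[v] is None and children > 1:    # root with two or more children
            cut.add(v)

    visit(root)
    return num, low, cut

bowtie = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d", "e"],
          "d": ["c", "e"], "e": ["c", "d"]}
num, low, cut = articulation_vertices(bowtie, "a")
print(low, cut)  # {'a': 1, 'b': 1, 'c': 1, 'd': 3, 'e': 3} {'c'}
```

The LOW values agree with those obtained from the first definition for the same DFS. On the path 1, 2, 3 the middle vertex is reported whichever root is chosen: as a non-root it fails the LOW test, and as the root it has two children.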

procedure SEARCHB(v):
begin
    mark v “old”;
    DFSNUMBER[v] ← COUNT;
    COUNT ← COUNT + 1;
    LOW[v] ← DFSNUMBER[v];
    for each vertex w on L[v] do
        if w is marked “new” then
            begin
                add (v, w) to T;
                FATHER[w] ← v;
                SEARCHB(w);
                if LOW[w] ≥ DFSNUMBER[v] then a biconnected component has been found;
                LOW[v] ← MIN(LOW[v], LOW[w])
            end
        else if w is not FATHER[v] then
            LOW[v] ← MIN(LOW[v], DFSNUMBER[w])
end

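Here L[v] denotes the adjacency list of v. If the graph is not connected, SEARCHB is called again from a vertex still marked “new” until every vertex is “old”, i.e., once for each connected component.

Algorithm: Finding biconnected components
Input: An undirected graph G = (V, E).
Output: A list of the edges (and hence of the vertices) of each biconnected component of G.
Method: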

• Initially, set T to φ and COUNT to 1, and mark all vertices in V as “new”. Then select an arbitrary vertex v0 in V and call SEARCHB(v0) to build a depth-first spanning tree S = (V, T) and to compute LOW(v) for each v ∈ V.
• When a vertex w is encountered while SEARCHB(v) scans L[v], put the edge (v, w) on STACK, a pushdown store of edges, unless it has already been put there earlier (each edge is pushed exactly once, the first time it is encountered).
• After discovering a pair (v, w) at the test LOW[w] ≥ DFSNUMBER[v] of SEARCHB, where w is a child of v, pop from STACK all edges up to and including (v, w). These edges form a biconnected component of G.
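The complete method can be sketched in Python as follows. This is an illustrative transcription under a few assumptions: the graph is a dictionary of adjacency lists, DFSNUMBER and LOW are dictionaries, the “new”/“old” marks are replaced by membership in `num`, and STACK is a Python list. To push every edge exactly once, a tree edge is pushed when it is first explored from the parent, and a non-tree edge only when it is seen from its lower endpoint (the one with the larger DFSNUMBER). This also skips the LOW update for edges seen from the ancestor's side, which cannot lower LOW anyway.

```python
def biconnected_components(adj):
    """Edge lists of the biconnected components of an undirected simple graph.

    A sketch of SEARCHB together with the edge STACK. Each edge is pushed the
    first time it is met: a tree edge from the parent's side, a back edge from
    the descendant's side. When a tree edge (v, w) finishes with
    LOW[w] >= DFSNUMBER[v], edges are popped up to and including (v, w).
    """
    num, low, father = {}, {}, {}
    stack, components = [], []

    def searchb(v):
        num[v] = low[v] = len(num) + 1                # DFSNUMBER[v] <- COUNT
        for w in adj[v]:
            if w not in num:                          # w is "new": (v, w) joins T
                stack.append((v, w))
                father[w] = v
                searchb(w)
                if low[w] >= num[v]:                  # a biconnected component has been found
                    component = []
                    while True:
                        edge = stack.pop()
                        component.append(edge)
                        if edge == (v, w):
                            break
                    components.append(component)
                low[v] = min(low[v], low[w])
            elif w != father[v] and num[w] < num[v]:  # back edge to an ancestor
                stack.append((v, w))
                low[v] = min(low[v], num[w])

    for v0 in adj:                                    # one call per connected component
        if v0 not in num:
            father[v0] = None
            searchb(v0)
    return components

bowtie = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d", "e"],
          "d": ["c", "e"], "e": ["c", "d"]}
for component in biconnected_components(bowtie):
    print(component)
# [('e', 'c'), ('d', 'e'), ('c', 'd')]
# [('c', 'a'), ('b', 'c'), ('a', 'b')]
```

The triangle c, d, e is output first, when the call on d returns to c, and the triangle a, b, c when the call on b returns to the root. An isolated vertex has no edges and therefore yields no component in this edge-based output.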

1.6 Correctness and Complexity

In this section we analyze the correctness and complexity of the two algorithms, namely depth-first search and the biconnected components algorithm.

1.6.1 DFS

Theorem 1.3 DFS from the vertex s visits all the vertices that are reachable from s.

Proof: For the sake of contradiction, suppose that some vertex v is reachable from s but is not visited by DFS. Then there is a path from s to v. Let w be the first unvisited vertex on this path and let u be the vertex just before w; u exists because w ≠ s (the source is trivially visited). By the choice of w, u is visited and w is not. Since u is visited, DFS-VISIT(G, u) was called, and before it returned it examined the edge (u, w); at that moment w was either already black or was visited by the recursive call DFS-VISIT(G, w). Either way w is visited, which contradicts our assumption. Hence every vertex reachable from s is visited.

As for the running time, each vertex is initialised once and DFS-VISIT is called at most once on it, and each edge is examined at most twice, once from each of its endpoints. The overall complexity is therefore O(n + m), where n is the number of vertices and m the number of edges; for a connected graph m ≥ n − 1, so this is O(m).
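The counting argument can be observed directly. The sketch below is an iterative version of DFS (an explicit stack of pairs of a vertex and an iterator over its remaining neighbours replaces the recursion, which also avoids Python's recursion limit) that counts how many adjacency-list entries are inspected. The function name and the sentinel object are illustrative assumptions. Every vertex's list is scanned exactly once, so each undirected edge is inspected exactly twice.

```python
_EXHAUSTED = object()   # sentinel: a vertex's adjacency list has been fully scanned

def dfs_count_inspections(adj):
    """Iterative DFS over all vertices that counts adjacency-list inspections.

    Each vertex's list is scanned exactly once, so every undirected edge is
    inspected exactly twice (once from each endpoint): O(n + m) in total.
    """
    visited, inspections = set(), 0
    for s in adj:                               # outer loop of DFS(G): O(n)
        if s in visited:
            continue
        visited.add(s)
        stack = [(s, iter(adj[s]))]             # (vertex, its unexplored neighbours)
        while stack:
            _, neighbours = stack[-1]
            v = next(neighbours, _EXHAUSTED)
            if v is _EXHAUSTED:                 # all edges of this vertex explored: backtrack
                stack.pop()
                continue
            inspections += 1
            if v not in visited:
                visited.add(v)
                stack.append((v, iter(adj[v])))
    return visited, inspections

adj = {"r": ["s", "v", "w"], "s": ["r", "v"], "u": ["v", "w"],
       "v": ["r", "s", "u"], "w": ["r", "u"]}
visited, inspections = dfs_count_inspections(adj)
print(len(visited), inspections)  # 5 12
```

For the graph of Figure 1.1, with 6 edges, exactly 12 = 2 · 6 inspections are made, in line with the O(n + m) bound.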

1.6.2 Biconnected Components

Theorem 1.4 The algorithm presented correctly computes the biconnected components of any graph G.

Proof: The proof is by induction on the number b of biconnected components of G. If b = 1, the algorithm is trivially correct: the only pair v, w with LOW(w) ≥ DFSNUMBER(v) is the root and its only child, so on completion of SEARCHB(w) all edges of G are on the stack and are popped as a single component. Now assume, as the induction hypothesis, that the theorem is true for all graphs with b biconnected components, and consider a graph G with b + 1 biconnected components. Let SEARCHB(w) be the first call of SEARCHB to end with LOW(w) ≥ DFSNUMBER(v) for a tree edge (v, w). There has to be such a call, since there are articulation vertices in the graph. Since no edges have yet been removed from STACK, the edges above (v, w) on the stack are exactly the edges incident on descendants of w. Together with (v, w), these edges form a biconnected component. Since w is the first vertex to end with LOW(w) ≥ DFSNUMBER(v), none of the descendants of w (which finish before w) satisfies this condition, so they contain no articulation vertices; and no vertex outside the subtree of w other than v can belong to this component, because LOW(w) ≥ DFSNUMBER(v) means that no edge joins the subtree of w to a proper ancestor of v. Once these edges are removed from STACK, the algorithm behaves exactly as it would on the graph G′ obtained from G by removing this biconnected component (keeping v). The induction now follows, since G′ has b biconnected components. Finally, beyond the DFS itself the algorithm does only a constant amount of extra work per vertex and per edge (each edge is pushed onto and popped from STACK once), so it has the same complexity as DFS: O(n + m), which is O(m) for a connected graph G with m edges.