
1 Breadth-First Search

Breadth-first search is the variant of search that is guided by a queue, instead of depth-first search's stack (remember, depth-first search does use a stack, the one implicit in its recursion). There is one stylistic difference: one does not restart breadth-first search, because breadth-first search only makes sense in the context of exploring the part of the graph that is reachable from a particular node s (as in the algorithm below). Also, although BFS does not have the wonderful and subtle properties of depth-first search, it does provide useful information of another kind: since it has to be "fair" in its choice of the next node, it visits nodes in order of increasing distance from s. In fact, our breadth-first search algorithm below labels each node with the shortest distance from s, that is, the number of edges in the shortest path from s to the node. The algorithm is this:

algorithm bfs(G = (V,E): graph, s: node)
v, w: nodes; Q: queue of nodes, initially {s};
dist: array[V] of integer, initially all infinity
dist[s] := 0
while Q is not empty do
  {v := eject(Q),
   for all edges (v,w) out of v do
     {if dist[w] = infinity then
        {inject(w,Q), dist[w] := dist[v] + 1}}}
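As a concrete illustration, here is the same algorithm as a runnable Python sketch. The adjacency-list encoding (a dict mapping each node to a list of its out-neighbors) and the use of None to stand in for the initial infinity labels are illustrative choices, not part of the notes.

```python
from collections import deque

def bfs_distances(graph, s):
    """Label each node reachable from s with its distance (number of
    edges on a shortest path) from s, as the pseudocode above does."""
    dist = {v: None for v in graph}   # None plays the role of infinity
    dist[s] = 0
    Q = deque([s])                    # the queue, initially {s}
    while Q:
        v = Q.popleft()               # eject(Q)
        for w in graph[v]:
            if dist[w] is None:       # first time we reach w
                dist[w] = dist[v] + 1
                Q.append(w)           # inject(w, Q)
    return dist
```

Nodes not reachable from s keep the label None, matching the observation below that they are never visited or labeled.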

For example, applied to the graph in Figure 1, this algorithm labels the nodes by the array dist as shown. Why are we sure that dist[v] is the shortest-path distance of v from s? It is certainly true if dist[v] is zero (this happens only at s). And, if it is true for dist[v] = d, then it can be easily shown to be true for values of dist equal to d + 1: any node that receives this value has an edge from a node with dist d, and from no node with lower dist.

Notice that nodes not reachable from s will not be visited or labeled.

[Figure 1: BFS of a directed graph; each node is labeled with its dist value.]

Breadth-first search runs, of course, in linear time, O(|E|) (recall that we assume that |E| >= |V|). The reason is the same as with depth-first search: breadth-first search visits each edge exactly once, and does a constant amount of work per edge.

2 Dijkstra's algorithm

What if each edge (v,w) of our graph has a length, a positive integer denoted length(v,w), and we wish to find the shortest paths from s to all nodes reachable from it?[1] Breadth-first search is still of help: we can subdivide each edge (u,v) into length(u,v) edges, by inserting length(u,v) - 1 "dummy" nodes, and then apply breadth-first search to the new graph. This algorithm solves the shortest-path problem in time O(sum over (u,v) in E of length(u,v)). But, of course, this can be very large (lengths could be in the thousands or millions).

This breadth-first search algorithm will most of the time visit "dummy" nodes; only occasionally will it do something truly interesting, like visit a node of the original graph. There is a way to simulate it so that we only take notice of these "interesting" steps. We need to guide breadth-first search, instead of by a queue, by a heap or priority queue of nodes. Each entry in the heap will stand for a projected future "interesting event": breadth-first search visiting for the first time a node of the original graph. The priority of each node will be the projected time at which breadth-first search will reach it. These "projected events" are in general unreliable, because other future events may "move up" the true time at which breadth-first search will reach the node (see node b in Figure 2). But one thing is certain: the most imminent future scheduled event is going to happen at precisely the projected time, because there is no intermediate event to invalidate it and "move it up." And the heap conveniently delivers this most imminent event to us.

As in all shortest path algorithms we shall see, we maintain two arrays indexed by V. The first array, dist[v], will eventually contain the true distance of v from s. The other array, prev[v], will contain the last node before v in the shortest path from s to v. At all times dist[v] will contain a conservative overestimate of the true shortest distance of v from s. dist[s] is of course always 0, and all other dist's are initialized to infinity, the most conservative overestimate of all...

The algorithm is this:

algorithm Dijkstra(G = (V, E, length): graph with positive weights; s: node)
v, w: nodes; dist: array[V] of integer; prev: array[V] of nodes;
H: heap of nodes prioritized by dist;
for all v in V do {dist[v] := infinity, prev[v] := nil}
H := {s}, dist[s] := 0
while H is not empty do
  {v := deletemin(H),
   for each edge (v,w) in E out of v do
     {if dist[w] > dist[v] + length(v,w) then
        {dist[w] := dist[v] + length(v,w), prev[w] := v, insert(w,H)}}}
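A runnable Python sketch of the same algorithm follows. Python's heapq module provides no decrease-key operation, so instead of the indexed heap described in the footnote, this version re-inserts a node each time its dist improves and skips stale heap entries when they surface; this is a standard workaround, not the implementation the notes assume.

```python
import heapq

def dijkstra(graph, s):
    """graph: dict mapping each node v to a list of (w, length) pairs,
    with positive lengths. Returns the dist and prev arrays as dicts."""
    dist = {v: float('inf') for v in graph}
    prev = {v: None for v in graph}
    dist[s] = 0
    H = [(0, s)]                              # heap of (priority, node)
    while H:
        d, v = heapq.heappop(H)               # deletemin(H)
        if d > dist[v]:
            continue                          # stale entry: ignore it
        for w, length in graph[v]:
            if dist[w] > dist[v] + length:    # tension on edge (v, w)
                dist[w] = dist[v] + length
                prev[w] = v
                heapq.heappush(H, (dist[w], w))   # insert(w, H)
    return dist, prev
```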

The algorithm, run on the graph in Figure 2, will yield the following heap contents (node: dist/priority pairs) at the beginning of the while loop: {s:0}, {a:2, b:6}, {b:5, c:3}, {b:4, e:7, f:5}, {e:7, f:5, d:6}, {e:6, d:6}, {e:6}, {}. The final distances from s are shown in Figure 2, together with the shortest path tree from s, the rooted tree defined by the pointers prev.

What is the complexity of this algorithm? The algorithm involves |E| insert operations[2] and |V| deletemin operations on H, and so the running time depends on the implementation of the heap H; let us discuss this implementation.

[1] What if we are interested only in the shortest path from s to a specific node t? As it turns out, all algorithms known for this problem also give us, as a free byproduct, the shortest path from s to all nodes reachable from it.

[Figure 2: Shortest paths. Final dist values: s:0, a:2, b:4, c:3, d:6, e:6, f:5.]

There are many ways to implement a heap. Even the most unsophisticated one (an amorphous set, say an array or linked list of node/priority pairs) yields an interesting time bound, O(|V|²) (see the first line of the table below). A binary heap gives O(|E| log |V|).

Which of the two should we use? The answer depends on how dense or sparse our graphs are. In all graphs, |E| is between |V| and |V|². If it is |V|², then we should use the linked list version. If it is anywhere below |V|²/log |V|, we should use binary heaps.

heap implementation   deletemin              insert               |V| x deletemin + |E| x insert
linked list           O(|V|)                 O(1)                 O(|V|²)
binary heap           O(log |V|)             O(log |V|)           O(|E| log |V|)
d-ary heap            O(d log |V| / log d)   O(log |V| / log d)   O((|V| d + |E|) log |V| / log d)
Fibonacci heap        O(log |V|)             O(1) (amortized)     O(|V| log |V| + |E|)

A more sophisticated data structure, the d-ary heap, performs even better. A d-ary heap is just like a binary heap, except that the fan-out of the tree is d, instead of 2. In an array implementation, the child pointers of node i are implemented as d i, ..., d i + d - 1, while the parent pointer as floor(i/d). Since the depth of any such tree with |V| nodes is log |V| / log d, it is easy to see that inserts take this amount of time, while deletemins take d times that, because deletemins go down the tree, and must look at the children of all nodes visited.
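The index arithmetic can be made concrete with a small sketch, here using the common 0-indexed convention with the root at position 0 (the notes' formulas correspond to a slightly different indexing; the choice is a matter of convention):

```python
def children(i, d):
    # positions of the children of node i in a 0-indexed d-ary heap array
    return range(d * i + 1, d * i + d + 1)

def parent(i, d):
    # position of the parent of node i (undefined for the root, i = 0)
    return (i - 1) // d
```

With d = 2 this reduces to the familiar binary-heap arithmetic.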

The complexity of our algorithm is therefore a function of d. We must choose d to minimize it. The right choice is d = |E|/|V|, the average degree! It is easy to see that it is the right choice because it equalizes the two terms of |E| + |V| d. This yields an algorithm that is good for both sparse and dense graphs. For dense graphs, its complexity is O(|V|²). For sparse graphs with |E| = O(|V|), it is O(|V| log |V|). Finally, for graphs with intermediate density, such as |E| = |V|^(1+e), where e is the density exponent of the graph, the algorithm is linear!

The fastest known implementation of Dijkstra's algorithm uses a more sophisticated data structure called the Fibonacci heap, which we shall not cover; see Chapter 21 of CLR. The Fibonacci heap has a separate decrease-key operation, and it is this operation that can be carried out in O(1) amortized time. By "amortized time" we mean that, although each decrease-key operation may take in the worst case more than constant time, all such operations taken together have a constant average (not expected) cost.

[2] In all heap implementations we assume that we have an array of pointers that give, for each node, its position in the heap, if any. This allows us to always have at most one copy of each node in the heap. Each time dist[w] is decreased, the insert(w,H) operation finds w in the heap, changes its priority, and possibly moves it up in the heap.

3 Negative lengths

Our argument of correctness of our shortest path algorithm was based on the "time metaphor": the most imminent event cannot be invalidated, exactly because it is the most imminent. This however would not work if we had negative edges: if the length of edge (b,a) in Figure 2 were -5, instead of 5, the first event (the arrival of BFS at node a after 2 "time units") would not be suggesting the correct value of the shortest path from s to a. Obviously, with negative lengths we need more involved algorithms, which repeatedly update the values of dist.

The basic information updated by the negative edge algorithms is the same, however. They rely on arrays dist and prev, of which dist is always a conservative overestimate of the true distance from s (and is initialized to infinity for all nodes, except for s, for which it is 0). The algorithms maintain dist so that it is always such a conservative overestimate. This is done by the same scheme as in our previous algorithm: whenever "tension" is discovered between nodes v and w, in that dist(w) > dist(v) + length(v,w) (that is, when it is discovered that dist(w) is a more conservative overestimate than it has to be), then this "tension" is "relieved" by this code:

procedure update((v,w): edge)
if dist[w] > dist[v] + length(v,w) then
  {dist[w] := dist[v] + length(v,w), prev[w] := v}

One crucial observation is that this procedure is safe: it never invalidates our "invariant" that dist is a conservative overestimate. Most shortest path algorithms consist of many updates of the edges, performed in some clever order. For example, Dijkstra's algorithm updates each edge in the order in which the "wavefront" first reaches it. This works only when we have nonnegative lengths.

A second crucial observation is the following: let a != s be a node, and consider the shortest path from s to a, say s, v1, v2, ..., vk = a for some k between 1 and |V| - 1. If we perform update first on (s,v1), later on (v1,v2), and so on, and finally on (vk-1,a), then we are sure that dist(a) contains the true distance from s to a, and that the true shortest path is encoded in prev. We must thus find a sequence of updates that guarantees that these edges are updated in this order. We don't care if these or other edges are updated several times in between; all we need is a sequence of updates that contains this particular subsequence.

And there is a very easy way to guarantee this: update all edges |V| - 1 times in a row! Here is the algorithm:

algorithm shortest_paths(G = (V, E, length): graph with weights; s: node)
dist: array[V] of integer; prev: array[V] of nodes
for all v in V do {dist[v] := infinity, prev[v] := nil}
dist[s] := 0
for i := 1, ..., |V| - 1 do
  {for each edge (v,w) in E do update(v,w)}

This algorithm solves the general single-source shortest path problem in O(|V| |E|) time.

4 Negative Cycles

In fact, if the length of edge (b,a) in Figure 2 were indeed changed to -5, then there would be a bigger problem with the graph of Figure 2: it would have a negative cycle (from a to b and back). On such graphs, it does not make sense to even ask the shortest path question. What is the shortest path from s to c in the modified graph? The one that goes directly from s to a to c (cost: 3), or the one that goes from s to a to b to a to c (cost: 1), or the one that takes the cycle twice (cost: -1)? And so on.

The shortest path problem is ill-posed in graphs with negative cycles; it makes no sense and deserves no answer. Our algorithm in the previous section works only in the absence of negative cycles. (Where did we assume no negative cycles in our correctness argument? Answer: when we asserted that a shortest path from s to a exists...) But it would be useful if our algorithm were able to detect whether there is a negative cycle in the graph, and thus to report reliably on the meaningfulness of the shortest path answers it provides.

This is easily done as follows: after the |V| - 1 rounds of updates of all edges, do a last round. If anything is changed during this last round of updates (if, that is, there is still "tension" in some edges), this means that there is no well-defined shortest path (because, if there were, |V| - 1 rounds would be enough to relieve all tension along it), and thus there is a negative cycle reachable from s.
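The whole scheme, |V| - 1 rounds of updates followed by the extra detection round, can be sketched in runnable Python; the edge-list encoding (a list of (v, w, length) triples) is an illustrative choice, not part of the notes.

```python
def bellman_ford(vertices, edges, s):
    """Run |V|-1 rounds of updates over all edges, then one more round
    to detect remaining "tension", i.e. a negative cycle reachable
    from s. Returns (dist, prev, negative_cycle)."""
    dist = {v: float('inf') for v in vertices}
    prev = {v: None for v in vertices}
    dist[s] = 0
    for _ in range(len(vertices) - 1):
        for v, w, length in edges:            # update(v, w)
            if dist[v] + length < dist[w]:
                dist[w] = dist[v] + length
                prev[w] = v
    # Last round: if any edge is still tense, shortest paths are not
    # well defined, because a negative cycle is reachable from s.
    negative_cycle = any(dist[v] + length < dist[w]
                         for v, w, length in edges)
    return dist, prev, negative_cycle
```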

5 Shortest paths in dags

There are two subclasses of weighted graphs that "automatically" exclude the possibility of negative cycles: graphs with nonnegative weights (and we know how to handle this special case faster), and dags (if there are no cycles, then there are certainly no negative cycles...). Here we will give a linear algorithm for single-source shortest paths in dags.

Our algorithm is based on the same principle: we are trying to find a sequence of updates, such that all shortest paths are its subsequences. But in a dag we know that all shortest paths from s go "from left to right" in the topological order of the dag. All we have to do then is first topologically sort the dag by depth-first search, and then visit all edges coming out of nodes in the topological order:

algorithm shortest_paths(G = (V, E, length): dag with lengths; s: node)
dist: array[V] of integer; prev: array[V] of nodes
for all v in V do {dist[v] := infinity, prev[v] := nil}
dist[s] := 0
Step 1: topologically sort G by depth-first search
for each v in V in the topological order found in Step 1 do
  for each edge (v,w) out of v do update(v,w)

This algorithm solves the general single-source shortest path problem for dags in O(|E|) time. Two more observations: first, Step 1 is not really needed, we could just update the edges of G breadth-first. Second, since this algorithm works for any lengths, we can use it to find longest paths in a dag: just make all edge lengths equal to -1.
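Both steps can be sketched in runnable Python; the adjacency-list encoding and the recursive depth-first search are illustrative choices (recursion is fine for small graphs, but a very deep dag would need an explicit stack):

```python
def dag_shortest_paths(graph, s):
    """graph: dict mapping each node to a list of (w, length) pairs;
    the graph must be a dag. Returns the dist and prev dicts."""
    # Step 1: topological sort by depth-first search
    # (reverse order of DFS finishing times).
    order, visited = [], set()
    def dfs(u):
        visited.add(u)
        for w, _ in graph[u]:
            if w not in visited:
                dfs(w)
        order.append(u)
    for u in graph:
        if u not in visited:
            dfs(u)
    order.reverse()

    # Step 2: update the edges out of each node in topological order.
    dist = {v: float('inf') for v in graph}
    prev = {v: None for v in graph}
    dist[s] = 0
    for v in order:
        if dist[v] == float('inf'):
            continue                           # v not reachable from s
        for w, length in graph[v]:
            if dist[w] > dist[v] + length:     # update(v, w)
                dist[w] = dist[v] + length
                prev[w] = v
    return dist, prev
```

Since nothing here assumes nonnegative lengths, negating every length turns this into a longest-path finder, as the last observation suggests.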