Efficient Transitive Closure Algorithms
Yannis E. Ioannidis, Raghu Ramakrishnan
Computer Sciences Department
University of Wisconsin
Madison, WI 53706

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Abstract

We have developed some efficient algorithms for computing the transitive closure of a directed graph. This paper presents the algorithms for the problem of reachability. The algorithms, however, can be adapted to deal with path computations and a significantly broader class of queries based on one-sided recursions. We analyze these algorithms and compare them to algorithms in the literature. The results indicate that these algorithms, in addition to their ability to deal with queries that are generalizations of transitive closure, also perform very efficiently, in particular in the context of a disk-based database environment.

1. Introduction

Several transitive closure algorithms have been presented in the literature. These include the Warshall and Warren algorithms, which use a bit-matrix representation of the graph, the Schmitz algorithm, which uses Tarjan's algorithm to identify strongly connected components in reverse topological order, and the Seminaive and Smart/Logarithmic algorithms, which view the graph as a binary relation and compute the transitive closure by a series of relational joins.

While all of the above algorithms can compute transitive closure, not all can be used to solve some related problems. Schmitz's algorithm cannot be used to answer queries about the set of paths in the transitive closure (e.g. to find the shortest paths between pairs of nodes) since it loses path information by merging all nodes in a strongly connected component. Only the Seminaive algorithm computes selection queries efficiently. Thus, if we wish to find all nodes reachable from a given node, or to find the longest path from a given node, with the exception of the Seminaive algorithm, we must essentially compute the entire transitive closure (or find the longest path from every node in the graph) first and then perform a selection.

We present new algorithms based on depth-first search and a scheme of marking nodes (to record earlier computation implicitly) that compute transitive closure efficiently. They can also be adapted to deal with selection queries and path computations efficiently. In particular, in the context of databases an important consideration is I/O cost, since it is expected that relations will not fit in main memory. A recent study [Agrawal and Jagadish 87] has emphasized the significant cost of I/O for duplicate elimination. The algorithms presented here will incur no I/O costs for duplicate elimination, and we therefore expect that they will be particularly suited to database applications. (We present an analysis of the algorithms that reinforces this point.)

The paper is organized as follows. We introduce some notation in Section 2. Section 3 presents the algorithms, starting with some simple versions and subsequently refining them. We present an analysis of these algorithms in Section 4 and discuss selection queries in Section 5. Section 6 contains a comparison of the algorithms to related work. We briefly discuss these algorithms in the context of path computations, one-sided recursion, and parallel execution in Section 7. Finally, our conclusions are presented in Section 8.

2. Notation and Basic Definitions

We assume that the graph G is specified as follows: For each node i in the graph, there is a set of successors E_i = { j | (i, j) is an arc of G }.
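As a minimal illustration of the successor-set input format assumed in Section 2 (this sketch is not part of the original paper), the following Python code builds the sets E_i from a list of arcs and answers a single reachability selection query with a plain depth-first traversal. The names `successor_sets` and `reachable_from` and the sample arcs are hypothetical, chosen only for illustration, and the traversal is the textbook approach, not one of the algorithms developed in Section 3.

```python
from collections import defaultdict


def successor_sets(arcs):
    """Build E, where E[i] = { j | (i, j) is an arc of G }."""
    E = defaultdict(set)
    for i, j in arcs:
        E[i].add(j)
        E[j]  # touch j so that nodes without outgoing arcs get an empty set
    return dict(E)


def reachable_from(E, source):
    """Return all nodes reachable from `source` by a path of length >= 1."""
    seen = set()
    stack = list(E.get(source, ()))
    while stack:
        j = stack.pop()
        if j not in seen:
            seen.add(j)
            stack.extend(E.get(j, ()))
    return seen


# Hypothetical sample graph: a cycle a -> b -> c -> a plus an arc c -> d.
arcs = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
E = successor_sets(arcs)
print(sorted(E["c"]))                  # ['a', 'd']
print(sorted(reachable_from(E, "a")))  # ['a', 'b', 'c', 'd']
```

Note that a is reported as reachable from itself because it lies on a cycle. The traversal touches only the part of the graph reachable from the chosen node, which is the kind of selection query the introduction contrasts with computing the entire closure first.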
Proceedings of the 14th VLDB Conference, Los Angeles, California, 1988

We denote the transitive closure of a graph G by G'. The strongly connected component of node i is defined as V_i = { i } ∪ { j | (i, j) ∈ G' and (j, i) ∈ G' }. The component V_i is nontrivial if V_i ≠ { i }. The condensation graph of G has the strongly connected components of G as its nodes. There is an arc from V_i to V_j in the condensation graph if and only if there is a path from i to j in G.

The algorithms we present construct a set of successors in the transitive closure for each node in G. The set of successors in the transitive closure for a node i is S_i = { j | (i, j) is an arc of G' }. A successor set S_i is partitioned into two sets M_i and T_i, and these may be thought of as the "marked" and "tagged" subsets of S_i. Initially M_i = T_i = ∅ for all i.†

† The set S_i may be thought of as containing nodes that are either marked or tagged. Agrawal and Jagadish [Agrawal and Jagadish 88] pointed out that this would lead to O(n^2) storage overhead for the marks and tags. They observed that implementing this by partitioning S_i into separate sets incurs almost no additional overhead. (The space for storing the graph is O(n^2) in any case.)

3. The Transitive Closure Algorithms

3.1. A Marking Algorithm

In this section, we present a simple version of the algorithm. We do not suggest using this algorithm in general; we present better algorithms, which are derived by refining this algorithm. (In this algorithm alone, S_i is partitioned into two sets M_i and U_i, not T_i, that can be thought of as "marked" and "unmarked".)

    proc Basic-TC (G)
    Input: A digraph G with successor sets E_i, i = 1 to n.
    Output: S_i = U_i ∪ M_i, i = 1 to n, denoting G'.
    { U_i := E_i; M_i := ∅
      for i = 1 to n do
        while there is a node j ∈ U_i do
          M_i := M_i ∪ M_j ∪ { j }; U_i := (U_i ∪ U_j) - M_i
        od
      od
    }

Proposition 3.1: Algorithm Basic-TC correctly computes the transitive closure of a directed graph G.

3.2. A Depth-First Transitive Closure Algorithm

Suppose that the graph G is acyclic. Let us number the nodes using a depth-first search such that all descendants of a node numbered n have a lower number than n. If we now run algorithm Basic-TC using this ordering, every time we add a successor set S_j to a set S_i, S_j = M_j and U_j = ∅. We refer to such additions as closed additions. (The successor set S_j is closed, that is, it contains all successors of j in the transitive closure.)

To deal correctly with cycles, we must make some modifications. The idea is to ignore back arcs (arcs into nodes previously visited by the depth-first search procedure) during the numbering phase. The algorithm Basic-TC is run using this numbering. In the presence of cycles, not all additions are closed. (For example, the addition of S_j to S_i when the arc (i, j) is a back arc is not a closed addition.)

While numbering the nodes in a preprocessing phase for algorithm Basic-TC exposes the underlying ideas clearly, we might improve performance by doing the transitive closure work as we proceed in the numbering algorithm. The following simple algorithm illustrates the idea, although it only works for dags.

    proc Dag-DFTC (G)
    Input: A graph G represented by successor sets E_i.
    Output: S_i, i = 1 to n, denoting G'.
    { for i = 1 to n do visited[i] := 0; S_i := ∅ od
      while there is some node i s.t. visited[i] = 0 do visit(i) od
    }

    proc visit (i)
    { visited[i] := 1;
      while there is some j ∈ E_i - S_i do
        if visited[j] = 0 then visit(j);
        S_i := S_i ∪ S_j ∪ { j }
      od
    }

The above algorithm can be modified to deal with cyclic graphs as follows. We need to distinguish nodes that are reached via back arcs, and we now partition S_i into two subsets M_i and T_i. T_i denotes nodes reached via back arcs.

    proc DFTC (G)
    Input: A graph G represented by successor sets E_i.
    Output: S_i = M_i ∪ T_i, i = 1 to n, denoting G'.
    { /* visited[i] = 1 if visit1(i) has been called. */
      /* visited2[i] = 1 if visit2(i) has been called. */
      /* popped[i] = 1 if call to visit1(i) has returned. */
      /* Global: The succ. set of the root of a str. comp. */
      /*         Initialized before calling visit2 for root. */
      for i = 1 to n do
        visited[i] := visited2[i] := popped[i] := 0;
        M_i := T_i := Global := ∅
      od
      while there is some node i s.t. visited[i] = 0 do visit1(i) od
    }

    proc visit1 (i)
    { visited[i] := 1;
      while there is j ∈ E_i - M_i - T_i do
        if visited[j] = 0 then visit1(j);
        if popped[j] > 0          /* (i, j) is not a back arc. */
          then { M_i := M_i ∪ M_j ∪ { j }; T_i := (T_i ∪ T_j) - M_i }
          else T_i := T_i ∪ { j }
      od
      if i ∈ T_i then             /* i belongs to a strong comp. */
        if T_i = { i }            /* i is the root. */
          then { M_i := M_i ∪ { i }; T_i := ∅; Global := M_i; visit2(i) }
          else { T_i := T_i - { i }; M_i := M_i ∪ { i } }
      popped[i] := 1
    }

    /* Assigns Global to succ. sets of all nodes in str. comp. */
    proc visit2 (i)
    { visited2[i] := 1;
      while there is j ∈ E_i s.t. visited2[j] = 0 and T_j ≠ ∅ do visit2(j) od
      M_i := Global; T_i := ∅
    }

Theorem 3.2: Algorithm DFTC correctly computes the transitive closure of G.

[...] Dag-DFTC, which is the need for a second phase to take care of nontrivial strongly connected components. The graph of Figure 3.1 contains two such components, namely (a, b, d) and (e, f). Concentrating on the former, when a is visited via the arc (d, a), the successor list of a has not been computed yet, and so the successor set of d cannot be updated properly. Basic-TC solves the problem by continuing to visit the successors of a again, but this may lead to serious inefficiencies. DFTC solves the problem in the second pass, where a is identified as the root of the component and its successor list is distributed to all the nodes in the component. This is done by the calls to visit2. Similar comments hold for the other component also.

3.3. Optimized Processing of Nontrivial Strong Components

There is a major potential inefficiency in DFTC in that the second pass over a strong component re-infers several arcs of the transitive closure that have been inferred during the first pass also. This can be seen in the component (a, b, d) of Figure 3.1. Assume that from d we first visit e and f and then visit a. As soon as (d, a) is discovered as a back arc, DFTC puts a in the tagged set of d, and it then pops back to b, adding into its successor list d, e, f, and a, with a being tagged. Finally, the successor list of a is updated to contain all the nodes in the graph without any tags. During the second phase of going over the component (a, b, d), nodes d, e, f, and a (as well as c) will be