Reachability, Routing and Distance Labeling Schemes in Graphs with Applications in Networks and Graph Databases
A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by Yang Xiang

December 2009

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
Acknowledgements
1 INTRODUCTION
  1.1 Problem definitions and basic notations
  1.2 Application backgrounds and related work
    1.2.1 Reachability schemes
    1.2.2 Routing and distance estimation schemes
  1.3 Main contributions of this dissertation
2 PATH-TREE: EFFICIENTLY ANSWERING REACHABILITY QUERIES ON VERY LARGE DIRECTED GRAPHS
  2.1 Path-Tree Cover For Reachability Query
    2.1.1 Notations
    2.1.2 Path-Decomposition of DAG
    2.1.3 Path Subgraph and Minimal Equivalent Edge Set
    2.1.4 Path-Graph and its Spanning Tree (SP-tree)
    2.1.5 Reachability Labeling for Path-Tree Cover
    2.1.6 Transitive Closure Compression and Reachability Query Answering
  2.2 Theoretical Analysis of Optimal Path-Tree Cover Construction
    2.2.1 Optimal Path-Tree Cover with Path-Decomposition
    2.2.2 Optimal Path-Decomposition
    2.2.3 Superiority of Path-Tree Cover Approach
    2.2.4 Very Fast Query Processing
  2.3 Experimental results
3 3-HOP: ANSWERING REACHABILITY QUERIES IN DENSE DIRECTED GRAPHS
  3.1 Basic Ideas of 3-Hop Indexing
    3.1.1 Basic 3-Hop
    3.1.2 Chain Decomposition for 3-Hop
    3.1.3 3-Hop Indexing and Our Approach
  3.2 Transitive Closure Contour
    3.2.1 Notation and Chain-Decomposition
    3.2.2 Transitive Closure between Two Chains
    3.2.3 Computing Transitive Closure Contour
  3.3 3-HOP Labeling for Transitive Closure Contour
    3.3.1 Problem Definition
    3.3.2 A Basic Approximation Algorithm for 3-Hop Cover
    3.3.3 A Faster Algorithm for 3-HOP Labeling
  3.4 Reachability Query Processing using 3-HOP Indexing
    3.4.1 3-HOP Contour Query Processing
    3.4.2 3-HOP Segment Query Processing
  3.5 Experimental Evaluation
4 HYPERRECTANGLE SUMMARIZATION FOR 0/1 MATRICES
  4.1 Summarization Objectives
  4.2 The Hardness of Hyperrectangle Summarization
  4.3 Algorithms for Summarization without False Positives
    4.3.1 The Intuitive Greedy Algorithm
    4.3.2 The Hyper Algorithm
    4.3.3 Pruning Technique for Hyper
  4.4 Summarization of the Covering Database
  4.5 Related Summarization Works in data mining
  4.6 Experimental Results
5 GREEDY ROUTING BY AID OF SPANNING TREES
  5.1 Greedy Routing Rules
  5.2 Preliminary results
  5.3 Frames for dually chordal graphs
  5.4 Frames for k-chordal graphs and subclasses
    5.4.1 k-Chordal graphs
    5.4.2 Chordal bipartite graphs and AT-free graphs
  5.5 Localized frames for tree-length λ graphs and δ-hyperbolic graphs
    5.5.1 Tree-length λ graphs
    5.5.2 δ-hyperbolic graphs
  5.6 Appendix: Experimental Results
    5.6.1 Performance under various densities
    5.6.2 Performance under various localities
6 DISTANCE AND ROUTING LABELING SCHEMES BY TREE SPANNERS
  6.1 Distance and routing labeling schemes for circle graphs
  6.2 Distance and routing labeling schemes for k-gon graphs
7 CONCLUSION AND FUTURE WORK
BIBLIOGRAPHY

LIST OF FIGURES

1 Path-Decomposition for a DAG
2 Path Relationships in a DAG
3 (a) Weighted Directed Path-Graph & (b) maximum Directed Spanning Tree
4 Labeling for Path-Path (A simple case of Path-Tree)
5 Complete Labeling for the Path-Tree
6 A simple example for 3-hop and 2-hop
7 A simple DAG with a chain decomposition. (Dotted arrow from 13 → 14 is not an edge in the original DAG, but an inferred one using reachability).
8 Two examples of Reachability between segments (through chains C2 and C3)
9 Pseudo-diagonal and Pseudo-upper triangular submatrix. All blank cells are 0-cells.
10 Edgelink between chain C1 and C3, and between chain C3 and C4. Dotted arrows are virtual edges (paths).
11 Generalized Join and Chain-Center Bipartite Graph
12 3-Hop Labeling of Transitive Closure Contour
13 Index sizes of Synthetic Datasets (2K) by 3HOP-Contour, 3HOP-Segment, Path-Tree, and 2HOP
14 A hyperrectangle H ∈ C_α. Shaded cells are covered by hyperrectangles currently available in CDB.
15 A graph and its rooted spanning tree with precomputed ancestry intervals. For (ordered) pair of vertices 10 and 4, both IGR and IGRF produce path 10,8,3,4 (TDGR produces 10,5,4). For pair 5 and 8, IGR produces path 5,2,1,8, while IGRF produces path 5,3,8 (TDGR produces 5,10,8). For pair 5 and 7, IGR produces path 5,2,1,7, while IGRF produces path 5,3,2,1,7 (TDGR produces 5,2,1,7).
16 Rectilinear grid and its column-wise Hamiltonian path.
17 A simple graph demonstrating that IGRF strategy may produce a shorter routing path than IGR strategy.
18 Illustration to the proof of Theorem 20 in Chapter 5.
19 A 5-chordal graph with a LexBFS-ordering: length(R_{G,T}(x, y)) = 4 + d_G(x, y)
20 A chordal graph with a BFS-ordering: d_G(x, y) = length(R_{G,T}(x, y)) - 2
21 A chordal graph with a LexBFS-tree. This tree is not an additive r-carcass for r < 5.
22 A chordal graph with an additive 0-fframe. This graph has neither additive 0-carcass nor additive 0-frame.
23 A chordal graph with an additive 0-frame (which is also an additive 0-fframe). This graph does not have any additive 0-carcass.
24 Illustration to the proof of Theorem 23 in Chapter 5. (1) NCA_T(X, Y) = X; (2) NCA_T(X, Y) = Y; (3) NCA_T(X, Y) := A is neither X nor Y.
25 A tree-length 6 graph and its corresponding tree-decomposition.
26 A tree-length 8 graph and its corresponding tree-decomposition.
27 (a) A path is a tree path. (b) A path is not a tree path. The edge ab is on the right side of CR. (c) A path is not a tree path. The edge ab is on the left side of CL.
28 Maximum multiplicative-stretch factors by varying densities, for several IGR, TDGR, and IGRF strategies.
29 Average multiplicative-stretch factors by varying densities, for several IGR, TDGR, and IGRF strategies.
30 Maximum multiplicative-stretch factors by varying localities, for several IGR, TDGR, and IGRF strategies.
31 Average multiplicative-stretch factors by varying localities, for several IGR, TDGR, and IGRF strategies.
32 A circle graph with an intersection model and two special chords a and b. A balanced separator S = N_G[a, b] and the connected components of G \ S are also shown.
33 (a) a house (or l-house for l = 1), (b) an l-house for l = 3, (c) a tree of l-houses of depth 3 where l = 2.
34 A 6-gon graph with an intersection model and two special chords a and b. A balanced separator S = N_G[a, b] and the connected components of G \ S are also shown.

LIST OF TABLES

1 Worst-Case Complexities of Available Reachability Schemes
2 List of Some Important Routing Labeling Schemes (RLS). Labeling time of those RLS is polynomial.
3 List of Some Important Distance Labeling Schemes (DLS). Labeling time of those DLS is polynomial.
4 Real Datasets Used in Empirical Study of Path-Tree
5 Comparison between Optimal Tree Cover Approach and Path-Tree Approach on Real Datasets
6 Query Time of Synthetic Datasets (2K) by 3HOP-Contour, 3HOP-Segment, Path-Tree, 2HOP, Breadth-First Search, and Depth-First Search
7 Average densities of UDGs for nodes with different radiuses.

Acknowledgements

I would like to thank my dissertation advisor, Dr. Feodor F. Dragan. I will never forget the many times he helped me to identify research problems and to refine my analytical skills in algorithmic graph theory.

I would like to thank Dr. Ruoming Jin. Under his inspiration, I have found many interesting connections between algorithmic graph theory and broad application areas in databases and data mining.

I would like to thank Dr. Hassan Peyravi. He helped me many times during my Ph.D. studies and gave me many good suggestions.

Finally, I would like to thank all dissertation committee members, all my co-authors, and all my friends who supported me in the completion of this Ph.D. dissertation. In particular, I would like to thank my co-authors Ning Ruan and David Fuhry, who worked together with me. Both of them did outstanding programming work.

CHAPTER 1
INTRODUCTION

Determining whether one vertex can reach another in a directed graph, routing a message from a source vertex to a destination vertex, and calculating or estimating the distance between two vertices are fundamental research problems in algorithmic graph theory, with a wide range of applications in networks and graph databases. Although the three problems are different, they are closely related. In this dissertation, I primarily focus on designing efficient solutions for these three problems by graph labeling approaches, i.e., assigning some bits of information to each vertex in a graph so that queries can be answered from the labels of the vertices involved.

Section 1.1 formally defines the problems and gives the basic notations that will be used in the dissertation. Section 1.2 introduces the application backgrounds of these research problems and related work. Finally, Section 1.3 outlines the main contributions of this dissertation. The conclusion, future work, and open questions of this dissertation are given in Chapter 7.
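
As a small illustration of what a labeling scheme looks like, the sketch below uses the classical interval (pre-order) labeling of a rooted tree. This is a standard textbook technique, shown here only as an example; it is not one of the schemes developed in this dissertation. Each vertex receives a pair of integers, and a reachability (ancestor) query is answered from the two labels alone, without traversing the tree. The function names interval_labels and reaches, and the dictionary-of-children input format, are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, int]


def interval_labels(children: Dict[int, List[int]], root: int) -> Dict[int, Label]:
    """Label every vertex of a rooted tree with (pre, last).

    pre  = pre-order number of the vertex,
    last = largest pre-order number found in its subtree.
    `children` maps a vertex to its list of children (hypothetical input format).
    """
    labels: Dict[int, Label] = {}
    start: Dict[int, int] = {}
    counter = 0
    # Iterative DFS; the boolean marks whether the subtree is already finished.
    stack = [(root, False)]
    while stack:
        vertex, finished = stack.pop()
        if finished:
            # All descendants have been numbered: close the interval.
            labels[vertex] = (start[vertex], counter - 1)
            continue
        start[vertex] = counter
        counter += 1
        stack.append((vertex, True))
        for child in reversed(children.get(vertex, [])):
            stack.append((child, False))
    return labels


def reaches(label_u: Label, label_v: Label) -> bool:
    """Return True iff u reaches v (u is an ancestor of v or u == v).

    The answer uses only the two labels: u's interval must contain v's.
    """
    return label_u[0] <= label_v[0] and label_v[1] <= label_u[1]


if __name__ == "__main__":
    # Tree: 1 -> {2, 3}, 2 -> {4, 5}
    tree = {1: [2, 3], 2: [4, 5]}
    labels = interval_labels(tree, root=1)
    print(labels)
    print(reaches(labels[2], labels[4]), reaches(labels[2], labels[3]))
```

Each label here takes two integers of about log n bits, and a query costs a constant number of comparisons. On general directed graphs a single interval per vertex is not enough, which is why more elaborate schemes are needed.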