<<

Better Approximation of Betweenness

Robert Geisberger† Peter Sanders† Dominik Schultes†

Abstract interior. Then, the for node v is Estimating the importance or centrality of the nodes in large networks has recently attracted increased inter- σ (v) c (v) := st , (1.1) est. Betweenness is one of the most important central- B σ ∈ st ity indices, which basically counts the number of short- s,tXV est paths going through a node. Betweenness has been used in diverse applications, e.g., social network analy- where σst := |SPst| and σst(v) := |SPst(v)|. sis or route planning. Since exact computation is pro- This definition counts the number of shortest paths hibitive for large networks, approximation algorithms through v, counting paths with alternatives only frac- are important. In this paper, we propose a framework tionally. for unbiased approximation of betweenness that gener- Our original motivation for considering betweenness alizes a previous approach by Brandes. Our best new was to identify sets of important nodes that can define schemes yield significantly better approximation than a highway-node hierarchy [14], which is used for (dy- before for many real world inputs. In particular, we also namic) in road networks. For this application, get good approximations for the betweenness of unim- the requirement is to process huge networks with many portant nodes. million nodes in a few minutes. In this context, we also need reasonable approximations for the betweenness of 1 Introduction all nodes since we have to decide which of several neigh- One of the most important aspects of automatic analysis boring unimportant nodes will make it to the first level of networks is the computation of centrality indices that of the hierarchy. measure the importance of a node in some well defined way. Recently, the focus of attention in network anal- 1.1 Related Work. Brandes [4] gives an exact algo- ysis has shifted to the analysis of ever larger networks rithm for computing betweenness of all nodes that is that are rapidly becoming available in such diverse areas based on solving a single source shortest problem as transportation networks (e.g., public transportation (SSSP) from each node. An SSSP computation from s or road networks), social networks (e.g., friendship cir- produces a (DAG) encoding all cles, recommendation networks, or citation networks), shortest paths starting at s. By backward aggregation of computer networks (e.g., the internet or peer-to-peer counter values, the contributions of these paths to the networks), or networks in bioinformatics (e.g., protein betweenness counters can be computed in linear time interaction networks). (Section 3 gives more details). Depending on the graph In this paper we consider betweenness centrality model, the exact algorithm takes time between Θ(nm) [8, 1], which is one of the most frequently considered (e.g., for unit edge weights) and Θ nm + n2 log(n) centrality indices. Our results might also be applicable (comparison based general edge weights). Although this  to related concepts such as stress centrality [15] that is polynomial time, it is prohibitive for networks with are also based on counting shortest paths. Consider many millions of nodes and edges. Bader and Madduri a weighted directed (multi)-graph G = (V, E) with [2] present a massively parallel implementation of the n = |V |, m = |E|. Let SPst denote the set of shortest exact algorithm that can handle a few million nodes. paths between source s and target t and SPst(v) the Brandes and Pich [5] investigate how the exact al- subset of SPst consisting of paths that have v in their gorithm can be turned into an by extrapolating from a subset of k starting nodes (piv- ots), otherwise using the same aggregation strategy as the exact algorithm. Subsequently, we refer to this ap- ∗Partially supported by DFG grant SA 933/1-3. proximation algorithm as “Brandes’ algorithm”. A ran- †Universit¨at Karlsruhe (TH), 76128 Karlsruhe, Germany, dom sample of starting nodes turns out to work well. {robert.geisberger, sanders, schultes}@ira.uka.de In particular, the randomized method yields an unbi- ased estimator1 for betweenness. Unfortunately, this 2 A Generalized Framework for Betweenness method produces large overestimates for unimportant Approximation nodes that happen to be near a pivot. For example, Our estimator is parametrized by a length function consider a degree-two node v connecting a degree-one ℓ : E → R on the edges3 and a scaling function node u to the rest of the network. If u is selected as a f : [0, 1] → [0, 1]. For a path P = he1,...,eki let pivot, the betweenness of v is overestimated by a factor ℓ(P ):= 1≤i≤k ℓ(ei). of n/k. In each iteration, our algorithm performs one of 2n possibleP shortest path searches with uniform probability 1.2 Our Contributions. Our main idea is to solve 1/2n. Namely, forward search in G = (V, E) from the problem described above by changing the scheme for a pivot s ∈ V (|V | = n possibilities) or backward aggregating betweenness contributions so that nodes do search from a pivot t ∈ V , i.e., a search from t in not unduly ‘profit’ from being near a pivot. Section 2 (V, {(v,u) : (u, v) ∈ E}) (another n possibilities). For describes a general framework for this idea that yields each shortest path of the form unbiased estimators for betweenness. Our framework also applies to a simpler variant Q of betweenness, canonical-path betweenness centrality P = hs,...,v,...,ti cC (v), which we usually just call canonical centrality. We introduce this variant due to our original motivation found in this way, we definez }| a{scaled contribution of dealing with road networks, where shortest routes are 2 f(ℓ(Q)/ℓ(P )) for a forward search ‘almost unique’ . Note that some route planning meth- σst ods even enforce unique shortest paths by perturbing δP (v) := .  1−f(ℓ(Q)/ℓ(P )) for a backward search the edge weights (e.g., [9]). Consequently, in case of  σst canonical centrality, we consider only a single canoni- Overall, v gets a contribution cal shortest path between any source-target pair. Our approach does not require edge perturbation. It is suffi- δ(v) := δ (v) := {δ (v) : P ∈ SP (v)} cient that some deterministic tie breaking ensures that s P st ∈ only a single shortest path is found. tXV X Section 3 describes how to efficiently implement two for a forward search, and instantiations of our framework—linear scaling, where the contribution of a sample depends linearly on the δ(v) := δt(v) := {δP (v) : P ∈ SPst(v)} distance to the sample, and bisection scaling, where a s∈V sample only contributes ‘on the second half’ of a path. X X Linear scaling can be implemented using a slight varia- for a backward search. tion of Brandes’ original aggregation scheme. Bisection Theorem 2.1. X:= 2nδ(v) is an unbiased estimator scaling requires a quite different approach and another for c (v), i.e., E(X)= c (v). level of random sampling. Section 4 reports on exten- B B sive experiments with a wide range of large graphs. The Proof. Summing over all 2n possible events with linear scaling is always better than [5]. (Sampled) bisec- probability 1/2n each, we get tion scaling is in many ways even better. In particular, it yields good approximation of the betweenness also δs(v)+ δt(v) for less important nodes already with a small number s∈V t∈V of pivots—an area in which the original method fails. E(X)=2n X 2n X Section 5 summarizes the results and outlines possible =1 further improvements. For example, we have evidence ℓ(Q) ℓ(Q) f +1− f that betweenness approximations can help to construct z ℓ(P ) }| ℓ(P ) { =     :P ∈SPst(v) better highway-node hierarchies for road networks. σst s,t∈V ( ) X X=σst(v)

|SPst(v)| (1.1) = = cB(v) .  1 σst That means the expectation of the estimated betweenness is s,t∈V z }| { the actual betweenness. X 2Indeed, multiple shortest routes do appear in practice. How- ever they usually share most edges so that in most cases, the term 3ℓ may or may not be identical to the edge-weight function σst(v)/σst is one or zero. used for shortest-path calculations. Hence, by averaging k independent runs of the paths of the form hs,...,v,wi and the δs(w) takes care above unbiased estimator, we obtain an approximation of nodes reached indirectly via w. X1+...+Xk k of the betweenness value of node v. In the following, we will instantiate the length 3.2 Linear Scaling is easiest to implement by using function ℓ either with the original edge weight or with the original edge weights as the length function ℓ. Then unit edge weight (hop counting). We will consider three it is possible to implement it with only small deviations variants for the choice of f. For constant f(x)=1/2 we from Brandes’ scheme. We have essentially get Brandes’ algorithm.4 Our new schemes use f(x)= x for linear scaling and µ(s, v) σsv δs(v)= · (1 + δs(w)) 0 for x ∈ [0, 1/2) µ(s, w) σsw f(x)= w∈Xsucc(v) (1 for x ∈ [1/2, 1] for bisection scaling. The intuition behind these meth- where µ(s, w) is the shortest path distance from s to w. ods is to reduce the contributions for nodes close to Less formally, the modification compared to Brandes’ the pivot. In a sense, bisection scaling is best for this scheme is to put values 1/µ(s, w) into the aggregation purpose. However, it will turn out that it is easier to rather than 1. At the end, multiplying with µ(s, v) implement linear scaling. yields the correct scaling of µ(s, v)/µ(s, w) for all values put into the aggregation. 3 Linear Time Computation of Contributions A naive implementation of the definitions from Section 2 3.3 Bisection Scaling For Canonical Centrality. could take quadratic time even for a single pivot and Here, we mostly use unit distances for the length canonical centrality. We will give algorithms that function ℓ. (Recall that ℓ is not necessarily the same evaluate the shortest path DAG in time linear in its size. as the edge weight.) We do aggregation by a depth-first We only explain the computations for forward search traversal of the shortest-path tree. This allows us to from source s. Backward search works analogously. keep an array storing the path from s to the currently explored node. Aggregation works similarly to Brandes’ 3.1 Brandes’ Algorithm. Brandes [4] already ex- algorithm with one major modification. When a node v ′ plains how to compute σst on the fly during the short- at depth d is explored, let v denote the node on position est path calculations: σss = 1 and, for s 6= t, σst = max(0, ⌊d/2⌋− 1). We decrement the current value for ′ v∈pred(t) σsv where pred(t) is a multiset containing δs(v ). This has the effect of dropping the contribution the immediate predecessors of t in the shortest path of v from the aggregation just where it is prescribed P 5 DAG. In a subsequent aggregation phase, the nodes by the scaling function f. We have also implemented are processed in reverse topological order, i.e., by non- a variant for general length functions. Here, we have increasing distance from s. We get, to search the stack for ℓ(hs,...,vi)/2 starting at the position used for the predecessor of v. Although this σsv δs(v)= (1 + δs(w)) has linear worst case complexity, it works reasonably σsw w∈Xsucc(v) well for road networks using travel time. where succ(v) denotes the immediate successors of v 3.4 Bisection Scaling For General Betweenness in the shortest path DAG.6 The factor σsv takes care σsw Centrality. We have a direct implementation of enu- of distributing the contributions from w to possible merative bisection scaling that enumerates all paths by multiple parents. The ‘1’ is the contribution due to backtracking. This works well for graphs with few re- dundant paths but can be very slow in the worst case. 4 One can also do just forward searches in this case. What is more interesting is that the general case can be 5Since there may be an exponential number of paths, this recurrence might lead to arithmetic overflows if one would use reduced to the canonical case: we randomly sample a integer arithmetics on machine words. However, we are only parent pointer for each node t in the shortest path DAG. interested in approximations, so that it suffices to compute with We call this variant bisection sampling. If a parent p floating point numbers with mantissas of logarithmically many of w is selected with probability σsp , we get an unbi- σsw bits. Our implementation uses double precision arithmetics that ased estimator. We can extract more information out of flags overflows as ∞-values. This never happened yet for the inputs we considered. the shortest path DAG by performing several sampling 6In our framework with backward searches we have to divide steps for the same DAG, which does not invalidate the by two at the end. fact that the estimator is unbiased. graph nodes edges source average time per pivot [ms] Belgian road network 463 514 596 119 PTV AG 1337 Belgian road network with unit distance 463 514 596 119 PTV AG 914 Actor co-starring network 392 400 16557451 [12] 6242 US patent network 3774769 16518947 [10] 5 World-Wide-Web graph 325 729 1497135 [12] 144 CNR 2000 Webgraph 325 557 3216152 [11] 362 CiteSeer undir. citation network 268 495 2313294 [6] 1711 CiteSeer co-authorship network 227 320 1628268 [6] 835 CiteSeer co-paper network 434 102 32073440 [6] 4110 DBLP co-authorship network 299 067 1955352 [7] 1323 DBLP co-paper network 540 486 30491458 [7] 5024

Table 1: Overview of used graphs. A co-paper network has papers as nodes and edges between papers that share at least one author. The values in the last column refer to general betweenness centrality and Brandes’ algorithm.

4 Experiments Euclidean distance of the normalized n-dimensional The experiments were done on one core of a single AMD vectors for exact and estimated centrality, respectively, 9 Opteron Processor 270 clocked at 2.0 GHz with 8 GB and the number of inversions when comparing the main memory and 2 × 1 MB L2 cache, running SuSE rankings produced by exact and estimated centrality, Linux 10.0 (kernel 2.6.13). The program was compiled respectively. The number of inversions is computed by the GNU C++ compiler 4.0.2 using optimization using a mergesort-based O(n log n) algorithm. The level 3. Euclidean distance is mostly governed by the nodes with We have approximated canonical centrality for sev- large betweenness, whereas the number of inversions eral real-world road networks with up to 42 million edges treats all nodes equally. We have also looked at using travel times as edge weights. We do most quality other measures. For example, the average absolute evaluations here with a subnetwork for Belgium with betweenness error turned out to give similar results as 10 463 514 nodes because for this we can still compute ex- the Euclidean distance , whereas the geometric mean 11 act values.7 of relative rank errors has similar characteristics as For general betweenness we used networks with unit the number of inversions. edge weights stemming from a variety of applications Since our new algorithms need slightly more time including citation networks, coauthorship networks, web than Brandes’ algorithm for evaluating the shortest graphs and communication networks (AS graph, router path DAG, we ensure a fair comparison by giving our level graphs). Table 1 gives additional information on algorithms no more time than Brandes’ algorithm; thus, these networks. Our focus was on large, real-world they will perform less iterations. Effectively, the time graphs where we can still compute exact betweenness in needed by Brandes’ algorithm for one pivot is our unit reasonable time. This includes the largest graphs from of time. [2]. We do not use the graphs considered in [5] since Figure 1 evaluates the approximation of canonical they are mostly randomly generated and comparatively centrality for Belgium. With respect to the Euclidean small. The evaluation here uses the most difficult distance, all algorithms show benign and predictable be- of these instances8—a movie actor multigraph whose havior. Still, our new methods fare uniformly better edges indicate costarring in some movie. This graph with bisection slightly ahead. In the same running time has 392400 nodes, 16557451 edges, and many shortest we get an about two times smaller distance or alterna- paths between most pairs of nodes. tively, we achieve about the same quality in eight times For assessing the quality of the estimation, we less running time. With respect to the number of inver- adopt the measures proposed by Brandes and Pich [5]— sions, the difference is much more dramatic. Even after

9i.e., the total number of pairs that are in the wrong order. 7 Computing exact values for our complete Western European 10Note that the normalization is likely to have little effect since road network would take about 12 years. all the compared methods are unbiased and hence the computed 8 The US patent network is much bigger but relatively easy to vectors are likely to have similar length anyway. solve exactly since most searches reach only a small number of 11Let r denote the ratio between the estimated rank and the nodes. true rank. Then the relative rank error is max {r, 1/r}. % of possible inversions Euclidean distance 0.5 1 2 3 −4 −3 −2 iue1 ulda itneadivrin o aoia c canonical for inversions and distance Euclidean 1: Figure 10 10 10 :801 :210 :941 :61:33:868:57 34:28 17:13 8:36 4:19 2:09 1:05 0:32 0:16 0:08 63 418265212 0849 8192 4096 2048 1024 512 256 128 64 32 16 8192 4096 2048 1024 512 256 128 64 32 16 linear bisection (sh.path) bisection (unit) Brandes pivots forBrandes’algorithm(timebelowmm:ss) nrlt fBelgium. of entrality linear bisection (sh.path) bisection (unit) Brandes

0.5 1 2 3 10−4 10−3 10−2 % of possible inversions Euclidean distance

−3 −2 −1 iue2 ulda itneadivrin o betweennes for inversions and distance Euclidean 2: Figure 0.3 1 3 10 10 10 :200 :701 :605 :633 :714:12 7:07 3:33 1:46 0:53 0:26 0:13 0:07 0:03 0:02 63 418265212 0849 8192 4096 2048 1024 512 256 128 64 32 16 8192 4096 2048 1024 512 256 128 64 32 16 pivots forBrandes’algorithm(timebelowhh:mm) nteatrnetwork. actor the in s bis. (pivots) linear b.sampl. (16) b.sampl. (8) b.sampl. (4) b.sampl. (2) bisection Brandes

0.3 1 3 10−3 10−2 10−1 iue4 nesos aoia etaiyo admgrap random a of centrality canonical Inversions, 4: Figure % of possible inversions % of possible inversions 5 10 20 2 3 iue3 nesos aoia etaiyo egu,Bra Belgium, of centrality canonical Inversions, 3: Figure 2 4 63 41826512 256 128 64 32 16 8 2 5 linear bisection Brandes 2 6 2 7 2 8 pivots forBrandes’algorithm pivots forBrandes’algorithm 2 9 2 10 2 11 2 12 ih100ndsad1 7 edges. 074 10 and nodes 000 1 with h 2 13 2 14 ds algorithm. ndes’ 2 15 2 16 2 17 2 18

5 10 20 2 3 down. oepvt r sd h oeuipratndsnear nodes become unimportant pivots more the the used, are pivots more nlz nmr ealhwapoiainerr are errors us approximation let how on, detail going more is what in understand double- analyze To a outperforms clearly on scaling. scaling behavior linear Bisection algo- linear new near plot. Our nice, logarithmic positive. a is show inversions rithms of number the on aeo otuipratndsi httercentral- their that the is initially that nodes is is unimportant explanation ity most possible A of fate 4). behaved’ ‘well Figure more much also are (see [5] in used graphs random begin even converge. not to does algorithm Brandes’ iterations, 8192 n important L most outliers. 256 denote algorithm. next circles Brandes’ the The diff 10 of 12 Level iterations for centrality, errors 1024 canonical rank after relative network the road of Distribution 5: Figure 12

iue3idctsta h ubr,i h eyend, very the in numbers, the that indicates 3 Figure relative rank error

least importantvertices 0.2 0.5 1 2 12 011 10 9 8 7 6 5 4 3 2 1 0 slightly hsbhvo suepce ic the since unexpected is behavior This grossly neetmtd oee,the However, underestimated. vrsiae.Tenteffect net The overestimated. do go level ieto ae ut el lhuhi saotfour about is it although enumerative well, Interestingly, quite get fares before. we bisection distance, as Euclidean results the For similar network. actor the deviations large to er- lead small can ranks. already value relative that absolute in so the similar in very rors for im- values are betweenness of nodes the level that these is highest reason the The have for methods portance. All errors nodes. large which the comparatively levels, of important majority mag- less the of the constitute orders for errors has larger algorithm nitude Brandes’ bisection while levels, that over quality all see approximation good can uniformly We gives scaling levels importance. different 12 of dis- for the (categories) errors rank illustrates relative 5 the of Figure tribution nodes. the over distributed rn ees(aeois fiprac nteBelgian the in importance of (categories) levels erent ds n oo.Tecosgvsteaeaevalue. average the gives cross The on. so and odes, iue2eautsbtenesapoiainfor approximation betweenness evaluates 2 Figure vl1 otiste18ndswt ags exact largest with nodes 128 the contains 11 evel linear bisection (sh.path) bisection (unit) Brandes 99.9% most importantvertices 0.1% 10% 25% 50% 75% 90% 99% 1%

0.2 0.5 1 2 nlsso ee ltsmlrt iue5rvasthat An reveals on 5 algorithm used. Figure Brandes’ are to outperforms pivots similar sampling plot few bisection level only a if of essential analysis is In informa- that away paths. throws tion shortest sampling bisection redundant to situation, highly seems to this of networks difference presence road the main for based be ranking The bisection clear-cut over. the more vari- take the iterations, other to 1024 the start than after algorithms start and Only better Brandes a ants. both after have However, scaling down linear heading iterations. finally al- of algorithm before thousands iterations scaling stagnates many linear which for The algorithm, Brandes’ outperforms clear. ways less is rithms performance. on does influence pivot big per achieve a trees 8192 have to sampled with not of time algorithm number smaller Brandes’ less The as times iterations. times bound four 16 error about same about the has needs methods. Bisection It and other errors scaling. best. the linear fares outperforms than sampling it iteration particular, In per slower times divid networks. algorithm different Brandes’ for for samples inversions two of Number 6: Figure

o h neso ubr h akn ftealgo- the of ranking the number, inversion the For compare inversion Brandes / b.sampl.(2) 1 2 5 10 20 63 418265212 0849 8192 4096 2048 1024 512 256 128 64 32 16 CiteSeer co−author DBLP co−author CiteSeer co−paper CiteSeer citation DBLP co−paper Belgium Belgium unit pivots forBrandes’algorithm uprom l te loihsfo h beginning. the from algorithm algorithms hypothetical other this of that all iterations see outperforms We counting bi- loop. just enumerative outer on the our in based of curve exploit- algorithm performance lowest for section the The algorithm plots available. 2 fast information Figure a the had what all we see ing to if to interesting leads happen also iterations, would is It of number performance. superior sufficient even- a that after plot double-logarithmic tually, decrease a near-linear on inversions nice the a of has levels. sampling important least bisection the Still, for deficits has but levels most h Sptn ewr n h e rps Here, graphs. web the 1— and Table network from networks patent pattern directed US this the the from were of Euclidean deviation found all the only for we to The better respect always networks. of With is these factor sampling up better. bisection a is 20 sampling distance, to factor bisection up a pivots, many Forto is For algorithm better. algorithm. yielded two Brandes’ improvement Brandes’ pivots, over the few sampling shows bisection 6 by Figure networks. db ubro nesosfrbscinsaigwith scaling bisection for inversions of number by ed eseasmlrptenfrmn fteother the of many for pattern similar a see We

1 2 5 10 20 all approximation algorithms had problems to achieve 5.1 Path Sampling. Bias due to the selection of good approximation. Linear scaling was always slightly sampled source nodes can be avoided altogether by better than Brandes’ algorithm. Bisection sampling sampling both source and target node rather than at- was sometimes better, sometimes worse. Shortest path tempting to gain as much information as possible from searches turned out to be very local. This means that a single-source (all-target) shortest-path computation. all approximation algorithms have trouble eliminating The obvious drawback is that we get n − 1 times fewer false zeroes for nodes of small betweenness. This is a paths per sample. We can mitigate this problem by us- disadvantage for bisection sampling since it is slower ing speedup techniques for answering the queries. In and since it produces zero contributions where Brandes [13] this is done using inexact heuristics so that it is and linear scaling produce positive contributions. On unclear what is actually computed. At least for road the other hand, the same effect makes these networks networks we can do much better using recently devel- relatively easy for the exact algorithm. For example, oped exact speedup techniques that are very efficient. our implementation solves the US patent graph in 127 In particular, transit-node routing [3] achieves nearly minutes—only 2.5 times more time than Bader and constant query time and thus might lead to very good Madduri [2] need using 16 IBM-P5 processor cores. results. More precisely, transit-node routing computes transit nodes u and v such that there is a shortest path 5 Conclusion and Future Work from source s via u and v to target t. We can process Our new bisection scaling algorithm is a versatile this information by incrementing counters for the access method for approximating canonical centrality that connections hs,ui and hv,ti and the transit connection works well even for huge networks. The algorithm has hu, vi. Only at the end, we have to propagate these considerably better performance than previous methods counters down to the actual nodes (or edges) of the un- for all graphs tried and for all global quality measures. derlying network. This propagation cannot take longer This good evaluation largely extends to the original def- than building the data structures during preprocessing. inition of betweenness and the sampled bisection algo- We have not implemented path sampling because it is rithm. However, for a small number of pivots, the linear complicated and unlikely to work better than bisection scaling variant of our framework is still better. Enumer- sampling for road networks. In particular, we need a ative bisection might be the overall winner here if we huge number of samples before unimportant nodes get found a near linear time implementation. a significant contribution. On the other hand, if we were All algorithms discussed are easy to parallelize with interested in a small selection of nodes with highest be- up to k processors. We have not yet attempted any tweenness and the transit-node preprocessing would be pragmatic tuning of the algorithms. For example, needed anyway, then path sampling would be very effi- by using robust statistics (e.g., omitting the smallest cient and we could even derive good performance guar- and largest measurements) it might be possible to antees using Chernoff bounds. reduce the number of nodes with grossly overestimated betweenness. At least for road networks it should also 5.2 Including Local Searches. We have developed help to do some preprocessing like getting rid of degree- a generalization of our framework that allows a mix of one nodes. In multigraphs like the actor network, we local and global search and remains unbiased. This might get improved performance by modeling parallel might lead to better estimates for not-so-important edges more explicitly. nodes. In particular, false zero values might be reduced For some directed networks, none of the approxi- by local searches around nodes with zero betweenness mation algorithms gives very convincing results since a estimate. Preliminary experiments indicate that the good approximation takes almost as much time as an general idea works but does not yield dramatic improve- exact calculation. Although we did not find undirected ments for the instances tried. graphs with similarly bad performance, better theoret- ical performance bounds13 or a more detailed experi- 5.3 Application to Highway-Node Routing. mental investigation on the influence of graph structure Highway-node routing [14] requires a nested hierarchy of on performance remain an interesting open problem. more and more important nodes. We have preliminary results on how to use betweenness estimations to define improved highway-node hierarchies. At the same time, the resulting method is simpler than the approach used in [14]. However, a really good scheme seems to require 13 [5] gives a probabilistic tail bound which contains a bug additional heuristics beyond the scope of this paper. however. Repairing this bug yields a bound only useful for nodes with near quadratic betweenness. Acknowledgments. Robert G¨orke has provided us cation networks. Bulletin of Mathematical Biophysics, with citation networks crawled from DBLP and Cite- 15:501–507, 1953. seer. Kamesh Madduri has provided the patent net- work and the actor network. We would like to thank Reinhard Bauer, Ulrik Brandes, Daniel Delling and Dorothea Wagner for interesting discussions. Several anonymous reviewers provided valuable suggestions.

References

[1] J. M. Anthonisse. The rush in a directed graph. Technical Report BN 9/71, Stichting Mathematisch Centrum, Amsterdam, 1971. [2] David A. Bader and Kamesh Madduri. Parallel algo- rithms for evaluating centrality indices in real-world networks. In ICPP, pages 539–550. IEEE Computer Society, 2006. [3] H. Bast, S. Funke, P. Sanders, and D. Schultes. Fast routing in road networks with transit nodes. Science, 316(5824):566, 2007. [4] Ulrik Brandes. A faster algorithm for betweenness cen- trality. Journal of Mathematical Sociology, 25(2):163– 177, 2001. [5] Ulrik Brandes and Christian Pich. Centrality estima- tion in large networks. International Journal of Bifur- cation and Chaos, to appear. special issue on Complex Networks’ Structure and Dynamics. [6] CiteSeer. Scientific Literature Digital Library. http: //citeseer.ist.psu.edu/, 2007. [7] DBLP. DataBase systems and Logic Programming. http://dblp.uni-trier.de/, 2007. [8] L. C. Freeman. A set of measures of centrality based on betweenness. Sociometry, 40:35–41, 1977. [9] A. Goldberg, H. Kaplan, and R. Werneck. Reach for A∗: Efficient point-to-point shortest path algo- rithms. In Workshop on Algorithm Engineering & Ex- periments, pages 129–143, Miami, 2006. [10] A. B. Jaffe Hall, B. H. and M. Tratjenberg. The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools. NBER Working Paper, 8498, 2001. [11] Laboratory for Web Algorithmics. http: //law.dsi.unimi.it/index.php?option=com_ include&Itemid=65. [12] Notre Dame CNet resources. http://www.nd.edu/ ~networks/. [13] M. J. Rattigan, M. Maier, and D. Jensen. Using structure indices for efficient approximation of network properties. In KDD ’06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge Discovery and Data mining, pages 357–366. ACM, 2006. [14] D. Schultes and P. Sanders. Dynamic highway-node routing. In 6th Workshop on Experimental Algorithms, volume 4525 of LNCS, pages 66–79. Springer, 2007. [15] Alfonso Shimbel. Structural parameters of communi-