<<

Subcubic Equivalences Between Graph Problems, APSP and Diameter

Amir Abboud∗ Fabrizio Grandoni† Virginia Vassilevska Williams‡

Abstract The graph centrality of a node w is the inverse of Measuring the importance of a node in a network is a major its maximum distance to any other node. The closeness goal in the analysis of social networks, biological systems, centrality of w is the inverse of the total distance of w transportation networks etc. Different centrality measures have been proposed to capture the notion of node importance. For to all the other nodes. The reach centrality of w is the example, the center of a graph is a node that minimizes the maximum distance between w and the closest endpoint maximum distance to any other node (the latter distance is of any s-t shortest passing through w.Informally, the radius of the graph). The median of a graph is a node that minimizes the sum of the distances to all other nodes. the of w measures the fraction of Informally, the betweenness centrality of a node w measures the shortest paths having w as an intermediate node. fraction of shortest paths that have w as an intermediate node. In this paper we study four fundamental graph cen- Finally, the reach centrality of a node w is the smallest distance r such that any s-t shortest path passing through w has either s trality computational problems associated with the men- or t in the ball of radius r around w. tioned centrality measures. Let G =(V,E) be an n-node The fastest known algorithms to compute the center and m-edge (directed or undirected) graph, with integer edge the median of a graph, and to compute the betweenness or reach weights w : E 0,...,M for some M 12.Let centrality even of a single node take roughly cubic time in the →{ } ≥ number n of nodes in the input graph. It is open whether these dG(s, t) denote the distance from node s to node t,and problems admit truly subcubic algorithms, i.e. algorithms with let us use d(s, t) instead when G is clear from the context. ˜ 3 δ 1 running time O(n − ) for some constant δ>0 . Let also σ be the number of distinct shortest paths from We relate the complexity of the mentioned centrality prob- s,t lems to two classical problems for which no truly subcubic al- s to t,andσs,t(b) be the number of such paths that use gorithm is known, namely All Pairs Shortest Paths (APSP) and node b as an intermediate node. Diameter. We show that Radius, Median and Betweenness Cen- trality are equivalent under subcubic reductions to APSP, i.e. The Radius problem is to compute R∗ := that a truly subcubic algorithm for any of these problems im- • minr V maxv V d(r∗,v) (radius of the graph). plies a truly subcubic algorithm for all of them. We then show ∗∈ ∈ that Reach Centrality is equivalent to Diameter under subcubic The Median problem is to compute M ∗ := reductions. The same holds for the problem of approximating • min d(m ,v) Betweenness Centrality within any constant factor. Thus the lat- m∗ V v V ∗ . ∈ ∈ ter two centrality problems could potentially be solved in truly The Reach Centrality problem (for a given node b)is subcubic time, even if APSP requires essentially cubic time. • ! to compute

1Introduction RC(b)= max min d(s, b),d(b, t) . s,t V : { { }} Identifying the importance of nodes in networks is a major d(s,t)=d(∈s,b)+d(b,t) goal in the analysis of social networks (e.g., citation net- works, recommendation networks, or friendship circles), The Betweenness Centrality problem (for a given • biological systems (e.g., protein interaction networks), node b)istocompute computer networks (e.g., the Internet or peer-to-peer net- σ (b) works), transportation networks (e.g., public transporta- BC(b) := s,t . σs,t tion or road networks), etc. A variety of graph theoretic s,t V b ,s=t notions of node importance have been proposed, among ∈ "−{ } ̸ the most relevant ones: betweenness centrality [20], graph All of these notions are related in one way or another centrality [29], closeness centrality [47], and reach cen- to shortest paths. In particular, we can solve the first three trality [28]. problems by running an algorithm for the classical All- Pairs Shortest Paths problem (APSP) on the underlying ∗Stanford University, [email protected]. graph and doing a negligible amount of post-processing. [email protected] †IDSIA, University of Lugano, .Partially The same holds for Betweenness Centrality by assuming supported by the ERC Starting Grant NEWNET 279352. ‡Stanford University, [email protected] NSF Grant CCF-1417238, BSF Grant BSF:2012338 and a Stanford SOE 2Though we focus here on non-negative weights, our results can be Hoover Fellowship. extended to the case of directed graphs with possibly negative weights 1The O˜ notation suppresses poly-logarithmic factors in n and M. and no negative cycles. that shortest paths are unique: we will next make this as- the famous results on 3-SUM hardness starting with the sumption unless differently stated3.Usingthebestknown work of Gajentaan and Overmars [21]. algorithms for APSP [55], this leads to a slightly subcu- In this paper we exploit both APSP and Diameter as bic (by an no(1) factor) running time for the considered our prototypical problem and prove a collection of sub- problems, and no faster algorithm is known. cubic equivalences with the above graph centrality prob- Each of these problems however only asks for the lems. Recall that the Diameter problem is to compute the computation of a single number. It is natural to ask, is largest distance in the graph. There is a trivial subcubic solving APSP necessary? Could it be that these problems reduction from Diameter to APSP and, although no truly admit much more efficient solutions? In particular, do subcubic algorithm is known for Diameter, finding a re- they admit a truly subcubic4 algorithm? duction in the opposite direction is one of the big open Besides the fundamental interest in understanding the questions in this area: can we compute the largest distance relations between such basic computational problems (can faster than we can compute all the distances? Radius be solved truly faster than APSP?), these ques- tions are well motivated from a practical viewpoint. As 1.2 Subcubic equivalences with APSP. Our first main evidence to the necessity of faster algorithms for the men- result is to show that Radius, Median and Betweenness tioned centrality problems, we remark that some papers Centrality are equivalent to APSP under subcubic reduc- presenting algorithms for Betweenness Centrality [6] and tions! Therefore, we add three quite different problems to Median [30] have received more than a thousand citations the list of APSP-hard problems [53] and if any of these each. problems can be solved in truly subcubic time then all of them can. 1.1 Approach. In this paper we address these ques- tions with an approach which can be viewed as a refine- THEOREM 1.1. Radius, Median, and Betweenness Cen- ment of NP-completeness. The approach strives to prove, trality are equivalent to APSP under subcubic reductions. via combinatorial reductions,thatimprovingonagiven upper bound for a computational problem B would yield Unfortunately, this is strong evidence that a truly sub- breakthrough algorithms for many other famous and well- cubic algorithm for computing these centrality measures studied problems. At high-level, the idea is to consider a is unlikely to exist (or at least very hard to find) since it given prototypical problem A for which the fastest known would imply a huge and unexpected algorithmic break- algorithm has running time O˜(nc) (here n is a size pa- through. c ε We find the APSP-hardnessresult for Radius quite in- rameter). Then we show that a O˜(n − )-time algorithm for a second problem B,forsomeconstantε>0,would teresting since, prior to our work, there was no good rea- c δ son to believe that Radius might be a truly harder problem imply a O˜(n − )-time algorithm for problem A for some other constant δ>0.Thiscanbeusedasevidencethata than Diameter. Indeed, in terms of approximation algo- c ε rithms, any known algorithm to approximate the diameter O˜(n − ) time algorithm for problem B is unlikely to ex- ist (or at least very hard to find). For c =3areduction can be converted to also approximate the radius in undi- of the above kind is called a subcubic reduction [53] from rected graphs within the same factor [3, 5, 10, 45]. Fur- AtoB.WesaythattwoproblemsA and B are equiva- thermore, the exact algorithms for Diameter and Radius lent under subcubic reductions if there exists a subcubic in graphs with small integer weights are also extremely reduction from A to B and from B to A.Inotherterms,a similar [13]. Our results seem to indicate the following truly subcubic algorithm for one problem implies a truly bizarre phenomenon. In dense graphs, Radius seems to subcubic algorithm for the other and vice versa. be harder than Diameter, as it is actually equivalent to Vassilevska Williams and Williams [53] introduced APSP, whereas a Diameter/APSP equivalence seems elu- this approach to the realm of Graph Algorithms to show sive. In sparse graphs, however, Diameter seems more the subcubic equivalence between APSP and a list of difficult than Radius since a known reduction [45] from seven other problems, including: deciding if an edge- CNF-SAT seems to imply that even approximatingthe Di- weighted graph has a triangle with negative total weight ameter in subquadratic time is hard. No such reduction is (Negative Triangle), deciding if a given matrix defines a known for Radius, and so far there is no evidence against metric, and the Replacement Paths problem [27, 46, 51, asubquadraticRadiusalgorithminsparsegraphs. 54]. Other examples of this approach [1, 2, 44] include 1.3 Subcubic equivalence with Diameter. Our sec- 3In the case of multiple shortest paths, the fastest known algorithm ond main result is to show that Reach Centrality and Di- for Betweenness Centrality takes O˜(n4) time; please see the discussion ameter are equivalent under subcubic reductions. in the related work section. 4We recall that a truly subcubic algorithm is an algorithm with THEOREM 1.2. Diameter and Reach Centrality are 3 δ running time O˜(n − ) for some constant δ>0. equivalent under subcubic reductions. On the positive side, it is within the realm of possi- with finite approximation factor for Betweenness Central- bility that Diameter is a truly easier problem than APSP, ity implies a truly subcubic Diameter algorithm, while a which would imply the same for Reach Centrality. On the truly subcubic Diameter algorithm implies a truly sub- negative side, Theorem 1.2 shows that finding a subcu- cubic (1 + ε)- for Betweenness bic algorithm for Reach Centrality is as hard as finding a Centrality, for any constant ε>0.Ourreductionsare subcubic algorithm for Diameter - a big open problem. Monte-Carlo, i.e. the resulting algorithm might fail to As a consequence of the tightness of our reductions, provide the desired approximation with some small prob- namely not only the number of nodes but also the largest ability. In more detail, we provide a subcubic reduction to absolute weight is roughly preserved, we also obtain a Diameter to compute the exact value of the betweenness faster algorithm for Reach Centrality in directed graphs centrality when that value is sufficiently small. For the with small integer weights. complementary case, we use a non-trivial random sam- pling algorithm. Analogously to the case of Reach Cen- ˜ ω THEOREM 1.3. There exists an O(Mn ) time algorithm trality, this gives some more hope that a truly subcubic al- for Reach Centrality in directed graphs. gorithm for Approximate Betweenness Centrality exists, Above ω [2, 2.373) [12, 14, 22, 52] denotes fast matrix however such algorithm is probably not easy to find. Part ∈ multiplication exponent. The previous best algorithm for of the mentioned reductions are summarized in Figure 1. small integer weights, which is based on the solution of APSP, takes time O˜(M 0.681n2.575) [57]. This is 1.5 Related Work. APSP is among the best stud- interesting in our opinion since Reach Centrality has ied problems in Computer Science. If the edge weights been used in some very fast (in practice) shortest paths are non-negative, one can run Dijkstra’s algorithm [16] algorithms [24, 25, 28]. from every source node, and solve the problem in time O(mn + n2 log n) (by implementing Dijkstra’s algorithm with Fibonacci heaps [19]). Johnson [36] showed how 1.4 Approximation algorithms. An approximate value of the mentioned graph centrality measures might to obtain the same running time in the case of negative weights also (but no negative cycles). Pettie [41] im- be sufficiently good in practice. This is indeed the topic 2 of several empirical works on Betweenness Centrality proved the running time to O(mn + n log log n) and to- [4, 7, 23]. Furthermore, the mentioned practically fast gether with Ramachandran to O(mn log α(m, n)) [42]. shortest paths algorithms [24, 25, 28] can be adapted to If the graph is undirected and the edge weights are in- work with approximate values of the reach centrality as tegers fitting in a word, one can solve the problem in time O(mn) in the word-RAM model [50]. In dense well. In this paper we formally study the approximability 3 of the mentioned problems. graphs the running time of these algorithms is O(n ). In more detail, given a graph centrality measure X, Slightly subcubic algorithms were developed as well, our goal is to compute (quickly) a quantity x such that starting with the work of Fredman [18]. Following a long 1 X x αX for some α 1 as small as possible (α is sequence of improvements (among others, [8, 31]), very α ≤ ≤ ≥ the approximation factor). It is known how to solve APSP recently Williams [55] obtained an algorithm with run- ˜ 3 Ω(√log n) within a multiplicative error (1 + ε) in time O˜(nω) [56]. ning time O(n /2 ). Faster algorithms are known This provides truly subcubic (1 + ε) approximation al- for small integer weights bounded in absolute value by M:inundirectedgraphsAPSPcanbesolvedinO˜(Mnω) gorithms for Radius and Median. However, this approach 1 ˜ 2 4 ω does not help with Reach/Betweenness Centrality, since in time [49] and in directed graphs in O(n (Mn) − ) those measures almost shortest paths are irrelevant. Here time [57]. The result for the directed case can be refined ˜ 0.681 2.575 we present some negative and (conditionally) positive re- to O(M n ) using fast rectangular matrix multi- sults on the approximability of the latter two problems. plication [32]. It is not hard to see that any approximation algorithm As we already mentioned, forgeneraledge-weights for Reach/Betweenness Centrality can be used to deter- the fastest known algorithms for Diameter and Radius mine whether BC(b) > 0.Weshowthat,whilesolving solve APSP (hence taking roughly cubic time). In the case the latter problem for a single node is equivalent to Diam- of directed graphs with small integer weights bounded ˜ ω eter, solving it for every node is at least as hard as APSP! by M there are faster, O(Mn ) time algorithms (see As a consequence any approximation algorithm (for any [13] and the references therein). Faster approximation finite α)tocomputethereach/betweennesscentralityof algorithms are known. Aingworth et al. [3] showed all nodes implies a truly subcubic algorithm for APSP. how to compute a (roughly) 3/2 approximation of the ˜ 2 On the positive side, we show that (single-node) Ap- diameter in time O(m√n+n ).Thesameapproximation proximate Betweenness Centrality is equivalent to Di- factor and running time can be achieved for Radius in ameter under subcubic reductions. This equivalence is undirected graphs [5]. The running time for both Radius ˜ quite strong: any truly subcubic approximation algorithm and Diameter was reduced to O(m√n) by Roditty and 3 Figure 1 The main subcubic reductions considered in this paper. Dashed arrows correspond to trivial reductions. All the remaining reductions are given in this paper, excluding the one from APSP to Negative Triangle which is taken from [53].

Betweenness Reach Centrality Centrality

Negative Radius Triangle APSP Diameter

Approx. Betw. Median Centrality

Vassilevska Williams [45] (see also [10] for a refinement prising a posteriori, in view of our APSP-hardness results. of the approximation factor). The authors also show In contrast, our results suggest a candidate way to obtain that a 3/2 ε approximation for Diameter running in aprovablyfastandaccuratealgorithmforApproximate 2−ε time O(m − ) (for any constant ε>0)wouldimply Betweenness Centrality (for a single node). Our approach that the Strong Exponential Time Hypothesis (SETH) deviates substantially from [4, 7, 23] for small values of of [33] fails, thus showing that improving on the 3/2- the betweenness centrality. approximation factor while still using a fast algorithm The Reach Centrality notionwasintroducedbyGut- would be difficult. man [28] in the framework of practically fast algorithms The notion of betweenness centrality was introduced to solve the Single-Source Shortest Paths problem. In par- by Freeman in the context of social networks [20], and ticular, the values RC(b) can be used to filter out some since then became one of the most important graph cen- nodes during an execution of Dijkstra’s algorithm. The trality measures in the applications. For example, this no- notion of Reach Centrality is also used in other works on tion is used in the analysis of protein networks [15, 35], the same topic [24, 25]. social networks [40, 43], sexual networks [38], and terror- Eppstein and Wang [17] consider the problem of ist networks [11, 37]. From an algorithmic point of view, approximating the closeness centrality of all nodes. They log n betweenness centrality was used to identify a highway- present a random-sampling-based O((m + n log n) ε2 ) node hierarchy for in road networks [48]. Bran- time algorithm which w.h.p. computes estimates within des’ algorithm [6] computes the betweenness centrality of an additive error εD∗,whereD∗ is the diameter of the all nodes in time O(mn + n2 log n).Thisresultisbased graph. The same problem is investigated in [7] from an on a counting variant of Dijkstra’s algorithm. We remark experimentalpoint of view. The Median problem was also that [6], similarly to other papers in the area, neglects studied in a distance-oracle query model [9, 26, 34]. the bit complexity of the counters which store the num- ber of pairwise shortest paths. This is reasonable in prac- 1.6 Preliminaries and Notation. W.l.o.g. we assume tice since the maximum number N of alternative short- that the considered graph G =(V,E) is connected, hence est paths between two nodes tends to be small in many m n 1.Wemaketheusualassumptionthatthenodes ≥ − of the applications. By considering also N,therunning of the considered graph are labelled with integers between time grows by a factor O(log N)=O(n log n).Indeed, 0 and n 1, and where needed we implicitly assume that n − in some applications one can even assume that shortest is lower bounded by a sufficiently large constant. For two paths are unique (as we do in most of this paper). The nodes u, v V ,byuv we indicate either an undirected ∈ uniqueness of shortest paths is either a consequence of edge between u and v or an edge directed from u to v. tie breaking rules (Canonical-Path Betweenness Central- The interpretation will be clear from the context. ity problem [23]), or can be enforced by perturbing edge We remark that, in our subcubic reductions, it would weights [24]. However, the running time to compute the be sufficient to preserve (modulo poly-logarithmic fac- exact betweenness centrality can be prohibitive in practice tors) the number n of nodes only. However, whenever for very large networks even assuming the uniqueness of possible, we will also try to preserve (in the same sense) shortest paths. For this reason, some work was devoted also m and M.Inmanycasesweobtainextremelytight to the fast approximation of the betweenness centrality of reductions that even allow us to obtain new faster algo- all nodes [4, 7, 23]. Those works are based on random rithms, as is the case with Reach Centrality via our tight pivot-sampling techniques. They do not provide any the- reduction to Diameter. oretical bound on the approximation factor: this is not sur- For a given node w V ,weletRad(w) := ∈ maxv V d(w, v) (eccentricity of w)andMed(w) := Combining the reductions below with Lemma 2.1 ∈ { } v V d(w, v).Anodew minimizing Rad(w) and proves Theorem 1.1. Med∈(w) is a center and a median of the graph, respec- ! tively. 2.1 Betweenness Centrality. We start with the reduc- In some claims we assume that a T (n, m) time, tion to Betweenness Centrality. T (n, M) time, or T (n, m, M) time algorithm for some ˜ problem is given. In all those claims we implicitly LEMMA 2.2. Given a O(T (n, m)) time algorithm for assume that those running times are polynomial functions Betweenness Centrality in directed or undirected graphs, ˜ lower bounded by m.Moregenerally,however,itisthere exists a O(T (n, m)) time algorithm for Negative sufficient for our proofs that O˜(m + T (O˜(n), O˜(m))) = Triangle. ˜ O(T (n, m)) and similarly for the other cases. Proof. Let (G =(V,E),w) be the input instance of Neg- Throughout this paper, with high probability (w.h.p.) O(1) ative Triangle (reduced as described above). In particular, means with probability at least 1 1/n . k+1 − n =2 is the number of nodes of G. We start with the simpler directed case (see also Fig- 2SubcubicEquivalencewithAPSP ure 2). We construct a weighted directed graph (G′,w′) In this section we prove the subcubic equivalence between as follows. Graph G′ contains four sets of nodes I, J, K, APSP and the following problems: Radius, Median and and L (layers). Each layer contains a copy of each node Betweenness Centrality. As mentioned in the introduc- v V .Letv be the copy of v in I,anddefineanalo- ∈ I tion, reducing these problems to APSP is fairly straight- gously vJ , vK and vL.LetQ =Θ(M) be a sufficiently forward and here we will focus on the opposite reductions. large integer. For each edge uv E,weaddtoG′ the ∈ We exploit Negative Triangle as an intermediate sub- edges uI vJ , uJ vK ,anduKvL,andassigntothoseedges problem: determine whether a given undirected graph weight 2Q+w(uv).WeaddtoG′ adummynodeb,and G =(V,E),withintegeredgeweightsw : E edges v b and bv for any v V ,ofweight3Q 1 and → I L ∈ − M,...,M ,containsatrianglewhoseedgessumto 3Q,respectively.WealsoaddtoG two sets of nodes {− } ′ anegativenumber;suchatriangleiscalledanegative Z = z ,...,z and O = o ,...,o .Foranyv V , { 0 k} { 0 k} ∈ triangle.Thelatterproblemwasshowntobeequivalent we add the following edges of weight 3Q 1 to G′.Let − to APSP under subcubic reductions in [53]. v0,v1,...,vk be a binary representation of v (interpreted as an integer between 0 and n 1=2k+1 1). For each LEMMA 2.1. [53] Negative Triangle and APSP (in di- − − j =0,...,k,weaddedgesv z and o v if vj =0,and rected or undirected graphs) are equivalent under subcu- I j j L edges v o and z v otherwise. We also add edges o z bic reductions. I j j L j j and z o of weight 3Q 1 for j =0,...,k.Observethat j j − In order to simplify our proofs, we assume that the k = O(log n),hencethereareO(n log n) edges of the input instance of Negative Triangle satisfies the following latter type. properties: On (G′,w′) we compute BC(b),andoutputYESto the input Negative Triangle instance iff BC(b)

z0 y z0 0B 0 3 B′ 6

1 4 0 z1 z1 1B B Q-8 Q+8 ′ 4 -8 1 2 o0 o0 2B 2 2 B′ 0A

o1 o1 3B 3 B′

1A

0I 0J 0K 0L 0I 0J 0K 0L Q+2 r 2Q 2Q -8 -8 4 Q-8 Q-8 + 2 Q A 2 0 0 1I 1J 1K 1L 1I 1J Q+2 1K 1L C C 2Q ′ +2 Q+4 Q+2 Q+4 3A

6 + 1 1 2I 2J 2K 2 2L 2I 2J 2K 2L C C Q Q 2Q-4 ′ 2 + Q+6 4 Q+4 2Q+4

3 3 3 3 3 3 3 3 2C 2 I J K L I J K L C′

x 3C 3 b C′

Then any s-t path passing through b costs at least 2(3Q graph. The key observation is that a node s has distance − I 1) + (2Q M).Ontheotherhand,anys Z O can at most 6Q 2 to every node t (including s )ifandonly − ∈ ∪ − L L reach any t Z O within distance 2(3Q 1),andany if s is in a negative triangle in G.Intuitively,thisallowsus ∈ ∪ − t I J K L within distance 3Q 1+2(2Q + M). to show that an algorithm distinguishing between radius ∈ ∪ ∪ ∪ − If s, t I J K L,thereexistsans-t path of 6Q 2 and radius 6Q 1 can solve Negative Triangle. ∈ ∪ ∪ ∪ − − length at most 3(2Q + M).Itremainstoconsiderthe To complete the reductionwe need to make sure that sI is case that s = s I and t = t L.Thepath close to every node in the graph (not only nodes in part L) I ∈ L ∈ s ,b,t has cost 6Q 1.Ifs = t,analogouslytothe and that the center can only lie in part I. I L − ̸ directed case there exists w Z O such that sI ,w,tL ∈ ∪ LEMMA 2.3. Given a O˜(T (n, m, M)) time algorithm for is a path of weight 6Q 2.Wecanconcludethat,like − Radius in directed or undirected graphs, there exists a in the directed case, the only pairs which can contribute O˜(T (n, m, M)) time algorithm for Negative Triangle. to BC(b) are of the form (sI ,sL).Theshortestpathof the form sI ,vJ ,wk,sL has weight at most 6Q 2 if s Proof. (G =(V,E),w) − Let be the considered instance belongs to a negative triangle, and at least 6Q otherwise. of Negative Triangle (modified as described before). We Any other path avoiding b contains at least 4 edges, and start with the directed case (see also Figure 2). Let therefore costs at least 4(2Q M).Wecanconclude Q =Θ(M) − be a sufficiently large integer. We construct that BCsI ,sL (b)=1if s is not contained in a negative adirectedweightedgraph(G′,w′) as follows. Similarly triangle of (G, w),andBCsI ,sL (b)=0otherwise. The to Lemma 2.2, graph G′ contains four copies I, J, K, correctness follows. and L of the node set V (layers). Let vX be the copy of For the sake of simplicity, we did not enforce unique- v V in layer X.Foreachedgeuv E,weaddto ∈ ∈ ness of shortest paths in our reduction but that can be done G′ edges uI vJ , uJ vK ,anduK vL of weight Q + w(vu). easily by perturbing the edge weights by small polynomial We also add to G′ two sets of nodes Z = z ,...,z { 0 k} factors exploiting the Isolation Lemma [39]. and O = o ,...,o .Weaddedgesincidenttonodes { 0 k} Z O in the same way as in Lemma 2.2, using edges of ∪ 2.2 Radius. Our reduction from Negative Triangle to cost Q. In more detail, let v0,v1,...,vk be the binary Radius is similar to the one in Lemma 2.2. Consider the representation of node v:weaddtheedgesvI zj and oj vL j same construction when we remove the node b from the if v =0,andtheedgesvI oj and zj vL otherwise. We also add edges z o and o z of weight Q for all j =0,...,k. weight Q/4 for any v V . j j j j ∈ Finally, we add nodes x and y,andforanyv V we add In this graph we compute the median value M ∗,and ∈ edges vI x, xvI ,andxvJ of weight Q,andedgesvI y of output YES to the input instance of Negative Triangle iff weight 3Q 1. M ∗ +(n 1) +6nQ. root node r), and exactly 3Q 1 to node y.Notealso 4 − 2 − that, if s belongs to a negative triangle, there exists an sI - On the other hand, for any node vA, sL path of the form sI ,vJ ,wK ,sL with length at most 3Q 2.OtherwiseoneshortestsI -sL path passes trough Med(v )=d(v ,r)+ d(v ,u ) − A A A A nodes in Z O and has length 3Q.Wecanconclude u V ∪ "∈ that the center of the graph belongs to I,andthatthe + (d(vA,uB)+d(vA,uB )) corresponding radius is upper bounded by 3Q 1 iff there ′ − u V exists a negative triangle in (G, w). "∈ + (d(v ,u )+d(v ,u )) In the undirected case we use precisely the same con- A C A C′ u V struction, but removing edge directions (and leaving only "∈ Q Q one copy of parallel edges). The algorithm is analogous as = +(n 1) well as its running time analysis. Its correctness can also 4 − 2 be proved analogously. In more detail, similarly to the di- + (Q + w(vu)+Q w(vu)) − rected case, nodes in I can reach any other node within u V "∈ distance at most 3Q +3M.Sinced(y,x)=4Q 1,and − + (d(vA,uC )+2Q + w(vu))) d(s, y) (3Q 1)+(Q M) for s/I y ,wecancon- ≥ − − ∈ ∪{ } u V clude that r∗ I.Alsointhiscase,foranynodes ,its "∈ ∈ I Q Q maximum distance to any other node is d(sI ,y)=3Q 1 = +(n 1) +2nQ − 4 2 if s belongs to a negative triangle, and d(s ,s ) 3Q − I L ≥ otherwise. + (d(vA,uC )+2Q + w(vu))) u V "∈ Q Q 2.3 Median. The reduction to Median is based on a +(n 1) +6nQ. rather different approach. ≤ 4 − 2

LEMMA 2.4. Given a O˜(T (n, M)) time algorithm for Therefore the median is in A.Inthelastinequalitywe upper bounded d(v ,u ) with w′(v u )=2Q w(vu). Median in undirected or directed graphs, there exists a A C A C − O˜(T (n, M)) time algorithm for Negative Triangle. Observe that a strict inequality holds if there exists a third node zB such that w′(vAzB)+w′(zBuC ) d(s, b) K,and Trivially, any α-approximation algorithm for Be- ≥ K + M>d(b, t) K.Weconstructaninstancetweenness Centrality, for any finite α,alsosolvesPos- ≥ (G′,w′) of Diameter as follows. We add to G′ acopy itive Betweenness Centrality since the answer is 0 iff of G.Furthermore,weaddasetofnodesA that contains BC(b)=0.AsimilarreductionalsoworksforReach anodev for each node v V such that K + M> Centrality. In more detail, by definition RC(b) A ∈ ≥ d(v, b) K.Symmetrically,weaddasetofnodesB min d(b, b),d(b, b) =0and RC(b) > 0 implies ≥ { } that contains a node v for each node v V such that BC(b) > 0.However,itmightstillbethatRC(b)=0 B ∈ K + M>d(b, v) K.Wealsoaddedgesv v and and BC(b) > 0.Wecansolvethisissuebyinitiallyper- ≥ A vv of weight K + M d(v, b) and K + M d(b, v), turbing edge weights by small polynomial factors, so as B − − respectively. Note that the weight of the latter edges is in to obtain an equivalent instance of Positive Betweenness [1,M] by construction. Finally, we add a directed path Centrality where all weights are strictly positive. In the P = v ,...,v , q = (2K +2M 2)/M ,whose reduced instance RC(b) > 0 iff BC(b) > 0. 0 q ⌈ − ⌉ edge weights are chosen arbitrarily in [1,M] so that the length of P is exactly 2K +2M 2.Foreveryv V , 4.1 Some Results on Positive Betweenness Centrality. − ∈ we add edges vv0 and vqv of weight zero. We also add Asimpleobservationisthatonunweightedgraphs, edges av of weight 1 and v a of weight 0 for any a A. Positive Betweenness Centrality is asking whether there is 0 q ∈ Symmetrically, we add edges vqb of weight 1 and bv0 of an in-neighbor x of b and an out-neighbor y of b such that weight 0 for any b B. xy / E,andthereforecanbesolvedinO(m) time. We ∈ ∈ We compute the diameter D∗ of (G′,w′) and output next show that, on weighted graphs, Positive Betweenness that RC(b) K iff D∗ 2K +2M.Therunningtime Centrality and Diameter are equivalent under subcubic ≥ ≥ of the algorithm is O˜(m + T (O(n),O(m + n),M)) = reductions. O˜(T (n, m, M)).Consideritscorrectness.Thedistance THEOREM 4.1. Diameter and Positive Betweenness between any two nodes in G P is at most 2K +2M 2. ∪ − Centrality are equivalent under subcubic reductions. The distance between any node in G P and any other ∪ node is at most 2K +2M 1.Thedistancebetweenany Theorem 4.1 follows from the following two lemmas. − node in B and any other node is at most 2K+2M 1.The − ˜ distance between any node in A and any node in G P A LEMMA 4.1. Given a O(T (n, m)) time algorithm for ∪ ∪ is at most 2K +2M 1. Positive Betweenness Centrality in directed (resp., undi- − ˜ Consider next any pair s A and t B.Ans - rected) graphs, there is a O(T (n, m)) time algorithm for A ∈ B ∈ A tB path using P would cost at least 2K +2M.Ashortest Diameter in directed (resp., undirected) graphs. s -t path avoiding P costs 2K+2M d(s, b) d(b, t)+ A B − − Proof. Let (G =(V,E),w) be the input instance of d(s, t) 2K +2M,wheretheequalityholdsiffb is along ≤ Diameter. Consider first the directed case (see also Figure some shortest s-t path. Therefore D∗ 2K +2M and ≤ 3). Let D be an integer in [1, (n 1)M].Constructan the equality holds iff there exists a pair (sA,tB) A B − ∈ × auxiliary weighted graph (G′,w′) consisting of a copy of such that d(s, t)=d(s, b)+d(b, t),i.e.iffRC(b) K. ≥ (G, w) plus a dummy node b and dummy edges vb and bv The correctness follows. of weight D/2 for any v V 7.Observethatanypairof ∈ nodes s, t V is connected by a path of length D using b. ∈ Our subcubic reduction in the undirected case is By performinga binarysearch on D and solving each time slightly less efficient in terms of the edge weights and will the resulting instance of Positive Betweenness Centrality follow from Lemmas 4.2 and 4.3 of the next section. on b,wedeterminethelargestvalueD′ of D such that the answer is YES (i.e., BC(b) > 0). The output value of the 4FastApproximationofReachandBetweenness diameter is D′. Centrality The running time of the algorithm is O˜((m + T (n + In this section we present our results about the approx- 1, 2n + m)) log(nM)) = O˜(T (n, m)).Let(s∗,t∗) be a imability of Reach and Betweenness Centrality. A key witness pair for the diameter D∗.Inanyexecutionwhere idea in our approach is to consider the following Positive D∗ D,thereexistsashortests∗-t∗ path using node ≥ Betweenness Centrality problem, which might be of in- b and hence the answer is YES. In any other execution dependent interest: determine whether BC(b) > 0 for agivennodeb (i.e., whether some shortest path uses b 7In order to avoid fractional edge weights, it is sufficient to multiply as an intermediate node). In this case it is convenient to edge weights by a factor 2. 9 Figure 3 (Left) Reduction from Diameter to Positive Betweenness Centrality in directed graphs. Gray edges have weight D/2,whereD is a guess of the diameter. (Middle) Reduction from Positive Betweenness Centrality to Diameter in directed graphs. Here D˜ is a proper upper bound on the diameter. (Right) Reduction from the Negative Triangle instance of Figure 2 to All-Nodes Positive Betweenness Centrality in directed graphs (partially drawn). Gray edges have weight 3Q.OnehasBC(3B) > 0 and BC(0B)=0since node 3 does not belong to a negative triangle while node 0 does it.

0 0 0 bA b bB 0B 0I 2Q 0J 2Q 0K 0L -8 -8 2 3 2 3 4 ˜ ˜ + D+1-3 D+1-2 Q b 1 4 1A 1 1B 1B 1I 1J 1K 2 1L

3 3 4 2Q + ˜ 6 2 D+1-4 0 + 2 2 2A 2 2B 2B 2I 2J 2K Q 2L Q + 2 4

3B 3I 3J 3K 3L

(where D∗ 1)andDiameter. G and therefore the answer to the Positive Betweenness ′ THEOREM 4.2. Diameter and Approximate Betweenness Centrality instance is YES. Vice versa, suppose that the Centrality are equivalent under subcubic Monte-Carlo answer to the Positive Betweenness Centrality instance is reductions. YES, i.e. there exists a shortest s-t path passing through b.Thisimpliesthatthereexistsashortestpathoftheform Similarly to the case of Reach Centrality, it is not hard s′,b,t′.Observethattheshortestpathsnotinvolvingnode to see that a truly subcubic α-approximation algorithm b are the same in G and G′.Thereforethereexistsashort- for Betweenness Centrality provides a truly subcubic est s′-t′ path in G′ passing through b.Sincebyconstruc- algorithm for Positive Betweenness Centrality (hence for tion d (s′,b),d (b, t′) K,thepair(s′,t′) witnesses Diameter via Theorem 4.1). G G ≥ that RC(b) K. We next show that a truly subcubic algorithm for ≥ The claim in the undirected case follows from the Diameter implies a truly subcubic PTAS for Betweenness same reduction, but removing edge directions (and leav- Centrality, i.e. an algorithm that computes a (1 + ε) ing only one copy of parallel edges). approximation of the betweenness centrality of a given node for any given constant ε>0.OurPTASisMonte- Another interesting observation about Positive Be- Carlo: it provides the desired approximation w.h.p. tweenness Centrality is that although solving it for a sin- Let (G, w, b) be the considered instance of Between- gle node b is equivalent to Diameter under subcubic re- ness Centrality, and define B∗ = BC(b).Observe ductions, the all-nodes version of the problem (where one that, under the assumption that shortest paths are unique, wants to determine whether BC(b) > 0 for all nodes b)is BC (b) 0, 1 and therefore B∗ 0,...,(n s,t ∈{ } ∈{ − actually at least as hard as APSP. 1)(n 2) .Givens, t V b such that BC (b)=1, − } ∈ −{ } s,t LEMMA 4.4. Given a O˜(T (n, m, M)) time algorithm we call (s, t) a witness pair, s a witness source,andt a for All-Nodes Positive Betweenness Centrality in directed witness target (of BC(b)). (or undirected) graphs, there is a O˜(T (n, m, M)) time Let also Bmed [0, (n 1)(n 2)] be a integer ∈ − − algorithm for Negative Triangle. parameter to be fixed later. Our PTAS is based on two different algorithms: one for B∗ Bmed and the other Proof. Let (G, w) be the input instance of Negative Tri- ≤ for B∗ >Bmed. angle. Consider first the directed case (see also Figure 3). We create a directed weighted graph (G′,w′) as fol- 4.2.1 An exact algorithm for small B∗. Let us start lows. Graph G′ contains five copies I, J, K, L and B of with the algorithm for small B∗.Recallthatawitness the node set V .WiththeusualnotationvX is the copy pair (s, t) satisfies BCs,t(b)=1.Acrucialobservationis of node v V in set X.LetQ =Θ(M) be a suf- that the number of witness pairs is equal to B∗ in case of ∈ uv E ficiently large integer. For every edge we add unique shortest paths. u v ,u v ,u v G ∈ the edges I J J K K L to ′ and set their weight It is convenient to define a generalization of Be- 2Q+w(uv) u u u u to .Wealsoaddedges I B and B L for tweenness Centrality, where we consider only some pairs every node u in G and set the weight of these edges to (s, t).ForS, T V b ,wedefineBCS,T (b) := 3Q. ⊆ −{} (s,t) S T BCs,t(b).The(S, T )-Betweenness Central- The algorithm solves the All-Nodes Positive Be- ∈ × ity problem is to compute BCS,T (b).ThePositive (S, T )- tweenness Centrality problem on (G′,w′) in time ! Betweenness Centrality problem is to determine whether O˜(T (n, m, M)),andoutputsYEStotheinputNegative BCS,T (b) > 0.WeusetheshortcutsBCs,T (b)= Triangle instance iff BC(uB) > 0 for some uB B.To ∈ BC s ,T (b) and BCS,t(b)=BCS, t (b).Ourfirstin- show correctness, observe that the only path through u { } { } B gredient is a reduction of that problem to Diameter. is from uI to uL and it has weight 6Q,whileeverypath of type u ,v ,w ,u corresponds to a triangle u, v, w LEMMA 4.5. Given a O˜(T (n, m)) time algorithm for I J K L { } in G and the weight of the path equals the weight of the Diameter in directed (resp., undirected) graphs, there triangle plus 6Q.Theclaimfollows. exists a O˜(T (n, m)) time algorithm for Positive (S, T )- The same construction, without edge directions, Betweenness Centrality in directed (resp., undirected) proves the claim for undirected graphs. graphs. 11 Proof. We use a construction similar to the one in the claimed running time, given a O˜(T (n, m)) time algorithm proof of Lemma 4.2. Let (G, w, b, S, T ) be the considered for Positive (S, T )-Betweenness Centrality. The claim instance of Positive (S, T )-Betweenness Centrality. We follows from Lemma 4.5. start with the directed case, the undirected one being The recursive algorithm works as follows. It initially analogous. Let us construct a directed weighted graph solve the corresponding Positive (S, T )-Betweenness in- (G′,w′).GraphG′ contains a copy of G.Furthermore, stance. If the answer is NO, the algorithm outputs 0.Oth- it contains a copy of S and a copy of T .Letv be the erwise, if S = T =1,thealgorithmoutputs1.Other- S | | | | copy of node v in S,anddefinevT analogously. Let wise, the algorithm partitions arbitrarily S into two sub- K := 2 + A,whereA is the maximum distance of type sets S1 and S2 of roughly the same cardinality, and it splits d(s, b) and d(b, t),withs S and t T .Foreach similarly T into T and T .Thenthealgorithmsolvesre- ∈ ∈ 1 2 s S and t T ,weaddedgess s and tt of weight cursively the sub-problems induces by the pairs (S ,T ), ∈ ∈ S T i j K d(s, b) and K d(b, t),respectively.Observethat i, j 1, 2 ,andoutputsthesumofthefourobtained − − ∈{ } the latter weights are lower bounded by 2.Wealsoadd values. two nodes r′ and r′′,withanedger′r′′ of weight 2.We The correctness of the algorithm is obvious. Con- add edges vr′ and r′v for every v V S of weight cerning its running time, consider the recursion tree. ∈ ∪ K 1.Symmetrically,weaddedgesvr′′ and r′′v for Let us call a subproblem whose corresponding Positive − every v V T of weight K 1.Wealsoaddedges (S, T )-Betweenness Centrality instance is a YES/NO in- ∈ ∪ − vr′ of weight K 1 for every v T .Wecomputethe stance a YES/NO subproblem. Observe that, excluding − ∈ diameter D∗ of (G′,w′),andoutputYESiffD∗ < 2K. the root problem, any NO subproblem must have at least The running time of the algorithm is O˜(m + one sibling YES subproblem in the recursion tree. Fur- T (O(n),O(m))) = O˜(T (n, m)).Letusproveitscor- thermore, each sub-problem has at most 4 children in the rectness. The distance from any node v V T r′,r′′ recursion tree. Therefore, if the root subproblem is a YES ∈ ∪ ∪{ } to any node w V S T r′,r′′ is at most subproblem, the total number of subproblems is at most 4 ∈ ∪ ∪ ∪{ } 2(K 2).Considernextanynodes S.Itsdistance times the number of YES subproblems. Note also that the − ∈ to any node in G r′,r′′ is also at most 2(K 2). number of leaf YES subproblems is equal to W ,andthat ∪{ } − It remains to consider the distance from s to any t T . each YES subproblem must have at least one leaf YES ∈ Any path using r′ or r′′ (or both) has length at least 2K. subproblem among its descendants. Finally, the depth of The shortest path avoiding r′ and r′′ has length precisely the recursion tree is O(log( S + T )) = O(log n).Thus | | | | 2K d(s, b) d(b, t)+d(s, t) 2K,wheretheequal- the number of subproblems is O˜(W ).Theclaimonthe − − ≤ ity holds iff b belongs to some shortest s-t path. We running time follows. can conclude that the diameter is smaller than 2K iff b is along some shortest s-t path with s S and t T , We are now ready to present our algorithm for small ∈ ∈ i.e., iff the answer to the input instance of Positive (S, T )- B∗. Betweenness Centrality is YES. LEMMA 4.7. Given an instance (G, w, b) of Betweenness The construction for the undirected case is similar, Centrality, a parameter B ,andanalgorithmfor where we remove edge directions and also remove the med Diameter of running time O˜(T (n, m)).Thereisan edges of type vr′ with v T (those edges were needed ∈ O˜(BmedT (n, m)) time algorithm which either outputs only to guarantee that distances from T to S are smaller B = BC(b) or answers NO in which case B >B . than 2K,whilethispropertyshouldnotalwaysholdinthe ∗ ∗ med ˜ undirected case). The running time remains O(T (n, m)). Proof. Consider the recursive algorithm from Lemma Analogously to the directed case, it is not hard to prove 4.6. We run that algorithm with S = T = V b . −{} that pairwise distances are always smaller than 2K ex- Furthermore, we keep track of the number W of leaf YES cluding possibly the distance between some s S and ∈ sub-problems found so far. If W>Bmed at any point, we t T which is equal to 2K iff b belongs to some shortest ∈ halt the recursive algorithm and output NO. Otherwise, s-t path. The correctness follows. we output the value W returned by the root call of the We will exploit the following recursive algorithm for recursive algorithm. (S, T )-Betweenness Centrality. The correctness of the algorithm follows immediately since the number of leaf YES subproblems in the original LEMMA 4.6. Given a O˜(T (n, m)) time algorithm for (non-truncated) algorithm equals B .Aneasyadaptation Diameter in directed (resp., undirected) graphs, there is a ∗ of the running time analysis in Lemma 4.6 shows that the O˜(W T (n, m)) time algorithm for (S, T )-Betweenness · running time is as in the claim. Centrality, where W is the number of pairs (s, t) S T ∈ × such that BCs,t(b)=1. 4.2.2 A Monte-Carlo PTAS for large B∗. We next Proof. We describe a recursive algorithm with the assume that B∗ >Bmed,andwepresentanalgorithm 1 x for this case. In order to lighten the notation, we next Above we used the fact that function xe − is increasing drop b (which is clear from the context). Recall that for x [0, 1 ] (and strictly smaller than 1 in the same ∈ 1+ε anodew is a witness source (resp., witness target) if range). Similarly, conditioning on the event that B> √B C log n C log n BCw,V > 0 (resp., BCV,w > 0). At high level, our med 1 ε ,oneobtainsE[B′]= √B B 1 ε and algorithm is based on the computation of the contribution − med ≥ − BCs,V to BC of a random sample of candidate witness √Bmed sources s.ThenweexploitChernoff’sboundtoprovethat Pr[B′ 0 tends to zero as C tends to + . s R BCs,V .ObservethatE[B′]=pmedBsmall∗ . ∞ ∈ small Furthermore B′ is the sum of independent random vari- Proof. We start by showing that w.h.p., for any s V ,if ! √Bmed ∈ ables each one of value at most 1 ε by the assumption s Slarge then BCs,V √Bmed/(1+ε),andotherwise − ∈ ≥ on Ssmall.Therefore,byChernoff’sbound, BCs,V √Bmed/(1 ε).DefineB′ = BCs,T and ≤ − C log n B = BCs,V .NotethatE[B′]= √B B.Notealsothat med Pr[B′ E[B′]+εpmedB∗] B′ = BCs,T = Xs,t,whereXs,t =0if t/T ≥ t V (1 ε)C log nB ∈ ∈ − small∗ and Xs,t = BCs,t otherwise. Since the variables Xs,t are B εB∗ med ! B negatively correlated, we can apply Chernoff’s bound to e small∗ √B . med ⎛ εB∗ ⎞ BCs,T .Inparticular,conditioningonB< ,one ≤ B +1 1+ε εB∗ small∗ ( B +1) obtains ⎜ small∗ ⎟ ⎝ ⎠ √Bmed Pr[B′ C log n = E[B′]] Assuming B∗ εB /2 and observing that B∗ ≥ B small ≥ med ≥ B∗ ,oneobtains C log n B small (√B /B) 1 √B e med − med √ ≤ (√B /B) Bmed/B Pr[B′ E[B′]+εp B∗] # med $ ≥ med C log n (1 ε)εC log n eε/(1+ε) eε − 2 . . ≤ 1+ε ≤ (1 + ε)1+ε % & % & 13 n Otherwise B∗ <εB /2 εB∗/2 and thus cannot be solved in time O((2 δ) ) for any constant small med ≤ − δ>0.Bythesameobservationasbefore,oneobtainsas Pr[B′ E[B′]+εp B∗] ≥ med acorollaryalowerboundontherunningtimeofanyap- (1 ε)C log nB∗ − B proximation algorithm for Betweenness/Reach Centrality. ε med e 2 ε B THEOREM 4.3. Suppose that there is an O(m − ) time small∗ ≤ ⎛ εB∗ +ε ⎞ (1 + ) B∗ algorithm, for any constant ε>0,thatsolvesPositive Bsmall∗ ⎝e ε(1 ε)C log n ⎠ Betweenness Centrality in directed or undirected graphs − . with edge weights in 1, 2 .ThenSETHisfalse. ≤ 3 { } - . Similarly Proof. Let F be a CNF-SAT formula on n variables. Our goal is to show that we can determine whether F is (1 δ)n 8 Pr[B′ E[B′] εpmedB∗] satisfiable in O (2 ) time for some constant δ>0 . ≤ − ∗ − p B 1 ( εB∗ )2 med small∗ Using the sparsification lemma of [33] (as, e.g., in [10]), − 2 B∗ √B /(1 ε) e small med − we can assume w.l.o.g. that F contains O(n) clauses. ≤ (1 ε)ε2 (B )2 C log n − ∗ Let us consider the undirected case first (see also 2 B B = e− small∗ med Figure 4). We partition the variables into two sets A (1 ε)ε2 − C log n and B of (roughly) n/2 variables each, and create a node e− 2 . ≤ for each partial assignment of the variables in A and B, respectively. We also add a node for each clause c,and Therefore w.h.p. B˜ [B∗ εB∗,B∗ +εB∗]. small ∈ small − small add one edge of weight 1 between each clause c and any Altogether, w.h.p. one has partial assignment φ of A or B that does not satisfy any literal of c (including the special case that c does not (1 2ε)B∗ (1 ε)B∗ + B∗ εB∗ B˜ − ≤ − large small − ≤ contain any variable in A or B). We also add two nodes (1 + ε)B∗ + B∗ + εB∗ (1 + 2ε)B∗. ≤ large small ≤ xA and xB ,andaddoneedgeofweight1 between them and any node in A and B,respectively.Finallyweadda The following lemma summarizes the above discus- node b,andaddoneedgeofweight2 between b and any sion. assignment of A and B. LEMMA 4.9. Given an instance (G, w, b) of Between- We claim that F is satisfiable iff BC(b) > 0. ness Centrality with BC(b)=B∗ Bmed,thereisan Observe that the distance between any clause node c ˜ nm ≥ and any other node is at most 4,whileanypathpassing O( √ ) time algorithm that returns a (1 + ε) approxi- Bmed through b would cost at least 5.Similarly,thedistance mation of B∗ w.h.p. between any two assignment of A or of B is at most 2, Combining the algorithms for small and large B∗,we so the corresponding shortest paths do not use b.Given obtain the following result. an assignment φA of A and an assignment φB of B,there

exists a φA-φB path of length 2 (hence BCφA,φB (b)=0) LEMMA 4.10. Given a truly subcubic algorithm for Di- iff there exists a clause c that is not satisfied by φA nor ameter, there exists a truly subcubic Monte-Carlo PTAS by φB.Otherwise(i.e.,φA and φB together satisfy F ), for Betweenness Centrality. φA,b,φB is a a shortest such path (hence BC(b) > 0). The graph has O(2n/2n) edges, and the conclusion of the ˜ 3 δ Proof. Let O(n − ) be the running time of the given lemma follows. Diameter algorithm, for some constant δ>0.From In the directed case we can use a similar construction, Lemmas 4.7 and 4.9, we can use it to compute w.h.p. a without nodes xA and xB,andorientingtheedgesfrom (1 + ε) approximation of the betweenness centrality of a the assignments of A to the clause nodes and to b,and 3 δ n3 given node in time O˜(Bmedn − + ).Choosing √Bmed from the latter nodes to the assignments of B.The 2δ/3 Bmed = n gives a truly subcubic running time in algorithm and its analysis are analogous to the undirected 3 δ/3 O˜(n − ). case.

2 ε Theorem 4.2 follows directly from Lemma 4.10. COROLLARY 4.2. Suppose that there is an O(m − ) time α-approximation algorithm for Betweenness Cen- 4.3 Reductions based on SETH. We are able to show trality or Reach Centrality, for any constant ε>0 and that, assuming the Strong Exponential Time Hypothesis any finite α (possibly depending on m). Then SETH is (SETH) [33], a subquadratic algorithm for Positive Be- false. tweenness Centrality does not exist even in sparse graphs. 8 We recall that SETH claims that CNF-SAT on n variables The O∗ notation suppresses polynomial factors. Figure 4 Reduction from CNF-SAT to Positive Betweenness Centrality (left) and Reach Centrality (right) in undirected graphs for the CNF-SAT formula c c c c = (X Y Z) (Z Y ) (X Y Q) (X Z Q).Thesetof 1 ∧ 2 ∧ 3 ∧ 4 ∨ ∨ ∧ ∨ ∧ ∨ ∨ ∧ ∨ ∨ variables are A = X, Y and B = Z, Q .NodeA corresponds to the partial assignment (X, Y )=(F, F) and { } { } FF similarly for the other nodes. Bold edges have weight 2,allotheredgeshaveweight1.TheshortestpathsAFF,r,BTT on the left and AFF,xA,r,xB ,BTT on the right witness that (X, Y, Z, Q)=(F, F, T, T ) is a satisfying assignment.

xA xA

AFF ATF AFT ATT AFF ATF AFT ATT

r c1 c2 c3 c4 r c1 c2 c3 c4

BFF BTF BFT BTT BFF BTF BFT BTT

xB xB

For Reach Centrality we can also show an approxi- and only if b is on the shortest path between an assignment mation lower bound for unweighted undirected graphs. φA of A and an assignment φB of B.Butashortestpath φ φ b 2 ε between A and B goes through if and only if for every THEOREM 4.4. Suppose there is a O(m − )-time (2 − clause node c either φAc is not an edge or φB c is not an ε)-approximation algorithm for Reach Centrality in undi- edge, and by definition of these edges it implies that for rected unweighted graphs, for some constant ε>0.Then every clause c,eitherφA or φB satisfies c (i.e. φA and φB SETH is false. induce a satisfying assignment of F ). The claim follows. Proof. Similarly to the proof of Theorem 4.3, we can 5ConclusionsandOpenProblems start with a CNF-SAT formula F containing n variables and m = O(n) clauses [33]. We will show how to There are many interesting problems that we left open. construct an instance (G, b) of Reach Centrality on an The main one is probably whether Diameter and APSP are unweighted undirected graph G =(V,E) with V = equivalent under subcubic reductions. By our reductions, | | O(2n/2 + m) nodes and E = O(2n/2m) edges, such on one hand a positive answer would indicate that truly | | that RC(b)=2if F is satisfiable and RC(b)=1 subcubic algorithms for Reach Centrality and for Approx- otherwise. The generation of the graph from the formula imate Betweenness Centrality are unlikely to exist. On the takes O(2n/2m) time and therefore if we could compute other hand, a negative answer would give truly subcubic 2 ε a (2 ε) approximation of RC(b) in O∗( E − ) time, algorithms for the latter problems as well. − | | for some ε>0,wewouldbeabletosolveCNF-SATin We have shown that Reach Centrality can be solved (1 ε/2)n ˜ ω O∗(2 − ) time (which would refute SETH). in O(Mn ) time in directed graphs, improving on the Similarly to the proof of Theorem 4.3, we partition previous best algorithm based on APSP. Similar running the variables into two subsets A and B of (roughly) times are known for Diameter and Radius [13]. To the ˜ ω n/2 variables each, and create a node for each partial best of our knowledge, it is open whether a O(Mn ) assignment of the variables in A and B.Wealsocreate time algorithm exists also for Median and Betweenness anodec for each clause c,andconnectc to each partial Centrality in directed graphs. We provedthata subquadratic 2 ε approximation al- assignment that does not satisfy any literal in c.Wealso − add nodes xA and xB ,andaddedgesbetweenthemand gorithm for Reach Centrality in sparse graphs is unlikely any node in A and B,respectively.Finally,weaddanode to exist. In [45] an analogous result is proved for Diam- b,andconnectittoxA and xB (note that the final part of eter. It would be interesting to show similar negative re- the construction deviates from Theorem 4.3). sults for Radius, Betweenness Centrality and Median (or To show correctness, note that b is on the shortest path find faster approximation algorithms in sparse graphs for between x and x and therefore RC(b) 1.Second, those problems). A B ≥ note that b cannot be on the shortest path between a clause node c and another node in G,andthereforeRC(b)= 2 if References 15 [1] A. Abboud and V. V.Williams.Popularconjectures imply lems in . Computational Geome- strong lower bounds for dynamic problems. FOCS,2014. try,5(3):165–185,1995. [2] A. Abboud, V. V. Williams, and O. Weimann. Conse- [22] F. L. Gall. Powers of tensors and fast matrix multiplica- quences of faster alignment of sequences. In ICALP (1), tion. In International Symposium on Symbolic and Alge- pages 39–51, 2014. braic Computation, ISSAC ’14, Kobe, Japan, July 23-25, [3] D. Aingworth, C. Chekuri, P. Indyk, and R. Motwani. 2014,pages296–303,2014. Fast estimation of diameter and shortest paths (without [23] R. Geisberger, P. Sanders, and D. Schultes. Better approx- matrix multiplication). SIAM J. Comput.,28(4):1167– imation of betweenness centrality. In ALENEX,pages90– 1181, 1999. 100, 2008. [4] D. A. Bader, S. Kintali, K. Madduri, and M. Mihail. [24] A. V. Goldberg, H. Kaplan, and R. F. Werneck. Reach for Approximating betweenness centrality. In WAW,pages A*: Efficient point-to-point shortest path algorithms. In 124–137, 2007. ALENEX,pages129–143,2006. [5] P. Berman and S. P. Kasiviswanathan. Faster approxima- [25] A. V. Goldberg, H. Kaplan, and R. F. F. Werneck. Better tion of distances in graphs. In WADS,pages541–552, landmarks within reach. In WEA,pages38–51,2007. 2007. [26] O. Goldreich and D. Ron. Approximating average param- [6] U. Brandes. A faster algorithm for betweenness centrality. eters of graphs. Random Struct. Algorithms,32(4):473– Journal of Mathematical Sociology,25(2):163–177,2001. 493, 2008. [7] U. Brandes and C. Pich. Centrality estimation in large [27] F. Grandoni and V. Vassilevska Williams. Improved dis- networks. International Journal of Bifurcation and Chaos, tance sensitivity oracles via fast single-source replacement 17(7):2303–2318, 2007. paths. In FOCS,pages748–757,2012. [8] T. M. Chan. More algorithms for all-pairs shortest paths [28] R. J. Gutman. Reach-based routing: A new approach to in weighted graphs. SIAM J. Comput.,39(5):2075–2089, shortest path algorithms optimized for road networks. In 2010. ALENEX/ANALC,pages100–111,2004. [9] C.-L. Chang. Deterministic sublinear-time approxima- [29] P. Hage and F. Harary. Eccentricity and centrality in tions for metric 1-median selection. Inf. Process. Lett., networks. Social Networks,17:57–63,1995. 113(8), 2013. [30] S. L. Hakimi. Optimum locations of switching centers and [10] S. Chechik, D. Larkin, L. Roditty, G. Schoenebeck, R. E. the absolute centers and medians of a graph. Operations Tarjan, and V. V. Williams. Better approximation algo- research,12(3):450–459,1964. rithms for the graph diameter. In SODA,pages1041–1052, [31] Y. Han and T. Takaoka. An O(n3 log log n/ log2 n) time 2014. algorithm for all pairs shortest paths. In SWAT,pages131– [11] T. Coffman, S. Greenblatt, and S. Marcus. Graph-based 141, 2012. technologies for intelligence analysis. Communications of [32] X. Huang and V. Y. Pan. Fast rectangular matrix multi- the ACM,47(3):45–47,2004. plication and applications. J. Complexity,14(2):257–299, [12] D. Coppersmith and S. Winograd. Matrix multiplication 1998. via arithmetic progressions. J. Symbolic Computation, [33] R. Impagliazzo, R. Paturi, and F. Zane. Which problems 9(3):251–280, 1990. have strongly exponential complexity? J. Comput. Syst. [13] M. Cygan, H. N. Gabow, and P. Sankowski. Algorithmic Sci.,63(4):512–530,2001. applications of Baur-Strassen’s theorem: Shortest cycles, [34] P. Indyk. Sublinear time algorithms for metric space diameter and matchings. In FOCS,pages531–540,2012. problems. In STOC,pages428–434,1999. [14] A. Davie and A. J. Stothers. Improved bound for com- [35] H. Jeong, S. Mason, A. Barab´asi, and Z. Oltvai. Lethality plexity of matrix multiplication. Proceedings of the Royal and centrality in protein networks. Nature,411:41–42, Society of Edinburgh, Section: A Mathematics,143:351– 2001. 369, 4 2013. [36] D. B. Johnson. Efficient algorithms for shortest paths in [15] A. Del Sol, H. Fujihashi, and P. O’Meara. Topology of sparse networks. J. ACM,24(1):1–13,1977. small-world networks of protein- protein complex struc- [37] V. Krebs. Mapping networks of terrorist cells. Connec- tures. Bioinformatics,21(8):1311–1315,2005. tions,24(3):43–52,2002. [16] E. W. Dijkstra. A note on two problems in connexion with [38] F. Liljeros, C. Edling, L. Amaral, H. Stanley, and Y. Aberg. graphs. Numerische Mathematik,1:269–271,1959. The web of human sexual contacts. Nature,411:907–908, [17] D. Eppstein and J. Wang. Fast approximation of centrality. 2001. J. Graph Algorithms Appl.,8:39–45,2004. [39] K. Mulmuley, U. V. Vazirani, and V. V. Vazirani. [18] M. L. Fredman. New bounds on the complexity of the is as easy as matrix inversion. Combinatorica,7(1):105– shortest path problem. SIAM J. Comput.,5(1):83–89, 113, 1987. 1976. [40] M. E. J. Newman and M. Girvan. Finding and evaluating [19] M. L. Fredman and R. E. Tarjan. Fibonacci heaps and community structure in networks. Physical Review E, their uses in improved network optimization algorithms. 69(2):26–113, 2004. J. ACM,34(3):596–615,1987. [41] S. Pettie. A new approach to all-pairs shortest paths on [20] L. Freeman. A set of measures of centrality based upon real-weighted graphs. Theor. Comput. Sci.,312(1):47–74, betweenness. Sociometry,40:35–41,1977. 2004. [21] A. Gajentaan and M. Overmars. On a class of o(n2) prob- [42] S. Pettie and V. Ramachandran. A shortest path algorithm for real-weighted undirected graphs. SIAM J. Comput., 34(6):1398–1431, 2005. [43] J. W. Pinney and D. R. Westhead. Betweenness-based de- composition methods for socialandbiologicalnetworks. In Interdisciplinary Statistics and Bioinformatics,pages 87–90, 2006. [44] M. Pˇatras¸cu. Towards polynomial lower bounds for dy- namic problems. In STOC,pages603–610,2010. [45] L. Roditty and V. Vassilevska Williams. Fast approxi- mation algorithms for the diameter and radius of sparse graphs. In STOC,pages515–524,2013. [46] L. Roditty and U. Zwick. Replacement paths and k simple shortest paths in unweighted directed graphs. ACM Transactions on Algorithms,8(4):33,2012. [47] G. Sabidussi. The centrality index of a graph. Psychome- tirka,31:581–606,1966. [48] D. Schultes and P. Sanders. Dynamic highway-node routing. In WEA,pages66–79,2007. [49] A. Shoshan and U. Zwick. All pairs shortest paths in undirected graphs with integer weights. In FOCS,pages 605–615, 1999. [50] M. Thorup. Undirected single source shortest path in linear time. In FOCS,pages12–21,1997. [51] V. Vassilevska Williams. Faster replacement paths. In SODA,pages1337–1346,2011. [52] V. Vassilevska Williams. Multiplying matrices faster than Coppersmith-Winograd. In STOC,pages887–898,2012. [53] V. Vassilevska Williams and R. Williams. Subcubic equiv- alences between path, matrix and triangle problems. In FOCS,pages645–654,2010. [54] O. Weimann and R. Yuster. Replacement paths and distance sensitivity oracles via fast matrix multiplication. ACM Transactions on Algorithms,9(2):14,2013. [55] R. Williams. Faster all-pairs shortest paths via circuit complexity. In STOC,2014. [56] U. Zwick. All pairs shortest paths in weighted directed graphs exact and almost exact algorithms. In FOCS, − pages 310–319, 1998. [57] U. Zwick. All pairs shortest paths using bridging sets and rectangular matrix multiplication. J. ACM,49(3):289– 317, 2002.

17