
Personalized PageRank Estimation and Search: A Bidirectional Approach

Peter Lofgren (Department of CS, Stanford University, [email protected]) · Siddhartha Banerjee (School of ORIE, Cornell University, [email protected]) · Ashish Goel (Department of MS&E, Stanford University, [email protected])

ABSTRACT

We present new algorithms for Personalized PageRank estimation and Personalized PageRank search. First, for the problem of estimating Personalized PageRank (PPR) from a source distribution to a target node, we present a new bidirectional estimator with simple yet strong guarantees on correctness and performance, and 3x to 8x speedup over existing estimators in experiments on a diverse set of networks. Moreover, it has a clean algebraic structure which enables it to be used as a primitive for the Personalized PageRank Search problem: Given a network like Facebook, a query like "people named John," and a searching user, return the top nodes in the network ranked by PPR from the perspective of the searching user. Previous solutions either score all nodes or score candidate nodes one at a time, which is prohibitively slow for large candidate sets. We develop a new algorithm based on our bidirectional PPR estimator which identifies the most relevant results by sampling candidates based on their PPR; this is the first solution to PPR search that can find the best results without iterating through the set of all candidate results. Finally, by combining PPR sampling with sequential PPR estimation and Monte Carlo, we develop practical algorithms for PPR search, and we show via experiments that our algorithms are efficient on networks with billions of edges.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Search process; G.2.2 [Graph Theory]: Graph Algorithms

General Terms
Algorithms, Performance, Experimentation, Theory

Keywords
Personalized Search, Personalized PageRank, Social Network Analysis

1. INTRODUCTION

On social networks, personalization is necessary for returning relevant results for a query. For example, if a user searches for a common name like John on a social network like Facebook, the results should depend on who is doing the search and who their friends are. A good personalized model for measuring the importance of a node t to a searcher s is Personalized PageRank πs(t) [20, 13, 12] – this motivates a natural Personalized PageRank Search Problem: Given
• a network with nodes V (each associated with a set of keywords) and edges E (possibly weighted and directed),
• a keyword inducing a set of targets T = {t ∈ V : t is relevant to the keyword}, and
• a searching user s ∈ V (or more generally, a distribution over starting nodes),
return the top-k targets t1, . . . , tk ∈ T ranked by Personalized PageRank πs(ti).

The importance of personalized search extends beyond social networks. For example, Personalized PageRank can be used to rank items in a bi-partite user-item graph, in which there is an edge from a user to an item if the user has liked that item. This has proven useful on YouTube for recommending videos [5] and on Twitter for suggesting users [3, 12]. On the web graph there is a large body of work on using Personalized PageRank to rank web pages (e.g. [14, 13]). The most clear-cut motivation for our work is the social network name-search application discussed above, which we use as a running example in this paper.

The personalized search problem is difficult because every searching user induces a different ranking of the target nodes. One naive solution would be to precompute the ranking for every searching user, but if our network has n users this requires Θ(n²) storage, which is clearly infeasible. Another naive baseline would be to use power iteration [20] at query time, but that would take Θ(m) computation between the search query and response, where m is the number of edges, which is also clearly infeasible. The challenge we face is to create a data structure much smaller than O(n²) which allows us to rank |T| targets in response to a query in less than O(|T|) time.

Previous work has considered the problem of personalized search on social networks. For example, Vieira et al. [24] consider this problem and provide excellent motivation for why results to a name-search query should be ranked based on the friendships of the searching user and the candidate results. They and others (e.g. [4]) propose to rank results by shortest path length. However, this metric doesn't take into account the number of paths between two users: if the searcher and two results John A and John B are distance 3 apart, but the searcher and John A are connected by 100 length-3 paths while the searcher and John B are connected by a single length-3 path, then John A should be ranked above John B, yet shortest distance can't distinguish the two. To the best of our knowledge, no prior work has solved the Personalized PageRank search problem using less than O(n²) storage and O(|T|) query time. The reason we are able to solve this is by exploiting a new bidirectional method of PageRank estimation, introduced in [19] and improved in this work.

Our search algorithm is based on two key ideas. The first is that we can find the top target nodes without having to consider each separately, by sampling a target ti ∈ T in proportion to its Personalized PageRank πs(ti). Because the top results typically have a much higher Personalized PageRank than an average result, by sampling we can find the top results without iterating over all the results. The second idea is that the probability of a random walk exactly reaching an element in T is often very small, but by pre-computing an expanded set of nodes around each target, we can efficiently sample random walks until they get close to a target node, and then use the pre-computed data to sample targets ti in proportion to πs(ti).

There are currently two main limitations to our work. First, because we do pre-computation on the set of nodes relevant to a query, we need the set of queries to be known in advance, although in the case of name search we can simply let the space of queries be the set of all first or last names. Second, the pre-computed storage is significant: for name search it is O(n√m) to achieve query running time O(√m), where n is the number of nodes and m is the number of edges. However, large graphs tend to be sparse, so this is still much smaller than O(n²) and is less storage than any prior solution to the Personalized PageRank Search problem. Also, pre-computation doesn't need to be done for all queries: for queries with small or very large target sets we describe alternative algorithms which do not require pre-computation. These alternatives also overcome the limitation on queries being known in advance.

Contributions: To summarize, in this work we present:
• A new bidirectional PageRank estimator, Bidirectional-PPR (Section 3), which has the following features:
  – Simple analysis: We combine a simple linear-algebraic invariant with standard concentration bounds. The new analysis also allows generalizations to arbitrary Markov Chains, as done in [6].
  – Easy to implement: The complete algorithm is only 18 lines of pseudo-code.
  – Significant empirical speedup: For a given accuracy, it executes 3x-8x faster than the fastest previous algorithm, FAST-PPR [19], on a diverse set of networks.
  – Simple linear structure: As shown in Section 4.1, the estimates are a simple dot-product between a forward vector xs and a reverse vector y^t – this enables the development of PPR samplers.
  – Parallelizability: Because the estimate is a dot-product, the precomputed vectors can be sharded across many servers, and the estimation algorithm can be naturally distributed, as shown in [11].
• Two new solutions to the Personalized PageRank Search problem – BiPPR-Grouped and BiPPR-Sampling. Given any set of targets T:
  – BiPPR-Precomp-Grouped precomputes and stores the reverse vectors y^t, t ∈ T, after grouping them by their coordinates. This exploits the natural sparsity of these vectors to speed up the computation of the PPR estimates at runtime.
  – BiPPR-Precomp-Sampling samples nodes t ∈ T proportional to their PPR πs(t). Since PPR values are usually highly skewed, this serves as a good proxy for finding the top-k search results.
• Extensive simulations on the Twitter-2010 network to test the scalability of our algorithms for PPR search. Our experiments demonstrate the trade-off between storage and runtime, and suggest that we should use a combination of methods, depending on the size of the set of targets T induced by the keyword.

2. PRELIMINARIES

We are given a graph G = (V, E) with n nodes and m edges. Define the out-neighbors of a node u by N^out(u) = {v : (u, v) ∈ E} and let d^out(u) = |N^out(u)|; define N^in(u) and d^in(u) similarly. Define the average degree of nodes d̄ = m/n. If the graph is weighted, for each (u, v) ∈ E there is some positive weight w_{u,v}; otherwise we define w_{u,v} = 1/d^out(u) for all (u, v) ∈ E. For simplicity we assume the weights are normalized such that for all u, Σ_v w_{u,v} = 1.

The Personalized PageRank from source distribution σ to target node t can be defined using linear algebra, as the solution to the equation

    πσ = α·σ + (1 − α)·πσ·W,

where W is the matrix of edge weights w_{u,v}, or equivalently defined using random walks:

    πσ(t) = Pr[a random walk starting from s ∼ σ, of length ∼ Geometric(α), stops at t],

as shown in [2]. For concreteness, in this paper we often assume σ = es for some single node s (meaning the random walks always start at a single node s), but all results extend in a straightforward manner to any starting distribution σ.

Personalized PageRank was first defined in the original PageRank paper [20]. For more on the motivation of Personalized PageRank, see [13] and the survey [10].
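The random-walk characterization above translates directly into a simple (if slow) estimator, which we sketch here for intuition. This is a minimal illustration in Scala (the language of our implementation), assuming an unweighted graph stored as out-adjacency arrays; the names and representation are ours, not part of the paper's code:

    import scala.util.Random

    // Estimate pi_s(t) as the fraction of Geometric(alpha)-length walks
    // from s that stop at t. Steps pick a uniform out-neighbor, matching
    // w(u,v) = 1/dout(u); we also stop at dead ends (a simplifying choice).
    def monteCarloPPR(outNbrs: Array[Array[Int]], s: Int, t: Int,
                      alpha: Double, numWalks: Int,
                      rng: Random = new Random): Double = {
      var hits = 0
      for (_ <- 0 until numWalks) {
        var v = s
        // Before each step, stop with probability alpha.
        while (rng.nextDouble() >= alpha && outNbrs(v).nonEmpty)
          v = outNbrs(v)(rng.nextInt(outNbrs(v).length))
        if (v == t) hits += 1
      }
      hits.toDouble / numWalks
    }

Note that this direct estimator needs on the order of 1/πs(t) walks before it even hits t once; removing that inefficiency is exactly what the bidirectional estimator of the next section accomplishes.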
3. PAGERANK ESTIMATION

In this section, we present our new bidirectional algorithm for PageRank estimation. We first develop the basic algorithm along with its theoretical performance guarantees; next, we outline some extensions of the basic algorithm; finally, we conclude the section with simulations demonstrating the efficiency of our technique.

The Bidirectional-PPR Algorithm
At a high level, our algorithm estimates πs(t) by first working backwards from t to find a set of intermediate nodes 'near' t, and then generating random walks forwards from s to detect this set.

The reverse work from t is done via the Approx-Contributions algorithm (see Algorithm 1) of Andersen et al. [1], which, given a target t and a desired additive error-bound rmax, produces estimates p^t(s) of the PPR πs(t) for every start node s. More specifically, the Approx-Contributions algorithm produces two non-negative vectors p^t ∈ R^n and r^t ∈ R^n which satisfy the following invariant (Lemma 1 in [1]):

    πs(t) = p^t(s) + Σ_{v∈V} πs(v)·r^t(v).    (1)

Approx-Contributions terminates once each residual value satisfies r^t(v) < rmax; now, viewing Σ_{v∈V} πs(v)·r^t(v) as an error term, Andersen et al. observe that p^t(s) estimates πs(t) up to a maximum additive error of rmax.

Our Bidirectional-PPR algorithm is based on the observation that, in order to estimate πs(t) for a particular (s, t) pair, we can boost the accuracy by sampling and adding the residual values r^t(v) from nodes v which are sampled from πs. To see this, we first interpret Equation (1) as an expectation:

    πs(t) = p^t(s) + E_{v∼πs}[r^t(v)].

Now, since max_v r^t(v) < rmax, the expectation E_{v∼πs}[r^t(v)] can be efficiently estimated using Monte Carlo. To do so, we generate w = c·rmax/δ random walks of length Geometric(α) from start node s; here c is a parameter which depends on the desired accuracy, rmax is the maximum residual after running Approx-Contributions, and δ is the minimum PPR value we want to accurately estimate. Let Vi be the final node of the i-th random walk; note that Pr[Vi = v] = πs(v). Let Xi = r^t(Vi) denote the residual at the final node of the i-th random walk, and X̄ = (1/w)·Σ_{i=1}^{w} Xi. Then Bidirectional-PPR returns as an estimate of πs(t):

    π̂s(t) = p^t(s) + X̄.

The complete pseudocode is given in Algorithms 1 and 2.

Algorithm 1 Approx-Contributions(G, α, t, rmax) [1]
Inputs: graph G with edge weights w_{u,v}, teleport probability α, target node t, maximum residual rmax
1: Initialize (sparse) estimate-vector p^t = 0 and (sparse) residual-vector r^t = e_t (i.e. r^t(v) = 1 if v = t; else 0)
2: while ∃v ∈ V s.t. r^t(v) > rmax do
3:   for u ∈ N^in(v) do
4:     r^t(u) += (1 − α)·w_{u,v}·r^t(v)
5:   end for
6:   p^t(v) += α·r^t(v)
7:   r^t(v) = 0
8: end while
9: return (p^t, r^t)

Algorithm 2 Bidirectional-PPR(s, t, δ)
Inputs: graph G, teleport probability α, start node s, target node t, minimum probability δ, accuracy parameter c (in our experiments we use c = 7)
1: Choose rmax = c_balance/√m, where c_balance is tuned to balance forward and reverse work. (For greater efficiency, use the balanced version described later in this section.)
2: (p^t, r^t) = Approx-Contributions(G, α, t, rmax)
3: Set number of walks w = c·rmax/δ (cf. Theorem 1)
4: for index i ∈ [w] do
5:   Sample a random walk starting from s (sampling a starting node from the distribution if s is a distribution), stopping after each step with probability α; let Vi be the endpoint
6:   Set Xi = r^t(Vi)
7: end for
8: return π̂s(t) = p^t(s) + (1/w)·Σ_{i∈[w]} Xi
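For concreteness, the following Scala sketch renders the push loop of Algorithm 1 under the same unweighted-graph assumption as before (so w(u,v) = 1/dout(u)); the worklist-based formulation is our own illustration, not the paper's code. It maintains the invariant of Equation (1) throughout:

    import scala.collection.mutable

    // Reverse push from target t: returns sparse estimate and residual maps
    // (p^t, r^t) with every residual below rMax on termination.
    def approxContributions(inNbrs: Array[Array[Int]], outDeg: Array[Int],
                            t: Int, alpha: Double, rMax: Double)
        : (mutable.Map[Int, Double], mutable.Map[Int, Double]) = {
      val p = mutable.Map[Int, Double]().withDefaultValue(0.0) // estimates p^t
      val r = mutable.Map[Int, Double]().withDefaultValue(0.0) // residuals r^t
      r(t) = 1.0
      val worklist = mutable.Queue(t) // nodes whose residual may exceed rMax
      while (worklist.nonEmpty) {
        val v = worklist.dequeue()
        if (r(v) > rMax) {
          for (u <- inNbrs(v)) {
            val before = r(u)
            r(u) += (1 - alpha) * r(v) / outDeg(u) // push along in-edge (u, v)
            if (before <= rMax && r(u) > rMax) worklist.enqueue(u)
          }
          p(v) += alpha * r(v)
          r(v) = 0.0
        }
      }
      (p, r)
    }

Combining this with the forward walks of the previous sketch (averaging the residuals at walk endpoints and adding p^t(s)) yields the estimate of Algorithm 2.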

Accuracy Analysis
We first prove that Bidirectional-PPR returns an estimate with the desired accuracy with high probability:

Theorem 1. Given start node s (or source distribution σ), target t, minimum PPR δ, maximum residual rmax > 2eδ/(εα), relative error ε ≤ 1, and failure probability pfail, Bidirectional-PPR outputs an estimate π̂s(t) such that with probability at least 1 − pfail the following hold:
• If πs(t) ≥ δ: |πs(t) − π̂s(t)| ≤ ε·πs(t).
• If πs(t) ≤ δ: |πs(t) − π̂s(t)| ≤ 2eδ.

The above result shows that the estimate π̂s(t) can be used to distinguish between 'significant' and 'insignificant' PPR pairs: for pair (s, t), Theorem 1 guarantees that if πs(t) ≥ (1+2e)δ/(1−ε), then the estimate is greater than (1+2e)δ, whereas if πs(t) < δ, then the estimate is less than (1+2e)δ. The assumption on rmax is easily satisfied, as typically δ = O(1/n) and rmax = Ω(1/√m).
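To get a feel for the scale of these parameters, consider some illustrative arithmetic (our numbers, not the paper's): on a graph of roughly the size of Twitter-2010 (n ≈ 4×10⁷, m ≈ 1.5×10⁹), with the settings δ = 4/n and c = 7 used in the experiments of Figure 1,

\[
r_{\max} \approx \frac{1}{\sqrt{m}} \approx 2.6\times10^{-5},
\qquad
\delta = \frac{4}{n} = 10^{-7},
\qquad
w = c\,\frac{r_{\max}}{\delta} \approx 7 \times 258 \approx 1800.
\]

That is, a few thousand forward walks suffice, whereas plain Monte Carlo at the same threshold would need on the order of 1/δ = 10⁷ walks.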

Proof. As shown in Algorithm 2, we average over w = c·rmax/δ walks, where c is a parameter we choose later. Each walk has length Geometric(α), and we denote by Vi the last node visited by the i-th walk, so that Vi ∼ πs. Let Xi = r^t(Vi) and X̄ = (1/w)·Σ_{i=1}^{w} Xi, so that the estimate returned by Bidirectional-PPR is π̂s(t) = p^t(s) + X̄.

First, from Equation (1), we have E[π̂s(t)] = πs(t). Moreover, Approx-Contributions guarantees that r^t(v) < rmax for all v, and so each Xi is bounded in [0, rmax]. Before applying Chernoff bounds, we rescale the Xi by defining Yi = Xi/rmax ∈ [0, 1], and we define Y = Σ_{i=1}^{w} Yi.

We will show concentration of the estimates via the following two Chernoff bounds (see Theorem 1.1 in [?]):
1. P[|Y − E[Y]| > ε·E[Y]] < 2·exp(−(ε²/3)·E[Y]);
2. for any b > 2e·E[Y], P[Y > b] ≤ 2^{−b}.

We perform a case analysis based on whether E[Xi] ≥ δ or E[Xi] < δ.

First suppose E[Xi] ≥ δ. This implies that πs(t) ≥ δ, so we will prove a relative error bound of ε. Now we have E[Y] = (w/rmax)·E[Xi] = (c/δ)·E[Xi] ≥ c, and thus:

    P[|π̂s(t) − πs(t)| > ε·πs(t)] ≤ P[|X̄ − E[Xi]| > ε·E[Xi]]
                                 = P[|Y − E[Y]| > ε·E[Y]]
                                 ≤ 2·exp(−(ε²/3)·E[Y])
                                 ≤ 2·exp(−(ε²/3)·c) ≤ pfail,

where the last inequality holds as long as we choose c ≥ (3/ε²)·ln(2/pfail).

Suppose alternatively that E[Xi] < δ. Then

    P[|π̂s(t) − πs(t)| > 2eδ] = P[|X̄ − E[Xi]| > 2eδ]
                              = P[|Y − E[Y]| > (w/rmax)·2eδ]
                              ≤ P[Y > (w/rmax)·2eδ].

At this point we set b = (w/rmax)·2eδ = 2ec and apply the second Chernoff bound. Note that E[Y] = (c/δ)·E[Xi] < c, and hence we satisfy b > 2e·E[Y]. The second bound implies that

    P[|π̂s(t) − πs(t)| > 2eδ] ≤ 2^{−b} ≤ pfail    (2)

as long as we choose c such that c ≥ (1/(2e))·log₂(1/pfail).

If πs(t) ≤ δ, then equation (2) completes our proof. The only remaining case is when πs(t) > δ but E[Xi] < δ. This implies that p^t(s) > 0, since πs(t) = p^t(s) + E[Xi]. In the Approx-Contributions algorithm, whenever we increase p^t(s), we increase it by at least α·rmax, so we have p^t(s) ≥ α·rmax. We then have

    |π̂s(t) − πs(t)| / πs(t) ≤ |π̂s(t) − πs(t)| / (α·rmax).

By the assumption on rmax, 2eδ < ε·α·rmax, so by equation (2),

    P[ |π̂s(t) − πs(t)| / πs(t) > ε ] ≤ pfail.

The proof is completed by combining all cases and choosing c = (3/ε²)·ln(2/pfail). We note that the constants are not optimized; in experiments we find that c = 7 gives mean relative error less than 8% on a variety of graphs.

Running Time Analysis
The runtime of Bidirectional-PPR depends on the target t: if t has many in-neighbors and/or a large global PageRank π(t), then the running time will be higher than for a random t. Theorem 1 of [1] states that Approx-Contributions(G, α, t, rmax) performs n·π(t)/(α·rmax) pushback operations, and the exact running time is proportional to the sum of the in-degrees of all the nodes we push back from. In the worst case, we might have d^in(t) = Θ(n) and Bidirectional-PPR takes Θ(n) time. However, for a uniformly chosen target node, we can prove the following:

Theorem 2. For any start node s (or source distribution σ), minimum PPR δ, maximum residual rmax, relative error ε, and failure probability pfail, if the target t is chosen uniformly at random, then Bidirectional-PPR has expected running time

    O( (1/(ε·α)) · √( d̄·log(1/pfail) / δ ) ).

In contrast, the running time for Monte-Carlo to achieve the same accuracy guarantee is O( log(1/pfail) / (ε²·δ·α²) ), and the running time for Approx-Contributions is O( d̄/(δ·α) ). The fastest previous algorithm for this problem, the FAST-PPR algorithm of [19], has an average running time bound of

    O( (1/α²) · √(d̄/δ) · √( log(1/pfail)·log(1/δ) / log(1/(1−α)) ) )

for uniformly chosen targets. The running time bound of Bidirectional-PPR is thus asymptotically better than FAST-PPR's, and in experiments the constants required for the same accuracy are smaller, making Bidirectional-PPR 3 to 8 times faster on a diverse set of graphs.

Proof. In [18], it is proven that for a uniformly random t, Approx-Contributions runs in average time d̄/(α·rmax), where d̄ is the average degree of a node. On the other hand, from Theorem 1, we know that we need to generate O( (rmax/δ)·(1/ε²)·ln(1/pfail) ) random walks, each of which can be sampled in average time 1/α. Finally, we choose rmax = ε·√( d̄·δ/ln(2/pfail) ) (up to constants) to balance the two terms; this minimizes our running time bound and gives the claimed result.

Extensions: Bidirectional-PPR extends naturally to generalized PageRank using a source distribution σ rather than a single start node – we simply sample an independent starting node for each walk, and replace p^t(s) with the expected value of p^t(s) when s is sampled from the starting distribution.

The dynamic runtime-balancing method proposed in [19] can improve the running time of Bidirectional-PPR in practice. In this technique, rmax is chosen dynamically in order to balance the amount of time spent by Approx-Contributions and the amount of time spent generating random walks. To implement this, we modify Approx-Contributions to use a priority queue, so that we always push from the node v with the largest residual value r^t(v). We also change the while loop so that it terminates when the amount of time spent achieving the current value of rmax first exceeds the predicted amount of time required for sampling random walks, cwalk·c·rmax/δ, where cwalk is the average time it takes to sample a random walk. For full pseudocode, see [17].
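The termination criterion can be sketched as follows (a rough Scala sketch under our own structuring; the authoritative pseudocode is in [17]). Here `maxResidual` and `pushFromMaxNode` are assumed helpers over a max-priority queue of residuals, and cWalk, c, delta are as in the text:

    // Keep pushing from the node with the largest residual until the time
    // already spent pushing exceeds the predicted forward-walk cost for the
    // current value of rmax, namely cWalk * c * rmax / delta.
    def balancedPush(maxResidual: () => Double,
                     pushFromMaxNode: () => Unit,
                     cWalk: Double, c: Double, delta: Double): Unit = {
      val start = System.nanoTime()
      def secondsSpent: Double = (System.nanoTime() - start) / 1e9
      while (secondsSpent < cWalk * c * maxResidual() / delta)
        pushFromMaxNode()
    }

Each push lowers the largest residual, so the predicted walk cost shrinks as the measured push cost grows, and the loop stops near the point where the two phases cost the same.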
Experimental Validation
We now compare Bidirectional-PPR to its predecessor algorithms (namely FAST-PPR [18], Monte Carlo [2, 9], and Approx-Contributions [1]). The experimental setup is identical to that in [18]; for convenience, we describe it here in brief. We perform experiments on 6 diverse, real-world networks: two directed social networks (Pokec (31M edges) and Twitter-2010 (1.5 billion edges)), two undirected social networks (LiveJournal (69M edges) and Orkut (117M edges)), a collaboration network (dblp (6.7M edges)), and a web graph (UK-2007-05 (3.7 billion edges)). Since all algorithms have parameters that enable a trade-off between running time and accuracy, we first choose parameters such that the mean relative error of each algorithm is approximately 10%. For Bidirectional-PPR, we find that setting c = 7 (i.e., generating 7·rmax/δ random walks) results in a mean relative error less than 8% on all graphs; for the other algorithms, we use the settings determined in [18]. We then repeatedly sample a uniformly-random start node s ∈ V, and a random target node t sampled either uniformly or from PageRank (to emphasize more important targets). For both Bidirectional-PPR and FAST-PPR, we used the dynamic-balancing heuristic described above. The results are shown in Figure 1.

Note that Bidirectional-PPR is 3 to 8 times faster than FAST-PPR across all graphs. In particular, Bidirectional-PPR only needs to sample 7·rmax/δ random walks, while FAST-PPR needs 350·rmax/δ walks to achieve the same mean relative error. This is because Bidirectional-PPR is unbiased, while FAST-PPR has a bias from Approx-Contributions.

Figure 1: Average running-time (on log-scale) for different networks. We measure the time required for estimating PPR values πs(t) with threshold δ = 4/n for 1000 (s, t) pairs. For each pair, the start node is sampled uniformly, while the target node is sampled uniformly in Figure 1(a), or from the global PageRank distribution in Figure 1(b). In this plot we use teleport probability α = 0.2.

4. PERSONALIZED PAGERANK SEARCH

We now turn from Personalized PageRank estimation to the Personalized PageRank search problem:

    Given a start node s (or distribution σ) and a query q which filters the set of all targets to some list T = {ti} ⊆ V, return the top-k targets ranked by πs[ti].

We consider as baselines two algorithms which require no pre-computation. They are efficient for certain ranges of |T|, but our experiments show they are too slow for real-time search across most values of |T|:
• Monte-Carlo [2, 9]: Sample random walks from s, and filter out any walk whose endpoint is not in T. If we desire ns samples, this takes time O(ns/πs[T]), where πs[T] := Σ_{t∈T} πs[t] is the probability that a random walk terminates in T. This method works well if T is large, but in our experiments on Twitter-2010 it takes minutes per query for |T| = 1000 (and hours per query for |T| = 10).
• Bidirectional-PPR: On the other hand, we can estimate πs[t] for each t ∈ T separately using Bidirectional-PPR. This has an average-case running time O( |T|·√(d̄/δk) ), where δk is the PPR of the k-th best target. This method works well if T is small, but is too slow for large T; in our experiments, it takes on the order of seconds for |T| ≤ 100, but more than a minute for |T| = 1000.

If we are allowed pre-computation, then we can improve upon Bidirectional-PPR by precomputing and storing a reverse vector from all target nodes. To this end, we first observe that the estimate π̂s[t] can be written as a dot-product. Let π̃s be the empirical distribution over terminal nodes due to w random walks from s (with w chosen as in Theorem 1); we define the forward vector xs ∈ R^{2n} to be the concatenation of the basis vector es and the random-walk terminal node distribution, and the reverse vector y^t ∈ R^{2n} to be the concatenation of the estimates p^t and the residuals r^t. Formally, define

    xs = (es, π̃s) ∈ R^{2n},    y^t = (p^t, r^t) ∈ R^{2n}.    (3)

Now, from Algorithm 2, we have

    π̂s[t] = ⟨xs, y^t⟩.    (4)

The above observation motivates the following algorithm:
• BiPPR-Precomp: In this approach, we first use Approx-Contributions to pre-compute and store a reverse vector y^t for each t ∈ V. At query time, we generate random walks to form the forward vector xs; now, given any set of targets T, we compute |T| dot-products ⟨xs, y^t⟩, and use these to rank the targets. This method has a worst-case running time O( |T|·√(d̄/δk) ). In practice, it works well if T is small, but is too slow for large T. In our experiments (doing 100,000 random walks at runtime) this approach takes around a second for |T| ≤ 30, but this climbs to a minute for |T| = 10,000.

The BiPPR-Precomp approach is faster than Bidirectional-PPR (at the cost of additional precomputation and storage), and also faster than Monte-Carlo for small sets T, but it is still not efficient enough for real-time personalized search. This motivates us to find a more efficient algorithm that scales better than Bidirectional-PPR for large T, yet is fast for small |T|. In the following sections, we propose two different approaches for this – the first based on pre-grouping the precomputed reverse-vectors, and the second based on sampling target nodes from T according to PPR. For convenience, we first summarize the two approaches:
• BiPPR-Precomp-Grouped: Here, as in BiPPR-Precomp, we compute an estimate for each t ∈ T using Bidirectional-PPR. However, we leverage the sparsity of the reverse vectors y^t = (p^t, r^t) by first grouping them in a way we will describe. This makes the dot-product more efficient. This method has a worst-case running time of O( |T|·√(d̄/δk) ), and in experiments we find it is much faster than BiPPR-Precomp. For our parameter choices its running time is less than 250 ms across the range of |T| we tried.
• BiPPR-Precomp-Sampling: We again first pre-compute the reverse vectors y^t. Next, for a given target t, we define the expanded target-set Tt = {v ∈ [2n] | y^t[v] ≠ 0}, i.e., the set of nodes with non-zero reverse-vector entries from t. At run-time, we sample random walks forward from s to nodes in the expanded target sets. Using these, we create a sampler in average time O(rmax/δk) (where as before δk is the k-th largest PPR value πs[tk]), which samples nodes t ∈ T with probability proportional to the PPR πs[t]. We describe this in detail in Section 4.2. Once the sampler has been created, it can be sampled in O(1) time per sample. The algorithm works well for any size of T, and has the unique property that it can identify the top-k target nodes without computing a score for all |T| of them. For our parameter choices its running time is less than 250 ms across the range of |T| we tried.

We note here that the case k = 1 (i.e., finding the top PPR node) corresponds to solving a Maximum Inner Product Problem. In a recent line of work, Shrivastava and Li [21, 22] propose a sublinear time algorithm for this problem based on Locality Sensitive Hashing; however, their method assumes that there is some bound U on ‖y^t‖₂ and that max_t ⟨xs, y^t⟩ is a large fraction of U. In personalized search, we usually encounter small values of max_t ⟨xs, y^t⟩ relative to max_t ‖y^t‖₂ – finding an LSH for Maximum Inner Product Search in this regime is an interesting open problem for future research. Our two approaches bypass this by exploiting particular structural features of the problem – BiPPR-Precomp-Grouped exploits the sparsity of the reverse vectors to speed up the dot-product, and BiPPR-Precomp-Sampling exploits the skewed distribution of PPR scores to find the top targets without even computing full dot-products.
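In code, Equation (4) is one sparse dot product per candidate. A minimal Scala sketch (our illustration, assuming sparse vectors stored as Maps over the 2n coordinates):

    // PPR estimate for target t: dot product of the forward vector x_s
    // with the precomputed reverse vector y^t.
    def pprScore(xs: Map[Int, Double], yt: Map[Int, Double]): Double =
      xs.iterator.map { case (v, xv) => xv * yt.getOrElse(v, 0.0) }.sum

    // BiPPR-Precomp then ranks a candidate set with |T| such dot products:
    //   targets.map(t => t -> pprScore(xs, reverse(t))).sortBy(-_._2).take(k)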
4.1 Bidirectional-PPR with Grouping

In this method we improve the running time of BiPPR-Precomp by pre-grouping the reverse vectors corresponding to each target set T. Recall that in BiPPR-Precomp, we first pre-compute reverse vectors y^t = (p^t, r^t) ∈ R^{2n} using Approx-Contributions for each t. At run-time, given s, we compute the forward vector xs = (es, π̃s) by generating sufficient random walks, and then compute the scores ⟨xs, y^t⟩ for t ∈ T. Our main observation is that we can decrease the running time of the dot-products by pre-grouping the vectors y^t by coordinate. The intuition behind this is that in each dot product Σ_v xs[v]·y^t[v], the nodes v where xs[v] ≠ 0 often don't have y^t[v] ≠ 0, so most of the product terms are 0. Hence, we can improve the running time by grouping the vectors y^t in advance by coordinate v. Now, at run-time, for each v such that xs[v] ≠ 0, we can efficiently iterate over the set of targets t such that y^t[v] ≠ 0.

An alternative way to think about this is as a sparse matrix-vector multiplication Y^T·xs after we form a matrix Y^T whose rows are y^t for t ∈ T. This optimization can then be seen as a sparse column representation of that matrix.

Algorithm 3 BiPPRGroupedPrecomputation(T, rmax)
Inputs: graph G, teleport probability α, target nodes T, maximum residual rmax
1: z ← empty hash map of vectors such that for any v, z[v] defaults to an empty (sparse) vector in R^{2|V|}
2: for t ∈ T do
3:   Compute y^t = (p^t, r^t) ∈ R^{2|V|} via Approx-Contributions(G, α, t, rmax)
4:   for v ∈ [2|V|] such that y^t[v] > 0 do
5:     z[v][t] = y^t[v]
6:   end for
7: end for
8: return z

Algorithm 4 BiPPRGroupedRankTargets(s, rmax, z)
Inputs: graph G, teleport probability α, start node s, maximum residual rmax, z: hash map of reverse vectors grouped by coordinate
1: Set number of walks w = c·rmax/δ (in experiments we found c = 20 achieved precision@3 above 90%)
2: Sample w random walks of length Geometric(α) from s; compute π̃s[v] = fraction of walks ending at node v
3: Compute xs = (es, π̃s) ∈ R^{2|V|}
4: Initialize empty map score from V to R
5: for v such that xs[v] > 0 do
6:   for t such that z[v][t] > 0 do
7:     score(t) += xs[v]·z[v][t]
8:   end for
9: end for
10: Return T sorted in decreasing order of score

We refer to this method as BiPPR-Precomp-Grouped; the complete pseudo-code is given in Algorithms 3 and 4. The correctness of this method follows again from Theorem 1. In experiments, this method is efficient across all the sizes of T we tried, taking less than 250 ms even for |T| = 10,000. The improved running time of BiPPR-Precomp-Grouped comes at the cost of more storage compared to BiPPR-Precomp. In the case of name search, where each target typically only has a first and last name, each vector y^t only appears in two of these pre-grouped structures, so the storage is only twice the storage of BiPPR-Precomp. On the other hand, if a target t contains many keywords, y^t will be included in many of these pre-grouped data structures, and the storage cost will be significantly greater than for BiPPR-Precomp.
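An illustrative Scala sketch of Algorithms 3–4 (our naming, not the paper's code): build an inverted index z from coordinate v to the non-zero entries y^t[v], then score at query time by touching only coordinates where the forward vector is non-zero.

    import scala.collection.mutable

    // Precomputation: reverseVectors maps t -> y^t; result maps v -> (t -> y^t[v]).
    def groupByCoordinate(reverseVectors: Map[Int, Map[Int, Double]])
        : Map[Int, Map[Int, Double]] = {
      val z = mutable.Map[Int, mutable.Map[Int, Double]]()
      for ((t, yt) <- reverseVectors; (v, value) <- yt if value > 0)
        z.getOrElseUpdate(v, mutable.Map[Int, Double]())(t) = value
      z.map { case (v, m) => (v, m.toMap) }.toMap
    }

    // Query time: accumulate x_s[v] * y^t[v] only over non-zero coordinates of x_s.
    def rankGrouped(xs: Map[Int, Double],
                    z: Map[Int, Map[Int, Double]]): Seq[(Int, Double)] = {
      val score = mutable.Map[Int, Double]().withDefaultValue(0.0)
      for ((v, xv) <- xs; (t, yv) <- z.getOrElse(v, Map.empty[Int, Double]))
        score(t) += xv * yv
      score.toSeq.sortBy(-_._2)
    }

The work done is proportional to the number of non-zero (v, t) pairs actually touched, rather than |T| full dot products.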
4.2 Sampling from Targets Matching a Query

The key idea behind this alternate method for PPR search is that by sampling a target t in proportion to its PPR πs[t], we can quickly find the top targets without iterating over all of them. After drawing many samples, the targets can be ranked according to the number of times they are sampled. Alternatively, a full Bidirectional-PPR query can be issued for some subset of the targets before ranking. This approach exploits the skewed distribution of PPR scores in order to find the top targets. In particular, prior empirical work has shown that on the Twitter graph, for each fixed s, the values πs[t] follow a power law [3].

We define the PPR-Search Sampling Problem as follows: Given a source distribution s and a query q which filters the set of all targets to some list T = {ti} ⊆ V, sample a target ti with probability p[ti] = πs[ti] / Σ_{t∈T} πs[t].

We develop two solutions to this sampling problem. The first, in O(w) = O(rmax/δk) time, generates a data structure which can generate an arbitrary number of independent samples from a distribution which approximates the correct distribution. The second can generate samples from the exact distribution πs[ti], and generates complete paths from s to some t ∈ T, but requires time O(rmax/πs[T]) per sample. Because the approximate sampler is more efficient, we present that here and defer the exact sampler to [17].

t Inputs: Graph G, teleport probability α, target-set T , hxs, y i p[t] = . maximum residual rmax P hx , yj i j∈T s 1: for t ∈ T do t t t 2|V | We will sample t in two stages: first we sample an interme- 2: Compute y = (p , r ) ∈ R via Approx-Contri- diate node v ∈ V with probability: butions(G, α, t, rmax) 3: end for P j 4: Compute yT = P yt 0 xs[v] j∈T y [v] t∈T p [v] = . T s P j 5: for v ∈ V such that y [v] > 0 do j∈T hxs, y i 6: Create samplerv which samples t with probability 00 Following this, we sample t ∈ T with probability: pv [t] (For example, using the alias sampling method [25], [15, section 3.4.1]). yt[v] p00[t] = . 7: end for v P yj [v] T j∈T 8: return (y , {samplerv}) P 0 00 It is easy to verify that p[t] = v∈V ps[v]pv [t]. Figure2 T shows how the sampling algorithm works on an example Algorithm 6 SampleAndRankTargets(s, rmax, y , {samplerv}) graph. The pseudo-code is given in Algorithm5 and Algo- Inputs: Graph G, teleport probability α, start node s, max- rithm6. T imum residual rmax, reverse vectors y , intermediate- node-to-target samplers {samplerv}. rmax 1: Set number of walks w = c δ . In experiments we found c = 20 achieved precision@5 above 90% on a# b# Twitter-2010. t1# 2: Set number of samples ns (We use ns = w) 3: Sample w random walks from s and letπ ˜s be the em- c# s# pirical distribution of their endpoints; compute forward 2|V | t2# vector xs = (es, π˜s) ∈ R 4: Create samplers to sample v ∈ [2 |V |] with probability 0 T ps[v], i.e., proportional to xs[v]y [v] t3# 5: Initialize empty map score from V to N 6: for j ∈ [0, ns − 1] do 7: Sample v from samplers 8: Sample t from samplerv 9: Increment score(t) Figure 2: Search Example: Given target set T = 10: end for {t1, t2, t3}, for each target ti we have drawn the ex- 11: Return T sorted in decreasing order of score panded target-set, i.e., nodes v with positive resid- ual yti [v]. From source s, we sample three random walks, ending at nodes a, b, and c. Now suppose Accuracy: BiPPR-Precomp-Sampling does not sample ex- yt1 (b) = 0.64, yt1 (c) = 0.4, yt2 (c) = 0.16, and yt3 (c) = 0.16 actly in proportion to π ; instead, the sample probabilities – note that the remaining residuals are 0. Then s are proportional to a distributionπ ˆ satisfying the guarantee we have yT (a) = 0, yT (b) = 0.64 and yT (c) = 0.72, s of Theorem1. In particular, for all targets t with π [t] ≥ δ, and consequently, the sampling weights of (a, b, c) are s this will have a small relative error , while targets with (0, 0.213, 0.24). Now, to sample a target, we first sam- π [t] < δ will likely be sampled rarely enough that they ple from {a, b, c} in proportion to its weight. Then if s won’t appear in the set of top-k most sampled nodes. we sample b, we always return t ; if we sample c, we 1 Storage Required: The storage requirements for BiPPR- sample (t , t , t ) with probability (5/9, 2/9, 2/9). 1 2 3 Precomp-Sampling (and for BiPPR-Precomp-Grouped) de- pends on the distribution of keywords and how rmax is cho- Note that we assume that some set of supported queries sen for each target set. For simplicity, here we assume a is known in advance, and we first pre-compute and store single maximum residual rmax across all target sets, and as- a separate data-structure for each query Q (i.e., for each sume each target is relevant to at most γ keywords. For target-set T = {t ∈ V : t is relevant to Q}). 
Note that we assume that some set of supported queries is known in advance, and we first pre-compute and store a separate data structure for each query Q (i.e., for each target-set T = {t ∈ V : t is relevant to Q}). In addition, we can optionally pre-compute w random walks from each start node s and store the forward vector xs, or we can compute xs at query time by sampling random walks.

Running Time: For a small relative error ε for targets with πs[t] > δ, we use w = c·rmax/δ walks, where c is chosen as in Theorem 1. The support of our forward sampler is at most w, so its construction time is O(w) using the alias method of sampling from a discrete distribution [25], [15, Section 3.4.1]. Once constructed, we can get independent samples in O(1) time. Thus the query time to generate ns samples is O(c·rmax/δ + ns).

Accuracy: BiPPR-Precomp-Sampling does not sample exactly in proportion to πs; instead, the sample probabilities are proportional to a distribution π̂s satisfying the guarantee of Theorem 1. In particular, for all targets t with πs[t] ≥ δ, this will have a small relative error ε, while targets with πs[t] < δ will likely be sampled rarely enough that they won't appear in the set of top-k most sampled nodes.

Storage Required: The storage requirements for BiPPR-Precomp-Sampling (and for BiPPR-Precomp-Grouped) depend on the distribution of keywords and how rmax is chosen for each target set. For simplicity, here we assume a single maximum residual rmax across all target sets, and assume each target is relevant to at most γ keywords. For example, in the case of name search, each user typically has a first and last name, so γ = 2.

Theorem 3. Let graph G, minimum-PPR value δ, and time-space trade-off parameter rmax be given, and suppose every node contains at most γ keywords. Then the total storage needed for BiPPR-Precomp-Sampling to construct a sampler for any source node (or distribution) s and any set of targets T corresponding to a single keyword is O( γm/(α·rmax) ).

We can choose rmax to trade off this storage requirement with the running time requirement of O(c·rmax/δ) – for example, we can set both the query running time and per-node storage to √(c·γ·d̄/δ), where d̄ = m/n is the average degree. Now for name search γ = 2, and if we choose δ = 1/n and α = Θ(1), the per-query running time and per-node storage are O(√m).

Proof. For each set T corresponding to a keyword, and each t ∈ T, we push from nodes v until for each v, r^t[v] < rmax. Each time we push from a node v, we add an entry to the residual vector of each node u ∈ N^in(v), so the space cost is d^in(v). Each time we push from a node v, we increase the estimate p^t[v] by α·r^t[v] ≥ α·rmax, and Σ_{t∈T} p^t[v] ≤ Σ_{t∈T} πv[t] = πv[T], so v can be pushed from at most πv[T]/(α·rmax) times. Thus the total storage required is

    Σ_{v∈V} d^in(v)·(# of times v is pushed) ≤ Σ_{v∈V} d^in(v)·πv[T]/(α·rmax).    (5)

Let 𝒯 be the set of all target sets (one target set per keyword). Then the total storage over all keywords is

    Σ_{T∈𝒯} Σ_{t∈T} Σ_{v∈V} d^in(v)·πv[t]/(α·rmax) ≤ γ·Σ_{v∈V} Σ_{t∈V} d^in(v)·πv[t]/(α·rmax)
                                                  ≤ γ·Σ_{v∈V} d^in(v)·1/(α·rmax) ≤ γ·m/(α·rmax).
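To spell out the name-search instantiation claimed above (an illustrative substitution; constants are suppressed as in the text): with the balanced choice $r_{\max} = c'/\sqrt{m}$ for a constant $c'$, Theorem 3 gives total storage

\[
O\!\left(\frac{\gamma m}{\alpha\, r_{\max}}\right)
= O\!\left(\frac{\gamma\, m^{3/2}}{c'\,\alpha}\right)
= O\!\left(\frac{\gamma\, \bar d}{c'\,\alpha}\; n\sqrt{m}\right),
\]

i.e., O(n√m) in total, or O(√m) per node, once γ, α, c', and d̄ are treated as constants.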
We repeat this using 10 random target sets and 10 random sources s per Adaptive Maximum Residual: One way to improve the target set, and report the median running time for all al- storage requirement is by using larger values of rmax for gorithms. We use the same target sets and sources for all target sets T with larger global PageRank. Intuitively, if T algorithms. is large, then it’s easier for random walks to get close to T , Parameter Choices: Because all five algorithms have pa- so we don’t need to push back from T as much as we would rameters that trade-off running time and accuracy, we choose for a small T . We now formalize this scheme, and outline the parameters such that the accuracy is comparable so we can savings in storage via a heuristic analysis, based on a model compare running time on a level playing field. To choose a of personalized PageRank values introduced by Bahmani et concrete benchmark, we chose parameters such that the pre- al. [3]. cision@3 of the four algorithms we propose are consistently For a fixed s, we assume the values πs[v] for all v ∈ V greater than 90% for the range of |T | we used in experi- approximately follow a power law with exponent β. Empiri- ment. We chose parameters for Monte-Carlo so that our cally, this is known to be an accurate model for the Twitter algorithms are consistently more accurate than it, and its graph – Bahmani et al. [3] find that the mean exponent for precision@3 is greater than 85%. In the full version we plot a user is β = 0.77 with standard deviation 0.08. To analyze the precision@3 of the algorithms for the parameters we use our algorithm, we further assume that πs restricted to T when comparing running time. also follows a power law, i.e.: We used δ = πs(tk) where πs(tk) is estimated using Eqn.
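As a sanity check on this parameter pipeline, here is some illustrative arithmetic (our numbers, not the paper's): with k = 3, β = 0.77, and c = 20 we get c₂ = k^β·c/(1 − β) ≈ 203, and for |T| = 1000 on Twitter-2010 (n ≈ 4×10⁷) the uniform-set assumption gives πs(T) ≈ |T|/n = 2.5×10⁻⁵, so Equation (7) with w = 10⁵ yields

\[
r_{\max}(T) = \frac{w\,\pi[T]}{c_2\,|T|^{1-\beta}}
\approx \frac{10^{5} \times 2.5\times10^{-5}}{203 \times 1000^{0.23}}
\approx \frac{2.5}{203 \times 4.9}
\approx 2.5\times10^{-3}.
\]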

Results: Figure 3 shows the running time of the five algorithms as |T| varies, for two different parameter settings in the trade-off between running time and precomputed storage requirement. Notice that Monte-Carlo is very slow on this large graph for small target set sizes, but gets faster as the size of the target set increases.


Figure 3: Running time on Twitter-2010 (1.5 billion edges) on log-scale, with parameters chosen such that the precision@3 of our algorithms exceeds 90% and exceeds the precision@3 of Monte-Carlo. The two plots demonstrate the storage-runtime tradeoff: Figure 3(a) (which performs 10K walks at runtime) uses more pre-computation and storage compared to Figure 3(b) (with 100K walks).

For example, when |T| = 10, Monte Carlo takes half an hour, and even for |T| = 1000 it takes more than a minute. Bidirectional-PPR is fast for small T, but slow for larger T, taking more than a second when |T| ≥ 100. In contrast, BiPPR-Precomp-Grouped and BiPPR-Precomp-Sampling are both fast for all sizes of T, taking less than 250 ms when w = 10,000 and less than 25 ms when w = 100,000.

The improved running time of BiPPR-Precomp-Grouped and BiPPR-Precomp-Sampling, however, comes at the cost of pre-computation and storage. With these parameter choices, for w = 10,000 the pre-computation size per target set in our experiments ranged from 8 MB (for |T| = 10) to 200 MB (for |T| = 1000) per keyword. For w = 100,000, the storage per keyword ranges from 3 MB (for |T| = 10) to 30 MB (for |T| = 10,000).

To get a larger range of |T| relative to |V|, we also perform experiments on the Pokec graph [23], which has 1.6 million nodes and 30 million edges. Figure 4 shows the results on Pokec for w = 100,000. Here we clearly see the cross-over point where Monte-Carlo becomes more efficient than Bidirectional-PPR, while BiPPR-Precomp-Grouped and BiPPR-Precomp-Sampling consistently take less than 250 milliseconds. On Pokec, the storage used ranges from 800 KB for |T| = 10 to 3 MB for |T| = 10,000.

Figure 4: Running time on Pokec (30 million edges) performing 100K walks at runtime. Notice that Monte-Carlo is slow for small |T|, Bidirectional-PPR is slow for large |T|, and BiPPR-Precomp-Grouped and BiPPR-Precomp-Sampling are fast across the entire range of |T|.

We implement our algorithms in Scala and report running times for Scala, but in preliminary experiments BiPPR-Precomp-Grouped is 3x faster when re-implemented in C++; we expect the running times of all five algorithms would improve comparably. Also, we ran each experiment on a single thread, but the algorithms parallelize naturally, so the latency could be improved by a multi-threaded implementation. We ran our experiments on a machine with a 3.33 GHz 12-core Intel Xeon X5680 processor, 12 MB cache, and 192 GB of 1066 MHz Registered ECC DDR3 RAM. We measured the running time of the thread running each experiment to exclude garbage-collector time. We loaded the graph into memory and completed all pre-computation in RAM before measuring the running time of the algorithms.

5. RELATED WORK

Prior work on PPR Estimation: The Bidirectional-PPR algorithm introduced in the first half of this work builds on the FAST-PPR algorithm presented in [19] – for details of prior work on Personalized PageRank estimation, see the section on existing approaches in [19]. Although FAST-PPR was the first algorithm for PPR estimation with sublinear running-time guarantees, it has several drawbacks which are improved upon by our new Bidirectional-PPR algorithm:
• Bidirectional-PPR has a simple linear structure which enables searching; Eqn. (4) shows that the estimates are a dot-product between a forward vector xs and a reverse vector y^t. In contrast, estimates in [19] require monitoring each walk to detect collisions with a "frontier" set.
• Bidirectional-PPR is 3x-8x faster than FAST-PPR for the same accuracy in experiments on diverse networks.
• Bidirectional-PPR is cleaner and more elegant, leading to simpler correctness proofs and performance analysis. This also makes it easier to generalize to arbitrary Markov Chains, as done in [6].

Comparison to Partitioned Multi-Indexing: For personalized search, our indexing scheme is partially inspired by the Partitioned Multi-Indexing (PMI) scheme of Bahmani et al. [4]. Similar to our methods, PMI uses a bidirectional approach to rank search results, but according to shortest-path distance from the searching user. Shortest path is easier to estimate than PPR, due to the fact that shortest path is a metric; moreover, shortest path is believed to be a less effective way of ranking search results than PPR. From a technical point of view, PMI is based on 'sweeping' from closer to more distant targets based on a distance oracle; in contrast, we use sampling to find the most relevant targets.

Prior work on Personalized PageRank Search: In [7], Berkhin builds upon the previous work [14] and proposes efficient ways to compute the personalized PageRank vector πs at runtime by combining pre-computed PPR vectors in a query-specific way. In particular, they identify "hub" nodes in advance, using heuristics such as global PageRank, and precompute approximate PPR vectors π̂h for each hub node using a local forward-push algorithm called the Bookmark Coloring Algorithm (BCA). Chakrabarti [8] proposes a variant of this approach, where Monte-Carlo is used to pre-compute the hub vectors π̂h rather than BCA. Both approaches differ from our work in that they construct complete approximations to πs, then pick out entries relevant to the query. This requires a high-accuracy estimate of πs even though only a few entries are important. In contrast, our bidirectional approach allows us to compute only the entries πs(ti) relevant to the query.

6. ACKNOWLEDGMENTS

Research supported by the DARPA GRAPHS program via grant FA9550-12-1-0411, and by NSF grant 1447697. One author was supported by an NPSC fellowship.

7. REFERENCES

[1] R. Andersen, C. Borgs, J. Chayes, J. Hopcraft, V. S. Mirrokni, and S.-H. Teng. Local computation of pagerank contributions. In Algorithms and Models for the Web-Graph. Springer, 2007.
[2] K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte carlo methods in pagerank computation: When one iteration is sufficient. SIAM Journal on Numerical Analysis, 2007.
[3] B. Bahmani, A. Chowdhury, and A. Goel. Fast incremental and personalized pagerank. Proceedings of the VLDB Endowment, 2010.
[4] B. Bahmani and A. Goel. Partitioned multi-indexing: bringing order to social search. In Proceedings of the 21st International Conference on World Wide Web. ACM, 2012.
[5] S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly. Video suggestion and discovery for YouTube: taking random walks through the view graph. In Proceedings of the 17th International Conference on World Wide Web, pages 895–904. ACM, 2008.
[6] S. Banerjee and P. Lofgren. Fast bidirectional probability estimation in markov models. In Advances in Neural Information Processing Systems, pages 1423–1431, 2015.
[7] P. Berkhin. Bookmark-coloring algorithm for personalized pagerank computing. Internet Mathematics, 3(1):41–62, 2006.
[8] S. Chakrabarti. Dynamic personalized pagerank in entity-relation graphs. In Proceedings of the 16th International Conference on World Wide Web, pages 571–580. ACM, 2007.
[9] D. Fogaras, B. Rácz, K. Csalogány, and T. Sarlós. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2005.
[10] D. F. Gleich. PageRank beyond the web. arXiv, cs.SI:1407.5107, 2014. Accepted for publication in SIAM Review.
[11] A. Goel, P. Gupta, and P. Lofgren. In preparation: Cross partitioning: Realtime computation of cosine similarity, personalized pagerank, and more. Technical report, Stanford University, 2015. Available at http://www.stanford.edu/~plofgren/.
[12] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh. Wtf: The who to follow service at twitter. In Proceedings of the 22nd International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2013.
[13] T. H. Haveliwala. Topic-sensitive pagerank. In Proceedings of the 11th International Conference on World Wide Web, pages 517–526. ACM, 2002.
[14] G. Jeh and J. Widom. Scaling personalized web search. In Proceedings of the 12th International Conference on World Wide Web. ACM, 2003.
[15] D. E. Knuth. The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, 3rd Edition. Reading, Mass.: Addison-Wesley, 1998.
[16] Laboratory for web algorithmics. http://law.di.unimi.it/datasets.php. Accessed: 2014-02-11.
[17] P. Lofgren. Efficient Algorithms for Personalized PageRank. PhD thesis, Stanford University, 2015. Available at http://cs.stanford.edu/people/plofgren/.
[18] P. Lofgren and A. Goel. Personalized pagerank to a target node. arXiv preprint arXiv:1304.4658, 2013.
[19] P. A. Lofgren, S. Banerjee, A. Goel, and C. Seshadhri. Fast-ppr: Scaling personalized pagerank estimation for large graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, pages 1436–1445, New York, NY, USA, 2014. ACM.
[20] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: bringing order to the web. 1999.
[21] A. Shrivastava and P. Li. Asymmetric lsh (alsh) for sublinear time maximum inner product search (mips). In Advances in Neural Information Processing Systems, pages 2321–2329, 2014.
[22] A. Shrivastava and P. Li. Improved asymmetric locality sensitive hashing (alsh) for maximum inner product search (mips). stat, 1050:13, 2014.
[23] Stanford network analysis platform (snap). http://snap.stanford.edu/. Accessed: 2014-02-11.
[24] M. V. Vieira, B. M. Fonseca, R. Damazio, P. B. Golgher, D. d. C. Reis, and B. Ribeiro-Neto. Efficient search ranking in social networks. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management. ACM, 2007.
[25] A. J. Walker. An efficient method for generating discrete random variables with general distributions. ACM Trans. Math. Softw., 3(3):253–256, Sept. 1977.

APPENDIX

A. MORE EXPERIMENT PLOTS

In Figure 5, we plot the precision@3 for several search algorithms on Twitter-2010, using the same parameters as the experiments that used w = 100,000. Note that BiPPR-Precomp and BiPPR-Precomp-Grouped compute the same estimates, and these estimates are similar to those of Bidirectional-PPR, so we plot a single line for their accuracy.

Figure 5: Median precision@3 for the search algorithms we compare. Notice that the precision@3 of our algorithms exceeds 90% and exceeds the precision@3 of Monte-Carlo.