Parallel Personalized PageRank on Dynamic Graphs

Wentian Guo, Yuchen Li, Mo Sha, Kian-Lee Tan
School of Computing, National University of Singapore
{wentian, liyuchen, sham, tankl}@comp.nus.edu.sg

ABSTRACT

Personalized PageRank (PPR) is a well-known proximity measure in graphs. To meet the need for dynamic PPR maintenance, recent works have proposed a local update scheme to support incremental computation. Nevertheless, sequential execution of the scheme is still too slow for high-speed stream processing. Therefore, we are motivated to design a parallel approach for dynamic PPR computation. First, as updates always come in batches, we devise a batch processing method to reduce the synchronization cost across the updates in a batch and to enable more parallelism for iterative parallel execution. Our theoretical analysis shows that the parallel approach has the same asymptotic complexity as the sequential approach. Second, we devise novel optimization techniques to effectively reduce runtime overheads for parallel processing. Experimental evaluation shows that our parallel algorithm can achieve orders of magnitude speedups on GPUs and multi-core CPUs compared with the state-of-the-art sequential algorithm.

PVLDB Reference Format:
Wentian Guo, Yuchen Li, Mo Sha, Kian-Lee Tan. Parallel Personalized PageRank on Dynamic Graphs. PVLDB, 11(1): 93-106, 2017. DOI: https://doi.org/10.14778/3136610.3136618

1. INTRODUCTION

Personalized PageRank is one of the most widely used proximity measures in graphs. While PageRank quantifies the overall importance of vertices, the PPR vector measures the importance of vertices w.r.t. a given source vertex s. For instance, a vertex v with a larger PPR value w.r.t. s implies that a random walk starting from s has a higher probability to reach v. PPR is used in applications of many fields, such as web search for Internet individuals [39, 22], user recommendation [8, 19], community detection [48] and graph partitioning [6, 5].

Most existing works [39, 22, 6, 5, 30, 31, 29, 11] focus on PPR computation on static graphs. However, the representative graphs often evolve rapidly in many applications. For example, network traffic data averages 10^9 packets/hour/router for large ISPs [17] and Twitter has more than 500 million tweets per day [3]. Since computing PPR from scratch is prohibitively slow against a high rate of graph updates, recent works [38, 49] focus on incremental PPR maintenance. The state-of-the-art dynamic PPR approaches are based on a local update scheme [6, 5, 11, 31, 29, 38, 49]. This scheme takes two steps against each single update: (1) restore invariants; (2) perform local pushes. More specifically, each vertex v keeps two values P_s(v) and R_s(v), where P_s(v) is the current estimate of the PPR value of v w.r.t. the source vertex s and R_s(v) is the upper bound on the estimation bias. We call P_s(v) and R_s(v) the estimate and residual of v respectively. An invariant is maintained for all vertices to ensure the residual is valid. To handle any update, e.g., an edge insertion/deletion, the invariant is first restored, which results in a change in the residual. Subsequently, if the residual of a vertex v exceeds a user-specified threshold ε, the residual is pushed to v's neighbors.

Although it has been shown in [49] that only a few operations are needed to handle any single edge update (insertion/deletion) under the common edge arrival models, the scheme is practically inefficient. We have observed latencies of up to 2-3 minutes against an update batch consisting of 5000 edges using the state-of-the-art approach [49] in our experiments. This hardly matches the streaming rates of real-world dynamic graphs (e.g., 7000 tweets are generated per second in Twitter [3]). Meanwhile, emerging hardware architectures provide consumers with massive computation power. Two of the most popular hardware platforms, multi-core CPUs and GPUs, have shown great potential in accelerating graph applications [42, 36, 47, 34, 25, 35]. By utilizing the emerging hardware, we are motivated to devise an efficient parallel approach for dynamic PPR computation under the state-of-the-art local update scheme to achieve high-speed stream processing.

Parallelizing dynamic PPR computation poses two new challenges. First, as updates often come in batches, it remains unclear whether a batch can be processed in parallel under the local update scheme. A naive solution is to synchronize on each single update to repair the invariant and then launch the local push. However, this solution causes significant synchronization overheads against batch updates. Second, parallelizing the local push is non-trivial, as it incurs extra overheads compared with the sequential approach. We call the parallelized local push the parallel push. The extra overheads of the parallel push come from two sources: (a) the parallel push can take more operations, as it reads stale residual values at the beginning of an iteration and then propagates less probability mass from the frontier vertices (the frontier contains the vertices that need to be pushed in the current iteration); (b) the parallel push needs to iteratively maintain a vertex frontier and thus incurs synchronization costs to merge duplicate vertices.

In this paper, we propose a novel parallel approach to tackle the aforementioned challenges. First, we devise a batch processing method to avoid synchronization costs within a batch as well as to generate enough workload for parallelism. Our theoretical analysis proves that the parallel approach has the same asymptotic complexity as its sequential counterpart. Second, we propose novel optimization techniques to practically reduce the number of local push operations (eager propagation) and to efficiently maintain the vertex frontier in parallel by utilizing the monotonicity of the local push process (local duplicate detection).

We hereby summarize our contributions as follows:
• We introduce a parallel approach to support dynamic PPR computation over high-speed graph streams. One salient feature is that updates are processed in batches. Theoretical analysis shows that the parallel approach has the same asymptotic complexity as its sequential counterpart under the two common edge arrival models. To the best of our knowledge, this is the first work on parallelizing dynamic PPR processing.
• We propose several optimization techniques to improve the performance of the parallel push. Eager propagation helps to reduce the number of local push operations. Our novel frontier generation method efficiently maintains the vertex frontier by mitigating the synchronization overheads of merging duplicate vertices.
• We implement our parallel approach on GPUs and multi-core CPUs. The experiments over real-world datasets have verified the effectiveness of our proposal on both architectures. The proposed approach can achieve orders of magnitude speedups on GPUs and multi-core CPUs compared to the state-of-the-art sequential approach.

The remaining part of this paper is organized as follows. Section 2 introduces preliminaries and the background. Section 3 presents the parallel local update approach, together with its theoretical analysis. We propose the optimizations for the parallel push in Section 4. Section 5 reports the experimental results. The related works are discussed in Section 6. Finally, we conclude the paper in Section 7.

2. PRELIMINARY

In this section, we briefly describe the PPR problem and review the state-of-the-art local update scheme for dynamic PPR maintenance.

2.1 Personalized PageRank (PPR)

Let G(V, E) be a directed graph with the vertex set V and the edge set E. Intuitively, PPR can be interpreted as a random walk process on G that starts from a source vertex s and then iteratively jumps to a uniformly chosen out-neighbor or teleports back to s with probability α. The PPR value π_s(v) is the probability that such a random walk starts from s and stops at v. We denote A as the adjacency matrix of G and D as the diagonal matrix whose diagonal entries are the out-degrees of all vertices in G. The PPR vector π_s w.r.t. s is the solution of the following linear equation:

    π_s = α·e_s + (1−α)·W·π_s    (1)

where α is the teleport probability (a constant, typically set to 0.15), e_s is a personalized unit vector with a single non-zero entry of 1 at s, and W is the transition matrix s.t. W = A^T·D^{−1}.

Note that the general PPR formulation does not restrict the personalized vector to be a unit vector. We omit the discussion of the general case as it can be reduced to the unit vector scenario. In fact, many existing works study how to use the unit vector formulation as a building block to efficiently support the general case by maintaining multiple PPR vectors with different personalized unit vectors. Interested readers are referred to [49, 38, 10, 31, 6, 9, 29, 11]. Henceforth, this paper focuses on providing an efficient parallel approach for the unit vector formulation.

2.2 Dynamic Graph Model

We introduce a typical dynamic graph model [10, 49] to articulate the dynamic PPR problem. The dynamic model consists of an unbounded sequence of updates E^t, which indicates the set of edges arriving at time step t. Each element of E^t is (u, v, op), which means an edge u → v with op denoting the type (insertion/deletion). Note that in previous works [38, 49] only one single edge update arrives at each time step t; in the following we discuss them with the same notations, but readers should note that |E^t| = 1 in the single-update case.

Initially, the graph is G^0 = (V^0, E^0) at time step 0. At time step t > 0, E^t is applied to G^{t−1} and produces a new graph G^t. Note that an edge insertion may introduce new vertices to V^t and the deletion of an edge may discard some vertices with zero degree from V^{t−1}. Given the model, the goal of the dynamic PPR problem is to incrementally maintain the PPR vector for each update. (To simplify the presentation, the time step subscript is dropped when the context is clear.)

2.3 Sequential Local Update

The state-of-the-art approaches for dynamic PPR vector maintenance rely on a local update scheme [49, 6, 5]. The PPR value of a vertex returned by the local update scheme is an ε-approximation to the true PPR value, i.e., the absolute error is smaller than the error threshold ε. The scheme keeps two values for each vertex v: P_s(v) and R_s(v), where P_s(v) is the current estimate of the PPR value of v w.r.t. the source vertex s, and R_s(v) is the "residual" which bounds the estimation bias.

Algorithm 1 RestoreInvariant
Input: (G, P_s, R_s, u, v, op)
Require: (u, v, op) is the updating edge with op being the insert/delete operation.
1: procedure Insert(u, v)
2:   R_s(u) += [(1−α)·P_s(v) − P_s(u) − α·R_s(u) + α·1_{u=s}] · 1/(d_out(u)·α)
3: procedure Delete(u, v)
4:   R_s(u) −= [(1−α)·P_s(v) − P_s(u) − α·R_s(u) + α·1_{u=s}] · 1/(d_out(u)·α)
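To make the restore step concrete, the following C++ fragment mirrors Algorithm 1 for a single edge update. It is a minimal sketch under our own assumptions: the container names are illustrative, and d_out(u) is taken as the out-degree of u after the update has already been applied to the graph (the paper's exact convention for deletions may differ).

#include <vector>

struct PPRState {
    double alpha = 0.15;         // teleport probability
    int    s = 0;                // source vertex of the maintained PPR vector
    std::vector<double> P;       // estimates  P_s(v)
    std::vector<double> R;       // residuals  R_s(v)
};

// Restore the invariant after inserting edge u -> v (Algorithm 1, Insert).
// dout_u is the out-degree of u with the new edge already counted.
void restore_invariant_insert(PPRState& st, int u, int v, int dout_u) {
    double a = st.alpha;
    double num = (1.0 - a) * st.P[v] - st.P[u] - a * st.R[u]
               + (u == st.s ? a : 0.0);
    st.R[u] += num / (a * dout_u);   // only u's residual needs to change
}

// Deletion is symmetric: the same quantity is subtracted (Algorithm 1, Delete).
void restore_invariant_delete(PPRState& st, int u, int v, int dout_u) {
    double a = st.alpha;
    double num = (1.0 - a) * st.P[v] - st.P[u] - a * st.R[u]
               + (u == st.s ? a : 0.0);
    st.R[u] -= num / (a * dout_u);
}

After either call, if |R_s(u)| exceeds ε, the local push (Algorithm 2 below, or its parallel variant in Section 3) is launched to drive the residuals back below the threshold.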

[Figure 1 panels showing R_1 and P_1 for vertices v1-v4: a. initial state; b. single update on e1; c. push frontier: v1; d. convergent state.]

Figure 1: An example of the sequential local update, assuming α = 0.5 and ε = 0.1. When a new edge e1 arrives, Algorithm 1 is invoked to restore the invariant for v1, and then Algorithm 2 is launched to proceed with the local push.

Given a vertex v ∈ V with its out-neighbor set N_out(v) and out-degree d_out(v), an invariant [49, 6] is formulated as follows to ensure P_s(v) is R_s(v)-approximate:

    P_s(v) + α·R_s(v) = (1−α) · Σ_{x ∈ N_out(v)} P_s(x)/d_out(v) + α·1_{v=s}    (2)

To handle an update in the dynamic graph model, the scheme takes a two-step procedure: (1) restore the invariant; (2) perform local pushes. When an edge update (u, v, op) arrives, the scheme repairs the invariant specified in Equation 2, which may change the residual values of u and v. In case this activates any vertex, which means its residual value exceeds ε, the local push is invoked. It updates the estimate of this vertex and propagates the residual to its neighbors, which may activate more vertices. The process continues until no residual exceeds ε.

Step (1) and step (2) of the local update scheme are presented in Algorithm 1 (RestoreInvariant) and Algorithm 2 (SequentialLocalPush) respectively. RestoreInvariant describes how to maintain the invariant for all vertices for an update of a directed edge u → v. We present the case for insertion; the case for deletion is similar. Note that the only vertex where the invariant does not hold is u, as d_out(u) becomes larger with the extra out-neighbor v. In this case, it suffices to change only R_s(u) without touching any other vertices. After invoking RestoreInvariant, if the residual value exceeds ε, SequentialLocalPush is launched to preserve the approximation guarantee.

Algorithm 2 SequentialLocalPush
Input: (G, P_s, R_s, ε)
1: while max_u R_s(u) > ε do
2:   SeqPush(u)
3: while min_u R_s(u) < −ε do
4:   SeqPush(u)
5: return (P_s, R_s)
6: procedure SeqPush(u)
7:   P_s(u) += α·R_s(u)
8:   for all v ∈ N_in(u) do
9:     R_s(v) += (1−α)·R_s(u)/d_out(v)
10:  R_s(u) = 0

There are two phases in SequentialLocalPush. The first one handles vertices with positive excessive residuals and the second one deals with negative residuals. Each phase iteratively invokes SeqPush (Lines 6-10) until no residual has an absolute value larger than ε. When this happens, it is guaranteed that |π_s(u) − P_s(u)| ≤ ε for any u ∈ V, and we say the local push converges at this time. Lemma 1 ensures the invariant holds throughout the local push.

Lemma 1 ([5]). For any vector p, the following transformation from P_s and R_s to P_s' and R_s' maintains the invariant shown by Equation 2:

    P_s' = P_s + α·p
    R_s' = R_s − p + (1−α)·W^T·p

Zhang et al. [49] prove the complexity required by the sequential local update to maintain the PPR vector under two common edge arrival models.

Theorem 1 ([49]). Given a random source vertex, the number of operations required by the sequential local update to maintain a valid approximate solution on the dynamic graph is at most O(K + K/(nε) + d̄/ε) for K updates (d̄ is the average degree), under two edge arrival models: random edge permutation of a directed graph and arbitrary edge updates of an undirected graph.

Example 1. We show a running example in Figure 1 to illustrate the sequential local update. As a new edge v1 → v2 arrives in Figure 1(b), RestoreInvariant is invoked to modify R_1(1), which activates v1. SeqPush then performs a local push on v1. It updates R_1(2) and R_1(3) but does not activate v2 and v3. Since there is no active vertex afterwards, convergence is achieved.

Although there are many variants of the local update scheme [49, 6, 5, 38, 31, 11], they share similar behavior (i.e., restore the invariant against an update and then perform local pushes to reduce the residuals). We thus focus on the state-of-the-art approach proposed in [49] and omit the other approaches, as there is no essential difference in parallelizing them compared to parallelizing the approach in [49]. Before moving on to the technical contributions of this paper, we summarize the frequently used notations in Table 1.

Table 1: Frequently used notations in this paper.
Symbol              Description
G^t, V^t, E^t       the graph, the vertex set and the edge set at time step t
N_in(u), N_out(u)   in- and out-neighbor sets of u
d_out(u)            out-degree of u
d̄                   the average degree
K                   total number of edge updates
s                   the source vertex of PPR
α                   teleport probability of PPR
π_s                 true PPR vector w.r.t. s
e_s                 unit vector with 1 at s
ε                   error threshold
P_s                 estimated PPR vector w.r.t. s
R_s                 residual vector of P_s w.r.t. s
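For reference, a compact C++ sketch of the sequential local push is given below. It is a sketch only: the frontier is found by a linear scan for clarity (a worklist would be used in practice), and the graph layout (in_nbrs, dout) is an assumption rather than the paper's implementation.

#include <vector>

void seq_push(int u, double alpha,
              std::vector<double>& P, std::vector<double>& R,
              const std::vector<std::vector<int>>& in_nbrs,
              const std::vector<int>& dout) {
    double r = R[u];
    P[u] += alpha * r;                        // keep the alpha portion as estimate
    for (int v : in_nbrs[u])                  // spread the rest to in-neighbors
        R[v] += (1.0 - alpha) * r / dout[v];
    R[u] = 0.0;
}

void sequential_local_push(double alpha, double eps,
                           std::vector<double>& P, std::vector<double>& R,
                           const std::vector<std::vector<int>>& in_nbrs,
                           const std::vector<int>& dout) {
    auto find_violating = [&](bool positive) {
        for (int u = 0; u < (int)R.size(); ++u)
            if ((positive && R[u] > eps) || (!positive && R[u] < -eps)) return u;
        return -1;
    };
    // Phase 1: positive residuals; Phase 2: negative residuals.
    for (bool positive : {true, false})
        for (int u = find_violating(positive); u != -1; u = find_violating(positive))
            seq_push(u, alpha, P, R, in_nbrs, dout);
}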

3. DYNAMIC PPR IN PARALLEL

As previously mentioned in the Introduction, the sequential approach for dynamic PPR is practically too inefficient to meet the requirements of real-world stream processing. We thus propose a novel parallel approach to boost the performance of dynamic PPR computation.

3.1 Parallel Local Update

A naive parallel approach is to synchronize on each single update to repair the invariant and then push all vertices in the frontier in parallel (the frontier consists of the vertices whose residuals exceed ε). However, a single edge update is not expected to result in a substantial change in the representative graph, and hence the residuals will not vary significantly either. Thus, the single update scenario often yields a frontier too small for any parallel approach to be effective. Moreover, as updates often come in batches, the naive approach causes significant synchronization overheads against batch updates.

In this paper, we propose a simple-to-implement yet effective approach to support efficient batch processing in parallel. Given a batch of k updates, we first restore the invariant by calling RestoreInvariant (Algorithm 1) k times to adapt to the graph changes of the batch and then parallelize the local push. The critical component of our approach is the parallel local push presented in Algorithm 3. Similar to SequentialLocalPush (Algorithm 2), there are two phases to handle positive and negative residuals respectively. In each phase, it first initializes a frontier queue FQ and then iteratively invokes the push procedure (ParallelPush) until no vertex remains in the frontier. ParallelPush consists of two parallel sessions. The first session takes out the residual of each frontier vertex and increments its estimate by an α portion of the residual. The second session updates the neighbors of each frontier vertex with the remaining (1−α) portion of the residual and generates a new frontier queue FQ' if necessary. To simplify the presentation, we refer to the first parallel session as self-update and the second session as neighbor-propagation.

Algorithm 3 ParallelLocalPush
Input: (P_s, R_s, G, ε)
1: FQ = {u ∈ V | pushCond(R_s(u), POS)}
2: while FQ ≠ ∅ do
3:   FQ = ParallelPush(P_s, R_s, FQ, POS)
4: FQ = {u ∈ V | pushCond(R_s(u), NEG)}
5: while FQ ≠ ∅ do
6:   FQ = ParallelPush(P_s, R_s, FQ, NEG)
7: return (P_s, R_s)
8: procedure pushCond(r, phase)
9:   if phase = POS then return r > ε
10:  else return r < −ε
11: procedure ParallelPush(P_s, R_s, FQ, phase)
12:  S = ∅
13:  parallel for u ∈ FQ
14:    S = S ∪ (u, R_s(u))
15:    P_s(u) += α·R_s(u)
16:    R_s(u) = 0
17:  synchronize
18:  FQ' = ∅
19:  parallel for (u, w) ∈ S
20:    parallel for v ∈ N_in(u)
21:      atomicAdd(R_s(v), (1−α)·w/d_out(v))
22:      if pushCond(R_s(v), phase) then
23:        UniqueEnqueue(FQ', v)
24:  synchronize
25:  return FQ'

We note that at Line 21, atomic operations are utilized to ensure the residuals are correctly transferred from the frontier to their neighbors. An alternative that avoids atomic operations is to run a parallel sort on the key-value pairs consisting of the neighbor vertex ids and the corresponding residuals, then run a parallel reduce to aggregate the residuals with the same keys, and finally transfer the aggregated residuals to the designated vertices. However, this sort-and-aggregate method incurs significant overheads for large frontiers. In fact, most graph processing systems [42, 47, 16, 36, 23] and similar applications [13, 33, 44, 43, 35] adopt atomic operations over the sort-and-aggregate method to update the neighbors. Thus, we use the atomic method in our implementation. (We do not show experimental results for the sort-and-aggregate method, as its performance is significantly worse than the atomic method according to our investigations.)

Example 2. We continue the running example in Figure 2 for the parallel local update. Two new insertions v1 → v2 and v4 → v1 arrive in Figure 2(b). RestoreInvariant is invoked for both insertions and changes the residuals of v1 and v4. Then, v1 and v4 become the frontier as their residuals exceed ε, and ParallelLocalPush is launched correspondingly. It achieves convergence in one iteration.

3.2 Theoretical Analysis

Although our parallel approach is simple to implement, its theoretical analysis is intricate. In this section, we verify the soundness and efficiency of the proposed approach. First, we examine whether the parallel approach produces a valid approximation result. Second, we compare the complexity of the parallel approach against the sequential approach in terms of the number of operations performed. In the following, we prove the correctness of the parallel push in Theorem 2.

Theorem 2. For a given ε, the parallel push (Algorithm 3) produces a valid approximate result with the same error threshold as the sequential push (Algorithm 2).

Proof. We borrow Lemma 1 to aid our proof. Lemma 1 suggests that, as long as the vector p used for the self-update and the neighbor-propagation is the same, the invariant always holds. Let Φ be the indicator vector of the frontier, i.e., Φ(u) = 1 if u ∈ FQ and zero otherwise, and let Φ ∘ R_s denote the element-wise product. We rewrite the parallel push as a transformation in vector form:

    P_s' = P_s + α·(Φ ∘ R_s)
    R_s' = R_s − (Φ ∘ R_s) + (1−α)·W^T·(Φ ∘ R_s)

Let p = Φ ∘ R_s; the above transformation maintains the invariant by Lemma 1. The rest of the proof naturally follows: when Algorithm 3 terminates, |R_s(u)| < ε for any u ∈ V, and the invariant still holds.
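The following multi-core C++ sketch illustrates one ParallelPush round as just described. It is only a sketch of the idea: OpenMP and a per-vertex flag array stand in for the paper's (unspecified) UniqueEnqueue, and the graph layout is assumed.

#include <utility>
#include <vector>

std::vector<int> parallel_push(double alpha, double eps, bool positive,
                               std::vector<double>& P, std::vector<double>& R,
                               const std::vector<std::vector<int>>& in_nbrs,
                               const std::vector<int>& dout,
                               const std::vector<int>& FQ) {
    auto push_cond = [&](double r) { return positive ? r > eps : r < -eps; };

    // Session 1 (self-update): take out the residuals, bump the estimates.
    std::vector<std::pair<int, double>> S(FQ.size());
    #pragma omp parallel for
    for (long i = 0; i < (long)FQ.size(); ++i) {
        int u = FQ[i];
        S[i] = {u, R[u]};
        P[u] += alpha * R[u];
        R[u] = 0.0;
    }

    // Session 2 (neighbor-propagation): atomically add the (1-alpha) portion
    // to in-neighbors and collect the next frontier without duplicates.
    std::vector<char> enq(R.size(), 0);   // stands in for UniqueEnqueue
    std::vector<int> next;
    #pragma omp parallel
    {
        std::vector<int> local;
        #pragma omp for nowait
        for (long i = 0; i < (long)S.size(); ++i) {
            int u = S[i].first;
            double w = S[i].second;
            for (int v : in_nbrs[u]) {
                double after;
                #pragma omp atomic capture
                { R[v] += (1.0 - alpha) * w / dout[v]; after = R[v]; }
                char claimed = 0;
                if (push_cond(after) &&
                    __atomic_compare_exchange_n(&enq[v], &claimed, (char)1, false,
                                                __ATOMIC_RELAXED, __ATOMIC_RELAXED))
                    local.push_back(v);
            }
        }
        #pragma omp critical
        next.insert(next.end(), local.begin(), local.end());
    }
    return next;
}

A batch of k updates would then be handled by calling the restore-invariant routine k times, followed by repeated parallel_push rounds per phase until the returned frontier is empty.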

[Figure 2 panels showing R_1 and P_1 for vertices v1-v4: a. initial state; b. batch update on e1, e2; c. push frontier: v1, v4; d. convergent state.]

Figure 2: An example of the parallel local update, assuming α = 0.5 and ε = 0.1. When the new edges e1 and e2 arrive, Algorithm 1 is invoked to restore the invariant for both v1 and v4, and then Algorithm 3 is launched to proceed with the local push.

We are left to examine the complexity of the parallel local update. Previous work on the sequential local update [49] bounds the complexity of dynamic PPR maintenance by tracking the residual change caused by each edge update, under the two common edge arrival models mentioned in Theorem 1. Following this method, we analyze the complexity of the parallel local update. The major difference is that we track the residual change caused by a batch update instead of a single update. Our subsequent analysis is structured as follows: Lemma 2 computes the number of operations required by the parallel local update as a function of the residual change. Lemma 3 then deduces an upper bound on the residual change for an arbitrary batch update. By combining Lemmas 2 and 3, the overall complexity of the parallel local update is presented in Theorem 3.

Lemma 2. Given t batches which collectively contain K updates and a random source vertex, the total number of operations required by the parallel local update to process the t batches is bounded by Δ_d under random edge permutation of a directed graph, where

    Δ_d = d̄/(αε) + K·(α+4)/(nα²) + (1/n)·Σ_{i=1}^{t} Σ_{u∈V} [ d^i_out(u) · Σ_{s∈V} Δ^i_s(u) ] / (αε)

and by Δ_u under arbitrary edge updates of an undirected graph, where

    Δ_u = d̄/(αε) + 2K/α + (1/n)·Σ_{i=1}^{t} Σ_{u∈V} [ d^i_out(u) · Σ_{s∈V} Δ^i_s(u) ] / (αε)

Here Δ^i_s(u) denotes the residual change of a vertex u caused by RestoreInvariant at time step i.

Since the proof process is similar to that of Theorem 1, we refer interested readers to our extended version [1]. Note that both expressions above depend on Σ_{s∈V} Δ^i_s(u), for which we derive an upper bound in the following lemma.

Lemma 3. At time step i, suppose the batch contains k directed edge updates that start from u, i.e., (u → v_j, op_j) for 1 ≤ j ≤ k. Then

    Σ_{s∈V} Δ^i_s(u) ≤ (2nε + 2)/(α·d^i_out(u)) · k

Proof. To facilitate our proof, we define op = 1 for an insertion and op = −1 for a deletion. We further denote the residual and the out-degree of u after applying the j-th update as r_j(u) and d_j(u). So r_0(u) and r_k(u) are the initial and new values of R_s(u) respectively after invoking RestoreInvariant on these k updates. To bound Σ_{s∈V} Δ^i_s(u), we first compute Δ^i_s(u). Define A_j, B_j and γ_j as follows:

    A_j = op_j/(α·d_j(u)) · (1−α)·P_s(v_j)
    B_j = op_j/(α·d_j(u)) · [α·1_{u=s} − P_s(u)]
    γ_j = 1 − op_j/d_j(u)

By Algorithm 1, we have the recursive formula for r_j(u):

    r_j(u) = γ_j·r_{j−1}(u) + A_j + B_j

Starting from r_k(u) and recursively substituting r_j(u) with r_{j−1}(u), we derive the relationship between r_k(u) and r_0(u):

    r_k(u) = (Π_{j=1}^{k} γ_j)·r_0(u) + Σ_{j=1}^{k} (Π_{x=j+1}^{k} γ_x)·(A_j + B_j)    (3)

Note that γ_j is the ratio of u's out-degree before and after the j-th edge update, i.e., γ_j = d_{j−1}(u)/d_j(u), so we have the following proposition:

    Π_{j=a+1}^{b} γ_j = Π_{j=a+1}^{b} d_{j−1}(u)/d_j(u) = d_a(u)/d_b(u)

Replacing the products of γ_j in Equation 3 with the above proposition cancels d_j(u) in A_j and B_j. Using the fact that Σ_{j=1}^{k} op_j = d_k(u) − d_0(u), we can further reorganize Equation 3 as:

    Δ^i_s(u) = r_k(u) − r_0(u) = Σ_{j=1}^{k} op_j·U(u, v_j) / (α·d_k(u))

where U(u, v_j) = (1−α)·P_s(v_j) − P_s(u) − α·r_0(u) + α·1_{u=s}. To bound Δ^i_s(u), we take the absolute value of each term in op_j·U(u, v_j), using the facts that P_s(v_j) ≤ π_s(v_j) + ε and |r_0(u)| ≤ ε:

    (1−α)·P_s(v_j) + P_s(u) + α·|r_0(u)| + α·1_{u=s}
    ≤ (1−α)·(π_s(v_j) + ε) + (π_s(u) + ε) + αε + α·1_{u=s}

The numerator of Δ^i_s(u) is thus at most k times the above expression. Notice that Σ_{s∈V} π_s(v_j) = 1 (and similarly for u), so when we take the summation of Δ^i_s(u) over s ∈ V, the π_s(v_j) and π_s(u) terms are canceled. In this way, we complete the proof of the lemma.

Theorem 3. The number of operations required by the parallel local update to maintain a valid approximate solution is asymptotically the same as that of the sequential local update, under the two edge arrival models (see Theorem 1).

[Figure 3 panels showing R_1 and P_1 for vertices v1-v4. Top row (parallel push): a(1) the initial frontier of both approaches: v1; a(2) push frontier: v2, v3; a(3) push frontier: v3, v4; a(4) convergent state. Bottom row (sequential push): b(1) initial frontier: v1; b(2) push frontier: v2; b(3) push frontier: v3; b(4) push frontier: v4; b(5) convergent state.]

Figure 3: A comparison of the parallel push (Algorithm 3), panels a(1)-a(4), and the sequential push (Algorithm 2), panels b(1)-b(5), assuming α = 0.5 and ε = 0.1. Starting from the same initial state, the parallel push pushes {v1, v2, v3, v3, v4}, while the sequential push pushes {v1, v2, v3, v4}. The parallel push performs one more push operation on v3.

Proof (of Theorem 3). Suppose at time step i there are k^i(u) directed edge updates that start from u. By Lemma 3, we have the following inequality for the third term in Δ_d and Δ_u (Lemma 2):

    (1/n)·Σ_{i=1}^{t} Σ_{u∈V} d^i_out(u)·Σ_{s∈V} Δ^i_s(u) / (αε)
    ≤ (1/n)·Σ_{i=1}^{t} Σ_{u∈V} d^i_out(u)/(αε) · (2nε+2)/(α·d^i_out(u)) · k^i(u)
    = (1/n)·Σ_{i=1}^{t} Σ_{u∈V} k^i(u) · (2nε+2)/(α²ε)

In the following, we consider the different edge arrival models. In a directed graph with random edge permutation, assuming the total number of edge updates over t time steps is K, we have Σ_{i=1}^{t} Σ_{u∈V} k^i(u) = K. Thus, we obtain the upper bound of Δ_d as:

    Δ_d ≤ d̄/(αε) + K·(α+4)/(nα²) + K·(2/α² + 2/(α²nε))    (4)

In an undirected graph with arbitrary edge updates, we need to apply RestoreInvariant twice, as an undirected edge update is treated as two directed updates, and thus the total number of directed edge updates is 2K, which means Σ_{i=1}^{t} Σ_{u∈V} k^i(u) = 2K. Thus, we obtain the upper bound of Δ_u:

    Δ_u ≤ d̄/(αε) + 2K/α + K·(4/α² + 4/(α²nε))    (5)

The asymptotic bound of both Δ_d and Δ_u is O(K + K/(nε) + d̄/ε), which is the same as that of the sequential local update specified in Theorem 1. Thus we conclude the proof.

4. OPTIMIZATION

As repairing the invariant takes only constant time per update, the parallel push dominates the running time of the parallel local update approach. In this section, we describe our novel optimization techniques, i.e., eager propagation and local duplicate detection, to improve the performance of the parallel push.

4.1 Eager Propagation

Parallel execution accelerates the local push, but the improvement does not come for free. Compared with the sequential push, we find the parallel push can take more operations in many cases. This is caused by a phenomenon we call parallel loss. To solve this problem, we propose a technique called eager propagation to close the gap between the parallel and the sequential push.

4.1.1 Parallel Loss

Consider the example shown in Figure 3. To achieve convergence (i.e., all vertices have residuals smaller than ε), both the parallel and the sequential push perform push operations on the same set of vertices {v1, v2, v3, v4}, but the parallel push costs one more operation because v3 is pushed twice. When v3 is pushed for the first time in a(2) of Figure 3, the frontier vertices are v2 and v3. Since v3 is an in-neighbor of v2, v2 propagates some residual to v3. This amount is large enough to activate v3 in the next iteration; thus in a(3), v3 needs to be pushed once more. The sequential push, however, does not need such effort. In b(2), it pushes v2, and v2 propagates its residual to v3. In b(3), v3 has a large amount of residual, 0.375, consisting of the amount it held in b(2), 0.25, and the propagation from v2, 0.125. The sequential push resolves this residual with only one push operation. If both v2 and v3 are pushed in parallel in b(2), we lose the opportunity to process the larger residual, and in the end it may take more operations to converge. This phenomenon, caused by concurrent push operations on multiple frontier vertices in one iteration, is called parallel loss.

Parallel loss cannot be avoided as long as we parallelize the local push. Although we prove that the worst-case complexity of the parallel approach matches that of the sequential approach (Section 3), the former can take more operations at runtime. We introduce Lemma 4 to support this argument. It shows that when ε is sufficiently small, the overall residual of the parallel push is always at least as large as that of the sequential push because of parallel loss. With a larger overall residual, the parallel push takes more operations than the sequential push under the same convergence condition.
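The extra operation in Figure 3 can be checked with a quick calculation (α = 0.5, ε = 0.1, numbers taken from the example above):

    Sequential: v2 is pushed before v3, so v3 accumulates R_1(3) = 0.25 + 0.125 = 0.375, which is cleared by a single push.
    Parallel: v2 and v3 are pushed concurrently, so v3 is pushed with the stale value 0.25; the 0.125 forwarded by v2 arrives afterwards and, since 0.125 > ε = 0.1, v3 must be pushed again in the next iteration.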

Lemma 4. Let R^p_x and R^q_x be the residual vectors of the parallel and the sequential push at iteration x for any fixed source vertex. Given an iteration x and a corresponding frontier, the sequential push handles all vertices in its frontier one by one in serial, whereas the parallel push handles the frontier vertices all at once. Given the same initial residual distribution, i.e., R^p_0 = R^q_0, if ε → 0, the sum of absolute residual values of the parallel push is always at least that of the sequential push, i.e., ‖R^p_x‖_1 ≥ ‖R^q_x‖_1 for any iteration x.

Proof. First of all, when ε → 0, the frontiers of every iteration are the same for the parallel and the sequential push since the initial residual distribution is the same. This can be proved by induction: the base case is trivial; supposing the frontiers of both pushes are the same at iteration x, any in-neighbor v of a frontier vertex u becomes frontier at iteration x+1, because the residual propagated from u always activates v, i.e., (1−α)·R_s(x, u)/d_out(v) ≥ ε as ε → 0. Thus the frontiers of both pushes at iteration x+1 are the same, and we denote FQ^x as the common frontier at iteration x.

Next, we show that if ‖R^p_x‖_1 ≥ ‖R^q_x‖_1, then ‖R^p_{x+1}‖_1 ≥ ‖R^q_{x+1}‖_1. Suppose there are k frontier vertices v_i ∈ FQ^x, 1 ≤ i ≤ k, and the sequential push processes them in the order v_1, v_2, .... Denote by b(v_i) the residual value pushed for v_i at iteration x by the sequential push; then

    b(v_i) = R^q_x(v_i) + Σ_{j=1}^{i−1} (1−α)·b(v_j)·1_{v_i ∈ N_in(v_j)} / d_out(v_i)

Since the residuals of the frontier vertices are either all positive or all negative, |b(v_i)| ≥ |R^q_x(v_i)|. By Equation 2, the change of the residual sum of the sequential push is

    ‖R^q_{x+1}‖_1 − ‖R^q_x‖_1 = −α·Σ_{i=1}^{k} b(v_i) + Σ_{i=1}^{k} b(v_i)·c(v_i), where c(v_i) = (1−α)·( Σ_{x ∈ N_in(v_i)} 1/d_out(x) − 1 )

The parallel push satisfies the same relation with b(v_i) replaced by R^p_x(v_i), up to O(ε) terms. By the friendship paradox [15], the mean number of friends of an individual is bounded by that of its friends, which yields c(v_i) ≤ 0. Combining |b(v_i)| ≥ |R^q_x(v_i)|, c(v_i) ≤ 0 and the induction hypothesis ‖R^p_x‖_1 ≥ ‖R^q_x‖_1, a term-by-term comparison of the two expressions (with ε → 0 so that the O(ε) terms vanish) gives ‖R^q_{x+1}‖_1 ≤ ‖R^p_{x+1}‖_1. The lemma then follows by induction, as the base case always holds.

4.1.2 Mitigate Loss

The major issue behind parallel loss is that the parallel push reads stale residual values and therefore pushes a smaller amount of residual. More specifically, while a vertex v is pushing its residual to its incoming neighbors, an outgoing neighbor of v may be pushing some residual to v, and v is not aware of the additional residual pushed to itself. Motivated by this observation, we propose eager propagation, which reads the most recent residuals of frontier vertices to mitigate parallel loss.

In the parallel push, Algorithm 3 reads the residuals of frontier vertices before the neighbor-propagation, without being aware of the potential increase of the residuals. Instead, eager propagation is eager for more residual to push and proactively reads the recent residuals of the frontier vertices, which may have been increased by their neighbors. To implement eager propagation, we redesign the parallel push: (1) we alter the order of the two parallel sessions, and (2) we ensure the residuals of frontier vertices used in both sessions are consistent.

Algorithm 4 shows the optimized parallel push that implements eager propagation. Compared with Algorithm 3, there are several major differences due to eager propagation. First, the local push executes the two parallel sessions in reverse order, i.e., neighbor-propagation first and self-update afterwards. Second, the accesses to the residual values of frontier vertices are different. Note that at Line 10 we read the up-to-date residual of u as r_u, which is the key step of eager propagation. Since this read happens inside the neighbor-propagation loop, R_s(u) can be increased by concurrent propagation from u's neighbors, and the value r_u is potentially larger than the value of R_s(u) before the neighbor-propagation session starts. To ensure that a consistent residual value is used in both sessions, r_u is recorded in E at Line 11 and passed down to the later session. Then R_s(u) is decreased by the same r_u at Line 21, instead of being zeroed as in Algorithm 3.

There are also other differences in Algorithm 4 compared with Algorithm 3. First, there is an additional round of frontier generation in the self-update at Lines 22-23. The absolute value of R_s(u) can still be larger than ε after the subtraction at Line 21, making u a potential frontier vertex for the next iteration, since R_s(u) can keep being increased by u's neighbors after u pushes out its residual. Compared with the other frontier generation process in the neighbor-propagation, this one takes care of the vertices that are frontier in both the current and the next iteration, i.e., u ∈ FQ ∧ u ∈ FQ', while the other handles the vertices that are frontier in the next iteration but not in the current one, i.e., u ∈ FQ' ∧ u ∉ FQ. Second, the frontier generation in the neighbor-propagation of Algorithm 4 is also different from that in Algorithm 3. Both of these differences are related to our novel optimization technique for frontier generation, which we discuss in the next subsection.
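A minimal multi-core C++ sketch of the two reordered sessions is shown below; it captures the fresh read of r_u and the final subtraction, while the frontier generation of Algorithm 4 is omitted here (it is covered in the next subsection). The names and the OpenMP realization are our own assumptions.

#include <utility>
#include <vector>

void eager_push_round(double alpha, std::vector<double>& P, std::vector<double>& R,
                      const std::vector<std::vector<int>>& in_nbrs,
                      const std::vector<int>& dout,
                      const std::vector<int>& FQ) {
    std::vector<std::pair<int, double>> E(FQ.size());

    // Session 1: neighbor-propagation FIRST, using the freshest residual r_u.
    #pragma omp parallel for
    for (long i = 0; i < (long)FQ.size(); ++i) {
        int u = FQ[i];
        double ru;
        #pragma omp atomic read
        ru = R[u];               // may already include residual pushed to u by
                                 // concurrently processed frontier vertices
        E[i] = {u, ru};          // remember exactly what will be propagated
        for (int v : in_nbrs[u]) {
            #pragma omp atomic
            R[v] += (1.0 - alpha) * ru / dout[v];
        }
    }                            // implicit barrier

    // Session 2: self-update with the SAME r_u; subtract rather than zero,
    // since R[u] may have grown further after r_u was recorded.
    #pragma omp parallel for
    for (long i = 0; i < (long)E.size(); ++i) {
        int u = E[i].first;
        double ru = E[i].second;
        P[u] += alpha * ru;
        #pragma omp atomic
        R[u] -= ru;
    }
}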

4.2 Frontier Generation

Algorithm 4 OptParallelPush
1: procedure PushCondLocal(r_pre, r_cur, phase)
2:   if !pushCond(r_pre, phase) then
3:     if pushCond(r_cur, phase) then
4:       return true
5:   return false
6: procedure OptParallelPush(P_s, R_s, FQ, phase)
7:   FQ' = ∅
8:   E = ∅
9:   parallel for u ∈ FQ
10:    r_u = R_s(u)
11:    E = E ∪ (u, r_u)
12:    parallel for v ∈ N_in(u)
13:      inc = (1−α)·r_u/d_out(v)
14:      r_pre = atomicAdd(R_s(v), inc)
15:      r_cur = r_pre + inc
16:      if PushCondLocal(r_pre, r_cur, phase) then
17:        enqueue(FQ', v)
18:  synchronize
19:  parallel for (u, r_u) ∈ E
20:    P_s(u) += α·r_u
21:    R_s(u) −= r_u
22:    if pushCond(R_s(u), phase) then
23:      enqueue(FQ', u)
24:  synchronize
25:  return FQ'

In general, there are two methods for frontier generation [21, 34, 35, 25, 26, 13], and neither gives satisfactory performance for the parallel push. The first is a topology-driven method, which inspects all vertices at the start of each iteration to construct the frontier. This method is not work-efficient, especially when the frontier is small. The second is a data-driven method that takes in a frontier queue and generates the frontier queue for the next iteration. However, to avoid duplicate items during frontier generation, such a method requires atomic operations, which incur expensive overheads. The unoptimized parallel push, i.e., Algorithm 3, adopts the second method, and the duplicate detection in UniqueEnqueue of Algorithm 3 can cause significant synchronization overheads.

In fact, this overhead can be avoided by exploiting an application property: monotonicity. Note that in the workflow of the parallel push, the positive residuals are pushed out in the first phase and the negative residuals in the second phase. Thus, during neighbor-propagation, the residuals change monotonically: increasing in the first phase and decreasing in the second phase. Take the first phase as an example. For a vertex v, its residual absorbs incoming propagation from v's out-neighbors. There is one and only one out-neighbor u that turns v from the state where R_s(v) < ε to the state where R_s(v) > ε. Therefore, we can designate u as the one responsible for putting v into the new frontier queue, and the others can avoid the attempt to enqueue v. But this raises a question: how can u know the value of R_s(v) during frontier generation?

Note that on most parallel architectures, such as GPUs and multi-core CPUs, we can implement an atomic operation that performs an addition to a 32/64-bit address atomically and returns the before-value of that address. On some architectures such an atomic addition is directly supported by hardware intrinsics; on the remaining architectures, compare-and-swap intrinsics are always available [20, 12], and we can use them to implement the same functionality. atomicAdd at Line 14 is the implementation of such an operation. The before-value r_pre is a by-product of updating R_s(v), and with the before-value, the after-value can be computed locally. To perform duplicate detection on v, u enqueues v into the frontier if the before-value of R_s(v) does not meet the push condition but the after-value of R_s(v) does. In this way, threads can identify duplicates locally without synchronizing globally on shared data structures. We call this technique local duplicate detection.

We implement local duplicate detection at Lines 14-17 of Algorithm 4. At Line 14, the residual of v is atomically updated and the before-value of R_s(v) is returned as r_pre. At Line 15, the after-value is calculated as r_cur. If a vertex v passes the test at Line 16, it means that v passes the push condition and putting v into FQ' will not cause duplicates. At Line 17, we enqueue the qualified vertices with enqueue. The difference between enqueue here and UniqueEnqueue is that enqueue removes the duplicate detection process.

Note that Algorithm 4 has one more frontier generation process at Lines 22-23. This is because the first frontier generation process cannot handle the case when a vertex is in the frontier for both the current and the next iteration. Given a frontier vertex u, i.e., |R_s(u)| > ε, a neighbor of u updates R_s(u) and obtains the before-value of R_s(u) as r_pre (|r_pre| > ε). Under such circumstances, the after-value r_cur = r_pre + inc also satisfies |r_cur| > ε due to monotonicity. Thus, u is never enqueued, even if u meets the push condition after the self-update. To remedy this, we add the second frontier generation to handle the frontier vertices. As the parallel-for loop at Lines 19-23 of Algorithm 4 does not have shared vertices, the frontier can be generated efficiently there without duplicate detection.
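To make the before-value trick concrete, the following C++/OpenMP fragment sketches Lines 14-17 of Algorithm 4; push_cond and the thread-local queue are illustrative names, not the paper's implementation.

#include <vector>

// Propagate `inc` to vertex v and enqueue v only if this particular atomic
// addition moves R[v] across the eps threshold (local duplicate detection).
inline void propagate_to(int v, double inc, double eps, bool positive,
                         std::vector<double>& R, std::vector<int>& next_local) {
    double r_pre;
    #pragma omp atomic capture
    { r_pre = R[v]; R[v] += inc; }   // atomic add that returns the before-value
    double r_cur = r_pre + inc;      // after-value, computed locally

    auto push_cond = [&](double r) { return positive ? r > eps : r < -eps; };
    if (!push_cond(r_pre) && push_cond(r_cur))   // only the crossing thread sees this
        next_local.push_back(v);                 // so no shared de-duplication is needed
}

Because residuals only move in one direction within a phase, the threshold is crossed by exactly one addition, so the thread performing it can enqueue v without any global de-duplication structure.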

5. EXPERIMENTS

In this section, we first introduce the experimental setup and then report the results of the experimental study.

5.1 Experimental Setup

Implementation. We implement the proposed parallel local update approach on both the CPU and the GPU architectures. For comparison, we also implement various baselines. The details are as follows.
• CPU Sequential Push with Single Update (CPU-Base): We implement the state-of-the-art sequential algorithm for dynamic PPR computation [49]. For each edge update, it repairs the invariant and then invokes the sequential push.
• CPU Sequential Push with Batch Update (CPU-Seq): We add the batch update optimization to CPU-Base. To handle a batch of size k, it repairs the invariant for the k updates and then launches the sequential push.
• CPU Parallel Push with Batch Update (CPU-MT): We implement our parallel local update using the multi-thread framework CilkPlus [24] to manage parallel contexts and resources on CPUs.
• GPU Parallel Push with Batch Update (GPU): We implement our parallel local update on GPUs using CUDA [37].
• Incremental Monte-Carlo on CPU (Monte-Carlo): We implement the state-of-the-art incremental Monte-Carlo approach [10]. To improve its performance, we parallelize the maintenance of random walks with the multi-thread framework CilkPlus [24].
• Vertex-centric implementation in Ligra (Ligra): We implement our approach, i.e., parallel push with batch update, on top of Ligra [42], the state-of-the-art in-memory graph processing system on multi-core architectures.

Datasets. Our experiments are conducted on the following real-world graph datasets (from https://snap.stanford.edu/data/).
• Pokec has 1.6M vertices and 30.6M edges, generated by the most popular on-line social network in Slovakia.
• LiveJournal has 4.8M vertices and 68.9M edges, generated by an online community that allows users to declare friendships with others.
• Youtube has 1.1M vertices and 2.9M edges, extracted from the users' friendships of Youtube.
• Orkut has 3.0M vertices and 117.1M edges, extracted from users' friendships of an online social network.
• Twitter has 41.6M vertices and 1.4B edges, generated by the Twitter network of followed-by relationships from a sample in 2010.

Graph Stream. The datasets do not possess a timestamp for each edge. Following previous works [10, 38, 49], we simulate the random edge arrival model by randomly setting the timestamps of all edges. Then, the graph stream of each dataset receives the edges with increasing timestamps. To evaluate both insertions and deletions on the arrival of graph streams, the sliding window model is adopted in the experiments. For initialization, the first 10% of edges in the stream are used to construct the sliding window before the updates start. As the window slides by a batch of size k, k edges are inserted and the same number of edges are deleted according to their timestamps. The experimental results are the average of 10 slides for Twitter and the average of 100 slides for the other datasets.

Parameter setting. We summarize the parameters used in the experiments together with their default values in Table 2. For a source vertex, the number of random walk samples maintained by Monte-Carlo is w ≥ 3·log(2/p_f)/(ε_r²·δ) [46], where δ, p_f and ε_r are the result threshold, failure probability and relative error. Normally, δ, p_f and ε_r are set to 1/|V|, 1/|V| and 0.5 respectively [46, 29, 31]. This makes w huge for large graphs like Twitter and leads to poor performance. In our experiments, we favor Monte-Carlo and set w to a smaller value, i.e., 6|V|, to improve its performance by trading accuracy (with δ = 1/|V|, p_f = 2/e, ε_r = 0.71).

Table 2: All parameters used in the experiments. The default values are highlighted.
Parameter                    Values
α                            0.15
ε                            10^−5, 10^−6, 10^−7, 10^−8, 10^−9, 10^−10
source vertex s              randomly chosen vertices with Top-10, Top-1K and Top-1M out-degrees
batch size                   1%, 0.1% and 0.01% of the sliding window size
No. of random walk samples   6|V| (|V| is the number of vertices)
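As a back-of-the-envelope check of this choice (our own arithmetic, using natural logarithms and ε_r² ≈ 0.5):

    w ≥ 3·log(2/p_f)/(ε_r²·δ) = 3·log(2/(2/e))/(0.71²·(1/|V|)) = (3·1/0.504)·|V| ≈ 6|V|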
compared with the unoptimized version for GPUs and multi- Parameter setting. We summarize the parameters used cores. Each of the techniques provides significant perfor- in the experiments together with their highlighted default mance improvement over the unoptimized version. For eager values in Table 2. For a source vertex, No. of random walk propagation, the speedup comes from the mitigation of par- 3log(2/pf ) allel loss. The number of required operations is reduced. For samples maintained by Monte-Carlo is w 2 [46], ✏r local duplicate detection, it achieves speedups by reducing where , pf , ✏r are the result threshold, failure probability synchronization e↵orts at runtime. Notice that both tech- and relative error. Normally, , pf and ✏r are set to 1/ V , niques show more significant improvements in larger data 1/ V and 0.5 respectively [46, 29, 31]. This makes w huge| | | | sets. This is because large data sets lead to large frontier for large graphs like Twitter and leads to poor performance. sizes, which introduce more severe parallel loss and incur In our experiment, we favor Monte-Carlo and set w to higher frontier generation costs. asmallervalue,i.e.6V to improve the performance by | | trading accuracies (with =1/ V , pf =2/e, ✏r =0.71). | | 5.3 Stream Throughtput Experimental Environment. We conduct all the experi- In this subsection, we compare the stream throughput be- ments on two machines. tween the parallel local update and various baselines. We For CPU-based implementations (i.e. CPU-Base, CPU-Seq, report the number of edges consumed per second after run- • CPU-MT, Monte-Carlo and Ligra), the experiments are ning for 5 minutes. Meanwhile, we vary the batch size to performed on a machine running Ubuntu 16.04 with four evaluate its e↵ect over the stream throughput. 10-core Intel Xeon E7-4820 processors clocked at 1.9 GHz In Figure 5, we can see the parallel local update achieves and 128GB main memory. Each core owns a private 32 significant speedups over the sequential local update. Com- KB L1 cache and a private 256 KB L2 cache. Every 10 pared with CPU-Base, which is the state-of-the-art approach [49], GPU and CPU-MT achieve the speedups of more than 3https://snap.stanford.edu/data/ 100-7000 and 30-4000 times respectively with the batch size

Figure 4: Effect of optimizations for the parallel local update.

Figure 5: The comparison of streaming throughput.

Compared with CPU-Seq, GPU and CPU-MT achieve speedups of more than 12-74 and 6-20 times respectively. The performance improvement of our approach comes from two dimensions: one is the parallel push and the other is the batch update. First, GPU and CPU-MT achieve speedups over CPU-Seq, which shows that the parallel push effectively accelerates the local push. Second, as the batch size increases, the throughputs of GPU and CPU-MT grow, because the batch update raises the degree of parallelism. With a large batch size, there are more workloads in each iteration, so parallel threads can process more frontier vertices at once. As CPU-Base is significantly slower than the other approaches, we omit the results for CPU-Base in the rest of the experiments.

Figure 5 also shows that our parallel local update has superior performance over the incremental Monte-Carlo approach. Compared with Monte-Carlo, GPU and CPU-MT achieve speedups of 18-254 and 9-135 times respectively with a batch size of 10^5. We attribute the unsatisfactory performance of Monte-Carlo to the huge overheads of incrementally maintaining the random walk samples. The overheads come from two sources. First, the number of random walk samples w is huge, especially for large graphs like Twitter (although we set w to a value smaller than in existing works [46, 29, 31]). Second, the incremental maintenance of random walk samples needs to track auxiliary data structures. The Monte-Carlo approach on static graphs only needs to maintain the number of visits made by random walks to each vertex. However, the incremental approach must also keep track of the traces of the random walk samples, i.e., the vertices visited, and an inverted index that records the set of random walks passing through each vertex. When new edges arrive, the traces of the affected samples and the inverted index have to be updated by regenerating random walks on the new graph. This updating process requires frequent atomic accesses to shared memory. Thus, these auxiliary data structures are large and their maintenance incurs a huge cost.

As shown in Figure 5, our specialized implementations (GPU and CPU-MT) are better than the implementation on the general graph processing system with vertex-centric abstraction (Ligra). This is because such systems lack application knowledge to perform specific optimizations such as eager propagation and local duplicate detection.

5.4 Effects of Parameters

In this subsection, we study how the parameter setups affect the performance of the parallel local update.

Varying ε. The choice of ε determines the accuracy of the solution, and a smaller ε requires more computation. We vary ε in the range from 10^−4 to 10^−10 and show the latency of window slides in Figure 6. As ε decreases, the latency of all approaches increases dramatically because of the additional computation. The speedups of the parallel approach over the sequential approach become larger for smaller ε. This is because a smaller ε creates a larger frontier for the parallel approach to work on. With the parallel speedups, our approach can achieve fairly low latency with high accuracy.

Varying source vertex selection. The degree of the source vertex can affect the performance of the local push. Starting from a source vertex s with a low degree, the vertices that have high PageRank values w.r.t. s are mostly in the local cluster of s. Hence, if the edge updates do not affect the structure of the local cluster of s, the local push only needs a few operations for the incremental computation. On the contrary, if s has a high degree, a small number of edge updates can severely change the current estimate and the residual vector, since s has influence over a wide range of vertices. In Figure 7, we choose source vertices with various degrees (top-10, top-1K and top-1M out-degree) and validate that as the degrees increase, the latencies of all approaches become larger. We also find that the improvement of our parallel approach is not significant when the source vertex has a small degree. This phenomenon can be understood as the parallel workload being reduced for small-degree vertices compared with large-degree vertices. In practice, one is often interested in the PPR of large-degree source vertices, and our parallel approach is superior in such cases.

Varying batch sizes. To study the effect of different batch sizes, we vary the batch size as different ratios of the sliding window size, i.e., 1%, 0.1% and 0.01%. We show the latency of all approaches in Figure 8. As the batch size of the window slide decreases, the latencies of all approaches get smaller since fewer updates are required. But the parallel approaches GPU and CPU-MT still keep their speedups over CPU-Seq for smaller batches, which shows that our parallelization is effective and robust to different batch sizes.

Figure 6: Effect of ε for the parallel push.

Figure 7: Effect of the source vertex. It is chosen from the vertices having top-10, top-1K and top-1M out-degrees.

Table 4: Profiling metrics for resource consumption 6. RELATED WORKS Approach Metrics Personalized PageRank. PPR is extensively explored Achieved warp occupancy (WO) GPU in the literature [49, 38, 10, 31, 6, 5, 9, 29, 11]. They can Global load eciency (GLD) be categorized into three lines of schemes. L2 data cache miss rate (L2DCM) The first scheme is the power iteration [39, 48, 32]. It is CPU-MT L3 cache miss rate (L3CM) slow [38] since it requires ⌦(m) for updating the PPR vector ratio of cycles stalled on resource (STL) once, where m is the number of edges in the graph. The second scheme is based on Monte-Carlo [7, 10, 9, 5.5 Resource Consumption 27]. It simulates w random walk samples from each ver- We study the e↵ects of di↵erent optimizations and param- tex, and estimates the PPR value as the number of visits eters on the resource consumption of the hardwares, includ- to a vertex made by these samples. On the arrival of any ing the memory eciency and thread utilization. To profile graph update, say u v, the samples that pass through the execution of our approaches, we use nvprof in CUDA u are found, and then! redo the random walks on the new toolkit [37] for GPU and PAPI [2] for CPU-MT.Table4lists graph. The analysis in [10] shows that the expected number the profiling metrics used in the experiments. Note that for of random walk samples to be updated is only O(w log(k)) GPU, WO is the ratio of the average warps per active cycle, assuming k edges randomly arrive for an arbitrary directed· while GLD is the ratio of the requested global memory load 3log(2/pf ) graph [10]. However, w is required to be at least 2 throughput to the maximum load throughput. ✏r [46], where ,p , ✏ are the result threshold, failure prob- Figure 9 shows the profiling results with varying batch f r ability and relative error ratio respectively such that when size. For GPU, we can observe that the achieved warp oc- ⇡s(x) > , Ps(x) ⇡s(x) ✏r⇡s(x) with probability at least cupancy gets larger as the batch size increases. This is be- | |  1 pf .Since and pf are set to 1/ V normally [46, 31, 29], cause a large batch size provides more workloads for the nlog(n) | | w is O( 2 ), which implies the overheads for storage and parallel push, and more warps can be utilized for the execu- ✏r tion of GPU kernels. Meanwhile, the global store eciency computation can be significant especially for large graphs. decreases with the increase of the batch size. As traversing The third scheme is the local update [6, 5, 11, 38, 49, the graph structure leads to many random memory accesses, 31, 29, 28, 30], which is the one we parallelize in this work. when there are more workloads, the parallel push performs Many previous works [6, 5, 11] propose variants of the lo- more random memory accesses, which degenerate the global cal update scheme for di↵erent applications such as graph memory throughput a bit. For the same reason, L2 data partitioning and web search, but the PPR is computed in cache miss rate and L3 cache miss rate of CPU-MT increase static graphs. Recent works [38, 49] propose incremental ap- a bit as the batch size gets large. For CPU-MT,wecanalso proaches for dynamic PPR computation, which can outper- see more CPU cycles are spent waiting for resource as the form the Monte-Carlo-based approaches. Amortized analy- batch size increases, since there are more memory accesses. sis in [49] shows that dynamic maintenance of PPR vector is We also profile the execution with varying optimization tech- ecient. 
5.6 Scalability

We also evaluate the scalability of our parallel approach on multi-core architectures. We vary the number of cores used for parallelization and report the throughput of CPU-MT. Each experiment is run for 5 minutes with a batch size of 10^5. As shown in Figure 10, the performance scales as the number of cores increases.
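As an illustration of how such a core-count sweep can be driven (a minimal sketch using OpenMP; our CPU-MT harness is not necessarily built this way, and run_batches is a hypothetical stand-in for the real update-processing loop), one can pin the number of worker threads before timing each configuration:

    #include <omp.h>
    #include <chrono>
    #include <cstdio>

    // Hypothetical stand-in: process `batches` update batches of size `batch_size`
    // in parallel and return the number of edge updates handled.
    long long run_batches(int batches, int batch_size) {
        long long processed = 0;
        #pragma omp parallel for reduction(+ : processed)
        for (int b = 0; b < batches; ++b) {
            processed += batch_size;  // placeholder for the real per-batch work
        }
        return processed;
    }

    int main() {
        const int batches = 100, batch_size = 100000;  // 10^5 updates per batch
        for (int cores : {1, 2, 4, 8, 16, 32}) {
            omp_set_num_threads(cores);                // restrict the worker pool
            auto start = std::chrono::steady_clock::now();
            long long processed = run_batches(batches, batch_size);
            std::chrono::duration<double> secs = std::chrono::steady_clock::now() - start;
            std::printf("%2d cores: %.0f updates/s\n", cores, processed / secs.count());
        }
        return 0;
    }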

Figure 8: Effect of batch size.

Figure 9: Resource consumption with varying batch size.

Figure 10: Scalability on multi-cores.

6. RELATED WORKS

Personalized PageRank. PPR is extensively explored in the literature [49, 38, 10, 31, 6, 5, 9, 29, 11]. Existing approaches can be categorized into three lines of schemes.

The first scheme is the power iteration [39, 48, 32]. It is slow [38] since it requires Ω(m) work to update the PPR vector once, where m is the number of edges in the graph.

The second scheme is based on Monte-Carlo simulation [7, 10, 9, 27]. It simulates w random walk samples from each vertex and estimates a PPR value as the number of visits to a vertex made by these samples. On the arrival of any graph update, say u → v, the samples that pass through u are found and the random walks are redone on the new graph. The analysis in [10] shows that the expected number of random walk samples to be updated is only O(w · log(k)), assuming k edges randomly arrive for an arbitrary directed graph [10]. However, w is required to be at least 3 log(2/p_f) / (ε_r² δ) [46], where δ, p_f and ε_r are the result threshold, failure probability and relative error ratio respectively, such that whenever π_s(x) > δ, we have |P_s(x) − π_s(x)| ≤ ε_r · π_s(x) with probability at least 1 − p_f. Since δ and p_f are normally set to 1/|V| [46, 31, 29], w is O(n log(n) / ε_r²), which implies that the storage and computation overheads can be significant, especially for large graphs.

The third scheme is the local update [6, 5, 11, 38, 49, 31, 29, 28, 30], which is the one we parallelize in this work. Many previous works [6, 5, 11] propose variants of the local update scheme for different applications such as graph partitioning and web search, but they compute PPR on static graphs. Recent works [38, 49] propose incremental approaches for dynamic PPR computation, which can outperform the Monte-Carlo-based approaches. Amortized analysis in [49] shows that dynamic maintenance of the PPR vector is efficient. In a directed graph with a random edge permutation, or an undirected graph with arbitrary edge updates, the cost to maintain a reverse PPR vector for a random target vertex is only O(k + d/ε), where k is the number of edge updates and d is the average degree; given an undirected graph with arbitrary edge updates, the cost to maintain a forward PPR vector for a random start vertex is O(k + 1/ε). These complexities suggest that the maintenance cost is only O(1) per edge update plus the cost of computing the PPR vector once for the static graph. The variants of the local update scheme differ slightly in the local push, such as the convergence criteria [6, 49], the direction of graph traversal [6, 38, 49] and the amount of residual used for the local push [6, 5]. The variant of the local update we parallelize in this work is from [5, 49], but our parallel approach can be naturally extended to other variants, since they share many similarities. On the arrival of graph updates, they repair the estimates and residuals of the affected vertices to restore the invariant; the local push then iteratively performs push operations to maintain the estimate and residual vectors until the convergence condition is satisfied.
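For concreteness, the following is a minimal sequential sketch of a forward local push in the style of [5, 49]. The adjacency-list layout, the teleport probability alpha, the push condition r[u] > eps * deg(u), and the handling of dangling vertices are our own simplifying assumptions; the parallel batch algorithm in this paper differs in how the frontier is generated and synchronized.

    #include <cstddef>
    #include <queue>
    #include <vector>

    // Sequential forward push: p approximates the PPR vector of the source s and
    // r holds the residuals. The caller seeds r[s] = 1 and all other entries to 0.
    // A vertex is pushed while r[u] > eps * deg(u); dangling vertices simply
    // accumulate residual in this sketch.
    void forward_push(const std::vector<std::vector<int>>& adj,
                      std::vector<double>& p, std::vector<double>& r,
                      double alpha, double eps) {
        std::queue<int> frontier;
        for (std::size_t u = 0; u < adj.size(); ++u)
            if (!adj[u].empty() && r[u] > eps * adj[u].size())
                frontier.push(static_cast<int>(u));

        while (!frontier.empty()) {
            int u = frontier.front();
            frontier.pop();
            double deg = static_cast<double>(adj[u].size());
            if (r[u] <= eps * deg) continue;            // stale entry, already pushed

            double residual = r[u];
            p[u] += alpha * residual;                   // absorb alpha fraction into the estimate
            r[u] = 0.0;
            double share = (1.0 - alpha) * residual / deg;  // spread the rest to out-neighbors
            for (int v : adj[u]) {
                bool was_below = r[v] <= eps * adj[v].size();
                r[v] += share;
                // enqueue v only when its residual first crosses the threshold
                if (was_below && !adj[v].empty() && r[v] > eps * adj[v].size())
                    frontier.push(v);
            }
        }
    }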
Recent works [27, 18, 46] rely on effective indexing methods to accelerate PPR computation. PowerWalk [27] simulates random walk samples within a memory budget during offline index construction; for online queries, it reuses these samples and invokes further iterative computation to meet the accuracy requirement. However, this Monte-Carlo-based approach can incur large overheads to maintain the stored random walk samples on dynamic graphs. Guo et al. [18] design a distributed local update approach that makes use of pre-computed PPR vectors of selected hub vertices. Wang et al. [46] develop an elastic indexing approach that trades off accuracy, query time and memory budget; it also exploits the pre-computed PPR vectors of hub vertices to aid the local update. Our approach is helpful for both works [18, 46] to maintain the indexed PPR vectors on dynamic graphs.

To the best of our knowledge, there are only two works on parallelizing the local update for PPR computation. Shun et al. [44] implement various local update algorithms for graph clustering on multi-core architectures; Perozzi et al. [40] study the PageRank-Nibble algorithm [6] in the distributed setting. However, these two works only discuss computation on static graphs.

Parallel Graph Processing. Many graph processing systems [36, 23, 47, 42, 16] have been proposed to accelerate graph applications by utilizing GPUs and multi-core CPUs. They provide a general vertex-centric abstraction that can express most graph algorithms, but they lack the ability to support application-specific optimizations. For example, eager propagation requires active vertices to absorb incoming messages so as to use up-to-date residuals, which cannot be supported in the bulk synchronous processing model; local duplicate detection exploits monotonicity to reduce synchronization overheads, but the graph systems cannot leverage this knowledge for frontier generation. While most existing graph systems only support static graphs, recent frameworks [14, 41] can efficiently handle the dynamic scenario with architecture-optimized graph storage schemes. How to integrate our work with these frameworks is an interesting direction for future work.

7. CONCLUSION

We propose a parallel local update approach for dynamic PPR computation over high-speed graph streams. Our approach adopts a batch processing method for incremental updates that reduces synchronization costs within a batch and generates more workload for parallelism. We propose optimization techniques to improve the runtime performance of the parallel push: eager propagation practically reduces the number of push operations, and local duplicate detection avoids the synchronization overheads of merging duplicate vertices during frontier generation. The experimental evaluation shows that our parallel approach on GPU architectures and multi-core CPUs achieves significant speedups over the state-of-the-art sequential approach.

Acknowledgments. The project is supported by a grant, MOE2017-T2-1-141, from the Singapore Ministry of Education.

8. REFERENCES

[1] Extended version. http://www.comp.nus.edu.sg/~wentian/paper/tr2017-ppr.pdf.
[2] PAPI. http://icl.utk.edu/papi/.
[3] Twitter usage statistics. http://www.internetlivestats.com/twitter-statistics/. Accessed 18-April-2017.
[4] V. Agarwal, F. Petrini, D. Pasetto, and D. A. Bader. Scalable graph exploration on multicore processors. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–11. IEEE Computer Society, 2010.
[5] R. Andersen, C. Borgs, J. Chayes, J. Hopcraft, V. S. Mirrokni, and S.-H. Teng. Local computation of pagerank contributions. In International Workshop on Algorithms and Models for the Web-Graph, pages 150–165. Springer, 2007.
[6] R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pages 475–486. IEEE, 2006.
[7] K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte carlo methods in pagerank computation: When one iteration is sufficient. SIAM Journal on Numerical Analysis, 45(2):890–904, 2007.
[8] L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 635–644. ACM, 2011.
[9] B. Bahmani, K. Chakrabarti, and D. Xin. Fast personalized pagerank on mapreduce. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 973–984. ACM, 2011.
[10] B. Bahmani, A. Chowdhury, and A. Goel. Fast incremental and personalized pagerank. PVLDB, 4(3):173–184, 2010.
[11] P. Berkhin. Bookmark-coloring algorithm for personalized pagerank computing. Internet Mathematics, 3(1):41–62, 2006.
[12] T. David, R. Guerraoui, and V. Trigonakis. Everything you always wanted to know about synchronization but were afraid to ask. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 33–48. ACM, 2013.
[13] A. Davidson, S. Baxter, M. Garland, and J. D. Owens. Work-efficient parallel gpu methods for single-source shortest paths. In Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, pages 349–359. IEEE, 2014.
[14] D. Ediger, R. McColl, J. Riedy, and D. A. Bader. Stinger: High performance data structure for streaming graphs. In High Performance Extreme Computing (HPEC), 2012 IEEE Conference on, pages 1–5. IEEE, 2012.
[15] S. L. Feld. Why your friends have more friends than you do. American Journal of Sociology, 96(6):1464–1477, 1991.
[16] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In OSDI, volume 12, page 2, 2012.
[17] S. Guha and A. McGregor. Graph synopses, sketches, and streams: A survey. PVLDB, 5(12):2030–2031, 2012.
[18] T. Guo, X. Cao, G. Cong, J. Lu, and X. Lin. Distributed algorithms on exact personalized pagerank. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 479–494. ACM, 2017.
[19] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh. Wtf: The who to follow service at twitter. In Proceedings of the 22nd international conference on World Wide Web, pages 505–514. ACM, 2013.
[20] M. Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems (TOPLAS), 13(1):124–149, 1991.
[21] S. Hong, T. Oguntebi, and K. Olukotun. Efficient parallel graph exploration on multi-core cpu and gpu. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pages 78–88. IEEE, 2011.
[22] G. Jeh and J. Widom. Scaling personalized web search. In Proceedings of the 12th international conference on World Wide Web, pages 271–279. ACM, 2003.
[23] F. Khorasani, K. Vora, R. Gupta, and L. N. Bhuyan. Cusha: vertex-centric graph processing on gpus. In Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, pages 239–252. ACM, 2014.
[24] C. E. Leiserson. The cilk++ concurrency platform. In Design Automation Conference, 2009. DAC'09. 46th ACM/IEEE, pages 522–527. IEEE, 2009.
[25] H. Liu and H. H. Huang. Enterprise: Breadth-first graph traversal on gpus. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, page 68. ACM, 2015.
[26] H. Liu, H. H. Huang, and Y. Hu. ibfs: Concurrent breadth-first search on gpus. In Proceedings of the 2016 International Conference on Management of Data, pages 403–416. ACM, 2016.
[27] Q. Liu, Z. Li, J. Lui, and J. Cheng. Powerwalk: Scalable personalized pagerank via random walks with vertex-centric decomposition. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pages 195–204. ACM, 2016.
[28] P. Lofgren. Efficient algorithms for personalized pagerank. arXiv preprint arXiv:1512.04633, 2015.
[29] P. Lofgren, S. Banerjee, and A. Goel. Personalized pagerank estimation and search: A bidirectional approach. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pages 163–172. ACM, 2016.
[30] P. Lofgren and A. Goel. Personalized pagerank to a target node. arXiv preprint arXiv:1304.4658, 2013.
[31] P. A. Lofgren, S. Banerjee, A. Goel, and C. Seshadhri. Fast-ppr: scaling personalized pagerank estimation for large graphs. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1436–1445. ACM, 2014.
[32] T. Maehara, T. Akiba, Y. Iwata, and K.-i. Kawarabayashi. Computing personalized pagerank quickly by exploiting graph structures. PVLDB, 7(12):1023–1034, 2014.
[33] A. McLaughlin and D. A. Bader. Scalable and high performance betweenness centrality on the gpu. In Proceedings of the International Conference for High performance computing, networking, storage and analysis, pages 572–583. IEEE Press, 2014.
[34] D. Merrill, M. Garland, and A. Grimshaw. High-performance and scalable gpu graph traversal. ACM Transactions on Parallel Computing, 1(2):14, 2015.
[35] R. Nasre, M. Burtscher, and K. Pingali. Data-driven versus topology-driven irregular computations on gpus. In Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on, pages 463–474. IEEE, 2013.
[36] D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 456–471. ACM, 2013.
[37] J. Nickolls, I. Buck, M. Garland, and K. Skadron. Scalable parallel programming with cuda. Queue, 6(2):40–53, 2008.
[38] N. Ohsaka, T. Maehara, and K.-i. Kawarabayashi. Efficient pagerank tracking in evolving networks. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 875–884. ACM, 2015.
[39] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: bringing order to the web. 1999.
[40] B. Perozzi, C. McCubbin, and J. T. Halbert. Scalable graph clustering with parallel approximate pagerank. Social Network Analysis and Mining, 4(1):1–11, 2014.
[41] M. Sha, Y. Li, B. He, and K.-L. Tan. Accelerating dynamic graph analytics on gpus. PVLDB, 11(1), 2017.
[42] J. Shun and G. E. Blelloch. Ligra: a lightweight graph processing framework for shared memory. In ACM Sigplan Notices, volume 48, pages 135–146. ACM, 2013.
[43] J. Shun, L. Dhulipala, and G. Blelloch. A simple and practical linear-work parallel algorithm for connectivity. In Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures, pages 143–153. ACM, 2014.
[44] J. Shun, F. Roosta-Khorasani, K. Fountoulakis, and M. W. Mahoney. Parallel local graph clustering. PVLDB, 9(12):1041–1052, 2016.
[45] M. Then, M. Kaufmann, F. Chirigati, T.-A. Hoang-Vu, K. Pham, A. Kemper, T. Neumann, and H. T. Vo. The more the merrier: Efficient multi-source graph traversal. PVLDB, 8(4):449–460, 2014.
[46] S. Wang, Y. Tang, X. Xiao, Y. Yang, and Z. Li. Hubppr: effective indexing for approximate personalized pagerank. PVLDB, 10(3):205–216, 2016.
[47] Y. Wang, A. Davidson, Y. Pan, Y. Wu, A. Riffel, and J. D. Owens. Gunrock: A high-performance graph processing library on the gpu. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, page 11. ACM, 2016.
[48] J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213, 2015.
[49] H. Zhang, P. Lofgren, and A. Goel. Approximate personalized pagerank on dynamic graphs. arXiv preprint arXiv:1603.07796, 2016.
