Fast Parallel PageRank: A Linear System Approach

David Gleich∗, Stanford University, ICME, Stanford, CA 94305, [email protected]
Leonid Zhukov, Yahoo!, 701 First Ave, Sunnyvale, CA 94089, [email protected]
Pavel Berkhin, Yahoo!, 701 First Ave, Sunnyvale, CA 94089, pberkhin@yahoo-inc.com

∗ Work performed while at Yahoo!

ABSTRACT

In this paper we investigate the convergence of iterative stationary and Krylov subspace methods for the PageRank linear system, including the convergence dependency on teleportation. We demonstrate that linear system iterations converge faster than the simple power method and are less sensitive to changes in teleportation.

In order to perform this study we developed a framework for parallel PageRank computing. We describe the details of the parallel implementation and provide experimental results obtained on a 70-node Beowulf cluster.

Categories and Subject Descriptors

G.1.3 [Numerical Analysis]: Numerical Linear Algebra; D.2 [Software]: Software Engineering; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms

Algorithms, Performance, Experimentation

Keywords

PageRank, Eigenvalues, Linear Systems, Parallel Computing

1. INTRODUCTION

The PageRank algorithm, a method for computing the relative rank of web pages based on the Web link structure, was introduced in [25, 8] and has been widely used since then. PageRank computations are a key component of modern Web search ranking systems. For a general review of PageRank computing see [22, 7].

Until recently, the PageRank vector was primarily used to calculate a global importance score for each page on the web. These scores were recomputed for each new Web graph crawl.
Recently, significant attention has been given to topic-specific and personalized PageRanks [14, 16]. In both cases one has to compute multiple PageRanks corresponding to various teleportation vectors for different topics or user preferences.

PageRank is also becoming a useful tool applied in many Web search technologies and beyond, for example, spam detection [13], crawler configuration [10], or trust networks [20]. In this setting many PageRanks corresponding to different modifications – such as graphs with a different level of granularity (HostRank) or different link weight assignments (internal, external, etc.) – have to be computed. For each technology, the critical computation is the PageRank-like vector of interest. Thus, methods to accelerate and parallelize these computations are important. Various methods to accelerate the simple power iteration process have already been developed, including an extrapolation method [19], a block-structure method [18], and an adaptive method [17].

Traditionally, PageRank has been computed as the principal eigenvector of a probability transition matrix. In this paper we consider the PageRank linear system formulation and its iterative solution methods. An efficient solution of a linear system strongly depends on the proper choice of iterative methods and, for high performance solvers, on the computation architecture as well. There is no best overall iterative method; one of the goals of this paper is to investigate the best method for the particular class of problems arising from PageRank computations on parallel architectures.

It is well known that the random teleportation used in PageRank strongly affects the convergence of power iterations [15]. It has also been shown that high teleportation can help spam pages to accumulate PageRank [13], but a reduction in teleportation typically hampers the convergence of standard power methods. In this paper, we investigate how the convergence of the linear system is affected by a reduction in the degree of teleportation.

To perform multiple numerical experiments on real Web graph data, we developed a system that can compute results within minutes on Web graphs with one billion or more links. When constructing our system, we did not attempt to optimize each method and instead chose an approach that is amenable to working with multiple methods in a consistent manner while still easily adaptable to new methods. An example of this trade-off is that our system stores edge weights for each graph, even though many Web graphs do not use these weights.

The remainder of the paper is organized as follows. The next section, PageRank Linear System, contains an analytical derivation of a linear system version of PageRank, followed by a discussion of numerical methods. We then provide details on our parallel implementation. The Numerical Experiments section contains experimental results, including timings of the iterative methods on various size graphs and an analysis of error metrics.

Method     IP     SAXPY  MV  Storage
PAGERANK   –      1      1   M + 3v
JACOBI     –      1      1   M + 3v
GMRES      i + 1  i + 1  1   M + (i + 5)v
BiCG       2      5      2   M + 10v
BiCGSTAB   4      6      2   M + 10v

Table 1: Computational requirements. Operations per iteration: IP counts inner products, SAXPY counts AXPY operations, and MV counts matrix-vector multiplications; Storage counts the number of matrices (M) and vectors (v) required by the method.

Figure 1: Chart of computational methods.

2. PAGERANK LINEAR SYSTEM

2.1 PageRank Formulation

Consider a Web graph adjacency matrix A with elements A_ij equal to 1 when there is a link i → j and equal to 0 otherwise. Here i, j = 1 : n and n is the number of Web pages. For pages with a non-zero number of out-links, deg(i) > 0, the rows of A can be normalized (made row-stochastic) by setting P_ij = A_ij / deg(i). Assuming there are no dangling pages (vide infra), the PageRank x can be defined as a limiting solution of the iterative process

x_j^(k+1) = Σ_i P_ij x_i^(k) = Σ_{i→j} x_i^(k) / deg(i).   (1)

At each iterative step, the process distributes page authority weights equally along the out-links. The PageRank is defined as a stationary point of the transformation from x^(k) to x^(k+1) defined by (1). This transfer of authority corresponds to the Markov chain transition associated with the random surfer model, in which a random surfer on a page follows one of its out-links uniformly at random.

The Web contains many pages without out-links, called dangling nodes. Dangling pages present a problem for the mathematical PageRank formulation. A review of various approaches to dealing with dangling pages can be found in [11]. One way to overcome this difficulty is to slightly change the transition matrix P to a truly row-stochastic matrix

P′ = P + d · v^T,   (2)

where d is the dangling page indicator (d_i = 1 if deg(i) = 0 and d_i = 0 otherwise), and v is some probability distribution over pages. This model means that the random surfer jumps from a dangling page according to the distribution v. For this reason the vector v is called a teleportation distribution. Originally uniform teleportation, v_i = 1/n, was used, but non-uniform teleportation received much attention after the advent of PageRank personalization [14, 16].

The classic simple power iterations (1) starting at an arbitrary initial guess converge to a principal eigenvector of P under two conditions specified in the Perron-Frobenius theorem. The first condition, matrix irreducibility or strong connectivity of the graph, is not satisfied for a Web graph, while the second condition, aperiodicity, is routinely fulfilled (once the first condition is satisfied). The easiest way to achieve strong connectivity is to add to every page, dangling or not, a small degree of teleportation:

P″ = cP′ + (1 − c)ev^T,   e = (1, ..., 1),   (3)

where c is a teleportation coefficient. In practice, 0.85 ≤ c < 1. After these modifications, matrix P″ is row-stochastic and irreducible, and therefore the simple power iterations

x^(k+1) = P″^T x^(k)   (4)

for the eigen-system P″^T x = x converge to its principal eigenvector.

Now, combining Eqs. (2)–(4), we get

[cP^T + c(vd^T) + (1 − c)(ve^T)] x = x.   (5)

Noting that (e^T x) = (x^T e) = ||x||_1 = ||x|| (henceforth, all norms will be assumed to be 1-norms), we can derive a convenient identity

(d^T x) = ||x|| − ||P^T x||.   (6)

Then, Eq. (5) can be written as a linear system

(I − cP^T) x = k v,   (7)

where k = k(x) = ||x|| − c||P^T x|| = (1 − c)||x|| + c(d^T x). A particular value of k only results in a rescaling of x, which has no effect on page ranking. The solution can always be normalized to be a probability distribution by x/||x||. This brings us to the point that equation (7) with a fixed k constitutes a linear system formulation of PageRank. In what follows we simply set k = 1 − c. We also introduce the notations A = I − cP^T and b = k v = (1 − c)v; thus the linear system under consideration has the standard form Ax = b.

The non-normalized solution to the linear system has the advantage of depending linearly on the teleportation distribution. This property is important for blending topical or otherwise personalized PageRanks. For this linear system, we study the performance of advanced iterative methods in a parallel environment.

Casting PageRank as a linear system was suggested by Arasu et al. [2], where Jacobi, Gauss-Seidel, and Successive Over-Relaxation iterative methods were considered. Numerical solutions for various Markov chain problems are also investigated in [26].
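To make the derivation concrete, the following minimal dense-matrix sketch (our illustration only, not the parallel implementation described later; the random graph, seed, and use of numpy are arbitrary choices) builds P, d, and v for a small graph and checks that the normalized solution of the linear system (7) with k = 1 − c matches the power iterations (4):

```python
import numpy as np

np.random.seed(0)
n, c = 50, 0.85
A = (np.random.rand(n, n) < 0.1).astype(float)   # random adjacency matrix
np.fill_diagonal(A, 0)

deg = A.sum(axis=1)
P = np.divide(A, deg[:, None], out=np.zeros_like(A), where=deg[:, None] > 0)
d = (deg == 0).astype(float)                     # dangling page indicator
v = np.full(n, 1.0 / n)                          # uniform teleportation

# Power iterations (4): x <- P''^T x, with P'' = c(P + d v^T) + (1 - c) e v^T
x = v.copy()
for _ in range(200):
    x = c * (P.T @ x + v * (d @ x)) + (1 - c) * v * x.sum()

# Linear system (7) with k = 1 - c: (I - c P^T) y = (1 - c) v
y = np.linalg.solve(np.eye(n) - c * P.T, (1 - c) * v)

print(np.allclose(x, y / y.sum()))   # same PageRank after normalization
```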
2.2 Iterative Methods

The PageRank linear system matrix A = I − cP^T is very large, sparse, and non-symmetric. Solution of the linear system Eq. (7) by a direct method is not feasible due to the matrix size and computational resources. Sparse LU factorization [23] can still be considered for smaller size problems (or subproblems), but it does create additional fill-in. It is also notoriously hard to parallelize.

In this paper we concentrate on the use of iterative methods [12, 3, 6]. There are two main requirements for the iterative linear solver: i) it should work with nonsymmetric matrices and ii) it should be easily parallelizable. Thus, from the stationary methods we use Jacobi iterations (in particular, since the matrix is strongly diagonally dominant), and from the non-stationary methods we have chosen several Krylov subspace methods. Figure (1) presents a chart of the methods used in this paper.

Power iterations. Simple power iterations (4) are the classic and most widely used process for finding PageRank. They have a convergence rate equal to c [15] (that is, the second eigenvalue of the matrix P″). A computationally efficient formulation of these iterations can be found in [25].

Jacobi iterations. The Jacobi process is the simplest stationary iterative method and is given by

x_{k+1} = cP^T x_k + b.   (8)

Jacobi iterations formally result in the geometric series x = Σ_{k≥0} (cP^T)^k b when started at x_0 = b, and thus also have a rate of convergence of at least c. It is indeed equal to c on graphs without dangling pages, in which case Jacobi and power iterations are the same.

Krylov subspace methods. We also consider a set of Krylov subspace methods to solve the linear system Eq. (7). These methods are based on certain minimization procedures and only use the matrix through matrix-vector multiplication. Detailed descriptions of the algorithms are available in [6] and [4]. In this study we have chosen several Krylov methods satisfying our criteria:

- Generalized Minimum Residual (GMRES)
- Biconjugate Gradient (BiCG)
- Quasi-Minimal Residual (QMR)
- Conjugate Gradient Squared (CGS)
- Biconjugate Gradient Stabilized (BiCGSTAB)
- Chebyshev Iterations.

The convergence of Krylov methods can be improved by the use of preconditioners. In this study we have used parallel Jacobi, block Jacobi, and additive Schwarz preconditioners.

In all of the above methods we start with the initial guess x^(0) = v. Note that the norm ||x^(k)|| is not preserved in the linear system iterations. If desired, one can normalize the solution to become a probability distribution by x^(k)/||x^(k)||.

The computational complexity and space requirements of the above methods are given in Table (1).
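For concreteness, a serial sketch of the stationary and Krylov solves on Ax = b follows. The scipy calls are stand-ins for the PETSc solvers used in our implementation, and the rtol keyword is an assumption about the installed scipy version (older releases spell it tol):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def pagerank_solves(P, c=0.85, tol=1e-7):
    """Solve (I - c P^T) x = (1 - c) v by Jacobi and two Krylov methods.

    P: scipy.sparse row-stochastic transition matrix from Section 2.1.
    Returns the (non-normalized) solutions; x/||x||_1 is the PageRank.
    """
    n = P.shape[0]
    A = (sp.eye(n) - c * P.T).tocsr()
    b = np.full(n, (1.0 - c) / n)        # uniform teleportation, v = e/n

    # Jacobi iterations, Eq. (8): x_{k+1} = c P^T x_k + b, started at x_0 = b
    x = b.copy()
    while np.linalg.norm(A @ x - b, 1) / np.linalg.norm(x, 1) > tol:
        x = c * (P.T @ x) + b

    # Krylov solvers (serial scipy stand-ins for the PETSc implementations)
    x_gmres, _ = spla.gmres(A, b, rtol=tol)
    x_bcgs, _ = spla.bicgstab(A, b, rtol=tol)
    return x, x_gmres, x_bcgs
```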
Name     | Nodes | Links | Storage Size
edu      | 2M    | 14M   | 176 MB
yahoo-r2 | 14M   | 266M  | 3.25 GB
uk       | 18.5M | 300M  | 3.67 GB
yahoo-r3 | 60M   | 850M  | 10.4 GB
db       | 70M   | 1B    | 12.3 GB
av       | 1.4B  | 6.6B  | 80 GB

Table 2: Basic statistics for the data sets used in the experiments.

2.3 Parallel Implementation

We chose a parallel implementation of the PageRank algorithms to meet the scalability requirement. We wanted to be able to compute PageRank vectors for large graphs (one billion links or more) quickly, to facilitate experiments with the PageRank equation. To that end, our goal was to keep the entire Web graph in memory on a distributed memory parallel computer while computing the PageRank vector. An alternate approach, explored in [24], is to store a piece of the Web graph on separate hard disks for each processor and iterate through these files as necessary.

Our parallel computer was a Beowulf cluster of RLX blades connected in a star topology with gigabit ethernet. We had seven chassis, each composed of 10 dual-processor Intel Xeon blades with 4 GB of memory (140 processors and 280 GB of memory in total). Each blade inside a chassis was connected to a gigabit switch, and the seven chassis were all connected to one switch.

The parallel PageRank codes use the Portable, Extensible Toolkit for Scientific Computation (PETSc) [4, 5] to implement basic linear algebra operations and basic iterative procedures on parallel sparse matrices. In particular, PETSc contains parallel implementations of many linear solvers, including GMRES, BiCGSTAB, and BiCG. While PETSc provided many of the operations necessary for working with sparse matrices, we still had to develop our own tools to load the Web graphs as parallel sparse matrices, distribute them among processors, and perform load balancing.

PETSc stores a sparse matrix in parallel by dividing the rows of the matrix among the p processors; each processor only stores a submatrix of the original matrix. Further, PETSc stores vectors in parallel, keeping on each processor only the rows of the vector corresponding to the matrix rows on that processor. That is, if we partition the n rows of a matrix A among p processors, then processor i will store both the matrix rows and the corresponding vector elements. This distribution allows us to load graphs or matrices which are larger than any individual processor can handle, as well as to operate on vectors which are larger than any processor can handle. While this method allows us to work on large data, it may involve significant off-processor communication for matrix-vector operations. See [5] for a discussion of how PETSc implements these operations.

Ideally, the best way to divide a matrix A among p processors is to try to partition and permute A in order to balance work and minimize communication between the processors. This classical graph partitioning problem is NP-hard, and approximate graph partitioning schemes such as ParMeTiS [21] and PJostle [27] do not work well on such large power-law data. Thus, we restricted ourselves to a simplified heuristic method to balance the work load between processors.

By default, PETSc balances the number of matrix rows on each processor by assigning each approximately n/p rows. For large Web graphs this scheme results in a dramatic imbalance in the number of non-zero elements stored on each processor, and often in at least one processor with more than 4 GB of data. To solve this problem we implemented a balancing scheme whereby we can choose how to balance the data among processors at run-time. We allow the user to specify two integer weights w_rows and w_nnz and try to balance the quantity w_rows·n_p + w_nnz·nnz_p among processors, where n_p and nnz_p are the number of rows and non-zeros on processor p, respectively. To implement this approximate balancing scheme, we always store entire rows on one processor and keep adding rows to a processor, incrementing a counter, until the counter exceeds the threshold (w_rows·n + w_nnz·nnz)/p. Typically, we used equal weighting between rows and non-zeros (w_rows = 1, w_nnz = 1). Due to the large number of low-degree nodes in our power-law datasets, this approximate scheme in fact gave us a good balance of work and memory among processors.
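The row-assignment heuristic itself is compact; below is a minimal serial sketch of it (the actual balancing happens inside our parallel graph loader), where row_nnz is assumed to hold the non-zero count of each matrix row:

```python
def balance_rows(row_nnz, p, w_rows=1, w_nnz=1):
    """Assign contiguous row blocks to p processors, approximately
    balancing w_rows * n_p + w_nnz * nnz_p across processors."""
    n, nnz = len(row_nnz), sum(row_nnz)
    threshold = (w_rows * n + w_nnz * nnz) / p   # per-processor budget
    ranges, start, counter = [], 0, 0
    for i, k in enumerate(row_nnz):
        counter += w_rows + w_nnz * k            # whole rows stay on one processor
        if counter >= threshold and len(ranges) < p - 1:
            ranges.append((start, i + 1))        # rows [start, i+1) -> one processor
            start, counter = i + 1, 0
    ranges.append((start, n))                    # last processor takes the rest
    return ranges

# e.g. balance_rows([100, 1, 1, 1, 50, 2], p=2) -> [(0, 1), (1, 6)]:
# the single heavy row forms its own block.
```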


Figure 2: Load balancing experiment for “yahoo-r2” graph.

Figure 3: Tracking all error metrics during convergence on “db” graph for BiCG method.

3. NUMERICAL EXPERIMENTS

3.1 Data

In our experiments we used six Web related directed graphs. The “av” graph is the Alta Vista 2003 crawl of the Web, the largest dataset used in the experiments. We also constructed and used three subsets of the “av” graph: the “edu” graph, containing web pages from the .edu domain, and the “yahoo-r2” and “yahoo-r3” graphs, containing pages reachable from yahoo.com by following two and three in- or out-links. We also used a Host graph “db” obtained from a Yahoo! crawl by aggregating (summing) the links between the pages within each host. Finally, the “uk” graph contains only UK sites and was obtained from [1]. Basic statistics for these graphs are given in Table (2).

3.2 Load Balancing Experiments

We have performed a set of load balancing experiments with the goal of finding the matrix partition and node distribution strategy that best optimizes the algorithm performance. Figure (2) shows the computational (run) time as a function of the number of processors for two load balancing schemes. The upper curve corresponds to an equipartitioned graph with n/p rows per processor, and the lower curve is obtained using our load balancing method.

It is clear that for the equipartitioned distribution, increasing the number of processors can lead to slower computations due to significant communication overhead. Web graphs contain clusters corresponding to blocks in the transition matrix. If the clusters are split naively over several processors, performance diminishes.

It is also seen from the bottom curve that when the communication and work load balance is approximately preserved, increasing the number of processors leads to a smaller computation time, but the speedup curve eventually saturates.

3.3 Equivalence of Metrics

When comparing the error among methods, we used the absolute error ||x^(k+1) − x^(k)|| for power iterations and the normalized residual error ||Ax^(k) − b||/||x^(k)|| for the linear systems. These are equivalent metrics, as the following analysis demonstrates. For a graph with no dangling nodes, using power iterations,

||x^(k+1) − x^(k)|| = ||cP^T x^(k) − x^(k) + (1 − c)v||,   (9)

and using a linear system,

||Ax^(k) − b|| = ||cP^T x^(k) − x^(k) + (1 − c)v||.   (10)

However, since we do not enforce ||x^(k)|| = 1 for the linear systems, we report ||Ax^(k) − b||/||x^(k)|| instead.

In Figure (3) we show the value of the above metrics, plus the relative error ||x^(k+1) − x^(k)||/||x^(k)||, for a linear system solved with the BiCG solver on the “db” Host graph. The metrics follow the same behavior, with the residual slightly more stable for some methods. Thus the residual was used to control and stop iterations when the desired tolerance was achieved. The other error metrics are almost linearly scaled because ||x^(k)|| approaches a constant as k increases.
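The equivalence is easy to check numerically. The following self-contained sketch (with an arbitrary synthetic row-stochastic matrix of our choosing, not one of the datasets above) asserts that Eqs. (9) and (10) agree at every power-iteration step:

```python
import numpy as np

np.random.seed(1)
n, c = 30, 0.85
P = np.random.rand(n, n)
P /= P.sum(axis=1, keepdims=True)    # row-stochastic, so no dangling nodes
v = np.full(n, 1.0 / n)
A, b = np.eye(n) - c * P.T, (1 - c) * v

x = v.copy()
for _ in range(20):
    x_next = c * (P.T @ x) + (1 - c) * v       # one power-iteration step
    lhs = np.linalg.norm(x_next - x, 1)        # Eq. (9)
    rhs = np.linalg.norm(A @ x - b, 1)         # Eq. (10)
    assert np.isclose(lhs, rhs)
    x = x_next
```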

3.4 Convergence Experiments

We have performed multiple experiments on the convergence of the iterative methods on a set of graphs. We have studied the rate of convergence, the number of iterations, and the total time taken by a method to converge to a given accuracy. The detailed results for all tested methods and graphs are presented in Table (3). The quasi-minimal residual, conjugate gradient squared, and Chebyshev methods did not converge on our data and we do not report additional results on these methods. In the table, the first line for each graph gives the number of iterations required for each method to converge to an absolute residual value of 10^−7. The second line shows the mean time per iteration at the given number of processors. For the Krylov subspace methods, no superscript means we used no preconditioner, whereas ∗ denotes a block Jacobi preconditioner and † denotes an additive Schwarz preconditioner. We only report results for the fastest (by time) method. All Krylov solvers failed on “yahoo-r3” due to memory overloads on only 60 processors.

The convergence behavior of the algorithms on the “uk” and “db” graphs is shown in Figures (4) and (5). In these figures, we plot the absolute error ||x^(k+1) − x^(k)|| for power iterations and the normalized residual ||Ax^(k) − b||/||x^(k)|| for the linear systems. Our analysis of the results shows that:

• Power and Jacobi methods have approximately the same rate and the most stable convergence pattern. This is an advantage of stationary methods, which perform the same amount of work in every iteration.

• Convergence of Krylov methods strongly depends on the graph and is non-monotonic.

• Although the Krylov methods have the highest average convergence rate and the fastest convergence by number of iterations, on some graphs the actual run time can be longer than the run time for simple power iterations.

• BiCGSTAB and GMRES have the highest rate of convergence and converge in the smallest number of iterations, with GMRES demonstrating more stable behavior.
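As a serial illustration of how such comparisons can be instrumented (our measurements use the PETSc solvers on the cluster; the callback-based iteration counting and the rtol keyword are assumptions about the installed scipy version), consider:

```python
import time
import numpy as np
import scipy.sparse.linalg as spla

def compare_solvers(A, b, tol=1e-7):
    """Iteration counts and wall-clock time for GMRES and BiCGSTAB,
    serial scipy stand-ins for the PETSc solvers used in the experiments."""
    results = {}
    for name, solver in (("gmres", spla.gmres), ("bicgstab", spla.bicgstab)):
        iters = 0
        def count(_):                 # invoked once per solver iteration
            nonlocal iters
            iters += 1
        start = time.perf_counter()
        x, info = solver(A, b, rtol=tol, callback=count)
        elapsed = time.perf_counter() - start
        results[name] = (iters, elapsed, info)   # info == 0 means converged
    return results
```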


Figure 4: Convergence of iterative methods on the “uk” Web graph: (a) convergence by iteration; (b) convergence by time.


Figure 5: Convergence of iterative methods on the “db” Host graph: (a) convergence by iteration; (b) convergence by time.

Graph    |            | PR     | Jacobi | GMRES | BiCG   | BCGS
edu      | iterations | 84     | 84     | 21†   | 44∗    | 21∗
         | 20 procs   | 0.06 s | 0.06 s | 0.6 s | 0.85 s | 0.41 s
yahoo-r2 | iterations | 71     | 65     | 12    | 20     | 10
uk       | iterations | 73     | 71     | 22∗   | 25∗    | 11∗
         | 60 procs   | 0.08 s | 0.14 s | 0.8 s | 0.78 s | 1.05 s
yahoo-r3 | iterations | 76     | 75     |       |        |
         | 60 procs   | 2.4 s  | 2.2 s  |       |        |
db       | iterations | 62     | 58     | 29    | 45     | 15∗
         | 60 procs   | 13 s   | 12 s   | 22 s  | 21 s   | 21 s
av       | iterations | 72     | 76     |       |        | 26
         | 140 procs  | 30 s   | 30 s   |       |        | 60 s

Table 3: Timing results. See discussion in text.
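To read the table, an approximate total solve time is the number of iterations times the mean time per iteration. The short calculation below (our arithmetic from the table entries) illustrates the run-time observation in Section 3.4: on “db” BiCGSTAB wins on total time, while on “uk” simple power iterations win despite needing many more iterations.

```python
# Approximate total solve times from Table 3 (iterations x secs/iteration).
db_pr, db_bcgs = 62 * 13, 15 * 21         # 806 s vs 315 s: BiCGSTAB faster
uk_pr, uk_bcgs = 73 * 0.08, 11 * 1.05     # 5.84 s vs 11.55 s: power faster
```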

Figure 6: Convergence on the full “av” Web graph.

3.5 Convergence and Teleportation

We have performed several experiments to learn the dependency of the convergence rate on the teleportation coefficient c ≥ 0.85. The results are shown in Figure (7). It shows that when c = 0.99, power iterations were not able to obtain the desired accuracy within a large number of iterations. For the values of c where each algorithm converges, the deterioration of the convergence rate for the Krylov subspace methods was much less pronounced than for simple power iterations. In other words, they are less affected by decreasing the teleportation level.

Figure 7: Convergence at high c.

3.6 Experiment with the “av” Web graph

In this section, we discuss the results of our parallel linear system PageRank on a full Web graph. We used a September 2003 crawl of the Web from Alta Vista with 1.4 billion nodes and 6.6 billion edges. Other experiments with this data have been done by Broder et al. [9]. Our main results on this graph are presented in Figure (6) and Table (3): the performance of simple power iterations and the BiCGSTAB method. For the linear system we show the normalized residual error, and for power iterations the absolute error.

A few tweaks were required to allow this large graph to work with our implementation. First, equal balancing between nodes and edges is insufficient for such large graphs and overloads the 4 GB memory limit on some nodes. Thus, we attempted to estimate the optimal balance for the algorithm. From Table (1), the PageRank algorithm requires storing the matrix and 3 vectors. Since the matrix is stored in compressed-row format with double sized floating point values, PETSc requires at least (12·nnz + 4n) + 3·(8n) bytes. Because there is some overhead in storing the parallel matrix, we typically estimate about 16·nnz total storage for the matrix (i.e. 33% overhead). This yields a row-to-non-zero weight ratio of approximately 2 to 1. Thus, for simple power iterations we set w_rows = 2 and w_nnz = 1 for our dynamic matrix distribution. A similar analysis for the BiCGSTAB algorithm yields the values w_rows = 5 and w_nnz = 1.

It is reported in [9] that an efficient implementation of the serial PageRank algorithm took 12.5 hours on this graph using a quad-processor Alpha 667 MHz server. A similar serial implementation on an 800 MHz Itanium takes approximately 10 hours. Our implementation takes 35.5 minutes (2128 secs) for PageRank and 28.2 minutes (1690 secs) for BiCGSTAB on the full cluster of 140 processors (70 machines). These parallel run times do not include the time necessary to load the matrix into memory, as repeated runs could be done with the in-memory matrix.
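The weight choice can be reproduced from the storage model above. The following sketch (our arithmetic; the per-row and per-non-zero byte counts are taken from the text and Table 1) recovers w_rows = 2 for power iterations and w_rows = 5 for BiCGSTAB:

```python
# Back-of-the-envelope derivation of the balancing weights in this section.
BYTES_PER_NNZ = 16   # ~12 bytes/non-zero (value + column index) + 33% overhead
ROW_PTR_BYTES = 4    # compressed-row format stores one row pointer per row
VEC_BYTES = 8        # one double-precision vector entry per row, per vector

def balance_weights(num_vectors):
    """Per-row memory cost relative to per-non-zero cost (w_nnz = 1)."""
    per_row = ROW_PTR_BYTES + num_vectors * VEC_BYTES
    return round(per_row / BYTES_PER_NNZ), 1

print(balance_weights(3))    # PageRank stores M + 3v  -> (2, 1)
print(balance_weights(10))   # BiCGSTAB stores M + 10v -> (5, 1)
```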
4. CONCLUSIONS

In this paper we have demonstrated that PageRank can be successfully computed using linear system iterative solvers. We have developed an efficient, scalable parallel implementation and studied Jacobi and Krylov subspace iterative methods. Our numerical results show that GMRES and BiCGSTAB are overall the best choice of solution methods for the PageRank class of problems and, for most graphs, provide faster convergence than power iterations. We have also demonstrated that the linear system PageRank can converge for much larger values of the teleportation coefficient c than standard power iterations.

5. ACKNOWLEDGMENTS

We would like to thank Farzin Maghoul for the Alta Vista web graph, Alexander Arsky for the Host graph, Jan Pedersen, Gary Flake, and Vivek Tawde for help and discussions, and Yahoo! Research Labs for providing valuable resources and technical support.

6. REFERENCES

[1] Webgraph datasets. http://webgraph-data.dsi.unimi.it.
[2] A. Arasu, J. Novak, A. Tomkins, and J. Tomlin. PageRank computation and the structure of the web: Experiments and algorithms. In WWW11, 2002.
[3] O. Axelsson. Iterative Solution Methods. Cambridge University Press, New York, NY, USA, 1994.
[4] S. Balay, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang. PETSc users manual. Technical Report ANL-95/11, Revision 2.1.5, Argonne National Laboratory, 2004.
[5] S. Balay, V. Eijkhout, W. D. Gropp, L. C. McInnes, and B. F. Smith. Efficient management of parallelism in object oriented numerical software libraries. In E. Arge, A. M. Bruaset, and H. P. Langtangen, editors, Modern Software Tools in Scientific Computing, pages 163–202. Birkhäuser Press, 1997.
[6] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia, PA, 1994.
[7] P. Berkhin. A survey on PageRank computing. Technical report, Yahoo! Inc., 2004.
[8] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117, 1998.
[9] A. Z. Broder, R. Lempel, F. Maghoul, and J. Pedersen. Efficient PageRank approximation via graph aggregation. In Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters, pages 484–485. ACM Press, 2004.
[10] J. Cho, H. García-Molina, and L. Page. Efficient crawling through URL ordering. Computer Networks and ISDN Systems, 30(1–7):161–172, 1998.
[11] N. Eiron, K. McCurley, and J. Tomlin. Ranking the web frontier. In WWW13, 2004.
[12] G. H. Golub and C. F. Van Loan. Matrix Computations (3rd ed.). Johns Hopkins University Press, 1996.
[13] Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen. Combating web spam with TrustRank. In 30th International Conference on Very Large Data Bases, pages 576–587, 2004.
[14] T. Haveliwala. Topic-sensitive PageRank, 2002.
[15] T. Haveliwala and S. Kamvar. The second eigenvalue of the Google matrix. Technical report, Stanford University, California, 2003.
[16] G. Jeh and J. Widom. Scaling personalized web search. In WWW12, pages 271–279. ACM Press, 2003.
[17] S. Kamvar, T. Haveliwala, and G. Golub. Adaptive methods for the computation of PageRank, 2003.
[18] S. Kamvar, T. Haveliwala, C. Manning, and G. Golub. Exploiting the block structure of the web for computing PageRank, 2003.
[19] S. Kamvar, T. Haveliwala, C. Manning, and G. Golub. Extrapolation methods for accelerating PageRank computations. In Proceedings of the Twelfth International World Wide Web Conference, 2003.
[20] S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina. The EigenTrust algorithm for reputation management in P2P networks. In Proceedings of the Twelfth International World Wide Web Conference, 2003.
[21] G. Karypis and V. Kumar. A coarse-grain parallel formulation of multilevel k-way graph-partitioning algorithm. In Proc. 8th SIAM Conference on Parallel Processing for Scientific Computing, 1997.
[22] A. N. Langville and C. D. Meyer. Deeper inside PageRank. Technical report, NCSU Center for Res. Sci. Comp., 2003.
[23] X. Li and J. Demmel. SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems, 2003.
[24] B. Manaskasemsak and A. Rungsawang. Parallel PageRank computation on a gigabit PC cluster. In Proceedings of the 18th International Conference on Advanced Information Networking and Applications, 2004.
[25] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
[26] W. J. Stewart. Numerical methods for computing stationary distributions of finite irreducible Markov chains. In W. Grassmann, editor, Advances in Computational Probability, chapter 4. Kluwer Academic Publishers, 1999.
[27] C. Walshaw, M. Cross, and M. G. Everett. Parallel dynamic graph partitioning for adaptive unstructured meshes. Journal of Parallel and Distributed Computing, 47(2):102–108, 1997.