Scalable Big Graph Processing in MapReduce

Lu Qin†, Jeffrey Xu Yu‡, Lijun Chang§, Hong Cheng‡, Chengqi Zhang†, Xuemin Lin§♮

†Centre for Quantum Computation and Intelligent Systems, University of Technology, Sydney, Australia; ‡The Chinese University of Hong Kong, China; §The University of New South Wales, Australia; ♮East China Normal University, China

†{lu.qin,chengqi.zhang}@uts.edu.au ‡{yu,hcheng}@se.cuhk.edu.hk §{ljchang,lxue}@cse.unsw.edu.au

ABSTRACT

MapReduce has become one of the most popular parallel computing paradigms in cloud, due to its high scalability, reliability, and fault-tolerance achieved for a large variety of applications in big data processing. In the literature, there are the MapReduce Class MRC and the Minimal MapReduce Class MMC to define the memory consumption, communication cost, CPU cost, and number of MapReduce rounds for an algorithm to execute in MapReduce. However, neither of them is designed for big graph processing in MapReduce, since the constraints in MMC can hardly be achieved simultaneously on graphs and the conditions in MRC may induce scalability problems when processing big graph data. In this paper, we study scalable big graph processing in MapReduce. We introduce a Scalable Graph processing Class SGC by relaxing some constraints in MMC to make it suitable for scalable graph processing. We define two graph join operators in SGC, namely, NE join and EN join, using which a wide range of graph algorithms can be designed, including PageRank, breadth first search, graph keyword search, Connected Component (CC) computation, and Minimum Spanning Forest (MSF) computation. Remarkably, to the best of our knowledge, for the two fundamental graph problems CC and MSF computation, this is the first work that can achieve O(log(n)) MapReduce rounds with O(n + m) total communication cost in each round and constant memory consumption on each machine, where n and m are the number of nodes and edges in the graph respectively. We conducted extensive performance studies using two web-scale graphs Twitter-2010 and Friendster with different graph characteristics. The experimental results demonstrate that our algorithms can achieve high scalability in big graph processing.

Categories and Subject Descriptors

H.2.4 [Information Systems]: Database Management—Systems

Keywords

Graph; MapReduce; Big Data

1. INTRODUCTION

As one of the most popular paradigms for processing big data, MapReduce [10] has been widely used in a lot of companies such as Google, Facebook, Yahoo, and Amazon to process a large amount of data in the order of tera-bytes every day. The success of MapReduce is due to its high scalability, reliability, and fault-tolerance achieved for a large variety of applications and its easy-to-use programming model that allows developers to develop parallel data-driven algorithms in a distributed shared nothing environment. A MapReduce algorithm executes in rounds. Each round has three phases: map, shuffle, and reduce. The map phase generates a set of key-value pairs using a map function, the shuffle phase transfers the key-value pairs into different machines and ensures that key-value pairs with the same key arrive at the same machine, and the reduce phase processes all key-value pairs with the same key using a reduce function.

Motivation: In the literature, there are studies that define algorithm classes in MapReduce in terms of memory consumption, communication cost, CPU cost, and the number of rounds. Karloff et al. [23] give the first attempt, in which the MapReduce Class (MRC) is proposed. MRC defines the maximal requirements for an algorithm to execute in MapReduce, in the sense that if any condition in MRC is violated, running the algorithm in MapReduce is meaningless. Nevertheless, a better class is highly demanded to guide the development of more stable and scalable MapReduce algorithms. Thus, Tao et al. [41] introduce the Minimal MapReduce Class (MMC), in which several aspects can achieve optimality simultaneously. A lot of important database problems including sorting and sliding aggregation can be solved in MMC. However, MMC is still incapable of solving a large range of problems, especially those involved in graph processing, which is an important branch of big data processing. The reasons are twofold. First, a graph usually has some inherent characteristics which make it hard to achieve high parallelism. For example, a graph is usually unstructured and highly irregular, making the locality of the graph very poor [30]. Second, the loosely synchronised shared nothing computing structure in MapReduce makes it difficult to achieve high workload balancing and low communication cost simultaneously as defined in MMC when processing graphs (see Section 3 for more details). Motivated by this, in this paper, we relax some conditions in MMC and define a new class of MapReduce algorithms that is more suitable for scalable big graph processing.

Contributions: We make the following contributions in this paper.

(1) New class defined for scalable graph processing: We define a new class SGC for scalable graph processing in MapReduce. We aim at achieving three goals: scalability, stability, and robustness.

Scalability requires an algorithm to achieve good speed-up w.r.t. the number of machines used. Stability requires an algorithm to terminate in a bounded number of rounds. Robustness requires that an algorithm never fails regardless of how much memory each machine has. SGC relaxes two constraints defined in MMC, namely, the communication cost on each machine and the total number of rounds. For the former, we define a new cost that balances the communication in a random manner, where the randomness is related to the degree distribution of the graph. For the latter, we relax the O(1) rounds defined in MMC to O(log(n)), where n is the graph node number. In addition, we require the memory used in each machine to be loosely related to the size of the input data, in order to achieve high robustness. Such a condition is even stronger than that defined in MMC. The robustness requirement is highly demanded by a commercial database system, with which a database administrator does not need to worry that the data grows too large to reside entirely in the total memory of the machines.

(2) Two elegant graph operators defined to solve a large range of graph problems: We define two graph join operators, namely, NE join and EN join. NE join propagates information from nodes to their adjacent edges and EN join aggregates information from adjacent edges to nodes. Both NE join and EN join can be implemented in SGC. Using the two graph join operators, a large range of graph algorithms can be designed in SGC, including PageRank, breadth first search, graph keyword search, Connected Component (CC) computation, and Minimum Spanning Forest (MSF) computation. Especially, for CC and MSF computation, it is non-trivial to solve them using graph operators in SGC. To the best of our knowledge, for the two fundamental graph problems, this is the first work that can achieve O(log(n)) MapReduce rounds with O(n + m) total communication cost in each round and constant memory consumption on each machine, where n and m are the number of nodes and edges in the graph respectively. We believe our findings on MapReduce can also guide the development of scalable graph processing algorithms in other systems in cloud.

(3) Unified graph processing system: In all of our algorithms, we enforce the input and output of any graph operation to be either a node table or an edge table, with which a unified graph processing system can be designed. The benefits are twofold. First, the unified input and output make the system self-containable, on top of which more complex graph processing tasks can be designed by chaining several graph queries together. For example, one may want to find all connected components of the subgraph induced by nodes that are related to photography and hiking. This can be done by chaining three graph queries: a graph keyword search query, an induced subgraph query, and a connected component query, all of which are studied in this paper. Second, by chaining multiple queries with the unified input/output, more query optimization techniques can be developed and integrated into the system.

(4) Extensive performance studies: We conducted extensive performance studies using two real web-scale graphs Twitter-2010 and Friendster, both of which have billions of edges. Twitter-2010 has a smaller diameter with a skewed degree distribution, and Friendster has a larger diameter with a more uniform degree distribution. All of our algorithms can achieve high scalability on the two datasets.

Outline: Section 2 presents the preliminary. Section 3 introduces the scalable graph processing class SGC and two graph operators NE join and EN join in SGC. Section 4 presents three basic graph algorithms in SGC. Sections 5 and 6 show how to compute CC and MSF in SGC respectively. Section 7 evaluates the algorithms in SGC using extensive experiments. Section 8 reviews the related work and Section 9 concludes the paper.

2. PRELIMINARY

In this section, we introduce the MapReduce framework and review the two algorithm classes in MapReduce in the literature.

2.1 The MapReduce Framework

MapReduce, introduced by Google [10], is a programming model that allows developers to develop highly scalable and fault-tolerant parallel applications to process big data in a distributed shared nothing environment. A MapReduce algorithm executes in rounds. Each round involves three phases: map, shuffle, and reduce. Assuming that the input data is stored in a distributed file system as a set of key-value pairs, the three phases work as follows.

• Map: In this phase, each machine reads a part of the key-value pairs {(k_i^m, v_j^m)} from the distributed file system and generates a new set of key-value pairs {(k_i^s, v_j^s)} to be transferred to other machines in the shuffle phase.
• Shuffle: The key-value pairs {(k_i^s, v_j^s)} generated in the map phase are shuffled across all machines. At the end of the shuffle phase, all the key-value pairs {(k_i^s, v_1^s), (k_i^s, v_2^s), ...} with the same key k_i^s are guaranteed to arrive at the same machine.
• Reduce: Each machine groups the key-value pairs with the same key k_i^s together as (k_i^s, {v_1^s, v_2^s, ...}), from which a new set of key-value pairs {(k_i^r, v_j^r)} is generated and stored in the distributed file system to be processed in the next round.

Two functions need to be implemented in each round: a map function and a reduce function. A map function determines how to generate {(k_i^s, v_j^s)} from {(k_i^m, v_j^m)}, whereas a reduce function determines how to generate {(k_i^r, v_j^r)} from (k_i^s, {v_1^s, v_2^s, ...}).
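To make the round structure concrete, the following is a minimal sketch of one MapReduce round in Hadoop's Java API; it counts, for each node, the number of its outgoing edges in a tab-separated edge table. The class and field names are illustrative and the job configuration is omitted; this is not taken from the paper's implementation.

// Sketch of one MapReduce round (map + reduce); illustrative only.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class DegreeCount {
  // Map phase: read one edge "id1 \t id2" per line and emit (id1, 1).
  public static class DegreeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] edge = line.toString().split("\t");
      ctx.write(new Text(edge[0]), ONE);          // key: source node id
    }
  }
  // Reduce phase: all pairs with the same node id arrive together; sum them.
  public static class DegreeReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text id, Iterable<IntWritable> ones, Context ctx)
        throws IOException, InterruptedException {
      int degree = 0;
      for (IntWritable one : ones) degree += one.get();
      ctx.write(id, new IntWritable(degree));     // key-value pair for the next round
    }
  }
}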
2.2 Algorithm Classes in MapReduce

In the literature, two algorithm classes have been introduced in MapReduce, in terms of disk usage, memory usage, communication cost, CPU cost, and number of MapReduce rounds. Let S be the set of objects in the problem and t be the number of machines in the system. The two classes are defined as follows.

MapReduce Class MRC: The MapReduce Class MRC is introduced by Karloff et al. [23]. Fix an ϵ > 0; a MapReduce algorithm in MRC should have the following properties:

• Disk: Each machine uses O(|S|^{1-ϵ}) disk space. The total disk space used is O(|S|^{2-2ϵ}).
• Memory: Each machine uses O(|S|^{1-ϵ}) memory. The total memory used is O(|S|^{2-2ϵ}).
• Communication: In each round, each machine sends/receives O(|S|^{1-ϵ}) data. The total communication cost is O(|S|^{2-2ϵ}).
• CPU: In each round, the CPU consumption on each machine is O(poly(|S|)), i.e., polynomial to |S|.
• Number of rounds: The number of rounds is O(log^i |S|) for a constant i ≥ 0.
Minimal MapReduce Class MMC: The Minimal MapReduce Class MMC is introduced by Tao et al. [41], which aims to achieve outstanding efficiency in multiple aspects simultaneously. A MapReduce algorithm in MMC should have the following properties:

• Disk: Each machine uses O(|S|/t) disk space. The total disk space used is O(|S|).
• Memory: Each machine uses O(|S|/t) memory. The total memory used is O(|S|).
• Communication: In each round, each machine sends/receives O(|S|/t) data. The total communication cost is O(|S|).
• CPU: In each round, the CPU consumption on each machine is O(T_seq/t), where T_seq is the time to solve the same problem on a single sequential machine.
• Number of rounds: The number of rounds is O(1).

3. SCALABLE GRAPH PROCESSING

MRC defines the basic requirements for an algorithm to execute in MapReduce, whereas MMC requires several aspects to achieve optimality simultaneously in a MapReduce algorithm. In the following, we analyze the problems involved in MRC and MMC in graph processing and propose a new class SGC which is suitable for scalable graph processing in MapReduce.

We first analyze MMC. Consider a graph G(V,E) with n = |V| nodes and m = |E| edges. A common graph operation is to exchange data among all adjacent nodes (nodes that share a common edge) in the graph G. The memory constraint in MMC requires that all edges/nodes are distributed evenly among all machines in the system. Let E_{i,j} be the set of edges (u,v) in G such that u is in machine i and v is in machine j. The communication constraint in MMC can be formalized as follows:

  max_{1≤i≤t} ( Σ_{1≤j≤t, j≠i} |E_{i,j}| ) ≤ O((n+m)/t)    (1)

This requires minimizing max_{1≤i≤t}(Σ_{1≤j≤t, j≠i} |E_{i,j}|), which is an NP-hard problem [13]. Furthermore, even if the optimal solution is computed, it is not guaranteed that min(max_{1≤i≤t}(Σ_{1≤j≤t, j≠i} |E_{i,j}|)) ≤ O((n+m)/t). Thus, MMC is not suitable to define a graph algorithm in MapReduce.

Next, we discuss MRC. Since MRC defines the basic conditions that a MapReduce algorithm should satisfy, a graph algorithm in MapReduce is not an exception. However, like MMC, a better class is always desirable for more stable and scalable graph processing in MapReduce. Given a graph G(V,E) with n nodes and m edges, and assuming that m ≥ n^{1+c}, the authors in [23] define a class based on MRC for graph processing in MapReduce, in which a MapReduce algorithm has the following properties:

• Disk: Each machine uses O(n^{1+c/2}) disk space. The total disk space used is O(m^{1+c/2}).
• Memory: Each machine uses O(n^{1+c/2}) memory. The total memory used is O(m^{1+c/2}).
• Communication: In each round, each machine sends/receives O(n^{1+c/2}) data. The total communication cost is O(m^{1+c/2}).
• CPU: In each round, the CPU consumption on each machine is O(poly(m)), i.e., polynomial to m.
• Number of rounds: The number of rounds is O(1).

Such a class has a good property that the algorithm runs in constant rounds. However, it requires each machine to use O(n^{1+c/2}) memory, which can be large even for a dense graph. When the memory of each machine cannot hold O(n^{1+c/2}) data, the algorithm fails no matter how many machines are used in the system. Thus, the class is not scalable to handle a graph with large n.

3.1 The Scalable Graph Processing Class SGC

We now explore a better class that is suitable for graph processing in MapReduce. We aim at defining a MapReduce class in which a graph algorithm has the following three properties.

• Scalability: The algorithm can always be sped up by adding more machines.
• Stability: The algorithm stops in a bounded number of rounds.
• Robustness: The algorithm never fails regardless of how much memory each machine has.

It is difficult to distribute the communication cost evenly among all machines for a graph algorithm in MapReduce. The main reason is the skewed degree distribution (e.g., power-law distribution) of a large range of real-life graphs, in which some nodes may have very high degrees. Hence, instead of using O((m+n)/t) as the upper bound of communication cost per machine, we define a weaker bound, denoted Õ(m/t, D(G,t)). Suppose the nodes are uniformly distributed among all machines, denote by V_i the set of nodes stored in machine i for 1 ≤ i ≤ t, and let d_j be the degree of node v_j in the input graph; Õ(m/t, D(G,t)) is defined as:

  Õ(m/t, D(G,t)) = O( max_{1≤i≤t} ( Σ_{v_j ∈ V_i} d_j ) )    (2)

  D(G,t) = ((t-1)/t^2) Σ_{v_j ∈ V} d_j^2    (3)

Lemma 3.1: Let x_i (1 ≤ i ≤ t) be the communication cost upper bound for machine i, i.e., x_i = Σ_{v_j ∈ V_i} d_j. The expected value of x_i is E(x_i) = 2m/t, and the variance of x_i is Var(x_i) = D(G,t). ✷

The proof of Lemma 3.1 is omitted due to space limitation. Note that the variance of the degree distribution of G, denoted Var(G), is (Σ_{v_j ∈ V}(d_j - 2m/n)^2)/n = (n Σ_{v_j ∈ V} d_j^2 - 4m^2)/n^2. For fixed t, n, and m values, minimizing D(G,t) is equivalent to minimizing Var(G). In other words, the variance of the communication cost for each machine is minimized if all nodes in the graph have the same degree. We define the scalable graph processing class SGC below.

Scalable Graph Processing Class SGC: A graph MapReduce algorithm in SGC should have the following properties:

• Disk: Each machine uses O((m+n)/t) disk space. The total disk space used is O(m+n). This is the minimal requirement, since we need at least O(m+n) disk space to store the data.
• Memory: Each machine uses O(1) memory. The total memory used is O(t). This is a very strong constraint, to ensure the robustness of the algorithm. Note that the memory defined here is the memory used in the map and reduce phases. There is also memory used in the shuffle phase, which is usually predefined by the system and is independent of the algorithm.
• Communication: In each round, each machine sends/receives Õ(m/t, D(G,t)) data, and the total communication cost is O(m+n), where G is the input graph in the round.
• CPU: In each round, the CPU cost on each machine is Õ(m/t, D(G,t)), where G is the input graph in the round. The CPU cost defined here is the cost spent in the map and reduce phases.
• Number of rounds: The number of rounds is O(log(n)).

Discussion: For the memory constraint, SGC only requires each machine to use constant memory; that is to say, even if the total memory of the system is smaller than the input data, the algorithm can still be processed successfully. This is an even stronger constraint than that defined in MMC. Nevertheless, we give the flexibility for the algorithm to run other query optimization tasks using the free memory, which can be studied orthogonally to our work. Given the constraints on memory, communication, and CPU, it is nearly impossible for a wide range of graph algorithms to be processed in constant rounds in MapReduce. Thus, we relax the O(1) rounds defined in MMC to O(log(n)) rounds, which is reasonable since O(log(n)) is the processing time lower bound for a large number of parallel graph algorithms in the parallel random-access machine model, and is practical for the MapReduce framework as evidenced by our experiments. The comparison of the three classes for graph processing in MapReduce is shown in Table 1.

Table 1: Graph Algorithm Classes in MapReduce

                            MRC              MMC            SGC
  Disk/machine              O(n^{1+c/2})     O((n+m)/t)     O((n+m)/t)
  Disk/total                O(m^{1+c/2})     O(n+m)         O(n+m)
  Memory/machine            O(n^{1+c/2})     O((n+m)/t)     O(1)
  Memory/total              O(m^{1+c/2})     O(n+m)         O(t)
  Communication/machine     O(n^{1+c/2})     O((n+m)/t)     Õ(m/t, D(G,t))
  Communication/total       O(m^{1+c/2})     O(n+m)         O(n+m)
  CPU/machine               O(poly(m))       O(T_seq/t)     Õ(m/t, D(G,t))
  CPU/total                 O(poly(m))       O(T_seq)       O(n+m)
  Number of rounds          O(1)             O(1)           O(log(n))
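The quantities in Lemma 3.1 can be computed from the degree sequence alone. The following small sketch, which is illustrative only and not part of the paper's implementation, evaluates the per-machine expected load 2m/t and the bound D(G,t) of Eq. (3) for a given degree array; this is one way to estimate how degree skew affects the communication bound of Eq. (2).

// Sketch: estimate E(x_i) = 2m/t and D(G,t) = (t-1)/t^2 * sum(d_j^2)
// from a degree sequence; illustrative only.
public class CommunicationBound {
  public static void main(String[] args) {
    long[] degree = {1, 2, 2, 3, 100};   // hypothetical degree sequence d_j
    int t = 4;                           // number of machines
    long sumDeg = 0, sumDegSq = 0;
    for (long d : degree) {
      sumDeg += d;                       // sum of degrees = 2m for an undirected graph
      sumDegSq += d * d;
    }
    double expectedLoad = (double) sumDeg / t;                          // E(x_i) = 2m/t
    double variance = ((double) (t - 1) / ((double) t * t)) * sumDegSq; // D(G,t)
    System.out.printf("E(x_i)=%.2f, D(G,t)=%.2f%n", expectedLoad, variance);
  }
}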

3.2 Two Graph Operators in SGC

We assume that a graph G(V,E) is stored in a distributed file system as a node table V and an edge table E. Each node in the table has a unique id and some other information such as label and keywords. Each edge in the table has id1, id2 defining the source and target node ids of the edge, and some other information such as weight and label. We use the node id to represent the node if it is obvious. G can be either directed or undirected. For an undirected graph, each edge is stored as two edges (id1, id2) and (id2, id1). In the following, we introduce two graph operators in SGC, namely, NE join and EN join, using which a large range of graph problems can be designed.

3.2.1 NE Join

An NE join aims to propagate the information on nodes into edges, i.e., for each edge (v_i, v_j) ∈ E, an NE join outputs an edge (v_i, v_j, F(v_i)) (or (v_i, v_j, F(v_j))), where F(v_i) (or F(v_j)) is a set of functions operated on v_i (or v_j) in the node table V. Given a node table V_i and an edge table E_j, an NE join of V_i and E_j can be formulated using the following algebra:

  Π_{id1, id2, f1(c1)→p1, f2(c2)→p2, ...}(σ_{cond(c)}((V_i → V) ✶^{NE}_{id,id'} (E_j → E))) | C(cond'(c')) → cnt    (4)

or equivalently in the following SQL form:

  select id1, id2, f1(c1) as p1, f2(c2) as p2, ...
  from V_i as V NE join E_j as E on V.id = E.id'
  where cond(c)
  count cond'(c') as cnt

where each of c, c', c1, c2, ... is a subset of fields in the two tables V_i and E_j, f_k is a function operated on the fields c_k, and cond and cond' are two functions that return either true or false defined on the fields in c and c' respectively. id' can be either id1 or id2. The count part counts the number of trues returned by cond'(c') and the number is assigned to a counter cnt, which is useful in determining a termination condition for an iterative algorithm.

NE join in MapReduce: The NE join operation can be implemented in MapReduce as follows. Let the set of fields used in V be c_v, and the set of fields used in E be c_e. In the map phase, for each node v ∈ V, the values in c_v with key v.id are emitted as a key-value pair (v.id, v.c_v). For each edge e ∈ E, the values in c_e with key e.id' are emitted as a key-value pair (e.id', e.c_e). In the reduce phase, for each node id, the set of key-value pairs {(id, v.c_v), (id, e1.c_e), (id, e2.c_e), ...} can be processed as a data stream without loading the whole set into memory. Assuming that (id, v.c_v) comes first before all other key-value pairs (id, e_i.c_e) in the stream (this can be implemented as a secondary sort in MapReduce), the algorithm first loads (id, v.c_v) into memory and then processes each (id, e_i.c_e) one by one. For a certain (id, e_i.c_e), the algorithm checks cond(c). If cond(c) returns false, the algorithm skips (id, e_i.c_e) and continues to process the next (id, e_{i+1}.c_e). Otherwise, the algorithm calculates all f_j(c_j) and cond'(c') from (id, v.c_v) and (id, e_i.c_e), outputs the selected fields as a single tuple into the distributed file system, and increases cnt if cond'(c') returns true. It is easy to see that NE join belongs to SGC.
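As a concrete illustration of the streaming reduce phase above, the following sketch processes the value stream for one node id, with the node record guaranteed to come first by secondary sort. It is illustrative only; the record layout and helper names are assumptions and not the paper's code.

// Sketch of the NE join reduce function for one key (node id).
// Only O(1) state (the node tuple) is kept in memory.
import java.util.Iterator;
import java.util.List;

public class NEJoinReduceSketch {
  // A "tuple" here is simply the list of projected field values.
  static void reduce(String id, Iterator<List<String>> values,
                     List<List<String>> output, long[] cnt) {
    List<String> nodeFields = values.next();        // (id, v.c_v) comes first
    while (values.hasNext()) {
      List<String> edgeFields = values.next();      // (id, e_i.c_e)
      if (!cond(nodeFields, edgeFields)) continue;  // filter: where cond(c)
      output.add(project(nodeFields, edgeFields));  // compute f_1(c_1), f_2(c_2), ...
      if (condPrime(nodeFields, edgeFields)) cnt[0]++;  // count cond'(c') as cnt
    }
  }
  // The predicates and projection are query-specific; stubs are shown for completeness.
  static boolean cond(List<String> v, List<String> e) { return true; }
  static boolean condPrime(List<String> v, List<String> e) { return false; }
  static List<String> project(List<String> v, List<String> e) { return e; }
}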
3.2.2 EN Join

An EN join aims to aggregate the information on edges into nodes, i.e., for each node v_i ∈ V, an EN join outputs a node (v_i, G(adj(v_i))), where adj(v_i) = {(v_i, v_j) ∈ E}, and G is a set of decomposable aggregate functions on the edge set adj(v_i), where a decomposable aggregate function is defined as follows:

Definition 3.1: (Decomposable Aggregate Function) An aggregate function g_k is decomposable if for any dataset s, and any two subsets of s, s1 and s2, with s1 ∩ s2 = ∅ and s1 ∪ s2 = s, g_k(s) can be computed using g_k(s1) and g_k(s2). ✷

Given a node table V_i and an edge table E_j, an EN join of V_i and E_j can be formulated using the following algebra:

  Π_{id, g1(c1)→p1, g2(c2)→p2, ...}(σ_{cond(c)}((V_i → V) ✶^{EN}_{id,id'} (E_j → E))) | C(cond'(c')) → cnt    (5)

or equivalently in the following SQL form:

  select id, g1(c1) as p1, g2(c2) as p2, ...
  from V_i as V EN join E_j as E on V.id = E.id'
  where cond(c)
  group by id
  count cond'(c') as cnt

where each of c, c', c1, c2, ... is a subset of fields in the two tables V_i and E_j, and id' can be either id1 or id2. The where part and count part are analogous to those defined in NE join. g_k is a decomposable aggregate function operated on the fields in c_k, by grouping the results using node ids as denoted in the group by part. Since the group by field is always the node id, we omit the group by part in Eq. 5 for simplicity.

EN join in MapReduce: The EN join operation can be implemented in MapReduce as follows. Let the set of fields used in V be c_v, and the set of fields used in E be c_e. The map phase is similar to that in the NE join. That is, for each node v ∈ V, the values in c_v with key v.id are emitted as a key-value pair (v.id, v.c_v), and for each edge e ∈ E, the values in c_e with key e.id' are emitted as a key-value pair (e.id', e.c_e). In the reduce phase, for each node id, the set of key-value pairs {(id, v.c_v), (id, e1.c_e), (id, e2.c_e), ...} can be processed as a data stream without loading the whole set into memory. Assuming that (id, v.c_v) comes first before all other key-value pairs (id, e_i.c_e) in the stream, the algorithm first loads (id, v.c_v) into memory and then processes each (id, e_i.c_e) one by one. For each function g_k, since g_k is decomposable, g_k({e1, e2, ..., e_i}) can be calculated using g_k({e1, e2, ..., e_{i-1}}) and g_k({e_i}). After processing all (id, e_i.c_e), all the g_k functions are computed. Finally, the algorithm checks cond(c). If cond(c) returns true, it outputs the id as well as all the g_k values as a single tuple into the distributed file system and increases cnt if cond'(c') returns true. It is easy to see that EN join belongs to SGC.
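Decomposability is what lets the reducer fold the edge stream into constant state. A minimal sketch of this incremental folding for two common aggregates (sum and min) is shown below; it is illustrative only and not taken from the paper.

// Sketch: incrementally folding decomposable aggregates over an edge stream,
// keeping only O(1) state per aggregate. Illustrative only.
public class DecomposableAggregates {
  public static void main(String[] args) {
    double[] edgeValues = {3.0, 1.5, 7.0, 0.5};   // values of field c_k on incoming edges
    double sum = 0.0;                              // g_1 = sum, identity 0
    double min = Double.POSITIVE_INFINITY;         // g_2 = min, identity +infinity
    for (double x : edgeValues) {
      // g_k({e_1,...,e_i}) is computed from g_k({e_1,...,e_{i-1}}) and g_k({e_i}).
      sum = sum + x;
      min = Math.min(min, x);
    }
    System.out.println("sum=" + sum + ", min=" + min);
  }
}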

4. BASIC GRAPH ALGORITHMS

The combination of NE join and EN join can solve a wide range of graph problems in SGC. In this section, we introduce some basic graph algorithms, including PageRank, breadth first search, and graph keyword search, in which the number of rounds is determined by a user-given parameter or a graph factor which is small and can be considered as a constant. We will introduce more complex algorithms that need logarithmic rounds in the worst case in the next sections, including connected component and minimum spanning forest computation.

PageRank. PageRank is a key graph operation which computes the rank of each node based on the links (directed edges) among them. Given a directed graph G(V,E), PageRank is computed iteratively. Let the initial rank of each node be 1/|V|; in iteration i, the rank of a node v is computed as r_i(v) = α/|V| + (1-α) Σ_{u ∈ nbr_in(v)} r_{i-1}(u)/d(u), where 0 < α < 1 is a parameter, nbr_in(v) is the set of in-neighbors of v in G, and d(u) is the number of out-neighbors of u in G.

Algorithm 1 PageRank(V(id), E(id1,id2), d)
1: V_r ← Π_{id, cnt(id2)→d, 1/|V|→r}(V ✶^{EN}_{id,id1} E);
2: for i = 1 to d do
3:   E_r ← Π_{id1, id2, r/d→p}(V_r ✶^{NE}_{id,id1} E);
4:   V_r ← Π_{id, d, α/|V|+(1-α)sum(E_r.p)→r}(V_r ✶^{EN}_{id,id2} E_r);
5: return Π_{id,r}(V_r);

The PageRank algorithm in SGC is shown in Algorithm 1. Given the node table V(id), the edge table E(id1,id2), and the number of iterations d, initially, the algorithm computes the out-degree of each node using V ✶^{EN}_{id,id1} E, assigns an initial rank 1/|V| to each node, and generates a new table V_r (line 1). Then, the algorithm updates the node ranks in d iterations. In each iteration, the ranks of nodes are updated using an NE join followed by an EN join. In the NE join, the partial rank p(v) = r(v)/d(v) for each node v is propagated to all its outgoing edges using V_r ✶^{NE}_{id,id1} E and a new edge table E_r is generated (line 3). In the EN join, for each node v, the partial ranks p(u) from all its incoming edges (u,v) are aggregated. The new rank is computed as α/|V| + (1-α) Σ_{u ∈ nbr_in(v)} p(u) using V_r ✶^{EN}_{id,id2} E_r, and V_r is updated with the new ranks (line 4).

Breadth First Search. Breadth First Search (BFS) is a fundamental graph operation. Given an undirected graph G(V,E) and a source node s, a BFS computes for every node v ∈ V the shortest distance (i.e., the minimum number of hops) from s to v in G.

Algorithm 2 BFS(V(id), E(id1,id2), s)
1: V_d ← Π_{id, (id=s?0:φ)→d}(V);
2: for i = 1 to +∞ do
3:   E_d ← Π_{id1, id2, d→d1}(V_d ✶^{NE}_{id,id1} E);
4:   V_d ← Π_{id, ((d=φ ∧ min(d1)≠φ)?i:d)→d}(V_d ✶^{EN}_{id,id2} E_d) | C(d=i) → n_new;
5:   if n_new = 0 then break;
6: return V_d;

The BFS algorithm in SGC is shown in Algorithm 2. Given a node table V(id), an edge table E(id1,id2), and a source node s, the algorithm computes for each node v the shortest distance d(v) from s to v. Initially, a node table V_d is created with d(s) = 0 and d(v) = φ for v ≠ s (line 1). Next, the algorithm iteratively computes the nodes with d(v) = i from the nodes with d(v) = i-1. Each iteration i is processed using an NE join followed by an EN join. The NE join propagates d(u) into each edge (u,v) using V_d ✶^{NE}_{id,id1} E and produces a new table E_d. The EN join updates all d(v) based on the following rule:

(Distance Update Rule): In the i-th iteration of BFS, a node v is assigned d(v) = i iff in the (i-1)-th iteration, d(v) = φ and there exists a neighbor u of v such that d(u) ≠ φ.

The rule can be easily implemented using V_d ✶^{EN}_{id,id2} E_d (line 4), in which the algorithm also computes a counter n_new, the number of nodes with d(v) = i. When n_new = 0, the algorithm terminates. It is easy to see that the number of iterations for Algorithm 2 is no larger than the diameter of the graph G. Thus the algorithm belongs to SGC if the diameter of the graph is small.

Graph Keyword Search. We now investigate a more complex algorithm, namely, keyword search in an undirected graph G(V,E). Suppose for each v ∈ V, t(v) is the text information included in v. Given a keyword query with a set of l keywords Q = {k1, k2, ..., kl}, a keyword search [16, 17] finds a set of rooted trees in the form of (r, {(p1, d(r,p1)), (p2, d(r,p2)), ..., (pl, d(r,pl))}), where r is the root node, p_i is a node that contains keyword k_i in t(p_i), and d(r, p_i) is the shortest distance from r to p_i in G for 1 ≤ i ≤ l. Each answer is uniquely determined by its root node r. r_max is the maximum distance allowed from the root node to a keyword node in an answer, i.e., d(r, p_i) ≤ r_max for 1 ≤ i ≤ l.

Algorithm 3 KWS(V(id,t), E(id1,id2), {k1, ..., kl}, r_max)
1: V_r ← Π_{id, (k1∈t?(id,0):(φ,φ))→(p1,d1), ..., (kl∈t?(id,0):(φ,φ))→(pl,dl)}(V);
2: for i = 1 to r_max do
3:   E_r ← Π_{id1, id2, (p1,d1)→(pe1,de1), ..., (pl,dl)→(pel,del)}(V_r ✶^{NE}_{id,id1} E);
4:   V_r ← Π_{id, amin(p1,d1,pe1,de1+1)→(p1,d1), ..., amin(pl,dl,pel,del+1)→(pl,dl)}(V_r ✶^{EN}_{id,id2} E_r);
5: return Π_{V_r.*}(σ_{d1≠φ ∧ ... ∧ dl≠φ}(V_r));

Graph keyword search can be solved in SGC. The algorithm is shown in Algorithm 3. Given a node table V(id,t), an edge table E(id1,id2), a keyword query {k1, k2, ..., kl}, and r_max, the algorithm first initializes a table V_r, where in each node v, for every k_i, a pair (p_i, d_i) is generated as (id(v), 0) if k_i is contained in v.t, and (φ, φ) otherwise (line 1). Then the algorithm iteratively propagates the keyword information from each node to its neighbor nodes using r_max iterations. In each iteration, the keyword information for each node is first propagated into its adjacent edges using NE join, and then the information on edges is grouped into nodes to update the keyword information on each node using EN join. Specifically, the NE join generates a new edge table E_r, in which each edge (u,v) is embedded with keyword information (p1(u), d1(u)), ..., (pl(u), dl(u)) retrieved from node u using V_r ✶^{NE}_{id,id1} E (line 3). In the EN join V_r ✶^{EN}_{id,id2} E_r (line 4), each node updates its nearest node p_i that contains keyword k_i using an amin function, which is defined as:

  amin({(p1,d1), ..., (pk,dk)}) = (p_i, d_i) | (d_i = min_{1≤j≤k} d_j)    (6)

amin is decomposable since for any two sets s1 and s2 with s1 ∩ s2 = ∅, the following equation holds:

  amin(s1 ∪ s2) = amin({amin(s1), amin(s2)})    (7)

After r_max iterations, for every node v in V_r, its nearest node p_i that contains keyword k_i (1 ≤ i ≤ l) with distance d_i = d(v, p_i) ≤ r_max is computed. The algorithm returns the nodes with d_i ≠ φ for all 1 ≤ i ≤ l as the final set of answers (line 5).
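The amin aggregate of Eq. (6)/(7) has a very small sequential analogue, shown below for one keyword slot; it is illustrative only, and the field types and null marker are assumptions rather than the paper's code.

// Sketch: the decomposable amin aggregate for one keyword slot.
// Given the current best (p, d) pair and a candidate pair, keep the smaller distance.
public class AminSketch {
  static final long PHI = Long.MAX_VALUE;   // stands for the null distance φ

  // amin over two pairs; folding it over a stream implements Eq. (7).
  static long[] amin(long p1, long d1, long p2, long d2) {
    return (d1 <= d2) ? new long[]{p1, d1} : new long[]{p2, d2};
  }

  public static void main(String[] args) {
    long[] best = {0, PHI};                          // (p_i, d_i) initialized to (φ, φ)
    long[][] incoming = {{7, 2}, {3, 1}, {9, 4}};    // (p_e, d_e + 1) from adjacent edges
    for (long[] cand : incoming) {
      best = amin(best[0], best[1], cand[0], cand[1]);
    }
    System.out.println("nearest keyword node=" + best[0] + ", distance=" + best[1]);
  }
}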

5. CONNECTED COMPONENT

[Figure 1: A Sample Graph G(V,E)]

Given an undirected graph G(V,E) with n nodes and m edges, a Connected Component (CC) is a maximal set of nodes that can reach each other through paths in G. Computing all CCs of G is a fundamental graph problem and can be solved efficiently on a sequential machine using O(n+m) time. However, it is non-trivial to solve the problem in MapReduce. Below, we briefly introduce the state-of-the-art algorithms for CC computation, followed by presenting our algorithm in SGC.

5.1 State-of-the-art

We present three algorithms for CC computation in MapReduce: HashToMin, HashGToMin, and PRAM-Simulation. HashToMin and HashGToMin are two MapReduce algorithms proposed in [36], with a similar idea to use the smallest node in each CC as the representative of the CC, assuming that there is a total order among all nodes in G. PRAM-Simulation is to simulate an algorithm in the Parallel Random Access Machine (PRAM) model in MapReduce using the simulation method proposed in [23].

Algorithm HashToMin: Each node v ∈ V maintains a set C_v initialized as C_v = {v} ∪ {u | (u,v) ∈ E}. Let v_min = min{u | u ∈ C_v}; the algorithm updates C_v in iterations until it converges. Each iteration is processed using MapReduce as follows. In the map phase, for each v ∈ V, two types of key-value pairs are emitted: (1) (v_min, C_v), and (2) (u, {v_min}) for all u ∈ C_v. In the reduce phase, for each v ∈ V, a set of key-value pairs is received in the forms of {(v, C_v^1), ..., (v, C_v^k)}. The new C_v is updated as ∪_{1≤i≤k} C_v^i. The HashToMin algorithm finishes in O(log(n)) rounds (the result is only proved on a path graph in [36]), with O(log(n)(m+n)) total communication cost in each round. The algorithm can be optimized to use O(1) memory on each machine using secondary sort in MapReduce.

Algorithm HashGToMin: Each node v ∈ V maintains a set C_v initialized as C_v = {v}. Let C_{≥v} = {u | u ∈ C_v, u > v} and v_min = min{u | u ∈ C_v}; the algorithm updates C_v in iterations until it converges. Each iteration is processed using three MapReduce rounds. In the first two rounds, each round updates C_v as C_v ∪ {u_min | (u,v) ∈ E} in MapReduce. The third round is processed as follows. In the map phase, for each v ∈ V, two types of key-value pairs are emitted: (1) (v_min, C_{≥v}), and (2) (u, {v_min}) for all u ∈ C_{≥v}. In the reduce phase, for each v ∈ V, a set of key-value pairs is received in the forms of {(v, C_v^1), ..., (v, C_v^k)}. The new C_v is updated as ∪_{1≤i≤k} C_v^i. The HashGToMin algorithm finishes in Õ(log(n)) (i.e., expected O(log(n))) rounds, with O(m+n) total communication cost in each round. However, it needs O(n) memory for a single machine to hold a whole CC in memory. Thus, as indicated in [36], HashGToMin is not suitable to handle a graph with large n.

Algorithm PRAM-Simulation: The PRAM model allows multiple processors to compute in parallel using a shared memory. There are CRCW PRAM if concurrent writes are allowed, and CREW PRAM if not. In [23], a theoretical result shows that a CREW PRAM algorithm in O(t) time can be simulated in MapReduce in O(t) rounds. For the CC computation problem, in the literature, the best result in CRCW PRAM is presented in [22], which computes CCs in O(log(n)) time. However, it needs to compute the 2-hop node pairs, which requires O(n^2) communication cost in the worst case in each round. Thus, the simulation algorithm is impractical.

5.2 Connected Component in SGC

We introduce our algorithm to compute CCs in SGC. Conceptually, the algorithm shares similar ideas with most deterministic O(log(n)) CRCW PRAM algorithms, such as [39] and [6], but it is a non-trivial adaptation since each operation should be carefully designed using graph joins in SGC. Our algorithm maintains a forest using a parent pointer p(v) for each v ∈ V. Each rooted tree in the forest represents a partial CC. A singleton is a tree with one node, and a star is a tree of height 1. A tree is an isolated tree if there are no edges in E that connect the tree to another tree. The forest is iteratively updated using two operations: hooking and pointer jumping. Hooking merges several trees into a larger tree, and pointer jumping changes the parent of each node to its grandparent in each tree. When the algorithm ends, each tree becomes an isolated star that represents a CC in the graph.

Specifically, the algorithm first initializes a forest to make sure that no singletons exist except for isolated singletons. Then, the algorithm updates the forest in iterations. In each iteration, two hooking operations, namely, a conditional star hooking and an unconditional star hooking, followed by a pointer jumping operation, are performed. The two hooking operations eliminate all non-isolated stars in the forest and the pointer jumping operation produces new stars to be eliminated in the next iteration. Our algorithm CC is shown in Algorithm 4, which includes five components: Forest Initialization (lines 1-3), Star Detection (lines 16-20), Conditional Star Hooking (lines 5-8), Unconditional Star Hooking (lines 9-12), and Pointer Jumping (lines 13-14). We explain the algorithm using a sample graph G(V,E) shown in Fig. 1.

Algorithm 4 CC(V(id), E(id1,id2))
1: V_m ← Π_{id, min(id,id2)→p}(V ✶^{EN}_{id,id1} E);
2: V_c ← Π_{V.id, V.p, cnt(V'.id)→c}((V_m → V) ✶^{EN}_{id,p} (V_m → V'));
3: V_p ← Π_{id, ((c=0 ∧ id=p)?min(id2):p)→p}(V_c ✶^{EN}_{id,id1} E);
4: while true do
5:   V_s ← star(V_p);
6:   E_h ← Π_{id1, id2, p→p1}(V_s ✶^{NE}_{id,id1} E);
7:   V_h' ← Π_{id, p, min(p,p1)→pm}(σ_{s=1}(V_s ✶^{EN}_{id,id2} E_h));
8:   V_h ← Π_{V_s.id, (cnt(pm)=0?V_s.p:min(pm))→p}(V_s ✶^{EN}_{id,p} V_h');
9:   V_s ← star(V_h);
10:  E_u ← Π_{id1, id2, p→p1}(V_s ✶^{NE}_{id,id1} E);
11:  V_u' ← Π_{id, p, min(p1 | p1≠p)→pm}(σ_{s=1}(V_s ✶^{EN}_{id,id2} E_u));
12:  V_u ← Π_{V_s.id, (cnt(pm)=0?V_s.p:min(pm))→p}(V_s ✶^{EN}_{id,p} V_u');
13:  V_p ← Π_{V'.id, V.p}((V_u → V) ✶^{NE}_{id,p} (V_u → V')) | C(V'.p ≠ V.p) → ns;
14:  if ns = 0 then break;
15: return V_p;
16: Procedure star(V_p)
17: V_g ← Π_{V'.id, V'.p, V.p→g, (V.p=V'.p?1:0)→s}((V_p → V) ✶^{NE}_{id,p} (V_p → V'));
18: V_s' ← Π_{V.id, V.p, and(V.s,V'.s)→s}((V_g → V) ✶^{EN}_{id,g} (V_g → V'));
19: V_s ← Π_{V'.id, V'.p, (V'.s=0?0:V.s)→s}((V_s' → V) ✶^{NE}_{id,p} (V_s' → V'));
20: return V_s;

Forest Initialization: The forest is initialized in three steps. (1) In the first step, a table V_m is computed, in which each node v finds the smallest node among its neighbors in G including itself as the parent p(v) of v, i.e., p(v) = min({v} ∪ {u | (u,v) ∈ E}). Such an operation guarantees that no cycles are created except for self cycles (i.e., p(v) = v). The operation can be done using V ✶^{EN}_{id,id1} E as shown in line 1. (2) In the second step, we create a table V_c by counting the number of subnodes for each node in the forest, i.e., for each node v, c(v) = |{u | p(u) = v}|. This can be done using a self EN join V_m ✶^{EN}_{id,p} V_m, where the second V_m is considered as an edge table since it has two fields representing node ids (line 2). (3) In the third step, we create V_p by eliminating all non-isolated singletons. A node v is a singleton iff c(v) = 0 and p(v) = v. A non-isolated singleton v can be eliminated by assigning p(v) = min{u | (u,v) ∈ E}, which can be done using V_c ✶^{EN}_{id,id1} E (line 3). Obviously, no cycles (except for self cycles) are created in V_p.

[Figure 2: Initialize CC: Compute V_m]
[Figure 3: Singleton Elimination: Compute V_p]

For example, for the graph G shown in Fig. 1, the forest V_m is shown in Fig. 2, where solid edges represent parent pointers and dashed edges represent graph edges. p(17) = 10 since 10 is the smallest neighbor of 17 in G. p(2) = 2 since 2 has no neighbor which is smaller than 2. There are two singletons 9 and 16, which are eliminated in V_p as shown in Fig. 3, by pointing 9 to 12 and 16 to 17, since 12 is the smallest neighbor of 9 in G and 17 is the smallest neighbor of 16 in G.
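The three initialization steps can be checked against a small in-memory reference. The following sketch is sequential, illustrative only, and not the paper's MapReduce code; it computes V_m, the child counts, and the singleton elimination of lines 1-3 on an adjacency-list representation, under the assumption that c(v) counts proper children only.

// Sequential reference for Forest Initialization (Algorithm 4, lines 1-3).
// adj.get(v) lists the neighbors of v; ids are 0..n-1. Illustrative only.
import java.util.List;

public class ForestInitSketch {
  static int[] initForest(List<List<Integer>> adj) {
    int n = adj.size();
    int[] p = new int[n];
    // Line 1: p(v) = min({v} ∪ neighbors of v)
    for (int v = 0; v < n; v++) {
      int best = v;
      for (int u : adj.get(v)) best = Math.min(best, u);
      p[v] = best;
    }
    // Line 2: c(v) = number of nodes (other than v itself) whose parent is v
    int[] c = new int[n];
    for (int v = 0; v < n; v++) {
      if (p[v] != v) c[p[v]]++;
    }
    // Line 3: a non-isolated singleton (c=0, p(v)=v, has neighbors) adopts its smallest neighbor
    for (int v = 0; v < n; v++) {
      if (c[v] == 0 && p[v] == v && !adj.get(v).isEmpty()) {
        int best = Integer.MAX_VALUE;
        for (int u : adj.get(v)) best = Math.min(best, u);
        p[v] = best;
      }
    }
    return p;
  }
}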

Star Detection: We need to detect whether each node belongs to a star before hooking. We use s(v) = 1 to denote that v belongs to a star and s(v) = 0 otherwise. The star detection is done in three steps based on the following three filtering rules:

• (Rule-1): A node v does not belong to a star if p(v) ≠ p(p(v)).
• (Rule-2): A node v does not belong to a star if ∃u such that u does not belong to a star and p(p(u)) = v.
• (Rule-3): A node v does not belong to a star if p(v) does not belong to a star.

It is guaranteed that after applying the three rules one by one in order, all non-stars are filtered. We now introduce how to apply the three rules using graph join operators. (Rule-1) For each node v, we find its grandparent g(v) = p(p(v)), and assign 1 or 0 to s(v) depending on whether p(v) = g(v). This can be done using a self NE join V_p ✶^{NE}_{id,p} V_p as shown in line 17. (Rule-2) A node v belongs to a star after applying Rule-2 if s(v) = 1 and for all u such that g(u) = v, s(u) = 1. Thus, we use an aggregate function and(s(v), s(u)), which is a boolean function that returns 1 iff s(v) = 1 and, for all u with g(u) = v, s(u) = 1. This can be done using a self EN join V_g ✶^{EN}_{id,g} V_g for the V_g created in Rule-1, as shown in line 18. (Rule-3) For each node v, we compute s(p(v)) and assign s(v) = 0 if s(p(v)) = 0. This can be done using a self NE join V_s' ✶^{NE}_{id,p} V_s' for the V_s' created in Rule-2, as shown in line 19.

[Figure 4: Star Detection Step 1: Compute V_g]
[Figure 5: Star Detection Step 2: Compute V_s']
[Figure 6: Star Detection Step 3: Compute V_s]

For example, Fig. 4 shows V_g obtained by applying Rule-1 on the V_p shown in Fig. 3. The grey nodes are those detected as non-star nodes, i.e., s(v) = 0. Node 11 does not belong to a star as (g(11) = 3) ≠ (p(11) = 4). Fig. 5 shows V_s' obtained by applying Rule-2 on V_g. Two new nodes 1 and 3 are filtered as non-star nodes. For node 3, s(3) = 0 since there exists node 11 with g(11) = 3 and s(11) = 0. Fig. 6 shows V_s obtained by applying Rule-3 on V_s'. Three nodes 10, 13 and 4 are filtered. For node 4, s(4) = 0 since its parent node 3 has s(3) = 0. In V_s, all non-star nodes are filtered, and 3 stars rooted at 2, 5 and 7 are detected.
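A sequential reference for the three filtering rules is given below; it is illustrative only and operates directly on a parent array rather than on node/edge tables.

// Sequential reference for star detection (Algorithm 4, lines 16-20).
// p[v] is the parent pointer of v; returns s[v] = true iff v belongs to a star.
public class StarDetectionSketch {
  static boolean[] detectStars(int[] p) {
    int n = p.length;
    boolean[] s = new boolean[n];
    int[] g = new int[n];
    // Rule-1: v is not in a star if p(v) != p(p(v))
    for (int v = 0; v < n; v++) {
      g[v] = p[p[v]];
      s[v] = (p[v] == g[v]);
    }
    // Rule-2: the grandparent of a non-star node is not in a star
    for (int v = 0; v < n; v++) {
      if (!s[v]) s[g[v]] = false;
    }
    // Rule-3: a node inherits non-star status from its parent
    for (int v = 0; v < n; v++) {
      if (!s[p[v]]) s[v] = false;
    }
    return s;
  }
}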

Conditional Star Hooking: In a conditional star hooking, for any node v which is the root of a star (i.e., p(v) = v and s(v) = 1), the parent of v is updated to min({p(v)} ∪ {u | ∃(x,y) ∈ E, s.t. p(x) = u and p(y) = v}). In other words, v is hooked to a new parent u if u is no larger than v and the tree that u lies in is connected to the star that v lies in through an edge (x,y) with p(x) = u and p(y) = v. The operation ensures that p(v) is no larger than v in order to make sure that no cycles (except for self cycles) are created. After the hooking, it is guaranteed that there are no edges that connect two stars. Conditional star hooking is done in three steps. (1) Create a new edge table E_h by embedding p(x) into each edge (x,y) ∈ E using V_s ✶^{NE}_{id,id1} E (line 6), where V_s is the forest with all stars detected (line 5). (2) Create a table V_h', in which for each node y such that y is in a star, pm(y) = min({p(y)} ∪ {p(x) | (x,y) ∈ E}) is computed. This can be done using σ_{s=1}(V_s ✶^{EN}_{id,id2} E_h) (line 7). (3) Create a table V_h, in which the parent of each node v is updated to min{pm(y) | p(y) = v} if such a pm exists, using V_s ✶^{EN}_{id,p} V_h' (line 8).

[Figure 7: Conditional Star Hooking: Compute V_h]

For example, for the V_s shown in Fig. 6 with all stars detected, there exists an edge (x,y) = (15,18) with u = p(x) = 5 and v = p(y) = 7. Since 7 is in a star and 5 < 7, 7 is hooked to a new parent 5 by assigning p(7) = 5 as shown in Fig. 7. Note that 5 is also in a star in V_s; however, since 7 > 5, we cannot hook 5 to 7 by assigning p(5) = 7, after which a cycle would be created.

Unconditional Star Hooking: Unconditional star hooking is similar to conditional star hooking by dropping the condition that a node v should be hooked to a parent u with u ≤ v. It is done using the similar three steps (lines 10-12), with the only difference in the second step, which calculates pm(y) as min{p(x) | (x,y) ∈ E and p(x) ≠ p(y)}, instead of min({p(y)} ∪ {p(x) | (x,y) ∈ E}). We add a condition p(x) ≠ p(y) to avoid hooking a star to itself, in order to make sure that all non-isolated stars are eliminated. Unconditional star hooking does not create cycles (except for self cycles) due to the fact that after conditional star hooking, there is no edge that connects two stars.

[Figure 8: Unconditional Star Hooking: Compute V_u]

For example, for the forest V_h shown in Fig. 7, there is only one star, rooted at node 2. There exists an edge (x,y) = (19,17) with u = p(x) = 2 and v = p(y) = 10 and u ≠ v, so 2 is hooked to a new parent 10 as shown in Fig. 8, with no stars remaining.

Pointer Jumping: Pointer jumping changes the parent of each node to its grandparent in the forest V_u generated in unconditional star hooking by assigning p(v) = p(p(v)) for each node v. This can be done using a self NE join V_u ✶^{NE}_{id,p} V_u (line 13). In pointer jumping, we also create a counter ns which counts the number of nodes with p(v) ≠ p(p(v)). When ns = 0, all stars in V_u are isolated stars and the algorithm terminates with each star representing a CC (line 14). For example, for the forest V_u computed in unconditional star hooking, after pointer jumping, the new forest V_p is shown in Fig. 9, with two stars with roots 3 and 5 generated.

The following theorem shows the efficiency of Algorithm 4. Due to lack of space, the proof is omitted.

Theorem 5.1: Algorithm 4 stops in O(log(n)) iterations. ✷

The comparison of the algorithms HashToMin, HashGToMin, and our algorithm CC is shown in Table 2 in terms of the memory consumption per machine, total communication cost per round, and the number of rounds, in which our algorithm is the best in all factors.
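Pointer jumping and its termination counter have a direct sequential analogue, shown below on a parent array; this is illustrative only and not the paper's distributed implementation.

// Sequential reference for pointer jumping with the termination counter ns
// (Algorithm 4, lines 13-14). Illustrative only.
public class PointerJumpingSketch {
  // Returns the number of nodes whose parent changed (ns); the caller stops when it is 0.
  static int pointerJump(int[] p) {
    int n = p.length;
    int[] newP = new int[n];
    int ns = 0;
    for (int v = 0; v < n; v++) {
      newP[v] = p[p[v]];                 // p(v) = p(p(v))
      if (newP[v] != p[v]) ns++;         // count nodes not yet in an isolated star
    }
    System.arraycopy(newP, 0, p, 0, n);
    return ns;
  }
}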

6. MINIMUM SPANNING FOREST

Algorithm 5 MSF(V(id), E(id1,id2,w))
1: V_p ← Π_{id, min((id1,id2,w))→em, em.id2→p}(V ✶^{EN}_{id,id1} E);
2: E_t ← Π_{em}(σ_{em≠φ}(V_p));
3: while true do
4:   V_b ← Π_{V'.id, ((V'.id=V.p ∧ V'.id ...
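Line 1 of Algorithm 5 selects, for every node, its minimum incident edge and points the node to the other endpoint of that edge. A small sequential sketch of this selection is given below; it is illustrative only, and the comparison order used for breaking ties among edges is an assumption rather than the paper's definition.

// Sequential reference for Algorithm 5, line 1: for each node, pick the
// minimum incident edge em(v) and set p(v) = em(v).id2.
// Edges are given as rows {id1, id2, w}; ids and weights are ints for brevity.
import java.util.Arrays;

public class MinIncidentEdgeSketch {
  static int[][] pickMinEdges(int n, int[][] edges) {
    int[][] em = new int[n][];                       // em[v] = chosen edge of node v, or null
    for (int[] e : edges) {
      int v = e[0];                                  // edge (id1=v, id2, w)
      if (em[v] == null || compare(e, em[v]) < 0) em[v] = e;
    }
    return em;
  }
  // Assumed ordering: smallest weight wins, node ids break ties deterministically.
  static int compare(int[] a, int[] b) {
    if (a[2] != b[2]) return Integer.compare(a[2], b[2]);
    if (a[0] != b[0]) return Integer.compare(a[0], b[0]);
    return Integer.compare(a[1], b[1]);
  }
  public static void main(String[] args) {
    int[][] edges = {{0, 1, 5}, {1, 0, 5}, {1, 2, 2}, {2, 1, 2}};
    System.out.println(Arrays.deepToString(pickMinEdges(3, edges)));
  }
}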

Nodes 2 and 13 select the same edge (2, 13, 7), thus a cycle of size 2 is formed by nodes 2 and 13. All the edges in V_p are added to the MSF E_t.

Cycle Breaking: Cycle breaking breaks the cycles to make sure that there are no cycles except for self cycles in the forest. According to Lemma 6.2, a cycle of length 2 can be easily detected if there is a node v with p(v) ≠ v and p(p(v)) = v. We can eliminate such cycles using the following rule:

(Cycle Breaking Rule): For a node v with p(v) ≠ v and p(p(v)) = v, if p(v) > v, then assign v to p(v).

By applying the rule, we create a table V_b with no cycles (except for self cycles) using a self NE join V_p ✶^{NE}_{id,p} V_p as shown in line 4. For example, for the forest V_p in Fig. 10(b), node 4 has p(4) = 6 ≠ 4 and p(p(4)) = 4. Since p(4) > 4, by applying the cycle breaking rule, p(4) is updated to 4. The new forest V_b with no cycles of length larger than 1 is shown in Fig. 11.

[Figure 11: Cycle Breaking: Compute V_b]

Pointer Jumping: Pointer jumping is analogous to that in Algorithm 4, which creates a table V_c by changing the parent of each node v to its grandparent, using p(v) = p(p(v)) by a self NE join V_b ✶^{NE}_{id,p} V_b as shown in line 5. Again, in pointer jumping, we create a counter ns to count the number of non-star nodes. When ns = 0, the algorithm terminates and outputs E_t as the MSF of G. For example, after pointer jumping, the forest V_b in Fig. 11 is changed to the forest V_c in Fig. 12. The parent of node 8 changes to its grandparent 4, and two new stars with roots 7 and 4 are created.

[Figure 12: Pointer Jumping: Compute V_c]

Edge Hooking: Edge hooking aims to eliminate all stars (except for isolated stars) in the forest V_c. Suppose we create a table V_s in line 7 with all stars detected using the same procedure star as in Algorithm 4. In edge hooking, for any node v which is the root of a star (i.e., p(v) = v and s(v) = 1), let (x, y, w) = min{(x', y', w') | p(y') = v, p(x') ≠ p(y')}; then v is assigned a new parent u = p(x) after hooking. Edge hooking is done in three steps. (1) Create a new edge table E_m by embedding p(x) into each edge (x, y, w) ∈ E using V_s ✶^{NE}_{id,id1} E (line 8). (2) Create a table V_m, in which for each node y such that y is in a star, pm(y) = p(x) with em(y) = (x, y, w) = min{(x', y', w') | p(y') = v, p(x') ≠ p(y')} is computed. This can be done with an aggregate function amin((x,y,w), p(x) | p(x) ≠ p(y)) using the EN join σ_{s=1}(V_s ✶^{EN}_{id,id2} E_m) (line 9). (3) Create a table V_p, in which the parent of each node v is updated to pm(y) with (x, y, w) = min{em(y) | p(y) = v} if such a pm exists, using V_s ✶^{EN}_{id,p} V_m (line 10). The corresponding edge em = min{em(y) | p(y) = v} is added to the MSF table E_t (line 11) by Lemma 6.1, if it exists. It is guaranteed that Lemma 6.2 still holds on the V_p created in edge hooking.

[Figure 13: Edge Hooking: Compute V_p]

For example, for V_c in Fig. 12, there exists an edge (x, y) = (15, 16) with u = p(x) = 9 and v = p(y) = 7. Since 7 is the root of a star and (15, 16, 13) = min{(x', y', w') | p(y') = 7, p(x') ≠ 7} = min{(10, 7, 15), (9, 18, 14), (15, 16, 13), (5, 2, 14)}, 7 is hooked to 9 as shown in Fig. 13. Similarly, 4 is also hooked to 9. The edges (15, 16, 13) and (12, 8, 10) are added to the MSF E_t.

The following theorem shows the efficiency of Algorithm 5. Due to lack of space, the proof is omitted.

Theorem 6.1: Algorithm 5 stops in O(log(n)) iterations. ✷

The comparison of the algorithms OneRoundMSF, MultiRoundMSF, and our algorithm MSF is shown in Table 3 in terms of memory consumption per machine, total communication cost per round, and the number of rounds. As we will show later in our experiments, the high memory requirement of OneRoundMSF and MultiRoundMSF becomes the bottleneck for these algorithms to achieve high scalability when handling graphs with large n.

Table 3: MSF Computation Algorithms in MapReduce

                          OneRoundMSF      MultiRoundMSF          MSF
  Memory/machine          O(n^{1+c/2})     O(n^{1+ϵ})             O(1)
  Communication/round     O(m^{1+c/2})     O(n+m)                 O(n+m)
  Number of rounds        O(1)             O((log_n(m)-1)/ϵ)      O(log(n))

7. PERFORMANCE STUDIES

In this section, we show our experimental results. We deploy a cluster of 17 computing nodes, including one master node and 16 slave nodes, each of which has four Intel Xeon 2.4GHz CPUs and 15GB RAM running 64-bit Ubuntu Linux. We implement all algorithms using Hadoop (version 1.2.1) with Java 1.6. We allow each node to run three mappers and three reducers concurrently, each of which uses a heap size of 2048MB in JVM. The block size in HDFS is set to be 128MB, the data replication factor of HDFS is set to be 3, and the I/O buffer size is set to be 128KB.

Datasets: We use two web-scale graphs, Twitter-2010 (http://law.di.unimi.it/webdata/twitter-2010/) and Friendster (http://snap.stanford.edu/data/com-Friendster.html), with different graph characteristics for testing.

• Twitter-2010 contains 41,652,230 nodes and 1,468,365,182 edges with an average degree of 71. The maximum degree is 3,081,112 and the diameter of Twitter-2010 is around 24.
• Friendster contains 65,608,366 nodes and 1,806,067,135 edges with an average degree of 55. The maximum degree is 5,214 and the diameter of Friendster is around 32.

835 PageRank PageRank CC CC 12 20 PageRank-Pig 30 PageRank-Pig HashToMin HashToMin 16 24 INF 8 10 12 18 8 8 12 6 4 4 4 6 2 Processing Time (Hour) Processing Time (Hour) Processing Time (Hour) Processing Time (Hour) 20% 40% 60% 80% 100% 20% 40% 60% 80% 100% 20% 40% 60% 80% 100% 20% 40% 60% 80% 100% (a) Vary n (Twitter-2010) (b) Vary n (Friendster) (a) Vary n (Twitter-2010) (b) Vary n (Friendster) PageRank PageRank CC CC 12 12 PageRank-Pig PageRank-Pig HashToMin HashToMin 16 9 INF 12 8 8 6 8 6 4 4 3 4 2 Processing Time (Hour) Processing Time (Hour) Processing Time (Hour) Processing Time (Hour) 4 7 10 13 16 4 7 10 13 16 4 7 10 13 16 4 7 10 13 16 (c) Vary t (Twitter-2010) (d) Vary t (Friendster) (c) Vary t (Twitter-2010) (d) Vary t (Friendster) Figure 14: PageRank Figure 16: Connected Component

MSF MSF 24 BFS BFS MultiRoundMSF MultiRoundMSF BFS-Pig 30 BFS-Pig 20 INF INF 24 16 30 30 24 24 18 12 18 18 8 12 12 12 6 6 4 6 Processing Time (Hour) Processing Time (Hour)

Processing Time (Hour) Processing Time (Hour) 20% 40% 60% 80% 100% 20% 40% 60% 80% 100% 20% 40% 60% 80% 100% 20% 40% 60% 80% 100% (a) Vary n (Twitter-2010) (b) Vary n (Friendster) (a) Vary n (Twitter-2010) (b) Vary n (Friendster) MSF MSF 20 BFS 28 BFS MultiRoundMSF MultiRoundMSF BFS-Pig BFS-Pig 16 24 INF INF 20 30 30 12 24 16 24 18 12 18 8 12 12 8 6 4 6 4 Processing Time (Hour) Processing Time (Hour)

Processing Time (Hour) Processing Time (Hour) 4 7 10 13 16 4 7 10 13 16 4 7 10 13 16 4 7 10 13 16 (c) Vary t (Twitter-2010) (d) Vary t (Friendster) (c) Vary t (Twitter-2010) (d) Vary t (Friendster) Figure 15: Breadth First Search Figure 17: Minimum Spanning Forest manner as introduced in Section 3.2 where each key-value pair is slower than PageRank on average. We also vary the number of accessed for only once in the reducer. For CC computation, we iterations d from 2 to 10. The processing time for both PageRank implement HashToMin and HashGToMin as introduced in Sec- and PageRank-Pig increases linearly with the number of iterations. tion 5 by integrating all the optimization techniques introduced in The curves are not shown due to lack of space. [36], and for MSF computation, we implement OneRoundMSF Exp-2: Breadth First Search. Fig. 15 shows the testing results and MultiRoundMSF as introduced in Section 6. All the four for BFS and BFS-Pig by setting the depth of BFS to be 6.The algorithms OneRoundMSF, MultiRoundMSF, HashToMin,and curves on Twitter-2010 and Friendster when varying the graph size HashGToMin, and all our algorithms proposed in this paper are are shown in Fig. 15 (a) and Fig. 15 (b) respectively. The results are implemented by java using plain MapReduce in Hadoop without similar to those in PageRank. As shown in Fig. 15 (c) and Fig. 15 using Pig or other MapReduce based translators. (d), when increasing t from 4 to 16, the processing time for both BFS and BFS-Pig decreases, and the time decreases more sharply Parameters: For each dataset, we extract subgraphs of 20%, 40%, 60%, 80% and 100% nodes of the original graph with a default when t is smaller. This is because when t is smaller, each machine value of 60%. We also vary the number of slave nodes t in the will spend more time on shuffling and sorting data on disk, which cluster from 4 to 16 with a default value of 10. We test both pro- is costly. We also vary the depth of BFS d from 2 to 10.Thecurves cessing time and maximum communication cost on each machine. are omitted since the processing time for both BFS and BFS-Pig The curves for the communication cost are very similar to those increases linearly with d. for processing time, thus we only show the processing time due to Exp-3: Graph Keyword Search. We randomly generate some lack of space. We set the maximum running time to be 36 hours. keywords in both Twitter-2010 and Friendster, and test KWS and If a test does not stop in the time limit, or fails due to out of mem- KWS-Pig using a keyword query with 3 keywords and rmax =3 ory exception, we will denote the processing time using INF.For by default. The trends by varying n and t on both Twitter-2010 PageRank, we consider each graph as a directed graph and for other and Friendster are similar to those in BFS, since in graph keyword algorithms, we consider each graph as an undirected graph. search, the keyword information on each node is propagated to its neighbors in a BFS manner in iterations. We also vary rmax from Exp-1: PageRank. We test PageRank . We set the default number of iterations d to be 6. The testing results are shown in Fig. 14. 1 to 5. Again, the processing time for both KWS and KWS-Pig Fig. 14 (a) and Fig. 14 (b) demonstrate the curves for Twitter-2010 increases linearly with rmax. Due to space limitation, we omit the and Friendster respectively when varying the size of the graph from figures for graph keyword search in the paper. 

[Figure 15: Breadth First Search. Processing time (hours) for BFS and BFS-Pig on Twitter-2010 and Friendster, varying n in (a)-(b) and t in (c)-(d).]

Exp-2: Breadth First Search. Fig. 15 shows the testing results for BFS and BFS-Pig, setting the depth of BFS to 6. The curves on Twitter-2010 and Friendster when varying the graph size are shown in Fig. 15 (a) and Fig. 15 (b) respectively. The results are similar to those for PageRank. As shown in Fig. 15 (c) and Fig. 15 (d), when increasing t from 4 to 16, the processing time for both BFS and BFS-Pig decreases, and the time decreases more sharply when t is smaller. This is because when t is smaller, each machine spends more time shuffling and sorting data on disk, which is costly. We also vary the depth of BFS d from 2 to 10. The curves are omitted since the processing time for both BFS and BFS-Pig increases linearly with d.

Exp-3: Graph Keyword Search. We randomly generate keywords in both Twitter-2010 and Friendster, and test KWS and KWS-Pig using a keyword query with 3 keywords and rmax = 3 by default. The trends when varying n and t on both Twitter-2010 and Friendster are similar to those for BFS, since in graph keyword search, the keyword information on each node is propagated to its neighbors in a BFS manner in iterations (one such round is sketched below). We also vary rmax from 1 to 5. Again, the processing time for both KWS and KWS-Pig increases linearly with rmax. Due to space limitation, we omit the figures for graph keyword search in the paper.
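
As an illustration of this propagation, the sketch below shows the reduce side of one BFS round in plain Hadoop, again not the join-operator based implementation evaluated here. The assumed line format is node, current depth (-1 if unreached), and comma-separated neighbors; the map side mirrors the PageRank mapper above, except that a node reached at depth d offers d + 1 to each neighbor. Graph keyword search would carry one such distance per keyword, capped at rmax.

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reduce side of one BFS round: keep the smallest depth offered to a node.
// Messages are either the node's own record "#depth|adjacency" or a plain
// number offered by an in-neighbor reached in a previous round.
public class BfsMinReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text node, Iterable<Text> msgs, Context ctx)
      throws IOException, InterruptedException {
    long best = -1;            // -1 encodes "unreached"
    String adjacency = "";
    for (Text m : msgs) {      // one streaming pass over the values
      String s = m.toString();
      if (s.startsWith("#")) {
        int bar = s.indexOf('|');
        long own = Long.parseLong(s.substring(1, bar));
        adjacency = s.substring(bar + 1);
        if (own >= 0 && (best < 0 || own < best)) best = own;
      } else {
        long offered = Long.parseLong(s);
        if (best < 0 || offered < best) best = offered;
      }
    }
    ctx.write(node, new Text(best + "\t" + adjacency));
  }
}
```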

Exp-4: Connected Component. We test three algorithms: CC, HashToMin, and HashGToMin. Since HashGToMin fails in almost all test cases due to out-of-memory exceptions, we only show the testing results for HashToMin and CC.

Fig. 16 (a) shows the curves when varying the size of the graph in Twitter-2010. When the number of nodes n in the graph is 20%, HashToMin and CC have similar performance. However, when n increases, the processing time of HashToMin increases more sharply than that of CC. The reasons are twofold. First, HashToMin generates more intermediate results than CC, which increases both the communication cost and the cost of sorting data on each machine. This is consistent with the theoretical results shown in Table 2. Second, HashToMin needs to sort all the edges in the graph in order to achieve constant memory consumption on each machine, which is very costly. Fig. 16 (b) shows the testing results on Friendster when varying n. HashToMin cannot stop within the time limit once n increases to 60%, while CC remains stable. The reason is the large amount of intermediate results generated by HashToMin. Note that Friendster is a graph with a large diameter and a uniform degree distribution. In such a graph, the number of iterations used by HashToMin is large. Since the communication cost of HashToMin in each round is linear with respect to the number of iterations used, a large number of intermediate results are generated and shuffled by HashToMin in each round, making HashToMin inefficient.

The testing results when varying t on Twitter-2010 and Friendster are shown in Fig. 16 (c) and Fig. 16 (d) respectively. On Twitter-2010, the processing time for both CC and HashToMin decreases when t increases. When t increases to 13, the time for HashToMin decreases more slowly than that for CC. HashToMin is 1.5 to 2.2 times slower than CC in all test cases. When varying t on Friendster, HashToMin cannot stop within the time limit until t reaches 16, due to the large number of intermediate results generated, while the processing time of CC decreases stably when t increases.
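
For reference, the sketch below shows one HashToMin round, following a simplified reading of the algorithm in [36]; it is an illustrative plain-Hadoop version rather than the tuned implementation compared here, and the record layout (a node followed by the id list of its current cluster, initialized to its neighbors) is an assumption of the sketch. Each node sends its whole cluster to the cluster's smallest member, and the smallest member's id to every other member; the reducer unions whatever it receives.

```java
import java.io.IOException;
import java.util.TreeSet;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// One HashToMin round over records "node<TAB>id1,id2,..." (a simplified sketch).
public class HashToMinRound {

  public static class HtmMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable off, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] p = line.toString().split("\t");
      long node = Long.parseLong(p[0]);
      TreeSet<Long> cluster = new TreeSet<>();
      cluster.add(node);
      if (p.length > 1 && !p[1].isEmpty()) {
        for (String s : p[1].split(",")) cluster.add(Long.parseLong(s));
      }
      long min = cluster.first();
      // Send the whole cluster to its smallest member ...
      ctx.write(new LongWritable(min), new Text(join(cluster)));
      // ... and the smallest member's id to everyone else in the cluster.
      for (long u : cluster) {
        if (u != min) ctx.write(new LongWritable(u), new Text(Long.toString(min)));
      }
    }
  }

  public static class HtmReducer extends Reducer<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void reduce(LongWritable node, Iterable<Text> parts, Context ctx)
        throws IOException, InterruptedException {
      TreeSet<Long> cluster = new TreeSet<>();   // union of everything received
      for (Text t : parts) {
        for (String s : t.toString().split(",")) cluster.add(Long.parseLong(s));
      }
      ctx.write(node, new Text(join(cluster)));
    }
  }

  private static String join(TreeSet<Long> ids) {
    StringBuilder b = new StringBuilder();
    for (long id : ids) {
      if (b.length() > 0) b.append(',');
      b.append(id);
    }
    return b.toString();
  }
}
```

Note that this naive reducer materializes an entire cluster in memory; avoiding that, as discussed above, is what forces HashToMin to sort all edges, and this sorting is part of the overhead that makes it slower than CC.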

[Figure 17: Minimum Spanning Forest. Processing time (hours) for MSF and MultiRoundMSF on Twitter-2010 and Friendster, varying n in (a)-(b) and t in (c)-(d).]

Exp-5: Minimum Spanning Forest. In our last set of experiments, we compare three algorithms: MSF, OneRoundMSF, and MultiRoundMSF. OneRoundMSF runs out of memory in all test cases, thus we only show the testing results for MultiRoundMSF and MSF. We assign each edge a random weight drawn uniformly from (0, 1); the assignment itself is as simple as the snippet below. We also tested other methods of assigning edge weights (e.g., using other distributions) and find that the weight assignment method has little effect on the efficiency of the algorithms.

The testing results are shown in Fig. 17, in which Fig. 17 (a) and Fig. 17 (b) show the curves when varying n, and Fig. 17 (c) and Fig. 17 (d) show the curves when varying t. Not surprisingly, MultiRoundMSF runs out of memory in most cases, since it requires a single machine to have a memory size super-linear in the number of nodes of the graph. In Fig. 17 (b), when increasing n from 20% to 100%, the processing time of MSF increases nearly linearly, demonstrating the high scalability of MSF. As shown in Fig. 17 (c) and Fig. 17 (d), once MultiRoundMSF runs out of memory, the problem cannot be solved by adding more machines to the system, while the performance of MSF improves stably when the number of slaves increases from 4 to 16.
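
The uniform weight assignment used in this experiment can be reproduced with a few lines; the edge record layout ("u<TAB>v" per line) and the fixed seed below are assumptions of the sketch, used only so that repeated runs see the same weights.

```java
import java.util.Random;

// Attach a weight drawn uniformly at random to an edge record "u<TAB>v".
public class EdgeWeighter {
  private static final Random RNG = new Random(42L);  // fixed seed, assumed for repeatability

  public static String weight(String edgeLine) {
    double w = RNG.nextDouble();  // uniform in [0, 1); exact zeros and ties are negligible
    return edgeLine + "\t" + w;
  }
}
```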

[Figure 18: Degree Distribution Test. (a) Processing time scale of CC when varying n; (b) speedup of CC when varying t, on Twitter-2010 and Friendster.]

Exp-6: Degree Distribution Test. We test the impact of the degree distribution on the efficiency of the algorithms proposed in this paper. Recall that the datasets Twitter-2010 and Friendster have different degree distributions: Twitter-2010 has a skewed degree distribution with maximum degree 3,081,112, and Friendster has a more uniform degree distribution with maximum degree 5,214. We test the CC algorithm on both Twitter-2010 and Friendster. First, we take the processing time for n = 20% as 1, and test the relative processing time scale (i.e., the processing time divided by the time for n = 20%) when varying n from 20% to 100% on both Twitter-2010 and Friendster. The result is shown in Fig. 18 (a). The curves for the processing time scale on Twitter-2010 and Friendster are similar, which indicates that the processing time scale when varying n is not largely impacted by the degree distribution. Next, we take the processing time for t = 4 as 1, and test the speedup of the algorithm (the time for t = 4 divided by the time for t) when varying t from 4 to 16 on both Twitter-2010 and Friendster. The result is shown in Fig. 18 (b). When t is large, the speedup of CC on Twitter-2010 increases more slowly than on Friendster, because the nodes with very high degree in Twitter-2010 become the bottleneck for the speedup, while the speedup for Friendster still keeps increasing stably when t = 16. For the other proposed algorithms, we get similar results. Due to space limitation, we only show the results for the CC algorithm in the paper.

8. RELATED WORK

MapReduce Framework: MapReduce [10] is a big data processing framework that has become the de facto standard throughout both industry and academia. In industry, Google has developed a MapReduce system on the distributed key-value store BigTable on top of GFS (Google File System). Yahoo has developed a MapReduce system, Hadoop, and a high-level data flow language, Pig, based on the distributed key-value store HBase on top of HDFS (Hadoop Distributed File System). Facebook has developed a SQL-like data warehouse infrastructure, Hive, based on Hadoop, using the key-value store Cassandra. Microsoft has developed two languages, Scope and DryadLINQ, based on the distributed execution engine Dryad. Amazon has developed a key-value store called Dynamo.

In academia, Hadoop++ by Dittrich et al. [11] improves the performance of Hadoop using user-defined functions. Column-based storage is studied by Floratou et al. [14] and hybrid-based storage is studied by Lin et al. [28] to speed up MapReduce tasks. HAIL is proposed by Dittrich et al. [12] to improve the upload pipeline of HDFS. Query optimization in MapReduce is discussed by Jahani et al. [18] and Lim et al. [27]. Multi-query and iterative query optimization in MapReduce are studied by Wang et al. [45] and Onizuka et al. [34] respectively. Cost analysis of MapReduce is given by Afrati et al. [3]. Some other works focus on solving a specific type of query in MapReduce. For example, set similarity joins in MapReduce are studied by Vernica et al. [44] and by Metwally and Faloutsos [32]. Theta joins in MapReduce are discussed by Okcan and Riedewald [33] and Zhang et al. [48]. Multiway joins are optimized by Afrati et al. [4]. KNN joins in MapReduce are proposed by Lu et al. [29]. A survey on distributed data management and processing using MapReduce is given by Li et al. [26].

Graph Processing Systems in Cloud: Many graph processing systems have been developed to deal with big graphs. One representative system is Pregel [31], with its open-source implementations Giraph and HAMA based on Hadoop. Pregel takes a vertex-centric approach and implements the bulk synchronous parallel (BSP) computation model [43], which targets iterative computation on graphs. HipG [24] improves BSP by using asynchronous messages to avoid synchronization. Microsoft Research has developed a distributed in-memory graph processing engine called Trinity [38, 47] based on a hypergraph model. PowerGraph [15] is a distributed graph processing system that is optimized to process power-law graphs. Distance oracles are studied in [35]. Giraph++ is proposed in [42] to take graph partitioning into consideration when processing graphs. Workload balancing for graph processing in the cloud is discussed in [37].

Graph Processing in MapReduce: Many graph algorithms, including triangle/rectangle enumeration, k-clique computation, barycentric clustering, and component finding in MapReduce, are discussed in [9]. In [25], several techniques are proposed to reduce the size of the input in a distributed fashion in MapReduce, and the techniques are applied to several graph problems such as minimum spanning trees, approximate maximal matchings, approximate node/edge covers, and minimum cuts. Triangle counting in MapReduce is optimized in [40]. Diameter and radii computations in MapReduce are studied in [20]. Personalized PageRank computation in MapReduce is studied in [7]. Matrix-multiplication-based graph mining algorithms in MapReduce are discussed in [19, 21]. Densest subgraph computation in MapReduce is studied in [8]. Subgraph instance enumeration in MapReduce is proposed in [2]. Algorithms for connected components computation in MapReduce in logarithmic rounds are discussed in [36]. Maximum clique computation in MapReduce is studied in [46]. Recursive query processing, such as transitive closure computation, in MapReduce is discussed in [1].

Algorithm Classes in MapReduce: Algorithm classes in MapReduce are studied by Karloff et al. [23] and Tao et al. [41], both of which have been introduced in detail in Section 2. Note that by defining a new class for graph processing in MapReduce, we focus on the scalability issues that can be commonly achieved by a class of graph algorithms in MapReduce. The query optimization techniques introduced in the MapReduce framework are orthogonal to our work and can be studied by case-to-case analyses. Each of the algorithms studied in this paper can certainly benefit from further optimization using techniques such as multi-query optimization [45] and iterative query optimization [34], which are beyond the main scope of this paper.

9. CONCLUSIONS

In this paper, we study scalable big graph processing in MapReduce. We review previous MapReduce classes and propose a new class SGC to guide the development of scalable graph processing algorithms in MapReduce. We introduce two graph join operators using which a large range of graph algorithms can be designed in SGC. In particular, for two fundamental graph problems, CC computation and MSF computation, we improve the state-of-the-art algorithms both in theory and in practice. We conducted extensive performance studies using real web-scale graphs to demonstrate the high scalability achieved by our algorithms in SGC.

Acknowledgements. Lu Qin was supported by ARC DE140100999. Jeffrey Xu Yu was supported by grant of the RGC of Hong Kong SAR, No. CUHK 418512. Hong Cheng was supported by grants of the RGC of Hong Kong SAR, No. CUHK 411310 and 411211. Xuemin Lin was supported by NSFC61232006, NSFC61021004, ARC DP110102937, ARC DP120104168, and ARC DP140103578.

10. REFERENCES
[1] F. N. Afrati, V. R. Borkar, M. J. Carey, N. Polyzotis, and J. D. Ullman. Map-reduce extensions and recursive queries. In Proc. of EDBT'11, 2011.
[2] F. N. Afrati, D. Fotakis, and J. D. Ullman. Enumerating subgraph instances using map-reduce. In Proc. of ICDE'13, 2013.
[3] F. N. Afrati, A. D. Sarma, S. Salihoglu, and J. D. Ullman. Upper and lower bounds on the cost of a map-reduce computation. PVLDB, 6(4), 2013.
[4] F. N. Afrati and J. D. Ullman. Optimizing multiway joins in a map-reduce environment. IEEE Trans. Knowl. Data Eng., 23(9):1282–1298, 2011.
[5] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. Data Structures and Algorithms. Addison-Wesley, 1983.
[6] B. Awerbuch and Y. Shiloach. New connectivity and MSF algorithms for shuffle-exchange network and PRAM. IEEE Trans. Computers, 36(10), 1987.
[7] B. Bahmani, K. Chakrabarti, and D. Xin. Fast personalized pagerank on mapreduce. In Proc. of SIGMOD'11, 2011.
[8] B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest subgraph in streaming and mapreduce. PVLDB, 5(5):454–465, 2012.
[9] J. Cohen. Graph twiddling in a mapreduce world. Computing in Science and Engineering, 11(4):29–41, 2009.
[10] J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In Proc. of OSDI'04, 2004.
[11] J. Dittrich, J.-A. Quiané-Ruiz, A. Jindal, Y. Kargin, V. Setty, and J. Schad. Hadoop++: Making a yellow elephant run like a cheetah (without it even noticing). PVLDB, 3(1), 2010.
[12] J. Dittrich, J.-A. Quiané-Ruiz, S. Richter, S. Schuh, A. Jindal, and J. Schad. Only aggressive elephants are fast elephants. PVLDB, 5(11):1591–1602, 2012.
[13] U. Elsner. Graph partitioning - a survey. Technical Report SFB393/97-27, Technische Universität Chemnitz, 1997.
[14] A. Floratou, J. M. Patel, E. J. Shekita, and S. Tata. Column-oriented storage techniques for mapreduce. PVLDB, 4, 2011.
[15] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In OSDI'12, 2012.
[16] H. He, H. Wang, J. Yang, and P. S. Yu. BLINKS: ranked keyword searches on graphs. In Proc. of SIGMOD'07, 2007.
[17] V. Hristidis, H. Hwang, and Y. Papakonstantinou. Authority-based keyword search in databases. ACM Trans. Database Syst., 33(1), 2008.
[18] E. Jahani, M. J. Cafarella, and C. Ré. Automatic optimization for mapreduce programs. PVLDB, 4, 2011.
[19] U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos. Gbase: a scalable and general graph management system. In Proc. of KDD'11, 2011.
[20] U. Kang, C. E. Tsourakakis, A. P. Appel, C. Faloutsos, and J. Leskovec. Hadi: Mining radii of large graphs. TKDD, 5(2), 2011.
[21] U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system implementation and observations. In Proc. of ICDM'09, 2009.
[22] D. R. Karger, N. Nisan, and M. Parnas. Fast connected components algorithms for the EREW PRAM. SIAM J. Comput., 28(3), 1999.
[23] H. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for mapreduce. In Proc. of SODA'10, 2010.
[24] E. Krepska, T. Kielmann, W. Fokkink, and H. E. Bal. Hipg: parallel processing of large-scale graphs. Operating Systems Review, 45(2), 2011.
From "think [5] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. Data Structures and Algorithms. like a vertex" to "think like a graph". PVLDB,7(3),2013. Addison-Wesley, 1983. [43] L. G. Valiant. A bridging model for parallel computation. Commun. ACM, [6] B. Awerbuch and Y. Shiloach. New connectivity and msf algorithms for 33(8):103–111, 1990. shuffle-exchange network and pram. IEEE Trans. Computers,36(10),1987. [44] R. Vernica, M. J. Carey, and C. Li. Efficient parallel set-similarity joins using [7] B. Bahmani, K. Chakrabarti, and D. Xin. Fast personalized pagerank on mapreduce. In Proc. of SIGMOD’10,2010. mapreduce. In Proc. of SIGMOD’11,2011. [45] G. Wang and C.-Y. Chan. Multi-query optimization in mapreduce framework. [8] B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest subgraph in streaming and PVLDB,7(3),2013. mapreduce. PVLDB,5(5):454–465,2012. [46] J. Xiang, C. Guo, and A. Aboulnaga. Scalable maximum clique computation [9] J. Cohen. Graph twiddling in a mapreduce world. Computing in Science and using mapreduce. In Proc. of ICDE’13,2013. Engineering,11(4):29–41,2009. [47] K. Zeng, J. Yang, H. Wang, B. Shao, and Z. Wang. A distributed graph engine [10] J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large for web scale rdf data. PVLDB,6(4),2013. clusters. In Proc. of OSDI’04,2004. [48] X. Zhang, L. Chen, and M. Wang. Efficient multi-way theta-join processing [11] J. Dittrich, J.-A. Quiané-Ruiz, A. Jindal, Y. Kargin, V. Setty, and J. Schad. using mapreduce. PVLDB,5(11):1184–1195,2012. Hadoop++: Making a yellow elephant run like a cheetah (without it even noticing). PVLDB,3(1),2010.
