Balanced Graph Partitioning with Apache...

12
Balanced Graph Partitioning with Apache Spark Emanuele Carlini 1 , Patrizio Dazzi 1 , Andrea Esposito 2 , Alessandro Lulli 2 , and Laura Ricci 2 1 Istituto di Scienza e Tecnologie dell’Informazione (ISTI) Consiglio Nazionale delle Ricerche (CNR) {name.surname}@isti.cnr.it 2 Dipartimento di Informatica Universit` a di Pisa {surname}@di.unipi.it Abstract. A significant part of the data produced every day by on- line services is structured as a graph. Therefore, there is the need for efficient processing and analysis solutions for large scale graphs. Among the others, the balanced graph partitioning is a well known NP-complete problem with a wide range of applications. Several solutions have been proposed so far, however most of the existing state-of-the-art algorithms are not directly applicable in very large-scale distributed scenarios. A recently proposed promising alternative exploits a vertex-center heuris- tics to solve the balance graph partitioning problem. Their algorithm is massively parallel: there is no central coordination, and each node is pro- cessed independently. Unfortunately, we found such algorithm to be not directly exploitable in current BSP-like distributed programming frame- works. In this paper we present the adaptations we applied to the original algorithm while implementing it on Spark, a state-of-the-art distributed framework for data processing. Keywords: graph partitioning; distributed algorithm; approximated al- gorithm 1 Introduction In the recent years we experienced an exponential growth and availability of structured and unstructured data. Every day, large amounts of data is pro- duced by on-line services like social networking, social media, fora, newsletters, mailing-lists, etc. Normally, this data needs deep analysis, in order to exploit its intrinsic value. However, due to the data size, efficient analysis cannot be performed with the computational capabilities of a single machine. Instead, it is a common practice to rely on efficient approaches and scalable solutions able to conduct these computations by orchestrating the work of a vast, distributed, set of computational resources. A significant part of the aforementioned data is modelled as a graph (some- times as an hyper-graph), in which vertices represent entities and (hyper-) edges represent the relationship between them. Usually, computation on graph relies

Transcript of Balanced Graph Partitioning with Apache...

Page 1: Balanced Graph Partitioning with Apache Sparkhpc.isti.cnr.it/.../exe/fetch.php?media=pdf:bgp-spark.pdf · 2015. 2. 28. · Balanced Graph Partitioning with Apache Spark Emanuele Carlini

Balanced Graph Partitioning with Apache Spark

Emanuele Carlini1, Patrizio Dazzi1, Andrea Esposito2, Alessandro Lulli2, andLaura Ricci2

1 Istituto di Scienza e Tecnologie dell’Informazione (ISTI)Consiglio Nazionale delle Ricerche (CNR)

{name.surname}@isti.cnr.it2 Dipartimento di Informatica

Universita di Pisa{surname}@di.unipi.it

Abstract. A significant part of the data produced every day by on-line services is structured as a graph. Therefore, there is the need forefficient processing and analysis solutions for large scale graphs. Amongthe others, the balanced graph partitioning is a well known NP-completeproblem with a wide range of applications. Several solutions have beenproposed so far, however most of the existing state-of-the-art algorithmsare not directly applicable in very large-scale distributed scenarios. Arecently proposed promising alternative exploits a vertex-center heuris-tics to solve the balance graph partitioning problem. Their algorithm ismassively parallel: there is no central coordination, and each node is pro-cessed independently. Unfortunately, we found such algorithm to be notdirectly exploitable in current BSP-like distributed programming frame-works. In this paper we present the adaptations we applied to the originalalgorithm while implementing it on Spark, a state-of-the-art distributedframework for data processing.

Keywords: graph partitioning; distributed algorithm; approximated al-gorithm

1 Introduction

In the recent years we experienced an exponential growth and availability ofstructured and unstructured data. Every day, large amounts of data is pro-duced by on-line services like social networking, social media, fora, newsletters,mailing-lists, etc. Normally, this data needs deep analysis, in order to exploitits intrinsic value. However, due to the data size, efficient analysis cannot beperformed with the computational capabilities of a single machine. Instead, it isa common practice to rely on efficient approaches and scalable solutions able toconduct these computations by orchestrating the work of a vast, distributed, setof computational resources.

A significant part of the aforementioned data is modelled as a graph (some-times as an hyper-graph), in which vertices represent entities and (hyper-) edgesrepresent the relationship between them. Usually, computation on graph relies

Page 2: Balanced Graph Partitioning with Apache Sparkhpc.isti.cnr.it/.../exe/fetch.php?media=pdf:bgp-spark.pdf · 2015. 2. 28. · Balanced Graph Partitioning with Apache Spark Emanuele Carlini

heavily on locality, in the sense that the computation for a vertex is done consid-ering its neighbours in the graph. Hence, when going distributed an importantissue is data partitioning: extremely large scale graphs must be distributed tohosts in such a way that, for each vertex, most of the adjacent edges are storedon the same host [11].

The identification of good partitions is a well-known and well-studied prob-lem in graph theory [6]. The aim of graph partitioning, sometimes referred to asthe solution of the min-cut problem, is to divide a graph into a defined amount ofcomponents, such that to minimize the number of edges connecting the compo-nents. A variant of the min-cut problem is the balanced (or uniform) graph par-titioning problem. In this case it is also important that the components presenta balanced amount of nodes.

A balanced graph partitioning has many relevant applications, including bio-logical networks, parallel programming, and on-line social network analysis andit can be used to minimise communication cost and to balance workload. As sug-gested by this considerable amount of potential applications the problem is notnew and there are many proposed solutions. In the last years, a large amountof algorithms have been conceived, implemented and optimised for achievinggood graph partitioning [1], [10], [9], [14], [15], [19]. Some of these solutions areparallel, and most of them implicitly assume to have a not expensive, randomaccess to the entire graph. In contrast to this vision, current large scale graphsare composed by billions of nodes and hundreds of billions of edges, e.g., userrelationships and interaction graphs from online social networking services suchas Facebook or Twitter. Clearly, these graphs do not fit into the main memoryof a single computer but are distributed, with only a small fraction of the nodeshosted on a single computer.

Despite the interest on the problem, only recently distributed approachesfor efficient balanced partitioning have been proposed. Among the others, JA-BE-JA [17] is a recent promising approach that employs a fully distributed,vertex-center heuristics. In JA-BE-JA each node of the graph is a virtual pro-cessing unit, having only the information about its neighbourhood. The nodesalso acquire knowledge about a small subset of random nodes in the graph byusing purely local interactions. Initially, every node selects a random partition,and over time nodes swap their partitions with each other in a way that increasesthe number of neighbours in the same partition as themselves.

JA-BE-JA aims at dealing with extremely large distributed graphs. To thisend the algorithm exploits its locality, simplicity and lack of synchronisationrequirements. JA-BE-JA was evaluated using different datasets. Some of themwere synthetically generated graphs that are well-known in the graph partition-ing community. They are part of a publicly downloadable archive made availableby Chris Walshaw [21]. The authors of JA-BE-JA also tested their approachwith real datasets, a set of sampled graphs extracted by Facebook [20] and Twit-ter [2]. For comparison they used METIS [8], showing that their distributed solu-tion can in some cases outperform the results achieved by METIS. Most of theirexperiments have been conducted by implementing the JA-BE-JA algorithm

Page 3: Balanced Graph Partitioning with Apache Sparkhpc.isti.cnr.it/.../exe/fetch.php?media=pdf:bgp-spark.pdf · 2015. 2. 28. · Balanced Graph Partitioning with Apache Spark Emanuele Carlini

on the PeerSim [16] simulator. Further, the authors of JA-BE-JA claims thatdue to its flexibility, it can be adapted easily to the currently available graphprocessing frameworks such as Pregel [12], GraphLab [11] or Spark [24] by meansof the Bulk Synchronous Parallel (BSP) abstraction.

Starting from JA-BE-JA (detailed in Section 2), our main contribution isthe implementation and analysis of JA-BE-JA in Apache Spark. Before portingJA-BE-JA to Spark, we implemented a BSP-like version of JA-BE-JA overPeerSim. Our version adds a BSP barrier on the PeerSim communications, soto simulate the execution on a Spark-like environment. From the analysis of theBSP-PeerSim we noticed a sensible reduction of the performance with respectto the original formulation and that gave us an idea of the performance wecould expect on Spark. When porting on Spark, we implemented our own BSPabstraction, as the current available technologies (e.g. Bagel and GraphX) havebeen not suitable to accommodate easily the execution of JA-BE-JA. The wholedescription of the porting work is detailed in Section 3.1.

Further, to reduce random access to the nodes of the graph when applying theheuristics of JA-BE-JA, we relaxed the consistency constraint on the knowledgeof the nodes about their neighborhood (Section 3.2). Finally, we validate oursolution by extensive experiments and we found that the relaxed version of JA-BE-JA retains the same performance in terms of the quality of the min-cut, butit reduces considerably the amount of memory used (Section 4).

2 JA-BE-JA and the Balanced min-cut problem

The balanced k-way graph partitioning is a well-know problem in graph theoryand often it is referred as the k-way min-cut problem. Here, we give a formalformulation of this problem, derived from the one defined by the JA-BE-JAauthors.

Consider an undirected graph G = (V,E), with V representing the set ofvertices and E the set of edges. A k-way partitioning divides the set V into ksubsets. The fewer edges cross the boundaries of each subgraph (or component),the higher is the quality of the achieved partitioning. The balanced version ofthe problem consists in having the components of approximately the same size.

A formal description of k-way partitioning can be given with the help of apartition function π : V → {1, . . . , k} that assigns a color to each vertex ofthe graph. Thus, πp indicates the color of node p. The vertices of the same colorbelongs to the same partition. Let Np indicate the neighbourhood of vertex pand Np(c) the set of neighbours of p having color c. The number of neighboursof node p is denoted by dp, and dp(c) = |Np(c)| is the number of neighbours ofp with color c.

Now consider the energy of the system as the number of existing edgesconnecting nodes with different colours. Accordingly, the energy of a node is thenumber of its neighbours with a different color, and the energy of the graph is

Page 4: Balanced Graph Partitioning with Apache Sparkhpc.isti.cnr.it/.../exe/fetch.php?media=pdf:bgp-spark.pdf · 2015. 2. 28. · Balanced Graph Partitioning with Apache Spark Emanuele Carlini

the sum of the energy of the nodes.

E (G, π) =1

2

∑p∈V

(dp − dp(πp)) (1)

where the sum is divided by two to avoid to sum each edge twice. Given thisformal description, the balanced optimisation problem can be expressed as theproblem of finding the optimal partitioning π∗.

π∗ = argminE(G, π) (2)

s.t. |V (c1)| = |V (c2)| ∀c1, c2 ∈ {1, . . . , k} (3)

where V (c) is the set of nodes with color c.

2.1 JA-BE-JA

JA-BE-JA is a distributed algorithm for resolving the balanced k-way problem.Given a ”colored” graph as defined above, the idea behind JA-BE-JA is toapply a local heuristic to drive the system into a lower energy state.

The local heuristics is executed by all the nodes in parallel. Each node at-tempts to swap the color to the most dominant color among its neighbors, by: (i)selecting another node from either its neighbors or from a random sample, and(ii) considering the utility of the color swapping. If the color swapping decreasesthe energy, then the two nodes swap their colors; otherwise, they keep theircolors. To avoid remaining in a local optimum solution, JA-BE-JA employs asimulated annealing technique.

Since node just exchange colors, the distribution of colors (i.e. the numberof partition) is maintained during the whole process. Therefore, if the color atthe start is assigned uniform, the final result it is expected to have balancedpartitions.

3 Our Distributed Implementation

In this section we present our distributed implementation and the adaptationsthat have been required in order to efficiently exploit JA-BE-JA into a BSP-likeenvironment.

JA-BE-JA is designed to fit two models: one host-one node and one host-multiple nodes (respectively also called “one-to-one” and “one-to-many”). The“one-to-one” node case represents a fully distributed P2P computation, the “one-to-many” model, instead, exploits a shared-memory support to short-cut thecommunications performed between the vertices located on the same machine.This latter solution is not Pregel-like because introduces the concept of sub-supersteps (involving intra-superstep communications) and different user-definedfunctions are executed for each vertex, depending on the exploitation of theshared-memory. If on the one hand this approach may speed-up the computation,

Page 5: Balanced Graph Partitioning with Apache Sparkhpc.isti.cnr.it/.../exe/fetch.php?media=pdf:bgp-spark.pdf · 2015. 2. 28. · Balanced Graph Partitioning with Apache Spark Emanuele Carlini

on the other hand it does not fit most of the current Pregel-like frameworks.Therefore our focus is on the “one-to-one” model.

According to the original authors, JA-BE-JA can be ported in Pregel-likeenvironment if those assumptions hold:

– the nodes of the graph are processed periodically and asynchronously;– each node only has access to the state of its immediate neighbours and a

small set of random nodes in the graph: no shared memory is required;– nodes could be placed either on an independent host each, or processed in

separate threads in a distributed framework;– nodes communicate only through messages over edges of the graph;

To evaluate the suitability of JA-BE-JA in a Pregel-like environment, wefirst implemented it (starting from the original code of the authors) in a modifiedversion of PeerSim that emulates the synchronisation barrier of the BSP (we referto it as bsp-simulation). To comply with the assumptions above the algorithmhas been modified in the following aspects:

– access-to-neighbours; An assumption at the basis of JA-BE-JA is theavailability to access the neighbours and a small random sample of thegraph. However, the local search operator requires also the knowledge aboutthe neighbourhood of the neighbours. To overcome this limitation our BSP-friendly version of JA-BE-JA implements an additional message to retrievethe neighbourhood of any node. Although this gives us the ability to operateaccording to the assumptions, it slows down the performance. In terms ofnumber of supersteps, as periodically one superstep is used to obtain infor-mation from the neighbours, and in term of number of messages as 2 · |E|messages are injected in the system at each superstep;

– threads; The active and passive threads of JA-BE-JA have been collapsedin a single workflow. This let us to adhere to the BSP specification, wherecomputations are kind of monolithic sequences between two supersteps.

As a consequence of this adaptation the performances of the graph parti-tioning decreased significantly, showing that JA-BE-JA can not be realised byadopting the BSP abstraction without a payback. To validate our findings wealso implemented JA-BE-JA in Apache Spark (referred to as bsp-spark). Thesetwo points are discussed in more details in the next section.

3.1 Spark, Bagel, GraphX and our BSP abstraction

Apache Spark is an efficient and scalable data processor. In the last years manyheterogeneous enterprises have chosen Spark [13, 22] because of the different op-erations available to be performed over data. Machine learning, Data Mining andComputational Science are just a bunch of viable areas where Spark is employedin. Also graph processing is aimed. Spark is bundled with Bagel and GraphX.Bagel is a plain Spark implementation of the Pregel-like APIs. GraphX is a

Page 6: Balanced Graph Partitioning with Apache Sparkhpc.isti.cnr.it/.../exe/fetch.php?media=pdf:bgp-spark.pdf · 2015. 2. 28. · Balanced Graph Partitioning with Apache Spark Emanuele Carlini

young custom framework that aims to efficient graph processing, by exploitingits statical topology.

The Resilient Distributed Dataset [23] abstraction is the key concept un-derlying the Spark computations. A RDD is an immutable data structure dis-tributed over several different machines. The distributed computation is achievedby the derivation of a RDD directly from a previous one or from stable stor-age. The chain of transformations, is called lineage. Fault-tolerance is faced byre-computing the lineages starting from the stable storage or a checkpoint ifpresent.

In spite of the existing out of the box abstractions provided by Spark, wedevelop our own custom implementation of the BSP abstraction over Spark. Infact, both Bagel and GraphX were inadequate to satisfy the requirements asa P2P-like algorithm such as JA-BE-JA. More precisely JA-BE-JA relies onthe access of a random sample of the graph for each node, whereas a GraphX’scomputation involves only neighbours i.e. no random sample is available. Beside,Bagel performs the computation without exposing to the developers the hooksrequired to calculate the total energy i.e. the edge-cut computation requires tobe performed before or after a BSP-superstep in order to be stable. Our BSP-likeimplementation suspends the computation accordingly in order to perform theexecution of the hooks.

Our BSP-like implementation also addressed several technical issues thathave arisen when implementing iterative algorithms in Spark. In fact, in our casethe computation is massively iterative, thus the RDD’s lineage could eventuallybe so long to deplete the stack space reserved by the Java Virtual Machine.To address this issue we forced a periodic check-pointing of the RDDs on disk,to prevent the accumulation of long lineage chains. This solutions increased theamount of disk swapping and the total time of computation, but accomplished inavoiding stack overflows even for large graphs. RDD transformations could alsothrow “Out-of-Memory” exceptions if intermediate data (e.g. data referring toprevious supersteps) were not dropped along the super-steps. Bagel does not takeinto account this aspect, therefore to limit this problem, we careful de-allocatedany intermediate data accordingly.

3.2 A faster implementation

To further improve our JA-BE-JA implementation, we focused on the optimiza-tion of the completion time. We notice that most part of the computation timewas taken by the mechanism introduced to resolve the access-to-neighbour issueexplained above. This mechanism forced us to reserve some supersteps for ac-quiring the information about the neighbourhood of neighbours, increasing thecomputation time significantly.

To avoid this problem, we implemented a mechanism in which nodes piggy-back theirs neighbourhood information in the original messages. This mechanismguarantee a certain degree of consistency, but still cause the the nodes to runthe local heuristic on possibly stale data.

Page 7: Balanced Graph Partitioning with Apache Sparkhpc.isti.cnr.it/.../exe/fetch.php?media=pdf:bgp-spark.pdf · 2015. 2. 28. · Balanced Graph Partitioning with Apache Spark Emanuele Carlini

In other words, we relaxed the assumption of the original JA-BE-JA onthe consistency of the nodes knowledge about their neighbours. Although thisintroduced a certain degree of approximations, we greatly reduced the processingtime. Section 4 supports these claims with experimental evidence.

4 Experimental Results

To evaluate the performances of our proposed solution we compared the differentimplementation of JA-BE-JA described in the paper:

– original-simulation: the original implementation provided by the authorsof the JA-BE-JA paper;

– bsp-simulation: our implementation over PeerSim that emulates the BSPabstraction;

– bsp-spark: our implementation over Spark, which includes also our ownBSP abstraction implementation. This implementation has also an approxi-mated version.

The analysis of the performances relied on a set of metrics to estimate theviability of our customised version of the JA-BE-JA algorithm with the currentBSP-based distributed programming frameworks. The selected metrics are:

– edge-cut, the number of edges that cross the boundaries of each subgraph.It corresponds to the energy definition given by Formula 1 in Section 2.These metrics gives an estimation about the quality of the cut, with lowervalue corresponding to a better cut.

– convergence, the number of cycles or supersteps required to achieve a sub-stantially definitive edge-cut result. The performances are inherently depen-dent by the actual iterations needed;

– speed-up, a simple comparative value between two or more solutions thatinvestigates the relative performances, i.e. speedup = p1

p2where p1, p2 are

two execution times of different solutions.

All the experiments have been conducted by using as input three datasetsfreely available online. Two datasets were taken from the Walshaw archives3

(4elt and vibrobox) and one of the Facebook social network4.

4.1 Bsp-simulation vs Original-simulation

The first set of experiments estimates the impact derived from implementingJA-BE-JA in a BSP-like environment. Every experiment has been conductedvarying the number of partitions (2, 4, 8, 16, 32, 64) and manually terminatedafter 1000 supersteps.

Figure 1 depicts the edge-cut value obtained by original-simulation and thebsp-simulation version. The results shown are computed averaging the results of10 runs. As it can be noticed the original-simulation provides better values thanthe bsp-simulation, according to the edge-cut metric.

3 http://staffweb.cms.gre.ac.uk/~wc06/partition/4 http://socialnetworks.mpi-sws.org/

Page 8: Balanced Graph Partitioning with Apache Sparkhpc.isti.cnr.it/.../exe/fetch.php?media=pdf:bgp-spark.pdf · 2015. 2. 28. · Balanced Graph Partitioning with Apache Spark Emanuele Carlini

0k

5k

10k

15k

20k

25k

0 10 20 30 40 50 60

edge-c

ut

number of partitions

bsp-simulationoriginal-simulation

(a) 4elt graph

0k

20k

40k

60k

80k

100k

0 10 20 30 40 50 60

edge-c

ut

number of partitions

bsp-simulationoriginal-simulation

(b) vibrobox graph

0k

100k

200k

300k

400k

500k

0 10 20 30 40 50 60

edge-c

ut

number of partitions

bsp-simulationoriginal-simulation

(c) facebook graph

Fig. 1: original-simulation vs. bsp-simulation edge-cut as function of partitionsnumber (1000 supersteps)

4.2 Bsp-peersim vs Bsp-spark

This experiment has been conducted varying the number of partitions (2, 4, 8,16, 32, 64) and manually terminated after 1000 supersteps.

4k

8k

12k

16k

20k

24k

28k

0 10 20 30 40 50 60

edge-c

ut

number of partitions

bsp-simulationbsp-spark

(a) 4elt graph

0k

20k

40k

60k

80k

100k

0 10 20 30 40 50 60

edge-c

ut

number of partitions

bsp-simulationbsp-spark

(b) vibrobox graph

0k

100k

200k

300k

400k

500k

0 10 20 30 40 50 60

edge-c

ut

number of partitions

bsp-simulationbsp-spark

(c) facebook graph

Fig. 2: Bsp-simulation vs. bsp-spark edge-cut as function of partitions number(1000 supersteps)

Figure 2 depicts the edge-cut value obtained by the bsp-spark and bsp-simulation. The results shown are computed averaging the results of 10 runs. Inthis case the edge-cut values achieved by bsp-simulation and bsp-spark are veryclose, indicating that bsp-simulation is able to provide a good approximation ofthe computations performed against a real Pregel-like distributed framework.

4.3 Comparison of original-simulation, bsp-simulation andbsp-spark

We conducted this set of experiments to evaluate the values of edge-cut ob-tained by the original-simulation, bsp-simulation, and bsp-spark versions of JA-BE-JA. The evaluation focuses on the amount of cycles/supersteps requiredby the different versions of the algorithm to converge towards a stable valuein terms of edge-cut. The experiment is manually terminated after 1000 super-steps. The results shown are computed averaging the results of 10 runs. It can

Page 9: Balanced Graph Partitioning with Apache Sparkhpc.isti.cnr.it/.../exe/fetch.php?media=pdf:bgp-spark.pdf · 2015. 2. 28. · Balanced Graph Partitioning with Apache Spark Emanuele Carlini

be observed that both the bsp-spark and bsp-simulation versions provide similarvalues, whereas the original-simulation offers results that are significantly better.As a consequence it can be stated that the original-simulation either underesti-mates the edge-cut value that can obtained by a distributed implementation orunderestimates the amount of supersteps required to achieve such a value.

0k

5k

10k

15k

20k

25k

30k

35k

0 200 400 600 800 1000

edge-c

ut

supersteps

original-simulationbsp-simulation

bsp-spark

(a) 4elt graph

20k

40k

60k

80k

100k

120k

140k

0 200 400 600 800 1000

edge-c

ut

supersteps

original-simulationbsp-simulation

bsp-spark

(b) vibrobox graph

100k

200k

300k

400k

500k

600k

700k

0 200 400 600 800 1000

edge-c

ut

supersteps

original-simulationbsp-simulation

bsp-spark

(c) facebook graph

Fig. 3: Edge-cut of original-simulation, bsp-simulation, and bsp-spark over 1000supersteps

4.4 Evaluation of the approximation

This experiment measured the speed-up that we achieve by introducing a degreeof approximation on the information that each vertex owns about its neighbours.Every experiment has been conducted varying the number of partitions (2, 8,32) and averaging the results achieved by 10 distinct runs. It is easy to noticethat the speed-up decreases when the number of partitions increases. Intuitively,this is due to the increased amount of “colours” that each vertex can assume.Greater is the set of possible colours, higher is the probability to have a wronginformation about the colour of a vertex.

2.6

2.7

2.8

2.9

3

3.1

2 8 32

speed-u

p

number of partitions

(a) 4elt graph

4.9

5

5.1

5.2

5.3

5.4

5.5

2 8 32

speed-u

p

number of partitions

(b) vibrobox graph

6.6

6.8

7

7.2

7.4

7.6

7.8

2 8 32

speed-u

p

number of partitions

(c) facebook graph

Fig. 4: SpeedUp of the approximated bsp-spark version compared to bsp-sparkas function of the partition number

Page 10: Balanced Graph Partitioning with Apache Sparkhpc.isti.cnr.it/.../exe/fetch.php?media=pdf:bgp-spark.pdf · 2015. 2. 28. · Balanced Graph Partitioning with Apache Spark Emanuele Carlini

5 Discussion and Related Work

Over the years several graph clustering algorithms [8] have been proposed inthe research area of parallel processing. However, almost all these algorithmsrequire to have random access to the entire graph. This makes such solutionspoor suitable for the analysis of very large graphs that cannot fit into the memoryof a single machine.

On the other way, algorithms suitable for the analysis of such large graphsrequire only a partial knowledge on the graph. Apart from JA-BE-JA which hasbeen extensively described in Section 2, other proposals in this area are DiDiC [3]and CDN [18]. DiDiC proposes a heuristics clustering algorithm based on theconcept of disturbed diffusion to identify dense graph regions. The algorithm isthe core of a distributed load balancing algorithm developed for a P2P-basedvirtual supercomputer which is programmed according to the BSP programmingmodel. Similarly to our approach, the fundamental requirements of DiDiC arethat i) nodes need to communicate only with their direct neighbours and ii) thecomputation starts from an arbitrary initial configuration. However, DiDiC isbased on a diffusion process that does not return a balanced k-way partitionof the graph. CDN, instead, exploits a different diffusion-flow process for graphclustering.

An alternative approach is based on a randomised computation of min-cut.Several proposals extends the randomised global min-cut algorithm edge con-traction algorithm proposed by Karger [7]. The basic idea of Karger’s algorithmis to pick uniformly at random an edge of the graph and merge its endpointsinto a single “supernode”. If you pick a random edge, likely it comes from partsof the graph that contain more edges in the first place, and this is the heuristicsthe algorithm is based on. The procedure is repeated until the graph includes asingle pair of “supernodes” and the final graph is exploited to return the guessedmin-cut. The definition of a distributed algorithm based on Karger’s approachis not straightforward. A recent proposal from Ghaffari et al. [4] defines a newtechnique, the random layering technique, to support the distributed randomisedmin-cut. The algorithm is defined according to a synchronous message passingmodel where in each time unit, a given amount of bits can be sent over everylink (in each direction).

Finally, Montresor et al. [5] proposes a different approach to clustering whichis edge-centric rather than vertex-centric. The basic observation is that divid-ing the vertex set of a graph into equal sized partitions can still lead to anunbalanced subdivision because having the same amount of vertices does notimply that the corresponding sub-graph have the same size, given the unknowndistribution of their edge degrees. For this reason, an edge-based partitioningsolution is proposed, in which edges, rather than vertices, are partitioned intodisjoint subsets and a vertex may belong to more than one partition. Even ifthis solution presents several advantages, it cannot benefit from current graphprocessing frameworks which are intrinsically vertex-centred.

Page 11: Balanced Graph Partitioning with Apache Sparkhpc.isti.cnr.it/.../exe/fetch.php?media=pdf:bgp-spark.pdf · 2015. 2. 28. · Balanced Graph Partitioning with Apache Spark Emanuele Carlini

6 Conclusions

In this paper we presented an adaptation to the JA-BE-JA algorithm to make itsuitable to be efficiently computed by a BSP-like distributed computing frame-work. We firstly implemented JA-BE-JA over a modified version of PeerSim,which forces the simulator to comply with the BSP model. Then, we implementa second version of it in Apache Spark. To this end we first developed a BSP-likeabstraction on top of it. The development of this layer was required because theexisting BSP abstractions resulted inadequate to match our requirements. Ontop of this BSP-like abstraction we implemented a Spark version of our BSP-friendly version of JA-BE-JA. This version supports two different consistencymodels for data. The simpler version corresponds to a stricter consistency modelwhich requires each vertex to have an up-to-date knowledge about its neighbour-hood. The other consistency model accepts a certain degree of approximationon the information about neighbours.

Our findings show that the experiments conducted on the BSP version ofPeerSim are much more similar to the ones effectively achievable using a dis-tributed framework like Spark. The experiments also show that the Spark ver-sion adopting the relaxed consistency model runs significantly faster than theformer one, still providing good performances in terms of edge-cut.

7 Acknowledgement

We greatly thank the authors of the JA-BE-JA original paper for providingthe source code of their implementation, and Marco Distefano for the supportin setting the machines for the tests.

References

[1] Enright, A.J., Van Dongen, S., Ouzounis, C.A.: An efficient algorithm for large-scale detection of protein families. Nucleic acids research 30(7), 1575–1584 (2002)

[2] Galuba, W., Aberer, K., Chakraborty, D., Despotovic, Z., Kellerer, W.: Outtweet-ing the twitterers - predicting information cascades in microblogs. In: Proceedingsof the 3rd Wonference on Online Social Networks. pp. 3–3. WOSN’10, USENIXAssociation, Berkeley, CA, USA (2010)

[3] Gehweiler, J., Meyerhenke, H.: A distributed diffusive heuristic for clustering avirtual p2p supercomputer. In: IPDPS Workshops. pp. 1–8 (2010)

[4] Ghaffari, M., Kuhn, F.: Distributed minimum cut approximation. In: DISC. pp.1–15 (2013)

[5] Guerrieri, A., Montresor, A.: Distributed edge partitioning for graph processing.CoRR abs/1403.6270 (2014)

[6] Hendrickson, B., Leland, R.: A multi-level algorithm for partitioning graphs. In:Supercomputing, 1995. Proceedings of the IEEE/ACM SC95 Conference. pp. 28–28 (1995)

[7] Karger, D.R.: Global min-cuts in rnc, and other ramifications of a simple min-outalgorithm. In: Proc. of the fourth annual ACM-SIAM Symposium on Discretealgorithms. pp. 21–30. Society for Industrial and Applied Mathematics (1993)

Page 12: Balanced Graph Partitioning with Apache Sparkhpc.isti.cnr.it/.../exe/fetch.php?media=pdf:bgp-spark.pdf · 2015. 2. 28. · Balanced Graph Partitioning with Apache Spark Emanuele Carlini

[8] Karypis, G., Kumar, V.: Parallel multilevel k-way partitioning scheme for irregulargraphs. In: Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.Supercomputing ’96, IEEE Computer Society, Washington, DC, USA (1996)

[9] Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioningirregular graphs. SIAM Journal on scientific Computing 20(1), 359–392 (1998)

[10] Karypis, G., Kumar, V.: Parallel multilevel series k-way partitioning scheme forirregular graphs. Siam Review 41(2), 278–300 (1999)

[11] Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.:Distributed graphlab: a framework for machine learning and data mining in thecloud. Proceedings of the VLDB Endowment 5(8), 716–727 (2012)

[12] Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Cza-jkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings ofthe 2010 ACM SIGMOD International Conference on Management of data. pp.135–146. ACM (2010)

[13] Metz, C.: Spark: Open source superstar rewrites future of big data (Jun 2013),http://www.wired.com/2013/06/yahoo-amazon-amplab-spark/all/

[14] Meyerhenke, H., Monien, B., Sauerwald, T.: A new diffusion-based multilevelalgorithm for computing graph partitions of very high quality. In: Parallel andDistributed Processing, 2008. IPDPS 2008. IEEE International Symposium on.pp. 1–13. IEEE (2008)

[15] Meyerhenke, H., Monien, B., Schamberger, S.: Graph partitioning and disturbeddiffusion. Parallel Computing 35(10), 544–569 (2009)

[16] Montresor, A., Jelasity, M.: Peersim: A scalable p2p simulator. In: Peer-to-PeerComputing, 2009, Ninth International Conference on. pp. 99–100. IEEE (2009)

[17] Rahimian, F., Payberah, A., Girdzijauskas, S., Jelasity, M., Haridi, S.: Ja-be-ja:A distributed algorithm for balanced graph partitioning. In: Self-Adaptive andSelf-Organizing Systems (SASO), 2013 IEEE 7th International Conference on.pp. 51–60 (Sept 2013)

[18] Ramaswamy, L., Gedik, B., Liu, L.: A distributed approach to node clusteringin decentralized peer-to-peer networks. IEEE Trans. Parallel Distrib. Syst. 16(9),814–829 (2005)

[19] Sanders, P., Schulz, C.: Engineering multilevel graph partitioning algorithms. In:Algorithms–ESA 2011, pp. 469–480. Springer (2011)

[20] Viswanath, B., Mislove, A., Cha, M., Gummadi, K.P.: On the evolution of userinteraction in facebook. In: Proceedings of the 2Nd ACM Workshop on OnlineSocial Networks. pp. 37–42. WOSN ’09, ACM, New York, NY, USA (2009), http://doi.acm.org/10.1145/1592665.1592675

[21] Walshaw, C.: The graph partitioning archive (Jun 2012), http://staffweb.cms.gre.ac.uk/~wc06/partition/

[22] Zaharia, M.: The growing spark community (Oct 2013), http://databricks.com/blog/2013/10/27/the-growing-spark-community.html

[23] Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin,M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: A fault-tolerant ab-straction for in-memory cluster computing. In: Proceedings of the 9th USENIXconference on Networked Systems Design and Implementation. pp. 2–2. USENIXAssociation (2012)

[24] Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: Clus-ter computing with working sets. In: Proceedings of the 2Nd USENIX Con-ference on Hot Topics in Cloud Computing. pp. 10–10. HotCloud’10, USENIXAssociation, Berkeley, CA, USA (2010), http://dl.acm.org/citation.cfm?id=1863103.1863113