Graph Community Discovery Algorithms in Neo4j with a Regularization-based Evaluation Metric

Andreas Kanavos, Georgios Drakopoulos and Athanasios Tsakalidis Computer Engineering and Department, , Achaia 26504,

Keywords: CNM Algorithm, Community Discovery, Graph , Graph Mining, Graph Signal Processing, Louvain Algorithm, Newman-Girvan Algorithm, Neo4j, Regularization, Walktrap Algorithm.

Abstract: Community discovery is central to social network analysis as it provides a natural way for decomposing a social graph to smaller ones based on the interactions among individuals. Communities do not need to be disjoint and often exhibit recursive structure. The latter has been established as a distinctive characteristic of large social graphs, indicating a modularity in the way humans build societies. This paper presents the im- plementation of four established community discovery algorithms in the form of Neo4j higher order analytics with the Twitter4j Java API and their application to two real Twitter graphs with diverse structural properties. In order to evaluate the results obtained from each algorithm a regularization-like metric, balancing the global and local graph self-similarity akin to the way it is done in signal processing, is proposed.

1 INTRODUCTION los et al., 2015b) (Drakopoulos et al., 2016). To this end, a coherence metric balancing global and local Twitter is currently the most popular microblogging self-similarity properties with a rationale similar to platform and the stage for ongoing political, finan- the signal processing regularization criterion 2 2 cial, and cultural conversations with a vast amount K = x As + µ0 Bs , µ0 > 0 (1) of tweets being posted on a daily basis. Decom- k − k k k posing a Twitter social graph to communities yields which given a data vector x, possibly with noise and a deeper insight to these seemingly chaotic interac- outliers, computes a smoother version s thereof by tions. However, community discovery is by no means combining global and local patterns coded in matri- a trivial task. Besides the large volume of accounts, ces A and B respectively. µ0 (Drakopoulos and Mega- tweets, retweets, and hashtags that need to be exam- looikonomou, 2016) controls their contribution to s. Graph databases such as Neo4j1, GraphDB2, and ined, necessarily implying parallel or distributed pro- 3 cessing, the question of what constitutes a commu- BrightstarDB provide production grade front- or nity, although posed in easily understood terms, re- back-end graph storage. In addition, they also offer mains to be definitively answered. This does not im- graph analytics such as link prediction and minimum ply that no formal community definition exists. Quite spanning trees (Panzarino, 2014) (Robinson et al., the contrary, a plethora of such definitions has been 2013). Higher order analytics, such as community in fact proposed, for instance in (Carrington et al., discovery, constitute a significant addition as they of- 2005), (Fortunato, 2010), (Newman, 2010), which fer deeper insight in the graph structure. successfully capture crucial aspects of human social The primary contribution of this paper is twofold. organization. However, they differ in key aspects and, Four community discovery algorithms, namely the therefore, lead to different community detection algo- Newman-Girvan, the Walktrap, the Louvain, and the rithms. CNM were implemented in Java over Neo4j. More- over, the results of these algorithms applied to two Similarly, there are a number of ways to assess Twitter graphs created with Twitter4j4 are evalu- the clustering quality, namely community coherence. However, most of the existing coherence metrics 1www.neo4j.com are either prohibitively expensive, such as the maxi- 2www.ontotext.com mum distance between vertices, or are prone to out- 3www.brightstardb.com liers, such as the diameter-based metrics (Drakopou- 4http://twitter4j.org/en/index.html

403

Kanavos, A., Drakopoulos, G. and Tsakalidis, A. Graph Community Discovery Algorithms in Neo4j with a Regularization-based Evaluation Metric. DOI: 10.5220/0006382104030410 In Proceedings of the 13th International Conference on Web Information Systems and Technologies (WEBIST 2017), pages 403-410 ISBN: 978-989-758-246-2 Copyright © 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved WEBIST 2017 - 13th International Conference on Web Information Systems and Technologies

ated with a regularization-like criterion which is effi- topics and subsequently the authorities for each topic ciently computed and relies on the fundamental self- are identified. This was extended in (Pal and Counts, similarity property of scale-free graphs. 2011) with additional features, advanced clustering The rest of this paper is structured as follows. and real-time capabilities. In addition, a previous Section 2 provides an overview of community detec- work regarding influential communities identification tion algorithms. The main characteristics of graph is presented in (Kafeza et al., 2014). Finally, an over- databases are described in section 3. The inherent all and extensive overview of the community discov- high order nature of graph communities and four im- ery field is (Fortunato, 2010). plemented algorithms are outlined in section 4. Fi- Signal regularization is a common technique aim- nally, section 5 describes the datasets used in this pa- ing at deriving a smoother or cleaner version of per and the results obtained from executing the com- a data vector without altering the regions of inter- munity detection algorithms in Neo4j, whereas sec- est. It has numerous applications in signal process- tion 6 concludes by recapitulating the main findings ing (Drakopoulos and Megalooikonomou, 2016), ma- and exploring future research directions. chine learning (Girosi et al., 1995), system identi- fication (Johansen, 1997), and inverse problem the- Table 1: Paper notation. ory (Vogel, 2002), while it also has connections to Symbol Meaning Sobolev space theory (Adams and Fournier, 2003) =4 Definition or equality by definition and to reproducible kernel Hilbert space theory (At- touch and Aze,´ 1993). deg(vk) Degree of vertex vk Kn Complete graph with n vertices The interest in the graph processing field has been invigorated with the advent of open source graph (v1,...,vn) Path with vertices v1,...,vn databases such as Neo4j, GraphDB and BrightStar. sk Set containing elements sk { } Graph processing is usually implemented with the S or sk Cardinality of set S or sk | | |{ }| { } use of massive distributed graph computing systems sk Sequence of items sk h i like Google Pregel and graph based machine learning frameworks like GraphLab. In these systems, graphs play a twofold role as the data flow model and as the 2 RELATED WORK learning model.

Community detection is related mostly to graph clus- tering (Scott, 2000), Web retrieval (Newman, 2010), and user influence (Carrington et al., 2005). Con- 3 ARCHITECTURE AND cerning graph clustering, it can be performed either SOFTWARE structurally or spectrally. In the former case parti- tioning is based on the properties of the graph adja- Graph databases such as Neo4j constitute one of the cency matrix (Kernighan and Lin, 1970) (Shi and Ma- four major technologies collectively known lik, 2000), whereas in the latter connectivity patterns as NoSQL. RDBMSs assume that data can be repre- such as edge density or modularity (Newman, 2004b) sented in a structured and tabular manner. However, (Newman, 2004a) play a primary role with notable the modern Web and the IoT generate unstructured or examples being (Blondel et al., 2008), (Girvan and semistructured, higher order, linked data which can- Newman, 2002). Vertex ranking, computed for in- not be easily described by a schema. The primary stance with PageRank (Brin and Page, 1998) includ- properties of Neo4j include (Robinson et al., 2013) ing its variants (Langville and Meyer, 2006) or HITS (Panzarino, 2014) (Drakopoulos et al., 2015a) (Kleinberg, 1998), can be used to build communities Property 1. Neo4j is schemaless. with vertices which share common topics. Authority estimation can also be used to construct graph com- Property 2. Neo4j conforms to BASE requirements. munities. In (Agichtein et al., 2008) several graph Property 3. The property graph model is the primary features as well as hub and authority scores are used conceptual data model of Neo4j. to model the relative importance of a given user. Al- Property 4. Neo4j supports SPARQL, a W3C RDF ternatively, in the expertise ranking model (Jurczyk query language, and Gremlin, a path query language and Agichtein, 2007), authorities are derived by per- (Drakopoulos et al., 2015a). However, queries to forming link analysis to the graph induced from in- a Neo4j system are mostly submitted in Cypher, an teractions between users. Moreover, in (Weng et al., ASCII art, pattern based, declarative language. The 2010) authors employ Latent Dirichlet Allocation and basic Cypher query has the form a PageRank variant to cluster the graph according to

404 Graph Community Discovery Algorithms in Neo4j with a Regularization-based Evaluation Metric

[ s t a r t

] be considered as a third order quantity. If a triangle match

is closed, then it is a third order quantity in terms of [ with

[ as

]] edges as well. This stems from the fact that single re- where

lationships between individuals, namely edges in so- return cial graphs, do not qualify as communities. Thus, in [ order by [ desc ]] a group there has to be at least one common acquain- tance connecting the individuals in this group. This is Cypher queries can be submitted directly in Neo4j reflected by the fact that successful community detec- console or, most frequently, through an application tion algorithms rely on higher order metrics directly over a Neo4j API. For Java the Neo4j API is included or indirectly. For instance, graph clustering or spec- in the Neo4j NetBeans extension library. tral graph partitioning algorithms exploit higher or- Figure 1 illustrates its components, including the der constructs such as the primary eigenvector or the social crawler, Neo4j, and the graph analytics, as well resolvent of the graph adjacency matrix (Benzi and as the data flow between them. Boito, 2010).

Twitter Gray box crawler 4.1 Louvain Algorithm

Black box Louvain or multilevel algorithm (Blondel et al., 2008) Java driver Client White box is a hierarchical clustering algorithm operating on (developer) weighted graphs. Initially each vertex is a single com- White box (user) munity. Then, communities are progressively merged NetBeans with neighboring ones based on the local edge density library Neo4j change. The objective is to create communities where edge density is high, while intercommunity density Figure 1: System architecture. remains low. Louvain algorithm expresses the intuitive notion The social crawler has been implemented in Java of edge density with modularity, a scalar m ranging using the Twitter4j API for collecting Twitter data from 1 to +1 is defined as − and NetBeans for interfacing with Neo4j. Concerning system configuration, the Twitter crawler is currently deg(v )deg(v ) 1 w i j ,v c v c 4 2 E ∑(i, j) i, j 2 E i i j j inaccessible from the client, excluding thus any loops m = | | − | | ∈ ∧ ∈ (2) (0,   v ,v c with user feedback in subsequent Twitter crawlings. i j ∈ i The Neo4j version is 2.2.5, the latest available ver- sion at the beginning of development. In (2) ci and c j denote the communities vi and v j belong to and wi, j is the weight of (i, j). Although the Louvain algorithm can be applied to unweighted 4 COMMUNITY DISCOVERY graphs, the result is always a weighted graph where weights are proportional to local edge density. An This section outlines four popular community detec- unweighted graph is treated as a weighted graph with tion algorithms. It should be noted that these commu- initial weights equal to one. nity detection algorithms rely on the inherently higher Modularity is maximized through a sequence of order information found as the graph structure. The two alternating steps. In the first step, each vi is latter is expressed in terms of the number of vertices merged with each of its neighbors into a single com- or edges that need to be visited or traversed respec- munity C and the modularity change ∆m is computed tively in order to compute a graph function. Typi- as the difference between the new modularity minus cal examples include the diameter or the number of the old one. Finally, vi is assigned to the c j yielding shortest paths connecting two given vertices. This can the bigger ∆m. In the second step, a new graph is con- be at least partly attributed to the linked graph nature structed where all vertices belonging to the same com- which balances local and global information. There- munity are merged into a single vertex. All edges con- fore, graph processing systems should be of similar necting two communities form a single edge whose nature, if useful information is to be extracted. weight is the sum of the individual weights. A manifestation of the higher order nature of the graph community detection problem is that the small- est community is a triangle. In terms of vertices, it can

405 WEBIST 2017 - 13th International Conference on Web Information Systems and Technologies

Algorithm 1: Louvain (multilevel) algorithm. fore, the edge e∗ with the highest betweeness central- ity should be removed and subsequently the process G(V,E) Require: Graph should be again applied to the new graph. Eventually, G Ensure: is partitioned into communities all edges connecting communities will be identified. 1: if G is unweighted then Intuitively, the edge sequence e∗ should contain the 2: for all (v ,v ) E do h i i j ∈ graph bridges as well, which are a subset of the com- 3: w 1 i, j ← munity connecting edges. In case the graph becomes 4: end for disconnected, then the process is repeated for each of 5: end if the connected components. 6: k 0 and V V and E E ← 0 ← 0 ← 7: for all v V do i ∈ 0 Algorithm 2: Newman-Girvan algorithm. 8: vi becomes a separate community 9: end for Require: Graph G(V,E); Termination criterion τ0 10: Compute m as in (2) Ensure: G is partitioned into communities 11: 1: while E = ∅ and τ0 not satisfied do repeat 6 12: for all v V do 2: Compute Bk as in (3) i ∈ k 13: for all v = v (i, j) E do 3: e∗ argmaxk Bk j 6 i | ∈ k ← { } 14: Assign temporarily v to c . 4: E E e∗ i j ← \ { } 15: Compute ∆m. 5: end while 16: end for 6: return 17: end for 18: Assign vi to c j with the biggest ∆m 4.3 Walktrap Algorithm 19: Merge vertices of ci to a single vertex 20: Merge edges within ci to a single loop Walktrap algorithm is based on the principle of ran- 21: Merge edges between ci and c j to a single edge dom walker. Starting from any random vertex the ran- 22: k 1 and update Vk, Ek dom walker will eventually spend more time steps in ← 23: until no ∆m can occur. densely interconnected graph segments, as it is more 24: return probable for a randomly picked edge to lead to an- other vertex inside the segment than to a vertex out- 4.2 Newman-Girvan Algorithm side it. Since such densely connected segments in- tuitively correspond to communities, random walks Newman-Girvan or edge betweeness algorithm (Gir- based metrics for community detection has been pro- van and Newman, 2002) relies on betweeness central- posed in (Pons and Latapy, 2005). The probability ity, an edge centrality metric which counts the fraction that the walker moves from vi to v j is of the number of the shortest paths connecting two A[i, j] p , = (4) vertices vi and v j a given edge ek is part of, denoted i j deg(v ) k i by ζi, j, to the total number of shortest paths connect- where A denotes the adjacency matrix ing vi and v j, denoted by ζi, j. Then the betweeness centrality for ek, denoted by Bk, is computed by aver- aging over each vertex pair 1, i = j (i, j) E V V A[i, j] =4 ∨ ∈ 0,1 | |×| | (5) k (0, i = j (i, j) E ∈ { } 1 ζi, j 6 ∧ 6∈ V ∑ v ,v V V ζ , vi = v j B =4 (| |) ( i j) i, j 6 (3) k  2 ∈ × 1, vi = v j As the probability that the random walker reaches v j from vi through a path of length `, is denoted by In (Girvan and Newman, 2002) a process for  p` , then if v and v belong to the same community, B e i, j i j computing k for each k, in a manner resembling ` breadth-first search, is described. The rationale is that then pi, j should be large for at least large values of `. vertices belonging to different communities should Note that the converse is not always true, depending rely on edges connecting communities for informa- on graph topology tion exchange. However, note that the converse does ` pi, j = pi, j (6) not need to be true. Moreover, depending on graph ∑ ∏ πk π =` (vi,v j) π topology, some of the community connecting edges | | ∈ k may not be high ranked in terms of betweeness cen- where trality, as other edges may be more preferable. There- π = v ,...,v , π = ` (7) k1 k`+1 | |  406 Graph Community Discovery Algorithms in Neo4j with a Regularization-based Evaluation Metric

Equations lay the groundwork for defining the dis- ∆ai, j, they are stored in a sparse matrix. Also the com- tance di, j between vi and v j as munities are stored in a binary tree where the leaves are the individual vertices. Each time two commu- 2 nities are merged, the resulting community is their V p` p` | | i,k j,k parent at the tree. Moreover, the two corresponding d` =4 v − (8) i, j u∑   columns of the ∆a sparse matrix are merged and uk=1 deg(vk) i, j u their elements are updated according to the following t ` rules (assuming communities i and j are to be fused): The transition probability pC,k from any vertex be- longing to a community C to vk in ` steps, is defined If community k is linked with communities i and as • j, then ` 1 ` ∆a = ∆a + ∆a (13) pC,k = ∑ pi,k (9) j,k i,k j,k C i C | | ∈ If community k is linked with community i but not • Generalizing (8), the distance rCi,Cj between the with j, then communities Ci and Cj is defined as ∆a = ∆a 2a a (14) j,k i,k − j k 2 V p` p` Finally, if community k is linked with community | | Ci,k Cj,k r = v − (10) • Ci,Cj u∑   j but not with i, then uk=1 deg(vk) u ∆a = ∆a 2a a (15) t j,k j,k − i k Algorithm 3: Walktrap algorithm. Algorithm 4: CNM algorithm. Require: Graph G(V,E); Termination criterion τ 0 Require: Graph G(V,E); Termination criterion τ Ensure: G is partitioned into communities 0 Ensure: G is partitioned into communities 1: for all v V do k ∈ 1: Assign each vertex to a separate community 2: Assign v to a separate community k 2: for all discrete pairs (v ,v ) V V do 3: end for i j ∈ × 3: Compute pairwise ∆a as in (12) 4: repeat i, j 4: end for 5: for all distinct community pairs C and C do i j 5: repeat 6: Compute r as in (10). Ci,Cj 6: for all remaining communities do 7: end for 7: Compute pairwise ∆a as in (12) 8: Merge communities which minimize r . i, j Ci,Cj 8: end for 9: until one community remains. 9: Find max∆a , and fuse communities 10: return i j 10: Update binary tree and ai and matrix ∆ai, j 11: until one community is left. 4.4 CNM Algorithm 12: return

The CNM algorithm is also a hierarchical vertex par- titioning algorithm. As such, initially each vertex 5 Results constitutes a separate community. Then, neighbor- ing communities are progressively merged to larger 5.1 Data Synopsis ones until no more merging is feasible according to the structurality criterion a. For a single vertex v , a i i Definition 1. The (log)completeness σ (σ ) of a is defined as 0 0 graph is defined as the ratio of the (log)number of deg(vi) ai =4 (11) edges to the (log)number of edges of K . 2 E n | | E 2 E 2 E For two neighboring vertices, ∆ai, j is defined as σ =4 | | = | | | | 0 V V ( V 1) ≈ 2 |2| V 1 deg(vi)deg(v j) | | | | − | | ∆ai, j =4 (12) log E log E 2 E − 2 =4  4 E σ0 |V | | | (16) | | | | log | | ≈ 2log V 2 | | and it is zero for non-neighboring vertices. ∆ai, j corresponds to the structural changes incurred from  adding (i, j) to a community. In order to keep track of

407 WEBIST 2017 - 13th International Conference on Web Information Systems and Technologies

Table 3: #SocialNetwork graph sunopsis. Definition 2. The (log)density ρ0 (ρ0) of a graph is defined as the ratio of the (log)number of edges to the Feature Value Feature Value (log)number of vertices. Directed True Weighted False E V σ0 V 4246 E 12054 ρ0 =4 | | = | | | | | | V 2 ρ0 2.8387 ρ0 1.1249 | | σ 0.0013 σ 0.5624 log E 0 0 ρ0 =4 | | 2σ0 (17) 0 log V ≈ 0 | | communities than Newman-Girvan and Walktrap. Notice that the base of the logarithm affects nei- Another observation is that communities tend to be ther σ0 nor ρ0 since clustered in size. log x log x = b2 , x = 0 (18) Table 4: Community sizes (%) of #Grexit graph. b1 log b 6 b2 1 id Edge Walktrap Louvain CNM It follows from (16) and (17) that 1 9.30 10.50 6.50 9.70 2 6.10 12.60 13.60 10.20 ρ 4 σ 0 = 0 (19) 3 5.10 7.00 11.20 22.10 ρ0 V σ0 . . . . | | 4 13 10 7 20 6 40 18 50 implying a balance between density, which connects 5 2.70 11.20 7.90 14.30 the number of vertices and edges of the same graph, 6 13.20 10.20 12.50 11.40 and completeness, which relates the number of edges 7 5.20 8.40 13.60 5.50 8 15.10 6.30 12.20 8.30 of a graph to those of K V . In order to demonstrate| | the differences between 9 12.10 5.50 6.30 - the algorithms of section 4, two social graphs with 10 11.10 11.40 5.20 - anonymized Twitter users were constructed. Twit- 11 1.10 2.10 4.60 - ter4j retrieved users as well as information regarding 12 3.40 3.40 - - who follows whom using a topic sampling approach. 13 2.50 4.20 - - A keyword search query collected the users whose tweets or retweets contained #Grexit, a trendy and po- Table 5: Community sizes (%) of #SocialNetwork graph. litically highly controversial topic, whereas a second id Edge Walktrap Louvain CNM query used the hashtag #SocialNetwork, a generic and 1 9.10 12.50 6.90 9.70 by no means inciendiary topic. Subsequently, users 2 6.60 14.60 18.40 12.00 following each other or having a common follower 3 5.10 7.10 13.20 25.10 were connected with an edge as in (Kanavos et al., 4 15.10 7.10 6.70 18.50 2014). Tables 2 and 3 review graphs #SocialNetwork 5 2.80 13.30 7.90 17.30 and #SocialNetwork respectively. Both seem to have 6 15.20 11.20 13.50 11.90 similar properties on a macroscopic scale, however 7 5.40 9.40 18.20 5.50 the seemingly subtle differences correspond to sig- 8 15.10 6.50 15.20 - nificant structural differences at the community level 9 13.10 5.60 - - stemming from the nature of the two topics. 10 12.50 12.70 - - Table 2: #Grexit graph sunopsis. A metric for evaluating the clustering quality Feature Value Feature Value is inspired by the regularization cost function from Directed True Weighted False (Drakopoulos and Megalooikonomou, 2016) V 3696 E 8225 + | | | | J(λ0) = J1 + λ0J2, λ0 R (20) ρ0 2.2313 ρ0 1.0973 ∈ σ0 0.0012 σ0 0.5486 where λ0 is a strictly positive factor expressing the relative importance of J1 compared to J2. The first term measures the combined and 5.2 Analysis weighted relative deviation of k-th community in terms of logdensity and logcompleteness in macro- Table 2 outlines the size of each community, ex- scopic or global scale, namely from the entire graph pressed as a percentage of the total number of ver- Ck tices, as generated by the four aforementioned algo- | | Vk ρk0 ρ0 σk0 σ0 J1 =4 ∑ | | − + − (21) rithms. Louvain and CNM algorithms yield fewer k=1 V ρ0 σ0 ! | |

408 Graph Community Discovery Algorithms in Neo4j with a Regularization-based Evaluation Metric

(a) #Grexit (b) #SocialNetwork Figure 2: Graph community sizes. where ρk0 and σk0 are the logdensity and the logcom- that it creates fewer communities, which are bound pleteness of the k-th community whereas Ck is the set to be heterogeneous. As Newman-Girvan is typically of communities. The weight of each community is the an exhaustive algorithm, it seems that Louvain and ratio of its vertices to the total number of vertices. Walktrap algorithms are balanced options. The second term quantifies the combined and weighted deviation from the expected scale-free be- havior, again expressed in terms of logdensity and logcompleteness as in (19), in microscopic or local 6 CONCLUSIONS AND FUTURE scale, namely at the community level WORK

Ck 2 | | Vk ρ0 4 σ0 This paper outlines the implementation of Newman- J =4 k k 2 v∑ | | (22) Girvan, Walktrap, Louvain, and CNM community de- uk=1 V ρk − Vk σk u | |  | |  tection algorithms over Neo4j. Also, a criterion for t Once communities are derived, computing logdensity assessing the compactness of the communities com- and logcompleteness is straightforward. This is an bining global and local scale-free graph behavior is advantage over community metrics such as diameter. proposed and tested on the results of applying these Moreover, J(λ0) is less prone to outliers and captures algorithms to two real Twitter graphs created from a the scale-free behavior of the graph. neutral as well as a politically charged topic. As future work, the scalability properties of com- Table 6: J score for #Grexit graph. munity discovery should be considered in parallel or λ0 Edge Walktrap Louvain CNM distributed environments. In addition, the proposed 0.1 14.94 15.55 13.67 21.44 criterion should be tested on larger graphs. Finally, 0.3 11.61 12.69 12.33 18.12 regarding λ0, a scheme for computing its optimum 0.5 09.11 11.07 10.59 18.37 value in finer granularity should be developed. 0.7 08.42 11.01 11.12 19.95 0.9 09.73 12.45 13.49 21.17 REFERENCES Table 7: J score for #SocialNetwork graph.

λ0 Edge Walktrap Louvain CNM Adams, R. A. and Fournier, J. J. (2003). Sobolev spaces, 0.1 18.42 20.42 20.61 25.34 volume 140. Academic press. 0.3 17.75 20.11 19.70 24.99 Agichtein, E., Castillo, C., Donato, D., Gionis, A., and 0.5 18.04 19.23 17.44 23.18 Mishne, D. (2008). Finding high-quality content in 0.7 19.78 18.92 18.63 22.34 social media. In Web Search and Data Mining confer- 0.9 20.53 19.53 21.00 21.53 ence (WSDM), pages 183–194. ACM. Attouch, H. and Aze,´ D. (1993). Approximation and reg- ularization of arbitrary functions in Hilbert spaces by As a general remark, there is no single optimum the Lasry-Lions method. In Annales de l’IHP Analyse value for λ0. Nonetheless, Newman-Girvan is consis- non lineaire´ , volume 10, pages 289–312. tently better with Walktrap and Louvain closely fol- Benzi, M. and Boito, P. (2010). Quadrature rule-based lowing and sharing the second position. CNM has the bounds for functions of adjacency matrices. Linear worst performance, which can be attributed to the fact Algebra and its Applications, 433(3):637–652.

409 WEBIST 2017 - 13th International Conference on Web Information Systems and Technologies

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefeb- Newman, M. E. (2004a). Detecting community struc- vre, E. (2008). Fast unfolding of community hierar- ture in networks. The European Physical Journal B- chies in large networks. Journal of Statistical Me- Condensed Matter and Complex Systems, 38(2):321– chanics: Theory and Experiment, P1000. 330. Brin, S. and Page, L. (1998). The PageRank citation rank- Newman, M. E. (2004b). Fast algorithm for detecting com- ing: Bringing order to the web. Stanford Digital Li- munity structure in networks. Physical Review E, brary. 69(6). Carrington, P. J., Scott, J., and Wasserman, S. (2005). Mod- Newman, M. E. (2010). Networks: An Introduction. Oxford els and Methods in Social Network Analysis. Cam- University Press. bridge University Press. Pal, A. and Counts, S. (2011). Identifying topical author- Drakopoulos, G., Baroutiadi, A., and Megalooikonomou, ities in microblogs. In Web Search and Data Mining V. (2015a). Higher order graph centrality measures (WSDM), pages 45–54. for Neo4j. In Conference of Information, Intelligence, Panzarino, O. (2014). Learning Cypher. PACKT publish- Systems, and Applications (IISA). ing. Drakopoulos, G., Kanavos, A., Makris, C., and Mega- Pons, P. and Latapy, M. (2005). Computing communities in looikonomou, V. (2015b). On converting community large networks using random walks. detection algorithms for fuzzy graphs in Neo4j. In In- Robinson, I., Webber, J., and Eifrem, E. (2013). Graph ternational Workshop on Combinations of Intelligent Databases. O’Reilly. Methods and Applications, CIMA 2015. Scott, J. (2000). Social Network Analysis: A Handbook. Drakopoulos, G., Kanavos, A., Makris, C., and Mega- SAGEPublications Ltd. looikonomou, V. (2016). Comparing algorithmic prin- ciples for fuzzy graph communities over Neo4j. In Ad- Shi, J. and Malik, J. (2000). Normalized cuts and image vances in Combining Intelligent Methods, pages 47– segmentation. IEEE Transactions on Pattern Analysis 73. and Machine Intelligence, 22(8):888–905. Drakopoulos, G. and Megalooikonomou, V. (2016). Reg- Vogel, C. R. (2002). Computational methods for inverse ularizing large biosignals with finite differences. In problems. SIAM. International Conference of Information, Intelligence, Weng, J., Lim, E.-P., Lim, J., and Jiang, Q. H. (2010). Twit- Systems, and Applications (IISA). terrank: Finding topic-sensitive influential twitterers. Fortunato, S. (2010). Community detection in graphs. In Web Search and Data Mining (WSDM), pages 261– Physics Reports, 486:75–174. 270. Girosi, F., Jones, M., and Poggio, T. (1995). Regulariza- tion theory and neural networks architectures. Neural computation, 7(2):219–269. Girvan, M. and Newman, M. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(2):7821–7826. Johansen, T. A. (1997). On Tikhonov regularization, bias and variance in nonlinear system identification. Auto- matica, 33(3):441–446. Jurczyk, P. and Agichtein, E. (2007). Discovering author- ities in question answer communities by using link analysis. In Conference of Information and Knowl- edge Management (CIKM), pages 919–922. Kafeza, E., Kanavos, A., Makris, C., and Vikatos, P. (2014). T-PICE: Twitter personality based influential com- munities extraction system. In IEEE International Congress on Big Data, pages 212–219. Kanavos, A., Perikos, I., Vikatos, P., Hatzilygeroudis, I., Makris, C., and Tsakalidis, A. (2014). Conversation emotional modeling in social networks. In Interna- tional Conference on Tools with Artificial Intelligence (ICTAI), pages 478–484. Kernighan, B. and Lin, S. (1970). An efficient heuristic pro- cedure for partitioning graphs. The Bell System Tech- nical Journal, 49(1):291–307. Kleinberg, J. M. (1998). Authoritative sources in a hyper- linked environment. In Symposium of Discrete Algo- rithms (SODA), pages 668–677. Langville, A. and Meyer, C. (2006). Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press.

410