International Journal of Advanced Computational Engineering and Networking, ISSN: 2320-2106, Volume-5, Issue-9, Sep.-2017 http://iraj.in IMPROVIESED FOR FUZZY OVERLAPPING COMMUNITY DETECTION IN SOCIAL NETWORKS

1HARISH KUMAR SHAKYA, 2KULDEEP SINGH, 3BHASKAR BISWAS

123Indian Institute of Technology, Varanasi, (UP) India, India Email: [email protected],[email protected], [email protected]

Abstract: Community structure identification is an important task in social network analysis. Social communications exist with some social situation and communities are a fundamental form of social contexts. Social network is application of web mining and web mining is also an application of data mining. Social network is a type of structure made up of a set of social actors like as persons or organizations, sets of pair ties, and other interaction socially between actors. In recent scenario community detection in social networks is a very hot and dynamic area of research. In this paper, we have used improvised genetic algorithm for community detection in social networks, we used the combination of roulette wheel selection and square quadratic knapsack problem. We have executed the experiment on different datasets i.e. Zachary’s karate club [31], American college football [39], Dolphin social network [32] and many more. All are verified and well known datasets in the research world of social network analysis. An experimental result shows the improvement on convergence rate of proposed algorithm and discovered communities are highly inclined towards quality.

Index term: community detection, genetic algorithm, roulette wheel selection,

I. INTRODUCTION optimization problem, which involves a quality function first defined by Newman and Girvan [3], A social network is a branch of data mining which involves finding some structure or pattern amongst the set of individuals, groups and organizations. A social network involves representation of these societies in the form of a graph with the individuals and this quality function, called modularity, gave an as the vertices and the relationship among the approximate measure of the validity of the formed individuals being represented by the edges [1]. community structure. For each subset, the partition is Certain individuals of a social network are said to said to be better, if the number of excess of links in belong to a community if they have large number of each module is larger than the number of excess of interconnections. Similarly two individual links in the random case. So, the best partition of the communities will have less number of connections given social network will be the one in which the between them. In normal terms, community structure modularity has maximum value. Optimizing the value is defined as the groups of vertices such that the of the modularity of a community structure is a number of edges within the individuals of each group challenging task, since when the size of the social is greater than the number of edges that connect the network increases, the number of partition will individuals of this group to the rest of the groups in increase in at least exponential relation with the size. the community network. This complete process of It has been recently proved that optimizing the finding the exact community structure of any given modularity of a community structure for a given social network is called community detection in social network is an NP-complete problem, [4] so it social network [2].Community structure in any given is not possible to find the exact optimum value of social network gives us an indication of some modularity in polynomial time with the help of a important pattern which may be hidden on normal deterministic algorithm but we can find a good analysis, and thus can help us to understand a lot of approximate solution in polynomial time using processes and phenomenon of social networks and various techniques, like greedy agglomeration [5][6], communities better. This also helps when someone [7][8][9], extremal optimization makes an application using the social network and its [10] and spectral division [11]. communities. Community detection in social networks can be done The validity of the methods used for community with the help of genetic algorithm [12].The basic detection in social networks is based on the structure of a genetic algorithm involves three steps: “goodness” of the found communities, which is Selection operation, crossover operation and mutation evaluated with the help of some quality function. The operation [13]. Selection operation involves selecting problem of identifying different communities in given a certain fraction of individuals from the network as it social networks has been changed into an is and copying them to the next generation. This selection is made for the best communities. This is in

Improviesed Genetic Algorithm for Fuzzy Overlapping Community Detection in Social Networks

27 International Journal of Advanced Computational Engineering and Networking, ISSN: 2320-2106, Volume-5, Issue-9, Sep.-2017 http://iraj.in consistence with biological genetics assuming that a time complexity O(m2n) for a graph with n certain fraction of best or most fit individuals will vertices and m edges. The steps of this algorithm survive. The crossover operation is like reproduction include centrality computation, removal of edge in biology. This involves mixing the genes of two with largest centrality, recalculation of centrality individuals. Similarly in case of community and iteration of the cycle from step 2. detection, two individuals are crossover to form a  Modularity Maximization [17]: This is the most new individual in next generation. This individual has widely used method. Modularity function some of the genes of one parent and some of them of measures quality of the community structure. In the other one. Mutation operation is identical to modularity maximization method, we search biological mutation in which some of the individuals over all possible division of the network and adapt to form new species with changed genetic obtain the division with high modularity. But this material. Similarly, in community detection using solution n intractable. Approximate solutions are genetic algorithm, we modify the genes or made. Some of these approximate optimization characteristics of some percentage of individual to method include greedy algorithms, simulated form a new species in next generation. annealing and spectral optimization. The first Three types of partitions, possibilistic partitions, was proposed by Newman. fuzzy partitions, and crisp/hard partitions for This algorithm tends to form large community. communities are possible. For fuzzy community In simulated annealing, the probabilistic partition, fuzzy c-means clustering [14] is used. procedure is followed and the global optimum of a function is found. Modularity optimization fails II. LITERATURE REVIEW to detect clusters smaller than a particular value called resolution limit. Community detection is an important research topic In Fuzzy community detection algorithms, we define in the field of complex networks. Genetic algorithms a soft membership vectors or belonging vector for have been used as an effective optimization technique each node. This is used to quantify association to solve this problem. The earliest method was called strength in each community. The following the minimum-cut method. Then hierarchical researches have been made in this field: clustering method came up. This was followed by  Nepusz used simulated annealing [18] Girvan Newman algorithm which was further method. He converted the problem into a optimized using modularity maximization. The non-linear constraint optimization problem. algorithms in detail are:  Zhang used spectral clustering [19] framework and proposed an algorithm to do  Minimum-Cut Method [15]: In this method the effective division of social network into number of partition was predetermined and then communities. He used Fuzzy C-Means the network was divided. It was ensured that the clustering [20] (FCM) to obtain the soft division was such that the community was of assignment. Users specify an upper bound to approximately the same size. Also the number of the number of communities. If this upper intergroup edges is minimized. This method is bound is k then, only the top k-1 Eigen less than ideal as it finds only a fixed a number vectors are computed. Both, accuracy and of communities. time complexity rely on the value of k.  Hierarchical Clustering [16]: In hierarchical  In 2007, Newman and Leicht used a mixture clustering, a similarity measure is defined. This models [21] and provided an appropriate quantifies topological type of similarity between Fuzzy community detection method. This nodes. Method used includes cosine similarity, was possible only due to probabilistic nature jaccard index and haming distance between of the algorithm. In this, the number of adjacency matrix rows. Similar nodes where communities is same as the mixture models. grouped into one community. Two methods were This number needs to be specified in used for this grouping: Single linkage clustering advance. and complete linkage clustering. In the former, Researchers are being made in the field of fuzzy two groups are different iff all pair of nodes in community detection using evolutionary algorithms. different groups have similarity less than a given The following models were developed in the past: value. In the later, all pair of nodes within a  Modularity-based Model: In 2004, Newman group has similarity greater than a threshold. and Girvan devised an evaluation criteria  Girvan Newman Algorithm: Edges that lie called modularity denoted by Q. This between two communities are identified and criterion takes into account that the number removed. Identification is performed by of edges within a community are maximised between-ness measure in which a number is and the inter-community edges are assigned to each edge. This number is large if minimised. Higher the value of modularity, edge lie between many pair of nodes. This the better the solution is. However, this algorithm gives quality result but is slow with approach had several drawbacks [22]. Firstly

Improviesed Genetic Algorithm for Fuzzy Overlapping Community Detection in Social Networks

28 International Journal of Advanced Computational Engineering and Networking, ISSN: 2320-2106, Volume-5, Issue-9, Sep.-2017 http://iraj.in optimizing Q has been categorised as an NP- of time. A bi-objective optimization problem hard problem. Secondly, large Q values was proposed to solve community detection don’t always prove better community in dynamic networks. Normalized mutual partition as random networks with no Information [29] (NMI) is a similarity index community structures can also have high Q used in Information Theory. Given A and B values. Thirdly and most importantly, Q has as two partitions of network NMI (A, B) a resolution limitation, i.e. community with gives the similarity between the two sizes smaller than a threshold value are not partitions. If NMI (A, B) = 1 the partition is detected. same. If NMI (A, B) = 0 then the partition is  Multi-resolution Model [23]: This model completely different. was introduced to overcome the resolution limitation problem of modularity model. III. PROPOSED WORK Pizzuti, in 2008, proposed a genetic algorithm for community detection using In this paper, we have done different type of multi-resolution model. In this a community experiments on genetic algorithm and check the score evaluation is done taking into performance of modified GA. I have done the consideration that the number of edges modification in the algorithm but not change in within a community are maximised and the internal architecture. inter-community edges are minimised. Input: In this Genetic algorithm, input datasets in the  General Model: As we know for the best form of adjacency matrix and some other input community partition dense links within the parameters given below in Table 1. And Table 2. community should exist and sparse links Parameter Value Description between two communities should exist. m 1.7 Used in determining the Pizzuti and Pizzuti proposed an algorithm membership of each node called MOGANet. In this he used fast elitist cp 0.1 Percentage of individual genetic algorithm for sorting which was non- selected directly dominated. They defined two parameters, npc 10 Number of individuals with Community Scores (CS) and Community given number of partition Fitness (CF). Butul and Kaya improved the pm 1.0 Mutation percentage MOGANet by using meta-heuristics [24]. pc 0.9 Cross-over percentage They used enhanced firefly algorithm [25] Occ 10 Number of occurrences of followed by harmony search algorithm [26] max generation with termination and chaotic local search. Zhang and Li condition proposed a decomposition based method 10-5 Termination condition [27]. ᵋ  Signed Model: Signed networks are tmax 100 Number of iterations networks in which the vertices have friendly c 2 Minimum number of or hostile relation between each other min partitions of social network depending on the sign assigned to them. c 10 Maximum number of Gong, in 2014, proposed a PSO framework max partitions of social network which included two parameters, Signed Table 1: Values of different parameters Ratio Association (SRA) and Signed Ratio Output: partition and the cover matrix (U). Cut (SRC). In 2013, Amelio and Pizzuti put Terminal condition: return the best individuals. forward a model using NSGA2 framework. Dataset Symbol Vertices Edges  Overlapping Model [28]: A consideration Karate K 34 78 was taken that a node that connects multiple Dolphin D 62 159 communities with similar strength may be fuzzy. For example, if a node i has l links to PolBooks P 105 441 both community A and B then i must be a Football F 115 613 fuzzy node. Jazz J 198 2742  Dynamic Model: Dynamic model takes into Sawmill S 36 62 consideration evolving networks. This is a LesMis L 77 254 model which can be used in case some of the Words W 112 425 nodes or edges disappear. Dynamic Metabolic M 453 2025 networks help in predicting change Table 2: Description of the datasets used tendency. Thus community detection is We proposed the following two changes in GAFCD challenging in dynamic networks. In 2010, [30] to improve the final modularity value and NMI Folino and Pizzuti used Temporal value for the fuzzy community detected: Smoothness Framework, i.e. they ignored While calculating the modularity value (Q-value), we changes in the community for a short period calculated the contribution of each community

Improviesed Genetic Algorithm for Fuzzy Overlapping Community Detection in Social Networks

29 International Journal of Advanced Computational Engineering and Networking, ISSN: 2320-2106, Volume-5, Issue-9, Sep.-2017 http://iraj.in separately, while also maintaining the combined Q- to indicate the frequency with which they discussed value of each individual of the population. Q value work matters with each of their colleagues on a 5 was given by the trace of a c*c matrix, where point scale ranging from less than once a week to c=number of communities. The matrix was given by several times a day. Two employees were linked in U*B*U'. So for all of the c communities present in the communication network if they rated their contact this matrix, we stored the diagonal values in a vector as three or more. We do not know whether both called Q per community. employees had to rate their tie in this way or that at  In the fuzzy crossover function, after applying least one employee had to indicate strength of three Roulette wheel selection for calculating the optimal or more [35]. number of communities in the crossover child, random selection of individual communities was done 7)LESMIS: Co-appearance network in the novel from the union of the communities of the two parents. LesMis [36]. Instead of doing a random selection in this step, we used the Q per community vector calculated above to 8) WORDS: Adjacency network of common select the individual communities from the union. We adjectives and nouns in the novel David Copperfield applied Roulette wheel selection in this step. by Charles Dickens [37]. Experiment analysis The experiment was conducted on Microsoft 9) METABOLIC: KEGG Metabolic pathways can be Windows 10 Home Single Language ©2015 64 bit realized into network. Substrate or Product compound operating system using a MATLAB 11 programming are considered as Node and genes are treated as edge platform with Intel (R) Core-i5 1.70 GHz processor [38]. and 8.0 GB RAM. Datasets description IV. EXPERIMENTAL RESULT & ANALYSIS In this experiment, we have to use the number of well known datasets in the form of adjacency matrix. All We compare MGAFCD with GAFCD, MSFCM and the employed dataset description is given below. GALS on 10 real-world data sets that are described in 1) KARATE: This dataset is about study of a karate Table II. Metabolic Network is an undirected, club network by Zachary. The network consists of 34 weighted graph, but it has 15 loops or self- members of a karate club as nodes and 78 connections (none of the algorithms here can handle connections among members representing friendships these loops). Here, we simply remove these loops to in the club which was observed over a period of two make Metabolic Network a simple graph. Karate and years [31]. LesMis datasets are weighted and undirected, while all the other data sets are undirected and unweighted. 2) DOLPHIN: Bottlenose dolphin’s network study of . The different steps involved in SGA are: dolphins living in Doubtful Sound, New Zealand is  Initialization: Before evolution, a population of also used for evaluating communities. The network individuals are randomly initialized. was divided into two groups depending on the  Fitness Evaluation: In every iteration, the association patterns of dolphins [32]. competitiveness of individuals is first evaluated on the basis of a quality function and a fitness 3) POLBOOKS: A network of books about recent US score is assigned to each individual by this politics sold by the online bookseller Amazon.com. quality function. m= 1.7, npc=10. 10 partitions Edges between books represent frequent co- with each community size from cmin to cmax are purchasing of books by the same buyers. The network generated and taken as single individuals. was compiled by V. Krebs and is unpublished, but Population size=npc*(cmax-cmin+1) can found on Krebs' web site [33].  Survival of the Fittest: Individuals are selected for crossover and mutation with pre-define 4) FOOTBALL: A dataset showing the fixtures, probabilities pc and pm respectively. pc= 0.9, results and attendance of football games played by pm=1.0, cp=0.1 Leeds United football teams [39].  Evolution: The selection process guarantees that an individual with a higher fitness score will be 5) JAZZ: This is the collaboration network between chosen with a higher probability. Jazz musicians. Each node is a Jazz musician and an  Iteration: After a new generation is produced, edge denotes that two musicians have played together SGA terminates and returns the best individual in a band [34]. of the current generation if some stopping conditions are satisfied. Number of 6) SAWMILL: This is a communication network iterations=100 within a small enterprise: All employees were asked

Improviesed Genetic Algorithm for Fuzzy Overlapping Community Detection in Social Networks

30 International Journal of Advanced Computational Engineering and Networking, ISSN: 2320-2106, Volume-5, Issue-9, Sep.-2017 http://iraj.in

Figure 3: Jazz, Q=0.4452, c=4 Figure 1: Dolphin, Q=0.5285, c=5 Figure 4: Metabolic, Q=0.4447, c=9 Figure 2: Karate, Q=0.4449, c=4 nodes respectively. This partition gave a Q In this paper, Fig 1,2,3,4 represents fuzzy community value of 0.4449. partitions of our given dataset or social network. In  Figure 3 represents partition of Jazz dataset. It these figures, relative sizes of each of the forms 4 communities with 53.4, 62, 21.6 and communities are shown. 61 nodes respectively. This partition gave a Q  Figure 1 represents partition of Dolphin value of 0.4452. dataset. It forms 5 communities with 12, 20, 9,  Figure 4 represents partition of metabolic 16 and 5 nodes respectively. This partition dataset. It forms 9 partitions with 36.75, 60, gives Q value as 0.5285. 44, 11, 74, 107.25, 7, 93 and 20 nodes  Figure 2 represents partition of Karate dataset. respectively. This partition shows a Q value of It forms 4 communities with 5, 6, 11 and 12 0.4447.

Algo. K D P F J S L W M 0.4129 0.3963 0.4596 0.5266 0.398 0.3279 0.4897 0.0052 0.2588 MSFCM

0.0001 0.0043 0.0009 0.0008 0.02 0.0001 0.0108 0.0013 0.0118

mean 0.4449 0.5285 0.5272 0.6046 0.4452 0.5501 0.5667 0.3107 0.4261 Q GAFCD

std Q 0 0.0001 0 0 0 0 0 0.0009 0.0014

0.4449 0.5282 0.5272 0.6045 0.4448 0.5501 0.5313 0.3094 0.4153 GALS 0.0004 0 0.0003 0.0001 0 0.0013 0.002 0.0068 0

0.4449 0.5285 0.5275 0.6046 0.4452 0.5503 0.5667 0.3107 0.4415 MGAFCD 0 0 0 0 0 0 0 0.0005 0.005

Improviesed Genetic Algorithm for Fuzzy Overlapping Community Detection in Social Networks

31 International Journal of Advanced Computational Engineering and Networking, ISSN: 2320-2106, Volume-5, Issue-9, Sep.-2017 http://iraj.in

0.4132 0.3991 0.4601 0.5268 0.4078 0.328 0.4971 0.0083 0.2876 MSFCM

3 4 3 10 4 5 5 9 7

Qbest 0.4449 0.5285 0.5272 0.6046 0.4452 0.5501 0.5667 0.3126 0.4287 GAFCD c 4 5 5 10 4 4 6 7 9

0.4449 0.5285 0.5272 0.6046 0.4449 0.5501 0.5439 0.3121 0.428 GALS

4 5 5 10 4 4 6 7 18

0.4449 0.5285 0.5275 0.6046 0.4452 0.5503 0.5667 0.3126 0.4447 MGAFCD 4 5 5 10 4 4 6 7 9

Table 3: Compared Performance Of Community Detection Algorithms this we combined the two parents. Suppose parent 1 Table 3 shows the values that we compared between has c1 communities and parent 2 has c2 communities. the MSFCM, GAFCD, GALS and our algorithm, Then the child made can have the number of GAFCD. It involves modularity values, i.e. Qbest, Qstd and Qmean. In that experiment, MGAFCD modularity communities ranging from 2 to c1+c2. We calculated values increased by an approximate factor of 0.02 in average fitness for all the communities with a given the metabolic dataset and more datasets. We also number of partitions. Then Roulette Wheel selection received improved values of Qstd in comparison to was done to obtain the optimised number of MSFCM, GALS algorithm for all datasets. For some communities in the child. Then mutation was done of the smaller datasets like Dolphin, Qstd decreased which involved modifying each column of partition (and thus improved) in comparison to GAFCD. But, using qpip solver assuming that the other columns for some of the bigger datasets like Metabolic, this remain the same. The modification we did proved value increased, making the communities found a bit effective for large dataset like metabolic as it used inconsistent, though with better modularity. In informed selection. It was not quite effective for datasets like Karate, Dolphin and Football, partition smaller datasets as random selection and informed found is crisp as was in the implementation of selection will select almost the same set of GAFCD. But for datasets like Jazz and Metabolic, communities. Also, mutation operator will modify the fuzzy communities are observed. GAFCD and partition of smaller datasets easily thus eliminating MGAFCD has same number of communities for the need for informed selection. Whereas, in case of Metabolic dataset but different Q values. This is in large datasets, the modification increased the consistence with the fact that we have selected the modularity values and made a difference. We can optimal number of communities in the way similar to further modify the algorithm by including non- the way GAFCD did. But, we have improved the mutated partitions in the set as well. In the future, we algorithm in selection of communities that form the will put our efforts to enable our GAFCD workable next generation individual. Thus, it shows same for large social networks. With the assumption that number of communities but different modularity large social networks are usually sparse graphs, we value. will attempt to reduce the time cost for computing Qg for a fuzzy partition. Meanwhile, we will work CONCLUSION & FUTURE WORK towards a new effective and but more efficient algorithm to replace the current mutation operator. We have a successfully modified the existing GAFCD algorithm. The existing GFCD algorithm did REFERENCES the following: It made a fuzzy partition of the network using one step FCM initialization. It treated [1] Wasserman, Stanley; Faust, Katherine (1994). "Social Network Analysis in the Social and Behavioral each partition as an individual. The modularity value Sciences". Social Network Analysis: Methods and for the partition was used as the objective function to Applications. Cambridge University Press. pp. 1– evaluate each partition. These partitions were then 27.ISBN 9780521387071. sorted according to these modularity values. Then [2] S. Fortunato, “Community detection in graphs,” Physics certain percentage of individuals was directly selected Reports, vol. 486, pp. 75–174, 2010. for the next generation. Next, crossover was done. In

Improviesed Genetic Algorithm for Fuzzy Overlapping Community Detection in Social Networks

32 International Journal of Advanced Computational Engineering and Networking, ISSN: 2320-2106, Volume-5, Issue-9, Sep.-2017 http://iraj.in [3] M. E. J. Newman and M. Girvan, “Finding and evaluating 066122. arXiv:1107.1155.Bibcode:2011PhRvE..84f6122L. d community structure in networks”, Phys. Rev. E 69, 026113, oi:10.1103/PhysRevE.84.066122 2004. [23] Ju Xiang, Xin-Guang Hu, Xiao-Yu Zhang, Jun-Feng Fan, [4] U. Brandes, D. Delling, M. Gaertler, R. Goerke, M. Hoefer, Xian-Lin Zeng, Gen-Yi Fu, Ke Deng and Ke Hu (2012). Z. Nikoloski and D. Wagner, “Maximizing modularity is "Multi-resolution modularity methods and their limitations in hard”, physics/0608255 in www.arxiv.org. community detection". European Physical Journal B 85 (10): [5] M. E. J. Newman, “Fast algorithm for detecting community 1–10. doi:10.1140/epjb/e2012-30301-2 structure in networks”, Physical Review E 69, 066133, 2004. [24] Goldberg, D.E. (1989). Genetic Algorithms in Search, [6] A. Clauset, M. E. J. Newman and C. Moore, “Finding Optimization and Machine Learning. Kluwer Academic community structure in very large networks”, Phys. Rev. E Publishers. ISBN 0-201-15767-5. 70, 066111, 2004. [25] Yang, X. S. (2009). "Firefly algorithms for multimodal [7] R. Guimer`a, M. Sales-Pardo and L. A. N. Amaral, optimization". Stochastic Algorithms: Foundations and “Modularity from fluctuations in random graphs and complex Applications, SAGA 2009. Lecture Notes in Computer networks”, Phys. Rev. E 70, 025101(R), 2004. Sciences 5792. pp. 169–178. arXiv:1003.1466 [8] R. Guimer`a and L. A. N. Amaral, “Functional cartography [26] Original Harmony Search: Geem ZW, Kim JH, and of complex metabolic networks”, Nature 433, pp. 895-900, Loganathan GV, A New Heuristic Optimization Algorithm: 2005. Harmony Search, Simulation, 2001. [9] J. Reichardt and S. Bornholdt, “Statistical mechanics of [27] Gottlob, Georg; Nicola Leone; Francesco Scarcello (2000). community detection”, Physical Review E 74, 016110, 2006. "A comparison of structural CSP decomposition [10] J. Duch and A. Arenas, “Community detection in complex methods". Artificial Intelligence 124 (2): 243– networks using extremal optimization”, Phys. Rev. E 72, 282.doi:10.1016/S0004-3702(00)00078-3 027104, 2005. [28] Aliprantis, Charalambos D.; Brown, Donald J.; Burkinshaw, [11] M. E. J. Newman, “Modularity and community structure in Owen (April 1988). "5 The overlapping generations model networks”, Proc. Natl. Acad. Sci. USA 103, pp. 8577–8582, (pp. 229–271)". Existence and optimality of competitive 2006. equilibria (1990 student ed.). Berlin: Springer-Verlag. [12] Pizzuti, “Community detection in social networks with pp. xii+284. ISBN 3-540-52866-0.MR 1075992. genetic algorithms,” in Proceedings of the 10th annual [29] Kraskov, Alexander; Stögbauer, Harald; Andrzejak, Ralph conference on Genetic and . ACM, G.; Grassberger, Peter (2003). "Hierarchical Clustering Based 2008, pp. 1137–1138. on Mutual Information". arXiv:q-bio/0311039 [13] John Holland, Adaptation in Natural and Artificial Systems [30] Jianhai Su and Timothy C. Havens, Fuzzy Community (1975), University of Michigan Press, ISBN 0-262- 58111-6 Detection in Social Networks Using a Genetic Algortihm. [14] Zhang et al. 2007a Identification of overlapping community 2014 IEEE International Conference on Fuzzy Systems structure in complex networks using fuzzy c-means (FUZZ-IEEE) clustering. Physica A374, 483–490. [31] W. W. Zachary, “An information flow model for conflict and [15] Goemans, Michel X.; Williamson, David P. (1995), fission in small groups,” Journal of Anthropological "Improved approximation algorithms for maximum cut and Research, vol. 33, pp. 452– 473, 1977. satisfiability problems using semidefinite [32] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. programming", Journal of the ACM 42 (6): 1115– Slooten, and S. M. Dawson, “The bottlenose dolphin 1145, doi:10.1145/227683.227684 community of doubtful sound features a large proportion of [16] Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). long-lasting associations,” Behavioral Ecology and "14.3.12 Hierarchical clustering". The Elements of Statistical Sociobiology, vol. 54, pp. 396–405, 2003. Learning (PDF) (2nd ed.). New York: Springer. pp. 520– [33] V. Krebs, “Books about U.S.A. politics.” [Online]. Available: 528. ISBN 0-387-84857-6. Retrieved 2009-10-20. http://www.orgnet.com/ [26] P. M. Gleiser and L. Danon, [17] Newman, M. E. J. (2006). "Modularity and community “Community structure in jazz,” Adv. Comlex System, pp. structure in networks". Proceedings of the National Academy 656–573, July 2003. of Sciences of the United States of America 103 (23): 8577– [34] J. H. Michael and J. G. Massey, “Modeling the 8696. arXiv:physics/0602124. Bibcode:2006PNAS..103.8577 communication network in a sawmill,” Forest Products, vol. N. doi:10.1073/pnas.0601602103. PMC 1482622. PMID 167 47, pp. 25–30, 1997. 23398. [35] D. E. Knuth, The Stanford GraphBase: a platform for [18] Kirkpatrick, S.; GelattJr, C. D.; Vecchi, M. P. (1983). combinatorial computing. Addison-Wesley Reading, 1993, "Optimization by Simulated Annealing". Science 220 (4598): vol. 4. 671– [36] M. E. J. Newman, “Finding community structure in networks 680. Bibcode:1983Sci...220..671K.doi:10.1126/science.220.4 using the eigenvectors of matrices,” Phys. Rev. E, vol. 74, p. 598.671. JSTOR 1690046. PMID 17813860 036104, Sep 2006. [Online]. Available: [19] Arias-Castro, E. and Chen, G. and Lerman, G. (2011), http://link.aps.org/doi/10.1103/PhysRevE.74.036104 "Spectral clustering based on local linear [37] J. Duch and A. Arenas, “Community detection in complex approximations.", Electronic Journal of Statistics 5: 1537– networks using extremal optimization,” Phys. Rev. E, vol. 72, 1587, doi:10.1214/11-ejs651 p. 027104, Aug 2005. [Online]. Available: [20] "Fuzzy Clustering". reference.wolfram.com. Retrieved 2016- ttp://link.aps.org/doi/10.1103/PhysRevE.72.027104 04-26. [38] R. Guimera, L. Danon, A. D ` ´ ıaz-Guilera, F. Giralt, and A. [21] Dinov, ID. "Expectation Maximization and Mixture Arenas, “Self-similar community structure in a network of Modeling Tutorial". California Digital Library, Statistics human interactions,” Phys. Rev. E, vol. 68, p. 065103, Dec Online Computational Resource, Paper 2003. [Online]. Available: EM_MM,http://repositories.cdlib.org/socr/EM_MM, http://link.aps.org/doi/10.1103/PhysRevE.68.065103 December 9, 2008 [39] M. Girvan and M. E. J. Newman. Community structure in [22] Andrea Lancichinetti and Santo Fortunato (2011). "Limits of social and biological networks. Proceedings of the National modularity maximization in community detection". Physical Academy of Sciences, 99:7821–7826, 2002. Review E 84:



Improviesed Genetic Algorithm for Fuzzy Overlapping Community Detection in Social Networks

33