“Assortative Mixing, Graph Partitioning, Community Structure”

Total Page:16

File Type:pdf, Size:1020Kb

“Assortative Mixing, Graph Partitioning, Community Structure” 11 FIG. 6: Six modules corresponding to the uppermost branched linear chain of modules depicted in Figure 4. Colors denote modules as defined by the network information bottleneck algorithm. Again the modules roughly correspond to institutional affiliations. Over 50% of the blue nodes have one or more affiliations with the institutions based in and around Chicago (Argonne National Laboratory, University of Illinois at Chicago and University of Notre Dame). 70% of the red nodes are in England, and 75% of the green nodes are in China, mostly at the Institute of Chemistry Chinese Academy of Sciences, and all of the cyan nodes are at the Center of Complex Systems Research in Illinois. Both the yellow and magenta modules are mostly affiliated with the University of Nebraska. ECS 289 / MAE 298 Feb 9, 2011 “Assortative mixing, graph partitioning, FIG. 7: E. coli gene regulatory network. Lcommunityargest componen structure”t of the symmetric version of the E. coli genetic regulatory network. Colors denoted modules identified by NIB. Diffusive distributions are but one general class of distributions on a network. A natural generalization of these ideas is to describe other distributions on a network for which a particular function, energy, or origin is known, and on which some particular degree of freedom (such as chemical concentration or genetic expression as a function of time) may be defined. Finally, we note that while the information bottleneck is a prescription for finding the highest-fidelity summary of a system at a given simplicity, algorithms for determining network community structure are usually motivated by various definitions of normalized min-cuts [27, 28, 29, 30]. Our results, particularly for the synthetic graphs with prescribed modular structure, demonstrate that information modularity implies edge modularity, an unexpected finding which motivates further numerical and analytic investigations in progress regarding this relationship. Acknowledgments It is a pleasure to acknowledge Mark Newman, Noam Slonim, Christina Leslie, Risi Kondor, Ilya Nemenman, and Susanne Still for many useful conversations on the information bottleneck and modularity. This work was supported Many networks naturally break up into groups (Other common features: broad-scale, small worlds, clustering) • Modular networks / Community structures • Max-cut min-flow • Hierarchical networks • More generally, finding clusters of similar objects What can we learn from partitioning a network? (Still an open question: tons of work on detecting communities, not so much work in interpreting what they mean.) • Bottlenecks • Social groups / collaboration networks • US Congress: – Partisan / bi-partison voting patterns (both by party and region) – Central players • Sometimes functional subgroups (some evidence in Gene Regulatory networks) Topics covered today • Modularity metric (many edges within a module, few between modules) • Kernighan-Lin algorithm • Spectral techniques: Graph Laplacian, Fiedler vector • Girvan-Neman (centrality based) and Newman’s additional work (spectral) (works well for directed networks (weighted edges is easy – multigraph)) • Approximation techniques for large networks (finding the optimal is NP-complete) • Dendrograms / hierachy • Overlapping communities • Noise from high degree nodes (includes inherent modularity) • Maximum likelihood (nodes with similar patterns of connectivity) • Applications: when do “communities” actually tell us anything useful? (Esp since methods are topology based not about functional connections (e.g. V8 engine)) Here: missing edges, intelligent guesses at missing node attributes ... Kernighan-Lin algorithm, 1970 • How well can you partition a set of connected nodes into two groups with minimal group-spanning edges? (Motivation: Circuit board design – minimize wires crossing between boards.) • Judge this using a quality metric about how well partitions into groups: minimum number of group-spanning edges. • Divide the vertices into two groups (at random or, better yet, with good guess) • Examine each pair (i; j) where i in Group1 and j in Group2. Swap the (i; j) pair that reduces the spanning edges (if there is one). And repeat excluding this pair. • Avoiding local minimum – 1) “simulated annealing” (accept bad moves with some probability, and decrease the probability over time); 2) If no pair reduces number of spanning edges, swap pair that increases it the least. • Dependence on initial condition / grouping. • Also number of groups and their sizes preserved. Spectral partitioning – graph bi-section with Fiedler vector • Recall random walk state transition matrix (we started from Adjacency matrix A and row or column normalized it). 0 1 1=4 1=3 1=2 1=4 0 B 1=4 1=3 0 1=4 0 C B C B C M = B 1=4 0 1=2 0 0 C B C @ 1=4 1=3 0 1=4 1=2 A 0 0 0 1=4 1=2 • Consider instead the Graph Laplacian, L where: No self-loops allowed Lii = di (the degree of node i) 0 1 Lij = −Aij for i 6= j. 3 −1 −1 −1 0 B −1 2 0 −1 0 C (Recall −Aij = −1 if edge B C B C connects i and j, and 0 otherwise.) L = B −1 0 1 0 0 C B C @ −1 −1 0 3 −1 A 0 0 0 −1 1 Graph bisection • M, rwalk state transition matrix (Perron-Frobenious); largest eigenvalue λ = 1 (steady-state). The number of eigenvalues with λ = 1 equals number of disconnected components • L, graph Laplacian. – Related to diffusion on a network (see Newman book, sec 6.13.1) – All eigenvalues λi real and λi ≥ 0. – There will always be one λ = 0, call it λ1 (Note ~v1 = f1; 1; 1; · · · g) – The number of λi’s equal to zero are the number of disconnected components. – If only one λ = 0, call the next largest eigenvector λ2, the Fiedler vector. – The eigenvector ~v2 will have elements equal −1 or +1 only. – Elements with +1 assigned to group 1, those with -1 to group 2 (left picture:) Timescale, τ1 = 5314 to Timescale, τ2 = 1092 to Timescale, τ3 = 157 to ● ●● ● ●● ● ●● ● o o ● o o ● ● ● ● ● ● o● ● ● o● ● ● ● ● ● ● o o● o● o o o● o● ● o o● o● o o o● o● +++●+ +●+ + o o● o● o o● o● ● +● + ● ● ● ● ● ● ● o● o● o ● o o● o● o ● +● ●+●● +● ● ●● o ● ● ●● o 20 ++ +● ● 20 o o ● ● 20 o o o ● ● + ●●++ +● + o●● o o o● o o●● o o ● ●● +● +●+●+ ●o● o ● o● o● ●o● o ● ●+●+ o● +++ ● + + ● ● o ●o o● o● o ●o + + ● +● ++● + o● o o● o● o ● Y + +● ●●●● ● Y o● o●●o●● o● Y +● ●●●● ● ● +++●+ ● ooo● ● +++●+ ● o ● +++ ● o● ● o ● +++ o ●o● + ● ● +●● ● o● ●● + ● ● 10 o ++ 10 + + o 10 + + ++ ● ● ●+● ●+● o o● ● ● ●o + +● + +● 5 o● o ● 5 ● + ● 5 ● + ● ● o● ● ● o +● + ● ● ● o +● + ● ● ● o ●o o● o o●● ● ● ● o●● ● ● ● o●● ● o ● ● oo● o +● ++ ● oo● o +● ++ ● oo● o ● ●o o ●o●o● o● ●+●+ o ●o●o● o● ●+●+ ●o●o● o● 0 o o o 0 + o 0 + + o 0 5 10 20 0 5 10 20 0 5 10 20 X X X Aside: Some other important spectral properties • Eigenvalues of Adjacency matrix, Laplacian, rwalk matrix are invariant under node relabeling. • If two graphs are isomorphisms, they will have exactly the same eigenvalue spectrum. • This is a necessary condition for isomorphism, but not sufficient – just because two graphs have same eigenvalues does not mean they are isomorphic – but it is strong evidence for isomorphisms (sometimes just first few eigenvalues tested is considered evidence) • For more on graph isomorphisms, Fiedler vectors, graph embeddings, see http://cs-www.cs.yale.edu/homes/spielman/ From graph bisection to communities in networks • M. Girvan and M. E. J. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences, 99 (2002), 78217826. • The paper the launched “community structure” • Identify the edge with highest betweeness centrality, this edge partitions the graph into two pieces. • Rerun recursively on each sub-network. • Note, this is continued bi-section. • Very costly (need to calculate all shortest-paths between all pairs of nodes – recall definition of betweeness centrality). Relating this back to Kernighan-Lin and Fiedler • Newman, Girvan, Finding and evaluating community structure in networks, Physical Review E, 69, 026113 (2004). • Newman, Finding community structure in networks using the eigenvectors of matrices, Physical Review E, 74, 036104 (2006). • Note the next slides come from a presentation that Mark Newman gave at the Institute for Pure and Applied Math, Workshop on “Random and Dynamic Graphs and Networks”, May 2007. There are many other interesting presentations there: http://www.ipam.ucla.edu/programs/rsws3/ Spectral partitioning is related to min-cut/max-flow • Choose any pair of vertices (i; j) the minimum set of edges whose deletion places the two vertices in disconnected components of the graph, carries the maximum flow between the two vertices. • Finding the optimal partitioning over all possible pairs of nodes is a difficult optimization problem that is in NP. • Maximizing modularity also is NP: U. Brandes, D. Delling, M. Gaertler, R. Goerke, M. Hoefer, Z. Nikoloski, and D. Wagner, On modularity clustering, IEEE Transactions on Knowledge and Data Engineering, 20 (2008), 172188. • So rather than finding THE optimal partitioning, we turn to approximation algorithms to find approximately optimal partitions. The Louvain method • V.D. Blondel, J.-L. Guillaume, R. Lambiotte and E. Lefebvre (2008). ”Fast unfolding of community hierarchies in large networks”. J. Stat. Mech.: P10008 • http://sites.google.com/site/findcommunities/ • The method consists of two phases. First, it looks for ”small” communities by optimizing modularity in a local way. Second, it aggregates nodes of the same community and builds a new network whose nodes are the communities. These steps are repeated iteratively until a maximum of modularity is attained. • This method is implemented NetworkX and others. Dendrograms Can be built by aggregating (Louvain method) or recursing from the full graph. ! Vol 435|9 June 2005|doi:10.1038/nature03607 LETTERS Uncovering the overlapping community structure of complex networks in nature and society Gergely Palla1,2, Imre Dere´nyi2, Ille´s Farkas1 & Tama´s Vicsek1,2 Many complex systems in nature and society can be described in withOverlappingthe overlaps being their links.
Recommended publications
  • Beta Current Flow Centrality for Weighted Networks Konstantin Avrachenkov, Vladimir Mazalov, Bulat Tsynguev
    Beta Current Flow Centrality for Weighted Networks Konstantin Avrachenkov, Vladimir Mazalov, Bulat Tsynguev To cite this version: Konstantin Avrachenkov, Vladimir Mazalov, Bulat Tsynguev. Beta Current Flow Centrality for Weighted Networks. 4th International Conference on Computational Social Networks (CSoNet 2015), Aug 2015, Beijing, China. pp.216-227, 10.1007/978-3-319-21786-4_19. hal-01258658 HAL Id: hal-01258658 https://hal.inria.fr/hal-01258658 Submitted on 19 Jan 2016 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Beta Current Flow Centrality for Weighted Networks Konstantin E. Avrachenkov1, Vladimir V. Mazalov2, Bulat T. Tsynguev3 1 INRIA, 2004 Route des Lucioles, Sophia-Antipolis, France [email protected] 2 Institute of Applied Mathematical Research, Karelian Research Center, Russian Academy of Sciences, 11, Pushkinskaya st., Petrozavodsk, Russia, 185910 [email protected] 3 Transbaikal State University, 30, Aleksandro-Zavodskaya st., Chita, Russia, 672039 [email protected] Abstract. Betweenness centrality is one of the basic concepts in the analysis of social networks. Initial definition for the betweenness of a node in a graph is based on the fraction of the number of geodesics (shortest paths) between any two nodes that given node lies on, to the total number of the shortest paths connecting these nodes.
    [Show full text]
  • Detecting Statistically Significant Communities
    1 Detecting Statistically Significant Communities Zengyou He, Hao Liang, Zheng Chen, Can Zhao, Yan Liu Abstract—Community detection is a key data analysis problem across different fields. During the past decades, numerous algorithms have been proposed to address this issue. However, most work on community detection does not address the issue of statistical significance. Although some research efforts have been made towards mining statistically significant communities, deriving an analytical solution of p-value for one community under the configuration model is still a challenging mission that remains unsolved. The configuration model is a widely used random graph model in community detection, in which the degree of each node is preserved in the generated random networks. To partially fulfill this void, we present a tight upper bound on the p-value of a single community under the configuration model, which can be used for quantifying the statistical significance of each community analytically. Meanwhile, we present a local search method to detect statistically significant communities in an iterative manner. Experimental results demonstrate that our method is comparable with the competing methods on detecting statistically significant communities. Index Terms—Community Detection, Random Graphs, Configuration Model, Statistical Significance. F 1 INTRODUCTION ETWORKS are widely used for modeling the structure function that is able to always achieve the best performance N of complex systems in many fields, such as biology, in all possible scenarios. engineering, and social science. Within the networks, some However, most of these objective functions (metrics) do vertices with similar properties will form communities that not address the issue of statistical significance of commu- have more internal connections and less external links.
    [Show full text]
  • General Approach to Line Graphs of Graphs 1
    DEMONSTRATIO MATHEMATICA Vol. XVII! No 2 1985 Antoni Marczyk, Zdzislaw Skupien GENERAL APPROACH TO LINE GRAPHS OF GRAPHS 1. Introduction A unified approach to the notion of a line graph of general graphs is adopted and proofs of theorems announced in [6] are presented. Those theorems characterize five different types of line graphs. Both Krausz-type and forbidden induced sub- graph characterizations are provided. So far other authors introduced and dealt with single spe- cial notions of a line graph of graphs possibly belonging to a special subclass of graphs. In particular, the notion of a simple line graph of a simple graph is implied by a paper of Whitney (1932). Since then it has been repeatedly introduc- ed, rediscovered and generalized by many authors, among them are Krausz (1943), Izbicki (1960$ a special line graph of a general graph), Sabidussi (1961) a simple line graph of a loop-free graph), Menon (1967} adjoint graph of a general graph) and Schwartz (1969; interchange graph which coincides with our line graph defined below). In this paper we follow another way, originated in our previous work [6]. Namely, we distinguish special subclasses of general graphs and consider five different types of line graphs each of which is defined in a natural way. Note that a similar approach to the notion of a line graph of hypergraphs can be adopted. We consider here the following line graphsi line graphs, loop-free line graphs, simple line graphs, as well as augmented line graphs and augmented loop-free line graphs. - 447 - 2 A. Marczyk, Z.
    [Show full text]
  • Centrality in Modular Networks
    Ghalmane et al. EPJ Data Science (2019)8:15 https://doi.org/10.1140/epjds/s13688-019-0195-7 REGULAR ARTICLE OpenAccess Centrality in modular networks Zakariya Ghalmane1,3, Mohammed El Hassouni1, Chantal Cherifi2 and Hocine Cherifi3* *Correspondence: hocine.cherifi@u-bourgogne.fr Abstract 3LE2I, UMR6306 CNRS, University of Burgundy, Dijon, France Identifying influential nodes in a network is a fundamental issue due to its wide Full list of author information is applications, such as accelerating information diffusion or halting virus spreading. available at the end of the article Many measures based on the network topology have emerged over the years to identify influential nodes such as Betweenness, Closeness, and Eigenvalue centrality. However, although most real-world networks are made of groups of tightly connected nodes which are sparsely connected with the rest of the network in a so-called modular structure, few measures exploit this property. Recent works have shown that it has a significant effect on the dynamics of networks. In a modular network, a node has two types of influence: a local influence (on the nodes of its community) through its intra-community links and a global influence (on the nodes in other communities) through its inter-community links. Depending on the strength of the community structure, these two components are more or less influential. Based on this idea, we propose to extend all the standard centrality measures defined for networks with no community structure to modular networks. The so-called “Modular centrality” is a two-dimensional vector. Its first component quantifies the local influence of a node in its community while the second component quantifies its global influence on the other communities of the network.
    [Show full text]
  • P2k-Factorization of Complete Bipartite Multigraphs
    P2k-factorization of complete bipartite multigraphs Beiliang Du Department of Mathematics Suzhou University Suzhou 215006 People's Republic of China Abstract We show that a necessary and sufficient condition for the existence of a P2k-factorization of the complete bipartite- multigraph )"Km,n is m = n == 0 (mod k(2k - l)/d), where d = gcd()", 2k - 1). 1. Introduction Let Km,n be the complete bipartite graph with two partite sets having m and n vertices respectively. The graph )"Km,n is the disjoint union of ).. graphs each isomorphic to Km,n' A subgraph F of )"Km,n is called a spanning subgraph of )"Km,n if F contains all the vertices of )"Km,n' It is clear that a graph with no isolated vertices is uniquely determined by the set of its edges. So in this paper, we consider a graph with no isolated vertices to be a set of 2-element subsets of its vertices. For positive integer k, a path on k vertices is denoted by Pk. A Pk-factor of )'Km,n is a spanning subgraph F of )"Km,n such that every component of F is a Pk and every pair of Pk's have no vertex in common. A Pk-factorization of )"Km,n is a set of edge-disjoint Pk-factors of )"Km,n which is a partition of the set of edges of )'Km,n' (In paper [4] a Pk-factorization of )"Km,n is defined to be a resolvable (m, n, k,)..) bipartite Pk design.) The multigraph )"Km,n is called Ph-factorable whenever it has a Ph-factorization.
    [Show full text]
  • Incorporating Network Structure with Node Contents for Community Detection on Large Networks Using Deep Learning
    Neurocomputing 297 (2018) 71–81 Contents lists available at ScienceDirect Neurocomputing journal homepage: www.elsevier.com/locate/neucom Incorporating network structure with node contents for community detection on large networks using deep learning ∗ Jinxin Cao a, Di Jin a, , Liang Yang c, Jianwu Dang a,b a School of Computer Science and Technology, Tianjin University, Tianjin 300350, China b School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa 923-1292, Japan c School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401, China a r t i c l e i n f o a b s t r a c t Article history: Community detection is an important task in social network analysis. In community detection, in gen- Received 3 June 2017 eral, there exist two types of the models that utilize either network topology or node contents. Some Revised 19 December 2017 studies endeavor to incorporate these two types of models under the framework of spectral clustering Accepted 28 January 2018 for a better community detection. However, it was not successful to obtain a big achievement since they Available online 2 February 2018 used a simple way for the combination. To reach a better community detection, it requires to realize a Communicated by Prof. FangXiang Wu seamless combination of these two methods. For this purpose, we re-examine the properties of the mod- ularity maximization and normalized-cut models and fund out a certain approach to realize a seamless Keywords: Community detection combination of these two models. These two models seek for a low-rank embedding to represent of the Deep learning community structure and reconstruct the network topology and node contents, respectively.
    [Show full text]
  • Trees and Graphs
    Module 8: Trees and Graphs Theme 1: Basic Properties of Trees A (rooted) tree is a finite set of nodes such that ¯ there is a specially designated node called the root. ¯ d Ì ;Ì ;::: ;Ì ½ ¾ the remaining nodes are partitioned into disjoint sets d such that each of these Ì ;Ì ;::: ;Ì d ½ ¾ sets is a tree. The sets d are called subtrees,and the degree of the root. The above is an example of a recursive definition, as we have already seen in previous modules. A Ì Ì Ì B C ¾ ¿ Example 1: In Figure 1 we show a tree rooted at with three subtrees ½ , and rooted at , and D , respectively. We now introduce some terminology for trees: ¯ A tree consists of nodes or vertices that store information and often are labeled by a number or a letter. In Figure 1 the nodes are labeled as A;B;:::;Å. ¯ An edge is an unordered pair of nodes (usually denoted as a segment connecting two nodes). A; B µ For example, ´ is an edge in Figure 1. A ¯ The number of subtrees of a node is called its degree. For example, node is of degree three, while node E is of degree two. The maximum degree of all nodes is called the degree of the tree. Ã ; Ä; F ; G; Å ; Á Â ¯ A leaf or a terminal node is a node of degree zero. Nodes and are leaves in Figure 1. B ¯ A node that is not a leaf is called an interior node or an internal node (e.g., see nodes and D ).
    [Show full text]
  • Sphere-Cut Decompositions and Dominating Sets in Planar Graphs
    Sphere-cut Decompositions and Dominating Sets in Planar Graphs Michalis Samaris R.N. 201314 Scientific committee: Dimitrios M. Thilikos, Professor, Dep. of Mathematics, National and Kapodistrian University of Athens. Supervisor: Stavros G. Kolliopoulos, Dimitrios M. Thilikos, Associate Professor, Professor, Dep. of Informatics and Dep. of Mathematics, National and Telecommunications, National and Kapodistrian University of Athens. Kapodistrian University of Athens. white Lefteris M. Kirousis, Professor, Dep. of Mathematics, National and Kapodistrian University of Athens. Aposunjèseic sfairik¸n tom¸n kai σύνοla kuriarqÐac se epÐpeda γραφήματa Miχάλης Σάμαρης A.M. 201314 Τριμελής Epiτροπή: Δημήτρioc M. Jhlυκός, Epiblèpwn: Kajhγητής, Tm. Majhmatik¸n, E.K.P.A. Δημήτρioc M. Jhlυκός, Staύρoc G. Kolliόποuloc, Kajhγητής tou Τμήμatoc Anaπληρωτής Kajhγητής, Tm. Plhroforiκής Majhmatik¸n tou PanepisthmÐou kai Thl/ni¸n, E.K.P.A. Ajhn¸n Leutèrhc M. Kuroύσης, white Kajhγητής, Tm. Majhmatik¸n, E.K.P.A. PerÐlhyh 'Ena σημαντικό apotèlesma sth JewrÐa Γραφημάτwn apoteleÐ h apόdeixh thc eikasÐac tou Wagner από touc Neil Robertson kai Paul D. Seymour. sth σειρά ergasi¸n ‘Ελλάσσοna Γραφήματα’ apo to 1983 e¸c to 2011. H eikasÐa αυτή lèei όti sthn κλάση twn γραφημάtwn den υπάρχει άπειρη antialusÐda ¸c proc th sqèsh twn ελλασόnwn γραφημάτwn. H JewrÐa pou αναπτύχθηκε gia thn απόδειξη αυτής thc eikasÐac eÐqe kai èqei ακόμα σημαντικό antÐktupo tόσο sthn δομική όσο kai sthn algoriθμική JewrÐa Γραφημάτwn, άλλα kai se άλλα pedÐa όπως h Παραμετρική Poλυπλοκόthta. Sta πλάιsia thc απόδειξης oi suggrafeÐc eiσήγαγαν kai nèec paramètrouc πλά- touc. Se autèc ήτan h κλαδοαποσύνθεση kai to κλαδοπλάτoc ενός γραφήματoc. H παράμετρος αυτή χρησιμοποιήθηκε idiaÐtera sto σχεδιασμό algorÐjmwn kai sthn χρήση thc τεχνικής ‘διαίρει kai basÐleue’.
    [Show full text]
  • Modeling Multiple Relationships in Social Networks
    Asim AnsAri, Oded KOenigsberg, and FlOriAn stAhl * Firms are increasingly seeking to harness the potential of social networks for marketing purposes. therefore, marketers are interested in understanding the antecedents and consequences of relationship formation within networks and in predicting interactivity among users. the authors develop an integrated statistical framework for simultaneously modeling the connectivity structure of multiple relationships of different types on a common set of actors. their modeling approach incorporates several distinct facets to capture both the determinants of relationships and the structural characteristics of multiplex and sequential networks. they develop hierarchical bayesian methods for estimation and illustrate their model with two applications: the first application uses a sequential network of communications among managers involved in new product development activities, and the second uses an online collaborative social network of musicians. the authors’ applications demonstrate the benefits of modeling multiple relations jointly for both substantive and predictive purposes. they also illustrate how information in one relationship can be leveraged to predict connectivity in another relation. Keywords : social networks, online networks, bayesian, multiple relationships, sequential relationships modeling multiple relationships in social networks The rapid growth of online social networks has led to a social commerce (Stephen and Toubia 2010; Van den Bulte resurgence of interest in the marketing
    [Show full text]
  • Review of Community Detection Over Social Media: Graph Prospective
    (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 10, No. 2, 2019 Review of Community Detection over Social Media: Graph Prospective Pranita Jain1, Deepak Singh Tomar2 Department of Computer Science Maulana Azad National Institute of Technology Bhopal, India 462001 Abstract—Community over the social media is the group of globally distributed end users having similar attitude towards a particular topic or product. Community detection algorithm is used to identify the social atoms that are more densely interconnected relatively to the rest over the social media platform. Recently researchers focused on group-based algorithm and member-based algorithm for community detection over social media. This paper presents comprehensive overview of community detection technique based on recent research and subsequently explores graphical prospective of social media mining and social theory (Balance theory, status theory, correlation theory) over community detection. Along with that this paper presents a comparative analysis of three different state of art community detection algorithm available on I-Graph package on python i.e. walk trap, edge betweenness and fast greedy over six different social media data set. That yield intersecting facts about the capabilities and deficiency of community analysis methods. Fig 1. Social Media Network. Keywords—Community detection; social media; social media Aim of Community detection is to form group of mining; homophily; influence; confounding; social theory; homogenous nodes and figure out a strongly linked subgraphs community detection algorithm from heterogeneous network. In strongly linked sub- graphs (Community structure) nodes have more internal links than I. INTRODUCTION external. Detecting communities in heterogeneous networks is The Emergence of Social networking Site (SNS) like Face- same as, the graph partition problem in modern graph theory book, Twitter, LinkedIn, MySpace, etc.
    [Show full text]
  • Closeness Centrality for Networks with Overlapping Community Structure
    Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) Closeness Centrality for Networks with Overlapping Community Structure Mateusz K. Tarkowski1 and Piotr Szczepanski´ 2 Talal Rahwan3 and Tomasz P. Michalak1,4 and Michael Wooldridge1 2Department of Computer Science, University of Oxford, United Kingdom 2 Warsaw University of Technology, Poland 3Masdar Institute of Science and Technology, United Arab Emirates 4Institute of Informatics, University of Warsaw, Poland Abstract One aspect of networks that has been largely ignored in the literature on centrality is the fact that certain real-life Certain real-life networks have a community structure in which communities overlap. For example, a typical bus net- networks have a predefined community structure. In public work includes bus stops (nodes), which belong to one or more transportation networks, for example, bus stops are typically bus lines (communities) that often overlap. Clearly, it is im- grouped by the bus lines (or routes) that visit them. In coau- portant to take this information into account when measur- thorship networks, the various venues where authors pub- ing the centrality of a bus stop—how important it is to the lish can be interpreted as communities (Szczepanski,´ Micha- functioning of the network. For example, if a certain stop be- lak, and Wooldridge 2014). In social networks, individuals comes inaccessible, the impact will depend in part on the bus grouped by similar interests can be thought of as members lines that visit it. However, existing centrality measures do not of a community. Clearly for such networks, it is desirable take such information into account. Our aim is to bridge this to have a centrality measure that accounts for the prede- gap.
    [Show full text]
  • A Centrality Measure for Networks with Community Structure Based on a Generalization of the Owen Value
    ECAI 2014 867 T. Schaub et al. (Eds.) © 2014 The Authors and IOS Press. This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License. doi:10.3233/978-1-61499-419-0-867 A Centrality Measure for Networks With Community Structure Based on a Generalization of the Owen Value Piotr L. Szczepanski´ 1 and Tomasz P. Michalak2 ,3 and Michael Wooldridge2 Abstract. There is currently much interest in the problem of mea- cated — approaches have been considered in the literature. Recently, suring the centrality of nodes in networks/graphs; such measures various methods for the analysis of cooperative games have been ad- have a range of applications, from social network analysis, to chem- vocated as measures of centrality [16, 10]. The key idea behind this istry and biology. In this paper we propose the first measure of node approach is to consider groups (or coalitions) of nodes instead of only centrality that takes into account the community structure of the un- considering individual nodes. By doing so this approach accounts for derlying network. Our measure builds upon the recent literature on potential synergies between groups of nodes that become visible only game-theoretic centralities, where solution concepts from coopera- if the value of nodes are analysed jointly [18]. Next, given all poten- tive game theory are used to reason about importance of nodes in the tial groups of nodes, game-theoretic solution concepts can be used to network. To allow for flexible modelling of community structures, reason about players (i.e., nodes) in such a coalitional game.
    [Show full text]