Random Spatial Network Models with Core-Periphery Structure

Junteng Jia, Cornell University, [email protected]
Austin R. Benson, Cornell University, [email protected]

ABSTRACT

Core-periphery structure is a common property of complex networks: a composition of tightly connected groups of core vertices and sparsely connected periphery vertices. This structure frequently emerges in traffic systems, biology, and social networks via underlying spatial positioning of the vertices. While core-periphery structure is ubiquitous, there have been limited attempts at modeling network data with this structure. Here, we develop a generative, random network model with core-periphery structure that jointly accounts for topological and spatial information through "core scores" of vertices. Our model achieves substantially higher likelihood than existing generative models of core-periphery structure, and we demonstrate how the core scores can be used in downstream data mining tasks, such as predicting airline traffic and classifying fungal networks. We also develop nearly linear time algorithms for learning model parameters and network sampling by using a method akin to the fast multipole method, a technique traditional to computational physics, which allows us to scale to networks with millions of vertices with minor tradeoffs in accuracy.

ACM Reference Format:
Junteng Jia and Austin R. Benson. 2019. Random Spatial Network Models with Core-Periphery Structure. In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), February 11–15, 2019, Melbourne, VIC, Australia. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3289600.3290976

Figure 1: Community (A–B) and core-periphery (C–D) structure in the C. elegans network, where vertices are neurons and edges are neural connections. Vertex coordinates are neuron locations in the lateral plane [16], and vertex sizes are proportional to the square root of degrees. (A) The Louvain algorithm [19] finds three communities (identified here by the three colors). (B) The adjacency matrix ordered by the three communities. (C) Our proposed model learns vertex "core scores" based on spatial location and connectivity, where larger core scores are more indicative of a vertex being in the core; here, the maximum and minimum core scores are 0.2 and -5.2. (D) The adjacency matrix ordered by decreasing vertex core score. Vertices with high core scores are both spatially distributed and densely connected.

1 NETWORK CORE-PERIPHERY STRUCTURE

Networks are widely used to model the interacting components of complex systems emerging from biology, ecosystems, economics, and sociology [1–3]. A typical network consists of a set of vertices V and a set of edges E, where the vertices represent discrete objects (e.g., people or cities) and the edges represent pairwise connections (e.g., friendships or highways). Networks are often described in terms of local properties such as vertex degrees or local clustering coefficients and global properties such as diameter or the number of connected components. At the same time, a number of mesoscale properties are consistently observed in real-world networks, which often reveal important structural information about the underlying complex systems; arguably the most well-known is community structure, and a tremendous amount of effort has been devoted to its explanation and algorithmic identification [4–7].

Another important mesoscale structure is core-periphery structure, although it has received relatively little attention. In contrast to community detection, which separates vertices into several well-connected modules, core-periphery identification involves finding sets of cohesive core vertices and periphery vertices loosely connected to both each other and the core. This type of structure is common in traffic [8, 9], economic [9, 10], social [11–13], and biological [14] networks. Oftentimes, spatial information is a driving factor in the core-periphery structure [15–17]. For example, in the C. elegans neural network shown in Fig. 1, most long-range neural connections are between tightly connected early-born neurons, which serve as hubs and constitute the core [16]. Identifying core-periphery structure not only provides us with a new perspective for studying the mesoscale structure of networked systems but can also lead to insights on their functionality [18].

At the same time, random network models are useful for analyzing and understanding networks [20, 21]. For example, they are routinely used as null models to verify that features of real-world networks are not due to randomness [22–24], and they also serve to identify network structure, as is the case with the stochastic block model and its variants [25–27] as well as methods such as BigCLAM [28] and CESNA [29]. These models successfully incorporate community structure, but have yet to effectively model core-periphery structure. While block modeling can also be used to identify core-periphery structure [12], we later show that such an approach is limited. Moreover, spatial information is generally not incorporated into random network models, even though it can play a large role in their structure.

Here, we present a random network model that generates networks with core-periphery structure. In our model, each vertex has a real-valued core score to reflect its role in the core. Our model assumes the edges are generated by a random process in which the probability that two vertices connect is an increasing function of their core scores. Given an arbitrary network, we infer the model parameters (core scores) by maximum likelihood estimation. We prove that at any local optimum of the likelihood function, the expected degree of any node under the model is the same as in the given network. Therefore, our model can be an alternative to the Chung-Lu model for generating random networks [30].

Our model also accounts for spatial locations of vertices (if such data is available). Spatial networks appear in applications such as trade, transportation, the power grid, and the Internet, where there is a cost associated with the length of the edges [31], and are known to carry core-periphery structure [9, 32, 33]. In such cases, distance alone does not explain many properties of spatial networks, such as "small world" navigability. Our model accounts for spatial information by decreasing the probability of an edge between a pair of vertices with their distance. We show that at local optima of our likelihood function, the log of the expected geometric mean edge length of networks generated by our model is the same as in the given network from which the parameters are learned.

Spatial information also enables us to design efficient algorithms for maximizing likelihood and generating random networks. The main idea is that if a set S of vertices is far away in space from some given vertex u, then the effect of S on u can be efficiently approximated. We perform this approximation in a hierarchical manner similar to the fast multipole method, a "top 10 algorithm of the twentieth century" [34]. Although the algorithm is traditionally used to accelerate N-body simulations in physics, we adapt it here to develop nearly linear time algorithms for likelihood optimization and network sampling. This lets us scale to networks with millions of vertices using a commodity server.

Our model has substantially higher likelihood compared to the two other models that explicitly incorporate core-periphery structure. We also show that the learned core scores are useful for downstream data mining and machine learning tasks; specifically, learned core scores outperform existing baselines for predicting airport traffic and classifying fungal networks.

2 MODEL AND BASIC INFERENCE

In this section, we develop our model for generating networks with core-periphery structure and a straightforward maximum likelihood procedure to learn model parameters for a given input network. We analyze the basic properties of the model without worrying about computation (efficient algorithms are the focus of Section 3). We provide two technical results about local optima of the likelihood function for our model: (i) the expected degree of each vertex matches the input network and (ii) the expected aggregated log-distance matches the input network.

2.1 A generative core-periphery model

Basic model. In our basic model, we start with n vertices, where each vertex u has a real-valued core score θ_u. For every pair of vertices u and v, we add an edge between them with probability

ρ_uv = e^(θ_u+θ_v) / (e^(θ_u+θ_v) + 1).  (1)

As a sanity check, the edge probability ρ_uv ∈ [0, 1] and increases monotonically as a function of the combined core score. Thus, vertices that are both "in the core" (i.e., have large core scores) are more likely to connect. Two special cases have significant meaning. First, if all vertices have the same core score θ_0, then the model is the Erdős-Rényi model with edge probability p = e^(2θ_0) / (e^(2θ_0) + 1). Second, if the vertices are partitioned into a core V_c with core score θ_0 and a periphery V_p with core score −θ_0, then as θ_0 increases, the model converges to a traditional block model for social networks: the core vertices V_c form a clique while the periphery vertices V_p form an independent set, and every pair of vertices u ∈ V_c and v ∈ V_p is connected with probability 0.5.

Full model. Now we incorporate spatial information into the model. All of our subsequent algorithms and analysis are presented for this model, which includes the basic model in Eq. (1) as a special case. Our model incorporates spatial information by adding a kernel function K_uv to the denominator of the edge probability:

ρ_uv = e^(θ_u+θ_v) / (e^(θ_u+θ_v) + K_uv^ε).  (2)

The kernel function K_uv can be any arbitrary non-negative function provided by the user, but we mainly focus on distance kernels to study spatial networks, where core-periphery structure is prevalent. We will introduce kernel functions as we go through examples, but a simple kernel is Euclidean distance: K_uv = ∥x_u − x_v∥_2, where x_u and x_v are the spatial positions of vertices u and v. We also include a tuning parameter ε to control the "nearsightedness" of vertices. When ε = 0, the edge probability ρ_uv is independent of the edge length, and we recover the basic model. As ε increases, the fraction of long-distance edges in the generated networks steeply decreases. While the parameter ε could be baked into the kernel, we find it useful to optimize ε in conjunction with the core scores θ when fitting the model to network data (see Section 2.2).

Relationship to small worlds. Our model is inspired in part by the Kleinberg navigable small-world model, where random edges are added to a lattice network with probability inversely proportional to powers of distance [35, 36]. In real-world networks that are sparse, we expect the edge probability ρ_uv in the corresponding generalized model to often be small. In those cases, the edge probability in our model can be approximated by

ρ_uv ≈ ρ_uv / (1 − ρ_uv) = e^(θ_u+θ_v) / K_uv^ε.  (3)

Our model thus has an interpretation for social networks: actors live somewhere in space and have different social statuses; people have a higher chance to connect if they are closer, and individuals with higher social status are likely to have more acquaintances.

2.2 Inference via Likelihood Maximization

For inference, we take as input an undirected network and a kernel function and output a set of real-valued vertex core scores and a real-valued tuning parameter ε for the kernel function. To do so, we maximize the log-likelihood Ω:

Ω = Σ_{u<v} [A_uv log ρ_uv + (1 − A_uv) log(1 − ρ_uv)].  (4)

Since the derivatives of the objective function can be evaluated analytically, we use a gradient-based approach to optimize the likelihood. However, the log-likelihood objective Ω is not necessarily convex, so the computed optimal set of parameters {θ*, ε*} is not guaranteed to be the global maximizer of Ω. Finally, even though spatial networks are our primary focus in this paper, Theorem 1 shows that our basic model can also generate networks with arbitrary sequences of expected degrees.

The derivatives of the edge probability with respect to the model parameters are

∂ρ_uv/∂θ_u = ρ_uv(1 − ρ_uv),    ∂ρ_uv/∂ε = −ρ_uv(1 − ρ_uv) · log K_uv.  (5)

First, we focus on the derivatives of the objective function with respect to the core scores. Substituting Eq. (5) into the derivative of Eq. (4) gives

∂Ω/∂θ_u = Σ_{v≠u} [A_uv/ρ_uv − (1 − A_uv)/(1 − ρ_uv)] · ∂ρ_uv/∂θ_u = Σ_{v≠u} (A_uv − ρ_uv),  (6)

which is the degree of u in the input network minus the expected degree of u under the model. Similarly, for the tuning parameter,

∂Ω/∂ε = Σ_{u<v} (ρ_uv − A_uv) · log K_uv.  (7)

With a naive implementation of model inference, evaluating the objective function (Eq. (4)) or its gradient (Eqs. (6) and (7)) takes O(|V|²) time, regardless of the choice of kernel function. Similarly, if we sample a random network by determining the connection between each pair of vertices independently, generation also takes O(|V|²) time.

3 SCALABLE ALGORITHMS

Our FMM-style approximation reduces this cost by recursively partitioning the vertices into a metric-tree of "metric-balls" of spatially adjacent vertices. Whenever two metric-balls are well-separated, i.e., the distance between them is sufficiently large relative to their radii, as controlled by an accuracy parameter δ_1 (Eq. (10)), all pairwise interactions between the two balls are approximated by a single ball-to-ball interaction. If we store Σ_{u∈J_l} e^(θ_u) for all metric-balls, then the second (expected degree) term of Eq. (6) can be computed in O(|V| log|V|) time. Therefore, the overall cost of evaluating the core score derivatives for all vertices is O(|V| log|V| + |E|). Furthermore, we can also approximate the derivative of the objective function with respect to ε in the same hierarchical manner.
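For reference, the exact gradients can be evaluated naively in O(|V|²) time. The following NumPy sketch (our naming, assuming a dense adjacency matrix A and pairwise kernel matrix K) is the baseline against which the FMM-style approximation is later compared:

```python
import numpy as np

def gradients(A, theta, K, eps):
    """Naive O(|V|^2) evaluation of the exact gradients (Eqs. (6)-(7)):
    dO/dtheta_u = sum_v (A_uv - rho_uv), observed minus expected degree;
    dO/deps     = sum_{u<v} (rho_uv - A_uv) * log K_uv."""
    S = np.exp(theta[:, None] + theta[None, :])
    rho = S / (S + K ** eps)
    np.fill_diagonal(rho, 0.0)             # no self-loops
    grad_theta = (A - rho).sum(axis=1)
    logK = np.log(K, where=K > 0, out=np.zeros_like(K))
    iu = np.triu_indices_from(A, k=1)      # pairs with u < v
    grad_eps = ((rho - A) * logK)[iu].sum()
    return grad_theta, grad_eps
```

At a local maximum the core score gradient vanishes, which is precisely the statement that expected degrees match the input degrees (Theorem 1).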

For a pair of well-separated metric-balls I and J, the contribution to ∂Ω/∂ε from vertex pairs spanning the two balls is approximated at the ball level:

Σ_{u∈I, v∈J} e^(θ_u+θ_v) log K_uv / (e^(θ_u+θ_v) + K_uv^ε) ≈ [Σ_{u∈I} e^(θ_u) · Σ_{v∈J} e^(θ_v) · log K_IJ] / [(1/|I|) Σ_{u∈I} e^(θ_u) · (1/|J|) Σ_{v∈J} e^(θ_v) + K_IJ^ε].  (14)

Unlike the FMM-style approximation for the objective function (Eq. (12)), the denominators in the expressions for the objective function gradient (Eqs. (13) and (14)) are not metric kernels, due to the extra terms that involve the vertex core scores. However, when the given network is sparse and the fitted core scores are small, the metric kernels K_{I_l J_l}^ε and K_IJ^ε dominate the denominators of Eqs. (13) and (14), making them good approximations of metric kernels.

Figure 3: The top three levels of the metric-tree data structure. Each circle in the diagram represents a metric-ball that encapsulates spatially adjacent vertices. This data structure is used for efficient evaluation of the objective function and its gradient, as well as for efficient network generation. For objective function evaluation, the dashed lines at each layer of the diagram represent the pairwise interactions accumulated at the corresponding level of the metric-tree. For random network generation, the dashed lines represent the edges sampled at each level of the metric-tree between sibling metric-balls, while the solid lines represent edges that were already generated at previous levels. The bottom layer of the diagram shows the random network generated at the l = 3 level of the metric-tree; edges between vertices within the same metric-ball at this level are not yet considered.

The FMM-style algorithm enables efficient model inference, and the learned core scores are accurate enough for data mining purposes, as we show in Section 5. Furthermore, we also want to generate random spatial networks, which naively takes O(|V|²) time. We show how to accelerate this process in the next section.

3.2 Efficient Random Network Generation

Given the core scores, we can naively sample a random network by flipping a biased coin for each pair of vertices. However, this process is inefficient when the output network is sparse. To address this problem, we develop an efficient algorithm that hierarchically samples the edges. The key idea is to sample edges between pairs of metric-balls instead of pairs of vertices.

Our algorithm proceeds as follows. First, we recursively partition the vertices to fit into a metric-tree. Second, at each level of the tree, for every pair of sibling metric-balls I, J, we compute the expected number of edges n_IJ that have one end in each of I and J:

n_IJ ≈ [Σ_{u∈I} e^(θ_u) · Σ_{v∈J} e^(θ_v)] / [(1/|I|) Σ_{u∈I} e^(θ_u) · (1/|J|) Σ_{v∈J} e^(θ_v) + K_IJ^ε].  (15)

Third, we determine the edges between every pair of sibling metric-balls I, J by sampling n_IJ vertices independently from I with probability ρ̃_u ∝ e^(θ_u) / [(e^(θ_u)/|J|) Σ_{v∈J} e^(θ_v) + K_IJ^ε], and n_IJ vertices independently from J with probability ρ̃_v ∝ e^(θ_v) / [(e^(θ_v)/|I|) Σ_{u∈I} e^(θ_u) + K_IJ^ε]. Finally, we pair the samples from I and J sequentially. The cost of sampling vertices with non-uniform probability in the third step is O(|I| + |J| + n_IJ log n_IJ). Assuming the generated network is sparse and the number of edges grows linearly with the number of vertices (n_IJ ∝ |I| + |J|), the overall cost is O(|V| (log|V|)²).

Creating edges by pairing up independently sampled vertices is called "ball-dropping" in Erdős-Rényi sampling [42]. In our model, the number of edges between metric-balls I and J does not have a closed-form probability distribution, and our algorithm only takes the mean of that distribution by sampling n_IJ edges. Therefore, our fast sampling scheme is an approximation without theoretical guarantees. However, as we show in the next section, this approximation empirically preserves network properties.

4 METHODOLOGICAL EXPERIMENTS

In this section, we first numerically validate our approximations and then show that our model achieves much higher likelihood than competing generative models for core-periphery structure. Section 5 then explores data mining tasks aided by our model.

4.1 Approximation validation

In the previous section, we made a number of approximations in our methods. We now show that these approximations indeed have small error. To optimize the objective function, we use the limited-memory BFGS (L-BFGS) algorithm from the Optim.jl package [43]. We also use a diagonal preconditioner in case the approximated Hessian used within L-BFGS becomes ill-conditioned. We construct the metric-tree with the NearestNeighbors.jl package. Finally, we set the tolerance parameters to δ_1 = 2.0 and δ_2 = 0.2.¹

¹Our software is available at https://github.com/000Justin000/spatial_core_periphery.
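As an illustration of the ball-dropping step (Eq. (15)), the following sketch samples the edges between one pair of sibling metric-balls. The names are ours; rounding the expected count to an integer and discarding duplicate pairs are simplifications of the scheme described in Section 3.2:

```python
import numpy as np

def sample_between_balls(I, J, theta, K_IJ, eps, rng):
    """Sample edges between sibling metric-balls I and J (index arrays):
    compute the expected edge count n_IJ (Eq. (15)), then pair endpoints
    drawn independently from each ball ("ball-dropping")."""
    eI, eJ = np.exp(theta[I]), np.exp(theta[J])
    n_IJ = int(round(eI.sum() * eJ.sum() /
                     (eI.mean() * eJ.mean() + K_IJ ** eps)))
    # endpoint weights within each ball (third step of the algorithm)
    wI = eI / (eI * eJ.mean() + K_IJ ** eps)
    wJ = eJ / (eJ * eI.mean() + K_IJ ** eps)
    us = rng.choice(I, size=n_IJ, p=wI / wI.sum())
    vs = rng.choice(J, size=n_IJ, p=wJ / wJ.sum())
    return sorted({(min(u, v), max(u, v)) for u, v in zip(us, vs)})
```

Applying this routine to every pair of sibling balls at every level of the metric-tree yields the full hierarchical sampler.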

Figure 4: (A) Agreement between the naive (blue) and fast (orange) algorithms on the expected vertex degrees. The error of the FMM-style approximation is in red. We also include the vertex degrees in the original network (gray) as numerical validation of Theorem 1. (B) Correlation between the learned core scores from the naive algorithm and the FMM algorithm. The Pearson correlation coefficient is 0.999.

Figure 5: Performance dependence on the accuracy parameters for the C. elegans network. The yellow stars represent the location of the default parameters. (A) Root mean square error in expected degrees. (B) Running time per gradient evaluation.

Accuracy in evaluating the objective function and derivative. To test our fast algorithm, we first learn parameters for the C. elegans network from Fig. 1 using the naive algorithm and the Euclidean distance kernel. Next, we evaluate Ω and the gradient with the same parameters, but this time using the fast algorithm to sum over the pairwise interactions between vertices. The relative difference in Ω between the naive and the fast algorithms was less than 1%. To evaluate agreement in the derivative, we compare the expected degrees obtained by the fast algorithm and the naive algorithm and find them to be nearly identical (Fig. 4A). As a sanity check, the expected vertex degrees computed by both algorithms are very close to the vertex degrees in the original network, which is a necessary condition for Ω to be at a local maximum (Theorem 1).

To understand how the quality of the approximation depends on the input accuracy parameters, we repeat the above experiments using different combinations of δ_1 and δ_2. We characterize the approximation error by the root mean square error in expected degrees between the fast algorithm and the naive algorithm (Fig. 5A) and report the dependence of the running time on the accuracy parameters (Fig. 5B). Both accuracy and running time have a stronger dependency on δ_2 than on δ_1. The reason is that when our algorithm determines the radius of a parent metric-ball, it assumes the worst-case condition and chooses a large value. Therefore, in most cases, δ_1 = 1.0 already guarantees that all the vertices in two metric-balls are well-separated.

Next, we use our fast algorithm to learn the model parameters. The learned vertex core scores from the naive and the fast algorithms had a Pearson correlation of 0.999 and empirically look similar (Fig. 4B). Furthermore, the learned ε from the naive algorithm and the FMM algorithm were 0.499 and 0.506, respectively.

Quality of generated random networks. Using the model parameters learned by the naive algorithm, we now confirm that the random networks generated by the naive sampling algorithm and the fast sampling algorithm are similar. We first consider the degree distribution. For both the naive and the fast algorithms, we compare the mean degrees over three sampled networks to the degrees in the original network (Fig. 6A). Indeed, the degrees in the random networks generated by both algorithms are highly correlated with those in the original network.

Figure 6: (A) Correlation of vertex degrees in C. elegans and in the random networks generated by the naive and fast algorithms; the Pearson correlation coefficients are 0.983 and 0.981, respectively. (B) Number of edges whose kernel distance is below a distance threshold. The agreement numerically validates our fast sampling algorithm as well as our arguments in Section 3 on preserving distances.

Next, we consider the edge length distribution given by the kernel function. We argued that the expected log-GMEL in the generated random networks equals that of the original network at any local maximum of the likelihood function. Indeed, the GMEL in C. elegans is 0.079mm, while the GMEL in the random networks generated by the naive algorithm and the fast algorithm (averaged over three samples) are 0.078mm and 0.079mm, respectively. For a more detailed analysis, we picked equispaced distance thresholds from 0.0mm to 1.4mm and counted the number of edges in the network whose length is smaller than each threshold. The counts for the original network, the mean of three naive random samples, and the mean of three samples from our fast algorithm are nearly identical (Fig. 6B).

Scalability of the FMM-style algorithm. Finally, we validate the computational complexity of our proposed methods. We run our algorithms on synthetic 2-block core-periphery networks where vertex coordinates are randomly generated in the 2-dimensional unit square. We choose 5% of the vertices in the networks as the core, and their core score (denoted θ_c) is set to be 1.0 unit larger than the core score of the periphery vertices (denoted θ_p). The core scores θ_c, θ_p in this family of networks are chosen so that the number of edges grows linearly with the number of vertices (in the five networks, θ_c = −1.25, −2.77, −4.13, −5.43, −6.69). Fig. 7 shows that the empirical running time follows the theoretical analysis.

Figure 7: Scalability of our FMM-style algorithms on synthetic networks, where the average degree in each network is 10. Observed timings are scattered as circles, and ideal efficiencies are plotted as dashed lines.

Figure 8: Vertex core scores in the airline network. The top 10% of vertices with the highest core scores are colored orange, while the rest are colored blue. The radius of a vertex is proportional to the square root of its degree.

4.2 Likelihood comparison

We now demonstrate the quality of our model by comparing its optimized log-likelihood with the only two other generative models for core-periphery structure. The first is a 2-block stochastic block model with a belief propagation learning algorithm that explicitly incorporates core-periphery structure (henceforth, SBM-CP) [12]. Since the belief propagation algorithm is sensitive to the initial conditions, we perform 5 independent runs of the algorithm from different random initial conditions and record the best result. The second model defines the edge probability as a logistic function of the rank of the incident vertices (henceforth, logistic-CP) [44]. For this model, we tune the hyperparameters s, t in the logistic function with grid search and record the best result.

We learn the parameters of our model, SBM-CP, and logistic-CP on six datasets (basic summaries are listed in Table 1):
(i) The neural network of the nematode worm C. elegans from Fig. 1.
(ii) The network of the London underground transportation system, where vertices are tube stations and edges connect stations.²
(iii) The fungal network Pv_M_I_U_N_42d_1, constructed from a laboratory experiment, where each hypha (tubular cell) is a vertex and each cord transporting nutrients between two hyphae is an edge [46]. The fungus grows on a 2-dimensional plane.
(iv) The airline transportation network from OpenFlights.³ Vertices are airports with latitude and longitude spatial locations, and edges correspond to available direct flights.
(v) The Brightkite social network [47]. Vertices are users, and spatial locations come from each user's most recent check-in.
(vi) The LiveJournal social network [48]. Vertices are bloggers, and the spatial locations are given by the city listed in the user's profile. Edges are friendships listed in a user's profile.

A small amount of noise was added to the locations in the Brightkite and LiveJournal datasets to avoid having vertices in the same location. For the OpenFlights, Brightkite, and LiveJournal datasets, spatial positions are latitude and longitude, so we use the great-circle distance kernel. For C. elegans and the fungal network, we use Euclidean distance. In the London Underground dataset, tube stations are almost always connected to their closest spatial neighbors due to the construction cost of railways. Therefore, we choose a symmetric version of rank distance [48] as the kernel. Table 1 lists the optimized log-likelihoods of our model, SBM-CP, and logistic-CP; our model always has substantially higher likelihood.

Table 1: Number of nodes (n), number of edges (m), and optimized log-likelihood of our model, SBM-CP, and logistic-CP. On all datasets, our model has a larger likelihood. (Results for logistic-CP are only available for connected graphs.)

Dataset          | n     | m     | our model  | SBM-CP     | logistic-CP
C. elegans       | 279   | 1.9K  | −6.3 · 10³ | −7.1 · 10³ | −7.0 · 10³
London Under.    | 315   | 370   | −6.0 · 10² | −2.2 · 10³ | −2.1 · 10³
Pv_M_I_U_N_42d_1 | 2.4K  | 2.6K  | −6.4 · 10³ | −2.1 · 10⁴ | —
OpenFlights      | 7.2K  | 18.6K | −4.7 · 10⁴ | −1.1 · 10⁵ | —
Brightkite       | 50.7K | 194K  | −1.3 · 10⁶ | −1.9 · 10⁶ | —
LiveJournal      | 1.16M | 7.19M | −7.5 · 10⁷ | −8.9 · 10⁷ | —

5 DATA MINING EXPERIMENTS

Now that we have addressed the accuracy, scalability, and efficacy of our algorithms and model, we use the output of our model to study real-world spatial networks. We find that the learned core scores from our model are good predictors for traffic in an airport network and for the classification of fungal networks.

5.1 Case study I: Predicting enplanement

We first analyze the OpenFlights network introduced in Section 4.2. Recall that this network has vertices representing airports and edges representing direct flights between airports. Given the network and vertex coordinates, our goal is to predict the total number of passengers boarding (enplanement) at each airport in the year 2017 using vertex core scores. To this end, we first compute the vertex core scores using the FMM-style algorithm with the great-circle distance kernel (Fig. 8), and then we use a decision tree to correlate the vertex core scores with enplanement (Fig. 9).

We obtained the enplanement metadata for 447 airports in the United States.⁴ We first split the airports randomly into a training set of 357 airports (80%) and a test set of 90 airports (20%); after, we build a decision tree using the training set and test its prediction accuracy on the testing set. Figure 9 shows an instance of the decision tree built using the training set.

²The network was derived from a similar one of Rombach et al. [45]. Vertex coordinates were collected from Wikipedia.
³Data collected from https://openflights.org/data.html#route.
⁴Data collected from https://www.faa.gov/airports/planning_capacity.
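The evaluation protocol of this case study can be sketched with scikit-learn. This is an illustration with our own naming; the paper does not specify the tree's hyperparameters, so max_depth=3 here is our assumption, chosen to mirror the three-level tree in Fig. 9:

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def evaluate_feature(x, y, seed=0):
    """80/20 train/test split, fit a regression tree on a single vertex
    feature (e.g., the core score), return R^2 on the held-out airports."""
    X = np.asarray(x, dtype=float).reshape(-1, 1)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    tree = DecisionTreeRegressor(max_depth=3, random_state=seed)
    tree.fit(X_tr, y_tr)
    return r2_score(y_te, tree.predict(X_te))
```

Averaging the returned R² over several random splits gives the numbers reported in Table 2.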

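The centrality baselines compared against below (degree, BC, CC, EC, PR) can be computed with networkx; this sketch assumes the airport network is available as a networkx Graph (the paper's own pipeline is in Julia):

```python
import networkx as nx

def centrality_features(G):
    """Per-vertex baseline features used for comparison: degree,
    betweenness (BC), closeness (CC), eigenvector (EC), PageRank (PR)."""
    return {
        "degree": dict(G.degree()),
        "BC": nx.betweenness_centrality(G),
        "CC": nx.closeness_centrality(G),
        "EC": nx.eigenvector_centrality(G, max_iter=1000),
        "PR": nx.pagerank(G),
    }
```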
Figure 9: Top three levels of the decision tree for predicting airport enplanement using vertex core scores. The leaf nodes of the decision tree give the average enplanement (in millions) among airports with core scores in the given range.

Table 2: The R² values between ground truth and predicted enplanement with different features. Our learned core scores have the largest value.

    | degree | BC    | CC    | EC    | PR    | core score
R²  | 0.762  | 0.293 | 0.663 | 0.542 | 0.637 | 0.846

We average the prediction accuracy of the decision trees over 10 random splits of the training and testing sets, and the mean coefficient of determination (R²) between ground truth and predicted enplanement is 0.846. Analogous experiments are repeated using vertex degrees, betweenness centrality (BC) [49], closeness centrality (CC) [50], eigenvector centrality (EC) [51], or PageRank (PR) [52] instead of the core score as the independent variable (Table 2). The vertex core scores outperform the other centrality measures in characterizing airport enplanement. This high accuracy indicates that our model effectively utilizes spatial information.

5.2 Case study II: Classifying fungal networks

In this case study, we use core scores to predict the types of fungal networks (the network examined in Section 4.2 is one such network). In these networks, vertices are hyphae in 2-dimensional Euclidean space, and edges are cords transporting nutrients between two hyphae. Based on the species and growing conditions, the fungal networks were categorized into 15 different classes [46]. Here, we use core scores as features to predict the class label of each network.

First, we learned the vertex core scores for every fungal network using our model with the Euclidean distance kernel and the FMM-style algorithm. Then, for each network, we created a 4-element feature vector using the maximum, mean, and standard deviation of the core scores, along with the number of vertices. We trained a logistic regression model with these features (and an intercept term) and evaluated prediction accuracy with 5-fold cross validation. We repeated the experiment using the centrality measures from Section 5.1. Table 3 shows that the core scores from our model are the best predictors. As evidence for why this might be true, Fig. 10 plots the mean and maximal core score for each network, along with the class labels. We see class separation with these two features.

Figure 10: Maximal and mean vertex core scores for 260 fungal networks. The integer in parentheses after each class label is the number of networks in that class. The plot shows some clustering of class type based on these features. We use these features, along with a couple of others derived from the learned core scores, to train a logistic regression classifier with good performance (Table 3).

Table 3: Cross-validation accuracy of fungal network classification using different features. Our learned core scores give the best prediction. "Random" is random guessing.

random | degree | BC    | CC    | EC    | PR    | core score
6.7%   | 38.0%  | 23.7% | 19.0% | 20.2% | 18.6% | 43.5%

Furthermore, we train a logistic regression model that concatenates information from vertex degrees and core scores as features. In other words, the predictors are the maximum, mean, and standard deviation of the vertex degrees; the maximum, mean, and standard deviation of the vertex core scores; and the number of vertices. We observe a large increase in prediction accuracy, from 43.5% to 65.6%. This result shows that vertex core scores capture different information than degrees in the fungal networks.

6 ADDITIONAL RELATED WORK

The idea of core-periphery structure has a long history in the theory of social networks [53–55], where the core arises due to differential status. Borgatti and Everett developed the first significant computational approach to identifying core-periphery structure [55]. Since then, several methods have been designed to find core-periphery structure, based on paths [32, 56], vertex covers [57], spectral information [13, 58], k-cores and generalizations [59, 60], hand-crafted "core quality" objective functions [8, 45, 61], and user studies [62]. Such methods are typically analyzed via synthetic benchmarks, density measures, or network visualizations. The key difference with our work is that we propose a generative model, whereas prior work performs post hoc identification of core-periphery structure using the network topology. The only other generative models are due to Zhang et al. [12] and Tudisco and Higham [44], against which we compared in Section 4.2 (unlike our approach, these methods do not use spatial information).

Network centrality captures similar properties to core-periphery structure. Indeed, closeness centrality is used to detect core vertices in networks [11, 14], and centrality measures are baselines for core-periphery identification measures [8] (see also our experiments in Section 5). A subtlety is that network centrality can be a consequence of core-periphery dynamics, but centrality in and of itself does not lead to a reason for core-periphery structure. This is one reason why we developed a generative model in this paper.

7 DISCUSSION

We have developed a random network model for core-periphery structure, where each vertex has a real-valued core score. We focused our analysis on spatial data, which connects to the small-world model where edge probabilities are inversely correlated with distance. The spatial structure enabled us to develop fast algorithms for both gradient-based inference and network sampling. We showed both theoretically and numerically that our model preserves the expected degree of each vertex as well as the aggregated log-distance. Our model can be an alternative to the Chung-Lu model, even if there is no spatial metadata. Furthermore, our model does not need to learn from a network: given a prescribed set of core scores and a kernel function, we can generate networks.

In terms of likelihood, our model outperforms the only other generative models for core-periphery structure (SBM-CP and logistic-CP). The basic version of our model can be thought of as a continuous relaxation of SBM-CP, since the core scores permit a continuum of edge probabilities. Finally, we also demonstrated that the learned core scores are useful vertex features for downstream network analysis and machine learning tasks in two very different complex systems.

REFERENCES

[23] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi, "A survey of statistical network models," Found. Trends Mach. Learn., 2010.
[24] H. Yin, A. R. Benson, and J. Leskovec, "Higher-order clustering in networks," PRE, vol. 97, p. 052306, 2018.
[25] P. W. Holland, K. B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Social Networks, vol. 5, no. 2, pp. 109–137, 1983.
[26] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, "Mixed membership stochastic blockmodels," JMLR, 2008.
[27] T. P. Peixoto, "Bayesian stochastic blockmodeling," arXiv:1705.10225, 2017.
[28] J. Yang and J. Leskovec, "Overlapping community detection at scale: a nonnegative matrix factorization approach," in Proceedings of WSDM, ACM, 2013.
[29] J. Yang, J. McAuley, and J. Leskovec, "Community detection in networks with node attributes," in International Conference on Data Mining, IEEE, 2013.
[30] W. Aiello, F. Chung, and L. Lu, "A random graph model for power law graphs," Experiment. Math., vol. 10, no. 1, pp. 53–66, 2001.
[31] M. Barthélemy, "Spatial networks," Phys. Rep., vol. 499, pp. 1–101, 2011.
[32] S. H. Lee, M. Cucuringu, and M. A. Porter, "Density-based and transport-based core-periphery structures in networks," PRE, vol. 89, p. 032810, 2014.
[33] F. Dong, D. Liu, J. Wu, L. Ke, C. Song, H. Wang, and Z. Zhu, "Constructing core backbone network based on survivability of power grid," JEPE, 2015.
[34] B. A.
Cipra, “The best of the 20th century: Editors name top 10 algorithms,” SIAM systems (airport traffic and fungal growth), which provides evi- News, vol. 33, no. 4, pp. 1–2, 2000. dence that our model could be incorporated into a wide area of [35] J. M. Kleinberg, “Navigation in a small world,” Nature, 2000. application domains. [36] J. M. Kleinberg, “The small-world phenomenon: An algorithmic perspective,” in Proceedings of STOC, 2000. [37] V. Rokhlin, “Rapid solution of integral equations of classical potential theory,” Journal of computational physics, vol. 60, no. 2, pp. 187–207, 1985. ACKNOWLEDGMENTS [38] L. Greengard and V. Rokhlin, “A fast algorithm for particle simulations,” Journal of computational physics, vol. 73, no. 2, pp. 325–348, 1987. We thank David Liben-Nowell for providing the LiveJournal data. [39] L. Ying, “A pedestrian introduction to fast multipole methods,” Science China This research was supported in part by NSF Award DMS-1830274. , vol. 55, no. 5, pp. 1043–1051, 2012. [40] R. Beatson and L. Greengard, “A short course on fast multipole methods,” Wavelets, multilevel methods and elliptic PDEs, vol. 1, pp. 1–37, 1997. REFERENCES [41] A. W. Appel, “An efficient program for many-body simulation,” SIAM Journal on Scientific and Statistical Computing, vol. 6, no. 1, pp. 85–103, 1985. [1] M. Newman, Networks: An Introduction. Oxford University Press, 1 ed., 2010. [42] A. S. Ramani, N. Eikmeier, and D. F. Gleich, “Coin-flipping, ball-dropping, and [2] D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning About a grass-hopping for generating random graphs from matrices of edge probabilities,” Highly Connected World. New York, NY, USA: Cambridge University Press, 2010. CoRR, vol. abs/1709.03438, 2017. [3] A. Barabási and M. Pósfai, . Cambridge University Press, 2016. [43] P. K. Mogensen and A. N. Riseth, “Optim: A mathematical optimization package [4] S. E. Schaeffer, “Graph clustering,” Review, 2007. 
for Julia,” Journal of Open Source Software, vol. 3, no. 24, p. 615, 2018. [5] A. Lancichinetti and S. Fortunato, “Community detection algorithms: A compar- [44] F. Tudisco and D. J. Higham, “A nonlinear spectral method for core–periphery ative analysis,” PRE, vol. 80, p. 056117, 2009. detection in networks,” arXiv:1804.09820, 2018. [6] S. Fortunato, “Community detection in graphs,” Phys. Reports, 2010. [45] M. P. Rombach, M. A. Porter, J. H. Fowler, and P. J. Mucha, “Core-periphery [7] C. Moore, “The computer science and physics of community detection: Land- structure in networks,” SIAM J. on Applied Mathematics, 2014. scapes, phase transitions, and hardness,” Bulletin of the EATCS, vol. 121, 2017. [46] S. H. Lee, M. D. Fricker, and M. A. Porter, “Mesoscale analyses of fungal networks [8] P. Rombach, M. A. Porter, J. H. Fowler, and P. J. Mucha, “Core-periphery structure as an approach for quantifying phenotypic traits,” J. Complex Networks, 2017. in networks (revisited),” SIAM Review, vol. 59, no. 3, pp. 619–646, 2017. [47] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user movement [9] F. Rossa, F. Dercole, and C. Piccardi, “Profiling core-periphery network structure in location-based social networks,” in KDD, ACM, 2011. by random walkers,” Scientific Reports, vol. 3, p. 1467, 2013. [48] D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, and A. Tomkins, “Geographic [10] J. P. Boyd, W. J. Fitzgerald, M. C. Mahutga, and D. A. Smith, “Computing continu- routing in social networks,” PNAS, 2005. ous core/periphery structures for social relations data with MINRES/SVD,” Social [49] L. Freeman, “A set of measures of centrality based on betweenness,” Sociometry, Networks, vol. 32, no. 2, pp. 125 – 137, 2010. 1977. [11] P. Holme, “Core-periphery organization of complex networks,” PRE, 2005. [50] A. Bavelas, “Communication patterns in task-oriented groups,” JASA, 1950. [12] X. Zhang, T. Martin, and M. E. J. Newman, “Identification of core-periphery [51] M. 
Newman, “The structure and function of complex networks,” SIREV, 2003. structure in networks,” PRE, vol. 91, p. 032803, 2015. [52] L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: [13] M. Cucuringu, P. Rombach, S. H. Lee, and M. A. Porter, “Detection of core- Bringing order to the web,” in WWW, 1998. periphery structure in networks using spectral methods and geodesic paths,” [53] R. S. Burt, “Positions in networks,” Social Forces, vol. 55, no. 1, p. 93, 1976. European Journal of Applied Mathematics, vol. 27, no. 6, pp. 846–887, 2016. [54] R. Breiger, “Structures of economic interdependence among nations,” Continuities [14] M. R. da Silva, H. Ma, and A.-P. Zeng, “Centrality, network capacity, and modular- in structural inquiry, pp. 353–380, 1981. ity as parameters to analyze the core-periphery structure in metabolic networks,” [55] S. P. Borgatti and M. G. Everett, “Models of core/periphery structures,” Social Proceedings of the IEEE, vol. 96, pp. 1411–1420, 2008. Networks, vol. 21, no. 4, pp. 375 – 395, 2000. [15] E. S. Wellhofer, “Core and periphery: Territorial dimensions in politics,” Urban [56] J. Gamble, H. Chintakunta, A. Wilkerson, and H. Krim, “Node dominance: Re- Studies, vol. 26, no. 3, pp. 340–355, 1989. vealing community and core-periphery structure in social networks,” IEEE Trans- [16] S. Varier and M. Kaiser, “Neural development features: Spatio-temporal devel- actions on Signal and Information Processing over Networks, 2016. opment of the caenorhabditis elegans neuronal network,” PLOS Computational [57] A. R. Benson and J. Kleinberg, “Found graph data and planted vertex covers,” in Biology, vol. 7, no. 1, pp. 1–9, 2011. Advances in Neural Information Processing Systems, 2018. [17] T. Verma, F. Russmann, N. A. M. Araújo, J. Nagler, and H. J. Herrmann, “Emergence [58] C. Ma, B.-B. Xiang, H.-S. Chen, M. Small, and H.-F. Zhang, “Detection of core- of core-peripheries in networks,” Nature Communications, vol. 7, p. 
10441, 2016. periphery structure in networks based on 3-tuple motifs,” Chaos, 2018. [18] T. Verma, N. A. M. Araújo, and H. J. Herrmann, “Revealing the structure of the [59] B. Dumba and Z.-L. Zhang, “Uncovering the nucleus of social networks,” in world airline network,” Scientific Reports, vol. 4, p. 5638, 2014. Proceedings of the 10th ACM Conference on Web Science, pp. 37–46, ACM, 2018. [19] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of [60] A. E. Sariyuce, C. Seshadhri, A. Pinar, and U. V. Catalyurek, “Finding the hierarchy communities in large networks,” J. Stat. Mech.: Theory and Experiment, 2008. of dense subgraphs using nucleus decompositions,” in WWW, 2015. [20] M. E. J. Newman, “Random graphs as models of networks,” arXiv:cond- [61] S. Kojaku and N. Masuda, “Core-periphery structure requires something else in mat/0202208, 2002. the network,” New Journal of Physics, vol. 20, no. 4, p. 043012, 2018. [21] J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graph evolution: Densification and [62] M. Joblin, S. Apel, C. Hunsen, and W. Mauerer, “Classifying developers into core shrinking diameters,” ACM TKDD, 2007. and peripheral: An empirical study on count and network metrics,” in ICSE, 2017. [22] M. Middendorf, E. Ziv, and C. H. Wiggins, “Inferring network mechanisms: The drosophila melanogaster protein interaction network,” PNAS, 2005.
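Table 2 scores each predictor by the coefficient of determination (R²) between ground-truth and predicted enplanement. For concreteness, a minimal stand-alone sketch of this metric (the function name `r_squared` is ours; any statistics library provides an equivalent):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot

# A perfect prediction gives R^2 = 1; always predicting the mean gives R^2 = 0.
r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])  # close to 1 for a good fit
```

An R² of 0.846 (the core-score column of Table 2) thus means the decision tree explains roughly 85% of the variance in enplanement.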
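The fungal-network classifier in Section 5.2 reduces each network to a 4-element feature vector before fitting the logistic regression. A minimal sketch of that feature extraction (the function name and the population-standard-deviation convention are our assumptions; the paper does not specify the convention):

```python
import math

def core_score_features(core_scores):
    """Section 5.2 feature vector: maximum, mean, and standard deviation
    of a network's learned core scores, plus the number of vertices."""
    n = len(core_scores)
    mean = sum(core_scores) / n
    std = math.sqrt(sum((c - mean) ** 2 for c in core_scores) / n)
    return [max(core_scores), mean, std, n]
```

One such vector per network, optionally concatenated with the analogous degree statistics, would then be fed to a logistic regression classifier evaluated with 5-fold cross validation, as in Table 3.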