Fast Cross-Layer Dependency Inference on Multi-Layered Networks
Total Page:16
File Type:pdf, Size:1020Kb
FASCINATE: Fast Cross-Layer Dependency Inference on Multi-layered Networks Chen Chen†, Hanghang Tong†, Lei Xie‡, Lei Ying†, and Qing He§ †Arizona State University, {chen_chen, hanghang.tong, lei.ying.2}@asu.edu ‡City University of New York, [email protected] §University at Buffalo, [email protected] ABSTRACT Multi-layered networks have recently emerged as a new network model, which naturally finds itself in many high-impact applica- tion domains, ranging from critical inter-dependent infrastructure networks, biological systems, organization-level collaborations, to cross-platform e-commerce, etc. Cross-layer dependency, which describes the dependencies or the associations between nodes across different layers/networks, often plays a central role in many data mining tasks on such multi-layered networks. Yet, it remains a daunting task to accurately know the cross-layer dependency a prior. In this paper, we address the problem of inferring the missing cross- layer dependencies on multi-layered networks. The key idea behind our method is to view it as a collective collaborative filtering prob- lem. By formulating the problem into a regularized optimization model, we propose an effective algorithm to find the local optima with linear complexity. Furthermore, we derive an online algo- rithm to accommodate newly arrived nodes, whose complexity is Figure 1: An illustrative example of multi-layered networks. Each just linear wrt the size of the neighborhood of the new node. We gray ellipse is a critical infrastructure network (e.g., Telecom net- perform extensive empirical evaluations to demonstrate the effec- work, power grid, transportation network, etc). A thick arrow be- tiveness and the efficiency of the proposed methods. tween two ellipses indicates a cross-layer dependency between the corresponding two networks (e.g., a router in the telecom network 1. INTRODUCTION depends on one or more power plants in the power grid). In an increasingly connected world, networks from many high- and the disease network is in turn coupled with the drug network impact areas are often collected from multiple inter-dependent do- by drug-disease associations. Multi-layered networks also appear mains, leading to the emergence of multi-layered networks [4, 9, in many other application domains, such as organization-level col- 26, 29, 30]. A typical example of multi-layered networks is inter- laboration platform [5] and cross-platform e-commerce [6, 16, 21, dependent critical infrastructure network. As illustrated in Figure 1, 36]. the full functioning of the telecom network, the transportation net- Compared with single-layered networks, a unique topological work and the gas pipeline network is dependent on the power sup- characteristic of multi-layered networks lies in its cross-layer de- ply from the power grid. While for the gas-fired and coal-fired pendency structure. For example, in the critical infrastructure net- generators in the power grid, their functioning is fully dependent work, the full functioning of the telecom layer depends on the suf- on the gas and coal supply from the transportation network and the ficient power supply from the power grid layer, which in turn relies gas pipeline network. Moreover, to keep the whole complex sys- on the functioning of the transportation layer (e.g., to deliver the tem working in order, extensive communications are needed across sufficient fuel). While in the biological systems, the dependency is the networks, which are in turn supported by the telecom network. represented as the associations among diseases, genes and drugs. Another example is biological system, where the protein-protein In practice, the cross-layer dependency often plays a central role in interaction network (PPI/gene network) is naturally linked to the many multi-layered network mining tasks. For example, in the crit- disease similarity network by the known disease-gene associations, ical infrastructure network, the existence of the cross-layer depen- dency is in general considered as a major factor of the vulnerability of the entire system. This is because a small disturbance on one Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed supporting layer/network (e.g., power grid) might cause a ripple ef- for profit or commercial advantage and that copies bear this notice and the full cita- fect to all the dependent layers, leading to a catastrophic/cascading tion on the first page. Copyrights for components of this work owned by others than failure of the entire system. On the other hand, the cross-layer de- ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- publish, to post on servers or to redistribute to lists, requires prior specific permission pendency in the biological system is often the key to new discover- and/or a fee. Request permissions from [email protected]. ies, such as new treatment associations between existing drugs and KDD ’16, August 13-17, 2016, San Francisco, CA, USA new diseases (e.g., drug re-purposing). c 2016 ACM. ISBN 978-1-4503-4232-2/16/08. $15.00 Despite its key importance, it remains a daunting task to ac- DOI: http://dx.doi.org/10.1145/2939672.2939784 curately know the cross-layer dependency in a multi-layered net- Table 1: Main Symbols. work, due to a number of reasons, ranging from noise, incomplete data sources, limited accessibility to network dynamics. For ex- Symbol Definition and Description A B ample, an extreme weather event might significantly disrupt the , the adjacency matrices (bold upper case) a b power grid, the transportation network and the cross-layer depen- , column vectors (bold lower case) A B dencies in between at the epicenter. Yet, due to limited accessibility , sets (calligraphic) th th to the damage area during or soon after the disruption, the cross- A(i, j) the element at i row j column A layer dependency structure might only have a probabilistic and/or in matrix th coarse-grained description. On the other hand, for a newly identi- A(i, :) the i row of matrix A A(:,j) the jth column of matrix A fied chemical in the biological system, its cross-layer dependencies wrt proteins and/or the diseases might be completely unknown due A transpose of matrix A to clinical limitations.(i.e., the zero-start problem). Aˆ the adjacency matrix of A with the newly added node In this paper, we aim to tackle the above challenges and develop G the layer-layer dependency matrix effective and efficient methods to infer cross-layer dependency on A within-layer connectivity matrices of the network multi-layered networks. The main contributions of the paper can A = {A1,...,Ag} be summarized as D cross-layer dependency matrices D = {Di,j i, j =1, ..., g} • Problem Formulations. We formally formulate the cross- Wi,j weight matrix for Di,j layer dependency inference problem as a regularized opti- Fi low-rank representation for layer-i (i =1, ..., g) mization problem. The key idea of our formulation is to col- mi,ni number of edges and nodes in graph Ai lectively leverage the within-layer topology as well as the mi,j number of dependencies in Di,j observed cross-layer dependency to infer a latent, low-rank g total number of layers representation for each layer, based on which the missing r the rank for {Fi}i=1,...,g cross-layer dependencies can be inferred. t the maximal iteration number ξ the threshold to determine the iteration • Algorithms and Analysis. We propose an effective algorithm (FASCINATE) for cross-layer dependency inference on multi- G(i, j)=1if layer-j depends on layer-i, and G(i, j)=0other- layered networks, and analyze its optimality, convergence wise. Second, we need a set of g within-layer connectivity matri- and complexity. We further present its variants and gener- ces: A = {A1, ..., Ag} to describe the connectivities/similarities alizations, including an online algorithm to address the zero- between nodes within the same layer. Third, we need a set of cross- start problem. layer dependency matrices D = {Di,j i, j =1, ..., g}, where D • Evaluations. We perform extensive experiments on real data- i,j describes the dependencies between the nodes from layer-i sets to validate the effectiveness, efficiency and scalability of and the nodes from layer-j if these two layers are directly dependent G the proposed algorithms. Specially, our experimental eval- (i.e., (i, j)=1). When there is no direct dependencies between G uations show that the proposed algorithms outperform their the two layers (i.e., (i, j)=0), the corresponding dependency D best competitors by 8.2%-41.9% in terms of inference accu- matrix i,j is absent. Taking the multi-layered network in Fig- racy while enjoying linear complexity. Specifically, the pro- ure 2 for an example, the abstract layer-layer dependency network 7 G posed FASCINATE-ZERO algorithm can achieve up to 10 × of this biological system can be viewed as a line graph. The A speedup with barely no compromise on accuracy. four within-layer similarity matrices in are the chemical network (A1), the drug network (A2), the disease network (A3) and the A The rest of the paper is organized as follows. Section 2 gives the protein-protein interaction (PPI) network ( 4). Across those lay- formal definitions of the cross-layer dependency inference prob- ers, we have three non-empty dependency matrices, including the D lems. Section 3 proposes FASCINATE algorithm with its analysis. chemical-drug dependency matrix ( 1,2), the drug-disease inter- D Section 4 introduces the zero-start algorithm FASCINATE-ZERO. action matrix ( 2,3) and the disease-protein dependency matrix D Section 5 presents the experiment results.