ICML 2020, 12-18 July 2020, 2020
Total Page:16
File Type:pdf, Size:1020Kb
Integrated Defense for Resilient Graph Matching Jiaxiang Ren 1 Zijie Zhang 1 Jiayin Jin 1 Xin Zhao 1 Sixing Wu 2 Yang Zhou 1 Yelong Shen 3 Tianshi Che 1 Ruoming Jin 4 Dejing Dou 5 6 Abstract Zhang & Tong, 2016; Mu et al., 2016; Heimann et al., 2018; Li et al., 2019a; Fey et al., 2020; Qin et al., 2020; Feng A recent study has shown that graph matching et al., 2019; Ren et al., 2020). It has been widely applied models are vulnerable to adversarial manipulation to many real-world applications, including protein network of their input which is intended to cause a mis- alignment in bioinformatics (Liu et al., 2017; Vijayan et al., matching. Nevertheless, there is still a lack of 2020), user account linking in multiple social networks(Shu a comprehensive solution for further enhancing et al., 2016; Mu et al., 2016; Feng et al., 2019), object the robustness of graph matching against adver- matching in computer vision (Fey et al., 2020; Wang et al., sarial attacks. In this paper, we identify and study 2020b;e; Yang et al., 2020), knowledge translation in multi- two types of unique topology attacks in graph lingual knowledge bases (Sun et al., 2020; Wu et al., 2020c). matching: inter-graph dispersion and intra-graph assembly attacks. We propose an integrated de- Recently, there has been much interest in developing re- fense model, IDRGM, for resilient graph match- silient graph learning techniques to improve the model ro- ing with two novel defense techniques to de- bustness against adversarial attacks, including node classifi- fend against the above two attacks simultaneously. cation (Zhu et al., 2019; Xu et al., 2019b; Tang et al., 2020; A detection technique of inscribed simplexes in Entezari et al., 2020; Zheng et al., 2020; Zhou & Vorobey- the hyperspheres consisting of multiple matched chik, 2020; Jin et al., 2020b; Feng et al., 2020; Elinas et al., nodes is proposed to tackle inter-graph dispersion 2020; Zhang & Zitnik, 2020), graph classification (Jin et al., attacks, in which the distances among the matched 2020a), community detection (Jia et al., 2020), network nodes in multiple graphs are maximized to form embedding (Dai et al., 2019), link prediction (Zhou et al., regular simplexes. A node separation method 2019a), malware detection (Hou et al., 2019), spammer de- based on phase-type distribution and maximum tection (Dou et al., 2020), fraud detection (Breuer et al., likelihood estimation is developed to estimate the 2020; Zhang et al., 2020a), and influence maximization (Lo- distribution of perturbed graphs and separate the gins et al., 2020). The majority of existing techniques focus nodes within the same graphs over a wide space, on the defenses on single graph learning tasks. Improving for defending intra-graph assembly attacks, such the robustness of graph matching against adversarial attacks that the interference from the similar neighbors of has not been inadequately investigated yet. Existing tech- the perturbed nodes is significantly reduced. We niques for defending single graph learning tasks cannot be evaluate the robustness of our IDRGM model on directly utilized to improve the robustness of graph match- real datasets against state-of-the-art algorithms. ing, as the graph matching has to analyze interactions within and across graphs. To our best knowledge, RGM is the only robust graph matching model (Yu et al., 2021). It enhances the robustness of image matching against visual noise in 1. Introduction computer vision, including image deformations, rotations, Graph matching (i.e., network alignment), which aims to and outliers, but it fails to defend adversarial attacks on identify the same entities (i.e., nodes) across multiple graphs, graph topology. has been a heated topic in recent years (Chu et al., 2019; In the context of graph matching, there are two types of Xu et al., 2019a; Wang et al., 2020d; Chen et al., 2020a;b; topology attacks within and across graphs: (1) Inter-graph 1Auburn University, USA 2Peking University, China 3Microsoft dispersion attacks. Most of existing graph matching algo- Dynamics 365 AI, USA 4Kent State University, USA 5University rithms often aim to minimize the distance or maximize the of Oregon, USA 6Baidu Research, China. Correspondence to: similarity among the matched nodes in K different graphs Yang Zhou <[email protected]>. in training data by mapping these nodes with different fea- tures into common space through either matrix transforma- Proceedings of the 38 th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s). tion (Zhang & Tong, 2016; Zhang et al., 2019) or network Integrated Defense for Resilient Graph Matching embedding (Heimann et al., 2018; Chu et al., 2019; Xu et al., 2018), most of existing adversarial attack techniques on 2019a; Fey et al., 2020). The nodes with the smallest dis- graph data focus on how to generate imperceptible pertur- tances in K graphs in test data are selected as the matching bations within a lp norm neighborhood but ignore the dis- 1 2 results. As shown in Figure1, three matched nodes vi1 , vi2 , tribution change from clean graphs to perturbed ones (Bo- 3 1 2 3 and vi3 in three graphs G , G , and G are projected into jchevski & Gunnemann¨ , 2019; Wang & Gong, 2019; Liu 1 2 the same space, such that their embeddings ui1 , ui2 , and et al., 2019; Chang et al., 2020; Li et al., 2020; Zang et al., 3 1 2 3 ui3 are identical, i.e., ui1 = ui2 = ui3 . An inter-graph 2020). Thus, the perturbed graphs can follow any distribu- dispersion attack tries to push the matched nodes in multi- tions. The phase-type distribution can be used to approxi- ple graphs far away from each other for maximizing their mate any positive-valued distribution (O’Cinneide, 1990). distances under an attack budget . In this case, the attack By exploring the phase-type distribution and maximum like- problem is equivalent to a geometry optimization problem lihood estimation (Chakravarthy & Alfa, 1996; Asmussen of how to arrange the matched nodes in a hypersphere with et al., 1996), we develop a node separation algorithm to radius such that the distances among them are maximized. handle the intra-graph assembly attacks. We estimate the Namely, an inscribed regular (K − 1)-simplex in a hyper- distribution of perturbed graphs and maximize the distances sphere with radius is generated by adding/deleting edges among the perturbed nodes within the same graphs, for to/from the matched nodes, e.g., an inscribed equilateral separating the nodes in a narrow space into a wide space, triangle (i.e., regular 2-simplex) in a circle (1-hypersphere) such that the interference from the similar neighbors of the with radius in Figure1. In addition, there is little possibil- perturbed nodes is significantly reduced. In Figure2, the ity for the non-matched clean nodes in K graphs to form a nodes in two graphs G1 and G2 are separated respectively 1 2 1 2 regular or near-regular simplex, especially when K is large. by maximizing the distances 1=dy and 1=dy in G and G . Thus, regular or near-regular simplexes within the range Empirical evaluation over real graph datasets demonstrates of can be safely treated as the matched nodes under the that the remarkable robustness of IDRGM against state-of- inter-graph dispersion attacks; and (2) Intra-graph assem- the-art graph matching methods and representative resilient bly attacks. A recent attack solution for graph matching Lipschitz-bound neural architectures. In addition, more aims to move a node to be attacked to dense region in its experiments, implementation details, and hyperparameter graph, such that the distances between its similar neighbors selection and setting are presented in Appendices A.2-A.4. in the same graph and its counterparts in other graphs be- come smaller than the ones between this perturbed node To our best knowledge, this work is the first to study in- and its counterparts, and thus to generate a wrong matching tegrated defense for resilient graph matching against both result (Zhang et al., 2020b). As shown in Figure2, two inter-graph dispersion and intra-graph assembly attacks. 1 2 matched nodes ui1 and ui2 are pushed to dense regions in 1 2 1 two graphs G and G respectively, such that ui1 is closer 2. Problem Definition to the neighbors of u2 in G2, rather than u2 itself. A i2 i2 Given a set of K graphs G1; ··· ;GK to be matched, each wrong matching between u1 and an neighbor of u2 will be i1 i2 graph is denoted as Gk = (V k;Ek) (1 ≤ k ≤ K), where generated. In addition, since there are many similar neigh- V k = fvk; ··· ; vk g is the set of N k nodes and Ek = bors around the perturbed nodes in the dense region, this 1 N k f(vk; vk) : 1 ≤ i; j ≤ N kg is the set of edges. Each dramatically increases the possibility of deriving the wrong i j Gk has an N k × N k binary adjacency matrix Ak, where matching results. k k k k each entry Aij = 1 if there exists an edge (vi ; vj ) 2 E ; Motivated by the above analysis, we propose an effective k k th k otherwise Aij = 0. Ai: specifies the i row vector of A . simplex detection technique to tackle the inter-graph disper- In this paper, if there are no specific descriptions, we use sion attacks. The defense model tries to determine whether k k k vi to denote a node vi itself and its representation Ai:, i.e., the nodes in multiple graphs form inscribed regular sim- k k k th vi = Ai: and we utilize vij to specify the j dimension plexes in the hyperspheres with radius and how regular of vk, i.e., vk = Ak .