
Learning Graph Representations with Recurrent Neural Network Autoencoders

Aynaz Taheri, University of Illinois at Chicago, [email protected]
Kevin Gimpel, Toyota Technological Institute at Chicago, [email protected]
Tanya Berger-Wolf, University of Illinois at Chicago, [email protected]

ABSTRACT
Representing and comparing graphs is a central problem in many fields. We present an approach to learn representations of graphs using recurrent neural network autoencoders. Recurrent neural networks require sequential data, so we begin with several methods to generate sequences from graphs, including random walks, breadth-first search, and shortest paths. We train long short-term memory (LSTM) autoencoders to embed these graph sequences into a continuous vector space. We then represent a graph by averaging its graph sequence representations. The graph representations are then used for graph classification and comparison tasks. We demonstrate the effectiveness of our approach by showing improvements over the existing state-of-the-art on several graph classification tasks, including both labeled and unlabeled graphs.

Figure 1: Learned graph representations in the Proteins [5] dataset, which has two classes: enzyme (red) and non-enzyme (blue). The plot is generated by t-SNE [29] and the two highlighted graphs are generated by Gephi [2].

KEYWORDS
Representation learning, Recurrent Neural Networks, Graph Classification

1 INTRODUCTION
We address the problem of comparing and classifying graphs by learning their representations. This problem arises in many domain areas, including bioinformatics, social network analysis, chemistry, neuroscience, and computer vision. For instance, in neuroscience, comparing brain networks represented by graphs helps to identify patients with neurological disorders [43]. In social network analysis, we may need to compare egonetworks to detect anomalies [1] or to identify corresponding egonetworks across multiple social networks. Cutting across domains, we may be interested in understanding how to distinguish the structure of a social network from a biological network, an authorship network, a computer network, or a citation network [33]. In some tasks, we need to quantify the similarity between two graphs. For example, in tasks such as handwriting recognition, we need to measure the similarity between graphs derived from handwriting images [14].

We consider the setting in which we want to compare graphs without necessarily finding any node correspondence between them. Existing graph comparison methods can be categorized into three main (not necessarily disjoint) classes: feature extraction, graph kernels, and graph matching.

Feature extraction methods compare graphs across a set of features, such as specific subgraphs or numerical properties that capture the topology of the graphs [4, 30]. The efficiency and performance of such methods is highly dependent on the feature selection process. Most graph kernels [44] are based on the idea of R-convolution kernels [19], a way of defining kernels on structured objects by decomposing the objects into substructures and comparing pairs in the decompositions. For graphs, the substructures include graphlets [39], shortest paths [6], random walks [16], and subtrees [38]. While graph kernel methods are effective, their time complexity is quadratic in the number of graphs. Graph matching methods use the topology of the graphs, their nodes and edges, directly, counting matches and mismatches [8, 36]. These approaches do not consider the global structure of the graphs and are sensitive to noise.

In this paper, we propose an unsupervised approach for learning graph representations using long short-term memory (LSTM) recurrent neural networks [22]. An unsupervised graph representation approach can be used not only for tasks such as graph classification in bioinformatics, but can also be applied in many practical applications, such as anomaly detection in social networks or streaming data, as well as in exploratory analysis and scientific hypothesis generation. An unsupervised method for learning graph representations provides a fundamental capability to analyze graphs based on their intrinsic properties.

We leverage the LSTM sequence-to-sequence learning framework of [40], which uses one LSTM to encode the input sequence into a vector and another LSTM to generate the output sequence from that vector. We use the same sequence as both input and output, making this a sequence-to-sequence LSTM autoencoder [27]. We consider several ways to generate sequences from the graphs, including random walks, breadth-first search, and shortest paths. After training the autoencoder, we represent a graph by averaging the encodings of several of its graph sequences. The graph representations are then used for graph classification.

Figure 1 shows the result of using our approach to embed the graphs in the Proteins dataset [5] into a 100-dimensional space, visualized with t-SNE [29]. Each point is one of the 1113 graphs in the dataset. The two class labels were not used when learning the graph representations, but we still generally find that graphs of the same class are clustered in the learned space.

We demonstrate the efficacy of our approach by using our learned graph representations in classification tasks for both labeled and unlabeled graphs. We outperform the state-of-the-art on nearly all datasets considered, showing performance gains over prior graph kernel methods while also being asymptotically more efficient. Indeed, the time complexity of our approach is linear in the number of graphs, while kernel methods are quadratic.

2 RELATED WORK
While we briefly discussed prior work in Section 1, we now provide some additional remarks to give more perspective on other work in this area.

Recently, new graph kernels such as the Deep Graph Kernel (DGK) [47] and the optimal-assignment Weisfeiler-Lehman kernel (WL-OA) [23] have been proposed and evaluated for graph classification tasks. DGK [47] uses methods from the word embedding literature [31] to augment the kernel with substructure similarity. WL-OA [23] is an assignment kernel that finds an optimal bijection between different parts of the graph. While graph kernel methods are effective, their time complexity is quadratic in the number of graphs, and there is no opportunity to customize their representations for supervised tasks.

There has been other work in using neural methods to learn graph representations. The representations obtained by these approaches are tailored for a supervised task and are not determined solely based on the graph structure. Niepert et al. [34] developed a framework (PSCN) to learn graph representations using convolutional neural networks (CNNs). The Deep Graph Convolutional Neural Network (DGCNN) [48] is another model that applies CNNs for graph classification. The main difference between DGCNN and PSCN is the way they deal with the vertex-ordering problem. We compare to them in our experiments, outperforming them on all datasets.

Message passing neural networks [17] are another group of supervised approaches that have recently been used for graph-structured data, such as molecular property prediction in chemistry [13, 17, 28]. Duvenaud et al. [13] introduced a CNN to create "fingerprints" (vectors that encode molecule structure) for graphs derived from molecules. The information about each atom and its neighbors is fed to the neural network, and neural fingerprints are used to predict new features for the graphs. Bruna et al. [7] proposed spectral networks, generalizations of CNNs on low-dimensional graphs via graph Laplacians. Henaff et al. [20] and Defferrard et al. [11] extended spectral networks to high-dimensional graphs. Scarselli et al. [37] proposed graph neural networks (GNNs), which find node representations using random walks. Li et al. [28] extended GNNs with gated recurrent neural networks to predict sequences from graphs. In general, neural message passing approaches can suffer from high computational and memory costs, since they perform multiple iterations of updating hidden node states in graph representations. In contrast, our unsupervised approach obtains strong performance without the requirement of passing messages between vertices for multiple iterations.

Graph2vec [32] is an unsupervised method inspired by document embedding models [25]. This approach finds a representation for a graph by maximizing the likelihood of graph subtrees given the graph embedding. Our approach outperforms this method by a large margin. Graph2vec does not capture global information in the graph structure because it only considers subtrees as graph representatives. Other methods have been developed to learn representations for individual nodes in graphs, such as DeepWalk [35], LINE [41], node2vec [18], and many others. These methods are not directly related to graph comparison because they require aggregating node representations to represent entire graphs. However, we compare to one representative method in our experiments to show that the aggregation of node embeddings is not informative enough to represent the structure of a graph.

3 BACKGROUND
We briefly discuss the required background about graphs and LSTM recurrent neural networks.

Graphs. For a graph G = (V, E), V denotes its vertex set and E ⊆ V × V denotes its edge set. G is called a labeled graph if there is a labeling function label : V → L that assigns a label from a set of labels L to each vertex. The graph G is called an unlabeled graph if no labels have been assigned to its vertices.

Long short-term memory (LSTM). An LSTM [22] is a recurrent neural network (RNN) designed to model long-distance dependencies in sequential data. We denote the input vector at time t by x_t and the hidden vector computed at time t by h_t. At each time step, an LSTM computes a memory cell vector c_t, an input gate vector i_t, a forget gate vector f_t, and an output gate vector o_t:

i_t = σ(W_i x_t + U_i h_{t-1} + K_i c_{t-1} + b_i)
f_t = σ(W_f x_t + U_f h_{t-1} + K_f c_{t-1} + b_f)
o_t = σ(W_o x_t + U_o h_{t-1} + K_o c_{t-1} + b_o)        (1)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c x_t + U_c h_{t-1} + b_c)
h_t = o_t ⊙ tanh(c_t)

where ⊙ denotes elementwise multiplication, σ is the logistic sigmoid function, each W is a weight matrix connecting inputs to particular gates (denoted by subscripts), each U is an analogous matrix connecting hidden vectors to gates, each K is a diagonal matrix connecting cell vectors to gates, and each b is a bias. We refer to this as an "LSTM encoder" because it converts an input sequence into a sequence of hidden vectors h_t. We will also use a type of LSTM that predicts the next item x̂ in the sequence from h_t. This architecture, which we refer to as an "LSTM decoder," adds the following:

x̂ = g(h_t)        (2)

where g is a function that takes a hidden vector and outputs a predicted observation x̂. With symbolic data, g typically computes a softmax distribution over symbols and returns the highest-probability symbol. When using continuous inputs, g could be an affine transform of h_t followed by a nonlinearity.
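To make the recurrence in Eq. 1 and the decoder step of Eq. 2 concrete, the following is a minimal NumPy sketch of a single peephole-LSTM encoder step and a continuous decoder output. It follows the equations exactly as written above (all gates peek at c_{t-1}); the function and parameter names are illustrative and not from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of the peephole LSTM in Eq. 1. p is a dict of parameters:
    weight matrices W*, U*, diagonal peephole vectors k* (the K matrices), and biases b*."""
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["ki"] * c_prev + p["bi"])
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["kf"] * c_prev + p["bf"])
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["ko"] * c_prev + p["bo"])
    c = f * c_prev + i * np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])
    h = o * np.tanh(c)
    return h, c

def encode_sequence(xs, p, d):
    """Run the LSTM encoder over a list of input vectors (e.g., vertex embeddings)
    and return the final hidden state as the fixed-length sequence representation."""
    h, c = np.zeros(d), np.zeros(d)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
    return h

def decode_step(h_t, W_out, b_out):
    """Eq. 2 for continuous outputs: an affine transform of h_t followed by a nonlinearity."""
    return np.tanh(W_out @ h_t + b_out)
```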
4 APPROACH OVERVIEW
We propose an unsupervised approach for learning representations of labeled and unlabeled graphs. Our goal is to learn graph embeddings such that graphs with similar structure lie close to one another in the embedding space. We seek to learn a mapping function Φ : G → R^d that embeds a graph G into a d-dimensional space. We are interested in methods that scale linearly in the number of graphs in a dataset, as opposed to graph kernel methods that require quadratic time.

Our approach uses autoencoders [21], which form one of the principal frameworks of unsupervised learning. Autoencoders are typically trained to reconstruct their input in such a way that they learn useful properties of the data. There are two parts to an autoencoder: an encoder that maps the input to some intermediate representation, and a decoder that attempts to reconstruct the input from this intermediate representation. For graphs, we need to decide how to represent graphs in a form that can be encoded and then reconstructed. We do this by extracting sequences of symbols from the graphs that we want to represent. Section 5 describes several methods for doing this, including the use of random walks, shortest paths between all pairs of nodes, and paths derived from breadth-first search. These sequences can also be generated for unlabeled graphs.

Given sequences from a graph, we then need an autoencoding framework that can handle variable-length sequences. We choose LSTMs for both the encoder and decoder, forming an LSTM autoencoder [27, 40]. LSTM autoencoders use one LSTM to read the input sequence and encode it into a fixed-dimensional vector, and then use another LSTM to decode the output sequence from that vector. We consider several variations for the encoder and decoder, described in Section 6. We train our LSTM autoencoders on sequences pooled from a set of graphs. We experiment with two training objectives, described in Section 7.

Given the trained encoder LSTM_enc, we define the graph embedding function Φ(G) as the mean of the vectors output by the encoder for graph sequences extracted from G:

Φ(G) = (1 / |Seq(G)|) Σ_{s ∈ Seq(G)} LSTM_enc(s)        (3)

where Seq(G) is the set of sequences extracted from G. (Using the mean outperformed max pooling in our experiments, so we only report results using the mean in this paper.) We use Φ to represent graphs in our experiments in Section 8, demonstrating state-of-the-art performance for several graph classification tasks.

5 GENERATING GRAPH SEQUENCES
We use random walks, shortest paths, and breadth-first search to generate sequences. These sequences are then used to train our LSTM autoencoders.

Random Walks (RW): Given a source node u, we generate a random walk w_u with fixed length m. Let v_i denote the ith node in w_u, starting with v_0 = u. v_{t+1} is a node from the neighbors of v_t that is selected with probability 1/d(v_t), where d(v_t) is the degree of v_t.

Shortest Paths (SP): We generate all the shortest paths between each pair of nodes in the graph using the Floyd-Warshall algorithm [15].

Breadth-First Search (BFS): We run BFS starting at each node to generate graph sequences for that node. The graph sequences for the graph include the BFS sequences starting at each node in the graph, limited to a maximum number of edges from the starting node. We give details on the maximum used in our experiments below.

5.1 Embedding vertices
When creating graph sequences for training our autoencoders, we need to represent individual vertices as vectors. We use Emb(v) to denote the embedding of vertex v. For labeled graphs, we can use the vertex labels to define Emb(v), but for unlabeled graphs it is less clear how to do this. Our solution, which we use for both labeled and unlabeled graphs, is to use the Weisfeiler-Lehman (WL) algorithm [46]. This algorithm is used as a graph isomorphism test that determines whether two graphs are isomorphic. It is known as an iterative vertex classification or vertex refinement procedure [38]. The WL algorithm uses multiset labels to encode the local structure of the graphs. The idea is to create a multiset label for each node using the sorted list of its neighbors' labels. Then, the sorted list is compressed into a new value. This labeling process continues until the new multiset labels of the two graphs are different or the number of iterations reaches a specified limit.

After running the WL algorithm on a graph, we obtain an integer label for each vertex v. We assign a parameter vector Emb(v) to each unique integer label. Emb(v) is a vector in a d-dimensional space where each entry is initialized with a draw from a uniform distribution with range [−1, 1]. These embeddings are updated during training. We note that we can apply the Weisfeiler-Lehman algorithm to both labeled and unlabeled graphs. For unlabeled graphs, we start the algorithm by assigning the same label to all vertices. However, for labeled graphs, we start by using the provided vertex labels in the graph. So, for labeled graphs, WL can produce a richer, more informative vertex labeling because it starts with the provided labels for the dataset and then enriches the labels using the neighbors' labels (thereby capturing the local structure of each node).
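As a concrete illustration of the sequence generation in Section 5 and the WL relabeling in Section 5.1, here is a minimal sketch assuming the graph is given as a plain dict mapping each vertex to a list of its neighbors; the function names and the adjacency format are illustrative assumptions, not details from the paper.

```python
import random
from collections import deque

def random_walk(adj, u, m):
    """Fixed-length random walk from source u; the next node is drawn uniformly
    from the current node's neighbors, i.e., with probability 1/d(v_t)."""
    walk = [u]
    for _ in range(m):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

def bfs_sequence(adj, u, max_depth=1):
    """BFS sequence from u, limited to max_depth edges from the starting node."""
    seq, seen, queue = [], {u}, deque([(u, 0)])
    while queue:
        v, depth = queue.popleft()
        seq.append(v)
        if depth < max_depth:
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, depth + 1))
    return seq

def wl_relabel(adj, labels, iterations):
    """Weisfeiler-Lehman vertex refinement: each iteration builds, for every vertex,
    a signature from its own label and its sorted neighbor labels, then compresses that
    signature into a new integer label. For unlabeled graphs, start with identical labels."""
    compress = {}
    for _ in range(iterations):
        new_labels = {}
        for v in adj:
            signature = (labels[v], tuple(sorted(labels[w] for w in adj[v])))
            if signature not in compress:
                compress[signature] = len(compress)
            new_labels[v] = compress[signature]
        labels = new_labels
    return labels
```

The integer labels returned by wl_relabel would then index the embedding table Emb(v) used to turn these vertex sequences into input vectors for the autoencoder.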

6 SEQ-TO-SEQ AUTOENCODERS
We have discussed three ways of extracting vertex sequences from graphs and our approach for embedding vertices. We now describe how we learn our graph embedding function.

We formulate graph representation learning as training an autoencoder on vertex sequences generated from graphs. The most common type of autoencoder is a feed-forward deep neural network, but such autoencoders suffer from the limitation of requiring fixed-length inputs and an inability to model sequential data. Therefore, we focus in this paper on sequence-to-sequence autoencoders, which can support arbitrary-length sequences.

These autoencoders are based on the sequence-to-sequence learning framework of Sutskever et al. [40], an LSTM-based architecture in which both the inputs and outputs are sequences of variable length. The architecture uses one LSTM as the encoder LSTM_enc and another LSTM as the decoder LSTM_dec. An input sequence s with length m is given to LSTM_enc and its elements are processed one per time step. The hidden vector h_m at the last time step m is the fixed-length representation of the input sequence. This vector is provided as the initial vector to LSTM_dec to generate the output sequence.

Li et al. [27] adapted the sequence-to-sequence learning framework for autoencoding by using the same sequence for both the input and output. They trained the autoencoder such that the decoder LSTM_dec reconstructs the input using the final hidden vector from LSTM_enc. We use this same approach, using our graph sequences in place of their textual sequences. We develop several modifications to this model and evaluate their impact in our experiments.

In our experiments, we use several graph datasets. We train a single autoencoder for each graph dataset. The autoencoder is trained on a training set of graph sequences pooled across all graphs in the dataset. After training the autoencoder, we obtain the representation Φ(G) for a single graph G by encoding its sequences s ∈ Seq(G) using LSTM_enc, then averaging its encoding vectors, as in Eq. (3).

S2S-AE: This is the standard sequence-to-sequence autoencoder inspired by [27], which we customize for embedding graphs. We use h^enc_t to denote the hidden vector at time step t in LSTM_enc and h^dec_t to denote the hidden vector at time step t in LSTM_dec. We define shorthand for Eq. 1 as follows:

h^enc_t = LSTM_enc(Emb(v_t), h^enc_{t-1})        (4)

where Emb(v_t) takes the role of x_t in Eq. 1. The hidden vector at the last time step, h^enc_last ∈ R^d, denotes the representation of the input sequence, and is used as the hidden vector of the decoder at its first time step:

h^dec_0 = h^enc_last        (5)

The last cell vector of the encoder is copied over in an analogous way. Then each decoder hidden vector h^dec_t is computed based on the hidden vector and vertex embedding from the previous time step:

h^dec_t = LSTM_dec(Emb(v_{t-1}), h^dec_{t-1})        (6)

The decoder uses h^dec_t to predict the next vertex embedding Êmb(v_t) as in Eq. 2. We have two different loss functions to test with this model. First, we consider the node embeddings fixed and compute a loss based on the difference between the predicted vertex embedding Êmb(v_t) and the true one Emb(v_t). Second, we update the node embeddings in addition to the model parameters using a cross entropy function. We discuss training in Section 7. For Emb(v_0), we use a vector of all zeroes.

S2S-AE-PP: In the previous model, LSTM_dec predicts the embedding of the vertex at time step t using h^dec_{t-1} and the true vertex embedding at time step t-1, Emb(v_{t-1}). However, this may enable the decoder to rely too heavily on the previous true vertex in the path, thereby making it easier to reconstruct the input and reducing the need for the encoder to learn an effective representation of the input sequence. We consider a variation in which we use the previous predicted (PP) vertex embedding Êmb(v_{t-1}) instead of the previous true one:

h^dec_t = LSTM_dec(Êmb(v_{t-1}), h^dec_{t-1})        (7)

This forces the encoder and decoder to work harder to respectively encode and decode the graph sequences. This variation is related to scheduled sampling [3], in which the training process is changed gradually from using true previous symbols to using predicted previous symbols more and more during training.

S2S-AE-PP-WL1,2: This model is similar to S2S-AE-PP except that, for each node in the sequence, we use two embeddings corresponding to different stopping iterations of the WL algorithm. We use x_{1t} to denote the embedding of the label produced by one iteration of WL and x_{2t} for that produced by two iterations of WL. Eq. 1 is modified to receive both as inputs; e.g., the first line of Eq. 1 becomes:

i_t = σ(W_{i1} x_{1t} + W_{i2} x_{2t} + U_i h_{t-1} + K_i c_{t-1} + b_i)

The other equations are changed analogously. The embeddings for both the WL1 and WL2 labels are learned.

S2S-N2N-PP: This model is a "neighbors-to-node" (N2N) prediction model and uses random walks as graph sequences. That is, each item in the input sequence is the set of neighbors (their embeddings are averaged) of the corresponding node in the output sequence:

h^enc_t = LSTM_enc(Avg_{v_i ∈ nbrs(v_t)}(Emb(v_i)), h^enc_{t-1})        (8)

where nbrs(v) returns the set of neighbors of v, and we predict the nodes in the random walk via the decoder as in Eq. 7. Unlike the other models, this model is not an autoencoder because the input and output sequences are not the same.
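The following is a minimal PyTorch sketch of the S2S-AE setup with the cross-entropy objective described later in Section 7.2 (vertex embeddings learned; the decoder scores each step against all label embeddings). It is illustrative only: torch.nn.LSTM omits the peephole (K) terms of Eq. 1, and the class name, label count, and toy sequence below are assumptions made for the example, not values from the paper.

```python
import torch
import torch.nn as nn

class S2SAE(nn.Module):
    def __init__(self, num_labels, dim=100):
        super().__init__()
        self.emb = nn.Embedding(num_labels, dim)           # Emb(v) for each WL label
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)

    def encode(self, seq):                                 # seq: LongTensor of shape (1, T)
        _, (h, c) = self.encoder(self.emb(seq))
        return h, c                                        # fixed-length representation (Eq. 4-5)

    def forward(self, seq):
        h, c = self.encode(seq)
        # Decoder inputs: a zero vector for v_0, then the previous true embeddings (Eq. 6).
        # Feeding previous *predicted* embeddings instead would give the PP variant (Eq. 7).
        prev = torch.cat([torch.zeros_like(self.emb(seq[:, :1])),
                          self.emb(seq[:, :-1])], dim=1)
        out, _ = self.decoder(prev, (h, c))
        return out @ self.emb.weight.t()                   # per-step scores over labels (Eq. 11)

num_labels = 500                                           # illustrative WL vocabulary size
model = S2SAE(num_labels)
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)  # AdaGrad, as in Section 8.4
loss_fn = nn.CrossEntropyLoss()                               # categorical CE (Section 7.2)

seq = torch.randint(0, num_labels, (1, 10))                # one toy graph sequence
optimizer.zero_grad()
logits = model(seq)
loss = loss_fn(logits.view(-1, num_labels), seq.view(-1))
loss.backward()
optimizer.step()

# Graph embedding (Eq. 3): average the encoder vectors of all sequences of a graph, e.g.
# phi_G = torch.stack([model.encode(s)[0].view(-1) for s in seqs_of_G]).mean(dim=0)
```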

7 TRAINING
Let S be a sequence training set generated from a set of graphs. The representations of the sequences s ∈ S are computed using the encoders described in the previous section. We use two different loss functions to train our models: squared error and categorical cross-entropy. The goal is to minimize the following loss functions, summed over all examples s ∈ S, where s : v_1, ..., v_{|s|}.

7.1 Squared Error
We use the squared error (SE) loss function when the WL embeddings are fixed and are not considered trainable parameters of the model. We include a nonlinear transformation to estimate the embedding of the tth vertex in s using the hidden vector of the decoder at time step t:

Êmb(v_t) = ReLU(h^dec_t W + b)        (9)

where ReLU is the rectified linear unit and W and b are additional parameters. Given the predicted vertex embeddings for the sequence s, the squared error loss function computes the average of the elementwise squared differences between the input and output sequences:

loss_SE(s) = (1 / |s|) Σ_{t=1}^{|s|} ||Êmb(v_t) − Emb(v_t)||²_2        (10)

7.2 Categorical Cross Entropy
We use the categorical cross entropy (CE) loss function for experiments in which we update the vertex embeddings during training. We predict the tth vertex as follows:

v̂_t = argmax_{l ∈ L} (h^dec_t · Emb(l))        (11)

where L is the set of labels and · denotes the dot product. The loss computes the categorical cross entropy between the input embeddings and the predicted output embeddings:

loss_CE(s) = − Σ_{t=1}^{|s|} log p(v_t = l)        (12)

where l denotes the true label of vertex v_t, and the predicted probability of the true label is computed as follows:

p(v_t = l) = exp(h^dec_t · Emb(l)) / Σ_{l′ ∈ L} exp(h^dec_t · Emb(l′))        (13)

8 EXPERIMENTS
In this section, we evaluate our representation learning procedure on both labeled and unlabeled graphs. We use our learned representations for the task of graph classification on several benchmark datasets and compare the accuracy of our models to state-of-the-art approaches.

8.1 Datasets
For our classification experiments with labeled graphs, we use six datasets of bioinformatics graphs. For unlabeled graphs, we use six datasets of social network graphs.

Labeled graphs: The bioinformatics benchmarks include several well known datasets of labeled graphs. MUTAG [10] is a dataset of mutagenic aromatic and heteroaromatic nitro compounds. PTC [42] contains several compounds classified in terms of carcinogenicity for female and male rats. Enzymes [5] includes 100 proteins from each of the 6 Enzyme Commission top level enzyme classes. Proteins [5] consists of graphs classified into enzyme and non-enzyme groups. Nci1 and Nci109 [45] are two balanced subsets of chemical compounds screened for activity against non-small cell lung cancer and ovarian cancer cell lines, respectively.

Unlabeled graphs: We use several datasets developed by [47] for unlabeled graph classification. COLLAB is a collaboration dataset where each network is generated from ego-networks of researchers in three research fields. Networks are classified based on research field. IMDB-BINARY and IMDB-MULTI include ego-networks of film actors/actresses from various genres on IMDB, and networks are classified by genre. The Reddit datasets include graphs that show the relations between users extracted from different subreddits. The task is to identify the community each graph belongs to.

8.2 Baselines
We compare to several well known graph kernels: the shortest path kernel (SPK) [6], random walk kernel (RWK) [16], graphlet kernel (GK) [39], and Weisfeiler-Lehman subtree kernel (WLSK) [38]. We report the results obtained by [47] using these methods on all datasets. We also compare to five recent approaches: Deep Graph Kernels (DGK) [47], the convolutional neural network method (PSCN) of [34], the Deep Graph Convolutional Neural Network (DGCNN) [48], the optimal-assignment Weisfeiler-Lehman kernel (WL-OA) [23], and Graph2vec [32]. We also compare to a graph representation method based on node2vec [18]; we use it to learn node embeddings and average them over all nodes in a graph. We report the best results from prior work on each dataset, choosing the best from multiple configurations of their methods.

8.3 Experimental setup
For each dataset, we perform 10-fold cross-validation on its graph representations using a C-SVM classifier from LIBSVM [9] with a radial basis function kernel. Each 10-fold cross-validation experiment is repeated 10 times (with different random splits) and we report average accuracies and standard deviations. We use nested cross-validation for tuning the regularization and kernel hyperparameters.

8.4 Hyperparameter selection
We treat three labeled bioinformatics graph datasets (MUTAG, PTC, Enzymes) and two unlabeled social network datasets (IMDB-BINARY and REDDIT-BINARY) as development datasets for tuning certain high-level decisions and hyperparameters of our approach, though we generally found results to be robust across most values. Figure 5 shows the effect of the dimensionality of the graph representation, showing robustness across values larger than 50; we use 100 in all experiments below. The dashed lines in the figure show accuracy when the vertex embeddings are fixed and the solid lines when the vertex embeddings are updated during training. We use SE (Sec. 7.1) when the vertex embeddings are fixed and CE (Sec. 7.2) when we update the vertex embeddings during training. CE consistently outperforms SE and we use CE for all remaining experiments. We use AdaGrad [12] with learning rate 0.01 and mini-batch size 100.

In the experiments that used BFS for sequence generation, we only consider vertices that are at most 1 edge from the starting node. In some cases, this still leads to extremely long sequences for nodes with many neighbors. We convert these to multiple sequences such that each has maximum length 10. When doing so, we still prepend the initial starting vertex to each of the truncated partial sequences. When using random walks, we generate multiple random walks from each vertex. We compared random walk lengths among {3, 5, 10, 15, 20}. Figure 6 shows robust performance with length 5 across datasets and we use this length below.
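As a rough illustration of the evaluation protocol of Section 8.3, the sketch below uses scikit-learn's SVC in place of LIBSVM; the hyperparameter grid and the function name are assumptions for the example, since the paper does not list the exact values searched.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score
from sklearn.svm import SVC

def evaluate_representations(graph_reps, labels, seed=0):
    """10-fold CV with an RBF-kernel C-SVM; the inner grid search provides the nested
    cross-validation used to tune the regularization and kernel hyperparameters."""
    inner_svm = GridSearchCV(SVC(kernel="rbf"),
                             {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1]},
                             cv=3)
    outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_val_score(inner_svm, np.asarray(graph_reps), np.asarray(labels), cv=outer)
    return scores.mean(), scores.std()

# The paper repeats this with 10 different random splits and reports the averages.
```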

Figure 2: WL embedding (solid: RW, white hatch: BFS, black hatch: SP).
Figure 3: Type of sequences (solid: WL0, white hatch: WL1, black hatch: WL2).
Figure 4: Models (solid: RW, white hatch: BFS, black hatch: SP).

Figure 5: Representation dimensionality (x axis) vs. graph classification accuracy (y axis).
Figure 6: Random walk length (x axis) vs. graph classification accuracy (y axis).

8.5 Comparing models, embeddings, and sequences
Figures 2, 3 and 4 show the results of the classification task on the development labeled graphs for our models, with varying types of graph sequences (RW, BFS, SP) and label embeddings (original labels, WL1, WL2).

Figure 2 shows that using WL labels improves accuracy substantially compared to using the original labels. We show the average of the accuracies of the two autoencoders S2S-AE and S2S-AE-PP (rather than all four models, since S2S-N2N-PP does not have the full range of substructures and S2S-AE-PP-WL1,2 uses both WL labels together, and thus they are not comparable). There is a large gap between the accuracies of the labels generated by the first iteration of the WL algorithm and the original labels across all models and sequence types. This provides strong evidence that WL can enrich the provided vertex labels in a way that improves graph representations by capturing additional local node structure.

The number of unique labels increases with more iterations of WL. Although distinctive labels provide more information about the local structure of nodes, using WL labels from larger iteration counts does not necessarily lead to better graph representations. For example, the accuracy on Enzymes shows a significant drop when using the second iteration of the WL algorithm. We believe that such a sharp drop on this dataset can be explained by the graphs' label entropy [26]. Given a graph G and a set of labels L, the label entropy of G is defined as H(G) = − Σ_{l ∈ L} p(l) log p(l). The average label entropy in each dataset is shown in Table 1. The entropy of the Enzymes dataset increases more than that of MUTAG and PTC from the first iteration to the second iteration of the WL algorithm.

Moreover, Figure 3 shows that the accuracies of classification using different types of sequences are very close to each other when using Weisfeiler-Lehman labels. The difference between types of sequences is more evident with the original labels. In Enzymes, shortest paths clearly obtain better accuracies than random walks and BFS, regardless of the type of embeddings used. For the same type of graphs, Borgwardt and Kriegel [6] similarly observed that their shortest path kernel was better than walk-based kernels. We suspect that the reason is related to the clustering coefficient, a popular metric in network analysis. The clustering coefficient is the number of triangles connected to node v over the number of connected triples centered on node v. Having many triangles in the ego-network of node v may cause the tottering problem in a walk traversing node v and may generate less discriminative BFS sequences from that ego-network. Shortest paths prevent tottering and capture global graph structure. BFS sequences mainly consider the local topological features of a graph, and random walks collect a representative sample of nodes rather than of topology [24]. Fortunately, using the WL labels we can reduce the effect of sequence type in most settings.

Figure 4 shows the comparison between the different unsupervised models using WL1. Model S2S-AE-PP is better than Model S2S-AE in nearly all cases. As we conjectured above, Model S2S-AE-PP may force the encoder representation to capture the entire sequence since the decoder has less assistance during reconstruction. Model S2S-N2N-PP obtains higher accuracy on almost all datasets, showing the benefit of capturing both local neighborhoods and longer paths (via the use of random walks). With S2S-AE-PP-WL1,2, we only observe improvements over the other S2S-AE models on the PTC dataset. This suggests that adding WL labels from different iterations does not provide more informative graph representations, regardless of the type of sequences. The reason could be that in this model we add too many parameters for the WL labels and our model is not able to learn meaningful embeddings for them, which leads to poor graph representations.

8.6 Comparison to state-of-the-art
We compare S2S-N2N-PP to the state-of-the-art in Table 1. While other methods outperform ours on certain datasets, we exceed all prior results on PTC, Enzymes, and Proteins. Considering that none of the previous methods outperforms the others on all datasets, we use an average ranking measure to compare the performance of all approaches to one another. Our approach shows robustness, achieving the first rank among all methods under this measure.

Table 2 compares our method to prior work on the unlabeled graph datasets. Our approach establishes new state-of-the-art accuracies on all datasets except REDDIT-BINARY.

While our approach performs best when using Weisfeiler-Lehman labels, this is not solely due to the Weisfeiler-Lehman algorithm itself.
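For reference, the label entropy statistic reported in Table 1 can be computed from a graph's vertex-label distribution as in the small sketch below (the logarithm base is not specified in the paper, so natural log is an assumption here).

```python
import math
from collections import Counter

def label_entropy(vertex_labels):
    """H(G) = -sum_l p(l) log p(l), where p(l) is the fraction of vertices with label l."""
    counts = Counter(vertex_labels)
    n = len(vertex_labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```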

Table 1: Classification accuracies for labeled graph datasets.

Datasets          MUTAG      PTC        Enzymes    Proteins   Nci1       Nci109     Avg Ranking
# Graphs          188        344        600        1113       4110       4127
Entropy WL1       2.72       4.03       5.45       5.54       4.21       4.22
Entropy WL2       4.65       7.09       12.60      13.26      8.23       8.25
Clustering Coef   0          0.0025     0.453      0.51       0.003      0.003
# Labels          7          19         2          3          37         38
# Classes         2          2          6          2          2          2
SPK (BK05)        85.2±2.4   58.2±2.4   40.1±1.5   75.1±0.5   73.0±0.2   73.0±0.2   4.5
RWK (G+03)        83.7±1.5   57.8±1.3   24.2±1.6   74.2±0.4   —          —          —
GK (S+09)         81.7±2.1   57.2±1.4   26.6±0.9   71.7±0.5   62.3±0.2   62.6±0.1   5.5
WLSK (S+11)       80.7±3.0   57.0±2.0   53.1±1.1   72.9±0.5   80.1±0.5   80.2±0.3   5.5
node2vec (G+16)   82.01±1.0  55.60±1.4  19.42±2.3  70.76±1.2  61.91±0.3  61.53±0.9  6.5
DGK (YW15)        87.4±2.7   60.1±2.5   53.4±0.9   75.7±0.5   80.3±0.4   80.3±0.3   2.8
PSCN (N+16)       92.6       62.3       —          75.9       78.6       —          —
WL-OA (K+16)      86.0±1.7   63.6±1.5   59.9±1.1   76.4±0.4   86.1±0.2   86.3±0.2   1.8
graph2vec (N+17)  83.15±9.2  60.17±6.9  —          73.30±2.0  73.22±1.9  74.26±1.5  —
DGCNN (Z+18)      85.83±1.6  58.59±2.4  —          75.54±0.9  74.44±0.4  —          —
S2S-N2N-PP        89.86±1.1  64.54±1.1  63.96±0.6  76.61±0.5  83.72±0.4  83.64±0.3  1.3

Table 2: Classification accuracies for unlabeled graph datasets.

Dataset           COLLAB      IMDB-BINARY  IMDB-MULTI  REDDIT-BINARY  REDDIT-MULTI-5k  REDDIT-MULTI-12k  Avg Ranking
# Graphs          5000        1000         1500        2000           5000             11929
# Classes         3           2            3           2              5                11
node2vec (G+16)   56.06±0.2   50.17±0.9    36.02±0.7   71.31±2.2      33.11±1.7        23.62±0.3         4.0
DGK (YW15)        73.0±0.2    66.9±0.5     44.5±0.5    78.0±0.3       41.2±0.1         32.2±0.1          2.8
PSCN (N+16)       72.6±2.1    71.0±2.2     45.2±2.8    86.3±1.5       49.1±0.7         41.3±0.4          2.1
WL-OA (K+16)      80.7±0.1    —            —           89.3±0.3       —                —                 —
DGCNN (Z+18)      73.76±0.49  70.03±0.86   47.83±0.85  —              —                —                 —
S2S-N2N-PP        81.75±0.8   73.8±0.7     51.19±0.5   86.50±0.8      52.28±0.5        42.47±0.1         1.0

Some methods shown in Tables 1 and 2 also use Weisfeiler-Lehman labels (graph2vec, WLSK, PSCN, and WL-OA), yet our approach outperforms them on average.

9 CONCLUSIONS
We proposed an unsupervised approach for learning representations of graphs using sequence-to-sequence LSTM architectures. We trained using sequences generated by random walks, shortest paths, and breadth-first search. Our experiments demonstrate that our graph representations can increase the accuracy of graph classification tasks on both labeled and unlabeled graphs, achieving, to our knowledge, the best results on several of the datasets considered.

REFERENCES
[1] Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2010. Oddball: Spotting anomalies in weighted graphs. In PAKDD.
[2] Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. 2009. Gephi: An Open Source Software for Exploring and Manipulating Networks. In AAAI ICWSM.
[3] Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. In NIPS.
[4] Michele Berlingerio, Danai Koutra, Tina Eliassi-Rad, and Christos Faloutsos. 2012. NetSimile: a scalable approach to size-independent network similarity. arXiv (2012).
[5] K. Borgwardt, C. Ong, S. Schönauer, S. Vishwanathan, A. Smola, and H. Kriegel. 2005. Protein function prediction via graph kernels. Bioinformatics 21 (2005).
[6] Karsten M. Borgwardt and Hans-Peter Kriegel. 2005. Shortest-path kernels on graphs. In ICDM.
[7] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2013. Spectral Networks and Locally Connected Networks on Graphs. CoRR (2013).
[8] Horst Bunke. 2000. Graph matching: Theoretical foundations, algorithms, and applications. In Vision Interface.
[9] Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: a library for support vector machines. ACM TIST 2 (2011).
[10] A. Debnath, R. Lopez de Compadre, G. Debnath, A. Shusterman, and C. Hansch. 1991. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Journal of Medicinal Chemistry 34 (1991).
[11] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. arXiv (2016).
[12] John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. JMLR (2011).
[13] David Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P. Adams. 2015. Convolutional networks on graphs for learning molecular fingerprints. In NIPS.
[14] Andreas Fischer, Ching Y. Suen, Volkmar Frinken, Kaspar Riesen, and Horst Bunke. 2013. A Fast Matching Algorithm for Graph-Based Handwriting Recognition. In GbRPR.
[15] Robert W. Floyd. 1962. Algorithm 97: shortest path. Commun. ACM 5 (1962).
[16] Thomas Gärtner, Peter Flach, and Stefan Wrobel. 2003. On graph kernels: Hardness results and efficient alternatives. In COLT.
[17] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural Message Passing for Quantum Chemistry. CoRR (2017).
[18] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In KDD.
[19] David Haussler. 1999. Convolution kernels on discrete structures. Technical Report.
[20] M. Henaff, J. Bruna, and Y. LeCun. 2015. Deep convolutional networks on graph-structured data. arXiv (2015).
[21] Geoffrey E. Hinton and Richard S. Zemel. 1993. Autoencoders, minimum description length, and Helmholtz free energy. In NIPS.

[22] S. Hochreiter and J. Schmidhuber. 1997. Long short-term memory. Neural Computation 9 (1997).
[23] Nils M. Kriege, Pierre-Louis Giscard, and Richard Wilson. 2016. On Valid Optimal Assignment Kernels and Applications to Graph Classification. In NIPS.
[24] Maciej Kurant, Athina Markopoulou, and Patrick Thiran. 2011. Towards unbiased BFS sampling. IEEE Journal on Selected Areas in Communications 29 (2011).
[25] Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In ICML.
[26] Geng Li, Murat Semerci, Bulent Yener, and Mohammed J. Zaki. 2011. Graph classification via topological and label attributes. In MLG.
[27] J. Li, M. Luong, and D. Jurafsky. 2015. A hierarchical neural autoencoder for paragraphs and documents. In ACL.
[28] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel. 2015. Gated Graph Sequence Neural Networks. arXiv (2015).
[29] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. JMLR (2008).
[30] Owen Macindoe and Whitman Richards. 2010. Graph comparison using fine structure analysis. In SocialCom.
[31] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[32] Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, and Shantanu Jaiswal. 2017. graph2vec: Learning Distributed Representations of Graphs. In MLG.
[33] Mark E. J. Newman. 2003. The structure and function of complex networks. SIAM Review (2003).
[34] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning Convolutional Neural Networks for Graphs. In ICML.
[35] B. Perozzi, R. Al-Rfou, and S. Skiena. 2014. DeepWalk: Online learning of social representations. In KDD.
[36] Kaspar Riesen, Xiaoyi Jiang, and Horst Bunke. 2010. Exact and inexact graph matching: Methodology and applications. In Managing and Mining Graph Data.
[37] F. Scarselli, M. Gori, C. Tsoi, M. Hagenbuchner, and G. Monfardini. 2009. The graph neural network model. IEEE Transactions on Neural Networks 20 (2009).
[38] Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. 2011. Weisfeiler-Lehman graph kernels. JMLR (2011).
[39] Nino Shervashidze, SVN Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten M. Borgwardt. 2009. Efficient graphlet kernels for large graph comparison. In AISTATS, Vol. 5.
[40] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In NIPS.
[41] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale Information Network Embedding. In WWW.
[42] H. Toivonen, A. Srinivasan, R. King, S. Kramer, and C. Helma. 2003. Statistical evaluation of the Predictive Toxicology Challenge. Bioinformatics 19 (2003).
[43] Bernadette C. M. Van Wijk, Cornelis J. Stam, and A. Daffertshofer. 2010. Comparing brain networks of different size and connectivity density using graph theory. PLoS ONE 5 (2010).
[44] S. Vishwanathan, N. Schraudolph, R. Kondor, and K. Borgwardt. 2010. Graph kernels. JMLR 11 (2010).
[45] Nikil Wale, Ian A. Watson, and George Karypis. 2008. Comparison of descriptor spaces for chemical compound retrieval and classification. KAIS 14 (2008).
[46] B. Weisfeiler and A. Lehman. 1968. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia (1968).
[47] Pinar Yanardag and SVN Vishwanathan. 2015. Deep graph kernels. In KDD.
[48] Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. 2018. An End-to-End Deep Learning Architecture for Graph Classification. In AAAI.