Temporal Network Representation Learning
Total Page:16
File Type:pdf, Size:1020Kb
1 Temporal Network Representation Learning John Boaz Lee, Giang Nguyen, Ryan A. Rossi, Nesreen K. Ahmed, Eunyee Koh, and Sungchul Kim Abstract—Networks evolve continuously over time with the addition, deletion, and changing of links and nodes. Such temporal networks (or edge streams) consist of a sequence of timestamped edges and are seemingly ubiquitous. Despite the importance of accurately modeling the temporal information, most embedding methods ignore it entirely or approximate the temporal network using a sequence of static snapshot graphs. In this work, we introduce the notion of temporal walks for learning dynamic embeddings from temporal networks. Temporal walks capture the temporally valid interactions (e.g., flow of information, spread of disease) in the dynamic network in a lossless fashion. Based on the notion of temporal walks, we describe a general class of embeddings called continuous-time dynamic network embeddings (CTDNEs) that completely avoid the issues and problems that arise when approximating the temporal network as a sequence of static snapshot graphs. Unlike previous work, CTDNEs learn dynamic node embeddings directly from the temporal network at the finest temporal granularity and thus use only temporally valid information. As such CTDNEs naturally support online learning of the node embeddings in a streaming real-time fashion. The experiments demonstrate the effectiveness of this class of embedding methods for prediction in temporal networks. Index Terms—Dynamic node embeddings, temporal node embeddings, temporal walks, stream walks, graph streams, edge streams, temporal networks, dynamic networks, network representation learning, dynamic graph embedding, online learning, incremental learning F v2 v3 v4 v1 v4 v3 v5 v3 1 INTRODUCTION 1 2 3 4 5 7 8 10 ··· YNAMIC networks are seemingly ubiquitous in the v v v v v v v v D real-world. Such networks evolve over time with the 1 2 3 4 3 5 2 6 addition, deletion, and changing of nodes and links. The (a) Graph (edge) stream temporal information in these networks is known to be important to accurately model, predict, and understand v2 8 network data [1], [2]. Despite the importance of these 1 2 7 dynamics, most previous work on embedding methods have − − − v1 v3 v5 − − − ignored the temporal information in network data [3], [4], 10 4 3,5 [5], [6], [7], [8], [9], [10], [11], [12]. v4 v6 We address the problem of learning an appropriate network representation from continuous-time dynamic networks (b) Continuous-Time Dynamic Network (CTDN) (Figure 1) for improving the accuracy of predictive mod- els. We propose continuous-time dynamic network embeddings Fig. 1. Dynamic network. Edges are labeled by time. Observe that existing methods that ignore time would consider v4 −! v1 −! v2 a (CTDNE) and describe a general framework for learning valid walk, however, v4 −! v1 −! v2 is clearly invalid with respect to such embeddings based on the notion of temporal random time since v1−! v2 exists in the past with respect to v4−! v1. In this walks (walks that respect time). The framework is general work, we propose the notion of temporal random walks for embeddings that capture the true temporally valid behavior in networks. In addition, with many interchangeable components and can be used our approach naturally supports learning in graph streams where edges in a straightforward fashion for incorporating temporal arrive continuously over time (e.g., every second/millisecond) dependencies into existing node embedding and deep graph models that use random walks. Most importantly, the arXiv:submit/2652048 [cs.LG] 12 Apr 2019 CTDNEs are learned from temporal random walks that nodes) that are more recent and more frequent. The result is represent actual temporally valid sequences of node interactions a more appropriate time-dependent network representation and thus avoids the issues and information loss that arises that captures the important temporal properties of the when time is ignored [3], [4], [5], [6], [7], [8], [9], [10], [11], continuous-time dynamic network at the finest most natural [12] or approximated as a sequence of discrete static snapshot temporal granularity without loss of information while using graphs [13], [14], [15], [16], [17] (Figure 2) as done in previous walks that are temporally valid (as opposed to walks that work. The proposed approach (1) obeys the direction of do not obey time and thus are invalid and noisy as they time and (2) biases the random walks towards edges (and represent sequences that are impossible with respect to time). Hence, the framework allows existing embedding methods • J. Lee and G. Nguyen is with WPI to be easily adapted for learning more appropriate network Email: fjtlee,[email protected] representations from continuous-time dynamic networks • R. Rossi is with Adobe Research by ensuring time is respected and avoiding impossible E-mail: [email protected] • N. Ahmed is with Intel Labs sequences of events. Email: [email protected] The proposed approach learns a more appropriate net- • E. Koh and S. Kim is with Adobe Research work representation from continuous-time dynamic net- f g Email: eunyee,sukim @adobe.com works that captures the important temporal dependencies 2 of the network at the finest most natural granularity (e.g., at from the email communication ei = (v1; v2). However, a time scale of seconds or milliseconds). This is in contrast if T (v1; v2) > T (v2; v3) then the message ej = (v2; v3) to approximating the dynamic network as a sequence of cannot contain any information communicated in the email static snapshot graphs G1;:::;Gt where each static snap- ei = (v1; v2). This is just one simple example illustrating the shot graph represents all edges that occur between a user- importance of modeling the actual sequence of events (email specified discrete-time interval (e.g., day or week) [18], communications). Embedding methods that ignore time are [19], [20]. Besides the obvious loss of information, there prone to many issues such as learning inappropriate node are many other issues such as selecting an appropriate embeddings that do not accurately capture the dynamics aggregation granularity which is known to be an important in the network such as the real-world interactions or flow and challenging problem in itself that can lead to poor of information among nodes. An example of information predictive performance or misleading results. In addition, loss that occurs when time is ignored or the actual dynamic our approach naturally supports learning in graph streams network is approximated using a sequence of discrete static where edges arrive continuously over time (e.g., every snapshot graphs is shown in Figure 1 and 2, respectively. second/millisecond) [21], [22], [23], [24] and therefore can This is true for networks that involve the flow or diffusion be used for a variety of applications requring real-time of information through a network [29], [30], [31], networks performance [25], [26], [27]. modeling the spread of disease/infection [32], spread of We demonstrate the effectiveness of the proposed frame- influence in social networks (with applications to product work and generalized dynamic network embedding method adoption, viral marketing) [33], [34], or more generally for temporal link prediction in several real-world networks any type of dynamical system or diffusion process over from a variety of application domains. Overall, the proposed a network [29], [30], [31]. method achieves an average gain of 11:9% across all methods The proposed approach naturally supports generating and graphs. The results indicate that modeling temporal dynamic node embeddings for any pair of nodes at a specific dependencies in graphs is important for learning appropriate time t. More specifically, given a newly arrived edge between and meaningful network representations. In addition, any node i and j at time t, we simply add the edge to the graph, existing embedding method or deep graph model that uses perform a number of temporal random walks that contain random walks can benefit from the proposed framework those nodes, and then update the embedding vectors for (e.g., [3], [4], [8], [9], [10], [11], [12], [28]) as it serves as a basis those nodes (via a partial fast update) using only those for incorporating important temporal dependencies into walks. In this case, there is obviously no need to recompute existing methods. Methods generalized by the framework are the embedding vectors for all such nodes in the graph as able to learn more meaningful and accurate time-dependent the update is very minor and an online partial update can network embeddings that capture important properties from be performed fast. This obviously includes the case where continuous-time dynamic networks. either node in the new edge has never been seen previously. Previous embedding methods and deep graph models The above is obviously a special case of our framework and that use random walks search over the space of random is a trivial modification. Notice that we can also obviously walks S on G, whereas the class of models (continuous-time drop-out past edges as they may become stale. dynamic network embeddings) proposed in this work learn v2 temporal embeddings by searching over the space ST of tem- poral random walks that obey time and thus ST includes only v v v temporally valid walks. See Figure 3 for intuition. Informally, 1 3 5 a temporal walk St from node vi1 to node viL+1 is defined v4 v6 as a sequence of edges f(vi1 ; vi2 ; ti1 ), (vi2 ; vi3 ; ti2 );:::; (viL ; viL+1 ; tiL )g such that ti1 ≤ ti2 ≤ ::: ≤ tiL . A temporal walk (a) Static graph ignoring time represents a temporally valid sequence of edges traversed in increasing order of edge times and therefore respect time. v2 v2 For instance, suppose each edge represents a contact (e.g., email, phone call, proximity) between two entities, then v1 v3 v5 v1 v3 v5 a temporal random walk represents a feasible route for a piece of information through the dynamic network.