
Discovering Latent Network Structure in Point Process Data

Scott W. Linderman [email protected] Harvard University, Cambridge, MA 02138 USA

Ryan P. Adams [email protected] Harvard University, Cambridge, MA 02138 USA

Abstract

Networks play a central role in modern data analysis, enabling us to reason about systems by studying the relationships between their parts. Most often in network analysis, the edges are given. However, in many systems it is difficult or impossible to measure the network directly. Examples of latent networks include economic interactions linking financial instruments and patterns of reciprocity in gang violence. In these cases, we are limited to noisy observations of events associated with each node. To enable analysis of these implicit networks, we develop a probabilistic model that combines mutually-exciting point process models with random graph models. We show how the Poisson superposition principle enables an elegant auxiliary variable formulation and a fully-Bayesian, parallel inference algorithm. We evaluate this new model empirically on several datasets.

1. Introduction

Many types of modern data are characterized via relationships on a network. Social network analysis is the most commonly considered example, where the properties of individuals (vertices) can be inferred from "friendship" type connections (edges). Such analyses are also critical to understanding regulatory biological pathways, trade relationships between nations, and propagation of disease. The tasks associated with such data may be unsupervised (e.g., identifying low-dimensional representations of edges or vertices) or supervised (e.g., predicting unobserved links in the graph). Traditionally, network analysis has focused on explicit network problems in which the graph itself is considered to be the observed data. That is, the vertices are considered known and the data are the entries in the associated adjacency matrix. A rich literature has arisen in recent years for applying statistical models to this type of problem, e.g., Liben-Nowell & Kleinberg (2007); Hoff (2008); Goldenberg et al. (2010).

In this paper we are concerned with implicit networks that cannot be observed directly, but about which we wish to perform analysis. In an implicit network, the vertices or edges of the graph may not be directly observed, but the graph structure may be inferred from noisy emissions. These noisy observations are assumed to have been generated according to underlying dynamics that respect the latent network structure.

For example, trades on financial stock markets are executed thousands of times per second. Trades of one stock are likely to cause subsequent activity on stocks in related industries. How can we infer such interactions and disentangle them from market-wide fluctuations? Discovering latent structure underlying financial markets not only reveals interpretable patterns of interaction, but also provides insight into the stability of the market. In Section 4 we will analyze the stability of mutually-excitatory systems, and in Section 6 we will explore how stock similarity may be inferred from trading activity.

As another example, both the edges and vertices may be latent. In Section 7, we examine patterns of violence in Chicago, which can often be attributed to social structures in the form of gangs. We would expect that attacks from one gang onto another might induce cascades of violence, but the vertices (gang identity of both perpetrator and victim) are unobserved. As with the financial data, it should be possible to exploit dynamics to infer these social structures. In this case spatial information is available as well, which can help inform latent identities.

In both of these examples, the noisy emissions have the form of events in time, or "spikes," and our intuition is that a spike at a vertex will induce activity at adjacent vertices.

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).
In this paper, we formalize this idea into a probabilistic model based on mutually-interacting point processes. Specifically, we combine the Hawkes process (Hawkes, 1971) with recently developed exchangeable random graph priors. This combination allows us to reason about latent networks in terms of the way that they regulate interaction in the Hawkes process. Inference in the resulting model can be done with Markov chain Monte Carlo, and an elegant data augmentation scheme results in efficient parallelism.

2. Preliminaries

2.1. Poisson Processes

Point processes are fundamental statistical objects that yield random finite sets of events {s_n}_{n=1}^N ⊂ S, where S is a compact subset of R^D, for example, space or time. The Poisson process is the canonical example. It is governed by a nonnegative "rate" or "intensity" function, λ(s) : S → R_+. The number of events in a subset S' ⊂ S follows a Poisson distribution with mean ∫_{S'} λ(s) ds. Moreover, the numbers of events in disjoint subsets are independent.

We use the notation {s_n}_{n=1}^N ∼ PP(λ(s)) to indicate that a set of events {s_n}_{n=1}^N is drawn from a Poisson process with rate λ(s). The likelihood is given by

    p({s_n}_{n=1}^N | λ(s)) = exp{ −∫_S λ(s) ds } ∏_{n=1}^N λ(s_n).    (1)

In this work we will make use of a special property of Poisson processes, the Poisson superposition theorem, which states that {s_n} ∼ PP(λ_1(s) + ... + λ_K(s)) can be decomposed into K independent Poisson processes. Letting z_n denote the origin of the n-th event, we perform the decomposition by independently sampling each z_n from Pr(z_n = k) ∝ λ_k(s_n), for k ∈ {1, ..., K} (Daley & Vere-Jones, 1988).

2.2. Hawkes Processes

Though Poisson processes have many nice properties, they cannot capture interactions between events. For this we turn to a more general model known as Hawkes processes (Hawkes, 1971). A Hawkes process consists of K point processes and gives rise to sets of marked events {s_n, c_n}_{n=1}^N, where c_n ∈ {1, ..., K} specifies the process on which the n-th event occurred. For now, we assume the events are points in time, i.e., s_n ∈ [0, T]. Each of the K processes is a conditionally Poisson process with a rate λ_k(t | {s_n : s_n < t}) that depends on the history of events up to time t.

Hawkes processes have additive interactions. Each process has a "background rate" λ_{0,k}(t), and each event s_n on process k adds a nonnegative impulse response h_{k,k'}(t − s_n) to the intensity of other processes k'. Causality and locality of influence are enforced by requiring h_{k,k'}(Δt) to be zero for Δt ∉ [0, Δt_max].

By the superposition theorem for Poisson processes, these additive components can be considered independent processes, each giving rise to their own events. We augment our data with a latent random variable z_n ∈ {0, ..., n−1} to indicate the cause of the n-th event (0 if the event is due to the background rate and 1, ..., n−1 if it was caused by a preceding event). The augmented Hawkes likelihood is then the product of likelihoods of each Poisson process:

    p({(s_n, c_n, z_n)}_{n=1}^N | {λ_{0,k}(t)}, {h_{k,k'}(Δt)}) =
        ∏_{k=1}^K p({s_n : c_n = k ∧ z_n = 0} | λ_{0,k}(t))
        × ∏_{n=1}^N ∏_{k=1}^K p({s_{n'} : c_{n'} = k ∧ z_{n'} = n} | h_{c_n,k}(t − s_n)),

where the densities in the product are given by Equation 1.

Figure 1: Illustration of a Hawkes process. Events induce impulse responses on connected processes and spawn "child" events. See the main text for a complete description.

Figure 1 illustrates a causal cascade of events for a simple network of three processes (I-III). The first event is caused by the background rate (z_1 = 0), and it induces impulse responses on processes II and III. Event 2 is spawned by the impulse on the third process (z_2 = 1), and feeds back onto processes I and II. In some cases a single parent event induces multiple children, e.g., event 4 spawns events 5a-c. In this simple example, processes excite one another, but do not excite themselves. Next we will introduce more sophisticated models for such interaction networks.

2.3. Random Graph Models

Graphs of K nodes correspond to K × K matrices. Unweighted graphs are binary adjacency matrices A where A_{k,k'} = 1 indicates a directed edge from node k to node k'. Weighted directed graphs can be represented by a real matrix W whose entries indicate the weights of the edges. Random graph models reflect the probability of different network structures through distributions over these matrices.

Recently, many random graph models have been unified under an elegant theoretical framework due to Aldous and Hoover (Aldous, 1981; Hoover, 1979). See Lloyd et al. (2012) for an overview. Conceptually, the Aldous-Hoover representation characterizes the class of exchangeable random graphs, that is, graph models for which the joint probability is invariant under permutations of the node labels. Just as de Finetti's theorem equates exchangeable sequences to independent draws from a random probability measure, Aldous-Hoover renders the entries of A conditionally independent given latent variables of each node.

Empty graph models (A_{k,k'} ≡ 0) and complete models (A_{k,k'} ≡ 1) are trivial examples, but much more structure may be encoded. For example, consider a model in which nodes are endowed with a location in space, x_k ∈ R^D. This could be an abstract feature space or a real location like the center of a gang territory. The probability of connection between two nodes decreases with the distance between them as A_{k,k'} ∼ Bern(ρ e^{−||x_k − x_{k'}||/τ}), where ρ is the overall sparsity and τ is the characteristic distance scale.

Many models can be constructed in this manner. Stochastic block models, latent eigenmodels, and their nonparametric extensions all fall under this class (Lloyd et al., 2012). We will leverage the generality of the Aldous-Hoover formalism to build a flexible model and inference algorithm for Hawkes processes with structured interaction networks.

3. The Network Hawkes Model

In order to combine Hawkes processes and random network models, we decompose the Hawkes impulse response h_{k,k'}(Δt) as follows:

    h_{k,k'}(Δt) = A_{k,k'} W_{k,k'} g_{θ_{k,k'}}(Δt).    (2)

Here, A ∈ {0,1}^{K×K} is a binary adjacency matrix and W ∈ R_+^{K×K} is a non-negative weight matrix. Together these specify the sparsity structure and strength of the interaction network, respectively. The non-negative function g_{θ_{k,k'}}(Δt) captures the temporal aspect of the interaction. It is parameterized by θ_{k,k'} and satisfies two properties: a) it has bounded support for Δt ∈ [0, Δt_max], and b) it integrates to one. In other words, g is a probability density with compact support.

Decomposing h as in Equation 2 has many advantages. It allows us to express our separate beliefs about the sparsity structure of the interaction network and the strength of the interactions through a spike-and-slab prior on A and W (Mohamed et al., 2012). The empty graph model recovers independent background processes, and the complete graph recovers the standard Hawkes process. Making g a probability density endows W with units of "expected number of events" and allows us to compare the relative strength of interactions. The form suggests an intuitive generative model: for each impulse response, draw m ∼ Poisson(W_{k,k'}) induced events and draw the m child event times i.i.d. from g, enabling computationally tractable conjugate priors.

Intuitively, the background rates, λ_{0,k}(t), explain events that cannot be attributed to preceding events. In the simplest case the background rate is constant. However, there are often fluctuations in overall intensity that are shared among the processes, and not reflective of process-to-process interaction, as we will see in the daily variations in trading volume on the S&P100 and the seasonal trends in homicide. To capture these shared background fluctuations, we use a sparse Log Gaussian Cox process (Møller et al., 1998) to model the background rate:

    λ_{0,k}(t) = µ_k + α_k exp{y(t)},    y(t) ∼ GP(0, K(t, t')).

The kernel K(t, t') describes the covariance structure of the background rate that is shared by all processes. For example, a periodic kernel may capture seasonal or daily fluctuations. The offset µ_k accounts for varying background intensities among processes, and the scaling factor α_k governs how sensitive process k is to these background fluctuations (when α_k = 0 we recover the constant background rate).

Finally, in some cases the process identities, c_n, must also be inferred. With gang incidents in Chicago we may have only a location, x_n ∈ R². In this case, we may place a spatial Gaussian mixture model over the c_n's, as in Cho et al. (2013). Alternatively, we may be given the label of the community in which the incident occurred, but we suspect that interactions occur between clusters of communities. In this case we can use a simple clustering model or a nonparametric model like that of Blundell et al. (2012).

3.1. Inference with Gibbs Sampling

We present a Gibbs sampling procedure for inferring the model parameters, W, A, {θ_{k,k'}}, {λ_{0,k}(t)}, and, if necessary, {c_n}. In order to simplify our Gibbs updates, we will also sample a set of parent assignments {z_n} for each event. Incorporating these parent variables enables conjugate prior distributions for W, θ_{k,k'}, and, in the case of constant background rates, λ_{0,k}. Detailed derivations are provided in the supplementary material.

Sampling weights W. A gamma prior on the weights, W_{k,k'} ∼ Gamma(α_W^0, β_W^0), results in the conditional distribution,

    W_{k,k'} | {s_n, c_n, z_n}_{n=1}^N, θ_{k,k'} ∼ Gamma(α_{k,k'}, β_{k,k'}),
    α_{k,k'} = α_W^0 + ∑_{n=1}^N ∑_{n'=1}^N δ_{c_n,k} δ_{c_{n'},k'} δ_{z_{n'},n},
    β_{k,k'} = β_W^0 + ∑_{n=1}^N δ_{c_n,k}.

The posterior parameters correspond to the number of events caused by an interaction and the total unweighted rate induced by events on node k. Here and elsewhere, δ_{i,j} is the Kronecker delta function. We use the inverse-scale parameterization of the gamma distribution.

Sampling background rates λ_{0,k}. Similarly, for constant background rates λ_{0,k}(t) ≡ λ_{0,k}, the prior λ_{0,k} ∼ Gamma(α_λ^0, β_λ^0) is conjugate with the likelihood and yields the conditional distribution

    λ_{0,k} | {s_n, c_n, z_n}_{n=1}^N ∼ Gamma(α_λ, β_λ),
    α_λ = α_λ^0 + ∑_n δ_{c_n,k} δ_{z_n,0},
    β_λ = β_λ^0 + T.

This conjugacy no longer holds for Gaussian process background rates, but conditioned upon the parent variables, we must simply fit a Gaussian process for those events for which z_n = 0. We use elliptical slice sampling (Murray et al., 2010) for this purpose.

Sampling impulse response parameters θ_{k,k'}. The logistic-normal density with parameters θ_{k,k'} = {µ, τ} provides a flexible model for the impulse response:

    g_{k,k'}(Δt | µ, τ) = (1/Z) exp{ −(τ/2) (σ^{−1}(Δt/Δt_max) − µ)² },
    σ^{−1}(x) = ln(x/(1−x)),
    Z = (Δt (Δt_max − Δt) / Δt_max) (τ/2π)^{−1/2}.

The normal-gamma prior µ, τ ∼ NG(µ, τ | µ_µ^0, κ_µ^0, α_τ^0, β_τ^0) is conjugate and yields a conditional distribution with the following sufficient statistics:

    x_{n,n'} = ln(s_{n'} − s_n) − ln(Δt_max − (s_{n'} − s_n)),
    m = ∑_{n=1}^N ∑_{n'=1}^N δ_{c_n,k} δ_{c_{n'},k'} δ_{z_{n'},n},
    x̄ = (1/m) ∑_{n=1}^N ∑_{n'=1}^N δ_{c_n,k} δ_{c_{n'},k'} δ_{z_{n'},n} x_{n,n'}.

Intuitively, these correspond to the number of events caused by an interaction and their average delay.

Collapsed Gibbs sampling A and z_n. With Aldous-Hoover graph priors, the entries in the binary adjacency matrix A are conditionally independent given the parameters of the prior. The likelihood introduces dependencies between the rows of A, but each column can be sampled in parallel. Gibbs updates are complicated by strong dependencies between the graph and the parent variables, z_n. Specifically, if z_{n'} = n, then we must have A_{c_n,c_{n'}} = 1. To improve the mixing of our sampling algorithm, first we update A | {s_n, c_n}, W, θ_{k,k'} by marginalizing the parent variables. The posterior is determined by the likelihood of the conditionally Poisson process λ_{k'}(t | {s_n : s_n < t}) (Equation 1) with and without interaction A_{k,k'}, and the prior comes from the Aldous-Hoover graph model. Then we update z_n | {s_n, c_n}, A, W, θ_{k,k'} by sampling from the discrete conditional distribution. Though there are N parent variables, they are conditionally independent and may be sampled in parallel. We have implemented our inference algorithm on GPUs to capitalize on this parallelism (code available at https://github.com/slinderman).

Sampling process identities c_n. As with the adjacency matrix, we use a collapsed Gibbs sampler to marginalize out the parent variables when sampling the process identities. Unfortunately, the c_n's are not conditionally independent and hence must be sampled sequentially.

Computational concerns. Compact impulse responses limit the number of potential event parents and significantly reduce the memory requirements and running time of our algorithm. If the average firing rate is constant, the expected number of potential parents per event will be linear in K. Summing the per-event contributions to the log likelihood can be done in O(log N) time using standard parallel reductions. Hence, after parallelizing over the columns of A and the parents z_n, one step of our sampling algorithm takes O(K(K + log N)) time when process identities are known, and O((K + N)(K + log N)) time otherwise. On the datasets used in the following experiments, our GPU implementation achieves 5-50 iterations per second.
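The collapsed parent update above is short to express in code. The following is a minimal numpy sketch, not the paper's GPU implementation: it assumes time-sorted events, a constant background rate, and a generic impulse density g passed as a callable (the exponential stand-in below and all names are illustrative; the paper uses the logistic-normal density).

```python
import numpy as np

def resample_parents(s, c, lam0, A, W, g, dt_max, rng):
    """Gibbs update for parent assignments z_n.

    z[n] = 0 means event n came from the background rate of process c[n];
    z[n] = m + 1 means it was caused by the earlier event m. Following the
    superposition principle, Pr(z_n = m) is proportional to the rate that
    component m contributes at time s[n].
    """
    N = len(s)
    z = np.zeros(N, dtype=int)
    for n in range(N):
        rates = [lam0[c[n]]]            # background component
        labels = [0]
        for m in range(n - 1, -1, -1):  # scan recent events (s is sorted)
            dt = s[n] - s[m]
            if dt >= dt_max:            # compact support: no older parents
                break
            rates.append(A[c[m], c[n]] * W[c[m], c[n]] * g(dt))
            labels.append(m + 1)
        rates = np.array(rates)
        z[n] = labels[rng.choice(len(rates), p=rates / rates.sum())]
    return z
```

With A ≡ 0 every event must be attributed to the background, which provides a quick sanity check on an implementation.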
4. Stability of Network Hawkes Processes

Due to their recurrent nature, Hawkes processes must be constrained to ensure their positive feedback does not lead to infinite numbers of events. A stable system must satisfy

    λ_max = max | eig(A ⊙ W) | < 1

(c.f. Daley & Vere-Jones (1988)), where λ_max here refers to an eigenvalue rather than a rate, and ⊙ denotes the Hadamard (element-wise) product. When we are conditioning on finite datasets we do not have to worry about this. We simply place weak priors on the network parameters, e.g., a beta prior on the sparsity ρ of an Erdős–Rényi graph, and a Jeffreys prior on the scale of the gamma weight distribution. For the generative model, however, we would like to set our hyperparameters such that the prior distribution places little mass on unstable networks. In order to do so, we use tools from random matrix theory.

The celebrated circular law describes the asymptotic eigenvalue distribution for K × K random matrices with entries that are i.i.d. with zero mean and variance σ². As K grows, the eigenvalues are uniformly distributed over a disk in the complex plane centered at the origin with radius σ√K. In our case, however, the mean of the entries, µ = E[A_{k,k'} W_{k,k'}], is not zero. Silverstein (1994) has analyzed such "noncentral" random matrices and shown that the largest eigenvalue is asymptotically distributed as λ_max ∼ N(µK, σ²).

In the simple case of W_{k,k'} ∼ Gamma(α, β) and A_{k,k'} ∼ Bern(ρ), we have µ = ρα/β and σ = √(ρ((1−ρ)α² + α))/β. For a given K, α, and β, we can tune the sparsity parameter ρ to achieve stability with high probability. We simply set ρ such that the minimum of σ√K and, say, µK + 3σ, equals one. Figures 2a and 2b show a variety of weight distributions and the maximum stable ρ. Increasing the network size, the mean, or the variance will require a concomitant increase in sparsity.

This approach relies on asymptotic eigenvalue distributions, and it is unclear how quickly the spectra of random matrices will converge to this distribution. To test this, we computed the empirical eigenvalue distribution for random matrices of various size, mean, and variance. We generated 10⁴ random matrices for each weight distribution in Figure 2a with sizes K = 4, 64, and 1024, and ρ set to the theoretical maximum indicated by dots in Figure 2b. The theoretical and empirical distributions of the maximum eigenvalue are shown in Figures 2c and 2d. We find that for small mean and variance weights, for example Gamma(1,5) in Figure 2c, the empirical results closely match the theory. As the weights grow larger, as in Gamma(8,12) in 2d, the empirical eigenvalue distributions have increased variance and lead to a greater than expected probability of unstable matrices for the range of network sizes tested here. We conclude that networks with strong weights should be counterbalanced by strong sparsity limits, or additional structure in the adjacency matrix that prohibits excitatory feedback loops.

Figure 2: Empirical and theoretical distribution of the maximum eigenvalue for Erdős–Rényi graphs with gamma weights. (a) Four gamma weight distributions. The colors correspond to the curves in the remaining panels. (b) Sparsity ρ that theoretically yields 99% probability of stability as a function of p(W) and K. (c) and (d) Theoretical (solid) and empirical (dots) distribution of the maximum eigenvalue. Color corresponds to the weight distribution in (a) and intensity indicates K and ρ shown in (b).

5. Synthetic Results

Our inference algorithm is first tested on synthetic data generated from the network Hawkes model. We perform two tests: a) a link prediction task where the process identities are given and the goal is to simply infer whether or not an interaction exists, and b) an event prediction task where we measure the probability of held-out event sequences.

The network Hawkes model can be used for link prediction by considering the posterior probability of interactions P(A_{k,k'} | {s_n, c_n}). By thresholding at varying probabilities we compute a ROC curve. A standard Hawkes process assumes a complete set of interactions (A_{k,k'} ≡ 1), but we can similarly threshold its inferred weight matrix to perform link prediction.

Cross correlation provides a simple alternative measure of interaction. By summing the cross-correlation over offsets Δt ∈ [0, Δt_max), we get a measure of directed interaction. A probabilistic alternative is offered by the generalized linear model for point processes (GLM), a popular model for spiking dynamics in computational neuroscience (Paninski, 2004). The GLM allows for constant background rates and both excitatory and inhibitory interactions. Impulse responses are modeled with linear basis functions. Area under the impulse response provides a measure of directed excitatory interaction that we use to compute a ROC curve. See the supplementary material for a detailed description of this model.

We sampled ten network Hawkes processes of 30 nodes each with Erdős–Rényi graph models, constant background rates, and the priors described in Section 3. The Hawkes processes were simulated for T = 1000 seconds.
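These stability checks are easy to reproduce numerically. Below is a minimal numpy sketch: `spectral_radius` computes λ_max = max|eig(A ⊙ W)|, and `max_stable_sparsity` tunes ρ from the Gamma(α, β)/Bern(ρ) moments µ = ρα/β and σ = √(ρ((1−ρ)α² + α))/β. One deliberate deviation to flag: the paper sets the minimum of σ√K and µK + 3σ to one, whereas this sketch conservatively bisects on the maximum of the two so that both the bulk radius and the mean-induced outlier stay below one; the function names and the bisection strategy are illustrative.

```python
import numpy as np

def spectral_radius(A, W):
    """lambda_max = max |eig(A * W)|; the process is stable iff this is < 1."""
    return np.max(np.abs(np.linalg.eigvals(A * W)))

def max_stable_sparsity(alpha, beta, K, n_sigma=3.0, iters=60):
    """Bisect for the largest rho with max(sigma*sqrt(K), mu*K + n_sigma*sigma) <= 1."""
    def edge(rho):
        mu = rho * alpha / beta
        sigma = np.sqrt(rho * ((1.0 - rho) * alpha**2 + alpha)) / beta
        return max(sigma * np.sqrt(K), mu * K + n_sigma * sigma)
    if edge(1.0) <= 1.0:
        return 1.0            # even a dense graph is predicted stable
    lo, hi = 0.0, 1.0         # invariant: edge(lo) <= 1 < edge(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if edge(mid) <= 1.0 else (lo, mid)
    return lo
```

For Gamma(1, 5) weights and K = 64 this returns a ρ well below one, matching the qualitative message of Figure 2b: larger or more variable weights demand sparser graphs.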

We used the models above to predict the presence or absence of interactions. The results of this experiment are shown in the ROC curves of Figure 3. The network Hawkes model accurately identifies the sparse interactions, outperforming all other models.

Figure 3: Comparison of models on a link prediction test averaged across ten randomly sampled synthetic networks of 30 nodes each. The network Hawkes model with the correct Erdős–Rényi graph prior outperforms a standard Hawkes model, GLM, and simple thresholding of the cross-correlation matrix.

With the Hawkes process and the GLM we can evaluate the log likelihood of held-out test data. On this task, the network Hawkes outperforms the competitors for all networks. On average, the network Hawkes model achieves 2.2 ± 0.1 bits/spike improvement in predictive log likelihood over a homogeneous Poisson process. Figure 4 shows that on average the standard Hawkes and the GLM provide only 60% and 72%, respectively, of this predictive power. See the supplementary material for further analysis.

Figure 4: Comparison of predictive log likelihoods for the same set of networks as in Figure 3, compared to a baseline of a Poisson process with constant rate. Improvement in predictive likelihood over baseline is normalized by the number of events in the test data to obtain units of "bits per spike." The network Hawkes model outperforms the competitors in all sample networks.

6. Trades on the S&P 100

As an example of how Hawkes processes may discover interpretable latent structure in real-world data, we study the trades on the S&P 100 index collected at 1s intervals during the week of Sep. 28 through Oct. 2, 2009. Every time a stock price changes by 0.1% of its current price an event is logged on the stock's process, yielding a total of K = 100 processes and N = 182,037 events.

Trading volume varies substantially over the course of the day, with peaks at the opening and closing of the market. This daily variation is incorporated into the background rate via a Log Gaussian Cox Process (LGCP) with a periodic kernel (see supplementary material). We look for short-term interactions on top of this background rate with time scales of Δt_max = 60s. In Figure 5 we compare the predictive performance of independent LGCPs, a standard Hawkes process with LGCP background rates, and the network Hawkes model with LGCP background rates under two graph priors. The models are trained on four days of data and tested on the fifth.

Figure 5: Comparison of financial models on an event prediction task, relative to a homogeneous Poisson process baseline.

    Financial Model                  | Pred. log lkhd. (bits/spike)
    Indep. LGCP                      | 0.579 ± 0.006
    Std. Hawkes                      | 0.903 ± 0.003
    Net. Hawkes (Erdős–Rényi)        | 0.893 ± 0.003
    Net. Hawkes (Latent Distance)    | 0.879 ± 0.004

Figure 6: Top: A sample from the posterior distribution over embeddings of stocks from the six largest sectors of the S&P100 under a latent distance graph model with two latent dimensions. Scale bar: the characteristic length scale of the latent distance model. The latent embedding tends to embed stocks such that they are nearby to, and hence more likely to interact with, others in their sector. Bottom: Hinton diagram of the top 4 eigenvectors of A ⊙ W. Size indicates the magnitude of each stock's component in the eigenvector and colors denote sectors as in the top panel, with the addition of Materials (aqua), Utilities (orange), and Telecomm (gray). We show the eigenvectors corresponding to the four largest eigenvalues, λ_max = 0.74 (top row) to λ_4 = 0.34 (bottom row).
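The link-prediction ROC in Figure 3 can be computed directly from posterior samples of A. A minimal numpy sketch follows; the sampler itself is elided, and the function and array names are illustrative.

```python
import numpy as np

def link_prediction_roc(A_samples, A_true):
    """ROC curve for link prediction from posterior adjacency samples.

    A_samples: (S, K, K) binary adjacency samples from the posterior.
    A_true:    (K, K) ground-truth adjacency matrix.
    Returns (fpr, tpr) swept from the highest threshold to the lowest.
    """
    p_edge = A_samples.mean(axis=0).ravel()   # Monte Carlo estimate of P(A_kk' = 1)
    truth = A_true.ravel().astype(bool)
    thresholds = np.unique(np.concatenate(([0.0, 1.0], p_edge)))[::-1]
    fpr, tpr = [], []
    for t in thresholds:
        pred = p_edge >= t
        tpr.append(np.sum(pred & truth) / max(truth.sum(), 1))
        fpr.append(np.sum(pred & ~truth) / max((~truth).sum(), 1))
    return np.array(fpr), np.array(tpr)
```

Thresholding an inferred weight matrix, as done for the standard Hawkes baseline, works the same way with W in place of `p_edge`.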

Though the network Hawkes is slightly outperformed by the standard Hawkes, the difference is small relative to the performance improvement from considering interactions, and the inferred network parameters provide interpretable insight into the market structure.

In the latent distance model for A, each stock has a latent embedding x_k ∈ R² such that nearby stocks are more likely to interact, as described in Section 2.3. Figure 6 shows a sample from the posterior distribution over embeddings in R² for ρ = 0.2 and τ = 1. We have plotted stocks in the six largest sectors, as listed on Bloomberg.com. Some sectors, notably energy and financials, tend to cluster together, indicating an increased probability of interaction between stocks in the same sector. Other sectors, such as consumer goods, are broadly distributed, suggesting that these stocks are less influenced by others in their sector. For the consumer industry, which is driven by slowly varying factors like inventory, this may not be surprising.

The Hinton diagram in the bottom panel of Figure 6 shows the top 4 eigenvectors of the interaction network. All eigenvalues are less than 1, indicating that the system is stable. The top row corresponds to the first eigenvector (λ_max = 0.74). Apple (AAPL), J.P. Morgan (JPM), and Exxon Mobil (XOM) have notably large entries in the eigenvector, suggesting that their activity will spawn cascades of self-excitation.

7. Gangs of Chicago

In our final example, we study spatiotemporal patterns of gang-related homicide in Chicago. Sociologists have suggested that gang-related homicide is mediated by underlying social networks and occurs in mutually-exciting, retaliatory patterns (Papachristos, 2009). This is consistent with a spatiotemporal Hawkes process in which processes correspond to gang territories and homicides incite further homicides in rival territories.

We study gang-related homicides between 1980 and 1995 (Block et al., 2005). Homicides are labeled by the community in which they occurred. Over this time-frame there were N = 1637 gang-related homicides in the 77 communities of Chicago.

We evaluate our model with an event-prediction task, training on 1980-1993 and testing on 1994-1995. We use a Log Gaussian Cox Process (LGCP) temporal background rate in all model variations. Our baseline is a single process with a uniform spatial rate for the city. We test two process identity models: a) the "community" model, which considers each community a separate process, and b) the "cluster" model, which groups communities into processes. The number of clusters is chosen by cross-validation (see supplementary material). For each process identity model, we compare four graph models: a) independent LGCPs (empty), b) a standard Hawkes process with all possible interactions (complete), c) a network Hawkes model with a sparsity-inducing Erdős–Rényi graph prior, and d) a network Hawkes model with a latent distance model that prefers short-range interactions.

Figure 7: Inferred interactions among clusters of community areas in the city of Chicago. (a) Predictive log likelihood for "communities" and "clusters" process identity models and four graph models. Panels (b-d) present results for the model with the highest predictive log likelihood: an Erdős–Rényi graph with K = 4 clusters. (b) The weighted interaction network in units of induced homicides over the training period (1980-1993). (c) Inferred clustering of the 77 community areas. (d) The intensity for each cluster, broken down into the offset, the shared background rate, and the interactions (units of 10⁻³ homicides per day per square kilometer).
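Sampling from the distance-dependent graph prior of Section 2.3, used as option (d) above, is direct. A minimal numpy sketch, with all parameter values and names illustrative:

```python
import numpy as np

def sample_latent_distance_graph(X, rho, tau, rng):
    """Draw A_kk' ~ Bern(rho * exp(-||x_k - x_k'|| / tau)).

    X:   (K, D) latent locations, e.g., community-area centroids.
    rho: overall sparsity; tau: characteristic distance scale.
    Returns the sampled adjacency matrix and the edge probabilities.
    """
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    p = rho * np.exp(-dists / tau)
    A = (rng.random(p.shape) < p).astype(int)
    return A, p

# Nearby nodes connect with probability near rho; distant ones rarely do.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
A, p = sample_latent_distance_graph(X, rho=0.5, tau=1.0, rng=rng)
```

Placing such a prior over A simply replaces the Bernoulli probabilities in the collapsed update for the adjacency matrix; the rest of the sampler is unchanged.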
(empty), b) a standard Hawkes process with all possible in- Discovering Latent Network Structure in Point Process Data teractions (complete), c) a network Hawkes model with a We have adapted their latent variable formulation in our sparsity-inducing Erdos-Renyi˝ graph prior, and d) a net- fully-Bayesian inference algorithm and introduced a frame- work Hawkes model with a latent distance model that work for prior distributions over the latent network. prefers short-range interactions. Others have considered special cases of the model we have The community process identity model improves predic- proposed. Blundell et al. (2012) combine Hawkes pro- tive performance by accounting for higher rates in South cesses and the Infinite Relational Model (a specific ex- and West Chicago where gangs are deeply entrenched. Al- changeable graph model with an Aldous-Hoover represen- lowing for interactions between community areas, how- tation) to cluster processes and discover interactions. Cho ever, results in a decrease in predictive power due to over- et al. (2013) applied Hawkes processes to gang incidents in fitting (there is insufficient data to fit all 772 potential in- Los Angeles. They developed a spatial Gaussian mixture teractions). Interestingly, sparse graph priors do not help. model (GMM) for process identities, but did not explore They bias the model toward sparser but stronger interac- structured network priors. We experimented with this pro- tions which are not supported by the test data. These re- cess identity model but found that it suffers in predictive sults are shown in the “communities” group of Figure 7a. log likelihood tests (see supplementary material). Clustering the communities improves predictive perfor- Recently, Iwata et al. (2013) developed a stochastic EM mance for all graph models, as seen in the “clusters” group. 
algorithm for Hawkes processes, leveraging similar conju- Moreover, the clustered models benefit from the inclusion gacy properties, but without network priors. Zhou et al. of excitatory interactions, with the highest predictive log (2013) have developed a promising optimization-based ap- likelihoods coming from a four-cluster Erdos-Renyi˝ graph proach to discovering low-rank networks in Hawkes pro- model with interactions shown in Figure 7b. Distance- cesses, similar to some of the network models we explored. dependent graph priors do not improve predictive perfor- mance on this dataset, suggesting that either interactions do Perry & Wolfe (2013) derived a partial likelihood inference not occur over short distances, or that local rivalries are not algorithm for Hawkes processes with a similar emphasis substantial enough to be discovered in our dataset. More on structural patterns in the network of interactions. They data is necessary to conclusively say which. provide an estimator capable of discovering and other network effects. Our fully-Bayesian approach gener- Looking into the inferred clusters in Figure 7c and their alizes this method to capitalize on recent developments in rates in 7d, we can interpret the clusters as “safe suburbs” random network models (Lloyd et al., 2012). in gold, “buffer neighborhoods” in green, and “gang ter- ritories” in red and blue. Self-excitation in the blue clus- Finally, generalized linear models (GLMs) are widely used ter (Figure 7b) suggests that these regions are prone to in computational neuroscience (Paninski, 2004). GLMs al- bursts of activity, as one might expect during a turf-war. low for both excitatory and inhibitory interactions, but, as This interpretation is supported by reports of “a burst of we have shown, when the data consists of purely excitatory street-gang violence in 1990 and 1991” in West Englewood interactions, Hawkes processes outperform GLMs in link- (41.77◦N, −87.67◦W) (Block & Block, 1993). 
and event-prediction tests. Figure 7d also shows a significant increase in the homicide rate between 1989 and 1995, consistent with reports of es- 9. Conclusion calating gang warfare (Block & Block, 1993). In addition We developed a framework for discovering latent network to this long-term trend, homicide rates show a pronounced structure from spiking data. Our auxiliary variable formu- seasonal effect, peaking in the summer and tapering in the lation of the multivariate Hawkes process supported arbi- winter. A LGCP with a quadratic kernel point-wise added trary Aldous-Hoover graph priors, Log Gaussian Cox Pro- to a periodic kernel captures both effects. cess background rates, and models of unobserved process identities. Our parallel MCMC algorithm allowed us to 8. Related Work reason about uncertainty in the latent network in a fully- Bayesian manner. We leveraged results from random ma- Gomez-Rodriguez et al. (2010) introduced one of the ear- trix theory to analyze the conditions under which random liest algorithms for discovering latent networks from cas- network models will be stable, and our applications uncov- cades of events. They developed a highly scalable approx- ered interpretable latent networks in a variety of synthetic imate inference algorithm, but they did not explore the po- and real-world problems. tential of random network models or emphasize the point process nature of the data. Simma & Jordan (2010) studied Acknowledgements We thank Eyal Dechter and Leslie this problem from the context of Hawkes processes and de- Valiant for their many contributions. S.W.L. was supported veloped an expectation-maximization inference algorithm. by a NDSEG Fellowship. This work was partially funded by DARPA Young Faculty Award N66001-12-1-4219. Discovering Latent Network Structure in Point Process Data
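The stability criterion invoked above — every eigenvalue of the inferred interaction matrix lying inside the unit circle — is straightforward to check numerically. A minimal sketch in NumPy; the matrix `W` is an illustrative stand-in for inferred Hawkes weights, not values from our experiments:

```python
import numpy as np

# Stand-in interaction weight matrix for a 3-process Hawkes network.
# (Illustrative values only: in the model, W[i, j] would be the inferred
# expected number of events on process j directly caused by an event on
# process i.)
W = np.array([[0.3, 0.2, 0.0],
              [0.1, 0.4, 0.2],
              [0.0, 0.1, 0.3]])

# The process is stationary when the spectral radius of W is below 1:
# each event then spawns a subcritical branching cascade of offspring.
spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))
print(f"spectral radius = {spectral_radius:.3f}")
print("stable" if spectral_radius < 1.0 else "unstable")
```

Because each weight is an expected offspring count, a spectral radius below one means every cascade of excitation dies out in expectation.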
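The seasonal background model described above — an LGCP whose covariance is a quadratic kernel added point-wise to a periodic kernel — can be sketched as follows. The kernel forms and hyperparameters here are illustrative assumptions, not the settings used in our experiments:

```python
import numpy as np

def quadratic_kernel(t1, t2, sigma=0.2, c=1.0):
    """Degree-2 polynomial kernel: captures a smooth long-term trend."""
    return sigma**2 * (t1 * t2 + c) ** 2

def periodic_kernel(t1, t2, sigma=0.5, period=1.0, length=0.5):
    """Exp-sine-squared kernel: captures a seasonal oscillation."""
    return sigma**2 * np.exp(
        -2.0 * np.sin(np.pi * np.abs(t1 - t2) / period) ** 2 / length**2)

def background_kernel(t1, t2):
    # A point-wise sum of valid covariance kernels is itself valid.
    return quadratic_kernel(t1, t2) + periodic_kernel(t1, t2)

# Gram matrix on a grid of times (in years); jitter keeps it positive definite.
ts = np.linspace(0.0, 4.0, 50)
K = np.array([[background_kernel(s, t) for t in ts] for s in ts])
K += 1e-6 * np.eye(len(ts))

# A draw from GP(0, K), exponentiated, gives one sample background rate
# exhibiting both a long-term trend and a seasonal effect.
rng = np.random.default_rng(0)
rate = np.exp(rng.multivariate_normal(np.zeros(len(ts)), K))
```

Summing kernels lets the trend and seasonal components be specified separately and combined, which is why the two effects can be captured by a single LGCP background rate.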

References

Aldous, David J. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11(4):581–598, 1981.

Block, Carolyn R and Block, Richard. Street gang crime in Chicago. US Department of Justice, Office of Justice Programs, National Institute of Justice, 1993.

Block, Carolyn R, Block, Richard, and Authority, Illinois Criminal Justice Information. Homicides in Chicago, 1965–1995. ICPSR06399-v5. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], July 2005.

Blundell, Charles, Heller, Katherine, and Beck, Jeffrey. Modelling reciprocating relationships with Hawkes processes. Advances in Neural Information Processing Systems, 2012.

Cho, Yoon Sik, Galstyan, Aram, Brantingham, Jeff, and Tita, George. Latent point process models for spatial-temporal networks. arXiv:1302.2671, 2013.

Daley, Daryl J and Vere-Jones, David. An introduction to the theory of point processes. Springer-Verlag, New York, 1988.

Goldenberg, Anna, Zheng, Alice X, Fienberg, Stephen E, and Airoldi, Edoardo M. A survey of statistical network models. Foundations and Trends in Machine Learning, 2(2):129–233, 2010.

Gomez-Rodriguez, Manuel, Leskovec, Jure, and Krause, Andreas. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1019–1028. ACM, 2010.

Hawkes, Alan G. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1):83, 1971.

Hoff, Peter D. Modeling homophily and stochastic equivalence in symmetric relational data. Advances in Neural Information Processing Systems, 20:1–8, 2008.

Hoover, Douglas N. Relations on probability spaces and arrays of random variables. Technical report, Institute for Advanced Study, Princeton, 1979.

Iwata, Tomoharu, Shah, Amar, and Ghahramani, Zoubin. Discovering latent influence in online social activities via shared cascade Poisson processes. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 266–274. ACM, 2013.

Liben-Nowell, David and Kleinberg, Jon. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.

Lloyd, James Robert, Orbanz, Peter, Ghahramani, Zoubin, and Roy, Daniel M. Random function priors for exchangeable arrays with applications to graphs and relational data. Advances in Neural Information Processing Systems, 2012.

Mohamed, Shakir, Ghahramani, Zoubin, and Heller, Katherine A. Bayesian and L1 approaches for sparse unsupervised learning. In Proceedings of the 29th International Conference on Machine Learning, pp. 751–758, 2012.

Møller, Jesper, Syversveen, Anne Randi, and Waagepetersen, Rasmus Plenge. Log Gaussian Cox processes. Scandinavian Journal of Statistics, 25(3):451–482, 1998.

Murray, Iain, Adams, Ryan P., and MacKay, David J.C. Elliptical slice sampling. Journal of Machine Learning Research: Workshop and Conference Proceedings (AISTATS), 9:541–548, 2010.

Paninski, Liam. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4):243–262, January 2004.

Papachristos, Andrew V. Murder by structure: Dominance relations and the social structure of gang homicide. American Journal of Sociology, 115(1):74–128, 2009.

Perry, Patrick O and Wolfe, Patrick J. Point process modelling for directed interaction networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2013.

Silverstein, Jack W. The spectral radii and norms of large dimensional non-central random matrices. Stochastic Models, 10(3):525–532, 1994.

Simma, Aleksandr and Jordan, Michael I. Modeling events with cascades of Poisson processes. Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), 2010.

Zhou, Ke, Zha, Hongyuan, and Song, Le. Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 16, 2013.