Markov-Modulated Marked Poisson Processes for Check-In Data [Draft]
Total Page:16
File Type:pdf, Size:1020Kb
Markov-modulated marked Poisson processes for check-in data [draft] Jiangwei Pan [email protected] Duke University Vinayak Rao [email protected] Purdue University Pankaj K. Agarwal [email protected] Duke University Alan E. Gelfand [email protected] Duke University Abstract Figure 1. Visualizing We develop continuous-time probabilistic mod- the check-in loca- els to study trajectory data consisting of times tions (in the US) and locations of user ‘check-ins’. We model of 500 random these as realizations of a marked point pro- users from the cess, with intensity and mark-distribution mod- FourSquare dataset ulated by a latent Markov jump process (MJP). (Gao et al., 2012). We extend this Markov-modulated marked Pois- son process to include user-heterogeneity by as- be noisy, due to measurement error or human perturbation. signing users vectors of ‘preferred locations’. In many cases, the trajectory sample points are sparse, with Our model extends latent Dirichlet allocation by large gaps between bursts of observations. Furthermore, dropping the bag-of-words assumption, and op- the rate of observations can vary along the trajectory, and erating in continuous time. We show how an ap- can depend on the the user and the location of the trajectory propriate choice of priors allows efficient poste- rior inference. Our experiments demonstrate the Our focus in this paper is on user check-in data; in partic- usefulness of our approach by comparing with ular, we consider a dataset of FourSquare check-ins (Gao various baselines on a variety of tasks. et al., 2012) (see Figure1). This smartphone-based social media tool allows individuals to publicly register visits to interesting places: a check-in is a record of such a visit, 1. Introduction consisting typically of a timestamp, a location (latitude and longitude), and additional information such as the name of With the advancement of sensing technologies, trajectory a place, a category (e.g. restaurant or museum) or a user- data are being collected in scenarios such as traffic monitor- rating. Unlike traditional trajectory data, check-in observa- ing (Wang et al., 2014; Westgate et al., 2013), surveillance tions are not passively collected (say, every hour); instead, systems (Wang et al., 2011), and mobile and social network the times of observations are informative about user behav- check-in data (Zhang et al., 2014; Gao et al., 2012). A tra- ior. Consequently, observation times are irregular and typ- jectory is a path taken by an object over a time period, with ically alternate between bursty and sparse episodes. data consisting of observations at a discrete, often irregular, sequence of times. In the simplest case, the observations Given a collection of trajectories from the same spatial do- consist of a time and location (latitude and longitude); ad- main, it is of interest to learn shared underlying patterns. ditionally, they can include activity information (for check- For check-in data, such information can include activity in data) as well as user-information. The observations may hotspots (e.g. big cities, tourist attractions, restaurants), transition rates across different hotspots and nature of activ- Proceedings of the 33 rd International Conference on Machine ity in different regions. Such information can assist recom- Learning, New York, NY, USA, 2016. JMLR: W&CP volume mendation systems, managing traffic, identifying unusual 48. Copyright 2016 by the author(s). activity patterns, understanding factors that lead to the suc- Markov-modulated marked Poisson processes for check-in data [draft] cess or failure of a business, as well as just to summarize a Chicago being the latent state or location, and the restaurant large collection of activity data. coordinates giving the check-in location. Generatively, at the start of the observation window, the individual starts In this work we apply techniques from continuous-time in a random state, stays there for a random period of time, Markov jump processes (MJP) to model such data. Unlike checks in a random number of times while in that state, and their more common discrete-time counterparts, continuous- stochastically transitions to a new state. When checking in, time models are ideal for bursty, asynchronous obser- a random location is attached to the individual. vations over long intervals of time. While MJPs have found applications in modeling point data (e.g. event atten- A first modeling approach might be a state space model dance (Ihler et al., 2007), genetics (Fearnhead & Sherlock, (Fuller, 1996). Discrete-time models of dynamical systems 2006)), there is little work for spatio-temporal phenomena are widespread in machine learning and statistics: starting like check-in data. One of our contributions is to apply and with the simple hidden Markov model (Rabiner, 1989), ex- extend existing work to construct a class of marked Markov tensions like dynamic Bayesian networks (Murphy, 2002), modulated Poisson process models. Another contribution infinite-state HMMs (Beal et al., 2002; Teh et al., 2006), is to introduce heterogeneity across point processes, via factorial Markov models (Ghahramani & Jordan, 1997) user-specific ‘preference’ vectors. For this, we also intro- have been proposed and applied. Such models are however duce a convenient conjugate prior that allows a simple ex- unsuitable for the check-in data we consider, which con- tension of existing Markov chain Monte Carlo (MCMC) tains bursty, asynchronous observations over long intervals methods. We apply our ideas to the FourSquare check-in of time. As we demonstrate in our experiments, accurate data, demonstrating their usefulness for data-exploration, prediction requires artificial discretization at high temporal prediction, and anomaly detection. resolution, incurring expensive computation. A more natural choice models check-in times as continu- Related work: There has been work on learning shared ous. A probabilistic mechanism well-suited to the forego- patterns of trajectories using model-based and optimization ing behavior is the Markov jump process (MJP) (C¸inlar, approaches. In computer vision, Wang et al. (Wang et al., 1975). MJPs are the simplest processes for creating 2011) extended the hierarchical Dirichlet Process for tra- continuous-time extensions of a discrete-time Markov jectories from videos and other surveillance devices. They chain, and a realization is a piecewise constant function treat trajectories as documents of exchangable words, and transitioning between N states. Changes in state can corre- can learn topics which represent semantic regions of differ- spond to changes in location or activity. Such a continuous- ent activities. This ignores sequential information in the time model allows us to effectively model the fact that typi- trajectories. Zhang et al. (Zhang et al., 2014) approach cal trajectories make only few transitions over an extended the problem of extracting semantic sequential patterns from period (e.g. a year), while accurately representing the ac- user check-in trajectories via a two-step optimization algo- tual transition times (upto a resolution of e.g. minutes). rithm that uses category information of each check-in loca- Formally, an MJP is parametrized by an initial distribution tion; see also (Giannotti et al., 2007; Zheng et al., 2009) on over states π, and an N × N rate matrix A. Entering state discovering sequential patterns from trajectory data. Other i triggers a race to jump to a new state j 6= i, with tran- shared patterns that have been studied include flock (Laube sitions from i to j having rate Aij. Effectively, for each & Imfeld, 2002), convoy (Jeung et al., 2008), and swarm candidate new state, a waiting time is sampled from a rate (Li et al., 2010). Unlike the Bayesian approach that we Aij exponential distribution, and the system moves into the take, these methods do not provide uncertainty in estimates, state with the minimum waiting time. The process then re- and are less capable of dealing with noise or missing data. peats. Note that the off-diagonal elements Aij are transi- Other Bayesian work includes Westgate et al. (Westgate tion rates and are thus nonnegative numbers. The negative et al., 2013), who use a Bayesian model to estimate ambu- of the diagonal element Aii gives the rate of leaving state lance travel times. Finally, Yuan et al. (Yuan et al., 2010) P i (i.e. Aii = − i6=j Aij) so that the rows of A sum to 0. proposes a framework called T-drive to suggest better driv- An equivalent way of sampling an MJP trajectory is to suc- ing directions based on taxi trajectory data. cessively first sample the time of the next transition from an exponential distribution with rate jAiij, and then sample 2. Markov jump processes a new state j 6= i with probability proportional to Aij. Our data is at the individual level, consisting of snapshots of individual trajectories over time intervals. Observations Modeling check-in data with MJPs: The data we con- are a sequence of check-in times, each with an associated sider consists of a collection of partially-observed trajecto- location. We model check-ins as local excursions about a ries over an interval [0;Tend], each trajectory correspond- latent location: e.g., one might visit a restaurant in Chicago, ing to a different ‘user’ (we will say users or trajectories Markov-modulated marked Poisson processes for check-in data [draft] Figure 2. 3-state mMMPP with Figure 3. Models without (left) A A rates (λ1; λ2; λ3). Crosses are and with (right) preference vec- check-ins, having 2-d marks tors. For the latter, a high pref- from a 2-d normal with state- erence for state A boosts in- dependent mean and covariance. coming rates to A. A similar B C B C but smaller effect exists for C depending on the context). We envision each trajectory as a piecewise-constant realization of an MJP.