Neural Network Based Continuous Conditional Random Field for Fine

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19) Neural Network based Continuous Conditional Random Field for Fine-grained Crime Prediction Fei Yi1;4 , Zhiwen Yu1 , Fuzhen Zhuang2;3 and Bin Guo1 1Northwestern Polytechnical University, Xi’an, Shaanxi, China 2Key Lab of IIP of CAS, Institute of Computing Technology, CAS, Beijing, China 3University of Chinese Academy of Sciences, Beijing, China 4Baidu Inc, The Business Intelligence Lab, Baidu Research [email protected], [email protected], [email protected], [email protected] Abstract tions among instances. And we could leverage this advan- tage in modeling region relationship for crime prediction by Crime prediction has always been a crucial issue regarding each region in a city as an instance. However, tra- for public safety, and recent works have shown ditional CCRF would face problems like complex gradient the effectiveness of taking spatial correlation, such derivation and capacity for large-scale dataset when pairwise as region similarity or interaction, for fine-grained interactions across instances are all considered [Ristovski et crime modeling. In our work, we seek to reveal al., 2013]. To solve these problems, we take advantages from the relationship across regions for crime predic- back-propagation algorithm in model training by introducing tion using Continuous Conditional Random Field neural network components into CCRF model. Specifically, (CCRF). However, conventional CCRF would be- mean-field theory [Koller and Friedman, 2009] has proved come impractical when facing a dense graph con- that the inference process of CCRF model could be trans- sidering all relationship between regions. To deal ferred into an iterative procedure, and we further reformulate with it, in this paper, we propose a Neural Network it into neural network layers and propose a Neural Network based CCRF (NN-CCRF) model that formulates based CCRF model (NN-CCRF) in our work. CCRF into an end-to-end neural network framework, which could reduce the complexity in model Inspired by recent works [Zheng et al., 2015; Xu et al., training and improve the overall performance. We 2017] on incorporating CRF with neural network for discrete integrate CCRF with NN by introducing a Long labeling problems, we proceed to alleviate the limit of in- Short-Term Memory (LSTM) component to learn tegrating CCRF with neural network for structured regres- the non-linear mapping from inputs to outputs of sion problems. In details, traditional CCRF model consists each region, and a modified Stacked Denoising Au- of two parts, the unary potential and the pairwise potential. toEncoder (SDAE) component for pairwise interac- Commonly, unary potential models relationship between in- tions modeling between regions. Experiments con- puts and outputs of each instance, and pairwise potential con- ducted on two different real-world datasets demon- straints outputs of each instance according to a predefined strate the superiority of our proposed model over correlation matrix calculated by some kernel functions (e.g., the state-of-the-art methods. Gaussian kernel, RBF kernel). Existing works only trans- form pairwise potential into neural network framework, and adopt predefined kernel functions to calculate the correlation 1 Introduction matrix for pairwise interaction modeling. In our work, the Works on smart city applications related to mobile comput- proposed NN-CCRF model not only formulates pairwise po- ing [Wang et al., 2017; Liu et al., 2019; Yu et al., 2015], tential into neural network, but also reformulates the unary social economics [Yu et al., 2016; Liu et al., 2016; Fu et potential into a Long Short-Term Memory (LSTM) neural al., 2014] and public safety [Yu et al., 2018; Yi et al., 2018] network. Furthermore, we propose to learn the correlation have inspired the implementation of advanced technologies matrix between instances using Stacked Denoising AutoEn- in crime prevention [Wang et al., 2013; Du et al., 2016]. coder (SDAE) rather than predefined kernel functions, which Specifically, the consideration of spatial correlation among is more effective to understand the spatial correlation between different regions has been proved effective, where [Wang et regions in a data-driven manner. And our work mainly makes al., 2016] studied the taxi trajectory-based region relationship the following contributions: and [Zhao and Tang, 2017] modeled a distance-based region similarity for spatial-temporal crime prediction. However, the • We propose a Neural Network based Continuous Condi- complexity of different type of spatial correlation between re- tional Random Field (NN-CCRF) model for fine-grained gions would eventually make the fine-grained crime predic- crime prediction, which applies Long Short-Term Mem- tion a difficult problem. ory (LSTM) as the unary potential and leverages Stacked As discussed in [Qin et al., 2009], CCRF is a powerful Denoising AutoEncoder (SDAE) to learn spatial corre- model that is typically designed to model effects of interac- lations across regions. 4157 Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19) • We formulate the inference process of NN-CCRF model into a sequential neural network, which helps us to train the whole model in an end-to-end manner leveraging the advantages of back-propagation algorithm. • We conduct experiments on two real-world crime records collected from Chicago and New York respec- tively. Considering different types of criminal inci- dences and disjointed grids in the city as fine-grained regions, our NN-CCRF model outperforms the state-of- the-art approaches with respect to crime prediction and Figure 1: The proposed NN-CCRF model. ranking accuracy. By achieving the above contributions, our work is of great ∗ importance for researchers to understand the mechanism of where R(x; h ) is the preliminary estimations on y without fusing/transforming traditional CCRF model into neural net- considering the pairwise spatial correlations, σ(•) is the Sig- works for crime prediction. mod function, and W∗ and U∗ are weight matrices, and b∗ donate the bias vectors within LSTM components that are ∗ 2 NN-CCRF Model for Crime Prediction specified as follows in iteratively updating ht : ∗ 2.1 Problem Formulation ft =σ(Wf xt + Uf ht−1 + bf ); ∗ In our work, we propose to learn a neural network based non- it =σ(Wixt + Uiht−1 + bi); linear mapping model M : I!O from the input historical ∗ ot =σ(Woxt + Uoh + bo); crime number I to the output future crime number O. More t−1 (3) q ∗ gt = tanh(Wgxt + Ugh + bg); formally, let Q = f(Hi; Fi)gi=1 be a training set of q pairs, t−1 where Hi 2 I represents the historical crime numbers and ct =ft ct−1 + it gt; Fi 2 O donates the corresponding future crime numbers. To h∗ =o tanh(c ): deal with fine-grained crime prediction, we divide city land- t t t scape into many small regions. Therefore, Hi is a N × T Besides, the pairwise potential function provides a spatial- matrix and Fi forms as a N × 1 vector, where N donates dependent smoothing term that encourage correlated regions the number of disjointed regions in a city and T represents to have similar crime numbers as defined: the length of historical time steps (e.g., T days, weeks, or ∗ ∗ 2 months). That is, our model aims to predict future crime p(yi; yj; K ) = −Ki;j (yi − yj) ; (4) numbers leveraging historical T time steps of data records. where K∗ represents the spatial-dependent correlation be- Further, considering a N × N correlation matrix that can po- i;j tween yi and yj, which constraints and smooths the prelim- tential influence the crime distribution across all regions, our ∗ ∗ inary estimations (e.g., R(xi; h ) and R(xj; h ) ) to have model is also required to capture spatial relationship for crime better overall results. And one critical problem is how to de- prediction. ∗ duce Ki;j for each pair of yi and yj. In our work, we leverage 2.2 Neural Network Based CCRF Model a modified SADE [Vincent et al., 2010] framework to learn the spatial correlation matrix K∗ used in pairwise potential The proposed Neural Network based Continuous Conditional function. Random Field (NN-CCRF) model is illustrated in Figure 1, Specifically, SDAE is a feedforward neural network that which takes advantages from both CCRF model and NN al- matches the corrupted input and output (ground-truth) by en- gorithms. Specifically, we demonstrate a conventional CCRF coding and decoding raw input data in a sequential manner. model in the middle, where each gray node tagged with x i In our work, suppose we have corrupted input ^y (inferred by represents the historical crime numbers with T time steps and unary potential function) and ground-truth y, SDAE first en- white node in y donates the corresponding future crime num- i codes ^y into image z and then decodes z to produce y0 as bers of i-th region. The unary feature function and correla- predictions on y as follows: tion matrix learning components are proposed based on neural network algorithms to solve CCRF model, and we define z =σ(Wz ^y + bz); our NN-CCRF model as: (5) y0 =σ(W z + b ); 1 X y y P (yjx) = expf (y; x; h∗) + (y ; y ; K∗)g; Z(x) u p i j and the objective function of conventional SDAE is shown as: i;j (1) 0 2 2 ∗ min jjy − y jjF + λjjW jjF ; (6) where u(y; x; h ) is the unary potential function, and we fW g;fbg adopt LSTM [Hochreiter and Schmidhuber, 1997; Gers et al., where W donate the weight matrices mapping ^y to y, which 1999] with hidden states h∗ to represent the mapping from can be regarded that W achieves the goal of measuring cor- input x to output y as follows: relation between yi and yj.

Load more