Understanding Urban Dynamics via Context-aware Tensor Factorization with Neighboring Regularization

Jingyuan Wang, Junjie Wu, Ze Wang, Fei Gao, and Zhang Xiong

Abstract—Recent years have witnessed the worldwide emergence of mega-metropolises with incredibly huge populations. Understanding residents' mobility patterns, or urban dynamics, thus becomes crucial for building modern smart cities. In this paper, we propose a Neighbor-Regularized and context-aware Non-negative Tensor Factorization model (NR-cNTF) to discover interpretable urban dynamics from urban heterogeneous data. Different from many existing studies concerned with prediction tasks via tensor completion, NR-cNTF focuses on gaining urban managerial insights from spatial, temporal, and spatio-temporal patterns. This is enabled by high-quality Tucker factorizations regularized by both POI-based urban contexts and geographically neighboring relations. NR-cNTF is also capable of unveiling long-term evolutions of urban dynamics via a pipeline initialization approach. We apply NR-cNTF to a real-life data set containing rich taxi GPS trajectories and POI records of Beijing. The results indicate: 1) NR-cNTF accurately captures four kinds of city rhythms and seventeen spatial communities; 2) the rapid development of Beijing, epitomized by the CBD area, indeed intensifies the job-housing imbalance; 3) the southern areas with recent government investments have shown a healthier development tendency. Finally, NR-cNTF is compared with some baselines on traffic prediction, which further justifies the importance of urban context awareness and neighboring regularization.

Index Terms—Urban Dynamics, Tensor Factorizations, Spatio-Temporal Pattern, GPS Trajectory

1 INTRODUCTION

As reported by the World Bank¹, at the end of 2016 more than 53% of the world's population, i.e., about 3.7 billion people, lived in cities, and about 36 mega-metropolises worldwide had a population of more than 10 million. Huge urban populations bring great challenges such as traffic jams, educational/medical resource scarcity, environmental pollution, etc. Understanding the behavioral patterns of residents in a city, or urban dynamics for short, therefore becomes an important and urgent demand for urban planning and public policy making.

Fortunately, the widely adopted mobile crowd sensing (MCS) technologies [1], such as GPS, mobile phones, and location-based services, give us an unprecedented opportunity to access enormous and perhaps unbounded human mobility data, which combined with urban infrastructure data offer a "rich ore" for the discovery of urban dynamics.

In general, mining urban dynamics from MCS data has three requirements. The first one is to model multi-source heterogeneous data, which consist of mobility records of residents such as the origins and destinations, the travel time, the purposes, and the surroundings hidden in different data sources such as GPS trajectories, urban contexts, and city maps. The second requirement is to capture long-term evolutions, which is critically important for urban planners to understand the evolving rules of cities so as to make proper urban planning. The last one is to find urban dynamics with good interpretability, since an obscure urban dynamic is useless for decision making in real-world application scenarios. Despite the rich literature on applying matrix/tensor factorizations to model urban heterogeneous data, most studies aim to generate patterns that improve the predictive accuracy of traffic volumes [2], [3], [4], but leave pattern explanation to luck. Only recently have a few works begun to take the understanding of urban dynamics as the primary research task; representative ones include the earlier rNTD model using Tucker factorizations [5], the city spectrum modeling using CP factorizations [6], and others that use single-source data [7], [8], [9] or discover urban functional zones only [10], [11]. These excellent works, however, cannot meet all the above-mentioned requirements simultaneously.

In this paper, we propose a Neighbor-Regularized context-aware Non-negative Tensor Factorization model (NR-cNTF) to discover explainable and evolving urban dynamics from multi-source heterogeneous urban data. In the NR-cNTF model, we introduce the concepts of data space and pattern space and describe the relations between urban data and urban dynamics. The Tucker factorization is then introduced with the POI-based (Point-Of-Interest) urban contexts to factorize the ODT (Origin-Destination-Time) tensor into spatial, temporal, and spatio-temporal patterns of good interpretability. Moreover, a neighboring regularization that incorporates geographically neighboring relations is introduced into our model to further improve the explainability of spatial patterns. Finally, a simple yet effective pipeline initialization approach is designed to capture the long-term evolutions of urban dynamics.

We conduct extensive experiments on a real-life data set that contains the GPS trajectories of over 20,000 taxis and over 400,000 POI records of Beijing from 2008 to 2015. The first scenario of the experiments is to verify the ability of NR-cNTF in disclosing true urban dynamics and to obtain managerial insights via NR-cNTF. The results indicate that: 1) NR-cNTF accurately captures four kinds of mobility rhythms and seventeen spatial communities of Beijing; 2) the rapid development of Beijing in the CBD area is indeed at the expense of a severer job-housing imbalance and therefore is unsustainable in the long run; 3) the southern areas of Beijing are experiencing unprecedented growth with the recent government investments, and most importantly they have shown a healthier development tendency. The second scenario of the experiments is to test the prediction power of NR-cNTF, which is compared with some baselines on traffic prediction. The results demonstrate the superiority of NR-cNTF in tensor completion, which further justifies the importance of adopting urban contexts and neighboring regularization in NR-cNTF.

• J. Wang, Z. Wang and Z. Xiong are with the School of Computer Science and Engineering, Beihang University, Beijing 100191, China. E-mail: {jywang,ze.w,xiongz}@buaa.edu.cn.
• J. Wu (corresponding author) is with the School of Economics and Management, Beihang University, Beijing 100191, China. E-mail: [email protected].
• F. Gao is with Microsoft Research Asia, Beijing, China.
1. http://data.worldbank.org/
arXiv:1905.00702v2 [cs.LG] 10 May 2019

2 PROBLEM FORMULATION

In this section, we formulate urban dynamics discovery as a context-aware tensor factorization problem. Table 1 lists the math variables to be used, which are divided into two categories, i.e., data-space variables and pattern-space variables, according to their observability. Variables in the data space are observable from real-world human mobility, while variables in the pattern space are latent but crucial for understanding urban dynamics.

TABLE 1
Notation Definition

Space          Variable                                                   Definition
Data space     $\mathcal{R}$                                              the data tensor
               $r_{xyz}$                                                  the $(x, y, z)$ element of $\mathcal{R}$
               $\mathbf{W}$                                               the urban context matrix
               $w_{pq}$                                                   the $(p, q)$ element of $\mathbf{W}$
Pattern space  $\mathcal{C}$                                              the pattern tensor
               $c_{ijk}$                                                  the $(i, j, k)$ element of $\mathcal{C}$
               $\mathbf{O}, \mathbf{D}, \mathbf{T}$                       the pattern projection matrices
               $\mathbf{o}_x, \mathbf{d}_x, \mathbf{t}_x$                 the $x$-th row vectors of $\mathbf{O}, \mathbf{D}, \mathbf{T}$
               $\mathbf{o}_{:i}, \mathbf{d}_{:i}, \mathbf{t}_{:i}$        the $i$-th column vectors of $\mathbf{O}, \mathbf{D}, \mathbf{T}$
               $o_{xi}, d_{xi}, t_{xi}$                                   the $(x, i)$ elements of $\mathbf{O}, \mathbf{D}, \mathbf{T}$

Throughout the paper, we use lowercase symbols such as $a, b$ to denote scalars, bold lowercase symbols such as $\mathbf{a}, \mathbf{b}$ for vectors, bold uppercase symbols such as $\mathbf{A}, \mathbf{B}$ for matrices, and calligraphic symbols such as $\mathcal{A}, \mathcal{B}$ for tensors.

Data-space variables: The primary variable in data space is a data tensor. Assume there are $M$ urban zones in a city, and $N$ time slices in a day. Let $r_{xyz}$ denote the resident travel intensity from an origin zone $x \in \{1, \cdots, M\}$ to a destination zone $y \in \{1, \cdots, M\}$ within a time slice $z \in \{1, \cdots, N\}$. A third-order tensor $\mathcal{R} \in \mathbb{R}^{M \times M \times N}$ is then defined by having $r_{xyz}$ as the $(x, y, z)$ element. Intuitively, $\mathcal{R}$ contains the original information about urban dynamics, which can be obtained from urban vehicle and resident trajectory data. Another variable in data space is an urban-context similarity matrix $\mathbf{W} \in \mathbb{R}^{M \times M}$. The $(p, q)$ element of $\mathbf{W}$, i.e., $w_{pq}$, is a coefficient that describes the similarity between urban zones $p$ and $q$ using, e.g., points of interest (POI) data.

Pattern-space variables: The variables in pattern space include a core tensor and three pattern projection matrices. Assume there are $I$ origin spatial patterns (OSP), $J$ destination spatial patterns (DSP), and $K$ temporal patterns (TP) hidden inside the data tensor $\mathcal{R}$. We define $\mathbf{O} \in \mathbb{R}^{M \times I}$ as a spatial projection matrix that projects $M$ origin zones into $I$ OSPs. Similarly, $\mathbf{D} \in \mathbb{R}^{M \times J}$ is defined as another spatial projection matrix that projects $M$ destination zones into $J$ DSPs. The matrix $\mathbf{T} \in \mathbb{R}^{N \times K}$ is a temporal projection matrix that projects $N$ time slices to $K$ TPs. The elements of $\mathbf{O}$, $\mathbf{D}$ and $\mathbf{T}$ are denoted as $o_{xi}$, $d_{yj}$ and $t_{zk}$, respectively, indicating the projection intensities from the urban zones $x$, $y$ and time slice $z$ to OSP $i$, DSP $j$ and TP $k$, $1 \le i \le I$, $1 \le j \le J$, $1 \le k \le K$. We define a third-order tensor $\mathcal{C}$ as a core tensor that describes the dynamics of resident travels among temporal and spatial patterns. The $(i, j, k)$ element of $\mathcal{C}$, i.e., $c_{ijk}$, denotes the intensity of resident travels from OSP $i$ to DSP $j$ within TP $k$.

2.1 Construction of Data Tensor

We here explain how to construct the data tensor $\mathcal{R}$ using real-life GPS trajectory data of Beijing taxis. To this end, we first segment the Beijing city map into $M$ urban zones. In the literature, quite a few methods, including grid-based, morphology-based, road-network-based, and administrative-boundary-based methods [12], [13], can fulfill this task. Here we adopt a Traffic Analysis Zones (TAZ) map provided by Beijing Municipal Committee of Transport² to segment Beijing into $M = 651$ zones. Finally, since resident behaviors in city life are often cyclical every day, we divide one day into $N = 24$ time slices (one hour per slice). The above procedure determines the three modes of $\mathcal{R}$.

We then compute the element values of $\mathcal{R}$. Note that the taxi GPS data are often organized as a set of quintuples in the form $\langle vid, time, longitude, latitude, state \rangle$, where $vid$ is the unique ID of a taxi, $(longitude, latitude)$ is the location of the taxi, and $state$ indicates whether the taxi is carrying any passengers at time $time$. We first obtain all taxi-based passenger travels by removing the records with the "no passengers" state. Then an origin-destination-time (ODT) record is constructed for each travel by picking up the first and last records of the travel and then extracting the origin and destination coordinates and the travel starting time. We collect the travel ODT records of all workdays in a month as a data set. The monthly total amount of travels that depart from TAZ $x$ in time slice $z$ and arrive at TAZ $y$ is recorded as $\tilde{r}_{xyz}$. As reported in [14], the travel volumes between different urban zones usually follow a long-tail distribution. Therefore, we adopt the log function to rescale $\tilde{r}_{xyz}$ as

$$ r_{xyz} = \log\left(1 + \tilde{r}_{xyz}\right), \tag{1} $$

which is finally used as the $(x, y, z)$ element of $\mathcal{R}$.
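The construction of the data tensor in Sect. 2.1 can be sketched in a few lines of NumPy. This is only a minimal illustration under assumptions: the trips are taken to be already reduced to 0-based integer triples (origin zone, destination zone, time slice), and the function name `build_odt_tensor` and the toy trips are hypothetical rather than part of the original pipeline.

```python
import numpy as np

def build_odt_tensor(trips, num_zones, num_slices=24):
    """Count trips per (origin, destination, time slice) and apply Eq. (1).

    `trips` is assumed to be an iterable of 0-based integer triples
    (origin_zone, destination_zone, time_slice); mapping GPS points to
    zones is assumed to have been done beforehand.
    """
    trips = np.asarray(list(trips), dtype=int).reshape(-1, 3)
    counts = np.zeros((num_zones, num_zones, num_slices))
    # np.add.at accumulates correctly when the same index triple repeats.
    np.add.at(counts, (trips[:, 0], trips[:, 1], trips[:, 2]), 1.0)
    return np.log1p(counts)  # r_xyz = log(1 + raw count)

# Hypothetical toy data: two trips 0 -> 1 in slice 8, one trip 2 -> 0 in slice 17.
toy_trips = [(0, 1, 8), (0, 1, 8), (2, 0, 17)]
R = build_odt_tensor(toy_trips, num_zones=3)
print(R.shape)  # (3, 3, 24)
```

The log rescaling keeps empty cells at exactly zero, so the tensor stays non-negative and sparse where no travel was observed.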
2. http://www.bjjtw.gov.cn/

2.2 Definition of Pattern Tensor

Variables in pattern space include $\mathcal{C}$, $\mathbf{O}$, $\mathbf{D}$, and $\mathbf{T}$, where $\mathcal{C}$ is the core tensor that models the dynamic relations among spatio-temporal patterns in the pattern space, and $\mathbf{O}$, $\mathbf{D}$ and $\mathbf{T}$ are the matrices that project the data tensor $\mathcal{R}$ into the core tensor $\mathcal{C}$. To better understand this, we give formal definitions of the spatial and temporal patterns as follows.

Definition 1 (Spatial Pattern): A spatial pattern is a vector containing the membership score of each urban zone to this pattern. Assume there are $I$ spatial patterns and $M$ urban zones. The $i$th spatial pattern is denoted as a vector $\mathbf{v}_{:i} = (v_{1i}, \ldots, v_{Mi})^\top$, where $v_{mi}$ is the membership score of the $m$th zone to the $i$th spatial pattern. The spatial projection matrix $\mathbf{V}$ that projects $M$ urban zones to $I$ spatial patterns is then defined as $\mathbf{V} = [\mathbf{v}_{:1}, \ldots, \mathbf{v}_{:I}]$. □

The $x$th row vector of $\mathbf{V}$, denoted as $\mathbf{v}_x$, is a vector that depicts the membership scores of urban zone $x$ to $I$ different spatial patterns. We assign $x$ to spatial pattern $i$ if $i \in \arg\max_{1 \le j \le I} v_{xj}$. In this way, we can cluster all urban zones into the $I$ spatial patterns. This implies that a spatial pattern is essentially a spatial community consisting of urban zones that function similarly in urban dynamics. For example, most residents in a residential community leave in the morning and return in the evening. In contrast, for a business community, people arrive in the morning and leave in the evening. Spatial patterns can be further divided into origin spatial patterns (OSP) and destination spatial patterns (DSP). The projection matrix $\mathbf{V}$ is denoted as $\mathbf{O}$ for OSPs and $\mathbf{D}$ for DSPs for differentiation. While $\mathbf{O}$ and $\mathbf{D}$ share the same $M$ urban zones, they might have different numbers of spatial patterns.

Definition 2 (Temporal Pattern): A temporal pattern is a vector containing the membership score of each time slice within a day to this pattern. Assume there are $K$ temporal patterns and $N$ time slices in a day. The $k$th temporal pattern is denoted as a vector $\mathbf{t}_{:k} = (t_{1k}, \ldots, t_{Nk})^\top$, where $t_{nk}$ is the membership score of the $n$th time slice to the $k$th temporal pattern. The temporal projection matrix $\mathbf{T}$ that projects $N$ time slices into $K$ temporal patterns is then defined as $\mathbf{T} = [\mathbf{t}_{:1}, \ldots, \mathbf{t}_{:K}]$. □

In essence, a temporal pattern describes a temporal rhythm of urban dynamics, which might correspond to an event that recurs every day, e.g., the morning peak and evening peak in a city. Accordingly, the vector $\mathbf{t}_{:k}$ indicates the dynamic intensity of the rhythm $k$ within a day.

Next, we define a pattern tensor to describe the interrelationships among spatio-temporal patterns.

Definition 3 (Pattern Tensor): A tensor $\mathcal{C} \in \mathbb{R}^{I \times J \times K}$ is a third-order pattern tensor, if its $(i, j, k)$ element $c_{ijk}$ indicates the intensity of resident travels from OSP $i$ to DSP $j$ in TP $k$, $1 \le i \le I$, $1 \le j \le J$, $1 \le k \le K$. □

Human behaviors in city life usually exhibit synchronism, which can be described by urban dynamic patterns in $\mathcal{C}$. For example, intuitively, residents living in a residential community commute to business regions synchronously in every morning peak of workdays. So an element $c_{ijk}$ has a high value when the origin spatial pattern $i$ corresponds to a residence community, the destination spatial pattern $j$ corresponds to a business community, and the temporal pattern $k$ corresponds to a morning-peak rhythm.

[Fig. 1. Model framework of cNTF. (a) Non-negative Tensor Factorization. (b) Contexts Awareness.]

2.3 Definition of Urban Context

Travel behaviors of residents not only relate to urban spatial and temporal patterns but also relate closely to the so-called urban context [11], [15]. Urban context refers to the surroundings inside an urban zone that can affect the travel behaviors of that zone. One typical type of urban context is the so-called points of interest (POI), including residential buildings, office buildings, shopping malls, etc. We have the following definition.

Definition 4 (Urban-Context Similarity Matrix): A matrix $\mathbf{W} \in \mathbb{R}^{M \times M}$ is called an urban-context similarity matrix, whose $(p, q)$ element $w_{pq}$ is a coefficient that measures the POI context similarity between zones $p$ and $q$, $1 \le p, q \le M$. □

In general, $\mathbf{W}$ is a nonnegative and symmetric matrix, which could be used to validate the effectiveness of the spatial patterns found purely from trajectory data. For example, it is intuitive that the travel patterns of urban zones with a mass of office buildings should be very similar, but differ sharply from those of zones filled with residential buildings.

2.4 Problem Definition

We here formulate the urban dynamics discovery problem as a tensor factorization problem. The model framework is given in Fig. 1, where the ODT data tensor $\mathcal{R}$, pattern tensor $\mathcal{C}$, and projection matrices $\mathbf{O}$, $\mathbf{D}$, and $\mathbf{T}$ have the following relationship:

$$ \mathcal{R} = \mathcal{C} \times_o \mathbf{O} \times_d \mathbf{D} \times_t \mathbf{T} + \mathcal{E}, \tag{2} $$

where $\mathcal{E} \in \mathbb{R}^{M \times M \times N}$ is a random error tensor, and $\times_n$ denotes the tensor $n$-mode product. Eq. (2) implies that the resident travel dynamics hidden inside data tensor $\mathcal{R}$ can be well explained by the latent dynamic patterns given by pattern tensor $\mathcal{C}$. The matrices $\mathbf{O}$, $\mathbf{D}$, and $\mathbf{T}$ express the projection relations between $\mathcal{R}$ and $\mathcal{C}$.

Note that while $\mathcal{R}$ is observable from resident travel data, the pattern tensor $\mathcal{C}$ as well as the projection matrices $\mathbf{O}$, $\mathbf{D}$ and $\mathbf{T}$ are unknown variables. Hence, our task is:

• To infer $\mathcal{C}$, $\mathbf{O}$, $\mathbf{D}$ and $\mathbf{T}$ from $\mathcal{R}$;
• To understand urban dynamics using $\mathcal{C}$, $\mathbf{O}$, $\mathbf{D}$, $\mathbf{T}$.

The urban-context similarity matrix $\mathbf{W}$ offers additional information to tensor factorization. Recall the row vector $\mathbf{o}_x$ of the projection matrix $\mathbf{O}$, which contains the membership scores of urban zone $x$ to all the OSPs. It is intuitive that similar urban zones should exhibit similar spatial patterns. Hence, we can measure the similarity of zones $x$ and $y$ by simply having $\mathbf{o}_x \mathbf{o}_y^\top$. Analogously, we can also measure the similarity of zones $x$ and $y$ by employing the information of DSPs in $\mathbf{D}$, i.e., $\mathbf{d}_x \mathbf{d}_y^\top$. Since $\mathbf{W}$ evaluates the similarity between $x$ and $y$ as $w_{xy}$ according to the urban context, we finally have the following relationships between $\mathbf{W}$ and projection matrices $\mathbf{O}$ and $\mathbf{D}$:

$$ \mathbf{W} = \mathbf{O}\mathbf{O}^\top + \mathbf{E}_O, \quad \text{and} \quad \mathbf{W} = \mathbf{D}\mathbf{D}^\top + \mathbf{E}_D, \tag{3} $$

where $\mathbf{E}_O$ and $\mathbf{E}_D$ are random error matrices. Note that in Eq. (3), $\mathbf{W}$ is an observable variable and $\mathbf{O}$ and $\mathbf{D}$ are latent ones. In other words, we can use urban context to fine-tune OSPs and DSPs in $\mathbf{O}$ and $\mathbf{D}$, respectively.

In summary, Eq. (2) and Eq. (3) together define a context-aware Non-negative Tensor Factorization (cNTF) problem. Our task is to infer urban dynamics given cNTF.

2.5 Extension to Long-Term Evolution

Long-term evolution is an important characteristic of urban dynamics, which refers to the evolution of urban spatial, temporal and spatio-temporal patterns over time. For example, temporal rhythms of resident travels in a city might change with the developments of public transport, economics, migration, etc.

We use tensor sequences to describe the evolution of urban dynamics in both data and pattern spaces. In the data space, we define $\mathcal{R}|_{l=1}^{L} = \{\mathcal{R}_1, \ldots, \mathcal{R}_L\}$ as a data tensor sequence of length $L$, where $\mathcal{R}_l$ is the data tensor of the $l$-th year. Suppose we factorize $\mathcal{R}_l$ into $\mathbf{O}_l$, $\mathbf{D}_l$, $\mathbf{T}_l$ and $\mathcal{C}_l$ according to Eq. (2) and Eq. (3); then we have the pattern tensor sequence $\mathcal{C}|_{l=1}^{L} = \{\mathcal{C}_1, \ldots, \mathcal{C}_L\}$, and the corresponding projection matrix sequences $\mathbf{O}|_{l=1}^{L}$, $\mathbf{D}|_{l=1}^{L}$ and $\mathbf{T}|_{l=1}^{L}$, respectively.

The problem is that, for any two subsequent years $l$ and $l+1$, the patterns inferred from $\mathcal{R}_l$ might not be comparable to those from $\mathcal{R}_{l+1}$, for they are inferred separately to optimize the objectives in Eq. (2) and Eq. (3). Therefore, another task of this study is to infer the long-term evolution of urban dynamics given a data tensor sequence.

3 MODEL

In this section, we reformulate the cNTF problem from a probabilistic perspective, which results in the exact objective function for urban dynamics discovery.

3.1 Probabilistic Non-negative Tensor Factorization

We assume the random error of observation $\mathcal{E}$ follows a Gaussian distribution $\mathcal{N}(0, \sigma_R^2)$; then the conditional distribution over the observed entries in $\mathcal{R}$ is defined as

$$ P(\mathcal{R} \mid \mathcal{C}, \mathbf{O}, \mathbf{D}, \mathbf{T}, \sigma_R^2) = \prod_{x=1}^{M} \prod_{y=1}^{M} \prod_{z=1}^{N} \mathcal{N}\left(r_{xyz} \mid \mathcal{C} \times_o \mathbf{o}_x \times_d \mathbf{d}_y \times_t \mathbf{t}_z, \sigma_R^2\right). \tag{4} $$

In order to obtain more evident patterns, we should introduce sparse priors to the variables in pattern space. As a result, we adopt zero-mean Laplace priors for the projection matrices:

$$ P(\mathbf{O} \mid \sigma_O) = \prod_{x=1}^{M} \mathcal{L}(\mathbf{o}_x \mid 0, \sigma_O \mathbf{I}_I), \quad P(\mathbf{D} \mid \sigma_D) = \prod_{y=1}^{M} \mathcal{L}(\mathbf{d}_y \mid 0, \sigma_D \mathbf{I}_J), \quad P(\mathbf{T} \mid \sigma_T) = \prod_{z=1}^{N} \mathcal{L}(\mathbf{t}_z \mid 0, \sigma_T \mathbf{I}_K), \tag{5} $$

and assume zero-mean Laplace priors for the pattern tensor:

$$ P(\mathcal{C} \mid \sigma_C) = \prod_{i=1}^{I} \prod_{j=1}^{J} \prod_{k=1}^{K} \mathcal{L}(c_{ijk} \mid 0, \sigma_C). \tag{6} $$

Then the posterior distribution of the pattern-space variables is given by

$$ P(\mathcal{C}, \mathbf{O}, \mathbf{D}, \mathbf{T} \mid \mathcal{R}, \sigma_R^2, \sigma_C, \sigma_O, \sigma_D, \sigma_T) = \frac{P(\mathcal{R} \mid \mathcal{C}, \mathbf{O}, \mathbf{D}, \mathbf{T}, \sigma_R^2)\, P(\mathcal{C} \mid \sigma_C)\, P(\mathbf{O} \mid \sigma_O)\, P(\mathbf{D} \mid \sigma_D)\, P(\mathbf{T} \mid \sigma_T)}{P(\mathcal{R} \mid \sigma_R^2)}, \tag{7} $$

and the log posterior distribution is then calculated by

$$ \ln P(\mathcal{C}, \mathbf{O}, \mathbf{D}, \mathbf{T} \mid \mathcal{R}, \sigma_R^2, \sigma_C, \sigma_O, \sigma_D, \sigma_T) \propto -\frac{1}{2\sigma_R^2} \sum_{xyz} \left(r_{xyz} - \mathcal{C} \times_o \mathbf{o}_x \times_d \mathbf{d}_y \times_t \mathbf{t}_z\right)^2 - \frac{1}{\sigma_O} \sum_{x} \|\mathbf{o}_x\|_1 - \frac{1}{\sigma_D} \sum_{y} \|\mathbf{d}_y\|_1 - \frac{1}{\sigma_T} \sum_{z} \|\mathbf{t}_z\|_1 - \frac{1}{\sigma_C} \sum_{ijk} |c_{ijk}|. \tag{8} $$

Therefore, obtaining the Maximum A Posteriori (MAP) estimation of $\mathbf{O}$, $\mathbf{D}$, $\mathbf{T}$ and $\mathcal{C}$ is equivalent to minimizing the objective function

$$ \tilde{J} = \frac{1}{2\sigma_R^2} \left\|\mathcal{R} - \mathcal{C} \times_o \mathbf{O} \times_d \mathbf{D} \times_t \mathbf{T}\right\|_F^2 + \frac{1}{\sigma_O} \|\mathbf{O}\|_1 + \frac{1}{\sigma_D} \|\mathbf{D}\|_1 + \frac{1}{\sigma_T} \|\mathbf{T}\|_1 + \frac{1}{\sigma_C} \|\mathcal{C}\|_1, \tag{9} $$

where $\|\cdot\|_F$ is the Frobenius norm and $\|\cdot\|_1$ is the L1-norm.

TABLE 2
Information of POI categories

ID  POI category               ID  POI category
1   food & beverage service    8   education and culture
2   hotel                      9   business building
3   scenic spot                10  residence
4   finance & insurance        11  living service
5   corporate business         12  sports & entertainments
6   shopping service           13  medical care
7   transportation facilities  14  government agencies

3.2 Modeling Urban Contexts

We here introduce urban contextual factors into the probabilistic non-negative tensor factorization model. We use a Beijing POI dataset, with the categories given in Table 2.
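To make Eqs. (2) and (9) concrete, the sketch below evaluates the Tucker reconstruction and the MAP objective of Sect. 3.1 with NumPy. It is only an illustration under assumptions: the function names are hypothetical, all scale parameters default to 1, and the toy factors are made up for the example.

```python
import numpy as np

def tucker_reconstruct(C, O, D, T):
    """Noise-free part of Eq. (2): r_xyz = sum_ijk c_ijk * o_xi * d_yj * t_zk."""
    return np.einsum("ijk,xi,yj,zk->xyz", C, O, D, T)

def map_objective(R, C, O, D, T, s_R=1.0, s_O=1.0, s_D=1.0, s_T=1.0, s_C=1.0):
    """Eq. (9): scaled squared error plus L1 penalties (s_* stand for the sigmas)."""
    residual = R - tucker_reconstruct(C, O, D, T)
    fit = np.sum(residual ** 2) / (2.0 * s_R ** 2)
    sparsity = (np.abs(O).sum() / s_O + np.abs(D).sum() / s_D
                + np.abs(T).sum() / s_T + np.abs(C).sum() / s_C)
    return fit + sparsity

# Hypothetical toy factors: M = 2 zones, N = 1 time slice, I = J = K = 1.
C = np.ones((1, 1, 1))
O = np.array([[1.0], [2.0]])
D = np.array([[1.0], [3.0]])
T = np.array([[1.0]])
R = tucker_reconstruct(C, O, D, T)      # shape (2, 2, 1)
value = map_objective(R, C, O, D, T)    # the fit term is zero for this R
```

Writing the three mode products as a single `einsum` keeps the index roles of Eq. (2) visible: `x`, `y`, `z` index the data space and `i`, `j`, `k` index the pattern space.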

[Fig. 2. Validation of urban context correlations. Scatter plot of POI numbers (vertical axis) against travel volumes (horizontal axis, $\times 10^4$) for urban zones, with a linear fit.]

3.2.1 Urban Contextual Factors

Fig. 2 shows a clear positive correlation between POI quantity and the resident travel volume (including inflow and outflow) for all urban zones of Beijing. Moreover, urban zones in the same community have similar categories of POIs (see Section III of Supplementary Materials³ for the details). Therefore, we use the quantity and categories of POIs in an urban zone to describe urban contextual factors.

Suppose altogether we have $H$ POI categories, and denote $n_{ph}$ as the number of POIs in category $h$ for urban zone $p$. The fraction of the $h$-th category POI in the zone $p$ is defined as

$$ c_{ph} = \frac{n_{ph}}{\sum_{p'=1}^{P} n_{p'h}}. \tag{10} $$

The fraction of all categories of POI in the zone $p$ is then defined as

$$ n_p = \frac{\sum_{h=1}^{H} n_{ph}}{\sum_{p'=1}^{P} \sum_{h=1}^{H} n_{p'h}}. \tag{11} $$

We use the vector $\mathbf{u}_p = (c_{p1}, \ldots, c_{ph}, \ldots, c_{pH}, n_p)^\top$ to describe the POI context of the zone $p$. Given the POI context vectors, the similarity of two urban zones $p$ and $q$ can be computed as

$$ w_{pq} = \frac{\mathbf{u}_p \cdot \mathbf{u}_q}{\|\mathbf{u}_p\| \cdot \|\mathbf{u}_q\|}, \tag{12} $$

which is the $(p, q)$ element of $\mathbf{W}$.

3.2.2 Incorporating Urban Contextual Factors

Context-aware regularization is an effective tool to fuse contextual information into tensor and matrix factorizations [16], [17]. We introduce urban contextual factors as context-aware regularization using a maximum a posteriori method. Assume the elements of $\mathbf{E}_O$ and $\mathbf{E}_D$ in Eq. (3) follow zero-mean Gaussian distributions; then we have

$$ P(\mathbf{W} \mid \mathbf{O}, \sigma_{WO}^2) = \prod_{p=1}^{M} \prod_{q=1}^{M} \mathcal{N}\left(w_{pq} \mid \mathbf{o}_p \mathbf{o}_q^\top, \sigma_{WO}^2\right), \tag{13} $$

and

$$ P(\mathbf{W} \mid \mathbf{D}, \sigma_{WD}^2) = \prod_{p=1}^{M} \prod_{q=1}^{M} \mathcal{N}\left(w_{pq} \mid \mathbf{d}_p \mathbf{d}_q^\top, \sigma_{WD}^2\right). \tag{14} $$

Let $\Omega = \{\sigma_R^2, \sigma_{WO}^2, \sigma_{WD}^2, \sigma_O, \sigma_D, \sigma_T, \sigma_C\}$. Given the data tensor $\mathcal{R}$ and urban context matrix $\mathbf{W}$, the posterior distribution of $\mathbf{O}$, $\mathbf{D}$, $\mathbf{T}$ and $\mathcal{C}$ is given by

$$ P(\mathbf{O}, \mathbf{D}, \mathbf{T}, \mathcal{C} \mid \mathcal{R}, \mathbf{W}, \Omega) \propto P(\mathcal{R} \mid \mathbf{O}, \mathbf{D}, \mathbf{T}, \mathcal{C}, \Omega)\, P(\mathbf{W} \mid \mathbf{O}, \Omega)\, P(\mathbf{W} \mid \mathbf{D}, \Omega)\, P(\mathbf{O} \mid 0, \Omega)\, P(\mathbf{D} \mid 0, \Omega)\, P(\mathbf{T} \mid 0, \Omega)\, P(\mathcal{C} \mid 0, \Omega), \tag{15} $$

and the log posterior distribution is

$$ \ln P(\mathbf{O}, \mathbf{D}, \mathbf{T}, \mathcal{C} \mid \mathcal{R}, \mathbf{W}, \Omega) \propto -\frac{1}{2\sigma_R^2} \sum_{xyz} \left(r_{xyz} - \mathcal{C} \times_o \mathbf{o}_x \times_d \mathbf{d}_y \times_t \mathbf{t}_z\right)^2 - \frac{1}{2\sigma_{WO}^2} \sum_{pq} \left(w_{pq} - \mathbf{o}_p \mathbf{o}_q^\top\right)^2 - \frac{1}{2\sigma_{WD}^2} \sum_{pq} \left(w_{pq} - \mathbf{d}_p \mathbf{d}_q^\top\right)^2 - \frac{1}{\sigma_O} \sum_{x} \|\mathbf{o}_x\|_1 - \frac{1}{\sigma_D} \sum_{y} \|\mathbf{d}_y\|_1 - \frac{1}{\sigma_T} \sum_{z} \|\mathbf{t}_z\|_1 - \frac{1}{\sigma_C} \sum_{ijk} |c_{ijk}|. \tag{16} $$

Maximizing the posterior distribution is equivalent to minimizing the sum-of-squared-errors function with hybrid regularization terms, i.e.,

$$ \min_{\mathbf{O}, \mathbf{D}, \mathbf{T}, \mathcal{C}} J = \left\|\mathcal{R} - \mathcal{C} \times_o \mathbf{O} \times_d \mathbf{D} \times_t \mathbf{T}\right\|_F^2 + \alpha \left\|\mathbf{W} - \mathbf{O}\mathbf{O}^\top\right\|_F^2 + \beta \left\|\mathbf{W} - \mathbf{D}\mathbf{D}^\top\right\|_F^2 + \gamma \|\mathbf{O}\|_1 + \delta \|\mathbf{D}\|_1 + \epsilon \|\mathbf{T}\|_1 + \varepsilon \|\mathcal{C}\|_1 \quad \text{s.t. } \mathbf{O} \ge 0,\ \mathbf{D} \ge 0,\ \mathbf{T} \ge 0,\ \mathcal{C} \ge 0, \tag{17} $$

where $\alpha = \sigma_R^2 / \sigma_{WO}^2$, $\beta = \sigma_R^2 / \sigma_{WD}^2$, $\gamma = 2\sigma_R^2 / \sigma_O$, $\delta = 2\sigma_R^2 / \sigma_D$, $\epsilon = 2\sigma_R^2 / \sigma_T$, and $\varepsilon = 2\sigma_R^2 / \sigma_C$. Note that we introduce non-negativity constraints on the variables so as to avoid perplexing negative travel volumes. Eq. (17) indeed formulates the cNTF problem defined in Sect. 2.4.

3.3 Neighboring Regularization

Let $SP_i = \{x : v_{xi} = \max_{1 \le j \le I} v_{xj}\}$ denote the $i$th urban community corresponding to the spatial pattern $\mathbf{v}_{:i}$ in the spatial projection matrix $\mathbf{V}$. For the urban zones in $SP_i$, it is natural to expect that: i) they are geographically neighboring to each other, and ii) their resident mobility behaviors are similar to one another and different from those in other communities. These, however, have not been considered in the above-mentioned cNTF model.

To address these, we here introduce the so-called Neighboring Regularization (NR), which is inspired by the conditional random field based image segmentation method in [18]. Specifically, we model urban community discovery as an image segmentation problem; that is, the community labels of urban zones are modeled as a Markov random field $G(\mathcal{V}, \mathcal{E})$, where $\nu_x \in \mathcal{V}$ is the community label of urban zone $x$, and $e_{xy} \in \mathcal{E}$ is an undirected dependency between urban zones $x$ and $y$. For the latent $\nu_x$, we have an observable matrix $\mathbf{R}_{x::}$ for the origin order of $\mathcal{R}$, or $\mathbf{R}_{:y:}$ for the destination order.

Without loss of generality, in what follows, we use the origin order as an example to introduce the neighboring regularization. Suppose $G(\mathcal{V}, \mathcal{E})$ and $\mathbf{R}_{x::}$, $x \in \{1, \ldots, M\}$, satisfy the conditional random field hypothesis.

3. The companion file with the supplementary materials of this paper.

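Similar to the classical image segmentation task in [18], the optimization objective for community discovery is to maximize a potential function as

$$ \zeta = \sum_{x=1}^{M} \psi_x^u(\nu_x) + \sum_{x=1}^{M} \sum_{y \in \mathcal{M}_x} \psi_{xy}^p(\nu_x, \nu_y), \tag{18} $$

where $\mathcal{M}_x$ is the set of neighbor zones of zone $x$. $\psi_x^u(\nu_x)$ is the unary potential of the CRF in zone $x$ when the community label of $x$ is set to $\nu_x$, which is defined as

$$ \psi_x^u(\nu_x) = -\log \frac{o_{x\nu_x}}{\sum_{i=1}^{I} o_{xi}}. \tag{19} $$

$\psi_{xy}^p(\nu_x, \nu_y)$ is the pairwise potential between zones $x$ and $y$ when the community labels of $x$ and $y$ are set to $\nu_x$ and $\nu_y$, respectively; that is,

$$ \psi_{xy}^p(\nu_x, \nu_y) = \begin{cases} 0, & \text{if } \nu_x = \nu_y, \\ g(x, y), & \text{otherwise.} \end{cases} \tag{20} $$

Note that $g(x, y)$ is a function of the difference between $\mathbf{R}_{x::}$ and $\mathbf{R}_{y::}$, which is defined as a Gaussian kernel as follows:

$$ g(x, y) = \exp\left(-\frac{\|\mathbf{R}_{x::} - \mathbf{R}_{y::}\|_F^2}{2\sigma_{NR}^2}\right), \tag{21} $$

where $\sigma_{NR}$ is a parameter suggested in [18]. This actually introduces a penalty for the zones that are adjacent and have similar resident mobility behaviors but are assigned to different communities.

In a nutshell, Eq. (18) introduces the spatial community discovery problem, which could be regarded as a neighboring regularization to cNTF, and thus forms the so-called NR-cNTF model.

3.4 Modeling Long-Term Evolution

We here introduce a simple yet effective way to model the long-term evolution of spatio-temporal patterns. Let $\mathcal{R}_l$ and $\mathbf{W}_l$ denote the data tensor and POI similarity matrix in the $l$-th year, and $\mathcal{G}_l = \{\mathcal{C}_l, \mathbf{O}_l, \mathbf{D}_l, \mathbf{T}_l\}$ denote the set of latent patterns learnt from the $l$-th year's data, $l = 1, 2, \cdots, L$.

As described in Sect. 2.5, factorizing every $\mathcal{R}_l$ independently for $\mathcal{G}|_{l=1}^{L}$ is often inappropriate because it generates incomparable patterns in successive years. The Dynamic Tensor Analysis (DTA) scheme suggested in [19], [20] cannot fulfill our task either, because it uses $\mathcal{R}_l$ as well as historical data tensors to obtain a "hybrid" $\mathcal{G}_l$, which is not the genuine $\mathcal{G}_l$ we aim to analyze in practice. We therefore adopt a Pipeline Initialization based Tensor Sequence Analysis (PI-TSA) method. In PI-TSA, the factorization results in $\mathcal{G}_l$ are expressed as

$$ \mathcal{G}_l = f_{\text{NR-cNTF}}\left(\mathcal{R}_l, \mathbf{W}_l, \mathcal{G}_{l-1}\right), \tag{22} $$

where $f_{\text{NR-cNTF}}$ denotes the optimization algorithm for NR-cNTF. Fig. 3 further illustrates PI-TSA via a flow chart. As can be seen, the key of PI-TSA is to set the initial values of the $l$-th year's optimization as the outputs of the $(l-1)$-th step (i.e., $\mathcal{G}_{l-1}$). In this way, the patterns in the $(l-1)$-th year can be "inherited" by the patterns in the $l$-th year, and only the information of $\mathcal{R}_l$ and $\mathbf{W}_l$ is used for pattern discovery in the $l$-th year.

[Fig. 3. Pipeline initialization for tensor sequence analysis. Flow chart: $\mathcal{G}_0 \to \mathcal{G}_1 \to \mathcal{G}_2 \to \cdots \to \mathcal{G}_l \to \cdots \to \mathcal{G}_L$, where each $\mathcal{G}_l$ is computed from $(\mathcal{R}_l, \mathbf{W}_l)$ and initialized (init) by $\mathcal{G}_{l-1}$.]

4 INFERENCE

4.1 Basic Optimization

We adopt the Block Coordinate Descent-Proximal Gradient (BCD-PG) algorithm [21], [22] to solve the cNTF problem in Eq. (17). While this function is not jointly convex with respect to $\mathcal{C}$, $\mathbf{O}$, $\mathbf{D}$, and $\mathbf{T}$, it is block multiconvex, i.e., convex in each one when the other three are fixed. Therefore, as shown in Algorithm 1, we adopt a Block Coordinate Descent (BCD) procedure, which starts from an initialization $\mathcal{G}^{(0)}$, and then iteratively updates $\mathcal{G}^{(s)}$, $s = 1, 2, \cdots$, by

$$ \mathcal{C}^{(s)} = \arg\min_{\mathcal{C}} J\left(\mathcal{C}, \mathbf{O}^{(s-1)}, \mathbf{D}^{(s-1)}, \mathbf{T}^{(s-1)}\right) + \varepsilon \|\mathcal{C}\|_1, \tag{23a} $$
$$ \mathbf{O}^{(s)} = \arg\min_{\mathbf{O}} J\left(\mathcal{C}^{(s)}, \mathbf{O}, \mathbf{D}^{(s-1)}, \mathbf{T}^{(s-1)}\right) + \gamma \|\mathbf{O}\|_1, \tag{23b} $$
$$ \mathbf{D}^{(s)} = \arg\min_{\mathbf{D}} J\left(\mathcal{C}^{(s)}, \mathbf{O}^{(s)}, \mathbf{D}, \mathbf{T}^{(s-1)}\right) + \delta \|\mathbf{D}\|_1, \tag{23c} $$
$$ \mathbf{T}^{(s)} = \arg\min_{\mathbf{T}} J\left(\mathcal{C}^{(s)}, \mathbf{O}^{(s)}, \mathbf{D}^{(s)}, \mathbf{T}\right) + \epsilon \|\mathbf{T}\|_1. \tag{23d} $$

Here the L1 terms of Eq. (17) are written out explicitly, so $J$ is to be read as the remaining smooth part of the objective.

Algorithm 1 Block Coordinate Descent Procedure
Require: Data sets $\{\mathcal{R}, \mathbf{W}\}$, parameters $\{\gamma, \delta, \epsilon, \varepsilon\}$
Initialization: $\left(\mathcal{C}^{(0)}, \mathbf{O}^{(0)}, \mathbf{D}^{(0)}, \mathbf{T}^{(0)}\right)$
for $s = 1, 2, \ldots$ do
    Update $\mathcal{C}^{(s)}$ by solving the problem (23a).
    Update $\mathbf{O}^{(s)}$ by solving the problem (23b).
    Update $\mathbf{D}^{(s)}$ by solving the problem (23c).
    Update $\mathbf{T}^{(s)}$ by solving the problem (23d).
    Apply Algorithm 2 to $\mathbf{O}^{(s)}$.
    Apply Algorithm 2 to $\mathbf{D}^{(s)}$.
    if convergence then
        return $\left(\mathcal{C}^{(s)}, \mathbf{O}^{(s)}, \mathbf{D}^{(s)}, \mathbf{T}^{(s)}\right)$.
    end if
end for

Let $(g_1, g_2, g_3, g_4)$ denote $(\mathcal{C}, \mathbf{O}, \mathbf{D}, \mathbf{T})$ for concision. Using a Proximal Gradient (PG) method, the algorithm updates the $i$-th variable of $\mathcal{G}$ in the $s$-th round as

$$ g_i^{(s)} = \arg\min_{g_i \ge 0} \left\langle \frac{\partial J\left(g_{<i}^{(s)}, \tilde{g}_i^{(s)}, g_{>i}^{(s-1)}\right)}{\partial g_i},\ g_i - \tilde{g}_i^{(s)} \right\rangle + \frac{\tau_i}{2} \left\|g_i - \tilde{g}_i^{(s)}\right\|_F^2 + \lambda_i \|g_i\|_1 = \max\left\{0,\ \tilde{g}_i^{(s)} - \frac{1}{\tau_i} \frac{\partial J\left(g_{<i}^{(s)}, \tilde{g}_i^{(s)}, g_{>i}^{(s-1)}\right)}{\partial g_i} - \frac{\lambda_i}{\tau_i}\right\}, \tag{24} $$

where $\langle \cdot, \cdot \rangle$ denotes the inner product, $g_{<i}^{(s)}$ denotes $\{g_1^{(s)}, \ldots, g_{i-1}^{(s)}\}$, and $g_{>i}^{(s-1)}$ denotes $\{g_{i+1}^{(s-1)}, \ldots, g_4^{(s-1)}\}$. The variable $\tilde{g}_i^{(s)}$ is a linearly extrapolated point as follows:

$$ \tilde{g}_i^{(s)} = g_i^{(s-1)} + \omega_i^{(s)} \left(g_i^{(s-1)} - g_i^{(s-2)}\right), \tag{25} $$

where $\omega_i^{(s)}$ is an extrapolation weight set according to [22].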
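The update in Eqs. (24) and (25) has a simple closed form, which the sketch below illustrates on a toy problem. This is not the authors' implementation: the function names, the fixed extrapolation weight, and the quadratic test objective are assumptions made for illustration, whereas the paper sets the weight according to [22].

```python
import numpy as np

def extrapolate(g_prev, g_prev2, omega):
    """Eq. (25): linear extrapolation from the two most recent iterates."""
    return g_prev + omega * (g_prev - g_prev2)

def prox_step(g_tilde, grad, tau, lam):
    """Closed form of Eq. (24): non-negative soft-thresholding of a gradient step."""
    return np.maximum(0.0, g_tilde - grad / tau - lam / tau)

def solve_toy(b, lam, omega=0.5, num_iters=50):
    """Minimize 0.5 * ||g - b||^2 + lam * ||g||_1 subject to g >= 0.

    The smooth part has gradient (g - b), whose Lipschitz constant is tau = 1.
    """
    g_prev2 = g_prev = np.zeros_like(b)
    for _ in range(num_iters):
        g_tilde = extrapolate(g_prev, g_prev2, omega)
        g_prev2, g_prev = g_prev, prox_step(g_tilde, g_tilde - b, tau=1.0, lam=lam)
    return g_prev

b = np.array([2.0, 0.3, -1.0])
g_star = solve_toy(b, lam=0.5)   # known minimizer: max(0, b - lam)
```

In the full algorithm the same two functions would be applied block by block, with the gradient of the smooth part of $J$ taken with respect to the block being updated while the other three blocks are held fixed.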
The parameter $\tau_i$ in (24) is a Lipschitz constant of $\partial J(g_i) / \partial g_i$ with respect to $g_i$, namely,

$$ \left\|\frac{\partial J(g_{i1})}{\partial g_{i1}} - \frac{\partial J(g_{i2})}{\partial g_{i2}}\right\|_F \le \tau_i \left\|g_{i1} - g_{i2}\right\|_F, \quad \forall\, g_{i1}, g_{i2}, \tag{26} $$

and $\lambda_i$ is the regularization parameter of $g_i$. Specifically, the gradients of $J$ with respect to each component are calculated as

$$ \begin{aligned} \frac{\partial J}{\partial \mathcal{C}} &= 2\left[\mathcal{C} \times_o \left(\mathbf{O}^\top \mathbf{O}\right) \times_d \left(\mathbf{D}^\top \mathbf{D}\right) \times_t \left(\mathbf{T}^\top \mathbf{T}\right) - \mathcal{R} \times_o \mathbf{O}^\top \times_d \mathbf{D}^\top \times_t \mathbf{T}^\top\right], \\ \frac{\partial J}{\partial \mathbf{O}} &= 2\left[\mathbf{O} \left(\mathcal{C} \times_d \left(\mathbf{D}^\top \mathbf{D}\right) \times_t \left(\mathbf{T}^\top \mathbf{T}\right)\right)_{(o)} \mathbf{C}_{(o)}^\top - \left(\mathcal{R} \times_d \mathbf{D}^\top \times_t \mathbf{T}^\top\right)_{(o)} \mathbf{C}_{(o)}^\top - \alpha \left(\mathbf{W} - \mathbf{O}\mathbf{O}^\top\right) \mathbf{O}\right], \\ \frac{\partial J}{\partial \mathbf{D}} &= 2\left[\mathbf{D} \left(\mathcal{C} \times_o \left(\mathbf{O}^\top \mathbf{O}\right) \times_t \left(\mathbf{T}^\top \mathbf{T}\right)\right)_{(d)} \mathbf{C}_{(d)}^\top - \left(\mathcal{R} \times_o \mathbf{O}^\top \times_t \mathbf{T}^\top\right)_{(d)} \mathbf{C}_{(d)}^\top - \beta \left(\mathbf{W} - \mathbf{D}\mathbf{D}^\top\right) \mathbf{D}\right], \\ \frac{\partial J}{\partial \mathbf{T}} &= 2\left[\mathbf{T} \left(\mathcal{C} \times_o \left(\mathbf{O}^\top \mathbf{O}\right) \times_d \left(\mathbf{D}^\top \mathbf{D}\right)\right)_{(t)} \mathbf{C}_{(t)}^\top - \left(\mathcal{R} \times_o \mathbf{O}^\top \times_d \mathbf{D}^\top\right)_{(t)} \mathbf{C}_{(t)}^\top\right], \end{aligned} \tag{27} $$

where $\mathbf{X}_{(x)}$ denotes the mode-$x$ matricization of tensor $\mathcal{X}$.

4.2 Neighboring Regularization Optimization

Algorithm 2 shows the optimization process of neighboring regularization. Without loss of generality, we still take the origin order for illustration. In each cNTF optimization iteration, Algorithm 2 regularizes the projection matrix $\mathbf{O}$ through the following steps:

Algorithm 2 Neighboring Regularization Optimization
Unary Potentials: $o'_{xi} \leftarrow \dfrac{o_{xi}}{\sum_{j=1}^{I} o_{xj}}$, $\psi_x^u(i) \leftarrow -\log o'_{xi}$.
Pairwise Potentials: $Q_{xi} \leftarrow \sum_{j \ne i} \sum_{y \in \mathcal{M}_x} \psi_{xy}^p(i, j)\, o'_{yj}$.
Update the Projection Matrix.

1) Calculate Unary Potentials: We first normalize $\mathbf{O}$ as

$$ o'_{xi} = \frac{o_{xi}}{\sum_{j=1}^{I} o_{xj}}. \tag{28} $$

Then the unary potential of $o_{xi}$ is $\psi_x^u(i) = -\log o'_{xi}$.

2) Calculate Pairwise Potentials: We then calculate the average pairwise potential of $\nu_x = i$ to $\nu_y \in \{j \mid j \ne i\}$ as

$$ Q_{xi} = \sum_{j \ne i} \sum_{y \in \mathcal{M}_x} P_{yj} \cdot \psi_{xy}^p(i, j), \tag{29} $$

where $\mathcal{M}_x$ is the set of neighbor zones for zone $x$. $P_{yj}$ in Eq. (29) is the probability of $\nu_y = j$, which is defined as

$$ P_{yj} = \frac{\exp\left(-\psi_y^u(j)\right)}{Z_y} = o'_{yj}, \tag{30} $$

where $Z_y$ denotes the partition function.

3) Update the Projection Matrix: Finally, we calculate the total potential of $o_{xi}$ as

$$ \zeta_{xi} = \psi_x^u(i) + Q_{xi}. \tag{31} $$

The regularized element is then defined as

$$ \tilde{o}_{xi} = \exp(-\zeta_{xi}) \cdot \sum_{j=1}^{I} o_{xj}. \tag{32} $$

For the $s$-th round of iteration in Algorithm 1, we define $\Delta^{NR} = \tilde{o}_{xi}^{(s)} - o_{xi}^{(s)}$, and $\Delta^{cNTF} = o_{xi}^{(s)} - o_{xi}^{(s-1)}$. Algorithm 2 then updates $o_{xi}^{(s)}$ as

$$ o_{xi}^{(s)} = \begin{cases} \max\left\{0,\ o_{xi}^{(s-1)} + \Delta^{cNTF} + \Delta^{NR}\right\}, & \text{if } \Delta^{cNTF} \le 0, \\ o_{xi}^{(s-1)} + \max\left\{0,\ \Delta^{cNTF} + \Delta^{NR}\right\}, & \text{otherwise.} \end{cases} \tag{33} $$

Note that $\tilde{o}_{xi}^{(s)} \le o_{xi}^{(s)} \Rightarrow \Delta^{NR} \le 0$, so the update of $o_{xi}^{(s)}$ in Eq. (33) is in the same direction as the gradient of $o_{xi}^{(s-1)}$. Algorithm 2 therefore ensures that the reconstruction error in each iteration is always the same as or lower than that in the previous iteration.

5 EXPERIMENTAL RESULTS

In this section, we conduct extensive experiments to evaluate the effectiveness of our methods in learning urban dynamics and gaining managerial insights for urban planning. We also compare our methods with some baselines on traffic prediction, which justifies the modeling of urban contexts and neighboring regularization in NR-cNTF.

5.1 Experimental Setup

5.1.1 Data Sets

Three types of data sets were used in our experiments: taxi trajectory data, POI data, and Traffic Analysis Zone data. The taxi trajectory data set contains the GPS trajectories of 20,000 Beijing taxis collected in November 2008 and November 2015, from which we extracted more than 6 million trips of taxi passengers to represent the daily mobility behaviors of residents in Beijing. The POI data set contains more than 400 thousand POI records of Beijing in the years of 2008 and 2015. The Traffic Analysis Zone (TAZ) data set, offered by Beijing Municipal Commission of Transportation, divides the Beijing area within the 5th Ring Road into 651 zones. Using the three data sets, we built two data tensors ($651 \times 651 \times 24$) and two POI context matrices ($651 \times 651$) for the years of 2008 and 2015, respectively. In the experiments, we only use data of workdays to construct the data tensor $\mathcal{R}$, so the discovered patterns reflect resident mobility on workdays. People's leisure patterns on holidays could be very different from their workday patterns. We have conducted extra experiments on holiday data, and included the results in the Supplementary Materials for interested readers.

5.1.2 Setting of Dimensionality of Pattern Space

The goal of the NR-cNTF model is to find an $I \times J \times K$-dimensional pattern space. How to set $I$, $J$, $K$ appropriately, however, is a "tricky" issue. If the dimensionality is too small, we might omit some urban dynamics; if too large, we might obtain many trivial patterns (in the extreme case, if the dimensionality of the pattern space is the same as that of the data space, the patterns will be meaningless).

In our experiments, we set the parameters carefully so as to make a tradeoff between the reconstruction error and the dimension reduction.

[Figure 4: (a) Setting of I, J (RMSE vs. number of spatial latent patterns I, J); (b) Setting of K (RMSE vs. number of temporal latent patterns K).]
Fig. 4. Performance with varying dimensionality of pattern space.

[Figure 6: (a) 2008; (b) 2015 (pattern coefficient vs. hour, curves P1 to P4).]
Fig. 6. Temporal patterns in 2008 and 2015.

[Figure 5: (a) POI Regularization (RMSE vs. $\alpha$, $\beta$); (b) L1 Regularization (RMSE vs. $\gamma$, $\delta$, $\epsilon$).]
Fig. 5. Performance with varying POI and L1 regularization coefficients.

dimension reduction. The reconstruction error is evaluated by Root Mean Square Error (RMSE) defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{x=1}^{M} \sum_{y=1}^{M} \sum_{z=1}^{N} (r_{xyz} - \hat{r}_{xyz})^2}{M \times M \times N}}, \tag{34}$$

where $\hat{r}_{xyz}$ is the $(x, y, z)$ element of the reconstructed data tensor. We repeated experiments 10 times with $I = J$ ranging from 5 to 30 and $K$ ranging from 2 to 10. Fig. 4 gives the resultant average reconstruction errors with different parameters, where RMSE reduces sharply at the very beginning but slows down when $I, J \ge 20$ and $K \ge 4$. We therefore set $I = J = 20$ and $K = 4$ as defaults.

5.1.3 Setting of Tradeoff Parameters

In NR-cNTF, the tradeoff parameters $\alpha$ and $\beta$ adjust the strength of the urban context terms, and $\gamma$, $\delta$ and $\epsilon$ adjust the strength of the sparsity regularization terms. In our experiment, we set the tradeoff parameters using a traversal approach. We vary $\alpha$ and $\beta$ from 0 to 0.05 and $\gamma$, $\delta$ and $\epsilon$ from 0.1 to 10, respectively, aiming to choose the parameters with the best performance. Fig. 5 exhibits the experimental reconstruction errors with different tradeoff parameters, where each point is averaged over 10 runs. As can be seen, the best performance appears when $\alpha = \beta = 0.01$ and $\gamma = \delta = \epsilon = 2.5$, which become the default settings.

5.2 Discovery of Temporal Patterns

Here, we describe the temporal patterns discovered from Beijing taxi traffic in 2008 and 2015. To facilitate comparison, we first introduce a normalization scheme to the projection matrix $T$. Specifically, for the $k$-th pattern, we define a mask matrix as $Y^k \in \mathbb{R}^{N \times K}$, where the element $y^k_{xi} = 1$ when $i = k$, and 0 otherwise. We use the mask matrix to construct a data tensor as

$$\tilde{\mathcal{R}}^k = \mathcal{C} \times_o O \times_d D \times_t \big(T \odot Y^k\big), \tag{35}$$

where $\odot$ is the element-wise product. In Eq. (35), the elements of $T$ corresponding to the patterns other than $k$ are multiplied by zero, so $\tilde{\mathcal{R}}^k$ only contains the components of the pattern $k$. Therefore, the physical meaning of $\tilde{\mathcal{R}}^k$ is a component tensor corresponding to the $k$-th temporal pattern of the data tensor $\mathcal{R}$. Using $\tilde{\mathcal{R}}^k$, we define the energy of the temporal pattern $k$ as

$$u_k = \frac{\|\tilde{\mathcal{R}}^k\|_1}{M \times M \times N} = \frac{\sum_{x=1}^{M} \sum_{y=1}^{M} \sum_{z=1}^{N} |\tilde{r}^k_{xyz}|}{M \times M \times N}. \tag{36}$$

The physical meaning of the energy $u_k$ is a normalized size of the components corresponding to the temporal pattern $k$. In the experiments, we define the re-scaled pattern coefficient $\tilde{t}_{zk}$ as

$$\tilde{t}_{zk} = \frac{t_{zk}}{\sum_{n=1}^{N} t_{nk}} \times u_k. \tag{37}$$

The physical meaning of $\tilde{t}_{zk}$ is the energy of the temporal pattern $k$ at the time slice $z$. The vector $\tilde{t}_{:k}$ is the distribution of $u_k$ over the $N$ time slices, and $\sum_{z=1}^{N} \tilde{t}_{zk} = u_k$. We compare the re-scaled pattern coefficients of different years to demonstrate the changes of temporal patterns of resident mobility from 2008 to 2015.

Fig. 6 shows the four temporal patterns, which indeed correspond to four rhythms of urban traffic:

• P1: Morning Peak, with an active range roughly from 6:00 to 11:00.
• P2: Midday, with an active range roughly from 9:00 to 18:00.
• P3: Evening Peak, with an active range roughly from 16:00 to 24:00.
• P4: Night, with an active range roughly from 20:00 to 3:00 of the next day.

To further reveal the evolution of temporal patterns from 2008 to 2015, we plot a comparative diagram for each pattern of the two years in Fig. 7. The first observation is that the intensity of the morning pattern decreased significantly from 2008 to 2015 (see Fig. 7(a)), whereas the evening pattern seems much more stable (see Fig. 7(c)). We believe the reduction of the morning peak via taxis is due to the rapid development of the metro system in Beijing. During the period from 2008 to 2015, the Beijing metro increased its mileage from 198 km to 631 km, which is particularly suitable for the time-rigid morning commute but has less impact on the evening commute with relatively flexible time.

Another observation is that the intensity of the midday pattern increased during the seven years (see Fig. 7(b)). The main part of travel volume in the midday pattern consists of business travels from one workplace to another,

[Figure 7: (a) Morning Peak; (b) Midday; (c) Evening Peak; (d) Night (pattern coefficient vs. hours, curves for 2008 and 2015).]

Fig. 7. Comparison of the temporal patterns between 2008 and 2015.
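The curves compared in Fig. 7 are the re-scaled pattern coefficients of Eqs. (35)-(37). The following is a minimal sketch of that computation, not the authors' implementation. It assumes all factors are non-negative, so that the L1 norm in Eq. (36) factorizes into column sums and the masked tensor never has to be built. The function names and the toy factors are hypothetical, and plain nested lists are used only to keep the sketch dependency-free.

```python
def pattern_energy(C, O, D, T, k):
    """Energy u_k of temporal pattern k (Eq. 36), assuming non-negative factors.

    C is an I x J x K nested list; O is M x I, D is M x J, T is N x K.
    """
    M, N = len(O), len(T)
    col_o = [sum(row[i] for row in O) for i in range(len(C))]
    col_d = [sum(row[j] for row in D) for j in range(len(C[0]))]
    col_t = sum(row[k] for row in T)
    # Non-negativity lets the L1 norm of the masked reconstruction be
    # written as a product of column sums weighted by the core slice k.
    l1 = col_t * sum(C[i][j][k] * col_o[i] * col_d[j]
                     for i in range(len(col_o))
                     for j in range(len(col_d)))
    return l1 / (M * M * N)


def rescaled_coefficients(C, O, D, T, k):
    """Re-scaled coefficients of pattern k over the N time slices (Eq. 37)."""
    u_k = pattern_energy(C, O, D, T, k)
    col_t = sum(row[k] for row in T)  # assumed non-zero for an active pattern
    return [row[k] / col_t * u_k for row in T]


# Hypothetical toy factors: M = 2 zones, N = 2 time slices, I = J = 1, K = 2.
O = [[1.0], [1.0]]
D = [[1.0], [1.0]]
T = [[1.0, 0.0], [3.0, 2.0]]
C = [[[2.0, 1.0]]]
t_tilde = rescaled_coefficients(C, O, D, T, 0)
```

By construction the entries of `t_tilde` sum to the pattern energy, which is the property stated after Eq. (37).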

(a) 2008 DSP’s by NR-cNTF (b) 2015 DSP’s by NR-cNTF (c) 2008 DSP’s by cNTF

Fig. 8. Destination spatial patterns in 2008 and 2015.
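The communities drawn in Fig. 8 rely on a crisp partition, in which each zone belongs to exactly one community. The sketch below illustrates one way such a partition could be derived from a projection matrix. It assumes, as a labeled guess, that a zone is assigned to the pattern with its largest loading; the exact rule is defined elsewhere in the paper and may differ. The function names and the toy matrix are hypothetical.

```python
def crisp_partition(D):
    """Label each zone (row of D) with the index of its largest loading.

    Assumption: argmax assignment; ties go to the lowest pattern index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in D]


def communities(D):
    """Group zone indices by label; patterns that attract no zone are absent."""
    groups = {}
    for zone, label in enumerate(crisp_partition(D)):
        groups.setdefault(label, []).append(zone)
    return groups


# Hypothetical toy projection matrix: 3 zones, 3 candidate patterns.
D_toy = [[0.9, 0.1, 0.0],
         [0.2, 0.7, 0.1],
         [0.8, 0.1, 0.1]]
groups = communities(D_toy)
```

In this toy case the third pattern attracts no zone, which mirrors how some of the 20 patterns can end up as empty communities.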

whose destinations are random in essence and therefore cannot count heavily on public transportation systems like metros. Moreover, the fast-rising income in China in recent years might also contribute to more spending on the relatively expensive taxi service.

The most interesting observation is that the peak time of the night pattern in 2015 came about two hours later than that in 2008 (Fig. 7(d)). This implies that residents tend to travel more around midnight in recent years. The reasons behind this could be complicated, and might include some lifestyle changes in Beijing, such as a more colorful nightlife or higher overtime working pressure.

To sum up, the NR-cNTF model captures the temporal patterns hidden inside the Beijing taxi traffic well. The evolution of these patterns further unveils the development of Beijing metros and the changes of lifestyle.

5.3 Discovery of Spatial Patterns

Here, we explore the spatial patterns discovered by NR-cNTF. Given any origin or destination pattern $v_{:i}$ (see Def. 1 in Sect. 2.2), we first obtain the corresponding urban community $SP_i$ (see Sect. 3.3). We adopt the “crisp partition” assumption so that an urban zone will be assigned to one and only one urban community. As a result, among the $I = J = 20$ patterns in our experiment, we obtain 17 urban communities, and the remaining three are empty and omitted. Note that we only use destination spatial patterns (DSP) for illustration below. The origin spatial patterns give similar results, so we omit them from the paper for concision.

Fig. 8(a) and Fig. 8(b) visualize the urban communities corresponding to the destination spatial patterns found in 2008 and 2015, respectively. As can be seen, each urban community (filled with the same color) identified by NR-cNTF contains urban zones geographically adjacent to at least one zone in the same community, which agrees with our intuition about functional zoning of a city. In contrast, Fig. 8(c) shows the 2008 urban communities found by cNTF without neighboring regularization, whose functionalities are less clear due to the geographical discontinuity. For the convenience of discussion, we numbered the communities in Fig. 8(b) from 1 to 17.

A general observation from Fig. 8 is that the spatial communities of Beijing radially surround the center of Beijing. This characteristic of the spatial communities is closely related to the trunk road network structure of Beijing. Fig. 10(a) shows there are four concentric ring roads surrounding the center of Beijing. As reported in [23], the ring roads provide a basic framework for the city’s overall spatial pattern. Affected by the ring roads, we can see that the communities discovered in Fig. 8 also constitute two concentric circles surrounding the center of Beijing. Specifically, the communities C1-C10 form the outer circle, and C11-C17 form the inner circle. Fig. 10(b) plots the trunk road network of Beijing over the communities, from which we can see that many boundaries of the communities overlap with the trunk roads, indicating that the spatial patterns of residential mobility in Beijing are deeply shaped by the urban trunk road network.

Another observation from Fig. 8 is the interesting evo-

[Figure 9: (a) 2008 Morning Peak; (b) 2008 Midday; (c) 2008 Evening Peak; (d) 2008 Night; (e) 2015 Morning Peak; (f) 2015 Midday; (g) 2015 Evening Peak; (h) 2015 Night (origin communities vs. destination communities).]

Fig. 9. Dynamic patterns in 2008 and 2015.
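The community-level OD matrices of Fig. 9 are slices of the core tensor, and Sect. 5.4 summarizes them through inter- and intra-community traffic intensities. The following is a minimal sketch of those two quantities under stated assumptions, not the authors' code: origin and destination patterns are taken to index the same communities (so the core is square in its first two modes), and all names and toy values are hypothetical.

```python
def concentrate(C, O, D, T):
    """Fold the column sums of O, D and T into the core tensor:
    c'_ijk = c_ijk * sum_x o_xi * sum_y d_yj * sum_z t_zk."""
    so = [sum(row[i] for row in O) for i in range(len(C))]
    sd = [sum(row[j] for row in D) for j in range(len(C[0]))]
    st = [sum(row[k] for row in T) for k in range(len(C[0][0]))]
    return [[[C[i][j][k] * so[i] * sd[j] * st[k]
              for k in range(len(st))]
             for j in range(len(sd))]
            for i in range(len(so))]


def traffic_intensity(C_prime, x):
    """Return (inter, intra) traffic intensity of community x."""
    n = len(C_prime)
    # Traffic arriving at x from every other community, over all rhythms.
    inflow = sum(sum(C_prime[i][x]) for i in range(n) if i != x)
    # Traffic leaving x towards every other community, over all rhythms.
    outflow = sum(sum(C_prime[x][j]) for j in range(n) if j != x)
    intra = sum(C_prime[x][x])
    return inflow + outflow, intra


# Hypothetical toy factors: 2 zones, 2 communities, 1 temporal pattern.
O = [[1.0, 0.0], [1.0, 2.0]]
D = [[0.5, 1.0], [0.5, 1.0]]
T = [[1.0], [2.0]]
C = [[[1.0], [1.0]], [[1.0], [1.0]]]
C_prime = concentrate(C, O, D, T)
inter0, intra0 = traffic_intensity(C_prime, 0)
```

Comparing the values returned for the 2008 and 2015 factorizations would give the kind of growth ratios the section goes on to discuss.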

[Figure 10: (a) The Ring Roads in Beijing (labels: the 2nd, 3rd, 4th and 5th Ring Roads); (b) The Trunk Roads in Beijing.]
Fig. 10. The urban communities and trunk roads in Beijing.

lution of some urban communities in recent years. Let us take a closer look at community C7 located in the south of Beijing, which has an obvious expansion trend from 2008 to 2015. That is, some urban zones that belonged to C6 in 2008 were “absorbed” by C7 in 2015. To understand this, we should trace back to the so-called South Beijing Development Plan (SBDP) issued in 2008, which is a government investment plan in the southern areas of Beijing, with an executive period from 2010 to 2015 and a total investment of nearly 62.9 billion USD (more information about SBDP can be found in the Supplementary Materials). The purpose of SBDP is to narrow the development gap between the lagging southern region and other areas of the city. It is interesting that the communities C6 and C7 are exactly in the investment region of the plan (see Fig. 2 in the Supplementary Materials for the evidence). The evolution of C6 and C7 from 2008 to 2015 essentially reflects the great impact of huge economic investments on the real-life development of a city.

To sum up, the above results justify the effectiveness of our NR-cNTF model in uncovering latent and geographically adjacent spatial patterns, as well as their inconspicuous evolutions in recent years.

5.4 Discovery of Urban Dynamics among Patterns

Here, we use the core tensor $\mathcal{C}$ to explore the urban dynamics, i.e., the interactions among spatial and temporal patterns. We first observe the slice $\mathcal{C}_{::k}$ of $\mathcal{C}$, which reveals the traffic intensity from every origin community to every destination community given the temporal pattern $k$, i.e., a community-level origin-destination (OD) matrix in rhythm $k$. Fig. 9 visualizes the community OD-matrices in the morning peak, midday, evening peak and night rhythms of 2008 and 2015. A darker color indicates a higher traffic intensity. As can be seen, most energies of the OD-matrices are concentrated on their diagonals, implying that most taxi travels in Beijing actually happened within the same community with relatively short distances. Moreover, the travel demands across communities show a tidal phenomenon. That is, in the morning peak, people flowed out from many communities (i.e., residential areas) and flowed into a few ones (i.e., working areas), and the situation was just the reverse in the evening peak and night rhythms. This implies that while the residential areas in Beijing are very dispersed, the workplaces are relatively concentrated. Indeed, it seems from Fig. 9(e) that C10, C13 and C17 are the three “most attractive” workplaces in Beijing, which are actually well-known as the Zhongguancun area [footnote 4], Beijing Central Business District (CBD) [footnote 5], and Beijing Financial Street [footnote 6], respectively. From this aspect, NR-cNTF indeed generates high-quality patterns for urban dynamics understanding.

4. https://en.wikipedia.org/wiki/Zhongguancun
5. https://en.wikipedia.org/wiki/Beijing_central_business_district
6. https://en.wikipedia.org/wiki/Beijing_Financial_Street

We then explore the evolution of traffic intensities from 2008 to 2015 in Beijing. For the purpose of comparison, we first concentrate the energies of the projection matrices into the core tensor as $c'_{ijk} = c_{ijk} \cdot \sum_x o_{xi} \cdot \sum_y d_{yj} \cdot \sum_z t_{zk}$. The total intensity of inter-community traffic for a community $x$ is then calculated as $I^{inter}_x = \sum_{i \neq x} \sum_k c'_{ixk} + \sum_{j \neq x} \sum_k c'_{xjk}$, and the intra-community traffic intensity for $x$ is given by $I^{intra}_x = \sum_k c'_{xxk}$. Along this line, we can quantify the daily

[Figure 11: (a) Inter-Community Traffic; (b) Inter-Community Traffic Growth; (c) Intra-Community Traffic (traffic intensity or growth ratio vs. community ID, bars for 2008 and 2015; annotated communities: South Beijing Development Plan, CBD, Zhongguancun, Financial Street).]

Fig. 11. Inter- and intra-community traffic intensities.

increments of inter- and intra-community traffic intensities from 2008 to 2015, as shown in Fig. 11.

From Fig. 11(a), it is obvious that the inter-community traffic increased from 2008 to 2015 for almost all communities, with C10 (Zhongguancun area), C13 (CBD area) and C17 (Financial Street area) being the most significant ones. In particular, as shown in Fig. 11(b), the Zhongguancun area, a technology hub of Beijing well-known as the “Chinese Silicon Valley”, gains the highest growth ratio during the seven years, which coincides with Beijing's development priority on high-tech industries.

Fig. 11(c) depicts the intra-community traffic intensity of each community from 2008 to 2015. It is interesting that C7 and C15 emerged as the top-2 communities with the highest growth in internal traffic. Recall that these two communities are located in the south of Beijing, and have benefited from the 30 billion dollar investment of the South Beijing Development Plan. The significant growth of internal traffic implies that these two communities are gaining more active economies, and perhaps are enjoying a more sustainable development pattern: residents can work and rest interchangeably within a small distance. This indeed suggests a potential solution to mitigating the “big city disease” of Beijing: to promote industries and housing in the same community or in close ones. This job-housing balance thinking, however, was not the primary choice of Beijing in the past several decades. The development of the CBD area, which we will discuss below, is just the epitome.

In Fig. 12, we study the dynamic patterns of a particular community: the CBD area (C13), which is the central business district of Beijing and shapes the lifestyle of the city deeply. In the figure, the color of a community indicates the traffic intensity of that community from or to the CBD community (the redder the stronger), and the arrows indicate traffic directions between communities. As shown in Fig. 12, CBD is a pure business area, with residents flowing in in the morning and flowing out in the evening. Similar situations can be found for the Zhongguancun (C10) and the Financial Street (C17) communities. This indeed reflects the severe job-housing imbalance in Beijing, which contributes a lot to city diseases such as traffic congestion. Nevertheless, it is more interesting to find the pattern evolution of CBD from 2008 to 2015. From Fig. 12(a) and Fig. 12(b), we can find nearly symmetric incoming and outgoing flows between the CBD community and the communities surrounding CBD in 2008. This symmetry, however, disappeared in 2015, where the outflows from CBD in the evening spread over more communities than the inflows in the morning (see Fig. 12(c) and Fig. 12(d)). We believe it is Fig. 12(d) rather than Fig. 12(c) that revealed all the housing communities for CBD. The possible reason is that, for residents living in remote communities, the sustainable, timely and economical way of commuting to CBD in the morning is to take the metro rather than a taxi. From this angle, we can conclude that the job-housing imbalance got even worse with the rapid development of the CBD area from 2008 to 2015.

To sum up, the evolution of urban dynamics indicates the rapid development of Beijing in recent years. The development pattern, however, is still worrying given the job-housing imbalance status quo, although the southern area has shown some positive changes.

5.5 Quantitative Evaluation

In this subsection, we evaluate our NR-cNTF model by comparing its data tensor reconstruction error with that of some baseline models, to further explain why NR-cNTF works well for understanding Beijing. Following the tradition of tensor factorization based studies [4], [20], the Root Mean Square Error defined in Eq. (34) is used as an indicator of quality.

In the experiments, we define a sampling tensor $\mathcal{S} \in \mathbb{R}^{M \times M \times N}$, in which the element $s_{xyz} = 1$ when the traffic volume from zone $x$ to zone $y$ in time slice $z$ was sampled, and $s_{xyz} = 0$ otherwise. We then rewrite the objective function in Eq. (17) as

$$
\begin{aligned}
\arg\min_{\mathcal{C}, O, D, T \ge 0} J ={}& \big\|\mathcal{S} \odot (\mathcal{R} - \mathcal{C} \times_o O \times_d D \times_t T)\big\|_F^2 \\
&+ \alpha \|W - OO^\top\|_F^2 + \beta \|W - DD^\top\|_F^2 \\
&+ \gamma \|O\|_1 + \delta \|D\|_1 + \epsilon \|T\|_1 + \varepsilon \|\mathcal{C}\|_1.
\end{aligned}
\tag{38}
$$

The reconstruction error between $\mathcal{R}$ and the reconstructed tensor $\hat{\mathcal{R}} = \mathcal{C} \times_o O \times_d D \times_t T$ is calculated using Eq. (34).

We compare the reconstruction error of NR-cNTF with that of the following baseline methods:

• Tucker: Non-negative Tucker Factorization, of which the objective function is

$$
\arg\min_{\mathcal{C}, O, D, T} \big\|\mathcal{S} \odot (\mathcal{R} - \mathcal{C} \times_o O \times_d D \times_t T)\big\|_F^2 + \gamma \|O\|_1 + \delta \|D\|_1 + \epsilon \|T\|_1 + \varepsilon \|\mathcal{C}\|_1.
\tag{39}
$$

Compared with our method, Tucker does not consider urban context and neighboring regularization.

(a) 2008 Morning Peak (b) 2008 Evening Peak (c) 2015 Morning Peak (d) 2015 Evening Peak

Fig. 12. Dynamic patterns from and to the CBD community.
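The quantitative evaluation in Sect. 5.5 fits each model on the sampled entries and then scores the reconstruction with the RMSE of Eq. (34). The following is a minimal sketch of those two quantities for a Tucker model, offered as an illustration rather than the authors' code. The function names and toy tensors are hypothetical, and nested lists are used only to avoid dependencies (an array library with an einsum-style contraction would be the practical choice).

```python
import math


def reconstruct(C, O, D, T):
    """Tucker reconstruction: r_xyz = sum_ijk c_ijk * o_xi * d_yj * t_zk."""
    I, J, K = len(C), len(C[0]), len(C[0][0])
    return [[[sum(C[i][j][k] * o[i] * d[j] * t[k]
                  for i in range(I) for j in range(J) for k in range(K))
              for t in T]
             for d in D]
            for o in O]


def flatten(X):
    """Flatten a 3-way nested list in (x, y, z) order."""
    return [v for plane in X for row in plane for v in row]


def rmse(R, R_hat):
    """Root mean square error over all entries (Eq. 34)."""
    sq = [(r - h) ** 2 for r, h in zip(flatten(R), flatten(R_hat))]
    return math.sqrt(sum(sq) / len(sq))


def masked_fit(R, R_hat, S):
    """Squared Frobenius norm of S * (R - R_hat), the data term of Eq. (38)."""
    return sum((s * (r - h)) ** 2
               for s, r, h in zip(flatten(S), flatten(R), flatten(R_hat)))


# Hypothetical toy example: 2 x 2 x 1 tensor with one unsampled, misfit entry.
C = [[[1.0]]]
O = [[1.0], [2.0]]
D = [[1.0], [3.0]]
T = [[1.0]]
R = [[[1.0], [3.0]], [[2.0], [8.0]]]
S = [[[1], [1]], [[1], [0]]]
R_hat = reconstruct(C, O, D, T)
```

In the toy case the only misfit entry is unsampled, so the masked data term is zero while the RMSE over the full tensor is not, which is the gap between training fit and evaluation that Table 3 measures.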

• CP: Non-negative CP Factorization, which supposes a joint latent space for each mode by solving an objective function as

$$
\arg\min_{O, D, T} \Big\|\mathcal{S} \odot \Big(\mathcal{R} - \sum_m o_{:m} \circ d_{:m} \circ t_{:m}\Big)\Big\|_F^2 + \gamma \|O\|_1 + \delta \|D\|_1 + \epsilon \|T\|_1,
\tag{40}
$$

where the operator $\circ$ represents the vector outer product. In the CP factorization, the latent factor dimensionality for the spatial and temporal patterns is the same. As a result, we set the number of latent factors to $m = 4$ or $m = 20$. The former is the same as the number of temporal patterns for NR-cNTF, and the latter is in accordance with that of spatial patterns.

• rCP: Regularized Non-negative CP Factorization, which is a CP factorization with the urban context-aware regularization. The objective function is

$$
\begin{aligned}
\arg\min_{O, D, T} {}& \Big\|\mathcal{S} \odot \Big(\mathcal{R} - \sum_m o_{:m} \circ d_{:m} \circ t_{:m}\Big)\Big\|_F^2 \\
&+ \alpha \|W - OO^\top\|_F^2 + \beta \|W - DD^\top\|_F^2 \\
&+ \gamma \|O\|_1 + \delta \|D\|_1 + \epsilon \|T\|_1.
\end{aligned}
\tag{41}
$$

In our experiments, we compared the methods on the data tensor of 2015. The sampling rate varied from 50% to 90%. The average RMSE values over ten repeated experiments are reported in Table 3. From the table, we have the following observations:

• Both NR-cNTF and cNTF performed much better than the baseline methods, indicating the general superiority of the proposed methods.
• NR-cNTF performed nearly the same as cNTF, indicating that the neighboring regularization improves the interpretability of spatial patterns at a very low cost of model deviation from real-world data.
• NR-cNTF/cNTF performed generally better than Tucker, indicating the distinct value of urban contexts for tensor factorization.
• NR-cNTF/cNTF/Tucker performed generally better than rCP-4/CP-4/rCP-20/CP-20, implying the advantage of employing Tucker rather than CP based methods. This is not unusual, since the core tensor generated by Tucker factorization contains important information about urban dynamic patterns and improves the model interpretability.

TABLE 3
Tensor Reconstruction Performance by RMSE (columns: sampling rate)

Method     50%     60%     70%     80%     90%
NR-cNTF    0.351   0.344   0.343   0.342   0.341
cNTF       0.350   0.345   0.343   0.342   0.341
Tucker     0.357   0.356   0.353   0.351   0.350
rCP-20     0.351   0.349   0.349   0.347   0.347
rCP-4      0.403   0.401   0.400   0.398   0.396
CP-20      0.353   0.352   0.349   0.348   0.346
CP-4       0.405   0.403   0.401   0.401   0.400

In summary, besides the superior interpretability, NR-cNTF also shows excellent performance in quantitative evaluation on tensor factorization, by employing the core tensor, neighboring regularization, and urban contexts. As a natural corollary, NR-cNTF could be used for urban traffic volume prediction when the elements of a data tensor are only partially available.

6 RELATED WORK

Mining knowledge from human mobility data generated in urban areas has attracted many researchers’ interests in recent years [24], [25]. Various types of “social sensors”, such as cell phones [26], GPS terminals [25], and smart bus/metro cards [27], have been adopted to record mobility information of urban residents, based on which many successful applications have emerged for intelligent transportation [28], [29], environmental protection [30], urban planning [10], urban emergency [31], etc. An excellent survey from an urban computing perspective can be found in [24], while [25] provides a survey from a social and community dynamics perspective.

Among the abundant methods for human mobility data mining, tensor factorization/decomposition, like CANDECOMP/PARAFAC (CP) [32] and Tucker factorizations [33], gains particular interest for its distinct ability in modeling multi-aspect heterogeneous big data. Indeed, in city scenarios data samples always involve many aspects, such as time, space, human, urban contexts and so on, and therefore are very suitable for tensor factorization based data mining methods [24]. Typical applications of tensor factorization could be classified into two categories. The first category is to reconstruct tensors for predicting unknown values in multi-aspect data sets, such as completing missing traffic data [2], inferring urban gas consumption [3], predicting travel time [4], recommending social tags [34], movies [35] and sightseeing locations [36], [37], and so on.

In recent years, more and more works have focused on mining explainable latent factors from multi-aspect urban data sets, which form the second category of applications. The focal point here is to use tensor factorization to discover latent lower-dimensional factors from higher-dimensional multi-aspect data sets. For instance, Metafac [38] used CP factorizations to extract latent community structures from various social networks, and [39] proposed a multi-view data clustering and partitioning method based on Tucker factorization. Our study in this paper also falls in this category, with the most related works as follows.

The study [7] used a non-negative matrix factorization, i.e., a second-order tensor factorization, to model taxi trip data, and discovered the latent factors corresponding to three rhythms of residents’ daily life. Similarly, matrix factorizations were used for understanding the operational behaviors of taxicabs in cities [8]. In an inspiring work, [5] adopted a regularized non-negative Tucker decomposition (rNTD) to discover residents’ mobility patterns in Beijing from an origin-destination-time tensor. Following this idea, [9] proposed a probabilistic tensor factorization method to find mobility patterns of public transportation system passengers from an origin-destination-time-type tensor. CitySpectrum [6] used CP factorizations to mine joint time-day-location patterns of residents after the Great East Japan Earthquake. Some more complex algorithms include NTCoF [40], which is a non-negative tensor co-factorization algorithm for urban events detection from bike trip and check-in data, and HTM [41], which is a hybrid tensor model and uses ACS-tucker decomposition to detect events from traffic data. In recent years, many dynamic tensor factorization algorithms were proposed for time series and stream data mining. For instance, Dynamic Tensor Analysis [19] extended Tucker factorization to process dynamic and stream high-order data, the Facets model [42] combined dynamic graphical models with tensor factorizations for mining co-evolving high-order time series, and FEMA [20] was a flexible evolutionary tensor factorization algorithm to mine dynamic behavioral patterns of multi-facet data sets.

Despite the wide existence of the related works mentioned above, our study in this paper has its own uniqueness. Unlike the previous works, we focus on understanding urban dynamics from multiple aspects, including spatial, temporal, as well as spatio-temporal interactions, while also pursuing long-term evolution patterns. The results indeed bring some important managerial insights and suggestions to the city development of Beijing. The proposed NR-cNTF model takes Tucker factorization as a basic framework, which compared with CP and matrix factorization based models [6], [7], [8], [41] has better interpretability for adopting a core tensor to model relations among latent factors. Compared with the existing Tucker factorization based methods [2], [9], [24], NR-cNTF incorporates urban contexts and neighboring regularization, which improve both the accuracy and interpretability of Tucker factorization greatly. Moreover, we proposed a pipeline initialization approach to analyze the evolution of urban dynamics across several years, which is simple yet practical.

7 CONCLUSION

In this paper, we proposed a POI context-aware nonnegative tensor factorization model with neighboring regularization (NR-cNTF) for urban dynamics discovery. A simple pipeline initialization method was also introduced to NR-cNTF to facilitate evolution analysis of the dynamics. Experiments on Beijing taxi trajectory and POI data demonstrated the high quality of the spatial, temporal and spatio-temporal patterns generated by NR-cNTF for city-disease diagnosing and urban planning. The comparative studies with some baselines on traffic prediction further justified the advantage of NR-cNTF in adopting urban contexts and neighboring regularization.

REFERENCES

[1] H. Ma, D. Zhao, and P. Yuan, “Opportunities in mobile crowd sensing,” IEEE Communications Magazine, vol. 52, no. 8, pp. 29–35, 2014.
[2] H. Tan, G. Feng, J. Feng, W. Wang, Y.-J. Zhang, and F. Li, “A tensor-based method for missing traffic data completion,” Transportation Research Part C: Emerging Technologies, vol. 28, pp. 15–27, 2013.
[3] F. Zhang, D. Wilkie, Y. Zheng, and X. Xie, “Sensing the pulse of urban refueling behavior,” in Proceedings of the 2013 ACM international joint conference on Pervasive and . ACM, 2013, pp. 13–22.
[4] Y. Wang, Y. Zheng, and Y. Xue, “Travel time estimation of a path using sparse trajectories,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014, pp. 25–34.
[5] J. Wang, F. Gao, P. Cui, C. Li, and Z. Xiong, “Discovering urban spatio-temporal structure from time-evolving traffic networks,” in Asia-Pacific Web Conference. Springer, 2014, pp. 93–104.
[6] Z. Fan, X. Song, and R. Shibasaki, “Cityspectrum: a non-negative tensor factorization approach,” in Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, 2014, pp. 213–223.
[7] C. Peng, X. Jin, K.-C. Wong, M. Shi, and P. Liò, “Collective human mobility pattern from taxi trips in urban area,” PloS one, vol. 7, no. 4, p. e34487, 2012.
[8] C. Kang and K. Qin, “Understanding operation behaviors of taxicabs in cities by matrix factorization,” Computers Environment & Urban Systems, vol. 60, pp. 79–88, 2016.
[9] L. Sun and K. W. Axhausen, “Understanding urban mobility patterns with a probabilistic tensor factorization framework,” Transportation Research Part B: Methodological, vol. 91, pp. 511–524, 2016.
[10] N. J. Yuan, Y. Zheng, X. Xie, Y. Wang, K. Zheng, and H. Xiong, “Discovering urban functional zones using latent activity trajectories,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 3, pp. 712–725, 2015.
[11] J. Yuan, Y. Zheng, and X. Xie, “Discovering regions of different functions in a city using human mobility and pois,” in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2012, pp. 186–194.
[12] Y. Zheng, Y. Liu, J. Yuan, and X. Xie, “Urban computing with taxicabs,” in Proceedings of the 13th international conference on Ubiquitous computing. ACM, 2011, pp. 89–98.
[13] N. J. Yuan, Y. Zheng, and X. Xie, “Segmentation of urban areas using road networks,” Microsoft, Albuquerque, NM, USA, Tech. Rep. MSR-TR-2012-65, 2012.
[14] X. Liang, X. Zheng, W. Lv, T. Zhu, and K. Xu, “The scaling of human mobility by taxis is exponential,” Physica A: Statistical Mechanics and its Applications, vol. 391, no. 5, pp. 2135–2144, 2012.
[15] N. J. Yuan, Y. Zheng, X. Xie, Y. Wang, K. Zheng, and H. Xiong, “Discovering urban functional zones using latent activity trajectories,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 3, pp. 712–725, 2015.
[16] D. Zhang, F. Zhang, and T. He, “Multicalib: national-scale traffic model calibration in real time with multi-source incomplete data,” in Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 2016, p. 19.

[17] Y. Zheng, T. Liu, Y. Wang, Y. Zhu, Y. Liu, and E. Chang, “Diagnosing new york city’s noises with ubiquitous data,” in Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, 2014, pp. 715–725.
[18] P. Krähenbühl and V. Koltun, “Efficient inference in fully connected crfs with gaussian edge potentials,” pp. 109–117, 2012.
[19] J. Sun, D. Tao, and C. Faloutsos, “Beyond streams and graphs: dynamic tensor analysis,” in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2006, pp. 374–383.
[20] M. Jiang, P. Cui, F. Wang, X. Xu, W. Zhu, and S. Yang, “Fema: flexible evolutionary multi-faceted analysis for dynamic behavioral pattern discovery,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014, pp. 1186–1195.
[21] Y. Xu, “Alternating proximal gradient method for sparse nonnegative tucker decomposition,” Mathematical Programming Computation, vol. 7, no. 1, pp. 39–70, 2015.
[22] Y. Xu and W. Yin, “A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion,” SIAM Journal on imaging sciences, vol. 6, no. 3, pp. 1758–1789, 2013.
[23] G. Tian, J. Wu, and Z. Yang, “Spatial pattern of urban functions in the beijing metropolitan region,” Habitat International, vol. 34, no. 2, pp. 249–255, 2010.
[24] Y. Zheng, L. Capra, O. Wolfson, and H. Yang, “Urban computing: concepts, methodologies, and applications,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 5, no. 3, p. 38, 2014.
[25] P. S. Castro, D. Zhang, C. Chen, S. Li, and G. Pan, “From taxi gps traces to social and community dynamics: A survey,” ACM Computing Surveys (CSUR), vol. 46, no. 2, p. 17, 2013.
[26] F. Calabrese, M. Colonna, P. Lovisolo, D. Parata, and C. Ratti, “Real-time urban monitoring using cell phones: A case study in rome,” IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 1, pp. 141–151, 2011.
[27] L. Sun, K. W. Axhausen, D.-H. Lee, and X. Huang, “Understanding metropolitan patterns of daily encounters,” Proceedings of the National Academy of Sciences, vol. 110, no. 34, pp. 13 774–13 779, 2013.
[28] J. Yuan, Y. Zheng, X. Xie, and G. Sun, “T-drive: Enhancing driving directions with taxi drivers’ intelligence,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, pp. 220–232, 2013.
[29] L. Chen, X. Ma, T.-M.-T. Nguyen, G. Pan, and J. Jakubowicz, “Understanding bike trip patterns leveraging bike sharing system open data,” Frontiers of Computer Science, vol. 11, no. 1, pp. 38–48, Feb 2017. [Online]. Available: https://doi.org/10.1007/s11704-016-6006-4
[30] Y. Zheng, F. Liu, and H.-P. Hsieh, “U-air: When urban air quality inference meets big data,” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013, pp. 1436–1444.
[31] X. Song, Q. Zhang, Y. Sekimoto, T. Horanont, S. Ueyama, and R. Shibasaki, “Modeling and probabilistic reasoning of population evacuation during large-scale disaster,” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013, pp. 1231–1239.
[32] H. A. Kiers, “Towards a standardized notation and terminology in multiway analysis,” Journal of chemometrics, vol. 14, no. 3, pp. 105–122, 2000.
[33] L. R. Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, 1966.
[34] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos, “A unified framework for providing recommendations in social tagging systems based on ternary semantic analysis,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 2, pp. 179–192, 2010.
[35] J. Tang, G.-J. Qi, L. Zhang, and C. Xu, “Cross-space affinity learning with its application to movie recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 7, pp. 1510–1519, 2013.
[36] V. W. Zheng, B. Cao, Y. Zheng, X. Xie, and Q. Yang, “Collaborative filtering meets mobile recommendation: A user-centered approach.” in AAAI, vol. 10, 2010, pp. 236–241.
[37] V. W. Zheng, Y. Zheng, X. Xie, and Q. Yang, “Towards mobile intelligence: Learning from gps history data for collaborative recommendation,” Artificial Intelligence, vol. 184, pp. 17–37, 2012.
[38] Y.-R. Lin, J. Sun, P. Castro, R. Konuru, H. Sundaram, and A. Kelliher, “Metafac: community discovery via relational hypergraph factorization,” in Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009, pp. 527–536.
[39] X. Liu, S. Ji, W. Glänzel, and B. De Moor, “Multiview partitioning via tensor methods,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 1056–1069, 2013.
[40] L. Chen, J. Jakubowicz, D. Yang, D. Zhang, and G. Pan, “Fine-grained urban event detection and characterization based on tensor cofactorization,” IEEE Transactions on Human-Machine Systems, vol. 47, no. 3, pp. 380–391, 2017.
[41] H. Fanaee-T and J. Gama, “Event detection from traffic tensors: A hybrid model,” Neurocomputing, vol. 203, pp. 22–33, 2016.
[42] Y. Cai, H. Tong, W. Fan, P. Ji, and Q. He, “Facets: Fast comprehensive mining of coevolving high-order time series,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 79–88.