Understanding Urban Dynamics Via Context-Aware Tensor Factorization with Neighboring Regularization
Jingyuan Wang, Junjie Wu, Ze Wang, Fei Gao, and Zhang Xiong

• J. Wang, Z. Wang and Z. Xiong are with the School of Computer Science and Engineering, Beihang University, Beijing 100191, China. E-mail: fjywang,ze.w,[email protected].
• J. Wu (corresponding author) is with the School of Economics and Management, Beihang University, Beijing 100191, China. E-mail: [email protected].
• F. Gao is with Microsoft Research Asia, Beijing, China.

arXiv:1905.00702v2 [cs.LG] 10 May 2019

Abstract: Recent years have witnessed the world-wide emergence of mega-metropolises with incredibly huge populations. Understanding residents' mobility patterns, or urban dynamics, thus becomes crucial for building modern smart cities. In this paper, we propose a Neighbor-Regularized and context-aware Non-negative Tensor Factorization model (NR-cNTF) to discover interpretable urban dynamics from urban heterogeneous data. Different from many existing studies concerned with prediction tasks via tensor completion, NR-cNTF focuses on gaining urban managerial insights from spatial, temporal, and spatio-temporal patterns. This is enabled by high-quality Tucker factorizations regularized by both POI-based urban contexts and geographically neighboring relations. NR-cNTF is also capable of unveiling long-term evolutions of urban dynamics via a pipeline initialization approach. We apply NR-cNTF to a real-life data set containing rich taxi GPS trajectories and POI records of Beijing. The results indicate: 1) NR-cNTF accurately captures four kinds of city rhythms and seventeen spatial communities; 2) the rapid development of Beijing, epitomized by the CBD area, indeed intensifies the job-housing imbalance; 3) the southern areas with recent government investments have shown a healthier development tendency. Finally, NR-cNTF is compared with some baselines on traffic prediction, which further justifies the importance of urban context awareness and neighboring regularization.

Index Terms: Urban Dynamics, Tensor Factorizations, Urban Planning, Spatio-Temporal Pattern, GPS Trajectory

1 INTRODUCTION

As reported by the World Bank¹, at the end of 2016 more than 53% of the world's population, i.e., about 3.7 billion people, lived in cities; about 36 mega-metropolises worldwide had a population of more than 10 million. Huge urban populations bring great challenges such as traffic jams, educational/medical resource scarcity, environmental pollution, etc. Understanding the behavioral patterns of residents in a city, or urban dynamics for short, therefore becomes an important and urgent demand for urban planning and public policy making from a smart city perspective. Fortunately, the widely adopted mobile crowd sensing (MCS) technologies [1], such as GPS, mobile phones, and location-based services, give us an unprecedented opportunity to access enormous and perhaps unbounded human mobility data, which combined with urban infrastructure data offer a “rich ore” for the discovery of urban dynamics.

1. http://data.worldbank.org/

In general, mining urban dynamics from MCS data has three requirements. The first one is to model multi-source heterogeneous data, which consist of mobility records of residents such as the origins and destinations, the travel time, the purposes, and the surroundings hidden in different data sources such as GPS trajectories, urban contexts, and city maps. The second requirement is to capture long-term evolutions, which is critically important for urban planners to understand the evolving rules of cities so as to make proper urban plans. The last one is to find urban dynamics with good interpretability: an obscure urban dynamic is useless for decision making in real-world application scenarios. Despite the rich literature on applying matrix/tensor factorizations to model urban heterogeneous data, most of these studies aim to generate patterns that improve the predictive accuracy of traffic volumes [2], [3], [4], but leave pattern explanation to luck. Only recently have a few works begun to take the understanding of urban dynamics as the primary research task. Representative ones include the earlier rNTD model using Tucker factorizations [5], the city spectrum modeling using CP factorizations [6], and others that use single-source data [7], [8], [9] or discover urban functional zones only [10], [11]. These excellent works, however, cannot meet all the above-mentioned requirements simultaneously.

In this paper, we propose a Neighbor-Regularized context-aware Non-negative Tensor Factorization model (NR-cNTF) to discover explainable and evolving urban dynamics from multi-source heterogeneous urban data. In the NR-cNTF model, we introduce the concepts of data space and pattern space and describe the relations between urban data and urban dynamics. The Tucker factorization is then introduced with the POI-based (Point-Of-Interest) urban contexts to factorize the ODT (Origin-Destination-Time) tensor into spatial, temporal, and spatio-temporal patterns of good interpretability. Moreover, a neighboring regularization that incorporates geographically neighboring relations is introduced into our model to further improve the explainability of spatial patterns. Finally, a simple yet effective pipeline initialization approach is designed to capture the long-term evolutions of urban dynamics.

We conduct extensive experiments on a real-life data set that contains the GPS trajectories of over 20,000 taxis and over 400,000 POI records of Beijing from 2008 to 2015. The first scenario of the experiments is to verify the ability of NR-cNTF to disclose true urban dynamics and to obtain managerial insights via NR-cNTF. The results indicate that: 1) NR-cNTF accurately captures four kinds of mobility rhythms and seventeen spatial communities of Beijing; 2) the rapid development of Beijing in the CBD area indeed comes at the expense of a more severe job-housing imbalance and is therefore unsustainable in the long run; 3) the southern areas of Beijing are experiencing unprecedented growth with the recent government investments, and most importantly they have shown a healthier development tendency. The second scenario of the experiments is to test the prediction power of NR-cNTF, which is compared with some baselines on traffic prediction. The results demonstrate the superiority of NR-cNTF in tensor completion, which further justifies the importance of adopting urban contexts and neighboring regularization in NR-cNTF.

2 PROBLEM FORMULATION

In this section, we formulate urban dynamics discovery as a context-aware tensor factorization problem. Table 1 lists the math variables to be used, which are divided into two categories, i.e., data-space variables and pattern-space variables, according to their observability. Variables in the data space are observable from real-world human mobility, while variables in the pattern space are latent but crucial for understanding urban dynamics.

TABLE 1
Notation Definition

Space           Variable                           Definition
Data Space      $\mathcal{R}$                      the data tensor
                $r_{xyz}$                          the $(x, y, z)$ element of $\mathcal{R}$
                $W$                                the urban context matrix
                $w_{pq}$                           the $(p, q)$ element of $W$
Pattern Space   $\mathcal{C}$                      the pattern tensor
                $c_{ijk}$                          the $(i, j, k)$ element of $\mathcal{C}$
                $O, D, T$                          the pattern projection matrices
                $o_x, d_x, t_x$                    the $x$-th row vectors of $O, D, T$
                $o_{:i}, d_{:i}, t_{:i}$           the $i$-th column vectors of $O, D, T$
                $o_{xi}, d_{xi}, t_{xi}$           the $(x, i)$ elements of $O, D, T$

... element of $W$, i.e., $w_{pq}$, is a coefficient that describes the similarity between urban zones $p$ and $q$ using, e.g., points of interest (POI) data.

Pattern-space variables: The variables in pattern space include a core tensor and three pattern projection matrices. Assume there are $I$ origin spatial patterns (OSP), $J$ destination spatial patterns (DSP), and $K$ temporal patterns (TP) hidden inside the data tensor $\mathcal{R}$. We define $O \in \mathbb{R}^{M \times I}$ as a spatial projection matrix that projects $M$ origin zones into $I$ OSPs. Similarly, $D \in \mathbb{R}^{M \times J}$ is defined as another spatial projection matrix that projects $M$ destination zones into $J$ DSPs. The matrix $T \in \mathbb{R}^{N \times K}$ is a temporal projection matrix that projects $N$ time slices to $K$ TPs. The elements of $O$, $D$ and $T$ are denoted as $o_{xi}$, $d_{yj}$ and $t_{zk}$, respectively, indicating the projection intensities from the urban zones $x$, $y$ and time slice $z$ to OSP $i$, DSP $j$ and TP $k$, where $1 \le i \le I$, $1 \le j \le J$, $1 \le k \le K$. We define a third-order tensor $\mathcal{C}$ as a core tensor that describes the dynamics of resident travels among temporal and spatial patterns. The $(i, j, k)$ element of $\mathcal{C}$, i.e., $c_{ijk}$, denotes the intensity of resident travels from OSP $i$ to DSP $j$ within TP $k$.

2.1 Construction of Data Tensor

We here explain how to construct the data tensor $\mathcal{R}$ using real-life GPS trajectory data of Beijing taxis. To this end, we first segment the Beijing city map into $M$ urban zones. In the literature, quite a few methods, including grid-based, morphology-based, road-network-based, and administrative-boundary-based methods [12], [13], can fulfill this task. Here we adopt a Traffic Analysis Zones (TAZ) map provided by Beijing Municipal Committee of Transport² to segment Beijing into $M = 651$ zones. Finally, since resident behaviors in city life are often cyclical every day, we divide one day into $N = 24$ time slices (one hour per slice). The above procedure determines the three modes of $\mathcal{R}$.

We then compute the element values of $\mathcal{R}$. Note that the taxi GPS data are often organized as a set of quintuples of the form $\langle vid, time, longitude, latitude, state \rangle$, where $vid$ is the unique ID of a taxi, $(longitude, latitude)$ is the location of the taxi, and $state$ indicates whether the taxi is carrying any passengers at time $time$. We first obtain all taxi-based passenger travels by removing the records with the “no passengers” state. Then an origin-destination-time (ODT) record is constructed for each travel by picking up the first and last records of the travel and then extracting the origin