
Computers, Environment and Urban Systems 60 (2016) 79–88

Contents lists available at ScienceDirect

Computers, Environment and Urban Systems

journal homepage: www.elsevier.com/locate/ceus

Understanding operation behaviors of taxicabs in cities by matrix factorization

Chaogui Kang a,⁎, Kun Qin a,b,⁎⁎
a School of Remote Sensing and Information Engineering, Wuhan University, 129 Luoyu Road, Wuhan, Hubei 430079, China
b Collaborative Innovation Center of Geospatial Technology, 129 Luoyu Road, Wuhan, Hubei 430079, China

Article info
Article history:
Received 4 February 2016
Received in revised form 2 August 2016
Accepted 4 August 2016
Available online 21 August 2016

Keywords:
mobility
demand-supply
Habitual operation behavior
Digital trace
Matrix factorization

Abstract
Taxicabs play significant roles in public transport systems in large cities. To meet the repetitive demands of daily intra-urban travels, cabdrivers acquire self-organized habitual operation behaviors in space and time, largely with the assistance of their longitudinal operating experience. Recognizing those collective operation behavior patterns of taxicabs will enable us to better design and implement services and urban development plans. In this paper, we systematically study patterns of the spatial supply of 6000+ taxicabs in Wuhan, China, based on a monthly collection of their digital traces and the non-negative matrix factorization method. We successfully identify a set of high-level statistical features of the spatial operation behaviors of taxicabs in Wuhan, providing valuable insights into the demand and supply of taxicabs in (similar) large cities. First, we decouple several spatially cohesive regions with intensive internal taxicab travels (termed demand regions), which intuitively reveal the well-known multi-sectored urban configuration of Wuhan. Second, by applying non-negative matrix factorization to taxicabs' longitudinal traces, we uncover remarkably self-organized operation patterns of cabdrivers in space (termed supply regions) as reactions to the sectored distribution of daily travel behaviors. We find that a large proportion of cabdrivers frequently operate within a single specific service area and a small proportion of taxicabs work as shifting tools between different service areas. Last, we focus on the performances of taxicabs with distinct spatial operation behaviors and unveil their statistical characteristics in terms of frequency, duration and distance with passengers on board. Our work demonstrates the great potential of cabdrivers' collective intelligence for understanding and improving urban mobility and public transport systems.
© 2016 Elsevier Ltd. All rights reserved.

⁎ Corresponding author.
⁎⁎ Correspondence to: K. Qin, Collaborative Innovation Center of Geospatial Technology, 129 Luoyu Road, Wuhan, Hubei 430079, China.
E-mail addresses: [email protected] (C. Kang), [email protected] (K. Qin).

http://dx.doi.org/10.1016/j.compenvurbsys.2016.08.002
0198-9715/© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

Taxicabs play significant roles in public transport systems in large cities. It has been widely reported worldwide that intra-city travels carried by taxicabs can account for more than 10% of total daily travels in a large city. To meet intensive and repetitive demands of daily intra-urban travels (Huff & Hanson, 1986; Eagle & Pentland, 2009), cabdrivers might acquire self-organized habitual operation behaviors in space and time, with the assistance of their longitudinal operating experience and their cognitive map of the urban built environment. Recognizing those collective and habitual operation behaviors of taxicabs will enable us to rationally design and implement public transport services and urban configuration (Castro, Zhang, Chen, Li, & Pan, 2013). In recent years, the prevalent deployment of vehicle-embedded GPS devices in taxi service and the easy public accessibility of relevant data (Farber, 2015) have led to a burst of studies on taxi/passenger mobility and provided a promising means of empirically inspecting the spatio-temporal patterns of movement behaviors of a massive volume of taxicabs in various city settings.

In this paper, we systematically study the spatial patterns of labor-supply behaviors of 6000+ taxicabs in Wuhan, China, based on a monthly collection of their digital traces and the non-negative matrix factorization method. Our primary research questions are as follows: (1) Do taxicabs' habitual operation behaviors depend on the spatial distribution of intra-urban travel demands, and if so, how? (2) How do the performances of taxicabs with distinct spatial habitual behaviors differ? In answering these questions, we identify a set of high-level statistical features of the spatial operation behaviors of taxicabs in the case study city, which provide valuable insights into the demand and supply of taxicabs in (similar) cities. First, we decouple several spatially cohesive regions with intensive internal taxicab travels, which intuitively reveal the well-known multi-sectored urban configuration of Wuhan. Second, by applying non-negative matrix factorization to taxicabs' longitudinal digital traces, we find that cabdrivers' operation patterns are remarkably self-organized in space as reactions to the sectored distribution of daily intra-urban travel behaviors. A large proportion of cabdrivers frequently operate within a single specific service area (i.e., supply region), and a small proportion of taxicabs work as shifting travel tools between (two) different supply regions. Last, we focus on the performances of taxicabs with distinct spatial operation behaviors and reveal their statistical characteristics in terms of frequency, duration and distance with passengers on board.

The major contributions of our work are as follows: (1) We unveil the relationship between longitudinal intra-urban travel demands and taxicabs' self-organized habitual behaviors; (2) We develop a matrix-factorization-based analytical framework to quantify, visualize and evaluate taxicabs' operation behavior patterns in space. Our work also demonstrates the great potential of cabdrivers' collective intelligence for understanding and improving urban mobility and public transport systems.

2. Related work

The continued development of urban computing and human mobility studies has accumulated a rich body of evidence on the descriptive characteristics of intra-urban travel activities. Taxi driving trajectories, a prevailing data source in related work, capture at least three major facets of human travel behaviors: (1) General mobility patterns of city dwellers, represented by passengers' daily usage of taxicabs (Section 2.1); (2) Operation behavior patterns of cabdrivers, represented by hunting, waiting and shifting for profit maximization (Section 2.2); (3) Structural properties of the intra-urban spatial interaction network derived from taxi trip origins and destinations (Section 2.3).

2.1. Passengers' mobility patterns

Patterns of city dwellers' travel behaviors have longstanding importance in time geography, transportation and urban studies. As early as the 1970s, Hägerstrand proposed his initial space-time framework for representing individuals' travel behaviors (Hägerstrand, 1970). Since then, a wide-reaching research community has grown up around individuals' travel activities in space and time. In the last decade, the rapid development of human dynamics and statistical physics has further refined and enriched the research topics and methodologies in this field (González, Hidalgo, & Barabasi, 2008). Along with cellphone and social media data, taxicabs' digital traces have contributed substantively to our understanding of individual and collective human mobility patterns. Existing literature, in some ways, regards the travel behavior patterns of taxicab passengers as a proxy of general intra-urban human mobility patterns.

From this starting point, a set of valuable features of taxi passengers' mobility patterns has been discovered. Research in this strand focuses on the following: (1) The spatio-temporal distributions of trip origins and destinations. An important research question is where and to what extent taxi pick-ups (i.e., trip origins) and drop-offs (i.e., trip destinations) concentrate in spatial and temporal dimensions (Chen et al., 2013; Hu, Miller, & Li, 2014b). Further investigations can involve the correlation between the temporal variations of taxi origin-destination intensity and the underlying land use characteristics at specific locations (Liu, Wang, Xiao, & Gao, 2012b), as well as the temporal variations of taxi trip volume between each single pair of originating and destinating locations (Liu, Kang, Gong, & Liu, 2016); (2) The distance decay effects of collective travel activities. Statistical tools and models for quantifying the distance decay effect of travel behaviors have been thoroughly tested. However, debate continues between those in favor of the exponential law (Liang, Zheng, Lv, Zhu, & Xu, 2012) and those in favor of the power law (i.e., Lévy flight) (Jiang, Yin, & Zhao, 2009). To uncover the underlying mechanisms of the observed patterns, a couple of further studies focus on the topological structures of road networks (Gallotti, Bazzani, Rambaldi, & Barthelemy, 2015) and the spatial attractions of land units (Liu, Kang, Gao, Xiao, & Tian, 2012a) that could constrain and boost city dwellers' traveling; (3) The regularity of activity-venue and travel-route selections. Studies in this line use statistical quantities, such as cyclical rhythm (Liu et al., 2012b) and spatio-temporal entropy (Peng, Jin, Wong, Shi, & Liò, 2012), to describe the high predictability of taxi trip origins and destinations in cities. Strong evidence shows that the number of frequently visited venues and routes in individual dwellers' traveling is astonishingly small, e.g., an average of four venues and two routes. These studies shed light on the repetitive daily routines and the highly limited activity budgets of city dwellers.

2.2. Cabdrivers' mobility patterns

For the sake of improving taxi service efficiency, another strand of studies heavily emphasizes cabdrivers' operation strategies for gaining optimal (e.g., minimal and maximal) profit (Liu, Andris, & Ratti, 2010). For most cabdrivers, there are three key decisions to make while working: (1) Where to search for the next passenger by cruising (i.e., hunting behavior)? Previous literature has reported several well-received searching strategies based on hotspot detection (Chang, Tai, & Hsu, 2009), cabdrivers' historical performances (Li et al., 2011), and time-series-based prediction (Moreira-Matias, Gama, Ferreira, & Damas, 2012) of travel demands. Hunting behavior should also take the cost of energy (as a proxy of cruising distance) and time (as an indicator of traffic condition) into consideration (Yuan, Zheng, Zhang, & Xie, 2013); (2) If not hunting, where to wait for potential passengers, and how long to wait (i.e., waiting behavior)? This operation behavior is strongly intertwined with the hunting strategy. In most circumstances, cabdrivers would choose to wait at, or near, the previous drop-off location if the current location is a hotspot with massive potential passengers (Li et al., 2012) or the cost of hunting the next passenger is too high. It is also noteworthy that certain literature (Li et al., 2011) argues that hunting is a dominant strategy over waiting for most cabdrivers in cities; (3) When and where to interrupt regular taxi working behaviors? This problem might involve cabdrivers' refueling, resting, shifting and ceasing behaviors. Economists have developed several profound theories, including income reference dependence (Camerer, Babcock, Loewenstein, & Thaler, 1997) and gain-loss utility (Farber, 2015), to explain such daily labor supply decisions of cabdrivers. Urban researchers focus more on the optimization of urban infrastructures (such as the allocation of gas stations and parking lots) and taxicab dispatch services (Zhang, Yuan, Wilkie, Zheng, & Xie, 2015). To the best of our knowledge, existing studies show more intensive interest in cabdrivers' passenger hunting and waiting behaviors than in other operation behaviors.

2.3. Structures of taxi-trace-based spatial network

From a high-level point of view, geographers and regional scientists usually simplify taxicabs' digital traces into an origin-destination matrix or a set of location visitation sequences. In either transformation mode (i.e., matrix or sequence), the result can be regarded as a quantification of the spatial connections between different locations within a city. In the notation of graph theory, we can derive (taxi-trace-based) spatial networks from taxicabs' digital traces by treating locations as nodes and taxi passenger travels as edges linking a pair of originating and destinating nodes. The structural properties of the resulting network, such as centrality and community structure, are closely related to a couple of key topics in geography and transportation, such as traffic distribution (Gao, Wang, Gao, & Liu, 2013), congestion propagation (Wang, Lu, Yuan, Zhang, & Van De Wetering, 2013) and regionalization (Goddard, 1970). In particular, with advances in complex network science, community detection analysis (Fortunato, 2010) has become a powerful technique for exploring taxi-trace-based spatial networks.

Depending on the mode of transformation discussed above, two kinds of spatial communities have been repeatedly recognized from taxi-trace-based spatial networks in existing literature. For the trip origin-destination matrix, most studies find that a spatial community represents contiguous zones with frequent intra-zone taxi travels and limited inter-zone taxi travels (Kang, Sobolevsky, Liu, & Ratti, 2013). It is therefore a clear delineation of the multi-sectored spatial patterns of taxi demands within a city. For location visitation sequences, a spatial community represents a cluster of nearby zones that are frequently visited by a group of taxicabs with similar traveling patterns in space (Guo, Liu, & Jin, 2010). In this sense, it is a descriptive indicator of the different service zones of taxicabs, or in other words, of different spatial patterns of taxi supply in the city. Note that any mismatch between taxi demand and supply becomes self-evident when both kinds of spatial communities are mapped simultaneously for the same city. Recent advances in related studies also include extending the network as time-evolving to reveal community dynamics (Wang, Gao, Cui, Li, & Xiong, 2014), applying multiple-level network partitioning to inspect urban functional zones (Liu, Gong, Gong, & Liu, 2015), and developing universal knowledge and unified theories to model underlying mechanisms (Yue, Lan, Yeh, & Li, 2014).

Nonetheless, despite significant advances in taxi mobility analysis, we argue that there exists a wide gap between the understanding of taxi demands (i.e., passengers' mobility) and the understanding of taxi supplies (i.e., cabdrivers' mobility) from a spatial perspective. Previous studies focus on either spatial patterns of travel demands or spatial patterns of taxi labor supplies. However, the interplay between the two in space and time is still quite unclear. In particular, our knowledge of how cabdrivers' collective intelligence gives rise to an equilibrium between travel demands and the labor supply of taxicabs within cities is substantially limited.

3. Spatial characteristics of the labor supply of taxicabs

In this paper, we utilize a monthly collection of digital traces from 6023 taxicabs in the City of Wuhan, China, from May 1, 2014 to May 31, 2014 for empirical analysis of taxi demand-and-supply patterns. Each digital GPS point log accurately records the spatial location (longitude and latitude), timestamp, operation status (vacant or occupied), driving direction and spot speed of a given taxicab. All of these taxicabs worked continuously (with pick-ups and drop-offs) on every single day of the study period, i.e., 31 days. Based on the pick-up and drop-off records, we obtain 7,627,754 taxi passenger trips and then assign each trip origin-destination pair $\{(\mathrm{longitude}, \mathrm{latitude})_O, (\mathrm{longitude}, \mathrm{latitude})_D\}$ to a pair of associated districts $\{\mathrm{district}_O, \mathrm{district}_D\}$ among the 36 districts within the study area (refer to Fig. 1a and Table 1). Note that the statistical characteristics of these taxi origin-destination trips (see Fig. 6 in Section S1) match findings reported in previous literature well, e.g., the log-normal distribution of individual trip durations (Wang, Pan, Yuan, Zhang, & Liu, 2015) and the exponential distribution of trip displacement distances (i.e., Euclidean distances between origins and destinations) (Liang et al., 2012).

3.1. Decoupling demand regions of taxicabs

Taking districts as nodes, pairs of associated districts as edges and the frequencies of the corresponding district pairs as edge weights, we obtain a network $N = (V, E)$ with $|V| = 36$ nodes and $|E| = 1296$ edges. From the skeleton of the network shown in Fig. 1b and c, we identify a small set of nodes {〈2: “Wuchang”〉, 〈5: “Wuluo”〉, 〈8: “Hankou”〉, 〈11: “Jianshe”〉, 〈19: “Zhongjiacun”〉, 〈32: “Luxiang”〉} that concentrate a large proportion of taxi passenger trips (approximately 62.8%) within the study area. This evidence is highly consistent with the fact that Hankou and Wuchang are the two largest centers of commercial facilities and population in Wuhan. We further apply community detection on network $N$ and identify 6 spatially cohesive regions, yielding an optimal modularity $Q = 0.74$. A detailed introduction to modularity-based community detection in networks is given in the literature (Fortunato, 2010).

(a) Zoning map of Wuhan (b) Skeleton of taxi network
(c) One-dimensional representation of taxi network and community structure

Fig. 1. Descriptive information of Wuhan, China and its taxicab service. There are 36 districts within the study area, as shown in (a), and the name of each district is listed in Table 1. Note that the dotted lines are rivers that separate Hankou, Wuchang and Hanyang. Based on the digital traces of 6023 taxicabs during 31 consecutive days, we obtain 7,627,754 individual origin-destination trips. We then build a spatial network by taking each district as a node and each taxi trip as an edge connecting the originating district and the destinating district. By converting the network into weighted form and filtering out edges with weights < 8000 and < 1000, respectively, we obtain the skeleton of the entire taxi network in (b) and its one-dimensional representation in (c). Note that we rescale the edge weights of the network and use them as edge widths for visualization. We also label the nodes with 6 different colors based on their community assignment, which yields a maximal modularity as high as 0.74.

In general, we interpret the discovered communities (see Table 2) with background information on the study area:
(1) Community (i.e., node set) C1 exactly matches the area of Prefecture Hanyang within the case study area. There might be two reasons for this phenomenon: the relatively underdeveloped status of Hanyang, and the effect of two large rivers (see dotted lines in Fig. 1a) that separate Hanyang from Hankou and Wuchang.
(2) Community C2 is, in a sense, isolated from the other districts, because some taxicabs are restricted to operating within the Dongxihu area by transportation policy. As a result, there are significantly more taxi travels within this district than between it and any other district.
(3) Community C3 covers the majority of the south-eastern part of the city, with a significant hub district 〈32: “Luxiang”〉. Within this area, Guanggu is becoming a rapidly rising sub-center of Prefecture Wuchang as well as of the entire City of Wuhan. Note that Guanggu is also famously known as the “Optic Valley” of China, a major high-tech and electronic industry center.
(4) Community C4 contains the largest (high-speed) rail station of Wuhan (i.e., Wuhan Rail Station) at the east corner of 〈7: “Yangchunhu”〉, which serves as a primary “source-sink” area of taxi trips.
(5) Community C5 covers the historical districts of Prefecture Wuchang, showing relatively homogeneous interactions among its districts and a spread toward the south-west along major radial roads between 〈2: “(Central) Wuchang”〉 and 〈31: “Qinglinghu”〉 (see Fig. 7b in Section S1).
(6) Community C6 delineates the core area of Prefecture Hankou, which is also the central downtown of the case study city. Within this community, the district 〈8: “Hankou”〉 is the hub node that absorbs the majority of taxi traffic. It is in fact a core business district along the Yangtze River.

Based on the above analysis, we conclude that the travel demands within the case study city demonstrate a remarkably multi-sectored pattern in space. We also reveal additional hidden patterns that were not easily perceived before, i.e., the isolated community (i.e., district 18) due to specific transportation policy, and the rise of Guanggu (i.e., district 32) as the new center of Prefecture Wuchang. An immediate follow-up question is therefore how taxicabs operate, collectively and in a self-organized way, to meet these sectored travel demands. To answer this question, we explore the visitation sequence across different districts for each taxicab and aim to recognize its habitual operation area (which we call a “supply region”).

3.2. Detecting supply regions of taxicabs

We define a visitation sequence $S = \{\mathrm{district}_{t_1}, \mathrm{district}_{t_2}, \dots, \mathrm{district}_{t_n}\}$ as the sequence of consecutive taxi pick-up locations at the granularity of districts, since the focus of our analysis is taxi supply service. By doing so, we further obtain a $6023 \times 36$ matrix $M_{|T|\times|V|}$, where $T$ is the set of 6023 taxicabs, denoting the visitation frequency of each district for each taxicab. Statistics of the monthly performances of these taxicabs in terms of number of pick-ups, total hours with passengers on board and sum of passenger-carrying distances are shown in Fig. 2. Generally speaking, all three performance indicators visually follow an (approximately) normal distribution over the entire population of taxicabs, although an exact normal fit is not statistically supported (p-value ≈ 0). Similar empirical findings have been widely reported in previous studies (Hu, An, & Wang, 2014a). It is also noteworthy that, in practice, exactly normal distributions are observed only for certain quantities in physics, whereas many complex systems follow approximately normal distributions.

To identify taxicabs' habitual operation patterns (i.e., supply regions), we apply non-negative matrix factorization (Lee & Seung, 1999) to the visitation frequency matrix $M$. Non-negative matrix factorization is well known as a robust technique for detecting hidden patterns in data matrices whose entries cannot be negative, as in our case study, since a visitation frequency is always greater than or equal to zero. Moreover, the spatial distributions of taxi trip origins and destinations are a combination of visitations arising from different processes, which together produce the overall distribution captured in the dataset. Matrix factorization enables us to decouple these different processes, to cluster the objects (or attributes) of a dataset, and to force representations of the data in terms of a small number of substructures, in settings where mainstream data-mining techniques are ineffective (Skillicorn, 2007).
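Section 3.1 reports an optimal modularity of Q = 0.74 for the six demand regions but does not spell out how Q is computed. As a minimal sketch, assuming the directed district-to-district trip counts are symmetrized into an undirected weighted network (A = OD + ODᵀ, with intra-district trips kept as self-loops), the Newman-Girvan modularity of a given partition can be evaluated as below. The function name `modularity` and the toy two-cluster network are illustrative assumptions; the sketch only scores a partition and is not the detection algorithm the authors ran.

```python
def modularity(od, labels):
    """Newman-Girvan modularity Q of a partition of a weighted OD network.

    od[i][j]:  number of trips from district i to district j (directed counts)
    labels[i]: community id assigned to district i
    Counts are symmetrized as A = od + od^T, so intra-district trips on the
    diagonal enter twice, matching the usual self-loop convention.
    """
    n = len(od)
    a = [[od[i][j] + od[j][i] for j in range(n)] for i in range(n)]
    strength = [sum(row) for row in a]   # weighted degree k_i
    two_m = sum(strength)                # 2m: total edge weight, counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected under the null model
                q += a[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


# Toy example (hypothetical data): two 3-district clusters with no trips between them.
od = [[0] * 6 for _ in range(6)]
for cluster in ([0, 1, 2], [3, 4, 5]):
    for x in cluster:
        for y in cluster:
            if x < y:
                od[x][y] = 1

print(round(modularity(od, [0, 0, 0, 1, 1, 1]), 6))  # 0.5
```

For the study area, `od` would be the 36 × 36 matrix of assigned trips and `labels` the community of each district; a detection method (e.g., a greedy or Louvain-style optimizer) would search over `labels` to maximize this quantity.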
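The monthly performances summarized in Fig. 2 are binned by percentile into outlier, low-, ordinary- and high-performed taxicabs. A minimal sketch of that binning for one indicator follows, using the standard-library `statistics.quantiles` to get the 1st, 15th, 85th and 99th percentile cut points. The function name, the default ("exclusive") quantile method and the handling of values exactly on a cut point are assumptions of this sketch, not details given in the paper.

```python
from collections import Counter
from statistics import quantiles


def performance_categories(values):
    """Label each taxicab by the percentile band of one performance indicator.

    Bands follow the Fig. 2 caption: below the 1st or above the 99th percentile
    -> outlier; 1st to 15th -> low; 15th to 85th -> ordinary; 85th to 99th -> high.
    Which band a value lying exactly on a cut point joins is an assumption here.
    """
    cuts = quantiles(values, n=100)  # 99 cut points: 1st ... 99th percentile
    p1, p15, p85, p99 = cuts[0], cuts[14], cuts[84], cuts[98]
    labels = []
    for v in values:
        if v < p1 or v > p99:
            labels.append("outlier")
        elif v < p15:
            labels.append("low-performed")
        elif v <= p85:
            labels.append("ordinary-performed")
        else:
            labels.append("high-performed")
    return labels


# Hypothetical monthly pick-up counts for 1000 taxicabs.
pickups = list(range(1, 1001))
print(Counter(performance_categories(pickups)))
```

Applied separately to the number of pick-ups, hours with passengers on board and carrying distance, this yields one label per taxicab per indicator.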
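Section 3.2 turns each taxicab's pick-up sequence into a row of the visitation frequency matrix $M$ and then factorizes $M$ with non-negative matrix factorization (Lee & Seung, 1999). The pure-Python sketch below shows both steps under stated assumptions. It uses Lee and Seung's multiplicative updates for the squared Frobenius loss, although the paper does not say which NMF objective or software it used. It also relies on random initialization and on hypothetical helper names (`visitation_matrix`, `nmf`). It illustrates the technique and does not reproduce the published factorization.

```python
import random
from collections import Counter


def visitation_matrix(sequences, districts):
    """Rows: taxicabs; columns: districts; entries: number of pick-ups there."""
    rows = []
    for seq in sequences:
        counts = Counter(seq)
        rows.append([counts.get(d, 0) for d in districts])
    return rows


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def frobenius_error(m, w, h):
    wh = matmul(w, h)
    return sum((m[i][j] - wh[i][j]) ** 2
               for i in range(len(m)) for j in range(len(m[0]))) ** 0.5


def nmf(m, k, iters=300, seed=0, eps=1e-12):
    """Approximate m (n x p, non-negative) as W (n x k) times H (k x p).

    Lee-Seung multiplicative updates for ||M - WH||_F^2; entries stay
    non-negative because each update multiplies by a non-negative ratio.
    """
    rng = random.Random(seed)
    n, p = len(m), len(m[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(p)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, m), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(p)] for a in range(k)]
        ht = transpose(h)
        num, den = matmul(m, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h


# Hypothetical pick-up sequences (district ids) for four taxicabs.
sequences = [[8, 10, 8], [10, 8, 8, 10, 8, 8], [32, 33, 32], [33, 32, 32, 33, 32, 32]]
m = visitation_matrix(sequences, districts=[8, 10, 32, 33])
print(m)  # [[2, 1, 0, 0], [4, 2, 0, 0], [0, 0, 2, 1], [0, 0, 4, 2]]
w, h = nmf(m, k=2)
print(frobenius_error(m, w, h))  # final reconstruction error
```

In the paper's notation, the rows of `h` play the role of the dictionary matrix $H$ (spatial patterns over districts), and the rows of `w` give each taxicab's affinity for those patterns. Taking the largest entry of each row of `w` is one simple way to assign taxicabs to groups.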
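The factoring rank $K$ is chosen with the cophenetic correlation coefficient (Brunet, Tamayo, Golub, & Mesirov, 2004). For a candidate $K$, the factorization is run several times, and each run assigns every district to a cluster. The consensus entry for a pair of districts is the fraction of runs that place them together, which is what the consensus map displays. The coefficient is the Pearson correlation between the distances 1 − consensus and the cophenetic distances from average-linkage hierarchical clustering of those distances. The sketch below computes this from a list of per-run labels. The function names and toy labels are illustrative assumptions, and the NMF runs themselves are not included.

```python
from itertools import combinations
from math import sqrt


def consensus_matrix(runs):
    """Fraction of runs in which items i and j receive the same cluster label."""
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[counts[i][j] / len(runs) for j in range(n)] for i in range(n)]


def average_linkage_cophenetic(dist):
    """UPGMA (average linkage) clustering; returns cophenetic distances."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pair_sum = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = pair_sum / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # every pair split across the two merged clusters first meets at height d
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = d
        clusters[a] = clusters[a] + clusters.pop(b)
    return coph


def cophenetic_correlation(runs):
    """Pearson correlation between 1 - consensus and the cophenetic distances.

    Undefined (division by zero) if all consensus entries are equal.
    """
    c = consensus_matrix(runs)
    n = len(c)
    dist = [[1.0 - c[i][j] for j in range(n)] for i in range(n)]
    coph = average_linkage_cophenetic(dist)
    pairs = list(combinations(range(n), 2))
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical district labels from three NMF runs at one candidate rank K.
stable = [[0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2], [2, 2, 1, 1, 0, 0]]
unstable = [[0, 0, 1, 1, 2, 2], [0, 1, 1, 2, 2, 0], [0, 1, 0, 2, 1, 2]]
print(round(cophenetic_correlation(stable), 6))  # 1.0
print(cophenetic_correlation(unstable) < 1.0)    # True
```

Runs that always recover the same partition (up to label permutation) give a coefficient of 1. Repeating this for each candidate $K$ and comparing the coefficients is the kind of curve summarized in Fig. 3a.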

Table 1
Name and index of the 36 districts within the case study area. We adopt a zoning system based on both administrative and functional divisions of the city. Due to population concentration, the spatial coverage of central districts is much smaller than that of peripheral districts.

1 Jiyuqiao | 2 Wuchang | 3 Xudong | 4 Zhongnan | 5 Wuluo | 6 Qingshan
7 Yangchunhu | 8 Hankou | 9 Baofeng | 10 CBD | 11 Jianshe | 12 Erqi
13 Gutian | 14 Yangchahu | 15 Houhu | 16 Changqing | 17 Jingyinhu | 18 Wujianshan
19 Zhongjiacun | 20 Sixin | 21 Wangjiawan | 22 Zhuankou | 23 Houguanhu | 24 Caidian
25 Baiyushan | 26 Huashan | 27 Panglong | 28 Tianhe | 29 Baishazhou | 30 Nanhu
31 Qinglinghu | 32 Luxiang | 33 Guanshan | 34 High-tech zone | 35 Tansunhu | 36 Donghu

Table 2
Discovered demand regions in the taxi network. We label the regions from 1 to 6 in the order of the clusters of districts from left to right in the one-dimensional taxi network (see Fig. 1c). The description summarizes the overall coverage of each region with background information.

Region | Districts | Description
C1 | 19, 20, 21, 22, 23, 24 | Core area of Prefecture Hanyang
C2 | 18 | Dongxihu area with its own taxicabs
C3 | 26, 32, 33, 34, 35, 36 | The high-tech center “Optic Valley”
C4 | 6, 7, 25 | High-speed rail station of Wuhan
C5 | 1, 2, 3, 4, 5, 29, 30, 31 | Historical area of Prefecture Wuchang
C6 | 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 27, 28 | Core area of Prefecture Hankou

Mathematically, a non-negative matrix factorization $M_{|T|\times|V|} \approx W_{|T|\times K} \times H_{K\times|V|}$ aims to minimize the difference (in other words, to maximize the approximation) between the left-hand side and the right-hand side of the equation under non-negativity constraints. Here $K$ is the optimal factoring rank, $W$ is often called the coefficient matrix and $H$ is often called the dictionary matrix. The key step of non-negative matrix factorization is to determine the optimal factoring rank $K$ (satisfying $K \ll |T|$). To date, several mature rank selection criteria, such as the cophenetic correlation coefficient (Brunet, Tamayo, Golub, & Mesirov, 2004) and the RSS (residual sum of squares) curve (Hutchins, Murphy, Singh, & Graber, 2008), have been developed. In this study, we adopt the cophenetic correlation coefficient criterion and find that rank $K = 6$ yields the best decomposition of matrix $M$ (see Fig. 3a). The corresponding consensus map (Fig. 3b) indicates the probability that a pair of districts is assigned to the same cluster over multiple trials. It illustrates 6 separate blocks (i.e., clusters of districts), each frequently visited by a distinct group of taxicabs. Note that the factorization rank $K$ is, in principle, identical to both the number of blocks in the consensus map and the number of clusters of taxicabs. By definition, these blocks are our recognized supply regions of taxicabs in the case study city. Comparing the blocks $D = \{D_1, D_2, D_3, D_4, D_5, D_6\}$ with the communities $C = \{C_1, C_2, C_3, C_4, C_5, C_6\}$, we find a relation satisfying $D \cap C \approx C \approx D$. We thus conclude that taxicabs operate in response to the inherent sectored structure of travel demands remarkably well, perhaps based on their longitudinal experience of picking up and carrying passengers.

Under closer scrutiny, we visualize the dictionary matrix $H_{6\times36}$ and the coefficient matrix $W_{6023\times6}$ in Fig. 3c and d, respectively. For the dictionary matrix, we interpret the 6 spatial patterns as a set of eigenbehaviors $EB = \{EB_1, EB_2, EB_3, EB_4, EB_5, EB_6\}$, each of which is plotted as a line in Fig. 3c. Note that we define the operation behavior pattern associated with a single supply region (or a combination of multiple supply regions) as an eigenbehavior. An eigenbehavior therefore not only quantifies the visitation frequency of each district in the same supply region, but also shows operation patterns that combine multiple supply regions (Table 3). Intuitively, within each block the visitation frequencies of different districts are remarkably heterogeneous (Fig. 4). For instance, the most visited district associated with eigenbehavior $EB_1$ is district 〈7: “Yangchunhu”〉, which contains the Wuhan Rail Station mentioned above. This implies that each eigenbehavior is, to a great extent, determined by a few hub districts. Based on those eigenbehaviors and the affinity of each taxicab for adopting them, we further categorize the 6023 taxicabs into 6 groups $T = \{T_1, T_2, T_3, T_4, T_5, T_6\}$ (colored from yellow to red), as shown in Fig. 3d.

For illustrative ease, we also map the 6 types of eigenbehaviors onto space in Fig. 4 and term them eigenspaces. As discussed above, these spatial patterns are distinctly different from each other, demonstrating the ability of non-negative matrix factorization to identify taxi supply regions. Additionally, the visitation frequency of a district is substantively negatively correlated with its distance to the central (i.e., hub) district of the eigenspace it belongs to.

To avoid potential bias due to the choice of spatial granularity, we also reproduce all previous analyses based on a 500 m × 500 m grid partitioning of the case study area (see Fig. 7a in Section S1). By doing so, we obtain a larger visitation frequency matrix $M^{grid}_{6023\times4096}$, which shows the taxi supply patterns at a finer spatial resolution (see Fig. 7b). Fortunately, the eigenspaces produced by non-negative matrix factorization of matrix $M^{grid}$ are robust, stable, and highly consistent with the factorization results from matrix $M_{|T|\times|V|}$, as shown in Fig. 8. Note that we run several trials with different factoring ranks $K$ and present results based on $K = 10$ in the plots for two reasons: (1) The overall pattern of eigenspaces is insensitive to rank $K \in [5, 10]$; (2) The rank $K = 10$ is identical to the number of taxicab groups that we will analyze in the next section. These eigenspaces will also help us to better understand the performance of taxicabs in different supply regions (Table 4). Nevertheless, for clarity and concision, we focus on the analysis based on the visitation matrix $M_{|T|\times|V|}$ in the following sections.

(a) Number of pickups (b) Travel durations (c) Travel distances

Fig. 2. Monthly performances of 6023 taxicabs in Wuhan. The background colors mark the {1%, 15%, 85%, 99%} percentiles of the empirical distribution and their coordinates on the x-axis. The red dotted line shows the best-fitting normal function of the empirical distribution. We categorize taxicabs between percentiles {0%, 1%} and {99%, 100%} as “outlier(s)”, {1%, 15%} as “low-performed”, {15%, 85%} as “ordinary-performed” and {85%, 99%} as “high-performed”, respectively.

(a) Rank selection (b) Consensus analysis
(c) Eigenbehaviors (d) Clusters of taxicabs

Fig. 3. Non-negative matrix factorization of taxicabs' visitation matrix $M_{|T|\times|V|}$. The best decomposition is given by rank $K = 6$, which yields the largest cophenetic correlation coefficient. In (c), the background indicates the 6 blocks of districts derived from the consensus map. In (d), the 6023 taxicabs are categorized into 6 different groups based on the compositions of their eigenbehaviors. The number below each group represents its count of taxicabs.

Table 3
Discovered supply regions from taxicabs' visitation sequences and their matched demand regions. We label the regions from 1 to 6 in the order of blocks in the consensus map (see Fig. 3b). In the main text, we use “supply region” and “eigenspace” interchangeably, and call the visitation pattern associated with each supply region its “eigenbehavior”. The mismatched parts between each supply region and its corresponding demand region are highlighted in red-and-bold font.

Supply region | Eigenbehavior | Districts | Demand region
D1 | EB1 | 3, 6, 7, 25 | C4
D2 | EB2 | 17, 18 | C2
D3 | EB3 | 8, 9, 10, 11, 12, 13, 14, 15, 16, *, 27, 28 | C6
D4 | EB4 | 19, 20, 21, 22, 23, 24 | C1
D5 | EB5 | 1, 2, *, 4, 5, 29, 30, 31 | C5
D6 | EB6 | 26, 32, 33, 34, 35, 36 | C3

4. Performance of taxicabs in different supply regions

Based on clustering analysis of taxicabs in terms of their eigenbehaviors, we obtain a set of 6 groups of taxicabs $T = \{T_1, T_2, T_3, T_4, T_5, T_6\}$. Thereafter, by zooming into each group, we further identify a couple of sub-groups, as shown in Fig. 5. For instance, the 256 taxicabs in group $T_1$ can be divided into two sub-groups $\{T_1^F, T_1^S\}$. Superscript F indicates that taxicabs in the parent group $T_1$ are largely fixed to a single eigenbehavior, i.e., $EB_2$. Superscript S indicates that taxicabs in the same group frequently shift between two eigenbehaviors, i.e., $\{EB_2, EB_3\}$. Applying similar criteria to all 6 groups, we finally obtain 10 sub-groups $T = \{T_1^F, T_1^S, T_2^S, T_2^F, T_3, T_4^S, T_4^F, T_5, T_6^S, T_6^F\}$. Note that we can divide these sub-groups iteratively based on the dendrograms in the plots. To understand the descriptive characteristics of taxicabs within each group, we compute three quantities for each group: number of pick-ups, total duration with passengers on board and total passenger-carrying distance. We then compare them with the overall statistics of all taxicabs in the city. The results are elaborated in Table 5 and Fig. 20 (in Section S1), and we interpret them under 5 distinct scenarios as follows.

Under scenario I: $\{T_3, T_5\}$, a large proportion (> 60%) of taxicabs operate repeatedly within the central area of Hankou and Wuchang. These districts are crowded with city dwellers and usually produce and absorb the majority of daily population movements. Due to this fact, the monthly numbers of pick-ups of taxicabs in groups $T_3$ and $T_5$ are slightly larger than the average level of all 6023 taxicabs within the case study city. Taxi travel demand (i.e., the density of pick-ups) is relatively strong in the central area of Hankou and Wuchang. However, competition between taxicabs is also very fierce there, since more than half of all taxicabs frequently operate within these districts. As a result, these taxicabs' pick-up performance is quite close to the average level of all taxicabs. However, the working hours with passengers on board of taxicabs in groups $T_3$ and $T_5$ are significantly larger than the city-wide average level. Together with the fact that the monthly passenger-carrying distances of these taxicabs are also close to the city-wide average level, we conclude that the average speed of taxicabs in groups $T_3$ and $T_5$ is relatively slow when there are passengers on board. This finding confirms that traffic conditions in the downtown area are usually heavy, as it is crowded with city dwellers and automobiles. Nonetheless, for gaining stable profit with high certainty, operating within the downtown area of the city is still an acceptable strategy. It offers a high probability of picking up potential passengers at the cost of a little extra time.

Scenario II: $\{T_2^F, T_4^F\}$ involves taxicabs operating around the Hanyang and Guanggu areas, in particular along certain major roads, as shown in Figs. 18 and 13. As mentioned above, Hanyang and Guanggu are the secondary (but rising) centers of the study city. The population within these areas is not as crowded as the central

(a) EB-1 service scheme (b) EB-2 service scheme

(c) EB-3 service scheme (d) EB-4 service scheme

(e) EB-5 service scheme (f) EB-6 service scheme

Fig. 4. Spatial patterns of the 6 types of eigenbehaviors. The intensity of red color represents the visitation frequency of each district. The brighter the color is, the higher the value (i.e., the number of pickups) is.

area of Hankou and Wuchang. This results in two characteristics of taxi demand and supply. On the one hand, taxi demand in these districts is significantly lower than the city-wide average, as the statistics on the number of pickups for taxicabs in groups $T_2^F$ and $T_4^F$ demonstrate. On the other hand, since these areas have low population and traffic density, taxicabs' travel speeds within them should be relatively high; in other words, their travel durations are expected to be smaller. An alternative explanation of our empirical observations is that the short travel durations are simply a consequence of the small number of pickups. However, this explanation is rejected immediately when we inspect the distributions of the monthly passenger-carrying distances of these taxicabs: the carrying distances are significantly longer than the city-wide average. Recalling that a majority of trips within Hanyang and Guanggu follow certain (long) major roads, we conclude with considerable confidence that the travel speed of taxicabs in groups $T_2^F$ and $T_4^F$ is relatively fast. In this sense, these taxicabs (approximately 17% of all) adopt an aggressive strategy of maximizing working efficiency in terms of speed and distance, relying only slightly on the frequency of passenger pickups.

Scenario III ($\{T_1^F\}$) reveals the operation pattern of the 1.1% of taxicabs restricted to working in Dongxihu. Owing to less competition with taxicabs in other groups, these taxicabs generally gain a very large number of pickups. However, the authorized operating area for these taxicabs is quite small; that is to say, the corresponding taxi trips are usually very short in distance and duration.

Table 4
Discovered groups of taxicabs based on their dominant operation patterns. Note that we label the groups from $T_1$ to $T_6$ in the order of the clusters of taxicabs from left to right in the clustergram map (see Fig. 3d).

Group | Operation pattern | # Taxicabs | Proportion | Description
$T_1$ | EB2 dominant | 156 | 2.6% | Dongxihu
$T_2$ | EB4 dominant | 878 | 14.6% | Hanyang
$T_3$ | EB3 dominant | 2714 | 45.1% | Hankou
$T_4$ | EB6 dominant | 891 | 14.8% | Guanggu
$T_5$ | EB5 dominant | 1002 | 16.6% | Wuchang
$T_6$ | EB1 dominant | 382 | 6.3% | Wuhan Rail Station

(a) EB-2 dominant taxicabs (b) EB-4 dominant taxicabs

(c) EB-3 dominant taxicabs (d) EB-6 dominant taxicabs

(e) EB-5 dominant taxicabs (f) EB-1 dominant taxicabs

Fig. 5. Dendrograms of the clustering of taxicabs into different groups. Our analysis is based on the 10-group partitioning, which yields distinctly different spatial patterns between groups and a moderate proportion of taxicabs in each group. The numbers (and proportions) of taxicabs assigned to the groups are $|T_1^F| = 67$ (1.1%), $|T_1^S| = 89$ (1.5%), $|T_2^S| = 389$ (6.5%), $|T_2^F| = 489$ (8.1%), $|T_3| = 2714$ (45.1%), $|T_4^S| = 339$ (5.6%), $|T_4^F| = 552$ (9.2%), $|T_5| = 1002$ (16.6%), $|T_6^S| = 87$ (1.4%) and $|T_6^F| = 295$ (4.9%), respectively. Consistent with Fig. 3d, the color indicates the standardized coefficient (in the range −2 to 2) for each operation behavior pattern.

Table 5
Monthly performance of taxicabs within different groups. The arrow indicates the relationship between city-wide and sub-group performance: right-pointing (→) [sub-group > city-wide], left-pointing (←) [sub-group < city-wide], center-pointing (→←) [sub-group concentrates in the middle range], and straight line (–) [sub-group ≈ city-wide]. Please refer to Fig. 9 in the Supplementary Information for more details.

Scenario | Group | Description | Pickup | Duration | Distance
I | $T_3$ | Central area of Hankou | – | → | –
I | $T_5$ | Central area of Wuchang | – | → | –
II | $T_2^F$ | Central area of Hanyang | ← | ← | →
II | $T_4^F$ | Guanggu High-Tech | ← | ← | →
III | $T_1^F$ | Dongxihu District | → | ← | ←
IV | $T_6^F$ | Wuhan Rail Station | → | ← | →←
V | $T_2^S$ | Hankou & Hanyang | →← | → | →←
V | $T_4^S$ | Wuchang & Guanggu | → | →← | →
VI | $T_1^S$ | Dongxihu & Hankou | ← | ← | →←
VII | $T_6^S$ | Wuhan Rail Station & Wuchang | → | →← | →
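The grouping behind Tables 3 and 4 starts from a non-negative factorization (Lee & Seung, 1999) of a taxicab-by-district visitation matrix, followed by an assignment of each taxicab to an eigenbehavior. The sketch below illustrates that pipeline on a toy matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: it fits NMF with the standard multiplicative updates for the squared Frobenius loss, the rank, iteration count, seed and toy matrix are invented, and the argmax assignment is a simplification of the hierarchical clustering of standardized coefficients shown in Fig. 5. All function names are hypothetical.

```python
import random

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x m) matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def nmf(m: Matrix, k: int, iters: int = 500, seed: int = 0, eps: float = 1e-9):
    """Approximate a non-negative matrix m (n x d) as w (n x k) @ h (k x d)
    with multiplicative updates for the squared Frobenius loss."""
    rng = random.Random(seed)
    n, d = len(m), len(m[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(d)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, m), matmul(matmul(wt, w), h)   # W^T M, W^T W H
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(d)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(m, ht), matmul(w, matmul(h, ht))   # M H^T, W H H^T
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def normalize(w: Matrix, h: Matrix):
    """Rescale each eigenbehavior (row of h) to sum to 1 and let w absorb the
    scale, so w[i][c] reads as the visits of taxicab i explained by component c."""
    sums = [sum(row) or 1.0 for row in h]
    h = [[v / s for v in row] for row, s in zip(h, sums)]
    w = [[v * s for v, s in zip(row, sums)] for row in w]
    return w, h

def dominant_eigenbehavior(w: Matrix) -> list[int]:
    """Index of the eigenbehavior with the largest weight for each taxicab."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]

def relative_error(m: Matrix, w: Matrix, h: Matrix) -> float:
    """Frobenius norm of m - w @ h relative to the norm of m."""
    approx = matmul(w, h)
    resid = sum((x - y) ** 2 for r, a in zip(m, approx) for x, y in zip(r, a))
    return (resid / sum(x * x for r in m for x in r)) ** 0.5
```

On real data, `m` would be the visitation matrix described in the text (rows are taxicabs, columns are districts or grid cells) and `k` the number of supply regions, six in Table 3. Normalizing before the argmax matters because NMF factors are only defined up to a per-component scale.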
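The arrows in Table 5 summarize how each group's monthly statistics compare with the city-wide level, and the scenario discussion infers relative speed from distance versus duration. A minimal sketch of that comparison is given below, assuming each taxicab is summarized by three monthly totals (pickups, hours with passengers on board, straight-line carrying distance in km) and comparing group means against the city-wide mean with a hypothetical ±5% tolerance band. The center-pointing (→←) case depends on the shape of the distributions (Fig. 9 in the Supplementary Information) and is not reproduced; the function names and toy data are invented for illustration.

```python
from statistics import mean

# Hypothetical monthly summary per taxicab:
# (pickups, hours with passengers on board, straight-line carrying distance in km)
Record = tuple[float, float, float]

def arrow(group_value: float, city_value: float, tol: float) -> str:
    """Map a group/city ratio onto the arrow glyphs used in Table 5."""
    ratio = group_value / city_value
    if ratio > 1 + tol:
        return "→"   # sub-group above the city-wide level
    if ratio < 1 - tol:
        return "←"   # sub-group below the city-wide level
    return "–"       # roughly equal to the city-wide level

def compare_groups(records: dict[str, Record], group_of: dict[str, str], tol: float = 0.05):
    """Return {group: (arrows for pickup/duration/distance, distance per on-board hour)}."""
    city = [mean(col) for col in zip(*records.values())]
    members: dict[str, list[Record]] = {}
    for taxi, values in records.items():
        members.setdefault(group_of[taxi], []).append(values)
    result = {}
    for group, rows in members.items():
        g = [mean(col) for col in zip(*rows)]
        arrows = [arrow(gv, cv, tol) for gv, cv in zip(g, city)]
        # straight-line km per on-board hour: a lower bound on the true road speed
        speed_proxy = g[2] / g[1]
        result[group] = (arrows, speed_proxy)
    return result

if __name__ == "__main__":
    # invented toy data: two "downtown" and two "ring-road" taxicabs
    records = {"a": (100, 50, 400), "b": (110, 55, 420),
               "c": (60, 20, 600), "d": (70, 25, 640)}
    group_of = {"a": "T3", "b": "T3", "c": "T2F", "d": "T2F"}
    for group, (arrows, speed) in compare_groups(records, group_of).items():
        print(group, " ".join(arrows), round(speed, 1))
```

In this toy run the ring-road group reproduces the ← ← → pattern of scenario II, together with a much higher distance-per-hour ratio. Because the distances are straight-line distances, the ratio understates actual road speed, which is one of the limitations the authors acknowledge in the conclusions.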
Although the number of pickups is large, the total working hours and carrying distances with passengers on board of taxicabs in group $T_1^F$ are significantly lower than the city-wide average. The operation strategy of these taxicabs is largely determined and protected by a specific transportation policy, which grants them a dominant comparative advantage over other taxicabs in terms of the ease of finding potential passengers.

Scenario IV ($\{T_6^F\}$) involves taxicabs frequently operating between the Wuhan Rail Station and its surrounding districts. As is well known, the rail station is a major transport hub within the study area, and it is located near the periphery of the city. Therefore, within these districts travel demand is strong and traffic is relatively light. We find that the monthly number of pickups of taxicabs in group $T_6^F$ is substantively larger than the average for all taxicabs. Meanwhile, their total hours with passengers on board are much lower and their total passenger-carrying distances are moderate. Put straightforwardly, taxicabs in group $T_6^F$, accounting for about 5% of all taxicabs in our case study city, frequently serve passengers around the Wuhan Rail Station, and their average speeds when traveling toward the (nearby) destinations are relatively high. This operation strategy seems preferable to, and more efficient than, driving under scenario I, in that travel demand under scenario IV is as strong as under scenario I but traffic conditions are much better (in terms of travel speed).

Apart from the taxicabs adhering to a single eigenspace, taxicabs operating under scenario V ($\{T_2^S, T_4^S\}$) always travel between primary and secondary centers within the city, such as between Hankou and Hanyang for group $T_2^S$ and between Wuchang and Guanggu for group $T_4^S$. Within a polycentric city such as Wuhan, the allocation of subcenters in space has to maximize the overall accessibility of city resources, so the distances between separate subcenters are usually large. Moreover, subcenters also concentrate a large population and substantial inter-subcenter travel demand. This is confirmed by our observations of taxicabs' operation behaviors in groups $T_2^S$ and $T_4^S$. In general, the number of pickups for these taxicabs is above the city-wide average, and the same holds for their monthly passenger-carrying distances. However, analyzing the distributions of these taxicabs' working durations with passengers on board reveals a less favorable picture. Owing to heavy traffic in the origin and destination subcenters, as well as the limited number of alternative routes between them, the monthly passenger-carrying durations of these taxicabs are relatively larger than the city-wide average. In a sense, these taxicabs (12% of all) act as shifting transport tools that transfer city dwellers between subcenters of the city. This operation strategy shows advantages in pickup frequency and carrying distance, but a disadvantage in travel speed.

Scenario VI ($\{T_1^S\}$) and scenario VII ($\{T_6^S\}$) are two specific cases. The former captures taxicabs' operation behaviors between Dongxihu and the core area of Hankou; the latter captures taxicabs' operation behaviors between the Wuhan Rail Station (and its neighboring districts) and the core area of Wuchang. The spatial distances between the associated trip origins and destinations are often very large (please refer to Fig. 4). As a result, the monthly passenger-carrying distances for groups $T_1^S$ and $T_6^S$ are both very large. Furthermore, Dongxihu is a sparsely populated area within the case study area, and several expressways connect it with Hankou. This is the key reason for the relatively small number of pickups and short travel durations with passengers on board for taxicabs (1.5% of all) in group $T_1^S$. This operation strategy is, on the one hand, profitable and, on the other hand, subject to great uncertainty in finding a potential passenger. As for taxicabs in group $T_6^S$, travel demand near the Wuhan Rail Station and in the core area of Wuchang is strong in both places, resulting in a relatively large number of pickups. Although there are a few expressways between the central area of Wuchang and the Wuhan Rail Station, to our knowledge traffic is usually very heavy on these candidate routes. Compared with taxicabs in $T_1^S$, the travel speeds of taxicabs in $T_6^S$ are slightly slower, as confirmed by the observation that the monthly working hours with passengers on board of these taxicabs are slightly longer than the city-wide average. Nevertheless, operating near transport hubs such as the Wuhan Rail Station still seems to be a good strategy, in that travel demand and distances are stable and high.

To summarize, our analysis unveils self-organized operation behaviors of taxicabs as reactions to the characteristics of city dwellers' travel activities. The resulting supply regions illustrate taxicabs' performance under distinct operation strategies, each of which involves three factors, i.e., the number of monthly pickups, the total working duration with passengers on board, and the total distance of passenger-carrying trips. Interestingly, no strategy dominates the others by maximizing the number of pickups and the carrying distances while minimizing the carrying durations at the same time. In most supply regions, taxicabs show a trade-off between these three factors. We argue that these findings are generalizable to other cities. First, our proposed analytical framework is applicable to other cities and mobility datasets. Second, based on our experience and knowledge, demand regions have been widely discovered in the taxi networks of many other cities, such as Beijing (Kang, Liu, & Wu, 2015), Shanghai (Liu et al., 2015) and Singapore (Kang et al., 2013). Note that, in the existing literature, the detected regions are often defined as functional regions within large cities. More importantly, we argue that randomly wandering across the city is an inefficient strategy for cabdrivers; a more efficient way is to search for and carry passengers within a specific region, which has also been (partially) shown in the existing literature (Liu et al., 2010). Last, we argue that an equilibrium of taxi travel demand and supply will not be reached in the long term if there exist supply regions with a dominant advantage over others. In practice, owing to competition, a dominant supply region will attract redundant taxicabs, which in turn decreases its advantage in the long term.

5. Conclusions and outlook

Understanding patterns of taxi demand and supply in cities has many practical applications for urban management and transportation intelligence. In this article, we develop a matrix-factorization-based analytical framework to detect taxicabs' operation patterns in space by analyzing their continuous digital traces. We recognize several typical taxi demand regions (i.e., spatial communities of trip origins and destinations) as well as several taxi supply regions (i.e., eigenspaces frequently visited by a group of taxicabs). The identified demand regions and supply regions match remarkably well in space, reflecting the self-organized operation behaviors of taxicabs as reactions to the characteristics of city dwellers' travel activities. We argue that the supply region is an illustration of taxicabs' performance under distinct operation strategies. In general, each strategy involves three factors, i.e., the number of monthly pickups, the total working duration
with passengers on board, and the total distance of passenger-carrying trips. No strategy dominates the others by maximizing the number of pickups and the carrying distances while minimizing the carrying durations at the same time. In most supply regions, taxicabs show a trade-off between these three factors. However, operating around certain major transport hubs, such as the Wuhan Rail Station, gains a slight comparative advantage over other typical operation patterns.

Our study provides valuable input for understanding cabdrivers' collective intelligence. Yet it is also important to highlight certain aspects of our data that might introduce potential bias into this research. First, there is no trip fare information in our collected data. To quantify taxicabs' performance under distinct strategies, we use the duration and the straight-line distance between trip origin and destination as a proxy for taxicabs' overall profits (or incomes). This proxy has two drawbacks: (1) the trip duration is sensitive to instantaneous traffic conditions; (2) the Euclidean distance between trip origin and destination, though frequently adopted in the existing literature, considerably underestimates the real travel distance of a taxi with passengers on board, which is constrained by the underlying road network. Auxiliary trip fare information or a (more) robust methodology for quantifying taxicabs' overall performance will be included in our future work. We also look forward to exploring the detailed operation behavior patterns of low-, ordinary- and high-performing taxicabs (refer to Fig. 2) in each supply region.

Acknowledgements

The authors gratefully acknowledge suggestions from anonymous reviewers. This research is partially funded by the National Natural Science Foundation of China (No. 41601484 and No. 41471326), the China Postdoctoral Science Foundation (No. 2015M580666), the Fundamental Research Funds of the Central Universities (No. 2042016kf0055 and No. 2042015kf0183), and the Open Research Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (No. 15S01).

Appendix A. Supplementary data

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.compenvurbsys.2016.08.002.

References

Brunet, J.-P., Tamayo, P., Golub, T. R., & Mesirov, J. P. (2004). Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences, 101(12), 4164–4169.
Camerer, C., Babcock, L., Loewenstein, G., & Thaler, R. (1997). Labor supply of New York city cabdrivers: One day at a time. The Quarterly Journal of Economics, 112(2), 407–441.
Castro, P. S., Zhang, D., Chen, C., Li, S., & Pan, G. (2013). From taxi GPS traces to social and community dynamics: A survey. ACM Computing Surveys (CSUR), 46(2), 17.
Chang, H.-W., Tai, Y.-C., & Hsu, J. Y.-J. (2009). Context-aware taxi demand hotspots prediction. International Journal of Business Intelligence and Data Mining, 5(1), 3–18.
Chen, C., Zhang, D., Zhou, Z.-H., Li, N., Atmaca, T., & Li, S. (2013). B-planner: Night bus route planning using large-scale taxi GPS traces. Proceedings of the IEEE international conference on pervasive computing and communications (PERCOM). IEEE (pp. 225–233).
Eagle, N., & Pentland, A. S. (2009). Eigenbehaviors: Identifying structure in routine. Behavioral Ecology and Sociobiology, 63(7), 1057–1066.
Farber, H. S. (2015). Why you can't find a taxi in the rain and other labor supply lessons from cab drivers. The Quarterly Journal of Economics, 130(4), 1975–2026.
Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3), 75–174.
Gallotti, R., Bazzani, A., Rambaldi, S., & Barthelemy, M. (2015). How transportation hierarchy shapes human mobility. (arXiv preprint arXiv:1509.03752).
Gao, S., Wang, Y., Gao, Y., & Liu, Y. (2013). Understanding urban traffic-flow characteristics: A rethinking of betweenness centrality. Environment and Planning B: Planning and Design, 40(1), 135–153.
Goddard, J. B. (1970). Functional regions within the city centre: A study by factor analysis of taxi flows in central London. Transactions of the Institute of British Geographers, 49, 161–182.
González, M. C., Hidalgo, C. A., & Barabasi, A.-L. (2008). Understanding individual human mobility patterns. Nature, 453(7196), 779–782.
Guo, D., Liu, S., & Jin, H. (2010). A graph-based approach to vehicle trajectory analysis. Journal of Location Based Services, 4(3–4), 183–199.
Hägerstrand, T. (1970). What about people in regional science? Papers in Regional Science, 24(1), 7–24.
Hu, X., An, S., & Wang, J. (2014a). Exploring urban taxi drivers' activity distribution based on GPS data. Mathematical Problems in Engineering, 2014, 708482.
Hu, Y., Miller, H. J., & Li, X. (2014b). Detecting and analyzing mobility hotspots using surface networks. Transactions in GIS, 18(6), 911–935.
Huff, J. O., & Hanson, S. (1986). Repetition and variability in urban travel. Geographical Analysis, 18(2), 97–114.
Hutchins, L. N., Murphy, S. M., Singh, P., & Graber, J. H. (2008). Position-dependent motif characterization using non-negative matrix factorization. Bioinformatics, 24(23), 2684–2690.
Jiang, B., Yin, J., & Zhao, S. (2009). Characterizing the human mobility pattern in a large street network. Physical Review E, 80(2), 021136.
Kang, C., Liu, Y., & Wu, L. (2015). Delineating intra-urban spatial connectivity patterns by travel-activities: A case study of Beijing, China. Proceedings of the 23rd international conference on GeoInformatics (pp. 1–7).
Kang, C., Sobolevsky, S., Liu, Y., & Ratti, C. (2013). Exploring human movements in Singapore: A comparative analysis based on mobile phone and taxicab usages. Proceedings of the 2nd ACM SIGKDD international workshop on urban computing (pp. 1–7). ACM.
Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.
Li, B., Zhang, D., Sun, L., Chen, C., Li, S., Qi, G., & Yang, Q. (2011). Hunting or waiting? Discovering passenger-finding strategies from a large-scale real-world taxi dataset. Proceedings of the IEEE international conference on pervasive computing and communications (PERCOM). IEEE (pp. 63–68).
Li, X., Pan, G., Wu, Z., Qi, G., Li, S., Zhang, D., ... Wang, Z. (2012). Prediction of urban human mobility using large-scale taxi traces and its applications. Frontiers of Computer Science, 6(1), 111–121.
Liang, X., Zheng, X., Lv, W., Zhu, T., & Xu, K. (2012). The scaling of human mobility by taxis is exponential. Physica A: Statistical Mechanics and its Applications, 391(5), 2135–2144.
Liu, L., Andris, C., & Ratti, C. (2010). Uncovering cabdrivers' behavior patterns from their digital traces. Computers, Environment and Urban Systems, 34(6), 541–548.
Liu, X., Gong, L., Gong, Y., & Liu, Y. (2015). Revealing travel patterns and city structure with taxi trip data. Journal of Transport Geography, 43, 78–90.
Liu, X., Kang, C., Gong, L., & Liu, Y. (2016). Incorporating spatial interaction patterns in classifying and understanding urban land use. International Journal of Geographical Information Science, 30(2), 334–350.
Liu, Y., Kang, C., Gao, S., Xiao, Y., & Tian, Y. (2012a). Understanding intra-urban trip patterns from taxi trajectory data. Journal of Geographical Systems, 14(4), 463–483.
Liu, Y., Wang, F., Xiao, Y., & Gao, S. (2012b). Urban land uses and traffic 'source-sink areas': Evidence from GPS-enabled taxi data in Shanghai. Landscape and Urban Planning, 106(1), 73–87.
Moreira-Matias, L., Gama, J., Ferreira, M., & Damas, L. (2012). A predictive model for the passenger demand on a taxi network. Proceedings of the 15th international IEEE conference on intelligent transportation systems (ITSC) (pp. 1014–1019). IEEE.
Peng, C., Jin, X., Wong, K.-C., Shi, M., & Liò, P. (2012). Collective human mobility pattern from taxi trips in urban area. PloS One, 7(4), e34487.
Skillicorn, D. (2007). Understanding complex datasets: Data mining with matrix decompositions (Chapman & Hall/CRC data mining and knowledge discovery series). Chapman & Hall/CRC.
Wang, J., Gao, F., Cui, P., Li, C., & Xiong, Z. (2014). Discovering urban spatio-temporal structure from time-evolving traffic networks. Web technologies and applications (pp. 93–104). Springer.
Wang, W., Pan, L., Yuan, N., Zhang, S., & Liu, D. (2015). A comparative analysis of intra-city human mobility by taxi. Physica A: Statistical Mechanics and its Applications, 420, 134–147.
Wang, Z., Lu, M., Yuan, X., Zhang, J., & Van De Wetering, H. (2013). Visual traffic jam analysis based on trajectory data. IEEE Transactions on Visualization and Computer Graphics, 19(12), 2159–2168.
Yuan, N. J., Zheng, Y., Zhang, L., & Xie, X. (2013). T-finder: A recommender system for finding passengers and vacant taxis. IEEE Transactions on Knowledge and Data Engineering, 25(10), 2390–2403.
Yue, Y., Lan, T., Yeh, A. G., & Li, Q.-Q. (2014). Zooming into individuals to understand the collective: A review of trajectory-based travel behaviour studies. Travel Behaviour and Society, 1(2), 69–78.
Zhang, F., Yuan, N. J., Wilkie, D., Zheng, Y., & Xie, X. (2015). Sensing the pulse of urban refueling behavior: A perspective from taxi mobility. ACM Transactions on Intelligent Systems and Technology (TIST), 6(3), 37.