<<

International Journal of Geo-Information

Article Inferencing Human Spatiotemporal Mobility in Greater via Mobile Phone Big Data Mining

Mohamed Batran 1,* ID , Mariano Gregorio Mejia 1, Hiroshi Kanasugi 2, Yoshihide Sekimoto 1 and Ryosuke Shibasaki 3

1 Institute of Industrial Science, The University of Tokyo, Tokyo 153-8508, Japan; [email protected] (M.G.M.); [email protected] (Y.S.) 2 Earth Observation Data Integration and Fusion Research Initiative, The University of Tokyo, Tokyo 153-8505, Japan; [email protected] 3 Center for Spatial Information Science, The University of Tokyo, Chiba 277-8568, Japan; [email protected] * Correspondence: [email protected]; Tel.: +81-70-1059-7972

 Received: 7 June 2018; Accepted: 26 June 2018; Published: 30 June 2018 

Abstract: The mobility patterns and trip behavior of people are usually extracted from data collected by traditional survey methods. However, these methods are generally costly and difficult to implement, especially in developing cities with limited resources. The massive amounts of call detail record (CDR) data passively generated by ubiquitous mobile phone usage provide researchers with the opportunity to innovate alternative methods that are inexpensive and easier and faster to implement than traditional methods. This paper proposes a method based on proven techniques to extract the origin–destination (OD) trips from the raw CDR data of mobile phone users and process the data to capture the mobility of those users. The proposed method was applied to 3.4 million mobile phone users over a 12-day period in , and the data processed to capture the mobility of people living in the Greater Maputo metropolitan area in different time frames (weekdays and weekends). Subsequently, trip generation maps, attraction maps, and the OD matrix of the study area, which are all practically usable for urban and transportation planning, were generated. Furthermore, spatiotemporal interpolation was applied to all OD trips to reconstruct the population distribution in the study area on an average weekday and weekend. Comparison of the results obtained with actual survey results from the Japan International Cooperation Agency (JICA) indicate that the proposed method achieves acceptable accuracy. The proposed method and study demonstrate the efficacy of mining big data sources, particularly mobile phone CDR data, to infer the spatiotemporal human mobility of people in a city and understand their flow pattern, which is valuable information for city planning.

Keywords: call detail record (CDR); mobile phone data; origin–destination matrix; spatiotemporal mobility; trip generation and attraction

1. Introduction The quality of urban life and a city’s economic growth may lie in how well its urban spaces and transportation infrastructure have been planned for and developed. In urban and transportation planning, it is critical to consider how to efficiently and effectively move people and goods in the city from their points of origin to their respective destinations by understanding how people conduct their daily activities and, with that, their travel behavior. Traditional methods persist in collecting such needed travel and activity behavior data via household and person-trip interview surveys and roadside monitoring. However, these activities are usually limited to a relatively small sample size,

ISPRS Int. J. Geo-Inf. 2018, 7, 259; doi:10.3390/ijgi7070259 www.mdpi.com/journal/ijgi ISPRS Int. J. Geo-Inf. 2018, 7, 259 2 of 21 involve relatively large-scale deployment, and are resource intensive in terms of cost and labor. As we now enter the era of big data [1], and in light of advances in digital and sensing technologies that acquire big data, researchers in urban and transportation planning-related fields have been developing new approaches that utilize potentially better alternatives to traditional methods that can sense different aspects of crowd mobility with less effort and money while maintaining an acceptable level of accuracy. The widespread use of pervasive sensors allows different levels of human mobility to be captured, and to better describe the displacement of people in time and space. Such is the case of mobile phones that are arguably the highest scale consumer tech in the world with 66% global penetration rate and more than five billion unique worldwide subscribers in 2017 [2]. In much of Sub-Saharan , mobile phones are more common than access to electricity [3]. Mobile phone data, also known as call detail records (CDR), is a digital record passively produced and collected by telecommunications equipment of mobile network operators (MNOs) for each instance of mobile phone communication usage (i.e., voice calls, short message service (SMS), and internet service). Each record includes the exact time stamp and the location of the used telecommunications tower. Thus, mobile phone data has been seen by urban planners and traffic engineers as a promising source to develop a wide range of smart city applications [4] such as enhancing smart mobility and transportation, as well as collecting reliable information about real-time population distribution. The advantages of using mobile phone data over traditional methods include ease and continuous streaming of data over a long period of time. While traditional survey methods provide a snapshot of the traffic situation in a typical weekday, mobile phone data can capture weekday and weekend travel patterns, as well as seasonal variation [5] of a large sample of the population at a low cost and wide geographical scale [6]. On the other hand, limitations of mobile phone data include sparseness [6], representativeness [7], and anonymity [8]. Arai, 2013 [9] discussed the previously mentioned limitations and their impact on mobile phone data analysis results. Despite the limitations, researchers have been trying to handle such data with various statistical measures to extract useful information and add social value. Studies have demonstrated the use of CDR data analytics to understand the travel behavior and mobility of people in cities and generate meaningful results that are readily useable and interpretable for city planning purposes. The trajectory of 100,000 individuals was analyzed by [10] from their call record history and concluded that human trajectories show high degree of spatial and temporal regularity, and that humans follow a simple reproducible pattern. The concept of motifs from network theory was employed by [11] to analyze different travel patterns in Paris, . The authors were able to detect 17 unique travel network patterns in the daily mobility of the population, which are sufficient to capture 90% of the trip patterns found from travel survey data. In addition to mobility behaviour of the individual, [12] studied the spatial structure of 31 Spanish cities from mobile phone data of the population. They were able to define a set of indicators to classify cities from the dynamic of their population. A framework to detect patterns of road usage in a city using mobile phone data was developed by [13]. They found that only few road segments are congested and that most of those segments can be associated with few major driver sources. Estimating the origin and destination (OD) of trips has been an active area of research in the past decade. Early studies [14] using synthetic vehicular and mobile phone data have shown that mobile phone data have great potential. An origin–destination matrix from mobile phone data was developed by [15] by extracting tower to tower transient OD, and using a simulator to estimate the appropriate scaling factor. In addition, various efforts to estimate characteristics of individual trips extracted from mobile phone data has emerged. A method to infer transportation mode was proposed by [16] based on travel time extracted from CDR data in the city of Boston. Although CDR data is coarse-grained, their method demonstrated acceptable accuracy on par with that obtained using fine-grained data. A method to estimate transportation mode from mobile phone data was developed by [17] by incorporating geographic information system (GIS) urban transportation network data to estimate modal split at a ISPRS Int. J. Geo-Inf. 2018, 7, 259 3 of 21 given location. The movement of people in Yangon was esrtimated by [18] within a limited time frame separately for weekdays and weekends based on CDR data. The origins and destinations of the trips were taken based on traffic analysis zones corresponding to the cellular tower in which the record was made. CDR was utilized by [19] in Singapore to examine human travels in an activity-based approach that focused on patterns of tours and trip-chaining behavior in daily mobility networks. They developed an integrated pipeline that included parsing, filtering, and expanding massive and passive raw CDR data and extracting meaningful mobility patterns from them that can be directly used for urban and transportation planning purposes. An activity-aware map was developed by [20] that described the probable daily activities of people (during weekdays) for specified areas based on CDR data. Their results showed a strong correlation in daily activity patterns within the group of people who share a common work area’s profile. Furthermore, it had the advantage of being low cost and suitable for statistical analysis on the transportation modes of a large population. Meanwhile, in a more practical application of mobile phone big data mining, Toole et al. 2015 [21] demonstrated a full method of transportation demand modeling that utilizes CDR data as the main input, and developed an interactive visualization platform that provides output readily interpretable by planners and policymakers. Demographic attributes and socio-economic indicators are also essential in parallel with trip information. First evidence that a subscriber’s personality can be inferred from his/her mobile phone records was provided by [22]. The authors were able to predict subscribers’ personality traits with 42% accuracy. Furthermore, Blumenstock, Cadamuro and On, 2015 [23] and Steele et al. 2017 [24] attempted to build predictive maps on poverty from mobile phone record data in and Dhaka, respectively. We adopted some of the techniques used and proven in these studies to demonstrate the mining of mobile phone big data to extract meaningful information for understanding the daily mobility of people in Greater Maputo, which is a first for CDR data analytics in Mozambique. We also introduce the use of the high-resolution settlement layer (HRSL), developed by the Facebook Connectivity Lab and Center for International Earth Science Information Network (CIESIN). To our knowledge, HRSL has never been used in combination with CDR data in previous studies to expand the CDR user sample to the actual population. Furthermore, we incorporate widely available geographic information from OpenStreetMap, and show how to derive the spatiotemporal distribution of the population in Greater Maputo on an average weekday and weekend. To evaluate the accuracy of our method against traditional survey methods, we introduce a validation approach that compares our results with actual available person-trip survey data in three different spatial resolutions comprising the traffic analysis zones (TAZs) established in the report of the Japan International Cooperation Agency (JICA) [25]. The remainder of this paper is organized as follows: Section2 explains in detail the study area and the data used, which include the CDR data for trip estimation, the HRSL population data for expanding the user sample to the population, and the JICA survey data used for validation. Section3 explains the proposed methodology in detail, from preparing the study area to scaling up the sample to represent the actual population. Section4 presents experimental results of the proposed method, and its validation with survey data. Finally, Section5 presents concluding remarks, including discussion of the potential of the proposed method for actual application in urban and transportation planning and development.

2. Study Area and Data

2.1. Study Area The study area was Mozambique’s Greater Maputo metropolitan area—consisting of the Maputo, Matola City, Boane City, and Marracuene District. In recent years industrial and residential development and the growing urban population has spread from Maputo City, the country’s political and industrial center, to the neighboring areas of Matola, Boane, and Marracuene, creating the 120,767-ha Greater Maputo metropolitan area, as shown in Figure1[ 25]. According to JICA’s forecast for the medium term from 2012 to 2035, the population of Greater Maputo is expected to ISPRS Int. J. Geo-Inf. 2018, 7, 259 4 of 21 increase from 2.2 million to 3.7 million and its economy to grow by a factor of 2.3 in terms of gross domestic product (GDP) per capita. With urban and economic development, Greater Maputo has seen ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 4 of 21 more movement of people and goods, and with it, worsening traffic conditions in its underdeveloped roadunderdeveloped network. The road number network. of daily The personnumber trips of daily is estimated person trips to moreis estimated than double, to more from than 3.1double, million trips/dayfrom 3.1 inmillion 2012 trips/day to 6.5 million in 2012 trips/day to 6.5 million in 2035, trips/day with in car 2035, ownership with car increasingownership increasing by a factor by of a 1.5 forfactor the same of 1.5 medium-term for the same period medium-term [25]. These period rapid [25]. growth These developmentrapid growth indicatorsdevelopment imply indicators an urgent needimply to formulate an urgent aneed comprehensive to formulate mastera comprehensive plan that canmaster facilitate plan that implementation can facilitate implementation and improvement of Greaterand improvement Maputo’s of public Greater transport Maputo’s infrastructure public transport and infrastructure road network and [25 ].road network [25].

FigureFigure 1. Map1. Map of of the the Greater Greater Maputo Maputo metropolitan metropolitan area, which shows shows the the four four main main cities/district cities/district it it covers, as taken from the report of the Japan International Cooperation Agency (JICA) [25]. covers, as taken from the report of the Japan International Cooperation Agency (JICA) [25].

2.2. Call Detail Record (CDR) Data 2.2. Call Detail Record (CDR) Data This study used mobile phone CDR data collected over a 12-day period, i.e., 1st to 12th of March 2016,This from study a major used mobile mobile network phone CDRoperator data (MNO) collected in Mozambique. over a 12-day The period, raw CDR i.e., dataset 1st to 12th contained of March 2016,a total from of a 393 major million mobile mobile network phone operator usage records (MNO) from in Mozambique. 3.4 million anonymous The raw CDR subscribers dataset containedof the a totalMNO of nationwide. 393 million The mobile observation phone usage period records include from 9 weekdays 3.4 million and anonymous 3 weekends subscribersand does not of include the MNO nationwide.any public Theor religious observation holidays period that include may have 9 weekdays different andmobility 3 weekends patterns. andAs stated does notabove, include three any publicmain or types religious of mobileholidays phone that may usage have are different considered: mobility (1) patterns.making/receiving As stated a above, voice threecall, (2) main typessending/receiving of mobile phone a text usage message are considered: by SMS, and (1) making/receiving(3) using data or internet a voice service. call, (2) Because sending/receiving of privacy a textissues, message all by personal SMS, and information (3) using in data the or CDR internet that service. may reveal Because the of subscriber’s privacy issues, identity all personal were informationanonymized in theby the CDR MNO that mayprior revealto its distribution. the subscriber’s The identityrelevant wereinformation anonymized contained by thein the MNO CDR prior to itsdata distribution. include the Theuser relevant ID, timestamp information and type contained of mobile in the phone CDR usage, data include and ID theand user location ID, timestamp of the andrecording type of mobilecellular phonetower. usage, and ID and location of the recording cellular tower. Figure 2 shows the temporal distribution (daily and hourly) of the raw CDR dataset. It should Figure2 shows the temporal distribution (daily and hourly) of the raw CDR dataset. It should be be noted that the 12-day observation period consists of nine weekdays and three weekends. The daily noted that the 12-day observation period consists of nine weekdays and three weekends. The daily distribution shows a consistent number of recorded mobile phone usage for all days, with the distribution shows a consistent number of recorded mobile phone usage for all days, with the exception exception of one day, i.e., 3 March 2016 (Wednesday), which has a relatively low number of records of one day, i.e., 3 March 2016 (Wednesday), which has a relatively low number of records (minimum (minimum value). Meanwhile, it is interesting to see that the hourly distribution shows low mobile value).phone Meanwhile, usage records it is between interesting midnight to see and that 5:00 the am, hourly the time distribution period when shows most low of mobile the population phone usage is recordsexpected between to be sleeping. midnight On and the 5:00 other am, hand, the timethe most period active when period most is between of the population 6:00 pm and is expected8:00 pm. to beThe sleeping. trip identification On the other efficiency hand, therelies most on the active number period of ismobile between phone 6:00 records, pm and which 8:00 is pm. relatively The trip identificationhigh from 8:00 efficiency am until relies 11:00 onpm the based number on the of hourly mobile distribution. phone records, It is expected which is that relatively it is within high this from 8:00time am frame until 11:00that most pm basedtrips occur—particularly on the hourly distribution. people traveling It is expected between that their it is home within and this workplace time frame thator most some trips other occur—particularly place, and would statistically people traveling provide between better trip their estimates. home and workplace or some other place, and would statistically provide better trip estimates.

ISPRS Int. J. Geo-Inf. 2018, 7, 259 5 of 21 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 5 of 21

(a)

(b)

FigureFigure 2. 2.Temporal Temporal distribution distributionof of thethe callcall detail record (CDR) (CDR) dataset: dataset: (a ()a daily,) daily, and and (b) ( bhourly.) hourly.

2.3. Population Data from the High-Resolution Settlement Layer (HRSL) 2.3. Population Data from the High-Resolution Settlement Layer (HRSL) Clearly mobile phone dataset described in Section 2.2 represents a sample of the population and needClearly to be mobilecombined phone with dataset accurate described information in Sectionabout population 2.2 represents distribution a sample as ofwill the be population discussed in and needSection to be 3.5. combined This study with used accurate Greater information Maputo’s population about population data obtained distribution from the as willHRSL. be The discussed HRSL in Sectiondeveloped 3.5. This for study Mozambique, used Greater as shown Maputo’s in Figure population 3, provides data obtained estimates from of thehuman HRSL. population The HRSL developeddistribution for Mozambique,at a resolution asof shownone arc-second in Figure (approximately3, provides estimates 30 m) for of humanthe year population 2015 based distributionon recent at acensus resolution data of and one high-resolution arc-second (approximately satellite imagery 30 m) from for DigitalGlobethe year 2015 [26]. based This on isrecent done census by first data andextracting high-resolution urban settlements satellite imagery from the from 0.5-m DigitalGlobe spatial resolution [26]. satellite This is doneimagery by by first the extracting Connectivity urban settlementsLab at Facebook from the using 0.5-m computer spatial resolution vision techniques, satellite imagery then applying by the Connectivity proportional Lab allocation at Facebook to usingdistribute computer population vision techniques, data from then subnational applying proportional census data allocation to the settlement to distribute extent. population (Detailed data frominformation subnational about census the HRSL data tocan the be settlement found on the extent. CIESIN (Detailed website.). information This means about that HRSL the HRSL is a finer can be founddisaggregated on the CIESIN version website.). of traditional This meanscensus thatdata HRSLand that is ait finercan be disaggregated used with confidence version to of calculate traditional censusthe population data and thatin any it canarbitrary be used area withof interest. confidence Accordingly, to calculate the HRSL the populationpopulation distribution in any arbitrary is used to estimate the population at the cellular tower zone level, which is represented by Voronoi area of interest. Accordingly, the HRSL population distribution is used to estimate the population zones, as discussed in Section 3.1. The estimated total population in the study area is 2,661,832. at the cellular tower zone level, which is represented by Voronoi zones, as discussed in Section 3.1. Statistics on the resulting population aggregation to the Voronoi polygons in the study area are The estimated total population in the study area is 2,661,832. Statistics on the resulting population summarized in Table 1. This means that the mean of the total population living within the boundary aggregation to the Voronoi polygons in the study area are summarized in Table1. This means that the of Voronois is 10,277. mean of the total population living within the boundary of Voronois is 10,277. Table 1. Population distribution statistics per Voronoi for Greater Maputo. Table 1. Population distribution statistics per Voronoi for Greater Maputo. Min Max Mean Median Min62 Max45,261 10,277 Mean6781 Median 62 45,261 10,277 6781

ISPRS Int. J. Geo-Inf. 2018, 7, 259 6 of 21 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 6 of 21

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 6 of 21

FigureFigure 3. Image 3. Image of theof the high-resolution high-resolution settlementsettlement layer layer (HRSL) (HRSL) of ofGreater Greater Maputo Maputo [7]. [7].

2.4. OpenStreetMapFigure 3. RoadImage Network of the high-resolution Data settlement layer (HRSL) of Greater Maputo [7]. 2.4. OpenStreetMap Road Network Data 2.4. OpenStreetMapSimilar to HRSL Road which Network will beData used to scale up subscribers to represent the population. Mobility Similarshould be to analyzed HRSL with which consideration will be used of the toexisting scale transportation up subscribers infrastructure to represent in the thestudy population. area. MobilityThisSimilar should is particularly to be HRSL analyzed usefulwhich inwill with reconstructing be used consideration to scale the up route subscribers of the of each existing to trip represent along transportation availablethe population. transportation infrastructure Mobility in the studyshouldnetwork area. be infrastructure analyzed This is with particularly which consideration will provide useful of the more in existing reconstructing realistic transportation estimation the routeof infrastructure the population of each in trip distributionthe along study area. available as transportationThiswill is be particularly discussed network in useful infrastructure Section in reconstructing 3.6. In which this study, will the provide route we use of more eachOpenStreetMap trip realistic along estimation available (OSM) [27] transportation of which the population is a distributionnetworkcollaborative as infrastructure will project be discussed that which provides will in Section provide anyone 3.6 more . free In realistic this access study, toestimation crowdsourced we use of OpenStreetMap the population geographical distribution (OSM) data of [27 the] as which will be discussed in Section 3.6. In this study, we use OpenStreetMap (OSM) [27] which is a is a collaborativeworld. Since the project data itself that is provides collected anyoneby volunteers, free access data representativeness to crowdsourced varies geographical from one country data of the collaborative project that provides anyone free access to crowdsourced geographical data of the world.to Sinceanother. the OSM data data itself in Mozambique is collected by contains volunteers, over 400,000 data representativenessroad links (as of May varies2018) with from sufficient one country world.coverage Since in the dataGreater itself Maputo is collected area byto volunteers,represent the data transportation representativeness network varies in the from study one area,country as to another. OSM data in Mozambique contains over 400,000 road links (as of May 2018) with sufficient toshown another. in Figure OSM data 4. in Mozambique contains over 400,000 road links (as of May 2018) with sufficient coveragecoverage in the in the Greater Greater Maputo Maputo area area to to represent represent the transportation transportation network network in the in thestudy study area, area,as as shownshown in Figure in Figure4. 4.

Figure 4. Image of the available OpenStreetMap road links in Greater Maputo.

2.5. Japan International Cooperation Agency (JICA) Survey Data for Results Validation FigureFigure 4. Image 4. Image of of the the available available OpenStreetMapOpenStreetMap road road links links in inGreater Greater Maputo. Maputo. To assess the validity of the proposed method and its output, we compare the results to the most 2.5. Japan International Cooperation Agency (JICA) Survey Data for Results Validation 2.5. Japanrecent International person-trip survey Cooperation data obtained Agency (JICA)from JICA’s Survey Comprehensive Data for Results Urban Validation Transport Master Plan To assess the validity of the proposed method and its output, we compare the results to the most Torecent assess person-trip the validity survey of data the obtained proposed from method JICA’s andComprehensive its output, Urban we compare Transport the Master results Plan to the most recent person-trip survey data obtained from JICA’s Comprehensive Urban Transport Master

Plan for the Greater Maputo project [25]. The JICA survey sampled a total of 38,216 persons over ISPRS Int. J. Geo-Inf. 2018, 7, 259 7 of 21

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 7 of 21 the agefor ofthe six Greater from Maputo 9983 sampled project [25]. households, The JICA survey producing sampled a totala total of of 65,168 38,216 persons trips in over one the day. age In the classicalof six four-step from 9983 demand sampled forecasting households, model, producing the a first total step of 65,168 (which trips is tripin one generation) day. In the estimatesclassical the numberfour-step of trips demand originating forecasting from model, and the attracted first step to (which TAZs is that trip aregeneration) defined estimates based on the socio-economic, number of demographic,trips originating and land-use from and attributes attracted to of TAZs the cordoned that are defined area [28 based]. To on validate socio-economic, our results, demographic, we considered threeand TAZ land-use levels attributes based on of the the zoning cordoned levels area of[28]. the To Greater validate Maputoour results, metropolitan we considered area three in TAZ the JICA report;levels specifically, based on AtheTAZ, zoning B levelsTAZ, ofand the C Greater TAZ, as Maputo shown metropolitan in Figure5 area. The in TAZs the JICA of the report; C TAZ levelspecifically, were identified A TAZ, for B the TAZ, JICA and survey C TAZ, and as shown for transport in Figure modeling 5. The TAZs purposes. of the C TheyTAZ level correspond were to identified for the JICA survey and for transport modeling purposes. They correspond to the the administrative boundaries of the “bairro,” where census data are available. A few bairros were administrative boundaries of the “bairro,” where census data are available. A few bairros were consolidatedconsolidated to form to form larger larger TAZs TAZs for for the the B B TAZ TAZ level,level, and and further further consolidation consolidation occurred occurred to result to result in in four largefour large TAZs TAZs for thefor the A TAZ A TAZ level. level. Table Table2 shows2 shows a a statistical statistical summary summary of of the the three three TAZ TAZ levels. levels.

(a) (b) (c)

Figure 5. Division of traffic analysis zones (TAZs) for each TAZ level: (a) A TAZ, (b) B TAZ, and (c) Figure 5. Division of traffic analysis zones (TAZs) for each TAZ level: (a) A TAZ, (b) B TAZ, C TAZ. and (c) C TAZ. Table 2. Statistical summary of Voronoi zones in the study area. Table 2. Statistical summary of Voronoi zones in the study area. Zone Level No. of TAZs Min (km2) Max (km2) Mean (km2) Median (km2) C TAZ 170 0.03 95.20 7.10 1.08 Zone Level No. of TAZs Min (km2) Max (km2) Mean (km2) Median (km2) B TAZ 40 0.81 305.13 30.21 9.65 C TAZ A TAZ 1704 0.03252.61 381.09 95.20 302.17 7.10287.49 1.08 B TAZ 40 0.81 305.13 30.21 9.65 A TAZ 4 252.61 381.09 302.17 287.49 3. Methodology In extracting the origin–destination (OD) trips from raw CDR data, and scaling them to represent 3. Methodology the mobility of the actual population of Greater Maputo in different time frames (weekdays and Inweekends), extracting we the propose origin–destination a method that incorporates (OD) trips proven from raw techniques CDR data, from previous and scaling research them to represent[11,18,19]. the mobilityOur method of involves the actual the population following: (1) of Voronoi Greater tessellation Maputo of in the different study area, time (2) frames (weekdaysestimation and of weekends), the home welocation propose of subscribers, a method that(3) filtering incorporates of valid proven user-days techniques of the sample, from previous(4) extraction of mobility/OD trips of the filtered sample, and (5) application of two types of research [11,18,19]. Our method involves the following: (1) Voronoi tessellation of the study area, magnification factors—one to scale up the user sample to represent the actual population in each (2) estimationzone, and the of other the home to normalize location the ofuser subscribers, sample to one (3) observation filtering ofday. valid user-days of the sample, (4) extraction of mobility/OD trips of the filtered sample, and (5) application of two types of magnification3.1. Voronoi factors—one Tessellation of to the scale Study up Area the user sample to represent the actual population in each zone, and the otherThere to are normalize 259 cellular the towers user samplein the study to one area, observation which are spatially day. distributed in relation to the distribution of population density. The density distribution of the cellular towers, and 3.1. Voronoicorrespondingly Tessellation the of mobile the Study phone Area network coverage area, increases toward the city center and Therecentral are business 259 cellular district. towers In general, in the there study is overlapping area, which ofare coverage spatially areas distributed of neighboring in relation cellular to the distributiontowers, ofwhich population should be density. considered The in density order to distribution appropriately of therepresent cellular the towers, locational and boundaries correspondingly of each tower. Accordingly, a centroidal Voronoi [29] diagram of the cellular tower network in the study the mobile phone network coverage area, increases toward the city center and central business district. area was developed, as shown in Figure 6. Each Voronoi tessellation approximately represents the In general,mobile therephone isnetwork overlapping coverage of and, coverage correspondingly, areas of neighboringthe area coverage cellular of each towers, cellular whichtower. Table should be considered in order to appropriately represent the locational boundaries of each tower. Accordingly, a centroidal Voronoi [29] diagram of the cellular tower network in the study area was developed, as shown in Figure6. Each Voronoi tessellation approximately represents the mobile phone network ISPRS Int. J. Geo-Inf. 2018, 7, 259 8 of 21 coverageISPRS Int. and, J. Geo-Inf. correspondingly, 2018, 7, x FOR PEER the REVIEW area coverage of each cellular tower. Table3 gives a summary 8 of 21 of 2 the Voronoi3 gives a tessellations summary of the in the Voronoi study tessellations area. The minimum in the study area area. coverage The minimum is 0.01 area km coverageand the maximumis 0.01 2 2 2 is 241.21km2 and km the, whereas maximum the meanis 241.21 and km median2, whereas are the 8.15 mean km and 1.81median km are, respectively. 8.15 km2 and 1.81 km2, respectively. Table 3. Statistical summary of Voronoi zones in the study area. Table 3. Statistical summary of Voronoi zones in the study area. Zone Source Number of Towers Min (km2) Max (km2) Mean (km2) Median (km2) Zone Source Number of Towers Min (km2) Max (km2) Mean (km2) Median (km2) CDR Voronoi 259 0.01 241.21 8.15 1.83 CDR Voronoi 259 0.01 241.21 8.15 1.83

Figure 6. Voronoi diagram of the study area (magnified area), representing the mobile network Figure 6. Voronoi diagram of the study area (magnified area), representing the mobile network coverage of each cellular tower. coverage of each cellular tower. 3.2. Home Location Estimation 3.2. Home Location Estimation There are several reasons why there is a need to identify the home location of the subscribers [19].There Firstly, are several it is needed reasons when why combining there is athe need CDR to data identify and the population home location dataof from the the subscribers HRSL to [19]. Firstly,scale it up is needed the user whensample combining to represent the the CDR actual data population. and the Secondly, population we dataconsidered from theonly HRSL the users to scale up theliving user within sample the tostudy represent area; thus, the if actual a user population. was found to Secondly, have a home we location considered outside only the the study users area living withinthey the were study excluded area; from thus, the if asample. user was Thirdly, found the to environment have a home and locationattributes outside of people’s the homes, study area theysuch were as excluded land use, affects from thetheir sample. travel behavior Thirdly, and the activities environment [30,31]. andThis attributesis important of to people’s understand homes, people’s mobility and travel behavior as affected by the space or environment that surrounds them, such as land use, affects their travel behavior and activities [30,31]. This is important to understand especially from an urban planning and development perspective [19]. people’s mobility and travel behavior as affected by the space or environment that surrounds them, Accordingly, we estimated the most probable location of a subscriber’s home based on their especiallymobile from phone an usage urban records. planning It is andexpected development that most perspectivepeople would [19 be]. at home at night and during theAccordingly, weekends, rather we estimated than during the working most probablehours on weekdays. locationof The a home subscriber’s location homeof subscribers based was on their mobileestimated phone based usage on records. the frequency It is expected of recorded that mobile most phone people usage would at cellular be at home towers at (corresponding night and during the weekends,to the Voronoi rather zones) than at duringthese times. working The “night hours time” on weekdays. considered The for this home study location was between of subscribers 8:00 pm was estimatedand 6:00 based am, onbecause the frequency this time period of recorded was found mobile to phonebe when usage most atof cellularthe people towers in Greater (corresponding Maputo to the Voronoiare at home zones) according at these to times.the JICA The household “night time” survey. considered Furthermore, for thismost study of the was population between reside 8:00 pm andoutward 6:00 am, of because the city thiscenter time in Maputo, period wasin the found outskirts to beof the when study most area. of Therefore, the people the in cellular Greater towers Maputo are at(Voronoi home accordingzones) located to thein these JICA areas household were considered survey. Furthermore,as home locations most of ofthe thesubscribers. population Other reside subscribers with home locations estimated outside of the study area were excluded from the sample. outward of the city center in Maputo, in the outskirts of the study area. Therefore, the cellular towers (Voronoi zones) located in these areas were considered as home locations of the subscribers. ISPRS Int. J. Geo-Inf. 2018, 7, 259 9 of 21

OtherISPRS subscribers Int. J. Geo-Inf. with 2018, home7, x FOR locationsPEER REVIEW estimated outside of the study area were excluded 9 of from 21 the sample. Therefore, from our home estimation location, we found 1,279,291 subscribers living within Therefore, from our home estimation location, we found 1,279,291 subscribers living within the study the study area out of the 3.4 million subscribers of the whole Mozambique (approximately 37%). area out of the 3.4 million subscribers of the whole Mozambique (approximately 37%). This This correspondedcorresponded to to 48% 48% of of the the total total population population of ofGreater Greater Maputo Maputo estimated estimated using using HRSL HRSL (2,661,832 (2,661,832 people).people). Figure Figure7 shows 7 shows the distribution the distribution of subscribers of subscribers living living in thein the study study area area resulting resulting from from the the home locationhome estimation. location estimation. The average The numberaverage number of subscribers of subscribers served served by each by cellulareach cellular tower tower is 7250, is 7250, which is less thanwhich the is meanless than population the mean population per cellular per tower cellular (10,277) tower (10,277) as presented as presented in Table in 1Table. 1.

FigureFigure 7. 7.Subscriber Subscriber distributiondistribution in in the the study study area. area.

3.3. Filtering Valid User Days 3.3. Filtering Valid User Days According to Jiang, Ferreira and Gonzalez, 2017 [19], the advantages of CDR data are its longer Accordingsample period to Jiang,and larger Ferreira sample and size Gonzalez, compared 2017to traditional [19], the survey advantages data, whereas of CDR its datadisadvantage are its longer sampleis its period sparseness. and larger As such, sample it is sizeimportant compared to consider to traditional user days survey with much data, mobile whereas phone its disadvantageactivity or is its sparseness.usage. According As such, to itprevious is important studies to [11,19], consider extracting user days individual with much mobility mobile patterns phone from activity CDR data or usage. Accordingcan be toregarded previous as statistically studies [11 consistent,19], extracting and comparable individual with mobility that of traditional patterns fromsurvey CDR data data given can be regardedthat a as certain statistically threshold consistent of daily mobile and comparable phone activity with orthat usage of is traditional met by users. survey The datavalue givenof this that a certainthreshold threshold should of daily not be mobile overly phonesmall as activity it would or favor usage shorter is met trip by patterns, users. Theand valuenot overly of this large threshold as it would exclude too many users [11]. In this regard, this study used similar filtering rules, as follows: should not be overly small as it would favor shorter trip patterns, and not overly large as it would exclude tooA day many is valid users for [11 a user]. In if this he/she regard, has a thisCDR study in at least used eight similar of the filtering 48 half-hour rules, time as follows:slots in one day (24 h). • A dayWeekdays is valid for and a weekendsuser if he/she are treated has a CDR separately in at leastas we eight presume of the that 48 trip half-hour behavior time can slots vary in one day (24between h). them. • WeekdaysAfter filtering and weekends the 1,279,291 are users treated extracted separately from the as home we presumelocation estimation, that trip there behavior remained can vary between797,329 users them. that had at least one valid user-day observation (62%). Figure 8 Shows the distribution of the number of valid user days per user where 17.5% of filtered users have only one valid user day. AfterThis translated filtering theto a 1,279,291total of 4,385,089 users extractedvalid user fromdays the(3,252,971 home user location weekdays estimation, and 1,132,118 there remaineduser 797,329weekends), users that which had correspondat least one to valid 14,744,180 user-day trips observation for user weekdays, (62%). Figure and 4,965,7398 Shows trips the for distribution user of theweekends number ofas validsummarized user days in Table per user4. This where equates 17.5% to an of average filtered of users 4.5 trips have per only user oneper validweekday, user day. This translatedand 4.3 tripsto per a user total per of weekend. 4,385,089 The valid filtered user sample days of (3,252,971 797,329 users user corresponds weekdays to and30% of 1,132,118 Greater user weekends),Maputo’s which population correspond of 2,661,832 to 14,744,180 (from HRSL), trips asfor compared user weekdays, to the person-trip and 4,965,739 survey, trips which for user weekends as summarized in Table4. This equates to an average of 4.5 trips per user per weekday, and 4.3 trips per user per weekend. The filtered sample of 797,329 users corresponds to 30% of Greater ISPRS Int. J. Geo-Inf. 2018, 7, 259 10 of 21

Maputo’s population of 2,661,832 (from HRSL), as compared to the person-trip survey, which sampled only 1.7% [25]. This gives the advantage of having better sample representativeness for the CDR data. ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 10 of 21 Table 4. Statistical summary of user sample before and after filtering valid user-days. sampled only 1.7% [25]. This gives the advantage of having better sample representativeness for the CDR data. Before Filtering After Filtering

NumberTable of users 4. Statistical summary 1,279,291of user sample before and after filtering 797,329 valid user-days.

Before Filtering After Filtering4,385,089 Number of user-days 12,059,561 [Weekdays: 3,252,971, Number of users 1,279,291 797,329 Weekends: 1,132,118] 4,385,089 Number of user-days 12,059,561 [Weekdays: 3,252,971,19,724,307 Weekends: 1,132,118] Number of trips 27,117,806 [Weekdays: 14,744,180, 19,724,307 Number of trips 27,117,806 Weekends: 4,965,739] [Weekdays: 14,744,180, Weekends: 4,965,739]

Figure 8. Number of valid days per user. Figure 8. Number of valid days per user. 3.4. Origin–Destination Extraction 3.4. Origin–Destination Extraction Studying macroscopic mobility requires more knowledge about the start and the end of the trips Studyingto quantify macroscopic trip generation mobility and attraction requires volumes more knowledge across different about parts the startof a city. and On the the end other of thehand, trips to quantifyCDR triphas the generation attribute andof sparseness, attraction which volumes therefore across requires different further parts processing of a city. in On order the otherto extract hand, CDR hasmeaningful the attribute trips. of In sparseness, this study, which we utilized therefore an approach requires similar further to processing that of Jiang, in order Ferreira to extract and meaningfulGonzalez, trips. 2017 In this [19], study, in which we the utilized OD extraction an approach process similar is split to into that two of Jiang,main steps: Ferreira (1) estimation and Gonzalez, of the possible stay zones for each subscriber, and (2) extraction of trip segments between the different 2017 [19], in which the OD extraction process is split into two main steps: (1) estimation of the possible stay points. This method was previously applied to triangulate CDR traces with an uncertainty of stay zones200–300 for eachm for subscriber,all traces [32]. and However, (2) extraction our CDR of dataset trip segments is in the cellular between tower the level different zone, staywhich, points. as This methodpreviously was discussed,previously has applied varying to coverage triangulate areas, CDR i.e., traces “location with uncertainty.” an uncertainty Spatial of 200–300 constraints m for all traceswere, [32 therefore,]. However, also ouradded CDR to capture dataset more is in underlying the cellular trips tower than level focusing zone, only which, on key as places previously of discussed,interest. has varying coverage areas, i.e., “location uncertainty.” Spatial constraints were, therefore, also added to capture more underlying trips than focusing only on key places of interest. 3.4.1. Extraction of Stay Locations 3.4.1. ExtractionBased ofon Stay the fact Locations that people spend most of their time at a few key locations [33], such as their Basedhomes on and the workplaces, fact that people it is important spend most to carefully of their capture time at those a few locations key locations for each [subscriber33], such asfrom their homeshis/her and workplaces, CDRs. Those it locations is important are normally to carefully associated capture with those a longer locations stay period. for each However, subscriber it is also from important to identify other places that are visited less frequently, such as shopping areas and cafés, his/her CDRs. Those locations are normally associated with a longer stay period. However, it is that could possibly be associated with a stay state. Thus, we employed all 12 days of CDR data to also important to identify other places that are visited less frequently, such as shopping areas and extract all possible stay locations for each subscriber. In general, for each subscriber, a zone was cafés, thatidentified could as possibly a stay location be associated if his/her with CDRs a stay indicated state. Thus,that he/she we employed continuously all 12stayed days in of aCDR certain data to extractzone all for possible a given threshold stay locations period. for This each threshold subscriber. can influence In general, the number for each of extracted subscriber, stay alocations zone was identifiedper as user a stay significantly, location adding if his/her bias CDRs to any indicated further trip that extraction. he/she continuouslyBecause cellular stayed tower in coverage a certain zone forvaries a given based threshold on population period. density,This threshold as previously can influence stated, the the number threshold of period extracted had stay to also locations be per userproportional significantly, to the adding coverage bias area. to any The further stay threshold trip extraction. period has Because an impact cellular on the tower number coverage of extracted varies based onstay population locations per density, user and, as previouslycorrespondingly, stated, the the number threshold of trips. period Therefore, had to it also has beto be proportional considered to the coveragereasonably. area. To The capture stay threshold trips with periodshort stay has locations, an impact for on example, the number buying of at extracted a store or stay dropping locations children at school, it is important to define the minimum threshold period that would not capture a per user and, correspondingly, the number of trips. Therefore, it has to be considered reasonably. false stay location. The exact stay time varies based on multiple parameters, including the

ISPRS Int. J. Geo-Inf. 2018, 7, 259 11 of 21

To capture trips with short stay locations, for example, buying at a store or dropping children at school, ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 11 of 21 it is important to define the minimum threshold period that would not capture a false stay location. ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 11 of 21 Thetransportation exact stay time mode varies used based to arrive on multiple and leave parameters, the destination including and, more thetransportation importantly, the mode coverage used to arriveareatransportation and of the leave subject the mode destinationcellular used towers.to arrive and, Accordingly, more and leave importantly, the we destination set the the coverage minimum and, more area threshold ofimportantly, the subjectperiod the cellularat 30coverage min towers. in Accordingly,orderarea of to the ensure wesubject set that the cellular minimum small towers. stays threshold Accordingly, were extracted period we at withoutset 30 the min minimum capturing in order threshold to noise. ensure Figure period that small 9 at shows 30 stays min the werein extracteddistributionorder to without ensure of the capturing thatnumber small of noise. extracted stays Figure were stay extracted9 showslocations the without per distribution user. capturing The resulting of the noise. numberaverage Figure ofnumber 9 extracted shows of stay the stay locationslocationsdistribution per extracted user. of the The numberper resulting user of was extracted average 3.2. stay number locations of stay per locationsuser. The extractedresulting average per user number was 3.2. of stay locations extracted per user was 3.2.

Figure 9. Distribution of number of stay locations per user. Figure 9. Distribution of number of stay locations per user. Figure 9. Distribution of number of stay locations per user. 3.4.2. Extraction of Trips 3.4.2.3.4.2. Extraction Extraction of of Trips Trips The basic concept for trip extraction is capturing recorded locations (in terms of cellular tower zones)TheThe basicwith basic successive concept concept for timefor trip trip stamps extraction extraction as a trip isis capturing capturingor part of arecorded trip depending locations locations on (in whether (in terms terms ofthe ofcellular locations cellular tower are tower zones)identifiedzones) with with successiveas successive stay locations. time time stamps stampsFigure as 10as ashowsa triptrip oror an part example of a trip trip of dependinga depending trip, where on on S1whether whether and S2 the are the locations origin locations and are are identifieddestinationidentified as as staystay stay locations.locations, locations. respectively, Figure Figure 1010 showsshows and T1 an and example T2 are intermediateof of a a trip, trip, where where records. S1 S1 and and S2 S2 are are origin origin and and destinationdestination stay stay locations, locations, respectively, respectively, and and T1 T1 andand T2T2 areare intermediate records. records.

Figure 10. Example of an extracted trip from a CDR. FigureFigure 10. 10.Example Example ofof an extracted trip trip from from a aCDR. CDR. 3.5. Estimation of Magnification Factors 3.5. Estimation of Magnification Factors 3.5. EstimationIn order of to Magnification expand the user Factors sample to represent the actual population of Greater Maputo, as in previousIn order studies to expand [18,19], thewe userused sample two types to represent of magnification the actual factors: population (1) scaling of Greater up the Maputo, user sample as in toprevious In represent order studies to the expand [18,19], actual the populationwe user used sample two in types each to representof zone, magnification and the (2) actual normalizing factors: population (1) ofscaling the of user up Greater the sample user Maputo, sample to one as inobservation previousto represent studies day. the actual [18,19 population], we used in two each types zone, of and magnification (2) normalizing factors: of the (1) user scaling sample up to the one user sampleobservation to represent day. the actual population in each zone, and (2) normalizing of the user sample to one observation3.5.1. User day. Sample to Population Magnification Factor 3.5.1. User Sample to Population Magnification Factor The user magnification factor for a user i in zone j (푢푠푟_푚푎푔) is simply the proportion between the totalThe useruser samplemagnification with home factor location for a user in thati in zonespecific j (푢푠푟_푚푎푔 zone (푃표푝_푢푠푒푟) is simply) and the the proportion actual population between takenthe total from user HRSL sample in the with same home (푃표푝_퐻푅푆퐿 location in), asthat follows: specific zone (푃표푝_푢푠푒푟) and the actual population taken from HRSL in the same (푃표푝_퐻푅푆퐿), as follows:

ISPRS Int. J. Geo-Inf. 2018, 7, 259 12 of 21

3.5.1. User Sample to Population Magnification Factor

The user magnification factor for a user i in zone j (usr_magij) is simply the proportion between the total user sample with home location in that specific zone (Pop_userj) and the actual population takenISPRS from Int. J. HRSL Geo-Inf. in2018 the, 7, samex FOR PEER (Pop REVIEW_HRSL j), as follows: 12 of 21

Pop_HRSLj usr_magij = 푃표푝_퐻푅푆퐿 (1) 푢푠푟_푚푎푔 =Pop_userj (1) 푃표푝_푢푠푒푟 FigureFigure 11 11shows shows the the distribution distribution of of thethe useruser magnificationmagnification factors factors obtained obtained in in the the study study area. area.

Figure 11. Distribution of user magnification factors (푢푠푟_푚푎푔) in the study area. Figure 11. Distribution of user magnification factors (usr_magij) in the study area. 3.5.2. Valid User-Days Magnification Factor 3.5.2. Valid User-Days Magnification Factor The valid user-days magnification factor is needed to normalize all users’ mobility to one observationThe valid day, user-days particularly magnification for those with factor more is than needed one valid to normalize user day, as all discussed users’ mobilityin Section to3.1. one observationThe valid day, user-days particularly magnification for those factor with should more than be treated one valid separately user day, for as weekday discussed and in weekend Section 3.1. Thesamples. valid user-days The sample magnification period had nine factor weekdays should and be three treated weekends; separately therefore, for weekday the maximum and weekend valid samples.user-days The for sample the weekday period sample had nine and weekdays the weekend and sample three weekends; were nine and therefore, three, respectively. the maximum validAccordingly, user-days thefor the valid weekday user-days sample magnification and the weekend factor for sample user i ( were푑푎푦_푚푎푔_푓푎푐 nine and ) three, was simply respectively. the Accordingly,proportion between the valid the user-days user magnification magnification factor of factor user fori (푢푠푟_푚푎푔 user i ()day and_mag the _user’sf aci) corresponding was simply the proportionvalid user-days between (푢푠푟_푑푎푦 the user), as magnification follows: factor of user i (usr_magij) and the user’s corresponding valid user-days (usr_dayi), as follows: 푢푠푟_푚푎푔 푖푓 푤푒푒푘푑푎푦, 1 ≤ 푢푠푟_푑푎푦 ≤ 9 푑푎푦_푚푎푔_푓푎푐 = (2) 푢푠푟_푑푎푦 푖푓 푤푒푒푘푒푛푑, 1 ≤ 푢푠푟_푑푎푦 ≤ 3 ( usr_magij i f weekday, 1 ≤ usr_dayi ≤ 9 day_mag_ f aci = (2) 3.6. Spatiotemporal Interpolation usr_dayi i f weekend, 1 ≤ usr_dayi ≤ 3

3.6. SpatiotemporalBefore further Interpolation utilization of the OSM road links, we applied topological correction and connectivity assessment to handle insufficient and disconnected nodes as we previously proposed in [34],Before resulting further in utilization353,139 connected of the OSM links road and links,271,454 we nodes applied in the topological whole of correction Mozambique. and connectivityThen, we assessmentused the osm2po to handle [35] insufficientconverter and and routing disconnected engine to parse nodes OSM as wepre-processed previously links proposed and make in it [34 ], resultingroutable. in Such 353,139 preprocessing connected linksis an essential and 271,454 step nodesto ensure in theall road whole links of Mozambique.are well connected Then, without we used theany osm2po possible [35 ]isolated converter road and clusters. routing engine to parse OSM pre-processed links and make it routable. To derive the spatiotemporal distribution of the population in the Greater Maputo area with the previously estimated OD pairs as an input, we applied an automatic routing and temporal interpolation for the subscriber location during all of his/her observed periods including stay periods when no mobility is detected. First, a simple Dijkstra’s algorithm is used to reconstruct the subscriber’s route over the previously prepared OSM road network. Then, we interpolate the position

ISPRS Int. J. Geo-Inf. 2018, 7, 259 13 of 21

Such preprocessing is an essential step to ensure all road links are well connected without any possible isolated road clusters. To derive the spatiotemporal distribution of the population in the Greater Maputo area with the previously estimated OD pairs as an input, we applied an automatic routing and temporal interpolation for the subscriber location during all of his/her observed periods including stay periods when no mobility is detected. First, a simple Dijkstra’s algorithm is used to reconstruct the subscriber’s route overISPRS the previouslyInt. J. Geo-Inf. 2018 prepared, 7, x FOR OSMPEER REVIEW road network. Then, we interpolate the position of the subscriber13 of 21 along that route using an arbitrary 1-minute interval from the start to end time of his/her trip. of the subscriber along that route using an arbitrary 1-minute interval from the start to end time of We applied the mentioned method to all ODs extracted from the 12-day study period which resulted in his/her trip. We applied the mentioned method to all ODs extracted from the 12-day study period a massivewhich spatiotemporalresulted in a massive people spatiotemporal flow dataset. people We then flow aggregated dataset. We weekdays then aggregated and weekends weekdays separately and for eachweekends hour of separately the day and for each constructed hour of a the 1-km day resolution and constructed 3D map a that 1-km represents resolution the 3D distribution map that of the populationrepresents the in distribution an average of weekday the population and weekend in an average in Greater weekday Maputo. and weekend Figure 12 in Greaterpresents Maputo. the flow of the spatiotemporalFigure 12 presents interpolation the flow of the method. spatiotemporal interpolation method.

FigureFigure 12. 12.Spatiotemporal Spatiotemporal interpolation interpolation method to to derive derive population population distribution. distribution.

4. Results and Validation 4. Results and Validation 4.1. Results 4.1. Results The results of our method for extracting the trips of users in Greater Maputo are presented in theThe form results of trip of our generation method and for extractingattraction maps, the trips as ofshown users in in Figure Greater 13. Maputo The trip are generation presented and in the formattraction of trip generation maps are classified and attraction as follows: maps, (a) as trip shown generation in Figure on average13. The weekday, trip generation (b) trip and attraction attraction mapson are average classified weekday; as follows: (c) trip (a)generation trip generation during weekday on average morning weekday, rush (b) hours trip (6:00–9:00); attraction on(d) averagetrip weekday;attraction (c) trip during generation weekday during morning weekday rush hoursmorning (6:00–9:00); rush hours (e) (6:00–9:00); trip generation (d) trip during attraction weekday during weekdayevening morning rush hours rush (16:00–19:00); hours (6:00–9:00); (f) trip attraction (e) trip during generation weekday during evening weekday rush hours evening (16:00–19:00); rush hours (16:00–19:00);(g) trip generation (f) trip attraction on average during weekend; weekday and evening (h) trip attraction rush hours on (16:00–19:00); average weekend. (g) trip It generation can be on averageobserved weekend; that all eight and maps (h) tripshow attraction similarity onin the average number weekend. of trips for It all can zones. be observed The zones that outward all eight mapsfrom show the similaritycenter of Greater in the Maputo, number particularly, of trips for in all Marracuene zones. The District, zones Boane outward City, from and the lower center of part of Maputo City, have relatively low generated and attracted trips. This implies that a smaller Greater Maputo, particularly, in Marracuene District, Boane City, and the lower part of Maputo City, share of the population lives in these zones, resulting in correspondingly less travel activity. On the have relatively low generated and attracted trips. This implies that a smaller share of the population other hand, the central part of Maputo City and the central-northern part of Matola City show the liveshighest in these number zones, resultingof generated in correspondingly and attracted trips, less implyingtravel activity. a greater On the share other of hand,Greater the Maputo’s central part of Maputopopulation City reside and thein those central-northern zones and a high part level of Matola of travel City activity. show theThis highest is in fact number the case of as generated those zones are part of the city center, wherein the central business district, main commercial areas, and Mozambique’s major university are located. Those zones continually produce and attract both short and long trips. The maps also provide an insight on the land use of those zones, such as for residential, business, or education.

ISPRS Int. J. Geo-Inf. 2018, 7, 259 14 of 21 and attracted trips, implying a greater share of Greater Maputo’s population reside in those zones and a high level of travel activity. This is in fact the case as those zones are part of the city center, wherein the central business district, main commercial areas, and Mozambique’s major university are located. Those zones continually produce and attract both short and long trips. The maps also provide an insightISPRS on Int. the J. Geo-Inf. land 2018 use, of7, x those FOR PEER zones, REVIEW such as for residential, business, or education. 14 of 21

Trip generation on Trip attraction on Trip generation during weekday Trip attraction during weekday average weekday average weekday morning rush hours morning rush hours

(a) (b) (c) (d)

Trip generation during weekday Trip attraction during weekday Trip generation on Trip attraction on evening rush hours evening rush hours average weekend average weekend

(e) (f) (g) (h)

Figure 13. Trip generation and attraction maps: (a) trip generation on average weekday; (b) trip Figure 13. Trip generation and attraction maps: (a) trip generation on average weekday; attraction on average weekday; (c) trip generation during weekday morning rush hours; (d) trip (b) trip attraction on average weekday; (c) trip generation during weekday morning rush hours; attraction during weekday morning rush hours; (e) trip generation during weekday evening rush (d) trip attraction during weekday morning rush hours; (e) trip generation during weekday evening hours; (f) trip attraction during weekday evening rush hours; (g) trip generation on average weekend; rushand hours; (h) trip (f) attraction trip attraction on average during weekend. weekday evening rush hours; (g) trip generation on average weekend; and (h) trip attraction on average weekend. Figure 14 shows the resulting people flow maps respectively for (a) weekday and (b) weekend, whichFigure show 14 shows the origin the and resulting destination people of trips flow connected maps respectively by lines. It can for be (a) observed weekday in andthe flow (b) weekend,maps whichthat show most theof the origin trips and are heavily destination concentrated of trips in connected the central by part lines. of the It canstudy be area, observed similar in to the the flow mapsobservation that most from of the the trips trip generation are heavily and concentrated attraction maps, in the and central that there part is of an the obvious study difference area, similar in to trip volume between weekdays and weekends; specifically, the latter is much lower. In addition, the observation from the trip generation and attraction maps, and that there is an obvious difference there appears to be a formation of four clusters of short trips in the central area, which suggests land in trip volume between weekdays and weekends; specifically, the latter is much lower. In addition, use with heavy daily activities on both weekdays and weekends. Furthermore, it can also be observed therethat appears some of to the be atrips formation cross water, of four which clusters are presumably of short trips the intrips the made central by area,water which ferries. suggests From the land use withreport heavy of JICA daily [25], activities 1% of the on total both trips weekdays are made and using weekends. this transportation Furthermore, mode. it canIt is alsointeresting be observed to thatnote some that of this the aspect trips crosscan be water, visualized which from are CDR presumably data. the trips made by water ferries. From the report ofIn JICA addition, [25], 1%we show of the a total demonstrative trips are made example using by using this transportation our approach in mode. capturing It is accurately interesting to notethe that origin this of aspect all trips can to be a specific visualized zone from as destination, CDR data. and the change in mobility between weekdays andIn addition,weekends. we Figure show 15a,b a demonstrative shows a flow map example of the trips by using toward our Universidade approach in Eduardo capturing Mondlane, accurately the originMozambique’s of all trips oldest to aand specific largest zone university, as destination, on weekdays and theand changeweekends, in mobilityrespectively, between while weekdaysFigure and16 weekends. shows the Figure temporal 15a,b distribution shows a flowof trips map over of thea 24-h trips period. toward With Universidade this example, Eduardo we can observe Mondlane, Mozambique’sdistinctly the oldest difference and largestin the number university, of trips on between weekdays weekday and weekends, and weekend; respectively, specifically, while there Figure are 16 more trips as classes are typically held on weekdays. Moreover, we can capture how many people shows the temporal distribution of trips over a 24-h period. With this example, we can observe arrive at the university at different times. It is interesting that we captured two peaks during weekdays when people arrive, i.e., in the morning (8:00–12:00) and in the evening (17:00–18:00). The first peak period pertains to the trips taken by the students (including university professors and staff) for the regular classes, while the other peak period is for the evening classes mostly taken by working students, as verified by the university.

ISPRS Int. J. Geo-Inf. 2018, 7, 259 15 of 21 distinctly the difference in the number of trips between weekday and weekend; specifically, there are more trips as classes are typically held on weekdays. Moreover, we can capture how many people arrive at the university at different times. It is interesting that we captured two peaks during weekdays when people arrive, i.e., in the morning (8:00–12:00) and in the evening (17:00–18:00). The first peak period pertains to the trips taken by the students (including university professors and staff) for the regularISPRS classes, Int. J. Geo-Inf. while 2018 the, 7, x other FOR PEER peak REVIEW period is for the evening classes mostly taken by working 15 students, of 21 as verifiedISPRS Int. by J. theGeo-Inf. university. 2018, 7, x FOR PEER REVIEW 15 of 21 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 15 of 21

(a) (b) (a) (b) Figure 14. People(a) flow maps in Greater Maputo: (a) on weekday, and(b )( b) on weekend. Figure 14. People flow maps in Greater Maputo: (a) on weekday, and (b) on weekend. FigureFigure 14. 14.People People flow flow maps maps in in GreaterGreater Maputo: (a ()a on) on weekday, weekday, and and (b) on (b) weekend. on weekend.

(b) (a(a) ) (b) (a) (b) FigureFigure 15. 15. Flow Flow map map for for trips trips to to Universidade Universidade Eduardo Eduardo Mondlane: Mondlane: ( a ()a ) on on weekday, weekday, and and (b ) ( b on) on Figure 15. Flow map for trips to Universidade Eduardo Mondlane: (a) on weekday, and (b) on Figureweekend.weekend. 15. Flow map for trips to Universidade Eduardo Mondlane: (a) on weekday, and (b) on weekend. weekend.

Figure 16. Temporal distribution of trips to Universidade Eduardo Mondlane. Figure 16. Temporal distribution of trips to Universidade Eduardo Mondlane. FigureFigure 16. 16.Temporal Temporal distribution distribution ofof tripstrips to Universidade Eduardo Eduardo Mondlane. Mondlane. Furthermore, we constructed an OD matrix of the study area, as shown in Figure 17, with 259 Furthermore, we constructed an OD matrix of the study area, as shown in Figure 17, with 259 origins/destinationsFurthermore, we correspondingconstructed an to OD the matrix Voronoi of the cellular study tower area, zones. as shown This, in as Figure well as17, the with trip 259 origins/destinations corresponding to the Voronoi cellular tower zones. This, as well as the trip origins/destinationsgeneration/attraction corresponding maps, are practically to the useful Voronoi for cellulartransportation tower planning zones. This, purposes, as well particularly as the trip generation/attraction maps, are practically useful for transportation planning purposes, particularly generation/attractionfor transport demand maps, forecasting are practically for new roads, useful or for for transportation new public transport planning routes. purposes, particularly for transport demand forecasting for new roads, or for new public transport routes. for transport demand forecasting for new roads, or for new public transport routes.

ISPRS Int. J. Geo-Inf. 2018, 7, 259 16 of 21

Furthermore, we constructed an OD matrix of the study area, as shown in Figure 17, with 259 origins/destinations corresponding to the Voronoi cellular tower zones. This, as well as the trip generation/attraction maps, are practically useful for transportation planning purposes, particularlyISPRS Int. J. Geo-Inf. for transport 2018, 7, x FOR demand PEER REVIEW forecasting for new roads, or for new public transport routes. 16 of 21

Figure 17. Constructed origin–destination (OD) matrix of the study area. Figure 17. Constructed origin–destination (OD) matrix of the study area.

Finally,Finally, we we present present people people flow flow (refer (refer to the supplementary supplementary video) video) and and the the spatiotemporal spatiotemporal distributiondistribution of of the the population population inin thethe Greater Maputo Maputo area. area. Figure Figure 18 18 presents presents the the hourly hourly distribution distribution ofof the the population population on on an an average average weekdayweekday as captured by by mobile mobile phone phone data. data. Distribution Distribution starts starts to to changechange from from 06:00 06:00 when when people people areare expected to to engage engage in in their their daily daily activities activities and and move move to the to thecity city center,center, which which is wellis well captured captured in in the the figure. figure. The The city city center center is mostis most congested congested at at noon noon and and during during work hourswork until hours 18:00. until After18:00. that, After the that, density the density at the at city the centercity center starts starts to decrease to decrease between between 18:00 18:00 and and 23:00 when23:00 people when returnpeople home.return Figurehome. Figure19 presents 19 presents the hourly the hourly distribution distribution of the of population the population on an on average an average weekend as captured by mobile phone data. The population is well distributed across the weekend as captured by mobile phone data. The population is well distributed across the study area study area as compared to weekdays. The first change in the distribution seems to appear at 10:00 as compared to weekdays. The first change in the distribution seems to appear at 10:00 (which is later (which is later than 06:00 on weekdays) with density increasing at city center as well but at a lower than 06:00 on weekdays) with density increasing at city center as well but at a lower density than on density than on weekdays. In both the weekday and weekend distributions, there are two peaks that weekdays. In both the weekday and weekend distributions, there are two peaks that attract a large attract a large number of the population during daytime. The first and strongest peak is found at the numbercentral ofpart the of population Maputo City, during the commercial daytime. and The business first and district strongest of the peak capital is found city. The at thesecond central peak part ofcontains Maputo Zimpeto,City, the commercial which is one and of thebusiness main district bus stations of the that capital receives city. people The second from peak urban contains and suburban areas who are commuting to the central part of Maputo City or other areas. The above results indicate that mobile phone data can successfully represent the spatiotemporal distribution of the population given an appropriate statistical handling and filtration of non- representative users as we have previously shown in the methodology section. The results can be used for multiple applications besides urban planning applications. For instance, in disaster-

ISPRS Int. J. Geo-Inf. 2018, 7, 259 17 of 21

Zimpeto, which is one of the main bus stations that receives people from urban and suburban areas who are commuting to the central part of Maputo City or other areas. The above results indicate that mobile phone data can successfully represent the spatiotemporal distribution of the population given an appropriate statistical handling and filtration of non-representative users as we have previously shown in the methodology section. The results can be usedISPRS for multipleInt. J. Geo-Inf. applications 2018, 7, x FOR besides PEER REVIEW urban planning applications. For instance, in disaster-management 17 of 21 applications temporal distribution of the population is more important rather than the static nighttime management applications temporal distribution of the population is more important rather than the population. By considering the dynamic population distribution, disaster-management officials will be static nighttime population. By considering the dynamic population distribution, disaster- able to enforce pre-disaster preparedness by estimating the impact of different hazard scenarios with management officials will be able to enforce pre-disaster preparedness by estimating the impact of respectdifferent to population hazard scenarios distribution. with Furthermore, respect to population continuous distribution. monitoring ofFurthermore, the real-time continuous population distributionmonitoring after of the disaster real-time will population enable better distribution response after and disaster data driven will enable decision better making response when and timeline data anddriven efficiency decision of resources making allocation when timeline are critical. and More efficiency importantly, of resources documenting allocation and are analyzing critical. people’s More behaviorimportantly, and mobility documenting pattern andduring analyzing disasters people’s can provide behavior a good andindicator mobility ofpattern the efficiency during disasters of disaster awarenesscan provide or suggest a good alternatives. indicator of the efficiency of disaster awareness or suggest alternatives.

FigureFigure 18. 18.Spatiotemporal Spatiotemporal distribution distribution of of the the population population in Greater Maputo Maputo on on an an average average weekday. weekday.

ISPRSISPRS Int. Int. J. Geo-Inf.J. Geo-Inf.2018 2018, 7, ,7 259, x FOR PEER REVIEW 18 of18 21 of 21

Figure 19. Spatiotemporal distribution of the population in Greater Maputo on an average weekend. Figure 19. Spatiotemporal distribution of the population in Greater Maputo on an average weekend.

4.2.4.2. Validation Validation of of Results Results ToTo validate validate the the results, results, we we comparedcompared the extracted trips trips from from CDR CDR data data with with JICA’s JICA’s person-trip person-trip interviewinterview survey survey data. data. However, However, ourour validationvalidation has has the the following following limitations: limitations: (1) (1) only only weekday weekday trips trips werewere considered considered as as JICA’s JICA’s data data did did not not cover cover weekends; weekends; (2) (2) the the JICA JICA person-trip person-trip surveysurvey accountedaccounted for populationfor population based based on the on 2011 the census,2011 census, whereas whereas we used we used population population from from the HRSLthe HRSL in 2015; in 2015; and and (3) the Voronoi(3) thezones Voronoi vary zones with vary JICA’s with traffic JICA’s analysis traffic analysis zone levels zone (A levels TAZ, (A B TAZ, TAZ, B and TAZ, C TAZ),and C asTAZ), discussed as indiscussed Section 3.1 in. FigureSection 203.1. compares Figure 20 thecompares daily weekdaythe daily weekday trip volume trip extractedvolume extracted from CDR from to CDR that to from JICA’sthat from person-trip JICA’s person-trip survey. Relatively survey. goodRelatively correlation good correlation of the daily of the trip daily volume trip fromvolume CDR from with CDR the B with the B TAZ level trips (R2 = 0.84) and A TAZ level trips (R2 = 0.97) can be obtained based on the TAZ level trips (R2 = 0.84) and A TAZ level trips (R2 = 0.97) can be obtained based on the obtained obtained R-squared (R2) values. It can be observed that the correlation or accuracy improves as the R-squared (R2) values. It can be observed that the correlation or accuracy improves as the zoning size zoning size increases, considering that the zoning mismatch decreases between the Voronoi zoning increases, considering that the zoning mismatch decreases between the Voronoi zoning level and larger level and larger TAZ levels (B TAZ and A TAZ). Accordingly, it can be said that we achieved TAZacceptable levels (B accuracy TAZ and with A TAZ). respect Accordingly, to data collected it can befrom said a traditional that we achieved method, acceptable which in this accuracy case is with respectfrom toJICA’s data person-trip collected from survey. a traditional method, which in this case is from JICA’s person-trip survey. Furthermore,Furthermore, CDR CDR can can capturecapture shortshort trips between between neighboring neighboring zones, zones, which which the the person-trip person-trip surveysurvey is is not not able able to to as as it it only only accountsaccounts for a a person’s person’s main main trip trip of of purpose, purpose, such such as, as,typically typically between between homehome and and workplace. workplace. In In essence,essence, thethe person-trip survey survey does does not not account account for for trips trips other other than than their their mainmain purpose. purpose. This This cancan be seen seen as as an an advantage advantage of CDR of CDR data data over over the person-trip the person-trip survey survey data as data the as the mobility of people taking short trips as well as the intermediate points/locations of trips can be captured, not just the “endpoints” or origin and destination locations.

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 19 of 21

ISPRSmobility Int. J. Geo-Inf. of people2018, 7 , taking 259 short trips as well as the intermediate points/locations of trips can19 be of 21 captured, not just the “endpoints” or origin and destination locations.

(a) (b) (c)

Figure 20. Comparison of daily trip volumes obtained from CDR and person-trip survey at respective Figure 20. Comparison of daily trip volumes obtained from CDR and person-trip survey at respective TAZ levels: (a) A TAZ, (b) B TAZ, and (c) C TAZ. TAZ levels: (a) A TAZ, (b) B TAZ, and (c) C TAZ. 5. Conclusions 5. Conclusions This paper proposed an innovative method based on proven techniques for extracting the spatiotemporalThis paper proposed mobility an patterns innovative of millions method of based people on living proven in techniques a metropolitan for extracting area and the spatiotemporalreconstructing mobility their spatiotemporal patterns of millions distribution of people from livingmobile in phone a metropolitan big data—specifically, area and reconstructing call detail theirrecord spatiotemporal (CDR) data. distributionThe results of from the proposed mobile phone method big are data—specifically, presented in the form call detailof trip recordgeneration (CDR) data.and The attraction results and of the people proposed flow maps method and are OD presented matrix, which in the are form easily of tripinterpretable generation and and practically attraction anduseful people for flow urban maps and and transportation OD matrix, planning which are purposes. easily interpretable Furthermore, and we practicallypresented in useful addition for urban1-h andinterval transportation population planning density purposes. maps for Furthermore,an average weekday we presented and weekend. in addition We applied 1-h interval the proposed population densitymethod maps to Greater for an averageMaputo, weekdayas a first for and CDR weekend. data analytics We applied in Mozambique. the proposed The main method advantages to Greater Maputo,of using as CDR a first data for over CDR traditional data analytics data collection in Mozambique. methods Theinclude main utilization advantages of fewer of using resources CDR in data overterms traditional of cost and data labor collection for both methods data acquisition include utilizationand method of implementation, fewer resources and in termsa larger of sample cost and size that provides less bias. Moreover, our method is easily reproducible such that the results can be labor for both data acquisition and method implementation, and a larger sample size that provides updated regularly or as soon as new CDR data are acquired, unlike traditional surveys, which take less bias. Moreover, our method is easily reproducible such that the results can be updated regularly years to be updated. In addition, our method is able to capture trips in different time frames or as soon as new CDR data are acquired, unlike traditional surveys, which take years to be updated. (weekdays and weekends), in contrast to person-trip interview surveys which can give biased results. In addition, our method is able to capture trips in different time frames (weekdays and weekends), Our results are practically useful for planners and policymakers as they provide them with an in contrastunderstanding to person-trip of which interview areas should surveys be which considered can give or biased prioritized results. for developing/improving new/existingOur results road are practically and public useful transportation for planners networks and and policymakers infrastructure. as they For example,provide them our OD with an understandingmatrix can be readily of which used areas and interpreted should be for considered transportation or prioritized demand modeling. for developing/improving The population new/existingdistribution road map and allows public monitoring transportation the networksdynamic changesand infrastructure. in how people For example, move an our emerging OD matrix canmetropolitan be readily used area. and Such interpreted can be directly for transportation used to detect demand and comprehend modeling. the The temporal population characteristics distribution mapof allowscongestion monitoring and to adjust the dynamicthe existing changes transportation in how facilities people movecorrespondingly. an emerging Urban metropolitan planners can area. Suchfind can the be temporal directly density used to of detect an area and a useful comprehend indicator the of land temporal use. Public characteristics facility managers of congestion may also and to adjustbenefit the from existing understanding transportation how facilities and where correspondingly. people move in Urban the surrounding planners can neighborhood find the temporal to densityoperate of their an area facilities a useful more indicator effectively. of land use. Public facility managers may also benefit from understandingOur analysis how andprovided where thoughtful people move insights in the and surrounding represented neighborhood temporal changes to operate in densities their facilitiesin the moremetropolitan effectively. area. We applied an offline data processing in our method for 12 days only due to limitedOur analysis data availability. provided A thoughtful longer observation insights period and represented may enhance temporal the robustness changes of in the densities results and in the show seasonal changes in mobility. In addition, we relied directly on the cellular tower location, and metropolitan area. We applied an offline data processing in our method for 12 days only due to limited the Voronoi polygons, correspondingly, to infer population mobility patterns. However, the data availability. A longer observation period may enhance the robustness of the results and show distribution of such infrastructure may change when the MNO decides to add or remove a cellular seasonal changes in mobility. In addition, we relied directly on the cellular tower location, and the tower at a certain location. That may result in inconsistent results over time. However, Voronoitelecommunications polygons, correspondingly, companies can to infer triangulate population and localize mobility the patterns. location However, of mobile the phones distribution when of suchrecording infrastructure any transaction may change with when an accuracy the MNO between decides 200 to to add 300 or m remove as in [32]. a cellular Such will tower convert at a certain the location.geographical That may location result inassociated inconsistent with results CDR over to coordinates time. However, and eliminate telecommunications the aforementioned companies canproblem. triangulate and localize the location of mobile phones when recording any transaction with an accuracy between 200 to 300 m as in [32]. Such will convert the geographical location associated with CDR to coordinates and eliminate the aforementioned problem. ISPRS Int. J. Geo-Inf. 2018, 7, 259 20 of 21

In future work we plan to fine-tune our method to obtain more accurate results, and extend the coverage of our study to the entire country of Mozambique with a longer observation period and localized coordinates.

Supplementary Materials: The following are available online at http://www.mdpi.com/2220-9964/7/7/259/s1. Author Contributions: Authorship has been included and strictly limited to researchers who have substantially contributed to the reported work. M.B. and M.G.M. wrote the paper. M.B. designed the research framework. M.B. and H.K. Conducted the technical analysis of mobile phone big data. M.B. directed the static visualization, while H.K. directed the dynamic visualization of human mobility. R.S. and H.K. administered CDR data acquisition. Y.S. provided critical review and suggestions that enhanced the outcome. All authors discussed the results and their implications and approved the submitted manuscript. Funding: This research received no external funding. Acknowledgments: The authors gratefully acknowledge Instituto Nacional das Comunicações de Moçambique (INCM) and Mozambique Road Fund for facilitating CDR data acquisition used in this study. Conflicts of Interest: The authors declare no conflicts of interest.

References

1. Manyika, J.; Chui, M.; Brown, B.; Bughin, J.; Dobbs, R.; Roxburgh, C.; Byers, A.H. Big data: The next frontier for innovation, competition, and productivity. McKinsey Glob. Inst. 2011, 156. [CrossRef] 2. GSM Association (GSMA). The Mobile Economy 2018; GSMA; London, , 2018. 3. In Much of Sub-Saharan Africa, Mobile Phones Are More Common Than Access to Electricity—Daily Chart. Available online: https://www.economist.com/graphic-detail/2017/11/08/in-much-of-sub- saharan-africa-mobile-phones-are-more-common-than-access-to-electricity (accessed on 31 May 2018). 4. Steenbruggen, J.; Tranos, E.; Nijkamp, P. Data from mobile phone operators: A tool for smarter cities? Telecommun. Policy 2015, 39, 335–346. [CrossRef] 5. Calabrese, F.; Lorenzo, G. Di Estimating Origin-Destination Flows using Mobile phone Location Data. Cell 2011, 10, 36–44. [CrossRef] 6. Becker, R.A.; Cáceres, R.; Hanson, K.; Isaacman, S.; Loh, J.M.; Martonosi, M.; Rowland, J.; Urbanek, S.; Varshavsky, A.; Volinsky, C. Human mobility characterization from cellular network data. Commun. ACM 2013, 56, 74. [CrossRef] 7. Ranjan, G.; Zang, H.; Zhang, Z.-L.; Bolot, J. Are call detail records biased for sampling human mobility? ACM Sigmob. Mob. Comput. Commun. Rev. 2012, 16, 33. [CrossRef] 8. Arai, A.; Witayangkurn, A.; Kanasugi, H.; Horanont, T.; Shao, X.; Shibasaki, R. Understanding User Attributes from Calling Behavior: Exploring Call Detail Records through Field Observations. Adv. Mob. Comput. Multimed. 2014, 95–104. [CrossRef] 9. Arai, A. Dynamic Census: Estimation of Demographic Structure and Spatiotemporal Distribution of Dynamic Living Population by Analyzing Mobile Phone Call Detail Records; The University of Tokyo: Tokyo, Japan, 2013. 10. González, M.C.; Hidalgo, C.A.; Barabási, A.L. Understanding individual human mobility patterns. Nature 2008, 453, 779–782. [CrossRef][PubMed] 11. Schneider, C.M.; Belik, V.; Couronne, T.; Smoreda, Z.; Gonzalez, M.C. Unravelling daily human mobility motifs. J. R. Soc. Interface 2013, 10, 20130246. [CrossRef][PubMed] 12. Louail, T.; Lenormand, M.; Cantu Ros, O.G.; Picornell, M.; Herranz, R.; Frias-Martinez, E.; Ramasco, J.J.; Barthelemy, M. From mobile phone data to the spatial structure of cities. Sci. Rep. 2014, 4, 1–12. [CrossRef] [PubMed] 13. Wang, P.; Hunter, T.; Bayen, A.M.; Schechtner, K.; González, M.C. Understanding road usage patterns in urban areas. Sci. Rep. 2012, 2.[CrossRef][PubMed] 14. Caceres, N.; Wideberg, J.P.; Benitez, F.G. Deriving origin—Destination data from a mobile phone network. IET Intell. Transp. Syst. 2007, 15–26. [CrossRef] 15. Iqbal, M.S.; Choudhury, C.F.; Wang, P.; González, M.C. Development of origin-destination matrices using mobile phone call data. Transp. Res. Part C Emerg. Technol. 2014, 40, 63–74. [CrossRef] 16. Wang, H.; Calabrese, F.; Di Lorenzo, G.; Ratti, C. Transportation Mode Inference from Anonymized and Aggregated Mobile Phone Call Detail Records. In Proceedings of the 13th International IEEE Conference on Intelligent Transportation Systems, Funchal, , 19–22 September 2010. ISPRS Int. J. Geo-Inf. 2018, 7, 259 21 of 21

17. Qu, Y.; Gong, H.; Wang, P. Transportation Mode Split with Mobile Phone Data. IEEE Conf. Intell. Transp. Syst. Proc. 2015, 2015, 285–289. [CrossRef] 18. Zin, T.A.; Lwin, K.K.; Sekimoto, Y. Estimation of Originating-Destination Trips in Yangon by Using Big Data Source. J. Disaster Res. 2018, 13, 6–13. [CrossRef] 19. Jiang, S.; Ferreira, J.; Gonzalez, M.C. Activity-Based Human Mobility Patterns Inferred from Mobile Phone Data: A Case Study of Singapore. IEEE Trans. Big Data 2017, 3, 208–219. [CrossRef] 20. Phithakkitnukoon, S.; Horanont, T.; Di Lorenzo, G.; Shibasaki, R.; Ratti, C. Activity-aware map: Identifying human daily activity pattern using mobile phone data. Lect. Notes Comput. Sci. 2010, 6219, 14–25. [CrossRef] 21. Toole, J.L.; Colak, S.; Sturt, B.; Alexander, L.P.; Evsukoff, A.; González, M.C. The path most traveled: Travel demand estimation using big data resources. Transp. Res. Part C Emerg. Technol. 2015, 58, 162–177. [CrossRef] 22. De Montjoye, Y.; Quoidbach, J.; Robic, F. Phone-Based Metrics. In Proceedings of the 6th international conference on Social Computing, Behavioral-Cultural Modeling and Prediction, Washington, DC, USA, 2–5 April 2013; pp. 48–55. [CrossRef] 23. Blumenstock, J.; Cadamuro, G.; On, R. Predicting poverty and wealth from mobile phone metadata. Science 2015, 350, 1073–1076. [CrossRef][PubMed] 24. Steele, J.E.; Sundsøy, P.R.; Pezzulo, C.; Alegana, V.A.; Bird, T.J.; Blumenstock, J.; Bjelland, J.; Engø-Monsen, K.; de Montjoye, Y.-A.; Iqbal, A.M.; et al. Mapping poverty using mobile phone and satellite data. J. R. Soc. Interface 2017, 14, 20160690. [CrossRef][PubMed] 25. Japan International Cooperation Agency (JICA). Comprehensive Urban Transport Master Plan for the Greater Maputo 2014; JICA: Tokyo, Japan, 2014. 26. HRSL; Columbia University: New York, NY, USA, 2016. 27. Contributors, O. OpenStreetMap. Available online: www.openstreetmap.org (accessed on 15 January 2018). 28. McNally, M.G. The four-step model. In Handbook of Transport Modelling, 2nd ed.; Emerald Group Publishing Limited: Bingley, UK, 2007; pp. 35–53. 29. Okabe, A.; Boots, B.; Sugihara, K. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams; John Wiley & Sons, Inc.: New York, NY, USA, 1992; ISBN 0-471-93430-5. 30. Cervero, R.; Murakami, J. Effects of built environments on vehicle miles traveled: Evidence from 370 US urbanized areas. Environ. Plan. A 2010, 42, 400–418. [CrossRef] 31. Zegras, C. Influence of land use on travel behavior in Santiago, Chile. Transp. Res. Rec. J. Transp. Res. Board 2004, 175–182. [CrossRef] 32. Alexander, L.; Jiang, S.; Murga, M.; González, M.C. Origin–destination trips by purpose and time of day inferred from mobile phone data. Transp. Res. Part C 2015, 58, 240–250. [CrossRef] 33. Isaacman, S.; Becker, R.; Cáceres, R.; Kobourov, S.; Martonosi, M.; Rowland, J.; Varshavsky, A. Identifying important places in people’s lives from cellular network data. In Proceedings of the 2011 International Conference on Pervasive Computing, San Francisco, CA, USA, 12–15 June 2011; pp. 133–151. 34. Sekimoto, Y.; Watanabe, A.; Nakamura, T.; Horanont, T. Digital archiving of people flow by recycling large-scale social survey data of developing cities. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, 39, B2. [CrossRef] 35. Moeller, C. Osm2po-OpenStreetMap Converter and Routing Engine for Java. Available online: http//osm2po (accessed on 15 January 2018).

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).