applied sciences

Article Geographically Modeling and Understanding Factors Influencing Transit Ridership: An Empirical Study of Shenzhen

Yuxin He 1 , Yang Zhao 1,2,* and Kwok-Leung Tsui 1,2 1 School of Data Science, City University of , Hong Kong 999077, Hong Kong; [email protected] (Y.H.); [email protected] (K.-L.T.) 2 Centre for Systems Informatics Engineering, City University of Hong Kong, Hong Kong 999077, Hong Kong * Correspondence: [email protected]

 Received: 2 September 2019; Accepted: 3 October 2019; Published: 10 October 2019 

Featured Application: Enriching the practical applications of direct demand models with geographically weighted regression (GWR) and examining the influencing factors of transit ridership can provide multiple potential implications for travel demand modelers, transit operators, and urban planners. The results of the GWR model indicate that transit travel demand (ridership) can be estimated by identifying the significant variables from a local perspective, which can help to establish an efficient transit system combined with sustainable urban development.

Abstract: Ridership analysis at the local level has a pivotal role in sustainable urban construction and transportation planning. In practice, urban rail transit (URT) ridership is affected by complex factors that vary across the urban area. The aim of this study is to model and explore the factors that impact metro station ridership in Shenzhen, from a local perspective. The direct demand model, which uses ordinary least squares (OLS) estimation, is the most widely used method of ridership modeling. However, OLS estimation assumes parametric stability. This study investigates the use of a direct demand model on the basis of geographically weighted regression (GWR) to model the local relationships between metro station ridership and potential influencing factors. Real-world smart card data are used to test and verify the applicability and performance of the model. The results show that GWR performs better than OLS estimation in terms of both model fitting and spatial interpretation. The GWR model demonstrates a high level of interpretability regarding the spatial distribution and variation of each coefficient, and thus can provide insights for decision-makers into URT ridership and its complex factors from a local perspective.

Keywords: geographically weighted regression (GWR); metro ridership; influencing factors; spatial autocorrelation

1. Introduction The dramatic increase in urbanization in the last few decades has made urban rail transit (URT) a central pillar of public transport, due to its efficiency and transport capacity. Identifying the dynamic mechanisms of urban transit is critical to both infrastructure planning and transportation operation, and thus urban indicators such as URT station ridership must be investigated systematically and comprehensively. URT ridership is a key factor used to determine a station’s occupation in terms of space and the supporting facilities required. URT ridership is known to be affected by the interaction of specific urban elements (such as land use and socio-economics). Thus, understanding the impact of these elements is essential to accurately estimate travel demand and effectively plan and design

Appl. Sci. 2019, 9, 4217; doi:10.3390/app9204217 www.mdpi.com/journal/applsci Appl. Sci. 2019, 9, 4217 2 of 23 urban systems, including planning infrastructure and deploying services and resources. URT ridership modeling can help estimate ridership and explore the influencing factors. The complex factors assumed to affect metro ridership include land use, socio-economics, intermodal transport accessibility, and network structures. Ridership modeling has long been applied in the field of transportation planning. Numerous models have been developed to explore the relationships between transit ridership and influencing factors. The four-step model (generation, distribution, mode choice, and assignment) was first developed in the 1950s and has since been dominant in traffic analysis. However, the four-step model has various practical shortcomings, such as its low level of accuracy, imprecise data, insensitivity to land use, institutional obstacles, and high cost [1]. The model is more applicable to transit ridership estimation in traffic zones (large regions) than in stations (small detailed regions) [2]. The direct demand model has recently attracted attention as an alternative to the four-step model. This estimates ridership via regression models and treats it as a function of its influencing factors in the pedestrian catchment area (PCA) and can thus identify influencing factors that can help to increase the volume of transit ridership [1,3–6]. In the model, the PCA is the geographical area from which the station draws its passengers. The shape and size of the PCA are determined by the accessibility of one station and the distance from other stations to it. Buffers can be used to generate circular catchment areas with a given distance, and Thiessen polygons are typically used to define an area around each station, where every location is nearer to this station than to all the others. The main advantages of the direct demand model in ridership modeling are simple usage, easy interpretation, quick response, and low expenses. Ordinary least squares (OLS) multivariate regression is a commonly used direct demand model, and assumes that the parameters are stable [1,5,7–16]. The advantages of spatial models are considered in direct demand models, and thus provide a better spatial interpretation by implementing geographically weighted regression (GWR), which can measure the nonstationarity and heterogeneity of spatial parameters. In addition, as real transportation systems are dynamic, the traffic demand of the transport network differs by the time of day and the day of week. The time-varying patterns of travel demand are accompanied by time-varying ridership, which is influenced by several factors. If the time dependence of travel demands is not considered, inaccurate estimations of ridership and incorrect analyses of the influencing factors may result. Thus, the specific factors influencing transit ridership at station level at different time periods should be identified so follow-up dynamic ridership forecasting can be accurately conducted, thus providing a theoretical foundation for real-time traffic management. In summary, the current direct demand models have various shortcomings while the GWR model has various advantages in the modeling of latent spatially varying relations. Thus, a direct demand model based on the GWR model is used to model the factors affecting transit ridership during different periods (evening rush hours, nonrush hours, average weekday ridership, average weekend ridership, and average daily ridership per week). Feature selection is conducted via the backward stepwise method before the GWR process. The applicability and effectiveness of the framework are demonstrated using the Shenzhen Metro network as a case study. Ridership data for 118 Shenzhen Metro stations were collected using the Automated Fare Collection (AFC) system in 2013. Four types of potential factors are considered: land use, socio-economics, the accessibility of intermodal transport, and network structure information. This study makes both methodological and empirical contributions, as follows. Methodologically, by quantifying the network structure factors using measurements in the domain of a complex network, comprehensive information and significant regression performance are obtained. To the extent of our knowledge, little research has been done in this way before. Empirically, different GWR models are implemented for different time periods, which can both model the local relationships between the metro station ridership and its potential influencing factors and interpret the temporal variation in the influencing factors of ridership. Therefore, the study contributes to metro planning and periphery development from both spatial and temporal perspectives. The findings also have several practical Appl. Sci. 2019, 9, 4217 3 of 23 implications for urban and transport policymakers, thus bridging the gap between theory and practice. The study also advances the literature of transit ridership modeling from a local perspective. In addition, its framework can be extended to other areas, such as analyzing the influencing factors of public health conditions (virus and disease spreading) and customer volume at different locations. The remaining parts of the paper are organized as follows. A comprehensive review of the research on modeling and analyzing transit ridership and its influencing factors is provided in Section2. The data collected for the study are described in Section3. Section4 introduces the methods of estimating transit station ridership and identifying the key influential factors. Section5 details and analyzes the results of the model implementation. Section6 concludes and provides practical recommendations.

2. Prior Research

2.1. Models for Estimating Transit Ridership Numerous studies have recently emerged that explore the influencing factors impacting transit ridership. We review the research focusing on direct demand models in transit ridership estimation (as summarized in Table1). The most widely used direct demand model in this research field is OLS multiple regression. Kuby et al. (2004) [5], Sohn and Shim (2010) [8], Loo et al. (2010), Sung and Oh (2011) [10], Gutierrez et al. (2011) [1], Thompson et al. (2012) [11], Guerra et al. (2012) [12], Zhao et al. (2013) [13], Chan and Miranda-Moreno (2013) [14], Singhai et al. (2014) [15], Liu et al. (2016) [16], Pan et al. (2017) [17], and Vergel-Tovar and Rodriguez (2018) [18] all used OLS regression models to fit the relationship between transit ridership and its influencing factors. However, OLS regression is limited by its assumption that the factors affecting transit ridership have nothing to do with the spatial location of stations, as OLS regression does not consider the spatial autocorrelation of variables. Fotheringham (1996) [19] proposed the geographically weighted regression (GWR) model, which is able to reveal the spatial correlations of variables under the condition of spatial heterogeneity, and thus is more suitable for dealing with spatial data analytics. GWR is widely used for spatial data analysis in many fields, such as economics, geography, and ecology. However, the model has rarely been applied to the field of transportation planning. The only analyses comparing the results of OLS and GWR modeling of the factors influencing transit ridership are those conducted by Cardozo et al. (2012) [2] and Tu et al. (2018) [20], who found a better fit for GWR. Jun et al. (2015) [21] used mixed geographically weighted regression (MGWR) models incorporating both local and global factors to explore the association between metro ridership and land use features. Their work provides new research directions that justify further investigations using this type of model. However, their GWR models still have some limitations, such as ignoring the differences in travel demand between time periods and other network structure factors. Thus, there is room for improvement in the implementation of GWR in transit ridership modeling. Other methods have also been considered, such as multiplicative regression (Choi et al., 2012; Zhao et al., 2014; Kepaptsoglou et al., 2017) [3,22,23], two-stage least square regression (2SLS) (Taylor et al., 2004; Estupiñán and Rodriguez, 2008) [24,25], Poisson regression (Chu, 2004; Choi et al., 2012) [3,6], negative binomial regression (Thompson et al., 2012) [11], and structural equation modelling (SEM) (Sohn and Shim, 2010) [8]; geographical methods such as distance-decay weighted regression (Gutiérrez et al., 2011) [1] and the network Kriging method (Zhang and Wang, 2014) [26]; machine learning methods such as the decision tree (DT) and support vector regression (SVR); and item-based collaborative filtering methods based on cosine similarity (CF) (Hu et al., 2016) [27], cluster analysis (Deng and Xu, 2015) [28], and back propagation neural networks (BPNN) (Li et al., 2016) [29]. However, understanding the results is a major challenge in terms of the interpretability of the function modeled by the machine learning algorithm. In regression models, there is a very simple relationship between inputs and outputs, and thus we choose the GWR model for our study. Appl. Sci. 2019, 9, 4217 4 of 23 Appl. Sci. 2019, 9, x FOR PEER REVIEW 4 of 23

TableTable 1. A 1. review A review of of the the literature literature related related to transit ridership ridership modeling modeling [1– [18,201–18–,2920].– 29].

Response Variable Factors Investigated Residual Morning Data Daily Ridership Average Monthly Daily and Peak and Analysis City Time Annual Station Weekly by Not Land Network Intermodal Socio- Station Built Transit External Reference Weekday Station Hourly Evening Weather TimeCitywide Method Span Ridership Passenger Ridership Transit Mentioned use Structure Connections Economics Environment Service Connectivity Ridership Ridership Subway Peak Volume Mode Ridership Ridership OLS/structural Not Sohn and Seoul, Korea √ √ √ √ equation mentioned Shim (2010) model (SEM) Distance-decay Madrid, Gutiérrez et 2004 √ √ √ √ weighted Spain al. (2011) regression Madrid, Cardozo et al. 2004 √ √ √ √ OLS/GWR Spain (2012) , Zhao et al. 2010 √ √ √ √ √ √ OLS China (2013) Shanghai, Pan et al. 2011 √ √ √ √ √ OLS China (2017) Hong Kong; Loo et al. New York, 2005 √ √ √ √ √ OLS (2010) U.S. Nine U.S. cities (ranging Kuby et al. from Buffalo 2000 √ √ √ √ √ √ OLS (2004) to St. Louis to San Diego) The state of Maryland, which consists of Liu et al. 2011 √ √ √ √ √ OLS 23 counties (2016) and the city of Baltimore, U.S. OLS and Nanjing, Zhao et al. 2011 √ √ √ √ multiplicative China (2014) regression New York, Singhal et al. 2010–2011 √ √ OLS U.S. (2014) Not Decision tree Hu et al. Singapore √ √ √ mentioned (DT), support (2016) Appl. Sci. 2019, 9, 4217 5 of 23

Appl. Sci. 2019, 9, x FOR PEER REVIEW Table 1. Cont. 5 of 23

vector regression (SVR), and item-based collaborative filtering method based on cosine similarity (CF) Back Tokyo, propagation 2010 √ √ √ √ Li et al. (2016) Japan neural network (BPNN) Chan and Montreal, 1998 and Miranda- √ √ √ √ OLS Canada 2003 Moreno (2013) , Cluster Deng and Xu 2014 √ √ √ China analysis (2015) Not Jun et al. Seoul, Korea √ √ √ √ MGWR mentioned (2015) Network New York Not Zhang and √ √ √ Kriging City, U.S. mentioned Wang (2014) method 265 Two-stage urbanized least squares Taylor et al. 2000 √ √ √ √ areas in the regression (2003) U.S. (2SLS) Jacksonville, Poisson 2001 √ √ √ √ √ Chu (2004) Florida, U.S. regression Estupiñán Bogota, and 2005, 2006 √ √ √ √ 2SLS Colombia Rodríguez (2008) Sung and Oh Seoul, Korea 2007 √ √ √ √ OLS (2011) Multiplicative Seoul, Korea Choi et al. 2010 √ √ √ and Poisson For (2012) regression Broward Negative Thompson et County, FL, 2000 √ √ √ √ binomial al. (2012) U.S. regression Three major Not Multiplicative Kepaptsoglou cities in √ √ √ √ √ √ mentioned regression et al. (2017) Cyprus Appl. Sci. 2019, 9, 4217 6 of 23

Appl. Sci. 2019, 9, x FOR PEER REVIEW Table 1. Cont. 6 of 23

Shenzhen, Tu et al. 2014 √ √ √ √ OLS/GWR China (2018) Vergel-Tovar Seven Latin and American 2014 √ √ √ √ √ √ OLS Rodriguez cities (2018)

Appl. Sci. 2019, 9, 4217 7 of 23

2.2. Explanatory and Response Variables The explanatory variables used in the models summarized in Table1 can be categorized into the four main groups of land use, socio-economics, transport accessibility, and network structure variables. Land use variables have been used extensively in previous research. For example, Jun et al. (2015) [21] first evaluated land use characteristics, including the proportions of residential, commercial and office, manufacturing, and mixed land use, within the PCAs of metro stations in the Seoul metropolitan area, and then explored the impact of these characteristics on metro station ridership. Hu et al. (2016) [27] examined the association between land use characteristics at two levels of granularity and public transit ridership in Singapore in time and space. The main socio-economics variables used are population, employment, and automobile ownership ratio. For example, Kuby et al. (2004) [5] identified the factors that attracted light-rail ridership in nine U.S. cities. They found that employment within walking distance of each station and residential population were significant factors. Thompson et al. (2012) [11] analyzed the determinants of work trip transit ridership in Broward County, Florida. Vehicles per person and parking fees were the potential determinants, and automobile parking fees were found to induce higher levels of transit ridership. As for the third category, transport accessibility, Loo et al. (2010) [9] considered the intermodal competition public transit mode (i.e., the number of bus stations within a station’s PCA) as an important factor influencing metro ridership, and they found that the number of bus stations is statistically significant in determining metro ridership in the regression model. Zhao et al. (2013) [13] studied how the accessibility of Nanjing metro stations to other modes of transport, such as feeder bus lines stopping at a station and park-and-ride spaces for nonmotor vehicles, influenced metro station ridership, and found that both were significant factors. Finally, in terms of network structure, Kuby et al. (2004) [5] considered transfer stations and terminal stations when examining the relationship between transit ridership and the station’s properties, using dummy variables to distinguish between these station types. Sohn and Shim (2010) [8] and Thompson et al. (2012) [11] also considered the factor of transfer station using dummy variables. However, measurements from graph theory and network analysis have not been applied in this research field, although practical significance can be achieved by calculating the degree centrality of nodes to distinguish between transfer, terminal, and normal stations, or by calculating a node’s betweenness centrality to indicate how a station allows traffic flows to pass from one part of the metro network to another. In terms of the response variable in direct demand models, daily ridership has been the most common concern, as in Kuby et al. (2004) [5], Chu (2004) [6], Sohn and Shim (2010) [8], Loo et al. (2010) [9], Zhao et al. (2013) [13], and Zhang and Wang (2014) [26], who all regarded the average weekday ridership as the response variable. Gutierrez et al. (2011) [1] and Cardozo et al. (2012) [2] considered monthly station ridership as the response in their models. Taylor et al. (2003) [24] and Thompson et al. (2012) [11] used annual ridership as the response. Zhao et al. (2014) [22], Singhal et al. (2014) [15], Hu et al. (2016) [27], Li et.al. (2016) [29], and Chan and Miranda-Moreno (2013) [14] used the shorter hourly analysis interval (e.g., peak hour ridership) as the response. However, few studies have considered the differences in influencing factors between time periods.

3. Study Area and Data Our study investigates Shenzhen Metro network, which consisted of five lines and 118 stations in the year of 2013. The Shenzhen Metro ridership used in this study was collected and aggregated from the transit entry-exit smart cards records between 14 October (Monday) 2013 and 20 October (Sunday) 2013, which are provided by Shenzhen Metro Corporation (Shenzhen Metro Corporation: http://www.szmc.net/) in China, which obtains the smart cards records from AFC system. Appl. Sci. 2019, 9, 4217 8 of 23

3.1. Explanatory Variables Explanatory variables represent the factors that hypothetically affect the ridership at the station (Table2). Variables can be categorized into four groups: (1) land use; (2) socio-economics; (3) network structure; (4) intermodal transport accessibility.

Table 2. Summary of explanatory variables.

Explanatory Acronym of Categories Source Minimum Mean Maximum Variables Variables Residential units Residence Baidu Map 0 36.86 218 Restaurants Restaurant Baidu Map 0 48.52 291 Land use (The Retailers/shopping Shopping Baidu Map 0 115.18 400 number of Schools School Baidu Map 0 5.64 64 ***(land uses) Offices Offices Baidu Map 0 18.58 298 within PCA) Banks Bank Baidu Map 0 9.60 73 Hospitals Hospital Baidu Map 0 1.008 10 Hotels Hotel Baidu Map 0 13.24 135 Distance to the city Dis_to_center Calculated 0.36 9.97 27.41 center Network structure Degree centrality Degree Calculated 0.017 0.037 0.068 Betweenness Betweenness Calculated 0 0.11 0.31 centrality Population Pop Worldpop 27.54 156.49 354.55 Socio-economics Days since opening Days_open UrbanRail 839 1762 3132 Intermodal The number of bus transport Bus Baidu Map 0 7.72 30 stations within PCA accessibility

3.1.1. Land Use Dovey et al. (2017) [30] believe that in large and medium-sized cities, friendly walking distance is generally 500 m. Therefore, we determine the PCA for each Shenzhen Metro Station as a circular buffer with a radius of 500 m. Then, with the help of Baidu Map API, we can collect the relevant data of land use variables. Land use variables include surrounding residence, recreational facilities, commercial districts, educational institutions, office areas, and so on. To be more detailed, land use variables involve the number of residential units, restaurants, schools, offices, hospitals, banks, shops, and hotels within the PCA with a radius of 500 m.

3.1.2. Socio-Economics In the category of socio-economic variables, there are two factors to be considered. The information about when those metro stations were opened was collected from a website named “UrbanRail” (Source: http://www.urbanrail.net/as/cn/shen/shenzhen.htm). With the information, we can calculate the elapsed days since stations were opened to the investigating days (14 October 2013 to 20 October 2013). A higher residential population is assumed to be positively correlated with ridership. Here, we obtained information on population distribution all around Shenzhen in 2013 collected from the website of Worldpop (Source: WorldPop (www.worldpop.org---School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University (2018). Global High Resolution Population Denominators Project—Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00645). The population data were processed with ArcGIS 10.2 (Environmental Systems Research Institute, Inc (Esri) (2014) Arcgis. URL http://desktop.arcgis.com/en/arcmap/). Specifically, the buffers of all stations with a radius of 500 m were generated and the population within the buffers are aggregated, and the population distribution and 500 m buffers are illustrated as Figure1. Appl. Sci. 2019, 9, 4217 9 of 23 Appl. Sci. 2019, 9, x FOR PEER REVIEW 7 of 23

FigureFigure 1. The1. The 500 500 m m bu buffersffers ofof metrometro stations and and population population distribution. distribution.

FromFrom Figure Figure1, we 1, can we observecan observe that that the denselythe densely populated populated areas areas are are mainly mainly surrounding surrounding the the metro stations.metro The stations. impact The of impact residential of residential population population density density on ridership on ridership in the in butheff bufferer zone zone of eachof each station station will be analyzed in further modeling. will be analyzed in further modeling. 3.1.3.3.1.3. Network Network Structure Structure We considerWe consider the the degree degree centrality, centrality, betweenness betweenness centrality, centrality, andand the distance distance from from metro metro stations stations to the cityto the center city as center the main as the factors main factors in the category in the category of network of network structure. structure. In graph In graph theory theory and and network analysis,network degree analysis, centrality degree is onecentrality of the is easiest one of centrality the easiest measures centrality to measures calculate, towhich calculate, is simply which ais count of howsimply many a count edges of ahow node many has. edges Betweenness a node has. centrality Betweenness of a centrality node is measuredof a node is with measured the number with of the number of shortest paths (between any pair of nodes in the graphs) that passes through the target shortest paths (between any pair of nodes in the graphs) that passes through the target node [31]. node [31]. Therefore, they are related to specific transfer stations, normal intermediate stations, and Therefore,terminal they stations, are related as well toas the specific role of transfer a station stations, in allowing normal traffic intermediateflows to pass from stations, one part and of terminalthe stations,metro as network well as to the the roleother. of The a station city center in allowingof Shenzhen tra isffi Shenzhenc flows to Municipal pass from People one’s part Government of the metro networkin Futian to the District other.. T Theo accurately city center calculate of Shenzhen the distance is Shenzhenof each station Municipal to the city People’s center, we Government take the in Futianinfluence District. of the To accuratelyradius of the calculate earth into the consideration, distance of and each the station distance to from the citystation center, 𝑖 to the we city take the influencecenter of 퐷𝑖푠푡 the푖 radiusis as follows of the (1): earth into consideration, and the distance from station i to the city center Dist is as follows (1): i 퐷𝑖푠푡푖 = R ∗ arccos⁡(cos(퐿푎푡0) 휋 (1) ∗ cos(퐿푎푡푖) ∗ cos(퐿표푛0 − 퐿표푛푖) + sin(퐿푎푡푖) ∗ sin⁡(퐿푎푡0)) ∗ π Dist = R arccos(cos(Lat ) cos(Lat ) cos(Lon Lon ) + sin(Lat ) sin(Lat180)) (1) i ∗ 0 ∗ i ∗ 0 − i i ∗ 0 ∗ 180 where R is the radius of the earth, (퐿푎푡0,⁡퐿표푛0) and (퐿푎푡푖,⁡퐿표푛푖) are respectively the latitude and 𝑖 wherelongitudeR is the radiusof the city of thecenter earth, and ( Latstation0, Lon . 0The) and required (Lati, Lon geographicali) are respectively information the was latitude collected and from longitude of theGoogle city center Maps (Source: and station https://maps.google.com)i. The required geographical. information was collected from Google Maps (Source: https://maps.google.com). 3.1.4. Intermodal Transport Accessibility 3.1.4. IntermodalFor intermodal Transport transport Accessibility accessibility, assuming that the number of surrounding bus stations of Forthe metro intermodal station transportis positively accessibility, correlated with assuming the ridership that theat the number station, of we surrounding consider the feeder bus stations bus of the metrosystem station of the metro. is positively The data correlated are also collected with the from ridership the Baidu at themap. station, we consider the feeder bus system of the metro. The data are also collected from the Baidu map. 3.2. Response Variable 3.2. Response Variable The purpose of this study is to explore the various impact factors of metro station ridership Theduring purpose different of thistime studyperiods. is toAs explore mentioned the earlier, various travel impact demands factors and of metro patterns station differ ridership at the time during differentof day time and periods. the day of As week, mentioned and the earlier,temporal travel distribution demands of s andmart patterns card records differ in at all the available time of days day and the day of week, and the temporal distribution of smart card records in all available days (14 October to 20 October) (shown in Figure2b) indicates that the average daily travel frequency is higher than that on weekends. Appl. Sci. 2019, 9, x FOR PEER REVIEW 8 of 23

(14 October to 20 October) (shown in Figure 2b) indicates that the average daily travel frequency is higher than that on weekends. We use the smart card records data collected from the AFC system of Shenzhen metro on 14 October 2013 to make a preliminary statistical analysis. Figure 2a presents the number of transaction records of smart cards distributed in space on 14 October. The top four stations (Grand Theatre Station, Laojie Station, Huaqiang Road Station, and Luohu Station) with the most transaction records are marked in Figure 2a. Figure 2b shows the temporal distribution of smart card records. It is noted that there is a morning traffic rush and an evening traffic rush on both weekdays and weekends. Moreover, the Appl.records Sci. 2019 of, 9 ,rush 4217 hours on weekdays are significantly more than those of weekend rush hours, whereas10 of 23 the records of nonrush hours on weekdays are less than those of weekends.

FigureFigure 2. 2.Spatiotemporal Spatiotemporal distribution of of smart smart card card records. records. (a) Spatial (a) Spatial distribution distribution of records of on records 14 on 14October October;; (b) Temporal (b) Temporal distribution distribution of hourly of records hourly on records different on days diff oferent the week days (14 of October the week–20 (14 October–20October). October).

We useTherefore, the smart the cardregression records models data collectedwith different from response the AFC variables system of including Shenzhen evening metro onrush 14 hour October 2013(17:0 to make0–19:00), a preliminary nonrush hou statisticalr (9:00–17 analysis.:00, 19:00– Figure23:00),2 aaverage presents weekday the number ridership, of transaction average weekend records of smartridership cards distributed, and average in daily space ridership on 14 October. of a week The (Shenzhen top four Metro stations is operated (Grand from Theatre 6:30 Station,to 23:00 Laojiein 2013) are built to identify the impact factors of ridership during the five aforementioned time Station, Huaqiang Road Station, and Luohu Station) with the most transaction records are marked in intervals. More concretely, ridership of each station is the sum of entry and exit records (Average Figure2a. daily ridership refers to the average of the total ridership within a few days of operation time (6:30– 23Figure:00). Evening2b shows rush the hour temporal ridership distribution is determined of smart by dividing card records. the total It isridership noted that in the there whole is a morningweek traffiofc rushevening and rush an eveninghours from tra ffi17:00c rush to 19:00 on both by 14 weekdays h (2 h multipl and weekends.ied by seven Moreover, days). Nonrush the records hour of rushridership hours on is weekdays obtained by are dividing significantly the total more ridership than of those remaining of weekend time (9:0 rush0–17 hours,:00, 19:0 whereas0–23:00) the except records of nonrushmorning hours (7:00– on9:00) weekdays and evening are lessrushthan hours those of a ofwhole weekends. week by 84 hours (12 hours multiplied by sevenTherefore, days)). the regression models with different response variables including evening rush hour (17:00–19:00), nonrush hour (9:00–17:00, 19:00–23:00), average weekday ridership, average weekend ridership,4. Method andology average daily ridership of a week (Shenzhen Metro is operated from 6:30 to 23:00 in 2013) are built to identify the impact factors of ridership during the five aforementioned time intervals. 4.1. Variable Selection More concretely, ridership of each station is the sum of entry and exit records (Average daily ridership refers toThe the structured average of data the contain total ridership 14 explanatory within variables a few days (shown of operation in Table 1) time with (6:30–23:00). a limited amount Evening rushof hour observations, ridership which is determined may cause by multicollinearity dividing the total and ridershipoverfitting. in Redundant the whole variables week of should evening be rush hoursremoved from 17:00 to ensure to 19:00 that by the 14 modelin h (2 h multipliedg process is byefficient. seven Thus, days). before Nonrush fitting hour the ridershipregression ismodel, obtained features from the original variables candidates must be selected. The backward stepwise regression by dividing the total ridership of remaining time (9:00–17:00, 19:00–23:00) except morning (7:00–9:00) method, a popular wrapper method, is used to select features and control the model complexity [32]. and evening rush hours of a whole week by 84 h (12 h multiplied by seven days)). The regression begins with a full model with all variables included, and at each step the variables are gradually eliminated to minimize the specific statistic used as a variable selection criterion. We can 4. Methodology thus eventually obtain a reduced model that best explains the data. 4.1. Variable Selection The structured data contain 14 explanatory variables (shown in Table1) with a limited amount of observations, which may cause multicollinearity and overfitting. Redundant variables should be removed to ensure that the modeling process is efficient. Thus, before fitting the regression model, features from the original variables candidates must be selected. The backward stepwise regression method, a popular wrapper method, is used to select features and control the model complexity [32]. The regression begins with a full model with all variables included, and at each step the variables are gradually eliminated to minimize the specific statistic used as a variable selection criterion. We can thus eventually obtain a reduced model that best explains the data. The Akaike information criterion (AIC) is commonly used in such methods. In general, suppose we have a model with variables. Let p be the number of variables in the model. Then, the AIC values are as follows: AIC = 2log likelihood + 2p (2) − − Appl. Sci. 2019, 9, 4217 11 of 23

Models with more variables have smaller root sum squares (RSSs), indicating a better goodness of fit, but increasing the complexity with more parameters. Models that strike a balance between fit and model size can generalize the best and perform substantially better than others. The AIC penalizes large models, so it strikes a balance. Thus, we use the AIC as a selection criterion for the backward stepwise regression method to obtain a reasonable model.

4.2. Geographically Weighted Regression We utilize geographically weighted regression (GWR) models to estimate station-level ridership. The GWR model is an extension of OLS multiple regression, which is shown as follows:

Xp yi = β + βkxik + εi (3) k=1

Local parameters can be estimated by introducing a geographical location factor into the regression, and the extended GWR model is as follows:

Xp yi = β0(ui, vi) + βk(ui, vi)xik + εi (4) k=1 where given the observation point i (i = 1, 2, ... , n) with longitude/latitude coordinates (ui, vi), yi and xi1, xi2, ... , xip refer to the observed values of the response variable y and explanatory variables x1, x2, ... , xp of point i, and εi is the error term, following a normal distribution. βk(ui, vi)(k = 1, 2, ... , p) denotes p unknown functions related to spatial location. The geographic location calibrated by (ui, vi) of each observation point i is weighted by the GWR model, and the weight is a type of distance decay function [33]. When calculating the distance matrix among stations, the Euclidean distance metric and network distance metric are assessed, and the latter considers that the travel activities always occur along the connection intermedia of the transportation network, such as metro lines and roads [26]. As the network structure of the Shenzhen Metro in 2013 was not particularly complex and there are no clear differences between the results calibrated by the network distance metric and the Euclidean distance metric, we use the Euclidean metric for simplicity of calculation. For application to other networks, the distance metric should be selected considering the network complexity, and more distance metrics related to network accessibility [34]. The determination of bandwidth will directly affect the weight function and the precision of the model, and is therefore critical. GWR4 software (GWR4 User Manual. http://geoinformatics.wp.st-andrews.ac.uk/download/software/GWR4manual.pdf.) is used to construct the GWR model, as it includes multiple kernel-type options and bandwidth methods. We select an adaptive bisquare kernel and use AICc to determine the bandwidth of the model, where AICc is the AIC with a correction for small sample sizes.

5. Results and Discussion

5.1. Spatial Autocorrelation Test for Variables Establishing whether the candidate variables are spatially autocorrelated is necessary before the GWR model can be implemented. A spatial autocorrelation test can detect the degree of spatial correlation of the variables, which will provide theoretical support for the feasible application of spatial models. Moran’s I, proposed by Patrick Alfred Pierce Moran (1950) [35], is a correlation coefficient that measures the spatial autocorrelation. The estimated Moran’s I values of the response variables and all of the candidate explanatory variables are higher than the expected I values, indicating that the variables have positive spatial autocorrelations (Table3). In the tables, we use the concise expressions weekly_ridership, weekday_ridership, weekend_ridership, evenrush_ridership, and nonrush_ridership to denote the average daily ridership in a week, average weekday ridership, average Appl. Sci. 2019, 9, x FOR PEER REVIEW 10 of 23 Appl. Sci. 2019, 9, 4217 12 of 23 concise expressions weekly_ridership, weekday_ridership, weekend_ridership, evenrush_ridership, and nonrush_ridership to denote the average daily ridership in a week, average weekday ridership, weekend ridership, evening rush hour ridership, and nonrush hour ridership, respectively. The Moran average weekend ridership, evening rush hour ridership, and nonrush hour ridership, respectively. scatter plot can directly reflect the spatial autocorrelation of variables, and the plot has four quadrants. The Moran scatter plot can directly reflect the spatial autocorrelation of variables, and the plot has A strongfour positivequadrants. spatial A strong correlation positive spatial is observed correlation when is theobserved values when aredistributed the values are in distributed the first and in third quadrants,the first and and a third negative quadrants, spatial and correlation a negative will spatial emerge correlation if they will fall inemerge the second if they andfall in fourth the second quadrants. Figureand3 presents fourth quadrants. the Moran Figure scatter 3 presents plots ofthe several Moran variables.scatter plots of several variables.

TableTable 3. Moran’s3. Moran’s I I test test for for spatialspatial autocorrelation autocorrelation of ofall allvariables. variables.

VariableVariable Moran’s IMoran’s Expected I Expected I Ip -Valuep-Value Weekly_ridership 0.241166216 −0.008547009 1.214e−06 Weekly_ridership 0.241166216 0.008547009 1.214e 06 Weekday_ridership 0.263893648− −0.008547009 1.441e− −07 Weekday_ridership 0.263893648 0.008547009 1.441e 07 Response − − Response VariablesWeekend_ridership Weekend_ridership0.166122759 0.1661227590.008547009 −0.008547009 0.0004006 0.0004006 Variables − Evenrush_ridershipEvenrush_ridership0.277295727 0.2772957270.008547009 −0.008547009 3.899e 3.899e08 −08 − − Nonrush_ridershipNonrush_ridership0.247797251 0.2477972510.008547009 −0.008547009 4.637e 4.637e07 −07 − − Residence 0.393652311 −0.008547009 1.349e−14 Residence 0.393652311 0.008547009 1.349e 14 − − RestaurantRestaurant 0.341715207 0.3417152070.008547009 −0.008547009 1.692e 1.692e11 −11 − − ShoppingShopping 0.231551952 0.2315519520.008547009 −0.008547009 3.923e 3.923e06 −06 − − SchoolSchool 0.289176791 0.2891767910.008547009 −0.008547009 4.164e 4.164e11 −11 − − OfficesOffices 0.174946826 0.1749468260.008547009 −0.008547009 1.365e 1.365e05 −05 − − BankBank 0.388310167 0.3883101670.008547009 −0.008547009 1.978e 1.978e14 −14 Candidate − − Candidate Explanatory BusBus 0.312294327 0.3122943270.008547009 −0.008547009 7.898e 7.898e10 −10 Explanatory − − Variables HospitalHospital 0.211271593 0.2112715930.008547009 −0.008547009 1.263e 1.263e05 −05 Variables − − Hotel 0.466480947 0.008547009 <2.2e 16 Hotel 0.466480947− −0.008547009 −<2.2e−16 Dis_to_center 0.959152856 0.008547009 <2.2e 16 Dis_to_center 0.959152856− −0.008547009 −<2.2e−16 Degree 0.125833878 0.008547009 0.005471 Degree 0.125833878− −0.008547009 0.005471 Betweenness 0.241057563 0.008547009 1.674e 06 Betweenness 0.241057563− −0.008547009 1.674e− −06 Pop 0.710332939 0.008547009 <2.2e 16 Pop 0.710332939− −0.008547009 −<2.2e−16 Days_open 0.383186771 0.008547009 2.108e 13 Days_open 0.383186771− −0.008547009 2.108e− −13

FigureFigure 3. 3.Moran Moran scatterscatter plots of of explanatory explanatory variables. variables. Appl. Sci. 2019, 9, 4217 13 of 23

None of the Moran’s I values presented in Figure3 are 0, indicating that they are not randomly distributed in space. In addition, most belong to the first and third quadrants, which indicates that the variables show significantly positive spatial autocorrelation. The three explanatory variables with the highest Moran’s I values are population, distance to the city center, and days since station opened, as their Moran’s I values are greater than 0.3 [36]. The results of the Moran’s I test thus provide a theoretical foundation for the rationale of the follow-up study.

5.2. Model Implementation and Results Analysis Strong spatial autocorrelation is found for all of the variables included in our study. Thus, it is feasible to implement GWR models to explore the association between Shenzhen Metro ridership and its influencing factors. The final selection of explanatory variables derived from the backward stepwise regression method for the five models is given in Table4.

Table 4. Variables selected in the models.

Variables Model 1 Model 2 Model 3 Model 4 Model 5 Response Variable Weekly_ridership Weekday_ridership Weekend_ridership Evenrush_ridership Nonrush_ridership (Ridership) Pop Pop Pop Pop Pop Betweenness Betweenness Degree Degree Degree Explanatory Days_open Days_open Betweenness Betweenness Betweenness Variables Shopping Office Days_open Days_open Days_open Dis_to_center Dis_to_center Residence School Shopping Shopping Dis_to_center Dis_to_center

Table4 enables us to find the common explanatory variables among the five models and the individual variables of each model. The common explanatory variables are the major factors impacting metro ridership, including population, betweenness centrality, and days since opening. The different individual variables of the five models indicate that factors affecting station-level ridership differ by the day of the week and time of the day. The number of offices within the PCA and the distance from the station to the city center are individual variables in the model for average weekday ridership. The number of residences and shopping places within the station catchment area is related to the average weekend ridership. Thus, commuting activities appear to mainly affect the average weekday ridership, while recreational activities related to commercial development such as shopping malls mainly affect the average weekend ridership. Across a single day, the number of schools and the distance to the center mainly affect the evening rush hour ridership, while the number of shopping places and the distance to the center mainly affect the nonrush hour ridership. Thus, passengers from schools (primary schools, high schools, and universities) contribute to the evening rush ridership and noncommuting activities like shopping affect the nonrush hour ridership. Due to limitations on space, we only compare the results of Model 5 with those of the OLS model in Table5, and we give the results of Models 1–4 compared with those of the OLS models in AppendixA (Tables A1–A4). Appl. Sci. 2019, 9, 4217 14 of 23 Appl. Sci. 2019, 9, x FOR PEER REVIEW 12 of 23

Table 5. Table 5.Results Results ofof thethe model for Nonrush_ridership. Nonrush_ridership.

Global (OLS) Local (GWR) Standard Variables Estimate t(Est/SE) Min Max Mean STD Error Intercept 459.94 32.18 14.29 −1446.82 4350.79 519.97 792.78 Pop 118.36 39.43 3.00 −471.76 400.50 80.31 170.68 Degree 21.29 40.77 0.52 −200.12 340.02 57.51 149.68 Betweenness 48.65 42.70 1.14 −177.32 402.69 53.90 116.54 Days_open 135.09 33.90 3.98 −2116.29 2539.11 153.31 593.88 Shopping 53.01 35.73 1.48 −479.35 196.72 −20.25 152.92 Dis_to_center −41.24 35.55 −1.16 −592.59 2590.58 55.57 447.78 Diagnostic R-squared 0.33 0.88 Adjusted R- 0.29 0.77 squared Sigma 349.49 199.58 AICc 1727.10 1677.79 Residual sum 13,557,658.86 2,475,111.14 of squares Number of 7 39.67 parameters GWR ANOVA Table Source SS DF MS F p-Value Global Residuals 1,355,7658.856 7.000 GWR Improvement 11,082,547.717 48.864 226,804.412 GWR Residuals 2,475,111.139 62.136 39,833.705 5.693781 0.0

First, Table 5 shows that the AICc values of all of the GWR models are smaller than those of the First, Table5 shows that the AICc values of all of the GWR models are smaller than those of the corresponding global regression (OLS) models. According to the evaluation criterion proposed by corresponding global regression (OLS) models. According to the evaluation criterion proposed by Fotheringham et al. (1996) [19], if the difference between the AICc values of a GWR model and an Fotheringham et al. (1996) [19], if the difference between the AICc values of a GWR model and an OLS model is more than 3, the GWR model can be considered more applicable than the OLS model, OLSeven model though is more it is more than complex. 3, the GWR The modeladjusted can R- besquared considered values moreof theapplicable GWR models than are the greater OLS than model, eventhose though of the it corresponding is more complex. OLS Themodels, adjusted demonstrating R-squared that values the GWR of the model GWR has models strong are explanatory greater than thosepower of theeven corresponding when considering OLS model models, complexity. demonstrating Likewise, that the the parameter GWR model values has (Sigma) strong indicating explanatory powerthe model even whenerror of considering the GWR models model complexity.are lower, and Likewise, the residual the parametersum of thevalues squares (Sigma) from the indicating GWR themodels model are error smaller of the than GWR those models from arethe lower,OLS models. and the Thus, residual the results sum ofshow the that squares the GWR from models the GWR modelsgenerally are smallerperform thanbetter those in goodness from the-of- OLSfit measures models. than Thus, the the OLS results models. show ANOVA that thetests, GWR as shown models generallyin Tableperform 5, are conducted better in to goodness-of-fit find out if the global measures (OLS) than regression the OLS model models. and ANOVA the GWR tests, model as have shown inthe Table same5, are statistical conducted performance to find out (the if same the global size of (OLS) error regressionvariance). The model results and suggest the GWR that model there is have a thesignificant same statistical improvement performance when GWR (the sameis used. size of error variance). The results suggest that there is a significantIn addition, improvement by comparing when GWR the results is used. of the five models, the model for nonrush_hour ridership regressionIn addition, is found by comparing to perform the the best results in terms of the of five the R models,-squared the value. model We for only nonrush_hour need the information ridership regressionabout population is found to distribution, perform the degree best in centrality, terms of the betweenness R-squared centrality, value. We days only need since the opening, information the aboutnumber population of shopping distribution, places and degree distance centrality, to the city betweenness center to use centrality,the GWR mo daysdel to since explain opening, 88% of the numberthe response of shopping variable places of andnonrush distance hour to the ridership. city center In addition,to use the the GWR relevant model todata explain covering 88% ofthe the information on the explanatory variables are easily accessible. response variable of nonrush hour ridership. In addition, the relevant data covering the information Figure 4 shows the standardized residuals of the GWR model for average weekday ridership, on the explanatory variables are easily accessible. and for most stations these are relatively small, demonstrating the high accuracy of the model. Figure4 shows the standardized residuals of the GWR model for average weekday ridership, Overpredictions (red bubbles) and underpredictions (blue bubbles) are randomly distributed in and for most stations these are relatively small, demonstrating the high accuracy of the model. Figure 5, which indicates that our model is well specified. The spatial autocorrelation (Moran’s I) test Overpredictionsof the regression (red residuals bubbles) helps and to ensure underpredictions that they are (bluespatially bubbles) random are (Table randomly 6). distributed in Appl. Sci. 2019, 9, x FOR PEER REVIEW 13 of 23

Appl. Sci. 2019, 9, 4217 15 of 23

Figure5, which indicates that our model is well specified. The spatial autocorrelation (Moran’s I) test ofAppl. the Sci. regression 2019, 9, x FOR residuals PEER REVIEW helps to ensure that they are spatially random (Table6). 13 of 23

Figure 4. Standardized residuals of GWR for average weekday ridership (Model 2).

Table 6. Global Moran’s I residuals test of the models for the average ridership across the whole week.

OLS GWR Moran’s index 0.108184155 −0.114488799 Expected index −0.008547009 −0.008547009 Variance 0.002855149 0.002870601 z-score 2.18460251 −1.9773379 Figure 4. Standardizedp-value residuals of GWR0.01446 for average weekday0.976 ridership (Model 2). Figure 4. Standardized residuals of GWR for average weekday ridership (Model 2).

Table 6. Global Moran’s I residuals test of the models for the average ridership across the whole week.

OLS GWR Moran’s index 0.108184155 −0.114488799 Expected index −0.008547009 −0.008547009 Variance 0.002855149 0.002870601 z-score 2.18460251 −1.9773379 p-value 0.01446 0.976

FigureFigure 5. 5.Local Local indicator indicator of of spatial spatial associationassociation (LISA) cluster cluster maps maps of of residuals residuals in in the the OLS OLS and and GWR GWR models.models. Colors Colors indicate indicate significant significant positivepositive (red and pink), pink), negative negative (pale (pale blue blue and and yellow), yellow), and and not not significantsignificant (blue) (blue) spatial spatial autocorrelation. autocorrelation. (a) (a) LISA LISA cluster cluster map map of residuals of residuals of OLS of modelOLS model for the for average the ridershipaverage acrossridership the across whole the week. whole (b )week. LISA (b) cluster LISA mapcluster of map residuals of residuals of GWR of modelGWR model for the for average the ridershipaverage acrossridership the across whole the week. whole week.

TableThe 6. globalGlobal Moran’s Moran’s I Iresiduals residuals test test of the modelsmodels for for the the average average ridership ridership across over the the whole whole week. week, shown in Table 6, demonstrates that GWR surpasses OLS, as the Moran I’s calculation is closer to the expected value in the GWR model. The residualsOLS of the GWR model GWR have a greater likelihood of Moran’s index 0.108184155 0.114488799 random distribution (p-value) and show less variance (z-score).− However, the residuals of OLS Expected index 0.008547009 0.008547009 demonstrate statistically significant clustering− characteristics (reflected− by the Z-score and p-value). Variance 0.002855149 0.002870601 The local indicator of spatial association (LISA) was proposed to represent local pockets of Figure 5. Local indicatorz-score of spatial association 2.18460251 (LISA) cluster maps of1.9773379 residuals in the OLS and GWR − nonstationarity,models. Colors assess indicate the psignificant-value influence positiveof individual (red 0.01446 and locations pink), negative on the magnitude (pale 0.976 blue andof the yellow), global and statistic, not andsignificant identify “outliers” (blue) spatial [37]. autocorrelation. Figure 5 shows (a) the LISA LISA cluster cluster map maps of of residuals residuals of in OLS the modelOLS and for GWR the modelsaverage for ridership average across daily the ride wholership week. over (b) a wholeLISA cluster week. map The of residuals residuals ofof theGWR OLS model model for the give The global Moran’s I residuals test of the models for the average ridership over the whole week, significantlyaverage ridership positive across high the-value whole clustering, week. while in the GWR model almost all of the clusters of shownresiduals in Table are ruled6, demonstrates out, implying that that GWR GWR surpasses makes a significant OLS, as the improvement Moran I’s calculationover OLS in is terms closer of to themodel expectedThe fittingglobal value from Moran’s inthe the perspective I residuals GWR model. oftest residuals. of The the residuals mode ls for of the the average GWR model ridership have over a greater the whole likelihood week, ofshown random in Table distribution 6, demonstrates (p-value) that and GWR show surpasses less variance OLS, as (z-score). the Moran However, I’s calculation the residuals is closer of to OLS the demonstrateexpected value statistically in the GWR significant model. clustering The residuals characteristics of the GWR (reflected model byhave the a Z-score greater and likelihoodp-value). of randomThe distr localibution indicator (p-value) of spatial and association show less (LISA)variance was (z- proposedscore). However, to represent the residuals local pockets of OLS of nonstationarity,demonstrate statistically assess the significant influence of clustering individual characteristics locations on the(reflected magnitude by the of theZ-score global and statistic, p-value). and identifyThe “outliers” local indicator [37]. Figure of spatial5 shows association the LISA cluster (LISA) maps was of proposed residuals to in rep theresent OLS and local GWR pockets models of fornonstationarity, average daily assess ridership the influence over a whole of individual week. The locations residuals on the of themagnitude OLS model of the give global significantly statistic, positiveand identify high-value “outliers” clustering, [37]. Figure while 5 in shows the GWR the LISA model cluster almost maps all of of the residuals clusters in of the residuals OLS and are GWR ruled models for average daily ridership over a whole week. The residuals of the OLS model give significantly positive high-value clustering, while in the GWR model almost all of the clusters of residuals are ruled out, implying that GWR makes a significant improvement over OLS in terms of model fitting from the perspective of residuals. Appl. Sci. 2019, 9, 4217 16 of 23 out, implying that GWR makes a significant improvement over OLS in terms of model fitting from the perspectiveAppl. Sci. 2019 of, 9 residuals., x FOR PEER REVIEW 14 of 23 Using the Voronoi algorithm [38], the Shenzhen Metro coverage area can be divided into several ThiessenUsing polygons the Voronoi according algorithm to the [38], locations the Shenzhen of the Metro stations. coverage Here, area the can spatial be divided distribution into several of the localThiessen R-squared polygons and according local coeffi tocients the locations is visualized of the usingstations. Thiessen Here, the polygons. spatial distribution The values of ofthe the local local R-squaredR-squared range and betweenlocal coefficients 0 and 1, whichis visualized indicates using the Thiessen satisfactory polygons. fitting ofThe the values local regressionof the local model. R- squared range between 0 and 1, which indicates the satisfactory fitting of the local regression model. Mapping the local R-squared values can help us to see where GWR has a higher predictive capacity and Mapping the local R-squared values can help us to see where GWR has a higher predictive capacity where it performs poorly. Figure6 illustrates the spatial distribution of the local R-squared of Model 2 and where it performs poorly. Figure 6 illustrates the spatial distribution of the local R-squared of for average weekday ridership and Model 5 for nonrush hour ridership, enabling us to understand Model 2 for average weekday ridership and Model 5 for nonrush hour ridership, enabling us to where the model has a stronger explanatory power (local R-squared). Both Model 2 and Model 5 have understand where the model has a stronger explanatory power (local R-squared). Both Model 2 and a higherModel explanatory5 have a higher power explanatory in the central-northpower in the central and southeast-north and regions southeast than regions in the than other in regions.the other In addition,regions.the In addition, local R-squared the local values R-squared of most values of theof most stations of the are stations higher are than higher 0.82; thesethan 0.82; stations these are mainlystations located are mainly in Luohu located district. in Luohu district.

FigureFigure 6. Spatial6. Spatial distribution distribution of theof the local local R-squared R-squared values values of Model of Model 2 and 2 Model and Model 5. (a )5. Local (a) Local R-squared R- values’squared distribution values’ distribution of Model 2 of (average Model 2 weekday). (average weekday). (b) Local R-squared(b) Local R values’-squared distribution values’ distri ofbution Model 5 (nonrushof Model hour). 5 (nonrush hour).

ByBy understanding understanding the the spatial spatial distribution distribution of local coefficients coefficients (elasticities) (elasticities) and and t-valuest-values (significance),(significance), we we can can determine determine how howthe relationship the relationship between between the variables the variables changes changes spatially spatially (estimated coe(estimatedfficients), andcoefficients), at what and level at of what statistical level of significance. statistical significance. For example, For in example, Model2 in (see Model Figure 2 (see7),as a commonFigure 7), factor, as a common the mean factor of the, the coe meanfficients of the for coeff the populationicients for the variable population is 3284.89. variable Thus, is 3284.89. for each personThus, within for each the person station’s within PCA, the the station’s number PCA, of trips the number adds up of to trips 3284.89 adds each up to weekday. 3284.89 each However, weekday. these elasticitiesHowever, are these distributed elasticities unevenly are distributed in space. unevenly More trips in perspace. capita More are trips expected per capita in the are central expected zone in and the central zone and the mid-north, where commercial and administrative areas and educational the mid-north, where commercial and administrative areas and educational institutions are intensively institutions are intensively distributed, while elasticity values are lower in the west and east. The t- distributed, while elasticity values are lower in the west and east. The t-value map on the right also value map on the right also shows that the effect of population is more significant in the middle area shows that the effect of population is more significant in the middle area at a 0.05 level (the absolute at a 0.05 level (the absolute value of a t-value larger than 1.96) (Figure 7a). value of a t-value larger than 1.96) (Figure7a). Appl. Sci. 2019, 9, 4217 17 of 23 Appl. Sci. 2019, 9, x FOR PEER REVIEW 15 of 23

Figure 7. SpatialSpatial distribution distribution of of local local coefficients coefficients (elasticities) (elasticities) on on weekdays. weekdays. (a) ( aSpatial) Spatial distribution distribution of oflocal local coefficients coefficients (elasti (elasticities)cities) and and t- t-valuesvalues (significance) (significance) for for the the population variable. (a)(b) Spatial distribution of local coecoefficientsfficients (elasticities) and t-valuest-values (significance)(significance) forfor thethe oofficeffice variable.variable.

For the individually selectedselected factorfactor inin Model Model 2, 2, the the mean mean of of the the elasticities elasticitie fors for the the o ffiofficece variable variable is 2397.24.is 2397.24. Thus, Thus, for for each each new new o ffiofficece within within the the station’s station’s PCA, PCA,the the numbernumber ofof tripstrips addsadds up to 2397.24 each weekday. Elasticities are higher (more trips attracted per office) office) in the east and west than in the middle, whereas the el elasticitiesasticities of the central and north regions are negative and low, indicating that people who go to work in these regions depend more on transport other than the metro. In In addition, addition, the t-valuet-value map on the right shows that the effect effect of oofficesffices is more significantsignificant in the east, west, and mid-southmid-south areas at a 0.05 level (the absolute value is largerlarger thanthan 1.96)1.96) (Figure(Figure7 7b).b). Thus,Thus, inin generalgeneral GWR is shown to have strong spatial explanatory power, based on local analysis of the variation of each coecoefficientfficient across spacespace (elasticities).(elasticities).

6. Conclusions This studystudy mainlymainly discussed discussed the the impact impact factors factors of stationof station ridership ridership of Shenzhen of Shenzhen metro metro at a localat a local level duringlevel during different different time periods. time periods. Four categories Four categories of influencing of influencing factors, includingfactors, including land use, land socio-economic, use, socio- networkeconomic, structure, network and structure, intermodal and transportintermodal access, transport are considered. access, are Forconsidered. data collection For data of thecollection factors of networkthe factors structure, of network what structure, is new compared what is with new prior compared studies with is that prior we introduce studies ismeasurements that we introduce from measurements from the field of network analysis, including degree centrality and betweenness centrality. Compared with the dummy variables mostly applied in the prior studies, these Appl. Sci. 2019, 9, 4217 18 of 23 the field of network analysis, including degree centrality and betweenness centrality. Compared with the dummy variables mostly applied in the prior studies, these measurements covering comprehensive information can better quantify the network structure factors related to the practical significance of metro networks. Through adopting backward stepwise regression method for variables selection, and Moran’s I spatial autocorrelation test, this paper builds GWR models to analyze the influencing factors of Shenzhen metro ridership at different time resolutions (including the day of week and the time of day) during different time periods (average daily ridership of weekdays, weekends and a whole week, and average hourly ridership of evening rush hours and nonrush hours). Finally, we demonstrate the superiority of GWR models through the case study of Shenzhen metro. Additionally, the impact factors of Shenzhen metro station ridership are explored and analyzed from a local perspective. The experimental results show that GWR models outperform OLS models in terms of both goodness-of-fit and explanatory power. In addition to the coefficient of determination of GWR models dramatically higher than OLS models, GWR models can provide more information about the spatial distribution of models’ elasticities and other parameters (e.g., significance and goodness-of-fit). The main findings of this study can be summarized as follows. First, the influencing factors are not entirely consistent as time changes. For different models corresponding to different time periods, the common influencing factors including population, days since stations were opened, and betweenness centrality refer to the major impact factors of Shenzhen metro ridership, while the individual factors can help to address the different impact factors of Shenzhen metro ridership during different time periods. It is found that the main source of the ridership over weekdays is from the commuting, and the ridership is driven by recreational activities related to commerce such as shopping over weekends. In a day, the ridership during evening peak hours is mainly affected by activities from school and commutes, while the ridership at nonrush hours is mainly driven by recreational activities related to commerce. Second, the spatial distribution of elasticities can help us to identify where the specific factor can attract more trips, for example, more trips per capita are expected in the center and mid-north, where commerce, administration, and education are concentrated, while elasticity values are lower in the west and east of Shenzhen. These results of the GWR models show great spatial interpretation in transport planning. Above all, these findings have significant implications for the understanding of transportation planning and periphery development. First, in general, population affects all metro station ridership, especially for the center and mid-north of Shenzhen, indicating that metro station ridership would be significantly increased by an additional resident population in these regions; therefore, the results suggest that it is reasonable to give priority to the planning and construction of the metro lines in the regions of densely distributed resident population. Second, betweenness centrality is positively associated with station ridership, indicating that the role of a station to the shortest paths through the metro network is important to attract more passengers, so it suggests the network planner take the network structure into consideration. Third, days since the stations were opened also has positive impacts on station ridership. One possible reason for this is that the first line of metro to be built is generally along the hottest line with the highest travel demands in a city. Therefore, it suggests those cities without metro and in the stage of metro system planning carry out forecasting and estimation of travel demands before determining the first line to be built. Fourth, traffic rush hours are different in different functional zones, so different strategies of diverting passenger flows are suggested to be adopted flexibly. For example, enhancing security when diverting passenger flows is vital in commercial developed areas, while improving transit efficiency is more important when diverting passenger flows in employment-based areas. Finally, the GWR model discusses different local effects of influencing factors on metro ridership. It implies that it is necessary for urban and transportation policy-makers to refer to the GWR results to adjust measures to local conditions when implementing the principle of Transit Oriented Development (TOD) planning. In general, different GWR models implemented for different levels of time periods in this study can not only model the local relationships between the metro station ridership and its potential impact Appl. Sci. 2019, 9, 4217 19 of 23 Appl. Sci. 2019, 9, x FOR PEER REVIEW 17 of 23 factors,This but also study interpret was limited the temporal by the variation absence of of impact some additional factors of ridership, influencing and factors therefore and inspire not metroincorporating planning andthe local periphery model developmentselection in the from model. both The spatial method and in temporal this study perspectives. could be extended to takeThis the study cross was-boundary limited travel by the flows absence between of some Shenzhen additional and influencing Hong Kong factors into consideration, and not incorporating which themight local have model an selectionimpact on in the the ridership model. of The cross method-boundary inthis metro study stations could (Luohu be extended station as to well take as the cross-boundaryFutian Checkpoint travel station) flows [39,40]. between It was Shenzhen not included and in Hong the present Kong study into consideration, because the data which collection might havefor an the impact cross-boundary on the ridership travel data of cross-boundary involved the institutional metro stations barrier. (Luohu We will station extend as wellthe relevant as Futian Checkpointstudy once station) the data [39, 40 are]. It available. was not included Moreover, in futurethe present work study will because also be the carried data outcollection on model for the cross-boundaryimprovement, travel and a dataGWR involved model incorporating the institutional regression barrier. coefficient We will extend shrinkage the relevant and local study model once theselection data are would available. help Moreover,us to gain more future insights work willof metro also planning be carried and out periphery on model development improvement, from and a a GWRlocal model perspective. incorporating regression coefficient shrinkage and local model selection would help us to gain more insights of metro planning and periphery development from a local perspective. Author Contributions: Conceptualization, Y.H., and Y.Z.; data collection, Y.H.; data analysis, Y.H. and Y.Z.; Authorwriting Contributions:—original draftConceptualization, preparation, Y.H.; writing Y.H.,— andreview Y.Z.; and data editing, collection, Y.H., Y.H.;Y.Z. and data K. analysis,-L.T.; supe Y.H.rvision, and K. Y.Z.;- writing—originalL.T. draft preparation, Y.H.; writing—review and editing, Y.H., Y.Z. and K.-L.T.; supervision, K.-L.T. Funding:Funding:This This research research was was partially partially supported supported by the National National Natural Natural Science Science Foundation Foundation of ofChina China (No. (No. 71901188),71901188), and and the the Research Research Grants Grants Council Council Theme-basedTheme-based Research Scheme Scheme (No. (No. T32 T32-101-101/15/15-R).-R). Acknowledgments:Acknowledgments:The The authors authors would would likelike toto thankthank Jian Ma from from Southwest Southwest Jiaotong Jiaotong University University for for providing providing thethe Shenzhen Shenzhen metro metro AFC AFC data. data.

ConflictsConflicts of of Interest: Interest:The The authors authors declare declare no no conflictconflict ofof interest.interest.

AppendixAppendix A A TheThe results results of of models models 1–4 1–4compared compared withwith those of OLS OLS models models were were shown shown inin the the followi followingng TablesTables A1 A1–A4 to. A4.

TableTable A1. A1.Results Results ofof the model for for Weekly_ridership. Weekly_ridership.

Global(OLS) Local(GWR) Variables Estimate Standard Error t(Est/SE) Min Max Mean STD Intercept 148,902.14 8373.14 17.78 −220,846.06 1,259,330.97 230,997.13 231,395.86 Pop 29,682.08 9949.31 2.98 −87,929.62 174,341.19 24,227.35 57,929.46 Betweenness 25,971.88 9096.29 2.86 −75,396.67 122,215.16 30,440.91 53,705.86 Days_open 36,322.47 8796.81 4.13 −449,460.34 868,771.29 72,348.20 158,603.43 Shopping 15,242.75 9217.93 1.65 −68,601.68 58,895.47 1474.80 29,877.34 Dis_to_center −10,074.73 9248.92 −1.09 −207,415.70 755,777.31 43,828.83 157,781.05 Diagnostic R-square 0.35 0.81 Adjusted R- 0.32 0.65 square Sigma 90,925.69 64,811.17 AICc 3038.33 3035.27 Residual sum 925,957,952,739.49 269,172,991,697.74 of squares Number of 6 37.83 parameters GWR ANOVA Table Source SS DF MS F p-Value Global Residuals 925,957,952,739.493 6.000 GWR Improvement 656,784,961,041.749 47.919 13,706,255,804.666 GWR Residuals 269,172,991,697.743 64.081 4,200,487,280.123 3.263016 0.00000618

Appl. Sci. 2019, 9, 4217 20 of 23 Appl. Sci. 2019, 9, x FOR PEER REVIEW 18 of 23 Appl. Sci. 2019, 9, x FOR PEER REVIEW 18 of 23 Table A2. Results of the model for Weekday_ridership. Table A2. Results of the model for Weekday_ridership. Table A2. Results of the model for Weekday_ridership. Global(OLS) Local(GWR) Variables EstimateGlobal(OLS) Standard Error t(Est/SE) Min MaxLocal(GWR) Mean STD VariablesIntercept 21Estimate,354.89 Standard1158.65 Error t(Est/SE)18.43 −57Min,334.43 148Max,479.18 23Mean,875.86 28STD,629.01 InterceptPop 214955.59,354.89 1158.651297.64 18.433.82 −5171,334.43152.22 14822,,849.73479.18 233284.89,875.86 287185.60,629.01 BetweennessPop 4955.593932.25 1297.641257.94 3.823.13 −−171486.93,152.22 2214,849.73774.72 3284.894679.16 7185.605835.02 BetweennessDays_open 3932.254794.00 1257.941199.76 3.133.99 −−979486.93,419.62 1427,774.72923.23 4679.16542.57 265835.02,011.35 Days_openWorking 4794.00632.51 1199.761215.40 3.990.52 −−989834.00,419.62 2726,923.23802.92 2937.24542.57 268566.93,011.35 Dis_to_centerWorking −632.511759.53 1215.401295.76 −0.521.36 −−284834.00,571.36 2686,802.92428.59 2937.244383.04 158566.93,504.50 Dis_to_center −1759.53 1295.76 Diagnostic−1.36 −24,571.36 86,428.59 4383.04 15,504.50 R-square 0.37 Diagnostic 0.79 AdjustedR-square R - 0.37 0.79 Adjusted R- 0.34 0.65 square 0.34 0.65 squareSigma 12,582.09 9130.64 SigmaAICc 122571.58,582.09 9130.642551.06 ResidualAICc sum 2571.58 2551.06 Residual sum 17,730,610,368.11 5,947,129,532.95 of squares 17,730,610,368.11 5,947,129,532.95 Numberof squares of Number of 6 31.68 parameters 6 31.68 parameters GWR ANOVA Table Source SS GWR DFANOVA Table MS F p-Value GlobalSource Residuals 17,730,610SS, 368.116 6.000DF MS F p-Value GWRGlobal Improvement Residuals 1711,730783,610480,368.116835.168 40.6656.000 289,772, 173.512 GWRGWR Improvement Residuals 115,947,783,129,480,532.948,835.168 40.66571.335 28983,368,772,605.017,173.512 3.475795 0.00000205 GWR Residuals 5,947,129,532.948 71.335 83,368,605.017 3.475795 0.00000205

Table A3. Results of the model for Weekend_ridership. TableTable A3. A3.Results Results ofof thethe model for Weekend_ridership. Weekend_ridership. Global(OLS) Local(GWR) Variables EstimateGlobal(OLS) Standard Error t(Est/SE) Min MaxLocal(GWR) Mean STD VariablesIntercept 21Estimate,069.11 Standard1396.44 Error t(Est/SE)15.09 −120Min,163.53 88Max,286.30 15Mean,756.13 34STD,442.86 InterceptPop 213315.04,069.11 1396.441641.47 15.092.02 −−12201,270.33,163.53 8819,286.30333.99 152360.82,756.13 347639.67,442.86 DegreePop 3315.042650.45 1641.471769.71 2.021.50 −2110,270.33861.88 1921,333.99766.31 2360.824363.04 7639.678529.63 BetweennessDegree 2650.452015.18 1769.711776.54 1.501.13 −10,861.88825.87 2117,766.31108.65 4363.041367.15 8529.635279.79 BetweennessDays_open 2015.185339.96 1776.541475.52 1.133.62 −−11670,825.87,661.22 1788,108.65682.06 1367.15−281.66 41401.465279.79 Days_openResidence 5339.961699.47 1475.521726.48 3.620.98 −−11675,025.38,661.22 8817,682.06648.50 −3450.52281.66 41401.466192.10 ResidenceShopping 1699.472358.64 1726.481754.75 0.981.34 −1250,025.38228.55 1720,648.50126.20 3450.52−497.87 6192.107419.68 Shopping 2358.64 1754.75 Diagnostic1.34 −20,228.55 20,126.20 −497.87 7419.68 R-square 0.29 Diagnostic 0.86 AdjustedR-square R - 0.29 0.86 Adjusted R- 0.24 0.70 square 0.24 0.70 squareSigma 15,165.28 9450.60 SigmaAICc 152616.89,165.28 9450.602602.62 ResidualAICc sum 2616.89 2602.62 Residual sum 25,528,398,304.96 5,143,828,231.09 of squares 25,528,398,304.96 5,143,828,231.09 Numberof squares of Number of 7 42.59 parameters 7 42.59 parameters GWR ANOVA Table Source SS GWR ANOVADF Table MS F p-Value GlobalSource Residuals 25,528,398SS, 304.956 7.000DF MS F p-Value GWRGlobal Improvement Residuals 2520,528384,398570,304.956073.869 53.4077.000 381,681, 908.043 GWRGWR Improvement Residuals 205,143,384,828,570,231.087,073.869 53.40757.593 38189,313,681,770.481,908.043 4.273495 0.00000010 GWR Residuals 5,143,828,231.087 57.593 89,313,770.481 4.273495 0.00000010

Appl. Sci. 2019, 9, 4217 21 of 23 Appl. Sci. 2019, 9, x FOR PEER REVIEW 19 of 23

TableTable A4. A4.Results Results ofof thethe modelmodel for Evenrush_ridership.

Global(OLS) Local(GWR) Variables Estimate Standard Error t(Est/SE) Min Max Mean STD Intercept 637.04 35.71 17.84 −2874.03 3506.87 640.77 957.13 Pop 148.58 40.39 3.68 −545.41 691.39 115.20 228.58 Degree −26.14 44.94 −0.58 −356.74 372.09 −20.69 170.33 Betweenness 124.29 47.10 2.64 −173.17 583.86 178.10 186.16 Days_open 149.47 36.73 4.07 −4123.52 2067.64 43.21 952.39 School 45.76 35.97 1.27 −282.31 334.43 55.45 138.56 Dis_to_center −57.96 39.57 −1.46 −1083.60 1956.09 114.93 433.65 Diagnostic R-square 0.36 0.84 Adjusted R-square 0.32 0.70 Sigma 387.72 256.41 AICc 1751.60 1729.41 Residual sum of squares 16,686,678.25 4,226,169.97 Number of parameters 7 37.83 GWR ANOVA Table Source SS DF MS F p-Value Global Residuals 16,686,678.255 7.000 GWR Improvement 12,460,508.283 46.718 266,718.092 GWR Residuals 4,226,169.972 64.282 65,744.115 4.056912 0.00000015

References References 1. Gutiérrez, J.; Cardozo, O.D.; García-Palomares, J.C. Transit ridership forecasting at station level: An 1. Gutiapproachérrez, J.; based Cardozo, on distance O.D.; Garc-decayía-Palomares, weighted regression. J.C. Transit J. ridership Transp. Geogr. forecasting 2011, 19 at, station1081–1092 level:. An approach 2. basedCardozo, on distance-decay O.D.; García-Palomares, weighted regression.J.C.; Gutiérrez,J. Transp. J. Application Geogr. 2011 of geographically, 19, 1081–1092. weighted [CrossRef regression] to 2. Cardozo,the direct O.D.; forecasting García-Palomares, of transit ridership J.C.; Guti até rrez,station J. Application-level. Appl. Geogr. of geographically 2012, 34, 548 weighted–558. regression to the 3. directChoi, forecasting J.; Lee, Y.J.; of Kim, transit T.; ridershipSohn, K. An at station-level.analysis of MetroAppl. ridership Geogr. 2012 at the, 34 station, 548–558.-to-station [CrossRef level] in Seoul. 3. Choi,Transportation J.; Lee, Y.J.; 2012 Kim,, 39 T.;, 705 Sohn,–722. K. An analysis of Metro ridership at the station-to-station level in Seoul. 4. TransportationCervero, R. 2012Alternative, 39, 705–722. approaches [CrossRef to modeling] the travel-demand impacts of smart growth. J. Am. Plan. 4. Cervero,Assoc. 2006 R. Alternative, 72, 285–295 approaches. to modeling the travel-demand impacts of smart growth. J. Am. Plan. 5. Assoc.Kuby,2006 M.;, 72Barranda,, 285–295. A.; [ CrossRefUpchurch,] C. Factors influencing light-rail station boardings in the United States. 5. Kuby,Transp. M.; Res. Barranda, Part A Policy A.; Upchurch, Pract. 2004 C., 38 Factors, 223–247 influencing. light-rail station boardings in the United States. 6. Transp.Chu, Res.X. Ridership Part A PolicyModels Pract. at the2004 Stop, Level38, 223–247.; Report [No.CrossRef BC137]-31; National Center for Transit Research for 6. Chu,Florida X. Ridership Departm Modelsent of Transportation: at the Stop Level Tampa,; Report Florida, No. BC137-31; USA, 2004. National Center for Transit Research for 7. FloridaHe, Y.; Department Zhao, Y.; Tsui of, Transportation:K.L. An AnalysisTampa, of Factors FL, Influencing USA, 2004. Metro Station Ridership: Insights from Taipei 7. He,Metro. Y.; Zhao, In Proceedings Y.; Tsui, K.L. of the An IEEE Analysis Conference of Factors on Intelligent Influencing Transportation Metro Station Systems Ridership: (ITSC), Insights Maui, fromHI, USA, Taipei 4–7 November 2018. Metro. In Proceedings of the IEEE Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 8. Sohn, K.; Shim, H. Factors generating boardings at Metro stations in the Seoul metropolitan area. Cities 4–7 November 2018. 2010, 27, 358–368. 8. Sohn, K.; Shim, H. Factors generating boardings at Metro stations in the Seoul metropolitan area. Cities 2010, 9. Loo, B.P.Y.; Chen, C.; Chan, E.T.H. Rail-based transit-oriented development: Lessons from New York City 27, 358–368. [CrossRef] and Hong Kong. Landsc. Urban Plan. 2010, 97, 202–212. 9. Loo, B.P.Y.; Chen, C.; Chan, E.T.H. Rail-based transit-oriented development: Lessons from New York City 10. Sung, H.; Oh, J.T. Transit-oriented development in a high-density city: Identifying its association with and Hong Kong. Landsc. Urban Plan. 2010, 97, 202–212. [CrossRef] transit ridership in Seoul, Korea. Cities 2011, 28, 70–82. 10.11.Sung, Thompson, H.; Oh, J.T. G.; Transit-oriented Brown, J.; Bhattacharya, development T. in What a high-density Really Matters city: Identifying for Increasing its association Transit Ridership: with transit ridershipUnderstanding in Seoul, the Korea. DeterminantsCities 2011 of, Transit28, 70–82. Ridership [CrossRef Demand] in Broward County, Florida. Urban Stud. 11. Thompson,2012, 49, 3327 G.;–3345 Brown,. J.; Bhattacharya, T. What Really Matters for Increasing Transit Ridership: 12.Understanding Guerra, E.; Cervero, the Determinants R.; Tischler, D. of The Transit half Ridership-mile circle: Demand Does it best in Broward represent County, transit station Florida. catchments.Urban Stud. 2012Transp., 49, 3327–3345. Res. Rec. 2012 [CrossRef, 2276, 101] –109. 12.13.Guerra, Zhao, E.; J.; Deng,Cervero, W.; R.; Song, Tischler, Y.; Zhu, D. The Y. What half-mile influences circle: Metro Does itstation best represent ridership transit in China? station Insights catchments. from Transp.Nanjing. Res. C Rec.ities 20122013, 352276, 114, 101–109.–124. [CrossRef] 13.14.Zhao, Chan, J.; Deng,S.; Miranda W.; Song,-Moreno, Y.; Zhu, L. A Y. station What- influenceslevel ridership Metro model station for ridershipthe metro innetwork China? in Insights Montreal, from Quebec. Nanjing. CitiesCan.2013 J. Civ., 35 Eng., 114–124. 2013, 40 [CrossRef, 254–262]. 14.15.Chan, Singhal, S.; Miranda-Moreno, A.; Kamga, C.; Yazici, L. A A. station-level Impact of weather ridership on model urban fortransit the ridership. metro network Transp. in R Montreal,es. Part A Policy Quebec. Can.Pract. J. Civ. 2014 Eng., 69, 2013379–,39140,. 254–262. [CrossRef] Appl. Sci. 2019, 9, 4217 22 of 23

15. Singhal, A.; Kamga, C.; Yazici, A. Impact of weather on urban transit ridership. Transp. Res. Part A Policy Pract. 2014, 69, 379–391. [CrossRef] 16. Liu, C.; Erdogan, S.; Ma, T.; Ducca, F.W. How to Increase Rail Ridership in Maryland: Direct Ridership Models for Policy Guidance. J. Urban Plan. Dev. 2016, 142, 04016017. [CrossRef] 17. Pan, H.; Li, J.; Shen, Q.; Shi, C. What determines rail transit passenger volume? Implications for transit oriented development planning. Transp. Res. Part D Transp. Environ. 2017, 57, 52–63. [CrossRef] 18. Vergel-Tovar, C.E.; Rodriguez, D.A. The ridership performance of the built environment for BRT systems: Evidence from Latin America. J. Transp. Geogr. 2018, 73, 172–184. [CrossRef] 19. Fotheringham, A.S.; Brunsdon, C.; Charlton, M.E. Geographically Weighted Regression: A Method for Exploring Spatial Nonstationarity. Geogr. Anal. 1996, 28, 281–298. 20. Tu, W.; Cao, R.; Yue, Y.; Zhou, B.; Li, Q.; Li, Q. Spatial variations in urban public ridership derived from GPS trajectories and smart card data. J. Transp. Geogr. 2018, 69, 45–57. [CrossRef] 21. Jun, M.J.; Choi, K.; Jeong, J.E.; Kwon, K.H.; Kim, H.J. Land use characteristics of subway catchment areas and their influence on subway ridership in Seoul. J. Transp. Geogr. 2015, 48, 30–40. [CrossRef] 22. Zhao, J.; Deng, W.; Song, Y.; Zhu, Y. Analysis of Metro ridership at station level and station-to-station level in Nanjing: An approach based on direct demand models. Transportation 2014, 41, 133–155. [CrossRef] 23. Kepaptsoglou, K.; Stathopoulos, A.; Karlaftis, M.G. Ridership estimation of a new LRT system: Direct demand model approach. J. Transp. Geogr. 2017, 58, 146–156. [CrossRef] 24. Taylor, B.D.; Miller, D.; Iseki, H.; Fink, C. Analyzing the Determinants of Transit Ridership Using a Two-Stage Least Squares Regression on a National Sample of Urbanized Areas; UC Berkeley, University of California Transportation Center: Berkeley, CA, USA, 2003; Available online: https://escholarship.org/uc/item/7xf3q4vh (accessed on 1 September 2019). 25. Estupiñán, N.; Rodríguez, D.A. The relationship between urban form and station boardings for Bogotá’s BRT. Transp. Res. Part A Policy Pract. 2008, 23, 439–464. [CrossRef] 26. Zhang, D.; Wang, X.C. Transit ridership estimation with network Kriging: A case study of Second Avenue Subway, NYC. J. Transp. Geogr. 2014, 41, 107–115. [CrossRef] 27. Hu, N.; Legara, E.F.; Lee, K.K.; Hung, G.G.; Monterola, C. Impacts of land use and amenities on public transport use, urban planning and design. Land Use Policy 2016, 57, 356–367. [CrossRef] 28. Deng, J.; Xu, M. Characteristics of subway station ridership with surrounding land use: A case study in Beijing. In Proceedings of the ICTIS 2015-3rd International Conference on Transportation Information and Safety, Wuhan, China, 25–28 June 2015. 29. Li, J.; Yao, M.; Fu, Q. Forecasting Method for Urban Rail Transit Ridership at Station Level Using Back Propagation Neural Network. Discret. Dyn. Nat. Soc. 2016.[CrossRef] 30. Dovey, K.; Pafka, E.; Ristic, M. Mapping Urbanities: Morphologies, Flows, Possibilities, 1st ed.; Routledge: Abingdon, UK, 2017; pp. 154–156. 31. Kitsak, M.; Gallos, L.K.; Havlin, S.; Liljeros, F.; Muchnik, L.; Stanley, H.E.; Makse, H.A. Identification of influential spreaders in complex networks. Nat. Phys. 2010, 6, 888. [CrossRef] 32. Larose, D.T.; Larose, C.D. Data Mining and Predictive Analytics, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2015; pp. 358–359. 33. Fotheringham, A.S.; O’Kelly, M.E.; O´Kelly, M.E. Spatial Interaction Models: Formulations and Applications; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1989; pp. 112–115. 34. Curtis, C.; Scheurer, J. Planning for sustainable accessibility: Developing tools to aid discussion and decision-making. Prog. Plan. 2010, 74, 53–106. [CrossRef] 35. Moran, P.A. Notes on continuous stochastic phenomena. Biometrika 1950, 37, 17–23. [CrossRef] 36. Cressie, N. Statistics for spatial data. Terra Nova 1992, 4, 613–617. [CrossRef] 37. Amelin, L. Local Indicators of Spatial association-LISA. Geogr. Anal. 1995, 27, 93–115. 38. Tüceryan, M.; Jain, A.K. Texture Segmentation Using Voronoi Polygons. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12, 211–216. [CrossRef] Appl. Sci. 2019, 9, 4217 23 of 23

39. Wang, D. Hong Kongers’ cross-border consumption and shopping in Shenzhen: Patterns and motivations. J. Retail. Consum. Serv. 2004, 11, 149–159. [CrossRef] 40. Leung, A.; Burke, M.; Cui, J. The tale of two (very different) cities—Mapping the urban transport oil vulnerability of Brisbane and Hong Kong. Transp. Res. Part D Transp. Environ. 2018, 65, 796–816. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).