Smart Cities

Article Passenger Flow Prediction of Urban Rail Transit Based on Deep Learning Methods

Zhi Xiong 1, Jianchun Zheng 2, Dunjiang Song 3, Shaobo Zhong 2,* and Quanyi Huang 1

1 Department of Engineering Physics, Tsinghua University, Beijing 100084, China
2 Beijing Research Center of Urban Systems Engineering, Beijing 100035, China
3 Institute of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
* Correspondence: [email protected]

 Received: 28 May 2019; Accepted: 16 July 2019; Published: 23 July 2019 

Abstract: The rapid development of urban rail transit brings high efficiency and convenience. At the same time, the increasing passenger flow also remarkably increases the risk of emergencies such as passenger stampedes. The accurate and real-time prediction of dynamic passenger flow is of great significance to the daily operation safety management, emergency prevention, and dispatch of urban rail transit systems. Two deep learning neural networks, a long short-term memory neural network (LSTM NN) and a convolutional neural network (CNN), were used to predict an urban rail transit passenger flow time series and spatiotemporal series, respectively. The experiments were carried out on the passenger flow of Beijing metro stations and lines, and the prediction results of the deep learning methods were compared with several traditional linear models including autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), and space–time autoregressive integrated moving average (STARIMA). It was shown that the LSTM NN and CNN could better capture the temporal or spatiotemporal features of the urban rail transit passenger flow and obtain accurate results for the long-term and short-term prediction of passenger flow. The deep learning methods also have strong data adaptability and robustness, and they are particularly well suited to predicting the passenger flow of stations during peaks and the passenger flow of lines during holidays.

Keywords: LSTM NN; CNN; urban rail transit; passenger flow prediction

Smart Cities 2019, 2, 371–387; doi:10.3390/smartcities2030023 www.mdpi.com/journal/smartcities

1. Introduction

With the continuous formation and expansion of urban rail transit networks, this transportation mode brings high efficiency and convenience to residents' travel. However, its increasing passenger flow also remarkably increases the risk of emergencies such as crowded passengers and stampedes. In the urban rail transit intelligent system, the accurate and real-time prediction of dynamically changing passenger flow is of great significance to the daily operation safety management, emergency prevention, and dispatch of urban rail transit. Many researchers have proposed methods for the passenger flow prediction of urban rail transit, as well as for road traffic prediction. These studies mainly used historical time series or spatiotemporal series with certain time intervals, combined with other auxiliary information, and used data models or algorithms to mine the temporal or spatiotemporal internal relations of historical data in order to predict future passenger flow. Generally, prediction methods for time series or spatiotemporal series can be divided into two categories: linear methods and nonlinear methods [1,2]. Linear methods are based on the linearity and stationarity of time series or spatiotemporal series [3]. Commonly used linear methods in the past were mainly the moving average model and the exponential smoothing model [4–6]. In 1979, Ahmed and Cook [7] first applied the autoregressive integrated moving average (ARIMA) model to predict traffic flow time series, and they obtained more accurate prediction results than the moving average model and the exponential smoothing model. Since then, classical time series analysis models such as ARIMA [8] and the seasonal autoregressive integrated moving average (SARIMA) model [5,9] have been widely used in road traffic and urban rail transit passenger flow prediction, and they have achieved fairly good results. In 2005, Kamarianakis and Prastacos [10] introduced the space–time autoregressive integrated moving average (STARIMA) model into short-term traffic flow prediction for the road network in the center of Athens, Greece. By using a spatial weight matrix to quantify the correlations between the traffic flow at any observation location and the traffic conditions in adjacent locations, the spatiotemporal evolution of traffic flow in the road network was statistically described, and STARIMA achieved satisfactory prediction results. Since these methods usually require the stationarity hypothesis, their application is limited. Moreover, a simple linear relationship cannot fully characterize the internal relationships of time or spatiotemporal series. Therefore, some nonlinear algorithms have been proposed, such as Gaussian maximum likelihood estimation, the nonparametric regression model, and Kalman filtering [11–13]. In recent years, intelligent algorithms such as the Bayesian network, the neural network, wavelet analysis, chaos theory, and the support vector machine have also been applied directly or combined into hybrid models for predicting road traffic or urban rail transit passenger flow [14–18]. Compared to linear methods, nonlinear methods are more flexible, and their prediction results generally perform better [19,20].
With the widespread use of various types of data acquisition equipment, the intelligent transportation system (ITS) has amassed a large amount of traffic data [21]. General parameter approximation algorithms can only shallowly correlate data, and it is difficult for them to obtain good prediction performance in the face of the curse of dimensionality caused by the data explosion. An artificial neural network (ANN) addresses the curse of dimensionality by using distributed and hierarchical feature representation and by modeling complex nonlinear relationships with deeper network layers, thus creating the new field of deep learning [22]. In the 1990s, Hua and Faghri [23] introduced an ANN to the estimation of the travel time of highway vehicles. After that, various ANNs have been applied to traffic prediction in the ITS, such as the feed-forward neural network (FFNN), the radial basis function neural network (RBFNN), the spectral-basis neural network (SNN) and the recurrent neural network (RNN) [24–27]. Among them, the RNN handles input series of any length through memory blocks, making it a memory-based neural network suitable for studying the evolutionary law of spatiotemporal data, but it has two shortcomings: (1) It needs continuous trial and error to predetermine the optimal time lags, and (2) it cannot perform well in long-term predictions because of vanishing and exploding gradients [28]. As a special RNN, the long short-term memory neural network (LSTM NN) overcomes the above shortcomings and has been introduced into road traffic time series prediction, where it has obtained significantly better prediction results [19]. In addition, Fei Lin et al. [29] proposed a sparse self-encoding method to extract spatial features from the spatiotemporal matrix through a fully connected layer. They then combined this with the LSTM NN to predict the average taxi speed in the Qingyang District of Chengdu and obtained higher accuracy and robustness compared with the LSTM NN alone.
Xiaolei Ma et al. [30] proposed a method based on the convolutional neural network (CNN), which used a two-dimensional time-space matrix to convert spatiotemporal traffic dynamics into an image describing the time and space relationships of traffic flow; they confirmed that the method can accurately predict traffic speed through two examples from the Beijing transportation network. For the large samples of urban rail transit passenger flow, there is very little research on passenger flow prediction based on deep learning methods. In this paper, two deep learning methods, the LSTM NN and the CNN, are introduced to predict the time series and spatiotemporal series of urban rail transit passenger flow, respectively. In addition, the traditional linear models ARIMA, SARIMA and STARIMA are used as contrasts in different experiments to test the prediction performances of the two deep learning methods.

2. Materials and Methods

2.1. Experiment Data

As an international city, Beijing reached a population of 21.542 million by the end of 2018, which means there is a huge demand for public transportation. This paper took the Beijing metro system as a study object, and the experiment data came from the dayparting passenger flow of 47 metro stations and the daily passenger flow of 15 metro lines in 2015 (Figure 1).

Figure 1. Beijing metro network map.

The dayparting passenger flows of metro stations for the experiments were a time series at 10-min intervals. In order to train the network with complete knowledge of the nonlinear distribution of dayparting passenger flow, the combination of the passenger flows on 3 March 2015 and on 10 March 2015 (Figure 2, taking XIZHIMEN Station as an example) were input into it. These two days were the Tuesdays of adjacent weeks, which were not holidays or major event days; as such, the dayparting passenger flow of these two days basically conformed to the same distribution law. In addition, passenger flow during the two peaks (7:00–9:30 and 17:00–19:30) accounted for more than 50% of the total daily passenger flow in a single day. Observing the daily passenger flow of the Beijing metro line in 2015 (Figure 3, taking Line 1 as an example), the passenger flow changed periodically, suddenly reducing or surging during the holidays, particularly during the Spring Festival in February.
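As a rough illustration of how such a 10-min dayparting series might be assembled from raw tap-in records, the pandas sketch below uses synthetic timestamps (the actual Beijing AFC records underlying the paper are not public, so all values here are hypothetical):

```python
import pandas as pd
import numpy as np

# Hypothetical tap-in records for one station over one service day
# (06:00-23:00); timestamps are random stand-ins for real AFC logs.
rng = np.random.default_rng(0)
taps = pd.DataFrame({
    "time": pd.to_datetime("2015-03-03 06:00:00")
            + pd.to_timedelta(rng.integers(0, 17 * 3600, size=5000), unit="s")
})

# Aggregate into the 10-min dayparting passenger flow series.
flow_10min = taps.set_index("time").resample("10min").size()

# Joining two analogous Tuesdays (3 March and 10 March) simply concatenates
# the two daily series into one longer training sequence.
joint = pd.concat([flow_10min, flow_10min])  # stand-in for the second day
print(flow_10min.sum(), len(joint))
```

The same aggregation, applied per station, yields the 47-row spatiotemporal matrix used later for the CNN.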

Figure 2. Joint passenger flow of XIZHIMEN Station.

Figure 3. Daily passenger flow of Line 1 in 2015.

2.2. Methods

2.2.1. LSTM NN

The LSTM NN has long short-term memory capability compared with other RNNs due to its unique network structure. This network consists of an input layer, an output layer, and a recursive hidden layer. The core of the recursive hidden layer is a memory block (Figure 4) [21].

Figure 4. Structure of the memory block of the proposed long short-term memory neural network (LSTM NN).

The center of the memory block is the memory cell. In addition, the memory block includes three information control parts: Input gate, forget gate and output gate. The input of the memory block is the preprocessed historical passenger flow time series X_t, and the output is the predicted passenger flow time series Y_t, where t represents the current prediction period. Using the LSTM NN to predict future passenger flow does not require commanding the network to trace back some time steps. The input time series is iteratively calculated in the memory block, and different information is controlled through the different gates. The LSTM NN memory block can capture long- and short-term complex connections within a time series. S_t represents the current state of the memory cell, and S_{t-1} represents the previous state of the memory cell. The dashed lines represent the combination with S_{t-1}, and blue nodes represent multiplication. The iterative calculations and information processes in the memory block are represented by the following set of expressions:

$$
\begin{aligned}
i_t &= \sigma\left(W^{(i)} X_t + U^{(i)} S_{t-1} + b_i\right)\\
f_t &= \sigma\left(W^{(f)} X_t + U^{(f)} S_{t-1} + b_f\right)\\
o_t &= \sigma\left(W^{(o)} X_t + U^{(o)} S_{t-1} + b_o\right)\\
s_t &= \tanh\left(W^{(c)} X_t + U^{(c)} S_{t-1} + b_c\right)\\
S_t &= f_t \circ S_{t-1} + i_t \circ s_t\\
Y_t &= o_t \circ \tanh(S_t)
\end{aligned}
\tag{1}
$$

where i_t, f_t, and o_t represent the outputs of the three gates, respectively. s_t represents the new state of the memory cell, and S_t represents the final state of the memory cell. W^{(i)}, W^{(f)}, W^{(o)}, W^{(c)}, U^{(i)}, U^{(f)}, U^{(o)} and U^{(c)} are all weight matrices. b_i, b_f, b_o and b_c are all bias vectors. ∘ denotes the Hadamard product, σ denotes the sigmoid function, and tanh is a kind of activation function.

2.2.2. CNN

The CNN has shown a strong learning ability and a wide application in image recognition with its unique ability to extract key features from images [30]. The multi-station or multi-line passenger flow spatiotemporal series studied in this paper are essentially two-dimensional matrices, which can be regarded as single-channel two-dimensional images, so an end-to-end CNN was constructed for the passenger flow spatiotemporal series (Figure 5).
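The memory-block update of Equation (1) can be made concrete with a small NumPy sketch. The weights below are random placeholders rather than trained parameters, and the recurrence feeds back the cell state S_{t-1} exactly as the equations are written (a slight variant of the standard LSTM, which recurs on the hidden output):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, S_prev, W, U, b):
    """One memory-block update following Eq. (1); W, U, b are dicts of
    gate parameters (illustrative shapes, not the paper's trained weights)."""
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ S_prev + b["i"])   # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ S_prev + b["f"])   # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ S_prev + b["o"])   # output gate
    s_t = np.tanh(W["c"] @ x_t + U["c"] @ S_prev + b["c"])   # candidate state
    S_t = f_t * S_prev + i_t * s_t        # Hadamard products
    Y_t = o_t * np.tanh(S_t)              # memory-block output
    return S_t, Y_t

# Tiny example: 1-dimensional input, 2-unit cell state.
rng = np.random.default_rng(1)
W = {k: rng.standard_normal((2, 1)) for k in "ifoc"}
U = {k: rng.standard_normal((2, 2)) for k in "ifoc"}
b = {k: np.zeros(2) for k in "ifoc"}
S, Y = lstm_step(np.array([0.5]), np.zeros(2), W, U, b)
print(S.shape, Y.shape)
```

Iterating `lstm_step` over the 10-min series reproduces the "iterative calculation in the memory block" described above.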

Figure 5. Structure of the proposed convolutional neural network (CNN).

The CNN consists of an input layer, multiple convolutional layers, and an output layer. Feature extraction is performed by convolution filters (also called convolution kernels) in the convolutional layers. A convolution filter extracts a feature from the current layer, and the extracted features are then further combined into higher-level and more abstract features, so each convolution filter converts low-level features into high-level features. Using W_l^r to represent a convolution filter, the filter output is represented as:

$$
y_{\mathrm{conv}} = \sum_{e=1}^{m}\sum_{f=1}^{n} (W_l^r)_{ef}\, d_{ef}
\tag{2}
$$

where m and n represent the two dimensions of the filter, d_{ef} represents the value of the filter input matrix at row e and column f, and (W_l^r)_{ef} represents the coefficient of the convolution filter at row e and column f.
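Equation (2) is simply an elementwise product summed over the filter window; a toy check with a hypothetical 2 × 2 filter and input patch:

```python
import numpy as np

# Eq. (2) as a direct double sum: one convolution filter W applied to one
# m x n patch d of the input matrix (toy values, not real passenger counts).
def filter_output(W, d):
    m, n = W.shape
    return sum(W[e, f] * d[e, f] for e in range(m) for f in range(n))

W = np.array([[1.0, 0.0], [0.0, -1.0]])   # a 2 x 2 filter
d = np.array([[3.0, 2.0], [1.0, 5.0]])    # matching input patch
print(filter_output(W, d))                 # 1*3 + (-1)*5 = -2.0
```

In practice this double sum is evaluated for every position of the sliding window, which is exactly what `np.sum(W * d)` computes per patch.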
Each convolutional layer of the CNN also needs a ReLU (Rectified Linear Unit) function (a kind of activation function) to convert the output of each layer into a manageable, scaled data range, which is beneficial for training the model. Moreover, the combination of ReLU functions can also simulate complex nonlinear functions, which gives the CNN a powerful data processing capability for a complex training set. The ReLU function is expressed as:

$$
g(x) = \begin{cases} x, & \text{if } x > 0\\ 0, & \text{otherwise} \end{cases}
\tag{3}
$$

When the CNN trains on and predicts the two-dimensional matrix of spatiotemporal data, it is generally necessary to use the correlations between stations or lines to specifically arrange the positions of stations or lines in the two-dimensional matrix and then use a standard convolution kernel to extract features. The standard convolution kernel is locally convoluted. However, the spatial correlations between stations or lines are in reality global, so the use of standard convolution kernels has significant limitations.
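The local-versus-global contrast can be seen on a toy spatiotemporal matrix: a small square kernel mixes only a few adjacent rows (stations), whereas a kernel spanning the full station dimension mixes all stations at every time step. The sizes below are illustrative (the paper's matrices have 47 station rows or 15 line rows):

```python
import numpy as np

# Toy spatiotemporal matrix: rows = stations, columns = 10-min periods.
flow = np.arange(40, dtype=float).reshape(5, 8)

def valid_conv2d(x, k):
    """Plain 'valid' 2-D correlation, enough to compare kernel shapes."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

local = valid_conv2d(flow, np.ones((3, 3)))   # mixes only 3 adjacent stations
full  = valid_conv2d(flow, np.ones((5, 3)))   # long kernel: all 5 stations at once
print(local.shape, full.shape)
```

The long kernel collapses the station dimension in one step, which is the behavior exploited by the full convolution described next.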
Therefore, this paper adopted convolution kernels with long shapes and performed full convolution on the spatial dimension of the two-dimensional matrix of the input spatiotemporal data, which made full use of the spatial correlations between stations or lines. The network also cancelled the pooling layer, using batch normalization and the nonlinear ReLU function after each convolutional layer, as well as after the output layer.

2.2.3. Methods of Network Training

In the prediction experiments of dayparting passenger flow of metro stations using the LSTM NN, the data of joint passenger flow from 06:00 on 3 March to 17:00 on 10 March were taken as the training set, and the data of joint passenger flow from 17:00 to 23:00 on 10 March were taken as the test set. Because the input and output data were single time series, the number of input and output layers was set to 1. The hidden layer was set to 50 units, the training epochs were set to 600, the gradient threshold was 1, the initial learning rate was 0.05, and the learning rate was set to drop after 300 epochs by multiplying by a factor of 0.1.

In the daily passenger flow prediction experiments of metro lines using the LSTM NN, the daily passenger flow of the whole year of 2015 was taken as the training set, and the daily passenger flow of January–February 2016 was taken as the test set. The number of input and output layers was set to 1, the hidden layer had 100 units, the training epochs were 500, the gradient threshold was 1, the initial learning rate was 0.05, and the learning rate was set to drop after 250 epochs by multiplying by a factor of 0.1.

The CNN proposed in this paper was a full convolution network and finally output the predicted passenger flow of each station and line directly. In the experiments of dayparting passenger flow prediction of metro stations using the CNN, the data of dayparting passenger flow on the whole day of 3 March were taken as the training set, and the data of dayparting passenger flow on the whole day of 10 March were taken as the test set. A full convolutional network with six convolutional layers was constructed for the spatiotemporal data of 47 stations. The first three convolution kernels were (47, 47, 3), with the first three paddings being 1 and the last convolution kernel being (47, 47, 1). In the experiments of the daily passenger flow prediction of metro lines using the CNN, the daily passenger flow of the whole year of 2015 was taken as the training set, and the daily passenger flow of January–February 2016 was taken as the test set. A full convolutional network with six convolutional layers was constructed for the spatiotemporal data of 15 lines.
The first three convolution kernels were (15, 15, 3), with the first three paddings being 1 and the last convolution kernel being (15, 15, 1). Both networks used the mean square error (MSE) as the loss function. All experiments evaluated the prediction performance by three criteria: Mean absolute error (MAE), mean relative error (MRE) and root mean square error (RMSE).
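The step-decay learning-rate schedule and the three evaluation criteria described above can be written out explicitly. Whether the drop applies at or strictly after the stated epoch, and the convention of normalizing the relative error by the observed value, are assumptions of this sketch, since the text does not give the formulas:

```python
import math

# Learning-rate schedules as described in Section 2.2.3 (single step decay).
STATION_CFG = dict(initial_lr=0.05, drop_epoch=300, drop_factor=0.1)
LINE_CFG = dict(initial_lr=0.05, drop_epoch=250, drop_factor=0.1)

def learning_rate(epoch, cfg):
    """Piecewise-constant schedule implied by the description."""
    if epoch > cfg["drop_epoch"]:
        return cfg["initial_lr"] * cfg["drop_factor"]
    return cfg["initial_lr"]

# The three evaluation criteria: MAE, MRE (relative to observed values,
# an assumed convention) and RMSE.
def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def mre(obs, pred):
    return sum(abs(o - p) / o for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

obs, pred = [100.0, 200.0, 400.0], [110.0, 190.0, 380.0]
print(learning_rate(400, STATION_CFG), mae(obs, pred), rmse(obs, pred))
```

These are the criteria reported in Tables 1–4 below, with MRE expressed there as a percentage.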

3. Results

3.1. Dayparting Passenger Flow Prediction of Metro Stations Using LSTM NN

In order to avoid the error caused by the random gradient during network training, we carried out three repeated experiments. For XIZHIMEN Station, the prediction performance for the evening-peak (17:00–19:30) and the full-time (17:00–23:00) is shown in Table 1. Using ARIMA (2, 2, 3) as the contrast experiment, the prediction performances of the two methods are shown in Table 2, and the predicted values of the two methods (the LSTM NN took three experiments' averages) were compared with the observed values, as shown in Figure 6. It can be seen that the prediction accuracy of the LSTM NN was much higher than that of ARIMA, regardless of the evening-peak or full-time, especially in the prediction of the peak shape. Comparing the dayparting MAE of the two methods (the LSTM NN took three experiments' averages) (Figure 7), the dayparting MAE of ARIMA fluctuated sharply during the evening-peak, while the dayparting MAE of the LSTM NN remained low throughout the full-time, which indicates that the LSTM NN is highly adaptable to sharply changing data. The MRE of the LSTM NN remained within 10%, regardless of the evening-peak or full-time, indicating that the LSTM NN has both long-term and short-term prediction capabilities.

Table 1. LSTM NN prediction performance at XIZHIMEN Station.

Test ID    Evening-Peak                      Full-Time
           MAE       MRE      RMSE           MAE       MRE      RMSE
1          154.211   7.08%    194.872        97.843    9.84%    135.966
2          88.649    4.53%    108.202        67.991    7.81%    85.828
3          87.299    4.20%    118.454        76.523    10.74%   97.941
Average    110.053   5.27%    140.509        80.786    9.46%    106.578

Table 2. Prediction performances of two methods at XIZHIMEN Station.

Models     Evening-Peak                      Full-Time
           MAE       MRE      RMSE           MAE       MRE      RMSE
ARIMA      258.505   12.26%   318.266        149.677   15.08%   215.058
LSTM NN    110.053   5.27%    140.509        80.786    9.46%    106.578

Figure 6. Predicted and observed values of two methods at XIZHIMEN Station.

Figure 7. Dayparting mean absolute error (MAE) of two methods at XIZHIMEN Station.

InInIn order order toto to testtest test thethe the applicabilityapplicability applicability of of of the the the LSTM LSTM LSTM NN, NN, NN, 24 24 24 other other other Beijing Beijing Beijing metro metro metro stations stations stations were were were selected selected selected for forexperiments,for experiments, experiments, with with with ARIMA ARIMA ARIMA as the as as contrast. the the contrast. contrast. The prediction The The prediction prediction performances performances performances of two of methodsof two two methods methods for 25 stations for for 25 25 stationsarestations shown are are inshown shown Table 3in .in InTable Table comparison, 3. 3. In In comparison, comparison, the prediction th thee prediction prediction accuracy accuracy ofaccuracy the LSTM of of the the NN LSTM LSTM decreased NN NN decreased decreased when it whenwaswhen applied it it was was toapplied applied 25 metro to to 25 25 stations, metro metro stations, butstations, its accuracy but but its its accuracy wasaccuracy still was betterwas still still than better better those th than ofan ARIMA.those those of of InARIMA. ARIMA. addition, In In addition,addition, the the LSTM LSTM NN NN did did not not require require the the pre-verification pre-verification of of time time series series stationarity, stationarity, eliminating eliminating the the tedioustedious process process of of the the smoothing smoothing and and stationarity stationarity tests. tests.


Table 3. Prediction performances of two methods for 25 stations.

Models     Evening-Peak                      Full-Time
           MAE       MRE      RMSE           MAE       MRE      RMSE
ARIMA      136.954   12.72%   200.340        88.580    18.32%   143.020
LSTM NN    90.220    8.74%    137.948        61.060    13.76%   100.228

3.2. Daily Passenger Flow Prediction of Metro Lines Using LSTM NN

The prediction performance in Beijing metro Line 1 is shown in Table 4. Similarly, in order to set up a contrast experiment, SARIMA (1, 0, 4) (1, 1, 1) was used to model the training set and predict the test set, and the prediction performances of the two methods are shown in Table 5. The predicted values (the LSTM NN took three experiments' averages) were compared with the observed values, as shown in Figure 8. It was proved that the fitting expression of SARIMA was too simple, so the fitting process was greatly influenced by periodicity. SARIMA also had little response to extreme changes in the data. Therefore, it was difficult for SARIMA to predict passenger flow during the holidays. In contrast, in addition to accurately predicting the periodic distribution of daily passenger flow, the LSTM NN also performed well in predicting passenger flow during holidays.

Table 4. LSTM NN prediction performance in Line 1.

Test ID   MAE     MRE      RMSE
1         7.836   10.30%   11.731
2         9.212   11.57%   14.321
3         9.874   12.86%   13.768
Average   8.974   11.58%   13.273

Table 5. Prediction performances of two methods in Line 1.

Models    MAE     MRE      RMSE
SARIMA    15.871  27.26%   26.273
LSTM NN   8.974   11.58%   13.273


Figure 8. Predicted values of two methods and observed values in Line 1.


The constructed LSTM NN was applied to the daily passenger flow prediction experiments of 14 other lines. The prediction performance for all 15 lines is shown in Table 6. Figure 9 shows the LSTM NN prediction results of daily passenger flow in Line 5, Line 7, Line 9 and the CHANGPING Line. As seen in Table 6, the LSTM NN could adapt well in the case of the daily passenger flow being disturbed by holidays, obtaining good prediction accuracy.

Table 6. The LSTM NN prediction performance for 15 lines.

MAE     MRE      RMSE
4.853   12.64%   9.798
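Training an LSTM predictor separately per line implies framing each daily series as fixed-length input windows with next-day targets. A minimal numpy sketch of that supervised framing (the lookback of 7 days is an assumption, not a value taken from the paper):

```python
import numpy as np

def make_windows(series, lookback):
    """Frame a 1-D series as (samples, lookback) inputs and next-step
    targets, the supervised shape an LSTM-style predictor trains on."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

daily_flow = np.arange(20, dtype=float)      # stand-in for one line's daily flow
X, y = make_windows(daily_flow, lookback=7)  # lookback of 7 days is an assumption
print(X.shape, y.shape)  # (13, 7) (13,)
```

Each row of `X` would then be fed through the LSTM cells, with `y` as the regression target; the CNN approach in Section 3.4 instead stacks all lines into one matrix and avoids this per-line preparation.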


Figure 9. The LSTM NN prediction results in Line 5, Line 7, Line 9 and CHANGPING Line. (a) Line 5, (b) Line 7, (c) Line 9, and (d) CHANGPING Line.

3.3. Dayparting Passenger Flow Prediction of Metro Stations Using CNN

Taking STARIMA as the contrast experiment, the prediction performances of the passenger flow for 47 stations by the two methods are shown in Table 7. Compared with STARIMA, the MAE, MRE and RMSE of the CNN decreased by 24.62%, 29.43% and 27.86%, respectively. Taking CAISHIKOU Station, CIQIKOU Station, HUIXINXIJIENANKOU Station and QILIZHUANG Station as examples, the prediction results of the two methods on the observed values are shown in Figure 10. Though the CNN used less historical data, it obtained better performance than STARIMA. STARIMA was affected by the random fluctuation of passenger flow, and its predicted values deviated from the normal range. The CNN could predict smoothly even when facing fluctuating data, indicating that the CNN is more robust to noise interference.
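The "space–time" part of STARIMA enters through a spatial weight matrix that mixes each station's series with those of its neighbours. A toy illustration of that spatially lagged term, using a hypothetical 4-station line (the matrix W below is illustrative, not the paper's):

```python
import numpy as np

# Hypothetical 4-station line; W encodes adjacency, with each station
# averaging its immediate neighbours. This is the spatial weight matrix
# that gives STARIMA its space-time lag terms.
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])

flow_t = np.array([120.0, 80.0, 95.0, 60.0])  # flows at one time step
spatial_lag = W @ flow_t                      # neighbour-weighted flows: 80, 107.5, 70, 95
print(spatial_lag)
```

Because STARIMA regresses each station on such linear combinations of past values, a random spike at one station propagates into its neighbours' predictions, one plausible source of the deviations the text reports.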


Figure 10. The CNN prediction results at (a) CAISHIKOU Station, (b) CIQIKOU Station, (c) HUIXINXIJIENANKOU Station, (d) QILIZHUANG Station.

Table 7. Prediction performances of two methods for 47 stations.

Models    MAE     MRE      RMSE
STARIMA   75.976  17.57%   128.387
CNN       57.272  12.40%   92.618

3.4. DailyThe Passengertraining set Flow and Prediction the test of set Metro of Linesthe CNN Using were CNN the same as those of the LSTM NN in experiments of daily passenger flow prediction of metro lines, so we compared prediction results of theseThe two training methods. set andThe the prediction test set of performances the CNN were are the shown same as in those Table of 8. the Compared LSTM NN with in experiments the LSTM ofNN, daily the passengerMAE and flowRMSE prediction of the CNN of metro were lines,slightly so lower, we compared and the predictionMRE of the results CNN ofwas these slightly two methods.higher. However, The prediction the differences performances from arethe shownthree crit ineria Table between8. Compared two methods with the were LSTM small, NN, indicating the MAE andthat RMSEtheir prediction of the CNN accuracies were slightly were lower,comparable and the. However, MRE of the the CNN CNN was took slightly the passenger higher. flow However, of 15 thelines di ffaserences a whole from two-dimensional the three criteria matrix between for twotraini methodsng, while were the small,LSTM indicating NN constructed that their in predictionthis paper accuraciesneeded separate were comparable. training for However,different thelines. CNN Taking took theLine passenger 5, the CHANGPING flow of 15 lines Line as aand whole the two-dimensionalYIZHUANG Line matrix as examples, for training, the pr whileediction the results LSTM of NN the constructed two methods in thison the paper observed needed values separate are trainingshown in for Figure different 11. lines. Taking Line 5, the CHANGPING Line and the YIZHUANG Line as examples, the prediction results of the two methods on the observed values are shown in Figure 11. Observed 120 LSTM NN Table 8. Prediction performances of two methods for 15 lines. CNN





Figure 11. Prediction results of two methods in (a) Line 5, (b) CHANGPING Line, (c) YIZHUANG Line.

Table 8. Prediction performances of two methods for 15 lines.

Models    MAE     MRE      RMSE
LSTM NN   4.853   12.64%   9.798
CNN       4.557   13.46%   8.682


4. Discussion and Conclusions

The LSTM NN and the CNN are two popular deep learning methods. While the former's unique memory-cell structure can capture the long- and short-term dependencies of time series, the latter has a powerful ability to extract key image features. According to our studies, the main findings and conclusions are as follows:

(1) When the LSTM NN predicts the dayparting passenger flow of metro stations, the MREs of evening-peak and full-time prediction remain within 10%, much lower than those of ARIMA. In addition, its superiority in predicting the peak shape is particularly obvious, showing that the LSTM NN is highly adaptable to extremely changing data and less limited by the prediction time span.

(2) When the LSTM NN predicts the daily passenger flow of metro lines, it overcomes the shortcoming of SARIMA, whose response to dramatic changes in the training set is weak owing to the influence of overall periodicity. Hence, the LSTM NN achieves good prediction accuracy during both holidays and non-holidays.

(3) When the CNN predicts the dayparting passenger flow of metro stations, its MAE, MRE and RMSE decreased by 24.62%, 29.43% and 27.86%, respectively, compared with STARIMA, which proves that the CNN is less affected by the random fluctuation of passenger flow and is more robust.

(4) When the CNN predicts the daily passenger flow of metro lines, its prediction accuracy is comparable to that of the LSTM NN. Like the LSTM NN, the CNN has long short-term memory capabilities. Both methods can capture the overall periodicity and dramatic changes of passenger flow, but both will ignore data shocks that last too short a time, treating them as abnormal noise.

Based on the above experimental findings, we can conclude that the LSTM NN and the CNN can both accurately predict the long- and short-term passenger flow of urban rail transit. They also have good data adaptability and robustness. However, it is difficult for these two networks to capture transient fluctuations of passenger flow caused by external disturbances, as they learn only from the spatiotemporal series themselves. Therefore, in future work, it is necessary to input prior knowledge from outside the spatiotemporal series into the networks to improve the prediction results.

Author Contributions: S.Z. principally conceived the idea for the study and was responsible for project administration. Z.X., J.Z., D.S. and Q.H. were responsible for preprocessing all the data and setting up experiments. Z.X. and S.Z. wrote the initial draft of the manuscript and were responsible for revising and improving the manuscript according to reviewers' comments.

Funding: This research was funded in part by the National Key R & D Program of China (2016YFC0802500), the Independent Scientific Research Project of the Beijing Research Center of Urban Systems Engineering (2019C013), and the National Natural Science Foundation of China (41471338).

Acknowledgments: We also appreciate support for this paper from the Beijing Key Laboratory of Operation Safety of Gas, Heating, and Underground Pipelines.

Conflicts of Interest: The authors declare no conflict of interest.

