Student Report

Seminar project

Short-term travel time prediction for public transport

Eidgenössische Technische Hochschule Zürich Swiss Federal Institute of Technology ETH

Institut für Verkehrsplanung und Transportsysteme 103-0488-00L Seminar in Raumentwicklung und Infrastruktursysteme Frühjahrssemester 2020

Guillaume Neven (16-832-883)

Abstract

The reliability of public transportation is, unquestionably, of paramount importance in driving a shift in the use of transportation modes. In this respect, innovation in improving the prediction of delays has a major role to play. In the last decade, machine learning and artificial intelligence have been widely deployed in various fields, and transportation science is no exception. In view of the importance of the objective, it is not surprising to see data-driven models for delay prediction. This project contributes to this effort.

This project proposes a data-driven model using a new kind of LSTM, capturing both temporal and spatial aspects, to predict the delays of buses using their past delays, boarding/alighting times and travel times. It has been tested for the case of Zürich. This new model, implementing a bus-based multi-feature analysis at the network scale, has shown encouraging results. It succeeded in improving the medium- to long-term predictions for the city of Zürich, even though the network exhibits a high punctuality.


Contents

Acknowledgement

1 Introduction
2 Review of Relevant Literature
3 Long short-term memory algorithm
4 Implementation of the algorithm
4.1 Parameter Tuning
4.2 Features relevance
5 Case study
5.1 Description of data set
5.2 Data processing
5.3 Time prediction performance
5.4 Performance under extreme condition
5.5 Discussion
6 Conclusion
7 References
A Appendix
A.1 Tuning for deeper network


List of Figures

1 Schema of traditional Artificial Neural Networks (ANN)
2 Schema of Recurrent Neural Network (RNN)
3 Schema of Long short-term memory
4 Illustration of the convolutional layer
5 Illustration of the convolutional long short-term memory model
8 Representation of the shape of the data
9 Tuning of the dropout
10 Dependency of the Kernel and Filter size
11 Performance for the different time step
12 Relevance of 3 features compared to the literature
13 Map of the Zürich's network
14 Probability distribution function (PDF)
15 Distribution of the line set and their delays
16 Time prediction for different time step
17 Model performance per line
18 Distribution of lateness for each day of the full data set
19 Time prediction under extreme condition
20 Tuning of the hyper parameters for 2 layers
21 Tuning of the hyper parameters for 3 layers

List of Tables

1 Tuning and range of the different hyper-parameters
2 Error and time for the best of each n


Acknowledgement

I am particularly grateful for the advice and detailed comments provided by my two supervisors Kimia Chavoshi and Dr. Georgios Sarlas, as well as the support from Dr. Anastasios Kouvelas. I also want to thank the community behind the Keras package (Chollet et al., 2015) for their sustained efforts in making machine learning a bit more accessible.


1 Introduction

Public transportation has a major part to play in responding to increasing demand for mobility and achieving sustainable mobility. In this respect, it is important to shift travellers away from private motorized modes and towards public transportation.

To prompt this shift, it is essential to enhance users' perception of quality. A key aspect of quality concerns the reliability of the network (Jenelius, 2018). Even though disruptions and delays can potentially decrease the reliability of a system, seeking ways of mitigating their impacts lies at the forefront of transport research. To that end, the importance of real-time information and accurate service time predictions must be emphasised, as they allow both users and operators to plan accordingly.

To achieve this, it is essential to understand the functioning of a transport network and the interactions between its different actors, especially during peak hours, when demand and, normally, delays are higher. In this project, a new method to predict the delays of buses is proposed. More specifically, the application of a Machine Learning (henceforth denoted as ML) algorithm is explored, which is trained on historical data in order to capture the different levels of network complexity that arise. The ML approach is tested in terms of its ability to predict buses' delays in real time. Subsequently, the results could be utilised for enhancing the reliability of the public transport service.

In summary, the objective of this work is to create a data-driven model to improve the quality of delay predictions significantly ahead of time.


2 Review of Relevant Literature

ML has become widely used in the last decade, mainly as a result of the explosion of computational power. This has also occurred in the case of transport systems, where ML has been increasingly deployed as passenger and travel data started to be more widely recorded. The most popular algorithm is the Long Short-Term Memory (LSTM). It has been applied to various matters in transport engineering, mostly for travel time prediction purposes (e.g. Ma et al. (2015), Agafonov and Yumaganov (2019), Chen et al. (2019), Kim et al. (2016)), or the prediction of "time-to-green" at intersections (e.g. Genser et al. (2020)).

ML approaches have also found application in the area of delay prediction for public transport. For instance, Petersen et al. (2019) developed an LSTM model to predict travel times along the route of a single bus line in Copenhagen. Their approach employs a link-level analysis, allowing statements about the delay to be made based on the corresponding travel time predictions. More specifically, they proposed a combination of LSTM and convolution models in order to capture the temporal and spatial correlations. A comparison against different prediction methods showed that their approach outperformed them.

In the same spirit, Huang et al. (2019) developed a model that takes into consideration a higher number of potential variables. In particular, their data set involved a combination of origin-destination data based on personalised card logs and Automated Vehicle Location (AVL) system records for the whole network. The implemented model contains variables describing the travel time and the number of boarding and alighting passengers per stop. In addition, some sub-variables have been added as well, such as weather conditions, the day of the week, and whether or not school was taking place on a specific day. In summary, they succeeded in drastically improving the accuracy of the predictions, but at the cost of a high data load, making it difficult to implement everywhere.

Last, Liu et al. (2020) proposed a travel time prediction approach for a specific line that uses detailed bus GPS probe data, including speed and location. In this model, the respective bus travel routes are not analysed as line segments between successive stops but between network intersections. Subsequently, the traffic conditions on those links are predicted and then used to predict the bus delays. Once again, they succeeded in improving the prediction compared to the system in place.


3 Long short-term memory algorithm

The Long Short-Term Memory (LSTM) is an ML algorithm from the class of Recurrent Neural Networks (RNN), which itself belongs to the class of Artificial Neural Networks (ANN). It is largely used for translation and speech recognition purposes but has also found application in transportation (e.g. Huang et al. (2019)).

In figure 1, the simplest form of a neural network is presented. It can have multiple inputs and outputs; in the example the model has 4 inputs and 1 output. Each link has a weight which indicates how much information should be transferred to the next node. Once in the node, a function merges all the incoming links and creates a new value that is transferred further on. A large number of functions exist with the simplest one being a linear activation. At the end, the output is compared to the expected value. Then the model is fed backwards, adjusting the weights in order to get closer to the expected value.

Figure 1: Schema of traditional Artificial Neural Networks (ANN).


Source : http://www.texample.net/tikz/examples/neural-network/
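As an illustration of such a network, a minimal Keras sketch with 4 inputs, one hidden layer and a single output could look as follows; the layer size, activations and optimiser are assumptions for demonstration only, not the report's implementation.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(4,)),            # 4 input features
    layers.Dense(5, activation='relu'),  # hidden layer merging the weighted inputs
    layers.Dense(1),                     # single output
])

# The loss compares the output to the expected value; backpropagation then
# adjusts the weights of every link.
model.compile(optimizer='adam', loss='mae')
```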


A major limitation of the traditional ANN is the difficulty of implementing temporal relations. For instance, one could train a model with the past delays of a bus in order to predict the future delays, but the model would not be recursive. The RNN overcomes this inherent limitation. Figure 2 illustrates this idea, which consists of training the model for different time steps while taking into account information from the past.

Figure 2: Schema of Recurrent Neural Network (RNN).

Source : https://colah.github.io/posts/2015-08-Understanding-LSTMs

In this RNN, a time window is selected such that the information used to predict the output $h_t$ comes from the time steps $x_0$ to $x_t$ and not further back. The number of weights, as depicted in figure 1, can however become very large. If a simple RNN were applied to the case study to predict the delays of the $N_{bus}$ buses, using the 3 selected features (total lateness, travel time lateness and boarding/alighting delays) and a time window $W_{size}$ of 10 time steps, one would need an astonishing $2.912 \cdot 10^{12}$ weights (equation 1). Here lies the major advantage of the LSTM.

$$\#\text{weights} = \left(\#\text{bus} \cdot \#\text{features} \cdot W_{size}\right)^3 = (476 \cdot 3 \cdot 10)^3 \approx 2.912 \cdot 10^{12} \quad (1)$$
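As a quick arithmetic check of equation 1 (a back-of-the-envelope count only, not a statement about any specific RNN implementation):

```python
# Number of weights from equation 1 for the case study.
n_bus, n_features, w_size = 476, 3, 10
n_weights = (n_bus * n_features * w_size) ** 3
print(f"{n_weights:.3e}")  # 2.912e+12
```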


Figure 3: Schema of Long short-term memory.

Source : https://colah.github.io/posts/2015-08-Understanding-LSTMs

In figure 3, a basic LSTM model is presented. Unlike the RNN, the LSTM has two flows of information between cells, and therefore through time. The first one, similar to the ANN, is the output of the previous time step. The second one is called the memory, which is independent of the input window's size. For example, if a text is given to the LSTM, the memory will keep important information throughout the text, allowing it to understand pronouns referring to objects or people defined well before. A detailed presentation and discussion of LSTM can be found in Olah (2015).

In addition, a modification can be made to the LSTM to capture spatial effects. The idea is not to feed the model merely with an input containing a single value, but to add a convolution layer just before. Figure 4 shows an example of a one-dimensional convolutional layer: in each cell, a weighting of the n inputs is computed and given to the LSTM.

Figure 4: Illustration of the convolutional layer.

Source : https://colah.github.io/posts/2014-07-Conv-Nets-Modular/


The idea behind the implementation of this convolutional layer is to allow the model to look at the buses behind and ahead of the one for which the prediction is performed. To illustrate it conceptually, if a road accident happened ahead, the model will not have been trained for this particular scenario, but it might still learn about large delays from the bus(es) ahead. The ConvLSTM also uses a convolution step inside each gate of the LSTM model.

Each step of the ConvLSTM is defined mathematically below (equations 2 to 7) and schematically in figure 5.

Figure 5: Illustration of the convolutional long short-term memory model.

Source : https://medium.com/neuronio/an-introduction-to-convlstm-55c9025563a7

There are 4 different gates in the ConvLSTM cell. First, the forget gate $f_t$ (equation 2) is computed by passing through a sigmoid function (see figure 6) the combination of the previous output $H_{t-1}$, the memory $C_{t-1}$ and the input $X_t$, plus a bias $b_f$. Every $W$ matrix contains weights that are updated along the training process. The idea of this gate is to select (from 0 to 1) the information to be passed to the memory $C_t$.

$$f_t = \sigma\!\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \quad (2)$$


Figure 6: sigmoid function. Figure 7: tanh function.

$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad\qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

The second gate, called the update gate, updates the candidates. First, $i_t$ (equation 3) selects the candidates to be updated; similarly to the forget gate, a combination of the previous output $H_{t-1}$, the memory $C_{t-1}$ and the input $X_t$, plus a bias $b_i$, is made using once again the sigmoid function. Then, the selected candidates from the previous equation are updated using a tanh function (see figure 7).

$$i_t = \sigma\!\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \quad (3)$$

$$\tilde{C}_t = \tanh\!\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \quad (4)$$

Then the memory is filled with the updated candidates in equation 5.

$$C_t = f_t \circ C_{t-1} + i_t \circ \tilde{C}_t \quad (5)$$

$$o_t = \sigma\!\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \quad (6)$$

Afterwards, a final sigmoid function selects parts of the previous output $H_{t-1}$ and the input $X_t$ (equation 6) to be fed into the final equation 7. This final equation combines a piece of the memory $C_t$, passed through a hyperbolic tangent (tanh) function, with the last output $o_t$ to finally compute the output $H_t$.

$$H_t = o_t \circ \tanh(C_t) \quad (7)$$
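To make these update rules concrete, the following is a minimal NumPy sketch of a single ConvLSTM step applying equations 2 to 7, with the convolution taken over the bus dimension. The helper names, shapes and the weight dictionary are illustrative assumptions, not the report's actual implementation (which relies on Keras).

```python
import numpy as np

def conv1d(x, w):
    """'Same' 1D convolution of x (n_bus, c_in) with a kernel w (k, c_in, c_out)."""
    k, c_in, c_out = w.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], c_out))
    for i in range(x.shape[0]):
        out[i] = np.einsum('kc,kcf->f', xp[i:i + k], w)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(X_t, H_prev, C_prev, W, b):
    """One ConvLSTM cell update; W and b hold the per-gate kernels and biases."""
    f_t = sigmoid(conv1d(X_t, W['xf']) + conv1d(H_prev, W['hf']) + W['cf'] * C_prev + b['f'])  # (2)
    i_t = sigmoid(conv1d(X_t, W['xi']) + conv1d(H_prev, W['hi']) + W['ci'] * C_prev + b['i'])  # (3)
    C_cand = np.tanh(conv1d(X_t, W['xc']) + conv1d(H_prev, W['hc']) + b['c'])                  # (4)
    C_t = f_t * C_prev + i_t * C_cand                                                          # (5)
    o_t = sigmoid(conv1d(X_t, W['xo']) + conv1d(H_prev, W['ho']) + W['co'] * C_t + b['o'])     # (6)
    H_t = o_t * np.tanh(C_t)                                                                   # (7)
    return H_t, C_t
```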


4 Implementation of the algorithm

As mentioned before, the objective of this work is to achieve a more accurate prediction of delays. In order to capture the bus delay patterns across a network, a deep model is implemented. A model is referred to as deep when the information goes through multiple layers before reaching the end. In our case, similarly to Petersen et al. (2019), the model has an encoding part using n convolutional LSTM encoders and n decoders. Note that the internal memory of the LSTM is reset every day, i.e. the model remembers information only from the same day as that for which the prediction is made and no further.

The data has been shaped as depicted in figure 8. For each sample, the model is fed with a three-dimensional matrix containing, for all the buses, the features for all selected previous time steps; the number of previous time steps is given by the window size used. The 3 selected features are the total delay of the bus, the delay due to travel time and the increased delay due to boarding and alighting. The relevance of the features is discussed in subsection 4.2. A minimal sketch of this reshaping is given below.
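The sketch assumes the raw observations are already available as an array of shape (time steps, buses, features); the variable names are illustrative, and a horizon of 10 steps with a 2-minute time step corresponds to the t + 20 [min] prediction discussed later.

```python
import numpy as np

def make_windows(raw, window=10, horizon=10):
    """Build samples of shape (samples, window, buses, features) and pair each
    window with the total lateness of every bus `horizon` steps ahead.

    `raw` is assumed to have shape (n_timesteps, n_bus, 3), holding per bus the
    total lateness, the travel time lateness and the boarding/alighting lateness.
    """
    X, y = [], []
    for t in range(window, raw.shape[0] - horizon):
        X.append(raw[t - window:t])       # the previous `window` time steps
        y.append(raw[t + horizon, :, 0])  # total lateness, `horizon` steps later
    return np.array(X), np.array(y)
```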

In order to train the model, it should be fed with the same data multiple times; each pass is referred to as an epoch. The number of epochs is important: if there are too few, the model is stopped before capturing the complexity; if there are too many, the model will present really good performance, but only due to over-fitting of the data. Essentially, an over-fitted model makes no prediction, but just copies what it has learned. To reduce this over-fitting risk, a dropout is implemented (discussed below) and the error is calculated on a separate data set, called the validation set. Seven random days have been selected as the validation data set. It is important to select entire days as validation data and not random samples from different days; indeed, the internal memory needs a complete day to perform at its best.

In order to select the optimal model, an optimisation needs to be performed on different criteria (see table 1). A more in depth description of the tuning is given in subsection 4.1.

Table 1: Tuning and range of the different hyper-parameters.

Parameter                 Range                       Best
Number of encoders (n)    [1 2 3]                     1
Size of filters           [16 32 64]                  32
Convolution window        [1 5 10]                    10
Dropout                   [0.05 0.1 0.2 0.35 0.5]     0.5
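With the best values of table 1, a simplified Keras sketch of the single-encoder ConvLSTM model could look as follows. This is an illustrative reconstruction, not the report's exact architecture, and it assumes a TensorFlow/Keras version that provides `ConvLSTM1D` (TF >= 2.6); the output head mapping the convolutional features to one predicted lateness per bus is likewise an assumption.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_bus, n_features, window = 476, 3, 10  # buses, features, input time steps

model = keras.Sequential([
    layers.Input(shape=(window, n_bus, n_features)),
    # Convolution over the bus dimension inside every LSTM gate.
    layers.ConvLSTM1D(filters=32, kernel_size=10, padding='same',
                      dropout=0.5, return_sequences=False),
    layers.Conv1D(filters=1, kernel_size=1),  # one predicted lateness per bus
    layers.Reshape((n_bus,)),
])

model.compile(optimizer='adam', loss='mae')  # MAE as in equation 8
# model.fit(X_train, y_train, epochs=30, validation_data=(X_val, y_val))
```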


Figure 8: Representation of the shape of the data.

(Each sample is a three-dimensional block indexed by bus ID, time step (dt) and feature, the features being the ΔTravel Time lateness, the ΔStop lateness and the Total lateness.)

The code can be found at the following link: https://github.com/elingenior/LSTM_for_bus_lateness_prediction (the code will be further cleaned).

4.1 Parameter Tuning

In order to find the best model, multiple models have been trained with different hyper-parameters, only for line 32. The mean absolute error (see equation 8) on the validation data and the computation time to perform 30 iterations have been used as metrics, and a combination of the two is used to select the model.

The hyper-parameters are: the kernel size, which is the number of neighbouring buses taken into account; the filter size, which defines the number of kernels used in each gate of the convolutional layer; and the dropout rate, which randomly drops some nodes at each iteration in order to prevent over-fitting.

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i^{\text{pred}} - y_i^{\text{true}}\right| \quad (8)$$
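A small NumPy sketch of equation 8 (illustrative only):

```python
import numpy as np

def mae(y_pred, y_true):
    """Mean absolute error of equation 8."""
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))
```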


The number of layers n has been selected between 1 and 3. It is limited to 3, since increasing the number of layers has a negative impact on computation time.

The kernel is another name for the window; its range has been selected from 1 to 10. Computationally, it makes more sense to have a filter size that is a power of 2.

In light of the results of figure 9, the dropout indeed reduces the error. As some nodes are dropped, it also reduces the computation time, but on a really small scale. For later calculations, a dropout of 0.5 is selected. Due to the risk of instability in the model, it is conventionally rare to select a dropout above 0.5.

Figure 9: Tuning of the dropout.

(MAE and computation time [s] as a function of the dropout rate, 0.0 to 0.5.)

The optimisation takes place as follows: first, all possible models for n = 1 have been trained on 30 epochs, and then the Mean Absolute Error (MAE) on the validation data set has been computed.

In figure 10, the MAE values for the different kernel and filter sizes are presented. A reduction in the computation time as the number of filters decreases can be observed. This is not really surprising, as less computational effort is needed.

Unlike the first hyper-parameters, the filter and kernel size seem to have a very strong influence on the computation time as well as on the MAE. The best combination seems to be a kernel size of 10 and a filter size of 32.
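A sketch of the grid search described above, assuming a hypothetical helper `build_model(...)` (for instance, the single-layer model sketched after table 1) and pre-built arrays `X_train`, `y_train`, `X_val`, `y_val`; all names are illustrative.

```python
# Exhaustive search over kernel size and filter size for n = 1 (30 epochs each),
# recording the best validation MAE and the training time of every combination.
import itertools
import time

results = {}
for kernel, filters in itertools.product([1, 5, 10], [16, 32, 64]):
    model = build_model(kernel_size=kernel, filters=filters, dropout=0.5)  # hypothetical helper
    start = time.time()
    history = model.fit(X_train, y_train, epochs=30,
                        validation_data=(X_val, y_val), verbose=0)
    results[(kernel, filters)] = (min(history.history['val_loss']),
                                  time.time() - start)

best = min(results, key=lambda k: results[k][0])  # e.g. kernel size 10, filter size 32
```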


Figure 10: Dependency of the Kernel and Filter size.

(Heatmaps of MAE and computation time [s] for kernel sizes 1, 5 and 10 and filter sizes 16, 32 and 64.)

From those results, deeper models have also been computed. However, as shown in table 2, the model with a single layer performs better. It was expected to be faster, but it also, surprisingly, gives a lower MAE. This finding is discussed in a later section (section 5.5). Figures illustrating the tuning for the other values of n can be found in Appendix A.1.

Table 2: Error and time for the best of each =.

Model                        MAE      Time [s]
Number of layers (n) = 1     0.697    2.798e+03
Number of layers (n) = 2     0.723    1.493e+03
Number of layers (n) = 3     0.704    8.655e+03

The question of the time step also has to be discussed, as well as the number of previous time steps used to predict the next one. These two parameters depend on the horizon of the prediction: different time steps are needed to predict 2 or 20 minutes ahead. Once again, different models have been computed to select the best parameters, the objective being the best t + 20 [min] prediction. Results are presented in figure 11. The 2-minute time step with a window size of 10 is much better than any other time step. This surprising result can be difficult to explain; it would appear that 2 minutes is enough to capture the important behaviours without being overloaded by information. If the models had been trained with more epochs, the results might have been different.


It has to be noted that for each prediction horizon, a different time step is optimal. The same is true for every hyper-parameter. This makes the tuning of the model really demanding and time consuming.

Figure 11: Performance for the different time step.

(Heatmaps of MAE [s] and computation time [s] for time steps of 30 to 1200 s and time window sizes of 1 to 100.)

4.2 Features relevance

As stated previously, 3 features have been selected. In the literature, and especially in Petersen et al. (2019), only the travel time delays were computed. The question arises whether using 3 features improves the model. The results in figure 12 show that the implementation of multiple features increases the accuracy of the model. It is still interesting to note that the 1-feature model converges faster; the reduced number of variables is the most likely explanation.

Figure 12: Relevance of 3 features compared to the literature.

(Left: training and validation loss over 30 epochs for the 1-feature and 3-feature models. Right: real and predicted lateness [s] between 7h and 9h.)


5 Case study

In order to test the ability of the proposed model to provide short-term delay predictions, a case study is designed. More specifically, publicly available AVL data for the city of Zürich (2020) are used for that purpose. The data contains the arrival and departure time at each stop for every bus in Zürich. Consequently, from this raw data the delay of each trip can be inferred.

The data set is made of 35 working days from the 1st of January to the 21st of February. The weekend days and holidays are discarded, as the focus lies on typical weekdays. Only the peak hours are selected, when delays are the most likely to occur. The model has been trained on 29 working days, and 6 days have been used as a validation set to prevent over-fitting.

Figure 13: Map of the Zürich's network.

(ZVV network map of fare zone 110, covering tram lines 2–17 and bus lines 31–916. Source: ©ZVV|VBZ, 15.12.2019.)

5.1 Description of data set

The data set corresponds to the lines of the greater Zürich area, namely, 663 stops serviced by 69 lines (roughly the zone 110 of the VBZ network, highlighted in white in figure 13).

Figure 14: Probability distribution function (PDF)


In figure 14 the delays in the network are presented. Taking note of the log scale, it appears that small delays dominate. Indeed, if one counts a time difference between the actual and planned arrival time of more than 2 minutes as a delay, in our data only 0.009% of all the trips are subject to delay (where a trip is defined as a route between two stops).

Concerning the observed large delay values, they can be attributed either to reporting errors or to technical problems with the vehicle. Nevertheless, given the very sparse occurrence of such events, it is not necessary to discard them, as the model will recognise them as such.

Figure 15 shows the line distribution in the data set. One can observe the under-representation of some lines, highlighted in white.

The majority of those under-represented lines are lines leaving the study area, for which data recording therefore stops; such incomplete information concerning their trips may result in model malfunctioning. Therefore, those lines are discarded from the respective data sets.


Figure 15: Distribution of the line set and their delays.

(Number of trips and number of late trips per line ID.)

5.2 Data processing

Some modifications have been made to the data set prior to the model estimation. First, all the runs to the depot have been deleted. Then a unique run ID has been created according to the following equation:

$$\text{unique} = 100 \cdot \text{line} + \text{course} \quad (9)$$

Subsequently, a normalisation of all 3 features is performed. The model is surprisingly sensitive to the chosen normalisation. In this regard, a min-max normalisation was tested, i.e. values between -1 and 1 only. However, this type of normalisation yields models with unreasonable results. To overcome this issue, the following normalisation has been chosen instead:

$$\tilde{u} = \frac{u - \bar{u}}{\sqrt{\sigma_u}} \quad (10)$$

with $u$ being the data set, $\bar{u}$ its mean and $\sigma_u$ its standard deviation.
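A minimal NumPy sketch of this normalisation, applied feature-wise; dividing by the square root of the standard deviation follows equation 10 as written.

```python
import numpy as np

def normalise(u):
    """Equation 10: subtract the mean, divide by the square root of the standard deviation."""
    u = np.asarray(u, dtype=float)
    return (u - u.mean(axis=0)) / np.sqrt(u.std(axis=0))
```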


5.3 Time prediction performance

In this section, the performance of the proposed model is compared to 3 different delay prediction approaches. The first approach involves no prediction, i.e. the buses are assumed to be always on time. In the second approach, the delay of a bus is assumed to be equal to the mean delay of all the previous days, i.e. the bus always has the same delay every day. Last, the third approach, presumed to be the one currently utilised by the operator, considers no adaptation of the delay through time, i.e. it is assumed that the bus will keep its current delay in the future time steps without any correction.
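For clarity, a sketch of the three reference predictors against which the LSTM is compared, for a single bus and a single future time step; the function and variable names are illustrative assumptions.

```python
def baseline_predictions(current_delay, historical_mean_delay):
    """Delay predicted for a future time step by each reference approach."""
    return {
        'no_prediction': 0.0,            # the bus is assumed to be on time
        'mean': historical_mean_delay,   # mean delay of the previous days
        'zurich': current_delay,         # current delay carried forward unchanged
    }
```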

The predictions for the next 20 minutes with a time step of 2 minutes are compared in figure 16.

Figure 16: Time prediction for different time step.

(MAE [s] versus prediction horizon t+2 to t+20 [min] for the no-prediction, mean, LSTM and Zurich approaches.)

First, it is interesting to notice the performance of the method currently used: for short-term predictions, Zürich's model performs far better than our model. However, for medium- to long-term predictions, our model could improve the accuracy of the predictions. It is also interesting to notice that the LSTM performed better than the no-prediction case, even in the case of Zurich, which features so few delays.

It is also interesting to see the performance per line; as shown in figure 17, the model's performance varies substantially between the lines. For some lines, the prediction for t + 20 [min] is better than for t + 2 [min]. Those are lines leaving the central area, so the delays become stable once the vehicle has left the zone. It is then rather easy for the model to predict that the delay will stay constant.


A good result that can also be seen in figure 17 is the consistency of the delay prediction error for the tram lines (lines 2 to 14): the error remains rather constant as the prediction lead time increases. As trams are the backbone of urban transportation, having good reliability on those lines is important.

Figure 17: Model performance per line.

(Number of trips and MAE [s] per line ID, for t+2 [min] and t+20 [min].)

5.4 Performance under extreme condition

Among all the days in the data set, one had a higher number of late trips, see figure 18. After verification, it appeared that it snowed that day. It is a good opportunity to see how the model reacts in that kind of more extreme situation.

Figure 18: Distribution of lateness for each day of the full data set.

(Number of late trips per working day, from 6.1.2020 to 21.2.2020.)


Figure 19 shows that the model fails to predict better when there are more delays in the network. An explanation is given in subsection 5.5. However, it is impressive to see how, in those circumstances, the mean technique is by far the worst, even worse than the "no prediction" approach. An explanation might be that some buses have a negative mean delay (so that they are, on average, ahead of schedule), explaining why the mean technique is even worse than the "no prediction" one.

Figure 19: Time prediction under extreme condition.

(MAE [s] versus prediction horizon t+2 to t+20 [min] for the mean, no-prediction, LSTM and Zurich approaches on the snow day.)

5.5 Discussion

The methodology that we have developed leads to an improvement in the accuracy of delay predictions for the case study. However, due to the punctuality of public transport in the city of Zurich, the improvement is not huge. Indeed, as most of the trips are on time, the model has trouble identifying a pattern of delays. A solution to further test the potential of the methodology could be to select another network as a case study (which has not been done due to time restrictions), or to perform an unbalanced data set correction. The idea would be to either create some realistic high delays from the few already collected, or to delete a range of really low delays. By doing so, the data would no longer be overwhelmed by the majority of small delays. Haibo He and Garcia (2009) developed an approach to cope with that kind of issue. Another solution could be to simply train the model on a data set where days with high delays are as well represented as "normal" days. With either of these approaches, the model could then be used as an emergency model, i.e. when the network does not follow its everyday routine but is affected by some external factor like snow or public demonstrations.


6 Conclusion

The results that have been shown in this study provide some insights about the functionality of the proposed methodologies. The multi-feature analysis manages to outperform previous methodologies from the literature. It also confirms that LSTM models can be applied to delay prediction not only by inferring delays from congestion but directly from the delays themselves.

However, due to its sensitivity, each prediction horizon has different optimal parameters, making it difficult to have a single model covering a wide range of prediction horizons. Having one model per prediction horizon would be feasible and interesting but would require a higher computational effort.

The case study in Zürich also shows the possibility of implementing the model in a well-performing network. Indeed, the upside compared to simpler methods is not very large, but it is still significant.

For future work, a comparison between a stop-based model and a bus-based model could be performed on the same data set. A development relevant for machine learning could be to run the model backwards, i.e. to understand on what basis the model is making its predictions. Indeed, the lack of information about its actual understanding is one of the biggest weaknesses of machine learning algorithms. Some research has already been done to try to solve this black-box problem (e.g. Lundberg (2018)).


7 References

Agafonov, A. A. and A. S. Yumaganov (2019) Bus Arrival Time Prediction Using Recurrent Neural Network with LSTM Architecture, Optical Memory and Neural Networks, 28 (3) 222–230, ISSN 1060-992X, 1934-7898.

Chen, C., H. Wang, F. Yuan, H. Jia and B. Yao (2019) Bus travel time prediction based on deep belief network with back-propagation, Neural Computing and Applications, ISSN 0941-0643, 1433-3058.

Chollet, F. and others (2015) Keras, GitHub.

Genser, A., L. Ambühl, K. Yang, M. Menendez and A. Kouvelas (2020) Time-to-Green: A framework to enhance SPaT messages using machine learning (working paper), 23rd IEEE International Conference on Intelligent Transportation Systems, September 2020.

Haibo He and E. Garcia (2009) Learning from Imbalanced Data, IEEE Transactions on Knowledge and Data Engineering, 21 (9) 1263–1284, ISSN 1041-4347.

Huang, Z., Q. Li, F. Li and J. Xia (2019) A Novel Bus-Dispatching Model Based on Passenger Flow and Arrival Time Prediction, IEEE Access, 7, 106453–106465, ISSN 2169-3536.

Jenelius, E. (2018) Public transport experienced service reliability: Integrating travel time and travel conditions, Transportation Research Part A: Policy and Practice, 117, 275–291, ISSN 0965-8564.

Kim, Y., S. Choi, S. Briceno and D. Mavris (2016) 2016 IEEE/AIAA 35th Digital Avionics Systems Conference (DASC)., IEEE., S.l., ISBN 978-1-5090-2523-7. OCLC: 1096705549.

Liu, H., H. Xu, Y. Yan, Z. Cai, T. Sun and W. Li (2020) Bus Arrival Time Prediction Based on LSTM and Spatial-Temporal Feature Vector, IEEE Access, 8, 11917–11929, ISSN 2169-3536.

Lundberg, S. (2018) Explainers — SHAP latest documentation, https://shap.readthedocs.io/en/latest/.

Ma, X., Z. Tao, Y. Wang, H. Yu and Y. Wang (2015) Long short-term memory neural network for traffic speed prediction using remote microwave sensor data, Transportation Research Part C: Emerging Technologies, 54, 187–197, ISSN 0968090X.

Olah, C. (2015) Understanding LSTM Networks – colah's blog, https://colah.github.io/posts/2015-08-Understanding-LSTMs/.

Petersen, N. C., F. Rodrigues and F. C. Pereira (2019) Multi-output bus travel time prediction with convolutional LSTM neural network, Expert Systems with Applications, 120, 426–435, ISSN 09574174.


Zürich, O. D. (2020) Fahrzeiten der VBZ im SOLL-IST-Vergleich – data.stadt-zuerich.ch, https://data.stadt-zuerich.ch/dataset/vbz_fahrzeiten_ogd.


A Appendix

A.1 Tuning for deeper network

Figure 20: Tuning of the hyper parameters for 2 layers.

(MAE and computation time [s] for every kernel-size and filter-size combination with two layers; selected values in bold.)


Figure 21: Tuning of the hyper parameters for 3 layers.

(MAE and computation time [s] for every filter combination with three layers; selected values encircled.)
