Master Thesis

To bike or not to bike? Prediction of public bike availability in the Dutch Train Network

Gijs de Jager (10006729)
Supervisor: Dr. F. M. Nack
Second Examiner: Prof. Dr. T. V. van Engers
Faculty of Science (FNWI), University of Amsterdam, Amsterdam

To Bike or not to Bike?

Prediction of public bike activity in the Dutch Train Network

Gijs de Jager 10006729

ABSTRACT
In this thesis we test whether the algorithms designed for local Bike Sharing Systems can be applied in a national environment, in order to make the rental bike a reliable part of the public transport system. Much research has been done on relatively small-scale bike rental systems. We contribute to this field with a case study of a Bike Sharing System that has a larger network of stations and a greater variation in the number of available bikes. We apply and modify a model that predicts how many bikes will be available at a given train station at a given time in the future and, if no bike is available, how long it will take before a new bike becomes available. The case study is built on data from the OV-fiets system, provided by the NS (Dutch Railroad), and on data from the Royal Dutch Institute of Meteorology (KNMI). The predictions are made with the GAM method, and the significance of the differences between the algorithms is tested with the Kolmogorov-Smirnov test.

Keywords
Bike-sharing system, predictive model, GAM algorithm, NRMSE, Kolmogorov-Smirnov

1. INTRODUCTION
The OV-Fiets¹ is growing in popularity²,³. In 2014 approximately 1.4 million bike rentals took place; in 2017 the number was 3.2 million. Because of this growing popularity, people are confronted with empty bike stations⁴. The Dutch Railroad (NS) has solved this problem for the short term by buying many extra bikes, but with the ongoing growth of bike rental this solution is insufficient for the long term.

In the current situation the consumer can obtain only limited information about the availability of bikes: every fifteen minutes the app and the website show an update of the number of bikes at a particular train station. The problem with this solution is that, in contrast to other public transport such as the […], metro or […], one cannot fit the bike into a planned journey. Ideally, a passenger travelling from A to B by train and then continuing the journey by bike needs to know, before departure, whether at least one bike will be available upon arrival at B or, if no bike will be available, how long he will have to wait for one[2].

The suggested solution is that, in the absence of a live update, a predicted number of bikes is provided so that planning becomes feasible. Although Bike Sharing Systems (BSS) exist in other cities and research has been done on the predictability of those systems [5][10][1][21], they are based on rather different infrastructures. In all cases those systems are designed for a context with few stations and a low number of available bikes. This leads to the following research question: Can the available bike sharing algorithms be applied to solve the information problem in an environment with a large number of stations and a wide variety in the number of available bikes? This question is answered by a case study that uses real data provided by the NS and the Royal Dutch Institute of Meteorology (KNMI).

This paper is organized as follows. In Chapter 2 the key features of both the Dublinbikes and the OV-Fiets system are presented. The history and development of predicting bike availability and of uncertainty-aware journey planning are presented in Chapter 3. In Chapter 4 the variables are described and the new algorithm is presented. In Chapter 5 the case study is presented, including the results as well as the recommendations to NS. The discussion follows in Chapter 7 and we conclude in Chapter 8.

¹ Best translated as PT-Bike: Public Transport Bike
² http://nieuws.ns.nl/recordaantal-ritten-met-de-ov-fiets-in-2017/
³ https://www.ad.nl/utrecht/ov-fiets-is-niet-aan-te-slepen~af126e8b9/
⁴ https://www.ovmagazine.nl/2017/06/ov-fiets-nog-niet-zo-betrouwbaar-als-de-trein-0600/

2. BIKE SHARING SYSTEMS
Worldwide, there are more than 700 Bike Sharing Systems[3]. Usually they are located in large cities such as Washington, Barcelona, Paris, Lyon and […]. The systems are often operated by a stand-alone commercial organization. In Paris and Dublin, for example, the advertising company JCDecaux⁵ owns the bikes and the stands. One rents a bike by swiping a card along the central pole at the bike station and can drop it off at any station in the city, provided there is at least one empty stand. At most BSS the first 30 minutes of rent are free of charge and, in Barcelona, a supplement of €4.49⁶ is charged when the rent takes longer than two hours. The combination of many stands and the incentive to use a bike for less than 30 minutes leads to very short rental periods; in Lyon, for example, the median rental time is 11 minutes[1]. For cities such as Barcelona (Bicing), Washington (Capital Bikes), Paris (Vélib), Lyon (Vélo) and Dublin (DublinBikes) predictive algorithms have been developed [10][21][6][2][11]. However, those applications are not integrated into a larger public service system.

2.1 Dublinbikes
In 2012 the Dublinbikes⁷ system had 550 bikes across 44 bike stations in Dublin. The stations are open from 05:00 to 00:30, seven days a week, and between 2009 and 2012 there were more than four million rentals. Dublinbikes provides real-time information about the number of bikes at the stations but gives no prediction for the future. It uses the same pricing model as Vélo and Bicing, since it is also owned by JCDecaux. We chose DublinBikes as an example because Chen et al.[2] use the DublinBikes data. Their model is built with the same aim as ours, namely to design a solution for fitting the rental bike into a journey, even when no bike is available. Their model also focuses mainly on the switch between public transport and rental bikes, whereas models designed for e.g. Bicing[5] and Vélib[6] depend heavily on station-to-station calculations and on the way bikes are distributed, and less on predicting the number of bikes for the journey planning of the traveler. On top of that, Chen et al. report that their model outperforms the earlier approaches[2].

2.2 OV-Fiets
The OV-fiets⁸ system works differently from the BSS mentioned above. First of all, the OV-fiets system is deeply integrated into the Dutch Railway company (NS Groep), a semi-public company owned by the Dutch government. NS-Stations, part of the NS Group, manages the OV-fiets. The main goal of the OV-fiets is to give the traveler a real door-to-door experience. In this way NS can keep the client instead of losing them to a local public transport provider such as GVB⁹, and thereby earn more revenue. The system therefore has a completely different business model from the other BSS. The OV-fiets pricing is as follows: a user pays €3.85 per 24 hours up to 72 hours; after those 72 hours a supplement of €5 is charged for every 24 hours. Users are also discouraged from returning the bike to a station other than the one where they rented it: leaving it at a different station costs a supplement of €10. This results in a different distribution of bike rentals than in the usual BSS. The OV-fiets is available to the traveler 24/7 and can also be returned 24/7 at the train stations. The OV-fiets bike stations have unlimited space for bikes, unlike BSS stations, where space is limited. Unlike the other BSS, the OV-fiets operates nationwide: it has around 15,000 bikes divided over 317 bike stations, which are generally located at train stations.

⁵ jcdecaux.com
⁶ https://www.bicing.cat/es/informacion/tarifas
⁷ http://www.dublinbikes.ie/
⁸ https://www.ns.nl/deur-tot-deur/ov-fiets
⁹ https://www.gvb.nl/

3. THEORETICAL FRAMEWORK
3.1 Predictive models
Researchers such as Froehlich et al.[5], Kaltenbrunner et al.[10], Borgnat et al.[1] and Yoon et al.[21] have tried to solve the problem of predicting the number of bikes at bike stands in Dublinbikes-like systems. Froehlich et al. laid the basis for this line of research, in which mainly Bicing and Vélo are investigated and the researchers try to predict accurately how many bikes will be at a particular bike stand at a particular time in the future. Froehlich et al. investigate the Barcelona Bicing system using a Bayesian Network (BN) and compare it to three other methods.

The first, and simplest, is Last Value (LV). LV predicts the number of bikes at a bike stand as the last known count. For example, when one wants to know how many bikes there will be at a particular stand in about an hour, LV checks the current value, say 25 bikes, and returns 25 bikes as the prediction. Froehlich et al. show that LV is quite accurate until the prediction window exceeds 60 minutes, after which it is outperformed by the Historic Mean (HM) and Historic Trend (HT) methods[5].

HM predicts the historic average number of bikes at the same time of day as the time being predicted. Froehlich et al. show that this predictive model is highly unstable and therefore conclude that the distribution of bikes at busy stations is very irregular. HT combines HM and LV: it takes the Historic Mean of the number of bikes at the time of the request and the Historic Mean at the time one wants to predict, calculates the difference, and adds that difference to the value of LV.

Froehlich et al. show that the Bayesian Network outperforms these three approaches. They propose three input nodes: time, bikes and Prediction Window. Time is the day divided into 24 hours. Bikes is divided into five bins of 20% that represent the number of bikes at a station. The Prediction Window (PW) takes six values: 10, 20, 30, 60, 90 and 120 minutes. The output is ∆, which represents the predicted number of bikes. With this algorithm they achieved a prediction error of 0.08. Froehlich et al. also predicted whether a station was full (meaning one cannot return a bike) and whether a station was empty (meaning one cannot pick up a bike); the BN predicted this wrongly 13.3% of the time.

Froehlich et al. created a clear basis for investigating the possibilities of predicting the number of bikes in bike sharing systems. However, several researchers have criticized their approach. Chen et al. and Yoon et al., for example, claim that not taking the weather conditions into account is a missed opportunity to increase the performance of the algorithm [2][21]. Yoon et al. also claim that because Froehlich et al. divide the number of bikes into five bins (0%-20%, 20%-40%, 40%-60%, 60%-80% and 80%-100%) instead of predicting actual numbers, other models, like LV, do not perform much

worse and the BN gets an 'unfairly' good result.

Yoon et al. themselves add seasonality and weather conditions to their algorithm and analyze the stations that behave similarly to the station they want to predict. They use the Auto Regressive Integrated Moving Average (ARIMA)[21]. ARIMA is an extension of the Auto Regressive Moving Average (ARMA), which is used by Kaltenbrunner et al.[10], who apply time-series analysis to predict the availability of bikes at particular stations. To do so they use the historic data of the current station as well as of the closest stations around it[10].

Yoon et al. and Kaltenbrunner et al. depend heavily on the dynamics of multiple stations. The research of Chen et al.[2] focuses on bikes as part of multimodal travel in Dublin, for example in combination with […] and trains, using data from the DublinBikes company. They also predict the waiting time for a new bike when no bike is available. Their research is therefore different from the others, because they treat the bike like any other type of public transport. They also show that their prediction model is more accurate than the ones mentioned above. A key element of the analysis of Chen et al. is the Generalized Additive Model.

3.1.1 Generalized Additive Model
The Two-Stage Generalized Additive Model (TGAM) consists of two stages. The first stage runs a Generalized Additive Model (GAM).

GAM is a combination of the Generalized Linear Model (GLM) and the Additive Model (AM), introduced by Trevor Hastie and Robert Tibshirani [7]. The GLM is a widely used prediction model in statistics and was introduced by John Nelder and Robert Wedderburn [15]. It combines systematic and random components: it has a linear predictor, a dependent variable, independent variables, a link function that converts the expected values to a linear scale, and an error structure[14]. GAM is a non-parametric regression model proposed by Friedman and Stuetzle [4]. Combining AM and GLM produced GAM, which generalizes the whole family of GLMs. The GAM has the following form[7]:

$$ g(\eta(x)) = \alpha + \sum_{j=1}^{p} f_j(x_j), \qquad \mathbb{E}[f_j(x_j)] = 0 \tag{1} $$

Chen et al. use GAM to predict the number of bikes at a particular bike station at a particular time in the future. To do so, they first investigated which elements influence the distribution of, and the demand for, bikes around Dublin. They chose the following parameters[2]:

1. Day type: whether it is a weekend day or a weekday
2. Time of Day: instants in the day, for example the day divided into twenty-four hours
3. Time of Year: which part of the year, season-wise
4. Weather: "Rainy", "Foggy" or "Normal"
5. Temperature: in degrees Celsius
6. Humidity: in %

written as:

$$ x_t = \big(x_t^{\text{DayType}},\, x_t^{\text{TimeOfDay}},\, x_t^{\text{TimeOfYear}},\, x_t^{\text{Weather}},\, x_t^{\text{Temperature}},\, x_t^{\text{Humidity}}\big) \tag{2} $$

Applying equation 1 to the parameters of Chen et al. gives:

$$
\begin{aligned}
y_t ={}& f_1\big(x_t^{\text{TimeOfDay}}\big)\cdot \mathbf{1}\big(x_t^{\text{DayType}} = \text{``Weekday''}\big) + f_2\big(x_t^{\text{TimeOfDay}}\big)\cdot \mathbf{1}\big(x_t^{\text{DayType}} = \text{``Weekend''}\big)\\
&+ f_3\big(x_t^{\text{TimeOfYear}}\big) + \beta_1 \cdot \mathbf{1}\big(x_t^{\text{Weather}} = \text{``rainy''}\big) + \beta_2 \cdot \mathbf{1}\big(x_t^{\text{Weather}} = \text{``foggy''}\big)\\
&+ f_4\big(x_t^{\text{Temperature}}, x_t^{\text{Humidity}}\big) + f_5(y_{t-1}) + f_6(y_{t-2}) + \epsilon_t
\end{aligned}
\tag{3}
$$

In GAM each of these elements is a function of its own, and the results are summed together with a term $\epsilon_t$ for the random effects that cannot be explained by the data. $f_5$ and $f_6$ are autoregressive terms that take into account the known number of available bikes at earlier time steps. For example, with timestamps every 15 minutes, $y_{t-1}$ is the number of bikes 15 minutes in the past and $y_{t-2}$ the number of bikes 30 minutes before the current timestamp[2].

Chen et al. made DayType binary in their equation: on a weekend day the Weekend indicator is 1 and the Weekday indicator is automatically 0, so the $f_1$ term is multiplied by 0 and has no influence on the equation (and vice versa for $f_2$ on a weekday). The weather conditions are multiplied by coefficients estimated from the data, $\beta_1$ and $\beta_2$.

They use their function in three different ways: for short-term, medium-term and long-term prediction. In the short term, which is about 5 minutes ahead, all variables are taken into consideration. For the medium term they do not use the Rainy/Foggy parameters, because these can change suddenly; humidity and temperature are fairly stable and can therefore be taken into account. Interesting for our case is that to predict more than 24 hours ahead (long term), all weather parameters are removed and the Historic Mean function is added[2].

3.1.2 Waiting Time
The second stage of TGAM predicts the waiting time when no bikes are available. It is activated when the prediction $\hat{y}_t$ of equation 3 is 0. Chen et al. calculate the waiting time in a few steps. First they take a period of time of length $d$ and count every bike returned within it ($n$). Then they calculate the return intensity:

$$ \hat{\lambda}(t) = \frac{n_{i(t)}}{d_{i(t)}} \tag{4} $$

where $i(t)$ indexes the period that contains $t$. As a last step they average $\hat{\lambda}(t)$ over the whole training set. In this way they can tell how many bikes are returned in a certain period of time. For example, if 10 bikes are returned in the 7th hour of a weekday, the average number of returns per minute in that hour is the number of bikes (10) divided by the length of the period in minutes (60), which gives an intensity of 0.167. So 1/6 of a bike is returned per minute, which means that on average a bike is returned every 6 minutes in that particular hour.
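The three steps of the waiting-time stage can be sketched in a few lines of Python. This is a minimal illustration, not Chen et al.'s code. It assumes hourly periods ($d$ = 60 minutes) and return timestamps given as `datetime` objects. Hours without any returns count as zero in the average. The function names are hypothetical. Turning $\hat{\lambda}$ into an expected wait of $1/\hat{\lambda}$ is our own simplifying assumption: it treats returns as a Poisson process.

```python
import math
from collections import defaultdict
from datetime import date, datetime


def hourly_return_intensity(return_times, days):
    """Estimate lambda-hat (returns per minute) for each hour of the day (equation 4).

    For every training day and hour, n is the number of returned bikes in that
    hour and d = 60 minutes. The estimate for an hour is the mean of n / d over
    all days in `days`, so hours without any returns count as zero. Returns on
    days that are not in `days` are ignored.
    """
    counts = defaultdict(int)
    for ts in return_times:
        counts[(ts.date(), ts.hour)] += 1
    return {
        hour: sum(counts[(day, hour)] / 60 for day in days) / len(days)
        for hour in range(24)
    }


def expected_wait_minutes(intensity):
    """Assumption: returns arrive as a Poisson process, so the mean wait is 1 / lambda."""
    return math.inf if intensity == 0 else 1 / intensity


# Hypothetical data mirroring the worked example: 10 returns between 07:00 and 08:00 on one weekday.
example_day = date(2017, 10, 2)
example_returns = [datetime(2017, 10, 2, 7, minute) for minute in range(0, 60, 6)]
lam = hourly_return_intensity(example_returns, [example_day])
wait = expected_wait_minutes(lam[7])  # about 6 minutes
```

To obtain separate weekday and weekend profiles, as in the example above, one would call the function with only the weekdays (or only the weekend days) of the training period.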

4. FEATURE REQUIREMENT ANALYSIS AND TGAM 2.0 DEVELOPMENT
To answer our research question, we first identify the differences between the DublinBikes and OV-fiets systems. This tells us which features of the Chen algorithm need to be altered. To apply the TGAM algorithm to our case study we need the right data. Equations 2 and 3 show that we need data on the number of available bikes, calendar data and weather data. For equation 4 we need data on the number of returned bikes. The data we received from the NS consists of two datasets. The first contains the pick-ups and returns, and the time between the two (the rental period), at Utrecht Centraal Station (UTC), one of the busiest train stations; we call this the UTC-dataset. It covers the period between 01.10.2017 and 30.11.2017. We call the second dataset the main dataset. It contains the number of bikes at every bike station for every 15 minutes; we received all updates from all stations between 15.03.2018 and 12.04.2018. NS uses this data to send updates to its app to inform travelers about the number of bikes.

We use the UTC-dataset to calculate the expected average waiting time and the main dataset to predict the number of bikes. To calculate the waiting time we combine the UTC-dataset with the dataset of all the stations: the UTC-dataset records every moment at which someone picked up or returned a bike, and the main dataset gives the number of bikes per 15 minutes at every station.

4.1 Key differences
There are some major differences between the Dublinbikes system (and the other BSS owned by JCDecaux) and the OV-fiets system. In the subsections below we review them in order to determine which adjustments should be made to the TGAM algorithm.

4.1.1 Number of bike stations
In the center of Dublin there are 44 bike stations, and Paris and Lyon have even more. In the Netherlands almost every train station has at least one bike station, and in some cases popular city spots have their own bike station. In Amsterdam, for example, there are 11 OV-fiets bike stations, one of which is located near the Paradiso pop-concert venue; all the others are located at train stations. A smaller city with fewer train stations automatically has fewer bike stations. The stations are scattered all around the country and are not as centralized as in BSS situations.

4.1.2 Rental time
Two elements have a significant influence on the rental time of a bike: the rules and the pricing. DublinBikes is designed to get people from one place in the city to another place in the city. OV-fiets is designed to get people from the train station to their destination and, later on, back to the train station. This means that while in Dublin the bikes are always at a station when they are not in use, in the Netherlands this is not always the case: sometimes they are at an office or at someone's home. Obviously this leads to longer rental times. The median rental time of OV-fiets bikes is 480 minutes, whereas for DublinBikes-like systems the median is 11 minutes.

The second, related element is pricing. DublinBikes, like the other JCDecaux systems, stimulates short rental times by 'giving' the first 30 minutes free of charge (apart from a one-off payment); renting longer than 30 minutes costs more. As described in Section 2.2, the OV-fiets charges per 24 hours. There is no incentive to bring a bike back before those 24 hours expire, which also explains the long rental time of 480 minutes.

4.1.3 Pick up & Return
For the OV-fiets an extra charge applies when the bike is not returned at the place where it was rented, in contrast to Dublinbikes, where one can drop off a bike at any of the 44 stations if a stand is available. As a result, most OV-fiets bikes are returned at the same place where they were rented. Noteworthy: the NS stations have no maximum number of slots for bikes. This is important because all of the predictive models mentioned try to predict, besides the availability of bikes, the availability of space to park a bike. Several studies also consider the influence of a station without parking space on the distribution at other stations. In the OV-fiets system this is not an issue, because there is, at least from the user's perspective, unlimited space to return a bike at a station.

4.1.4 Weekend
As described in Section 3.1.1, Chen et al. use two day types: weekend and weekday. Based on Section 5.1.2 we suggest using seven day types instead of two, namely the seven days of the week. Figure 1 shows that the weekdays behave quite similarly, except for Friday, which shows a drop in availability in the evening hours. Saturday also behaves very differently from Sunday. This difference is supported by the correlation plot in Figure 6, which shows that daynum has a higher correlation than weekday.

4.1.5 Size of stations
Because the stations are spread across the whole country, every station behaves differently. In the bigger cities there is much more activity than in smaller places; in Aalten, for example, there was no rental for over two weeks. Moreover, in contrast to the DublinBikes system, the mean number of bikes differs greatly per station: over all stations combined the mean is 25 bikes, while at Amsterdam Centraal West it is 378. We therefore assume that taking the station into account will reduce the prediction error.

4.1.6 Weather
As described in Section 5.1.3, we collected hourly weather data for the described period, because of the observed correlation between the number of available bikes and rainfall or low temperatures. This effect is partly confounded with the time of day: during the night temperatures are lower, but demand for bikes is also lower.

We assume that the predicted weather for the whole day has more effect on the decision to rent a bike than the weather at that particular moment. This follows from the long rental time of the bikes (a median of 480 minutes, i.e. 8 hours) and the fact that users have to bring the bike back to the same place. For example, when it is dry and pleasantly warm at 10 o'clock in the morning but rain is expected at the end of the day, the user is less likely to rent a bike anyway. We assume that this does not work the other way around: when the weather is bad in the morning but the forecast for the afternoon is good, a user will still not use the bike. Real-time conditions at the start of the rental period are relevant, but we assume that the predicted weather at the time the bike will be returned also matters at the start of the rental. Since the KNMI forecasts one day ahead very accurately, we took the general hourly weather of weather station De Bilt as our 'predicted weather'. De Bilt is a weather station near Utrecht and has historically been the station from which weather forecasts were communicated to the public via radio and TV [8]. Weather sentiment unconsciously plays a bigger role in choosing a bike than the actual weather[13].

4.2 The TGAM 2.0 algorithm
Applying the differences proposed in Section 4.1 to TGAM results in the following equation, which we call TGAM 2.0:

$$
\begin{aligned}
y_t ={}& f_1\big(x_t^{\text{TimeOfDay}}\big) + f_2\big(\beta_1 \cdot x_t^{\text{Dayofweek}}\big) + f_3\big(\beta_2 \cdot x_t^{\text{Holiday}}\big) + f_4\big(x_t^{\text{A-rainfall}}\big) + f_5\big(x_t^{\text{A-Fog}}\big)\\
&+ f_6\big(x_t^{\text{A-temperature}}, x_t^{\text{A-Humidity}}\big) + f_7\big(\beta_3 \cdot x_t^{\text{Station}}\big) + f_8(y_{t-1}) + f_9(y_{t-2}) + \epsilon_t
\end{aligned}
\tag{5}
$$

Here Dayofweek replaces the weekend and weekday functions of the TGAM model. A-rainfall, A-Fog, A-temperature and A-Humidity are the daily averages of these weather elements, since we use weather forecasts for a whole day. We also include the Station for which we want to predict the number of bikes, since the stations differ.

5. CASE STUDY
In this case study we apply the TGAM model developed by Chen et al., the TGAM 2.0 model with our adjustments, and Last Value to the NS main dataset. The aim is to find out whether an algorithm designed for BSS works in an environment with a large number of stations and a wide variety in the number of available bikes. We predicted 15 minutes ahead (the shortest term we can predict), 1 hour ahead and 24 hours ahead. The results are presented in Table 1.

5.1 Datasets
For predicting the number of available bikes and the waiting times we use the data provided by NS and the data of the KNMI.

5.1.1 Main Dataset
The main dataset consists of 1,137,024 observations covering 28 days and 423 stations throughout the Netherlands. Each observation carries a reliability index of 0, 50 or 100 percent. To obtain less noisy data we decided to ignore all observations with a reliability index of zero. After this cleaning we kept 778,520 observations distributed over 317 stations. The remaining 106 stations were completely unreliable during that period; some of them have almost the same name as a station with 100% reliability. We assume that, apart from some server problems, some stations appear under multiple names and the duplicate names are marked unreliable. Per station we have an average of 2,455 observations, which is close to the maximum possible number in the 28-day time frame: 96 · 28 = 2,688 observations.

We chose to divide the stations into two categories: big stations and small stations. The results in Table 1 show that the algorithm works best on stations with a relatively high number of bikes, because these see more activity than stations with fewer bikes. The table also shows that for stations smaller than Breda the error of Chen and Chen 2.0 is higher than that of LV, so for those stations they do not improve on the current situation. We therefore split the data at the average number of bikes at Breda, which is 61. Big stations thus have 61 bikes or more on average, and small stations 60 bikes or fewer. The majority of the stations, 235, are small stations; 26 are big stations. Those 26 stations hold 60% of all bikes in the OV-fiets system. This matches the NS traveler data, in which the 10% biggest stations handle 56% of all travelers[17].

5.1.2 Fluctuation, distribution and intensity
The busiest stations also have the highest intensity, as shown in Figures 1, 2, 3 and 4. Figure 1 shows the average number of available bikes of all the big stations together on a weekday. The x-axis represents the day divided into 24 hours and the y-axis shows the number of available bikes over the day. The figure shows that availability drops between 7 and 10 o'clock in the morning and rises again from 4 o'clock in the afternoon. Since a decrease in availability means an increase in rentals, most pick-ups take place in the morning and the bikes are returned in the afternoon.

Friday, represented by the purple line, differs somewhat: there are fewer rentals in the morning and more in the evening, probably in preparation for the nightlife.

At the big stations there are 25% fewer rentals during the weekend than on weekdays. Noteworthy is the difference between Saturday and Sunday, whereas the previously mentioned studies only made a firm distinction between weekend and weekday. This topic is discussed further in Section 4.1.4.

Figures 3 and 4 show that, on average, there is very little intensity at small stations. On the busiest day (Tuesday)

there was a difference of 3.5 bikes (32%) between the highest and lowest point, whereas at big stations the number of bikes changes by almost 100%.

Figure 1: Available bikes per weekday at the big (≥61) stations
Figure 2: Available bikes on Saturday & Sunday at the big stations per hour
Figure 3: Available bikes on Saturday & Sunday at the small stations per hour
Figure 4: Available bikes on weekdays at the small stations per hour

5.1.3 Data Royal Dutch Institute of Meteorology
Since the NS does not collect weather data itself and equation 3 requires weather data, we used the freely available data of the KNMI. Because we collected all the data in hindsight, we could not collect live weather data at a particular time as Chen et al. did. However, the KNMI stores hourly weather data that can be downloaded free of charge for research.

5.1.4 Weather data
In total we used 26 weather stations to obtain hourly data for all bike stations across the Netherlands. We collected the temperature (in units of 0.1 °C), the relative humidity, and whether it rained in that hour (1 or 0). Since the bike updates are timestamped every 15 minutes, every four consecutive updates share the same weather conditions.

To obtain a 'predicted weather' in line with our assumption in Section 4.1.6, we used the average daily weather at weather station De Bilt for each day.

Does the weather really influence the number of rentals on a day? The correlation plot (Figure 6) shows some correlation, and Figure 5 shows what that correlation looks like when plotted. Temperature is given in 0.1 °C, and the lines show the influence of rain on the demand for bikes. The shaded band around each line shows how uncertain the fitted line is. Uncertainty is high where there is not enough data, usually at the edges of the graph, where the extreme conditions are represented; for example, there were few situations with rain at −5 °C. With that in mind, Figure 5 shows that in general more bikes are rented when it is not raining, and that higher temperatures lead to an increase in bike availability.

Figure 5: Available bikes versus temperature

5.2 Setup
Unfortunately, Chen et al. were not permitted to share their algorithm, so we had to rebuild it on the basis of their publication. TGAM and TGAM 2.0 were implemented in R using RStudio; R is widely used for applied machine learning.
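The case study uses Last Value as its baseline. The three simple predictors from Section 3.1 (LV, HM and HT) are easy to rebuild. The Python sketch below is an illustration only, not the R code used in this thesis. It assumes that the history is stored as a dictionary from a 15-minute slot index to the counts observed in that slot on earlier days. That layout, and the function names, are our own choices for the example.

```python
from statistics import mean


def last_value(current_count):
    """LV: predict that the count stays what it is now."""
    return current_count


def historic_mean(history, slot):
    """HM: average count observed in the past at the target time slot."""
    return mean(history[slot])


def historic_trend(history, current_count, now_slot, target_slot):
    """HT: LV plus the historic change between the request slot and the target slot."""
    return current_count + historic_mean(history, target_slot) - historic_mean(history, now_slot)


# Hypothetical history for one station: slot 32 = 08:00, slot 36 = 09:00 (about one hour ahead).
history = {32: [20, 24], 36: [10, 14]}
lv_prediction = last_value(25)
hm_prediction = historic_mean(history, 36)
ht_prediction = historic_trend(history, 25, now_slot=32, target_slot=36)
```

HT follows the definition in Section 3.1 literally. It can therefore produce a negative count when the historic drop exceeds the current value, so a practical system would probably clip the result at zero.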

Figure 6: Correlation of different parameters, which shows, for example, that taking every day separately (daynum) is more strongly correlated than weekend vs. weekday (weekday) [18].

We used a variety of packages, but the most relevant one is the mgcv package developed by Simon N. Wood. We used this package to predict the number of bikes with a GAM in R. Thanks to its thorough documentation [20], we were able to recreate the TGAM of Chen et al. and adjust its parameters according to the key differences described in Section 4.1. We then applied the algorithms to the data to obtain predictions. To evaluate the predictions, we compare the predicted values with the test set and examine the errors.

5.2.1 Test set & Training set
We divided the main dataset into a training set and a test set, with 75% training data and 25% test data. In dates, 15-03-2018 until 04-04-2018 is training data and 04-04-2018 until 11-04-2018 is test data. The average number of available bikes is slightly lower in the test set than in the training set. This applies to all stations, so the proportions stay the same.
Since there are 317 stations, we did not test them all. Instead, we chose a few stations as representatives of groups of stations with the same characteristics. We grouped them by two characteristics: the size of the station and whether it is located in an urban or non-urban area¹⁰. The size of a station and whether it is urban often correlate. Schiedam Centrum is an exception: the city of Schiedam has grown together with Rotterdam, but the station has a small number of bikes, which makes it interesting to test.

¹⁰ In the Netherlands the urban agglomeration is known as de Randstad. It consists of cities such as Amsterdam, Den Haag and Rotterdam, located in the western part of the country.

To predict the waiting time, we combine the UTC dataset and the main dataset. The UTC dataset lets us predict the average waiting time at UTC, and the main dataset tells us whether there were exactly zero bikes at the bike station.

5.2.2 Evaluation
The Root Mean Squared Error (RMSE) is widely used to measure the accuracy of a model, and bike prediction is no exception. Froehlich et al. [5], Kaltenbrunner et al. [10], Yoon et al. [21] and Chen et al. [2] all use the RMSE to evaluate their own models and to compare them with others. To get a fair picture of the performance of TGAM on our data, we therefore use the RMSE as our performance measure too. The RMSE is not uncontroversial, because it is sensitive to outliers [6][9]. We still prefer it because of its scale dependence and its theoretical relevance. We chose the Normalized RMSE (NRMSE) because we test the models on stations of different sizes. A normalized error gives an immediate view of the results per station and makes the differences between stations comparable in relative terms.
The error is calculated as

$$e_i = p_i - o_i \tag{6}$$

where e is the error, p is the predicted value and o is the observed value. The RMSE is then

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (p_i - o_i)^2} \tag{7}$$

where n is the number of instances. Normalization is done by dividing the RMSE by the mean $\bar{y}$ of the variable y from Equation 3 [19]:

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\bar{y}} \tag{8}$$

5.2.3 Kolmogorov-Smirnov test
Sometimes the errors of two models are very close, so we cannot directly see whether the difference is significant. We therefore compare the algorithms with the Kolmogorov-Smirnov (KS) test. The null hypothesis is that the errors of Algorithm 1 ($A_x$) and Algorithm 2 ($A_y$) follow the same distribution ($H_0: A_x = A_y$). The alternative hypothesis is that they do not ($H_1: A_x \neq A_y$). Depending on the p-value, $H_0$ is either rejected or not [12]. Since we compare three algorithms, we test every pair of algorithms. We calculate the p-value with the R package stats [16]. For $A_x$ and $A_y$ we use the errors of both algorithms on the test set of each station, with a significance level α of 0.005. For example, the absolute errors of the TGAM and LV predictions on the Amsterdam Centraal test set are tested against each other.
We tested the models on different stations with different prediction windows.

5.3 Findings
5.3.1 Predicting amount of bikes
When predicting 15 minutes ahead, TGAM and TGAM 2.0 perform significantly better than Last Value (LV) at the bigger stations. For Schiedam Centrum, LV outperforms the other two algorithms. The difference between TGAM and TGAM 2.0 is small, but the KS test gives a p-value below 0.005 for Haarlem and Breda, so $H_0$ can be rejected there. For Amsterdam Centraal and Rotterdam Centraal the p-value is too high to reject $H_0$. TGAM 2.0 reaches a lower error in four cases, and in two of them it is significantly better when predicting 15 minutes ahead.
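The evaluation procedure of Sections 5.2.1 and 5.2.2 can be sketched in a few lines of Python. This is a minimal sketch, not the code used in this thesis, which was written in R. It assumes that one station's bike counts are available as a time-ordered list of 15-minute snapshots. The sketch splits that list chronologically 75/25 and scores a prediction with Equations (6) to (8). The helper `last_value_forecast` is a hypothetical stand-in for the Last Value baseline: it is assumed to repeat the count observed one prediction window earlier. The mean $\bar{y}$ in Equation (8) is assumed to be the mean of the observed test values.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float],
                        train_fraction: float = 0.75) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series without shuffling, so that every test
    observation lies after every training observation."""
    cut = int(len(series) * train_fraction)
    return list(series[:cut]), list(series[cut:])


def last_value_forecast(series: Sequence[float], start: int, horizon: int) -> List[float]:
    """Hypothetical Last Value (LV) baseline: predict series[t] with series[t - horizon].
    With 15-minute snapshots, horizon=1 means 15 minutes ahead and horizon=4 one hour."""
    if start < horizon:
        raise ValueError("start must be at least horizon")
    return [series[t - horizon] for t in range(start, len(series))]


def errors(predicted: Sequence[float], observed: Sequence[float]) -> List[float]:
    """Equation (6): e_i = p_i - o_i."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return [p - o for p, o in zip(predicted, observed)]


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Equation (7): square root of the mean of the squared errors."""
    e = errors(predicted, observed)
    return math.sqrt(sum(x * x for x in e) / len(e))


def nrmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Equation (8): RMSE divided by the mean of the observed values
    (assumption: y-bar is taken over the same observations)."""
    return rmse(predicted, observed) / (sum(observed) / len(observed))


if __name__ == "__main__":
    # Made-up bike counts at 15-minute intervals, not thesis data.
    bikes = [40, 42, 41, 39, 38, 36, 37, 35]
    train, test = chronological_split(bikes)  # 6 training points, 2 test points
    lv_pred = last_value_forecast(bikes, start=len(train), horizon=1)
    print(train, test, lv_pred, round(nrmse(lv_pred, test), 4))
```

The split is chronological rather than random, so the model is never trained on observations that come after the ones it is tested on. This mirrors the date-based division described in Section 5.2.1.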

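The significance testing of Section 5.2.3 can be illustrated with a self-contained two-sample Kolmogorov-Smirnov test. The thesis used the R package stats for this. The Python sketch below is an independent illustration, not that implementation. It computes D, the largest gap between the empirical distribution functions of two error samples. It then derives an asymptotic p-value using the small-sample correction known from Numerical Recipes. The function names and the default α = 0.005 are illustrative choices. R's ks.test and scipy.stats.ks_2samp may compute exact p-values for small samples, and ties in the errors make any KS p-value approximate. The numbers from this sketch can therefore differ slightly from theirs.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest absolute gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n_a, n_b = len(a_sorted), len(b_sorted)
    d = 0.0
    # The empirical CDFs only jump at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        f_a = bisect_right(a_sorted, x) / n_a
        f_b = bisect_right(b_sorted, x) / n_b
        d = max(d, abs(f_a - f_b))
    return d


def kolmogorov_tail(lam: float, terms: int = 100) -> float:
    """Asymptotic tail Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        # Here Q(lam) is within 1e-12 of 1, and the alternating series converges slowly.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (D, approximate p-value) using the effective size n_a*n_b/(n_a+n_b)."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = ks_statistic(a, b)
    root = math.sqrt(len(a) * len(b) / (len(a) + len(b)))
    lam = (root + 0.12 + 0.11 / root) * d
    return d, kolmogorov_tail(lam)


def significantly_different(errors_x: Sequence[float], errors_y: Sequence[float],
                            alpha: float = 0.005) -> bool:
    """Reject H0 (both error samples share one distribution) when p < alpha."""
    _, p = ks_two_sample(errors_x, errors_y)
    return p < alpha


if __name__ == "__main__":
    # Made-up absolute errors of two algorithms on one station's test set.
    tgam_abs_err = [0.5, 1.0, 0.8, 1.2, 0.3, 0.9, 1.1, 0.7]
    lv_abs_err = [1.5, 2.0, 1.8, 2.2, 1.3, 1.9, 2.1, 1.7]
    print(ks_two_sample(tgam_abs_err, lv_abs_err))
```

To compare three algorithms as in Section 5.2.3, one would call `significantly_different` once per pair for each station: TGAM vs. TGAM 2.0, TGAM vs. LV, and TGAM 2.0 vs. LV.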
Figure 7: Predicted values plotted against actual values for Breda Centrum with an NRMSE of 0.077 (TGAM 2.0).

Figure 7 illustrates the difference between the TGAM 2.0 predictions and the test set at an NRMSE of 0.077.
When predicting 1 hour ahead, the difference between TGAM and LV becomes very small, but TGAM is still significantly better at the bigger stations. At the smaller stations the TGAM algorithms perform much worse than Last Value. Notice the performance at Schiedam. Although it is a small station, TGAM 2.0 has the lowest error there, probably because of its higher intensity of use. Keep in mind that the actual difference in error is about 1/10th of a bike, so in practice the predictions do not differ. The low number of bikes is also reflected in the high errors of the small stations.
When predicting 24 hours ahead, we add the historic mean per timestamp to the TGAM 2.0 model to improve the results. The TGAM model uses the overall mean of all stations, as described by Chen et al. Instead of comparing the models with LV, we compare them with the historic mean of the predicted time. The historic mean of all stations together is 18 bikes. That is why TGAM outperforms TGAM 2.0 at Almere Centrum (20 bikes on average). At the other stations TGAM performs much worse and its error rates are very high. TGAM 2.0 performs better because it takes the historic mean of the station into account. We see the same pattern as with the 15-minute and 1-hour predictions. At the big stations TGAM 2.0 performs best, but at smaller stations (with less activity) the static models have smaller errors. Still, predicting 24 hours ahead gives very large errors even for the big stations, so one may consider not offering that prediction at all.
We therefore conclude that our adjustments have a small positive effect on the performance of the TGAM algorithm. However, the algorithm is only usefully applicable at the busiest stations, represented by Amsterdam Centraal, Rotterdam Centraal, Haarlem and Breda.

5.3.2 Waiting times
To predict the waiting time when no bike is available, we first have to predict whether a bike station has zero bikes available. We use the same variables as in the initial algorithm, but we change the response ŷ from the number of bikes to the binary variable Emptyslot, which is either empty (1) or not (0).
In 3.27% of the observations a station had exactly zero bikes. This is not a lot, but given the growing popularity of the OV-fiets this share will increase, which makes finding a solution for this problem relevant. When predicting 15 minutes ahead, our algorithm correctly predicted in 97.22% of the cases whether a bike station was empty or not. In 2.57% of the cases it predicted a positive number of bikes when there were none. This means that predicting an empty station makes no sense for the relevant stations, those with more than 30,000 visits a day and more than 60 bikes on average. Tests show that for these stations the algorithm always predicts more than zero bikes. There are almost no cases where this is incorrect, so the accuracy is very high. Still, we cannot tell whether the classifier is genuinely accurate. That requires more data with situations where the number of available bikes is zero.
Equation 4 applied to the UTC data predicts that on weekdays the expected waiting time is highest between 10:00 and 14:00, with a maximum of 10 minutes. On weekend days the expected waiting time is much higher in the morning. Figures 8 and 9 show the expected waiting times throughout the day. Keep in mind that Equation 4 calculates the waiting time given that there are zero bikes. At night, for example, this waiting time is at its highest, but there are always enough bikes, so the actual waiting time will be zero. That is why we left out the night hours in Figures 8 and 9.

Figure 8: Expected waiting time at UTC on weekdays between 07:00 and 23:59 if there is no bike available.

Figure 9: Expected waiting time at UTC on weekend days if there is no bike available.

5.3.3 Recommendation
In light of these results, we recommend that NS use the TGAM 2.0 algorithm for the 25 busiest stations. For the other stations NS should keep the 15-minute update. In

Table 1: NRMSE of the TGAM, TGAM 2.0 and LV algorithms per prediction window for each selected station

| Station (avg. bikes) | TGAM, 15 min | TGAM 2.0, 15 min | LV, 15 min | TGAM, 1 h | TGAM 2.0, 1 h | LV, 1 h | TGAM, 24 h | TGAM 2.0, 24 h | LV, 24 h |
|---|---|---|---|---|---|---|---|---|---|
| Amsterdam Centraal (327) | 0.031 | 0.030 | 0.040 | 0.144 | 0.141 | 0.143 | 1.09 | 0.43 | 0.57 |
| Rotterdam Centraal (206) | 0.040 | 0.040 | 0.043 | 0.118 | 0.118 | 0.122 | 1.05 | 0.48 | 0.48 |
| Haarlem (102) | 0.034 | 0.033 | 0.040 | 0.120 | 0.116 | 0.130 | 0.877 | 0.166 | 0.172 |
| Breda (39) | 0.082 | 0.077 | 0.080 | 0.234 | 0.238 | 0.256 | 0.910 | 0.964 | 0.658 |
| Almere Centrum (20) | 0.035 | 0.036 | 0.033 | 0.111 | 0.087 | 0.084 | 0.111 | 0.087 | 0.084 |
| Schiedam (9) | 0.088 | 0.090 | 0.081 | 0.207 | 0.198 | 0.209 | 1.15 | 0.55 | 0.51 |
| Borne (6) | 0.091 | 0.073 | 0.052 | 0.345 | 0.31 | 0.083 | 0.091 | 0.073 | 0.052 |
| Kerkrade (4) | 0.145 | 0.121 | 0.034 | 0.726 | 0.457 | 0.071 | 0.145 | 0.121 | 0.034 |
| Aalten (3) | 0.144 | 0.135 | 0.017 | 0.485 | 0.531 | 0.034 | 5.54 | 0.387 | 0.044 |

order to build trust, we recommend presenting the accuracy of the prediction, for example as a range of available bikes. In the case of Amsterdam Centraal, where the NRMSE is 0.030, one can say that a predicted availability of 100 bikes means that the actual number will typically lie between 97 and 103 bikes.
Regarding the prediction of empty bike stations, we strongly recommend further research, since more empty stations are expected to occur. Empty bike stations do not have to be a problem: we have shown that the maximum expected waiting time at Utrecht Centraal on weekdays is 10 minutes. As long as travelers are informed about the waiting time, they can include it in their planning, just like the schedule of a tram or a bus. As a result, NS can keep travelers within its ecosystem and offer them a true door-to-door experience.

6. DISCUSSION
Despite the good results for the twenty-five biggest stations, there is still room for improvement, especially regarding the data. For instance, future research would benefit from live data from the NS database on when a bike is rented or returned. It would also benefit from data on how many bikes are at the stations, covering a whole year instead of just two months. This would reveal much more about the correlation between the seasons and bike use. Due to complications and data-security reasons, we were not able to combine the rental data with the traveler data of the different stations. We believe this would be an interesting variable to investigate in future research.
To obtain even more accurate data, it would be better to track the distribution of the bikes while collecting live weather data. Weather data collected in hindsight causes discrepancies.

7. CONCLUSION & FUTURE WORK
The intention of this research was to find out whether the model presented by Chen et al. [2] is applicable to a system like the NS OV-fiets system and, if so, whether we can improve the model further. The goal is to give the OV-fiets the same status as the tram or bus: a reliable part of public transport. We recreated the TGAM model and applied it to the data provided by the NS. After a thorough data analysis we also adjusted the TGAM model. The analysis led to new important variables that improved the model, although the increase in performance was small.
The case study shows that at the biggest stations the TGAM model significantly outperforms the current situation, the Last Value, especially when predicting 15 minutes ahead. We can therefore say that TGAM is applicable to the NS OV-fiets system and is improved by the adjustments made in response to the data. This is not the case for the small stations with less activity, and for those stations we have to conclude that the TGAM model is not applicable. Still, the 26 biggest stations handle 56% of all NS travelers. We can therefore conclude that at the busiest stations the OV-fiets predictions of TGAM 2.0 can be relied on, just as one relies on the bus or even the train itself. It is therefore worthwhile for NS to apply TGAM 2.0 and inform travelers about the predicted number of bikes.
Apart from predicting bike availability, it is worth investigating other ways to improve the OV-fiets system. One option is tracking the bikes, so that one can predict when someone will return a bike. Another is giving incentives to speed up the return of bikes.
A logical next step is to implement these results in the existing NS application, study how to present them to the user, and find out whether this addition improves the user experience.

Acknowledgements
This research project would not have been possible without the help of many others. In particular, I would like to thank Frank Nack for his guidance, his help and his intelligent approaches throughout the writing of this thesis. I would also like to thank NS, and Rob Sluijsman in particular, for their cooperation in providing the data required for my research question.

8. REFERENCES
[1] P. Borgnat, P. Abry, P. Flandrin, C. Robardet, J.-B. Rouquier, and E. Fleury. Shared bicycles in a city: A signal processing and data analysis perspective. Advances in Complex Systems, 14(03):415–438, 2011.
[2] B. Chen, F. Pinelli, M. Sinn, A. Botea, and F. Calabrese. Uncertainty in urban mobility: Predicting waiting times for shared bicycles and parking lots. In 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), pages 53–58. IEEE, 2013.
[3] P. DeMaio. Bike-sharing: History, impacts, models of provision, and future. Journal of Public Transportation, 12(4):3, 2009.
[4] J. H. Friedman and W. Stuetzle. Projection pursuit regression. Journal of the American Statistical Association, 76(376):817–823, 1981.
[5] J. Froehlich, J. Neumann, N. Oliver, et al. Sensing and predicting the pulse of the city through shared bicycling. In IJCAI, volume 9, pages 1420–1426, 2009.
[6] N. Gast, G. Massonnet, D. Reijsbergen, and M. Tribastone. Probabilistic forecasts of bike-sharing systems for journey planning. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 703–712. ACM, 2015.
[7] T. Hastie and R. Tibshirani. Generalized additive models. Wiley Online Library, 1990.
[8] R. Herber and T. Langerveld. KNMI: 150 jaar onderzoek naar het weer en meer. De Biltse Grift: tijdschrift van Historische Kring d'Oude School, ISSN 0928-639X, 13(4):98–108, 2004.
[9] R. J. Hyndman and A. B. Koehler. Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4):679–688, 2006.
[10] A. Kaltenbrunner, R. Meza, J. Grivolla, J. Codina, and R. Banchs. Urban cycles and mobility patterns: Exploring and predicting trends in a bicycle-based public transport system. Pervasive and Mobile Computing, 6(4):455–466, 2010.
[11] Y. Li, Y. Zheng, H. Zhang, and L. Chen. Traffic prediction in a bike-sharing system. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, page 33. ACM, 2015.
[12] F. J. Massey Jr. The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46(253):68–78, 1951.
[13] M. Meng, J. Zhang, Y. Wong, and P. Au. Effect of weather conditions and weather forecast on cycling travel behavior in Singapore. International Journal of Sustainable Transportation, 10(9):773–780, 2016.
[14] J. A. Nelder and R. J. Baker. Generalized linear models. Wiley Online Library, 1972.
[15] J. A. Nelder and R. W. M. Wedderburn. Generalized linear models. Journal of the Royal Statistical Society. Series A (General), 135(3):370–384, 1972.
[16] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2017.
[17] T. Reiziger. Lijst in- en uitstappers per station van groot naar klein, 2017. Accessed June 6, 2018.
[18] T. Wei and V. Simko. R package "corrplot": Visualization of a Correlation Matrix, 2017. (Version 0.84).
[19] C. J. Willmott, S. G. Ackleson, R. E. Davis, J. J. Feddema, K. M. Klink, D. R. Legates, J. O'Donnell, and C. M. Rowe. Statistics for the evaluation and comparison of models. Journal of Geophysical Research: Oceans, 90(C5):8995–9005, 1985.
[20] S. N. Wood. Generalized additive models: an introduction with R. Chapman and Hall/CRC, 2006.
[21] J. W. Yoon, F. Pinelli, and F. Calabrese. Cityride: a predictive bike sharing journey advisor. In 2012 IEEE 13th International Conference on Mobile Data Management (MDM), pages 306–311. IEEE, 2012.
