Cluster-Based Destination Prediction in Bike Sharing System

Pengcheng Dai, Changxiong Song, Huiping Lin, Pei Jia, Zhipeng Xu
School of Software and Microelectronics, Peking University, Beijing, China

ABSTRACT

Destination prediction not only helps to understand users' behavior, but also provides basic information for destination-related customized services. This paper studies destination prediction in the public bike sharing system, which is now blooming in many cities as an environment-friendly short-distance transportation solution. Due to the large number of bike stations (e.g., more than 800 stations of Citi Bike in New York City), the accuracy and effectiveness of destination prediction become a problem, and clustering algorithms are often used to reduce the number of destinations. However, grouping bike stations only by their location is not effective enough. The contribution of this paper lies in two aspects: 1) we propose a Compound Station Clustering method that considers not only the geographic location but also the usage pattern of stations; 2) we provide a framework that uses feature models and corresponding labels in machine learning algorithms to predict the destination of on-going trips. Experiments are conducted on real-world data sets of Citi Bike in New York City through the year of 2017, and the results show that our method outperforms baselines in accuracy.

CCS Concepts
• Information systems ➝ Clustering • Applied computing ➝ Forecasting • Applied computing ➝ Transportation

Keywords
Destination Prediction; Machine Learning; Clustering; Bike Sharing System

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
AICCC '18, December 21-23, 2018, Tokyo, Japan
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6623-6/18/12...$15.00
DOI: https://doi.org/10.1145/3299819.3299826

1. INTRODUCTION

The bike sharing system provides a new and green public transportation to fill the "last mile" gap in urban areas, such as Citi Bike in New York City, Vélib' in Paris, and similar systems in Beijing, etc. [1]. In a bike sharing system with stations (docks to hold the bikes), a user can pick up a bike from a nearby bike station and drop it off at the one close to his destination. Because bikes are continuously and arbitrarily moved by users from station to station, one of the most common problems users encounter is finding an available bike or parking slot [2]. To improve the availability of bike sharing systems, researchers have studied demand prediction, which predicts the number of available bikes and docks [2, 3], and rebalancing strategies to efficiently reallocate bikes among stations [4]. In addition to demand prediction and system rebalancing, destination prediction, which concerns the user's behavior and the trend of trip flows, is another aspect worth studying.

Destination prediction is to predict a user's destination when he picks up a bike from an origin station, given information such as the location of the origin station, the environment context and the user information. Compared with demand prediction, which concentrates on the available bikes and the number of trips of bike stations, destination prediction provides a perspective to study users' behavior patterns and bicycle traffic flow trends in bike sharing systems. Therefore, system administrators can take measures in advance to provide more efficient and accurate services based on the result of destination prediction. On the one hand, destination prediction can be applied to guide users to find the best destination station to save energy and time [5], as well as to other destination-based services. On the other hand, it helps to rebalance bikes by incentivizing users to pick up or drop off bikes at proper bike stations [6].

Usually, destination prediction in bike sharing systems is abstracted as a multi-classification problem. For example, Bacciu et al. [7] used machine learning methodologies to build multi-classification models to infer the destination station for a small bike sharing system with 15 bike stations. Zhang et al. [8] simplified this problem to binary classification and proposed a model to predict whether a trip between a candidate pair of stations will happen. However, destination prediction still faces challenges with the rapid expansion of bike sharing systems:

• The large number of destination stations. Modern bike sharing systems usually have hundreds of bike stations, which makes it difficult to predict the destination station directly.

For example, Citi Bike in New York City consists of more than 800 bike stations, and the average number of historical destination stations of an origin station is more than 300, as shown in Figure 1(a). Clustering bike stations to reduce the number of destinations is one solution. Clustering techniques are applied in the process of destination extraction from raw GPS traces to make destination predictions for a person or a taxi [9, 10], and clustering of stations is also a widely used method in bike sharing systems, based on the observation that a cluster of stations is more regular and steadier to predict than an individual station [3, 11].

• Clustering of bike stations. Clustering intends to group a set of objects so that objects in the same group are more similar to each other than to objects in other groups [12], and the intuitive way to measure the similarity between bike stations is geographic location [4]. However, grouping stations which are only similar in location might not result in effective clusters for destination prediction, since bike stations with similar patterns are not necessarily close in location. As shown in Figure 1(b), the bike stations around New York Penn Station are all in green status (full of bikes), while the stations along Broadway are almost all in red status (lacking bikes), and neither group is close together in location. In addition to geographic location, the usage pattern of stations is therefore an influential factor worth considering when measuring the similarity between bike stations [2, 13].

Figure 1: (a) Distribution of Origin Stations with Different Historical Destination Stations. (b) Status of Several Bike Stations at Evening Rush Hour on a Weekday.

• Trade-off between performance and practicality. There is also a trade-off between performance and practicality when applying clustering techniques to destination prediction in bike sharing systems. After all, making every bike station its own cluster makes it difficult to achieve an effective result due to too many potential destination stations, while putting all bike stations into one cluster yields an absolutely 100% accurate yet meaningless result with only one destination cluster. Therefore, a criterion to measure the range of clusters and to decide the number of destination clusters is significant.

In this paper, we propose a cluster-based framework to make destination predictions for bike sharing systems. First, we extract station data from Citi Bike for destination station clustering, which groups destination stations that are similar in both geographic location and usage pattern into destination clusters. Then we construct feature models from the extracted trip data, and corresponding labels by relabeling destination stations with the destination station clustering result. Finally, we apply several machine learning algorithms for multi-classification to make predictions and evaluate their prediction performance to select the best one for our framework. Major contributions of this paper include:

• We propose a cluster-based framework using destination station clustering and multi-classification algorithms to directly and accurately predict the destination of on-going trips, which can be applied in large bike sharing systems with hundreds of stations.

• We propose a Compound Station Clustering method to group bike stations which are similar in both geographic location and usage pattern by measuring the similarity of geographic coordinates and inflow vectors (detailed procedure in Section 5.2).

• We conduct experiments of the proposed framework on one-year real-world data from Citi Bike in New York City, and the results outperform baselines in both performance and practicality.

The rest of this paper is organized as follows. In Section 2 we review related work. In Section 3, we introduce preliminaries and formally define the destination prediction problem. In Section 4 we describe the data sets used in this study and list their sources. In Section 5, we present the overall proposed framework and illustrate its main phases in detail. In Section 6, experiments are conducted to evaluate our framework and their results are analyzed. Finally, Section 7 concludes this paper with future work.

2. RELATED WORK

Bike sharing systems. Bike sharing programs have received increasing attention in recent years and raise many issues to be studied [14]. Earlier work mainly focused on predicting the number of available bikes and docks, also called demand prediction [2, 3, 15, 16]. For example, Wang [15] built a machine learning model that combines historical usage patterns with related influential factors like user information, meteorology and weekends to forecast bike rental demand in a future period. Zeng et al. [16] proposed a model to improve bike demand prediction by developing more helpful global features through data analysis and machine learning algorithms. Based on this, rebalancing approaches that find the best way to redistribute bikes have also been studied as an applied issue for bike sharing systems [4, 6, 17]. Liu et al. [4] formulated the rebalance issue as a mixed integer nonlinear program whose objective is to minimize the total travel distance, and Singla et al. [6] presented a crowdsourcing mechanism that incentivizes users to help rebalance bikes by guiding them to proper bike stations to pick up or drop off bikes. Besides, destination prediction is a novel issue in bike sharing systems, with several studies limited in problem formulation or data scope [7, 8]: Zhang et al. [8] simplified the issue to a binary classification problem, while Bacciu et al. [7] built multi-classification models for a small bike sharing system with only 15 stations.

Destination prediction. Previous studies on destination prediction mainly focus on persons or taxis, to provide location-based services, destination-based advertising [18] or route recommendation [19]. Since location information can be collected from a person's mobile phone or a taxi's GPS device, historical trajectories are the most common and effective dataset for

destination prediction. Based on trajectories, various probabilistic models, especially those based on Markov chains, are trained to make predictions [20]. Besides, identities of users are also used to make personalized destination predictions for higher performance [21]. However, the "disposable" usage (a short trip directly from origin station to destination station on an arbitrary bike) and the absence of user identifiers in the datasets make bike sharing systems not completely suitable for the aforementioned destination prediction methods.

Clustering in bike sharing systems. When studying these issues in bike sharing systems, there are two perspectives to choose from, i.e., station-level and cluster-level. Compared with the station level, studying at the cluster level can sometimes simplify issues by grouping bike stations into fewer clusters. Therefore, clustering of bike stations is widely used in studies of bike sharing systems [3, 4, 13]. For example, Liu et al. [4] proposed constrained k-centers clustering according to geographic information as well as inventory capacity, with the rebalance problem among stations solved within each cluster and across consecutive clusters. Chen et al. [3] argued that the bike usage pattern of a station is dynamic and context-dependent, so they proposed a method that constructs a weighted network to model the relationship among bike stations and dynamically clusters neighboring stations according to the context; bike stations in the same cluster thus have similar usage patterns, and the cluster can generally represent its stations in over-demand prediction. Etienne et al. [13] also provided a model-based solution which clusters bike stations with similar usage patterns, such as stations around bus or train stations, and then analyzes the data at the cluster level. To deal with destination prediction in bike sharing systems, we propose a clustering algorithm which comprehensively integrates geographic location and usage pattern to group destination stations into destination clusters.

3. PROBLEM FORMULATION

In this section, we introduce several definitions using mathematical notations and then formulate the destination prediction problem to be studied.

• Definition 1: Bike Trip. A bike trip TR(U, S_o, T_o, S_d, T_d) denotes that a user U picks up a bike at an origin station S_o at time T_o and drops off this bike at a destination station S_d at time T_d to finish a complete bike trip.

• Definition 2: Destination Cluster. Suppose that the number of destination stations in the bike sharing system is n and we cluster the stations {S_1, S_2, S_3, ..., S_n} into k (k ≤ n) clusters {C_1, C_2, C_3, ..., C_k}; then each destination cluster C contains several destination stations S which are similar. In other words, several destination stations S_d are mapped to one destination cluster C_d. Specially, when k = n, each destination cluster contains only one station, i.e., each station is exactly a destination cluster.

• Definition 3: Max Distance of Inner Stations (MDIS). Suppose cluster C consists of stations {S_a, S_b, S_c, ..., S_l} ∈ C; then the max distance of inner stations of cluster C is defined as max{distance(S_i, S_j) | S_i, S_j ∈ C}. The MDIS of a cluster is used to measure the range of this cluster.

• Definition 4: Feature Model and Label. Let a vector X(S_o, T_o, U) = [x_1, x_2, x_3, ..., x_m] ∈ R^m denote the features extracted from the origin station S_o, the departure time T_o, the user U and other related data of an on-going trip. The vector X is then the feature model of the on-going trip, which contains features like the location of the origin station, the hour of the departure time, user information and so on (the explicit feature list of a feature model is in Section 5.3). Besides, Y(S_o, T_o, U) = C_d ∈ C is the corresponding label of feature model X(S_o, T_o, U), representing the destination cluster of this on-going trip. The mapping from destination station S_d to destination cluster C_d for label Y is based on the result of destination station clustering.

• Problem Formulation: Destination Prediction. Given the feature model set {X_1, X_2, X_3, ..., X_n} and the corresponding label set {Y_1, Y_2, Y_3, ..., Y_n} as the training set extracted from trip data and other related data, we aim to find a hypothesis model H learned by machine learning algorithms for multi-classification, so that for a new feature model X_i extracted from an on-going trip we can predict its destination cluster Y_i = H(X_i) before this trip completes.

4. DATA DESCRIPTION

In this section, we describe the data sets used in our paper and list their sources. To take as many influential factors into consideration as possible, we collect trip data and station data from Citi Bike in NYC, meteorology data of NYC and some other contextual data like holidays, through the year of 2017. A summary of the data sets is in Table 1, and a detailed description follows:

Table 1: Summary of Data Sets
  Data Source:  New York City
  Time Span:    2017-01-01 ~ 2017-12-31
  Citi Bike:    Trips 16,365,296 | Origin Stations 811 | Destination Stations 847
  Holiday:      Weekdays / Weekends & Holidays 258/98 d | Records 24 h × 356 d
  Meteorology:  Temperature [-12.8, 34.4] ℃ | Sunny/Fog/Rain/Snow Hours 8162/55/407/52 h

• Citi Bike: Citi Bike in NYC is the largest public bike sharing system in the USA, with more than 12,000 bikes and 800 bike stations. Downloadable files of trip data have been published monthly since July 2013 [22] for public analysis, development and visualization, including trip duration, start time, stop time, start station, stop station, station ID, station latitude and longitude, bike ID, user type, gender and year of birth. Trip data is the main data set; it contains potential knowledge about the bike sharing system and the behavior patterns of users, to be discovered for destination prediction.

• Meteorology: Meteorology is the most significant environmental factor for the usage of shared bikes; for example, fewer users choose to cycle in rain or other bad weather. Therefore, the hourly meteorology data of NYC [23], including weather (e.g., sunny, fog, rain, snow), temperature, humidity and wind speed, is necessary for destination prediction.

• Holiday: Federal and state holidays of NYC [24] and regular weekends are also important influential factors for destination prediction. According to recent reports about membership from Citi Bike, nearly 90% of users are annual subscribers (e.g., office workers) and the rest are daily customers (e.g., tourists). Office workers have common commuting routines on weekdays and various plans on weekends, while tourists become more active on holidays, cycling around sightseeing.

5. THE PROPOSED DESTINATION PREDICTION FRAMEWORK

In this section, we introduce the overall proposed destination prediction framework and then explain its three main phases in detail: the destination station clustering phase, the feature model construction phase and the destination prediction phase.

5.1 The Overall Framework

The proposed destination prediction framework consists of three main phases that transform data into feature models and then make destination predictions based on them. In the destination station clustering phase, destination stations are extracted from the station data and clustered to establish the mapping from destination stations to destination clusters. In the feature model construction phase, raw data are collected, cleaned and transformed into structural features; all features are then integrated into feature models, while the destination stations are mapped to destination clusters as the labels of the corresponding feature models. In the destination prediction phase, feature models and corresponding labels are applied to several machine learning algorithms for multi-classification, which are evaluated under the same conditions. The algorithm with the best performance is chosen to make destination predictions in our framework. The overall framework is shown in Figure 2.

5.2 Destination Station Clustering

As stated earlier, clustering the stations into groups helps to reduce the number of destinations. Therefore, clustering of bike stations is necessary in this framework to group destination stations into destination clusters. Intuitively, bike stations which are close to each other should be placed in the same cluster, and a simple clustering method is to group bike stations into clusters by measuring the similarity of their geographic coordinates (e.g., latitude and longitude).

Bike stations close in location sometimes show similar patterns; however, bike stations with similar patterns are not necessarily close in location. For example, on the one hand, there is a high probability that the bike stations around a company are all full during the morning rush hour, since arriving workers may use any of them as a destination station. On the other hand, the bike stations along a busy avenue are likely to be empty during the evening rush hour, each offering its last bike to pedestrians cycling home. Therefore, we take usage pattern in addition to geographic location into consideration during station clustering.

To measure the similarity of usage patterns, we construct a vector that reflects the distribution of inflow trips of a destination station. Concretely, for a station S_i, the number of inflow trips arriving at S_i from destination cluster C_j is denoted as Inf_i^j = count(trips arriving at S_i from C_j), j = 1, ..., k, and the inflow vector of S_i is denoted as P(S_i) = [Inf_i^1/Inf_i, Inf_i^2/Inf_i, ..., Inf_i^k/Inf_i], where Inf_i is the total number of trips arriving at S_i. In this way, we can measure the similarity of the usage patterns of bike stations by comparing their inflow vectors.

In this phase, we propose a Compound Station Clustering (CSC) method which comprehensively integrates geographic location and usage pattern based on the classic k-means algorithm, as shown in Figure 3(a). The algorithm iterates over two steps until convergence or an iteration threshold: Step 1) location-based clustering, which uses k-means to cluster bike stations by geographic coordinates; and Step 2) pattern-based clustering, which constructs the inflow vector for every bike station and then uses k-means to cluster bike stations by inflow vector.
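The inflow vector P(S_i) defined above can be computed directly from trip records. The following is a minimal sketch, assuming trip records are dicts with illustrative "origin" and "dest" keys holding station IDs (not the exact Citi Bike schema) and that a previous location-based step assigned each station to one of k clusters:

```python
from collections import Counter

def inflow_vector(station, trips, station_to_cluster, k):
    """P(S_i): for each of the k clusters from the previous location-based
    step, the share of trips arriving at S_i whose origin lies in it."""
    counts = Counter(station_to_cluster[t["origin"]]
                     for t in trips if t["dest"] == station)
    total = sum(counts.values())
    return [counts[j] / total if total else 0.0 for j in range(k)]

# Hypothetical mini-example: station 9 receives two trips from cluster 0
# and one from cluster 1, so its usage pattern is [2/3, 1/3].
trips = [{"origin": 1, "dest": 9}, {"origin": 2, "dest": 9},
         {"origin": 3, "dest": 9}]
vec = inflow_vector(9, trips, {1: 0, 2: 0, 3: 1}, k=2)
```

Because the vector is normalized by the total inflow, two stations with similar arrival mixes end up close in this space even when their raw trip volumes differ.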

Figure 2: The Proposed Destination Prediction Framework.

Figure 3: (a) Overall Iteration Procedure of CSC. (b) Inflow Vector Indicating Usage Pattern.

1) Location-based clustering. Initially, in the location-based clustering step of iteration round 1, we use k-means to group the n stations into k clusters according to their geographic coordinates, and stations in the same cluster are denoted with the same color. In the remaining iterations, step 1 differs slightly from the first round in that we group stations not over all stations at once but within each cluster from the previous result of step 2. Assuming that the numbers of stations in the clusters from the previous step 2 are {n_1, n_2, ..., n_l}, we respectively use k-means to group those {n_1, n_2, ..., n_l} stations into {(n_1/n)·k, (n_2/n)·k, ..., (n_l/n)·k} clusters within each cluster, according to their geographic coordinates. Every step 1 thus results in k clusters, and the result tends to become stable as the iterations go on, until we reach the final clustering result at convergence or the iteration threshold.

2) Pattern-based clustering. In step 2, the inflow vector of every bike station is constructed to indicate its usage pattern. As shown in Figure 3(b), we construct the inflow vector P(S_i) = [Inf_i^1/Inf_i, Inf_i^2/Inf_i, ..., Inf_i^k/Inf_i] for station S_i based on the historical trip data and the previous clustering result of step 1. Then we re-cluster the stations into ⌈k/2⌉ clusters using k-means according to the inflow vectors. The point of pattern-based clustering is that we can recognize usage patterns that are not bound to geographic location. The pseudocode of CSC follows:

ALGORITHM 1: CSC
Input: station location information {S_i}, i = 1..n
       trip information {T_i}, i = 1..m
       number of clusters k
       round threshold θ
1.  Initialize itr ← 0
2.  Group {S_i} into k clusters {C_j}, j = 1..k, by k-means according to geographic locations
3.  While itr < θ Do
4.      For i ← 1 : n Do
5.          Inf_i ← count(trips arriving at S_i)
6.          For j ← 1 : k Do
7.              Inf_i^j ← count(trips arriving at S_i from C_j)
8.          inflow vector P(S_i) = [Inf_i^1/Inf_i, Inf_i^2/Inf_i, ..., Inf_i^k/Inf_i]
9.      Group {S_i} into ⌈k/2⌉ clusters {C_l}, l = 1..⌈k/2⌉, by k-means according to inflow vectors {P(S_i)}
10.     For l ← 1 : ⌈k/2⌉ Do
11.         k_l ← (n_l · k) / n
12.         Group the n_l stations in C_l into k_l clusters by k-means according to geographic coordinates
13.     Integrate all clusters into {C_j}, j = 1..k
14.     itr ← itr + 1
15. Return {C_j}, j = 1..k
Output: result of k clusters {C_j}, j = 1..k

5.3 Feature Model Construction

Raw data sets collected from different sources are constructed into feature models before they are applicable and efficient for machine learning algorithms to work with. Features are identified from the raw data based on domain knowledge and other principles of feature engineering [11], indicating all aspects of the attributes of an object. Concretely, time, meteorology and user information are commonly considered influential factors for bike trips in bike sharing systems [2, 15]; therefore a feature model in this framework is a set of features that indicate these attributes of an on-going trip.

For example, the start time field in the trip data implies the exact time when a trip starts, in the format YYYY-mm-dd HH:MM:SS; we tear it apart and transform it into several features like month, hour and minute, each of which can be represented by an independent discrete variable. The temperature field in the meteorology data is a naturally independent continuous variable, so it can be adopted as an off-the-shelf feature without any transformation. Furthermore, categorical fields like usertype (0 for annual subscribers and 1 for daily customers) are transformed by one-hot encoding. After the features in the different data sets are prepared, they are joined according to their corresponding keys into a feature model for an on-going trip. The explicit features and their values in a feature model are listed in Table 2:

Table 2: Features and Their Values of a Feature Model
  month       | {1, 2, ..., 12}, one-hot encoding
  hour        | {0, 1, ..., 23}, one-hot encoding
  minute      | {0, 1, ..., 59}, discrete variable
  weekday     | {1, 2, ..., 7}, one-hot encoding
  holiday     | {0, 1}, one-hot encoding
  usertype    | {0, 1}, one-hot encoding
  age         | [16, 159], continuous variable
  gender      | {0, 1}, one-hot encoding
  weather     | {0, 1, 2, 3}, one-hot encoding
  temperature | [-12.8, 34.4] ℃, continuous variable
  humidity    | [13, 100] %, continuous variable
  windspeed   | [0, 42.6] mph, continuous variable

5.4 Destination Prediction

After feature models and corresponding labels are constructed from the raw data sets, they can be directly applied to train algorithms to make destination predictions for new on-going trips. Many machine learning algorithms have been proposed to solve multi-classification issues [25]. To take several multi-classification algorithms into consideration, kNN (k-Nearest-Neighbors), NB (Naive Bayes), ANN (Artificial Neural Networks) and RF (Random Forests) are selected, and their performance is evaluated before we decide which algorithm makes destination predictions in this framework.

To evaluate these algorithms with the given feature models and corresponding labels, we have to split the given data into training data and validation data. For example, a random 70% of the feature models and corresponding labels are used as training data for the algorithms, and the remaining 30% of the feature models are used as validation data; the predicted results are then validated against the remaining 30% of true labels to measure the performance of the algorithms. However, a random split of training and validation data might not evaluate the performance of different algorithms stably, due to problems like the randomness of the data and overfitting to the training data. Cross-validation is a commonly used model validation technique in machine learning that derives a more stable and accurate estimate of the performance of algorithms [26]. In k-fold cross-validation, the given data is randomly partitioned into k equal-sized subsamples, and we repeat the evaluation k times, each time with k-1 subsamples as training data and the remaining subsample as validation data; the average result of cross-validation is then used to measure the performance of each algorithm. Finally, the algorithm with the best performance under cross-validation is selected for this framework, and it is trained with all the feature models and corresponding labels to make destination predictions for new on-going trips.
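ALGORITHM 1 above can be turned into runnable code. The sketch below is a rough, self-contained Python rendition under simplifying assumptions: a tiny hand-rolled k-means stands in for a library implementation, stations and trips use hypothetical dict layouts, and the per-group cluster counts n_l·k/n are rounded, so the final number of clusters is only approximately k.

```python
import math
import random
from collections import Counter

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means on lists of floats; returns one cluster id per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, min(k, len(points)))]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        for c in range(len(centers)):
            members = [points[i] for i, a in enumerate(assign) if a == c]
            if members:
                centers[c] = [sum(d) / len(members) for d in zip(*members)]
    return assign

def inflow_vectors(ids, trips, assign, k):
    """P(S_i): share of trips arriving at S_i from each current cluster."""
    vecs = []
    for s in ids:
        counts = Counter(assign[t["origin"]] for t in trips if t["dest"] == s)
        total = sum(counts.values()) or 1  # avoid division by zero
        vecs.append([counts[j] / total for j in range(k)])
    return vecs

def csc(coords, trips, k, rounds=3):
    """Sketch of ALGORITHM 1 (CSC): alternate pattern-based clustering
    (k-means on inflow vectors, into ceil(k/2) groups) and location-based
    clustering (k-means on coordinates inside each group, splitting group
    l into roughly n_l * k / n clusters) for a fixed number of rounds."""
    ids = sorted(coords)
    n = len(ids)
    # Line 2: initial location-based clustering into k clusters.
    assign = dict(zip(ids, kmeans([list(coords[s]) for s in ids], k)))
    for _ in range(rounds):
        # Lines 4-9: build inflow vectors, re-cluster into ceil(k/2) groups.
        groups = kmeans(inflow_vectors(ids, trips, assign, k), math.ceil(k / 2))
        # Lines 10-13: location-based clustering within each group.
        assign, offset = {}, 0
        for g in sorted(set(groups)):
            members = [s for s, grp in zip(ids, groups) if grp == g]
            k_g = max(1, round(len(members) * k / n))
            sub = kmeans([list(coords[s]) for s in members], k_g, seed=g)
            for s, c in zip(members, sub):
                assign[s] = offset + c
            offset += k_g
    return assign

# Illustrative toy data: six stations in two spatial groups, a few trips.
coords = {0: (0.0, 0.0), 1: (0.1, 0.0), 2: (0.2, 0.0),
          3: (9.0, 0.0), 4: (9.1, 0.0), 5: (9.2, 0.0)}
trips = [{"origin": 0, "dest": 3}, {"origin": 1, "dest": 4},
         {"origin": 3, "dest": 0}]
clusters = csc(coords, trips, k=2, rounds=2)
```

In practice the trip counts in lines 5-8 would be precomputed once per round from the full trip table rather than rescanned per station, but the control flow is the same.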

trained with all the feature models and their corresponding labels to make destination predictions for new incoming on-going trips.

6. EXPERIMENTS
In this section, experiments are conducted with real-world data sets in NYC through the year of 2017 to evaluate the proposed destination prediction framework. First, we set the experimental data and parameters for this destination prediction framework and define the evaluation metrics. Then we introduce several levels of clustering and several machine learning algorithms for multi-classification used in prediction, and build an evaluation plan on these baselines. A performance comparison of the experimental results and their analysis follows, showing the effectiveness of our clustering method and framework. Finally, we make an extensive analysis of the impact of cluster size in the destination station clustering phase.

6.1 Experimental Settings
In the destination station clustering phase, we have to manually select the number of destination clusters for the clustering algorithms to work. As mentioned in the discussion of clustering techniques in Section 1, there is a trade-off between performance and practicality when deciding the number of clusters. After analyzing the distribution of trips with different distances in Figure 4, we find that trip distances vary from 0 to 8000 meters, with an average distance of about 1700 meters. After several quick experiments with different numbers of destination clusters in k-means, we choose 40 as the number of destination clusters. Concretely, all MDISs of these 40 destination clusters are a little less than the average trip distance: the maximum MDIS of the 40 clusters is 1687 meters, while the average trip distance is about 1700 meters.

6.2 Evaluation Plan
In the destination station clustering phase, our framework is evaluated with three levels of clustering: 1) Station-level clustering (SLC), where each destination station is exactly one destination cluster; 2) Geographic clustering (GC), which simply uses k-means to group stations by geographic coordinates only, generating 40 clusters; 3) our CSC method, which considers both the geographic location and the usage pattern of stations, also generating 40 clusters.

In the destination prediction phase, we compare four machine learning algorithms for multi-classification and briefly explain how they are applied to the destination prediction problem:

• kNN: k-Nearest-Neighbors is a simple and fundamental classification algorithm in which an object is classified according to its neighbors' class memberships [27]. Concretely, for a new incoming feature model, we find its nearest feature models among those in the same hour of weekday and the same weather, and predict its destination cluster as the mode of the destination cluster labels of those nearest feature models.

• NB: Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem, with the assumption that features are independent of each other [28]. The destination cluster of a new incoming feature model X = [x_1, x_2, x_3, …, x_m] is predicted by selecting the most probable cluster among {C_1, C_2, C_3, …, C_k} (also known as MAP estimation):

    Ŷ = arg max_{i ∈ {1, …, k}} P(C_i) P(X | C_i),  where  P(X | C_i) = ∏_{j=1}^{m} P(x_j | C_i)

• ANN: Artificial Neural Networks are inspired by the biological neural networks in brains, and can learn not only simple linear functions but also complex function approximations [29]. An ANN consists of multiple layers of connected nodes, which receive signals and then process and pass signals on to adjacent nodes, just like neurons, as shown in Figure 5. For a new incoming feature model X = [x_1, x_2, x_3, …, x_m] input at the feature layer, signals pass through the hidden layers to the label layer, and a softmax function outputs the prediction of the destination cluster Ŷ.
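The kNN prediction rule described above (the mode of the nearest feature models' destination cluster labels) can be sketched as follows. The feature vectors and cluster labels here are hypothetical toy data, and the brute-force distance computation stands in for whatever nearest-neighbor search a real implementation (e.g. scikit-learn's KNeighborsClassifier) would use:

```python
from collections import Counter
import math

def knn_predict(query, trips, labels, k=5):
    """Predict a destination cluster as the mode of the k nearest trips' labels."""
    # Euclidean distance between feature vectors of equal length.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Rank historical trips by distance to the query feature model.
    ranked = sorted(range(len(trips)), key=lambda i: dist(query, trips[i]))
    # Mode of the destination cluster labels of the k nearest neighbors.
    top_k = [labels[i] for i in ranked[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Hypothetical feature models (already filtered to the same hour/weather)
# with destination cluster labels 3 and 7.
trips = [[0.0, 1.0], [0.1, 0.9], [5.0, 5.0], [4.9, 5.2], [0.2, 1.1]]
labels = [3, 3, 7, 7, 3]
print(knn_predict([0.05, 1.0], trips, labels, k=3))  # → 3
```

In the paper's setting, the candidate set is first restricted to trips sharing the query's hour of weekday and weather, and the cluster labels come from the CSC clustering.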

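The NB MAP rule above can be sketched for categorical features. The trip data below are hypothetical toy examples, and the log-domain computation with Laplace smoothing is a standard implementation detail (to avoid underflow and zero counts) that the formula itself does not spell out:

```python
import math
from collections import Counter, defaultdict

def nb_train(X, y):
    """Estimate class priors P(C_i) and per-feature value counts for P(x_j | C_i)."""
    priors = Counter(y)
    cond = defaultdict(Counter)  # (class label, feature index) -> value counts
    for features, label in zip(X, y):
        for j, value in enumerate(features):
            cond[(label, j)][value] += 1
    return priors, cond

def nb_predict(x, priors, cond, alpha=1.0):
    """Return the class maximising log P(C_i) + sum_j log P(x_j | C_i)."""
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for c, count in priors.items():
        score = math.log(count / total)
        for j, value in enumerate(x):
            counts = cond[(c, j)]
            # Laplace smoothing; each toy feature has 2 possible values, hence "alpha * 2".
            score += math.log((counts[value] + alpha) / (sum(counts.values()) + alpha * 2))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical trips: (time of day, weather) -> destination cluster label.
X = [["am", "sun"], ["am", "rain"], ["pm", "sun"], ["pm", "sun"]]
y = ["A", "A", "B", "B"]
priors, cond = nb_train(X, y)
print(nb_predict(["am", "sun"], priors, cond))  # → A
```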
Figure 4: Distribution of Bike Trips with Different Distances.

In the destination prediction phase, we apply 5-fold cross validation to all algorithms to measure their performance in an accurate and efficient way. The performance of the different algorithms is evaluated on the destination prediction results by Accuracy, a relatively strict metric for multi-classification problems:

    Accuracy(Ŷ, Y) = (1/n) Σ_{i=1}^{n} 1(Ŷ_i = Y_i)

where Y_i is the true destination cluster label of the i-th feature model, Ŷ_i is the predicted destination cluster of this feature model, and 1(x) is the indicator function.

Figure 5: ANN with One Hidden Layer.

• RF: Random Forest is an ensemble learning method based on multiple decision trees, where each tree describes a decision process learned from data to make predictions [30, 31]. Each decision tree in RF is constructed using the random subspace method and random feature selection based on Breiman's "bagging" idea, producing a collection of decision trees with relatively low variance; RF can therefore overcome the overfitting problem of a single decision tree. After every decision tree makes its own prediction of the destination cluster of a new incoming feature model, RF takes the mode of these predictions as the prediction result.

All four multi-classification algorithms are combined with all three levels of clustering to constitute the different baselines. We evaluate these baselines both in Accuracy and, where applicable, in time consumption, to decide the best multi-classification algorithm for destination prediction in our framework.

6.3 Performance Comparison and Analysis
We conduct destination prediction experiments on every origin station separately, so there is an Accuracy score for every origin station, excluding those with too few trips (stations with fewer than 70 trips through the year of 2017). We evaluate the baselines by the average Accuracy over all origin stations to intuitively compare the performance of the different levels of clustering and the different multi-classification algorithms in Table 3. We draw the following observations and analysis from the experiments:

Table 3: Average Accuracy with Different Baselines

  Clustering \ Prediction |  kNN  |  NB  |  ANN  |  RF
  SLC                     |  4.6% | 4.0% |   —   | 14.3%
  GC (40)                 | 23.1% | 7.3% | 19.8% | 30.9%
  CSC (40)                | 28.1% | 8.7% | 24.7% | 39.3%

• All experiments are conducted on a high-performance server with an 8-core 2.6GHz CPU and 64GB RAM allocated. kNN and NB took about 3 hours and 8 minutes, respectively, to finish each experiment, regardless of the clustering method. ANN took about 4 hours to finish each experiment with GC or CSC clustering, but failed to complete the experiment with SLC due to its time complexity. RF was also affected by this issue and took about 2 hours to finish the experiment with SLC, while it took only about half an hour to finish each experiment with GC or CSC.

• The results with GC or CSC clearly outperform the results with SLC for all applicable algorithms. This is an obvious result, as expected, since GC and CSC significantly reduce the number of destination clusters compared with SLC, which makes prediction easier.

• The results with CSC outperform the results with GC for all algorithms. RF shows the largest improvement, gaining 8.3% in average Accuracy, from 30.9% with GC to 39.2% with CSC. This indicates that our CSC method is more effective than the commonly used GC, because it clusters stations that are similar not only in geographic location but also in usage pattern.

• RF is selected in our framework to make the destination prediction, since RF with CSC outperforms all baselines, reaching an average of 39.2% Accuracy with 40 destination clusters. Besides, RF achieves the best result with SLC, reaching an average of 14.3% Accuracy with more than 800 destination clusters. Excluding NB, whose results are far worse, RF is also the fastest algorithm due to its parallel characteristics.

Then we make a detailed comparison of the distribution of stations by the accuracy predicted by RF, using GC and CSC respectively, in Figure 6. Clearly, there are many more origin stations with CSC (green bars) than with GC (blue bars) in the higher-accuracy intervals (e.g. [0.3, 0.35], [0.35, 0.4], etc.). The overall improvement in the Accuracy of origin stations indicates the effectiveness of CSC in our framework for the destination prediction problem in bike sharing systems.

Figure 6: Distribution of Origin Stations with Different Accuracy Predicted by RF.

6.4 Impact of Cluster Number
We conduct experiments to analyze the impact of the cluster number on the MDIS of clusters and on the Accuracy of destination prediction. We evaluate our destination prediction framework from 20 to 60 clusters generated by CSC, and the results are shown in Figure 7(a). As the number of clusters increases from 20 to 60, the prediction Accuracy decreases steadily from 48.5% to 29.2%, as expected, indicating that a smaller number of clusters clearly leads to higher prediction Accuracy. At the same time, the average MDIS of the clusters also goes down as the number of clusters increases, which makes the clusters more practical and applicable for bike sharing systems. However, the average MDIS tends to level off between 2500 and 2000 meters once the number of clusters exceeds 40. The clustering result of CSC with 40 clusters, visualized on the map of NYC, is shown in Figure 7(b); bike stations with the same color belong to the same cluster. Generally, bike stations are grouped by geographic location, while those surrounded by another cluster are assigned to their own cluster due to their usage pattern.

Figure 7: (a) Average MDIS and Average Accuracy with Different Cluster Numbers. (b) Clustering Result of CSC with 40 Clusters in NYC.

7. CONCLUSION
In this paper, we propose a cluster-based framework to predict destinations for on-going trips in bike sharing systems. To reduce the number of destinations in this multi-classification problem, we propose a CSC method that groups stations with similar geographic location and usage pattern into destination clusters. We evaluate our framework with three levels of clustering combined with four machine learning algorithms for multi-classification, using feature models constructed from one-year data sets in NYC. Results show that our framework with CSC and RF outperforms the other baselines and achieves 39.2% average Accuracy with 40 destination clusters, confirming the effectiveness of our CSC method and our destination prediction framework.

There are many issues worth studying about destination prediction in bike sharing systems. First, more influential factors, such as surrounding transportation stations, can be taken into consideration in the feature model to improve the accuracy of destination prediction. Second, a deeper analysis of the bike stations in destination clusters using different clustering algorithms might reveal potential relations between stations and help to establish new stations. Finally, we plan to extend our experiments to other bike sharing systems, and even to bike sharing systems without stations, to evaluate and optimize our destination prediction framework.

8. ACKNOWLEDGMENTS
This work is supported by the High-performance Computing Platform of Peking University. Associate Professor Huiping Lin is the corresponding author of this paper.

9. REFERENCES
[1] Shaheen, S., Guzman, S., & Zhang, H. (2010). Bikesharing in Europe, the Americas, and Asia: past, present, and future. Transportation Research Record: Journal of the Transportation Research Board, (2143), 159-167.
[2] Froehlich, J., Neumann, J., & Oliver, N. (2009, July). Sensing and predicting the pulse of the city through shared bicycling. In IJCAI (Vol. 9, pp. 1420-1426).
[3] Chen, L., Zhang, D., Wang, L., Yang, D., Ma, X., Li, S., ... & Jakubowicz, J. (2016, September). Dynamic cluster-based over-demand prediction in bike sharing systems. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (pp. 841-852). ACM.
[4] Liu, J., Sun, L., Chen, W., & Xiong, H. (2016, August). Rebalancing bike sharing systems: A multi-source data smart optimization. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1005-1014). ACM.
[5] Hu, J., Yang, Z., Shu, Y., Cheng, P., & Chen, J. (2017, November). Data-Driven Utilization-Aware Trip Advisor for Bike-Sharing Systems. In Data Mining (ICDM), 2017 IEEE International Conference on (pp. 167-176). IEEE.
[6] Singla, A., Santoni, M., Bartók, G., Mukerji, P., Meenen, M., & Krause, A. (2015, January). Incentivizing Users for Balancing Bike Sharing Systems. In AAAI (pp. 723-729).
[7] Bacciu, D., Carta, A., Gnesi, S., & Semini, L. (2017). An experience in using machine learning for short-term predictions in smart transportation systems. Journal of Logical and Algebraic Methods in Programming, 87, 52-66.
[8] Zhang, J., Pan, X., Li, M., & Philip, S. Y. (2016, June). Bicycle-sharing system analysis and trip prediction. In Mobile Data Management (MDM), 2016 17th IEEE International Conference on (Vol. 1, pp. 174-179). IEEE.
[9] Ashbrook, D., & Starner, T. (2003). Using GPS to learn significant locations and predict movement across multiple users. Personal and Ubiquitous Computing, 7(5), 275-286.
[10] Xu, M., Wang, D., & Li, J. (2016, September). DESTPRE: a data-driven approach to destination prediction for taxi rides. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (pp. 729-739). ACM.
[11] Turner, C. R., Fuggetta, A., Lavazza, L., & Wolf, A. L. (1999). A conceptual basis for feature engineering. Journal of Systems and Software, 49(1), 3-15.
[12] Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys (CSUR), 31(3), 264-323.
[13] Etienne, C., & Latifa, O. (2014). Model-based count series clustering for bike sharing system usage mining: a case study with the Vélib' system of Paris. ACM Transactions on Intelligent Systems and Technology (TIST), 5(3), 39.
[14] DeMaio, P. (2009). Bike-sharing: History, impacts, models of provision, and future. Journal of Public Transportation, 12(4), 3.
[15] Wang, W. (2016). Forecasting Bike Rental Demand Using New York Citi Bike Data.
[16] Zeng, M., Yu, T., Wang, X., Su, V., Nguyen, L. T., & Mengshoel, O. J. (2016). Improving Demand Prediction in Bike Sharing System by Learning Global Features. Machine Learning for Large Scale Transportation Systems (LSTS) @ KDD-16.
[17] Chemla, D., Meunier, F., & Calvo, R. W. (2013). Bike sharing systems: Solving the static rebalancing problem. Discrete Optimization, 10(2), 120-146.
[18] Xue, A. Y., Zhang, R., Zheng, Y., Xie, X., Huang, J., & Xu, Z. (2013, April). Destination prediction by sub-trajectory synthesis and privacy protection against such prediction. In Data Engineering (ICDE), 2013 IEEE 29th International Conference on (pp. 254-265). IEEE.
[19] Krumm, J., & Horvitz, E. (2006, September). Predestination: Inferring destinations from partial trajectories. In International Conference on Ubiquitous Computing (pp. 243-260). Springer, Heidelberg.
[20] Alvarez-Garcia, J. A., Ortega, J. A., Gonzalez-Abril, L., & Velasco, F. (2010). Trip destination prediction based on past GPS log using a Hidden Markov Model. Expert Systems with Applications, 37(12), 8166-8171.
[21] Nadembega, A., Taleb, T., & Hafid, A. (2012, June). A destination prediction model based on historical data, contextual knowledge and spatial conceptual maps. In Communications (ICC), 2012 IEEE International Conference on (pp. 1416-1420). IEEE.
[22] Citi Bike Inc. (2018). Citi Bike System Data. http://www.citibikenyc.com/system-data.
[23] Weather Underground Inc. (2018). Weather API for New York in USA. https://www.wunderground.com/weather/api/.
[24] Office Holidays Inc. (2018). Public Holidays for New York in USA. https://www.officeholidays.com/countries/usa/index.php.
[25] Aly, M. (2005). Survey on multiclass classification methods. Neural Netw, 19, 1-9.
[26] Zhang, P. (1993). Model selection via multifold cross validation. The Annals of Statistics, 299-313.
[27] Keller, J. M., Gray, M. R., & Givens, J. A. (1985). A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems, Man, and Cybernetics, (4), 580-585.
[28] Rish, I. (2001, August). An empirical study of the naive Bayes classifier. In IJCAI 2001 workshop on empirical methods in artificial intelligence (Vol. 3, No. 22, pp. 41-46). IBM.
[29] Zhang, G., Patuwo, B. E., & Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14(1), 35-62.
[30] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
[31] Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.
