Cluster-Based Destination Prediction in Bike Sharing System

Pengcheng Dai, Changxiong Song, Huiping Lin, Pei Jia, Zhipeng Xu
School of Software and Microelectronics, Peking University, Beijing, China

ABSTRACT

Destination prediction not only helps to understand users' behavior, but also provides basic information for destination-related customized services. This paper studies destination prediction in the public bike sharing system, which is now blooming in many cities as an environment-friendly short-distance transportation solution. Due to the large number of bike stations (e.g., more than 800 stations of Citi Bike in New York City), the accuracy and effectiveness of destination prediction become a problem, and clustering algorithms are often used to reduce the number of destinations. However, grouping bike stations only by their location is not effective enough. The contribution of this paper lies in two aspects: 1) we propose a Compound Station Clustering method that considers not only the geographic location but also the usage pattern of stations; 2) we provide a framework that uses feature models and corresponding labels in machine learning algorithms to predict the destination of on-going trips. Experiments are conducted on real-world data sets of Citi Bike in New York City through the year of 2017, and the results show that our method outperforms baselines in accuracy.

CCS Concepts
• Information systems ➝ Clustering • Applied computing ➝ Forecasting • Applied computing ➝ Transportation

Keywords
Destination Prediction; Machine Learning; Clustering; Bike Sharing System

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
AICCC '18, December 21-23, 2018, Tokyo, Japan
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6623-6/18/12...$15.00
DOI: https://doi.org/10.1145/3299819.3299826

1. INTRODUCTION

The bike sharing system provides a new and green public transportation to fill the "last mile" gap in urban areas, such as Citi Bike in New York City, Vélib' in Paris, and similar systems in Beijing, etc. [1]. In a bike sharing system with stations (docks to hold the bikes), a user can pick up a bike from a nearby bike station and drop it off at the one close to his destination. Because bikes are continuously and arbitrarily moved by users from station to station, one of the most common problems users encounter is finding an available bike or parking slot [2]. To improve the availability of bike sharing systems, researchers have studied demand prediction, which predicts the number of available bikes and docks [2, 3], and rebalancing strategies to efficiently reallocate bikes among stations [4]. In addition to demand prediction and system rebalancing, destination prediction, which concerns the user's behavior and the trend of trip flows, is another aspect worth studying.

Destination prediction is to predict a user's destination when he picks up a bike from an origin station, given information such as the location of the origin station, the environment context and the user information. Compared with demand prediction, which concentrates on the available bikes and the number of trips of bike stations, destination prediction provides a perspective to study users' behavior patterns and bicycle traffic flow trends in bike sharing systems. Therefore, system administrators can take measures in advance to provide more efficient and accurate services based on the result of destination prediction. On the one hand, destination prediction can be applied to guide users to find the best destination station to save energy and time [5], as well as to other destination-based services. On the other hand, it helps to rebalance bikes by incentivizing users to pick up or drop off bikes at proper bike stations [6].

Usually, destination prediction in bike sharing systems is abstracted as a multi-classification problem. For example, Bacciu et al. [7] used machine learning methodologies to build multi-classification models to infer the destination station for a small bike sharing system with 15 bike stations. Zhang et al. [8] simplified this problem to binary classification and proposed a model to predict whether a trip between a candidate pair of stations will happen. However, destination prediction still faces challenges with the rapid expansion of bike sharing systems:

• The large number of destination stations. Modern bike sharing systems usually have hundreds of bike stations, which makes it difficult to predict the destination station directly.

For example, Citi Bike in New York City consists of more than 800 bike stations, and the average number of historical destination stations of an origin station is more than 300, as shown in Figure 1(a). Clustering bike stations to reduce the number of destinations is one solution. Clustering techniques are applied in the process of destination extraction from raw GPS traces to make destination predictions for a person or a taxi [9, 10], and clustering of stations is also a widely used method in bike sharing systems, based on the observation that a cluster of stations is more regular and steadier to predict than an individual station [3, 11].

• Clustering of bike stations. Clustering intends to group a set of objects so that objects in the same group are more similar to each other than to objects in other groups [12], and the intuitive way to measure the similarity between bike stations is geographic location [4]. However, grouping stations which are only similar in location might not result in effective clusters for destination prediction, since bike stations with similar patterns are not necessarily close in location. As shown in Figure 1(b), the bike stations around New York Penn Station are all in green status (full of bikes), while the stations along Broadway are almost all in red status (lacking bikes), and neither group is close together in location. In addition to geographic location, the usage pattern of stations is therefore an influential factor worth considering when measuring the similarity between bike stations [2, 13].

Figure 1: (a) Distribution of Origin Stations with Different Historical Destination Stations. (b) Status of Several Bike Stations at Evening Rush Hour on a Weekday.

• Trade-off between performance and practicality. There is also a trade-off between performance and practicality when applying clustering techniques to destination prediction in bike sharing systems. After all, making every bike station its own cluster makes it difficult to achieve an effective result due to too many potential destination stations, while putting all bike stations into one cluster yields an absolutely 100% accurate yet meaningless result with only one destination cluster. Therefore, a criterion to measure the range of clusters and to decide the number of destination clusters is significant.

In this paper, we propose a cluster-based framework to make destination predictions for bike sharing systems. First, we extract station data from Citi Bike for destination station clustering, which groups destination stations that are similar in both geographic location and usage pattern into destination clusters. Then we construct feature models from the extracted trip data, and corresponding labels by relabeling destination stations with the destination station clustering result. Finally, we apply several machine learning algorithms for multi-classification to make predictions and evaluate their prediction performance to select the best one for our framework. Major contributions of this paper include:

• We propose a cluster-based framework using destination station clustering and multi-classification algorithms to directly and accurately predict the destination of on-going trips, which can be applied in large bike sharing systems with hundreds of stations.

• We propose a Compound Station Clustering method to group bike stations which are similar in both geographic location and usage pattern by measuring the similarity of geographic coordinates and inflow vectors (detailed procedure in Section 5.2).

• We conduct experiments of the proposed framework on one-year real-world data from Citi Bike in New York City, and the results outperform baselines in both performance and practicality.

The rest of this paper is organized as follows. In Section 2 we review related work. In Section 3, we introduce preliminaries and formally define the destination prediction problem. In Section 4 we describe the data sets used in this study and list their sources. In Section 5, we present the overall proposed framework and illustrate its main phases in detail. In Section 6, experiments are conducted to evaluate our framework and their results are analyzed. Finally, Section 7 concludes this paper with future work.

2. RELATED WORK

Bike sharing systems. Bike sharing programs have received increasing attention in recent years and raise many issues to be studied [14]. Earlier work mainly focused on predicting the number of available bikes and docks, also called demand prediction [2, 3, 15, 16]. For example, Wang [15] built a machine learning model that combines historical usage patterns with related influential factors like user information, meteorology and weekends to forecast bike rental demand in a future period. Zeng et al. [16] proposed a model to improve bike demand prediction by developing more helpful global features through data analysis and machine learning algorithms. Based on this, rebalancing approaches that find the best way to redistribute bikes have also been studied as an applied issue for bike sharing systems [4, 6, 17]. Liu et al. [4] formulated the rebalance issue as a mixed integer nonlinear program whose objective is to minimize the total travel distance, and Singla et al. [6] presented a crowdsourcing mechanism that incentivizes users to help rebalance bikes by guiding them to proper bike stations to pick up or drop off bikes. Besides, destination prediction is a novel issue in bike sharing systems, with several studies limited in problem formulation or data scope [7, 8]: Zhang et al. [8] simplified the issue to a binary classification problem, while Bacciu et al. [7] built multi-classification models for a small bike sharing system with only 15 stations.

Destination prediction. Previous studies on destination prediction mainly focus on persons or taxis, to provide location-based services, destination-based advertising [18] or route recommendation [19]. Since location information can be collected from a person's mobile phone or a taxi's GPS device, historical trajectories are the most common and effective dataset for

destination prediction. Based on trajectories, various probabilistic models, especially those based on Markov chains, are trained to make predictions [20]. Besides, identities of users are also used to make personalized destination predictions for higher performance [21]. However, the "disposable" usage (a short trip directly from origin station to destination station on an arbitrary bike) and the absence of user identifiers in the datasets make bike sharing systems not completely suitable for the aforementioned destination prediction methods.

Clustering in bike sharing systems. When studying these issues in bike sharing systems, there are two perspectives to choose from, i.e., station-level and cluster-level. Compared with the station level, studying at the cluster level can sometimes simplify issues by grouping bike stations into fewer clusters. Therefore, clustering of bike stations is widely used in studies of bike sharing systems [3, 4, 13]. For example, Liu et al. [4] proposed constrained k-centers clustering according to geographic information as well as inventory capacity, with the rebalance problem among stations solved within each cluster and across consecutive clusters. Chen et al. [3] argued that the bike usage pattern of a station is dynamic and context-dependent, so they proposed a method that constructs a weighted network to model the relationship among bike stations and dynamically clusters neighboring stations according to the context; bike stations in the same cluster thus have similar usage patterns, and the cluster can generally represent its stations in over-demand prediction. Etienne et al. [13] also provided a model-based solution which clusters bike stations with similar usage patterns, such as stations around bus or train stations, and then analyzes the data at the cluster level. To deal with destination prediction in bike sharing systems, we propose a clustering algorithm which comprehensively integrates geographic location and usage pattern to group destination stations into destination clusters.

3. PROBLEM FORMULATION

In this section, we introduce several definitions using mathematical notations and then formulate the destination prediction problem to be studied.

• Definition 1: Bike Trip. A bike trip TR(U, S_o, T_o, S_d, T_d) denotes that a user U picks up a bike at an origin station S_o at time T_o and drops off this bike at a destination station S_d at time T_d to finish a complete bike trip.

• Definition 2: Destination Cluster. Suppose that the number of destination stations in the bike sharing system is n and we cluster the stations {S_1, S_2, S_3, ..., S_n} into k (k ≤ n) clusters {C_1, C_2, C_3, ..., C_k}; then each destination cluster C contains several destination stations S which are similar. In other words, several destination stations S_d are mapped to one destination cluster C_d. Specially, when k = n, each destination cluster contains only one station, i.e., each station is exactly a destination cluster.

• Definition 3: Max Distance of Inner Stations (MDIS). Suppose cluster C consists of stations {S_a, S_b, S_c, ..., S_l} ∈ C; then the max distance of inner stations of cluster C is defined as max{distance(S_i, S_j) | S_i, S_j ∈ C}. The MDIS of a cluster is used to measure the range of this cluster.

• Definition 4: Feature Model and Label. Let a vector X(S_o, T_o, U) = [x_1, x_2, x_3, ..., x_m] ∈ R^m denote the features extracted from the origin station S_o, the departure time T_o, the user U and other related data of an on-going trip. The vector X is then the feature model of the on-going trip, which contains features like the location of the origin station, the hour of the departure time, user information and so on (the explicit feature list of a feature model is in Section 5.3). Besides, Y(S_o, T_o, U) = C_d ∈ C is the corresponding label of feature model X(S_o, T_o, U), representing the destination cluster of this on-going trip. The mapping from destination station S_d to destination cluster C_d for label Y is based on the result of destination station clustering.

• Problem Formulation: Destination Prediction. Given the feature model set {X_1, X_2, X_3, ..., X_n} and the corresponding label set {Y_1, Y_2, Y_3, ..., Y_n} as the training set extracted from trip data and other related data, we aim to find a hypothesis model H learned by machine learning algorithms for multi-classification, so that for a new feature model X_i extracted from an on-going trip we can predict its destination cluster Y_i = H(X_i) before this trip completes.

4. DATA DESCRIPTION

In this section, we describe the data sets used in our paper and list their sources. To take as many influential factors into consideration as possible, we collect trip data and station data from Citi Bike in NYC, meteorology data of NYC and some other contextual data like holidays, through the year of 2017. A summary of the data sets is in Table 1, and a detailed description follows:

Table 1: Summary of Data Sets
  Data Source:  New York City
  Time Span:    2017-01-01 ~ 2017-12-31
  Citi Bike:    Trips 16,365,296 | Origin Stations 811 | Destination Stations 847
  Holiday:      Weekdays / Weekends & Holidays 258/98 d | Records 24 h × 356 d
  Meteorology:  Temperature [-12.8, 34.4] ℃ | Sunny/Fog/Rain/Snow Hours 8162/55/407/52 h

• Citi Bike: Citi Bike in NYC is the largest public bike sharing system in the USA, with more than 12,000 bikes and 800 bike stations. Downloadable files of trip data have been published monthly since July 2013 [22] for public analysis, development and visualization, including trip duration, start time, stop time, start station, stop station, station ID, station latitude and longitude, bike ID, user type, gender and year of birth. Trip data is the main data set; it contains potential knowledge about the bike sharing system and the behavior patterns of users, to be discovered for destination prediction.

• Meteorology: Meteorology is the most significant environmental factor for the usage of shared bikes; for example, fewer users choose to cycle in rain or other bad weather. Therefore, the hourly meteorology data of NYC [23], including weather (e.g., sunny, fog, rain, snow), temperature, humidity and wind speed, is necessary for destination prediction.

• Holiday: Federal and state holidays of NYC [24] and regular weekends are also important influential factors for destination prediction. According to recent reports about membership from Citi Bike, nearly 90% of users are annual subscribers (e.g., office workers) and the rest are daily customers (e.g., tourists). Office workers have common commuting routines on weekdays and various plans on weekends, while tourists become more active on holidays, cycling around sightseeing.

5. THE PROPOSED DESTINATION PREDICTION FRAMEWORK

In this section, we introduce the overall proposed destination prediction framework and then explain its three main phases in detail: the destination station clustering phase, the feature model construction phase and the destination prediction phase.

5.1 The Overall Framework

The proposed destination prediction framework consists of three main phases that transform data into feature models and then make destination predictions based on them. In the destination station clustering phase, destination stations are extracted from the station data and clustered to establish the mapping from destination stations to destination clusters. In the feature model construction phase, raw data are collected, cleaned and transformed into structural features; all features are then integrated into feature models, while the destination stations are mapped to destination clusters as the labels of the corresponding feature models. In the destination prediction phase, feature models and corresponding labels are applied to several machine learning algorithms for multi-classification, which are evaluated under the same conditions. The algorithm with the best performance is chosen to make destination predictions in our framework. The overall framework is shown in Figure 2.

5.2 Destination Station Clustering

As stated earlier, clustering the stations into groups helps to reduce the number of destinations. Therefore, clustering of bike stations is necessary in this framework to group destination stations into destination clusters. Intuitively, bike stations which are close to each other should be placed in the same cluster, and a simple clustering method is to group bike stations into clusters by measuring the similarity of their geographic coordinates (e.g., latitude and longitude).

Bike stations close in location sometimes show similar patterns; however, bike stations with similar patterns are not necessarily close in location. For example, on the one hand, there is a high probability that the bike stations around a company are all full during the morning rush hour, since arriving workers may use any of them as a destination station. On the other hand, the bike stations along a busy avenue are likely to be empty during the evening rush hour, each offering its last bike to pedestrians cycling home. Therefore, we take usage pattern in addition to geographic location into consideration during station clustering.

To measure the similarity of usage patterns, we construct a vector that reflects the distribution of inflow trips of a destination station. Concretely, for a station S_i, the number of inflow trips arriving at S_i from destination cluster C_j is denoted as Inf_i^j = count(trips arriving at S_i from C_j), j = 1, ..., k, and the inflow vector of S_i is denoted as P(S_i) = [Inf_i^1/Inf_i, Inf_i^2/Inf_i, ..., Inf_i^k/Inf_i], where Inf_i is the total number of trips arriving at S_i. In this way, we can measure the similarity of the usage patterns of bike stations by comparing their inflow vectors.

In this phase, we propose a Compound Station Clustering (CSC) method which comprehensively integrates geographic location and usage pattern based on the classic k-means algorithm, as shown in Figure 3(a). The algorithm iterates over two steps until convergence or an iteration threshold: Step 1) location-based clustering, which uses k-means to cluster bike stations by geographic coordinates; and Step 2) pattern-based clustering, which constructs the inflow vector for every bike station and then uses k-means to cluster bike stations by inflow vector.
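The inflow vector P(S_i) defined above can be computed directly from trip records. The following is a minimal sketch, assuming trip records are dicts with illustrative "origin" and "dest" keys holding station IDs (not the exact Citi Bike schema) and that a previous location-based step assigned each station to one of k clusters:

```python
from collections import Counter

def inflow_vector(station, trips, station_to_cluster, k):
    """P(S_i): for each of the k clusters from the previous location-based
    step, the share of trips arriving at S_i whose origin lies in it."""
    counts = Counter(station_to_cluster[t["origin"]]
                     for t in trips if t["dest"] == station)
    total = sum(counts.values())
    return [counts[j] / total if total else 0.0 for j in range(k)]

# Hypothetical mini-example: station 9 receives two trips from cluster 0
# and one from cluster 1, so its usage pattern is [2/3, 1/3].
trips = [{"origin": 1, "dest": 9}, {"origin": 2, "dest": 9},
         {"origin": 3, "dest": 9}]
vec = inflow_vector(9, trips, {1: 0, 2: 0, 3: 1}, k=2)
```

Because the vector is normalized by the total inflow, two stations with similar arrival mixes end up close in this space even when their raw trip volumes differ.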

Figure 2: The Proposed Destination Prediction Framework.

Figure 3: (a) Overall Iteration Procedure of CSC. (b) Inflow Vector Indicating Usage Pattern.

1) Location-based clustering. Initially, in the location-based clustering step of iteration round 1, we use k-means to group the n stations into k clusters according to their geographic coordinates, and stations in the same cluster are denoted with the same color. In the remaining iterations, step 1 differs slightly from the first round in that we group stations not over all stations at once but within each cluster from the previous result of step 2. Assuming that the numbers of stations in the clusters from the previous step 2 are {n_1, n_2, ..., n_l}, we respectively use k-means to group those {n_1, n_2, ..., n_l} stations into {(n_1/n)·k, (n_2/n)·k, ..., (n_l/n)·k} clusters within each cluster, according to their geographic coordinates. Every step 1 thus results in k clusters, and the result tends to become stable as the iterations go on, until we reach the final clustering result at convergence or the iteration threshold.

2) Pattern-based clustering. In step 2, the inflow vector of every bike station is constructed to indicate its usage pattern. As shown in Figure 3(b), we construct the inflow vector P(S_i) = [Inf_i^1/Inf_i, Inf_i^2/Inf_i, ..., Inf_i^k/Inf_i] for station S_i based on the historical trip data and the previous clustering result of step 1. Then we re-cluster the stations into ⌈k/2⌉ clusters using k-means according to the inflow vectors. The point of pattern-based clustering is that we can recognize usage patterns that are not bound to geographic location. The pseudocode of CSC follows:

ALGORITHM 1: CSC
Input: station location information {S_i}, i = 1..n
       trip information {T_i}, i = 1..m
       number of clusters k
       round threshold θ
1.  Initialize itr ← 0
2.  Group {S_i} into k clusters {C_j}, j = 1..k, by k-means according to geographic locations
3.  While itr < θ Do
4.      For i ← 1 : n Do
5.          Inf_i ← count(trips arriving at S_i)
6.          For j ← 1 : k Do
7.              Inf_i^j ← count(trips arriving at S_i from C_j)
8.          inflow vector P(S_i) = [Inf_i^1/Inf_i, Inf_i^2/Inf_i, ..., Inf_i^k/Inf_i]
9.      Group {S_i} into ⌈k/2⌉ clusters {C_l}, l = 1..⌈k/2⌉, by k-means according to inflow vectors {P(S_i)}
10.     For l ← 1 : ⌈k/2⌉ Do
11.         k_l ← (n_l · k) / n
12.         Group the n_l stations in C_l into k_l clusters by k-means according to geographic coordinates
13.     Integrate all clusters into {C_j}, j = 1..k
14.     itr ← itr + 1
15. Return {C_j}, j = 1..k
Output: result of k clusters {C_j}, j = 1..k

5.3 Feature Model Construction

Raw data sets collected from different sources are constructed into feature models before they are applicable and efficient for machine learning algorithms to work with. Features are identified from the raw data based on domain knowledge and other principles of feature engineering [11], indicating all aspects of the attributes of an object. Concretely, time, meteorology and user information are commonly considered influential factors for bike trips in bike sharing systems [2, 15]; therefore a feature model in this framework is a set of features that indicate these attributes of an on-going trip.

For example, the start time field in the trip data implies the exact time when a trip starts, in the format YYYY-mm-dd HH:MM:SS; we tear it apart and transform it into several features like month, hour and minute, each of which can be represented by an independent discrete variable. The temperature field in the meteorology data is a naturally independent continuous variable, so it can be adopted as an off-the-shelf feature without any transformation. Furthermore, categorical fields like usertype (0 for annual subscribers and 1 for daily customers) are transformed by one-hot encoding. After the features in the different data sets are prepared, they are joined according to their corresponding keys into a feature model for an on-going trip. The explicit features and their values in a feature model are listed in Table 2:

Table 2: Features and Their Values of a Feature Model
  month       | {1, 2, ..., 12}, one-hot encoding
  hour        | {0, 1, ..., 23}, one-hot encoding
  minute      | {0, 1, ..., 59}, discrete variable
  weekday     | {1, 2, ..., 7}, one-hot encoding
  holiday     | {0, 1}, one-hot encoding
  usertype    | {0, 1}, one-hot encoding
  age         | [16, 159], continuous variable
  gender      | {0, 1}, one-hot encoding
  weather     | {0, 1, 2, 3}, one-hot encoding
  temperature | [-12.8, 34.4] ℃, continuous variable
  humidity    | [13, 100] %, continuous variable
  windspeed   | [0, 42.6] mph, continuous variable

5.4 Destination Prediction

After feature models and corresponding labels are constructed from the raw data sets, they can be directly applied to train algorithms to make destination predictions for new on-going trips. Many machine learning algorithms have been proposed to solve multi-classification issues [25]. To take several multi-classification algorithms into consideration, kNN (k-Nearest-Neighbors), NB (Naive Bayes), ANN (Artificial Neural Networks) and RF (Random Forests) are selected, and their performance is evaluated before we decide which algorithm makes destination predictions in this framework.

To evaluate these algorithms with the given feature models and corresponding labels, we have to split the given data into training data and validation data. For example, a random 70% of the feature models and corresponding labels are used as training data for the algorithms, and the remaining 30% of the feature models are used as validation data; the predicted results are then validated against the remaining 30% of true labels to measure the performance of the algorithms. However, a random split of training and validation data might not evaluate the performance of different algorithms stably, due to problems like the randomness of the data and overfitting to the training data. Cross-validation is a commonly used model validation technique in machine learning that derives a more stable and accurate estimate of the performance of algorithms [26]. In k-fold cross-validation, the given data is randomly partitioned into k equal-sized subsamples, and we repeat the evaluation k times, each time with k-1 subsamples as training data and the remaining subsample as validation data; the average result of cross-validation is then used to measure the performance of each algorithm. Finally, the algorithm with the best performance under cross-validation is selected for this framework, and it is trained with all the feature models and corresponding labels to make destination predictions for new on-going trips.
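ALGORITHM 1 above can be turned into runnable code. The sketch below is a rough, self-contained Python rendition under simplifying assumptions: a tiny hand-rolled k-means stands in for a library implementation, stations and trips use hypothetical dict layouts, and the per-group cluster counts n_l·k/n are rounded, so the final number of clusters is only approximately k.

```python
import math
import random
from collections import Counter

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means on lists of floats; returns one cluster id per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, min(k, len(points)))]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        for c in range(len(centers)):
            members = [points[i] for i, a in enumerate(assign) if a == c]
            if members:
                centers[c] = [sum(d) / len(members) for d in zip(*members)]
    return assign

def inflow_vectors(ids, trips, assign, k):
    """P(S_i): share of trips arriving at S_i from each current cluster."""
    vecs = []
    for s in ids:
        counts = Counter(assign[t["origin"]] for t in trips if t["dest"] == s)
        total = sum(counts.values()) or 1  # avoid division by zero
        vecs.append([counts[j] / total for j in range(k)])
    return vecs

def csc(coords, trips, k, rounds=3):
    """Sketch of ALGORITHM 1 (CSC): alternate pattern-based clustering
    (k-means on inflow vectors, into ceil(k/2) groups) and location-based
    clustering (k-means on coordinates inside each group, splitting group
    l into roughly n_l * k / n clusters) for a fixed number of rounds."""
    ids = sorted(coords)
    n = len(ids)
    # Line 2: initial location-based clustering into k clusters.
    assign = dict(zip(ids, kmeans([list(coords[s]) for s in ids], k)))
    for _ in range(rounds):
        # Lines 4-9: build inflow vectors, re-cluster into ceil(k/2) groups.
        groups = kmeans(inflow_vectors(ids, trips, assign, k), math.ceil(k / 2))
        # Lines 10-13: location-based clustering within each group.
        assign, offset = {}, 0
        for g in sorted(set(groups)):
            members = [s for s, grp in zip(ids, groups) if grp == g]
            k_g = max(1, round(len(members) * k / n))
            sub = kmeans([list(coords[s]) for s in members], k_g, seed=g)
            for s, c in zip(members, sub):
                assign[s] = offset + c
            offset += k_g
    return assign

# Illustrative toy data: six stations in two spatial groups, a few trips.
coords = {0: (0.0, 0.0), 1: (0.1, 0.0), 2: (0.2, 0.0),
          3: (9.0, 0.0), 4: (9.1, 0.0), 5: (9.2, 0.0)}
trips = [{"origin": 0, "dest": 3}, {"origin": 1, "dest": 4},
         {"origin": 3, "dest": 0}]
clusters = csc(coords, trips, k=2, rounds=2)
```

In practice the trip counts in lines 5-8 would be precomputed once per round from the full trip table rather than rescanned per station, but the control flow is the same.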

trained with all the feature models and their corresponding labels to make destination predictions for new incoming on-going trips.

6. EXPERIMENTS
In this section, experiments are conducted with real-world data sets in NYC through the year of 2017 to evaluate the proposed destination prediction framework. First, we set the experimental data and parameters for this destination prediction framework and define the evaluation metrics. Then we introduce several levels of clustering and several machine learning algorithms for multi-classification used in prediction, and build an evaluation plan on these baselines. A performance comparison of the experimental results and their analysis follows, showing the effectiveness of our clustering method and framework. Finally, we make an extensive analysis of the impact of cluster size in the destination station clustering phase.

6.1 Experimental Settings
In the destination station clustering phase, we have to manually select the number of destination clusters for the clustering algorithms to work. As mentioned in the discussion of clustering techniques in Section 1, there is a trade-off between performance and practicality when deciding the number of clusters. After analyzing the distribution of trips with different distances in Figure 4, we find that trip distances vary from 0 to 8000 meters, with an average distance of about 1700 meters. After several quick experiments with different numbers of destination clusters in k-means, we choose 40 as the number of destination clusters. Concretely, all MDISs of these 40 destination clusters are a little less than the average trip distance: the maximum MDIS of the 40 clusters is 1687 meters, while the average trip distance is about 1700 meters.

6.2 Evaluation Plan
In the destination station clustering phase, our framework is evaluated with three levels of clustering: 1) Station-level clustering (SLC), where each destination station is exactly one destination cluster; 2) Geographic clustering (GC), which simply uses k-means to group stations by geographic coordinates only, generating 40 clusters; 3) our CSC method, which considers both the geographic location and the usage pattern of stations, also generating 40 clusters.

In the destination prediction phase, we compare four machine learning algorithms for multi-classification and briefly explain how they are applied to the destination prediction problem:

• kNN: k-Nearest-Neighbors is a simple and fundamental classification algorithm in which an object is classified according to its neighbors' class memberships [27]. Concretely, for a new incoming feature model, we find its nearest feature models among those in the same hour of weekday and the same weather, and predict its destination cluster as the mode of the destination cluster labels of those nearest feature models.

• NB: Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem, with the assumption that features are independent of each other [28]. The destination cluster of a new incoming feature model X = [x_1, x_2, x_3, …, x_m] is predicted by selecting the most probable cluster among {C_1, C_2, C_3, …, C_k} (also known as MAP estimation):

    Ŷ = arg max_{i ∈ {1, …, k}} P(C_i) P(X | C_i),  where  P(X | C_i) = ∏_{j=1}^{m} P(x_j | C_i)

• ANN: Artificial Neural Networks are inspired by the biological neural networks in brains, and can learn not only simple linear functions but also complex function approximations [29]. An ANN consists of multiple layers of connected nodes, which receive signals and then process and pass signals on to adjacent nodes, just like neurons, as shown in Figure 5. For a new incoming feature model X = [x_1, x_2, x_3, …, x_m] input at the feature layer, signals pass through the hidden layers to the label layer, and a softmax function outputs the prediction of the destination cluster Ŷ.
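The kNN prediction rule described above (the mode of the nearest feature models' destination cluster labels) can be sketched as follows. The feature vectors and cluster labels here are hypothetical toy data, and the brute-force distance computation stands in for whatever nearest-neighbor search a real implementation (e.g. scikit-learn's KNeighborsClassifier) would use:

```python
from collections import Counter
import math

def knn_predict(query, trips, labels, k=5):
    """Predict a destination cluster as the mode of the k nearest trips' labels."""
    # Euclidean distance between feature vectors of equal length.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Rank historical trips by distance to the query feature model.
    ranked = sorted(range(len(trips)), key=lambda i: dist(query, trips[i]))
    # Mode of the destination cluster labels of the k nearest neighbors.
    top_k = [labels[i] for i in ranked[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Hypothetical feature models (already filtered to the same hour/weather)
# with destination cluster labels 3 and 7.
trips = [[0.0, 1.0], [0.1, 0.9], [5.0, 5.0], [4.9, 5.2], [0.2, 1.1]]
labels = [3, 3, 7, 7, 3]
print(knn_predict([0.05, 1.0], trips, labels, k=3))  # → 3
```

In the paper's setting, the candidate set is first restricted to trips sharing the query's hour of weekday and weather, and the cluster labels come from the CSC clustering.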

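The NB MAP rule above can be sketched for categorical features. The trip data below are hypothetical toy examples, and the log-domain computation with Laplace smoothing is a standard implementation detail (to avoid underflow and zero counts) that the formula itself does not spell out:

```python
import math
from collections import Counter, defaultdict

def nb_train(X, y):
    """Estimate class priors P(C_i) and per-feature value counts for P(x_j | C_i)."""
    priors = Counter(y)
    cond = defaultdict(Counter)  # (class label, feature index) -> value counts
    for features, label in zip(X, y):
        for j, value in enumerate(features):
            cond[(label, j)][value] += 1
    return priors, cond

def nb_predict(x, priors, cond, alpha=1.0):
    """Return the class maximising log P(C_i) + sum_j log P(x_j | C_i)."""
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for c, count in priors.items():
        score = math.log(count / total)
        for j, value in enumerate(x):
            counts = cond[(c, j)]
            # Laplace smoothing; each toy feature has 2 possible values, hence "alpha * 2".
            score += math.log((counts[value] + alpha) / (sum(counts.values()) + alpha * 2))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical trips: (time of day, weather) -> destination cluster label.
X = [["am", "sun"], ["am", "rain"], ["pm", "sun"], ["pm", "sun"]]
y = ["A", "A", "B", "B"]
priors, cond = nb_train(X, y)
print(nb_predict(["am", "sun"], priors, cond))  # → A
```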
Figure 4: Distribution of Bike Trips with Different Distances.

In the destination prediction phase, we apply 5-fold cross validation to all algorithms to measure their performance in an accurate and efficient way. The performance of the different algorithms is evaluated on the destination prediction results by Accuracy, a relatively strict metric for multi-classification problems:

    Accuracy(Ŷ, Y) = (1/n) Σ_{i=1}^{n} 1(Ŷ_i = Y_i)

where Y_i is the true destination cluster label of the i-th feature model, Ŷ_i is the predicted destination cluster of this feature model, and 1(x) is the indicator function.

Figure 5: ANN with One Hidden Layer.

• RF: Random Forest is an ensemble learning method based on multiple decision trees, where each tree describes a decision process learned from data to make predictions [30, 31]. Each decision tree in RF is constructed using the random subspace method and random feature selection based on Breiman's "bagging" idea, producing a collection of decision trees with relatively low variance; RF can therefore overcome the overfitting problem of a single decision tree. After every decision tree makes its own prediction of the destination cluster of a new incoming feature model, RF takes the mode of these predictions as the prediction result.

All four multi-classification algorithms are combined with all three levels of clustering to constitute the different baselines. We evaluate these baselines both in Accuracy and, where applicable, in time consumption, to decide the best multi-classification algorithm for destination prediction in our framework.

6.3 Performance Comparison and Analysis
We conduct destination prediction experiments on every origin station separately, so there is an Accuracy score for every origin station, excluding those with too few trips (stations with fewer than 70 trips through the year of 2017). We evaluate the baselines by the average Accuracy over all origin stations to intuitively compare the performance of the different levels of clustering and the different multi-classification algorithms in Table 3. We draw the following observations and analysis from the experiments:

Table 3: Average Accuracy with Different Baselines

  Clustering \ Prediction |  kNN  |  NB  |  ANN  |  RF
  SLC                     |  4.6% | 4.0% |   —   | 14.3%
  GC (40)                 | 23.1% | 7.3% | 19.8% | 30.9%
  CSC (40)                | 28.1% | 8.7% | 24.7% | 39.3%

• All experiments are conducted on a high-performance server with an 8-core 2.6GHz CPU and 64GB RAM allocated. kNN and NB took about 3 hours and 8 minutes, respectively, to finish each experiment, regardless of the clustering method. ANN took about 4 hours to finish each experiment with GC or CSC clustering, but failed to complete the experiment with SLC due to its time complexity. RF was also affected by this issue and took about 2 hours to finish the experiment with SLC, while it took only about half an hour to finish each experiment with GC or CSC.

• The results with GC or CSC clearly outperform the results with SLC for all applicable algorithms. This is an obvious result, as expected, since GC and CSC significantly reduce the number of destination clusters compared with SLC, which makes prediction easier.

• The results with CSC outperform the results with GC for all algorithms. RF shows the largest improvement, gaining 8.3% in average Accuracy, from 30.9% with GC to 39.2% with CSC. This indicates that our CSC method is more effective than the commonly used GC, because it clusters stations that are similar not only in geographic location but also in usage pattern.

• RF is selected in our framework to make the destination prediction, since RF with CSC outperforms all baselines, reaching an average of 39.2% Accuracy with 40 destination clusters. Besides, RF achieves the best result with SLC, reaching an average of 14.3% Accuracy with more than 800 destination clusters. Excluding NB, whose results are far worse, RF is also the fastest algorithm due to its parallel characteristics.

Then we make a detailed comparison of the distribution of stations by the accuracy predicted by RF, using GC and CSC respectively, in Figure 6. Clearly, there are many more origin stations with CSC (green bars) than with GC (blue bars) in the higher-accuracy intervals (e.g. [0.3, 0.35], [0.35, 0.4], etc.). The overall improvement in the Accuracy of origin stations indicates the effectiveness of CSC in our framework for the destination prediction problem in bike sharing systems.

Figure 6: Distribution of Origin Stations with Different Accuracy Predicted by RF.

6.4 Impact of Cluster Number
We conduct experiments to analyze the impact of the cluster number on the MDIS of clusters and on the Accuracy of destination prediction. We evaluate our destination prediction framework from 20 to 60 clusters generated by CSC, and the results are shown in Figure 7(a). As the number of clusters increases from 20 to 60, the prediction Accuracy decreases steadily from 48.5% to 29.2%, as expected, indicating that a smaller number of clusters clearly leads to higher prediction Accuracy. At the same time, the average MDIS of the clusters also goes down as the number of clusters increases, which makes the clusters more practical and applicable for bike sharing systems. However, the average MDIS tends to level off between 2500 and 2000 meters once the number of clusters exceeds 40. The clustering result of CSC with 40 clusters, visualized on the map of NYC, is shown in Figure 7(b); bike stations with the same color belong to the same cluster. Generally, bike stations are grouped by geographic location, while those surrounded by another cluster are assigned to their own cluster due to their usage pattern.

Figure 7: (a) Average MDIS and Average Accuracy with Different Cluster Numbers. (b) Clustering Result of CSC with 40 Clusters in NYC.

7. CONCLUSION
In this paper, we propose a cluster-based framework to predict destinations for on-going trips in bike sharing systems. To reduce the number of destinations in this multi-classification problem, we propose a CSC method that groups stations with similar geographic location and usage pattern into destination clusters. We evaluate our framework with three levels of clustering combined with four machine learning algorithms for multi-classification, using feature models constructed from one-year data sets in NYC. Results show that our framework with CSC and RF outperforms the other baselines and achieves 39.2% average Accuracy with 40 destination clusters, confirming the effectiveness of our CSC method and our destination prediction framework.

There are many issues worth studying about destination prediction in bike sharing systems. First, more influential factors, such as surrounding transportation stations, can be taken into consideration in the feature model to improve the accuracy of destination prediction. Second, a deeper analysis of the bike stations in destination clusters using different clustering algorithms might reveal potential relations between stations and help to establish new stations. Finally, we plan to extend our experiments to other bike sharing systems, and even to bike sharing systems without stations, to evaluate and optimize our destination prediction framework.

8. ACKNOWLEDGMENTS
This work is supported by the High-performance Computing Platform of Peking University. Associate Professor Huiping Lin is the corresponding author of this paper.

9. REFERENCES
[1] Shaheen, S., Guzman, S., & Zhang, H. (2010). Bikesharing in Europe, the Americas, and Asia: past, present, and future. Transportation Research Record: Journal of the Transportation Research Board, (2143), 159-167.
[2] Froehlich, J., Neumann, J., & Oliver, N. (2009, July). Sensing and predicting the pulse of the city through shared bicycling. In IJCAI (Vol. 9, pp. 1420-1426).
[3] Chen, L., Zhang, D., Wang, L., Yang, D., Ma, X., Li, S., ... & Jakubowicz, J. (2016, September). Dynamic cluster-based over-demand prediction in bike sharing systems. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (pp. 841-852). ACM.
[4] Liu, J., Sun, L., Chen, W., & Xiong, H. (2016, August). Rebalancing bike sharing systems: A multi-source data smart optimization. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1005-1014). ACM.
[5] Hu, J., Yang, Z., Shu, Y., Cheng, P., & Chen, J. (2017, November). Data-Driven Utilization-Aware Trip Advisor for Bike-Sharing Systems. In Data Mining (ICDM), 2017 IEEE International Conference on (pp. 167-176). IEEE.
[6] Singla, A., Santoni, M., Bartók, G., Mukerji, P., Meenen, M., & Krause, A. (2015, January). Incentivizing Users for Balancing Bike Sharing Systems. In AAAI (pp. 723-729).
[7] Bacciu, D., Carta, A., Gnesi, S., & Semini, L. (2017). An experience in using machine learning for short-term predictions in smart transportation systems. Journal of Logical and Algebraic Methods in Programming, 87, 52-66.
[8] Zhang, J., Pan, X., Li, M., & Philip, S. Y. (2016, June). Bicycle-sharing system analysis and trip prediction. In Mobile Data Management (MDM), 2016 17th IEEE International Conference on (Vol. 1, pp. 174-179). IEEE.
[9] Ashbrook, D., & Starner, T. (2003). Using GPS to learn significant locations and predict movement across multiple users. Personal and Ubiquitous Computing, 7(5), 275-286.
[10] Xu, M., Wang, D., & Li, J. (2016, September). DESTPRE: a data-driven approach to destination prediction for taxi rides. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (pp. 729-739). ACM.
[11] Turner, C. R., Fuggetta, A., Lavazza, L., & Wolf, A. L. (1999). A conceptual basis for feature engineering. Journal of Systems and Software, 49(1), 3-15.
[12] Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys (CSUR), 31(3), 264-323.
[13] Etienne, C., & Latifa, O. (2014). Model-based count series clustering for bike sharing system usage mining: a case study with the Vélib' system of Paris. ACM Transactions on Intelligent Systems and Technology (TIST), 5(3), 39.
[14] DeMaio, P. (2009). Bike-sharing: History, impacts, models of provision, and future. Journal of Public Transportation, 12(4), 3.
[15] Wang, W. (2016). Forecasting Bike Rental Demand Using New York Citi Bike Data.
[16] Zeng, M., Yu, T., Wang, X., Su, V., Nguyen, L. T., & Mengshoel, O. J. (2016). Improving Demand Prediction in Bike Sharing System by Learning Global Features. Machine Learning for Large Scale Transportation Systems (LSTS) @ KDD-16.
[17] Chemla, D., Meunier, F., & Calvo, R. W. (2013). Bike sharing systems: Solving the static rebalancing problem. Discrete Optimization, 10(2), 120-146.
[18] Xue, A. Y., Zhang, R., Zheng, Y., Xie, X., Huang, J., & Xu, Z. (2013, April). Destination prediction by sub-trajectory synthesis and privacy protection against such prediction. In Data Engineering (ICDE), 2013 IEEE 29th International Conference on (pp. 254-265). IEEE.
[19] Krumm, J., & Horvitz, E. (2006, September). Predestination: Inferring destinations from partial trajectories. In International Conference on Ubiquitous Computing (pp. 243-260). Springer, Heidelberg.
[20] Alvarez-Garcia, J. A., Ortega, J. A., Gonzalez-Abril, L., & Velasco, F. (2010). Trip destination prediction based on past GPS log using a Hidden Markov Model. Expert Systems with Applications, 37(12), 8166-8171.
[21] Nadembega, A., Taleb, T., & Hafid, A. (2012, June). A destination prediction model based on historical data, contextual knowledge and spatial conceptual maps. In Communications (ICC), 2012 IEEE International Conference on (pp. 1416-1420). IEEE.
[22] Citi Bike Inc. (2018). Citi Bike System Data. http://www.citibikenyc.com/system-data.
[23] Weather Underground Inc. (2018). Weather API for New York in USA. https://www.wunderground.com/weather/api/.
[24] Office Holidays Inc. (2018). Public Holidays for New York in USA. https://www.officeholidays.com/countries/usa/index.php.
[25] Aly, M. (2005). Survey on multiclass classification methods. Neural Netw, 19, 1-9.
[26] Zhang, P. (1993). Model selection via multifold cross validation. The Annals of Statistics, 299-313.
[27] Keller, J. M., Gray, M. R., & Givens, J. A. (1985). A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems, Man, and Cybernetics, (4), 580-585.
[28] Rish, I. (2001, August). An empirical study of the naive Bayes classifier. In IJCAI 2001 workshop on empirical methods in artificial intelligence (Vol. 3, No. 22, pp. 41-46). IBM.
[29] Zhang, G., Patuwo, B. E., & Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14(1), 35-62.
[30] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
[31] Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.
