Trans-Sense: Real Time Transportation Schedule Estimation Using Smart Phones
Ali AbdelAziz* Amin Shoukry Walid Gomaa Moustafa Youssef Egypt-Japan University Egypt-Japan University Egypt-Japan University Alexandria University of Science and Technology of Science and Technology of Science and Technology Alexandria, Egypt. and Aswan University. and Alexandria University. and Alexandria University. [email protected] [email protected] [email protected] [email protected]
Abstract—Developing countries suffer from traffic congestion, time waiting (either for getting on a vehicle or In-Vehicle poorly planned road/rail networks, and lack of access to public Time (IVT)) to the scheduled time [11]. The second, adopted transportation facilities. This context results in an increase in fuel by transportation models, assumes that average waiting times consumption, pollution level, monetary losses, massive delays, and are half the service headway given random passenger arrivals less productivity. On the other hand, it has a negative impact [9], [10]. The definition adopted in this paper is the first one. on the commuters feelings and moods. Availability of real-time The total time spent in a trip can be estimated based on the transit information - by providing public transportation vehicles locations using GPS devices - helps in estimating a passenger’s access, regress, waiting and IVT times [12]. Recently, many waiting time and addressing the above issues. However, such transit agencies are established to provide a suitable public solution is expensive for developing countries. This paper aims transportation service and take advantage of the increasingly at designing and implementing a crowd-sourced mobile phones- high-amenity transit stations and stops to mitigate the burden based solution to estimate the expected waiting time of a pas- of wasted waiting time. Also, there are many facilities on senger in public transit systems, the prediction of the remaining social media that provide train tracking services using manual time to get on/off a vehicle, and to construct a real time public input from users. Nonetheless, these systems depend on either transit schedule. data available by the service providers, dedicated devices Trans-Sense has been evaluated using real data collected for attached to the transportation vehicles, and/or manual user over 800 hours, on a daily basis, by different Android phones, and input. All limit their deployment, especially in develipping using different light rail transit lines at different time spans. The countries. results show that Trans-Sense can achieve an average recall and precision of 95.35% and 90.1%, respectively, in discriminating lightrail stations. Moreover, the empirical distributions governing In this paper, we present Trans-Sense, a system that takes the different time delays affecting a passenger’s total trip time advantage of the ubiquitous devices available with the com- enable predicting the right time of arrival of a passenger to her muters, while avoiding expensive solutions based on GPS ded- destination with an accuracy of 91.81%. In addition, the system icated devices attached to public transit vehicles. We conduct estimates the stations dimensions with an accuracy of 95.71%. a case study of an LRT system that takes into consideration a uniquely systematic perspective, including a wide range of I.INTRODUCTION stations and tram lines types, different seasons, times within the day and with many users equipped with various mobile Using public transportation means such as light rail transit phones. Our results show the effectivness of the proposed (LRT), tram, bus, or metro is a normal daily activity for work technique in capturing different timing haractersitics of the or leisure. However, traveling time is usually considered as a trasit system dynamics. wasteful time and has a negative impact on the commuters’ feelings [1], [2]. Often, passengers engage in some activities This paper is organized as follows. Section II gives a brief to make their transportation/commuting time more productive. introduction about the investigated tramway system as well as Normally, public transportation follows a fixed schedule that an overview of the proposed Trans-Sense system. Section III arXiv:1906.07575v1 [cs.CY] 14 Jun 2019 maybe disturbed for many reasons including unpredictable describes the five main components of Trans-Sense. Finally, traffic problems; especially in developing countries; resulting the paper is concluded in Section V. in massive delays, high fuel wastage, wasted human resources, and finally monetary and economical losses. These obstacles II.PROPOSED SYSTEM affect the competitiveness of public transit, which is more eco friendly than private vehicles [3], [4]. A. Alexandria Tram System A growing interest in analyzing traffic problems in Alexandria tramway started in 1860. It is the oldest in developing countries has been witnessed over the years [5]– Egypt and Africa, and is among the oldest in the world. [8]. Systems like [9], [10] are among the recent studies Figure 1 and Table I show the Ramleh Tram System, one of that attempted to estimate passengers’ waiting time Alexandria LRT systems. It runs from the east to the west of distributions and perceptions, depending on service and Alexandria (from Victoria to Ramleh station) and vice versa. It stations characteristics. There are many definitions of the includes 39 stations. This system includes four different lines waiting time. The first expresses it as the ratio of the actual (four colors/Identifiers ID) that share the same rail in some parts of their journey and split into different rails in other *Electrical Engineering department, Aswan faculty of Engineering, Aswan parts. In several locations, cars are allowed to cross the tram University, Egypt, 81542. rails. Therefore, a tram driver has to stop until it is safe to pass w , w ws1 wseg12, wseg21 ws2 lg21 lg12 wf1 wlg13, wlg31 ws3
TABLE I: Different stations names and their refrences. (a) ws1 ws2 wf1 ws3
Ref. Station Name Ref. Station Name wsg12 wlg21 wlg13 Station 1 Station 2 Traffic 1 Station 3 S 1 Ramleh S 21 Gleem wsg21 wlg12 wlg3 S 2 Al-Qa’ed Ibrahim S 22 Fonoon Gamila (b) S 3 Azarita S 23 Zizinia S 4 Soter S 24 San Stefano Fig. 2: (a) Waiting time between 3 successive stations. The S 5 Shoban S 25 Thrwat route is direct between the first pair of stations while it includes S 6 Shatbi S 26 Louran a traffic light between the next pair. (b) corresponding state S 7 Game3a S 27 AlSraya diagram. S 8 Camb S 28 Sidi Bishr S 9 Ibrahemya S 29 El Seyoof time. At peak times a tram can be waiting at a station while S 10 Al-Riada Al-Sughra S 30 Victoria another tram, directly behind the first, is waiting to enter the S 11 Al-Riada Al-Kubra S 31 Zananeri same station. The waiting time of the second tram in this case S 12 Kliopatra Al-Sughra S 32 SidiGaber St. is the buffering waiting time. S 13 Kliopatra Hamammat S 33 Wezara S 14 SidiGaber Sheikh S 34 Felming Segment time Ehwsgi, is the average time taken by a tram to S 15 Moustafa Kamel S 35 Bakous travel from a given station directly to the next station (there S 16 Mahfouz S 36 Safr are no cross roads or traffic lights between these two stations). S 17 Roshdi S 37 Shods Leg time Ehwlgi, is the average time taken by a tram to travel S 18 Bokla S 38 Genakelese from a station/traffic signal to the next station/traffic signal on S 19 Hedaya S 39 Genakelese2 its route. This is for stations separated by traffic lights. S 20 Sapa Basha Trip time EhwT i, is the average time for a passenger to travel from a source to a destination. It is an aggregation of multiple segments time, stations delays, traffic delays, legs time and or a traffic light is green. Because most traffic lines are human segments time as in the following equation: controlled, the waiting time at these locations is unpredictable. N N−1 M With the continuous increase in the population in Alexandria, X X X EhwT i = ( ( (Ehwsi i + Ehwbfij i + Ehwsgij i migration from rural areas, the impact of the heavy traffic on (1) trams schedule is considerable. i=1 j=i+1 k=0 + Ehwlgik i + Ehwfk i + Ehwlgkj i))) where, i is the current station, j is the next station, k current S_1 traffic light, N number of stations from source to destination, M is the number of traffic light signals. LRT Line Numbers (ID’s)
Fig. 1: The investigated tram system. Two spliting areas can B. Data Collection. be identified that correlate with tram IDs. The data collection took place with different mobile devices including, Samsung Galaxy S, Samsung Note 4, HTC M9 plus, HTC E9 plus and Google Nexus. Hundred thousands of data Whenever a tram moves from a station to the next one, one traces have been collected from different users at different of the following scenarios is expected (Figure 2): times and seasons for about 18 months. 1) The path is direct and cannot be interrupted. 2) A cross road lies next to the starting station. An extra C. System Overview waiting time is required until it is safe to move. (e.g., In this section, we present the architecture of the Trans- stations S 5 and S 9 stations in Figure 8). Sense system. It consists of three different subsystems as 3) At least one cross-road controlled either by a human shown in Figure 3: the MonoSense, R-Sense and Railway- and/or traffic light interrupts the path. Hence, an extra Sense systems. Trans-Sense relies on crowd sensing approach delay time is expected. to collect and accumulate data sent from the smart phones’ sensors; carried by different users; to a service running on The following terminology is needed in the rest of this the cloud. The temporal and spatial information extracted paper: from the users are correlated with their speeds and loca- Mandatory stop, is a tram stop at a station. tions. The system starts by collecting a stream of GPS data Potential stop, is a possible tram stop at a traffic light. (d = d0; d1; ··· ; dt; ··· ), where each dt is an ordered pair All the following time dependent parameters are estimated as (Lat ; Long ) representing a user latitude and longitude at time hxi t t averages/ expectations, the operator E denotes the expec- t. The input data stream is first filtered and smoothed, then it tation operator. is passed to MonoSense. Traffic delay Ehwf i, is the average tram waiting time at a traffic signal. The MonoSense transportation mode detection system [13], Station delay Ehwsi, is the average tram waiting time at a [14] differentiates between walking, riding and driving users. station. If the sensed data represents a moving person/ rider, it is then Tram buffering delay Ehwbf i, is the average tram buffering passed to R-Sense. (unusual Tram ID Tram Time Arrival Current speed - - - is a well-known ’multipath effect’ Tram ID Detection ] σ
Route Direction +3 µ , σ
Current Estimate of Time Dbscan -3 Tram ID & Direction & ID Tram Component Detection
with Time History µ ,[ Backtracking Fuse Times mean Total
Speed Trip Time . It uses the empirical three sigma rule of : specifies the size (radius) of the (circular) Station : specifies the minimum number of points in History Component History Bootstrapping Bootstrapping History Time History Time Leg Time Traffic Time Traffic Waiting Time Time History Time Inside Segment Time Segment Transit Schedule Estimation) Schedule Transit Dynamic Time Estimation Component Estimation Time Dynamic ( MinPts the neighborhoodincluded of in a a cluster. given pointEpsilon in orderneighborhood. to be [23]–[25] as shown in Figure 5. 1 Sense - Moreover, misleading readings are due to The filtering phase uses a smoothing filter with non- Station Locations • • The filtered data (98.65% of the raw data) is then passed The main function of this component is to extract the It states that for a normal distribution 99.7% of the data lie within a range
Database Unit 1 of three standard deviations of the [20]–[22], where the direct pathFor to the the above GPS receiver reasons, islead spikes blocked. and to spurious substantial changeshandle deviations occur these and in errors,coarse we the apply outliar measured three removal locations.and different duplicates (Phase To filtering removal (Phase phases: 1), 3).outliar The noise data coarse filtering that filtering removes identifiable is (Phase values) significantly 2), differentthumb than other overlapping windows todata smooth the points data intotering and groups clusters algorithm using similar (DBSCAN) the density [26]. based spatial clus- clustering algorithm thatincluding the requires number of only clusters: two parameters not Finally, dissimilar remaining pointsobtained are clusters, removed. therepoints Among at are the the same clusters locationduplicate with that entries lower variance. that have They are represent multiple removed as data shown in Figureto 5 (b,d). the stationbetween location stations detection andbehavior. component traffic to lights according discriminate to passengers’ B. Stations Locations Detection Component. semantics hidden inoccupies the a filtered GPSnumber relatively points. of large Sincesame GPS area, a measurements. physical it station GPS GPS location, corresponds characteristics measurements, can to [22]in at or vary a receiving the the by large capabilities GPSTherefore, (differences meters we receiver cluster either these accuracyplaces data due into correspond among meaningful to to places. the either These sensing stations devices). (at which passengers are Station Location Station Railway system architecture Platform Length Length Platform Estimation Unit , i s
Station Location Platform the on Population
Detection Unit w h E Stationary points Current Tram Location, Stations history Stations Location, Tram Current
Detection Unit Trans-Sense Station Location Detection Component Detection Location Station ) (for a user
Spatial & Temporal Outlier Removal sub-system. The SV , then passed to Fig. 3: are detailed in the OMPONENTS [15], [16]. Specifically, C Temporal outliars are the Spatial data contains noise and Driving Sense YSTEM Railway-Sense Walking - Railway-Sense Stationary MonoSense (Bus/ (Bus/ Cars) R S system architecture, is described. (Tram/Train) Rail Transport Rail Transport Railway-Sense Road Road Transport Transport Mode Detection) Mode Transport ( Rail/ Road Transport Detection) Transport Road Rail/ (e.g. glitches) ( that correspond to a tram waiting at a system estimates and predicts the ex-
Pre-Processing- Smoothing- Filtering i lg operation is given below: Figure 3 shows w ) (for a user riding a tram, when will she h . system identifies whether a rider is on board E , VV Railway-Sense i Railway-Sense sg Client w h output is passed to Sensor III. Railway-Sense R-Sense E Sampling , Sense - i f Railway-Sense 2) Spatial outliars removal: The main goal of this component is to reduce the noise and 1) Temporal outliars removal: The five components of The The Railway-Sense w h Trans outliars mainly due torate the position urban information canyonerrors effect and are or/ more inaccu- often[19]; when where the GPS aamong signal different user bounces nearby off areas reading [17]– as changes shown in vastly Figures back 4, 5. and forth E station, a trafficdue delay, segment to many and reasons leg such times. as Outliars accidents occur or tram failure delays. outliars in the waiting time in the following parameters remove spatial and temporalmeasurements. ouliars in the input raw sensors’ A. Preprocessing Component. following subsections. waiting inVehicle-View a ( station,arrive when at her will destimation). Answeringthe the these estimation/ queries prediction necessitates next ofvehicle. the tram The remaining history time arrive) to componentused and get (described to on/off in a aggregate Section previous III-D)ID, riders’ is direction traces detection over componentis time. (described finally The in Section Tram used III-E) reflects to its identify the lineponent tram number). of direction In the and the its following ID section, ( each that com- of a rail transportationto vehicle or not. If yes, the data is passed pected arrival times ofthe the LRT users.the detailed A components brief of explanation the R-Sense of the station locationdiscriminate stations’ detection from componentSpeed traffic lights’ (Section Estimation stops. III-B) Theanswers component to Dynamic two (described basicusers’ in vehicle locations Section( scheduling and III-C) queries views: based Station-View on ( the
(a) Original data including noise (b) Data after phase1 (c) Data after phase2 (d) Data after phase3 Fig. 4: Distribution of GPS data recieved from Tram users before and after the three preprocessing phases
E9PlusOrange Data With Noise 31.3
31.25
31.2
31.15
31.1
31.05 Latitude
31
30.95
30.9 data1 30.85 29.55 29.6 29.65 29.7 29.75 29.8 29.85 29.9 29.95 30 Longitude (a) (b) (c) (d)
E9PlusOrange Data With Noise Removed Noise Phase2, DBSCAN Clustering Removed Noise Phase3, DBSCAN Clustering E9PlusOrangeRemoved Noise Phase1 ( IQR ) ( = 0.0005, Noise = 1.0868 %, MinPts = 300) ( = 0.0005, Noise = 1.9345 %, MinPts = 300) 18000 18000 18000 18000 16000 16000 16000 16000 14000 15000 14000 14000 14000 15000 15000 15000 12000 12000 12000 12000 10000 10000 10000 10000 10000 10000 10000 10000 5000 8000 5000 8000 5000 8000 5000 8000 6000 0 6000 6000 6000 0 0 0 4000 31.2 4000 4000 4000 31.24 31.1 29.9 2000 29.98 31.24 29.98 31.24 29.98 29.96 2000 29.96 2000 29.96 2000 31 29.8 31.22 29.7 29.94 31.22 29.94 31.22 29.94 30.9 0 Latitude 29.6 Longitude 31.2 29.92 0 29.92 0 29.92 0 Latitude Longitude Latitude 31.2 Longitude Latitude 31.2 Longitude (e) (f) (g) (h) Fig. 5: Data in different phases: (a) Original Data with noise (b) After applying Phase 1 (c) After applying Phase 2 (d) After applying Phase 3 (e), (f) (g) (h) Histograms of original data, after Phase 1, after Phase 2 and after Phase 3, respectively.
waiting for a tram ) or moving/ stationary trams. Stationary Minpts*, Epsilon* and DT* that achieve maximum TPR (True trams are those waiting at stations or traffic lights. In order to Positive Rate)= 0.9535 and minimum FPR (False Positive discriminate between these two tram states, we make use of Rate)= 0.2432 and DT= 0.0003 have been recorded and listed the travel environment as well as passengers behavior. in Table II.
1) Stationary Points Detection Unit: The main purpose Parameter/Metric Value of this unit is the automatic detection of the stationary points MinPts 100 (detection of passengers’ inside a stand still tram that is wait- Epsilon 0.0002◦ Distance Threshold DT 0.0003◦ ing at a station or traffic light). The unit starts by the stream TPR (True Positive Rate) 0.9535 of preprocessed random points. These points are clustered FPR (False Positive Rate) 0.2432 into stationary or moving points. Points that have nearly zero Precision 90.1% Recall 95.35% speed are considered as stationary points. Further, stationary points are clustered into either stations or traffic lights. We Note that , Dt are in degrees: 1◦ ≈ 111 km (110.57 eql 111.70 polar). have used Dbscan to detect the stationary points. The optimal 10 ≈ 1.85 km (= 1 nm) 0.01◦ ≈ 1.11 km. parameters Minpts* and Epsilon* for the collected data have 100 ≈ 30.9 m 0.0001◦ ≈ 11.1 m. been found based on ground truth data (the true locations of . the tram stations and traffic lights) and repetitive application TABLE II: Optimal clustering parameters and corresponding of the Dbscan algorithm with a range of values for Minpts performance metrics and Epsilon. The centroids of the found clusters are calculated and matched to the true centroids. If a matching is less than a small distance threshold (DT), it is considered as a positive 2) Station Location Detection Unit.: The function of hit. Figure 6 illustrates several ROC curves corresponding to this component is to discriminate between stations and traffic different trials of Dbscan algorithm (and different values of lights based on either temporal or spatial methods. From the Minpts, Epsilon, DT). From Figure 6, the optimal parameters temporal point of view, the real data shows that, in general, DBSCAN Clustering ( = 0.0001, MinPts = 60) ROC Space, Distance Threshold= 31.229 * Station Location (Ground Truth) S __ 17 Station 0.0003 1 31.2285 Waiting before S __ 17 Station 0.9
0.8 31.228
0.7
31.2275 0.6 Latitude
0.5 S __ 16 Station 31.227 0.4
31.2265
(TPR or Spensitivity) or (TPR 0.3
0.2 31.226 29.9475 29.948 29.9485 29.949 29.9495 29.95 29.9505 29.951 29.9515 0.1 Longitude
0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Fig. 7: The red box denotes a tram waiting at a traffic light FPR or (1-Specificity) between S 16 and S 17 stations (blue boxes) Fig. 6: Optimal Clustering Parameters
The minimum bounding rectangle (MBR) of the clustered points representing a station, is used to estimate the platform the magnitude of Ehwsi is greater than Ehwf i. Figure 8 shows length. The length of the larger side of the bounding rectangle stations’ and traffic lights’ samples waiting times (Ehwsi, represents the platform length. Given that (Long1, Lat1) and Ehwf i) of trams in one of the tram lines. At S 5 station (Long2, Lat2) are the coordinates of the end points of the one may note that the waiting times are, in general, longer larger side, Eq (2) shows how to calculate a station platform than other stations because all types of waiting times (Ehwsi, length(D) using Haversine Formula: [30]. Ehwf i and Ehwbf i) usually occur at this station. From the spatial point of view, both stations and traffic lights (Dlong,Dlat) =(Long2, Lat2) − (Long1, Lat1) D are places of frequent stops. However, the frequency of stops A = cos(Lat ) ∗ cos(Lat ) ∗ (sin( long ))2 at stations is considerably larger than the stops at traffic lights. 1 2 2 Moreover, stations and traffic lights can be differentiated based D + (sin( lat ))2 on the passengers in/out-flow patterns to/from the trams, at 2√ these locations. Commuters engage in different activities while C =2 ∗ atan2( A, p(1 − A)) getting on/off trams such as moving up/down, standing, sitting, turning, etc... These activities can discriminate between traffic D =R ∗ C(where R is the radius of the Earth) lights and stations. Since travelers are expected to get off/ on (2) (from/ to) a tram only at stations while remaining relatively We could correctly estimate stations’ platforms lengths with motionless at traffic lights, the corresponding passengers’ an accuracy of 95.7%. For example, the true platform length activity patterns can discriminate between these two situations. of S 17 station is 70 meters, while the estimated length is 67 These activity patterns [27], [28] (illustrated in Table III) can meters. As a result, we can integrate new stations to Google be recognized using either dedicated wearable sensors or the maps and other similar services. ubiquitous sensors in smart phones via pervasive computing [29]. 4) Stations locations database unit.: The function of this unit is to store stations’ locations (Long, Lat) and dimensions Activity Activity Description and add new stations directly after finding them from previous Standing, walking, turning Left or right, walking, climbing components. up stairs, turning left or right, walking again till reaching Getting on a Tram 1 seat, turning left or right 90◦ or 180◦, seating down, sitting for long time. C. Dynamic Time Estimation Component. Sitting, standing up, turning left or right 90◦ or 180◦, Getting off a Tram2 walking till reaching the door, turning left or right to be in front of the stairs, getting down, walking on the ground. This component estimates the expectation of each of the 1 Note that Getting On Sequence will start from steady standing in the station. random parameters defined in Eq 1. Towards this goal the 2 Note that Getting Off Sequence will start from steady sitting state in the Tram. distribution governing each parameter is plotted and its ex- TABLE III: Getting on/off passengers’ activity patterns. pectation is calculated. For example, Figure 8 plots samples of the waiting time (Ehwsi) for different stations for many trips. A GPS sample observation is identified as belonging to a passenger in a tram waiting at a station if the spatial 3) Platform Length Estimation Unit.: The function of this unit is to estimate the platform length after splitting 1This formula does not take into consideration the (ellipsoidal) shape of the Earth. It will tend to overestimate trans-polar distances and underesti- stations from traffic lights using the previous unit. Platform mate trans-equatorial distances. The values used for the radius of the Earth length detection identifies whether the tram reached a station (3961miles, 6373km) are optimized for locations around 39 degrees from or just buffering outside the station, as shown in Figure 7. the equator (roughly the Latitude of Washington, DC, USA). $%"# " μ =
#" %&