Estimating Crowd Distribution Within the Urban Rail Transit System
Total Page:16
File Type:pdf, Size:1020Kb
CrowdAtlas: Estimating Crowd Distribution within the Urban Rail Transit System Jinlong Ey∗, Mo Liy∗, Jianqiang Huangz∗ yNanyang Technological University, Singapore zAlibaba Group, China ∗NTU-Alibaba Joint Research Institute, Singapore [email protected], [email protected], [email protected] Abstract—While the urban rail transit systems are playing an increasingly important role in meeting the transportation NS11 demands of people, the precise awareness of how the human NS9 crowd is distributed within the urban rail transit system is NS7 NS14 highly necessary, which serves to a range of important appli- cations including emergency response, transit recommendation, NS4 NS15 EW1 commercial valuation, etc. Most urban rail transit systems are Terminal closed systems where once entered the travelers are free to move NS1/EW24 NS18 EW3 around all stations that are connected into the system and are Terminal & Interchange EW4 EW27 EW8 difficult to track. In this paper, we attempt to estimate the EW29 crowd distribution within the urban rail transit system based Terminal NS22 only on the entrance and exit records of all the rail riders. EW13/NS25 Interchange EW19 EW14/NS26 Interchange Specifically, we study Singapore MRT (Mass Rapid Transit) as EW15 a vehicle and leverage the tap-in and tap-out records of the EZ- NS28 Terminal Link transit cards to estimate the crowd distribution. Guided by a key observation that the passenger inflows and arrival flows at various MRT stations are spatio-temporally correlated Fig. 1. A map of the two major MRT lines EW and NS. due to behavioral consistence of MRT riders, we design and implement a machine learning based solution, CrowdAtlas, that accurately estimates the crowd distribution within the MRT tap out) of 5 lines and 102 stations, totaling 1.2 billion records. system. Our trace-driven performance evaluation demonstrates With those records, we specifically study the two major lines the effectiveness of CrowdAtlas. (namely East-West line, or EW line, and North-South line, or Index Terms—Urban rail transit system, Crowd distribution, Machine learning NS line) that span across the country and possess the heaviest ridership (∼1.5 million rides take place on those two lines everyday). Fig. 1 depicts a map of the two MRT lines and the I. INTRODUCTION 52 affiliated stations. An urban rail transit system is generally an electric railway Achieving accurate estimation of crowd distribution in the system operating on an exclusive right-of-way [1], where MRT, however, is challenging owing to the limited informa- passengers can travel freely among stations in different lines. tion. With the EZ-Link card data we have the knowledge of In virtue of fast velocity and large capacity, the rail transit sys- the riders that enter the MRT system, but the travelers are free tems have become the most important urban public transporta- to move around all the stations unspotted. Fine reasoning the tion service in recent years. Especially in many metropolises, crowd movement needs to overcome the uncertainties that arise the average daily ridership has reached millions (e.g., ∼5 from the ride time and the trip destinations. Through analysis million in London [2] and ∼3.5 million in Singapore [3]). over historical MRT trip data, we have a key observation that Since travelers are free to move around all the stations once the passenger inflows and arrival flows at various MRT stations they enter the transit system, it is essential to have a fine esti- are spatio-temporally correlated due to behavioral consistence mation of how people are distributed within the transit system. of MRT riders. Such information is important to providing critical resilience Guided by the observation, we design and implement a guarantees, e.g., emergency evacuation in response to railway machine learning based solution, CrowdAtlas, that is able to disruptions or terrorist attacks. It is also useful to other value- capture MRT riders’ transition probabilities and based on that add businesses, e.g., real-time transit recommendations based perform accurate online estimation of the crowd distribution. on crowdedness, or commercial valuation with crowd flows. In particular, CrowdAtlas builds a neural network model that In this paper, we attempt to accurately estimate the crowd learns the flow correlation, i.e., the riders’ transition proba- distribution within the urban rail transit system and take bilities among stations and across time from a high volume Singapore MRT (Mass Rapid Transit) system as a vehicle to of historical MRT trips. With the model and all the tap- study the problem. We collect one-year EZ-Link card data of in records, the number of MRT riders at any MRT station Singapore MRT, involving daily rides (from both tap in and can thus be estimated by aggregating the riders transitioned Ride time 100 100 Arriving at Tapping in Boarding Alighting Tapping out 75 75 Platform 50 50 Ride time (min) 25 Ride time (min) 25 Walking time Waiting time Travel time Walking time 0 5 10 15 20 25 0 5 10 15 20 25 Transfer time End station (EW#) End station (NS#) Arriving at Boarding Alighting Boarding Alighting (a) To EW-line stations (b) To NS-line stations Platform Transfer Fig. 3. Ride time statistics from an MRT station EW1 to other stations. Scenario Travel time Walking time Waiting time Travel time arrival flow is brought by other stations’ inflows, we define τt transition probability poa which describes the probability that Fig. 2. Typical activities during a ride. a passenger arrives in station sa at time t given the ride starts from station so at time τ(< t). The relation between the two from all origin stations and from all past time beings. We flow sets A and I is perform comprehensive evaluations with EZ-Link data traces. n Z t− t X τ τt Our trace-driven evaluation results – the overall MAPE (Mean Ak = Ij pjk dτ; 8sk 2 S: (1) Absolute Percentage Error) is less than 15%, suggest that j=1 0 CrowdAtlas is able to produce accurate estimation of crowd The transition probability for any individual passenger is distribution for most stations. affected by his/her destination station sd as well as ride time II. PROBLEM FORMULATION Tr (i.e., time spent in the rail transit system), which can be τt described as a function poa = f(sd;Tr). It is challenging A passenger’s rail transit ride begins from tapping in at τt to accurately obtain poa due to the uncertainties of both the origin station. There are various travel and sojourn time destination s and ride time T . involved during the passenger’s stay within the MRT sys- d r We collect a large-scale EZ-Link transit card data of all tem, which finally ends when the passenger taps out at the Singapore MRT trips in a whole year, involving ∼2.8 million destination station. Additionally, if the origin and destination average daily ridership among 5 lines and 149 stations. We stations are in different lines, extra sojourn time is incurred at specifically study the two the major MRT lines – EW and NS the interchange stations. A sequence of these ride activities is lines that involve 52 stations spanning across the country with depicted in Fig. 2. ∼1.5 million rides everyday. Here we extract one month data Suppose there are n stations S = fs g(j = 1; 2; ··· ; n) j to illustrate and quantify the two uncertainties. Fig. 1 depicts for the rail transit system, and a passenger’s ride start time is the two MRT lines, where each station is named in the format τ. Based on that, a station’s inflow is defined as the number of line name + number (e.g., EW15, or NS14). Transfers may of passengers tapping in the station at time τ, which is a τ- take place at 3 interchange stations marked in the figure (i.e., dependent variable. We regard the inflow set of all stations τ NS1/EW24, EW13/NS25, and EW14/NS26). I = fIj g(j = 1; 2; ··· ; n) to be known, as generally these inflows could be obtained by the MRT operator in real time. Ride-Time Uncertainty. A passenger’s ride time Tr com- τ 0 prises several time periods between ride activities – walking Similarly, we can define a station’s outflow Oj as the number of passengers tapping out the station at time τ 0, which is also time, waiting time, travel time (as illustrated in Fig. 2). They obtained by the MRT operator every minute. are affected by either passenger behaviors (e.g., walking speed, A station’s arrival flow is defined as the number of passen- shopping activities, waiting) or train scheduling (different gers presently arriving at that station (from other stations). speeds and schedules). As a result, the ride time may vary u The present number of passengers at each station can be subject to an unknown distribution (Tr ∼ F1 (t)). For the derived by summing inflow and arrival flow at the present above two-line dataset, the maximum, minimum and median time as well as the sojourn passengers’ number at that station, ride times from a terminal station EW1 to all other stations which can be derived from previous arrival flows minus are depicted in Fig. 3 respectively (separately by EW/NS line). outflows (see xIII-B for details). The goal is thus to estimate The ride time variance is observed at each destination station a set of all stations’ arrival flows at present time t, i.e., with a growing trend as the station interval increases. There t exists even higher variance when transfer is involved in the A = fAkg(t > τ; k = 1; 2; ··· ; n).