<<

applied sciences

Article Dwell Time Estimation Using Real-Time Train Operation and -Based Passenger Data: A Case Study in ,

Yoonseok Oh 1 , Young-Ji Byon 2 , Ji Young Song 3, Ho-Chan Kwak 3,* and Seungmo Kang 1,* 1 School of Civil, Environmental and Architectural Engineering, Korea University, Seoul 02841, Korea; [email protected] 2 Department of Civil Infrastructure and Environmental Engineering, Khalifa University of Science and Technology, Abu Dhabi 127788, UAE; [email protected] 3 Railroad Policy Research Team, Future Transport Policy Research Division, Korea Railroad Research Institute, Uiwang 16105, Korea; [email protected] * Correspondence: [email protected] (H.-C.K.); [email protected] (S.K.); Tel.: +82-31-460-5495 (H.-C.K.); +82-2-3290-4862 (S.K.)  Received: 4 December 2019; Accepted: 7 January 2020; Published: 9 January 2020 

Abstract: Dwell time is a critical factor in constructing and adjusting railway timetables for efficient and accurate operation of railways. This paper develops dwell time estimation models for a (S line) in Seoul, South Korea using support vector regression (SVR), multiple linear regression (MLR), and random forest (RF) techniques utilizing archived real-time metro operation data along with smart card-based passenger information. In the first phase of this research, the collected data are processed to extract boarding and alighting passenger counts and observed dwell times of each train at all stations of the S line under the current operational environment. In the second phase, we develop SVR, MLR, and RF-based dwell time estimation models. It is found that the SVR-based model successfully estimates the dwell times within 10 s of differences for 84.4% of observed data. The results of this paper are especially beneficial for autonomous railway operations that need constructing and maintaining dynamic railway timetables that require reliable dwell time predictions in real-time.

Keywords: smart card; railway operation data; transit ridership; dwell time estimation; metro timetable; artificial intelligence

1. Introduction The relatively reliable schedule adherence of railway systems is one of the most attractive merits of the mode for the railway passengers [1]. However, it is still a challenge to operate on railways with perceived reliability. Railway schedules are constructed based on passenger demand, although the arrival patterns of passengers at stations are not uniform or deterministic in real-life. If a particular train’s arrival at a station is delayed, an additional accumulation of arriving passengers in the meantime will result in extended boarding and alighting times when the train dwells at the station, and this event will further propagate delays with respect to subsequent trains upstream. Railway operators and authorities have considered different strategies when determining train schedules with respect to stations with expected passenger demands. They employ a “running time supplement” [2,3], a buffering time that can help to recover from delays if they occur. To minimize the delay, larger railway operators often control the passenger demand with a more direct approach of employing dedicated personnel on the platforms at stations, who can restrict passengers from boarding trains when they are delayed. There have been various studies in predicting delays and analyzing

Appl. Sci. 2020, 10, 476; doi:10.3390/app10020476 www.mdpi.com/journal/applsci Appl. Sci. 2020, 10, 476 2 of 12 their intrinsic nature [2,3], timetable optimization [4–7], developing appropriate performance indices, and utilizing them for timetable construction [8]. For railway-related research, it is essential to have access to real-life data, including the operation of trains and passenger flows, while a larger amount of data will further benefit the reliability of the analysis. Automatic passenger counters (APCs) and smart card-based fare payment systems have been utilized for such purposes. Um et al. [9] utilized big data based on the smart card system in Seoul, South Korea for assessing performance measures of the public transit services including schedule adherence and crowdedness of passengers. Using smart card-based passenger fare and rail operation data, et al. [10] developed models for predicting actually taken passenger-routes, the number of boarding and alighting passengers at stations, and the number of passengers on board between stations. Berbey et al. [11] used data from of the Panama Metro and passengers’ preferences of train ticket classes in order to estimate dwell times at each station. Lam et al. [12] developed a regression model to investigate the relationship between dwell time and the number of boarding and alighting passengers, and tested it using Monte-Carlo simulations. Lee et al. [13] developed a timetable adjustment model for the metro, MTR, by analyzing the departure and arrival information of trains, boarding and alighting passenger counts at stations, and historical data of past causes of delays. Markovic et al. [14] developed a machine learning model based on a support vector regression method for predicting delays at railway stations operated with different classes of trains. Wang and Work [15] proposed a regression-based estimation model for future train delays using only the operation records of trains. Despite numerous efforts in research works regarding train delays, there have not been any significant research works for modeling dwell/delay time of trains utilizing boarding and alighting passenger counts, or onboard crowdedness (load factor), for all trains at all train stations of a metro line. Jiang et al. [16] conducted a similar line of research for of the in China, yet with limited data and simulated values for load factor, boarding, and alighting passenger counts. Adachi et al. [17] adopted a Gompertz curve to overcome a limited data set of boarding and alighting passenger counts collected at 30-min intervals. A smart card-based transit fare system has been in use in Seoul, South Korea since 2004 and currently 99% of all passengers use the smart card, which provides an excellent basis to collect real-life data for passenger flows. The smart card-based database includes each passenger’s origin and destination stops or stations, associated times, transit transfer locations, etc. This paper utilizes data from real-life operations of trains in the , and the smart card-based passenger information, for estimating dwell times for each station of a metro line. Section2 describes the data structures of train operations and smart card-based passengers’ information, and how the two databases are processed for further analysis. Section3 develops a methodology based on extracting observed dwell times for each station using real-time data of train operations. Section4 uses the observed dwell times, along with boarding and alighting passenger counts, onboard loading factors, to estimate dwell times. Section5 concludes the paper.

2. Matching the Automatic Fare Collection Data with the Real-Time Train Operation Data This paper develops a dwell time estimation model using boarding and alighting passenger counts, and onboard passenger counts between stations for each train. The initial dataset for analysis was prepared according to the method suggested by Hong et al. [10] based on the smart card passenger information and archived real-time train operation data. This dataset includes the number of trains operating and their current status with respect to their nearest station as one of 3 stages—“approaching,” “arrival,” and “departure”—along with timestamps for when the current stage was acquired. The approaching status is gained as soon as the arriving train passes a balise located at 1000 m upstream of an associated station. The arrival status is achieved as soon as the train passes a balise located 400 m upstream. The status changes to departure when the train passes a balise at 200 m downstream. The smart card-based passenger information includes the serial number of the card, boarding station, boarding time, alighting station, alighting time, transfer station, and transfer Appl. Sci. 2020, 10, 476 3 of 12 time. It is noted that the time associated with boarding, alighting, and transfer refers to the time when the passenger checks in or out with the gates at stations. In this paper, both archived real-time train operation and smart card-based passenger data from the Shinbundang line (S Line) on 31st October 2017 was used. The S line has concentrated commuting passenger demands during both morning and afternoon peak periods, with dominating directions of the majority of passengers: towards the CBD in the morning and towards in the afternoon (See Figure1). In this study, to show the morning peak clearly with maximum crowdedness, the S line’s Central Business District (CBD)-bound direction towards was selected for the analysis. The S line has a total length of 31.29 km, consisting of 12 stations, of which 4 stations allow inter-line transfers. The fleet size is 20 trains, and they run 327 cycles and 271 cycles in total on weekdays and weekend days, respectively; mostly throughout the entire route, except for the first and last trains each day. There is only 1 class of tickets, and daily ridership on the S line is roughly 247,000,Appl. Sci. 2020 as of, 10 2017., x FOR For PEER train REVIEW operation data, arrival and departure times were identified for each3 train of 12 and each station. However, in cases where departure data were missing, average dwell times based onIt is visual noted observations that the time withassociated video with cameras boarding, wereadded alighting, to the and arrival transfer times refers to estimateto the time the when missing the departurepassenger times.checks in or out with the gates at stations.

Figure 1. Route map and the station ID of Shinbundang S line. Figure 1. Route map and the station ID of Shinbundang S line. Figure2 shows the process of matching the smart card passenger information with the archived real-timeIn this train paper, operation both archived data. The real-time process train starts operation with identifying and smart a card-based passenger passenger who has alighted data from at athe station. Shinbundang Then, the line train (S Line) that on arrived 31st October at the station 2017 was most used. recently The S isline assumed has concentrated to be the onecommuting that the passenger gotdemands off from. during The departureboth morning time and of the afternoon train that peak the passengerperiods, with had do boardedminating earlier directions is found. of Ifthe the majority boarding of timepassengers: is earlier towards than the the departure CBD in the time morning of the train, and thetowards train numberBundang is in assigned the afternoon to that passenger,(See Figure assuming 1). In this that study, the to passenger show the has mo boardedrning peak and clearly will alight with from maximum that train. crowdedness, If the matching the S processline’s Central does not Business succeed, District the data (CBD)-bound from the smart direction card istowards not used, Gangnam since there station may was be logicalselected errors for the in theanalysis. data. The This S is line reasonable, has a total because length there of 31.29 is no km, way consisting to reduce of the 12 travelstations, time of bywhich overtaking 4 stations between allow trains,inter-line or transferringtransfers. The through fleet size the otheris 20 trains, lines since and thethey S linerun has327 acycles single and class 271 of train.cycles The in total process on stopsweekdays when and the numberweekend of days, the smart-card respectively; data mostly that have throughout been processed the entire is equal route, to theexcept total for number the first of theand smart-card last trains data,each day. N. There is only 1 class of tickets, and daily ridership on the S line is roughly 247,000,Figure as of3 shows 2017. For smart train card-based operation passenger data, arrival data and and departure train operation times were data identified from the S07 for stationeach train to theand S11 each station station. in aHowever, passenger in entry-exit cases where map departu suggestedre data by Hongwere missing, et al. [10 ].average The x-axis dwell represents times based the timeon visual at the observations S07 station, andwith the videoy-axis cameras represents were the ad timeded atto the the S11 arrival station. times Circles, to estimate triangles, the andmissing plus signsdeparture represent times. groups of individual passenger’s smart card data who have completed their trips from S07 toFigure S11 with 2 shows associated the process boarding of matching and alighting the smart times card at thosepassenger stations. information Train number with the 477 archived arrives atreal-time the S07 train at 7:25:56 operation and S11data. at The 7:38:02. process Any starts potential with passengersidentifying whoa passenger were on who train has 477 alighted during at the a station. Then, the train that arrived at the station most recently is assumed to be the one that the passenger got off from. The departure time of the train that the passenger had boarded earlier is found. If the boarding time is earlier than the departure time of the train, the train number is assigned to that passenger, assuming that the passenger has boarded and will alight from that train. If the matching process does not succeed, the data from the smart card is not used, since there may be logical errors in the data. This is reasonable, because there is no way to reduce the travel time by overtaking between trains, or transferring through the other lines since the S line has a single class of train. The process stops when the number of the smart-card data that have been processed is equal to the total number of the smart-card data, N. Figure 3 shows smart card-based passenger data and train operation data from the S07 station to the S11 station in a passenger entry-exit map suggested by Hong et al. [10]. The x-axis represents the time at the S07 station, and the y-axis represents the time at the S11 station. Circles, triangles, and plus signs represent groups of individual passenger’s smart card data who have completed their trips from S07 to S11 with associated boarding and alighting times at those stations. Train number 477 arrives at the S07 at 7:25:56 and S11 at 7:38:02. Any potential passengers who were on train 477 during the trip must be on the 2nd quadrant of the horizontal and vertical lines intersecting at train 477. If Appl. Sci. 2020, 10, 476 4 of 12 Appl. Sci. 2020, 10, x FOR PEER REVIEW 4 of 12 Appl. Sci. 2020, 10, x FOR PEER REVIEW 4 of 12 tripthe passengers must be on used the 2nd train quadrant 477, they of must the horizontalpass the boarding and vertical gate at lines S07 intersectingstation before at train train 477 477. arrives If the passengerstheat S07 passengers station, used usedand train musttrain 477, they477,also they mustalight must pass after pass the the boarding the train boarding arrives gate atgate S07at S11at station S07 station. station before Therefore, before train 477 train arrivesthe 477 group arrives at S07 of station,atpassengers S07 station, and surrounded must and also must alight by also the after alightdotted the after trainlines arrivesthe roughly train at reprarrives S11esents station. at S11all Therefore, ofstation. the potential Therefore, the group users ofthe of passengers grouptrain 477 of surroundedpassengersfrom S07 to surrounded byS11 the station. dotted Itby linesis the noted roughlydotted that lines representsthere roughly were allsome repr of the passengersesents potential all of who users the boardedpotential of train 477and users from alighted of S07 train toat S11477the station.fromsame S07stations, It to is S11 noted and station. thatthese thereIt were is noted were excluded that some there from passengers we there analysis. some who passengers boarded and who alighted boarded at and the alighted same stations, at the andsame these stations, were and excluded these were from excluded the analysis. from the analysis.

Figure 2. Smart card-based passenger information matching process against the real-time train Figure 2. Smart card-based passenger information matching process against the real-time train Figureoperation 2. data.Smart card-based passenger information matching process against the real-time train operation data. operation data.

Figure 3. Entry-exit map of the trip from Jeongja (S07) station to Yangjae (S11)(S11) station.station. Figure 3. Entry-exit map of the trip from Jeongja (S07) station to Yangjae (S11) station. Figure 4 shows the number of passengers boarding, alighting, and arriving onboard at each stationFigure on the 4 shows S line. the Trains number depart of passengersfrom the right boarding, towards alighting, the left, and and arriving each line onboard in each at graph each station on the S line. Trains depart from the right towards the left, and each line in each graph Appl. Sci. 2020, 10, 476 5 of 12

Figure4 shows the number of passengers boarding, alighting, and arriving onboard at each station onAppl. the SSci. line. 2020, Trains10, x FOR depart PEER REVIEW from the right towards the left, and each line in each graph represents5 of 12 a train. Red lines represent trains operated during the morning peak-hours from 6:30 a.m. to 10:30 a.m.,represents and blue a linestrain. representRed lines represent trains run trains during oper theated afternoon-peak during the morning hours peak-hours from 6:00 p.m. from to 6:30 9:00 a.m. p.m. to 10:30 a.m., and blue lines represent trains run during the afternoon-peak hours from 6:00 p.m. to Trains run outside of peak-hours are represented by gray lines. 9:00 p.m. Trains run outside of peak-hours are represented by gray lines.

FigureFigure 4. (a4.) The(a) The number number of boarding of boarding passengers passengers at each at station. each station. (b) The ( numberb) The ofnumber alighting of passengersalighting at eachpassengers station. at ( ceach) The station. number (c) ofThe onboard number passengers of onboard when passengers the train when had the been train arrived had been at each arrived station. at each station. Figure4 shows passenger boarding, alighting and onboard counts. When observed from the right, startingFigure 4 with shows the passenger first station boarding, S01, in alighting the morning and peak,onboard the counts. boarding When counts observed from from S01 to the S08 stations,right, starting inclusive, with are the significantly first station higher S01, in than the morning the rest ofpeak, stations the boarding combined. counts This from is due S01 to to the S08 fact thatstations, the regions inclusive, covered are bysignificantly stations from higher S01 than to S08 the arerest dedicated of stations residential combined. areas This is (bed due towns) to the offact the Greaterthat the Seoul regions Area covered (GSA), by S09 stations and S10 from are S01 located to S0 in8 are an outskirtsdedicated of residential Seoul, and areas S11 (bed and S12towns) are of located the in theGreater CBD Seoul of Gangnam Area (GSA), area S09 of and Seoul. S10 are During located the in evening an outskirts peak of period, Seoul, and boarding S11 and counts S12 are at located the S08 in the CBD of Gangnam area of Seoul. During the evening peak period, boarding counts at the S08 Appl. Sci. 2020, 10, 476 6 of 12

Appl. Sci. 2020, 10, x FOR PEER REVIEW 6 of 12 station stands out as there is also a large concentrated commercial area called “Pangyo Techno Valley”. station stands out as there is also a large concentrated commercial area called “Pangyo Techno This area is also related to the high number of alighting at S08 in the morning. Valley”. This area is also related to the high number of alighting at S08 in the morning. DuringDuring the the morning morning peak-hour peak-hour period, period, atat stationsstations from S01 S01 to to S06, S06, inclusive, inclusive, and and S09, S09, there there are are nearlynearly zero zero alighting alighting passengers. passengers. At At S07 S07 andand S08S08 stations,stations, some passengers passengers al alight,ight, while while most most of ofthe the passengerspassengers alight alight at at S11 S11 and and S12 S12 stations stations inin thethe CBD.CBD. There are relatively relatively high high numbers numbers of of alighting alighting passengerspassengers at S07at S07 and and S08 S08 compared compared to otherto other residential residential areas areas because because there there are concentrationsare concentrations of o ffiof ce buildingsoffice buildings near those near stations. those stations Note that. Note S07 andthat S08S07 alsoand serveS08 also as transferserve asstations transfer connectedstations connected with other metrowith lines. other In metro the case lines. of stationIn the case S09, of there station are onlyS09, there residential are only facilities residential near facilities the station, near hence the station, resulting in veryhence low resulting alighting in counts.very low In alighting the afternoon-peak counts. In andthe non-peakafternoon-peak periods, and most non-peak alighting periods, counts most occur at S11alighting and S12 counts stations. occur at S11 and S12 stations. DueDue to to the the various various characteristics characteristics mentioned mentioned earlier,earlier, the onboard passenger passenger counts, counts, as as shown shown in in FigureFigure4c, 4c, continue continue to to increase increase as as trains trains approach approach the the final final stationstation S12S12 in the CBD. CBD. It It is is also also noted noted that that mostmost trains trains in in the the morning-peak morning-peak period period experienceexperience passengerspassengers occupying occupying at at the the full full capacity capacity of ofthe the trainstrains and and even even exceeding exceeding the the capacity capacity with with a a factor factor ofof 22 betweenbetween the the S07 S07 and and S11 S11 stations. stations.

3. Estimating3. Estimating Observed Observed Dwell Dwell Time Time of of Train Train When When DepartureDeparture Information Is Is Missing Missing ThisThis paper paper utilizes utilizes the the real-time real-time operation operation datadata ofof thethe metro trains. The The collected collected data data have have relativelyrelatively well-recorded well-recorded arrival arrival times times of of trains,trains, whilewhile often missing the the departure departure times. times. Therefore, Therefore, in casesin cases where where the the departure departure times times are are missing, missing, thethe observedobserved dwell time time for for each each train train at at each each station station was estimated by using typical travel times between stations. was estimated by using typical travel times between stations. As illustrated in Figure 5, the train arrival times at station S and S + 1 is easily determined from As illustrated in Figure5, the train arrival times at station S and S + 1 is easily determined from the train operation data. The arrival time refers to the time when a train passes 400 m upstream of the train operation data. The arrival time refers to the time when a train passes 400 m upstream of an an associated station. The time difference between the 2 stations consists of the braking time at station associated station. The time difference between the 2 stations consists of the braking time at station S (BS), dwell time at station S (DWS), and the travel time between the station S and 400 m upstream S (BlocationS), dwell of timestation at S station + 1. If Swe (DW assumeS), and the the braking travel times time are between similar the for stationall stations, S and the 400 braking m upstream time location of station S + 1. If we assume the braking times are similar for all stations, the braking time at station S (BS) can be subtracted from the time difference between the two stations S and S + 1 in at stationorder to S estimate (BS) can the be actual subtracted dwell fromtime. theThe timeS line, di fromfference which between the data the were two collected, stations is S specifically and S + 1 in ordersuitable to estimate for such the assumptions, actual dwell because time. Theit is Sautonomously line, from which operated the data and were has constant collected, braking is specifically times suitableat stations for such with assumptions, a large headway because of 4 min, it is autonomouslyand has a relatively operated low chance and has of constantcongestion braking occurring times at stationsdue to the with influence a large of headway trains downstream. of 4 min, and The has assumption a relatively of constant low chance braking of congestion times was occurringverified duevisually to the influencewith video of cameras trains downstream. installed on all The trai assumptionns. Figure of6 shows constant the braking time difference times was between verified visuallyarrivals with of trains video at cameras all stations installed (solid on black all trains.and red Figure lines),6 andshows average the time travel di timesfference visually between observed arrivals of trainsfrom trains at all equipped stations(solid with video black camera and reds (blue lines), dotted and averageline). The travel difference, times visuallyDWS, between observed the TT fromS trainsand equipped TDS is the with estimated video camerasdwell time (blue for dottedeach train line). at Theeach di stationfference, in DWthe Scase, between of missing the TT departureS and TD S is thedata. estimated It is noted dwell that the time red for lines each represent train at time each di stationfference in between the case arrivals of missing at stations departure of the data. first It5 is notedtrains that in thethe redearly lines morning represent period, time where difference there are between no delays, arrivals while at black stations lines of represent the first 5for trains all other in the earlytrains. morning period, where there are no delays, while black lines represent for all other trains.

Figure 5. Time components of railway operation between stations and definitions of TTS and TDS. Figure 5. Time components of railway operation between stations and definitions of TTS and TDS. Appl. Sci. 2020, 10, x FOR PEER REVIEW 7 of 12 Appl. Sci. 2020, 10, 476 7 of 12 Appl. Sci. 2020, 10, x FOR PEER REVIEW 7 of 12

Figure 6. TTS and TDS at each station and the concept of estimating the DWS at each station.

Figure 6. TTS and TDS at each station and the concept of estimating the DWS at each station. Figure 6. TTS and TDS at each station and the concept of estimating the DWS at each station. Figure 7 shows a relationship between the estimation of observed dwell times for all trains at all stationsFigure versus7 shows the asum relationship of boarding between and alighting the estimation passenger of observed counts. The dwell x-axis times represents for all trains the sum at all of Figure 7 shows a relationship between the estimation of observed dwell times for all trains at all stationsboarding versus and alighting the sum ofpassengers boarding while and alighting the y-axis passenger represents counts. the estimation The x-axis of observed represents dwell the times. sum stations versus the sum of boarding and alighting passenger counts. The x-axis represents the sum of ofEach boarding symbol and represents alighting a passengersstation, as illustrated while the iny-axis Table represents 1. The mini themum estimation observed of dwell observed time dwellis 16 s. boarding and alighting passengers while the y-axis represents the estimation of observed dwell times. times.The variability Each symbol of dwell represents times is a station,found to as be illustrated larger when in Tablethe passenger1. The minimum counts are observed smaller, dwell and smaller time Each symbol represents a station, as illustrated in Table 1. The minimum observed dwell time is 16 s. iswhen 16 s. the The passenger variability counts of dwell are larger. times isIn found addition, to be it largeris found when that thethe passengerminimum dwell counts time are increases smaller, The variability of dwell times is found to be larger when the passenger counts are smaller, and smaller andas the smaller passenger when counts the passenger increase. counts It is interesting are larger. to In note addition, that many it is found observed that dwell the minimum times at dwellcertain when the passenger counts are larger. In addition, it is found that the minimum dwell time increases timestations increases seem asto have the passenger constant dwel countsl times increase. regardless It is interestingof passenger to counts. note that This many is a result observed of enforced dwell as the passenger counts increase. It is interesting to note that many observed dwell times at certain timesdwell at times certain at some stations major seem stations to have being constant assisted dwell by timesdedicated regardless employees of passenger on the platforms, counts. This limiting is a stations seem to have constant dwell times regardless of passenger counts. This is a result of enforced resultthe number of enforced of boarding dwell times passengers. at some major stations being assisted by dedicated employees on the dwell times at some major stations being assisted by dedicated employees on the platforms, limiting platforms, limiting the number of boarding passengers. the number of boarding passengers.

FigureFigure 7. 7.Relationship Relationship betweenbetween estimatedestimated dwelldwell timetime andandthe the number number of of boarding boarding and and alighting alighting passengerspassengers at at each each station station of of S S line. line. Figure 7. Relationship between estimated dwell time and the number of boarding and alighting passengers at each station of S line.

Appl.Appl. Sci. Sci. 2020 2020, 10, ,10 x ,FOR x FOR PEER PEER REVIEW REVIEW 8 of8 of12 12 Appl.Appl. Sci. Sci. 2020 2020, 10, ,10 x ,FOR x FOR PEER PEER REVIEW REVIEW 8 of8 of12 12 Appl.Appl. Sci. Sci. 2020 2020, 10, ,10 x ,FOR x FOR PEER PEER REVIEW REVIEW 8 of8 of12 12 Appl.Appl. Sci. Sci. 2020 2020, 10, ,10 x ,FOR x FOR PEER PEER REVIEW REVIEW TableTable 1. 1.Symbols Symbols of ofeach each station. station. 8 of8 of12 12 Appl.Appl. Sci. Sci. 2020 2020, 10, ,10 x ,FOR x FOR PEER PEER REVIEW REVIEW TableTable 1. 1.Symbols Symbols of ofeach each station. station. 8 of8 12of 12 Appl.Appl. Sci. Sci. 2020 2020, 10, ,10 x ,FOR x FOR PEER PEER REVIEW REVIEWStationStation Table Table 1. 1.Symbols Symbols of ofeach each station. station. StationStation 8 of8 of12 12 Appl. Sci.Station2020Station, 10 Name, 476 Name StationStationTable Table Symbol Symbol1. 1.Symbols Symbols of ofeachStation eachStation station. station. Name Name StationStation SymbolSymbol8 of 12 StationStation Name Name StationIDID Table TableSymbol Symbol1. Symbols1. Symbols of ofeachStation eachStation station. station. Name Name StationIDID SymbolSymbol StationIDID StationIDID StationStation Name Name StationStation TableTable Symbol Symbol1. 1.Symbols Symbols of ofeachStation eachStation station. station. Name Name StationStation SymbolSymbol StationGwanggyoStationGwanggyo Name Name S01IDS01ID SymbolSymbol StationStationJeongjaJeongja Name Name S07IDS07ID SymbolSymbol StationStationID StationStationID StationGwanggyoStationGwanggyo Name Name StationS01StationS01ID Table SymbolSymbol 1. Symbols of eachStationStationJeongja station.Jeongja Name Name StationS07StationS07ID SymbolSymbol StationGwanggyoStationGwanggyo Name Name S01IDS01ID SymbolSymbol StationStationJeongjaJeongja Name Name S07IDS07ID SymbolSymbol GwanggyoStationGwanggyoGwanggyoGwanggyo Name Jungang Jungang StationS01 S02IDS01 S02ID ID Symbol StationPangyoJeongja NamePangyoJeongja StationS07S08IDS07S08 ID Symbol GwanggyoGwanggyo Jungang Jungang S02 S02 PangyoPangyo S08S08 GwanggyoGwanggyo S01S01 JeongjaJeongja S07S07 GwanggyoGwanggyoGwanggyoGwanggyo Jungang Jungang S01 S02 S02S01 PangyoJeongjaPangyoJeongja S07S08S08S07 GwanggyoSanghyeonSanghyeon S01S03S03 CheonggyesanJeongjaCheonggyesan S07S09S09 GwanggyoGwanggyo Jungang Jungang S02 S02 PangyoPangyo S08S08 GwanggyoGwanggyoGwanggyoSanghyeonSanghyeon Jungang Jungang Jungang S02S03 S02S03 S02 CheonggyesanPangyoCheonggyesanPangyoPangyo S08S09S08S09S08 SanghyeonSanghyeon S03S03 CheonggyesanCheonggyesan S09S09 GwanggyoGwanggyo Jungang Jungang S02 S02 PangyoPangyo S08S08 SanghyeonSanghyeonSeongbokSanghyeonSeongbok S03S03S04S03S04 YangjaeCheonggyesanYangjaeCheonggyesanCheonggyesan Citizen’s Citizen’s Forest Forest S09S09 S10S09 S10 SeongbokSeongbok S04S04 YangjaeYangjae Citizen’s Citizen’s Forest Forest S10 S10 SanghyeonSanghyeon S03S03 CheonggyesanCheonggyesan S09S09 SeongbokSanghyeonSeongbokSanghyeonSeongbok S04S03S04S04S03 YangjaeYangjaeYangjaeCheonggyesan Citizen’sCheonggyesan Citizen’s Citizen’s Forest Forest Forest S10S09 S10 S10S09 Suji-guSuji-gu Office Office S05 S05 YangjaeYangjae S11S11 SeongbokSeongbok S04S04 YangjaeYangjae Citizen’s Citizen’s Forest Forest S10 S10 Suji-guSuji-guSeongbokSuji-guSeongbok O ffiOfficece Office S05 S05S04 S05S04 YangjaeYangjaeYangjae YangjaeCitizen’sYangjae Citizen’s Forest Forest S11S11 S10S11 S10 Suji-guSuji-gu Office Office S05 S05 YangjaeYangjae S11S11 SeongbokSeongbok S04S04 YangjaeYangjae Citizen’s Citizen’s Forest Forest S10 S10 DongcheonSuji-guDongcheonSuji-guDongcheon Office Office S06 S05S06 S05S06 GangnamGangnamYangjaeGangnamYangjae S12S11S12S11S12 DongcheonDongcheon S06S06 GangnamGangnam S12S12 Suji-guSuji-gu Office Office S05 S05 YangjaeYangjae S11S11 Suji-guDongcheonSuji-guDongcheon Office Office S05S06S06 S05 GangnamYangjaeGangnamYangjae S11S12S12S11 Dongcheon S06 Gangnam S12 4.4. 4.Dwell DwellDwell TimeDongcheon TimeTime Estimation Estimation Estimation Model Model ModelS06 Gangnam S12 DongcheonDongcheon S06S06 GangnamGangnam S12S12 4. 4.Dwell DwellDongcheon TimeDongcheon Time Estimation Estimation Model ModelS06S06 GangnamGangnam S12S12 4. 4.Dwell DwellInInIn this this Timethis Time second second Estimation Estimation phasephase phase of Modelof research,Model ofresearch, research, we we utilize we utilize utilize the resultsthe the results results from from the from first the the phase, first first includingphase, phase, including including the observed the the 4. Dwell Time Estimation Model dwell4. DwellInIn timesthis this Time second (which second Estimation requiredphase phase of Model ofestimationsresearch, research, we whenwe utilize utilize the departurethe the results results information from from the the first was first missing),phase, phase, including boardingincluding the and the 4.observed observed4.Dwell DwellInIn this Timethisdwell Time dwellsecond second Estimation times Estimation times phase phase(which (which Modelof Modelof requresearch, research,requ ired ired estimationswe estimationswe utilize utilize thewhen thewhen results resultsthe the departure from departurefrom the the informationfirst informationfirst phase, phase, wasincluding includingwas missing), missing), the the observed4.alighting 4.observedDwell Dwell Timedwell passengerTimedwell Estimation times Estimation times counts(which (which Model Model forrequ requ all ired trainsired estimations estimations at all stations, when when inthe the order departure departure to develop information information dwell timewas was estimationmissing), missing), boardingobservedobservedboardingInIn this andthisdwell anddwellsecond alightingsecond alightingtimes times phase phase(which passenger(which passenger of of research,requ requresearch, countsired countsired estimationswe forestimationswe forutilize all utilize all trains trains thewhen the whenat results at all results theall stations,the stations, departurefrom departure from inthe intheorder informationfirstorder informationfirst tophase, todevelopphase, develop includingwas wasincluding dwell missing),dwell missing), time the time the observedboardingestimationmodels.boardingestimationInIn this and Thedwellthis models. and secondmodels. proposedalightingsecond alightingtimes Thephase The(whichphase passenger modelsproposed passenger proposedof ofrequresearch, estimateresearch, countsmoired countsmodels estimationsdelswe dwell for weestimate for utilize estimateall utilize timesall trains trains thedwellwhen atthe dwellat allresults at all results timesstationstheall stations,times stations,departure from atfrom at withall inallthe stations intheorder given stations informationfirstorder first to boardingphase,with to developphase,with develop given wasincludinggiven andincluding dwell missing),boardingdwell alightingboarding time thetime the boardingboardingobservedInIn this andthis anddwellsecond alightingsecond alighting times phase phase passenger(which passenger of ofresearch, requresearch, counts countsired we forestimationswe forutilize all utilize all trains trains the the whenat resultsat allresults all stations,the stations, from departure from inthe in theorder orderfirst informationfirst tophase, todevelopphase, develop including wasincluding dwell dwell missing), time thetime the boardingestimationandobservedpassengerestimationandobserved alighting alighting anddwell models. countsdwell models. passengeralighting passengertimes times and The The(which onboard passenger proposedcounts(which proposedcounts requ and passengerrequ and countsmoired onboardmoired delsonboard estimationsdels forestimations countsestimate estimatepassengerall passenger trains on dwellwhen arriving dwellwhenat counts all countstimesthe stations,timesthe trains. departureon departureaton arriving atall arriving inall stations order stationsinformation trains.information trains. to with develop with given wasgiven was dwell boardingmissing), boardingmissing), time observedestimationestimationboardingobserved dwell models.anddwell models. alightingtimes times The The (which proposed(which passengerproposed requ requ moired countsmoireddels estimationsdels estimations estimate for estimate all trains dwellwhen dwellwhen at timestheall timesthe stations,departure departureat at all all stationsin stations informationorder information with to with develop given wasgiven was missing),boardingdwell boardingmissing), time estimationandboardingandboarding alightingBoarding, Boarding,alightingBoarding, and models. and passengeralighting alighting, passengeralighting alighting,alighting, The passenger proposedcounts passenger andcounts andand onboard and onboardonboard and countsmo onboardcounts delsonboard passenger passengerfor passengerestimate for passengerall passengerall trains trains countsdwell countscounts at counts at all counts times forall stations, forfor stations,eachon eachateachon arriving all arrivingtrain in traintrainstations inorder orderat trains. at at trains.each toeach eachwith todevelop stationdevelop station stationgiven dwell are boarding aredwellare set settimeset timeas as as boardingandandestimationboarding alighting alighting and and models. passengeralighting passengeralighting The passenger counts passengercountsproposed and and counts onboardcountsmo onboarddels for for estimatepassengerall passengerall trains trains dwell at counts at allcounts all stations,times stations, on on arrivingat arriving inall in orderstations order trains. trains. to to develop with develop given dwell dwell boarding time time andindependentestimationindependentindependentestimation alightingBoarding,Boarding, models. variablesmodels.passenger variablesvariablesalighting, alighting, The The and countsproposed andand andproposed theand the onboarddwell and onboarddwell mo onboard motimedels time timedelspassenger aspassengerestimate as asdependentestimatepassenger dependent dependent counts dwell counts dwell countsvariable. variable.timesfor variable. timesfor eachon ateach This arriving at allThistrain This all istrainstations isintuitive,stations at isintuitive,trains. at intuitive,each eachwith with asstation as stationgivendwell asgivendwell dwellare boardingtimes are timesboarding set timesset areas areas estimationandestimationBoarding, alightingBoarding, models. models. alighting,passenger alighting, The The proposed andcountsproposed and onboard onboard and mo mo delsonboard delspassenger passengerestimate estimate passenger countsdwell counts dwell counts timesfor timesfor each eachaton at allarrivingtrain all trainstations stations at at trains.each eachwith with station stationgiven given areboarding are boarding set set as as independentmainlyandareindependentmainlyand alightingBoarding, mainly alighting affected affecteda variables ffpassenger ectedvariables alighting, passengerby by the by theand the boardingcountsand and boardingcounts the boarding the onboarddwell and dwell andand onboardand time andonboard timealighting passenger alighting alightingas as dependentpassenger dependentpassenger counts, counts,counts counts, countsvariable. while counts variable. forwhile while each onthe onThis the arriving This onboardarrivingtrain onboardis isintuitive, at intuitive,trains. eachtrains.crowdedness crowdednesscrowdedness asstation as dwell dwell are timesin intimes insettrains trains trains areas are andindependentindependentand alighting alightingBoarding, variables passenger variables passenger alighting, and countsand counts theand the dwell and onboarddwell and onboard time onboard time aspassenger as dependentpassenger dependentpassenger counts countsvariable. countsvariable. for on each onThis arriving This arriving trainis isintuitive, intuitive,trains. at trains. each as as stationdwell dwell times are times set are are as independentmainlyindirectlyindirectlymainlyindirectlyBoarding,Boarding, affected affectedaffects a affectsff variablesects alighting, bythem.alighting, them. bythem. the theand Theboarding Theandboarding the dependentand dependent dwellonboard onboard and and time variable,alighting passenger variable,alighting aspassenger dependent dwecounts, dwe dwellcounts,counts llcounts time,ll variable. time, time,while forwhile was for was waseachthe extracted eachThisthe extracted extractedonboardtrain onboardistrain intuitive, atfrom atfrom eachcrowdedness from each crowdednessthe theasstation theobserved stationdwell observed observed are times inare data insettrains data set trains dataare as as mainlymainlyindependentBoarding,Boarding, affected affected variablesalighting, byalighting, by the the boardingand andboarding and the onboard onboarddwell and and timealighting passengeralighting passenger as dependent counts, counts,counts counts while variable. forwhile for each the eachthe Thisonboardtrain onboardtrain is atintuitive, at eachcrowdedness eachcrowdedness station asstation dwell are inare times in settrains settrains as are as mainlyindirectlyindependentdescribedasindirectlydescribedindependent described affected affectsin affectsin variablesSection invariablesSection bythem. Section them. the 3. and The3.Supportboarding and3 TheSupport. thedependent Support thedependent dwell dwellVector andVector time Vector variable,alightingtime Regressionvariable, asRegression as dependent Regression dependent dwecounts, dwe (SVR),ll time,(SVR),ll variable. time, (SVR),while variable. Multiplewas Multiplewas the Multipleextracted This extracted Thisonboard Linear is Linear isintuitive, Linearfromintuitive, from Regressioncrowdedness Regression the Regressiontheas observed as dwell observed dwell (MLR), (MLR), timesin timesdata (MLR),trains data and are asand are as independentindirectlyindirectlymainlyindependent affectedaffects affects variables variables them. bythem. the and The andThe boarding thedependent thedependent dwell dwell and time variable,time variable,alighting as as dependent dependent dwe dwe counts,ll lltime, variable.time, variable. while was was extracted Thisthe extracted This onboardis isintuitive, fromintuitive, from crowdednessthe theas observed as dwell observed dwell times timesdatain data trains are as are as indirectlydescribedmainlyRandomanddescribedRandommainly Random affected Forestaffectedaffectsin Forest inSection ForestSection bythem.(RF) by(RF) the 3. (RF)themethods The 3.Supportboardingmethods Supportboarding methodsdependent wereVector andwereVector and were usedvariable,alighting Regressionusedalighting Regression used to to devedwe to counts,deve developcounts, (SVR),lllop time,(SVR),lop while3 3while Multipledifferentwas 3 Multipledifferent di the ffextracted erentthe onboard Linearestimationonboard Linear estimation from Regressioncrowdedness Regression crowdednessthe models, models,models,observed (MLR), (MLR),and in andand data in trains theirand trains their theirasand mainlydescribeddescribedindirectlymainly affected affectedin affectsin Section Section by bythem. the 3.the 3.Supportboarding The Supportboarding dependent Vector andVector and alighting Regressionvariable,alighting Regression counts, dwe counts, (SVR), (SVR),ll time,while whileMultiple Multiplewas the the extracted onboard Linearonboard Linear from Regressioncrowdedness Regression crowdedness the observed (MLR), (MLR), in intrains data andtrains and as describedRandomindirectlyestimationestimationRandomestimationindirectly Forestaffectsin performances Forestaffects performances performancesSection (RF)them. (RF)them. 3. methods The Supportmethods Thewere were weredependent dependent comp compared. wereVectorcomp wereared. ared.usedvariable, Regression usedvariable, Seventy SeventySeventyto to devedwe devedwe percent (SVR), percentllloppercent time,lllop time,3 of3 Multipledifferentwas of of differentthewas thethe extracted data extracted datadata Linearestimation wereestimation werewere from usedfromRegression usedused the models, asthe asmodels,observedas a observeda traininga training training(MLR), and and data setdata their set andsettheir asto to asto indirectlyRandomRandomdescribedindirectly Forestaffects Forestaffectsin Section them.(RF) (RF)them. methods The3.methods TheSupport dependent dependent were wereVector usedvariable, usedvariable, Regression to to devedwe devedwelllop (SVR),time,lllop time,3 3 differentwas differentMultiplewas extracted extracted estimation Linearestimation from from Regression the models,the models,observed observed (MLR),and and data datatheir their asand as Randomestimationdevelopdescribeddevelopestimationdevelopdescribed theForest the intheperformances model inperformancesSection model, model Section (RF), and , and 3. andmethods 3.Support 30% 30%wereSupport 30% were of of comp ofthe thewereVectorcomp the Vector data dataared. dataared.used Regressionwere were SeventyRegressionwere Seventyto used used deveused percent as as (SVR),loppercent as a(SVR), a validation validationa3 validation ofdifferentMultiple ofMultiplethe the data set. set.data set.Linearestimation wereLinear were usedRegression usedRegression models,as as a traininga training(MLR), and(MLR), settheir andset toand to describedestimationestimationRandomdescribed inperformances Forest inperformancesSection Section (RF) 3. 3.Supportmethods wereSupport were comp compVector wereVectorared.ared. Regressionused SeventyRegression Seventy to deve percent (SVR),percent (SVR),lop of3Multiple of differentMultiplethe the data data Linear wereLinearestimation were usedRegression usedRegression as models,as a traininga training(MLR), (MLR), and set andsettheir toand to developRandomdevelopRandom theForest the Forestmodel model (RF) ,(RF) and, andmethods 30%methods 30% of ofthewere thewere data dataused wereused were to used to deveused deve aslop as alop validation a3 validation 3different different set. set.estimation estimation models, models, and and their their estimationdevelopestimationToToTo comparethe compare compareperformances model performances thethe, theand, performancesperformances performances 30%were were of comp the comp ofdata ared.of the ared. of thewere di theSeventyff Seventydifferenerent useddifferen models,percent as tpercent models,at validationmodels, performancesof ofthe performancesthe performancesdata set. data were were used comparedwereused were as as compareda comparedtraininga fortraining 3 di setfffor erentsetfor to3 to 3 estimationRandomdevelopRandomToTo compare Forest compare theperformancesForest model (RF) the(RF) the andperformancesmethods performancesmethods were 30% comp ofwere thewere ared. of data used of theused theSeventywere differento differen to deveused deve percenttlop asmodels,tlop models,a3 validation of3different differentthe performances performancesdata set.estimation wereestimation used were were models,as models, compareda comparedtraining and and setfortheir fortheir to3 3 developdifferentscenariosdevelopdifferentestimation the scenarios the ofscenarios model dwellperformances model ,of time and , ofdwell and dwell estimation30% 30% timewere oftime of theestimation comp theestimationaccuracy data dataared. were were accuracy ofSeventy accuracy used less used than as of percentas a ofless 3,validationa less 5,validation than and ofthan the 3, 10 3,5,set. sdata 5, betweenandset. and were10 10 s between the sused between actual as thea and trainingthe actual estimatedactual andset and to estimationestimationToTo compare compare performances performances the, the performances performances were were comp comp ared.of ared.of the theSeventy Seventydifferen differen percent tpercent models,t models, of of the performancesthe performancesdata data were were used wereused were as as compareda comparedtraininga training setfor setfor to3 to3 differentestimateddevelopdwelldifferentestimateddevelopTo times. compare thescenarios dwell thescenarios modeldwell Amongmodel times. the times.of and, ofdwell theperformancesand Amongdwell 30%Among validation 30% time timeof the of theestimation the the estimationvalidation data set,validationof data the were were accuracy SVRdifferen accuracyset, used set, used modelth theas tof SVReas models, a oflessSVR shows validationa less validationmodel than model than thatperformances 3, shows 3,5, set. 53.6%shows 5,andset. and that 10 ofthat 10 s cases53.6% between swere 53.6% between have ofcompared ofcases the lesscases the actual thanhave actual have for 3andless sandless3 of developdifferentdifferentdevelopTo the scenarios compare thescenarios model model ,of the and,of dwell and dwell performances30% 30% time timeof of theestimation theestimation data data of were thewere accuracy accuracy useddifferen used as of as atof less validationmodels,a less validation than than 3,performances 3,5,set. 5,andset. and 10 10 s betweens betweenwere comparedthe the actual actual andfor and 3 differentestimatedthanerrors,estimatedthan To3 sTo3 andofcompare sscenarios dwell ofcompareerrors, dwell 67.2%errors, times. andthe times.of have andthe dwell performances 67.2%Among performances 67.2%lessAmong time thanhave havethe estimation the 5lessvalidation s lessvalidationof of than oferrors. thethan the accuracy5differen set,s5differen of Onset,s oftherrors. thetheerrors. tof SVRemodels,t otherless SVRmodels, On model Onthan model the hand, theperformances 3, othershowsperformances 5, othershows 87.5% and hand, that hand,10 ofthat s 53.6% the87.5% betweenwere 53.6%87.5% were estimation ofofcompared of ofcases thecompared casesthe estimationactual have byestimation have thefor andlessfor RFless3 3 estimatedestimateddifferentToTo compare dwellcompare scenarios dwell times. thetimes. the of performances Amongdwell performances Among time the the estimationvalidation validationof ofthe the differen accuracyset,differen set, th thet SVRemodels, t of SVRmodels, less model model than performances showsperformances 3, shows 5, and that that 10 53.6% s were53.6% betweenwere ofcompared of casescompared cases the have actual have for lessfor andless3 3 estimatedthanbydifferentmodelthanbydifferent the 3the RFs3 in of RFsscenarios modeldwell ofthe errors,scenarios model errors, validation times.in and of inthe and ofdwellthe 67.2% Amongvalidationdwell 67.2% setvalidation timemarked havetime havethe estimation set lessestimationvalidation set less belowmarked thanmarked than 10accuracy5 belowaccuracyset,s5 s ofbelow ofs oftherrors. error, e errors.10 ofSVR 10 ofsless as of sOnless model of shownerror,Onthan the error,than the 3, othershowsas in3,5,other as shown Table5,and shown hand,and thathand, 102 in.10 s It53.6%87.5% inbetween Tas is87.5% betweenTable notedble of of2. ofcases2. Itthethe that It isthethe estimationactualisnoted have theestimationactualnoted dwell thatandless thatand differentthanthanestimateddifferent 3 s3 of sscenarios of errors,scenarios dwell errors, and times.of and ofdwell 67.2% dwell 67.2%Among time havetime have estimation the estimationless lessvalidation than than accuracy5 accuracys5 ofset,s of errors. therrors. ofe ofSVRless Onless Onthan model the than the 3,other 3,other 5,shows 5,and hand,and hand, 10 that 10 s 87.5%between s 53.6%87.5% between of ofof thethe cases thethe estimationactual estimationactual have and andless thanbytheestimatedtimesbytheestimated the dwell 3the dwell RFs used ofRF modeldwell timeserrors, modeldwell fortimes training times.inused and times. inusedthe the for67.2%validationAmong forarevalidationAmong training training inhave integerthe theset lessvalidationare set validationmarkedare valuesthaninmarked ininteger 5integer belowsset, while ofbelowset, therrors.values eth thevalues10 SVR e10 sSVR estimated ofsOnwhile model oferror,while model theerror, the other asshowsthe dwell as showsshownestimated shown estimatedhand, that times that in 53.6%87.5% inTa dwell from53.6% Ta bledwellble of of2. the timesof cases2.Itthe times cases modelsItis estimationisnoted havefrom noted havefrom are thatlessthe thatlessthe in estimatedbybythanestimated the the RF3 RFs modeldwellof modeldwell errors, times.in times.in the and the Amongvalidation 67.2%validationAmong havethe the set validation set less validationmarked marked than belowset,5 belowset,s ofth th eerrors.10 SVR e10 sSVR ofs modelof error,On model error, the showsas otherasshows shown shown thathand, that in 53.6%in Ta 53.6%87.5% Tableble of 2. ofof 2.casesIt casesItisthe isnoted have estimationnoted have thatless thatless bythemodelsthanrealthemodelsthan the dwell 3 numbers,dwell RFs3 are ofs modelare timesoferrors, in timeserrors, inreal and inusedreal and usedthenumbers, makingand numbers, for67.2%validation for67.2% training training distinctions haveand haveand set makinglessare making lessmarkedare thanin betweenthan ininteger distinct integer5 distinct below s5 ofs ofvalueserrors. theions valueserrors.10ions 3 sbetween scenarios of Onwhilebetween error,Onwhile the thethe otherthe as separatedthe theother shownestimated 3 estimatedhand,scenarios3 hand,scenarios inby 87.5% Ta dwell87.5% a bledwellseparated few separated of2. timesseconds ofItthe times isthe estimationnoted byfromestimation byfrom maya thatfewathe few bethe thanthethebythan dwell the 3dwell s3 ofRFs oftimeserrors, model timeserrors, used and usedin and the for67.2% for 67.2%validation training training have have lessare set lessare thanmarkedin thanin integer integer5 s5 ofbelows oferrors.values valueserrors. 10 sOnwhile of Onwhile theerror, the the other the other asestimated shownestimated hand, hand, 87.5%in dwell87.5% Ta dwellble of times of2.the times Itthe estimationis from estimationnoted from the thatthe themodelssecondsbypracticallymodelssecondsby thedwell the RF are mayRF modelaretimesmay in insignificantmodel be inreal be practically usedrealin practically innumbers,the numbers,the for validation invalidation training insignificant real-life insignificantand and set makingare metrosetmaking marked inmarked in integer operations. indistinctreal-life distinct real-lifebelow below valuesions metro 10ions metro 10 sbetween of swhile betweenoperations. oferror, operations. error, the theas the as estimated shown3 shownscenarios3 scenarios in inTa dwell Ta bleseparated bleseparated 2. times 2.It isIt isnoted byfrom noted by a thatfewathe thatfew bymodelsmodelstheby the the dwell RF are RF modelare modelintimes in real realin usedinthenumbers, numbers,the validation for validation training and and setmaking set making aremarked marked in distinct integerdistinct below belowions values10ions 10 sbetween ofsbetween oferror,while error, the as thethe as shown 3 shownestimated3scenarios scenarios in inTa Ta bledwellseparated bleseparated 2. 2.It times Itis isnoted by noted byfrom a thatafew thatfewthe modelssecondsthesecondsthe dwellWhen dwellWhen aremay maytimes compared intimes becompared real be practicallyused practically usednumbers, fromfor fromfor training insignificantthetraining insignificantandthe perspective perspectivemaking are are in in in integer indistinctreal-life integer ofreal-life ofpercen percenvaluesions metrovalues metrotage betweentage whileoperations. accuracy, whileoperations. accuracy, thethe the estimated 3 the estimatedscenarios the SVR SVR modeldwell modeldwellseparated timesperformed timesperformed byfrom from a fewthe the thesecondssecondsmodelsthe dwell dwell may aremaytimes times bein be practicallyrealused practicallyused numbers, for for training insignificanttraining insignificant and are makingare in in in integerin real-life integer distinctreal-life values metrovaluesions metro between whileoperations. whileoperations. the thethe estimated estimated3 scenarios dwell dwell separated times times from byfrom athe fewthe secondsbestmodelsbestmodels inWhenTable inWhenthe aremay the are case 2.comparedin case becomparedThe inreal of practicallyreal ofproportionless numbers, less numbers, thanfrom thanfrom insignificant30%the of 30%andthe estimated andperspectiveerrors perspectiveerrorsmaking making followed data infollowed distinctreal-life of dependsdistinct ofpercen by percen byions metroRFions onRF tage andbetween thetage and betweenoperations. MLRaccuracy, absolute MLRaccuracy, modelsthe modelsthe di 3 theff scenarioserence3the asscenariosSVR asseenSVR betweenseen model inmodelseparated inTableseparated Table actualperformed performed 3. 3.Figuresby dwell Figuresby a fewathe fewthe8 8 modelssecondsmodelsWhenWhen are aremay comparedin compared inreal be real practically numbers, numbers, from from the insignificantandthe andperspective perspectivemaking making indistinct ofdistinctreal-life of percen percenionsions metrotage betweentage between accuracy, operations. accuracy, the the 3the scenarios3 the scenariosSVR SVR model modelseparated separated performed performed by by a fewathe fewthe bestandsecondsbestandseconds 9inWhen time show 9inthe showmay the andmay casecompared thecase be estimatedthe beof comparisonpractically of lesscomparisonpractically less thanfrom dwellthan 30%insignificant theof time30%insignificant of dwell perspectiveerrors dwell byerrors the time followed modeltimeinfollowed inreal-lifeestimation of real-lifeestimation types. percen by by RFmetro RFmetro tageresultsand resultsand operations. MLRaccuracy, operations. MLR derived derived models models the from from asSVR as seen each seen eachmodel in model. in Tablemodel. Tableperformed 3.The 3.TheFigures Figuresorange orange the 8 8 secondsbestbestseconds in in Whenthe maythe maycase casebecompared beof practically of lesspractically less than thanfrom insignificant30% 30% insignificantthe errors errorsperspective followed infollowed inreal-life real-life of by percen by metroRF RFmetro and tageand operations. MLR operations. MLRaccuracy, models models the as as SVRseen seen inmodel in Table Table performed 3. 3.Figures Figures the8 8 andand 9When show9When show compared the compared the comparison comparison from from theof the of dwellperspective dwellperspective time time estimation ofestimation ofpercen percen tageresults tageresults accuracy, accuracy,derived derived the from the from SVR SVReach eachmodel modelmodel. model. performed performed The The orange orange the the bestsquareandbestsquare in9 showin therepresents therepresents case thecase of comparison of lessthe lessthe thantrains thanModeltrains 30% operated of30% operated Type dwellerrors errors time duringfollowed duringfollowed< 3estimation s (%)the bythe morningby RF morning RF resultsand< and5 s MLRpeak (%) MLR peakderived modelshour, modelshour,< from10the as sthe as (%)blueseen each blueseen circlein model. circle inTable Table represents represents 3.The 3.Figures Figuresorange the 8the 8 squarebestandsquare inWhen 9 Whentherepresents show represents casecompared compared the of lesscomparisonthe the trainsthanfrom trainsfrom 30% theoperated theofoperated errorsperspective dwellperspective duringfollowed timeduring ofestimation the of percen theby percenmorning RFmorning tageand tageresults MLRaccuracy,peak accuracy,peak derived modelshour, hour, the thethe from astheSVR blueseenSVR blue eachmodel circleinmodel circle Tablemodel. performedrepresents performedrepresents 3. TheFigures orange the the8 andtrainsandtrainsbest 9 showoperated9in showoperated the thecase the duringcomparison ofduringcomparison less the than the evening of evening30% ofdwell dwell errors peak timepeak time followedhour, estimationhour, estimation and and by the RF theresults black resultsand black MLRtrianglederived trianglederived models representsfrom representsfrom as each seeneach model.the in model.the Tabletrains trains The The3.operated Figuresorangeoperated orange 8 bestsquaresquarebest in intherepresents therepresents case case of of lessthe lessthe thantrains trainsthan 30%SVR operated 30% operated errors errors duringfollowed duringfollowed 53.6 the theby morningby RFmorning RF and and 67.2 MLRpeak MLRpeak modelshour, modelshour, the 84.4 asthe as blueseen blueseen circlein circle inTable Table represents represents 3. 3.Figures Figures the the8 8 squaretrainsoutsideandtrainsoutsideand 9 operatedshow 9represents theoperatedshow the two the two the duringrush-hourcomparison duringrush-hourcomparisonthe trains the the periods.evening operatedperiods.ofevening ofdwell dwell peak peaktimeduring time hour, estimationhour, estimationthe and andmorning the theresults black results black peak trianglederived trianglederived hour, represents fromthe representsfrom blue each each circle themodel. themodel. trains represents trains The Theoperated operatedorange orange the andtrainstrainssquareand 9 showoperated9 operatedshow represents the the duringcomparison duringcomparison the the trains the evening RFofevening operatedof dwell dwell peak peaktime timeduring hour, estimationhour, 43.7 estimation andthe and morningthe theresults blackresults 62.2black peaktrianglederived trianglederived hour, representsfrom 87.5represents fromthe each blue each model. thecircle themodel. trains trains represents The Theoperated operatedorange orange the trainsoutsidesquareoutsidesquare operated representsthe representsthe two two duringrush-hour rush-hourthe the trains the trains periods.evening periods.operated operated peak during during hour, the and the morning themorning black peak peaktriangle hour, hour, represents the the blue blue circle the circle trains represents represents operated the the squareoutsideoutsidetrainssquare represents theoperated representsthe two two rush-hour duringrush-hourthe the trains trains the periods.MLR periods.operatedevening operated peakduring during hour, 39.0 the the andmorning morning the 54.3black peak peak triangle hour, hour, the 74.0represents the blue blue circle circle the represents trains represents operated the the outsidetrainstrainsTable operatedTable theoperated 2. two 2.The The duringrush-hour proportion during proportion the the periods.eveningof eveningofestimated estimated peak peak data datahour, hour,depends depends and and theon onthe the black the blackabsolute absolute triangle triangle difference difference represents represents between between the the actual trains actual trains dwell operated dwell operated trainsoutsidetrains operated operated the two during duringrush-hour the the evening periods.evening peak peak hour, hour, and and the the black black triangle triangle represents represents the the trains trains operated operated outsideoutsideTabletimeTabletime the and the 2. andtwo 2.The estimatedtwo The estimated rush-hour proportion rush-hour proportion dwell dwell periods.of time periods. ofestimated time estimated by by the the modeldata modeldata depends types.depends types. on on the the absolute absolute difference difference between between actual actual dwell dwell TableTable 2. 2.The The proportion proportion of ofestimated estimated data data depends depends on on the the absolute absolute difference difference between between actual actual dwell dwell outsideoutsidetimeWhentime the and the andtwo compared estimatedtwo estimated rush-hour rush-hour dwell from dwell periods. timetheperiods. time perspective by by the the model model of types. percentage types. accuracy, the SVR model performed the best TabletimetimeTable and 2. and 2.The estimated Theestimated proportion proportion dwell dwell of time ofestimated time estimated by by the the modeldata modeldata depends types.depends types. on on the the absolute absolute difference difference between between actual actual dwell dwell in theTable case 2. of The lessModel proportionModel than Type 30% Type of errors estimated followed <3 <3data s (%)s (%)depends by RF and on MLRthe <5 <5 sabsolute (%) modelss (%) difference as seen <10<10 inbetween s Table (%)s (%) 3 actual . Figures dwell8 and 9 timetimeTable and and 2.estimated Theestimated proportion dwell dwell time of time estimated by by the the model modeldata types. depends types. on the absolute difference between actual dwell TableTable 2. 2.The TheModel proportionModel proportion Type Type of ofestimated estimated <3 <3data s data (%)s (%)depends depends on on the <5 the <5 sabsolute (%)sabsolute (%) difference difference <10<10 between s between (%)s (%) actual actual dwell dwell showtimetime the and comparisonand estimated estimatedModelModelSVR SVRdwell ofType dwellType dwell time time timeby by the<3 estimationthe <353.6model s 53.6model (%)s (%) types. types. results <5 derived <567.2 s 67.2 (%)s (%) from each <10 model. <1084.4 s84.4 (%)s (%) The orange square timetime and and estimated estimatedModelSVR SVRdwell Type dwell time time by by the <3 the 53.6model s 53.6model(%) types. types. <567.2 s 67.2(%) <1084.4 s84.4 (%) represents the trainsModelSVRRFSVR operatedRF Type during <343.753.6 the53.643.7 s (%) morning peak <562.267.267.2 hour,62.2 s (%) the blue circle <1087.584.484.487.5 s represents(%) the trains ModelModelRF RFType Type <343.7 <3s 43.7 (%)s (%) <562.2 <5s 62.2 (%)s (%) <10 <1087.5 s87.5 (%)s (%) operated duringModel theModelMLRSVRRF eveningMLRSVR RFType Type peak hour, <353.639.0 <343.7 s 43.753.639.0 (%) ands (%) the black <5triangle67.2 54.3<562.2 s 62.267.254.3 (%)s (%) represents <10 the <1084.4 74.087.5 s87.584.4 74.0 trains (%)s (%) operated outside MLRSVRRFMLRSVR 43.739.053.639.053.6 62.2 54.367.2 54.367.2 87.5 74.084.4 74.084.4 the two rush-hour periods.MLRSVRMLRSVRRF 53.639.039.043.753.6 67.2 54.3 54.362.267.2 84.4 74.0 74.087.584.4 MLRRFRF 39.043.743.7 54.362.262.2 74.087.587.5 RFMLRRF 43.739.043.7 62.2 54.362.2 87.5 74.087.5 MLRMLR 39.039.0 54.3 54.3 74.0 74.0 MLRMLR 39.039.0 54.3 54.3 74.0 74.0 Appl.Appl. Sci.Sci. 20202020,, 1010,, 476x FOR PEER REVIEW 99 ofof 1212 Appl. Sci. 2020, 10, x FOR PEER REVIEW 9 of 12 Table 3. The proportion of estimated data depends on the relative difference between actual dwell TableTabletime and 3. 3.The Theestimated proportion proportion dwell of of estimatedtime estimated by the data modeldata depends depends types. on theon the relative relative diff erencedifference between between actual actual dwell dwell time andtime estimated and estimated dwell dwell time bytime the by model the model types. types. Model Type <10% (%) <20% (%) <30% (%) ModelSVRModel Type Type <10%52.8<10% (%) (%) < <20%20%71.4 (%)(%) <30% <30% (%)81.6 (%) SVRRF SVR 42.352.8 52.8 71.469.7 81.681.081.6 MLRRF RF 42.336.5 42.3 69.762.1 81.081.0 73.8 MLRMLR 36.5 36.5 62.1 73.8 73.8

FigureFigure 8.8. Dwell time estimation performance comparisoncomparison amongamong thethe ((aa)) SVR,SVR, ((bb)) RF,RF, ((cc)) MLRMLR models.models. Figure 8. Dwell time estimation performance comparison among the (a) SVR, (b) RF, (c) MLR models.

FigureFigure 9.9. DwellDwell timetime estimationestimation performanceperformance comparisoncomparison byby thethe timetime ofof day.day. ((aa~~cc)) inin thethe morningmorning peak,Figurepeak, ((d d9.~~f f)Dwell) inin thethe time eveningevening estimation peakpeak andand performance ((gg~~ii)) inin thethe comparison normalnormal period.period. by the time of day. (a~c) in the morning peak, (d~f) in the evening peak and (g~i) in the normal period. Appl. Sci. 2020, 10, x FOR PEER REVIEW 10 of 12 Appl. Sci. 2020, 10, 476 10 of 12 To further enhance the SVR model with 3 variables (with boarding, alighting, and onboard passenger counts at independent variables) which was found to be the most accurate model among To further enhance the SVR model with 3 variables (with boarding, alighting, and onboard the 3 models, we developed different SVR models with 3 additional input variables: train arrival time, passenger counts at independent variables) which was found to be the most accurate model among station information, and time difference between arrivals at 2 consecutive stations. As shown in the 3 models, we developed different SVR models with 3 additional input variables: train arrival Tables 4 and 5, it is found that the performances of SVRs with additional input variables were further time, station information, and time difference between arrivals at 2 consecutive stations. As shown in enhanced. In particular, the scenario with less than 5 s of error was improved by 20% in accuracy. Tables4 and5, it is found that the performances of SVRs with additional input variables were further Most notably, the enhanced SVR model performed at 97.2% accuracy for the scenario with less than enhanced. In particular, the scenario with less than 5 s of error was improved by 20% in accuracy. 10 s of error. Figure 10 shows the comparison of estimation performance between the 6-variable Most notably, the enhanced SVR model performed at 97.2% accuracy for the scenario with less than 10 model with the 3-variable model. s of error. Figure 10 shows the comparison of estimation performance between the 6-variable model with the 3-variable model. Table 4. The proportion of estimated data depends on the absolute difference between actual dwell Tabletime 4.andThe estimated proportion dwell of time estimated by the data number depends of variables on the absoluteof SVR models. difference between actual dwell time andModel estimated Type dwell time by the <3 number s (%) of variables of <5 SVR s (%) models. <10 s (%) SVR: 3 variablesModel (A) Type 53.3 <3 s (%) <5 67.2 s (%) <10 s (%) 84.4 SVR: 6 variablesSVR: 3 variables(B) (A) 76.3 53.3 67.2 87.6 84.4 97.2 ImprovementsSVR: (B-A) 6 variables (B)23.0 76.3 87.620.4 97.2 12.8 Improvements (B-A) 23.0 20.4 12.8 Table 5. The proportion of estimated data depends on the relative difference between actual dwell Tabletime 5.andThe estimated proportion dwell of estimated time by the data number depends of variables on the relative of SVR di ffmodels.erence between actual dwell time and estimatedModel dwell Type time by the number <10% of(%) variables of SVR <20% models. (%) <30% (%) SVR: 3 variablesModel (C) Type 52.8< 10% (%) <20% 71.4 (%) <30% (%) 81.6 SVR: 6 variablesSVR: 3 variables(D) (C) 72.1 52.8 71.4 92.9 81.6 96.3 ImprovementsSVR: (D-C) 6 variables (D)19.3 72.1 92.921.5 96.3 14.7 Improvements (D-C) 19.3 21.5 14.7

FigureFigure 10. 10.Estimation Estimation performance performance of of the the 6-variable 6-variable and and the the 3-variable 3-variable SVR SVR model. model.

Appl. Sci. 2020, 10, 476 11 of 12

5. Conclusions This paper develops a railway dwell time estimation model using SVR, RF, and MLR methods. In the first phase, smart card-based passenger information is matched against the real-time train operation data from the S line of the Seoul Metro in Seoul, South Korea, for extracting boarding, alighting, on-bard passenger counts, and the observed dwell times for all trains at all stations. When the train departure times were missing, an assumption of constant braking time was adopted to estimate actual/observed dwell times, since the S line is autonomously operated with minimal variability in braking times. In the second phase, the paper develops dwell time estimation models utilizing the extracted information from the first phase. The SVR, RF and MLR-based models were developed for dwell time estimations while boarding, alighting, and onboard passenger counts were treated as independent variables and the dwell time was set as a dependent variable. In the comparative scenarios with less than 3, 5, and 10 s of errors between the estimations and the observed values, the SVR model performed the best, with an accuracy of 67.2% in the scenario with less than 5 s of error. Then the SVR model was enhanced by including additional independent variables, including arrival times and time difference of arrivals, at 2 consecutive stations for all trains at all stations. It was found that the enhanced SVR model improved the accuracy by 20% to an accuracy of 87.6% for the scenario of less than 5 s of error. In the case of less than 10 s of error, the improved SVR model performed at 97.2% accuracy. This research is unique in the sense that, firstly, it extracts boarding, alighting, and onboard passenger counts using data from real-life metro operations and smart card-based passenger information for all trains at all stations on an urban metro line. Secondly, this research develops dwell time estimation models with high performance accuracies that are validated by real-life data. The results of this paper are especially beneficial for autonomous railway operation, which requires construction and maintenance of dynamic railway timetables that require reliable dwell time real-time predictions. However, this estimation model may not work well if incidents such as rolling stock failure, or big events (i.e., sports games or exhibitions) occur near metro stations that may increase the demand dramatically and present unseen data to the proposed models. Enhancing the proposed estimation models to cover not only a general commute-based operation environment, but also the situations affected by such special events, are topics of future studies.

Author Contributions: In this study, all of the authors contributed to the writing of the manuscript. Y.O. designed the overall modeling framework and performed the data analysis. Y.-J.B. helped with the data analysis and drafted the initial version of the manuscript. J.Y.S. supported data acquisition and contributed to the results interpretation. H.-C.K. and S.K. advised on the modeling process and coordinated the overall research. All authors have read and agreed to the published version of the manuscript. Funding: This research was supported by a grant from R&D Program of the Korea Railroad Research Institute, Republic of Korea. Acknowledgments: The authors thank Seong-Jong Woo and Soo-In Lee for their help in data processing. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Parbo, J.; Nielsen, O.A.; Prato, C.G. Passenger Perspectives in Railway Timetabling: A Literature Review. Transp. Rev. 2016, 36, 500–526. [CrossRef] 2. Goverde, R.M.P. A delay propagation algorithm for large-scale railway traffic networks. Transp. Res. Part C Emerg. Technol. 2010, 18, 269–287. [CrossRef] 3. Harrod, S.; Cerreto, F.; Nielsen, O.A. A closed form railway line delay propagation model. Transp. Res. Part C Emerg. Technol. 2019, 102, 189–209. [CrossRef] 4. Liebchen, C. The first optimized railway timetable in practice. Transp. Sci. 2008, 42, 420–435. [CrossRef] 5. Harrod, S.S. A tutorial on fundamental model structures for railway timetable optimization. Surv. Oper. Res. Manag. Sci. 2012, 17, 85–96. [CrossRef] Appl. Sci. 2020, 10, 476 12 of 12

6. Cucala, A.P.; Fernández, A.; Sicre, C.; Domínguez, M. Fuzzy optimal schedule of high speed train operation to minimize energy consumption with uncertain delays and driver’s behavioral response. Eng. Appl. Artif. Intell. 2012, 25, 1548–1557. [CrossRef] 7. Högdahl, J.; Bohlin, M.; Fröidh, O. A combined simulation-optimization approach for minimizing travel time and delays in railway timetables. Transp. Res. Part B Methodol. 2019, 126, 192–212. [CrossRef] 8. Goverde, R.M.P.; Hansen, I.A. Performance indicators for railway timetables. In Proceedings of the IEEE International Conference on Intelligent Rail Transportation, Beijing, China, 30 August–1 September 2013; pp. 301–306. 9. Eom, J.K.; Song, J.Y.; Moon, D.S. Analysis of public transit service performance using transit smart card data in Seoul. KSCE J. Civ. Eng. 2015, 19, 1530–1537. [CrossRef] 10. Hong, S.P.; Min, Y.H.; Park, M.J.; Kim, K.M.; Oh, S.M. Precise estimation of connections of metro passengers from Smart Card data. Transportation 2016, 43, 749–769. [CrossRef] 11. Alvarez, A.B.; Merchan, F.; Poyo, F.J.C.; George, R.J.C. A fuzzy logic-based approach for estimation of dwelling times of panama metro stations. Entropy 2015, 17, 2688–2705. [CrossRef] 12. Lam, W.H.K.; Cheung, C.Y.; Poon, Y.F. A Study of Train Dwelling Time at the Hong Kong Mass Transit Railway System. J. Adv. Transp. 1998, 32, 285–296. [CrossRef] 13. Lee, W.H.; Yen, L.H.; Chou, C.M. A delay root cause discovery and timetable adjustment model for enhancing the punctuality of railway services. Transp. Res. Part C Emerg. Technol. 2016, 73, 49–64. [CrossRef] 14. Markovi´c,N.; Milinkovi´c,S.; Tikhonov, K.S.; Schonfeld, P. Analyzing passenger train arrival delays with support vector regression. Transp. Res. Part C Emerg. Technol. 2015, 56, 251–262. [CrossRef] 15. Wang, R.; Work, D.B. Data Driven Approaches for Passenger Train Delay Estimation. In Proceedings of the 18th International Conference on Intelligent Transportation Systems, Las Palmas, Spain, 15–18 September 2015; pp. 535–540. 16. Jiang, Z.; Xie, C.; Ji, T.; Zou, X. Dwell time modelling and optimized simulations for crowded rail transit lines based on train capacity. Promet Traffic Transp. 2015, 27, 125–135. [CrossRef] 17. Adachi, S.; Yoshino, H.; Koresawa, M.; Parady, G.T.; Takami, K.; Harata, N. A Study on Train Travel Time Simulation Focused on Detailed Dwell Time Structure and on-site Inspections. In Proceedings of the 8th International Conference on Railway Operations Modelling and Analysis (ICROMA), Norrköping, Sweden, 17–20 June 2019; pp. 15–28.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).