Time-Series Data Mining in Transportation: a Case Study on Singapore Public Train Commuter Travel Patterns

IACSIT International Journal of Engineering and Technology, Vol. 6, No. 5, October 2014 Time-Series Data Mining in Transportation: A Case Study on Singapore Public Train Commuter Travel Patterns Roy Ka-Wei Lee and Tin Seong Kam discuss the potential use of time-series data mining, a Abstract—The adoption of smart cards technologies and relatively new framework by integrating conventional automated data collection systems (ADCS) in transportation time-series analysis and data mining techniques, to discover domain had provided public transport planners opportunities to actionable insights and knowledge from the smart card amass a huge and continuously increasing amount of time-series data about the behaviors and travel patterns of commuters. temporal data. A case study on the Singapore public train However the explosive growth of temporal related databases has transit will also be used to demonstrate the time-series data far outpaced the transport planners’ ability to interpret these mining framework and methodology. data using conventional statistical techniques, creating an In the following, we summarize the contribution of this urgent need for new techniques to support the analyst in paper: transforming the data into actionable information and knowledge. This research study thus explores and discusses the We discuss and explore the use of time-series data potential use of time-series data mining, a relatively new mining techniques in transportation domain. The framework by integrating conventional time-series analysis and application of time-series data mining would allow data mining techniques, to discover actionable insights and knowledge from the transportation temporal data. A case study transport planners to effectively transform large amount on the Singapore public train transit will also be used to of complex temporal data into actionable insights and demonstrate the time-series data-mining framework and knowledge. methodology. We demonstrate the use of time-series data mining in transportation domain with a case study on Singapore Index Terms—Time-series data mining, smart card, big data, public train transit. The case study would apply transportation. time-series data mining techniques on over 60 million commuters‘ public train transit trips and generate travel patterns of commuters for 102 train stations. I. INTRODUCTION We derived several interesting insights on commuters‘ A. Motivation travel patterns and behavior from the case study. These Bagchi and White [1], in their paper, ―The potential of generated insights provide an example to urban public transport smart card data‖, introduced and explored transport planners on how they could leverage the possibilities of using smart cards beyond fare collection. time-series data miming to better understand the The smart cards data, beyond just the transacted fare prices, commuters‘ travel patterns and behaviors. also contains rich information such as boarding and exiting C. Paper Outline time, as well as geospatial information such as the boarding The rest of the paper is organized as follows. Section II and exiting bus stops and train stations. Since then, urban reviews the literatures related to our study. Section III transport planners had tried to analyze the collected smart discusses the time-series data mining technique. The case card data in attempt to discover useful information and study on Singapore public train transit will be introduced in knowledge on the commuters‘ travel patterns and behaviors. Section IV. Section V describes the application of time-series However, the analysis of the smart card data was proven to data mining methodology and framework used in the case be a challenge as the automatically collected smart card data study. We present the insights generated from this case study were large and continue to increase in size. This explosive in Section VI before concluding the paper in Section VII. growth in such complex temporal data has far outpaced the urban transport planners‘ ability to interpret these data using conventional statistical techniques. As such, there is an urgent II. LITERATURE REVIEW need for new techniques to allow urban transport planners to transform these massive complex data into actionable In recent years, smart cards and automated data collection information ad knowledge. systems (ADCS) had also been widely adopted in public transport networks around the world. These smart cards, B. Research Objectives and Contributions which are portable credit card size devices that store and In this research study, we therefore aim to explore and process data, were credited with monetary value and used for public transit fare payment [2]. Although the preliminary purpose of smart cards is for Manuscript received November 25, 2013; revised March 3, 2014. collection of fares for public transport, Bagchi and White [1] The authors are with Singapore Management University, Singapore (e-mail: [email protected]). discussed the potential use of the passively recorded DOI: 10.7763/IJET.2014.V6.737 431 IACSIT International Journal of Engineering and Technology, Vol. 6, No. 5, October 2014 transaction data for travel behavior analysis and public proposed by Berndt and Clifford [9]. As mentioned transport planning. Since then, various researches and case previously, one of the common problems in analyzing studies were done on the data collected from public transport time-series data is that time-stamped data might be irregularly smart card systems around the world. Morency et al [3] recorded, which might cause two different time-series that explored the data mining techniques used to analyze the have common trends to occur at different time. If the two spatial and temporal variability of Canadian pubic transit time-series did not occur simultaneously, the application of network passengers using different card types. Asakura et al traditional data mining techniques will not discern such a [4], constructed a origin-destination (O-D) matrix using the relationship, as it does not consider time as a factor in smart card data collected from Japan‘s public train network comparison. and applied statistical analysis to study the change in passengers‘ travel patterns when the train operator changed its train timetable. Kim and Kang [5] also attempted a similar research on Seoul public transit network by using the transaction data collected from T-Money, South Korea‘s electronic fare card system. Fig. 1. Comparing time-series using Euclidean distance. Locally, there were also some researches done using the data collected from EZ-Link, Singapore‘s smart card used for public transit. Lee et al [6] did a case study to optimize serviceability and reliability of bus routes by running statistical analysis on the EZ-Link data collected from bus transit trips. Sun et al [7] also did a study using the EZ-Link data collected from public train transit to estimate the Fig. 2. Comparing time-series using dynamic time warping. spatial-temporal density passenger onboard a train or waiting in the train station. Take for example, Fig. 1, a traditional data mining In the literature, there is a large body of work on applying similarity measure such as Euclidean distance is used to statistical analysis on the smart card data collected from compare the similarity between two time series Q and C, and a public transit trips. However, few researches were able to deal relationship is not discern because it is the two time-series are with the growing large amount of temporal data effectively or out of phase. In Fig. 2, DTW algorithm is used to overcome were there any extensive study done using the smart card data this problem by accounting for the time factor when to reveal the commuters‘ travel patterns. comparing the two time-series. III. TIME- SERIES DATA MINING Time-series data refers to data collected in a routine, continuous and sequential manner. This type of data, which typically accompanied with a timestamp, has always been collected by businesses and organizations in their daily operations. Examples of such data include sales transaction, delivery orders, stock quote prices etc. Increasingly, businesses and organizations seek to analyze these time-series data to uncover more business insights. However, the analysis of time-series data posted new challenges, as traditional data Fig. 3. Singapore public transport mode share. mining and statistical techniques are inappropriate when There were already time-series data mining case studies analyzing data that contain a time factor. These challenges done on business and organizations. However, most case eventually became the motivation for the development of studies were done in the retail business domain. In a recent time-series data mining techniques. case study, Nakkeeran et al [10] demonstrated how According to Schubert and Lee [8], there are two main time-series data mining techniques can be used to perform challenges in analyzing time-stamped data. Firstly, it is a clustering of retail store-level revenue over time and how tedious process to transform time-stamped data into table profiling of such clusters generates greater business insights. formats, which is suitable for the application of traditional Although there is an increased interest for organizations to statistical techniques. Secondly, it is challenging to apply explore time-series data mining, few research studies were pattern detection on time-stamped data using traditional data done to apply time-series data mining in the transportation mining as the time-stamped data might be irregularly recorded. domain. Time-series data mining however, tried to overcome these challenges by presenting a framework methodology that converts time-stamped data into time-series format suitable IV. SINGAPORE PUBLIC TRAIN TRANSIT for analysis and pattern detection. One of the most widely studied time-series data mining A. Singapore Public Transport System technique is the dynamic time warping (DTW) algorithm There are three main modes of public transportation in 432 IACSIT International Journal of Engineering and Technology, Vol. 6, No. 5, October 2014 Singapore: Taxi, bus and mass rapid transit (MRT) rail V.

Time-Series Data Mining in Transportation: a Case Study on Singapore Public Train Commuter Travel Patterns

FITTING-OUT MANUAL for Commercial Occupiers

Merdeka Generation Package $100 Top-Up Benefit

Yamato Transport Branch Postal Code Address TA-Q-BIN Lockers

SLIDE Store Listing- 1 Apr 2019

The New Bugis Station and Associated Tunnels for the Singapore MRT

Singapore for Families Asia Pacificguides™

List of Yamato Singapore Branch Offices, 7-CONNECT Lockers and 7

Moving Stories 2.0.Pdf

List of Public CD Shelters As of 31 Dec 2019.Xlsx

Duo Towerduo

List of Yamato Singapore Branch Offices, TA-Q-BIN Lockers and 7

BUS ROUTES in SINGAPORE Route No