Application of Data Clustering to Railway Delay Pattern Recognition

Hindawi
Journal of Advanced Transportation
Volume 2018, Article ID 6164534, 18 pages
https://doi.org/10.1155/2018/6164534

Research Article

Fabrizio Cerreto,1 Bo Friis Nielsen,2 Otto Anker Nielsen,1 and Steven S. Harrod1

1 Department of Management Engineering, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
2 Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Kongens Lyngby, Denmark

Correspondence should be addressed to Fabrizio Cerreto; [email protected]

Received 21 November 2017; Revised 20 February 2018; Accepted 11 March 2018; Published 29 April 2018

Academic Editor: Andrea D’Ariano

Copyright © 2018 Fabrizio Cerreto et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

K-means clustering is employed to identify recurrent delay patterns on a high traffic railway line north of Copenhagen, Denmark. The clusters identify behavioral patterns in the very large (“big data”) datasets generated automatically and continuously by the railway signal system. The results reveal the conditions where corrective actions are necessary, showing the cases where recurrent delay patterns take place. Delay profiles and delay change profiles are generated from timestamps to compare different train runs and to partition the set of observations into groups of similar elements. K-means clustering can identify and discriminate different patterns affecting the same stations, which is otherwise difficult in previous approaches based on visual inspection. Classical methods of univariate analysis do not reveal these patterns. The demonstrated methodology is scalable and can be applied to any system of transport.

1. Introduction

Operations analysis is the collection and review of performance data, such as punctuality and process cycle time. It is a key step in the continuous improvement of transport services, and several methods exist to collect and analyze data from operations. The increasing availability of automated data sources is offering new ways to analyze operations, providing deeper insight and more reliable information. Railway management is very accepting of these new possibilities, and considerable effort is made by operators and institutions to use operations analysis in feedback loops for improving the timetabling process [1–4]. A better understanding of the development of delays in railways, and in transportation in general, provides the opportunity to improve the processes and identify the factors affecting reliability. For example, causes of delays might be identified in misallocation of supplements and buffers in timetables, structural conflicts that require mitigation actions, suboptimal design of station processes, and inefficient procedures for preparing a train for departure. This paper demonstrates a data mining technique based on k-means clustering to identify recurrent delay patterns in transportation, identify the main reason for cluster membership, and provide managerial insight to improve timetables and processes.

Prior studies propose several methods that are currently in use for operation analysis, deploying sources of automatic data collection. These approaches can be divided into traditional statistical methods and big data techniques, which differ in both the use of data and in the output provided. Traditional methods tend to aggregate and summarize information, so these can provide a general picture or detailed information on specific stations or trains. These are typically proposed in the form of multiple univariate distribution analysis, where the occurrence of different delay patterns at the same station is not visible. Big data techniques can be used to investigate recurring patterns or internal structures in operations. These approaches are expanding, thanks to the growing availability of large amounts of data, and several techniques have been deployed to identify recurrences of delays and describe or predict delays. Advanced techniques such as neural networks, succession rules, Bayesian networks, and various methods of regression have been developed mainly to predict real time delays in railways, as described in Section 2. However, train delays are necessarily correlated over the progression of a complete journey, and these data relations both along the journey of a train and among adjacent train paths have not received as much attention in the literature.

This paper presents a big data technique to identify recurring delay patterns in railway operations. Big data refer to information assets characterized by high volume, velocity, and variety, whose value is extrapolated by analytical methods [5]. In this application, the absolute delay and delay change are tracked for individual train paths along a railway line, resulting in absolute delay and delay change profiles. In the papers based on univariate statistics, systematic delays in these profiles are identified through visual inspection. The manual search for similarities suffers from subjective interpretation from the operator and is easily biased by common artefacts of the representation. The technique presented in this paper applies k-means clustering to find recurrent patterns in train delay progression, so that management may identify processes for improvement or correction. In this way it is possible to support continuous quality improvement.

In Section 2, a literature survey of contemporary data analysis methods is offered. Section 3 presents the k-means cluster method and the structure of the data to be studied. Section 4 presents results from the study of a high density Danish railway line. The effectiveness of k-means clustering for this application is discussed in Section 5, particularly with regard to its novelty compared to existing literature, while conclusions of this paper are presented in Section 6.

2. Literature Survey

Operations analysis is fundamental in the continuous improvement process to manage and modify railway operations. Data collected from real operations, or from simulation models, has been used in the feedback loop to design and improve railway timetables for decades. Typically, even if timetables may change over time, some of the fundamental infrastructure and service behaviors will not be modified. Timetables are often the result of only minor modifications to the previous editions and need to consider problems discovered in earlier timetables. For example, after a structural change in the Danish railway timetable in 1998, after the opening of the Great Belt fixed link, the service structure remained largely unchanged until 2016 [6].

Data collection systems have proliferated in railway networks since 2000, and very large amounts of data are [...]

[...] railway network. Goverde et al. fit different distributions for arrival and departure delays and find that no general distribution fits all groups of recorded arrival delays.

Primary delay distributions derived from operational data are also often employed as input in simulation models to evaluate the propagation of delays. Sipilä [9] explores the effect of modified running time supplements in railway schedules through microsimulation of a Swedish railway line. The author identifies different strategies for running time supplement allocation by verifying the significance of the change in punctuality recorded in 1600 simulations of selected scenarios. Lindfeldt [10] describes a method to aggregate delay data from real records and isolate distributions of primary delays. These distributions are then used to formulate microsimulation models. The data consists of manual records from dispatchers that assign a delay cause code to every record greater than 4 minutes of delay on the Swedish railways. In absence of other sources of data, the reliability of manual record cannot be validated, although the whole simulation model and its results rely on the derived distributions. Studies from other countries show that manual input can be indeed unreliable [11, 12]. The same method to extract primary delay distributions is later used by Lindfeldt and Sipilä [13] in a simulation model to assess the effect of allowing freight trains to travel outside their assigned path. The authors demonstrate that the realized travel times of freight trains could be shortened considerably without affecting the performance of other trains. The reduction of unnecessary waits for traffic management and the permission to depart before schedule reduced the average travel time on one side but increased its variability on the other.

Historical data also provides insight into the factors that influence service reliability. Olsson and Haugland [14] apply regression analysis on the Norwegian railway network and identify the most relevant factors for punctuality, such as absolute passenger flow and passenger occupation ratio. Gorman [15] uses regression analysis on data from American single-tracked freight railways to identify the factors that contribute the most to prolongation of railway running times. Gorman predicts congestion delay based on meets and passes scheduled as a consequence of speed heterogeneity. Again, in simulation, Shih et al. [16] apply an approach similar to Gorman’s to determine the best capacity expansion strategy in terms of reduction of average prolongation of running time for freight trains. Shih et al. identify functional relationships, through regression of simulation results, between average [...]
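
The delay profiles and delay change profiles described in the Introduction, and their partition by k-means clustering, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the sign convention (positive values mean a late train), the deterministic choice of initial centroids, and the four hypothetical train runs. The paper's own data structure and clustering setup are presented in its Section 3.

```python
from typing import List, Sequence, Tuple


def delay_profile(scheduled: Sequence[float], actual: Sequence[float]) -> List[float]:
    """Delay at each measuring point: actual minus scheduled timestamp (seconds)."""
    return [a - s for s, a in zip(scheduled, actual)]


def delay_change_profile(delays: Sequence[float]) -> List[float]:
    """Change in delay between consecutive measuring points."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two profiles of equal length."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def k_means(points: List[List[float]], k: int,
            max_iterations: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd iteration. Assumption: the first k points seed the centroids,
    which keeps this toy example reproducible (real studies usually use
    several random restarts)."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical timetable and recorded timestamps (seconds) at four stations.
scheduled = [0, 300, 600, 900]
recorded_runs = [
    [0, 300, 720, 1020],  # loses about two minutes before the third station
    [10, 310, 610, 905],  # roughly constant small delay
    [0, 305, 730, 1025],  # loses about two minutes before the third station
    [5, 300, 600, 900],   # roughly on time
]

change_profiles = [
    delay_change_profile(delay_profile(scheduled, run)) for run in recorded_runs
]
labels, centroids = k_means(change_profiles, k=2)
print(labels)
print(centroids)
```

In this toy data the runs that lose time on the same section fall into one cluster and the punctual runs into the other, which is the kind of grouping that visual inspection of individual profiles has to find by eye.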
