Identifying Dependencies Among Delays
Total Page:16
File Type:pdf, Size:1020Kb
Identifying Dependencies Among Delays Dissertation zur Erlangung des Doktorgrades der Mathematisch-Naturwissenschaftlichen Fakultäten der Georg-August-Universität zu Göttingen vorgelegt von Carla Conte aus Bassano del Grappa (VI) Italien Göttingen, 2007 2 D7 Referentin: Prof. Dr. Anita Schöbel Korreferent: Prof. Dr. Axel Munk Tag der mündlichen Prüfung: 17. Januar 2008 To Lia Zannoni ”. mortis nostræ animæ . ” Abstract Weniger als 5 Minuten Verspätung: Wir bitten um Ihre Verstandnis und Geduld. Mehr als 5 Minuten Verspätung: Wir bitten um Ihre Entschuldigung. DEUTSCHE BAHN Mathematical optimization is a vital area in applied mathematics. Even if it came into existence about few decades ago, this field underwent not only theoretical advances. Many research efforts have been spent to study large-scale real-world applications. The enormous progress made in computer science together with the development of sophisticated numerical and algorithmic techniques are the reasons for the success in dealing with the large amount of data now available for the description of a practical problem. In view of this background, solving a practical problem by means of applied math- ematics is much more than applying mathematics. Implementation skills are of im- portance as well as the ability to communicate with practitioners and scientists from other disciplines. The proximity and interaction with computer science and statistic supply us with a spectrum of new ideas and techniques. This application-oriented thesis deals with delay propagation in railway system, in particular with identifying dependencies among delays. The idea is to combine two different fields: stochastic and optimization. A stochastic approach points out rela- tionship among variables representing (arrival/departure) delays of the trains at the stations of their journey. These dependencies are then transformed with a linear re- gression procedure into capacity constraints of a rescheduling problem. Finally the optimization problem is solved and the robustness of the new solution is checked through a comparison with the one of the uncapacitated model of the problem. Nowadays, railway transportation needs to become more and more competitive, so new features are required to improve the planning process. In daily operations the goal is to compensate perturbations of the scheduled timetable, in particular to meet in a better way the passengers’ needs concerning transfers and changes. This leads to the delay management problem. Delay propagation is often considered as one of the main reasons for the poor attractiveness of railway transport. In fact, a better re- scheduling of the timetable in order to minimize the disadvantages for the passengers will be possible if the critical points of the system are known. This includes knowl- edge about dependencies among delays, in order to be able to point out where the source delays are and how they spread out into the system. Delays and their behavior in railway systems have recently been investigated by [32], [40], [61], [72], [80] . II We distinguish, as usual in the literature (see e.g. [54]), between two types of de- lay: source delays, i.e. delays that are caused from the outside and not from other trains (see “Urverspätungen” in [73]), which usually spread out into the system induc- ing a second kind of delay, called ”forced delay” (see “Folgeverspätungen” in [73]). We further distinguish between the following three types of delay propagation: 1. propagation along the same train. Delay is carried over along the path of each delayed train, i.e. if a train starts with a delay it is likely to reach its next station with a forced delay (propagation along a driving activity), and if it arrives at a station with a delay it will probably depart with a delay (propagation along a waiting activity); 2. propagation from one train to another due to connections. If a connecting train waits for a delayed feeder train, the delay of the feeder train may spread out to the connecting train (propagation along a transfer activity); 3. propagation from one train to another due to the limited capacity of infrastruc- ture. If two trains share the same infrastructure (a part of a track or a platform) one of them has to wait until the other has left (propagation along a headway). The first two types of delay propagation are easy to handle from an analytical point of view, since the minimal duration of every activity is known and is hence an explicit (i.e. given) parameter. However, the third kind of delay propagation is more compli- cated to deal with. This is due to the fact that it requires a detailed knowledge of the track system on a microscopic level. In Chapter 1 an overview about approaches dealing with the third type of delay prop- agation is given. In Chapter 2 the delay management problem is introduced. It deals with reactions in case of delays in public transportation. The timetable problem is formulated as a linear programming introducing the notion of Activity-on-arc Project Network (see [51]). The arrivals and departures of the trains in all the stations of their journey are regarded as the nodes of the network (we refer to them as the events of the system). The edges are defined by the scheduled activities of the system: waiting activities, driving activities and connections. Classical formulations of the timetable problem as Resource Constrained Project Scheduling problem (Ref: [64]) or as Job Shop Scheduling problem (Ref: [17] and [18]) are presented. The NP-hardness of these formulations could be proved. Models suggested by other research groups are intro- duced to stress advantages and drawbacks with respect to the model considered in this thesis. In contrast to all these classical approaches, the goal of this work is to propose a procedure which enables us to detect dependencies among delays of the third type without explicit knowledge of all details of the infrastructure. In Chapter 3 a stochastic approach to identify dependencies, called Tri-graph (see [75] and [76]), is presented. This is a simplified graphical modeling approach in which full conditional modeling is carried out in small subgraphs with only three vertices that will be then combined into the full model. The variables of the rescheduling problem are the events of an Activity-on-arc Project Network for the timetable problem in rail- way systems. The delay measurements of each event in this model correspond to the observations of the variables. Analyzing the dependencies of these variables with the Tri-graph method yields links among the events of the system, i.e. among the arrival and departure events of the trains in the stations. These links represent on one side the dependencies arising during the driving, waiting or transfer activities and on the other side the information about the third type of delay propagation, namely the propagation III due to capacity constraints that we are looking for. These “virtual activities” can be interpreted as inequalities that represent a lower bound for the re-scheduled time of an event ensuring that an event can not happen before another event has taken place: for example a train can enter in a station only if its assigned platform is free. Hence the “virtual activity” does not belong to the set of activities of the problem (waiting, driving and transfers), but it can be considered as a precedence constraint in the rail- way problem that has to be satisfied to avoid infrastructure conflicts (using the same track or the same platform) due to the limited capacity of the track system and to the operational rules of the security system. The idea of our approach is to describe the delay propagation by using these virtual activities instead of the classical inventory constraints that are commonly used in re- scheduling problems (see Chapter 2). Indeed, the Tri-graph method has the advantage to be able to identify even complex dependencies without “a priori” knowledge of the track system. Moreover, compared to other statistical methods, Tri-graph is appli- cable also when the data sample is small compared to the number of random variables. Chapter 4 presents the outcomes of the application of the Tri-graph method to real- world data of German railway, corresponding to some stations within the Harz area in Germany. The data has been provided by Deutsche Bahn within the context of a larger project named DisKon (see [11]) and it consists of delay data of all trains at all stations over a period of nine months. Apart from the delay, we also have information about the timetable and the infrastructure in this region, such that we are able to compare the stochastic results with the analytical constraints of the classical formulation of the problem. Moreover, a comparison with the outcomes of other standard techniques (in particular with Full Conditional Graph and with Covariance Graph) has been per- formed as validation of the theoretical results presented in Chapter 3. In Chapter 5 an analysis, from a theoretical and practical point of view, of the mod- eling of the delay and the virtual activities has been carried out in order to transform the edges resulting from the Tri-graph procedure into time constraints for the delay management problem. Four possible linear regressions have been considered and the approximation errors have been compared considering two different strategies: the first strategy was to consider as the best-fit polynomial approximation the one that has the minimal sum of the deviations squared (least squares error), the second strategy was based on a robust estimator so that it will be relatively unaffected by outlying values (Huber’s error). In Chapter 6 we introduce the concept of robustness and we test the “virtual” capac- itated model of the problem according to three possible criteria: number of violated capacity constraints, cost (in second) of the violations and price (in second) to trans- form the solution into a feasible one for the Microscopical model.