Network Performance Model for Urban Rail Systems
by Baichuan Mo
B.E., Tsinghua University (2018)

Submitted to the Department of Urban Studies and Planning and Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science in Transportation and Master of Science in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

May 2020

© Massachusetts Institute of Technology 2020. All rights reserved.

Author: Department of Urban Studies and Planning and Department of Electrical Engineering and Computer Science, May 15, 2020
Certified by: Haris N. Koutsopoulos, Professor of Civil and Environmental Engineering, Northeastern University, Thesis Supervisor
Certified by: Jinhua Zhao, Associate Professor of City and Transportation Planning, MIT, Thesis Supervisor
Certified by: Patrick Jaillet, Professor of Electrical Engineering and Computer Science, MIT, Thesis Supervisor
Accepted by: P. Christopher Zegras, Professor of Transportation and Urban Planning, MIT, Chair, Program Committee
Accepted by: Leslie A. Kolodziejski, Professor of Electrical Engineering and Computer Science, MIT, Chair, Department Committee on Graduate Students

Network Performance Model for Urban Rail Systems
by Baichuan Mo

Submitted to the Department of Urban Studies and Planning and Department of Electrical Engineering and Computer Science on May 15, 2020, in partial fulfillment of the requirements for the degree of Master of Science in Transportation and Master of Science in Electrical Engineering and Computer Science

Abstract

Urban rail transit is an important component of transportation systems and plays a critical role in providing smooth and efficient mobility in many metropolitan areas. Network performance monitoring, i.e. assessing the level of service and operation information of the network (e.g. train loads), is a fundamental task for urban rail transit management. The objective of this thesis is to develop a data-driven network performance model (NPM) for urban rail system performance monitoring. Specifically, this work focuses on two major components of the NPM: 1) a network loading engine which takes train movement data, origin-destination (OD) flows, network, train capacity, and path choices as inputs, and outputs performance indicators such as train loads and crowding levels, and 2) a calibration engine which can estimate path choice and train capacity parameters using automatically collected data. The automated fare collection (AFC) and train movement data from Hong Kong’s Mass Transit Railway (MTR) network are used as a case study for the analysis.

An event-based network loading engine is proposed. The model distributes passengers over the network given OD demand, path choices, and train capacity, subject to a capacity constraint and a first-come-first-board criterion. The event-based framework is computationally efficient while retaining the necessary performance information, which enables it to be applied to large-scale urban rail systems. An effective train capacity model is proposed, which assumes that train capacity is a function of train load and the number of queuing passengers on platforms. The model recognizes that train capacity may vary across stations, which is seldom considered in the literature. The use of NPM for performance monitoring is demonstrated by analyzing the spatial-temporal crowding patterns in the MTR system and evaluating dispatching strategies.
The model is validated by comparing its outputs (with effective capacity) with field observations at a busy station and with the outputs of a benchmark fixed-capacity model. Results show that the output of the model matches the ground truth observations well and outperforms the benchmark model. NPM is also used to identify crowded stations and evaluate different dispatching strategies.

To calibrate path choices, an assignment-based path choice estimation framework using AFC data is proposed. The framework captures the crowding correlation among stations and the interaction between path choices and passenger denied boarding, which are usually neglected in the literature. The path choice estimation is formulated as an optimization problem, which attempts to minimize the error between assignment outputs (which are functions of path choices) and the corresponding quantities observed from the AFC data. The original problem is intractable because of a non-linear multinomial logit equation constraint and a non-analytical black-box function constraint (i.e. the assignment model). A solution procedure is proposed to decompose the original problem into three tractable sub-problems: rough path shares estimation, choice parameters estimation, and path exit rates estimation. The sub-problems can all be solved efficiently. We prove that the solution of the decomposed problem is equivalent to that of the original problem under certain conditions. The model is validated using both synthetic data and real-world AFC data. Results from synthetic data show that the estimated path choice parameters are very close to the “true” (synthetic) ones. The proposed method outperforms the benchmark models in both convergence rate and final solution quality. Results from real-world data show that the estimated coefficients are similar to previous survey results. The model’s robustness is verified through a sensitivity analysis.

As the observed information in AFC data may also be affected by train capacity, a simultaneous calibration of path choices and train capacity is more reasonable than calibrating one set of parameters alone. We propose a simulation-based optimization (SBO) framework to calibrate path choices and train capacity simultaneously using AFC and train movement data. The calibration problem is formulated as an optimization problem with a black-box objective function. Seven optimizers (solving algorithms) from four branches of SBO solving methods are evaluated. The algorithms are evaluated using an experimental design that includes five scenarios, representing different degrees of path choice randomness and crowding sensitivity. Results show that some of the algorithms can estimate the path choice and train capacity parameters well. In general, the response surface methods have better convergence speed, stability, and estimation accuracy. They exhibit consistently good performance under all scenarios.
Future research directions include: 1) Developing a more efficient framework for the simultaneous calibration of path choices and train capacity. The SBO framework proposed in this work is not computationally efficient because it requires a large number of expensive simulation-based function evaluations. Modifying the assignment-based framework to incorporate co-calibration is an interesting direction. 2) Developing a behavior-based effective train capacity model that explicitly incorporates passengers’ willingness to board. 3) Extending the NPM to real-time operation control and future planning. A reinforcement learning-based control engine and an automated timetable design engine can be added to the current NPM framework to enable control and planning applications, respectively.

Thesis Supervisor: Haris N. Koutsopoulos Title: Professor of Civil and Environmental Engineering, Northeastern University

Thesis Supervisor: Jinhua Zhao Title: Associate Professor of City and Transportation Planning, MIT

Thesis Supervisor: Patrick Jaillet Title: Professor of Electrical Engineering and Computer Science, MIT

Acknowledgments

Throughout my two years at MIT, I have become indebted to many people for their companionship, support, encouragement, and help.

First, I would like to express my deepest gratitude to my advisors Prof. Jinhua Zhao and Prof. Haris N. Koutsopoulos. Jinhua and Haris are two different spirits that consistently inspire my life. Jinhua is a creator, showing me rich and colorful research ideas. He is like the compass of my research ship, guiding me towards a vast and attractive ocean. I very much enjoyed the extensive discussions and sparkling moments with him. Haris is a craftsman, digging into the depth of techniques with thorough understanding. He is like the sail of my research ship, providing the essential energy in a challenging journey. I very much appreciated his dedicated editing of every corner of my papers, his rigorous attitude towards every problem encountered, and his enlightening suggestions on even the most complicated methodological details. From them, I learned what a great mentor should be. I feel very honored to have had both of them as my advisors during my studies at MIT. I would also like to thank Prof. Patrick Jaillet for being my thesis reader and offering advice on this dissertation. Thanks to John Attanucci for providing insightful comments on my various research work.

I am thankful to my friends at JTL-Transit Urban Mobility Lab. I would like to especially thank Qing Yi Wang, with whom I discussed a lot of research ideas, had a lot of fun, and stayed up all night writing our theses together. I would also like to thank Yunhan Zheng and Xiaotong Guo, who accompanied me in my daily study life, brought me much joy, and worked with me closely. Thanks to Zhenliang (Mike) Ma, who collaborated with me on many research projects. Mike is such a kind mentor, to whom I can always turn with questions, and he helped me get off to a wonderful start at MIT. I also thank Yu Shen, Shenhao Wang, Hongmou Zhang, Zhan Zhao, Hui Kong, Zhejing Cao, Jintai Li, Joanna Moody, Peyman Noursalehi, Seyedmostafa Zahedi, Nick Caros, Rachel Luo, Patrick Meredith-Karam, Saeid Saidi, Nate Bailey, and John Moody for their cooperation, help, encouragement, and friendship.

I would also like to thank my fellow students at CEE, MIT, and Northeastern University. Thanks to Yifei Xie and Siyu Chen for sharing your experiences in the MST program when I was still new here, and for a friendship with many fun memories. Thanks to Yue Meng, Yunpo Li, Tong Bo, Yu Qiu, Xiaoyu Shan, Ruijiao Sun, Tian Zhao, Yilang Xu, and Jie Yun for having a lot of fun together. I also thank Kerem Tuncel and Tenyu Ryu for the wonderful journey in Hong Kong.

I am grateful to the Hong Kong Mass Transit Rail program, the Chicago Transit Authority program, and the Singapore–MIT Alliance for Research and Technology Future Mobility program, for providing financial and data support during my stay at MIT.

I am extremely grateful to my mother, Qingping Tang. Thank you for raising me, respecting my choices, and providing me with opportunities to pursue my dreams.

Finally, a special thank you to you, Yiyun: thanks for your companionship and understanding throughout the years. During my hard times, you guided me out of difficulties and helped me overcome challenges. You bring me love, care, and encouragement. This dissertation is dedicated to you.

Contents

1 Introduction
  1.1 Background and Motivation
  1.2 Conceptual Framework
  1.3 Research Objectives and Approach Overview
    1.3.1 Network loading model
    1.3.2 Train capacity calibration
    1.3.3 Path choice calibration
  1.4 Data and Context
    1.4.1 Automatically collected data
    1.4.2 Hong Kong Mass Transit Railway system
  1.5 Thesis Organization
    1.5.1 Thesis structure
    1.5.2 Relationship among chapters from another angle
  1.6 Relation with Papers

2 NPM Formulation and Functionality
  2.1 Introduction
  2.2 NPM Formulation
    2.2.1 Inputs
    2.2.2 Network loading
    2.2.3 Route choice
    2.2.4 Effective capacity
  2.3 NPM Functionality
    2.3.1 Crowding patterns
    2.3.2 Crowding sources
    2.3.3 Dispatching strategies
    2.3.4 Network resilience
  2.4 Case Study
    2.4.1 System settings
    2.4.2 Model calibration
    2.4.3 Results
  2.5 Discussion

3 Assignment-based Path Choice Estimation
  3.1 Introduction
  3.2 Literature Review
  3.3 Methodology
    3.3.1 Network representation
    3.3.2 Problem definition
    3.3.3 Problem decomposition
    3.3.4 Linearization for sub-problem 1
    3.3.5 Discussion of solution procedures
  3.4 Case Study and Model Validation
    3.4.1 Validation setting
    3.4.2 Benchmark model
    3.4.3 Synthetic data results
    3.4.4 Real-world data results
    3.4.5 Robustness testing
  3.5 Discussion

4 Simultaneous Calibration of Path Choices and Train Capacity
  4.1 Introduction
  4.2 Methodology
    4.2.1 Transit network loading model
    4.2.2 Problem formulation
  4.3 Simulation-based Optimization Algorithms
    4.3.1 Genetic Algorithm (GA)
    4.3.2 Simulated Annealing (SA)
    4.3.3 Nelder-Mead Simplex Algorithm (NMSA)
    4.3.4 Mesh Adaptive Direct Search (MADS)
    4.3.5 Simultaneous Perturbation Stochastic Approximation (SPSA)
    4.3.6 Bayesian Optimization (BYO)
    4.3.7 Constrained Optimization using Response Surfaces (CORS)
  4.4 Case Study
    4.4.1 Experimental design
    4.4.2 Case study settings
    4.4.3 Reference scenario results
    4.4.4 Sensitivity analysis
  4.5 Discussion

5 Conclusion
  5.1 Summary of Results
  5.2 Future Research
    5.2.1 Calibration methodology
    5.2.2 Behavioral effective capacity model
    5.2.3 From monitoring to control and planning

A Passenger Route Choice Model for MTR System

List of Figures

1-1 Diagram of the data-driven public transit management system
1-2 Diagram of the network performance model
1-3 Hong Kong MTR metro system map
1-4 Structure of the thesis
1-5 Relationship among critical parameters in urban rail systems

2-1 Structure of the network loading model
2-2 Model validation at the ADM station, Tsuen Wan Line, northbound (18:00-19:00)
2-3 Exit flow comparison (18:00-19:00)
2-4 Network crowding patterns (18:00-19:00)
2-5 Comparison of left-behind passengers for different dispatching strategies

3-1 Network representation example
3-2 Network Example for ALC
3-3 Convergence behavior of estimated β
3-4 Objective Function Results of Synthetic Data
3-5 RMSE Results of Synthetic Data
3-6 Model sensitivity to initial β values
3-7 Comparison of estimated β values for different days

4-1 Convergence results of the reference scenario. The error bar indicates 1/4 × standard deviation. NMSA has no error bar because it is a deterministic algorithm.
4-2 Algorithm performance in the two path choice scenarios
4-3 Algorithm performance in the two train capacity scenarios

5-1 Example of a fine-grained TS hyper-network
5-2 NPM extension for real-time control
5-3 NPM extension for planning

List of Tables

2.1 Input variables and data sources
2.2 Crowding indicators

3.1 β Estimation Results of Synthetic Data
3.2 β Estimation Results of Real-world Data
3.3 Comparison of various measurement of Admiralty station (18:00 to 19:00)

4.1 Algorithms Summary
4.2 Scenario design
4.3 Calibration results of the reference scenario
4.4 Calibration results of the random path choice scenario
4.5 Calibration results of the deterministic path choice scenario
4.6 Calibration results of the crowding-insensitive train capacity scenario
4.7 Calibration results of the crowding-sensitive train capacity scenario

A.1 Route Choice Model Estimation Results

Chapter 1

Introduction

1.1 Background and Motivation

Urban rail transit is an important component of transportation systems in many metropolitan areas. Due to its high reliability, large capacity, and low pollution, rail transit service continues to grow along with rising demand, especially in many Asian cities. However, in the context of large-scale networks and high-volume demand, recurrent congestion and sudden incidents are becoming major concerns in these systems. Given the close interaction between urban rail and other transportation modes, ensuring normal operation of rail systems is important for maintaining smooth and efficient urban mobility.

The management of urban rail systems has evolved from manual to data-driven. For a century, public transit agencies have relied on costly and unreliable manual data collection, such as on-site surveys of sampled passengers and field observations at selected stations. These approaches have hampered the effective evaluation, management, and planning of transit services, ultimately reducing efficiency and threatening the quality of service. Recently, with the emergence of Information and Communication Technology (ICT), various automatically collected data in transit systems have become available, which has transformed what was once a data-starved arena into a data-rich environment for planners and managers [1]. The management of urban rail systems has entered its second stage: data-driven management.

Figure 1-1 shows the main tasks and functions of the data-driven public transit management system (DPTMS). Historical performance monitoring, real-time control, and future operation planning are the three major tasks in the DPTMS, which help operators understand, inform, and improve transit services. Each task contains many associated sub-problems, which cover different dimensions of service management. This thesis focuses on the monitoring task.

Figure 1-1: Diagram of the data-driven public transit management system

Network performance monitoring, which means obtaining the level of service and operation information of the network (e.g. train load) and analyzing the service quality, is the fundamental task in the DPTMS. It is crucial for transportation agencies to identify congestion, evaluate the system, and adjust operation strategies. Given the cost of manual data collection, a network performance model that can evaluate the level of transit services based on automatically collected data alone is needed.

1.2 Conceptual Framework

Performance monitoring can be conducted at two different levels: the data-description level [2, 3] and the modeling level [4, 5, 6]. At the data-description level, researchers usually analyze data that can be obtained directly from the automated data collection system. All performance indicators are derived from the raw data. The drawback of data-description-based performance monitoring is that only measurements that are directly available in the raw data can be calculated. More detailed information, such as the passenger denied-boarding rate due to train crowding, is not available.

At the modeling level, performance evaluation is usually achieved with transit network loading (or simulation) models. These models take the origin-destination (OD) demand matrix, train schedules (or ground truth movement information), path choice behavior, and network infrastructure information (such as network topology and train capacity) as input, and output all performance metrics (e.g. train load, passenger waiting time, denied-boarding probability) by simulating passengers’ travel behavior in the system. Given the richer output performance metrics, performance monitoring at the modeling level has become the mainstream of research.

Data-driven performance monitoring at the modeling level requires two components. The first component is a network loading (or simulation) model that can output different performance indicators and is flexible enough to test different scenarios (e.g. timetable changes, different dispatching strategies). Given the requirement for daily performance monitoring, the simulation framework should run efficiently for large-scale networks with high demand, while still maintaining the required details. The second component is a self-calibration model that can generate all required input files and calibrate the model parameters of the simulation model using only automatically collected data. This eliminates the manual data collection process, which makes the overall framework a purely automated data-driven system.

The self-calibrated data-driven performance monitoring system is referred to as the network performance model (NPM) in this study. The conceptual diagram of NPM is shown in Figure 1-2. The network loading engine of NPM takes AVL data (or timetable), AFC data (or OD entry flow), network, train capacity, and path choices as inputs, and outputs performance indicators of interest.

Among all inputs, path choice fractions and train capacity are not easy to obtain. Path choice fractions cannot be directly observed. They are typically estimated from surveys, which are time-consuming and labor-intensive. Train capacity, defined as the actual capacity used by passengers, is not an objective parameter. Trains often do not reach their designed physical capacity for various reasons (e.g. passengers may decide not to board due to crowding [7]). Therefore, train capacity may vary across platforms depending on crowding levels and people’s willingness to board.

Among all outputs, the ground truth OD exit flow (the number of passengers exiting a station within a specific time interval; a rigorous definition can be found in Section 2.2) and passenger journey time can be observed from the AFC data. Since these two outputs are affected by path choices and train capacity, it is possible to calibrate the path choices and train capacity by minimizing the difference between model-derived and observed OD exit flows and journey times. This is the role of the calibration engine in the diagram.

Figure 1-2: Diagram of the network performance model

1.3 Research Objectives and Approach Overview

The overarching objective of this thesis is to realize the self-calibrated NPM shown in Figure 1-2. Specifically, there are three technical tasks: 1) developing the network loading model, 2) developing a train capacity calibration model, and 3) developing a path choice calibration model. An overview of the approach to each of the three tasks is given below.

1.3.1 Network loading model

The network loading model aims to distribute passengers over the network given OD demand, path choices, and train capacity. An event-based simulation framework is proposed in this study. The event-based framework is computationally efficient while retaining the necessary performance information, which enables it to be applied to large-scale urban rail systems. Two types of events are considered: train arrivals and train departures. The events are sorted by time and processed sequentially until all events in the analysis period are completed. For an arrival event, the train offloads passengers who have reached their destination or need to transfer at the station. The transfer passengers are added to the waiting queue at another platform. For a departure event, the new tap-in passengers are added to the system. Passengers board the train according to a first-come-first-serve discipline until the train reaches its capacity. Passengers who cannot board are left behind and wait for the next available train.

The network loading model is described in Chapter 2 and is also applied to path choice and train capacity calibration in Chapters 3 and 4.
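The event loop described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a single line in one direction with no transfers, one queue per station rather than per platform, and a fixed capacity per train. The names `Passenger`, `Train`, and `load_network` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

# Event kinds; at equal times, arrivals sort before departures.
ARRIVAL, DEPARTURE = 0, 1


@dataclass
class Passenger:
    pid: int
    tap_in: float
    origin: str
    destination: str
    denied: int = 0  # number of trains the passenger failed to board


@dataclass
class Train:
    capacity: int  # assumption: fixed capacity, not the effective capacity model
    onboard: list = field(default_factory=list)


def load_network(events, passengers, trains):
    """Process (time, kind, train_id, station) events in time order.

    Returns a list of (passenger id, exit time) pairs.
    """
    pending = deque(sorted(passengers, key=lambda p: p.tap_in))
    queues = {}  # station -> FIFO queue of waiting passengers
    exits = []
    for time, kind, train_id, station in sorted(events):
        train = trains[train_id]
        if kind == ARRIVAL:
            # Offload passengers who have reached their destination.
            staying = []
            for p in train.onboard:
                if p.destination == station:
                    exits.append((p.pid, time))
                else:
                    staying.append(p)
            train.onboard = staying
        else:
            # Add passengers who tapped in up to the departure time.
            while pending and pending[0].tap_in <= time:
                p = pending.popleft()
                queues.setdefault(p.origin, deque()).append(p)
            # Board in first-come-first-serve order until the train is full.
            queue = queues.setdefault(station, deque())
            while queue and len(train.onboard) < train.capacity:
                train.onboard.append(queue.popleft())
            # Everyone still waiting has been left behind by this train.
            for p in queue:
                p.denied += 1
    return exits
```

Because each event touches only one train and one queue, the running time of this sketch grows with the number of events and passenger movements rather than with a fixed simulation time step, which is the usual argument for event-based designs.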

1.3.2 Train capacity calibration

Train capacity is a vague concept. Trains often do not reach their designed physical capacity for various reasons (e.g. passengers may decide not to board due to crowding [8]). Therefore, assuming a fixed physical capacity (as in many previous studies) may not be reasonable in real-world situations. In this study, we introduce the concept of effective capacity, which is the train capacity actually utilized under crowded conditions. Effective capacity is dynamic and changes depending on the crowding state of the train and the platform. We assume that the train load and the number of queuing passengers on the platform affect the effective train capacity. A linear model is formulated to describe the effective capacity, with three parameters to calibrate (base capacity, sensitivity to platform crowding, and sensitivity to train crowding). The calibration problem is formulated as a simulation-based optimization (SBO) problem, where the difference between observed and model-derived OD exit flows and journey time distributions is set as the objective function.

The train capacity calibration is presented in Chapter 2 to illustrate the NPM functionality. We also develop a co-calibration model for both path choices and train capacity in Chapter 4, where a comparative analysis is conducted with respect to different solution algorithms and under various scenarios.
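As a rough illustration of a three-parameter linear effective capacity, consider the sketch below. The exact functional form, the units of the crowding variables, and the clipping to the physical capacity are assumptions made here for illustration; the formulation used in the thesis is given in Chapter 2. The function and parameter names are hypothetical.

```python
def effective_capacity(base, theta_platform, theta_train,
                       queue_len, train_load, physical_capacity):
    """Linear effective capacity, clipped to [0, physical_capacity].

    base            -- base capacity (passengers)
    theta_platform  -- sensitivity to the number of passengers queuing on the platform
    theta_train     -- sensitivity to the load of the arriving train
    The clipping is an assumption of this sketch, not part of the source text.
    """
    raw = base + theta_platform * queue_len + theta_train * train_load
    return max(0.0, min(raw, float(physical_capacity)))
```

With positive sensitivities, a longer platform queue or a fuller train raises the capacity that passengers are willing to use, so the same train can have different effective capacities at different stations.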

1.3.3 Path choice calibration

In this study, path choice behavior is formulated as a C-logit model [9], whose coefficients on path attributes (e.g., in-vehicle time, number of transfers) are the parameters to calibrate. Two different calibration frameworks are proposed. In Chapter 3, we propose an assignment-based path choice estimation framework. The framework captures the crowding correlation among stations and the interaction between path choice and passenger denied boarding. The path choice estimation is formulated as an optimization problem. The original problem is intractable because of non-linear and non-analytical constraints. A solution procedure is proposed to decompose the original problem into three tractable sub-problems, which can be solved efficiently. We prove that the solution of the decomposed problem is equivalent to that of the original problem under specific conditions.
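For readers unfamiliar with the C-logit model, the sketch below computes path shares for one OD pair using the commonly used commonality factor, the logarithm of summed normalized path overlaps. It is a generic textbook-style illustration, not the specification estimated in the thesis: the attribute set, the coefficient values, the sign convention for `beta_cf`, and the function names are all assumptions.

```python
import math


def commonality_factors(overlap, gamma=1.0):
    """overlap[i][j] is the length shared by paths i and j;
    overlap[i][i] is the full length of path i."""
    n = len(overlap)
    return [
        math.log(sum(
            (overlap[i][j] / math.sqrt(overlap[i][i] * overlap[j][j])) ** gamma
            for j in range(n)
        ))
        for i in range(n)
    ]


def c_logit_shares(attributes, beta, overlap, beta_cf=-1.0, gamma=1.0):
    """Path shares for one OD pair.

    attributes -- one list of path attributes per path
                  (e.g. [in-vehicle time, number of transfers])
    beta       -- coefficients on those attributes (illustrative values)
    beta_cf    -- coefficient on the commonality factor; negative values
                  penalize paths that overlap heavily with others
    """
    cf = commonality_factors(overlap, gamma)
    utilities = [
        sum(b * x for b, x in zip(beta, attrs)) + beta_cf * c
        for attrs, c in zip(attributes, cf)
    ]
    # Subtract the maximum utility for numerical stability.
    u_max = max(utilities)
    weights = [math.exp(u - u_max) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]
```

When no paths overlap, every commonality factor is zero and the shares reduce to those of a multinomial logit model.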

However, since path choice and train capacity may jointly affect passengers’ journey time and OD exit flow, a co-calibration model for estimating path choices and train capacity simultaneously is needed. As the assignment-based framework in Chapter 3 cannot model train capacity explicitly, Chapter 4 proposes an SBO framework to calibrate path choices and effective train capacity simultaneously. The difference between observed and model-derived OD exit flows and journey time distributions is set as the objective function. We compare the performance of seven optimizers (solving algorithms) from four branches of SBO methods and evaluate the model in five different scenarios, representing different degrees of path choice randomness and crowding sensitivity.
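One way to picture such an objective is as a weighted sum of squared errors, sketched below. The squared-error form, the weights, and the representation of the journey time distribution as a histogram are assumptions for illustration; the objective actually used is defined in Chapter 4. In an SBO setting, the simulated quantities would come from running the network loading model with candidate parameters, so the function is a black box from the optimizer's point of view.

```python
def calibration_objective(obs_flows, sim_flows, obs_jt_hist, sim_jt_hist,
                          w_flow=1.0, w_jt=1.0):
    """Weighted squared error between observed and simulated outputs.

    obs_flows / sim_flows     -- dicts mapping an OD exit flow key
                                 (e.g. (origin, destination, interval)) to a count
    obs_jt_hist / sim_jt_hist -- journey time histograms over the same bins
    """
    keys = set(obs_flows) | set(sim_flows)
    flow_err = sum(
        (sim_flows.get(k, 0.0) - obs_flows.get(k, 0.0)) ** 2 for k in keys
    )
    jt_err = sum((s - o) ** 2 for s, o in zip(sim_jt_hist, obs_jt_hist))
    return w_flow * flow_err + w_jt * jt_err
```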

1.4 Data and Context

1.4.1 Automatically collected data

Two important types of automatically collected data are available in public transit systems: automated fare collection (AFC) data and automated vehicle location (AVL) data.

AFC data record passengers’ smart card transactions in the public transit system. Depending on the services provided by the operator, the AFC data may contain transactions for both rail and bus (e.g., the CTA system in Chicago) or rail only (e.g., Hong Kong). AFC systems are either open or closed, as determined by the agency’s fare policy. Open systems require that passengers only tap in when they enter the system (e.g. the MBTA system in Boston). Closed systems require both tapping in and tapping out (e.g. the transit system in Seoul, Korea). Many systems are hybrid, utilizing an open architecture on the bus side and a closed one on the subway side (e.g. London). For a closed system, AFC data can provide accurate origins and destinations of passengers’ trips with tap-in and tap-out times. Given the rich information provided, AFC data have been used for understanding travel patterns [10], predicting individual trips [11], improving transit planning [12], etc. A complete review of the use of AFC data for transit system management can be found in Pelletier et al. [13].

AVL data contain information on the time-dependent location of vehicles. Train locations are collected from the rail tracker system. Bus locations are collected from vehicle GPS. AVL data tell when a train or bus arrives at or departs from a given station, which can be used to track real-world vehicle movement. AVL data have been used for measuring travel time variability and reliability [14], predicting vehicle arrival time [15], and updating real-time scheduling [16]. A complete review of the use of AVL data for transit planning and management can be found in Levy and Lawrence [17].
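To make the use of closed-system AFC records concrete, the sketch below aggregates tap-in/tap-out transactions into counts per origin, destination, and exit time interval, together with journey times. The record layout, the 15-minute interval, and the function name are assumptions for illustration and do not describe the MTR data format.

```python
from collections import Counter


def od_exit_flows(transactions, interval=900):
    """Aggregate closed-system AFC records.

    transactions -- iterable of (origin, tap_in, destination, tap_out),
                    with times in seconds since the start of the day
    interval     -- length of the exit time interval in seconds
    Returns (flows, journey_times): flows counts passengers per
    (origin, destination, exit interval index); journey_times lists
    tap_out - tap_in for every record.
    """
    flows = Counter()
    journey_times = []
    for origin, tap_in, destination, tap_out in transactions:
        flows[(origin, destination, int(tap_out // interval))] += 1
        journey_times.append(tap_out - tap_in)
    return flows, journey_times
```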

1.4.2 Hong Kong Mass Transit Railway system

The Hong Kong Mass Transit Railway (MTR) system is used as the case study throughout the thesis. MTR operates a major urban rail network serving Hong Kong Island, Kowloon, and the New Territories. The system currently consists of 11 rail lines totaling 218.2 km and serving 159 stations, including 91 heavy rail stations and 68 stops (bus is not included). The MTR uses a smart card fare payment system called Octopus, which requires passengers to tap in when entering and tap out when exiting the system. The average weekday ridership is over 5 million. The map of the MTR system is shown in Figure 1-3. We point out two critical stations on the map: Admiralty (ADM) and Central (CEN). These two stations are located in the CBD and are two major transfer stations in the MTR system. During evening peak hours, very large passenger flows can be observed at these stations.

Figure 1-3: Hong Kong MTR metro system map

As the MTR is a closed system, the AFC data include information on both origins and destinations, which is used to provide the full OD demand for the urban rail system. However, AVL data are not available in the MTR system. The train movement information in the case study is therefore approximated by the timetable. The airport express and light rail transit services are not considered in this work since they are separated from the heavy railway lines, and passengers who enter the heavy railway lines from these services need to tap in again. Thus, the remaining part of the network is still a closed system with full OD information.

1.5 Thesis Organization

1.5.1 Thesis structure

The overall structure of the thesis is shown in Figure 1-4, which presents the relationship between the chapters and the technical tasks described in Section 1.3.

Figure 1-4: Structure of the thesis

Chapter 2 describes the formulation and functionality of NPM. The event-based network loading engine, path choice model, and effective train capacity model are proposed. The effective capacity is calibrated using an SBO method. Path choices from Li [18] are used. The use of NPM for performance monitoring is demonstrated by analyzing the spatial-temporal crowding patterns in the MTR system and evaluating dispatching strategies.

Chapter 3 describes the assignment-based path choice estimation model. The model is validated using both synthetic data and real-world AFC data. Results validate the model’s effectiveness in estimating path choice parameters. Model robustness with respect to different initial values and case study dates is also verified.

Chapter 4 describes the simultaneous calibration framework for path choices and train capacities using the SBO method. The comparative analysis of different solving algorithms and synthetic scenarios shows that the response surface methods perform consistently well in all scenarios.

Chapter 5 summarizes the results and research findings and elaborates on future research from two perspectives: improving the calibration methodology and extending NPM to service control and planning.

1.5.2 Relationship among chapters from another angle

In an urban rail system operated near its capacity, five critical parameters are correlated with each other: OD demand, journey time, left behind (or denied boarding), path choices, and train capacity. The relationships among these parameters are illustrated in Figure 1-5.

Figure 1-5: Relationship among critical parameters in urban rail systems

OD demand is the input and journey time is the output (OD exit flow is a combination of the two), and both can be observed from the AFC data. Path choices, train capacity, and left behind are not observable in the AFC data. Journey time is directly affected by path choices and left behind (left behind can increase the waiting time). Left behind is directly affected by path choices and train capacity. This study aims to use observed OD demand and journey time to calibrate the unobserved parameters, which requires capturing the correlation between different parameters. Chapters 2, 3, and 4 are partial solutions to this problem under different levels of assumptions. Specifically, in Chapter 2, the calibration of train capacity assumes that path choices are fixed. The interaction coming from left behind (though not estimated directly) is captured. In Chapter 3, the calibration of path choices assumes that train capacity is fixed. The interaction between left behind and path choices is captured. It is worth noting that Chapter 3 provides an efficient framework with tractable mathematical formulations. Chapter 4 estimates train capacity and path choices simultaneously, capturing all interactions. However, the SBO framework in Chapter 4 is not as efficient as the framework in Chapter 3.

1.6 Relation with Papers

The author also has three papers that are closely related to this research.

The content in Chapter 2 is based on the paper “Capacity-constrained network performance model for urban rail systems” by Baichuan Mo, Zhenliang Ma, Haris N. Koutsopoulos, Jinhua Zhao [7]. This paper has been published in Transportation Research Record.

The content in Chapter 3 is based on the paper “Assignment-based Path Choice Estimation for Metro Systems Using Smart Card Data” by Baichuan Mo, Zhenliang Ma, Haris N. Koutsopoulos, Jinhua Zhao [19]. This paper has been accepted at the 24th International Symposium on Transportation & Traffic Theory (ISTTT).

The content in Chapter 4 is based on the paper “Calibrating Route Choice for Urban Rail System: A Comparative Analysis Using Simulation-based Optimization Methods” by Baichuan Mo, Zhenliang Ma, Haris N. Koutsopoulos, Jinhua Zhao [20]. This paper has been presented at the Transportation Research Board 99th Annual Meeting.

Chapter 2

NPM Formulation and Functionality

2.1 Introduction

Increases in ridership are outpacing capacity in many large urban rail transit systems, including Hong Kong, London, New York, and Beijing [21]. Crowding at stations and on trains is a concern due to its impact on safety, service quality, and operating efficiency. Monitoring network performance (e.g. waiting time on platforms, load on trains) is essential to help agencies and operators understand the system, inform passengers, and improve operating strategies.

Compared to traditional survey methods for service performance evaluation, data from AFC and AVL systems provide ample opportunities for analysis in areas such as travel behavior, operations planning, and monitoring [22, 23]. For performance monitoring, some performance indicators can be directly derived from automated data, including vehicle kilometers traveled, vehicle hours traveled, travel time reliability, and OD demand [2, 3]. However, the problem of determining vehicle loads and passenger waiting times is not trivial. Recently, a number of methods have been proposed to monitor passenger waiting times, left behind at stations, and vehicle loads using AFC and AVL data [4].

Network loading or assignment models can also be used. The main difference between network loading and assignment models lies in their behavioral assumptions. Network loading models assume that travel choices are known, while assignment models estimate the travel choices through user equilibrium criteria (network loading is a key component of transit assignment models).

Network assignment models are mainly used for planning applications. Nuzzolo et al. [24] proposed a dynamic schedule-based assignment model to simulate the within-day and day-to-day learning process of passengers’ route choice. Nuzzolo et al. [5] proposed a mesoscopic transit modeling framework named “DYBUS2” to provide real-time short-term predictions of network performance. Recently, Yao et al. [25] developed an agent-based simulation model for the Beijing metro system. The applicability of these models for performance monitoring is limited. For performance monitoring, the interest is in the performance of the system on a particular day, which represents just one realization of operating conditions. Hence, finding an equilibrium solution is not applicable. Network loading models are more appropriate for this purpose.

Network loading models provide detailed performance at different levels (station, line, train, passenger) given OD flows, path choices, and operating conditions. Therefore, they are suitable for modeling the performance of a particular day. For example, Grube et al. [26] developed an event-based network loading model to simulate metro systems in Santiago de Chile. However, since such approaches have to be applied at the network level, they require assumptions about train capacity (as assignment models also do). Determining the “actual” train capacity is not trivial. Studies found that train capacity may vary depending on the crowding levels in trains and on platforms [8, 27]. Ma et al. [4] showed that ignoring the “varied” train capacity results in biased estimates of left behind (highlighting the importance of network loading models to capture this varied capacity in their representation of the system).

This chapter describes the formulation and functionality of the NPM. Specifically, we propose an event-based network loading model, provide the formulation of the path choice model, and introduce a flexible train capacity model (called effective capacity hereafter). NPM models passengers’ detailed trajectories, including access and egress, queuing, transferring, boarding, alighting, and left behind.

The remainder of the chapter is organized as follows: Section 2.2 introduces the components of NPM, including the network loading model, path choice model, and the effective train capacity model. The main functionality of NPM is described in Section 2.3. Case studies are presented in Section 2.4 to validate the NPM performance and demonstrate its functionality. Section 2.5 concludes the chapter and discusses future research directions.

2.2 NPM Formulation

Figure 1-2 provides an overview of the main structure of NPM. It consists of four main components: the input, the network loading engine, the output of various performance indicators, and the calibration engine. The calibration engine provides the capability to calibrate model parameters using readily available data. Green squares in the input section represent the parameters to be calibrated. Gray squares in the output section indicate model outputs that are also directly observable from AFC data (and hence can be used for calibration and validation purposes). The calibration engine compares the output journey times and OD exit flows to the observed values (ground truth) and uses the difference to adjust model parameters (e.g. effective capacity).

2.2.1 Inputs

The NPM inputs include dynamic OD demand, path choice fractions, train movement information, train capacity, and access/egress/transfer walking time. Table 2.1 summarizes the data inputs. It is assumed that the system is closed. The AFC data contain passengers’ tap-in and tap-out times and stations (the complete OD entry demand by time period). The timetable provides the planned train arrival and departure information, while the AVL data provide the actual ones. The walk time is assumed to be normally distributed with the mean and variance calculated from field observations.

Table 2.1: Input variables and data sources

| Input variables | Sources |
|---|---|
| OD entry demand | AFC data |
| Path choice | Section 2.2.3 |
| Train movement | Timetable or AVL data |
| Train capacity | Section 2.2.4 |
| Access/egress/transfer walk time | Measured by on-site observations or estimated from AFC/AVL data [28] |

2.2.2 Network loading

Figure 2-1 summarizes the main structure of the network loading model. Three objects are defined: trains, queues, and passengers. Trains are characterized by routes, runs, current locations, and capacities. Passengers are queued based on their arrival times. Three different types of passengers are represented: left-behind passengers who were denied boarding from previous trains, new tap-in passengers from outside the system, and new transfer passengers from other lines. The left-behind passengers are usually at the head of the queue.

An event-based modeling framework is used to load the passengers onto the network. Two types of events are considered: train arrivals and train departures. The events are sorted by time and processed sequentially until all events are successfully completed during the analysis period. When a train arrives at a station, the offloaded passengers either transfer or exit. Transfer passengers join the boarding queue. When a train departs a station, passengers are loaded on the train up to its available capacity based on a First Come First Serve (FCFS) principle.

Figure 2-1: Structure of the network loading model

Preprocessing

Train event lists (arrivals and departures) are generated according to the actual train movement data from AVL (or timetable). Each event contains a train ID, occurrence time, and location (platform). A passenger is randomly assigned to a route based on the corresponding path choice probability estimated from a path choice model. Random access and egress times are generated given corresponding distributions.
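The passenger part of this preprocessing step can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `preprocess_passenger`, the example path labels and shares, the walk-time parameters, and the clipping of negative walk times at zero are all assumptions made here. The normal walk-time distribution follows Section 2.2.1.

```python
import random

def preprocess_passenger(paths, shares, walk_mean, walk_std, rng):
    """Draw a path and random access/egress walk times for one passenger.

    paths/shares: candidate paths of the passenger's OD pair and their
                  path choice probabilities (hypothetical inputs).
    walk_mean/walk_std: parameters of the normal walk-time distribution (seconds).
    """
    path = rng.choices(paths, weights=shares, k=1)[0]
    # Assumption: negative draws from the normal distribution are clipped to zero.
    access = max(0.0, rng.gauss(walk_mean, walk_std))
    egress = max(0.0, rng.gauss(walk_mean, walk_std))
    return path, access, egress

rng = random.Random(0)  # fixed seed so that a run can be repeated
paths = ["direct", "one_transfer"]  # hypothetical path labels
sample = [preprocess_passenger(paths, [0.7, 0.3], 90.0, 30.0, rng) for _ in range(1000)]
share_direct = sum(1 for path, _, _ in sample if path == "direct") / len(sample)
```

With enough passengers, the simulated share of each path approaches its path choice probability, which is what allows the loading model to reproduce aggregate path flows from individual draws.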

Train arrivals

For an arrival event, the train offloads passengers who reach their destination or need to transfer at the station and updates its state (e.g. train load and in-vehicle passengers). For passengers who reach their destinations, their tap-out times are calculated by adding their egress time. For those who transfer at the station, their arrival times at the next platform are calculated based on the transfer time distribution. The transfer passengers are added to the waiting queue in order of their arrival times.

Train departures

For departure events, the queue on the platform is updated by the new tap-in passengers, that is, passengers who arrive at the platform after the last train departed are added into the queue based on their arrival times. Passengers board the train according to an FCFS discipline until the train reaches its capacity. Passengers who cannot board are left behind and wait in the queue for the next train. The states of the train and the waiting queue are updated accordingly.
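The arrival and departure logic above can be sketched as a small event loop. This is a simplified illustration under stated assumptions, not the thesis implementation: a single line with no transfers, no access or egress walking time, one fixed capacity for all trains, and hypothetical names (`Train`, `run_loading`, the event tuples, the toy stations and passengers).

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Train:
    capacity: int
    onboard: list = field(default_factory=list)  # (passenger_id, destination)

def run_loading(events, arrivals, capacity):
    """Process train events in time order.

    events:   list of (time, kind, train_id, station), kind is "arr" or "dep"
    arrivals: dict station -> list of (platform_arrival_time, passenger_id,
              destination), sorted by platform arrival time
    Returns (exit_times, left_behind_counts).
    """
    trains = {}
    queues = {}
    pending = {s: deque(p) for s, p in arrivals.items()}
    exit_times, left_behind = {}, {}

    # Tuples sort by time first; at equal times "arr" sorts before "dep".
    for time, kind, train_id, station in sorted(events):
        train = trains.setdefault(train_id, Train(capacity))
        if kind == "arr":
            # Offload passengers whose destination is this station.
            staying = []
            for pid, dest in train.onboard:
                if dest == station:
                    exit_times[pid] = time  # egress time ignored in this sketch
                else:
                    staying.append((pid, dest))
            train.onboard = staying
        else:
            queue = queues.setdefault(station, deque())
            waiting = pending.get(station, deque())
            # New passengers join behind those already left behind.
            while waiting and waiting[0][0] <= time:
                _, pid, dest = waiting.popleft()
                queue.append((pid, dest))
            # First-come-first-served boarding up to capacity.
            while queue and len(train.onboard) < train.capacity:
                train.onboard.append(queue.popleft())
            # Everyone still in the queue was denied boarding once more.
            for pid, _ in queue:
                left_behind[pid] = left_behind.get(pid, 0) + 1
    return exit_times, left_behind

# Toy example: two trains from A to B, capacity 2, three passengers at A.
events = [
    (0, "arr", "T1", "A"), (1, "dep", "T1", "A"), (5, "arr", "T1", "B"),
    (6, "arr", "T2", "A"), (7, "dep", "T2", "A"), (11, "arr", "T2", "B"),
]
arrivals = {"A": [(0, "p1", "B"), (0, "p2", "B"), (0, "p3", "B")]}
exit_times, left_behind = run_loading(events, arrivals, capacity=2)
```

In the toy example the third passenger is denied boarding on the first train and exits one headway later, which is the mechanism that links train capacity to observed journey times.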

2.2.3 Route choice

Route choice is usually modeled using the discrete choice framework, which assumes that decision-makers maximize their utilities when making choices [29]. The multinomial logit (MNL) model is a typical example of discrete choice models. For path choice problems, the C-logit model is often used. The C-logit is a variation of the MNL model which corrects for the fact that alternatives may not be independent due to path overlap. The C-logit incorporates an additional “cost” attribute, the commonality factor (CF), in the utility [9]. The probability of choosing path 푖 is given by:

$$
P_i = \frac{\exp\left(\beta_X \cdot X_i + \beta_{CF} \cdot CF_i\right)}{\sum_{j \in \mathcal{W}} \exp\left(\beta_X \cdot X_j + \beta_{CF} \cdot CF_j\right)}, \tag{2.1}
$$

where $X_i$ is the attribute vector of path $i$, such as in-vehicle time, number of transfers, etc. $\mathcal{W}$ is the set of all alternative paths for the same OD pair. $\beta_X$ and $\beta_{CF}$ are the corresponding coefficients to be estimated. $CF_i$ is the commonality factor of path $i$, defined as:

$$
CF_i = \ln \sum_{j \in \mathcal{W}} \left(\frac{L_{i,j}}{L_i L_j}\right)^{\gamma}, \tag{2.2}
$$

where $L_{i,j}$ is the number of common stations of paths $i$ and $j$. $L_i$ and $L_j$ are the numbers of stations for paths $i$ and $j$, respectively. $\gamma$ is a positive constant that is determined based on empirical studies [18].
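Eqs. 2.1 and 2.2 can be evaluated with a few lines of code. The sketch below is illustrative only: the function names, the two example paths, the attribute values, and all coefficient values (including $\gamma = 1$) are hypothetical and are not the estimates reported in the thesis.

```python
import math

def commonality_factors(paths, gamma=1.0):
    """CF_i = ln sum_j (L_ij / (L_i * L_j))**gamma, with paths given as station lists."""
    cfs = []
    for p_i in paths:
        total = 0.0
        for p_j in paths:  # the sum runs over all alternatives, including i itself
            common = len(set(p_i) & set(p_j))
            total += (common / (len(p_i) * len(p_j))) ** gamma
        cfs.append(math.log(total))
    return cfs

def c_logit_probabilities(attributes, cfs, beta_x, beta_cf):
    """Path choice probabilities of Eq. 2.1 for one OD pair."""
    utilities = [
        sum(b * x for b, x in zip(beta_x, attrs)) + beta_cf * cf
        for attrs, cf in zip(attributes, cfs)
    ]
    u_max = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - u_max) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical OD pair A -> D with two alternative paths.
paths = [["A", "B", "C", "D"], ["A", "E", "D"]]
attributes = [[20.0, 0], [25.0, 1]]  # in-vehicle time (min), number of transfers
cfs = commonality_factors(paths, gamma=1.0)
probs = c_logit_probabilities(attributes, cfs, beta_x=[-0.1, -0.5], beta_cf=-1.0)
```

Subtracting the largest utility before exponentiating does not change the probabilities but avoids overflow when utilities are large in magnitude.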

2.2.4 Effective capacity

Train capacity is a vague concept. Normally trains may not reach their designed physical capacity for various reasons (e.g. passengers may decide not to board due to the crowding [27]). Therefore, assuming a fixed physical capacity may not be a reasonable assumption in real-world situations. In this chapter, we introduce the concept of effective capacity, which is the train capacity actually being utilized under crowding situations. Effective capacity is determined by three factors: a) waiting passenger distribution on the platform, b) train load and distribution across the train, and c) passengers’ willingness to board a crowded train. Thus, train capacity is not constant but may vary across stations. We use the term effective capacity to differentiate it from the physical fixed train capacity. Effective capacity, as defined in this chapter, is dynamic and changes depending on the crowding state of the train and the platform.

Based on previous studies, two factors are included in the effective capacity ($C^e$) model: the current train load when a train arrives at the platform (denoted as $L$) [8]; and the number of queuing passengers on the platform (denoted as $Q$) [30]. The base capacity of train $i$ is $C_i = \theta_0 n_i$, where $n_i$ is the number of cars of train $i$. $C_i$ can be seen as the train load that represents acceptable service standards. At congested stations, passengers may still board a train even if it is already crowded [8], which makes the actual train load exceed $C_i$. Therefore, the effective capacity of train $i$ at platform $j$ ($C^e_{i,j}$) can be formulated as:

$$
C^e_{i,j} =
\begin{cases}
\theta_0 n_i + \theta_1 L_{i,j} + \theta_2 Q_j & \text{if platform } j \text{ is in the list of congested stations} \\
\theta_0 n_i & \text{otherwise}
\end{cases}
\quad \forall i, j \tag{2.3}
$$

The congested stations and time periods can be identified using AFC data [4]. The term platform means a combination of station+line+direction. $\theta_0$, $\theta_1$, and $\theta_2$ are the parameters to be estimated. $\theta_0$ is a measurement of the service standard (passengers/car). $C^e$ at congested stations and for congested trains is expected to be higher than that of stations/trains with less crowding, hence $\theta_1$ and $\theta_2$ should be positive. Although a linear model is used here, the proposed approach is quite general and can accommodate more complex relationships between $C^e$ and $L$, $Q$.
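As a worked instance of Eq. 2.3, the sketch below evaluates the effective capacity for one train at one platform. The parameter values are the calibrated coefficients reported in Section 2.4.2; the function name, the 8-car train, the arriving load of 1,500 passengers, and the queue of 400 passengers are hypothetical values chosen for illustration.

```python
def effective_capacity(n_cars, load, queue, congested, theta0, theta1, theta2):
    """Effective capacity of a train at a platform (Eq. 2.3).

    load:      train load on arrival at the platform (L)
    queue:     number of passengers queuing on the platform (Q)
    congested: True if the platform is in the list of congested stations
    """
    base = theta0 * n_cars
    if congested:
        return base + theta1 * load + theta2 * queue
    return base

# Calibrated coefficients from Section 2.4.2; train and platform state are hypothetical.
theta = dict(theta0=231.6, theta1=0.0732, theta2=0.0607)
cap_congested = effective_capacity(8, 1500, 400, True, **theta)
cap_normal = effective_capacity(8, 1500, 400, False, **theta)
```

For these inputs the base capacity is 231.6 × 8 = 1852.8 passengers, and the congested-platform capacity is 1852.8 + 109.8 + 24.28 = 1986.88 passengers, about 7% above the base value.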

The effective capacity model can be calibrated as follows. Two sources of ground-truth information are assumed to be available: observed OD exit flows and the observed journey time distribution (JTD). The calibration is formulated as an optimization problem whose objective function has two parts: the squared error between model-derived OD exit flows and the observations, and the difference between the model-derived and observed JTD. The optimization problem is formulated as:

$$
\min_{\theta_0, \theta_1, \theta_2} \; w_1 \sum_{i,j,t} \left(q^{i,j_t} - \tilde{q}^{i,j_t}\right)^2 + w_2 \sum_{i,j,t} D_{\mathrm{KL}}\left(f_{i,j_t}(x) \,\|\, \tilde{f}_{i,j_t}(x)\right) \tag{2.4a}
$$

$$
\text{s.t.} \quad q^{i,j_t},\; f_{i,j_t}(x) = \text{Network Loading}(\theta_0, \theta_1, \theta_2) \quad \forall i, j, t \tag{2.4b}
$$

where $q^{i,j_t}$ represents the number of passengers arriving from station $i$ and exiting at station $j$ during time interval $t$ (i.e. OD exit flows). $\tilde{q}^{i,j_t}$ is the observed OD exit flow extracted from AFC data. $w_1$ and $w_2$ are the weights to balance the scale and the importance of the two parts. $f_{i,j_t}(x)$ is the probability density function of the estimated JTD of passengers who come from station $i$ and exit at station $j$ during time interval $t$. $\tilde{f}_{i,j_t}(x)$ is the observed JTD obtained from AFC data. Eq. 2.4b indicates that $q^{i,j_t}$ and $f_{i,j_t}(x)$ are obtained from the network loading model with $\theta_0$, $\theta_1$, and $\theta_2$ as inputs. The difference between the two distributions is expressed using Kullback-Leibler (KL) divergence ($D_{\mathrm{KL}}$):

$$
D_{\mathrm{KL}}\left(f_{i,j_t}(x) \,\|\, \tilde{f}_{i,j_t}(x)\right) = \int_x f_{i,j_t}(x) \cdot \log \frac{f_{i,j_t}(x)}{\tilde{f}_{i,j_t}(x)} \, \mathrm{d}x. \tag{2.5}
$$
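The objective in Eq. 2.4a can be evaluated once the network loading model has produced exit flows and journey times. The sketch below is an illustration under assumptions that are not stated in the thesis: the JTD is discretized into histogram bins, so the integral in Eq. 2.5 becomes a sum; empty observed bins are floored at a small `eps` to keep the logarithm finite; and the function names and the toy data are hypothetical. The default weights follow the values used in Section 2.4.2.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Discrete D_KL(p || q) for two histograms defined over the same bins."""
    p_sum, q_sum = sum(p), sum(q)
    total = 0.0
    for p_k, q_k in zip(p, q):
        p_k, q_k = p_k / p_sum, q_k / q_sum  # normalize counts to probabilities
        if p_k > 0:
            # Assumption: empty observed bins are floored at eps.
            total += p_k * math.log(p_k / max(q_k, eps))
    return total

def calibration_objective(model_flows, obs_flows, model_jtd, obs_jtd, w1=1.0, w2=1000.0):
    """Weighted sum of squared exit-flow errors and JTD divergences (Eq. 2.4a).

    All inputs are dicts keyed by (origin, destination, time interval).
    """
    flow_term = sum((model_flows[k] - obs_flows[k]) ** 2 for k in obs_flows)
    jtd_term = sum(kl_divergence(model_jtd[k], obs_jtd[k]) for k in obs_jtd)
    return w1 * flow_term + w2 * jtd_term

# Toy data for a single OD pair and time interval.
key = ("A", "B", 0)
objective = calibration_objective(
    model_flows={key: 12.0}, obs_flows={key: 10.0},
    model_jtd={key: [5, 10, 5]}, obs_jtd={key: [5, 10, 5]},
)
```

In the toy case the two histograms coincide, so the divergence term vanishes and the objective equals the squared flow error. In a calibration loop, this function would be called on the output of each network loading run for a candidate $(\theta_0, \theta_1, \theta_2)$.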

Solving the problem in Eq. 2.4 is a black-box optimization problem because of the non-analytical nature of the network loading process. In this study, a Bayesian Simulation-based Optimization (BSO) method [31] is applied. The BSO works by constructing a posterior distribution (surrogate function) that best approximates the objective function. As the number of observations grows, the posterior distribution improves, and the algorithm becomes more certain of which regions in the parameter space are worth exploring. Given the general optimization approach, more sophisticated effective capacity models could also be explored using the proposed BSO method.

2.3 NPM Functionality

NPM can be used to monitor performance along four dimensions: measuring crowding, diagnosing crowding causes, evaluating dispatching strategies, and evaluating network resilience.

2.3.1 Crowding patterns

Crowding is one of the most important metrics for evaluating the level of service, safety, etc. The crowding indicators, directly obtained from NPM outputs, are summarized in Table 2.2. All indicators are time-dependent with flexible aggregation intervals. The left behind rate is the probability of not boarding the first train, which can be calculated as the number of passengers who have been left behind at least once divided by the total number of boarding passengers at the platform during a specific time period. The number of times left behind is the number of trains missed after the first train due to crowding on the trains. Other service quality indicators can also be output by NPM, such as the availability of seats, the number of standing passengers, and journey time reliability.

Table 2.2: Crowding indicators

| Level | Indicators |
|---|---|
| Train | Train load |
| Platform | Waiting time; number of times left behind; left behind rate (% of passengers left behind); queue length |
| Link | Link flow |
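The left behind rate defined in this section can be computed directly from per-passenger loading results. The sketch below is a minimal illustration; the function name and the sample list are hypothetical, and the input is assumed to contain one entry per passenger who boarded at the platform during the period of interest.

```python
def left_behind_rate(times_left_behind):
    """Share of boarding passengers who were left behind at least once.

    times_left_behind: one entry per boarding passenger, giving the number
                       of trains that passenger was denied boarding on.
    """
    if not times_left_behind:
        return 0.0
    denied = sum(1 for n in times_left_behind if n >= 1)
    return denied / len(times_left_behind)

# Hypothetical platform with eight boarding passengers in the period.
counts = [0, 0, 1, 2, 0, 1, 0, 3]
rate = left_behind_rate(counts)
mean_times = sum(counts) / len(counts)
```

Here four of the eight passengers were denied boarding at least once, so the rate is 0.5, while the average number of times left behind is 7/8 = 0.875. The two indicators differ whenever some passengers are denied boarding more than once.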

2.3.2 Crowding sources

Since the NPM models passengers’ travel behavior at the individual level, the complete trajectories of all passengers are recorded. To diagnose the formation of crowding, the NPM can trace the sources of passengers on each link and passengers exiting at each station. The information about where passengers come from and how they contribute to loads at critical links of the system can inform operators to develop specific demand management strategies, such as promotions [22], peak pricing, etc.

2.3.3 Dispatching strategies

Train dispatching strategies (e.g., headway adjustment, express trains) are basic instruments transit operators use on the supply side to deal with crowding or improve service reliability. Evaluating different train dispatching strategies can provide useful insights for improving service performance. NPM can be used to analyze network performance under different train dispatching strategies, such as express trains at different times during the peak. Such trains may skip stops in order to provide more capacity at crowded stations.

2.3.4 Network resilience

Network resilience is the ability of the system to provide and maintain an acceptable level of service in the face of disruptions and other challenges to normal operations, such as incidents, large scale natural disasters, special events, etc. Metro systems in large cities are facing more and more service disruptions due to increasing demand and aging infrastructure. These problems cause serious safety concerns and service performance deterioration. The agencies are using various strategies, from demand management to infrastructure improvements, to prevent disruptions and mitigate their impacts. Approaches based on passenger information are still emerging as strategies transit agencies use to deal with disruptions. NPM can be used to analyze network resilience by comparing the performance indicators (e.g. waiting time) given different actions that operators may take when disruptions occur, such as providing information and dispatching shuttle buses.

2.4 Case Study

The NPM was demonstrated and validated using data from the MTR system in Hong Kong (Figure 1-3). The Admiralty (ADM) station is one of the most crowded stations, with high volumes of passengers boarding and transferring there.

2.4.1 System settings

AFC data from a weekday in March 2017 is used to generate the OD entry demand and conduct effective capacity calibration. Since AVL data is not available for all lines, the timetable was used to provide train movement information (the actual train movements may differ from the timetable). Considering the high on-time performance of the MTR system (99.9 percent on-time rate) [32], this is a reasonable approximation. Since the evening peak is the most congested period, we only consider the period from 17:00 to 20:00 for model application. The warm-up and cool-down times are both set as 1 hour. The running time is about 15 minutes on a personal computer with a 3.6GHz CPU and 32GB of RAM.

2.4.2 Model calibration

The route choice model used to calculate path choice fractions for various OD pairs was estimated using data from a survey of MTR users [18]. A total of 31,640 passengers participated in the survey, with 26,996 valid responses. The model estimation results are shown in Appendix A.

The optimization problem (Eq. 2.4) is used for effective capacity calibration. The weights in the objective function were set to $w_1 = 1$ and $w_2 = 1000$ for the error in OD exit flows and JTD, respectively. The optimal coefficients are $\theta_0^* = 231.6$, $\theta_1^* = 0.0732$, and $\theta_2^* = 0.0607$. The value of $\theta_0^*$ is close to the MTR standard (230 passengers/car). The signs of $\theta_1^*$ and $\theta_2^*$ are consistent with the discussion in the previous section.

For comparison purposes, a fixed train capacity model is used as the benchmark for the effective capacity model. The fixed capacity for train $i$ at platform $j$ ($C^f_{i,j}$) is defined as:

$$
C^f_{i,j} = \theta_f n_i \quad \forall i, j \tag{2.6}
$$

We test three different values of $\theta_f$ for comparison, that is, $\theta_f = 230$, $\theta_f = 245$, and $\theta_f = 260$ passengers/car.

2.4.3 Results

Model validation

To validate the performance of NPM, field observations at ADM station (Tsuen Wan Line, north direction) on the same day as the AFC data were used for comparison. The data were collected by MTR employees who counted passengers on the platform during 18:00- 19:00. Left behind passengers, the number of arriving passengers (sum of the new tap-in and transfer passengers), and the number of passengers boarding each train were recorded.

Figure 2-2 compares the fixed-capacity model, effective capacity model, and ground truth for different indicators. The number of boarding and arriving passengers from the effective capacity model matches the ground truth observations well, as shown in Figures 2-2a and 2-2b. The peak in Figure 2-2a is due to an empty train dispatched from the upstream terminal station, so that more capacity is available to serve the passengers at the crowded ADM station. The Root Mean Square Error (RMSE) of the number of boarding and arriving passengers for each train is reported. The arriving passenger curves (Figure 2-2b) for the fixed capacity and effective capacity models are nearly the same. This is expected because the number of arriving passengers mainly depends on the OD demand and path shares, and these two inputs are the same for the fixed and effective capacity models. However, the estimates of boarding passengers from the effective capacity model are closer to the observed values than those from the fixed capacity model (Figure 2-2a). The results support the importance of using the effective capacity since boarding passengers are directly related to the train capacity.

A comparison of the train load between the fixed capacity and effective capacity models is shown in Figure 2-2c (ground truth train load data were not available). The trains at ADM station are always full from 18:15 to 19:00. Figure 2-2c shows that the effective capacity can capture the variability of the train load due to the change in crowding levels over time. Other studies, based on actual observations of train loads, also support this finding [8].

Figure 2-2d compares the percentage of passengers who are left behind a given number of times according to the models and ground-truth observations. The effective capacity model provides a more accurate estimation of left behind than the fixed-capacity model, which is consistent with findings in Ma et al. [4].

(a) Boarding passengers (b) Arriving passengers (c) Train load (d) Left behind

Figure 2-2: Model validation at the ADM station, Tsuen Wan Line, northbound (18:00-19:00)

Figure 2-3 compares the exit flows from the NPM against the actual observations extracted from the AFC data. The top 30 stations in terms of exit flows are displayed. The RMSE of the exit flows for each model is also reported. The results from the effective capacity model match the ground truth well and outperform the fixed-capacity models. Overall, the proposed effective capacity NPM can capture real-world situations well and has the potential to be an effective tool for performance monitoring.

(a) 18:00-18:30 (b) 18:30-19:00

Figure 2-3: Exit flow comparison (18:00-19:00)

Crowding analysis

In a congested rail system, waiting times increase because passengers are left behind by full trains. Figure 2-4 shows the wait time-headway ratio and left behind rates for the 10 most crowded platforms in the network. The wait time-headway ratio is defined as the passengers’ average wait time divided by the headway. Under normal conditions, for operations with small headway variability and no capacity constraints (assuming random passenger arrivals), the ratio has a value close to 0.5. The platform ID in Figure 2-4 reflects the station ID + line ID + direction. For example, 2_11_1 is the platform at ADM station serving the Tsuen Wan Line in the north direction. Figure 2-4a shows that, at platforms 27_13_1 and 2_11_1, passengers have to wait for an average of 2 headways. That means a passenger is expected to wait for more than 2 trains to board. Figure 2-4b shows the top 10 platforms by their left behind rates (the probability of being left behind at least once). The most congested platform during the evening peak is at ADM station (Tsuen Wan Line northbound). The platform has a left behind rate of about 0.75, consistent with the high wait time-headway ratio shown in Figure 2-4a.
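The 0.5 benchmark for the wait time-headway ratio follows from a standard result for passengers who arrive at random and can always board the first train. The symbols $H$ (headway), $W$ (wait time), and $N$ (number of times a passenger is left behind) are introduced here for illustration, and the last line is only a rough approximation that assumes regular headways.

```latex
\begin{align}
  E[W] &= \frac{E[H^2]}{2\,E[H]}
        = \frac{E[H]}{2}\left(1 + \frac{\mathrm{Var}(H)}{E[H]^2}\right), \\
  \frac{E[W]}{E[H]} &\approx 0.5
        \quad \text{when } \mathrm{Var}(H) \ll E[H]^2, \\
  \frac{E[W]}{E[H]} &\approx 0.5 + E[N]
        \quad \text{when each denied boarding adds about one headway.}
\end{align}
```

Under this approximation, a ratio of about 2 would correspond to roughly 1.5 denied boardings per passenger on average, which is in line with the statement that passengers at the most crowded platforms wait for more than 2 trains.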

(a) Average wait time-headway ratio (b) Left behind rate

Figure 2-4: Network crowding patterns (18:00-19:00)

Evaluation of dispatching strategy

A key application of NPM is the evaluation of different dispatching strategies. As shown in Figure 2-2a, an empty train is dispatched (express train) from CEN to ADM at 18:40 to serve a large number of passengers typically waiting at the ADM station. NPM can be used to test how effective such strategies are in relieving congestion. For comparison purposes, we also test two additional scenarios: a) no express train is dispatched; and b) the express train is dispatched at 18:30. Figure 2-5 compares the number of left-behind passengers at CEN and ADM, which are the first two stations on the Tsuen Wan Line, northbound. Dispatching an express train transfers the congestion from ADM to CEN. The strategy temporarily decreases the number of left-behind passengers at ADM. The dispatching time does not significantly influence the crowding patterns at ADM. However, the 18:30 dispatching seems to reduce the number of left-behind passengers more than the 18:40 dispatching, as it better targets the peak of the crowding conditions.

(a) CEN Station (b) ADM Station

Figure 2-5: Comparison of left-behind passengers for different dispatching strategies

2.5 Discussion

The chapter describes the main formulation and functionality of the NPM. The major component of NPM is an event-driven network loading module, which is capable of simulating passengers’ walking, queuing, boarding, and alighting processes. NPM can be used to infer crowding patterns and evaluate dispatching strategies. We propose an effective train capacity model, which explicitly recognizes that capacity may be different at different stations, depending on the crowding levels on the platform and the train. NPM is applied using a case study with data from Hong Kong’s MTR network. The results show that NPM is able to replicate actual conditions (based on AFC data and direct observations of crowding levels at one station). NPM is also used to evaluate the effectiveness of various dispatching strategies in reducing onboard crowding. The results highlight the importance of calibrating the train capacity and support the value of the model for performance monitoring and operating strategies evaluation.

Chapter 3

Assignment-based Path Choice Estimation

3.1 Introduction

With the increase of city scales and populations, metro systems are playing an increasing role in urban transportation. Understanding passenger flow distribution in metro systems is crucial for adjusting operation strategies and better accommodating passengers. Simulation and transit assignment models are powerful instruments to infer and predict passenger flows in the network, and hence monitor and evaluate system performance. Two important inputs are required for these models: the OD demand and passengers’ path choices. With the widely adopted AFC systems, the station-to-station OD flows in metro networks are readily available, especially for closed systems with both tap-in and tap-out transactions. However, path choices are not observed directly. Therefore, estimating path choices is an important task for system performance evaluation.

On-site surveys are typically used to estimate path choice models. However, surveys are time-consuming and labor-intensive. In addition, given the changes in operating characteristics and performance of metro systems, survey results may be out of date. To overcome these disadvantages, researchers have proposed path choice estimation methods using AFC data.

In closed metro systems, AFC data include locations and times of both tap-in and tap-out transactions. AFC data-based methods for path choice estimation can be categorized into two groups: path-identification methods [33, 34, 35, 36] and parameter-inference methods [37, 38, 39, 40]. The former aim to identify the exact path chosen by a user. The path attributes (e.g. walking time, in-vehicle time) are used to evaluate how likely each path is to be chosen by passengers. The latter formulate probabilistic models to describe the random process of passengers’ path choice behavior. Bayesian inference is usually used to estimate the corresponding choice parameters or path choice fractions. Despite using different methods, the key idea of these AFC data-based approaches is similar: they all attempt to match the model-derived journey time with the observed journey time from AFC data. Since the model-derived journey time is determined by the choice parameters, the observed journey time provides a source to calibrate path choices. However, this type of method may fail if denied boarding (also called left behind, which means that passengers are not able to board the first train upon their arrival at the platform due to limited train capacity) is not taken into account.

Left behind causes passengers’ waiting time on platforms to increase, thus increasing their total travel time. It may happen that the journey time for a longer route without left behind is close to that of a shorter route with left behind, which makes the two routes indistinguishable using the pure journey time-based methods [41]. Several studies have taken left behind into consideration explicitly or implicitly. For example, Sun et al. [38] considered the delay caused by the left behind as part of travel time variability. This method is unable to distinguish the choice of routes with very similar journey time distribution. Sun and Xu [37] and Zhao et al. [39] assumed that the left-behind probabilities for different stations are independent, and explicitly estimated left behind probabilities before inferring the path choice fraction. These partially addressed the left behind problem. However, the independence assumption neglects left behind correlations among stations. In the real world, left behind is caused by the interaction between supply and demand. A station with high entry demand may cause its next station to be congested because the remaining capacity for the next station becomes very limited. Therefore, the left behind probabilities for different stations are not independent. Moreover, it is not reasonable to consider path choices and left behind separately. These two components are interconnected and affect passengers’ journey time collectively. Thus, the path choice estimation model needs to consider the correlation of left behind among platforms, as well as the interactions between path choices and left behind. One of the solutions is embedding the transit assignment model into path

choice estimation with the information of network topology and train timetables. Transit assignment models simulate passengers’ travel behavior to mimic reality, so the left behind correlation among stations and the interaction between path choices and left behind are naturally satisfied. However, as is known in the literature, schedule-based dynamic transit assignment is a complicated problem with no analytical solution [42] (because of the complicated network loading process). None of the aforementioned studies have used a transit assignment model for the path choice estimation problem.

To fill this research gap, this chapter proposes a new path choice estimation framework, which incorporates a transit assignment model with network topology and train operations information. It can capture the left behind correlation among platforms and address the interactions between path choices and left behind. The path choice estimation is formulated as an optimization problem. The original problem is intractable due to the non-analytical transit assignment model and the non-linear constraints. We decompose the original problem into three tractable sub-problems: rough path shares estimation, choice parameters estimation, and path exit rates estimation. All sub-problems have tractable forms and can be solved efficiently. We prove that the solution of the decomposed problem is equivalent to that of the original problem under specific conditions. The proposed framework is validated using data from Hong Kong’s MTR system. Results affirm the effectiveness and robustness of the proposed method in path choice estimation.

The remainder of the chapter is organized as follows: Section 3.2 reviews the related studies in the literature. Section 3.3 describes the modeling framework, including network representation, problem definition, and solution procedures. The model is validated using both synthetic data and actual data in Section 3.4.
The main findings and future research directions are summarized in Section 3.5.

3.2 Literature Review

Considerable literature exists on rail transit path choice estimation. Stated preference (SP) and revealed preference (RP) surveys are often used to estimate path choices. For example, Lam and Xie [43] applied a path-size logit model to estimate path choices in Singapore’s metro system with mixed SP and RP data. Nazem et al. [44] adopted a discrete choice model to estimate passengers’ route choices for different demographic groups using the household

travel survey in Canada. Eluru et al. [45] used a mixed logit framework to study transit path choices in Montreal, Canada with data from a Google Map-based RP survey. A methodological review of survey-based path choice estimation can be found in Prato [46].

Recently, the emergence of smart card data has shifted the research toward data-driven path choice estimation using historical transactions, rather than collecting route choice information with physical surveys. As mentioned in the introduction, these studies can be grouped into two categories: path-identification methods and parameter-inference methods. In terms of path identification, Kusakabe et al. [33] proposed an algorithm to identify the exact train that a passenger boarded using smart card data, which then gave the path choice. Based on a case study in Japan, the model was validated with train load weight data and GPS trajectories of probe passengers. Zhou and Xu [34] proposed a path identification method using the “maximum likelihood boarding plan” method, which assumes each individual will choose the path with the highest matching degree. The matching degree is calculated based on journey times. The smart card data from the system were used for a case study. Kumar et al. [35] proposed a trip chaining method to infer the most likely trajectory of transit passengers using AFC and General Transit Feed Specification (GTFS) data. The method was applied using data from the Twin Cities and verified by automatic passenger count data.

Path-identification methods have several disadvantages. First, they are usually applied at the individual level, which may bring great computational challenges in large-scale and high-demand networks. Second, for the purpose of service quality evaluation, operators care more about the network-level path choices. Path-identification methods can only obtain network-level path choices by aggregating the individual-level behavior, which may generate estimation errors for OD pairs with limited sample sizes.

In contrast, parameter-inference methods directly output the network-level path choices, which is more suitable for system performance evaluation. Parameter-inference methods usually connect path shares with path attributes by constructing behavioral models (e.g. discrete choice models), and estimate the corresponding parameters in the constructed models. Sun and Xu [37] proposed a probabilistic model for path choice estimation using AFC data. They first estimated platform elapsed time for transfer and through stations, and then used a Gaussian mixture model to estimate path choice fractions based on the journey time distribution. The model was validated with a synthetic data set. Sun et al. [38] proposed

an integrated Bayesian approach to estimate the network-level path choices. Path choices were described by a multinomial logit model with parameters to be estimated. The model was implemented with data from the Singapore MRT system. Zhao et al. [39] proposed a probabilistic model to estimate path choice fractions using AFC data. They first estimated the number of trains passengers waited for, which is equivalent to the left behind rate. Then the path choice fractions were modeled and estimated based on a Gaussian mixture model. Xu et al. [40] proposed a Bayesian inference approach to estimate the path choice parameters of a logit model using AFC data. Metropolis-Hastings sampling was used to calibrate the model parameters.

As mentioned in Section 3.1, left behind is important in the estimation of path choices. However, few studies have addressed this problem satisfactorily. The pure journey time-based methods [37, 38, 40] considered the waiting time caused by left behind as part of the total journey time, and therefore cannot distinguish long-distance paths without left behind from short-distance paths with left behind when they have very similar total journey times. Zhao et al. [39] assumed left behinds are independent across stations, and considered the left behind estimation and path choice estimation problems separately, which neglects their interactions. Thus, a comprehensive path choice estimation framework that can capture the left behind correlation among platforms and address the interactions between path choices and left behind is needed to advance the current state of the art.

3.3 Methodology

3.3.1 Network representation

In order to capture the interactions in the system, the model requires a dynamic transit network loading model. A network loading process assumes that passengers’ path choices are known and treated as input [42]. A typical way to represent a transit network with schedule information is a time-space (TS) hyper-network [47, 48, 49], where each station in the metro system is expanded into a series of nodes representing the station at different time intervals. The length of the time interval $\tau$ is usually set to the minimal headway. For example, assuming trains depart the terminal every 2 minutes, a station $a$ in the metro system is expanded to nodes $(a_1, a_2, ..., a_N)$, where $a_1$ represents station $a$ at time interval 7:00-7:02, $a_2$ at time interval 7:02-7:04, etc. This fine-grained network representation may not be practical for real-world applications since the TS network can be extremely large. Consider a metro system with 100 stations and a minimal headway of 2 minutes (e.g. the MTR network in our case study). To perform a 2-hour network loading, each station is expanded to 60 TS nodes, giving 6,000 TS nodes in total. The total number of OD pairs in this TS network is approximately 36 million, bringing large computational challenges.

However, the path choice calibration problem does not actually require such a fine-grained representation. An aggregated network representation is sufficient. Let us consider a study time period $T$ divided into $N$ elementary time intervals of length $\tau$. Each time interval may include several headways (e.g. $\tau = 15$ minutes), indicating a more aggregated representation. Considering a station $i$ in the metro system, we expand $i$ into a sequence of TS nodes, denoted as $(i_1, ..., i_m, ..., i_N)$, where $i_m$ represents station $i$ at time interval $m$. The aggregated TS representation thus consists of $N$ layers of the network, with each layer representing a time interval. In this aggregated TS network, many detailed aspects such as left behind, which require schedule-based details, cannot be explicitly modeled. The trade-off is that we obtain a sparser TS network, which can be used for problems involving large-scale metro systems.

Consider two stations $i$ and $j$ in a metro system with different paths between them. The path set is denoted as $\mathcal{R}(i, j)$. Our purpose is to calculate the choice proportions of these paths for different time intervals. The key variables are defined below:

∙ OD entry flow, $q^{i_m,j}$: Number of passengers with origin $i$ and destination $j$ entering station $i$ during time interval $m$. It can be obtained from the AFC data directly. The set of all $q^{i_m,j}$ is denoted as $q^e$.

∙ OD entry-exit flow, $q^{i_m,j_n}$: Number of passengers who enter station $i$ in time interval $m$ and exit at station $j$ in time interval $n$ ($m \le n$). By definition, $\sum_{n: n \ge m} q^{i_m,j_n} = q^{i_m,j}$. $q^{i_m,j_n}$ is an output of the network loading model. For a closed metro system, it is also directly observed from the AFC data. Therefore, it can be used to calibrate path choices. In this study, OD entry-exit flows provide similar information to the journey times used in previous research [38, 39, 36] but are more aggregated.

∙ Path choice fraction (or path share), $p_r^{i_m,j}$: The probability that path $r$ is chosen in time interval $m$, where $r \in \mathcal{R}(i, j)$. By definition, $0 \le p_r^{i_m,j} \le 1$ and $\sum_{r \in \mathcal{R}(i,j)} p_r^{i_m,j} = 1$. The set of all $p_r^{i_m,j}$ is denoted as $p$.

∙ Path flow, $q_r^{i_m,j_n}$: Number of passengers who enter station $i$ in time interval $m$ and exit at station $j$ in time interval $n$ using path $r$. $q_r^{i_m,j_n}$ is an output of the network loading process.

∙ Path exit rate, $\mu_r^{i_m,j_n}$: Number of passengers who enter station $i$ in time interval $m$ and exit station $j$ in time interval $n$ using path $r$, divided by the number of passengers who enter station $i$ in time interval $m$ and exit station $j$ using path $r$ (i.e. $\mu_r^{i_m,j_n} = q_r^{i_m,j_n} / \sum_{n': n' \ge m} q_r^{i_m,j_{n'}}$). This variable captures how many passengers exit the system in each time interval, which, for a given path $r$, depends only on the train schedule and left behind. Since the schedule is known, the path exit rate can be seen as an indicator of left behind. The set of all $\mu_r^{i_m,j_n}$ is denoted as $\mu$.

Given the above notation, the following relationships hold.

∙ The OD entry-exit flow equals the sum of all the path flows of the corresponding OD pair:

$$q^{i_m,j_n} = \sum_{r \in \mathcal{R}(i,j)} q_r^{i_m,j_n}, \quad \forall i_m, j_n \tag{3.1}$$

∙ The path flow can be expressed as the product of the OD entry flow, the path share, and the path exit rate:

$$q_r^{i_m,j_n} = q^{i_m,j} \cdot p_r^{i_m,j} \cdot \mu_r^{i_m,j_n}, \quad \forall i_m, j_n, r \in \mathcal{R}(i, j) \tag{3.2}$$

This relationship is the core step of the network loading, which assigns the OD demand (OD entry flows) to path flows.

We use a simple example to better illustrate the network representation. Consider two stations, $i$ and $j$, where $i$ is the origin and $j$ is the destination (see Figure 3-1a). Assume there exist two different paths connecting this OD pair, i.e. $\mathcal{R}(i, j) = \{1, 2\}$. The red arrows represent path 1, and the blue arrows path 2. The time period of interest is from 7:00 to 7:30 and the time interval is $\tau = 15$ min. Then the network can be extended to the TS network shown in Figure 3-1b. For example, $i_1$ represents station $i$ at time 7:00-7:15. Assume the only OD entry flow is $q^{i_1,j} = 10$, and the path shares are $p_1^{i_1,j} = 0.3$ and $p_2^{i_1,j} = 0.7$. This means that 10 passengers arrive at station $i$ during 7:00-7:15. Three of them use path 1 and seven use path 2. They all head to destination $j$, but at this point we do not know when they will arrive. Given this information, a transit loading model provides the passengers’ exit times.

For illustration purposes, suppose we obtain the following exit times (which can be used to calculate the path exit rates): of the 3 passengers who use path 1, 2 tap out at station $j$ during 7:00-7:15 (i.e. $\mu_1^{i_1,j_1} = 2/3$) and 1 taps out at station $j$ during 7:15-7:30 (i.e. $\mu_1^{i_1,j_2} = 1/3$). Then we have $q_1^{i_1,j_1} = q^{i_1,j} \cdot p_1^{i_1,j} \cdot \mu_1^{i_1,j_1} = 2$ and $q_1^{i_1,j_2} = q^{i_1,j} \cdot p_1^{i_1,j} \cdot \mu_1^{i_1,j_2} = 1$. These equations correspond to Eq. 3.2, which assigns the OD entry flow ($q^{i_1,j}$) to the path flows ($q_1^{i_1,j_1}$ and $q_1^{i_1,j_2}$). Similarly, of the 7 passengers who use path 2, assume 4 tap out at station $j$ during 7:00-7:15 (i.e. $\mu_2^{i_1,j_1} = 4/7$) and 3 tap out at station $j$ during 7:15-7:30 (i.e. $\mu_2^{i_1,j_2} = 3/7$). We have $q_2^{i_1,j_1} = 4$ and $q_2^{i_1,j_2} = 3$.

From the relationship between OD entry-exit flows and path flows (Eq. 3.1), we have $q^{i_1,j_1} = q_1^{i_1,j_1} + q_2^{i_1,j_1} = 6$ and $q^{i_1,j_2} = q_1^{i_1,j_2} + q_2^{i_1,j_2} = 4$. Also, the sum of the OD entry-exit flows over all exit time intervals is the OD entry flow, i.e. $q^{i_1,j} = q^{i_1,j_1} + q^{i_1,j_2} = 10$.
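The arithmetic of the example above can be reproduced with a few lines of Python. This is only an illustrative sketch: the variable names (`q_entry`, `path_share`, `exit_rate`) are chosen here for readability and are not part of the thesis; exact fractions are used so that the flows come out as whole passengers.

```python
from fractions import Fraction as F

# Inputs of the worked example (one OD pair, one entry interval, two exit intervals).
q_entry = 10                              # OD entry flow q^{i_1,j}
path_share = {1: F(3, 10), 2: F(7, 10)}   # path shares p_r^{i_1,j}
exit_rate = {                             # path exit rates, keyed by (path r, exit interval n)
    (1, 1): F(2, 3), (1, 2): F(1, 3),
    (2, 1): F(4, 7), (2, 2): F(3, 7),
}

# Eq. 3.2: path flow = OD entry flow * path share * path exit rate.
path_flow = {(r, n): q_entry * path_share[r] * mu for (r, n), mu in exit_rate.items()}

# Eq. 3.1: OD entry-exit flow = sum of the path flows over all paths.
od_exit_flow = {}
for (r, n), flow in path_flow.items():
    od_exit_flow[n] = od_exit_flow.get(n, 0) + flow

print(path_flow)      # path flows by (path, exit interval)
print(od_exit_flow)   # OD entry-exit flows by exit interval
```

Summing `od_exit_flow` over the exit intervals recovers the OD entry flow, which is the conservation property stated in the definition of the OD entry-exit flow.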

(a) Physical network (b) Time-space Hypernetwork

Figure 3-1: Network representation example

3.3.2 Problem definition

Model assumptions

We assume that path shares can be formulated as a C-logit model [9] as in Chapter 2.

$$p_r^{i_m,j} = \frac{\exp(\beta_X \cdot X_{r,m} + \beta_{CF} \cdot CF_r)}{\sum_{r' \in \mathcal{R}(i,j)} \exp(\beta_X \cdot X_{r',m} + \beta_{CF} \cdot CF_{r'})} := \frac{\exp(\beta Y_{r,m})}{\sum_{r' \in \mathcal{R}(i,j)} \exp(\beta Y_{r',m})}, \tag{3.3}$$

where $X_{r,m}$ are the attributes of path $r$ in time interval $m$ (in-vehicle time, number of transfers, transfer walking time, etc.). $CF_r$ is the commonality factor of path $r$, which measures the degree of similarity of path $r$ with the other paths of the same OD. $\beta_X$ and $\beta_{CF}$ are the corresponding coefficients to be estimated. $\beta$ and $Y_{r,m}$ represent the combination of the two terms in the utility function. The definition of $CF_r$ can be found in Eq. 2.2.

In the formulation of the problem, we assume that passengers waiting at a platform board trains based on a First-In-First-Board (FIFB) principle. Every train has a capacity constraint. When a train reaches its capacity, the remaining passengers on the platform are left behind and wait for the next train with available capacity. These constraints are formulated as relationships among all $\mu_r^{i_m,j_n}$, because $\mu_r^{i_m,j_n}$ represents when and how many passengers exit the system, which reflects the network loading mechanism (NLM). However, formulating the constraints analytically based on the aggregated network representation is difficult. We thus temporarily denote the constraints for $\mu_r^{i_m,j_n}$ as

$$\mu_r^{i_m,j_n} \text{ satisfies the NLM}, \quad \forall i_m, j_n, r \in \mathcal{R}(i, j) \tag{3.4}$$

The constraints will be addressed in the following sections.
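As a minimal sketch of the logit form in Eq. 3.3, the path shares of one OD pair can be computed as a softmax over path utilities. The function name `path_shares`, the attribute ordering, and all numerical values below are assumptions made for illustration; they are not estimates from the thesis.

```python
import math

def path_shares(beta, attributes):
    """Logit shares for one OD pair and time interval (form of Eq. 3.3).

    beta: list of coefficients (beta_X entries followed by beta_CF).
    attributes: dict mapping path id -> attribute list Y_{r,m}, ordered like beta.
    """
    utility = {r: sum(b * y for b, y in zip(beta, ys)) for r, ys in attributes.items()}
    v_max = max(utility.values())  # subtract the max utility for numerical stability
    weights = {r: math.exp(v - v_max) for r, v in utility.items()}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

# Hypothetical attributes: [in-vehicle time (min), number of transfers, CF_r].
Y = {1: [20.0, 1, 0.0], 2: [24.0, 0, 0.0]}
beta = [-0.1, -0.5, -1.0]   # illustrative coefficients only
shares = path_shares(beta, Y)
print(shares)
```

Subtracting the largest utility before exponentiating leaves the shares unchanged, because a logit model depends only on utility differences; it simply avoids overflow for large utilities.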

Formulation

The purpose of this research is to estimate path choices using AFC data. Since we assume path choices can be formulated as the C-logit model, the $\beta$ in the C-logit model are the decision variables. As mentioned before, the OD entry-exit flow ($q^{i_m,j_n}$) can be obtained from the transit loading process, and its ground truth value can be observed from AFC data. So minimizing the difference between the estimated and observed OD entry-exit flows can serve as the objective function. The reasons for using the difference of $q^{i_m,j_n}$ as the objective function, rather than journey times as in previous studies, are as follows: 1) The model is framed based on the aggregated TS hyper-network, where the individual-based journey time is not available; 2) Estimating individual-based journey time is difficult given many latent factors (e.g., various walking speeds, in-station activities). This leads to high errors when matching individual-level information. Aggregate information (e.g., $q^{i_m,j_n}$) has the potential to offset some latent errors and thus provides a more reliable calibration. 3) Considering the computational cost, using aggregate information along with the aggregated TS hyper-network makes it feasible to apply the model to large urban rail systems with full AFC data, while the individual-based model can only be applied with a small sample of AFC data [38]. If prior information about path choices is available (for example, estimated from a prior

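survey), the difference between estimated and prior $\beta$’s can be used in the objective function. Therefore, the original problem is formulated as:

\begin{align}
\min_{\beta, \mu} \quad & w_1 \sum_{i_m, j_n} (q^{i_m,j_n} - \tilde{q}^{i_m,j_n})^2 + w_2 \|\beta - \tilde{\beta}\|^2 \tag{3.5a} \\
\text{s.t.} \quad & q^{i_m,j_n} = \sum_{r} q_r^{i_m,j_n} && \forall i_m, j_n, \tag{3.5b} \\
& q_r^{i_m,j_n} = q^{i_m,j} \cdot p_r^{i_m,j} \cdot \mu_r^{i_m,j_n} && \forall i_m, j_n, r \in \mathcal{R}(i, j), \tag{3.5c} \\
& p_r^{i_m,j} = \frac{\exp(\beta Y_{r,m})}{\sum_{r' \in \mathcal{R}(i,j)} \exp(\beta Y_{r',m})} && \forall i_m, j, r \in \mathcal{R}(i, j), \tag{3.5d} \\
& \mu_r^{i_m,j_n} \text{ satisfies the NLM} && \forall i_m, j_n, r \in \mathcal{R}(i, j), \tag{3.5e} \\
& \sum_{r \in \mathcal{R}(i,j)} p_r^{i_m,j} = 1 && \forall i_m, j, \tag{3.5f} \\
& 0 \le p_r^{i_m,j} \le 1 && \forall i_m, j, r \in \mathcal{R}(i, j), \tag{3.5g} \\
& q_r^{i_m,j_n} \ge 0 && \forall i_m, j_n, r \in \mathcal{R}(i, j) \tag{3.5h}
\end{align}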

where $\tilde{q}^{i_m,j_n}$ is the observed OD entry-exit flow and $\tilde{\beta}$ is the prior estimate of $\beta$. $w_1$ and $w_2$ are the corresponding weights. Note that in the case study section we assume no prior knowledge, so $w_2 = 0$. Constraints 3.5b and 3.5c are the relationships described in Section 3.3.1. Constraints 3.5d and 3.5e represent the assumptions we made in Section 3.3.2. Constraints 3.5f, 3.5g, and 3.5h are given by definition.

Several constraints make this problem hard to solve. First, constraints 3.5c and 3.5d are both nonlinear equality constraints because $\mu_r^{i_m,j_n}$ and $p_r^{i_m,j}$ are unknown. Second, constraint 3.5e is non-analytical because we cannot formulate the NLM constraints in terms of $\mu_r^{i_m,j_n}$ analytically. So the original problem is intractable. The methods to deal with these constraints and approximately solve the original problem are discussed in the following sections.

3.3.3 Problem decomposition

Though we cannot formulate constraint 3.5e analytically, the corresponding $\mu_r^{i_m,j_n}$ values can be obtained from the output of a network loading process. Therefore, we decompose the original problem into two sub-problems as follows.

∙ Sub-problem 1:

$$\begin{aligned} \min_{\beta} \quad & w_1 \sum_{i_m, j_n} (q^{i_m,j_n} - \tilde{q}^{i_m,j_n})^2 + w_2 \|\beta - \tilde{\beta}\|^2 \\ \text{s.t.} \quad & \text{Eq. (3.5b) - (3.5d)}, \\ & \text{Eq. (3.5f) - (3.5h)} \end{aligned} \tag{3.6}$$

∙ Sub-problem 2:

$$\mu = \text{Network Loading}(\beta, q^e) \tag{3.7}$$

Sub-problem 2 is the network loading model, which takes the route choice parameter $\beta$ and the OD entry demand $q^e$ as input, and outputs the path exit rates $\mu$. In this study, we use the event-based simulation model proposed in Chapter 2 to perform the network loading. This simulation framework shares the same NLM and model assumptions as described before. Therefore, the estimated $\mu$ from the model satisfies the NLM constraints.

Sub-problem 1 is a variation of the original problem (Eq. 3.5), where the non-analytical constraint 3.5e is removed and $\mu$ is treated as known. Constraint 3.5c is now linear since $\mu_r^{i_m,j_n}$ is fixed. However, the problem is still intractable because of the highly non-linear constraint 3.5d, which we refer to as the logit constraint. In the following sections, we show how we linearize sub-problem 1 and solve it as a quadratic programming problem.

3.3.4 Linearization for sub-problem 1

Addressing the logit constraints is difficult. Davis et al. [50] and Atasoy et al. [51] showed that when the logit structure is in an objective function, and utilities are constants but choice sets are unknown (assortment planning problem), the integer programming can be reformulated as linear programming. However, for our problem, the logit structure is a constraint of the problem. And 훽 in the utility function is unknown. To the best of our knowledge, there is no equivalent transformation from this logit constraint to a tractable form. In this study, we propose two procedures to approximately linearize sub-problem 1 with the logit constraints.

Construct approximate linear constraints (ALC)

The logit constraint describes the relationship between $\beta$ and $p_r^{i_m,j}$. Since directly dealing with the non-linear constraints is difficult, we first replace the decision variables $\beta$ with $p_r^{i_m,j}$ and remove constraint 3.5d. Then sub-problem 1 becomes a simple quadratic programming problem, given that all constraints become linear. However, as the degrees of freedom of $p_r^{i_m,j}$ are much larger than those of $\beta$, directly replacing the decision variables will cause severe over-fitting. So, more constraints on $p_r^{i_m,j}$ are needed to narrow down the feasible space.

In this study, we propose a Monte-Carlo sampling method to construct a series of linear constraints for $p_r^{i_m,j}$. The basic idea is that, for some OD pairs with the same path sets, the corresponding path choice fractions may be the same under logit constraints. Then, we can construct linear constraints of the form $p_r^{i_m,j} = p_{r'}^{i'_{m'},j'}$ for some $i_m, j, r$ and $i'_{m'}, j', r'$.

A simple example is shown below to illustrate this property. Consider the OD pairs 1-5 and 2-5 in Figure 3-2. There are two paths for each OD pair. Path 1 has a transfer at station 4 and path 2 at station 3. The path choice fractions are denoted as $p_1^{1,5}$, $p_2^{1,5}$, $p_1^{2,5}$, $p_2^{2,5}$, respectively. For exposition purposes, we ignore the time index. We further assume there are four path attributes affecting passengers’ path choices: in-vehicle time, the number of transfers, transfer walking time, and the commonality factor. Then we can show that, under logit constraints, $p_1^{1,5} = p_1^{2,5}$ and $p_2^{1,5} = p_2^{2,5}$. The proof is shown below.

Denote the utility of path $r$ of OD pair $(i, j)$ as $V_r^{i,j}$. Since path 1 of OD 1-5 and path 1 of OD 2-5 share the same transfer patterns, their number of transfers and transfer walking time are the same. The commonality factors of these two paths are also the same (see Eq. 2.2). Therefore, the difference in utilities for path 1 of the two OD pairs only contains in-vehicle time. Let the in-vehicle time for path $r$ of OD pair $(i, j)$ be $tt_r^{i,j}$, the in-vehicle time for link $(i, j)$ be $tt^{i,j}$, and the coefficient of in-vehicle time be $\beta_{tt}$. We have

$$V_1^{1,5} - V_1^{2,5} = \beta_{tt} \cdot (tt_1^{1,5} - tt_1^{2,5}) = \beta_{tt} \cdot tt^{1,2}. \tag{3.8}$$

Similarly, for path 2 of OD 1-5 and OD 2-5, we have

$$V_2^{1,5} - V_2^{2,5} = \beta_{tt} \cdot (tt_2^{1,5} - tt_2^{2,5}) = \beta_{tt} \cdot tt^{1,2}. \tag{3.9}$$

According to the logit constraint, we have

$$\begin{aligned} p_1^{1,5} &= \frac{1}{1 + \exp(V_2^{1,5} - V_1^{1,5})} = \frac{1}{1 + \exp\left((V_2^{2,5} + \beta_{tt} \cdot tt^{1,2}) - (V_1^{2,5} + \beta_{tt} \cdot tt^{1,2})\right)} \\ &= \frac{1}{1 + \exp(V_2^{2,5} - V_1^{2,5})} = p_1^{2,5}. \end{aligned} \tag{3.10}$$

Similarly, for path 2, we have $p_2^{1,5} = p_2^{2,5}$.

This example network represents a sub-component of many real-world networks. Therefore, this property also holds for many OD pairs in real-world networks.
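The cancellation used in this proof is easy to check numerically. The sketch below assumes hypothetical utilities for OD pair 2-5 and a hypothetical in-vehicle time for link (1, 2); none of these values come from the thesis.

```python
import math

def binary_share(v1, v2):
    """Share of path 1 in a two-path logit model: 1 / (1 + exp(V2 - V1))."""
    return 1.0 / (1.0 + math.exp(v2 - v1))

beta_tt = -0.08          # hypothetical in-vehicle time coefficient
tt_link_12 = 4.0         # hypothetical in-vehicle time on link (1, 2), in minutes
v_od25 = (-2.1, -1.7)    # hypothetical utilities of paths 1 and 2 for OD 2-5

# Both paths of OD 1-5 differ from those of OD 2-5 by the same additive term.
v_od15 = tuple(v + beta_tt * tt_link_12 for v in v_od25)

p_od25 = binary_share(*v_od25)
p_od15 = binary_share(*v_od15)
print(p_od15, p_od25)    # equal up to floating-point rounding
```

Because the common term cancels in the utility difference, the equality holds for any value of `beta_tt`, which is why such equality constraints do not depend on the sampled coefficients.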

Figure 3-2: Network Example for ALC

Besides equality constraints, there are also inequality constraints under the logit assumption. For example, since all cost coefficients (e.g., in-vehicle time, transfer times) should be negative according to our prior knowledge, if a path has smaller costs than the other paths of the same OD pair, it should always have a higher share regardless of the magnitude of $\beta$. So we can construct linear constraints of the form $p_r^{i_m,j} \ge p_{r'}^{i_m,j}$ to capture this information.

To automatically extract all these linear constraints in the system, we propose a Monte-Carlo sampling method. We first define a reasonable range for all $\beta$ (i.e. $\beta \in [L_\beta, U_\beta]$) based on prior knowledge (e.g. survey results from previous years), where $L_\beta$ ($U_\beta$) is the vector of lower (upper) bounds for $\beta$. It is worth noting that the selection of $L_\beta$ and $U_\beta$ has a limited impact on the construction of the ALC. The equality constraints are independent of the value of $\beta$. $L_\beta$ and $U_\beta$ only affect the construction of the inequality constraints, and in our numerical tests the impact is very small. Generally, we only need to set the cost coefficients to be negative (i.e., $L_\beta = -\infty$ and $U_\beta = 0$). The detailed ALC construction steps are shown in Algorithm 1. The maximum number of sampling points is $S$. The choice of $S$ is a trade-off between computational efficiency and constraint accuracy. A larger $S$ helps avoid erroneously constructed constraints.

Algorithm 1 Monte-Carlo Based ALC Construction

Initialize $s = 0$
while $s < S$ do
    $s = s + 1$
    Sample $\beta$ from the uniform distribution $U(L_\beta, U_\beta)$, denoted as $\beta^{(s)}$
    Calculate the path choice fractions for all paths based on $\beta^{(s)}$, denoted as $p_r^{i_m,j\,(s)}$
for all $i_m, j, r$ in path sets do
    for all $i'_{m'}, j', r'$ in path sets do
        if $p_r^{i_m,j\,(s)} = p_{r'}^{i'_{m'},j'\,(s)}$ for all $s = 1, ..., S$ then
            Save $p_r^{i_m,j} = p_{r'}^{i'_{m'},j'}$ as a linear constraint
for all $i_m, j$ in OD pair sets do
    for all $r \in \mathcal{R}(i, j)$ do
        for all $r' \in \mathcal{R}(i, j)$ do
            if $p_r^{i_m,j\,(s)} \ge p_{r'}^{i_m,j\,(s)}$ for all $s = 1, ..., S$ then
                Save $p_r^{i_m,j} \ge p_{r'}^{i_m,j}$ as a linear constraint
return All saved linear constraints
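The following Python sketch illustrates the idea of Algorithm 1 on a toy network; it is not the thesis implementation. It makes several assumptions that should be read as such: the function names (`logit_shares`, `construct_alc`), the finite sampling bounds (a uniform draw needs a finite lower bound, so the bounds here are picked arbitrarily), the numerical tolerance used to test equality of floating-point shares, and all attribute values are hypothetical. The time index is ignored for brevity.

```python
import itertools
import math
import random

def logit_shares(beta, paths):
    """Logit shares for one OD pair; paths maps path id -> attribute list."""
    util = {r: sum(b * y for b, y in zip(beta, ys)) for r, ys in paths.items()}
    v_max = max(util.values())
    w = {r: math.exp(v - v_max) for r, v in util.items()}
    total = sum(w.values())
    return {r: x / total for r, x in w.items()}

def construct_alc(od_paths, lower, upper, n_samples=200, tol=1e-9, seed=0):
    """Monte-Carlo construction of linear constraints in the spirit of Algorithm 1."""
    rng = random.Random(seed)
    # Sampling step: draw beta^(s) and store every path share p^(s).
    samples = []
    for _ in range(n_samples):
        beta = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        shares = {}
        for od, paths in od_paths.items():
            for r, p in logit_shares(beta, paths).items():
                shares[(od, r)] = p
        samples.append(shares)

    keys = sorted(samples[0])
    # Equality constraints: pairs of shares that coincide in every sample.
    equalities = [
        (a, b) for a, b in itertools.combinations(keys, 2)
        if all(abs(s[a] - s[b]) <= tol for s in samples)
    ]
    # Inequality constraints within an OD pair: p_r >= p_r' in every sample.
    inequalities = [
        (od, r, r2)
        for od, paths in od_paths.items()
        for r, r2 in itertools.permutations(sorted(paths), 2)
        if all(s[(od, r)] >= s[(od, r2)] for s in samples)
    ]
    return equalities, inequalities

# Hypothetical attributes per path: [in-vehicle time (min), number of transfers].
od_paths = {
    "1-5": {1: [30.0, 1], 2: [34.0, 1]},
    "2-5": {1: [26.0, 1], 2: [30.0, 1]},   # same paths without link (1, 2)
    "6-7": {1: [20.0, 2], 2: [28.0, 0]},   # time/transfer trade-off, no dominance
}
equalities, inequalities = construct_alc(od_paths, lower=[-1.0, -5.0], upper=[0.0, 0.0])
print(equalities)
print(inequalities)
```

In this toy setting the procedure should recover the equalities between OD pairs 1-5 and 2-5 and the dominance of the faster path within each of them, while OD pair 6-7 yields no constraint because the sign of the utility difference depends on the sampled coefficients. Comparing every pair of shares is quadratic in the number of paths, so a large network would likely need a grouping step (for example, hashing rounded share vectors), which is a design choice outside the scope of this sketch.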

These linear constraints partially capture the effect of the logit constraints while retaining model tractability. We denote all the constructed linear constraints for $p_r^{i_m,j}$ as

$$p_r^{i_m,j} \text{ satisfies the ALC of logit}, \quad \forall i_m, j, r \in \mathcal{R}(i, j) \tag{3.11}$$

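Then sub-problem 1 can be reformulated as

$$\begin{aligned} \min_{p} \quad & w_1 \sum_{i_m, j_n} (q^{i_m,j_n} - \tilde{q}^{i_m,j_n})^2 + w_2 \sum_{i_m, j} \sum_{r \in \mathcal{R}(i,j)} (p_r^{i_m,j} - \tilde{p}_r^{i_m,j})^2 \\ \text{s.t.} \quad & \text{Eq. 3.5b - 3.5c}, \\ & p_r^{i_m,j} \text{ satisfies the ALC of logit} \quad \forall i_m, j, r \in \mathcal{R}(i, j), \\ & \text{Eq. 3.5f - 3.5h} \end{aligned} \tag{3.12}$$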

The decision variables in the new formulation are $p$ instead of $\beta$. $\tilde{p}_r^{i_m,j}$ is the prior knowledge about path shares derived from $\tilde{\beta}$. Eq. 3.12 is a quadratic program since all constraints are linear, and it can be solved efficiently. However, it is not equivalent to the original problem. Based on the numerical test in the case study, adding the ALC can decrease the total degrees of freedom by 40%, which demonstrates a narrower feasible space. But we still need to go one step further to make all estimated path shares satisfy the actual logit constraints.
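To make the structure of Eq. 3.12 concrete, consider the special case of a single OD pair with two paths, no prior term ($w_2 = 0$), and no ALC. Substituting $p_2 = 1 - p_1$ reduces the quadratic program to a one-dimensional least-squares problem with a closed-form solution. The sketch below is only this special case, not a general QP solver; the function name `rough_share_two_paths` is hypothetical, and the data reuse the illustrative numbers from Section 3.3.1.

```python
from fractions import Fraction as F

def rough_share_two_paths(q_entry, mu1, mu2, q_obs):
    """Least-squares share of path 1 for one OD pair with two paths.

    With p_2 = 1 - p_1, the modeled OD entry-exit flow in exit interval n is
    q_n(p_1) = a_n * p_1 + b_n, so the objective is a 1-D quadratic in p_1
    subject to the box constraint 0 <= p_1 <= 1.
    """
    a = [q_entry * (m1 - m2) for m1, m2 in zip(mu1, mu2)]
    b = [q_entry * m2 for m2 in mu2]
    denom = sum(x * x for x in a)
    if denom == 0:
        return None  # identical exit rates: the share is not identifiable
    p1 = sum(x * (obs - y) for x, y, obs in zip(a, b, q_obs)) / denom
    return min(max(p1, 0), 1)  # clipping is exact for a 1-D convex quadratic

p1 = rough_share_two_paths(
    q_entry=10,
    mu1=[F(2, 3), F(1, 3)],   # path exit rates of path 1
    mu2=[F(4, 7), F(3, 7)],   # path exit rates of path 2
    q_obs=[6, 4],             # observed OD entry-exit flows
)
print(p1)
```

The `None` branch mirrors the identifiability issue discussed in the text: when two paths have the same exit rates, or when no flow is observed, the observed OD entry-exit flows carry no information about the split between them.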

Logit correction

The estimated $p$ from Eq. 3.12 have two problems. The first is possible over-fitting due to the high degrees of freedom, which we have discussed before. The second is unidentifiable path shares due to few observed OD entry-exit flows. For example, if there is no observed passenger for OD pair $(i, j)$ in time interval $m$, $p_r^{i_m,j}$ can take any value without affecting the objective function. Hence, its value cannot be estimated. Both of these problems can be attributed to the same source: the estimated $p_r^{i_m,j}$ violate the original logit constraints (they only satisfy the ALC of logit).

To address this problem, we can use the estimated $p_r^{i_m,j}$ from Eq. 3.12 (called rough path shares hereafter) to obtain a set of $\beta$, and then use the $\beta$ to generate new path shares. This procedure is referred to as logit correction. Path shares after the logit correction satisfy the logit constraints by definition. However, not all rough path shares are equally reliable. Since more observed passengers provide more information for the path shares estimation, the reliability of an estimated $p_r^{i_m,j}$ ($\forall r \in \mathcal{R}(i, j)$) can be measured by the corresponding OD entry flow $q^{i_m,j}$.

Therefore, we formulate the logit correction problem as follows, which can be seen as a weighted fractional logit model [52].

$$\max_{\beta} \sum_{i_m, j} q^{i_m,j} \sum_{r \in \mathcal{R}(i,j)} p_r^{i_m,j} \cdot \log \frac{\exp(\beta Y_{r,m})}{\sum_{r' \in \mathcal{R}(i,j)} \exp(\beta Y_{r',m})} \tag{3.13}$$

Note that in Eq. 3.13, $p_r^{i_m,j}$ are constants. The objective has the same form as the log-likelihood of a softmax (multinomial logit) model, so this is an unconstrained convex optimization problem (like logistic regression) that can be solved efficiently. $q^{i_m,j}$ is the weight of the corresponding path shares ($p_r^{i_m,j}, \forall r \in \mathcal{R}(i, j)$), which reflects their reliability. Recall that if no passengers were observed for a specific OD pair ($q^{i_m,j} = 0$), the corresponding path share $p_r^{i_m,j}$ would be unidentifiable in Eq. 3.12. Now that we add $q^{i_m,j}$ as the weight, $q^{i_m,j} = 0$ means that the unidentifiable path shares have no reliability and do not appear in the objective function.

After we get 훽, the aforementioned two problems in the results of Eq. 3.12 will naturally disappear because we can generate a new 푝 that satisfies the logit constraints exactly.
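A minimal sketch of the logit correction is given below for the simplified case of a single attribute, so that $\beta$ is a scalar. In that case the derivative of the objective in Eq. 3.13 is $\sum_{i_m,j} q^{i_m,j} \sum_r (p_r^{i_m,j} - h_r^{i_m,j}(\beta)) Y_{r,m}$, where $h$ denotes the logit shares, and it is decreasing in $\beta$ because the objective is concave; the maximizer can therefore be found by bisection. The scalar setting, the bisection solver, the bracket `[-5, 5]`, the function names, and the data are all assumptions made for illustration; a multi-attribute $\beta$ would need a general convex solver instead.

```python
import math

def logit_shares(beta, costs):
    """Logit shares for a scalar beta and a list of path attributes Y_r."""
    v = [beta * y for y in costs]
    v_max = max(v)
    w = [math.exp(x - v_max) for x in v]
    total = sum(w)
    return [x / total for x in w]

def objective_slope(beta, observations):
    """Derivative of the weighted fractional logit objective w.r.t. scalar beta.

    Each observation is (q, rough_shares, costs); the derivative is
    sum over OD pairs of q * sum_r (p_r - h_r(beta)) * Y_r.
    """
    slope = 0.0
    for q, p, costs in observations:
        h = logit_shares(beta, costs)
        slope += q * sum((pr - hr) * y for pr, hr, y in zip(p, h, costs))
    return slope

def logit_correction(observations, lo=-5.0, hi=5.0, iters=60):
    """Maximize the concave objective by bisection on its decreasing slope."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if objective_slope(mid, observations) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical data: rough shares generated from beta = -0.2, plus one OD pair
# with zero entry flow whose arbitrary rough shares carry no weight.
true_beta = -0.2
observations = [
    (50, logit_shares(true_beta, [20.0, 25.0]), [20.0, 25.0]),
    (30, logit_shares(true_beta, [15.0, 18.0, 24.0]), [15.0, 18.0, 24.0]),
    (0, [0.9, 0.1], [30.0, 32.0]),
]
beta_hat = logit_correction(observations)
corrected = [logit_shares(beta_hat, costs) for _, _, costs in observations]
print(beta_hat, corrected[2])
```

The third OD pair shows the intended effect of the weights: its rough shares do not influence `beta_hat`, yet it still receives corrected shares that are consistent with the logit model fitted on the other OD pairs.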

3.3.5 Discussion of solution procedures

So far, we have formulated three sub-problems to approximate the solution of the original problem. These sub-problems are summarized in Eq. 3.14-3.16. In sub-problem 1a, given $\mu_r^{i_m,j_n}$, we estimate the rough path shares by solving a quadratic programming problem. In sub-problem 1b, given the rough path shares, we estimate the corresponding $\beta$ through a weighted fractional logit model formulation. In sub-problem 2, given $\beta$, we load passengers onto the network and return the $\mu_r^{i_m,j_n}$ values, which satisfy the NLM constraints.

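∙ Sub-problem 1a:

$$\begin{aligned} \min_{p} \quad & w_1 \sum_{i_m, j_n} (q^{i_m,j_n} - \tilde{q}^{i_m,j_n})^2 + w_2 \sum_{i_m, j} \sum_{r \in \mathcal{R}(i,j)} (p_r^{i_m,j} - \tilde{p}_r^{i_m,j})^2 \\ \text{s.t.} \quad & \text{Eq. (3.5b) - (3.5c)}, \\ & p_r^{i_m,j} \text{ satisfies the ALC of logit} \quad \forall i_m, j, r \in \mathcal{R}(i, j), \\ & \text{Eq. (3.5f) - (3.5h)} \end{aligned} \tag{3.14}$$

∙ Sub-problem 1b:

$$\max_{\beta} \sum_{i_m, j} q^{i_m,j} \sum_{r \in \mathcal{R}(i,j)} p_r^{i_m,j} \cdot \log \frac{\exp(\beta Y_{r,m})}{\sum_{r' \in \mathcal{R}(i,j)} \exp(\beta Y_{r',m})} \tag{3.15}$$

∙ Sub-problem 2:

$$\mu = \text{Network Loading}(\beta, q^e, \theta) \tag{3.16}$$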

We solve these three sub-problems iteratively to approximate the solution of the original problem. This is equivalent to finding a fixed point of the following problem.

$$\beta = \text{SP1b} \circ \text{SP1a} \circ \text{SP2}(\beta) \tag{3.17}$$

where SP2 is the solution function of sub-problem 2, i.e. $\mu = \text{SP2}(\beta)$; SP1a is the solution function of sub-problem 1(a), i.e. $p = \text{SP1a}(\mu)$; SP1b is the solution function of sub-problem 1(b), i.e. $\beta = \text{SP1b}(p)$; and “$\circ$” denotes function composition, i.e., $f \circ g(x) = f(g(x))$. The existence and uniqueness of the solution of Eq. (3.17), and its relationship with the original problem in Eq. (3.5), are important questions.

Lemma 1. $\beta^*$ is the solution of sub-problem 1(b) with respect to $p$ (i.e. $\beta^* = \text{SP1b}(p)$) if the following two properties hold:

∙ $p$ satisfies the logit constraints in terms of $\beta^*$, that is, $p_r^{i_m,j} = \frac{\exp(\beta^* Y_{r,m})}{\sum_{r' \in \mathcal{R}(i,j)} \exp(\beta^* Y_{r',m})}$ for all $p_r^{i_m,j} \in p$

∙ Sub-problem 1b has a unique solution

Proof. Denote $\frac{\exp(\beta Y_{r,m})}{\sum_{r' \in \mathcal{R}(i,j)} \exp(\beta Y_{r',m})}$ as $h_r^{i_m,j}$. Then Eq. 3.15 can be rewritten as

$$\max_{\beta} \sum_{i_m, j} q^{i_m,j} \sum_{r \in \mathcal{R}(i,j)} p_r^{i_m,j} \cdot \log h_r^{i_m,j}, \tag{3.18}$$

which has the form of a negative cross-entropy between $p$ and $h$. By Gibbs’ inequality, the maximum is reached when $p_r^{i_m,j} = h_r^{i_m,j}, \forall i_m, j, r \in \mathcal{R}(i, j)$. Since $p_r^{i_m,j}$ already satisfy the logit constraints in terms of $\beta^*$, feeding $\beta^*$ into $h_r^{i_m,j}$ gives the desired condition ($p_r^{i_m,j} = h_r^{i_m,j}$). Thus $\beta^*$ is an optimal solution of sub-problem 1(b). It is possible that other $\beta$ can also lead to $p_r^{i_m,j} = h_r^{i_m,j}$. However, as we assume sub-problem 1b has a unique solution, we have $\beta^* = \text{SP1b}(p)$.

The existence of a fixed point of Eq. 3.17 is discussed in Proposition 1.

Proposition 1. The optimal solution 훽* for the original problem (Eq. 3.5) is a fixed point for Eq. 3.17 if the following properties hold:

∙ 1) Sub-problem 1b has a unique solution for all given 푝.

∙ 2) The optimal objective function for the original problem Eq. 3.5 is 0.

Proof. Denote $\mu^* := \text{SP2}(\beta^*)$. Define $p^*$ such that $p_r^{*,i_m,j} = \frac{\exp(\beta^* Y_{r,m})}{\sum_{r' \in \mathcal{R}(i,j)} \exp(\beta^* Y_{r',m})}$ for all $p_r^{*,i_m,j} \in p^*$. We claim $p^* = \text{SP1a}(\mu^*)$. The proof is shown below.

By definition, $\mu^*$ satisfies the NLM constraints with respect to $\beta^*$ and $p^*$ satisfies the logit constraints with respect to $\beta^*$. So $(\beta^*, \mu^*, p^*)$ is the optimal solution of the original problem. Comparing sub-problem 1a and the original problem, if we use $\mu^*$ in sub-problem 1a, the optimal objective value of sub-problem 1a should be less than or equal to that of the original problem because $p$ has a larger feasible space in sub-problem 1a. However, given the assumption that the optimal objective value of the original problem is 0 (and cannot be decreased), the optimal objective value of sub-problem 1a is 0 as well. Since the objective functions of these two problems are the same, $p^*$ is the optimal solution of sub-problem 1a (i.e. $p^* = \text{SP1a}(\mu^*)$).

By definition, $p^*$ satisfies the logit constraints in terms of $\beta^*$, and we also assume sub-problem 1b has a unique solution for all $p$. According to Lemma 1, $\beta^* = \text{SP1b}(p^*)$. This leads to $\beta^* = \text{SP1b} \circ \text{SP1a} \circ \text{SP2}(\beta^*)$.

Proposition 1 proves the existence of a fixed point for Eq. 3.17, which is exactly the solution of the original problem. However, Proposition 1 only holds under the two assumptions. The first assumption requires a unique solution of $\mathrm{SP1b}(p)$. This may not be true in the real world because $p$ has more degrees of freedom than $\beta$. It is possible that two different $\beta$'s can lead to the same path shares $p$. The second assumption requires that the path choice behavior is perfectly captured by the C-logit model (so that the optimal objective function is 0), which may not be true in the real world. However, even if the decomposed method may give solutions not exactly the same as the original problem, the proof under the two assumptions illustrates the reasonableness of the approach.

As Proposition 1 only discusses existence, the uniqueness of the fixed point is still unknown. One way to prove uniqueness is to show that $\mathrm{SP1b} \circ \mathrm{SP1a} \circ \mathrm{SP2}$ is a contraction, that is, for any $\beta$ and $\beta'$, we have $\|\mathrm{SP1b} \circ \mathrm{SP1a} \circ \mathrm{SP2}(\beta) - \mathrm{SP1b} \circ \mathrm{SP1a} \circ \mathrm{SP2}(\beta')\| \le \gamma \|\beta - \beta'\|$, where $\|\cdot\|$ is a norm and $\gamma \in [0, 1)$ is a constant. However, since $\mathrm{SP1b} \circ \mathrm{SP1a} \circ \mathrm{SP2}$ has no analytical expression, it is hard to prove the contraction property. According to the Banach fixed-point theorem [53], if the contraction holds, the unique fixed point can be obtained by the following procedure: start with an arbitrary $\beta^{(0)}$ and define a sequence $\{\beta^{(n)}\}$ by $\beta^{(n)} = \mathrm{SP1b} \circ \mathrm{SP1a} \circ \mathrm{SP2}(\beta^{(n-1)})$ for $n \ge 1$. Then $\lim_{n \to \infty} \beta^{(n)}$ exists and the converged value is the fixed point.

Though we cannot show the contraction of $\mathrm{SP1b} \circ \mathrm{SP1a} \circ \mathrm{SP2}$ analytically, we use the iteration of the Banach fixed-point theorem as the solution procedure (Algorithm 2), and validate the convergence numerically. If $\lim_{n \to \infty} \beta^{(n)}$ exists, and the converged value is close to¹ the solution of the original problem, then a necessary condition for the uniqueness is verified, which provides more evidence for the reasonableness of the decomposed method. The results of numerical validation using synthetic data are presented in Section 3.4.3, which show that $\beta^{(n)}$ converges and that the converged value is very close to the solution of the original problem (in Section 3.4.3, the solution of the original problem is the synthetic $\beta$).

¹ We use “close to” because the randomness of the network loading model and precision issues in solving the optimization problems can lead to small errors.

Algorithm 2 Solution Procedures for Path Choice Estimation

Initialize $\beta^{(0)}$.
$\mu^{(0)}$ = Network Loading($\beta^{(0)}$, $q^e$) (sub-problem 2)
Set iteration counter $k = 0$.
do
    $k = k + 1$
    Solve sub-problem 1(a) with fixed $\mu^{(k-1)}$ and return $p^{(k)}$
    Solve sub-problem 1(b) with fixed $p^{(k)}$ and return $\beta^{(k)}$
    Solve sub-problem 2 with $\beta^{(k)}$ as input and return $\mu^{(k)}$
while $\|\beta^{(k)} - \beta^{(k-1)}\| > \epsilon$ and $k < K_t$
if $k < K_t$ then
    $\beta = \beta^{(k)}$
else
    $\beta = \sum_{k=K_b}^{K_t} \beta^{(k)} / (K_t - K_b + 1)$
return $\beta$

In Algorithm 2, 훽(0) is the initial value of 훽. 휖 is a predetermined threshold for algorithm termination. To address the randomness in the network loading model, we also define a “burn-in” iteration 퐾푏 and a maximum iteration 퐾푡. When 훽 keeps fluctuating because of randomness and the threshold is not met before iteration 퐾푡, we take the average of the last 퐾푡 − 퐾푏 + 1 values of 훽 (iterations 퐾푏 to 퐾푡) as the final estimate.
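The control flow of Algorithm 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation: `step` is a hypothetical callable standing for the composite map SP1b ∘ SP1a ∘ SP2, the max-norm is assumed for the termination test, and the parameter names `k_max` and `k_burn` stand for 퐾푡 and 퐾푏.

```python
from typing import Callable, List, Sequence


def fixed_point_estimate(
    step: Callable[[List[float]], List[float]],  # hypothetical: SP1b o SP1a o SP2
    beta0: Sequence[float],
    eps: float = 1e-3,
    k_max: int = 15,   # K_t in Algorithm 2
    k_burn: int = 13,  # K_b in Algorithm 2
) -> List[float]:
    """Iterate beta <- step(beta) until the change is below eps or k_max is hit."""
    history = [list(beta0)]  # history[k] holds beta^(k)
    k = 0
    while True:
        k += 1
        history.append(step(history[-1]))
        # Assumed norm: largest absolute change over all coefficients.
        change = max(abs(a - b) for a, b in zip(history[k], history[k - 1]))
        if change <= eps or k >= k_max:
            break
    if k < k_max:
        return history[k]  # converged before the iteration cap
    # Otherwise average beta^(K_b), ..., beta^(K_t) to damp simulation noise.
    tail = history[k_burn : k_max + 1]
    return [sum(column) / len(tail) for column in zip(*tail)]
```

In an actual run, `step` would wrap the two optimization sub-problems and one network loading run, so each iteration costs one simulation. With a toy contraction such as `lambda b: [0.5 * x + 1 for x in b]` the function returns a value close to the fixed point 2.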

3.4 Case Study and Model Validation

For the purpose of model illustration and validation, we apply the proposed modeling framework using data from the Hong Kong MTR network. The model is validated using both synthetic and real-world AFC data.

3.4.1 Validation setting

We use the AFC data from March 16th (Thursday), 2017 for the model validation. The path sets for each OD pair are provided by MTR. Li [18] conducted a revealed-preference (RP) route choice survey of more than 20,000 passengers in the MTR system. According to Li [18], the following attributes were used to quantify path utility: (a) total in-vehicle time, (b) the number of transfers, (c) relative walking time (total walking time divided by total route distance) and (d) the commonality factor (Eq. 2.2). Detailed estimation results are included in Appendix A.

The evening peak (18:00-19:00) is selected for validation. For simplicity, we assume the path shares are static during this hour. The weights in the objective function of sub-problem 1(a) are set as 푤1 = 1 and 푤2 = 0, which means no prior knowledge is available. The maximum iteration 퐾푡 is set to 15 and the “burn-in” iteration 퐾푏 is set to 13. 훽(0) is set to 0 for all parameters. The parameters of the network loading model are summarized below.

∙ Access, egress, and transfer walking times: Platform-specific, obtained from field measurement.

∙ Train arrival and departure times: Approximated by the March timetable. Future research can use AVL data to get actual train arrival and departure information.

∙ Capacity: Determined using the model described in Mo et al. [54].

∙ Warm-up and cool-down time: 60 minutes warm-up and cool-down time.

Access/egress walking time is defined as the walking time between the fare machine and the train boarding platform. Warm-up (cool-down) time indicates the time before (after) the simulation period starts (ends). It is needed because the simulation system usually starts from an empty state (no trains or passengers).

Since the real-world path choice information is usually unavailable, it is common to quantitatively validate the model with synthetic data. To generate the synthetic data, we first extract the OD entry flow from the real-world AFC records. Choice parameters 훽 estimated in Li [18] are treated as passengers’ “true” behavior parameters (called synthetic 훽 hereafter). We use the network loading model with the true OD entry flows (actual number of tap-in passengers according to the AFC data) and the synthetic 훽 as input to simulate the travel of passengers in the system, and record their tap-out times. The records of tap-out times and their true tap-in times are treated as the synthetic AFC data.

The proposed approach is applied to the synthetic AFC data and validated based on its ability to recover the synthetic 훽 values (i.e., the difference between the estimated and synthetic 훽 values). The methodology is also applied to the real-world AFC data, providing further qualitative analysis and indirect comparison.

3.4.2 Benchmark model

To evaluate the model performance, we compare the results from the proposed model to a purely simulation-based optimization (SBO) method [55] (benchmark). The formulation of the benchmark approach is shown below.

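$$\min_{\beta} \; w_1 \sum_{i_m, j_n} \left( q^{i_m,j_n} - \tilde{q}^{i_m,j_n} \right)^2 + w_2 \|\beta - \tilde{\beta}\|^2 \tag{3.19a}$$

$$\text{s.t.} \quad q^{i_m,j_n} = \text{Network Loading}(\beta, q^{i_m,j}) \quad \forall i_m, j_n, \tag{3.19b}$$

$$L_\beta \le \beta \le U_\beta \tag{3.19c}$$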

where 퐿훽 and 푈훽 are pre-determined lower and upper bounds of 훽. We set 푤1 = 1 and 푤2 = 0 as above. Compared with our proposed model, the purely SBO method is closer to brute-force searching. 퐿훽 and 푈훽 are usually required to narrow the feasible space and guide the algorithm to reasonable parameters. The values of 퐿훽 and 푈훽 are shown in Table 3.1. By introducing 퐿훽 and 푈훽, we actually provide the benchmark model with more information.
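The benchmark objective of Eq. 3.19 can be written as an ordinary Python function that a black-box optimizer evaluates repeatedly. The sketch below is illustrative only: `network_loading` is a hypothetical callable that maps 훽 to simulated OD entry-exit flows, flows are stored in dictionaries keyed by OD and time interval, and returning infinity outside the bounds is an assumption made here (most SBO solvers handle box constraints themselves).

```python
from typing import Callable, Dict, Hashable, Optional, Sequence


def sbo_objective(
    beta: Sequence[float],
    q_obs: Dict[Hashable, float],
    network_loading: Callable[[Sequence[float]], Dict[Hashable, float]],  # hypothetical simulator
    lower: Sequence[float],
    upper: Sequence[float],
    beta_prior: Optional[Sequence[float]] = None,
    w1: float = 1.0,
    w2: float = 0.0,
) -> float:
    """Weighted squared error of Eq. 3.19a, with the box constraint of Eq. 3.19c."""
    if any(not (lo <= b <= up) for b, lo, up in zip(beta, lower, upper)):
        return float("inf")  # assumption: infeasible points are rejected outright
    q_sim = network_loading(beta)  # one (expensive) simulation run
    flow_error = sum((q_sim.get(key, 0.0) - q_obs[key]) ** 2 for key in q_obs)
    prior_error = 0.0
    if beta_prior is not None:
        prior_error = sum((b - bp) ** 2 for b, bp in zip(beta, beta_prior))
    return w1 * flow_error + w2 * prior_error
```

Every call triggers a full network loading run, which is why the comparison in this chapter is made in terms of the number of objective evaluations.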

Many solution algorithms have been proposed to solve SBO problems. These algorithms generally belong to three major classes: direct search, gradient-based, and response surface methods [56, 57]. According to Osorio and Bierlaire [56] and Cheng et al. [58], response surface methods have good performance and are gaining popularity in the transportation literature. In this study, we adopt two response surface methods to solve the benchmark model: Bayesian Optimization (BYO) [31] and Constrained Optimization using Response Surfaces (CORS) [59]. BYO aims to construct a probabilistic model of the objective function (response surface) and then exploit this model to determine where to evaluate the objective function for the next step. In each iteration, the probabilistic model is updated according to the posterior distribution of the objective function. CORS constructs a response surface model, and updates the model based on all previously probed points. The criteria for selecting the next points to be evaluated are (a) finding points that have lower objective function value, and (b) improving the fitting of the response surface model by sampling feasible regions where little information exists.

SBO methods are usually unstable due to the randomness in the searching process. For this reason, we perform 10 replications of each algorithm and report the mean and standard deviation of the objective function. The SBO methods are only used with the synthetic data so that we can compare the estimated path choice parameters to the synthetic ones.

3.4.3 Synthetic data results

Convergence of 훽

The convergence results of 훽 for the fixed-point algorithm (Algorithm 2) are depicted in Figure 3-3, which shows the value of 훽 at each iteration. All 훽 values appear to converge despite some slight fluctuation in the tail. The fluctuation may be the result of the randomness in the network loading model. The results support the proposed solution approach in Algorithm 2 and validate the theoretical arguments made in Section 3.3.5.

(a) In-vehicle time (b) Num of transfer (c) Relative walking time (d) Commonality factor

Figure 3-3: Convergence behavior of estimated 훽

Performance

Two indicators are reported during the iteration. One is the objective function; the other is the root-mean-square error (RMSE) of the path shares:

$$\mathrm{RMSE} = \sqrt{ \sum_{i_m,j} \sum_{r \in R(i,j)} \left( p_r^{i_m,j} - \hat{p}_r^{i_m,j} \right)^2 \Big/ \sum_{i,j} R_{i,j} } \tag{3.20}$$

where $p_r^{i_m,j}$ are the estimated path shares and $\hat{p}_r^{i_m,j}$ are the synthetic path shares (unit is %). $\sum_{i,j} R_{i,j}$ is the total number of paths in the system.

Figure 3-4 shows the value of the objective function as a function of the number of iterations. The error bars for the benchmark methods represent the standard deviation. The proposed method outperforms the benchmark models both in convergence rate and in the final solution. The RMSE comparison results are shown in Figure 3-5. The proposed method approaches the “true” path shares rapidly and has a lower estimation error than the benchmark models. Note that the RMSE may not always decrease with the reduction of the objective function. This is because the relationship between path choices and OD entry-exit flows is highly non-linear.
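As a concrete reading of Eq. 3.20, the following sketch computes the path-share RMSE from two dictionaries of shares. The data layout (one list of shares per OD pair, expressed in %) is an assumption made for illustration; with static path shares there is a single time interval, so the number of entries equals the total number of paths.

```python
import math
from typing import Dict, Hashable, List


def path_share_rmse(
    estimated: Dict[Hashable, List[float]],
    synthetic: Dict[Hashable, List[float]],
) -> float:
    """Root-mean-square error between estimated and synthetic path shares (in %)."""
    squared_error = 0.0
    n_paths = 0
    for od, shares in estimated.items():
        reference = synthetic[od]
        squared_error += sum((p - p_hat) ** 2 for p, p_hat in zip(shares, reference))
        n_paths += len(shares)  # accumulates the total number of paths
    return math.sqrt(squared_error / n_paths)
```

For example, two OD pairs with three paths in total and a 10-point error on two of the paths give an RMSE of about 8.2 percentage points.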

Figure 3-4: Objective Function Results of Synthetic Data

Figure 3-5: RMSE Results of Synthetic Data

The comparison of the estimated 훽 and the “true” (synthetic) 훽 is shown in Table 3.1. The 훽 values estimated using the proposed method are very close to the “true” ones. The quality of the estimated solution is also highlighted by the RMSE values. The RMSE obtained with the 훽 values from the proposed method (1.16) is much lower than the RMSE obtained with the benchmark methods (9.08 for BYO and 7.47 for CORS), indicating that the proposed method outperforms the benchmark models.

Table 3.1: 훽 Estimation Results of Synthetic Data

| Variable | Synthetic (“true”) | Estimated: Proposed | Estimated: BYO | Estimated: CORS | [퐿훽, 푈훽] |
|---|---|---|---|---|---|
| In-vehicle time (훽1) | -0.147 | -0.156 | -0.205 | -0.231 | [-2, 0] |
| Number of transfers (훽2) | -0.573 | -0.544 | -1.218 | -1.189 | [-4, 0] |
| Relative walking time (훽3) | -1.271 | -1.291 | -2.499 | -2.316 | [-6, 0] |
| Commonality factor (훽4) | -3.679 | -3.413 | -6.184 | -6.537 | [-10, 0] |
| Objective function | - | 10328.8 | 42390.6 | 37066.1 | - |
| RMSE | - | 1.16 | 9.08 | 7.47 | - |

3.4.4 Real-world data results

Since ground-truth path fractions are not available, we use the real-world AFC data to estimate 훽 values with the proposed method and compare them to the ones obtained by Li [18] (summarized in Table 3.2). Results show that the scale of all coefficients is similar. The trade-off between in-vehicle time and the number of transfers is reasonable, where one transfer is equivalent to 7.9 minutes of in-vehicle travel time, compared to 3.9 minutes in Li [18]. The trade-off between in-vehicle time and walking time is relatively small for long trips but significant for short trips. The results indicate that for a trip with 4 stations, one minute of transfer walking time is equivalent to 3.14 minutes of in-vehicle travel time (2.16 minutes in Li [18]). For a trip with 8 stations, one minute of walking time is equivalent to 1.57 minutes of in-vehicle travel time (1.08 minutes in Li [18]). The substitution patterns are reasonable and similar to the previous results [18]. It should be pointed out that since 2014, a number of changes have taken place in the MTR network. There was not only a growth in demand, but also the opening of a new line that can result in path choice pattern changes.

Table 3.2: 훽 Estimation Results of Real-world Data

| | In-vehicle time | Number of transfers | Relative walking time | Commonality factor |
|---|---|---|---|---|
| Estimated | -0.116 | -0.920 | -1.457 | -1.775 |
| Li [18] | -0.147 | -0.573 | -1.271 | -3.679 |
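The trade-offs quoted above follow directly from ratios of the coefficients in Table 3.2. The short sketch below reproduces the arithmetic. It assumes that route distance in the relative walking time attribute is counted in stations, so that one extra walking minute raises relative walking time by 1/(number of stations); this reading is an inference from the reported numbers rather than a definition stated in this section, and the dictionary and function names are illustrative.

```python
# Coefficients copied from Table 3.2.
ESTIMATED = {"in_vehicle": -0.116, "transfers": -0.920, "rel_walk": -1.457}
LI_SURVEY = {"in_vehicle": -0.147, "transfers": -0.573, "rel_walk": -1.271}


def transfer_in_ivt_minutes(beta):
    """In-vehicle minutes that carry the same disutility as one transfer."""
    return beta["transfers"] / beta["in_vehicle"]


def walk_minute_in_ivt_minutes(beta, n_stations):
    """In-vehicle minutes equivalent to one walking minute on a trip of n_stations.

    Assumption: route distance is measured in stations, so one walking minute
    changes relative walking time by 1 / n_stations.
    """
    return beta["rel_walk"] / (n_stations * beta["in_vehicle"])


if __name__ == "__main__":
    for name, beta in (("Estimated", ESTIMATED), ("Li [18]", LI_SURVEY)):
        print(name,
              round(transfer_in_ivt_minutes(beta), 1),
              round(walk_minute_in_ivt_minutes(beta, 4), 2),
              round(walk_minute_in_ivt_minutes(beta, 8), 2))
```

Under this assumption the ratios match the values reported in the text: 7.9 and 3.9 minutes per transfer, and 3.14, 2.16, 1.57 and 1.08 in-vehicle minutes per walking minute.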

Though we cannot directly compare path shares, other measurements (e.g. left behind rate) can also reflect the quality of path shares. Field observation data for the Admiralty station Northbound platform during the testing period (18:00-19:00) are available, which contain information on the left behind rate (proportion of passengers left behind at least once), the total number of arrival passengers (sum of new tap-in and transfer passengers), and the total number of boarding passengers. These measures can also be obtained from the network loading model using the path shares estimated from the proposed method as input. For comparison purposes, we also run the network loading model using two other sets of path shares. The first is generated by a naive model that results in equal shares among all paths (referred to as “uniform” path shares). The second is calculated using the path choice model in Li [18].

The comparison results are shown in Table 3.3. Compared with the ground truth, the network loading model using the estimated path shares closely replicates the left behind rate, number of arrival passengers, and number of boarding passengers. The square error of OD entry-exit flow (i.e. the objective function) is also the lowest. The performance in terms of left behind rate and number of arrival passengers for the estimated path shares is similar to that of Li [18], but the estimated path shares outperform those of Li [18] in number of boarding passengers and OD entry-exit flows.

Table 3.3: Comparison of various measurements at Admiralty station (18:00 to 19:00)

| | Left behind rate | Number of arrival passengers | Number of boarding passengers | Square error of OD entry-exit flow |
|---|---|---|---|---|
| Ground-truth | 0.747 | 24,945 | 23,926 | - |
| Proposed model | 0.724 | 24,589 | 23,570 | 1,044,692 |
| Li (2014) | 0.742 | 24,959 | 22,357 | 1,166,814 |
| Uniform | 0.779 | 25,683 | 18,767 | 1,323,594 |

3.4.5 Robustness testing

Model robustness is important for real-world applications. We test the performance of the model under different 훽(0) (initial 훽 values), and for different days. A robust model should output similar estimated 훽 values regardless of initial values. The estimated 훽 for all different days should also be similar since passengers’ choice behavior is stable in the short term.

Sensitivity to initial 훽 values

The sensitivity analysis with respect to different values of 훽(0) is conducted using the synthetic data. Twelve different 훽(0) are drawn from a uniform distribution U(퐿훽, 푈훽). Figure 3-6a shows the convergence of the objective function for different 훽(0) values. In early iterations, the objective function values vary a lot. But after around 10 iterations, all objective functions converge to the same value regardless of the 훽(0) values, which demonstrates the robustness of the model with respect to initial 훽 values. Figure 3-6b is the boxplot of the estimated 훽 parameters for different 훽(0). The variables that 훽1,...,훽4 correspond to can be found in Table 3.1. The estimated 훽1, 훽2 and 훽3 values are very stable regardless of 훽(0). The estimated 훽4 value (commonality factor) shows some fluctuation, but still within a small range (the 95% confidence interval is [−3.6, −3.2]). This also corresponds to the survey estimation results where 훽4 has a relatively low t-value (see Appendix A).

(a) Convergence of the objective function (different curves indicate different 훽(0)) (b) Boxplot of estimated coefficients for different 훽(0)

Figure 3-6: Model sensitivity to initial 훽 values

Sensitivity to data from different days

To test the robustness of the model in terms of OD demands from different days, the model was applied using actual AFC data for each day from March 13th to March 17th, 2017 (Monday to Friday). Figure 3-7 compares the estimated 훽 values for the different days. In general, all estimated values are consistent across days except for the coefficient of relative walking time on March 17 (Friday). This may be due to the fact that Friday nights are the start of weekends, and passengers may have different travel patterns and behaviors. Walking time, for example, is less important for entertainment trips which may take place during the evening peak on Friday. Overall, the proposed model is robust with respect to data from different weekdays.

Figure 3-7: Comparison of estimated 훽 values for different days

3.5 Discussion

This chapter presents an assignment-based approach to infer passenger route choice behavior in urban rail systems with both entry and exit AFC transactions. The approach explicitly models the interactions between path choices and left behind as well as the interactions among stations in terms of crowding. The path choice estimation is modeled as an optimization problem. The original intractable problem is decomposed into three tractable sub-problems which can be solved efficiently. Case studies using synthetic and actual data validate the effectiveness and robustness of the approach.

Future research can be done in the following directions. First, the model can be generalized to accommodate different choice model structures, such as nested logit models. This can be accomplished by changing sub-problem 1(b). Second, the model can be extended to allow different 훽 values for different groups of passengers, as real-world route choice behavior may be more diverse and heterogeneous.


Chapter 4

Simultaneous Calibration of Path Choices and Train Capacity

4.1 Introduction

Urban rail systems are important components of the urban transportation system. They have attracted high passenger demand given the high reliability and large capacity. However, high demand also leads to problems such as overcrowding and disruptions, which decrease the level of service and impact passengers. To maintain service reliability and develop efficient response strategies, it is crucial for operators to better understand passenger demand and flow patterns in the network.

Transit network loading (or simulation) models for metro systems, powered by automatically collected data, provide a useful instrument for network performance monitoring. They enable operators to characterize the level of service and make decisions accordingly. A typical network loading model requires an Origin-Destination (OD) matrix, supply information, and path choice fractions as input. The supply information includes the transit network topology, actual vehicle movement data, and vehicle capacity. Thanks to the wide deployment of automated fare collection (AFC) and automated vehicle location (AVL) systems, the OD demand and train movement data can be directly obtained. However, obtaining the corresponding path choices and quantifying reasonable vehicle capacity remain a challenge. According to Liu et al. [8] and Preston et al. [27], train capacity, defined as the maximum train load when the remaining passengers on the platform are denied boarding, may vary depending on the crowding levels in trains and on platforms, and passenger attitudes. Calibration of path choices and train capacity can improve the accuracy of network loading models for performance monitoring. Thus, these models can provide better information to operators to adjust operating strategies, relieve congestion, and improve efficiency.

Traditionally, path choices are inferred with data from on-site surveys that are used to estimate path choice models. However, surveys are time-consuming and labor-intensive, limiting their real-world usage. To overcome these disadvantages, path choice estimation methods based on AFC data have been proposed in the literature.

AFC systems provide the exact locations and times of passengers’ entry and exit transactions. Therefore, they provide rich information for analyzing passenger behavior. In the context of path choice estimation, the AFC data-based methods can be categorized into two groups: path-identification methods [33, 34, 35, 36] and parameter-inference methods [37, 38, 39, 40]. The former studies aim to identify the exact path chosen by each user and even the train they boarded. Path attributes are used to evaluate how likely a path is chosen for a passenger’s trip from their observed origin to their observed destination. The latter studies formulate probabilistic models to describe passengers’ decision-making behavior. Bayesian inference is usually used to estimate the corresponding parameters and thus derive the path choice fractions. Despite using different methods, the key components of those AFC data-based studies are similar. They all attempt to match the model-derived journey time with the observed journey time from AFC data. However, many of these studies either assume a fixed train capacity or specify a fixed link-impedance function. Mo et al. [60] point out that model-derived journey time depends on both path choices and train capacity.

Train capacity is a vague concept. Normally trains may not reach their designed physical capacity for various reasons (e.g. passengers may decide not to board due to crowding [8]). Therefore, assuming a fixed physical capacity or a fixed link-impedance function (as in many previous studies) may not be reasonable in real-world situations. Only a few studies have explored the estimation of actual train capacity in the rail system. Liu et al. [8] proposed the concept of “willingness to board” (WTB) to describe the varied capacity in a bus system, and calibrated passengers’ WTB using a least squares method. Xu and Yong [61] proposed a passenger boarding model which revealed that the number of passengers actually boarding a crowded train was closely related to the number of queuing passengers and the train load. Similarly, Mo et al. [62] proposed an effective capacity model that recognized that train capacity may vary across stations depending on the corresponding number of queuing passengers and train load. The calibration of train capacity or WTB usually requires AFC data with passengers’ boarding and journey time information. However, this information may also be affected by path choices, which were neglected in previous studies.

To fill these research gaps, we propose a simulation-based optimization (SBO) framework to calibrate path choices and train capacity simultaneously. The calibration problem is formulated as an optimization problem using AFC and AVL data. The formulation can capture the interaction among these variables and their impact on journey times. Seven optimizers (solving algorithms) from four branches of SBO solving methods are implemented for comparative analysis. They include Genetic Algorithm (GA), Simulated Annealing (SA), Nelder-Mead Simplex Algorithm (NMSA), Mesh Adaptive Direct Search (MADS), Simultaneous Perturbation Stochastic Approximation (SPSA), Bayesian Optimization (BYO) and Constrained Optimization using Response Surfaces (CORS). We compare these SBO solving algorithms within a limited computational budget, defined by the number of function evaluations. Data from the Hong Kong Mass Transit Railway (MTR) system provide the foundation for a realistic case study.

The remainder of the chapter is organized as follows: In Section 4.2, we illustrate the SBO problem formulation. Section 4.3 briefly describes the various SBO methods used in this study. The proposed framework is used in a case study with data from the Hong Kong MTR network in Section 4.4. The results are used to compare the performance of the different algorithms. Section 4.5 concludes the chapter by summarizing the main findings and discussing future research directions.

4.2 Methodology

Considering the complexity of the problem and the interaction among different variables in urban rail systems, we use a network loading model (black-box function) to capture the performance of the network for a given set of capacity and path choices. The path choice and capacity estimation model is formulated as an optimization model that attempts to minimize the error between simulated outputs (which are functions of path choices and train capacity) and the corresponding quantities observed from the AFC data.

4.2.1 Transit network loading model

Transit network loading (TNL) models aim to assign passengers over a transit network given the (dynamic) OD entry demand and path choices. In this chapter, we adopt an event-driven schedule-based TNL model proposed in Chapter 2. The model takes OD entry demand (number of tap-in passengers by time), path choices, train arrival and departure time from stations, train capacity and infrastructure information (e.g. network topology) as inputs, and outputs the passengers’ tap-out times, train loads, waiting times, and other network performance indicators of interest.

4.2.2 Problem formulation

Consider a general urban rail network in a specific time period 푇 , represented as 퐺 = (푆, 퐴), where 푆 is the set of stations and 퐴 is the set of directed links. We divide 푇 into several time intervals with equal length 휏 (e.g., 휏 = 15 min). Denote the set of all time intervals as 풯 = {1, 2, ..., 푇/휏}. Define a time-space (TS) node as 푖푚, where 푖 ∈ 푆 and 푚 ∈ 풯 . 푖푚 represents station 푖 in time interval 푚.

For an OD pair $(i, j)$ ($i, j \in S$), the OD entry flow ($q^{i_m,j}$) represents the number of passengers entering station $i$ during time interval $m$ and exiting at station $j$. Let the set of all OD entry flows be $q^e$. The OD exit flow ($q^{i,j_n}$) represents the number of passengers who exit at station $j$ in the time interval $n$ with origin $i$. $q^{i_m,j}$ and $q^{i,j_n}$ are inputs and outputs of the TNL model, respectively.

Let the set of all paths between $(i, j)$ be $\mathcal{R}(i, j)$. As discussed in Chapter 2, the path choice fraction for path $r \in \mathcal{R}(i, j)$ in time interval $m$ ($p_r^{i_m,j}$) is formulated as:

$$p_r^{i_m,j} = \frac{e^{\mu (\beta_X \cdot X_{r,m} + \beta_{CF} \cdot CF_r)}}{\sum_{r' \in \mathcal{R}(i,j)} e^{\mu (\beta_X \cdot X_{r',m} + \beta_{CF} \cdot CF_{r'})}}, \quad \forall r \in \mathcal{R}(i, j),\ m \in \mathcal{T},\ i, j \in S \tag{4.1}$$

where $\mu$ is the scale parameter of the Gumbel distribution of the error term [29], which is usually normalized to 1. Larger (smaller) $\mu$ means the choice behavior is more deterministic (random). $X_{r,m}$ is the vector of attributes for path $r$ in time interval $m$ (e.g., in-vehicle time, number of transfers, transfer walking time, etc.). $CF_r$ is the commonality factor of path $r$, which measures the degree of similarity of path $r$ with the other paths of the same OD. $\beta_X$ and $\beta_{CF}$ are the corresponding coefficients to be estimated. Let $\beta$ be the vector that combines $\beta_X$ and $\beta_{CF}$ (i.e. $\beta = [\beta_X, \beta_{CF}]$). $CF_r$ is defined in Eq. 2.2.
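To make Eq. 4.1 concrete, the sketch below computes C-logit path shares for one OD pair and one time interval. The function name, argument layout and the toy numbers in the usage note are illustrative assumptions; the commonality factors are taken as given inputs (their definition in Eq. 2.2 is not reproduced here). Subtracting the maximum utility before exponentiating is a standard numerical safeguard and does not change the shares.

```python
import math
from typing import List, Sequence


def c_logit_shares(
    attributes: Sequence[Sequence[float]],  # X_{r,m}: one attribute vector per path
    commonality: Sequence[float],           # CF_r: one value per path (from Eq. 2.2)
    beta_x: Sequence[float],
    beta_cf: float,
    mu: float = 1.0,
) -> List[float]:
    """Path choice fractions of Eq. 4.1 for the paths of a single OD pair."""
    utilities = [
        mu * (sum(b * x for b, x in zip(beta_x, x_r)) + beta_cf * cf_r)
        for x_r, cf_r in zip(attributes, commonality)
    ]
    u_max = max(utilities)  # shift for numerical stability; shares are unchanged
    weights = [math.exp(u - u_max) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]
```

For instance, with two paths that differ only in in-vehicle time (10 versus 20 minutes) and a negative in-vehicle time coefficient, the shorter path receives the larger share, and the shares sum to one.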

The values of 훽 can be bounded from above and below. The boundaries can be obtained from the prior knowledge and previous survey results. Denote the upper bound as 푈훽 and lower bound as 퐿훽 (퐿훽 ≤ 훽 ≤ 푈훽), where 푈훽 and 퐿훽 are both vectors with the same cardinality as 훽.

As discussed in Chapter 2, the actual train capacity utilized by passengers is determined by three factors: a) waiting passenger distribution on the platform, b) train load and distribution across the train, and c) passengers’ willingness to board a crowded train. Thus, train capacity is not constant. Instead, it is dynamic and changes across stations and trains depending on the crowding level of the train and the platform. The capacity of train $k$ at station $i$ ($C_{k,i}$) is formulated in Eq. 2.3. $\theta_0$, $\theta_1$, and $\theta_2$ are parameters to be calibrated ($\theta_0, \theta_1, \theta_2 > 0$).

In the discussion that follows, let $\theta$ be the vector of these three parameters. We assume that these parameters take values between $L_\theta$ and $U_\theta$ ($L_\theta \le \theta \le U_\theta$), where $L_\theta$ and $U_\theta$ are the corresponding lower and upper bounds, respectively.

The goal is to calibrate 휃 and 훽 vectors (used by the TNL model) based on indirect observations. Two sets of observations are used for the calibration: observed OD exit flows and observed journey time distribution (JTD). Both of them can be obtained from the AFC data.

Let the ground truth (observed) OD exit flow be $\tilde{q}^{i,j_n}$. Let $f_{i,j_n}(x)$ be the model-derived JTD of passengers with origin $i$ who exit at station $j$ during time interval $n$. Let $\tilde{f}_{i,j_n}(x)$ be the corresponding observed JTD extracted from the AFC data. Since $f_{i,j_n}(x)$ and $\tilde{f}_{i,j_n}(x)$ are estimated from passengers’ journey time observations, only the OD pairs with more than $E$ passengers exiting in a specific time interval are considered, where $E$ is a predetermined threshold to ensure enough sample size. Denote the set of corresponding OD pairs and exit time intervals as $\mathcal{E}$, where $\mathcal{E} = \{(i, j_n) : \tilde{q}^{i,j_n}, q^{i,j_n} > E,\ \forall i, j \in S,\ n \in \mathcal{T}\}$.

The calibration problem is formulated as an optimization problem:

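$$\min_{\beta,\, \theta} \; w_1 \sum_{i,j \in S,\ n \in \mathcal{T}} \left( q^{i,j_n} - \tilde{q}^{i,j_n} \right)^2 + w_2 \sum_{(i,j_n) \in \mathcal{E}} D_{\mathrm{KL}}\!\left( f_{i,j_n} \,\|\, \tilde{f}_{i,j_n} \right) \tag{4.2a}$$

$$\text{s.t.} \quad q^{i,j_n} = \mathrm{TNL}(p, q^e, \theta) \quad \forall i, j \in S,\ n \in \mathcal{T}, \tag{4.2b}$$

$$f_{i,j_n}(x) = \mathrm{TNL}(p, q^e, \theta) \quad \forall (i, j_n) \in \mathcal{E}, \tag{4.2c}$$

$$p_r^{i_m,j} = \frac{e^{\mu (\beta_X \cdot X_{r,m} + \beta_{CF} \cdot CF_r)}}{\sum_{r' \in \mathcal{R}(i,j)} e^{\mu (\beta_X \cdot X_{r',m} + \beta_{CF} \cdot CF_{r'})}} \quad \forall p_r^{i_m,j} \in p, \tag{4.2d}$$

$$L_\beta \le \beta \le U_\beta, \tag{4.2e}$$

$$L_\theta \le \theta \le U_\theta \tag{4.2f}$$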

The objective function (Eq. 4.2a) has two parts: the square error between model-derived OD exit flows and the corresponding observations, and the difference between model-derived and observed JTD. $w_1$ and $w_2$ are weights used to balance the scale and the importance of the two parts. The difference of the two distributions is expressed using Kullback-Leibler (KL) divergence ($D_{\mathrm{KL}}$):

$$D_{\mathrm{KL}}\!\left( f_{i,j_n} \,\|\, \tilde{f}_{i,j_n} \right) = \int_x f_{i,j_n}(x) \cdot \log \frac{f_{i,j_n}(x)}{\tilde{f}_{i,j_n}(x)} \, \mathrm{d}x. \tag{4.3}$$
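In practice the journey time distributions are estimated from samples, so the integral in Eq. 4.3 has to be approximated. The sketch below uses a simple histogram discretization with equal-width bins; this discretization and the small floor `eps` that guards against empty observed bins are assumptions made for illustration and are not taken from the thesis.

```python
import math
from typing import Sequence


def kl_divergence(
    f_model: Sequence[float],  # model-derived density values, one per bin
    f_obs: Sequence[float],    # observed density values on the same bins
    bin_width: float,
    eps: float = 1e-12,        # assumed floor to avoid division by zero
) -> float:
    """Riemann-sum approximation of D_KL(f_model || f_obs) from Eq. 4.3."""
    total = 0.0
    for f, g in zip(f_model, f_obs):
        if f > 0.0:  # bins with zero model density contribute nothing
            total += f * math.log(f / max(g, eps)) * bin_width
    return total
```

The divergence is zero when the two histograms coincide and grows as the model-derived distribution places mass where few journeys are observed. Note that it is not symmetric in its two arguments, so the order used in Eq. 4.3 matters.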

TNL(푝, 푞푒, 휃) is the black-box function that corresponds to the TNL model, which can output the model-derived OD exit flows and JTD for a given set of path choices and train capacity. Since the TNL model has no analytic form, Eq. 4.2 is an SBO problem with upper and lower bound constraints. In the following section, we discuss seven different algorithms appropriate for the solution of SBO problems. These algorithms belong to four general approaches of SBO solving methods.

4.3 Simulation-based Optimization Algorithms

There are four major classes of methods for solving SBO problems: heuristic methods, direct search methods, gradient-based methods, and response surface methods [56, 57]. Heuristic methods are partial search algorithms that may provide a sufficiently good solution to an optimization problem, especially with incomplete or imperfect information or limited computation capacity. Direct search methods are derivative-free methods that are based on the sequential examination of trial points generated by a certain strategy. They are attractive as they are easy to describe and implement. More importantly, they are suitable for objective functions where gradients do not exist everywhere. Gradient-based approaches (or stochastic approximation methods) attempt to optimize the objective function using estimated gradient information. These methods aim to imitate the steepest descent methods in derivative-based optimization. Finite difference schemes can be used to estimate gradients but they may involve a large number of expensive function evaluations if the number of decision variables is large. Response surface methods are useful in the context of continuous optimization problems. They focus on learning input-output relationships to approximate the underlying simulation by a pre-defined functional form (also known as a meta-model or surrogate model). This functional form can then be used for optimization leveraging powerful derivative-based optimization techniques.

In this study, we use seven representative algorithms belonging to these four classes of SBO methods to address the aforementioned path choice and train capacity calibration problem. Table 4.1 summarizes the main characteristics of these algorithms.

In the discussion that follows, let $\Theta$ be the combined vector of $\beta$ and $\theta$ (i.e., $\Theta = [\beta, \theta]$ is the vector of all coefficients to be calibrated). Let $N$ be the dimension of $\Theta$ (i.e. $\Theta \in \mathbb{R}^N$).
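The cost of finite-difference gradients mentioned above can be illustrated with a small sketch. A central-difference estimate needs two objective evaluations per decision variable, i.e. 2N simulation runs for a single gradient. The helper class and the quadratic test function are illustrative assumptions standing in for the (expensive) TNL-based objective.

```python
from typing import Callable, List, Sequence


def central_difference_gradient(
    objective: Callable[[List[float]], float],
    theta: Sequence[float],
    step: float = 1e-3,
) -> List[float]:
    """Estimate the gradient with two objective evaluations per coordinate."""
    grad = []
    for n in range(len(theta)):
        up, down = list(theta), list(theta)
        up[n] += step
        down[n] -= step
        grad.append((objective(up) - objective(down)) / (2.0 * step))
    return grad


class CountingObjective:
    """Wraps a function and counts how many times it is evaluated."""

    def __init__(self, func: Callable[[List[float]], float]) -> None:
        self.func = func
        self.calls = 0

    def __call__(self, x: List[float]) -> float:
        self.calls += 1
        return self.func(x)
```

With a seven-dimensional vector (a purely illustrative size), one gradient estimate already costs 14 evaluations, which is a large share of a budget defined by the number of function evaluations.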

Table 4.1: Algorithms Summary

| Type | Algorithm | Constraints | Stochastic | Source |
|---|---|---|---|---|
| Heuristic method | Genetic Algorithm (GA) | Yes | Yes | [63] |
| Heuristic method | Simulated Annealing (SA) | Yes | Yes | [64] |
| Direct search | Nelder-Mead Simplex Algorithm (NMSA) | No | No | [65] |
| Direct search | Mesh Adaptive Direct Search (MADS) | Yes | Yes | [66] |
| Gradient-based | Simultaneous Perturbation Stochastic Approximation (SPSA) | Yes | Yes | [67] |
| Response surface | Bayesian Optimization (BYO) | Yes | Yes | [31] |
| Response surface | Constrained Optimization using Response Surfaces (CORS) | Yes | Yes | [59] |

4.3.1 Genetic Algorithm (GA)

GA is a heuristic method for solving both constrained and unconstrained optimization problems. It belongs to the larger class of evolutionary algorithms inspired by natural selection, the process that drives biological evolution. The GA repeatedly modifies a population of individual solutions as an evolution process [68]. It can be used to solve a variety of optimization problems that are not well suited for standard optimization algorithms, such as the SBO problem, where the objective function (or constraints) is nondifferentiable and highly nonlinear.

The evolution starts from a population of randomly generated individuals and proceeds iteratively, with the population in each iteration called a generation. In each generation, the genetic algorithm selects individuals at random from the current population to be parents and uses them to produce the children for the next generation. Over successive generations, the population “evolves” toward an optimal solution. The genetic algorithm uses three main procedures at each step to create the next generation from the current population: 1) Selection: select the individuals, called parents, that contribute to the population at the next generation. Individuals with better objective function values are more likely to be selected. 2) Crossover: combine two parents to form children for the next generation. 3) Mutation: apply random changes to individual parents to form children.

In this study, we adopt blend crossover and Gaussian mutation. The probability of crossover is set as 0.8 and the probability of mutation is set as 0.4. The population size is set as 6 given the limited computational budget. The algorithm is implemented with the Python deap package [63].
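The selection, crossover, and mutation steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the deap-based implementation used in the study: the binary tournament selection, the blend parameter `alpha`, the mutation step size, the number of generations, and the toy quadratic objective are all hypothetical choices. Only the crossover probability (0.8), mutation probability (0.4), and population size (6) come from the text.

```python
import random


def genetic_algorithm(objective, bounds, pop_size=6, n_gen=15,
                      cx_prob=0.8, mut_prob=0.4, alpha=0.5, seed=0):
    """Minimize `objective` over the box `bounds` with a small real-coded GA."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    def tournament(pop, fit):
        # Assumed selection rule: the better of two random individuals is a parent.
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return list(pop[i] if fit[i] <= fit[j] else pop[j])

    def blend(p1, p2):
        # Blend crossover: each child gene is drawn around the parents' genes.
        c1, c2 = [], []
        for a, b in zip(p1, p2):
            g = (1 + 2 * alpha) * rng.random() - alpha
            c1.append((1 - g) * a + g * b)
            c2.append(g * a + (1 - g) * b)
        return c1, c2

    def mutate(x):
        # Gaussian mutation; the 10%-of-range step size is a hypothetical choice.
        return [v + rng.gauss(0.0, 0.1 * (hi - lo)) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    best_f, best_x = min(zip(fit, pop))
    for _ in range(n_gen):
        children = []
        while len(children) < pop_size:
            c1, c2 = tournament(pop, fit), tournament(pop, fit)
            if rng.random() < cx_prob:
                c1, c2 = blend(c1, c2)
            for c in (c1, c2):
                if rng.random() < mut_prob:
                    c = mutate(c)
                children.append(clip(c))
        pop = children[:pop_size]
        fit = [objective(x) for x in pop]
        gen_f, gen_x = min(zip(fit, pop))
        if gen_f < best_f:
            best_f, best_x = gen_f, gen_x
    return best_x, best_f


# Toy stand-in for the simulation-based objective (hypothetical).
ga_target = [-0.147, -1.271]
ga_bounds = [(-2.0, 0.0), (-5.0, 0.0)]


def ga_toy(theta):
    return sum((t - s) ** 2 for t, s in zip(theta, ga_target))


ga_theta, ga_value = genetic_algorithm(ga_toy, ga_bounds)
```

With a population of 6 and 15 generations after the initial one, the sketch uses 96 objective evaluations, which stays within a budget of 100 evaluations.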

4.3.2 Simulated Annealing (SA)

SA is a heuristic method for solving optimization problems [69]. The method is based on the physical process of heating a material and then slowly lowering the temperature to decrease defects, thus minimizing the system energy. At each iteration of the SA algorithm, a new point is randomly generated. The distance of the new point from the current point, or the extent of the search, is based on a probability distribution with a scale proportional to the temperature. A distorted Cauchy-Lorentz visiting distribution is used in this study [64]. The algorithm accepts all new points that lower the objective function, but also, with a certain probability, points that raise the objective function. By accepting points that raise the objective function, the algorithm avoids being trapped in local minima. An annealing schedule is selected to systematically decrease the temperature as the algorithm proceeds. As the temperature decreases, the algorithm reduces the extent of its search to converge to a minimum.

In this study, the SA algorithm in the Python SciPy package is adopted for the implementation, with all model parameters set as default [70].
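The acceptance rule and the temperature-dependent search extent described above can be sketched as follows. This is a simplified illustration, not the SciPy routine used in the study: it draws Gaussian proposals instead of the distorted Cauchy-Lorentz visiting distribution, and the cooling schedule `t0 / k`, the initial temperature, and the toy objective are hypothetical.

```python
import math
import random


def simulated_annealing(objective, x0, bounds, n_iter=100, t0=1.0, seed=0):
    """Minimize `objective` inside `bounds` with a basic annealing loop."""
    rng = random.Random(seed)
    x, fx = list(x0), objective(x0)
    best_x, best_f = list(x), fx
    for k in range(1, n_iter):
        temp = t0 / k  # hypothetical annealing schedule
        # Proposal whose scale is proportional to the temperature, clipped to the bounds.
        cand = [min(max(v + rng.gauss(0.0, temp * (hi - lo)), lo), hi)
                for v, (lo, hi) in zip(x, bounds)]
        fc = objective(cand)
        # Always accept improvements; accept worse points with a
        # probability that shrinks as the temperature decreases.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
    return best_x, best_f


# Toy stand-in for the simulation-based objective (hypothetical).
sa_bounds = [(-2.0, 0.0), (-5.0, 0.0)]
sa_start = [-1.0, -2.5]


def sa_toy(theta):
    return (theta[0] + 0.147) ** 2 + (theta[1] + 1.271) ** 2


sa_theta, sa_value = simulated_annealing(sa_toy, sa_start, sa_bounds)
```

Counting the evaluation of the starting point, the loop above uses exactly 100 objective evaluations.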

4.3.3 Nelder-Mead Simplex Algorithm (NMSA)

NMSA is a simplex method for finding a local minimum [71]. NMSA in 푁 dimensions maintains a set of 푁 + 1 test points arranged as a simplex. Denote the initial value of Θ as Θ^ini. The initial simplex set (푁 + 1 points) is generated as {Θ : Θ = Θ^ini + 휎푒푖, ∀푖 = 1, ..., 푁} ∪ {Θ^ini}, where 푒푖 ∈ R^푁 is the unit vector in the 푖-th coordinate and 휎 is the step size, which is set as 0.05 in this study [65].

Based on the initial simplex, the algorithm evaluates the objective function at each test point in order to find a new test point to replace one of the old ones. The new candidate can be generated through simplex centroid reflections, contractions, or other means, depending on the function values at the test points. The process generates a sequence of simplexes for which the function values at the vertices get smaller and smaller. The size of the simplex is reduced and, finally, the coordinates of the minimum point are found.

Four possible operations (reflection, expansion, contraction, and shrink) are associated with the corresponding scalar parameters 훼1 (reflection), 훼2 (expansion), 훼3 (contraction), and 훼4 (shrink). In this study, we set these parameters as {훼1, 훼2, 훼3, 훼4} = {1, 2, 0.5, 0.5}, as suggested in Gao and Han [65]. The algorithm is implemented by the Python scikit-learn package with all other parameters set as default. Since NMSA is designed for unconstrained problems, we turned the bounds on Θ into a large penalty term in the objective function for this algorithm. More details regarding the NMSA can be found in Gao and Han [65].
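The initial simplex construction and the bound penalty described above can be sketched as follows. This is an illustration only: the function names, the linear form of the penalty, and its weight are hypothetical, since the text only states that the bounds are turned into a large penalty term.

```python
def initial_simplex(theta_ini, sigma=0.05):
    """Return the N + 1 vertices: theta_ini plus one step of sigma per coordinate."""
    simplex = [list(theta_ini)]
    for i in range(len(theta_ini)):
        vertex = list(theta_ini)
        vertex[i] += sigma  # theta_ini + sigma * e_i
        simplex.append(vertex)
    return simplex


def penalized(objective, bounds, penalty=1e9):
    """Wrap `objective` so that bound violations add a large penalty (assumed form)."""
    def wrapped(theta):
        violation = sum(max(lo - v, 0.0) + max(v - hi, 0.0)
                        for v, (lo, hi) in zip(theta, bounds))
        return objective(theta) + penalty * violation
    return wrapped


# Toy usage with two parameters (hypothetical values).
nm_bounds = [(-2.0, 0.0), (-5.0, 0.0)]
nm_start = [-1.0, -2.5]
nm_simplex = initial_simplex(nm_start)


def nm_toy(theta):
    return sum(v * v for v in theta)


nm_objective = penalized(nm_toy, nm_bounds)
```

Inside the bounds the wrapped objective equals the original one, so an unconstrained simplex search is only steered away from points that violate a bound.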

4.3.4 Mesh Adaptive Direct Search (MADS)

The MADS algorithm is a directional direct search framework for nonlinear optimization [72]. It seeks to improve the current solution by testing points in the neighborhood of the current point (the incumbent). The neighborhood points are generated by moving one step in each direction from the incumbent on an iteration-dependent mesh. Each iteration of MADS consists of a SEARCH stage and an optional POLL stage. The SEARCH stage evaluates a finite number of points proposed by the searching strategy (e.g., moving one step around from the current point). Whenever the SEARCH step fails to generate an improved mesh point, the POLL step is invoked. The POLL step conducts local exploration near the current incumbent, which also intends to find an improved point on the mesh. Once an improved point is found, the algorithm updates the current point and constructs a new mesh. According to [72], the mesh size parameters approach zero as the number of iterations approaches infinity, which demonstrates the convergence of the MADS algorithm.

In this chapter, we use a variant of the MADS method called ORTHO-MADS, which leverages a special orthogonal positive spanning set of polling directions. More details regarding the algorithm can be found in Abramson et al. [66]. NOMAD 3.9.1 [73] with the Python interface is used for the MADS algorithm application. The hyper-parameters are tuned based on the NOMAD user guide. The direction type is set as orthogonal, with 푁 + 1 directions generated at each poll. Latin Hypercube search is not applied.
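The idea of polling around the incumbent on a mesh that is refined after unsuccessful iterations can be illustrated with a much simpler coordinate poll. The sketch below is not ORTHO-MADS and does not use NOMAD: it polls the 2푁 positive and negative coordinate directions instead of 푁 + 1 orthogonal directions, has no SEARCH stage, and its mesh update rule (halve on failure, keep on success), initial mesh size, and toy objective are hypothetical.

```python
def poll_search(objective, x0, bounds, mesh=0.25, min_mesh=1e-3, max_evals=100):
    """Simplified coordinate poll with mesh refinement (in the spirit of MADS)."""
    x, fx = list(x0), objective(x0)
    evals = 1
    while mesh > min_mesh and evals < max_evals:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for sign in (1.0, -1.0):
                if evals >= max_evals:
                    break
                cand = list(x)
                # One mesh step along coordinate i, kept inside the bounds.
                cand[i] = min(max(x[i] + sign * mesh * (hi - lo), lo), hi)
                if cand == x:
                    continue  # the step was clipped back onto the incumbent
                fc = objective(cand)
                evals += 1
                if fc < fx:
                    # Improved point found: it becomes the new incumbent.
                    x, fx, improved = cand, fc, True
                    break
            if improved:
                break
        if not improved:
            mesh /= 2.0  # unsuccessful poll: refine the mesh
    return x, fx


# Toy stand-in for the simulation-based objective (hypothetical).
ps_bounds = [(-2.0, 0.0), (-5.0, 0.0)]
ps_start = [-1.0, -2.5]


def ps_toy(theta):
    return (theta[0] + 0.147) ** 2 + (theta[1] + 1.271) ** 2


ps_theta, ps_value = poll_search(ps_toy, ps_start, ps_bounds)
```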

4.3.5 Simultaneous Perturbation Stochastic Approximation (SPSA)

SPSA is a descent direction method for finding a local minimum. It approximates the gradient with only two measurements of the objective function, regardless of the dimension of the optimization problem. Denote the objective function in Eq. 4.2 as 푍(Θ) and the estimated parameters in the 푘-th iteration as Θ^(푘). One iteration of SPSA is then performed as

$$\Theta^{(k+1)} = \Theta^{(k)} - a_k \cdot \hat{\nabla} Z(\Theta^{(k)}) \tag{4.4}$$

where

$$\hat{\nabla} Z(\Theta^{(k)}) = \frac{Z(\Theta^{(k)} + c_k \Delta_k) - Z(\Theta^{(k)} - c_k \Delta_k)}{2 c_k \Delta_k} \tag{4.5}$$

$$a_k = \frac{a}{(k + 1 + A)^{\alpha}} \tag{4.6}$$

$$c_k = \frac{c}{(k + 1)^{\gamma}} \tag{4.7}$$

∆푘 is a random perturbation vector whose elements are obtained from a Bernoulli distribution with the probability parameter equal to 0.5 (each element taking the value +1 or −1, as is standard for SPSA), and the division by ∆푘 in Eq. 4.5 is element-wise. {훼, 훾, 푎, 푐, 퐴} are tuned as {0.602, 0.101, 0.001, 0.007, 0.1푀} in this study according to the numerical tests and guidelines from prior empirical studies [74]. 푀 is the maximum number of iterations.
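Equations 4.4 to 4.7 translate almost line by line into code. The sketch below is a minimal illustration that assumes the perturbation elements take the values +1 or −1 with equal probability and ignores the parameter bounds. The default gains are the values reported above, while the toy objective and the larger gains used in the example call are hypothetical and chosen only so that the toy problem makes visible progress.

```python
import random


def spsa(objective, theta0, n_iter, a=0.001, c=0.007,
         alpha=0.602, gamma=0.101, seed=0):
    """Run n_iter SPSA iterations (Eqs. 4.4-4.7) starting from theta0."""
    rng = random.Random(seed)
    big_a = 0.1 * n_iter  # A = 0.1 M, with M the maximum number of iterations
    theta = list(theta0)
    for k in range(n_iter):
        a_k = a / (k + 1 + big_a) ** alpha   # Eq. 4.6
        c_k = c / (k + 1) ** gamma           # Eq. 4.7
        # Assumed symmetric Bernoulli perturbation: each element is +1 or -1.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = objective(plus) - objective(minus)  # only two evaluations
        grad = [diff / (2.0 * c_k * d) for d in delta]       # Eq. 4.5
        theta = [t - a_k * g for t, g in zip(theta, grad)]   # Eq. 4.4
    return theta


# Toy stand-in for Z (hypothetical), with gains scaled for this toy problem.
def spsa_toy(theta):
    return sum((t - 1.0) ** 2 for t in theta)


spsa_start = [0.0, 0.0]
spsa_theta = spsa(spsa_toy, spsa_start, n_iter=200, a=0.5, c=0.1)
```

Each iteration costs two objective evaluations regardless of the number of parameters, so a budget of 100 evaluations corresponds to 50 iterations.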

4.3.6 Bayesian Optimization (BYO)

BYO constructs a probabilistic model of the objective function and exploits this model to determine where to evaluate the objective function for the next step. The philosophy of BYO is to use all of the information available from previous evaluations, instead of simply relying on the local gradient and Hessian approximations. This enables BYO to find the minimum of difficult non-convex functions with relatively few function evaluations.

BYO assumes a prior distribution for the objective function values and uses an acquisition function to determine the next point to evaluate. In this study, we use the Gaussian process as the prior distribution for the objective function due to its flexibility and tractability. For the acquisition function, we tested three common criteria: probability of improvement (POI), expected improvement (EI), and upper confidence bound (UCB) [31]. The EI criterion is used because it performed best in our problem. The BYO is implemented in Python with the bayes_opt package. More details regarding the BYO can be found in Snoek et al. [31].
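For reference, the EI criterion has a closed form when the posterior at a candidate point is Gaussian. The sketch below writes it for a minimization problem, given a posterior mean `mu`, a posterior standard deviation `sigma`, and the best objective value observed so far `f_best`. The function name and arguments are hypothetical, and the sketch does not reproduce how the bayes_opt package computes its acquisition values internally.

```python
import math


def expected_improvement(mu, sigma, f_best):
    """Expected improvement over f_best for minimization under N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (f_best - mu) * cdf + sigma * pdf


# A point predicted to be better than the incumbent has a larger EI than a
# point predicted to be worse, given the same uncertainty.
ei_better = expected_improvement(mu=0.8, sigma=0.2, f_best=1.0)
ei_worse = expected_improvement(mu=1.2, sigma=0.2, f_best=1.0)
```

The first term rewards candidates whose predicted value is low (exploitation) and the second rewards candidates with high posterior uncertainty (exploration).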

4.3.7 Constrained Optimization using Response Surfaces (CORS)

CORS is a response surface method for global optimization. In each iteration, it updates the response surface model based on all previously probed points and selects the next point to evaluate. The principles for next point selection are (a) finding new points that have lower objective function value, and (b) improving the fitting of the response surface model by sampling feasible regions where little information exists. Hence, the next point is selected by solving the minimization problem of the current response surface function subject to constraints that the next point should be more than a certain distance away from all previous points [59].

An algorithm following the CORS framework requires two components: (a) a scheme for selecting an initial set of points for objective function evaluation and (b) a procedure for globally approximating the objective function (i.e., a response surface model). In this study, the initial sampling is conducted using the Latin hypercube method, with the number of initial samples equal to 0.2 × the total number of function evaluations allowed. The radial basis function (RBF) is used as the response surface model. For the subsequent sampling, a modified version of the CORS algorithm with space re-scaling is used. Details about the algorithm can be found in Regis and Shoemaker [59] and Knysh and Korkolis [75].
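The two ingredients described above, Latin hypercube initial sampling and a distance-constrained choice of the next point, can be sketched as follows. This is a simplified illustration, not the modified CORS algorithm used in the study: the surrogate is passed in as an arbitrary callable rather than a fitted radial basis function, the next point is picked from a finite candidate list instead of solving a continuous constrained minimization, and all names and numbers in the example are hypothetical.

```python
import math
import random


def latin_hypercube(n_points, bounds, rng):
    """Draw n_points so that every dimension has one point per equal-width stratum."""
    columns = []
    for lo, hi in bounds:
        col = [lo + (hi - lo) * (i + rng.random()) / n_points
               for i in range(n_points)]
        rng.shuffle(col)  # decouple the strata across dimensions
        columns.append(col)
    return [list(p) for p in zip(*columns)]


def select_next(surrogate, evaluated, candidates, min_dist):
    """Pick the candidate with the lowest surrogate value that is at least
    min_dist away from every previously evaluated point."""
    feasible = [c for c in candidates
                if all(math.dist(c, p) >= min_dist for p in evaluated)]
    return min(feasible, key=surrogate) if feasible else None


cors_rng = random.Random(0)
cors_bounds = [(-2.0, 0.0), (-5.0, 0.0)]
n_initial = 20  # 0.2 x a budget of 100 function evaluations
cors_initial = latin_hypercube(n_initial, cors_bounds, cors_rng)


# Hypothetical surrogate standing in for a fitted response surface.
def cors_surrogate(theta):
    return sum(v * v for v in theta)


cors_next = select_next(cors_surrogate, [[0.0, 0.0]],
                        [[-0.05, 0.0], [-0.5, -0.5], [-1.0, -1.0]], min_dist=0.2)
```

In the example call, the candidate closest to the already evaluated point has the lowest surrogate value but is rejected by the distance constraint, which is how the method keeps sampling regions where little information exists.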

4.4 Case Study

The proposed modeling framework is tested using data from the Hong Kong MTR network.

4.4.1 Experimental design

We use AFC data on a typical weekday afternoon peak period (18:00-19:00) in March 2017 for the model application. Li [18] conducted a revealed-preference (RP) path choice survey of more than 20,000 passengers in the MTR system and used the responses to estimate a path choice model. The estimation results are shown in Appendix A. The following attributes were used in the specification of the model: (a) total in-vehicle time, (b) the number of transfers, (c) relative walking time (total walking time divided by total path distance), and (d) the commonality factor (Eq. 2.2).

As real-world path choice information and train capacity are usually unavailable, we validate the models with synthetic data. To generate the synthetic data, we first extract the OD entry flow (푞푖푚,푗) from the real-world AFC records. We then assume a synthetic Θ as the “true” path choice and train capacity parameters. The TNL model, with the true OD entry flow, the train timetable, and the synthetic Θ as inputs, is used to simulate the travel of passengers in the system and record their tap-in and tap-out times. The input timetable is treated as the synthetic AVL data, and the resulting tap-in and tap-out times are treated as the synthetic AFC data. The synthetic data, including the “true” passenger path choices and train capacity, are used to evaluate the performance of the model under the various solution algorithms.

To compare the different SBO algorithms, we design five test scenarios, summarized in Table 4.2. Each scenario has a different synthetic Θ, and the selection of the synthetic Θ represents different assumptions about passengers’ choice behavior and sensitivity to crowding. For the reference scenario, we use the path choice parameters in Table A.1 as the synthetic 훽 and the calibrated train capacity parameters from Chapter 2 as the synthetic 휃.

Table 4.2: Scenario design

| Parameter category | Synthetic Θ | Reference | Random (path choice) | Deterministic (path choice) | Crowding-sensitive (train capacity) | Crowding-insensitive (train capacity) | Bound |
|---|---|---|---|---|---|---|---|
| Path choice | In-vehicle time | -0.147 | 0 | -2.0 | -0.147 | -0.147 | [-2, 0] |
| Path choice | Relative walking time | -1.271 | 0 | -5.0 | -1.271 | -1.271 | [-5, 0] |
| Path choice | Number of transfers | -0.573 | 0 | -3.0 | -0.573 | -0.573 | [-3, 0] |
| Path choice | Commonality factor | -3.679 | 0 | -10.0 | -3.679 | -3.679 | [-10, 0] |
| Train capacity | 휃0 | 232 | 232 | 232 | 225 | 235 | [220, 260] |
| Train capacity | 휃1 | 0.0732 | 0.0732 | 0.0732 | 0.2 | 0 | [0, 0.2] |
| Train capacity | 휃2 | 0.0607 | 0.0607 | 0.0607 | 0.2 | 0 | [0, 0.2] |

In the two path choice scenarios, passengers’ actual path choice behavior is assumed to be either random (each path is equally likely to be selected) or deterministic. For the random path choice scenario, we set all synthetic choice parameters to 0, which means all available paths are equally likely to be chosen. For the deterministic¹ path choice scenario, we set all synthetic choice parameters to their lower bounds (i.e., the maximum absolute values possible). Under this scenario, a slight difference in attributes between two paths can lead to a large difference in choice probability (i.e., this is close to passengers following the shortest path). As for the train capacity, the synthetic 휃 for these two scenarios is the same as in the reference scenario.

Passengers’ sensitivity to crowding may also vary. If passengers are not sensitive to crowding, train capacity can be modeled as a fixed value. However, if passengers are more sensitive to crowding, the actual train capacity may largely depend on the crowding level in the train and on the platform. Therefore, passengers’ sensitivity to crowding can be reflected by the scale of 휃1 and 휃2 [62]. For the crowding-sensitive scenario, we set the synthetic train capacity parameters as 휃0 = 225, 휃1 = 0.2, 휃2 = 0.2. Compared to the reference scenario, 휃1 and 휃2 are higher to represent higher sensitivity, and 휃0 is decreased to offset the capacity increase caused by the larger 휃1 and 휃2. For the crowding-insensitive scenario, we set the synthetic train capacity parameters as 휃0 = 235, 휃1 = 0, 휃2 = 0, which can be seen as a fixed-capacity model.
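The effect of the path choice scenarios on choice randomness can be illustrated with a plain multinomial logit over two hypothetical paths. The sketch below is an assumption-laden simplification: it uses in-vehicle time as the only attribute, omits the commonality factor and the other attributes of the actual path choice model, and the two travel times are made up. The three coefficient values are the in-vehicle time entries of the random, reference, and deterministic scenarios in Table 4.2.

```python
import math


def logit_probabilities(attributes, beta):
    """Multinomial logit choice probabilities for paths with given attributes."""
    utilities = [sum(b * x for b, x in zip(beta, attrs)) for attrs in attributes]
    u_max = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - u_max) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical paths described only by in-vehicle time (minutes).
paths = [[20.0], [25.0]]
p_random = logit_probabilities(paths, [0.0])       # random scenario
p_reference = logit_probabilities(paths, [-0.147])  # reference scenario
p_determin = logit_probabilities(paths, [-2.0])     # deterministic scenario
```

With a zero coefficient both paths receive probability 0.5, while the lower-bound coefficient puts almost all of the probability on the shorter path, which is why that scenario is described as close to shortest-path behavior.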

4.4.2 Case study settings

The lower and upper bounds of all parameters (퐿훽, 푈훽, 퐿휃, 푈휃) are shown in Table 4.2. The initial value Θ^ini is set as (퐿Θ + 푈Θ)/2 for all scenarios. To compare different algorithms, a fixed computational budget of 100 function evaluations is applied to all algorithms. The weights 푤1 = 1 and 푤2 = 600 are used based on numerical tests. All algorithms except NMSA (a deterministic algorithm) are replicated 5 times (with different random seeds) to decrease the impact of randomness.

¹ The word “deterministic” here only indicates that the degree of randomness is low. A “truly” deterministic choice corresponds to all parameters going to −∞.

4.4.3 Reference scenario results

The convergence results of the reference scenario are depicted in Figure 4-1. Each point represents the average value over all replications. The performance of the different algorithms varies. Given the limited number of function evaluations, CORS, BYO, and SPSA converge to relatively small objective function values, whereas GA, MADS, and SA have relatively large objective function values upon termination. In terms of convergence speed, the response surface methods (BYO and CORS) converge fastest. They also reach the lowest objective function value. This is consistent with conclusions regarding the performance of SBO algorithms in the transportation domain [56, 58, 20].

Figure 4-1 also summarizes the stability of the algorithms. The vertical error bars are proportional to the standard deviation over the five replications (see the figure caption for the scaling). NMSA is a deterministic algorithm and is not affected by randomness. BYO and CORS show high randomness in the first half of the iterations. However, as the number of function evaluations increases, the standard deviation of the objective function decreases and the results become stable. GA, SA, and MADS are unstable compared to the other algorithms. This suggests that the heuristic algorithms (GA and SA) are not well suited to the calibration problem studied in this chapter. The instability of MADS may arise because it can converge to non-stationary points [76].

Figure 4-1: Convergence results of the reference scenario. The error bar indicates 1/4 × standard deviation. NMSA has no error bar because it is a deterministic algorithm.

Table 4.3 compares the parameters estimated by the different algorithms with the synthetic ones. Although some algorithms reach similar objective function values, they result in different estimated parameters. For example, CORS and SPSA have similar objective function values, but SPSA performs better in path choice estimation while CORS performs better in train capacity estimation. We also observe that the train capacity parameters are relatively harder to estimate. This may be because most of the stations in the rail system are not congested and all passengers can board the trains. Thus, the objective function is not very sensitive to the train capacity parameters. Future research can assign higher weights to OD exit flows related to the congested stations to increase the sensitivity of the objective function to the train capacity parameters.

Table 4.3: Calibration results of the reference scenario

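| Category | Variable name | “True” | GA | SA | NMSA | MADS | SPSA | BYO | CORS |
|---|---|---|---|---|---|---|---|---|---|
| Path choice | In-vehicle time | -0.147 | -0.392 | -0.327 | -0.342 | -0.454 | -0.170 | -0.207 | -0.229 |
| Path choice | Relative walking time | -1.271 | -2.205 | -3.010 | -3.020 | -0.302 | -2.257 | -2.493 | -2.486 |
| Path choice | Number of transfers | -0.573 | -1.143 | -0.787 | -0.389 | -1.248 | -0.598 | -0.776 | -0.756 |
| Path choice | Commonality factor | -3.679 | -6.482 | -6.851 | -7.250 | -7.834 | -4.419 | -5.434 | -5.716 |
| Train capacity | 휃0 | 232 | 239 | 243 | 259 | 252 | 241 | 234 | 243 |
| Train capacity | 휃1 | 0.073 | 0.117 | 0.118 | 0.146 | 0.040 | 0.162 | 0.110 | 0.069 |
| Train capacity | 휃2 | 0.061 | 0.069 | 0.110 | 0.080 | 0.080 | 0.163 | 0.100 | 0.086 |
| Objective function | - | - | 676,392 | 416,923 | 359,663 | 773,526 | 245,269 | 258,688 | 203,885 |

The columns GA through CORS report the estimated parameters.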

4.4.4 Sensitivity analysis

Impact of randomness in path choice behavior

Figure 4-2 shows the estimation results for the two path choice related scenarios: random and deterministic. The estimated parameters are shown in Tables 4.4 and 4.5. For the random scenario, all “true” (synthetic) path choice parameters are set to zero, which means all paths are equally likely to be chosen. We observe that, in this scenario (Figure 4-2a), the CORS and SA algorithms perform the best, with the lowest objective function values. Compared to the reference scenario in Section 4.4.3, the decreased performance of BYO and SPSA may be because the “true” 훽 is at the upper bound (푈훽 = 0). The Gaussian posterior distribution in BYO and the gradient estimation in SPSA can suffer from instability near the boundary. From Table 4.4, we observe that the parameters of in-vehicle time and number of transfers are better estimated than those of relative walking time and the commonality factor.

Figure 4-2b shows the results of the deterministic scenario. The initial objective function value is relatively small (1.5 × 10^5) compared to the reference scenario (1.5 × 10^6). All algorithms except CORS reduce the objective function by only around 1/3. The good performance of CORS may come from its global search with the Latin hypercube method, which is better suited to exploring points near the boundaries. Although the objective function does not decrease much, the estimated parameters are still acceptable (see Table 4.5).

(a) Random

(b) Deterministic

Figure 4-2: Algorithm performance in the two path choice scenarios

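Table 4.4: Calibration results of the random path choice scenario

| Category | Variable name | “True” | GA | SA | NMSA | MADS | SPSA | BYO | CORS |
|---|---|---|---|---|---|---|---|---|---|
| Path choice | In-vehicle time | 0 | 0 | -0.072 | -0.050 | 0 | -0.108 | -0.037 | 0 |
| Path choice | Relative walking time | 0 | -2.151 | -1.139 | -1.807 | -1.000 | -1.719 | -3.725 | -0.702 |
| Path choice | Number of transfers | 0 | -0.348 | -0.185 | -0.435 | -1.334 | -0.631 | -0.207 | 0 |
| Path choice | Commonality factor | 0 | -5.997 | -1.945 | -9.991 | -5.432 | -5.127 | -4.155 | -8.000 |
| Train capacity | 휃0 | 232 | 243 | 224 | 254 | 232 | 241 | 248 | 223 |
| Train capacity | 휃1 | 0.073 | 0.067 | 0.050 | 0.124 | 0.048 | 0.106 | 0.079 | 0.016 |
| Train capacity | 휃2 | 0.061 | 0.037 | 0.072 | 0.136 | 0.134 | 0.112 | 0.159 | 0.072 |
| Objective function | - | - | 1,202,761 | 756,321 | 1,399,836 | 1,365,291 | 1,429,942 | 1,203,696 | 855,627 |

The columns GA through CORS report the estimated parameters.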

Table 4.5: Calibration results of the deterministic path choice scenario

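| Category | Variable name | “True” | GA | SA | NMSA | MADS | SPSA | BYO | CORS |
|---|---|---|---|---|---|---|---|---|---|
| Path choice | In-vehicle time | -2 | -1.240 | -1.243 | -1.205 | -1.160 | -1.544 | -1.537 | -1.830 |
| Path choice | Relative walking time | -5 | -3.180 | -3.358 | -2.819 | -2.480 | -3.728 | -3.807 | -4.492 |
| Path choice | Number of transfers | -3 | -1.575 | -1.551 | -1.419 | -1.524 | -1.786 | -1.761 | -2.661 |
| Path choice | Commonality factor | -10 | -5.307 | -5.251 | -4.735 | -4.920 | -6.346 | -6.379 | -8.819 |
| Train capacity | 휃0 | 232 | 237 | 232 | 228 | 237 | 239 | 232 | 237 |
| Train capacity | 휃1 | 0.073 | 0.095 | 0.076 | 0.095 | 0.180 | 0.097 | 0.110 | 0.123 |
| Train capacity | 휃2 | 0.061 | 0.101 | 0.069 | 0.091 | 0.062 | 0.110 | 0.106 | 0.093 |
| Objective function | - | - | 125,100 | 128,157 | 118,915 | 135,922 | 113,805 | 124,448 | 63,220 |

The columns GA through CORS report the estimated parameters.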

Impact of crowding sensitivity

Figure 4-3 shows the calibration results of the two scenarios related to train capacity (i.e., crowding-sensitive and crowding-insensitive). In the crowding-insensitive scenario (Figure 4-3a), the conclusions are similar to those of the reference scenario. CORS, BYO, NMSA, and SPSA converge to low objective function values and outperform the other algorithms. The performance of NMSA and MADS is improved compared to the reference scenario. In the crowding-sensitive scenario (Figure 4-3b), we still observe good performance by the CORS, NMSA, and SPSA algorithms, while the performance of BYO is slightly reduced. The results shown in Tables 4.6 and 4.7 indicate that 휃0 (base capacity) is hard to estimate. This may be because trains at most stations do not reach capacity. Therefore, for many OD pairs, the OD exit flows (directly related to the objective function) are not sensitive to the base capacity parameter.

Table 4.6: Calibration results of the crowding-insensitive train capacity scenario

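| Category | Variable name | “True” | GA | SA | NMSA | MADS | SPSA | BYO | CORS |
|---|---|---|---|---|---|---|---|---|---|
| Path choice | In-vehicle time | -0.147 | -0.392 | -0.181 | -0.254 | -0.460 | -0.191 | -0.197 | -0.177 |
| Path choice | Relative walking time | -1.271 | -2.153 | -2.044 | -2.636 | -2.294 | -2.284 | -2.469 | -2.025 |
| Path choice | Number of transfers | -0.573 | -1.127 | -1.614 | -1.279 | -0.490 | -0.760 | -0.908 | -1.011 |
| Path choice | Commonality factor | -3.679 | -6.489 | -6.500 | -7.492 | -7.750 | -5.299 | -5.474 | -5.130 |
| Train capacity | 휃0 | 235 | 239 | 245 | 249 | 230 | 241 | 238 | 236 |
| Train capacity | 휃1 | 0 | 0.088 | 0.109 | 0.096 | 0.050 | 0.096 | 0.093 | 0.084 |
| Train capacity | 휃2 | 0 | 0.05 | 0.058 | 0.108 | 0.050 | 0.150 | 0.078 | 0.063 |
| Objective function | - | - | 700,441 | 418,196 | 277,835 | 553,765 | 241,533 | 305,846 | 277,212 |

The columns GA through CORS report the estimated parameters.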

(a) Crowding-insensitive

(b) Crowding-sensitive

Figure 4-3: Algorithm performance in the two train capacity scenarios

4.5 Discussion

In this chapter, we propose an SBO framework to estimate train capacity and path choice model parameters simultaneously in metro systems using AFC and AVL data. The advantage of the proposed framework lies in capturing the collective effect of both path choices and train capacity on passenger journey times. Seven representative algorithms from four main branches of SBO methods are applied and compared with respect to their solution accuracy, convergence speed, and stability. We applied the proposed framework using data from the Hong Kong MTR network and compared the performance of the different algorithms. Overall, the results show that some algorithms result in a reasonable estimation of the parameters of interest. These results also support the effectiveness of the proposed SBO framework for estimating these key parameters using AFC and AVL data. In particular, the response surface methods (especially CORS) exhibit consistently good performance.

The method proposed in this chapter has some limitations. First, we validate the framework and evaluate the algorithmic performance using only synthetic AFC and AVL data. Therefore, the complexities of noise and uncertainties in actual data do not play any role. This is caused by the absence of real-world path choice and train capacity information. Future research can collect real-world path choice and train capacity data to conduct more realistic model validation. Second, we assumed that the path choice behavior is similar across the whole network (same 훽 values). Given that real-world path choice behavior is possibly more diverse and heterogeneous, future research can explore clustering OD pairs with different 훽 values.

Table 4.7: Calibration results of the crowding-sensitive train capacity scenario

| Category | Variable name | “True” | GA | SA | NMSA | MADS | SPSA | BYO | CORS |
|---|---|---|---|---|---|---|---|---|---|
| Path choice | In-vehicle time | -0.147 | -0.472 | -0.217 | -0.228 | -0.332 | -0.177 | -0.195 | -0.196 |
| Path choice | Relative walking time | -1.271 | -2.533 | -1.575 | -2.735 | -1.568 | -2.118 | -1.763 | -2.534 |
| Path choice | Number of transfers | -0.573 | -0.759 | -1.169 | -1.323 | -0.816 | -0.495 | -0.892 | -0.734 |
| Path choice | Commonality factor | -3.679 | -6.489 | -6.324 | -7.040 | -7.834 | -4.361 | -6.046 | -5.021 |
| Train capacity | 휃0 | 225 | 238 | 244 | 238 | 245 | 244 | 237 | 237 |
| Train capacity | 휃1 | 0.2 | 0.166 | 0.149 | 0.121 | 0.112 | 0.140 | 0.085 | 0.099 |
| Train capacity | 휃2 | 0.2 | 0.080 | 0.114 | 0.125 | 0.144 | 0.129 | 0.123 | 0.110 |
| Objective function | - | - | 765,621 | 320,745 | 231,228 | 502,341 | 199,753 | 335,558 | 169,057 |

The columns GA through CORS report the estimated parameters.


Chapter 5

Conclusion

5.1 Summary of Results

This thesis proposes a data-driven NPM for rail transit system performance monitoring. The NPM consists of two major components: 1) a network loading engine which takes AVL data (or time table), AFC data (or OD entry flow), network, train capacity, and path choices as inputs, and outputs performance indicators of interest, and 2) a calibration engine which can estimate path choice and train capacity parameters using AFC and AVL data. Specifically, we introduce the NPM formulation and functionality in Chapter 2 along with the network loading model and an effective train capacity model. An assignment-based path choice estimation model is proposed in Chapter 3. Chapter 4 proposes a simultaneous calibration model for both path choices and train capacity. The main results are summarized as follows.

NPM formulation and functionality

Chapter 2 describes the formulation and functionality of NPM. The event-based network loading model, path choice model, and effective train capacity model are proposed. The effective capacity is calibrated using an SBO method. Path choices from Li [18] are used. The use of NPM for performance monitoring is demonstrated by analyzing the spatial-temporal crowding patterns in the MTR system and evaluating express train dispatching strategies. The performance of NPM is validated by comparing the outputs of NPM (with effective capacity) with the field observations at Admiralty station and the outputs of a benchmark fixed-capacity model. Results show that the numbers of boarding and arriving passengers at Admiralty station from the NPM match the ground truth observations well. The estimates of the number of boarding passengers, deny-boarding rate, and exit flows from the NPM are closer to the observed values than those of the benchmark model.

NPM is used to identify crowded stations. According to the results, the most congested platform during the evening peak is at Admiralty station (Tsuen Wan Line northbound). The platform has a deny-boarding rate of 0.75, which means 75% of passengers are denied boarding at least once at this platform during the evening peak. NPM is also used for evaluating different dispatching strategies. By testing different express train dispatching strategies from Central to Admiralty station, we find that dispatching express trains transfers the congestion from Admiralty to Central. The strategy temporarily decreases the number of left-behind passengers at Admiralty. Dispatching at 18:30 reduces the number of left-behind passengers more than dispatching at 18:40 because the former better targets the peak of the crowding conditions.

Assignment-based Path Choice Calibration

Chapter 3 proposes an assignment-based path choice estimation framework for urban rail systems using automated fare collection (AFC) data. The framework captures the crowding correlation among stations and the interaction between path choice and passenger denied boarding. The path choice estimation is formulated as an optimization problem, which attempts to minimize the error between the assignment outputs (which are a function of path choices) and the corresponding quantities observed from AFC data. The original problem is intractable because of a non-linear multinomial logit equation constraint and a non-analytical black-box function constraint (i.e., the assignment model). A solution procedure is proposed to decompose the original problem into three tractable sub-problems: rough path shares estimation, choice parameters estimation, and path exit rates estimation. The sub-problems can all be solved efficiently. We prove that the solution of the decomposed problem is equivalent to that of the original problem under specific conditions.

The model is validated using both synthetic data and real-world AFC data. Results from synthetic data show that the estimated path choice parameters are very close to the “true” (synthetic) ones. The proposed method outperforms the benchmark SBO models in both convergence rate and final solution quality. For the real-world data, the scale of all estimated coefficients is similar to the previous survey results [18]. The substitution patterns are reasonable and also similar to the previous ones. The model robustness is tested under different initial values and for different days. We find that the estimated path choice parameters are very stable regardless of the initial values. All estimated values are consistent across days except for the coefficient of relative walking time on March 17 (Friday). The exception can be explained by different travel behaviors on Friday nights.

Simultaneous Calibration of Path choices and Train Capacity

Chapter 4 proposes a simulation-based optimization (SBO) framework to calibrate path choices and train capacity for urban rail systems simultaneously using AFC and AVL data. The calibration is formulated as an optimization problem with a black-box objective function. Seven optimizers (solution algorithms) from four branches of SBO methods are evaluated. The algorithms are evaluated using an experimental design that includes five scenarios, representing different degrees of path choice randomness and crowding sensitivity.

Results show that some of the algorithms estimate the path choice and train capacity parameters well, but the performance of the different algorithms varies. In the reference scenario, given the limited number of function evaluations, CORS, BYO, and SPSA converge to relatively small objective function values, whereas GA, MADS, and SA have relatively large objective function values upon termination. In terms of convergence speed, the response surface methods (BYO and CORS) converge fastest. The algorithms also differ in stability. NMSA is a deterministic algorithm and is not affected by randomness. BYO and CORS show high randomness in the first half of the iterations. However, as the number of function evaluations increases, the standard deviation of the objective function decreases and the results become stable. GA, SA, and MADS are unstable compared to the other algorithms. The results of the other scenarios are similar. The response surface methods (particularly CORS) exhibit consistently good performance.

5.2 Future Research

5.2.1 Calibration methodology

As shown in Figure 1-5 and discussed in Chapter 4, train capacity and path choices collectively impact the OD exit flows and passenger journey times. Therefore, calibrating one set of parameters while fixing the other may not fully capture network interactions and can introduce estimation bias if the fixed values are not properly selected. Simultaneous calibration is a better way to develop the calibration engine in the NPM.

This thesis has developed two different calibration frameworks: SBO-based (Chapter 2 and 4) and assignment-based (Chapter 2). The SBO-based framework is capable of simultaneously estimating path choices and train capacity. However, because its optimization is stochastic and requires a large number of expensive simulation-based function evaluations, the SBO-based framework may not be efficient and stable enough, especially for day-to-day monitoring, where the parameters must be updated daily. The assignment-based framework can outperform the SBO-based framework in convergence speed and estimation accuracy (Section 3.4.3). However, as it is constructed on the aggregated TS hyper-network, detailed information on passenger boarding behavior, such as train capacity, cannot be explicitly modeled. Therefore, an efficient and reliable simultaneous calibration model for path choices and train capacity is an interesting future research direction.

We provide two modeling ideas. The first is to replace the aggregated TS hyper-network with a fine-grained one, where each TS node represents a train departure time at a station (i.e., $i_m$ represents station $i$ at the time when the $m$-th train leaves the station). Figure 5-1 shows an example of a fine-grained TS hyper-network, where the TS nodes correspond exactly to the train departure times in the timetable. The number of boarding passengers for each train can be explicitly modeled in the hyper-network by summing all hyper-path flows starting from a specific TS node. Train capacity can then be incorporated and estimated by adding a constraint on the number of boarding passengers with respect to the capacity. However, the drawback of this modeling framework is the exponentially increasing number of TS nodes as the network scale and study period grow. Thus, a better solving algorithm is needed when using the fine-grained network representation. The new algorithm should give the large-scale estimation problem a simple, tractable form (e.g., linear programming) so that it is practical in the real world.
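The bookkeeping behind this first idea can be sketched in a few lines. The following is a minimal illustration, not the thesis's formulation: the function names, the dictionary layout, and all numbers are hypothetical, and the space available on each departing train is assumed to be known. In an actual estimation problem the capacity would enter as a constraint rather than being checked afterwards.

```python
from collections import defaultdict


def boardings_per_ts_node(hyper_path_flows):
    """Sum hyper-path flows by the TS node (station, train index) where they start."""
    boardings = defaultdict(float)
    for (start_node, _path_id), flow in hyper_path_flows.items():
        boardings[start_node] += flow
    return dict(boardings)


def capacity_excess(boardings, available_capacity):
    """Return the TS nodes whose boardings exceed the space available on that train."""
    return {
        node: count - available_capacity[node]
        for node, count in boardings.items()
        if count > available_capacity[node]
    }


# Hypothetical flows: keys are ((station, train index), path id), values are passengers.
flows = {
    (("A", 1), "r1"): 120.0,
    (("A", 1), "r2"): 80.0,
    (("A", 2), "r1"): 50.0,
}
# Hypothetical space available when each train departs (assumed known here).
space = {("A", 1): 150.0, ("A", 2): 100.0}

boardings = boardings_per_ts_node(flows)
excess = capacity_excess(boardings, space)
```

Because every train departure at every station becomes its own TS node, the dictionaries above grow with both the number of stations and the number of trains, which is the scalability concern raised in the text.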

Figure 5-1: Example of a fine-grained TS hyper-network

The second modeling idea is to capture train capacity implicitly within our aggregated TS hyper-network. This can be done by introducing the “capacity of hyper-links”. For example, suppose $i_1$ and $j_1$ represent stations $i$ and $j$ at time 7:00-7:15, respectively, where $i$ and $j$ are two adjacent stations. We can define the capacity of hyper-link $(i_1, j_1)$ as the maximum number of passengers that can move from $i$ to $j$ during 7:00-7:15. This can be calculated as the number of trains moving from $i$ to $j$ (obtained from AVL data or the timetable) times the train capacity. In this way, train capacity can be incorporated and estimated by adding a constraint on the hyper-link flow with respect to the hyper-link capacity. A hyper-link flow can be calculated as the sum over all path flows (i.e., $q_r^{i_m,j_n}$) passing through the link.
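The hyper-link capacity constraint can be written compactly as follows. This is only a sketch: the symbols $a$, $f_a$, $N_a$, $C$, and the incidence indicator $\delta$ are notation assumed here for illustration, and $q_r^{i_m,j_n}$ is read as the flow on path $r$ between TS nodes $i_m$ and $j_n$.

```latex
% a = (i_1, j_1): hyper-link between adjacent stations i and j in one time interval
% N_a: number of trains traversing a in that interval (from AVL data or the timetable)
% C:   train capacity (the parameter to be estimated)
% \delta_{a,r}^{i_m,j_n} = 1 if path r between TS nodes i_m and j_n passes through a, else 0
\[
  f_a \;=\; \sum_{(i_m, j_n)} \sum_{r} \delta_{a,r}^{i_m, j_n}\, q_r^{i_m, j_n}
  \;\le\; N_a \cdot C
\]
```

As a purely hypothetical illustration, if five trains run from $i$ to $j$ during 7:00-7:15 and each is assumed to carry at most 1,000 passengers, the flow on hyper-link $(i_1, j_1)$ would be bounded by 5,000 passengers.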

5.2.2 Behavioral effective capacity model

The effective capacity model in Chapter 2 assumes that train capacity is a function of train load and the number of queuing passengers. This can be seen as an empirical model observed from data [61]. However, from a behavioral perspective, the actual number of boarding passengers should be determined by queuing passengers’ willingness to board [8]. The willingness to board is affected by a) the distribution (density) of waiting passengers on the platform, b) the train load and its distribution across the train, c) passengers’ positions in the queue, and d) passengers’ socio-demographic information. Future research can propose a behavioral model to quantify the probability of passengers boarding a train. The parameters of the behavioral model can be estimated from AFC data using maximum likelihood estimation [41].
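One possible shape for such a behavioral model is a binary logit on the factors listed above. The sketch below is an assumption made for illustration, not a model proposed in this thesis: the functional form, the feature names, and the coefficient values are all hypothetical. It also assumes that each boarding outcome is observed directly, whereas AFC data record only tap-in and tap-out times, so in practice the boarding outcome would be latent.

```python
import math


def boarding_probability(features, coefficients, intercept=0.0):
    """Binary logit: probability that a queuing passenger boards the arriving train."""
    utility = intercept + sum(coefficients[k] * features[k] for k in coefficients)
    return 1.0 / (1.0 + math.exp(-utility))


def log_likelihood(observations, coefficients, intercept=0.0):
    """Log-likelihood of observed (features, boarded) pairs under the logit model."""
    total = 0.0
    for features, boarded in observations:
        p = boarding_probability(features, coefficients, intercept)
        total += math.log(p) if boarded else math.log(1.0 - p)
    return total


# Hypothetical coefficients: a fuller train and a position further back in the
# queue both reduce the willingness to board.
coef = {"load_ratio": -2.0, "queue_position": -0.1}

p_empty_train = boarding_probability({"load_ratio": 0.2, "queue_position": 3}, coef, 2.0)
p_full_train = boarding_probability({"load_ratio": 1.1, "queue_position": 3}, coef, 2.0)
```

Maximum likelihood estimation would then search for the coefficients that maximize `log_likelihood` over the observed data, for example with a general-purpose numerical optimizer.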

5.2.3 From monitoring to control and planning

As shown in Figure 1-1, the data-driven public transit management system has two other tasks: control and planning. The NPM can be extended for these two tasks as well. The diagram of the NPM extension for real-time control is shown in Figure 5-2. The goal is to adaptively adjust the real-time timetable (including routing, headway, etc.) to maximize the system performance defined by the operators. A new training engine is introduced to obtain the optimal control strategy. This can be formulated as a reinforcement learning problem [77]. After training, the results can be applied to a control engine that updates the real-time timetable to improve service performance.

Figure 5-2: NPM extension for real-time control

The NPM can also be extended for planning. As shown in Figure 5-3, by changing the inputs to the predicted future demand, future network, and planned timetable, the NPM can be used to evaluate different planning policies. An automated timetable design engine can also be added, which is responsible for finding feasible or optimal timetables that satisfy the pre-determined planning objectives. The timetable design problem can be formulated as an optimization problem whose objective function is the planning objective. It can then be solved with a framework similar to the one proposed in this thesis for path choice and train capacity calibration.

Figure 5-3: NPM extension for planning

Appendix A

Passenger Route Choice Model for MTR System

These results are from [18]. The C-logit model formulation is the same as Eq. (2.1) and Eq. (2.2). A total of 31,640 passengers completed the questionnaire. After filtering duplicate responses, 26,996 responses were available. The model results are shown in Table A.1. The main explanatory variables are the total in-vehicle time, relative transfer walking time, and number of transfers. All variables are statistically significant with the expected signs. Routes with higher in-vehicle time, walking time, and number of transfers are less likely to be chosen by passengers.

Table A.1: Route Choice Model Estimation Results

Variable                 Estimate   Std. Error   t-value
In-vehicle time           -0.147      0.011      -13.64 ***
Relative walking time     -1.271      0.278       -4.56 ***
Number of transfers       -0.573      0.084       -6.18 ***
Commonality factor        -3.679      1.273       -2.89 **

$\rho^2 = 0.54$

***: $p < 0.01$; **: $p < 0.05$.
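To illustrate how the estimates in Table A.1 translate into route shares, the following sketch evaluates a logit over a set of candidate routes. It rests on assumptions that should be checked against Eq. (2.1) and Eq. (2.2): the utility is taken to be linear in the four attributes, with the commonality factor entering as an additional term, and the attribute values and their units in the example are made up.

```python
import math

# Coefficients copied from Table A.1.
BETA = {
    "in_vehicle_time": -0.147,
    "relative_walking_time": -1.271,
    "num_transfers": -0.573,
    "commonality_factor": -3.679,
}


def route_utility(route):
    """Systematic utility, assumed linear in the attributes of Table A.1."""
    return sum(BETA[name] * route[name] for name in BETA)


def choice_probabilities(routes):
    """Logit choice probabilities over the candidate routes of one OD pair."""
    utilities = [route_utility(r) for r in routes]
    max_u = max(utilities)  # shift by the maximum for numerical stability
    weights = [math.exp(u - max_u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical routes: a direct route and a slightly faster route with one transfer.
routes = [
    {"in_vehicle_time": 25.0, "relative_walking_time": 0.0,
     "num_transfers": 0, "commonality_factor": 0.0},
    {"in_vehicle_time": 24.0, "relative_walking_time": 0.1,
     "num_transfers": 1, "commonality_factor": 0.0},
]
probs = choice_probabilities(routes)
```

With these made-up attributes, the one-unit saving in in-vehicle time does not offset the transfer and walking penalties, so the direct route receives the larger share.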

Bibliography

[1] Haris N Koutsopoulos, Peyman Noursalehi, Yiwen Zhu, and Nigel HM Wilson. Automated data in transit: Recent developments and applications. In 2017 5th IEEE International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), pages 604–609. IEEE, 2017.

[2] Martin Trépanier, Catherine Morency, and Bruno Agard. Calculation of transit performance measures using smartcard data. Journal of Public Transportation, 12(1):5, 2009.

[3] Xiaolei Ma and Yinhai Wang. Development of a data-driven platform for transit performance measures using smart card and gps data. Journal of Transportation Engineering, 140(12):04014063, 2014.

[4] Zhenliang Ma, Haris N Koutsopoulos, Yunqing Chen, and Nigel HM Wilson. Estimation of denied boarding in urban rail systems: alternative formulations and comparative analysis. Transportation Research Record, page 0361198119857034, 2019.

[5] Agostino Nuzzolo, Umberto Crisalli, Luca Rosati, and Antonio Comi. Dybus2: a real-time mesoscopic transit modeling framework. In 2015 IEEE 18th International Conference on Intelligent Transportation Systems, pages 303–308. IEEE, 2015.

[6] Felipe Nunez, Francisco Reyes, Pablo Grube, and Aldo Cipriano. Simulating railway and metropolitan rail networks: From planning to on-line control. IEEE Intelligent Transportation Systems Magazine, 2(4):18–30, 2010.

[7] Baichuan Mo, Zhenliang Ma, Haris N Koutsopoulos, and Jinhua Zhao. Capacity-constrained network performance model for urban rail systems. Transportation Research Record, page 0361198120914309, 2020.

[8] Zhiyuan Liu, Shuaian Wang, Weijie Chen, and Yuan Zheng. Willingness to board: a novel concept for modeling queuing up passengers. Transportation Research Part B: Methodological, 90:70–82, 2016.

[9] Ennio Cascetta, Agostino Nuzzolo, Francesco Russo, and Antonino Vitetta. A modified logit route choice model overcoming path overlapping problems. Specification and some calibration results for interurban networks. In Transportation and Traffic Theory. Proceedings of The 13th International Symposium On Transportation And Traffic Theory, Lyon, France, 24-26 July 1996, 1996.

[10] Xiaolei Ma, Yao-Jan Wu, Yinhai Wang, Feng Chen, and Jianfeng Liu. Mining smart card data for transit riders’ travel patterns. Transportation Research Part C: Emerging Technologies, 36:1–12, 2013.

[11] Zhan Zhao, Haris N Koutsopoulos, and Jinhua Zhao. Individual mobility prediction using transit smart card data. Transportation Research Part C: Emerging Technologies, 89:19–34, 2018.

[12] Mariko Utsunomiya, John Attanucci, and Nigel Wilson. Potential uses of transit smart card registration and transaction data to improve transit planning. Transportation research record, 1971(1):118–126, 2006.

[13] Marie-Pier Pelletier, Martin Trépanier, and Catherine Morency. Smart card data use in public transit: A literature review. Transportation Research Part C: Emerging Technologies, 19(4):557–568, 2011.

[14] E Mazloumi, G Currie, and M Sarvi. Assessing measures of transit travel time variability and reliability using avl data. In Transportation Research Board 87th Annual Meeting, 2008.

[15] Amer Shalaby and Ali Farhan. Prediction model of bus arrival and departure times using avl and apc data. Journal of Public Transportation, 7(1):3, 2004.

[16] Fabian Cevallos, Xiaobo Wang, Zhenmin Chen, and Albert Gan. Using avl data to improve transit on-time performance. Journal of Public Transportation, 14(3):2, 2011.

[17] Dan Levy and Llew Lawrence. The Use of Automatic Vehicle Location for Planning and Management Information. Number STRP# 4. 1991.

[18] Weixuan Li. Route and transfer station choice modeling in the system. Working paper, 2014.

[19] Baichuan Mo, Zhenliang Ma, Haris Koutsopoulos, and Jinhua Zhao. Assignment-based path choice estimation for metro system using smart card data. In 24th International Symposium on Transportation & Traffic Theory (ISTTT), 2020.

[20] Baichuan Mo, Zhenliang Ma, Haris Koutsopoulos, and Jinhua Zhao. Calibrating route choice for urban rail system: A comparative analysis using simulation-based optimization methods. In Transportation Research Board 99th Annual Meeting, 2020.

[21] Haris N. Koutsopoulos, Zhenliang Ma, Peyman Noursalehi, and Yiwen Zhu. Chapter 10 - transit data analytics for planning, monitoring, control, and information. In Constantinos Antoniou, Loukas Dimitriou, and Francisco Pereira, editors, Mobility Patterns, Big Data and Transport Analytics, pages 229–261. Elsevier, 2019.

[22] Zhenliang Ma and Haris N Koutsopoulos. Optimal design of promotion based demand management strategies in urban rail systems. Transportation Research Part C: Emerging Technologies, 109:155–173, 2019.

[23] Neema Nassir, Mark Hickman, and Zhen-Liang Ma. A strategy-based recursive path choice model for public transit smart card data. Transportation Research Part B: Methodological, 126:528–548, 2019.

[24] Agostino Nuzzolo, Francesco Russo, and Umberto Crisalli. A doubly dynamic schedule-based assignment model for transit networks. Transportation Science, 35(3):268–285, 2001.

[25] Xiangming Yao, Baomin Han, Dandan Yu, and Hui Ren. Simulation-based dynamic passenger flow assignment modelling for a schedule-based transit network. Discrete Dynamics in Nature and Society, 2017, 2017.

[26] Pablo Grube, Felipe Núñez, and Aldo Cipriano. An event-driven simulator for multi-line metro systems and its application to santiago de chile metropolitan rail network. Simulation Modelling Practice and Theory, 19(1):393–405, 2011.

[27] John Preston, James Pritchard, and Ben Waterson. Train overcrowding: investigation of the provision of better information to mitigate the issues. Transportation research record, 2649(1):1–8, 2017.

[28] Yiwen Zhu, Haris N Koutsopoulos, and Nigel HM Wilson. A probabilistic passenger-to-train assignment model based on automated data. Transportation Research Part B: Methodological, 104:522–542, 2017.

[29] Moshe E Ben-Akiva and Steven R Lerman. Discrete choice analysis: theory and application to travel demand, volume 9. MIT press, 1985.

[30] Massachusetts Bay Transportation Authority. At what level does crowding become unacceptable. https://www.mbtabackontrack.com/blog/48-at-what-level-does-crowding-become-unacceptable, 2016. Accessed: 2019-06-27.

[31] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pages 2951–2959, 2012.

[32] Mass Transit Railway, Hong Kong. Mtr maintains 99.9 percent on-time performance in 2017. https://www.mtr.com.hk/archive/corporate/en/press_release/PR-18-009-E.pdf, 2018. Accessed: 2020-02-18.

[33] Takahiko Kusakabe, Takamasa Iryo, and Yasuo Asakura. Estimation method for railway passengers’ train choice behavior with smart card transaction data. Transportation, 37(5):731–749, 2010.

[34] Feng Zhou and Rui-hua Xu. Model of passenger flow assignment for urban rail transit based on entry and exit time constraints. Transportation Research Record, 2284(1):57–61, 2012.

[35] Pramesh Kumar, Alireza Khani, and Qing He. A robust method for estimating transit passenger trajectories using automated data. Transportation Research Part C: Emerging Technologies, 95:731–747, 2018.

[36] Yiwen Zhu, Haris N Koutsopoulos, and Nigel HM Wilson. Passenger itinerary inference model for congested urban rail networks. Transportation Research C: Emerging Technologies, 2020. Under review.

[37] Yanshuo Sun and Ruihua Xu. Rail transit travel time reliability and estimation of passenger route choice behavior: Analysis using automatic fare collection data. Transportation Research Record, 2275(1):58–67, 2012.

[38] Lijun Sun, Yang Lu, Jian Gang Jin, Der-Horng Lee, and Kay W Axhausen. An integrated bayesian approach for passenger flow assignment in metro networks. Transportation Research Part C: Emerging Technologies, 52:116–131, 2015.

[39] Juanjuan Zhao, Fan Zhang, Lai Tu, Chengzhong Xu, Dayong Shen, Chen Tian, Xiang-Yang Li, and Zhengxi Li. Estimation of passenger route choice pattern using smart card data for complex metro systems. IEEE Transactions on Intelligent Transportation Systems, 18(4):790–801, 2017.

[40] Xinyue Xu, Liping Xie, Haiying Li, and Lingqiao Qin. Learning the route choice behavior of subway passengers from afc data. Expert Systems with Applications, 95:324–332, 2018.

[41] Yiwen Zhu. Passenger-to-itinerary assignment model based on automated data. PhD thesis, Northeastern University, 2017.

[42] Wenjing Song, Ke Han, Yiou Wang, Terry Friesz, and Enrique Del Castillo. Statistical metamodeling of dynamic network loading. Transportation Research Procedia, 23:263–282, 2017.

[43] Soi-Hoi Lam and Feng Xie. Transit path-choice models that use revealed preference and stated preference data. Transportation Research Record, 1799(1):58–65, 2002.

[44] Mohsen Nazem, Martin Trépanier, and Catherine Morency. Demographic analysis of route choice for public transit. Transportation Research Record, 2217(1):71–78, 2011.

[45] Naveen Eluru, Vincent Chakour, and Ahmed M El-Geneidy. Travel mode choice and transit route choice behavior in montreal: insights from mcgill university members commute patterns. Public Transport, 4(2):129–149, 2012.

[46] Carlo Giacomo Prato. Route choice modeling: past, present and future research directions. Journal of Choice Modelling, 2(1):65–100, 2009.

[47] Sang Nguyen, Stefano Pallottino, and Federico Malucelli. A modeling framework for passenger assignment on a transport network with timetables. Transportation Science, 35(3):238–249, 2001.

[48] Younes Hamdouch and Siriphong Lawphongpanich. Schedule-based transit assignment model with travel strategies and capacity constraints. Transportation Research Part B: Methodological, 42(7-8):663–684, 2008.

[49] Younes Hamdouch, HW Ho, Agachai Sumalee, and Guodong Wang. Schedule-based transit assignment model with vehicle capacity and seat availability. Transportation Research Part B: Methodological, 45(10):1805–1830, 2011.

[50] James Davis, Guillermo Gallego, and Huseyin Topaloglu. Assortment planning under the multinomial logit model with totally unimodular constraint structures. Work in Progress, 2013.

[51] Bilge Atasoy, Takuro Ikeda, Xiang Song, and Moshe E Ben-Akiva. The concept and impact analysis of a flexible mobility on demand system. Transportation Research Part C: Emerging Technologies, 56:373–392, 2015.

[52] Leslie E Papke and Jeffrey M Wooldridge. Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics, 11(6):619–632, 1996.

[53] Hui Luan and Zhihong Xia. Theorem of existence and uniqueness of fixed points of monotone operators. In Genetic and Evolutionary Computing, pages 11–17. Springer, 2015.

[54] Baichuan Mo, Zhenliang Ma, Haris Koutsopoulos, and Jinhua Zhao. Capacity-constrained network performance model for urban rail systems. Transportation Research Record, 2020.

[55] Baichuan Mo, Zhenliang Ma, Haris Koutsopoulos, and Jinhua Zhao. Calibrating route choice for urban rail system: A comparative analysis using simulation-based optimization methods. Transportation Research Board 99th Annual Meeting, 2020.

[56] Carolina Osorio and Michel Bierlaire. A simulation-based optimization framework for urban transportation problems. Operations Research, 61(6):1333–1345, 2013.

[57] Satyajith Amaran, Nikolaos V Sahinidis, Bikram Sharda, and Scott J Bury. Simulation optimization: a review of algorithms and applications. Annals of Operations Research, 240(1):351–380, 2016.

[58] Qixiu Cheng, Shuaian Wang, Zhiyuan Liu, and Yu Yuan. Surrogate-based simulation optimization approach for day-to-day dynamics model calibration with real data. Transportation Research Part C: Emerging Technologies, 105:422–438, 2019.

[59] Rommel G Regis and Christine A Shoemaker. Constrained global optimization of expensive black box functions using radial basis functions. Journal of Global Optimization, 31(1):153–171, 2005.

[60] Baichuan Mo, Zhenliang Ma, Haris Koutsopoulos, and Jinhua Zhao. Assignment-based path choice estimation for metro system using smart card data. In 24th International Symposium on Transportation & Traffic Theory (ISTTT), 2020.

[61] Cai Xu and Ding Yong. Modeling of passengers’ boarding during the urban rail transit rush hours.

[62] Baichuan Mo, Zhenliang Ma, Haris Koutsopoulos, and Jinhua Zhao. Capacity-constrained network performance model for urban rail systems. Transportation Research Record, 2020.

[63] Félix-Antoine Fortin, François-Michel De Rainville, Marc-André Gardner, Marc Parizeau, and Christian Gagné. DEAP: Evolutionary algorithms made easy. Journal of Machine Learning Research, 13:2171–2175, jul 2012.

[64] Constantino Tsallis and Daniel A Stariolo. Generalized simulated annealing. Physica A: Statistical Mechanics and its Applications, 233(1-2):395–406, 1996.

[65] Fuchang Gao and Lixing Han. Implementing the nelder-mead simplex algorithm with adaptive parameters. Computational Optimization and Applications, 51(1):259–277, 2012.

[66] Mark A Abramson, Charles Audet, John E Dennis Jr, and Sébastien Le Digabel. Orthomads: A deterministic mads instance with orthogonal directions. SIAM Journal on Optimization, 20(2):948–966, 2009.

[67] James C Spall et al. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control, 37(3):332–341, 1992.

[68] Darrell Whitley. A genetic algorithm tutorial. Statistics and computing, 4(2):65–85, 1994.

[69] Peter JM Van Laarhoven and Emile HL Aarts. Simulated annealing. In Simulated annealing: Theory and applications, pages 7–15. Springer, 1987.

[70] Scipy. Scipy dual annealing algorithm. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.dual_annealing.html, 2019. Accessed: 2020-04-17.

[71] John A Nelder and Roger Mead. A simplex method for function minimization. The computer journal, 7(4):308–313, 1965.

[72] Charles Audet and John E Dennis Jr. Mesh adaptive direct search algorithms for constrained optimization. SIAM Journal on optimization, 17(1):188–217, 2006.

[73] C Audet, S Le Digabel, and C Tribes. Nomad user guide. Rapport technique, 2009.

[74] J. Gomez-Dans. A simultaneous perturbation stochastic approximation optimisation code in python, 2012.

[75] Paul Knysh and Yannis Korkolis. Blackbox: A procedure for parallel optimization of expensive black-box functions. arXiv preprint arXiv:1605.00998, 2016.

[76] Mark A Abramson and Charles Audet. Convergence of mesh adaptive direct search to second-order stationary points. SIAM Journal on Optimization, 17(2):606–619, 2006.

[77] Li Zhu, Ying He, F Richard Yu, Bin Ning, Tao Tang, and Nan Zhao. Communication-based train control system performance optimization using deep reinforcement learning. IEEE Transactions on Vehicular Technology, 66(12):10705–10717, 2017.
